Squirro 2.6.0 - Release Notes

We're excited to announce Squirro 2.6.0, released on April 3rd, 2018, based on Elasticsearch 6.2.2 and packed with new features and improvements.

What's in the release?

New Features

  • Never Get Lost Again: add our new map widget to your dashboard to see exactly where your data is coming from.
  • More control over your data: use Pipeline Workflows to specify exactly how each data source should be handled by Squirro.
  • Create Dashboards Like a Boss: we've overhauled the Dashboard Editor to make it friendlier and far more efficient to use, exposing widget properties in a new panel that will be immediately familiar to users of tools like Adobe Photoshop and Illustrator.
  • High Precision Data Model: the new Catalyst Data Model allows you to relate and display items in powerful new ways, and is the foundation for building recommendations.
  • Smart Recommendations: Squirro is now able to recommend items, facets or entities based on multiple models. This is a door opener for new kinds of powerful visualisations.
  • Super Charged Machine Learning (alpha): we've added the Machine Learning Service, a major milestone in what we call pragmatic AI. The Machine Learning Service allows you to train and run your own ML models on Squirro datasets.
  • First Impressions Count: we added an introductory walkthrough tutorial wizard to make it easier for new users to understand how Squirro works. Try it for yourself.
  • Studio Plugins Easier to Install: using the "Plugin Repository" studio plugin (under "Server Management") you can install other plugins, such as custom dataloaders or pipelets, directly from the UI.

Improvements

  • Updated to Elasticsearch 6.2.2: see https://www.elastic.co/guide/en/elasticsearch/reference/current/release-notes-6.2.2.html
  • Latest Frontend Dependencies: jQuery 3.3.1, Highcharts 6 and D3 5.0
  • Faster Pipeline: in Squirro 2.5.1 (Birch) we introduced Pipeline 2.0. With this release we've made it the default and only data processing pipeline in Squirro, and we've significantly improved its performance and stability since 2.5.1.
  • Faster Tagging: we improved parallelism in keyword tagging and the filtering service by having the pipeline 2.0 ingester run multiple batches in parallel
  • Multiple Pipelets in Parallel: added support for consume_multiple pipelets in the plumber service
  • Faster De-duplication: we improved the performance of item de-duplication, increasing overall pipeline performance where this step is used
  • Pyrebloom Gone: while updating the deduplication service, we removed the pyrebloom package which was no longer required
  • Bulk Indexing Change: we removed bulk indexing support from the Python client. This is now handled automatically by the new pipeline.
  • Improved Self Service Dataloaders: in Squirro 2.5.3 we introduced Self Service dataloaders. We've continued to improve them in this release, adding key / value storage for your dataloaders, help texts, boolean fields, password-protected fields, automatic field-based mapping and quick facet creation pre-filled with the field name.
  • Easier to Configure: all asset configuration files, such as custom data loaders, Studio plugins and widgets now support Hjson, the human friendlier form of JSON.
  • Easier CSV and Excel Uploading: we automatically detect character encoding now, making uploading these types of files pain-free
  • Weighted Keywords (beta): in Squirro 2.5.3 we introduced weighted keywords. These can now be displayed in the UI and Squirro dashboards.
  • Monitoring for nginx: we included an optional nginx monitoring module nginx-module-vts
  • More Robust Topic Server: we made the Topic service start-up more robust in the event of failure of packaged studio / dataloader plugins
  • Smarter Disk Usage: we switched to logrotating the /var/log/squirro/*/nginx*.log files
  • Monitoring with Prometheus: it's now possible to monitor Squirro installations with Prometheus.
  • Cluster Service always available: in RedHat 7 and CentOS 7 deployments including single-node installations
  • Copy visibility condition: Copy and paste the visibility condition across different dashboard layers

Bug Fixes

  • No longer stop redis when redis cache is reset.
  • Fixed the ability to import items with "almost empty" files and html documents, e.g. those containing only whitespace, comments, or html "processing instructions"
  • Fixed issue with deleting projects on CentOS 6
  • Fixed issue where Bulk operations fail silently in case of Elasticsearch errors
  • Fixed issue with language-detection step: AttributeError: 'dict' object has no attribute 'xhtml_utf8'
  • Fixed issue with cleanup step: AttributeError: 'dict' object has no attribute 'clean'
  • Fixed issue where the lower role was still applied when a user group had a higher role than the user itself
  • Fixed issue with inconsistent highlighting color for abstract, item detail and smartfilter explain
  • Fixed issue where dragging a widget to the page boundary did not initiate page scroll
  • And many more small fixes and improvements.
  • Fixed keyword tagging with Elasticsearch 6.2 (added on April 10, 2018 with release 2.6.0-102)
  • Fixed pipeline workflow overeager enabling of the Noise Removal step and fixed pipeline 2.0 file cleanup to handle data files whose sources have been deleted (added on April 10, 2018 with release 2.6.0-102)
  • Logging improvements to pipeline 2.0, avoid temporary persisting of internal "sq:" fields, fixed tagging of keywords containing spaces (added on April 16, 2018 with release 2.6.0-105)
  • Added mariadb-server rpm packages and their dependencies to the CentOS 7 mirror for offline install (added on April 17, 2018)
  • Fixed reset action for the ingester (added on April 24, 2018)
  • Added keepalive support for Zookeeper and Redis (added on April 24, 2018)
  • Testing, stability, and validation for libnlp. Added window filter and analyzers (added on April 24, 2018)
  • Fixed Machine Learning Service time representations, and flattened configurations (added on April 24, 2018)
  • Increased RPM dependencies such as "squirro-monit" and "squirro-redis-server" to have high enough "iteration numbers" to be included in upgrades from older Squirro versions (added on May 14, 2018 with release 2.6.0-114)

Added on May 8, 2018 (Build 112)

  • (SQ-9204) Can not rerun Enrichment.
  • (SQ-9202) Pipelets can not be deleted.
  • (SQ-9192) Project Export/Import broken due to enrichments/workflows.
  • (SQ-9245) Can't edit empty nargs list.
  • (SQ-9203) Can't use space as delimiter in dataloader.
  • (SQ-9200) Pipelet step can not be edited.
  • (SQ-9199) Allow empty config for pipelets.
  • (SQ-9194) Can not upload files in studio.
  • (SQ-9193) Source filter from detail item broken.
  • (SQ-9198) Content Augmentation not saved correctly when adding as new.
  • (SQ-9295) Squirro fails to work with uBlock.

Added on May 16, 2018 (Build 116)

  • Fixed error reporting bug in machine learning service.
  • Added machine learning configuration validation.
  • Fixed hdf5 packaging.
  • Added optional ids for projects, subscriptions, and objects upon creation.
  • Simplified machine learning workflows.

Added on May 25, 2018 (Build 121)

  • Fixed default selection for itemuploader.
  • Fixed recommendations when used in combination with a filter query.
  • Fixed typeahead for weighted keywords.
  • Added separate "train" and "process" datasets in ML service.

Added on May 30, 2018 (Build 125)

  • Fixed hdf5 dependency to allow saving of Keras models.
  • Saner default config for ML service runs.
  • Reverted flup back to version 1.0.3.dev-20110405.
  • Fixed comment typo which caused the machinelearning service not to start.

Added on June 6, 2018 (Build 126)

  • Fixed an issue with returning no results for very complex queries under high load
  • Fixed typeahead and aggregation issue for weighted keywords
  • Fixed an issue with multiple spaces in a query

Added on June 14, 2018 (Build 127)

  • Fixed code fields not being populated while editing pipelets
  • Fixed code fields not being populated while editing data sources

Installing and Upgrading

Plan your Upgrade

Squirro 2.6.0 contains two significant changes:

  • Pipeline 2.0 is now the default and only data processing pipeline in Squirro and is based on new and configurable Pipeline Workflows.
  • The provided Elasticsearch version is 6.2.2 (in Squirro 2.5.3 we had Elasticsearch version 5.6.0). This change means we need to re-index existing Squirro projects, which may require extended periods of downtime.


Fresh Installation Instructions

Please follow the regular installation steps.

Upgrade Instructions

Please ensure that your current version is the latest patch of 2.5.3. If you are on a version older than 2.5.3, please contact support.

Upgrading to Squirro 2.6.0 involves two major changes that require additional caution and time:

  1. The Pipeline 1.0 service is being fully replaced by Pipeline 2.0. Before the upgrade we recommend that you pause all sources, wait for incoming data to stop arriving, ensure that all keyword taggings have been applied, and only then proceed with the actual upgrade.
  2. Additionally, the upgrade will reindex the v8 Elasticsearch indexes to new indexes with template version v9. This can take hours if your Squirro installation contains a large amount of data.

Planned Downtime for Re-indexing

Estimating Your Upgrade Window: we roughly estimate that each 100K documents in a Squirro index take about 1 minute to migrate to the latest version. From the command line you can find out how many documents are in each index by issuing commands like:

$ # Example of output from Elasticsearch on index status
$ curl http://localhost:9200/_cat/indices
green open squirro_v8_w6vldmrdt4qq6pvfphoeaa        SUKcwIRAQVuG_2sunFpbTQ 3 1   87790    336    3.3gb    1.6gb
green open squirro_v8_sdmv1va9qxw4xovqnl-pxw        hI_5ttd6Rcelc_29vB9JwQ 3 1   25241      0      1gb    547mb
green open squirro_v8_ignlqdyqsta1dqutozh0xg        B6NrCemiTceoRxAM5JpJmg 3 1  528962     66   16.4gb    8.2gb

$ # The command below takes the 7th column from above and sums to the total number of items in the index
$ curl http://localhost:9200/_cat/indices | awk '{sum += $7} END {print sum}'

As a rule of thumb, plan for a downtime of at least (total number of documents / 100,000) * 60 seconds.
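
As a worked example of the estimate, the three sample indices listed above hold 87,790 + 25,241 + 528,962 = 641,993 documents, which gives roughly 641,993 / 100,000 * 60 ≈ 386 seconds, or about six and a half minutes. The sketch below wraps the same arithmetic in two small shell functions. The function names are hypothetical and only illustrate the calculation; the result is rounded up to whole seconds.

```shell
# Sum the docs.count column (7th field) of `_cat/indices` output read from stdin.
total_docs() {
    awk '{sum += $7} END {printf "%.0f\n", sum}'
}

# Estimate the re-indexing downtime in seconds, rounded up:
# (total documents / 100000) * 60 seconds.
estimate_downtime_seconds() {
    echo $(( ($1 * 60 + 99999) / 100000 ))
}

# Possible usage on a storage node (assumes Elasticsearch on localhost:9200):
#   estimate_downtime_seconds "$(curl -s http://localhost:9200/_cat/indices | total_docs)"
```

This is only a lower bound for planning; the real duration depends on document size and hardware.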

If you are using Squirro in a Box, additional steps are involved. In this case, please contact support.


 1. Upgrade Storage Nodes and Cluster Nodes collocated on the same machine/VM

CentOS 6 / RHEL 6
# Pause all sources in the user interface

# Ensure the latest 2.5.3 patch release has been applied
STORAGE_NODE_VERSION=$(yum list installed squirro-storage-node | grep squirro-storage-node | sed -e "s/[^ ]* \+//" -e "s/ \+[^ ]*//")
if [ "$STORAGE_NODE_VERSION" \< "2.5.3-4109" ]; then
    echo "SQUIRRO-STORAGE-NODE PACKAGE VERSION $STORAGE_NODE_VERSION TOO LOW - PLEASE UPGRADE TO THE LATEST SQUIRRO 2.5.3 PATCH RELEASE FIRST" 1>&2
    exit 1
fi

CLUSTER_NODE_VERSION=$(yum list installed squirro-cluster-node | grep squirro-cluster-node | sed -e "s/[^ ]* \+//" -e "s/ \+[^ ]*//")
if [ "$CLUSTER_NODE_VERSION" \< "2.5.3-4113" ]; then
    echo "SQUIRRO-CLUSTER-NODE PACKAGE VERSION $CLUSTER_NODE_VERSION TOO LOW - PLEASE UPGRADE TO THE LATEST SQUIRRO 2.5.3 PATCH RELEASE FIRST" 1>&2
    exit 1
fi

for service in $(ls /etc/monit.d/sq*d | sed -e "s|^.*/||" | grep -v "sqclusterd" | grep -v "sqtopicproxyd"); do monit stop $service; done
# wait for `monit summary` to indicate that all but 6 services are stopped
yum update squirro-storage-node-users
yum update elasticsearch
# the following may take a while, so please wait until all index migrations are done
yum update squirro-storage-node
yum update squirro-cluster-node-users
yum update squirro-*
monit monitor all

# Resume the sources paused in the beginning

CentOS 7
# Pause all sources in the user interface

# Ensure the latest 2.5.3 patch release has been applied
STORAGE_NODE_VERSION=$(yum list installed squirro-storage-node | grep squirro-storage-node | sed -e "s/[^ ]* \+//" -e "s/ \+[^ ]*//")
if [ "$STORAGE_NODE_VERSION" \< "2.5.3-4109" ]; then
    echo "SQUIRRO-STORAGE-NODE PACKAGE VERSION $STORAGE_NODE_VERSION TOO LOW - PLEASE UPGRADE TO THE LATEST SQUIRRO 2.5.3 PATCH RELEASE FIRST" 1>&2
    exit 1
fi

CLUSTER_NODE_VERSION=$(yum list installed squirro-cluster-node | grep squirro-cluster-node | sed -e "s/[^ ]* \+//" -e "s/ \+[^ ]*//")
if [ "$CLUSTER_NODE_VERSION" \< "2.5.3-4113" ]; then
    echo "SQUIRRO-CLUSTER-NODE PACKAGE VERSION $CLUSTER_NODE_VERSION TOO LOW - PLEASE UPGRADE TO THE LATEST SQUIRRO 2.5.3 PATCH RELEASE FIRST" 1>&2
    exit 1
fi

for service in $(ls /lib/systemd/system/sq*d.service | sed -e "s|^.*/||" | grep -v "sqclusterd" | grep -v "sqtopicproxyd"); do echo "Stopping $service"; systemctl stop $service; done
# the output of following statement should indicate that all sq*d services but sqclusterd and sqtopicproxyd are stopped:
for service in $(ls /lib/systemd/system/sq*d.service | sed -e "s|^.*/||"); do echo "Status of $service"; systemctl status $service; done

yum update squirro-storage-node-users
yum update elasticsearch
# the following may take a while, so please wait until all index migrations are done
yum update squirro-storage-node
systemctl daemon-reload
yum update squirro-cluster-node-users
yum update squirro-*
for service in $(ls /lib/systemd/system/sq*d.service | sed -e "s|^.*/||"); do echo "Starting $service"; systemctl start $service; done

# Resume the sources paused in the beginning
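
The version guards above compare package versions with the shell's `\<` string operator, which orders strings character by character, so a build number with more digits (for example 2.5.3-10000) would sort before 2.5.3-4109. A minimal sketch of a numeric-aware comparison follows, assuming GNU coreutils `sort -V` is available on the node; the helper name `version_lt` is hypothetical and not part of Squirro's tooling.

```shell
# Return success (0) if version $1 is strictly lower than version $2.
# `sort -V` orders version strings component by component, numerically.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Possible usage, with STORAGE_NODE_VERSION extracted as in the snippets above:
#   if version_lt "$STORAGE_NODE_VERSION" "2.5.3-4109"; then
#       echo "squirro-storage-node version too low" 1>&2
#   fi
```

For the version strings shown in these notes both approaches agree; the sketch only matters once the number of digits differs.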

 2. Upgrade Storage and Cluster Nodes when they are on different servers (and there is only one storage node and one cluster node)

On the one cluster node, shut down most of the Squirro services like so:

CentOS 6 / RHEL 6
CLUSTER_NODE_VERSION=$(yum list installed squirro-cluster-node | grep squirro-cluster-node | sed -e "s/[^ ]* \+//" -e "s/ \+[^ ]*//")
if [ "$CLUSTER_NODE_VERSION" \< "2.5.3-4113" ]; then
    echo "SQUIRRO-CLUSTER-NODE PACKAGE VERSION $CLUSTER_NODE_VERSION TOO LOW - PLEASE UPGRADE TO THE LATEST SQUIRRO 2.5.3 PATCH RELEASE FIRST" 1>&2
    exit 1
fi

# Pause all sources in the user interface

for service in $(ls /etc/monit.d/sq*d | sed -e "s|^.*/||" | grep -v "sqclusterd" | grep -v "sqtopicproxyd"); do monit stop $service; done
# wait for `monit summary` to indicate that all but 6 services are stopped

CentOS 7
CLUSTER_NODE_VERSION=$(yum list installed squirro-cluster-node | grep squirro-cluster-node | sed -e "s/[^ ]* \+//" -e "s/ \+[^ ]*//")
if [ "$CLUSTER_NODE_VERSION" \< "2.5.3-4113" ]; then
    echo "SQUIRRO-CLUSTER-NODE PACKAGE VERSION $CLUSTER_NODE_VERSION TOO LOW - PLEASE UPGRADE TO THE LATEST SQUIRRO 2.5.3 PATCH RELEASE FIRST" 1>&2
    exit 1
fi

# Pause all sources in the user interface

for service in $(ls /lib/systemd/system/sq*d.service | sed -e "s|^.*/||" | grep -v "sqclusterd" | grep -v "sqtopicproxyd"); do echo "Stopping $service"; systemctl stop $service; done
# the output of following statement should indicate that all sq*d services but sqclusterd and sqtopicproxyd are stopped:
for service in $(ls /lib/systemd/system/sq*d.service | sed -e "s|^.*/||"); do echo "Status of $service"; systemctl status $service; done
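
Reading through the full `systemctl status` output for every service can be tedious. As a sketch under stated assumptions, the helper below reads "unit state" pairs and prints only the units that are still active although they should have been stopped. The function name is hypothetical; the states are those printed by `systemctl is-active`.

```shell
# Read "<unit> <state>" pairs from stdin and print every unit that is still
# active, except sqclusterd and sqtopicproxyd, which are meant to keep running.
unexpectedly_active() {
    awk '$2 == "active" && $1 !~ /^(sqclusterd|sqtopicproxyd)\.service$/ {print $1}'
}

# Possible usage on a CentOS 7 node:
#   for unit in $(ls /lib/systemd/system/sq*d.service | sed -e "s|^.*/||"); do
#       echo "$unit $(systemctl is-active "$unit")"
#   done | unexpectedly_active
```

An empty result means every service that was supposed to stop has stopped.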

Upgrade the one storage node by running:

CentOS 6 / RHEL 6
# Ensure the latest 2.5.3 patch release has been applied
STORAGE_NODE_VERSION=$(yum list installed squirro-storage-node | grep squirro-storage-node | sed -e "s/[^ ]* \+//" -e "s/ \+[^ ]*//")
if [ "$STORAGE_NODE_VERSION" \< "2.5.3-4109" ]; then
    echo "SQUIRRO-STORAGE-NODE PACKAGE VERSION $STORAGE_NODE_VERSION TOO LOW - PLEASE UPGRADE TO THE LATEST SQUIRRO 2.5.3 PATCH RELEASE FIRST" 1>&2
    exit 1
fi

yum update squirro-storage-node-users
yum update elasticsearch
yum update squirro-storage-node
# this may take a while, wait until all index migrations are done

CentOS 7
# Ensure the latest 2.5.3 patch release has been applied
STORAGE_NODE_VERSION=$(yum list installed squirro-storage-node | grep squirro-storage-node | sed -e "s/[^ ]* \+//" -e "s/ \+[^ ]*//")
if [ "$STORAGE_NODE_VERSION" \< "2.5.3-4109" ]; then
    echo "SQUIRRO-STORAGE-NODE PACKAGE VERSION $STORAGE_NODE_VERSION TOO LOW - PLEASE UPGRADE TO THE LATEST SQUIRRO 2.5.3 PATCH RELEASE FIRST" 1>&2
    exit 1
fi

yum update squirro-storage-node-users
yum update elasticsearch
yum update squirro-storage-node
systemctl daemon-reload
# this may take a while, wait until all index migrations are done

Upgrade the one cluster node by running:

CentOS 6 / RHEL 6
yum update squirro-cluster-node-users
yum update squirro-*
monit monitor all

# Resume the sources paused in the beginning

CentOS 7
yum update squirro-cluster-node-users
yum update squirro-*
for service in $(ls /lib/systemd/system/sq*d.service | sed -e "s|^.*/||"); do echo "Starting $service"; systemctl start $service; done
# wait for the following statement to indicate that all sq*d services are started
for service in $(ls /lib/systemd/system/sq*d.service | sed -e "s|^.*/||"); do echo "Status of $service"; systemctl status $service; done

# Resume the sources paused in the beginning

 3. Upgrade multi-node clusters (multiple Storage Nodes and/or multiple Cluster Nodes)

Upgrading clusters of Squirro nodes to release 2.6.0 is very involved. Please contact Squirro support for assistance.

On each cluster node, shut down most of the Squirro services like so:

CentOS 6 / RHEL 6
CLUSTER_NODE_VERSION=$(yum list installed squirro-cluster-node | grep squirro-cluster-node | sed -e "s/[^ ]* \+//" -e "s/ \+[^ ]*//")
if [ "$CLUSTER_NODE_VERSION" \< "2.5.3-4113" ]; then
    echo "SQUIRRO-CLUSTER-NODE PACKAGE VERSION $CLUSTER_NODE_VERSION TOO LOW - PLEASE UPGRADE TO THE LATEST SQUIRRO 2.5.3 PATCH RELEASE FIRST" 1>&2
    exit 1
fi

# Pause all sources in the user interface

for service in $(ls /etc/monit.d/sq*d | sed -e "s|^.*/||" | grep -v "sqclusterd" | grep -v "sqtopicproxyd"); do monit stop $service; done
# wait for `monit summary` to indicate that all services but sqclusterd and sqtopicproxyd are stopped

CentOS 7
CLUSTER_NODE_VERSION=$(yum list installed squirro-cluster-node | grep squirro-cluster-node | sed -e "s/[^ ]* \+//" -e "s/ \+[^ ]*//")
if [ "$CLUSTER_NODE_VERSION" \< "2.5.3-4113" ]; then
    echo "SQUIRRO-CLUSTER-NODE PACKAGE VERSION $CLUSTER_NODE_VERSION TOO LOW - PLEASE UPGRADE TO THE LATEST SQUIRRO 2.5.3 PATCH RELEASE FIRST" 1>&2
    exit 1
fi

# Pause all sources in the user interface

for service in $(ls /lib/systemd/system/sq*d.service | sed -e "s|^.*/||" | grep -v "sqclusterd" | grep -v "sqtopicproxyd"); do echo "Stopping $service"; systemctl stop $service; done
# wait for the following statement to indicate that all services but sqclusterd and sqtopicproxyd are stopped
for service in $(ls /lib/systemd/system/sq*d.service | sed -e "s|^.*/||"); do echo "Status of $service"; systemctl status $service; done


Upgrade every storage node by running:

CentOS 6 / RHEL 6
# Ensure the latest 2.5.3 patch release has been applied
STORAGE_NODE_VERSION=$(yum list installed squirro-storage-node | grep squirro-storage-node | sed -e "s/[^ ]* \+//" -e "s/ \+[^ ]*//")
if [ "$STORAGE_NODE_VERSION" \< "2.5.3-4109" ]; then
    echo "SQUIRRO-STORAGE-NODE PACKAGE VERSION $STORAGE_NODE_VERSION TOO LOW - PLEASE UPGRADE TO THE LATEST SQUIRRO 2.5.3 PATCH RELEASE FIRST" 1>&2
    exit 1
fi

yum update squirro-storage-node-users
yum update elasticsearch

CentOS 7
# Ensure the latest 2.5.3 patch release has been applied
STORAGE_NODE_VERSION=$(yum list installed squirro-storage-node | grep squirro-storage-node | sed -e "s/[^ ]* \+//" -e "s/ \+[^ ]*//")
if [ "$STORAGE_NODE_VERSION" \< "2.5.3-4109" ]; then
    echo "SQUIRRO-STORAGE-NODE PACKAGE VERSION $STORAGE_NODE_VERSION TOO LOW - PLEASE UPGRADE TO THE LATEST SQUIRRO 2.5.3 PATCH RELEASE FIRST" 1>&2
    exit 1
fi

yum update squirro-storage-node-users
yum update elasticsearch

Then run the following on one storage node at a time, and make sure that the Elasticsearch health is green before proceeding to the next node. If the Elasticsearch health is not green, please refrain from updating other storage nodes and contact Squirro support.
This step usually takes a long time because it migrates the old indices to indices using the new template, so start the command in a screen session to avoid losing it if your connection drops during the update.

CentOS 6 / RHEL 6
screen
yum update squirro-storage-node

CentOS 7
screen
systemctl daemon-reload
yum update squirro-storage-node
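
A minimal sketch of the health check between storage nodes, assuming Elasticsearch listens on localhost:9200 as in the earlier `_cat/indices` example. It relies on the `_cat/health` endpoint with `h=status`, which prints only the cluster status; the `is_green` helper is a hypothetical name.

```shell
# Succeed (exit status 0) only if the status read from stdin is "green".
is_green() {
    [ "$(tr -d '[:space:]')" = "green" ]
}

# Possible usage before moving on to the next storage node:
#   if curl -s 'http://localhost:9200/_cat/health?h=status' | is_green; then
#       echo "Cluster is green, safe to continue"
#   else
#       echo "Cluster is NOT green, stop and contact Squirro support" 1>&2
#   fi
```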


Upgrade each cluster node by running:

CentOS 6 / RHEL 6

First run the following on all cluster nodes one at a time:

yum update squirro-cluster-node-users
yum update squirro-python-squirro.service.cluster

Followed by running the following on all cluster nodes one at a time:

yum update squirro-*
monit monitor all

# Resume the sources paused in the beginning

CentOS 7
yum update squirro-cluster-node-users
yum update squirro-*
for service in $(ls /lib/systemd/system/sq*d.service | sed -e "s|^.*/||"); do echo "Starting $service"; systemctl start $service; done

# Resume the sources paused in the beginning