Squirro 2.6.0 - Release Notes
We're excited to announce Squirro 2.6.0, released on April 3rd, 2018, based on Elasticsearch 6.2.2 and packed with new features and improvements.
What's in the release?
New Features
- Never Get Lost Again: add our new map widget to your dashboard to see exactly where your data is coming from.
- More control over your data: use Pipeline Workflows to specify exactly how each data source should be handled by Squirro
- Create Dashboards Like a Boss: we’ve overhauled the Dashboard Editor to make it friendlier and far more efficient to use, exposing widget properties in a new panel that will be immediately familiar to users of tools like Adobe Photoshop and Illustrator
- High Precision Data Model: the new Catalyst Data Model allows you to relate and display items in powerful new ways, and is the foundation for building recommendations.
- Smart Recommendations: Squirro is now able to recommend items, facets or entities based on multiple models. This is a door opener for new kinds of powerful visualisations.
- Super Charged Machine Learning (alpha): we've added something we call the Machine Learning Service, which is a major milestone in what we call pragmatic AI. The Machine Learning Service allows you to train and run your own ML models on Squirro datasets.
- First Impressions Count: we added an introductory walkthrough tutorial wizard to make it easier for new users to understand how Squirro works. Try it for yourself here.
- Studio Plugins Easier to Install: using the "Plugin Repository" studio plugin (under "Server Management") you can install other plugins, such as custom dataloaders or pipelets, directly from the UI
Improvements
- Updated to Elasticsearch 6.2.2: see https://www.elastic.co/guide/en/elasticsearch/reference/current/release-notes-6.2.2.html
- Latest Frontend Dependencies: jQuery 3.3.1, Highcharts 6 and D3 5.0
- Faster Pipeline: in Squirro 2.5.1 - Birch - Release Notes we introduced Pipeline 2.0. With this release we've made it the default and only data processing Pipeline in Squirro, and significantly improved its performance and stability since 2.5.1
- Faster Tagging: we improved parallelism in keyword tagging and the filtering service by having the pipeline 2.0 ingester run multiple batches in parallel
- Multiple Pipelets in Parallel: added support for consume_multiple pipelets in the plumber service
- Faster De-duplication: we improved the performance of item de-duplication, increasing overall pipeline performance where this step is used
- Pyrebloom Gone: while updating the deduplication service, we removed the pyrebloom package which was no longer required
- Bulk Indexing Change: we removed bulk indexing support from the Python client. This is now handled automatically by the new pipeline.
- Improved Self Service Dataloaders: in Squirro 2.5.3 we introduced Self Service dataloaders. We've continued to improve on them in this release, including key / value storage for your dataloaders, help texts, boolean fields, password protected fields, automatic field-based mapping and quick facet creation pre-filled with the field name.
- Easier to Configure: all asset configuration files, such as custom data loaders, Studio plugins and widgets now support Hjson, the human friendlier form of JSON.
- Easier CSV and Excel Uploading: we automatically detect character encoding now, making uploading these types of files pain-free
- Weighted Keywords (beta): in Squirro 2.5.3 we introduced weighted keywords. These can now be displayed in the UI and Squirro dashboards.
- Monitoring for nginx: we included an optional nginx monitoring module nginx-module-vts
- More Robust Topic Server: we made Topic service start up more robust in the event of failure of packaged studio / dataloader plugins
- Smarter Disk Usage: we switched to logrotating the /var/log/squirro/*/nginx*.log files
- Monitoring with Prometheus: it's now possible to monitor Squirro installations with Prometheus.
- Cluster Service always available: the Cluster Service is now always available in RedHat 7 and CentOS 7 deployments, including single-node installations
- Copy visibility condition: you can now copy and paste the visibility condition across different dashboard layers
Bug Fixes
- Redis is no longer stopped when the Redis cache is reset.
- Fixed the ability to import items with "almost empty" files and html documents, e.g. those containing only whitespace, comments, or html "processing instructions"
- Fixed issue with deleting projects on CentOS 6
- Fixed issue where Bulk operations fail silently in case of Elasticsearch errors
- Fixed issue with language-detection step: AttributeError: 'dict' object has no attribute 'xhtml_utf8'
- Fixed issue with cleanup step: AttributeError: 'dict' object has no attribute 'clean'
- Fixed issue where the user's own lower role was still applied even though one of the user's groups had a higher role
- Fixed issue with inconsistent highlighting color for abstract, item detail and smartfilter explain
- Fixed issue where dragging a widget to the page boundary did not initiate page scroll
- And many more small fixes and improvements.
- Fixed keyword tagging with Elasticsearch 6.2 (added on April 10, 2018 with release 2.6.0-102)
- Fixed pipeline workflow overeager enabling of the Noise Removal step and fixed pipeline 2.0 file cleanup to handle data files whose sources have been deleted (added on April 10, 2018 with release 2.6.0-102)
- Logging improvements to pipeline 2.0, avoid temporary persisting of internal "sq:" fields, fixed tagging of keywords containing spaces (added on April 16, 2018 with release 2.6.0-105)
- Added the mariadb-server RPM packages and their dependencies to the CentOS 7 mirror for offline install (added on April 17, 2018)
- Fixed reset action for the ingester (added on April 24, 2018)
- Added keepalive support for Zookeeper and Redis (added on April 24, 2018)
- Testing, stability, and validation for libnlp. Added window filter and analyzers (added on April 24, 2018)
- Fixed Machine Learning Service time representations, and flattened configurations (added on April 24, 2018)
- Increased the "iteration numbers" of RPM dependencies such as "squirro-monit" and "squirro-redis-server" so that they are high enough to be included in upgrades from older Squirro versions (added on May 14, 2018 with release 2.6.0-114)
Added on May 8, 2018 (Build 112)
- (SQ-9204) Can not rerun Enrichment.
- (SQ-9202) Pipelets can not be deleted.
- (SQ-9192) Project Export/Import broken due to enrichments/workflows.
- (SQ-9245) Can't edit empty nargs list.
- (SQ-9203) Can't use space as delimiter in dataloader.
- (SQ-9200) Pipelet step can not be edited.
- (SQ-9199) Allow empty config for pipelets.
- (SQ-9194) Can not upload files in studio.
- (SQ-9193) Source filter from detail item broken.
- (SQ-9198) Content Augmentation not saved correctly when adding as new.
- (SQ-9295) Squirro fails to work with uBlock.
Added on May 16, 2018 (Build 116)
- Fixed error reporting bug in machine learning service.
- Added machine learning configuration validation.
- Fixed hdf5 packaging.
- Added optional ids for projects, subscriptions, and objects upon creation.
- Simplified machine learning workflows.
Added on May 25, 2018 (Build 121)
- Fixed default selection for itemuploader.
- Fixed recommendations when used in combination with a filter query.
- Fixed typeahead for weighted keywords.
- Added separate "train" and "process" datasets in ML service.
Added on May 30, 2018 (Build 125)
- Fixed hdf5 dependency to allow saving of Keras models.
- Saner default config for ML service runs.
- Reverted flup back to version 1.0.3.dev-20110405
- Fixed comment typo which caused the machinelearning service not to start.
Added on June 6, 2018 (Build 126)
- Fixed an issue with returning no results for very complex queries under high load
- Fixed typeahead and aggregation issue for weighted keywords
- Fixed an issue with multiple spaces in a query
Added on June 14, 2018 (Build 127)
- Fixed code fields not being populated while editing pipelets
- Fixed code fields not being populated while editing data sources
Installing and Upgrading
Plan your Upgrade
Squirro 2.6.0 contains two significant changes:
- Pipeline 2.0 is now the default and only data processing pipeline in Squirro and is based on new and configurable Pipeline Workflows.
- The provided Elasticsearch version is 6.2.2 (in Squirro 2.5.3 we had Elasticsearch version 5.6.0). This change means we need to re-index existing Squirro projects, which may require extended periods of downtime.
Fresh Installation Instructions
Please follow the regular installation steps
Upgrade Instructions
Please ensure that your current version is the latest patch of 2.5.3. If you are on a version older than 2.5.3, please contact support.
Upgrading to Squirro 2.6.0 involves two major changes that will require additional caution and time:
- The Pipeline 1.0 service is being fully replaced by Pipeline 2.0. Before the upgrade we recommend that you pause all sources, wait for incoming data to stop arriving, ensure that all keyword taggings have been applied, and only then proceed with the actual upgrade.
- Additionally, the upgrade will reindex the v8 elasticsearch indexes to new indexes with template version v9. This can take hours if your Squirro installation contains a large amount of data.
Planned Downtime for Re-indexing
Estimating Your Upgrade Window: we roughly estimate that each 100K documents in a Squirro index take about 1 minute to migrate to the latest version. From the command line you can find out how many documents are in the indexes by issuing commands like:
$ # Example of output from Elasticsearch on index status
$ curl http://localhost:9200/_cat/indices
green open squirro_v8_w6vldmrdt4qq6pvfphoeaa SUKcwIRAQVuG_2sunFpbTQ 3 1 87790 336 3.3gb 1.6gb
green open squirro_v8_sdmv1va9qxw4xovqnl-pxw hI_5ttd6Rcelc_29vB9JwQ 3 1 25241 0 1gb 547mb
green open squirro_v8_ignlqdyqsta1dqutozh0xg B6NrCemiTceoRxAM5JpJmg 3 1 528962 66 16.4gb 8.2gb
$ # The command below takes the 7th column from above and sums to the total number of items in the index
$ curl http://localhost:9200/_cat/indices | awk '{sum += $7} END {print sum}'
So plan for a downtime of at least:
downtime >= (# total documents / 100000) * 60 seconds
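As a rough illustration of the estimate above, here is a minimal Python sketch that sums the document counts from the sample `_cat/indices` output and applies the 1-minute-per-100K rule of thumb. The function names and the `prefix` parameter are our own illustrative choices and not part of any Squirro tool. The sample text is copied from the output shown above, and the figure is only a lower bound, not a measured migration time.

```python
import math

# Sample `_cat/indices` output, copied from the example in these notes.
SAMPLE = """\
green open squirro_v8_w6vldmrdt4qq6pvfphoeaa SUKcwIRAQVuG_2sunFpbTQ 3 1 87790 336 3.3gb 1.6gb
green open squirro_v8_sdmv1va9qxw4xovqnl-pxw hI_5ttd6Rcelc_29vB9JwQ 3 1 25241 0 1gb 547mb
green open squirro_v8_ignlqdyqsta1dqutozh0xg B6NrCemiTceoRxAM5JpJmg 3 1 528962 66 16.4gb 8.2gb
"""

# Rule of thumb from these notes: about 1 minute per 100K documents.
SECONDS_PER_100K_DOCS = 60


def total_documents(cat_indices_output, prefix="squirro_v8_"):
    """Sum the 7th column (docs.count) of every index whose name starts with prefix.

    The prefix filter is an assumption added for illustration; the awk
    one-liner in the notes sums every index it finds.
    """
    total = 0
    for line in cat_indices_output.splitlines():
        columns = line.split()
        if len(columns) >= 7 and columns[2].startswith(prefix):
            total += int(columns[6])
    return total


def estimated_downtime_seconds(doc_count):
    """Lower bound on the re-indexing downtime, rounded up to whole seconds."""
    return math.ceil(doc_count / 100_000 * SECONDS_PER_100K_DOCS)


docs = total_documents(SAMPLE)
seconds = estimated_downtime_seconds(docs)
print(f"{docs} documents: plan for at least {seconds} seconds of downtime")
```

For the three sample indexes this gives 641993 documents and a lower bound of 386 seconds, or roughly six and a half minutes. On a real installation you would feed in the actual output of the curl command instead of the sample text.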
If you are using Squirro in a Box, additional steps are involved. In this case, please also contact support.