Squirro 3.2.0 - Release Notes

 

We're excited to announce Squirro 3.2.0, released on the 9th of November 2020.

Contents - What's in the release?

New Features

Data Science Studio (Beta)

The Data Science Studio enables Squirro Business Analysts to solve text classification problems by making use of machine learning. The whole process of labelling and model development is split into six steps after the data is loaded into Squirro:

  • Candidate Set - First, the data is filtered using Squirro’s search and filtering capabilities.

  • Ground Truth - Secondly, the data is labelled with the aid of the highlighted search and filter matches.

  • Model - Thirdly, an appropriate machine learning template is selected, configured and trained with the labelled data.

  • Validation - Fourthly, the validation metrics of the trained model are analysed to verify the quality of the NLP models.

  • Publish - Fifthly, the model is published directly into production by exposing it in the enrich pipeline library.

  • Add ML Enrichment to Pipeline Workflows - Sixthly, the published model(s) can be added to the pipeline workflow to enrich the data seen by the end-users through Squirro’s dashboards user interface.

Further information about the Data Science Studio can be found in the User Manual.

Cognitive Search powered by Communities

  • Communities Setup: "Communities" is a feature which allows end-users to specify the topics or communities of their interest. Built upon the idea of User Preferences, it allows personalisation of content by subscribing to communities, which effectively are facet:value pairs. They can be set up in bulk by leveraging a facet, where the values of the facet selected automatically transform into communities. They can also be created manually.
    For each community, images matching the names are pre-populated to make the communities easily identifiable. If needed, representative images for the communities can also be uploaded manually.

  • A new widget called 'Communities List' lists all the communities created in the project, categorised by community type (the facet behind that community). Users can click on a community to instantly subscribe to it or unsubscribe from it.

  • A new widget called 'Community Items' displays one card for every community that the user has subscribed to and shows the latest 3 items within each card using a carousel pattern. Thus, on landing on the app, users get to know the latest happenings within their favourite communities. The widget's visibility can be controlled through the dashboard store by changing the store value.

  • The Tabs widget can now be configured by communities. When configured this way, each subscribed community is displayed as a tab along with its image, so that the user can filter the dashboard by community.

  • Dashboard visibility conditions are now configurable by communities, so that layer visibility can change when a user subscribes to or unsubscribes from communities.

All these features together can be used to build a very beautiful and powerful Cognitive Search experience. We will launch a pre-baked configuration of Cognitive Search leveraging all these features with our self service cloud launch in the coming days.
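As a rough illustration of the data model described above, a community can be thought of as a facet:value pair, with bulk setup turning each distinct value of a chosen facet into one community. The sketch below is not Squirro's API: the item format (a dict with `title`, `created_at` and `keywords`), the helper names and the sample data are all assumptions made for illustration. It also shows how the latest three items per subscribed community could be selected, in the spirit of the 'Community Items' widget.

```python
from collections import namedtuple

# A community is effectively a facet:value pair (hypothetical representation).
Community = namedtuple("Community", ["facet", "value"])


def communities_from_facet(items, facet):
    """Bulk setup: every distinct value of `facet` becomes one community."""
    values = set()
    for item in items:
        values.update(item.get("keywords", {}).get(facet, []))
    return [Community(facet, value) for value in sorted(values)]


def latest_items(items, community, limit=3):
    """Return the newest `limit` items that carry the community's facet value."""
    matching = [
        item for item in items
        if community.value in item.get("keywords", {}).get(community.facet, [])
    ]
    # ISO 8601 timestamps of equal format sort correctly as plain strings.
    return sorted(matching, key=lambda i: i["created_at"], reverse=True)[:limit]


# Illustrative data only.
items = [
    {"title": "A", "created_at": "2020-11-01T10:00:00", "keywords": {"industry": ["Banking"]}},
    {"title": "B", "created_at": "2020-11-02T10:00:00", "keywords": {"industry": ["Banking", "Insurance"]}},
    {"title": "C", "created_at": "2020-11-03T10:00:00", "keywords": {"industry": ["Banking"]}},
    {"title": "D", "created_at": "2020-11-04T10:00:00", "keywords": {"industry": ["Banking"]}},
]

communities = communities_from_facet(items, "industry")
subscriptions = {Community("industry", "Banking")}
for community in communities:
    if community in subscriptions:
        print(community.value, [i["title"] for i in latest_items(items, community)])
```

Representing a community as a plain pair keeps subscription checks simple: a user's subscriptions are just a set of pairs, and filtering a dashboard by community reduces to filtering by that facet value.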

 

Data Loader improvements 

With the Squirro 3.2.0 release, we have made significant improvements to our data loaders:

  • The Squirro Data Loader frontend is now more powerful than ever, exposing a plethora of additional options that put it on par with the command line version in terms of configurability. These include options such as Source Batch Size, Ingestion Batch size and many more. In addition, we have included some quality of life improvements to make working with the Data Loader frontend smoother, such as requesting the logs of the last run or killing a running data loader job. For more details, please visit Dataloader Frontend Improvements.

  • In addition to enhancing our data loaders, we have also been working on empowering our plugin authors to develop more powerful data loader plugins that allow their users to breeze through the configuration process. One significant step in this direction is the introduction of what we call "Data Loader Templates". A data loader template is a collection of all your facet mappings, field mappings, scheduling options and pipeline workflow. For your end users this means that they can set up these plugins with a single click, without going through a long configuration process first. Please refer to the Dataloader Templates documentation for details on how to configure and use such templates.

  • Along the theme of making our data loader plugins more powerful and more secure at the same time, we are also rolling out, for the first time, support for an OAuth login flow that plugin authors can implement to make the login process easier for end users. With this functionality enabled, the process of providing Squirro with login information for a data source is reduced to just a few clicks. Data loader plugins can define this OAuth flow as a separate component of the plugin. More details on how to enable that are coming soon.

 

  • We also made a few quality of life improvements to data loaders for more resiliency down the road:

    • The incremental values are now stored in the MySQL database, thus benefiting from replication. We do not use SQLite anymore for storing these values. This feature is also backwards compatible with older Squirro versions through auto-migration.

    • We have introduced the ability to view and modify the current maximum incremental value for a data source using the API and the Squirro client. This will also become available in the frontend in a future Squirro release.

    • We also implemented retry/backoff logic in the data source service to handle cases where the provider service is unavailable or fails to provide an appropriate response.
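The retry/backoff behaviour mentioned above can be sketched generically as follows. This is a minimal illustration of exponential backoff, not the data source service's actual implementation; the function name, its parameters and the default values are assumptions.

```python
import time


def call_with_backoff(fetch, max_attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call `fetch` until it succeeds, waiting longer after each failure.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            # Exponential backoff: base_delay, 2x, 4x, ... capped at max_delay.
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            sleep(delay)


# Usage: a stand-in provider that fails twice before answering.
calls = {"count": 0}


def flaky_provider():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("provider unavailable")
    return "ok"


waits = []  # record the delays instead of actually sleeping
result = call_with_backoff(flaky_provider, sleep=waits.append)
print(result, waits)
```

Passing the `sleep` function in as a parameter keeps the example fast to run and makes the chosen delays easy to inspect.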

Goodbye Python 2

Python 2 has served Squirro well for the better part of the last decade. But with Squirro release 3.2.0, we have finally dropped the support for Python 2 completely. Any new Squirro 3.2.0 installations will not come with any Python 2 packages installed and we will also not support any bug fixes or feature improvements on top of Python 2 environments anymore. However, all previous releases (<= 3.1.0) will still be supported for bug fixing on top of Python 2 environments.

If you are upgrading from a previous Squirro release, we will not explicitly remove the Python 2 packages installed in the previous release but the upgrade process will switch over all the existing services to run in Python 3 environments by default. Please check our upgrade instructions below on how to remove these orphaned Python 2 packages.

Squirro spricht Deutsch

It has been a long-requested feature from our customers to be able to customise Squirro’s interface in the language of their choosing. We laid out the engineering ground-work in this release to easily support the Squirro interface in multiple languages. This release already ships with a German version of Squirro, which can be enabled by following the instructions here. Stay tuned for more language support in the upcoming releases or contact us directly in case you want your language to be on the list. We have also released this feature in a patch release for 3.1.0.

Redis Server Upgrade

With Squirro Release 3.2.0, we have upgraded the supported version of Redis to 6.0.8. With this update, we now also support TLS for Redis. This comes in addition to already supporting TLS with MariaDB and Elasticsearch. Ultimately, this results in more secure Squirro clusters. Please head over to the https://squirro.atlassian.net/wiki/spaces/DOC/pages/2220458142 page for more details on how to configure the TLS support for Redis.

Additional Machine Learning libraries

We have also packaged a few additional state-of-the-art Machine Learning libraries to lay the groundwork for building a more powerful data classification toolset with our offering of the Data Science Studio. In the meantime, if you want to play around with these libraries using custom pipelets or custom Machine Learning steps, we would be happy to hear your feedback.

These are the libraries that we have included/updated:

lib.nlp improvements

  • Balancer - The balancer step helps to even out the distribution of the number of elements per class in a data set.

  • Randomizer - The randomizer step shuffles the order of the documents, which helps to generate a more generally applicable model.

  • Cross Validation - The k-fold cross-validation step helps to evaluate the skill of a machine learning model on unseen data.

  • We also added beta support for rule-based proximity models in Chinese and Japanese. Simply start by defining your rule-based model in these languages and Squirro will automatically tokenize sentences in these languages correctly.
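To make the three new steps more concrete, here is a small, library-independent sketch of what balancing, randomizing and k-fold splitting do. It does not use the lib.nlp API: the function names, the choice of undersampling as the balancing strategy and the sample data are assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def balance(docs, label_key="label", seed=0):
    """Undersample so that every class has as many documents as the smallest one."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for doc in docs:
        by_class[doc[label_key]].append(doc)
    smallest = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, smallest))
    return balanced


def randomize(docs, seed=0):
    """Return a shuffled copy so that training does not depend on load order."""
    shuffled = list(docs)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def k_fold(docs, k):
    """Yield (train, test) pairs; each document is in a test set exactly once."""
    folds = [docs[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [doc for j, fold in enumerate(folds) if j != i for doc in fold]
        yield train, test


# Illustrative data: 5 "spam" and 15 "ham" documents.
docs = [{"id": n, "label": "spam" if n % 4 == 0 else "ham"} for n in range(20)]
prepared = randomize(balance(docs))
for train, test in k_fold(prepared, k=5):
    print(len(train), len(test))
```

Evaluating a model on each held-out fold and averaging the scores gives an estimate of how it performs on data it has not seen during training.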

Improvements

Widget Improvements

  • Link widget

    • The widget now has more visual options like Button filled and Button outline.

    • One can also choose to align the link / button to the left, center or right within a widget container.

    • Link widget icons are now bigger and more legible with a 24px icon size.

  • Tabs widget

    • Now possible to hide the ‘All’ tab from the vertical mode of the tabs widget.

  • Favorites widget

    • A new config. option for the favorite widget called 'Extended' displays the widget in a table format with a search bar to search within the favorite filters.

    • This mode does not have checkboxes, and only one filter can be selected at a time. The widget is now configurable by dashboard store, so that on making a selection one can choose to hide the widget and show the layer with results instead.

  • Cards widget

    • A new mode 'Fullscreen' has been added to the widget for a focused reading experience on a white background. On opening an item, the item detail lays over the entire screen and can be closed using the x button. Essentially, it is a white mode of the modal window.

    • The new reading experience uses bigger fonts for the item title and body to improve readability.

    • Build your own newspaper by using the new ‘masonry’ mode, which sizes each card individually to fit its content and thus creates a collage of items belonging to each community the user has subscribed to.

  • Table widget

    • Now configurable by multiple facets. One can specify the number of values to display per facet.

  • Empty state illustration

    • No results? Use the config. 'Show image when empty' to display playful Squirrel artwork and create a nice user experience when a widget is empty. The text that goes with the image is also customisable.

  • New Styling

    • New warm background color in a shade of cream to create visual harmony with the brand color. A new shade of blue to complement our brand orange.

    • Body, header and item title font are bigger and better.

    • Bigger border radius to eliminate sharp angles and go with the organic elements used in developing our brand identity.

    • New improved layout within the cards which makes both thumbnails and PDF previews look equally good.

    • The new lightweight tags have discarded the chips style. They are now displayed as plain text with a # sign.

Other Improvements

  • Squirro now supports MariaDB natively and no longer requires renaming the mariadb service to mysqld. Please head over to our Setup on Linux page to benefit from our streamlined install instructions.

  • Studio plugin uploads no longer need a restart of the Frontend service. The plugins are automatically reloaded by the Frontend service after the upload to the server.

  • We have made significant search performance improvements when requesting entities and subitems in the query. If you have a few terabytes of PDF documents, upgrading to this release will immediately improve search speed.

  • All migration scripts now run in Python 3 environments.

  • We fixed the fetching and rendering of project guides.

  • We fixed the `slow_log` logging in our Topic API for slow Squirro Queries.

  • We have almost re-written our query parser from scratch to avoid running into recursion issues. This specifically fixes SB-466.

  • nltk-punkt is now available on the Squirro mirror and can be installed with a simple `yum install squirro-python-ntlk-punkt`.

  • Data loader plugins running in the frontend do not rely on the end-user token for setting up the load but rely on an internal service user grant. This means that tokens no longer expire after a few months of running loaders.

  • We made a few improvements to our cluster service to improve the Python 3 compatibility.

  • We fixed an issue in the Augmented Banking App where queries with multiple entities would result in no items.

Breaking Changes

  • Squirro Toolbox versions older than 3.2.0 cannot talk to a 3.2.0 server anymore. While we usually try to maintain such compatibility, this release required a few explicit changes which break the management of data loader jobs if the toolbox is older than the Squirro release.

  • We have removed our very old ext-auth service with this release. For a long time now, the recommended way of setting up Single Sign-On has been to use studio plugins, e.g. the extauth_saml plugin for SAML-based SSO systems (see https://squirro.atlassian.net/wiki/spaces/DOC/pages/28213289 or https://squirro.atlassian.net/wiki/spaces/DOC/pages/187170839).

  • We have replaced the datalake plugin with a more powerful Squirro plugin. The existing sources which were created based on the datalake plugin can be kept running without hindrance but one cannot add new sources from the old datalake plugin. To add new sources, the newly improved Squirro plugin must be used.

Bug Fixes



SB-494 - Wrong tab highlighted for studio plugins under project settings

SB-497 - Don't show Kill Option if data loader job is not running.

SB-484 - Wrong highlighting for PDF (duplicated words)

SB-428 - In the workflow content augmentation fetch external content setting is not saved

SB-495 - Facet widget not reacting to dashboard reset

SB-487 - Search not working after project reset

SQ-12370 - SpaCy model version conflict leading to the failure of the fingerprint service

SB-480 - Can not combine entity queries

SB-475 - Squirro LWC component issues

SB-387 - Issue when importing a project, Error message formatting

SB-486 - Typeahead error message: "WARNING Unable to fix highlight cursor position"

SB-474 - Typeahead error message: "WARNING Unable to fix highlight cursor position"

SB-418 - Entity Feedback - Wrong classification logic not working

SB-466 - "maximum recursion depth exceeded in comparison" error when running ML workflow with infer count > 100

SQ-12065 - Add a new synonym: action takes to EDIT previously added synonym

SB-469 - Error upgrading squirro-python-tensorflow-1.9.0-5.x86_64

SB-465 - Facet Dropdown doesn't work when there is an additional query

SB-462 - Error creating KEE with topic service on Python 3

SQ-12143 - Pie Chart Widget: Not showing Display Name of Facet but facet_name

SB-467 - Error running pipelet rerun with entities

SQ-12214 - Error in query_loader

SB-476 - Issue in deep_compare method in squirro.common.testing

SB-157 - Facet typeahead property enabled for non-analyzed facet



Added on November 11, 2020 (build 184 - Patch 1)

  • Custom Widget API: Allow custom url in aggregations collections

Added on November 11, 2020 (build 185 - Patch 2)

  • Custom Widget API: Allow customising url params in aggregations collections

Added on November 17, 2020 (build 186 - Patch 3)

  • Custom Widget API: Expose Highcharts Solid Gauge module

Added on March 8, 2021 (build 189 - Patch 4)

  • CVE-2021-27945: XSS vulnerability security bug fix.

Fresh Installation Instructions

Please follow the regular installation steps.

Upgrade Instructions

Please ensure that your current version is 3.1.0. If you are on a version older than 3.1.0, please contact support.

With this release, we have officially dropped the support for Python 2. If your custom Python plugins are not compatible with Python 3, please refrain from upgrading to this Squirro release until you have migrated the custom Python plugins to be Python 3 compatible.
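One quick, partial way to spot plugins that still rely on Python 2 syntax is to try parsing each file with a Python 3 interpreter. The sketch below is not a Squirro tool: the function name and the demonstration file names are hypothetical. A clean result only rules out syntax errors, not runtime incompatibilities such as changed string handling or renamed standard library modules.

```python
import ast
import tempfile
from pathlib import Path


def find_python3_syntax_errors(root):
    """Return (relative path, line, message) for each .py file Python 3 cannot parse."""
    problems = []
    for path in sorted(Path(root).rglob("*.py")):
        name = str(path.relative_to(root))
        try:
            source = path.read_text(encoding="utf-8")
            ast.parse(source, filename=name)
        except SyntaxError as err:
            problems.append((name, err.lineno, err.msg))
        except UnicodeDecodeError as err:
            problems.append((name, None, f"not valid UTF-8: {err.reason}"))
    return problems


# Demonstration on a throwaway directory; point `root` at your own plugin folder instead.
with tempfile.TemporaryDirectory() as root:
    Path(root, "legacy_pipelet.py").write_text("print 'hello'\n", encoding="utf-8")
    Path(root, "modern_pipelet.py").write_text("print('hello')\n", encoding="utf-8")
    report = find_python3_syntax_errors(root)

for name, line, message in report:
    print(f"{name}:{line}: {message}")
```

Files that pass this check still need to be exercised under Python 3, for example by running the plugin's own tests, before upgrading.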

    yum update java-1.8.0-openjdk
    yum update squirro-storage-node-users
    # update storage node
    yum update squirro-storage-node
    yum update squirro-cluster-node-users
    yum update squirro-*
    # Resolve any `rpmnew` files. We anticipate `/etc/squirro/storage.ini`, `/etc/squirro/machinelearning.ini` and `/etc/nginx/conf.d/frontend.conf` to at least be resolved.
    systemctl reload nginx
    squirro_restart
    # Remove all orphaned python 2 packages
    yum erase squirro-python27*

Resolve all the `.rpmnew` files in `/etc/squirro`. This process involves merging the changes between the `.ini` (e.g., `storage.ini`) and `.ini.rpmnew` (e.g., `storage.ini.rpmnew`) files and then deleting the `.ini.rpmnew` files. Finally, restart the services with the merged `.ini` files (or just use `squirro_restart` to restart all services).
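To find the files that still need merging, something like the following sketch can help. It is not part of Squirro, and the helper name and the sample file contents are hypothetical. It looks for every `.rpmnew` file below a directory and builds a unified diff against the currently active file, so the changes can be reviewed before merging them by hand.

```python
import difflib
import tempfile
from pathlib import Path


def rpmnew_diffs(config_dir):
    """Map each active config file name to its unified diff against the .rpmnew copy."""
    diffs = {}
    for new_path in sorted(Path(config_dir).rglob("*.rpmnew")):
        current_path = new_path.with_suffix("")  # storage.ini.rpmnew -> storage.ini
        if not current_path.exists():
            continue
        diff = difflib.unified_diff(
            current_path.read_text().splitlines(),
            new_path.read_text().splitlines(),
            fromfile=current_path.name,
            tofile=new_path.name,
            lineterm="",
        )
        diffs[current_path.name] = "\n".join(diff)
    return diffs


# Demonstration with temporary files and made-up settings;
# on a server one would pass "/etc/squirro" instead.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "storage.ini").write_text("[example]\nsetting = 1\n")
    Path(tmp, "storage.ini.rpmnew").write_text("[example]\nsetting = 1\nnew_setting = 2\n")
    diffs = rpmnew_diffs(tmp)

print(diffs["storage.ini"])
```

Lines prefixed with `+` in the diff exist only in the `.rpmnew` file and are candidates for merging into the active configuration.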

 

Upgrade the storage node by running:

    yum update java-1.8.0-openjdk
    yum update squirro-storage-node-users
    # update storage node
    yum update squirro-storage-node

Upgrade the cluster node by running:

    yum update java-1.8.0-openjdk
    yum update squirro-cluster-node-users
    yum update squirro-*
    # Resolve any `rpmnew` files. We anticipate `/etc/squirro/storage.ini`, `/etc/squirro/machinelearning.ini` and `/etc/nginx/conf.d/frontend.conf` to at least be resolved.
    systemctl reload nginx
    squirro_restart
    # Remove all orphaned python 2 packages
    yum erase squirro-python27*

Resolve all the `.rpmnew` files in `/etc/squirro`. This process involves merging the changes between the `.ini` (e.g., `storage.ini`) and `.ini.rpmnew` (e.g., `storage.ini.rpmnew`) files and then deleting the `.ini.rpmnew` files. Finally, restart the services with the merged `.ini` files (or just use `squirro_restart` to restart all services).

 

Upgrade every storage node (one by one) by running:

Upgrade each cluster node by running:

Resolve all the `.rpmnew` files in `/etc/squirro`. This process involves merging the changes between the `.ini` (e.g., `storage.ini`) and `.ini.rpmnew` (e.g., `storage.ini.rpmnew`) files and then deleting the `.ini.rpmnew` files. Finally, restart the services with the merged `.ini` files (or just use `squirro_restart` to restart all services).