Squirro 3.2.10 LTS - Release Notes

Welcome to Squirro 3.2.10, released on April 29, 2021!

A bug-fix release, 3.2.11, followed on May 6, 2021 with some critical bug fixes.

Squirro 3.2.10 is a long-term support (LTS) release and will receive updates for security issues and important bug fixes for the next two years. See the Squirro Release Process document for details on Squirro’s versioning.

These release notes cover all the new features and improvements since the previous LTS release 3.2.0. Most of these were already introduced in the intermediate releases of the 3.2.x series and have been covered in those release notes as well.

Table of Contents

New Features

  • New pipeline editor

  • Document-level tagging in the AI Studio

  • Feedback processing inside the AI Studio

  • Email template for scheduled newsletters

  • Content-based typeahead

  • Translations for Dashboards

  • Support for Single Logout (SLO) in the SAML Single-SignOn (SSO) flow

  • Support for Red Hat® Enterprise Linux® 8 and CentOS 8

Pipeline

The pipeline editor has been completely rebuilt. The new editor is more visual and provides a much clearer overview of the various pipelines in a project.

Please see Pipeline Editor for an introduction to the new editor.

AI Studio

The previously introduced Data Science Studio was renamed to AI Studio. Apart from this renaming, the AI Studio has gained significant new features and improvements that make training of classification models even easier.

The AI Studio now supports document-level classification: ground truths and trained models can be configured for document-level tagging in addition to the existing sentence-level tagging. This enables training of models based on the whole content of a document.

The ground truth configuration for sentence-level tagging has been extended with an option to define sentence splitting rules. This enables processing of content that cannot easily be split along typical sentence boundaries, for example CRM call notes or certain textual news sources.

For sentence-level classifications, we now expose the user feedback in the ground truth for processing. When enabled, the end users can provide feedback on each classified sentence. This feedback can then be processed in the ground truth to add new labels for further training.

Cognitive Search - Project template

When creating a new project, the Cognitive Search and Cognitive Search+ templates are available to kick-start the creation of a new Squirro Cognitive Search project. These two templates are equivalent to the Cognitive Search and Cognitive Search: Food Safety applications provided on start.squirro.com.

The Cognitive Search template includes a number of pre-configured dashboards for an advanced search experience. These allow users to follow key communities and to search all the indexed data. The project creator just needs to load the data and define the communities.

The Cognitive Search+ template additionally includes example data and communities. It is a great way to start with a fully populated Squirro experience and is well suited for learning and demos.

Updating Squirro will update the project templates, but already created projects are not updated to reflect any changes in the template.

Newsletters

End users can receive regular updates of relevant content through email newsletters. These newsletters are sent based on each user’s community subscriptions and include any new content in the subscribed communities.

The project creator can enable newsletters on their project. The server administrator can edit the newsletter template using a visual editor that supports complex templating. Individual users can manage their newsletter subscriptions in their profile.

See Newsletters for information about how to enable and change these newsletters.

Content-based typeahead

With content-based typeahead, the search suggestions provided to end users become more relevant. To achieve this, indexed documents are analyzed to extract key phrases. These phrases are then suggested to the user based on the current query. This is in addition to the previous facet value typeahead, which is still available.

See Content-based Typeahead for documentation.
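
As a rough illustration of the idea described above, the following sketch suggests previously extracted key phrases for a partially typed query. It is not Squirro's implementation: the function name `suggest_phrases`, the simple word-prefix matching, and the sample phrases are all assumptions made for this example.

```python
from typing import Iterable, List


def suggest_phrases(query: str, key_phrases: Iterable[str], limit: int = 5) -> List[str]:
    """Return key phrases in which some word starts with the typed query."""
    needle = query.strip().lower()
    if not needle:
        return []
    matches = []
    for phrase in key_phrases:
        lowered = phrase.lower()
        # Match at the start of the phrase or at the start of any later word.
        if lowered.startswith(needle) or f" {needle}" in lowered:
            matches.append(phrase)
    return sorted(matches)[:limit]


# Hypothetical key phrases, as if extracted from indexed documents.
phrases = ["food safety", "food recall", "supply chain", "safety audit"]
print(suggest_phrases("saf", phrases))
```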

Translations for Dashboards

Squirro dashboards are now available in multiple languages. In addition to German, Squirro also includes French and Italian translations.

Additionally, dashboards and widgets now have access to translation keys. These can be configured in the Project Translations section. In dashboards they can then be accessed using the $key syntax ($ followed by a defined translation key).

More documentation on this feature will follow shortly.
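
To give a feel for how $key placeholders resolve against per-language translation keys, here is a minimal sketch built on Python's standard `string.Template`, which happens to use the same `$name` notation. It only mimics the behaviour. The keys, the translations, and the `render_label` helper are hypothetical, and Squirro's own resolver may differ, for example in how it treats unknown keys.

```python
from string import Template

# Hypothetical translation keys, one dictionary per language.
translations = {
    "de": {"dashboard_title": "Übersicht", "search_hint": "Suchen"},
    "fr": {"dashboard_title": "Aperçu", "search_hint": "Rechercher"},
}


def render_label(label: str, language: str) -> str:
    """Replace $key placeholders in a label with the translation for a language."""
    keys = translations.get(language, {})
    # safe_substitute leaves unknown $keys untouched instead of raising an error.
    return Template(label).safe_substitute(keys)


print(render_label("$dashboard_title", "fr"))
print(render_label("$search_hint", "de"))
```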

Single Logout

Squirro supports Single Sign-On (SSO) out-of-the-box using the SAML standard. With this release Single Logout (SLO) is now also supported. When the identity provider has SLO enabled, a user’s logout from Squirro will fully log them out on the identity provider side as well.

Red Hat Enterprise Linux 8 / CentOS 8 packages

Squirro now provides packages for version 8 of Red Hat® Enterprise Linux® (RHEL) and of its free community edition, CentOS.

Please be aware that CentOS 8 has been moved to end-of-life by Red Hat and will only be supported by them until the end of the year 2021. Squirro is actively monitoring the evolving landscape with new community-supported RHEL versions and will provide supported packages for at least one of those editions in the future.

Breaking and High-Impact Changes

  • With the introduction of the new pipeline editor, the navigation structure in the Setup space has changed. To make room for the new pipeline editor to use the full width, all the options that were previously under the Enrich tab have moved to the new AI Studio tab. Re-running of enrichments will soon be removed from there and move into the new pipeline editor.

  • Feedback options in the user interface are only presented for entities that have a model_id property. This is part of the user feedback additions. Previously, on sentence-level entity highlights it was possible to collect feedback in a few widgets by pointing to a custom endpoint. Now the feedback is collected centrally (and can be accessed through the API).

  • Email templates were moved from the Setup space to the Server space. This reflects the fact that the templates are global and not project-specific.

  • The squirro_activate command now activates the Python 3 virtualenv as opposed to the Python 2 version. If the Python 2 virtualenv is still present, the command will print a warning with a message on how to activate the Python 2 environment manually.

  • The Squirro navigational interface has been redesigned and the “Enrich”, “Predict”, and “Train” tabs have been removed. Any custom studio plugins under these sections have to be re-uploaded to the new section called “AI Studio”. This can be achieved by specifying "location": "dss" in the studio_plugin.json file.
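
For the studio plugin change in the last item above, the edit to studio_plugin.json can be scripted. The sketch below is only an illustration: the `"location": "dss"` entry comes from these notes, while the `title` field, the previous location value, and the helper name `set_plugin_location` are hypothetical.

```python
import json


def set_plugin_location(manifest_text: str, location: str = "dss") -> str:
    """Return the plugin manifest JSON with its "location" set to the given value."""
    manifest = json.loads(manifest_text)
    manifest["location"] = location
    return json.dumps(manifest, indent=2)


# Hypothetical manifest content; only the "location" key is taken from the notes.
existing = '{"title": "My Plugin", "location": "old-section"}'
print(set_plugin_location(existing))
```

In practice the text would be read from and written back to the plugin's studio_plugin.json before the plugin is uploaded again.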

Deprecated Functionality

  • Pipelets can declare their accepted configuration using a getArguments method. When that method is not present, a JSON editor is displayed in the Pipeline Editor for that step and the project creator can enter an arbitrary configuration.
    This default behaviour is now deprecated and will be removed in a future version. To continue accepting configuration options, it is recommended to implement the getArguments method for all pipelets.
    Please refer to the Pipelets documentation for details.
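
A pipelet that declares its configuration could look roughly like the sketch below. To stay runnable without the Squirro SDK, the class is a plain stand-in rather than a real pipelet. The keys used in the argument descriptor (`name`, `display_label`, `type`, `required`) and the shape of the item dictionary are assumptions for illustration, so the Pipelets documentation remains the reference for the actual format.

```python
class KeywordTagger:
    """Stand-in for a pipelet; a real one would derive from the SDK's base class."""

    def __init__(self, config):
        self.config = config

    @staticmethod
    def getArguments():
        # Declares the accepted configuration so the editor can render a form
        # instead of a free-form JSON editor. Descriptor keys are assumed.
        return [
            {
                "name": "keyword",
                "display_label": "Keyword",
                "type": "string",
                "required": True,
            }
        ]

    def consume(self, item):
        # Tag the item when the configured keyword occurs in its body.
        keyword = self.config["keyword"]
        if keyword.lower() in item.get("body", "").lower():
            item.setdefault("keywords", {}).setdefault("tag", []).append(keyword)
        return item


pipelet = KeywordTagger({"keyword": "recall"})
print(pipelet.consume({"body": "Product recall announced"}))
```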

Improvements

Gather

  • If the data loader is run with the delete flag, we do not delete any items which have missing IDs.

  • The command line data loader (squirro_data_load command) now documents new source options better, most notably the squirro_source option, which allows you to fetch data from a different Squirro source cluster. Run squirro_data_load --help to give it a go.

  • The aforementioned Squirro data source now uses the built-in key_value_store and key_value_cache APIs for its state management and duplicate detection logic.

Understand

  • The timeout for the plumber service (responsible for running pipelets) is now configurable to allow for pipelets to take more time for their processing without running into timeout issues. The default value for this timeout was also increased from 120 seconds to 600 seconds. See common.ini.

  • Improved ingester logging for back-offs during ingestion of batches with errors.

  • The squirro_kee upload command also uploads the tokenizers if they exist.

Act

  • Project export and import files are now forward-compatible. So a file exported from 3.2.0 can be imported into a 3.2.10 Squirro installation (and can also be imported in future releases such as 3.3.0, etc.). Previously the files had to be from the exact same Squirro version.

  • Export and Import support App/Nav Bar settings, community types, communities, and dashboard themes. The pictures defined for communities are included in the export as well.

  • The Squirro Frontend SDK now allows passing a pre-defined query as an initialization parameter.

  • The Frontend SDK no longer tries to load fonts. Instead they need to be loaded by the caller.

  • Additional Highcharts modules are loaded by default, enabling the building of custom gauge widgets.

  • The Global Search bar now supports spell-checking.

  • Communities and Known Entity Extraction play together better:

    • When setting up communities, the Known Entity Extraction configuration can be automatically set up.

    • Known Entity Extraction configurations done through the user interface can use a community type as the source for the names to tag.

  • Community creation supports Excel / CSV import.

  • Community subscriptions can now be done in batch (multiple at once), both in the API as well as in the user interface.

AI Studio

  • Rules defined during AI Studio labeling now also store the corresponding label and hence can be visualized in the rules overview page.

  • The usability of the entire training process has been improved. This includes a better flow between successive steps, from the addition of new candidate sets to existing ground truths, and more.

  • Performance improvements for fetching labelled data.

  • When proximity rule definition is disabled for a ground truth, the rules are hidden from view.

  • Optimized item fetching for machine learning workflows. When the default batch_size is larger than the number of requested documents, squirro_query_loader in lib/nlp now only fetches the requested number of documents.
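
The fetching optimization in the last item above comes down to capping each request at the number of documents still needed. The sketch below shows that idea in isolation and does not reproduce the actual squirro_query_loader code: `fetch_items`, the `fetch_page(start, count)` callback, and the fake backend are hypothetical.

```python
from typing import Callable, Iterator, List


def fetch_items(
    fetch_page: Callable[[int, int], List[dict]],
    wanted: int,
    batch_size: int = 1000,
) -> Iterator[dict]:
    """Yield up to `wanted` items, never requesting more than is still needed."""
    fetched = 0
    while fetched < wanted:
        # Cap the request size at the number of items still missing.
        count = min(batch_size, wanted - fetched)
        page = fetch_page(fetched, count)
        if not page:
            break
        yield from page
        fetched += len(page)


# Fake backend that records how many items each request asked for.
requested_counts = []


def fake_fetch_page(start: int, count: int) -> List[dict]:
    requested_counts.append(count)
    return [{"id": i} for i in range(start, start + count)]


items = list(fetch_items(fake_fetch_page, wanted=25, batch_size=1000))
print(len(items), requested_counts)
```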

Admin and Operations

  • Support for reading relevant config from /etc/squirro to establish connection to Elasticsearch and Squirro services. This allows for the tool to work correctly even when you have set up SSL / TLS for Squirro services as well as ES. See Squirro Secure Config Guide.

  • Ability to disable user management functionality: This is useful when Squirro is integrated with a Single-SignOn (SSO) system. To disable it, set the configuration flag frontend.userapp.sso-disable-user-edit to true (see configuration.ini). Setting this flag disables the following functionality:

    • Setting of profile information (name / email)

    • Setting of password in user profile

    • Management of users in "Server" space

  • More powerful squirro_stop, squirro_start, and squirro_restart commands. You can now simply run sudo squirro_start (or similar) in situations where you have the permissions to run the systemctl commands with sudo, and Squirro will do the right thing and use sudo to start/stop/restart all relevant Squirro services. Furthermore, these commands skip the cluster service to avoid accidental fail-overs (even if the /etc/squirro/cluster.ini file exists).

  • Migration scripts for the topic service now run when the topic service boots and are no longer run during the upgrade. This makes sure that the databases are in a consistent state every time the service boots up. If this becomes a problem for you, these boot-time migrations can be disabled by setting execute_at_startup to False in the migration section of the relevant service file.

  • Increased caching performance by utilizing pickle instead of json for internal caches.
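
For the boot-time migration setting mentioned in the list above, the relevant fragment of the service file and the way such a flag is typically read can be sketched as follows. The section and option names come from these notes. The helper `migrations_enabled` and the default of True are assumptions, chosen because migrations run at boot unless they are disabled.

```python
import configparser

# Fragment of the relevant service file with boot-time migrations disabled.
SERVICE_INI = """
[migration]
execute_at_startup = False
"""


def migrations_enabled(ini_text: str) -> bool:
    """Return whether migrations should run when the service boots."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # Assumed default: run migrations unless explicitly disabled.
    return parser.getboolean("migration", "execute_at_startup", fallback=True)


print(migrations_enabled(SERVICE_INI))
```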

Updated Dependencies

  • Updated Python package PySAML2 to the latest version 6.3.1.

  • Removed the dependency on the Python package statsd.

  • Removed the R-core dependency. This means fewer external dependencies and fewer package install conflicts during Squirro installation.

  • Added a dependency on the poppler-utils package.

Bug Fixes

  • Fixed a bug where internal Squirro Users (used for data loading) were leaking into the Frontend interface.

  • Fixed content-type detection pipeline step which returned application/octet-stream instead of the correct content type.

  • Fixed an issue where data loader jobs would not be able to load data due to an expired token.

  • SAML Single-SignOn now supports user IDs (often the email address) of up to 256 characters. This was previously limited to 40.

  • Fixed Known Entity Extraction running on PDF documents and other items with pages.

Breaking Changes

  • Activity logging: Streamlined query format

    • Previously, different activity actions (query vs. query.result) logged the user’s query in different formats to activity.{datetime}.jsonl. Now all query-related actions log in the same format.

    • The Squirro Activity Data Loader Plugin has been updated to parse both the new and the old format correctly.
      Projects need to update to the latest activity data loader.

Intermediate releases

The list above contains only the most important bug fixes since the previous LTS release. All the other bug fixes can be found in the intermediate release notes since 3.2.0.

Installation and Upgrading

For new installations, please follow the instructions at https://squirro.atlassian.net/wiki/spaces/DOC/pages/15597575.

To upgrade an existing installation, please consult the Upgrades for Squirro 3.2.0 and later guide.