Squirro 3.2.10 LTS - Release Notes
Welcome to Squirro 3.2.10, released on April 29, 2021!
A bug-fix release, 3.2.11, followed on May 6, 2021 with some critical bug fixes.
Squirro 3.2.10 is a long-term support (LTS) release and will receive updates for security issues and important bug fixes for the next two years. See the Squirro Release Process document for details on Squirro’s versioning.
These release notes cover all the new features and improvements since the previous LTS release 3.2.0. Most of these were already introduced in the intermediate releases of the 3.2.x series and have been covered in those release notes as well.
Table of Contents
New Features
New pipeline editor
Document-level tagging in the AI Studio
Feedback processing inside the AI Studio
Email template for scheduled newsletters
Content-based typeahead
Translations for Dashboards
Support for Single Logout (SLO) in the SAML Single-SignOn (SSO) flow
Support for Red Hat® Enterprise Linux® 8 and CentOS 8
Pipeline
The pipeline editor has been completely recreated. The new editor is more visual, and provides a much easier overview of the various pipelines in a project.
Please see Pipeline Editor for an introduction to the new editor.
AI Studio
The previously introduced Data Science Studio has been renamed to AI Studio. Apart from the renaming, the AI Studio has gained significant new features and improvements that make training classification models even easier.
The AI Studio now supports document-level classification: ground truths and trained models can be configured for document-level tagging in addition to the existing sentence-level tagging. This enables training of models based on the whole content of a document.
The configuration of the ground truth definition for sentence-level tagging has been extended with an option to define sentence splitting rules. This enables processing of content that cannot easily be split along typical sentence boundaries, for example CRM call notes or certain textual news sources.
For sentence-level classifications, we now expose the user feedback in the ground truth for processing. When enabled, the end users can provide feedback on each classified sentence. This feedback can then be processed in the ground truth to add new labels for further training.
Cognitive Search - Project template
When creating a new project, the Cognitive Search and Cognitive Search+ templates are available to kick-start the creation of a new Squirro Cognitive Search project. These two templates are equivalent to the Cognitive Search and Cognitive Search: Food Safety applications provided on start.squirro.com.
The Cognitive Search template includes a number of pre-configured dashboards for an advanced search experience. These allow users to follow key communities and to search all the indexed data. The project creator just needs to load the data and define the communities.
The Cognitive Search+ template additionally includes example data and communities. It is a great way to start with a fully populated Squirro experience and is useful for learning and demos.
Updating Squirro will update the project templates, but existing projects are not updated to reflect changes in the template.
Newsletters
End users can receive regular updates of relevant content through email newsletters. These newsletters are sent based on each user's community subscriptions and include any new content in the subscribed communities.
The project creator can enable newsletters on their project. The newsletter template supports complex templating and can be edited by the server administrator using a visual editor. Individual users can manage their newsletter subscriptions in their profile.
See Newsletters for information about how to enable and change these newsletters.
Content-based typeahead
With content-based typeahead, the search suggestions provided to end users become more relevant. To achieve this, indexed documents are analyzed to extract key phrases. These phrases are then suggested to the user based on the current query. This is in addition to the previous facet value typeahead, which is still available.
See Content-based Typeahead for documentation.
Translations for Dashboards
Squirro dashboards are now available in multiple languages. In addition to German, Squirro also includes French and Italian translations.
Additionally, dashboards and widgets now have access to translation keys. These can be configured in the Project Translations section. In dashboards they can then be accessed using the $key syntax ($ followed by a defined translation key).
More documentation on this feature will follow shortly.
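As a rough illustration of how a $key reference resolves to a translated string, the following sketch uses Python's standard string.Template, which happens to use the same $ prefix. It is not Squirro's implementation, and the translation keys and values are invented for the example.

```python
from string import Template

# Hypothetical translation table for one language. Keys and values are
# invented for this example and do not come from a Squirro project.
translations_de = {
    "search_title": "Suche",
    "recent_items": "Neueste Dokumente",
}


def resolve(label: str, translations: dict) -> str:
    """Replace every $key in label with its translation.

    Unknown keys are left untouched, so a missing translation stays
    visible instead of raising an error.
    """
    return Template(label).safe_substitute(translations)


print(resolve("$search_title", translations_de))
print(resolve("$recent_items ($unknown_key)", translations_de))
```

Leaving unknown keys in place is a design choice of this sketch; it makes untranslated labels easy to spot while a dashboard is being built.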
Single Logout
Squirro supports Single Sign-On (SSO) out-of-the-box using the SAML standard. With this release Single Logout (SLO) is now also supported. When the identity provider has SLO enabled, a user’s logout from Squirro will fully log them out on the identity provider side as well.
Red Hat Enterprise Linux 8 / CentOS 8 packages
Squirro now provides packages for Red Hat® Enterprise Linux® (RHEL) 8 and for CentOS 8, the free community edition.
Please be aware that CentOS 8 has been moved to end-of-life by Red Hat and will only be supported by them until the end of the year 2021. Squirro is actively monitoring the evolving landscape with new community-supported RHEL versions and will provide supported packages for at least one of those editions in the future.
Breaking and High-Impact Changes
With the introduction of the new pipeline editor, the navigation structure in the Setup space has changed. To make room for the new pipeline editor to use the full width, all the options that were previously under the Enrich tab have moved to the new AI Studio tab. Re-running of enrichments will soon disappear from there and then move into the new pipeline editor properly.
Feedback options in the user interface are only presented for entities that have a model_id property. This is part of the user feedback additions. Previously, on sentence-level entity highlights it was possible to collect feedback in a few widgets by pointing to a custom endpoint. Now the feedback is collected centrally (and can be accessed through the API).
Email templates were moved from the Setup space to the Server space. This reflects the fact that the templates are global and not project-specific.
The squirro_activate command now activates the Python 3 virtualenv as opposed to the Python 2 version. If the Python 2 virtualenv is still present, the command will print a warning with a message on how to activate the Python 2 environment manually.
The Squirro navigational interface has been redesigned and the “Enrich”, “Predict”, and “Train” tabs have been removed. Any custom studio plugins under these sections have to be re-uploaded to the new section called “AI Studio”. This can be achieved by specifying "location": "dss" in the studio_plugin.json file.
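For projects with several custom studio plugins, the location change can be scripted. The sketch below only rewrites the "location" key in studio_plugin.json, as described above. The helper name, the example "title" key, and the old "enrich" value are assumptions made for the demonstration.

```python
import json
import tempfile
from pathlib import Path


def set_plugin_location(plugin_dir: Path, location: str = "dss") -> dict:
    """Set the "location" key in studio_plugin.json and return the new config."""
    config_path = plugin_dir / "studio_plugin.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    config["location"] = location
    config_path.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return config


# Demo on a throwaway directory. The "title" key and the old "enrich"
# value are placeholders; a real studio_plugin.json contains whatever
# the plugin already defines.
with tempfile.TemporaryDirectory() as tmp:
    plugin_dir = Path(tmp)
    (plugin_dir / "studio_plugin.json").write_text(
        json.dumps({"title": "My custom plugin", "location": "enrich"}),
        encoding="utf-8",
    )
    updated = set_plugin_location(plugin_dir)
    print(updated["location"])
```

All other keys in the file are preserved. The plugin still has to be re-uploaded afterwards.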
Deprecated Functionality
Pipelets can declare their accepted configuration using a getArguments method. When that method is not present, a JSON editor is displayed in the Pipeline Editor for that step and the project creator can enter an arbitrary configuration.
This default behaviour is now deprecated and will be removed in a future version. To continue accepting configuration options, it is recommended to implement the getArguments method for all pipelets.
Please refer to the Pipelets documentation for details.
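A minimal sketch of a pipelet that declares its configuration might look as follows. To keep the example runnable without the Squirro SDK, it uses a stand-in base class. The pipelet itself and the field names inside each argument definition ("name", "display_label", "type", "required") are assumptions that should be checked against the Pipelets documentation.

```python
class PipeletBase:
    """Stand-in for the Squirro SDK base class so the sketch runs anywhere."""


class KeywordTagger(PipeletBase):
    """Hypothetical pipelet that tags items whose body contains a keyword."""

    def __init__(self, config):
        self.keyword = config.get("keyword", "")
        self.label = config.get("label", "match")

    @staticmethod
    def getArguments():
        # Assumed shape: one dict per configuration option. The field
        # names are illustrative, not confirmed by the release notes.
        return [
            {
                "name": "keyword",
                "display_label": "Keyword",
                "type": "string",
                "required": True,
            },
            {
                "name": "label",
                "display_label": "Label to apply",
                "type": "string",
            },
        ]

    def consume(self, item):
        # Case-insensitive match on the item body; add the label on a hit.
        body = item.get("body", "")
        if self.keyword and self.keyword.lower() in body.lower():
            item.setdefault("keywords", {}).setdefault("tag", []).append(self.label)
        return item


pipelet = KeywordTagger({"keyword": "squirro", "label": "hit"})
print(pipelet.consume({"body": "Hello Squirro"}))
```

With the options declared this way, the configuration no longer depends on the deprecated free-form JSON editor.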
Improvements
Gather
If the data loader is run with the delete flag, we do not delete any items which have missing IDs.
New source options are better documented in the command line data loader (squirro_data_load command). Most notable is the squirro_source option, which allows you to fetch data from a different Squirro source cluster. Run squirro_data_load --help to give it a go.
The aforementioned Squirro data source now uses the built-in key_value_store and key_value_cache APIs for its state management and duplicate detection logic.
Understand
The timeout for the plumber service (responsible for running pipelets) is now configurable to allow for pipelets to take more time for their processing without running into timeout issues. The default value for this timeout was also increased from 120 seconds to 600 seconds. See common.ini.
Improved ingester logging for back-offs during ingestion of batches with errors.
The squirro_kee upload command also uploads the tokenizers if they exist.
Act
Project export and import files are now forward-compatible. So a file exported from 3.2.0 can be imported into a 3.2.10 Squirro installation (and can also be imported in future releases such as 3.3.0, etc.). Previously the files had to be from the exact same Squirro version.
Export and Import support App/Nav Bar settings, community types, communities, and dashboard themes. The pictures defined for communities are included in the export as well.
The Squirro Frontend SDK now allows passing a pre-defined query as an initialization parameter.
The Frontend SDK no longer tries to load fonts. Instead they need to be loaded by the caller.
Additional Highcharts modules are loaded by default, enabling the building of custom gauge widgets.
The Global Search bar now supports spell-checking.
Communities and Known Entity Extraction play together better:
When setting up communities, the Known Entity Extraction configuration can be automatically set up.
Known Entity Extraction configurations done through the user interface can use a community type as the source for the names to tag.
Community creation supports Excel / CSV import.
Community subscriptions can now be done in batch (multiple at once), both in the API as well as in the user interface.
AI Studio
Rules defined during AI Studio labeling now also store the corresponding label and hence can be visualized in the rules overview page.
The usability of the entire training process has been improved. This includes a better flow between successive steps, from the addition of new candidate sets to existing ground truths, and more.
Performance improvements for fetching labelled data.
When proximity rule definition is disabled for a ground truth, the rules are hidden from view.
Optimize Item fetching for Machine Learning workflows.
squirro_query_loader in lib/nlp only fetches the desired number of documents when the default batch_size is larger than the number of requested documents.
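The batch size behaviour described in the last item can be illustrated with a generic paging loop. This is not the squirro_query_loader code; fetch_documents and fetch_page are hypothetical names, and the data source is a fake in-memory list.

```python
def fetch_documents(fetch_page, count, batch_size=100):
    """Fetch at most `count` documents, never requesting more than needed.

    `fetch_page(start, size)` is a hypothetical callable returning a list.
    """
    documents = []
    while len(documents) < count:
        # Cap each request at the number of documents still missing.
        size = min(batch_size, count - len(documents))
        page = fetch_page(len(documents), size)
        if not page:
            break  # data source exhausted
        documents.extend(page)
    return documents


# Fake data source that records the page sizes requested from it.
requested_sizes = []
corpus = [f"doc-{i}" for i in range(1000)]


def fake_fetch_page(start, size):
    requested_sizes.append(size)
    return corpus[start:start + size]


docs = fetch_documents(fake_fetch_page, count=25, batch_size=100)
print(len(docs), requested_sizes)
```

Capping the request size at the remaining count avoids transferring a full batch when only a few documents are needed.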
Admin and Operations
Support for reading relevant config from /etc/squirro to establish connection to Elasticsearch and Squirro services. This allows for the tool to work correctly even when you have set up SSL / TLS for Squirro services as well as ES. See Squirro Secure Config Guide.
Ability to disable user management functionality: This is useful when Squirro is integrated with a Single-SignOn (SSO) system. To disable this, set the configuration flag frontend.userapp.sso-disable-user-edit to true (see configuration.ini). Enabling this flag disables the following functionality:
Setting of profile information (name / email)
Setting of password in user profile
Management of users in "Server" space
More powerful squirro_stop, squirro_start, and squirro_restart commands. Now you can simply say sudo squirro_start (or similar) in situations where you have the permissions to run the systemctl commands with sudo, and Squirro will do the right thing and use sudo to start/stop/restart all relevant Squirro services. Furthermore, we skip the cluster service from these commands to avoid accidental fail-overs (even if the /etc/squirro/cluster.ini file exists).
Migration scripts for the topic service now run at the boot time of the topic service and are no longer run at upgrade time. This allows us to make sure that our databases are in a consistent state every time the service boots up. If this becomes a problem for you, these boot-time migrations can be disabled by setting execute_at_startup in the migration section of the relevant service file to False.
Increased caching performance by utilizing pickle instead of json for internal caches.
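The migration setting mentioned above could look like the fragment embedded in the following sketch, which parses it with Python's standard configparser. Only the section name and the option come from these notes. Which file holds the setting depends on the service, and the fallback of True is an assumption that mirrors the described default of running migrations at startup.

```python
import configparser

# Hypothetical excerpt of a service configuration file. Only the section
# name and the option are taken from the release notes.
SERVICE_INI = """
[migration]
execute_at_startup = False
"""

parser = configparser.ConfigParser()
parser.read_string(SERVICE_INI)

# getboolean() understands true/false, yes/no, on/off and 1/0.
# The fallback applies when the option is absent (assumed default).
run_migrations = parser.getboolean("migration", "execute_at_startup", fallback=True)
print(run_migrations)
```

This only shows how the value reads back from an INI fragment; it does not show how the Squirro services evaluate it.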
Updated Dependencies
Updated Python package PySAML2 to the latest version 6.3.1.
Removed the dependency on the Python package statsd.
Removed the R-core dependency. This means fewer external dependencies and fewer package install conflicts during Squirro installation.
Added a dependency on the poppler-utils package.
Bug Fixes
Fixed a bug where internal Squirro Users (used for data loading) were leaking into the Frontend interface.
Fixed content-type detection pipeline step which returned application/octet-stream instead of the correct content type.
Fixed an issue where data loader jobs would not be able to load data due to an expired token.
SAML Single-SignOn now supports user IDs (often the email address) of up to 256 characters. This was previously limited to 40.
Fixed Known Entity Extraction running on PDF documents and other items with pages.
Breaking Changes
Activity logging: Streamlined query format
Previously, different activity actions (query vs. query.result) logged the user’s query in different formats to activity.{datetime}.jsonl. Now all query-related actions log in the same format.
The Squirro Activity Data Loader Plugin has been updated to parse both the new and the old format correctly.
Projects need to update to the latest activity data loader.
Intermediate releases
The list above contains only the most important bug fixes since the previous LTS release. All the other bug fixes can be found in the intermediate release notes since 3.2.0:
https://squirro.atlassian.net/wiki/spaces/DOC/pages/2396356958
https://squirro.atlassian.net/wiki/spaces/DOC/pages/2408185861
https://squirro.atlassian.net/wiki/spaces/DOC/pages/2448097471
https://squirro.atlassian.net/wiki/spaces/DOC/pages/2448654369
Installation and Upgrading
For new installations, please follow the instructions at https://squirro.atlassian.net/wiki/spaces/DOC/pages/15597575.
To upgrade an existing installation, please consult the Upgrades for Squirro 3.2.0 and later guide.