Squirro 3.3.9 LTS - Release Notes

Welcome to Squirro 3.3.9, released on September 29, 2021!

Squirro 3.3.9 is a long-term support (LTS) release and will receive updates for security issues and important bug fixes for the next two years. See the Squirro Release Process document for details on Squirro’s versioning.

These release notes cover all the new features and improvements since the previous LTS release 3.2.11. Most of these were already introduced in the intermediate releases of the 3.3.x series and have been covered in those release notes as well.

With Squirro 3.3.9, support for Microsoft's Internet Explorer has been dropped. This is in line with Microsoft’s own decision to end support for Internet Explorer. See System Requirements for the currently supported clients.

Table of Contents

New Features

  • Sales Insights

  • Sentiment analysis

  • PDF support for AI Studio

  • Insights generation

  • Pipeline rerunning

  • Pipeline stages

  • Email Parser Pipelet

  • Configuration service

Sales Insights

The Sales Insights application assists sales teams in targeting the right customer at the right time, and knowing what to talk about with them. This has been introduced as a new project template that can be selected from the project creation dialogue. The application template is also available on our self-service platform start.squirro.com.

Sentiment analysis

To support the Sales Insights application, we introduced out-of-the-box sentiment analysis modules.

There are various options:

  • A high-quality rule-based sentiment tagger, optimized for social media or domains like editorials, movie reviews, and product reviews. This is available as part of the NLP Tagger.

  • Pre-trained models such as FinBERT or DistilBERT can be used in libnlp. See specifically the BERTSentiment classifier.

  • An AI Studio model template which can be used to train models specialized for sentiment analysis.

PDF support for AI Studio

The Squirro AI Studio now has full support for PDF documents. This includes labeling and inference for both document-level and sentence-level classifiers.

On the libnlp level, the PdfSentencesTokenizer extracts the required information from PDF files, and the SquirroEntityFilter has been extended to be able to extract sentences from PDF documents.

Insights generation

The Insights Generator data connector is exposed in the new Data Analytics category when creating data sources. It periodically runs a pipeline workflow containing the Insights Generator Pipelet to analyze already indexed data and store the resulting insights.

The items created from the Insights Generator data source are insight cards that show the sentiment or engagement scores for a particular client. A sentiment score is calculated by aggregating sentiments across all the engagements (e.g., call notes) associated with that client.
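
As a rough illustration of the aggregation described above, the following Python sketch computes one score per client from per-engagement sentiment values. The function name, the (client, score) input shape, and the use of a plain mean are assumptions made for illustration; the Insights Generator Pipelet may aggregate differently.

```python
from collections import defaultdict
from statistics import mean


def aggregate_sentiment(engagements):
    """Return one sentiment score per client.

    `engagements` is an iterable of (client, score) pairs, where each score
    is the sentiment of a single engagement such as a call note.
    Averaging is an assumption; other aggregations are possible.
    """
    scores_by_client = defaultdict(list)
    for client, score in engagements:
        scores_by_client[client].append(score)
    return {client: mean(scores) for client, scores in scores_by_client.items()}


# Hypothetical call-note sentiments for two clients.
call_notes = [("Acme", 0.5), ("Acme", -0.25), ("Globex", 1.0)]
client_scores = aggregate_sentiment(call_notes)
```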

Pipeline rerunning

We have added the capability to rerun a pipeline workflow on previously indexed data.

The rerun feature is accessible from the pipeline settings drop-down and supports two rerun types: in row format and in item format from the Squirro index.

Note that this may require a configuration change on the server to update the item_transformation_in_pipeline option. See the section Breaking and High-Impact Changes below for details.

Pipeline stages

We introduced a new pipeline step called Change Pipeline, which exits the current pipeline and switches to a new one. A typical scenario is to separate quick indexing from time-consuming processing. In the first pipeline, items are indexed so that they are already available in Squirro. The Change Pipeline step then sends them through a second pipeline for further processing.

Email Parser Pipelet

The new Email Parser pipelet can be used to parse any email data. It extracts the email body and discards any content after the occurrence of an email footer (configurable).
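
The footer handling can be pictured with the following minimal Python sketch. It is not the pipelet's implementation: the function name and the list of footer markers are hypothetical, and plain substring matching is assumed.

```python
def strip_email_footer(body, footer_markers):
    """Return `body` truncated at the earliest footer marker, if any."""
    cut = len(body)
    for marker in footer_markers:
        if not marker:
            continue  # an empty marker would match at position 0
        position = body.find(marker)
        if position != -1:
            cut = min(cut, position)
    # Drop trailing whitespace left over in front of the footer.
    return body[:cut].rstrip()


# Hypothetical email text and footer markers.
email_body = (
    "Hi team,\nthe report is attached.\n\n"
    "Best regards,\nAlex\nConfidential notice"
)
cleaned = strip_email_footer(email_body, ["Best regards,", "Confidential notice"])
```

Text without any of the markers is returned unchanged apart from trailing whitespace.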

Configuration service

The new Configuration service allows easy configuration of advanced settings through the user interface. See Configuration service for details.

Breaking and High-Impact Changes

  • Internet Explorer (IE) is no longer supported.

  • Query templates have been changed. They are now appended to the query instead of modifying the query string. If you have existing query templates, they will need to be adjusted. See Query Templates for details.

  • We have split the Salesforce connector into a Salesforce sales cloud connector and a Salesforce service cloud connector for a more intuitive user experience. The two connectors have a different 1-click default configuration. Existing configurations are not impacted by this.

  • The Microsoft SharePoint connector has been split out from the Microsoft OneDrive connector. SharePoint data is no longer loaded as part of the OneDrive connector.

  • The default value of the configuration option item_transformation_in_pipeline in the pipeline section of /etc/squirro/common.ini has been changed from false to true in this release. Please ensure that this option is also set to true on your installations so that the Pipeline rerunning functionality works.

  • The Content Standardization step has been split into 3 steps: Sanitize HTML, Remove HTML and Content Standardization. The existing configurations should automatically be migrated when updating.
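
The item_transformation_in_pipeline setting mentioned in the list above can be checked with a short script. The following Python sketch uses the standard library's configparser and only assumes the section and option names given in these notes. The helper name is hypothetical, and the sample text stands in for the contents of /etc/squirro/common.ini.

```python
import configparser


def item_transformation_enabled(ini_text):
    """Return True/False for the option, or None if it is not set explicitly."""
    # Interpolation is disabled so that '%' characters in other values
    # of the file cannot interfere with parsing.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return parser.getboolean(
        "pipeline", "item_transformation_in_pipeline", fallback=None
    )


# Stand-in for the relevant part of /etc/squirro/common.ini.
sample = """
[pipeline]
item_transformation_in_pipeline = true
"""
enabled = item_transformation_enabled(sample)
```

A result of False or None suggests that the option should be reviewed before relying on Pipeline rerunning.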

Improvements

Gather

  • 1-click connectors include a source_type facet by default.

  • Performance of running pipelets in the pipeline was improved through parallelized mini-batching.

  • New Feed data loader: the RSS Feed Dataloader has been reworked. It now provides more metadata from the RSS feeds that can be mapped to facets, and an issue that caused items to be duplicated has been fixed.

  • When editing a data source, it is now possible to delete the field templates in the mapping screen.

  • Setup of incremental data loaders has been simplified. The data loader plugin can now declare which fields should be used for an incremental load, thus simplifying the options for the project creator setting up the data loader.

  • Introduce three new content processing steps, which replace the previous Content Standardization pipeline step:

    • Sanitize HTML: cleans HTML documents of potentially malicious HTML tags

    • Remove HTML: removes HTML from fields

    • Content Standardization: makes sure that the item has the correct structure to get indexed
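
The parallelized mini-batching mentioned in the Gather improvements above is a general technique. The following Python sketch shows one way it can work, assuming a thread pool and a pipelet modeled as a plain function; the names, batch size, and worker count are hypothetical and do not describe Squirro's internal implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def mini_batches(items, batch_size):
    """Split `items` into consecutive lists of at most `batch_size` elements."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def run_pipelet_in_batches(pipelet, items, batch_size=2, workers=4):
    """Apply `pipelet` to every item, processing mini-batches in parallel."""
    def process(batch):
        return [pipelet(item) for item in batch]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields batch results in submission order, so item order is kept.
        results = pool.map(process, mini_batches(items, batch_size))
        return [item for batch in results for item in batch]


def mark_processed(item):
    """Hypothetical pipelet: return a copy of the item with an extra field."""
    return {**item, "processed": True}


processed = run_pipelet_in_batches(mark_processed, [{"id": i} for i in range(5)])
```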

Understand

  • We have switched to a new and more robust HTML extractor by default for our indexing and NLP processing steps.

  • NLP Tagger improvements:

    • Performance improvement for big PDFs by processing only a subset of pages.

    • Allow usage of not installed / custom SpaCy language models.

    • Add a simple Topic Tagger based on key-phrase de-duplication and more aggressive cleaning.

    • Add a term/phrase filter step to exclude unwanted Named-Entities or Sentiment-Terms.

    • Extend text-cleaning to remove Emails & HTTP-Links.

AI Studio

For efficient Ground Truth labeling the Ground Truth focus view has been improved. Filtering options are now available to quickly select all labeled or not yet labeled elements of a candidate set.

Labeling

  • Multiple Candidate Sets can be selected on the overview page to create a Ground Truth with them.

  • Candidate Sets with Smart Filter searches are highlighted when labeling in the Ground Truth.

  • Introduction of additional filter options in ground truth labeling process, which allow the user to see all the labeled or not yet labeled elements of a candidate set.

Proximity models

  • Improved support for proximity rules, especially by providing editing options in the user interface.

  • When labeling content in the ground truth, the rule creation dialogue is now only shown when no matching rule exists yet. This considerably speeds up the training of proximity models and reduces the mental overhead of training such models.

Model templates

  • Template which can be used for sentiment analysis by training the sentiment classifier on your own data.

  • Encode sentences with the pre-trained universal sentence encoder of Google and use logistic regression to classify the sentences.

  • Encode sentences with the pre-trained universal sentence encoder of Google and use cosine similarity to classify the sentences. This template enables users to test their text classification use cases with little effort.

  • Update the SVC and the Naive Bayes ML templates from using Bag of Words embeddings to TF-IDF embeddings.

Note that for on-premise installations of Squirro the pre-trained model needs to be installed manually first with yum install squirro-google-universal-sentence-encoder.
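
The cosine-similarity template listed above can be pictured with the following Python sketch. It assumes that sentence embeddings have already been computed; the short toy vectors stand in for real encoder output, which has many more dimensions. The function names and the single reference embedding per label are illustrative assumptions, not the template's actual code.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors `a` and `b`."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # similarity is undefined for zero vectors; treat as 0
    return dot / (norm_a * norm_b)


def classify(embedding, reference_embeddings):
    """Return the label whose reference embedding is most similar."""
    return max(
        reference_embeddings,
        key=lambda label: cosine_similarity(embedding, reference_embeddings[label]),
    )


# Toy reference embeddings, one per label (hypothetical values).
references = {"positive": [0.9, 0.1, 0.0], "negative": [0.0, 0.2, 0.9]}
predicted = classify([0.8, 0.3, 0.1], references)
```

Because no model is trained in this approach, trying out a new label only requires a reference embedding for it.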

Other improvements

  • We have improved the loading performance for the validation screen in AI Studio.

  • Improve user guidance through the different process steps.

  • Improve the HTML tokenizer to also tokenize titles.

Act

  • We have improved the performance of search highlighting by:

    • Computing the fragment size dynamically based on highlightable query-term length.

    • Changing the default highlighter type to Elastic's new default, the unified highlighter, which is up to 10x faster than the previous default, fvh. In case of any discrepancies, it is possible to revert to the old highlighter.

  • Layer visibility conditions can use language as a filter.

  • We have improved the query parsing for content-based typeahead. It now handles phrases and non-word characters in facet values.

  • The default newsletter template has been improved with better texts, adding a default unsubscribe link, and links to the communities.

  • Mobile dashboards can now be embedded with iFrame.

  • Apply highlighting to titles in all item details.
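
At the Elasticsearch level, the highlighter change described in the list above corresponds to the type setting of a search request's highlight section. The following Python sketch builds such a request body. The helper name and the field name body are hypothetical, and the sketch does not show how Squirro itself exposes or reverts this setting.

```python
def build_search_body(query_text, field="body", highlighter="unified"):
    """Build an Elasticsearch search body that highlights matches in `field`."""
    # "unified", "fvh" and "plain" are the highlighter types Elasticsearch offers.
    if highlighter not in ("unified", "fvh", "plain"):
        raise ValueError(f"unsupported highlighter type: {highlighter}")
    return {
        "query": {"match": {field: query_text}},
        "highlight": {"fields": {field: {"type": highlighter}}},
    }


unified_body = build_search_body("quarterly results")
fvh_body = build_search_body("quarterly results", highlighter="fvh")
```

Note that the fvh highlighter only works on fields that are indexed with term vectors including positions and offsets.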

Admin and Operations

  • Removed the demo user type.

  • Introduce a feature flag for the Content-Security-Policy header. This can be used to prevent Squirro from being embedded as an iFrame and introduces other security improvements. See the Squirro Secure Config Guide for more information.

  • Query templates have been changed to always be added to the user’s query, rather than overwriting it. This increases security as it reduces the potential for introducing query-rewriting attacks.

Bug Fixes

See the intermediate release notes for a list of all the bugs fixed since the last LTS release.

Intermediate releases

The list above includes only the most important bug fixes since the previous LTS release. All the other bug fixes can be found in the intermediate release notes since 3.2.11:

Installation and Upgrading

For new installations please follow the Setup on Linux instructions.

To upgrade an existing installation, please consult the Upgrades for Squirro 3.2.0 and later guide.