Beta Release Notes

The beta is currently not installable while we work out a few issues related to the Elasticsearch upgrade in the upcoming version.


Changes in the beta repository are shown here as soon as a new version is available.

  • 2018-01-29 13:52:14.554794
  • Support hjson in assets, e.g. data loader, studio plugins, widgets main configuration files
  • 2018-02-02 14:28:40.233311
  • No user visible changes
  • 2018-02-09 07:28:07.646470
  • Always include the Squirro cluster service in RedHat 7 and CentOS 7 deployments including single-node installations
  • Fixing an issue with proper ordering of pipelets within Pipeline 2.0
  • Plugin repository Studio plugin - install Squirro plugins such as data loader plugins or pipelets from the UI
  • 2018-02-16 07:04:51.931427
  • Bugfix: do not stop redis when redis cache is stopped or restarted.
  • 2018-02-16 07:33:27.944698
  • No user visible changes
  • 2018-02-23 07:07:27.254662
  • Extend Dataloader API to provide access to a key-value store as well as a key-value cache to the Dataloader plugins.
  • Include the optional nginx monitoring module nginx-module-vts
  • Fixes to the try.squirro.cloud walkthrough wizard
  • Support for Self-Service demo role and walkthrough wizard
  • 2018-03-09 07:09:16.563789
  • Performance improvements to item deduplication.
  • Make Topic service start up more robust in the event of failure of packaged studio/dataloader plugins.
  • Fix for our new server monitoring infrastructure
  • Clean up empty inputstream directories more reliably.
  • Ensure that pipeline 2.0 speaks the same language as pipeline 1.0
  • More robust error handling in Pipeline 2.0
  • Add support for consume_multiple pipelets in the plumber service.
  • Remove bulk indexing support from the Python client. The new pipeline is now used by default; it is as flexible as the old bulk indexing system.
  • Obsolete the pipeline and the processor service.
  • Changed plumber API to allow for bulk operations for pipelet execution, also added bulking for cache cleaning step.
  • Removed pipeline 1.0; pipeline 2.0 is now the only option.
  • Improved disk usage by logrotating the /var/log/squirro/*/nginx*.log files
  • Unit test only fix
  • Really fix redis server restarting.
  • Pipeline 2.0 support for granular item fault handling within batched enrichments, and the ability for batched steps to modify items
  • 2018-03-16 07:07:30.557932
  • Be more robust when the deduplication step fails because of a restarted redis.
  • Dataloader and SquirroAPI subscription processing-config backward compatibility with pipeline workflows
  • Enrichment API compatibility on top of pipeline workflows
  • Fixed the ability to import items with "almost empty" html documents, e.g. those containing only whitespace, comments, or html "processing instructions"
  • Bug fix to ensure sources relate properly to pipeline workflows upon upgrade
  • Pipeline workflows
  • Support of managing built-in enrichments in addition to custom enrichment in the User Interface and the Squirro API via "Pipeline Workflows"
  • Slow backoff for failing retries in the pipeline.
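
The "slow backoff for failing retries" entry above does not describe the schedule the pipeline actually uses. As a rough illustration only, the sketch below shows one common approach, capped exponential backoff. The function name `backoff_delays` and the parameters `base`, `factor`, and `cap` are hypothetical and not part of Squirro.

```python
from typing import Iterator


def backoff_delays(retries: int, base: float = 1.0,
                   factor: float = 2.0, cap: float = 60.0) -> Iterator[float]:
    """Yield the wait time in seconds before each retry attempt.

    Hypothetical helper for illustration; not Squirro's implementation.
    Each delay is `factor` times the previous one, never exceeding `cap`.
    """
    delay = base
    for _ in range(retries):
        yield min(delay, cap)
        delay *= factor


# Five retries, doubling from 1 second and capped at 10 seconds.
print(list(backoff_delays(5, cap=10.0)))
```

The cap keeps a persistently failing batch from waiting indefinitely between attempts; the values shown are placeholders, not documented defaults.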
  • 2018-03-23 07:17:23.047508
  • Improved handling of empty files
  • Improved Pipeline 2.0 performance of Language Detection and various logging fixes.
  • Remove pyrebloom package.
  • Fix plumber handling of item IDs in case of multiple items per source item or pipelet failures.
  • Simplify deduplication step to use bulk deletion and only support the replace policy based on ID.
  • Add a machine learning service for running Squirro's machine learning models, initially enabling features such as auto-clustering and recommendations. Currently in alpha.
  • Upgrade numpy to version 1.14.2
  • Prevent pipeline 2.0 ingester from generating artificial sq:doc field on retries.
  • Cleaning up of old and legacy files in Pipeline 2.0 ingester inputstream directories.
  • Improved parallelism in keyword tagging and the filtering service by having the pipeline 2.0 ingester run multiple batches in parallel
  • Add action and status endpoints for the ingester, to be used by a Studio plugin in the future.
  • 2018-03-23 07:38:42.898908
  • No user visible changes
  • 2018-03-29 02:19:56.074580
  • Subscription API compatibility with old version of Squirro SDK tools
  • Studio plugins can now use the cache instance of redis by invoking `get_injected('redis_studio_cache')`.
  • Minor breaking change to the ingester status API: it now returns dicts instead of lists. Improve robustness of the status API and add truncation of failed batches.
  • Enabled pipeline 2.0 failed file reaping at hourly granularity
  • Accept source processing config alongside workflow_id if the processing config contains no changes.
  • Fix file importer.
  • Fix to allow enabling Prometheus based monitoring of Squirro
  • 2018-04-06 07:40:23.286320
  • Fix machine learning install for centos 7
  • Robustness improvements to install and upgrade
  • Tolerate services no longer running before removal.
  • 2018-04-13 07:10:19.301226
  • Fix keyword validation in pipeline 2.0 steps and workflows for keywords containing special characters (such as spaces) whose type is not basestring.
  • Remove squirro.service.fileimport
  • Remove squirro.tools.replay
  • Handle source-based legacy inputstream files when sources no longer exist. Also improve logging in case there are unforeseen future uncaught exceptions and don't let the file reaper thread die because of them.
  • Don't auto-opt-in to the Noise Removal step on upgrades to Squirro version 2.6.0 or when the Pipeline Workflow is not specified, as in the legacy Squirro API
  • Remove explicit dependencies on the JRE. They are replaced by checks for Java availability in the pre-install and pre-upgrade scripts of squirro-cluster-node and squirro-storage-node.
  • 2018-04-20 11:15:00.070551
  • Fixed pipelet name handling, which repairs various code dealing with pipelets.
  • Remove squirro.api.bulk
  • Improved logging of the pipeline 2.0 ingester, and avoid persisting sq:blocks and sq:bodyhash in retried batches, as these are internal fields.
  • Human readable representation of time-formats returned in the Machine learning Jobs status.
  • Fail more gracefully, with instructions on next steps, when Java is missing during install or upgrade.
  • 2018-04-27 07:37:38.529513
  • Add a forking mode to pipelet (plumber) and filtering services for optional better CPU-bound scaling.
  • Do not write file contents to disk in the pipeline. This fixes livelocks of the ingester for some failing pipeline workflows.
  • Optimize performance of the pipeline in case the boilerplate removal is being used (less memory usage).
  • Provide a simple load balancing scheme in the nginx config.
  • Add `desyncFromDesktop` to the topic.dashboards database. To be used for supporting mobile dashboards.
  • Fix reset action of ingester.
  • 2018-05-04 07:13:12.245403
  • Libnlp shim fixup.
  • Bug fix for resolving new facet keys.
  • Add errors_grouped section to the ingester status endpoint output.
  • Handle special characters in weighted keywords.
  • 2018-05-11 07:13:20.425084
  • <internal build change>
  • Add packaging for schema library.
  • 2018-06-11 14:05:03.958925
  • Fixed query parser for incorrect curly brace usage.
  • Fix an issue with quoted values for weighted keywords search
  • Fixes an issue with weighted keywords aggregation combined with non-weighted keywords.
  • Fix typeahead for weighted keywords.
  • Fix for error handling for ML service.
  • Abstract out runner into libnlp
  • Updating ply to version 3.11 and fixing an issue with complex query parsing under heavy load.
  • Extend ML jobs api to expose the last run log of the job.
  • Downgrade spacy to version 2.0.11
  • When batches of some valid and other non-valid items are sent to Squirro, we now process the valid ones instead of rejecting them along with the non-valid items.
  • Avoid log noise in topic service when redis-server-cache cannot be reached
  • Less verbose low-value logging
  • Update spacy so that it works on systems whose CPU lacks the AVX instructions.
  • Machine learning datasets are broken into `train`, `test`, and `infer`. Added `runtime`, `status`, and `last_result` for machine learning jobs.
  • Revert python-flup dependency to version 1.0.3.dev-20110405
  • Custom widgets now require author in addition to description
  • Provide kill API for running machine learning jobs.
  • Better default config for ML jobs queuing.
  • Update hdf5 dependency to a newer version 1.8.20
  • Ensure that subscription creation, deletion and workflow reassignment refreshes workflow subscription counts.
  • Fixes recommendations when used in combination with a filter query.
  • Harden widget migration script to tolerate files where directories are expected
  • Include subscription count in pipeline workflow API
  • Fix for typeahead for weighted facets
  • Clarifying output messages in squirro_asset and squirro_widget tools
  • Widgets now require a description entry in their configuration to specify their purpose.
  • Current activity monitoring endpoint for pipeline 2.0 ingester processors
  • Remove dependency of Machine Learning service on R-core. R-core is still needed by Trend service though.
  • Adding the ability to do aggregations on weighted keywords. Returns document counts of weighted keyword values independent of probabilities.
  • Also support reset of email templates.
  • An optional id parameter can now be supplied for the creation of projects, subscriptions and dashboards. Useful for migrating projects across servers while keeping the same ids
  • Add hdf5 dependency.
  • Wait in the ingester until the topicproxy has started.
  • Fix typo
  • Centralized validation of Machine learning workflow configs.
  • Change emailsender/templates from mako files to database backed Jinja2.
  • Optionally join entities to items after libnlp run
  • Validate against libnlp schema on machine learning workflow creation.
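
The batch-handling change in this release (valid items in a mixed batch are processed instead of being rejected together with the invalid ones) can be pictured with the sketch below. It is an illustration under assumptions, not Squirro's implementation: `is_valid` and `split_batch` are hypothetical names, and the validity rule shown (an item needs a non-empty `title` or `body`) is invented for the example.

```python
from typing import Dict, List, Tuple

Item = Dict[str, object]


def is_valid(item: Item) -> bool:
    """Made-up rule for this example: an item needs a title or a body."""
    return bool(item.get("title") or item.get("body"))


def split_batch(items: List[Item]) -> Tuple[List[Item], List[Item]]:
    """Separate a batch into (valid, invalid) items, preserving order.

    Only the invalid items are rejected; the valid ones continue on.
    """
    valid: List[Item] = []
    invalid: List[Item] = []
    for item in items:
        (valid if is_valid(item) else invalid).append(item)
    return valid, invalid


batch = [{"title": "Report"}, {"title": ""}, {"body": "Some text"}]
accepted, rejected = split_batch(batch)
print(len(accepted), "accepted,", len(rejected), "rejected")
```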
  • 2018-06-15 07:13:18.185040
  • Ensure phrases are properly highlighted
  • Allow for highlighted abstract sizes smaller than 18 characters.
  • Facet name length restriction is checked before creating a new facet with the topic API.
  • Fixing an issue with query parsing where a facet value contains an equal sign.
  • Do not allow changing the email address to the one of another user.
  • 2018-06-22 07:12:38.672949
  • Support for loading pretrained glove embeddings
  • Do not log Python warnings into the stderr log in case of the ingester, but into the rotated log files.
  • Add pretrained glove embeddings. Can be installed with `yum install squirro-glove`.
  • Update minor version of certain dependencies to update Scrapy to the latest version.
  • SQ-9463: Proper removal of entities upon deletion.
  • Recommendation explore page aggregates available input features for display chips.
  • Ensure reserved facets cannot be modified via the client.
  • Escaping trailing \ and ^ characters in query tokens.
  • Control install of the dataloader and studio plugins on each restart of the topic service using new flags `install_dataloader_plugins`, `install_studio_plugins`
  • Fix an issue where smartfilters could not be created without the squirro_v9 main index.
  • Correctly show version number in toolbox tools if they are installed as RPM.
  • 2018-07-06 07:47:22.697975
  • Remove the deprecated packages during upgrade
  • Ensure aggregation fields are converted to a list if sent as string.
  • Only create a new subscription if a non-default pipeline workflow was selected or the subscription does not exist.
  • Update zookeeper minor version from 3.4.11 to 3.4.12
  • Fix a bug where Zookeeper did not come up on centos7.
  • For new installations, disallow duplicate email addresses per tenant also on the DB level.
  • Allow setting fields = ['*'] to return all fields in the query API.
  • Fixing the returned matching sub items in case a query is combined with a facet query.
  • Fix sorting issue for facets without any values
  • LD_LIBRARY_PATH issues automatically resolve themselves after a service restart.
  • Add pathspec dependency
  • 2018-07-13 07:12:25.968819
  • No user visible changes
  • 2018-07-27 07:12:53.879279
  • Fix Not found error when killing a Machine learning job
  • Monit files for mysql are not included on centos7
  • 2018-07-27 07:38:25.731915
  • No user visible changes
  • 2018-08-03 07:15:16.791852
  • Fix the ingester backlog status computation.
  • Pretrained glove embeddings for the wikipedia dataset are now also available as 100- and 200-dimensional vectors (in addition to the already available 50-dimensional vectors).
  • 2018-08-03 07:41:35.871019
  • No user visible changes
  • 2018-08-17 07:15:01.066536
  • Add selinux policy to allow nginx to read log files.
  • Delete dataloader sources if there are no subscriptions referencing them anymore.
  • 2018-08-17 07:42:36.946322
  • No user visible changes
  • 2018-09-07 09:50:15.157821
  • Simplify daemon/service startup scripts.
  • Add support for custom splash screen
  • Optimize running larger direct inference workloads by leveraging the jobs manager.
  • Expose `studio_plugin` upload option in the help of `squirro_asset` command.
  • Add support for TF on centos7/rhel7
  • 2018-09-14 07:06:21.951993
  • No user visible changes
  • 2018-09-14 07:28:36.040712
  • No user visible changes
  • 2018-09-21 07:05:08.215956
  • Fix running the topic service in case Java needed an LD_LIBRARY_PATH.
  • 2018-09-21 07:25:29.025881
  • No user visible changes
  • 2018-09-28 07:05:31.312589
  • Allow sorting on weighted keywords using the following syntax: sort:weighted_keyword_fieldname.weighted_keyword_field_value[:asc|desc]
  • Configuration option to enable de-duplication of binary files during ingestion of binary data. A potential drawback is that if two different Squirro items reference the same binary file, deleting one item also deletes the binary file for the other item.
  • Allow boosting (the ^ character) in the query language.
  • Allow arbitrary transformation of the user query by a query transformer class. The transformation happens before query templating in the topic API.
  • Allow the full Elasticsearch sort dict to be passed in with the Squirro query syntax.
  • Make content conversion more robust by avoiding intra-step retries.
  • Also deduplicate items that are duplicated within a single batch.
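
To make the weighted keyword sort syntax from this release concrete, the sketch below assembles a `sort:` clause from its parts. The helper `weighted_keyword_sort` and the example field and value names are hypothetical; only the `sort:fieldname.value[:asc|desc]` shape comes from the entry above. Quoting of names that contain special characters is not covered here.

```python
def weighted_keyword_sort(field: str, value: str, order: str = "") -> str:
    """Build a sort clause of the form sort:field.value[:asc|desc].

    Hypothetical helper for illustration; the optional order suffix is
    omitted when `order` is empty.
    """
    if order not in ("", "asc", "desc"):
        raise ValueError("order must be '', 'asc' or 'desc'")
    clause = f"sort:{field}.{value}"
    return f"{clause}:{order}" if order else clause


# Example with made-up field and value names.
print(weighted_keyword_sort("topics", "finance", "desc"))
```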
  • 2018-09-28 07:26:34.657753
  • No user visible changes
  • 2018-10-12 07:07:22.198174
  • Reduce the number of ingester workers and processors.
  • Allow upgrades from installations with broken permissions/packages.
  • Fix topic migration script 024 that was failing in the past weeks.
  • Adjust Elasticsearch logging to compress log files and delete them if the total log size exceeds 2GB.
  • Avoid error message about /etc/squirro/selinux/squirro-nginx-log.te on install/upgrade on CentOS/RHEL 6.
  • Enforce a longer client-side limit for bulk operations. This minimizes the number of batch index operations that time out.
  • 2019-02-14 14:39:06.537283
  • No user visible changes
  • 2019-02-14 15:17:29.532767
  • No user visible changes
  • 2019-03-04 07:04:54.798426
  • Expose source backlog info with the sources
  • 2019-03-11 07:03:54.785834
  • No user visible changes
  • 2019-03-25 07:05:02.084072
  • No user visible changes
  • 2019-04-01 07:04:01.942425
  • No user visible changes
  • 2019-04-08 07:04:21.937094
  • No user visible changes
  • 2019-04-15 07:04:51.683497
  • No user visible changes
  • 2019-04-15 09:10:07.231677
  • No user visible changes
  • 2019-04-22 07:04:31.718824
  • No user visible changes