Pipeline

When items are indexed in Squirro, they run through the Squirro pipeline, which applies a number of enrichments.

Architecture Overview

As outlined in the Architecture, indexing is a step-by-step process: items are first imported (see Data Import), then enriched in the pipeline, and finally searched and presented. The following diagram shows an overview of this process.

Pipeline Architecture

Expanding the Pipeline step gives a more detailed diagram of the individual steps that together form the pipeline.

There are built-in enrichments, many of which are enabled by default. Examples include language detection and duplicate detection.

The pipeline can be extended with a number of custom enrichments, including Search Tagging, Known Entity Extraction and Pipelets.

Processing

The pipeline steps are run sequentially. When a pipeline step fails for any reason, the item is re-queued and the full pipeline will be re-run on that item. If processing fails persistently (10 times by default) the item is dropped from the pipeline.
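The retry behaviour described above can be illustrated with a minimal sketch. This is not Squirro's implementation: `run_pipeline`, the in-memory queue, and the representation of steps as plain callables are assumptions made for illustration. Only the sequential execution, the full re-run after a failure, and the default limit of 10 attempts come from the text.

```python
from collections import deque

MAX_ATTEMPTS = 10  # default number of failed attempts before an item is dropped


def run_pipeline(items, steps, max_attempts=MAX_ATTEMPTS):
    """Run all steps in order on each item (hypothetical helper).

    If any step raises, the original item is re-queued and the full
    pipeline is re-run on it later. After max_attempts failures the
    item is dropped.
    """
    queue = deque((item, 0) for item in items)
    processed, dropped = [], []
    while queue:
        item, failures = queue.popleft()
        try:
            result = item
            for step in steps:  # steps run sequentially
                result = step(result)
        except Exception:
            failures += 1
            if failures >= max_attempts:
                dropped.append(item)  # persistent failure: give up
            else:
                queue.append((item, failures))  # re-queue the untouched item
            continue
        processed.append(result)
    return processed, dropped
```

Note that the sketch re-queues the original item rather than the partially enriched one, so a retry always starts again from the first step.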

Items are only displayed to users once the full pipeline, with the exception of Search Tagging, has completed. For details on the search tagging delay, see the Filtering step.

Configuration

The steps to be executed are configurable, partly on a per-project basis and partly for each subscription. For built-in enrichments, the Processing Config is used to enable and disable those steps.
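As a rough illustration of layered configuration, the sketch below combines project-level and subscription-level settings into a list of enabled steps. Everything in it is hypothetical: the dictionary layout, the step names, the `effective_steps` helper, and the rule that subscription settings override project settings are assumptions and do not reflect the actual Processing Config format.

```python
def effective_steps(project_config, subscription_config):
    """Return the names of the enabled steps (hypothetical helper).

    Assumes subscription-level settings override project-level ones
    for any step that appears in both.
    """
    merged = {**project_config, **subscription_config}
    return [name for name, settings in merged.items()
            if settings.get("enabled", False)]


# Hypothetical step names and settings, for illustration only.
project_config = {
    "language-detection": {"enabled": True},
    "deduplication": {"enabled": True},
}
subscription_config = {
    "deduplication": {"enabled": False},
    "pipelet": {"enabled": True},
}

steps = effective_steps(project_config, subscription_config)
```

Here `steps` contains the project-level language detection and the subscription-level pipelet, while deduplication is switched off by the subscription override.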

See the documentation for each enrichment for details on its configuration options.
