When items are indexed in Squirro, they run through the Squirro pipeline, which applies a number of enrichments.

Architecture Overview

As outlined in the Architecture, indexing is a step-by-step process: items are first imported (see Data Import), then enriched in the pipeline, and finally searched and presented. The following diagram shows an overview of this.

Pipeline Architecture

Expanding the Pipeline step gives a larger diagram showing the detailed steps that together form the pipeline.

There are built-in enrichments, many of which are enabled by default. Examples include language detection and duplicate detection.

The pipeline can be extended with a number of custom enrichments, including Search Tagging, Known Entity Extraction and Pipelets.

Processing

The pipeline steps are run sequentially. When a pipeline step fails for any reason, the item is re-queued and the full pipeline will be re-run on that item. If processing fails persistently (10 times by default) the item is dropped from the pipeline.
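
The retry behaviour described above can be sketched roughly as follows. This is a minimal illustration, not Squirro's implementation: the function `run_pipeline`, the in-memory queue, and the representation of items as dictionaries are all assumptions made for the example. Only the sequential execution, the full re-run after a failure, and the default limit of 10 attempts come from the text.

```python
from collections import deque

MAX_ATTEMPTS = 10  # default number of failed attempts before an item is dropped


def run_pipeline(items, steps, max_attempts=MAX_ATTEMPTS):
    """Run every step on every item; re-queue on failure, drop after max_attempts.

    Hypothetical sketch: `steps` is a list of callables that take an item
    (a dict) and return the enriched item.
    """
    queue = deque((item, 0) for item in items)
    done, dropped = [], []
    while queue:
        item, failures = queue.popleft()
        try:
            # Always start from the original item, so a retry re-runs
            # the full pipeline rather than resuming at the failed step.
            enriched = dict(item)
            for step in steps:  # steps run sequentially
                enriched = step(enriched)
        except Exception:
            failures += 1
            if failures >= max_attempts:
                dropped.append(item)  # persistent failure: drop the item
            else:
                queue.append((item, failures))  # re-queue for another full run
            continue
        done.append(enriched)
    return done, dropped
```

Calling `run_pipeline(items, [detect_language, deduplicate])` with two hypothetical step functions returns the fully enriched items and, separately, the items that were dropped after repeated failures.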

Items are displayed to users only once the full pipeline, with the exception of Search Tagging, has completed. For details on the search tagging delay, see the Filtering step.

Configuration

The steps to be executed are configurable, partially on a per-project basis and partially for each subscription. For built-in enrichments, the Processing Config is used to enable and disable those steps.
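
As a rough illustration of layered configuration, the sketch below merges enable/disable flags from several levels. Everything in it is hypothetical: the step names, the flat dictionary layout, the function `effective_steps`, and the rule that subscription settings override project settings are assumptions for the example and do not reflect the actual Processing Config schema.

```python
# Hypothetical defaults: built-in steps that are enabled unless overridden.
DEFAULTS = {"language-detection": True, "deduplication": True}


def effective_steps(defaults, project_config, subscription_config):
    """Return the names of the enabled steps after merging all levels.

    Assumption for this sketch: later levels override earlier ones,
    so subscription settings win over project settings.
    """
    merged = {**defaults, **project_config, **subscription_config}
    return [name for name, enabled in merged.items() if enabled]
```

For example, a project that passes `{"deduplication": False}` would, under these assumptions, run only the remaining enabled steps.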

See the documentation for each enrichment for details on how to use its options.
