Pipeline Priorities

Overview

The Squirro pipeline supports priorities for data that is being ingested. This makes it possible to ensure that certain data is processed more quickly than other data.

This is helpful when some data sources are more important than others. For example, data that comes from a premium data provider might hold more value than data that comes from a public RSS feed.

The pipeline supports this use case with three priority levels: Low, Normal, and High.

Priorities

The pipeline supports three priority levels:

Low
Normal
High

Each of these priorities has its own processor, which ensures that items of different priorities do not block each other.
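The idea of one processor per priority can be illustrated with a minimal sketch. This is a conceptual model only, not the Squirro implementation: the lower-case priority names, the `run_pipeline` function, and the use of in-memory queues and threads are all assumptions made for illustration.

```python
import queue
import threading

# Hypothetical lower-case names for the three priority levels.
PRIORITIES = ("low", "normal", "high")


def run_pipeline(items):
    """Process (priority, payload) pairs with one queue and one worker per priority.

    Because each priority has its own queue and worker, a backlog in one
    priority does not delay items waiting in another.
    """
    queues = {p: queue.Queue() for p in PRIORITIES}
    processed = {p: [] for p in PRIORITIES}

    def worker(priority):
        q = queues[priority]
        while True:
            payload = q.get()
            if payload is None:  # sentinel: no more items for this priority
                break
            # Stand-in for the real processing work.
            processed[priority].append(payload)

    threads = [threading.Thread(target=worker, args=(p,)) for p in PRIORITIES]
    for t in threads:
        t.start()

    for priority, payload in items:
        queues[priority].put(payload)

    # Tell every worker to stop once its queue has been drained.
    for p in PRIORITIES:
        queues[p].put(None)
    for t in threads:
        t.join()
    return processed
```

Each worker only ever reads from its own queue, so items within one priority keep their arrival order while the three priorities progress independently.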

Using Priorities

Priorities can be defined in the data source, and can be influenced using the Change Pipeline step.

Data Source

By default, all data sources are created with the Normal priority level. The priority level can be set when the data source is created and changed later by editing the data source.

To choose the priority level of a data source, judge how valuable the data from this source is to you compared to the data from your other sources.
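The default behavior described above can be sketched as follows. The `priority` key, the lower-case values, and the `resolve_priority` helper are hypothetical names chosen for illustration; they are not taken from the Squirro API.

```python
# Hypothetical names; the real data source configuration may differ.
VALID_PRIORITIES = ("low", "normal", "high")
DEFAULT_PRIORITY = "normal"


def resolve_priority(source_config):
    """Return the priority of a data source, falling back to Normal."""
    priority = source_config.get("priority", DEFAULT_PRIORITY)
    if priority not in VALID_PRIORITIES:
        raise ValueError(f"Unknown priority: {priority!r}")
    return priority


# A premium provider is marked as more valuable than a public RSS feed,
# which keeps the default.
premium = {"name": "Premium provider", "priority": "high"}
rss = {"name": "Public RSS feed"}
```

In this sketch, `resolve_priority(premium)` yields the explicitly configured level, while `resolve_priority(rss)` falls back to the default.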

Change Pipeline

The priorities can also be changed when queuing work in a new workflow using the Change Pipeline step.

This allows a setup where an initial pipeline workflow does the minimum work required to index the data. From that moment on, the data is available and searchable for users. The more resource-intensive processing can then be deferred to a secondary pipeline workflow, which is invoked using the Change Pipeline step. To prevent those resource-intensive steps from clogging up the initial processing of other items, the Change Pipeline step can lower the priority at this point.
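The two-stage setup can be modeled with a small simulation. It is a sketch under stated assumptions rather than the actual pipeline code: the workflow names, the `change_pipeline` and `drain` helpers, and the dictionary-based items are all hypothetical.

```python
from collections import deque


def change_pipeline(item, workflow, priority, queues):
    """Hypothetical stand-in for the Change Pipeline step: re-queue the
    item for another workflow, optionally at a different priority."""
    queues[priority].append((workflow, item))


def initial_workflow(item, queues):
    # Minimum effort: index the item so that it becomes searchable.
    item = dict(item, indexed=True)
    # Defer the expensive work to the secondary workflow at low priority.
    change_pipeline(item, "secondary", "low", queues)
    return item


def secondary_workflow(item, queues):
    # Stand-in for resource-intensive enrichment steps.
    return dict(item, enriched=True)


WORKFLOWS = {"initial": initial_workflow, "secondary": secondary_workflow}


def drain(priority, queues):
    """Run every item currently queued at the given priority."""
    results = []
    while queues[priority]:
        workflow, item = queues[priority].popleft()
        results.append(WORKFLOWS[workflow](item, queues))
    return results


queues = {p: deque() for p in ("low", "normal", "high")}
queues["normal"].append(("initial", {"id": 1}))

indexed = drain("normal", queues)   # item is indexed and searchable here
deferred = len(queues["low"])       # enrichment is waiting at low priority
enriched = drain("low", queues)     # the secondary workflow runs later
```

After the first `drain` call the item is already indexed, while the enrichment work sits in the low-priority queue and does not compete with newly arriving items at normal priority.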

Configuration

The setup of the pipeline priorities can be configured in the Configuration service. See Configuration of Ingester for prioritised Data Sources for details.

Monitoring

To monitor how busy the different queues are, use the Monitoring Plugin in the Server space.