Overview
The Squirro pipeline supports priorities for data that is being ingested. This makes it possible to ensure that certain data is processed more quickly than other data.
This is helpful when some data sources are more important than others. For example, data that comes from a premium data provider might hold more value than data that comes from a public RSS feed.
The pipeline supports this use case with three priority levels: Low, Normal, and High.
Priorities
The pipeline supports three priority levels:
Low
Normal
High
Each of these priorities has its own processor, which ensures that items of different priorities do not block each other.
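The isolation between priority levels can be illustrated with a minimal Python sketch. This is not Squirro code: the function `run_pipeline`, the lowercase priority names, and the use of threads are assumptions made purely for illustration. It only shows the general idea of one queue and one dedicated worker per priority, so that a backlog in one queue cannot hold up another.

```python
import queue
import threading

# Hypothetical priority names; the real pipeline's internal identifiers may differ.
PRIORITIES = ("low", "normal", "high")


def run_pipeline(items):
    """Process (priority, item) pairs with one queue and one worker per priority."""
    queues = {p: queue.Queue() for p in PRIORITIES}
    processed = {p: [] for p in PRIORITIES}

    def worker(priority):
        # Each worker reads only from its own queue, so a backlog of
        # low-priority items never delays high-priority ones.
        while True:
            item = queues[priority].get()
            if item is None:  # sentinel: no more work for this priority
                break
            processed[priority].append(item)

    threads = [threading.Thread(target=worker, args=(p,)) for p in PRIORITIES]
    for thread in threads:
        thread.start()

    for priority, item in items:
        queues[priority].put(item)
    for p in PRIORITIES:
        queues[p].put(None)

    for thread in threads:
        thread.join()
    return processed


result = run_pipeline([("low", "rss-1"), ("high", "premium-1"), ("low", "rss-2")])
```

Within each priority the items keep their arrival order, while the three workers run independently of each other.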
Using Priorities
Priorities can be defined in the data source, and can be influenced using the Change Pipeline step.
Data Source
...
By default, all data sources are created with the Normal priority level. The priority level can be set when the data source is created and changed later by editing the data source.
To choose the priority level of a data source, judge how valuable the data from this source is to you compared to the data from your other sources.
Change Pipeline
The priorities can also be changed when queuing work in a new workflow using the Change Pipeline step.
This allows a setup where one initial pipeline workflow does the minimum work required to index the data. From this moment on, the data is available and searchable for users. The more resource-intensive processing can then be deferred to a secondary pipeline workflow, which is invoked using the Change Pipeline step. To prevent those deferred steps from clogging up the processing of initial items, the Change Pipeline step can reduce the priority at this time.
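The two-stage setup can be sketched in plain Python. Everything here is hypothetical and not part of the Squirro API: the workflow names, the `new_queues` and `drain` helpers, and the item fields are illustrative assumptions. As a simplification, a single loop drains the queues in strict high, normal, low order instead of running a separate processor per priority; this keeps the example deterministic while still showing that deferred work waits behind the initial indexing.

```python
from collections import deque


def new_queues():
    """Create one empty queue per (hypothetical) priority name."""
    return {"high": deque(), "normal": deque(), "low": deque()}


def initial_workflow(item, queues):
    # Minimum work needed to make the item searchable.
    item["indexed"] = True
    # Stand-in for a Change Pipeline step: hand the item to the
    # secondary workflow and reduce its priority to low.
    queues["low"].append(("secondary", item))


def secondary_workflow(item, queues):
    # Placeholder for the resource-intensive processing.
    item["enriched"] = True


WORKFLOWS = {"initial": initial_workflow, "secondary": secondary_workflow}


def drain(queues):
    """Run queued work, always taking the highest non-empty priority first."""
    log = []
    while any(queues.values()):
        for priority in ("high", "normal", "low"):
            if queues[priority]:
                workflow, item = queues[priority].popleft()
                WORKFLOWS[workflow](item, queues)
                log.append((workflow, item["id"]))
                break
    return log


queues = new_queues()
items = [{"id": "a"}, {"id": "b"}]
for item in items:
    queues["normal"].append(("initial", item))
log = drain(queues)
```

In this sketch both items are indexed before any of the deferred enrichment runs, which mirrors the goal of making data searchable first.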
...
Configuration
The setup of the pipeline priorities can be configured in the Configuration service. Please see Configuration of Ingester for prioritised Data Sources for details.
Monitoring
To monitor how busy the different queues are, use the Monitoring Plugin in the Server space.
This page can now be found at Pipeline Prioritization on the Squirro Docs site.