The pipeline steps are run sequentially. When a pipeline step fails for any reason, the item is re-queued and the full pipeline will be re-run on that item. If processing fails persistently (10 times by default) the item is dropped from the pipeline.
Some errors are handled by adding an error code to the item. The known error codes for this are documented in the Processing Error table.
The sqingesterd service is responsible for executing the Pipeline workflows and their steps.
The configuration option processors under the section ingester of the /etc/squirro/ingester.ini file controls the number of processors used by the sqingesterd service to consume the batch files found under the /var/lib/squirro/inputstream directory. Each processor works on a single batch file at a time. Under the hood, each processor is a separate Unix process. The default value of this option is 1 (i.e., a single processor is spawned by the service for ingesting data).
The configuration option workers under the section processor of the /etc/squirro/ingester.ini file controls the number of threads spawned by each processor. This setting is being used for the execution of certain Pipeline steps which consume a single item from the batch at a time. Other Pipeline steps work on a batch level and therefore this option is irrelevant to them. The default value of this option is 3. (i.e., approximately 3 items of a batch are executed concurrently by a single processor).
Pipeline Step Dependencies
Some Pipeline steps have dependencies on other steps. If these dependencies are not met, then it is possible that either the processing of the Pipeline workflow will fail completely, or it will be successful but the ingested items will not get transformed as expected. The following list outlines the current known step dependencies per section: