The pipeline editor lets you rerun individual steps of a pipeline workflow on the data for which the pipeline is configured. If you need to rerun an entire pipeline workflow, check out the documentation here.

Overview

Rerunning an individual step is typically required when you have added or changed an enrichment in your pipeline workflow and want to (re)apply the enrichment to already indexed data.

For example, suppose you ingested a set of PDF documents some time ago using a computationally expensive workflow for binary documents. You would now like to run the documents through a recently developed text classification model to enrich the data with entities resulting from the classification. To avoid rerunning the entire workflow, you can choose to run only the classification step.

Configuration

Navigate to the pipeline that contains the step you want to rerun. In the pipeline editor, hover over the step, click the three dots and select Rerun from the dropdown (see screenshot below).

...

You can configure the following two options:

...

Name

...

Default

...

Description

...

UI Setting

...

Query

...

The query filters the set of items for which you want to rerun the step. Use the standard Squirro query syntax.

Providing a query is optional. If no query is provided, the step runs on all the items of the data sources that are configured to use this pipeline workflow.

...

...

Run linked steps

...

True

...

Check to run any linked steps that are required for ingesting and enriching the data successfully.

If you omit the linked steps, the enrichments of the step you are rerunning are not persisted. Omitting linked steps is meant for development and testing purposes, to check that items run through the enrichment step successfully.

...
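The interaction of the two options can be pictured with a small toy model. This is only a sketch of the behaviour described above, not Squirro's implementation: the `Item` class, `select_items`, `rerun_step`, and the `classify` step are all hypothetical names invented for illustration, and the query is represented by a plain Python predicate rather than real Squirro query syntax.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Item:
    """Hypothetical stand-in for an indexed item (not a Squirro class)."""
    id: str
    source_type: str
    keywords: dict = field(default_factory=dict)


def select_items(items, matches: Optional[Callable[[Item], bool]] = None):
    """Return the items a rerun would process.

    `matches` stands in for the optional query. None means "no query",
    in which case every item is selected.
    """
    if matches is None:
        return list(items)
    return [item for item in items if matches(item)]


def rerun_step(items, step, matches=None, run_linked_steps=True):
    """Apply `step` to the selected items.

    Returns (processed, persisted): the enriched copies and the subset
    that would be stored. Without linked steps, nothing is persisted.
    """
    processed = [step(item) for item in select_items(items, matches)]
    persisted = processed if run_linked_steps else []
    return processed, persisted


def classify(item):
    """Hypothetical enrichment step: returns a copy with an extra keyword."""
    return Item(item.id, item.source_type, {**item.keywords, "label": ["example"]})


items = [Item("a", "ZIP"), Item("b", "Feed"), Item("c", "ZIP")]


def is_zip(item):
    """Hypothetical predicate playing the role of a query on the source type."""
    return item.source_type == "ZIP"


# Rerun with a query and linked steps: only matching items run and are stored.
processed, persisted = rerun_step(items, classify, matches=is_zip)

# Rerun without linked steps: items run through the step, nothing is stored.
dry_processed, dry_persisted = rerun_step(
    items, classify, matches=is_zip, run_linked_steps=False
)
```

In this model, leaving `matches` as `None` corresponds to providing no query, and `run_linked_steps=False` corresponds to the development and testing mode in which items pass through the step but the enrichments are discarded.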

Note

When you add a new step to a pipeline workflow, its Rerun option is not available until you save the pipeline workflow and refresh the page.

Example

...

In the above screenshot, we submit the query source_type:ZIP. The query selects items that were indexed using a ZIP data source. The rerun will not affect any other items from data sources that are configured to use this workflow. Because the Run linked steps option is checked, the 5 steps linked to the Proximity Filter step also run when you click the RERUN button.

This page can now be found at Pipeline Reruns on the Squirro Docs site.