

The pipeline editor lets you rerun individual steps of a pipeline workflow on the data that the pipeline workflow is configured to process. To rerun an entire pipeline workflow, see the documentation here.

Overview

Rerunning an individual step is typically required when you have added or changed an enrichment in your pipeline workflow and want to (re-)apply that enrichment to data that is already indexed.

For example, suppose you ingested a set of PDF documents some time ago using a computationally expensive workflow for binary documents. You now want to run these documents through a recently developed text classification model to enrich them with entities derived from the classification. Instead of rerunning the entire workflow, you can run only the classification step.

Configuration

Navigate to the pipeline workflow that contains the step you want to rerun. In the pipeline editor, hover over the step, click the three dots, and select Rerun from the dropdown menu (see screenshot below).

You can configure the following two options:

Query
Default: empty
The query filters the set of items on which the step is rerun. Use the standard Squirro query syntax.
Providing a query is optional. If no query is provided, the step runs on all items of the data sources that are configured to use this pipeline workflow.

Run linked steps
Default: True
When checked, any linked steps that are required to ingest and enrich the data successfully also run.
If you omit the linked steps, the enrichments of the step you are rerunning are not persisted. Running a step without its linked steps is intended for development and testing, for example to check that items pass through the enrichment step successfully.
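The effect of the Run linked steps option can be sketched as a small Python model. This is a hypothetical illustration, not Squirro's implementation or API: the `plan_rerun` function, the `linked` mapping from a step to its linked steps, the step names other than Proximity Filter, and the assumption that selected steps execute in pipeline order are all made up for the example. It captures the two documented behaviors: with the option checked (the default), the target step runs together with its linked steps; without it, only the target step runs and its enrichments are not persisted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RerunPlan:
    steps: tuple[str, ...]     # steps that execute, in pipeline order
    persist_enrichments: bool  # whether the rerun's enrichments are kept


def plan_rerun(pipeline, target, linked, run_linked_steps=True):
    """Hypothetical model of a single-step rerun (not the Squirro API).

    pipeline: ordered list of step names in the workflow.
    target: the step selected for Rerun.
    linked: assumed mapping from a step name to the names of its linked steps.
    """
    if target not in pipeline:
        raise ValueError(f"unknown step: {target!r}")
    selected = {target}
    if run_linked_steps:
        selected.update(linked.get(target, ()))
    # Assumption: selected steps execute in the order they appear in the pipeline.
    ordered = tuple(step for step in pipeline if step in selected)
    # Per the documentation, enrichments persist only when linked steps run.
    return RerunPlan(steps=ordered, persist_enrichments=run_linked_steps)


# Hypothetical workflow; only "Proximity Filter" comes from the documentation.
pipeline = ["Load items", "Tokenize", "Proximity Filter", "Classify", "Index"]
linked = {"Proximity Filter": ["Load items", "Index"]}

full = plan_rerun(pipeline, "Proximity Filter", linked)
# full.steps == ("Load items", "Proximity Filter", "Index"); enrichments persist
dry = plan_rerun(pipeline, "Proximity Filter", linked, run_linked_steps=False)
# dry.steps == ("Proximity Filter",); enrichments are not persisted
```

Modeling the persistence flag as a direct copy of `run_linked_steps` mirrors the rule stated above; how the real product decides which steps are linked is not described here and is left to the hypothetical `linked` mapping.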

Note: when you add a new step to a pipeline workflow, its Rerun option becomes available only after you save the pipeline workflow and refresh the page.

Example

In the above screenshot, the query source_type:ZIP is submitted. It selects items that were indexed through a ZIP data source, so the rerun does not affect any other items from the data sources that are configured to use this workflow. Because the Run linked steps option is checked, the five steps linked to the Proximity Filter step also run when you click the RERUN button.
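The scope of a rerun like the one in this example can be modeled in a few lines of Python. This is a hypothetical sketch: the `Item` class, the `select_items` function, the data source names, and the field values are assumptions, and the query handling supports only a single `field:value` term such as `source_type:ZIP`, not the full Squirro query syntax. It reflects the documented behavior: only items from data sources that use this pipeline workflow are considered, an empty query selects all of them, and a query narrows the selection further.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    item_id: str
    source: str                          # hypothetical data source name
    fields: dict = field(default_factory=dict)


def select_items(items, pipeline_sources, query=""):
    """Hypothetical model of which items a step rerun touches.

    Only a single `field:value` term is supported; real Squirro
    queries are far richer.
    """
    # Only items from data sources configured to use this pipeline workflow.
    in_scope = [item for item in items if item.source in pipeline_sources]
    if not query:
        return in_scope  # no query: every in-scope item
    name, sep, value = query.partition(":")
    if not sep:
        raise ValueError(f"expected field:value, got {query!r}")
    return [item for item in in_scope if item.fields.get(name) == value]


# Hypothetical items; "c" comes from a source that uses a different workflow.
items = [
    Item("a", "zip-archive", {"source_type": "ZIP"}),
    Item("b", "csv-upload", {"source_type": "CSV"}),
    Item("c", "other-workflow-source", {"source_type": "ZIP"}),
]
sources = {"zip-archive", "csv-upload"}  # sources that use this workflow

zip_only = select_items(items, sources, "source_type:ZIP")  # only item "a"
everything = select_items(items, sources)                   # items "a" and "b"
```

Filtering by data source before applying the query is what keeps item "c" out of the rerun even though it also matches `source_type:ZIP`.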
