...
The Pipeline Editor has been rebuilt from scratch. The new editor is more visual and gives a much clearer overview of the pipelines in a project.
In addition, we have laid much of the groundwork for re-running pipeline workflows (for datasources running in the Frontend only), which makes it easier to experiment during project setup. For a more technical audience, this groundwork is exposed through the underlying configuration options described below. The re-run functionality itself will be added to the Frontend in upcoming releases.
We have added a new built-in pipeline step, “Transform Input”, which maps item fields and facets. This mapping was previously done in the dataloader, but it can now be handled in the pipeline. The step is controlled by the configuration option item_transformation_in_pipeline. It is disabled by default and should be considered a beta feature in this release.
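To illustrate what the “Transform Input” step does, the sketch below maps a raw record onto an item's fields and facets. It rests on several assumptions. The function name transform_input, the field_map and facet_map dictionaries, and the item layout with a facets key are all hypothetical. None of them is the product's actual API.

```python
from __future__ import annotations

from typing import Any


def transform_input(
    record: dict[str, Any],
    field_map: dict[str, str],
    facet_map: dict[str, str],
) -> dict[str, Any]:
    """Map one raw source record onto an item with fields and facets.

    Hypothetical sketch: field_map and facet_map stand in for whatever
    mapping configuration the datasource actually provides.
    """
    item: dict[str, Any] = {"facets": {}}

    # Copy mapped source columns onto item fields (e.g. title, body).
    for source_key, item_field in field_map.items():
        if source_key in record:
            item[item_field] = record[source_key]

    # Copy mapped source columns onto facets, normalizing values to lists
    # so single- and multi-valued facets share one shape.
    for source_key, facet_name in facet_map.items():
        value = record.get(source_key)
        if value is None:
            continue
        item["facets"][facet_name] = value if isinstance(value, list) else [value]

    return item


if __name__ == "__main__":
    raw = {"headline": "Q3 report", "text": "Revenue grew.", "region": "EMEA"}
    print(transform_input(raw, {"headline": "title", "text": "body"}, {"region": "Region"}))
```

Running this mapping as a pipeline step rather than in the dataloader means it can be repeated on stored input, which fits the re-run groundwork described above.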
We have introduced a new processed directory in the Ingester, where the input data is stored before any pipeline steps run. This lets us keep a copy of the raw data in the processed directory after the pipeline has executed, so the pipeline can be re-run without fetching the data from the original source again. This behavior is controlled by the configuration option keep_processed_data, which is also disabled by default.
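The following is a minimal sketch of the processed-directory lifecycle. It assumes each batch is stored as a JSON file named after a batch ID. The functions process_batch and rerun_batch, the file layout, and the pipeline callable are hypothetical. Only the option name keep_processed_data comes from these notes.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical: a pipeline is any callable that transforms a list of records.
Pipeline = Callable[[list], list]


def process_batch(
    batch_id: str,
    records: list[dict],
    processed_dir: Path,
    pipeline: Pipeline,
    keep_processed_data: bool = False,
) -> list[dict]:
    """Store a batch in processed_dir, run the pipeline, optionally keep it."""
    processed_dir.mkdir(parents=True, exist_ok=True)
    batch_file = processed_dir / f"{batch_id}.json"

    # Persist the raw input before any pipeline step runs.
    batch_file.write_text(json.dumps(records))

    # Run the pipeline on the stored copy.
    result = pipeline(json.loads(batch_file.read_text()))

    # With keep_processed_data disabled (the default), drop the copy afterwards.
    if not keep_processed_data:
        batch_file.unlink()
    return result


def rerun_batch(batch_id: str, processed_dir: Path, pipeline: Pipeline) -> list[dict]:
    """Re-run a kept batch without going back to the original source."""
    batch_file = processed_dir / f"{batch_id}.json"
    return pipeline(json.loads(batch_file.read_text()))


if __name__ == "__main__":
    def upper_titles(records: list[dict]) -> list[dict]:
        return [{**r, "title": r["title"].upper()} for r in records]

    with tempfile.TemporaryDirectory() as tmp:
        processed = Path(tmp) / "processed"
        process_batch("b1", [{"title": "a"}], processed, upper_titles, keep_processed_data=True)
        print(rerun_batch("b1", processed, upper_titles))
```

Storing the input before the first step runs means a re-run always starts from the unmodified source data, whatever a previous pipeline configuration did to it.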
We have also extended the Ingester to automatically remove the input data after a certain time period or disk space threshold, to prevent the disk from filling up. This is controlled by the configuration options days_to_retain_processed_batches and hours_to_retain_processed_batches. This mechanism applies only when keep_processed_data is enabled.
In addition, we now offer three Pipeline Workflow presets. Each preset is a pre-made Pipeline Workflow whose steps cover a different use case:
Minimal
Standard
Binary Document
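The retention options for the processed directory, days_to_retain_processed_batches and hours_to_retain_processed_batches, can be pictured as a periodic sweep. This sketch rests on assumptions that these notes do not confirm. It treats the two options as adding up to a single retention window and uses file modification times as a batch's age. The disk space threshold is not modelled, and purge_expired_batches is a hypothetical name.

```python
from __future__ import annotations

import os
import tempfile
import time
from pathlib import Path

SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR


def purge_expired_batches(
    processed_dir: Path,
    keep_processed_data: bool,
    days_to_retain_processed_batches: int = 0,
    hours_to_retain_processed_batches: int = 0,
    now: float | None = None,
) -> list[Path]:
    """Remove batch files older than the retention window and return them."""
    # Without keep_processed_data, batches are already removed after the
    # pipeline runs, so the retention sweep has nothing to do.
    if not keep_processed_data:
        return []

    # Assumption: the day and hour options add up to one retention window.
    retention = (
        days_to_retain_processed_batches * SECONDS_PER_DAY
        + hours_to_retain_processed_batches * SECONDS_PER_HOUR
    )
    now = time.time() if now is None else now

    removed = []
    for batch_file in sorted(processed_dir.iterdir()):
        # Assumption: a file's modification time marks when the batch was stored.
        if batch_file.is_file() and now - batch_file.stat().st_mtime > retention:
            batch_file.unlink()
            removed.append(batch_file)
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        processed = Path(tmp)
        old_batch = processed / "old.json"
        old_batch.write_text("[]")
        (processed / "recent.json").write_text("[]")
        three_days_ago = time.time() - 3 * SECONDS_PER_DAY
        os.utime(old_batch, (three_days_ago, three_days_ago))
        print(purge_expired_batches(processed, True, days_to_retain_processed_batches=2))
```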
...
You can now also rename any step in a pipeline workflow.
Pipelets whose names indicate that they perform Known Entity Extraction are now placed in the “Relate” section of the Pipeline Editor by default.
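The name-based categorization can be pictured as a simple substring check on a normalized pipelet name. The matching rule, the helper default_section, and the fallback section name "Other" are assumptions for illustration. The notes only state that names hinting at Known Entity Extraction default to “Relate”.

```python
import re

RELATE_SECTION = "Relate"


def default_section(pipelet_name: str, fallback: str = "Other") -> str:
    """Pick a default Pipeline Editor section from a pipelet's name (sketch)."""
    # Split CamelCase ("KnownEntityExtraction" -> "Known Entity Extraction").
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", pipelet_name)
    # Treat underscores, hyphens and runs of whitespace as single spaces.
    normalized = re.sub(r"[\s_\-]+", " ", spaced).strip().lower()
    # "known entit" matches both "known entity" and "known entities".
    if "known entit" in normalized:
        return RELATE_SECTION
    return fallback
```

A name check like this only sets the default, which fits with users still being able to rename steps afterwards.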
...