...

The pipeline editor has been completely rebuilt. The new editor is more visual and provides a much clearer overview of the various pipelines in a project.

  • In addition, we have laid a lot of groundwork to allow re-running pipeline workflows (for datasources running in the Frontend only), which enables easier experimentation during project setup. For the more technical audience, this is enabled through the underlying configuration options described below; a combined configuration sketch follows the list. The re-running functionality itself will be exposed in the frontend in upcoming releases.

    • We have added a new built-in pipeline step “Transform Input” which performs the item fields and facets mapping. This was previously done in the dataloader but can now be handled in the pipeline itself. The step is controlled using the configuration option item_transformation_in_pipeline; it is disabled by default and should be considered a beta feature for this release.

    • We have introduced a new processed directory in the Ingester: input data is stored in this directory before the pipeline steps are performed. This lets us keep a copy of the raw data and re-run the pipeline without fetching it from the original source again. This behavior is controlled by the configuration option keep_processed_data, which is also disabled by default.

    • We have also extended the Ingester to automatically remove the retained input data after a certain time period or once a disk space threshold is reached, to avoid over-filling the disk. This is controlled by the configuration options days_to_retain_processed_batches and hours_to_retain_processed_batches, and only kicks in when keep_processed_data is enabled.

  • In addition, we now offer three Pipeline Workflow presets: pre-made pipeline workflows whose steps cover various use cases.

    • Minimal

    • Standard

    • Binary Document
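As a rough orientation for how the options above fit together, here is a minimal configuration sketch. Only the option names come from the notes above; the file path, section name, and all values are illustrative assumptions, so consult the Ingester configuration reference for the authoritative details.

    # /etc/squirro/ingester.ini (path and [ingester] section are assumptions)
    [ingester]
    # Run the item fields/facets mapping in the “Transform Input” pipeline
    # step instead of the dataloader (beta; disabled by default).
    item_transformation_in_pipeline = true

    # Keep a copy of the raw input data in the processed directory so a
    # pipeline workflow can be re-run without re-fetching from the source
    # (disabled by default).
    keep_processed_data = true

    # Retention for the kept input data; cleanup applies only while
    # keep_processed_data is enabled (example values).
    days_to_retain_processed_batches = 7
    hours_to_retain_processed_batches = 0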

...

  • We have also added the ability to rename all of the steps in a pipeline workflow to your liking.

  • Pipelets whose names indicate that they perform Known Entity Extraction are now categorized by default in the “Relate” section of the Pipeline Editor.

...