Comment: update document given that pipeline 2.0 is the only pipeline in Squirro 2.6.0

...

The Squirro Pipeline 2.0 allows you to import large amounts of data in such a resource-friendly way that Squirro remains responsive even during high-volume data inflow. This document describes the steps you can follow to use Pipeline 2.0 to import large amounts of data into Squirro installations versioned 2.5.1 or 2.5.2. Pipeline 2.0 became available with Squirro version 2.5.1, and is the default and only pipeline with Squirro 2.6.0, released 3rd April 2018.

...

Setting Up Pipeline 2.0

...

Pipeline 2.0 is a good fit whenever you have hundreds of millions of documents to import, or when you don't need all the built-in enrichments, which come at a processing cost and slow down data ingestion. A key feature of Pipeline 2.0 is that you opt in to individual enrichments rather than paying the price of running all of them.
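The opt-in idea can be illustrated with a minimal, generic sketch. This is not Squirro's API: the step names, the `AVAILABLE_STEPS` registry and the `run_pipeline` function are all hypothetical and only show why an item passes through untouched (and cheaply) unless a step is explicitly enabled.

```python
from typing import Callable, Dict, Iterable, List

Item = Dict[str, object]
Step = Callable[[Item], Item]


def add_language(item: Item) -> Item:
    """Hypothetical enrichment: tag every item as English."""
    return {**item, "language": "en"}


def add_word_count(item: Item) -> Item:
    """Hypothetical enrichment: count whitespace-separated words in the body."""
    return {**item, "word_count": len(str(item.get("body", "")).split())}


# Registry of available steps. The names are illustrative only and do not
# correspond to real Squirro enrichment identifiers.
AVAILABLE_STEPS: Dict[str, Step] = {
    "language": add_language,
    "word_count": add_word_count,
}


def run_pipeline(items: Iterable[Item], enabled: Iterable[str] = ()) -> List[Item]:
    """Run only the steps named in `enabled`.

    With nothing enabled, items pass through unchanged, so no enrichment
    cost is paid unless it was asked for.
    """
    steps = [AVAILABLE_STEPS[name] for name in enabled]
    result = []
    for item in items:
        for step in steps:
            item = step(item)
        result.append(item)
    return result
```

Calling `run_pipeline(docs)` returns the items as they came in, while `run_pipeline(docs, ["word_count"])` applies only the one enrichment that was requested.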

Because Pipeline 2.0 can run alongside Pipeline 1.0, you have a couple of options:

...

Squirro Server

...

wide

...

Because Pipeline 2.0 does not run any of the built-in enrichments by default (until you opt in each enrichment individually), switching existing subscriptions may have unintended side-effects. This is because your application and users have likely come to rely on the behavior introduced by the enrichments. In such cases we recommend contacting Squirro support if you are interested in speeding up the import of existing data subscriptions.

...

Pipeline 2.0 relies on a file system to queue data in batches before Squirro inserts the data into Elasticsearch in bulk. Options for storing these temporary data files are:

...

If you elect not to use the local file system, change source_metadata_directory and data_directories accordingly.
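To make the queue-then-bulk-insert pattern concrete, here is a minimal sketch of the general technique. It is an illustration under stated assumptions, not Squirro's implementation: the file naming, the JSON batch format, and the `queue_batches` and `drain_queue` helpers are hypothetical, and `bulk_insert` stands in for whatever performs the bulk write.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List


def queue_batches(items: Iterable[dict], queue_dir: Path, batch_size: int) -> List[Path]:
    """Write items to JSON files holding at most `batch_size` items each."""
    queue_dir.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    batch: List[dict] = []

    def flush() -> None:
        # Persist the current batch (if any) as one file, then start a new one.
        if batch:
            path = queue_dir / f"batch-{len(paths):06d}.json"
            path.write_text(json.dumps(batch))
            paths.append(path)
            batch.clear()

    for item in items:
        batch.append(item)
        if len(batch) >= batch_size:
            flush()
    flush()  # write the final, possibly smaller, batch
    return paths


def drain_queue(queue_dir: Path, bulk_insert: Callable[[List[dict]], None]) -> int:
    """Pass each queued file to `bulk_insert` in one call, then delete the file."""
    total = 0
    for path in sorted(queue_dir.glob("batch-*.json")):
        batch = json.loads(path.read_text())
        bulk_insert(batch)  # hypothetical stand-in for a bulk write
        path.unlink()
        total += len(batch)
    return total


if __name__ == "__main__":
    # Hypothetical queue location inside a throwaway directory.
    demo_dir = Path(tempfile.mkdtemp()) / "queue"
    queue_batches(({"id": i} for i in range(5)), demo_dir, batch_size=2)
    inserted = drain_queue(demo_dir, lambda batch: print("bulk insert of", len(batch)))
    print("total items:", inserted)
```

Because every batch is written to disk before it is inserted, the chosen file system needs room for the whole backlog, and it is read and written mostly sequentially.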

...

Configuring Pipeline 2.0 for Individual Subscriptions

As in the previous section, ensure that there is enough space and sequential I/O capacity for the temporarily queued data files.
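A quick way to check the space requirement before a large import is sketched below. The `free_space_report` helper and its threshold are hypothetical; pass in whatever directories you configured for queued data. The sketch checks free space only and does not measure sequential I/O capacity.

```python
import shutil
from pathlib import Path
from typing import Dict, Iterable


def free_space_report(directories: Iterable[str], min_free_bytes: int) -> Dict[str, bool]:
    """Map each directory to True if its file system has at least `min_free_bytes` free."""
    report: Dict[str, bool] = {}
    for directory in directories:
        path = Path(directory)
        # shutil.disk_usage needs an existing path, so fall back to the
        # nearest existing parent for directories that are not created yet.
        while not path.exists() and path != path.parent:
            path = path.parent
        report[str(directory)] = shutil.disk_usage(path).free >= min_free_bytes
    return report
```

For example, `free_space_report(["/path/to/queue"], 50 * 1024**3)` reports whether that (placeholder) location has at least 50 GiB free.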

...