Built-in enrichments are configured through the processing config. It can enable or disable individual enrichments and pass additional configuration to a step. The processing config can also mark a source or project to use the new Squirro Pipeline 2.0.

...

  • Per source / subscription: when creating a new subscription, the processing instructions can be passed in to fine-tune the behavior for that one source.
  • Per project: a project also has a processing config, which applies to all items coming in for a project.
  • Within Data Loader using the argument --source-config-file.

Source processing config

To set up a processing configuration, specify the processing field in a source's config. The value of that field is again a dictionary, with the enrichment names as keys.
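
As a rough illustration of that nested structure, the following Python sketch builds a source config whose processing field maps enrichment names to their settings. The helper build_source_config is hypothetical and not part of any Squirro library, and the "enabled" setting key is an assumption made for illustration; only the step names are taken from this page.

```python
import json

# Step names as listed in the processing step table on this page.
KNOWN_STEPS = {
    "unshorten-link",
    "deduplication",
    "content-augmentation",
    "content-conversion",
    "language-detection",
    "boilerplate-removal",
    "nearduplicate-detection",
    "webshot",
    "filtering",
}


def build_source_config(step_settings):
    """Return a source config dict with a `processing` field.

    `step_settings` maps enrichment names to a dictionary of settings.
    This helper is hypothetical; it only demonstrates the dictionary shape.
    """
    unknown = set(step_settings) - KNOWN_STEPS
    if unknown:
        raise ValueError(f"Unknown processing steps: {sorted(unknown)}")
    # Copy each settings dict so the caller's input is not shared.
    return {
        "processing": {
            name: dict(settings) for name, settings in step_settings.items()
        }
    }


# Assumption: a step is switched off with an "enabled" flag.
config = build_source_config({"deduplication": {"enabled": False}})
print(json.dumps(config, indent=2))
```

Validating the step names up front catches typos such as "de-duplication" before the config is submitted, since a misspelled key would otherwise just be an extra dictionary entry.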

...

Processing Step            Documentation Link
unshorten-link             Unshorten Link
deduplication              Duplicate Detection
content-augmentation       Content Augmentation
content-conversion         Content Extraction
language-detection         Language Detection
boilerplate-removal        Noise Removal
nearduplicate-detection    Near-Duplicate Detection
webshot                    Thumbnail Extraction
filtering                  Filtering


For example, to set up a Twitter source with duplicate detection disabled, use the following configuration:

...

Project processing config

Please contact the Squirro team if you want to use project processing configs.