Built-in enrichments are configured with the processing config. It can be used to enable or disable enrichments and to add configuration for a step. In addition, the processing config can mark a source or project to use the new Squirro Pipeline 2.0.
...
- Per source / subscription: when creating a new subscription, the processing instructions can be passed in to fine-tune the behavior for that one source.
- Per project: a project also has a processing config, which applies to all items coming in for a project.
- Within the Data Loader, using the --source-config-file argument.
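For the Data Loader case, the Python snippet below is a minimal sketch that writes a processing config to a JSON file. The function name write_source_config_file is hypothetical. The file layout (a single JSON object with a top-level "processing" key) and the "enabled": False setting are assumptions that this page does not confirm, so check the Data Loader reference before relying on them.

```python
import json
import tempfile
from pathlib import Path


def write_source_config_file(path, processing):
    """Write a JSON file meant to be passed via --source-config-file.

    Hypothetical helper. Assumption: the file holds one JSON object whose
    "processing" key maps enrichment names to per-step settings.
    """
    Path(path).write_text(json.dumps({"processing": processing}, indent=2))


# Example: write the file to a temporary directory and read it back.
target = Path(tempfile.mkdtemp()) / "source_config.json"
# Assumption: {"enabled": False} switches a step off.
write_source_config_file(target, {"deduplication": {"enabled": False}})
print(json.loads(target.read_text()))
```

The resulting file path would then be given to the Data Loader through the --source-config-file argument.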
Source processing config
To set up a processing configuration, specify the processing field in a source's config. The value of that field is itself a dictionary, with the enrichment names as keys.
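The shape of the processing field can be sketched in Python as a plain dictionary keyed by enrichment name. The helper build_processing_config is hypothetical. The per-step values shown, in particular "enabled": False as the way to disable a step, are assumptions for illustration; the real options for each step are described on that step's documentation page.

```python
import json


def build_processing_config(disabled_steps, step_options=None):
    """Return a processing dict keyed by enrichment name.

    Hypothetical helper. Assumption: each value is a dict, and an
    "enabled": False entry switches the step off.
    """
    processing = {}
    for step in disabled_steps:
        processing[step] = {"enabled": False}
    for step, options in (step_options or {}).items():
        # Merge extra per-step options into any existing entry for that step.
        processing.setdefault(step, {}).update(options)
    return processing


source_config = {
    # Other source settings would sit alongside "processing".
    "processing": build_processing_config(["deduplication"]),
}
print(json.dumps(source_config, indent=2))
```

Keeping disabled steps and extra options in one dictionary mirrors the description above: one key per enrichment, with that step's settings as the value.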
...
Processing Step | Documentation Link |
---|---|
unshorten-link | |
deduplication | Duplicate Detection |
content-augmentation | |
content-conversion | Content Conversion / Extraction |
language-detection | Language Detection |
boilerplate-removal | Boilerplate Noise Removal |
nearduplicate-detection | Near-Duplicate Detection |
webshot | Thumbnail Extraction |
filtering | Filtering |
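Because the keys of the processing dictionary must match the step names in the table above, a small check can catch typos before a config is submitted. This Python sketch is an illustration, not part of Squirro. KNOWN_STEPS is copied from the table and may differ between Squirro versions, and it assumes the thumbnail step's key is "webshot".

```python
# Step names copied from the table; treat this as a snapshot, since the
# available steps may differ between Squirro versions.
KNOWN_STEPS = frozenset({
    "unshorten-link",
    "deduplication",
    "content-augmentation",
    "content-conversion",
    "language-detection",
    "boilerplate-removal",
    "nearduplicate-detection",
    "webshot",
    "filtering",
})


def unknown_steps(processing):
    """Return the keys of a processing dict that are not in KNOWN_STEPS, sorted."""
    return sorted(set(processing) - KNOWN_STEPS)


# "dedupe" is a deliberate typo for "deduplication".
config = {"deduplication": {}, "dedupe": {}}
print(unknown_steps(config))  # ['dedupe']
```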
For example, to set up a Twitter source with duplicate detection disabled, use the following configuration:
...
Project processing config
Please contact the Squirro team if you want to use project processing configs.