Built-in enrichments are configured through the processing config. It can be used to enable or disable enrichments and to pass additional configuration to a step. This configuration can be specified in two places:
- Per source / subscription: when creating a new subscription, the processing instructions can be passed in to fine-tune the behavior for that one source.
- Per project: a project also has a processing config, which applies to all items coming in for a project.
Source processing config
To set up a processing configuration, specify the processing field in a source's config. The value of that field is a dictionary with the enrichment names as keys.
Enrichments which can be specified include:
Processing Step | Documentation Link |
---|---|
unshorten-link | |
deduplication | Duplicate Detection |
content-augmentation | |
content-conversion | Content Conversion |
language-detection | Language Detection |
boilerplate-removal | Boilerplate Removal |
nearduplicate-detection | Near-Duplicate Detection |
webshot | Webshot |
filtering | Filtering |
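As a minimal sketch of how the step names in the table map onto dictionary keys, the helper below builds the processing value from a mapping of step name to on/off flag. The names KNOWN_STEPS and build_processing are hypothetical and not part of the Squirro SDK; the check for unknown names is an assumption added here to catch typos locally.

```python
# Step names copied from the table of processing steps.
KNOWN_STEPS = {
    "unshorten-link",
    "deduplication",
    "content-augmentation",
    "content-conversion",
    "language-detection",
    "boilerplate-removal",
    "nearduplicate-detection",
    "webshot",
    "filtering",
}


def build_processing(enabled_by_step):
    """Return a processing dict such as {"deduplication": {"enabled": False}}.

    `enabled_by_step` maps a step name to True or False. Unknown step names
    raise ValueError so that typos are caught before the config is sent.
    (Hypothetical helper, not part of the Squirro SDK.)
    """
    unknown = sorted(set(enabled_by_step) - KNOWN_STEPS)
    if unknown:
        raise ValueError("unknown processing steps: " + ", ".join(unknown))
    return {step: {"enabled": bool(flag)} for step, flag in enabled_by_step.items()}


# A source config with two enrichments switched off.
source_config = {
    "query": "Squirro",
    "processing": build_processing({"deduplication": False, "webshot": False}),
}
```

The resulting source_config has the same shape as a hand-written source config, with one key per enrichment under processing.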
For example, to set up a Twitter source with duplicate detection disabled, use the following configuration:
    {
        "query": "Squirro",
        "processing": {
            "deduplication": {
                "enabled": false
            }
        }
    }
Using the Python SDK, a subscription for this could be created with the following code snippet:
    from squirro_client import SquirroClient

    # project_id is assumed to hold the ID of an existing project.
    client = SquirroClient(None, None, cluster='https://next.squirro.net/')
    client.authenticate(refresh_token='293d…a13b')
    client.new_subscription(
        project_id, object_id='default', provider='twitter',
        processing_config={
            'query': 'Squirro',
            'processing': {'deduplication': {'enabled': False}},
        })
The enabled property is available for every built-in enrichment and can be set to true or false. Some of the enrichments have additional configuration options, which are described on the corresponding page.
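The following sketch illustrates toggling the enabled property on an existing source config while leaving any other options for that step untouched. The helper name set_enabled is hypothetical and not part of the Squirro SDK. It also shows that Python's False is written as false once the config is serialized to JSON.

```python
import copy
import json


def set_enabled(source_config, step, enabled):
    """Return a copy of source_config with `step` switched on or off.

    Other options already present for the step are left untouched.
    (Hypothetical helper, not part of the Squirro SDK.)
    """
    updated = copy.deepcopy(source_config)
    step_options = updated.setdefault("processing", {}).setdefault(step, {})
    step_options["enabled"] = bool(enabled)
    return updated


original = {"query": "Squirro"}
config = set_enabled(original, "deduplication", False)

# Python's False is serialized as JSON false.
print(json.dumps(config))
# {"query": "Squirro", "processing": {"deduplication": {"enabled": false}}}
```

Working on a deep copy keeps the original config unchanged, which can be convenient when the same base config is reused for several sources.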
Project processing config
Please contact the Squirro team if you want to use project processing configs.