...
The following enrichments can be specified:
Processing Step | Documentation Link |
---|---|
unshorten-link | |
deduplication | Duplicate Detection |
content-augmentation | |
content-conversion | Content Conversion |
language-detection | Language Detection |
boilerplate-removal | Boilerplate Removal |
nearduplicate-detection | Near-Duplicate Detection |
webshot | Webshot |
filtering | Filtering |
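As a rough illustration, the step names in the table can be checked before they are placed in a source configuration. This is a minimal sketch only: `build_processing_config` is a hypothetical helper, and the `{'enabled': ...}` shape it produces is an assumption for this example, not something this page confirms.

```python
# Step names copied from the table above.
KNOWN_STEPS = {
    'unshorten-link', 'deduplication', 'content-augmentation',
    'content-conversion', 'language-detection', 'boilerplate-removal',
    'nearduplicate-detection', 'webshot', 'filtering',
}


def build_processing_config(disabled=(), enabled=()):
    """Map each requested step to {'enabled': bool}.

    Hypothetical helper; the {'enabled': ...} shape is an assumption.
    """
    disabled, enabled = set(disabled), set(enabled)
    unknown = (disabled | enabled) - KNOWN_STEPS
    if unknown:
        raise ValueError('unknown processing steps: %s' % sorted(unknown))
    conflicting = disabled & enabled
    if conflicting:
        raise ValueError('steps both enabled and disabled: %s' % sorted(conflicting))
    config = {step: {'enabled': True} for step in enabled}
    config.update({step: {'enabled': False} for step in disabled})
    return config


# Request that only duplicate detection be switched off.
processing = build_processing_config(disabled=['deduplication'])
```

Rejecting unknown names early means a typo such as `de-duplication` fails immediately rather than being sent to the server unnoticed.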
For example, to set up a Twitter source with duplicate detection disabled, use the following configuration:
...
To mark a source for processing by the new Squirro Pipeline 2.0 (available from Squirro version 2.5.1 onwards):

    from squirro_client import SquirroClient

    client = SquirroClient(None, None, cluster='https://demo-25.squirro.net/')
    client.authenticate(refresh_token='293d…a13b')

    # project_id must already hold the ID of the target project.
    client.new_subscription(
        project_id, object_id='default', provider='bulk',
        config={
            'pipeline': 'ingester',  # selects Pipeline 2.0 for this source
            'name': 'large_subscription_1',
            'ext_id': 'large_subscription_1_id'
        })

...