...

Enrichments which can be specified include:

Processing Step            Documentation Link
unshorten-link             Unshorten Link
deduplication              Duplicate Detection
content-augmentation       Content Augmentation
content-conversion         Content Conversion
language-detection         Language Detection
boilerplate-removal        Boilerplate Removal
nearduplicate-detection    Near-Duplicate Detection
webshot                    Webshot
filtering                  Filtering
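
Because the step names are plain strings, a typo is easy to miss. The following is a minimal sketch of a helper that checks names against the list above before they are used. The helper name build_step_toggles and the {"enabled": False} shape it returns are assumptions made for illustration only and are not taken from the Squirro documentation.

```python
# Step identifiers copied from the list of enrichments above.
KNOWN_STEPS = {
    "unshorten-link",
    "deduplication",
    "content-augmentation",
    "content-conversion",
    "language-detection",
    "boilerplate-removal",
    "nearduplicate-detection",
    "webshot",
    "filtering",
}


def build_step_toggles(disabled):
    """Return a mapping that marks each named step as disabled.

    Hypothetical helper: the {"enabled": False} layout is an
    illustrative assumption, not a documented Squirro format.
    """
    unknown = set(disabled) - KNOWN_STEPS
    if unknown:
        # Fail early on misspelled step names.
        raise ValueError(
            "unknown processing step(s): " + ", ".join(sorted(unknown))
        )
    return {step: {"enabled": False} for step in sorted(set(disabled))}


toggles = build_step_toggles(["deduplication"])
```

Rejecting unknown names up front means a misspelled step is reported immediately instead of being silently ignored.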


For example, to set up a Twitter source with duplicate detection disabled, use the following configuration:

...

To mark a source for processing by the new Squirro Pipeline 2.0 (available starting with Squirro version 2.5.1), set 'pipeline' to 'ingester' in the subscription configuration:

from squirro_client import SquirroClient

# project_id must hold the ID of an existing project.
client = SquirroClient(None, None, cluster='https://demo-25.squirro.net/')
client.authenticate(refresh_token='293d…a13b')
client.new_subscription(project_id, object_id='default', provider='bulk',
    config={
        'pipeline': 'ingester',  # route this source through Pipeline 2.0
        'name': 'large_subscription_1',
        'ext_id': 'large_subscription_1_id'
    })
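
When several sources need the same setting, the config dictionary can be generated rather than typed out each time. This is a minimal sketch that uses only the three keys shown above; the helper name ingester_config and the numbering scheme for the names are hypothetical.

```python
def ingester_config(name, ext_id):
    """Build a subscription config that selects the 'ingester' pipeline.

    Hypothetical helper; it only assembles the keys used in the
    example above and does not contact a Squirro server.
    """
    return {
        "pipeline": "ingester",
        "name": name,
        "ext_id": ext_id,
    }


# Illustrative naming scheme for three sources.
configs = [
    ingester_config(f"large_subscription_{i}", f"large_subscription_{i}_id")
    for i in range(1, 4)
]
```

Each resulting dictionary could then be passed as the config argument of client.new_subscription, as in the call above.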

...