Squirro comes with a number of built-in enrichments. These can all be enabled and disabled on a per-source as well as a per-project level. Some enrichments also support configuration to fine-tune their behavior.

Overview

Each item that goes through the pipeline is processed sequentially by the individual enrichment steps. The enrichments run in a fixed order, as seen in the following diagram.

The final step of the pipeline is always the indexing into the search index.
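
The sequential processing described above can be pictured with a minimal sketch. It is a simplified model and not Squirro's implementation: the function run_pipeline, the step names and the two toy enrichments are hypothetical, and the sketch assumes that a step runs unless its config sets enabled to false.

```python
from typing import Callable, Dict, List, Tuple

Item = Dict[str, object]
Step = Tuple[str, Callable[[Item], Item]]


def run_pipeline(item: Item, steps: List[Step], config: Dict[str, dict],
                 index: List[Item]) -> Item:
    """Run an item through the steps in their fixed order, then index it."""
    for name, func in steps:
        # Assumption: a step runs unless its config sets "enabled" to False.
        if config.get(name, {}).get("enabled", True):
            item = func(item)
    # Indexing is always the final step of the pipeline.
    index.append(item)
    return item


def mark_dedup_checked(item: Item) -> Item:
    # Hypothetical stand-in for the deduplication step.
    return {**item, "dedup_checked": True}


def add_language(item: Item) -> Item:
    # Hypothetical enrichment that tags every item as English.
    return {**item, "language": "en"}


# The list order models the fixed order of the pipeline.
STEPS: List[Step] = [
    ("deduplication", mark_dedup_checked),
    ("language", add_language),
]

search_index: List[Item] = []
result = run_pipeline(
    {"title": "Hello"},
    STEPS,
    {"deduplication": {"enabled": False}},
    search_index,
)
print(result)  # {'title': 'Hello', 'language': 'en'}
```

Because deduplication is disabled in the config passed in, only the second toy step touches the item before it is appended to the index.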

Most steps are enabled by default, except in these cases:

  • The bulk provider disables a few enrichments by default. This affects all file uploads, Excel imports, etc. The disabled enrichments are: Near-duplicate Detection, Webshot, Unshorten Link.
  • The Diffbot Provider adds additional fields to the deduplication step (title and link are checked as well).
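
The provider-specific defaults above can be modeled as a small lookup. This is only an illustration: the keys are the display names used on this page (the real configuration keys may differ), ALL_STEPS is an arbitrary subset of the enrichments, and the sketch assumes these steps are enabled for every provider other than bulk.

```python
# Display names taken from this page, not actual configuration keys.
ALL_STEPS = {
    "Deduplication",
    "Near-duplicate Detection",
    "Webshot",
    "Unshorten Link",
}

# Enrichments that a provider disables by default.
PROVIDER_DISABLED = {
    "bulk": {"Near-duplicate Detection", "Webshot", "Unshorten Link"},
}

# Extra fields a provider adds to the deduplication check.
DEDUP_EXTRA_FIELDS = {
    "diffbot": ["title", "link"],
}


def default_enabled(provider: str) -> set:
    """Return the steps that are enabled by default for a provider."""
    return ALL_STEPS - PROVIDER_DISABLED.get(provider, set())


print(sorted(default_enabled("bulk")))  # ['Deduplication']
print(DEDUP_EXTRA_FIELDS.get("diffbot", []))  # ['title', 'link']
```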

Whether an enrichment is enabled by default is documented on the page for each individual processing step.

Processing Config

Built-in enrichments are configured with the processing config. It is used both to enable or disable enrichments and to pass additional configuration to a step.

There are two places this configuration can be specified:

  • Per source / subscription: when creating a new subscription, the processing instructions can be passed in to fine-tune the behavior for that one source.
  • Per project: a project also has a processing config, which applies to all items coming in for a project.

Source processing config

To set up a processing configuration, specify the processing field in a source's config. The value of that field is again a dictionary, with the enrichment names as keys.

For example, to set up a Twitter source with duplicate detection disabled, the following configuration would be used:

{
    "query": "Squirro",
    "processing": {
        "deduplication": {
            "enabled": false
        }
    }
}

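Using the Python SDK, a subscription for this could be created with the following code snippet:

from squirro_client import SquirroClient

# Connect to the cluster and authenticate with a refresh token.
client = SquirroClient(None, None, cluster='https://next.squirro.net/')
client.authenticate(refresh_token='293d…a13b')

# project_id must hold the ID of an existing project.
client.new_subscription(project_id, object_id='default', provider='twitter',
    processing_config={
        'query': 'Squirro',
        # Disable duplicate detection for this source only.
        'processing': {
            'deduplication': {
                'enabled': False
            }
        }
    })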

The enabled property is available for every built-in enrichment and can be set to true or false. Some enrichments have additional configuration options, which are described on the corresponding page.
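
When several enrichments need to be switched on or off, the source config can be assembled programmatically. The following is a minimal sketch: build_source_config is a hypothetical helper and not part of the Squirro SDK, and deduplication is the only enrichment name taken from this page.

```python
import json


def build_source_config(query, enabled_overrides):
    """Return a source config whose "processing" dictionary sets "enabled" per enrichment."""
    processing = {}
    for name, enabled in enabled_overrides.items():
        # The enabled property only accepts true or false.
        if not isinstance(enabled, bool):
            raise TypeError("enabled flag for %r must be a bool" % name)
        processing[name] = {"enabled": enabled}
    return {"query": query, "processing": processing}


config = build_source_config("Squirro", {"deduplication": False})
print(json.dumps(config, indent=4))
```

The printed JSON has the same structure as the Twitter source configuration shown earlier, so the resulting dictionary could be passed on wherever that configuration is expected.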

Project processing config

Please contact the Squirro team if you want to use project processing configs.

Available Enrichments

The built-in enrichments are listed below.
