Comment: Update to new subscription methods

...

When link fetching is enabled, this step is often combined with the Boilerplate Noise Removal enrichment.

Configuration

Field               Description
fetch_link_content  Boolean value indicating whether to fetch the content from the web site referenced with the link attribute. Default: false.

...

from squirro_client import SquirroClient
 
client = SquirroClient(None, None, cluster='https://next.squirro.net/')
client.authenticate(refresh_token='293d…a13b')
 
# Get existing source configuration (including processing configuration)
source = client.get_subscription(project_id='…', object_id='…', subscription_id='…')
config = source.get('config', {})
processing_config = config.get('processing', {})
 
# Modify processing configuration
processing_config['content-augmentation'] = {
    'enabled': True,
    'fetch_link_content': True,
}
config['processing'] = processing_config
client.modify_subscription(project_id='…', object_id='…', subscription_id='…', config=config)

In the example above, the processing pipeline is instructed to fetch the content of every new incoming item (from its link attribute) and use it as the item body.
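The configuration change itself can be isolated into a small helper that works on plain dictionaries, which makes it easy to check before sending anything to the server. The following is a minimal sketch, not part of the Squirro client: the function name `enable_link_fetching` is hypothetical, it assumes the processing settings live under the `processing` key of the subscription config, and the `deduplication` entry in the usage lines is only a stand-in for some other, already configured step.

```python
import copy


def enable_link_fetching(config):
    """Return a copy of a subscription config with link fetching enabled.

    Hypothetical helper: assumes processing steps are stored under the
    'processing' key, as in the final assignment of the example above.
    """
    # Work on a deep copy so the caller's dictionary is left untouched.
    new_config = copy.deepcopy(config or {})

    # Create the nested dictionaries only if they do not exist yet,
    # so settings of other steps are preserved.
    processing = new_config.setdefault('processing', {})
    step = processing.setdefault('content-augmentation', {})

    step['enabled'] = True
    step['fetch_link_content'] = True
    return new_config


# Usage with a stand-in config; 'deduplication' is an illustrative step name.
existing = {'processing': {'deduplication': {'enabled': True}}}
updated = enable_link_fetching(existing)
```

The returned dictionary could then be passed as the `config` argument of `client.modify_subscription`, in place of the manually assembled `config` in the example.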