...
When link fetching is enabled, this step will often be combined with the Boilerplate Noise Removal enrichment.
Configuration
Field | Description |
---|---|
fetch_link_content | Boolean value indicating whether to fetch the content from the website referenced by the item's link attribute. Default: false. |
...
    from squirro_client import SquirroClient

    client = SquirroClient(None, None, cluster='https://next.squirro.net/')
    client.authenticate(refresh_token='293d…a13b')

    # Get existing source configuration (including processing configuration)
    source = client.get_subscription(
        project_id='…', object_id='…', subscription_id='…')
    config = source.get('config', {})
    # Read the same key that is written back below
    processing_config = config.get('processing', {})

    # Modify processing configuration
    processing_config['content-augmentation'] = {
        'enabled': True,
        'fetch_link_content': True,
    }
    config['processing'] = processing_config
    client.modify_subscription(
        project_id='…', object_id='…', subscription_id='…', config=config)
In the example above, the processing pipeline is instructed to fetch the content for every new incoming item (from the link attribute) and use it as the item body.
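The example assigns a fresh content-augmentation entry, which discards any other options already stored for that step. The following is a minimal sketch of merging the setting into an existing configuration instead, so that other steps and options are left alone. It assumes the processing settings live under the config's processing key (the key the example writes to). The helper name `enable_link_fetching` and the `deduplication` entry in the sample data are hypothetical and only serve as illustration; they are not part of squirro_client.

```python
import copy


def enable_link_fetching(config, fetch=True):
    """Return a copy of a source config with link fetching switched on or off.

    Hypothetical helper for illustration; not part of squirro_client.
    """
    # Work on a deep copy so the caller's configuration is not mutated.
    updated = copy.deepcopy(config or {})
    # Assumption: processing settings are stored under the 'processing' key.
    processing = updated.setdefault('processing', {})
    # Keep any options already set for the step and only touch these two.
    step = processing.setdefault('content-augmentation', {})
    step['enabled'] = True
    step['fetch_link_content'] = bool(fetch)
    return updated


# Made-up existing configuration with one unrelated step.
existing = {'processing': {'deduplication': {'enabled': True}}}
updated = enable_link_fetching(existing)
print(updated)
```

The returned dictionary could then be passed as the config argument of the modify call shown in the example.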