...
Pipelets are written in Python. They need to inherit from the squirro.sdk.PipeletV1 class and implement the consume method. The simplest possible pipelet looks like this:
Code Block |
---|
from squirro.sdk import PipeletV1

class NoopPipelet(PipeletV1):
    def consume(self, item):
        return item |
As its name suggests, it does nothing but return the item unchanged. The item can also be modified before it is returned. For example:
Code Block |
---|
from squirro.sdk import PipeletV1

class ModifyTitlePipelet(PipeletV1):
    def consume(self, item):
        item['title'] = item.get('title', '') + ' - Hello, World!'
        return item |
...
The pipelet is always called for each item individually. In some use cases, however, the pipelet should return multiple items rather than just one. In those cases, use the Python yield statement to return each individual item. For example:
Code Block |
---|
from squirro.sdk import PipeletV1

class ExtendTitlePipelet(PipeletV1):
    def consume(self, item):
        for i in range(10):
            new_item = dict(item)
            new_item['title'] = '{0} ({1})'.format(item.get('title', ''), i)
            yield new_item |
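Because consume can either return one item or yield several, code that exercises a pipelet locally has to cope with both shapes. The following is a minimal sketch of such a check, meant to run outside Squirro. The collect_items helper is hypothetical and not part of the SDK, and the fallback PipeletV1 class is only a stand-in so the example runs where the SDK is not installed.

```python
import types

try:
    from squirro.sdk import PipeletV1
except ImportError:
    # Stand-in base class (assumption) so the sketch runs without the SDK.
    class PipeletV1(object):
        pass


class NoopPipelet(PipeletV1):
    def consume(self, item):
        return item


class ExtendTitlePipelet(PipeletV1):
    def consume(self, item):
        # Three copies instead of ten to keep the output short.
        for i in range(3):
            new_item = dict(item)
            new_item['title'] = '{0} ({1})'.format(item.get('title', ''), i)
            yield new_item


def collect_items(pipelet, item):
    """Hypothetical local helper: call consume() and always return a list."""
    result = pipelet.consume(item)
    if isinstance(result, types.GeneratorType):
        return list(result)
    return [result]


single = collect_items(NoopPipelet(), {'title': 'Report'})
multiple = collect_items(ExtendTitlePipelet(), {'title': 'Report'})
titles = [entry['title'] for entry in multiple]
```

Note that dict(item) makes a shallow copy: nested values such as a keywords dictionary are shared between the yielded items, so changing one in place changes it for all of them.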
...
Pipelets are restricted in what they can do. For example, the print statement is disallowed and you cannot import any external libraries except squirro.sdk. If you do need access to external libraries, use the @require decorator. For example, to log some output:
Code Block |
---|
from squirro.sdk import PipeletV1, require

@require('log')
class LoggingPipelet(PipeletV1):
    def consume(self, item):
        self.log.debug('Processing item: %r', item['id'])
        return item |
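To try a logging pipelet outside Squirro, the log dependency can be supplied by hand. The sketch below is one possible way to do that and makes several assumptions: the fallback PipeletV1 and require definitions are no-op stand-ins, not the SDK's implementation; the logger is attached manually instead of being injected; and item.get('id') is used so that an item without an id does not raise a KeyError.

```python
import logging

try:
    from squirro.sdk import PipeletV1, require
except ImportError:
    # Stand-ins (assumption) so the sketch runs without the SDK installed.
    class PipeletV1(object):
        pass

    def require(name):
        # No-op replacement: the dependency is attached by hand below.
        def decorator(cls):
            return cls
        return decorator


@require('log')
class LoggingPipelet(PipeletV1):
    def consume(self, item):
        self.log.debug('Processing item: %r', item.get('id'))
        return item


class ListHandler(logging.Handler):
    """Collects formatted log messages in a list for inspection."""

    def __init__(self):
        logging.Handler.__init__(self)
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


handler = ListHandler()
logger = logging.getLogger('pipelet-demo')
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)

pipelet = LoggingPipelet()
pipelet.log = logger  # attached manually for this local run
result = pipelet.consume({'title': 'No id yet'})
```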
...
HTTP requests can be executed by using the requests dependency. The following pipelet shows an example for sentiment detection:
Code Block |
---|
from squirro.sdk import PipeletV1, require

@require('requests')
class SentimentPipelet(PipeletV1):
    def consume(self, item):
        text_content = ' '.join([item.get('title', ''), item.get('body', '')])
        res = self.requests.post('http://example.com/detect',
                                 data={'text': text_content},
                                 headers={'Accept': 'application/json'})
        sentiment = res.json()['sentiment']
        item.setdefault('keywords', {})['sentiment'] = [sentiment]
        return item |
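A pipelet that depends on a remote service is easier to check when the HTTP client is passed in explicitly. The following sketch moves the sentiment logic into a plain function and runs it against a fake client. The names add_sentiment, FakeHttp and FakeResponse are hypothetical, and the timeout argument and raise_for_status() call assume that self.requests behaves like the standard requests library.

```python
def add_sentiment(item, http, url='http://example.com/detect'):
    """Attach a sentiment keyword to the item using the given HTTP client."""
    text_content = ' '.join([item.get('title', ''), item.get('body', '')])
    res = http.post(url,
                    data={'text': text_content},
                    headers={'Accept': 'application/json'},
                    timeout=10)  # avoid waiting indefinitely on a slow service
    res.raise_for_status()  # surface HTTP errors instead of parsing an error page
    sentiment = res.json()['sentiment']
    item.setdefault('keywords', {})['sentiment'] = [sentiment]
    return item


class FakeResponse(object):
    """Minimal stand-in for a requests response."""

    def __init__(self, payload):
        self._payload = payload

    def raise_for_status(self):
        pass

    def json(self):
        return self._payload


class FakeHttp(object):
    """Records calls and always answers with a fixed sentiment."""

    def __init__(self):
        self.calls = []

    def post(self, url, data=None, headers=None, timeout=None):
        self.calls.append((url, data))
        return FakeResponse({'sentiment': 'positive'})


fake_http = FakeHttp()
enriched = add_sentiment({'title': 'Great', 'body': 'product'}, fake_http)
```

Inside a pipelet, consume could then reduce to return add_sentiment(item, self.requests), under the same assumption about self.requests.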
...
Pipelets are only run for items that are processed in the system after the enrichment has been configured. For information on how to process old items with a pipelet, see Rerunning a Pipelet.
Adding Enrichments
Pipelets are added to a project using the Add Enrichment screen in the user interface.
Alternatively, the Enrichments API can be used to add pipelet enrichments to a project. The following example shows how to do this with the Python SDK:
Code Block |
---|
# client = SquirroClient(…)
client.create_enrichment(
'Sz7LLLbyTzy_SddblwIxaA',
'pipelet',
'TextRazor', {'pipelet': 'tenant-example/Modify Title', 'suffix': ' - Title Modified'}
) |
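The configuration dictionary above carries a suffix entry next to the pipelet name. This section does not show how a pipelet receives that dictionary, so the following sketch rests on an assumption: that the configuration is passed to the pipelet's constructor. The fallback PipeletV1 class is again only a stand-in for running the example without the SDK.

```python
try:
    from squirro.sdk import PipeletV1
except ImportError:
    # Stand-in base class (assumption) so the sketch runs without the SDK.
    class PipeletV1(object):
        pass


class ModifyTitlePipelet(PipeletV1):
    def __init__(self, config):
        # Assumption: the enrichment configuration dictionary arrives here.
        self.config = config or {}

    def consume(self, item):
        # Fall back to the hard-coded suffix when none is configured.
        suffix = self.config.get('suffix', ' - Hello, World!')
        item['title'] = item.get('title', '') + suffix
        return item


config = {'pipelet': 'tenant-example/Modify Title',
          'suffix': ' - Title Modified'}
modified = ModifyTitlePipelet(config).consume({'title': 'Quarterly Report'})
default = ModifyTitlePipelet(None).consume({})
```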
Pipelets can also be set to execute before a specific pipeline stage. The following example shows how to do that:
Code Block |
---|
# client = SquirroClient(…)
client.create_enrichment(
'Sz7LLLbyTzy_SddblwIxaA',
'pipelet',
'TextRazor', {'pipelet': 'tenant-example/Modify Title', 'suffix': ' - Title Modified'},
before='content'
) |