...

Pipelets are written in Python. They need to inherit from the squirro.sdk.PipeletV1 class and implement the consume method. The simplest possible pipelet looks like this:

Code Block
languagepy
from squirro.sdk import PipeletV1
 
class NoopPipelet(PipeletV1):
    def consume(self, item):
        return item

As its name says, it does nothing but return the item unchanged. The item can also be modified before it is returned. For example:

Code Block
languagepy
from squirro.sdk import PipeletV1
 
class ModifyTitlePipelet(PipeletV1):
    def consume(self, item):
        item['title'] = item.get('title', '') + ' - Hello, World!'
        return item
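
The behavior of consume can be checked locally by calling it with plain dictionaries. The following is a minimal sketch, not part of the Squirro SDK: the PipeletV1 class defined here is only a stand-in so that the example runs without the SDK installed, and the sample items are made up for illustration.

```python
# Stand-in for squirro.sdk.PipeletV1 (assumption: used only so that this
# sketch runs without the SDK installed).
class PipeletV1(object):
    pass


class ModifyTitlePipelet(PipeletV1):
    def consume(self, item):
        # A missing title is treated as an empty string.
        item['title'] = item.get('title', '') + ' - Hello, World!'
        return item


pipelet = ModifyTitlePipelet()

# Hypothetical sample items: one with a title, one without.
with_title = pipelet.consume({'id': '1', 'title': 'Quarterly report'})
without_title = pipelet.consume({'id': '2'})

print(with_title['title'])
print(without_title['title'])
```

Note that the item is modified in place and the same dictionary is returned, so an item without a title ends up with the suffix as its entire title.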

...

The pipelet is always called once for each item. In some use cases, however, the pipelet should return several items instead of one. In those cases, use the Python yield statement to return each item individually. For example:

Code Block
languagepy
from squirro.sdk import PipeletV1
 
class ExtendTitlePipelet(PipeletV1):
    def consume(self, item):
        for i in range(10):
            new_item = dict(item)
            new_item['title'] = '{0} ({1})'.format(item.get('title', ''), i)
            yield new_item
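
Because consume contains a yield statement, calling it returns a generator rather than a single item. The sketch below illustrates this by collecting the results into a list. The PipeletV1 class is a stand-in so that the example runs without the SDK, and the sample item is hypothetical.

```python
# Stand-in for squirro.sdk.PipeletV1 (assumption: used only so that this
# sketch runs without the SDK installed).
class PipeletV1(object):
    pass


class ExtendTitlePipelet(PipeletV1):
    def consume(self, item):
        for i in range(10):
            # dict(item) makes a shallow copy: top-level keys are
            # independent, nested values are still shared.
            new_item = dict(item)
            new_item['title'] = '{0} ({1})'.format(item.get('title', ''), i)
            yield new_item


# Hypothetical sample item with a nested value.
original = {'title': 'Report', 'keywords': {'topic': ['finance']}}

# The generator only runs when it is iterated over.
results = list(ExtendTitlePipelet().consume(original))

print(len(results))
print(results[0]['title'], '/', results[-1]['title'])
```

The original item keeps its title, but since the copy is shallow, nested structures such as the keywords dictionary are shared between all yielded items. If a pipelet needs to change nested values per item, copy.deepcopy from the standard library is one way to avoid side effects.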

...

Pipelets are restricted in what they can do. For example, the print statement is disallowed and you cannot import any external libraries except squirro.sdk. If you need access to external libraries, use the @require decorator. For example, to log some output:

Code Block
languagepy
from squirro.sdk import PipeletV1, require

@require('log')
class LoggingPipelet(PipeletV1):
    def consume(self, item):
        self.log.debug('Processing item: %r', item['id'])
        return item
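
To try out a logging pipelet outside of Squirro, the log attribute can be assigned by hand. This sketch assumes that the object provided by @require('log') behaves like a standard logging.Logger, which matches the debug call above but is not confirmed here. PipeletV1 is a stand-in, ListHandler is a hypothetical helper, and item.get is used instead of item['id'] so that an item without an id does not raise a KeyError.

```python
import logging


# Stand-in for squirro.sdk.PipeletV1 (assumption: used only so that this
# sketch runs without the SDK installed).
class PipeletV1(object):
    pass


class LoggingPipelet(PipeletV1):
    def consume(self, item):
        # item.get avoids a KeyError for items that have no id.
        self.log.debug('Processing item: %r', item.get('id'))
        return item


class ListHandler(logging.Handler):
    """Hypothetical helper: collects log messages in a list."""

    def __init__(self):
        logging.Handler.__init__(self)
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


handler = ListHandler()
logger = logging.getLogger('pipelet-sketch')
logger.setLevel(logging.DEBUG)
logger.propagate = False  # keep the sketch's output out of the root logger
logger.addHandler(handler)

pipelet = LoggingPipelet()
# Assigned by hand here; in Squirro the @require('log') decorator
# is what makes self.log available.
pipelet.log = logger

result = pipelet.consume({'id': 'abc', 'title': 'Hello'})
print(handler.messages)
```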

...

HTTP requests can be executed by using the requests dependency. The following pipelet shows an example of sentiment detection:

Code Block
languagepy
from squirro.sdk import PipeletV1, require

@require('requests')
class SentimentPipelet(PipeletV1):
    def consume(self, item):
        text_content = ' '.join([item.get('title', ''),
                                 item.get('body', '')])
        res = self.requests.post('http://example.com/detect',
                                 data={'text': text_content},
                                 headers={'Accept': 'application/json'})
        sentiment = res.json()['sentiment']
        item.setdefault('keywords', {})['sentiment'] = [sentiment]
        return item
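
The example above assumes that the remote service always answers with valid JSON. The following sketch shows one possible way to make such a pipelet more tolerant: it sets a timeout, checks the HTTP status, and returns the item unchanged if anything goes wrong. The class name SafeSentimentPipelet, the 10 second timeout and the fall-back behavior are assumptions made for illustration. PipeletV1, FakeResponse and FakeRequests are stand-ins so that the sketch runs without the SDK or network access; in Squirro the class would be decorated with @require('requests') instead of assigning self.requests by hand.

```python
# Stand-in for squirro.sdk.PipeletV1 (assumption: used only so that this
# sketch runs without the SDK installed).
class PipeletV1(object):
    pass


class SafeSentimentPipelet(PipeletV1):
    """Hypothetical variant of SentimentPipelet that tolerates failures."""

    def consume(self, item):
        # Missing or None fields are treated as empty strings.
        parts = [item.get('title') or '', item.get('body') or '']
        text_content = ' '.join(parts).strip()
        if not text_content:
            return item  # nothing to analyze

        try:
            res = self.requests.post('http://example.com/detect',
                                     data={'text': text_content},
                                     headers={'Accept': 'application/json'},
                                     timeout=10)
            res.raise_for_status()
            sentiment = res.json()['sentiment']
        except Exception:
            # Assumption: passing the item on without a sentiment keyword
            # is preferable to failing the whole item.
            return item

        item.setdefault('keywords', {})['sentiment'] = [sentiment]
        return item


class FakeResponse(object):
    """Hypothetical test double for an HTTP response."""

    def __init__(self, payload, ok=True):
        self.payload = payload
        self.ok = ok

    def raise_for_status(self):
        if not self.ok:
            raise RuntimeError('HTTP error')

    def json(self):
        return self.payload


class FakeRequests(object):
    """Hypothetical test double that records the last POST call."""

    def __init__(self, response):
        self.response = response
        self.last_data = None

    def post(self, url, data=None, headers=None, timeout=None):
        self.last_data = data
        return self.response


working = SafeSentimentPipelet()
working.requests = FakeRequests(FakeResponse({'sentiment': 'positive'}))
ok_item = working.consume({'title': 'Great', 'body': 'news'})

failing = SafeSentimentPipelet()
failing.requests = FakeRequests(FakeResponse({}, ok=False))
failed_item = failing.consume({'title': 'Great', 'body': 'news'})
```

Catching all exceptions keeps the sketch short. A production pipelet would more likely catch specific errors and log them.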

...

Pipelets are only run for items that are processed in the system after the enrichment has been configured. For information on how to process old items with a pipelet, see Rerunning a Pipelet.

Adding Enrichments

Pipelets are added to a project using the Add Enrichment screen in the user interface.

Alternatively, the Enrichments API can be used to add pipelet enrichments to a project. The following example shows how to do this with the Python SDK:

Code Block
# client = SquirroClient(…)
client.create_enrichment(
    'Sz7LLLbyTzy_SddblwIxaA',
    'pipelet',
    'TextRazor',
    {'pipelet': 'tenant-example/Modify Title', 'suffix': ' - Title Modified'}
)

Pipelets can also be set to execute before a specific pipeline stage. The following example outlines how to do that:

Code Block
# client = SquirroClient(…)
client.create_enrichment(
    'Sz7LLLbyTzy_SddblwIxaA',
    'pipelet',
    'TextRazor',
    {'pipelet': 'tenant-example/Modify Title', 'suffix': ' - Title Modified'},
    before='content'
)
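
When several pipelet enrichments are registered from a script, the two call variants shown in this section can be wrapped in a small helper. The following is only a sketch: add_pipelet_enrichment and RecordingClient are hypothetical names that are not part of the Python SDK, and the helper relies solely on the create_enrichment call and the before argument used in the examples of this section. RecordingClient stands in for a real SquirroClient so that the sketch runs without a server.

```python
def add_pipelet_enrichment(client, project_id, name, pipelet,
                           config=None, before=None):
    """Hypothetical helper around client.create_enrichment.

    `pipelet` is the pipelet identifier, for example
    'tenant-example/Modify Title'. `config` holds further options
    and is copied so that the caller's dictionary is not modified.
    """
    enrichment_config = dict(config or {})
    enrichment_config['pipelet'] = pipelet

    # Only pass `before` when a pipeline stage was actually given.
    kwargs = {}
    if before is not None:
        kwargs['before'] = before

    return client.create_enrichment(project_id, 'pipelet', name,
                                    enrichment_config, **kwargs)


class RecordingClient(object):
    """Hypothetical stand-in for SquirroClient that records calls."""

    def __init__(self):
        self.calls = []

    def create_enrichment(self, *args, **kwargs):
        self.calls.append((args, kwargs))


client = RecordingClient()
add_pipelet_enrichment(client, 'Sz7LLLbyTzy_SddblwIxaA', 'TextRazor',
                       'tenant-example/Modify Title',
                       config={'suffix': ' - Title Modified'},
                       before='content')
print(client.calls)
```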