Besides publishing models directly from the AI Studio or importing published models through project import, you can also use the SquirroClient to publish ML models to the Squirro pipeline.

This guide explains:

  • How to publish a new ML model by submitting a complete ML workflow configuration

  • How to publish an already existing ML workflow as a model to the Squirro pipeline

Publish a New ML Model

Using Python, connect and authenticate with the SquirroClient:

Code Block
languagepy
from squirro_client import SquirroClient

cluster = '<YOUR CLUSTER>'
project_id='<YOUR PROJECT_ID>'
token = '<YOUR TOKEN>'

client = SquirroClient(None, None, cluster=cluster)
client.authenticate(refresh_token=token)
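
As an optional sanity check, you can fetch the project before going further to confirm that the cluster, token, and project ID are valid. This is a minimal sketch; it assumes the project ID above refers to an existing project you can access:

Code Block
languagepy
# Optional sanity check: fetch the project to confirm that authentication
# worked and the project ID is valid before publishing anything.
project = client.get_project(project_id)
print(project.get('title'))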

Define the workflow configuration:

Code Block
languagepy
config = \
{'dataset': {},
 'pipeline': [{'fields': ['body'], 'step': 'loader', 'type': 'squirro_query'},
              {'fields': ['body'],
               'mark_as_skipped': True,
               'step': 'filter',
               'type': 'empty'},
              {'cleaning': {'approx.': 'approx',
                            'etc.': 'etc',
                            'i.e.': 'ie'},
               'input_fields': ['body'],
               'output_fields': ['extract_sentences'],
               'rules': ['**',
                         '...',
                         '…',
                         ': '],
               'step': 'tokenizer',
               'type': 'sentences_nltk'},
              {'fields': ['extract_sentences'],
               'step': 'filter',
               'type': 'doc_split'},
              {'input_fields': ['extract_sentences'],
               'output_fields': ['extract_sentences'],
               'step': 'tokenizer',
               'type': 'html'},
              {'fields': ['extract_sentences'],
               'step': 'filter',
               'type': 'doc_split'},
              {'input_fields': ['extract_sentences'],
               'output_fields': ['sentences_normalized'],
               'step': 'normalizer',
               'type': 'html'},
              {'fields': ['sentences_normalized'],
               'mark_as_skipped': True,
               'step': 'filter',
               'type': 'regex',
               'whitelist_regexes': ['^.{20,}$']},
              {'blacklist_terms': [],
               'fields': ['sentences_normalized'],
               'matching_label': 'tax_rate1',
               'name': './models/ais/proximity',
               'non_matching_label': 'not_tax_rate1_tax_rate2',
               'output_field': 'prediction_tax_rate1',
               'step': 'filter',
               'type': 'proximity',
               'whitelist_terms': ['tax rate of~1|','tax rate~2|']},
              {'blacklist_terms': [],
               'fields': ['sentences_normalized'],
               'matching_label': 'tax_rate2',
               'name': './models/ais/proximity',
               'non_matching_label': 'not_tax_rate1_tax_rate2',
               'output_field': 'prediction_tax_rate2',
               'step': 'filter',
               'type': 'proximity',
               'whitelist_terms': ['tax rate~4|']},
              {'delimiter': ',',
               'input_fields': ['prediction_tax_rate1', 'prediction_tax_rate2'],
               'output_field': 'prediction',
               'step': 'filter',
               'type': 'merge'},
              {'input_field': 'prediction',
               'output_field': 'prediction',
               'step': 'filter',
               'type': 'split'},
              {'fields': ['sentences_normalized', 'prediction'],
               'step': 'filter',
               'type': 'doc_join'},
              {'entity_name_field': 'Catalyst',
               'entity_type': 'Catalyst',
               'excluded_values': ['not_tax_rate1_tax_rate2'],
               'extract_field': 'sentences_normalized',
               'format_values': False,
               'global_property_field_map': {},
               'modes': ['process'],
               'property_field_map': {'Catalyst': ['prediction']},
               'required_properties': ['Catalyst'],
               'source_field': 'body',
               'step': 'filter',
               'type': 'squirro_entity'}
               ]
    }

Publish the model using client.ml_publish_model. Below is an example of how the call could look for the workflow configuration above:

Code Block
languagepy
client.ml_publish_model(
    project_id,
    published_as='Proximity Model Tax Rate',
    description='Proximity Model for Tax Rate v1',
    external_model=True,
    global_id='<UNIQUE_HASH>',
    location='<LOCATION_OF_ORIGIN>',
    labels=['tax_rate1', 'tax_rate2', 'not_tax_rate1_tax_rate2'],
    tagging_level='sentence',
    workflow_name='[PUB] prox config import',
    workflow_config=config)

Publish an Existing ML Workflow

To publish an existing ML workflow, retrieve its ID from ML Workflows under the AI STUDIO tab:

[Screenshot: ML Workflows overview under the AI STUDIO tab]
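
Alternatively, if your SquirroClient version exposes the ML workflow endpoints, you can look the ID up programmatically. The sketch below assumes a get_machinelearning_workflows method; the exact method name and response shape may differ between client versions:

Code Block
languagepy
# Assumed call: list the project's ML workflows to find the ID of the
# workflow you want to publish. Verify the method name against your
# SquirroClient version.
workflows = client.get_machinelearning_workflows(project_id)
print(workflows)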

Publish the model using client.ml_publish_model, passing the workflow_id:

Code Block
languagepy
client.ml_publish_model(
    project_id,
    published_as='Proximity Model Tax Rate',
    description='Proximity Model for Tax Rate v1',
    external_model=True,
    global_id='<UNIQUE_HASH>',
    location='<LOCATION_OF_ORIGIN>',
    labels=['tax_rate1', 'tax_rate2', 'not_tax_rate1_tax_rate2'],
    tagging_level='sentence',
    workflow_id='VLGRAEbLRZ2v5Uq_MPt77w')

Remarks

For document-level tagging, you must include the keywords.prediction field in the output_fields so that the predictions are stored in a keyword (see the example configuration below).

Expand: Example workflow configuration for document-level tagging
Code Block
languagejson
{
    "dataset": {
        "infer": {
            "count": 10000,
            "query_string": "language:en"
        }
    },
    "pipeline": [
        {
            "fields": [
                "body"
            ],
            "step": "loader",
            "type": "squirro_query"
        },
        {
            "fields": [
                "body"
            ],
            "step": "filter",
            "type": "empty"
        },
        {
            "input_fields": [
                "body"
            ],
            "output_fields": [
                "clean_body"
            ],
            "step": "normalizer",
            "type": "html"
        },
        {
            "input_fields": [
                "clean_body"
            ],
            "output_fields": [
                "extract_sentences"
            ],
            "step": "tokenizer",
            "type": "sentences_nltk"
        },
        {
            "fields": [
                "extract_sentences"
            ],
            "step": "filter",
            "type": "doc_split"
        },
        {
            "input_fields": [
                "extract_sentences"
            ],
            "label_field": "",
            "output_field": "prediction",
            "step": "classifier",
            "type": "vadersentiment"
        },
        {
            "fields": [
                "extract_sentences",
                "prediction"
            ],
            "step": "filter",
            "type": "doc_join"
        },
        {
            "input_fields": [
                "prediction"
            ],
            "output_fields": [
                "keywords.prediction"
            ],
            "step": "filter",
            "type": "vote"
        },
        {
            "fields": [
                "keywords.prediction"
            ],
            "step": "saver",
            "type": "squirro_item"
        }
    ]
}

When editing the published model step in the pipeline editor, you can then select the keyword in which the predictions are stored from the Facet dropdown:

[Screenshot: Facet dropdown in the pipeline editor]

This page can now be found at How To Publish ML Models Using the Squirro Client on the Squirro Docs site.