Besides publishing models directly from the AI Studio or importing published models through project import, you can also use the SquirroClient to publish ML models to the Squirro pipeline.

This guide explains:

  • How to publish a new ML model by submitting a complete ML workflow configuration

  • How to publish an already existing ML workflow as a model to the Squirro pipeline

Publish a New ML Model

Using Python, connect and authenticate with the SquirroClient:

Code Block
languagepy
from squirro_client import SquirroClient

cluster = '<YOUR CLUSTER>'
project_id='<YOUR PROJECT_ID>'
token = '<YOUR TOKEN>'

client = SquirroClient(None, None, cluster=cluster)
client.authenticate(refresh_token=token)
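
As an optional sanity check, you can fetch the project before going further to confirm that the cluster, token, and project ID are valid. This is a minimal sketch; it assumes the project ID above refers to an existing project you can access:

Code Block
languagepy
# Optional sanity check: fetch the project to confirm that authentication
# worked and the project ID is valid before publishing anything.
project = client.get_project(project_id)
print(project.get('title'))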

Define the workflow configuration:

Code Block
languagepy
config = \
{'dataset': {},
 'pipeline': [{'fields': ['body'], 'step': 'loader', 'type': 'squirro_query'},
              {'fields': ['body'],
               'mark_as_skipped': True,
               'step': 'filter',
               'type': 'empty'},
              {'cleaning': {'approx.': 'approx',
                            'etc.': 'etc',
                            'i.e.': 'ie'},
               'input_fields': ['body'],
               'output_fields': ['extract_sentences'],
               'rules': ['**',
                         '...',
                         '…',
                         ': '],
               'step': 'tokenizer',
               'type': 'sentences_nltk'},
              {'fields': ['extract_sentences'],
               'step': 'filter',
               'type': 'doc_split'},
              {'input_fields': ['extract_sentences'],
               'output_fields': ['extract_sentences'],
               'step': 'tokenizer',
               'type': 'html'},
              {'fields': ['extract_sentences'],
               'step': 'filter',
               'type': 'doc_split'},
              {'input_fields': ['extract_sentences'],
               'output_fields': ['sentences_normalized'],
               'step': 'normalizer',
               'type': 'html'},
              {'fields': ['sentences_normalized'],
               'mark_as_skipped': True,
               'step': 'filter',
               'type': 'regex',
               'whitelist_regexes': ['^.{20,}$']},
              {'blacklist_terms': [],
               'fields': ['sentences_normalized'],
               'matching_label': 'tax_rate1',
               'name': './models/ais/proximity',
               'non_matching_label': 'not_tax_rate1_tax_rate2',
               'output_field': 'prediction_tax_rate1',
               'step': 'filter',
               'type': 'proximity',
               'whitelist_terms': ['tax rate of~1|','tax rate~2|']},
              {'blacklist_terms': [],
               'fields': ['sentences_normalized'],
               'matching_label': 'tax_rate2',
               'name': './models/ais/proximity',
               'non_matching_label': 'not_tax_rate1_tax_rate2',
               'output_field': 'prediction_tax_rate2',
               'step': 'filter',
               'type': 'proximity',
               'whitelist_terms': ['tax rate~4|']},
              {'delimiter': ',',
               'input_fields': ['prediction_tax_rate1', 'prediction_tax_rate2'],
               'output_field': 'prediction',
               'step': 'filter',
               'type': 'merge'},
              {'input_field': 'prediction',
               'output_field': 'prediction',
               'step': 'filter',
               'type': 'split'},
              {'fields': ['sentences_normalized', 'prediction'],
               'step': 'filter',
               'type': 'doc_join'},
              {'entity_name_field': 'Catalyst',
               'entity_type': 'Catalyst',
               'excluded_values': ['not_tax_rate1_tax_rate2'],
               'extract_field': 'sentences_normalized',
               'format_values': False,
               'global_property_field_map': {},
               'modes': ['process'],
               'property_field_map': {'Catalyst': ['prediction']},
               'required_properties': ['Catalyst'],
               'source_field': 'body',
               'step': 'filter',
               'type': 'squirro_entity'}
               ]
    }

Publish the model using client.ml_publish_model. Below is an example of how the call could look for the workflow configuration above:

Code Block
languagepy
client.ml_publish_model(
    project_id,
    published_as='Proximity Model Tax Rate',
    description='Proximity Model for Tax Rate v1',
    external_model=True,
    global_id='<UNIQUE_HASH>',
    location='<LOCATION_OF_ORIGIN>',
    labels=['tax_rate1', 'tax_rate2', 'not_tax_rate1_tax_rate2'],
    tagging_level='sentence',
    workflow_name='[PUB] prox config import',
    workflow_config=config)

Publish an Existing ML Workflow

To publish an existing ML workflow, retrieve its ID from ML Workflows under the AI STUDIO tab:

[Screenshot: ML Workflows overview under the AI STUDIO tab]
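
Alternatively, if your SquirroClient version exposes the ML workflow endpoints, you can look the ID up programmatically. The sketch below assumes a get_machinelearning_workflows method; the exact method name and response shape may differ between client versions:

Code Block
languagepy
# Assumed call: list the project's ML workflows to find the ID of the
# workflow you want to publish. Verify the method name against your
# SquirroClient version.
workflows = client.get_machinelearning_workflows(project_id)
print(workflows)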

Publish the model using client.ml_publish_model, passing the workflow_id:

Code Block
languagepy
client.ml_publish_model(
    project_id,
    published_as='Proximity Model Tax Rate',
    description='Proximity Model for Tax Rate v1',
    external_model=True,
    global_id='<UNIQUE_HASH>',
    location='<LOCATION_OF_ORIGIN>',
    labels=['tax_rate1', 'tax_rate2', 'not_tax_rate1_tax_rate2'],
    tagging_level='sentence',
    workflow_id='VLGRAEbLRZ2v5Uq_MPt77w')

Remarks

For document-level tagging, you must include the keywords.prediction field in the output_fields so that the predictions are stored in a keyword (see the example configuration below).

Expand: Example workflow configuration for document-level tagging
Code Block
languagejson
{
    "dataset": {
        "infer": {
            "count": 10000,
            "query_string": "language:en"
        }
    },
    "pipeline": [
        {
            "fields": [
                "body"
            ],
            "step": "loader",
            "type": "squirro_query"
        },
        {
            "fields": [
                "body"
            ],
            "step": "filter",
            "type": "empty"
        },
        {
            "input_fields": [
                "body"
            ],
            "output_fields": [
                "clean_body"
            ],
            "step": "normalizer",
            "type": "html"
        },
        {
            "input_fields": [
                "clean_body"
            ],
            "output_fields": [
                "extract_sentences"
            ],
            "step": "tokenizer",
            "type": "sentences_nltk"
        },
        {
            "fields": [
                "extract_sentences"
            ],
            "step": "filter",
            "type": "doc_split"
        },
        {
            "input_fields": [
                "extract_sentences"
            ],
            "label_field": "",
            "output_field": "prediction",
            "step": "classifier",
            "type": "vadersentiment"
        },
        {
            "fields": [
                "extract_sentences",
                "prediction"
            ],
            "step": "filter",
            "type": "doc_join"
        },
        {
            "input_fields": [
                "prediction"
            ],
            "output_fields": [
                "keywords.prediction"
            ],
            "step": "filter",
            "type": "vote"
        },
        {
            "fields": [
                "keywords.prediction"
            ],
            "step": "saver",
            "type": "squirro_item"
        }
    ]
}

When editing the published model step in the pipeline editor, you can then select the keyword in which the predictions are stored from the Facet dropdown:

[Screenshot: Facet dropdown in the pipeline editor]

This page can now be found at How To Publish ML Models Using the Squirro Client on the Squirro Docs site.