Besides publishing models directly from the AI Studio or importing published models through a project import, you can also use the SquirroClient to publish ML models to the Squirro pipeline.
This guide explains:

- How to publish a new ML model by submitting a complete ML workflow configuration
- How to publish an already existing ML workflow as a model to the Squirro pipeline
Publish a New ML Model
Using Python, connect and authenticate with the SquirroClient:
```python
from squirro_client import SquirroClient

cluster = '<YOUR CLUSTER>'
project_id = '<YOUR PROJECT_ID>'
token = '<YOUR TOKEN>'

client = SquirroClient(None, None, cluster=cluster)
client.authenticate(refresh_token=token)
```
Define the workflow configuration:
```python
config = {
    'dataset': {},
    'pipeline': [
        {'fields': ['body'], 'step': 'loader', 'type': 'squirro_query'},
        {'fields': ['body'],
         'mark_as_skipped': True,
         'step': 'filter',
         'type': 'empty'},
        {'cleaning': {'approx.': 'approx',
                      'etc.': 'etc',
                      'i.e.': 'ie'},
         'input_fields': ['body'],
         'output_fields': ['extract_sentences'],
         'rules': ['**',
                   '...',
                   '…',
                   ': '],
         'step': 'tokenizer',
         'type': 'sentences_nltk'},
        {'fields': ['extract_sentences'],
         'step': 'filter',
         'type': 'doc_split'},
        {'input_fields': ['extract_sentences'],
         'output_fields': ['extract_sentences'],
         'step': 'tokenizer',
         'type': 'html'},
        {'fields': ['extract_sentences'],
         'step': 'filter',
         'type': 'doc_split'},
        {'input_fields': ['extract_sentences'],
         'output_fields': ['sentences_normalized'],
         'step': 'normalizer',
         'type': 'html'},
        {'fields': ['sentences_normalized'],
         'mark_as_skipped': True,
         'step': 'filter',
         'type': 'regex',
         'whitelist_regexes': ['^.{20,}$']},
        {'blacklist_terms': [],
         'fields': ['sentences_normalized'],
         'matching_label': 'tax_rate1',
         'name': './models/ais/proximity',
         'non_matching_label': 'not_tax_rate1_tax_rate2',
         'output_field': 'prediction_tax_rate1',
         'step': 'filter',
         'type': 'proximity',
         'whitelist_terms': ['tax rate of~1|', 'tax rate~2|']},
        {'blacklist_terms': [],
         'fields': ['sentences_normalized'],
         'matching_label': 'tax_rate2',
         'name': './models/ais/proximity',
         'non_matching_label': 'not_tax_rate1_tax_rate2',
         'output_field': 'prediction_tax_rate2',
         'step': 'filter',
         'type': 'proximity',
         'whitelist_terms': ['tax rate~4|']},
        {'delimiter': ',',
         'input_fields': ['prediction_tax_rate1', 'prediction_tax_rate2'],
         'output_field': 'prediction',
         'step': 'filter',
         'type': 'merge'},
        {'input_field': 'prediction',
         'output_field': 'prediction',
         'step': 'filter',
         'type': 'split'},
        {'fields': ['sentences_normalized', 'prediction'],
         'step': 'filter',
         'type': 'doc_join'},
        {'entity_name_field': 'Catalyst',
         'entity_type': 'Catalyst',
         'excluded_values': ['not_tax_rate1_tax_rate2'],
         'extract_field': 'sentences_normalized',
         'format_values': False,
         'global_property_field_map': {},
         'modes': ['process'],
         'property_field_map': {'Catalyst': ['prediction']},
         'required_properties': ['Catalyst'],
         'source_field': 'body',
         'step': 'filter',
         'type': 'squirro_entity'},
    ],
}
```
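The configuration is a plain Python dict, so you can sanity-check it before publishing. A minimal sketch, using a two-step excerpt that stands in for the full config above:

```python
# Excerpt of the workflow configuration above; every pipeline entry
# declares at least a 'step' and a 'type'.
config = {
    'dataset': {},
    'pipeline': [
        {'fields': ['body'], 'step': 'loader', 'type': 'squirro_query'},
        {'fields': ['body'], 'mark_as_skipped': True,
         'step': 'filter', 'type': 'empty'},
    ],
}

# List the (step, type) pairs to verify the pipeline order at a glance.
step_types = [(s['step'], s['type']) for s in config['pipeline']]
print(step_types)
# [('loader', 'squirro_query'), ('filter', 'empty')]
```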
Publish the model using `client.ml_publish_model`. Below is an example of how the call could look for the above workflow configuration:
```python
client.ml_publish_model(
    project_id,
    published_as='Proximity Model Tax Rate',
    description='Proximity Model for Tax Rate v1',
    external_model=True,
    global_id='<UNIQUE_HASH>',
    location='<LOCATION_OF_ORIGIN>',
    labels=['tax_rate1', 'tax_rate2', 'not_tax_rate1_tax_rate2'],
    tagging_level='sentence',
    workflow_name='[PUB] prox config import',
    workflow_config=config)
```
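One easy mistake is passing labels that the workflow never produces. The check below is a sketch, not part of SquirroClient: it runs against a trimmed stand-in for the two proximity steps in the config above and confirms that every published label is emitted by some step.

```python
# Trimmed stand-in for the two proximity steps in the config above.
pipeline = [
    {'step': 'filter', 'type': 'proximity',
     'matching_label': 'tax_rate1',
     'non_matching_label': 'not_tax_rate1_tax_rate2'},
    {'step': 'filter', 'type': 'proximity',
     'matching_label': 'tax_rate2',
     'non_matching_label': 'not_tax_rate1_tax_rate2'},
]

# The labels passed to ml_publish_model in the example above.
labels = ['tax_rate1', 'tax_rate2', 'not_tax_rate1_tax_rate2']

# Collect every label the proximity steps can emit ...
produced = {s[k] for s in pipeline if s['type'] == 'proximity'
            for k in ('matching_label', 'non_matching_label')}

# ... and verify that no published label is missing from the workflow.
missing = set(labels) - produced
print(sorted(missing))
# []
```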
Publish Existing ML Workflow
To publish an existing ML workflow, retrieve its ID from ML Workflows under the AI STUDIO tab.
Publish the model using `client.ml_publish_model`, submitting the `workflow_id`:
```python
client.ml_publish_model(
    project_id,
    published_as='Proximity Model Tax Rate',
    description='Proximity Model for Tax Rate v1',
    external_model=True,
    global_id='<UNIQUE_HASH>',
    location='<LOCATION_OF_ORIGIN>',
    labels=['tax_rate1', 'tax_rate2', 'not_tax_rate1_tax_rate2'],
    tagging_level='sentence',
    workflow_id='VLGRAEbLRZ2v5Uq_MPt77w')
```
Remarks
For document-level tagging, you must provide the `keywords.prediction` field in the `output_fields` to store the predictions in a keyword.
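A hypothetical sketch of such a step is shown below. Only the `keywords.prediction` entry in `output_fields` comes from the remark above; every other field name and the step type are assumptions for illustration.

```python
# Hypothetical document-level step: all values are illustrative except
# 'keywords.prediction' in output_fields, which stores predictions in a keyword.
step = {
    'input_fields': ['prediction'],
    'output_fields': ['keywords.prediction'],  # keyword holding the predictions
    'step': 'filter',
    'type': 'merge',
}
print(step['output_fields'])
# ['keywords.prediction']
```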
You can then assign the keyword in which the predictions are stored from the Facet dropdown in the pipeline editor when editing the published model step.
This page can now be found at How To Publish ML Models Using the Squirro Client on the Squirro Docs site.