Note

Works with Squirro 3.5.2 and newer.

Model-as-a-Service (MaaS) is an initiative to open up Squirro to custom ML models and to speed up the prototyping phase of ML projects in Squirro.

Preparation

Before you can use MaaS, you need to install the required packages from the Squirro mirror on your target Squirro server:

Code Block
languagebash
yum install squirro-miniforge
yum install squirro-python38-mlflow

Creation of an MLFlow Model

Before you can upload a model, you need to create an MLFlow Model; an example can be found here. This can be done in two ways:

  • train an MLFlow model on your local machine or on your exploration server

  • wrap an existing (pre-trained) model into the structure of an MLFlow Model and run it locally (a minimal sketch of this approach is shown below)

Either way, after executing the run command, MLFlow stores the (trained) model in the MLFlow base folder (mlruns/0/) under a unique hash (<HASH>). The minimal structure of an MLFlow Model looks as follows:

Code Block
languagebash
├── artifacts
│   └── model
│       ├── conda.yaml
│       ├── MLmodel
│       ├── python_model.pkl
│       └── requirements.txt
└── meta.yaml

For more details, see the MLFlow documentation.
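
As an illustration, here is a minimal sketch of the second approach: wrapping an existing model into the MLFlow pyfunc interface. It is a sketch only; the scikit-learn stand-in model, the WrappedModel class, and the text/class column names are assumptions for illustration and mirror the data structure described in the next section.

Code Block
languagepython
# Minimal sketch: wrap a (pre-trained) model into the MLFlow pyfunc format.
import mlflow
import mlflow.pyfunc
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in for your real pre-trained model (any object with a
# predict() method works the same way).
pre_trained = make_pipeline(TfidfVectorizer(), LogisticRegression())
pre_trained.fit(["hello world.", "test sentence."], ["class0", "class1"])


class WrappedModel(mlflow.pyfunc.PythonModel):
    """Wraps the pre-trained model into the MLFlow pyfunc interface."""

    def __init__(self, model):
        self.model = model

    def predict(self, context, model_input):
        # model_input is a pandas DataFrame with "id" and feature columns.
        predictions = self.model.predict(model_input["text"])
        return pd.DataFrame({"id": model_input["id"], "class": predictions})


with mlflow.start_run():
    # Stores the model under mlruns/0/<HASH>/artifacts/model, including
    # the MLmodel, conda.yaml and python_model.pkl files shown above.
    mlflow.pyfunc.log_model("model", python_model=WrappedModel(pre_trained))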

Data Structure

To use the MLFlow Model later in the context of a Squirro ML Workflow, you need to stick to a specific data structure:

  • the input is a pandas DataFrame with an id column and named feature fields as columns

  • the output is again a pandas DataFrame with an id column and result fields as columns

Example:

input DataFrame

Code Block
languagenone
    id                          text
0  id0  this is an example sentence.
1  id1                  hello world.
2  id2              random sentence.
3  id3                test sentence.

...

output DataFrame

Code Block
languagenone
    id   class
0  id0  class1
1  id1  class0
2  id2  class0
3  id3  class1
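
Before uploading a model, you can sanity-check this contract locally by loading the saved MLFlow Model and calling predict on the example input. A minimal sketch; <HASH> is the run hash from the creation step:

Code Block
languagepython
# Load the saved MLFlow Model and verify the input/output contract.
import mlflow.pyfunc
import pandas as pd

input_df = pd.DataFrame({
    "id": ["id0", "id1", "id2", "id3"],
    "text": [
        "this is an example sentence.",
        "hello world.",
        "random sentence.",
        "test sentence.",
    ],
})

# Replace <HASH> with the hash of your MLFlow run.
model = mlflow.pyfunc.load_model("mlruns/0/<HASH>/artifacts/model")
output_df = model.predict(input_df)
print(output_df)  # expected: a DataFrame with "id" and "class" columns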

Upload of a Model

To upload the MLFlow Model, we provide two options:

  • via squirro_asset (large models above roughly 500 MB, the exact limit being under revision, can cause nginx issues; in that case use scp instead):

    • go into the MLFlow base folder

    • send the (trained) model via squirro_asset

      Code Block
      languagebash
      squirro_asset -vvv mlflow_models upload -t $TOKEN -c $CLUSTER -f mlruns/0/<HASH>/
  • via scp:

    • go into the MLFlow base folder

    • ensure that the destination directory exists (on the Squirro server)

      Code Block
      languagebash
      # <BASE_DIR> defaults to /var/lib/squirro/topic/assets/mlflow_models
      mkdir -p <BASE_DIR>/mlruns/0
    • compress the directory containing the (trained) model (on the machine where you trained it)

      Code Block
      languagebash
      cd mlruns/0/ && tar -czvf trained_model.tar.gz <HASH>/
    • send it to the MLFlow base folder on the Squirro server

      Code Block
      languagebash
      scp trained_model.tar.gz <SQUIRRO_SERVER_URL>:/tmp/
    • ssh into the Squirro server and unpack the uploaded archive

      Code Block
      languagebash
      mv /tmp/trained_model.tar.gz <BASE_DIR>/mlruns/0/  # directory was created in the first step
      cd <BASE_DIR>/mlruns/0 && tar -xzvf trained_model.tar.gz
    • adjust the artifact_uri in meta.yaml to point to the new path of the MLFlow Model (file:///<BASE_DIR>/mlruns/0/<HASH>/artifacts)

      Code Block
      languagebash
      sed -i '/artifact_uri/c\artifact_uri: file:///<BASE_DIR>/mlruns/0/<HASH>/artifacts' <HASH>/meta.yaml

Starting the Service

To start a Model-as-a-Service, you need to execute the following steps:

  • make sure you are in the MLFlow base folder on the Squirro server

  • activate the squirro environment

    Code Block
    languagebash
    squirro_activate3
  • serve the model identified by <HASH> as a service listening on the chosen port <PORT>

    Code Block
    languagebash
    mlflow models serve -m runs:/<HASH>/model -p <PORT>
    • use nohup or screen when starting the service so that the MaaS does not stop when you terminate your SSH session
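
Once the service is running, you can verify it with a small test request against the /invocations endpoint. A minimal sketch, assuming the MLFlow 1.x scoring server and its pandas-records JSON payload (newer MLFlow versions expect a different payload format); <PORT> is the chosen port:

Code Block
languagepython
# Smoke test: send two example rows to the running MaaS endpoint.
import requests

records = [
    {"id": "id0", "text": "this is an example sentence."},
    {"id": "id1", "text": "hello world."},
]
response = requests.post(
    "http://localhost:<PORT>/invocations",  # replace <PORT> with your port
    json=records,
    headers={"Content-Type": "application/json; format=pandas-records"},
)
print(response.json())  # e.g. [{"id": "id0", "class": "class1"}, ...]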

Note

  • there is no service orchestration provided at this stage

  • keep an eye on memory and storage consumption, because among other things:

    • a started model service loads the model into memory and keeps it there

    • a new conda environment is created for every model that has a different conda.yaml file

  • on-premise customers need to manually package their conda environment. This can be done as explained here.

Usage of MaaS

To use the model, you need to create an ML Workflow:

  • example 1: document level

    Code Block
    languagejson
    {
        "dataset": {
            "infer": {
                "count": 10,
                "query_string": "language:en"
            }
        },
        "pipeline": [
            {
                "fields": [
                    "body"
                ],
                "step": "loader",
                "type": "squirro_query"
            },
            {
                "fields": [
                    "body"
                ],
                "step": "filter",
                "type": "empty"
            },
            {
                "input_mapping": {
                    "body":"text"
                },
                "output_mapping": {
                    "class":"keywords.prediction"
                },
                "process_endpoint": "http://localhost:<PORT>/invocations",
                "name": "mlflow_maas",
                "step": "mlflow_maas",
                "type": "mlflow_maas"
            },
            {
                "fields": [
                    "keywords.prediction"
                ],
                "step": "saver",
                "type": "squirro_item"
            }
        ]
    }
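
    In this example, the input_mapping sends each item's body field to the model as the text feature column, and the output_mapping writes the model's class column back to the item as keywords.prediction.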
  • example 2: sentence-level example with entity generation

    Code Block
    languagejson
    {
        "dataset": {
            "infer": {
                "count": 10,
                "query_string": "language:en"
            }
        },
        "pipeline": [
            {
                "fields": [
                    "body"
                ],
                "step": "loader",
                "type": "squirro_query"
            },
            {
                "fields": [
                    "body"
                ],
                "step": "filter",
                "type": "empty"
            },
            {
                "input_fields": [
                    "body"
                ],
                "output_fields": [
                    "extract_sentences"
                ],
                "step": "tokenizer",
                "type": "sentences_nltk"
            },
            {
                "fields": [
                    "extract_sentences"
                ],
                "step": "filter",
                "type": "doc_split"
            },
            {
                "input_mapping": {
                    "extract_sentences":"text"
                },
                "output_mapping": {
                    "class":"prediction"
                },
                "process_endpoint": "http://localhost:<PORT>/invocations",
                "name": "mlflow_maas",
                "step": "mlflow_maas",
                "type": "mlflow_maas"
            },
            {
                "fields": [
                    "extract_sentences",
                    "prediction"
                ],
                "step": "filter",
                "type": "doc_join"
            },
            {
                "entity_name_field": "Catalyst",
                "entity_type": "Catalyst",
                "excluded_values": [],
                "extract_field": "extract_sentences",
                "format_values": false,
                "global_property_field_map": {},
                "modes": [
                    "process"
                ],
                "property_field_map": {
                    "Catalyst": [
                        "prediction"
                    ]
                },
                "required_properties": [
                    "Catalyst"
                ],
                "source_field": "body",
                "step": "filter",
                "type": "squirro_entity"
            },
            {
                "fields": [
                    "entities"
                ],
                "step": "saver",
                "type": "squirro_item"
            }
        ]
    } 
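
    Example 2 splits each document's body into sentences, classifies every sentence individually (doc_split/doc_join), and then uses the predictions to create Catalyst entities on the item.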

These ML Workflows can then be used as inference ML Jobs scheduled at a regular interval, or as a published model in the enrich pipeline (see How-to Publish ML Models Using the Squirro Client).