Note

Works with Squirro 3.5.2 and newer.

Model-as-a-Service (MaaS) is an initiative to open up Squirro to custom ML models and to speed up the prototyping phase of ML projects in Squirro.

Preparation

Before you can use MaaS, you need to install the required packages from the Squirro mirror on your target Squirro server:

Code Block
languagebash
yum install squirro-miniforge
yum install squirro-python38-mlflow

Creation of an MLFlow Model

Before you can upload a model, you need to create an MLFlow Model; an example can be found here. This can be done in two ways:

  • train an MLFlow model on your local machine or on your exploration server

  • wrap an existing (pre-trained) model into the structure of an MLFlow Model and run it locally (a minimal sketch of this approach is shown below)

Either way, after executing the run command, MLFlow stores the (trained) model in the MLFlow base folder (mlruns/0/) under a unique hash (<HASH>). The minimal structure of an MLFlow Model looks as follows:

Code Block
languagebash
├── artifacts
│   └── model
│       ├── conda.yaml
│       ├── MLmodel
│       ├── python_model.pkl
│       └── requirements.txt
└── meta.yaml

For more details, see the MLFlow documentation.
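
As an illustration, here is a minimal sketch of the second approach: wrapping an existing model into the MLFlow pyfunc interface. It is a sketch only; the scikit-learn stand-in model, the WrappedModel class, and the text/class column names are assumptions for illustration and mirror the data structure described in the next section.

Code Block
languagepython
# Minimal sketch: wrap a (pre-trained) model into the MLFlow pyfunc format.
import mlflow
import mlflow.pyfunc
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in for your real pre-trained model (any object with a
# predict() method works the same way).
pre_trained = make_pipeline(TfidfVectorizer(), LogisticRegression())
pre_trained.fit(["hello world.", "test sentence."], ["class0", "class1"])


class WrappedModel(mlflow.pyfunc.PythonModel):
    """Wraps the pre-trained model into the MLFlow pyfunc interface."""

    def __init__(self, model):
        self.model = model

    def predict(self, context, model_input):
        # model_input is a pandas DataFrame with "id" and feature columns.
        predictions = self.model.predict(model_input["text"])
        return pd.DataFrame({"id": model_input["id"], "class": predictions})


with mlflow.start_run():
    # Stores the model under mlruns/0/<HASH>/artifacts/model, including
    # the MLmodel, conda.yaml and python_model.pkl files shown above.
    mlflow.pyfunc.log_model("model", python_model=WrappedModel(pre_trained))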

Data Structure

To use the MLFlow Model later in the context of a Squirro ML Workflow, you need to stick to a specific data structure:

  • the input is a pandas DataFrame with an id column and named feature fields as columns

  • the output is again a pandas DataFrame with an id column and result fields as columns

Example:

input DataFrame

Code Block
languagenone
    id                          text
0  id0  this is an example sentence.
1  id1                  hello world.
2  id2              random sentence.
3  id3                test sentence.

...

output DataFrame

Code Block
languagenone
    id   class
0  id0  class1
1  id1  class0
2  id2  class0
3  id3  class1
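
Before uploading a model, you can sanity-check this contract locally by loading the saved MLFlow Model and calling predict on the example input. A minimal sketch; <HASH> is the run hash from the creation step:

Code Block
languagepython
# Load the saved MLFlow Model and verify the input/output contract.
import mlflow.pyfunc
import pandas as pd

input_df = pd.DataFrame({
    "id": ["id0", "id1", "id2", "id3"],
    "text": [
        "this is an example sentence.",
        "hello world.",
        "random sentence.",
        "test sentence.",
    ],
})

# Replace <HASH> with the hash of your MLFlow run.
model = mlflow.pyfunc.load_model("mlruns/0/<HASH>/artifacts/model")
output_df = model.predict(input_df)
print(output_df)  # expected: a DataFrame with "id" and "class" columns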

Upload of a Model

To upload the MLFlow Model, we provide two options:

  • via squirro_asset (large models above roughly 500 MB, the exact limit being under revision, can cause nginx issues; in that case use scp instead):

    • go into the MLFlow base folder

    • send the (trained) model via squirro_asset

      Code Block
      languagebash
      squirro_asset -vvv mlflow_models upload -t $TOKEN -c $CLUSTER -f mlruns/0/<HASH>/
  • via scp:

    • go into the MLFlow base folder

    • ensure that the destination directory exists (on the Squirro server)

      Code Block
      languagebash
      # <BASE_DIR> defaults to /var/lib/squirro/topic/assets/mlflow_models
      mkdir -p <BASE_DIR>/mlruns/0
    • compress the directory containing the (trained) model (on the machine where you trained it)

      Code Block
      languagebash
      cd mlruns/0/ && tar -czvf trained_model.tar.gz <HASH>/
    • send it to the MLFlow base folder on the Squirro server

      Code Block
      languagebash
      scp trained_model.tar.gz <SQUIRRO_SERVER_URL>:/tmp/
    • ssh into the Squirro server and unpack the uploaded archive

      Code Block
      languagebash
      mv /tmp/trained_model.tar.gz <BASE_DIR>/mlruns/0/  # directory was created in the first step
      cd <BASE_DIR>/mlruns/0 && tar -xzvf trained_model.tar.gz
    • adjust the artifact_uri in meta.yaml to point to the new path of the MLFlow Model (file:///<BASE_DIR>/mlruns/0/<HASH>/artifacts)

      Code Block
      languagebash
      sed -i '/artifact_uri/c\artifact_uri: file:///<BASE_DIR>/mlruns/0/<HASH>/artifacts' <HASH>/meta.yaml

Starting the Service

To start a Model-as-a-Service, you need to execute the following steps:

  • make sure you are in the MLFlow base folder on the Squirro server

  • activate the squirro environment

    Code Block
    languagebash
    squirro_activate3
  • serve the model identified by <HASH> as a service listening on the chosen port <PORT>

    Code Block
    languagebash
    mlflow models serve -m runs:/<HASH>/model -p <PORT>
    • use nohup or screen when starting the service so that the MaaS does not stop when you terminate your SSH session
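
Once the service is running, you can verify it with a small test request against the /invocations endpoint. A minimal sketch, assuming the MLFlow 1.x scoring server and its pandas-records JSON payload (newer MLFlow versions expect a different payload format); <PORT> is the chosen port:

Code Block
languagepython
# Smoke test: send two example rows to the running MaaS endpoint.
import requests

records = [
    {"id": "id0", "text": "this is an example sentence."},
    {"id": "id1", "text": "hello world."},
]
response = requests.post(
    "http://localhost:<PORT>/invocations",  # replace <PORT> with your port
    json=records,
    headers={"Content-Type": "application/json; format=pandas-records"},
)
print(response.json())  # e.g. [{"id": "id0", "class": "class1"}, ...]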

Note

  • there is no service orchestration provided at this stage

  • keep an eye on memory and storage consumption, because among other things:

    • a started model service loads the model into memory and keeps it there

    • a new conda environment is created for every model that has a different conda.yaml file

  • on-premise customers need to manually package their conda environment. This can be done as explained here.

Usage of MaaS

To use the model, you need to create an ML Workflow:

  • example 1: document level

    Code Block
    languagejson
    {
        "dataset": {
            "infer": {
                "count": 10,
                "query_string": "language:en"
            }
        },
        "pipeline": [
            {
                "fields": [
                    "body"
                ],
                "step": "loader",
                "type": "squirro_query"
            },
            {
                "fields": [
                    "body"
                ],
                "step": "filter",
                "type": "empty"
            },
            {
                "input_mapping": {
                    "body":"text"
                },
                "output_mapping": {
                    "class":"keywords.prediction"
                },
                "process_endpoint": "http://localhost:<PORT>/invocations",
                "name": "mlflow_maas",
                "step": "mlflow_maas",
                "type": "mlflow_maas"
            },
            {
                "fields": [
                    "keywords.prediction"
                ],
                "step": "saver",
                "type": "squirro_item"
            }
        ]
    }
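
    In this example, the input_mapping sends each item's body field to the model as the text feature column, and the output_mapping writes the model's class column back to the item as keywords.prediction.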
  • example 2: sentence-level example with entity generation

    Code Block
    languagejson
    {
        "dataset": {
            "infer": {
                "count": 10,
                "query_string": "language:en"
            }
        },
        "pipeline": [
            {
                "fields": [
                    "body"
                ],
                "step": "loader",
                "type": "squirro_query"
            },
            {
                "fields": [
                    "body"
                ],
                "step": "filter",
                "type": "empty"
            },
            {
                "input_fields": [
                    "body"
                ],
                "output_fields": [
                    "extract_sentences"
                ],
                "step": "tokenizer",
                "type": "sentences_nltk"
            },
            {
                "fields": [
                    "extract_sentences"
                ],
                "step": "filter",
                "type": "doc_split"
            },
            {
                "input_mapping": {
                    "extract_sentences":"text"
                },
                "output_mapping": {
                    "class":"prediction"
                },
                "process_endpoint": "http://localhost:<PORT>/invocations",
                "name": "mlflow_maas",
                "step": "mlflow_maas",
                "type": "mlflow_maas"
            },
            {
                "fields": [
                    "extract_sentences",
                    "prediction"
                ],
                "step": "filter",
                "type": "doc_join"
            },
            {
                "entity_name_field": "Catalyst",
                "entity_type": "Catalyst",
                "excluded_values": [],
                "extract_field": "extract_sentences",
                "format_values": false,
                "global_property_field_map": {},
                "modes": [
                    "process"
                ],
                "property_field_map": {
                    "Catalyst": [
                        "prediction"
                    ]
                },
                "required_properties": [
                    "Catalyst"
                ],
                "source_field": "body",
                "step": "filter",
                "type": "squirro_entity"
            },
            {
                "fields": [
                    "entities"
                ],
                "step": "saver",
                "type": "squirro_item"
            }
        ]
    } 
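
    Example 2 splits each document's body into sentences, classifies every sentence individually (doc_split/doc_join), and then uses the predictions to create Catalyst entities on the item.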

These ML Workflows can then be used as inference ML Jobs scheduled at a regular interval, or as a published model in the enrich pipeline (see How-to Publish ML Models Using the Squirro Client).