Note: works with Squirro 3.5.2 and newer
Model-as-a-Service (MaaS) is an initiative to open up Squirro for custom ML models and speed up the prototyping phase for ML projects in Squirro.
Preparation
Before you can use MaaS, you need to install the required packages from the Squirro mirror on your target Squirro server:
```bash
yum install squirro-miniforge
yum install squirro-python38-mlflow
```
Creation of an MLFlow Model
Before you can upload a model, you need to create an MLFlow Model. This can be done in two ways:
- train an MLFlow model on your local machine or on your exploration server
- wrap an existing (pre-trained) model into the structure of an MLFlow Model and run it locally
Either way, MLFlow stores the (trained) model in the MLFlow base folder (mlruns/0/) under a unique hash (<HASH>) after executing the run command. The most minimal structure of an MLFlow Model looks as follows:
```
├── artifacts
│   └── model
│       ├── conda.yaml
│       ├── MLmodel
│       ├── python_model.pkl
│       └── requirements.txt
└── meta.yaml
```
See also:
- MLFlow documentation
- built-in model flavors (to write your own model)
Data Structure
To use the MLFlow Model later in the context of a Squirro ML Workflow, you need to stick to a specific data structure:
- the input is a pandas DataFrame with an id and named feature fields as columns
- the output is again a pandas DataFrame with an id and result fields as columns
Example:
input DataFrame
```
    id  text
0  id0  this is a example sentence.
1  id1  hello world.
2  id2  random sentence.
3  id3  test sentence.
```
...
output DataFrame
```
    id  class
0  id0  class1
1  id1  class0
2  id2  class0
3  id3  class1
```
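This input/output contract can be sketched with a toy predictor. The classification rule below is purely illustrative (it happens to reproduce the example tables) and stands in for a real model:

```python
import pandas as pd

def predict(model_input: pd.DataFrame) -> pd.DataFrame:
    """Toy stand-in for a model: takes id + feature columns, returns id + result columns."""
    labels = ["class1" if text.startswith("t") else "class0"
              for text in model_input["text"]]
    return pd.DataFrame({"id": model_input["id"], "class": labels})

inp = pd.DataFrame({
    "id": ["id0", "id1", "id2", "id3"],
    "text": ["this is a example sentence.", "hello world.",
             "random sentence.", "test sentence."],
})
out = predict(inp)
print(out)
```

The important part is the shape: the id column travels through unchanged, so results can be joined back to the originating Squirro items.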
Upload of a Model
To upload the MLFlow Model we provide two options:
- via squirro_asset (large models >500MB (exact number is under revision) can cause nginx issues; use scp in that case):
  - go into the MLFlow base folder
  - send the (trained) model via squirro_asset:
```bash
squirro_asset -vvv mlflow_models upload -t $TOKEN -c $CLUSTER -f mlruns/0/<HASH>/
```
- via scp:
  - go into the MLFlow base folder
  - make sure the destination directory exists (on the Squirro server):
```bash
<BASE_DIR>=/var/lib/squirro/topic/assets/mlflow_models  # default path
mkdir -p <BASE_DIR>/mlruns/0
```
  - compress the directory with the (trained) model (wherever you have trained your model):
```bash
cd mlruns/0/ && tar -czvf trained_model.tar.gz <HASH>/
```
  - send it to the Squirro server:
```bash
scp trained_model.tar.gz <SQUIRRO_SERVER_URL>:/tmp/
```
  - ssh into the Squirro server and extract the transferred file:
```bash
# create the directories first if they do not exist
cd <BASE_DIR>/mlruns/0 && mv /tmp/trained_model.tar.gz <BASE_DIR>/mlruns/0/
tar -xzvf trained_model.tar.gz
```
  - adjust artifact_uri in the meta.yaml with the new path of the MLFlow Model (file:///<BASE_DIR>/mlruns/0/<HASH>/artifacts):
```bash
sed -i '/artifact_uri/c\artifact_uri: file:///<BASE_DIR>/mlruns/0/<HASH>/artifacts' <HASH>/meta.yaml
```
Starting the Service
To start a Model-as-a-Service you need to execute the following steps:
- make sure you are in the MLFlow base folder on the Squirro server
- activate the squirro environment:
```bash
squirro_activate3
```
- serve the model identified by its <HASH> as a service listening on the chosen port <PORT>:
```bash
mlflow models serve -m runs:/<HASH>/model -p <PORT>
```
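Once the service is running, it accepts POST requests on the /invocations endpoint. A quick sanity check could look like the following; the payload format depends on your MLflow version (the pandas-split format shown here is the one used by MLflow 1.x), and <PORT> must be replaced with your chosen port:

```shell
curl http://localhost:<PORT>/invocations \
  -H 'Content-Type: application/json; format=pandas-split' \
  -d '{"columns": ["id", "text"], "data": [["id0", "hello world."]]}'
```

The response is the output DataFrame serialized as JSON, following the data structure described above.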
Note
- there is no service orchestration provided at this stage
- keep an eye on memory and storage consumption; among other things:
  - a started model service loads the model into memory and keeps it there
  - a new conda environment is created for every model that has a different conda.yaml file
- on-premise customers need to manually package their conda environment
Usage of MaaS
To use the model you need to create an ML Workflow:
example 1: document level
```json
{
  "dataset": {
    "infer": {
      "count": 10,
      "query_string": "language:en"
    }
  },
  "pipeline": [
    {
      "fields": ["body"],
      "step": "loader",
      "type": "squirro_query"
    },
    {
      "fields": ["body"],
      "step": "filter",
      "type": "empty"
    },
    {
      "input_mapping": {"body": "text"},
      "output_mapping": {"class": "keywords.prediction"},
      "process_endpoint": "http://localhost:<PORT>/invocations",
      "name": "mlflow_maas",
      "step": "mlflow_maas",
      "type": "mlflow_maas"
    },
    {
      "fields": ["keywords.prediction"],
      "step": "saver",
      "type": "squirro_item"
    }
  ]
}
```
example 2: sentence level with entity generation
```json
{
  "dataset": {
    "infer": {
      "count": 10,
      "query_string": "language:en"
    }
  },
  "pipeline": [
    {
      "fields": ["body"],
      "step": "loader",
      "type": "squirro_query"
    },
    {
      "fields": ["body"],
      "step": "filter",
      "type": "empty"
    },
    {
      "input_fields": ["body"],
      "output_fields": ["extract_sentences"],
      "step": "tokenizer",
      "type": "sentences_nltk"
    },
    {
      "fields": ["extract_sentences"],
      "step": "filter",
      "type": "doc_split"
    },
    {
      "input_mapping": {"extract_sentences": "text"},
      "output_mapping": {"class": "prediction"},
      "process_endpoint": "http://localhost:<PORT>/invocations",
      "name": "mlflow_maas",
      "step": "mlflow_maas",
      "type": "mlflow_maas"
    },
    {
      "fields": ["extract_sentences", "prediction"],
      "step": "filter",
      "type": "doc_join"
    },
    {
      "entity_name_field": "Catalyst",
      "entity_type": "Catalyst",
      "excluded_values": [],
      "extract_field": "extract_sentences",
      "format_values": false,
      "global_property_field_map": {},
      "modes": ["process"],
      "property_field_map": {"Catalyst": ["prediction"]},
      "required_properties": ["Catalyst"],
      "source_field": "body",
      "step": "filter",
      "type": "squirro_entity"
    },
    {
      "fields": ["entities"],
      "step": "saver",
      "type": "squirro_item"
    }
  ]
}
```
These ML Workflows can then be used as inference ML Jobs scheduled at an interval, or as a published model in the enrich pipeline (How-to Publish ML Models Using the Squirro Client).

This page can now be found at Model-As-a-Service on the Squirro Docs site.