Works with Squirro 3.5.2 and newer.
Model-as-a-Service (MaaS) is an initiative to open up Squirro for custom ML models and speed up the prototyping phase for ML projects in Squirro.
Preparation
Before you can use MaaS, you need to install the required packages from the Squirro mirror on your target Squirro server:
yum install squirro-miniforge
yum install squirro-python38-mlflow
Creation of an MLFlow Model
Before you can upload a model, you need to create an MLFlow Model (an example can be found here). This can be done in two ways:
train an MLFlow model on your local machine or on your exploration server
wrap an existing (pre-trained) model into the structure of an MLFlow Model and run it locally
Either way, MLFlow stores the (trained) model in the MLFlow base folder (mlruns/0/) with a unique hash (<HASH>) after executing the run command. The most minimal structure of the MLFlow Model looks as follows:
├── artifacts
│   └── model
│       ├── conda.yaml
│       ├── MLmodel
│       ├── python_model.pkl
│       └── requirements.txt
└── meta.yaml
See also the MLFlow documentation and the built-in model flavors (to write your own model).
Data Structure
To use the MLFlow Model later in the context of a Squirro ML Workflow, you need to stick to a specific data structure:
the input is a pandas DataFrame with an id and named feature fields as columns
the output is again a pandas DataFrame with an id and result fields as columns
Example:
input DataFrame
    id                         text
0  id0  this is a example sentence.
1  id1                 hello world.
2  id2             random sentence.
3  id3               test sentence.
output DataFrame
    id   class
0  id0  class1
1  id1  class0
2  id2  class0
3  id3  class1
Upload of a Model
To upload the MLFlow Model we provide two options:
via squirro_asset: large models (>500MB; the exact number is under revision) can cause nginx issues, in which case use scp instead
go into the MLFlow base folder
send the (trained) model via squirro_asset
squirro_asset -vvv mlflow_models upload -t $TOKEN -c $CLUSTER -f mlruns/0/<HASH>/
via scp:
go into the MLFlow base folder
ensure that the destination directory exists (on the Squirro server)
<BASE_DIR>=/var/lib/squirro/topic/assets/mlflow_models # default path
mkdir -p <BASE_DIR>/mlruns/0
compress the directory with the (trained) model (wherever you have trained your model)
cd mlruns/0/ && tar -czvf trained_model.tar.gz <HASH>/
send it to the MLFlow base folder on the Squirro server
scp trained_model.tar.gz <SQUIRRO_SERVER_URL>:/tmp/
ssh into the Squirro server and unpack the sent file
cd <BASE_DIR>/mlruns/0 && mv /tmp/trained_model.tar.gz <BASE_DIR>/mlruns/0/ # create the dirs if not existing
tar -xzvf trained_model.tar.gz
adjust artifact_uri in the meta.yaml with the new path of the MLFlow Model (file:///<BASE_DIR>/mlruns/0/<HASH>/artifacts)
sed -i '/artifact_uri/c\artifact_uri: file:///<BASE_DIR>/mlruns/0/<HASH>/artifacts' <HASH>/meta.yaml
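If sed is not at hand, the same artifact_uri fix can be applied with a few lines of Python. The function below is a sketch; its name and arguments are our own, and the paths are placeholders as in the sed command.

```python
# Python alternative to the sed one-liner: point artifact_uri in meta.yaml
# at the model's new location on the Squirro server. Paths are placeholders.
from pathlib import Path


def set_artifact_uri(meta_path, base_dir, run_hash):
    new_line = f"artifact_uri: file://{base_dir}/mlruns/0/{run_hash}/artifacts"
    lines = [new_line if line.startswith("artifact_uri:") else line
             for line in Path(meta_path).read_text().splitlines()]
    Path(meta_path).write_text("\n".join(lines) + "\n")
```

Calling set_artifact_uri("<HASH>/meta.yaml", "<BASE_DIR>", "<HASH>") mirrors the sed invocation and leaves all other keys in meta.yaml untouched.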
Starting the Service
To start a Model-as-a-Service you need to execute the following steps:
make sure you are in the MLFlow base folder on the Squirro server
activate the squirro environment
squirro_activate3
serve the model identified by the <HASH> as a service listening on the chosen port <PORT>
mlflow models serve -m runs:/<HASH>/model -p <PORT>
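Once the service is up, its /invocations endpoint accepts the input DataFrame as JSON; the exact envelope depends on the MLFlow version (1.x accepts a split-orient DataFrame directly, 2.x wraps it in a dataframe_split key). A sketch of building such a payload, with <PORT> as a placeholder:

```python
# Build the JSON payload for the /invocations endpoint of a served model.
# The envelope differs between MLFlow 1.x and 2.x; both variants are shown.
import json

import pandas as pd

df = pd.DataFrame({"id": ["id0", "id1"],
                   "text": ["this is a example sentence.", "hello world."]})

split = df.to_dict(orient="split")
payload = json.dumps(split)                        # MLFlow 1.x
# payload = json.dumps({"dataframe_split": split})  # MLFlow 2.x

# Posting it (requests assumed available; for 1.x send the header
# "Content-Type: application/json; format=pandas-split"):
# requests.post("http://localhost:<PORT>/invocations", data=payload,
#               headers={"Content-Type": "application/json"})
```

The response is the output DataFrame serialized as JSON, following the data structure described above.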
Note
there is no service orchestration provided at this stage
keep an eye on memory and storage consumption; among other things:
a started model service loads the model into memory and keeps it there
a new conda environment is created for every model with a different conda.yaml file
on-premise customers need to manually package their conda environment. This can be done as explained here.
Usage of MaaS
To use the model you need to create an ML Workflow:
example 1: document level
{
  "dataset": {
    "infer": {
      "count": 10,
      "query_string": "language:en"
    }
  },
  "pipeline": [
    {
      "fields": ["body"],
      "step": "loader",
      "type": "squirro_query"
    },
    {
      "fields": ["body"],
      "step": "filter",
      "type": "empty"
    },
    {
      "input_mapping": {"body": "text"},
      "output_mapping": {"class": "keywords.prediction"},
      "process_endpoint": "http://localhost:<PORT>/invocations",
      "name": "mlflow_maas",
      "step": "mlflow_maas",
      "type": "mlflow_maas"
    },
    {
      "fields": ["keywords.prediction"],
      "step": "saver",
      "type": "squirro_item"
    }
  ]
}
example 2: sentence level with entity generation
{
  "dataset": {
    "infer": {
      "count": 10,
      "query_string": "language:en"
    }
  },
  "pipeline": [
    {
      "fields": ["body"],
      "step": "loader",
      "type": "squirro_query"
    },
    {
      "fields": ["body"],
      "step": "filter",
      "type": "empty"
    },
    {
      "input_fields": ["body"],
      "output_fields": ["extract_sentences"],
      "step": "tokenizer",
      "type": "sentences_nltk"
    },
    {
      "fields": ["extract_sentences"],
      "step": "filter",
      "type": "doc_split"
    },
    {
      "input_mapping": {"extract_sentences": "text"},
      "output_mapping": {"class": "prediction"},
      "process_endpoint": "http://localhost:<PORT>/invocations",
      "name": "mlflow_maas",
      "step": "mlflow_maas",
      "type": "mlflow_maas"
    },
    {
      "fields": ["extract_sentences", "prediction"],
      "step": "filter",
      "type": "doc_join"
    },
    {
      "entity_name_field": "Catalyst",
      "entity_type": "Catalyst",
      "excluded_values": [],
      "extract_field": "extract_sentences",
      "format_values": false,
      "global_property_field_map": {},
      "modes": ["process"],
      "property_field_map": {"Catalyst": ["prediction"]},
      "required_properties": ["Catalyst"],
      "source_field": "body",
      "step": "filter",
      "type": "squirro_entity"
    },
    {
      "fields": ["entities"],
      "step": "saver",
      "type": "squirro_item"
    }
  ]
}
These ML Workflows can then be used as inference ML Jobs scheduled in an interval or as a published model in the enrich pipeline (How-to Publish ML Models Using the Squirro Client).