
The theory behind how recommendations work in Squirro and the three methods we provide: non-correlated facets, correlated facets, and machine learning.

...

We provide three methods for computing score(f1, f2, ..., fn): non-correlated facets, correlated facets, and machine learning. With the non-correlated and correlated facets methods, data can be recommended immediately after it is loaded into storage, without any training process. For the machine learning methods, you need the ml_workflow_id of a trained model.

Non-correlated facets

The score of a class C is computed as the average of the scores of its individual features. The score of an individual feature is the probability that the feature f co-occurs with C in a document or entity:

score(C, f1, f2, ..., fn) = (score(C, f1) + score(C, f2) + ... + score(C, fn)) / n

where score(C, fi) = power_norm(P(C | fi)) = power_norm(#E(C, fi) / #E(fi))

  • #E(C, fi): Number of entities that contain both C and fi
  • #E(fi): Number of entities that contain fi
  • power_norm: Power normalization function, explained below
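
For illustration, here is a minimal Python sketch of this computation over a made-up list of entities (the entity data, helper names and power_norm parameters are assumptions for the example only, not Squirro code):

Code Block
# Illustrative sketch: non-correlated facet scoring over a toy entity list.
entities = [
    {"City": "Zurich", "Job": "Engineer", "Salary": "high"},
    {"City": "Zurich", "Job": "Engineer", "Salary": "medium"},
    {"City": "Zurich", "Job": "Analyst", "Salary": "medium"},
    {"City": "Geneva", "Job": "Engineer", "Salary": "high"},
]

def power_norm(x, norm_base=0.5, accelerator=2):
    # power_norm(x) = norm_base + x**(1/accelerator) * (1 - norm_base)
    return norm_base + x ** (1.0 / accelerator) * (1 - norm_base)

def count(entities, **facets):
    # #E(...): number of entities that contain all of the given facet values
    return sum(all(e.get(k) == v for k, v in facets.items()) for e in entities)

def noncorrelated_score(entities, target_facet, target_value, features):
    # score(C, f1..fn) = average over fi of power_norm(#E(C, fi) / #E(fi))
    scores = []
    for facet, value in features.items():
        n_fi = count(entities, **{facet: value})                                # #E(fi)
        n_c_fi = count(entities, **{facet: value, target_facet: target_value})  # #E(C, fi)
        scores.append(power_norm(n_c_fi / n_fi) if n_fi else 0.0)
    return sum(scores) / len(scores)

# Score the class City="Zurich" given two input features.
print(noncorrelated_score(entities, "City", "Zurich",
                          {"Job": "Engineer", "Salary": "high"}))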

Correlated facets

The score of each class C is computed based on the conditional probability of C given all features.

By definition: score(C, f1, f2, ..., fn) = P(C | f1, f2, ..., fn) = #E(C, f1, f2, ..., fn) / #E(f1, f2, ..., fn)

where #E(f1, f2, ..., fn) is the number of documents or entities that contain all of f1, f2, ..., fn.

However, if a class does not co-occur with all of the features f1, f2, ..., fn, then #E(f1, f2, ..., fn) = 0 and the score is undefined (division by zero). So, given n input features of which class C co-occurs with only l features (l <= n), the final score is computed as:

score(C, f1, f2, ..., fn) = (l + P(C | f1, f2, ..., fl) * (n - l)) / n = (l + #E(C, f1, f2, ..., fl) / #E(f1, f2, ..., fl) * (n - l)) / n
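
Continuing in the same style, here is a minimal Python sketch of one possible reading of this rule, where f1, ..., fl are taken to be the input features that co-occur with the class at least once (again made-up data and helper names, not Squirro code):

Code Block
# Illustrative sketch: correlated facet scoring over a toy entity list.
entities = [
    {"City": "Zurich", "Job": "Engineer", "Salary": "high"},
    {"City": "Zurich", "Job": "Engineer", "Salary": "medium"},
    {"City": "Zurich", "Job": "Analyst", "Salary": "medium"},
    {"City": "Geneva", "Job": "Engineer", "Salary": "high"},
]

def count(entities, **facets):
    # #E(...): number of entities that contain all of the given facet values
    return sum(all(e.get(k) == v for k, v in facets.items()) for e in entities)

def correlated_score(entities, target_facet, target_value, features):
    # score(C, f1..fn) = (l + P(C | f1..fl) * (n - l)) / n, where f1..fl are
    # the l input features that co-occur with class C at least once
    n = len(features)
    cooccurring = {f: v for f, v in features.items()
                   if count(entities, **{f: v, target_facet: target_value}) > 0}
    l = len(cooccurring)
    if l == 0:
        return 0.0
    p = (count(entities, **cooccurring, **{target_facet: target_value})
         / count(entities, **cooccurring))                       # P(C | f1..fl)
    return (l + p * (n - l)) / n

# "Geneva" co-occurs with Job=Engineer but not with Salary=medium (l = 1, n = 2).
print(correlated_score(entities, "City", "Geneva",
                       {"Job": "Engineer", "Salary": "medium"}))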

Machine learning (Classification)

Generates scored classes of one facet (entity property) by using other facets (entity properties) as input. This is a typical classification task, and can be done via the Machine learning service.

We currently support most classifiers available through SKLearn, though a custom classifier is also possible.

Example machine learning workflow:

...

workflow.json:

Code Block
{
  "name": "test",
  "config": {
    "dataset": {
      "query_string": "*"
    },
    "path": ".",
    "pipeline": [{
      "step": "loader",
      "type": "squirro_query",
      "fields": ["keywords.Salary", "keywords.City", "keywords.Job"]
    },{
      "step": "checkpoint",
      "type": "disk",
      "batch_size": 64
    },{
      "step": "classifier",
      "type": "sklearn",
      "model_type": "SVC",
      "model_kwargs": {"probability": true},
      "input_fields": ["keywords.Salary", "keywords.Job"],
      "label_field": "keywords.City",
      "output_field": "keywords.City",
      "explanation_field": "keywords.City_explanation"
    }]
  }
}
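
For intuition, the sklearn classifier step above is conceptually similar to training a scikit-learn classifier directly on the facet values. The standalone sketch below uses made-up toy data and plain scikit-learn (it is not the Squirro ML service API): the input facets are one-hot encoded and the model returns one probability per possible City value.

Code Block
# Standalone sketch of the classification idea, on made-up toy data.
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import SVC

# Toy entities: predict keywords.City from keywords.Salary and keywords.Job.
train = [
    ({"Salary": "high", "Job": "Engineer"}, "Zurich"),
    ({"Salary": "high", "Job": "Analyst"}, "Zurich"),
    ({"Salary": "medium", "Job": "Engineer"}, "Zurich"),
    ({"Salary": "medium", "Job": "Analyst"}, "Basel"),
    ({"Salary": "low", "Job": "Analyst"}, "Basel"),
    ({"Salary": "low", "Job": "Engineer"}, "Basel"),
    ({"Salary": "high", "Job": "Trader"}, "Geneva"),
    ({"Salary": "medium", "Job": "Trader"}, "Geneva"),
]
X_dicts, y = zip(*train)

vec = DictVectorizer()                    # one-hot encode the categorical facets
X = vec.fit_transform(X_dicts)

clf = SVC(probability=True)               # probability=True, as in model_kwargs above
clf.fit(X, y)

# Scored classes for a new entity: one probability per possible City value.
probs = clf.predict_proba(vec.transform([{"Salary": "high", "Job": "Engineer"}]))[0]
print(dict(zip(clf.classes_, probs.round(3))))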

Machine learning (Regression aggregation)

Generates scored classes of one facet (entity property) by aggregating another facet (entity property). Scoring the individual items (entities) is a regression task and can be done via the Machine learning service. The aggregation is then performed by a filter step in the machine learning workflow.

We currently support most regressors available through SKLearn, though a custom regressor is also possible.

Example machine learning workflow:

...

workflow.json:

Code Block
{
  "name": "test",
  "config": {
    "dataset": {
      "query_string": "*"
    },
    "path": ".",
    "pipeline": [{
      "step": "loader",
      "type": "squirro_query",
      "fields": ["keywords.Salary", "keywords.City", "keywords.Job"]
    },{
      "step": "checkpoint",
      "type": "disk",
      "batch_size": 64
    },{
      "step": "classifier",
      "type": "sklearn",
      "model_type": "SVR",
      "input_fields": ["keywords.City", "keywords.Job"],
      "label_field": "keywords.Salary",
      "output_field": "keywords.Salary",
      "explanation_field": "keywords.Salary_explanation"
    },{
      "step": "filter",
      "type": "aggregate",
      "aggregated_fields": ["keywords.Salary"],
      "aggregating_field": "keywords.City"
    }]
  }
}
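
For intuition, the pipeline above first predicts a Salary value per entity (regression) and then aggregates those predictions per City value. The standalone sketch below reproduces that idea with plain scikit-learn on made-up toy data, assuming a simple mean as the aggregation (it is not the Squirro ML service API, and the actual filter step may aggregate differently):

Code Block
# Standalone sketch of the regression + aggregation idea, on made-up toy data.
from collections import defaultdict
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import SVR

# Toy entities: predict keywords.Salary from keywords.City and keywords.Job.
train = [
    ({"City": "Zurich", "Job": "Engineer"}, 120.0),
    ({"City": "Zurich", "Job": "Analyst"}, 100.0),
    ({"City": "Geneva", "Job": "Engineer"}, 110.0),
    ({"City": "Geneva", "Job": "Trader"}, 140.0),
    ({"City": "Basel", "Job": "Analyst"}, 90.0),
    ({"City": "Basel", "Job": "Engineer"}, 105.0),
]
X_dicts, y = zip(*train)

vec = DictVectorizer()                    # one-hot encode the categorical facets
reg = SVR().fit(vec.fit_transform(X_dicts), y)

# Regression step: one predicted Salary per entity.
predicted = reg.predict(vec.transform(X_dicts))

# Aggregation step: group the predicted Salary values by City and average them,
# yielding one aggregated score per City class.
by_city = defaultdict(list)
for entity, salary in zip(X_dicts, predicted):
    by_city[entity["City"]].append(salary)
print({city: sum(vals) / len(vals) for city, vals in by_city.items()})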

Understanding Power Normalization Functions

When using probabilities to compute the score, the values we get are usually quite small. For example, if the highest probability is 1/10 and we display it as 10%, users may feel too little confidence to take that recommendation. Therefore we define a power normalization function that transforms the probability into a more reasonable value to display to the user:

power_norm(x) = norm_base + power(x, 1/accelerator) * (1 - norm_base)

This function guarantees the following characteristics:

  • If 0 <= x <= 1, then 0 <= power_norm(x) <= 1
  • If xi < xj, then power_norm(xi) < power_norm(xj)

where

  • accelerator: the higher this value, the more quickly the score approaches 1
  • norm_base: the base value that keeps scores separated from 0
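
As a quick sketch, here is the function in Python with the illustrative parameter values norm_base = 0.5 and accelerator = 2 (the parameter values are examples only):

Code Block
# Sketch of power normalization: small probabilities are lifted into a more
# presentable range while their ordering is preserved.
def power_norm(x, norm_base=0.5, accelerator=2):
    return norm_base + x ** (1.0 / accelerator) * (1 - norm_base)

for x in (0.0, 0.1, 0.25, 0.5, 1.0):
    print(f"P = {x:.2f}  ->  displayed score = {power_norm(x):.2f}")
# e.g. a raw probability of 0.10 is displayed as roughly 0.66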

The figure below shows the power normalization function for norm_base = 0.5 and accelerator = 1, 2, 3 and 4.

...

This page can now be found at Recommendations on the Squirro Docs site.