  1. Language detection

  2. Language-specific spaCy analysis, applied using the pre-trained spaCy language model (see example) for the detected language. The analysis includes:

    • Tokenization and lemmatization

    • Part of Speech (POS) tagging

    • Named Entity Recognition (NER)

  3. Part of Speech Booster / Filter

    • Assigns weights to tokens based on their POS tags

    • Conjunctions and determiners are removed

  4. Query Modifier
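
The booster/filter step (3) can be pictured with the minimal sketch below. It is not Squirro's implementation. It works on tokens that are already POS-tagged (as the spaCy analysis in step 2 would produce) and reuses the `pos_weight_map` values from the pre-configured pipeline. It assumes that (a) a numeric weight becomes a Lucene-style `term^weight` boost, (b) `"-"` keeps a term without boosting it, and (c) with `strict_filter` enabled, tokens whose tag is missing from the map (such as CCONJ and DET) are dropped. The function name `boost_terms` is hypothetical.

```python
from typing import Dict, List, Tuple, Union

# Weights copied from the pre-configured pos_booster step.
# Assumption: "-" means "keep the term, but do not boost it".
POS_WEIGHT_MAP: Dict[str, Union[int, str]] = {
    "PROPN": 10, "NOUN": 10, "VERB": 2, "ADJ": 5,
    "X": "-", "NUM": "-", "SYM": "-",
}


def boost_terms(
    tokens: List[Tuple[str, str]],
    weight_map: Dict[str, Union[int, str]] = POS_WEIGHT_MAP,
    strict_filter: bool = True,
) -> str:
    """Hypothetical POS booster: turn (text, pos) pairs into a boosted query string."""
    parts = []
    for text, pos in tokens:
        weight = weight_map.get(pos)
        if weight is None:
            # Assumption: strict filtering drops tags that are not in the map
            # (e.g. CCONJ, DET); without it, the term is kept unchanged.
            if not strict_filter:
                parts.append(text)
            continue
        if weight == "-":
            parts.append(text)  # kept, no boost
        else:
            parts.append(f"{text}^{weight}")  # assumed Lucene-style boost syntax
    return " ".join(parts)


# Tokens as a spaCy English model might tag "the annual report and 2023 results"
tagged = [("the", "DET"), ("annual", "ADJ"), ("report", "NOUN"),
          ("and", "CCONJ"), ("2023", "NUM"), ("results", "NOUN")]
print(boost_terms(tagged))  # annual^5 report^10 2023 results^10
```

Nouns and proper nouns dominate the resulting query, while the determiner and the conjunction disappear, which matches the behaviour described for this step.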

You can configure the available workflows under AI STUDIO > ML Workflows.

Every project is equipped with a default query-processing workflow. This default workflow is read-only: it cannot be deleted or modified. It is managed by the Machine-Learning (ML) Service and is automatically updated to the latest version.

The default query-processing workflow is set as the ACTIVE QUERY PROCESSOR and is listed along with any custom workflows.

To make a workflow the ACTIVE QUERY PROCESSOR, hover over it and click SET ACTIVE.

If you want to customise the behaviour of the default query-processing workflow, CLONE the workflow and edit its configuration.
Then hover over the newly created workflow and click SET ACTIVE to make the clone the ACTIVE QUERY PROCESSOR.
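
Editing a clone typically means changing a single step while the read-only default stays untouched. The sketch below illustrates that idea on the workflow configuration represented as a Python dict, abbreviated to the `pos_booster` step. The helper `set_pos_weight` is hypothetical; in practice the edit is made in the workflow editor, not through this code.

```python
import copy
import json

# Abbreviated default workflow: only the pos_booster step is shown.
DEFAULT_WORKFLOW = {
    "component": "Query-Processing",
    "pipeline": [
        {
            "step": "custom",
            "type": "enrich",
            "name": "pos_booster",
            "strict_filter": True,
            "pos_weight_map": {"PROPN": 10, "NOUN": 10, "VERB": 2, "ADJ": 5},
        }
    ],
}


def set_pos_weight(workflow: dict, pos: str, weight) -> dict:
    """Return an edited copy of `workflow`; the input is never mutated."""
    clone = copy.deepcopy(workflow)  # the default itself stays read-only
    for step in clone["pipeline"]:
        if step.get("name") == "pos_booster":
            step["pos_weight_map"][pos] = weight
            return clone
    raise ValueError("workflow has no pos_booster step")


# Example customisation: give verbs the same weight as adjectives.
custom = set_pos_weight(DEFAULT_WORKFLOW, "VERB", 5)
print(json.dumps(custom["pipeline"][0]["pos_weight_map"]))
# {"PROPN": 10, "NOUN": 10, "VERB": 5, "ADJ": 5}
```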

The default query-processing workflow cannot be deleted, but it can be disabled. To disable query processing, navigate to SETTINGS > Project Configuration and remove the topic.search.query-workflow option by clicking the RESET button.

Info

During startup, the ML Service automatically adds the default query-processing workflow to any project that doesn’t have it.

Because each project has its own default workflow, the default query-processing workflow is not included when a project is imported.

Query Processing Workflow Steps

Pre-configured query-processing pipeline steps:

```json
{
  "component": "Query-Processing",
  "cacheable": true,
  "dataset": {
    "items": []
  },
  "pipeline": [
    {
      "fields": [
        "query",
        "user_terms",
        "facet_filters"
      ],
      "step": "loader",
      "type": "squirro_item"
    },
    {
      "step": "custom",
      "type": "parse",
      "name": "syntax_parser"
    },
    {
      "step": "custom",
      "type": "analysis",
      "name": "lang_detection",
      "input_field": "user_terms_str"
    },
    {
      "step": "custom",
      "name": "custom_spacy_normalizer",
      "type": "analysis",
      "infix_split_hyphen": false,
      "infix_split_chars": ":<>=",
      "merge_entities": true,
      "merge_noun_chunks": false,
      "cacheable": true,
      "input_fields": [
        "user_terms_str"
      ],
      "output_fields": [
        "nlp"
      ],
      "exclude_spacy_pipes": [],
      "spacy_model_mapping": {
        "en": "en_core_web_sm",
        "de": "de_core_news_sm"
      }
    },
    {
      "step": "custom",
      "type": "enrich",
      "name": "pos_booster",
      "strict_filter": true,
      "analyzed_input_field": "nlp",
      "phrase_proximity_distance": 15,
      "pos_weight_map": {
        "PROPN": 10,
        "NOUN": 10,
        "VERB": 2,
        "ADJ": 5,
        "X": "-",
        "NUM": "-",
        "SYM": "-"
      }
    },
    {
      "step": "custom",
      "type": "enrich",
      "name": "query_modifier",
      "raw_input_field": "query",
      "term_mutations_metadata": ["term_expansion_mutations", "pos_mutations"],
      "output_field": "enriched_query"
    },
    {
      "step": "debugger",
      "type": "log_fields",
      "fields": [
        "user_terms",
        "facet_filters",
        "pos_mutations",
        "term_expansion_mutations",
        "enriched_query"
      ],
      "log_level": "info"
    }
  ]
}
```
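
The `lang_detection` step feeds the `custom_spacy_normalizer` step, which picks a spaCy model from `spacy_model_mapping`. The sketch below shows that lookup in isolation. It is an illustration, not the ML Service's code: the function name `select_spacy_model` is hypothetical, and falling back to English for unmapped or undetected languages is an assumption.

```python
from typing import Dict, Optional

# Mapping copied from the custom_spacy_normalizer step.
SPACY_MODEL_MAPPING: Dict[str, str] = {
    "en": "en_core_web_sm",
    "de": "de_core_news_sm",
}


def select_spacy_model(
    detected_lang: Optional[str],
    mapping: Dict[str, str] = SPACY_MODEL_MAPPING,
    fallback_lang: str = "en",
) -> str:
    """Return the spaCy model name for a detected language code.

    Assumption: unmapped or undetected languages fall back to `fallback_lang`.
    """
    if detected_lang:
        key = detected_lang.lower()  # tolerate codes such as "EN"
        if key in mapping:
            return mapping[key]
    return mapping[fallback_lang]


print(select_spacy_model("de"))  # de_core_news_sm
print(select_spacy_model("fr"))  # en_core_web_sm (assumed fallback)
```

Under this reading, supporting another language in a cloned workflow would mean adding an entry such as `"fr": "fr_core_news_sm"` to `spacy_model_mapping`, provided that model is available to the ML Service.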
