Query processing improves a user’s search experience by providing more relevant search results. Squirro achieves this improvement by running the user’s query through a customizable query processing workflow that parses, filters, enriches, and expands queries before performing the actual search and presenting the search results to the user. For example, part of speech (POS) boosting and filtering removes irrelevant terms like conjunctions from the query and gives more weight to relevant parts of the query like nouns. Items that match boosted query terms are ranked higher in the returned search results.

Overview

The figure below illustrates how query processing fits into the overall architecture.

...

In the example shown in the figure, the user enters the query country:us 2020-10 covid-19 cases in new york in a Global Search Bar on the Squirro dashboard. The query is then sent through the Query Understanding Plugin (1) to the ML-Service where the query processing workflow, a Squirro ML-Workflow, is executed to apply the following steps on the incoming query:

  1. Language detection

  2. Language-specific spaCy Analysis is applied using the pre-trained spaCy language model (see example) for the detected language. The analysis includes:

    • Tokenization and lemmatization

    • Part of Speech (POS) tagging

    • Named Entity Recognition (NER)

  3. Part of Speech Booster / Filter

    • Assigns weight to tokens based on their POS tags

    • Conjunctions and determiners are removed

  4. Query Modifier

The final query modifier step applies all modifications to the initial query to produce the Enriched Query (2) which is then used to retrieve the candidate documents that best match the query from the Elasticsearch index (3).

Query processing and rewriting improve the search experience by ranking items that match boosted terms higher and by reducing the number of irrelevant search results for the query. The latter is achieved by combining terms that belong together. Entities like “New York” are treated as a single unit in the query, which prevents multipage items (e.g., PDFs) that have “new” on one page and “york” on a different page from being matched and appearing in the search results.
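The Part of Speech Booster / Filter step can be pictured with a minimal Python sketch. It does not call spaCy or any Squirro API: the POS tags are written by hand, and the weights, the set of dropped tags, and the function name boost_and_filter are assumptions made for illustration only.

```python
from typing import List, Tuple

# Hypothetical weights; the real values live in the step's pos_weight_map.
POS_WEIGHTS = {"PROPN": 2.0, "NOUN": 2.0, "NUM": 1.0, "ADJ": 1.0, "VERB": 0.5}
# Tags assumed to be removed from the query (conjunctions, determiners).
DROPPED_TAGS = {"CCONJ", "SCONJ", "DET"}


def boost_and_filter(tagged: List[Tuple[str, str]]) -> List[Tuple[str, float]]:
    """Drop filtered POS tags and attach a weight to each remaining token."""
    result = []
    for token, pos in tagged:
        if pos in DROPPED_TAGS:
            continue  # conjunctions and determiners do not reach the search
        # Tags without an explicit weight keep a neutral weight of 1.0.
        result.append((token, POS_WEIGHTS.get(pos, 1.0)))
    return result


# POS tags are supplied by hand here; in the workflow they come from spaCy.
tagged_query = [
    ("the", "DET"),
    ("covid-19", "NOUN"),
    ("cases", "NOUN"),
    ("and", "CCONJ"),
    ("deaths", "NOUN"),
    ("in", "ADP"),
    ("new york", "PROPN"),
]
weighted = boost_and_filter(tagged_query)
print(weighted)
```

The sketch returns (token, weight) pairs rather than a query string, because how weights are written into the enriched query is not specified on this page.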

Configuration

Starting with Squirro 3.4.5, each project will be pre-configured with a default query processing workflow. The workflow is installed on the server as a global asset and cannot be deleted via the user interface. It is enabled by default.

The behaviour of the workflow is managed in the project configuration under the SETTINGS tab where you can configure the following settings:

Name: topic.search.query-workflow-enabled
Value: false, true
Description: Enable or disable the query processing feature.

Name: topic.search.query-workflow
Value: ${workflow_id}
Description: Set the value to the workflow_id of the ML workflow you want to use for query processing. By default, the workflow_id is set to the ID of the pre-configured workflow that is set up upon project creation. Remove the setting if you want to disable query processing.

Name: topic.search.query-workflow-mode
Value: always, global
Description: Modes for workflow execution:

  • global (recommended, requires the Global Search Bar widget): executes the query processing workflow once for the whole dashboard, triggered via the Global Search Bar widget.

  • always: executes the workflow for every request to the /query endpoint. This mode is useful when Squirro is used as an API only.
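How the three settings interact can be sketched in Python. This is an illustration of the rules described above, not Squirro's implementation: the function should_run_workflow, its from_global_search_bar argument, and the example workflow ID are hypothetical.

```python
from typing import Mapping


def should_run_workflow(settings: Mapping[str, object], from_global_search_bar: bool) -> bool:
    """Decide whether a request triggers the query processing workflow."""
    # The feature must be switched on explicitly.
    if settings.get("topic.search.query-workflow-enabled") is not True:
        return False
    # Removing the workflow ID also disables query processing.
    if not settings.get("topic.search.query-workflow"):
        return False
    # "always": every request to the /query endpoint runs the workflow.
    if settings.get("topic.search.query-workflow-mode") == "always":
        return True
    # "global": only requests triggered by the Global Search Bar widget do.
    return from_global_search_bar


dashboard_settings = {
    "topic.search.query-workflow-enabled": True,
    "topic.search.query-workflow": "example-workflow-id",  # placeholder ID
    "topic.search.query-workflow-mode": "global",
}
api_settings = dict(dashboard_settings, **{"topic.search.query-workflow-mode": "always"})

print(should_run_workflow(dashboard_settings, from_global_search_bar=True))
print(should_run_workflow(dashboard_settings, from_global_search_bar=False))
print(should_run_workflow(api_settings, from_global_search_bar=False))
```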

Workflow Management

You can configure the available workflows under AI STUDIO > ML Workflows.

Every project comes with a default query processing workflow. This default workflow is read-only and cannot be deleted or modified. It is managed by the Machine-Learning (ML) Service and is automatically updated to the latest version.

The default query-processing workflow is set as the ACTIVE QUERY PROCESSOR and is listed along with any other custom workflow.

...

If you want to customise the behaviour of the default query-processing workflow, you can CLONE the workflow and edit its configuration.
Then hover over the newly created workflow and click SET ACTIVE to make the cloned workflow the ACTIVE QUERY PROCESSOR.

...

Info

During startup, the ML-Service automatically adds the default query processing workflow to any project that does not have it.

Because each project has its own default workflow, the default query processing workflow is not imported when a project is imported.

Query Processing Workflow Steps

The default query processing workflow uses the following built-in libNLP steps. Since 3.6.1, these are native nlp-app steps (app.query_processing).

...

Pre-configured query processing pipeline steps:
{
    "cacheable": true,
    "dataset": {
        "items": []
    },
    "pipeline": [
        {
            "fields": ["query"],
            "step": "loader",
            "type": "squirro_item"
        },
        {
            "step": "app",
            "type": "query_processing",
            "name": "syntax_parser"
        },
        {
            "step": "app",
            "type": "query_processing",
            "name": "lang_detection",
            "fallback_language": "en"
        },
        {
            "step": "app",
            "type": "query_processing",
            "name": "custom_spacy_normalizer",
            "cache_document": true,
            "model_cache_expiration": 180000,
            "infix_split_hyphen": false,
            "infix_split_chars": ":<>=",
            "merge_noun_chunks": false,
            "merge_phrases": true,
            "merge_entities": true,
            "fallback_language": "en",
            "exclude_spacy_pipes": [],
            "spacy_model_mapping": {
                "en": "en_core_web_sm",
                "de": "de_core_news_sm"
            },
            "struct_log_enable": true,
            "struct_log_name": "spacy-normalizer",
            "struct_log_input_step_fields": ["user_terms_str"]
        },
        {
            "step": "app",
            "type": "query_processing",
            "name": "pos_booster",
            "phrase_proximity_distance": 15,
            "pos_weight_map": {
                "PROPN": "-",
                "NOUN": "-",
                "VERB": "-",
                "ADJ": "-",
                "X": "-",
                "NUM": "-",
                "SYM": "-"
            }
        },
        {
            "step": "app",
            "type": "query_processing",
            "name": "lemma_tagger"
        },
        {
            "step": "app",
            "type": "query_processing",
            "name": "query_classifier",
            "model": "svm-query-classifier"
        },
        {
            "step": "app",
            "type": "query_processing",
            "name": "query_modifier"
        },
        {
            "step": "debugger",
            "type": "log_fields",
            "fields": [
                "user_terms",
                "facet_filters",
                "pos_mutations",
                "type",
                "enriched_query",
                "lemma_map"
            ],
            "log_level": "info"
        }
    ]
}
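Customising a cloned workflow, for example to add a spaCy model for another language, amounts to editing this JSON. The following is a minimal Python sketch that uses only the standard library and a trimmed copy of the configuration above. The helper name add_language_model is hypothetical, and the French model fr_core_news_sm is only an example that would have to be installed on the server first.

```python
import copy
import json

# Trimmed copy of the default workflow; only the step edited here is shown.
default_workflow = {
    "cacheable": True,
    "pipeline": [
        {
            "step": "app",
            "type": "query_processing",
            "name": "custom_spacy_normalizer",
            "fallback_language": "en",
            "spacy_model_mapping": {
                "en": "en_core_web_sm",
                "de": "de_core_news_sm",
            },
        },
    ],
}


def add_language_model(workflow: dict, language: str, model: str) -> dict:
    """Return a copy of the workflow with an extra spaCy model mapping."""
    cloned = copy.deepcopy(workflow)  # leave the original configuration untouched
    for step in cloned["pipeline"]:
        if step.get("name") == "custom_spacy_normalizer":
            step["spacy_model_mapping"][language] = model
    return cloned


custom_workflow = add_language_model(default_workflow, "fr", "fr_core_news_sm")
print(json.dumps(custom_workflow, indent=4))
```

The printed JSON could then be pasted into the configuration of a cloned workflow; the default workflow itself stays unchanged, in line with it being read-only.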

The workflow is set up to:

  • Parse the Squirro query syntax and detect the query language based on the available natural language terms.

  • Perform Named Entity Recognition. Each entity compound is then rewritten into an additional phrase query: 'cases in new york' is rewritten as 'cases in (new york OR "new york"~10)'.

  • Boost important terms based on their POS tags. Nouns (tags NOUN and PROPN) are boosted by assigning them higher weights in the pos_weight_map, while the impact of verbs (VERB), for example, is reduced by assigning a lower weight.
    Terms like determiners or conjunctions are removed from the query.

  • Perform query classification: question_or_statement vs. keyword.
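The entity rewrite from the list above can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the entity is passed in by hand (in the workflow it comes from Named Entity Recognition), and the function name rewrite_entity and its proximity parameter are hypothetical.

```python
def rewrite_entity(query: str, entity: str, proximity: int = 10) -> str:
    """Replace an entity with a group that also matches it as a phrase."""
    if entity not in query:
        return query  # nothing to rewrite
    # Keep the loose terms and add a proximity phrase query as an alternative.
    group = f'({entity} OR "{entity}"~{proximity})'
    return query.replace(entity, group, 1)


rewritten = rewrite_entity("cases in new york", "new york")
print(rewritten)  # cases in (new york OR "new york"~10)
```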

You can configure the steps of the query processing workflow in the UI in the ML Workflows plugin under the AI STUDIO tab.

How-to Guides

How-to customise Query Processing using custom Steps

How-to Install a SpaCy Language Model

This page can now be found at Query Processing on the Squirro Docs site.