Spellchecking BE

This feature lets the user spellcheck a potential query against the data loaded in Squirro. For this we use the Elasticsearch suggester functionality.

How to use / How does it work

Send a normal query/item request to the topic API (/v0/{tenant}/projects/{project_id}/items/query). In addition to the query, add a spellcheck tag in the following format:

'spellcheck': {'text': 'squiro', 'field': 'body.en.unstemmed'}
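As a rough sketch, the request could be assembled as follows. The helper name build_spellcheck_request, the tenant and project values, and the "query" key for the regular query are assumptions made for illustration; only the endpoint path and the spellcheck tag come from this page.

```python
import json

# Endpoint path as documented on this page.
ITEMS_QUERY_PATH = "/v0/{tenant}/projects/{project_id}/items/query"


def build_spellcheck_request(tenant, project_id, query, text, field):
    """Return (path, body) for an items query that also asks for a spellcheck.

    Hypothetical helper: it only builds the payload and does not send it.
    """
    path = ITEMS_QUERY_PATH.format(tenant=tenant, project_id=project_id)
    body = {
        "query": query,  # assumed key name for the regular query
        "spellcheck": {"text": text, "field": field},
    }
    return path, body


# Placeholder tenant and project id, for illustration only.
path, body = build_spellcheck_request(
    "acme", "p1", "squiro", "squiro", "body.en.unstemmed"
)
print(path)
print(json.dumps(body, indent=2))
```

The body can then be sent with whichever HTTP client and authentication the project already uses.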

The field text holds the query for which you would like a suggestion. The field field names the Elasticsearch index field in which the suggester looks for similar words.

The topic API then picks the suggested word with the highest score from the Elasticsearch response. In addition to the query results, the response contains a dictionary with the suggestion for your spellcheck query:

'spellcheck': {'original': 'squiro', 'corrected': 'squirro'}
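The selection step can be sketched as below, assuming the response has the shape returned by the Elasticsearch term suggester (a list of entries, each with scored options). The suggester name "spellcheck", the function name pick_correction, and the sample scores are assumptions; the actual topic API code may differ, and handling of multi-word input is an extension not described on this page.

```python
def pick_correction(es_response, suggester_name="spellcheck"):
    """Build the spellcheck result from a term-suggester style response.

    For every token entry, the option with the highest score is chosen.
    Tokens without options are kept unchanged.
    """
    entries = es_response.get("suggest", {}).get(suggester_name, [])
    original_tokens = []
    corrected_tokens = []
    for entry in entries:
        original_tokens.append(entry["text"])
        options = entry.get("options", [])
        if options:
            best = max(options, key=lambda option: option["score"])
            corrected_tokens.append(best["text"])
        else:
            corrected_tokens.append(entry["text"])
    return {
        "original": " ".join(original_tokens),
        "corrected": " ".join(corrected_tokens),
    }


# Made-up sample response; scores and frequencies are illustrative only.
sample_response = {
    "suggest": {
        "spellcheck": [
            {
                "text": "squiro",
                "offset": 0,
                "length": 6,
                "options": [
                    {"text": "squirrel", "score": 0.5, "freq": 3},
                    {"text": "squirro", "score": 0.83, "freq": 40},
                ],
            }
        ]
    }
}

result = pick_correction(sample_response)
print(result)
```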

Further improvements

  • We see some possible improvements, since the Elasticsearch suggester only allows requests on a single, specific index:

    • creating a new index that is used for spellchecking

    • sending multiple requests to gather suggestions from various fields

  • An ES query fails if we provide a non-existent index to the spellchecker, since the ES suggester only takes single index field names without wildcards. Possible mitigations:

    • decoupling from the query execution

    • a priori checks whether the provided index exists

  • If we run into performance issues with the current approach, we could load the tokens into a dictionary as a middle layer and access it via a Python spellcheck library (e.g. pyspellchecker).
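
A minimal sketch of the dictionary middle layer from the last point, under stated assumptions: it uses difflib from the Python standard library as a stand-in for a dedicated library such as pyspellchecker, and the names build_vocabulary and correct, the sample tokens, and the cutoff value are all hypothetical.

```python
import difflib


def build_vocabulary(tokens):
    """Collect the distinct, lower-cased tokens that act as the dictionary."""
    return sorted({token.lower() for token in tokens})


def correct(word, vocabulary, cutoff=0.8):
    """Return the closest vocabulary entry, or the word itself if none is close.

    difflib ranks candidates by sequence similarity; a real spellcheck
    library would typically also weigh word frequency.
    """
    word = word.lower()
    if word in vocabulary:
        return word
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


# Tokens would come from the indexed data; these are placeholders.
vocabulary = build_vocabulary(["Squirro", "search", "query"])
suggestion = correct("squiro", vocabulary)
print({"original": "squiro", "corrected": suggestion})
```

Because the vocabulary lives in memory, no Elasticsearch request is needed per lookup; the trade-off is that it has to be refreshed whenever the indexed data changes.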