...

Excerpt
An introduction to concept search and how it relates to Squirro Smart Filters.

Concept Search

Concept search is a technology where a search engine can return documents matching a defined concept (see Wikipedia's concept search article for further background).

...

Squirro's implementation of the concept search model is the Smart Filter technology.

Smart Filter Overview

Smart Filters are trained with text documents. Training documents can be paragraphs of text or entire documents. Various formats from plain text to PDF and Microsoft Word are supported.

...

When using a Smart Filter, the index is searched using all the entities that the Smart Filter was trained with. Results are ranked higher the more entities they match and the higher the score of each matching entity. The strictness of a Smart Filter is controlled by the Noise Level. At 1.0 (the highest Noise Level), all results that match at least one entity are returned. At a lower level (e.g. 0.2), the matching results are ordered by relevance and low-relevance matches are eliminated from the result set. Note that the result elimination is not linear: for example, Noise Level 0.1 is much stricter than 0.2. This is easy to inspect by adjusting the Noise Level in the Squirro UI.
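The ranking and elimination behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not Squirro's implementation: the names `score_document` and `smart_filter`, the substring matching, and in particular the quadratic cutoff curve are all assumptions made for this example. The source only states that elimination is non-linear, not which curve is used.

```python
from dataclasses import dataclass


@dataclass
class Result:
    doc_id: str
    score: float


def score_document(text: str, entity_weights: dict[str, float]) -> float:
    """Sum the weights of all trained entities that occur in the text."""
    lowered = text.lower()
    return sum(w for entity, w in entity_weights.items() if entity in lowered)


def smart_filter(docs: dict[str, str], entity_weights: dict[str, float],
                 noise_level: float) -> list[Result]:
    """Rank matching documents and drop low-relevance ones.

    ASSUMPTION: the cutoff curve is invented for illustration only.
    """
    if not 0.0 < noise_level <= 1.0:
        raise ValueError("noise_level must be in (0, 1]")
    results = [Result(doc_id, score_document(text, entity_weights))
               for doc_id, text in docs.items()]
    # Keep only documents that match at least one entity.
    results = [r for r in results if r.score > 0]
    results.sort(key=lambda r: r.score, reverse=True)
    if not results:
        return []
    # Hypothetical non-linear cutoff: 0 at noise level 1.0, rising as it drops.
    cutoff = results[0].score * (1.0 - noise_level) ** 2
    return [r for r in results if r.score >= cutoff]


# Hypothetical entities and documents.
entity_weights = {"merger": 2.0, "acquisition": 1.5, "takeover": 1.0}
docs = {
    "a": "The merger and acquisition was followed by a takeover bid.",
    "b": "A takeover rumour.",
    "c": "Merger talks continue.",
    "d": "Weather report.",
}

ids_at_1_0 = [r.doc_id for r in smart_filter(docs, entity_weights, 1.0)]
ids_at_0_5 = [r.doc_id for r in smart_filter(docs, entity_weights, 0.5)]
ids_at_0_2 = [r.doc_id for r in smart_filter(docs, entity_weights, 0.2)]
```

With this sample data, Noise Level 1.0 keeps every document that matches at least one entity, while lower levels progressively drop the weakest matches.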

Languages

A Smart Filter can be trained with documents in multiple languages. Squirro detects the language of each document and creates a cluster of the top entities for each language. During a query, the entities of each language are matched only against documents in that language.
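The per-language clustering can be pictured roughly as follows. This is a sketch under stated assumptions: the function names `build_language_clusters` and `matches` are hypothetical, language detection and entity extraction are assumed to have happened already, and "top entities" is approximated here by simple occurrence counts.

```python
from collections import Counter, defaultdict


def build_language_clusters(training_docs, top_n=3):
    """Group training documents by language and keep the top entities of each.

    training_docs: iterable of (language_code, list_of_entities) pairs.
    ASSUMPTION: language detection has already been applied to each document.
    """
    counts = defaultdict(Counter)
    for language, entities in training_docs:
        counts[language].update(entities)
    return {lang: {entity for entity, _ in counter.most_common(top_n)}
            for lang, counter in counts.items()}


def matches(doc_language, doc_entities, clusters):
    """Compare a document only against the cluster of its own language."""
    cluster = clusters.get(doc_language, set())
    return bool(cluster & set(doc_entities))


# Hypothetical training data: two English documents and one German document.
training = [
    ("en", ["merger", "bank"]),
    ("en", ["merger"]),
    ("de", ["fusion", "bank"]),
]
clusters = build_language_clusters(training, top_n=2)
```

In this sketch an English document mentioning "merger" matches, but a German document containing the same word does not, because it is only compared against the German cluster.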

Out of the box, Squirro Smart Filters support the following languages:

Chinese
Dutch
English
French
German
Italian
Portuguese
Russian
Spanish

However, the Smart Filter concept works for almost any language. Squirro needs to be trained once to understand the word frequencies of a new language, which is done by creating a GDFS database. In addition to the languages listed above, this process supports the following languages out of the box:

Arabic
Armenian
Basque
Bengali
Bulgarian
Catalan
Czech
Finnish
Galician
Hindi
Hungarian
Indonesian
Irish
Latvian
Lithuanian
Norwegian
Romanian
Sorani
Swedish
Turkish
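
To give a feel for what "word frequencies" for a language could mean, the sketch below computes, for each word, the fraction of documents in a sample corpus that contain it. This is only an illustration: the source does not describe the GDFS database format or how Squirro builds it, and the function name `document_frequencies` is hypothetical.

```python
import re
from collections import Counter


def document_frequencies(corpus):
    """Return, for each word, the fraction of documents that contain it."""
    df = Counter()
    for text in corpus:
        # \w+ is Unicode-aware in Python 3, but languages without
        # whitespace word boundaries would need a real tokenizer.
        df.update(set(re.findall(r"\w+", text.lower())))
    total = len(corpus)
    return {word: count / total for word, count in df.items()}


# Hypothetical three-document corpus.
corpus = ["the bank", "the merger", "the bank merger talks"]
freq = document_frequencies(corpus)
```

Words that appear in nearly every document, such as "the" here, carry little information, which is why frequency statistics of this kind are useful for deciding which terms are distinctive.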
