...

Significant Terms configuration

Significant Terms reveal the "uncommonly common": terms whose value distribution in one data set differs significantly from their distribution in another data set. The first data set is called the foreground set, and the set used for comparison is the background set. The widget presents the terms found this way visually.

In Squirro, the background set is constructed from the base query defined in the dashboard, that is, the unmodified dashboard. The foreground set is constructed from the current dashboard query, which is the base query with all current selections applied. The comparison needs a certain number of documents to work: the more terms there are in a facet, the more documents are needed to get a meaningful answer.

If the two sets cannot be told apart (for example, the dashboard is not drilled down, so the current query is still the base query), the widget falls back to showing the term frequency, that is, all facet values of the configured content facet field. This fallback does not apply when the content field is one of the special fields 'body', 'title', or 'summary', because the operation is too costly there. In that case the widget cannot show any data without a search query, and a message about this is shown to the user instead.
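
The foreground/background comparison described above can be illustrated with a small sketch. This is not Squirro's implementation: the function names, the `min_doc_count` parameter, the sample documents, and the simple ratio score (foreground rate divided by background rate) are all assumptions chosen for illustration. It also assumes the foreground set is a subset of the background set, so equal sizes mean that no selection is in place.

```python
from collections import Counter


def term_counts(documents, facet):
    """Count in how many documents each value of `facet` occurs."""
    counts = Counter()
    for doc in documents:
        # Use a set so a value repeated within one document counts once.
        counts.update(set(doc.get(facet, [])))
    return counts


def significant_terms(foreground, background, facet, min_doc_count=1):
    """Rank facet values by over-representation in the foreground set.

    Hypothetical helper, not a Squirro API. Returns (mode, ranked), where
    mode is "frequency" when there is no selection and "significance"
    otherwise.
    """
    fg_counts = term_counts(foreground, facet)

    # Assumption: foreground is a subset of background, so equal sizes
    # mean the sets are identical and only term frequency can be shown.
    if len(foreground) == len(background):
        return "frequency", fg_counts.most_common()

    bg_counts = term_counts(background, facet)
    scores = {}
    for term, fg_count in fg_counts.items():
        bg_count = bg_counts[term]
        if fg_count < min_doc_count or bg_count == 0:
            continue
        fg_rate = fg_count / len(foreground)
        bg_rate = bg_count / len(background)
        # Illustrative score: how much more common the term is in the
        # foreground than in the background.
        scores[term] = fg_rate / bg_rate

    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return "significance", ranked


# Made-up sample data: a "topic" facet on eight background documents.
background = (
    [{"topic": ["finance"]}] * 5
    + [{"topic": ["energy"]}] * 2
    + [{"topic": ["sports"]}]
)
# A selection that keeps one finance document and both energy documents.
foreground = [{"topic": ["finance"]}] + [{"topic": ["energy"]}] * 2

mode, ranked = significant_terms(foreground, background, "topic")
```

With this sample data, "energy" ranks above "finance" even though "finance" is the most frequent value overall, because "energy" is far more common in the selection than in the unmodified data set.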

Significant Terms work very well on facets with few values. When computed on the body, title, or summary field, many more documents are needed before a significant term shows up. One workaround for this restriction is to use phrase or term detection and index the detected phrases or terms in a separate facet field. This has been shown to improve the results considerably without requiring a large number of documents.

The following screenshot shows the significant terms on the left, without any selection:

The Significant Terms configuration is easy for facet content fields, and looks like this:

...