When indexing data from other data sources into Squirro, the data is transformed into the Squirro item format. In this process, keywords are used to add structured and semi-structured information to the items.
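As a rough illustration of that transformation, the sketch below maps a source record onto an item-like dictionary in which each keyword name points to a list of values. The record layout, the field names and the `to_item` helper are assumptions made for this example and are not meant as a definitive description of the Squirro item schema.

```python
def to_item(record):
    """Map a hypothetical source record onto an item-like dictionary."""
    return {
        "title": record["subject"],
        "body": record["text"],
        # Assumed shape: keyword name -> list of values.
        "keywords": {
            "author": [record["author"]],
            "department": [record["dept"]],
        },
    }


# Hypothetical source record, e.g. one row from a database export.
record = {
    "subject": "Q3 report",
    "text": "Revenue grew.",
    "author": "J. Doe",
    "dept": "Finance",
}
item = to_item(record)
```

Keeping the mapping in one small function makes it easy to see which source fields end up as keywords and which do not.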

This section covers what to consider when choosing the keywords to use on Squirro items.

None of the guidelines on this page are binding. They are simply best practices that the Squirro team follows when creating projects. Apart from the performance considerations, Squirro doesn't enforce any specific way of working with facets, labelling them, etc.

Considerations

Performance

Performance considerations come into play when working with facets. Every facet that's maintained adds a bit of overhead, especially in memory consumption. When returning the facet selection list in the search screen, Squirro and the underlying Elasticsearch server need to look at every result and count the occurrences of each facet value.
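To make that cost more tangible, the counting step can be modelled conceptually as below. This is a simplified sketch in plain Python, not how Squirro or Elasticsearch implement it internally; the `count_facet_values` helper and the sample results are hypothetical.

```python
from collections import Counter


def count_facet_values(results, facet):
    """Count how often each value of one facet occurs across all results."""
    counts = Counter()
    for item in results:
        # Every result is visited once for every facet that is displayed.
        counts.update(item.get("keywords", {}).get(facet, []))
    return counts


# Hypothetical search results with keyword values per item.
results = [
    {"keywords": {"country": ["CH"], "topic": ["finance"]}},
    {"keywords": {"country": ["CH", "DE"]}},
    {"keywords": {"topic": ["finance"]}},
]
country_counts = count_facet_values(results, "country")
topic_counts = count_facet_values(results, "topic")
```

In this model the work grows with both the number of results and the number of facets, which is why each additional facet carries some overhead.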

To improve performance, the following changes can be made:

Completeness

It's tempting to initially import any and every field from the source data and add each one as a facet. While this is often a sensible approach in a PoC or exploratory phase, it should be avoided in production.

Only import facets that are actually used in dashboards, filtering and search. If a facet needs to be added at a later stage, that is not a problem and can always be done.
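One way to apply this during import is an explicit allow-list of facet names, so that unused source fields never become keywords. The following is a minimal sketch under that assumption; `USED_FACETS`, `select_keywords` and the sample record are hypothetical names chosen for illustration.

```python
# Hypothetical allow-list: facets actually used in dashboards, filters or search.
USED_FACETS = {"author", "department"}


def select_keywords(record, used_facets=USED_FACETS):
    """Keep only allow-listed fields and wrap single values in a list."""
    return {
        name: value if isinstance(value, list) else [value]
        for name, value in record.items()
        if name in used_facets and value is not None
    }


# Hypothetical source record with fields that are not needed as facets.
record = {
    "author": "J. Doe",
    "department": "Finance",
    "internal_id": 4711,
    "checksum": "ab12",
}
keywords = select_keywords(record)
```

Adding a facet later then only requires extending the allow-list.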

Usability

Generally, it's best to use internal facet names that are lowercase and contain no spaces. The display name is then used to give each facet a user-friendly name.
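The naming convention can be sketched as a small helper that derives an internal name from a source field label and keeps the original label as the display name. The `internal_name` and `facet_definition` functions and the dictionary keys are assumptions for this example, not part of a Squirro API.

```python
import re


def internal_name(source_field):
    """Derive a lowercase internal name without spaces from a field label."""
    name = source_field.strip().lower()
    # Collapse every run of other characters into a single underscore.
    name = re.sub(r"[^a-z0-9]+", "_", name)
    return name.strip("_")


def facet_definition(source_field):
    """Pair the internal name with a user-friendly display name."""
    return {
        "name": internal_name(source_field),
        "display_name": source_field.strip(),
    }


definition = facet_definition("Document Type")
```

Here "Document Type" is stored internally as `document_type`, while users continue to see the original label.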

Also arrange facets into user-friendly groups and make use of the provided formatting options, such as the date format.