Aggregations can be requested when querying items. They are used to calculate summarized views and statistics about the data in search results. This allows you to build data drill-downs along the structured part of items as well as visualizations over time or other data dimensions.
The request is a URL-encoded JSON object with the following structure:
{<field_or_label>: { ["fields": <field_name> | <list of field_names>] [, "size": <number of results>]? [, "method": <metric_function>]? [, "interval": <interval>]? [, "aggregation": <field name>]? } [, <field_or_label> : { ... }] }
For example:
{ "my_aggregation": { "fields": "Author", "size": 10, "method": "terms" } }
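As a minimal sketch of the encoding step, the request object from the example above can be serialized and URL-encoded with the Python standard library. The parameter name `aggregations` is an assumption borrowed from the keyword argument used in the client script, not a confirmed HTTP parameter name.

```python
import json
from urllib.parse import parse_qs, urlencode

# Aggregation request taken from the example above.
aggregations = {
    "my_aggregation": {"fields": "Author", "size": 10, "method": "terms"}
}


def encode_aggregations(aggs, param_name="aggregations"):
    """Serialize the aggregation object to JSON, then URL-encode it.

    param_name is a hypothetical query parameter name (an assumption).
    """
    return urlencode({param_name: json.dumps(aggs)})


encoded = encode_aggregations(aggregations)

# Decoding the query string gives back the original object.
decoded = json.loads(parse_qs(encoded)["aggregations"][0])
```

Serializing to JSON first and URL-encoding second keeps braces, quotes, and spaces out of the raw query string.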
The response is returned in the following format:
{
  "aggregations": {
    "<label>": {
      "<field_name>": {
        "values": [
          {"key": "value1", "value": 11},
          {"key": "value2", "value": 9},
          .....
        ]
      }
    }
  }
}
or for numerically aggregated values:
{
  "aggregations": {
    "<label>": {
      "<field_name>": {
        "values": [
          {"value": 11}
        ]
      }
    }
  }
}
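The two response shapes above can be read with a couple of small helpers. This is a sketch only: the helper names are hypothetical, and the labels and field names in the sample data (`total`, `revenue`) are made up for illustration.

```python
def values_to_counts(field_result):
    """Map each bucket key to its value for one field's result."""
    return {bucket["key"]: bucket["value"] for bucket in field_result["values"]}


def single_value(field_result):
    """Return the single number of a numerically aggregated field."""
    return field_result["values"][0]["value"]


# Sample responses following the two formats shown above (illustrative data).
keyed = {"aggregations": {"my_aggregation": {"Author": {"values": [
    {"key": "value1", "value": 11},
    {"key": "value2", "value": 9},
]}}}}
numeric = {"aggregations": {"total": {"revenue": {"values": [{"value": 11}]}}}}

counts = values_to_counts(keyed["aggregations"]["my_aggregation"]["Author"])
total = single_value(numeric["aggregations"]["total"]["revenue"])
```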
Aggregations can be nested, in which case the response will be nested as well:
{
  "aggregations": {
    "<label>": {
      "<field_name>": {
        "values": [
          {
            "key": "value1",
            "value": 11,
            "values": [
              {"key": "sub_value1", "value": 7},
              {"key": "sub_value2", "value": 4}
            ]
          },
          {
            "key": "value2",
            "value": 9,
            "values": [
              {"key": "sub_value1", "value": 6},
              {"key": "sub_value2", "value": 3}
            ]
          },
          .....
        ]
      }
    }
  }
}
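A nested response like the one above can be walked recursively. The sketch below assumes that nested buckets always sit under a `values` key inside their parent bucket, as in the format shown; `flatten_buckets` is a hypothetical helper name.

```python
def flatten_buckets(buckets, path=()):
    """Yield (key_path, value) for every bucket, depth first.

    Assumes nested buckets are stored under a "values" key of the parent.
    """
    for bucket in buckets:
        key_path = path + (bucket["key"],)
        yield key_path, bucket["value"]
        # Recurse into sub-buckets if this bucket has any.
        yield from flatten_buckets(bucket.get("values", []), key_path)


# Sample data following the nested format shown above.
nested_values = [
    {"key": "value1", "value": 11, "values": [
        {"key": "sub_value1", "value": 7},
        {"key": "sub_value2", "value": 4},
    ]},
    {"key": "value2", "value": 9, "values": [
        {"key": "sub_value1", "value": 6},
        {"key": "sub_value2", "value": 3},
    ]},
]

rows = list(flatten_buckets(nested_values))
```

Each row pairs the full key path, such as `("value1", "sub_value2")`, with its value, which is convenient for tabular output or charting.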
When selecting multiple labels, the output will contain responses for each of the labels:
{
  "aggregations": {
    "<label1>": {
      "<field_name1>": {
        "values": [
          {"key": "value1", "value": 11},
          {"key": "value2", "value": 9},
          .....
        ]
      }
    },
    "<label2>": {
      "<field_name2>": {
        "values": [
          {"key": "value1", "value": 11},
          {"key": "value2", "value": 9},
          .....
        ]
      }
    }
  }
}
If one label contains multiple fields, the response will contain a section for each field below the label:
{
  "aggregations": {
    "<label>": {
      "<field_name1>": {
        "values": [
          {"key": "value1", "value": 11},
          {"key": "value2", "value": 9},
          .....
        ]
      },
      "<field_name2>": {
        "values": [
          {"key": "value3", "value": 7},
          {"key": "value4", "value": 5},
          .....
        ]
      }
    }
  }
}
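Because the response is keyed first by label and then by field, one loop over both layers covers the multi-label and multi-field cases alike. A minimal sketch, with a hypothetical helper name and illustrative sample data:

```python
def iter_rows(response):
    """Yield (label, field_name, key, value) for every top-level bucket."""
    for label, fields in response["aggregations"].items():
        for field_name, result in fields.items():
            for bucket in result["values"]:
                # Numerically aggregated values carry no "key".
                yield label, field_name, bucket.get("key"), bucket["value"]


# Sample response: one label ("multi") containing two fields.
response = {"aggregations": {"multi": {
    "provider": {"values": [{"key": "value1", "value": 11}]},
    "language": {"values": [{"key": "value3", "value": 7},
                            {"key": "value4", "value": 5}]},
}}}

rows = list(iter_rows(response))
```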
This section contains examples for the various calculations that can be requested with aggregations.
Simple one-dimensional value faceting over a single field returns a result count for each value in the given field.
"language": { "fields": "language" }
This can be shortened, because the label can also serve as the field name:
"language": {}
By default, 10 results are returned for each field. This can be changed by setting the size parameter. For example, to retrieve the top 3 languages from the result set:
"language": { "size": 3 }
To return the result counts for all values in multiple fields, simply list the fields:
"multi": { "fields": ["provider", "language"] }
It is possible to add nested dimensions to an aggregation. The example below first groups the results by client and then, within each client, adds up the revenue column:
{ "outer_agg_name": { "fields": "client", "aggregation": { "fields": "revenue", "method": "sum" } } }
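When building nested requests in code, the `aggregation` key can be chained programmatically. The helper below is a hypothetical convenience, not part of any Squirro library; it only assembles the dictionary shown above.

```python
def nest(*levels):
    """Chain aggregation levels (outermost first) via the "aggregation" key."""
    result = None
    for level in reversed(levels):
        level = dict(level)  # copy so the caller's dicts are not modified
        if result is not None:
            level["aggregation"] = result
        result = level
    return result


# Reproduces the client/revenue example above.
request = {
    "outer_agg_name": nest(
        {"fields": "client"},
        {"fields": "revenue", "method": "sum"},
    )
}
```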
The histogram aggregation method is recommended for dates, because it groups values by hour, day, week, etc.
To count the results by item creation date, use the following aggregation:
"$item_created_at": { "method": "histogram" }
This can again be nested, for example to aggregate the languages of items over time:
"$item_created_at": { "method": "histogram", "aggregation": { "fields": ["language"] } }
The significant_terms method performs a Significant Terms aggregation. Note that this is only effective when a query is also sent with the request.
"aggregation_name": { "method": "significant_terms", "fields": ["provider", "language"] }
Below is a sample Python script that can be used to authenticate against a Squirro cluster and request these aggregations.
from squirro_client import SquirroClient

# Connect to the cluster and authenticate with a refresh token.
s = SquirroClient(None, None, cluster="https://unstable.squirro.net")
s.authenticate(refresh_token="FILL_IN_TOKEN")

# Group results by client, then sum the money field within each client.
aggregations = {
    "client_agg": {
        "fields": "client",
        "aggregation": {"fields": "money", "method": "sum"},
    }
}

res = s.query(project_id="PROJECT_ID", aggregations=aggregations)
print(res)