Aggregations can be requested when querying items. They are used to calculate summarized views and statistics about the data in search results. This allows you to build data drill-downs along the structured part of items as well as visualizations over time or other data dimensions.
Specification
The request is a URL-encoded JSON object of the following format:
Code Block |
---|
|
{<field_or_label>:
{
["fields": <field_name> | <list of field_names>]
[, "size": <number of results>]?
[, "method": <metric_function>]?
[, "interval": <interval>]?
[, "aggregation": <field name>]?
}
[, <field_or_label> : {
...
}]
} |
- field_or_label – may be a label to group multiple aggregations by specifying fields inside the aggregation dictionary. If no fields are provided, the label is used as fields.
- fields – can be any of
- all – all facets
- language – item language
- source – source field
- provider – provider field
- $<field> – the item field named <field>
- other value – mapped to a facet field with the corresponding name
- aggregation – a nested aggregation in the same format, used to specify lower-dimension aggregations.
- size – an integer specifying the number of aggregation results to return. Only supported for method terms and significant_terms. The default size is 10. (See below "Set number of results" on how to change the number of results returned.)
- method – a method name to aggregate number type return values. Defaults to terms. Available methods are:
- terms
- significant_terms
- histogram
- interval – an interval for histograms. Only supported for method histogram. For date_time type facets the available intervals are: year, quarter, month, week, day, hour, minute, second.
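As a minimal sketch of the request format described above, the following Python snippet builds an aggregation object and URL-encodes it using only the standard library. The helper name `encode_aggregations` and the label `top_languages` are hypothetical, and how the encoded string is attached to a request (parameter name, endpoint) is not covered here.

```python
import json
from urllib.parse import quote, unquote


def encode_aggregations(aggregations):
    """Serialize an aggregation dict to JSON and URL-encode the result.

    Hypothetical helper, for illustration only.
    """
    return quote(json.dumps(aggregations), safe="")


aggregations = {
    # "top_languages" is an arbitrary label; "language" is the field.
    "top_languages": {"fields": "language", "size": 3, "method": "terms"},
}
encoded = encode_aggregations(aggregations)

# Decoding the string restores the original object.
decoded = json.loads(unquote(encoded))
```

When using the Python client shown at the end of this page, the aggregations are passed as a plain dictionary instead.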
Example for a single keyword/facets:
Code Block |
language |
---|
js | {
  "my_aggregation": {
    "fields": "Author",
    "size": 10,
    "method": "terms"
  }
} |
The response is returned in the following format:
Code Block |
---|
|
{
"aggregations": {
"<label>": {
"<field_name>": {
"values": [
{"key": "value1", "value": 11},
{"key": "value2", "value": 9},
.....
]
}
}
}
} |
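A minimal sketch of reading this response format in Python, assuming the response has already been parsed into a dictionary. The function name `values_to_counts` and the sample data are hypothetical and only mirror the structure shown above.

```python
def values_to_counts(response, label, field_name):
    """Return {key: value} for one label and field of an aggregation response."""
    values = response["aggregations"][label][field_name]["values"]
    return {entry["key"]: entry["value"] for entry in values}


# Illustrative data shaped like the response format above.
response = {
    "aggregations": {
        "my_aggregation": {
            "Author": {
                "values": [
                    {"key": "value1", "value": 11},
                    {"key": "value2", "value": 9},
                ]
            }
        }
    }
}
counts = values_to_counts(response, "my_aggregation", "Author")
```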
or for numerically aggregated values:
Code Block |
---|
|
{
"aggregations": {
"<label>": {
"<field_name>": {
"values": [
{"value": 11}
]
}
}
}
} |
Aggregations can be nested, in which case the response will be nested as well:
Code Block |
---|
|
{
"aggregations": {
"<label>": {
"<field_name>": {
"values": [
{
"key": "value1",
"value": 11,
"values": [
{
"key": "sub_value1",
"value": 7
},
{
"key": "sub_value2",
"value": 4
}
]
}, {
"key": "value2",
"value": 9,
"values": [
{
"key": "sub_value1",
"value": 6
},
{
"key": "sub_value2",
"value": 3
}
]
},
.....
]
}
}
}
} |
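Nested responses can be walked recursively. The following sketch flattens the nested `values` lists into rows of key paths and values; the function name `flatten_values` is hypothetical and the sample data simply mirrors the structure above.

```python
def flatten_values(values, path=()):
    """Yield (key_path, value) for every entry, descending into nested "values"."""
    for entry in values:
        key_path = path + (entry["key"],)
        yield key_path, entry["value"]
        # Entries without a nested "values" list end the recursion.
        yield from flatten_values(entry.get("values", []), key_path)


nested = [
    {"key": "value1", "value": 11, "values": [
        {"key": "sub_value1", "value": 7},
        {"key": "sub_value2", "value": 4},
    ]},
    {"key": "value2", "value": 9, "values": [
        {"key": "sub_value1", "value": 6},
        {"key": "sub_value2", "value": 3},
    ]},
]
rows = list(flatten_values(nested))
```

Each row pairs the full path of keys (outer value first) with the count at that level, which is a convenient shape for building drill-down tables.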
When selecting multiple labels, the output will contain responses for each of the labels:
Code Block |
---|
|
{
"aggregations": {
"<label1>": {
"<field_name1>": {
"values": [
{"key": "value1", "value": 11},
{"key": "value2", "value": 9},
.....
]
}
},
"<label2>": {
"<field_name2>": {
"values": [
{"key": "value1", "value": 11},
{"key": "value2", "value": 9},
.....
]
}
}
}
} |
If one label contains multiple fields, the response will contain a section for each field below the label:
Code Block |
---|
|
{
"aggregations": {
"<label>": {
"<field_name1>": {
"values": [
{"key": "value1", "value": 11},
{"key": "value2", "value": 9},
.....
]
},
"<field_name2>": {
"values": [
{"key": "value3", "value": 7},
{"key": "value4", "value": 5},
.....
]
}
}
}
} |
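Because labels and fields form two levels below `aggregations`, a response with several labels or several fields per label can be processed with two nested loops. This is a sketch under that assumption; `iter_aggregation_rows` and the sample data are hypothetical.

```python
def iter_aggregation_rows(response):
    """Yield (label, field_name, key, value) for every top-level entry."""
    for label, fields in response["aggregations"].items():
        for field_name, result in fields.items():
            for entry in result["values"]:
                # Numerically aggregated values carry no "key".
                yield label, field_name, entry.get("key"), entry["value"]


response = {
    "aggregations": {
        "multi": {
            "provider": {"values": [{"key": "value1", "value": 11}]},
            "language": {"values": [{"key": "value3", "value": 7}]},
        },
        "total": {
            "revenue": {"values": [{"value": 11}]},
        },
    }
}
all_rows = list(iter_aggregation_rows(response))
```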
Examples
This section contains examples for the various calculations that can be requested with aggregations.
Group by field
Simple one-dimensional value faceting over a single field. This returns a result count for each value in the given field.
Code Block |
---|
|
"language": {
"fields": "language"
} |
This can also be shortened, as the label can also serve as the field:
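A minimal sketch of the shortened form, based on the rule from the specification that the label is used as the field when no fields are provided. The helper `effective_fields` is hypothetical and only illustrates that rule; it is not part of any client library.

```python
def effective_fields(label, spec):
    """Return the list of fields an aggregation entry refers to."""
    # Fall back to the label when no "fields" entry is given.
    fields = spec.get("fields", label)
    return [fields] if isinstance(fields, str) else list(fields)


explicit = {"language": {"fields": "language"}}
shortened = {"language": {}}

explicit_fields = effective_fields("language", explicit["language"])
shortened_fields = effective_fields("language", shortened["language"])
```

Under this rule both forms refer to the same field, `language`.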
Set number of results
By default, 10 results are returned for each field. This can be changed by setting the size parameter. For example, to retrieve the top 3 languages from the result set:
Code Block |
---|
|
"language": {
"size": 3
} |
To return the result counts for all values in multiple fields, simply list the fields:
Code Block |
---|
|
"multi": {
"fields": ["provider", "language"]
} |
Group by multiple fields
It's possible to add nested dimensions in an aggregation. The example below groups the result first by language and then within each language by provider:
Code Block |
---|
|
"language": {
  "aggregation": {
    "fields": "provider"
  }
} |
A nested aggregation can also calculate a metric. The example below groups the result first by client and then, within each client, adds up the revenue column:
Code Block |
---|
|
"outer_agg_name": {
  "fields": "client",
  "aggregation": {
    "fields": "revenue",
    "method": "sum"
  }
} |
Group by date
The histogram aggregation method is recommended for dates, because it groups values by hour, day, week, etc.
To count the results by item creation date, use the following aggregation:
Code Block |
---|
|
"$item_created_at": {
"method": "histogram"
} |
This can again be nested, for example to aggregate the languages of items over time:
Code Block |
---|
|
"$item_created_at": {
"method": "histogram",
"aggregation": {
"fields": ["language"]
}
} |
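The histogram requests above can also set the interval parameter from the specification. The following sketch builds such a request and checks the interval against the values listed for date_time facets; the helper `date_histogram` and the label `created_per_month` are hypothetical, and the validation happens purely on the client side.

```python
# Intervals listed in the specification for date_time type facets.
DATE_INTERVALS = (
    "year", "quarter", "month", "week", "day", "hour", "minute", "second",
)


def date_histogram(field, interval=None, nested=None):
    """Build a histogram aggregation entry for a date field."""
    spec = {"fields": field, "method": "histogram"}
    if interval is not None:
        if interval not in DATE_INTERVALS:
            raise ValueError(f"unsupported interval: {interval!r}")
        spec["interval"] = interval
    if nested is not None:
        # Lower-dimension aggregation, in the same format.
        spec["aggregation"] = nested
    return spec


aggregations = {
    "created_per_month": date_histogram(
        "$item_created_at", "month", nested={"fields": ["language"]}
    ),
}
```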
Significant Terms Aggregation
The significant_terms method performs a significant terms aggregation. Note that a query must also be sent with the request for this aggregation to be effective.
Code Block |
---|
|
"aggregation_name": {
"method": "significant_terms",
"fields": ["provider", "language"]
} |
Sample Python script
Below is a sample Python script that can be used to authenticate against a Squirro cluster and request these aggregations.
Code Block |
---|
|
from squirro_client import SquirroClient
# Connect to the cluster and authenticate with a refresh token.
s = SquirroClient(None, None, cluster="https://unstable.squirro.net")
s.authenticate(refresh_token="FILL_IN_TOKEN")
# Group the items by client and add up the money field within each client.
aggregations = {
    "client_agg": {
        "fields": "client",
        "aggregation": {"fields": "money", "method": "sum"},
    }
}
# Run the query and print the response, which includes the aggregations.
res = s.query(project_id="PROJECT_ID", aggregations=aggregations)
print(res) |