Aggregations can be requested when querying items. They are used to calculate summarized views and statistics about the data in search results. This allows you to build data drill-downs along the structured part of items as well as visualizations over time or other data dimensions.

Specification

Request format

The request is a URL-encoded JSON object in the following format:


{<field_or_label>:
    {
        ["fields": <field_name> | <list of field_names>]?
        [, "size": <number of results>]?
        [, "method": <metric_function>]?
        [, "interval": <interval>]?
        [, "aggregation": <nested aggregation object>]?
    }
 [, <field_or_label> : {
    ...
 }]
}

Example for a single keyword facet:

{
  "my_aggregation": {
    "fields": "Author",
    "size": 10,
    "method": "terms"
  }
}
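
As a minimal sketch of the URL encoding step, the request object can be serialized and encoded with the Python standard library. The helper name `encode_aggregations` is hypothetical, and how the encoded string is attached to the request (for example, the name of the query parameter) is not specified here and is left out on purpose.

```python
import json
from urllib.parse import quote, unquote


def encode_aggregations(aggregations: dict) -> str:
    """Serialize an aggregation request to JSON and URL-encode it.

    Hypothetical helper for illustration; not part of any Squirro library.
    """
    # Compact separators keep the encoded string short.
    payload = json.dumps(aggregations, separators=(",", ":"))
    # safe="" percent-encodes every reserved character, including "/".
    return quote(payload, safe="")


request = {"my_aggregation": {"fields": "Author", "size": 10, "method": "terms"}}
encoded = encode_aggregations(request)

# Decoding the string gives back the original request object.
decoded = json.loads(unquote(encoded))
```

When the Python client shown at the end of this page is used, the aggregations are passed as a plain dictionary, so this manual encoding is only relevant when the HTTP request is built by hand.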


Response format

The response is returned in the following format:

{
    "aggregations": {
        "<label>": {
            "<field_name>": {
                "values": [
                    {"key": "value1", "value": 11},
                    {"key": "value2", "value": 9},
                    .....
                ]
            }
        }
    }
}

or for numerically aggregated values:


{
    "aggregations": {
        "<label>": {
            "<field_name>": {
                "values": [
                    {"value": 11}
                ]
            }
        }
    }
}
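
The two flat response shapes above can be read with the same small helper. This is a sketch under the assumption that the response has already been parsed into a Python dictionary; `values_by_key` is a hypothetical name, and the sample data simply mirrors the placeholder values shown above.

```python
def values_by_key(response: dict, label: str, field: str) -> dict:
    """Map each bucket key to its value for one label and field.

    Numerically aggregated results have no "key", so they are stored
    under the key None.
    """
    values = response["aggregations"][label][field]["values"]
    return {entry.get("key"): entry["value"] for entry in values}


# Sample data shaped like the keyed response format above.
keyed = {
    "aggregations": {
        "my_aggregation": {
            "Author": {
                "values": [
                    {"key": "value1", "value": 11},
                    {"key": "value2", "value": 9},
                ]
            }
        }
    }
}

# Sample data shaped like the numerically aggregated format above.
numeric = {"aggregations": {"total": {"revenue": {"values": [{"value": 11}]}}}}

keyed_counts = values_by_key(keyed, "my_aggregation", "Author")
numeric_result = values_by_key(numeric, "total", "revenue")
```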

Aggregations can be nested, in which case the response will be nested as well:

{
    "aggregations": {
        "<label>": {
            "<field_name>": {
                "values": [
                    {
                        "key": "value1",
                        "value": 11,
                        "values": [
                            {
                                "key": "sub_value1",
                                "value": 7
                            },
                            {
                                "key": "sub_value2",
                                "value": 4
                            }
                        ]
                    }, {
                        "key": "value2",
                        "value": 9,
                        "values": [
                            {
                                "key": "sub_value1",
                                "value": 6
                            },
                            {
                                "key": "sub_value2",
                                "value": 3
                            }
                        ]
                    },
                    .....
                ]
            }
        }
    }
}
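
A nested response like the one above can be flattened with a short recursive walk. The sketch below assumes that each nested level appears as a "values" list inside its parent bucket, exactly as shown; the function name `flatten_buckets` is hypothetical.

```python
def flatten_buckets(values, path=()):
    """Yield (key_path, value) for every innermost bucket.

    key_path is a tuple of the bucket keys from the outermost level down.
    """
    for entry in values:
        current = path + (entry.get("key"),)
        nested = entry.get("values")
        if nested:
            # Descend into the sub-aggregation of this bucket.
            yield from flatten_buckets(nested, current)
        else:
            yield current, entry["value"]


# Sample data mirroring the nested response above.
nested_values = [
    {
        "key": "value1",
        "value": 11,
        "values": [
            {"key": "sub_value1", "value": 7},
            {"key": "sub_value2", "value": 4},
        ],
    },
    {
        "key": "value2",
        "value": 9,
        "values": [
            {"key": "sub_value1", "value": 6},
            {"key": "sub_value2", "value": 3},
        ],
    },
]

rows = list(flatten_buckets(nested_values))
```

Each row pairs the full key path with the innermost value, which is a convenient shape for loading into a table or chart.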

When selecting multiple labels, the output will contain responses for each of the labels:

{
    "aggregations": {
        "<label1>": {
            "<field_name1>": {
                "values": [
                    {"key": "value1", "value": 11},
                    {"key": "value2", "value": 9},
                    .....
                ]
            }
        },
        "<label2>": {
            "<field_name2>": {
                "values": [
                    {"key": "value1", "value": 11},
                    {"key": "value2", "value": 9},
                    .....
                ]
            }
        }
    }
}

If one label contains multiple fields, the response will contain a section for each field below the label:

{
    "aggregations": {
        "<label>": {
            "<field_name1>": {
                "values": [
                    {"key": "value1", "value": 11},
                    {"key": "value2", "value": 9},
                    .....
                ]
            },
            "<field_name2>": {
                "values": [
                    {"key": "value3", "value": 7},
                    {"key": "value4", "value": 5},
                    .....
                ]
            }
        }
    }
}
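
To process a response with several labels or several fields per label, it is enough to iterate over both levels. The following sketch assumes a response already parsed into a dictionary; `iter_buckets` is a hypothetical helper name and the sample data mirrors the placeholders above.

```python
def iter_buckets(response: dict):
    """Yield (label, field, key, value) for every top-level bucket."""
    for label, fields in response.get("aggregations", {}).items():
        for field, body in fields.items():
            for entry in body.get("values", []):
                yield label, field, entry.get("key"), entry["value"]


# Sample data: one label that contains two fields.
response = {
    "aggregations": {
        "multi": {
            "provider": {"values": [{"key": "value1", "value": 11}]},
            "language": {"values": [{"key": "value3", "value": 7}]},
        }
    }
}

buckets = list(iter_buckets(response))
```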


Examples

This section contains examples for the various calculations that can be requested with aggregations.

Group by field

Simple one-dimensional faceting over a single field. This returns a result count for each value of the given field.

"language": {
    "fields": "language"
}

This can be shortened, because the label also serves as the field name when "fields" is omitted:

"language": {}

Set number of results

By default, 10 results are returned for each field. This can be changed by setting the size parameter. For example, to retrieve the top 3 languages from the result set:

"language": {
    "size": 3
}

Select multiple fields

To return the result counts for all values in multiple fields, simply list the fields:

"multi": {
    "fields": ["provider", "language"]
}

Group by multiple fields

It's possible to add nested dimensions to an aggregation. The example below first groups the results by client and then, within each client, adds up the revenue column:


{
    "outer_agg_name": {
        "fields": "client",
        "aggregation": {
            "fields": "revenue",
            "method": "sum"
        }
    }
}
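
When such requests are built programmatically, the nesting can be assembled from a flat list of levels. This is only a sketch: `nest_aggregations` is a hypothetical helper, and whether more than two levels of nesting are supported is not stated here, so the example sticks to the two levels shown above.

```python
def nest_aggregations(*levels: dict) -> dict:
    """Chain aggregation levels so that each one is nested in the previous.

    The first argument is the outermost level, the last the innermost.
    """
    if not levels:
        raise ValueError("at least one aggregation level is required")
    nested = None
    for level in reversed(levels):
        current = dict(level)  # copy so the caller's dictionaries stay untouched
        if nested is not None:
            current["aggregation"] = nested
        nested = current
    return nested


# Rebuilds the request shown above: group by client, then sum the revenue.
request = {
    "outer_agg_name": nest_aggregations(
        {"fields": "client"},
        {"fields": "revenue", "method": "sum"},
    )
}
```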

Group by date

The histogram aggregation method is recommended for dates, because it groups values by hour, day, week, etc.

To count the results by the items' creation dates, use the following aggregation:

"$item_created_at": {
    "method": "histogram"
}

This can also be nested, for example to aggregate the languages of items over time:

"$item_created_at": {
    "method": "histogram",
    "aggregation": {
       "fields": ["language"]
    }
}


Significant Terms Aggregation

This method runs a significant terms aggregation. Note that it is only effective when a query is sent along with the request.

"aggregation_name": {
    "method": "significant_terms",
    "fields": ["provider", "language"]
}

Sample Python script

Below is a sample Python script that authenticates against a Squirro cluster and requests a nested aggregation.

from squirro_client import SquirroClient

s = SquirroClient(None, None, cluster="https://unstable.squirro.net")

s.authenticate(refresh_token="FILL_IN_TOKEN")

aggregations = {
    "client_agg": {
        "fields": "client",
        "aggregation": {"fields": "money", "method": "sum"},
    }
}

res = s.query(project_id="PROJECT_ID", aggregations=aggregations)
print(res)