
Work in progress

What is Trend Detection?

Trend Detection analysis detects unusual trends in time series data. In Squirro, a time series is the number of items per time unit in a particular project for a particular query. This time series is already visible today as the histogram bins on the search page. Trend Detection aims to find unusually high peaks in this histogram automatically.

As an example, consider a project `News` with a feed of news items from a few major news publications. The query `Facebook AND Whatsapp` filters the project's documents down to the news items that contain both words, Facebook and Whatsapp. For the project `News` and the query `Facebook AND Whatsapp`, the time series is the number of items matching `Facebook AND Whatsapp` per time unit, where the time unit can be hourly, daily, weekly, monthly or yearly.

Unusual trends are detected by learning from historical data to compute a reasonable threshold automatically. For this to work properly, there must be enough historical data to learn from.

As a rule of thumb, it is advisable to have at least two item bins in the histogram.
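To make the idea of a threshold learned from history concrete, here is a minimal sketch. It assumes a simple rule, the mean of the historical counts plus `k` standard deviations. This rule, the names `learn_threshold` and `is_unusual`, and the sample counts are assumptions for illustration; this page does not describe the algorithm Squirro actually uses.

```python
from statistics import mean, pstdev


def learn_threshold(history, k=3.0):
    """Assumed rule: mean of the historical counts plus k standard deviations."""
    if len(history) < 2:
        # Mirrors the rule of thumb: too little history to learn from.
        raise ValueError("need at least two historical bins")
    return mean(history) + k * pstdev(history)


def is_unusual(count, history, k=3.0):
    """Flag a bin whose item count exceeds the learned threshold."""
    return count > learn_threshold(history, k)


# Made-up item counts for five past bins.
history = [10, 12, 11, 9, 13]
threshold = learn_threshold(history)
```

With this history the threshold is roughly 15.24, so a bin with 40 items would be flagged while a bin with 14 items would not. The less historical data is available, the less reliable such an estimate becomes.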

How to set up Trend Detection?

The Trend Detection service can be used through the Squirro client. To set up trend detection on an existing project, call the client's `new_trenddetection` method with the query (`query`) and the project ID (`project_id`). A valid email address is required so that an alert email can be sent when something unusual is detected for that query and project. The `aggregation_interval` parameter controls the discretisation of the time series (the time unit described above) before the trend detection analysis starts. An example is shown below.

>>> client.new_trenddetection(
...     project_id='2sic33jZTi-ifflvQAVcfw',
...     query='hello world',
...     name='test name',
...     email_user='test@squirro.com',
...     aggregation_interval='1w')
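When several queries in the same project need to be monitored, the call can be wrapped in a small helper. The following is a sketch only: the helper `setup_trend_detections` and its naming scheme for the `name` argument are hypothetical. It uses no client call other than `new_trenddetection` with the keyword arguments shown in the example, and it expects an already authenticated `client` to be passed in.

```python
def setup_trend_detections(client, project_id, queries, email_user,
                           aggregation_interval="1w"):
    """Create one trend detection per query and collect the client's responses.

    `client` is assumed to be an authenticated Squirro client. The helper and
    the "Trend: <query>" naming scheme are illustrative, not part of the client.
    """
    results = []
    for query in queries:
        results.append(
            client.new_trenddetection(
                project_id=project_id,
                query=query,
                name=f"Trend: {query}",
                email_user=email_user,
                aggregation_interval=aggregation_interval,
            )
        )
    return results
```

A call such as `setup_trend_detections(client, '2sic33jZTi-ifflvQAVcfw', ['Facebook AND Whatsapp', 'hello world'], 'test@squirro.com')` would then create two weekly trend detections that alert the same email address.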

How to inspect detected Trends?

Once a new trend detection analysis has been created, you receive an alert email every time the number of items in a time window (defined by the time unit) rises above the automatically computed threshold for that period. You can also see the trends detected in the historical data with the `get_trenddetection_labels` method of the Squirro client. No alert emails are generated for trends detected in the historical data. The thresholds themselves are available through the `get_trenddetection_thresholds` method.

>>> client.get_trenddetection_thresholds(project_id='2sic33jZTi-ifflvQAVcfw', trenddetection_id='fd5x9NIqQbyBmF4Yph9MMw')
 
>>> client.get_trenddetection_labels(project_id='2sic33jZTi-ifflvQAVcfw', trenddetection_id='fd5x9NIqQbyBmF4Yph9MMw')