Info

Work in progress

This tutorial goes step by step through setting up a Trend Detection on a Squirro project and visualizing it on a Squirro dashboard.

Introduction

The Trend Detection analysis can be used to detect unusual trends in the time series data.

In the Squirro context, time series data is generated in the form of the number of items per time unit in a particular project for a particular query. This time series data can already be observed today in the histogram bins on the search page, as shown in the screenshot below. Moreover, numerical facets in Squirro can also be visualized on the dashboard as a time series using the "Line chart" widget. Trend detection analysis can be used to automatically find anomalies, i.e. unusually high peaks, in this histogram/time series.

 

[Screenshot: item counts per time unit shown as histogram bins on the search page]

As an example scenario, consider a news project `News` with a feed of all the news items from a few of the major news publications. A query like `Facebook AND Whatsapp` filters the list of all documents down to the sub-list of news items containing both the words Facebook and Whatsapp. For the project `News` with query `Facebook AND Whatsapp`, the time series is the number of items matching `Facebook AND Whatsapp` per time unit, where the time unit can be hourly, daily, weekly, monthly or yearly.
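
To make this definition concrete, here is a minimal Python sketch of how such an item-count time series could be computed. It is an illustration only, not Squirro's implementation: items are assumed to be `(timestamp, text)` pairs, and the `matches` helper is a hypothetical, much simplified stand-in for a Squirro `AND` query (every term must appear in the text).

```python
from collections import Counter
from datetime import datetime, timedelta


def bucket_start(ts: datetime, unit: str) -> datetime:
    """Truncate a timestamp to the start of its hour/day/week/month/year bucket."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if unit == "day":
        return midnight
    if unit == "week":
        return midnight - timedelta(days=midnight.weekday())  # weeks start on Monday here
    if unit == "month":
        return midnight.replace(day=1)
    if unit == "year":
        return midnight.replace(month=1, day=1)
    raise ValueError(f"unknown time unit: {unit!r}")


def matches(text: str, terms) -> bool:
    """Hypothetical stand-in for an AND query: every term must occur (case-insensitive)."""
    lowered = text.lower()
    return all(term.lower() in lowered for term in terms)


def item_count_series(items, terms, unit="day"):
    """Return sorted (bucket_start, count) pairs for the items matching all terms."""
    counts = Counter(bucket_start(ts, unit) for ts, text in items if matches(text, terms))
    return sorted(counts.items())


news = [
    (datetime(2024, 3, 4, 9, 15), "Facebook completes Whatsapp deal"),
    (datetime(2024, 3, 4, 17, 40), "Whatsapp users react to Facebook news"),
    (datetime(2024, 3, 5, 8, 5), "Facebook earnings beat estimates"),
    (datetime(2024, 3, 6, 12, 0), "Regulators review Facebook and Whatsapp"),
]
# Two matching items on 2024-03-04 and one on 2024-03-06.
print(item_count_series(news, ["Facebook", "Whatsapp"], unit="day"))
```

In this sketch, buckets without any matching item are simply absent from the result; a real time series would report them as zero.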

The detection of unusual trends is done by learning from historical data to automatically compute a reasonable threshold. For this to work properly, it is important that Squirro has enough historical data to learn from.

Info

As a rule of thumb, it is advisable to have at least two weeks' worth of data.
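
Squirro's exact thresholding algorithm is not described in this tutorial. As a rough illustration of "learning a threshold from historical data", the following sketch swaps in a simple technique: a band of mean ± k standard deviations over the daily history, with the two-week rule of thumb enforced as a minimum history length. The names `learn_band` and `find_anomalies` and the default `k=3.0` are hypothetical.

```python
import statistics

MIN_HISTORY_DAYS = 14  # the "at least two weeks" rule of thumb, assuming daily buckets


def learn_band(history, k=3.0):
    """Learn a (lower, upper) band as mean +/- k population standard deviations.

    A simple stand-in for illustration, not Squirro's actual algorithm.
    """
    if len(history) < MIN_HISTORY_DAYS:
        raise ValueError(
            f"need at least {MIN_HISTORY_DAYS} daily values, got {len(history)}"
        )
    mean = statistics.fmean(history)
    spread = statistics.pstdev(history)
    return mean - k * spread, mean + k * spread


def find_anomalies(history, new_values, k=3.0):
    """Return (index, value) pairs of new values that fall outside the learned band."""
    lower, upper = learn_band(history, k)
    return [(i, v) for i, v in enumerate(new_values) if v < lower or v > upper]


daily_counts = [10, 12, 11, 9, 10, 13, 12, 11, 10, 9, 12, 11, 10, 12]  # 14 days of history
print(find_anomalies(daily_counts, [11, 40, 10]))  # only the spike of 40 is flagged
```

The band is two-sided, so unusually low values are flagged as well as unusually high ones; with less than two weeks of history the sketch refuses to learn a threshold at all.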

Dataset Used

We cover two different scenarios for setting up Trend Detection, using a different dataset for each.

...


In the second scenario, we set up Trend Detection on changes over time in the values of a numerical facet of Squirro items. For this scenario, we use an anonymized ITSM dataset. Download the CSV dataset from here if you want to follow along with the tutorial. Every row in the dataset contains three fields: "date", "title" and "calls". Note that this dataset does not contain any textual content, because the trend analysis is done purely on numerical data.

Scenario 1 - Trend Detection on item counts

This first scenario guides you through the process of setting up Trend Detection on the number of items in a Squirro project over time. These item counts can also be filtered with a Squirro query before setting up the Trend Detection analysis. Both of these use cases are covered in the subsections below.

Importing data into Squirro Project

The data used in this example is an export from an open source bug tracker (Apache HTTP Server) with the summary removed. As a first step, import the data into Squirro.

  1. Download the CSV file: trends_tutorial_scenario1_dataset.csv.
  2. Create a new Squirro project.
  3. In the Data tab, choose the "CSV/Excel one-off import" option.

    [Screenshot: the "CSV/Excel one-off import" option in the Data tab]

  4. Go through the process and import the data. In the mapping step you can use the auto-detected default mapping and simply continue.

Trend Detection without query

Set up Trend Detection

As a first step, create a simple Trend Detection.

For this, switch over to the Search screen. The data may not have been fully imported yet, but that's fine; the remaining items will arrive shortly.

  1. Use the "Create Trend" option from the "Save" button drop-down to start the process.

    [Screenshot: the "Create Trend" option in the "Save" drop-down]

  2. In the "Create Trend" popup, fill in the information described below:

    [Screenshot: the "Create Trend" popup]

    • Title – the name of this Trend Detection. This is later used to find the right trend in the dashboard configuration.
    • Query – the Squirro query used to filter down the item counts. This defaults to the current query from the search screen.
    • Data Aggregation Interval – the length of one time bucket. See the Aggregation Interval reference for full information on this.
    • Create Alert – the email address which is notified when an anomaly is detected. This uses the current user's email address by default.

  3. Click the "Save" button to complete setting up the Trend Detection.
Info

A Trends widget on the dashboard requires a Trend Detection to be set up first, using the "Search" tab of Squirro.

Visualization using Trend widget

You can visualize the Trend Detection analysis using the "Trends" widget on the dashboard. Follow the steps below to create a new Trend widget.

  • When adding a new widget to the dashboard, you will see a new widget type named "Trend".
  • Once you select the "Trend" widget type, you can see all the Trend Detections that have been set up on this project under the "Trend Detections" heading. When you select a particular Trend Detection, the widget shows the result of the trend detection analysis: the time series data (in blue), the automatically computed thresholds (in grey) and the detected anomalies (as lettered label flags).

Automated alerting on future anomalies

Info

This step of the tutorial can only be executed against the Squirro demo cluster.

Once we have set up a Trend Detection for a particular scenario, Squirro can also automatically alert us via email if it detects any anomalies in future data. If you wish to simulate this behavior, we provide a Python script with a configuration file. This script uploads new "future" items into the Squirro project, and once the number of uploaded items breaches the automatically computed threshold, you will see an alert email in your mailbox.

After this process, Squirro is prepared to visualize and alert on the trend.

Note: Performance Impact

Each created trend needs to be analyzed when processing data through the pipeline. So if you don't need a configured trend anymore, make sure you remove it.

Removing a trend is not currently possible in the user interface, but it is available through the API or the SDK.

Visualize in Dashboard

To visualize the created trend in the dashboard, there is a separate widget: Trends.

  1. Go to the dashboards and create a new dashboard.
  2. Change the default widget's type to "Trend".

    [Screenshot: the widget type set to "Trend"]

  3. For the "Trend Detection" option, choose the trend that has been created in the previous section.

    [Screenshot: selecting the Trend Detection in the widget settings]

  4. Save the dashboard.

When selecting the configured trend in step 3, the widget updates and shows the result of the trend detection analysis. The chart shows three different sections out of the box:

  • The time series data in blue
  • The automatically computed thresholds in grey
  • The detected anomalies as letters in boxes

Testing Alerts

To test alerting by email, we need to add some data in real time. Because this requires the date and time of the data to be current, it cannot be done through the CSV import and instead requires a small script.

Info

This step is optional. You can skip it if you don't need to test the alerting.

  1. Install the Python SDK. See the Python SDK Installation section for documentation on this.
  2. Download the simulation script: trends_alert_simulation.zip.
  3. Extract the ZIP file on your local system.
  4. Edit the configuration file config.ini and add your project_id and token. You can also change the cluster setting to connect to a different Squirro installation.
    See Connecting to Squirro for information on how to find these values.
  5. On the command line, execute the Python script: python upload_new_items.py
  6. Check your email inbox for the Squirro alert.
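
The contents of `upload_new_items.py` and `config.ini` are not reproduced in this tutorial. The sketch below only illustrates the idea under stated assumptions: the `[squirro]` section name, the example values and the item fields (`title`, `created_at` and its timestamp format) are hypothetical, and the actual upload step is left out. With the Squirro Python SDK, a list of item dictionaries like this would then be handed to an uploader such as `ItemUploader`; check the SDK documentation for its exact parameters.

```python
import configparser
from datetime import datetime, timedelta, timezone

# Hypothetical config layout; the real config.ini shipped in the ZIP may differ.
EXAMPLE_CONFIG = """
[squirro]
cluster = https://squirro.example.com
project_id = YOUR_PROJECT_ID
token = YOUR_TOKEN
"""


def read_settings(config_text, section="squirro"):
    """Parse the connection settings (cluster, project_id, token) from INI text."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return {key: parser[section][key] for key in ("cluster", "project_id", "token")}


def make_burst(count, now=None):
    """Build `count` items stamped within the last few seconds, so they count as current data."""
    if now is None:
        now = datetime.now(timezone.utc)
    return [
        {
            "title": f"Simulated item {i + 1}",
            # Assumed timestamp format for the item's creation date.
            "created_at": (now - timedelta(seconds=i)).strftime("%Y-%m-%dT%H:%M:%S"),
        }
        for i in range(count)
    ]


settings = read_settings(EXAMPLE_CONFIG)
items = make_burst(50)
print(settings["project_id"], len(items))
```

The key point is the timestamp: because the items are stamped with the current time, they fall into the newest time bucket, where a burst of them can breach the learned threshold and trigger the alert.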

Trend Detection with query

Set up Trend Detection

  • To filter the item counts with a Squirro query through the "Create Trend" modal, type the query under the "Query" heading while setting up the Trend Detection.

  • The rest of the steps remain the same as above.

Visualization using Trend widget

Once set up, this Trend Detection can also be visualized by adding a new widget of type "Trend" and selecting the name of the newly created Trend Detection from the drop-down.

To detect trends only on a subset of the data, you can specify a query.

  1. Go back to the Search screen.
  2. Use the query "Language:en" (without the quotes).
  3. Create a trend. The query is auto-filled in the popup dialog.

    [Screenshot: the "Create Trend" popup with the query filled in]

  4. Save the trend.

Visualize in Dashboard

Back on the dashboard, add or edit a Trend widget. The new trend detection is now available in the dropdown:

[Screenshot: the new Trend Detection in the Trend widget drop-down]

Scenario 2 - Trend Detection on numerical facets

Importing data into Squirro Project

Since we are going to set up the Trend Detection on a numerical facet, we first have to configure the facet in the Squirro project before loading the data. Follow the steps below to load data into a Squirro project with a numerical facet.

This scenario uses a different data file, which contains the number of incidents as a separate keyword. So instead of analyzing the number of items that have been indexed, there is now only one Squirro item for each day, and that item contains structured information about how many service tickets there were on that day.
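
To illustrate the difference from Scenario 1, the sketch below turns CSV rows into one item per day whose numeric value is stored in a `service_tickets` keyword. The column names, the sample rows and the item layout (`title`, `created_at`, `keywords` with list values) are assumptions for illustration only; in this tutorial the CSV importer performs the mapping for you.

```python
import csv
import io

# Hypothetical sample rows; the column names of the real CSV file may differ.
SAMPLE_CSV = """date,title,service_tickets
2024-03-03,Service tickets 2024-03-03,12
2024-03-04,Service tickets 2024-03-04,57
"""


def rows_to_items(csv_text):
    """Turn each CSV row into one item per day, carrying the count as a numeric keyword."""
    items = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        items.append(
            {
                "title": row["title"],
                "created_at": f"{row['date']}T00:00:00",
                # The value of interest lives in the facet, not in the number of items.
                "keywords": {"service_tickets": [int(row["service_tickets"])]},
            }
        )
    return items


for item in rows_to_items(SAMPLE_CSV):
    print(item["created_at"], item["keywords"]["service_tickets"])
```

Counting these items per day would always give 1, which is why this scenario analyzes the facet values instead of the item counts.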

  1. Download the CSV file: trends_tutorial_scenario2_dataset.csv.
  2. Create a new Squirro project. Make sure you select the option to create a project-specific index.

    [Screenshot: creating a project with a project-specific index]

  3. One of the facets we import is going to be numeric. This needs to be configured before the import itself.
  4. On the "Data" tab, select the "Facets" option from the left navigation.
  5. Click "Add Facet".
  6. As the Name, enter "service_tickets" and select the Type "int":

    [Screenshot: adding the "service_tickets" facet of type int]
Info

Make sure that the new facet is of type "int". Without the correct facet type, you will not be able to set up Trend Detection on this facet.

Now import the CSV data into the Squirro project using the Squirro UI, as described in the Scenario 1 setup. Leave all field mappings at their defaults in the CSV importer.

Trend Detection on numerical facet

  • You can also use the "Create Trend" modal to create a new Trend Detection on a numerical facet.
  • If the project has a numerical facet, the "Create Trend" dialog shows an extra checkbox to set up the Trend Detection on a numerical facet rather than on the item counts.
  • In the current example, we set up the Trend Detection on the numerical facet "service_tickets", which we created before importing the data.
  • Using this "Numerical Aggregation" checkbox, you can select a particular numerical facet of interest. In this example, we select the facet "service_tickets".
  • Once a numerical facet has been selected, you can choose the aggregation method to apply to the facet values. The possible aggregation methods are Sum, Average and Minimum. We choose "Average" for this example.
  • In this example, we also want to analyze the data in daily buckets rather than the default weekly buckets, so we set the "Data Aggregation Interval" to 1 day.
  • The rest of the workflow is the same as for setting up Trend Detection on the item counts.
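
The aggregation step can be pictured as grouping facet values by time bucket and reducing each bucket with the chosen method. A minimal sketch, assuming `(day, value)` pairs as input; `aggregate_facet` and the method keys are hypothetical names, not part of Squirro.

```python
import statistics
from collections import defaultdict
from datetime import date

# Hypothetical method names mirroring the options in the "Create Trend" dialog.
AGGREGATIONS = {
    "sum": sum,
    "average": statistics.fmean,
    "minimum": min,
}


def aggregate_facet(points, method="average"):
    """Group (day, value) pairs by day and reduce each day's values with `method`."""
    reduce_values = AGGREGATIONS[method]
    buckets = defaultdict(list)
    for day, value in points:
        buckets[day].append(value)
    return {day: reduce_values(values) for day, values in sorted(buckets.items())}


points = [(date(2024, 3, 4), 50), (date(2024, 3, 4), 70), (date(2024, 3, 5), 40)]
for method in AGGREGATIONS:
    print(method, aggregate_facet(points, method))
```

In the service-ticket dataset there is only one item per day, so with a daily interval Sum, Average and Minimum all yield the same value; the example uses two values on one day to show where the methods differ.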

Visualizations using Trend widget

Info

A Trends widget on the dashboard requires a Trend Detection to be set up first, using the "Search" tab of Squirro.

  • Once set up, the detected anomalies can be visualized using a "Trends" widget on the dashboard.
  • In the dashboard edit mode, add a new widget of type "Trend".
  • After selecting the Trend widget type, you can choose between all the Trend Detections set up on the project.
  • Once a Trend Detection is selected, the Trend widget shows its visualization.

Visualizations of future predictions using Trend Widget

  1. Go back to the "Sources" section. Use the "CSV/Excel one-off import" to import the provided CSV file.
  2. The field mappings can again be left at their defaults.

Configure Trend Detection

Once the import has finished, you will see in the "Search" area that the number of results is flat: there are seven items in the index for each week, one per day. To quickly visualize the numerical facet, you could add a Line Chart widget to the dashboard.

Instead of that, we will create a trend on the facet right away and visualize that.

  1. Select the "Create Trend" option from the "Save" dropdown.
  2. Because this project contains numerical facets, there is now a new checkbox "Numerical Aggregation". Select this checkbox.
  3. Select the numerical facet "service_tickets" and ensure the aggregation method is set to "Average".
  4. Use a daily aggregation interval.

    [Screenshot: the "Create Trend" popup with "Numerical Aggregation" selected]

  5. Save that Trend Detection by pressing the "Save" button.

Visualize in Dashboard

After setting up the Trend Detection, you can now use it in the dashboard to visualize the data.

  1. Create a new Dashboard.
  2. Edit the default widget to be of type "Trend".
  3. Select the configured trend in the dropdown.
  4. Save the dashboard.

This results in a visualization of the number of service tickets.

[Screenshot: the Trend widget showing the number of service tickets]

An interesting anomaly is the one flagged with the letter "H" in the screenshot. It is far lower than, for example, the day immediately after it. When you hover over it, you will see that it falls on a Sunday. Because Sundays usually have far fewer incidents than other days, the value drops sharply and is flagged as an anomaly.

Predict the Future

This time series has very recognizable repeating patterns, so-called seasonality. The future values of such a time series can be predicted and visualized as well. To turn on the predictions visualization, go to the edit mode of the "Trends" widget and check the "Enable Predictions" checkbox.

[Screenshot]

After checking the "Enable Predictions" check-box, the Trends widget will be updated with the future predictions of the time series data based on the repeating patterns recognized by Squirro in the historical data. Moreover, the number of future predictions being displayed can be adjusted using the "Predictions Range" slider.
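
The tutorial does not specify which forecasting model Squirro uses. To illustrate the idea of predicting from repeating patterns, the sketch below uses a seasonal naive forecast, a deliberately simple stand-in: each future day copies the value observed one season (here 7 days) earlier. The `horizon` parameter loosely plays the role of the "Predictions Range" slider; the function name and the sample data are hypothetical.

```python
def seasonal_naive_forecast(history, horizon, period=7):
    """Forecast `horizon` future steps by repeating the last observed season.

    Step h (0-based) copies the value from the last full season at position h % period.
    """
    if len(history) < period:
        raise ValueError(f"need at least {period} values of history")
    last_season = history[-period:]
    return [last_season[h % period] for h in range(horizon)]


# Two hypothetical weeks of daily ticket averages, Monday to Sunday, with a quiet weekend.
daily = [50, 52, 55, 53, 51, 20, 10, 48, 54, 56, 52, 50, 22, 12]
print(seasonal_naive_forecast(daily, horizon=9))
```

Because the weekly pattern is simply repeated, the quiet weekend days reappear in the forecast; more elaborate models would also smooth noise and follow longer-term trends.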
