Data Loading and Processing
This Data Loading and Processing manual shows how data is loaded into Squirro.
It starts with some of the fundamentals, including an overview of Squirro's architecture and data modeling aspects.
Then the various ways of getting data into the system are outlined.
Finally, the enrichments section shows how data can be enriched as part of the loading process.
Table of Contents
Architecture
Item Format
- Catalyst Data Model — The catalyst data model provides a sub-item model so that, when significant events are detected, we can show exactly which sentence or phrase in a document triggered the catalyst and build detailed relationships across documents.
Facets
- Facets Webinar — This is a recording of the technical webinar on the topic of Facets. The webinar goes through the Facets Tutorial.
- Facets Tutorial — This tutorial imports an extract of the Panama Papers (https://panamapapers.icij.org) into Squirro. The focus is on how to model and use facets in this context.
- Using Facets — This section shows how the different data types of facets affect their usage in queries and dashboards.
- Data Modeling — This section covers what to consider when choosing the keywords to use on Squirro items.
- Managing Facets — Facets offer a range of customizations when setting them up. This page describes how to configure and manage facets using the Squirro user interface.
Data Import
To gather insights from data, Squirro needs very fast access to that data. For that purpose, any required data is indexed into the Squirro data storage layer.
- Data Sources — This is a list of the data sources for which Squirro has connectors available.
- Data Connectors
- Loading Tools
- SDK tools
- Reset Project — The Reset Project function in the Studio is used to quickly reset a project.
- Attaching files to items
- Dataloader Frontend Improvements
- Dataloader Templates documentation
- Connecting to Squirro
- Priorities for Data Sources
Communities
Trend Detection
Squirro provides the Trend Detection functionality to analyze trends in time-series data. This feature can be used to find outliers in the historical time-series data and get automatic email alerts for unusual behaviour or patterns in your data.
- Trend Detection Webinar — The Trend Detection webinar gives an introduction to Squirro Trend Detection.
- Trend Detection Tutorial — This tutorial goes step by step through setting up Trend Detection on a Squirro project and visualizing the results on a Squirro dashboard.
- Trend Detection Reference — This reference section discusses the details for the Squirro Trend Detection functionality.
Pipeline
The Squirro pipeline transforms a record from a data source into a Squirro item and writes it into the index.
- Pipeline Editor
- Rerun Enrichments — Use Rerun Enrichments to apply enrichments to items after they have been ingested.
- Rerun Pipeline Workflows
- Rerun Individual Pipeline Steps
- Pipeline Steps — The Pipeline has a number of different steps available. These are documented here in their various sections.
- Processing Error — Some steps in the pipeline may abort with an error. In those cases, the item is tagged with a "Processing Error" keyword. The table below lists all the processing error codes.
- Configuration of Ingester for prioritised Data Sources
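
The pipeline behaviour described above (a record becomes an item, steps run over it, a failing step leaves a "Processing Error" keyword, and the item is written to the index) can be sketched in plain Python. This is only a conceptual illustration and not the Squirro API: the function names, the field names (`headline`, `text`, `title`, `body`), the use of a list as the index, and the choice to continue with the remaining steps after a failure are all assumptions made for the example.

```python
import copy
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Item = Dict[str, Any]
Step = Callable[[Item], Item]


def record_to_item(record: Record) -> Item:
    """Map a raw source record onto a minimal item (hypothetical field names)."""
    return {
        "title": record.get("headline", ""),
        "body": record.get("text", ""),
        "keywords": {},
    }


def run_pipeline(record: Record, steps: List[Step], index: List[Item]) -> Item:
    """Run each step over the item, tag failures, then write the item to the index."""
    item = record_to_item(record)
    for step in steps:
        try:
            # Hand the step a copy so a failing step cannot leave the item half-modified.
            item = step(copy.deepcopy(item))
        except Exception:
            # Assumption: the failed step is recorded by name and later steps still run.
            item["keywords"].setdefault("Processing Error", []).append(step.__name__)
    index.append(item)  # stand-in for writing to the real index
    return item


def add_language(item: Item) -> Item:
    """Example enrichment step: attach a fixed language keyword."""
    item["keywords"]["language"] = ["en"]
    return item


def broken_step(item: Item) -> Item:
    """Example step that always aborts with an error."""
    raise ValueError("simulated step failure")
```

Calling `run_pipeline({"headline": "Hello", "text": "World"}, [add_language, broken_step], index)` with an empty list as `index` leaves one item in the list, carrying both the `language` keyword and a `Processing Error` keyword naming the failed step. The actual error codes and tagging format are documented in the Processing Error page.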