...
Section | Description |
---|---|
Enrich | Steps that extract additional data from records or convert them into text count as enrichments. Examples include language detection, deduplication, and converting binary documents to text. |
Relate | Linking ingested items to each other or to other data sources belongs in this section. Most importantly, this includes the Known Entity Extraction steps. |
Discover | Discover includes steps for topic modelling and clustering, as well as the analysis for the Content-based Typeahead. |
Classify | Text classification steps, such as those using models created with the Squirro AI Studio, are part of this section. |
Predict | Time series detection with the Trend Detection module shows up here. |
Recommend | This section includes updating recommendation models and generating insights. These steps are not yet exposed in the user interface. |
Automate | Automated actions, such as sending emails, count as automations. This section is currently empty in the user interface. |
Index | This step is not included in the architecture charts, but can be seen and used in the pipeline editor. It includes the required steps to persist Squirro items on disk for searching. |
Custom | Custom steps can be added to the pipeline in the form of Pipelets. Currently these pipelets always show up in a section called Custom, but this will be extended so that each pipelet can be assigned to one of the sections above. |
...
Enrich
The Unshorten Link step should go before the Deduplication step.
The Content Augmentation step should go before the Content Extraction step.
The Content Extraction step should go before the Language Detection step.
Index
The Content Standardization step should go before the Indexing step.
The Cache Cleaning step should go after the Indexing step.
The Search Tagging and Alerting step should go after the Indexing step.
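The ordering rules above form a small dependency graph between steps. As a minimal sketch, assuming the step names are treated as plain labels (they are not Squirro identifiers or API names), the rules can be checked against a proposed pipeline order, and a compatible order can be derived with a topological sort from Python's standard `graphlib` module. The helpers `violations` and `valid_order` are hypothetical and not part of Squirro.

```python
from graphlib import TopologicalSorter

# Ordering rules from the lists above, as (earlier, later) pairs.
# Step names are illustrative labels only, not Squirro identifiers.
RULES = [
    ("Unshorten Link", "Deduplication"),
    ("Content Augmentation", "Content Extraction"),
    ("Content Extraction", "Language Detection"),
    ("Content Standardization", "Indexing"),
    ("Indexing", "Cache Cleaning"),
    ("Indexing", "Search Tagging and Alerting"),
]


def violations(steps, rules=RULES):
    """Return the rules broken by the given step order.

    Rules that mention a step missing from the pipeline are skipped.
    """
    position = {name: i for i, name in enumerate(steps)}
    return [
        (before, after)
        for before, after in rules
        if before in position
        and after in position
        and position[before] > position[after]
    ]


def valid_order(steps, rules=RULES):
    """Return one ordering of `steps` that satisfies every applicable rule."""
    # Start with every step as a node that has no predecessors.
    sorter = TopologicalSorter({name: set() for name in steps})
    for before, after in rules:
        if before in steps and after in steps:
            # `after` depends on `before`, so `before` is emitted first.
            sorter.add(after, before)
    # static_order() raises graphlib.CycleError if the rules contradict each other.
    return list(sorter.static_order())
```

For example, a pipeline that runs Deduplication before Unshorten Link is reported as breaking the first rule, and `valid_order` returns an order in which that rule holds. A topological sort only guarantees the listed constraints; it does not try to preserve the rest of the original order.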
...