Introduction

Known Entity Extraction (commonly abbreviated as KEE) is Squirro's technology for enriching unstructured data by linking it to company-specific structured information.

Examples of such structured information, and of the ways it can be linked to unstructured documents, include:

This documentation explains how to create these links between structured and unstructured information using the Known Entity Extraction functionality. As this is a component of Squirro, make sure you are familiar with the core Squirro concepts, especially the Squirro Architecture and the Item Format.

Usage

As data is loaded into a project, Known Entity Extraction is performed using a plugin to the data enrichment pipeline (a pipelet) provided by Squirro.
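To make the idea concrete, the following is a minimal sketch of what an enrichment step of this kind might do. It is not the actual KEE pipelet: the record layout, the `build_lookup` and `enrich` helpers, and the `company` keyword name are all hypothetical, and the item is assumed to be a plain dictionary with `title`, `body` and `keywords` fields.

```python
import re

# Hypothetical structured data: one record per known entity.
KNOWN_COMPANIES = [
    {"id": "C001", "name": "Acme Corporation", "aliases": ["Acme Corp", "Acme"]},
    {"id": "C002", "name": "Globex", "aliases": ["Globex Inc"]},
]


def build_lookup(records):
    """Map every lower-cased name or alias to its record (illustrative only)."""
    lookup = {}
    for record in records:
        for name in [record["name"], *record["aliases"]]:
            lookup[name.lower()] = record
    return lookup


def enrich(item, lookup):
    """Tag the item with every known entity mentioned in its title or body."""
    text = " ".join([item.get("title", ""), item.get("body", "")]).lower()
    keywords = item.setdefault("keywords", {})
    for name, record in lookup.items():
        # Match whole words only, so "Acme" does not match "Acmeville".
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            values = keywords.setdefault("company", [])
            if record["name"] not in values:
                values.append(record["name"])
    return item


item = {"title": "Acme Corp signs deal", "body": "The contract with Globex was announced."}
enrich(item, build_lookup(KNOWN_COMPANIES))
print(item["keywords"])
```

The sketch rebuilds its lookup table on every run. A real deployment would presumably compile that table ahead of time, so that it does not have to be rebuilt for each item.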

The KEE pipelet uses a lookup database as the foundation of its work. That lookup database needs to be recompiled whenever the original data or the settings for the KEE project change. The lookup database is created with the kee utility, which is installed as part of the Toolbox. The following pages document how to work with this utility:

Known Entity Extraction can also be set up directly in the Squirro user interface. That process is documented in:

For advanced use cases that are not covered by default, the pipelet can be extended by subclassing it:
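As a rough illustration of the subclassing pattern only, the sketch below defines a hypothetical stand-in base class and extends it by overriding a single hook. `BaseEntityPipelet`, `text_for` and `TitleAwarePipelet` are invented for this example and are not the real Squirro classes or methods. The pages referenced here remain the authority on the actual extension points.

```python
class BaseEntityPipelet:
    """Hypothetical stand-in for an entity-extraction pipelet (not the real class)."""

    def __init__(self, lookup):
        # lookup maps a lower-cased entity name to its canonical name.
        self.lookup = lookup

    def text_for(self, item):
        """Hook: return the text that should be searched for entities."""
        return item.get("body", "")

    def consume(self, item):
        """Add a 'company' keyword for every known entity found in the text."""
        text = self.text_for(item).lower()
        found = sorted({canonical for name, canonical in self.lookup.items() if name in text})
        if found:
            item.setdefault("keywords", {})["company"] = found
        return item


class TitleAwarePipelet(BaseEntityPipelet):
    """Example extension: search the title as well as the body."""

    def text_for(self, item):
        return " ".join([item.get("title", ""), super().text_for(item)])


lookup = {"acme": "Acme Corporation"}
item = {"title": "Acme wins contract", "body": "Details to follow."}
print(TitleAwarePipelet(lookup).consume(dict(item)))
```

Overriding one narrow hook, rather than the whole `consume` method, keeps the extension small and leaves the base matching behaviour untouched.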