Introduction

In this brief tutorial, we will walk through setting up, configuring, and deploying a simple Known Entity Extraction (KEE) project in a Squirro project.

A KEE project analyzes every document that comes into a Squirro project and identifies each mention of a known entity, such as a company, product, or person. It then tags the document that contains the entity with a specific metadata tag. Once known entities have been identified, documents in the Squirro project can easily be filtered and grouped by the entities they contain.
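To make the tagging idea concrete, here is a minimal sketch in plain Python. It is not Squirro's implementation or API. The entity names (including the product "WidgetPro"), the tag names, and the `tag_document` helper are all hypothetical. In KEE itself, the matching rules are set in each project's configuration and need not be a simple whole-word lookup like this one.

```python
import re

# Hypothetical known entities, each mapped to the metadata tag it should produce.
KNOWN_ENTITIES = {
    "John Smith": "person",
    "Jane Doe": "person",
    "WidgetPro": "product",
}


def tag_document(text: str, entities: dict[str, str]) -> dict[str, list[str]]:
    """Return {tag_name: [entity, ...]} for every known entity mentioned in text."""
    tags: dict[str, list[str]] = {}
    for name, tag_name in entities.items():
        # Whole-word, case-sensitive match: "John Smithers" does not count as "John Smith".
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            tags.setdefault(tag_name, []).append(name)
    return tags


print(tag_document("John Smith demoed WidgetPro to a client.", KNOWN_ENTITIES))
# prints: {'person': ['John Smith'], 'product': ['WidgetPro']}
```

The resulting tags are what would later let you filter or group documents, for example showing only documents tagged with person "John Smith".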

A more detailed overview of KEE as a whole can be found here: LINK

A simple example

As an example, consider a CSV file that lists the salespeople employed by a company.

For each salesperson, the file provides a unique ID, their name, email address, position, and manager. The basic layout of the CSV file is shown below:

id, name, email, position, manager
1, John Smith, jsmith@company.com, District Manager, David Cole
2, Jane Doe, jdoe@company.com, District Manager, David Cole
3, Adrian Fox, afox@company.com, District Representative, Jane Doe
...
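A quick way to check that the file reads as the intended records is to load it with Python's standard `csv` module, as sketched below. The sample rows have a space after each comma. Passing `skipinitialspace=True` strips that space so that values such as `John Smith` are not stored with a leading blank. This is only a local sanity check, and `load_salespeople` is a hypothetical helper. It does not show how KEE itself ingests the file.

```python
import csv
import io

# The sample rows from above; in the real project this would be salespeople.csv.
SAMPLE_CSV = """id, name, email, position, manager
1, John Smith, jsmith@company.com, District Manager, David Cole
2, Jane Doe, jdoe@company.com, District Manager, David Cole
3, Adrian Fox, afox@company.com, District Representative, Jane Doe
"""


def load_salespeople(source) -> dict[str, dict[str, str]]:
    """Index salespeople by name; skipinitialspace drops the blank after each comma."""
    reader = csv.DictReader(source, skipinitialspace=True)
    return {row["name"]: row for row in reader}


salespeople = load_salespeople(io.StringIO(SAMPLE_CSV))
print(salespeople["Adrian Fox"]["manager"])  # prints: Jane Doe
```

To read the actual file, pass an open file handle instead, e.g. `open("salespeople.csv", newline="")`.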

Our goal in this KEE project is to identify a salesperson whenever they are mentioned in a document, and to tag that document with the salesperson's name, their position, and the name of their manager.
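The desired result can be pictured as a small function. Given a document's text, it finds every salesperson whose name appears and collects their name, position, and manager as tag values. The sketch is illustrative only. The tag names `salesperson`, `position`, and `manager` and the helper `salesperson_tags` are hypothetical. In a real KEE project, the equivalent behaviour is described in the project's config.json rather than written by hand.

```python
import re

# Hypothetical records shaped like rows of salespeople.csv (only the fields we tag with).
SALESPEOPLE = [
    {"name": "John Smith", "position": "District Manager", "manager": "David Cole"},
    {"name": "Jane Doe", "position": "District Manager", "manager": "David Cole"},
    {"name": "Adrian Fox", "position": "District Representative", "manager": "Jane Doe"},
]


def salesperson_tags(text: str) -> dict[str, list[str]]:
    """Collect name, position and manager tags for every salesperson mentioned in text."""
    tags: dict[str, list[str]] = {"salesperson": [], "position": [], "manager": []}
    for person in SALESPEOPLE:
        if re.search(r"\b" + re.escape(person["name"]) + r"\b", text):
            tags["salesperson"].append(person["name"])
            for field in ("position", "manager"):
                # Skip duplicates when two people share a position or a manager.
                if person[field] not in tags[field]:
                    tags[field].append(person[field])
    # Drop empty keys so a document with no matches gets no tags at all.
    return {key: values for key, values in tags.items() if values}


print(salesperson_tags("Jane Doe and Adrian Fox closed the deal."))
```

For that sentence, the document would be tagged with both salespeople, both positions, and the managers David Cole and Jane Doe. Jane Doe therefore appears both as a mentioned salesperson and as Adrian Fox's manager.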

Setting up a KEE project

To keep things organized, each KEE project is stored in its own folder.

For example, if we also had a KEE project for identifying the specific products sold by the same company, that project would live in a second, separate folder.

For this project, we will do all of our work in a new folder called kee_salespeople. Within this folder, we will create the following:

salespeople.csv file - The list of all the salespeople we want to identify.
config.json file - The configuration file that describes how the KEE project should operate. Editing this file lets you customize the project's rules and tweak how entities are identified.
fixtures/ folder - The test items we will use while configuring the KEE project.
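The layout can also be created with a short script. The sketch below uses only Python's standard `pathlib`. `scaffold_kee_project` is a hypothetical helper. It writes an empty JSON object to config.json purely as a placeholder, because the actual KEE configuration keys are not covered in this step. Existing files are left untouched, so running it twice is harmless.

```python
from pathlib import Path


def scaffold_kee_project(root: Path) -> None:
    """Create the KEE project layout: salespeople.csv, config.json and fixtures/."""
    root.mkdir(parents=True, exist_ok=True)
    (root / "fixtures").mkdir(exist_ok=True)

    csv_path = root / "salespeople.csv"
    if not csv_path.exists():
        # Header row only; the salespeople themselves are added afterwards.
        csv_path.write_text("id,name,email,position,manager\n")

    config_path = root / "config.json"
    if not config_path.exists():
        # Placeholder only: the real KEE configuration keys are not shown here.
        config_path.write_text("{}\n")
```

Calling `scaffold_kee_project(Path("kee_salespeople"))` from the parent directory creates the folder with the three entries listed above.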
