Introduction

This brief tutorial walks through setting up, configuring, and deploying a simple Known Entity Extraction (KEE) project into a Squirro project.

A KEE project analyzes every document that enters a Squirro project and identifies each mention of a known entity (such as a company, product, or person). Any document that contains a known entity is tagged with a specific metadata tag. Once known entities are tagged, documents can easily be filtered and grouped within the Squirro project by the entities they contain.

A more detailed overview of KEE as a whole can be found here: LINK

A simple example

As an easy example, let's take a CSV file that includes a list of salespeople employed by a company.

For each salesperson, the file provides a unique ID, name, email address, position, and manager. The basic layout of the CSV file is shown below:

id, name, email, position, manager
1, John Smith, jsmith@company.com, District Manager, David Cole
2, Jane Doe, jdoe@company.com, District Manager, David Cole
3, Adrian Fox, afox@company.com, District Representative, Jane Doe
...

Our goal in this KEE project is to identify a salesperson whenever they are referenced in a document, and to tag that document with the salesperson's name, their position, and the name of their manager.
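To make the goal concrete, here is a minimal Python sketch of the idea: read the salespeople CSV, look for each name in a document's text, and collect the tags that should be attached. This is only an illustration of the concept, not how Squirro's KEE is implemented or configured. The function names (load_salespeople, tag_document) and the tag names (salesperson, position, manager) are assumptions made up for this example.

```python
import csv
import io
import re

# Sample rows copied from the CSV layout shown above.
SAMPLE_CSV = """id, name, email, position, manager
1, John Smith, jsmith@company.com, District Manager, David Cole
2, Jane Doe, jdoe@company.com, District Manager, David Cole
3, Adrian Fox, afox@company.com, District Representative, Jane Doe
"""


def load_salespeople(csv_text):
    """Parse the CSV text into a list of dicts (hypothetical helper).

    skipinitialspace=True drops the space that follows each comma.
    """
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    return [{key: value.strip() for key, value in row.items()} for row in reader]


def tag_document(text, salespeople):
    """Return illustrative tags for every salesperson named in the text.

    The tag names used here are assumptions for this sketch.
    """
    tags = {"salesperson": [], "position": [], "manager": []}
    for person in salespeople:
        # Whole-name, case-insensitive match on word boundaries.
        pattern = r"\b" + re.escape(person["name"]) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            for tag, field in (("salesperson", "name"),
                               ("position", "position"),
                               ("manager", "manager")):
                if person[field] not in tags[tag]:
                    tags[tag].append(person[field])
    return tags


salespeople = load_salespeople(SAMPLE_CSV)
document = "Adrian Fox closed the renewal with the customer last week."
print(tag_document(document, salespeople))
```

Note that Jane Doe ends up in the manager tag for this document even though she is not mentioned in the text, because the manager value comes from the matched salesperson's row in the CSV.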

Setting up a KEE project

To keep things organized, each KEE project is stored in its own folder.

For example, if we had a separate KEE project for identifying specific products sold by the same company, that project would live in a second folder of its own.

For this project, we will do all of our work in a new folder called kee_salespeople. Within this folder we want to create the following content:

Testing a KEE project

Deploying a KEE project