Introduction

This tutorial walks through setting up, configuring, testing, and deploying a simple Known Entity Extraction (KEE) project on a Squirro server.

Each KEE project analyzes every document that enters a Squirro project, identifies each mention of a known entity (such as a company, product, or person), and tags the document containing that entity with a specific metadata tag. Once known entities are identified, documents can easily be filtered and grouped within the Squirro project by the entities they contain.
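To make the idea concrete, here is a minimal Python sketch of known-entity tagging. It assumes a hypothetical `KNOWN_ENTITIES` dictionary that maps each entity name to the facet values a matching document should receive, and it uses plain whole-word, case-insensitive regular-expression matching. This is not how KEE is implemented internally; the `tag_document` function is illustrative only.

```python
import re

# Hypothetical known-entity list: each entry maps an entity name to the
# metadata (facet -> value) that a matching document should be tagged with.
KNOWN_ENTITIES = {
    "Acme Corp": {"company": "Acme Corp"},
    "Widget Pro": {"product": "Widget Pro"},
}


def tag_document(text, entities):
    """Return facet -> sorted list of values for every known entity found in text."""
    tags = {}
    for name, metadata in entities.items():
        # Whole-word, case-insensitive match on the entity name.
        pattern = r"\b" + re.escape(name) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            for facet, value in metadata.items():
                tags.setdefault(facet, set()).add(value)
    return {facet: sorted(values) for facet, values in tags.items()}


print(tag_document("Acme Corp announced a new release of Widget Pro.", KNOWN_ENTITIES))
```

Running it prints `{'company': ['Acme Corp'], 'product': ['Widget Pro']}`. The word-boundary anchors keep a name such as "Acme Corp" from matching inside "Acme Corporation".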

Webinar

The example from this tutorial was also shown in the technical partner webinar in February 2016. Please see the KEE Webinar page for a recording.

A Simple Example

As a simple example, consider a CSV file that lists the salespeople employed by a company.

...

Our goal in this KEE project is to identify a salesperson any time that they are referenced in a document, and tag the document with the name of the salesperson, their position, and the name and position of their manager.
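The CSV itself is not reproduced here, so the following sketch assumes a hypothetical layout with `id`, `name`, `position`, and `manager_id` columns, where `manager_id` refers to another row's `id`. It shows the data relationship this goal depends on: each salesperson row is linked to their manager's row, which is what makes it possible to tag a document with the manager's details as well. The `load_salespeople` helper and the sample rows are illustrative and not part of KEE.

```python
import csv
import io

# Hypothetical CSV layout; the file in the tutorial download may use different
# column names. An empty manager_id means the person has no manager.
SAMPLE_CSV = """id,name,position,manager_id
1,Jane Doe,Sales Director,
2,John Smith,Account Executive,1
3,Mary Major,Account Executive,1
"""


def load_salespeople(csv_file):
    """Read rows keyed by id, then attach each person's manager row (or None)."""
    people = {row["id"]: row for row in csv.DictReader(csv_file)}
    for person in people.values():
        manager_id = person["manager_id"]
        person["manager"] = people.get(manager_id) if manager_id else None
    return people


people = load_salespeople(io.StringIO(SAMPLE_CSV))
for person in people.values():
    manager = person["manager"]
    print(person["name"], "->", manager["name"] if manager else "(no manager)")
```

With the sample data, John Smith and Mary Major both resolve to Jane Doe as their manager, while Jane Doe has none.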

Setting Up a KEE Project

To keep things organized, each KEE project is stored in its own folder.

...

Finally, we configure the keyword items so that each match tags the document with the name and position of both the matching entity (the salesperson) and the matching parent entity (the salesperson's manager). The salesperson's name and position go into facets called 'name' and 'position', and the manager's name and position go into facets called 'manager name' and 'manager position'.
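The tag structure this configuration aims for can be sketched as follows. The facet names come from the paragraph above; the record dictionaries, the hypothetical `keywords_for_match` helper, and the choice to store each facet value as a list of strings are assumptions for illustration. This is not the syntax of the KEE config.json file.

```python
# Facet names are taken from the tutorial text; the record layout (dicts with
# "name" and "position" keys) is a hypothetical stand-in for the CSV rows.
def keywords_for_match(entity, parent=None):
    """Build facet -> [value] tags for a matched salesperson and their manager."""
    keywords = {
        "name": [entity["name"]],
        "position": [entity["position"]],
    }
    # Only add manager facets when the matched entity has a parent entity.
    if parent is not None:
        keywords["manager name"] = [parent["name"]]
        keywords["manager position"] = [parent["position"]]
    return keywords


salesperson = {"name": "John Smith", "position": "Account Executive"}
manager = {"name": "Jane Doe", "position": "Sales Director"}
print(keywords_for_match(salesperson, manager))
```

For the sample records, the result is `{'name': ['John Smith'], 'position': ['Account Executive'], 'manager name': ['Jane Doe'], 'manager position': ['Sales Director']}`. A salesperson without a manager simply receives no manager facets.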

Testing a KEE Project

Once we have our list of entities to extract and our initial configuration file, we can begin testing the KEE project and adjusting the settings for our specific use case. 

...

If testing produces results that differ from what is expected, adjust the config.json file to change how KEE behaves for your specific use case. More information on this process can be found in the KEE Testing Documentation.
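One way to make such adjustments systematic is to keep a small set of sample documents paired with the tags they should receive, and re-check them after each change. The sketch below assumes a hypothetical stand-in `extract` function in place of KEE itself (exact whole-word matching on a single entity name); the test cases and the `run_tests` helper are likewise illustrative and not part of the KEE testing tools.

```python
import re

# Hypothetical entity list; in a real project the matching is done by KEE,
# not by this stand-in function.
ENTITIES = {"John Smith": {"name": "John Smith"}}


def extract(text):
    """Stand-in matcher: whole-word, case-insensitive match on entity names."""
    found = {}
    for name, tags in ENTITIES.items():
        if re.search(r"\b" + re.escape(name) + r"\b", text, re.IGNORECASE):
            for facet, value in tags.items():
                found.setdefault(facet, set()).add(value)
    return {facet: sorted(values) for facet, values in found.items()}


# Each case pairs a sample document with the tags we expect it to receive.
TEST_CASES = [
    ("John Smith closed the deal.", {"name": ["John Smith"]}),
    ("Johnny Smithers called.", {}),
    ("J. Smith signed the contract.", {"name": ["John Smith"]}),
]


def run_tests(cases):
    """Return (text, expected, actual) for every case whose tags differ."""
    failures = []
    for text, expected in cases:
        actual = extract(text)
        if actual != expected:
            failures.append((text, expected, actual))
    return failures


failures = run_tests(TEST_CASES)
print(f"{len(TEST_CASES) - len(failures)}/{len(TEST_CASES)} cases passed")
for text, expected, actual in failures:
    print(f"MISMATCH {text!r}: expected {expected}, got {actual}")
```

With the stand-in matcher, two of the three cases pass and the abbreviated mention "J. Smith" is reported as a mismatch, which is the kind of discrepancy testing is meant to surface before deployment.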

Deploying a KEE Project

Once a KEE project has been tested and produces the desired results, it can be uploaded to a remote Squirro server, where it is used to enrich all incoming data.

...

In this example, the KEE project will be available as a pipelet with the title "Salesperson Extraction" once it is uploaded.

Download this Example KEE Project

The full source for this example is available for download: tutorial.zip.