...

...

Simple Example

As a simple example, let's take a CSV file that lists the salespeople employed by a company.

...

Our goal in this KEE project is to identify a salesperson whenever they are referenced in a document, and to tag the document with the salesperson's name and position, as well as the name and position of their manager.

Setting up a KEE Project

To keep things organized, each KEE project is stored in its own folder.

...

Finally, we set the keywords that each match is tagged with. These cover both the matching entity (the salesperson) and the matching parent entity (the salesperson's manager). In this case, each document is tagged with the name and position of the matching entity in facets called 'name' and 'position'. It is also tagged with the name and position of the matching parent entity in facets called 'manager name' and 'manager position'.
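The following minimal Python sketch makes the entity and parent-entity tagging concrete by showing the tags we want for a single match. It is not how KEE is implemented. The SALESPEOPLE dictionary and the tags_for_match helper are hypothetical, and the records simply mirror the values used in the example fixture later in this guide.

```python
# Hypothetical records mirroring the example fixture; the real CSV columns
# and the way KEE links a salesperson to a manager are not shown here.
SALESPEOPLE = {
    "Adrian Fox": {"position": "District Representative", "manager": "Jane Doe"},
    "Jane Doe": {"position": "District Manager", "manager": None},
}


def tags_for_match(name, people=SALESPEOPLE):
    """Return the facet values for a matched entity and its parent entity."""
    entity = people[name]
    tags = {"name": [name], "position": [entity["position"]]}
    manager = entity["manager"]
    if manager is not None:
        # The parent entity's details go into the 'manager ...' facets.
        tags["manager name"] = [manager]
        tags["manager position"] = [people[manager]["position"]]
    return tags


print(tags_for_match("Adrian Fox"))
```

A salesperson without a manager gets only the 'name' and 'position' facets in this sketch.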

Testing a KEE Project

Once we have our list of entities to extract and our initial configuration file, we can begin testing the KEE project and adjusting the settings for our specific use case. 

...

The first step in testing our KEE project is to compile the lookup database for our entities using the KEE command line tool. We do this by running the following command from within the KEE project folder:

Code Block
>> kee compile

Running this command creates a db/ folder within our KEE project. This folder contains the lookup database (lookup.json) used to identify known entities. If we look at this file, we will see a lookup entry for each entity in the CSV file, along with that entity's details from the CSV file. For example:

...

It is important to remember that any time we change the config.json file, we have to rerun "kee compile" for those changes to take effect.
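If you script your workflow, a small check can remind you to recompile. The sketch below is a hypothetical helper and not part of the kee tool. It assumes the config.json and db/lookup.json layout described above, and it compares only file modification times.

```python
import os


def lookup_is_stale(project_dir="."):
    """Return True if db/lookup.json is missing or older than config.json.

    Hypothetical helper: it compares modification times only and does not
    inspect the contents of either file.
    """
    config = os.path.join(project_dir, "config.json")
    lookup = os.path.join(project_dir, "db", "lookup.json")
    if not os.path.exists(lookup):
        return True  # the project has never been compiled
    return os.path.getmtime(config) > os.path.getmtime(lookup)


if __name__ == "__main__":
    if lookup_is_stale():
        print('The lookup database is out of date; run "kee compile".')
```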

Creating Fixtures

Once the lookup database is compiled, we can begin testing the KEE project by running example Squirro items, or "fixtures", through it. We generally store the fixtures used for testing in a folder called fixtures/ within the KEE project. Each file in this folder is a JSON document containing an example Squirro item to test the KEE with, together with a list of all the tags that we expect the KEE to add to that item.
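As a rough illustration of that layout, this hypothetical Python helper loads every fixture in fixtures/ and checks that each one has the two top-level fields used by the example fixture: item and keywords. It assumes that fixture files end in .json, and it is not part of the kee tool.

```python
import json
import os

REQUIRED_FIELDS = ("item", "keywords")


def load_fixtures(fixtures_dir="fixtures"):
    """Load each *.json fixture, keyed by file name without the extension.

    Raises ValueError if a fixture lacks one of the required top-level fields.
    """
    fixtures = {}
    for filename in sorted(os.listdir(fixtures_dir)):
        if not filename.endswith(".json"):
            continue  # skip non-fixture files such as notes or backups
        with open(os.path.join(fixtures_dir, filename)) as f:
            data = json.load(f)
        missing = [field for field in REQUIRED_FIELDS if field not in data]
        if missing:
            raise ValueError(f"{filename} is missing {missing}")
        fixtures[filename[: -len(".json")]] = data
    return fixtures
```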

...

Code Block
languagejs
titleExample Fixture
{
    "item": {
        "body": "I spoke with Adrian Fox on the phone earlier this morning..."
    },
    "keywords": {
        "name": ["Adrian Fox"],
        "position": ["District Representative"],
        "manager name": ["Jane Doe"],
        "manager position": ["District Manager"]
    }
}

In this fixture, the item field contains the data for the Squirro item that we want to use to test the KEE strategy we have created.

The keywords field includes all of the tags that we expect to be added to the document as a result of the KEE. In this case, we expect the KEE to correctly identify the salesperson "Adrian Fox" within the document, and tag it with the salesperson's name and position, as well as the name and position of their manager.

To make a fixture, you can either create it yourself using your favorite text editor, or you can use the kee tool to create one for you from an existing Squirro item. For example, running:

Code Block
>> kee get_fixture ['pRVNr9H7QJG_UXhwHjQH3A', '2aEdt4H0R7uwVfbScYadpA']

will create fixtures for the two Squirro items identified by the unique IDs in the list. For information on how to find the unique ID of a given Squirro item, see LINK.
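Whichever way a fixture is created, it is just a JSON file. In scripted test setups, a fixture with the same shape as the example above could be written as follows. The write_fixture helper is hypothetical, and the <name>.json file name is an assumption.

```python
import json
import os


def write_fixture(name, body, keywords, fixtures_dir="fixtures"):
    """Write a fixture containing an item body and the keywords we expect.

    Hypothetical helper; only the "body" field of the item is filled in.
    """
    os.makedirs(fixtures_dir, exist_ok=True)
    fixture = {"item": {"body": body}, "keywords": keywords}
    path = os.path.join(fixtures_dir, name + ".json")
    with open(path, "w") as f:
        json.dump(fixture, f, indent=4)
    return path
```

For example, calling write_fixture("demo", ...) with the body text and keywords from the example fixture would write fixtures/demo.json.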

Using Fixtures

Once we have created a fixture, we can use the kee command line tool to test the KEE extraction against it. To test a KEE project against all of the fixtures in its project folder, we run the command:

Code Block
languagebash
>> kee -v test

This produces a summary that compares, for each fixture, the tags (keywords) added by the KEE with the tags the fixture expects. For our example KEE project, running this command produces:

Code Block
>> kee -v test

- Running fixture demo
  -    0 (  0%) correct results: []
  -    4 (100%) missed results:
        [u'Adrian Fox', u'District Manager', u'District Representative', u'Jane Doe']
  -    0 (  0%) extra results: []
- Processed 1 fixtures
  -    0 (  0%) correct results
  -    4 (100%) missed results
  -    0 (  0%) extra results

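The summary sorts each fixture's tags into three groups. Correct results are expected tags that the KEE added. Missed results are expected tags that the KEE did not add. Extra results are tags that the KEE added but the fixture did not expect. The KEE works as intended when every expected tag appears under correct results, with no missed or extra results.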
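This sketch makes the three groups concrete by scoring one fixture, comparing expected and extracted tags as (facet, value) pairs. It is only an approximation. The exact comparison used by kee test may differ, and the choice to compute percentages over the sum of all three counts is an assumption.

```python
def tag_pairs(keywords):
    """Flatten {"facet": [values]} into a set of (facet, value) pairs."""
    return {(facet, value) for facet, values in keywords.items() for value in values}


def score_fixture(expected, extracted):
    """Split tags into correct, missed and extra (facet, value) pairs."""
    want, got = tag_pairs(expected), tag_pairs(extracted)
    return {
        "correct": sorted(want & got),  # expected and added
        "missed": sorted(want - got),   # expected but not added
        "extra": sorted(got - want),    # added but not expected
    }


def summarize(result):
    """Print counts and percentages, using the sum of all groups as 100%."""
    total = sum(len(tags) for tags in result.values())
    for group in ("correct", "missed", "extra"):
        count = len(result[group])
        pct = 100 * count // total if total else 0
        print(f"  - {count:4d} ({pct:3d}%) {group} results")
```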

If testing the KEE produces results that differ from what is expected, we can adjust the config.json file to change how the KEE behaves for the specific use case. More information on this process can be found here: LINK

Deploying a KEE Project

To upload a KEE project to a Squirro server, we run the kee command line tool with the upload argument. For example:

Code Block
>> kee upload

For this to work, the squirro section of the config must be completed. More information can be found here: LINK.
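The commands in this guide are run by hand, but they can also be chained from a script. This hypothetical wrapper runs kee compile, kee -v test and kee upload in order, and it stops at the first command that exits with a non-zero status. This guide does not say whether kee test exits non-zero when results are missed, so review the test output before relying on the wrapper to gate uploads.

```python
import subprocess

# The commands used in this guide, in the order they are normally run.
STEPS = [
    ["kee", "compile"],
    ["kee", "-v", "test"],
    ["kee", "upload"],
]


def run_steps(project_dir, steps=STEPS, run=subprocess.run):
    """Run each command from project_dir, stopping at the first failure.

    Returns the commands that exited with status 0. The `run` parameter
    exists so the function can be exercised without the kee tool installed.
    """
    completed = []
    for cmd in steps:
        result = run(cmd, cwd=project_dir)
        if result.returncode != 0:
            break
        completed.append(cmd)
    return completed
```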

Download this Example KEE Project