Introduction

In this brief tutorial, we will go through the process of setting up, configuring, and deploying a simple Known Entity Extraction (KEE) project to a Squirro server.

Each KEE project is a tool which analyzes all incoming documents to a Squirro project, and identifies each instance of a known entity (such as a company, product, person, etc.) by tagging the document that contains the known entity with a specific metadata tag. Once known entities are identified, documents can easily be filtered and grouped within a Squirro project based on the known entities that they contain.
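To make the idea of tagging concrete, here is a minimal conceptual sketch in Python. It is not how the kee tool is implemented; the entity list, the `tag_known_entities` helper, the "salesperson" tag name, and the layout of the document dictionary are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical mapping from a tag value to the surface forms that should trigger it.
KNOWN_ENTITIES = {
    "Jane Doe": ["Jane Doe", "J. Doe"],
    "Adrian Fox": ["Adrian Fox"],
}


def tag_known_entities(text, known_entities):
    """Return the sorted tag values whose surface forms occur in the text."""
    found = set()
    for tag, surface_forms in known_entities.items():
        for form in surface_forms:
            # Whole-word, case-insensitive match; re.escape keeps punctuation literal.
            pattern = r"(?<!\w)" + re.escape(form) + r"(?!\w)"
            if re.search(pattern, text, flags=re.IGNORECASE):
                found.add(tag)
                break
    return sorted(found)


# Hypothetical document layout: the tags are stored under a metadata key.
document = {"body": "Jane Doe met the district manager on Monday."}
document["keywords"] = {
    "salesperson": tag_known_entities(document["body"], KNOWN_ENTITIES)
}
print(document["keywords"])
```

Once every document carries such a tag, filtering or grouping by entity reduces to a lookup on the metadata rather than a full-text search.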

...

We do this by adding an entry to the "strategies" section of the config.json file, as shown below:

...

To make a fixture, you can either create it yourself using your favorite text editor, or use the kee tool to create one from an existing Squirro item.

Adding Squirro Server Configuration

For the kee tool to be able to create fixtures from a Squirro project, the squirro section must be added to the config.json file, as shown below:

Code Block
{
    "squirro": {
        "cluster": "http://www.example.com/squirro/",
        "token": "abcabcexampletoken123123123",
        "project_id": "example_project"
        },

    "sources": {
        "salespeople": {
...

This will provide the kee tool with everything that it needs to authenticate with the Squirro server and download individual items to create fixtures.
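Before running any command that contacts the server, it can help to confirm that the squirro section is complete. The following is a small sketch, assuming that the three keys shown in the example (cluster, token, project_id) are the ones needed; the helper names `missing_squirro_keys` and `load_config` are hypothetical and not part of the kee tool.

```python
import json

# Assumption: these are the keys shown in the tutorial's example squirro section.
REQUIRED_SQUIRRO_KEYS = ("cluster", "token", "project_id")


def missing_squirro_keys(config):
    """Return the required keys that are absent or empty in the "squirro" section."""
    section = config.get("squirro") or {}
    return [key for key in REQUIRED_SQUIRRO_KEYS if not section.get(key)]


def load_config(path="config.json"):
    """Load a KEE project's config.json from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Inline sample so the sketch runs without a file; project_id is left out on purpose.
example = json.loads("""
{
    "squirro": {
        "cluster": "http://www.example.com/squirro/",
        "token": "abcabcexampletoken123123123"
    }
}
""")
print(missing_squirro_keys(example))
```

In a real project, `missing_squirro_keys(load_config())` could be run from the KEE project folder; an empty list means all three values are present.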

Fixtures are created using this method by running the kee get_fixture command:

Code Block
>> kee get_fixture 'pRVNr9H7QJG_UXhwHjQH3A' '2aEdt4H0R7uwVfbScYadpA'

The above command creates a fixture for each of the two Squirro items indicated by the unique IDs present in the list. For more information on how to get the unique ID for a given Squirro item, check LINK.

Testing with Fixtures

Once we have a fixture created, we can use the kee command line tool to test the KEE extraction on the fixture. To test a KEE project using the set of fixtures within that KEE project folder, we run the command:

Code Block
>> kee -v test

This will produce a basic summary output that shows which tags (keywords) were added to each fixture by the KEE. For our example KEE project, running this command produces:

Code Block
>> kee -v test
 
- Running fixture demo
  -    4 (100%) correct results:
        [u'Adrian Fox', u'District Manager', u'District Representative', u'Jane Doe']
  -    0 (  0%) missed results: []
  -    0 (  0%) extra results: []
- Processed 1 fixtures
  -    4 (100%) correct results
  -    0 (  0%) missed results
  -    0 (  0%) extra results

This result shows us that the KEE worked as we intended it to, and that all of the tags that we expected to find were correctly added by the KEE. 
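The three categories in the summary can be read as set operations between the tags a fixture expects and the tags the extraction produced. The sketch below illustrates that reading; it is an assumption about what the categories mean, not the kee tool's own code, and the `compare_tags` helper is hypothetical.

```python
def compare_tags(expected, extracted):
    """Split tags into correct, missed, and extra, using set operations."""
    expected, extracted = set(expected), set(extracted)
    return {
        "correct": sorted(expected & extracted),  # expected and found
        "missed": sorted(expected - extracted),   # expected but not found
        "extra": sorted(extracted - expected),    # found but not expected
    }


expected = ["Adrian Fox", "District Manager", "District Representative", "Jane Doe"]
extracted = ["Jane Doe", "Adrian Fox", "District Representative", "District Manager"]

result = compare_tags(expected, extracted)
for label in ("correct", "missed", "extra"):
    print(label, len(result[label]), result[label])
```

With the tags from the example fixture, all four land in the correct group and the other two groups stay empty, which matches the summary above.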

If testing the KEE produces results which are different from what is expected, adjustments can be made to the config.json file to improve the results for each specific use case by modifying the way that KEE works. More information on this process can be found here: LINK

Deploying a KEE Project

Once a KEE project has been tested and produces the desired results, the KEE project can be uploaded to a remote Squirro server to be used for enriching all incoming data. 

Similar to creating fixtures from remote Squirro items, deploying a KEE project to a Squirro server requires that the squirro section be present within the config.json file. This section includes the information necessary to successfully authenticate with the Squirro server.

To upload a KEE project to a Squirro server, the kee upload command is used.

Code Block
>> kee upload

After the KEE project is uploaded to a Squirro server, the KEE will be available as an enrichment under the Enrich tab of the Squirro frontend. Each uploaded KEE project requires a unique name which can be customized within the kee field of the config.json file, as shown below.

Code Block
{
    "kee": {
        "pipelet": "Salesperson Extraction"
        },

    "squirro": {
...

 

In this example, the KEE project will be available as a pipelet with the title "Salesperson Extraction" once it is uploaded.

Download this Example KEE Project

...