Introduction
In this brief tutorial, we will go through the process of setting up, configuring, and deploying a simple Known Entity Extraction (KEE) project to a Squirro server.
Each KEE project is a tool which analyzes all incoming documents to a Squirro project, and identifies each instance of a known entity (such as a company, product, person, etc.) by tagging the document that contains the known entity with a specific metadata tag. Once known entities are identified, documents can easily be filtered and grouped within a Squirro project based on the known entities that they contain.
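As a rough illustration of the idea (not of how KEE itself is implemented), the sketch below tags a document with every known entity whose name appears in its text. The function name `tag_known_entities`, the `keywords` and `salespeople` keys, and the sample data are all hypothetical, and the matching is plain case-insensitive whole-word search rather than KEE's own logic.

```python
import re

def tag_known_entities(text, known_entities):
    """Return the sorted known entities that appear in `text` as whole words.

    Illustrative only: simple case-insensitive matching, not KEE's algorithm.
    """
    found = set()
    for entity in known_entities:
        # \b keeps "Jane Doe" from matching inside "Jane Doerr".
        pattern = r"\b" + re.escape(entity) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add(entity)
    return sorted(found)

# Hypothetical document and entity list.
document = {"body": "Jane Doe met District Manager Adrian Fox on Monday."}
entities = ["Adrian Fox", "Jane Doe", "District Manager", "John Smith"]

# Store the matches under a metadata key so documents can be filtered by it.
document["keywords"] = {
    "salespeople": tag_known_entities(document["body"], entities)
}
print(document["keywords"])
```

Once every document carries such a tag, filtering or grouping by entity reduces to a lookup on that metadata key.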
...
We do this by adding an entry to the "sourcesstrategies" section of the config.json
file, as shown below:
...
To make a fixture, you can either create it yourself using your favorite text editor, or you can use the kee tool to create one for you from an existing Squirro item.
Adding Squirro Server Configuration
In order for the kee tool to be able to create fixtures from a Squirro project, the squirro section must be added to the config.json file, as shown below:
Code Block |
---|
{
    "squirro": {
        "cluster": "http://www.example.com/squirro/",
        "token": "abcabcexampletoken123123123",
        "project_id": "example_project"
    },
    "sources": {
        "salespeople": {
            ... |
This provides the kee tool with everything that it needs to authenticate with the Squirro server and download individual items to create fixtures.
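Before running any command that contacts the server, it can help to confirm that the `squirro` section is complete. The following is a minimal sketch, assuming that the `cluster`, `token`, and `project_id` keys from the example configuration are all required. The helper `missing_squirro_keys` is hypothetical and is not part of the kee tool.

```python
import json

# Assumption: these three keys, taken from the example config, are all needed.
REQUIRED_SQUIRRO_KEYS = ("cluster", "token", "project_id")

def missing_squirro_keys(config):
    """Return the required keys that are absent or empty in the "squirro" section."""
    section = config.get("squirro") or {}
    return [key for key in REQUIRED_SQUIRRO_KEYS if not section.get(key)]

# Same placeholder values as the tutorial's example configuration.
config = json.loads("""
{
    "squirro": {
        "cluster": "http://www.example.com/squirro/",
        "token": "abcabcexampletoken123123123",
        "project_id": "example_project"
    }
}
""")

missing = missing_squirro_keys(config)
print("squirro section OK" if not missing else "missing: " + ", ".join(missing))
```

In practice the configuration would be loaded from the project's config.json file rather than from an inline string.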
To create fixtures this way, run the kee get_fixture command:
Code Block |
---|
>> kee get_fixture 'pRVNr9H7QJG_UXhwHjQH3A' '2aEdt4H0R7uwVfbScYadpA' |
The above command creates a fixture for each of the two Squirro items identified by the unique IDs in the list. For more information on how to get the unique ID for a given Squirro item, check LINK.
Testing with Fixtures
Once a fixture has been created, we can use the kee command line tool to test the KEE extraction against it. To test a KEE project against all of the fixtures within that KEE project folder, we run the command:
Code Block |
---|
>> kee -v test |
This will produce a basic summary output that shows which tags (keywords) were added to each fixture by the KEE. For our example KEE project, running this command produces:
Code Block |
---|
>> kee -v test
- Running fixture demo
- 4 (100%) correct results:
[u'Adrian Fox', u'District Manager', u'District Representative', u'Jane Doe']
- 0 ( 0%) missed results: []
- 0 ( 0%) extra results: []
- Processed 1 fixtures
- 4 (100%) correct results
- 0 ( 0%) missed results
- 0 ( 0%) extra results |
This result shows us that the KEE worked as we intended it to, and that all of the tags that we expected to find were correctly added by the KEE.
If testing the KEE produces results that differ from what is expected, the config.json file can be adjusted to change how the KEE works for the specific use case. More information on this process can be found here: LINK
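The three categories in the test summary can be read as a set comparison between the tags a fixture is expected to receive and the tags the KEE actually added. The sketch below shows that reading. It is an interpretation of the output and not the kee tool's own code, and the `compare_tags` helper and the "Regional Manager" tag are hypothetical.

```python
def compare_tags(expected, extracted):
    """Split tags into correct, missed, and extra, mirroring the test summary."""
    expected, extracted = set(expected), set(extracted)
    return {
        "correct": sorted(expected & extracted),  # expected and found
        "missed": sorted(expected - extracted),   # expected but not found
        "extra": sorted(extracted - expected),    # found but not expected
    }

expected = ["Adrian Fox", "District Manager", "District Representative", "Jane Doe"]
# Hypothetical run that misses one tag and adds an unexpected one.
extracted = ["Adrian Fox", "District Manager", "Jane Doe", "Regional Manager"]

result = compare_tags(expected, extracted)
for category, tags in result.items():
    print(f"{len(tags)} {category} results: {tags}")
```

A non-empty "missed" or "extra" list is the signal that the configuration needs adjusting.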
Deploying a KEE Project
Once a KEE project has been tested and produces the desired results, the KEE project can be uploaded to a remote Squirro server to be used for enriching all incoming data.
Similar to creating fixtures from remote Squirro items, deploying a KEE project to a Squirro server requires that the squirro section be present within the config.json file. This section includes the information necessary to successfully authenticate with the Squirro server.
To upload a KEE project to a Squirro server, the kee upload command is used.
Code Block |
---|
>> kee upload |
After the KEE project is uploaded to a Squirro server, the KEE will be available as an enrichment under the Enrich tab of the Squirro frontend. Each uploaded KEE project requires a unique name, which can be customized through the pipelet field in the kee section of the config.json file, as shown below.
Code Block |
---|
{
    "kee": {
        "pipelet": "Salesperson Extraction"
    },
    "squirro": {
        ... |
In this example, the KEE project will be available as a pipelet with the title "Salesperson Extraction" once it is uploaded.
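Because each uploaded KEE project needs a unique name, it may be worth checking the configured name before uploading. The sketch below reads the name from the `kee` section and compares it against a list of names already in use. The helper functions and the list of existing names are hypothetical, and the case-insensitive comparison is an assumption, since the server's actual uniqueness rule is not described here.

```python
import json

def pipelet_name(config):
    """Return the name stored under "kee" -> "pipelet", or None if absent."""
    return (config.get("kee") or {}).get("pipelet")

def is_unique_name(name, existing_names):
    """Check a name against names already in use, ignoring case and spacing.

    Assumption: names are compared case-insensitively; the server may differ.
    """
    taken = {existing.strip().lower() for existing in existing_names}
    return name.strip().lower() not in taken

config = json.loads('{"kee": {"pipelet": "Salesperson Extraction"}}')

# Hypothetical names of enrichments already uploaded to the server.
already_uploaded = ["Company Extraction", "Product Extraction"]

name = pipelet_name(config)
print(name, "is unique:", is_unique_name(name, already_uploaded))
```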
Download this Example KEE Project
...