The kee command line utility is used to set up and deploy Known Entity Extraction to a Squirro Project. It is included in the Toolbox, which is available in the Downloads space.
Basic Usage
The KEE Tool is invoked from a command line window. For example, the following command runs the test suite of a KEE project:
...
This example has three components:
-v is a common argument that turns on verbose logging.
test is the sub-command to run.
--stats is an argument to the test sub-command.
What is a KEE Project?
A Known Entity Extraction project is a single folder on your computer that has all the required data, configuration and code to create a KEE lookup database.
...
Every KEE Project will include at least the following files:
A CSV file of known entities to look for within the unstructured data
A JSON configuration file
A 'db/' folder that includes the lookup database once it is compiled
Many KEE projects will also include additional content such as:
A pipelet file (a Python script which performs the known entity extraction)
A 'fixtures/' folder that includes example Squirro items used to test the KEE configuration
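The layout described above can be checked before running the tool. The following is a minimal sketch, not part of the kee utility itself: the function names check_kee_project and optional_parts are hypothetical, the configuration file name config.json is taken from the Configuration section, and treating any top-level .py file as the pipelet is an assumption.

```python
from pathlib import Path


def check_kee_project(project_dir):
    """Return the required KEE project parts that are missing from project_dir."""
    root = Path(project_dir)
    missing = []
    # At least one CSV file of known entities (no fixed file name is assumed).
    if not any(root.glob("*.csv")):
        missing.append("CSV file of known entities")
    # The JSON configuration file.
    if not (root / "config.json").is_file():
        missing.append("config.json")
    # The folder that holds the lookup database once it is compiled.
    if not (root / "db").is_dir():
        missing.append("db/ folder")
    return missing


def optional_parts(project_dir):
    """Report which of the optional parts are present in project_dir."""
    root = Path(project_dir)
    return {
        # Assumption: the pipelet is a .py file at the top level of the project.
        "pipelet": any(root.glob("*.py")),
        "fixtures": (root / "fixtures").is_dir(),
    }
```

An empty list from check_kee_project means all three required parts were found; anything else names what still needs to be added.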
Configuration
The basis of a KEE project is the configuration file. That file has the name config.json and must be added to each KEE project before the kee command can be executed.
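Because the kee command depends on this file, it can help to confirm that config.json exists and parses as JSON before invoking the tool. The sketch below is an illustration only: load_kee_config is a hypothetical helper, and it makes no assumption about which keys the file contains.

```python
import json
from pathlib import Path


def load_kee_config(project_dir):
    """Load config.json from a KEE project folder, failing early with a clear message."""
    config_path = Path(project_dir) / "config.json"
    if not config_path.is_file():
        raise FileNotFoundError(f"{config_path} is missing; add it before running kee")
    try:
        with config_path.open(encoding="utf-8") as handle:
            return json.load(handle)
    except json.JSONDecodeError as err:
        # Re-raise with the file path so the broken file is easy to locate.
        raise ValueError(f"{config_path} is not valid JSON: {err}") from err
```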
...
The kee utility accepts a number of parameters. These all need to be specified before the sub-command and its options.
Argument | Mandatory | Description |
---|---|---|
General Options | ||
-h | No | Show a help message and exit. |
--version | No | Output the tool version and exit. |
--verbose, -v | No | Increase log verbosity. |
--log-file | No | Path to a log file on disk, where the log output is to be stored. If this is not specified, the log messages are shown on the console. |
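The ordering rule (general options first, then the sub-command, then the sub-command's own options) is easy to get wrong when a command line is assembled in a script. The sketch below shows one way to keep the order fixed. The helper build_kee_command is hypothetical and only uses options listed in this document; it builds the argument list without running anything.

```python
def build_kee_command(sub_command, sub_args=(), verbose=False, log_file=None):
    """Assemble a kee argument list with general options ahead of the sub-command."""
    command = ["kee"]
    if verbose:
        command.append("--verbose")
    if log_file is not None:
        command.extend(["--log-file", str(log_file)])
    # General options end here; the sub-command and its own options follow.
    command.append(sub_command)
    command.extend(sub_args)
    return command


print(build_kee_command("test", ["--stats"], verbose=True))
# prints ['kee', '--verbose', 'test', '--stats']
```

Assuming kee is installed and on the PATH, the resulting list can be passed to subprocess.run as-is.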
Sub-Commands
A sub-command always needs to be specified. The following sub-commands exist.
...
Upload the KEE project to a Squirro server. This creates a pipelet on the server which can then be added to any Squirro project.
The Squirro section in the configuration file must be present for this to work.
Argument | Mandatory | Description |
---|---|---|
General Options | ||
--no-compile | Don't compile the lookup database. By default the |
rerun
Re-apply the KEE tagging to the Squirro project. This is used to apply new configuration changes to old items. The Squirro section in the configuration file must be present for this to work.
Argument | Mandatory | Description |
---|---|---|
--query | The Squirro query for which the KEE extractions should be rerun. Every item that matches this query will be processed. If this is not present, then this uses the | |
--no-compile | Don't compile the lookup database. By default the |
test
Run the test suite of the current KEE project. By default the test cases are located in the fixtures directory. The KEE Testing documentation explains how those test cases can be created.
Argument | Mandatory | Description | ||
---|---|---|---|---|
Fixtures | ||||
[fixtures…] | A list of fixture files that should be run. If this is not specified, all the fixtures are tested. The following example runs the test on just two fixture files:
| |||
General Options | ||||
--no-compile | Don't compile the lookup database. By default the | |||
Snapshots | ||||
--snapshot | Creates a new snapshot from the current test results. Snapshots are stored on disk (in the | |||
--snapshot-message, -m | Add a comment to the snapshot. This implies
| |||
--diff | Compare the snapshot to the previous snapshot. This outputs how much better or worse the match quality has become. | |||
Debugging | ||||
--stats | Outputs a summary of all the missed keywords. This provides a quick overview of which kinds of entities are not yet detected as they should be. | |||
--trace STRING | Turns on detailed logging whenever the given candidate is being processed. For example if the lookup database contains an entry called "Acme Inc" then invoking
If this option is present, then the |
get_fixture
Download one or more items from the configured Squirro project and store them in the fixtures folder. The Squirro section in the configuration file must be present for this to work.
Consult the KEE Testing documentation for information on the fixtures.
Argument | Mandatory | Description |
---|---|---|
[items…] | Yes | List of item identifiers to download. |