
The kee command line utility is used to set up and deploy Known Entity Extraction to a Squirro Project. It is included in the Toolbox, which is available in the Downloads space.

Basic Usage

The KEE tool is invoked from a command line window. For example, the following command runs the test suite of a KEE project:

squirro_kee -v test --stats

This example has three components:

  • -v is a common argument that turns on verbose logging.

  • test is the sub-command to run.

  • --stats is an argument to the test sub-command.
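When the tool is scripted, for example from a CI job, the same three components can be assembled into an argument list and passed to Python's subprocess module. The following is a minimal sketch: build_kee_argv and run_kee are hypothetical helpers, not part of the Toolbox, and they only encode the rule that common arguments precede the sub-command and its arguments.

```python
import subprocess
from typing import Sequence


def build_kee_argv(common: Sequence[str], subcommand: str,
                   sub_args: Sequence[str] = ()) -> list[str]:
    """Assemble a squirro_kee command line (hypothetical helper).

    Common arguments such as -v go before the sub-command; arguments of
    the sub-command, such as --stats for test, go after it.
    """
    return ["squirro_kee", *common, subcommand, *sub_args]


def run_kee(argv: list[str]) -> int:
    """Run the command in the current working directory and return its exit code."""
    return subprocess.run(argv, check=False).returncode


if __name__ == "__main__":
    # Same command as above: squirro_kee -v test --stats
    print(build_kee_argv(["-v"], "test", ["--stats"]))
```

Passing a list rather than a single shell string avoids quoting problems with arguments that contain spaces.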

What is a KEE Project?

A Known Entity Extraction project is a single folder on your computer that has all the required data, configuration and code to create a KEE lookup database.

KEE Project Example

Every KEE project includes at least the following:

  • A CSV file of known entities to look for within the unstructured data

  • A JSON configuration file

  • A 'db/' folder that includes the lookup database once it is compiled

Many KEE projects will also include additional content such as:

  • A pipelet file (a Python script that performs the known entity extraction)

  • A 'fixtures/' folder that includes example Squirro items used to test the KEE configuration

Configuration

The basis of a KEE project is the configuration file. This file is named config.json and must be present in each KEE project before the kee command can be run.

See the KEE Config Reference for documentation on how to create this file or the KEE Tutorial for a simple example.
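A quick pre-flight check can catch a missing configuration before the kee command is run. The sketch below is hypothetical: check_kee_project and is_compiled are illustrative names, and the check assumes the CSV file of known entities sits at the top level of the project folder, which this page does not specify.

```python
from pathlib import Path


def check_kee_project(project_dir: str) -> list[str]:
    """Return the problems found in a KEE project folder (hypothetical helper).

    Checks for config.json, which must exist before the kee command is run,
    and for at least one CSV file of known entities (assumed at the top level).
    """
    root = Path(project_dir)
    problems = []
    if not (root / "config.json").is_file():
        problems.append("missing config.json")
    if not any(root.glob("*.csv")):
        problems.append("no CSV file of known entities found")
    return problems


def is_compiled(project_dir: str) -> bool:
    """Heuristic: a non-empty db/ folder suggests the lookup database was compiled."""
    db = Path(project_dir) / "db"
    return db.is_dir() and any(db.iterdir())
```

The db/ folder is deliberately not treated as required, because it only holds the lookup database after compilation.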

Common Arguments

The kee utility accepts a number of common arguments. All of them must be specified before the sub-command and its options.

General Options

  • -h (optional): Show a help message and exit.

  • --version (optional): Output the tool version and exit.

  • --verbose, -v (optional): Increase log verbosity.

      • Not specified: the tool outputs all warnings and errors.

      • Specified once or more: informational messages are also output.

      • Specified twice or more (-vv): debugging messages are shown.

      • Specified three times or more (-vvv): more information is included in all messages.

  • --log-file (optional): Path to a log file on disk, where the log output is to be stored. If this is not specified, the log messages are shown on the console.
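The verbosity rules and --log-file can be expressed as a small helper that produces the common arguments for a scripted call. This is a sketch: common_args is a hypothetical name, and the assumption that --log-file takes the path as its next argument is not confirmed by this page.

```python
from typing import Optional


def common_args(verbosity: int = 0, log_file: Optional[str] = None) -> list[str]:
    """Build the common arguments that precede the sub-command (hypothetical helper).

    verbosity 0: no flag, warnings and errors only
    verbosity 1: -v, adds informational messages
    verbosity 2: -vv, adds debugging messages
    verbosity 3 or more: -vvv, adds more information to all messages
    """
    if verbosity < 0:
        raise ValueError("verbosity must not be negative")
    args: list[str] = []
    if verbosity:
        # Nothing is documented beyond -vvv, so the flag is capped at three.
        args.append("-" + "v" * min(verbosity, 3))
    if log_file is not None:
        # Assumption: --log-file takes the path as the following argument.
        args += ["--log-file", log_file]
    return args
```

For example, ["squirro_kee", *common_args(2, "kee.log"), "test"] corresponds to squirro_kee -vv --log-file kee.log test.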

Sub-Commands

A sub-command must always be specified. The following sub-commands are available:

compile

Compile the lookup database from the input data. This applies to the KEE project in the current working directory of the terminal.

This sub-command does not accept any additional arguments. It is always invoked as follows:

squirro_kee compile
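Because compile acts on the project in the current working directory, a script that manages several KEE projects can start the command with the working directory set to each project folder instead of calling os.chdir. In this sketch, compile_project is a hypothetical helper; the runner parameter defaults to subprocess.run and exists only so the function can be exercised without squirro_kee installed.

```python
import subprocess
from pathlib import Path
from typing import Callable


def compile_project(
    project_dir: str,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> subprocess.CompletedProcess:
    """Compile the KEE project in project_dir (hypothetical helper).

    compile works on the project in the current working directory, so the
    child process is started with cwd set to the project folder; the
    working directory of the calling script is left unchanged.
    """
    project = Path(project_dir)
    if not (project / "config.json").is_file():
        raise FileNotFoundError(f"{project} has no config.json")
    # check=True raises CalledProcessError on a non-zero exit status.
    return runner(["squirro_kee", "compile"], cwd=project, check=True)
```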

upload

Upload the KEE project to a Squirro server. This creates a pipelet on the server that can then be added to any Squirro project.

The Squirro section in the configuration file must be present for this to work.

General Options

  • --no-compile: Don't compile the lookup database. By default, the compile sub-command is automatically executed when uploading.
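Since test and upload both compile by default, a release script that runs them in sequence compiles the database several times. One possible sketch is to compile once and pass --no-compile to the later steps. release_steps and run_release are hypothetical helpers, and stopping on failure assumes that squirro_kee exits with a non-zero status when a step fails, which this page does not state. The upload step also needs the Squirro section in config.json.

```python
import subprocess
from typing import Callable


def release_steps() -> list[list[str]]:
    """Commands that compile the lookup database only once (hypothetical helper)."""
    return [
        ["squirro_kee", "compile"],
        # test and upload compile by default; --no-compile reuses the database built above.
        ["squirro_kee", "test", "--no-compile"],
        ["squirro_kee", "upload", "--no-compile"],
    ]


def run_release(
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> None:
    """Run the steps in order, stopping at the first one that fails."""
    for argv in release_steps():
        # check=True makes a non-zero exit status raise CalledProcessError,
        # so a failing test run prevents the upload.
        runner(argv, check=True)
```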

rerun

Re-apply the KEE tagging to the Squirro project. Use this to apply configuration changes to items that have already been processed. The Squirro section in the configuration file must be present for this to work.

  • --query: The Squirro query for which the KEE extractions should be rerun. Every item that matches this query is processed.

    If --query is not specified, the version and version_keyword parameters of the kee section in the configuration file are used instead. KEE tagging is then run on all items that have not yet been tagged with the current version.

  • --no-compile: Don't compile the lookup database. By default, the compile sub-command is automatically executed when rerunning the KEE tagging.
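The following sketch builds a rerun command with or without --query. rerun_argv is a hypothetical helper, the query string in the example is a placeholder rather than verified Squirro query syntax, and passing the query as the value after --query is an assumption.

```python
from typing import Optional


def rerun_argv(query: Optional[str] = None, no_compile: bool = False) -> list[str]:
    """Build a squirro_kee rerun command line (hypothetical helper).

    Without a query, kee falls back to the version and version_keyword
    parameters of the kee section in config.json.
    """
    argv = ["squirro_kee", "rerun"]
    if query is not None:
        # Assumption: the query is passed as the value following --query.
        argv += ["--query", query]
    if no_compile:
        argv.append("--no-compile")
    return argv


if __name__ == "__main__":
    # Placeholder query: replace it with a real Squirro query.
    print(rerun_argv("PLACEHOLDER QUERY", no_compile=True))
```

Omitting the query entirely, rather than passing an empty string, leaves the version-based selection in effect.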

test

Run the test suite of the current KEE project. By default the test cases are located in the fixtures directory. The KEE Testing documentation explains how those test cases can be created.
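To run only part of the suite, fixture files can be selected by name pattern and passed to test as positional arguments. This is a sketch under assumptions: select_fixtures and fixture_test_argv are hypothetical helpers, and fixtures are assumed to be .json files directly inside the fixtures folder.

```python
from pathlib import Path


def select_fixtures(project_dir: str, pattern: str = "*.json") -> list[str]:
    """Return fixture paths under fixtures/ that match pattern (hypothetical helper).

    Paths are relative to the project folder and sorted so that the
    test order is stable between runs.
    """
    root = Path(project_dir)
    return sorted(
        p.relative_to(root).as_posix()
        for p in (root / "fixtures").glob(pattern)
        if p.is_file()
    )


def fixture_test_argv(project_dir: str, pattern: str) -> list[str]:
    """Build a test command for the matching fixtures only."""
    fixtures = select_fixtures(project_dir, pattern)
    if not fixtures:
        # With no fixture files, kee tests all fixtures, which is probably
        # not what a caller asking for a subset wants.
        raise FileNotFoundError(f"no fixtures match {pattern!r}")
    return ["squirro_kee", "test", *fixtures]
```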

Fixtures

  • [fixtures…]: A list of fixture files that should be run. If this is not specified, all the fixtures are tested. The following example runs the test on just two fixture files:

squirro_kee test fixtures/acme.json fixtures/other_corp.json

General Options

  • --no-compile: Don't compile the lookup database. By default, the compile sub-command is automatically executed when running the tests. Compilation can be slow for large databases; in that case, use this flag to skip it.

Snapshots

  • --snapshot: Create a new snapshot from the current test results. Snapshots are stored on disk (in the snapshots folder by default) and are used to compare the KEE result quality over time.

  • --snapshot-message, -m: Add a comment to the snapshot. This implies --snapshot, so the following command creates a snapshot and adds a comment in one step:

squirro_kee test -m "Tuned ngrams"

  • --diff: Compare the snapshot with the previous snapshot. This outputs how much better or worse the match quality has become.

Debugging

  • --stats: Output a summary of all missed keywords. This gives a quick overview of which kinds of entities are not yet detected as they should be.

  • --trace STRING: Turn on detailed logging whenever the given candidate is processed. For example, if the lookup database contains an entry called "Acme Inc", invoking kee as follows produces verbose log output every time that entry is looked at:

squirro_kee test --trace "Acme Inc"

When --trace is present, the --verbose flag does not have any effect.
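The test options interact in two documented ways: -m implies --snapshot, and -v has no effect when --trace is given. The sketch below encodes both rules when building a test command line. kee_test_argv is a hypothetical helper, and the option order it produces, as well as combining --diff with the other options in one call, are assumptions.

```python
from typing import Optional, Sequence


def kee_test_argv(fixtures: Sequence[str] = (), verbosity: int = 0,
                  no_compile: bool = False, snapshot: bool = False,
                  snapshot_message: Optional[str] = None, diff: bool = False,
                  stats: bool = False, trace: Optional[str] = None) -> list[str]:
    """Build a squirro_kee test command line (hypothetical helper)."""
    argv = ["squirro_kee"]
    # -v is a common argument, so it precedes the sub-command; it is
    # dropped when tracing because --trace makes it ineffective.
    if verbosity and trace is None:
        argv.append("-" + "v" * min(verbosity, 3))
    argv.append("test")
    if no_compile:
        argv.append("--no-compile")
    if snapshot_message:
        argv += ["-m", snapshot_message]  # implies --snapshot
    elif snapshot:
        argv.append("--snapshot")
    if diff:
        # Assumption: --diff may be combined with the options above in one run.
        argv.append("--diff")
    if stats:
        argv.append("--stats")
    if trace is not None:
        argv += ["--trace", trace]
    argv.extend(fixtures)  # no fixtures means all fixtures are tested
    return argv
```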

get_fixture

Download one or more items from the configured Squirro project and store them in the fixtures folder. The Squirro section in the configuration file must be present for this to work.

Consult the KEE Testing documentation for information on the fixtures.

  • [items…] (mandatory)

This page can now be found at KEE CLI Tool on the Squirro Docs site.