...

The CSV file provides each salesperson's unique ID, name, email address, position, and manager. The basic layout of the CSV file is shown below:

salespeople.csv
id, name, email, position, manager
1, John Smith, jsmith@company.com, District Manager, David Cole
2, Jane Doe, jdoe@company.com, District Manager, David Cole
3, Adrian Fox, afox@company.com, District Representative, Jane Doe
...
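
To make the layout concrete, here is a minimal Python sketch that parses the rows shown above into dictionaries keyed by column name. It is only an illustration of the file structure and not part of the kee tool; the function name load_salespeople and the inline sample text are assumptions made for this example.

```python
import csv
import io

# Sample rows copied from the layout above. A real project would read
# salespeople.csv from disk instead of using this inline string.
SAMPLE_CSV = """id, name, email, position, manager
1, John Smith, jsmith@company.com, District Manager, David Cole
2, Jane Doe, jdoe@company.com, District Manager, David Cole
3, Adrian Fox, afox@company.com, District Representative, Jane Doe
"""


def load_salespeople(text):
    """Parse CSV text into a list of dicts keyed by column name (hypothetical helper)."""
    # skipinitialspace drops the blank that follows each comma in this layout,
    # for the header row as well as the data rows.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return [dict(row) for row in reader]


salespeople = load_salespeople(SAMPLE_CSV)
for person in salespeople:
    print(person["name"], "reports to", person["manager"])
```

Each row becomes one entity, and the manager column holds the name of another salesperson, which is what the hierarchy setting relies on later.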

Our goal in this KEE project is to identify a salesperson any time that they are referenced in a document, and tag the document with the name of the salesperson, their position, and the name and position of their manager.

...

To start, we point the KEE configuration to our list of entities by adding a source to the config.json file as shown below:

config.json
{
    "sources": {
        "salespeople": {
            "dsn": "csv:///salespeople.csv",
            "field_id": "name",
            "field_matching": ["name", "email"],
            "hierarchy": "manager -> name"
        }
    }
}

The configuration above creates a new source of known entities called "salespeople". For this source, the data source name ("dsn") points to the CSV file salespeople.csv, which is located in the same folder as the config.json file.

The field_id field identifies the "name" field as the unique identifier for each entity in the CSV file.

The field_matching field lists all the fields that we want to search for in order to identify a known entity within a document. In this case, we want to find references to either the salesperson's name or their email address in the documents in our Squirro project, so we include both fields in the list.
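
The idea behind field_matching can be sketched in a few lines of Python. This is a simplified illustration that uses exact, case-insensitive substring search; it is not how KEE matches entities internally (KEE scores candidates, as the min_score setting later in this tutorial suggests). The function find_entities and the sample document text are assumptions made for this example.

```python
# Mirrors the "field_matching" list in config.json.
FIELD_MATCHING = ["name", "email"]

# Entities taken from the sample rows of salespeople.csv.
ENTITIES = [
    {"name": "John Smith", "email": "jsmith@company.com"},
    {"name": "Jane Doe", "email": "jdoe@company.com"},
    {"name": "Adrian Fox", "email": "afox@company.com"},
]


def find_entities(text, entities, fields=FIELD_MATCHING):
    """Return names of entities with at least one matching field in text.

    Hypothetical helper: plain substring search, not KEE's scoring logic.
    """
    lowered = text.lower()
    found = []
    for entity in entities:
        # An entity matches if any one of its matching fields occurs in the text.
        if any(entity[field].lower() in lowered for field in fields):
            found.append(entity["name"])
    return found


doc = "Please forward the contract to afox@company.com and copy Jane Doe."
print(find_entities(doc, ENTITIES))
```

Listing both fields means that a document mentioning only an email address is still attributed to the right salesperson.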

The hierarchy field indicates that there is a hierarchy among the entities in the CSV file: the value in the "manager" field of one entity points to the name of that entity's parent entity (the person's manager).
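
One way to read the "manager -> name" notation is as a lookup from a child's manager value to the entity whose name equals that value. The Python sketch below illustrates that reading only; parse_hierarchy and find_parent are hypothetical helpers, and the actual resolution inside KEE may differ.

```python
# Entities taken from the sample rows of salespeople.csv.
ENTITIES = [
    {"name": "John Smith", "position": "District Manager", "manager": "David Cole"},
    {"name": "Jane Doe", "position": "District Manager", "manager": "David Cole"},
    {"name": "Adrian Fox", "position": "District Representative", "manager": "Jane Doe"},
]


def parse_hierarchy(spec):
    """Split a spec such as "manager -> name" into (child_field, parent_field)."""
    child_field, parent_field = (part.strip() for part in spec.split("->"))
    return child_field, parent_field


def find_parent(entity, entities, spec="manager -> name"):
    """Return the parent entity, or None if it is not among the known entities."""
    child_field, parent_field = parse_hierarchy(spec)
    # Index the entities by the field that the child's value points to.
    by_parent_field = {e[parent_field]: e for e in entities}
    return by_parent_field.get(entity[child_field])


adrian = ENTITIES[2]
parent = find_parent(adrian, ENTITIES)
print(adrian["name"], "->", parent["name"], "(" + parent["position"] + ")")
```

In this sample, David Cole is not among the three rows shown, so looking up the parent of John Smith or Jane Doe returns None.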

...

We do this by adding an entry to the "strategies" section of the config.json file, as shown below:

config.json
{
    "sources": {
        "salespeople": {
            "dsn": "csv:///salespeople.csv",
            "hierarchy": "manager -> name",
            "strategy": "salesperson_strategy"
        }
    },

    "strategies": {
        "salesperson_strategy": {
            "min_score": 0.9,
            "keywords": [
                "name",
                "position"
            ],
            "parent_keywords": [
                "name -> manager name",
                "position -> manager position"
            ]
        }
    }
}

...

The code added above creates a strategy called 'salesperson_strategy' for identifying entities and applies it to the source 'salespeople'. 
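
The effect of the keywords and parent_keywords lists can be pictured with a short Python sketch. It assumes that each parent_keywords entry reads as "field on the parent entity -> keyword name on the document", which matches the goal of tagging documents with the manager's name and position. The function build_keywords and the dict-of-lists result are illustrative assumptions, not KEE's implementation.

```python
# The two lists from "salesperson_strategy" in config.json.
STRATEGY = {
    "keywords": ["name", "position"],
    "parent_keywords": ["name -> manager name", "position -> manager position"],
}

# A matched entity and its parent, taken from the sample rows of salespeople.csv.
ADRIAN = {"name": "Adrian Fox", "position": "District Representative", "manager": "Jane Doe"}
JANE = {"name": "Jane Doe", "position": "District Manager", "manager": "David Cole"}


def build_keywords(entity, parent, strategy):
    """Build the tags for one matched entity (hypothetical helper)."""
    # Fields of the matched entity are copied under their own names.
    tags = {field: [entity[field]] for field in strategy["keywords"]}
    if parent is not None:
        for mapping in strategy["parent_keywords"]:
            # "name -> manager name": read "name" from the parent and
            # store it on the document under the keyword "manager name".
            source_field, keyword_name = (part.strip() for part in mapping.split("->"))
            tags[keyword_name] = [parent[source_field]]
    return tags


print(build_keywords(ADRIAN, JANE, STRATEGY))
```

Under these assumptions, a document that mentions Adrian Fox would carry four tags: his name and position, plus the name and position of Jane Doe as his manager.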

...

In order for the kee tool to be able to create fixtures from a Squirro project, the squirro section must be added to the config.json file, as shown below:

config.json
{
    "squirro": {
        "cluster": "http://www.example.com/squirro/",
        "token": "abcabcexampletoken123123123",
        "project_id": "example_project"
    },

    "sources": {
        "salespeople": {
...

This provides the kee tool with everything that it needs to authenticate with the Squirro server and download individual items to create fixtures. Please see Connecting to Squirro for more information on how to obtain these values.
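
Before running the kee tool, it can be handy to confirm that the squirro section is complete. The Python sketch below checks for the three keys shown in the example above; it is a hypothetical pre-flight check, not a feature of the kee tool, and treating exactly these three keys as required is an assumption based on that example.

```python
import json

# Keys that appear in the "squirro" section of the example config.json.
REQUIRED_SQUIRRO_KEYS = ("cluster", "token", "project_id")


def missing_squirro_keys(config):
    """Return the expected keys that are absent or empty (hypothetical helper)."""
    section = config.get("squirro", {})
    return [key for key in REQUIRED_SQUIRRO_KEYS if not section.get(key)]


# Inline copy of the example section; a real check would load config.json from disk.
config = json.loads("""
{
    "squirro": {
        "cluster": "http://www.example.com/squirro/",
        "token": "abcabcexampletoken123123123",
        "project_id": "example_project"
    }
}
""")

missing = missing_squirro_keys(config)
print("Missing keys:", missing if missing else "none")
```

Because json.loads rejects malformed input, the same check also catches syntax slips such as a missing or trailing comma in the file.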

...

The above command creates a fixture for each of the two Squirro items indicated by the unique IDs in the list. The unique ID of a Squirro item can be found at the end of the URL that appears in the browser when the item is selected on the search page (see screenshot below).

...

After the KEE project is uploaded to a Squirro server, the KEE will be available as an enrichment under the Enrich tab of the Squirro frontend. Each uploaded KEE project requires a unique name which can be customized within the kee section of the config.json file, as shown below.

config.json
{
    "kee": {
        "pipelet": "Salesperson Extraction"
    },

    "squirro": {
...

...

In this example, the KEE project will be available as a pipelet with the title "Salesperson Extraction" once it is uploaded.

Download this Example KEE Project

...

The full source for this example is available for download: tutorial.zip.