Excerpt |
---|
This document explains how to configure Known Entity Extraction (KEE) through the Known Entity Extraction studio plugin, which is available in the user interface under the AI STUDIO tab in the Setup space. See this tutorial for a guided setup of KEE using the studio plugin. |
...
Known Entity Extraction is the Squirro technology for enriching unstructured data by linking it to structured information. Examples include identifying company names or products in indexed Squirro data.
Configuration
To set up a KEE, open the Setup space in your Squirro project, and navigate to AI STUDIO → Known Entity Extraction. Press the plus button in the top right corner to configure a new KEE.
You can specify the following configuration options. Only the keys marked with a star (*) are mandatory. The KEE config.json name in the table refers to the internal config key as documented in the KEE Config Reference.
Configuration | Description | KEE config.json name |
---|---|---|
Name * | The name of the KEE enrichment. It must be unique across the entire server; an existing enrichment with the same name is overwritten. | |
KEE data * | CSV or Excel file containing the structured information. The first row must contain column headers. The columns are referred to as fields in the configuration options below. The file must not contain duplicate rows. Example: ... | |
ID field | Field used as the unique ID of each record. IDs are auto-generated when this is left empty. | |
Matching fields | Fields from the input KEE data on which matching is performed. Typically the name field, for example the field holding the company name. | |
Keywords to assign | Fields for which you want to assign keywords (facets) and tag matched items. Provide each field on a separate line. Use the arrow (... For example: ... This keywords configuration will assign the ... Note: the keyword is created automatically if it does not yet exist. | |
Minimum score for matches | The minimum score at which a match is considered. Can be any value between 0 and 1, such as ... | |
Enable fuzzy matching | Allows small spelling mistakes. This permits at most one letter swap, so, for example, "Apple" and "Appel" match each other. | |
Enable company suffix list | Applies a company-specific suffix list, which removes common company suffixes when matching company names. | |
Enable ngram database | Enables a default ngram database to improve matching precision for common English terms. | (The ngram name is always ...) |
Config (JSON) | JSON dictionary to customize configuration values. See example below. | |
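
The requirements on the KEE data file (a header row, no duplicate rows, unique IDs when an ID field is set) can be checked locally before uploading. The following is a minimal sketch for CSV input only. The function name `check_kee_rows` and the sample columns `id`, `name`, and `ticker` are hypothetical and are not part of Squirro.

```python
import csv
import io


def check_kee_rows(csv_text, id_field=None):
    """Return a list of problems found in KEE input data given as CSV text.

    Illustrative helper only, not a Squirro API.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = reader.fieldnames
    # The first row must provide a header for every column.
    if not headers or any(not (name or "").strip() for name in headers):
        return ["first row must contain a header for every column"]
    if id_field is not None and id_field not in headers:
        return [f"ID field {id_field!r} is not a column header"]

    problems = []
    seen_rows = set()
    seen_ids = set()
    for record_no, row in enumerate(reader, start=1):
        # Compare whole records after trimming whitespace.
        key = tuple((row[name] or "").strip() for name in headers)
        if key in seen_rows:
            problems.append(f"record {record_no}: duplicate row")
        seen_rows.add(key)
        if id_field is not None:
            value = (row[id_field] or "").strip()
            if value in seen_ids:
                problems.append(f"record {record_no}: duplicate ID {value!r}")
            seen_ids.add(value)
    return problems


sample = "id,name,ticker\n1,Apple Inc.,AAPL\n2,Squirro AG,\n1,Apple Inc.,AAPL\n"
for problem in check_kee_rows(sample, id_field="id"):
    print(problem)
```

An empty result list means the file passed these three checks. It does not guarantee that the upload will be accepted.
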
...
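
To make the fuzzy matching and minimum score options more concrete, the sketch below illustrates the two rules as they are described in the table. It is not Squirro's implementation. It assumes that "one letter swap" means exchanging two adjacent letters, that comparison is case-sensitive, and that a match is kept when its score is greater than or equal to the minimum. The function names and sample scores are hypothetical.

```python
def within_one_swap(a, b):
    """True if a and b are equal or differ by one swap of two adjacent letters."""
    if a == b:
        return True
    if len(a) != len(b):
        return False
    # Positions at which the two strings disagree.
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return (
        len(diff) == 2
        and diff[1] == diff[0] + 1
        and a[diff[0]] == b[diff[1]]
        and a[diff[1]] == b[diff[0]]
    )


def keep_matches(scored, min_score):
    """Keep (entity, score) pairs whose score reaches min_score (assumed inclusive)."""
    if not 0 <= min_score <= 1:
        raise ValueError("min_score must be between 0 and 1")
    return [(entity, score) for entity, score in scored if score >= min_score]


print(within_one_swap("Apple", "Appel"))   # True: "l" and "e" are swapped
print(within_one_swap("Apple", "Apply"))   # False: a substitution, not a swap
print(keep_matches([("Apple", 0.95), ("Appel", 0.6)], min_score=0.8))
```

Raising the minimum score trades recall for precision: fewer items are tagged, but the tags that remain are more likely to be correct.
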