The catalyst data model provides a sub-item model. When a significant event is detected, this lets us show exactly which sentence or phrase in a document triggered the catalyst, and it lets us build detailed relationships across documents.

...

Definitions / Vocabulary


Document: Original data as provided by the customer.
Item: A modified version of a Document as stored within Squirro.
Facet: Metadata assigned to an Item in the form of a key/[list of values] pair. Stored as attribute keywords in the Item.
Extract: A single occurrence of a detected Entity within one Item. Keeps track of the location and the original text of the detection.
Entity: A real-world or higher-level object of a pre-defined type, such as persons, locations, organizations, products, or events, that can be denoted with a proper name. An Entity has a list of Extracts with all its appearances within one Item. Optionally, it can maintain a list of instantiations of properties. Properties are pre-defined per Entity type and are either simple values or references to other Entities.
Catalyst: A mapping between a Query and a set of Actions.
Query: A string conforming to our query syntax. The query syntax is extended to allow searching for Entities. See Query Syntax below.
Action: Some action executed based on a Catalyst match, e.g. send an email or call a callback.
Entity Profile: Pre-computed model for each value of an Entity. Used for ranking Recommendations.
Recommendation: Ranked result list of Entities based on a Query (potentially containing Entities).
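
To make the Catalyst definition concrete, here is a minimal Python sketch of a Catalyst as a mapping from a Query to a set of Actions. The names `Catalyst`, `Action`, and `fire` are hypothetical and are not part of any Squirro API. Query matching is assumed to happen elsewhere, so the query string is only stored, not parsed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical type: an Action is any callable that receives the matching Item.
Action = Callable[[Dict], None]


@dataclass
class Catalyst:
    query: str  # a string in the query syntax; stored only, not parsed here
    actions: List[Action] = field(default_factory=list)


def fire(catalyst: Catalyst, item: Dict) -> int:
    """Run every Action of a Catalyst for an Item that already matched its Query.

    Returns the number of Actions executed.
    """
    for action in catalyst.actions:
        action(item)
    return len(catalyst.actions)


# Usage: a callback Action that records the id of the matching Item.
notified = []
catalyst = Catalyst(
    query="<query string>",  # placeholder, not real query syntax
    actions=[lambda item: notified.append(item["id"])],
)
fire(catalyst, {"id": "123456"})
```

Keeping Actions as plain callables means "send an email" and "call callback" can share one interface in this sketch.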



Models

Item: see Item Model
Facet: see Facets API
Entity


Example:
[{
    "id": "1234",  // unique entity id
    "item_id": "123456",  // reference to the original item id
    "type": "company",  // type of the entity, e.g. company
    "name": "Thomson Reuters",
    "confidence": 0.8,  // aggregated confidence of all extracts [0-1]
    "relevance": 0.9,  // relevance of this entity for the item [0-1]
    "extracts": [{
        "text": "Thomson Reuters",  // original representation
        "field": "title",  // Item field in which this extract can be found
        "confidence": 0.9,  // confidence level [0-1]
        "offset": 14,  // start offset of text within original item
        "length": 15  // length of text within original item
    }, {
        "text": "TR",  // original representation
        "field": "body",  // Item field in which this extract can be found
        "confidence": 0.1,  // confidence level [0-1]
        "offset": 0,  // start offset of text within original item
        "length": 2  // length of text within original item
    }],
    "properties": {
        "stock_symbol": "TR",  // value-based property
        "parent_company_ref": "<id of company type entity>"  // reference-based property
    }
}, {
    "id": "1237",  // unique entity id
    "item_id": "123456",  // original item id
    "type": "deal",  // type of the entity, e.g. deal
    "name": "Thomson Reuters bought Squirro for 1Mio in the US.",
    "confidence": 0.3,  // confidence level of this entity [0-1]
    "extracts": [{
        "text": "Thomson Reuters bought Squirro for 1Mio in the US.",  // original representation
        "field": "body",  // Item field in which this extract can be found
        "confidence": 0.3,  // confidence level [0-1]
        "offset": 114,  // start offset of text within original item
        "length": 50  // length of text within original item
    }],
    "properties": {  // variable set of keys depending on the entity type
        "region_ref": "<entity_id_1_of_type_geo>",
        "size": 1000000,
        "industry": null,
        "acquirer": "<entity_id_3_of_type_company>",
        "target": "<entity_id_3_of_type_company>"
    }
},
...
]
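
The `field`, `offset`, and `length` values of an Extract are what allow the triggering phrase to be located in an Item. The following Python sketch illustrates this under two assumptions that the model above does not confirm: offsets are zero-based character offsets, and they are relative to the Item field named in `field`. The function names and the sample Item are hypothetical.

```python
def extract_span(item: dict, extract: dict) -> str:
    """Return the slice of the Item field that an Extract points at."""
    text = item[extract["field"]]
    start = extract["offset"]  # assumed zero-based and relative to the field
    return text[start:start + extract["length"]]


def is_consistent(item: dict, extract: dict) -> bool:
    """Check that the stored text equals the slice found at offset/length."""
    return extract_span(item, extract) == extract["text"]


# Hypothetical Item, made up for this sketch.
item = {
    "title": "Acquisition - Thomson Reuters buys Squirro",
    "body": "TR announced the deal today.",
}
title_extract = {"text": "Thomson Reuters", "field": "title", "offset": 14, "length": 15}
body_extract = {"text": "TR", "field": "body", "offset": 0, "length": 2}

print(extract_span(item, title_extract))
print(is_consistent(item, body_extract))
```

A check like `is_consistent` can catch Extracts whose offsets drifted, for example after the Item text was modified.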



Note: Properties come in two types: string (the default) or numeric. Numeric properties, e.g. of type float or int, are indexed in a 'numeric_properties' field in Elasticsearch and mapped back to 'properties' before being returned. This allows, for example, proper number comparisons and range queries. Unlike for keywords, we do not maintain a DB to keep track of property types; the type is inferred only from the submitted value.
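
The type inference described in the note can be sketched in Python as follows. This is an illustration only, not the actual indexing code: the function names are hypothetical, and treating booleans as non-numeric is an assumption made for this sketch.

```python
def split_properties(properties: dict) -> tuple:
    """Split properties into (plain, numeric) by inferring the type from each value."""
    plain, numeric = {}, {}
    for key, value in properties.items():
        # bool is a subclass of int in Python; it is kept out of the
        # numeric group here (an assumption, not confirmed by the source).
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            numeric[key] = value
        else:
            plain[key] = value
    return plain, numeric


def merge_properties(plain: dict, numeric: dict) -> dict:
    """Map the numeric properties back into one 'properties' dict."""
    merged = dict(plain)
    merged.update(numeric)
    return merged


properties = {"stock_symbol": "TR", "size": 1000000, "industry": None}
plain, numeric = split_properties(properties)
document = {"properties": plain, "numeric_properties": numeric}  # shape as indexed
restored = merge_properties(document["properties"], document["numeric_properties"])
```

Because the type is inferred per value, submitting the same property once as `"10"` and once as `10` would place it in different fields in this sketch.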

...