
The catalyst data model provides a sub-item model so that, when significant events are detected, we can show exactly which sentence or phrase in a document triggered the catalyst and build detailed relationships across documents.

Individual entities are stored in the entities item field.

Definitions / Vocabulary

Document: original data as provided by the customer.
Item: a modified version of a Document as stored within Squirro.
Facet: metadata assigned to an Item in the form of a key/[list of values] pair. Stored as attribute keywords in the Item.
Extract: a single occurrence of a detected Entity within one Item. Keeps track of the location and the original text of the detection.
Entity: a real-world or higher-level object of a pre-defined type, such as persons, locations, organizations, products, events etc., that can be denoted with a proper name. An Entity has a list of Extracts with all its appearances within one Item. Optionally it can maintain a list of instantiations of properties. Properties are pre-defined per Entity type and are simple values or references to other Entities.
Catalyst: a mapping between a Query and a set of Actions.
Query: a string conforming to our query syntax. The query syntax is extended to allow searching for Entities. See Query Syntax below.
Action: an action executed when a Catalyst matches, e.g. sending an email or calling a callback.
Entity Profile: a pre-computed model for each value of an Entity. Used for ranking Recommendations.
Recommendation: a ranked result list of Entities based on a Query (potentially containing Entities).
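
The relationship between Entity and Extract described above can be pictured with two plain data structures. This is a minimal sketch for illustration only, not Squirro's implementation: the class layout, the `PropertyValue` alias and the `extracts_in` helper are assumptions, and the sample values are made up.

```python
import dataclasses
from dataclasses import dataclass
from typing import Dict, List, Optional, Union

# Assumption: a property is a simple value or an Entity id given as a string.
PropertyValue = Union[str, int, float, None]


@dataclass
class Extract:
    """A single occurrence of a detected Entity within one Item."""
    text: str          # original text of the detection
    field: str         # Item field in which the text was found
    offset: int        # start position of the text
    length: int        # number of characters
    confidence: float  # confidence level in [0, 1]


@dataclass
class Entity:
    """A typed, named object together with its Extracts within one Item."""
    id: str
    item_id: str
    type: str
    name: str
    confidence: float
    relevance: Optional[float] = None
    extracts: List[Extract] = dataclasses.field(default_factory=list)
    properties: Dict[str, PropertyValue] = dataclasses.field(default_factory=dict)

    def extracts_in(self, field_name: str) -> List[Extract]:
        """Hypothetical helper: all Extracts found in one Item field."""
        return [e for e in self.extracts if e.field == field_name]


entity = Entity(
    id="1234",
    item_id="123456",
    type="company",
    name="Thomson Reuters",
    confidence=0.8,
    extracts=[
        Extract("Thomson Reuters", "title", 14, 15, 0.9),
        Extract("TR", "body", 0, 2, 0.1),
    ],
    properties={"stock_symbol": "TR"},
)
print([e.text for e in entity.extracts_in("body")])
```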

Models

Item: see Item Model
Facet: see Facets API
Entity
 Example
[{
    "id": "1234",  # unique entity id
    "item_id": "123456",  # reference to original item id
    "type": "company",  # type of the entity, e.g. company
    "name": "Thomson Reuters",
    "confidence": 0.8,  # aggregated confidence of all extracts [0-1]
    "relevance": 0.9,  # relevance of this entity for the item [0-1]
    "extracts": [{
        "text": "Thomson Reuters",  # original representation
        "field": "title",  # Item field in which this extract can be found
        "confidence": 0.9,  # confidence level [0-1]
        "offset": 14,  # start offset of text within original item
        "length": 15  # length of text within original item
    }, {
        "text": "TR",  # original representation
        "field": "body",  # Item field in which this extract can be found
        "confidence": 0.1,  # confidence level [0-1]
        "offset": 0,  # start offset of text within original item
        "length": 2  # length of text within original item
    }],
    "properties": {
        "stock_symbol": "TR",  # value based property
        "parent_company_ref": "<id of company type entity>"  # reference based property
    }
}, {
    "id": "1237",  # unique entity id
    "item_id": "123456",  # original item id
    "type": "deal",  # type of the entity, e.g. deal
    "name": "Thomson Reuters bought Squirro for 1Mio in the US.",
    "confidence": 0.3,  # confidence level of this entity [0-1]
    "extracts": [{
        "text": "Thomson Reuters bought Squirro for 1Mio in the US.",  # original representation
        "field": "body",  # Item field in which this extract can be found
        "confidence": 0.3,  # confidence level [0-1]
        "offset": 114,  # start offset of text within original item
        "length": 50  # length of text within original item
    }],
    "properties": {  # variable set of keys depending on the entity type
        "region_ref": "<entity_id_1_of_type_geo>",
        "size": 1000000,
        "industry": null,
        "acquirer": "<entity_id_2_of_type_company>",
        "target": "<entity_id_3_of_type_company>"
    }
},
...
]
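
The `field`, `offset` and `length` values of an extract are what make it possible to highlight the triggering phrase. The sketch below shows one way to resolve an extract back to its text. It assumes that `offset` is a zero-based character offset into the named Item field; the sample Item and the helper names `extract_text` and `extract_is_consistent` are hypothetical.

```python
def extract_text(item: dict, extract: dict) -> str:
    """Return the slice of the Item field that the extract points at."""
    value = item[extract["field"]]
    start = extract["offset"]
    return value[start:start + extract["length"]]


def extract_is_consistent(item: dict, extract: dict) -> bool:
    """Check that the stored text matches what offset/length select."""
    return extract_text(item, extract) == extract["text"]


# Hypothetical Item; only the fields referenced by the extracts matter here.
item = {
    "id": "123456",
    "title": "Market update Thomson Reuters announces deal",
    "body": "TR confirmed the acquisition today.",
}

title_extract = {"text": "Thomson Reuters", "field": "title",
                 "offset": 14, "length": 15}
body_extract = {"text": "TR", "field": "body", "offset": 0, "length": 2}

print(extract_text(item, title_extract))
print(extract_is_consistent(item, body_extract))
```

A check like `extract_is_consistent` is a cheap way to catch offsets that were computed against a different version of the text than the one stored on the Item.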

Note: Properties come in two types: string (default) or numeric. Numeric properties, e.g. of type float or int, are indexed in a 'numeric_properties' field in Elasticsearch and mapped back to 'properties' before being returned. This allows, for example, proper number comparisons and range queries. Unlike for keywords, we do not maintain a DB to keep track of property types; the type is inferred only from the submitted value.
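
The type inference described in the note can be sketched as a split before indexing and a merge before returning. This is an illustration under assumptions, not the actual indexing code: the function names are hypothetical, and treating booleans as non-numeric is a choice made here, not something the note specifies.

```python
def split_properties(properties: dict):
    """Split properties into (string_props, numeric_props) by value type."""
    string_props, numeric_props = {}, {}
    for key, value in properties.items():
        # bool is a subclass of int in Python; keep it out of the numeric set.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            numeric_props[key] = value
        else:
            string_props[key] = value
    return string_props, numeric_props


def merge_properties(string_props: dict, numeric_props: dict) -> dict:
    """Map numeric properties back into a single 'properties' dict."""
    merged = dict(string_props)
    merged.update(numeric_props)
    return merged


submitted = {"stock_symbol": "TR", "size": 1000000, "industry": None}
strings, numbers = split_properties(submitted)
print(strings)
print(numbers)
print(merge_properties(strings, numbers) == submitted)
```

Because the type is inferred per submitted value, the same property key could end up as a string on one entity and as a number on another if the submitting code is inconsistent.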


Query Syntax

Entities
entity:{< any query to match a single entity document >}
 Examples
  • Search for Items containing a specific Entity of type company:

    entity:{type:company AND name:"Thomson Reuters"}
  • Search for Items containing at least one company-typed Entity "Thomson Reuters" and another company-typed Entity "Squirro":

    entity:{type:company AND name:"Thomson Reuters"} AND entity:{type:company AND name:Squirro}
    
  • Search for Items containing a specific Entity of type company with a confidence higher than 80%:

    entity:{type:company AND name:"Thomson Reuters" AND confidence > 0.8}
  • Search for Items containing any Entity of type company with a confidence of at least 70%:

    entity:{type:company AND NOT confidence < 0.7}
  • Search for Items containing any Entity of type company with a confidence lower than 20%:

    entity:{type:company AND confidence < 0.2}
    
  • Search for Items containing any Entity of type deal with a confidence higher than 70%:

    entity:{type:deal AND confidence > 0.7}
  • Search for Items containing a specific Entity of type deal:

    entity:{type:deal AND properties.size:100 AND properties.region:US AND properties.industry:Tech AND properties.target:Whatsapp AND properties.acquirer:Facebook}
    
  • Search for Items containing one deal Entity with target Squirro and another deal Entity with target Whatsapp, both in the Tech industry:

    entity:{type:deal AND properties.target:Squirro AND properties.industry:Tech} AND entity:{type:deal AND properties.target:Whatsapp AND properties.industry:Tech}
  • Search for Items containing an Entity of type deal with a property size bigger than 100:

    entity:{type:deal AND properties.size > 100}
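
Queries like the ones above are plain strings, so they can be assembled programmatically. The helper below is a small sketch that only builds the string forms shown in the examples; `quote`, `entity_clause` and `all_of` are hypothetical names and not part of any Squirro client library.

```python
def quote(value: str) -> str:
    """Wrap a value in double quotes if it contains whitespace."""
    return f'"{value}"' if any(ch.isspace() for ch in value) else value


def entity_clause(entity_type, name=None, confidence_above=None):
    """Build one entity:{...} clause from the given constraints."""
    parts = [f"type:{entity_type}"]
    if name is not None:
        parts.append(f"name:{quote(name)}")
    if confidence_above is not None:
        parts.append(f"confidence > {confidence_above}")
    return "entity:{" + " AND ".join(parts) + "}"


def all_of(*clauses: str) -> str:
    """Require every clause to match the same Item."""
    return " AND ".join(clauses)


print(entity_clause("company", name="Thomson Reuters", confidence_above=0.8))
print(all_of(entity_clause("company", name="Thomson Reuters"),
             entity_clause("company", name="Squirro")))
```

Each `entity:{...}` clause is matched against a single Entity, which is why requiring two different Entities on the same Item takes two clauses joined with AND rather than one clause with both names.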

