The catalyst data model provides a sub-item model so that, when significant events are detected, we can show exactly which sentence or phrase in a document triggered the catalyst and build detailed relationships across documents.

Individual entities are stored in the item's entities field.

Definitions / Vocabulary

...

A mapping between a Query and a set of Actions.

...

Models

...

Example
[{
    "id": "1234",  # unique entity id
    "item_id": "123456",  # reference to original item id
    "type": "company",  # type of the entity, e.g. company
    "name": "Thomson Reuters",
    "confidence": 0.8,  # aggregated confidence of all extracts [0-1]
    "relevance": 0.9,  # relevance of this entity for the item [0-1]
    "extracts": [{
        "text": "Thomson Reuters",  # original representation
        "field": "title",  # on which Item field can this extract be found
        "confidence": 0.9,  # confidence level [0-1]
        "offset": 14,  # start offset of text within original item
        "length": 15  # length of text within original item
    }, {
        "text": "TR",  # original representation
        "field": "body",  # on which Item field can this extract be found
        "confidence": 0.1,  # confidence level [0-1]
        "offset": 0,  # start offset of text within original item
        "length": 2  # length of text within original item
    }],
    "properties": {
        "stock_symbol": "TR",  # value based property
        "parent_company_ref": "<id of company type entity>"  # reference based property
    }
}, {
    "id": "1237",  # unique entity id
    "item_id": "123456",  # original item id
    "type": "deal",  # type of the entity, e.g. deal
    "name": "Thomson Reuters bought Squirro for 1Mio in the US.",
    "confidence": 0.3,  # confidence level of this entity [0-1]
    "extracts": [{
        "text": "Thomson Reuters bought Squirro for 1Mio in the US.",  # original representation
        "field": "body",  # on which Item field can this extract be found
        "confidence": 0.3,  # confidence level [0-1]
        "offset": 114,  # start offset of text within original item
        "length": 50  # length of text within original item
    }],
    "properties": {  # variable set of keys depending on the entity type
        "region_ref": "<entity_id_1_of_type_geo>",
        "size": 1000000,
        "industry": null,
        "acquirer": "<entity_id_3_of_type_company>",
        "target": "<entity_id_3_of_type_company>"
    }
},
...
]
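As a rough illustration of how an extract can point back at the triggering phrase, the following Python sketch slices the referenced item field using offset and length. It assumes that offset is counted in characters from the start of the field named in "field"; the item title and the helper name extract_text are made up for this example and are not part of the data model.

```python
def extract_text(item, extract):
    """Return the span of the item field that an extract points at.

    Assumption: "offset" is a character offset into the field named by "field".
    """
    field_value = item[extract["field"]]
    start = extract["offset"]
    return field_value[start:start + extract["length"]]


# Hypothetical item; only the extract keys come from the example above.
item = {"title": "News roundup: Thomson Reuters buys Squirro"}
extract = {"text": "Thomson Reuters", "field": "title", "offset": 14, "length": 15}

snippet = extract_text(item, extract)
print(snippet)
```

Comparing the recovered snippet with the stored "text" value is a cheap consistency check for offset and length.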

Note: Properties come in two types: string (the default) or numeric. Numeric values, e.g. of type float or int, are indexed in the 'numeric_properties' field in Elasticsearch and mapped back to 'properties' before being returned. This allows, for example, proper number comparisons and range queries. Unlike for keywords, we do not maintain a DB to keep track of property types; the type is inferred solely from the submitted value.
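The type inference described in the note could look roughly like the sketch below. This is not the actual indexing code: the function names split_properties and merge_properties are hypothetical, and treating booleans and null values as non-numeric is an assumption.

```python
def split_properties(properties):
    """Split properties into (string-like, numeric) dicts by inspecting each value."""
    string_props, numeric_props = {}, {}
    for key, value in properties.items():
        # bool is a subclass of int in Python, so it is excluded explicitly (assumption).
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            numeric_props[key] = value
        else:
            string_props[key] = value
    return string_props, numeric_props


def merge_properties(string_props, numeric_props):
    """Map numeric values back into a single 'properties' dict before returning."""
    return {**string_props, **numeric_props}


props = {"stock_symbol": "TR", "size": 1000000, "industry": None}
strings, numbers = split_properties(props)
print(strings, numbers)
```

Because the type is inferred per value, submitting "100" as a string and 100 as a number for the same key would land in different fields under this sketch.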

Query Syntax

...

entity:{< any query to match a single entity document >}
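When such queries are assembled programmatically, a small helper keeps the braces and quoting consistent. The following is a minimal sketch; entity_query and quote are hypothetical helpers, not part of any Squirro client, and only equality clauses joined with AND are covered. Values containing a double quote are not escaped here.

```python
def quote(value):
    """Wrap values containing spaces in double quotes, as in name:"Thomson Reuters"."""
    text = str(value)
    return f'"{text}"' if " " in text else text


def entity_query(**terms):
    """Build one entity:{...} block from field=value pairs joined with AND."""
    clauses = [f"{field}:{quote(value)}" for field, value in terms.items()]
    return "entity:{" + " AND ".join(clauses) + "}"


query = entity_query(type="company", name="Thomson Reuters")
print(query)
```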

...

Examples

Search for Items containing a specific Entity of type company:

entity:{type:company AND name:"Thomson Reuters"}

Search for Items containing at least one company-typed Entity "Thomson Reuters" and another company-typed Entity "Squirro":

entity:{type:company AND name:"Thomson Reuters"} AND entity:{type:company AND name:Squirro}
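Since each entity:{...} block matches a single entity document, all clauses inside one block must hold for the same entity, while separate blocks may be satisfied by different entities. The sketch below mimics that behaviour on a plain Python dict. It is only a local illustration with a hypothetical has_entity helper, not a description of how the query is evaluated on the server.

```python
def has_entity(item, **conditions):
    """True if at least one entity of the item satisfies all conditions at once."""
    return any(
        all(entity.get(key) == value for key, value in conditions.items())
        for entity in item.get("entities", [])
    )


item = {
    "entities": [
        {"type": "company", "name": "Thomson Reuters"},
        {"type": "company", "name": "Squirro"},
    ]
}

# Two separate blocks: each may be satisfied by a different entity.
both = (
    has_entity(item, type="company", name="Thomson Reuters")
    and has_entity(item, type="company", name="Squirro")
)

# One block: both conditions must hold for the same entity, so this does not match.
deal_named_squirro = has_entity(item, type="deal", name="Squirro")
print(both, deal_named_squirro)
```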

Search for Items containing a specific Entity of type company with a confidence higher than 80%:

entity:{type:company AND name:"Thomson Reuters" AND confidence > 0.8}

Search for Items containing any Entity of type company with a confidence of at least 70%:

entity:{type:company AND NOT confidence < 0.7}

Search for Items containing an Entity of type company with a confidence lower than 20%:

entity:{type:company AND confidence < 0.2}

Search for Items containing any Entity of type deal with a confidence higher than 70%:

entity:{type:deal AND confidence > 0.7}

Search for Items containing a specific Entity of type deal:

entity:{type:deal AND properties.size:100 AND properties.region:US AND properties.industry:Tech AND properties.target:Whatsapp AND properties.acquirer:Facebook}

Search for Items containing one Tech deal Entity with target Squirro and another Tech deal Entity with target Whatsapp:

entity:{type:deal AND properties.target:Squirro AND properties.industry:Tech} AND entity:{type:deal AND properties.target:Whatsapp AND properties.industry:Tech}

Search for Items containing an Entity of type deal with a size property greater than 100:

entity:{type:deal AND properties.size > 100}
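A range query like this only behaves as expected when the value is compared as a number. The short Python snippet below shows the general pitfall of comparing numeric text as strings; it illustrates the motivation for indexing numeric properties separately and says nothing about the index internals.

```python
# Compared as strings, "9" sorts after "100" because "9" > "1" character by character.
string_comparison = "9" > "100"

# Compared as numbers, 9 is not greater than 100.
numeric_comparison = 9 > 100

print(string_comparison, numeric_comparison)
```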

This page can now be found at Catalyst Data Model on the Squirro Docs site.