The Ground Truth in the AI Studio is the data set that is later used to train a Model.
Structure
The Ground Truth consists of up to three components, each with several properties:
Metadata
id: Unique identifier of the Ground Truth
name: Name of the Ground Truth
description: Description of the Ground Truth
type: Type of the Ground Truth (accepted values: text, text+proximity)
tagging_level: Level on which the extracts get tagged (accepted values: sentence, document)
label: List of labels which can be used for tagging
candidateset_ids: List of Candidate Set ids which are used to generate the Ground Truth
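The accepted values listed above can be checked on the client side before a new Ground Truth is created. The following is a minimal sketch, not part of the Squirro Client: the helper name `check_groundtruth_config` and the two constants are hypothetical, and the check assumes that `label` must be a list of strings.

```python
# Accepted values as listed in the Metadata section above.
ACCEPTED_TYPES = {"text", "text+proximity"}
ACCEPTED_TAGGING_LEVELS = {"sentence", "document"}


def check_groundtruth_config(config):
    """Return a list of problems found in a config for a new Ground Truth.

    Hypothetical helper; an empty list means no problem was found.
    """
    problems = []
    if config.get("type") not in ACCEPTED_TYPES:
        problems.append(f"type must be one of {sorted(ACCEPTED_TYPES)}")
    if config.get("tagging_level") not in ACCEPTED_TAGGING_LEVELS:
        problems.append(
            f"tagging_level must be one of {sorted(ACCEPTED_TAGGING_LEVELS)}"
        )
    labels = config.get("label")
    # Assumption: labels are plain strings.
    if not isinstance(labels, list) or not all(isinstance(x, str) for x in labels):
        problems.append("label must be a list of strings")
    return problems
```

Running such a check before the Ground Truth is created catches values such as a misspelled tagging level early, with a readable message.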
Labeled extracts
id: Unique identifier of the labeled extract
extract: A section of text (e.g. a sentence if tagging_level is set to ‘sentence’)
label: Label which classifies the extract
language: Language of the Squirro Item
keywords: Additional keywords from the Squirro Item
item_id: Id of the Squirro item in which the extract was found
candidateset_id: Id of the Candidate Set which helped to find the extract
In addition there is a temporal versioning component in place with the fields:
user: The user who made the change
validity: Indicates whether a label is positive (considered true) or negative (considered false)
created_at: Time of creation
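To illustrate the versioning fields, the sketch below models a version history as a list of records and treats the most recent one as current. The `LabelVersion` class and the `current_validity` helper are hypothetical, and the "latest record wins" rule is an assumption; how the AI Studio resolves versions internally is not described here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LabelVersion:
    """One entry in the version history of a labeled extract (hypothetical)."""
    user: str
    validity: str  # "positive" or "negative"
    created_at: datetime


def current_validity(history):
    """Return the validity of the most recent version, or None if empty.

    Assumption: the version with the latest created_at is the current one.
    """
    if not history:
        return None
    return max(history, key=lambda version: version.created_at).validity


history = [
    LabelVersion("alice", "positive", datetime(2021, 1, 1)),
    LabelVersion("bob", "negative", datetime(2021, 2, 1)),
]
print(current_validity(history))
```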
Rules (only if the Ground Truth is of type ‘text+proximity’)
id: Unique identifier of the rule
query: Query text of the rule
proximity: Allowed distance between the words within the query
is_sequence: Boolean
type: Type of the rule (inclusive, exclusive)
labeled_item_id: Id of the labeled item which is connected to the rule
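The sketch below shows one plausible reading of how `query`, `proximity` and `is_sequence` interact. It is an illustration only: the function `matches_rule` is hypothetical, distance is assumed to be the difference in token positions between the first and last matched word, and `is_sequence` is assumed to require the words in query order. The matching actually performed by the AI Studio may differ.

```python
import itertools
import re


def matches_rule(text, query, proximity, is_sequence):
    """Check whether all query words occur close together in the text.

    Hypothetical interpretation of a rule, not the AI Studio implementation.
    """
    tokens = re.findall(r"\w+", text.lower())
    words = re.findall(r"\w+", query.lower())
    if not words:
        return False
    # Token positions at which each query word occurs.
    positions = [[i for i, t in enumerate(tokens) if t == w] for w in words]
    if any(not p for p in positions):
        return False
    # Brute force over one position per query word; fine for short extracts.
    for combo in itertools.product(*positions):
        if len(set(combo)) != len(combo):
            continue  # one token cannot match two query words
        if max(combo) - min(combo) > proximity:
            continue  # words are too far apart
        if is_sequence and list(combo) != sorted(combo):
            continue  # words are not in query order
        return True
    return False


print(matches_rule("She works as a dog sitter", "dog sitter", 6, True))
print(matches_rule("The sitter fed the dog", "dog sitter", 6, True))
```

Under these assumptions the first call matches, while the second does not because the words appear in reverse order.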
Usage via Squirro Client
Ground Truth
# `client` is assumed to be an authenticated SquirroClient instance.

# Create a new Ground Truth
config = {
    "type": "text",
    "tagging_level": "sentence",
    "label": ["dog", "no dog"],
    "description": "In this Ground Truth we select sentences that are dog or not dog related.",
    "candidateset_ids": [CANDIDATE_SET_ID],
}
client.new_groundtruth(PROJECT_ID, 'Dog Ground Truth', config)

# Modify an existing Ground Truth
config = {
    "type": "text",
    # the tagging_level cannot be changed
    "label": ["dog", "no dog"],
    "description": "In this Ground Truth we select sentences that are dog or not dog related.",
    "candidateset_ids": [CANDIDATE_SET_ID],
}
client.modify_groundtruth(PROJECT_ID, GROUNDTRUTH_ID, name='Dog Ground Truth (modified name)', config=config)

# Delete a Ground Truth
client.delete_groundtruth(PROJECT_ID, GROUNDTRUTH_ID)

# List all Ground Truths of a project
client.get_groundtruths(PROJECT_ID)

# Get a single Ground Truth
client.get_groundtruth(PROJECT_ID, GROUNDTRUTH_ID)
Labeled Extract
# Create a new labeled extract
label = {
    "item_id": SQUIRRO_ITEM_ID,
    "extract": "The dog (Canis familiaris when considered a distinct species or Canis lupus familiaris when considered a subspecies of the wolf) is a domesticated carnivore of the family Canidae.",
    "label": "dog",
    "language": "en",
    "keywords": {},
    "candidateset_id": CANDIDATE_SET_ID,
}
client.new_groundtruth_label(PROJECT_ID, GROUNDTRUTH_ID, label)

# Change the validity of a labeled extract
client.modify_groundtruth_label(PROJECT_ID, GROUNDTRUTH_ID, LABELED_EXTRACT_ID, 'positive')

# Delete a labeled extract
client.delete_groundtruth_label(PROJECT_ID, GROUNDTRUTH_ID, LABELED_EXTRACT_ID)

# List all labeled extracts of a Ground Truth
client.get_groundtruth_labels(PROJECT_ID, GROUNDTRUTH_ID)

# Get a single labeled extract
client.get_groundtruth_label(PROJECT_ID, GROUNDTRUTH_ID, LABELED_EXTRACT_ID)
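When many extracts are labeled programmatically, it can help to build the payload in one place and reject labels that the Ground Truth does not define. The following is a minimal sketch: `build_label_payload` is a hypothetical helper, not part of the Squirro Client, and the check against `allowed_labels` is done on the client side only.

```python
def build_label_payload(item_id, extract, label, language,
                        candidateset_id, allowed_labels, keywords=None):
    """Build the dict passed as the labeled extract (hypothetical helper).

    Raises ValueError if the label is not one of the Ground Truth's labels.
    """
    if label not in allowed_labels:
        raise ValueError(f"unknown label {label!r}; expected one of {allowed_labels}")
    return {
        "item_id": item_id,
        "extract": extract,
        "label": label,
        "language": language,
        "keywords": keywords if keywords is not None else {},
        "candidateset_id": candidateset_id,
    }


payload = build_label_payload(
    item_id="ITEM_ID_PLACEHOLDER",          # placeholder value
    extract="The dog is a domesticated carnivore.",
    label="dog",
    language="en",
    candidateset_id="CANDIDATE_SET_ID_PLACEHOLDER",  # placeholder value
    allowed_labels=["dog", "no dog"],
)
```

The resulting `payload` has the same keys as the `label` dict in the example above.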
Rule
# Create a new rule
rule = {
    "query": "dog sitter",
    "proximity": 6,
    "is_sequence": True,
    "type": "inclusive",
    "labeled_item_id": LABELED_EXTRACT_ID,
}
client.new_groundtruth_rule(PROJECT_ID, GROUNDTRUTH_ID, rule)

# Modify an existing rule
client.modify_groundtruth_rule(PROJECT_ID, GROUNDTRUTH_ID, RULE_ID, rule)

# Delete a rule
client.delete_groundtruth_rule(PROJECT_ID, GROUNDTRUTH_ID, RULE_ID)

# Get a single rule
client.get_groundtruth_rule(PROJECT_ID, GROUNDTRUTH_ID, RULE_ID)