The Ground Truth in the AI Studio is the data set that is later used to train a Model.
Structure
The Ground Truth consists of up to three components, each with several properties:
Metadata
id: Unique identifier of the Ground Truth
name: Name of the Ground Truth
description: Description of the Ground Truth
type: Type of the Ground Truth (accepted values: text, text+proximity)
tagging_level: Level at which the extracts are tagged (accepted values: sentence, document)
label: List of labels that can be used for tagging
candidateset_ids: List of Candidate Set ids that are used to generate the Ground Truth
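The accepted values listed above can be checked before a configuration is sent anywhere. The following is a minimal sketch, not part of the Squirro Client: the helper name `validate_groundtruth_config` is hypothetical, and the requirement that `label` be a non-empty list is an assumption made for illustration.

```python
# Accepted values as listed in the Metadata properties above.
ACCEPTED_TYPES = {"text", "text+proximity"}
ACCEPTED_TAGGING_LEVELS = {"sentence", "document"}


def validate_groundtruth_config(config):
    """Return a list of problems found in a Ground Truth config dict.

    Hypothetical helper for illustration; an empty list means no problem
    was found by these checks.
    """
    problems = []
    if config.get("type") not in ACCEPTED_TYPES:
        problems.append(f"type must be one of {sorted(ACCEPTED_TYPES)}")
    if config.get("tagging_level") not in ACCEPTED_TAGGING_LEVELS:
        problems.append(
            f"tagging_level must be one of {sorted(ACCEPTED_TAGGING_LEVELS)}"
        )
    # Assumption: at least one label is needed for tagging.
    labels = config.get("label")
    if not isinstance(labels, list) or not labels:
        problems.append("label must be a non-empty list of labels")
    if not isinstance(config.get("candidateset_ids", []), list):
        problems.append("candidateset_ids must be a list of Candidate Set ids")
    return problems
```

Calling the helper on a config whose `tagging_level` is misspelled (for example `"sentences"`) would report that single problem, which makes such typos visible early.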
Labeled extracts
id: Unique identifier of the labeled extract
extract: A text section (e.g. a sentence if tagging_level is set to ‘sentence’)
label: Label which classifies the extract
language: Language of the Squirro Item
keywords: Additional keywords from the Squirro Item
item_id: Squirro item id in which the extract was found
candidateset_id: Id of the Candidate Set which helped to find the extract
In addition, a temporal versioning component is in place with the following fields:
user: The user who made the change
validity: Indicates whether a label is considered true (positive) or false (negative)
created_at: Time of creation
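One way to picture the versioning fields is as a history of entries per labeled extract. The sketch below is illustrative only: the class `LabelVersion` and the function `current_validity` are hypothetical names, and it assumes that the entry with the latest `created_at` determines the current validity, which this page does not state explicitly.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LabelVersion:
    """One versioning entry of a labeled extract (hypothetical model)."""
    user: str
    validity: str  # "positive" or "negative"
    created_at: datetime


def current_validity(versions):
    """Return the validity of the most recently created entry, or None.

    Assumption: the latest entry by created_at wins.
    """
    if not versions:
        return None
    latest = max(versions, key=lambda v: v.created_at)
    return latest.validity
```

Under that assumption, a label first marked positive and later marked negative by another user would resolve to negative.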
Rules (only if the Ground Truth is of type ‘text+proximity’)
id: Unique identifier of the rule
query: Query text of the rule
proximity: Allowed distance between the words within the query
is_sequence: Boolean
type: Type of the rule (inclusive, exclusive)
labeled_item_id: Id of the labeled item which is connected to the rule
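To make the `proximity` and `is_sequence` fields more concrete, here is a rough sketch of how such a rule could be evaluated against a text. It is not Squirro's implementation: the function `rule_matches` is hypothetical, `proximity` is read as the maximum number of extra tokens allowed between the matched query words, `is_sequence` is read as "the words must appear in query order", and tokenization is a plain whitespace split.

```python
from itertools import product


def rule_matches(text, query, proximity, is_sequence=False):
    """Check whether all query words occur close together in the text.

    Assumed semantics (not taken from the product): the words match when
    the number of other tokens inside the matched span is <= proximity.
    """
    tokens = text.lower().split()  # naive tokenization, punctuation is kept
    words = query.lower().split()
    if not words:
        return False
    # Positions at which each query word occurs in the text.
    positions = [[i for i, tok in enumerate(tokens) if tok == w] for w in words]
    if any(not found for found in positions):
        return False
    for combo in product(*positions):
        if len(set(combo)) != len(combo):
            continue  # one token must not satisfy two query words
        if is_sequence and list(combo) != sorted(combo):
            continue  # words are not in query order
        extra_tokens = (max(combo) - min(combo) + 1) - len(words)
        if extra_tokens <= proximity:
            return True
    return False
```

The rule `type` (inclusive or exclusive) is left out of the sketch; it would decide how a match is used, not whether the query matches.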
Usage via Squirro Client
Ground Truth
# `client` is assumed to be an authenticated SquirroClient instance;
# PROJECT_ID, GROUNDTRUTH_ID and CANDIDATE_SET_ID are placeholders.

# Create a new Ground Truth
config = {
    "type": "text",
    "tagging_level": "sentence",
    "label": ["dog", "no dog"],
    "description": "In this Ground Truth we select sentences that are dog or not dog related.",
    "candidateset_ids": [CANDIDATE_SET_ID]
}
client.new_groundtruth(PROJECT_ID, 'Dog Ground Truth', config)

# Modify an existing Ground Truth
config = {
    "type": "text",
    # the tagging_level cannot be changed
    "label": ["dog", "no dog"],
    "description": "In this Ground Truth we select sentences that are dog or not dog related.",
    "candidateset_ids": [CANDIDATE_SET_ID]
}
client.modify_groundtruth(PROJECT_ID, GROUNDTRUTH_ID, name='Dog Ground Truth (modified name)', config=config)

# Delete a Ground Truth
client.delete_groundtruth(PROJECT_ID, GROUNDTRUTH_ID)

# List all Ground Truths of the project
client.get_groundtruths(PROJECT_ID)

# Get a single Ground Truth
client.get_groundtruth(PROJECT_ID, GROUNDTRUTH_ID)
Labeled Extract
# `client` is assumed to be an authenticated SquirroClient instance;
# the upper-case names are placeholders.

# Create a new labeled extract
label = {
    "item_id": SQUIRRO_ITEM_ID,
    "extract": "The dog (Canis familiaris when considered a distinct species or Canis lupus familiaris when considered a subspecies of the wolf) is a domesticated carnivore of the family Canidae.",
    "label": "dog",
    "language": "en",
    "keywords": {},
    "candidateset_id": CANDIDATE_SET_ID,
}
client.new_groundtruth_label(PROJECT_ID, GROUNDTRUTH_ID, label)

# Set the validity of a labeled extract ('positive' or 'negative')
client.modify_groundtruth_label(PROJECT_ID, GROUNDTRUTH_ID, LABELED_EXTRACT_ID, 'positive')
# Delete a labeled extract
client.delete_groundtruth_label(PROJECT_ID, GROUNDTRUTH_ID, LABELED_EXTRACT_ID)
# List all labeled extracts of a Ground Truth
client.get_groundtruth_labels(PROJECT_ID, GROUNDTRUTH_ID)
# Get a single labeled extract
client.get_groundtruth_label(PROJECT_ID, GROUNDTRUTH_ID, LABELED_EXTRACT_ID)
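When many extracts need to be labeled, the single call above can be wrapped in a small loop. The sketch below is one possible way to do that: `build_label` and `add_labels` are hypothetical helpers, not part of the Squirro Client, and the only client method they rely on is `new_groundtruth_label` with the argument order shown above. The return value of that method is passed through unchanged, since its shape is not described here.

```python
def build_label(item_id, extract, label, candidateset_id,
                language="en", keywords=None):
    """Build a label dict with the same keys as the example above."""
    return {
        "item_id": item_id,
        "extract": extract,
        "label": label,
        "language": language,
        "keywords": keywords or {},
        "candidateset_id": candidateset_id,
    }


def add_labels(client, project_id, groundtruth_id, labels):
    """Submit each label dict and collect whatever the client returns."""
    return [
        client.new_groundtruth_label(project_id, groundtruth_id, label)
        for label in labels
    ]
```

Passing the client in as an argument keeps the helpers easy to try out with a stand-in object before pointing them at a real project.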
Rule
...
This page can now be found at Ground Truth on the Squirro Docs site.