The Ground Truth in AI Studio is the data set that is later used to train a Model.

Structure

The Ground Truth consists of up to three components, each with several properties:

Metadata

id: Unique identifier of the Ground Truth
name: Name of the Ground Truth
description: Description of the Ground Truth
type: Type of the Ground Truth (accepted values: text, text+proximity)
tagging_level: Level at which the extracts are tagged (accepted values: sentence, document)
label: List of labels that can be used for tagging
candidateset_ids: List of Candidate Set ids that are used to generate the Ground Truth
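
The accepted values listed above can be checked locally before a config is sent to the server. The following is a minimal sketch and not part of the Squirro Client: the helper name `validate_groundtruth_config` is hypothetical, and the rules that `label` must be a non-empty list without duplicates are assumptions made for illustration.

```python
# Hypothetical helper, not part of the Squirro Client.
ACCEPTED_TYPES = {"text", "text+proximity"}
ACCEPTED_TAGGING_LEVELS = {"sentence", "document"}


def validate_groundtruth_config(config):
    """Return a list of problems found in a Ground Truth config dict."""
    problems = []
    if config.get("type") not in ACCEPTED_TYPES:
        problems.append(f"type must be one of {sorted(ACCEPTED_TYPES)}")
    # tagging_level may be absent, e.g. when an existing Ground Truth is modified
    level = config.get("tagging_level")
    if level is not None and level not in ACCEPTED_TAGGING_LEVELS:
        problems.append(
            f"tagging_level must be one of {sorted(ACCEPTED_TAGGING_LEVELS)}"
        )
    # Assumption: at least one label, no duplicates
    labels = config.get("label")
    if not isinstance(labels, list) or not labels:
        problems.append("label must be a non-empty list")
    elif len(set(labels)) != len(labels):
        problems.append("label must not contain duplicates")
    if not isinstance(config.get("candidateset_ids", []), list):
        problems.append("candidateset_ids must be a list")
    return problems
```

An empty list means that no problem was found; a misspelled value such as "sentences" for tagging_level is reported as one problem.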

Labeled extracts

id: Unique identifier of the labeled extract
extract: The text section that was labeled (e.g. a sentence if tagging_level is set to ‘sentence’)
label: Label which classifies the extract
language: Language of the Squirro Item
keywords: Additional keywords from the Squirro Item
item_id: Squirro item id in which the extract was found
candidateset_id: Id of the Candidate Set that helped to find the extract

In addition, a temporal versioning component is in place with the following fields:

user: User who made the change
validity: Indicates whether the label is positive (considered true) or negative (considered false)
created_at: Time of creation
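
One way to read the versioning fields is that every change adds an entry and the most recent entry determines the current state of a labeled extract. The sketch below illustrates that reading only. The function name `current_validity`, the ISO 8601 timestamp format, the validity values and the "latest entry wins" rule are all assumptions and are not confirmed by this page.

```python
from datetime import datetime


def current_validity(versions):
    """Return the validity of the most recent version entry, or None if empty.

    Assumption: each entry is a dict with the fields user, validity and
    created_at, where created_at is an ISO 8601 string.
    """
    if not versions:
        return None
    latest = max(versions, key=lambda v: datetime.fromisoformat(v["created_at"]))
    return latest["validity"]


# Hypothetical history of one labeled extract
history = [
    {"user": "alice", "validity": "positive", "created_at": "2021-03-01T10:00:00"},
    {"user": "bob", "validity": "negative", "created_at": "2021-03-02T09:30:00"},
]
print(current_validity(history))  # prints "negative"
```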

Rules (only if the Ground Truth is of type ‘text+proximity’)

id: Unique identifier of the rule
query: Query text of the rule
proximity: Allowed distance between the words within the query
is_sequence: Boolean
type: Type of the rule (inclusive, exclusive)
labeled_item_id: Id of the labeled item that is connected to the rule
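
To make the proximity and is_sequence fields more concrete, the following sketch shows one plausible reading of them. It does not reproduce how Squirro evaluates rules. The function name `matches_rule`, whitespace tokenization, and the definition of distance as the number of token positions between the first and last matched word are assumptions chosen for illustration.

```python
from itertools import product


def matches_rule(text, query, proximity, is_sequence=False):
    """Check whether all query words occur close together in the text.

    Assumptions: tokens are split on whitespace and compared lowercased;
    the words match if the span between the first and last matched
    position is at most `proximity`; with is_sequence=True the words
    must also appear in the order given in the query.
    """
    tokens = text.lower().split()
    words = query.lower().split()
    # Positions at which each query word occurs in the text
    positions = [[i for i, t in enumerate(tokens) if t == w] for w in words]
    if not words or any(not p for p in positions):
        return False
    for combo in product(*positions):
        if len(set(combo)) != len(combo):
            continue  # one token must not satisfy two query words
        if is_sequence and list(combo) != sorted(combo):
            continue  # query order is not respected
        if max(combo) - min(combo) <= proximity:
            return True
    return False
```

For example, `matches_rule("the dog barks loudly", "dog loudly", 2)` is true under these assumptions, because the two words are two positions apart. Punctuation attached to a word would prevent a match in this simplified tokenization.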

Usage via Squirro Client

Ground Truth

# Create a new Ground Truth
config = {
    "type": "text",
    "tagging_level": "sentence",
    "label": ["dog", "no dog"],
    "description": "In this Ground Truth we select sentences that are dog or not dog related.",
    "candidateset_ids": [CANDIDATE_SET_ID],
}

client.new_groundtruth(PROJECT_ID, 'Dog Ground Truth', config)

# Modify an existing Ground Truth
config = {
    "type": "text",
    # the tagging_level cannot be changed
    "label": ["dog", "no dog"],
    "description": "In this Ground Truth we select sentences that are dog or not dog related.",
    "candidateset_ids": [CANDIDATE_SET_ID],
}

client.modify_groundtruth(PROJECT_ID, GROUNDTRUTH_ID, name='Dog Ground Truth (modified name)', config=config)

# Delete a Ground Truth
client.delete_groundtruth(PROJECT_ID, GROUNDTRUTH_ID)

# List all Ground Truths of a project
client.get_groundtruths(PROJECT_ID)

# Get a single Ground Truth
client.get_groundtruth(PROJECT_ID, GROUNDTRUTH_ID)

Labeled Extract

label = {
    "item_id": SQUIRRO_ITEM_ID,
    "extract": "The dog (Canis familiaris when considered a distinct species or Canis lupus familiaris when considered a subspecies of the wolf) is a domesticated carnivore of the family Canidae.",
    "label": "dog",
    "language": "en",
    "keywords": {},
    "candidateset_id": CANDIDATE_SET_ID,
}

# Create a new labeled extract
client.new_groundtruth_label(PROJECT_ID, GROUNDTRUTH_ID, label)

# Set the validity of a labeled extract
client.modify_groundtruth_label(PROJECT_ID, GROUNDTRUTH_ID, LABELED_EXTRACT_ID, 'positive')

# Delete a labeled extract
client.delete_groundtruth_label(PROJECT_ID, GROUNDTRUTH_ID, LABELED_EXTRACT_ID)

# List all labeled extracts of a Ground Truth
client.get_groundtruth_labels(PROJECT_ID, GROUNDTRUTH_ID)

# Get a single labeled extract
client.get_groundtruth_label(PROJECT_ID, GROUNDTRUTH_ID, LABELED_EXTRACT_ID)
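
When several extracts need to be labeled, the calls above can be wrapped in small helpers. The sketch below is one possible way to do that. The helper names `build_label` and `add_labels` and the default language "en" are hypothetical; only `new_groundtruth_label` and its three arguments are taken from the example above.

```python
def build_label(item_id, extract, label, candidateset_id, language="en", keywords=None):
    """Assemble a labeled-extract dict with the fields described on this page."""
    return {
        "item_id": item_id,
        "extract": extract,
        "label": label,
        "language": language,
        "keywords": keywords or {},
        "candidateset_id": candidateset_id,
    }


def add_labels(client, project_id, groundtruth_id, labels):
    """Send each labeled extract with new_groundtruth_label and collect the responses."""
    return [
        client.new_groundtruth_label(project_id, groundtruth_id, label)
        for label in labels
    ]
```

A call could then look like `add_labels(client, PROJECT_ID, GROUNDTRUTH_ID, [build_label(SQUIRRO_ITEM_ID, "Dogs bark.", "dog", CANDIDATE_SET_ID)])`, where `client` is an authenticated Squirro Client instance.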

Rule

...

This page can now be found at Ground Truth on the Squirro Docs site.
