In the AI Studio, the Ground Truth is the data set that is later used to train a Model.

Structure

The Ground Truth consists of up to three components, each with several properties:

Metadata

  • id: Unique identifier of the Ground Truth

  • name: Name of the Ground Truth

  • description: Description of the Ground Truth

  • type: Type of the Ground Truth (accepted values: text, text+proximity)

  • tagging_level: Level at which the extracts are tagged (accepted values: sentence, document)

  • label: List of labels that can be used for tagging

  • candidateset_ids: List of Candidate Set IDs that are used to generate the Ground Truth
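To make the accepted values above concrete, here is a minimal sketch that checks a Ground Truth configuration dictionary before it is sent to the server. The function `validate_groundtruth_config` is hypothetical and not part of the Squirro Client; it only encodes the accepted values listed above, and the requirement that `label` be a non-empty list is an added assumption.

```python
# Accepted values as listed in the Metadata section
ACCEPTED_TYPES = {"text", "text+proximity"}
ACCEPTED_TAGGING_LEVELS = {"sentence", "document"}


def validate_groundtruth_config(config: dict) -> list:
    """Return a list of problems found in a Ground Truth config (empty if none).

    Hypothetical helper: it only checks the documented accepted values and
    assumes that at least one label is required.
    """
    problems = []
    if config.get("type") not in ACCEPTED_TYPES:
        problems.append(f"type must be one of {sorted(ACCEPTED_TYPES)}")
    if config.get("tagging_level") not in ACCEPTED_TAGGING_LEVELS:
        problems.append(f"tagging_level must be one of {sorted(ACCEPTED_TAGGING_LEVELS)}")
    labels = config.get("label")
    if not isinstance(labels, list) or not labels:
        problems.append("label must be a non-empty list")
    # candidateset_ids is treated as optional here, but must be a list if given
    if not isinstance(config.get("candidateset_ids", []), list):
        problems.append("candidateset_ids must be a list")
    return problems


config = {"type": "text", "tagging_level": "sentence", "label": ["dog", "no dog"]}
print(validate_groundtruth_config(config))  # []
```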

Labeled extracts

  • id: Unique identifier of the labeled extract

  • extract: A section of text (e.g. a sentence if tagging_level is set to ‘sentence’)

  • label: Label which classifies the extract

  • language: Language of the Squirro Item

  • keywords: Additional keywords from the Squirro Item

  • item_id: ID of the Squirro item in which the extract was found

  • candidateset_id: ID of the Candidate Set that helped to find the extract
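The fields of a labeled extract can be modeled on the client side as a small Python dataclass. This is only a sketch: the class `LabeledExtract` is hypothetical, and `id` is left out on the assumption that it is assigned by the server when the extract is created.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class LabeledExtract:
    """Hypothetical client-side model of a labeled extract (without its id)."""

    item_id: str           # Squirro item in which the extract was found
    extract: str           # text section, e.g. one sentence
    label: str             # label that classifies the extract
    language: str          # language of the Squirro item, e.g. "en"
    candidateset_id: str   # Candidate Set that helped to find the extract
    keywords: dict = field(default_factory=dict)  # additional keywords from the item

    def to_dict(self) -> dict:
        """Return the extract as a plain dict with the documented field names."""
        return asdict(self)


extract = LabeledExtract(
    item_id="ITEM_ID",
    extract="The dog is a domesticated carnivore.",
    label="dog",
    language="en",
    candidateset_id="CANDIDATE_SET_ID",
)
print(extract.to_dict()["label"])  # dog
```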

In addition, there is a temporal versioning component with the following fields:

  • user: The user who made the change

  • validity: Indicates whether a label is positive (considered true) or negative (considered false)

  • created_at: Time of creation 
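One way to read such a version history is to take the most recent entry as the current state. The sketch below assumes each version is a dict with the fields listed above and that `created_at` is a `datetime`; the function `current_validity` is hypothetical and may not match how Squirro resolves versions internally.

```python
from datetime import datetime
from typing import Optional


def current_validity(versions: list) -> Optional[str]:
    """Return the validity of the most recent version, or None if there is none.

    Assumes the entry with the latest created_at reflects the current state.
    """
    if not versions:
        return None
    latest = max(versions, key=lambda v: v["created_at"])
    return latest["validity"]


# Illustrative history: a second user later marks the label as negative
history = [
    {"user": "alice", "validity": "positive", "created_at": datetime(2021, 3, 1, 9, 0)},
    {"user": "bob", "validity": "negative", "created_at": datetime(2021, 3, 2, 14, 30)},
]
print(current_validity(history))  # negative
```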

Rules (if the Ground Truth is of type ‘text+proximity’)

  • id: Unique identifier of the rule

  • query: Query text of the rule

  • proximity: Allowed distance between the words within the query

  • is_sequence: Boolean

  • type: Type of the rule (accepted values: inclusive, exclusive)

  • labeled_item_id: ID of the labeled item that is connected to the rule
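The exact matching semantics of `proximity` and `is_sequence` are not described on this page. The following sketch illustrates one plausible interpretation only: all query words must occur within a window of (number of query words + `proximity`) tokens, and if `is_sequence` is true they must also appear in query order. The function `matches_rule` is hypothetical, is not Squirro's implementation, and does not model the rule `type` (inclusive or exclusive).

```python
import itertools
import re


def matches_rule(text: str, query: str, proximity: int, is_sequence: bool) -> bool:
    """Hypothetical proximity check under the interpretation described above."""
    tokens = re.findall(r"\w+", text.lower())
    words = re.findall(r"\w+", query.lower())
    if not words:
        return False
    # Positions at which each query word occurs in the text
    positions = [[i for i, tok in enumerate(tokens) if tok == w] for w in words]
    if any(not p for p in positions):
        return False
    max_span = len(words) + proximity
    # Brute force over all combinations of occurrences (fine for short extracts)
    for combo in itertools.product(*positions):
        if len(set(combo)) != len(combo):
            continue  # one token cannot satisfy two query words
        if is_sequence and list(combo) != sorted(combo):
            continue  # words must appear in the same order as in the query
        if max(combo) - min(combo) + 1 <= max_span:
            return True
    return False


text = "The dog is a domesticated carnivore."
print(matches_rule(text, "dog carnivore", proximity=3, is_sequence=True))  # True
print(matches_rule(text, "dog carnivore", proximity=2, is_sequence=True))  # False
```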

Usage via the Squirro Client

Ground Truth

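# Client setup. CLUSTER_URL, REFRESH_TOKEN, PROJECT_ID, CANDIDATE_SET_ID and
# GROUNDTRUTH_ID are placeholders for values from your own installation.
from squirro_client import SquirroClient

client = SquirroClient(None, None, cluster=CLUSTER_URL)
client.authenticate(refresh_token=REFRESH_TOKEN)

# Create a Ground Truth that tags single sentences as "dog" or "no dog"
config = {
    "type": "text",
    "tagging_level": "sentence",  # accepted values: sentence, document
    "label": ["dog", "no dog"],
    "description": "In this Ground Truth, we classify sentences as dog related or not dog related.",
    "candidateset_ids": [CANDIDATE_SET_ID],
}

client.new_groundtruth(PROJECT_ID, 'Dog Ground Truth', config)

# Modify the name and config of an existing Ground Truth
config = {
    "type": "text",
    # the tagging_level cannot be changed, so it is omitted here
    "label": ["dog", "no dog"],
    "description": "In this Ground Truth, we classify sentences as dog related or not dog related.",
    "candidateset_ids": [CANDIDATE_SET_ID],
}

client.modify_groundtruth(PROJECT_ID, GROUNDTRUTH_ID, name='Dog Ground Truth (modified name)', config=config)

# Delete a Ground Truth
client.delete_groundtruth(PROJECT_ID, GROUNDTRUTH_ID)

# List all Ground Truths of a project
client.get_groundtruths(PROJECT_ID)

# Retrieve a single Ground Truth
client.get_groundtruth(PROJECT_ID, GROUNDTRUTH_ID)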
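Since the tagging_level cannot be changed, the modify example passes a config without it. When reusing a creation config, a small helper can drop that key first. `to_modify_config` is a hypothetical helper, not a Squirro Client method; it deep-copies the input so the original config stays unchanged.

```python
import copy


def to_modify_config(create_config: dict) -> dict:
    """Return a deep copy of a creation config without the immutable tagging_level."""
    modify_config = copy.deepcopy(create_config)
    modify_config.pop("tagging_level", None)
    return modify_config


create_config = {
    "type": "text",
    "tagging_level": "sentence",
    "label": ["dog", "no dog"],
    "candidateset_ids": ["CANDIDATE_SET_ID"],
}
modify_config = to_modify_config(create_config)
print(sorted(modify_config))  # ['candidateset_ids', 'label', 'type']
```

The result could then be passed as the `config` argument of `client.modify_groundtruth`.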

Labeled Extract

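# Assumes an authenticated SquirroClient instance named `client`. The names in
# capitals are placeholders for IDs from your own project.
label = {
    "item_id": SQUIRRO_ITEM_ID,
    "extract": "The dog (Canis familiaris when considered a distinct species or Canis lupus familiaris when considered a subspecies of the wolf) is a domesticated carnivore of the family Canidae.",
    "label": "dog",  # must be one of the labels defined on the Ground Truth
    "language": "en",
    "keywords": {},
    "candidateset_id": CANDIDATE_SET_ID,
}

# Add a labeled extract to the Ground Truth
client.new_groundtruth_label(PROJECT_ID, GROUNDTRUTH_ID, label)

# Mark a labeled extract as 'positive' (see the validity field above)
client.modify_groundtruth_label(PROJECT_ID, GROUNDTRUTH_ID, LABELED_EXTRACT_ID, 'positive')

# Delete a labeled extract
client.delete_groundtruth_label(PROJECT_ID, GROUNDTRUTH_ID, LABELED_EXTRACT_ID)

# List all labeled extracts of a Ground Truth
client.get_groundtruth_labels(PROJECT_ID, GROUNDTRUTH_ID)

# Retrieve a single labeled extract
client.get_groundtruth_label(PROJECT_ID, GROUNDTRUTH_ID, LABELED_EXTRACT_ID)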
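When many sentences from one item need labeling, the label dictionaries can be built in bulk and checked against the labels defined on the Ground Truth. `build_labels` is a hypothetical helper; it produces dicts with the same keys as the example above and raises `ValueError` for a label that is not in the Ground Truth's label list.

```python
def build_labels(item_id, pairs, allowed_labels, language, candidateset_id):
    """Build one label dict per (sentence, label) pair.

    Hypothetical helper: raises ValueError for labels not defined on the Ground Truth.
    """
    labels = []
    for sentence, label in pairs:
        if label not in allowed_labels:
            raise ValueError(f"unknown label {label!r}; expected one of {allowed_labels}")
        labels.append({
            "item_id": item_id,
            "extract": sentence,
            "label": label,
            "language": language,
            "keywords": {},
            "candidateset_id": candidateset_id,
        })
    return labels


pairs = [
    ("The dog is a domesticated carnivore.", "dog"),
    ("The weather was sunny all week.", "no dog"),
]
labels = build_labels("ITEM_ID", pairs, ["dog", "no dog"], "en", "CANDIDATE_SET_ID")
print(len(labels))  # 2
# Each dict could then be sent with:
# client.new_groundtruth_label(PROJECT_ID, GROUNDTRUTH_ID, label)
```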

Rule

...


...

This page can now be found at Ground Truth on the Squirro Docs site.