Note: This page is a work in progress; some fields are not yet documented.

Squirro items are represented in JSON format, both when loading data and when consuming it through the API. The following tables document the properties a Squirro item can have.

...


About Squirro Items

When executing a search, Squirro will show the user a list of matching items as the query result.

When planning and integrating a custom data source, consider the following points:

  • What is the smallest independent result entity that the user should consume? Model each such entity as a Squirro item.
  • How should the body content and the title be formatted?

Some examples of individual items are:

  • News story, web article, tweet, etc.
  • Binary document (PDF, Office documents, etc.)
  • Service ticket
  • Email
  • Chat message

Items can also contain sub-items, which are always shown in the context of the full item. Typically, Squirro uses sub-items to index the individual pages or chapters of a PDF document separately.

Fields

Common Fields

The fields in this table are used in both the data loading and data consumption APIs.

Field (Data type): Description
id (Unique identifier):

Although this field has the same name in data loading and data consumption, it has different semantics in each. See the sections Data Loading Fields and Data Consumption Fields for details.

link (URL): Link to the item at its original location.
title (String): Item title.
body (HTML string): Item body. This field is in HTML format, so special characters must be escaped.
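Because body is interpreted as HTML, plain text taken from a source system has to be escaped before loading. The following is a minimal sketch, assuming Node.js; the helper names `escapeHtml` and `textToBody`, and the choice to wrap blank-line-separated paragraphs in `<p>` elements, are illustrative assumptions and not part of a Squirro API.

```javascript
// Escape the characters that have special meaning in HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;") // must run first so the entities below are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical helper: turn plain text into an HTML body, wrapping each
// blank-line-separated paragraph in a <p> element.
function textToBody(text) {
  return text
    .split(/\n\s*\n/)
    .map((para) => para.trim())
    .filter((para) => para.length > 0)
    .map((para) => `<p>${escapeHtml(para)}</p>`)
    .join("");
}

const item = {
  title: "Q3 results",
  body: textToBody("Revenue grew by 5% & costs fell.\n\nSee <attached> report."),
};
console.log(item.body);
// <p>Revenue grew by 5% &amp; costs fell.</p><p>See &lt;attached&gt; report.</p>
```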
language (Language code): Content language of the item. If it is not specified, it is auto-detected from the content.
created_at (Date and time):

Item creation date. Ideally this is the creation date of the item in its source system.

If this field is not specified during data loading, the import process determines the date as follows:

  • If the files property is specified, Squirro tries to extract a creation date from the file metadata.
  • As a fallback, the server's current date and time are used.
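The precedence above can be expressed as a small function. This only illustrates the documented order and is not Squirro's implementation; `resolveCreatedAt` and the `extractFileDate` callback, which stands in for the metadata extraction Squirro performs on the attached file, are hypothetical.

```javascript
// Illustrates the documented precedence for created_at during data loading.
// extractFileDate is a hypothetical callback returning a Date, or null if the
// file metadata contains no creation date.
function resolveCreatedAt(item, extractFileDate, now = new Date()) {
  // An explicit value supplied by the data source is used as-is.
  if (item.created_at) {
    return item.created_at;
  }
  // If files are attached, try the creation date from the file metadata.
  if (Array.isArray(item.files) && item.files.length > 0) {
    const fromFile = extractFileDate(item.files[0]);
    if (fromFile) {
      return fromFile;
    }
  }
  // Otherwise fall back to the server's current date and time.
  return now;
}
```

A data source that knows the original creation date should set created_at explicitly, so that neither fallback applies.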
webshot_url (URL):

Main item picture. This image is displayed in the result list to represent the story.

For data loading, use the webshot_picture_hint field instead, because the picture is then archived automatically.

If this field is not set, the picture is extracted automatically from the website specified in the link property, generally by using the first picture of the story.

webshot_height (Integer): Height of the webshot in pixels.
webshot_width (Integer): Width of the webshot in pixels.
keywords (Dictionary with list values):

Keywords attached to the item. They hold the structured information of the item.

Example item with keywords:

```js
{
  "title": "Our offices",
  "body": "We have offices in Munich, …",
  "keywords": {
    "country": ["Germany"],
    "city": ["Munich", "Berlin"]
  }
}
```
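Since every keyword value is a list, a loader that receives single values from its source system has to wrap them. A minimal sketch, assuming a hypothetical `normalizeKeywords` helper; dropping null or missing values is an assumption of this sketch, not documented behavior.

```javascript
// Hypothetical helper: make every keyword value a list, as the keywords field expects.
// Single values are wrapped in a list; null or undefined values are dropped.
function normalizeKeywords(raw) {
  const keywords = {};
  for (const [name, value] of Object.entries(raw)) {
    if (value === null || value === undefined) {
      continue;
    }
    keywords[name] = Array.isArray(value) ? value : [value];
  }
  return keywords;
}

const office = {
  title: "Our offices",
  keywords: normalizeKeywords({ country: "Germany", city: ["Munich", "Berlin"], region: null }),
};
// office.keywords is { country: ["Germany"], city: ["Munich", "Berlin"] }
```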
location (Two-element list):

The geographical location of the item, stored as a two-element list of floats: the latitude followed by the longitude. To query by location, use the location query parameter of the List Items resource.

Example:

```js
{
  "title": "Our offices",
  "body": "We have offices in Munich, …",
  "location": [48.1059422, 11.5668324]
}
```
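A loader may want to check coordinates before import. The sketch below assumes decimal degrees, as in the example above, and uses a hypothetical `isValidLocation` helper. The range check cannot catch coordinates given in [longitude, latitude] order unless the longitude lies outside ±90.

```javascript
// Hypothetical validation helper for the location field:
// a two-element list of numbers, [latitude, longitude], in decimal degrees.
function isValidLocation(location) {
  if (!Array.isArray(location) || location.length !== 2) {
    return false;
  }
  const [lat, lon] = location;
  return (
    Number.isFinite(lat) && Number.isFinite(lon) &&
    lat >= -90 && lat <= 90 &&   // valid latitude range
    lon >= -180 && lon <= 180    // valid longitude range
  );
}

isValidLocation([48.1059422, 11.5668324]); // true (the Munich example above)
isValidLocation([11.5668324]);             // false: only one element
```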
comments (List of dictionaries):

A list of comments attached to the item. In the web interface, the user can toggle whether these comments are displayed on top of the item body. Each comment can have three fields:

Field: Description
id: External identifier of the comment. This field is mandatory.
type: Type of the comment. The type twitter is treated specially; no special handling is implemented for any other type.
body:

Comment body. This field is in HTML format and special characters need to be escaped.

If the type is set to twitter, this must be a valid embedded tweet (without the script tags).
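The comment rules can be enforced when building an item. In this sketch, `addComment` and the `note` comment type are hypothetical examples; type and body are treated as optional because only id is marked mandatory, and only the twitter type receives special handling.

```javascript
// Hypothetical helper: append a comment to an item, enforcing that id is set.
function addComment(item, { id, type, body }) {
  if (id === undefined || id === null || id === "") {
    throw new Error("comment id is mandatory");
  }
  const comment = { id };
  if (type !== undefined) comment.type = type;
  if (body !== undefined) comment.body = body; // HTML, so special characters must already be escaped
  item.comments = item.comments || [];
  item.comments.push(comment);
  return item;
}

const story = { title: "Our offices", body: "<p>We have offices in Munich.</p>" };
addComment(story, { id: "c-1", type: "note", body: "<p>Berlin office opens in May.</p>" });
// story.comments is [{ id: "c-1", type: "note", body: "<p>Berlin office opens in May.</p>" }]
```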

files (List of dictionaries):

A list of files to upload with the item. Note: this is modeled as a list, but currently only one file can be attached.

The fields for individual files are:

Field: Description
content: Base64-encoded content of the file to upload. Either this field or the url field is mandatory.
url: URL from which the file can be downloaded. Either this field or the content field is mandatory.
name: File name without path. Mandatory when the content field is provided. If the url field is provided, the name is derived from the URL by default.
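The following sketch builds a files entry for each variant, assuming Node.js (where `Buffer` is a global). The helper names are hypothetical. In practice the buffer would come from reading the file, for example with `fs.readFileSync`.

```javascript
// Hypothetical helper: build a files entry from in-memory content.
// name is set because it is mandatory whenever content is provided;
// any directory part is stripped, since name must not include a path.
function fileEntryFromBuffer(buffer, filePath) {
  return {
    content: buffer.toString("base64"),
    name: filePath.split(/[\\/]/).pop(),
  };
}

// Hypothetical helper: let Squirro download the file; the name is then
// derived from the URL by default.
function fileEntryFromUrl(url) {
  return { url };
}

const report = {
  title: "Quarterly report",
  // files is a list, but currently only one file can be attached.
  files: [fileEntryFromBuffer(Buffer.from("hello"), "/tmp/report.txt")],
};
// report.files[0] is { content: "aGVsbG8=", name: "report.txt" }
```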

When importing data into Squirro, at least one of the fields title, body, or files must be set. All other fields are optional.
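A loader can check this rule before sending items. A minimal sketch with a hypothetical `hasRequiredContent` helper; treating empty or whitespace-only strings as unset is an assumption of this sketch.

```javascript
// Hypothetical pre-import check: at least one of title, body, or files must be set.
function hasRequiredContent(item) {
  const hasText = (value) => typeof value === "string" && value.trim() !== "";
  return (
    hasText(item.title) ||
    hasText(item.body) ||
    (Array.isArray(item.files) && item.files.length > 0)
  );
}
```

Items that fail the check can be logged and skipped instead of being sent to the server.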

Data Loading Fields

These fields can be specified in the data loading APIs. In the data consumption APIs, they are transformed and output under different names.

Field (Data type): Description
id (Unique identifier):

External item identifier. When a value is specified here at import, it is written into the external_id data consumption field.

Used by data providers to reference their source system. Squirro uses this identifier for deduplication.

summary (Text string):

Item summary text. If not specified, this is generated from the body field. Any HTML tags are removed.

webshot_picture_hint (URL):

Main item picture. If this URL exists and can be downloaded, the image is archived by Squirro. The resulting URL is written into the webshot_url field. The picture width and height are calculated and written into webshot_width and webshot_height.

Note: processing of webshots is disabled by default for custom data imports (bulk provider).

Data Consumption Fields

Some fields are only available during data consumption because they are calculated on the fly or represent a user state. This table documents these fields.

Field (Data type): Description
id (Unique identifier):

Internal item identifier, generated by Squirro only.

external_id (String):

External item identifier. The external identifier is used for deduplication and can be used to link items to their source system.

See the id field in the data loading fields for details.
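Because external_id carries the id supplied at import, a consumer can use it to map results back to source records. A minimal sketch with a hypothetical `indexByExternalId` helper and made-up identifiers:

```javascript
// Hypothetical helper: map each external_id (the id supplied at import) to the
// internal Squirro id of the consumed item.
function indexByExternalId(items) {
  const index = new Map();
  for (const item of items) {
    if (item.external_id !== undefined) {
      index.set(item.external_id, item.id);
    }
  }
  return index;
}

// Made-up identifiers for illustration only.
const results = [
  { id: "internal-1", external_id: "ticket-4711", title: "Printer offline" },
  { id: "internal-2", external_id: "ticket-4712", title: "VPN issue" },
];
const byExternalId = indexByExternalId(results);
// byExternalId.get("ticket-4711") is "internal-1"
```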

read (Boolean): True if the item has been read.
starred (Boolean): True if the item has been starred.
abstract (Text string):

Item abstract. This is generated from the summary field or, if that field does not exist, from the body.

If the item is returned as a result of a search query, the abstract is built around the most relevant matching keywords.

score (Float): Relevance score of the item. This is only set when the result list is ordered by relevance.
thumbler_url (Partial URL): Used internally by Squirro to display thumbnails of the webshot_url field.
explanation (Dictionary):

Returned when the explain_smartfilters option is used in the Get Item resource. Its matches dictionary maps each field to the list of matching Smart Filter tokens, each given with its term and score.

Example:

```js
{
  …
  "explanation": {
    "matches": {
      "summary.stemmed": [
        {"term": "eliminated", "score": 0.010479515},
        {"term": "equipped", "score": 0.00846127}
      ],
      "body.stemmed": [
        {"term": "eliminated", "score": 0.010341313000000001},
        {"term": "equipped", "score": 0.008302803000000001}
      ],
      "language_code": [
        {"term": "en", "score": 0.0009886466000000001}
      ]
    }
  }
}
```
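A client can flatten an explanation into a ranked list of terms per field. The sketch below assumes the structure shown in the example (a matches dictionary mapping field names to lists of term/score pairs); `matchedTerms` is a hypothetical helper, and the input reuses values from the example.

```javascript
// Hypothetical helper: for each field, list the matching Smart Filter terms
// ordered by descending score.
function matchedTerms(explanation) {
  const result = {};
  for (const [field, matches] of Object.entries(explanation.matches || {})) {
    result[field] = [...matches]          // copy so the input is not reordered
      .sort((a, b) => b.score - a.score)
      .map((m) => m.term);
  }
  return result;
}

const explanation = {
  matches: {
    "summary.stemmed": [
      { term: "eliminated", score: 0.010479515 },
      { term: "equipped", score: 0.00846127 },
    ],
    language_code: [{ term: "en", score: 0.0009886466000000001 }],
  },
};
matchedTerms(explanation);
// { "summary.stemmed": ["eliminated", "equipped"], language_code: ["en"] }
```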
related_items (List):

Returned when the filter_related_items option is set in the List Items resource. This is a list of dictionaries, each containing the id field of a related (duplicate) item.

Example:

```js
{
  …
  "related_items": [
    {"id": "UfA8Ah08TeSLSUo-RBzm7Q"},
    {"id": "tjiS4mjaTgupKaIiZisYww"}
  ]
}
```
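To handle the duplicates of several results at once, for example to retrieve them individually, a client can collect the referenced ids. A minimal sketch with a hypothetical `relatedItemIds` helper; items without a related_items field contribute nothing, and each id is listed once.

```javascript
// Hypothetical helper: collect the ids of all related (duplicate) items
// referenced in a result list, without repeats.
function relatedItemIds(items) {
  const ids = new Set();
  for (const item of items) {
    for (const related of item.related_items || []) {
      ids.add(related.id);
    }
  }
  return [...ids];
}

const page = [
  { id: "a", related_items: [{ id: "UfA8Ah08TeSLSUo-RBzm7Q" }, { id: "tjiS4mjaTgupKaIiZisYww" }] },
  { id: "b" },
];
relatedItemIds(page); // ["UfA8Ah08TeSLSUo-RBzm7Q", "tjiS4mjaTgupKaIiZisYww"]
```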
sub_items (not yet documented)
highlight_matches (not yet documented)
matching_sub_items (not yet documented)
has_matching_sub_items (not yet documented)