Squirro items are represented in JSON format. This is true both for data loading and for data consumption with the API. The following tables document the properties a Squirro item can have.
About Squirro Items
When executing a search, Squirro will show the user a list of matching items as the query result.
When planning and integrating a custom data source, the following points should be considered:
- What is the smallest independent result entity that the user should be consuming? Each such entity should then be modelled as a Squirro item.
- The formatting of the body content and title.
Some examples of individual items are:
- News story, web article, tweet, etc.
- Binary document (PDF, Office documents, etc.)
- Service ticket
- Chat message
- …
Items can also contain sub-items, which are always shown in the context of the full item. Typically, Squirro uses sub-items to index the individual pages or chapters of PDF documents separately.
Fields
Common Fields
The fields in this table are used in both the data loading and data consumption APIs.
...
While this field has the same name in data loading and consumption, it has different semantics. See the Data Loading Fields and Data Consumption Fields sections for details.
...
Item creation date. Ideally, this is the creation date of the item in its source system.
If this is not specified for data loading, the import process goes through the following steps:
- If the files property is specified, Squirro tries to extract a creation date from the file metadata.
- As a fallback, the server's current date and time are used.
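The fallback order above can be sketched as follows. This is only an illustration, not Squirro's implementation: the field name `created_at`, its ISO 8601 format, and the `file_metadata_date` and `now` parameters are assumptions introduced for the example.

```python
from datetime import datetime, timezone


def resolve_created_at(item, file_metadata_date=None, now=None):
    """Pick an item creation date following the documented fallback order.

    `created_at` is an assumed field name. `file_metadata_date` is a
    hypothetical stand-in for a date extracted from the uploaded file.
    """
    # 1. An explicitly provided creation date is used as-is.
    explicit = item.get("created_at")
    if explicit:
        return datetime.fromisoformat(explicit)
    # 2. If a file is attached, try the date from the file metadata.
    if item.get("files") and file_metadata_date is not None:
        return file_metadata_date
    # 3. Fall back to the current date and time.
    return now or datetime.now(timezone.utc)
```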
...
Main item picture. This image is displayed in the result list to represent the story.
For data loading, the webshot_picture_hint field should be used because the picture will then automatically be archived.
If this is not set, it is automatically extracted from the web site specified with the link property, generally by using the first story picture.
...
Keywords attached to the item (see Facets for full documentation). They are the structured information of an item.
- Item keywords are offered as filter options in the search screen.
- Keywords and their values are offered in the search field as typeahead options.
- Search Tagging, known entity extraction and /wiki/spaces/KB/pages/2949480 are used to add additional keywords to items.
Keyword values can have different data types. See the Data Types section of the Facets documentation for details and the format specification. The default data type is string. To use another data type, configure it before loading any data into the system. See Facets API for information.
Example item with keywords:
```json
{
    "title": "Our offices",
    "body": "We have offices in Munich, …",
    "keywords": {
        "country": ["Germany"],
        "city": ["Munich", "Berlin"]
    }
}
```
...
When importing data into Squirro, at least one of the fields title, body or files must be set. All other fields are optional.
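A loader could check this requirement before sending an item. The helper below is a minimal sketch; `has_required_content` is a hypothetical name, and treating empty values as "not set" is an assumption.

```python
def has_required_content(item):
    """Return True if at least one of title, body or files is set.

    Empty strings and empty lists are treated as not set (an assumption).
    """
    return any(item.get(field) for field in ("title", "body", "files"))


# Example: an item with only keywords would be rejected by this check.
candidate = {"keywords": {"city": ["Munich"]}}
if not has_required_content(candidate):
    print("item needs a title, body or files entry")
```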
Data Loading Fields
These fields can be specified in the data loading APIs. They will be transformed and output with different names in the data consumption APIs.
...
External item identifier. When a value is specified here at import, it is written into the external_id data consumption field.
Used by data providers to reference their source system. Squirro uses this identifier for deduplication.
...
Item summary text. If not specified, this is generated from the body field. Any HTML tags are removed.
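To illustrate the behaviour described here, the following sketch derives a plain-text summary using only the Python standard library. It is not Squirro's algorithm: the `max_chars` cut-off and the whitespace normalisation are assumptions, and tags are stripped from an explicit summary as well as from a body fallback.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text content and discards all tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def derive_summary(item, max_chars=200):
    """Use the given summary, or fall back to the body, with HTML removed."""
    source = item.get("summary") or item.get("body", "")
    parser = _TextExtractor()
    parser.feed(source)
    parser.close()
    # Collapse runs of whitespace left behind by removed tags.
    text = " ".join("".join(parser.parts).split())
    return text[:max_chars]  # hypothetical length limit
```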
...
Main item picture. If this URL exists and can be downloaded, the image is archived by Squirro. The resulting URL is written into the webshot_url field. The picture width and height are calculated and written into webshot_width and webshot_height.
Note: processing of webshots is disabled by default for custom data imports (bulk provider).
...
A list of files that are uploaded for the item. Note: this is modelled as a list, but only one file can currently be attached.
The fields for individual files are:
- content: Base64-encoded content of the file to upload. Either this or the url field is mandatory.
- url: URL where the file can be downloaded from. Either this or the content field is mandatory.
- name: File name without path. Mandatory when the content field is provided. If the url field is provided, the name is derived from the URL by default.
Note that this field also exists at consumption, but with a different layout. See below.
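As a rough sketch, a files entry with inline content could be built like this. The helper names `file_entry_from_bytes` and `attach_file` are hypothetical; only the `content`, `url` and `name` keys come from the description above.

```python
import base64
from pathlib import PurePosixPath


def file_entry_from_bytes(path, data):
    """Build a files entry with Base64-encoded content and a bare file name."""
    return {
        "name": PurePosixPath(path).name,  # file name without path
        "content": base64.b64encode(data).decode("ascii"),
    }


def attach_file(item, entry):
    """Return a copy of the item with exactly one file attached."""
    if item.get("files"):
        raise ValueError("only one file can currently be attached to an item")
    if "content" not in entry and "url" not in entry:
        raise ValueError("a file entry needs either 'content' or 'url'")
    if "content" in entry and not entry.get("name"):
        raise ValueError("'name' is mandatory when 'content' is provided")
    return {**item, "files": [entry]}
```

For a file referenced by URL, the entry would only need the `url` key, for example `{"url": "https://example.com/report.pdf"}` (an illustrative URL).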
Data Consumption Fields
Some fields are only available during data consumption because they are calculated on the fly or represent a user state. This table documents these fields.
...
Internal item identifier, generated by Squirro only.
...
External item identifier. The external identifier is used for deduplication and can be used to link items to their source system.
See the id field in the data loading fields for details.
...
Item abstract. This is generated from the summary field or, if that field doesn't exist, from the body.
If the item is returned as a matching result for a search query, the abstract is calculated around the most relevant matching keywords.
...
Returned for items when the explain_smartfilters option is used in the Get Item resource. The dictionary contains a list of fields, for each of which the matching Smart Filter tokens are listed.
Example:
```json
{
    …
    "explanation": {
        "matches": {
            "summary.stemmed": [
                {"term": "eliminated", "score": 0.010479515},
                {"term": "equipped", "score": 0.00846127}
            ],
            "body.stemmed": [
                {"term": "eliminated", "score": 0.010341313000000001},
                {"term": "equipped", "score": 0.008302803000000001}
            ],
            "language_code": [
                {"term": "en", "score": 0.0009886466000000001}
            ]
        }
    }
}
```
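A client might want to condense this structure into a ranked list of terms. The sketch below keeps the highest score per term across all fields; `top_terms` is a hypothetical helper, and taking the maximum (rather than, say, the sum) is an arbitrary choice made for illustration.

```python
def top_terms(item, limit=3):
    """Return (term, score) pairs, best score per term, highest first."""
    matches = item.get("explanation", {}).get("matches", {})
    best = {}
    for tokens in matches.values():
        for token in tokens:
            term, score = token["term"], token["score"]
            # Keep only the highest score seen for each term.
            if score > best.get(term, float("-inf")):
                best[term] = score
    return sorted(best.items(), key=lambda pair: pair[1], reverse=True)[:limit]


sample = {
    "explanation": {
        "matches": {
            "summary.stemmed": [
                {"term": "eliminated", "score": 0.010479515},
                {"term": "equipped", "score": 0.00846127},
            ],
            "language_code": [{"term": "en", "score": 0.0009886466}],
        }
    }
}
print(top_terms(sample))
```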
...
Returned when the filter_related_items option is set in the List Items resource. A list of dictionaries which contain the field id of any related (duplicate) items.
Example:
```json
{
    …
    "related_items": [
        {"id": "UfA8Ah08TeSLSUo-RBzm7Q"},
        {"id": "tjiS4mjaTgupKaIiZisYww"}
    ]
}
```
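Reading the related item identifiers from such a response item is straightforward. A small sketch, where `related_item_ids` is a hypothetical helper name:

```python
def related_item_ids(item):
    """Return the ids of related (duplicate) items, or an empty list."""
    return [entry["id"] for entry in item.get("related_items", [])]


sample = {
    "related_items": [
        {"id": "UfA8Ah08TeSLSUo-RBzm7Q"},
        {"id": "tjiS4mjaTgupKaIiZisYww"},
    ]
}
print(related_item_ids(sample))
```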
...
A dictionary of matching query terms per field.
```json
{
    …
    "highlight_matches": {
        "body": ["asia"],
        "summary": ["asia"]
    },
    …
}
```
...
Returns a list of subscription details.
```json
{
    …
    "subscriptions": [{
        "title": "Upload",
        "object_ids": ["-RmFf-hkQLq0GX8IOE_WvQ"],
        "link": null,
        "provider": "bulk",
        "source_id": "KJIMuj2DTM6UpSbdboqYIA",
        "id": "n_ualpEOTLqAW7e6sBPv1w"
    }],
    …
}
```
...
Returns a list of object details.
```json
{
    …
    "objects": [{
        "subscription_ids": ["n_ualpEOTLqAW7e6sBPv1w"],
        "project_id": "q6-ODvBGQby1vxFjckq3eg",
        "type": "default",
        "id": "-RmFf-hkQLq0GX8IOE_WvQ",
        "title": "default"
    }],
    …
}
```
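The subscription and object details reference each other by id (`object_ids` and `subscription_ids`). As an illustration only, a client could resolve those references like this; `object_titles_by_subscription` is a hypothetical helper and assumes both lists are present on the same item.

```python
def object_titles_by_subscription(item):
    """Map each subscription id to the titles of the objects it references."""
    objects = {obj["id"]: obj for obj in item.get("objects", [])}
    result = {}
    for sub in item.get("subscriptions", []):
        result[sub["id"]] = [
            objects[object_id]["title"]
            for object_id in sub.get("object_ids", [])
            if object_id in objects  # skip references that are not returned
        ]
    return result


sample = {
    "subscriptions": [{
        "title": "Upload",
        "object_ids": ["-RmFf-hkQLq0GX8IOE_WvQ"],
        "id": "n_ualpEOTLqAW7e6sBPv1w",
    }],
    "objects": [{
        "subscription_ids": ["n_ualpEOTLqAW7e6sBPv1w"],
        "id": "-RmFf-hkQLq0GX8IOE_WvQ",
        "title": "default",
    }],
}
print(object_titles_by_subscription(sample))
```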
...
A list of files that are uploaded for the item. Note: this is modelled as a list, but only one file can currently be attached.
The fields for individual files are:
- content_url: content:/// URL where the file is stored. This includes the storage bucket name and the path within that bucket. See /wiki/spaces/KB/pages/88187790 for information on storage buckets.
- mime_type: MIME type of the file.
- name: File name without path.
...
This page can now be found at Item Format on the Squirro Docs site.