When executing a search, Squirro will show the user a list of matching items as the query result.
When planning and integrating a custom data source, consider the following points:
What is the smallest independent result entity that the user should be consuming? These should then be modeled as Squirro items.
The formatting of the body content and title.
Some examples of individual items are:
News story, web article, tweet, etc.
Binary document (PDF, Office documents, etc.)
Items can also contain sub-items, which are always shown in the context of the full item. By default, Squirro uses sub-items to index the individual pages of PDF documents separately.
The fields in this table are used in both the data loading and data consumption APIs.
Keyword values can have different data types. See the Data Types section of the Facets documentation for details and the format specification. The default data type is string. To use another data type, configure it before loading any data into the system. See Facets API for information.
Entities attached to the item. See the documentation on the Catalyst Data Model for the data structure of individual entities.
When importing data into Squirro, at least one of the fields title, body or files must be set. All other fields are optional.
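The requirement above can be checked on the client side before an item is sent for loading. The following is a minimal sketch, not part of the Squirro API: the function name has_required_content is hypothetical, and treating an empty string or an empty list as "not set" is an assumption.

```python
REQUIRED_ANY = ("title", "body", "files")


def has_required_content(item: dict) -> bool:
    """Return True if at least one of title, body or files is set.

    Assumption: empty strings and empty lists count as "not set".
    """
    return any(item.get(field) for field in REQUIRED_ANY)


# Hypothetical items, for illustration only.
valid_item = {"title": "Quarterly report"}
invalid_item = {"summary": "Only a summary, no title, body or files"}
```

Such a check only catches obviously incomplete items early; the server-side validation remains authoritative.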
Data Loading Fields
These fields can be specified in the data loading APIs. They will be transformed and output with different names in the data consumption APIs.
External item identifier. When a value is specified here at import, it is written into the external_id data consumption field.
Used by data providers to reference their source system. Squirro uses this identifier for deduplication.
Item summary text. If not specified, this is generated from the body field. Any HTML tags are removed.
Main item picture. If this URL exists and can be downloaded, the image is archived by Squirro. The resulting URL is written into the webshot_url field. The picture width and height are calculated and written into webshot_width and webshot_height.
Note: processing of webshots is disabled by default for custom data imports (bulk provider).
The MIME type of the body. Set to text/html for HTML bodies. For all other types, a conversion to HTML will be attempted. If not specified, the MIME type is auto-detected. See the Content Conversion step for more details.
List of dictionaries.
A list of files that are uploaded for the item. Note: this is modelled as a list, but only one file can currently be attached.
The fields for individual files are:
content: Base64-encoded content of the file to upload. Either this or the url field is mandatory.
url: URL from which the file can be downloaded. Either this or the content field is mandatory.
name: File name without path. Mandatory when the content field is provided. If the url field is provided, the name is derived from the URL by default.
Note that this field also exists at consumption, but with a different layout. See below.
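The rules for the files field can be sketched as a small helper that builds one file entry. This is an illustrative sketch only: build_file_entry is a hypothetical helper, not a Squirro function, and the example file name and content are made up. Only the key names content, url and name come from the description above.

```python
import base64
from typing import Optional


def build_file_entry(
    name: Optional[str] = None,
    content: Optional[bytes] = None,
    url: Optional[str] = None,
) -> dict:
    """Build one dictionary for the files list of an item."""
    if content is None and url is None:
        raise ValueError("either content or url must be provided")
    if content is not None and not name:
        raise ValueError("name is mandatory when content is provided")

    entry = {}
    if content is not None:
        # The content field carries the Base64-encoded file bytes.
        entry["content"] = base64.b64encode(content).decode("ascii")
    if url is not None:
        entry["url"] = url
    if name:
        # Without path; optional for url entries, where it is derived from the URL.
        entry["name"] = name
    return entry


# files is a list, but only one file can currently be attached.
item = {
    "title": "Example document",
    "files": [build_file_entry(name="example.txt", content=b"hi")],
}
```

Passing the raw bytes and encoding them inside the helper keeps the Base64 step in one place, so callers cannot accidentally submit unencoded content.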
Data Consumption Fields
Some fields are only available during data consumption because they are calculated on the fly or represent a user state. This table documents these fields.