This page describes how to build a custom data loader plugin for data formats and inputs that are not supported out of the box.
Prerequisites
Follow the steps outlined here: Data Loader Tutorial#Setup.
Before installing the Squirro Toolbox package, it is strongly recommended to create a Python virtual environment to work in, so that the installed packages stay isolated.
Introduction
Create a new Python file for each new data loader plugin. The Data loader plugin boilerplate template can be used to get started.
SDK reference
The plugin is implemented as a subclass of the DataSource class. A number of methods must be implemented to provide the intended functionality. These methods are all documented in DataSource Class.
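As a rough illustration of the shape such a plugin takes, the sketch below defines a hypothetical ListSource loader. To stay runnable on its own, it uses a stand-in DataSource base class rather than the one from the Squirro Toolbox. The method names (connect, disconnect, getDataBatch, getSchema, getJobId) and their arguments are assumptions and should be checked against DataSource Class before use.

```python
# Stand-in for the real base class so this sketch runs on its own.
# In a real plugin, use the DataSource class from the Squirro Toolbox instead.
class DataSource:
    def __init__(self):
        self.args = None


class ListSource(DataSource):
    """Hypothetical loader that serves rows from an in-memory list."""

    ROWS = [
        {"id": "1", "title": "First item"},
        {"id": "2", "title": "Second item"},
        {"id": "3", "title": "Third item"},
    ]

    def connect(self, inc_column=None, max_inc_value=None):
        # Open files or network connections here.
        self.rows = list(self.ROWS)

    def disconnect(self):
        # Release whatever connect() acquired.
        self.rows = []

    def getDataBatch(self, batch_size):
        # Yield lists of at most batch_size rows.
        for start in range(0, len(self.rows), batch_size):
            yield self.rows[start:start + batch_size]

    def getSchema(self):
        # Field names that are available for mapping.
        return sorted(self.rows[0].keys()) if self.rows else []

    def getJobId(self):
        # Stable identifier for this load job.
        return "list-source-demo"


source = ListSource()
source.connect()
batches = list(source.getDataBatch(2))
schema = source.getSchema()
source.disconnect()
```

The boilerplate template remains the better starting point for a real plugin. This sketch only shows how the pieces relate to each other.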
Frontend-compatible loaders
Uploading
To make a data loader plugin available to users in the user interface, upload it to the server with the squirro_asset command line tool.
See the squirro_asset Command Line Reference for full details. In short, a data loader plugin can be uploaded as follows:
squirro_asset dataloader_plugin upload --folder pubmed --token %TOKEN% --cluster %CLUSTER%
Preview
Apart from technical implementation differences between command line and frontend data loading, which are not visible to users, the main consideration when writing a UI-compatible loader is the preview mode.
Preview mode is a UI feature that lets the user peek at the data before it is ingested into the system. It shows the first 10 items. For most use cases this presents no difficulties, but in a few cases it might result in data loss.
See Data loader plugin preview for details.
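The source does not list the cases in which data loss can occur. One plausible example, given here as an assumption, is a loader whose reads are destructive, such as one that consumes items from a queue. The sketch below uses a hypothetical read_items generator and a hypothetical preview helper to show how taking the first 10 items would remove them before the real load starts.

```python
from itertools import islice

PREVIEW_LIMIT = 10  # the preview shows the first 10 items


def read_items(queue):
    """Hypothetical source: reading an item removes it from the queue."""
    while queue:
        yield queue.pop(0)


def preview(items, limit=PREVIEW_LIMIT):
    # Take at most `limit` items, then stop reading.
    return list(islice(items, limit))


queue = [{"id": n} for n in range(25)]
previewed = preview(read_items(queue))  # consumes ids 0 to 9
loaded = list(read_items(queue))        # the real load only sees the rest
```

A loader of this kind would need to avoid the destructive step while a preview is running.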
Caching & Data storage
Data loader plugins often need to cache information or store progress information. Two types of stores are available inside a data loader plugin for these purposes:
key_value_cache
key_value_store
Both are covered in Data loader API for Caching and Custom State Management.
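As a minimal sketch of how the two stores might be used together, the example below lets plain Python dicts stand in for key_value_cache and key_value_store. It assumes the cache holds values that can be recomputed and the store holds progress that must survive between runs. The IncrementalLoader class, its methods, and the dict-style access are hypothetical; the actual interface is described in Data loader API for Caching and Custom State Management.

```python
class IncrementalLoader:
    """Hypothetical loader; plain dicts stand in for the two stores."""

    def __init__(self, key_value_cache, key_value_store):
        self.key_value_cache = key_value_cache  # values that can be recomputed
        self.key_value_store = key_value_store  # progress that must persist
        self.lookups = 0

    def resolve_author(self, author_id):
        # Cache an expensive lookup so repeated ids are resolved only once.
        key = f"author:{author_id}"
        if key not in self.key_value_cache:
            self.lookups += 1
            self.key_value_cache[key] = f"Author {author_id}"
        return self.key_value_cache[key]

    def load(self, rows):
        # Resume after the last row id recorded by an earlier run.
        last_id = self.key_value_store.get("last_id", 0)
        result = []
        for row in rows:
            if row["id"] <= last_id:
                continue
            result.append({**row, "author": self.resolve_author(row["author_id"])})
            self.key_value_store["last_id"] = row["id"]
        return result


cache, store = {}, {}
rows = [{"id": 1, "author_id": 7}, {"id": 2, "author_id": 7}]
first = IncrementalLoader(cache, store).load(rows)

# A later run sees one new row and reuses the cached author lookup.
second_loader = IncrementalLoader(cache, store)
second = second_loader.load(rows + [{"id": 3, "author_id": 7}])
```

The second run skips the rows that were already loaded because the last row id was kept in the store, and it performs no new author lookups because the result was kept in the cache.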