The data loader frontend requests a preview from Data Loader Plugins. This preview is used to show the record values to users, so they can decide how to apply the item field and facet mapping.

While preview handling is automatic and uses the getDataBatch method of the DataSource Class, there are some additional considerations that should be applied.

Preview mode

When requesting a preview, the data source class has self.preview_mode set to True. This can be used to change the behaviour.

Considerations

Response size

The preview data is sent in full to the browser. As a result, returning large responses, such as file content, should be avoided.

For example, a loader that returns file content might replace that content in each row as follows:

…
if self.preview_mode:
    row['file_content'] = 'BINARY…'
…
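
The same idea can be generalised into a small helper that shrinks any oversized value before a row is sent to the browser. This is a minimal sketch, not part of the Squirro API: the function name shrink_for_preview, the 200-character limit, and the sample row are all assumptions made for illustration.

```python
PREVIEW_MAX_CHARS = 200  # assumed limit for illustration, not a Squirro setting


def shrink_for_preview(row, max_chars=PREVIEW_MAX_CHARS):
    """Return a copy of row with large values replaced by short placeholders."""
    small = {}
    for key, value in row.items():
        if isinstance(value, (bytes, bytearray)):
            # Binary payloads are of no use in the preview; report the size only.
            small[key] = f"<binary, {len(value)} bytes>"
        elif isinstance(value, str) and len(value) > max_chars:
            # Keep the beginning so users can still recognise the field.
            small[key] = value[:max_chars] + "…"
        else:
            small[key] = value
    return small


# Hypothetical row as a loader might produce it
row = {"title": "Report", "file_content": b"\x00" * 5000, "body": "x" * 1000}
preview_row = shrink_for_preview(row)
```

Inside a data source class, such a helper would be applied only when self.preview_mode is true, so that the real load still receives the full values.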

Incremental loading

Even though the data loader handles incremental loading out of the box, some plugins apply their own logic on top of that. For example, a loader might ensure that each result is returned only once. Where such logic is present, it must be disabled in preview mode.

Example:

  1. Imagine that the purpose of your loader is to first retrieve a list of articles (metadata only) and then fetch the content of each PDF article, which may amount to many megabytes. Consider this loader to be a long-running job.

  2. To avoid downloading articles twice, the loader keeps track of which articles have already been downloaded to disk (for example by using Redis, see Dataloader API for Caching and Custom State Management).

  3. If the preview actually downloads articles, say the first 10, they are marked as downloaded. Note that the preview will also be extremely slow if it needs to download all the data.

  4. During the main load (post-preview), these 10 articles are then skipped, since they have already been marked as downloaded.

  5. Because previewed items are never ingested into Squirro, you effectively lose the content of these 10 items.
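
The scenario above can be avoided by bypassing the "already downloaded" bookkeeping whenever preview mode is active. The following is a minimal sketch under stated assumptions: ArticleLoader is a stand-in class rather than the real Squirro base class, get_rows and fetch_content are hypothetical names, and a plain Python set takes the place of a Redis-backed state store.

```python
class ArticleLoader:
    """Stand-in for a data source class; not the real Squirro base class."""

    def __init__(self, preview_mode=False, downloaded=None):
        self.preview_mode = preview_mode
        # Hypothetical state store; a real plugin might keep this in Redis.
        self.downloaded = downloaded if downloaded is not None else set()

    def fetch_content(self, article_id):
        # Placeholder for the expensive PDF download.
        return f"content of {article_id}"

    def get_rows(self, article_ids):
        for article_id in article_ids:
            row = {"id": article_id}
            if self.preview_mode:
                # Preview: skip the download and leave the state untouched.
                row["content"] = "(not fetched in preview)"
                yield row
                continue
            if article_id in self.downloaded:
                continue  # already handled by an earlier real load
            row["content"] = self.fetch_content(article_id)
            self.downloaded.add(article_id)
            yield row


state = set()  # shared between runs, as persistent state would be
ids = ["a1", "a2", "a3"]
preview_rows = list(ArticleLoader(preview_mode=True, downloaded=state).get_rows(ids))
first_load = list(ArticleLoader(downloaded=state).get_rows(ids))
second_load = list(ArticleLoader(downloaded=state).get_rows(ids))
```

In this sketch the preview returns all three rows without touching the state, the first real load still downloads every article, and only the second real load skips them.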

Expensive queries

In preview mode, ensure that the query to the source system cannot time out. One common way to improve performance is to request fewer records than you would for the real load.

This page can now be found at Data Loader Plugin Preview on the Squirro Docs site.
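
As an illustration of the advice on expensive queries, a loader can pass a much smaller limit to the source system when it runs in preview mode. This is only a sketch: query_source, get_batches, and the limit of 10 records are hypothetical and stand in for whatever query mechanism the plugin actually uses.

```python
def query_source(limit):
    """Hypothetical stand-in for a query against the source system."""
    return [{"id": n} for n in range(1, limit + 1)]


def get_batches(preview_mode, batch_size=1000, preview_limit=10):
    # In preview mode, ask the source for far fewer records than in a real load.
    limit = min(batch_size, preview_limit) if preview_mode else batch_size
    yield query_source(limit)


preview_batch = next(get_batches(preview_mode=True))
full_batch = next(get_batches(preview_mode=False))
```

Passing the limit into the query itself, rather than truncating the result afterwards, is what keeps the source system from doing the expensive work in the first place.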