When building dataloader plugins, it can be useful to store key-value pairs, either long term (for state management) or temporarily (for caching). For this purpose, two stores are made available to the DataSource class.

Features

This API provides you with:

  • Ability to temporarily store key-value pairs in the scope of a dataloader plugin, e.g. for de-duplication, throttling, or caching.
  • Ability to “permanently” store key-value pairs, e.g. for custom state management in dataloader plugins.
  • Redis and file-system backed implementations, in particular to ensure that these APIs work in command line mode on a Windows machine.
  • Sane defaults for the command line (a file-backed implementation rather than a Redis-based one) for client machines where Redis might not be available.
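To illustrate what a file-system backed store involves, here is a minimal sketch of a dict-like store that keeps each key in its own JSON file. `FileSystemKeyValueStore` is a hypothetical name and this is not Squirro's implementation; the sketch only assumes that the stores behave like a Python mapping, as the `get()` and `[...]` usage on this page suggests.

```python
import hashlib
import json
import os
import tempfile
from collections.abc import MutableMapping


class FileSystemKeyValueStore(MutableMapping):
    """Hypothetical dict-like store keeping each key in its own JSON file.

    Illustrative only; this is not Squirro's implementation.
    Values must be JSON-serializable.
    """

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        # Hash the key so arbitrary strings map to safe file names.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, digest + ".json")

    def __getitem__(self, key):
        try:
            with open(self._path(key), encoding="utf-8") as f:
                return json.load(f)["value"]
        except FileNotFoundError:
            raise KeyError(key) from None

    def __setitem__(self, key, value):
        # Write to a temporary file first, then move it into place, so a
        # crash mid-write leaves the previous entry untouched.
        fd, tmp_path = tempfile.mkstemp(dir=self.directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump({"key": key, "value": value}, f)
        os.replace(tmp_path, self._path(key))

    def __delitem__(self, key):
        try:
            os.remove(self._path(key))
        except FileNotFoundError:
            raise KeyError(key) from None

    def __iter__(self):
        # Only *.json files are entries; temporary files have no suffix.
        for name in os.listdir(self.directory):
            if name.endswith(".json"):
                path = os.path.join(self.directory, name)
                with open(path, encoding="utf-8") as f:
                    yield json.load(f)["key"]

    def __len__(self):
        return sum(1 for _ in self)
```

Because the class derives from `MutableMapping`, `get()`, `in` and `del` come for free, and the data survives process restarts since it lives on disk rather than in a separate Redis server.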

Using the stores

The two stores are made available to the data loader plugin as self.key_value_cache and self.key_value_store.

Example of using the stores (shown for key_value_cache; key_value_store works the same way):

```python
# Retrieve a value
my_value = self.key_value_cache.get('my_key')

# Set a value
self.key_value_cache['my_key'] = 'hello world'
```

Storing the state (key_value_store)

key_value_store can be thought of as the permanent store of information. Use it for application-critical data.

The most common use case for key_value_store is preserving the state of the dataloader's last run. This is especially critical for long-running data loading jobs that index thousands of items. Losing track of what was last loaded can be costly, both in the time spent reloading and in the potential cost of extra API calls (if you have a paid subscription with the data provider).

The data written into key_value_store is kept until the user explicitly clears the data. Resetting a data source (using the "Reset" option in the user interface or --reset on the command line) will achieve this.
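The resume-from-last-run pattern described above might look like the following sketch. `HypotheticalLoader`, its `load` method and the `last_loaded_id` key are illustrative names, not part of the Squirro API; the sketch assumes items carry monotonically increasing integer IDs, and a plain dict stands in for `self.key_value_store` so the example runs on its own.

```python
class HypotheticalLoader:
    """Stand-in for a dataloader plugin; key_value_store is any dict-like store."""

    def __init__(self, key_value_store):
        self.key_value_store = key_value_store

    def load(self, items):
        # Resume after the last ID recorded by a previous run (0 on the first run).
        last_id = self.key_value_store.get("last_loaded_id") or 0
        loaded = []
        for item in sorted(items, key=lambda i: i["id"]):
            if item["id"] <= last_id:
                continue  # already loaded by an earlier run
            loaded.append(item)
            # Record progress after each item so an interrupted run can
            # resume close to where it stopped.
            self.key_value_store["last_loaded_id"] = item["id"]
        return loaded
```

Since resetting the data source clears key_value_store, the next run after a reset starts from the beginning again.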

Caching responses (key_value_cache)

key_value_cache, on the other hand, is intended for data that is useful and time-saving to preserve, but not critical.

A common example is a loader that repeatedly asks a server for metadata. Suppose this metadata changes on average only every few days. Instead of fetching it with every call, you can cache the response and reuse it.
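A minimal cache-aside sketch of that idea follows. `get_metadata`, the `fetch` callable and the `metadata:` key prefix are hypothetical; `cache` is assumed to behave like `self.key_value_cache`, with `get()` returning `None` on a miss. Values are stored as JSON strings as a conservative assumption, in case a backend only accepts strings.

```python
import json


def get_metadata(cache, fetch, source_id):
    """Return metadata for source_id, calling fetch() only on a cache miss."""
    key = "metadata:" + source_id
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: no server round trip
    metadata = fetch(source_id)  # cache miss: ask the server once
    cache[key] = json.dumps(metadata)
    return metadata
```

Inside a plugin this would be called as `get_metadata(self.key_value_cache, fetch, source_id)`; the expiry of the cached entry is then left to the cache's TTL.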

Data is kept for its TTL (time to live), which defaults to one week. However, it may be evicted earlier: once a specific memory threshold is reached, Squirro evicts entries using an LRU (least recently used) algorithm.
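To make the combined TTL and LRU semantics concrete, here is a toy in-memory cache. It is not how Squirro implements key_value_cache: `TTLLRUCache` is a hypothetical class, and the memory threshold is modelled as a maximum entry count. The clock is injectable so expiry can be demonstrated without waiting.

```python
import time
from collections import OrderedDict


class TTLLRUCache:
    """Toy cache: entries expire after a TTL and the least recently used
    entry is evicted once max_entries is exceeded."""

    def __init__(self, max_entries, ttl_seconds=7 * 24 * 3600, clock=time.monotonic):
        self.max_entries = max_entries
        self.ttl = ttl_seconds  # default: one week
        self.clock = clock
        # key -> (expires_at, value), ordered from least to most recently used
        self._data = OrderedDict()

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._data[key]  # expired: behave like a miss
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def __setitem__(self, key, value):
        self._data[key] = (self.clock() + self.ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict the least recently used entry
```

An entry can therefore disappear for two independent reasons: its TTL elapsed, or it was the least recently used entry when space ran out. A loader should always be prepared to refetch.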

Store methods

The following methods are available on both stores:

...

Remove the given key. Example:

del self.key_value_cache[key]
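Deleting is useful for invalidating a cached entry you know is stale, for example after the server reports that its metadata changed. In the sketch below, `invalidate` is a hypothetical helper and a plain dict stands in for key_value_cache. It checks with `get()` first because this page does not say how the stores behave when deleting a missing key (a plain dict raises `KeyError`).

```python
def invalidate(cache, key):
    """Remove key from a dict-like cache if present; return True if removed.

    Note: an entry whose stored value is None is treated as absent.
    """
    if cache.get(key) is None:
        return False
    del cache[key]
    return True
```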

Backend configuration

If you run the Squirro data loader in command line mode, you can pass the options below to the squirro_data_load tool to control which implementation backs the key-value stores. In command line mode, the implementation backend (store_backend) defaults to filesystem.

Key-value store options:

```
--store-backend {filesystem,redis}
--store-directory FILESYSTEM_STORE_DIRECTORY
--redis-key-value-store-host REDIS_STORE_SERVER
--redis-key-value-store-port REDIS_STORE_PORT
--redis-key-value-store-password REDIS_STORE_PASSWORD
--redis-key-value-store-db REDIS_STORE_DATABASE
```

Key-value cache options:

```
--cache-backend {filesystem,redis}
--cache-directory FILESYSTEM_CACHE_DIRECTORY
--redis-key-value-cache-host REDIS_CACHE_SERVER
--redis-key-value-cache-port REDIS_CACHE_PORT
--redis-key-value-cache-password REDIS_CACHE_PASSWORD
--redis-key-value-cache-db REDIS_CACHE_DATABASE
```
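If you launch squirro_data_load from a script, the Redis options can be assembled programmatically. The sketch uses only the flags listed above; `redis_backend_args` is a hypothetical helper, and the other arguments squirro_data_load needs for your data source are deliberately left out.

```python
def redis_backend_args(kind, host, port, db, password=None):
    """Build the Redis backend flags for the key-value "store" or "cache"."""
    if kind not in ("store", "cache"):
        raise ValueError("kind must be 'store' or 'cache'")
    prefix = "--redis-key-value-" + kind
    args = [
        "--" + kind + "-backend", "redis",
        prefix + "-host", host,
        prefix + "-port", str(port),
        prefix + "-db", str(db),
    ]
    if password is not None:
        args += [prefix + "-password", password]
    return args
```

For example, `redis_backend_args("store", "localhost", 6379, 0) + redis_backend_args("cache", "localhost", 6379, 1)` selects Redis for both stores, using separate databases. Passwords given as command-line arguments may be visible to other users in process listings, so the dataloader.ini file may be a better place for them.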

The same options can also be set in the dataloader.ini file in your ~/.squirro/ folder, as shown below.

...

This page can now be found at API for Caching and Custom State Management on the Squirro Docs site.