class squirro_client.item_uploader.ItemUploader
class squirro_client.item_uploader.ItemUploader(token=None, project_id=None, project_title=None, object_id=None, source_name=None, source_ext_id=None, cluster=None, client_cls=None, batch_size=None, config_file=None, config_section=None, processing_config=None, steps_config=None, source_id=None, source_secret=None, pipeline_workflow_name=None, pipeline_workflow_id=None, timeout_secs=None, non_retry_list=[200, 202, 400, 401, 403, 404], **kwargs)
Item uploader class. Defaults are loaded from the .squirrorc file in
the current user’s home directory.
Parameters:
- token – User refresh token.
- project_id – Identifier of the project. Optional, but either
project_id or project_title must be passed in.
- project_title – Title of the project. This will use the first
project found with the given title. If two projects with the same
title exist, which one is used is not predictable.
- object_id – This parameter is deprecated, and is no longer needed.
- source_name – Name of the source to be used. If no source with this
name exists, a new source with this name is created. If more than one
source with this name exists, processing is aborted and can only be
resumed by specifying the source_id of the desired source to load
into.
- source_ext_id – External identifier of the source. If not provided,
defaults to source_name.
- cluster – Cluster to connect to. This only needs to be changed
for on-premise installations.
- batch_size – Number of items to send in one request. This should
be lower than 100 depending on your setup. If set to -1 the optimal
batch size is calculated from the items. Defaults to -1.
- config_file – Configuration file to use, defaults to ~/.squirrorc.
- config_section – Section of the .ini file to use, defaults to
squirro.
- source_id – Source which should be used. If a source with this ID
exists, then no source is created.
- source_secret – This option is deprecated now and is ignored.
- pipeline_workflow_name – Pipeline workflow name. Either the name or
the ID needs to be set.
- pipeline_workflow_id – Pipeline workflow ID.
- non_retry_list – List of status codes for which no retry/backoff
logic is applied. Defaults to [200, 202, 401, 403, 400, 404]:
200, 202: Success codes.
401, 403: Already handled by a retry block in the _perform_request
method.
400, 404: Retrying does not make sense for these codes.
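The role of non_retry_list can be pictured as a simple membership check on the response status code. The following is a minimal sketch of that idea only; should_retry and DEFAULT_NON_RETRY are hypothetical names for illustration and are not part of squirro_client, whose internal logic may differ.

```python
# Hypothetical helper, NOT part of squirro_client: it only illustrates
# the decision that non_retry_list describes.
DEFAULT_NON_RETRY = (200, 202, 400, 401, 403, 404)


def should_retry(status_code, non_retry_list=DEFAULT_NON_RETRY):
    """Return True if a response with this status code may be retried."""
    return status_code not in non_retry_list
```

With the default list, a 503 response would be retried while a 404 would not. Passing a shorter list widens the set of codes that trigger a retry.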
Typical usage:
>>> from squirro_client import ItemUploader
>>> uploader = ItemUploader(project_title='My Project',
... token='<your token>')
>>> items = [{'id': 'squirro-item1',
... 'title': 'Items arrived in Squirro!'}]
>>> uploader.upload(items)
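The batch_size parameter limits how many items are sent in one request. As a rough illustration of fixed-size batching, the sketch below splits a list of items into consecutive chunks. The helper iter_batches is an assumption made for this example and is not the library's implementation; it also does not cover the automatic sizing selected by batch_size=-1, which this page does not describe in detail.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items`, each at most `batch_size` long.

    Hypothetical helper for illustration only; not part of squirro_client.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


items = [{'id': 'squirro-item%d' % i, 'title': 'Item %d' % i}
         for i in range(5)]
batch_lengths = [len(batch) for batch in iter_batches(items, 2)]
# batch_lengths is [2, 2, 1]
```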
Project selection:
The ItemUploader creates a source in your project. The project must
exist before the ItemUploader is instantiated.
Source selection:
The source will be created or re-used. The parameters above define
how the source will be named.
Configuration:
The ItemUploader can load its settings from a configuration file.
The default section is squirro and may be overridden by the parameter
config_section to allow for multiple sources/projects.
Example configuration:
[squirro]
project_id = 2sic33jZTi-ifflvQAVcfw
token = 9c2d1a9002a8a152395d74880528fbe4acadc5a1
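To show how one file can hold settings for several projects, the sketch below parses an .ini text with two sections using Python's standard configparser module. The section name other_project, the placeholder values, and the load_section helper are assumptions for illustration; how ItemUploader reads the file internally is not shown on this page.

```python
import configparser

# Sample file contents. The second section and all placeholder values
# are made up for this example.
SAMPLE = """
[squirro]
project_id = 2sic33jZTi-ifflvQAVcfw
token = <your token>

[other_project]
project_id = <another project id>
token = <another token>
"""


def load_section(text, section="squirro"):
    """Return the key/value pairs of one section as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser[section])


default_settings = load_section(SAMPLE)
other_settings = load_section(SAMPLE, "other_project")
```

Selecting a section by name is the same idea as passing config_section='other_project' when creating the uploader.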
upload
upload(items, priority=0, pipeline_workflow_id=None, num_retries=10, delay=1, backoff=2)
Sends items to Squirro.
Parameters:
- items – A list of items. See
api_reference_sink_data_format for the item format.
- priority – int, the ingestion priority for the dataset to be
loaded. Only the values 0 and 1 are currently supported: 0 means
that the items are loaded asynchronously, 1 means that they are
loaded synchronously.
- pipeline_workflow_id – str, id of an existing pipeline
workflow which should be used to process the current batch of
items. Can only be used with parameter priority set to 1.
- num_retries – int, Number of retries to make when a service is
unavailable.
- delay – int, Initial delay in seconds between retries.
- backoff – int, Backoff multiplier, e.g. value of 2 will double
the delay each retry.
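The interaction of num_retries, delay and backoff can be worked out by hand. The sketch below computes the wait before each retry, assuming the delay is multiplied by backoff after every attempt. That reading follows the parameter descriptions above; retry_delays is a hypothetical helper and not a squirro_client function.

```python
def retry_delays(num_retries=10, delay=1, backoff=2):
    """Return the wait time in seconds before each retry attempt.

    Assumes the first retry waits `delay` seconds and each later retry
    waits `backoff` times longer than the one before it.
    """
    delays = []
    current = delay
    for _ in range(num_retries):
        delays.append(current)
        current *= backoff
    return delays


first_four = retry_delays(num_retries=4)   # [1, 2, 4, 8]
total_default_wait = sum(retry_delays())   # 1023
```

Under this assumption, the defaults add up to 1023 seconds of waiting (about 17 minutes) before the last retry is made, which is worth keeping in mind when choosing num_retries for interactive jobs.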