class squirro_client.item_uploader.ItemUploader
class squirro_client.item_uploader.ItemUploader(token=None, project_id=None, project_title=None, object_id=None, source_name=None, source_ext_id=None, cluster=None, client_cls=None, batch_size=None, config_file=None, config_section=None, processing_config=None, steps_config=None, source_id=None, source_secret=None, pipeline_workflow_name=None, pipeline_workflow_id='default', timeout_secs=None, **kwargs)
Item uploader class. Defaults are loaded from the .squirrorc file in the current user’s home directory.
Parameters:
- token – User refresh token.
- project_id – Identifier of the project. Optional, but one of project_id or project_title has to be passed in.
- project_title – Title of the project. The first project found with the given title is used. If two projects with the same title exist, the project being used is not predictable.
- object_id – Identifier of the object.
- source_name – Name of the source.
- source_ext_id – External identifier of the source. Defaults to source_name if not provided.
- cluster – Cluster to connect to. This only needs to be changed for on-premise installations.
- batch_size – Number of items to send in one request. This should be lower than 100, depending on your setup. If set to -1, the optimal batch size is calculated from the items. Defaults to -1.
- bulk_index – If set to True, the cluster is instructed to index data in bulk.
- bulk_index_add_batch_identifier – If set to True, the cluster is instructed to add a batch identifier to each item during bulk indexing.
- bulk_index_add_summary_from_body – If set to True, the cluster is instructed to add the summary from the body during bulk indexing.
- config_file – Configuration file to use. Defaults to ~/.squirrorc.
- config_section – Section of the .ini file to use. Defaults to squirro.
- processing_config – A dictionary of specific instructions used while processing items for the source. Overridden by pipeline workflow name/ID.
- source_id – Source which should be used. If passed in together with source_secret, no source is created.
- source_secret – Source secret to be used with source_id. If passed in together with source_id, no source is created.
- pipeline_workflow_name – Pipeline workflow name. Either the name or the ID needs to be set, otherwise processing_config is used.
- pipeline_workflow_id – Pipeline workflow ID.
Typical usage:
>>> from squirro_client import ItemUploader
>>> uploader = ItemUploader(project_title='My Project',
... token='<your token>')
>>> items = [{'id': 'squirro-item1',
... 'title': 'Items arrived in Squirro!'}]
>>> uploader.upload(items)
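The effect of batch_size can be pictured with a small sketch. The helper below is purely illustrative, not ItemUploader's internal implementation: it shows how a fixed batch_size splits an item list into request-sized chunks.

```python
# Illustrative only: a sketch of how a fixed batch_size splits items
# into separate upload requests. This is NOT ItemUploader's internal code.

def split_into_batches(items, batch_size):
    """Yield successive chunks of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

items = [{'id': 'item-%d' % i, 'title': 'Item %d' % i} for i in range(5)]
batches = list(split_into_batches(items, 2))
# With 5 items and batch_size=2, three requests would be sent: 2 + 2 + 1 items.
```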
Project selection:
The ItemUploader creates a source in your project. The project must exist before the ItemUploader is instantiated.
Source selection:
The source is created or re-used; the parameters above define how the source is named.
Configuration:
The ItemUploader can load its settings from a configuration file. The default section is squirro and may be overridden with the config_section parameter to allow for multiple sources/projects.
Example configuration:
[squirro]
project_id = 2sic33jZTi-ifflvQAVcfw
token = 9c2d1a9002a8a152395d74880528fbe4acadc5a1
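As a sketch of what reading such a file amounts to, the snippet below parses an equivalent configuration with Python's standard configparser. This is an illustration only, not ItemUploader's actual loading code, and the values are the placeholders from the example above.

```python
# Illustrative only: parse a squirro-style .ini section with configparser.
# ItemUploader handles this internally; this just shows the file's shape.
import configparser

config_text = """
[squirro]
project_id = 2sic33jZTi-ifflvQAVcfw
token = 9c2d1a9002a8a152395d74880528fbe4acadc5a1
"""

parser = configparser.ConfigParser()
parser.read_string(config_text)

section = parser['squirro']
project_id = section['project_id']
token = section['token']
```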
upload
upload(items)
Sends items to Squirro.
Parameters:
- items – A list of items. See api_reference_sink_data_format for the item format.
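A minimal sketch of preparing items for upload, using only the id and title fields that appear in the typical-usage example above; any further fields are governed by the sink data format referenced above and are not assumed here.

```python
# Build a list of items in the shape used by the typical-usage example.
# Only 'id' and 'title' appear in this document; other fields follow
# the sink data format and are not assumed here.
items = [
    {'id': 'squirro-item1', 'title': 'Items arrived in Squirro!'},
    {'id': 'squirro-item2', 'title': 'A second item'},
]

# With a configured uploader this would send the items
# (commented out so the sketch stays self-contained):
# uploader.upload(items)
```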