batch_size – Number of items to send in one request.
batch_size_mb – Maximum total size, in megabytes, of the documents to send in one request. Once the buffered documents reach this size, the client uploads them.
metadata_mapping – A dictionary which contains the meta-data mapping.
default_mime_type_keyword – If set to True, a default keyword containing the mime-type is added to the document.
timeout_secs – How many seconds to wait for data before giving up (default 300).
kwargs – Any additional keyword arguments are passed on to the ItemUploader. See the documentation of that class for details.
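
The interaction between batch_size, batch_size_mb and flush() can be illustrated with a small stand-alone sketch. This is not the library's implementation: the class name BufferSketch, its send callback and the choice to check the limits after each added item are assumptions made purely for illustration.

```python
class BufferSketch:
    """Hypothetical buffer that mimics count- and size-based batching."""

    def __init__(self, batch_size, batch_size_mb, send):
        self.batch_size = batch_size
        self.batch_size_bytes = batch_size_mb * 1024 * 1024
        self.send = send          # callable that receives one batch (a list)
        self._items = []
        self._bytes = 0

    def add(self, name, size_bytes):
        # Buffer the item, then upload if either limit has been reached.
        self._items.append(name)
        self._bytes += size_bytes
        if (len(self._items) >= self.batch_size
                or self._bytes >= self.batch_size_bytes):
            self.flush()

    def flush(self):
        # Send whatever is buffered and reset the counters.
        if self._items:
            self.send(list(self._items))
        self._items = []
        self._bytes = 0


sent = []
buf = BufferSketch(batch_size=2, batch_size_mb=1, send=sent.append)
buf.add('a.pdf', 10)                    # below both limits, stays buffered
buf.add('b.pdf', 10)                    # item count reached, batch is sent
buf.add('big.pdf', 2 * 1024 * 1024)     # size limit reached, sent on its own
buf.flush()                             # nothing left, no extra batch
```

The sketch also shows why a final flush() call matters: anything still below both limits stays in the buffer until it is flushed explicitly.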

Typical usage:

>>> from squirro_client import DocumentUploader
>>> import os
>>> uploader = DocumentUploader(
...     project_title='My Project', token='<your token>',
...     cluster='https://demo.squirro.net/')
>>> uploader.upload(os.path.expanduser('~/Documents/test.pdf'))
>>> uploader.flush()

Meta-data mapping usage:

By default (i.e. for all document mime-types) map the original document size to a keyword field named “Doc Size”:

>>> mapping = {'default': {'sq:size_orig': 'Doc Size',
...                        'sq:content-mime-type': 'Mime Type'}}
>>> uploader = DocumentUploader(metadata_mapping=mapping)

For a specific mime-type (e.g. ‘application/vnd.oasis.opendocument.text’) map the “meta:word-count” meta-data field value to a keyword field named “Word Count”:

>>> mapping = {'application/vnd.oasis.opendocument.text': {
...     'meta:word-count': 'Word Count'}}
>>> uploader = DocumentUploader(metadata_mapping=mapping)

Default meta-data fields available for mapping usage:

sq:doc_size: Converted document file size.
sq:doc_size_orig: Original uploaded document file size.
sq:content-mime-type: Document mime-type specified during upload operation.
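
To make the shape of the mapping dictionary concrete, the following sketch shows one way such a mapping could be turned into keywords. It is an illustration only: the function apply_metadata_mapping is hypothetical, and the assumption that mime-type-specific rules are merged on top of the 'default' rules is not confirmed by this reference.

```python
def apply_metadata_mapping(mapping, mime_type, metadata):
    """Hypothetical helper: build keywords from extracted meta-data.

    Assumes the 'default' rules apply to every document and that rules
    for the document's own mime-type are merged on top of them.
    """
    rules = dict(mapping.get('default', {}))
    rules.update(mapping.get(mime_type, {}))

    keywords = {}
    for field, keyword_name in rules.items():
        if field in metadata:
            # Keyword values are lists of strings.
            keywords[keyword_name] = [str(metadata[field])]
    return keywords


mapping = {
    'default': {'sq:content-mime-type': 'Mime Type'},
    'application/vnd.oasis.opendocument.text': {
        'meta:word-count': 'Word Count'},
}
metadata = {
    'sq:content-mime-type': 'application/vnd.oasis.opendocument.text',
    'meta:word-count': 120,
}
keywords = apply_metadata_mapping(
    mapping, 'application/vnd.oasis.opendocument.text', metadata)
```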

`upload`

upload(filename, mime_type=None, title=None, doc_id=None, keywords=None, link=None, created_at=None, filename_encoding=None, content_url=None)

Creates a Squirro item from the provided file and queues it for upload. Items are buffered internally and uploaded according to the specified batch size. If mime_type is not provided, it is looked up from the filename extension.

Parameters:

filename – Read content from the provided filename.
mime_type – Optional mime-type for the provided filename.
title – Optional title for the uploaded document.
doc_id – Optional external document identifier.
keywords – Optional dictionary of document meta-data keywords. All values must be lists of strings.
link – Optional URL which points to the origin document.
created_at – Optional document creation date and time.
filename_encoding – Encoding of the filename.
content_url – Storage URL of this file. If this is set, the Squirro cluster will not copy the file.
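
When mime_type is omitted, the type is derived from the filename extension. A comparable lookup can be sketched with Python's standard mimetypes module; whether the client uses this module internally is an assumption, and guess_mime_type and its fallback value are hypothetical names chosen for the example.

```python
import mimetypes


def guess_mime_type(filename, default='application/octet-stream'):
    """Hypothetical helper: extension-based mime-type lookup."""
    # guess_type returns a (type, encoding) tuple; type is None if unknown.
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type or default


pdf_type = guess_mime_type('test.pdf')
unknown_type = guess_mime_type('notes.zzzunknown')
```

Passing mime_type explicitly to upload() avoids relying on the extension when filenames are unreliable.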

Example:

>>> filename = 'test.pdf'
>>> mime_type = 'application/pdf'
>>> title = 'My Test Document'
>>> doc_id = 'doc01'
>>> keywords = {'Author': ['John Smith'], 'Tags': ['sales',
...                                                'marketing']}
>>> link = 'http://example.com/test.pdf'
>>> created_at = '2014-07-10T21:26:15'
>>> uploader.upload(filename, mime_type, title, doc_id, keywords,
...                 link, created_at)
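
The keywords and created_at arguments in the example above can be prepared programmatically. The sketch below is a minimal illustration: check_keywords is a hypothetical helper, not part of the client, and it only enforces the documented rule that every keyword value is a list of strings.

```python
from datetime import datetime


def check_keywords(keywords):
    """Hypothetical helper: ensure every value is a list of strings."""
    for name, values in keywords.items():
        if not isinstance(values, list) or not all(
                isinstance(value, str) for value in values):
            raise ValueError(
                'Keyword %r must be a list of strings' % name)
    return keywords


keywords = check_keywords({'Author': ['John Smith'],
                           'Tags': ['sales', 'marketing']})

# Build the timestamp string in the same format as the example above.
created_at = datetime(2014, 7, 10, 21, 26, 15).isoformat()
```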

`flush`

flush()

Flush the internal buffer by uploading all documents.

Info
The DocumentUploader Reference has now been moved to the Developer Documentation and can be accessed there.
