Page Comparison

...

Excerpt
This section shows all the available options for the Data Loader and explains their usage.

Argument	Mandatory	Description
--csv-delimiter CHARACTER			A one-character string used to separate fields. It defaults to ','.
--csv-quotechar CHARACTER			A one-character string used to quote fields containing special characters, such as the delimiter or quotechar, or which contain new-line characters. It defaults to ‘"’.
--source-file PATH	Yes	Path of csv data file.

...

The following example shows a simple load from a CSV file, mapping the title, id and body of the squirro item to columns from the ‘sample.csv’ file, without using any of the additional files for facets, templating, piplets etc. All the rows will be loaded and the delimiter between fields is considered any ‘,’ (comma) found in a row. To quote fields containing special characters the double quote character ‘"’ will be used.

...

Code Block

language	powershell

squirro_data_load -v ^
    --token $token ^
    --cluster $cluster ^
    --project-id $project_id ^
    --source-name csv_sample ^
    --source-file sample.csv ^
    --source-type csv ^
    --csv-delimiter , ^
    --csv-quotechar " ^
    --map-title Title ^
    --map-id ID ^
    --map-body Description

Note that the lines have been wrapped with the circumflex (^) at the end of each line. On Mac and Linux you will need to use backslash (\) instead.

This example assumes that $token, $cluster and $project_id have been declared beforehand.

Excel Source Options

When using an excel file as the data source, only full load is supported and data must have a header in order to determine the schema. If the first row of the data (after applying the boundaries, if needed) is not the header, a KeyError exception will be raised and the job will stop. In this case it’s not possible to determine the schema of the data.

The command line parameters used for an excel data source:

Argument	Mandatory	Description
--excel-sheet STRING			Excel sheet name. Default: get first sheet.
--excel-boundaries NUMBER: NUMBER		Limit rows loaded from excel. Format is: start_row:rows_discarded_from_end.
--source-file PATH	Yes	Path of excel data file.

Usage

The example below shows a simple load from an excel file, mapping only the title, id and body of the Squirro item to columns from the ‘sample.xlsx’ excel file, without using any of the additional files for facets, templating, piplets etc. The Data Loader tool will only load the ‘Products’ sheet of the file and from this sheet the rows starting at 1 up to the last 100 rows, which will not be loaded.

Code Block

language	powershell

squirro_data_load -v ^
    --token $token ^
    --cluster $cluster ^
    --project-id $project_id ^
    --source-name excel_sample ^
    --source-file sample.xlsx ^
    --source-type excel ^
    --excel-sheet Products ^
    --excel-boundaries 1:100 ^
    --map-title Title ^
    --map-id ID ^
    --map-body Description

Note that the lines have been wrapped with the circumflex (^) at the end of each line. On Mac and Linux you will need to use backslash (\) instead.

This example assumes that $token, $cluster and $project_id have been declared beforehand.

Database Options

When loading from a database, both full and incremental load are supported, using a select query supplied as a string or in a file. The script uses uses SQLAlchemy to connect to any database.

Tested databases:

Postgres and all databases using the postgres driver for connection (Greenplum, Redshift etc)
Microsoft SQL
Oracle
MySQL
SQLite

The command line parameters used for a database source:

Argument	Mandatory	Description
--db-connection STRING	Yes	Database connection string.
--input-file PATH		File containing the SQL code.
--input-query STRING			SQL query.

Note that the --input-file and --input-query arguments are mutually exclusive.

Usage

In the following example we are performing a simple load from the database, mapping the title, id and body of the Squirro item to columns from a database table that is interrogated in the sample.sql file. The Data Loader tool makes a full load of all the rows specified in the sample.sql file since the argument --incremental-column is not set.

Code Block

language	powershell

squirro_data_load -v ^
    --token $token ^
    --cluster $cluster ^
    --project-id $project_id ^
    --db-connection $db_connection_string ^
    --source-name db_sample ^
    --input-file $script_dir/interaction.sql ^
    --source-type database ^
    --map-title Title ^
    --map-id ID ^
    --map-body Description

Note that the lines have been wrapped with the circumflex (^) at the end of each line. On Mac and Linux you will need to use backslash (\) instead.

This example assumes that $token, $cluster and $project_id have been declared beforehand.

...

It is very simple to use the custom source, just supply the full path of the Python module by using the command line argument --source-script instead of using --source-type. Adding the custom module to the PYTHONPATH and importing it will be done automatically by the loader.

Versions Compared

Old Version 16

New Version 17

Key

Table of Contents

Excel Source Options

Usage

Database Options

Usage