Import Data into Squirro from a QlikView Load Script

Intro
 

Using the QV STORE, DROP FIELDS and EXECUTE commands it is possible to hook the Squirro CSV importer into existing load scripts easily.

 

The concept is this:

  • Load additional fields and/or tables into QV
  • Prepare the data so it is ready for Squirro
  • Store the required data into CSV file(s) using the STORE command
  • Drop the data not needed for the QV Dashboard using the DROP FIELDS command, to keep RAM consumption to a minimum.
  • Prepare the data so it is ready for the QV Dashboard
  • Store the finished data in QVD file(s) using the STORE command
  • Import the CSV file(s) into Squirro using an executable provided by Squirro (not available yet) with the EXECUTE command.
  • Optionally delete the CSV files(s) from the disk

The bulk loader command line tool is documented on the page /wiki/spaces/WIP/pages/2949365.

Working Example Script

(warning) There are still some encoding issues that need to be taken care of. QlikView actually writes a BOM!

SET DecimalSep='.';
SET MoneyThousandSep=',';
SET MoneyDecimalSep='.';
SET MoneyFormat='$#,##0.00;($#,##0.00)';
SET TimeFormat='h:mm:ss TT';
SET DateFormat='M/D/YYYY';
SET TimestampFormat='M/D/YYYY h:mm:ss[.fff] TT';
SET MonthNames='Jan;Feb;Mar;Apr;May;Jun;Jul;Aug;Sep;Oct;Nov;Dec';
SET DayNames='Mon;Tue;Wed;Thu;Fri;Sat;Sun';

//load both structured and unstructured data from our sources
Storm:
LOAD BEGIN_YEARMONTH,
     BEGIN_DAY,
     BEGIN_TIME,
     END_YEARMONTH,
     END_DAY,
     END_TIME,
     EPISODE_ID,
     EVENT_ID,
     STATE,
     STATE_FIPS,
     YEAR,
     MONTH_NAME,
     EVENT_TYPE,
     CZ_TYPE,
     CZ_FIPS,
     CZ_NAME,
     WFO,
     BEGIN_DATE_TIME,
     CZ_TIMEZONE,
     END_DATE_TIME,
     INJURIES_DIRECT,
     INJURIES_INDIRECT,
     DEATHS_DIRECT,
     DEATHS_INDIRECT,
     DAMAGE_PROPERTY,
     DAMAGE_CROPS,
     SOURCE,
     MAGNITUDE,
     MAGNITUDE_TYPE,
     FLOOD_CAUSE,
     CATEGORY,
     TOR_F_SCALE,
     TOR_LENGTH,
     TOR_WIDTH,
     TOR_OTHER_WFO,
     TOR_OTHER_CZ_STATE,
     TOR_OTHER_CZ_FIPS,
     TOR_OTHER_CZ_NAME,
     BEGIN_RANGE,
     BEGIN_AZIMUTH,
     BEGIN_LOCATION,
     END_RANGE,
     END_AZIMUTH,
     END_LOCATION,
     BEGIN_LAT,
     BEGIN_LON,
     END_LAT,
     END_LON,
     EPISODE_TITLE,
     EPISODE_NARRATIVE,
     EVENT_NARRATIVE,
     LAST_MOD_DATE,
     LAST_MOD_TIME,
     LAST_CERT_DATE,
     LAST_CERT_TIME,
     LAST_MOD,
     LAST_CERT,
     ADDCORR_FLG,
     ADDCORR_DATE
FROM
Z:\Desktop\squirro_bulk_loader\storm.xls
(biff, embedded labels);

//store fields required by Squirro into a CSV file
STORE EPISODE_ID, EVENT_ID, EVENT_TYPE, BEGIN_LOCATION, END_LOCATION, EVENT_NARRATIVE, EPISODE_NARRATIVE from Storm into squirro.csv (txt);

//remove unstructured data fields, do they don't take up space in ram and on disk
DROP FIELDS EVENT_NARRATIVE,EPISODE_NARRATIVE;

//store the remaining data into QVD for use in QV dashboards
STORE * from @2 into qlikview.qvd (qvd);

//call a Squirro helper tool to push the CSV file into the Squirro index.
EXECUTE cmd.exe /C "C:\Users\Alexander Sennhauser\Downloads\bulk_loader\squirro_bulk_loader.exe" --local --user-api-url http://172.16.148.1:20009 --topic-api-url http://172.16.148.1:20000 --bulk-api-url http://172.16.148.1:20021 -vv --log-file c:\tmp\squirro.log -t cbc5e25fffa21bf68d3383bd84e8764fd2804c75e73149e4317691dcfd2b5b4680905089060264ecc3af6f9379afcce015197692593a7ed9e50938c1f981c csv --infile squirro.csv;