Squirro’s pipeline natively supports processing of complex document types, such as common Office formats or PDF. This guide explains how to set up the pipeline to achieve this.

Overview

Squirro can index and display a number of common Office document formats. This enables end users to interact with those documents directly and search within them.

When set up correctly, these documents become searchable and include thumbnails in the result list:

...

They can also be displayed directly in the user interface, without the user having to navigate away to an Office application:

...

Setup

Setting this up is straightforward. Squirro provides all the required pipeline steps and default configuration out of the box.

Set up Pipeline Workflow

...

In the Setup space, navigate to Pipeline.

...

At the top right, press the pencil icon to enter edit mode.

...

At the bottom left, choose + New Pipeline to create a new pipeline.

...

A list of pipeline presets is displayed. Select the Binary Documents workflow.

...

In the Pipeline Properties on the right side, give the pipeline workflow a meaningful name, for example “Documents”.

...

This concludes the required setup. The steps that ensure documents are correctly indexed are:

Ingest Data

Once the pipeline workflow is set up, data ingestion can proceed as with any other data source. Make sure to select the newly created workflow so that the data is processed accordingly.
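
If ingestion is scripted rather than configured in the user interface, the same rule applies: the load must reference the newly created workflow. The following is a minimal Python sketch of that lookup. It assumes the project's workflows are already available as a list of dictionaries with `id` and `name` keys. That data shape, the sample IDs, and the helper name `find_workflow_id` are hypothetical illustrations and not part of any Squirro API.

```python
from typing import Dict, List


def find_workflow_id(workflows: List[Dict[str, str]], name: str) -> str:
    """Return the id of the single workflow called `name`.

    Raises LookupError when no workflow, or more than one, has that name,
    so that data is never sent to an unintended workflow by accident.
    """
    matches = [w["id"] for w in workflows if w["name"] == name]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one workflow named {name!r}, found {len(matches)}"
        )
    return matches[0]


# Hypothetical sample data: the ids and the list shape are assumptions.
workflows = [
    {"id": "wf-001", "name": "Standard"},
    {"id": "wf-002", "name": "Documents"},
]

# "Documents" is the example name given to the workflow during setup.
documents_workflow_id = find_workflow_id(workflows, "Documents")
```

Failing on a missing or ambiguous name, rather than falling back to a default workflow, keeps binary documents from being silently processed by a pipeline that lacks the required steps.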

...

This page can now be found at Indexing Common Documents on the Squirro Docs site.