Note
WORK IN PROGRESS

The PDF OCR (optical character recognition) step converts images of text embedded in PDF files into machine-encoded text. Images of text are typically found in PDFs of scanned documents.

Overview

The PDF OCR step is one of the Enrich steps. It extracts text from files of MIME type application/pdf that don’t contain machine-readable text.

The Content Extraction (formerly Content Conversion) step must run after the PDF OCR step.
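The ordering requirement can be checked mechanically. The following is a minimal sketch, not the product's own validation: it assumes a pipeline can be represented as an ordered list of step identifiers, and the identifiers "pdf-ocr" and "content-extraction" are hypothetical names chosen for illustration.

```python
from typing import Sequence

# Hypothetical step identifiers; a real pipeline may name them differently.
PDF_OCR = "pdf-ocr"
CONTENT_EXTRACTION = "content-extraction"


def content_extraction_follows_ocr(steps: Sequence[str]) -> bool:
    """Check that Content Extraction runs after the PDF OCR step.

    Returns True if the pipeline has no PDF OCR step, or if a Content
    Extraction step appears after the last PDF OCR step.
    """
    steps = list(steps)
    if PDF_OCR not in steps:
        return True
    # Index of the last PDF OCR step in the pipeline.
    last_ocr = len(steps) - 1 - steps[::-1].index(PDF_OCR)
    return CONTENT_EXTRACTION in steps[last_ocr + 1:]
```

For example, a pipeline listed as `["pdf-ocr", "content-extraction"]` passes the check, while `["content-extraction", "pdf-ocr"]` does not.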

Configuration

Field         Default   Description                                          UI Setting
replace_file  True      Replace original PDF file
ocr_timeout   60        Maximum time in seconds spent on OCR per document
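As an illustration of how these two options might be combined with their defaults, here is a minimal sketch. Only the field names replace_file and ocr_timeout and their default values come from this page; the helper build_pdf_ocr_config and the plain-dictionary representation are assumptions, since the page does not show the actual format of a step configuration.

```python
from typing import Any, Dict

# Defaults as documented on this page.
PDF_OCR_DEFAULTS: Dict[str, Any] = {"replace_file": True, "ocr_timeout": 60}


def build_pdf_ocr_config(**overrides: Any) -> Dict[str, Any]:
    """Merge user overrides into the documented defaults (hypothetical helper)."""
    unknown = set(overrides) - set(PDF_OCR_DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown PDF OCR option(s): {sorted(unknown)}")

    config = {**PDF_OCR_DEFAULTS, **overrides}

    if not isinstance(config["replace_file"], bool):
        raise TypeError("replace_file must be a boolean")

    timeout = config["ocr_timeout"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(timeout, bool) or not isinstance(timeout, (int, float)):
        raise TypeError("ocr_timeout must be a number of seconds")
    if timeout <= 0:
        raise ValueError("ocr_timeout must be positive")

    return config
```

Calling `build_pdf_ocr_config(ocr_timeout=120)` keeps replace_file at its default and raises the per-document OCR limit to 120 seconds.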

This page can now be found at PDF OCR on the Squirro Docs site.
