...

  • fields_to_consider : Comma-separated list of fields (default: title,body)
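
As a rough illustration of how a comma-separated value such as the default title,body might be turned into a list of field names, here is a minimal sketch. The function name parse_fields and the whitespace trimming are assumptions made for this example, not documented behaviour.

```python
from typing import List


def parse_fields(fields_to_consider: str = "title,body") -> List[str]:
    """Split a comma-separated field list into individual field names.

    Hypothetical helper: trimming whitespace and skipping empty entries
    are assumptions, not documented behaviour.
    """
    return [name.strip() for name in fields_to_consider.split(",") if name.strip()]
```

With the default value this yields the two fields title and body.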

...

Reduce processing time for large documents

To reduce the processing time of large PDFs, you can restrict processing to a subset of pages.

  • process_pages :

    • dynamic : Chosen relative to document size (default)
      Take at least 10 pages, but at most √total_pages

    • all : Take all pages.

    • int : Take first N pages.
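
The three process_pages modes can be sketched as follows. This is only an illustration: the function name pages_to_process is hypothetical, and the dynamic rule is read here as "√total_pages with a floor of 10 pages, capped at the document length", which is one possible interpretation of the description above rather than the confirmed implementation.

```python
import math
from typing import Union


def pages_to_process(total_pages: int, process_pages: Union[str, int] = "dynamic") -> int:
    """Return how many leading pages to analyse (illustrative sketch only)."""
    if total_pages <= 0:
        return 0
    if process_pages == "all":
        return total_pages
    if process_pages == "dynamic":
        # Assumed reading of the rule: sqrt(total_pages), but never fewer
        # than 10 pages, and never more pages than the document has.
        wanted = max(10, math.isqrt(total_pages))
        return min(total_pages, wanted)
    # Any other value is treated as N: take the first N pages.
    return min(total_pages, max(0, int(process_pages)))
```

Under this reading, a 400-page PDF would have its first 20 pages processed in dynamic mode, while a 4-page PDF would be processed in full.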

Additionally, you can set a hard limit on the number of characters to process. This reduces processing time, especially for large non-binary documents such as HTML or emails (“flat-items”).

  • max_characters_to_process :

    • all : Analyse full content

    • int : Take first N characters, default is 50000
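
The character limit amounts to truncating the text before analysis. A minimal sketch, assuming the limit is applied as a simple prefix cut; the function name limit_characters is hypothetical, and only the option values and the default of 50000 come from the description above.

```python
from typing import Union

DEFAULT_MAX_CHARACTERS = 50000  # default stated in the option description


def limit_characters(
    text: str,
    max_characters_to_process: Union[str, int] = DEFAULT_MAX_CHARACTERS,
) -> str:
    """Return the part of the text that would be analysed (illustrative sketch)."""
    if max_characters_to_process == "all":
        # Analyse the full content.
        return text
    # Otherwise keep only the first N characters.
    return text[: max(0, int(max_characters_to_process))]
```

For example, a 60000-character HTML body would be cut to its first 50000 characters with the default setting.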

Language Support

By default, the English (en_core_web_sm) and German (de_core_news_sm) models are installed on Squirro instances.
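
To make the language-to-model relationship concrete, the sketch below maps language codes to the two default model names. The dictionary DEFAULT_MODELS and the function model_for_language are hypothetical names used only for illustration; how the models are actually selected is not described here.

```python
from typing import Optional

# Model names as listed above; the mapping structure itself is an assumption.
DEFAULT_MODELS = {
    "en": "en_core_web_sm",
    "de": "de_core_news_sm",
}


def model_for_language(language_code: str) -> Optional[str]:
    """Return the default model name for a language code, or None if none is installed."""
    return DEFAULT_MODELS.get(language_code.strip().lower())
```

Where spaCy and the model package are installed, the returned name could be passed to spacy.load to load the model.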

...