...
fields_to_consider: Comma-separated list of fields (default: title,body)
...
Reduce processing time for large documents
To reduce processing time of big PDFs, consider only a subset of pages.
process_pages:
  dynamic: Chosen relative to document size (default). Take at least 10 pages, but at most √total_pages.
  all: Take all pages.
  int: Take the first N pages.
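The three modes can be sketched as a small helper that turns the setting into a page count. This is only an illustration: `pages_to_process` is a hypothetical name, not part of the product, and the `dynamic` branch assumes the rule means "the larger of 10 and √total_pages, capped at the document length". The actual implementation may resolve the rule differently.

```python
import math


def pages_to_process(total_pages, process_pages="dynamic"):
    """Return how many leading pages would be analysed (hypothetical helper)."""
    if total_pages < 0:
        raise ValueError("total_pages must not be negative")

    if process_pages == "all":
        # Take every page of the document.
        return total_pages

    if process_pages == "dynamic":
        # ASSUMPTION: "at least 10 pages, but at most sqrt(total_pages)" is read
        # here as max(10, ceil(sqrt(total_pages))), never more than the document has.
        wanted = max(10, math.ceil(math.sqrt(total_pages)))
        return min(total_pages, wanted)

    # Otherwise treat the value as an integer N: take the first N pages.
    n = int(process_pages)
    if n < 0:
        raise ValueError("process_pages must not be negative")
    return min(total_pages, n)
```

Under this reading, a 400-page PDF would be cut down to its first 20 pages in `dynamic` mode, while a 50-page PDF would keep its first 10.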
Additionally, it is possible to set a hard limit on the number of characters processed. This helps to reduce processing time, especially for large non-binary documents such as HTML or emails ("flat items").
max_characters_to_process:
  all: Analyse full content.
  int: Take the first N characters (default: 50000).
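The effect of the character limit can be illustrated with a minimal sketch. The helper name `limit_characters` is hypothetical, and simple truncation of the leading characters is an assumption about how the limit is applied.

```python
def limit_characters(text, max_characters_to_process=50000):
    """Return the part of `text` that would be analysed (hypothetical helper)."""
    if max_characters_to_process == "all":
        # Analyse the full content.
        return text

    # Otherwise treat the value as an integer N: keep the first N characters.
    n = int(max_characters_to_process)
    if n < 0:
        raise ValueError("max_characters_to_process must not be negative")
    return text[:n]
```

With the default of 50000, a 60000-character HTML body would be cut to its first 50000 characters, while shorter items pass through unchanged.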
Language Support
By default, English (en_core_web_sm) and German (de_core_news_sm) models are installed on Squirro instances.
...