Warning
Deprecation notice

This tool is deprecated and is no longer actively supported.

Intro

For the Squirro SmartFilters to produce accurate results, a list of term frequencies across all documents in your indexes has to be maintained.
This list is called the GDF (Global Document Frequency), also referred to as GDFS.
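
As a rough illustration of what such a list contains, the minimal sketch below counts, for each term, the number of documents it appears in. The whitespace tokenizer and the `document_frequency` function name are assumptions made for this example. It does not reflect how the tool itself reads Elasticsearch data.

```python
from collections import Counter


def document_frequency(documents):
    """Count, for each term, how many documents contain it at least once."""
    df = Counter()
    for text in documents:
        # set() ensures a term is counted once per document,
        # no matter how often it occurs inside that document
        df.update(set(text.lower().split()))
    return df


docs = ["the quick fox", "the lazy dog", "the fox sleeps"]
df = document_frequency(docs)
print(df["the"], df["fox"], df["dog"])  # 3 2 1
```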

...

Code Block
#location of the es data folder
elasticsearch_data_folder = /var/lib/elasticsearch
 
#space separated list of indexes, or all
indexes = all
 
#where the data will be saved
target_folder = /tmp
 
#how many files per language should be created
files_per_language = 8
 
#should numbers and floats be removed?
remove_numbers = true
 
#terms that occur in fewer than this number of documents will be deleted from the gdfs list
frequency_lower_limit = 10
 
#languages to extract
languages = en

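To make the effect of `remove_numbers` and `frequency_lower_limit` concrete, here is a hedged sketch that applies both options to a mapping of terms to document counts. The function `filter_terms` and the number-matching regular expression are hypothetical. The tool's actual matching rules may differ.

```python
import re

# Assumption: "numbers and floats" means tokens such as "42", "-3" or "3.14"
NUMBER_RE = re.compile(r"[+-]?\d+(\.\d+)?")


def filter_terms(df, remove_numbers=True, frequency_lower_limit=10):
    """Return a copy of df without numeric terms and without rare terms."""
    kept = {}
    for term, count in df.items():
        if remove_numbers and NUMBER_RE.fullmatch(term):
            continue  # drop purely numeric terms
        if count < frequency_lower_limit:
            continue  # drop terms found in too few documents
        kept[term] = count
    return kept


sample = {"squirro": 120, "2019": 45, "3.14": 12, "rare": 3}
print(filter_terms(sample))  # {'squirro': 120}
```

With `remove_numbers=False`, the same call would keep `2019` and `3.14`, since both meet the lower limit of 10.
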
...