
  • Increase the min_feature_count configuration setting (see fingerprint.ini). A term then needs to appear more often in the training documents to be considered.
    The default value of min_feature_count is 1, which means that a term can be included in the Smart Filter even if it appears only once across all training documents combined. This happens especially with terms that are not present in the language model and are therefore calculated to have a high weight. Common examples are names and spelling mistakes.
  • Exclude irrelevant terms. Note that this may simply promote the next-worst term, so reducing the number of entities should also be considered (see Max. Number of Entities below).
  • Remove irrelevant content, such as headers or footers, from the training documents. This can often be achieved with enrichments, e.g. a pipelet.
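To illustrate why the default threshold lets one-off terms such as spelling mistakes through, here is a minimal Python sketch of count-based term filtering. It is not the Smart Filter's actual implementation: the candidate_terms helper, the plain lowercase whitespace tokenizer and the sample documents are hypothetical, and the language-model weighting step is not modeled. Only the name min_feature_count comes from fingerprint.ini.

```python
from collections import Counter


def candidate_terms(documents, min_feature_count=1, excluded=frozenset()):
    """Return terms seen at least min_feature_count times across all
    training documents combined, minus any explicitly excluded terms.

    Hypothetical sketch: a lowercase whitespace split stands in for the
    real tokenizer, and term weighting is left out entirely.
    """
    counts = Counter()
    for doc in documents:
        # Count occurrences over the whole training set, not per document.
        counts.update(doc.lower().split())
    return {
        term
        for term, count in counts.items()
        if count >= min_feature_count and term not in excluded
    }


training_docs = [
    "quarterly revenue grew strongly",
    "revenue guidance raised for the quarter",
    "revenu outlook unchanged",  # misspelling that appears only once
]

# With the default threshold of 1, the one-off misspelling "revenu" survives.
print("revenu" in candidate_terms(training_docs))  # True
# Raising the threshold to 2 keeps only terms seen at least twice.
print(sorted(candidate_terms(training_docs, min_feature_count=2)))  # ['revenue']
# Excluding a term removes it regardless of its count.
print("revenu" in candidate_terms(training_docs, excluded={"revenu"}))  # False
```

Raising the threshold removes all rare terms at once, while an exclusion list removes only the terms you name. This mirrors the trade-off described in the list above.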
