...

  • Increase the min_feature_count configuration setting (see fingerprint.ini), so that a term must appear more often in the training documents before it is considered.
    The default value of min_feature_count is 1, which means that a term can be included in the Smart Filter even if it appears only once across all the training documents combined. This happens mainly with terms that are not present in the language model and are therefore calculated to have a high weight. Names and spelling mistakes are common examples.
  • Exclude irrelevant terms. Note that this may simply promote the next-worst term, so reducing the number of entities should also be considered (see Max. Number of Entities below).
  • Remove irrelevant content, such as headers or footers, from the training documents. This can often be achieved with enrichments, e.g. a pipelet.
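
The effect of a minimum count threshold can be illustrated with a small Python sketch. This is not the Smart Filter implementation: the tokenizer, the function names and the sample documents are all hypothetical, and the sketch ignores term weighting and the language model entirely. It only shows how a threshold above 1 drops terms that occur once across all training documents together, such as a misspelled name.

```python
import re
from collections import Counter


def count_terms(documents):
    """Count how often each lowercase word occurs across all documents together."""
    counts = Counter()
    for text in documents:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts


def candidate_terms(documents, min_feature_count=1):
    """Return the terms whose total count reaches the threshold.

    The parameter is named after the setting only for readability; how the
    product applies the setting internally is an assumption here.
    """
    counts = count_terms(documents)
    return {term for term, n in counts.items() if n >= min_feature_count}


# Hypothetical training documents; "Smithe" stands in for a name or typo.
training_docs = [
    "Invoice for pump maintenance signed by Smithe",
    "Pump maintenance schedule and invoice template",
    "Maintenance report for the pump station",
]

# With a threshold of 1, every term is a candidate, including "smithe".
print(sorted(candidate_terms(training_docs, min_feature_count=1)))

# With a threshold of 2, single-occurrence terms are dropped.
print(sorted(candidate_terms(training_docs, min_feature_count=2)))
# → ['for', 'invoice', 'maintenance', 'pump']
```

Note that a common word such as "for" still passes the threshold in this sketch, which is why raising the count is listed alongside excluding irrelevant terms and cleaning the training documents rather than instead of them.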

Negative Training

When training a Smart Filter, items can be added as either positive or negative items. The negative content trains the Smart Filter to detect content that should be excluded.

The considerations on Improving Relevance apply even more when using Negative Training; otherwise you quickly end up with nonsensical concepts.

...