...

When tag_topics:True is configured, the pool of ranked key-phrases is filtered into cleaned, deduplicated phrases called “topics”, which are stored in facet:nlp_tag__topics.

Concept

- Filter steps:
  - Remove terms with Part-of-Speech (POS) ["ADJ", "DET", "PUNCT"]
  - Remove terms consisting (almost) entirely of digits, like `33120x`
  - Deduplicate:
      - Skip phrases that are also detected in NER-TAGS ["PRODUCT", "EVENT", "PERSON"] (configurable)
      - Skip phrases that contain terms from already stored "topics"
- Select 20 phrases evenly across all ranks (as determined via TextRank)
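
The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the actual implementation. The `Term` type, all function names, and the 0.8 digit-ratio cutoff for "(almost) only number characters" are assumptions made for this example. It also assumes that phrases arrive already ordered by TextRank rank, that the NER check compares whole phrase text, and that the even selection runs after deduplication.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """Hypothetical token type: surface text plus coarse POS tag."""
    text: str
    pos: str


EXCLUDED_POS = {"ADJ", "DET", "PUNCT"}
SKIPPED_NER_LABELS = {"PRODUCT", "EVENT", "PERSON"}  # configurable
DIGIT_RATIO = 0.8  # assumed cutoff for "(almost) only number characters"
MAX_TOPICS = 20


def is_mostly_numeric(text, ratio=DIGIT_RATIO):
    """True if at least `ratio` of the characters are digits."""
    digits = sum(ch.isdigit() for ch in text)
    return bool(text) and digits / len(text) >= ratio


def clean_phrase(terms):
    """Drop terms with excluded POS tags and mostly-numeric terms."""
    kept = [
        t.text for t in terms
        if t.pos not in EXCLUDED_POS and not is_mostly_numeric(t.text)
    ]
    return " ".join(kept)


def select_evenly(items, k=MAX_TOPICS):
    """Pick k items spread evenly over the ranked list (first and last included)."""
    n = len(items)
    if n <= k:
        return list(items)
    if k == 1:
        return [items[0]]
    return [items[i * (n - 1) // (k - 1)] for i in range(k)]


def extract_topics(ranked_phrases, entities, k=MAX_TOPICS):
    """ranked_phrases: list of list[Term], best rank first.
    entities: iterable of (text, label) pairs from NER."""
    blocked = {text.lower() for text, label in entities
               if label in SKIPPED_NER_LABELS}
    topics, seen_terms = [], set()
    for terms in ranked_phrases:
        phrase = clean_phrase(terms)
        words = set(phrase.lower().split())
        # Skip empty phrases, NER duplicates, and phrases that reuse
        # a term from an already accepted topic.
        if not words or phrase.lower() in blocked or words & seen_terms:
            continue
        topics.append(phrase)
        seen_terms |= words
    return select_evenly(topics, k)
```

Called as `extract_topics(ranked_phrases, entities)`, the sketch returns at most 20 topic strings. Selecting by evenly spaced index, rather than taking the top 20, keeps lower-ranked phrases represented in the result.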

...