...

1) Cleaning steps:
  - Remove terms with a specific Part-of-Speech (POS) tag, like adjectives (`ADJ`), determiners (`DET`) or punctuation (`PUNCT`).
  - Remove terms consisting (almost) entirely of digits, like `33120x`.
  - De-duplicate:
      - Do not use phrases that belong to a specific named-entity type, like `PRODUCT`, `EVENT` or `PERSON` (configurable).
      - Do not use phrases whose terms overlap with already stored "topics".
2) Select 20 phrases evenly across all ranks (as determined via TextRank).
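A minimal Python sketch of the cleaning steps. It assumes that an upstream tagger (for example spaCy) has already given each token a Universal POS tag and each candidate phrase an optional entity label. It reads "terms" as the tokens inside a phrase. The `Token`/`Phrase` types, `clean_phrases`, `mostly_numeric` and the 0.8 digit threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical input types: tokens carry a Universal POS tag and each phrase
# an optional entity label, both assumed to come from an upstream tagger.
@dataclass
class Token:
    text: str
    pos: str  # e.g. "NOUN", "ADJ", "DET", "PUNCT"

@dataclass
class Phrase:
    tokens: List[Token]
    ent_label: Optional[str] = None  # e.g. "PERSON", "PRODUCT" or None

    @property
    def text(self) -> str:
        return " ".join(t.text for t in self.tokens)

DROP_POS = {"ADJ", "DET", "PUNCT"}
SKIP_ENTS = {"PRODUCT", "EVENT", "PERSON"}  # configurable

def mostly_numeric(term: str, threshold: float = 0.8) -> bool:
    """True if at least `threshold` of the non-space characters are digits (assumed cutoff)."""
    chars = [c for c in term if not c.isspace()]
    if not chars:
        return False
    return sum(c.isdigit() for c in chars) / len(chars) >= threshold

def clean_phrases(phrases: List[Phrase], stored_topics: List[str],
                  skip_ents=SKIP_ENTS) -> List[Phrase]:
    # Lower-cased words of every already stored topic
    topic_terms = {w.lower() for topic in stored_topics for w in topic.split()}
    kept = []
    for phrase in phrases:
        # De-duplicate: skip phrases tagged with a configured entity type
        if phrase.ent_label in skip_ents:
            continue
        # Drop tokens with unwanted POS tags or made up (almost) only of digits
        tokens = [t for t in phrase.tokens
                  if t.pos not in DROP_POS and not mostly_numeric(t.text)]
        if not tokens:
            continue
        # De-duplicate: skip phrases sharing any term with a stored topic
        if any(t.text.lower() in topic_terms for t in tokens):
            continue
        kept.append(Phrase(tokens, phrase.ent_label))
    return kept
```

For example, with `stored_topics=["machine translation"]` the phrase `machine learning` is skipped because it shares the term `machine`. The phrase `the neural network` is reduced to `network`.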

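One possible reading of step 2, sketched in Python. The candidates are assumed to be already sorted by TextRank score, best first. Instead of taking only the top 20, the sketch picks 20 phrases at evenly spaced positions, so lower-ranked phrases are represented as well. `select_evenly` is a hypothetical helper and is not part of any TextRank library.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def select_evenly(ranked: Sequence[T], k: int = 20) -> List[T]:
    """Pick k items at evenly spaced positions of a best-first ranked list."""
    if k <= 0:
        return []
    n = len(ranked)
    if n <= k:
        return list(ranked)  # fewer candidates than requested: keep all
    if k == 1:
        return [ranked[0]]
    # Integer-spaced positions from 0 (best) to n - 1 (worst); since
    # n > k the spacing exceeds 1, so no position is picked twice.
    return [ranked[i * (n - 1) // (k - 1)] for i in range(k)]
```

For 100 ranked candidates this picks positions 0, 5, 10, 15, 20, 26, ... up to 99. Both the best-ranked and the worst-ranked phrase are therefore included.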
...