...
1) Cleaning steps:
   - Remove terms with specific Part-of-Speech (POS) tags, such as adjectives (`ADJ`), determiners (`DET`), or punctuation (`PUNCT`).
   - Remove terms that consist (almost) entirely of digits, like `33120x`.
   - De-duplicate:
     - Do not use phrases that belong to specific Named Entity types, like ["PRODUCT", "EVENT", "PERSON"] (configurable).
     - Do not use phrases whose terms overlap with already stored "topics".
2) Select 20 phrases evenly across all ranks (as determined via TextRank).
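The steps above could be sketched roughly as follows. This is a minimal illustration, not the actual implementation: it assumes phrases arrive already POS-tagged, NER-labelled, and scored by TextRank, and every name in it (`Phrase`, `is_mostly_digits`, `clean_terms`, `pick_evenly`, `select_topics`) as well as the 0.8 digit threshold is hypothetical. It also assumes that each accepted phrase counts as a stored topic for the phrases that follow it.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Assumed defaults, taken from the examples in the step list.
REMOVED_POS = {"ADJ", "DET", "PUNCT"}
SKIPPED_ENTITY_LABELS = {"PRODUCT", "EVENT", "PERSON"}  # configurable


@dataclass
class Phrase:
    """Hypothetical container for one ranked candidate phrase."""
    tokens: List[Tuple[str, str]]        # (term text, POS tag)
    rank: float                          # TextRank score, higher is better
    entity_label: Optional[str] = None   # NER label if the phrase is an entity


def is_mostly_digits(term: str, threshold: float = 0.8) -> bool:
    """True if at least `threshold` of the characters are digits (assumed cutoff)."""
    chars = [c for c in term if not c.isspace()]
    if not chars:
        return False
    return sum(c.isdigit() for c in chars) / len(chars) >= threshold


def clean_terms(phrase: Phrase) -> List[str]:
    """Step 1: drop terms with unwanted POS tags and (almost) numeric terms."""
    return [
        text.lower()
        for text, pos in phrase.tokens
        if pos not in REMOVED_POS and not is_mostly_digits(text)
    ]


def pick_evenly(items: List[str], limit: int) -> List[str]:
    """Step 2: pick `limit` items at evenly spaced positions of a ranked list."""
    if limit <= 0:
        return []
    if len(items) <= limit:
        return list(items)
    if limit == 1:
        return items[:1]
    step = (len(items) - 1) / (limit - 1)
    return [items[round(i * step)] for i in range(limit)]


def select_topics(
    phrases: Iterable[Phrase],
    stored_topics: Iterable[str],
    limit: int = 20,
) -> List[str]:
    """Clean, de-duplicate, and sample phrases across all ranks."""
    seen_terms = {t for topic in stored_topics for t in topic.lower().split()}
    candidates: List[str] = []
    for phrase in sorted(phrases, key=lambda p: p.rank, reverse=True):
        # De-duplicate: skip phrases that are configured named entities.
        if phrase.entity_label in SKIPPED_ENTITY_LABELS:
            continue
        terms = clean_terms(phrase)
        # De-duplicate: skip phrases sharing a term with stored topics.
        if not terms or seen_terms.intersection(terms):
            continue
        seen_terms.update(terms)  # assumption: accepted phrases become topics
        candidates.append(" ".join(terms))
    return pick_evenly(candidates, limit)
```

Calling `select_topics(phrases, stored_topics=["training"])` on a handful of tagged phrases returns at most 20 cleaned phrase strings, ordered from highest to lowest rank. Sampling at evenly spaced positions, rather than taking the top 20, keeps lower-ranked phrases represented in the result.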
...