...
With configuration `tag_topics:True`, the pool of ranked key-phrases is used to extract cleaned, deduplicated phrases referred to as “topics” (stored in `facet:nlp_tag__topics`).
Concept
- Filter steps:
  - Remove terms with Part-of-Speech (POS) ["ADJ", "DET", "PUNCT"]
  - Remove terms containing (almost) only number characters, like `33120x`
- De-Duplicate:
  - Skip phrases that are also detected in NER-TAGS ["PRODUCT", "EVENT", "PERSON"] (configurable)
  - Skip phrases that contain terms from already stored "topics"
- Select 20 phrases evenly across all ranks (as determined via TextRank)
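
The concept above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual implementation: the `Token` type, the function names, the 50% digit threshold for "(almost) only number characters", the case-insensitive matching, and the order of the steps (filter and de-duplicate first, then pick evenly spaced phrases from what remains) are all hypothetical. The input is assumed to be a list of phrases already sorted by TextRank rank, plus the detected named entities as `(text, label)` pairs.

```python
from dataclasses import dataclass

# Values taken from the concept description; the NER labels are configurable.
DROP_POS = {"ADJ", "DET", "PUNCT"}
SKIP_NER_LABELS = {"PRODUCT", "EVENT", "PERSON"}


@dataclass(frozen=True)
class Token:
    """Hypothetical token type: surface text plus its POS tag."""
    text: str
    pos: str


def is_mostly_numeric(term, threshold=0.5):
    """True if more than `threshold` of the characters are digits (assumed cutoff)."""
    digits = sum(ch.isdigit() for ch in term)
    return bool(term) and digits / len(term) > threshold


def clean_phrase(tokens):
    """Filter step: drop terms by POS tag and drop mostly numeric terms."""
    kept = [
        t.text for t in tokens
        if t.pos not in DROP_POS and not is_mostly_numeric(t.text)
    ]
    return " ".join(kept)


def select_evenly(items, n):
    """Pick n items at evenly spaced positions, preserving rank order."""
    if len(items) <= n:
        return list(items)
    return [items[i * len(items) // n] for i in range(n)]


def extract_topics(ranked_phrases, entities, max_topics=20):
    """ranked_phrases: token lists, best rank first; entities: (text, label) pairs."""
    skip_entities = {
        text.lower() for text, label in entities if label in SKIP_NER_LABELS
    }
    topics = []
    seen_terms = set()
    for tokens in ranked_phrases:
        phrase = clean_phrase(tokens)
        if not phrase:
            continue  # nothing left after filtering
        if phrase.lower() in skip_entities:
            continue  # already covered by an NER tag
        terms = set(phrase.lower().split())
        if terms & seen_terms:
            continue  # shares a term with an already stored topic
        topics.append(phrase)
        seen_terms |= terms
    return select_evenly(topics, max_topics)
```

Comparing single terms against the stored topics is a strict rule: once "network" is stored, a later phrase such as "network training" is skipped, which keeps the resulting topics free of near-duplicates at the cost of dropping some more specific phrases.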
...