...
tag_phrases
: Enable / Disable key-phrase tagging
tag_top_k_phrases
: Number of phrases to tag
dynamic
: Total number of phrases selected relative to document size (between 20 and 70)
10
: Take the N highest-ranked phrases as specified
tag_topics
: Enable simple topic-tagging based on key-phrases
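The following is a minimal sketch of how these settings might be read, treating `dynamic` and a number such as `10` as the two possible values of `tag_top_k_phrases`. The helper `resolve_top_k`, the `tokens_per_phrase` ratio, and the dictionary layout are hypothetical; the documentation only states the 20 to 70 range and that the count is relative to document size.

```python
from typing import Union

# Bounds stated in the documentation for the "dynamic" setting.
DYNAMIC_MIN = 20
DYNAMIC_MAX = 70


def resolve_top_k(setting: Union[str, int], n_tokens: int,
                  tokens_per_phrase: int = 50) -> int:
    """Return how many ranked phrases to tag for one document.

    ASSUMPTION: the linear scaling via `tokens_per_phrase` is illustrative
    only; the real scaling rule is not documented here.
    """
    if setting == "dynamic":
        scaled = n_tokens // tokens_per_phrase
        # Clamp the scaled value into the documented 20..70 range.
        return max(DYNAMIC_MIN, min(DYNAMIC_MAX, scaled))
    k = int(setting)
    if k < 0:
        raise ValueError("tag_top_k_phrases must not be negative")
    return k


# Hypothetical configuration using the option names from the list above.
config = {
    "tag_phrases": True,
    "tag_top_k_phrases": "dynamic",
    "tag_topics": True,
}

top_k = resolve_top_k(config["tag_top_k_phrases"], n_tokens=2500)
```

With a fixed integer the value is passed through unchanged, so `resolve_top_k(10, 5000)` simply yields the ten highest-ranked phrases.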
Enrichment
Key phrases are stored within the facet nlp_tag__phrases.
The item’s Title is also added.
Application
Content based auto-completion (type-ahead)
Significant-terms aggregation on search results
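As an illustration of the type-ahead use case, the sketch below completes a typed prefix against a pool of stored key phrases. It is backend-agnostic and not the product's implementation: the class `PhraseCompleter` and its parameters are hypothetical, and the phrases are assumed to have been read from the `nlp_tag__phrases` facet beforehand.

```python
import bisect
from typing import Iterable, List


class PhraseCompleter:
    """Prefix completion over a sorted, lowercased list of key phrases."""

    def __init__(self, phrases: Iterable[str]) -> None:
        # Deduplicate and sort once so lookups can use binary search.
        self._phrases = sorted({p.lower() for p in phrases})

    def complete(self, prefix: str, limit: int = 5) -> List[str]:
        prefix = prefix.lower()
        # In a sorted list, all phrases sharing the prefix are contiguous
        # and start at the insertion point of the prefix itself.
        start = bisect.bisect_left(self._phrases, prefix)
        matches: List[str] = []
        for phrase in self._phrases[start:start + limit]:
            if not phrase.startswith(prefix):
                break
            matches.append(phrase)
        return matches


completer = PhraseCompleter(
    ["Machine learning", "machine translation", "search index", "Market data"]
)
suggestions = completer.complete("mac")
```

Sorting once and using binary search keeps each lookup cheap, which matters when suggestions are requested on every keystroke.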
Simple Topic Detection
With configuration tag_topics:True, the pool of ranked key-phrases is used to extract cleaned, deduplicated phrases referred to as “topics” (stored in nlp_tag__topics).
Concept
- Filter steps:
  - Remove terms with POS ["ADJ", "DET", "PUNCT"]
  - Remove terms containing (almost) only number characters, like `33120x`
- De-Duplicate:
  - Skip phrases that are also detected in NER-TAGS ["PRODUCT", "EVENT", "PERSON"] (configurable)
  - Skip phrases that contain terms from already stored "topics"
- Select 20 phrases evenly across all ranks (as determined via TextRank)
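The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual implementation: phrases are assumed to arrive already ranked (best first) as lists of `(text, POS)` tuples, `ner_phrases` is assumed to hold the lowercased texts of entities with the configured NER tags, and the 0.8 digit ratio and all function names are hypothetical.

```python
from typing import Iterable, List, Sequence, Set, Tuple

Token = Tuple[str, str]      # (text, POS tag)
Phrase = Sequence[Token]     # one key phrase

FILTERED_POS = {"ADJ", "DET", "PUNCT"}


def mostly_digits(term: str, threshold: float = 0.8) -> bool:
    """True if (almost) all characters are digits, e.g. '33120x'.

    ASSUMPTION: the 0.8 ratio is illustrative, not a documented value.
    """
    if not term:
        return True
    return sum(ch.isdigit() for ch in term) / len(term) >= threshold


def clean(phrase: Phrase) -> List[str]:
    """Filter step: drop unwanted POS tags and number-like terms."""
    return [text.lower() for text, pos in phrase
            if pos not in FILTERED_POS and not mostly_digits(text)]


def pick_evenly(items: List[str], k: int) -> List[str]:
    """Pick up to k items spread evenly over a ranked list."""
    if len(items) <= k:
        return list(items)
    step = len(items) / k
    return [items[int(i * step)] for i in range(k)]


def extract_topics(ranked: Iterable[Phrase], ner_phrases: Set[str],
                   k: int = 20) -> List[str]:
    topics: List[str] = []
    seen_terms: Set[str] = set()
    for phrase in ranked:
        terms = clean(phrase)
        if not terms:
            continue
        text = " ".join(terms)
        if text in ner_phrases:               # already covered by an NER tag
            continue
        if seen_terms.intersection(terms):    # overlaps a stored topic
            continue
        topics.append(text)
        seen_terms.update(terms)
    return pick_evenly(topics, k)
```

Because `pick_evenly` samples at a fixed stride rather than taking the top of the list, the selected topics cover high- and low-ranked phrases alike, which matches the "evenly across all ranks" step.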
...
Named Entity Recognition
(Optional)
Store recognised entities within their corresponding facet, like .
...