...

    • tag_phrases: Enable / Disable key-phrase tagging

    • tag_top_k_phrases: Number of phrases to tag

      • dynamic: The number of phrases is chosen relative to document size (between 20 and 70)

      • 10: Take the N highest-ranked phrases, with N as specified (here 10)

    • tag_topics: Enable simple topic-tagging based on key-phrases
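
The two modes of tag_top_k_phrases can be pictured with a small sketch. Only the 20 to 70 range and the "dynamic" / integer distinction come from the options above; the function name `resolve_top_k`, the `tokens_per_phrase` parameter, and the linear scaling rule are assumptions made for illustration, since the actual scaling rule is not documented here.

```python
from typing import Union

DYNAMIC_MIN = 20  # lower bound given in the option description
DYNAMIC_MAX = 70  # upper bound given in the option description


def resolve_top_k(setting: Union[str, int], n_tokens: int,
                  tokens_per_phrase: int = 50) -> int:
    """Return how many key-phrases to tag for a document of n_tokens tokens."""
    if setting == "dynamic":
        # ASSUMPTION: one phrase per `tokens_per_phrase` tokens, clamped to
        # the documented range. The real scaling rule may differ.
        return max(DYNAMIC_MIN, min(DYNAMIC_MAX, n_tokens // tokens_per_phrase))
    k = int(setting)
    if k <= 0:
        raise ValueError("tag_top_k_phrases must be 'dynamic' or a positive integer")
    return k
```

With these assumed defaults, a short document falls back to the lower bound and a very long one is capped at the upper bound, while an integer setting is returned unchanged regardless of document size.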

Enrichment

Key phrases are stored within the facet nlp_tag__phrases.
Additionally, the item’s Title is also added.

Application

Simple Topic Detection

When tag_topics is set to True, the pool of ranked key-phrases is used to extract cleaned, deduplicated phrases referred to as “topics” (stored in nlp_tag__topics).

Concept

- Filter steps:
  - Remove terms with POS ["ADJ", "DET", "PUNCT"]
  - Remove terms containing (almost) only number characters, like `33120x`
  - Deduplicate:
    - Skip phrases that are also detected in NER-TAGS ["PRODUCT", "EVENT", "PERSON"] (configurable)
    - Skip phrases that contain terms from already stored "topics"
- Select 20 phrases evenly across all ranks (as determined via TextRank)
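
The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the actual implementation: the function names, the `(text, POS)` token representation, the 0.8 digit-ratio threshold for "(almost) only number characters", and the case-insensitive matching against entity texts are all assumptions. The input is expected to be the phrases already ranked (for example by TextRank), best first.

```python
from typing import Iterable, List, Sequence, Set, Tuple

Token = Tuple[str, str]   # (text, POS tag); hypothetical representation
Phrase = Sequence[Token]

SKIP_POS = {"ADJ", "DET", "PUNCT"}


def is_mostly_numeric(term: str, threshold: float = 0.8) -> bool:
    # ASSUMPTION: "almost only numbers" means a digit ratio >= threshold.
    digits = sum(ch.isdigit() for ch in term)
    return bool(term) and digits / len(term) >= threshold


def clean_phrase(phrase: Phrase) -> List[str]:
    # Drop terms with a skipped POS tag and number-like terms such as "33120x".
    return [text for text, pos in phrase
            if pos not in SKIP_POS and not is_mostly_numeric(text)]


def select_evenly(items: Sequence[str], k: int) -> List[str]:
    # Pick k items spread evenly over the ranked list (all items if fewer than k).
    if len(items) <= k:
        return list(items)
    step = len(items) / k
    return [items[int(i * step)] for i in range(k)]


def extract_topics(ranked_phrases: Iterable[Phrase],
                   entity_texts: Iterable[str],
                   k: int = 20) -> List[str]:
    # entity_texts: texts of entities with the configured NER tags,
    # e.g. PRODUCT, EVENT, PERSON.
    entities: Set[str] = {e.lower() for e in entity_texts}
    seen_terms: Set[str] = set()
    topics: List[str] = []
    for phrase in ranked_phrases:
        terms = clean_phrase(phrase)
        if not terms:
            continue
        topic = " ".join(terms)
        if topic.lower() in entities:
            continue  # already covered by a named entity
        lowered = {t.lower() for t in terms}
        if lowered & seen_terms:
            continue  # shares a term with an already stored topic
        topics.append(topic)
        seen_terms |= lowered
    return select_evenly(topics, k)
```

Selecting evenly across ranks, rather than taking only the top of the list, keeps lower-ranked but distinct phrases in the topic set; whether the real selection uses the same index spacing is not stated above.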

...

Named Entity Recognition

(Optional)
Store recognised entities within their corresponding facet, like .

...