The built-in “Nlp Keyphrase Tagger” pipes items through a configurable spaCy pipeline to perform key-phrase extraction, plus named entity recognition (NER) and rule-based sentiment analysis.
With the configuration option `tag_topics: True`, the pool of ranked key phrases is used to extract cleaned, deduplicated phrases referred to as “topics” (stored in `facet:nlp_tag__topics`).
Filter steps:
- Remove terms with the POS tags ["ADJ", "DET", "PUNCT"]
- Remove terms consisting (almost) entirely of digits, such as `33120x`
- Skip phrases that are also detected as named entities with the NER tags ["PRODUCT", "EVENT", "PERSON"] (configurable)
- Skip phrases that contain terms from topics already stored
- Select 20 phrases evenly across all ranks (as determined by TextRank)
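The filter steps above can be sketched in plain Python, assuming the key phrases have already been extracted and ranked best-first (e.g. by TextRank) and that each phrase arrives as a list of hypothetical `(token_text, pos_tag)` pairs. The function names, the 0.75 digit-ratio cut-off for "(almost) only numbers", lowercasing, and matching entities against the cleaned phrase text are all assumptions for illustration, not the tagger's actual implementation.

```python
from typing import Iterable

# Hypothetical input shape: a phrase is a list of (token_text, pos_tag) pairs,
# and phrases arrive already ranked best-first (e.g. by TextRank).
Token = tuple[str, str]
Phrase = list[Token]

DROP_POS = {"ADJ", "DET", "PUNCT"}
DIGIT_RATIO = 0.75  # assumed cut-off for "(almost) only number characters"


def is_mostly_numeric(term: str) -> bool:
    """True for terms like '33120x' whose characters are mostly digits."""
    if not term:
        return False
    return sum(ch.isdigit() for ch in term) / len(term) >= DIGIT_RATIO


def clean_terms(phrase: Phrase) -> list[str]:
    """Drop ADJ/DET/PUNCT tokens and mostly-numeric terms; lowercase the rest."""
    return [text.lower() for text, pos in phrase
            if pos not in DROP_POS and not is_mostly_numeric(text)]


def evenly_spaced(items: list, k: int) -> list:
    """Pick k items spread evenly from the first to the last rank."""
    n = len(items)
    if n <= k:
        return list(items)
    if k == 1:
        return [items[0]]
    return [items[i * (n - 1) // (k - 1)] for i in range(k)]


def select_topics(ranked: Iterable[Phrase], entity_texts: set[str],
                  max_topics: int = 20) -> list[str]:
    """Apply the filter steps in rank order, then sample evenly across ranks."""
    entities = {e.lower() for e in entity_texts}
    used_terms: set[str] = set()
    candidates: list[str] = []
    for phrase in ranked:
        terms = clean_terms(phrase)
        if not terms:
            continue  # nothing left after POS / numeric filtering
        topic = " ".join(terms)
        if topic in entities:
            continue  # already detected as a PRODUCT/EVENT/PERSON entity
        if used_terms.intersection(terms):
            continue  # shares a term with a topic stored earlier
        used_terms.update(terms)
        candidates.append(topic)
    return evenly_spaced(candidates, max_topics)
```

Filtering before sampling means the 20 topics are spread across the ranks of phrases that survived cleaning and deduplication; whether the tagger samples before or after filtering is not stated, so this ordering is a design choice of the sketch.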
Named Entity Recognition
(Optional) Stores recognised entities in their corresponding facets.
- `tag_entities`: Enable entity (NER) tagging.
- `collect_entities`: NER tags to collect (check which tags the installed model's label scheme supports).
- `tag_entities_per_type`: Number of entities per type to add to their corresponding facet.
One facet per entity type, e.g. Location = [Europe, London].
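The per-type facet collection described above can be sketched as a small grouping function. It assumes the entities are given as `(text, label)` pairs; with spaCy these could come from a parsed `Doc` via `ent.text` and `ent.label_` on `doc.ents`. The function name `entity_facets` is hypothetical, keeping the *most frequent* entities per type is an assumption (the documentation only states the count), and any mapping from raw labels such as GPE or LOC to display names such as Location is left out.

```python
from collections import Counter, defaultdict
from typing import Iterable


def entity_facets(ents: Iterable[tuple[str, str]],
                  collect_entities: list[str],
                  per_type: int) -> dict[str, list[str]]:
    """Group (text, label) pairs into one facet per entity label.

    Only labels listed in collect_entities are kept, and at most per_type
    entities per label survive (assumption: most frequent first).
    """
    wanted = set(collect_entities)
    counts = defaultdict(Counter)  # label -> Counter of entity texts
    for text, label in ents:
        if label in wanted:
            counts[label][text] += 1
    return {label: [text for text, _ in counter.most_common(per_type)]
            for label, counter in counts.items()}


# With spaCy, the input pairs could be built from a parsed Doc, e.g.:
#   ents = [(ent.text, ent.label_) for ent in nlp(text).ents]
```

Here `collect_entities` and `per_type` play the roles of the `collect_entities` and `tag_entities_per_type` options.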
Sentiment Analysis
Applies rule-based sentiment analysis (vaderSentiment), which is specifically attuned to sentiments expressed in social media and in domains such as NY Times editorials, movie reviews, and product reviews. It requires no training data; instead, it is built on a generalizable, valence-based, human-curated gold-standard sentiment lexicon.
- `tag_sentiment`: Enable rule-based sentiment tagging (English only).
| Output | Facet | Description |
|---|---|---|
| Overall Sentiment Label | `facet:sentiment_pretrained` | One sentiment label (neutral, positive, negative) per document. Sentiment analysis is applied per sentence; sentences with neutral sentiment are skipped. |
| Overall Sentiment Score | `facet:nlp_tag__sentiment_score` | Float value in [-1, +1] |
| Sentiment Assessment | `facet:positive_terms`, `facet:negative_terms` | A sentiment phrase consists of the valence term and its context. |
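The three sentiment outputs above can be approximated with the sketch below. It assumes per-sentence compound scores in [-1, +1] are already available; with vaderSentiment they could come from `SentimentIntensityAnalyzer().polarity_scores(sentence)["compound"]`. The ±0.05 thresholds follow the convention in the VADER README. Averaging the non-neutral sentence scores into the overall score, and defining a sentiment phrase's "context" as a fixed window of neighbouring tokens, are assumptions; the function names and the toy lexicon in the usage are hypothetical (VADER's real lexicon is available as the analyzer's `lexicon` dict).

```python
from statistics import mean

# Conventional VADER compound-score cut-offs (from the VADER README).
POS_THRESHOLD = 0.05
NEG_THRESHOLD = -0.05


def label_for(compound: float) -> str:
    """Map a compound score in [-1, +1] to a sentiment label."""
    if compound >= POS_THRESHOLD:
        return "positive"
    if compound <= NEG_THRESHOLD:
        return "negative"
    return "neutral"


def document_sentiment(sentence_scores: list[float]) -> tuple[str, float]:
    """Aggregate per-sentence scores, skipping neutral sentences.

    Assumption: the overall score is the mean of the non-neutral scores.
    """
    polar = [s for s in sentence_scores if label_for(s) != "neutral"]
    if not polar:
        return "neutral", 0.0
    score = mean(polar)
    return label_for(score), score


def sentiment_phrases(tokens: list[str], lexicon: dict[str, float],
                      window: int = 2) -> tuple[list[str], list[str]]:
    """Return (positive, negative) phrases: each valence term plus up to
    `window` tokens of context on either side (assumed context definition)."""
    positive, negative = [], []
    for i, tok in enumerate(tokens):
        valence = lexicon.get(tok.lower(), 0.0)
        if valence == 0.0:
            continue  # not a valence term
        phrase = " ".join(tokens[max(0, i - window): i + window + 1])
        (positive if valence > 0 else negative).append(phrase)
    return positive, negative
```

For example, with the toy lexicon `{"great": 3.1, "slow": -0.5}` and `window=1`, the tokens of "The support was great but delivery was slow" yield the positive phrase "was great but" and the negative phrase "was slow".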
Positive Product Feedback
“The tech provides insight into unstructured email content, it allows me to truly understand the conversation between the business and our customers. The insight gained from this analysis is significantly deeper than can be achieved from structured data analysis.” (Quoted from Gartner)