The built-in “Nlp Keyphrase Tagger” pipelet passes items through a configurable spaCy pipeline to perform key-phrase extraction, as well as named entity recognition and rule-based sentiment analysis.

General Configuration

The pipelet is configured in the Pipeline Editor.

PDF Approximation

To reduce the processing time of large PDFs, consider only a subset of pages.

Language Support

By default, the English (en_core_web_sm) and German (de_core_news_sm) models are installed on Squirro instances.
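
As a rough illustration of how a language could be mapped to one of the two default models, here is a minimal sketch. The language codes used as keys, the helper names `model_for` and `load_pipeline`, and the error raised for unsupported languages are assumptions made for this example, not the pipelet's actual behaviour. Only `spacy.load` is a real spaCy call.

```python
# Default models named in the text above; the language-code keys are an assumption.
DEFAULT_MODELS = {
    "en": "en_core_web_sm",
    "de": "de_core_news_sm",
}


def model_for(language):
    """Return the default model name for a language code, or None if unsupported."""
    return DEFAULT_MODELS.get(language.lower())


def load_pipeline(language):
    """Load the spaCy pipeline for a language (needs spaCy and the model package)."""
    import spacy  # imported lazily so the lookup above works without spaCy installed

    name = model_for(language)
    if name is None:
        # Hypothetical handling: how the pipelet treats other languages is not documented here.
        raise ValueError(f"No default model installed for language {language!r}")
    return spacy.load(name)
```

For example, `model_for("de")` returns `"de_core_news_sm"`, while `model_for("fr")` returns `None`.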


Key-Phrase Extraction

Extracts the highest-ranked key phrases based on the TextRank algorithm.
Key phrases are selected and ranked from a pool of recognised noun chunks and named entities per item.
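
To make the ranking idea more concrete, here is a simplified, standard-library-only sketch of TextRank: words that co-occur within a small window are linked in a graph, a PageRank-style iteration scores each word, and candidate phrases are ranked by the summed scores of their words. The function names, the window size, the damping factor, and the phrase-scoring rule are assumptions for illustration. The pipelet itself runs inside a spaCy pipeline and may differ in all of these details.

```python
from collections import defaultdict


def textrank(tokens, window=2, damping=0.85, iterations=30):
    """Score tokens with a PageRank-style iteration over a co-occurrence graph."""
    # Link every token to the tokens that follow it within the window (undirected).
    neighbours = defaultdict(set)
    for i, tok in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != tok:
                neighbours[tok].add(other)
                neighbours[other].add(tok)

    scores = {tok: 1.0 for tok in neighbours}
    for _ in range(iterations):
        # Each token receives an equal share of the score of each of its neighbours.
        scores = {
            tok: (1 - damping)
            + damping * sum(scores[n] / len(neighbours[n]) for n in neighbours[tok])
            for tok in neighbours
        }
    return scores


def rank_phrases(candidates, scores):
    """Order candidate phrases by the summed scores of their words, best first."""
    def phrase_score(phrase):
        return sum(scores.get(word, 0.0) for word in phrase.lower().split())

    return sorted(candidates, key=phrase_score, reverse=True)


tokens = "email content analysis email content insight email data".split()
scores = textrank(tokens)
ranked = rank_phrases(["data", "email content"], scores)
```

In this toy input, "email" is connected to the most other words, so phrases containing it rank ahead of the rest.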

Configuration

Enrichment

Key phrases are stored within the facet:nlp_tag__phrases.
The item’s Title is also added.

Application

Simple Topic Detection

With configuration tag_topics:True, the pool of ranked key-phrases is used to extract cleaned, deduplicated phrases referred to as “topics” (stored in facet:nlp_tag__topics).

Concept

- Filter steps:
  - Remove terms with POS ["ADJ", "DET", "PUNCT"]
  - Remove terms containing (almost) only number characters, like `33120x`
  - De-Duplicate:
      - Skip phrases that are also detected in NER-TAGS ["PRODUCT", "EVENT", "PERSON"] (configurable)
      - Skip phrases that contain terms from already stored "topics"
- Select 20 phrases evenly across all ranks (as determined via TextRank)
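
The steps above can be sketched roughly as follows. This is an illustrative approximation, not the pipelet's implementation: the input shapes (phrases as lists of `(text, pos)` pairs, entities as `(text, label)` pairs), the function names, the 50% digit threshold, and the order in which de-duplication and selection are applied are all assumptions.

```python
DROP_POS = {"ADJ", "DET", "PUNCT"}
SKIP_ENTITY_LABELS = {"PRODUCT", "EVENT", "PERSON"}  # configurable in the pipelet


def is_mostly_numeric(term, threshold=0.5):
    """True if more than `threshold` of the characters are digits (assumed cutoff)."""
    digits = sum(ch.isdigit() for ch in term)
    return bool(term) and digits / len(term) > threshold


def clean_phrase(tokens):
    """Drop terms with an unwanted POS tag and terms that are mostly digits."""
    kept = [
        text
        for text, pos in tokens
        if pos not in DROP_POS and not is_mostly_numeric(text)
    ]
    return " ".join(kept)


def select_evenly(items, limit=20):
    """Pick up to `limit` items spread evenly over the ranked list."""
    if len(items) <= limit:
        return list(items)
    step = len(items) / limit
    return [items[int(i * step)] for i in range(limit)]


def extract_topics(ranked_phrases, entities, limit=20):
    """Turn ranked key phrases into cleaned, de-duplicated topics."""
    skip = {text.lower() for text, label in entities if label in SKIP_ENTITY_LABELS}
    topics, seen_terms = [], set()
    for tokens in ranked_phrases:
        phrase = clean_phrase(tokens)
        terms = set(phrase.lower().split())
        if not phrase or phrase.lower() in skip:
            continue  # empty after cleaning, or already covered by a skipped entity type
        if terms & seen_terms:
            continue  # shares a term with an already stored topic
        topics.append(phrase)
        seen_terms |= terms
    return select_evenly(topics, limit)


ranked = [
    [("the", "DET"), ("unstructured", "ADJ"), ("email", "NOUN"), ("content", "NOUN")],
    [("email", "NOUN"), ("analysis", "NOUN")],
    [("John", "PROPN"), ("Smith", "PROPN")],
    [("order", "NOUN"), ("33120x", "NUM")],
    [("data", "NOUN"), ("analysis", "NOUN")],
]
topics = extract_topics(ranked, entities=[("John Smith", "PERSON")])
```

Here "email analysis" is dropped because "email" already appears in a stored topic, "John Smith" is dropped as a PERSON entity, and "33120x" is removed as a mostly numeric term.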


Named Entity Recognition

(Optional)
Store recognised entities within their corresponding facet.

Configuration

Enrichment

One facet per entity, like Location = [Europe, London]
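
A minimal sketch of this grouping, assuming entities arrive as `(text, label)` pairs and that the facet name equals the entity label. Both the helper name `entities_to_facets` and the label names in the example are hypothetical, and the pipelet's actual facet naming may differ.

```python
from collections import defaultdict


def entities_to_facets(entities):
    """Group entity texts by label, keeping first-seen order and dropping repeats."""
    facets = defaultdict(list)
    for text, label in entities:
        if text not in facets[label]:
            facets[label].append(text)
    return dict(facets)


facets = entities_to_facets(
    [
        ("Europe", "Location"),
        ("London", "Location"),
        ("Europe", "Location"),  # repeated mention, stored only once
        ("Squirro", "Organization"),
    ]
)
```

The result contains one key per entity type, for example `facets["Location"]` holds `["Europe", "London"]`.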


Sentiment Analysis

Applies rule-based sentiment analysis (vaderSentiment) that is specifically attuned to sentiments expressed in social media and in domains such as NY Times editorials, movie reviews, and product reviews.
It doesn’t require any training data but is constructed from a generalizable, valence-based, human-curated gold standard sentiment lexicon.
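
To illustrate what "rule-based" and "lexicon-based" mean in practice, here is a toy sketch that uses only the standard library. It is not vaderSentiment: the four-word lexicon, its valence values, the negation window of three tokens, and the damping factor are made up for this example. It only shows the general idea of looking up word valences and flipping them after a negation.

```python
import re

# Toy valence lexicon; values are illustrative, not taken from any real lexicon.
LEXICON = {"good": 2.0, "great": 3.0, "bad": -2.5, "terrible": -3.0}
NEGATIONS = {"not", "no", "never"}
NEGATION_FACTOR = -0.74  # flips and dampens a negated valence (illustrative value)


def sentiment_score(text):
    """Sum the valences of known words, flipping those preceded by a negation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    for i, tok in enumerate(tokens):
        valence = LEXICON.get(tok)
        if valence is None:
            continue
        # Rule: a negation within the three preceding tokens inverts the valence.
        if any(prev in NEGATIONS for prev in tokens[max(0, i - 3) : i]):
            valence *= NEGATION_FACTOR
        total += valence
    return total


def sentiment_label(text):
    """Map the summed score to a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

With this sketch, `sentiment_label("This was not a good experience")` yields `"negative"` because the negation rule inverts the positive valence of "good". No training data is involved, only the lexicon and the rules.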

Configuration

Enrichment

Examples

Positive Product Feedback

"The tech provides insight into unstructured email content, it allows me to truly understand the conversation between the business and our customers. The insight gained from this analysis is significantly deeper than can be achieved from structured data analysis." (Copied from Gartner)

{
'sentiment_pretrained': ['positive'],
'positive_terms': ['truly understand', 'insight gained'],
'negative_terms': [],
'nlp_tag__phrases': ['structured data analysis', 'unstructured email content']
}

→ This review showcases the combined insights gained from sentiment assessment and key-phrase extraction.

Negative Feedback

“This was not a good experience”

{
'sentiment_pretrained': ['negative'], 
'positive_terms': [], 
'negative_terms': ['not a good experience']
}