The content cleanup enrichment cleans up incoming text and removes potentially malicious content from the HTML body.
Overview
The Content Standardization step cleans content as it comes in. All HTML tags are removed from text fields such as title or summary. Potentially harmful tags and attributes, such as script tags, are removed from the HTML body field.
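The two kinds of cleanup can be illustrated with a minimal Python sketch built on the standard library's html.parser module. This is not the step's actual implementation. The function names strip_tags and sanitize_html, the allowlists of tags, attributes and URL schemes, and the choice to drop the contents of script and style elements entirely are all assumptions made for illustration. For production use, a maintained allowlist-based sanitizer library is the safer choice.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical allowlists for illustration only; the real step's rules are not documented.
ALLOWED_TAGS = {"p", "br", "b", "i", "em", "strong", "ul", "ol", "li",
                "a", "h1", "h2", "h3", "blockquote"}
ALLOWED_ATTRS = {"a": {"href", "title"}}
ALLOWED_SCHEMES = {"", "http", "https", "mailto"}  # "" permits relative links
DROP_CONTENT_TAGS = {"script", "style"}  # removed together with their contents
VOID_TAGS = {"br"}


class _TextOnly(HTMLParser):
    """Keeps text nodes and discards every tag (for plain-text fields)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT_TAGS:
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT_TAGS and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


class _Sanitizer(HTMLParser):
    """Re-emits only allowlisted tags and attributes (for the HTML body)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT_TAGS:
            self._skip += 1
            return
        if self._skip or tag not in ALLOWED_TAGS:
            return  # disallowed tag: drop the tag itself but keep its text
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue  # drops event handlers such as onclick
            if name == "href":
                # Remove whitespace/control characters before reading the scheme.
                compact = "".join(ch for ch in value if ch > " ")
                try:
                    scheme = urlsplit(compact).scheme.lower()
                except ValueError:
                    continue  # unparsable URL: drop the attribute
                if scheme not in ALLOWED_SCHEMES:
                    continue  # e.g. javascript: URLs
            kept.append(f' {name}="{escape(value)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT_TAGS:
            if self._skip:
                self._skip -= 1
            return
        if not self._skip and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip:
            self.out.append(escape(data))  # re-escape the decoded text


def strip_tags(text):
    """Plain-text fields (e.g. title, summary): remove all HTML tags."""
    parser = _TextOnly()
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)


def sanitize_html(html):
    """HTML body: keep allowlisted markup, drop scripts and unsafe attributes."""
    parser = _Sanitizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, sanitize_html('<p onclick="x()">Hi<script>alert(1)</script></p>') returns '<p>Hi</p>': the paragraph and its text survive, while the event handler and the script are removed. An allowlist is used rather than a list of forbidden tags because anything the sketch does not explicitly know about is dropped by default.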
This step also ensures that the summary item field is set, so that items display well in result lists. If the summary is mapped manually when loading data, this step can be omitted.
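The summary fallback can be sketched in the same way. This Python sketch assumes, hypothetically, that an item is a dictionary with summary and body keys; it leaves a manually mapped summary untouched and otherwise derives one from the visible body text, truncated at a word boundary. The function name ensure_summary, the 200-character limit, the set of block tags and the "..." suffix are illustrative choices, not documented behavior of the step.

```python
from html.parser import HTMLParser

SUMMARY_MAX_CHARS = 200  # hypothetical limit, not a documented product value
BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "tr"}  # separate words across blocks


class _VisibleText(HTMLParser):
    """Collects visible text from HTML, skipping script and style contents."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def ensure_summary(item, max_chars=SUMMARY_MAX_CHARS):
    """Fill item["summary"] from item["body"] if no summary was mapped.

    The item dict is updated in place and also returned.
    """
    if (item.get("summary") or "").strip():
        return item  # a manually mapped summary is left untouched
    parser = _VisibleText()
    parser.feed(item.get("body") or "")
    parser.close()
    text = " ".join("".join(parser.parts).split())  # collapse whitespace
    if len(text) > max_chars:
        # Cut at the last space before the limit to avoid splitting a word.
        text = text[:max_chars].rsplit(" ", 1)[0] + "..."
    item["summary"] = text
    return item
```

For example, ensure_summary({"body": "<p>Hello</p><p>world</p>"}) sets the summary to "Hello world", while an item that already has a non-empty summary is returned unchanged.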
When dealing with data from untrusted sources, this step should always be used.
Configuration
This step does not take any configuration.