The unshorten link pipeline step resolves shortened links and expands them to their long version, so that items with short URLs are indexed under the full URL. This helps Duplicate Detection, which by default relies on a combination of title and link.
Enrichment name | unshorten-link |
---|---|
Stage | deduplication |
Enabled by default | Yes, except for the bulk provider (affects items that are uploaded through the ItemUploader, DocumentUploader, File Importer, etc.) |
During the `unshorten-link` step, the `link` field of items is expanded to resolve any HTTP redirects. This ensures that tiny URLs, e.g. from Twitter posts, are expanded to their long version.
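To illustrate what resolving HTTP redirects involves, here is a minimal Python sketch that follows a chain of redirects until it reaches a URL that no longer redirects. It is not the enrichment's actual implementation. The function names `unshorten` and `http_next_location`, the use of `HEAD` requests, the hop limit, and the timeout are all assumptions made for this example.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional
from urllib.parse import urljoin


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so each hop can be inspected."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError for the 3xx response


def http_next_location(url: str, timeout: float = 5.0) -> Optional[str]:
    """Return the redirect target of `url`, or None if it does not redirect.

    Hypothetical helper: sends a single HEAD request without following redirects.
    """
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout):
            return None  # successful response, so this is the final URL
    except urllib.error.HTTPError as err:
        if err.code in (301, 302, 303, 307, 308):
            return err.headers.get("Location")
        return None  # other HTTP errors: keep the URL as it is
    except OSError:
        return None  # network failure or timeout: keep the URL as it is


def unshorten(
    url: str,
    next_location: Callable[[str], Optional[str]] = http_next_location,
    max_hops: int = 10,
) -> str:
    """Follow redirects hop by hop and return the last URL reached."""
    seen = {url}
    for _ in range(max_hops):
        target = next_location(url)
        if not target:
            return url
        # Location headers may be relative, so resolve against the current URL.
        url = urljoin(url, target)
        if url in seen:
            return url  # redirect loop detected, stop here
        seen.add(url)
    return url  # hop limit reached
```

Each hop costs one network round trip, which is why the lookup is injected as `next_location`: the loop can be exercised with a plain dictionary lookup instead of real requests, for example `unshorten("https://t.co/abc", {"https://t.co/abc": "https://example.com/long"}.get)`.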
Because this step has to make HTTP requests to the linked websites, it adds latency to pipeline processing. If your data source does not contain shortened URLs, you can disable this step using the processing config.
This enrichment has no configuration options other than the `enabled` property, which turns it on or off.
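As a sketch of what disabling the step might look like, the snippet below builds a processing config as a Python dictionary. The step name `unshorten-link` and the `enabled` property come from this page. The surrounding structure (a mapping keyed by enrichment name) is an assumption and should be checked against the processing config reference.

```python
import json

# Assumed layout: one entry per enrichment, keyed by the enrichment name.
processing_config = {
    "unshorten-link": {"enabled": False},  # skip redirect resolution
}

# Print the JSON form, which is handy when the config is stored or sent as JSON.
print(json.dumps(processing_config, indent=2))
```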