The Thumbnail Extraction pipeline step finds a thumbnail image to represent the Squirro item.

Enrichment name: Thumbnail Extraction (internally referred to as "webshot")
Stage: processing
Enabled by default: Yes


Overview

There are two ways that Squirro can find the right thumbnail for the item:

  • If the webshot_picture_hint field points to a valid image URL, that image is used as the thumbnail.
  • Otherwise, the web page is downloaded and analyzed to find its most prominent image.
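The selection order above can be sketched in Python as follows. This is a minimal illustration, not Squirro's implementation: it assumes the item is a plain dict with optional webshot_picture_hint and link keys, and the helper names looks_like_image_url and choose_thumbnail_source are hypothetical. The "valid image URL" check is a simplified extension-based heuristic; the real step may validate the hint differently, for example by fetching it.

```python
from urllib.parse import urlparse

# Hypothetical heuristic: the real pipeline step may validate images differently.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp")


def looks_like_image_url(url):
    """Return True if url is an http(s) URL whose path ends in an image extension."""
    if not url:
        return False
    parsed = urlparse(url)
    return (
        parsed.scheme in ("http", "https")
        and bool(parsed.netloc)
        and parsed.path.lower().endswith(IMAGE_EXTENSIONS)
    )


def choose_thumbnail_source(item):
    """Decide how a thumbnail would be found for an item (assumed to be a dict)."""
    hint = item.get("webshot_picture_hint")
    if looks_like_image_url(hint):
        # 1. A valid picture hint is used directly as the thumbnail.
        return ("picture_hint", hint)
    # 2. Otherwise fall back to downloading and analyzing the web page.
    return ("page_analysis", item.get("link"))


if __name__ == "__main__":
    item = {
        "link": "https://www.example.com/article",
        "webshot_picture_hint": "https://www.example.com/images/cover.jpg",
    }
    print(choose_thumbnail_source(item))
    # ('picture_hint', 'https://www.example.com/images/cover.jpg')
```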

Configuration

Thumbnail extraction relies on an Amazon Web Services (AWS) S3 bucket to store the thumbnail images and to retrieve them for display. Configure the following files:

/etc/squirro/common.ini
[services_external]
thumbler = //thumbler-testing.squirro.net

[thumbler_salt]
thumb = <salt_1>
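The thumbler value in common.ini is a protocol-relative URL (it starts with //), so it carries no scheme of its own. Resolved the way browsers resolve such references, it inherits the scheme of the page it is used from. The Python sketch below illustrates that resolution with the standard library; the Squirro page URL is a hypothetical example, and the sketch does not describe how Squirro itself reads the setting.

```python
import configparser
from urllib.parse import urljoin

# Contents mirroring the /etc/squirro/common.ini example above.
COMMON_INI = """
[services_external]
thumbler = //thumbler-testing.squirro.net

[thumbler_salt]
thumb = <salt_1>
"""


def thumbler_base_url(config_text, page_url):
    """Resolve the protocol-relative thumbler URL against a page URL."""
    config = configparser.ConfigParser()
    config.read_string(config_text)
    thumbler = config.get("services_external", "thumbler")
    # urljoin keeps the page's scheme and takes the host from the "//" reference.
    return urljoin(page_url, thumbler)


if __name__ == "__main__":
    # Hypothetical Squirro page URL, used only to supply a scheme.
    print(thumbler_base_url(COMMON_INI, "https://squirro.example.com/app/"))
    # https://thumbler-testing.squirro.net
```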

/etc/squirro/webshot.ini
[aws]
access_key = <key_1>
secret_key = <key_2>
s3_bucket = webshot.testing.squirro.net
s3_base_url = http://webshot.testing.squirro.net.s3-website-eu-west-1.amazonaws.com/

[webshot]
use_thumbler = True
thumbler_config = thumb
thumbler_bucket = webshot
thumbler_salt = <salt_1>

Then restart the sqwebshotd service.
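In the example, webshot.ini and common.ini carry the same salt placeholder: thumbler_salt in the [webshot] section equals the entry in the [thumbler_salt] section of common.ini whose key is the thumbler_config name (thumb). That suggests the values must match. The Python sketch below checks this relationship before the service is restarted. It is a hypothetical helper, not a Squirro tool, and it assumes the naming relationship seen in the example holds in general; in practice you would pass in the contents of the two files instead of the inline example strings.

```python
import configparser

# Inline copies of the relevant parts of the examples above.
COMMON_INI = """
[thumbler_salt]
thumb = <salt_1>
"""

WEBSHOT_INI = """
[webshot]
use_thumbler = True
thumbler_config = thumb
thumbler_bucket = webshot
thumbler_salt = <salt_1>
"""


def load(text):
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


def check_webshot_salt(common_text, webshot_text):
    """Return a list of mismatches between common.ini and webshot.ini contents."""
    common = load(common_text)
    webshot = load(webshot_text)
    problems = []

    if not webshot.getboolean("webshot", "use_thumbler", fallback=False):
        problems.append("[webshot] use_thumbler is not True")

    config_name = webshot.get("webshot", "thumbler_config", fallback=None)
    salt = webshot.get("webshot", "thumbler_salt", fallback=None)
    if config_name is None:
        problems.append("[webshot] thumbler_config is missing")
    else:
        # Assumption: the salt in common.ini is keyed by the thumbler_config name.
        expected = common.get("thumbler_salt", config_name, fallback=None)
        if expected is None:
            problems.append(f"common.ini has no [thumbler_salt] entry '{config_name}'")
        elif salt != expected:
            problems.append("[webshot] thumbler_salt differs from common.ini")
    return problems


if __name__ == "__main__":
    print(check_webshot_salt(COMMON_INI, WEBSHOT_INI))
    # []
```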


/etc/squirro/thumbler.ini
[bucket_webshot]
is_s3 = True
access_key = <key_1>
secret_key = <key_2>
s3_bucket = webshot.testing.squirro.net

[config_thumb]
operation = scale
salt = <salt_1>

Then restart the sqthumblerd service.
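The thumbler.ini sections in the example line up with the webshot.ini settings: thumbler_bucket = webshot corresponds to [bucket_webshot], thumbler_config = thumb corresponds to [config_thumb], the S3 bucket name is repeated, and the salt is the same. The Python sketch below cross-checks those pairs. It is a hypothetical helper that assumes this bucket_X / config_Y naming convention holds in general; it only compares the bucket name, not the credentials, since those could legitimately differ.

```python
import configparser

# Inline copies of the relevant parts of the examples above.
WEBSHOT_INI = """
[aws]
s3_bucket = webshot.testing.squirro.net

[webshot]
use_thumbler = True
thumbler_config = thumb
thumbler_bucket = webshot
thumbler_salt = <salt_1>
"""

THUMBLER_INI = """
[bucket_webshot]
is_s3 = True
s3_bucket = webshot.testing.squirro.net

[config_thumb]
operation = scale
salt = <salt_1>
"""


def load(text):
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


def check_thumbler_sections(webshot_text, thumbler_text):
    """List mismatches between webshot.ini and thumbler.ini contents.

    Assumes: thumbler_bucket = X -> [bucket_X], thumbler_config = Y -> [config_Y].
    """
    webshot = load(webshot_text)
    thumbler = load(thumbler_text)
    problems = []

    bucket = "bucket_" + webshot.get("webshot", "thumbler_bucket", fallback="")
    if not thumbler.has_section(bucket):
        problems.append(f"thumbler.ini lacks [{bucket}]")
    else:
        if not thumbler.getboolean(bucket, "is_s3", fallback=False):
            problems.append(f"[{bucket}] is_s3 is not True")
        if thumbler.get(bucket, "s3_bucket", fallback=None) != webshot.get("aws", "s3_bucket", fallback=None):
            problems.append(f"[{bucket}] s3_bucket differs from webshot.ini [aws]")

    config = "config_" + webshot.get("webshot", "thumbler_config", fallback="")
    if not thumbler.has_section(config):
        problems.append(f"thumbler.ini lacks [{config}]")
    elif thumbler.get(config, "salt", fallback=None) != webshot.get("webshot", "thumbler_salt", fallback=None):
        problems.append(f"[{config}] salt differs from webshot.ini thumbler_salt")

    return problems


if __name__ == "__main__":
    print(check_thumbler_sections(WEBSHOT_INI, THUMBLER_INI))
    # []
```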


Web server configuration to forward requests for the thumbler URL (set in common.ini) to the Squirro cluster
Example for nginx, in /etc/nginx/conf.d/thumbler.conf:
upstream thumbler-testing {
    server ip-squirro-cluster-node:443;
}

server {
    listen 443 ssl;
    server_name  thumbler-testing.squirro.net;

    ssl_certificate <ssl_certificate_1>;
    ssl_certificate_key <ssl_key_1>;

    location / {
        proxy_pass https://thumbler-testing/service/thumbler/;
        proxy_set_header Host $host;
        proxy_set_header Connection Close;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_redirect    off;
        proxy_read_timeout 60;
    }

    # redirect server error pages to the static page /50x.html
    #
    error_page   500 502 503 504  /50x.html;
    location = /50x.html {
        root   /usr/share/nginx/html;
    }
}

Then reload nginx, or whichever other web server you are using.
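Because the proxy_pass URL in the example contains a path (/service/thumbler/), nginx replaces the part of the request URI that matched the location (/) with that path. A request for https://thumbler-testing.squirro.net/<path> therefore reaches the cluster node as /service/thumbler/<path>. The Python function below mimics only this prefix substitution for illustration; it is a simplified, hypothetical helper that ignores URI normalization, regex locations, and other nginx details.

```python
def upstream_uri(request_uri, location="/", proxy_pass_uri="/service/thumbler/"):
    """Mimic nginx prefix-location replacement when proxy_pass includes a URI.

    Simplified: assumes a plain prefix location and an already-normalized URI.
    """
    if not request_uri.startswith(location):
        raise ValueError(f"{request_uri!r} does not match location {location!r}")
    # The matched location prefix is swapped for the proxy_pass path.
    return proxy_pass_uri + request_uri[len(location):]


if __name__ == "__main__":
    print(upstream_uri("/some/image/path"))
    # /service/thumbler/some/image/path
```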
