This section covers adding cluster nodes to an existing Squirro installation. Refer to the Setup on Linux documentation for the base installation.


Overview

For background on Squirro cluster setups, refer to the How Squirro Scales document. It describes in detail the components of Squirro and their scaling considerations.

Prerequisites

Please refer to the prerequisites in the Setup on Linux document. In summary, ensure that:

  • The Linux machines are set up. Red Hat® Enterprise Linux® (RHEL) or its open source derivative CentOS Linux are both supported.

  • The network is set up and all the machines that are to become part of the Squirro cluster can talk to each other.

  • The firewalls between those machines are open, with the documented ports accessible.

  • The Squirro YUM repository is configured and accessible. In enterprise environments where this poses a problem, an offline installation is available. Contact support in that case.

Expansion Process

Any expansion of the cluster requires work on both the existing and the new nodes. The processes below are therefore split into sections based on where each step is executed.

The process as described here involves some cluster downtime. It is possible to expand a Squirro cluster without any downtime, but that process requires more planning and orchestration. If you need a downtime-free expansion, contact Squirro support.

Storage Node Installation

The Squirro storage nodes are based on Elasticsearch. Some of the configuration for adding a storage node is thus in the Elasticsearch configuration files.

Process on new storage node

The following process should be applied on the new storage node.

  1. Install/update the storage node package, as described in the section Storage Node Installation of Setup on Linux.

  2. Apply some of the configuration from the previous storage nodes to the new one. Copy over the following setting from /etc/elasticsearch/elasticsearch.yml (an example follows below):

    • cluster.name

Note

Make sure that this setting value is copied from the previous storage nodes to the new one, and not the other way around.
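
For example, if the existing nodes use the cluster name squirro-cluster (a placeholder value for this sketch), the new node's file contains the same line:

Code Block
# /etc/elasticsearch/elasticsearch.yml
# Value copied from an existing storage node; "squirro-cluster" is a placeholder
cluster.name: squirro-cluster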

Process on all storage nodes (existing as well as new)

Allow the hosts to discover each other. Again in /etc/elasticsearch/elasticsearch.yml, change the following settings.

First, allow Elasticsearch to bind to a network interface:

/etc/elasticsearch/elasticsearch.yml

Code Block
network.host: <server ip>,127.0.0.1

This is a list of the server's own IP addresses (as retrieved with ip addr, for example), e.g. 10.1.4.5.
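
For example, with a server address of 10.1.4.5 the line reads:

Code Block
# Bind to the server's own address plus loopback
network.host: 10.1.4.5,127.0.0.1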


Set discovery.seed_hosts and cluster.initial_master_nodes to a list of all the storage nodes that have been set up. For example:

/etc/elasticsearch/elasticsearch.yml

Code Block
discovery.seed_hosts: ["<storagenode1 ip>", "<storagenode2 ip>", "<storagenode3 ip>"]
cluster.initial_master_nodes: ["<storagenode1 ip>", "<storagenode2 ip>", "<storagenode3 ip>"]

This is the easiest way to set up discovery and make sure all the Elasticsearch nodes can see each other. Other ways of configuring node discovery are documented by Elasticsearch in the Discovery section of its manual.

Optionally, set the node name to a friendlier value:

/etc/elasticsearch/elasticsearch.yml

Code Block
# User friendly node name
node.name: test-node1


On new nodes, remove the current Elasticsearch state:

Code Block
mv /var/lib/elasticsearch/nodes /tmp/

Note: You can also delete this folder instead of moving it to /tmp, but moving it allows you to recover if you ran the command on the wrong node.


Restart the service for the settings to take effect:

Code Block
systemctl restart elasticsearch


To verify that the nodes discovered each other and formed a cluster, inspect the node list:

Code Block
curl 'localhost:9200/_nodes?pretty' | less

If successful, you should see the correct number of nodes in the output:

Code Block
…
  "_nodes" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
…
  1. Set up the number of shards and the number of replicas.

  2. Modify number_of_shards and number_of_replicas in /etc/elasticsearch/templates/squirro_v9.json. With multiple storage nodes, number_of_replicas is usually set to 1 and number_of_shards to the number of Elasticsearch nodes.

  3. Push the new templates to Elasticsearch. This only ensures that newly created indices get the updated shards and replicas settings.

    Code Block
    cd /etc/elasticsearch/templates
    ./ensure_templates.sh
  4. Push the new scripts to Elasticsearch:

    Code Block
    cd /etc/elasticsearch/scripts
    ./ensure_scripts.sh
  5. Now update the shards and replicas settings of indices that already existed on the cluster before the templates were updated, using the curl command below (a verification sketch follows this list). You can also selectively update the settings of a particular index (instead of all indices) by replacing the * with the name of the index. This is a cluster-wide setting and only needs to be run on one of the nodes of the ES cluster.

    Code Block
    curl -XPUT http://127.0.0.1:9200/*/_settings -H "Content-Type: application/json" -d '{"index": {"number_of_replicas": 1}}'
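
To confirm that the new settings have been applied, you can query the index settings on any node, for example:

Code Block
# Show the shard and replica settings of all indices
curl -s 'http://127.0.0.1:9200/*/_settings?pretty' | grep -E 'number_of_(shards|replicas)'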

Cluster Node Installation

Process on each new Squirro Cluster Node server


Install the cluster node package, as described in the section Cluster Node Installation of Setup on Linux.


Install the additional required package for cluster coordination (Zookeeper library): yum install squirro-python-squirro.lib.zookeeper


Ensure that each of the cluster nodes can talk to the Elasticsearch cluster (the Squirro storage nodes). Change the nginx config at /etc/nginx/conf.d/upstream-elastic.inc to:

Code Block
upstream elastic {
   server <storagenode1 ip>:9200;
   server <storagenode2 ip>:9200;
   server <storagenode3 ip>:9200;
}


Whitelist all the Squirro Cluster nodes in the following nginx ipfilter files:

  • /etc/nginx/conf.d/ipfilter-cluster-nodes.inc

  • /etc/nginx/conf.d/ipfilter-api-clients.inc

  • /etc/nginx/conf.d/ipfilter-monitoring.inc

    In each of these files, include the IP address of every cluster node as follows:

    Code Block
    allow <clusternode1 ip>;
    allow <clusternode2 ip>;
    allow <clusternode3 ip>;

    Alternatively, allow also accepts network addresses, e.g. 10.1.4.0/24, to whitelist an entire network.


Reload nginx on each of the cluster nodes:

Code Block
$ systemctl reload nginx


Create the MariaDB / MySQL control users required by the cluster service. Because these users have elevated permissions, they are not created automatically by the Squirro installer. To create them, invoke the mysql command as the root user, then enter the following two commands:

Code Block
grant select, reload, super, lock tables, replication slave, create, drop, alter, insert, process on *.* to "cluster"@"localhost" identified by "CLUSTER_CLUSTER_PASSWORD";

grant replication slave, replication client on *.* to "repl"@"%" identified by "CLUSTER_REPL_PASSWORD";

Generate secure passwords for both users (they need to be the same on all cluster nodes); they will be added to the configuration file later.
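
For example, suitable random passwords can be generated with openssl (any secure password generator works; the placeholder names are the ones used throughout this page):

Code Block
# Generate one password per control user and reuse the same values on every cluster node
openssl rand -base64 24   # use as CLUSTER_CLUSTER_PASSWORD
openssl rand -base64 24   # use as CLUSTER_REPL_PASSWORD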


Stop all the services by executing the following command:

Code Block
cd /usr/lib/systemd/system
for service in $(ls sq*d.service); do echo "Stopping $service"; systemctl stop $service; done
for service in redis-server redis-server-cache mariadb zookeeper; do echo "Stopping $service"; systemctl stop $service; done

Squirro cluster service


Edit the /etc/squirro/cluster.ini configuration file.

  1. Ensure that the mgmt_iface setting in the [cluster] section specifies a valid network interface on which all the cluster nodes can communicate with each other (look it up using the network settings of your cluster node, e.g. with ifconfig) and set it appropriately:

    Code Block
    [cluster]
    # If the appropriate network interface is ens142
    mgmt_iface = ens142
  2. Also inside the [cluster] section, edit the following settings:

    1. id: change this to the same value as on the previous cluster nodes; it must be identical on all cluster nodes.

    2. redis_controller: set this to true so that Redis replication is managed by the Squirro cluster service.

    3. mysql_controller: set this to true so that MySQL replication is managed by the Squirro cluster service.

  3. Add the database passwords to the [mysql] section (changing the passwords to the generated values):

    Code Block
    [mysql]
    db = mysql+pymysql://cluster:CLUSTER_CLUSTER_PASSWORD@127.0.0.1:3306
    repl_password = CLUSTER_REPL_PASSWORD
  4. Add the list of all the Zookeeper nodes (including this new node) to the hosts list in the [zookeeper] section (a consolidated example of all these changes follows this list):

    /etc/squirro/cluster.ini

    Code Block
    [zookeeper]
    hosts = <clusternode1 ip>:2181,<clusternode2 ip>:2181,<clusternode3 ip>:2181
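
Putting the previous steps together, the edited parts of /etc/squirro/cluster.ini look roughly like this (a sketch; all values are placeholders to be replaced with your own):

Code Block
[cluster]
# Same id as on the existing cluster nodes
id = <cluster id>
mgmt_iface = ens142
redis_controller = true
mysql_controller = true

[mysql]
db = mysql+pymysql://cluster:CLUSTER_CLUSTER_PASSWORD@127.0.0.1:3306
repl_password = CLUSTER_REPL_PASSWORD

[zookeeper]
hosts = <clusternode1 ip>:2181,<clusternode2 ip>:2181,<clusternode3 ip>:2181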

Enable endpoint discovery in all Squirro service configuration files:

Code Block
sed -i -e 's/^endpoint_discovery = false/endpoint_discovery = true/' /etc/squirro/*.ini
sed -i -e 's/^db_endpoint_discovery = false/db_endpoint_discovery = true/' /etc/squirro/*.ini


MySQL

  1. Enable MySQL replication. This requires two changes in /etc/mysql/conf.d/replication.cnf; both of these values are commented out by default (see the example after this list):

    1. server_id: this integer value needs to be unique across the whole cluster. For example, use 10 for the first server, 11 for the second, and so on.

    2. report_host: set this to the human-readable name of the server, as it should be reported to the other hosts, for example node01.

  2. Raise the MySQL limits on open files and maximum connections.

    /etc/mysql/conf.d/maxconnections.cnf

    Code Block
    [mysqld]
    open_files_limit = 8192
    max_connections = 500

    The max_connections setting should be set higher depending on the number of cluster nodes. We recommend at least 150 connections for each cluster node.
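
For step 1 above, the uncommented values in /etc/mysql/conf.d/replication.cnf look like this for the second server (example values, assuming the settings live under the [mysqld] group as in the other MySQL config files on this page):

Code Block
[mysqld]
# Unique across the whole cluster
server_id = 11
# Human-readable name reported to the other hosts
report_host = node02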

Redis

Extend the listening addresses of the redis-server and redis-server-cache services by editing /etc/redis/redis.conf and /etc/redis/cache.conf so that Redis listens on an interface reachable by all cluster nodes (including this new server):

/etc/redis/redis.conf and /etc/redis/cache.conf

Code Block
bind 0.0.0.0


Zookeeper

  1. Set the unique Zookeeper node identifier. This ID starts at 1 and is incremented by 1 for each additional node. Write the identifier to /var/lib/zookeeper/data/myid (see the sketch after this list).

  2. Add a list of all cluster nodes to Zookeeper. Edit /etc/zookeeper/zoo.cfg and list all the cluster nodes (including this new server):

    /etc/zookeeper/zoo.cfg

    Code Block
    languagetext
    server.1=<clusternode1 ip>:2888:3888
    server.2=<clusternode2 ip>:2888:3888
    server.3=<clusternode3 ip>:2888:3888
  3. Start the services necessary for a leader election. Do NOT start the cluster service yet, to avoid promoting the new node to master. Only start the cluster service on this node after making sure that a leader has been elected from one of the existing nodes, by following the next step.

    Code Block
    systemctl start zookeeper
    systemctl start redis-server
    systemctl start redis-server-cache
    systemctl start mariadb
  4. At this point, follow the Process on all the other cluster nodes section below.
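
For step 1 above, a minimal sketch of setting the identifier, assuming this is the third node in the cluster:

Code Block
# Write the unique Zookeeper ID for this node
echo 3 > /var/lib/zookeeper/data/myid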


Starting

  1. Start the cluster node:

    Code Block
    systemctl start sqclusterd
  2. Wait for the cluster node to come up. Make sure the election leader is the same one as on the previous nodes.

    Code Block
    curl -s http://127.0.0.1/service/cluster/v0/leader/cluster.json | python -mjson.tool | grep electionLeader

    This command may have to be repeated a few times until a result is returned.

  3. Start all other services:

    CentOS 7

    Code Block
    cd /usr/lib/systemd/system
    for service in $(ls sq*d.service); do echo "Starting $service"; systemctl start $service; done

Process on all the other cluster nodes (existing nodes, before the cluster expansion)

This process needs to happen together with the Zookeeper configuration on the new cluster node.

  1. Add the new server to the Zookeeper configuration. Edit /etc/zookeeper/zoo.cfg and list all the cluster nodes (including the new server):

    /etc/zookeeper/zoo.cfg

    Code Block
    server.1=<clusternode1 ip>:2888:3888
    server.2=<clusternode2 ip>:2888:3888
    server.3=<clusternode3 ip>:2888:3888

    This list should be identical on every cluster node.

  2. Add the new Zookeeper node to the existing list of Zookeeper nodes in the Squirro cluster service config file as well. Edit /etc/squirro/cluster.ini and list all the Zookeeper hosts:

    /etc/squirro/cluster.ini

    Code Block
    [zookeeper]
    hosts = <clusternode1 ip>:2181,<clusternode2 ip>:2181,<clusternode3 ip>:2181
  3. Extend the list of listening addresses for the redis-server and redis-server-cache services by editing /etc/redis/redis.conf and /etc/redis/cache.conf and listing all the cluster nodes:

    /etc/redis/redis.conf and /etc/redis/cache.conf

    Code Block
    bind 127.0.0.1 <clusternode1 ip> <clusternode2 ip> <clusternode3 ip>
  4. Restart Redis, Zookeeper and then cluster service:

    Code Block
    systemctl restart redis-server
    systemctl restart redis-server-cache
    systemctl restart zookeeper
    systemctl restart sqclusterd
  5. Check that the election leader points to one of the existing nodes:

    Code Block
    curl -s http://127.0.0.1/service/cluster/v0/leader/cluster.json | python -mjson.tool | grep electionLeader

    This will output a line containing the node that is currently selected as the leader by the Squirro cluster service.

Setting up Cluster Node Storage

Some parts of Squirro require a shared file system. This is used for:

  • Uploading data loader plugins, pipelets and custom widgets to a cluster

  • Handling of the trend detection training data

  • Uploading of files through the Squirro frontend and handling of crawling output

  • Indexing binary data files

This shared file system can be provided through any means, such as a NAS or an existing clustered file system.

The following instructions show how to set up such a shared file system with GlusterFS, a clustered file system.

All of the following commands, except where otherwise stated, are executed on the new node being set up.


Install GlusterFS server

Code Block
yum install -y glusterfs-server
systemctl enable glusterd
systemctl start glusterd
restorecon /var/run/glusterd.socket


Set up connectivity

  • On the new node, connect to all the previous cluster nodes with the following commands (repeated once for every node):

    Code Block
    gluster peer probe <clusternode1 ip>
    gluster peer probe <clusternode2 ip>
  • On all previous nodes, execute the following command with the IP of the new server that is being added (a verification sketch follows this list):

    Code Block
    gluster peer probe <clusternode3 ip>
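
To verify that the peers found each other, check the peer status on any node; every other cluster node should be listed as connected:

Code Block
gluster peer status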

Create or extend the volume

If this is the first installation, create the cluster file system. For fresh installations, the previous steps can be executed on all the servers first, so that the volume create command only has to be run once.

Code Block
gluster volume create gv0 replica 3 <clusternode1 ip>:/var/lib/squirro/storage/gv0/brick0 <clusternode2 ip>:/var/lib/squirro/storage/gv0/brick0 <clusternode3 ip>:/var/lib/squirro/storage/gv0/brick0 force
gluster volume start gv0


If this is a new server being added to an existing GlusterFS installation, execute this command to use the server for the volume:

Code Block
gluster volume add-brick gv0 <clusternode3 ip>:/var/lib/squirro/storage/gv0/brick0 force

The force option at the end confirms that it is okay to create the volume on the Linux system's root file system partition.


Configure the cluster storage volume to be mounted. Add the following line to /etc/fstab:

/etc/fstab (excerpt)

Code Block
127.0.0.1:/gv0 /mnt/gv0 glusterfs defaults 0 0


Then create the mount-point and mount the new file system:

Code Block
mkdir -p /mnt/gv0
mount /mnt/gv0
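
To verify that the GlusterFS volume is mounted:

Code Block
# Should show 127.0.0.1:/gv0 mounted on /mnt/gv0
df -h /mnt/gv0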


Set up all the required directories in the shared file system:

Code Block
install -o sqprovid -g squirro -m 775 -d /mnt/gv0/storage
install -o sqplumbr -g squirro -d /mnt/gv0/pipelets
install -o sqtopic -g squirro -d /mnt/gv0/assets
install -o sqtopic -g squirro -d /mnt/gv0/widgets
install -o sqtopic -g squirro -d /mnt/gv0/machinelearning
install -o sqtopic -g squirro -d /mnt/gv0/machinelearning/job_logs

Change the configuration of the various Squirro services to point to the right folders. Below, for each config file, are the desired sections and values; any values that are not shown here should be left unmodified.

/etc/squirro/storage.ini (excerpt)

Code Block
[storage]
default_bucket = cluster

/etc/squirro/plumber.ini (excerpt)

Code Block
[storage]
directory = /mnt/gv0/pipelets/

/etc/squirro/topic.ini (excerpt)

Code Block
[topic]
custom_assets_directory = /mnt/gv0/assets/
custom_widgets_directory = /mnt/gv0/widgets/

/etc/squirro/machinelearning.ini (excerpt)

Code Block
[machinelearning]
# File system storage for machine learning models/assets
inference_storage = /mnt/gv0/machinelearning
training_storage = /mnt/gv0/machinelearning

[handler_file]
application = machinelearning
jobs_log_dir = /mnt/gv0/machinelearning/job_logs


Replace the previous assets and widgets folders with symlinks:

Code Block
rm -ir /var/lib/squirro/topic/assets
rm -ir /var/lib/squirro/topic/widgets
ln -s /mnt/gv0/assets /var/lib/squirro/topic/assets
ln -s /mnt/gv0/widgets /var/lib/squirro/topic/widgets


In the nginx config file /etc/nginx/conf.d/frontend.conf change a few of the alias declarations:

  1. Inside the location /storage/localfile/ block, change the alias from its default to alias /mnt/gv0/storage/ (see the sketch after this list).

  2. Verify that the configuration is still valid:

    Code Block
    nginx -t
  3. Reload the nginx configuration:

    Code Block
    systemctl reload nginx
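
For step 1 above, the changed block might look like this (a sketch assuming only the alias needs changing; keep all other directives in the block as they are):

Code Block
location /storage/localfile/ {
    alias /mnt/gv0/storage/;
}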


Restart all services:

Code Block
squirro_restart

Troubleshooting

Network drop between the servers in the cluster

This can be caused by a network monitoring tool closing all idle connections at a periodic interval. In this case, try lowering the TCP keep-alive used by the system and services.

For example, setting the value to 600 seconds:

  • Change the /proc/sys/net/ipv4/tcp_keepalive_time value to 600:

Code Block
# echo 600 > /proc/sys/net/ipv4/tcp_keepalive_time
  • Change the tcp-keepalive value to 600 in /etc/redis/redis.conf

  • Add the following to /etc/zookeeper/zoo.cfg:

Code Block
tcpKeepAlive=true

To persist the lowered keep-alive across reboots, add a new file /etc/sysctl.d/98-elasticsearch.conf. Assuming the goal is the 600-second value from above, the file contains:

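Code Block
# Keep the lowered TCP keep-alive setting across reboots
net.ipv4.tcp_keepalive_time = 600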
