Setup on Linux

This section covers installing Squirro on a supported Linux system, either Red Hat® Enterprise Linux® (RHEL) or its open source derivative CentOS Linux.

In addition to this installation method, Squirro also provides fully configured, ready-made images for VMware and VirtualBox. See the Installation section for information on these methods.

Architecture Overview

A Squirro cluster can contain anywhere from a single server to dozens. Adding servers to a Squirro cluster serves two different needs: performance and capacity. As these needs do not necessarily scale in the same way, Squirro distinguishes two types of servers:

  • Storage Nodes: store the data that is indexed into Squirro. Add more storage nodes to scale capacity.
  • Cluster Nodes: answer user requests and handle the processing of incoming items. Add more cluster nodes to improve performance.

Storage and cluster nodes can be installed on the same server, and for a single-node setup that is the recommended approach. If you intend to scale beyond one server, however, it is recommended to install only one of the two node types per server. This makes better use of the available performance and allows for more flexible scaling.

Prerequisites

This manual assumes that a few prerequisites have been met: a fully set-up system with internet connectivity and the relevant ports opened in the firewall.

If some of these conditions cannot be met, please contact support.

Linux setup

The installation of the base system is not covered in this manual. A fully functional RHEL or CentOS installation is assumed to be set up. The supported versions of these Linux distributions are documented on the System Requirements page.

If you want to avoid setting up Linux yourself, there are pre-installed images available for VMware and VirtualBox. See the Installation section for information on these methods.

Networking

The Linux system needs internet access to download the Squirro packages. Additionally, the system should be accessible from your internal network, so that administrators and users can reach the Squirro interface.

Information on how to set up networking is provided by Red Hat in their networking guide.
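As a quick sanity check (a sketch; adjust to your environment), you can verify outbound connectivity to the Squirro package mirror and list the addresses the server is reachable on:

```shell
# Check outbound HTTPS connectivity to mirror.squirro.net (the package
# mirror used in the repository configuration below) and show the host's
# IPv4 addresses. This only tests the connection, not the credentials.
if curl -s -o /dev/null --connect-timeout 10 https://mirror.squirro.net/; then
    echo "mirror reachable"
else
    echo "mirror NOT reachable - check DNS, proxy and firewall settings"
fi
hostname -I
```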

Firewall

Incoming traffic to Squirro servers should be open on a number of TCP ports. The individual ports and the required access level are documented below:

Storage Nodes

| TCP Port | Usage | Open for |
| --- | --- | --- |
| 9200 | Elasticsearch access | All storage and Squirro nodes |
| 9300 | Elasticsearch replication | All storage nodes |

Squirro Nodes

| TCP Port | Usage | Open for |
| --- | --- | --- |
| 80 | Web access | All Squirro nodes; optionally all clients, if HTTPS access (port 443) is not desired |
| 111 | Gluster | All Squirro nodes (in multi-node setups) |
| 443 | Web access (SSL-protected) | All clients |
| 2181 | Zookeeper | All Squirro nodes |
| 2888 | Zookeeper node discovery | All Squirro nodes (in multi-node setups) |
| 3306 | MySQL | All Squirro nodes |
| 3888 | Zookeeper node discovery | All Squirro nodes (in multi-node setups) |
| 6379 | Redis (storage) | All Squirro nodes |
| 6380 | Redis (cache) | All Squirro nodes |
| 24007 | Gluster | All Squirro nodes (in multi-node setups) |
| 49152+ (one per node, e.g. range 49152 – 49155 for a cluster of three nodes) | Gluster nodes | All Squirro nodes (in multi-node setups) |
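As a sketch, the fixed ports from the table above can be opened with firewalld, the default firewall on RHEL/CentOS 7 and later (an assumption; adapt if you use a different firewall). The snippet only prints the commands so they can be reviewed first; pipe the output to sh as root to apply them. The Gluster 49152+ range depends on your cluster size and is not included.

```shell
#!/bin/sh
# Print the firewall-cmd invocations for the fixed Squirro node ports
# listed in the table above. Assumption: firewalld is the active firewall.
PORTS="80 111 443 2181 2888 3306 3888 6379 6380 24007"
for p in $PORTS; do
    printf 'firewall-cmd --permanent --add-port=%s/tcp\n' "$p"
done
# Permanent rules only take effect after a reload.
printf 'firewall-cmd --reload\n'
```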

Users

Squirro provides packages that set up all the required Linux users on a system. Those packages are used in the instructions below.

However, in some environments users must not be created by packages. In these cases, the users have to be created manually up-front. See the separate page Linux users for Squirro for a detailed list of the users that need to be set up.

YUM Repositories

Squirro packages are provided through a YUM repository. YUM is a utility, included with RHEL / CentOS, that downloads and installs packages from central repositories. To configure a new repository, create a file in the folder /etc/yum.repos.d.

Getting the Right Version of Squirro

Please note that for production deployments, we recommend using the latest LTS release of Squirro. To get it, use 3.2-lts in the baseurl of the YUM repository configuration (see below) for the latest LTS release of the 3.2 family. This ensures that a simple yum update on the server will not update it to the latest bi-weekly release.

Also note that when Squirro publishes the next LTS release, 3.3-lts, switching the baseurl to the LTS release of the 3.3 family remains a conscious choice: the existing 3.2-lts link will not be updated to point to the 3.3 family.

Squirro

Create the file /etc/yum.repos.d/squirro.repo. The exact content of the file will be provided to you by Squirro support when delivering the license.

Use the following examples, noting that username and password are not filled in:

CentOS 7

/etc/yum.repos.d/squirro.repo
[squirro-stable]
name=Squirro - CentOS 7 - Stable
baseurl=https://<user name>:<password>@mirror.squirro.net/centos/7/stable/$basearch/<specific version or 'latest' or '3.2-lts'>/
enabled=1
gpgkey=https://mirror.squirro.net/centos/RPM-GPG-KEY-squirro-stable
gpgcheck=1
sslverify=1
metadata_expire=300


RHEL 7

/etc/yum.repos.d/squirro.repo
[squirro-stable]
name=Squirro - Red Hat Enterprise Linux 7 - Stable
baseurl=https://<user name>:<password>@mirror.squirro.net/centos/7/stable/$basearch/<specific version or 'latest' or '3.2-lts'>/
enabled=1
gpgkey=https://mirror.squirro.net/rhel/RPM-GPG-KEY-squirro-stable
gpgcheck=1
sslverify=1
metadata_expire=300


CentOS 8

/etc/yum.repos.d/squirro.repo
[squirro-stable]
name=Squirro - CentOS 8 - Stable
baseurl=https://<user name>:<password>@mirror.squirro.net/centos/8/stable/$basearch/<specific version or 'latest' or '3.2-lts'>/
enabled=1
gpgkey=https://mirror.squirro.net/centos/RPM-GPG-KEY-squirro-stable
gpgcheck=1
sslverify=1
metadata_expire=300


RHEL 8

/etc/yum.repos.d/squirro.repo
[squirro-stable]
name=Squirro - Red Hat Enterprise Linux 8 - Stable
baseurl=https://<user name>:<password>@mirror.squirro.net/centos/8/stable/$basearch/<specific version or 'latest' or '3.2-lts'>/
enabled=1
gpgkey=https://mirror.squirro.net/rhel/RPM-GPG-KEY-squirro-stable
gpgcheck=1
sslverify=1
metadata_expire=300
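To illustrate how the placeholders in baseurl are filled in, here is a sketch with made-up example credentials (the real user name and password come from Squirro support):

```shell
#!/bin/sh
# Assemble an example baseurl. USER and PASS are placeholders, not real
# credentials; VERSION pins the release line (e.g. "latest" or "3.2-lts").
# $basearch is resolved by yum itself and stays literal in the file.
USER="jdoe"
PASS="secret"
VERSION="3.2-lts"
printf 'baseurl=https://%s:%s@mirror.squirro.net/centos/7/stable/$basearch/%s/\n' \
    "$USER" "$PASS" "$VERSION"
```

If the password contains characters such as @ or :, they need to be percent-encoded in the URL. After changing the repo file, running yum clean all makes yum pick up the new metadata.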

Storage Node Installation

Installation of the storage node happens with two separate packages. The first package installs the required Linux users and the second installs the services. Furthermore, we also need to explicitly install a few dependencies. Use the following commands for the installation:

Java

sudo su
yum install java-1.8.0-openjdk

Squirro Storage Node

sudo su
yum install squirro-storage-node-users
yum install elasticsearch
yum install squirro-storage-node

Network connectivity

If you are setting up a dedicated storage node, instead of running cluster and storage node on the same server, you need to change the Elasticsearch configuration so that it listens on a network IP address.

To do this, edit /etc/elasticsearch/elasticsearch.yml and add the server's IP address to the network.host setting. When doing this, you also need to declare discovery.seed_hosts. To actually set up a cluster, see Squirro Cluster Expansion; the following value will only work for the single-node case.

/etc/elasticsearch/elasticsearch.yml
network.host: <storage node IP>,127.0.0.1
discovery.seed_hosts: ["127.0.0.1"]
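After restarting Elasticsearch, you can verify that it answers on the configured address (a sketch; replace the loopback address with your storage node's IP):

```shell
# Restart Elasticsearch so the new network settings take effect, then
# query the cluster health endpoint. A "green" or "yellow" status is
# expected on a healthy single-node setup.
sudo systemctl restart elasticsearch
curl -s --connect-timeout 5 "http://127.0.0.1:9200/_cluster/health?pretty" \
    || echo "Elasticsearch is not reachable on this address"
```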

Cluster Node Installation

Installation of the Squirro cluster node happens with two separate packages. The first package installs the required Linux users and the second installs the services. Furthermore, we also need to explicitly install a few dependencies. Use the following commands for the installation:

Java

Squirro depends on a JRE, which is provided in the Squirro YUM repository itself. This package has to be installed explicitly, as none of the Squirro packages declare an explicit dependency on the JRE. This is done to provide more flexibility for custom deployments where we do not control the version of Java installed on the server. Execute the yum command below to install the JRE:

yum install java-1.8.0-openjdk


Squirro Cluster Node

You can choose to run the MySQL and Redis servers remotely, i.e. not on the Squirro cluster node, provided you set up those MySQL and Redis server installations with a specific configuration.

To set up Squirro with remote MySQL and Redis server backends, create a readable file /etc/squirro/backends.ini with the content:

is_mysql_server_remote = true
is_redis_server_remote = true

As noted above, the first package installs the required Linux users and the second installs the services. Use the following commands for the installation:

yum install squirro-cluster-node-users
yum install squirro-cluster-node

If the storage and cluster nodes are not the same physical machine, you now need to adjust the file /etc/nginx/conf.d/upstream-elastic.inc to point to the IPs or hostnames of the storage node(s).

Examples:

Single server, both roles:

upstream elastic {
    server 127.0.0.1:9200;
    keepalive 32;
}

Dedicated storage node:

upstream elastic {
    server 192.168.0.20:9200;
    keepalive 32;
}

Multiple storage nodes:

upstream elastic {
    server 192.168.0.20:9200;
    server 192.168.0.21:9200;
    server 192.168.0.22:9200;
    keepalive 32;
}
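For larger clusters, the pattern extends with one server line per storage node. As a sketch, the upstream block can be generated from a list of node addresses (the IPs below are examples; substitute your own):

```shell
#!/bin/sh
# Generate the "upstream elastic" block for a list of storage node IPs.
# Redirect the output into /etc/nginx/conf.d/upstream-elastic.inc after
# reviewing it. The addresses below are examples only.
NODES="192.168.0.20 192.168.0.21 192.168.0.22"
echo "upstream elastic {"
for ip in $NODES; do
    printf '    server %s:9200;\n' "$ip"
done
echo "    keepalive 32;"
echo "}"
```

Running nginx -t afterwards validates the configuration before you reload it.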

If changes have been made to this file, reload the nginx configuration:

service nginx reload

If you have chosen to rely on remote MySQL and/or Redis server installations, please follow the steps on Setup on Linux with Remote MySql and Redis Servers.

Finally, start the Squirro services:

RHEL 7 / CentOS 7 and RHEL 8 / CentOS 8
source /etc/profile.d/squirro-aliases.sh
squirro_restart

Please note that this command does not start the cluster service by default. If you also want to start the cluster service, run the following systemctl command:

systemctl start sqclusterd 

Multi-node cluster setup

There are some additional steps to perform when you wish to run Squirro across multiple nodes, as some orchestration between the cluster members is needed.

This is handled by a service called ZooKeeper. Squirro ships its own ZooKeeper library, which needs to be installed separately:

yum install squirro-python-squirro.lib.zookeeper