Squirro relies on Elasticsearch for the storage nodes where the indexed data is persisted. This guide outlines Elasticsearch configuration changes that are often used in the context of Squirro.

Please also refer to the official Elasticsearch documentation for help.

Where to find things

For a default Squirro setup, these are the most important Elasticsearch folders and files on the Squirro storage node:

...

Tools to support Elasticsearch operations

Since version 5.x, Elasticsearch no longer allows running some plugins such as head or kopf. You can use an external add-on instead to see the status of the Elasticsearch cluster, indices and shards during an upgrade or any other operation. For example cerebro:

Installation:

If you want to forward local port 9500 to the Elasticsearch node, use ssh -L 9500:storagehost:9200 followed by the SSH host you connect through. You can then point cerebro at http://localhost:9500 to see the status of the Elasticsearch cluster.

Monitoring

Check that the cluster and nodes are healthy

Code Block
languagebash
curl 'http://localhost:9200/_cluster/health?pretty=true'

You should see a JSON response containing "status" : "green".
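For scripted checks, the status can be extracted from the health response rather than read by eye. The following is a minimal sketch: the function name health_status is made up for this example, and it assumes the response contains a single "status" field as shown above.

```shell
# Hypothetical helper (not part of Squirro or Elasticsearch):
# reads a _cluster/health JSON response on stdin and prints the status value.
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Example use against a live node:
#   curl -s 'http://localhost:9200/_cluster/health?pretty=true' | health_status
```

A monitoring script could compare the printed value with green and raise an alert otherwise.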

Check Elasticsearch access from the cluster node

The cluster node accesses the storage node through the nginx service. To check that access:

Code Block
languagebash
curl -L 'http://127.0.0.1:81/ext/elastic/_cluster/health?pretty=true'

You should see a JSON response containing "status" : "green".

Check that the Elasticsearch service is started

Code Block
languagebash
ps aux | grep elasticsearch | grep -v grep

If the service is running, this command also shows many of its settings, e.g. memory, default.path and process id:

Code Block
languagebash
496 122224 1 11 10:20 ? 00:02:14 /usr/bin/java -Xms6g -Xmx6g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+AlwaysPreTouch -server -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -Djdk.io.permissionsUseCanonicalPath=true -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Dlog4j.skipJansi=true -XX:+HeapDumpOnOutOfMemoryError -Djava.security.policy=file:///etc/elasticsearch/squirro.java.policy -Des.path.home=/usr/share/elasticsearch -cp /usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch -p /var/run/elasticsearch/elasticsearch.pid -d -Edefault.path.logs=/var/log/elasticsearch -Edefault.path.data=/var/lib/elasticsearch -Edefault.path.conf=/etc/elasticsearch

View indices, status and size

By default, each Squirro project corresponds to one index whose name has the format squirro_{template-version}_{lowercase-project-id}, e.g. squirro_v8_36syha3_ss-zwvn9gyk1ww

Code Block
languagebash
curl 'http://localhost:9200/_cat/indices?v'
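To find the index of a specific project, the name can be derived from the format described above. This is a small sketch; the function name squirro_index_name and the mixed-case project id in the usage comment are illustrative assumptions.

```shell
# Hypothetical helper: builds squirro_{template-version}_{lowercase-project-id}.
# $1 = template version (e.g. v8), $2 = project id in any letter case.
squirro_index_name() {
  template_version=$1
  project_id=$2
  lower_id=$(printf '%s' "$project_id" | tr '[:upper:]' '[:lower:]')
  printf 'squirro_%s_%s\n' "$template_version" "$lower_id"
}

# Example use (the project id is made up):
#   curl "http://localhost:9200/_cat/indices/$(squirro_index_name v8 36SyHa3_sS-ZWvn9gYK1ww)?v"
```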

View shards, status and size

One Elasticsearch index usually contains several shards. The number_of_shards value is defined in the index settings.

Code Block
languagebash
curl 'http://localhost:9200/_cat/shards?v'

Tip: There are many useful _cat commands for investigating Elasticsearch. Run curl 'http://localhost:9200/_cat' to list them.

View squirro templates

Use this to make sure the Squirro templates are installed on the Squirro storage node.

Code Block
languagebash
# see details of all templates
curl 'http://localhost:9200/_template?pretty'
# see only squirro template names
curl 'http://localhost:9200/_template?pretty' | grep squirro_v

View number of shards and replicas

Use this to make sure the number of shards and replicas is set correctly on the Squirro storage node.

Code Block
languagebash
# in template
curl 'http://localhost:9200/_template?pretty' | grep number_of
# in setting of given index
curl 'http://localhost:9200/{index_name}/_settings?pretty' | grep number_of

Tip: You can use the wildcard * in index_name, e.g. /squirro_v8_*/

View the mappings and settings of a given index

Use this to make sure a newly created index uses the correct Squirro template.

Code Block
languagebash
curl 'http://localhost:9200/{index_name}/_mappings?pretty'
curl 'http://localhost:9200/{index_name}/_settings?pretty'

View elasticsearch stats

Code Block
languagebash
curl 'http://localhost:9200/_stats?pretty'
# stats of given index
curl 'http://localhost:9200/{index_name}/_stats?pretty'
# stats of nodes
curl 'http://localhost:9200/_nodes/stats?pretty'
# stats of cpu and memory
curl 'http://localhost:9200/_nodes/stats/os?pretty'
# stats of file system
curl 'http://localhost:9200/_nodes/stats/fs?pretty'
# stats of jvm
curl 'http://localhost:9200/_nodes/stats/jvm?pretty'
# stats of fielddata (text field data loaded into memory for aggregations, sorting or scripts)
curl 'http://localhost:9200/_nodes/stats/indices/fielddata?pretty'

Investigate index content

View the number of documents and one sample document in the index

Code Block
languagebash
# query whole index
curl 'http://localhost:9200/{index_name}/_search?size=1&pretty=true'
# query items
curl 'http://localhost:9200/{index_name}/item/_search?size=1&pretty=true'
# query sub items
curl 'http://localhost:9200/{index_name}/sub_item/_search?size=1&pretty=true'

The response contains the number of documents, for example:

Code Block
{
  "hits" : {
     "total": 12345,
     ...
     "hits": [...list of items...]
  }
}
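If only the document count is needed, it can be extracted from the pretty-printed response. This is a rough sketch: hits_total is a made-up name, and it assumes the response was requested with pretty=true and that "total" is a plain number as in the example above. A full search response also carries a "total" under "_shards", which is why the helper only looks after the "hits" object starts.

```shell
# Hypothetical helper: reads a pretty-printed _search response on stdin and
# prints the first numeric "total" that appears after the "hits" object opens.
hits_total() {
  awk '
    /"hits"[ \t]*:[ \t]*[{]/ { in_hits = 1 }
    in_hits && /"total"[ \t]*:/ {
      value = $0
      sub(/.*"total"[ \t]*:[ \t]*/, "", value)   # drop everything up to the number
      sub(/[^0-9].*/, "", value)                  # drop the trailing comma, if any
      print value
      exit
    }
  '
}

# Example use:
#   curl -s 'http://localhost:9200/{index_name}/_search?size=0&pretty=true' | hits_total
```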

Get an item by id

Code Block
languagebash
curl 'http://localhost:9200/{index_name}/item/{item_id}?pretty'

Query the number of items modified before a given time

Code Block
languagebash
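# the Content-Type header is mandatory from Elasticsearch 6.0 on and harmless before
curl -H 'Content-Type: application/json' 'http://localhost:9200/{index_name}/_search?pretty' -d '
{
    "query": {
        "range" : {
            "modified_at" : {
                "lte" : "2018-02-13T18:01:44"
            }
        }
    },
    "size": 1
}'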

Troubleshooting

Shards are UNASSIGNED

If the Elasticsearch status is not green (that is, yellow or red) because of UNASSIGNED shards, see https://www.datadoghq.com/blog/elasticsearch-unassigned-shards/ for possible solutions. For advanced operations, however, we suggest contacting our engineers.

There are some simple operations you can do by yourself:

Disk free space

Check disk space and free some up: by default, Elasticsearch needs about 20% free disk space to reassign index shards to nodes (it stops allocating shards to a node whose disk usage exceeds the low watermark, 85% by default). Use the commands "df -h ..." and "du -h ..." to find out how much disk space is free and which unimportant files can be deleted (e.g. files in /var/log/squirro or /var/log/elasticsearch).
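The free-space check can also be scripted. The sketch below parses POSIX-format df output; the helper names free_percent and has_enough_free_space are invented for this example, and the default threshold of 20 simply mirrors the approximate figure given above.

```shell
# Hypothetical helper: reads `df -P <path>` output on stdin and prints the
# percentage of free space on that file system (100 minus the Capacity column).
free_percent() {
  awk 'NR == 2 { sub(/%/, "", $5); print 100 - $5 }'
}

# Hypothetical helper: succeeds when free space is at or above a threshold
# given as $1 (default 20, the approximate figure from the text).
has_enough_free_space() {
  [ "$(free_percent)" -ge "${1:-20}" ]
}

# Example use:
#   df -P /var/lib/elasticsearch | has_enough_free_space || echo "free up disk space"
```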

Number of replicas

If only one Elasticsearch instance is running but number_of_replicas in the index settings is greater than 0, the index stays in "yellow" status with some unassigned shards, because a replica is never placed on the same node as its primary. View them with:

Code Block
languagebash
curl -s 'localhost:9200/_cat/shards' | grep UNASSIGNED
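To see at a glance which indices are affected, the _cat/shards output can be grouped by index. This is a sketch under the assumption that the default column order is used (index, shard, prirep, state, ...); the function name unassigned_by_index is made up.

```shell
# Hypothetical helper: reads `_cat/shards` output (default columns, no header)
# on stdin and prints "<index> <number of UNASSIGNED shards>" per affected index.
unassigned_by_index() {
  awk '$4 == "UNASSIGNED" { count[$1]++ }
       END { for (name in count) print name, count[name] }' | sort
}

# Example use:
#   curl -s 'localhost:9200/_cat/shards' | unassigned_by_index
```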

Check number of replicas of an index:

Code Block
languagebash
curl -XGET 'http://localhost:9200/{index_name}/_settings?pretty' | grep number_of_replicas

If number_of_replicas > 0 then set value to 0:

Code Block
languagebash

curl -XPUT -H 'Content-Type: application/json' 'http://localhost:9200/{index_name}/_settings' -d '{"number_of_replicas":0}'

Also check the number of replicas in the template, so that future indices do not get a wrong number_of_replicas setting:

Code Block
languagebash
curl -XGET 'http://localhost:9200/_template/squirro_v8?pretty' | grep number_of_replicas

If number_of_replicas > 0, edit squirro_v8.json and upload the template again:

Code Block
languagebash

vi /etc/elasticsearch/templates/squirro_v8.json 
# edit the line "number_of_replicas": ...,
bash /etc/elasticsearch/templates/ensure_templates.sh

Indices have moved

Usually Elasticsearch indices are stored in /var/lib/elasticsearch/. However, for various reasons (for example, the server went down and could not be restored to its previous state, or a symlink was lost), /var/lib/elasticsearch/ may be missing while the index data sits on another mount point. To fix this issue:

1. Check content, permission and owner of /var/lib/elasticsearch folder

Code Block
languagebash
sudo ls -l /var/lib/elasticsearch/

You should see a folder nodes owned by user elasticsearch and group elasticsearch

2. If /var/lib/elasticsearch is missing or its content is not the index you expect, then

Code Block
languagebash
# stop elasticsearch
service elasticsearch stop

Then choose one of two solutions:

a. create symlink to new index location

Code Block
languagebash
ln -s  {your_new_index_location} /var/lib/elasticsearch 

b. or point elasticsearch data dir to the new location in the config file:

Code Block
languagebash
vi /etc/sysconfig/elasticsearch
# edit the line DATA_DIR={your_new_index_location}

3. Set owner, restart service

Code Block
languagebash
# set owner and group for data folder
chown -R elasticsearch:elasticsearch {your_index_location}
# start elasticsearch again
service elasticsearch start
# check indices status
curl 'http://localhost:9200/_cat/indices?v'

Memory setting

You can set the Elasticsearch heap size in the file /etc/elasticsearch/jvm.options. The minimum heap size (Xms) and maximum heap size (Xmx) must be equal.

Elasticsearch may crash if the server does not have enough memory. Set Xmx to no more than 50% of your physical RAM, so that enough physical RAM is left for the kernel file system caches (https://www.elastic.co/guide/en/elasticsearch/reference/current/heap-size.html).

If you run the cluster node and the storage node on the same machine, give Elasticsearch no more than 30% of your physical RAM.

Also keep the heap below 32 GB: above that threshold the JVM can no longer use compressed object pointers, so the Elasticsearch documentation advises against larger heaps.
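The sizing rules above can be turned into a quick calculation. The following is only a sketch: the function name recommended_heap_mb is made up, the percentage is passed in by the caller (50 for a dedicated storage node, 30 for a shared machine), and the result is capped at 32 GB to match the limit mentioned above.

```shell
# Hypothetical helper: prints a heap size in MB.
# $1 = total physical RAM in MB, $2 = percentage of RAM to use (default 50).
recommended_heap_mb() {
  total_mb=$1
  percent=${2:-50}
  heap_mb=$(( total_mb * percent / 100 ))
  cap_mb=$(( 32 * 1024 ))          # 32 GB ceiling taken from the text
  if [ "$heap_mb" -gt "$cap_mb" ]; then
    heap_mb=$cap_mb
  fi
  printf '%s\n' "$heap_mb"
}

# Example use on Linux (reads total RAM from /proc/meminfo):
#   total_mb=$(awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo)
#   recommended_heap_mb "$total_mb" 30
```

The printed value would go into both the -Xms and -Xmx lines of jvm.options, for example -Xms8192m and -Xmx8192m.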

Test an Elasticsearch query

Sometimes the Elasticsearch log file contains an exception together with the request body, which can be quite long. To reproduce the request for investigation, proceed as follows:

...

languagebash

...

This page can now be found at Managing Elasticsearch on the Squirro Docs site.