Managing Elasticsearch

Squirro relies on Elasticsearch for the storage nodes where the indexed data is persisted. This guide outlines Elasticsearch configuration changes that are often used in the context of Squirro.

Please also refer to the official Elasticsearch documentation for help.

Where to find things

In a default Squirro setup, these are the most important Elasticsearch folders and files on the Squirro storage node:

Location	Description
/etc/sysconfig/elasticsearch	Elasticsearch sysconfig file, used to customize Elasticsearch settings, e.g. ES_HOME, CONF_DIR, DATA_DIR, LOG_DIR, ES_JAVA_OPTS...
/etc/elasticsearch/elasticsearch.yml	Elasticsearch config file, used to customize the Elasticsearch configuration, e.g. cluster name, HTTP port...
/etc/elasticsearch/	Elasticsearch config directory: elasticsearch.yml, jvm.options, templates, scripts, synonyms (symlinks are not allowed)
/var/log/elasticsearch/	Elasticsearch log folder (symlinks are allowed)
/var/lib/elasticsearch/	Elasticsearch data folder, where the indices are stored (symlinks are allowed)
/usr/share/elasticsearch/	Elasticsearch installation folder, contains /bin, /lib, /plugins

Tools to support Elasticsearch operations

Since version 5.x, Elasticsearch no longer allows running site plugins such as head or kopf. Instead, you can use a standalone tool to see the status of the Elasticsearch cluster, indices and shards during an upgrade or any other operation. One example is cerebro:

Installation:

Download from https://github.com/lmenezes/cerebro/releases
Extract files
Run bin/cerebro
Open http://localhost:9000 in a browser

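To reach the Elasticsearch node through an SSH port forward on local port 9500, run: ssh -L 9500:storagehost:9200 {ssh_host}, where {ssh_host} is a host that you can log in to and that can reach storagehost on port 9200 (often storagehost itself). You can then connect cerebro to http://localhost:9500 to see the status of the Elasticsearch cluster.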

Monitoring

Check that the cluster and nodes are healthy

curl 'http://localhost:9200/_cluster/health?pretty=true'

You should see a JSON response containing "status" : "green".
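For scripted checks, the status value can be extracted from the health response. The following is a minimal sketch, not part of the Squirro tooling: the function name es_health_status is hypothetical, and it assumes the JSON response is passed on stdin.

```shell
# Hypothetical helper: read a _cluster/health JSON response on stdin
# and print the value of its "status" field (green, yellow or red).
es_health_status() {
    # Match "status" : "<value>" with optional spaces around the colon,
    # then keep only the value between the last pair of quotes.
    grep -o '"status"[[:space:]]*:[[:space:]]*"[a-z]*"' \
        | head -n 1 \
        | sed 's/.*"\([a-z]*\)"$/\1/'
}

# Usage against a live node (assumes Elasticsearch listens on localhost:9200):
#   curl -s 'http://localhost:9200/_cluster/health' | es_health_status
```

The helper only does text matching, so it prints nothing when the response has no status field, for example when the node is unreachable.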

Check Elasticsearch access from the cluster node

The cluster node accesses the storage node through the nginx service. To check that access:

curl -L 'http://127.0.0.1:81/ext/elastic/_cluster/health?pretty=true'

You should see a JSON response containing "status" : "green".

Check that the Elasticsearch service is started

ps aux | grep elasticsearch | grep -v grep

If the service is started, this command also shows many of its settings, e.g. memory, default.path and process ID:

496 122224 1 11 10:20 ? 00:02:14 /usr/bin/java -Xms6g -Xmx6g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+AlwaysPreTouch -server -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -Djdk.io.permissionsUseCanonicalPath=true -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Dlog4j.skipJansi=true -XX:+HeapDumpOnOutOfMemoryError -Djava.security.policy=file:///etc/elasticsearch/squirro.java.policy -Des.path.home=/usr/share/elasticsearch -cp /usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch -p /var/run/elasticsearch/elasticsearch.pid -d -Edefault.path.logs=/var/log/elasticsearch -Edefault.path.data=/var/lib/elasticsearch -Edefault.path.conf=/etc/elasticsearch
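The heap flags can be picked out of such a process line with a small filter. This is a sketch under assumptions: the function names es_heap_flags and es_heap_is_consistent are hypothetical, the process line is passed on stdin, and the second function checks the rule from the Memory setting section of this guide that Xms and Xmx must be equal.

```shell
# Hypothetical helper: read a java command line on stdin and print the
# -Xms and -Xmx heap flags it contains, one per line.
es_heap_flags() {
    # -e is needed because the pattern itself starts with a dash.
    grep -o -e '-Xm[sx][0-9]*[kKmMgG]'
}

# Hypothetical helper: succeed only when the command line on stdin
# carries exactly one distinct heap size, i.e. Xms equals Xmx.
es_heap_is_consistent() {
    [ "$(es_heap_flags | sed 's/^-Xm[sx]//' | sort -u | wc -l | tr -d ' ')" = "1" ]
}

# Usage on a storage node:
#   ps aux | grep elasticsearch | grep -v grep | es_heap_flags
```

For the sample line above, es_heap_flags prints -Xms6g and -Xmx6g. The thread stack flag -Xss1m is not matched.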

View indices, status and size

By default, each Squirro project corresponds to one index named squirro_{template-version}_{lowercase-project-id}, e.g. squirro_v8_36syha3_ss-zwvn9gyk1ww

curl 'http://localhost:9200/_cat/indices?v'
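To find the index of one project in that list, the index name can be derived from the naming format described above. The following is a minimal sketch. The function name squirro_index_name is hypothetical, and it assumes you already know the template version and the project ID.

```shell
# Hypothetical helper: build the index name for a project.
#   $1 = template version (for example v8)
#   $2 = project id (any letter case; it is lowercased here)
squirro_index_name() {
    printf 'squirro_%s_%s\n' "$1" "$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')"
}

# Usage (the project id is a placeholder):
#   curl "http://localhost:9200/_cat/indices/$(squirro_index_name v8 {project_id})?v"
```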

View shards, status and size

An Elasticsearch index usually consists of several shards. The number_of_shards value is defined in the index settings.

curl 'http://localhost:9200/_cat/shards?v'

Tip: There are plenty of useful _cat commands for investigating Elasticsearch. Run curl 'http://localhost:9200/_cat' to list them.

View squirro templates

This makes sure that the Squirro templates are in use on the Squirro storage node.

# see detail of all templates 
curl 'http://localhost:9200/_template?pretty'
# see only squirro template names
curl 'http://localhost:9200/_template?pretty' | grep squirro_v

View number of shards and replicas

This makes sure that the number of shards and replicas is set correctly on the Squirro storage node.

# in template
curl 'http://localhost:9200/_template?pretty' | grep number_of
# in setting of given index
curl 'http://localhost:9200/{index_name}/_settings?pretty' | grep number_of

Tip: You can use the wildcard * in index_name, e.g. /squirro_v8_*/

View the mapping and settings of a given index

This makes sure that a newly created index uses the correct Squirro template.

curl 'http://localhost:9200/{index_name}/_mappings?pretty'
curl 'http://localhost:9200/{index_name}/_settings?pretty'

View elasticsearch stats

curl 'http://localhost:9200/_stats?pretty'
# stats of given index
curl 'http://localhost:9200/{index_name}/_stats?pretty'
# stats of nodes
curl 'http://localhost:9200/_nodes/stats?pretty'
# stats of cpu and memory
curl 'http://localhost:9200/_nodes/stats/os?pretty'
# stats of file system
curl 'http://localhost:9200/_nodes/stats/fs?pretty'
# stats of jvm
curl 'http://localhost:9200/_nodes/stats/jvm?pretty'
# stats of fielddata (data of text field is loaded to memory for aggregation, sort or in a script)
curl 'http://localhost:9200/_nodes/stats/indices/fielddata?pretty'

Investigate index content

View the number of documents and one sample document in the index

# query whole index
curl 'http://localhost:9200/{index_name}/_search?size=1&pretty=true'
# query items
curl 'http://localhost:9200/{index_name}/item/_search?size=1&pretty=true'
# query sub items
curl 'http://localhost:9200/{index_name}/sub_item/_search?size=1&pretty=true'

The response contains the number of documents in hits.total, for example:

{
  "hits" : {
     "total": 12345,
     ...
     "hits": [...list of items...]
  }
}

Get an item by id

curl 'http://localhost:9200/{index_name}/item/{item_id}?pretty'

Query number of items modified before a given time

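# the Content-Type header is required by Elasticsearch 6.0 and later
curl -H 'Content-Type: application/json' 'http://localhost:9200/{index_name}/_search?pretty' -d '
{
    "query": {
        "range" : {
            "modified_at" : {
                "lte" : "2018-02-13T18:01:44"
            }
        }
    },
    "size": 1
}'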

Delete documents by query

If you find wrong documents in the index and want to delete them, e.g. all documents that belong to the source with id "123456789abcdef", you can delete them with a query:

curl -XPOST -H 'Content-Type: application/json' 'http://localhost:9200/{index_name}/_delete_by_query' -d '
{
    "query": {
        "term" : {
            "assoc:subscriptions" : "123456789abcdef"
        }
    }
}'

Note 1: Before deleting by query, always make sure that the query returns only the documents you want to delete. Run a search with the same query and review some of the results:

curl -H 'Content-Type: application/json' 'http://localhost:9200/{index_name}/_search?pretty' -d '
{
    "query": {
        "term" : {
            "assoc:subscriptions" : "123456789abcdef"
        }
    }
}'

Note 2: Deleting by query does not reduce the index size on disk, because documents are not removed from their segments, only marked as deleted. To free the disk space after deleting, execute this command:

curl -XPOST 'http://localhost:9200/{index_name}/_forcemerge?only_expunge_deletes=true'
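One way to guarantee that the review search and the delete use exactly the same query is to generate the request body once and reuse it. The following is a minimal sketch. The function name source_query_body is hypothetical, and it assumes the source id contains no characters that need JSON escaping.

```shell
# Hypothetical helper: print a term query body for one source id.
#   $1 = source id, matched against the assoc:subscriptions field
source_query_body() {
    printf '{"query": {"term": {"assoc:subscriptions": "%s"}}}\n' "$1"
}

# Usage: review first, then delete with the identical body.
#   body=$(source_query_body 123456789abcdef)
#   curl -H 'Content-Type: application/json' \
#       'http://localhost:9200/{index_name}/_search?pretty' -d "$body"
#   curl -XPOST -H 'Content-Type: application/json' \
#       'http://localhost:9200/{index_name}/_delete_by_query' -d "$body"
```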

Troubleshooting

Shards are UNASSIGNED

If the Elasticsearch status is not green (that is, yellow or red) because of UNASSIGNED shards, have a look at https://www.datadoghq.com/blog/elasticsearch-unassigned-shards/ for solutions. However, we suggest that you contact our engineers for advanced operations.

There are some simple operations you can do by yourself:

Disk free space

Check the disk space and free it up: Elasticsearch needs about 20% free disk space to reassign index shards to nodes (by default it stops allocating shards to a node whose disk usage exceeds the 85% low watermark). Use the commands "df -h ..." and "du -h ..." to find out how much disk space is free and which unimportant files can be deleted (e.g. files in /var/log/squirro or /var/log/elasticsearch).
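The disk check can be scripted along these lines. This is a sketch under assumptions: the function name mounts_above_usage is hypothetical, it reads the POSIX output format of df -P on stdin, the default threshold of 80% used mirrors the 20% free space mentioned above, and mount points must not contain spaces.

```shell
# Hypothetical helper: read `df -P` output on stdin and print every mount
# point whose used capacity is above the given percentage.
#   $1 = threshold in percent used (default 80, i.e. less than 20% free)
mounts_above_usage() {
    awk -v limit="${1:-80}" 'NR > 1 {
        used = $5               # Capacity column, e.g. "90%"
        sub(/%/, "", used)      # strip the percent sign
        if (used + 0 > limit + 0) print $6, used "%"
    }'
}

# Usage on a storage node:
#   df -P | mounts_above_usage
```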

Number of replicas

If only one instance of Elasticsearch is running but number_of_replicas in the index settings is greater than 0, the index also has "yellow" status with some unassigned shards. View them with:

curl -s 'localhost:9200/_cat/shards' | grep UNASSIGNED
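When many indices are affected, a per-index summary is easier to read than the raw list. The following is a minimal sketch. The function name unassigned_per_index is hypothetical, and it assumes the default _cat/shards column order (index, shard, prirep, state, ...) without a header line.

```shell
# Hypothetical helper: read _cat/shards output on stdin and print, for each
# index, how many of its shards are in state UNASSIGNED.
unassigned_per_index() {
    awk '$4 == "UNASSIGNED" { count[$1]++ }
         END { for (name in count) print name, count[name] }' | sort
}

# Usage on a storage node:
#   curl -s 'localhost:9200/_cat/shards' | unassigned_per_index
```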

Check the number of replicas of an index:

curl -XGET 'http://localhost:9200/{index_name}/_settings?pretty' | grep number_of_replicas

If number_of_replicas > 0, set the value to 0:

curl -XPUT -H 'Content-Type: application/json' 'http://localhost:9200/{index_name}/_settings' -d '{"number_of_replicas":0}'

Also check the number of replicas in the template to make sure that future indices do not get a wrong number_of_replicas setting:

curl -XGET 'http://localhost:9200/_template/squirro_v8?pretty' | grep number_of_replicas

If number_of_replicas > 0, modify squirro_v8.json and upload the template again:

vi /etc/elasticsearch/templates/squirro_v8.json
# edit the line "number_of_replicas": ...,
bash /etc/elasticsearch/templates/ensure_templates.sh

Indices have moved

Elasticsearch indices are usually stored in /var/lib/elasticsearch/. However, for various reasons (the server went down and could not be restored to its old state, a symlink was lost), /var/lib/elasticsearch/ may be missing while the index still resides on another mount point. To fix this issue:

1. Check the content, permissions and owner of the /var/lib/elasticsearch folder

sudo ls -l /var/lib/elasticsearch/

You should see a folder named nodes, owned by the user elasticsearch and the group elasticsearch.

2. If /var/lib/elasticsearch is missing or its content is not the index you expect, then

# stop elasticsearch
service elasticsearch stop

and choose one of two solutions:

a. Create a symlink to the new index location

ln -s {your_new_index_location} /var/lib/elasticsearch

b. Or point the Elasticsearch data directory to the new location in the sysconfig file:

vi /etc/sysconfig/elasticsearch
# edit the line DATA_DIR={your_new_index_location}

3. Set the owner and restart the service

# set owner and group for data folder
chown -R elasticsearch:elasticsearch {your_new_index_location}
# start elasticsearch again
service elasticsearch start
# check indices status
curl 'http://localhost:9200/_cat/indices?v'

Memory setting

You can set the Elasticsearch memory in the file /etc/elasticsearch/jvm.options. The minimum heap size (Xms) and the maximum heap size (Xmx) must be equal.

Elasticsearch may crash when the server does not have enough memory. Set Xmx to no more than 50% of your physical RAM, so that enough physical RAM is left for the kernel file system caches (https://www.elastic.co/guide/en/elasticsearch/reference/current/heap-size.html).

If you run the cluster node and the storage node on the same machine, give Elasticsearch no more than 30% of your physical RAM.

Do not set the heap above 32 GB either: beyond roughly that size the JVM can no longer use compressed object pointers, and the Elasticsearch documentation advises staying below that threshold.
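The sizing rules above can be combined into a small calculation. This is a sketch, not an official sizing tool: the function name suggest_heap_gb is hypothetical, RAM is given in whole GB, and the cap of 31 GB is an assumption chosen to stay safely below the 32 GB limit.

```shell
# Hypothetical helper: suggest a heap size in GB following the rules above.
#   $1 = physical RAM in GB (integer)
#   $2 = "shared" when cluster node and storage node run on the same machine
suggest_heap_gb() {
    if [ "${2:-}" = "shared" ]; then
        heap=$(( $1 * 30 / 100 ))   # at most 30% of RAM on a shared machine
    else
        heap=$(( $1 * 50 / 100 ))   # at most 50% of RAM otherwise
    fi
    # Assumption: 31 GB as a conservative cap below the 32 GB limit.
    if [ "$heap" -gt 31 ]; then
        heap=31
    fi
    printf '%s\n' "$heap"
}
```

For a dedicated storage node with 12 GB of RAM, the sketch prints 6, which corresponds to the lines -Xms6g and -Xmx6g in jvm.options.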

Test an Elasticsearch query

Sometimes the Elasticsearch log file contains an exception together with the request body that caused it, which can be quite long. To reproduce the request for investigation, proceed as follows:

# save the query in JSON format, e.g. to /tmp/es_request.json
# send the request to Elasticsearch using that file as input:
curl -H 'Content-Type: application/json' 'http://localhost:9200/{index_name}/_search?pretty' -d @/tmp/es_request.json