Squirro relies on Elasticsearch for the storage nodes where the indexed data is persisted. This guide outlines Elasticsearch configuration changes that are often used in the context of Squirro.
Please also refer to the official Elasticsearch documentation for help.
Where to find things
In a default Squirro setup, these are the most important Elasticsearch folders and files on the Squirro storage node:
...
Tools to support Elasticsearch operations
Since version 5.x, Elasticsearch no longer supports site plugins such as head or kopf. Instead, you can use a standalone tool to view the status of the Elasticsearch cluster, indices and shards during an upgrade or any other operation. One example is cerebro:
Installation:
To forward local port 9500 to the Elasticsearch node, use: ssh -L 9500:storagehost:9200 {ssh_host}, where {ssh_host} is a host you can log in to and that can reach storagehost. You can then connect cerebro to http://localhost:9500 to see the status of the Elasticsearch cluster.
Monitoring
Check that the cluster and nodes are healthy
```shell
curl 'http://localhost:9200/_cluster/health?pretty=true'
```
You should see a JSON response containing "status" : "green".
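The health check above can also be scripted. The following is a minimal sketch, not an official Squirro tool: the function names `cluster_status` and `is_cluster_green` are hypothetical, and the parsing assumes the response contains a single top-level "status" field, as the cluster health output does.

```shell
# Print the "status" value from a _cluster/health JSON document read on stdin.
# Works for both pretty-printed and single-line responses.
cluster_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Succeed only if the health document on stdin reports "green".
is_cluster_green() {
  [ "$(cluster_status)" = "green" ]
}
```

A possible use is `curl -s 'http://localhost:9200/_cluster/health' | is_cluster_green || echo "cluster is not green"`.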
Check Elasticsearch access from the cluster node
The cluster node accesses the storage node through the nginx service. To check that access:
```shell
curl -L 'http://127.0.0.1:81/ext/elastic/_cluster/health?pretty=true'
```
You should see a JSON response containing "status" : "green".
Check that the Elasticsearch service is started
```shell
ps aux | grep elasticsearch | grep -v grep
```
If the service is running, the output also shows many of its settings, e.g. memory, default.path values and process id:
```
496 122224 1 11 10:20 ? 00:02:14 /usr/bin/java -Xms6g -Xmx6g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+AlwaysPreTouch -server -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -Djdk.io.permissionsUseCanonicalPath=true -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Dlog4j.skipJansi=true -XX:+HeapDumpOnOutOfMemoryError -Djava.security.policy=file:///etc/elasticsearch/squirro.java.policy -Des.path.home=/usr/share/elasticsearch -cp /usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch -p /var/run/elasticsearch/elasticsearch.pid -d -Edefault.path.logs=/var/log/elasticsearch -Edefault.path.data=/var/lib/elasticsearch -Edefault.path.conf=/etc/elasticsearch
```
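To pick the heap settings out of such a long process line, a small filter can help. This is only a sketch: the function names `heap_flags` and `heap_min_equals_max` are hypothetical, and it assumes the heap is configured with the usual -Xms/-Xmx flags as in the output above.

```shell
# Print every -Xms/-Xmx flag found in the process listing read on stdin,
# one flag per line.
heap_flags() {
  grep -oE -- '-Xm[sx][0-9]+[kKmMgG]?'
}

# Succeed only if both flags are present and carry the same value.
heap_min_equals_max() {
  flags=$(heap_flags)
  xms=$(printf '%s\n' "$flags" | sed -n 's/^-Xms//p' | head -n 1)
  xmx=$(printf '%s\n' "$flags" | sed -n 's/^-Xmx//p' | head -n 1)
  [ -n "$xms" ] && [ "$xms" = "$xmx" ]
}
```

For example, `ps aux | grep elasticsearch | grep -v grep | heap_flags` would list the two heap flags of the running service.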
View indices, status and size
By default, each Squirro project corresponds to one index, named in the format squirro_{template-version}_{lowercase-project-id}, e.g. squirro_v8_36syha3_ss-zwvn9gyk1ww
```shell
curl 'http://localhost:9200/_cat/indices?v'
```
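The naming convention above can be expressed as a small helper. This is an illustrative sketch: the function name `squirro_index_name` is hypothetical, and the mixed-case project id used in the comment is made up for the example.

```shell
# Build the index name for a project.
# $1 = template version (e.g. v8), $2 = project id in any letter case.
squirro_index_name() {
  lower_id=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  printf 'squirro_%s_%s\n' "$1" "$lower_id"
}

# Example with a made-up project id:
#   squirro_index_name v8 36SyHA3_sS-ZwVn9gYk1wW
```

The result can be passed to the _cat API to show a single index, for example `curl "http://localhost:9200/_cat/indices/$(squirro_index_name v8 "$PROJECT_ID")?v"`, where `PROJECT_ID` is a variable you set yourself.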
View shards, status and size
An Elasticsearch index usually consists of several shards. The number is defined by number_of_shards in the index settings.
```shell
curl 'http://localhost:9200/_cat/shards?v'
```
Tip: There are many useful _cat commands for investigating Elasticsearch. Run curl 'http://localhost:9200/_cat' to list them.
View squirro templates
This verifies that the Squirro templates are installed on the Squirro storage node.
```shell
# see the details of all templates
curl 'http://localhost:9200/_template?pretty'
# see only the Squirro template names
curl 'http://localhost:9200/_template?pretty' | grep squirro_v
```
View number of shards and replicas
This verifies that the number of shards and replicas is set correctly on the Squirro storage node.
```shell
# in the templates
curl 'http://localhost:9200/_template?pretty' | grep number_of
# in the settings of a given index
curl 'http://localhost:9200/{index_name}/_settings?pretty' | grep number_of
```
Tip: You can use the wildcard * in index_name, e.g. /squirro_v8_*/
View mapping, setting of given index
This verifies that a newly created index uses the correct Squirro template.
```shell
curl 'http://localhost:9200/{index_name}/_mappings?pretty'
curl 'http://localhost:9200/{index_name}/_settings?pretty'
```
View Elasticsearch stats
```shell
curl 'http://localhost:9200/_stats?pretty'
# stats of a given index
curl 'http://localhost:9200/{index_name}/_stats?pretty'
# stats of nodes
curl 'http://localhost:9200/_nodes/stats?pretty'
# stats of CPU and memory
curl 'http://localhost:9200/_nodes/stats/os?pretty'
# stats of the file system
curl 'http://localhost:9200/_nodes/stats/fs?pretty'
# stats of the JVM
curl 'http://localhost:9200/_nodes/stats/jvm?pretty'
# stats of fielddata (text field data loaded into memory for aggregations, sorting or scripts)
curl 'http://localhost:9200/_nodes/stats/indices/fielddata?pretty'
```
Investigate index content
View the number of documents and one sample document in the index
```shell
# query the whole index
curl 'http://localhost:9200/{index_name}/_search?size=1&pretty=true'
# query items
curl 'http://localhost:9200/{index_name}/item/_search?size=1&pretty=true'
# query sub items
curl 'http://localhost:9200/{index_name}/sub_item/_search?size=1&pretty=true'
```
The response contains the number of documents, for example:
```
{
  "hits" : {
    "total": 12345,
    ...
    "hits": [...list of items...]
  }
}
```
Get an item by id
```shell
curl 'http://localhost:9200/{index_name}/item/{item_id}?pretty'
```
Query number of items modified before a given time
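```shell
# The Content-Type header is required by Elasticsearch 6.0 and later, and harmless on 5.x.
# The matching count is returned in hits.total; "size": 1 limits the returned items to one.
curl -H 'Content-Type: application/json' 'http://localhost:9200/{index_name}/_search?pretty' -d '
{
  "query": {
    "range" : {
      "modified_at" : {
        "lte" : "2018-02-13T18:01:44"
      }
    }
  },
  "size": 1
}'
```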
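When this query is run for several timestamps, the request body can be generated instead of edited by hand. The helper below is a hypothetical sketch (`modified_before_query` is not part of Squirro or Elasticsearch) and assumes the timestamp contains no double quotes or backslashes, since it is inserted into the JSON without escaping.

```shell
# Print a range query body matching items modified at or before the
# timestamp given as $1 (e.g. 2018-02-13T18:01:44).
modified_before_query() {
  printf '{"query":{"range":{"modified_at":{"lte":"%s"}}},"size":1}\n' "$1"
}
```

It can then be passed to curl with `-d "$(modified_before_query 2018-02-13T18:01:44)"`.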
Troubleshooting
Shards are UNASSIGNED
If the Elasticsearch status is yellow or red rather than green because of UNASSIGNED shards, see https://www.datadoghq.com/blog/elasticsearch-unassigned-shards/ for possible solutions. For advanced operations, however, we suggest contacting our engineers.
There are some simple operations you can perform yourself:
Disk free space
Check the disk space and free some up. By default, Elasticsearch stops allocating shards to a node once its disk usage exceeds 85% (the low disk watermark), so aim to keep about 20% of the disk free. Use "df -h ..." and "du -h ..." to find out how much disk space is free and which unimportant files can be deleted (e.g. files in /var/log/squirro or /var/log/elasticsearch).
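The disk check can be scripted along these lines. This is a sketch under stated assumptions: the function names are hypothetical, the input is expected in the POSIX `df -P` format, and the default threshold of 85% used is Elasticsearch's default low disk watermark, which may have been changed on your cluster.

```shell
# Read `df -P <path>` output on stdin and print the usage percentage
# (without the % sign) of the last file system listed.
disk_used_percent() {
  awk 'NR > 1 { gsub(/%/, "", $5); used = $5 } END { print used }'
}

# Succeed if the usage in $1 is below the watermark in $2 (default 85).
disk_below_watermark() {
  [ "$1" -lt "${2:-85}" ]
}
```

For example, `disk_below_watermark "$(df -P /var/lib/elasticsearch | disk_used_percent)" || echo "free up disk space"`.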
Number of replicas
If only one Elasticsearch instance is running but number_of_replicas in the index settings is greater than 0, the index has "yellow" status with some unassigned shards, because a replica cannot be placed on the same node as its primary. View them with:
```shell
curl -s 'localhost:9200/_cat/shards' | grep UNASSIGNED
```
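To summarize the output rather than read it line by line, the shard list can be filtered further. The sketch below uses hypothetical function names and assumes the default _cat/shards column order, in which the index name is the first column and the shard state the fourth.

```shell
# Count the UNASSIGNED shards in _cat/shards output read on stdin.
count_unassigned() {
  awk '$4 == "UNASSIGNED" { n++ } END { print n + 0 }'
}

# Print the distinct names of indices that have UNASSIGNED shards.
unassigned_indices() {
  awk '$4 == "UNASSIGNED" { print $1 }' | sort -u
}
```

For example, `curl -s 'localhost:9200/_cat/shards' | unassigned_indices` lists the indices whose settings are worth checking.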
Check the number of replicas of an index:
```shell
curl -XGET 'http://localhost:9200/{index_name}/_settings?pretty' | grep number_of_replicas
```
If number_of_replicas is greater than 0, set it to 0:
```shell
# The Content-Type header is required by Elasticsearch 6.0 and later, and harmless on 5.x.
curl -XPUT -H 'Content-Type: application/json' 'http://localhost:9200/{index_name}/_settings' -d '{"number_of_replicas":0}'
```
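The check behind these steps can be sketched as follows. The function names are hypothetical. The rule applied is that a replica is never allocated to the node that holds its primary, so an index with R replicas needs at least R + 1 data nodes to become green.

```shell
# Print every number_of_replicas value found in the settings JSON read on
# stdin (Elasticsearch returns the value as a quoted string).
replica_count() {
  sed -n 's/.*"number_of_replicas"[[:space:]]*:[[:space:]]*"\{0,1\}\([0-9][0-9]*\)"\{0,1\}.*/\1/p'
}

# Succeed if $1 replicas can all be assigned on a cluster of $2 data nodes.
replicas_fit() {
  [ "$1" -le $(( $2 - 1 )) ]
}
```

On a single storage node, `replicas_fit "$(curl -s 'http://localhost:9200/{index_name}/_settings?pretty' | replica_count)" 1` succeeds only when the index has no replicas configured.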
Also check the number of replicas in the template, so that future indices are not created with a wrong number_of_replicas setting:
```shell
curl -XGET 'http://localhost:9200/_template/squirro_v8?pretty' | grep number_of_replicas
```
If number_of_replicas is greater than 0, edit squirro_v8.json and upload the template again:
```shell
vi /etc/elasticsearch/templates/squirro_v8.json
# edit the line "number_of_replicas": ...,
bash /etc/elasticsearch/templates/ensure_templates.sh
```
Indices are moved
Elasticsearch indices are usually stored in /var/lib/elasticsearch/. In some situations (for example, the server went down and could not be restored to its previous state, or a symlink was lost), /var/lib/elasticsearch/ is missing and the index data sits on another mount point. To fix this:
1. Check the content, permissions and owner of the /var/lib/elasticsearch folder
```shell
sudo ls -l /var/lib/elasticsearch/
```
You should see a folder named nodes, owned by user elasticsearch and group elasticsearch.
2. If /var/lib/elasticsearch is missing or its content is not the index you expect, stop Elasticsearch:
```shell
# stop elasticsearch
service elasticsearch stop
```
Then choose one of two solutions:
a. Create a symlink to the new index location
```shell
# /var/lib/elasticsearch must not already exist, otherwise the link is created inside it
ln -s {your_new_index_location} /var/lib/elasticsearch
```
b. Or point the Elasticsearch data directory to the new location in the config file:
```shell
vi /etc/sysconfig/elasticsearch
# edit the line DATA_DIR={your_new_index_location}
```
3. Set the owner and restart the service
```shell
# set owner and group for the data folder
chown -R elasticsearch:elasticsearch {your_new_index_location}
# start elasticsearch again
service elasticsearch start
# check indices status
curl 'http://localhost:9200/_cat/indices?v'
```
Memory setting
You can set the Elasticsearch heap size in the file /etc/elasticsearch/jvm.options. The minimum heap size (Xms) and maximum heap size (Xmx) must be equal.
Elasticsearch may crash if the server does not have enough memory. Set Xmx to no more than 50% of the physical RAM, so that enough RAM is left for the kernel file system caches (https://www.elastic.co/guide/en/elasticsearch/reference/current/heap-size.html).
If the cluster node and the storage node run on the same machine, set the Elasticsearch heap to no more than 30% of the physical RAM.
Do not set the heap to more than 32 GB: above that threshold the JVM can no longer use compressed object pointers, and Elastic recommends staying below it.
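The sizing rules above can be turned into a small calculation. This is a sketch with hypothetical function names. The 50% and 30% shares come from the text above, while the cap of 31 GB is an assumption chosen to stay safely below the 32 GB limit.

```shell
# Print a heap size in MB for a machine with $1 MB of physical RAM.
# Pass "shared" as $2 when cluster node and storage node share the machine.
recommended_heap_mb() {
  percent=50
  if [ "${2:-}" = "shared" ]; then percent=30; fi
  heap_mb=$(( $1 * percent / 100 ))
  cap_mb=$(( 31 * 1024 ))   # assumed cap, just below 32 GB
  if [ "$heap_mb" -gt "$cap_mb" ]; then heap_mb=$cap_mb; fi
  echo "$heap_mb"
}

# Print the two jvm.options lines for a heap of $1 MB (Xms equals Xmx).
jvm_heap_options() {
  printf '%s\n' "-Xms${1}m" "-Xmx${1}m"
}
```

For a dedicated storage node with 16 GB of RAM, `jvm_heap_options "$(recommended_heap_mb 16384)"` prints the -Xms8192m and -Xmx8192m lines to place in /etc/elasticsearch/jvm.options.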
Test an Elasticsearch query
Sometimes the Elasticsearch log file contains an exception together with the request body that caused it, which can be quite long. To reproduce the request for investigation, proceed as follows:
...
...
This page can now be found at Managing Elasticsearch on the Squirro Docs site.