Squirro relies on Elasticsearch for the storage nodes where the indexed data is persisted. This guide outlines Elasticsearch configuration changes that are often used in the context of Squirro.
Please also refer to the official Elasticsearch documentation for help.
For a default Squirro setup, these are the most important Elasticsearch folders and files on the Squirro storage node:
| location | description |
|---|---|
| /etc/sysconfig/elasticsearch | Elasticsearch sysconfig file: customization of Elasticsearch settings, e.g. ES_HOME, CONF_DIR, DATA_DIR, LOG_DIR, ES_JAVA_OPTS... |
| /etc/elasticsearch/elasticsearch.yml | Elasticsearch config file: customization of the Elasticsearch configuration, e.g. cluster name, HTTP port... |
| /etc/elasticsearch/ | Elasticsearch config directory: elasticsearch.yml, jvm.options, templates, scripts, synonyms (symlink not allowed) |
| /var/log/elasticsearch/ | Elasticsearch log folder (symlink allowed) |
| /var/lib/elasticsearch/ | Elasticsearch data folder (where the indices are stored) (symlink allowed) |
| /usr/share/elasticsearch/ | where Elasticsearch is installed; contains bin/, lib/, plugins/ |
Since version 5.x, Elasticsearch no longer allows running site plugins such as head or kopf. You can use an external tool to see the status of the Elasticsearch cluster, indices, and shards during an upgrade or any other operation, for example cerebro:
Installation:
If you want to use local port 9500 as a port forward to the Elasticsearch node, use: `ssh -L 9500:storagehost:9200`. After that you can connect cerebro to http://localhost:9500 to see the status of the Elasticsearch cluster.
```
curl 'http://localhost:9200/_cluster/health?pretty=true'
```
You should see a JSON response containing `"status" : "green"`.
The cluster node needs to access the storage node through the nginx service. To check that access:
```
curl -L 'http://127.0.0.1:81/ext/elastic/_cluster/health?pretty=true'
```
You should see a JSON response containing `"status" : "green"`.
```
ps aux | grep elasticsearch | grep -v grep
```
If the service is running, this command also lets you see many settings of the Elasticsearch process, e.g. memory, default.path, process id:
```
496 122224 1 11 10:20 ? 00:02:14 /usr/bin/java -Xms6g -Xmx6g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+AlwaysPreTouch -server -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -Djdk.io.permissionsUseCanonicalPath=true -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Dlog4j.skipJansi=true -XX:+HeapDumpOnOutOfMemoryError -Djava.security.policy=file:///etc/elasticsearch/squirro.java.policy -Des.path.home=/usr/share/elasticsearch -cp /usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch -p /var/run/elasticsearch/elasticsearch.pid -d -Edefault.path.logs=/var/log/elasticsearch -Edefault.path.data=/var/lib/elasticsearch -Edefault.path.conf=/etc/elasticsearch
```
To start, stop, or restart the Elasticsearch service:
```
service elasticsearch {start|stop|restart}
```
Each Squirro project by default corresponds to one index, named squirro_{template-version}_{lowercase-project-id}, e.g. squirro_v8_36syha3_ss-zwvn9gyk1ww
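As a sketch, the index name can be derived from a project id in shell. The mixed-case project id below is a made-up example (only its lowercased form matches the index name above):

```shell
# Sketch: derive the Elasticsearch index name for a Squirro project.
# project_id is a hypothetical example value; the real id comes from Squirro.
project_id="36syHA3_Ss-zWVn9Gyk1wW"
template_version="v8"
# lowercase the project id and prefix it with "squirro_" and the template version
index_name="squirro_${template_version}_$(printf '%s' "$project_id" | tr '[:upper:]' '[:lower:]')"
echo "$index_name"   # → squirro_v8_36syha3_ss-zwvn9gyk1ww
```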
```
curl 'http://localhost:9200/_cat/indices?v'
```
One Elasticsearch index usually contains several shards; number_of_shards is defined in the index settings.
```
curl 'http://localhost:9200/_cat/shards?v'
```
Tip: there are plenty of useful _cat commands for investigating Elasticsearch; run curl 'http://localhost:9200/_cat' to list them.
This is to make sure the Squirro templates are used on the Squirro storage node:
```
# see details of all templates
curl -s 'http://localhost:9200/_template?pretty'
# see only the squirro template names
curl -s 'http://localhost:9200/_template?pretty' | grep squirro_v
```
This is to make sure the number of shards and replicas are set correctly on the Squirro storage node:
```
# in the templates
curl -s 'http://localhost:9200/_template?pretty' | grep -e number_of -e squirro_v
# in the settings of a given index
curl -s 'http://localhost:9200/{index_name}/_settings?pretty' | grep -e number_of -e squirro_v
```
Tip: You can use wildcard syntax * in index_name, e.g /squirro_v8_*/
This is to make sure a newly created index uses the correct Squirro template:
```
curl 'http://localhost:9200/{index_name}/_mappings?pretty'
curl 'http://localhost:9200/{index_name}/_settings?pretty'
```
```
# stats of all indices
curl 'http://localhost:9200/_stats?pretty'
# stats of a given index
curl 'http://localhost:9200/{index_name}/_stats?pretty'
# stats of nodes
curl 'http://localhost:9200/_nodes/stats?pretty'
# stats of cpu and memory
curl 'http://localhost:9200/_nodes/stats/os?pretty'
# stats of file system
curl 'http://localhost:9200/_nodes/stats/fs?pretty'
# stats of jvm
curl 'http://localhost:9200/_nodes/stats/jvm?pretty'
# stats of fielddata (text field data loaded into memory for aggregations, sorting or scripts)
curl 'http://localhost:9200/_nodes/stats/indices/fielddata?pretty'
```
```
# query the whole index
curl 'http://localhost:9200/{index_name}/_search?size=1&pretty=true'
# query items
curl 'http://localhost:9200/{index_name}/item/_search?size=1&pretty=true'
# query sub items
curl 'http://localhost:9200/{index_name}/sub_item/_search?size=1&pretty=true'
```
You should see the number of documents in the response, for example:
```
{
  "hits" : {
    "total": 12345,
    ...
    "hits": [ ...list of items... ]
  }
}
```
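If you only need the document count out of such a response, you can extract it with python3. The inline response below is a small sample standing in for the real curl output:

```shell
# Extract hits.total from a search response.
# "response" holds a sample document; normally you would pipe the curl output instead.
response='{"hits":{"total":12345,"hits":[]}}'
printf '%s' "$response" | python3 -c 'import json,sys; print(json.load(sys.stdin)["hits"]["total"])'
# → 12345
```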
```
curl 'http://localhost:9200/{index_name}/item/{item_id}?pretty'
```
```
curl 'http://localhost:9200/{index_name}/_search?pretty' -d '
{
  "query": {
    "range" : {
      "modified_at" : {
        "lte" : "2018-02-13T18:01:44"
      }
    }
  },
  "size": 1
}'
```
If you find wrong documents in the index and want to delete them, e.g. documents belonging to the source with id "123456789abcdef", you can delete them by query:
```
curl -XPOST 'http://localhost:9200/{index_name}/_delete_by_query' -d '
{
  "query": {
    "term" : {
      "assoc:subscriptions" : "123456789abcdef"
    }
  }
}'
```
Note 1: before deleting by query, always make sure the query returns only the documents you want to delete, by running it as a search first and reviewing some of the results:
```
curl 'http://localhost:9200/{index_name}/_search?pretty' -d '
{
  "query": {
    "term" : {
      "assoc:subscriptions" : "123456789abcdef"
    }
  }
}'
```
Note 2: after deleting by query, the index size on disk is not reduced, because documents are not removed from their segments, only marked as deleted. To free disk space after the deletion, run:
```
curl -XPOST 'http://localhost:9200/{index_name}/_forcemerge?only_expunge_deletes=true'
```
If the Elasticsearch status is not green (yellow or red) because of an UNASSIGNED-shards issue, have a look at https://www.datadoghq.com/blog/elasticsearch-unassigned-shards/ for solutions. However, we suggest you contact our engineers for advanced operations.
There are some simple operations you can do by yourself:
Check disk space and free it up: by default Elasticsearch needs about 20% free disk space to assign index shards to nodes. Investigate this issue with the commands "df -h ..." and "du -h ..." to find out the free disk space and identify unimportant files to delete (e.g. files in /var/log/squirro or /var/log/elasticsearch).
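A minimal sketch of that investigation, assuming the default folder layout from the table above:

```shell
# Show free disk space per mount point; check the mount holding /var/lib/elasticsearch
df -h
# Show the size of the log folders (candidates for cleanup; paths from the default layout)
du -sh /var/log/elasticsearch /var/log/squirro 2>/dev/null | sort -h
```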
If you have only one Elasticsearch instance running but number_of_replicas in the index settings is bigger than 0, the index will also have "yellow" status with some unassigned shards. View them with:
```
curl -s 'localhost:9200/_cat/shards' | grep UNASSIGNED
```
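To see why a particular shard is unassigned, you can also ask Elasticsearch directly via the cluster allocation explain API (available since Elasticsearch 5.0); called without a request body it explains the first unassigned shard it finds:

```shell
# Explain why the first unassigned shard is not allocated
# (prints a short message instead if Elasticsearch is not reachable on localhost:9200)
curl -s 'http://localhost:9200/_cluster/allocation/explain?pretty' || echo 'Elasticsearch not reachable on localhost:9200'
```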
Check number of replicas of an index:
```
curl -XGET 'http://localhost:9200/{index_name}/_settings?pretty' | grep number_of_replicas
```
If number_of_replicas > 0, set the value to 0:
```
curl -XPUT 'http://localhost:9200/{index_name}/_settings' -d '{"number_of_replicas":0}'
```
Also check the number of replicas in the template, to make sure future indices do not get a wrong number_of_replicas setting:
```
curl -XGET 'http://localhost:9200/_template/squirro_v8?pretty' | grep number_of_replicas
```
If number_of_replicas > 0, modify squirro_v8.json and apply the template again:
```
vi /etc/elasticsearch/templates/squirro_v8.json
# edit the line "number_of_replicas": ...,
bash /etc/elasticsearch/templates/ensure_templates.sh
```
Usually Elasticsearch indices are stored in /var/lib/elasticsearch/. However, for various reasons (the server went down and cannot be restored to its old state, a symlink was lost), /var/lib/elasticsearch/ may be missing and the index data may live on another mount point. To fix this issue:
1. Check the content, permissions and owner of the /var/lib/elasticsearch folder
```
sudo ls -l /var/lib/elasticsearch/
```
You should see a folder `nodes` owned by user elasticsearch and group elasticsearch.
2. If /var/lib/elasticsearch is missing, or its content is not the index you expect, then
```
# stop elasticsearch
service elasticsearch stop
```
and choose one of two solutions:
a. create a symlink to the new index location
```
ln -s {your_new_index_location} /var/lib/elasticsearch
```
b. or point the Elasticsearch data dir to the new location in the config file:
```
vi /etc/sysconfig/elasticsearch
# edit the line DATA_DIR={your_new_index_location}
```
3. Set the owner and restart the service
```
# set owner and group for the data folder
chown -R elasticsearch:elasticsearch {your_index_location}
# start elasticsearch again
service elasticsearch start
# check indices status
curl 'http://localhost:9200/_cat/indices?v'
```
You can set the Elasticsearch memory in the file /etc/elasticsearch/jvm.options; the minimum heap size (Xms) and maximum heap size (Xmx) must be equal to each other.
Elasticsearch may crash if there is not enough memory on your server. Make sure you set Xmx to no more than 50% of your physical RAM, to ensure that enough physical RAM is left for kernel file system caches (https://www.elastic.co/guide/en/elasticsearch/reference/current/heap-size.html). If you run the cluster node and the storage node on the same machine, set the Elasticsearch memory to no more than 30% of your physical RAM.
Also, do not set the Elasticsearch heap above 32 GB: beyond that the JVM can no longer use compressed object pointers, which wastes memory.
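For example, on a dedicated storage node with 16 GB of RAM, the heap lines in /etc/elasticsearch/jvm.options would look like this (a sketch; adjust the value to your machine):

```
## /etc/elasticsearch/jvm.options (excerpt)
# min and max heap must be equal; here 8 GB on a 16 GB machine (~50% of RAM)
-Xms8g
-Xmx8g
```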
Sometimes you see an exception in the Elasticsearch log file together with the request body, which can be quite long. To reproduce such a request for investigation, you can do the following:
```
# save the query in json format, e.g. to /tmp/es_request.json
# make the request to ES using it as the input file:
curl 'http://localhost:9200/{index_name}/_search?pretty' -d @/tmp/es_request.json
```