Squirro relies on Elasticsearch for the storage nodes where the indexed data is persisted. This guide outlines Elasticsearch configuration changes that are often used in the context of Squirro.
Please also refer to the official Elasticsearch documentation for help.
Where to find files and folders
For the default setup of Squirro, these are the most important Elasticsearch folders and files on the Squirro storage node:
location | description |
---|---|
/etc/sysconfig/elasticsearch | elasticsearch sysconfig file, customization of elasticsearch setting, e.g |
/etc/elasticsearch/elasticsearch.yml | elasticsearch config file, customization of elasticsearch config, e.g. cluster name, http port... |
/etc/elasticsearch/ | elasticsearch config directory: elasticsearch.yml, jvm.options, templates, scripts, synonyms (symlink not allowed) |
/var/log/elasticsearch/ | elasticsearch log folder (symlink is allowed) |
/var/lib/elasticsearch/ | elasticsearch data folder (where we store indices) (symlink is allowed) |
/usr/share/elasticsearch/ | where elasticsearch is installed, contains /bin, /lib, /plugins |
Tool to support ES operation
Since version 5.x, Elasticsearch no longer allows running some plugins such as head or kopf. Instead, you can use a standalone tool to see the status of the Elasticsearch cluster, indices and shards during an upgrade or any other operation. One example is cerebro:
Installation:
Download from https://github.com/lmenezes/cerebro/releases
Extract files
Run bin/cerebro
Access on http://localhost:9000
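If you want to forward local port 9500 to the Elasticsearch node, use: ssh -L 9500:storagehost:9200 {ssh_host}, where {ssh_host} is a placeholder for a host that you can log in to and that can reach storagehost. After that you can connect cerebro to http://localhost:9500 to see the status of the Elasticsearch cluster.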
Monitoring
Check Size of indices in ES
curl -XGET localhost:9200/_cat/indices?v
You should see a list of the Squirro indices with their sizes. The project ID appears at the end of each index name.
Check that the cluster and its nodes are healthy
curl http://localhost:9200/_cluster/health?pretty=true
You should see a JSON response containing "status" : "green".
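For scripted monitoring, the status value can be pulled out of the health response. The following is a minimal sketch, not part of Squirro or Elasticsearch: the function name cluster_status is hypothetical, and it assumes the default health response, in which "status" appears only once.

```shell
#!/bin/sh
# Hypothetical helper: reads the JSON returned by _cluster/health on stdin
# and prints the value of the "status" field (green, yellow or red).
cluster_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Usage (assumes Elasticsearch listens on localhost:9200):
#   status=$(curl -s 'http://localhost:9200/_cluster/health?pretty=true' | cluster_status)
#   [ "$status" = "green" ] || echo "cluster status is $status" >&2
```

Keeping the parsing separate from the curl call makes it possible to reuse the same function for the nginx-proxied health URL on the cluster node.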
Check Elasticsearch access from the cluster node
The cluster node needs to access the storage node through the nginx service. To check that access:
curl -L http://127.0.0.1:81/ext/elastic/_cluster/health?pretty=true
You should see a JSON response containing "status" : "green".
Check Elasticsearch service is started
ps aux | grep [e]lasticsearch
If the service is running, this command also shows many of its settings, e.g. memory, default paths and process ID:
496 122224 1 11 10:20 ? 00:02:14 /usr/bin/java -Xms6g -Xmx6g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+AlwaysPreTouch -server -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -Djdk.io.permissionsUseCanonicalPath=true -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Dlog4j.skipJansi=true -XX:+HeapDumpOnOutOfMemoryError -Djava.security.policy=file:///etc/elasticsearch/squirro.java.policy -Des.path.home=/usr/share/elasticsearch -cp /usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch -p /var/run/elasticsearch/elasticsearch.pid -d -Edefault.path.logs=/var/log/elasticsearch -Edefault.path.data=/var/lib/elasticsearch -Edefault.path.conf=/etc/elasticsearch
To start, stop, restart elasticsearch service:
systemctl {start|stop|restart} elasticsearch
View indices, status and size
By default, each Squirro project corresponds to one index, named in the format squirro_{template-version}_{lowercase-project-id}, e.g. squirro_v9_36syha3_ss-zwvn9gyk1ww
curl http://localhost:9200/_cat/indices?v
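To map an index back to its project, the prefix can be stripped from the index name. This is a small sketch based only on the naming format described above; the function name project_id_from_index is hypothetical, and the result is the lowercased project ID.

```shell
#!/bin/sh
# Hypothetical helper: removes the leading "squirro_{template-version}_" part
# of an index name and prints what remains (the lowercase project ID).
# Names that do not match the pattern are printed unchanged.
project_id_from_index() {
    printf '%s\n' "${1#squirro_*_}"
}

# Usage (assumes Elasticsearch listens on localhost:9200):
#   curl -s 'http://localhost:9200/_cat/indices?h=index' |
#       while read -r idx; do project_id_from_index "$idx"; done
```

The `#` expansion removes the shortest matching prefix, so underscores inside the project ID itself are preserved.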
View shards, status and size
One Elasticsearch index usually contains several shards. number_of_shards is defined in the index settings.
curl http://localhost:9200/_cat/shards?v
Tip: There are plenty of useful _cat commands to investigate elasticsearch, just type curl 'http://localhost:9200/_cat' to see them
View Squirro templates
This makes sure that the Squirro templates are in use on the Squirro storage node.
# see details of all templates
curl -s http://localhost:9200/_template?pretty
# see only the Squirro template names
curl -s http://localhost:9200/_template?pretty | grep squirro_v
View number of shards and replicas
This makes sure that the number of shards and replicas is set correctly on the Squirro storage node.
# in the templates
curl -s http://localhost:9200/_template?pretty | grep -e number_of -e squirro_v
# in the settings of a given index
curl -s http://localhost:9200/{index_name}/_settings?pretty | grep -e number_of -e squirro_v
Tip: You can use wildcard syntax * in index_name, e.g /squirro_v9_*/
View mapping, setting of given index
This makes sure that a newly created index uses the correct Squirro template.
curl http://localhost:9200/{index_name}/_mappings?pretty
curl http://localhost:9200/{index_name}/_settings?pretty
View elasticsearch stats
curl http://localhost:9200/_stats?pretty
# stats of a given index
curl http://localhost:9200/{index_name}/_stats?pretty
# stats of nodes
curl http://localhost:9200/_nodes/stats?pretty
# stats of CPU and memory
curl http://localhost:9200/_nodes/stats/os?pretty
# stats of the file system
curl http://localhost:9200/_nodes/stats/fs?pretty
# stats of the JVM
curl http://localhost:9200/_nodes/stats/jvm?pretty
# stats of fielddata (data of text fields that is loaded into memory for aggregation, sorting or use in a script)
curl http://localhost:9200/_nodes/stats/indices/fielddata?pretty
Set replicas in multiple nodes cluster
When you have multiple storage nodes, we suggest creating at least 1 replica for the indices, so that the storage cluster keeps working if 1 node is shut down.
curl -XPUT 'http://localhost:9200/squirro_v9/_settings' -H "Content-Type: application/json" -d '{"index": {"number_of_replicas": 1}}'
curl -XPUT 'http://localhost:9200/squirro_v9_*/_settings' -H "Content-Type: application/json" -d '{"index": {"number_of_replicas": 1}}'
curl -XPUT 'http://localhost:9200/.configsync/_settings' -H "Content-Type: application/json" -d '{"index": {"number_of_replicas": 1}}'
Investigate index content
View the number of documents and one sample document in the index
# query the whole index (the URL is quoted so that the shell does not interpret the &)
curl 'http://localhost:9200/{index_name}/_search?pretty&size=1'
You should see the number of documents in the response, for example:
{
  "hits" : {
    "total": 12345,
    ...
    "hits": [...list of items...]
  }
}
Get an item by id
curl 'http://localhost:9200/{index_name}/_doc/{item_id}?pretty'
Query the number of items modified before a given time
curl 'http://localhost:9200/{index_name}/_search?pretty' -H "Content-Type: application/json" -d '
{
  "query": {
    "range" : {
      "modified_at" : { "lte" : "2018-02-13T18:01:44" }
    }
  },
  "size": 1
}'
Delete documents by query
Before deleting by query, always make sure that the query returns only the documents you want to delete, by running it as a search first and reviewing some of the results:
curl 'http://localhost:9200/{index_name}/_search?pretty' -H "Content-Type: application/json" -d '
{ "query": { "term" : { "assoc:sources" : "123456789abcdef" } } }'
If you see some wrong documents in the index and you want to delete them, e.g. documents belonging to the same source with ID "123456789abcdef", you can delete them with a query:
curl -XPOST 'http://localhost:9200/{index_name}/_delete_by_query' -H "Content-Type: application/json" -d '
{ "query": { "term" : { "assoc:sources" : "123456789abcdef" } } }'
After deleting by query, the index size on disk is not reduced because a document is not deleted from a segment, just marked as deleted. So if you want to free disk space after deleting, then execute this command:
curl -XPOST http://localhost:9200/{index_name}/_forcemerge?only_expunge_deletes=true
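Because the search and the delete must use exactly the same query, it can help to build the request body once and reuse it. The sketch below is an illustration only: the function name source_term_query is hypothetical, and it assumes that source IDs contain no characters that need JSON escaping.

```shell
#!/bin/sh
# Hypothetical helper: prints the term query on "assoc:sources" for one source ID.
# Assumption: the ID is plain alphanumeric, so no JSON escaping is performed.
source_term_query() {
    printf '{"query":{"term":{"assoc:sources":"%s"}}}\n' "$1"
}

# Usage (assumes Elasticsearch listens on localhost:9200; replace {index_name}):
#   body=$(source_term_query 123456789abcdef)
#   # 1. count the matching documents first
#   curl -s -H 'Content-Type: application/json' \
#       'http://localhost:9200/{index_name}/_count?pretty' -d "$body"
#   # 2. delete only after the count and a sample search look right
#   curl -s -XPOST -H 'Content-Type: application/json' \
#       'http://localhost:9200/{index_name}/_delete_by_query?pretty' -d "$body"
```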
Troubleshooting
Shards are UNASSIGNED
If the Elasticsearch status is not green (that is, yellow or red) because of UNASSIGNED shards, have a look at https://www.datadoghq.com/blog/elasticsearch-unassigned-shards/ for solutions. However, we suggest that you contact our engineers for advanced operations.
There are some simple operations you can do yourself:
Disk free space
Check disk space and free it up: by default Elasticsearch needs about 20% free disk space to reassign index shards to nodes. Investigate this issue with the commands "df -h ..." and "du -h ..." to find out how much disk space is free and which unimportant files can be deleted (e.g. files in /var/log/squirro or /var/log/elasticsearch).
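The disk check can also be scripted. The following sketch uses hypothetical function names and takes the 20% figure from the text above as its default threshold; it parses the POSIX output format of df -P.

```shell
#!/bin/sh
# Hypothetical helper: reads the output of `df -P <path>` on stdin and prints
# the used percentage of the file system as a plain number.
disk_used_percent() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: succeeds when free space is below the required minimum.
# $1 = used percentage, $2 = minimum free percentage (default 20, from the text above)
low_on_disk() {
    [ $((100 - $1)) -lt "${2:-20}" ]
}

# Usage:
#   used=$(df -P /var/lib/elasticsearch | disk_used_percent)
#   if low_on_disk "$used"; then echo "only $((100 - used))% free" >&2; fi
```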
Number of replicas
If you have only one instance of Elasticsearch running but number_of_replicas in the index settings is greater than 0, then that index also has "yellow" status with some unassigned shards. View them with:
curl -s localhost:9200/_cat/shards | grep UNASSIGNED
Check number of replicas of an index:
curl -XGET http://localhost:9200/{index_name}/_settings?pretty | grep number_of_replicas
If number_of_replicas > 0, set the value to 0:
curl -XPUT http://localhost:9200/{index_name}/_settings -H "Content-Type: application/json" -d '{"number_of_replicas":0}'
Check as well number of replicas in template to make sure future indices do not have wrong number_of_replicas setting:
curl -XGET http://localhost:9200/_template/squirro_v9?pretty | grep number_of_replicas
If number_of_replicas > 0, modify squirro_v9.json and put the template again:
vi /etc/elasticsearch/templates/squirro_v9.json
# edit the line "number_of_replicas": ...,
bash /etc/elasticsearch/templates/ensure_templates.sh
Retry failed allocation
Check explanation for the unassigned shard:
curl -XGET localhost:9200/_cluster/allocation/explain?pretty
Retry failed allocations:
curl -XPOST localhost:9200/_cluster/reroute?retry_failed
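When many shards are listed, a short filter gives a compact overview of what is unassigned. This sketch assumes the default column order of _cat/shards (index, shard, prirep, state, ...); the function name unassigned_shards is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads _cat/shards output on stdin and prints
# "index shard p|r" for every shard whose state column is UNASSIGNED.
# Assumption: default _cat/shards columns, so the state is the 4th field.
unassigned_shards() {
    awk '$4 == "UNASSIGNED" { print $1, $2, $3 }'
}

# Usage (assumes Elasticsearch listens on localhost:9200):
#   curl -s 'http://localhost:9200/_cat/shards' | unassigned_shards
```

Unassigned primaries (p) explain a red status, while unassigned replicas (r) only lead to yellow.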
Indices are moved
Usually Elasticsearch indices are stored in /var/lib/elasticsearch/. However, for various reasons (the server went down and cannot be restored to its old state, a symlink is lost, an old Elasticsearch version put indices under the cluster name...), you may not find the indices in /var/lib/elasticsearch/ because they reside on another mount point. To fix this issue:
1. Check content, permission and owner of /var/lib/elasticsearch folder
sudo ls -l /var/lib/elasticsearch/
You should see a folder named nodes, owned by user elasticsearch and group elasticsearch.
2. If /var/lib/elasticsearch is missing or its content is not the index you expect, then
# stop elasticsearch
systemctl stop elasticsearch
choose one of 2 solutions:
a. create symlink to new index location
ln -s {your_new_index_location} /var/lib/elasticsearch
b. or point elasticsearch data dir to the new location in the config file:
vi /etc/elasticsearch/elasticsearch.yml
# edit the line path.data: {your_new_index_location}
3. Set owner, restart service
# set owner and group for the data folder
chown -R elasticsearch:elasticsearch {your_index_location}
# start elasticsearch again
systemctl start elasticsearch
# check indices status
curl http://localhost:9200/_cat/indices?v
Too many scroll contexts
If you encounter this type of exception:
Trying to create too many scroll contexts. Must be less than or equal to: [500]
You can increase the limit of Elasticsearch by running this command on one of the nodes:
curl -X PUT localhost:9200/_cluster/settings -H 'Content-Type: application/json' -d'{ "persistent" : { "search.max_open_scroll_context": 1000 }, "transient": { "search.max_open_scroll_context": 1000 } }'
Alternatively, and better in the long term, reduce the scroll argument of any squirro_client.scan() usage from the default 5m to something like 1m. We also filed an official improvement request to auto-close contexts in a future release. Reference: SQ-13364
Elasticsearch fails to start with “Unable to load JNA native support library, native methods will be disabled” error message in the log
This happens when Elasticsearch tries to use the /tmp/ folder, but that folder is mounted with the noexec flag. It can also happen when another temporary folder is used and the Elasticsearch service user has no execution rights in that folder.
The main reason why the noexec flag would be set on tmp is OS hardening. The tmp folder can be leveraged by bad actors to store and execute things. In a highly hardened system this is not desirable and hence the noexec flag is often set.
The workaround for this is to edit /etc/sysconfig/elasticsearch and to add this line:
ES_TMPDIR=/usr/share/elasticsearch/tmp
This would be a sensible default. But any location will do, as long as the folder is owned by the elasticsearch uid and gid.
Memory setting
You can set Elasticsearch memory in the file /etc/elasticsearch/jvm.options.d/squirro.options. Minimum heap size (Xms) and maximum heap size (Xmx) must be equal to each other, for example:
-Xms8g
-Xmx8g
Elasticsearch may crash if there is not enough memory on your server. Make sure you set Xmx to no more than 50% of your physical RAM, so that enough physical RAM is left for kernel file system caches (https://www.elastic.co/guide/en/elasticsearch/reference/current/heap-size.html).
If you run the cluster node and the storage node on the same machine, set the memory for Elasticsearch to no more than 30% of your physical RAM.
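Also, do not set the heap to more than 32 GB: Elasticsearch recommends staying below the threshold (around 32 GB) at which the JVM stops using compressed object pointers.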
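The sizing rules above can be expressed as a small calculation. This is a sketch under stated assumptions: the function name recommended_heap_gb is hypothetical, it works in whole gigabytes, and the cap of 31 GB is a conservative choice of this example to stay below the 32 GB limit.

```shell
#!/bin/sh
# Hypothetical helper: prints a heap size in GB following the rules above.
# $1 = physical RAM in GB
# $2 = "shared" if cluster node and storage node run on the same machine
recommended_heap_gb() {
    percent=50
    if [ "${2:-}" = "shared" ]; then
        percent=30
    fi
    heap=$(($1 * percent / 100))
    # Assumption of this sketch: cap at 31 GB to stay below the 32 GB limit
    if [ "$heap" -gt 31 ]; then
        heap=31
    fi
    printf '%s\n' "$heap"
}

# Usage: print the two lines for /etc/elasticsearch/jvm.options.d/squirro.options
#   h=$(recommended_heap_gb 16)
#   printf '%s\n' "-Xms${h}g" "-Xmx${h}g"
```

For example, a dedicated storage node with 16 GB of RAM yields 8, which matches the -Xms8g and -Xmx8g example above.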
Warning
In older installations of Squirro, the amount of memory available to Elasticsearch was set in the file /etc/elasticsearch/jvm.options.
When both files are present in the system, /etc/elasticsearch/jvm.options.d/squirro.options overrides the options in /etc/elasticsearch/jvm.options.
Test an elasticsearch query
Sometimes the Elasticsearch log file contains an exception together with the request body, which can be quite long. To reproduce this request for investigation, proceed as follows:
# save the query in JSON format, e.g. to /tmp/es_request.json
# send the request to ES using that file as input:
curl 'http://localhost:9200/{index_name}/_search?pretty' -H "Content-Type: application/json" -d @/tmp/es_request.json
Cluster Block Exception in ES logs
If the disk usage on the Elasticsearch cluster goes beyond a certain limit, Elasticsearch marks all indices as read-only and only allows the deletion of indices/documents to facilitate space recovery. To make the indices writable again, first make sure that enough disk space is free again (at least about 20%, i.e. no more than 80% used), either by removing old logs and unnecessary files, by adding more disk space or by any other means. Then execute the following command, which removes the block from every index that is marked as read-only.
curl -XPUT http://localhost:9200/*/_settings -H "Content-Type: application/json" -d '{"index.blocks.read_only_allow_delete": null}'
Recover from a corrupted index
Symptoms:
Elasticsearch cluster state is red
One or multiple shards are not allocated
Output of
curl -XGET localhost:9200/_cluster/allocation/explain?pretty
Looks like this:
{
  "index" : "squirro_v9_spzhmtdsrrc78oodbnolza",
  "shard" : 5,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "CLUSTER_RECOVERED",
    "at" : "2020-02-14T19:20:02.789Z",
    "last_allocation_status" : "no_valid_shard_copy"
  },
  "can_allocate" : "no_valid_shard_copy",
  "allocate_explanation" : "cannot allocate because all found copies of the shard are either stale or corrupt",
  "node_allocation_decisions" : [
    {
      "node_id" : "7IDt77EJR-uh5PknzF26_Q",
      "node_name" : "squirro-node-2a679356-9373-58ec-bad1-d812fbed0cad",
      "transport_address" : "127.0.0.1:9300",
      "node_decision" : "no",
      "store" : {
        "in_sync" : true,
        "allocation_id" : "xLLfLDCbTgaePDz_oivuQA",
        "store_exception" : {
          "type" : "corrupt_index_exception",
          "reason" : "failed engine (reason: [merge failed]) (resource=preexisting_corruption)",
          "caused_by" : {
            "type" : "i_o_exception",
            "reason" : "failed engine (reason: [merge failed])",
            "caused_by" : {
              "type" : "corrupt_index_exception",
              "reason" : "codec footer mismatch (file truncated?): actual footer=0 vs expected footer=-1071082520 (resource=BufferedChecksumIndexInput(MMapIndexInput(path=\"/usr/share/elasticsearch/data/nodes/0/indices/7Md9uEZOTk-nAlZvRUvNmg/5/index/_oon.cfs\") [slice=_oon_Lucene50_0.tim]))"
            }
          }
        }
      }
    }
  ]
}
At this point, if you have replicas, snapshots or backups, the only right thing to do is to recover from those.
The steps below will get your index back up and running in green state, but you will most likely lose some documents.
We are going to use the Lucene CheckIndex utility to validate and fix the corrupted index.
Stop elasticsearch
Note the affected shards in the message above. In this example it's shard 5.
Locate the data folder of this shard, in my example this is /usr/share/elasticsearch/data/nodes/0/indices/7Md9uEZOTk-nAlZvRUvNmg/5/index (also printed in the above message)
Back up the affected shard folder, e.g.
tar czvf /tmp/shard5.tar.gz /usr/share/elasticsearch/data/nodes/0/indices/7Md9uEZOTk-nAlZvRUvNmg/5
Enter the lib folder of elasticsearch: cd /usr/share/elasticsearch/lib
Run the following command (adjust the folder to your situation):
java -cp lucene-core*.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex /usr/share/elasticsearch/data/nodes/0/indices/7Md9uEZOTk-nAlZvRUvNmg/5/index/ -verbose -exorcise
Check the summary output of the tool, in my case it was:
WARNING: 1 broken segments (containing 179 documents) detected
Took 545.101 sec total.
WARNING: 179 documents will be lost
NOTE: will write new segments file in 5 seconds; this will remove 179 docs from the index. YOU WILL LOSE DATA. THIS IS YOUR LAST CHANCE TO CTRL+C!
5...
4...
3...
2...
1...
Writing...
OK
Wrote new segments file "segments_er0"
This is good news: the tool was able to fix the corrupted segment. But we lost 179 documents, and we don't know which ones!
Enter the index folder, in my case cd /usr/share/elasticsearch/data/nodes/0/indices/7Md9uEZOTk-nAlZvRUvNmg/5/index/
Remove any files that start with 'corrupted', in my case: rm corrupted_Qyj-NdANTo2vr-aUDR6l_g
Start elasticsearch
Check elasticsearch status
Synonyms File Missing
Should text search stop working in your Squirro project, this may be due to a missing synonym analyzer and missing filters in the index configuration. The problem is evident in topic.log, and the analyzer and filters are also absent from the ES index configuration, which can be retrieved by running the following command.
curl -XGET "localhost:9200/$INDEXID" | python -m json.tool
In order to restore the search functionality, the following steps should be taken:
1. Stop any dataloading jobs
2. Stop the ingester service
3. Close the particular index of the project. This can be achieved by the following command. It is important to replace the $INDEXID with the ES index currently experiencing such issues. For more info see https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-close.html
curl -X POST "localhost:9200/$INDEXID/_close?pretty"
4. Once the index has been closed, the index settings can be updated via the curl request below. For more information, see https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-synonym-tokenfilter.html.
It is important that the following placeholders are replaced with actual values (they sit inside single quotes, so the shell does not expand them):
$INDEXID - ES index
$PROJECTID - Squirro project ID of the affected index
$SYNONYMNAME - Name of the synonym file that cannot be found (can be found in topic.log, e.g. 'title_body_summary')
$SYNONYMID - ID of the synonym file that cannot be found (the ID can easily be found from the Squirro URL (Explore Dashboard → Load → Synonyms → EDIT $SYNONYMNAME))
curl -XPUT localhost:9200/$INDEXID/_settings -H "Content-Type: application/json" -d'
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "synonyms_$PROJECTID_$SYNONYMNAME_$SYNONYMID": {
            "type": "custom",
            "tokenizer": "icu_tokenizer",
            "filter": ["icu_folding", "icu_normalizer", "synonyms_$PROJECTID_$SYNONYMNAME_$SYNONYMID"],
            "char_filter": ["html_strip", "quotation_char_filter"]
          }
        },
        "filter": {
          "synonyms_$PROJECTID_$SYNONYMNAME_$SYNONYMID": {
            "type": "synonym_graph",
            "synonyms_path": "/etc/elasticsearch/synonyms/$PROJECTID/$SYNONYMID.txt",
            "updateable": true
          }
        }
      }
    }
  }
}
'
5. Now that settings have been updated, it is time to open the index. This can be achieved by the below curl command. For more information please visit https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-open-close.html
curl -X POST "localhost:9200/$INDEXID/_open?pretty"
6. Resume Squirro ingester services and data loading jobs
7. Test full-text search and ensure results are returned as normal
Install/Remove Elasticsearch plugins
Use the following command to install an elasticsearch plugin
elasticsearch-plugin install <plugin name>
Use the following command to remove an elasticsearch plugin
elasticsearch-plugin remove <plugin name>