Squirro relies on Elasticsearch for the storage nodes where the indexed data is persisted. This guide outlines Elasticsearch configuration changes that are often used in the context of Squirro.
Please also refer to the official Elasticsearch documentation for help.
Where to find what
For a default Squirro setup, these are the most important Elasticsearch folders and files on the Squirro storage node:
| Location | Description |
|---|---|
| /etc/sysconfig/elasticsearch | Elasticsearch sysconfig file; customizes environment settings, e.g. ES_HOME, CONF_DIR, DATA_DIR, LOG_DIR, ES_JAVA_OPTS |
| /etc/elasticsearch/elasticsearch.yml | Elasticsearch config file; customizes the configuration, e.g. cluster name, HTTP port |
| /etc/elasticsearch/ | Elasticsearch config directory: elasticsearch.yml, jvm.options, templates, scripts, synonyms (symlink not allowed) |
| /var/log/elasticsearch/ | Elasticsearch log folder (symlink allowed) |
| /var/lib/elasticsearch/ | Elasticsearch data folder, where the indices are stored (symlink allowed) |
| /usr/share/elasticsearch/ | Elasticsearch installation directory; contains bin/, lib/, plugins/ |
Tools to support ES operations
Since version 5.x, Elasticsearch no longer allows running site plugins such as head or kopf. You can use a standalone tool instead to see the status of the Elasticsearch cluster, indices, and shards during an upgrade or any other operation, for example cerebro:
Installation:
1. Download a release from https://github.com/lmenezes/cerebro/releases
2. Extract the files
3. Run bin/cerebro
4. Access it at http://localhost:9000
If you want to use local port 9500 as a port forward to the Elasticsearch node, use ssh -L 9500:storagehost:9200. After that you can connect cerebro to http://localhost:9500 to see the status of the Elasticsearch cluster.
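A sketch of the full tunnel setup; user and jumphost are placeholders for whatever account and host you can SSH through:

```
# forward local port 9500 to port 9200 on the storage node
ssh -N -L 9500:storagehost:9200 user@jumphost
# in another terminal, verify the tunnel before pointing cerebro at it
curl http://localhost:9500/_cluster/health?pretty=true
```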
Monitoring
Check the size of indices in ES
```
curl -XGET localhost:9200/_cat/indices?v
```
You should see a list of the Squirro indices with their sizes; each index name ends with the project ID.
Check that the cluster and nodes are healthy
```
curl http://localhost:9200/_cluster/health?pretty=true
```
You should see a JSON response containing "status" : "green".
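For reference, a healthy response looks roughly like this (the cluster name and shard counts are illustrative placeholders, not values from a real Squirro node):

```
{
  "cluster_name" : "squirro-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 12,
  "active_shards" : 12,
  "unassigned_shards" : 0,
  ...
}
```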
Check Elasticsearch access from the cluster node
The cluster node needs to access the storage node through the nginx service. To check that access:
```
curl -L http://127.0.0.1:81/ext/elastic/_cluster/health?pretty=true
```
Again, you should see a JSON response containing "status" : "green".
Check that the Elasticsearch service is started
```
ps aux | grep [e]lasticsearch
```
If the service is running, this command also lets you view many of its settings, e.g. memory, default paths, and the process ID:
```
496 122224 1 11 10:20 ? 00:02:14 /usr/bin/java -Xms6g -Xmx6g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+AlwaysPreTouch -server -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -Djdk.io.permissionsUseCanonicalPath=true -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Dlog4j.skipJansi=true -XX:+HeapDumpOnOutOfMemoryError -Djava.security.policy=file:///etc/elasticsearch/squirro.java.policy -Des.path.home=/usr/share/elasticsearch -cp /usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch -p /var/run/elasticsearch/elasticsearch.pid -d -Edefault.path.logs=/var/log/elasticsearch -Edefault.path.data=/var/lib/elasticsearch -Edefault.path.conf=/etc/elasticsearch
```
To start, stop, or restart the Elasticsearch service:
```
systemctl {start|stop|restart} elasticsearch
```
View indices, status and size
By default, each Squirro project corresponds to one index, whose name has the format squirro_{template-version}_{lowercase-project-id}, e.g. squirro_v9_36syha3_ss-zwvn9gyk1ww.
```
curl http://localhost:9200/_cat/indices?v
```
View shards, status and size
An Elasticsearch index usually consists of several shards; number_of_shards is defined in the index settings.
```
curl http://localhost:9200/_cat/shards?v
```
Tip: There are plenty of useful _cat commands to investigate Elasticsearch; just run curl 'http://localhost:9200/_cat' to see them.
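For instance (both query flags are standard _cat options):

```
# list all available _cat endpoints
curl 'http://localhost:9200/_cat'
# ?v adds a header row, ?help describes the columns
curl 'http://localhost:9200/_cat/nodes?v'
curl 'http://localhost:9200/_cat/indices?help'
```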
View Squirro templates
This is to make sure the Squirro templates are installed on the Squirro storage node:
```
# see details of all templates
curl -s http://localhost:9200/_template?pretty
# see only the Squirro template names
curl -s http://localhost:9200/_template?pretty | grep squirro_v
```
View number of shards and replicas
This is to make sure the number of shards and replicas is set correctly on the Squirro storage node:
```
# in the templates
curl -s http://localhost:9200/_template?pretty | grep -e number_of -e squirro_v
# in the settings of a given index
curl -s http://localhost:9200/{index_name}/_settings?pretty | grep -e number_of -e squirro_v
```
Tip: You can use the wildcard syntax * in index_name, e.g. /squirro_v9_*/.
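For example, combining the wildcard with the settings check above inspects all v9 project indices at once (quoted so the shell does not expand the * and ?):

```
curl -s 'http://localhost:9200/squirro_v9_*/_settings?pretty' | grep -e number_of -e squirro_v
```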
View mapping and settings of a given index
This is to make sure a newly created index uses the correct Squirro template:
```
curl http://localhost:9200/{index_name}/_mappings?pretty
curl http://localhost:9200/{index_name}/_settings?pretty
```
View Elasticsearch stats
```
curl http://localhost:9200/_stats?pretty
# stats of a given index
curl http://localhost:9200/{index_name}/_stats?pretty
# stats of nodes
curl http://localhost:9200/_nodes/stats?pretty
# stats of CPU and memory
curl http://localhost:9200/_nodes/stats/os?pretty
# stats of the file system
curl http://localhost:9200/_nodes/stats/fs?pretty
# stats of the JVM
curl http://localhost:9200/_nodes/stats/jvm?pretty
# stats of fielddata (text-field data loaded into memory for aggregations, sorting, or scripts)
curl http://localhost:9200/_nodes/stats/indices/fielddata?pretty
```
Set replicas in a multi-node cluster
When you have multiple storage nodes, we suggest creating at least one replica for the indices, so that the storage cluster keeps working if one node shuts down.
```
curl -XPUT http://localhost:9200/squirro_v9/_settings -H "Content-Type: application/json" -d '{"index": {"number_of_replicas": 1}}'
curl -XPUT 'http://localhost:9200/squirro_v9_*/_settings' -H "Content-Type: application/json" -d '{"index": {"number_of_replicas": 1}}'
curl -XPUT http://localhost:9200/.configsync/_settings -H "Content-Type: application/json" -d '{"index": {"number_of_replicas": 1}}'
```
Investigate index content
View the number of documents, and one sample document, in the index
```
# query the whole index (quoted so the shell does not interpret the &)
curl 'http://localhost:9200/{index_name}/_search?pretty&size=1'
```
In the response you should see the number of documents, for example:
```
{
  "hits" : {
    "total": 12345,
    ...
    "hits": [...list of items...]
  }
}
```
Get an item by ID
```
curl 'http://localhost:9200/{index_name}/_doc/{item_id}?pretty'
```
Query number of items modified before a given time
```
curl http://localhost:9200/{index_name}/_search?pretty -H "Content-Type: application/json" -d '
{
  "query": {
    "range" : {
      "modified_at" : {
        "lte" : "2018-02-13T18:01:44"
      }
    }
  },
  "size": 1
}'
```
Delete documents by query
Before deleting by query, always make sure the query returns only the documents you want to delete: run it as a search first and review some of the results:
```
curl http://localhost:9200/{index_name}/_search?pretty -H "Content-Type: application/json" -d '
{
  "query": {
    "term" : {
      "assoc:sources" : "123456789abcdef"
    }
  }
}'
```
If you see wrong documents in the index and want to delete them, e.g. all documents belonging to the source with ID "123456789abcdef", you can delete them with the same query:
```
curl -XPOST http://localhost:9200/{index_name}/_delete_by_query -H "Content-Type: application/json" -d '
{
  "query": {
    "term" : {
      "assoc:sources" : "123456789abcdef"
    }
  }
}'
```
After a delete-by-query, the index size on disk is not reduced, because documents are not removed from their segments, only marked as deleted. If you want to free disk space after deleting, execute:
```
curl -XPOST http://localhost:9200/{index_name}/_forcemerge?only_expunge_deletes=true
```
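To see the effect, you can compare the deleted-document counts and on-disk size before and after the merge; _cat/segments reports deletions per segment in the docs.deleted column:

```
# per-segment document counts, including deleted-but-not-expunged docs
curl 'http://localhost:9200/_cat/segments/{index_name}?v'
# overall index size on disk
curl 'http://localhost:9200/_cat/indices/{index_name}?v'
```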
Troubleshooting
Shards are UNASSIGNED
If the Elasticsearch status is not green (yellow or red) because of unassigned shards, have a look at https://www.datadoghq.com/blog/elasticsearch-unassigned-shards/ for solutions. However, we suggest contacting our engineers for advanced operations.
There are some simple operations you can do yourself:
Disk free space
Check disk space and free it up: by default, Elasticsearch needs about 20% free disk space to reassign index shards to nodes. Investigate with df -h and du -h to find out how much space is free and which unimportant files can be deleted (e.g. files in /var/log/squirro or /var/log/elasticsearch), for example:
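A minimal sketch of that investigation (the paths are examples):

```
# free space on the mount point holding the ES data
df -h /var/lib/elasticsearch
# largest subdirectories under /var/log, sorted by size
du -h --max-depth=1 /var/log | sort -h
```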
Number of replicas
If you have only one Elasticsearch instance running but number_of_replicas in the index settings is bigger than 0, that index will have "yellow" status with some unassigned shards. View them with:
```
curl -s localhost:9200/_cat/shards | grep UNASSIGNED
```
Check the number of replicas of an index:
```
curl -XGET http://localhost:9200/{index_name}/_settings?pretty | grep number_of_replicas
```
If number_of_replicas is greater than 0, set it to 0:
```
curl -XPUT http://localhost:9200/{index_name}/_settings -H "Content-Type: application/json" -d '{"number_of_replicas":0}'
```
Also check the number of replicas in the template, to make sure future indices do not get a wrong number_of_replicas setting:
```
curl -XGET http://localhost:9200/_template/squirro_v9?pretty | grep number_of_replicas
```
If number_of_replicas > 0, modify squirro_v9.json and apply the template again:
```
vi /etc/elasticsearch/templates/squirro_v9.json
# edit the line "number_of_replicas": ...,
bash /etc/elasticsearch/templates/ensure_templates.sh
```
Retry failed allocation
Check the explanation for the unassigned shard:
```
curl -XGET localhost:9200/_cluster/allocation/explain?pretty
```
Retry failed allocations:
```
curl -XPOST localhost:9200/_cluster/reroute?retry_failed
```
Indices have been moved
Usually, Elasticsearch indices are stored in /var/lib/elasticsearch/. For various reasons (the server went down and cannot be restored to its old state, a symlink was lost, an old Elasticsearch version put indices under the cluster name, ...), /var/lib/elasticsearch/ may be missing and the index may live on another mount point. To fix this:
1. Check the content, permissions, and owner of the /var/lib/elasticsearch folder:
```
sudo ls -l /var/lib/elasticsearch/
```
You should see a folder nodes owned by user elasticsearch and group elasticsearch.
2. If /var/lib/elasticsearch is missing, or its content is not the index you expect, stop Elasticsearch:
```
# stop elasticsearch
systemctl stop elasticsearch
```
Then choose one of two solutions:
a. Create a symlink to the new index location:
```
ln -s {your_new_index_location} /var/lib/elasticsearch
```
b. Or point the Elasticsearch data directory to the new location in the config file:
```
vi /etc/elasticsearch/elasticsearch.yml
# edit the line path.data: {your_new_index_location}
```
3. Set the owner and restart the service:
```
# set owner and group for the data folder
chown -R elasticsearch:elasticsearch {your_index_location}
# start elasticsearch again
systemctl start elasticsearch
# check indices status
curl http://localhost:9200/_cat/indices?v
```
Too many scroll contexts
If you encounter this type of exception:
```
Trying to create too many scroll contexts. Must be less than or equal to: [500]
```
You can increase the Elasticsearch limit by running this command on one of the nodes:
```
curl -X PUT localhost:9200/_cluster/settings -H 'Content-Type: application/json' -d'{
  "persistent" : {
    "search.max_open_scroll_context": 1000
  },
  "transient": {
    "search.max_open_scroll_context": 1000
  }
}'
```
Alternatively, and better long-term, reduce the scroll argument of any squirro_client.scan() usage from the default 5m to something like 1m. We also filed an official improvement request to auto-close contexts in a future release. Reference: SQ-13364
Elasticsearch fails to start with “Unable to load JNA native support library, native methods will be disabled” error message in the log
This happens when Elasticsearch tries to use the /tmp/ folder but that folder is mounted with the noexec flag, or, if another temporary folder is used, when the Elasticsearch service user has no execute rights in that folder.
The main reason the noexec flag would be set on /tmp is OS hardening: the folder can be leveraged by bad actors to store and execute payloads, which is not desirable on a highly hardened system, so the flag is often set.
The workaround for this is to edit /etc/sysconfig/elasticsearch and to add this line:
```
ES_TMPDIR=/usr/share/elasticsearch/tmp
```
This would be a sensible default. But any location will do, as long as the folder is owned by the elasticsearch uid and gid.
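A sketch of the accompanying folder setup, assuming the default elasticsearch user and group:

```
# create the alternative temp folder and hand it to the service user
mkdir -p /usr/share/elasticsearch/tmp
chown elasticsearch:elasticsearch /usr/share/elasticsearch/tmp
# restart so the new ES_TMPDIR takes effect
systemctl restart elasticsearch
```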
Memory setting
You can set Elasticsearch memory in the file /etc/elasticsearch/jvm.options.d/squirro.options. Minimum heap size (Xms) and maximum heap size (Xmx) must be equal to each other, for example:
```
-Xms8g
-Xmx8g
```
Elasticsearch may crash if there is not enough memory on your server. Make sure you set Xmx to no more than 50% of your physical RAM, to ensure that enough physical RAM is left for kernel file system caches (https://www.elastic.co/guide/en/elasticsearch/reference/current/heap-size.html). If you run the cluster node and the storage node on the same machine, give Elasticsearch no more than 30% of your physical RAM. The heap should also never exceed 32GB, the threshold above which the JVM can no longer use compressed object pointers.
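To size the heap, check the physical RAM first; for example, on a dedicated 16GB storage node the 50% rule above gives -Xms8g/-Xmx8g:

```
# total and available physical memory
free -h
```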
Warning: In older installations of Squirro, the amount of memory available to Elasticsearch was set in the file /etc/elasticsearch/jvm.options. When both files are present on the system, /etc/elasticsearch/jvm.options.d/squirro.options overrides the options in /etc/elasticsearch/jvm.options.
Test an Elasticsearch query
Sometimes an exception in the Elasticsearch log file includes the request body, which can be quite long. To reproduce such a request for investigation:
```
# save the query in JSON format, e.g. to /tmp/es_request.json
# then send the request to ES using the file as input:
curl http://localhost:9200/{index_name}/_search?pretty -H "Content-Type: application/json" -d @/tmp/es_request.json
```
Cluster Block Exception in ES logs
If disk usage on the Elasticsearch cluster goes beyond a certain limit, Elasticsearch marks all indices as read-only, only allowing the deletion of indices/documents so that space can be recovered. To make the indices writable again, first bring disk usage back down (e.g. below 80% used), either by removing old logs and unnecessary files, adding more disk space, or by other means, and then execute the following on each index marked read-only:
```
curl -XPUT 'http://localhost:9200/*/_settings' -H "Content-Type: application/json" -d '{"index.blocks.read_only_allow_delete": null}'
```
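To verify the block is gone, check the index settings again; once cleared, the flag should no longer appear anywhere:

```
curl -s 'http://localhost:9200/_all/_settings?pretty' | grep read_only_allow_delete
```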
Recover from a corrupted index
Symptoms:
- The Elasticsearch cluster state is red
- One or multiple shards are not allocated
- The output of

```
curl -XGET localhost:9200/_cluster/allocation/explain?pretty
```

looks like this:
```
{
  "index" : "squirro_v9_spzhmtdsrrc78oodbnolza",
  "shard" : 5,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "CLUSTER_RECOVERED",
    "at" : "2020-02-14T19:20:02.789Z",
    "last_allocation_status" : "no_valid_shard_copy"
  },
  "can_allocate" : "no_valid_shard_copy",
  "allocate_explanation" : "cannot allocate because all found copies of the shard are either stale or corrupt",
  "node_allocation_decisions" : [
    {
      "node_id" : "7IDt77EJR-uh5PknzF26_Q",
      "node_name" : "squirro-node-2a679356-9373-58ec-bad1-d812fbed0cad",
      "transport_address" : "127.0.0.1:9300",
      "node_decision" : "no",
      "store" : {
        "in_sync" : true,
        "allocation_id" : "xLLfLDCbTgaePDz_oivuQA",
        "store_exception" : {
          "type" : "corrupt_index_exception",
          "reason" : "failed engine (reason: [merge failed]) (resource=preexisting_corruption)",
          "caused_by" : {
            "type" : "i_o_exception",
            "reason" : "failed engine (reason: [merge failed])",
            "caused_by" : {
              "type" : "corrupt_index_exception",
              "reason" : "codec footer mismatch (file truncated?): actual footer=0 vs expected footer=-1071082520 (resource=BufferedChecksumIndexInput(MMapIndexInput(path=\"/usr/share/elasticsearch/data/nodes/0/indices/7Md9uEZOTk-nAlZvRUvNmg/5/index/_oon.cfs\") [slice=_oon_Lucene50_0.tim]))"
            }
          }
        }
      }
    }
  ]
}
```
Note: At this point, if you have replicas, snapshots, or backups, then the only right thing to do is to recover from those.
We are going to use the Lucene CheckIndex utility to validate and fix the corrupted index.
1. Stop Elasticsearch.
2. Note the affected shards in the message above. In this example it is shard 5.
3. Locate the data folder of this shard; in this example it is /usr/share/elasticsearch/data/nodes/0/indices/7Md9uEZOTk-nAlZvRUvNmg/5/index (also printed in the message above).
4. Back up the affected shard folder, e.g.:
```
tar czvf /tmp/shard5.tar.gz /usr/share/elasticsearch/data/nodes/0/indices/7Md9uEZOTk-nAlZvRUvNmg/5
```
5. Enter the lib folder of Elasticsearch: cd /usr/share/elasticsearch/lib
6. Run the following command (adjust the folder to your situation):
```
java -cp lucene-core*.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex /usr/share/elasticsearch/data/nodes/0/indices/7Md9uEZOTk-nAlZvRUvNmg/5/index/ -verbose -exorcise
```
7. Check the summary output of the tool; in this case it was:
```
WARNING: 1 broken segments (containing 179 documents) detected
Took 545.101 sec total.
WARNING: 179 documents will be lost
NOTE: will write new segments file in 5 seconds; this will remove 179 docs from the index. YOU WILL LOSE DATA. THIS IS YOUR LAST CHANCE TO CTRL+C!
5...
4...
3...
2...
1...
Writing...
OK
Wrote new segments file "segments_er0"
```
This is good news: the tool was able to fix the corrupted segment. But we lost 179 documents, and we don't know which ones!
8. Enter the index folder; in this case cd /usr/share/elasticsearch/data/nodes/0/indices/7Md9uEZOTk-nAlZvRUvNmg/5/index/
9. Remove any files that start with corrupted; in this case: rm corrupted_Qyj-NdANTo2vr-aUDR6l_g
10. Start Elasticsearch.
11. Check the Elasticsearch status, as shown below.
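The status checks from the Monitoring section apply here; the cluster should leave the red state once the repaired shard is assigned again:

```
curl http://localhost:9200/_cluster/health?pretty=true
curl -s localhost:9200/_cat/shards | grep UNASSIGNED
```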
Synonyms File Missing
Should text search stop working in your Squirro project, this may be due to a missing synonym analyzer and filters in the index configuration. The problem shows up in topic.log, and the analyzer and filters will be missing from the ES index configuration, which can be retrieved with the following command:
```
curl -XGET "localhost:9200/$INDEXID" | python -m json.tool
```
To fix the issue:
1. Stop any data loading jobs.
2. Stop the ingester service.
3. Close the affected index of the project with the following command, replacing $INDEXID with the ES index currently experiencing the issue. For more information see https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-close.html
```
curl -X POST "localhost:9200/$INDEXID/_close?pretty"
```
4. Once the index has been closed, update the index settings with the curl request below. For more information see https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-synonym-tokenfilter.html.
It is important that the following values are replaced:
- $INDEXID: the ES index
- $PROJECTID: the Squirro project ID of the affected index
- $SYNONYMNAME: the name of the synonym file that cannot be found (shown in topic.log, e.g. 'title_body_summary')
- $SYNONYMID: the ID of the synonym file that cannot be found (easily found in the Squirro URL: Explore Dashboard → Load → Synonyms → EDIT $SYNONYMNAME)
```
curl -XPUT localhost:9200/$INDEXID/_settings -H "Content-Type: application/json" -d'
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "synonyms_$PROJECTID_$SYNONYMNAME_$SYNONYMID": {
            "type": "custom",
            "tokenizer": "icu_tokenizer",
            "filter": ["icu_folding", "icu_normalizer", "synonyms_$PROJECTID_$SYNONYMNAME_$SYNONYMID"],
            "char_filter": ["html_strip", "quotation_char_filter"]
          }
        },
        "filter": {
          "synonyms_$PROJECTID_$SYNONYMNAME_$SYNONYMID": {
            "type": "synonym_graph",
            "synonyms_path": "/etc/elasticsearch/synonyms/$PROJECTID/$SYNONYMID.txt",
            "updateable": true
          }
        }
      }
    }
  }
}
'
```
5. Now that the settings have been updated, open the index again with the curl command below. For more information see https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-open-close.html
```
curl -X POST "localhost:9200/$INDEXID/_open?pretty"
```
6. Resume the Squirro ingester services and data loading jobs.
7. Test full-text search and ensure results are returned as normal; a smoke-test sketch follows below.
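A hypothetical smoke test: any match query against the reopened index should return hits again ('body' is an assumed full-text field name, not confirmed by this guide):

```
curl -XPOST "localhost:9200/$INDEXID/_search?pretty" -H "Content-Type: application/json" -d '
{
  "query": {
    "match": {
      "body": "test"
    }
  }
}'
```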
Install/Remove Elasticsearch plugins
Use the following command to install an Elasticsearch plugin:
```
elasticsearch-plugin install <plugin name>
```
Use the following command to remove an Elasticsearch plugin:
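Assuming the standard Elasticsearch CLI, removal mirrors installation:

```
elasticsearch-plugin remove <plugin name>
```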
This page can now be found at Managing Elasticsearch on the Squirro Docs site.