Squirro offers a number of ways to debug what is happening in the system. Each service maintains its own log file, which helps with finding the source of an error. Additionally, the pre-installed Monit daemon automatically restarts services that fail completely. A few basic operating system commands are also explained here; they can be used to check whether the base system is running smoothly.
Log files
All log files can be found in the /var/log directory. The following table shows the relevant log files and their usage.
Log File | Service | Additional Notes |
---|---|---|
/var/log/squirro/squirro-*.log | Squirro services | Detailed log file about each service. Symlink to /var/log/squirro/SERVICE_NAME/SERVICE_NAME.log. |
/var/log/squirro/nginx-*-access.log | Nginx (Squirro services) | Every request to the web services is recorded in this log file in a line-by-line format. Symlink to /var/log/squirro/SERVICE_NAME/nginx-access.log. |
/var/log/squirro/nginx-*-error.log | Nginx (Squirro services) | Records errors on the HTTP level. When a service is stopped, errors may show up here indicating that the service is not reachable. Symlink to /var/log/squirro/SERVICE_NAME/nginx-error.log. |
/var/log/messages | - | General system log. Serious system failures will be recorded here. |
/var/log/elasticsearch/squirro.log | Elasticsearch | Records cluster information and major failures. |
/var/log/redis/redis.log | Redis | |
/var/log/mysqld.log | MySQL | |
/var/log/cron | Cron | |
/var/log/monit/monit.log | Monit | Records service state changes and actions. |
Monit
Squirro comes pre-installed with Monit, a service that is used to automatically restart services that have failed.
Monit is configured by configuration files in /etc/monit.d. By default all the main Squirro services are monitored.
A detailed log on any actions that Monit takes is available in /var/log/monit/monit.log.
Monit has many options that can be set in its configuration. For example, there is a web interface that shows the current status of the machine. For security reasons, that interface is restricted to local access by default. Consult the Monit manual, specifically the section MONIT HTTPD, for information on how to enable this web interface for wider access. When doing that, be aware that the local firewall also needs to be opened for the corresponding port.
System commands
The Squirro services are standard Unix daemons. Standard Linux utilities can be used to debug any issues that may arise.
Processor usage
The current processor usage can be consulted with two standard commands: uptime and top.
uptime
Next to some uptime information, the uptime command outputs the load average for the past 1, 5 and 15 minutes. The load average is a simple metric showing how many processes had to wait for processing. It should usually be close to or below 1.0. A value above 5.0 means the load is quite high; values beyond that are unusual.
When the load average is high, the top command will usually show the processes that are generating the load. But when the CPU usage shown by top is low despite a high load average, that may indicate issues with I/O, such as disk performance.
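The load average is not normalized by the number of CPU cores, so it can help to divide it by the core count before comparing it with the guideline values above. The following is a minimal sketch, assuming a Linux system that provides /proc/loadavg and the nproc utility; the function name load_per_core is made up for this example.

```shell
#!/bin/sh
# load_per_core LOAD CORES: print LOAD divided by CORES with two decimals.
# (Hypothetical helper, not part of Squirro.)
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# The first field of /proc/loadavg is the 1-minute load average.
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    read -r load1 rest < /proc/loadavg
    echo "1-minute load per core: $(load_per_core "$load1" "$(nproc)")"
fi
```

A result well above 1.00 suggests that processes are queuing for CPU time or I/O.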
top
The top command shows a list of all processes on the system, sorted by current CPU usage. Pressing M on the keyboard (upper case, so use Shift+m) will sort the list by memory usage.
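For scripts or quick one-off checks, a non-interactive listing can be more convenient than the interactive top screen. The sketch below uses ps instead of top and is only an approximation: ps reports CPU usage averaged over the lifetime of each process, not the current usage that top shows. The function name busiest is hypothetical.

```shell
#!/bin/sh
# busiest N: read "PID %CPU %MEM COMMAND" rows (with a header line) on stdin
# and print the N rows with the highest CPU usage.
busiest() {
    tail -n +2 | sort -k2,2nr | head -n "$1"
}

# Print the five processes with the highest CPU usage, if ps is available.
if command -v ps >/dev/null 2>&1; then
    ps -eo pid,pcpu,pmem,comm | busiest 5
fi
```

Sorting on the third column instead (sort -k3,3nr) would rank the processes by memory usage.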
Memory usage
Memory usage of individual processes can be debugged with the top command above. To see memory usage of the system as a whole, use free.
free
The free command outputs some statistics on how much RAM is being used by the system. The most useful values to consider are the used and free figures in the "-/+ buffers/cache" row. They show how much memory the system is committed to using and cannot easily free. Newer versions of free no longer print this row; there, the "available" column serves the same purpose.
By default free outputs all values in kilobytes. By calling it with the -m parameter (free -m) all values are output in megabytes instead.
When free memory is very low, the system may be running into issues with memory usage. In some cases the kernel may need to kill processes to make space. Those instances can be seen in the standard system log /var/log/messages and show up as lines such as "Out of memory: kill process 23123".
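To check whether the kernel has killed processes because of memory pressure, the system log can be searched for the message quoted above. This is a minimal sketch, assuming the log lives in /var/log/messages and is readable by the current user (usually root); the function name count_oom_kills is made up for this example, and the search is case-insensitive because the exact wording differs between kernel versions.

```shell
#!/bin/sh
# count_oom_kills FILE: print the number of lines in FILE that contain
# "out of memory" (case-insensitive). Hypothetical helper.
count_oom_kills() {
    grep -ci 'out of memory' "$1"
}

log=/var/log/messages
if [ -r "$log" ]; then
    echo "Out-of-memory events in $log: $(count_oom_kills "$log")"
    # Show the five most recent matching lines for context.
    grep -i 'out of memory' "$log" | tail -n 5
fi
```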
Disk usage
A full disk will prevent the system from working. The df command can help with finding those issues.
df
Use the df command to see a list of all partitions and their disk usage. The column "Use%" shows the usage as a percentage. Anything above 95% is considered full and will usually hinder the system from working well.
When you are experiencing full disks, consider enlarging the corresponding disk, or contact Support for ways to remove some extra data.
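The 95% guideline can be checked automatically. The following is a minimal sketch that filters the output of df -P (the portable output format, where the fifth column holds the usage percentage and the sixth the mount point); the function name full_filesystems is hypothetical, and mount points containing spaces are not handled.

```shell
#!/bin/sh
# full_filesystems THRESHOLD: read `df -P` output on stdin and print the mount
# point and usage of every file system that is more than THRESHOLD percent full.
full_filesystems() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)            # strip the percent sign
        if (use + 0 > limit) print $6, $5
    }'
}

# Prints nothing when no file system is above 95%.
df -P | full_filesystems 95
```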
Following log files
tail
A lot of information is captured in log files. These files can be followed with the tail command, specifically by using its -f parameter to follow all updates to a file.
For example:
tail -f /var/log/squirro/squirro-topic.log
This shows a real-time view of what is written into the topic service log file.
tail also accepts multiple file names or even wildcards. So all Squirro service log files can be monitored as follows:
tail -f /var/log/squirro/squirro-*.log
grep
The grep command searches files for occurrences of a specific text. For example, if Squirro is reporting errors but you are unsure where they might be coming from, the following command helps pin down the responsible service:
grep ERROR /var/log/squirro/squirro-*.log
This outputs every line that contains the text "ERROR", prefixed with the name of the log file in which it was found.
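When the logs contain many errors, a per-file count is often a quicker way to spot the responsible service than reading every matching line. The following is a minimal sketch using grep -c; the function name error_counts is made up for this example, and the -H option (always print the file name) is assumed to be available, as it is in GNU grep.

```shell
#!/bin/sh
# error_counts FILE...: print "FILE:COUNT" for each file, where COUNT is the
# number of lines containing ERROR, sorted with the highest count first.
error_counts() {
    grep -cH ERROR "$@" | sort -t: -k2,2nr
}

# Only run against the Squirro logs if they exist on this machine.
if ls /var/log/squirro/squirro-*.log >/dev/null 2>&1; then
    error_counts /var/log/squirro/squirro-*.log
fi
```

The service whose log file appears at the top of the list is a good place to start investigating.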