Intro
...
Home
https://github.com/benfred/py-spy
Install it
Code Block |
---|
pip install py-spy |
Note: On a Squirro server its best to create a dedicated virtualenv for it:
Code Block |
---|
/opt/rh/rh-python36/root/bin/virtualenv ~/pyspy
source ~/pyspy/bin/activate
pip install py-spy |
Use it
Before you start
We need a PID of a target process:
Example:
Code Block |
---|
#ps -elf | grep <service>d
ps -elf | grep topicd |
Common flags
Code Block |
---|
-p, --pid <pid> PID of a running python program to spy on
--full-filenames Show full Python filenames
-r, --rate <rate> The number of samples to collect per second [default: 100]
-s, --subprocesses Profile subprocesses of the original process
-n, --native Collect stack traces from native extensions written in Cython, C or C++ |
TOP
Great for these situation:
What is this service doing?
Why is a service/process slow?
Code Block |
---|
py-spy top -p 29121 |
Output:
Code Block |
---|
Total Samples 991
GIL: 2.20%, Active: 19.78%, Threads: 16
%Own %Total OwnTime TotalTime Function (filename:line)
8.79% 8.79% 0.330s 0.330s readinto (socket.py:586)
4.40% 4.40% 0.150s 0.150s run (topic/background.py:534)
0.00% 0.00% 0.120s 0.120s run (flup/server/threadedserver.py:76)
0.00% 8.79% 0.000s 0.330s _get_and_deliver (kombu/transport/virtual/base.py:406)
0.00% 8.79% 0.000s 0.330s _read_status (http/client.py:268)
0.00% 8.79% 0.000s 0.330s _delete_source_id (topic/background.py:116)
0.00% 0.00% 0.000s 0.120s run_flup_daemon (common/daemonlauncher.py:89)
0.00% 4.40% 0.000s 0.080s POST (topic/items.py:849)
0.00% 8.79% 0.000s 0.330s get (kombu/utils/scheduling.py:56)
0.00% 4.40% 0.000s 0.080s wrapped_f (common/__init__.py:142)
0.00% 8.79% 0.000s 0.330s delete (elasticsearch/client/__init__.py:596)
0.00% 8.79% 0.000s 0.330s consume (kombu/mixins.py:197) |
Here we can see that a source is being deleted.
Dump a stacktrace
Great for these situation:
A process is stuck
Code Block |
---|
py-spy dump -p 1529 |
Output:
Code Block |
---|
Thread 1614 (active): "local.3"
readinto (socket.py:586)
_read_status (http/client.py:268)
begin (http/client.py:307)
getresponse (http/client.py:1346)
_make_request (urllib3/connectionpool.py:421)
urlopen (urllib3/connectionpool.py:677)
perform_request (elasticsearch/connection/http_urllib3.py:229)
perform_request (elasticsearch/transport.py:362)
perform_request (common/tracing/elasticsearch_tracing.py:84)
delete (elasticsearch/client/__init__.py:596)
_wrapped (elasticsearch/client/utils.py:92)
retry (index/retry.py:57)
delete_document (index/writer.py:183)
delete_document (index/item/writer.py:152)
update_item (topic/background.py:88)
_delete_source_id (topic/background.py:116)
_on_task_delete_source_items (topic/background.py:70)
_on_task (common/queue.py:1071)
on_task (common/queue.py:992)
_process_task (common/queue.py:528)
<listcomp> (kombu/messaging.py:590)
receive (kombu/messaging.py:590)
_receive_callback (kombu/messaging.py:624)
_callback (kombu/transport/virtual/base.py:633)
_deliver (kombu/transport/virtual/base.py:983)
_get_and_deliver (kombu/transport/virtual/base.py:406)
get (kombu/utils/scheduling.py:56)
_poll (kombu/transport/virtual/base.py:402)
drain_events (kombu/transport/virtual/base.py:745)
_drain_channel (kombu/transport/virtual/base.py:1001)
get (kombu/utils/scheduling.py:56)
drain_events (kombu/transport/virtual/base.py:963)
drain_events (kombu/connection.py:323)
consume (kombu/mixins.py:197)
run (kombu/mixins.py:175)
run (common/background.py:174)
_bootstrap_inner (threading.py:916)
_bootstrap (threading.py:884) |
The stack is reversed, we can see its on a network socket waiting for a response of an elasticsearch delete command.
Record
Great for these situations:
Investigate slow performance
Understanding the bottlenecks during a load test
Performance regression testing: New code vs old code.
In general pinpoint where time is spent. Note: Time not equal CPU load.
Code Block |
---|
py-spy record -p 1529 -o /tmp/flamegraph.svg
Sampling process 100 times a second. Press Control-C to exit.
...
Stopped sampling because Control-C pressed
Wrote flamegraph data to '/tmp/flamegraph.svg'. Samples: 6864 Errors: 0 |
...
Output:
Attachments | ||
---|---|---|
|
Full View:
...
Zoomed in, we can see that its spending a lot of time in the delete item function of Squirro, which in turn then its waiting a lot on Elastic, due to many roundtrips:
...
Tips and Caveats
...
You need root privileges 😞 It can be worthwhile to ask the powers in charge for sudo powers for py-spy, at least temporarily.
...
If you record with py-spy and do nothing on the server, the flamegraph will mostly show flup/kombu doing a whole lot of nothing. So start recording during a time of action.
...
For record, consider using the --function switch, it will show you the function instead of the line number. This can be easier to read if you’re not too familar with the codebase.
...
Don’t forget the --subprocesses flag if you are have multi-process services, e.g. topicd, datasourced, machinelearningd, data loader with parallel flag etc.
...
This page can now be found at Python Performance Profiling Using PySpy on the Squirro Docs site.