Intro

...

Home

https://github.com/benfred/py-spy

Install it

Code Block
pip install py-spy

Note: On a Squirro server it's best to create a dedicated virtualenv for it:

Code Block
/opt/rh/rh-python36/root/bin/virtualenv ~/pyspy
source ~/pyspy/bin/activate
pip install py-spy
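
A quick sanity check that the binary works (assuming the virtualenv is still activated):

Code Block
py-spy --version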

Use it

Before you start

We need the PID of the target process:

Example:

Code Block
# ps -elf | grep <service>d
ps -elf | grep topicd
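
Alternatively, pgrep can print the matching PIDs directly (-a also shows the command line, which helps when a service has several processes):

Code Block
# print PID plus command line of all processes matching "topicd"
pgrep -af topicd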

Common flags

Code Block
-p, --pid <pid>              PID of a running python program to spy on
--full-filenames             Show full Python filenames
-r, --rate <rate>            The number of samples to collect per second [default: 100]
-s, --subprocesses           Profile subprocesses of the original process
-n, --native                 Collect stack traces from native extensions written in Cython, C or C++
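
The flags can be combined. For example, a sketch that samples the topicd process from above together with its worker subprocesses at double the default rate:

Code Block
py-spy top --pid 29121 --subprocesses --rate 200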

Top

Great for these situations:

  • What is this service doing?

  • Why is a service/process slow?

Code Block
py-spy top -p 29121

Output:

Code Block
Total Samples 991
GIL: 2.20%, Active: 19.78%, Threads: 16

  %Own   %Total  OwnTime  TotalTime  Function (filename:line)
  8.79%   8.79%   0.330s    0.330s   readinto (socket.py:586)
  4.40%   4.40%   0.150s    0.150s   run (topic/background.py:534)
  0.00%   0.00%   0.120s    0.120s   run (flup/server/threadedserver.py:76)
  0.00%   8.79%   0.000s    0.330s   _get_and_deliver (kombu/transport/virtual/base.py:406)
  0.00%   8.79%   0.000s    0.330s   _read_status (http/client.py:268)
  0.00%   8.79%   0.000s    0.330s   _delete_source_id (topic/background.py:116)
  0.00%   0.00%   0.000s    0.120s   run_flup_daemon (common/daemonlauncher.py:89)
  0.00%   4.40%   0.000s    0.080s   POST (topic/items.py:849)
  0.00%   8.79%   0.000s    0.330s   get (kombu/utils/scheduling.py:56)
  0.00%   4.40%   0.000s    0.080s   wrapped_f (common/__init__.py:142)
  0.00%   8.79%   0.000s    0.330s   delete (elasticsearch/client/__init__.py:596)
  0.00%   8.79%   0.000s    0.330s   consume (kombu/mixins.py:197)

Here we can see that the process is busy deleting a source (_delete_source_id), with most of its time spent reading from a socket while waiting on Elasticsearch.

Dump a stacktrace

Great for these situations:

  • A process is stuck

Code Block
py-spy dump -p 1529

Output:

Code Block
Thread 1614 (active): "local.3"
    readinto (socket.py:586)
    _read_status (http/client.py:268)
    begin (http/client.py:307)
    getresponse (http/client.py:1346)
    _make_request (urllib3/connectionpool.py:421)
    urlopen (urllib3/connectionpool.py:677)
    perform_request (elasticsearch/connection/http_urllib3.py:229)
    perform_request (elasticsearch/transport.py:362)
    perform_request (common/tracing/elasticsearch_tracing.py:84)
    delete (elasticsearch/client/__init__.py:596)
    _wrapped (elasticsearch/client/utils.py:92)
    retry (index/retry.py:57)
    delete_document (index/writer.py:183)
    delete_document (index/item/writer.py:152)
    update_item (topic/background.py:88)
    _delete_source_id (topic/background.py:116)
    _on_task_delete_source_items (topic/background.py:70)
    _on_task (common/queue.py:1071)
    on_task (common/queue.py:992)
    _process_task (common/queue.py:528)
    <listcomp> (kombu/messaging.py:590)
    receive (kombu/messaging.py:590)
    _receive_callback (kombu/messaging.py:624)
    _callback (kombu/transport/virtual/base.py:633)
    _deliver (kombu/transport/virtual/base.py:983)
    _get_and_deliver (kombu/transport/virtual/base.py:406)
    get (kombu/utils/scheduling.py:56)
    _poll (kombu/transport/virtual/base.py:402)
    drain_events (kombu/transport/virtual/base.py:745)
    _drain_channel (kombu/transport/virtual/base.py:1001)
    get (kombu/utils/scheduling.py:56)
    drain_events (kombu/transport/virtual/base.py:963)
    drain_events (kombu/connection.py:323)
    consume (kombu/mixins.py:197)
    run (kombu/mixins.py:175)
    run (common/background.py:174)
    _bootstrap_inner (threading.py:916)
    _bootstrap (threading.py:884)

The stack is printed in reverse order (innermost frame first): the thread is blocked on a network socket, waiting for the response to an Elasticsearch delete command.
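
If the stack alone is not enough, dump can also print the local variables of each frame. A sketch, assuming a py-spy version that has the --locals flag:

Code Block
# repeat -l / --locals to increase verbosity
py-spy dump -p 1529 --locals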

Record

Great for these situations:

  • Investigate slow performance

  • Understanding the bottlenecks during a load test

  • Performance regression testing: New code vs old code.

  • In general, pinpointing where time is spent. Note: time spent is not the same as CPU load; a thread can simply be waiting on I/O.

Code Block
py-spy record -p 1529 -o /tmp/flamegraph.svg
Sampling process 100 times a second. Press Control-C to exit.
...
Stopped sampling because Control-C pressed
Wrote flamegraph data to '/tmp/flamegraph.svg'. Samples: 6864 Errors: 0
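
For unattended runs you don't have to wait for Ctrl-C: record can stop by itself after a fixed time. A sketch, assuming the --duration flag of recent py-spy versions:

Code Block
# sample for 60 seconds, then write the flamegraph and exit
py-spy record -p 1529 --duration 60 -o /tmp/flamegraph.svg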

...

Output:


Full View:

...

Zoomed in, we can see that a lot of time is spent in Squirro's delete-item function, which in turn spends most of its time waiting on Elasticsearch, due to the many round trips:

...

Tips and Caveats

...

You need root privileges 😞 It can be worthwhile to ask the powers that be for sudo rights for py-spy, at least temporarily.
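
A minimal sketch of a scoped sudoers entry (the group name and install path are assumptions, adapt them to your setup):

Code Block
# hypothetical /etc/sudoers.d/py-spy, edit with: visudo -f /etc/sudoers.d/py-spy
%squirro ALL=(root) NOPASSWD: /home/squirro/pyspy/bin/py-spy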

...

If you record with py-spy while nothing is happening on the server, the flamegraph will mostly show flup/kombu doing a whole lot of nothing. So start recording while the service is actually under load.

...

For record, consider using the --function switch: it aggregates samples by function instead of by line number. This can be easier to read if you're not too familiar with the codebase.
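
For example (the output filename is just illustrative):

Code Block
# aggregate samples per function instead of per line
py-spy record -p 1529 --function -o /tmp/flamegraph_by_function.svg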

...

Don’t forget the --subprocesses flag if you have multi-process services, e.g. topicd, datasourced, machinelearningd, or the data loader with the parallel flag.
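
For example (again with the PID from above):

Code Block
# also sample any worker processes spawned by the main daemon
py-spy record -p 1529 --subprocesses -o /tmp/flamegraph.svg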

...

This page can now be found at Python Performance Profiling Using PySpy on the Squirro Docs site.