pytrace - a fast python tracer
pytrace records function calls, arguments and return values.
Traces aid debugging and profiling, and can obviate logging.
pytrace has been tested on Python 2.7 and Python 3.2 (it should support 2.6 and up).
pytrace has been tested on OS X and several Linux distributions.
Follow @alonhorev on twitter for updates.
Install
pytrace depends on sqlite and a C implementation of protocol buffers.
on debian/ubuntu: sudo apt-get install libsqlite3-dev libprotobuf-c0-dev
on fedora: sudo yum install libsqlite3x-devel sqlite-devel python-devel protobuf-c-devel
on mac (sqlite is included): brew install protobuf-c
or port install protobuf-c
install pytrace:
pip install pytrace
Usage
Invoke pytrace with your script:
$ pytrace foo.py --bar
Invoke the reader from the same directory by executing pytrace with no arguments:
$ pytrace
The reader can be invoked while the script is running, providing 'online' debugging capabilities.
Reader
The collected data can be viewed in an interactive reader. The reader supports less-like key bindings.
The reader can search for regular expressions.
The reader can filter traces using Python expressions. The following fields can be used in filters:
- time - int
- tid - int
- module - string
- func - string
- arg (argument name) - string. The special arguments 'return value' and 'exception' filter on function return values and exceptions.
- value (argument value) - string
- type (argument type) - string
Field types:
- int fields - support comparison operators (>, <, >=, <=, ==). e.g.:
time > '2012/08/15 01:23:45'
- string fields - support string comparison (==, !=). String comparison supports SQL 'like' syntax. For example:
module == 'proj%'
filters modules starting with 'proj'.
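To illustrate how a 'like' pattern such as 'proj%' behaves, here is a small sketch that evaluates the pattern with SQLite's own LIKE operator. It only demonstrates the pattern semantics; the module names and the `like` helper are made up for the example, and it is an assumption that the reader's matching follows SQLite's LIKE rules exactly.

```python
import sqlite3

# In-memory database, used only to evaluate LIKE expressions.
conn = sqlite3.connect(":memory:")

# Hypothetical module names, for illustration only.
modules = ["project.core", "proj", "myproj", "os.path"]


def like(value, pattern):
    """Return True if value matches the SQL LIKE pattern ('%' matches any run of characters)."""
    row = conn.execute("SELECT ? LIKE ?", (value, pattern)).fetchone()
    return bool(row[0])


matching = [m for m in modules if like(m, "proj%")]
print(matching)
```

Note that '%' also matches an empty run of characters, so 'proj' itself matches 'proj%', while 'myproj' does not because the pattern is anchored at the start.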
The reader corrects queries that don't match anything.
The Database
The database is saved in the current working directory and is named traces.sqlite.
To avoid running out of disk space, the database is truncated to a fixed number of traces (currently hard coded to 10000).
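A rough sketch of what truncating a SQLite table to a fixed number of rows can look like. The `traces` table, its columns, and the choice to keep the newest rows are assumptions made for this example; they are not pytrace's actual schema or policy.

```python
import sqlite3

MAX_TRACES = 10000  # the limit mentioned above


def truncate(conn, max_traces=MAX_TRACES):
    """Delete all but the newest max_traces rows of a hypothetical 'traces' table."""
    conn.execute(
        "DELETE FROM traces WHERE id NOT IN "
        "(SELECT id FROM traces ORDER BY id DESC LIMIT ?)",
        (max_traces,),
    )
    conn.commit()


# Small demonstration with 25 rows and a limit of 10.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE traces (id INTEGER PRIMARY KEY, func TEXT)")
conn.executemany(
    "INSERT INTO traces (func) VALUES (?)",
    [("f%d" % i,) for i in range(25)],
)
truncate(conn, max_traces=10)
remaining = [row[0] for row in conn.execute("SELECT id FROM traces ORDER BY id")]
print(len(remaining))
```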
Reducing the overhead
Hot functions can be skipped using a decorator:
from pytrace import notrace
@notrace
def hot():
    pass
Trace specific packages:
$ export TRACE_MODULES=/Users/alon/project
You can specify a colon-separated (:) list of folders as well.
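As a sketch of how a colon-separated folder list could be interpreted, the helpers below split the value and check whether a source file lies under one of the folders. `parse_trace_modules` and `should_trace` are hypothetical names, `/opt/lib` is a made-up second folder, and the rule "trace everything when the variable is unset" is an assumption; pytrace's own handling of TRACE_MODULES may differ.

```python
import os


def parse_trace_modules(value):
    """Split a colon-separated TRACE_MODULES value into absolute folder paths."""
    return [os.path.abspath(p) for p in value.split(":") if p]


def should_trace(filename, folders):
    """Return True if filename lies under one of the folders (or no folders are set)."""
    if not folders:
        return True  # assumption: no restriction means trace everything
    path = os.path.abspath(filename)
    return any(path == f or path.startswith(f + os.sep) for f in folders)


# In real use the value would come from os.environ.get("TRACE_MODULES", "").
folders = parse_trace_modules("/Users/alon/project:/opt/lib")
print(should_trace("/Users/alon/project/app/main.py", folders))
```

Comparing against `f + os.sep` rather than the bare prefix keeps a sibling folder such as `/Users/alon/projectx` from matching by accident.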
Architecture
pytrace can be broken down to three parts:
- a trace generator - using Python's built-in tracing mechanism (sys.settrace), function calls are translated to binary trace records that are saved in memory as protocol buffers (http://code.google.com/p/protobuf-c/).
- a trace dumper - runs in a separate thread/process, collects traces from memory and dumps them to a sqlite database.
- a trace reader - reads traces from the database.
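The generator part builds on sys.settrace. The following pure-Python sketch shows the basic idea of capturing calls, arguments and return values with that hook. It is a simplified illustration only: it keeps plain tuples in a list rather than protocol buffer records, and the `tracer` function and `records` list are names invented for the example.

```python
import sys

records = []  # in-memory stand-in for the binary trace records


def tracer(frame, event, arg):
    """Trace function: record call arguments and return values."""
    code = frame.f_code
    if event == "call":
        # Positional argument names are the first co_argcount entries of co_varnames.
        names = code.co_varnames[:code.co_argcount]
        args = {name: repr(frame.f_locals[name]) for name in names}
        records.append(("call", code.co_name, args))
    elif event == "return":
        records.append(("return", code.co_name, repr(arg)))
    # Returning the tracer keeps tracing active inside this frame.
    return tracer


def add(a, b):
    return a + b


sys.settrace(tracer)
try:
    add(1, 2)
finally:
    sys.settrace(None)  # always switch tracing off again

print(records)
```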
The separation of trace generation and dumping has several advantages:
- speed - the overhead on the traced code is minimal, since serialization to protocol buffers is faster than inserting into a database.
- prioritization - the trace records are saved to a lock-less circular buffer. The generator has priority over the dumper, meaning that if the dumper doesn't keep up it will lose traces (the generator never blocks).
- Python utilizes two cores! - the dumper thread does not touch Python objects, only trace records that are saved as binary strings. Therefore, it doesn't acquire the global interpreter lock.
- trace aggregation from several processes - by using shared memory, the trace data can be shared between a dumper process and one or more generator processes.
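The prioritization point can be sketched with a small circular buffer in which the writer always succeeds and a slow reader simply misses overwritten records. This is a single-threaded Python illustration of the overwrite policy only, not the actual lock-less shared-memory implementation; the `RingBuffer` class and its fields are invented for the example.

```python
class RingBuffer:
    """Fixed-size buffer: writes never block, old unread records get overwritten."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.capacity = capacity
        self.written = 0  # total records ever written (generator side)
        self.read = 0     # total records consumed or skipped (dumper side)
        self.lost = 0     # records overwritten before the dumper saw them

    def write(self, record):
        """Generator side: always succeeds, overwriting the oldest slot if full."""
        self.slots[self.written % self.capacity] = record
        self.written += 1

    def drain(self):
        """Dumper side: return unread records, counting any that were overwritten."""
        oldest_available = max(self.read, self.written - self.capacity)
        self.lost += oldest_available - self.read
        out = [self.slots[i % self.capacity]
               for i in range(oldest_available, self.written)]
        self.read = self.written
        return out


buf = RingBuffer(4)
for i in range(6):
    buf.write(i)      # records 0 and 1 get overwritten by 4 and 5
first = buf.drain()
buf.write(6)
second = buf.drain()
print(first, second, buf.lost)
```

The writer never checks the reader's position, which is what lets the traced code keep running at full speed; the cost is that the dumper has to detect and tolerate gaps.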
TODO
- Extract to configuration:
- size of shared memory.
- db path.
- max traces.
- Add an option to ignore modules under site-packages.
- Explicit tracing (logging).
- Sort the arguments.
- Scroll horizontally.
- Ignore traces created by the library.
- Filter query autocompletion using tab/arrows.
- Print '...' on overflown strings.
- Document multiple process support. (one db, two processes)