Metadata | |
---|---|
cEP | 0021 |
Version | 1.0 |
Title | Profile Bears |
Authors | Vaibhav Kumar Rai mailto:[email protected] |
Status | Proposed |
Type | Process |
This cEP describes the detailed implementation of a profiling interface as a part of the GSoC'18 project.
Profiling a Python program is a form of dynamic analysis that measures the execution time of the program and the functions it is composed of. Its goal is to produce data about where program is spending time and what area might be worth optimizing. Thus we can provide a Profiling interface from within coala and make it available to bear writers so that they can create better performant bears.
This cEP proposes to introduce the profiling interface so that bear writers will have the ability to Profile the Bear's code to optimize its performance.
Bear writers can invoke the profiler with the --profile-bears
argument.
Bear writer will have the ability to directly dump the raw
profile output (binary output) either on current working directory or to a
specified directory name, which can be further used for examination
of profiler stats with the help of different modules like
pstats
or snakeviz
.
--profile-bears
orprofile_bears
(using.coafile
) is the main argument to enable profiling. It accepts an additional parameterDirname
through which bear writers can specify where to store the dump.
The profiler will start by profiling the run()
method of bears
because it consumes most of the bears time. So, this is part where bear writer
will spend time, as rest of the part like loading the files, collecting the
settings, etc. are done by coala itself.
When user invokes the profiler with either CLI or by coafile
then the execute
method of Bear base class will call the
run_bear_from_section
method.
def run_bear_from_section(self, args, kwargs):
...
profile_bears = kwargs.get('profile_bears')
if profile_bears is None:
self.run(*args, **kwargs)
else:
self.profile(self.run, *args, **kwargs)
...
run_bear_from_section
method checks whether bear writer invoke the profiler
interface or not i.e., keyword arguments passed to the Bear contain the
profile_bears
key or not.
-
If not then
run_bear_from_section()
method will directly call therun()
method of Bear so that coala can continue its normal execution. -
Else
run_bear_from_section()
method will callprofile()
method which will check the whether the value ofprofile_bears
key value isTrue
orDirname
.-
If
True
thenprofile()
will create a profiler file for every bear of every section with a name{section_name}_{bear_name}.prof
and dump it in a current working directory.path = os.getcwd() if str(profile_bears) == 'True': filename = '{}_{}.prof'.format( kwargs['section_name'], kwargs['bear_name']) open(filename, 'wb') kwargs.pop('section_name') kwargs.pop('bear_name') cProfile.runctx('self.run(*args, **kwargs)', globals(), locals(), filename)
-
If
Dirname
then it check whether the specified directory exits or not.If exists then
profile()
method will create a profiler file for every bear of every section with a name{section_name}_{bear_name}.prof
and dumps it on the directoryDirname
.If not then it will create a directory with the literal name
Dirname
and dump then profiler files on that.subdir = str(profile_dump) filepath = os.path.join(path, subdir) if os.path.isdir(filepath): filename = '{}_{}.prof'.format( kwargs['section_name'], kwargs['bear_name']) open(os.path.join(filepath, filename), 'wb') kwargs.pop('section_name') kwargs.pop('bear_name') cProfile.runctx('self.run(*args, **kwargs)', globals(), locals(), os.path.join(filepath, filename)) else: os.mkdir(os.path.join(path, subdir)) filename = '{}_{}.prof'.format( kwargs['section_name'], kwargs['bear_name']) open(os.path.join(filepath, filename), 'wb') kwargs.pop('section_name') kwargs.pop('bear_name') cProfile.runctx('self.run(*args, **kwargs)', globals(), locals(), os.path.join(filepath, filename))
-
A profiler produces a binary output that will contain a set of statistics that
describes how often and for how long various parts of the program executed.
These statistics can be formatted into reports via the pstats
module or
by using third party tools like SnakeViz.
These formatted reports will help bear writers in creating a better
performant bear by analysing the profiler output.
For e.g.
With CLI arguments
coala --b PEPE8Bear --files <file.py> --profile-bears
It will create a profile file with name cli_PEP8Bear
.
and after that Bear writers can use different modules like
pstats
or snakeviz
, which reads profile results from a
file and formats them in various ways i.e.
-
Using pstats :
pstats
has a simple line-oriented interface (implemented using cmd). Apstats
module is a part of core Python. So, anyone can with a direct importpstats
and start working with it.An example of how to generate a formatted output using
pstats
module and manipulate the profiler data using different methods ofpsats
, likestrip_dirs()
,sort_stats('cumtime')
. Herestrip_dirs()
remove the extraneous path from all the module names andsort_stats('cumtime')
sort the output by cumulative time spent in this and all subfunctions (from invocation till exit).>>> import pstats >>> ps = pstats.Stats('cli_PEP8Bear.prof') >>> ps.strip_dirs().sort_stats('cumtime').print_stats(5) Thu May 31 00:16:03 2018 python_PEP8Bear.prof 7 function calls in 0.000 seconds Ordered by: cumulative time List reduced from 6 to 5 due to restriction <5> ncalls tottime percall cumtime percall filename:lineno(function) 1 0.000 0.000 0.000 0.000 {built-in method builtins.exec} 1 0.000 0.000 0.000 0.000 <string>:1(<module>) 1 0.000 0.000 0.000 0.000 __init__.py:111(wrapping_function) 2 0.000 0.000 0.000 0.000 {method 'get' of 'dict' objects} 1 0.000 0.000 0.000 0.000 PEP8Bear.py:22(run) <pstats.Stats object at 0x7f2e69028a90>
For more information about
pstats
, check this link. -
Using SnakeViz:
SnakeViz is a browser based graphical viewer for the output of Python’s
cProfile
module. SnakeViz creates sunburst and icicle diagrams using profiler data as input. Using sunburst diagram Snakeviz represent the amount of time spent inside a function as a angular width of the arc.Sometimes functions don’t just spend time calling other functions, they also have their own internal time. SnakeViz shows this by putting a special child on each node that represents internal time.
SnakeViz also provide a Funtion Info functionality in which it display a list of information about function like function name, Cumulative Time, name of the file in which the function is defined, line number on which the function is defined, Directory of a flle.
For more information about SnakeViz, check this link.
SnakeViz is available on PyPI. Install with pip i.e.
$ pip install snakeviz
Start SnakeViz from the command line i.e.
$ snakeviz cli_PEP8Bear.prof