Algorithm: Optimization Tools
When optimizing code in Python, several modules are available for measuring the performance of specific segments of logic or of an entire program.
The timeit module measures the execution time of small blocks of code, such as individual statements or function calls.
Consider the following usage patterns:
import timeit
# Constructor (note: Timer itself takes no number argument)
test = timeit.Timer(stmt='pass', setup='pass', timer=timeit.default_timer)
test.repeat(repeat=3, number=1000000)
test.timeit(number=1000000)
# Convenience functions: avoid constructing a Timer explicitly
timeit.timeit(stmt='pass', setup='pass', timer=timeit.default_timer, number=1000000)
timeit.repeat(stmt='pass', setup='pass', timer=timeit.default_timer, repeat=3, number=1000000)
These calls accept the following arguments:
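- stmt, the statement (or function call) to be measured.
- setup, the code needed to initialize variables before timing (often an import).
- timer, the clock function used for the measurement (e.g. time.time, time.clock, or timeit.default_timer, which selects a suitable clock for the platform; time.clock was removed in Python 3.8).
- repeat, how many times the whole measurement is repeated; each repetition executes stmt number times.
- number, how many times stmt is executed per measurement; the returned time is the total for all number executions, not an average.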
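Because the result is a total over number executions, dividing it by number gives the average time per execution. The sketch below (Python 3 syntax) illustrates this with a sorting statement chosen only for illustration; the statement, data, and variable names are hypothetical.

```python
import timeit

# A Timer is built once; its setup code runs once per measurement
# (before the stmt loop), not before every execution of stmt.
timer = timeit.Timer(stmt="sorted(data)", setup="data = list(range(1000, 0, -1))")

number = 1000
total = timer.timeit(number=number)  # total seconds for `number` executions
per_call = total / number            # average seconds per single execution

# repeat() performs the whole measurement several times and returns one
# total per measurement, so each entry is comparable to `total` above.
runs = timer.repeat(repeat=3, number=number)

print("total: %.4f s, per call: %.2e s" % (total, per_call))
print("runs:", runs)
```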
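Note: both convenience functions (timeit.timeit() and timeit.repeat()) provide default values for every argument, so only the arguments that differ need to be passed.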
Note: stmt and setup may contain multiple statements separated by ; or newlines, as long as they don’t contain multi-line string literals.
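A minimal sketch of both separators, written in Python 3 syntax; the statements themselves are arbitrary and chosen only for illustration. Note that passing a triple-quoted string as the stmt or setup value is fine: the restriction concerns string literals inside the timed code.

```python
import timeit

# Several statements in one string, separated by semicolons.
t_semicolons = timeit.timeit(stmt="a = 1; b = 2; c = a + b", number=100000)

# The same idea with newlines: both setup and stmt span several lines,
# and stmt may even contain an indented block such as a for loop.
setup_code = """
import math
values = [1.0, 4.0, 9.0]
"""
stmt_code = """
roots = []
for v in values:
    roots.append(math.sqrt(v))
"""
t_newlines = timeit.timeit(stmt=stmt_code, setup=setup_code, number=100000)

print(t_semicolons, t_newlines)
```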
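import timeit
# Confirm both expressions build the same string
print('y' * 3)
print('y' + 'y' + 'y')
# Time 1,000,000 executions of each (the default number)
print(timeit.timeit("x = 'y' * 3"))
print(timeit.timeit("x = 'y' + 'y' + 'y'"))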
Output for the above script (timings are in seconds for 1,000,000 executions and will vary by machine):
yyy
yyy
0.0455188751221
0.0446588993073
The repeat() function is a convenience that calls timeit() multiple times and returns a list of results.
import timeit
print(timeit.repeat("x = 'y' * 3", repeat=3))
print(timeit.repeat("x = 'y' + 'y' + 'y'", repeat=3))
Output for the above script:
[0.04626893997192383, 0.03693795204162598, 0.03695511817932129]
[0.05083298683166504, 0.036875009536743164, 0.03685498237609863]
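The timeit documentation advises against averaging these lists: higher values usually come from other processes interfering, so min() is the most meaningful figure. A minimal Python 3 sketch of reporting the best result per loop (the repeat and number values are arbitrary):

```python
import timeit

number = 100000
results = timeit.repeat("x = 'y' * 3", repeat=5, number=number)

# Each entry is the total time for `number` executions of stmt.
# Higher entries usually reflect interference from other processes,
# so the minimum is normally the most meaningful figure to report.
best = min(results)
per_loop_usec = best / number * 1e6

print("best of %d: %.3g usec per loop" % (len(results), per_loop_usec))
```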
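To time a function defined in the same script, import it into the timing namespace through setup:
import timeit

def test():
    return 'y' * 3

# setup imports test from the __main__ namespace so stmt can call it
print(timeit.repeat('test()', setup='from __main__ import test'))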
Output for the above script:
[0.13706612586975098, 0.13663387298583984, 0.13658595085144043]
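Two alternatives to the setup import, sketched in Python 3 syntax: stmt may be a zero-argument callable, and since Python 3.5 timeit() also accepts a globals argument. Passing a callable adds a little extra function-call overhead to each measured execution.

```python
import timeit

def test():
    return 'y' * 3

# stmt may also be a callable taking no arguments, so no setup import is needed.
t_callable = timeit.timeit(test, number=100000)

# Python 3.5+ only: globals= makes stmt run in the given namespace,
# here the current module's globals, where test() is defined.
t_globals = timeit.timeit("test()", globals=globals(), number=100000)

print(t_callable, t_globals)
```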
The cProfile module is a C extension that provides more detailed results than timeit. It reports performance at a finer granularity: for each function (and sub-function) called while running the desired statement or program, it records the number of calls and the time spent.
Consider the following usage patterns:
# Import and use cProfile (C extension)
import cProfile
cProfile.run(command, filename=None, sort=-1)
cProfile.runctx(command, globals, locals, filename=None)
# Import and use profile (pure Python)
import profile
profile.run(command, filename=None, sort=-1)
profile.runctx(command, globals, locals, filename=None)
These calls accept the following arguments:
- command, the statement or program to profile, given as a string.
- filename, an optional file name to which the profile data is saved (in binary format) instead of being printed.
- sort, the order of the printed results. The default, -1, orders them by standard name. Other orderings can be requested with a string key such as 'cumulative' or 'time', or with the legacy integers 0 ('calls'), 1 ('time', i.e. the tottime column), and 2 ('cumulative').
- globals, a dictionary representing the global namespace, containing the variables or class properties used within command.
- locals, a dictionary representing the local namespace, containing the variables used within command.
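A minimal Python 3 sketch showing that a string sort key and its legacy integer select the same ordering. The work() function is a hypothetical workload, and the report is captured with contextlib.redirect_stdout only so it can be inspected.

```python
import cProfile
import contextlib
import io

def work():
    # Hypothetical workload used only for illustration.
    return sorted(str(i) for i in range(20000))

def profile_report(sort):
    """Profile work() and return the printed report as a string."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        cProfile.runctx("work()", globals(), locals(), sort=sort)
    return buffer.getvalue()

by_name = profile_report("cumulative")  # string sort key
by_int = profile_report(2)              # legacy integer for the same key
print(by_name)
```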
Note: if the filename argument is specified, the pstats module can be used to load the data, which allows for further statistical analysis of the performance times.
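A minimal Python 3 sketch of that workflow: profile data is written to a temporary file and then loaded with pstats.Stats. The work() function and the temporary file are hypothetical choices for illustration.

```python
import cProfile
import os
import pstats
import tempfile

def work():
    # Hypothetical workload used only for illustration.
    return sorted(str(i) for i in range(20000))

# Save the raw profile data to a temporary file instead of printing it.
fd, path = tempfile.mkstemp(suffix=".prof")
os.close(fd)
cProfile.runctx("work()", globals(), locals(), filename=path)

# Load the saved data with pstats, strip directory names for readability,
# and print the 5 most expensive entries by cumulative time.
stats = pstats.Stats(path)
stats.strip_dirs().sort_stats("cumulative").print_stats(5)

os.remove(path)
```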
Note: if the cProfile module is not available on a desired system, use profile (a pure Python module) instead.
Note: the profile and cProfile modules are interchangeable, since both export the same interface. cProfile, however, adds much less overhead to the program being profiled.
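Because the two modules share an interface, a common pattern is to try cProfile first and fall back to profile. A minimal Python 3 sketch, where work() is a hypothetical workload:

```python
# Prefer the low-overhead C implementation, but fall back to the
# pure-Python profile module where cProfile is unavailable.
try:
    import cProfile as profiler
except ImportError:
    import profile as profiler

def work():
    # Hypothetical workload used only for illustration.
    return sum(i * i for i in range(10000))

# Both modules expose the same runctx() interface, so the call is identical.
profiler.runctx("work()", globals(), locals())
```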
This example profiles a single statement with cProfile.run(), which executes the provided command string with exec.
$ python
>>> import hashlib
>>> import cProfile
>>> cProfile.run("hashlib.md5(b'abcdefghijkl').digest()")
4 function calls in 0.000 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {_hashlib.openssl_md5}
1 0.000 0.000 0.000 0.000 {method 'digest' of '_hashlib.HASH' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
This example uses cProfile.runctx(), which also executes command with exec, but evaluates it in the supplied globals and locals dictionaries. These two dicts provide the context (e.g. variables, or objects such as self) needed to execute command.
>>> class Foo(object):
...     def method_1(self):
...         cProfile.runctx('self.method_2()', globals(), locals())
...
...     def method_2(self):
...         print('hello')
...
>>> test = Foo()
>>> test.method_1()
hello
3 function calls in 0.000 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 <stdin>:5(method_2)
1 0.000 0.000 0.000 0.000 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
Note: globals() returns a dictionary that represents the current global namespace.
Note: locals() returns a dictionary that represents the current local namespace, consisting of the names and values of the variables within that namespace.
By contrast, this example uses cProfile.run(), which executes command in the namespace of the __main__ module. Without the proper context, self is not defined there, and a NameError is raised.
>>> class Foo(object):
...     def method_1(self):
...         cProfile.run('self.method_2()')
...
...     def method_2(self):
...         print('hello')
...
>>> test = Foo()
>>> test.method_1()
2 function calls in 0.000 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 3, in method_1
File "/usr/lib/python2.7/cProfile.py", line 29, in run
prof = prof.run(statement)
File "/usr/lib/python2.7/cProfile.py", line 135, in run
return self.runctx(cmd, dict, dict)
File "/usr/lib/python2.7/cProfile.py", line 140, in runctx
exec cmd in globals, locals
File "<string>", line 1, in <module>
NameError: name 'self' is not defined
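Another way to avoid the missing context is Profile.runcall(), which profiles a direct call to a callable instead of executing a command string. A minimal Python 3 sketch adapting the Foo class above; here method_2 returns its value instead of printing it, a change made only for illustration.

```python
import cProfile
import pstats

class Foo(object):
    def method_1(self):
        # runcall() profiles a direct call to the bound method, so no command
        # string, and therefore no globals/locals context, is needed for self.
        pr = cProfile.Profile()
        result = pr.runcall(self.method_2)
        pstats.Stats(pr).sort_stats("stdname").print_stats()
        return result

    def method_2(self):
        return "hello"

print(Foo().method_1())
```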
This example creates a Profile object directly, which allows an arbitrary block of code to be profiled between enable() and disable(), and its results to be sorted by cumulative time.
import cProfile
# ...
pr = cProfile.Profile()
pr.enable()
# [block of code] to profile; a placeholder workload is used here
total = sum(i * i for i in range(100000))
pr.disable()
pr.print_stats(sort='cumulative')
Note: the sort argument accepts other keys as well, such as 'calls', 'time', or 'filename', to order the profiled results differently.
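When more control over the report is needed, the Profile object can be handed to pstats.Stats, which can write to any stream and limit the number of rows. A minimal Python 3 sketch, where slow_part() is a hypothetical workload:

```python
import cProfile
import io
import pstats

def slow_part():
    # Hypothetical workload used only for illustration.
    return sum(i ** 2 for i in range(50000))

pr = cProfile.Profile()
pr.enable()
slow_part()
pr.disable()

# Write the report to a string buffer instead of stdout, order it by
# internal time ("tottime"), and keep only the top 3 rows.
buffer = io.StringIO()
pstats.Stats(pr, stream=buffer).sort_stats("tottime").print_stats(3)
report = buffer.getvalue()
print(report)
```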
The trace module is similar to the cProfile module, except that it records how many times each individual statement (line) executes rather than how long functions take. It can produce coverage reports, list the functions called during execution, and track which functions call each other.
Consider the following usage patterns:
# Import and use Trace
import trace
tracer = trace.Trace(count=1, trace=1, countfuncs=0, countcallers=0, ignoremods=(), ignoredirs=(), infile=None, outfile=None, timing=False)
tracer.run(cmd)
tracer.runctx(cmd, globals=None, locals=None)
tracer.runfunc(func, *args, **kwds)
These calls accept the following arguments:
- count, enables counting of how many times each line executes.
- trace, enables line execution tracing (each line is printed as it runs).
- countfuncs, enables listing of the functions called during execution.
- countcallers, enables call relationship tracking.
- ignoremods, a list of modules or packages to ignore.
- ignoredirs, a list of directories whose modules or packages should be ignored.
- infile, the name of a file from which to read stored count information.
- outfile, the name of a file in which to write updated count information.
- timing, displays timestamps relative to when tracing was started.
- cmd, the command or program to trace.
- globals, a dictionary representing the global namespace, containing the variables or class properties used within cmd.
- locals, a dictionary representing the local namespace, containing the variables used within cmd.
- func, the function to be called with the given *args and **kwds.
Note: tracer.run() and tracer.runctx() are similar to the equivalent cProfile functions.
Note: tracer.runfunc() calls func with the supplied arguments, rather than executing a command string, and returns the function's result.
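A minimal Python 3 sketch combining runfunc() with countcallers, which records which function called which. The adder() and loader() functions are hypothetical, and the caller/callee tuples are read from the results object that Trace.results() returns.

```python
import trace

def adder(x, y):
    return x + y

def loader():
    total = 0
    for index in range(3):
        total += adder(index, index)
    return total

# countcallers=1 records (caller, callee) pairs; count=0 and trace=0 keep
# the run quiet. runfunc() calls loader() directly and returns its result.
tracer = trace.Trace(count=0, trace=0, countcallers=1)
result = tracer.runfunc(loader)

# Each key has the form ((file, module, caller), (file, module, callee)).
pairs = sorted((caller[2], callee[2]) for caller, callee in tracer.results().callers)
print(result)  # 0 + 2 + 4
print(pairs)
```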
import trace

def adder(x, y):
    print(x+y)
    return x+y

def loader():
    for index in range(2):
        adder(index, index)

tracer = trace.Trace(count=False, trace=True)
tracer.run('loader()')
Output for the above script (saved as test_script.py):
--- modulename: test_script, funcname: <module>
<string>(1): --- modulename: test_script, funcname: loader
test_script.py(8): for index in range(2):
test_script.py(9): adder(index, index)
--- modulename: test_script, funcname: adder
test_script.py(4): print(x+y)
0
test_script.py(5): return x+y
test_script.py(8): for index in range(2):
test_script.py(9): adder(index, index)
--- modulename: test_script, funcname: adder
test_script.py(4): print(x+y)
2
test_script.py(5): return x+y
test_script.py(8): for index in range(2):
--- modulename: trace, funcname: _unsettrace
trace.py(80): sys.settrace(None)
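The example above prints each line as it runs. To collect per-line execution counts instead, count can be enabled with trace disabled and the counts read back from Trace.results(). A minimal Python 3 sketch with hypothetical functions; it assumes the code runs from a script file, since counting skips code whose module has no __file__ (as at the interactive prompt).

```python
import trace

def adder(x, y):
    return x + y

def loader():
    for index in range(2):
        adder(index, index)

# count=1 records how often each line runs; trace=0 skips the live listing.
# Counting is only applied to code whose module defines __file__, so run
# this as a script file rather than typing it into the interactive prompt.
tracer = trace.Trace(count=1, trace=0)
tracer.runfunc(loader)

# counts maps (filename, line number) -> number of times that line ran.
counts = tracer.results().counts
for (filename, lineno), hits in sorted(counts.items()):
    print("%s:%d ran %d time(s)" % (filename, lineno, hits))
```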