Code execution time: how to properly DRY several timeit executions? - python

Let's assume we want to use timeit for some performance testing with different inputs.
The obvious, non-DRY way would be something like this:
import timeit
# define functions to test
def some_list_operation_A(lst):
    ...

def some_list_operation_B(lst):
    ...
# create different lists (different input) to test the functions with
...
inputs = [lsta, lstb, lstc, ...]
# measure performance with first function
num = 10
t_lsta = timeit.timeit("some_list_operation_A(lsta)",
                       setup="from __main__ import some_list_operation_A, lsta",
                       number=num)
t_lstb = timeit.timeit("some_list_operation_A(lstb)",
                       setup="from __main__ import some_list_operation_A, lstb",
                       number=num)
t_lstc = timeit.timeit("some_list_operation_A(lstc)",
                       setup="from __main__ import some_list_operation_A, lstc",
                       number=num)
...
# print results & do some comparison stuff
for res in [t_lsta, t_lstb, t_lstc, ...]:
    print("{:.4f}s".format(res))
...
# do this ALL OVER AGAIN for 'some_list_operation_B'
...
# print new results
# do this ALL OVER AGAIN for 'some_list_operation_C'
# ...I guess you get the point
...
I think it should be very clear that this would be a really ugly way to measure the performance of different functions for different input.
What I currently do is something like this:
...
inputs = dict()
inputs["lsta"] = lsta
inputs["lstb"] = lstb
inputs["lstc"] = lstc
for f in ["some_list_operation_A", "some_list_operation_B", ...]:
    r = dict()  # results
    for key, val in inputs.iteritems():
        r[key] = timeit.timeit("{}(inputs['{}'])".format(f, key),
                               setup="from __main__ import {}, inputs".format(f),
                               number=num)
    # evaluate results 'r' for function 'f' here
    # (includes a comparison of the results -
    # that's why I save them in 'r')
...
# loop moves on to next function 'f'
Basically, I am using .format here to insert the function name and the right data, inputs[key]. After .format fills all the {} placeholders, the result is one correct stmt string for timeit.
While this is a lot shorter than the obvious non-DRY solution, it is also less readable and feels more like a hack, doesn't it?
What would be an appropriate DRY solution for such problems?
I also thought of simply timing the functions with decorators (that would be neat!) - but I did not succeed: the decorator should not only print the result. In my # evaluate results 'r' step I am not only printing the results, but also comparing them: computing relative differences and so on. Thus, I would need the decorator to return something in order to compare the results of each run...
Can someone point me in the right direction towards a clean, pythonic solution? I would like more beautiful/idiomatic code...and especially: shorter code!
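For reference, one DRY direction (a sketch, not from the original thread; the operation functions and inputs below are hypothetical stand-ins): timeit.timeit also accepts a plain zero-argument callable, which removes the need for stmt strings and __main__ imports entirely:

```python
import timeit

# Hypothetical stand-ins for the real list operations
def some_list_operation_A(lst):
    return sorted(lst)

def some_list_operation_B(lst):
    return list(reversed(lst))

inputs = {"lsta": [3, 1, 2], "lstb": [5, 4], "lstc": [9, 8, 7, 6]}
num = 10

results = {}
for func in (some_list_operation_A, some_list_operation_B):
    r = {}
    for key, lst in inputs.items():
        # timeit accepts a callable, so no string building is needed
        r[key] = timeit.timeit(lambda: func(lst), number=num)
    results[func.__name__] = r

for name, r in results.items():
    for key, t in r.items():
        print("{} on {}: {:.6f}s".format(name, key, t))
```

The nested results dict keeps all timings around for the relative-difference comparisons mentioned above.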

Related

Can a MagicMock object be iterated over?

What I would like to do is this...
x = MagicMock()
x.iter_values = [1, 2, 3]
for i in x:
    i.method()
I am trying to write a unit test for this function but I am unsure about how to go about mocking all of the methods called without calling some external resource...
def wiktionary_lookup(self):
    """Looks up the word in wiktionary with urllib2, only to be used for inputting data"""
    wiktionary_page = urllib2.urlopen(
        "http://%s.wiktionary.org/wiki/%s" % (self.language.wiktionary_prefix, self.name))
    wiktionary_page = fromstring(wiktionary_page.read())
    definitions = wiktionary_page.xpath("//h3/following-sibling::ol/li")
    print definitions.text_content()
    defs_list = []
    for i in definitions:
        print i
        i = i.text_content()
        i = i.split('\n')
        for j in i:
            # Takes out an annoying "[quotations]" in the end of the string, sometimes.
            j = re.sub(ur'\u2003\[quotations \u25bc\]', '', j)
            if len(j) > 0:
                defs_list.append(j)
    return defs_list
EDIT:
I may be misusing mocks, I am not sure. I am trying to unit-test this wiktionary_lookup method without calling external services...so I mock urlopen..I mock fromstring.xpath() but as far as I can see I need to also iterate through the return value of xpath() and call a method "text_contents()" so that is what I am trying to do here.
If I have totally misunderstood how to unittest this method then please tell me where I have gone wrong...
EDIT (adding current unittest code)
@patch("lang_api.models.urllib2.urlopen")
@patch("lang_api.models.fromstring")
def test_wiktionary_lookup_2(self, fromstring, urlopen):
    """Looking up a real word in wiktionary, should return a list"""
    fromstring().xpath.return_value = MagicMock(
        content=["test", "test"], return_value='test\ntest2')
    # A real word should give an output of definitions
    output = self.things.model['word'].wiktionary_lookup()
    self.assertEqual(len(output), 2)
What you actually want to do is not return a Mock with a return_value=[]. You actually want to return a list of Mock objects. Here is a snippet of your test code with the correct components and a small example to show how to test one of the iterations in your loop:
@patch('d.fromstring')
@patch('d.urlopen')
def test_wiktionary(self, urlopen_mock, fromstring_mock):
    urlopen_mock.return_value = Mock()
    urlopen_mock.return_value.read.return_value = "some_string_of_stuff"
    mocked_xpath_results = [Mock()]
    fromstring_mock.return_value.xpath.return_value = mocked_xpath_results
    mocked_xpath_results[0].text_content.return_value = "some string"
So, to dissect the above code to explain what was done to correct your problem:
The first thing that helps us test the code in the for loop is to create a list of mock objects:
mocked_xpath_results = [Mock()]
Then, as you can see from
fromstring_mock.return_value.xpath.return_value = mocked_xpath_results
We are setting the return_value of the xpath call to our list of mocks, mocked_xpath_results.
As an example of how to handle the items inside that list, I added a mock for the call made within the loop, shown with:
mocked_xpath_results[0].text_content.return_value = "some string"
In unittests (this might be a matter of opinion) I like to be explicit, so I'm accessing the list item explicitly and determining what should happen.
Hope this helps.
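The list-of-Mocks pattern can also be tried standalone; this sketch (not part of the original test, names are illustrative) mirrors the loop over xpath results:

```python
from unittest.mock import Mock

# Each Mock in the list gets its own configured text_content() return
# value, just like the xpath results iterated in wiktionary_lookup.
mocked_results = [Mock(), Mock()]
mocked_results[0].text_content.return_value = "first definition"
mocked_results[1].text_content.return_value = "second definition"

texts = [m.text_content() for m in mocked_results]
print(texts)
```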

Python - multiple functions - output of one to the next

I know this is super basic and I have been searching everywhere but I am still very confused by everything I'm seeing and am not sure the best way to do this and am having a hard time wrapping my head around it.
I have a script with multiple functions. I would like the first function to pass its output to the second, then the second to pass its output to the third, etc. Each performs its own step in an overall process on the starting dataset.
For example, very simplified with bad names but this is to just get the basic structure:
#!/usr/bin/python
# script called process.py
import sys
infile = sys.argv[1]
def function_one():
    do things
    return function_one_output

def function_two():
    take output from function_one, and do more things
    return function_two_output

def function_three():
    take output from function_two, do more things
    return/print function_three_output
I want this to run as one script and print the output/write to new file or whatever which I know how to do. Just am unclear on how to pass the intermediate outputs of each function to the next etc.
infile -> function_one -> (intermediate1) -> function_two -> (intermediate2) -> function_three -> final result/outfile
I know I need to use return, but I am unsure how to call this at the end to get my final output
Individually?
function_one(infile)
function_two()
function_three()
or within each other?
function_three(function_two(function_one(infile)))
or within the actual function?
def function_one():
    do things
    return function_one_output

def function_two():
    input_for_this_function = function_one()
    # etc etc etc
Thank you friends, I am over complicating this and need a very simple way to understand it.
You could define a data streaming helper function
from functools import reduce
def flow(seed, *funcs):
    return reduce(lambda arg, func: func(arg), funcs, seed)
flow(infile, function_one, function_two, function_three)
#for example
flow('HELLO', str.lower, str.capitalize, str.swapcase)
#returns 'hELLO'
edit
I would now suggest that a more "pythonic" way to implement the flow function above is:
def flow(seed, *funcs):
    for func in funcs:
        seed = func(seed)
    return seed
As ZdaR mentioned, you can run each function and store the result in a variable then pass it to the next function.
def function_one(file):
    do things on file
    return function_one_output

def function_two(myData):
    doThings on myData
    return function_two_output

def function_three(moreData):
    doMoreThings on moreData
    return/print function_three_output

def Main():
    firstData = function_one(infile)
    secondData = function_two(firstData)
    function_three(secondData)
This is assuming your function_three would write to a file or doesn't need to return anything. Another method, if these three functions will always run together, is to call them inside function_three. For example...
def function_three(file):
    firstStep = function_one(file)
    secondStep = function_two(firstStep)
    doThings on secondStep
    return/print to file
Then all you have to do is call function_three in your main and pass it the file.
For safety, readability and debugging ease, I would temporarily store the results of each function.
def function_one():
    do things
    return function_one_output

def function_two(function_one_output):
    take function_one_output and do more things
    return function_two_output

def function_three(function_two_output):
    take function_two_output and do more things
    return/print function_three_output
result_one = function_one()
result_two = function_two(result_one)
result_three = function_three(result_two)
The added benefit here is that you can then check that each function is correct. If the end result isn't what you expected, just print the results you're getting or perform some other check to verify them. (also if you're running on the interpreter they will stay in namespace after the script ends for you to interactively test them)
result_one = function_one()
print result_one
result_two = function_two(result_one)
print result_two
result_three = function_three(result_two)
print result_three
Note: I used multiple result variables, but as PM 2Ring notes in a comment you could just reuse the name result over and over. That'd be particularly helpful if the results would be large variables.
It's always better (for readability, testability and maintainability) to keep your functions as decoupled as possible, and to write them so the output only depends on the input whenever possible.
So in your case, the best way is to write each function independently, ie:
def function_one(arg):
    do_something()
    return function_one_result

def function_two(arg):
    do_something_else()
    return function_two_result

def function_three(arg):
    do_yet_something_else()
    return function_three_result
Once you're there, you can of course directly chain the calls:
result = function_three(function_two(function_one(arg)))
but you can also use intermediate variables and try/except blocks if needed for logging / debugging / error handling etc:
r1 = function_one(arg)
logger.debug("function_one returned %s", r1)
try:
    r2 = function_two(r1)
except SomePossibleException as e:
    logger.exception("function_two raised %s for %s", e, r1)
    # either return, re-raise, ask the user what to do etc
    return 42  # when in doubt, always return 42 !
else:
    r3 = function_three(r2)
    print "Yay ! result is %s" % r3
As an extra bonus, you can now reuse these three functions anywhere, each on its own and in any order.
NB : of course there ARE cases where it just makes sense to call a function from another function... Like, if you end up writing:
result = function_three(function_two(function_one(arg)))
everywhere in your code AND it's not an accidental repetition, it might be time to wrap the whole in a single function:
def call_them_all(arg):
    return function_three(function_two(function_one(arg)))
Note that in this case it might be better to decompose the calls, as you'll find out when you'll have to debug it...
I'd do it this way:
def function_one(x):
    # do things
    output = x ** 1
    return output

def function_two(x):
    output = x ** 2
    return output

def function_three(x):
    output = x ** 3
    return output
Note that I have modified the functions to accept a single argument, x, and added a basic operation to each.
This has the advantage that each function is independent of the others (loosely coupled) which allows them to be reused in other ways. In the example above, function_two() returns the square of its argument, and function_three() the cube of its argument. Each can be called independently from elsewhere in your code, without being entangled in some hardcoded call chain such as you would have if called one function from another.
You can still call them like this:
>>> x = function_one(3)
>>> x
3
>>> x = function_two(x)
>>> x
9
>>> x = function_three(x)
>>> x
729
which lends itself to error checking, as others have pointed out.
Or like this:
>>> function_three(function_two(function_one(2)))
64
if you are sure that it's safe to do so.
And if you ever wanted to calculate the square or cube of a number, you can call function_two() or function_three() directly (but, of course, you would name the functions appropriately).
With d6tflow you can easily chain together complex data flows and execute them. You can quickly load input and output data for each task. It makes your workflow very clear and intuitive.
import d6tflow

class Function_one(d6tflow.tasks.TaskCache):
    def run(self):
        function_one_output = do_things()
        self.save(function_one_output)  # instead of return

@d6tflow.requires(Function_one)
class Function_two(d6tflow.tasks.TaskCache):
    def run(self):
        output_from_function_one = self.inputLoad()  # load function input
        function_two_output = do_more_things()
        self.save(function_two_output)

@d6tflow.requires(Function_two)
class Function_three(d6tflow.tasks.TaskCache):
    def run(self):
        output_from_function_two = self.inputLoad()
        function_three_output = do_more_things()
        self.save(function_three_output)

d6tflow.run(Function_three())  # executes all functions
function_one_output = Function_one().outputLoad()  # get function output
function_three_output = Function_three().outputLoad()
It has many more useful features like parameter management, persistence, intelligent workflow management. See https://d6tflow.readthedocs.io/en/latest/
Doing it as function_three(function_two(function_one(infile))) would be best this way: you do not need global variables and each function is completely independent of the others.
Edited to add:
I would also say that function3 should not print anything, if you want to print the results returned use:
print function_three(function_two(function_one(infile)))
or something like:
output = function_three(function_two(function_one(infile)))
print output
Use parameters to pass the values:
def function1():
    foo = do_stuff()
    return function2(foo)

def function2(foo):
    bar = do_more_stuff(foo)
    return function3(bar)

def function3(bar):
    baz = do_even_more_stuff(bar)
    return baz

def main():
    thing = function1()
    print thing

Executing a function in reverse in Python

I have a function that looks something like this:
def f():
    call_some_function_A()
    call_some_function_B()
    [...]
    call_some_function_Z()
I'd like the function to be executed in reverse; that is, the execution must look like:
def f'():
    call_some_function_Z()
    [...]
    call_some_function_B()
    call_some_function_A()
(f will always be such that it is logically possible to reverse it; i.e. there are no variable declarations or anything like that).
How can I accomplish this?
I can't just write a function f' that calls the statements from f in reverse, because I don't want to have to update f' every time f is changed.
I also can't modify f.
(Please don't tell me that I shouldn't try to do that, or redesign my code, or anything like that- it's not a possibility.)
If your f() consists entirely of these function calls, you can remake it into a list:
functions = [
    call_some_function_A,
    call_some_function_B,
    # [...]
    call_some_function_Z,
]
And then use it to call the functions in (reversed) order.
def f():
    for func in functions:
        func()

def f_():
    for func in reversed(functions):
        func()
Please don't do this.
If your f() consists entirely of these function calls:
def f():
    call_some_function_A()
    call_some_function_B()
    # [...]
    call_some_function_Z()
...you can hack into it and get all the names it references:
names = f.__code__.co_names
# ('call_some_function_A', 'call_some_function_B', 'call_some_function_Z')
But you still need to get the corresponding functions.
If the functions are in some other module or anything similar, just do this:
functions = [getattr(some_module, name) for name in names]
If the functions are defined in the same file as globals, do this:
functions = [globals()[name] for name in names]
# [<function __main__.call_some_function_A>, <function __main__.call_some_function_B>, <function __main__.call_some_function_Z>]
Then all you need to do is call them in reverse order:
def f_():
    for func in reversed(functions):
        func()
Alternatively, you can obtain the function's source code, parse it, reverse the abstract syntax tree, compile it back, execute it... and you will have yourself the reversed function.
Let's consider this example:
def f():
    call_some_function_A()
    if whatever:
        call_some_function_B()
        call_some_function_C()
    call_some_function_D()
import inspect
import ast
original_f = f
source = inspect.getsource(f)
tree = ast.parse(source)
# tree is a Module, with body consisting of 1 FunctionDef
# tree.body[0] is a FunctionDef, with body consisting of Exprs
tree.body[0].body.reverse()
# top level expressions will be reversed
# compile the modified syntax tree to a code object as a module and execute it
exec(compile(tree, '<unknown>', 'exec'))
# f will be overwritten because the function name stays the same
# now f will be equivalent to:
# def f():
#     call_some_function_D()
#     if whatever:
#         call_some_function_B()
#         call_some_function_C()
#     call_some_function_A()
f_ = f
f = original_f
So yes, this method is a bit better. It is even possible to recursively reverse all the bodies and achieve the reversal of ...B and ...C as well, but if even the simplest logic code is introduced, you will run into bad problems.
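The parse-reverse-compile cycle can also be exercised on a standalone source string, which sidesteps inspect.getsource entirely; a sketch with a hypothetical function body:

```python
import ast

# Hypothetical source standing in for f; it appends markers so the
# execution order is observable.
src = """
def f():
    order.append('A')
    order.append('B')
    order.append('C')
"""
tree = ast.parse(src)
tree.body[0].body.reverse()         # reverse the FunctionDef's statements
code = compile(tree, '<reversed>', 'exec')

order = []
ns = {'order': order}
exec(code, ns)                      # defines the reversed f in ns
ns['f']()
print(order)                        # statements ran last-to-first
```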
I hacked together this small function, which assumes that the function body is a simple list of one-line statements. It uses exec (which is another form of eval), so it makes the code hard to compile statically, but if you can live with evaluated code, here it is:
import inspect
# sample function that will be reversed
def f():
    print "first statement"
    print "2nd statement"
    print "last statement"

def makeReversedFunctionSrc(newName, f):
    src = inspect.getsource(f)
    srcLines = src.split("\n")
    srcLines = srcLines[1:]  # get rid of the old function definition
    srcLines.reverse()  # reverse function body
    # hack together new function definition with reversed lines
    newSrc = "def " + newName + "():\n"
    for line in srcLines:
        if line.strip() != "":
            newSrc += line + "\n"
    return newSrc
# get the code as a string
reverseCode = makeReversedFunctionSrc("reversedF", f)
# execute the string as if it was python (I heard thats evil as in eval)
exec(reverseCode)
# now lets call our new function
reversedF()

Turn the dictionary keys into variable names with same values in Python from .mat Matlab files using scipy.io.loadmat

I am trying to take a basic dictionary temp = {'key':array([1,2])} loaded from a .mat file with scipy.io.loadmat, and turn the keys of the dictionary returned by loadmat() into variable names whose values are those stored under the corresponding keys.
So for example:
temp = {'key':array([1,2])}
turned into
key = array([1,2])
I know how to grab the keys with temp.keys(). Then grabbing the items is easy but how do I force the list of strings in temp.keys() to be variable names instead of strings.
I hope this makes sense but this is probably really easy I just can't think how to do it.
Cheers
In Python, method parameters can be passed as dictionaries with the ** magic:
def my_func(key=None):
    print key
    # do the real stuff
temp = {'key':array([1,2])}
my_func(**temp)
>>> array([1,2])
The best thing to do is to use temp['key']. To answer the question, however, you could use the exec function. The benefit of doing it this way is that you don't have to hard-code any variable names or confine yourself to working inside a function.
from numpy import array,matrix
temp = {'key':array([1,2]),'b': 4.3,'c': 'foo','d':matrix([2,2])}
for k in temp:
    exec('{KEY} = {VALUE}'.format(KEY=k, VALUE=repr(temp[k])))
>>> key
array([1, 2])
>>> b
4.3
>>> c
'foo'
>>> d
matrix([[2, 2]])
NOTE: This will only work if you have imported the specific functions from the modules. If you don't want to do that because of code practice or the sheer volume of functions you would need to import, you could write a function to prepend the module name to the entry. Output is the same as in the previous example.
import numpy as np,numpy
temp = {'key':np.array([1,2]),'b': 4.3,'c': 'foo','d':np.matrix([2,2])}
def exec_str(key, mydict):
    s = str(type(mydict[key]))
    if '.' in s:
        start = s.index("'") + 1
        end = s.index(".") + 1
        v = s[start:end:] + repr(mydict[key])
    else:
        v = repr(mydict[key])
    return v

for k in temp:
    exec('{KEY} = {VALUE}'.format(KEY=k, VALUE=exec_str(k, temp)))
While this isn't the best code practice, it works well for all of the examples I tested.
A better way may be to stuff the data to a separate object:
class attrdict(dict):
    def __getattr__(self, k): return self[k]
    def __setattr__(self, k, v): self[k] = v
somedict = {'key': 123, 'stuff': 456}
data = attrdict(somedict)
print data.key
print data.stuff
This is about as easy to use interactively, and does not require any magic.
This should be OK for Matlab-users, too.
EDIT: turns out the stuff below doesn't actually work most of the time. Too bad, so much for magic.
If you want to meddle with magic, though, you can do something like
locals().update(somedict)
This will work fine interactively, and you can even hide the access to locals() inside the loader function by messing with sys._getframe().f_back.f_locals.
However, this will not work in functions:
def foo():
    locals().update({'a': 4})
    print a
The point is that a above is bound as a global variable at compile time, and so Python does not try looking it up among the local variables.
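That compile-time binding can be demonstrated directly; a small sketch (the returned message string is purely illustrative):

```python
def foo():
    locals().update({'a': 4})
    try:
        # 'a' was compiled as a global lookup, so the update to the
        # locals() snapshot is invisible here
        return a
    except NameError:
        return 'locals().update had no effect'

print(foo())
```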
Using exec() is simpler, as hdhagman answered. I would write it as:
temp = {'key':[1,2]}
for k, v in temp.items():
    exec("%s = %s" % (k, v))
print(key)
=> [1,2]
None of the answers above worked for me with numpy arrays and other non-built-in types. However, the following did:
import numpy as np
temp = {'a':np.array([1,2]), 'b': 4.3, 'c': 'foo', 'd':np.matrix([2,2])}
for var in temp.keys():
    exec("{} = temp['{}']".format(var, var))
Note the order of the quotes. This allows var to be treated as a variable in the first instance, and then as a key in the second instance, indexing into the temp dictionary.
Of course, the usual disclaimers about the dangers of exec() and eval() still apply, and you should only run this on input you absolutely trust.
While I would just recommend using the dictionary directly and accessing the arrays like temp['key'], if you knew all of the variable names ahead of time you could write a function to extract them to individual variables:
def func(**kwargs):
    return kwargs['a'], kwargs['b']

temp = {'a': np.array([1,2]), 'b': np.array([3,4])}
a, b = func(**temp)
del temp  # get rid of temporary dict
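For known keys, the standard library offers a similar one-liner; a sketch (not from the original answer, with plain lists standing in for the arrays):

```python
from operator import itemgetter

# itemgetter with several keys returns a tuple of the matching values,
# which unpacks directly into individual variables.
temp = {'a': [1, 2], 'b': [3, 4]}
a, b = itemgetter('a', 'b')(temp)
print(a, b)
```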

Efficient way of calling set of functions in Python

I have a set of functions:
functions=set(...)
All the functions need one parameter x.
What is the most efficient way in python of doing something similar to:
for function in functions:
    function(x)
The code you give,
for function in functions:
    function(x)
...does not appear to do anything with the result of calling function(x). If that is indeed so, meaning that these functions are called for their side-effects, then there is no more pythonic alternative. Just leave your code as it is.† The point to take home here, specifically, is
Avoid functions with side-effects in list-comprehensions.
As for efficiency: I expect that using anything else instead of your simple loop will not improve runtime. When in doubt, use timeit. For example, the following tests seem to indicate that a regular for-loop is faster than a list-comprehension (I would be reluctant to draw any general conclusions from this test, though):
>>> timeit.Timer('[f(20) for f in functions]', 'functions = [lambda n: i * n for i in range(100)]').repeat()
[44.727972984313965, 44.752119779586792, 44.577917814254761]
>>> timeit.Timer('for f in functions: f(20)', 'functions = [lambda n: i * n for i in range(100)]').repeat()
[40.320928812026978, 40.491761207580566, 40.303879022598267]
But again, even if these tests would have indicated that list-comprehensions are faster, the point remains that you should not use them when side-effects are involved, for readability's sake.
†: Well, I'd write for f in functions, so that the difference between function and functions is more pronounced. But that's not what this question is about.
If you need the output, a list comprehension would work.
[func(x) for func in functions]
I'm somewhat doubtful of how much of an impact this will have on the total running time of your program, but I guess you could do something like this:
[func(x) for func in functions]
The downside is that you will create a new list that you immediately toss away, but it should be slightly faster than just the for-loop.
In any case, make sure you profile your code to confirm that this really is a bottleneck that you need to take care of.
Edit: I redid the test using timeit
My new test code:
import timeit
def func(i):
    return i

a = b = c = d = e = f = func
functions = [a, b, c, d, e, f]
timer = timeit.Timer("[f(2) for f in functions]", "from __main__ import functions")
print (timer.repeat())
timer = timeit.Timer("map(lambda f: f(2), functions)", "from __main__ import functions")
print (timer.repeat())
timer = timeit.Timer("for f in functions: f(2)", "from __main__ import functions")
print (timer.repeat())
Here is the results from this timing.
testing list comprehension
[1.7169530391693115, 1.7683839797973633, 1.7840299606323242]
testing map(f, l)
[2.5285000801086426, 2.5957231521606445, 2.6551258563995361]
testing plain loop
[1.1665718555450439, 1.1711149215698242, 1.1652190685272217]
My original, time.time()-based timings are pretty much in line with this testing; plain for loops seem to be the most efficient.
