Measure time of a function with arguments in Python

I am trying to measure the time of raw_queries(...), unsuccessfully so far. I found that I should use the timeit module. The problem is that I can't (i.e., I don't know how to) pass the arguments to the function from the environment.
Important note: Before calling raw_queries, we have to execute phase2() (environment initialization).
Side note: The code is in Python 3.
def raw_queries(queries, nlp):
    """ Submit queries without getting visual response """
    for q in queries:
        nlp.query(q)

def evaluate_queries(queries, nlp):
    """ Measure the time that the queries need to return their results """
    t = Timer("raw_queries(queries, nlp)", "?????")
    print(t.timeit())

def phase2():
    """ Load dictionary to memory and subsequently submit queries """
    # prepare Linguistic Processor to submit it the queries
    all_files = get_files()
    b = LinguisticProcessor(all_files)
    b.loadDictionary()
    # load the queries
    queries_file = 'queries.txt'
    queries = load_queries(queries_file)

if __name__ == '__main__':
    phase2()
Thanks for any help.
UPDATE: We can call phase2() using the second argument of Timer. The problem is that we need the arguments (queries, nlp) from the environment.
UPDATE: The best solution so far, with unutbu's help (only what has changed):
def evaluate_queries():
    """ Measure the time that the queries need to return their results """
    t = Timer("main.raw_queries(queries, nlp)",
              "import main; (queries, nlp) = main.phase2()")
    sf = 'Execution time: {} ms'
    print(sf.format(t.timeit(number=1000)))

def phase2():
    ...
    return queries, b

def main():
    evaluate_queries()

if __name__ == '__main__':
    main()

First, never use the time module to time functions. It can easily lead to wrong conclusions. See timeit versus timing decorator for an example.
The easiest way to time a function call is to use IPython's %timeit command.
There, you simply start an interactive IPython session, call phase2(), define queries,
and then run
%timeit raw_queries(queries,nlp)
The second easiest way that I know to use timeit is to call it from the command-line:
python -mtimeit -s"import test; queries=test.phase2()" "test.raw_queries(queries)"
(In the command above, I assume the script is called test.py)
The idiom here is
python -mtimeit -s"SETUP_COMMANDS" "COMMAND_TO_BE_TIMED"
To be able to pass queries to the raw_queries function call, you have to define the queries variable. In the code you posted, queries is defined in phase2(), but only locally. So to set up queries as a global variable, you need to do something like have phase2 return queries:
def phase2():
    ...
    return queries
If you don't want to mess up phase2 this way, create a dummy function:
def phase3():
    # Do stuff like phase2() but return queries
    return queries
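Note that timeit.Timer (and the timeit.timeit convenience function) also accepts a callable instead of a statement string, which avoids the setup-string and namespace dance entirely. A minimal sketch, assuming phase2() returns (queries, nlp) as in the update above:
from timeit import timeit

queries, nlp = phase2()
# A zero-argument lambda captures queries and nlp from the enclosing scope,
# so no setup string or globals are needed.
print(timeit(lambda: raw_queries(queries, nlp), number=1000))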

A custom timer function may be a solution:
import time

def timer(fun, *args):
    start = time.time()
    ret = fun(*args)
    end = time.time()
    return (ret, end - start)
Use it like this:
>>> from math import sin
>>> timer(sin, 0.5)
(0.47942553860420301, 6.9141387939453125e-06)
It means that sin returned 0.479... and it took 6.9e-6 seconds. Make sure your functions run long enough if you want to obtain reliable numbers (not like in the example above).

Normally, you would use timeit.
Examples are here and here.
Also note (quoting the timeit documentation):
By default, timeit() temporarily turns off garbage collection during the timing. The advantage of this approach is that it makes independent timings more comparable. The disadvantage is that GC may be an important component of the performance of the function being measured.
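If GC is relevant to the code being measured, the timeit documentation notes that it can be re-enabled as the first statement of the setup string. A sketch adapted to this question, assuming raw_queries, queries and nlp are importable from __main__:
from timeit import Timer

# gc is available in timeit's setup namespace, so gc.enable() works as-is.
t = Timer("raw_queries(queries, nlp)",
          "gc.enable(); from __main__ import raw_queries, queries, nlp")
print(t.timeit(number=1000))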
Or you can write your own custom timer using the time module.
If you go with a custom timer, remember that you should use time.clock() on Windows and time.time() on other platforms (timeit chooses the right one internally).
import sys
import time

# choose timer to use
if sys.platform.startswith('win'):
    default_timer = time.clock
else:
    default_timer = time.time

start = default_timer()
# do something
finish = default_timer()
elapsed = (finish - start)
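On Python 3.3+ the platform check is no longer necessary (and time.clock() was removed in 3.8): time.perf_counter() is the portable high-resolution timer, and it is also what timeit.default_timer points to there. A minimal sketch:
import time

start = time.perf_counter()
# do something
elapsed = time.perf_counter() - start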

I'm not sure about this, I've never used it, but from what I've read it should be something like this:
....
t = Timer("raw_queries(queries, nlp)", "from __main__ import raw_queries")
print(t.timeit())
I took this from http://docs.python.org/library/timeit.html (if this helps).

You don't say so, but are you by any chance trying to make the code go faster? If so, I suggest you not focus in on a particular routine and try to time it. Even if you get a number, it won't really tell you what to fix. If you can pause the program under the IDE several times and examine its state, including the call stack, it will tell you what is taking the time and why. Here is a link that gives a brief explanation of how and why it works.*
*When you follow the link, you may have to go to the bottom of the previous page of answers. SO is having trouble following a link to an answer.

Related

Is there a way to have code running concurrently (more specifically with PyAutoGUI)?

I have the following code
import pyautogui
from pyautogui import press

def leftdoor():
    press('a')
    pyautogui.sleep(1)
    press('a')

def rightdoor():
    press('d')
    pyautogui.sleep(1)
    press('d')

leftdoor()
rightdoor()
and when I run the code, what happens is that the letter A is pressed, one second passes, and then it is pressed again. Then the same happens for the D key. However, is there a way to press them both down at the same time, by calling both functions without having to wait for the .sleep of the previous function?
There are two ways to run your code concurrently:
Combine the functions (might not be possible for large functions)
In the case of your code, it would look like this:
def door():
    press('a')
    press('d')
    sleep(1)
    press('a')
    press('d')

door()
If this isn't what you're looking for, use threading.
Threading
Here is a link to a tutorial on the module, and the code is below.
from threading import Thread # Module import
rdt = Thread(target=rightdoor) # Create two Thread objects
ldt = Thread(target=leftdoor)
rdt.start() # start and join the objects
ldt.start()
rdt.join()
ldt.join()
print("Finished execution") # done!
Note that using this does not absolutely guarantee that a and d will be pressed at the same time (I got a ~10 millisecond delay at max, and it might have been from the program I used to time it), but it should work for all purposes.

Measure instantiation + method execution time in Python

I have a Python class and want to measure the time it takes to instantiate the class and execute a method across many runs, e.g., 100.
I noticed that the first run takes considerably longer than consecutive runs. I assume that is caused by branch prediction since the input does not change. However, I want to measure the time it takes "from scratch", i.e., without the benefit of branch prediction. Note that constructing a realistic input is difficult in this case, thus the runs have to be executed on the same input.
To tackle this, I tried creating a new object on each run and delete the old object:
import time

class Myobject:
    def mymethod(self):
        """
        Does something complex.
        """
        pass

def benchmark(runs=100):
    """
    The argument runs corresponds to the number of times the benchmark is to be executed.
    """
    times_per_run = []
    r = range(runs)
    for _ in r:
        t2_start = time.perf_counter()
        # instantiation
        obj = Myobject()
        # method execution
        obj.mymethod()
        del obj
        t2_stop = time.perf_counter()
        times_per_run.append(t2_stop - t2_start)
    print(times_per_run)

benchmark(runs=10)
Executing this code shows that the average time per run varies significantly. The first run takes consistently longer. How do I eliminate the benefit of branch prediction when benchmarking across multiple runs?
To avoid the benefits of warmup (see comments on the post), I used the subprocess module to trigger the runs individually while measuring the time for each run, and aggregated the results afterwards:
import logging
import subprocess
import time

def benchmark(runs=100):
    times_per_run = []
    command = "python3 ./myclass.py"
    for _ in range(runs):
        t1_start = time.perf_counter()
        subprocess.run(command, capture_output=True, shell=True, check=False)
        t1_stop = time.perf_counter()
        times_per_run.append(t1_stop - t1_start)
    logging.info(f"Average time per run: {sum(times_per_run) / runs}")

benchmark()
This yields stable results.
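If you want more than the average, the standard library's statistics module can summarize the collected times. A small sketch, assuming the times_per_run list gathered above:
import statistics

mean = statistics.mean(times_per_run)
stdev = statistics.stdev(times_per_run)  # needs at least two runs
print(f"mean: {mean:.4f} s, stdev: {stdev:.4f} s over {len(times_per_run)} runs")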

Function runs endlessly with nested function when multiprocessed

I am struggling with multiprocessing. I have some heavy image processing to do and wanted to make use of multi-core CPU power. However, I tried a lot and finally wanted to use the concurrent.futures module because it is more or less quite handy to use. But when I set up my program, it runs and runs and runs and... It does not stop. The basic idea is as follows (not related to image processing, just a dummy setup):
import concurrent.futures as cf
import time
import multiprocessing as mp
def someFunc(seconds, multiplier=1):
    time.sleep(multiplier*seconds)
    return (f'Slept for {multiplier*seconds} s, Proc: {mp.current_process()}')

def parallelize(secs):
    factor = 2
    def wrapper(sec):
        return someFunc(sec, factor)
    with cf.ProcessPoolExecutor() as executor:
        results = [executor.submit(wrapper, secs) for _ in range(8)]
        for result in cf.as_completed(results):
            print(result.result())
So, I am running this under Windows 10 in a Jupyter notebook. For this reason, the functions are saved in a separate func.py file which is imported into the notebook and then run using the if __name__ == '__main__' statement.
import func

if __name__ == '__main__':
    func.parallelize('some_int_number')
The reason I do this is that I have to pass two arguments to the parallelize() function, but the submit() method only provides one argument. I know one could also make use of the map() method or whatever, but for some reason (overhead?!) the effect of parallelizing is not very significant (I have been playing with the possibilities for days now). So I wanted to try the submit() method as proposed.
BUT, this does not work (the script runs endlessly) and I don't know why. The problem also is that I have to handle a 'static' argument (factor) which is only known in the scope of the parallelize function.
If I defined the wrapper function outside the parallelize function, the script would run as expected, but then I would have the problem of the static factor variable.
Any ideas?
Greetings
phtagen
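One way around both issues, sketched below rather than tested against your setup: executor.submit(fn, *args) forwards extra positional arguments to fn, so the 'static' factor can be passed through directly while the worker stays at module level, where it is picklable (a locally defined wrapper cannot be pickled, which is typically why such a pool hangs or fails on Windows). functools.partial is an alternative way to bind factor.
import concurrent.futures as cf
import multiprocessing as mp
import time
from functools import partial

def someFunc(seconds, multiplier=1):
    time.sleep(multiplier * seconds)
    return f'Slept for {multiplier * seconds} s, Proc: {mp.current_process()}'

def parallelize(secs):
    factor = 2
    with cf.ProcessPoolExecutor() as executor:
        # submit() forwards extra arguments, so no nested wrapper is needed;
        # equivalently: executor.submit(partial(someFunc, multiplier=factor), secs)
        results = [executor.submit(someFunc, secs, factor) for _ in range(8)]
        for result in cf.as_completed(results):
            print(result.result())

if __name__ == '__main__':
    parallelize(1)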

A module to profile peak memory usage of Python code

I tried to use the memory_profiler module to get the used memory with the following code:
from memory_profiler import memory_usage

memories = []

def get_memory(mem, ended):
    if ended:
        highest_mem = max(mem)
        print highest_mem
    else:
        memories.append(mem)

def f1():
    # do something
    ended = False
    get_memory(memory_usage(), ended)
    return  # something

def f2():
    # do something
    ended = False
    get_memory(memory_usage(), ended)
    return  # something

# main
f1()
f2()
ended = True
get_memory(memory_usage(), ended)  # code end
>>> # output
# highest memory
However, it did not execute successfully. It got stuck when ended=True and the values of memory_usage() and ended were passed to get_memory. It did not show any error either; it just kept waiting for a long time until I forced it to stop. Does anyone know a better way or a solution?
An easy way to use memory_usage to get the peak / maximum memory from a block of code is to first put that code in a function, and then pass that function - without the () call - to memory_usage() as the proc argument:
from memory_profiler import memory_usage
def myfunc():
    # code
    return
mem = max(memory_usage(proc=myfunc))
print("Maximum memory used: {} MiB".format(mem))
Other arguments allow you to collect timestamps, return values, pass arguments to myfunc, etc. The docstring seems to be the only complete source for documentation on this: https://github.com/fabianp/memory_profiler/blob/master/memory_profiler.py
https://github.com/fabianp/memory_profiler/blob/4089e3ed4d5c4197925a2df8393d4cbfca745ae5/memory_profiler.py#L244
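For example, proc can also be given as a (function, args, kwargs) tuple when the function needs arguments. A sketch using a hypothetical compute function:
from memory_profiler import memory_usage

def compute(data, scale=1):
    # stand-in for the real work
    return [x * scale for x in data]

# runs compute(list(range(100000)), scale=2) and samples its memory use
mem = max(memory_usage(proc=(compute, (list(range(100000)),), {'scale': 2})))
print("Maximum memory used: {} MiB".format(mem))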
I mainly use Heapy because it's really easy to use.
Just type the following code where you want to test for memory usage.
from guppy import hpy
hp = hpy()
print hp.heap()

Python simplest form of multiprocessing

I've been trying to read up on threading and multiprocessing, but all the examples are too intricate and advanced for my level of Python/programming knowledge. I want to run a function, which consists of a while loop, and while that loop runs I want to continue with the program and eventually change the condition for the while loop and end that process. This is the code:
class Example():
    def __init__(self):
        self.condition = False

    def func1(self):
        self.condition = True
        while self.condition:
            print "Still looping"
            time.sleep(1)
        print "Finished loop"

    def end_loop(self):
        self.condition = False
Then I make the following function calls:
ex = Example()
ex.func1()
time.sleep(5)
ex.end_loop()
What I want is for func1 to run for 5 s before end_loop() is called, which changes the condition and ends the loop and thus also the function. I.e., I want one process to start and "go" into func1 and at the same time I want time.sleep(5) to be called, so the processes "split" when arriving at func1: one process enters the function while the other continues down the program and starts with the time.sleep(5) execution.
This must be the most basic example of multiprocessing, still I've had trouble finding a simple way to do it!
Thank you
EDIT1: regarding do_something. In my real problem do_something is replaced by some code that communicates with another program via a socket, receives packages with coordinates every 0.02 s and stores them in member variables of the class. I want this constant updating of the coordinates to start and then be able to read the coordinates via other functions at the same time.
However that is not so relevant. What if do_something is replaced by:
time.sleep(1)
print "Still looping"
How do I solve my problem then?
EDIT2: I have tried multiprocessing like this:
from multiprocessing import Process
ex = Example()
p1 = Process(target=ex.func1())
p2 = Process(target=ex.end_loop())
p1.start()
time.sleep(5)
p2.start()
When I ran this, I never got to p2.start(), so that did not help. Even if it had, this is not really what I'm looking for either. What I want would be just to start the process p1, and then continue with time.sleep and ex.end_loop().
The first problem with your code is the calls
p1 = Process(target=ex.func1())
p2 = Process(target=ex.end_loop())
With ex.func1() you're calling the function and pass the return value as target parameter. Since the function doesn't return anything, you're effectively calling
p1 = Process(target=None)
p2 = Process(target=None)
which makes, of course, no sense.
After fixing that, the next problem will be shared data: when using the multiprocessing package, you implement concurrency using multiple processes which, by default, cannot simply share data afaik. Have a look at Sharing state between processes in the package's documentation to read about this. Especially take the first sentence into account: "when doing concurrent programming it is usually best to avoid using shared state as far as possible"!
So you might want to also have a look at Exchanging objects between processes to read about how to send/receive data between two different processes. So, instead of simply setting a flag to stop the loop, it might be better to send a message to signal the loop should be terminated.
Also note that processes are a heavyweight form of multiprocessing: they spawn multiple OS processes, which comes with relatively big overhead. multiprocessing's main purpose is to avoid problems imposed by Python's Global Interpreter Lock (google this to read more...). If your problem isn't much more complex than what you've told us, you might want to use the threading package instead: threads come with less overhead than processes and also allow access to the same data (although you really should read about synchronization when doing this...).
I'm afraid, multiprocessing is an inherently complex subject. So I think you will need to advance your programming/python skills to successfully use it. But I'm sure you'll manage this, the python documentation about this is comprehensive and there are a lot of other resources about this.
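For what it's worth, here is a minimal threading version of the original Example (a sketch, not part of the answer above): the loop runs in a background thread while the main thread sleeps and then flips the flag, which works because threads share memory.
import threading
import time

class Example:
    def __init__(self):
        self.condition = False

    def func1(self):
        self.condition = True
        while self.condition:
            print("Still looping")
            time.sleep(1)
        print("Finished loop")

    def end_loop(self):
        self.condition = False

ex = Example()
t = threading.Thread(target=ex.func1)  # note: no parentheses after func1
t.start()
time.sleep(5)
ex.end_loop()
t.join()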
To tackle your EDIT2 problem, you could try using the shared memory map Value.
import time
from multiprocessing import Process, Value

class Example():
    def func1(self, cond):
        while (cond.value == 1):
            print('do something')
            time.sleep(1)
        return

if __name__ == '__main__':
    ex = Example()
    cond = Value('i', 1)
    proc = Process(target=ex.func1, args=(cond,))
    proc.start()
    time.sleep(5)
    cond.value = 0
    proc.join()
(Note the target=ex.func1 without the parentheses and the comma after cond in args=(cond,).)
But look at the answer provided by MartinStettner to find a good solution.
