I have a Python class and want to measure the time it takes to instantiate the class and execute a method across many runs, e.g., 100.
I noticed that the first run takes considerably longer than consecutive runs. I assume that is caused by branch prediction since the input does not change. However, I want to measure the time it takes "from scratch", i.e., without the benefit of branch prediction. Note that constructing a realistic input is difficult in this case, thus the runs have to be executed on the same input.
To tackle this, I tried creating a new object on each run and deleting the old object:
import time

class Myobject:
    def mymethod(self):
        """
        Does something complex.
        """
        pass

def benchmark(runs=100):
    """
    The argument runs corresponds to the number of times the benchmark is to be executed.
    """
    times_per_run = []
    for _ in range(runs):
        t2_start = time.perf_counter()
        # instantiation
        obj = Myobject()
        # method execution
        obj.mymethod()
        del obj
        t2_stop = time.perf_counter()
        times_per_run.append(t2_stop - t2_start)
    print(times_per_run)

benchmark(runs=10)
Executing this code shows that the average time per run varies significantly. The first run consistently takes longer. How do I eliminate the benefit of branch prediction when benchmarking across multiple runs?
To avoid the benefits of warmup (see the comments on the post), I used the subprocess module to trigger the runs individually, measuring the time for each run and aggregating the results afterwards:
import logging
import subprocess
import time

def benchmark(runs=100):
    times_per_run = []
    command = "python3 ./myclass.py"
    for _ in range(runs):
        t1_start = time.perf_counter()
        subprocess.run([command], capture_output=True, shell=True, check=False)
        t1_stop = time.perf_counter()
        times_per_run.append(t1_stop - t1_start)
    logging.info(f"Average time per run: {sum(times_per_run) / runs}")

benchmark()
This yields stable results.
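If interpreter startup time should be excluded as well, one option (a sketch only, assuming myclass.py can be changed to print its own per-run timing to stdout) is to read the child's own measurement instead of timing the subprocess call from the outside:

import subprocess

def benchmark_child_timing(runs=100):
    times_per_run = []
    for _ in range(runs):
        # myclass.py is assumed to print a single float: its own perf_counter delta
        proc = subprocess.run(
            ["python3", "./myclass.py"], capture_output=True, text=True, check=True
        )
        times_per_run.append(float(proc.stdout.strip()))
    print(sum(times_per_run) / runs)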
Related
I created a simple application and realised that my code was running extremely slowly. The application involves calling the same method over and over again. When I investigated the problem, it turned out that calling the same function/method several times sometimes results in Python taking 15 milliseconds to execute an empty function (pass).
I'm running Windows 10 Home 64-bit on a Lenovo ThinkPad with an i7 CPU.
The less code the function/method has, the smaller the chance of hitting a 15 ms runtime, but it never goes away.
Here's the code:
import time

class Clock:
    def __init__(self):
        self.t = time.time()

    def restart(self):
        dt = time.time() - self.t
        t = time.time()
        return dt * 1000

def method():
    pass

for i in range(100000):
    c = Clock()
    method()
    dt = c.restart()
    if dt > 1.:
        print(str(i) + ' ' + str(dt))
I'd expect never to get anything printed out; however, a typical result looks like this:
6497 15.619516372680664
44412 15.622615814208984
63348 15.621185302734375
On average, 1-4 times out of 100000, the elapsed time between starting the clock and getting the result (which is an empty function call plus a simple subtraction and variable assignment) is 15.62... milliseconds, which makes the run time really slow.
Occasionally the elapsed time is 1 millisecond.
Thank you for your help!
In your code you are making the call to time.time() twice, which requires the system to retrieve the time from the OS. You can read more here:
How does python's time.time() method work?
Since you mentioned you are on Windows, it is probably better to use time.clock() instead; I will defer to this link, which does a much better job of explaining: https://www.pythoncentral.io/measure-time-in-python-time-time-vs-time-clock/
The link also takes garbage collection's effect on performance into account and shows how to disable it during testing.
Hope it answers your questions!
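As a rough illustration (a sketch, assuming Python 3): time.perf_counter offers a higher-resolution clock than time.time, and disabling garbage collection, as the linked article discusses, removes one source of noise:

import gc
import time

def method():
    pass

gc.disable()  # take GC pauses out of the measurement, as the linked article suggests
worst = 0.0
for _ in range(100000):
    start = time.perf_counter()
    method()
    elapsed = time.perf_counter() - start
    worst = max(worst, elapsed)
gc.enable()
print(worst * 1000.0, 'ms')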
How do I profile/benchmark an asynchronous Python script (which uses asyncio)?
Normally, I would do:
import time
import tracemalloc

tracemalloc.start()  # tracing must be started before get_traced_memory() returns useful values

totalMem = tracemalloc.get_traced_memory()[0]
totalTime = time.time()
retValue = myFunction()
totalTime = time.time() - totalTime
totalMem = tracemalloc.get_traced_memory()[0] - totalMem
This way I would save the total time taken by the function.
I learned how to use decorators and I did just that - and dumped all stats into a text file for later analysis.
But, when you have ASYNCIO script, things get pretty different: the function will block while doing an "await aiohttpSession.get()", and control will go back to the event loop, which will run other functions.
This way, the elapsed time and changes in total allocated memory won't reveal anything, because I will have measured more than just that function.
The only way it would work would be something like
class MyTracer:
    def __init__(self):
        self.totalTime = 0
        self.totalMem = 0
        self.startTime = time.time()
        self.startMem = tracemalloc.get_traced_memory()[0]

    def stop(self):
        self.totalTime += time.time() - self.startTime
        self.totalMem += tracemalloc.get_traced_memory()[0] - self.startMem

    def start(self):
        self.startTime = time.time()
        self.startMem = tracemalloc.get_traced_memory()[0]
And now, somehow, insert it in the code:
async def myFunction():
    tracer = MyTracer()
    session = aiohttp.ClientSession()

    # do something

    tracer.stop()
    # the time elapsed here, and the changes in the memory allocation, are not from the current function
    retValue = await (await session.get('https://hoochie-mama.org/cosmo-kramer',
                      headers={
                          'User-Agent': 'YoYo Mama! v3.0',
                          'Cookies': 'those cookies are making me thirsty!',
                      })).text()
    tracer.start()

    # do more things

    tracer.stop()
    # now "tracer" has the info about total time spent in this function, and the memory allocated by it
    # (the memory stats could be negative if the function releases more than it allocates)
Is there a way to accomplish this, i.e., profile all my asyncio code without having to insert all this instrumentation by hand?
Or is there a module already capable of doing just that?
Check out the Yappi profiler, which has support for coroutine profiling. Its page on coroutine profiling describes the problem you're facing very clearly:
The main issue with coroutines is that, under the hood when a coroutine yields or in other words context switches, Yappi receives a return event just like we exit from the function. That means the time spent while the coroutine is in yield state does not get accumulated to the output. This is a problem especially for wall time as in wall time you want to see whole time spent in that function or coroutine. Another problem is call count. You see every time a coroutine yields, call count gets incremented since it is a regular function exit.
They also describe, at a high level, how Yappi solves this problem:
With v1.2, Yappi corrects above issues with coroutine profiling. Under the hood, it differentiates the yield from real function exit and if wall time is selected as the clock_type it will accumulate the time and corrects the call count metric.
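A minimal usage sketch (not from the Yappi docs; the coroutine here is just a stand-in for your real code):

import asyncio
import yappi

async def main():
    await asyncio.sleep(0.1)  # stand-in for the real coroutine work

yappi.set_clock_type("wall")  # wall time, so awaited time is attributed to the coroutine
yappi.start()
asyncio.run(main())
yappi.stop()
yappi.get_func_stats().print_all()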
I call pool.apply_async() with 14 cores.
import multiprocessing
from time import time
import timeit

informative_patients = informative_patients_2500_end[20:]
pool = multiprocessing.Pool(14)
results = []
wLength = [20, 30, 50]

start = time()
for fn in informative_patients:
    result = pool.apply_async(compute_features_test_set, args=(fn, wLength),
                              callback=results.append)
pool.close()
pool.join()
stop = timeit.default_timer()
print stop - start
The problem is that it finishes calling the compute_features_test_set() function for the first 13 data sets in less than one hour, but it takes more than one hour to finish the last one. The size of the data is the same for all 14 data sets. I tried putting pool.terminate() after pool.close(), but in that case it doesn't even start the pool; it terminates the pool immediately without going inside the for loop. This always happens the same way, and if I use more cores and more data sets, the last one always takes very long to finish. My compute_features_test_set() function is simple feature-extraction code and works correctly. I work on a server with Linux Red Hat 6, Python 2.7, and Jupyter. Computation time is important to me, and my question is: what is wrong here, and how can I fix it to get all the computation done in a reasonable time?
Question: ... what is wrong here and how I can fix it
I can't see this as a multiprocessing issue.
But how do you arrive at "always the last one takes so long to finish"?
Are you using callback=results.append instead of your own function?
Edit your question and show how you time one process.
Also add your Python version to your question.
Do the following to verify it's not a data issue:
start = time()
results.append(
    compute_features_test_set(<First informative_patients>, wLength)
)
stop = timeit.default_timer()
print stop - start

start = time()
results.append(
    compute_features_test_set(<Last informative_patients>, wLength)
)
stop = timeit.default_timer()
print stop - start
Compare the two times you get.
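Another way to see per-task times (a sketch only; compute_features_test_set is the function from the question) is to have each worker report its own elapsed time alongside its result:

import time

def timed_compute(fn, wLength):
    # wraps the question's compute_features_test_set; returns (input, result, seconds)
    start = time.time()
    result = compute_features_test_set(fn, wLength)
    return fn, result, time.time() - start

# used with the pool from the question, e.g.:
# pool.apply_async(timed_compute, args=(fn, wLength), callback=results.append)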
I have a function that takes a long time to return control to the parent after the return call is made. I've timed it, and it frequently runs between 2-10 minutes. Here's the function itself:
import datetime

def mass_update_database(database, queue):
    documents_to_update = []
    # Get all docs from queue for updating.
    while not queue.empty():
        documents_to_update.append(queue.get())
        queue.task_done()
    # Update database
    database.update(documents_to_update)
    # Compact the database, which removes previous revisions
    # and slims the size of our database.
    if database.compact():
        print('Compaction completed successfully.')
    else:
        print('Compaction failed.')
    print('Beginning return')
    d = datetime.datetime.now()
    return d
Some notes on the above code: queue is pretty large (8,500 dictionaries with at least 20 keys and potentially lengthy values). This is updating CouchDB, so the database object is a couchdb.Database object. The d variable is for timing (which is how I know it's taking so long).
I suspect that maybe the documents_to_update variable is so large that cleaning it up is taking a long time? But I ran it with a variation where I added documents_to_update = [] right before the timer started, and it still took a long time to return.
Here's where it's being called. The above function is in a different module called NS.
d = NS.mass_update_database(ns_database, docs_to_update_queue)
print('Returned', datetime.datetime.now() - d)
Anyone know any reason why returning control to the parent could take 2-10 minutes?
I should add that when I take the code out of the function and paste it where the function call would go, it doesn't take forever to finish at the point where the return statement would be.
EDIT: I should clarify: the long time that it takes to return is from where I initialize d until control returns to the parent. All the code ABOVE that has finished and completed. What's taking a long time is the stretch from the return statement until the next statement in the parent that called mass_update_database.
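One way to test the cleanup hypothesis in isolation (a sketch, not part of the original code; the sizes roughly mirror the 8,500 dictionaries mentioned above) is to time destruction of a comparable structure explicitly:

import datetime
import gc

def time_cleanup(n=8500, keys=20):
    # build a structure roughly the size of documents_to_update
    docs = [{'key%d' % k: 'x' * 200 for k in range(keys)} for _ in range(n)]
    start = datetime.datetime.now()
    del docs
    gc.collect()  # force collection so the cost shows up here, not later
    print('Cleanup took', datetime.datetime.now() - start)

time_cleanup()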
I am trying to measure the time of raw_queries(...), unsuccessfully so far. I found that I should use the timeit module. The problem is that I can't (i.e., I don't know how) pass the arguments to the function from the environment.
Important note: Before calling raw_queries, we have to execute phase2() (environment initialization).
Side note: The code is in Python 3.
from timeit import Timer

def raw_queries(queries, nlp):
    """ Submit queries without getting visual response """
    for q in queries:
        nlp.query(q)

def evaluate_queries(queries, nlp):
    """ Measure the time that the queries need to return their results """
    t = Timer("raw_queries(queries, nlp)", "?????")
    print(t.timeit())

def phase2():
    """ Load dictionary to memory and subsequently submit queries """
    # prepare the Linguistic Processor to submit the queries to it
    all_files = get_files()
    b = LinguisticProcessor(all_files)
    b.loadDictionary()
    # load the queries
    queries_file = 'queries.txt'
    queries = load_queries(queries_file)

if __name__ == '__main__':
    phase2()
Thanks for any help.
UPDATE: We can call phase2() using the second argument of Timer. The problem is that we need the arguments (queries, nlp) from the environment.
UPDATE: The best solution so far, with unutbu's help (only what has changed):
def evaluate_queries():
    """ Measure the time that the queries need to return their results """
    t = Timer("main.raw_queries(queries, nlp)",
              "import main; (queries, nlp) = main.phase2()")
    sf = 'Execution time: {} ms'
    print(sf.format(t.timeit(number=1000)))

def phase2():
    ...
    return queries, b

def main():
    evaluate_queries()

if __name__ == '__main__':
    main()
First, never use the time module to time functions. It can easily lead to wrong conclusions. See timeit versus timing decorator for an example.
The easiest way to time a function call is to use IPython's %timeit command.
There, you simply start an interactive IPython session, call phase2(), define queries,
and then run
%timeit raw_queries(queries,nlp)
The second easiest way that I know to use timeit is to call it from the command-line:
python -mtimeit -s"import test; queries=test.phase2()" "test.raw_queries(queries)"
(In the command above, I assume the script is called test.py)
The idiom here is
python -mtimeit -s"SETUP_COMMANDS" "COMMAND_TO_BE_TIMED"
To be able to pass queries to the raw_queries function call, you have to define the queries variable. In the code you posted, queries is defined in phase2(), but only locally. So to set up queries as a global variable, you need to do something like having phase2 return queries:
def phase2():
    ...
    return queries
If you don't want to mess up phase2 this way, create a dummy function:
def phase3():
    # Do stuff like phase2() but return queries
    return queries
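For completeness, roughly the same measurement can be run from inside a Python script rather than the command line; a minimal sketch, assuming the same test.py layout as above (with phase2, or the phase3 dummy, returning queries):

import timeit

# setup runs once; the statement is then executed `number` times
total = timeit.timeit(
    stmt="test.raw_queries(queries)",
    setup="import test; queries = test.phase2()",
    number=100,
)
print(total / 100)  # average seconds per call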
A custom timer function may be a solution:
import time

def timer(fun, *args):
    start = time.time()
    ret = fun(*args)
    end = time.time()
    return (ret, end - start)
Use it like this:
>>> from math import sin
>>> timer(sin, 0.5)
(0.47942553860420301, 6.9141387939453125e-06)
It means that sin returned 0.479... and it took 6.9e-6 seconds. Make sure your functions run long enough if you want to obtain reliable numbers (not like in the example above).
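The same idea can also be packaged as a decorator; here is a small sketch using time.perf_counter (Python 3), which is better suited to short measurements than time.time:

import functools
import time

def timed(fun):
    """Wrap fun so it returns (result, elapsed_seconds)."""
    @functools.wraps(fun)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        ret = fun(*args, **kwargs)
        return ret, time.perf_counter() - start
    return wrapper

@timed
def square(x):
    return x * x

print(square(4))  # e.g. (16, 1.2e-06)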
Normally, you would use timeit.
Examples are here and here.
Also note:
By default, timeit() temporarily turns off garbage collection during the timing. The advantage of this approach is that it makes independent timings more comparable. The disadvantage is that GC may be an important component of the performance of the function being measured.
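If garbage collection is an important part of what you are measuring, you can re-enable it in timeit's setup string; a minimal sketch:

import timeit

# include GC cost in the measurement by re-enabling it during timing
t = timeit.Timer("sum(range(1000))", setup="import gc; gc.enable()")
print(t.timeit(number=10000))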
Or you can write your own custom timer using the time module.
If you go with a custom timer, remember that you should use time.clock() on Windows and time.time() on other platforms (timeit chooses internally):
import sys
import time

# choose timer to use
if sys.platform.startswith('win'):
    default_timer = time.clock
else:
    default_timer = time.time

start = default_timer()
# do something
finish = default_timer()
elapsed = (finish - start)
I'm not sure about this, I've never used it, but from what I've read it should be something like this:
....
t = Timer("raw_queries(queries, nlp)", "from __main__ import raw_queries")
print(t.timeit())
I took this from http://docs.python.org/library/timeit.html (if this helps).
You don't say so, but are you by any chance trying to make the code go faster? If so, I suggest you not focus on a particular routine and try to time it. Even if you get a number, it won't really tell you what to fix. If you can pause the program under the IDE several times and examine its state, including the call stack, it will tell you what is taking the time and why. Here is a link that gives a brief explanation of how and why it works.*
*When you follow the link, you may have to go to the bottom of the previous page of answers. SO is having trouble following a link to an answer.