Benchmark Python programs

I'm new to Python; it's my first interpreted language, having studied Java so far.
When a Java program runs for the first time, it is executed more slowly than on subsequent runs.
The reason is caching.
import time

def procedure():
    time.sleep(2.5)

# measure process time
t0 = time.clock()
procedure()
print(time.clock() - t0, "seconds process time")
I tried this several times and the result is always the same. So, am I right that no cache interferes and that the benchmark is pretty reliable?

It's OK to do benchmarks like this; the accuracy is good enough for functions which run "long" and fairly constantly, like in your example.
But there are some pitfalls: for "quick" functions (like an empty one), you run into precision limits, and for functions whose execution time varies (network I/O, for example), you have to measure multiple times to find min/max/avg runtime.
In addition, the best clock to use differs by platform: on Windows, time.clock() is preferred; on *nix, time.time().
Luckily, there is a module which takes care of all that: timeit:
>>> import time
>>> def procedure():
...     pass
...
>>> def time_this(f):
...     t0 = time.clock()
...     f()
...     print((time.clock() - t0), "seconds process time")
...
>>> time_this(procedure)
1.9555558310457855e-06 seconds process time
>>> time_this(procedure)
1.9555557742023666e-06 seconds process time
>>> time_this(procedure)
1.9555557742023666e-06 seconds process time
>>> import timeit
>>> timeit.Timer(procedure).timeit()
0.09460783423588737
>>> timeit.Timer(procedure).repeat()
[0.09791419021132697, 0.09721947901198291, 0.09598943441130814]
You might want to look at its source. Or just use it ;)
As for caching: Python code is compiled to bytecode when first used. That bytecode is cached by default, but it won't affect your benchmark as long as you don't do imports inside your function.
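If you want to see the effect, here is a small sketch (the module and repetition count are arbitrary choices of mine) comparing a function that imports inside its body against one that does not; the module object is cached, but the import statement still executes on every call:
import json
import timeit

def with_import():
    import json  # module is cached, but the import machinery still runs
    return json.dumps({})

def without_import():
    return json.dumps({})

print(timeit.timeit(with_import, number=100000))     # noticeably slower
print(timeit.timeit(without_import, number=100000))  # faster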

Related

Python 3 subprocesses slower than equivalent bash

I'm using a Python 3 script to automate some jobs.
I need to measure the time of these external jobs, so I decided to use Python 3's built-in time() combined with the subprocess module:
with open(in_files[i], 'r') as f, open(sol_files[i], 'w') as f_sol:
    start = time.time()
    process = subprocess.run(['./' + src_files[i]], stdin=f, stdout=f_sol)
    end = time.time()
The elapsed time calculated by this Python snippet is 0.73 seconds.
However, the equivalent bash command:
time ./file < input_file > output_file
is significantly faster: 0.5 seconds.
What could be causing this huge discrepancy? Maybe context switching with the Python interpreter because of the redirection? Maybe something related to buffering?
Similar code without the redirection does not show this behavior:
start = time.time()
process = subprocess.run(['sleep','1'])
end = time.time()
The above code reports an elapsed time of 1 s plus negligible overhead.
Best regards
It was a stupid mistake.
time.time() does not have good precision on most systems:
Note that even though the time is always returned as a floating point number, not all systems provide time with a better precision than 1 second. While this function normally returns non-decreasing values, it can return a lower value than a previous call if the system clock has been set back between the two calls.
Python 3 Time Module Documentation
perf_counter() or process_time() work just fine. Nothing wrong with subprocesses.
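For illustration, a minimal corrected sketch; the file names and the ./solver executable are placeholders, not from the original post:
import subprocess
import time

with open('input.txt') as f, open('output.txt', 'w') as f_sol:
    start = time.perf_counter()
    subprocess.run(['./solver'], stdin=f, stdout=f_sol)
    elapsed = time.perf_counter() - start

print('%.3f seconds wall-clock time' % elapsed)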

The result of running time calculated by Python is not correct

I'm trying to use time to record the running time of this function, but I think the result is not correct: sometimes it takes 0 s, and the result is not stable. The first two results are for N=10000, the third one is for N=30000.
import time

def sumOfN(n):
    start = time.time()
    theSum = 0
    for i in range(1, n + 1):
        theSum = theSum + i
    end = time.time()
    return theSum, end - start

for i in range(5):
    print("Sum is %d required %10.7f seconds" % sumOfN(300000))
According to the Python manual:
time.time()
Return the time in seconds since the epoch as a floating
point number. Note that even though the time is always returned as a
floating point number, not all systems provide time with a better
precision than 1 second. While this function normally returns
non-decreasing values, it can return a lower value than a previous
call if the system clock has been set back between the two calls.
(emphasis mine)
It seems the timer resolution of your system is not enough to correctly measure the elapsed time of the function. It actually looks like the precision is about 0.016 seconds, roughly 1/60 of a second, which is typical of Windows systems.
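You can probe the effective resolution yourself with a small spin loop (a rough sketch; the value printed depends on OS and Python version):
import time

t0 = time.time()
while time.time() == t0:
    pass  # busy-wait until the reported time changes
print(time.time() - t0)  # roughly the tick size of time.time()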
Your approach has two problems:
time.time() returns the current time (as in time of day), which can be changed by auto-adjusting processes such as NTP, or by someone modifying it (either by hand or via code). Use time.perf_counter() (or time.clock() in Python <3.3) instead.
You are measuring one execution of the function. This can give you very wrong results due to the non-deterministic nature of garbage collection, bytecode optimization, and other quirks of languages like Python. You should look into the timeit module instead.
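For example, a sketch of the question's loop measured with timeit (Python 3.5+ for the globals argument; the call and repeat counts are arbitrary):
import timeit

def sumOfN(n):
    theSum = 0
    for i in range(1, n + 1):
        theSum = theSum + i
    return theSum

# best of 5 repeats of 100 calls each; the minimum is the least noisy statistic
best = min(timeit.repeat('sumOfN(300000)', globals=globals(),
                         number=100, repeat=5))
print('%.7f seconds per call' % (best / 100))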

What are the differences in Pythons time.clock() in Mac vs Windows?

I am using Python's time to gauge the time frame of a Selenium process. My script is like this...
start_time = time.clock()
...
#ending with
final_time = '{0:.2f}'.format(time.clock()-start_time)
When run on Windows I will get something like 55.22, but when run on the Mac it will return something like 0.14, even though both took about the same time.
Any idea what is happening differently on the Mac? I am actually going to try on Ubuntu as well to see the differences.
Per the documentation, time.clock is different between Unix (including Mac OS X) and Windows:
On Unix, return the current processor time as a floating point number
expressed in seconds. The precision, and in fact the very definition
of the meaning of “processor time”, depends on that of the C function
of the same name, but in any case, this is the function to use for
benchmarking Python or timing algorithms.
On Windows, this function returns wall-clock seconds elapsed since the
first call to this function, as a floating point number, based on the
Win32 function QueryPerformanceCounter(). The resolution is typically
better than one microsecond.
If you want cross-platform consistency, consider time.time.
The difference between processor time and wall-clock time is explained in this article by Doug Hellmann - basically the processor clock is only advancing if your process is doing work.
The timeit module in the standard library uses timeit.default_timer to measure wall time:
if sys.platform == "win32":
# On Windows, the best timer is time.clock()
default_timer = time.clock
else:
# On most other platforms the best timer is time.time()
default_timer = time.time
help(timeit) explains:
The difference in default timer function is because on Windows,
clock() has microsecond granularity but time()'s granularity is 1/60th
of a second; on Unix, clock() has 1/100th of a second granularity and
time() is much more precise. On either platform, the default timer
functions measure wall clock time, not the CPU time. This means that
other processes running on the same computer may interfere with the
timing. The best thing to do when accurate timing is necessary is to
repeat the timing a few times and use the best time. The -r option is
good for this; the default of 3 repetitions is probably enough in most
cases. On Unix, you can use clock() to measure CPU time.
So for cross-platform consistency you could use
import timeit
clock = timeit.default_timer
start_time = clock()
...
final_time = clock()
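Filled in with a sleep as a stand-in for the elided work, the whole pattern looks like this:
import time
import timeit

clock = timeit.default_timer  # wall-clock timer on every platform

start_time = clock()
time.sleep(1)  # stand-in for the Selenium steps being timed
final_time = '{0:.2f}'.format(clock() - start_time)
print(final_time)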

Measuring elapsed time in python

Is there a simple way / module to correctly measure the elapsed time in python? I know that I can simply call time.time() twice and take the difference, but that will yield wrong results if the system time is changed. Granted, that doesn't happen very often, but it does indicate that I'm measuring the wrong thing.
Using time.time() to measure durations is incredibly roundabout when you think about it. You take the difference of two absolute time measurements which are in turn constructed from duration measurements (performed by timers) and known absolute times (set manually or via ntp), that you aren't interested in at all.
So, is there a way to query this "timer time" directly? I'd imagine that it can be represented as a millisecond or microsecond value that has no meaningful absolute representation (and thus doesn't need to be adjusted with system time). Looking around a bit it seems that this is exactly what System.nanoTime() does in Java, but I did not find a corresponding Python function, even though it should (hardware-technically) be easier to provide than time.time().
Edit: To avoid confusion and address the answers below: This is not about DST changes, and I don't want CPU time either - I want elapsed physical time. It doesn't need to be very fine-grained, and not even particularly accurate. It just shouldn't give me negative durations, or durations which are off by several orders of magnitude (above the granularity), just because someone decided to set the system clock to a different value. Here's what the Python docs say about 'time.time()':
"While this function normally returns non-decreasing values, it can return a lower value than a previous call if the system clock has been set back between the two calls"
This is exactly what I want to avoid, since it can lead to strange things like negative values in time calculations. I can work around this at the moment, but I believe it is a good idea to learn using the proper solutions where feasible, since the kludges will come back to bite you one day.
Edit2: Some research shows that you can get a system time independent measurement like I want in Windows by using GetTickCount64(), under Linux you can get it in the return value of times(). However, I still can't find a module which provides this functionality in Python.
For measuring elapsed CPU time, look at time.clock(). This is the equivalent of Linux's times() user time field.
For benchmarking, use timeit.
The datetime module, which is part of Python 2.3+, also has microsecond time if supported by the platform.
Example:
>>> import datetime as dt
>>> n1=dt.datetime.now()
>>> n2=dt.datetime.now()
>>> (n2-n1).microseconds
678521
>>> (n2.microsecond-n1.microsecond)/1e6
0.678521
i.e., it took me 0.678521 seconds to type the second n2= line -- slow
>>> n1.resolution
datetime.timedelta(0, 0, 1)
1/1e6 resolution is claimed.
If you are concerned about system time changes (such as DST transitions), just check the object returned by datetime. Presumably, the system time could have a small adjustment from an NTP reference adjustment. This should be slewed, and corrections applied gradually, but NTP sync beats can have an effect on very small (millisecond or microsecond) time references.
You can also reference Alex Martelli's C function if you want something of that resolution. I would not go too far to reinvent the wheel; accurate time is basic, and most modern OSes do a pretty good job.
Edit
Based on your clarifications, it sounds like you need a simple side check if the system's clock has changed. Just compare to a friendly, local ntp server:
import socket
import struct
import time

ntp = "pool.ntp.org"  # or whatever ntp server you have handy
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
data = b'\x1b' + 47 * b'\0'
client.sendto(data, (ntp, 123))
data, address = client.recvfrom(1024)
if data:
    print('Response received from:', address)
    t = struct.unpack('!12I', data)[10]
    t -= 2208988800  # convert from NTP epoch (1900) to Unix epoch (1970)
    print('\tTime=%s' % time.ctime(t))
NTP is accurate to milliseconds over the Internet and has a representation resolution of 2^-32 seconds (about 233 picoseconds). Should be good enough?
Be aware that the NTP 64 bit data structure will overflow in 2036 and every 136 years thereafter -- if you really want a robust solution, better check for overflow...
What you seem to be looking for is a monotonic timer. A monotonic time reference does not jump or go backwards.
There have been several attempts to implement a cross-platform monotonic clock for Python based on the OS's reference for it (Windows, POSIX and BSD are quite different). See the discussions and some of the attempts at monotonic time in this SO post.
Mostly, you can just use os.times():
os.times()
Return a 5-tuple of floating point numbers indicating
accumulated (processor or other) times, in seconds. The items are:
user time, system time, children’s user time, children’s system time,
and elapsed real time since a fixed point in the past, in that order.
See the Unix manual page times(2) or the corresponding Windows
Platform API documentation. On Windows, only the first two items are
filled, the others are zero.
Availability: Unix, Windows
But that does not fill in the needed elapsed real time (the fifth tuple item) on Windows.
If you need Windows support, consider ctypes: you can call GetTickCount64() directly, as has been done in this recipe.
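A minimal Windows-only sketch via ctypes; GetTickCount64 lives in kernel32 and returns milliseconds since boot as an unsigned 64-bit integer:
import ctypes

kernel32 = ctypes.windll.kernel32  # Windows only
kernel32.GetTickCount64.restype = ctypes.c_uint64

uptime_ms = kernel32.GetTickCount64()
print(uptime_ms / 1000.0, 'seconds since boot')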
Python 3.3 added a monotonic timer into the standard library, which does exactly what I was looking for. Thanks to Paddy3118 for pointing this out in "How do I get monotonic time durations in python?".
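Usage is about as simple as it gets (a quick sketch; the sleep stands in for real work):
import time

start = time.monotonic()  # immune to system clock adjustments
time.sleep(1.5)           # the work being timed
print(time.monotonic() - start, 'seconds elapsed')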
>>> import datetime
>>> t1=datetime.datetime.utcnow()
>>> t2=datetime.datetime.utcnow()
>>> t2-t1
datetime.timedelta(0, 8, 600000)
Using UTC avoids those embarrassing periods when the clock shifts due to daylight saving time.
As for using an alternate method rather than subtracting two clocks, be aware that the OS does actually contain a clock which is initialized from a hardware clock in the PC. Modern OS implementations will also keep that clock synchronized with some official source so that it doesn't drift. This is much more accurate than any interval timer the PC might be running.
You can use the perf_counter function of the time module in the Python standard library:
from datetime import timedelta
from time import perf_counter
startTime = perf_counter()
CallYourFunc()
finishedTime = perf_counter()
duration = timedelta(seconds=(finishedTime - startTime))
The example functions you state in your edit are two completely different things:
Linux times() returns process times in CPU milliseconds. Python's equivalent is time.clock() or os.times().
Windows GetTickCount64() returns system uptime.
Although they are two different functions, both could potentially be used to reveal a system clock that had a "burp", with these methods:
First:
You could take both a system time with time.time() and a CPU time with time.clock(). Since wall clock time will ALWAYS be greater than or equal to CPU time, discard any measurements where the interval between the two time.time() readings is less than the paired time.clock() check readings.
Example:
t1 = time.time()
t1check = time.clock()
# your timed event...
t2 = time.time()
t2check = time.clock()

if t2 - t1 < t2check - t1check:
    print("Things are rotten in Denmark")
    # discard that sample
else:
    pass  # do what you do with t2 - t1...
Second:
Getting system uptime is also promising if you are concerned about the system's clock, since a user reset does not reset the uptime tick count in most cases (that I am aware of...).
Now the harder question: getting system uptime in a platform-independent way -- especially without spawning a new shell -- at sub-second accuracy. Hmmm...
Probably the best bet is psutil. Browsing the source, they use uptime = GetTickCount() / 1000.00f; for Windows and sysctl "kern.boottime" for BSD / OS X, etc. Unfortunately, these are all 1 second resolution.
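For what it's worth, a rough sketch using psutil's boot_time(), assuming psutil is installed; note that it leans on time.time() again to get "now":
import time
import psutil

# boot_time() returns the boot instant as epoch seconds (whole-second resolution)
uptime = time.time() - psutil.boot_time()
print(round(uptime), 'seconds since boot')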
from datetime import datetime

start = datetime.now()
print('hello')
end = datetime.now()
delta = end - start
print(type(delta))
# <class 'datetime.timedelta'>

import datetime
help(datetime.timedelta)
# ...elapsed seconds and microseconds...
After using time.monotonic from Python 3.3+, I found that on Mac it works well, never going backwards. On Windows, where it uses GetTickCount64(), it can very rarely go backwards by a substantial amount (for the purposes of my program, in excess of 5.0). Adding a wrapper can prevent monotonic from going backwards:
import threading, time

_lock = threading.Lock()
_last_value = _offset = 0.0

def safe_monotonic():
    global _last_value, _offset
    with _lock:
        original_value = time.monotonic()
        if original_value < _last_value:
            # You can post a metric here to monitor frequency and size of backward jumps
            _offset += _last_value - original_value  # accumulate so later jumps stay monotonic too
        _last_value = original_value
        return _offset + original_value
How often did it go backwards? Perhaps a handful of times over many days across millions of machines, and again, only on Windows. Sorry, I did not track which versions of Windows; I can update this later if people wish.

Python execute a function for X seconds

I'm looking for a way for a function to take actions based on how long it has been executing. For example, my function would loop continuously until 5 seconds have elapsed, at which point it returns immediately. Any suggestions?
Have you looked at time.clock()?
time.clock()
On Unix, return the current processor time as a floating point number expressed in seconds. The precision, and in fact the very definition of the meaning of “processor time”, depends on that of the C function of the same name, but in any case, this is the function to use for benchmarking Python or timing algorithms.
On Windows, this function returns wall-clock seconds elapsed since the first call to this function, as a floating point number, based on the Win32 function QueryPerformanceCounter(). The resolution is typically better than one microsecond.
Using 'time.clock()' to measure time on Windows:
>>> import time
>>> def measure():
...     t0 = time.clock()
...     time.sleep(3)
...     return time.clock() - t0
...
>>> measure()
2.9976609581514113
>>>
Another option is to use signal.alarm() with an appropriate signal handler, documented at http://docs.python.org/library/signal.html. A particular advantage to this approach is not having to check the time every time you loop, which may add significant overhead for small, tight loops.
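A minimal Unix-only sketch of the signal.alarm() approach; the handler name and the busy loop are placeholders of mine:
import signal

def on_alarm(signum, frame):
    raise TimeoutError('five seconds elapsed')

signal.signal(signal.SIGALRM, on_alarm)  # Unix only
signal.alarm(5)  # deliver SIGALRM after 5 seconds
try:
    while True:
        pass  # the work that should stop after 5 seconds
except TimeoutError:
    pass
finally:
    signal.alarm(0)  # cancel any pending alarm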
