Why is \r in Python print function faster but less smooth? - python

Say I have some simple code such as the following:
import time
tic = time.perf_counter()
for x in range(1000):
    print(str(x))
toc = time.perf_counter()
print(toc - tic)
It prints 1000 things, and then prints the time it took to do it. When I run this, I get:
0
1
2
3
4
5
6
… skipped for brevity …
995
996
997
998
999
2.0521691879999997
A little more than two seconds. Not bad.
Now say I want only a single number showing at a time. I could do:
import time
tic = time.perf_counter()
for x in range(1000):
    print('\r' + str(x), end = '')
toc = time.perf_counter()
print()
print(toc - tic)
I get (at the very end)
999
0.46631713999999996
This seems weird, because the second script prints more things (some \r’s and x) than the first script (just x). So why is it faster?
But in the second script, the output is far from smooth. It looks like the program is counting by 110s. Why doesn’t it just start spitting out numbers rapidly like the first one?
By the way, sys.stdout.write() is apparently faster than print(). With the former, both scripts run at about the same speed, but the second script is still not smooth.
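One way to probe the jumpiness, under the assumption (not established in the question) that it comes from output buffering: when nothing ends the line, text may sit in Python's buffer and reach the terminal in chunks. The sketch below writes the counter in place and optionally flushes after every number. count_in_place and its parameters are hypothetical names.

```python
import sys
import time

def count_in_place(n, stream=sys.stdout, flush=True):
    """Overwrite a single line with 0..n-1 and return the elapsed seconds."""
    tic = time.perf_counter()
    for x in range(n):
        stream.write('\r' + str(x))
        if flush:
            # Push this number out now instead of waiting for the buffer to fill.
            stream.flush()
    stream.write('\n')
    return time.perf_counter() - tic
```

Calling count_in_place(1000, flush=True) and then count_in_place(1000, flush=False) in a terminal should show whether flushing trades speed for smoothness on a given setup.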

Related

Performance of searching a number in an array

I was doing some tests about finding a number in a number array with python. With the following code,
from time import time
search = 9999999
numbers = []
for i in range(100000000):
    numbers.append(i)
start_time = time()
is_in = search in numbers
end_time = time()
print(is_in, end_time - start_time)
I got the output as follows:
True 0.10372281074523926
However, the actual elapsed time seems much longer than the reported figure (nearly 4 seconds). In addition, when I change the search value to 0, it outputs the following:
True 0.0
But the program still needs nearly 4-5 seconds to terminate (measured by human instinct). I wonder what the reason behind this is. Why does it not finish after 0.1 seconds as measured, and why does searching for 0 result in 0.0 seconds?
How long do you think it takes to build your numbers list, especially when doing so in the most inefficient way? Well, let's check it, but let's check it the right way, using timeit:
>>> def foo():
...     l = []
...     for i in range(100000000): l.append(i)
...     return l
...
>>> import timeit
>>> timeit.timeit("foo()", "from __main__ import foo", number=1)
6.561729616951197
So on this desktop (which is a rather decent machine), just creating this list already takes 6.5 seconds.
Now let's test the linear search:
>>> def search(i, num):
...     return i in num
...
>>> numbers = foo()
>>> timeit.timeit("search(9999999, numbers)", "from __main__ import search, numbers", number=1)
0.06766342208720744
So we need 6.5 seconds to build the list, and 0.067 seconds to do a linear search. Note that in both cases we executed the code under test only once (the number=1 argument to timeit), which is not really accurate because of OS process scheduling. For a more accurate reading, repeat the operation thousands of times or more (the default value is actually 1000000!) so you get a reasonably representative average value.
Now just for the fun let's rewrite foo():
>>> def foo():
...     return list(range(100000000))
...
>>> timeit.timeit("foo()", "from __main__ import foo", number=1)
2.594872738001868
That's still long, but it's about 2.5 times faster. If you wonder why: this way the runtime can allocate the required memory for the full list right from the start instead of having to grow it again and again.
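A scaled-down sketch of the same comparison, assuming a list of one million items so that it runs quickly. build_by_append and build_from_range are hypothetical names, and the timings will differ from machine to machine.

```python
import timeit

N = 1_000_000  # assumed size, much smaller than the 100,000,000 used above

def build_by_append(n=N):
    """Grow the list one element at a time."""
    result = []
    for i in range(n):
        result.append(i)
    return result

def build_from_range(n=N):
    """Let list() consume the whole range in a single call."""
    return list(range(n))

# Passing the callables directly avoids the "from __main__ import ..." setup string.
append_time = timeit.timeit(build_by_append, number=5)
range_time = timeit.timeit(build_from_range, number=5)
print(append_time, range_time)
```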
And for a much more efficient (and constant time !) search:
>>> numset = set(numbers)
>>> timeit.timeit("search(9999999, numset)", "from __main__ import search, numset", number=1)
3.505963832139969e-06
Wait! 3.5-something seconds? No: notice the e-06 at the end. It's actually 0.00000350596383213996 seconds, so almost 20000 times faster.
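The advice about repeating the measurement can be sketched as follows, assuming a much smaller list (100,000 items) so the repeated linear searches stay cheap. best_time is a hypothetical helper; it takes the minimum over several runs, which tends to be less sensitive to scheduling noise than a single run. Keep in mind that building the set has its own cost, which this sketch does not measure.

```python
import timeit

numbers = list(range(100_000))  # assumed size, smaller than in the text
numset = set(numbers)
target = 99_999  # last element: the worst case for the linear search

def best_time(container, number=20, repeat=5):
    """Return the best per-call time, in seconds, of `target in container`."""
    timer = timeit.Timer(lambda: target in container)
    return min(timer.repeat(repeat=repeat, number=number)) / number

list_time = best_time(numbers)
set_time = best_time(numset)
print(list_time, set_time)
```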

Why are Python operations 30× slower after calling time.sleep or subprocess.Popen?

Consider the following loop:
import subprocess
import time

for i in range(20):
    if i == 10:
        subprocess.Popen(["echo"]) # command 1
    t_start = time.time()
    1+1 # command 2
    t_stop = time.time()
    print(t_stop - t_start)
“Command 2” systematically takes longer to run when “command 1” has run before it. The following plot shows the execution time of 1+1 as a function of the loop index i, averaged over 100 runs.
Execution of 1+1 is 30 times slower when preceded by subprocess.Popen.
It gets even weirder. One may think that only the first command run after subprocess.Popen() is affected, but that is not the case. The following loop shows that all commands in the current loop iteration are affected, while the subsequent loop iterations seem to be mostly OK.
var = 0
for i in range(20):
    if i == 10:
        # command 1
        subprocess.Popen(['echo'])
    # command 2a
    t_start = time.time()
    1 + 1
    t_stop = time.time()
    print(t_stop - t_start)
    # command 2b
    t_start = time.time()
    print(1)
    t_stop = time.time()
    print(t_stop - t_start)
    # command 2c
    t_start = time.time()
    var += 1
    t_stop = time.time()
    print(t_stop - t_start)
Here’s a plot of the execution times for this loop, averaged over 100 runs:
More remarks:
We get the same effect when replacing subprocess.Popen() (“command 1”) with time.sleep(), or with the initialization of rawkit’s libraw C++ bindings (libraw.bindings.LibRaw()). However, using other libraries with C++ bindings, such as libraw.py or OpenCV’s cv2.warpAffine(), does not affect execution times. Opening files doesn’t either.
The effect is not caused by time.time(), because it is visible with timeit.timeit(), and even when measuring manually when the print() results appear.
It also happens without a for-loop.
This happens even when a lot of different (possibly CPU- and memory-consuming) operations are performed between “command 1” (subprocess.Popen) and “command 2”.
With Numpy arrays, the slowdown appears to be proportional to the size of the array. With relatively big arrays (~ 60 M points), a simple arr += 1 operation can take up to 300 ms!
Question: What may cause this effect, and why does it affect only the current loop iteration?
I suspect that it could be related to context switching, but this doesn’t seem to explain why a whole loop iteration would be affected. If context switching is indeed the cause, why do some commands trigger it while others don’t?
My guess would be that this is due to the Python code being evicted from various caches in the CPU/memory system.
The perflib package can be used to extract more detailed CPU-level stats about the state of the cache, i.e. the number of hits/misses.
I get ~5 times the LIBPERF_COUNT_HW_CACHE_MISSES counter after the Popen() call:
from subprocess import Popen, DEVNULL
from perflib import PerfCounter
import numpy as np
arr = []
p = PerfCounter('LIBPERF_COUNT_HW_CACHE_MISSES')
for i in range(100):
    ti = []
    p.reset()
    p.start()
    ti.extend(p.getval() for _ in range(7))
    Popen(['echo'], stdout=DEVNULL)
    ti.extend(p.getval() for _ in range(7))
    p.stop()
    arr.append(ti)
np.diff(np.array(arr), axis=1).mean(axis=0).astype(int).tolist()
gives me:
2605, 2185, 2127, 2099, 2407, 2120,
5481210,
16499, 10694, 10398, 10301, 10206, 10166
(lines broken in non-standard places to indicate code flow)

Inaccurate while loop timing in Python

I am trying to log data at a high sampling rate using a Raspberry Pi 3 B+. In order to achieve a fixed sampling rate, I am delaying the while loop, but I always get a sample rate that is a little less than I specify.
For 2500 Hz I get ~2450 Hz
For 5000 Hz I get ~4800 Hz
For 10000 Hz I get ~9300 Hz
Here is the code that I use to delay the while loop:
import time
count=0
while True:
    sample_rate=5000
    time_start=time.perf_counter()
    count+=1
    while (time.perf_counter()-time_start) < (1/sample_rate):
        pass
    if count == sample_rate:
        print(1/(time.perf_counter()-time_start))
        count=0
I have also tried updating to Python 3.7 and used time.perf_counter_ns(), but it does not make a difference.
The problem you are seeing arises because your code reads the real time at the start of each delay in the loop. Time spent in untimed code, plus jitter from OS multitasking, therefore accumulates and pushes the overall rate below what you want to achieve.
To greatly increase the timing accuracy, use the fact that each loop "should" finish one period (1/sample_rate) after it should have started. Maintain that start time as an absolute calculation rather than reading the real time, and wait until one period after that absolute start time; then there is no drift in the timing.
I put your timing into timing_orig and my revised code using absolute times into timing_new - and results are below.
import time
def timing_orig(ratehz,timefun=time.clock):
    count=0
    while True:
        sample_rate=ratehz
        time_start=timefun()
        count+=1
        while (timefun()-time_start) < (1.0/sample_rate):
            pass
        if count == ratehz:
            break
def timing_new(ratehz,timefun=time.clock):
    count=0
    delta = (1.0/ratehz)
    # record the start of the sequence of timed periods
    time_start=timefun()
    while True:
        count+=1
        # this period ends delta from "now" (now is the time_start PLUS a number of deltas)
        time_next = time_start+delta
        # wait until the end time has passed
        while timefun()<time_next:
            pass
        # calculate the idealised "now" as delta from the start of this period
        time_start = time_next
        if count == ratehz:
            break
def timing(functotime,ratehz,ntimes,timefun=time.clock):
    starttime = timefun()
    for n in range(int(ntimes)):
        functotime(ratehz,timefun)
    endtime = timefun()
    # print endtime-starttime
    return ratehz*ntimes/(endtime-starttime)
if __name__=='__main__':
    print "new 5000",timing(timing_new,5000.0,10.0)
    print "old 5000",timing(timing_orig,5000.0,10.0)
    print "new 10000",timing(timing_new,10000.0,10.0)
    print "old 10000",timing(timing_orig,10000.0,10.0)
    print "new 50000",timing(timing_new,50000.0,10.0)
    print "old 50000",timing(timing_orig,50000.0,10.0)
    print "new 100000",timing(timing_new,100000.0,10.0)
    print "old 100000",timing(timing_orig,100000.0,10.0)
Results:
new 5000 4999.96331002
old 5000 4991.73952992
new 10000 9999.92662005
old 10000 9956.9314274
new 50000 49999.6477761
old 50000 49591.6104893
new 100000 99999.2172809
old 100000 94841.227219
Note I didn't use time.sleep() because it introduced too much jitter. Also, note that even though this minimal example shows very accurate timing even up to 100 kHz on my Windows laptop, if you put more code into the loop than there is time to execute, the timing will run correspondingly slow.
Apologies, I used Python 2.7, which doesn't have the very convenient time.perf_counter() function. To use it, pass an extra parameter timefun=time.perf_counter (the function itself, without parentheses) to each of the calls to timing(), and convert the print statements to print() calls.
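For Python 3, the absolute-deadline idea might look like the sketch below. It is a variant rather than the code above: each deadline is computed as start + k * period instead of being carried forward from the previous one. run_fixed_rate, work and clock are hypothetical names, and the loop busy-waits, as in the answer.

```python
import time

def run_fixed_rate(rate_hz, n_ticks, work=lambda: None, clock=time.perf_counter):
    """Call work() n_ticks times, one period apart, and return the elapsed time."""
    period = 1.0 / rate_hz
    start = clock()
    for k in range(1, n_ticks + 1):
        work()
        # Absolute deadline derived from the start time, so delays in one
        # iteration do not shift the deadlines of later iterations.
        deadline = start + k * period
        while clock() < deadline:
            pass  # busy-wait; time.sleep() is avoided because of its jitter
    return clock() - start
```

For example, 5000 / run_fixed_rate(5000, 5000) gives the achieved rate over roughly one second, as long as work() fits comfortably inside one period.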
I think you can fix this pretty easily by rearranging your code as such:
import time
count=0
sample_rate=5000
while True:
    time_start=time.perf_counter()
    # do all the real stuff here
    while (time.perf_counter()-time_start) < (1/sample_rate):
        pass
This way python does the waiting after you execute the code, rather than before, so the time the interpreter takes to run it will not be added to your sample rate. As danny said, it's an interpreted language so that might introduce timing inconsistencies, but this way should at least decrease the effect you are seeing.
Edit for proof that this works:
import sys
import time
count=0
sample_rate=int(sys.argv[1])
run_start = time.time()
while True:
    time_start=time.time()
    a = range(10)
    b = range(10)
    for x in a:
        for y in b:
            c = a+b
    count += 1
    if count == sample_rate*2:
        break
    while (time.time()-time_start) < (1.0/sample_rate):
        pass
real_rate = sample_rate*2/(time.time()-run_start)
print real_rate, real_rate/sample_rate
So the testing code does a solid amount of random junk for 2 seconds and then prints the real rate and the ratio of the real rate to the requested rate. Here are some results:
~ ><> python t.py 1000
999.378471674 0.999378471674
~ ><> python t.py 2000
1995.98713838 0.99799356919
~ ><> python t.py 5000
4980.90553757 0.996181107514
~ ><> python t.py 10000
9939.73553783 0.993973553783
~ ><> python t.py 40000
38343.706669 0.958592666726
So, not perfect. But definitely better than a ~700Hz drop at a desired 10000. The accepted answer is definitely the right one.

Time.sleep inaccurate for Python counter?

I'd like to create a revenue counter for the sales team at work and would love to use Python. E.g. Joe Bloggs shifts his target from 22.1 to 23.1 (difference of 1.0.) I'd like the counter to tick evenly from 22.1 to 23.1 over an hour.
I've created this script, which works fine for counting a minute (runs 2 seconds over the minute); however, when it's supposed to run for an hour, it runs for 47 minutes.
Question: Does anyone know why it runs faster when I set it to an hour? Is time.sleep inaccurate?
import time
def rev_counter(time_length):
    time_start = (time.strftime("%H:%M:%S"))
    prev_pp = 22.1
    new_pp = 23.1
    difference = new_pp - prev_pp
    iter_difference = (difference / 100000.) # Divide by 100,000 to show 10 decimal places
    time_difference = ((time_length / difference) / 100000.)
    i = prev_pp
    while i < new_pp:
        print("%.10f" % i)
        i = i + iter_difference
        time.sleep(time_difference)
    time_end = (time.strftime("%H:%M:%S"))
    print "Time started at", time_start
    print "Time ended at", time_end
rev_counter(60) # 60 seconds. Returns 62 seconds
rev_counter(600) # 10 minutes. Returns 10 minutes, 20 secs
rev_counter(3600) # 1 hour. Returns 47 minutes
Please note this quote from the Python documentation for time.sleep()
The actual suspension time may be less than that requested because any
caught signal will terminate the sleep() following execution of that
signal's catching routine. Also, the suspension time may be longer
than requested by an arbitrary amount because of the scheduling of
other activity in the system.
As a suggestion, if faced with this problem, I would use a variable to track the time that the interval starts. When sleep wakes up, check to see if the expected time has elapsed. If not, restart a sleep for the difference, etc.
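That suggestion could be sketched roughly as follows in Python 3. sleep_until and tick_evenly are hypothetical names, and time.monotonic() is an assumed choice of clock. Every deadline is measured from the same starting point, so a sleep that runs long or short does not shift the later ticks.

```python
import time

def sleep_until(deadline, clock=time.monotonic, sleep=time.sleep):
    """Keep sleeping until clock() has reached the deadline."""
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            return
        sleep(remaining)  # may wake early or late; the loop re-checks

def tick_evenly(start_value, end_value, duration, steps):
    """Yield steps + 1 values from start_value to end_value over duration seconds."""
    t0 = time.monotonic()
    for k in range(steps + 1):
        sleep_until(t0 + duration * k / steps)
        yield start_value + (end_value - start_value) * k / steps
```

Something like `for value in tick_evenly(22.1, 23.1, 3600, 100000): print("%.10f" % value)` would then spread the counter over the hour regardless of how long each print takes, as long as each step's work fits within its slot.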
First of all, your loop doesn't contain only sleep statements: the things you do between calls to time.sleep take time too, so if you do 10 repetitions, you'll spend only 10% of the time doing these compared to when you have 100 iterations through your loop.
Is time.sleep inaccurate?
Yes. Or well. Quite.
I come from a real-time signal processing background. PC clocks are only somewhat accurate, and a significant amount of time passes between the moment a piece of hardware signals that your interval has elapsed and the moment your software notices: time spent in your OS, your standard libraries, your scripting language runtime, and your scripting logic.
I just noticed time.sleep taking way too long (5-30000 times longer for input values between .0001 and 1 second) and, searching for an answer, found this thread. I ran some tests and it does this consistently (see code and results below). The weird thing is that after a restart it was back to normal, working very accurately. When my code started to hang, it was time.sleep taking 10000 times too long?!
So a restart is a temporary solution, but I'm not sure what the cause or the permanent solution is.
import numpy as np
import time
def test_sleep(N,w):
    data = []
    for i in xrange(N):
        t0 = time.time()
        time.sleep(w)
        t1 = time.time()
        data.append(t1-t0)
    print "ave = %s, min = %s, max = %s" %(np.average(data), np.min(data), np.max(data))
    return data
data1 = test_sleep(20,.0001)
Out: ave = 2.95489487648, min = 1.11787080765, max = 3.23506307602
print data1
Out: [3.1929759979248047,
3.121081829071045,
3.1982388496398926,
3.1221959590911865,
3.098078966140747,
3.131525993347168,
3.12644100189209,
3.1535091400146484,
3.2167508602142334,
3.1277999877929688,
3.1103289127349854,
3.125699996948242,
3.1129801273345947,
3.1223208904266357,
3.1313750743865967,
3.1280829906463623,
1.117870807647705,
1.3357980251312256,
3.235063076019287,
3.189779043197632]
data2 = test_sleep(20, 1)
Out: ave = 9.44276217222, min = 1.00008392334, max = 10.9998381138
print data2
Out: [10.999573945999146,
10.999622106552124,
3.8115758895874023,
1.0000839233398438,
3.3502109050750732,
10.999613046646118,
10.99983811378479,
10.999617099761963,
10.999662160873413,
10.999619960784912,
10.999650955200195,
10.99962306022644,
10.999721050262451,
10.999620914459229,
10.999532222747803,
10.99965500831604,
10.999596118927002,
10.999563932418823,
10.999600887298584,
4.6992621421813965]

What is the best/most efficient way to output value every x seconds during a loop

I have always been curious about this as the simple way is definitely not efficient. How would you efficiently go about outputting a value every x seconds?
Here is an example of what I mean:
import time
num = 50000000
startTime = time.time()
j=0
for i in range(num):
    j = (((j+10)**0.5)**2)**0.5
print time.time() - startTime
#output time: 24 seconds
startTime = time.time()
newTime = time.time()
j=0
for i in range(num):
    j = (((j+10)**0.5)**2)**0.5
    if time.time() - newTime > 0.5:
        newTime = time.time()
        print i
print time.time() - startTime
#output time: 32 seconds
A whole 1/3rd faster when not outputting the progress every half a second.
I know this is because it requires an extra calculation every loop, but the same applies with other similar checks you may want to do - how would you go about implementing something like this without seriously affecting the execution time?
Well, you know that you're doing many iterations per second, so you really don't need to make the time.time() call on every iteration. You can use a modulo operator to only actually check if you need to output something every N iterations of the loop.
startTime = time.time()
newTime = time.time()
j=0
for i in range(num):
    j = (((j+10)**0.5)**2)**0.5
    if i % 50 == 0: # Only check every 50th iteration
        if time.time() - newTime > 0.5:
            newTime = time.time()
            print i, newTime
print time.time() - startTime
# 45 seconds (the original version took 42 on my system)
Checking only every 50 iterations reduces my run time from 56 seconds to 43 (the original with no printing took 42, and Tom Page's solution took 50 seconds), and the iterations complete quickly enough that it's still outputting exactly every 0.5 seconds according to time.time():
0 1409083225.39
605000 1409083225.89
1201450 1409083226.39
1821150 1409083226.89
2439250 1409083227.39
3054400 1409083227.89
3644100 1409083228.39
4254350 1409083228.89
4831600 1409083229.39
5433450 1409083229.89
6034850 1409083230.39
6644400 1409083230.89
7252650 1409083231.39
7840100 1409083231.89
8438300 1409083232.39
9061200 1409083232.89
9667350 1409083233.39
...
You might save a few clock cycles by keeping track of the next time that a print is due:
nexttime = time.time() + 0.5
Then your condition will be a simple comparison:
if time.time() >= nexttime:
as opposed to a subtraction followed by a comparison:
if time.time() - newTime > 0.5:
You'll only have to do an addition after each message, as opposed to doing a subtraction on each iteration.
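Put together with the every-N-iterations check from the earlier answer, the idea might look like this in Python 3. It is a sketch only: run_with_progress, check_every, clock and report are hypothetical names, and the clock is a parameter purely so the behaviour can be tested.

```python
import time

def run_with_progress(num, interval=0.5, check_every=50,
                      clock=time.time, report=print):
    """Run the toy computation, reporting the loop index every `interval` seconds."""
    j = 0.0
    next_time = clock() + interval
    for i in range(num):
        j = (((j + 10) ** 0.5) ** 2) ** 0.5
        # The clock is only consulted on every check_every-th iteration.
        if i % check_every == 0 and clock() >= next_time:
            report(i)
            next_time += interval  # one addition per message
    return j
```

Advancing next_time by a fixed interval keeps the reports on an even schedule; setting it to clock() + interval instead would space them at least `interval` apart.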
I tried it with a sideband thread doing the printing. It added 5 seconds to exec time on Python 2.x but virtually no extra time on Python 3.x. Python 2.x threads have a lot of overhead. Here's my example with timing included as comments:
import time
import threading
def showit(event):
    global i # could pass in a mutable object instead
    while not event.is_set():
        event.wait(.5)
        print 'value is', i
num = 50000000
startTime = time.time()
j=0
for i in range(num):
    j = (((j+10)**0.5)**2)**0.5
print time.time() - startTime
#output time: 23 seconds
event = threading.Event()
showit_thread = threading.Thread(target=showit, args=(event,))
showit_thread.start()
startTime = time.time()
j=0
for i in range(num):
    j = (((j+10)**0.5)**2)**0.5
event.set()
time.sleep(.1)
print time.time() - startTime
#output time: 28 seconds
If you want to wait a specified period of time before doing something, just use the time.sleep() method.
for i in range(100):
    print(i)
    time.sleep(0.5)
This will wait half a second before printing the next value of i.
If you don't care about Windows, signal.setitimer will be simpler than using a background thread, and on many *nix platforms a whole lot more efficient.
Here's an example:
import signal
import time
num = 50000000
startTime = time.time()
def ontimer(sig, frame):
    global i
    print(i)
signal.signal(signal.SIGVTALRM, ontimer)
signal.setitimer(signal.ITIMER_VIRTUAL, 0.5, 0.5)
j=0
for i in range(num):
    j = (((j+10)**0.5)**2)**0.5
signal.setitimer(signal.ITIMER_VIRTUAL, 0)
print(time.time() - startTime)
This is about as close to free as you're going to get performance-wise.
In some use cases, a virtual timer isn't sufficiently accurate, so you need to change that to ITIMER_REAL and change the signal to SIGALRM. That's a little more expensive, but still pretty cheap, and still dead simple.
On some (older) *nix platforms, alarm may be more efficient than setitimer, but unfortunately alarm only takes integral seconds, so you can't use it to fire twice per second.
Timings from my MacBook Pro:
no output: 15.02s
SIGVTALRM: 15.03s
SIGALRM: 15.44s
thread: 19.9s
checking time.time(): 22.3s
(I didn't test with either dano's optimization or Tom Page's; obviously those will reduce the 22.3, but they're not going to get it down to 15.44…)
Part of the problem here is that you're using time.time.
On my MacBook Pro, time.time takes more than 1/3rd as long as all of the work you're doing:
In [2]: %timeit time.time()
10000000 loops, best of 3: 105 ns per loop
In [3]: %timeit (((j+10)**0.5)**2)**0.5
1000000 loops, best of 3: 268 ns per loop
And that 105ns is fast for time—e.g., an older Windows box with no better hardware timer than ACPI can take 100x longer.
On top of that, time.time is not guaranteed to have enough precision to do what you want anyway:
Note that even though the time is always returned as a floating point number, not all systems provide time with a better precision than 1 second.
Even on platforms where it has better precision than 1 second, it may have a lower accuracy; e.g., it may only be updated once per scheduler tick.
And time isn't even guaranteed to be monotonic; on some platforms, if the system time changes, time may go down.
Calling it less often will solve the first problem, but not the others.
So, what can you do?
Unfortunately, there's no built-in answer, at least not with Python 2.7. The best solution is different on different platforms—probably GetTickCount64 on Windows, clock_gettime with the appropriate clock ID on most modern *nixes, gettimeofday on most other *nixes. These are relatively easy to use via ctypes if you don't want to distribute a C extension… but someone really should wrap it all up in a module and post it on PyPI, and unfortunately I couldn't find one…
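For readers on Python 3.3 or later, which is outside the Python 2.7 setting this answer assumes, the standard time module provides time.monotonic() and time.perf_counter(), along with time.get_clock_info() to inspect them. A minimal sketch, with elapsed_since as a hypothetical helper name:

```python
import time

# Describes the clock behind time.monotonic() on this platform.
info = time.get_clock_info('monotonic')
print(info.implementation, info.resolution, info.monotonic)

def elapsed_since(start):
    """Seconds since `start`, where `start` came from time.monotonic()."""
    return time.monotonic() - start

start = time.monotonic()
time.sleep(0.01)
print(elapsed_since(start))
```

The resolution reported there varies by platform, so it is worth checking before relying on sub-millisecond intervals.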
