Still a noob in Python, and I get stuck many times.
My script runs 3 sequences, one after the other, each for 20 seconds.
Each sequence has a while loop and a timeout statement.
Then it starts the next loop, and so on until the end of the 3rd loop. Then it quits. I would like to start again from the top.
I probably have too many while loops.
#!/usr/bin/env python
# Import required libraries
import time

# More setup

# Choose a matrix to use
mat = mat1
t_end = time.time() + 20

# Start loop
while time.time() < t_end:
    # code
    # loop timeout

# 2 more loops follow, just like the first one, except the matrix becomes
mat = mat2
# and then
mat = mat3
As others have already commented, you should do any repetitive tasks within a function. In Python, functions are defined using the "def" keyword. Using a function, it could be done as follows:
import time

# Replace these dummy assignments with real code
mat1 = "blah"
mat2 = "rhubarb"
mat3 = "custard"

def processMatrix(matrix, seconds=20):
    t_end = time.time() + seconds
    while time.time() < t_end:
        pass  # 'pass' does nothing - replace with your real code

processMatrix(mat1)
processMatrix(mat2)
processMatrix(mat3)
Note that I've also included the time/seconds as a parameter of the function. This gives you more flexibility, in case you want to run for a different time while testing, or a different time for each matrix, etc. However, I've given it a default value of 20 so that you don't need to include it in the function call. If you do want to override the default, you could call, e.g.,
processMatrix(mat1, 5)
instead of,
processMatrix(mat1)
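Since you mentioned you would like to start again from the top once the third sequence finishes, one option (just a sketch) is to wrap the three calls in an outer while True: loop, which repeats the whole cycle until you stop the script, for example with Ctrl+C:

while True:  # repeat the whole 3-sequence cycle until the script is interrupted
    processMatrix(mat1)
    processMatrix(mat2)
    processMatrix(mat3)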
I have a pretty specific problem. I want to measure the execution time of a generator loop (one that uses the yield keyword). However, I don't know at what intervals next() will be called on this generator. This means I can't just get the timestamp before and after the loop. I thought getting the timestamp at the beginning and end of each iteration would do the trick, but I'm getting very inconsistent results.
Here's the test code:
import time

def gen(n):
    total = 0
    for i in range(n):
        t1 = time.process_time_ns()
        # Something that takes time
        x = [i ** i for i in range(i)]
        t2 = time.process_time_ns()
        yield x
        total += t2 - t1
    print(total)

def main():
    for i in gen(100):
        pass
    for i in gen(100):
        time.sleep(0.001)
    for i in gen(100):
        time.sleep(0.01)

if __name__ == '__main__':
    main()
Typical output for me looks something like this:
2151918
9970539
11581393
As you can see, it looks like the delay outside of the generator somehow influences the execution time of the loop itself.
What is the reason for this behavior? How can I avoid this inconsistency? Maybe there's an entirely different way of doing what I'm trying to achieve?
You can switch the yield x and total += t2 - t1 lines so that you only count the time it takes to create x.
For more depth, also see: Behaviour of Python's "yield"
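For illustration, here is a sketch of the generator from the question with those two lines swapped as suggested:

import time

def gen(n):
    total = 0
    for i in range(n):
        t1 = time.process_time_ns()
        x = [i ** i for i in range(i)]  # the work being measured
        t2 = time.process_time_ns()
        total += t2 - t1                # accumulate before yielding
        yield x
    print(total)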
Consider the following loop:
import subprocess
import time

for i in range(20):
    if i == 10:
        subprocess.Popen(["echo"])  # command 1
    t_start = time.time()
    1 + 1                           # command 2
    t_stop = time.time()
    print(t_stop - t_start)
“command 2” systematically takes longer to run when “command 1” has been run before it. The following plot shows the execution time of 1 + 1 as a function of the loop index i, averaged over 100 runs.
Execution of 1+1 is 30 times slower when preceded by subprocess.Popen.
It gets even weirder. One may think that only the first command run after subprocess.Popen() is affected, but that is not the case. The following loop shows that all commands in the current loop iteration are affected, while subsequent loop iterations seem to be mostly OK.
var = 0
for i in range(20):
    if i == 10:
        # command 1
        subprocess.Popen(['echo'])

    # command 2a
    t_start = time.time()
    1 + 1
    t_stop = time.time()
    print(t_stop - t_start)

    # command 2b
    t_start = time.time()
    print(1)
    t_stop = time.time()
    print(t_stop - t_start)

    # command 2c
    t_start = time.time()
    var += 1
    t_stop = time.time()
    print(t_stop - t_start)
Here’s a plot of the execution times for this loop, averaged over 100 runs:
More remarks:
We get the same effect when replacing subprocess.Popen() (“command 1”) with time.sleep(), or with rawkit’s libraw C++ bindings initialization (libraw.bindings.LibRaw()). However, using other libraries with C++ bindings, such as libraw.py or OpenCV’s cv2.warpAffine(), does not affect execution times. Opening files doesn’t either.
The effect is not caused by time.time() itself: it is also visible with timeit.timeit() (see the sketch after these remarks), and even when measuring manually by watching when the print() output appears.
It also happens without a for-loop.
This happens even when a lot of different (possibly CPU- and memory-consuming) operations are performed between “command 1” (subprocess.Popen) and “command 2”.
With Numpy arrays, the slowdown appears to be proportional to the size of the array. With relatively big arrays (~ 60 M points), a simple arr += 1 operation can take up to 300 ms!
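For reference, a minimal sketch of how the timeit.timeit() variant of this measurement could look (the exact snippet used for that remark is not shown, so this is only an assumption):

import subprocess
import timeit

for i in range(20):
    if i == 10:
        subprocess.Popen(["echo"])  # command 1
    # time a single execution of "1 + 1" without calling time.time() directly
    print(timeit.timeit("1 + 1", number=1))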
Question: What may cause this effect, and why does it affect only the current loop iteration?
I suspect that it could be related to context switching, but this doesn’t seem to explain why a whole loop iteration would be affected. If context switching is indeed the cause, why do some commands trigger it while others don’t?
My guess would be that this is due to the Python code being evicted from various caches in the CPU/memory system.
The perflib package can be used to extract more detailed CPU-level stats about the state of the cache, i.e. the number of hits/misses.
I get roughly 5 times the LIBPERF_COUNT_HW_CACHE_MISSES counter value after the Popen() call:
from subprocess import Popen, DEVNULL
from perflib import PerfCounter
import numpy as np

arr = []
p = PerfCounter('LIBPERF_COUNT_HW_CACHE_MISSES')
for i in range(100):
    ti = []
    p.reset()
    p.start()
    ti.extend(p.getval() for _ in range(7))
    Popen(['echo'], stdout=DEVNULL)
    ti.extend(p.getval() for _ in range(7))
    p.stop()
    arr.append(ti)

np.diff(np.array(arr), axis=1).mean(axis=0).astype(int).tolist()
gives me:
2605, 2185, 2127, 2099, 2407, 2120,
5481210,
16499, 10694, 10398, 10301, 10206, 10166
(lines broken in non-standard places to indicate code flow)
I was wondering if there is a simple way to measure the memory used to execute a function, similar to the way you can time a function.
For example, I can time a function and append the elapsed time to a list by doing the following:
from timeit import default_timer as timer

timePerTenRuns = []

# Loop through numRuns to generate an average
for i in range(0, len(numRuns)):
    # Generate random data
    randList = getRand(k, randStart, k/randDivisor)

    # Time the function
    start = timer()

    # Execute function to count duplicates in a random list
    countDuplicates(randList)

    # Take the time at the function's end
    end = timer()

    # Append the time for one execution of the function
    timePerTenRuns.append(end - start)
I would like to do the same to calculate and store the memory required to execute that function, then append it to a similar list.
I've read about tools such as memory_profiler, memory_usage, and others, but none of them seemed able to produce the implementation I am looking for.
Is this possible? Any help would be greatly appreciated.
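One possibility, though it may not be exactly what you are after, is the standard-library tracemalloc module, which tracks Python-level allocations; the peak during each call can be appended to a list in the same pattern as the timing code above. The stand-in function, data and list name below are hypothetical:

import tracemalloc

def countDuplicates(lst):             # stand-in for the real function
    return len(lst) - len(set(lst))

randList = list(range(1000)) * 2      # stand-in for getRand(...)
memPerTenRuns = []                    # mirrors timePerTenRuns

tracemalloc.start()
for i in range(10):
    tracemalloc.reset_peak()          # requires Python 3.9+
    countDuplicates(randList)
    current, peak = tracemalloc.get_traced_memory()
    memPerTenRuns.append(peak)        # peak bytes allocated during this call
tracemalloc.stop()

Note that tracemalloc only counts memory allocated for Python objects, not the total process footprint that memory_profiler reports.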
I have to time the implementation I did of an algorithm in one of my classes, and I am using the time.time() function to do so. After implementing it, I have to run that algorithm on a number of data files which contains small and bigger data sets in order to formally analyse its complexity.
Unfortunately, on the small data sets I get a runtime of 0 seconds, even though the function reports a precision of 0.000000000000000001 when I look at the runtimes of the bigger data sets, and I cannot believe that the smaller data sets really take less time than that.
My question is: Is there a problem with using this function (and if so, is there another function I can use that has better precision)? Or am I doing something wrong?
Here is my code if ever you need it:
import sys, time
import random
from utility import parseSystemArguments, printResults

...

def main(ville):
    start = time.time()
    solution = dynamique(ville)  # Algorithm implementation
    end = time.time()
    return (end - start, solution)

if __name__ == "__main__":
    sys.argv.insert(1, "-a")
    sys.argv.insert(2, "3")
    (algoNumber, ville, printList) = parseSystemArguments()
    (algoTime, solution) = main(ville)
    printResults(algoTime, solution, printList)
The printResults function:
def printResults(time, solution, printList=True):
    print("Temps d'execution = " + str(time) + "s")
    if printList:
        print(solution)
The solution to my problem was to use the timeit module instead of the time module.
import timeit

...

def main(ville):
    start = timeit.default_timer()
    solution = dynamique(ville)
    end = timeit.default_timer()
    return (end - start, solution)
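For context: on Python 3, timeit.default_timer is time.perf_counter, which uses the highest-resolution clock available, so it is a better fit for timing short runs than time.time().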
Don't confuse the resolution of the system time with the resolution of a floating point number. The time resolution on a computer is only as fine as how often the system clock is updated, which varies from machine to machine. To be sure that you see a difference with time.time(), you need to make sure the code executes for a millisecond or more. Try putting it into a loop like this:
start = time.time()
k = 100000
for i in range(k):
    solution = dynamique(ville)
end = time.time()
return ((end - start) / k, solution)
In the final tally, you then need to divide by the number of loop iterations to know how long your code actually runs once through. You may need to increase k to get a good measure of the execution time, or you may need to decrease it if your computer is running in the loop for a very long time.
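If you would rather not manage the loop and the division yourself, the timeit module mentioned in the other answer can do both. A sketch, with a hypothetical stand-in for dynamique(ville) so it runs on its own:

import timeit

def dynamique(ville):   # stand-in for the algorithm from the question
    return sum(range(1000))

ville = None            # stand-in argument

k = 1000                # calls per measurement; raise it until each run takes well over a millisecond
best = min(timeit.repeat(lambda: dynamique(ville), repeat=5, number=k)) / k
print("Temps d'execution = " + str(best) + "s")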
I made a little function using timeit, just so I could be lazy and do less typing, but it isn't panning out as planned.
The (relevant) code:
import timeit

def timing(function, retries=10000, formatSeconds=3, repeat=10):
    """Test how long a function takes to run. Defaults are set to run
    10 times of 10000 tries each. Will display time as 1 of 4 types:
    0 = seconds, 1 = milliseconds, 2 = microseconds and 3 = nanoseconds.
    Pass in parameters as: (function, retries=10000, formatSeconds=3, repeat=10)"""
    t = timeit.Timer(lambda: function)
    result = t.repeat(repeat=repeat, number=retries)
    rlist = [i / retries for i in result]
It runs fine but it keeps returning:
timeprofile.timing(find_boundaries(numpy.asarray(Image.open(
r'D:\Python\image\image4.jpg')),79))
10 runs of 10000 cycles each:
Best time: 137.94764 Worst: 158.16651 Avg: 143.25466 nanosecs/pass
Now, if I do from the interpreter:
import timeit
from timeit import Timer
t = timeit.Timer(lambda: (find_boundaries(numpy.asarray(Image.open(r'D:\Python\image\image4.jpg')),79)))
result = t.repeat(repeat=5,number=100)
result = [i/100 for i in result]
I end up with [0.007723014775432375, 0.007615270149786965, 0.0075242365377505395,
0.007420834966038683, 0.0074086862470653615], or about 8 milliseconds.
And if I run the profiler on the script, it also gives approximately the same result of about 8 milliseconds.
I'm not really sure what the problem is, although I reckon it has something to do with how it's calling the function. When I check the data in the debugger, it shows the function as a dictionary with a len of 53, where each key contains 1 to 15 tuples with a pair of 2-3 digit numbers in each.
So, if anyone knows why it's doing that and would like to explain it to me, and how to fix it, that'd be great!
Yes, there is a difference. When you run:
timeprofile.timing(find_boundaries(numpy.asarray(Image.open(
r'D:\Python\image\image4.jpg')),79))
You are not passing in a function reference; you are calling the function and passing in the result of that call. You are timing staticresult instead of somefunction(with, arguments).
Move out the lambda:
timeprofile.timing(lambda: (find_boundaries(numpy.asarray(Image.open(
r'D:\Python\image\image4.jpg')),79)))
This means you need to remove it from your timing function, and instead pass the function straight to the Timer() class:
t = timeit.Timer(function)
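For illustration, here is a sketch of what the timing() helper from the question could look like after that change (the display/formatting part is omitted and the per-call times are simply returned):

import timeit

def timing(function, retries=10000, formatSeconds=3, repeat=10):
    """Time a zero-argument callable: `repeat` runs of `retries` calls each."""
    t = timeit.Timer(function)            # pass the callable itself, not its result
    result = t.repeat(repeat=repeat, number=retries)
    return [i / retries for i in result]  # average seconds per call for each run

It would then be called as, e.g.:

timing(lambda: find_boundaries(numpy.asarray(Image.open(
    r'D:\Python\image\image4.jpg')), 79))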