I'd like to know how you would measure the number of clock cycles an instruction takes, say copying an int from one place to another.
I know you can time things down to nanoseconds, but with today's CPUs that resolution is too coarse to get a correct reading for operations that take just a few clock cycles.
Is there a way to confirm how many clock cycles instructions like adding and subtracting take in Python? If so, how?
This is a very interesting question that can easily send you down a rabbit hole. Basically, any CPU cycle measurement depends on your processor's and compiler's RDTSC implementation.
For python there is a package called hwcounter that can be used as follows:
# pip install hwcounter
from hwcounter import Timer, count, count_end
from time import sleep

# Method 1: raw cycle counts before and after the measured code
start = count()
sleep(1)  # do something here
elapsed = count_end() - start
print(f'Elapsed cycles: {elapsed:,}')

# Method 2: the Timer context manager
with Timer() as t:
    sleep(1)  # do something here
print(f'Elapsed cycles: {t.cycles:,}')
NOTE:
It seems that the hwcounter implementation is currently broken for Windows Python builds. A working alternative is to build the pip package with the MinGW compiler instead of MS VS.
Caveats
Measurements taken this way always depend on how your computer schedules tasks and threads among its processors. Ideally you'd need to:
bind the test code to one unused processor (a.k.a. set its processor affinity),
run the test 1k - 1M times to get a good average (a rough sketch of these two points follows below),
have a good understanding not only of compilers, but also of how Python optimizes its code internally. Many things are not at all obvious, especially if you come from a C/C++/C# background.
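As an illustration only, here is a minimal sketch of pinning the process to one core and averaging many repetitions. It assumes Linux (os.sched_setaffinity is not available on Windows) and the hwcounter package from above; the core id, the measured operation and the repeat count are arbitrary choices.
import os
import statistics
from hwcounter import count, count_end

os.sched_setaffinity(0, {2})      # pin this process to core 2 (pick an idle core)

a, b = 1234, 5678
samples = []
for _ in range(100000):           # many repetitions to average out scheduling noise
    start = count()
    c = a + b                     # the operation being measured
    samples.append(count_end() - start)

print('median cycles:', statistics.median(samples))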
Rabbit Hole:
http://en.wikipedia.org/wiki/Time_Stamp_Counter
https://github.com/MicrosoftDocs/cpp-docs/blob/main/docs/intrinsics/rdtsc.md
How to get the CPU cycle count in x86_64 from C++?
__asm
__rdtsc
__cpuid, __cpuidex
Defining __asm Blocks as C Macros
Input: array of float time values (in seconds) relative to program start. [0.452, 0.963, 1.286, 2.003, ... ]. They are not evenly spaced apart.
Desired Output: Output text to console at those times (i.e. printing '#')
My question is what is the best design principle to go about this. Below is my naive solution using time.time.
import time

times = [0.452, 0.963, 1.286, 2.003]
start_time = time.time()
for event_time in times:
    while 1:
        if time.time() - start_time >= event_time:
            print '#'
            break
The above feels intuitively wrong using that busy loop (even if it's in its own thread).
I'm leaning towards scheduling but want to make sure there aren't better design options: Executing periodic actions in Python
There is also the timer object: timers
Edit: Events only need 10ms precision, so +/- 10ms from exact event time.
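For reference, a minimal sketch of the scheduler-based approach hinted at above, using the stdlib sched module (this assumes Python 3 and is only one possible design):
import sched
import time

times = [0.452, 0.963, 1.286, 2.003]   # seconds relative to program start

s = sched.scheduler(time.time, time.sleep)
for event_time in times:
    # delays are relative to when enter() is called, i.e. roughly program start
    s.enter(event_time, 1, print, argument=('#',))
s.run()   # blocks, sleeping between events instead of busy-waiting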
A better pattern than busy waiting might be to use time.sleep(). This suspends execution rather than using the CPU.
import time

time_diffs = [0.452, 0.511, 0.323, 0.716]
for diff in time_diffs:
    time.sleep(diff)
    print '#'
Threading can also be used to similar effect. However, both of these solutions only work if the action you want to perform each time the program 'restarts' takes negligible time (perhaps not true of printing).
That being said, no pattern is going to work if you are after 10 ms precision and want to use Python on a standard OS. I recommend this question on Real time operating via Python, which explains that GUI events (i.e. printing to a screen) are too slow and unreliable for that level of precision, that the typical OSs Python runs on do not guarantee that level of precision, and that Python's garbage collection and memory management also play havoc with 'real-time' events.
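If the per-event action itself takes noticeable time, sleeping by differences accumulates drift. A possible refinement, sketched here assuming Python 3.3+, is to sleep until each absolute target time measured with time.monotonic():
import time

times = [0.452, 0.963, 1.286, 2.003]   # offsets from program start, in seconds

start = time.monotonic()
for event_time in times:
    remaining = event_time - (time.monotonic() - start)
    if remaining > 0:
        time.sleep(remaining)   # sleep only the leftover time, so the action's cost doesn't accumulate
    print('#')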
I would like to speed up a SimPy simulation (if possible), but I'm not sure of the best way to insert timers to even see what is taking so long.
Is there a way to do this?
I would recommend using runsnakerun (or, I guess, snakeviz on Python 3), which uses cProfile (there are directions on runsnakerun's webpage).
Basically you just run your program:
python -m cProfile -o profile.dump my_main.py
then you can get a nice visual view of your profile with runsnake (or snakeviz if using py3)
python runsnakerun.py profile.dump
(note that running it in profile mode will probably slow down your code even more ... but it's really just to identify the slow parts)
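If you would rather stay in the terminal, the same dump can also be inspected with the stdlib pstats module; a minimal sketch, assuming the profile.dump file produced by the command above:
import pstats

stats = pstats.Stats('profile.dump')
stats.sort_stats('cumulative').print_stats(20)   # show the 20 heaviest call paths by cumulative time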
import time
t1 = time.time()
# code to time
t2 = time.time()
print(t2 - t1)
You can use this approach to compare the times of any code samples you want to test.
I'm running some experiments in a virtual machine, which has its system time updated if it is suspended. I want to know if I can suspend the virtual machine without affecting the timer. That is, does Timer use system time or wall time?
I've tried looking through the source code and got to _thread.lock.acquire before the code dips into C.
Below is my code. I delegate to a subprocess that outputs 'plans'. I keep collecting these plans until the optimal plan is found or the maximum allowed time has elapsed. The timer is used to terminate the process if time runs out (which is the expected state of affairs). I need to know that the Timer will not be affected by the system time being updated, as that would invalidate the experiment I'm running.
from subprocess import Popen, PIPE
from threading import Timer

p = Popen(args, stdout=PIPE, cwd=self.working_directory)
timer = Timer(float(self.planning_time), p.terminate)
timer.start()
plan = None
while True:
    try:
        plan = list(decode_plan_from_optic(self.decode(p.stdout),
                                           report_incomplete_plan=True))
    except IncompletePlanException:
        break
timer.cancel()
Upon examination of the Python source code for *nix systems, I found that Python eventually delegates to sem_timedwait from semaphore.h or pthread_cond_timedwait from pthread.h, depending on support. Either way, both functions take a struct timespec from time.h as the absolute time to wait until -- a timespec is a number of seconds and nanoseconds since the epoch.
So on the face of it, waiting in Python seems to depend on system time -- meaning my program would be affected by a change in system time. However, time.h specifies the constant CLOCK_MONOTONIC_RAW and the function clock_gettime for cases where a monotonically increasing clock is required, showing that there is a way to wait independently of system time. Sadly, however, Python uses gettimeofday (marked as obsolete since 2008), which is affected by changes to system time.
In short, waiting in Python on *nix systems is affected by changes to system time.
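On Python 3.3+ you can at least inspect which underlying clock each timer uses and whether it can be adjusted by system-time changes, which is a quick way to sanity-check this on your own build:
import time

# time.get_clock_info reports the implementation and whether the clock is adjustable
for name in ('time', 'monotonic', 'perf_counter'):
    info = time.get_clock_info(name)
    print(name, '->', info.implementation, '| adjustable:', info.adjustable)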
Using Python to start modelling/simulation runs (tuflow) and logging the runs to a db.
Currently on Windows, Python 2.7, and using timeit().
Is it better to stick with using timeit() or to switch to using time.clock() ?
Simulation/modelling runs can be anything from a couple of minutes to a week+.
Need reasonably accurate runtimes.
I know time.clock has been deprecated in 3.3+ but I can't see this code getting moved to 3 for a long while.
import timeit

self.starttime = timeit.default_timer()
run_sim()  # placeholder for the simulation run being timed
self.endtime = timeit.default_timer()
self.runtime = self.endtime - self.starttime
timeit's timeit() function runs the code multiple times and takes the best result, so it's not a great choice for lengthy runs. But your code just uses the timer, whose documentation warns that it measures elapsed time, so you will need to ensure standard running conditions in so far as you can. If you want to measure actual CPU usage, that's trickier.
When you say "reasonably accurate" times, one percent of two minutes is 2.4 seconds. The error of time.clock() or the default timer is never going to be anywhere near a second. Is that accurate enough?
To prepare for a possible migration to Python 3 you can define your own timing function. On Python 2 it can use time.clock(), but at least you will only have one place to alter the code if you do migrate.
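For example, a minimal sketch of such a wrapper (the function name is just illustrative):
import sys

if sys.version_info[0] >= 3:
    from time import perf_counter as _timer   # preferred wall-clock timer on Python 3.3+
else:
    from time import clock as _timer          # time.clock(): wall-clock on Windows under Python 2

def run_timer():
    """Return a float timestamp in seconds; only differences between calls are meaningful."""
    return _timer()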
I've been working on some Project Euler problems in Python 3 [osx 10.9], and I like to know how long they take to run.
I've been using the following two approaches to time my programs:
1)
import time
start = time.time()
[program]
print(time.time() - start)
2) On the bash command line, typing time python3 ./program.py
However, these two methods often give wildly different results. In the program I am working on now, the first returns 0.000263 (seconds, truncated) while the second gives
real 0m0.044s
user 0m0.032s
sys 0m0.009s
Clearly there is a huge discrepancy - two orders of magnitude compared to the real time.
My questions are:
a) Why the difference? Is it overhead from the interpreter?
b) Which one should I be using to accurately determine how long the program takes to run? Is time.time() accurate at such small intervals?
I realize these minuscule times are not of the utmost importance; this was more of a curiosity.
Thanks.
[UPDATE:]
Thank you for all of the answers & comments. You were correct about the overhead. This program:
import time
start = time.time()
print("hello world")
print(time.time() - start)
takes ~0.045 sec, according to bash.
My complicated Project Euler problem took ~0.045 sec, according to bash. Problem solved.
I'll take a look at timeit. Thanks.
The interpreter imports site.py and can touch upon various other files on start-up. This all takes time before your import time line is ever executed:
$ touch empty.py
$ time python3 empty.py
real 0m0.158s
user 0m0.033s
sys 0m0.021s
When timing code, take into account that other processes, disk flushes and hardware interrupts all take time too and influence your timings.
Use timeit.default_timer() to get the most accurate timer for your platform, but preferably use the timeit module itself to time individual snippets of code to eliminate as many variables as possible.
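For example, a minimal sketch of timing a small snippet with the timeit module (the snippet itself is arbitrary):
import timeit

# timeit runs the statement `number` times with garbage collection disabled,
# so interpreter start-up and one-off noise are excluded from the result.
total = timeit.timeit('sum(range(1000))', number=10000)
print(total / 10000, 'seconds per call')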
Because when you run the time builtin in bash, the real time taken includes the time needed to start up the Python interpreter and import the required modules to run your code, rather than just timing the execution of a single function in your code.
To see this, try for example
import os
import time
start = time.time()
os.system('python <path_to_your_script>')
print(time.time() - start)
You'll find that this is much closer to what time reports.