Why is this call to a dynamic library function so slow? - python

I am writing a shared library for Python to call. Since this is my first time using Python's ctypes module, and nearly my first time writing a shared library, I have been writing both C and Python code to call the library's functions.
For the heck of it I put some timing code in and found that, while most calls from the C program to the library are very fast, the first call is slow, considerably slower than its Python counterpart in fact. This goes against everything I expected, and I was hoping someone could tell me why.
Here is a stripped down version of the header file from my C library.
typedef struct MdaDataStruct
{
    int numPts;
    int numDists;
    float* data;
    float* dists;
} MdaData;
//allocate the structure
void* makeMdaStruct(int numPts, int numDist);
//deallocate the structure
void freeMdaStruct(void* strPtr);
//assign the data array
void setData(void* strPtr, float* divData);
Here is the C program that calls the functions:
#include <stdio.h>
#include <time.h>
/* plus the header shown above declaring makeMdaStruct/setData/freeMdaStruct */

int main(int argc, char* argv[])
{
    clock_t t1, t2;
    long long int diff;

    //test the allocate function
    t1 = clock();
    MdaData* dataPtr = makeMdaStruct(10, 3);
    t2 = clock();
    diff = (((t2 - t1) * 1000000) / CLOCKS_PER_SEC);
    printf("make struct, took: %lld microseconds\n", diff);

    //make some data
    float testArr[10] = {0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9};

    //test the set data function
    t1 = clock();
    setData(dataPtr, testArr);
    t2 = clock();
    diff = (((t2 - t1) * 1000000) / CLOCKS_PER_SEC);
    printf("set data, took: %lld microseconds\n", diff);

    //test the deallocate function
    t1 = clock();
    freeMdaStruct(dataPtr);
    t2 = clock();
    diff = (((t2 - t1) * 1000000) / CLOCKS_PER_SEC);
    printf("free struct, took: %lld microseconds\n", diff);

    //exit
    return 0;
}
and here is the python script that calls the functions:
import time
import numpy as np
from ctypes import cdll, c_void_p

# load the library
t1 = time.time()
cs_lib = cdll.LoadLibrary("./libChiSq.so")
t2 = time.time()
print "load library, took", int((t2-t1)*1000000), "microseconds"
# tell python the function will return a void pointer
cs_lib.makeMdaStruct.restype = c_void_p
# make the structure to hold the MdaData with 10 data points and 3 dists
t1 = time.time()
mdaData = cs_lib.makeMdaStruct(10,3)
t2 = time.time()
print "make struct, took", int((t2-t1)*1000000), "microseconds"
# make an array with the test data
divDat = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9], np.float32)
#run the function to load the array into the struct
t1 = time.time()
cs_lib.setData(mdaData, divDat.ctypes.data)
t2 = time.time()
print "set data, took", int((t2-t1)*1000000), "microseconds"
#free the structure
t1 = time.time()
cs_lib.freeMdaStruct(mdaData)
t2 = time.time()
print "free struct, took", int((t2-t1)*1000000), "microseconds"
and finally, here is the output of running the two consecutively:
[]$ ./tester
make struct, took: 60 microseconds
set data, took: 2 microseconds
free struct, took: 2 microseconds
[]$ python so_py_tester.py
load library, took 77 microseconds
make struct, took 3 microseconds
set data, took 23 microseconds
free struct, took 10 microseconds
As you can see, the C call to makeMdaStruct takes 60us and the python call to makeMdaStruct takes 3us, which is highly confusing.
My best guess was that somehow the C code pays the cost of loading the library at the first call? Which confuses me because I thought that the library was loaded when the program was loaded into memory.
Edit: I think there is a kernel of truth to this guess: I put an extra untimed call to makeMdaStruct and freeMdaStruct before the timed call to makeMdaStruct and got the following output:
[]$ ./tester
make struct, took: 1 microseconds
set data, took: 1 microseconds
free struct, took: 0 microseconds
[]$ python so_py_tester.py
load library, took 70 microseconds
make struct, took 4 microseconds
set data, took 23 microseconds
free struct, took 12 microseconds

My best guess was that somehow the C code pays the cost of loading the library at the first call? Which confuses me because I thought that the library was loaded when the program was loaded into memory.
You are correct in both cases. The library is loaded when the program is loaded. However, the dynamic loader/linker defers symbol resolution until function invocation time.
Calls to shared libraries are done so indirectly, via an entry in the procedure linkage table (PLT). Initially, all of the entries in the PLT point to ld.so. Upon the first call to a function, ld.so looks up the actual address of the symbol, updates the entry in the PLT, and jumps to the function. This is "lazy" symbol resolution.
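For what it's worth, the same warm-up trick used in the edited C test also works from Python with ctypes: make one untimed call first so the timed call only measures the steady-state cost. A minimal sketch (reusing the library and function names from the question; not part of the original post):
import time
from ctypes import cdll, c_void_p

cs_lib = cdll.LoadLibrary("./libChiSq.so")
cs_lib.makeMdaStruct.restype = c_void_p

# untimed warm-up call: pays any one-time binding/caching cost up front
warm = cs_lib.makeMdaStruct(10, 3)
cs_lib.freeMdaStruct(warm)

# now time the steady-state call
t1 = time.time()
ptr = cs_lib.makeMdaStruct(10, 3)
t2 = time.time()
print("make struct (warmed up), took", int((t2 - t1) * 1e6), "microseconds")
cs_lib.freeMdaStruct(ptr)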
You can set the LD_BIND_NOW environment variable to change this behavior. From ld.so(8):
LD_BIND_NOW
(libc5; glibc since 2.1.1) If set to a nonempty string, causes the dynamic linker to resolve all symbols at program startup instead of deferring function call resolution to the point when they are first referenced. This is useful when using a debugger.
This behavior can also be changed at link time. From ld(1):
-z keyword
The recognized keywords are:
...
lazy
When generating an executable or shared library, mark it to
tell the dynamic linker to defer function call resolution to
the point when the function is called (lazy binding), rather
than at load time. Lazy binding is the default.
Further reading:
http://www.macieira.org/blog/2012/01/sorry-state-of-dynamic-libraries-on-linux/

Related

Pybind11 is slower than Pure Python

I created Python Bindings using pybind11. Everything worked perfectly, but when I did a speed check test the result was disappointing.
Basically, I have a function in C++ that adds two numbers and I want to use that function from a Python script. I also included a for loop that runs 100 times to better show the difference in processing time.
For the function "imported" from C++, using pybind11, I obtain: 0.002310514450073242 ~ 0.0034799575805664062
For the simple Python script, I obtain: 0.0012788772583007812 ~ 0.0015883445739746094
main.cpp file:
#include <pybind11/pybind11.h>
namespace py = pybind11;
double sum(double a, double b) {
return a + b;
}
PYBIND11_MODULE(SumFunction, var) {
var.doc() = "pybind11 example module";
var.def("sum", &sum, "This function adds two input numbers");
}
main.py file:
from build.SumFunction import *
import time

start = time.time()
for i in range(100):
    print(sum(2.3,5.2))
end = time.time()
print(end - start)
CMakeLists.txt file:
cmake_minimum_required(VERSION 3.0.0)
project(Projectpybind11 VERSION 0.1.0)
include(CTest)
enable_testing()
add_subdirectory(pybind11)
pybind11_add_module(SumFunction main.cpp)
set(CPACK_PROJECT_NAME ${PROJECT_NAME})
set(CPACK_PROJECT_VERSION ${PROJECT_VERSION})
include(CPack)
Simple Python script:
import time

def summ(a,b):
    return a+b

start = time.time()
for i in range(100):
    print(summ(2.3,5.2))
end = time.time()
print(end - start)
Benchmarking is a complicated business; you could almost call it an engineering discipline of its own. Many other processes can interfere with a benchmark: NIC interrupts, keyboard or mouse input, OS scheduling, and so on. I have seen one of my producer processes blocked by the OS for up to 15 seconds! So, as other answers have pointed out, the print() call adds more unnecessary interference.
Your test computation is also too simple. You have to be clear about what you are comparing. Passing arguments between Python and C++ is obviously slower than staying inside Python, so I assume you want to compare computing speed rather than argument-passing speed. If so, your computation is so trivial that the measured time is mostly argument passing, and the actual computation is only a minor part of the total. So I put my own sample below, and I would be glad to see anyone polish it.
Your loop count is also too low. The fewer the loops, the more randomness. As with the interference issue above, a test that lasts only 0.000x seconds can easily be disturbed by the OS. I think a test should run for at least a few seconds.
Finally, C++ is not always faster than Python. There are now many Python modules that can use the GPU for heavy computation, or perform matrix operations in parallel even on the CPU alone. I guess you are evaluating whether to use pybind11 in your project. A comparison like this proves little on its own, because the best tool depends on the real requirements, but it is a good way to learn things. I recently came across a case where Python was faster than C++ in a deep learning workload. Funny, isn't it?
In the end, I ran my sample on my PC and found the C++ computation to be up to 100 times faster than the Python one. I hope that is helpful. If anyone would like to revise or correct my points, I would be glad. And please forgive my English; I hope I have expressed things correctly.
ComplexCpp.cpp:
#include <cmath>
#include <pybind11/numpy.h>
#include <pybind11/pybind11.h>

namespace py = pybind11;

double Compute( double x, py::array_t<double> ys ) {
    // std::cout << "x:" << std::setprecision( 16 ) << x << std::endl;
    auto r = ys.unchecked<1>();
    for( py::ssize_t i = 0; i < r.shape( 0 ); ++i ) {
        double y = r( i );
        // std::cout << "y:" << std::setprecision( 16 ) << y << std::endl;
        x += y;
        x *= y;
        y = std::max( y, 1.001 );
        x /= y;
        x *= std::log( y );
    }
    return x;
};

PYBIND11_MODULE( ComplexCpp, m ) {
    m.def( "Compute", &Compute, "a more complicated computing" );
};
tryComplexCpp.py
import ComplexCpp
import math
import numpy as np
import random
import time


def PyCompute(x: float, ys: np.ndarray) -> float:
    #print(f'x:{x}')
    for y in ys:
        #print(f'y:{y}')
        x += y
        x *= y
        y = max(y, 1.001)
        x /= y
        x *= math.log(y)
    return x


LOOPS: int = 100000000

if __name__ == "__main__":
    # initialize random
    x0 = random.random()
    """ We store all args in an array, then pass them into both the C++ func
    and the Python side, to ensure that the args for both sides are the same. """
    args = np.ndarray(LOOPS, dtype=np.float64)
    for i in range(LOOPS):
        args[i] = random.random()
    print('Args are ready, now start...')
    # try it with C++
    start_time = time.time()
    x = ComplexCpp.Compute(x0, args)
    print(f'Computing with C++ in { time.time() - start_time }.\n')
    # forcibly use the result to prevent the entire procedure being optimized away
    print(f'The result is {x}\n')
    # try it with python
    start_time = time.time()
    x = PyCompute(x0, args)
    print(f'Computing with Python in { time.time() - start_time }.\n')
    # forcibly use the result to prevent the entire procedure being optimized away
    print(f'The result is {x}\n')
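As a small aside on measuring the original trivial sum: if you do want to time something that cheap, the standard timeit module repeats the call many times and avoids the print() overhead entirely. A rough sketch, assuming the same build/SumFunction module layout as in the question:
import timeit

# time the pybind11-wrapped sum against a pure-Python one, without print()
cpp_t = timeit.timeit("sum(2.3, 5.2)",
                      setup="from build.SumFunction import sum",
                      number=1_000_000)
py_t = timeit.timeit("summ(2.3, 5.2)",
                     setup="def summ(a, b):\n    return a + b",
                     number=1_000_000)
print(f"pybind11 sum: {cpp_t:.3f} s for 1e6 calls")
print(f"pure Python : {py_t:.3f} s for 1e6 calls")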

Wrong output of a C function returning a double called from Python

I want to speed up a Python code by calling a C function:
I have the function in vanilla Python, sum_and_multiply.py:
def sam_py(lim_sup):
    total = 0
    for i in range(0, lim_sup):     # xrange is slower according
        for j in range(1, lim_sup): # to my test but more memory-friendly.
            total += (i / j)
    return total
then I have the equivalent function in C sum_and_multiply_c.c:
#include <stdio.h>

double sam_c(int lim_sup){
    int i;
    int j;
    double total;
    total = 0;
    double div;
    for (i=0; i<lim_sup; i++){
        for (j=1; j<lim_sup; j++){
            div = (double) i / j;
            // printf("div: %.2f\n", div);
            total += div;
            // printf("total: %.2f\n", total);
        }
    }
    printf("total: %.2f\n", total);
    return total;
}
a file script.py which calls the 2 functions
from sum_and_multiply import sam_py
import time
lim_sup = 6000
start = time.time()
print(sam_py(lim_sup))
end = time.time()
time_elapsed01 = end - start
print("time elapsed: %.4fs" % time_elapsed01)
from ctypes import *
my_c_fun = CDLL("sum_and_multiply_c.so")
start = time.time()
print(my_c_fun.sam_c(lim_sup))
end = time.time()
time_elapsed02 = end - start
print("time elapsed: %.4fs" % time_elapsed02)
print("Speedup coefficient: %.2fx" % (time_elapsed01/time_elapsed02))
and finally a shell script bashscript.zsh which compiles the C code and then calls script.py
cc -fPIC -shared -o sum_and_multiply_c.so sum_and_multiply_c.c
python script.py
Here is the output:
166951817.45311993
time elapsed: 2.3095s
total: 166951817.45
20
time elapsed: 0.3016s
Speedup coefficient: 7.66x
Here is my question: although the C function calculates the result correctly (it prints 166951817.45 via printf), the value returned to Python is 20, which is wrong. How can I get 166951817.45 instead?
Edit: the problem persists after changing the last part of script.py as follows:
from ctypes import *
my_c_fun = CDLL("sum_and_multiply_c.so")
my_c_fun.restype = c_double
my_c_fun.argtypes = [ c_int ]
start = time.time()
print(my_c_fun.sam_c(lim_sup))
end = time.time()
time_elapsed02 = end - start
print("time elapsed: %.4fs" % time_elapsed02)
print("Speedup coefficient: %.2fx" % (time_elapsed01/time_elapsed02))
You're assuming Python can "see" your function returns a double. But it can't. C doesn't "encode" the return type in anything, so whoever calls a function from a library needs to know its return type, or risk misinterpreting it.
You should have read the documentation of CDLL before using it! If you say this is for the sake of exercise, then that exercise needs to include reading the documentation (that's what good programmers do, no excuses).
class ctypes.CDLL(name, mode=DEFAULT_MODE, handle=None, use_errno=False, use_last_error=False)
Instances of this class represent loaded shared libraries. Functions in these libraries use the standard C calling convention, and are assumed to return int.
(emphasis mine.)
https://docs.python.org/2.7/library/ctypes.html#return-types is your friend (and the top of the page will tell you that Python2 is dead and you shouldn't use it, even if you insist on it. I'm sure you have a better reason than the Python developers themselves!).
my_c_fun = CDLL("sum_and_multiply_c.so")
sam_c = my_c_fun.sam_c
sam_c.restype = c_double
sam_c.argtypes = [ c_int ]
value = sam_c(6000)
print(value)
is the way to go. Note that in your edit you set restype and argtypes on the CDLL object itself (my_c_fun), not on the function (my_c_fun.sam_c), which is why the edit changed nothing.

C++ vs. python: daylight saving time not recognized when extracting Unix time?

I'm attempting to calculate the Unix time of a given date and time represented by two integers, e.g.
testdate1 = 20060711 (July 11th, 2006)
testdate2 = 4 (00:00:04, 4 seconds after midnight)
in a timezone other than my local timezone. To calculate the Unix time, I feed testdate1, testdate2 into a function I adapted from Convert date to unix time stamp in c++
int unixtime (int testdate1, int testdate2) {
    time_t rawtime;
    struct tm * timeinfo;
    //time1, ..., time6 are external functions that extract the
    //year, month, day, hour, minute, seconds digits from testdate1, testdate2
    int year   = time1(testdate1);
    int month  = time2(testdate1);
    int day    = time3(testdate1);
    int hour   = time4(testdate2);
    int minute = time5(testdate2);
    int second = time6(testdate2);
    time ( &rawtime );
    timeinfo = localtime ( &rawtime );
    timeinfo->tm_year = year - 1900;
    timeinfo->tm_mon  = month - 1;
    timeinfo->tm_mday = day;
    timeinfo->tm_hour = hour;
    timeinfo->tm_min  = minute;
    timeinfo->tm_sec  = second;
    int date;
    date = mktime(timeinfo);
    return date;
}
Which I call from the main code
using namespace std;

int main(int argc, char* argv[])
{
    int testdate1 = 20060711;
    int testdate2 = 4;
    //switch to CET time zone
    setenv("TZ","Europe/Berlin", 1);
    tzset();
    cout << testdate1 << "\t" << testdate2 << "\t" << unixtime(testdate1,testdate2) << "\n";
    return 0;
}
With the given example, I get unixtime(testdate1,testdate2) = 1152572404, which according to
https://www.epochconverter.com/timezones?q=1152572404&tz=Europe%2FBerlin
is 1:00:04 am CEST, but I want this to be 0:00:04 CEST.
The code seems to work perfectly well if I choose a testdate1, testdate2 in which daylight saving time (DST) isn't being observed. For example, simply setting the month to February with all else unchanged is accomplished by setting testdate1 = 20060211. This gives
unixtime(testdate1,testdate2) = 1139612404, corresponding to hh:mm:ss = 00:00:04 in CET, as desired.
My impression is that setenv("TZ","Europe/Berlin", 1) is supposed to account for DST when applicable, but perhaps I am mistaken. Can TZ interpret testdate1, testdate2 in such a way that it accounts for DST?
Interestingly, I have a python code that performs the same task by changing the local time via os.environ['TZ'] = 'Europe/Berlin'. Here I have no issues, as it seems to calculate the correct Unix time regardless of DST/non-DST.
localtime sets timeinfo->tm_isdst to that of the current time - not of the date you parse.
Don't call localtime. Set timeinfo->tm_isdst to -1:
The value specified in the tm_isdst field informs mktime() whether or not daylight saving time (DST) is in effect for the time supplied in the tm structure: a positive value means DST is in effect; zero means that DST is not in effect; and a negative value means that mktime() should (use timezone information and system databases to) attempt to determine whether DST is in effect at the specified time.
See the code example in https://en.cppreference.com/w/cpp/chrono/c/mktime
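The Python code mentioned in the question most likely works for the same reason: time.mktime() is normally handed a struct_time whose tm_isdst field is -1, which lets the C library decide about DST on its own. A hedged Python sketch of that approach (field extraction inlined for brevity; the expected value is derived from the figures in the question):
import os
import time

os.environ['TZ'] = 'Europe/Berlin'
time.tzset()  # Unix only

# 2006-07-11 00:00:04, with tm_isdst = -1 ("let mktime decide about DST")
tm = (2006, 7, 11, 0, 0, 4, 0, 0, -1)
print(int(time.mktime(tm)))  # expected: 1152568804, one hour earlier than 1152572404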
Maxim's answer is correct, and I've upvoted it. But I also thought it might be helpful to show how this can be done in C++20 using the newer <chrono> tools. This isn't implemented everywhere yet, but it is here in Visual Studio and will be coming elsewhere soon.
There are two main points I'd like to illustrate here:
<chrono> is convenient for conversions like this, even if neither the input nor the output involves std::chrono types. One can convert the integral input to chrono, do the conversion, and then convert the chrono result back to integral.
There's a thread safety weakness in using the TZ environment variable, as this is a type of global. If another thread is also doing some type of time computation, it may not get the correct answer if the computer's time zone unexpectedly changes out from under it. The <chrono> solution is thread-safe. It doesn't involve globals or environment variables.
The first job is to unpack the integral data. Here I show how to do this, and convert it into chrono types in one step:
std::chrono::year_month_day
get_ymd(int ymd)
{
    using namespace std::chrono;
    day d(ymd % 100);
    ymd /= 100;
    month m(ymd % 100);
    ymd /= 100;
    year y{ymd};
    return y/m/d;
}
get_ymd takes "testdate1", extracts the individual integral fields for day, month, and year, then converts each integral field into the std::chrono types day, month and year, and finally combines these three separate fields into a std::chrono::year_month_day to return it as one value. This return type is simple a {year, month, day} data structure -- like a tuple but with calendrical meaning.
The / syntax is simply a convenient factory function for constructing a year_month_day. And this construction can be done with any of these three orderings: y/m/d, d/m/y and m/d/y. This syntax, when combined with auto, also means that you often don't have to spell out the verbose name year_month_day:
auto
get_ymd(int ymd)
{
    // ...
    return y/m/d;
}
get_hms unpacks the hour, minute and second fields and returns that as a std::chrono::seconds:
std::chrono::seconds
get_hms(int hms)
{
    using namespace std::chrono;
    seconds s{hms % 100};
    hms /= 100;
    minutes m{hms % 100};
    hms /= 100;
    hours h{hms};
    return h + m + s;
}
The code is very similar to that for get_ymd except that the return is the sum of the hours, minutes and seconds. The chrono library does the job for you of converting hours and minutes to seconds while performing the summation.
Next is the function for doing the conversion, and returning the result back as an int.
int
unixtime(int testdate1, int testdate2)
{
    using namespace std::chrono;
    auto ymd = get_ymd(testdate1);
    auto hms = get_hms(testdate2);
    auto ut = locate_zone("Europe/Berlin")->to_sys(local_days{ymd} + hms);
    return ut.time_since_epoch().count();
}
std::chrono::locate_zone is called to get a pointer to the std::chrono::time_zone with the name "Europe/Berlin". The std::lib manages the lifetime of this object, so you don't have to worry about it. It is a const singleton, created on demand. And it has no impact on what time zone your computer considers its "local time zone".
The std::chrono::time_zone has a member function called to_sys that takes a local_time, and converts it to a sys_time, using the proper UTC offset for this time zone (taking into account daylight saving rules when applicable).
Both local_time and sys_time are std::chrono::time_point types. local_time is "some local time", not necessarily your computer's local time. You can associate a local time with a time zone in order to specify the locality of that time.
sys_time is a time_point based on system_clock. This tracks UTC (Unix time).
The expression local_days{ymd} + hms converts ymd and hms to local_time with a precision of seconds. local_days is just another local_time time_point, but with a precision of days.
The type of ut is time_point<system_clock, seconds>, which has a convenience type alias called sys_seconds, though auto makes that name unnecessary in this code.
To unpack the sys_seconds into an integral type, the .time_since_epoch() member function is called which results in the duration seconds, and then the .count() member function is called to extract the integral value from that duration.
When int is 32 bits, this function is susceptible to the year 2038 overflow problem. To fix that, simply change the return type of unixtime to return a 64 bit integral type (or make the return auto). Nothing else needs to change as std::chrono::seconds is already required to be greater than 32 bits and will not overflow at 68 years. Indeed std::chrono::seconds is usually represented by a signed 64 bit integral type in practice, giving it a range greater than the age of the universe (even if the scientists are off by an order of magnitude).

How globally accurate is Python's time.time function? [duplicate]

Is there a way to measure time with high-precision in Python --- more precise than one second? I doubt that there is a cross-platform way of doing that; I'm interesting in high precision time on Unix, particularly Solaris running on a Sun SPARC machine.
timeit seems to be capable of high-precision time measurement, but rather than measure how long a code snippet takes, I'd like to directly access the time values.
The standard time.time() function provides sub-second precision, though that precision varies by platform. On Linux and macOS the precision is about +/- 1 microsecond (0.001 milliseconds). Python on Windows gets roughly +/- 16 millisecond precision because of the way its clock is implemented (process interrupt timing). The timeit module can provide higher resolution if you're measuring execution time.
>>> import time
>>> time.time() #return seconds from epoch
1261367718.971009
Python 3.7 introduces new functions to the time module that provide higher resolution:
>>> import time
>>> time.time_ns()
1530228533161016309
>>> time.time_ns() / (10 ** 9) # convert to floating-point seconds
1530228544.0792289
Python tries hard to use the most precise time function for your platform to implement time.time():
/* Implement floattime() for various platforms */

static double
floattime(void)
{
    /* There are three ways to get the time:
       (1) gettimeofday() -- resolution in microseconds
       (2) ftime() -- resolution in milliseconds
       (3) time() -- resolution in seconds
       In all cases the return value is a float in seconds.
       Since on some systems (e.g. SCO ODT 3.0) gettimeofday() may
       fail, so we fall back on ftime() or time().
       Note: clock resolution does not imply clock accuracy! */
#ifdef HAVE_GETTIMEOFDAY
    {
        struct timeval t;
#ifdef GETTIMEOFDAY_NO_TZ
        if (gettimeofday(&t) == 0)
            return (double)t.tv_sec + t.tv_usec*0.000001;
#else /* !GETTIMEOFDAY_NO_TZ */
        if (gettimeofday(&t, (struct timezone *)NULL) == 0)
            return (double)t.tv_sec + t.tv_usec*0.000001;
#endif /* !GETTIMEOFDAY_NO_TZ */
    }
#endif /* !HAVE_GETTIMEOFDAY */
    {
#if defined(HAVE_FTIME)
        struct timeb t;
        ftime(&t);
        return (double)t.time + (double)t.millitm * (double)0.001;
#else /* !HAVE_FTIME */
        time_t secs;
        time(&secs);
        return (double)secs;
#endif /* !HAVE_FTIME */
    }
}
( from http://svn.python.org/view/python/trunk/Modules/timemodule.c?revision=81756&view=markup )
David's post was attempting to show what the clock resolution is on Windows. I was confused by his output, so I wrote some code that shows that time.time() on my Windows 8 x64 laptop has a resolution of 1 msec:
# measure the smallest time delta by spinning until the time changes
def measure():
    t0 = time.time()
    t1 = t0
    while t1 == t0:
        t1 = time.time()
    return (t0, t1, t1-t0)

samples = [measure() for i in range(10)]

for s in samples:
    print s
Which outputs:
(1390455900.085, 1390455900.086, 0.0009999275207519531)
(1390455900.086, 1390455900.087, 0.0009999275207519531)
(1390455900.087, 1390455900.088, 0.0010001659393310547)
(1390455900.088, 1390455900.089, 0.0009999275207519531)
(1390455900.089, 1390455900.09, 0.0009999275207519531)
(1390455900.09, 1390455900.091, 0.0010001659393310547)
(1390455900.091, 1390455900.092, 0.0009999275207519531)
(1390455900.092, 1390455900.093, 0.0009999275207519531)
(1390455900.093, 1390455900.094, 0.0010001659393310547)
(1390455900.094, 1390455900.095, 0.0009999275207519531)
And a way to do a 1000 sample average of the delta:
reduce( lambda a,b:a+b, [measure()[2] for i in range(1000)], 0.0) / 1000.0
Which output on two consecutive runs:
0.001
0.0010009999275207519
So time.time() on my Windows 8 x64 has a resolution of 1 msec.
A similar run on time.clock() returns a resolution of 0.4 microseconds:
def measure_clock():
    t0 = time.clock()
    t1 = time.clock()
    while t1 == t0:
        t1 = time.clock()
    return (t0, t1, t1-t0)

reduce( lambda a,b:a+b, [measure_clock()[2] for i in range(1000000)] )/1000000.0
Returns:
4.3571334791658954e-07
Which is ~0.4e-06
An interesting thing about time.clock() is that it returns the time since the method was first called, so if you wanted microsecond resolution wall time you could do something like this:
class HighPrecisionWallTime():
    def __init__(self,):
        self._wall_time_0 = time.time()
        self._clock_0 = time.clock()

    def sample(self,):
        dc = time.clock()-self._clock_0
        return self._wall_time_0 + dc
(which would probably drift after a while, but you could correct this occasionally, for example dc > 3600 would correct it every hour)
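For illustration only, usage would look something like this (time.clock() is gone from modern Python, so treat this as a sketch for the older interpreters the class was written for):
import time

wall = HighPrecisionWallTime()
for _ in range(3):
    time.sleep(0.001)
    print(wall.sample())  # wall-clock seconds with sub-millisecond resolution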
If Python 3 is an option, you have two choices:
time.perf_counter, which always uses the most accurate clock on your platform. It does include time spent outside of the process.
time.process_time which returns the CPU time. It does NOT include time spent outside of the process.
The difference between the two can be shown with:
from time import (
process_time,
perf_counter,
sleep,
)
print(process_time())
sleep(1)
print(process_time())
print(perf_counter())
sleep(1)
print(perf_counter())
Which outputs:
0.03125
0.03125
2.560001310720671e-07
1.0005455362793145
You can also use time.clock(). It counts the time used by the process on Unix and the time since the first call to it on Windows. It's more precise than time.time().
It's the function usually used to measure performance.
Just call
import time
t_ = time.clock()
#Your code here
print 'Time in function', time.clock() - t_
EDITED: Oops, I misread the question; you want to know the exact time, not the time spent...
Python 3.7 introduces 6 new time functions with nanosecond resolution, for example instead of time.time() you can use time.time_ns() to avoid floating point imprecision issues:
import time
print(time.time())
# 1522915698.3436284
print(time.time_ns())
# 1522915698343660458
These 6 functions are described in PEP 564:
time.clock_gettime_ns(clock_id)
time.clock_settime_ns(clock_id, time:int)
time.monotonic_ns()
time.perf_counter_ns()
time.process_time_ns()
time.time_ns()
These functions are similar to the version without the _ns suffix, but
return a number of nanoseconds as a Python int.
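Since these functions return plain Python ints, you can keep full nanosecond precision by doing any scaling with integer arithmetic rather than converting straight to float. A small illustrative sketch (not from the original answer):
import time

ns = time.time_ns()
secs, frac_ns = divmod(ns, 1_000_000_000)
print(ns)                       # e.g. 1522915698343660458
print(f"{secs}.{frac_ns:09d}")  # same instant, no float rounding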
time.clock() has 13 decimal points on Windows but only two on Linux.
time.time() has 17 decimals on Linux and 16 on Windows but the actual precision is different.
I don't agree with the documentation that time.clock() should be used for benchmarking on Unix/Linux. It is not precise enough, so what timer to use depends on operating system.
On Linux, the time resolution is high in time.time():
>>> time.time(), time.time()
(1281384913.4374139, 1281384913.4374161)
On Windows, however the time function seems to use the last called number:
>>> time.time()-int(time.time()), time.time()-int(time.time()), time.time()-time.time()
(0.9570000171661377, 0.9570000171661377, 0.0)
Even if I write the calls on different lines in Windows it still returns the same value so the real precision is lower.
So in serious measurements a platform check (import platform, platform.system()) has to be done in order to determine whether to use time.clock() or time.time().
(Tested on Windows 7 and Ubuntu 9.10 with python 2.6 and 3.1)
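A minimal sketch of such a check, in the Python 2/3-era spirit of that advice (on Python 3.3+ you would reach for time.perf_counter() instead):
import platform
import time

if platform.system() == 'Windows':
    precise_timer = time.clock   # high resolution on Windows (pre-3.8 interpreters)
else:
    precise_timer = time.time    # high resolution on Unix/Linux

t0 = precise_timer()
# ... code under test ...
elapsed = precise_timer() - t0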
The original question specifically asked about Unix, but multiple answers have touched on Windows, and as a result there is misleading information about Windows. The default timer resolution on Windows is 15.6 ms, as you can verify here.
Using a slightly modified script from cod3monk3y, I can show that the Windows timer resolution is ~15 milliseconds by default. I'm using a tool available here to modify the resolution.
Script:
import time

# measure the smallest time delta by spinning until the time changes
def measure():
    t0 = time.time()
    t1 = t0
    while t1 == t0:
        t1 = time.time()
    return t1-t0

samples = [measure() for i in range(30)]

for s in samples:
    print(f'time delta: {s:.4f} seconds')
These results were gathered on windows 10 pro 64-bit running python 3.7 64-bit.
The comment left by tiho on Mar 27 '14 at 17:21 deserves to be its own answer:
In order to avoid platform-specific code, use timeit.default_timer()
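For example (timeit.default_timer picks an appropriate high-resolution clock for the platform; on Python 3.3+ it is simply time.perf_counter):
import timeit

t0 = timeit.default_timer()
# ... code under test ...
elapsed = timeit.default_timer() - t0
print(elapsed)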
I observed that the resolution of time.time() is different between Windows 10 Professional and Education versions.
On a Windows 10 Professional machine, the resolution is 1 ms.
On a Windows 10 Education machine, the resolution is 16 ms.
Fortunately, there's a tool that increases Python's time resolution in Windows:
https://vvvv.org/contribution/windows-system-timer-tool
With this tool, I was able to achieve 1 ms resolution regardless of Windows version. You will need to keep it running while executing your Python code.
For those stuck on Windows (version >= Server 2012 or Win 8) and Python 2.7:
import ctypes

class FILETIME(ctypes.Structure):
    _fields_ = [("dwLowDateTime", ctypes.c_uint),
                ("dwHighDateTime", ctypes.c_uint)]

def time():
    """Accurate version of time.time() for Windows: returns UTC time in terms of seconds since 01/01/1601
    """
    file_time = FILETIME()
    ctypes.windll.kernel32.GetSystemTimePreciseAsFileTime(ctypes.byref(file_time))
    return (file_time.dwLowDateTime + (file_time.dwHighDateTime << 32)) / 1.0e7
GetSystemTimePreciseAsFileTime function
On the same Win10 system, using two distinct approaches, there appears to be an approximate 500 ns time difference. If you care about nanosecond precision, check my code below.
The modified code is based on code from users cod3monk3y and Kevin S.
OS: python 3.7.3 (default, date, time) [MSC v.1915 64 bit (AMD64)]
import sys
import time

def measure1(mean):
    for i in range(1, my_range+1):
        x = time.time()
        td = x - samples1[i-1][2]
        if i-1 == 0:
            td = 0
        td = f'{td:.6f}'
        samples1.append((i, td, x))
        mean += float(td)
        print(mean)
        sys.stdout.flush()
        time.sleep(0.001)
    mean = mean/my_range
    return mean

def measure2(nr):
    t0 = time.time()
    t1 = t0
    while t1 == t0:
        t1 = time.time()
    td = t1-t0
    td = f'{td:.6f}'
    return (nr, td, t1, t0)

samples1 = [(0, 0, 0)]
my_range = 10
mean1 = 0.0
mean2 = 0.0

mean1 = measure1(mean1)
for i in samples1:
    print(i)
print('...\n\n')

samples2 = [measure2(i) for i in range(11)]
for s in samples2:
    #print(f'time delta: {s:.4f} seconds')
    mean2 += float(s[1])
    print(s)
mean2 = mean2/my_range

print('\nMean1 : ' f'{mean1:.6f}')
print('Mean2 : ' f'{mean2:.6f}')
The measure1 results:
nr, td, t0
(0, 0, 0)
(1, '0.000000', 1562929696.617988)
(2, '0.002000', 1562929696.6199884)
(3, '0.001001', 1562929696.620989)
(4, '0.001001', 1562929696.62199)
(5, '0.001001', 1562929696.6229906)
(6, '0.001001', 1562929696.6239917)
(7, '0.001001', 1562929696.6249924)
(8, '0.001000', 1562929696.6259928)
(9, '0.001001', 1562929696.6269937)
(10, '0.001001', 1562929696.6279945)
...
The measure2 results:
nr, td , t1, t0
(0, '0.000500', 1562929696.6294951, 1562929696.6289947)
(1, '0.000501', 1562929696.6299958, 1562929696.6294951)
(2, '0.000500', 1562929696.6304958, 1562929696.6299958)
(3, '0.000500', 1562929696.6309962, 1562929696.6304958)
(4, '0.000500', 1562929696.6314962, 1562929696.6309962)
(5, '0.000500', 1562929696.6319966, 1562929696.6314962)
(6, '0.000500', 1562929696.632497, 1562929696.6319966)
(7, '0.000500', 1562929696.6329975, 1562929696.632497)
(8, '0.000500', 1562929696.633498, 1562929696.6329975)
(9, '0.000500', 1562929696.6339984, 1562929696.633498)
(10, '0.000500', 1562929696.6344984, 1562929696.6339984)
End result:
Mean1 : 0.001001 # (measure1 function)
Mean2 : 0.000550 # (measure2 function)
Here is a Python 3 solution for Windows, building upon the answer posted above by CyberSnoopy (using GetSystemTimePreciseAsFileTime). We borrow some code from jfs (Python datetime.utcnow() returning incorrect datetime) and get a precise timestamp (Unix time) in microseconds:
#! python3
import ctypes.wintypes

def utcnow_microseconds():
    system_time = ctypes.wintypes.FILETIME()
    # system call used by time.time()
    # ctypes.windll.kernel32.GetSystemTimeAsFileTime(ctypes.byref(system_time))
    # getting high precision:
    ctypes.windll.kernel32.GetSystemTimePreciseAsFileTime(ctypes.byref(system_time))
    large = (system_time.dwHighDateTime << 32) + system_time.dwLowDateTime
    return large // 10 - 11644473600000000

for ii in range(5):
    print(utcnow_microseconds()*1e-6)
References
https://learn.microsoft.com/en-us/windows/win32/sysinfo/time-functions
https://learn.microsoft.com/en-us/windows/win32/api/sysinfoapi/nf-sysinfoapi-getsystemtimepreciseasfiletime
https://support.microsoft.com/en-us/help/167296/how-to-convert-a-unix-time-t-to-a-win32-filetime-or-systemtime
1. Python 3.7 or later
If using Python 3.7 or later, use the modern, cross-platform time module functions such as time.monotonic_ns(), here: https://docs.python.org/3/library/time.html#time.monotonic_ns. It provides nanosecond-resolution timestamps.
import time

time_ns = time.monotonic_ns()
# or on Unix or Linux you can also use (it requires a clock id):
time_ns = time.clock_gettime_ns(time.CLOCK_MONOTONIC)
# or on Windows:
time_ns = time.perf_counter_ns()
# etc. etc. There are others. See the link above.
From my other answer from 2016, here: How can I get millisecond and microsecond-resolution timestamps in Python?:
You might also try time.clock_gettime_ns() on Unix or Linux systems. Based on its name, it appears to call the underlying clock_gettime() C function which I use in my nanos() function in C in my answer here and in my C Unix/Linux library here: timinglib.c.
2. Python 3.3 or later
On Windows, in Python 3.3 or later, you can use time.perf_counter(), as shown by #ereOn here. See: https://docs.python.org/3/library/time.html#time.perf_counter. This provides roughly a 0.5us-resolution timestamp, in floating point seconds. Ex:
import time
# For Python 3.3 or later
time_sec = time.perf_counter() # Windows only, I think
# or on Unix or Linux (I think only those)
time_sec = time.monotonic()
3. Pre-Python 3.3 (ex: Python 3.0, 3.1, 3.2), or later
Summary:
See my other answer from 2016 here for 0.5-us-resolution timestamps, or better, in Windows and Linux, and for versions of Python as old as 3.0, 3.1 or 3.2 even! We do this by calling C or C++ shared object libraries (.dll on Windows, or .so on Unix or Linux) using the ctypes module in Python.
I provide these functions:
millis()
micros()
delay()
delayMicroseconds()
Download GS_timing.py from my eRCaGuy_PyTime repo, then do:
import GS_timing
time_ms = GS_timing.millis()
time_us = GS_timing.micros()
GS_timing.delay(10) # delay 10 ms
GS_timing.delayMicroseconds(10000) # delay 10000 us
Details:
In 2016, I was working in Python 3.0 or 3.1, on an embedded project on a Raspberry Pi, which I also tested and ran frequently on Windows. I needed nanosecond resolution for some precise timing I was doing with ultrasonic sensors. The Python language at the time did not provide this resolution, and neither did any answer to this question, so I came up with this separate Q&A here: How can I get millisecond and microsecond-resolution timestamps in Python?. I stated in the question at the time:
I read other answers before asking this question, but they rely on the time module, which prior to Python 3.3 did NOT have any type of guaranteed resolution whatsoever. Its resolution is all over the place. The most upvoted answer here quotes a Windows resolution (using their answer) of 16 ms, which is 32000 times worse than my answer provided here (0.5 us resolution). Again, I needed 1 ms and 1 us (or similar) resolutions, not 16000 us resolution.
Zero, I repeat: zero answers here on 12 July 2016 had any resolution better than 16-ms for Windows in Python 3.1. So, I came up with this answer which has 0.5us or better resolution in pre-Python 3.3 in Windows and Linux. If you need something like that for an older version of Python, or if you just want to learn how to call C or C++ dynamic libraries in Python (.dll "dynamically linked library" files in Windows, or .so "shared object" library files in Unix or Linux) using the ctypes library, see my other answer here.
I created a tiny C-Extension that uses GetSystemTimePreciseAsFileTime to provide an accurate timestamp on Windows:
https://win-precise-time.readthedocs.io/en/latest/api.html#win_precise_time.time
Usage:
>>> import win_precise_time
>>> win_precise_time.time()
1654539449.4548845
def start(self):
    sec_arg = 10.0
    cptr = 0
    time_start = time.time()
    time_init = time.time()
    while True:
        cptr += 1
        time_start = time.time()
        time.sleep(((time_init + (sec_arg * cptr)) - time_start))
        # AND YOUR CODE .......
        t00 = threading.Thread(name='thread_request', target=self.send_request, args=([]))
        t00.start()

Trying to read an ADC with Cython on an RPi 2 b+ via SPI (MCP3304, MCP3204, or MCP3008)?

I'd like to read differential voltage values from an MCP3304 (5v VDD, 3.3v Vref, main channel = 7, diff channel = 6) connected to an RPi 2 b+ as close as possible to the MCP3304's max sample rate of 100ksps. Preferably, I'd get > 1 sample per 100µs (> 10 ksps).
A kind user recently suggested I try porting my code to C for some speed gains. I'm VERY new to C, so thought I'd give Cython a shot, but can't seem to figure out how to tap into the C-based speed gains.
My guess is that I need to write a .pyx file that includes a more-raw means of accessing the ADC's bits/bytes via SPI than the python package I'm currently using (the python-wrapped gpiozero package). 1) Does this seem correct, and, if so, might someone 2) please help me understand how to manipulate bits/bytes appropriately for the MCP3304 in a way that will produce speed gains from Cython? I've seen C tutorials for the MCP3008, but am having trouble adapting this code to fit the timing laid out in the MCP3304's spec sheet; though I might be able to adapt a Cython-specific MCP3008 (or other ADC) tutorial to fit the MCP3304.
Here's a little .pyx loop I wrote to test how fast I'm reading voltage values. (Timing how long it takes to read 25,000 samples). It's ~ 9% faster than running it straight in Python.
# Load packages
import time
from gpiozero import MCP3304

# create a class to ping PD every ms for 1 minute
def pd_ping():
    cdef char *FILENAME = "testfile.txt"
    cdef double v
    # cdef int adc_channel_pd = 7
    cdef size_t i
    # send message to user re: expected timing
    print "Running for ", 25000, "iterations. Fingers crossed!"
    print time.clock()
    s = []
    for i in range(25000):
        v = MCP3304(channel = 7, diff = True).value * 3.3
        # concatenate
        s.append( str(v) )
    print "donezo", time.clock()
    # write to file
    out = '\n'.join(s)
    f = open(FILENAME, "w")
    f.write(out)
There is probably no need to create an MCP3304 object for each iteration. Also conversion from float to string can be delayed.
s = []
mcp = MCP3304(channel = 7, diff = True)
for i in range(25000):
    v = mcp.value * 3.3
    s.append(v)
out = '\n'.join('{:.2f}'.format(v) for v in s) + '\n'
If that multiplication by 3.3 is not strictly necessary at that point, it could be done later.
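As a sketch of that second suggestion (assuming the same gpiozero MCP3304 wrapper and file name as in the question), the scaling by 3.3 can be folded into the formatting step at the end, keeping the sampling loop as tight as possible:
from gpiozero import MCP3304

mcp = MCP3304(channel=7, diff=True)
raw = [mcp.value for _ in range(25000)]   # tight sampling loop, no float->str work
out = '\n'.join('{:.2f}'.format(v * 3.3) for v in raw) + '\n'
with open("testfile.txt", "w") as f:
    f.write(out)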
