I'm new to Cython, and have made a mean.pyx file with
#cython: language_level=3
def cmean(list arr, int length):
    cdef float tot
    cdef float elem
    tot = 0
    for i in range(length):
        elem = arr[i]
        tot += elem
    tot /= length
    return tot
I then call this from a Python file main.py:
import pyximport
pyximport.install()
from mean import cmean
arr = [1,2,4]
cres = cmean(arr, len(arr))
pyres = sum(arr)/len(arr)
print(cres)
print(pyres)
print(cres == pyres)
which outputs
2.3333332538604736
2.3333333333333335
False
Why are the results not the same?
I'm using Cython==0.29.30 and Python 3.9.2
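A likely explanation, sketched in plain Python: `cdef float` is a C single-precision float (32 bits), while a Python float is a C double (64 bits). Rounding 7/3 to single precision with the standard struct module reproduces the Cython output above. The helper name to_single is only illustrative.

```python
import struct

def to_single(x):
    """Round a Python float (C double) to the nearest C float and back."""
    return struct.unpack("f", struct.pack("f", x))[0]

arr = [1, 2, 4]
pyres = sum(arr) / len(arr)   # computed in double precision
cres = to_single(pyres)       # the closest value a 32-bit float can hold

print(cres)           # 2.3333332538604736
print(pyres)          # 2.3333333333333335
print(cres == pyres)  # False
```

Under that assumption, declaring tot and elem as double instead of float in mean.pyx should make the two results agree.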
This is my code:
import trng
bitn: int = 2 ** (2 ** 3)
w: "function" = lambda q=1: q * (trng.randbelow(1001) / 1000)
def rn(x0: int = None):
    if (m := trng.randbelow(bitn)) <= 1:
        m: int = 2
    a, c = (trng.randbelow(m) for _ in range(2))
    if (c := trng.randbelow(m)) <= 1:
        c: int = 2
    if x0 == None:
        x0: int = trng.randbelow(m)
    xs: list = [x0]
    for k in range(1, bitn):
        x = (a * xs[k - 1] + c) % m
        if xs.count(x) > 1:
            break
        xs.append(x)
    xs: str = str(sum(xs))
    x = trng.choice(xs)
    del m, a, c
    del x0, xs, k
    return x
def sm():
    listbits = [rn() for _ in range(64)]
    listbits: str = "".join(listbits)
    listbits: int = int(listbits)
    return listbits
while True: print(sm())
But when I run this, Python prints a number with fewer than 64 digits, even though sm() joins 64 characters, and I don't understand why. I've tried calling int outside the function (in the while loop), and I've tried converting on the same line where I join the list into a string with an empty separator; the result is the same. Can someone help me, please?
Thanks for your time and attention!
I'm using Windows 10 1903 x86 and Python 3.8.3
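One likely cause, assuming rn() can return the character "0": int() discards leading zeros, so the integer has fewer digits than the joined string whenever the first characters are zeros. A minimal sketch with hard-coded digits standing in for the random ones:

```python
# Hard-coded stand-in for [rn() for _ in range(64)]; the real values are random.
digits = ["0", "0", "7", "3", "5"]

joined = "".join(digits)   # "00735": all five characters are kept
as_int = int(joined)       # 735: the leading zeros are dropped

print(len(joined))         # 5
print(len(str(as_int)))    # 3

# Zero-padding on output restores a fixed width without changing the value.
padded = str(as_int).zfill(len(digits))
print(padded)              # 00735
```

If a fixed width of 64 characters matters, returning the joined string (or zero-padding when printing) avoids the problem.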
I want to iterate through a large data structure in a Python program and perform a task for each element. For simplicity, let's say the elements are integers and the task is just an incrementation. In the end, the last incremented element is returned as (dummy) result. In search of the best structure/method to do this I compared timings in pure Python and Cython for these structures (I could not find a direct comparison of them elsewhere):
Python list
NumPy array / typed memory view
Cython extension type with underlying C++ vector
The iterations I timed are:
Python foreach in list iteration (it_list)
Cython list iteration with explicit element access (cit_list)
Python foreach in array iteration (it_nparray)
Python NumPy vectorised operation (vec_nparray)
Cython memory view iteration with explicit element access (cit_memview)
Python foreach in underlying vector iteration (it_pyvector)
Python foreach in underlying vector iteration via __iter__ (it_pyvector_iterator)
Cython vector iteration with explicit element access (cit_pyvector)
Cython vector iteration via vector.iterator (cit_pyvector_iterator)
I am concluding from this (timings are below):
plain Python iteration over the NumPy array is extremely slow (about 10 times slower than the Python list iteration) -> not a good idea
Python iteration over the wrapped C++ vector is slow, too (about 1.5 times slower than the Python list iteration) -> not a good idea
Cython iteration over the wrapped C++ vector is the fastest option, approximately equal to the C contiguous memory view
The iteration over the vector using explicit element access is slightly faster than using an iterator -> why bother to use an iterator?
The memory view approach has a comparatively larger overhead than the extension type approach, which shows most clearly for small n
My question is now: Are my numbers reliable (did I do something wrong or miss anything here)? Is this in line with your experience with real-world examples? Is there anything else I could do to improve the iteration? Below the code that I used and the timings. I am using this in a Jupyter notebook by the way. Suggestions and comments are highly appreciated!
Relative timings (minimum value 1.000), for different data structure sizes n:
================================================================================
Timings for n = 1:
--------------------------------------------------------------------------------
cit_pyvector_iterator: 1.000
cit_pyvector: 1.005
cit_list: 1.023
it_list: 3.064
it_pyvector: 4.230
it_pyvector_iterator: 4.937
cit_memview: 8.196
vec_nparray: 20.187
it_nparray: 25.310
================================================================================
================================================================================
Timings for n = 1000:
--------------------------------------------------------------------------------
cit_pyvector_iterator: 1.000
cit_pyvector: 1.001
cit_memview: 2.453
vec_nparray: 5.845
cit_list: 9.944
it_list: 137.694
it_pyvector: 199.702
it_pyvector_iterator: 218.699
it_nparray: 1516.080
================================================================================
================================================================================
Timings for n = 1000000:
--------------------------------------------------------------------------------
cit_pyvector: 1.000
cit_memview: 1.056
cit_pyvector_iterator: 1.197
vec_nparray: 2.516
cit_list: 7.089
it_list: 87.099
it_pyvector_iterator: 143.232
it_pyvector: 162.374
it_nparray: 897.602
================================================================================
================================================================================
Timings for n = 10000000:
--------------------------------------------------------------------------------
cit_pyvector: 1.000
cit_memview: 1.004
cit_pyvector_iterator: 1.060
vec_nparray: 2.721
cit_list: 7.714
it_list: 88.792
it_pyvector_iterator: 130.116
it_pyvector: 149.497
it_nparray: 872.798
================================================================================
Cython code:
%%cython --annotate
# distutils: language = c++
# cython: boundscheck = False
# cython: wraparound = False
from libcpp.vector cimport vector
from cython.operator cimport dereference as deref, preincrement as princ
# Extension type wrapping a vector
cdef class pyvector:
    cdef vector[long] _data
    cpdef void push_back(self, long x):
        self._data.push_back(x)
    def __iter__(self):
        cdef size_t i, n = self._data.size()
        for i in range(n):
            yield self._data[i]
    @property
    def data(self):
        return self._data
# Cython iteration over Python list
cpdef long cit_list(list l):
    cdef:
        long j, ii
        size_t i, n = len(l)
    for i in range(n):
        ii = l[i]
        j = ii + 1
    return j
# Cython iteration over NumPy array
cpdef long cit_memview(long[::1] v) nogil:
    cdef:
        size_t i, n = v.shape[0]
        long j
    for i in range(n):
        j = v[i] + 1
    return j
# Iterate over pyvector
cpdef long cit_pyvector(pyvector v) nogil:
    cdef:
        size_t i, n = v._data.size()
        long j
    for i in range(n):
        j = v._data[i] + 1
    return j
cpdef long cit_pyvector_iterator(pyvector v) nogil:
    cdef:
        vector[long].iterator it = v._data.begin()
        long j
    while it != v._data.end():
        j = deref(it) + 1
        princ(it)
    return j
Python code:
# Python iteration over Python list
def it_list(l):
    for i in l:
        j = i + 1
    return j
# Python iteration over NumPy array
def it_nparray(a):
    for i in a:
        j = i + 1
    return j
# Vectorised NumPy operation
def vec_nparray(a):
    a + 1
    return a[-1]
# Python iteration over C++ vector extension type
def it_pyvector_iterator(v):
    for i in v:
        j = i + 1
    return j
def it_pyvector(v):
    for i in v.data:
        j = i + 1
    return j
And for the benchmark:
import numpy as np
from operator import itemgetter
def bm(sizes):
    """Call functions with data structures of varying length"""
    Timings = {}
    for n in sizes:
        Timings[n] = {}
        # Python list
        list_ = list(range(n))
        # NumPy array
        a = np.arange(n, dtype=np.int64)
        # C++ vector extension type
        pyv = pyvector()
        for i in range(n):
            pyv.push_back(i)
        calls = [
            (it_list, list_),
            (cit_list, list_),
            (it_nparray, a),
            (vec_nparray, a),
            (cit_memview, a),
            (it_pyvector, pyv),
            (it_pyvector_iterator, pyv),
            (cit_pyvector, pyv),
            (cit_pyvector_iterator, pyv),
        ]
        for fxn, arg in calls:
            Timings[n][fxn.__name__] = %timeit -o fxn(arg)
    return Timings
def ratios(timings, base=None):
    """Show relative performance of runs based on `timings` dict"""
    if base is not None:
        base = timings[base].average
    else:
        base = min(x.average for x in timings.values())
    return sorted([
        (k, v.average / base)
        for k, v in timings.items()
    ], key=itemgetter(1))
Timings = {}
sizes = [1, 1000, 1000000, 10000000]
Timings.update(bm(sizes))
for s in sizes:
    print("=" * 80)
    print(f"Timings for n = {s}:")
    print("-" * 80)
    for x in ratios(Timings[s]):
        print(f"{x[0]:>25}: {x[1]:7.3f}")
    print("=" * 80, "\n")
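Outside a notebook the %timeit magic is not available. A rough equivalent of the relative-timing table can be sketched with the standard timeit module. Here best_time and relative are hypothetical helper names, last_plus_one is a made-up second candidate added only so there is something to compare against, and taking the minimum of several repeats (rather than the average used above) is a choice made for this sketch.

```python
import timeit

def it_list(l):
    # Same loop as in the question: increment every element, keep the last.
    j = None
    for i in l:
        j = i + 1
    return j

def last_plus_one(l):
    # Hypothetical shortcut that skips the loop; used only as a second candidate.
    return l[-1] + 1

def best_time(fn, arg, repeat=3, number=100):
    """Best per-call time in seconds over `repeat` runs of `number` calls."""
    runs = timeit.repeat(lambda: fn(arg), repeat=repeat, number=number)
    return min(runs) / number

def relative(timings):
    """Scale timings so the fastest entry is 1.0, sorted fastest first."""
    base = min(timings.values())
    return sorted(((name, t / base) for name, t in timings.items()),
                  key=lambda item: item[1])

data = list(range(10_000))
timings = {fn.__name__: best_time(fn, data) for fn in (it_list, last_plus_one)}
for name, ratio in relative(timings):
    print(f"{name:>25}: {ratio:7.3f}")
```

The minimum is often preferred for micro-benchmarks because slower runs mostly reflect interference from other processes, not the code being measured.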
I am a newbie with Cython and am trying to convert a Python class to Cython. I don't know how I should declare the argument z of the method Da so that it can handle both a numpy.ndarray and a single float.
cdef class Cosmology(object):
    cdef double omega_m, omega_lam, omega_c
    def __init__(self,double omega_m=0.3,double omega_lam=0.7):
        self.omega_m = omega_m
        self.omega_lam = omega_lam
        self.omega_c = (1. - omega_m - omega_lam)
    cpdef double a(self, double z):
        cdef double a
        return 1./(1+z)
    cpdef double E(self, double a):
        cdef double E
        return (self.omega_m*a**(-3) + self.omega_c*a**(-2) + self.omega_lam)**0.5
    cpdef double __angKernel(self, double x):
        cdef __angKernel:
        """Integration kernel"""
        return self.E(x**-1)**-1
    cpdef double Da(self, double z, double z_ref=0):
        cdef double Da
        if isinstance(z, np.ndarray):
            da = np.zeros_like(z)
            for i in range(len(da)):
                da[i] = self.Da(z[i], z_ref)
            return da
        else:
            if z < 0:
                raise ValueError("Redshift z must not be negative")
            if z < z_ref:
                raise ValueError("Redshift z must not be smaller than the reference redshift")
            d = integrate.quad(self.__angKernel, z_ref+1, z+1,epsrel=1.e-6, epsabs=1.e-12)
            rk = (abs(self.omega_c))**0.5
            if (rk*d[0] > 0.01):
                if self.omega_c > 0:
                    d[0] = sinh(rk*d[0])/rk
                if self.omega_c < 0:
                    d[0] = sin(rk*d[0])/rk
            return d[0]/(1+z)
I also wonder whether I have declared all the arguments correctly for Cython. I want to convert my original Python code to speed up the calculation. One of the bottlenecks in my code, I reckon, is integrate.quad. Is there a substitute for this function in Cython that would speed up my code?
cdef class halo_positions(object):
    cdef double x = None
    cdef double y = None
    def __init__(self,numpy.ndarray[double, ndim=1] positions):
        self.x = positions[0]
        self.y = positions[1]
And if I want to pass an array to a halo_positions instance, is this the right way to do it?
A cdef class can still be instantiated from Python, but its cdef attributes and cdef methods are reachable only from Cython. If the methods are called only from Cython code, cpdef and def are unnecessary and less efficient for them, and you can convert them all to cdef.
When you declare z as double, it will accept only a double. If you want this argument to accept two different types, leave its type undeclared, but this will directly affect the loop performance when z is an ndarray.
Alternatively, you could use double * and pass its size: when the size is 1 it is a single double, and when the size is >1 it is an array. The function would be:
cdef double Da(self, int size, double *z, double z_ref=0):
    if size>1:
        da = np.zeros(size)
        for i in range(size):
            da[i] = self.Da(1, &z[i], z_ref)
        return da
    else:
        if z[0] < 0:
            raise ValueError("Redshift z must not be negative")
        if z[0] < z_ref:
            raise ValueError("Redshift z must not be smaller than the reference redshift")
        d = integrate.quad(self.__angKernel, z_ref+1, z[0]+1,
                           epsrel=1.e-6, epsabs=1.e-12)
        rk = (abs(self.omega_c))**0.5
        if (rk*d[0] > 0.01):
            if self.omega_c > 0:
                d[0] = sinh(rk*d[0])/rk
            if self.omega_c < 0:
                d[0] = sin(rk*d[0])/rk
        return d[0]/(1+z[0])
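For comparison, the scalar-or-array dispatch can be sketched in plain Python. This is only an illustration under several assumptions: it uses a composite Simpson rule (the hypothetical helper simpson) as a stand-in for integrate.quad, it accepts lists or tuples instead of NumPy arrays, and it leaves out the curvature correction, so it covers the flat case only.

```python
import math
from numbers import Real

def inv_E(x, omega_m=0.3, omega_lam=0.7):
    """Integration kernel 1/E(a) evaluated at a = 1/x, as in __angKernel."""
    omega_c = 1.0 - omega_m - omega_lam
    a = 1.0 / x
    return 1.0 / math.sqrt(omega_m * a**-3 + omega_c * a**-2 + omega_lam)

def simpson(f, lo, hi, n=200):
    """Composite Simpson rule with n (even) intervals; stand-in for quad."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0

def Da(z, z_ref=0.0):
    """Accept either a single number or a sequence of numbers."""
    if not isinstance(z, Real):
        # Sequence input: apply the scalar branch to each element.
        return [Da(zi, z_ref) for zi in z]
    if z < 0:
        raise ValueError("Redshift z must not be negative")
    if z < z_ref:
        raise ValueError("Redshift z must not be smaller than the reference redshift")
    return simpson(inv_E, z_ref + 1.0, z + 1.0) / (1.0 + z)

print(Da(0.5))
print(Da([0.5, 1.0]))
```

Keeping the scalar computation in one branch and looping over it for sequences mirrors the structure of the original method; in Cython the scalar branch is the part that benefits from static typing.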
I am doing some performance tests on a variant of the prime number generator from http://docs.cython.org/src/tutorial/numpy.html.
The performance measurements below are for kmax=1000
Pure Python implementation, running in CPython: 0.15s
Pure Python implementation, running in Cython: 0.07s
def primes(kmax):
    p = []
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p.append(n)
            k = k + 1
        n = n + 1
    return p
Pure Python+Numpy implementation, running in CPython: 1.25s
import numpy
def primes(kmax):
    p = numpy.empty(kmax, dtype=int)
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p[k] = n
            k = k + 1
        n = n + 1
    return p
Cython implementation using int*: 0.003s
from libc.stdlib cimport malloc, free
def primes(int kmax):
    cdef int n, k, i
    cdef int *p = <int *>malloc(kmax * sizeof(int))
    result = []
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p[k] = n
            k = k + 1
            result.append(n)
        n = n + 1
    free(p)
    return result
The above performs great but looks horrible, as it holds two copies of the data... so I tried reimplementing it:
Cython + Numpy: 1.01s
import numpy as np
cimport numpy as np
cimport cython
DTYPE = np.int
ctypedef np.int_t DTYPE_t
@cython.boundscheck(False)
def primes(DTYPE_t kmax):
    cdef DTYPE_t n, k, i
    cdef np.ndarray p = np.empty(kmax, dtype=DTYPE)
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p[k] = n
            k = k + 1
        n = n + 1
    return p
Questions:
why is the numpy array so incredibly slower than a python list, when running on CPython?
what did I do wrong in the Cython+Numpy implementation? cython is obviously NOT treating the numpy array as an int[] as it should.
how do I cast a numpy array to a int*? The below doesn't work
cdef numpy.nparray a = numpy.zeros(100, dtype=int)
cdef int * p = <int *>a.data
cdef DTYPE_t [:] p_view = p
Using this view instead of p in the calculations reduced the runtime from 580 ms to 2.8 ms for me, which is about the same runtime as the implementation using int*. That's about the maximum you can expect from this.
DTYPE = np.int
ctypedef np.int_t DTYPE_t
@cython.boundscheck(False)
def primes(DTYPE_t kmax):
    cdef DTYPE_t n, k, i
    cdef np.ndarray p = np.empty(kmax, dtype=DTYPE)
    cdef DTYPE_t [:] p_view = p
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p_view[i] != 0:
            i = i + 1
        if i == k:
            p_view[k] = n
            k = k + 1
        n = n + 1
    return p
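The key property of p_view is that it is a view on the array's buffer, not a copy, so values written through the view end up in p, which is why the function can still return p. Python's built-in memoryview behaves analogously and can illustrate this without compiling anything. It is only an analogy: it does not give the C-speed indexing of a Cython typed memoryview, and the array module stands in for NumPy here.

```python
from array import array

p = array("i", [0] * 5)      # contiguous buffer of C ints
p_view = memoryview(p)       # view on the same memory, no copy

p_view[0] = 2                # write through the view...
p_view[1] = 3

print(p[0], p[1])            # 2 3  (...and the underlying array sees it)
```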
why is the numpy array so incredibly slower than a python list, when running on CPython?
Because you didn't fully type it. Use
cdef np.ndarray[dtype=np.int_t, ndim=1] p = np.empty(kmax, dtype=DTYPE)
how do I cast a numpy array to a int*?
By using np.intc as the dtype, not np.int (which is a C long). That's
cdef np.ndarray[dtype=int, ndim=1] p = np.empty(kmax, dtype=np.intc)
(But really, use a memoryview, they're much cleaner and the Cython folks want to get rid of the NumPy array syntax in the long run.)
Best syntax I found so far:
import numpy
cimport numpy
cimport cython
@cython.boundscheck(False)
@cython.wraparound(False)
def primes(int kmax):
    cdef int n, k, i
    cdef numpy.ndarray[int] p = numpy.empty(kmax, dtype=numpy.int32)
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p[k] = n
            k = k + 1
        n = n + 1
    return p
Note that I used numpy.int32 instead of int. Anything on the left side of a cdef is a C type (so int is typically int32 and float is float32), while anything on the right side of it (or outside of a cdef) is a Python type (so int as a dtype is typically int64, depending on the platform, and float is float64).
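The widths behind that rule of thumb can be checked with the standard ctypes module. A small sketch: the sizes printed for int and long depend on the platform and compiler (C long is commonly 8 bytes on 64-bit Linux and macOS but 4 bytes on Windows), so no output is shown for the loop.

```python
import ctypes
import sys

# Sizes in bytes of the C types that a Cython cdef declaration refers to.
for name in ("c_int", "c_long", "c_float", "c_double"):
    print(name, ctypes.sizeof(getattr(ctypes, name)))

# A Python float is a C double, not a C float.
print(sys.float_info.mant_dig)   # 53 mantissa bits, i.e. double precision
```

If the dtype of the array and the C type in the buffer declaration differ in width, Cython raises a buffer dtype mismatch error at run time, which is why the two have to be chosen together.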
Here's the piece of code that takes the most time in my program, according to timeit statistics. It's a dirty function that converts floats in the [-1.0, 1.0] interval into unsigned integers in [0, 2**32]. How can I accelerate floatToInt?
piece = []
rng = range(32)
for i in rng:
    piece.append(1.0/2**i)
def floatToInt(x):
    n = x + 1.0
    res = 0
    for i in rng:
        if n >= piece[i]:
            res += 2**(31-i)
            n -= piece[i]
    return res
Did you try the obvious one?
def floatToInt(x):
    return int((x+1.0) * (2**31))
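A quick check, sketched below, suggests the one-liner agrees with the loop everywhere except at the upper endpoint: for x = 1.0 the loop saturates at 2**32 - 1, while the multiplication returns 2**32, which no longer fits in 32 bits. The names float_to_int_loop, float_to_int_fast, and float_to_int_clamped are illustrative, and the clamp is one possible way to handle the endpoint, not something from the original code.

```python
piece = [1.0 / 2**i for i in range(32)]

def float_to_int_loop(x):
    # Original bit-by-bit version.
    n = x + 1.0
    res = 0
    for i in range(32):
        if n >= piece[i]:
            res += 2 ** (31 - i)
            n -= piece[i]
    return res

def float_to_int_fast(x):
    # Suggested one-liner.
    return int((x + 1.0) * (2**31))

def float_to_int_clamped(x):
    # One-liner with the endpoint clamped to the largest 32-bit value.
    return min(int((x + 1.0) * (2**31)), 2**32 - 1)

# Both versions agree on sample inputs below the upper endpoint.
for x in (-1.0, -0.5, 0.0, 0.3, 0.999):
    assert float_to_int_loop(x) == float_to_int_fast(x)

print(float_to_int_loop(1.0))     # 4294967295
print(float_to_int_fast(1.0))     # 4294967296
print(float_to_int_clamped(1.0))  # 4294967295
```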