I am currently working on improving the runtime of a simple Cython function that multiplies a numpy matrix A and a numpy vector x using BLAS (i.e. what A.dot(x) computes in plain numpy).
My current implementation matrix_multiply(A,x) does this without copying the data:
import cython
import numpy as np
cimport numpy as np
cimport scipy.linalg.cython_blas as blas
DTYPE = np.float64
ctypedef np.float64_t DTYPE_T
@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
def matrix_multiply(np.ndarray[DTYPE_T, ndim=2, mode="fortran"] A, np.ndarray[DTYPE_T, ndim=1, mode="fortran"] x):
    # calls dgemv from BLAS, which computes y = alpha * op(A) * x + beta * y;
    # trans = "N" means op(A) = A
    # see: http://www.nag.com/numeric/fl/nagdoc_fl22/xhtml/F06/f06paf.xml
    cdef int N = A.shape[0]
    cdef int D = A.shape[1]
    cdef int lda = N  # leading dimension of the column-major A
    cdef int incx = 1  # increments of x
    cdef int incy = 1  # increments of y
    cdef double alpha = 1.0
    cdef double beta = 0.0
    cdef np.ndarray[DTYPE_T, ndim=1, mode="fortran"] y = np.empty(N, dtype=DTYPE)
    blas.dgemv("N", &N, &D, &alpha, &A[0,0], &lda, &x[0], &incx, &beta, &y[0], &incy)
    return y
I am wondering how I can change this so that it computes A[:, selected].dot(x) instead of A.dot(x), where selected is an ordered set of column indices.
I am open to any implementation, though I suppose that the easiest way would be to change the function header to matrix_multiply(A,x,selected) so it also expects selected as an input. I believe that the answer has to use memory views, but I am not sure.
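To pin down what the selected-column product means, here is a plain NumPy sketch (not a Cython/BLAS implementation) of three equivalent formulations; the function names are hypothetical. Fancy indexing `A[:, selected]` copies the chosen columns. Accumulating one column at a time reads only the selected columns of `A` in place, which is the access pattern a loop of per-column BLAS `daxpy` calls would follow on a Fortran-ordered `A`. Scattering `x` into a zero vector of length `D` lets one full product (a single `dgemv`) do the work without copying `A`, at the cost of also multiplying by the zero entries. If `selected` happens to be a contiguous range `s..e-1`, those columns already form one contiguous block of a Fortran-ordered `A` starting at `&A[0, s]`, so a single `dgemv` with `n = e - s` and the same `lda` covers them.

```python
import numpy as np

def matvec_fancy(A, x, selected):
    # A[:, selected] builds a copy of the chosen columns, then one mat-vec.
    return A[:, selected].dot(x)

def matvec_columnwise(A, x, selected):
    # y = sum_k x[k] * A[:, selected[k]]; each column of a Fortran-ordered A
    # is contiguous, so this mirrors a loop of BLAS daxpy calls.
    y = np.zeros(A.shape[0], dtype=A.dtype)
    for k, j in enumerate(selected):
        y += x[k] * A[:, j]
    return y

def matvec_scatter(A, x, selected):
    # Place x at the (unique) selected positions of a length-D zero vector so
    # that one full product gives the same result without copying A.
    x_full = np.zeros(A.shape[1], dtype=A.dtype)
    x_full[selected] = x
    return A.dot(x_full)

A = np.asfortranarray(np.random.default_rng(0).standard_normal((5, 8)))
selected = np.array([1, 3, 4, 7])
x = np.arange(1.0, len(selected) + 1)
print(np.allclose(matvec_fancy(A, x, selected), matvec_columnwise(A, x, selected)))
```

Which variant is fastest in Cython depends on how many columns are selected: the scatter version does `D` columns of work, the column-wise one only `len(selected)`.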
I want to make a boolean numpy array in Cython with the same shape as another numpy array, but it raises an error message:
CosmoTest.pyx
import numpy as np
cimport numpy as np
cimport cython
from libcpp cimport bool
x=np.array([[-0.3,1.2],[2.5,0.82],[0.61,-0.7]])
mask= np.ones_like(x,dtype=bool)
error:
mask= np.ones_like(x,dtype=bool)
^
------------------------------------------------------------
CosmoTest.pyx:318:39: 'bool' is not a constant, variable or function identifier
How should it be defined in cython?
Update:
cpdef np.ndarray arc( np.ndarray x):
    cdef np.ndarray[double, ndim=1, mode='c'] out = np.zeros_like(x)
    cdef np.ndarray[np.uint8_t,cast=True, ndim=1] mask = (x < 0.999).view(dtype=np.uint8)
    if mask.any():
        out[mask] = 0.5*np.log((1.+((1.-x[mask])/(x[mask]+1.))**0.5)/(1.-((1.-x[mask])/(x[mask]+1.))**0.5))/(1-x[mask]**2)**0.5
    cdef np.ndarray[np.uint8_t,cast=True, ndim=1] mask = (x > 1.001).view(dtype=np.uint8)
    if mask.any():
        out[mask] = np.arctan(((x[mask]-1.)/(x[mask]+1.))**0.5)/(x[mask]**2 - 1)**0.5
    cdef np.ndarray[np.uint8_t,cast=True , ndim=1] mask = ((x >= 0.999) & (x <= 1.001)).view(dtype=np.uint8)
    if mask.any():
        out[mask] = 5./6. - x[mask]/3.
    return out
Error Message:
Error compiling Cython file:
------------------------------------------------------------
...
if mask.any():
out[mask] = 0.5*np.log((1.+((1.-x[mask])/(x[mask]+1.))**0.5)/(1.-((1.-x[mask])/(x[mask]+1.))**0.5))/(1-x[mask]**2)**0.5
cdef np.ndarray[np.uint8_t,cast=True, ndim=1] mask = (x > 1.001).view(dtype=np.uint8)
if mask.any():
out[mask] = np.arctan(((x[mask]-1.)/(x[mask]+1.))**0.5)/(x[mask]**2 - 1)**0.5
^
------------------------------------------------------------
CosmoTest.pyx:9:55: local variable 'mask' referenced before assignment
If you change (the last line of) your code to
mask = np.ones_like(x, dtype=np.bool_)
it will work (take the boolean type from numpy rather than trying to use the libcpp definition). However, actually statically typing boolean numpy arrays doesn't quite work currently (see Passing a numpy pointer (dtype=np.bool) to C++).
The best way forward currently is to statically type them as
def f(np.ndarray[dtype=np.int8_t,ndim=1] x):
    cdef np.ndarray[dtype=np.int8_t,ndim=1] y
    y = np.ones_like(x,dtype=np.int8)
    return y.view(dtype=np.bool_) # returns as boolean array
Internally numpy uses an 8-bit integer to store a bool, and thus you can just use view to reinterpret the array without copying.
If you had a boolean array and wanted to call f you'd do
mask = np.array([True,False,True])
f(mask.view(dtype=np.int8))
You could always write a small wrapper function as your public interface to f to do that reinterpretation automatically.
It's more fiddly than it needs to be, but it is possible to work with.
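A plain-Python check (no Cython needed) that a boolean array and its int8 view really share one buffer with one byte per element, so the reinterpretation in both directions is free:

```python
import numpy as np

mask = np.array([True, False, True])
as_int8 = mask.view(dtype=np.int8)        # same bytes reinterpreted, no copy
print(mask.itemsize, as_int8.itemsize)    # one byte per element in both
print(as_int8)                            # True/False stored as 1/0
print(np.shares_memory(mask, as_int8))    # both use the same buffer
back = as_int8.view(dtype=np.bool_)       # and back to booleans, still no copy
print(back)
```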
Addition in response to comments
The article I linked to suggested using cast=True:
cdef np.ndarray[np.uint8_t,cast=True] mask = (x > 0.01)
This also works fine. Written in my approach that would be
cdef np.ndarray[np.uint8_t] mask = (x > 0.01).view(dtype=np.uint8)
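(i.e. no cast, but with a view). For typed element access there's no practical difference, so pick whichever you think looks nicer. One caveat: the view really is a uint8 array, so if you also use mask to index another array from Python (as in x[mask]), NumPy treats it as integer positions rather than as a boolean mask; with cast=True the object itself is still a boolean array.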
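A small NumPy sketch of how a boolean array and its uint8 view behave when used as an index from Python code:

```python
import numpy as np

x = np.array([0.5, 2.0, 0.9, 3.0])
bool_mask = x < 0.999                      # [True, False, True, False]
uint8_mask = bool_mask.view(dtype=np.uint8)

# Boolean indexing keeps the elements where the mask is True.
print(x[bool_mask])

# Integer (fancy) indexing treats each 0/1 as a position: x[1], x[0], x[1], x[0].
print(x[uint8_mask])

# Viewing the uint8 array back as booleans restores mask semantics.
print(x[uint8_mask.view(dtype=np.bool_)])
```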
And edited to respond to additional issues
The working code is below (I've checked and it compiles - I haven't checked to make sure it runs). You were getting compiler errors because you'd defined the type of mask multiple times. You're only allowed to use cdef once per variable per function, but having defined the type you can assign to it as often as you like.
cpdef np.ndarray arc(np.ndarray x):
    cdef np.ndarray[double, ndim=1, mode='c'] out = np.zeros_like(x)
    cdef np.ndarray[np.uint8_t, cast=True, ndim=1] mask = (x < 0.999)
    if mask.any():
        out[mask] = 0.5*np.log((1.+((1.-x[mask])/(x[mask]+1.))**0.5)/(1.-((1.-x[mask])/(x[mask]+1.))**0.5))/(1-x[mask]**2)**0.5
    mask = (x > 1.001) # REMOVED cdef!
    if mask.any():
        out[mask] = np.arctan(((x[mask]-1.)/(x[mask]+1.))**0.5)/(x[mask]**2 - 1)**0.5
    mask = ((x >= 0.999) & (x <= 1.001)) # REMOVED cdef!
    if mask.any():
        out[mask] = 5./6. - x[mask]/3.
    return out
(Here I've kept cast=True and dropped .view(dtype=np.uint8). For typed element access either form works, but mask is also used to index x and out, and a uint8 view would be treated as integer positions rather than as a mask; with cast=True the object stays a boolean array.)
The problem
I'm trying to Cythonize two small functions that mostly deal with numpy ndarrays for some scientific purpose. These two small functions are called millions of times in a genetic algorithm and account for most of its running time.
I made some progress on my own and both work nicely, but I get only a tiny speed improvement (10%). More importantly, cython --annotate shows that the majority of the code is still going through Python.
The code
First function:
The aim of this function is to get back slices of data and it is called millions of times in an inner nested loop. Depending on the bool in data[1][1], we either get the slice in the forward or reverse order.
%%cython --annotate
# (IPython notebook cell magic for Cython; it must be the first line of the cell)
import numpy as np
from scipy import signal as scisignal
cimport cython
cimport numpy as np
def get_signal(data):
    # data[0] contains the data structure containing the numpy arrays
    # data[1][0] contains the position to slice
    # data[1][1] contains the orientation to slice, forward = 0, reverse = 1
    cdef int halfwinwidth = 100
    cdef int midpoint = data[1][0]
    cdef int strand = data[1][1]
    cdef int start = midpoint - halfwinwidth
    cdef int end = midpoint + halfwinwidth
    # the arrays we want to slice
    cdef np.ndarray r0 = data[0]['normals_forward']
    cdef np.ndarray r1 = data[0]['normals_reverse']
    cdef np.ndarray r2 = data[0]['normals_combined']
    if strand == 0:
        normals_forward = r0[start:end]
        normals_reverse = r1[start:end]
        normals_combined = r2[start:end]
    else:
        normals_forward = r1[end - 1:start - 1: -1]
        normals_reverse = r0[end - 1:start - 1: -1]
        normals_combined = r2[end - 1:start - 1: -1]
    # return the result as a tuple
    row = (normals_forward,
           normals_reverse,
           normals_combined)
    return row
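A detail worth checking in the reverse branch: when `start == 0`, `start - 1` is `-1`, which Python slicing reads as "the last element", so `r[end - 1:start - 1:-1]` comes back empty instead of holding the window. Slicing forward and then reversing with `[::-1]` gives the same elements in the normal case and also handles `start == 0`; both forms return views, so nothing is copied. A plain NumPy sketch (the helper names are hypothetical):

```python
import numpy as np

r = np.arange(10.0)

def reversed_window_original(r, start, end):
    # The slicing used in get_signal for strand == 1.
    return r[end - 1:start - 1:-1]

def reversed_window(r, start, end):
    # Forward slice, then reverse: same elements, no -1 wrap-around.
    return r[start:end][::-1]

print(reversed_window_original(r, 2, 5), reversed_window(r, 2, 5))  # both hold 4, 3, 2
print(reversed_window_original(r, 0, 3), reversed_window(r, 0, 3))  # empty vs 2, 1, 0
```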
Second function
This one gets a list of tuples of numpy arrays; we want to add up the arrays elementwise, then normalize them and integrate (sum) the intersection.
def calculate_signal(list signal):
    cdef int halfwinwidth = 100
    cdef np.ndarray profile_normals_forward = np.zeros(halfwinwidth * 2, dtype='f')
    cdef np.ndarray profile_normals_reverse = np.zeros(halfwinwidth * 2, dtype='f')
    cdef np.ndarray profile_normals_combined = np.zeros(halfwinwidth * 2, dtype='f')
    # b is a tuple of 3 np.ndarrays containing 200 floats
    # here we add them up elementwise
    for b in signal:
        profile_normals_forward += b[0]
        profile_normals_reverse += b[1]
        profile_normals_combined += b[2]
    # normalize the arrays
    cdef int count = len(signal)
    # print "Normalizing to number of elements"
    profile_normals_forward /= count
    profile_normals_reverse /= count
    profile_normals_combined /= count
    intersection_signal = scisignal.detrend(np.fmin(profile_normals_forward, profile_normals_reverse))
    intersection_signal[intersection_signal < 0] = 0
    intersection = np.sum(intersection_signal)
    results = {"intersection": intersection,
               "profile_normals_forward": profile_normals_forward,
               "profile_normals_reverse": profile_normals_reverse,
               "profile_normals_combined": profile_normals_combined,
               }
    return results
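The `for b in signal` loop, with three in-place additions per tuple, runs at Python level however the accumulators are typed. One alternative, sketched in plain NumPy under the assumption that every tuple holds three equal-length 1D arrays, is to stack the whole list once and average along the new axis. The name `average_profiles` is hypothetical, and unlike the original it accumulates in float64 rather than float32 (`dtype='f'`):

```python
import numpy as np

def average_profiles(signal):
    # signal: list of (forward, reverse, combined) tuples of equal-length 1D arrays.
    stacked = np.asarray(signal, dtype=np.float64)   # shape (len(signal), 3, width)
    # Mean over the first axis = elementwise sum divided by the count.
    forward, reverse, combined = stacked.mean(axis=0)
    return forward, reverse, combined

rng = np.random.default_rng(1)
signal = [tuple(rng.random(200) for _ in range(3)) for _ in range(50)]
fwd, rev, comb = average_profiles(signal)
```

The detrend/clip/sum steps can then run on `fwd` and `rev` exactly as before.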
Any help is appreciated - I tried using memory views but for some reason the code got much, much slower.
After fixing the array cdef (as has been indicated, with the dtype specified), you should probably put the routine in a cdef function (which can only be called from Cython code, e.g. from a def function in the same module, not directly from Python).
In the declaration of the function, you'll need to provide the type (and the number of dimensions if it's a numpy array):
cdef get_signal(np.ndarray[DTYPE_t, ndim=3] data):
I'm not sure using a dict is a good idea though. You could make use of numpy's column or row slices like data[:, 0].
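As a sketch of that suggestion, assume (hypothetically) that the three profiles are stored as the rows of one `(3, L)` array instead of under dict keys. A single slice then selects the window for all three rows at once, and the reverse strand becomes a row swap plus a reversal along the positions; the function name is hypothetical:

```python
import numpy as np

def get_signal_rows(profiles, midpoint, strand, halfwinwidth=100):
    # profiles: hypothetical (3, L) array with rows forward, reverse, combined.
    start = midpoint - halfwinwidth
    end = midpoint + halfwinwidth
    window = profiles[:, start:end]      # one view covering all three rows
    if strand == 0:
        return window
    # Reverse strand: swap the forward/reverse rows and reverse the positions
    # (the row list makes this a copy).
    return window[[1, 0, 2], ::-1]

profiles = np.arange(3 * 1000, dtype=np.float64).reshape(3, 1000)
fwd_window = get_signal_rows(profiles, 500, 0)
rev_window = get_signal_rows(profiles, 500, 1)
```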
What I want:
I want to apply a 1D function to an arbitrarily shaped ndarray, such that it modifies a certain axis. Similar to the axis argument in numpy.fft.fft.
Take the following example:
import numpy as np
def transf1d(f, x, y, out):
    """Transform `f(x)` to `g(y)`.
    This function is actually a C-function that is far more complicated
    and should not be modified. It only takes 1D arrays as parameters.
    """
    out[...] = (f[None,:]*np.exp(-1j*x[None,:]*y[:,None])).sum(-1)
def transf_all(F, x, y, axis=-1, out=None):
    """General N-D transform.
    Perform `transf1d` along the given `axis`.
    Given the following:
        F.shape == (2, 3, 100, 4, 5)
        x.shape == (100,)
        y.shape == (50,)
        axis == 2
    Then the output shape would be:
        out.shape == (2, 3, 50, 4, 5)
    This function should wrap `transf1d` such that it works on arbitrarily
    shaped (compatible) arrays `F`, and `out`.
    """
    if out is None:
        shape = list(np.shape(F))
        shape[axis] = np.size(y)
        out = np.empty(shape, dtype=complex)
    for f, o in magic_iterator(F, out):  # the iterator I'm asking about
        # Given above shapes:
        # f.shape == (100,)
        # o.shape == (50,)
        transf1d(f, x, y, o)
    return out
The function transf1d takes a 1D ndarray f, and two more 1D arrays x, and y. It performs a Fourier transform of f(x) from the x-axis to the y-axis. The result is stored in the out argument.
Now I want to wrap this in a more general function transf_all, that can take ndarrays of arbitrary shape along with an axis argument, that specifies along which axis to transform.
Notes:
My code is actually written in Cython. Ideally, the magic_iterator would be fast in Cython.
The function transf1d actually is a C-function that returns its output in the out argument. Hence, I couldn't get it to work with numpy.apply_along_axis.
Because transf1d is actually a pretty complicated C-function I cannot rewrite it to work on arbitrary arrays. I need to wrap it in a Cython function that deals with the additional dimensions.
Note that the arrays x and y can differ in length.
My question:
How can I do this? How can I iterate over arbitrary dimensions of an ndarray such that at each iteration I will get a 1D array containing the specified axis?
I had a look at nditer, but I'm not sure if that is actually the right tool for this job.
Cheers!
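For reference, one way to get the `magic_iterator` behaviour in plain NumPy (not a Cython-speed version) is to move the transform axis to the end with `np.moveaxis`, which returns a view, and then walk the remaining indices with `np.ndindex`. Indexing the moved views with a tuple of integers yields 1D views into `F` and `out`, so `transf1d` writes straight into `out`. The generator name `iter_axis_pairs` is hypothetical:

```python
import numpy as np

def transf1d(f, x, y, out):
    # Reference 1D transform from the question.
    out[...] = (f[None, :] * np.exp(-1j * x[None, :] * y[:, None])).sum(-1)

def iter_axis_pairs(F, out, axis):
    # Views with `axis` moved last; writes through `o` land in `out`.
    Fm = np.moveaxis(F, axis, -1)
    om = np.moveaxis(out, axis, -1)
    for idx in np.ndindex(*Fm.shape[:-1]):
        yield Fm[idx], om[idx]

def transf_all(F, x, y, axis=-1, out=None):
    if out is None:
        shape = list(np.shape(F))
        shape[axis] = np.size(y)
        out = np.empty(shape, dtype=complex)
    for f, o in iter_axis_pairs(F, out, axis):
        transf1d(f, x, y, o)
    return out

F = np.random.default_rng(2).random((2, 3, 100, 4, 5))
x = np.linspace(0, 1, 100)
y = np.linspace(-1, 1, 50)
result = transf_all(F, x, y, axis=2)
```

In Cython the same pattern works with the Python-level loop; only the call into the C function becomes cheap.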
import numpy as np
def transf1d(f, x, y, out):
    """Transform `f(x)` to `g(y)`.
    This function is actually a C-function that is far more complicated
    and should not be modified. It only takes 1D arrays as parameters.
    """
    out[...] = (f[None,:]*np.exp(-1j*x[None,:]*y[:,None])).sum(-1)
def transf_all(F, x, y, axis=-1, out=None):
    """General N-D transform.
    Perform `transf1d` along the given `axis`.
    Given the following:
        F.shape == (2, 3, 100, 4, 5)
        x.shape == (100,)
        y.shape == (50,)
        axis == 2
    Then the output shape would be:
        out.shape == (2, 3, 50, 4, 5)
    This function should wrap `transf1d` such that it works on arbitrarily
    shaped (compatible) arrays `F`, and `out`.
    """
    def wrapper(f):
        """
        wrap transf1d for apply_along_axis compatibility
        that is, having a signature of F.shape[axis] -> out.shape[axis]
        """
        # complex output buffer (np.empty_like(y) would be real and drop the imaginary part)
        out = np.empty(np.size(y), dtype=complex)
        transf1d(f, x, y, out)
        return out
    return np.apply_along_axis(wrapper, axis, F)
I believe this should do what you want, although I haven't tested it. Note that the looping inside apply_along_axis runs at Python-level speed, so this only vectorizes the operation in terms of style, not in terms of performance. However, that is quite probably of no concern, assuming the decision to resort to external C code for the inner loop is justified by it being a nontrivial operation in the first place.
To answer your question:
If you really just want to iterate over all but a given axis, you can use:
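import itertools
# assumes a non-negative axis; each expression below is one 1D slice along axis
for s in itertools.product(*map(range, arr.shape[:axis] + arr.shape[axis+1:])):
    arr[s[:axis] + (slice(None),) + s[axis:]]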
Maybe there's a more elegant way to do it, but this should work.
But, don't iterate:
For your problem, I would just rewrite your function to work on a given axis of an ndarray. I think this should work:
def transfnd(f, x, y, axis, out):
    s = list(f.shape)
    s.insert(axis, 1)
    yx = [y.size, x.size] + [1]*(f.ndim - axis - 1)
    out[...] = np.sum(f.reshape(*s)*np.exp(-1j*x[None,:]*y[:,None]).reshape(*yx), axis+1)
It's really just the generalization of your current implementation, but instead of inserting a new axis in F at the beginning, it inserts it at axis (there might be a better way to do this than with the list(shape) method, but that was all I could do). Finally, you have to add trailing new axes to your yx outer product, to match as many trailing indices as you have in F.
I didn't really know how to test this, but the shapes all work out, so please test it and let me know whether it works.
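One way to run that test: compare `transfnd` against the question's `transf1d` applied slice by slice, for the transform axis placed at every position of a small 4D array. A minimal sketch, assuming a non-negative `axis` (the `insert` and `[1]*(f.ndim - axis - 1)` bookkeeping relies on it); the reference helper `transf_loop` is hypothetical:

```python
import numpy as np

def transf1d(f, x, y, out):
    # The question's reference 1D transform.
    out[...] = (f[None, :] * np.exp(-1j * x[None, :] * y[:, None])).sum(-1)

def transfnd(f, x, y, axis, out):
    # The broadcasting version from this answer.
    s = list(f.shape)
    s.insert(axis, 1)
    yx = [y.size, x.size] + [1] * (f.ndim - axis - 1)
    out[...] = np.sum(f.reshape(*s) * np.exp(-1j * x[None, :] * y[:, None]).reshape(*yx), axis + 1)

def transf_loop(F, x, y, axis):
    # Reference: apply transf1d to every 1D slice along `axis`.
    shape = list(F.shape)
    shape[axis] = y.size
    out = np.empty(shape, dtype=complex)
    Fm, om = np.moveaxis(F, axis, -1), np.moveaxis(out, axis, -1)
    for idx in np.ndindex(*Fm.shape[:-1]):
        transf1d(Fm[idx], x, y, om[idx])
    return out

rng = np.random.default_rng(3)
x = np.linspace(0, 1, 10)
y = np.linspace(-1, 1, 6)
matches = {}
for axis in range(4):
    shape = [2, 3, 4]
    shape.insert(axis, x.size)          # transform axis at position `axis`
    F = rng.random(shape)
    out_shape = list(F.shape)
    out_shape[axis] = y.size
    out = np.empty(out_shape, dtype=complex)
    transfnd(F, x, y, axis, out)
    matches[axis] = np.allclose(out, transf_loop(F, x, y, axis))
print(matches)
```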
I found a way of iterating over all but one axis in Cython using the Numpy C-API (code below). However, it's not pretty. Whether it's worth the effort depends on the inner function and the size of the data.
If any one knows a more elegant way to do this in Cython, please let me know.
I compared to Eelco's solution and they run at a comparable speed for large arguments. For smaller arguments the C-API solution is faster:
In [5]: y=linspace(-1,1,100);
In [6]: %timeit transf.apply_along(f, x, y, axis=1)
1 loops, best of 3: 5.28 s per loop
In [7]: %timeit transf.transfnd(f, x, y, axis=1)
1 loops, best of 3: 5.16 s per loop
As you can see, for this input both functions are roughly at the same speed.
In [8]: f=np.random.rand(10,20,50);x=linspace(0,1,20);y=linspace(-1,1,10);
In [9]: %timeit transf.apply_along(f, x, y, axis=1)
100 loops, best of 3: 15.1 ms per loop
In [10]: %timeit transf.transfnd(f, x, y, axis=1)
100 loops, best of 3: 8.55 ms per loop
However, for smaller input arrays the C-API approach is faster.
The code
#cython: boundscheck=False
#cython: wraparound=False
#cython: cdivision=True
import numpy as np
cimport numpy as np
np.import_array()
cdef extern from "complex.h":
    double complex cexp(double complex z) nogil
cdef void transf1d(double complex[:] f,
                   double[:] x,
                   double[:] y,
                   double complex[:] out,
                   int Nx,
                   int Ny) nogil:
    cdef int i, j
    for i in range(Ny):
        out[i] = 0
        for j in range(Nx):
            out[i] = out[i] + f[j]*cexp(-1j*x[j]*y[i])
def transfnd(F, x, y, axis=-1, out=None):
    # Make sure everything is a numpy array.
    F = np.asanyarray(F, dtype=complex)
    x = np.asanyarray(x, dtype=float)
    y = np.asanyarray(y, dtype=float)
    # Calculate absolute axis.
    cdef int ax = axis
    if ax < 0:
        ax = np.ndim(F) + ax
    # Calculate lengths of the axes `x`, and `y`.
    cdef int Nx = np.size(x), Ny = np.size(y)
    # Output array.
    if out is None:
        shape = list(np.shape(F))
        shape[axis] = Ny
        out = np.empty(shape, dtype=complex)
    else:
        out = np.asanyarray(out, dtype=complex)
    # Error check.
    assert np.shape(F)[axis] == Nx, \
        'Array length mismatch between `F`, and `x`!'
    assert np.shape(out)[axis] == Ny, \
        'Array length mismatch between `out`, and `y`!'
    f_shape = list(np.shape(F))
    o_shape = list(np.shape(out))
    f_shape[axis] = 0
    o_shape[axis] = 0
    assert f_shape == o_shape, 'Array shape mismatch between `F`, and `out`!'
    # Construct iterator over all but one axis.
    cdef np.flatiter itf = np.PyArray_IterAllButAxis(F, &ax)
    cdef np.flatiter ito = np.PyArray_IterAllButAxis(out, &ax)
    cdef int f_stride = F.strides[axis]
    cdef int o_stride = out.strides[axis]
    # Memoryview to access one slice per iteration.
    cdef double complex[:] fdat
    cdef double complex[:] odat
    cdef double[:] xdat = x
    cdef double[:] ydat = y
    while np.PyArray_ITER_NOTDONE(itf):
        # View the current `x`, and `y` axes.
        fdat = <double complex[:Nx]> np.PyArray_ITER_DATA(itf)
        fdat.strides[0] = f_stride
        odat = <double complex[:Ny]> np.PyArray_ITER_DATA(ito)
        odat.strides[0] = o_stride
        # Perform the 1D-transformation on one slice.
        transf1d(fdat, xdat, ydat, odat, Nx, Ny)
        # Go to next step.
        np.PyArray_ITER_NEXT(itf)
        np.PyArray_ITER_NEXT(ito)
    return out
# For comparison
def apply_along(F, x, y, axis=-1):
    # Make sure everything is a numpy array.
    F = np.asanyarray(F, dtype=complex)
    x = np.asanyarray(x, dtype=float)
    y = np.asanyarray(y, dtype=float)
    # Calculate absolute axis.
    cdef int ax = axis
    if ax < 0:
        ax = np.ndim(F) + ax
    # Calculate lengths of the axes `x`, and `y`.
    cdef int Nx = np.size(x), Ny = np.size(y)
    # Error check.
    assert np.shape(F)[axis] == Nx, \
        'Array length mismatch between `F`, and `x`!'
    def wrapper(f):
        out = np.empty(Ny, complex)
        transf1d(f, x, y, out, Nx, Ny)
        return out
    return np.apply_along_axis(wrapper, axis, F)
Build with the following setup.py
from setuptools import setup
from Cython.Build import cythonize
import numpy as np
setup(
    name = 'transf',
    ext_modules = cythonize('transf.pyx'),
    include_dirs = [np.get_include()],
)
I am trying to match a template with a binary image (only black and white) by shifting the template along the image, and to return the minimum distance between the template and the image together with the position at which this minimum distance occurs. For example:
img:
0 1 0
0 0 1
0 1 1
template:
0 1
1 1
This template matches the image best at position (1,1) and the distance will then be 0. So far things are not too difficult and I already got some code that does the trick.
import numpy as np
def match_template(img, template):
    mindist = float('inf')
    idx = (-1,-1)
    for y in range(img.shape[1]-template.shape[1]+1):
        for x in range(img.shape[0]-template.shape[0]+1):
            # calculate Euclidean distance
            dist = np.sqrt(np.sum(np.square(template - img[x:x+template.shape[0],y:y+template.shape[1]])))
            if dist < mindist:
                mindist = dist
                idx = (x,y)
    return [mindist, idx]
But for images of the size I need (image around 500 x 200 pixels and template around 250 x 100) this already takes approximately 4.5 seconds, which is way too slow. I know the same thing can be done much more quickly using matrix multiplications (in MATLAB I believe this can be done using im2col and repmat). Can anyone explain to me how to do it in Python/numpy?
btw. I know there is an opencv matchTemplate function that does exactly what I need, but since I might need to alter the code slightly later on I would prefer a solution which I fully understand and can alter.
Thanks!
edit: If anyone can explain to me how opencv does this in less than 0.2 seconds that would also be great. I have had a short look at the source code, but those things always look quite complicated to me.
edit2: Cython code
import numpy as np
cimport numpy as np
DTYPE = np.int64
ctypedef np.int64_t DTYPE_t
def match_template(np.ndarray img, np.ndarray template):
    cdef float mindist = float('inf')
    cdef int x_coord = -1
    cdef int y_coord = -1
    cdef float dist
    cdef unsigned int x, y
    cdef int img_width = img.shape[0]
    cdef int img_height = img.shape[1]
    cdef int template_width = template.shape[0]
    cdef int template_height = template.shape[1]
    cdef int range_x = img_width-template_width+1
    cdef int range_y = img_height-template_height+1
    for y in range(range_y):
        for x in range(range_x):
            dist = np.sqrt(np.sum(np.square(template - img[ x:<unsigned int>(x+template_width), y:<unsigned int>(y+template_height) ]))) # calculate euclidean distance
            if dist < mindist:
                mindist = dist
                x_coord = x
                y_coord = y
    return [mindist, (x_coord,y_coord)]
# called from Python, with img and template already loaded:
img = np.asarray(img, dtype=DTYPE)
template = np.asarray(template, dtype=DTYPE)
match_template(img, template)
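On the im2col question: the closest NumPy analogue is `numpy.lib.stride_tricks.sliding_window_view` (NumPy 1.20 or later), which exposes every template-sized window of the image as a strided view without copying, so the whole distance computation becomes one vectorized expression. This is a sketch for small inputs only: the subtraction materializes a temporary with one entry per (window, template pixel) pair, which for a 500 x 200 image and a 250 x 100 template is 251 * 101 windows times 25,000 pixels, roughly 634 million values. The function name is hypothetical:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def match_template_windows(img, template):
    img = np.asarray(img, dtype=np.int64)
    template = np.asarray(template, dtype=np.int64)
    # windows[x, y] is the template-sized patch of img whose top-left corner is (x, y).
    windows = sliding_window_view(img, template.shape)
    # Squared Euclidean distance for every placement at once.
    ssd = ((windows - template) ** 2).sum(axis=(-2, -1))
    x, y = np.unravel_index(np.argmin(ssd), ssd.shape)
    return [float(np.sqrt(ssd[x, y])), (int(x), int(y))]

img = np.array([[0, 1, 0], [0, 0, 1], [0, 1, 1]])
template = np.array([[0, 1], [1, 1]])
print(match_template_windows(img, template))  # [0.0, (1, 1)]
```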
One possible way of doing what you want is via convolution (which can be brute force or FFT). Matrix multiplications AFAIK won't work. You need to convolve your data with the template and find the maximum (you'll also need to do some scaling to make it work properly).
import numpy as np
import scipy.ndimage
import scipy.signal
xs=np.array([[0,1,0],[0,0,1],[0,1,1]])*1.
ys=np.array([[0,1],[1,1]])*1.
print(scipy.ndimage.convolve(xs,ys,mode='constant',cval=np.inf))
# -> array([[ 1., 1., inf],
#           [ 0., 2., inf],
#           [ inf, inf, inf]])
print(scipy.signal.fftconvolve(xs,ys,mode='valid'))
# -> array([[ 1., 1.],
#           [ 0., 2.]])
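To turn the convolution idea into the exact Euclidean distance, expand the squared difference for each placement: sum (t - w)^2 = sum t^2 - 2 * sum t*w + sum w^2, where w is the image window under the template t. The middle term is a cross-correlation, which equals a convolution with the template flipped along both axes; the last term is a windowed sum of the squared image, which a summed-area table (integral image) gives in constant time per window. The sketch below does both with NumPy only (`np.fft.rfft2`/`irfft2` with zero padding in place of `scipy.signal.fftconvolve`); the function name is hypothetical, and the result can carry tiny floating-point round-off from the FFT:

```python
import numpy as np

def match_template_fft(img, template):
    img = np.asarray(img, dtype=np.float64)
    t = np.asarray(template, dtype=np.float64)
    H, W = img.shape
    h, w = t.shape
    # Cross-correlation = convolution with the template flipped on both axes,
    # computed with zero-padded real FFTs (no circular wrap-around).
    shape = (H + h - 1, W + w - 1)
    full = np.fft.irfft2(np.fft.rfft2(img, shape) * np.fft.rfft2(t[::-1, ::-1], shape), shape)
    corr = full[h - 1:H, w - 1:W]          # one value per placement (x, y)
    # Windowed sums of img**2 from a summed-area table.
    ii = np.zeros((H + 1, W + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(img ** 2, axis=0), axis=1)
    win_sq = ii[h:, w:] - ii[:-h, w:] - ii[h:, :-w] + ii[:-h, :-w]
    # sum (t - window)^2 = sum t^2 - 2 * corr + sum window^2
    ssd = (t ** 2).sum() - 2.0 * corr + win_sq
    ssd = np.maximum(ssd, 0.0)             # clip small negative FFT round-off
    x, y = np.unravel_index(np.argmin(ssd), ssd.shape)
    return [float(np.sqrt(ssd[x, y])), (int(x), int(y))]

img = np.array([[0, 1, 0], [0, 0, 1], [0, 1, 1]])
template = np.array([[0, 1], [1, 1]])
print(match_template_fft(img, template))
```

The FFT keeps the cost roughly proportional to the image size times a log factor rather than image size times template size, which is what matters for a 250 x 100 template.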
There may be a fancy way to get this done using pure numpy/scipy magic. But it might be easier (and more understandable when you look at the code in the future) to just drop into Cython to get this done. There's a good tutorial for integrating Cython with numpy at http://docs.cython.org/src/tutorial/numpy.html.
EDIT:
I did a quick test with your Cython code and it ran ~15 sec for a 500x400 img with a 100x200 template. After some tweaks (eliminating the numpy method calls and numpy bounds checking), I got it down under 3 seconds. That may not be enough for you, but it shows the possibility.
import numpy as np
cimport numpy as np
cimport cython
from libc.math cimport sqrt
DTYPE = np.int64
ctypedef np.int64_t DTYPE_t
@cython.boundscheck(False)
def match_template(np.ndarray[DTYPE_t, ndim=2] img, np.ndarray[DTYPE_t, ndim=2] template):
    cdef float mindist = float('inf')
    cdef int x_coord = -1
    cdef int y_coord = -1
    cdef float dist
    cdef unsigned int x, y
    cdef int img_width = img.shape[0]
    cdef int img_height = img.shape[1]
    cdef int template_width = template.shape[0]
    cdef int template_height = template.shape[1]
    cdef int range_x = img_width-template_width+1
    cdef int range_y = img_height-template_height+1
    cdef DTYPE_t total
    cdef int delta
    cdef unsigned int j, k, j_plus, k_plus
    for y in range(range_y):
        for x in range(range_x):
            #dist = np.sqrt(np.sum(np.square(template - img[ x:<unsigned int>(x+template_width), y:<unsigned int>(y+template_height) ]))) #calculate euclidean distance
            # Do the same operations, but in plain C
            total = 0
            for j in range(template_width):
                j_plus = <unsigned int>x + j
                for k in range(template_height):
                    k_plus = <unsigned int>y + k
                    delta = template[j, k] - img[j_plus, k_plus]
                    total += delta*delta
            dist = sqrt(total)
            if dist < mindist:
                mindist = dist
                x_coord = x
                y_coord = y
    return [mindist, (x_coord,y_coord)]