How to declare a global numpy.ndarray in Cython?

I want to create a signal processing algorithm that needs to hold some internal state in a numpy array.
For speed, I coded it in Cython and declared the state as a global variable, like this:
import numpy as np
cimport numpy as np
cdef np.ndarray delay_buffer
However, what I really would like to do is this:
import numpy as np
cimport numpy as np
DTYPE = np.float32
ctypedef np.float32_t DTYPE_t
cdef np.ndarray[DTYPE_t] delay_buffer
This I can do anywhere else, but not at global scope. Is there any way to accomplish this?

Is there any way to accomplish this?
No. As the error message says, "Buffer types only allowed as function local variables."
One alternative is to use a monolithic main function. This only requires indenting everything, but it means you can only share so much.
My favourite alternative would be to upgrade to the modern method of using memoryviews:
cdef DTYPE_t[:] delay_buffer
This should be faster, cleaner, and no less powerful.
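For example, a minimal sketch of the question's setup using a module-level memoryview (the init function is illustrative, not from the original post):

import numpy as np
cimport numpy as np

DTYPE = np.float32
ctypedef np.float32_t DTYPE_t

cdef DTYPE_t[:] delay_buffer  # typed memoryviews are allowed at module scope

def init_state(int size):
    # rebind the global to a freshly allocated backing array
    global delay_buffer
    delay_buffer = np.zeros(size, dtype=DTYPE)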

Related

Do locally set Cython compiler directives affect one or all functions?

I am working on speeding up some Python/Numpy code with Cython, and am a bit unclear on the effects of "locally" setting (as defined here in the docs) compiler directives. In my case I'd like to use:
@cython.wraparound(False)  # turn off negative indexing
@cython.boundscheck(False)  # turn off bounds-checking
I understand that I can globally define this in my setup.py file, but I'm developing for non-Cython users and would like the directives to be obvious from the .pyx file.
If I am writing a .pyx file with several functions defined in it, need I only set these once, or will they only apply to the next function defined? The reason I ask is that the documentation often says things like "turn off boundscheck for this function," making me wonder whether it only applies to the next function defined.
In other words, do I need to do this:
import numpy as np
cimport numpy as np
cimport cython

ctypedef np.float64_t DTYPE_FLOAT_t

@cython.wraparound(False)  # turn off negative indexing
@cython.boundscheck(False)  # turn off bounds-checking
def myfunc1(np.ndarray[DTYPE_FLOAT_t] a):
    pass  # do things here

def myfunc2(np.ndarray[DTYPE_FLOAT_t] b):
    pass  # do things here
Or do I need to do this:
import numpy as np
cimport numpy as np
cimport cython

ctypedef np.float64_t DTYPE_FLOAT_t

@cython.wraparound(False)  # turn off negative indexing
@cython.boundscheck(False)  # turn off bounds-checking
def myfunc1(np.ndarray[DTYPE_FLOAT_t] a):
    pass  # do things here

@cython.wraparound(False)  # turn off negative indexing
@cython.boundscheck(False)  # turn off bounds-checking
def myfunc2(np.ndarray[DTYPE_FLOAT_t] b):
    pass  # do things here
Thanks!
The documentation states that if you want to set a compiler directive globally, you need to do so with a special comment at the top of the file, e.g.
#!python
#cython: language_level=3, boundscheck=False
The manual does not say explicitly, but these directives use decorator notation. In plain Python,

@decorator2
@decorator1
def fn(args):
    body

is syntactic sugar for

def fn(args):
    body
fn = decorator2(decorator1(fn))
So my expectation is that the directives work like that, which would mean that they apply only to the next function.
The Cython manual also says that the compiler directives can be used in with statements. What it doesn't say, unfortunately, is whether those can appear at top level:
with cython.wraparound(False), cython.boundscheck(False):
    def myfunc1(np.ndarray[DTYPE_FLOAT_t] a):
        pass  # do things here
    def myfunc2(np.ndarray[DTYPE_FLOAT_t] b):
        pass  # do things here
might be what you're looking for, or then again it might be a syntax error or a no-op. You're going to have to try it and see.
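If the with form doesn't pan out, there is a third option that sidesteps the repetition entirely (a sketch based on the compiler-directives documentation, untested here): set the defaults once in the special header comment, and add a decorator only where a function needs to deviate, since locally set directives take precedence over file-level ones.

# cython: boundscheck=False, wraparound=False
import numpy as np
cimport numpy as np
cimport cython

ctypedef np.float64_t DTYPE_FLOAT_t

def myfunc1(np.ndarray[DTYPE_FLOAT_t] a):
    pass  # compiled with the file-wide defaults: no bounds-checking

@cython.boundscheck(True)  # per-function decorator overrides the file default
def myfunc2(np.ndarray[DTYPE_FLOAT_t] b):
    pass  # bounds-checking re-enabled just for this function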

Fast indexing: Cython with numpy array of bool and str

I am trying to speed up a Python script. I have profiled the code and re-factored quite a lot already in pure Python. It seems that I am still spending a lot of time in accessing some numpy arrays in a way that looks like:
KeyArray[BoolArray[index]]
where KeyArray is ndim=2 and contains strings, BoolArray is ndim=1 and contains bool and index is an int.
I am trying to learn Cython to see how faster it could be. I wrote the following script that does not work:
import numpy as np
cimport numpy as np

def fastindexer(np.ndarray[np.str_t, ndim=1] KeyArray, np.ndarray[np.bool_t, ndim=2] BoolArray, np.int_t DateIndex):
    cdef np.ndarray[np.str_t, ndim=1] FArray = KeyArray[BoolArray[DateIndex]]
    return FArray
I understand that types str/bool are not available 'as is' in np arrays. I tried to cast as well but I don't understand how this should be written.
All help welcome
As @Joe said, moving a single indexing statement to Cython won't give you speed. If you decide to move more of your program to Cython, you need to fix a number of problems.
1) You use def instead of cdef, limiting you to Python-only functionality.
2) You use the old buffer syntax; read about memoryviews instead.
3) Slicing a 2-D array is slow because a new memoryview is created each time. That said, it still is a lot faster than Python, but for peak performance you would have to use a different approach.
Here's something to get you started.

import numpy as np
cimport numpy as np

cpdef func():
    cdef int i
    # numpy bool arrays are one byte per element, so uint8 is a safe element type
    cdef np.uint8_t[:] my_bool_array = np.zeros(10, dtype=np.uint8)
    # I'm not sure if this next line is correct
    cdef char[:, :] my_string_array = np.chararray((10, 10))
    cdef char answer
    for i in range(10):
        # index both dimensions so the result is a single char
        answer = my_string_array[i, my_bool_array[i]]
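If your boolean data starts life as a dtype=bool numpy array, one way to get it into a typed memoryview (a sketch; the function name is illustrative) is to reinterpret it as uint8, which has the same one-byte layout:

import numpy as np
cimport numpy as np

def count_true(flags):
    # bool and uint8 are both one byte, so .view(np.uint8) is a zero-copy reinterpret
    cdef np.uint8_t[:] view = np.asarray(flags, dtype=bool).view(np.uint8)
    cdef Py_ssize_t i, total = 0
    for i in range(view.shape[0]):
        total += view[i]
    return total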

Cython says buffer types only allowed as function local variables even for ndarray.copy()

I am new to Cython and encountered this code snippet:
import numpy as np
cimport numpy as np
testarray = np.arange(5)
cdef np.ndarray[np.int_t, ndim=1] testarray1 = testarray.copy()
cdef np.ndarray[np.float_t, ndim=1] testarray2 = testarray.astype(np.float)
During compilation, it said "Buffer types only allowed as function local variables." However, I am using .copy() or .astype(), which return a copy, not a view. Why is this still happening? How can I get around this?
Thanks!
When you define an array in Cython using np.ndarray[Type, ndim], you are going through the Python buffer interface, and buffer-typed variables can't be declared at module level. This is a separate issue from views vs. copies of numpy array data.
Typically, if I want an array as a module-level variable (i.e. not local to a function), I define a typed memoryview and then set it within a function, using something like (untested):
import numpy as np
cimport numpy as np

cdef np.int_t[:] testarray1

def init_arrays(np.int_t[:] testarray):
    global testarray1
    testarray1 = testarray.copy()
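For completeness, calling it from Python might look like this (mymodule is a hypothetical name for the compiled extension, and np.arange's default integer dtype must match np.int_t on your platform):

import numpy as np
from mymodule import init_arrays  # hypothetical compiled module

init_arrays(np.arange(5))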

ctypedef in Cython with numpy: what is right convention?

In Cython when using numpy, what is the point of writing:
cimport numpy as np
import numpy as np
ctypedef np.int_t DTYPE_t
and then using DTYPE_t everywhere instead of just using np.int_t? Does the ctypedef actually do anything differently in the resulting code here?
You can read the notes in the Cython docs; they explain the reasons for this notation and these imports.
from __future__ import division
import numpy as np
# "cimport" is used to import special compile-time information
# about the numpy module (this is stored in a file numpy.pxd which is
# currently part of the Cython distribution).
cimport numpy as np
# We now need to fix a datatype for our arrays. I've used the variable
# DTYPE for this, which is assigned to the usual NumPy runtime
# type info object.
DTYPE = np.int
# "ctypedef" assigns a corresponding compile-time type to DTYPE_t. For
# every type in the numpy module there's a corresponding compile-time
# type with a _t-suffix.
ctypedef np.int_t DTYPE_t
# "def" can type its arguments but not have a return type. The type of the
# arguments for a "def" function is checked at run-time when entering the
# function.
#
# The arrays f, g and h are typed as "np.ndarray" instances. The only effect
# this has is to a) insert checks that the function arguments really are
# NumPy arrays, and b) make some attribute access like f.shape[0] much
# more efficient. (In this example this doesn't matter though.)
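Continuing that snippet, here is roughly what the convention looks like in use (a sketch in the tutorial's spirit; the function name is illustrative). DTYPE is the runtime dtype object, DTYPE_t the compile-time C type:

def sum_ints(np.ndarray[DTYPE_t, ndim=1] arr):
    assert arr.dtype == DTYPE  # runtime check against the dtype object
    cdef DTYPE_t total = 0     # C-level variable uses the compile-time type
    cdef Py_ssize_t i
    for i in range(arr.shape[0]):
        total += arr[i]
    return total

The ctypedef itself is just an alias, so it generates no different code than writing np.int_t directly; the point of the convention is that DTYPE and DTYPE_t stay paired, so switching the array type later means editing one place.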

cython with array of pointers

I have a list of numpy.ndarrays (of different lengths) in Python and need very fast access to them. I think an array of pointers would do the trick. I tried:
cdef float_type_t *list_of_arrays[no_of_arrays]

for data_array in python_list_of_arrays:
    list_of_arrays[0] = data_array
But Cython complains about no_of_arrays:
Not allowed in a constant expression
I have tried several ways to constify this variable:

cdef extern from *:
    ctypedef int const_int "const int"

(there have been more creative attempts), but unfortunately it doesn't work.
Please help.
Why don't you use a numpy object array instead of a list of arrays?
I think the problem you're having is that you are declaring list_of_arrays on the stack, so its size must be known at compile time. You can try dynamic allocation instead, like this:
from libc.stdlib cimport malloc, free
cimport numpy as np

cdef float_type_t **list_of_arrays = \
    <float_type_t **>malloc(no_of_arrays * sizeof(float_type_t *))

for i in range(no_of_arrays):
    # .data is a char *, so cast it to the element pointer type
    list_of_arrays[i] = <float_type_t *>(<np.ndarray>data_array[i]).data
# don't forget to free list_of_arrays!
(This assumes data_array is a numpy array.)
But this is still guessing a bit at what you want to do.
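A fuller sketch of that idea, wrapped in a function so the allocation is paired with a free (float32 and all names here are assumptions, not from the question):

import numpy as np
cimport numpy as np
from libc.stdlib cimport malloc, free

ctypedef np.float32_t float_type_t

def process(list python_list_of_arrays):
    cdef Py_ssize_t no_of_arrays = len(python_list_of_arrays)
    cdef Py_ssize_t i
    cdef float_type_t **ptrs = \
        <float_type_t **>malloc(no_of_arrays * sizeof(float_type_t *))
    if ptrs == NULL:
        raise MemoryError()
    try:
        for i in range(no_of_arrays):
            # raw data pointer of each array; requires C-contiguous float32 input
            ptrs[i] = <float_type_t *>(<np.ndarray>python_list_of_arrays[i]).data
        # ... use ptrs[i][j] for fast element access here ...
    finally:
        free(ptrs)

The Python list must outlive the pointers, since ptrs holds borrowed views into the arrays' buffers.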
