equivalent of "double[:,::1] u_tp1 not None" in python? - python

I am trying to rewrite the short code below in Python (it is from a .pyx file). My issue is the lines with "double[:,::1]" in them. Is there any equivalent in Python for it? Also, how does "cdef unsigned int i, j" translate to Python? I am new to programming and most of what I found online is over my head. Any suggestions or help are appreciated.
def _step_scalar(
        double[:,::1] u_tp1 not None,
        double[:,::1] u_t not None,
        double[:,::1] u_tm1 not None,
        unsigned int x1, unsigned int x2, unsigned int z1, unsigned int z2,
        double dt, double ds,
        double[:,::1] vel not None):
    """
    Perform a single time step in the Finite Difference solution for scalar
    waves 4th order in space
    """
    cdef unsigned int i, j
    for i in xrange(z1, z2):
        for j in xrange(x1, x2):
            u_tp1[i,j] = (2.*u_t[i,j] - u_tm1[i,j]
                + ((vel[i,j]*dt/ds)**2)*(
                    (-u_t[i,j + 2] + 16.*u_t[i,j + 1] - 30.*u_t[i,j] +
                     16.*u_t[i,j - 1] - u_t[i,j - 2])/12. +
                    (-u_t[i + 2,j] + 16.*u_t[i + 1,j] - 30.*u_t[i,j] +
                     16.*u_t[i - 1,j] - u_t[i - 2,j])/12.))

They're type declarations to help Cython speed up the code. Python is dynamically typed (it accepts variables of any type), so they aren't meaningful in plain Python. Therefore you can get rid of them.
double[:,::1] defines the variable as a 2D, C-contiguous memoryview of doubles. This means the function expects something like a 2D numpy array (which is what you should pass to the Cython function anyway).
u_tp1 is the variable name. You should keep this.
not None tells Cython to assume that you won't pass None into the function (so it disables some checks for extra speed). This can be deleted in Python.
cdef unsigned int i, j defines i and j as C integers, for extra speed. In Python i and j are created when they are needed in the for loop so the definition can be deleted completely.
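Putting that together, the plain-Python version is just the same function with every declaration dropped (and range in place of xrange on Python 3). A minimal sketch, assuming the arguments are 2D numpy arrays as the memoryviews imply:

def _step_scalar(u_tp1, u_t, u_tm1, x1, x2, z1, z2, dt, ds, vel):
    """
    Perform a single time step in the Finite Difference solution for scalar
    waves 4th order in space (pure-Python version).
    """
    for i in range(z1, z2):
        for j in range(x1, x2):
            u_tp1[i,j] = (2.*u_t[i,j] - u_tm1[i,j]
                + ((vel[i,j]*dt/ds)**2)*(
                    (-u_t[i,j + 2] + 16.*u_t[i,j + 1] - 30.*u_t[i,j] +
                     16.*u_t[i,j - 1] - u_t[i,j - 2])/12. +
                    (-u_t[i + 2,j] + 16.*u_t[i + 1,j] - 30.*u_t[i,j] +
                     16.*u_t[i - 1,j] - u_t[i - 2,j])/12.))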

Related

Cython: Change integer with reference

Say I have a simple function that takes as input a pointer to an integer. How do I change the originating integer value?
My idea was as follows:
cdef myFunc(int n, int *nnz):
    nnz_int = <uintptr_t>nnz
    nnz_int = 0
    for i in range(0, n):
        nnz_int += n
but upon reflection, I think that only casts the value of nnz onto nnz_int initially; I then change nnz_int without changing the original nnz. How do I achieve that?
From the Cython docs:
Note that Cython uses array access for pointer dereferencing, as *x is not valid Python syntax, whereas x[0] is.
So this should work:
cdef myFunc(int n, int *nnz):
    for i in range(0, n):
        nnz[0] += n
Not sure what you're trying to achieve by adding n to the pointed-to value n times; why not simply add n*n to it once?
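For completeness, a hedged sketch of a caller inside the same .pyx module (hypothetical names; in Cython you take the address of a C variable with &):

cdef int total = 0
myFunc(5, &total)  # each of the 5 iterations adds 5 via nnz[0], so total == 25
print(total)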

How multiarray.correlate2(a, v, mode) is actually implemented?

On my way to understanding how the Numpy.correlate() function actually works, I got to its implementation in pure Python, but what I saw was very disappointing:
def correlate(a, v, mode='valid', old_behavior=False):
    mode = _mode_from_name(mode)
    if old_behavior:
        warnings.warn("""Warning.""", DeprecationWarning)
        return multiarray.correlate(a, v, mode)
    else:
        return multiarray.correlate2(a, v, mode)
So I started to look for the implementation of the multiarray.correlate2(a, v, mode) function, but unfortunately I can't find it. I'm looking for it because I'm trying to implement the autocorrelation function myself, and I'm missing functionality similar to the mode='full' parameter of Numpy.correlate(), which makes the function return the result as a 1D array. Thank you for your help in advance.
The speed of Python code can be very poor compared to other languages like C. numpy aims to provide highly performant operations on arrays, so the developers decided to implement some operations in C.
Unfortunately, you won't find a Python implementation of correlate in numpy's code base, but if you are familiar with C and Python's extension modules, you can find the relevant code here.
The different modes just specify the length of the output array.
You can simulate them by transforming your inputs:
import numpy as np
a = [1, 2, 3]
v = [0, 1, 0.5]
np.correlate(a, v, mode="full")
returns:
array([ 0.5, 2. , 3.5, 3. , 0. ])
You can get the same result by filling v with zeros:
np.correlate(a, [0, 0] + v + [0, 0])
returns the same result:
array([ 0.5, 2. , 3.5, 3. , 0. ])
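The padding trick generalizes: adding len(v) - 1 zeros on each side and using the default mode='valid' reproduces mode='full'. A minimal sketch of an autocorrelation helper built on that idea (autocorr_full is a hypothetical name, not a numpy function):

import numpy as np

def autocorr_full(x):
    # Full-mode autocorrelation via explicit zero padding.
    x = np.asarray(x, dtype=float)
    pad = np.zeros(len(x) - 1)
    padded = np.concatenate([pad, x, pad])
    return np.correlate(padded, x)  # default mode='valid'

# autocorr_full([1, 2, 3]) matches np.correlate([1, 2, 3], [1, 2, 3], mode="full"):
# both give array([ 3.,  8., 14.,  8.,  3.])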
The function lives at np.core.multiarray.correlate2:

np.core.multiarray.correlate2
dir(np.core.multiarray.correlate2)  # to inspect
print(numpy.__version__)
print numpy.__version__  # Python 2

Found it! It might be a private API; I can't find the docs after a first search, either under numpy.multiarray or under the newly discovered 'correct' name. The optimal search query is 'np.core.multiarray.correlate2 github'.

return multiarray.correlate2(a, v, mode)  # mode is an int here

If you plan to customize the code for your purposes, be careful.
/*
* simulates a C-style 1-3 dimensional array which can be accessed using
* ptr[i] or ptr[i][j] or ptr[i][j][k] -- requires pointer allocation
* for 2-d and 3-d.
*
* For 2-d and up, ptr is NOT equivalent to a statically defined
* 2-d or 3-d array. In particular, it cannot be passed into a
* function that requires a true pointer to a fixed-size array.
*/
/*NUMPY_API
* Simulate a C-array
* steals a reference to typedescr -- can be NULL
*/
NPY_NO_EXPORT int
PyArray_AsCArray(PyObject **op, void *ptr, npy_intp *dims, int nd,
                 PyArray_Descr* typedescr)
/* NPY_NO_EXPORT int NPY_NUMUSERTYPES = 0; */
/* ... code omitted ... */
switch (mode) {
case 0:
    length = length - n + 1;
    n_left = n_right = 0;
    break;
case 1:
    n_left = (npy_intp)(n/2);
    n_right = n - n_left - 1;
    break;
case 2:
    n_right = n - 1;
    n_left = n - 1;
    length = length + n - 1;
    break;
default:
    PyErr_SetString(PyExc_ValueError, "mode must be 0, 1, or 2");
    return NULL;
}
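Reading the switch with length = len(a) and n = len(v) (assuming len(a) >= len(v)), the three integer modes give exactly the output lengths of the string modes 'valid', 'same', and 'full'. A sketch with a hypothetical helper name:

def correlate_output_length(la, lv, mode):
    # Mirrors the C switch above; 0/1/2 are what the string modes
    # 'valid'/'same'/'full' are mapped to before the call.
    if mode == 0:            # 'valid'
        return la - lv + 1
    elif mode == 1:          # 'same'
        return la
    elif mode == 2:          # 'full'
        return la + lv - 1
    raise ValueError("mode must be 0, 1, or 2")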
Don't mess with the internal APIs if this is your first crack at the codebase and you are on a deadline. Too late for me.

Iterating over a list in parallel with Cython

How does one iterate in parallel over a (Python) list in Cython?
Consider the following simple function:
def sumList():
    cdef int n = 1000
    cdef int sum = 0
    ls = [i for i in range(n)]
    cdef Py_ssize_t i
    for i in prange(n, nogil=True):
        sum += ls[i]
    return sum
This gives a lot of compiler errors, because a parallel section without the GIL apparently cannot work with any Python object:
Error compiling Cython file:
------------------------------------------------------------
...
    ls = [i for i in range(n)]
    cdef Py_ssize_t i
    for i in prange(n, nogil=True):
        sum += ls[i]
            ^
------------------------------------------------------------
src/parallel.pyx:42:6: Coercion from Python not allowed without the GIL
src/parallel.pyx:42:6: Operation not allowed without gil
src/parallel.pyx:42:6: Converting to Python object not allowed without gil
src/parallel.pyx:42:11: Indexing Python object not allowed without gil
I am not aware of any way to do this. A list is a Python object, so using its __getitem__ method requires the GIL. If you are able to use a NumPy array in this case, it will work. For example, if you wanted to iterate over an array A of double precision floating point values you could do something like this:
cimport cython
from numpy cimport ndarray as ar
from cython.parallel import prange

@cython.boundscheck(False)
@cython.wraparound(False)
cpdef cysumpar(ar[double] A):
    cdef double tot = 0.
    cdef int i, n = A.size
    for i in prange(n, nogil=True):
        tot += A[i]
    return tot
On my machine, for this particular case, prange doesn't make it any faster than a normal loop, but it could work better in other cases. For more on how to use prange see the documentation at http://docs.cython.org/src/userguide/parallelism.html
Whether or not you can use an array depends on how much you are changing the size of the array. If you need a lot of flexibility with the size, the array will not work. You could also try interfacing with the vector class in C++. I've never done that myself, but there is a brief description of how to do that here: http://docs.cython.org/src/userguide/wrapping_CPlusPlus.html#nested-class-declarations
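A hedged sketch of that vector-based route (untested; it assumes the module is compiled as C++ and relies on Cython's automatic list-to-vector coercion):

# distutils: language = c++
from cython.parallel import prange
from libcpp.vector cimport vector

cpdef long sum_list_parallel(list py_list):
    cdef vector[long] v = py_list      # copies the list into a C++ vector
    cdef long total = 0
    cdef Py_ssize_t i
    cdef Py_ssize_t n = v.size()
    for i in prange(n, nogil=True):    # v[i] needs no GIL, unlike ls[i]
        total += v[i]
    return total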
Convert your list into an array if you need numeric values, or a bytearray if the values are limited to the range 0 to 255. If you store anything other than numeric values, try numpy or use dtypes directly. For example, with C ints:

from cpython cimport array
import array

cdef int[::1] gen = array.array('i', [1, 2, 3, 4])
And if you want to use C types:
ctypedef unsigned char uint8_t

Optimizing my Cython/Numpy code? Only a 30% performance gain so far

Is there anything I've forgotten to do here in order to speed things up a bit? I'm trying to implement an algorithm described in a book called Tuning Timbre Spectrum Scale. Also, if all else fails, is there a way for me to just write this part of the code in C and then call it from Python?
import numpy as np
cimport numpy as np

# DTYPE = np.float
ctypedef np.float_t DTYPE_t

np.seterr(divide='raise', over='raise', under='ignore', invalid='raise')

"""
I define a timbre as the following 2d numpy array:
[[f0, a0], [f1, a1], [f2, a2]...] where f describes the frequency
of the given partial and a is its amplitude from 0 to 1. Phase is ignored.
"""

# Test Timbre
# cdef np.ndarray[DTYPE_t,ndim=2] t1 = np.array( [[440,1],[880,.5],[(440*3),.333]])

# Calculates the inherent dissonance of one timbre of the above form
# using the diss2Partials function
cdef DTYPE_t diss1Timbre(np.ndarray[DTYPE_t,ndim=2] t):
    cdef DTYPE_t runningDiss1
    runningDiss1 = 0.0
    cdef unsigned int len = np.shape(t)[0]
    cdef unsigned int i
    cdef unsigned int j
    for i from 0 <= i < len:
        for j from i+1 <= j < len:
            runningDiss1 += diss2Partials(t[i], t[j])
    return runningDiss1

# Calculates the dissonance between two timbres of the above form
cdef DTYPE_t diss2Timbres(np.ndarray[DTYPE_t,ndim=2] t1, np.ndarray[DTYPE_t,ndim=2] t2):
    cdef DTYPE_t runningDiss2
    runningDiss2 = 0.0
    cdef unsigned int len1 = np.shape(t1)[0]
    cdef unsigned int len2 = np.shape(t2)[0]
    runningDiss2 += diss1Timbre(t1)
    runningDiss2 += diss1Timbre(t2)
    cdef unsigned int i1
    cdef unsigned int i2
    for i1 from 0 <= i1 < len1:
        for i2 from 0 <= i2 < len2:
            runningDiss2 += diss2Partials(t1[i1], t2[i2])
    return runningDiss2

cdef inline DTYPE_t float_min(DTYPE_t a, DTYPE_t b): return a if a <= b else b

# Calculates the dissonance of two partials of the form [f,a]
cdef DTYPE_t diss2Partials(np.ndarray[DTYPE_t,ndim=1] p1, np.ndarray[DTYPE_t,ndim=1] p2):
    cdef DTYPE_t f1 = p1[0]
    cdef DTYPE_t f2 = p2[0]
    cdef DTYPE_t a1 = abs(p1[1])
    cdef DTYPE_t a2 = abs(p2[1])
    # In order to ensure that f2 > f1:
    if (f2 < f1):
        (f1,f2,a1,a2) = (f2,f1,a2,a1)
    # Constants of the dissonance curves
    cdef DTYPE_t _xStar
    _xStar = 0.24
    cdef DTYPE_t _s1
    _s1 = 0.021
    cdef DTYPE_t _s2
    _s2 = 19
    cdef DTYPE_t _b1
    _b1 = 3.5
    cdef DTYPE_t _b2
    _b2 = 5.75
    cdef DTYPE_t a = float_min(a1,a2)
    cdef DTYPE_t s = _xStar/(_s1*f1 + _s2)
    return (a * (np.exp(-_b1*s*(f2-f1)) - np.exp(-_b2*s*(f2-f1))))

cpdef dissTimbreScale(np.ndarray[DTYPE_t,ndim=2] t, np.ndarray[DTYPE_t,ndim=1] s):
    cdef DTYPE_t currDiss
    currDiss = 0.0
    cdef unsigned int i
    for i from 0 <= i < s.size:
        currDiss += diss2Timbres(t, transpose(t,s[i]))
    return currDiss

cdef np.ndarray[DTYPE_t,ndim=2] transpose(np.ndarray[DTYPE_t,ndim=2] t, DTYPE_t ratio):
    return np.dot(t, np.array([[ratio,0],[0,1]]))
Link to code: Cython Code
Here are some things that I noticed:
1. Use t1.shape[0] instead of np.shape(t1)[0], and so on in other places.
2. Don't use len as a variable name because it is a built-in Python function (not for speed, but for good practice). Use L or something like that.
3. Don't pass two-element arrays to functions unless you really need to. Cython checks the buffer every time you pass an array. So, instead of diss2Partials(t[i], t[j]), use diss2Partials(t[i,0], t[i,1], t[j,0], t[j,1]) and redefine diss2Partials appropriately.
4. Don't use abs, or at least not the Python one. It has to convert your C double to a Python float, call the abs function, then convert back to a C double. It would probably be better to make an inlined function like you did with float_min.
5. Calling np.exp does something similar to abs. Change np.exp to exp and add from libc.math cimport exp to your imports at the top.
6. Get rid of the transpose function completely. The np.dot is really slowing things down, and there is no need for matrix multiplication here anyway. Rewrite your dissTimbreScale function to create an empty matrix, say t2. Before the current loop, set the second column of t2 equal to the second column of t (using a loop, preferably, though you could probably get away with a NumPy operation here). Then, inside the current loop, put in a loop that sets the first column of t2 equal to the first column of t times s[i]. That's what your matrix multiplication was really doing. Then pass t2 as the second parameter to diss2Timbres instead of the one returned by the transpose function.
Do 1-5 first because they are rather easy. Number 6 may take a little more time, effort and maybe experimentation, but I suspect that it may also give you a significant boost in speed.
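A hedged sketch of suggestions 3-5 combined (it assumes the DTYPE_t ctypedef and float_min from the original file; float_abs is a new helper, not part of the original code):

from libc.math cimport exp

cdef inline DTYPE_t float_abs(DTYPE_t x): return x if x >= 0.0 else -x

# Scalar arguments avoid the per-call buffer checks; float_abs and the
# libc exp avoid round-trips through Python objects.
cdef DTYPE_t diss2Partials(DTYPE_t f1, DTYPE_t a1, DTYPE_t f2, DTYPE_t a2):
    if f2 < f1:
        f1, f2, a1, a2 = f2, f1, a2, a1
    a1 = float_abs(a1)
    a2 = float_abs(a2)
    cdef DTYPE_t _xStar = 0.24
    cdef DTYPE_t _s1 = 0.021
    cdef DTYPE_t _s2 = 19
    cdef DTYPE_t _b1 = 3.5
    cdef DTYPE_t _b2 = 5.75
    cdef DTYPE_t a = float_min(a1, a2)
    cdef DTYPE_t s = _xStar/(_s1*f1 + _s2)
    return a * (exp(-_b1*s*(f2-f1)) - exp(-_b2*s*(f2-f1)))

# Call sites change accordingly, e.g.:
# runningDiss1 += diss2Partials(t[i,0], t[i,1], t[j,0], t[j,1])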
In your code:
for i from 0 <= i < len:
    for j from i+1 <= j < len:
        runningDiss1 += diss2Partials(t[i], t[j])
return runningDiss1
bounds checking is performed for each array lookup; use the decorator @cython.boundscheck(False) before the function, and cast to an unsigned int type before using i and j as indices. Look up the Cython for NumPy tutorial for more info.
I would profile your code in order to see which function takes the most time. If it is diss2Timbres you may benefit from the package "numexpr".
I compared Python/Cython and Numexpr for one of my functions (link to SO). Depending on the size of the array, numexpr outperformed both Cython and Fortran.
NOTE: Just figured out this post is really old...

how to convert python/cython unicode string to array of long integers, to do levenshtein edit distance [duplicate]

Possible Duplicate:
How to correct bugs in this Damerau-Levenshtein implementation?
I have the following Cython code (adapted from the bpbio project) that does Damerau-Levenshtein edit-distance calculation:
#---------------------------------------------------------------------------
cdef extern from "stdlib.h":
    ctypedef unsigned int size_t
    size_t strlen(char *s)
    void *malloc(size_t size)
    void *calloc(size_t n, size_t size)
    void free(void *ptr)
    int strcmp(char *a, char *b)
    char * strcpy(char *a, char *b)

#---------------------------------------------------------------------------
cdef extern from "Python.h":
    object PyTuple_GET_ITEM(object, int)
    void Py_INCREF(object)

#---------------------------------------------------------------------------
cdef inline size_t imin(int a, int b, int c):
    if a < b:
        if c < a:
            return c
        return a
    if c < b:
        return c
    return b

#---------------------------------------------------------------------------
cpdef int editdistance(char *a, char *b):
    """Given two byte strings ``a`` and ``b``, return their absolute Damerau-
    Levenshtein distance. Each deletion, insertion, substitution, and
    transposition is counted as one difference, so the edit distance between
    ``abc`` and ``ab``, ``abcx``, ``abx``, ``acb``, respectively, is ``1``."""
    #.........................................................................
    if strcmp(a, b) == 0: return 0
    #.........................................................................
    cdef int alen = strlen(a)
    cdef int blen = strlen(b)
    cdef int R
    cdef char *ctmp
    cdef size_t i
    cdef size_t j
    cdef size_t achr
    cdef size_t bchr
    #.........................................................................
    if alen > blen:
        ctmp = a
        a = b
        b = ctmp
        alen, blen = blen, alen
    #.........................................................................
    cdef char *m1 = <char *>calloc(blen + 2, sizeof(char))
    cdef char *m2 = <char *>calloc(blen + 2, sizeof(char))
    cdef char *m3 = <char *>malloc((blen + 2) * sizeof(char))
    #.........................................................................
    for i from 0 <= i <= blen:
        m2[i] = i
    #.........................................................................
    for i from 1 <= i <= alen:
        m1[0] = i + 1
        achr = a[i - 1]
        for j from 1 <= j <= blen:
            bchr = b[j - 1]
            if achr == bchr:
                m1[j] = m2[j - 1]
            else:
                m1[j] = 1 + imin(m1[j - 1], m2[j - 1], m2[j])
            if i != 1 and j != 1 and achr == b[j - 2] and bchr == a[i - 2]:
                m1[j] = m3[j - 1]
        #.......................................................................
        m1, m2 = m2, m1
        strcpy(m3, m2)
    #.........................................................................
    R = <int>m2[blen]
    #.........................................................................
    # cleanup:
    free(m3)
    free(m1)
    free(m2)
    #.........................................................................
    return R
The code runs fine and fast (300,000...400,000 comparisons per second on my PC).
The challenge is to make this code work with unicode strings as well. I am running Python 3.1 and retrieve texts from a database that are then matched against a query text.
Encoding these strings to bytes before passing them to the Cython function for comparison is not a good idea, since performance would suffer considerably (tested) and results would likely be wrong for any text containing characters outside of 7-bit US ASCII.
The (very terse) Cython manual does mention unicode strings, but is hardly helpful for the problem at hand.
As I see it, a unicode string can be conceived of as an array of integer numbers, each representing a single codepoint, and the code above is basically operating on arrays of chars already, so my guess is that I should (1) extend it to handle C arrays of integers; (2) add code to convert a Python unicode string to a C array; (3) profit!
(Note: there are two potential issues with this approach. One is handling unicode surrogate characters, but I guess I know what to do with those. The other is that unicode codepoints do not really map 1:1 to the concept of 'characters'. I am well aware of that, but I consider it outside the scope of this question. Please assume that one unicode codepoint is one unit of comparison.)
So I am asking for suggestions on how to:
1. write a fast Cython function that accepts a Python unicode string and returns a C array of Cython unsigned ints (4 bytes);
2. modify the code shown to handle those arrays and do the correct memory allocations / deallocations (this is pretty foreign stuff to me).
Edit: John Machin has pointed out that the curious typecasts char *m1 etc. are probably done for speed and/or memory optimization; these variables are still treated as arrays of numbers. I realize that the code does nothing to prevent a possible overflow with long strings; erroneous results may occur when one array element exceeds 127 or 255 (depending on the C compiler used). That is sort of surprising for code coming from a bioinformatics project.
That said, I am only interested in precise results for largely identical strings of fewer than, say, a hundred characters. Results below 60% sameness could, for my purposes, be safely reported as 'completely different' (by returning the length of the longer text), so I guess it will be best to leave the char *m1 casts in place, but add some code to check against overflow and abort early in case of rampant dissimilarity.
Use ord() to convert characters to their integer code points. It works on characters from either unicode or str string types:
codepoints = [ord(c) for c in text]
Caveat lector: I've never done this. The following is a rough sketch of what I'd try.
You will need to use the PyUnicode_AsUnicode function and the next one, PyUnicode_GetSize. In declarations, where you currently have char, use Py_UNICODE instead. Presumably with a narrow (UCS2) build you will copy the internal structure, converting surrogate pairs as you go. With a wide (UCS4) build you might operate directly on the internal structure.
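A minimal hedged sketch along those lines (untested; as_c_array is a hypothetical helper, and PyUnicode_AsUnicode/PyUnicode_GetSize are the era-appropriate calls, since deprecated in newer CPython):

from libc.stdlib cimport malloc

cdef extern from "Python.h":
    Py_UNICODE *PyUnicode_AsUnicode(object o)
    Py_ssize_t PyUnicode_GetSize(object o)

cdef Py_UNICODE *as_c_array(u, Py_ssize_t *length):
    # Copy the internal buffer of the unicode string u into a freshly
    # malloc'ed C array; the caller is responsible for free()ing it.
    cdef Py_ssize_t n = PyUnicode_GetSize(u)
    cdef Py_UNICODE *src = PyUnicode_AsUnicode(u)
    cdef Py_UNICODE *dst = <Py_UNICODE *>malloc(n * sizeof(Py_UNICODE))
    cdef Py_ssize_t i
    for i in range(n):
        dst[i] = src[i]
    length[0] = n
    return dst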
I am closing this question because I have found a better algorithm... with its own problems. See you over there.
