Cython: Change integer with reference

Say I have a simple function that takes as input a pointer to an integer. How do I change the originating integer value?
My idea was as follows:
cdef myFunc(int n, int *nnz):
    nnz_int = <uintptr_t>nnz
    nnz_int = 0
    for i in range(0, n):
        nnz_int += n
but upon reflection, I think I only initially cast the value of nnz onto nnz_int, and then change nnz_int, without changing the original nnz. How do I achieve that?

From the Cython docs:
Note that Cython uses array access for pointer dereferencing, as *x is not valid Python syntax, whereas x[0] is.
So this should work:
cdef myFunc(int n, int *nnz):
    for i in range(0, n):
        nnz[0] += n
Not sure what you're trying to achieve by adding n to the pointed-to value n times; why not simply add n*n to it once?
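To see the effect from the calling side, a call like this (a minimal sketch with a hypothetical count variable) leaves the caller's integer updated through the pointer:
cdef int count = 0
myFunc(5, &count)
# count is now 25: the loop adds n (= 5) to nnz[0] five times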

Related

Cython different results

Why this function in cython returns different results for every run?
I passed in 50000 for test
cpdef int fun1(int num):
    cpdef int result
    cdef int x
    for x in range(num):
        result += x*x
    return result
edit:
so now I changed it to long long result like this
cpdef long long fun1(int num):
    cdef long long result = 0
    cdef int x = 0
    for x in range(num):
        result += x*x
    return result
and it returns 25950131338936
but :) the python func
def pyfunc(num):
    result = 0
    for x in range(num):
        result += x * x
    return result
returns 41665416675000
so hm so what is wrong?
There are probably two problems here. First, result should be initialized to zero. Secondly, the result is the sum of the squares of all integers from 0 to 50,000 (exclusive), which is a very large number.
The problem is that the storage type int cannot fit such a big number. Try using a larger storage type like long long and it will work. The maximum value a 32-bit integer can hold is roughly 2^31. The maximum value a long long can hold is typically 2^63. Consult the C compiler on the system at hand to figure out the exact limits.
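If you want to check the exact limits from Cython rather than asking the C compiler directly, they are exposed in libc.limits (a small sketch):
from libc.limits cimport INT_MAX, LLONG_MAX

print(INT_MAX)    # typically 2147483647 (2**31 - 1)
print(LLONG_MAX)  # typically 9223372036854775807 (2**63 - 1)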
The cdef statement is used to declare C variables, either local or module-level.
So you need to set an initial value for the result variable. If you don't, it gets whatever happens to be in memory at call time, which can be anything.
cpdef int fun1(int num):
    cdef int result = 0
    cdef int x
    for x in range(num):
        result += x * x
    return result
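Even with result initialized, one subtlety remains in the edited version from the question: x is still a 32-bit int, so x*x is computed in int arithmetic and wraps around for x >= 46341 before the sum is widened to long long; that overflow is exactly what produces 25950131338936 instead of 41665416675000. A sketch of a version that avoids it by widening before the multiplication:
cpdef long long fun1(int num):
    cdef long long result = 0
    cdef int x
    for x in range(num):
        # cast x to long long first so the product cannot overflow
        result += <long long>x * x
    return result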

equivalent of "double[:,::1] u_tp1 not None" in python?

I am trying to write the short code below in Python (it is from a .pyx file). My issue is the lines with "double[:,::1]" in them. Is there any equivalent in Python for it? Also, how does "cdef unsigned int i, j" translate to Python? I am new to programming and most of what I found online is over my head. Any suggestion or help is appreciated.
def _step_scalar(
    double[:,::1] u_tp1 not None,
    double[:,::1] u_t not None,
    double[:,::1] u_tm1 not None,
    unsigned int x1, unsigned int x2, unsigned int z1, unsigned int z2,
    double dt, double ds,
    double[:,::1] vel not None):
    """
    Perform a single time step in the Finite Difference solution for scalar
    waves 4th order in space
    """
    cdef unsigned int i, j
    for i in xrange(z1, z2):
        for j in xrange(x1, x2):
            u_tp1[i,j] = (2.*u_t[i,j] - u_tm1[i,j]
                + ((vel[i,j]*dt/ds)**2)*(
                (-u_t[i,j + 2] + 16.*u_t[i,j + 1] - 30.*u_t[i,j] +
                16.*u_t[i,j - 1] - u_t[i,j - 2])/12. +
                (-u_t[i + 2,j] + 16.*u_t[i + 1,j] - 30.*u_t[i,j] +
                16.*u_t[i - 1,j] - u_t[i - 2,j])/12.))
They're type declarations to help Cython speed up the code. Python is dynamically typed (it accepts variables of any type), so they aren't meaningful in Python. Therefore you can get rid of them.
double[:,::1] defines the variable as a 2D, C-contiguous memoryview of doubles. This means the function expects something similar to a 2D numpy array (which is still what you should pass to your Cython function).
u_tp1 is the variable name. You should keep this.
not None tells Cython to assume that you won't pass None into the function (so it disables some checks for extra speed). This can be deleted in Python.
cdef unsigned int i, j defines i and j as C integers, for extra speed. In Python i and j are created when they are needed in the for loop so the definition can be deleted completely.
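Putting those pieces together, the plain-Python equivalent of the function above is just the same loop with the declarations removed (and range instead of xrange on Python 3):
def _step_scalar(u_tp1, u_t, u_tm1, x1, x2, z1, z2, dt, ds, vel):
    """
    Perform a single time step in the Finite Difference solution for scalar
    waves 4th order in space
    """
    for i in range(z1, z2):
        for j in range(x1, x2):
            u_tp1[i, j] = (2.*u_t[i, j] - u_tm1[i, j]
                + ((vel[i, j]*dt/ds)**2)*(
                (-u_t[i, j + 2] + 16.*u_t[i, j + 1] - 30.*u_t[i, j] +
                16.*u_t[i, j - 1] - u_t[i, j - 2])/12. +
                (-u_t[i + 2, j] + 16.*u_t[i + 1, j] - 30.*u_t[i, j] +
                16.*u_t[i - 1, j] - u_t[i - 2, j])/12.))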

Loop over a Numpy array with Cython

Let a and b be two numpy.float arrays of length 1024, defined with
cdef numpy.ndarray a
cdef numpy.ndarray b
I notice that:
cdef int i
for i in range(1024):
    b[i] += a[i]
is considerably slower than:
b += a
Why?
I really need to be able to loop manually over arrays.
The difference will be smaller if you tell Cython the data type and the number of dimensions for a and b:
cdef numpy.ndarray[np.float64_t, ndim=1] a, b
Although the difference will be smaller, you won't beat b += a, because that uses NumPy's vectorized routines, which take advantage of SIMD instructions where your CPU supports them.
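As a sketch, the manually looped version with full type information might look like this (the function name and the float64 dtype are assumptions here):
import numpy as np
cimport numpy as np

def add_arrays(np.ndarray[np.float64_t, ndim=1] a,
               np.ndarray[np.float64_t, ndim=1] b):
    # with a typed buffer, b[i] += a[i] compiles to direct C element access
    # instead of going through the generic Python indexing protocol
    cdef int i
    for i in range(b.shape[0]):
        b[i] += a[i]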

Cython: simple function with 2 lists, what is the fastest way?

What I want to do is transform my pure Python code into Cython.
Pure Python code:
def conflicts(list1, list2):
    numIt = 10
    for i in list1:
        for j in list2:
            if i == j and i < numIt:
                return True
    return False
conflicts([1,2,3], [6,9,8])
My Cython code so far:
cdef char conflicts(int [] list1, int [] list2):
    cdef int numIt = 10
    for i in list1:
        for j in list2:
            if i == j and i < numIt:
                return True
    return False
conflicts([1,2,3], [6,9,8])
Since I am completely new to Cython (and not really a pro in Python) I would like to get some feedback about my transformation. Am I doing the right thing? Is there anything else I should do in order to make the function even faster?
Update:
Does anyone know how I can add types in the function header for the inputs (list1, list2)? I tried "int [:]", which compiles without error, but when I try to call the function with two lists I get the message "TypeError: 'list' does not have the buffer interface".
"i" and "j" could be declared for optimize your code. First optimization with cython is accomplished using explicit declaration.
You can run
cython -a yourcode.py
and look at the annotated HTML output for hints on where your Python code can be optimized with Cython (yellow lines indicate interaction with the Python interpreter). You can then work with the generated C module.
Some hand-written Cython optimizations:
+ Use the list type for list1 and list2.
+ Use the bint type for conflicts, because the function returns a boolean value.
+ Get the length of the lists, because the for loops require an end index.
+ Map the lists onto int arrays (because the lists hold only integer values).
from cpython cimport array
import array

cdef bint conflicts(list list1, list list2):
    cdef int numIt = 10
    cdef int i, j
    # A plain Python list has no buffer interface, so copy each one into a
    # typed array first; the memoryview then gives fast C-level element access.
    cdef array.array arr1 = array.array('i', list1)
    cdef array.array arr2 = array.array('i', list2)
    cdef int[:] my_list1 = arr1
    cdef int[:] my_list2 = arr2
    cdef int end_index1 = len(list1)
    cdef int end_index2 = len(list2)
    for i in range(end_index1):
        for j in range(end_index2):
            if my_list1[i] == my_list2[j] and my_list1[i] < numIt:
                return True
    return False

conflicts([1, 2, 3], [6, 9, 8])
As I commented, you should be able to get a pretty substantial improvement by changing your algorithm, without messing with cython at all. Your current code is O(len(list1)*len(list2)), but you can reduce this to O(len(list1)+len(list2)) by using a set. You can also simplify the code by using the builtin any function:
def conflicts(list1, list2):
    numIt = 10
    s1 = set(list1)
    return any(x in s1 and x < numIt for x in list2)
Depending on how many numbers in each list you expect to be less than 10, you might try moving the x < numIt test around a bit to see what is fastest (filtering list1 before you turn it into a set, for instance, or putting if x < numIt after the for in the generator expression inside any).
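For instance, the first variant suggested above, filtering list1 before building the set, could look like this (a small sketch):
def conflicts(list1, list2):
    numIt = 10
    # only values below numIt can ever yield a conflict, so drop the rest up front
    s1 = set(x for x in list1 if x < numIt)
    return any(x in s1 for x in list2)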

Speed up cython loop by copying from a list to a numpy array

I am writing some performance intensive code, and was hoping to get some feedback from the cythonistas out there on how to improve it further. The purpose of the functions I've written is a bit tough to explain, but what they do isn't all that intimidating. The first (roughly) takes two dictionaries of lists of numbers and joins them to get one dictionary of lists of numbers. It's only run once so I am less concerned with optimizing it. The second first calls the first, then uses its result to basically cross indices stored in a numpy array with the numbers in the lists of arrays to form queries (new numbers) on a (pybloomfiltermmap) bloom filter.
I've determined the heavy step is due to my nested loops and reduced the number of loops used, moved out of the loops everything that only needs to happen once, and typed everything to the best of my knowledge. Still, each iteration of i in the second function takes about 10 seconds, which is too much. The main things I still see as yellow in the html compilation output are due to indexed accesses in the lists and numpy array, so I tried to replace my lists with all numpy arrays but wasn't able to get any improvement. I would greatly appreciate any feedback you could provide.
#cython: boundscheck=False
#cython: wraparound=False
import numpy as np
cimport numpy as np

def merge_dicts_of_lists(dict c1, dict c2):
    cdef dict res
    cdef int n, length1, length2, length3
    cdef unsigned int i, j, j_line, jj, k, kk, new_line
    res = {n: [] for n in range(256)}
    length1 = len(c1)
    for i in range(length1):
        length2 = len(c1[i])
        for j in range(length2):
            j_line = c1[i][j]
            jj = (j_line) % 256
            length3 = len(c2[jj])
            for k in range(length3):
                kk = c2[jj][k]
                new_line = (j_line << 10) + kk
                res[i].append(new_line)
    return res

def get_4kmer_set(np.ndarray c1, dict c2, dict c3, bf):
    cdef unsigned int num = 0
    cdef unsigned long long query = 0
    cdef unsigned int i, j, i_row, i_col, j_line
    cdef unsigned int length1, length2
    cdef dict merge
    cdef list m_i
    merge = merge_dicts_of_lists(c2, c3)
    length1 = len(c1[:,0])
    for i in range(length1):
        print "i is %d" % i
        i_row = c1[i,0]
        i_col = c1[i,1]
        m_i = merge[i_col]
        length2 = len(m_i)
        for j in range(length2):
            j_line = m_i[j]
            query = (i_row << 24) + (i_col << 20) + j_line
            if query in bf:
                num += 1
    print "%d yes answers from bf" % num
For posterity's sake, I'm adding an off-topic answer, but I hope it will be useful for someone nonetheless. The code I posted above isn't much different from what I've decided to stay with, as it was already compiling to short C lines, as seen in the Cython html compilation output.
Since the innermost operation was a Bloom filter query, I found what helped most was speeding up that step in two ways. One was changing the hash function used by pybloomfiltermmap to an available C++ implementation of murmurhash3. I found pybloomfilter was using sha, which was comparatively slow, as expected for a cryptographic hash function. The second boost came from applying the trick found in this paper: http://www.eecs.harvard.edu/~kirsch/pubs/bbbf/rsa.pdf. Basically, it says you can save a lot of computation by using a linear combination of two hash values instead of k different hashes for the BF. These two tricks together gave an order of magnitude (~5x) improvement in query time.
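The linear-combination trick from that paper (Kirsch and Mitzenmacher's double hashing) amounts to deriving all k probe positions from just two hash values. A minimal sketch, independent of pybloomfiltermmap's actual API:
def bloom_probe_positions(h1, h2, k, m):
    # h1, h2: two independent hash values of the key
    # k: number of probes, m: number of bits in the filter
    # g_i(x) = (h1(x) + i*h2(x)) mod m stands in for k separate hash functions
    return [(h1 + i * h2) % m for i in range(k)]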
