How to use the Cython compiler in Python

On this page, http://earthpy.org/speed.html, I found the following:
%%cython
import numpy as np
def useless_cython(year):
    # define types of variables
    cdef int i, j, n
    cdef double a_cum
    from netCDF4 import Dataset
    f = Dataset('air.sig995.'+year+'.nc')
    a = f.variables['air'][:]
    a_cum = 0.
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            for n in range(a.shape[2]):
                # here we have to convert numpy value to simple float
                a_cum = a_cum+float(a[i,j,n])
    # since a_cum is not numpy variable anymore,
    # we introduce new variable d in order to save
    # data to the file easily
    d = np.array(a_cum)
    d.tofile(year+'.bin')
    print(year)
    return d
It seems to be as easy as writing %%cython above the function. However, this just doesn't work for me: my IDE says "Statement seems to have no effect".
After a bit of research I found that the %% syntax comes from IPython, which I have also installed (as well as Cython). It still doesn't work. I am using Python 3.6.
Any ideas?

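%%cython is an IPython cell magic, not Python syntax, so it only works in an IPython shell or a Jupyter notebook (as the first line of a cell), not in a .py file run from your IDE. Once you are in the IPython interpreter, you have to load the extension before using it. This can be done with the %load_ext magic, so in your case:
%load_ext cython
Both tools are well documented; if you have not seen them yet, take a look at the relevant parts of the Cython documentation and of the IPython documentation.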

Related

Is there an equivalent to numpy where() but with the condition evaluated sequentially rather than as a bool map before selection?

I'm creating an elevation map from a point cloud. When I project the point cloud on to the elevation map there may be multiple points that project to the same cell. I only want the point with highest height in each cell.
I have my elevation map as a (J,K) size float type numpy array named image. The cells in the image that need to be updated are in a (2,N) array named pix_points. And my heights are in a (1,N) array. My initial attempt at implementing this operation looks like:
image[pix_points[1,:], pix_points[0,:]] = (
    np.where(
        image[pix_points[1,:], pix_points[0,:]] <= heights,
        heights,
        image[pix_points[1,:], pix_points[0,:]]))
I thought this would work, but it seems that the condition in where is evaluated first, and that boolean map is then used to choose between heights and the existing value at each location. This behaviour can't deal with repeated image locations with different heights, since that would require writing to a location first and then later (within the same pass) checking whether the same location is already higher than another height and updating accordingly. I don't want to use a Python loop to implement this. Is there any way of doing what I intend in a vectorized way?
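A minimal sketch of why repeated indices defeat the np.where approach, using a hypothetical 1-D image of three cells and three heights that all land on cell 0. Fancy-index assignment is buffered: the whole right-hand side is computed from the original values first, and only one of the writes to a repeated index survives, so the result is not the maximum.

```python
import numpy as np

# Hypothetical toy data: three heights that all project to cell 0
image = np.zeros(3, dtype=np.float32)
idx = np.array([0, 0, 0])
heights = np.array([2.0, 5.0, 3.0], dtype=np.float32)

# The condition is evaluated against the original image values (all 0.0),
# so every height "wins" its own comparison...
new_vals = np.where(image[idx] <= heights, heights, image[idx])

# ...and the buffered assignment then writes all three values to cell 0.
# Only one of them is kept (in practice the last one), so 5.0 is lost.
image[idx] = new_vals
print(image[0])
```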
UPDATE
As requested, the Python loop code for the described operation is just:
for p, h in zip(pix_points.T, heights):
    if image[p[1],p[0]] <= h:
        image[p[1],p[0]] = h
This is actually faster than I expected: for N of about half a million it takes about 1.75 s. I've been using the Python loop version for smaller point clouds and it's fine. However, I'd like to speed it up, as it's part of a live system where the point clouds come from a live camera stream. I see I'd probably have to rewrite it in Cython or C++ to get a further speed-up. I just wanted to make sure there isn't a NumPy way (or another Python library way) of doing this first, as I'm not very fluent in writing Cython or C++ Python bindings.
Seeing there is no obvious NumPy way of doing what I want, and following Ahmed AEK's suggestion, I've rewritten the loop in Cython and got about a 100x speed-up. For reference, here is my implementation, update_elevation_map.pyx:
#cython: profile=False
#cython: boundscheck=False
#cython: wraparound=False
#cython: initializedcheck=False
#cython: nonecheck=False
import numpy as np
cimport numpy as np
def update_elevation_map(
        np.float32_t[:, :] image,
        np.int64_t[:, :] pix_points,
        np.float32_t[:] heights):
    cdef np.int64_t[:] p
    cdef np.float32_t h
    cdef int N = heights.shape[0]
    cdef int i
    for i in range(N):
        p = pix_points[:,i]
        h = heights[i]
        if image[p[1],p[0]] <= h:
            image[p[1],p[0]] = h
    return np.asarray(image)
setup.py
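from setuptools import Extension, setup
from Cython.Build import cythonize
import numpy as np
# "cimport numpy" needs the NumPy C headers on the include path
ext = Extension("update_elevation_map", ["update_elevation_map.pyx"], include_dirs=[np.get_include()])
setup(name="update_elevation_map", ext_modules=cythonize([ext]))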
and built it with the command:
python3 setup.py build_ext --inplace
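For comparison, NumPy does have an unbuffered operation aimed at exactly this repeated-index case: ufunc.at. np.maximum.at(image, (rows, cols), values) applies maximum once per index pair, so a pixel hit several times keeps its highest height. The sketch below assumes the shapes from the question (pix_points is (2, N) with x in row 0 and y in row 1) and NaN-free heights (np.maximum propagates NaN, while the <= loop skips it); update_elevation_map_np is a hypothetical name. It checks the result against the Python loop on random data. ufunc.at has historically been much slower than ordinary fancy indexing (recent NumPy releases improved it considerably), so it is worth benchmarking against the Cython version before choosing.

```python
import numpy as np


def update_elevation_map_np(image, pix_points, heights):
    """Keep the highest height per pixel, in place (hypothetical helper).

    image:      (J, K) float array, modified in place
    pix_points: (2, N) int array, row 0 = x (column), row 1 = y (row)
    heights:    (1, N) or (N,) float array, assumed free of NaNs
    """
    rows = pix_points[1]
    cols = pix_points[0]
    # Unbuffered: every repeated (row, col) pair is applied, so each cell
    # ends up with max(original value, all heights that land on it).
    np.maximum.at(image, (rows, cols), np.ravel(heights))
    return image


def update_elevation_map_loop(image, pix_points, heights):
    # The Python loop from the question, used as a reference
    for p, h in zip(pix_points.T, np.ravel(heights)):
        if image[p[1], p[0]] <= h:
            image[p[1], p[0]] = h
    return image


rng = np.random.default_rng(0)
J, K, N = 20, 30, 5000  # small image so many points share a pixel
pix_points = np.vstack([rng.integers(0, K, N), rng.integers(0, J, N)])
heights = rng.random((1, N)).astype(np.float32)
base = rng.random((J, K)).astype(np.float32)

fast = update_elevation_map_np(base.copy(), pix_points, heights)
slow = update_elevation_map_loop(base.copy(), pix_points, heights)
print(np.array_equal(fast, slow))
```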

VSCode IntelliSense with Python C extension module (petsc4py)

I'm currently using a Python module called petsc4py (https://pypi.org/project/petsc4py/). My main issue is that none of the typical IntelliSense features seem to work with this module.
I'm guessing it might have something to do with it being a C extension module, but I am not sure exactly why this happens. I initially thought that IntelliSense was unable to look inside ".so" files, but it seems to manage this for numpy's array object, which in my case lives in a file called multiarray.cpython-37m-x86_64-linux-gnu (see the example below).
Does anyone know why I see this behaviour with the petsc4py module? Is there anything that I (or the developers of petsc4py) can do to get IntelliSense to work?
Example:
import sys
import petsc4py
petsc4py.init(sys.argv)
from petsc4py import PETSc
x_p = PETSc.Vec().create()
x_p.setSizes(10)
x_p.setFromOptions()
u_p = x_p.duplicate()
import numpy as np
x_n = np.array([1,2,3])
u_n = x_n.copy()
In this example, when working with a Vec object from petsc4py, typing u_p.duplicate() does not find the method, and the only suggestion is a repetition of the function used on the line immediately before. However, with a numpy array, typing u_n.copy() works perfectly.
If you're compiling in-place then you're bumping up against https://github.com/microsoft/python-language-server/issues/197.

PyCuda - How can I use functions written in Python in the kernel?

I want to parallelize my Python code and I'm trying to use PyCuda.
From what I have seen so far, you have to write a "kernel" in C inside your Python code, and this kernel is what gets parallelized. Am I right?
Example (doubling an array of random numbers, from https://documen.tician.de/pycuda/tutorial.html):
import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
import numpy
a = numpy.random.randn(4, 4)
a = a.astype(numpy.float32)
a_gpu = cuda.mem_alloc(a.nbytes)
cuda.memcpy_htod(a_gpu, a)
# Kernel:
mod = SourceModule("""
__global__ void doublify(float *a)
{
    int idx = threadIdx.x + threadIdx.y*4;
    a[idx] *= 2;
}
""")
func = mod.get_function("doublify")
func(a_gpu, block=(4, 4, 1))
a_doubled = numpy.empty_like(a)
cuda.memcpy_dtoh(a_doubled, a_gpu)
print(a_doubled)
print(a)
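To make the kernel's indexing concrete, here is a plain NumPy emulation (no GPU, no PyCUDA) that runs the body of doublify once per (threadIdx.x, threadIdx.y) pair of the block=(4, 4, 1) launch, showing that idx = threadIdx.x + threadIdx.y*4 touches each of the 16 elements exactly once in row-major order. It is only a sketch of the index mapping under the tutorial's assumptions (a C-contiguous 4x4 float32 array): on the GPU the 16 bodies run concurrently as compiled CUDA C, not as a Python loop, and doublify_emulated is a hypothetical name.

```python
import numpy as np


def doublify_emulated(a, block=(4, 4, 1)):
    """Run the doublify kernel body once per thread, sequentially."""
    flat = a.reshape(-1)  # the kernel sees a flat float* (row-major)
    bx, by, _ = block
    for ty in range(by):        # plays the role of threadIdx.y
        for tx in range(bx):    # plays the role of threadIdx.x
            idx = tx + ty * 4   # same expression as in the CUDA source
            flat[idx] *= 2
    return a


a = np.random.randn(4, 4).astype(np.float32)
a_doubled = doublify_emulated(a.copy())
print(np.allclose(a_doubled, 2 * a))
```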
The point is that my Python code has classes and other constructs that are natural in Python but have no C equivalent (i.e. they can't be translated to C).
Let me clarify: my code has 256 independent for-loops that I want to parallelize. These loops contain Python code that can't be translated to C.
How can I parallelize an actual Python code with PyCuda without translating my code to C?
You can't.
PyCUDA doesn't support device side python, all device code must be written in the CUDA C dialect.
Numba includes a direct Python compiler which can allow an extremely limited subset of Python language features to be compiled and run directly on the GPU. This does not include access to any Python libraries such as numpy, scipy, etc.

Calling C++ function from Inside Python function

In R we can use Rcpp to call a C++ function such as the one below:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
SEXP critcpp(SEXP a, SEXP b){
    NumericMatrix X(a);
    NumericVector crit(b);
    int p = X.ncol();
    NumericMatrix critstep(p,p);
    NumericMatrix deltamin(p,p);
    List lst(2);
    for (int i = 0; i < (p-1); i++){
        for (int j = i+1; j < p; j++){
            // --some calculations
        }
    }
    lst[0] = critstep;
    lst[1] = deltamin;
    return lst;
}
I want to do the same thing in python.
I have looked at Boost, SWIG, etc., but they seem complicated to my newbie Python eyes.
Can the Python wizards here kindly point me in the right direction?
I need to call this C++ function from inside a Python function.
Since I think the only real answers are to spend some time rewriting the function you posted or to write some sort of wrapper for it (absolutely possible, but quite time-consuming), I'm answering with a completely different approach.
Without going through any kind of compiled conversion, a much faster way (in programming time, not in efficiency) may be to call the R interpreter directly, with the module containing the function you posted, from within Python through the rpy2 module, as described here. It requires the pandas module to handle the data frames from R.
The modules to use (in Python) are:
import numpy as np # for handling numerical arrays
import scipy as sp # a good utility
import pandas as pd # for data frames
from rpy2.robjects.packages import importr # for importing your module
import rpy2.robjects as ro # for calling R interpreter from within python
import pandas.rpy.common as com # for storing R data frames in pandas data frames.
In your code you should import your module by calling importr:
importr('your-module-with-your-cpp-function')
and you can send commands directly to R with:
ro.r('x = your.function( blah blah )')
x_rpy = ro.r('x')
type(x_rpy)
# => rpy2.robjects.your-object-type
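You can load R data into a pandas data frame with:
py_df = com.load_data('variable.name')
and push a data frame back to R with:
r_df = com.convert_to_r_dataframe(py_df)
ro.globalenv['df'] = r_df
Note that pandas.rpy.common was removed in pandas 0.20; with newer versions, the equivalent conversions live in rpy2's own rpy2.robjects.pandas2ri module.
This is certainly a workaround rather than a direct answer to your question, but it may be a reasonable solution for certain applications, even though I would not recommend it for "production".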
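For completeness, a route that needs neither R nor Boost/SWIG is the standard-library ctypes module, which calls functions in a compiled shared library directly. A C++ function would have to be exported with extern "C" and built into a shared library (.so/.dll) for this to work, and matrices would be passed as raw pointers. The sketch below only shows the mechanics, calling cos from the C math library, and assumes a Unix-like system where ctypes.util.find_library("m") can locate it.

```python
import ctypes
import ctypes.util
import math

# Locate and load the C math library (assumption: Unix-like system)
libm_path = ctypes.util.find_library("m")
libm = ctypes.CDLL(libm_path)

# Declare the C signature double cos(double) so ctypes converts correctly
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

result = libm.cos(0.5)
print(result, math.cos(0.5))
```

Your own library would be loaded the same way, e.g. ctypes.CDLL("./libcrit.so") for a hypothetical libcrit.so, with argtypes/restype declared to match its extern "C" signature.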

Understanding Cython "typedness" report

I'm using Cython to make my Python code more efficient. I have read about Cython's annotation option, cython -a filename.pyx, which shows the "typedness" of my Cython code. Here is the short reference from the Cython web page. My environment is Windows 7, Eclipse PyDev, Python 2.7.5 32-bit, Cython 0.20.1 32-bit, MinGW 32-bit.
Here is the report for my code:
So does the color yellow mean efficient or non-efficient code? The more yellow it is the more....what?
Another question: I can click on the numbered rows, and the following report opens (e.g. row 23):
What does this mean?
Thanks for any assistance!
UPDATE:
In case somebody wants to try my toy code, here it is:
hello.pyx
import time
cdef char say_hello_to(char name):
    print("Hello %s!" % name)
cdef double f(double x) except? -2:
    return x**2-x
cdef double integrate_f(double a, double b, int N) except? -2:
    cdef int i
    cdef double s, dx
    s = 0
    dx = (b-a)/N
    for i in range(N):
        s += f(a+i*dx)
    return s * dx
cpdef p():
    s = 0
    for i in range(0, 1000000):
        c = time.time()
        integrate_f(0,100,5)
        s += time.time()- c
    print(s)
test_cython.py
import hello as hel
hel.p()
setup.py
from distutils.core import setup
from Cython.Build import cythonize
setup(
    name = 'Hello world app',
    ext_modules = cythonize("hello.pyx"),
)
From the command prompt I used this command (to generate the C, .pyd, etc. files):
python setup.py install build --compiler=mingw32
To generate the report I used:
cython -a hello.pyx
This is more or less what you wrote. To be more precise:
Yellowish lines mark Cython statements that are not translated directly into pure C code but instead work by calling the CPython API to do the job. Such lines include:
Python object allocation
calls to Python functions and builtins
operations on high-level Python data structures (e.g. lists, tuples, dictionaries)
use of overloaded operators on Python types (e.g. +, * on Python integers vs. C ints)
In any case, a yellow line is a good indication that things might be improved.
I think I may have figured it out myself already. Someone please correct me if I'm mistaken:
The more "yellowish" a line is, the less efficient it is.
The most efficient lines are the white ones, because they are translated into pure C code.
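To measure how much the white (pure C) lines actually buy, it helps to time the compiled module against a pure-Python baseline. Below is a sketch of such a baseline for f and integrate_f from hello.pyx (a left Riemann sum of x**2 - x); the names mirror the .pyx file, and timing the whole loop once with time.perf_counter is a suggested alternative to the per-call time.time() in p(), not the original code.

```python
import time


def f(x):
    return x**2 - x


def integrate_f(a, b, N):
    # Left Riemann sum of f over [a, b] with N steps, as in hello.pyx
    s = 0.0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx


def p(repeats=100000):
    # Time the whole loop once instead of calling time.time() per iteration
    start = time.perf_counter()
    for _ in range(repeats):
        integrate_f(0, 100, 5)
    return time.perf_counter() - start


print(integrate_f(0, 100, 5))  # 236000.0
print(p())
```

With dx = 20 the sum is f(0) + f(20) + f(40) + f(60) + f(80) = 11800, times 20, so the compiled integrate_f(0, 100, 5) should return the same 236000.0.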
