Pyopencl array sum to add an array - python

I am new to OpenCL and PyOpenCL. I am trying to write a basic program that sums an array. I came across this documentation and tried this small piece of code in Python. Obviously, it is not working.
import pyopencl as cl
import pyopencl.array
import numpy
context = cl.create_some_context()
queue = cl.CommandQueue(context)
h_a = numpy.random.rand(3,3)
d_a = cl.Buffer(context, cl.mem_flags.READ_ONLY |
cl.mem_flags.COPY_HOST_PTR, hostbuf=h_a)
print cl.array.sum(d_a, dtype=None, queue=queue)
As you can assess, I am not sure about how to use those predefined functions.

My PyOpenCl tutorial has an array sum example with inline comments explaining what each line does:
Try running that, I hope it is helpful!
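For reference, here is a minimal sketch of a device-side sum, assuming PyOpenCL and a working OpenCL device are available. The main difference from the code in the question is that pyopencl.array.sum expects a pyopencl.array.Array (created here with to_device) rather than a raw cl.Buffer. The helper name sum_on_device is only for illustration, and pyopencl is imported inside the function purely so that the snippet can be loaded on a machine without an OpenCL runtime.

```python
import numpy as np


def sum_on_device(host_array):
    """Sum a NumPy array on an OpenCL device and return a Python float."""
    # Imported lazily so this file still loads without an OpenCL runtime.
    import pyopencl as cl
    import pyopencl.array as cl_array

    context = cl.create_some_context()
    queue = cl.CommandQueue(context)

    # float32 is used because not every device supports double precision.
    host_array = np.ascontiguousarray(host_array, dtype=np.float32)

    # to_device copies the host data into a pyopencl.array.Array.
    device_array = cl_array.to_device(queue, host_array)

    # sum() returns a zero-dimensional device array; get() copies it back.
    return float(cl_array.sum(device_array, queue=queue).get())
```

Calling sum_on_device(np.random.rand(3, 3)) should then agree with NumPy's own sum to within single precision.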


VSCode IntelliSense with python C extension module (petsc4py)

I'm currently using a Python module called petsc4py. My main issue is that none of the typical IntelliSense features seem to work with this module.
I'm guessing it might have something to do with it being a C extension module, but I am not sure exactly why this happens. I initially thought that intellisense was unable to look inside ".so" files, but it seems that numpy is able to do this with the array object, which in my case is inside a file called multiarray.cpython-37m-x86_64-linux-gnu (check example below).
Does anyone know why I see this behaviour in the petsc4py module? Is there anything that I (or the developers of petsc4py) can do to get IntelliSense to work?
import sys
import petsc4py
from petsc4py import PETSc
x_p = PETSc.Vec().create()
u_p = x_p.duplicate()
import numpy as np
x_n = np.array([1,2,3])
u_n = x_n.copy()
In this example, when trying to work with a Vec object from petsc4py, doing u_p.duplicate() cannot find the function and the suggestion is simply a repetition of the function immediately before. However, using an array from numpy, doing u_n.copy() works perfectly.
If you're compiling in-place then you're bumping up against

Calling C++ function from Inside Python function

In R we can use Rcpp to call a cpp function as the one below:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
SEXP critcpp(SEXP a, SEXP b){
  NumericMatrix X(a);
  NumericVector crit(b);
  int p = X.ncol();
  NumericMatrix critstep(p,p);
  NumericMatrix deltamin(p,p);
  List lst(2);
  for (int i = 0; i < (p-1); i++){
    for (int j = i+1; j < p; j++){
      // ... some calculations ...
    }
  }
  lst[0] = critstep;
  lst[1] = deltamin;
  return lst;
}
I want to do the same thing in Python.
I have gone through Boost, SWIG, etc., but they seem complicated to my newbie Python eyes.
Can the Python wizards here kindly point me in the right direction?
I need to call this C++ function from inside a Python function.
I think the only direct answer is to spend some time rewriting the function you posted, or to write some sort of wrapper for it (entirely possible, but quite time-consuming), so I'm answering with a completely different approach.
Without any compiled conversion step, a much quicker way (in terms of programming time, not run-time efficiency) may be to call the R interpreter directly from within Python, loading the module that contains the function you posted, through the Python rpy2 module, as described here. It requires the pandas module to handle the data frames from R.
The modules to use (in Python) are:
import numpy as np # for handling numerical arrays
import scipy as sp # a good utility
import pandas as pd # for data frames
from rpy2.robjects.packages import importr # for importing your module
import rpy2.robjects as ro # for calling R interpreter from within python
import pandas.rpy.common as com # for storing R data frames in pandas data frames (pandas.rpy was removed in pandas 0.20; newer setups use rpy2.robjects.pandas2ri instead)
In your code, import your module by calling importr,
and send commands directly to R by issuing:
ro.r('x = your.function( blah blah )')
x_rpy = ro.r('x')
# => rpy2.robjects.your-object-type
you can store your data in a data frame by:
py_df = com.load_data('')
and push back a data frame through:
r_df = com.convert_to_r_dataframe(py_df)
ro.globalenv['df'] = r_df
This is certainly a workaround rather than a direct answer to your question, but it may be a reasonable solution for some applications, although I would not recommend it for "production".

Anaconda package for cufft keeping arrays in gpu memory between fft / ifft calls

I am using the anaconda suite with ipython 3.6.1 and their accelerate package. It has a cufft sub-package with two functions, fft and ifft. As far as I understand, these take in a numpy array and output a numpy array, both in system RAM; that is, all GPU memory and the transfers between system and GPU memory are handled automatically, and the GPU memory is released when the function returns. This is all very nice and seems to work for me. However, I would like to run multiple fft/ifft calls on the same array and extract just one number from the array each time. It would be nice to keep the array in GPU memory to minimize system <-> GPU transfers. Am I correct that this is not possible using this package? If so, is there another package that would do the same? I have noticed the reikna project, but that doesn't seem to be available in anaconda.
What I am doing (and would like to do efficiently on the GPU) is shown here in short using numpy.fft:
import math as m
import numpy as np
import numpy.fft as dft
nr = 100
nh = 2**16
h = np.random.rand(nh)*1j
H = np.zeros(nh,dtype='complex64')
h[10] = 1
r = np.zeros(nr,dtype='complex64')
fftscale = m.sqrt(nh)
corr = 0.12j
for i in np.arange(nr):
    r[i] = h[10]
    H = dft.fft(h,nh)/fftscale
    h = dft.ifft(h*corr)*fftscale
r[nr-1] = h[10]
Thanks in advance!
So I found Arrayfire which seems rather easy to work with.

python multiprocessing.pool application

The following (simplified) code applies an interpolation function using the multiprocessing module:
from multiprocessing import Pool
from scipy.interpolate import LinearNDInterpolator
if __name__=="__main__":
    lndi = LinearNDInterpolator(points, valuesA)
    valuesB = list(np.split(valuesA, 4))
    ret =, valuesB)
When I run the .py file, Python freezes; if the last line is run separately, everything works fine and I get the speed-up that I hoped for.
Does anyone know how to fix the code so that it works automatically?
Thanks in advance.
edit: github issue was opened ->
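In case it helps, here is a sketch of the usual Pool pattern: the worker is a plain module-level function (so it can be pickled), and the pool is only created inside a function that is called from the `if __name__ == "__main__":` guard. To keep it self-contained, numpy.interp on made-up 1-D data stands in for LinearNDInterpolator, and the names interpolate_chunk and interpolate_parallel are hypothetical. This is not a confirmed fix for the freeze described above.

```python
from multiprocessing import Pool

import numpy as np

# Made-up sample data standing in for `points` and `valuesA`.
XP = np.linspace(0.0, 1.0, 11)
FP = XP ** 2


def interpolate_chunk(chunk):
    """Worker: linearly interpolate one chunk of query points."""
    return np.interp(chunk, XP, FP)


def interpolate_parallel(queries, processes=4):
    """Split the queries into chunks and map them over a process pool."""
    chunks = np.array_split(queries, processes)
    # The context manager shuts the pool down once map() has returned.
    with Pool(processes) as pool:
        results = pool.map(interpolate_chunk, chunks)
    return np.concatenate(results)


if __name__ == "__main__":
    queries = np.linspace(0.0, 1.0, 8)
    print(interpolate_parallel(queries))
```

Keeping the pool creation behind the guard matters on platforms that start worker processes by re-importing the main module, since otherwise each worker would try to start its own pool.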

Python/Numpy array element assignment issue

I'm trying to use Python/Numpy for a project that I'd normally do in Matlab, so I'm somewhat new to this environment (though I have played with Python/Django on the web development side). I'm now running into what I have to believe is a super simple issue that occurs when I'm trying to assign an element of a numpy array to another numpy array. The basic offending code is as follows. It does have some other fluff around it which I don't believe could be causing the issue, but I can provide that code as well if it would help.
import numpy as np
tf = 100
dt = 10
X0 = np.array([6978,0,5.8787,5.8787])
xhist = np.zeros(tf//dt+1)
yhist = np.zeros(tf//dt+1)
xhist[0] = X0[0]
yhist[0] = X0[1]
When I run the above code, the first print statement gives me 6978, as expected; however, the second print statement gives me 0, and I can't figure out for the life of me why this is. Any ideas? Thanks in advance!
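Here is a small self-contained check of the snippet above, under two assumptions. First, the array lengths use integer division (tf // dt + 1), since np.zeros needs an integer length and tf / dt is a float under Python 3. Second, the two print statements mentioned in the question print xhist[0] and yhist[0]; they are not shown in the posted code, so this is a guess. With the X0 given above, X0[1] is itself 0, so a second value of 0 is what this code should produce.

```python
import numpy as np

tf = 100
dt = 10
X0 = np.array([6978, 0, 5.8787, 5.8787])

# Integer division keeps the length an int (11 here).
n_steps = tf // dt + 1
xhist = np.zeros(n_steps)
yhist = np.zeros(n_steps)

xhist[0] = X0[0]
yhist[0] = X0[1]

# Assumed stand-ins for the two print statements in the question.
print(xhist[0])
print(yhist[0])
```

If the real code assigns a different element (for example X0[2]) and still prints 0, the cause is probably in the surrounding code that was left out of the question.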
