Python OpenCL host program to CL program parameter passing

Hi, I am trying out OpenCL using Python. I am trying to pass an array and a const variable to the CL program and simply copy the const variable into the array on the CL device. This should be very simple, but I am getting the following error:
Traceback (most recent call last):
File "<pyshell#103>", line 1, in <module>
test()
File "D:/Programming/Programs_OpenCL_Python/Host_CL_Parameter_Passing.py", line 141, in test
event = prg.test( queue, (10,1), None, a_dev, b)
File "C:\Python27\lib\site-packages\pyopencl-2012.1-py2.7-win32.egg\pyopencl\__init__.py", line 457, in kernel_call
self.set_args(*args)
File "C:\Python27\lib\site-packages\pyopencl-2012.1-py2.7-win32.egg\pyopencl\__init__.py", line 509, in kernel_set_args
% (i+1, str(e), advice))
LogicError: when processing argument #2 (1-based): Kernel.set_arg failed: invalid value - invalid kernel argument
Here's the code:
def test():
    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)
    a = np.empty(10, dtype=int)
    b = int(1)
    a_dev = cl.Buffer(ctx, cl.mem_flags.WRITE_ONLY, a.nbytes)
    prg = cl.Program(ctx, """__kernel void test(__global int *a, const int b){
        int i = get_global_id(0);
        a[i] = b;
    }""").build()
    event = prg.test(queue, (10, 1), None, a_dev, b)
    event.wait()
    cl.enqueue_copy(queue, a, a_dev)
    print a
Can someone tell me the problem and give me a solution? This is driving me crazy.
Thank you

You need to convert the integer argument to the numpy int32 type:
event = prg.test( queue, (10,1), None, a_dev, np.int32(b))
BTW, I was able to figure that out by looking at the Mandelbrot example.
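For reference, here is the full example with that fix applied, as a minimal sketch. Pinning a's dtype to np.int32 so the host array matches the kernel's 32-bit int is my addition, not part of the original answer:

import numpy as np
import pyopencl as cl

def test():
    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)
    a = np.empty(10, dtype=np.int32)  # match the kernel's 32-bit int
    a_dev = cl.Buffer(ctx, cl.mem_flags.WRITE_ONLY, a.nbytes)
    prg = cl.Program(ctx, """__kernel void test(__global int *a, const int b){
        int i = get_global_id(0);
        a[i] = b;
    }""").build()
    # Scalar kernel arguments must be sized numpy scalars, not Python ints,
    # so PyOpenCL knows how many bytes to hand to clSetKernelArg.
    prg.test(queue, (10,), None, a_dev, np.int32(1))
    cl.enqueue_copy(queue, a, a_dev)  # blocking by default
    print(a)  # expected: [1 1 1 1 1 1 1 1 1 1]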

Related

Difference in giving parameters to OpenGL code in LWJGL and PyOpenGL

I am learning OpenGL with Python, following this course:
https://www.youtube.com/watch?v=WMiggUPst-Q&list=PLRIWtICgwaX0u7Rf9zkZhLoLuZVfUksDP&index=2
He is using LWJGL; I am using PyOpenGL. I noticed that some of his methods (glGenVertexArrays, glDeleteVertexArrays, etc.) are called without parameters, even though the docs say otherwise. When I wrote the same code in Python, it says
glGenVertexArrays requires 1 arguments (n, arrays), received 0: ()
It wants a parameter from me for the same method. That is not a problem here (I think), I can just give 1, but when it comes to glDeleteVertexArrays, if I don't give it 1 and the list in which I keep the VAO and VBO ids, it raises this:
Traceback (most recent call last):
File "C:\Users\TheUser\AppData\Local\Programs\Python\Python38-32\lib\site-packages\OpenGL\latebind.py", line 43, in call
return self._finalCall( *args, **named )
TypeError: 'NoneType' object is not callable
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/TheUser/Desktop/MyPytonDen/ThinMatrixOpenGl/engineTester/MainGameLoop.py", line 22, in
Loader.CleanUP()
File "C:\Users\TheUser\Desktop\MyPytonDen\ThinMatrixOpenGl\renderEngine\Loader.py", line 12, in CleanUP
glDeleteVertexArrays()
File "C:\Users\TheUser\AppData\Local\Programs\Python\Python38-32\lib\site-packages\OpenGL\latebind.py", line 47, in call
return self._finalCall( *args, **named )
File "C:\Users\TheUser\AppData\Local\Programs\Python\Python38-32\lib\site-packages\OpenGL\wrapper.py", line 689, in wrapperCall
pyArgs = tuple( calculate_pyArgs( args ))
File "C:\Users\TheUser\AppData\Local\Programs\Python\Python38-32\lib\site-packages\OpenGL\wrapper.py", line 436, in calculate_pyArgs
raise ValueError(
ValueError: glDeleteVertexArrays requires 2 arguments (n, arrays), received 0: ()
I handled it as I described, but I don't think that is appropriate.
So I am asking what it actually wants from me (the docs weren't explicit enough for me), and why PyOpenGL wants it but LWJGL does not.
And this is the file:
from ThinMatrixOpenGl.renderEngine.RawModel import RawModel
from OpenGL.GL import *
import numpy as np

VAOs = []
VBOs = []

def CleanUP():
    print(VAOs, VBOs)
    for vao in VAOs:
        glDeleteVertexArrays(int(vao), VAOs)
    for vbo in VBOs:
        glDeleteBuffers(int(vbo), VBOs)

def LoadToVao(positions):
    global VAOs
    VAO_ID = CreateVao()
    VAOs.append(VAO_ID)
    storeDataInAttribList(0, positions)
    unbindVao()
    return RawModel(vao_id=VAO_ID, vertex_count=(len(positions) / 3))

def CreateVao():
    VAO_ID = glGenVertexArrays(1)
    glBindVertexArray(VAO_ID)
    return VAO_ID

def storeDataInAttribList(attrib_number: int, data: float):
    global VBOs
    VBO_id = glGenBuffers(1)
    VBOs.append(VBO_id)
    glBindBuffer(GL_ARRAY_BUFFER, VBO_id)
    buffer = StoreDataInFloatBuffer(data)
    glBufferData(GL_ARRAY_BUFFER, buffer, GL_STATIC_DRAW)
    glVertexAttribPointer(attrib_number, 3, GL_FLOAT, GL_FALSE, 0, None)
    glBindBuffer(GL_ARRAY_BUFFER, 0)

def unbindVao():
    glBindVertexArray(0)

def StoreDataInFloatBuffer(data: float):
    buffer = np.array(data, dtype=np.float32)
    return buffer
See the OpenGL 4.6 API Core Profile Specification - 10.3.1 Vertex Array Objects
void DeleteVertexArrays( sizei n, const uint *arrays );
See PyOpenGL - glDeleteVertexArrays:
Signature
glDeleteVertexArrays( GLsizei ( n ) , const GLuint *( arrays ) )-> void
def glDeleteVertexArrays( n , arrays )
The 2nd argument must be an array with the element type "uint":
def CleanUP():
    np_vaos = np.array(VAOs, dtype="uint")
    glDeleteVertexArrays(np_vaos.size, np_vaos)
In newer PyOpenGL versions, however, the second argument can also be a list:
def CleanUP():
    glDeleteVertexArrays(len(VAOs), VAOs)
When using LWJGL, the size argument (n) is deduced from the Java array object. Different libraries in different languages provide different overloads for the OpenGL API functions. If a function behaves unexpectedly and differs from the OpenGL specification, you must consult the API documentation for the libraries.
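For completeness, a hedged sketch of how the question's CleanUP could use this: one call per object type deletes every id at once, so the per-id loops go away (VAOs and VBOs are the module-level lists from the question's file):

def CleanUP():
    # n is the number of ids; the 2nd argument is the array of ids
    if VAOs:
        glDeleteVertexArrays(len(VAOs), np.array(VAOs, dtype="uint"))
    if VBOs:
        glDeleteBuffers(len(VBOs), np.array(VBOs, dtype="uint"))
    VAOs.clear()
    VBOs.clear()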

Parallel Python not able to take arguments properly

I have just started using Parallel Python (pp) in Python 3, and I am currently having trouble submitting an object's function with its arguments.
Is it possible that an argument cannot be a list? I could not find anyone with the same error message as me, so I am confused.
import pp, numpy

class myobj:
    def __init__(self):
        """some code"""
    def myfunc(self, data, n):
        return [data[numpy.random.randint(0, n)] for i in range(n)]

if __name__ == "__main__":
    ppservers = ()
    job_server = pp.Server(ppservers=ppservers)
    proc = myobj()
    data = [[1, 2, 3], [4, 5, 6]]
    N = 2
    results = []
    for i in range(10):
        f = job_server.submit(proc.myfunc, (data, N), modules=('numpy',))
        results.append(f)
    for f in results:
        val = f()
        print(val)
A fatal error has occured during the function execution
Traceback (most recent call last):
File "/anaconda3/envs/mvi/lib/python3.6/site-packages/ppft/__main__.py", line 94, in run
__fname, __fobjs = self.t.creceive(preprocess)
File "/anaconda3/envs/mvi/lib/python3.6/site-packages/ppft/transport.py", line 128, in creceive
self.rcache[hash1] = tuple(map(preprocess, (msg, )))[0]
File "/anaconda3/envs/mvi/lib/python3.6/site-packages/ppft/__main__.py", line 60, in preprocess
fobjs = [compile(fsource, '<string>', 'exec') for fsource in fsources]
File "/anaconda3/envs/mvi/lib/python3.6/site-packages/ppft/__main__.py", line 60, in <listcomp>
fobjs = [compile(fsource, '<string>', 'exec') for fsource in fsources]
File "<string>", line 1
gging(self, dataset, N):
^
SyntaxError: invalid syntax
None
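No answer was posted, but the traceback itself gives a hint: ppft ships a function to the workers by extracting and recompiling its source (see the compile(fsource, '<string>', 'exec') frames), and the mangled gging(self, dataset, N): line shows the method's source arrived truncated. A hedged workaround, assuming the goal is simply to run the sampling in parallel, is to submit a plain module-level function rather than a bound method:

import pp
import numpy

def myfunc(data, n):
    # module-level function, so ppft can extract its source cleanly
    return [data[numpy.random.randint(0, n)] for _ in range(n)]

if __name__ == "__main__":
    job_server = pp.Server(ppservers=())
    data = [[1, 2, 3], [4, 5, 6]]
    N = 2
    jobs = [job_server.submit(myfunc, (data, N), modules=('numpy',))
            for _ in range(10)]
    for job in jobs:
        print(job())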

How to assign a value to a sliced output signal?

I'm a beginner with MyHDL.
I am trying to translate the following Verilog code to MyHDL:
module ModuleA(data_in, data_out, clk);
    input data_in;
    output reg data_out;
    input clk;

    always @(posedge clk) begin
        data_out <= data_in;
    end
endmodule

module ModuleB(data_in, data_out, clk);
    input [1:0] data_in;
    output [1:0] data_out;
    input clk;

    ModuleA instance1(data_in[0], data_out[0], clk);
    ModuleA instance2(data_in[1], data_out[1], clk);
endmodule
Currently, I have this code:
import myhdl

@myhdl.block
def ModuleA(data_in, data_out, clk):
    @myhdl.always(clk.posedge)
    def logic():
        data_out.next = data_in
    return myhdl.instances()

@myhdl.block
def ModuleB(data_in, data_out, clk):
    instance1 = ModuleA(data_in(0), data_out(0), clk)
    instance2 = ModuleA(data_in(1), data_out(1), clk)
    return myhdl.instances()

# Create signals
data_in = myhdl.Signal(myhdl.intbv()[2:])
data_out = myhdl.Signal(myhdl.intbv()[2:])
clk = myhdl.Signal(bool())

# Instantiate the DUT
dut = ModuleB(data_in, data_out, clk)

# Convert the DUT to Verilog
dut.convert()
But it doesn't work, because slicing a signal produces a read-only shadow signal (cf. MEP-105).
So what is the proper way in MyHDL to get a writable slice of a signal?
Edit:
This is the error I get
$ python demo.py
Traceback (most recent call last):
File "demo.py", line 29, in <module>
dut.convert()
File "/home/killruana/.local/share/virtualenvs/myhdl_sandbox-dYpBu4o5/lib/python3.6/site-packages/myhdl-0.10-py3.6.egg/myhdl/_block.py", line 342, in convert
File "/home/killruana/.local/share/virtualenvs/myhdl_sandbox-dYpBu4o5/lib/python3.6/site-packages/myhdl-0.10-py3.6.egg/myhdl/conversion/_toVerilog.py", line 177, in __call__
File "/home/killruana/.local/share/virtualenvs/myhdl_sandbox-dYpBu4o5/lib/python3.6/site-packages/myhdl-0.10-py3.6.egg/myhdl/conversion/_analyze.py", line 170, in _analyzeGens
File "/usr/lib/python3.6/ast.py", line 253, in visit
return visitor(node)
File "/home/killruana/.local/share/virtualenvs/myhdl_sandbox-dYpBu4o5/lib/python3.6/site-packages/myhdl-0.10-py3.6.egg/myhdl/conversion/_analyze.py", line 1072, in visit_Module
File "/home/killruana/.local/share/virtualenvs/myhdl_sandbox-dYpBu4o5/lib/python3.6/site-packages/myhdl-0.10-py3.6.egg/myhdl/conversion/_misc.py", line 148, in raiseError
myhdl.ConversionError: in file demo.py, line 4:
Signal has multiple drivers: data_out
You can use an intermediate list of Signal(bool()) as a placeholder.
@myhdl.block
def ModuleB(data_in, data_out, clk):
    tsig = [myhdl.Signal(bool(0)) for _ in range(len(data_in))]
    instances = []
    for i in range(len(data_in)):
        instances.append(ModuleA(data_in(i), tsig[i], clk))

    @myhdl.always_comb
    def assign():
        for i in range(len(data_out)):
            data_out.next[i] = tsig[i]

    return myhdl.instances()
A quick (probably non-fulfilling) comment: the intbv is treated as a single entity that can't have multiple drivers. Two references that might help shed some light:
http://jandecaluwe.com/hdldesign/counting.html
http://docs.myhdl.org/en/stable/manual/structure.html#converting-between-lists-of-signals-and-bit-vectors
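Building on the second link, here is a sketch (my own, not tested against the converter) of the same structure using myhdl.ConcatSignal to build the bit vector from the list of single-bit signals:

@myhdl.block
def ModuleB(data_in, data_out, clk):
    tsig = [myhdl.Signal(bool(0)) for _ in range(len(data_in))]
    insts = [ModuleA(data_in(i), tsig[i], clk) for i in range(len(data_in))]

    # ConcatSignal is a read-only bit-vector view of the list; its first
    # argument becomes the MSB, hence the reversed() call.
    vec = myhdl.ConcatSignal(*reversed(tsig))

    @myhdl.always_comb
    def assign():
        data_out.next = vec

    return insts + [assign]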

Pass a function as argument to a process target with Pool.map()

I'm developing software to benchmark some Python scripts using different methods (single thread, multiple threads, multiple processes), so I need to execute the same function (with the same arguments, and so on) in different processes.
How do I pass the function to execute as an argument to a process target?
What I currently understand is that a plain reference to a function cannot work, because the referenced function is not visible to the other processes; that's why I tried a custom manager for the shared memory.
Here is a simplified version of the code:
#!/bin/python
from multiprocessing import Pool
from multiprocessing.managers import BaseManager
from itertools import repeat

class FunctionManager(BaseManager):
    pass

def maFunction(a, b):
    print(a + b)

def threadedFunction(f_i_args):
    (f, i, args) = f_i_args
    f(*args)

FunctionManager.register('Function', maFunction)
myManager = FunctionManager()
myManager.start()

myManager.Function(0, 0)  # Test 1
threadedFunction((maFunction, 0, (1, 1)))  # Test 2

p = Pool()
args = zip(repeat(myManager.Function), range(10), repeat(2, 2))
p.map(threadedFunction, args)  # Does not work
p.join()
myManager.shutdown()
The current pickling error at "p.map()" is the following:
2
0
Traceback (most recent call last):
File "./test.py", line 27, in <module>
p.map(threadedFunction, args) # Does not work
File "/usr/lib/python3.5/multiprocessing/pool.py", line 260, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/usr/lib/python3.5/multiprocessing/pool.py", line 608, in get
raise self._value
File "/usr/lib/python3.5/multiprocessing/pool.py", line 385, in _handle_tasks
put(task)
File "/usr/lib/python3.5/multiprocessing/connection.py", line 206, in send
self._send_bytes(ForkingPickler.dumps(obj))
File "/usr/lib/python3.5/multiprocessing/reduction.py", line 50, in dumps
cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <class 'weakref'>: attribute lookup weakref on builtins failed
I got a slightly different error from running your code. Your key problem, I think, is that you pass a function to FunctionManager.register() instead of a class. I also had to remove your zip and create the list manually to make it work, but you can probably fix that; this is just an example.
The following code works and does something, using your exact structure. I would do this a bit differently and not use BaseManager, but I assume you have your reasons.
#!/usr/bin/python3.5
from multiprocessing import Pool
from multiprocessing.managers import BaseManager
from itertools import repeat

class FunctionManager(BaseManager):
    pass

class maClass(object):
    def __init__(self):
        pass
    def maFunction(self, a, b):
        print(a + b)

def threadedFunction(f_i_args):
    (f, i, args) = f_i_args
    f(*args)

FunctionManager.register('Foobar', maClass)
myManager = FunctionManager()
myManager.start()
foobar = myManager.Foobar()

foobar.maFunction(0, 0)  # Test 1
threadedFunction((foobar.maFunction, 0, (1, 1)))  # Test 2

p = Pool()
# args = list(zip(repeat(foobar.maFunction), range(10), repeat(2, 2)))
args = []
for i in range(10):
    args.append([foobar.maFunction, i, (i, 2)])
p.map(threadedFunction, args)  # Does now work
p.close()
p.join()
myManager.shutdown()
Or did I misunderstand your problem completely?
Hannu
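For what it's worth, a minimal sketch of the "do this a bit differently" route: skip the manager entirely. Module-level functions pickle by qualified name, so they can go straight into the argument tuples for Pool.map (this is my own simplification, not Hannu's code):

from multiprocessing import Pool

def maFunction(a, b):
    print(a + b)

def threadedFunction(f_i_args):
    # unpack (function, index, argument tuple) and call the function
    f, i, args = f_i_args
    f(*args)

if __name__ == "__main__":
    # A plain module-level function reference survives pickling;
    # a manager proxy's bound method does not, as the weakref error shows.
    args = [(maFunction, i, (i, 2)) for i in range(10)]
    with Pool() as p:
        p.map(threadedFunction, args)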

CUDA out of resources error when running Python NumbaPro

I am trying to run a CUDA kernel in NumbaPro Python, but I keep getting an out of resources error.
I then tried to execute the kernel in a loop and send smaller arrays, but that still gave me the same error.
Here is my error message:
Traceback (most recent call last):
File "./predict.py", line 418, in <module>
predict[griddim, blockdim, stream](callResult_d, catCount, numWords, counts_d, indptr_d, indices_d, probtcArray_d, priorC_d)
File "/home/mhagen/Developer/anaconda/lib/python2.7/site-packages/numba/cuda/compiler.py", line 228, in __call__
sharedmem=self.sharedmem)
File "/home/mhagen/Developer/anaconda/lib/python2.7/site-packages/numba/cuda/compiler.py", line 268, in _kernel_call
cu_func(*args)
File "/home/mhagen/Developer/anaconda/lib/python2.7/site-packages/numba/cuda/cudadrv/driver.py", line 1044, in __call__
self.sharedmem, streamhandle, args)
File "/home/mhagen/Developer/anaconda/lib/python2.7/site-packages/numba/cuda/cudadrv/driver.py", line 1088, in launch_kernel
None)
File "/home/mhagen/Developer/anaconda/lib/python2.7/site-packages/numba/cuda/cudadrv/driver.py", line 215, in safe_cuda_api_call
self._check_error(fname, retcode)
File "/home/mhagen/Developer/anaconda/lib/python2.7/site-packages/numba/cuda/cudadrv/driver.py", line 245, in _check_error
raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: Call to cuLaunchKernel results in CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES
Here is my source code:
import math
import time
import numpy
import numpy as np

from numbapro.cudalib import cusparse
from numba import *
from numbapro import cuda

@cuda.jit(argtypes=(double[:], int64, int64, double[:], int64[:], int64[:], double[:,:], double[:]))
def predict(callResult, catCount, wordCount, counts, indptr, indices, probtcArray, priorC):
    i = cuda.threadIdx.x + cuda.blockIdx.x * cuda.blockDim.x
    correct = 0
    wrong = 0
    lastDocIndex = -1
    maxProb = -1e6
    picked = -1
    for cat in range(catCount):
        probSum = 0.0
        for j in range(indptr[i], indptr[i+1]):
            wordIndex = indices[j]
            probSum += (counts[j] * math.log(probtcArray[cat, wordIndex]))
        probSum += math.log(priorC[cat])
        if probSum > maxProb:
            maxProb = probSum
            picked = cat
    callResult[i] = picked

predictions = []
counter = 1000

for i in range(int(math.ceil(numDocs / (counter * 1.0)))):
    docTestSliceList = docTestList[i*counter:(i+1)*counter]
    numDocsSlice = len(docTestSliceList)
    docTestArray = np.zeros((numDocsSlice, numWords))
    for j, doc in enumerate(docTestSliceList):
        for ind in doc:
            docTestArray[j, ind['term']] = ind['count']
    docTestArraySparse = cusparse.ss.csr_matrix(docTestArray)

    start = time.time()
    OPT_N = numDocsSlice
    blockdim = 1024, 1
    griddim = int(math.ceil(float(OPT_N) / blockdim[0])), 1
    catCount = len(music_categories)
    callResult = np.zeros(numDocsSlice)
    stream = cuda.stream()
    with stream.auto_synchronize():
        probtcArray_d = cuda.to_device(numpy.asarray(probtcArray), stream)
        priorC_d = cuda.to_device(numpy.asarray(priorC), stream)
        callResult_d = cuda.to_device(callResult, stream)
        counts_d = cuda.to_device(docTestArraySparse.data, stream)
        indptr_d = cuda.to_device(docTestArraySparse.indptr, stream)
        indices_d = cuda.to_device(docTestArraySparse.indices, stream)
        predict[griddim, blockdim, stream](callResult_d, catCount, numWords, counts_d, indptr_d, indices_d, probtcArray_d, priorC_d)
        callResult_d.to_host(stream)
    # stream.synchronize()
    predictions += list(callResult)
    print "prediction %d: %f" % (i, time.time() - start)
I found out the problem was in the CUDA procedure.
When you call predict, the blockdim is set to 1024.
predict[griddim, blockdim, stream](callResult_d, catCount, numWords, counts_d, indptr_d, indices_d, probtcArray_d, priorC_d)
But the procedure is called iteratively with slice sizes of 1000 elements, not 1024.
So, in the procedure, it will attempt to write 24 elements that are out of bounds in the return array.
Sending a number-of-elements parameter (n_el) and placing a bounds check in the CUDA procedure solves it.
@cuda.jit(argtypes=(double[:], int64, int64, int64, double[:], int64[:], int64[:], double[:,:], double[:]))
def predict(callResult, n_el, catCount, wordCount, counts, indptr, indices, probtcArray, priorC):
    i = cuda.threadIdx.x + cuda.blockIdx.x * cuda.blockDim.x
    if i < n_el:
        ....
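The same guard pattern as a self-contained sketch against the current numba.cuda API (NumbaPro was later folded into Numba; the kernel and names below are illustrative, not the asker's):

import math
import numpy as np
from numba import cuda

@cuda.jit
def fill(out, n_el, value):
    i = cuda.grid(1)   # absolute 1-D thread index
    if i < n_el:       # guard: the grid is rounded up past n_el
        out[i] = value

n_el = 1000                            # slice size, as in the question
blockdim = 1024
griddim = math.ceil(n_el / blockdim)   # one block of 1024 threads
out = np.zeros(n_el, dtype=np.float64)
fill[griddim, blockdim](out, n_el, 7.0)
assert (out == 7.0).all()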
