Tensorflow crash with CUDNN_STATUS_ALLOC_FAILED - python

Been searching the web for hours with no results, so figured I'd ask here.
I'm trying to make a self driving car following Sentdex's tutorial, but when running the model, I get a bunch of fatal errors. I've searched all over the internet for the solution, and many seem to have the same problem. However, none of the solutions I've found (Including this Stack-post), work for me.
Here is my software:
Tensorflow: 1.5, GPU version
CUDA: 9.0, with the patch
CUDnn: 7
Windows 10 Pro
Python 3.6
Hardware:
Nvidia 1070ti, with latest drivers
Intel i5 7600K
Here is the crash log:
2018-02-04 16:29:33.606903: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_blas.cc:444] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2018-02-04 16:29:33.608872: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_blas.cc:444] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2018-02-04 16:29:33.609308: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_blas.cc:444] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2018-02-04 16:29:35.145249: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
2018-02-04 16:29:35.145563: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2018-02-04 16:29:35.149896: F C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\kernels\conv_ops.cc:717] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms)
Here's my code:
import tensorflow as tf
import numpy as np
import cv2
import time
from PIL import ImageGrab
from getkeys import key_check
from alexnet import alexnet
import os
from sendKeys import PressKey, ReleaseKey, W,A,S,D,Sp
import random
WIDTH = 80
HEIGHT = 60
LR = 1e-3
EPOCHS = 10
MODEL_NAME = 'DiRT-AI-Driver-{}-{}-{}-epochs.model'.format(LR, 'alexnetv2', EPOCHS)
def straight():
PressKey(W)
ReleaseKey(A)
ReleaseKey(S)
ReleaseKey(D)
ReleaseKey(Sp)
def left():
PressKey(A)
ReleaseKey(W)
ReleaseKey(S)
ReleaseKey(D)
ReleaseKey(Sp)
def right():
PressKey(D)
ReleaseKey(A)
ReleaseKey(S)
ReleaseKey(W)
ReleaseKey(Sp)
def brake():
PressKey(S)
ReleaseKey(A)
ReleaseKey(W)
ReleaseKey(D)
ReleaseKey(Sp)
def handbrake():
PressKey(Sp)
ReleaseKey(A)
ReleaseKey(S)
ReleaseKey(D)
ReleaseKey(W)
model = alexnet(WIDTH, HEIGHT, LR)
model.load(MODEL_NAME)
def main():
last_time = time.time()
for i in list(range(4))[::-1]:
print(i+1)
time.sleep(1)
paused = False
while(True):
if not paused:
screen = np.array(ImageGrab.grab(bbox=(0,40,1024,768)))
screen = cv2.cvtColor(screen,cv2.COLOR_BGR2GRAY)
screen = cv2.resize(screen,(80,60))
print('Loop took {} seconds'.format(time.time()-last_time))
last_time = time.time()
print('took time')
prediction = model.predict([screen.reshape(WIDTH,HEIGHT,1)])[0]
print('predicted')
moves = list(np.around(prediction))
print('got moves')
print(moves,prediction)
if moves == [1,0,0,0,0]:
straight()
elif moves == [0,1,0,0,0]:
left()
elif moves == [0,0,1,0,0]:
brake()
elif moves == [0,0,0,1,0]:
right()
elif moves == [0,0,0,0,1]:
handbrake()
keys = key_check()
if 'T' in keys:
if paused:
pased = False
time.sleep(1)
else:
paused = True
ReleaseKey(W)
ReleaseKey(A)
ReleaseKey(S)
ReleaseKey(D)
ReleaseKey(Sp)
time.sleep(1)
main()
I've found that the line that crashes python and spawns the first three bugs is this line:
prediction = model.predict([screen.reshape(WIDTH,HEIGHT,1)])[0]
When running the code, the CPU goes up to a whopping 100%, suggesting that something is seriously off. GPU goes to about 40-50%
I've tried Tensorflow 1.2 and 1.3, as well as CUDA 8, to no good. When installing CUDA I do not install the specific drivers, since they are too old for my GPU. Tried different CUDnn's too, did no good.

In my case, the issue happened because another python console with tensorflow imported was running. Closing it solved the problem.
I have Windows 10, the main errors were :
failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED

Probably you're running out of GPU memory.
If you're using TensorFlow 1.x:
1st option) set allow_growth to true.
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth=True
sess = tf.Session(config=config)
2nd option) set memory fraction.
# change the memory fraction as you want
import tensorflow as tf
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.3)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
If you're using TensorFlow 2.x:
1st option) set set_memory_growth to true.
# Currently the ‘memory growth’ option should be the same for all GPUs.
# You should set the ‘memory growth’ option before initializing GPUs.
import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
try:
for gpu in gpus:
tf.config.experimental.set_memory_growth(gpu, True)
except RuntimeError as e:
print(e)
2nd option) set memory_limit as you want.
Just change the index of gpus and memory_limit in this code below.
import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
try:
tf.config.experimental.set_virtual_device_configuration(gpus[0], [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])
except RuntimeError as e:
print(e)

Try to set:
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true' solved my problem
my environment:
Cudnn 7.6.5
Tensorflow 2.4
Cuda Toolkit 10.1
RTX 2060

Try to add the cuda path to environment variable. It's seems that the problem it's with cuda.
Set the CUDA Path in ~/.bashrc (edit with nano):
#Cuda Nvidia path
$ export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"
$ export CUDA_HOME=/usr/local/cuda

I encountered the same problem, then I found out that because I'm also using GPU for run other stuffs even it doesn't show on task manager (windows) using GPU. Maybe even things like (rendering videos, video encoding or play heavy workload game, coin mining...).
If you think it's still using heavy GPU, just close it off and problem solve.

I had an almost identical problem. Fixed it by reinstalling tensorflow-gpu.
conda uninstall tensorflow-gpu
conda install tensorflow-gpu
I think pip should work as well.

Related

Could not load dynamic library 'libcudart.so.11.0' ? / failed call to cuInit: UNKNOWN ERROR (303)?

I am a beginner TensorFlow user and am running into the following issue when attempting to load an already saved model for segmentation on a test images.
i installed all the libraries on a virtual environment that i created.
The same code runs on google colab and now i am trying to run it on my machine.
My Environment
Ubunutu 16 /
tensorflow 2.5.0
My code
When running the code :
import os
from glob import glob
from tqdm import tqdm
import cv2
import tensorflow as tf
import nibabel as nib
import numpy as np
import matplotlib.pyplot as plt
test_images = sorted(glob("/mnt/DATA2To/projet/all/Souris/SOD/data/20210726_C321/EXVIVO/IRM/test-20210726_C321/*")) # images 160x120x1
i = 0 # iterator initialized to zero
model = tf.keras.models.load_model("/mnt/DATA2To/projet/all/Souris/SOD/segmentation/segmentation-moelle.h5", compile= False )
`
for path in tqdm(test_images, total=len(test_images)):
x = nib.load(path) # load the images (160x1x120)
new_header = header=x.header.copy() # copy the header to a variable for writing the results at the end
x = nib.load(path).get_data() # get the data from the image loaded
original_image = x
original_image_bis = original_image.transpose((0,2,1))
h, w, _ = original_image_bis.shape
original_image_bis = cv2.resize(original_image_bis, (w, h))
x = x.transpose((0,2,1)) # permute the image axes to (160x120x1)
x = cv2.resize(x, (128, 128)) # resize the image to have a shape of (128x128)
x = (x - x.min()) / (x.max() - x.min()) # do the min-max normalisation
x.shape= x.shape + (1,) # add the third axes (128x128x1)
x = x.astype(np.float32)
x1 = np.expand_dims(x, axis=0)
pred_mask = model.predict(x1)[0]
#pred_mask = (np.where(pred_mask > np.mean(pred_mask), 1,0))
pred_mask = pred_mask.astype(np.float32)
pred_mask1 = cv2.resize(pred_mask, (w, h))
pred_mask1 = (np.where(pred_mask1 > 0.92, 1,0))
pred_mask1.shape= pred_mask1.shape + (1,) # add the third axes (160x120x1)
pred_mask1 = pred_mask1.transpose((0,2,1)) #permute the image axes to (160x1x120)
Sform= new_header.get_base_affine()
pred_mask2 = nib.Nifti1Image(pred_mask1,None, header= new_header)
fname= "/mnt/DATA2To/projet/all/Souris/SOD/data/20210726_C321/EXVIVO/IRM/results-moelle/image%04d.nii" %i
nib.save(pred_mask2, fname)
i+=1
My Error
I am greeted with this error :
(venv) etudiant#PTT:~$ python3 '/home/etudiant/Documents/code/Segmentation.py'
2021-07-28 09:58:12.539200: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /mnt/software//mrtrix/lib::/opt/minc/lib:/opt/minc/lib/InsightToolkit
2021-07-28 09:58:12.539221: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-07-28 09:58:13.429146: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /mnt/software//mrtrix/lib::/opt/minc/lib:/opt/minc/lib/InsightToolkit
2021-07-28 09:58:13.429164: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
2021-07-28 09:58:13.429179: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (PTT): /proc/driver/nvidia/version does not exist
2021-07-28 09:58:13.429322: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
0it [00:00, ?it/s]
Can anyone would tell me what is wrong and how to fix that!?
W stands for "Warning" and I stands for "Information".
There are no problems with your code, TF just tells you it did not find the libraries required for GPU computation; this does not mean that TensorFlow does not run successfully on CPU.
What you can do instead to avoid receiving such messages in the future is to suppress the warnings.
Solution 1:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import tensorflow as tf
Level 2 means to ignore warning and information, and print only the error.
Solution 2:
import tensorflow as tf
tf.logging.set_verbosity(tf.logging.ERROR)
Solution 3:
import logging
tf.get_logger().setLevel(logging.ERROR)

Getting Memory usage of a tensorflow model with guppy.hpy() not working

I am loading a saved model of tensorflow (.pb file) and trying to evaluate how much memory it allocates for the model with guppy package. Following a simple tutorial, here is what i tried:
from guppy import hpy
import tensorflow as tf
heap = hpy()
print("Heap Status at starting: ")
heap_status1 = heap.heap()
print("Heap Size : ", heap_status1.size, " bytes\n")
print(heap_status1)
heap.setref()
print("\nHeap Status after setting reference point: ")
heap_status2 = heap.heap()
print("Heap size: ", heap_status2.size, " bytes\n")
print(heap_status2)
model_path = "./saved_model/" #.pb file directory
model = tf.saved_model.load(model_path)
print("\nHeap status after creating model: ")
heap_status3 = heap.heap()
print("Heap size: ", heap_status3.size, " bytes\n")
print(heap_status3)
print("Memory used by the model: ", heap_status3.size - heap_status2.size)
I don't know why, but when i run the code it suddenly stops executing when i call heap_status1 = heap.heap(). It doesn't throw any error.
This same code runs fine when i don't use anything related to tensorflow, i.e. it runs successfully when i just create some random lists, strings, etc instead of loading a tensorflow model.
Note: my model will run in a CPU device. Unfortunately, tf.config.experimental.get_memory_info works with GPUs only.
If you are on Windows, the crash may be related to https://github.com/zhuyifei1999/guppy3/issues/25. Check pywin32 version and if it is < 300, upgrade pywin32 with
pip install -U pywin32

setUpNet DNN module was not built with CUDA backend; switching to CPU

I want to run my script python with GPU as u see in this photo
I used the command line: watch nvidia-smi,to show Processes of GPU, unfortunately the script python use just 41Mib of GPU capacity:
this is a part of my code :
import time
import math
import cv2
import numpy as np
labelsPath = "./coco.names"
LABELS = open(labelsPath).read().strip().split("\n")
np.random.seed(42)
weightsPath = "./yolov3.weights"
configPath = "./yolov3.cfg"
net = cv2.dnn.readNetFromDarknet(configPath, weightsPath)
ln = net.getLayerNames()
ln = [ln[i[0] - 1] for i in net.getUnconnectedOutLayers()]
FR=0
vs = cv2.VideoCapture(vid_path)
# vs = cv2.VideoCapture(0) ## USe this if you want to use webcam feed
writer = None
(W, H) = (None, None)
fl = 0
q = 0
while True:
(grabbed, frame) = vs.read()
if not grabbed:
break
if W is None or H is None:
(H, W) = frame.shape[:2]
FW=W
if(W<1075):
FW = 1075
FR = np.zeros((H+210,FW,3), np.uint8)
col = (255,255,255)
FH = H + 210
FR[:] = col
blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
swapRB=True, crop=False)
net.setInput(blob)
start = time.time()
layerOutputs = net.forward(ln)
end = time.time()
I tried to add this command line to force run with GPU ,
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
then after running the script again it gives me this message and continue running the script with CPU :
[ WARN:0] global /io/opencv/modules/dnn/src/dnn.cpp (1363) setUpNet DNN module was not built with CUDA backend; switching to CPU
You'll need to manually build OpenCV to work with your GPU.
Here is a great tutorial on how to do so.
You might have to uninstall your opencv-python package using pip in case you are already having one, only then will the custom built opencv be accessible to the program.
pip3 uninstall opencv-python
Compatibility chart of cuda and cudnn:
https://docs.nvidia.com/deeplearning/cudnn/support-matrix/index.html#cudnn-cuda-hardware-versions
Checking the computation capability version from:
https://en.wikipedia.org/wiki/CUDA
Which is 7.5
In GPU supported, for 7.5 computation capability, CUDA SDK 11.0 – 11.2 support for compute capability 3.5 – 8.6 (Kepler (in part), Maxwell, Pascal, Volta, Turing, Ampere):
check for your Supported NVIDIA Hardware.
In my case, I was using Tesla T4 having Turing, which is compatible with cuDNN.
so in compilation report, you can see that Cmake returns cuDNN availability as "NO":
Got the docker Image Using:
sudo docker nvidia/cuda:11.1-cudnn8-runtime-ubuntu18.04
Compiled Opencv Cuda from:
https://www.pyimagesearch.com/2020/02/03/how-to-use-opencvs-dnn-module-with-nvidia-gpus-cuda-and-cudnn/

Tensorflow Read from HDFS mac : java.lang.NoSuchFieldError: LOG

I am trying to read from external hadoop from tensorflow on my mac. I have built tf with hadoop support from source, and also build hadoop with native library support on my mac. I am getting the following error ,
hdfsBuilderConnect(forceNewInstance=0, nn=192.168.60.53:9000, port=0, kerbTicketCachePath=(NULL), userName=(NULL)) error:
java.lang.NoSuchFieldError: LOG
at org.apache.hadoop.ipc.ClientCache.getClient(ClientCache.java:62)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.<init>(ProtobufRpcEngine.java:145)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.<init>(ProtobufRpcEngine.java:133)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.<init>(ProtobufRpcEngine.java:119)
at org.apache.hadoop.ipc.ProtobufRpcEngine.getProxy(ProtobufRpcEngine.java:102)
at org.apache.hadoop.ipc.RPC.getProtocolProxy(RPC.java:579)
at org.apache.hadoop.hdfs.NameNodeProxies.createNNProxyWithClientProtocol(NameNodeProxies.java:418)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:314)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.FileSystem$1.run(FileSystem.java:162)
at org.apache.hadoop.fs.FileSystem$1.run(FileSystem.java:159)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:159)
2018-10-05 16:01:21.867554: W tensorflow/core/kernels/queue_base.cc:277] _0_input_producer: Skipping cancelled enqueue attempt with queue not closed
Traceback (most recent call last):
This is my code:
import tensorflow as tf
def create_file_reader_ops(filename_queue):
reader = tf.TextLineReader(skip_header_lines=1)
_, csv_row = reader.read(filename_queue)
record_defaults = [[""], [""], [0], [0]]
country, code, gold, silver = tf.decode_csv(csv_row, record_defaults=record_defaults)
features = tf.stack([gold, silver])
return features, country
filename_queue = tf.train.string_input_producer([
"hdfs://192.168.60.53:9000/iris_data_multiclass.csv",
])
example, country = create_file_reader_ops(filename_queue)
with tf.Session() as sess:
tf.global_variables_initializer().run()
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(coord=coord)
while True:
try:
example_data, country_name = sess.run([example, country])
print(example_data, country_name)
except tf.errors.OutOfRangeError:
break
I have build hadoop from source on mac.
$ hadoop version
Hadoop 2.7.3
Subversion https://github.com/apache/hadoop.git -r baa91f7c6bc9cb92be5982de4719c1c8af91ccff
Compiled by himaprasoon on 2018-10-04T11:09Z
Compiled with protoc 2.5.0
From source with checksum 2e4ce5f957ea4db193bce3734ff29ff4
This command was run using /Users/himaprasoon/git/hadoop/hadoop-dist/target/hadoop-2.7.3/share/hadoop/common/hadoop-common-2.7.3.jar
hadoop checknative output
$ hadoop checknative
18/10/05 16:15:05 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library libbz2.dylib
18/10/05 16:15:05 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop: true /Users/himaprasoon/git/hadoop/hadoop-dist/target/hadoop-2.7.3/lib/native/libhadoop.dylib
zlib: true /usr/lib/libz.1.dylib
snappy: true /usr/local/lib/libsnappy.1.dylib
lz4: true revision:99
bzip2: true /usr/lib/libbz2.1.0.dylib
openssl: true /usr/local/lib/libcrypto.dylib
tf version : 1.10.1
Any ideas what I might be doing wrong?
here are my environment variables.
HADOOP_HOME=/Users/himaprasoon/git/hadoop/hadoop-dist/target/hadoop-2.7.3/
HADOOP_MAPRED_HOME=$HADOOP_HOME
HADOOP_COMMON_HOME=$HADOOP_HOME
HADOOP_HDFS_HOME=$HADOOP_HOME
YARN_HOME=$HADOOP_HOME
HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
HADOOP_INSTALL=$HADOOP_HOME
OPENSSL_ROOT_DIR="/usr/local/opt/openssl"
LDFLAGS="-L${OPENSSL_ROOT_DIR}/lib"
CPPFLAGS="-I${OPENSSL_ROOT_DIR}/include"
PKG_CONFIG_PATH="${OPENSSL_ROOT_DIR}/lib/pkgconfig"
OPENSSL_INCLUDE_DIR="${OPENSSL_ROOT_DIR}/include"
PATH="/usr/local/opt/protobuf#2.5/bin:$PATH
HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib/native"
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${HADOOP_HOME}/lib/native
JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:${HADOOP_HOME}/lib/native
this is how I am running my program
CLASSPATH=$($HADOOP_HDFS_HOME/bin/hdfs classpath --glob) python3.6 myfile.py
references used to build tf and hadoop
Hadoop native libraries not found on OS/X
https://medium.com/#s.matthew.english/build-hadoop-from-source-on-macos-a3fb2b958b6c
Can Tensorflow read from HDFS on Mac?
https://gist.github.com/zedar/f631ace0759c1d512573
Have you read this post?
Tensorflow Enqueue operation was cancelled
It seems there is a workaround for the same error message there:
The problem happens at the very last stage when python tries to kill threads.
To do this properly you should create a train.Coordinator and pass it to your
queue_runner (no need to pass sess, as default session will be used>
with tf.Session() as sess:
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(coord=coord)
// do your things
coord.request_stop()
coord.join(threads)
The last two lines should be added to your while loop to make sure all threads are properly killed.

CompilerWarning with OpenCL

Woke up today and all of a sudden get
C:\Python27\lib\site-packages\pyopencl\__init__.py:61: CompilerWarning: Non-empty compiler output encountered. Set the environment variable PYOPENCL_COMPILER_OUTPUT=1 to see more.
"to see more.", CompilerWarning)
C:\Python27\lib\site-packages\pyopencl\cache.py:101: UserWarning: could not obtain cache lock--delete 'c:\users\User\appdata\local\temp\pyopencl-compiler-cache-v2-uiduser-py2.7.3.final.0\lock' if necessary
% self.lock_file)
When I ran any sort of PqOpenCL code, ex:
import numpy
import pyopencl as cl
import pyopencl.array as clarray
from pyopencl.reduction import ReductionKernel
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
krnl = ReductionKernel(ctx, numpy.float32, neutral="0",
reduce_expr="a+b", map_expr="x[i]*y[i]",
arguments="__global float *x, __global float *y")
x = clarray.arange(queue, 400, dtype=numpy.float32)
y = clarray.arange(queue, 400, dtype=numpy.float32)
m = krnl(x, y).get()
Sample and part of the solution came from here
Solution suggested rolling back numpy, which I did from 1.8.0 to 1.7.2 but still same problem
Edit 1
Added as per suggestion
import os
os.environ['PYOPENCL_COMPILER_OUTPUT'] = '1'
C:\Python27\lib\site-packages\pyopencl\__init__.py:57: CompilerWarning: From-source build succeeded, but resulted in non-empty logs:
Build on <pyopencl.Device 'Intel(R) HD Graphics 4000' on 'Intel(R) OpenCL' at 0x51eadff0> succeeded, but said:
fcl build 1 succeeded.
fcl build 2 succeeded.
bcl build succeeded.
warn(text, CompilerWarning)
import os
os.environ['PYOPENCL_COMPILER_OUTPUT'] = '1'
Do this to see the compiler output, i've gotten the same message before. It was just the intel opencl compiler saying it had vectorized\optimized the opencl kernel.

Categories