Blas GEMV launch failed: m=3, n=2 [Op:MatMul]

When I run the following code, I get the error:
E tensorflow/stream_executor/cuda/] failed to run cuBLAS routine cublasSgemv_v2: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
File "", line 16, in <module>
File "/home/cxsbg/anaconda3/envs/dl36/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/engine/", line 314, in __call__
output = super(Layer, self).__call__(inputs, *args, **kwargs)
File "/home/cxsbg/anaconda3/envs/dl36/lib/python3.6/site-packages/tensorflow/python/layers/", line 717, in __call__
outputs =, *args, **kwargs)
File "", line 10, in call
File "/home/cxsbg/anaconda3/envs/dl36/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/engine/", line 314, in __call__
output = super(Layer, self).__call__(inputs, *args, **kwargs)
File "/home/cxsbg/anaconda3/envs/dl36/lib/python3.6/site-packages/tensorflow/python/layers/", line 717, in __call__
outputs =, *args, **kwargs)
File "/home/cxsbg/anaconda3/envs/dl36/lib/python3.6/site-packages/tensorflow/python/layers/", line 163, in call
outputs = gen_math_ops.mat_mul(inputs, self.kernel)
File "/home/cxsbg/anaconda3/envs/dl36/lib/python3.6/site-packages/tensorflow/python/ops/", line 4305, in mat_mul
_six.raise_from(_core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InternalError: Blas GEMV launch failed: m=3, n=2 [Op:MatMul]
My GPU is an RTX 2080 and the driver is v410. CUDA is v9.0 and cuDNN is v7. tensorflow-gpu is v1.8 (I tried both v1.8 and v1.12). Python is v3.6 (I tried both v3.6 and v2.7). The system is Ubuntu 16.04 (I also tried Windows 10).
The problem always occurs with tensorflow-gpu, but the same code works with the CPU-only tensorflow package.
Here is the code (a simple linear model):
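import tensorflow as tf
tf.enable_eager_execution()

class Linear(tf.keras.Model):
    def __init__(self):
        pass  # layer definitions not shown in the post
    def call(self, input):
        pass  # forward pass that computes output not shown in the post
        return output

for i in range(1000):
    with tf.GradientTape() as tape:
        pass  # loss computation not shown in the post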

After testing many times, I think the error is triggered by tf.enable_eager_execution(). Thanks to the author of the post "which-version-of-cuda-can-work-with-rtx-2080": when I switched to CUDA 9.2, the error was fixed.
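A quick way to confirm that the failure lies in the GPU/cuBLAS path rather than in the model code is to run the same kind of small matrix product with the op pinned to the CPU and, if a GPU is visible, again on the GPU. This is a minimal sketch assuming TensorFlow 2.x, where eager execution is on by default; on TensorFlow 1.x you would call tf.enable_eager_execution() first, and tf.config.list_physical_devices may not exist in older 1.x releases (tf.test.is_gpu_available() is the older check). The tensors x and w are illustrative values, not taken from the post.

```python
import numpy as np
import tensorflow as tf

# Assumes TensorFlow 2.x (eager by default).
# A (1, 3) x (3, 1) product: the kind of small matmul a Dense layer performs.
x = tf.constant([[1.0, 2.0, 3.0]])
w = tf.constant([[0.5], [0.5], [0.5]])

# Pin the op to the CPU: this path does not go through cuBLAS.
with tf.device("/CPU:0"):
    cpu_result = tf.matmul(x, w)

gpu_result = None
if tf.config.list_physical_devices("GPU"):
    # On a broken CUDA/cuBLAS setup, this is where an InternalError would surface.
    with tf.device("/GPU:0"):
        gpu_result = tf.matmul(x, w)

print("CPU result:", cpu_result.numpy())
if gpu_result is not None:
    print("GPU matches CPU:", np.allclose(cpu_result.numpy(), gpu_result.numpy()))
```

If the CPU result prints but the GPU branch raises an InternalError like the one above, the model code is probably fine and the CUDA/driver combination is the likely culprit.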


TensorFlow inference from a SavedModel: Expecting int64_t value for attr strides, got numpy.int32

I'm trying to use a pre-trained TensorFlow model to classify an image.
I downloaded the EfficientNet model from TensorFlow Hub.
The Python code loads the model from the .pb file.
It then loads a sample image, resizes it to 224x224, scales the RGB values to [0, 1], and adds a batch dimension to make it 4-D (a batch of images), as the model expects.
col_x is then used for inference; the final input shape given to the model is (1, 224, 224, 3).
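import os
import numpy as np
import tensorflow as tf
from tensorflow import keras
from PIL import Image
from matplotlib import pyplot as plt
import matplotlib.image as mpimg

# Load the SavedModel directory downloaded from TensorFlow Hub
path = os.path.join(os.getcwd(), 'efficientnet')
model = keras.models.load_model(path)

# Load the image and resize it to the model's 224x224 input size
# (Image.LANCZOS is the filter formerly aliased as Image.ANTIALIAS)
img = Image.open("data/zebra.jpg")
img = img.resize((224, 224), Image.LANCZOS)
x = tf.keras.preprocessing.image.img_to_array(img)
norm_x = x / 255                  # scale RGB values to [0, 1]
col_x = norm_x[np.newaxis, ...]   # add batch dimension -> (1, 224, 224, 3)
output = model(col_x)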
But I get this error:
Traceback (most recent call last):
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\", line 926, in conv2d
"dilations", dilations)
tensorflow.python.eager.core._FallbackException: Expecting int64_t value for attr strides, got numpy.int32
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<pyshell>", line 1, in <module>
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\keras\engine\", line 968, in __call__
outputs =, *args, **kwargs)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\keras\engine\", line 719, in call
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\keras\engine\", line 888, in _run_internal_graph
output_tensors = layer(computed_tensors, **kwargs)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\keras\engine\", line 968, in __call__
outputs =, *args, **kwargs)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\keras\layers\", line 207, in call
outputs = self._convolution_op(inputs, self.kernel)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\", line 1106, in __call__
return self.conv_op(inp, filter)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\", line 638, in __call__
return, filter)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\", line 237, in __call__
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\", line 2014, in conv2d
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\", line 933, in conv2d
data_format=data_format, dilations=dilations, name=name, ctx=_ctx)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\", line 1022, in conv2d_eager_fallback
ctx=ctx, name=name)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\eager\", line 60, in quick_execute
inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [Op:Conv2D]
I was able to reproduce your error. Please change the last line, model(col_x), to model.predict(col_x), or to evaluate or whichever method you need for inference.
Calling model(col_x) sends the image directly to the model object instead of going through predict.
As for the other error (cuDNN failed to initialize), I don't think your system is using the GPU properly; please install compatible versions of TensorFlow and CUDA.
See the answers to "Which TensorFlow and CUDA version combinations are compatible?" for how to fix this.
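A self-contained sketch of the suggested change, assuming TensorFlow 2.x with tf.keras: the preprocessing mirrors the steps in the question (scale to [0, 1], add a batch axis), and inference goes through model.predict. The tiny Sequential model and the random pixel array are hypothetical stand-ins for the EfficientNet SavedModel and data/zebra.jpg, so the prediction values are meaningless; only the shapes and the call pattern carry over.

```python
import numpy as np
from tensorflow import keras

# Hypothetical stand-in for the EfficientNet model:
# any Keras model taking (batch, 224, 224, 3) float inputs.
model = keras.Sequential([
    keras.Input(shape=(224, 224, 3)),
    keras.layers.Conv2D(8, kernel_size=3, strides=2, activation="relu"),
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(5, activation="softmax"),
])

# Stand-in for the resized image: random 8-bit RGB pixel values.
pixels = np.random.randint(0, 256, size=(224, 224, 3)).astype("float32")
norm_x = pixels / 255.0           # scale RGB values to [0, 1]
col_x = norm_x[np.newaxis, ...]   # add batch dimension -> (1, 224, 224, 3)

# Batch inference through predict instead of calling model(col_x) directly.
predictions = model.predict(col_x, verbose=0)
top_class = int(np.argmax(predictions[0]))
print("Predicted class index:", top_class)
```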

How to properly run my project using GPU?

I am new to PyTorch and machine learning, and I am trying to run my project on the GPU. I made the following change to my code:
model = Challenge()
model = model.to(torch.device('cuda'))
However, I still get the following error:
Traceback (most recent call last):
File "C:/Users/ruidong/Desktop/YZR temp/Project2/", line 112, in <module>
File "C:/Users/ruidong/Desktop/YZR temp/Project2/", line 91, in main
File "C:/Users/ruidong/Desktop/YZR temp/Project2/", line 40, in _evaluate_epoch
output = model(X)
File "C:\Users\ruidong\Anaconda3\envs\EECS445\lib\site-packages\torch\nn\modules\", line 532, in __call__
result = self.forward(*input, **kwargs)
File "C:\Users\ruidong\Desktop\YZR temp\Project2\model\", line 48, in forward
z = F.relu(self.conv1(x))
File "C:\Users\ruidong\Anaconda3\envs\EECS445\lib\site-packages\torch\nn\modules\", line 532, in __call__
result = self.forward(*input, **kwargs)
File "C:\Users\ruidong\Anaconda3\envs\EECS445\lib\site-packages\torch\nn\modules\", line 345, in forward
return self.conv2d_forward(input, self.weight)
File "C:\Users\ruidong\Anaconda3\envs\EECS445\lib\site-packages\torch\nn\modules\", line 342, in conv2d_forward
self.padding, self.dilation, self.groups)
RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _thnn_conv2d_forward
Any suggestions? I'd really appreciate it.
The model is correctly moved to the GPU. However, a model that lives on the GPU must also receive input tensors that are on the GPU. The error occurs because you are passing a tensor that is on the CPU to a model that is on the GPU. Move the inputs the same way you moved the model before passing them in, e.g. X = X.to(torch.device('cuda')).
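A minimal sketch of the fix, assuming a standard PyTorch setup: pick one device, move the model to it once, and move every input batch to the same device before the forward pass. The small nn.Sequential model is a hypothetical stand-in for Challenge(), and X stands in for a batch from the evaluation loop; the code falls back to the CPU when no GPU is available, so it also runs on CPU-only machines.

```python
import torch
import torch.nn as nn

# Use the GPU if one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical small model standing in for Challenge().
model = nn.Sequential(nn.Conv2d(3, 4, kernel_size=3), nn.ReLU()).to(device)

X = torch.rand(2, 3, 32, 32)  # a batch created on the CPU, as a DataLoader yields it
X = X.to(device)              # move the inputs to the same device as the model
output = model(X)
print(output.shape, output.device)
```

In an evaluation or training loop, the X = X.to(device) line goes inside the loop, and labels passed to the loss function need the same treatment.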

Writing out neural network inference test code

I am trying to modify inference code for a pruned SqueezeNet network.
However, I get the following error. Could anyone explain how to get around this CPU/GPU backend error?
[kevin#linux SqueezeNet-Pruning]$ python --image "3_100.jpg" --model "model_prunned" --num_class "2"
prediction in progress
Traceback (most recent call last):
File "", line 63, in <module>
prediction = predict_image(imagepath)
File "", line 47, in predict_image
output = model(input)
File "/usr/lib/python3.7/site-packages/torch/nn/modules/", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/kevin/Documents/Grive/Personal/Coursera/Machine_Learning/pruning/Pruning-CNN/SqueezeNet-Pruning/", line 39, in forward
x = self.features(x)
File "/usr/lib/python3.7/site-packages/torch/nn/modules/", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/usr/lib/python3.7/site-packages/torch/nn/modules/", line 92, in forward
input = module(input)
File "/usr/lib/python3.7/site-packages/torch/nn/modules/", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/usr/lib/python3.7/site-packages/torch/nn/modules/", line 313, in forward
self.padding, self.dilation, self.groups)
RuntimeError: Expected object of backend CPU but got backend CUDA for argument #2 'weight'
[kevin#linux SqueezeNet-Pruning]$
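I think this is related to GPU usage: the error says the input is on the CPU while the convolution weight is on the GPU (CUDA backend). The program should work once the GPU is configured correctly so that both live on the same device. Alternatively, you can delete lines 39 and 40.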
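If the goal is to run inference on the CPU, one option is to load the saved weights with torch.load(..., map_location="cpu") so every tensor lands on the CPU before the forward pass. This sketch assumes the pruned network was saved as a state_dict; the small nn.Sequential model and the in-memory buffer are hypothetical stand-ins for the pruned SqueezeNet and the model_prunned file. If the whole module was pickled instead, map_location="cpu" still applies (recent PyTorch versions may also require weights_only=False in that case). The other option is to keep inference on the GPU by moving the input to the model's device, e.g. input = input.to("cuda").

```python
import io

import torch
import torch.nn as nn


def make_model():
    # Hypothetical stand-in for the pruned SqueezeNet architecture.
    return nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(8, 2),
    )


# Simulate a model that was trained (and saved) on the GPU when one exists.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
trained = make_model().to(device)
buffer = io.BytesIO()
torch.save(trained.state_dict(), buffer)
buffer.seek(0)

# Load for CPU inference: map_location remaps any CUDA tensors to the CPU.
model = make_model()
model.load_state_dict(torch.load(buffer, map_location="cpu"))
model.eval()

image = torch.rand(1, 3, 224, 224)  # CPU input, like the test image tensor
with torch.no_grad():
    output = model(image)
print(output)
```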

KeyError: u'NearestNeighbors' on loading saved model from tf.contrib.factorization.KMeansClustering

I am trying to do the following:
Run k-means clustering using TensorFlow (1.8.0)
Save the model using kmeans.export_savedmodel
Use the model using tf.saved_model.loader.load
I am using the exact script at:
I am using the following code to save the model:
Input receiver:
def serving_input_receiver_fn():
    # Expect serialized tf.Example protos, each with a length-2 float feature "x"
    feature_spec = {"x": tf.FixedLenFeature(dtype=tf.float32, shape=[2])}
    model_placeholder = tf.placeholder(dtype=tf.string, shape=[None], name='input')
    receiver_tensors = {"model_inputs": model_placeholder}
    features = tf.parse_example(model_placeholder, feature_spec)
    return tf.estimator.export.ServingInputReceiver(features, receiver_tensors)
kmeans.export_savedmodel("/path/", serving_input_receiver_fn)
To load the model, I use:
tf.saved_model.loader.load(sess, [tf.saved_model.tag_constants.SERVING],"/path")
On the last step I run into this error:
Traceback (most recent call last):
File "", line 6, in <module>
tf.saved_model.loader.load(sess, [tf.saved_model.tag_constants.SERVING], "/Users/z001t3k/work/codebase/ContentPipeline/cep-scripts/cep/datacollection/algorithms/cluster_model/1525963476")
File "/Users/z001t3k/python_virtualenvs/tensorflow/lib/python2.7/site-packages/tensorflow/python/saved_model/", line 219, in load
saver = tf_saver.import_meta_graph(meta_graph_def_to_load, **saver_kwargs)
File "/Users/z001t3k/python_virtualenvs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/", line 1955, in import_meta_graph
File "/Users/z001t3k/python_virtualenvs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/", line 743, in import_scoped_meta_graph
File "/Users/z001t3k/python_virtualenvs/tensorflow/lib/python2.7/site-packages/tensorflow/python/util/", line 432, in new_func
return func(*args, **kwargs)
File "/Users/z001t3k/python_virtualenvs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/", line 460, in import_graph_def
_RemoveDefaultAttrs(op_dict, producer_op_list, graph_def)
File "/Users/z001t3k/python_virtualenvs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/", line 227, in _RemoveDefaultAttrs
op_def = op_dict[node.op]
KeyError: u'NearestNeighbors'
TensorFlow is having trouble locating the NearestNeighbors op, which is part of the graph you're loading. Ops defined in contrib are loaded dynamically when you import the corresponding contrib package in Python.
So just add
import tensorflow.contrib.factorization
before loading the SavedModel.

Tensorflow sound recognition tutorial gives error: op_def = op_dict[node.op] KeyError: 'DecodeWav'

I am trying to import a pretrained TensorFlow model (the default sound recognition model from the tutorial) and I keep getting this error.
I tried importing from both a checkpoint file and a .pb file, and as a beginner I have no idea what this error means. Any help would be appreciated!
I have tried this on Debian and Windows 10, with Python 3.5 and Python 3.6 and multiple versions of TensorFlow.
Traceback (most recent call last):
File "C:\tmp\speech_commands_train\", line 4, in <module>
saver = tf.train.import_meta_graph('conv.ckpt-18000.meta')
File "C:\Dev\Python36\lib\site-packages\tensorflow\python\training\", line 1927, in import_meta_graph
File "C:\Dev\Python36\lib\site-packages\tensorflow\python\framework\", line 741, in import_scoped_meta_graph
File "C:\Dev\Python36\lib\site-packages\tensorflow\python\util\", line 432, in new_func
return func(*args, **kwargs)
File "C:\Dev\Python36\lib\site-packages\tensorflow\python\framework\", line 457, in import_graph_def
_RemoveDefaultAttrs(op_dict, producer_op_list, graph_def)
File "C:\Dev\Python36\lib\site-packages\tensorflow\python\framework\", line 227, in _RemoveDefaultAttrs
op_def = op_dict[node.op]
KeyError: 'DecodeWav'
This is the code that I am using to import:
import tensorflow as tf
sess = tf.Session()
saver = tf.train.import_meta_graph('conv.ckpt-18000.meta')
saver.restore(sess, tf.train.latest_checkpoint('./'))
