How to properly run my project using GPU?

I am really new to torch and machine learning, and I am trying to run my project on the GPU. I made the following modification to my code:
model = Challenge()
model = model.to(torch.device('cuda'))
However, I still get the following error:
Traceback (most recent call last):
File "C:/Users/ruidong/Desktop/YZR temp/Project2/train_challenge.py", line 112, in <module>
main()
File "C:/Users/ruidong/Desktop/YZR temp/Project2/train_challenge.py", line 91, in main
stats)
File "C:/Users/ruidong/Desktop/YZR temp/Project2/train_challenge.py", line 40, in _evaluate_epoch
output = model(X)
File "C:\Users\ruidong\Anaconda3\envs\EECS445\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "C:\Users\ruidong\Desktop\YZR temp\Project2\model\challenge.py", line 48, in forward
z = F.relu(self.conv1(x))
File "C:\Users\ruidong\Anaconda3\envs\EECS445\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "C:\Users\ruidong\Anaconda3\envs\EECS445\lib\site-packages\torch\nn\modules\conv.py", line 345, in forward
return self.conv2d_forward(input, self.weight)
File "C:\Users\ruidong\Anaconda3\envs\EECS445\lib\site-packages\torch\nn\modules\conv.py", line 342, in conv2d_forward
self.padding, self.dilation, self.groups)
RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _thnn_conv2d_forward
Any suggestions? I would really appreciate it.

The model is correctly moved to the GPU. However, a model that lives on the GPU must be given tensors that are also on the GPU. The error occurs because you are passing a tensor that is on the CPU to a model that is on the GPU. Do the same for the inputs before passing them to the model: call .to(torch.device('cuda')) on them and use the tensor it returns.
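A minimal sketch of that fix, assuming a stand-in model (`TinyNet` is hypothetical and only takes the place of `Challenge`) and a random batch in place of real data. It falls back to the CPU when no CUDA device is available, so the same code runs on either:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyNet(nn.Module):
    """Hypothetical stand-in for the Challenge model in the question."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, kernel_size=3, padding=1)

    def forward(self, x):
        return F.relu(self.conv1(x))


# Use the GPU when one is available, otherwise stay on the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = TinyNet().to(device)

# A fake batch standing in for X (and labels y) from a DataLoader.
X = torch.randn(4, 3, 32, 32)
y = torch.randint(0, 2, (4,))

# Tensor.to() returns a new tensor, so the result must be reassigned.
X, y = X.to(device), y.to(device)

output = model(X)
print(output.device, output.shape)
```

Calling `.to(device)` on an `nn.Module` moves its parameters in place, but calling it on a tensor does not modify that tensor, so a bare `X.to(device)` without reassignment leaves `X` on the CPU and reproduces the error.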

Related

TensorFlow inference from a SavedModel: Expecting int64_t value for attr strides, got numpy.int32

I'm trying to use a pre-trained tensorflow model to classify an image.
I downloaded the efficientnet model from tensorflow hub.
The python code loads the model from the .pb file.
It then loads a sample image, resizes the image to 224x224, squishes the rgb values to [0,1] and adds another dimension to make it 4d (collection of images) as the model expects.
It then uses col_x for inference. The final input shape given to the model is (1, 224, 224, 3).
import os
import tensorflow as tf
from tensorflow import keras
print(tf.version.VERSION)
path = os.path.join(os.getcwd(), 'efficientnet')
model = keras.models.load_model(path)
from matplotlib import pyplot as plt
import matplotlib.image as mpimg
from PIL import Image
import numpy as np
img = Image.open("data/zebra.jpg")
img = img.resize((224, 224), Image.ANTIALIAS)
x = tf.keras.preprocessing.image.img_to_array(img)
plt.imshow(img)
plt.show()
norm_x = x / 255
col_x = norm_x[np.newaxis,...]
plt.imshow(col_x[0])
plt.show()
model(col_x)
But I get this error:
Traceback (most recent call last):
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 926, in conv2d
"dilations", dilations)
tensorflow.python.eager.core._FallbackException: Expecting int64_t value for attr strides, got numpy.int32
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<pyshell>", line 1, in <module>
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 968, in __call__
outputs = self.call(cast_inputs, *args, **kwargs)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\keras\engine\network.py", line 719, in call
convert_kwargs_to_constants=base_layer_utils.call_context().saving)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\keras\engine\network.py", line 888, in _run_internal_graph
output_tensors = layer(computed_tensors, **kwargs)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 968, in __call__
outputs = self.call(cast_inputs, *args, **kwargs)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\keras\layers\convolutional.py", line 207, in call
outputs = self._convolution_op(inputs, self.kernel)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 1106, in __call__
return self.conv_op(inp, filter)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 638, in __call__
return self.call(inp, filter)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 237, in __call__
name=self.name)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 2014, in conv2d
name=name)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 933, in conv2d
data_format=data_format, dilations=dilations, name=name, ctx=_ctx)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 1022, in conv2d_eager_fallback
ctx=ctx, name=name)
File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\eager\execute.py", line 60, in quick_execute
inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [Op:Conv2D]
I was able to reproduce your error. Please change the last line to predict, evaluate, or whichever function you want to use for inference:
model.predict(col_x)
You are using model(col_x), which calls the model object directly on the image instead of going through its inference API.
As for the other error, I do not think your system is using the GPU even if one is available; please install matching versions of TensorFlow and CUDA.
See the answer to "Which TensorFlow and CUDA version combinations are compatible?" to correct it.
Cheers.
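One way to check whether TensorFlow can see a GPU at all, sketched under the assumption of TensorFlow 2.1 or later (where `tf.config.list_physical_devices` is available). Enabling memory growth is an extra step not mentioned in the answer; it is a commonly suggested workaround for cuDNN initialization failures, not a guaranteed fix:

```python
import tensorflow as tf

# Devices TensorFlow can actually use; an empty list means CPU only.
gpus = tf.config.list_physical_devices('GPU')
print('TensorFlow', tf.version.VERSION, 'sees GPUs:', gpus)

for gpu in gpus:
    try:
        # Assumption: allocate GPU memory on demand instead of all at once.
        # This must run before the GPU is first used.
        tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as err:
        # Raised if the GPU was already initialized earlier in the process.
        print('Could not set memory growth:', err)
```

If the list is empty on a machine that has a GPU, the installed TensorFlow build and the CUDA/cuDNN versions probably do not match.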

keras won't allow me to add layers

I'm trying to make a neural network using Keras, and every time I try to add a layer I get a list of errors relating to the call. The way I'm calling it is model.add(Dense(768,input_dim=3072,init='uniform',activation='relu'))
and the errors I get are the following:
Traceback (most recent call last):
File "nn2.py", line 52, in <module>
model.add(Dense(768,input_dim=3072,init='uniform',activation='relu'))
File "/Users/lens/Documents/NNproject/neuralenv/lib/python3.7/site-packages/tensorflow/python/keras/layers/core.py", line 1132, in __init__
activity_regularizer=regularizers.get(activity_regularizer), **kwargs)
File "/Users/lens/Documents/NNproject/neuralenv/lib/python3.7/site-packages/tensorflow/python/training/tracking/base.py", line 456, in _method_wrapper
result = method(self, *args, **kwargs)
File "/Users/lens/Documents/NNproject/neuralenv/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 294, in __init__
generic_utils.validate_kwargs(kwargs, allowed_kwargs)
File "/Users/lens/Documents/NNproject/neuralenv/lib/python3.7/site-packages/tensorflow/python/keras/utils/generic_utils.py", line 792, in validate_kwargs
raise TypeError(error_message, kwarg)
TypeError: ('Keyword argument not understood:', 'init')
Does anyone have a fix for this?
There isn't a single init argument for Keras Dense layers; that keyword comes from the older Keras 1 API. You'll need to specify the initialization for kernel_initializer and bias_initializer separately.
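A sketch of the corrected call, assuming the Keras bundled with TensorFlow 2.x as in the traceback. The choice of `'random_uniform'` for the kernel and `'zeros'` for the bias is an assumption that approximates the old `init='uniform'`; the input size is declared here with a separate `keras.Input` rather than `input_dim`:

```python
from tensorflow import keras

model = keras.Sequential()
model.add(keras.Input(shape=(3072,)))
model.add(keras.layers.Dense(
    768,
    kernel_initializer='random_uniform',  # replaces the old init='uniform'
    bias_initializer='zeros',             # biases are initialized separately
    activation='relu',
))

model.summary()
```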

How can I convert Tensorflow frozen graph to TF Lite model?

I am using Faster RCNN (the repo I am using can be found in the link) to detect cars in a video frame. I used Keras 2.2.3 and TensorFlow 1.15.0. I want to deploy and run it on my Android device. Each part of Faster RCNN is implemented in Keras, and in order to deploy it on Android I want to convert the parts to TF Lite models. The final network, the classifier, has a custom layer called RoiPoolingConv, and I cannot convert the final network to TF Lite. At first, I tried the following:
converter = tf.lite.TFLiteConverter.from_keras_model_file('model_classifier_with_architecture.h5',
custom_objects={"RoiPoolingConv": RoiPoolingConv})
tfmodel = converter.convert()
open ("model_cls.tflite" , "wb") .write(tfmodel)
This gives the following error:
Traceback (most recent call last):
File "Keras-FasterRCNN/model_to_tflite.py", line 26, in <module>
custom_objects={"RoiPoolingConv": RoiPoolingConv})
File "/home/alp/.local/lib/python3.6/site-packages/tensorflow/lite/python/lite.py", line 747, in from_keras_model_file
keras_model = _keras.models.load_model(model_file, custom_objects)
File "/home/alp/.local/lib/python3.6/site-packages/tensorflow/python/keras/saving/save.py", line 146, in load_model
return hdf5_format.load_model_from_hdf5(filepath, custom_objects, compile)
File "/home/alp/.local/lib/python3.6/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 212, in load_model_from_hdf5
custom_objects=custom_objects)
File "/home/alp/.local/lib/python3.6/site-packages/tensorflow/python/keras/saving/model_config.py", line 55, in model_from_config
return deserialize(config, custom_objects=custom_objects)
File "/home/alp/.local/lib/python3.6/site-packages/tensorflow/python/keras/layers/serialization.py", line 89, in deserialize
printable_module_name='layer')
File "/home/alp/.local/lib/python3.6/site-packages/tensorflow/python/keras/utils/generic_utils.py", line 192, in deserialize_keras_object
list(custom_objects.items())))
File "/home/alp/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 1131, in from_config
process_node(layer, node_data)
File "/home/alp/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 1089, in process_node
layer(input_tensors, **kwargs)
File "/home/alp/.local/lib/python3.6/site-packages/keras/engine/base_layer.py", line 475, in __call__
previous_mask = _collect_previous_mask(inputs)
File "/home/alp/.local/lib/python3.6/site-packages/keras/engine/base_layer.py", line 1441, in _collect_previous_mask
mask = node.output_masks[tensor_index]
AttributeError: 'Node' object has no attribute 'output_masks'
As a workaround, I tried converting the Keras models to TensorFlow frozen graphs and then doing the TF Lite conversion on these frozen graphs. This yields the following error:
Traceback (most recent call last):
File "/home/alp/.local/bin/toco_from_protos", line 11, in <module>
sys.exit(main())
File "/home/alp/.local/lib/python3.6/site-packages/tensorflow/lite/toco/python/toco_from_protos.py", line 59, in main
app.run(main=execute, argv=[sys.argv[0]] + unparsed)
File "/home/alp/.local/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/home/alp/.local/lib/python3.6/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/home/alp/.local/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "/home/alp/.local/lib/python3.6/site-packages/tensorflow/lite/toco/python/toco_from_protos.py", line 33, in execute
output_str = tensorflow_wrap_toco.TocoConvert(model_str, toco_str, input_str)
Exception: We are continually in the process of adding support to TensorFlow Lite for more ops. It would be helpful if you could inform us of how this conversion went by opening a github issue at https://github.com/tensorflow/tensorflow/issues/new?template=40-tflite-op-request.md
and pasting the following:
Some of the operators in the model are not supported by the standard TensorFlow Lite runtime. If those are native TensorFlow operators, you might be able to use the extended runtime by passing --enable_select_tf_ops, or by setting target_ops=TFLITE_BUILTINS,SELECT_TF_OPS when calling tf.lite.TFLiteConverter(). Otherwise, if you have a custom implementation for them you can disable this error with --allow_custom_ops, or by setting allow_custom_ops=True when calling tf.lite.TFLiteConverter(). Here is a list of builtin operators you are using: ADD, CAST, CONCATENATION, CONV_2D, DEPTHWISE_CONV_2D, FULLY_CONNECTED, MUL, PACK, RESHAPE, RESIZE_BILINEAR, SOFTMAX, STRIDED_SLICE. Here is a list of operators for which you will need custom implementations: AddV2.
Is there a way to achieve the conversion of model with custom layer to TF Lite model?

Blas GEMV launch failed: m=3, n=2 [Op:MatMul]

When I run the following code, I get the error:
E tensorflow/stream_executor/cuda/cuda_blas.cc:654] failed to run cuBLAS routine cublasSgemv_v2: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
File "modelAndLayer.py", line 16, in <module>
y_pred=model(X)
File "/home/cxsbg/anaconda3/envs/dl36/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/engine/base_layer.py", line 314, in __call__
output = super(Layer, self).__call__(inputs, *args, **kwargs)
File "/home/cxsbg/anaconda3/envs/dl36/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 717, in __call__
outputs = self.call(inputs, *args, **kwargs)
File "modelAndLayer.py", line 10, in call
output=self.dense(input)
File "/home/cxsbg/anaconda3/envs/dl36/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/engine/base_layer.py", line 314, in __call__
output = super(Layer, self).__call__(inputs, *args, **kwargs)
File "/home/cxsbg/anaconda3/envs/dl36/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 717, in __call__
outputs = self.call(inputs, *args, **kwargs)
File "/home/cxsbg/anaconda3/envs/dl36/lib/python3.6/site-packages/tensorflow/python/layers/core.py", line 163, in call
outputs = gen_math_ops.mat_mul(inputs, self.kernel)
File "/home/cxsbg/anaconda3/envs/dl36/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 4305, in mat_mul
_six.raise_from(_core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InternalError: Blas GEMV launch failed: m=3, n=2 [Op:MatMul]
My GPU is an RTX 2080 and the driver is v410. CUDA is v9.0 and cuDNN is v7. tensorflow-gpu is v1.8 (I tried both v1.8 and v1.12). Python is v3.6 (I tried both v3.6 and v2.7). The system is Ubuntu 16.04 (I also tried Win10).
The problem always occurs with tensorflow-gpu, but the code works with the CPU build of tensorflow.
Code is here (a simple linear model):
import tensorflow as tf
tf.enable_eager_execution()
X=tf.constant([[1.,2.,3,],[4.,5.,6.]])
Y=tf.constant([[10.],[20.]])
class Linear(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense=tf.keras.layers.Dense(units=1,kernel_initializer=tf.zeros_initializer(),bias_initializer=tf.zeros_initializer())
    def call(self,input):
        output=self.dense(input)
        return output
model=Linear()
optimizer=tf.train.GradientDescentOptimizer(learning_rate=1e-3)
for i in range(1000):
    with tf.GradientTape() as tape:
        y_pred=model(X)
        loss=tf.reduce_mean(tf.square(y_pred-Y))
    grads=tape.gradient(loss,model.variables)
    optimizer.apply_gradients(zip(grads,model.variables))
print(model.variables)
I think the error is triggered by tf.enable_eager_execution(), as I have tested it many times. Thanks to the author of "which-version-of-cuda-can-work-with-rtx-2080": when I use CUDA 9.2, the error is fixed.

writing out neural network inference test code

I am trying to modify the inference code for a pruned SqueezeNet network.
However, I ran into the following error. Could anyone comment on how to get around this CPU/GPU backend error?
[kevin#linux SqueezeNet-Pruning]$ python predict.py --image "3_100.jpg" --model "model_prunned" --num_class "2"
prediction in progress
Traceback (most recent call last):
File "predict.py", line 63, in <module>
prediction = predict_image(imagepath)
File "predict.py", line 47, in predict_image
output = model(input)
File "/usr/lib/python3.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/kevin/Documents/Grive/Personal/Coursera/Machine_Learning/pruning/Pruning-CNN/SqueezeNet-Pruning/finetune.py", line 39, in forward
x = self.features(x)
File "/usr/lib/python3.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/usr/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/usr/lib/python3.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/usr/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 313, in forward
self.padding, self.dilation, self.groups)
RuntimeError: Expected object of backend CPU but got backend CUDA for argument #2 'weight'
[kevin#linux SqueezeNet-Pruning]$
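I think this is related to the use of the GPU: the error says the convolution weights are on CUDA while the input tensor is on the CPU. The program might work once the model and its input are configured to use the same device. Alternatively, you can delete lines 39 and 40.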
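A hedged sketch of making the input follow the model, assuming a stand-in network (the `nn.Sequential` here is hypothetical, not the pruned SqueezeNet) and a random tensor in place of the preprocessed image. Reading the device from the model's own parameters avoids hard-coding CPU or CUDA in the prediction script:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the loaded, pruned model.
model = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3),
    nn.ReLU(),
)
if torch.cuda.is_available():
    model = model.cuda()  # mirrors a checkpoint whose weights live on the GPU
model.eval()

# Stand-in for the preprocessed image batch (1 image, 3 channels, 32x32).
input_tensor = torch.randn(1, 3, 32, 32)

# Put the input wherever the model's weights are.
model_device = next(model.parameters()).device
input_tensor = input_tensor.to(model_device)

with torch.no_grad():  # inference only, no gradients needed
    output = model(input_tensor)

print(output.device, output.shape)
```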
