Fail to find the dnn implementation for LSTM - python

I'm trying to run a simple LSTM model with the following code:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.LSTM(32,
                               input_shape=x_train_single.shape[-2:]))
model.add(tf.keras.layers.Dense(1))
model.compile(optimizer=tf.keras.optimizers.RMSprop(), loss='mae')
single_step_history = model.fit(train_data_single, epochs=EPOCHS,
                                steps_per_epoch=EVALUATION_INTERVAL)
The error happens when it tries to fit the model:
tensorflow.python.framework.errors_impl.UnknownError: [_Derived_] Fail to find the dnn implementation.
[[{{node CudnnRNN}}]]
[[sequential/lstm/StatefulPartitionedCall]] [Op:__inference_distributed_function_3107]
There's also another warning like this:
2020-02-22 19:08:06.478567: W tensorflow/core/kernels/data/cache_dataset_ops.cc:820] The calling
iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the
dataset, the partially cached contents of the dataset will be discarded. This can happen if you have
an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use
`dataset.take(k).cache().repeat()` instead.
I tried all the methods from this question, but none of them worked for me.
My environment is:
tensorflow-gpu 2.0
CUDA v10
CuDNN 7.6.5
Solution
OK... I found that I didn't have the latest NVIDIA driver, so I upgraded it, and now it works.

Answering here for the benefit of the community even though the user has provided the solution.
Upgrading the NVIDIA driver to the latest version resolved the issue.
You can update the NVIDIA driver manually from here by selecting your product details and OS, then downloading the most recent driver from their website. You then have to run the installer and overwrite the old driver.

Try the below, which enables GPU memory growth before any model is built:
import tensorflow as tf

# Let TensorFlow allocate GPU memory on demand instead of claiming it all up
# front; this often lets cuDNN initialize successfully.
physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], enable=True)
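If no GPU is visible to TensorFlow, physical_devices will be an empty list and indexing [0] raises an IndexError. A slightly more defensive sketch of the same idea (only the guard and the loop are additions):

import tensorflow as tf

physical_devices = tf.config.list_physical_devices('GPU')
if physical_devices:
    # Enable memory growth on every visible GPU so cuDNN can initialize.
    for gpu in physical_devices:
        tf.config.experimental.set_memory_growth(gpu, True)
else:
    print("No GPU visible to TensorFlow - check the driver/CUDA installation.")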

Related

Dead Kernal when loading a quantized model: PyTorch

I have a pretrained quantized model which I trained on Colab. I moved the files to my system to run ONNX Runtime inference. However, when loading the model with
quantized_model = torch.load('quantizedmodel.pt')
my kernel proceeds to die; non-quantized models seem to load just fine.
My torch version is '1.11.0'. One thing I've done differently: on Colab I was mapping the model to CUDA, but here I was mapping it to the device with map_location = "cpu:0". I tried changing it back to the CUDA device with cuda:0 instead, to no avail.
Running nvidia-smi gives me output that seems to check out (screenshot not included). Looking for some help with debugging this.
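For reference, the CPU-mapping pattern described above would look roughly like this (a sketch only; it assumes the checkpoint was written with torch.save on a full model object, and reuses the filename from the question):

import torch

# Map all storages to the CPU while loading, so no CUDA context is required.
quantized_model = torch.load('quantizedmodel.pt', map_location=torch.device('cpu'))
quantized_model.eval()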

Using GPU for tensorflow object detection

I have trained a Faster R-CNN model for object detection using the TensorFlow Object Detection API on Google Colab. But when testing videos, Google Colab crashes, so I decided to test on my PC and installed CUDA 10.0, cuDNN 7.6.5 and tensorflow-gpu 1.15.
But the test is very slow, as if it were running on a CPU. I get a message when testing (screenshot not included), so I guess it is using my GPU.
Does anyone know a solution to test a video in a faster way?
Is the problem with CUDA or my GPU?
Thank you
From Comments
tf.test.is_gpu_available() tells you whether TensorFlow can access a GPU. (The function is deprecated and will be removed in a future version; use tf.config.list_physical_devices('GPU') instead.)
If it returns True, then there is no issue in using the GPU. But the GeForce MX110 is among the slowest CUDA-capable cards, since it also has little RAM. For better performance, see the list of CUDA-enabled GPU cards; for more details on hardware requirements you can refer here.
(Paraphrased from Stanley Zheng and Dr.Snoopy)
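As a quick sanity check, the test mentioned above can be run like this (a minimal sketch; the first call matches the TF 1.15 setup in the question, and the commented line is the TF 2.x replacement):

import tensorflow as tf

# TF 1.x style check (deprecated in newer releases)
print(tf.test.is_gpu_available())

# TF 2.x replacement:
# print(tf.config.list_physical_devices('GPU'))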

CleverHans is incompatible with Tensorflow Addons

I have been trying to use CleverHans (https://github.com/tensorflow/cleverhans).
Background
I ran this file here - https://github.com/tensorflow/cleverhans/blob/master/cleverhans_tutorials/mnist_tutorial_tf.py, with Python 3.7.6 and TensorFlow 1.15.3, which led me to the following error:
ImportError: This version of TensorFlow Addons requires TensorFlow version >= 2; Detected an installation of version 1.15.3. Please upgrade TensorFlow to proceed.
However, CleverHans isn't really meant for TensorFlow 2 and above, and the maintainers are working on publishing a new version. I actually even updated TensorFlow to 2.3.0 just to see what would happen. That led to a bunch of errors and error-fixing, and finally I got here:
/python3.7/site-packages/cleverhans/initializers.py", line 13, in __init__
super(HeReLuNormalInitializer, self).__init__(dtype=dtype)
TypeError: __init__() got an unexpected keyword argument 'dtype'
Clearly, CleverHans doesn't work with TensorFlow 2.
Question
My question is: how come the fgsm.py file (CleverHans) uses tensorflow_addons when tensorflow_addons doesn't work with TensorFlow < 2? Also, I guess the only solution would be to make things work without tensorflow_addons, but I'm not sure how much effort that would be.
Any suggestions for avoiding TensorFlow Addons, or a completely new approach, would be awesome.
Thanks!

Fix for "Check Weather your Graph def interpreting binary is up to date with your Graph def generating binary "

I am running tensorflow with react native. I have a retrained Inception V3 graph. I used a GitHub repo example to test if a model other than my own would work, and it functioned perfectly well. When I attempt to use my own model, I get the Error: "Check whether your GraphDef interpreting binary is up to date with your GraphDef generating binary"
Dev info: Python 3.5, React Native 0.59, TensorFlow 2.0.0a0, protobuf 3.7.1. From what I have seen suggested, I have attempted training my model on an older version of TensorFlow (I was using 1.13.1, I tried 1.8.0), since I heard that my TensorFlow and protobuf versions may be too high to interpret my .pb file. This did not work, though, and I received the exact same error.
Here is the recognition code:
async recognizeImage() {
  try {
    const tfImageRecognition = new TfImageRecognition({
      model: require('./assets/retrained_graph.pb'),
      labels: require('./assets/retrained_labels.txt')
    })
    const results = await tfImageRecognition.recognize({
      image: this.image
    })
On my Docker container (where I am running TensorFlow Serving) I have:
TensorFlow ModelServer: 2.1.0-rc1
TensorFlow Library: 2.1.0
The problem is related to the local TensorFlow version you use to export your protobuf model. I know that if you export your h5 model with TF versions 1.14.0, 2.1.0 or 2.2.0, you will have this problem when performing inference. You can try to use TF versions >= 1.15.0, or versions below 1.8.0. I think this happens because some TensorFlow versions don't support a particular layer at export time.
To change your local TensorFlow version you can run
pip install tensorflow==1.15.0
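If re-exporting is an option, a minimal sketch of regenerating the serving artifact under the downgraded TensorFlow version might look like this (the .h5 filename and export path are placeholders, not taken from the question):

import tensorflow as tf

# Load the Keras model and write a SavedModel that TensorFlow Serving can load.
model = tf.keras.models.load_model('my_model.h5')
tf.saved_model.save(model, 'export/my_model/1')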

Do pretrained tensorflow models need to be used by machines with the same versions?

I trained a CNN on a Linux machine with Keras/TensorFlow but can't get the pretrained model to run on my Raspberry Pi. The model was made on Ubuntu 16.04 with Python 3.6.7, TensorFlow 1.7.0, cuDNN 7.0.5 and CUDA 9. I am trying to run it on a Raspberry Pi 3 Model B+ with Python 3.5.3 and TensorFlow 1.13.1.
I have no problem loading and running the pretrained model on the same machine it was created on. The issue is only when I try to run that same pretrained model on the RPi system. I end up getting a segmentation fault.
I tried updating the Linux machine that created the model to TensorFlow 1.12, but after it installed successfully I got "Failed to get convolution algorithm. This is probably because cuDNN failed to initialize" errors, so I'd rather not go down that route. I want to know if it's possible to just use this pretrained model with TensorFlow 1.13.1 on the RPi.
Here's what I'm doing on the RPi:
>>> import tensorflow as tf
/usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: compiletime version 3.4 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.5
return f(*args, **kwds)
/usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: builtins.type size changed, may indicate binary incompatibility. Expected 432, got 412
>>> print(tf.__version__)
1.13.1
>>> from keras.models import load_model
Using TensorFlow backend.
>>> model = load_model(save_dir+model_name)
WARNING:tensorflow:From /home/pi/.local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /home/pi/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py:3445: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
2019-03-25 17:08:11.471364: W tensorflow/core/framework/allocator.cc:124] Allocation of 209715200 exceeds 10% of system memory.
2019-03-25 17:12:55.123877: W tensorflow/core/framework/allocator.cc:124] Allocation of 209715200 exceeds 10% of system memory.
Backend terminated (returncode: -11)
Fatal Python error: Segmentation fault
I need some guidance on why this is happening - are the versions incompatible? Maybe the model is too large for the RPi (I doubt it - it's a fairly shallow model with 18 layers)? The other forum posts I've seen about segmentation faults seemed a lot more dire (e.g., the poster couldn't even run standard commands in the terminal without a segmentation error) - this segmentation fault only happens (and happens reproducibly) with the commands above.
Any advice/help greatly appreciated!
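One low-effort check before digging further: compare the versions recorded inside the model file with the versions on each machine (a sketch only; the file path is a placeholder and it assumes the model was saved in Keras HDF5 format):

import h5py
import tensorflow as tf
import keras

print('runtime:', 'tf', tf.__version__, 'keras', keras.__version__)
with h5py.File('/path/to/pretrained_model.h5', 'r') as f:
    # Keras records the library versions used when the file was written.
    print('saved with keras', f.attrs.get('keras_version'),
          'backend', f.attrs.get('backend'))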
