Tensorflow Keras error "no algorithm found!" - python

I am currently following the book Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, but I keep running into errors. When I run the following code:
model.compile(loss="sparse_categorical_crossentropy", optimizer="nadam", metrics=["accuracy"])
history = model.fit(X_train, y_train, epochs=10, validation_data=(X_valid, y_valid))
score = model.evaluate(X_test, y_test)
y_pred = model.predict(X_new)
I get this error:
NotFoundError: No algorithm worked!
[[node sequential/conv2d/Conv2D (defined at :2) ]] [Op:__inference_train_function_2275]
Function call stack:
train_function
I then pulled these two snippets from Stack Overflow, which seemed to "fix" the problem:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = "2"  # hide INFO and WARNING messages (set before importing TF)

from tensorflow.compat.v1 import ConfigProto, InteractiveSession

config = ConfigProto()
config.gpu_options.allow_growth = True    # allocate GPU memory on demand instead of all at once
session = InteractiveSession(config=config)
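(For reference, the newer TF 2.x API exposes the same allow-growth behavior directly; this is only a sketch of the equivalent call, assuming a TF 2.x install:)
import tensorflow as tf

# TF 2.x equivalent of allow_growth: grow GPU memory on demand
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)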
However, even with these settings, this message then clogs my terminal:
2021-04-10 18:20:01.643838: W tensorflow/stream_executor/gpu/asm_compiler.cc:235] Your CUDA software stack is old. We fallback to the NVIDIA driver for some compilation. Update your CUDA version to get the best performance. The ptxas error was: ptxas fatal : Value 'sm_86' is not defined for option 'gpu-name'
I am running CUDA 11.2 with driver version 460.39. My card is an RTX 3080.
Does anyone know what might be going wrong?

Related

I got 'ValueError: No gradients provided for any variable:'

When I train the ML model that my other team members had no problem with, I get 'ValueError: No gradients provided for any variable:'.
The full error message is below:
ValueError: No gradients provided for any variable: ['dense/kernel:0', 'dense/bias:0', 'lstm/lstm_cell/kernel:0', 'lstm/lstm_cell/recurrent_kernel:0', 'lstm/lstm_cell/bias:0', 'dense_1/kernel:0', 'dense_1/bias:0', 'dense_2/kernel:0', 'dense_2/bias:0'].
and below is the Jupyter Notebook block that raises the error:
model.layers[2]
model.layers[2].set_weights([embedding_matrix])
model.layers[2].trainable = False
model.compile(loss='categorical_crossentropy', optimizer='adam')
epochs = 10
number_pics_per_bath = 3
steps = len(train_descriptions) // number_pics_per_bath
for i in range(epochs):
    generator = data_generator(train_descriptions, train_features, wordtoix, max_length, number_pics_per_bath)
    model.fit_generator(generator, epochs=1, steps_per_epoch=steps, verbose=1)
    model.save('C:/MyAI/archive/model_' + str(i) + '.h5')
I think you may also want to see my full code. That is easy, because I almost entirely copied the code from the GitHub link below:
https://github.com/hlamba28/Automatic-Image-Captioning/blob/master/Automatic%20Image%20Captioning.ipynb
Because I don't have the same file paths and file names as the repo, I changed the code above slightly, but I did not change the logic. My other team member said he had no problem with our (slightly modified) code, and showed me that the model trains well and produces the intended results.
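For context, Keras expects a Python generator passed to fit_generator to yield (inputs, targets) tuples for every batch; if the generator yields some other structure, the loss can end up disconnected from the trainable variables, which is one possible way to get this error. A minimal, self-contained sketch of the expected structure (the arrays below are random placeholders, not the repo's real features):
import numpy as np

def toy_generator(batch_size=3):
    # Stand-ins for the image features, padded input sequences and one-hot targets
    while True:
        photo_feats = np.random.rand(batch_size, 2048).astype("float32")
        input_seqs = np.random.randint(0, 100, size=(batch_size, 34))
        targets = np.random.rand(batch_size, 100).astype("float32")
        # yield a tuple ([input_1, input_2], y), not a bare list [[...], y]
        yield ([photo_feats, input_seqs], targets)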

'use_cuda' set to True when cuda is unavailable. Make sure CUDA is available or set use_cuda=False

I am trying to create a BERT model for classifying the Turkish language. Here is my code:
import pandas as pd
import torch

df = pd.read_excel(r'preparedDataNoId.xlsx')
df = df.sample(frac=1)

from sklearn.model_selection import train_test_split
train_df, test_df = train_test_split(df, test_size=0.10)
print('train shape: ', train_df.shape)
print('test shape: ', test_df.shape)

from simpletransformers.classification import ClassificationModel

# define hyperparameters
train_args = {"reprocess_input_data": True,
              "fp16": False,
              "num_train_epochs": 4}

# Create a ClassificationModel
model = ClassificationModel(
    "bert", "dbmdz/bert-base-turkish-cased",
    num_labels=4,
    args=train_args
)
I am using Anaconda and Spyder. I think everything is correct, but when I run this I get the following error:
'use_cuda' set to True when cuda is unavailable. Make sure CUDA is available or set use_cuda=False.
How can I fix this?
I ran into the same problem. If you have CUDA available, then set both use_cuda and fp16 to True. If not, then set both to False.
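A minimal sketch of how to set this automatically, reusing the question's model arguments (the fp16 choice here is just an assumption that half precision is only wanted on a GPU):
import torch
from simpletransformers.classification import ClassificationModel

# Use the GPU only if PyTorch can actually see one
cuda_available = torch.cuda.is_available()

train_args = {"reprocess_input_data": True,
              "fp16": cuda_available,   # half precision only makes sense on a GPU
              "num_train_epochs": 4}

model = ClassificationModel(
    "bert", "dbmdz/bert-base-turkish-cased",
    num_labels=4,
    args=train_args,
    use_cuda=cuda_available
)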
CUDA is a parallel computing platform and programming model developed by Nvidia for general-purpose computing on its own GPUs.
If your computer does not have a GPU, this error will be thrown. Don't forget to include this parameter:
use_cuda=False
This will not affect your results; processing will just take longer than usual.
model = ClassificationModel(
    "bert", "dbmdz/bert-base-turkish-cased",
    num_labels=4,
    args=train_args,
    use_cuda=False
)
Adding use_cuda=False will help if a GPU is not available.
If no GPU is available on your computer, check your CUDA installation or pass use_cuda=False when creating your model. The error is thrown because CUDA is not available on your machine.

Failed to convert a NumPy array to a Tensor (Unsupported object type tensorflow.python.framework.ops.EagerTensor)

I'm trying to implement a simple recurrent network using TensorFlow, but am receiving the above error. I've looked through several answers related to the
"Failed to convert a NumPy array to a Tensor (Unsupported object type ____)"
error, but none so far address "tensorflow.python.framework.ops.EagerTensor" as the unsupported type. I am receiving this error after trying to implement code from this tutorial (albeit with a different data set).
The error occurs on the history = model.fit line:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Define the network
epochs_qty = 50
batch_size_qty = 72

model = Sequential()
model.add(LSTM(epochs_qty, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dense(1))
model.compile(loss='mae', optimizer='adam')

# Fit the network
history = model.fit(train_X, train_y, epochs=epochs_qty, batch_size=batch_size_qty,
                    validation_data=(test_X, test_y), verbose=2, shuffle=False)
The data sets have the following shapes:
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)
>> (1762, 1, 2) (1762,) (588, 1, 2) (588,)
I am running the following versions:
Python 3.7.9
Windows 10
tensorflow-gpu-2.3.1
CUDA Toolkit 10.1 Update 1
cuDNN v8.0.3 for CUDA 10.1
I have tried disabling eager execution, but this leads to a pile of additional errors, and does not seem optimal for future code development.
Also, I have tried running this code both locally and through a jupyter notebook. Both result in the exact same error, so it seems like my software setup is not the issue.
Can anyone please suggest where to look next for the cause of this error?
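For context, this particular unsupported type usually means one of the arrays passed to fit has dtype=object with EagerTensors as its elements. Below is a small sketch of one way to coerce such an array to plain float32 before fitting; this is only an assumption about the cause, and to_float_array is a hypothetical helper, not part of the tutorial:
import numpy as np

def to_float_array(x):
    """Coerce a possibly object-dtype array (e.g. of EagerTensors) to float32."""
    x = np.asarray(x)
    if x.dtype == object:
        # Convert element by element, pulling values out of any tensors
        x = np.stack([np.asarray(v, dtype=np.float32) for v in x])
    return x.astype(np.float32)

# Usage before fitting (names as in the question):
# train_X, train_y = to_float_array(train_X), to_float_array(train_y)
# test_X, test_y = to_float_array(test_X), to_float_array(test_y)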

Running keras model on colab TPU

I am currently trying to train a convolutional neural network (CNN) using Keras and the Google Colab GPU.
I found this article, which discussed an option for reducing the time needed to train the model. Since the current training on the GPU is very slow, I tried to implement the method from the article. I have the following code:
sgd = optimizers.SGD(lr=0.02)
model.compile(optimizer=sgd, loss='categorical_crossentropy', metrics=['accuracy'])

def create_train_subsets():
    X_train = []
    y_train = []
    for i in range(80):
        cat = i + 1
        path = 'train_set/by_cat/{}'.format(cat)
        for img in os.listdir(path):
            actual_image = Image.open("train_set/by_cat/{}/{}".format(cat, img))
            X_train.append(actual_image)
            y_train.append(cat)
    return X_train, y_train

# This address identifies the TPU we'll use when configuring TensorFlow.
x_train, y_train = create_train_subsets()
TPU_WORKER = 'grpc://' + os.environ['COLAB_TPU_ADDR']
tf.logging.set_verbosity(tf.logging.INFO)

tpu_model = tf.contrib.tpu.keras_to_tpu_model(
    model,
    strategy=tf.contrib.tpu.TPUDistributionStrategy(
        tf.contrib.cluster_resolver.TPUClusterResolver(TPU_WORKER)))

history = tpu_model.fit(x_train, y_train,
                        epochs=20,
                        batch_size=128 * 8,
                        validation_split=0.2)
tpu_model.save_weights('./tpu_model.h5', overwrite=True)
# tpu_model.evaluate(x_test, y_test, batch_size=128 * 8)
This code, however, gives back the following error:
InvalidArgumentError: No OpKernel was registered to support Op 'ConfigureDistributedTPU' used by node ConfigureDistributedTPU (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) with these attrs: [tpu_embedding_config="", is_global_init=false, embedding_config=""]
Registered devices: [CPU, XLA_CPU]
Registered kernels:
<no registered kernels>
[[ConfigureDistributedTPU]]
I did an extensive search online, but I can't find any indication of what the error means, and I don't understand the process well enough to work out its exact meaning myself.
Is there anybody who can help me understand what is wrong, and who perhaps also knows how to solve it?
Thank you in advance!
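For reference, the registered-device list in the error ([CPU, XLA_CPU]) suggests the runtime has no TPU attached, and the tf.contrib TPU API used above only exists in TensorFlow 1.x. A rough sketch of the TF 2.x (2.3+) way to attach a Colab TPU, with a placeholder model standing in for the real CNN, would be:
import tensorflow as tf

# Connect to the Colab TPU and initialize it (requires the TPU runtime type)
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='')
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

# Build and compile the model inside the TPU strategy scope
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(64, 64, 3)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(80, activation='softmax'),
    ])
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.02),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])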

Warning `tried to deallocate nullptr` when using tensorflow eager execution with tf.keras

As per the TensorFlow team's suggestion, I'm getting used to TensorFlow's eager execution with tf.keras. However, whenever I train a model, I receive a warning (EDIT: actually, I receive this warning repeated many times, more than once per training step, flooding my standard output):
E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
The warning doesn't seem to affect the quality of the training but I wonder what it means and if it is possible to get rid of it.
I use a conda virtual environment with Python 3.7 and TensorFlow 1.12 running on a CPU. (EDIT: a test with Python 3.6 gives the same results.) Minimal code that reproduces the warning follows. Interestingly, it is possible to comment out the line tf.enable_eager_execution() and see that the warnings disappear.
import numpy as np
import tensorflow as tf

tf.enable_eager_execution()

N_EPOCHS = 50
N_TRN = 10000
N_VLD = 1000

# the label is positive if the input is a number larger than 0.5
# a little noise is added, just for fun
x_trn = np.random.random(N_TRN)
x_vld = np.random.random(N_VLD)
y_trn = ((x_trn + np.random.random(N_TRN) * 0.02) > 0.5).astype(float)
y_vld = ((x_vld + np.random.random(N_VLD) * 0.02) > 0.5).astype(float)

# a simple logistic regression
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(1, input_dim=1))
model.add(tf.keras.layers.Activation('sigmoid'))

model.compile(
    optimizer=tf.train.AdamOptimizer(),
    # optimizer=tf.keras.optimizers.Adam(),  # doesn't work at all with tf eager execution
    loss='binary_crossentropy',
    metrics=['accuracy']
)

# Train model on dataset
model.fit(
    x_trn, y_trn,
    epochs=N_EPOCHS,
    validation_data=(x_vld, y_vld),
)

model.summary()
Quick solutions:
Downgrade to TF 1.11: the warning did not appear when I ran the same script in TF 1.11, and the optimization still reached the same final validation accuracy on a synthetic dataset.
OR
Suppress the errors/warnings using the native os module (adapted from https://stackoverflow.com/a/38645250/2374160), i.e., by setting the TensorFlow logging environment variable to not show any error messages:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import tensorflow as tf
More info:
Solving this error in the correct way may require familiarity with the MKL library calls and how they interface with TensorFlow's C++ core (this is beyond my current TF expertise).
In my case, this memory deallocation error occurred whenever the
apply_gradients() method of an optimizer was called. In your script, it is called when the model is being fitted to the training data.
This error is raised from here: tensorflow/core/common_runtime/mkl_cpu_allocator.h
I hope this helps as a temporary solution for convenience.
