I have a computer with a few NVIDIA GPUs. I use the package 'segmentation_models' and build a NN based on Unet:
import segmentation_models as sm
import keras.backend as K
from keras import optimizers
from keras.utils import multi_gpu_model
lr = 2e-4
NUM_GPUS = 3
learning_rate = lr * NUM_GPUS
adam = optimizers.Adam(lr=learning_rate)
def dice_coef(y_true, y_pred, smooth=1):
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)
model = sm.Unet('efficientnetb3', encoder_weights='imagenet', classes=4, activation='softmax', encoder_freeze=False)
parallel_model = multi_gpu_model(model, gpus=NUM_GPUS)
model = parallel_model
model.compile(adam, 'categorical_crossentropy', [dice_coef])
history = model.fit_generator(
    generator=train_gen, steps_per_epoch=len(train_gen),
    validation_data=validation_gen,
    epochs=50, callbacks=[clr, checkpoints, csv_logger],
    initial_epoch=0)
After training, I save the weights for future use in CPU mode:
single_gpu_model = model.layers[-2]
single_gpu_model.save(single_proc_model_path_1_kernel)
And I try to work with these weights:
import keras
model1 = keras.models.load_model(single_proc_model_path_1_kernel)
...
pr_mask = self.model1.predict(img_exp)
Machine for NN training: Ubuntu 16.04.4 LTS, 3 x K80 GPU; python 3.6.7, tensorflow 1.12.0 - all code works here.
Win10 with 1 GeForce GTX 1080; python 3.7.3, tensorflow-gpu 1.13.1 - code works here too.
Win10 without an NVIDIA GPU; tensorflow-gpu 1.13.1 - ERROR when loading the model:
tensorflow/stream_executor/cuda/cuda_driver.cc:300] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
docker with Ubuntu 18.04.3 LTS; python 3.6.9, tensorflow 2.1.0.
Error when loading model:
tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Segmentation Models: using keras framework.
tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (b36a4cf2df2e): /proc/driver/nvidia/version does not exist
What should I change to force the code to work on a machine with CPUs only?
Tensorflow 1.15 resolved all the problems. Thanks.
You can try setting the environment variable CUDA_VISIBLE_DEVICES to either blank or emptystring "", or possibly -1.
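For example (a minimal sketch; the variable has to be set before TensorFlow initializes CUDA, i.e. before the first TensorFlow/Keras import):

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'  # or '': hide all GPUs from TensorFlow

import keras  # imported only after the variable is set
model1 = keras.models.load_model(single_proc_model_path_1_kernel)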
Otherwise you'll need to tell the tensorflow backend to use CPU only.
See also: Can Keras with Tensorflow backend be forced to use CPU or GPU at will?
Note that keras multi_gpu_model is deprecated and you should alter your code to use tf.distribute.MirroredStrategy instead. I haven't personally worked with it but I imagine this new API is designed to work more seamlessly across GPU/CPU situations like yours.
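A rough sketch of what that migration could look like (untested, per the caveat above; it reuses the optimizer settings and dice_coef from the question):

import tensorflow as tf
import segmentation_models as sm

# Build and compile the model inside the strategy scope; TensorFlow then
# mirrors it across all visible GPUs, or falls back to the CPU if none are.
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = sm.Unet('efficientnetb3', encoder_weights='imagenet',
                    classes=4, activation='softmax', encoder_freeze=False)
    model.compile(optimizers.Adam(lr=learning_rate),
                  'categorical_crossentropy', [dice_coef])

# No wrapper layer is added, so a model saved this way can be loaded
# directly on a CPU-only machine.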
Related
I have this Docker container built with the image provided by the Dask project: FROM daskdev/dask
I install TensorFlow and SciKeras in the Dockerfile:
RUN pip3 install tensorflow
RUN pip3 install scikeras
In my Python code I try to train a Keras model:
import joblib
import tensorflow as tf
from scikeras.wrappers import KerasClassifier

niceties = dict(verbose=False)
model = KerasClassifier(build_fn=build_model, lr=0.1, momentum=0.9,
                        loss=tf.keras.losses.MeanSquaredError(), **niceties)
with joblib.parallel_backend('dask'):
    model.fit(X, y, epochs=500)  # <--- here it throws the error
def build_model(lr=0.01, momentum=0.9):
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(50,)))
    model.add(tf.keras.layers.Dense(16, activation='relu'))
    model.add(tf.keras.layers.Dense(1, activation='softmax'))
    return model
The error says that the CUDA/NVIDIA library is not found. That's fine, I don't want to run TensorFlow on the GPU; how do I tell TensorFlow not to use it?
2021-12-23 01:45:33.044350: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pyenv/lib/python3.9/site-packages/clidriver/lib:/usr/lib/x86_64-linux-gnu/odbc
2021-12-23 01:45:33.044395: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2021-12-23 01:45:33.044418: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (9b318a7a9609): /proc/driver/nvidia/version does not exist
2021-12-23 01:45:33.044625: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
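One way to do that (a sketch, not tested inside the daskdev/dask image) is to hide the GPUs from TensorFlow before it initializes them:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'  # must be set before importing tensorflow

import tensorflow as tf
tf.config.set_visible_devices([], 'GPU')  # TF 2.x: additionally mark no GPU as visible
print(tf.config.get_visible_devices())    # should list only CPU devices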
I made a Discord bot in Python and have now added a "Chatbot" feature to it using TensorFlow and NLTK. When I run the bot locally, it works absolutely fine without any issues, but when I move it to my Namecheap hosting package where I host my portfolio, it starts giving this error:
OpenBLAS blas_thread_init: pthread_create failed for thread 29 of 64: Resource temporarily unavailable
and nltk and tensorflow don't get imported, and the bot crashes.
I googled it and found a solution that suggests setting os.environ['OPENBLAS_NUM_THREADS'] = '1' before any imports. This fixed the previous error, but now it gives another one:
Check failed: ret == 0 (11 vs. 0)Thread creation via pthread_create() failed.
The complete output on running python main.py now is:
2021-06-10 11:18:19.606471: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-06-10 11:18:19.606497: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-06-10 11:18:21.090650: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2021-06-10 11:18:21.090684: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
2021-06-10 11:18:21.090716: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (server270.web-hosting.com): /proc/driver/nvidia/version does not exist
2021-06-10 11:18:21.091042: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-06-10 11:18:21.092409: F tensorflow/core/platform/default/env.cc:73] Check failed: ret == 0 (11 vs. 0)Thread creation via pthread_create() failed.
To keep this question from getting too long, the source files are hosted on GitHub here: https://github.com/Nalin-2005/The2020CoderBot, and the README.md tells which files contain which part of the bot.
The bot is being hosted on Namecheap shared hosting and the details and technical specs about the server are:
RAM: 1GB
Storage: 20GB SSD
CPU (used cat /proc/cpuinfo | grep 'model name' | uniq): Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
As far as I know, both issues are caused by the limited RAM and CPU allowance on shared hosting, but now the Python script itself is blocked from running at all.
So, what causes this (if I am not correct), and how can I fix it?
After some brainstorming and googling, I found TensorFlow Lite: it consumes fewer resources on my server while offering the same performance, and I could easily integrate it with the previous code to produce a more resource-efficient model.
For users who want to know how to convert any Keras model to TensorFlow Lite, here are the instructions.
While training, replace model.save("/path/to/model.h5") with:
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
with open("/path/to/model.tflite", "wb") as f:
    f.write(tflite_model)
At inference time, use:
model = tf.lite.Interpreter("/path/to/model.tflite")
model.allocate_tensors()
input_details = model.get_input_details()
output_details = model.get_output_details()
# prepare input data
model.set_tensor(input_details[0]['index'], input_data)
model.invoke()
output_data = model.get_tensor(output_details[0]['index'])
results = np.squeeze(output_data)
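For example, the "# prepare input data" step above could look like this (a hypothetical sketch; the real shape and dtype must match whatever input_details[0] reports for your model):

import numpy as np

# Build an input tensor matching the interpreter's expected shape and dtype
# (float32 is assumed here; check input_details[0]['dtype'] to be sure).
input_shape = input_details[0]['shape']
input_data = np.zeros(input_shape, dtype=np.float32)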
I am trying to run a prediction with a model built in Keras on my NVIDIA Tegra TX2, using TensorFlow and Python (2.7), and quite randomly TensorFlow throws the following exception:
Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4504 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2019-10-04 16:17:50.786531: E tensorflow/stream_executor/cuda/cuda_driver.cc:1032] could not synchronize on CUDA context: CUDA_ERROR_UNKNOWN: unknown error :: *** Begin stack trace ***
stream_executor::gpu::GpuDriver::SynchronizeContext(stream_executor::gpu::GpuContext*)
stream_executor::StreamExecutor::SynchronizeAllActivity()
tensorflow::GPUUtil::SyncAll(tensorflow::Device*)
*** End stack trace ***
...
tensorflow.python.framework.errors_impl.InternalError: GPU sync failed
Sometimes, after a few reboots or some waiting time, the problem resolves itself and I can run the prediction again, but 8 out of 10 times this error appears.
I've already tried the following:
Changing the GPU memory allocation options as follows:
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
config.gpu_options.per_process_gpu_memory_fraction = 0.7
session = tf.Session(config=config, ...)
Re-installing the TensorFlow build for the TX2 and JetPack v3.3.
I would be really happy about any further suggestions.
Since this problem is intermittent, it may be happening when TensorFlow is not getting the memory it requires. It is a known issue and you have already tried the basic troubleshooting steps. Since you are still getting the issue, try the steps below as well:
Reinstall libhdf5-dev and python-h5py:
sudo apt-get install libhdf5-dev
sudo apt-get install python-h5py
and then enable GPU memory growth, as per https://github.com/keras-team/keras/issues/4161#issuecomment-366031228:
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
set_session(sess)
Adding the following lines solved the issue. Note: my TensorFlow version is 2.1.
import tensorflow as tf
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.compat.v1.InteractiveSession(config=config)
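For completeness, a TF 2.x-native sketch of the same idea (assuming at least one GPU is visible) avoids the compat.v1 session entirely:

import tensorflow as tf

# Enable memory growth per GPU instead of creating a v1 session;
# this must run before the GPUs are first initialized.
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)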
I am trying to use CUDA in Google Colab, but while running my program I get the following error.
RuntimeError: Cannot initialize CUDA without ATen_cuda library. PyTorch splits its backend into two shared libraries: a CPU library and a CUDA library; this error has occurred because you are trying to use some CUDA functionality, but the CUDA library has not been loaded by the dynamic linker for some reason. The CUDA library MUST be loaded, EVEN IF you don't directly use any symbols from the CUDA library! One common culprit is a lack of -Wl,--no-as-needed in your link arguments; many dynamic linkers will delete dynamic library dependencies if you don't depend on any of their symbols. You can check if this has occurred by using ldd on your binary to see if there is a dependency on *_cuda.so library.
I have the following libraries installed.
from os.path import exists
from wheel.pep425tags import get_abbr_impl, get_impl_ver, get_abi_tag
platform = '{}{}-{}'.format(get_abbr_impl(), get_impl_ver(), get_abi_tag())
cuda_output = !ldconfig -p|grep cudart.so|sed -e 's/.*\.\([0-9]*\)\.\([0-9]*\)$/cu\1\2/'
accelerator = cuda_output[0] if exists('/dev/nvidia0') else 'cpu'
!pip install -q http://download.pytorch.org/whl/{accelerator}/torch-0.4.1-{platform}-linux_x86_64.whl torchvision
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
import matplotlib.pyplot as plt
import time
import torch
from torch import nn
from torch import optim
import torch.nn.functional as F
from torchvision import datasets, transforms, models
!pip install Pillow==5.3.0
# import the new one
import PIL
And I am trying to run the following code.
for device in ['cpu', 'cuda']:
    criterion = nn.NLLLoss()
    # Only train the classifier parameters, feature parameters are frozen
    optimizer = optim.Adam(model.classifier.parameters(), lr=0.001)
    model.to(device)

    for ii, (inputs, labels) in enumerate(trainloader):
        # Move input and label tensors to the GPU
        inputs, labels = inputs.to(device), labels.to(device)
        start = time.time()
        optimizer.zero_grad()  # reset gradients accumulated by the previous batch
        outputs = model.forward(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        if ii == 3:
            break

    print(f"Device = {device}; Time per batch: {(time.time() - start)/3:.3f} seconds")
Have you selected the runtime as GPU?
Check Runtime > Change runtime type > select the hardware accelerator as GPU.
Have you tried the following?
Go to Menu > Runtime > Change runtime.
Change the hardware accelerator to GPU.
See also: How to install CUDA in Google Colab GPU's
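Once the GPU runtime is active, you can confirm that PyTorch actually sees the device (a quick generic check, not specific to this notebook):

import torch

# Should print True and the GPU's name once a GPU runtime is attached.
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))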
I have been trying to understand this problem for a long time. Please help me.
I'm trying to run the Keras example from the standard examples repository (there).
If I use the CPU, everything works fine; but if I try to use GPU acceleration, it crashes without raising any errors:
# build the model: a single LSTM
print('Build model...')
print(' 1')
model = Sequential()
print(' 2')
model.add(LSTM(128, input_shape=(maxlen, len(chars))))
print(' 3')
model.add(Dense(len(chars)))
print(' 4')
model.add(Activation('softmax'))
print(' 5')
optimizer = RMSprop(lr=0.01)
print(' Compilling')
model.compile(loss='categorical_crossentropy', optimizer=optimizer)
I added some print() calls to better locate the error.
And this is what I get:
runfile('C:/Users/kostya/Desktop/temp/python/test.py', wdir='C:/Users/kostya/Desktop/temp/python/')
Using Theano backend.
Using cuDNN version 5110 on context None
Preallocating 1638/2048 Mb (0.800000) on cuda
Mapped name None to device cuda: GeForce GTX 650 (0000:01:00.0)
WARNING: Preallocating too much memory can prevent cudnn and cublas from working properly
DEVICE: cuda
corpus length: 206433
total chars: 79
nb sequences: 68798
Vectorization...
Build model...
1
2
The kernel stopped, restarting *(translated from Russian)*
I get a similar error if I run it through the standard Python console (python.exe crashes).
I use: Win 10 x64, Python 3.6.1, Anaconda with a separate activated environment, CUDA 8.0, cuDNN 5.1, mkl 2017.0.3, numpy 1.13.0, theano 0.9.0, conda-forge.keras 2.0.2, m2w64-openblas 0.2.19, conda-forge.pygpu 0.6.8, VC 14.0, etc.
That's my .theanorc.txt configuration file. (I'm fairly sure the problem is in here: if I set device = cpu, it works fine, but slowly.)
[global]
floatX = float32
device = cuda
optimizer_including = cudnn
[nvcc]
flags=-LC:\Users\kostya\Anaconda3\envs\keras\libs
compiler_bindir=C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin
[cuda]
root = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0
[dnn]
library_path = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\lib\x64
include_path = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include
[gpuarray]
preallocate = 0.8
You are trying to use a gpuarray backend option (preallocate) with the CUDA backend. From the Theano docs:
This value allocates GPU memory ONLY when using (GpuArray Backend). For the old backend, please see config.lib.cnmem
Try replacing this in your Theano config:
[gpuarray]
preallocate = 0.8
with
[lib]
cnmem = 0.8
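With that change, the relevant parts of the .theanorc.txt from the question would read as follows (a sketch of the suggested edit only; all other sections stay as they are):

[global]
floatX = float32
device = cuda
optimizer_including = cudnn

[lib]
cnmem = 0.8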