I have TensorFlow, Keras, and Python 3.7 installed on Ubuntu Linux, on a machine with an NVIDIA GPU (CUDA) and a CPU.
I followed all the steps in this tutorial:
https://www.youtube.com/watch?v=dj-Jntz-74g
When I run the following code:
# What version of Python do you have?
import sys
import tensorflow.keras
import pandas as pd
import sklearn as sk
import tensorflow as tf
print(f"Tensor Flow Version: {tf.__version__}")
print(f"Keras Version: {tensorflow.keras.__version__}")
print()
print(f"Python {sys.version}")
print(f"Pandas {pd.__version__}")
print(f"Scikit-Learn {sk.__version__}")
gpu = len(tf.config.list_physical_devices('GPU'))>0
print("GPU is", "available" if gpu else "NOT AVAILABLE")
I get these results:
Tensor Flow Version: 2.4.1
Keras Version: 2.4.0
Python 3.7.10 (default, Feb 26 2021, 18:47:35)
[GCC 7.3.0]
Pandas 1.2.3
Scikit-Learn 0.24.1
GPU is available
However, I don't know how to run my Keras model on the GPU. When I train my model and watch the output of nvidia-smi -l 1, GPU usage stays at almost 0% during the run.
from keras import layers
from keras.models import Sequential
from keras.layers import Dense, Conv1D, Flatten
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
from sklearn.metrics import r2_score
from keras.callbacks import EarlyStopping
model = Sequential()
model.add(Conv1D(100, 3, activation="relu", input_shape=(32, 1)))
model.add(Flatten())
model.add(Dense(64, activation="relu"))
model.add(Dense(1, activation="linear"))
model.compile(loss="mse", optimizer="adam", metrics=['mean_squared_error'])
model.summary()
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=70)
history = model.fit(partial_xtrain_CNN, partial_ytrain_CNN, batch_size=100, epochs=1000,
                    verbose=0, validation_data=(xval_CNN, yval_CNN), callbacks=[es])
Do I need to change any part of my code, or add something, to force it to run on the GPU?
Getting TensorFlow to work on a GPU takes a few steps, and they are rather difficult.
First, these frameworks support NVIDIA GPUs much better than other vendors' hardware, so you will have fewer problems if your GPU is an NVIDIA card, and it should be in this list.
Second, you need to install all of the requirements:
1- The latest version of your GPU driver
2- CUDA, installed as shown here
3- Anaconda; add Anaconda to the environment variables while installing.
After completing all the installations, run the following commands in the command prompt.
conda install numba & conda install cudatoolkit
Now, to check the result, use this code:
from numba import jit, cuda
import numpy as np
# to measure exec time
from timeit import default_timer as timer
# normal function to run on cpu
def func(a):
    for i in range(10000000):
        a[i] += 1
# function optimized to run on gpu
# (newer Numba releases may expect target_backend="cuda" instead of target="cuda")
@jit(target="cuda")
def func2(a):
    for i in range(10000000):
        a[i] += 1
if __name__ == "__main__":
    n = 10000000
    a = np.ones(n, dtype=np.float64)
    b = np.ones(n, dtype=np.float32)
    start = timer()
    func(a)
    print("without GPU:", timer() - start)
    start = timer()
    func2(a)
    print("with GPU:", timer() - start)
Parts of this answer are from here, which you can read for more details.
I found a solution to my question.
I think the problem was an incompatibility between the NVIDIA driver, cuDNN, and TensorFlow. My laptop has a new NVIDIA graphics card (RTX 3060) with the NVIDIA Ampere architecture, and it was probably not compatible with the versions I had installed.
Instead, I followed these links to download the 21.02 Docker container and then started it. Everything in this container is provided and tested by NVIDIA, so it should give good performance.
https://docs.nvidia.com/deeplearning/frameworks/tensorflow-wheel-release-notes/tf-wheel-rel.html
https://docs.nvidia.com/deeplearning/frameworks/tensorflow-release-notes/rel_21-02.html#rel_21-02
Also, to install Docker on Linux, you can follow the procedure explained here:
https://towardsdatascience.com/deep-learning-with-docker-container-from-ngc-nvidia-gpu-cloud-58d6d302e4b2
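To put a number on the "almost 0%" reading from the question, the same nvidia-smi figures can be polled from Python while the model trains. This is only a minimal sketch: the helper names parse_gpu_stats and read_gpu_stats are made up for illustration, and it assumes nvidia-smi is on the PATH (it returns an empty list otherwise). Keep in mind that a model this small may leave the GPU mostly idle even when it is being used, so the memory column is often the more telling one.

```python
import shutil
import subprocess

# Ask nvidia-smi for machine-readable output: one "util, mem" line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(csv_text):
    """Turn 'util, mem' lines into a list of (util_percent, mem_mib) tuples."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, mem = (field.strip() for field in line.split(","))
        stats.append((int(util), int(mem)))
    return stats


def read_gpu_stats():
    """Return per-GPU stats, or [] when nvidia-smi is missing, fails, or reports N/A."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True)
    if result.returncode != 0:
        return []
    try:
        return parse_gpu_stats(result.stdout)
    except ValueError:
        # Some GPUs report "[N/A]" instead of a number.
        return []


if __name__ == "__main__":
    for index, (util, mem) in enumerate(read_gpu_stats()):
        print(f"GPU {index}: {util}% busy, {mem} MiB in use")
```

Running this in a second terminal during model.fit shows whether the training process has touched the GPU at all.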
I am trying out detectron2 and want to train the sample model.
When running the following code I get (<class 'RuntimeError'>, RuntimeError('No CUDA GPUs are available'), <traceback object at 0x7f42b094ebc0>). Find below the code:
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()
# import some common libraries
import matplotlib.pyplot as plt
import numpy as np
import cv2
# import some common detectron2 utilities
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog, DatasetCatalog
from detectron2.data.datasets import register_coco_instances
import random
from detectron2.engine import DefaultTrainer
from detectron2.config import get_cfg
import os
# To verify the data loading is correct, let's visualize the annotations of randomly selected samples in the training set:
register_coco_instances("fruits_nuts", {}, "../data/trainval.json", "../data/images")
fruits_nuts_metadata = MetadataCatalog.get("fruits_nuts")
dataset_dicts = DatasetCatalog.get("fruits_nuts")
'''
for d in random.sample(dataset_dicts, 3):
    img = cv2.imread(d["file_name"])
    visualizer = Visualizer(img[:, :, ::-1], metadata=fruits_nuts_metadata, scale=0.5)
    vis = visualizer.draw_dataset_dict(d)
    cv2.imshow('new', vis.get_image()[:, :, ::-1])
    cv2.waitKey(0)
'''
# train model
cfg = get_cfg()
cfg.merge_from_file("../detectron2_repo/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.DATASETS.TRAIN = ("fruits_nuts",)
cfg.DATASETS.TEST = () # no metrics implemented for this dataset
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = "detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl" # initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.02
cfg.SOLVER.MAX_ITER = 300 # 300 iterations seems good enough, but you can certainly train longer
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128 # faster, and good enough for this toy dataset
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 3 # 3 classes (date, fig, hazelnut)
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
I ran the script collect_env.py from torch:
/home/project/.venv/bin/python /home/project/src/collect_env.py
Collecting environment information...
PyTorch version: 1.10.2+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31
Python version: 3.8.10 (default, Nov 26 2021, 20:14:08) [GCC 9.3.0] (64-bit runtime)
Python platform: Linux-5.13.0-27-generic-x86_64-with-glibc2.29
Is CUDA available: False
CUDA runtime version: 10.1.243
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.22.1
[pip3] torch==1.10.2
[pip3] torchvision==0.11.3
[conda] Could not collect
Process finished with exit code 0
My system has an RTX 3080 graphics card. However, it seems that it is not being found.
Any suggestions why?
Is there a way to run the training without CUDA?
I appreciate your replies!
I'm not sure if this will work for you, but here is a Windows user's perspective.
I'm using Detectron2 on Windows 10 with a CUDA-enabled RTX 3060 Laptop GPU.
The first thing to check is CUDA. You can check it with this command:
nvcc -V
It should print a message like this:
C:\Users\User>nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_May__3_19:41:42_Pacific_Daylight_Time_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0
And to check whether your PyTorch build has CUDA enabled, run this in Python (taken from their website):
import torch
torch.cuda.is_available()
According to the system info shared in your question, PyTorch cannot use CUDA on your system ("Is CUDA available: False"), and no GPU or driver was detected ("Nvidia driver version: Could not collect").
As far as I know, the recommended way to run Detectron2 on an (NVIDIA) GPU is to install a CUDA-enabled PyTorch build.
(You can check the PyTorch website and the Detectron2 GitHub repo for more details.)
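One more thing worth checking once the driver is visible: the collect_env output shows a PyTorch wheel built with CUDA 10.2, while RTX 30-series (Ampere, compute capability 8.6) cards need a toolkit of CUDA 11.1 or newer. The sketch below only compares version numbers taken from the wheel tag. The function names and the small lookup table are my own, not part of PyTorch, and the check says nothing about the driver itself.

```python
import re

# Minimum CUDA toolkit (major, minor) able to target each architecture.
# Only two illustrative entries; the keys are labels made up for this sketch.
MIN_CUDA_FOR_ARCH = {
    "ampere_sm86": (11, 1),  # RTX 30-series, e.g. RTX 3060 / RTX 3080
    "turing_sm75": (10, 0),  # RTX 20-series
}


def cuda_from_torch_version(version):
    """Extract (major, minor) from a wheel tag such as '1.10.2+cu102'.

    Returns None when the version string carries no '+cuXYZ' suffix.
    """
    match = re.search(r"\+cu(\d+)$", version)
    if match is None:
        return None
    digits = match.group(1)
    # The last digit is the minor version: 'cu102' -> (10, 2), 'cu113' -> (11, 3).
    return int(digits[:-1]), int(digits[-1])


def build_supports(version, arch):
    """True when the wheel's CUDA toolkit is new enough for the given architecture."""
    cuda = cuda_from_torch_version(version)
    return cuda is not None and cuda >= MIN_CUDA_FOR_ARCH[arch]


print(build_supports("1.10.2+cu102", "ampere_sm86"))  # False
```

On a real install the version string would come from torch.__version__.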
Or, you can use this option:
Add this line to your Python program (see issue #300):
cfg.MODEL.DEVICE = "cpu"
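Rather than hard-coding "cpu", the device could be chosen at run time, so the same script uses the GPU once the driver and a matching PyTorch build are in place. A minimal sketch; pick_device is a hypothetical helper name, and the function falls back to "cpu" when PyTorch cannot be imported at all.

```python
def pick_device():
    """Return 'cuda' when PyTorch can see a GPU, otherwise 'cpu'."""
    try:
        import torch
    except ImportError:
        # PyTorch itself is missing, so nothing can run on the GPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print("training on", device)
# With detectron2 this would then become: cfg.MODEL.DEVICE = device
```

Training Mask R-CNN on the CPU does work, but expect it to be much slower than on a GPU.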
I hope it helps. Cheers.
In Google Colab I've written an IPython notebook in which I build a neural network model, fetch the data from my Google Drive, and train the model.
My code runs without errors and trains the model, but I do not see any improvement when I use the Colab GPU instead of the default CPU. Am I using the GPU correctly, or can TensorFlow not use the Colab GPU?
Some snippets of the code that could relate to this question:
import tensorflow as tf
print(tf.__version__)
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
    raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, BatchNormalization, Flatten, Dense, TimeDistributed, ReLU, ConvLSTM2D, Activation, Dropout, Reshape
Result:
2.0.0-alpha0
Found GPU at: /device:GPU:0
Building the model:
with tf.device("/gpu:0"):
    model = Sequential()
    #layer1
    model.add(
        TimeDistributed(
            TimeDistributed(
                Conv2D(
                    filters=4, kernel_size=(1,10), strides=(1,10), data_format="channels_last"
                )
            ), input_shape=(40, 5, 7, 100, 1), name="LLConv"
        )
    )
    model.add(TimeDistributed(BatchNormalization(axis=4), name="LBNtes"))
    model.add(TimeDistributed(ReLU(), name="LRelu"))
    #print(model.output_shape)#(None, 40, 5, 7, 10, 4)
    #layer2
    model.add(
        TimeDistributed(
            ConvLSTM2D(
                filters=4, kernel_size=(7,3), strides=(1,1), data_format="channels_last", return_sequences=True
            ), name="LConvLST"
        )
    )
    model.add(TimeDistributed(BatchNormalization(axis=4), name="LBN2"))
    model.add(TimeDistributed(Activation("tanh"), name="Ltanh"))
    #print(model.output_shape)#(None, 40, 5, 1, 8, 4)
    model.add(Reshape((40, 5, 8, 4), name="reshape"))
    #layers3
    model.add(
        ConvLSTM2D(
            filters=1, kernel_size=(4,4), strides=(1,1), data_format="channels_last", name="GConvLSTM", return_sequences=True
        )
    )
    model.add(BatchNormalization(axis=3, name="GBN"))
    model.add(Activation("tanh", name="Gtanh"))
    #print(model.output_shape)#(None, 40, 2, 5, 1)
    model.add(TimeDistributed(Flatten()))
    #print(model.output_shape)#(None, 40, 10)
    model.add(Flatten())
    #layer4
    model.add(Dense(10, name="GDense"))
    model.add(BatchNormalization(axis=-1))
    model.add(ReLU())
    model.add(Dropout(0.5))
    #layer5
    model.add(Dense(1, activation="linear"))
    model.compile(
        loss=tf.keras.losses.MeanSquaredError(),
        optimizer=tf.keras.optimizers.Nadam(lr=0.001, decay=1e-6),
        metrics=['mae', 'mse'],
    )
#model.summary()
Training the model:
EPOCHS = 300
BATCH_SIZE = 15
with tf.device("/gpu:0"):
    history = model.fit(train_features, train_labels, epochs=EPOCHS, batch_size=BATCH_SIZE, validation_data=(test_features, test_labels))
Make sure that you have tensorflow-gpu installed.
Try this on a new colab notebook first with GPU kernel enabled.
# Uninstall tensorflow first
!pip uninstall tensorflow -y
# Install tensorflow-gpu (stable version)
!pip install tensorflow-gpu # stable
import tensorflow as tf
# Check version
print(tf.__version__)
from tensorflow.python.client import device_lib
device_lib.list_local_devices()
References
How to upgrade tensorflow with GPU on google colaboratory
https://www.tensorflow.org/install/pip?lang=python3
https://www.tensorflow.org/install/gpu#pip_package
UPDATE: It looks like you would no longer need to install tensorflow-gpu in Colab as when you select GPU runtime, the environment installs tensorflow-gpu under the hood according to this video: Using GPUs in TensorFlow, TensorBoard in notebooks, finding new datasets, & more! (#AskTensorFlow).
If you try to update tensorflow by running pip install tensorflow-gpu, the binary you install may not be tuned for the GPU hardware that Colaboratory provides. Instead, you should use the tensorflow version that comes bundled with Colab.
Currently, this version is 1.15, but you can switch to version 2.X by running %tensorflow_version 2.X. At some point in the future, tensorflow 2.X will become the default.
For more information, see https://colab.research.google.com/notebooks/tensorflow_version.ipynb
I am trying to use CUDA in Google Colab, but while running my program I get the following error.
RuntimeError: Cannot initialize CUDA without ATen_cuda library. PyTorch splits its backend into two shared libraries: a CPU library and a CUDA library; this error has occurred because you are trying to use some CUDA functionality, but the CUDA library has not been loaded by the dynamic linker for some reason. The CUDA library MUST be loaded, EVEN IF you don't directly use any symbols from the CUDA library! One common culprit is a lack of -Wl,--no-as-needed in your link arguments; many dynamic linkers will delete dynamic library dependencies if you don't depend on any of their symbols. You can check if this has occurred by using ldd on your binary to see if there is a dependency on *_cuda.so library.
I have the following libraries installed.
from os.path import exists
from wheel.pep425tags import get_abbr_impl, get_impl_ver, get_abi_tag
platform = '{}{}-{}'.format(get_abbr_impl(), get_impl_ver(), get_abi_tag())
cuda_output = !ldconfig -p|grep cudart.so|sed -e 's/.*\.\([0-9]*\)\.\([0-9]*\)$/cu\1\2/'
accelerator = cuda_output[0] if exists('/dev/nvidia0') else 'cpu'
!pip install -q http://download.pytorch.org/whl/{accelerator}/torch-0.4.1-{platform}-linux_x86_64.whl torchvision
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
import matplotlib.pyplot as plt
import time
import torch
from torch import nn
from torch import optim
import torch.nn.functional as F
from torchvision import datasets, transforms, models
!pip install Pillow==5.3.0
# import the new one
import PIL
And I am trying to run the following code.
for device in ['cpu', 'cuda']:
    criterion = nn.NLLLoss()
    # Only train the classifier parameters, feature parameters are frozen
    optimizer = optim.Adam(model.classifier.parameters(), lr=0.001)
    model.to(device)
    for ii, (inputs, labels) in enumerate(trainloader):
        # Move input and label tensors to the GPU
        inputs, labels = inputs.to(device), labels.to(device)
        start = time.time()
        outputs = model.forward(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        if ii==3:
            break
    print(f"Device = {device}; Time per batch: {(time.time() - start)/3:.3f} seconds")
Have you selected GPU as the runtime type?
Check Runtime > Change runtime type and select GPU as the hardware accelerator.
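After switching the runtime, a quick check can confirm that a GPU is actually attached before the loop tries 'cuda'. The sketch below reuses the /dev/nvidia0 test from the install cell in the question; available_devices is a hypothetical helper name. Note that the device node only shows that the runtime has a GPU, not that the installed PyTorch wheel was built with CUDA support.

```python
import os


def available_devices(gpu_node="/dev/nvidia0"):
    """List the devices worth benchmarking: always 'cpu', plus 'cuda' when the GPU node exists."""
    devices = ["cpu"]
    if os.path.exists(gpu_node):
        devices.append("cuda")
    return devices


print(available_devices())
# The timing loop could then start with: for device in available_devices():
```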
Have you tried the following?
Go to Menu > Runtime > Change runtime type.
Change the hardware accelerator to GPU.
How to install CUDA in Google Colab GPU's
I'm having some trouble with tensorflow-gpu 1.6.0.
I'm doing the final assignment of the "Bayesian Methods in Machine Learning" class on Coursera.
https://www.coursera.org/learn/bayesian-methods-in-machine-learning
When I run the code on the GPU with tensorflow-gpu (pip install tensorflow-gpu), Python crashes, but if I run the same code on the CPU with standard tensorflow (pip install tensorflow), it runs fast without errors or crashes. Of course, I uninstalled the GPU version before installing the standard version, and vice versa.
About the python crash, the debugger shows this message:
Unhandled exception at 0x00007FFDAB4DB79E (ucrtbase.dll) in python.exe
This is the starter code:
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import clear_output
import tensorflow as tf
import GPy
import GPyOpt
import keras
from keras.layers import Input, Dense, Lambda, InputLayer, concatenate, Activation, Flatten, Reshape
from keras.layers.normalization import BatchNormalization
from keras.layers.convolutional import Conv2D, Deconv2D
from keras.losses import MSE
from keras.models import Model, Sequential
from keras import backend as K
from keras import metrics
from keras.datasets import mnist
from keras.utils import np_utils
from tensorflow.python.framework import ops
from tensorflow.python.framework import dtypes
import utils
import os
%matplotlib inline
sess = tf.InteractiveSession()
K.set_session(sess)
latent_size = 8
vae, encoder, decoder = utils.create_vae(batch_size=128, latent=latent_size)
sess.run(tf.global_variables_initializer())
vae.load_weights('CelebA_VAE_small_8.h5')
K.set_learning_phase(False)
latent_placeholder = tf.placeholder(tf.float32, (1, latent_size))
decode = decoder(latent_placeholder)
This code crashes Python when it is executed on the GPU but NOT on the CPU:
plt.figure(figsize=(10, 10))
for i in range(25):
    plt.subplot(5, 5, i+1)
    image = sess.run(decode, feed_dict={latent_placeholder: np.random.normal([0]*latent_size,[1]*latent_size)[:, np.newaxis].T})[0]### YOUR CODE HERE
    plt.imshow(np.clip(image, 0, 1))
    plt.axis('off')
Additional Information:
python version 3.6.4
tensorflow 1.6.0
tensorflow-gpu 1.6.0
cuDNN 7.1.1 for CUDA 9.0
CUDA 9.0 with patch 1 and 2
GPU 1080ti with driver 391.01
You can find the python notebook and the weights on wetransfer:
https://wetransfer.com/downloads/59b9011823d38c204b5ef5a2b58f5e8e20180311201808/32c900
I found the issue: cuDNN 7.1.1 doesn't work with tensorflow-gpu 1.6.0 yet. I downgraded cuDNN to 7.0.5 and now the code works as expected.
If you have an issue like mine, you have to downgrade cuDNN!
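To see which cuDNN version is actually installed before and after the downgrade, the version defines in cudnn.h can be read directly. A minimal sketch, assuming a cuDNN 7.x header that contains the CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL defines; the function name is my own, and the header's location depends on where cuDNN was copied (often the include folder of the CUDA install).

```python
import re


def cudnn_version_from_header(header_text):
    """Return (major, minor, patch) from the CUDNN_* defines, or None if any is missing."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"^\s*#define\s+{name}\s+(\d+)", header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


# Header excerpt written by hand for illustration, not copied from a real install.
sample = """
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 5
"""
print(cudnn_version_from_header(sample))  # (7, 0, 5)
```

On a real machine, pass the contents of the header file, for example open(path_to_cudnn_h).read(), where path_to_cudnn_h is whatever path applies on your system.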
I have been struggling with a problem for a long time. Please help me.
I'm trying to run the Keras example from the standard examples Git repository (there).
If I use the CPU, everything works fine. But if I try to use GPU acceleration, it crashes WITHOUT raising any catchable error:
# build the model: a single LSTM
print('Build model...')
print(' 1')
model = Sequential()
print(' 2')
model.add(LSTM(128, input_shape=(maxlen, len(chars))))
print(' 3')
model.add(Dense(len(chars)))
print(' 4')
model.add(Activation('softmax'))
print(' 5')
optimizer = RMSprop(lr=0.01)
print(' Compilling')
model.compile(loss='categorical_crossentropy', optimizer=optimizer)
I added some print() calls to better locate the error.
This is what I get:
runfile('C:/Users/kostya/Desktop/temp/python/test.py', wdir='C:/Users/kostya/Desktop/temp/python/')
Using Theano backend.
Using cuDNN version 5110 on context None
Preallocating 1638/2048 Mb (0.800000) on cuda
Mapped name None to device cuda: GeForce GTX 650 (0000:01:00.0)
WARNING: Preallocating too much memory can prevent cudnn and cublas from working properly
DEVICE: cuda
corpus length: 206433
total chars: 79
nb sequences: 68798
Vectorization...
Build model...
1
2
The kernel has stopped, restarting (translated from the Russian message "Ядро остановилось, перезапуск")
I get a similar error if I run it through the standard Python console (python.exe stops abruptly).
I use: Win 10-64, Python 3.6.1, Anaconda with a separate activated environment, CUDA 8.0, cuDNN 5.1, mkl 2017.0.3, numpy 1.13.0, theano 0.9.0, conda-forge.keras 2.0.2, m2w64-openblas 0.2.19, conda-forge.pygpu 0.6.8, VC 14.0 etc.
This is my .theanorc.txt configuration file. (I'm fairly sure the cause is somewhere in it: if I set device = cpu, everything works fine, but slowly.)
[global]
floatX = float32
device = cuda
optimizer_including = cudnn
[nvcc]
flags=-LC:\Users\kostya\Anaconda3\envs\keras\libs
compiler_bindir=C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin
[cuda]
root = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0
[dnn]
library_path = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\lib\x64
include_path = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include
[gpuarray]
preallocate = 0.8
You are trying to use a gpuarray backend option (preallocate) with the CUDA backend. From the Theano docs:
This value allocates GPU memory ONLY when using (GpuArray Backend). For the old backend, please see config.lib.cnmem
Try replacing this in your Theano config:
[gpuarray]
preallocate = 0.8
with
[lib]
cnmem = 0.8
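If you prefer to make that edit with a script, the swap can be done with Python's configparser. This is just a sketch of the replacement suggested above; the function name move_preallocate_to_cnmem is made up, and configparser drops any comments in the file when it writes the result back.

```python
import configparser
import io


def move_preallocate_to_cnmem(config_text):
    """Move [gpuarray] preallocate to [lib] cnmem and return the new config text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names such as floatX case-sensitive
    parser.read_string(config_text)
    if parser.has_option("gpuarray", "preallocate"):
        value = parser.get("gpuarray", "preallocate")
        parser.remove_option("gpuarray", "preallocate")
        if not parser.options("gpuarray"):
            # Drop the section once it is empty.
            parser.remove_section("gpuarray")
        if not parser.has_section("lib"):
            parser.add_section("lib")
        parser.set("lib", "cnmem", value)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Shortened config for illustration; the real file has more sections.
sample = """
[global]
floatX = float32
device = cuda

[gpuarray]
preallocate = 0.8
"""
print(move_preallocate_to_cnmem(sample))
```

The returned text can then be written back over .theanorc.txt after keeping a backup copy of the original.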