I was trying to run a model in Keras but got:
terminate called after throwing an instance of 'std::bad_alloc'
This doesn't make sense, since I'm running the same U-Net as before. I did make changes to CUDA, so I'm guessing that's the cause.
Whenever I use TensorFlow (version 2.3.0 on Ubuntu 16 with an NVIDIA GPU) and try
gpus = tf.config.experimental.list_physical_devices('GPU')
gpus is an empty list, and the log says:
Successfully opened dynamic library libcudart.so.10.1
2020-09-14 16:39:11.975096: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcublas.so.10'; dlerror: libcublas.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.0/lib64:/usr/local/cuda-10.0/lib64::/usr/local/cuda-10.0/lib64::/usr/local/cuda-11.0/lib64::/usr/local/cuda-11.0/lib64
2020-09-14 16:39:11.975158: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-09-14 16:39:11.975197: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-09-14 16:39:11.975232: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-09-14 16:39:11.975380: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcusparse.so.10'; dlerror: libcusparse.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.0/lib64:/usr/local/cuda-10.0/lib64::/usr/local/cuda-10.0/lib64::/usr/local/cuda-11.0/lib64::/usr/local/cuda-11.0/lib64
2020-09-14 16:39:11.975436: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
even though I set
export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}
export PATH=/usr/local/cuda-10.0/bin:/usr/local/cuda-10.0/NsightCompute-1.0${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64:${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
and which nvcc shows
/usr/local/cuda-10.0/bin/nvcc
and entering $LD_LIBRARY_PATH at the shell prompt (so bash tries to run it as a command) prints
bash: /usr/local/cuda-10.0/lib64::/usr/local/cuda-11.0/lib64:/usr/local/cuda-11.0/extras/CUPTI/lib64:/usr/local/cuda/lib64:/usr/local/cuda-11.0/extras/CUPTI/lib64:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-11.0/lib64: No such file or directory
and ~/.bashrc shows
export PATH="$PATH:/usr/local/cuda-10.0/bin"
export LD_LIBRARY_PATH="/usr/local/cuda-10.0/lib64"${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
Can anyone help?
EDIT
Output of sudo find / -name "libcublas*" is below:
/usr/share/doc/libcublas7.5
/usr/share/doc/libcublas-11-0
/usr/share/lintian/overrides/libcublas7.5
/usr/share/man/man7/libcublas.so.7.gz
/usr/share/man/man7/libcublas.7.gz
/usr/local/MATLAB/R2018a/bin/glnxa64/libcublas.so.9.0.176
/usr/local/MATLAB/R2018a/bin/glnxa64/libcublas.so.9.0
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcublas.so.11
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcublasLt.so.11
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcublas.so.11.2.0.252
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcublasLt.so.11.2.0.252
/usr/local/cuda-10.0/doc/man/man7/libcublas.so.7
/usr/local/cuda-10.0/doc/man/man7/libcublas.7
/usr/local/cuda-10.0/targets/x86_64-linux/lib/stubs/libcublas.so
/usr/local/cuda-10.0/targets/x86_64-linux/lib/libcublas.so
/usr/local/cuda-10.0/targets/x86_64-linux/lib/libcublas.so.10.0.130
/usr/local/cuda-10.0/targets/x86_64-linux/lib/libcublas_static.a
/usr/local/cuda-10.0/targets/x86_64-linux/lib/libcublas.so.10.0
/usr/lib/x86_64-linux-gnu/libcublas.so.7.5.18
/usr/lib/x86_64-linux-gnu/stubs/libcublas.so
/usr/lib/x86_64-linux-gnu/libcublas.so
/usr/lib/x86_64-linux-gnu/libcublas.so.7.5
/usr/lib/x86_64-linux-gnu/libcublas_device.a
/usr/lib/x86_64-linux-gnu/libcublas_static.a
find: ‘/run/user/1000/gvfs’: Permission denied
/home/me/.julia/packages/CuArrays/clDeS/src/blas/libcublas_types.jl
/home/me/.julia/packages/CuArrays/clDeS/src/blas/libcublas.jl
/home/me/Downloads/pgilinux-2019-1910-x86-64/install_components/linux86-64-nollvm/19.10/lib/libcublas.ipl
/home/me/Downloads/pgilinux-2019-1910-x86-64/install_components/linux86-64-nollvm/19.10/lib/libcublasemu.so
/home/me/Downloads/pgilinux-2019-1910-x86-64/install_components/linux86-64-nollvm/19.10/lib/libcublasemu.a
/home/me/Downloads/pgilinux-2019-1910-x86-64/install_components/linux86-64-nollvm/19.10/REDIST/libcublasemu.so
/home/me/Downloads/pgilinux-2019-1910-x86-64/install_components/linux86-64-llvm/19.10/lib/libcublas.ipl
/home/me/Downloads/install_components/linux86-64-nollvm/19.10/lib/libcublas.ipl
/home/me/Downloads/install_components/linux86-64-nollvm/19.10/lib/libcublasemu.so
/home/me/Downloads/install_components/linux86-64-nollvm/19.10/lib/libcublasemu.a
/home/me/Downloads/install_components/linux86-64-nollvm/19.10/REDIST/libcublasemu.so
/home/me/Downloads/install_components/linux86-64-llvm/19.10/lib/libcublas.ipl
/opt/pgi/linux86-64-nollvm/19.10/lib/libcublas.ipl
/opt/pgi/linux86-64-nollvm/19.10/lib/libcublasemu.so
/opt/pgi/linux86-64-nollvm/19.10/lib/libcublasemu.a
/opt/pgi/linux86-64-nollvm/19.10/REDIST/libcublasemu.so
/opt/pgi/linux86-64-nollvm/2019/cuda/9.2/lib64/libcublas.so.9.2.113
/opt/pgi/linux86-64-nollvm/2019/cuda/9.2/lib64/libcublas.so
/opt/pgi/linux86-64-nollvm/2019/cuda/9.2/lib64/libcublas_device.a
/opt/pgi/linux86-64-nollvm/2019/cuda/9.2/lib64/libcublas_static.a
/opt/pgi/linux86-64-nollvm/2019/cuda/9.2/lib64/libcublas.so.9.2
/opt/pgi/linux86-64-nollvm/2019/cuda/10.1/lib64/libcublasLt.so.10
/opt/pgi/linux86-64-nollvm/2019/cuda/10.1/lib64/libcublasLt_static.a
/opt/pgi/linux86-64-nollvm/2019/cuda/10.1/lib64/libcublas.so
/opt/pgi/linux86-64-nollvm/2019/cuda/10.1/lib64/libcublasLt.so.10.2.1.243
/opt/pgi/linux86-64-nollvm/2019/cuda/10.1/lib64/libcublas.so.10.2.1.243
/opt/pgi/linux86-64-nollvm/2019/cuda/10.1/lib64/libcublasLt.so
/opt/pgi/linux86-64-nollvm/2019/cuda/10.1/lib64/libcublas_static.a
/opt/pgi/linux86-64-nollvm/2019/cuda/10.1/lib64/libcublas.so.10
/opt/pgi/linux86-64-nollvm/2019/cuda/10.0/lib64/libcublas.so
/opt/pgi/linux86-64-nollvm/2019/cuda/10.0/lib64/libcublas.so.10.0.130
/opt/pgi/linux86-64-nollvm/2019/cuda/10.0/lib64/libcublas_static.a
/opt/pgi/linux86-64-nollvm/2019/cuda/10.0/lib64/libcublas.so.10.0
/opt/pgi/linux86-64-llvm/19.10/lib/libcublas.ipl
/opt/pgi/linux86-64-llvm/2019/cuda/9.2/lib64/libcublas.so.9.2.113
/opt/pgi/linux86-64-llvm/2019/cuda/9.2/lib64/libcublas.so
/opt/pgi/linux86-64-llvm/2019/cuda/9.2/lib64/libcublas_device.a
/opt/pgi/linux86-64-llvm/2019/cuda/9.2/lib64/libcublas_static.a
/opt/pgi/linux86-64-llvm/2019/cuda/9.2/lib64/libcublas.so.9.2
/opt/pgi/linux86-64-llvm/2019/cuda/10.1/lib64/libcublasLt.so.10
/opt/pgi/linux86-64-llvm/2019/cuda/10.1/lib64/libcublasLt_static.a
/opt/pgi/linux86-64-llvm/2019/cuda/10.1/lib64/libcublas.so
/opt/pgi/linux86-64-llvm/2019/cuda/10.1/lib64/libcublasLt.so.10.2.1.243
/opt/pgi/linux86-64-llvm/2019/cuda/10.1/lib64/libcublas.so.10.2.1.243
/opt/pgi/linux86-64-llvm/2019/cuda/10.1/lib64/libcublasLt.so
/opt/pgi/linux86-64-llvm/2019/cuda/10.1/lib64/libcublas_static.a
/opt/pgi/linux86-64-llvm/2019/cuda/10.1/lib64/libcublas.so.10
/opt/pgi/linux86-64-llvm/2019/cuda/10.0/lib64/libcublas.so
/opt/pgi/linux86-64-llvm/2019/cuda/10.0/lib64/libcublas.so.10.0.130
/opt/pgi/linux86-64-llvm/2019/cuda/10.0/lib64/libcublas_static.a
/opt/pgi/linux86-64-llvm/2019/cuda/10.0/lib64/libcublas.so.10.0
/var/lib/dpkg/info/libcublas-11-0.md5sums
/var/lib/dpkg/info/libcublas-11-0.list
/var/lib/dpkg/info/libcublas7.5:amd64.list
/var/lib/dpkg/info/libcublas7.5:amd64.triggers
/var/lib/dpkg/info/libcublas7.5:amd64.md5sums
/var/lib/dpkg/info/libcublas7.5:amd64.shlibs
/var/lib/dpkg/info/libcublas7.5:amd64.symbols
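The LD_LIBRARY_PATH shown above mixes CUDA 10.0 and 11.0 directories and contains empty ("::") and repeated entries, while the log shows TensorFlow 2.3.0 asking for libcublas.so.10. In the find output, cuda-10.0 only provides libcublas.so.10.0 and cuda-11.0 provides libcublas.so.11; the only files named exactly libcublas.so.10 are the PGI-bundled copies under /opt/pgi/linux86-64-nollvm/2019/cuda/10.1/lib64 and /opt/pgi/linux86-64-llvm/2019/cuda/10.1/lib64. Below is a small stdlib-only sketch (the function name is my own, hypothetical) that audits the variable and reports which of its directories contain a given library file. It is a diagnostic aid under a simplifying assumption: it only looks at LD_LIBRARY_PATH, whereas the real dynamic loader also consults its cache and default directories.

```python
import os
from collections import Counter


def audit_ld_library_path(value, soname):
    """Summarise a colon-separated LD_LIBRARY_PATH value.

    Reports empty entries (from '::' or a leading/trailing ':'), repeated
    entries, entries that are not existing directories, and which entries
    contain a file named `soname` (symlinks are followed).
    """
    entries = value.split(":") if value else []
    non_empty = [e for e in entries if e]
    counts = Counter(non_empty)
    # Keep the loader's search order but list each directory once.
    unique_in_order = list(dict.fromkeys(non_empty))
    return {
        "empty_entries": len(entries) - len(non_empty),
        "duplicates": [e for e in unique_in_order if counts[e] > 1],
        "missing_dirs": [e for e in unique_in_order if not os.path.isdir(e)],
        "dirs_with_soname": [
            e for e in unique_in_order if os.path.isfile(os.path.join(e, soname))
        ],
    }


if __name__ == "__main__":
    # libcublas.so.10 is the name TensorFlow 2.3.0 failed to load in the log above.
    report = audit_ld_library_path(os.environ.get("LD_LIBRARY_PATH", ""), "libcublas.so.10")
    for key, val in report.items():
        print(f"{key}: {val}")
```

If dirs_with_soname comes back empty, no directory on LD_LIBRARY_PATH provides the file TensorFlow is asking for, which matches the "cannot open shared object file" warnings.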
I had the same problem. I went to https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ and downloaded and installed libcublas10_10.1.0.105-1_amd64.deb.
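To confirm that a fix like this worked, you can ask the dynamic loader directly whether the libraries TensorFlow reported as missing can now be opened. Here is a minimal sketch using Python's ctypes, which calls dlopen on Linux (the helper name is hypothetical). LD_LIBRARY_PATH is read when a process starts, so run it from a new shell after changing the variable or installing packages. It only tests that the file loads, not that its version matches what TensorFlow expects.

```python
import ctypes


def try_load(sonames):
    """Try to dlopen each shared library; map name -> None (loaded) or error text."""
    results = {}
    for name in sonames:
        try:
            ctypes.CDLL(name)
            results[name] = None  # the loader found and opened the library
        except OSError as exc:
            results[name] = str(exc)  # e.g. "...: cannot open shared object file..."
    return results


if __name__ == "__main__":
    # The two libraries TensorFlow 2.3.0 could not load in the question's log.
    for name, err in try_load(["libcublas.so.10", "libcusparse.so.10"]).items():
        print(name, "OK" if err is None else f"FAILED: {err}")
```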
I've tried using the TensorFlow GPU accelerator in Google Colab with a local runtime on my machine, which has the following system information:
OS Platform and Distribution: Windows 10
TensorFlow version: 2.1
Python version: 3.6.10
CUDA/cuDNN version: cuDNN 7.5.6, CUDA 10.1
GPU: NVIDIA GeForce RTX 2060
I've followed all the steps on https://www.tensorflow.org/install/gpu precisely and then ran this code to check whether it can discover my GPU and to compare its speed with the CPU:
import tensorflow as tf
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
    raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))
import timeit
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
    print(
        '\n\nThis error most likely means that this notebook is not '
        'configured to use a GPU. Change this in Notebook Settings via the '
        'command palette (cmd/ctrl-shift-P) or the Edit menu.\n\n')
    raise SystemError('GPU device not found')
def cpu():
    with tf.device('/cpu:0'):
        random_image_cpu = tf.random.normal((100, 100, 100, 3))
        net_cpu = tf.keras.layers.Conv2D(32, 7)(random_image_cpu)
        return tf.math.reduce_sum(net_cpu)
def gpu():
    with tf.device('/device:GPU:0'):
        random_image_gpu = tf.random.normal((100, 100, 100, 3))
        net_gpu = tf.keras.layers.Conv2D(32, 7)(random_image_gpu)
        return tf.math.reduce_sum(net_gpu)
# We run each op once to warm up; see: https://stackoverflow.com/a/45067900
cpu()
gpu()
# Run the op several times.
print('Time (s) to convolve 32x7x7x3 filter over random 100x100x100x3 images '
      '(batch x height x width x channel). Sum of ten runs.')
print('CPU (s):')
cpu_time = timeit.timeit('cpu()', number=10, setup="from __main__ import cpu")
print(cpu_time)
print('GPU (s):')
gpu_time = timeit.timeit('gpu()', number=10, setup="from __main__ import gpu")
print(gpu_time)
print('GPU speedup over CPU: {}x'.format(int(cpu_time/gpu_time)))
It returned the following error:
Found GPU at: /device:GPU:0
---------------------------------------------------------------------------
UnknownError Traceback (most recent call last)
<ipython-input-1-121519b30cf2> in <module>
29 # We run each op once to warm up; see: https://stackoverflow.com/a/45067900
30 cpu()
---> 31 gpu()
32
33 # Run the op several times.
<ipython-input-1-121519b30cf2> in gpu()
24 with tf.device('/device:GPU:0'):
25 random_image_gpu = tf.random.normal((100, 100, 100, 3))
---> 26 net_gpu = tf.keras.layers.Conv2D(32, 7)(random_image_gpu)
27 return tf.math.reduce_sum(net_gpu)
28
~\anaconda3\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py in __call__(self, inputs, *args, **kwargs)
820 with base_layer_utils.autocast_context_manager(
821 self._compute_dtype):
--> 822 outputs = self.call(cast_inputs, *args, **kwargs)
823 self._handle_activity_regularization(inputs, outputs)
824 self._set_mask_metadata(inputs, outputs, input_masks)
~\anaconda3\lib\site-packages\tensorflow_core\python\keras\layers\convolutional.py in call(self, inputs)
207 inputs = array_ops.pad(inputs, self._compute_causal_padding())
208
--> 209 outputs = self._convolution_op(inputs, self.kernel)
210
211 if self.use_bias:
~\anaconda3\lib\site-packages\tensorflow_core\python\ops\nn_ops.py in __call__(self, inp, filter)
1133 call_from_convolution=False)
1134 else:
-> 1135 return self.conv_op(inp, filter)
1136 # copybara:strip_end
1137 # copybara:insert return self.conv_op(inp, filter)
~\anaconda3\lib\site-packages\tensorflow_core\python\ops\nn_ops.py in __call__(self, inp, filter)
638
639 def __call__(self, inp, filter): # pylint: disable=redefined-builtin
--> 640 return self.call(inp, filter)
641
642
~\anaconda3\lib\site-packages\tensorflow_core\python\ops\nn_ops.py in __call__(self, inp, filter)
237 padding=self.padding,
238 data_format=self.data_format,
--> 239 name=self.name)
240
241
~\anaconda3\lib\site-packages\tensorflow_core\python\ops\nn_ops.py in conv2d(input, filter, strides, padding, use_cudnn_on_gpu, data_format, dilations, name, filters)
2009 data_format=data_format,
2010 dilations=dilations,
-> 2011 name=name)
2012
2013
~\anaconda3\lib\site-packages\tensorflow_core\python\ops\gen_nn_ops.py in conv2d(input, filter, strides, padding, use_cudnn_on_gpu, explicit_paddings, data_format, dilations, name)
931 input, filter, strides=strides, use_cudnn_on_gpu=use_cudnn_on_gpu,
932 padding=padding, explicit_paddings=explicit_paddings,
--> 933 data_format=data_format, dilations=dilations, name=name, ctx=_ctx)
934 except _core._SymbolicException:
935 pass # Add nodes to the TensorFlow graph.
~\anaconda3\lib\site-packages\tensorflow_core\python\ops\gen_nn_ops.py in conv2d_eager_fallback(input, filter, strides, padding, use_cudnn_on_gpu, explicit_paddings, data_format, dilations, name, ctx)
1020 explicit_paddings, "data_format", data_format, "dilations", dilations)
1021 _result = _execute.execute(b"Conv2D", 1, inputs=_inputs_flat, attrs=_attrs,
-> 1022 ctx=ctx, name=name)
1023 if _execute.must_record_gradient():
1024 _execute.record_gradient(
~\anaconda3\lib\site-packages\tensorflow_core\python\eager\execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
65 else:
66 message = e.message
---> 67 six.raise_from(core._status_to_exception(e.code, message), None)
68 except TypeError as e:
69 keras_symbolic_tensors = [
~\anaconda3\lib\site-packages\six.py in raise_from(value, from_value)
UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [Op:Conv2D]
And this is the log from the Jupyter terminal:
2020-08-09 04:37:22.168805: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-08-09 04:37:24.322956: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-08-09 04:37:24.329330: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-08-09 04:37:25.599803: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2060 computeCapability: 7.5
coreClock: 1.335GHz coreCount: 30 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 312.97GiB/s
2020-08-09 04:37:25.607874: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-08-09 04:37:25.616921: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-08-09 04:37:25.626584: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-08-09 04:37:25.635135: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-08-09 04:37:25.650044: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-08-09 04:37:25.659390: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-08-09 04:37:25.681098: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-08-09 04:37:25.686397: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-08-09 04:37:26.217444: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-09 04:37:26.222044: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0
2020-08-09 04:37:26.225124: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N
2020-08-09 04:37:26.228586: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/device:GPU:0 with 4604 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-08-09 04:37:26.239786: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2060 computeCapability: 7.5
coreClock: 1.335GHz coreCount: 30 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 312.97GiB/s
2020-08-09 04:37:26.249100: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-08-09 04:37:26.254350: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-08-09 04:37:26.260971: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-08-09 04:37:26.265307: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-08-09 04:37:26.271569: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-08-09 04:37:26.276251: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-08-09 04:37:26.281798: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-08-09 04:37:26.287682: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-08-09 04:37:26.291846: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-09 04:37:26.298235: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0
2020-08-09 04:37:26.300794: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N
2020-08-09 04:37:26.305262: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/device:GPU:0 with 4604 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-08-09 04:37:26.313775: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2060 computeCapability: 7.5
coreClock: 1.335GHz coreCount: 30 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 312.97GiB/s
2020-08-09 04:37:26.328318: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-08-09 04:37:26.339994: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-08-09 04:37:26.345874: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-08-09 04:37:26.352587: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-08-09 04:37:26.359694: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-08-09 04:37:26.365286: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-08-09 04:37:26.371099: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-08-09 04:37:26.375749: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-08-09 04:37:26.380113: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2060 computeCapability: 7.5
coreClock: 1.335GHz coreCount: 30 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 312.97GiB/s
2020-08-09 04:37:26.393424: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-08-09 04:37:26.403150: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-08-09 04:37:26.408577: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-08-09 04:37:26.423141: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-08-09 04:37:26.428838: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-08-09 04:37:26.434061: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-08-09 04:37:26.438479: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-08-09 04:37:26.443288: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-08-09 04:37:26.446511: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-09 04:37:26.453204: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0
2020-08-09 04:37:26.458931: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N
2020-08-09 04:37:26.463016: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4604 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-08-09 04:37:26.823644: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-08-09 04:37:27.877441: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
2020-08-09 04:37:27.882143: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
I've tried the solutions mentioned here https://forums.developer.nvidia.com/t/could-not-create-cudnn-handle-cudnn-status-alloc-failed/108261/2 but to no avail. I hope someone here can help me with this.
If tf.config.list_physical_devices('GPU') returns a device with device_type='GPU', there is no problem with the TensorFlow GPU installation itself.
As a next step, you can try managing GPU memory by enabling memory growth.
This can be done by calling tf.config.experimental.set_memory_growth, which attempts to allocate only as much GPU memory as the runtime allocations need: it starts out allocating very little memory, and as the program runs and needs more GPU memory, the GPU memory region allocated to the TensorFlow process is extended.
To turn on memory growth for a specific GPU, use the following code before allocating any tensors or executing any ops.
import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)
For more details, please refer to the TensorFlow GPU guide.
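If you cannot call set_memory_growth early enough (for example, in a notebook where TensorFlow has already touched the GPU), TensorFlow also reads the TF_FORCE_GPU_ALLOW_GROWTH environment variable, which, as far as I know, enables the same on-demand allocation for all GPUs. A minimal sketch, assuming it runs at the very start of a fresh process or kernel:

```python
import os

# Must be set before TensorFlow initialises the GPU; in Jupyter, restart the
# kernel first, then run this cell before anything imports TensorFlow.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

# import tensorflow as tf  # import only after the variable is set
```

On Windows you can instead run set TF_FORCE_GPU_ALLOW_GROWTH=true in the Command Prompt before starting Jupyter, so that the kernel inherits it.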
I'm attempting to run a package, SAVERX (https://github.com/jingshuw/SAVERX), via R 4.0.2; it uses sctransfer (https://github.com/jingshuw/sctransfer) as its basis. I'm running into this error regarding rmsprop:
[1] "Use a pretrained model: No"
[1] "Processed file saved as: 1596347497.19716/tmpdata.rds"
[1] "Data preprocessed ..."
2020-08-02 08:51:45.539119: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib/R/lib:/usr/lib/x86_64-linux-gnu:/usr/lib/jvm/default-java/lib/server
2020-08-02 08:51:45.539149: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
//usr/local/lib/python3.8/dist-packages/scanpy/api/__init__.py:3: FutureWarning:
In a future version of Scanpy, `scanpy.api` will be removed.
Simply use `import scanpy as sc` and `import scanpy.external as sce` instead.
warnings.warn(
[1] "Python module sctransfer imported ..."
[1] "Cross-validation round: 1"
2020-08-02 08:51:48.615482: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib/R/lib:/usr/lib/x86_64-linux-gnu:/usr/lib/jvm/default-java/lib/server
2020-08-02 08:51:48.615506: W tensorflow/stream_executor/cuda/cuda_driver.cc:312] failed call to cuInit: UNKNOWN ERROR (303)
2020-08-02 08:51:48.615521: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (TJC-Ubuntu): /proc/driver/nvidia/version does not exist
2020-08-02 08:51:48.615698: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-08-02 08:51:48.621149: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 1999965000 Hz
2020-08-02 08:51:48.621392: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55b0b3ac1b60 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-02 08:51:48.621406: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
Error in py_call_impl(callable, dots$args, dots$keywords) :
KeyError: 'rmsprop'
Detailed traceback:
File "//usr/local/lib/python3.8/dist-packages/sctransfer/api.py", line 84, in autoencode
loss = train(adata[adata.obs.DCA_split == 'train'],
File "//usr/local/lib/python3.8/dist-packages/sctransfer/train.py", line 46, in train
optimizer = opt.__dict__[optimizer](clipvalue=clip_grad)
Timing stopped at: 1.494 0.035 1.501
Is there an obvious way to debug or fix this without waiting for a response from the author?
Hopefully you've sorted this out by now, but: follow the paths to the files in the error traceback, look for any that use "import scanpy.api as sc" and change it to "import scanpy as sc", and also change any instance of "import scanpy.api.external as sce" to "import scanpy.external as sce". I just had to do that in several files myself, and it got DCA working.
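The KeyError itself is raised by opt.__dict__[optimizer] in sctransfer/train.py. My assumption (not confirmed by the sctransfer author) is that the installed Keras optimizers module no longer exposes lowercase aliases such as rmsprop, only the class RMSprop, so the lookup by the string 'rmsprop' fails. A hypothetical workaround is to resolve the name case-insensitively; the helper below is my own sketch, not part of sctransfer or Keras.

```python
import types


def get_optimizer_class(module, name):
    """Return the class in `module` whose attribute name matches `name`, ignoring case.

    Hypothetical drop-in for `opt.__dict__[optimizer]` in sctransfer/train.py.
    """
    wanted = name.lower()
    for attr_name, attr in vars(module).items():
        # Accept only classes, so a submodule or function with a matching name is skipped.
        if attr_name.lower() == wanted and isinstance(attr, type):
            return attr
    raise KeyError(name)


if __name__ == "__main__":
    # Stand-in for the Keras optimizers module, for illustration only.
    fake_opt = types.SimpleNamespace(RMSprop=type("RMSprop", (), {}), Adam=type("Adam", (), {}))
    print(get_optimizer_class(fake_opt, "rmsprop").__name__)
```

In train.py the failing line would then become optimizer = get_optimizer_class(opt, optimizer)(clipvalue=clip_grad), assuming opt is still the optimizers module that file imports.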
deviceQuery confirms that the computer has a CUDA-capable device.
I get this error after it seemingly loads the CUDA files:
2020-07-19 17:18:41.922056: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-07-19 17:18:56.392936: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-07-19 17:18:56.969124: E tensorflow/stream_executor/cuda/cuda_driver.cc:313] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2020-07-19 17:18:56.976577: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: SURFACE-
2020-07-19 17:18:56.980572: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-07-19 17:18:57.018199: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x25fbcf00ee0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-07-19 17:18:57.018616: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
I checked tf.test.gpu_device_name() in my Python code, and it returns an empty string.
Also, print(device_lib.list_local_devices()) does not list a GPU.
The code to test is:
import tensorflow as tf
from tensorflow.python.client import device_lib
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
tf.print(c)
print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))
if tf.test.is_built_with_cuda():
    print("Built with cuda")
if tf.test.is_built_with_gpu_support():
    print('Built with GPU support')
if tf.test.gpu_device_name():
    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))
else:
    print("No Installed GPU version of TF")
print(device_lib.list_local_devices())
I got this solved by adding:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = "0"
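For the CUDA_ERROR_NO_DEVICE case, it may be worth checking what CUDA_VISIBLE_DEVICES was set to before this fix. CUDA only exposes the GPUs listed there, stopping at the first invalid index, so an empty value or a leading -1 hides every GPU. The small sketch below (the helper name is hypothetical) reports this. To be safe, change the variable before importing TensorFlow, because it has to be in place before CUDA is initialised.

```python
import os


def describe_cuda_visible_devices(environ=None):
    """Describe which GPUs CUDA will expose, based on CUDA_VISIBLE_DEVICES."""
    environ = os.environ if environ is None else environ
    value = environ.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return "unset: all GPUs are visible"
    first = value.split(",")[0].strip()
    if first == "" or first.startswith("-"):
        # An empty value or an invalid (negative) first index leaves no devices visible.
        return f"set to {value!r}: no GPUs are visible"
    return f"set to {value!r}: only the listed GPUs are visible"


if __name__ == "__main__":
    print(describe_cuda_visible_devices())
```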
I have tried to use the TensorFlow Object Detection API, and during training I got the output below. It takes a very long time to run and produces no further output.
I am using only 7 images for training and 3 images in the test set, and I am still not getting any further output.
I am using TensorFlow 1.15.
INFO:tensorflow:Done calling model_fn.
I1207 15:04:32.265669 9380 estimator.py:1150] Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
I1207 15:04:32.267668 9380 basic_session_run_hooks.py:541] Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
I1207 15:04:37.474951 9380 monitored_session.py:240] Graph was finalized.
2019-12-07 15:04:37.478682: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-12-07 15:04:37.494165: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2019-12-07 15:04:37.832387: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce 830M major: 5 minor: 0 memoryClockRate(GHz): 1.15
pciBusID: 0000:0a:00.0
2019-12-07 15:04:37.841191: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2019-12-07 15:04:37.853644: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2019-12-07 15:04:37.865194: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_100.dll
2019-12-07 15:04:37.875714: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_100.dll
2019-12-07 15:04:37.889732: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_100.dll
2019-12-07 15:04:37.905515: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_100.dll
2019-12-07 15:04:37.924530: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2019-12-07 15:04:37.932013: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-12-07 15:04:39.474886: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-07 15:04:39.480670: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2019-12-07 15:04:39.488584: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2019-12-07 15:04:39.493975: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1389 MB memory) -> physical GPU (device: 0, name: GeForce 830M, pci bus id: 0000:0a:00.0, compute capability: 5.0)
INFO:tensorflow:Restoring parameters from training/model.ckpt-0
I1207 15:04:39.522326 9380 saver.py:1284] Restoring parameters from training/model.ckpt-0
WARNING:tensorflow:From C:\Users\milan\Downloads\models-master\TFO\lib\site-packages\tensorflow_core\python\training\saver.py:1069: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file utilities to get mtimes.
W1207 15:04:42.039816 9380 deprecation.py:323] From C:\Users\milan\Downloads\models-master\TFO\lib\site-packages\tensorflow_core\python\training\saver.py:1069: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file utilities to get mtimes.
INFO:tensorflow:Running local_init_op.
I1207 15:04:43.612918 9380 session_manager.py:500] Running local_init_op.
INFO:tensorflow:Done running local_init_op.
I1207 15:04:44.214592 9380 session_manager.py:502] Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into training/model.ckpt.
I1207 15:04:57.934353 9380 basic_session_run_hooks.py:606] Saving checkpoints for 0 into training/model.ckpt.
2019-12-07 15:05:11.850788: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2019-12-07 15:05:12.891584: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: Invoking ptxas not supported on Windows
Relying on driver to perform ptx compilation. This message will be only logged once.
2019-12-07 15:05:13.030221: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
INFO:tensorflow:loss = 10.3425045, step = 0
I1207 15:05:22.538612 9380 basic_session_run_hooks.py:262] loss = 10.3425045, step = 0
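The log stops right after the first loss line (step 0). Training may simply be slow rather than stuck: if the default Estimator RunConfig is used, I believe the loss is only logged every 100 steps, and this GPU has 1389 MB available to TensorFlow. To tell the two apart, you can save the console output to a file and extract the step/loss pairs as it grows. Here is a minimal stdlib sketch (the function name is hypothetical):

```python
import re

# Matches e.g. "INFO:tensorflow:loss = 10.3425045, step = 0".
LOSS_LINE = re.compile(r"loss = ([^,\s]+), step = (\d+)")


def parse_loss_steps(lines):
    """Return sorted (step, loss) pairs from TensorFlow Estimator training logs."""
    by_step = {}
    for line in lines:
        match = LOSS_LINE.search(line)
        if match:
            # The log above prints each message twice in two formats,
            # so keep a single entry per step.
            by_step[int(match.group(2))] = float(match.group(1))
    return sorted(by_step.items())
```

For example, parse_loss_steps(open("train.log")), where train.log is a hypothetical file holding the console output, returns [(0, 10.3425045)] for the log shown here. Later steps appear once they are logged.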