I am a graduate student from Taiwan and I am currently experimenting with GANs,
but I keep running into problems that I can't figure out.
I can't get my GPU to work: the script fails with a flood of errors and I'm not sure why. I'm using the code from the article below, and the errors follow.
https://aistudio.baidu.com/aistudio/projectdetail/1177827?channelType=0&channel=0
BUG:
W1201 21:04:53.908452 10888 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.8, Runtime API Version: 10.2
W1201 21:04:53.918452 10888 gpu_resources.cc:91] device: 0, cuDNN Version: 7.0.
W1201 21:04:58.973486 10888 gpu_resources.cc:217] WARNING: device: . The installed Paddle is compiled with CUDNN 7.6, but CUDNN version in your machine is 7.0, which may cause serious incompatible bug. Please recompile or reinstall Paddle with compatible CUDNN version.
Traceback (most recent call last):
File "c:\Users\User\Desktop\gan\try3.py", line 499, in <module>
generated_image = exe.run(g_program,
File "C:\Users\User\anaconda3\envs\pytorch\lib\site-packages\paddle\fluid\executor.py", line 1463, in run
six.reraise(*sys.exc_info())
File "C:\Users\User\anaconda3\envs\pytorch\lib\site-packages\six.py", line 719, in reraise
raise value
File "C:\Users\User\anaconda3\envs\pytorch\lib\site-packages\paddle\fluid\executor.py", line 1450, in run
res = self._run_impl(program=program,
File "C:\Users\User\anaconda3\envs\pytorch\lib\site-packages\paddle\fluid\executor.py", line 1661, in _run_impl
return new_exe.run(scope, list(feed.keys()), fetch_list,
File "C:\Users\User\anaconda3\envs\pytorch\lib\site-packages\paddle\fluid\executor.py", line 631, in run
tensors = self._new_exe.run(scope, feed_names,
File "c:\Users\User\Desktop\gan\try3.py", line 422, in <module>
g_img = G(x=noise)
File "c:\Users\User\Desktop\gan\try3.py", line 360, in G
x = deconv(x, num_filters=2048, filter_size=4, stride=1, padding=0, name='g_deconv_0')
File "c:\Users\User\Desktop\gan\try3.py", line 240, in deconv
return fluid.layers.conv2d_transpose(
File "C:\Users\User\anaconda3\envs\pytorch\lib\site-packages\paddle\fluid\layers\nn.py", line 4285, in conv2d_transpose
helper.append_op(type=op_type,
File "C:\Users\User\anaconda3\envs\pytorch\lib\site-packages\paddle\fluid\layer_helper.py", line 45, in append_op
return self.main_program.current_block().append_op(*args, **kwargs)
File "C:\Users\User\anaconda3\envs\pytorch\lib\site-packages\paddle\fluid\framework.py", line 4017, in append_op
op = Operator(
File "C:\Users\User\anaconda3\envs\pytorch\lib\site-packages\paddle\fluid\framework.py", line 2858, in __init__
for frame in traceback.extract_stack():
ExternalError: CUDNN error(8), CUDNN_STATUS_EXECUTION_FAILED.
[Hint: Please search for the error code(8) on website (https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnStatus_t) to get Nvidia's official solution and advice about CUDNN Error.] (at ../paddle/phi/kernels/gpudnn/conv_transpose_kernel.cu:278)
[operator < conv2d_transpose > error]
PS C:\Users\User\Desktop\gan> & C:/Users/User/anaconda3/envs/pytorch/python.exe c:/Users/User/Desktop/gan/try3.py
c:\Users\User\Desktop\gan\try3.py:35: DeprecationWarning: BILINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BILINEAR instead.
img = img.resize((int(l // rate),int(h // rate)),Image.BILINEAR)
Could someone help me fix these errors?
Thank you very much for your help!
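The warnings at the top of the log already report several version numbers. As a minimal sketch (not part of the original post), the helper below pulls them out of the log text so they can be compared side by side. The function name `extract_versions` and the regular expressions are assumptions based only on the log format shown above.

```python
import re

def extract_versions(log_text):
    """Collect the version numbers reported in the startup warnings."""
    # Patterns are guesses derived from the warning lines in the post.
    patterns = {
        "compute_capability": r"GPU Compute Capability: ([\d.]+)",
        "driver_api": r"Driver API Version: ([\d.]+)",
        "runtime_api": r"Runtime API Version: ([\d.]+)",
        "cudnn_compiled": r"compiled with CUDNN ([\d.]+)",
        "cudnn_installed": r"CUDNN version in your machine is ([\d.]+)",
    }
    found = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, log_text)
        if match:
            # Drop a trailing sentence period if one was captured.
            found[key] = match.group(1).rstrip(".")
    return found

sample = (
    "device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.8, "
    "Runtime API Version: 10.2\n"
    "The installed Paddle is compiled with CUDNN 7.6, but CUDNN version "
    "in your machine is 7.0, which may cause serious incompatible bug."
)
for name, version in extract_versions(sample).items():
    print(name, version)
```

For the log above this makes the cuDNN mismatch that the warning itself flags (compiled for 7.6, installed 7.0) easy to see. As far as I know, a GPU with compute capability 8.6 also needs a CUDA 11 build, so the 10.2 runtime is worth checking too.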
After installing MLflow using one-click-mlflow, I saved a PyTorch model using the default command that I found in the user guide. The command is shown below:
mlflow.pytorch.log_model(net, artifact_path="model", pickle_module=pickle)
The saved network is very simple: two hidden layers with Xavier initialization and hyperbolic tangent activations, followed by a linear output layer.
import torch as T

# n_features (the number of input features) is defined elsewhere in the notebook.
class Net(T.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.hid1 = T.nn.Linear(n_features, 10)
        self.hid2 = T.nn.Linear(10, 10)
        self.oupt = T.nn.Linear(10, 1)
        # Xavier-uniform weights and zero biases for every layer
        T.nn.init.xavier_uniform_(self.hid1.weight)
        T.nn.init.zeros_(self.hid1.bias)
        T.nn.init.xavier_uniform_(self.hid2.weight)
        T.nn.init.zeros_(self.hid2.bias)
        T.nn.init.xavier_uniform_(self.oupt.weight)
        T.nn.init.zeros_(self.oupt.bias)

    def forward(self, x):
        z = T.tanh(self.hid1(x))
        z = T.tanh(self.hid2(z))
        z = self.oupt(z)  # no activation on the output layer
        return z
Everything runs fine in the Jupyter Notebook. I can log metrics and other artifacts, but when I save the model I get the following error message:
2021/10/13 09:21:00 WARNING mlflow.utils.requirements_utils: Found torch version (1.9.0+cu111) contains a local version label (+cu111). MLflow logged a pip requirement for this package as 'torch==1.9.0' without the local version label to make it installable from PyPI. To specify pip requirements containing local version labels, please use `conda_env` or `pip_requirements`.
2021/10/13 09:21:00 WARNING mlflow.utils.requirements_utils: Found torchvision version (0.10.0+cu111) contains a local version label (+cu111). MLflow logged a pip requirement for this package as 'torchvision==0.10.0' without the local version label to make it installable from PyPI. To specify pip requirements containing local version labels, please use `conda_env` or `pip_requirements`.
2021/10/13 09:21:01 ERROR mlflow.utils.environment: Encountered an unexpected error while inferring pip requirements (model URI: /tmp/tmpnl9dsoye/model/data, flavor: pytorch)
Traceback (most recent call last):
File "/home/ucsky/.virtualenv/mymodel/lib/python3.9/site-packages/mlflow/utils/environment.py", line 212, in infer_pip_requirements
return _infer_requirements(model_uri, flavor)
File "/home/ucsky/.virtualenv/mymodel/lib/python3.9/site-packages/mlflow/utils/requirements_utils.py", line 263, in _infer_requirements
modules = _capture_imported_modules(model_uri, flavor)
File "/home/ucsky/.virtualenv/mymodel/lib/python3.9/site-packages/mlflow/utils/requirements_utils.py", line 221, in _capture_imported_modules
_run_command(
File "/home/ucsky/.virtualenv/mymodel/lib/python3.9/site-packages/mlflow/utils/requirements_utils.py", line 173, in _run_command
raise MlflowException(msg)
mlflow.exceptions.MlflowException: Encountered an unexpected error while running ['/home/ucsky/.virtualenv/mymodel/bin/python', '/home/ucsky/.virtualenv/mymodel/lib/python3.9/site-packages/mlflow/utils/_capture_modules.py', '--model-path', '/tmp/tmpnl9dsoye/model/data', '--flavor', 'pytorch', '--output-file', '/tmp/tmplyj0w2fr/imported_modules.txt', '--sys-path', '["/home/ucsky/project/ofi-ds-research/incubator/ofi-pe-fr/notebook/guillaume-simon/06-modelisation-pytorch", "/home/ucsky/.virtualenv/mymodel/lib/python3.9/site-packages/git/ext/gitdb", "/usr/lib/python39.zip", "/usr/lib/python3.9", "/usr/lib/python3.9/lib-dynload", "", "/home/ucsky/.virtualenv/mymodel/lib/python3.9/site-packages", "/home/ucsky/.virtualenv/mymodel/lib/python3.9/site-packages/IPython/extensions", "/home/ucsky/.ipython", "/home/ucsky/.virtualenv/mymodel/lib/python3.9/site-packages/gitdb/ext/smmap"]']
exit status: 1
stdout:
stderr: Traceback (most recent call last):
File "/home/ucsky/.virtualenv/mymodel/lib/python3.9/site-packages/mlflow/utils/_capture_modules.py", line 125, in <module>
main()
File "/home/ucsky/.virtualenv/mymodel/lib/python3.9/site-packages/mlflow/utils/_capture_modules.py", line 118, in main
importlib.import_module(f"mlflow.{flavor}")._load_pyfunc(model_path)
File "/home/ucsky/.virtualenv/mymodel/lib/python3.9/site-packages/mlflow/pytorch/__init__.py", line 723, in _load_pyfunc
return _PyTorchWrapper(_load_model(path, **kwargs))
File "/home/ucsky/.virtualenv/mymodel/lib/python3.9/site-packages/mlflow/pytorch/__init__.py", line 626, in _load_model
return torch.load(model_path, **kwargs)
File "/home/ucsky/.virtualenv/mymodel/lib/python3.9/site-packages/torch/serialization.py", line 607, in load
return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
File "/home/ucsky/.virtualenv/mymodel/lib/python3.9/site-packages/torch/serialization.py", line 882, in _load
result = unpickler.load()
File "/home/ucsky/.virtualenv/mymodel/lib/python3.9/site-packages/torch/serialization.py", line 875, in find_class
return super().find_class(mod_name, name)
AttributeError: Can't get attribute 'Net' on <module '__main__' from '/home/ucsky/.virtualenv/mymodel/lib/python3.9/site-packages/mlflow/utils/_capture_modules.py'>
Can somebody explain what is wrong?
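The last line of the traceback points at how pickle works: a class is stored by reference (module name plus class name), not by value. A class defined in a notebook is therefore recorded as living in `__main__`, and the separate process shown in the traceback has a different `__main__` that contains no `Net`. The sketch below illustrates only this pickle behaviour, using a standard-library class instead of the model. It does not touch MLflow or torch, and `pickled_class_reference` is a hypothetical helper name.

```python
import pickle
import pickletools
from fractions import Fraction

def pickled_class_reference(obj):
    """Return the (module, name) strings that pickle writes for obj's class."""
    strings = []
    for opcode, arg, _pos in pickletools.genops(pickle.dumps(obj, protocol=4)):
        if opcode.name in ("SHORT_BINUNICODE", "BINUNICODE"):
            strings.append(arg)
        elif opcode.name == "STACK_GLOBAL":
            # STACK_GLOBAL consumes the module name and the qualified name
            # that were pushed just before it.
            return strings[-2], strings[-1]
    return None

# Only the names are stored; the class body is not part of the payload.
print(pickled_class_reference(Fraction(1, 3)))  # ('fractions', 'Fraction')
```

A commonly suggested workaround, which I have not verified against this exact setup, is to move `Net` into an importable `.py` file so that the recorded module name can be resolved when the model is loaded.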
I am running the FastPhotoStyle code on Windows 10 with Python 3.7, CUDA 10.0 and CUDA 9.1. Although I made the suggested change for newer Python versions (passing bytes instead of strings), I am still getting the same error. Can you please suggest a fix for this issue?
Resize image: (803,538)->(803,538)
Resize image: (960,540)->(960,540)
Elapsed time in stylization: 2.325060
Elapsed time in propagation: 83.987388
Elapsed time in post processing: 0.015629
Traceback (most recent call last):
File "demo.py", line 47, in <module>
no_post=args.no_post
File "D:\TrainImages\FastPhotoStyle-master\process_stylization.py", line 135, in stylization
out_img = smooth_filter(out_img, cont_pilimg, f_radius=15, f_edge=1e-1)
File "D:\TrainImages\FastPhotoStyle-master\smooth_filter.py", line 402, in smooth_filter
best_ = smooth_local_affine(output_, input_, 1e-7, 3, H, W, f_radius, f_edge)
File "D:\TrainImages\FastPhotoStyle-master\smooth_filter.py", line 333, in smooth_local_affine
program = Program(src.encode('utf-8'), 'best_local_affine_kernel.cu'.encode('utf-8'))
File "C:\Users\SD\Anaconda3\lib\site-packages\pynvrtc\compiler.py", line 49, in __init__
self._interface = NVRTCInterface(lib_name)
File "C:\Users\SD\Anaconda3\lib\site-packages\pynvrtc\interface.py", line 87, in __init__
self._load_nvrtc_lib(lib_path)
File "C:\Users\SD\Anaconda3\lib\site-packages\pynvrtc\interface.py", line 109, in _load_nvrtc_lib
self.lib = cdll.LoadLibrary(name)
File "C:\Users\SD\Anaconda3\lib\ctypes\__init__.py", line 434, in LoadLibrary
return self._dlltype(name)
File "C:\Users\SD\Anaconda3\lib\ctypes\__init__.py", line 356, in __init__
self._handle = _dlopen(self._name, mode)
OSError: [WinError 126] The specified module could not be found
I have already changed the strings to bytes:
program = Program(src.encode('utf-8'), 'best_local_affine_kernel.cu'.encode('utf-8'))
ptx = program.compile(['-I/usr/local/cuda/include'.encode('utf-8')])
Please check the documentation here: https://github.com/NVIDIA/FastPhotoStyle/blob/master/TUTORIAL.md
The tutorial describes the setup on Ubuntu, but it also lists prerequisite Python modules that you should have installed on your machine.
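WinError 126 is raised by `cdll.LoadLibrary` inside pynvrtc, which means Windows could not locate the NVRTC DLL (or one of its dependencies). As a rough diagnostic sketch, the helper below lists NVRTC DLLs in the directories on PATH. The file name pattern `nvrtc64_*.dll` is my assumption about how the CUDA toolkit names the library on Windows, and `find_nvrtc_dlls` is a hypothetical helper, not part of pynvrtc.

```python
import glob
import os

def find_nvrtc_dlls(search_path=None):
    """List NVRTC DLL files found in the directories of a PATH-style string."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    hits = []
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue
        # glob.escape keeps characters like '[' in folder names literal.
        pattern = os.path.join(glob.escape(directory), "nvrtc64_*.dll")
        hits.extend(sorted(glob.glob(pattern)))
    return hits

for path in find_nvrtc_dlls():
    print(path)
```

If nothing is printed, the CUDA `bin` folder is probably not on PATH. If several versions are printed, check which one pynvrtc is asked to load.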
I have non-sudo access to a machine with NVIDIA GPUs and CUDA 7.5 installed. I installed PyTorch with CUDA 7.5 support, which seems to have worked:
>>> import torch
>>> torch.cuda.is_available()
True
To get some practice, I followed a tutorial on machine translation using RNNs. When I set USE_CUDA = False and the CPU is used, everything works fine. However, when I try to use the GPUs with USE_CUDA = True, I get the following error:
Traceback (most recent call last):
...
File "seq2seq.py", line 229, in train
encoder_output, encoder_hidden = encoder(input_variable[ei], encoder_hidden)
File "/.../python2.7/site-packages/torch/nn/modules/module.py", line 206, in __call__
result = self.forward(*input, **kwargs)
File "seq2seq.py", line 144, in forward
output, hidden = self.gru(embedded, hidden)
File "/.../python2.7/site-packages/torch/nn/modules/module.py", line 206, in __call__
result = self.forward(*input, **kwargs)
File "/.../python2.7/site-packages/torch/nn/modules/rnn.py", line 91, in forward
output, hidden = func(input, self.all_weights, hx)
...
File "/.../python2.7/site-packages/torch/backends/cudnn/rnn.py", line 42, in init_rnn_descriptor
cudnn.DropoutDescriptor(handle, dropout_p, fn.dropout_seed)
File "/usr/lib/python2.7/ctypes/__init__.py", line 383, in __getitem__
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: python: undefined symbol: cudnnCreateDropoutDescriptor
Exception AttributeError: 'python: undefined symbol: cudnnDestroyDropoutDescriptor' in <bound method DropoutDescriptor.__del__ of <torch.backends.cudnn.DropoutDescriptor object at 0x7fe540efec10>> ignored
I've tried to use Google to search for that error but got no meaningful results. Since I'm rather a newbie with PyTorch and CUDA, I have no idea how to go on from here. The full setup is Ubuntu 14.04, Python 2.7, CUDA 7.5.
As stated in the comments, your error is caused by an outdated cuDNN and can be resolved by upgrading.
Install current versions of CUDA, cuDNN, and PyTorch, and you should be fine.
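To check a given cuDNN build before and after upgrading, one option is to ask the dynamic loader whether the library exports the symbol named in the traceback. This is a minimal sketch: the library name `libcudnn.so` is an assumption about the installation, and `has_symbol` is a hypothetical helper.

```python
import ctypes

def has_symbol(library, symbol):
    """Return True if the shared library loads and exports the given symbol."""
    try:
        lib = ctypes.CDLL(library)
    except OSError:
        # Library not found or not loadable on this machine.
        return False
    # ctypes raises AttributeError for missing symbols, which hasattr catches.
    return hasattr(lib, symbol)

print(has_symbol("libcudnn.so", "cudnnCreateDropoutDescriptor"))
```

A result of `False` means either that the library could not be loaded at all or that it lacks the symbol, so it is worth also confirming that the library path is correct.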
Sorry if this is trivial; I am new to this. I set up Theano to use my GPU for computations on Ubuntu Trusty Tahr. I have an AMD Radeon HD 7670M GPU. When I run the test script to check that Theano works with the GPU, I get the following error:
Mapped name None to device opencl0:0: Turks
Traceback (most recent call last):
File "test.py", line 11, in <module>
f = function([], T.exp(x))
File "/home/sachu/git/Theano/theano/compile/function.py", line 322, in function
output_keys=output_keys)
File "/home/sachu/git/Theano/theano/compile/pfunc.py", line 480, in pfunc
output_keys=output_keys)
File "/home/sachu/git/Theano/theano/compile/function_module.py", line 1784, in orig_function
defaults)
File "/home/sachu/git/Theano/theano/compile/function_module.py", line 1648, in create
input_storage=input_storage_lists, storage_map=storage_map)
File "/home/sachu/git/Theano/theano/gof/link.py", line 699, in make_thunk
storage_map=storage_map)[:3]
File "/home/sachu/git/Theano/theano/gof/vm.py", line 1042, in make_all
no_recycling))
File "/home/sachu/git/Theano/theano/gof/op.py", line 975, in make_thunk
no_recycling)
File "/home/sachu/git/Theano/theano/gof/op.py", line 875, in make_c_thunk
output_storage=node_output_storage)
File "/home/sachu/git/Theano/theano/gof/cc.py", line 1189, in make_thunk
keep_lock=keep_lock)
File "/home/sachu/git/Theano/theano/gof/cc.py", line 1130, in __compile__
keep_lock=keep_lock)
File "/home/sachu/git/Theano/theano/gof/cc.py", line 1602, in cthunk_factory
*(in_storage + out_storage + orphd))
RuntimeError: ('The following error happened while compiling the node', GpuElemwise{exp,no_inplace}(<GpuArrayType<None>(float64, (False,))>), '\n', 'Could not initialize elemwise support')
The script I ran was the one available on the website: http://deeplearning.net/software/theano/tutorial/using_gpu.html
Is something wrong with the config? I believe all dependencies are set up properly. I could have made a mistake somewhere, but then I would probably expect something other than a runtime error. I searched GitHub extensively for related information but found nothing, and searching Stack Overflow gave the same result, hence I am posting here. Any help is appreciated.
Thanks
Additional Info: python3.4, theano bleeding edge version. Libgpuarray, clblas, openblas are all built from the git source master branch. 64bit architecture.
Theano support for OpenCL is just not ready yet and it does not seem to be a priority for the development team to get this working (see this issue). So either you will need some patience or an nvidia GPU on which you could run CUDA.
I have installed the Theano framework and enabled CUDA on my machine. However, when I run "import theano" in my Python console, I get the following message:
>>> import theano
Using gpu device 0: GeForce GTX 950 (CNMeM is disabled, CuDNN not available)
Since it says "CuDNN not available", I downloaded cuDNN from the NVIDIA website. I also updated 'path' in the environment variables and added 'optimizer_including=cudnn' to the '.theanorc.txt' config file.
Then, I tried again, but failed, with:
>>> import theano
Using gpu device 0: GeForce GTX 950 (CNMeM is disabled, CuDNN not available)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Anaconda2\lib\site-packages\theano\__init__.py", line 111, in <module>
theano.sandbox.cuda.tests.test_driver.test_nvidia_driver1()
File "C:\Anaconda2\lib\site-packages\theano\sandbox\cuda\tests\test_driver.py", line 31, in test_nvidia_driver1
profile=False)
File "C:\Anaconda2\lib\site-packages\theano\compile\function.py", line 320, in function
output_keys=output_keys)
File "C:\Anaconda2\lib\site-packages\theano\compile\pfunc.py", line 479, in pfunc
output_keys=output_keys)
File "C:\Anaconda2\lib\site-packages\theano\compile\function_module.py", line 1776, in orig_function
output_keys=output_keys).create(
File "C:\Anaconda2\lib\site-packages\theano\compile\function_module.py", line 1456, in __init__
optimizer_profile = optimizer(fgraph)
File "C:\Anaconda2\lib\site-packages\theano\gof\opt.py", line 101, in __call__
return self.optimize(fgraph)
File "C:\Anaconda2\lib\site-packages\theano\gof\opt.py", line 89, in optimize
ret = self.apply(fgraph, *args, **kwargs)
File "C:\Anaconda2\lib\site-packages\theano\gof\opt.py", line 230, in apply
sub_prof = optimizer.optimize(fgraph)
File "C:\Anaconda2\lib\site-packages\theano\gof\opt.py", line 89, in optimize
ret = self.apply(fgraph, *args, **kwargs)
File "C:\Anaconda2\lib\site-packages\theano\gof\opt.py", line 230, in apply
sub_prof = optimizer.optimize(fgraph)
File "C:\Anaconda2\lib\site-packages\theano\gof\opt.py", line 89, in optimize
ret = self.apply(fgraph, *args, **kwargs)
File "C:\Anaconda2\lib\site-packages\theano\sandbox\cuda\dnn.py", line 2508, in apply
dnn_available.msg)
AssertionError: cuDNN optimization was enabled, but Theano was not able to use it. We got this error:
Theano can not compile with cuDNN. We got this error:
>>>
Can anyone help me? Thanks.
There should be a way to do it by setting only the Path environment variable but I could never get that to work. The only thing that worked for me was to manually copy the CuDNN files into the appropriate folders in your CUDA installation.
For example, if your CUDA installation is in C:\CUDA\v7.0 and you extracted CuDNN to C:\CuDNN you would copy as follows:
The contents of C:\CuDNN\lib\x64\ would be copied to C:\CUDA\v7.0\lib\x64\
The contents of C:\CuDNN\include\ would be copied to C:\CUDA\v7.0\include\
The contents of C:\CuDNN\bin\ would be copied to C:\CUDA\v7.0\bin\
After that it should work.
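The three copy steps above can be scripted. The following is a sketch under the same folder-layout assumptions as the example (`bin`, `include` and `lib\x64` under both roots); `copy_cudnn` is a hypothetical helper name, and it overwrites files of the same name in the CUDA folders.

```python
import os
import shutil

def copy_cudnn(cudnn_root, cuda_root):
    """Copy the files in cuDNN's bin, include and lib/x64 folders into CUDA's."""
    copied = []
    for sub in ("bin", "include", os.path.join("lib", "x64")):
        src_dir = os.path.join(cudnn_root, sub)
        dst_dir = os.path.join(cuda_root, sub)
        if not os.path.isdir(src_dir):
            continue  # skip folders this cuDNN archive does not contain
        os.makedirs(dst_dir, exist_ok=True)
        for name in sorted(os.listdir(src_dir)):
            src = os.path.join(src_dir, name)
            if os.path.isfile(src):
                shutil.copy2(src, os.path.join(dst_dir, name))
                copied.append(os.path.join(sub, name))
    return copied

# Example with the paths from the text (uncomment on the target machine):
# copy_cudnn(r"C:\CuDNN", r"C:\CUDA\v7.0")
```

Writing into the CUDA installation folder may require an elevated prompt, depending on where CUDA is installed.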
In addition to everything you did, I updated .theanorc.txt in my home folder with the following content, and it worked after that.
[lib]
#cnmem=1.0
cudnn=1.0