Older GPUs don't seem to be supported by torch, in spite of recent CUDA versions.
In my case the crash produces the following error:
/home/maxs/dev/mdb/venv38/lib/python3.8/site-packages/torch/cuda/__init__.py:83: UserWarning:
Found GPU%d %s which is of cuda capability %d.%d.
PyTorch no longer supports this GPU because it is too old.
The minimum cuda capability supported by this library is %d.%d.
warnings.warn(old_gpu_warn.format(d, name, major, minor, min_arch // 10, min_arch % 10))
WARNING:lightwood-16979:Exception: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1. when training model: <lightwood.model.neural.Neural object at 0x7f9c34df1e80>
Process LearnProcess-1:13:
Traceback (most recent call last):
File "/home/maxs/dev/mdb/venv38/sources/lightwood/lightwood/model/helpers/default_net.py", line 59, in forward
output = self.net(input)
File "/home/maxs/dev/mdb/venv38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/maxs/dev/mdb/venv38/lib/python3.8/site-packages/torch/nn/modules/container.py", line 139, in forward
input = module(input)
File "/home/maxs/dev/mdb/venv38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/maxs/dev/mdb/venv38/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 96, in forward
return F.linear(input, self.weight, self.bias)
File "/home/maxs/dev/mdb/venv38/lib/python3.8/site-packages/torch/nn/functional.py", line 1847, in linear
return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
This happens in spite of:
assert torch.cuda.is_available() == True
torch.version.cuda == '10.2'
How can I check for an older GPU that torch doesn't support without wrapping a tensor-to-GPU transfer in a try/except? The transfer initializes CUDA, which wastes roughly 2GB of memory per process, something I can't afford since I'd be running this check in dozens of processes, each of which would then waste an extra 2GB due to the initialization.
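For clarity, the check I'm trying to avoid looks roughly like this (an illustrative sketch; the exact op doesn't matter, only that it launches a kernel):
import torch

# The try/except approach I want to avoid: running any kernel forces CUDA
# initialization in this process, which is what costs the ~2GB.
def gpu_works():
    try:
        (torch.zeros(1).cuda() + 1).item()  # forces a kernel launch and a sync
        return True
    except RuntimeError:
        return False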
Based on the code in torch.cuda.__init__ that was actually throwing the error, the following check seems to work:
import torch
from torch.cuda import device_count, get_device_capability

def is_cuda_compatible():
    compatible_device_count = 0
    if torch.version.cuda is not None:
        for d in range(device_count()):
            capability = get_device_capability(d)
            major = capability[0]
            minor = capability[1]
            # Compute capability of this device vs. the oldest arch this torch
            # build ships kernels for (e.g. "sm_35" -> 35).
            current_arch = major * 10 + minor
            min_arch = min((int(arch.split("_")[1]) for arch in torch.cuda.get_arch_list()),
                           default=35)
            if (not current_arch < min_arch
                    and not torch._C._cuda_getCompiledVersion() <= 9000):
                compatible_device_count += 1
    if compatible_device_count > 0:
        return True
    return False
Not sure if it's 100% correct but putting it out here for feedback and in case anybody else needs it.
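For what it's worth, a minimal usage sketch (my own, illustrative only; the Linear layer is just a stand-in for a real model):
# Pick a device up front based on the compatibility check above.
device = torch.device('cuda' if is_cuda_compatible() else 'cpu')
model = torch.nn.Linear(4, 2).to(device)  # stand-in for the real model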
Related
I'm trying to get StableTuner working on Arch Linux, and while I've gotten far, I'm now facing a problem when I run the .sh file used for training.
I'm getting this error when trying to run StableTuner:
[campfire#archlinux scripts]$ bash run.sh
Booting Up StableTuner
Please wait a moment as we load up some stuff...
/home/campfire/.local/lib/python3.10/site-packages/accelerate/accelerator.py:231: FutureWarning: `logging_dir` is deprecated and will be removed in version 0.18.0 of 🤗 Accelerate. Use `project_dir` instead.
warnings.warn(
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link
================================================================================
/home/campfire/.local/lib/python3.10/site-packages/bitsandbytes/cuda_setup/paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('#/tmp/.ICE-unix/582,unix/archlinux'), PosixPath('local/archlinux')}
warn(
/home/campfire/.local/lib/python3.10/site-packages/bitsandbytes/cuda_setup/paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/org/freedesktop/DisplayManager/Session0')}
warn(
/home/campfire/.local/lib/python3.10/site-packages/bitsandbytes/cuda_setup/paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/org/freedesktop/DisplayManager/Seat0')}
warn(
/home/campfire/.local/lib/python3.10/site-packages/bitsandbytes/cuda_setup/paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//debuginfod.archlinux.org '), PosixPath('https')}
warn(
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
/home/campfire/.local/lib/python3.10/site-packages/bitsandbytes/cuda_setup/paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/lib64')}
warn(
WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
CUDA SETUP: Loading binary /home/campfire/.local/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
/home/campfire/.local/lib/python3.10/site-packages/bitsandbytes/cextension.py:48: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers and GPU quantization are unavailable.
warn(
/home/campfire/.local/lib/python3.10/site-packages/diffusers/configuration_utils.py:195: FutureWarning: It is deprecated to pass a pretrained model name or path to `from_config`.If you were trying to load a scheduler, please use <class 'diffusers.schedulers.scheduling_ddpm.DDPMScheduler'>.from_pretrained(...) instead. Otherwise, please make sure to pass a configuration dictionary instead. This functionality will be removed in v1.0.0.
deprecate("config-passed-as-path", "1.0.0", deprecation_message, standard_warn=False)
Creating Auto Bucketing Dataloader
Rounded resolution to: 512
Preloading images...
** Processing /home/campfire/Desktop: 100%|███████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 3562.34it/s]
** Number of buckets: 2
** Bucket (512, 512) found 1 images, will duplicate 34 images due to batch size 35
** Bucket (640, 384) found 2 images, will duplicate 33 images due to batch size 35
Number of image-caption pairs: 70
** Validation Set: val, steps: 2, repeats: 1
Generating latents cache...
Caching latents: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:08<00:00, 4.17s/it]
Latents are ready.
0%| | 0/2 [00:00<?, ?it/s Starting Training!%| | 0/200 [00:00<?, ?it/s]
Fetching 15 files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 28728.11it/s]
/home/campfire/.local/lib/python3.10/site-packages/transformers/models/clip/feature_extraction_clip.py:28: FutureWarning: The class CLIPFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use CLIPImageProcessor instead.
warnings.warn(s: 0%| | 0/15 [00:00<?, ?it/s]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
Steps To Epoch: 0%| | 0/2 [00:00<?, ?it/s]Traceback (most recent call last):
File "/home/campfire/StableTuner/scripts/scripts/trainer.py", line 2738, in <module>
main()
File "/home/campfire/StableTuner/scripts/scripts/trainer.py", line 2613, in main
model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
File "/home/campfire/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/campfire/.local/lib/python3.10/site-packages/accelerate/utils/operations.py", line 489, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/home/campfire/.local/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
return func(*args, **kwargs)
File "/home/campfire/.local/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py", line 424, in forward
sample, res_samples = downsample_block(
File "/home/campfire/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/campfire/.local/lib/python3.10/site-packages/diffusers/models/unet_2d_blocks.py", line 770, in forward
hidden_states = torch.utils.checkpoint.checkpoint(
File "/home/campfire/.local/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/home/campfire/.local/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 107, in forward
outputs = run_function(*args)
File "/home/campfire/.local/lib/python3.10/site-packages/diffusers/models/unet_2d_blocks.py", line 763, in custom_forward
return module(*inputs, return_dict=return_dict)
File "/home/campfire/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/campfire/.local/lib/python3.10/site-packages/diffusers/models/attention.py", line 216, in forward
hidden_states = block(hidden_states, encoder_hidden_states=encoder_hidden_states, timestep=timestep)
File "/home/campfire/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/campfire/.local/lib/python3.10/site-packages/diffusers/models/attention.py", line 490, in forward
hidden_states = self.attn1(norm_hidden_states, attention_mask=attention_mask) + hidden_states
File "/home/campfire/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/campfire/.local/lib/python3.10/site-packages/diffusers/models/attention.py", line 638, in forward
hidden_states = self._attention(query, key, value, attention_mask)
File "/home/campfire/.local/lib/python3.10/site-packages/diffusers/models/attention.py", line 654, in _attention
attention_scores = torch.baddbmm(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 8.75 GiB (GPU 0; 23.65 GiB total capacity; 13.25 GiB already allocated; 7.24 GiB free; 13.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Steps To Epoch: 0%| | 0/2 [00:00<?, ?it/s]
Overall Steps: 0%| | 0/200 [00:02<?, ?it/s]
Overall Epochs: 0%| | 0/100 [00:02<?, ?it/s]
Traceback (most recent call last):
File "/home/campfire/.local/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/home/campfire/.local/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/home/campfire/.local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1097, in launch_command
simple_launcher(args)
File "/home/campfire/.local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 552, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python', 'scripts/trainer.py', '--attention=xformers', '--model_variant=base', '--disable_cudnn_benchmark', '--use_text_files_as_captions', '--sample_step_interval=500', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--pretrained_vae_name_or_path=', '--output_dir=output/new_model', '--seed=3434554', '--resolution=512', '--train_batch_size=35', '--num_train_epochs=100', '--mixed_precision=fp16', '--use_bucketing', '--aspect_mode=dynamic', '--aspect_mode_action_preference=add', '--use_8bit_adam', '--gradient_checkpointing', '--gradient_accumulation_steps=1', '--learning_rate=3e-6', '--lr_warmup_steps=0', '--lr_scheduler=constant', '--regenerate_latent_cache', '--train_text_encoder', '--token_limit=75', '--concepts_list=stabletune_concept_list.json', '--num_class_images=200', '--save_every_n_epoch=100', '--n_save_sample=1', '--sample_height=512', '--sample_width=512', '--dataset_repeats=1', '--sample_on_training_start']' returned non-zero exit status 1.
I was told this is due to the CUDA path not being defined and that I needed to set
LD_LIBRARY_PATH=/opt/cuda/lib64:$LD_LIBRARY_PATH in the .sh or before I run the program in the terminal; however, inside /opt/ there isn't a CUDA folder.
I already have CUDA installed with PyTorch (as it was a requirement) inside the "ST" conda env.
torch 1.13.1+cu116
torchaudio 0.13.1+cu116
torchmetrics 0.11.1
torchvision 0.14.1+cu116
When I enter pip show torch I get Location: /home/campfire/.local/lib/python3.10/site-packages. Since the PyTorch version came with cu116, I am assuming that is where I need to point the path to?
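As a side note, a quick way to inspect what that pip install actually ships (purely for inspection; I'm not certain this directory is what bitsandbytes searches):
# Print the CUDA version the wheel was built with and where torch keeps its
# bundled libraries (this may not be the path bitsandbytes looks in).
import os
import torch

print(torch.version.cuda)                                    # e.g. '11.6'
print(os.path.join(os.path.dirname(torch.__file__), "lib"))  # bundled libs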
How would I solve this issue? Do I need to point the CUDA PATH to the "ST" conda env instead?
You need to install CUDA with sudo pacman -S cuda; then you will have /opt/cuda. This assumes you are on Arch Linux, given the arch linux tag on the post. The cuda package provides cuda-toolkit, cuda-sdk, and the other libraries that you require.
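After installing and exporting LD_LIBRARY_PATH=/opt/cuda/lib64:$LD_LIBRARY_PATH, a quick sanity check from Python (a sketch of mine, assuming libcudart.so is what bitsandbytes is searching for):
# Confirm libcudart can be dlopen'd; dlopen honours LD_LIBRARY_PATH, so it
# must be set in the environment before Python starts.
import ctypes
import os

print(os.environ.get("LD_LIBRARY_PATH"))   # should include /opt/cuda/lib64
try:
    ctypes.CDLL("libcudart.so")
    print("libcudart.so found")
except OSError as err:
    print("libcudart.so not found:", err)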
UPDATE: I have edited and changed more code; now I don't get an error, and it either works but takes hours, or it is stuck on step one.
I have tried running Stable Diffusion, the new text2image model. The problem is: I don't have an NVIDIA GPU... After a bit of research, I found out you can "force" PyTorch to run on your CPU instead of the GPU. But so far, everything I tried while modifying the existing code did not work. I always get to the point where it starts sampling, and it prints the following error (everything after the command):
Falling back to LAION 400M model...
Global seed set to 42
Loading model from models/ldm/text2img-large/model.ckpt
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 872.30 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
data: 0%| | 0/1 [00:00<?, ?it/s]
Sampling: 0%| | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
File "scripts/txt2img.py", line 279, in <module>
main()
File "scripts/txt2img.py", line 233, in main
uc = model.get_learned_conditioning(batch_size * [""])
File "c:\users\louis\stable-diffusion\ldm\models\diffusion\ddpm.py", line 558, in get_learned_conditioning
c = self.cond_stage_model.encode(c)
File "c:\users\louis\stable-diffusion\ldm\modules\encoders\modules.py", line 111, in encode
return self(text)
File "C:\Users\louis\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "c:\users\louis\stable-diffusion\ldm\modules\encoders\modules.py", line 103, in forward
tokens = self.tknz_fn(text)#.to(self.device)
File "C:\Users\louis\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "c:\users\louis\stable-diffusion\ldm\modules\encoders\modules.py", line 74, in forward
tokens = batch_encoding["input_ids"].to(self.device)
File "C:\Users\louis\anaconda3\envs\ldm\lib\site-packages\torch\cuda\__init__.py", line 216, in _lazy_init
torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
I already set txt2img.py to use the CPU, so the earlier error along the lines of "user defined to use cuda, but no cuda device available" is fixed.
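Roughly, the kind of edit I mean looks like this (an illustrative sketch, not the repo's exact code):
# Load the checkpoint onto the CPU and keep the model there.
import torch

device = torch.device("cpu")
ckpt = torch.load("models/ldm/text2img-large/model.ckpt", map_location=device)
# ... build the model as the repo does, then:
# model = model.to(device)
# Modules that hard-code a CUDA device internally (e.g. self.device in
# ldm/modules/encoders/modules.py, which the traceback above hits) also need
# to be constructed with device="cpu".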
So my question(s):
-Is what I'm trying even possible?
-If yes, how should I edit the code to make it work?
(-Would it even be possible to modify it to work on AMD GPUs using ROCm?)
The Repo: https://github.com/CompVis/stable-diffusion
Using the LAION400m weights, because I currently don't have access to the SD ones.
I got them using:
wget -O models/ldm/text2img-large/model.ckpt https://ommer-lab.com/files/latent-diffusion/nitro/txt2img-f8-large/model.ckpt
Guide I followed: https://github.com/lstein/stable-diffusion
Torch 0.4.1
Python 2.7.12
I was adapting NMP QC code (with some compatibility issues ironed out) to use multiple GPUs, since my GPU couldn't handle the workload (it crashed after running out of VRAM).
I'm new to PyTorch, but I found a tutorial on using nn.DataParallel(model) to implement multi-GPU use.
I modified main.py to use nn.DataParallel(model). Areas I changed have "#NEW" stuck to them.
The code runs fine even in multi-GPU mode if it runs on a single GPU, but I get an "arguments are located on different GPUs" error when running on 2 or more GPUs:
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
Unexpected end of /proc/mounts line `overlay / overlay rw,relatime,lowerdir=/var/lib/docker/overlay2/l/QKSBQ5PAFDDC3OMBEELQQETALQ:/var/lib/docker/overlay2/l/WWYI3IDQPNXGON7AHODBPSTVXL:/var/lib/docker/overlay2/l/Q54I2HYS4TKH4LDJKBTVTGWWO6:/var/lib/docker/overlay2/l/IUV2LFPNMPOS3MREOTT52TKL54:/var/lib/docker/overlay2/l/DB5GBUCI3DCBPX6TJG3O337YVB:/var/lib/docker/overlay2/l/DNYKXCZJH5FMFNJLNGYJJ2ITPI:/var/lib/docker/overlay2/l/7DZCTDVNSTPJISGW65UG7U3F75:/var/lib/docker/overlay2/l/VOEQO652VS63NLDLZZ4TCIJLO6:/var/lib/docker/overlay2/l/4SI6ZCRUIORG5'
Traceback (most recent call last):
File "main.py", line 332, in <module>
main()
File "main.py", line 190, in main
train(train_loader, model, criterion, optimizer, epoch, evaluation, logger)
File "main.py", line 251, in train
output = model(g, h, e)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/data_parallel.py", line 123, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/data_parallel.py", line 133, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/parallel_apply.py", line 77, in parallel_apply
raise output
RuntimeError: arguments are located on different GPUs at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:236
Since I was sending the inputs one at a time instead of all at once like in the tutorial, I checked with .get_device(), which confirmed that all four arguments being sent (g, h, e, target) were on the same device (device 0).
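For reference, the pattern from the tutorial I was following is roughly this (a minimal sketch with a toy stand-in model; the real NMP-QC model takes g, h, e rather than a single tensor):
import torch
import torch.nn as nn

# Minimal nn.DataParallel pattern: wrap the model, keep it and the inputs on
# device 0, and let DataParallel scatter along dim 0 to the other GPUs.
model = nn.Linear(128, 10)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model = model.cuda()

x = torch.randn(64, 128).cuda()  # inputs live on device 0
out = model(x)                   # scattered across GPUs internally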
I have non-sudo access to a machine with NVIDIA GPUs and CUDA 7.5 installed. I installed PyTorch with CUDA 7.5 support, which seems to have worked:
>>> import torch
>>> torch.cuda.is_available()
True
To get some practice, I followed a tutorial for machine translation using RNNs. When I set USE_CUDA = False and the CPUs are used, everything works quite alright. However, when I want to utilize the GPUs with USE_CUDA = True, I get the following error:
Traceback (most recent call last):
...
File "seq2seq.py", line 229, in train
encoder_output, encoder_hidden = encoder(input_variable[ei], encoder_hidden)
File "/.../python2.7/site-packages/torch/nn/modules/module.py", line 206, in __call__
result = self.forward(*input, **kwargs)
File "seq2seq.py", line 144, in forward
output, hidden = self.gru(embedded, hidden)
File "/.../python2.7/site-packages/torch/nn/modules/module.py", line 206, in __call__
result = self.forward(*input, **kwargs)
File "/.../python2.7/site-packages/torch/nn/modules/rnn.py", line 91, in forward
output, hidden = func(input, self.all_weights, hx)
...
File "/.../python2.7/site-packages/torch/backends/cudnn/rnn.py", line 42, in init_rnn_descriptor
cudnn.DropoutDescriptor(handle, dropout_p, fn.dropout_seed)
File "/usr/lib/python2.7/ctypes/__init__.py", line 383, in __getitem__
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: python: undefined symbol: cudnnCreateDropoutDescriptor
Exception AttributeError: 'python: undefined symbol: cudnnDestroyDropoutDescriptor' in <bound method DropoutDescriptor.__del__ of <torch.backends.cudnn.DropoutDescriptor object at 0x7fe540efec10>> ignored
I've tried to use Google to search for that error but got no meaningful results. Since I'm rather a newbie with PyTorch and CUDA, I have no idea how to go on from here. The full setup is Ubuntu 14.04, Python 2.7, CUDA 7.5.
As stated in the comments: your error comes from an outdated cuDNN and can be resolved by upgrading.
Install current versions of CUDA, cuDNN, and PyTorch, and then you'll be fine.
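After upgrading, a quick way to confirm what PyTorch actually sees (a small check of mine, using APIs available in current PyTorch versions):
# Report the CUDA and cuDNN versions PyTorch was built against and can load.
import torch

print(torch.version.cuda)                   # CUDA version of the build
print(torch.backends.cudnn.is_available())  # whether a usable cuDNN was found
print(torch.backends.cudnn.version())       # cuDNN version number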
I am implementing some deep learning algorithms using Theano. After I stop some programs that were running Theano, the following error occasionally appears when I try to import theano again.
>>> import theano
ERROR (theano.sandbox.cuda): ERROR: Not using GPU. Initialisation of device gpu failed:
initCnmem: cnmemInit call failed! Reason=CNMEM_STATUS_OUT_OF_MEMORY. numdev=1
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/jjhu/.local/lib/python2.7/site-packages/theano/__init__.py", line 118, in <module>
theano.sandbox.cuda.tests.test_driver.test_nvidia_driver1()
File "/home/jjhu/.local/lib/python2.7/site-packages/theano/sandbox/cuda/tests/test_driver.py", line 40, in test_nvidia_driver1
if not numpy.allclose(f(), a.sum()):
File "/home/jjhu/.local/lib/python2.7/site-packages/theano/compile/function_module.py", line 875, in __call__
storage_map=getattr(self.fn, 'storage_map', None))
File "/home/jjhu/.local/lib/python2.7/site-packages/theano/gof/link.py", line 317, in raise_with_op
reraise(exc_type, exc_value, exc_trace)
File "/home/jjhu/.local/lib/python2.7/site-packages/theano/compile/function_module.py", line 862, in __call__
self.fn() if output_subset is None else\
RuntimeError: Cuda error: kernel_reduce_ccontig_node_4894639462a290346189bb38dab7bb7e_0: out of memory. (grid: 1 x 1; block: 256 x 1 x 1)
Apply node that caused the error: GpuCAReduce{add}{1}(<CudaNdarrayType(float32, vector)>)
Toposort index: 0
Inputs types: [CudaNdarrayType(float32, vector)]
Inputs shapes: [(10000,)]
Inputs strides: [(1,)]
Inputs values: ['not shown']
Outputs clients: [[HostFromGpu(GpuCAReduce{add}{1}.0)]]
HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
I have searched for solutions. Someone suggests removing the compilation folder with rm -rf ./theano. I also checked that the owner of ./theano is not the root user. I also tried setting my ./theanorc as follows, but neither works for me:
[global]
floatX = float32
device = cpu
optimizer=fast_run
[lib]
cnmem = 0.1
[cuda]
root = /usr/local/cuda
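For reference, the same settings can also be passed via the THEANO_FLAGS environment variable before theano is imported; this mirrors the config above rather than being a separate fix:
# Equivalent of the .theanorc above, set before importing theano.
import os
os.environ["THEANO_FLAGS"] = "floatX=float32,device=cpu,optimizer=fast_run,lib.cnmem=0.1"
import theano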
The only working solution is to reboot or log out of the machine, which is very awkward. I don't know what causes this problem. Can anyone suggest some solutions?