Error invalid value when using CUDA [duplicate] - python

I'm getting this error when trying to run the code below in Python using CUDA. I'm following this tutorial, but I'm running it on a Windows 7 x64 machine.
https://www.youtube.com/watch?v=jKV1m8APttU
In fact, I ran check_cuda() and all tests passed. Can anyone help me figure out what the exact issue is here?
My Code:
import numpy as np
from timeit import default_timer as timer
from numbapro import vectorize, cuda

@vectorize(['float64(float64, float64)'], target='gpu')
def VectorAdd(a, b):
    return a + b

def main():
    N = 32000000
    A = np.ones(N, dtype=np.float64)
    B = np.ones(N, dtype=np.float64)
    C = np.zeros(N, dtype=np.float64)

    start = timer()
    C = VectorAdd(A, B)
    vectoradd_time = timer() - start

    print("C[:5] = " + str(C[:5]))
    print("C[-5:] = " + str(C[-5:]))
    print("VectorAdd took %f seconds" % vectoradd_time)

if __name__ == '__main__':
    main()
Error Message:
---------------------------------------------------------------------------
CudaAPIError Traceback (most recent call last)
<ipython-input-18-2436fc2ab63a> in <module>()
1 if __name__ == '__main__':
----> 2 main()
<ipython-input-17-64de53fdbe77> in main()
7
8 start = timer()
----> 9 C = VectorAdd(A, B)
10 vectoradd_time = timer() - start
11
C:\Anaconda2\lib\site-packages\numba\cuda\dispatcher.pyc in __call__(self, *args, **kws)
93 the input arguments.
94 """
---> 95 return CUDAUFuncMechanism.call(self.functions, args, kws)
96
97 def reduce(self, arg, stream=0):
C:\Anaconda2\lib\site-packages\numba\npyufunc\deviceufunc.pyc in call(cls, typemap, args, kws)
297
298 devarys.extend([devout])
--> 299 cr.launch(func, shape[0], stream, devarys)
300
301 if any_device:
C:\Anaconda2\lib\site-packages\numba\cuda\dispatcher.pyc in launch(self, func, count, stream, args)
202
203 def launch(self, func, count, stream, args):
--> 204 func.forall(count, stream=stream)(*args)
205
206 def is_device_array(self, obj):
C:\Anaconda2\lib\site-packages\numba\cuda\compiler.pyc in __call__(self, *args)
193
194 return kernel.configure(blkct, tpb, stream=self.stream,
--> 195 sharedmem=self.sharedmem)(*args)
196
197 class CUDAKernelBase(object):
C:\Anaconda2\lib\site-packages\numba\cuda\compiler.pyc in __call__(self, *args, **kwargs)
357 blockdim=self.blockdim,
358 stream=self.stream,
--> 359 sharedmem=self.sharedmem)
360
361 def bind(self):
C:\Anaconda2\lib\site-packages\numba\cuda\compiler.pyc in _kernel_call(self, args, griddim, blockdim, stream, sharedmem)
431 sharedmem=sharedmem)
432 # Invoke kernel
--> 433 cu_func(*kernelargs)
434
435 if self.debug:
C:\Anaconda2\lib\site-packages\numba\cuda\cudadrv\driver.pyc in __call__(self, *args)
1114
1115 launch_kernel(self.handle, self.griddim, self.blockdim,
-> 1116 self.sharedmem, streamhandle, args)
1117
1118 @property
C:\Anaconda2\lib\site-packages\numba\cuda\cudadrv\driver.pyc in launch_kernel(cufunc_handle, griddim, blockdim, sharedmem, hstream, args)
1158 hstream,
1159 params,
-> 1160 None)
1161
1162
C:\Anaconda2\lib\site-packages\numba\cuda\cudadrv\driver.pyc in safe_cuda_api_call(*args)
220 def safe_cuda_api_call(*args):
221 retcode = libfn(*args)
--> 222 self._check_error(fname, retcode)
223
224 setattr(self, fname, safe_cuda_api_call)
C:\Anaconda2\lib\site-packages\numba\cuda\cudadrv\driver.pyc in _check_error(self, fname, retcode)
250 errname = ERROR_MAP.get(retcode, "UNKNOWN_CUDA_ERROR")
251 msg = "Call to %s results in %s" % (fname, errname)
--> 252 raise CudaAPIError(retcode, msg)
253
254 def get_device(self, devnum=0):
CudaAPIError: [1] Call to cuLaunchKernel results in CUDA_ERROR_INVALID_VALUE

I found a solution to my problem through the NVIDIA Developer Forum. If you want more information about the solution, check out this link.
https://devtalk.nvidia.com/default/topic/962843/cuda-programming-and-performance/cudaapierror-1-call-to-culaunchkernel-results-in-cuda_error_invalid_value-in-python/?offset=3#4968130
In short:
When I changed N to 32000 (or any other smaller value), it worked nicely.
In fact, this means I was not compiling it for the correct GPU type (check_cuda is the function call to verify that).
Hope my answer helps someone.

This may mean that you are trying to run more threads in one block than are actually allowed. That was the case for me. So try splitting your execution into blocks, for example along the lines of the sketch below.
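For reference, here is a minimal sketch of splitting the same vector addition into explicit blocks with a plain numba.cuda kernel (not the original numbapro/vectorize code); the kernel name, the 256-thread block size, and the device-attribute query are illustrative assumptions:

from numba import cuda
import numpy as np

@cuda.jit
def vector_add_kernel(a, b, c):
    i = cuda.grid(1)          # absolute thread index across the whole grid
    if i < c.shape[0]:        # guard against threads past the end of the array
        c[i] = a[i] + b[i]

N = 32000000
A = np.ones(N, dtype=np.float64)
B = np.ones(N, dtype=np.float64)
C = np.zeros(N, dtype=np.float64)

device = cuda.get_current_device()
threads_per_block = min(256, device.MAX_THREADS_PER_BLOCK)  # stay under the device limit
blocks_per_grid = (N + threads_per_block - 1) // threads_per_block

vector_add_kernel[blocks_per_grid, threads_per_block](A, B, C)
print(C[:5])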

Related

How to parallel a function taking two arguments and return a dictionary in DASK

I have a function batch_opt that takes two arguments (an integer i and a pandas DataFrame train) and returns a Python dictionary. When I tried to parallelize the computation using Dask in Python, I got the TypeError Delayed objects are immutable. I am new to Dask. Can anyone help me out here? Thanks.
import time
from dask import delayed, compute

results = []
for i in range(0, 2):
    validation_res = delayed(batch_opt)(i, train)
    results.append(validation_res)

start = time.time()
res = compute(*results)
print(time.time() - start)
Trace:
TypeError Traceback (most recent call last)
<ipython-input-19-8463f64dec56> in <module>
5
6 start = time.time()
----> 7 res = compute(*results)
8 print(time.time() - start)
~/.conda/envs/odop/lib/python3.8/site-packages/dask/base.py in compute(*args, **kwargs)
568 postcomputes.append(x.__dask_postcompute__())
569
--> 570 results = schedule(dsk, keys, **kwargs)
571 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
572
~/.conda/envs/odop/lib/python3.8/site-packages/dask/threaded.py in get(dsk, result, cache, num_workers, pool, **kwargs)
77 pool = MultiprocessingPoolExecutor(pool)
78
---> 79 results = get_async(
80 pool.submit,
81 pool._max_workers,
~/.conda/envs/odop/lib/python3.8/site-packages/dask/local.py in get_async(submit, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, chunksize, **kwargs)
505 _execute_task(task, data) # Re-execute locally
506 else:
--> 507 raise_exception(exc, tb)
508 res, worker_id = loads(res_info)
509 state["cache"][key] = res
~/.conda/envs/odop/lib/python3.8/site-packages/dask/local.py in reraise(exc, tb)
313 if exc.__traceback__ is not tb:
314 raise exc.with_traceback(tb)
--> 315 raise exc
316
317
~/.conda/envs/odop/lib/python3.8/site-packages/dask/local.py in execute_task(key, task_info, dumps, loads, get_id, pack_exception)
218 try:
219 task, data = loads(task_info)
--> 220 result = _execute_task(task, data)
221 id = get_id()
222 result = dumps((result, id))
~/.conda/envs/odop/lib/python3.8/site-packages/dask/core.py in _execute_task(arg, cache, dsk)
117 # temporaries by their reference count and can execute certain
118 # operations in-place.
--> 119 return func(*(_execute_task(a, cache) for a in args))
120 elif not ishashable(arg):
121 return arg
<ipython-input-7-e3af5748e1cf> in batch_opt(i, train)
22 test.loc[:, 'seg'] = test.apply(lambda x: proc.assign_trxn(x), axis = 1)
23 test_policy_res, test_metrics_res = opt.analyze_result(fa_m, x, test, cum_to_day, cur_policy, policy)
---> 24 validation_res[(train_mon_yr_batch, test_mon_yr)] = {'train_policy': train_policy_res, 'train_result': train_metrics_res, 'test_policy': test_policy_res, 'test_result': test_metrics_res}
25 return validation_res
~/.conda/envs/odop/lib/python3.8/site-packages/dask/delayed.py in __setitem__(self, index, val)
564
565 def __setitem__(self, index, val):
--> 566 raise TypeError("Delayed objects are immutable")
567
568 def __iter__(self):
TypeError: Delayed objects are immutable
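For comparison, here is a minimal self-contained sketch of the delayed/compute pattern where each task builds and returns its own plain dict instead of assigning into an object defined outside the delayed call; batch_opt_sketch and the toy DataFrame are stand-ins, not the original code:

import pandas as pd
from dask import delayed, compute

train = pd.DataFrame({'a': [1, 2, 3]})   # stand-in for the real training frame

@delayed
def batch_opt_sketch(i, train):
    # build and return a fresh dict inside the task; nothing defined
    # outside the function is mutated, so no Delayed object is assigned into
    return {i: len(train)}

results = [batch_opt_sketch(i, train) for i in range(2)]
res = compute(*results)
print(res)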

Failure to parallelize code trying to load the same numpy array with joblib

I am new to the world of parallelization and encountered a very odd bug while trying to run a function that loads the same npy file on several cores.
My code is of the form:
import os
from pathlib import Path
from joblib import Parallel, delayed
import multiprocessing
import numpy as np

num_cores = multiprocessing.cpu_count()

mydir = 'path/of/your/choice'
myfile = 'myArray.npy'
mydir = Path(mydir)
myfile = mydir / myfile
os.chdir(mydir)

myarray = np.zeros((12345))
np.save(myfile, myarray)

def foo(myfile, x):
    # function loading myArray and working with it
    arr = np.load(myfile)
    return arr + x

if __name__ == '__main__':
    foo_results = Parallel(n_jobs=num_cores, backend="threading")(
        delayed(foo)(myfile, i) for i in range(10))
In my case, this script would run fine about 40% of the way, then return
--> 17 arr=np.load(mydir/'myArray.npy')
ValueError: cannot reshape array of size 0 into shape (12345,)
What blows my mind is that if I enter %pdb debug mode and actually try to run arr=np.load(mydir/'myArray.npy'), this works! So I assume that the issue stems from all the parallel processes running foo trying to load the same numpy array at the same time (as in debug mode, all the processes are paused and only the code that I execute actually runs).
This very minimal example actually works, presumably because the function is very simple and joblib handles it gracefully, but my code would be too long and complicated to post here. First of all, has anyone encountered a similar issue in the past? If no one manages to identify my issue, I will post my whole script.
Thanks for your help!
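Not a full answer, but one way to probe the concurrent-read hypothesis above, under the threading backend used in the question, is to serialize the np.load calls behind a lock. This is a minimal sketch; foo_locked and the lock are illustrative additions, not part of the original code:

import threading
import numpy as np
from joblib import Parallel, delayed

np.save('myArray.npy', np.zeros(12345))   # same toy array as above

_load_lock = threading.Lock()  # shared lock; only helps with the threading backend

def foo_locked(myfile, x):
    # serialize np.load so only one thread reads the file at a time
    with _load_lock:
        arr = np.load(myfile)
    return arr + x

if __name__ == '__main__':
    foo_results = Parallel(n_jobs=4, backend="threading")(
        delayed(foo_locked)('myArray.npy', i) for i in range(10))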
-------------------- EDIT ------------------
Given that there doesn't seem to be an easy answer with the toy code that I posted, here are the full error logs. I played around with the backends following @psarka's recommendation, and for some reason the following error arises with the default loky backend (again, there is no problem running the code in a non-parallel manner):
/media/maxime/ut_data/Dropbox/NeuroPyxels/npyx/corr.py in ccg_stack(dp, U_src, U_trg, cbin, cwin, normalize, all_to_all, name, sav, again, periods)
541
542 ccg_results=Parallel(n_jobs=num_cores)(\
--> 543 delayed(ccg)(*ccg_inputs[i]) for i in tqdm(range(len(ccg_inputs)), desc=f'Computing ccgs over {num_cores} cores'))
544 for ((i1, u1, i2, u2), CCG) in zip(ccg_ids,ccg_results):
545 if i1==i2:
~/miniconda3/envs/npyx/lib/python3.7/site-packages/joblib-1.0.1-py3.7.egg/joblib/parallel.py in __call__(self, iterable)
1052
1053 with self._backend.retrieval_context():
-> 1054 self.retrieve()
1055 # Make sure that we get a last message telling us we are done
1056 elapsed_time = time.time() - self._start_time
~/miniconda3/envs/npyx/lib/python3.7/site-packages/joblib-1.0.1-py3.7.egg/joblib/parallel.py in retrieve(self)
931 try:
932 if getattr(self._backend, 'supports_timeout', False):
--> 933 self._output.extend(job.get(timeout=self.timeout))
934 else:
935 self._output.extend(job.get())
~/miniconda3/envs/npyx/lib/python3.7/site-packages/joblib-1.0.1-py3.7.egg/joblib/_parallel_backends.py in wrap_future_result(future, timeout)
540 AsyncResults.get from multiprocessing."""
541 try:
--> 542 return future.result(timeout=timeout)
543 except CfTimeoutError as e:
544 raise TimeoutError from e
~/miniconda3/envs/npyx/lib/python3.7/concurrent/futures/_base.py in result(self, timeout)
426 raise CancelledError()
427 elif self._state == FINISHED:
--> 428 return self.__get_result()
429
430 self._condition.wait(timeout)
~/miniconda3/envs/npyx/lib/python3.7/concurrent/futures/_base.py in __get_result(self)
382 def __get_result(self):
383 if self._exception:
--> 384 raise self._exception
385 else:
386 return self._result
ValueError: Cannot load file containing pickled data when allow_pickle=False
but the following arises with the threading backend (which was originally used in my question) and is more informative - again, it is actually possible to run train = np.load(Path(dprm,fn)) in debug mode:
/media/maxime/ut_data/Dropbox/NeuroPyxels/npyx/corr.py in ccg_stack(dp, U_src, U_trg, cbin, cwin, normalize, all_to_all, name, sav, again, periods)
541
542 ccg_results=Parallel(n_jobs=num_cores, backend='threading')(\
--> 543 delayed(ccg)(*ccg_inputs[i]) for i in tqdm(range(len(ccg_inputs)), desc=f'Computing ccgs over {num_cores} cores'))
544 for ((i1, u1, i2, u2), CCG) in zip(ccg_ids,ccg_results):
545 if i1==i2:
~/miniconda3/envs/npyx/lib/python3.7/site-packages/joblib-1.0.1-py3.7.egg/joblib/parallel.py in __call__(self, iterable)
1052
1053 with self._backend.retrieval_context():
-> 1054 self.retrieve()
1055 # Make sure that we get a last message telling us we are done
1056 elapsed_time = time.time() - self._start_time
~/miniconda3/envs/npyx/lib/python3.7/site-packages/joblib-1.0.1-py3.7.egg/joblib/parallel.py in retrieve(self)
931 try:
932 if getattr(self._backend, 'supports_timeout', False):
--> 933 self._output.extend(job.get(timeout=self.timeout))
934 else:
935 self._output.extend(job.get())
~/miniconda3/envs/npyx/lib/python3.7/multiprocessing/pool.py in get(self, timeout)
655 return self._value
656 else:
--> 657 raise self._value
658
659 def _set(self, i, obj):
~/miniconda3/envs/npyx/lib/python3.7/multiprocessing/pool.py in worker(inqueue, outqueue, initializer, initargs, maxtasks, wrap_exception)
119 job, i, func, args, kwds = task
120 try:
--> 121 result = (True, func(*args, **kwds))
122 except Exception as e:
123 if wrap_exception and func is not _helper_reraises_exception:
~/miniconda3/envs/npyx/lib/python3.7/site-packages/joblib-1.0.1-py3.7.egg/joblib/_parallel_backends.py in __call__(self, *args, **kwargs)
593 def __call__(self, *args, **kwargs):
594 try:
--> 595 return self.func(*args, **kwargs)
596 except KeyboardInterrupt as e:
597 # We capture the KeyboardInterrupt and reraise it as
~/miniconda3/envs/npyx/lib/python3.7/site-packages/joblib-1.0.1-py3.7.egg/joblib/parallel.py in __call__(self)
261 with parallel_backend(self._backend, n_jobs=self._n_jobs):
262 return [func(*args, **kwargs)
--> 263 for func, args, kwargs in self.items]
264
265 def __reduce__(self):
~/miniconda3/envs/npyx/lib/python3.7/site-packages/joblib-1.0.1-py3.7.egg/joblib/parallel.py in <listcomp>(.0)
261 with parallel_backend(self._backend, n_jobs=self._n_jobs):
262 return [func(*args, **kwargs)
--> 263 for func, args, kwargs in self.items]
264
265 def __reduce__(self):
/media/maxime/ut_data/Dropbox/NeuroPyxels/npyx/corr.py in ccg(dp, U, bin_size, win_size, fs, normalize, ret, sav, verbose, periods, again, trains)
258 if verbose: print("File {} not found in routines memory.".format(fn))
259 crosscorrelograms = crosscorrelate_cyrille(dp, bin_size, win_size, sortedU, fs, True,
--> 260 periods=periods, verbose=verbose, trains=trains)
261 crosscorrelograms = np.asarray(crosscorrelograms, dtype='float64')
262 if crosscorrelograms.shape[0]<len(U): # no spikes were found in this period
/media/maxime/ut_data/Dropbox/NeuroPyxels/npyx/corr.py in crosscorrelate_cyrille(dp, bin_size, win_size, U, fs, symmetrize, periods, verbose, trains)
88 U=list(U)
89
---> 90 spike_times, spike_clusters = make_phy_like_spikeClustersTimes(dp, U, periods=periods, verbose=verbose, trains=trains)
91
92 return crosscorr_cyrille(spike_times, spike_clusters, win_size, bin_size, fs, symmetrize)
/media/maxime/ut_data/Dropbox/NeuroPyxels/npyx/corr.py in make_phy_like_spikeClustersTimes(dp, U, periods, verbose, trains)
46 for iu, u in enumerate(U):
47 # Even lists of strings can be dealt with as integers by being replaced by their indices
---> 48 trains_dic[iu]=trn(dp, u, sav=True, periods=periods, verbose=verbose) # trains in samples
49 else:
50 assert len(trains)>1
/media/maxime/ut_data/Dropbox/NeuroPyxels/npyx/spk_t.py in trn(dp, unit, sav, verbose, periods, again, enforced_rp)
106 if op.exists(Path(dprm,fn)) and not again:
107 if verbose: print("File {} found in routines memory.".format(fn))
--> 108 train = np.load(Path(dprm,fn))
109
110 # if not, compute it
~/miniconda3/envs/npyx/lib/python3.7/site-packages/numpy-1.21.0rc2-py3.7-linux-x86_64.egg/numpy/lib/npyio.py in load(file, mmap_mode, allow_pickle, fix_imports, encoding)
443 # Try a pickle
444 if not allow_pickle:
--> 445 raise ValueError("Cannot load file containing pickled data "
446 "when allow_pickle=False")
447 try:
ValueError: Cannot load file containing pickled data when allow_pickle=False
The original error ValueError: cannot reshape array of size 0 into shape (12345,) doesn't show up anymore for some reason.

Overriding * imports globally for jupyter

I'm running Jupyter Lab on Windows, and fastai.vision.utils.verify_images(fns) is giving me problems because it calls fastcore.parallel.parallel with the default n_workers=8. There are many ways around it, but I was trying to figure out a code block that I could slap into any notebook so that all underlying calls to parallel run with n_workers=1.
I tried the following cell:
import fastcore
import sys
_fastcore = fastcore
_parallel = lambda *args, **kwargs: fastcore.parallel.parallel(*args, **kwargs, n_workers=1)
_fastcore.parallel.parallel = _parallel
sys.modules['fastcore'] = _fastcore
fastcore.parallel.parallel
printing
<function __main__.<lambda>(*args, **kwargs)>
but when I try running verify_images, it still fails as if the patch never happened:
---------------------------------------------------------------------------
BrokenProcessPool Traceback (most recent call last)
<ipython-input-37-f1773f2c9e62> in <module>
3 # from mock import patch
4 # with patch('fastcore.parallel.parallel') as _parallel:
----> 5 failed = verify_images(fns)
6 # failed = L(fns[i] for i,o in enumerate(_parallel(verify_image, fns)) if not o)
7 failed
~\anaconda3\lib\site-packages\fastai\vision\utils.py in verify_images(fns)
59 def verify_images(fns):
60 "Find images in `fns` that can't be opened"
---> 61 return L(fns[i] for i,o in enumerate(parallel(verify_image, fns)) if not o)
62
63 # Cell
~\anaconda3\lib\site-packages\fastcore\parallel.py in parallel(f, items, n_workers, total, progress, pause, threadpool, timeout, chunksize, *args, **kwargs)
121 if total is None: total = len(items)
122 r = progress_bar(r, total=total, leave=False)
--> 123 return L(r)
124
125 # Cell
~\anaconda3\lib\site-packages\fastcore\foundation.py in __call__(cls, x, *args, **kwargs)
95 def __call__(cls, x=None, *args, **kwargs):
96 if not args and not kwargs and x is not None and isinstance(x,cls): return x
---> 97 return super().__call__(x, *args, **kwargs)
98
99 # Cell
~\anaconda3\lib\site-packages\fastcore\foundation.py in __init__(self, items, use_list, match, *rest)
103 def __init__(self, items=None, *rest, use_list=False, match=None):
104 if (use_list is not None) or not is_array(items):
--> 105 items = listify(items, *rest, use_list=use_list, match=match)
106 super().__init__(items)
107
~\anaconda3\lib\site-packages\fastcore\basics.py in listify(o, use_list, match, *rest)
54 elif isinstance(o, list): res = o
55 elif isinstance(o, str) or is_array(o): res = [o]
---> 56 elif is_iter(o): res = list(o)
57 else: res = [o]
58 if match is not None:
~\anaconda3\lib\concurrent\futures\process.py in _chain_from_iterable_of_lists(iterable)
482 careful not to keep references to yielded objects.
483 """
--> 484 for element in iterable:
485 element.reverse()
486 while element:
~\anaconda3\lib\concurrent\futures\_base.py in result_iterator()
609 # Careful not to keep a reference to the popped future
610 if timeout is None:
--> 611 yield fs.pop().result()
612 else:
613 yield fs.pop().result(end_time - time.monotonic())
~\anaconda3\lib\concurrent\futures\_base.py in result(self, timeout)
437 raise CancelledError()
438 elif self._state == FINISHED:
--> 439 return self.__get_result()
440 else:
441 raise TimeoutError()
~\anaconda3\lib\concurrent\futures\_base.py in __get_result(self)
386 def __get_result(self):
387 if self._exception:
--> 388 raise self._exception
389 else:
390 return self._result
BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
I suspect it has to do with fastai.vision.utils using * imports for fastcore. Is there a way to achieve what I want?
Since the parallel function has already been imported into the fastai.vision.utils module, the correct way is to monkeypatch that module rather than fastcore.parallel:
... # your code for custom `parallel` function goes here
import fastai.vision.utils
fastai.vision.utils.parallel = _parallel # assign your custom function here
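Putting the two pieces together, a minimal sketch of such a cell might look like this (the wrapper name _parallel is kept from the question; forcing n_workers through the kwargs dict is an illustrative choice, not verified against every fastai/fastcore version):

import fastcore.parallel
import fastai.vision.utils

_orig_parallel = fastcore.parallel.parallel

def _parallel(*args, **kwargs):
    # force single-worker execution regardless of what the caller requested
    kwargs['n_workers'] = 1
    return _orig_parallel(*args, **kwargs)

# patch the name that fastai.vision.utils actually resolves at call time
fastai.vision.utils.parallel = _parallel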

When terminating a Celery chain conditionally, it does not return the data

I have a chain of tasks and want to terminate it conditionally. I am following the steps in https://stackoverflow.com/a/21106596/243031, but after that, we are not getting the output.
My task looks like this:
from __future__ import absolute_import, unicode_literals
from .celery import app

@app.task(bind=True)
def add(self, x, y):
    if (x + y) % 2 == 0:
        self.request.callbacks[:] = []
    return x + y
Meaning: when the sum is even and the task is part of a chain, stop that chain.
But it gives an error.
In [13]: ~(add.s(2, 2) | add.s(3))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-13-0fe85c5d0e22> in <module>
----> 1 ~(add.s(2, 2) | add.s(3))
~/virtualenv/lib/python3.7/site-packages/celery/canvas.py in __invert__(self)
479
480 def __invert__(self):
--> 481 return self.apply_async().get()
482
483 def __reduce__(self):
~/virtualenv/lib/python3.7/site-packages/celery/result.py in get(self, timeout, propagate, interval, no_ack, follow_parents, callback, on_message, on_interval, disable_sync_subtasks, EXCEPTION_STATES, PROPAGATE_STATES)
228 propagate=propagate,
229 callback=callback,
--> 230 on_message=on_message,
231 )
232 wait = get # deprecated alias to :meth:`get`.
~/virtualenv/lib/python3.7/site-packages/celery/backends/asynchronous.py in wait_for_pending(self, result, callback, propagate, **kwargs)
197 callback=None, propagate=True, **kwargs):
198 self._ensure_not_eager()
--> 199 for _ in self._wait_for_pending(result, **kwargs):
200 pass
201 return result.maybe_throw(callback=callback, propagate=propagate)
~/virtualenv/lib/python3.7/site-packages/celery/backends/asynchronous.py in _wait_for_pending(self, result, timeout, on_interval, on_message, **kwargs)
265 for _ in self.drain_events_until(
266 result.on_ready, timeout=timeout,
--> 267 on_interval=on_interval):
268 yield
269 sleep(0)
~/virtualenv/lib/python3.7/site-packages/celery/backends/asynchronous.py in drain_events_until(self, p, timeout, interval, on_interval, wait)
56 pass
57 if on_interval:
---> 58 on_interval()
59 if p.ready: # got event on the wanted channel.
60 break
~/virtualenv/lib/python3.7/site-packages/vine/promises.py in __call__(self, *args, **kwargs)
158 self.value = (ca, ck) = (retval,), {}
159 except Exception:
--> 160 return self.throw()
161 else:
162 self.value = (ca, ck) = final_args, final_kwargs
~/virtualenv/lib/python3.7/site-packages/vine/promises.py in __call__(self, *args, **kwargs)
155 ck = {}
156 else:
--> 157 retval = fun(*final_args, **final_kwargs)
158 self.value = (ca, ck) = (retval,), {}
159 except Exception:
~/virtualenv/lib/python3.7/site-packages/celery/result.py in _maybe_reraise_parent_error(self)
234 def _maybe_reraise_parent_error(self):
235 for node in reversed(list(self._parents())):
--> 236 node.maybe_throw()
237
238 def _parents(self):
~/virtualenv/lib/python3.7/site-packages/celery/result.py in maybe_throw(self, propagate, callback)
333 cache['status'], cache['result'], cache.get('traceback'))
334 if state in states.PROPAGATE_STATES and propagate:
--> 335 self.throw(value, self._to_remote_traceback(tb))
336 if callback is not None:
337 callback(self.id, value)
~/virtualenv/lib/python3.7/site-packages/celery/result.py in throw(self, *args, **kwargs)
326
327 def throw(self, *args, **kwargs):
--> 328 self.on_ready.throw(*args, **kwargs)
329
330 def maybe_throw(self, propagate=True, callback=None):
~/virtualenv/lib/python3.7/site-packages/vine/promises.py in throw(self, exc, tb, propagate)
232 if tb is None and (exc is None or exc is current_exc):
233 raise
--> 234 reraise(type(exc), exc, tb)
235
236 @property
~/virtualenv/lib/python3.7/site-packages/vine/utils.py in reraise(tp, value, tb)
28 if value.__traceback__ is not tb:
29 raise value.with_traceback(tb)
---> 30 raise value
TypeError: 'NoneType' object does not support item assignment
I tried self.request.chain = None and self.request.chain[:] = []:
@app.task(bind=True)
def add(self, x, y):
    if self.request.chain and (x + y) % 2 == 0:
        self.request.chain = None
    return x + y
The logs show that it returns the data:
[2021-03-24 22:11:48,795: WARNING/MainProcess] Substantial drift from scrpc_worker#458cd596aed3 may mean clocks are out of sync. Current drift is
14400 seconds. [orig: 2021-03-24 22:11:48.795402 recv: 2021-03-25 02:11:48.796579]
[2021-03-24 22:11:52,227: INFO/MainProcess] Events of group {task} enabled by remote.
[2021-03-24 22:11:57,853: INFO/MainProcess] Received task: myprj.tasks.add[8aaae68f-d5ca-4c0a-8f2e-f1c7b5916e29]
[2021-03-24 22:11:57,867: INFO/ForkPoolWorker-8] Task myprj.tasks.add[8aaae68f-d5ca-4c0a-8f2e-f1c7b5916e29] succeeded in 0.01066690299999884s: 4
but it waits on a socket, and when we press Ctrl+C, it gives the traceback below.
In [1]: from myprj.tasks import add
In [2]: ~(add.s(2, 2) | add.s(3))
^C---------------------------------------------------------------------------
KeyboardInterrupt Traceback (most recent call last)
<ipython-input-2-0fe85c5d0e22> in <module>
----> 1 ~(add.s(2, 2) | add.s(3))
~/virtualenv/lib/python3.7/site-packages/celery/canvas.py in __invert__(self)
479
480 def __invert__(self):
--> 481 return self.apply_async().get()
482
483 def __reduce__(self):
~/virtualenv/lib/python3.7/site-packages/celery/result.py in get(self, timeout, propagate, interval, no_ack, follow_parents, callback, on_message, on_interval, disable_sync_subtasks, EXCEPTION_STATES, PROPAGATE_STATES)
228 propagate=propagate,
229 callback=callback,
--> 230 on_message=on_message,
231 )
232 wait = get # deprecated alias to :meth:`get`.
~/virtualenv/lib/python3.7/site-packages/celery/backends/asynchronous.py in wait_for_pending(self, result, callback, propagate, **kwargs)
197 callback=None, propagate=True, **kwargs):
198 self._ensure_not_eager()
--> 199 for _ in self._wait_for_pending(result, **kwargs):
200 pass
201 return result.maybe_throw(callback=callback, propagate=propagate)
~/virtualenv/lib/python3.7/site-packages/celery/backends/asynchronous.py in _wait_for_pending(self, result, timeout, on_interval, on_message, **kwargs)
265 for _ in self.drain_events_until(
266 result.on_ready, timeout=timeout,
--> 267 on_interval=on_interval):
268 yield
269 sleep(0)
~/virtualenv/lib/python3.7/site-packages/celery/backends/asynchronous.py in drain_events_until(self, p, timeout, interval, on_interval, wait)
52 raise socket.timeout()
53 try:
---> 54 yield self.wait_for(p, wait, timeout=interval)
55 except socket.timeout:
56 pass
~/virtualenv/lib/python3.7/site-packages/celery/backends/asynchronous.py in wait_for(self, p, wait, timeout)
61
62 def wait_for(self, p, wait, timeout=None):
---> 63 wait(timeout=timeout)
64
65
~/virtualenv/lib/python3.7/site-packages/celery/backends/redis.py in drain_events(self, timeout)
149 if self._pubsub:
150 with self.reconnect_on_error():
--> 151 message = self._pubsub.get_message(timeout=timeout)
152 if message and message['type'] == 'message':
153 self.on_state_change(self._decode_result(message['data']), message)
~/virtualenv/lib/python3.7/site-packages/redis/client.py in get_message(self, ignore_subscribe_messages, timeout)
3615 number.
3616 """
-> 3617 response = self.parse_response(block=False, timeout=timeout)
3618 if response:
3619 return self.handle_message(response, ignore_subscribe_messages)
~/virtualenv/lib/python3.7/site-packages/redis/client.py in parse_response(self, block, timeout)
3501 self.check_health()
3502
-> 3503 if not block and not conn.can_read(timeout=timeout):
3504 return None
3505 response = self._execute(conn, conn.read_response)
~/virtualenv/lib/python3.7/site-packages/redis/connection.py in can_read(self, timeout)
732 self.connect()
733 sock = self._sock
--> 734 return self._parser.can_read(timeout)
735
736 def read_response(self):
~/virtualenv/lib/python3.7/site-packages/redis/connection.py in can_read(self, timeout)
319
320 def can_read(self, timeout):
--> 321 return self._buffer and self._buffer.can_read(timeout)
322
323 def read_response(self):
~/virtualenv/lib/python3.7/site-packages/redis/connection.py in can_read(self, timeout)
229 return bool(self.length) or \
230 self._read_from_socket(timeout=timeout,
--> 231 raise_on_timeout=False)
232
233 def read(self, length):
~/virtualenv/lib/python3.7/site-packages/redis/connection.py in _read_from_socket(self, length, timeout, raise_on_timeout)
196 sock.settimeout(timeout)
197 while True:
--> 198 data = recv(self._sock, socket_read_size)
199 # an empty string indicates the server shutdown the socket
200 if isinstance(data, bytes) and len(data) == 0:
~/virtualenv/lib/python3.7/site-packages/redis/_compat.py in recv(sock, *args, **kwargs)
70 else: # Python 3.5 and above automatically retry EINTR
71 def recv(sock, *args, **kwargs):
---> 72 return sock.recv(*args, **kwargs)
73
74 def recv_into(sock, *args, **kwargs):
KeyboardInterrupt:
First of all, is terminating the chain a good option, or should we add a condition to the next task in the chain so that, if some condition holds, it doesn't process and just returns what it receives (a sketch of that pass-through variant is below)?
If we do have to terminate the chain, what is the best option, so that you still get the output of the partial chain?
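For reference, here is a minimal sketch of the pass-through alternative mentioned above; add_next is an illustrative second task (not from the original code) that simply forwards the previous result when the stop condition is met:

from .celery import app

@app.task(bind=True)
def add(self, x, y):
    return x + y

@app.task(bind=True)
def add_next(self, previous, z):
    # if the upstream result already met the stop condition,
    # skip the work and just forward what we received
    if previous % 2 == 0:
        return previous
    return previous + z

# usage: ~(add.s(2, 2) | add_next.s(3)) returns 4, with add_next doing no extra work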

pytorch RuntimeError: CUDA error: device-side assert triggered

I have a notebook on Google Colab that fails with the following error:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
93 exception = e
---> 94 raise e
95 finally: cb_handler.on_train_end(exception)
/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
83 xb, yb = cb_handler.on_batch_begin(xb, yb)
---> 84 loss = loss_batch(model, xb, yb, loss_func, opt, cb_handler)
85 if cb_handler.on_batch_end(loss): break
/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py in loss_batch(model, xb, yb, loss_func, opt, cb_handler)
24 if opt is not None:
---> 25 loss = cb_handler.on_backward_begin(loss)
26 loss.backward()
/usr/local/lib/python3.6/dist-packages/fastai/callback.py in on_backward_begin(self, loss)
223 for cb in self.callbacks:
--> 224 a = cb.on_backward_begin(**self.state_dict)
225 if a is not None: self.state_dict['last_loss'] = a
/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py in on_backward_begin(self, smooth_loss, **kwargs)
266 if self.pbar is not None and hasattr(self.pbar,'child'):
--> 267 self.pbar.child.comment = f'{smooth_loss:.4f}'
268
/usr/local/lib/python3.6/dist-packages/torch/tensor.py in __format__(self, format_spec)
377 if self.dim() == 0:
--> 378 return self.item().__format__(format_spec)
379 return object.__format__(self, format_spec)
RuntimeError: CUDA error: device-side assert triggered
During handling of the above exception, another exception occurred:
RuntimeError Traceback (most recent call last)
<ipython-input-33-dd390b1c8108> in <module>()
----> 1 lr_find(learn)
2 learn.recorder.plot()
/usr/local/lib/python3.6/dist-packages/fastai/train.py in lr_find(learn, start_lr, end_lr, num_it, stop_div, **kwargs)
26 cb = LRFinder(learn, start_lr, end_lr, num_it, stop_div)
27 a = int(np.ceil(num_it/len(learn.data.train_dl)))
---> 28 learn.fit(a, start_lr, callbacks=[cb], **kwargs)
29
30 def to_fp16(learn:Learner, loss_scale:float=512., flat_master:bool=False)->Learner:
/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py in fit(self, epochs, lr, wd, callbacks)
160 callbacks = [cb(self) for cb in self.callback_fns] + listify(callbacks)
161 fit(epochs, self.model, self.loss_func, opt=self.opt, data=self.data, metrics=self.metrics,
--> 162 callbacks=self.callbacks+callbacks)
163
164 def create_opt(self, lr:Floats, wd:Floats=0.)->None:
/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
93 exception = e
94 raise e
---> 95 finally: cb_handler.on_train_end(exception)
96
97 loss_func_name2activ = {'cross_entropy_loss': partial(F.softmax, dim=1), 'nll_loss': torch.exp, 'poisson_nll_loss': torch.exp,
/usr/local/lib/python3.6/dist-packages/fastai/callback.py in on_train_end(self, exception)
254 def on_train_end(self, exception:Union[bool,Exception])->None:
255 "Handle end of training, `exception` is an `Exception` or False if no exceptions during training."
--> 256 self('train_end', exception=exception)
257
258 class AverageMetric(Callback):
/usr/local/lib/python3.6/dist-packages/fastai/callback.py in __call__(self, cb_name, call_mets, **kwargs)
185 "Call through to all of the `CallbakHandler` functions."
186 if call_mets: [getattr(met, f'on_{cb_name}')(**self.state_dict, **kwargs) for met in self.metrics]
--> 187 return [getattr(cb, f'on_{cb_name}')(**self.state_dict, **kwargs) for cb in self.callbacks]
188
189 def on_train_begin(self, epochs:int, pbar:PBar, metrics:MetricFuncList)->None:
/usr/local/lib/python3.6/dist-packages/fastai/callback.py in <listcomp>(.0)
185 "Call through to all of the `CallbakHandler` functions."
186 if call_mets: [getattr(met, f'on_{cb_name}')(**self.state_dict, **kwargs) for met in self.metrics]
--> 187 return [getattr(cb, f'on_{cb_name}')(**self.state_dict, **kwargs) for cb in self.callbacks]
188
189 def on_train_begin(self, epochs:int, pbar:PBar, metrics:MetricFuncList)->None:
/usr/local/lib/python3.6/dist-packages/fastai/callbacks/lr_finder.py in on_train_end(self, **kwargs)
45 # restore the valid_dl we turned of on `__init__`
46 self.data.valid_dl = self.valid_dl
---> 47 self.learn.load('tmp')
48 if hasattr(self.learn.model, 'reset'): self.learn.model.reset()
49 print('LR Finder complete, type {learner_name}.recorder.plot() to see the graph.')
/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py in load(self, name, device)
202 "Load model `name` from `self.model_dir` using `device`, defaulting to `self.data.device`."
203 if device is None: device = self.data.device
--> 204 self.model.load_state_dict(torch.load(self.path/self.model_dir/f'{name}.pth', map_location=device))
205 return self
206
/usr/local/lib/python3.6/dist-packages/torch/serialization.py in load(f, map_location, pickle_module)
356 f = open(f, 'rb')
357 try:
--> 358 return _load(f, map_location, pickle_module)
359 finally:
360 if new_fd:
/usr/local/lib/python3.6/dist-packages/torch/serialization.py in _load(f, map_location, pickle_module)
527 unpickler = pickle_module.Unpickler(f)
528 unpickler.persistent_load = persistent_load
--> 529 result = unpickler.load()
530
531 deserialized_storage_keys = pickle_module.load(f)
/usr/local/lib/python3.6/dist-packages/torch/serialization.py in persistent_load(saved_id)
493 if root_key not in deserialized_objects:
494 deserialized_objects[root_key] = restore_location(
--> 495 data_type(size), location)
496 storage = deserialized_objects[root_key]
497 if view_metadata is not None:
/usr/local/lib/python3.6/dist-packages/torch/serialization.py in restore_location(storage, location)
376 elif isinstance(map_location, torch.device):
377 def restore_location(storage, location):
--> 378 return default_restore_location(storage, str(map_location))
379 else:
380 def restore_location(storage, location):
/usr/local/lib/python3.6/dist-packages/torch/serialization.py in default_restore_location(storage, location)
102 def default_restore_location(storage, location):
103 for _, _, fn in _package_registry:
--> 104 result = fn(storage, location)
105 if result is not None:
106 return result
/usr/local/lib/python3.6/dist-packages/torch/serialization.py in _cuda_deserialize(obj, location)
84 'to an existing device.'.format(
85 device, torch.cuda.device_count()))
---> 86 return obj.cuda(device)
87
88
/usr/local/lib/python3.6/dist-packages/torch/_utils.py in _cuda(self, device, non_blocking, **kwargs)
74 else:
75 new_type = getattr(torch.cuda, self.__class__.__name__)
---> 76 return new_type(self.size()).copy_(self, non_blocking)
77
78
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THC/generic/THCTensorCopy.cpp:20
There is no information about the real cause. I tried to get the stack trace by forcing CUDA to run on one GPU (as suggested here) using a cell like this:
!export CUDA_LAUNCH_BLOCKING=1
But this does not seem to work; I still get the same error.
Is there another way that works with Google Colab?
Be sure that your target values start at zero and go up to the number of classes minus 1. For example: if you have 100 classification classes, your targets should be in the range 0 to 99. A quick check is sketched below.
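A sanity check along those lines might look like this (a minimal sketch; targets and num_classes are stand-ins for your actual labels and class count):

import torch

targets = torch.tensor([0, 3, 99])   # stand-in for your label tensor
num_classes = 100

# losses such as nn.CrossEntropyLoss expect labels in [0, num_classes - 1]
assert targets.min().item() >= 0, "labels must not be negative"
assert targets.max().item() < num_classes, "labels must be < num_classes"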
!export FOO=blah is usually not useful to run in a notebook because ! means run the following command in a sub-shell, so the effect of the statement is gone by the time the ! returns.
You might have more success by storing your python code in a file and then executing that file in a subshell:
In one cell:
%%writefile foo.py
[...your code...]
In the next cell:
!export CUDA_LAUNCH_BLOCKING=1; python3 foo.py
(or s/python3/python2/ if you're writing py2)
Switch the Hardware Accelerator Type to "None" under Runtime -> Change Runtime Type. This should give you a more meaningful error message.
The proper way to set environment variables in Google Colab is to use the os module:
import os
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"
Using the os library allows you to set whatever environment variables you need. Setting CUDA_LAUNCH_BLOCKING this way enables proper CUDA tracebacks in Google Colab.
