I am trying to use dask to parallelise some code. The function that I parallelise has 3 arguments, but only one of these changes as the loop progresses. This is what I have so far:
import dask
import distributed
import numpy as np
from dask_jobqueue import SLURMCluster

# Set up client
cluster = SLURMCluster(cores=1, memory='40 GB',
                       queue='brc', interface='em1',
                       log_directory='./dask_logs')
cluster.scale(jobs=2)
client = distributed.Client(cluster)

# Function to be parallelised
def nT_loop(i, P, inv_DiagCe):
    x = P[:, i] * np.squeeze(-inv_DiagCe)
    return x

P = np.random.rand(64620, 64620)
inv_DiagCe = np.random.rand(64620)

# Run loop
res1 = []
for i in range(2):
    res = dask.delayed(nT_loop)(i, P, inv_DiagCe)
    res1.append(res)

# Compute results
res1 = dask.compute(*res1)
When I run this, however, it gives the following error:
~/miniconda3/envs/python38/lib/python3.8/site-packages/distributed/protocol/pickle.py in dumps(x, buffer_callback, protocol)
48 buffers.clear()
---> 49 result = pickle.dumps(x, **dump_kwargs)
50 if len(result) < 1000:
MemoryError:
During handling of the above exception, another exception occurred:
MemoryError Traceback (most recent call last)
~/wang_model/estimation.py in
208 #P[:,i] = P[:,i]* np.squeeze(-inv_DiagCe) #bsxfun(#times, P(:,i), -inv_DiagCe');
209
---> 210 res1 = dask.compute(*res1)
211 print(datetime.now().strftime("%H:%M:%S"))
~/miniconda3/envs/python38/lib/python3.8/site-packages/dask/base.py in compute(*args, **kwargs)
450 postcomputes.append(x.__dask_postcompute__())
451
--> 452 results = schedule(dsk, keys, **kwargs)
453 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
454
~/miniconda3/envs/python38/lib/python3.8/site-packages/distributed/client.py in get(self, dsk, keys, restrictions, loose_restrictions, resources, sync, asynchronous, direct, retries, priority, fifo_timeout, actors, **kwargs)
2703 Client.compute: Compute asynchronous collections
2704 """
-> 2705 futures = self._graph_to_futures(
2706 dsk,
2707 keys=set(flatten([keys])),
~/miniconda3/envs/python38/lib/python3.8/site-packages/distributed/client.py in _graph_to_futures(self, dsk, keys, restrictions, loose_restrictions, priority, user_priority, resources, retries, fifo_timeout, actors)
2639 {
2640 "op": "update-graph",
-> 2641 "tasks": valmap(dumps_task, dsk),
2642 "dependencies": dependencies,
2643 "keys": list(map(tokey, keys)),
~/miniconda3/envs/python38/lib/python3.8/site-packages/cytoolz/dicttoolz.pyx in cytoolz.dicttoolz.valmap()
~/miniconda3/envs/python38/lib/python3.8/site-packages/cytoolz/dicttoolz.pyx in cytoolz.dicttoolz.valmap()
~/miniconda3/envs/python38/lib/python3.8/site-packages/distributed/worker.py in dumps_task(task)
3356 return d
3357 elif not any(map(_maybe_complex, task[1:])):
-> 3358 return {"function": dumps_function(task[0]), "args": warn_dumps(task[1:])}
3359 return to_serialize(task)
3360
~/miniconda3/envs/python38/lib/python3.8/site-packages/distributed/worker.py in warn_dumps(obj, dumps, limit)
3365 def warn_dumps(obj, dumps=pickle.dumps, limit=1e6):
3366 """ Dump an object to bytes, warn if those bytes are large """
-> 3367 b = dumps(obj, protocol=4)
3368 if not _warn_dumps_warned[0] and len(b) > limit:
3369 _warn_dumps_warned[0] = True
~/miniconda3/envs/python38/lib/python3.8/site-packages/distributed/protocol/pickle.py in dumps(x, buffer_callback, protocol)
58 try:
59 buffers.clear()
---> 60 result = cloudpickle.dumps(x, **dump_kwargs)
61 except Exception as e:
62 logger.info("Failed to serialize %s. Exception: %s", x, e)
~/miniconda3/envs/python38/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py in dumps(obj, protocol, buffer_callback)
71 file, protocol=protocol, buffer_callback=buffer_callback
72 )
---> 73 cp.dump(obj)
74 return file.getvalue()
75
~/miniconda3/envs/python38/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py in dump(self, obj)
561 def dump(self, obj):
562 try:
--> 563 return Pickler.dump(self, obj)
564 except RuntimeError as e:
565 if "recursion" in e.args[0]:
MemoryError:
I think this may be related to the large size of 'P'. Does anyone have any advice?
Thanks
Here
P = np.random.rand(64620, 64620)
you produce a massive array in memory, and then make copies to send to workers. Your function also returns an equally big array.
You should at the very least use client.scatter to send the array to the workers once, rather than embedding the array in the task graph.
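For the delayed approach, a minimal sketch of the scatter route might look like this (reusing the names from your snippet; the array still has to fit in the client's and each worker's memory):
P_future = client.scatter(P, broadcast=True)   # ship P to the workers once
inv_future = client.scatter(inv_DiagCe)
res1 = []
for i in range(2):
    res1.append(dask.delayed(nT_loop)(i, P_future, inv_future))
res1 = dask.compute(*res1)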
But actually, dask already has a perfectly good interface, dask.array, designed to handle large arrays chunk-wise without breaking memory. I suggest you use it instead of your delayed-function approach.
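A rough sketch of that chunk-wise route (the chunk sizes here are arbitrary assumptions):
import dask.array as da
P = da.random.random((64620, 64620), chunks=(4000, 4000))
inv_DiagCe = da.random.random(64620, chunks=4000)
# result[r, i] = P[r, i] * -inv_DiagCe[r], computed chunk by chunk
x = P * (-inv_DiagCe)[:, None]
first_two_cols = x[:, :2].compute()   # materialise only what you need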
Related
How can I properly serialize metpy units (based on pint) to work with dask distributed? As far as I understand, dask distributed automatically pickles data for ease of transfer, but fails to pickle the metpy units, which are necessary for the computation. Error produced: TypeError: cannot pickle 'weakref' object. MWE below.
import metpy.calc as mpcalc
from metpy.units import units
from dask.distributed import Client, LocalCluster

def calculate_dewpoint(vapor_pressure):
    dewpoint = mpcalc.dewpoint(vapor_pressure * units('hPa'))
    return dewpoint

cluster = LocalCluster()
client = Client(cluster)

## works
vapor_pressure = 5
dp = calculate_dewpoint(vapor_pressure)
print(dp)

## doesn't work
vapor_pressure = 5
dp_future = client.submit(calculate_dewpoint, vapor_pressure)
dp = dp_future.result()
EDIT: Added full traceback.
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/glade/work/cbecker/miniconda3/envs/risk/lib/python3.8/site-packages/distributed/worker.py in dumps_function(func)
4271 with _cache_lock:
-> 4272 result = cache_dumps[func]
4273 except KeyError:
/glade/work/cbecker/miniconda3/envs/risk/lib/python3.8/site-packages/distributed/utils.py in __getitem__(self, key)
1362 def __getitem__(self, key):
-> 1363 value = super().__getitem__(key)
1364 self.data.move_to_end(key)
/glade/work/cbecker/miniconda3/envs/risk/lib/python3.8/collections/__init__.py in __getitem__(self, key)
1009 return self.__class__.__missing__(self, key)
-> 1010 raise KeyError(key)
1011 def __setitem__(self, key, item): self.data[key] = item
KeyError: <function calculate_dewpoint at 0x2ad5e010f0d0>
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
/glade/work/cbecker/miniconda3/envs/risk/lib/python3.8/site-packages/distributed/protocol/pickle.py in dumps(x, buffer_callback, protocol)
52 buffers.clear()
---> 53 result = cloudpickle.dumps(x, **dump_kwargs)
54 elif not _always_use_pickle_for(x) and b"__main__" in result:
/glade/work/cbecker/miniconda3/envs/risk/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py in dumps(obj, protocol, buffer_callback)
72 )
---> 73 cp.dump(obj)
74 return file.getvalue()
/glade/work/cbecker/miniconda3/envs/risk/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py in dump(self, obj)
601 try:
--> 602 return Pickler.dump(self, obj)
603 except RuntimeError as e:
TypeError: cannot pickle 'weakref' object
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
/glade/scratch/cbecker/ipykernel_272346/952144406.py in <module>
20 ## doesn't work
21 vapor_pressure = 5
---> 22 dp_future = client.submit(calculate_dewpoint, vapor_pressure)
23 dp = dp_future.result()
/glade/work/cbecker/miniconda3/envs/risk/lib/python3.8/site-packages/distributed/client.py in submit(self, func, key, workers, resources, retries, priority, fifo_timeout, allow_other_workers, actor, actors, pure, *args, **kwargs)
1577 dsk = {skey: (func,) + tuple(args)}
1578
-> 1579 futures = self._graph_to_futures(
1580 dsk,
1581 [skey],
/glade/work/cbecker/miniconda3/envs/risk/lib/python3.8/site-packages/distributed/client.py in _graph_to_futures(self, dsk, keys, workers, allow_other_workers, priority, user_priority, resources, retries, fifo_timeout, actors)
2628 # Pack the high level graph before sending it to the scheduler
2629 keyset = set(keys)
-> 2630 dsk = dsk.__dask_distributed_pack__(self, keyset, annotations)
2631
2632 # Create futures before sending graph (helps avoid contention)
/glade/work/cbecker/miniconda3/envs/risk/lib/python3.8/site-packages/dask/highlevelgraph.py in __dask_distributed_pack__(self, client, client_keys, annotations)
1074 "__module__": layer.__module__,
1075 "__name__": type(layer).__name__,
-> 1076 "state": layer.__dask_distributed_pack__(
1077 self.get_all_external_keys(),
1078 self.key_dependencies,
/glade/work/cbecker/miniconda3/envs/risk/lib/python3.8/site-packages/dask/highlevelgraph.py in __dask_distributed_pack__(self, all_hlg_keys, known_key_dependencies, client, client_keys)
432 for k, v in dsk.items()
433 }
--> 434 dsk = toolz.valmap(dumps_task, dsk)
435 return {"dsk": dsk, "dependencies": dependencies}
436
/glade/work/cbecker/miniconda3/envs/risk/lib/python3.8/site-packages/cytoolz/dicttoolz.pyx in cytoolz.dicttoolz.valmap()
/glade/work/cbecker/miniconda3/envs/risk/lib/python3.8/site-packages/cytoolz/dicttoolz.pyx in cytoolz.dicttoolz.valmap()
/glade/work/cbecker/miniconda3/envs/risk/lib/python3.8/site-packages/distributed/worker.py in dumps_task(task)
4308 return d
4309 elif not any(map(_maybe_complex, task[1:])):
-> 4310 return {"function": dumps_function(task[0]), "args": warn_dumps(task[1:])}
4311 return to_serialize(task)
4312
/glade/work/cbecker/miniconda3/envs/risk/lib/python3.8/site-packages/distributed/worker.py in dumps_function(func)
4272 result = cache_dumps[func]
4273 except KeyError:
-> 4274 result = pickle.dumps(func, protocol=4)
4275 if len(result) < 100000:
4276 with _cache_lock:
/glade/work/cbecker/miniconda3/envs/risk/lib/python3.8/site-packages/distributed/protocol/pickle.py in dumps(x, buffer_callback, protocol)
58 try:
59 buffers.clear()
---> 60 result = cloudpickle.dumps(x, **dump_kwargs)
61 except Exception as e:
62 logger.info("Failed to serialize %s. Exception: %s", x, e)
/glade/work/cbecker/miniconda3/envs/risk/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py in dumps(obj, protocol, buffer_callback)
71 file, protocol=protocol, buffer_callback=buffer_callback
72 )
---> 73 cp.dump(obj)
74 return file.getvalue()
75
/glade/work/cbecker/miniconda3/envs/risk/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py in dump(self, obj)
600 def dump(self, obj):
601 try:
--> 602 return Pickler.dump(self, obj)
603 except RuntimeError as e:
604 if "recursion" in e.args[0]:
TypeError: cannot pickle 'weakref' object
So there's an issue where (I think) it's trying to serialize the unit registry or units and transfer them between processes. To work around this, try moving the import of units inside the function (though this might cause some other problems):
def calculate_dewpoint(vapor_pressure):
    from metpy.units import units
    return mpcalc.dewpoint(vapor_pressure * units('hPa'))
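A rough end-to-end sketch of this workaround applied to the MWE above (untested; the module-level mpcalc import stays where it is, only the units import moves inside the function so the unit registry is never captured by the pickler):
import metpy.calc as mpcalc
from dask.distributed import Client, LocalCluster

def calculate_dewpoint(vapor_pressure):
    from metpy.units import units   # imported on the worker, not serialized with the function
    return mpcalc.dewpoint(vapor_pressure * units('hPa'))

cluster = LocalCluster()
client = Client(cluster)
dp = client.submit(calculate_dewpoint, 5).result()
print(dp)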
I have a function batch_opt that takes two arguments (an integer i and a pandas dataframe train) and returns a Python dictionary. When I tried to parallelize the computation using Dask in Python, I got the TypeError Delayed objects are immutable. I am new to Dask. Can anyone help me out here? Thanks.
import time
from dask import delayed, compute

results = []
for i in range(0, 2):
    validation_res = delayed(batch_opt)(i, train)
    results.append(validation_res)

start = time.time()
res = compute(*results)
print(time.time() - start)
Trace:
TypeError Traceback (most recent call last)
<ipython-input-19-8463f64dec56> in <module>
5
6 start = time.time()
----> 7 res = compute(*results)
8 print(time.time() - start)
~/.conda/envs/odop/lib/python3.8/site-packages/dask/base.py in compute(*args, **kwargs)
568 postcomputes.append(x.__dask_postcompute__())
569
--> 570 results = schedule(dsk, keys, **kwargs)
571 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
572
~/.conda/envs/odop/lib/python3.8/site-packages/dask/threaded.py in get(dsk, result, cache, num_workers, pool, **kwargs)
77 pool = MultiprocessingPoolExecutor(pool)
78
---> 79 results = get_async(
80 pool.submit,
81 pool._max_workers,
~/.conda/envs/odop/lib/python3.8/site-packages/dask/local.py in get_async(submit, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, chunksize, **kwargs)
505 _execute_task(task, data) # Re-execute locally
506 else:
--> 507 raise_exception(exc, tb)
508 res, worker_id = loads(res_info)
509 state["cache"][key] = res
~/.conda/envs/odop/lib/python3.8/site-packages/dask/local.py in reraise(exc, tb)
313 if exc.__traceback__ is not tb:
314 raise exc.with_traceback(tb)
--> 315 raise exc
316
317
~/.conda/envs/odop/lib/python3.8/site-packages/dask/local.py in execute_task(key, task_info, dumps, loads, get_id, pack_exception)
218 try:
219 task, data = loads(task_info)
--> 220 result = _execute_task(task, data)
221 id = get_id()
222 result = dumps((result, id))
~/.conda/envs/odop/lib/python3.8/site-packages/dask/core.py in _execute_task(arg, cache, dsk)
117 # temporaries by their reference count and can execute certain
118 # operations in-place.
--> 119 return func(*(_execute_task(a, cache) for a in args))
120 elif not ishashable(arg):
121 return arg
<ipython-input-7-e3af5748e1cf> in batch_opt(i, train)
22 test.loc[:, 'seg'] = test.apply(lambda x: proc.assign_trxn(x), axis = 1)
23 test_policy_res, test_metrics_res = opt.analyze_result(fa_m, x, test, cum_to_day, cur_policy, policy)
---> 24 validation_res[(train_mon_yr_batch, test_mon_yr)] = {'train_policy': train_policy_res, 'train_result': train_metrics_res, 'test_policy': test_policy_res, 'test_result': test_metrics_res}
25 return validation_res
~/.conda/envs/odop/lib/python3.8/site-packages/dask/delayed.py in __setitem__(self, index, val)
564
565 def __setitem__(self, index, val):
--> 566 raise TypeError("Delayed objects are immutable")
567
568 def __iter__(self):
TypeError: Delayed objects are immutable
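The last frame of the trace points at the likely cause: inside batch_opt the code assigns into validation_res, but by then that name is bound to the Delayed object created in the driver loop, and Delayed objects do not support item assignment. A minimal sketch of a fix is to build a local dict inside the function instead (the surrounding names are taken from the trace; everything else about batch_opt is assumed unchanged):
def batch_opt(i, train):
    ...  # existing computation of train/test policies and metrics
    result = {}   # local dict, not the Delayed bound to the outer `validation_res`
    result[(train_mon_yr_batch, test_mon_yr)] = {
        'train_policy': train_policy_res, 'train_result': train_metrics_res,
        'test_policy': test_policy_res, 'test_result': test_metrics_res}
    return result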
I'm running Jupyter Lab on Windows, and fastai.vision.utils.verify_images(fns) is giving me problems because it calls fastcore.parallel.parallel with the default n_workers=8. There are many ways around it, but I was trying to figure out a code block that I could drop into any notebook so that all underlying calls to parallel run with n_workers=1.
I tried the following cell:
import fastcore
import sys
_fastcore = fastcore
_parallel = lambda *args, **kwargs: fastcore.parallel.parallel(*args, **kwargs, n_workers=1)
_fastcore.parallel.parallel = _parallel
sys.modules['fastcore'] = _fastcore
fastcore.parallel.parallel
printing
<function __main__.<lambda>(*args, **kwargs)>
but when I try running verify_images, it still fails as if the patch never happened:
---------------------------------------------------------------------------
BrokenProcessPool Traceback (most recent call last)
<ipython-input-37-f1773f2c9e62> in <module>
3 # from mock import patch
4 # with patch('fastcore.parallel.parallel') as _parallel:
----> 5 failed = verify_images(fns)
6 # failed = L(fns[i] for i,o in enumerate(_parallel(verify_image, fns)) if not o)
7 failed
~\anaconda3\lib\site-packages\fastai\vision\utils.py in verify_images(fns)
59 def verify_images(fns):
60 "Find images in `fns` that can't be opened"
---> 61 return L(fns[i] for i,o in enumerate(parallel(verify_image, fns)) if not o)
62
63 # Cell
~\anaconda3\lib\site-packages\fastcore\parallel.py in parallel(f, items, n_workers, total, progress, pause, threadpool, timeout, chunksize, *args, **kwargs)
121 if total is None: total = len(items)
122 r = progress_bar(r, total=total, leave=False)
--> 123 return L(r)
124
125 # Cell
~\anaconda3\lib\site-packages\fastcore\foundation.py in __call__(cls, x, *args, **kwargs)
95 def __call__(cls, x=None, *args, **kwargs):
96 if not args and not kwargs and x is not None and isinstance(x,cls): return x
---> 97 return super().__call__(x, *args, **kwargs)
98
99 # Cell
~\anaconda3\lib\site-packages\fastcore\foundation.py in __init__(self, items, use_list, match, *rest)
103 def __init__(self, items=None, *rest, use_list=False, match=None):
104 if (use_list is not None) or not is_array(items):
--> 105 items = listify(items, *rest, use_list=use_list, match=match)
106 super().__init__(items)
107
~\anaconda3\lib\site-packages\fastcore\basics.py in listify(o, use_list, match, *rest)
54 elif isinstance(o, list): res = o
55 elif isinstance(o, str) or is_array(o): res = [o]
---> 56 elif is_iter(o): res = list(o)
57 else: res = [o]
58 if match is not None:
~\anaconda3\lib\concurrent\futures\process.py in _chain_from_iterable_of_lists(iterable)
482 careful not to keep references to yielded objects.
483 """
--> 484 for element in iterable:
485 element.reverse()
486 while element:
~\anaconda3\lib\concurrent\futures\_base.py in result_iterator()
609 # Careful not to keep a reference to the popped future
610 if timeout is None:
--> 611 yield fs.pop().result()
612 else:
613 yield fs.pop().result(end_time - time.monotonic())
~\anaconda3\lib\concurrent\futures\_base.py in result(self, timeout)
437 raise CancelledError()
438 elif self._state == FINISHED:
--> 439 return self.__get_result()
440 else:
441 raise TimeoutError()
~\anaconda3\lib\concurrent\futures\_base.py in __get_result(self)
386 def __get_result(self):
387 if self._exception:
--> 388 raise self._exception
389 else:
390 return self._result
BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
I suspect it has to do with fastai.vision.utils using * imports for fastcore. Is there a way to achieve what I want?
Since the parallel function has already been imported into the fastai.vision.utils module, the correct way is to monkeypatch that module rather than fastcore.parallel:
... # your code for custom `parallel` function goes here
import fastai.vision.utils
fastai.vision.utils.parallel = _parallel # assign your custom function here
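One way to put the pieces together (a sketch; fns is assumed to be defined as in the question):
from fastcore.parallel import parallel as _orig_parallel

def _parallel(*args, **kwargs):
    kwargs['n_workers'] = 1                  # force single-worker execution
    return _orig_parallel(*args, **kwargs)

import fastai.vision.utils
fastai.vision.utils.parallel = _parallel     # patch the name verify_images actually looks up

from fastai.vision.utils import verify_images
failed = verify_images(fns)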
I'm trying to do multiprocessing using dask. I have a function which has to run for 10000 files and will generate files as output. The function takes a file from an S3 bucket as input and works with another file from S3 with a similar date and time. I'm doing everything in JupyterLab.
So here's my function:
def get_temp(file, name):
    d=[name[0:4],name[4:6],name[6:8],name[9:11],name[11:13]]
    f_zip = gzip.decompress(file)
    yr=d[0]
    mo=d[1]
    da=d[2]
    hr=d[3]
    mn=d[4]
    fs = s3fs.S3FileSystem(anon=True)
    period = pd.Period(str(yr)+str('-')+str(mo)+str('-')+str(da), freq='D')
    # period.dayofyear
    dy=period.dayofyear
    cc=[7,8,9,10,11,12,13,14,15,16]  # look at the IR channels only for now
    dat = xr.open_dataset(f_zip)
    dd=dat[['recNum','trackLat','trackLon', 'temp']]
    dd=dd.to_dataframe()
    dd = dd.dropna()
    dd['num'] = np.arange(len(dd))
    l=dd.where((dd.trackLat>-50.0) & (dd.trackLat<50.0) & (dd.trackLon>-110.0) & (dd.trackLon<10.0))
    l = l.dropna()
    l.reset_index()
    dy="{0:0=3d}".format(dy)
    # opening GOES data from S3
    F=xr.open_dataset(fs.open(fs.glob('s3://noaa-goes16/ABI-L1b-RadF/'+str(yr)+'/'+str(dy)+'/'+str(hr)+'/'+'OR_ABI-L1b-RadF-M3C07'+'*')[int(mn)//15]))
    # converting lat/lon to radiance coordinates
    req=F['goes_imager_projection'].semi_major_axis
    oneovf=F['goes_imager_projection'].inverse_flattening
    rpol=F['goes_imager_projection'].semi_minor_axis
    e = 0.0818191910435
    sat_h=F['goes_imager_projection'].perspective_point_height
    H=req+sat_h
    gc=np.deg2rad(F['goes_imager_projection'].longitude_of_projection_origin)
    phi=np.deg2rad(l.trackLat.values)
    gam=np.deg2rad(l.trackLon.values)
    phic=np.arctan((rpol**2/req**2)*np.tan(phi))
    rc=rpol/np.sqrt((1-e**2*np.cos(phic)**2))
    sx=H-rc*np.cos(phic)*np.cos(gam-gc)
    sy=-rc*np.cos(phic)*np.sin(gam-gc)
    sz=rc*np.sin(phic)
    yy=np.arctan(sz/sx)
    xx=np.arcsin(-sy/(np.sqrt(sx**2+sy**2+sz**2)))
    for i in range(len(xx)):
        for c in range(len(cc)):
            ch="{0:0=2d}".format(cc[c])
            F1=xr.open_dataset(fs.open(fs.glob('s3://noaa-goes16/ABI-L1b-RadF/'+str(yr)+'/'+str(dy)+'/'+str(hr)+'/'+'OR_ABI-L1b-RadF-M3C'+ch+'*')[0]))
            F2=xr.open_dataset(fs.open(fs.glob('s3://noaa-goes16/ABI-L1b-RadF/'+str(yr)+'/'+str(dy)+'/'+str("{0:0=2d}".format(hr))+'/'+'OR_ABI-L1b-RadF-M3C'+ch+'*')[-1]))
            G1 = F1.where((F1.x >= (xx[i]-0.005)) & (F1.x <= (xx[i]+0.005)) & (F1.y >= (yy[i]-0.005)) & (F1.y <= (yy[i]+0.005)), drop=True)
            G2 = F2.where((F2.x >= (xx[i]-0.005)) & (F2.x <= (xx[i]+0.005)) & (F2.y >= (yy[i]-0.005)) & (F2.y <= (yy[i]+0.005)), drop=True)
            G = xr.concat([G1, G2], dim = 'time')
            G = G.assign_coords(channel=(ch))
            if c == 0:
                T = G
            else:
                T = xr.concat([T, G], dim = 'channel')
        T = T.assign_coords(temp=(str(l['temp'][i])))
        print(l.iloc[i]['num'])
        path = name+'_'+str(int(l.iloc[i]['num']))+'.nc'
        T.to_netcdf(path)
        # zipping the file
        with zipfile.ZipFile(name+'_'+str(int(l.iloc[i]['num']))+'.zip', 'w', compression=zipfile.ZIP_DEFLATED) as zf:
            zf.write(path, arcname=str(name+'_'+str(int(l.iloc[i]['num']))+'.nc'))
        # storing it to S3
        s3.Bucket(BUCKET).upload_file(path[:-3]+'.zip', "Output/" + path[:-3]+'.zip')
Here's how I'm reading the data from S3:
s3 = boto3.resource('s3')
s3client = boto3.client(
's3',
region_name='us-east-1'
)
bucketname = s3.Bucket('temp')
filedata = []
keys = []
names = []
for my_bucket_object in bucketname.objects.all():
    keys.append(my_bucket_object.key)
for i in range(1, 21):
    fileobj = s3client.get_object(
        Bucket='temp',
        Key=(keys[i]))
    filedata.append(fileobj['Body'].read())
    names.append(keys[i][10:-3])
Initially, I'm just trying to run 20 files for testing purposes.
Here's where I create the dask delayed tasks and compute them:
temp_files = []
for i in range(20):
    s3_ds = dask.delayed(get_temp)(filedata[i], names[i])
    temp_files.append(s3_ds)
temp_files = dask.compute(*temp_files)
Here's the full error log:
distributed.protocol.pickle - INFO - Failed to serialize <function get_temp at 0x7f20a9cb8550>. Exception: cannot pickle '_thread.lock' object
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/srv/conda/envs/notebook/lib/python3.8/site-packages/distributed/worker.py in dumps_function(func)
3319 with _cache_lock:
-> 3320 result = cache_dumps[func]
3321 except KeyError:
/srv/conda/envs/notebook/lib/python3.8/site-packages/distributed/utils.py in __getitem__(self, key)
1572 def __getitem__(self, key):
-> 1573 value = super().__getitem__(key)
1574 self.data.move_to_end(key)
/srv/conda/envs/notebook/lib/python3.8/collections/__init__.py in __getitem__(self, key)
1009 return self.__class__.__missing__(self, key)
-> 1010 raise KeyError(key)
1011 def __setitem__(self, key, item): self.data[key] = item
KeyError: <function get_temp at 0x7f20a9cb8550>
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
/srv/conda/envs/notebook/lib/python3.8/site-packages/distributed/protocol/pickle.py in dumps(x, buffer_callback, protocol)
52 buffers.clear()
---> 53 result = cloudpickle.dumps(x, **dump_kwargs)
54 elif not _always_use_pickle_for(x) and b"__main__" in result:
/srv/conda/envs/notebook/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py in dumps(obj, protocol, buffer_callback)
72 )
---> 73 cp.dump(obj)
74 return file.getvalue()
/srv/conda/envs/notebook/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py in dump(self, obj)
562 try:
--> 563 return Pickler.dump(self, obj)
564 except RuntimeError as e:
TypeError: cannot pickle '_thread.lock' object
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
<ipython-input-77-fa46004f5919> in <module>
----> 1 temp_files = dask.compute(*temp_files)
/srv/conda/envs/notebook/lib/python3.8/site-packages/dask/base.py in compute(*args, **kwargs)
450 postcomputes.append(x.__dask_postcompute__())
451
--> 452 results = schedule(dsk, keys, **kwargs)
453 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
454
/srv/conda/envs/notebook/lib/python3.8/site-packages/distributed/client.py in get(self, dsk, keys, restrictions, loose_restrictions, resources, sync, asynchronous, direct, retries, priority, fifo_timeout, actors, **kwargs)
2703 Client.compute: Compute asynchronous collections
2704 """
-> 2705 futures = self._graph_to_futures(
2706 dsk,
2707 keys=set(flatten([keys])),
/srv/conda/envs/notebook/lib/python3.8/site-packages/distributed/client.py in _graph_to_futures(self, dsk, keys, restrictions, loose_restrictions, priority, user_priority, resources, retries, fifo_timeout, actors)
2639 {
2640 "op": "update-graph",
-> 2641 "tasks": valmap(dumps_task, dsk),
2642 "dependencies": dependencies,
2643 "keys": list(map(tokey, keys)),
/srv/conda/envs/notebook/lib/python3.8/site-packages/cytoolz/dicttoolz.pyx in cytoolz.dicttoolz.valmap()
/srv/conda/envs/notebook/lib/python3.8/site-packages/cytoolz/dicttoolz.pyx in cytoolz.dicttoolz.valmap()
/srv/conda/envs/notebook/lib/python3.8/site-packages/distributed/worker.py in dumps_task(task)
3356 return d
3357 elif not any(map(_maybe_complex, task[1:])):
-> 3358 return {"function": dumps_function(task[0]), "args": warn_dumps(task[1:])}
3359 return to_serialize(task)
3360
/srv/conda/envs/notebook/lib/python3.8/site-packages/distributed/worker.py in dumps_function(func)
3320 result = cache_dumps[func]
3321 except KeyError:
-> 3322 result = pickle.dumps(func, protocol=4)
3323 if len(result) < 100000:
3324 with _cache_lock:
/srv/conda/envs/notebook/lib/python3.8/site-packages/distributed/protocol/pickle.py in dumps(x, buffer_callback, protocol)
58 try:
59 buffers.clear()
---> 60 result = cloudpickle.dumps(x, **dump_kwargs)
61 except Exception as e:
62 logger.info("Failed to serialize %s. Exception: %s", x, e)
/srv/conda/envs/notebook/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py in dumps(obj, protocol, buffer_callback)
71 file, protocol=protocol, buffer_callback=buffer_callback
72 )
---> 73 cp.dump(obj)
74 return file.getvalue()
75
/srv/conda/envs/notebook/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py in dump(self, obj)
561 def dump(self, obj):
562 try:
--> 563 return Pickler.dump(self, obj)
564 except RuntimeError as e:
565 if "recursion" in e.args[0]:
TypeError: cannot pickle '_thread.lock' object
Can someone help me here and tell me what I'm doing wrong? And apart from Dask, is there any other way to do parallel processing?
So I figured out that the error is only thrown when I upload the file to the S3 bucket; otherwise it works fine. But if I don't save the files to S3, I can't figure out where the output files end up. When I run this with dask it saves the files somewhere I can't find; I'm running my code in JupyterLab and nothing gets saved in any directory.
I have taken some time to parse your code.
In the large function, you use s3fs to interact with your cloud storage, and this works well with xarray.
However, in your main code, you use boto3 to list and open S3 files. These files retain a reference to the client object, which maintains a connection pool. That is the thing that cannot be pickled.
s3fs is designed to work with Dask, and ensures the picklability of the filesystem instances and OpenFile objects. Since you already use it in one part, I would recommend using s3fs throughout (but I am, of course, biased, since I am the main author).
Alternatively, you could pass just the file names (as strings), and not open anything until inside the worker function. This would be "best practice": load data in the worker tasks, rather than loading it in the client and passing the data along.
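A sketch of that last suggestion, assuming the same 'temp' bucket and the key slicing used above; note that the upload step inside get_temp would likewise need to create its own boto3/s3fs client inside the task rather than use the module-level s3 resource:
import dask
import s3fs

def get_temp_from_key(key):
    # only the key string travels to the worker; the bytes are read inside the task
    fs = s3fs.S3FileSystem(anon=False)
    with fs.open('s3://temp/' + key, 'rb') as f:
        file = f.read()
    return get_temp(file, key[10:-3])

tasks = [dask.delayed(get_temp_from_key)(k) for k in keys[1:21]]
dask.compute(*tasks)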
I have a function called sig2z that I want to apply over a dask array:
def sig2z(da, zr, zi, nvar=None, dim=None, coord=None):
    """
    Interpolate variables on \sigma coordinates onto z coordinates.

    Parameters
    ----------
    da : `dask.array`
        The data on sigma coordinates to be interpolated
    zr : `dask.array`
        The depths corresponding to sigma layers
    zi : `numpy.array`
        The depths onto which to interpolate the data
    nvar : str (optional)
        Name of the variable. Only necessary when the variable is
        horizontal velocity.

    Returns
    -------
    dai : `dask.array`
        The data interpolated onto a spatially uniform z coordinate
    """
    if np.diff(zi)[0] < 0. or zi.max() <= 0.:
        raise ValueError("The values in `zi` should be positive and increasing.")
    if np.any(np.absolute(zr[0]) < np.absolute(zr[-1])):
        raise ValueError("`zr` should have the deepest depth at index 0.")
    if zr.shape != da.shape[-3:]:
        raise ValueError("`zr` should have the same "
                         "spatial dimensions as `da`.")

    if dim == None:
        dim = da.dims
    if coord == None:
        coord = da.coords
    N = da.shape
    nzi = len(zi)
    if len(N) == 4:
        dai = np.empty((N[0],nzi,N[-2],N[-1]))
    elif len(N) == 3:
        dai = np.empty((nzi,N[-2],N[-1]))
    else:
        raise ValueError("The data should at least have three dimensions")
    dai[:] = np.nan

    zi = -zi[::-1]  # ROMS has deepest level at index=0

    if nvar=='u':  # u variables
        zl = .5*(zr.shift(eta_rho=-1, xi_rho=-1)
                 + zr.shift(eta_rho=-1)
                 )
    elif nvar=='v':  # v variables
        zl = .5*(zr.shift(xi_rho=-1)
                 + zr.shift(eta_rho=-1, xi_rho=-1)
                 )
    else:
        zl = zr

    for i in range(N[-1]):
        for j in range(N[-2]):
            # only bother for sufficiently deep regions
            if zl[:,j,i].min() < -1e2:
                # only interp on z above topo
                ind = np.argwhere(zi >= zl[:,j,i].copy().min())
                if len(N) == 4:
                    for s in range(N[0]):
                        dai[s,:len(ind),j,i] = _interpolate(da[s,:,j,i].copy(),
                                                            zl[:,j,i].copy(),
                                                            zi[int(ind[0]):]
                                                            )
                else:
                    dai[:len(ind),j,i] = _interpolate(da[:,j,i].copy(),
                                                      zl[:,j,i].copy(),
                                                      zi[int(ind[0]):]
                                                      )
    return xr.DataArray(dai, dims=dim, coords=coord)
This works fine on xarray.DataArray but when I apply it to dask.array, I get the following error:
test = dsar.map_blocks(sig2z, w[0].data,
                       zr.chunk({'eta_rho':1,'xi_rho':1}).data, zi,
                       dim, coord,
                       chunks=dai[0].chunks, dtype=dai.dtype
                       ).compute()
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-29-d81bad2f4486> in <module>()
----> 1 test.compute()
/home/takaya/.conda/envs/arab/lib/python3.6/site-packages/dask/base.py in compute(self, **kwargs)
95 Extra keywords to forward to the scheduler ``get`` function.
96 """
---> 97 (result,) = compute(self, traverse=False, **kwargs)
98 return result
99
/home/takaya/.conda/envs/arab/lib/python3.6/site-packages/dask/base.py in compute(*args, **kwargs)
202 dsk = collections_to_dsk(variables, optimize_graph, **kwargs)
203 keys = [var._keys() for var in variables]
--> 204 results = get(dsk, keys, **kwargs)
205
206 results_iter = iter(results)
/home/takaya/.conda/envs/arab/lib/python3.6/site-packages/dask/threaded.py in get(dsk, result, cache, num_workers, **kwargs)
73 results = get_async(pool.apply_async, len(pool._pool), dsk, result,
74 cache=cache, get_id=_thread_get_id,
---> 75 pack_exception=pack_exception, **kwargs)
76
77 # Cleanup pools associated to dead threads
/home/takaya/.conda/envs/arab/lib/python3.6/site-packages/dask/local.py in get_async(apply_async, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, **kwargs)
519 _execute_task(task, data) # Re-execute locally
520 else:
--> 521 raise_exception(exc, tb)
522 res, worker_id = loads(res_info)
523 state['cache'][key] = res
/home/takaya/.conda/envs/arab/lib/python3.6/site-packages/dask/compatibility.py in reraise(exc, tb)
58 if exc.__traceback__ is not tb:
59 raise exc.with_traceback(tb)
---> 60 raise exc
61
62 else:
/home/takaya/.conda/envs/arab/lib/python3.6/site-packages/dask/local.py in execute_task(key, task_info, dumps, loads, get_id, pack_exception)
288 try:
289 task, data = loads(task_info)
--> 290 result = _execute_task(task, data)
291 id = get_id()
292 result = dumps((result, id))
/home/takaya/.conda/envs/arab/lib/python3.6/site-packages/dask/local.py in _execute_task(arg, cache, dsk)
269 func, args = arg[0], arg[1:]
270 args2 = [_execute_task(a, cache) for a in args]
--> 271 return func(*args2)
272 elif not ishashable(arg):
273 return arg
/home/takaya/.conda/envs/arab/lib/python3.6/site-packages/dask/array/core.py in getarray(a, b, lock)
63 c = a[b]
64 if type(c) != np.ndarray:
---> 65 c = np.asarray(c)
66 finally:
67 if lock:
/home/takaya/.conda/envs/arab/lib/python3.6/site-packages/numpy/core/numeric.py in asarray(a, dtype, order)
529
530 """
--> 531 return array(a, dtype, copy=False, order=order)
532
533
/home/takaya/.conda/envs/arab/lib/python3.6/site-packages/xarray/core/indexing.py in __array__(self, dtype)
425
426 def __array__(self, dtype=None):
--> 427 self._ensure_cached()
428 return np.asarray(self.array, dtype=dtype)
429
/home/takaya/.conda/envs/arab/lib/python3.6/site-packages/xarray/core/indexing.py in _ensure_cached(self)
422 def _ensure_cached(self):
423 if not isinstance(self.array, np.ndarray):
--> 424 self.array = np.asarray(self.array)
425
426 def __array__(self, dtype=None):
/home/takaya/.conda/envs/arab/lib/python3.6/site-packages/numpy/core/numeric.py in asarray(a, dtype, order)
529
530 """
--> 531 return array(a, dtype, copy=False, order=order)
532
533
/home/takaya/.conda/envs/arab/lib/python3.6/site-packages/xarray/core/indexing.py in __array__(self, dtype)
406
407 def __array__(self, dtype=None):
--> 408 return np.asarray(self.array, dtype=dtype)
409
410 def __getitem__(self, key):
/home/takaya/.conda/envs/arab/lib/python3.6/site-packages/numpy/core/numeric.py in asarray(a, dtype, order)
529
530 """
--> 531 return array(a, dtype, copy=False, order=order)
532
533
/home/takaya/.conda/envs/arab/lib/python3.6/site-packages/xarray/core/indexing.py in __array__(self, dtype)
373 def __array__(self, dtype=None):
374 array = orthogonally_indexable(self.array)
--> 375 return np.asarray(array[self.key], dtype=None)
376
377 def __getitem__(self, key):
/home/takaya/.conda/envs/arab/lib/python3.6/site-packages/numpy/core/numeric.py in asarray(a, dtype, order)
529
530 """
--> 531 return array(a, dtype, copy=False, order=order)
532
533
/home/takaya/.conda/envs/arab/lib/python3.6/site-packages/xarray/core/indexing.py in __array__(self, dtype)
373 def __array__(self, dtype=None):
374 array = orthogonally_indexable(self.array)
--> 375 return np.asarray(array[self.key], dtype=None)
376
377 def __getitem__(self, key):
/home/takaya/.conda/envs/arab/lib/python3.6/site-packages/xarray/backends/netCDF4_.py in __getitem__(self, key)
58 with self.datastore.ensure_open(autoclose=True):
59 try:
---> 60 data = getitem(self.get_array(), key)
61 except IndexError:
62 # Catch IndexError in netCDF4 and return a more informative
netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.__getitem__ (netCDF4/_netCDF4.c:39743)()
netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable._get (netCDF4/_netCDF4.c:49835)()
RuntimeError: Resource temporarily unavailable
Could someone please tell me why I'm getting this error? Thank you in advance.
PIDs, open file descriptors, and memory are all limited resources.
The fork(2) manual says when errno.EAGAIN should happen:
[EAGAIN] The system-imposed limit on the total number of processes under
execution would be exceeded. This limit is configuration-dependent.
[EAGAIN] The system-imposed limit MAXUPRC () on the total number of processes
under execution by a single user would be exceeded.
To reproduce the error more easily, you could add at the start of your program:
import resource
resource.setrlimit(resource.RLIMIT_NPROC, (20, 20))
The issue might be that all the child processes stay alive because you haven't called p.stdin.close(), and gnuplot's stdin may be fully buffered when redirected to a pipe, i.e., the gnuplot processes might be stuck awaiting input. And/or your application uses too many file descriptors without releasing them (file descriptors are inherited by child processes by default on Python 2.7).
If input doesn't depend on the output and the input is limited in size then use .communicate():
from subprocess import Popen, PIPE, STDOUT

p = Popen("gnuplot", stdin=PIPE, stdout=PIPE, stderr=PIPE,
          close_fds=True,           # to avoid running out of file descriptors
          bufsize=-1,               # fully buffered (use zero (default) if no p.communicate())
          universal_newlines=True)  # translate newlines, encode/decode text
out, err = p.communicate("\n".join(['set terminal gif;', contents]))
.communicate() writes all input and reads all output (concurrently, so there is no deadlock) then closes p.stdin, p.stdout, p.stderr (even if input is small and gnuplot's side is fully buffered; EOF flushes the buffer) and waits for the process to finish (no zombies).
Popen calls _cleanup() in its constructor that polls exit status of all known subprocesses i.e., even if you won't call p.wait() there shouldn't be many zombie processes (dead but with unread status).
answer from https://stackoverflow.com/a/22729602/4879665