I'm working on slowly converting my heavily serial text analysis engine to use Modin and Ray. It feels like I'm nearly there; however, I seem to have hit a stumbling block. My code looks like this:
vectorizer = TfidfVectorizer(
    analyzer=ngrams, encoding="ascii", stop_words="english", strip_accents="ascii"
)
tf_idf_matrix = vectorizer.fit_transform(r_strings["name"])
r_vectorizer = ray.put(vectorizer)
r_tf_idf_matrix = ray.put(tf_idf_matrix)
n = 2
match_results = []
for fn in files["c.file"]:
    match_results.append(
        match_name.remote(fn, r_vectorizer, r_tf_idf_matrix, r_strings, n)
    )
match_returns = ray.get(match_results)
I'm following the guidance from the "anti-patterns" section of the Ray documentation on what to avoid, and this code is very close to the recommended "better" pattern.
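For reference, the pattern I'm trying to follow looks roughly like this (a minimal, generic sketch of the documented approach, not my actual match_name, whose body is omitted above):

import ray

ray.init(ignore_reinit_error=True)

@ray.remote
def process(item, shared_a, shared_b):
    # ObjectRefs passed as top-level task arguments are resolved to the
    # underlying objects before the task body runs.
    return (item, type(shared_a).__name__, type(shared_b).__name__)

big_a = ray.put({"some": "large shared object"})
big_b = ray.put(list(range(10_000)))

futures = [process.remote(i, big_a, big_b) for i in range(4)]
print(ray.get(futures))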
Traceback (most recent call last):
File "alt.py", line 213, in <module>
match_returns = ray.get(match_results)
File "/home/myuser/.local/lib/python3.7/site-packages/ray/_private/client_mode_hook.py", line 62, in wrapper
return func(*args, **kwargs)
File "/home/myuser/.local/lib/python3.7/site-packages/ray/worker.py", line 1501, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(PicklingError): ray::match_name() (pid=23393, ip=192.168.1.173)
File "python/ray/_raylet.pyx", line 564, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 565, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 1652, in ray._raylet.CoreWorker.store_task_outputs
File "/home/myuser/.local/lib/python3.7/site-packages/ray/serialization.py", line 327, in serialize
return self._serialize_to_msgpack(value)
File "/home/myuser/.local/lib/python3.7/site-packages/ray/serialization.py", line 307, in _serialize_to_msgpack
self._serialize_to_pickle5(metadata, python_objects)
File "/home/myuser/.local/lib/python3.7/site-packages/ray/serialization.py", line 267, in _serialize_to_pickle5
raise e
File "/home/myuser/.local/lib/python3.7/site-packages/ray/serialization.py", line 264, in _serialize_to_pickle5
value, protocol=5, buffer_callback=writer.buffer_callback)
File "/home/myuser/.local/lib/python3.7/site-packages/ray/cloudpickle/cloudpickle_fast.py", line 73, in dumps
cp.dump(obj)
File "/home/myuser/.local/lib/python3.7/site-packages/ray/cloudpickle/cloudpickle_fast.py", line 580, in dump
return Pickler.dump(self, obj)
_pickle.PicklingError: args[0] from __newobj__ args has the wrong class
Definitely an unexpected result. I'm not sure where to go next with this and would appreciate help from folks who have more experience with Ray and Modin.
Error message
Traceback (most recent call last):
File "tools/train.py", line 244, in <module>
main()
File "tools/train.py", line 233, in main
train_detector(
File "/home/christ/dev/repos/railsight/mmdetection-2.25.3/mmdet/apis/train.py", line 244, in train_detector
runner.run(data_loaders, cfg.workflow)
File "/home/christ/miniconda3/envs/mmdetection/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 130, in run
epoch_runner(data_loaders[i], **kwargs)
File "/home/christ/miniconda3/envs/mmdetection/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 47, in train
for i, data_batch in enumerate(self.data_loader):
File "/home/christ/miniconda3/envs/mmdetection/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 368, in __iter__
return self._get_iterator()
File "/home/christ/miniconda3/envs/mmdetection/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 314, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "/home/christ/miniconda3/envs/mmdetection/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 965, in __init__
self._reset(loader, first_iter=True)
File "/home/christ/miniconda3/envs/mmdetection/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 996, in _reset
self._try_put_index()
File "/home/christ/miniconda3/envs/mmdetection/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1230, in _try_put_index
index = self._next_index()
File "/home/christ/miniconda3/envs/mmdetection/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in _next_index
return next(self._sampler_iter) # may raise StopIteration
File "/home/christ/miniconda3/envs/mmdetection/lib/python3.8/site-packages/torch/utils/data/sampler.py", line 226, in __iter__
for idx in self.sampler:
File "/home/christ/dev/repos/railsight/mmdetection-2.25.3/mmdet/datasets/samplers/group_sampler.py", line 36, in __iter__
indices = np.concatenate(indices)
File "<__array_function__ internals>", line 180, in concatenate
ValueError: need at least one array to concatenate
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/home/christ/miniconda3/envs/mmdetection/lib/python3.8/multiprocessing/popen_fork.py", line 27, in poll
pid, sts = os.waitpid(self.pid, flag)
File "/home/christ/miniconda3/envs/mmdetection/lib/python3.8/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
_error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 35413) is killed by signal: Terminated.
This is the error message I am faced with when trying to train SSD with MMDetection. I have checked through my dataset and it works with faster_rcnn, so I don't understand why I am having such an issue with SSD training. Any advice would be great!
_base_ = '../ssd/ssd300_coco.py'
dataset_type = 'CocoDataset'
classes = ('pantograph',)
data = dict(
    train=dict(
        img_prefix='configs/pantograph/train/',
        classes=classes,
        ann_file='configs/pantograph/train/SSDTrain.json',
        dataset=dict(
            ann_file='configs/pantograph/train/SSDTrain.json',
            img_prefix='configs/pantograph/train/')),
    val=dict(
        img_prefix='configs/pantograph/val/',
        classes=classes,
        ann_file='configs/pantograph/val/SSDVal.json'),
    test=dict(
        img_prefix='configs/pantograph/test/',
        classes=classes,
        ann_file='configs/pantograph/test/SSDTest.json'))
Above is my custom SSD config that I tried to run. I have double checked the file locations and made sure they are all correct.
MMDetection has a very good community; you can try looking at this link to solve your problem. In short, you should change your classes = ('pantograph',) as in the following code:
cfg.metainfo = {
    'classes': ('balloon', ),
    'palette': [
        (220, 20, 60),
    ]
}
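Applied to your config, and assuming the base ssd300_coco.py wraps the training set in a RepeatDataset (which would explain the nested dataset=dict(...) in your train section), the classes setting usually needs to reach the inner dataset dict as well; a sketch of what that could look like, purely as an illustration:

# Sketch only: paths are copied from your config. If `classes` never reaches
# the inner CocoDataset, it can end up with zero usable annotations, and the
# group sampler then fails with "need at least one array to concatenate".
classes = ('pantograph',)
data = dict(
    train=dict(
        dataset=dict(
            classes=classes,
            ann_file='configs/pantograph/train/SSDTrain.json',
            img_prefix='configs/pantograph/train/')),
    val=dict(classes=classes),
    test=dict(classes=classes))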
I had been running a script for over a year that pulled data from Excel using the xlwings Range command, like so...
list=Range('A1:D10').value
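For context, the script relies on xlwings attaching to the already-open Excel instance over COM; a rough sketch of the same read with that dependency spelled out (the real workbook and range differ):

import xlwings as xw

# Assumes Excel is already running with the workbook open; xlwings resolves
# the range against the active app/book via COM.
wb = xw.books.active
values = wb.sheets[0].range('A1:D10').value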
Suddenly, it stopped working. I had changed nothing in the code or the system, other than perhaps installing another network card.
This is the error I now get when trying to use the Range assignment.
Traceback (most recent call last):
File "G:\python32\fetcher.py", line 61, in <module>
listFull = getComData()
File "G:\python32\fetcher.py", line 38, in getComData
listFull=Range('A4:H184').value
File "G:\python32\lib\site-packages\xlwings\main.py", line 1490, in __init__
impl = apps.active.range(cell1).impl
File "G:\python32\lib\site-packages\xlwings\main.py", line 439, in range
return Range(impl=self.impl.range(cell1, cell2))
File "G:\python32\lib\site-packages\xlwings\_xlwindows.py", line 457, in range
xl1 = self.xl.Range(arg1)
File "G:\python32\lib\site-packages\xlwings\_xlwindows.py", line 341, in xl
self._xl = get_xl_app_from_hwnd(self._hwnd)
File "G:\python32\lib\site-packages\xlwings\_xlwindows.py", line 251, in get_xl_app_from_hwnd
disp = COMRetryObjectWrapper(Dispatch(p))
File "G:\python32\lib\site-packages\win32com\client\__init__.py", line 96, in Dispatch
return __WrapDispatch(dispatch, userName, resultCLSID, typeinfo, clsctx=clsctx)
File "G:\python32\lib\site-packages\win32com\client\__init__.py", line 37, in __WrapDispatch
klass = gencache.GetClassForCLSID(resultCLSID)
File "G:\python32\lib\site-packages\win32com\client\gencache.py", line 180, in GetClassForCLSID
mod = GetModuleForCLSID(clsid)
File "G:\python32\lib\site-packages\win32com\client\gencache.py", line 223, in GetModuleForCLSID
mod = GetModuleForTypelib(typelibCLSID, lcid, major, minor)
File "G:\python32\lib\site-packages\win32com\client\gencache.py", line 259, in GetModuleForTypelib
mod = _GetModule(modName)
File "G:\python32\lib\site-packages\win32com\client\gencache.py", line 622, in _GetModule
mod = __import__(mod_name)
ValueError: source code string cannot contain null bytes
Trying out the Python package tsfresh, I run into issues at the first steps. Given a series, how do I (automatically) make features for it? This snippet produces different errors depending on which part I try.
import tsfresh
import pandas as pd
import numpy as np
#tfX, tfy = tsfresh.utilities.dataframe_functions.make_forecasting_frame(pd.Series(np.random.randn(1000)/50), kind='float64', max_timeshift=50, rolling_direction=1)
#rf = tsfresh.extract_relevant_features(tfX, y=tfy, n_jobs=1, column_id='id')
tfX, tfy = tsfresh.utilities.dataframe_functions.make_forecasting_frame(pd.Series(np.random.randn(1000)/50), kind=1, max_timeshift=50, rolling_direction=1)
rf = tsfresh.extract_relevant_features(tfX, y=tfy, n_jobs=1, column_id='id')
The error in the first case is:
""" Traceback (most recent call last): File "C:\Users\user\Anaconda3\envs\env1\lib\multiprocessing\pool.py", line
119, in worker
result = (True, func(*args, **kwds)) File "C:\Users\user\Anaconda3\envs\env1\lib\site-packages\tsfresh\utilities\distribution.py",
line 38, in _function_with_partly_reduce
results = list(itertools.chain.from_iterable(results)) File "C:\Users\user\Anaconda3\envs\env1\lib\site-packages\tsfresh\utilities\distribution.py",
line 37, in
results = (map_function(chunk, **kwargs) for chunk in chunk_list) File
"C:\Users\user\Anaconda3\envs\env1\lib\site-packages\tsfresh\feature_extraction\extraction.py",
line 358, in _do_extraction_on_chunk
return list(_f()) File "C:\Users\user\Anaconda3\envs\env1\lib\site-packages\tsfresh\feature_extraction\extraction.py",
line 350, in _f
result = [("", func(data))] File "C:\Users\user\Anaconda3\envs\env1\lib\site-packages\tsfresh\feature_extraction\feature_calculators.py",
line 193, in variance_larger_than_standard_deviation
y = np.var(x) File "C:\Users\user\Anaconda3\envs\env1\lib\site-packages\numpy\core\fromnumeric.py",
line 3157, in var
**kwargs) File "C:\Users\user\Anaconda3\envs\env1\lib\site-packages\numpy\core_methods.py",
line 110, in _var
arrmean, rcount, out=arrmean, casting='unsafe', subok=False) TypeError: unsupported operand type(s) for /: 'str' and 'int' """
and in the second case:
""" Traceback (most recent call last): File
"C:\Users\user\Anaconda3\envs\env1\lib\multiprocessing\pool.py", line
119, in worker
result = (True, func(*args, **kwds)) File "C:\Users\user\Anaconda3\envs\env1\lib\site-packages\tsfresh\utilities\distribution.py",
line 38, in _function_with_partly_reduce
results = list(itertools.chain.from_iterable(results)) File "C:\Users\user\Anaconda3\envs\env1\lib\site-packages\tsfresh\utilities\distribution.py",
line 37, in
results = (map_function(chunk, **kwargs) for chunk in chunk_list) File
"C:\Users\user\Anaconda3\envs\env1\lib\site-packages\tsfresh\feature_extraction\extraction.py",
line 358, in _do_extraction_on_chunk
return list(_f()) File "C:\Users\user\Anaconda3\envs\env1\lib\site-packages\tsfresh\feature_extraction\extraction.py",
line 345, in _f
result = func(data, param=parameter_list) File "C:\Users\user\Anaconda3\envs\env1\lib\site-packages\tsfresh\feature_extraction\feature_calculators.py",
line 1752, in friedrich_coefficients
coeff = _estimate_friedrich_coefficients(x, m, r) File "C:\Users\user\Anaconda3\envs\env1\lib\site-packages\tsfresh\feature_extraction\feature_calculators.py",
line 145, in _estimate_friedrich_coefficients
result.dropna(inplace=True) File "C:\Users\user\Anaconda3\envs\env1\lib\site-packages\pandas\core\frame.py",
line 4598, in dropna
result = self.loc(axis=axis)[mask] File "C:\Users\user\Anaconda3\envs\env1\lib\site-packages\pandas\core\indexing.py",
line 1500, in getitem
return self._getitem_axis(maybe_callable, axis=axis) File "C:\Users\user\Anaconda3\envs\env1\lib\site-packages\pandas\core\indexing.py",
line 1859, in _getitem_axis
if is_iterator(key): File "C:\Users\user\Anaconda3\envs\env1\lib\site-packages\pandas\core\dtypes\inference.py",
line 157, in is_iterator
return hasattr(obj, 'next') File "C:\Users\user\Anaconda3\envs\env1\lib\site-packages\pandas\core\generic.py",
line 5065, in getattr
if self._info_axis._can_hold_identifiers_and_holds_name(name): File
"C:\Users\user\Anaconda3\envs\env1\lib\site-packages\pandas\core\indexes\base.py",
line 3984, in _can_hold_identifiers_and_holds_name
return name in self File "C:\Users\user\Anaconda3\envs\env1\lib\site-packages\pandas\core\indexes\category.py",
line 327, in contains
return contains(self, key, container=self._engine) File "C:\Users\user\Anaconda3\envs\env1\lib\site-packages\pandas\core\arrays\categorical.py",
line 188, in contains
loc = cat.categories.get_loc(key) File "C:\Users\user\Anaconda3\envs\env1\lib\site-packages\pandas\core\indexes\interval.py",
line 770, in get_loc
start, stop = self._find_non_overlapping_monotonic_bounds(key) File
"C:\Users\user\Anaconda3\envs\env1\lib\site-packages\pandas\core\indexes\interval.py",
line 717, in _find_non_overlapping_monotonic_bounds
start = self._searchsorted_monotonic(key, 'left') File "C:\Users\user\Anaconda3\envs\env1\lib\site-packages\pandas\core\indexes\interval.py",
line 681, in _searchsorted_monotonic
return sub_idx._searchsorted_monotonic(label, side) File "C:\Users\user\Anaconda3\envs\env1\lib\site-packages\pandas\core\indexes\base.py",
line 4755, in _searchsorted_monotonic
return self.searchsorted(label, side=side) File "C:\Users\user\Anaconda3\envs\env1\lib\site-packages\pandas\core\base.py",
line 1501, in searchsorted
return self._values.searchsorted(value, side=side, sorter=sorter) TypeError: Cannot cast array data from dtype('float64') to
dtype('
np.__version__ and tsfresh.__version__ are ('1.15.4', 'unknown'). I installed tsfresh using conda, probably from conda-forge. I am on Windows 10. Using another kernel with np.__version__, tsfresh.__version__ ('1.15.4', '0.11.2') leads to the same results.
Trying the first couple of cells from timeseries_forecasting_basic_example.ipynb yields the casting error as well.
Fixed it. Either the version on conda(-forge) or one of the dependencies was the issue. So using "conda uninstall tsfresh", "conda install patsy future six tqdm" and "pip install tsfresh" combined did the trick.
Can someone help me with the stack trace generated while using the happybase library?
I am trying to pass a Python 3.4 dictionary to the 'put' method, and the following stack trace is generated:
x
b"TWLb'25-Jan-13'"
data_values
{b'Low': b'0.10', b'Date': b'25-Jan-13', b'Volume': b'-', b'Close': b'0.12', b'High': b'0.12', b'Open': b'0.12'}
Traceback (most recent call last):
File "/home/msingal/Desktop/asd/Daily.py", line 63, in insert
hbase_test.insert_data(code, data_format)
File "/home/msingal/Desktop/asd/hbase_test.py", line 56, in insert_data
con.table(ticker, use_prefix=False).put(x, data_values)
File "/usr/lib/python3.4/site-packages/happybase/table.py", line 464, in put
batch.put(row, data)
File "/usr/lib/python3.4/site-packages/happybase/batch.py", line 137, in __exit__
self.send()
File "/usr/lib/python3.4/site-packages/happybase/batch.py", line 60, in send
self._table.connection.client.mutateRows(self._table.name, bms, {})
File "/usr/lib64/python3.4/site-packages/thriftpy/thrift.py", line 198, in _req
return self._recv(_api)
File "/usr/lib64/python3.4/site-packages/thriftpy/thrift.py", line 210, in _recv
fname, mtype, rseqid = self._iprot.read_message_begin()
File "thriftpy/protocol/cybin/cybin.pyx", line 429, in cybin.TCyBinaryProtocol.read_message_begin (thriftpy/protocol/cybin/cybin.c:6325)
File "thriftpy/protocol/cybin/cybin.pyx", line 60, in cybin.read_i32 (thriftpy/protocol/cybin/cybin.c:1546)
File "thriftpy/transport/buffered/cybuffered.pyx", line 65, in thriftpy.transport.buffered.cybuffered.TCyBufferedTransport.c_read (thriftpy/transport/buffered/cybuffered.c:1881)
File "thriftpy/transport/buffered/cybuffered.pyx", line 69, in thriftpy.transport.buffered.cybuffered.TCyBufferedTransport.read_trans (thriftpy/transport/buffered/cybuffered.c:1948)
File "thriftpy/transport/cybase.pyx", line 61, in thriftpy.transport.cybase.TCyBuffer.read_trans (thriftpy/transport/cybase.c:1472)
File "/usr/lib64/python3.4/site-packages/thriftpy/transport/socket.py", line 125, in read
message='TSocket read 0 bytes')
thriftpy.transport.TTransportException: TTransportException(message='TSocket read 0 bytes', type=4)
The lines of code are:
ticker = 'TWLB'
data_values = {b'Low': b'0.10', b'Date': b'25-Jan-13', b'Volume': b'-', b'Close': b'0.12', b'High': b'0.12', b'Open': b'0.12'}
x = (ticker + str(data_values.get(b'Date'))).encode('ASCII')
print('x')
print(x)
print('data_values')
print(data_values)
con.table(ticker, use_prefix=False).put(x, data_values)
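For completeness, the connection is created earlier in the script roughly like this (host and port here are placeholders, not the real values):

import happybase

# Placeholder endpoint; the actual script points at our HBase Thrift server.
con = happybase.Connection(host='localhost', port=9090)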
Any help with a solution and an explanation would be appreciated.
I am new to Stack Overflow, so if my language feels offensive, please forgive me. I have tried to provide all the relevant info, but if something is missing, let me know and I will update.
I'm trying to create a graph using gremlin-python, but I can't seem to work out how to add an edge.
Using the standard Gremlin console I can do the following:
gremlin> a = g.addV().next()
==>v[0]
gremlin> b = g.addV().next()
==>v[1]
gremlin> g.V()
==>v[0]
==>v[1]
gremlin> a.addEdge('conn', b)
==>e[2][0-conn->1]
gremlin> g.E()
==>e[2][0-conn->1]
gremlin>
But when trying to do the same via Python connected to Gremlin Server, it doesn't seem to work:
>>> a = g.addV().next()
>>> b = g.addV().next()
>>> g.V().toList()
[v[1519], v[1520]]
>>> a.addEdge('conn', b)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'Vertex' object has no attribute 'addEdge'
I've tried various incantations but can't seem to work it out, and can't find any examples anywhere. Also, I see references in the Gremlin docs to both addE and addEdge, but can't work out what the difference is (neither appears to work above).
Edit: Getting a bit further, but still no luck. It seems GraphTraversal.addE() exists, so if I don't call next() then I can call addE... but I still can't seem to find arguments it likes.
>>> a = g.addV()
>>> b = g.addV()
>>> a.addE('foo', b).toList()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Development/matt/lib/python2.7/site-packages/gremlin_python/process/traversal.py", line 52, in toList
return list(iter(self))
File "/Development/matt/lib/python2.7/site-packages/gremlin_python/process/traversal.py", line 70, in next
return self.__next__()
File "/Development/matt/lib/python2.7/site-packages/gremlin_python/process/traversal.py", line 43, in __next__
self.traversal_strategies.apply_strategies(self)
File "/Development/matt/lib/python2.7/site-packages/gremlin_python/process/traversal.py", line 284, in apply_strategies
traversal_strategy.apply(traversal)
File "/Development/matt/lib/python2.7/site-packages/gremlin_python/driver/remote_connection.py", line 95, in apply
remote_traversal = self.remote_connection.submit(traversal.bytecode)
File "/Development/matt/lib/python2.7/site-packages/gremlin_python/driver/driver_remote_connection.py", line 53, in submit
traversers = self._loop.run_sync(lambda: self.submit_traversal_bytecode(request_id, bytecode))
File "/Development/matt/lib/python2.7/site-packages/tornado/ioloop.py", line 457, in run_sync
return future_cell[0].result()
File "/Development/matt/lib/python2.7/site-packages/tornado/concurrent.py", line 237, in result
raise_exc_info(self._exc_info)
File "/Development/matt/lib/python2.7/site-packages/tornado/gen.py", line 1021, in run
yielded = self.gen.throw(*exc_info)
File "/Development/matt/lib/python2.7/site-packages/gremlin_python/driver/driver_remote_connection.py", line 73, in submit_traversal_bytecode
traversers = yield self._execute_message(message)
File "/Development/matt/lib/python2.7/site-packages/tornado/gen.py", line 1015, in run
value = future.result()
File "/Development/matt/lib/python2.7/site-packages/tornado/concurrent.py", line 237, in result
raise_exc_info(self._exc_info)
File "/Development/matt/lib/python2.7/site-packages/tornado/gen.py", line 1021, in run
yielded = self.gen.throw(*exc_info)
File "/Development/matt/lib/python2.7/site-packages/gremlin_python/driver/driver_remote_connection.py", line 149, in _execute_message
recv_message = yield response.receive()
File "/Development/matt/lib/python2.7/site-packages/tornado/gen.py", line 1015, in run
value = future.result()
File "/Development/matt/lib/python2.7/site-packages/tornado/concurrent.py", line 237, in result
raise_exc_info(self._exc_info)
File "/Development/matt/lib/python2.7/site-packages/tornado/gen.py", line 1024, in run
yielded = self.gen.send(value)
File "/Development/matt/lib/python2.7/site-packages/gremlin_python/driver/driver_remote_connection.py", line 236, in receive
"{0}: {1}".format(status_code, recv_message["status"]["message"]))
gremlin_python.driver.driver_remote_connection.GremlinServerError: 599: Could not locate method: DefaultGraphTraversal.addE([foo, [AddVertexStep({})]])
As far as I know, addEdge() works on the graph object and addE() works on the graph traversal object. Since you were using g, which is the latter, you need addE().
Seems the following syntax works:
>>> a = g.addV()
>>> b = g.addV()
>>> a.addE('foo').to(b).toList()
[e[1534][1532-foo->1533]]
I'm still not clear on the difference between addE and addEdge, but I guess the latter is not available in Python and I was confusing their signatures.
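For reference, the same kind of edge can also be added when both vertices already exist, by starting the traversal from their ids; a sketch (the ids are placeholders, and this assumes the remote traversal source g from above plus the anonymous traversal class in a reasonably recent gremlin-python):

from gremlin_python.process.graph_traversal import __

# Look up the two existing vertices by id and connect them; iterate()
# executes the traversal without returning results.
g.V(1519).addE('conn').to(__.V(1520)).iterate()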