python struct.error: 'i' format requires -2147483648 <= number <= 2147483647

Problem
I'm trying to do feature engineering with the multiprocessing module (multiprocessing.Pool.starmap()).
However, it raises the error message below. I suspect the error is related to the size of the inputs (2147483647 = 2^31 − 1?), since the same code ran smoothly on a fraction (frac=0.05) of the input dataframes (train_scala, test, ts). I converted the dataframe dtypes to the smallest possible types, but it does not get better.
The Anaconda version is 4.3.30 and the Python version is 3.6 (64 bit).
The machine has over 128 GB of memory and more than 20 cores.
Can you suggest any pointers or solutions to overcome this problem? If it is caused by passing large data to the multiprocessing module, how much smaller does the data need to be to use multiprocessing on Python 3?
Code:
from multiprocessing import Pool, cpu_count
from itertools import repeat
p = Pool(8)
is_train_seq = [True]*len(historyCutoffs)+[False]
config_zip = zip(historyCutoffs, repeat(train_scala), repeat(test), repeat(ts), ul_parts_path, repeat(members), is_train_seq)
p.starmap(multiprocess_FE, config_zip)
Error Message:
Traceback (most recent call last):
File "main_1210_FE_scala_multiprocessing.py", line 705, in <module>
print('----Pool starmap start----')
File "/home/dmlab/ksedm1/anaconda3/envs/py36/lib/python3.6/multiprocessing/pool.py", line 274, in starmap
return self._map_async(func, iterable, starmapstar, chunksize).get()
File "/home/dmlab/ksedm1/anaconda3/envs/py36/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value
File "/home/dmlab/ksedm1/anaconda3/envs/py36/lib/python3.6/multiprocessing/pool.py", line 424, in _handle_tasks
put(task)
File "/home/dmlab/ksedm1/anaconda3/envs/py36/lib/python3.6/multiprocessing/connection.py", line 206, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "/home/dmlab/ksedm1/anaconda3/envs/py36/lib/python3.6/multiprocessing/connection.py", line 393, in _send_bytes
header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
Extra infos
historyCutoffs is a list of integers
train_scala is a pandas DataFrame (377MB)
test is a pandas DataFrame (15MB)
ts is a pandas DataFrame (547MB)
ul_parts_path is a list of directories (string)
is_train_seq is a list of booleans
Extra Code: Method multiprocess_FE
def multiprocess_FE(historyCutoff, train_scala, test, ts, ul_part_path, members, is_train):
    train_dict = {}
    ts_dict = {}
    msno_dict = {}
    ul_dict = {}
    if is_train == True:
        train_dict[historyCutoff] = train_scala[train_scala.historyCutoff == historyCutoff]
    else:
        train_dict[historyCutoff] = test
    msno_dict[historyCutoff] = set(train_dict[historyCutoff].msno)
    print('length of msno is {:d} in cutoff {:d}'.format(len(msno_dict[historyCutoff]), historyCutoff))
    ts_dict[historyCutoff] = ts[(ts.transaction_date <= historyCutoff) & (ts.msno.isin(msno_dict[historyCutoff]))]
    print('length of transaction is {:d} in cutoff {:d}'.format(len(ts_dict[historyCutoff]), historyCutoff))
    ul_part = pd.read_csv(gzip.open(ul_part_path, mode="rt")) ##.sample(frac=0.01, replace=False)
    ul_dict[historyCutoff] = ul_part[ul_part.msno.isin(msno_dict[historyCutoff])]
    train_dict[historyCutoff] = enrich_by_features(historyCutoff, train_dict[historyCutoff], ts_dict[historyCutoff], ul_dict[historyCutoff], members, is_train)

The communication protocol between processes uses pickling, and the pickled data is prefixed with its size. For your method, all arguments together are pickled as one object.
You produced an object that, when pickled, is larger than fits in an 'i' struct formatter (a four-byte signed integer), which breaks the assumption the code makes.
You could instead delegate the reading of your dataframes to the child processes, sending only the metadata needed to load each dataframe. Their combined size is nearing 1 GB, which is far too much data to push over a pipe between your processes.
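As a sketch of that idea (the pickle file paths and the wrapper name are hypothetical; only the cutoffs and path strings travel over the pipe, and the worker loads the big dataframes itself):

import pandas as pd
from itertools import repeat
from multiprocessing import Pool

def multiprocess_FE_from_paths(historyCutoff, train_path, test_path, ts_path,
                               ul_part_path, members_path, is_train):
    # load the large inputs inside the worker instead of pickling them
    train_scala = pd.read_pickle(train_path)
    test = pd.read_pickle(test_path)
    ts = pd.read_pickle(ts_path)
    members = pd.read_pickle(members_path)
    return multiprocess_FE(historyCutoff, train_scala, test, ts,
                           ul_part_path, members, is_train)

p = Pool(8)
config_zip = zip(historyCutoffs, repeat("train_scala.pkl"), repeat("test.pkl"),
                 repeat("ts.pkl"), ul_parts_path, repeat("members.pkl"), is_train_seq)
p.starmap(multiprocess_FE_from_paths, config_zip)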
Quoting from the Programming guidelines section:
Better to inherit than pickle/unpickle
When using the spawn or forkserver start methods many types from multiprocessing need to be picklable so that child processes can use them. However, one should generally avoid sending shared objects to other processes using pipes or queues. Instead you should arrange the program so that a process which needs access to a shared resource created elsewhere can inherit it from an ancestor process.
If you are not running on Windows and are using the fork start method (the POSIX default), you can load your dataframes as globals before starting your subprocesses, at which point the child processes 'inherit' the data via the normal OS copy-on-write memory page sharing mechanism.
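A minimal sketch of that inheritance approach, assuming a POSIX system with the fork start method and hypothetical pickle paths for the dataframes; the large objects are loaded once as module-level globals before the Pool is created, so the workers inherit them and only the small per-task arguments get pickled:

import pandas as pd
from multiprocessing import Pool

# loaded before the Pool exists, so forked workers inherit these via copy-on-write
train_scala = pd.read_pickle("train_scala.pkl")
test = pd.read_pickle("test.pkl")
ts = pd.read_pickle("ts.pkl")
members = pd.read_pickle("members.pkl")

def multiprocess_FE_small_args(historyCutoff, ul_part_path, is_train):
    # the worker uses the inherited globals instead of receiving them as arguments
    return multiprocess_FE(historyCutoff, train_scala, test, ts,
                           ul_part_path, members, is_train)

if __name__ == "__main__":
    p = Pool(8)
    p.starmap(multiprocess_FE_small_args,
              zip(historyCutoffs, ul_parts_path, is_train_seq))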
Note that this limit was raised for non-Windows systems in Python 3.8, to an unsigned long long (8 bytes), and so you can now send and receive 4 EiB of data. See this commit, and Python issues #35152 and #17560.
If you can't upgrade and you can't make use of resource inheritance, and you are not running on Windows, then use this patch:
import functools
import logging
import struct
import sys

logger = logging.getLogger()


def patch_mp_connection_bpo_17560():
    """Apply PR-10305 / bpo-17560 connection send/receive max size update

    See the original issue at https://bugs.python.org/issue17560 and
    https://github.com/python/cpython/pull/10305 for the pull request.

    This only supports Python versions 3.3 - 3.7, this function
    does nothing for Python versions outside of that range.

    """
    patchname = "Multiprocessing connection patch for bpo-17560"
    if not (3, 3) < sys.version_info < (3, 8):
        logger.info(
            patchname + " not applied, not an applicable Python version: %s",
            sys.version
        )
        return

    from multiprocessing.connection import Connection

    orig_send_bytes = Connection._send_bytes
    orig_recv_bytes = Connection._recv_bytes
    if (
        orig_send_bytes.__code__.co_filename == __file__
        and orig_recv_bytes.__code__.co_filename == __file__
    ):
        logger.info(patchname + " already applied, skipping")
        return

    @functools.wraps(orig_send_bytes)
    def send_bytes(self, buf):
        n = len(buf)
        if n > 0x7fffffff:
            pre_header = struct.pack("!i", -1)
            header = struct.pack("!Q", n)
            self._send(pre_header)
            self._send(header)
            self._send(buf)
        else:
            orig_send_bytes(self, buf)

    @functools.wraps(orig_recv_bytes)
    def recv_bytes(self, maxsize=None):
        buf = self._recv(4)
        size, = struct.unpack("!i", buf.getvalue())
        if size == -1:
            buf = self._recv(8)
            size, = struct.unpack("!Q", buf.getvalue())
        if maxsize is not None and size > maxsize:
            return None
        return self._recv(size)

    Connection._send_bytes = send_bytes
    Connection._recv_bytes = recv_bytes

    logger.info(patchname + " applied")
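With that in place, the patch would be applied once in the parent before the Pool is created (assuming the default fork start method on Linux, so workers inherit the patched Connection methods), e.g.:

patch_mp_connection_bpo_17560()
p = Pool(8)
p.starmap(multiprocess_FE, config_zip)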

This problem was fixed in a recent PR to Python:
https://github.com/python/cpython/pull/10305
If you want, you can make this change locally so it works for you right away, without waiting for a Python and Anaconda release.

Related

File I/O error using nglview.show_biopython(structure)

So I have been trying to get into visualizing proteins in Python. I went on YouTube, found some tutorials, and ended up on one that teaches you how to visualize a protein from the COVID-19 virus. I set up Anaconda, got a Jupyter notebook working in VS Code, downloaded the necessary files from the PDB database, and made sure they were in the same directory as my notebook. But when I run the nglview.show_biopython(structure) function I get a ValueError: I/O operation on a closed file. I'm stumped; this is my first time using Jupyter notebook, so maybe there is something I'm missing, I don't know.
This is what the code looks like:
from Bio.PDB import *
import nglview as nv
parser = PDBParser()
structure = parser.get_structure("6YYT", "6YYT.pdb")
view = nv.show_biopython(structure)
This is the error
Output exceeds the size limit. Open the full output data in a text editor
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_1728\2743687014.py in <module>
----> 1 view = nv.show_biopython(structure)
c:\Users\jerem\anaconda3\lib\site-packages\nglview\show.py in show_biopython(entity, **kwargs)
450 '''
451 entity = BiopythonStructure(entity)
--> 452 return NGLWidget(entity, **kwargs)
453
454
c:\Users\jerem\anaconda3\lib\site-packages\nglview\widget.py in __init__(self, structure, representations, parameters, **kwargs)
243 else:
244 if structure is not None:
--> 245 self.add_structure(structure, **kwargs)
246
247 if representations:
c:\Users\jerem\anaconda3\lib\site-packages\nglview\widget.py in add_structure(self, structure, **kwargs)
1111 if not isinstance(structure, Structure):
1112 raise ValueError(f'{structure} is not an instance of Structure')
-> 1113 self._load_data(structure, **kwargs)
1114 self._ngl_component_ids.append(structure.id)
1115 if self.n_components > 1:
...
--> 200 return io_str.getvalue()
201
202
ValueError: I/O operation on closed file
I only get this error when using nglview.show_biopython; when I run the get_structure() function it can read the file just fine. I can visualize other molecules just fine, but maybe that's because I was using the ASE library instead of a file. I don't know, that's why I'm here.
Update: I recently found out that I can visualize the protein using nglview.show_file() instead of nglview.show_biopython(). Even though I can visualize proteins now and technically my problem has been solved, I would still like to know why the show_biopython() function isn't working properly.
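For reference, that workaround is just (assuming the PDB file sits next to the notebook, as above):

import nglview as nv

# load the structure directly from the file instead of going through Bio.PDB
view = nv.show_file("6YYT.pdb")
view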
I also figured out another way to fix this problem. Going back to the tutorial I mentioned, I saw that it was made back in 2021, which made me wonder whether we were using the same versions of each package; it turns out we were not. I'm not sure what version of nglview they were using, but they were using Biopython 1.79, which was the latest version back in 2021, and I was using Biopython 1.80. With Biopython 1.80 I was getting the error seen above. But now that I'm using Biopython 1.79 I get this:
file = "6YYT.pdb"
parser = PDBParser()
structure = parser.get_structure("6YYT", file)
structure
view = nv.show_biopython(structure)
view
Output:
c:\Users\jerem\anaconda3\lib\site-packages\Bio\PDB\StructureBuilder.py:89:
PDBConstructionWarning: WARNING: Chain A is discontinuous at line 12059.
warnings.warn(
So I guess there is something going on with Biopython 1.80, and I'm going to stick with 1.79.
I had a similar problem with:
from Bio.PDB import *
import nglview as nv
parser = PDBParser(QUIET = True)
structure = parser.get_structure("2ms2", "2ms2.pdb")
save_pdb = PDBIO()
save_pdb.set_structure(structure)
save_pdb.save('pdb_out.pdb')
view = nv.show_biopython(structure)
view
The error was like the one in the question:
.................site-packages/nglview/adaptor.py:201, in BiopythonStructure.get_structure_string(self)
199 io_str = StringIO()
200 io_pdb.save(io_str)
--> 201 return io_str.getvalue()
ValueError: I/O operation on closed file
I modified site-packages/nglview/adaptor.py:201, in BiopythonStructure.get_structure_string(self):
def get_structure_string(self):
    from Bio.PDB import PDBIO
    from io import StringIO
    io_pdb = PDBIO()
    io_pdb.set_structure(self._entity)
    io_str = StringIO()
    io_pdb.save(io_str)
    return io_str.getvalue()
with:
def get_structure_string(self):
    from Bio.PDB import PDBIO
    import mmap
    io_pdb = PDBIO()
    io_pdb.set_structure(self._entity)
    mo = mmap_str()
    io_pdb.save(mo)
    return mo.read()
and added this new class mmap_str(), in the same file:
import mmap
import copy

class mmap_str():
    import mmap  # added import at top
    instance = None
    def __init__(self):
        self.mm = mmap.mmap(-1, 2)
        self.a = ''
        b = '\n'
        self.mm.write(b.encode(encoding = 'utf-8'))
        self.mm.seek(0)
        #print('self.mm.read().decode() ',self.mm.read().decode(encoding = 'utf-8'))
        self.mm.seek(0)
    def __new__(cls, *args, **kwargs):
        if not isinstance(cls.instance, cls):
            cls.instance = object.__new__(cls)
        return cls.instance
    def write(self, string):
        self.a = str(copy.deepcopy(self.mm.read().decode(encoding = 'utf-8'))).lstrip('\n')
        self.mm.seek(0)
        #print('a -> ', self.a)
        len_a = len(self.a)
        self.mm = mmap.mmap(-1, len(self.a)+len(string))
        #print('a :', self.a)
        #print('len self.mm ', len(self.mm))
        #print('length string : ', len(string))
        #print(bytes((self.a+string).encode()))
        self.mm.write(bytes((self.a+string).encode()))
        self.mm.seek(0)
        #print('written once ')
        #self.mm.seek(0)
    def read(self):
        self.mm.seek(0)
        a = self.mm.read().decode().lstrip('\n')
        self.mm.seek(0)
        return a
    def __enter__(self):
        return self
    def __exit__(self, *args):
        pass
If I uncomment the print statements I get the
IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
error, but with them commented out the structure renders [the original post shows screenshots of the rendered views here, including one produced with nglview.show_file(filename)].
That's because, as can be seen by looking at the pdb_out.pdb file written by my code, Bio.PDB.PDBParser.get_structure(name, filename) doesn't retrieve the PDB header responsible for generating the full CRYSTALLOGRAPHIC SYMMETRY (or Biopython can't handle it, not sure about this, help if you know better), but just the coordinates.
I still don't understand what is going on with the:
--> 201 return io_str.getvalue()
ValueError: I/O operation on closed file
Could it be something related to the Jupyter ipykernel? I hope somebody can shed more light on this; I don't know how the framework runs, but it is definitely different from a normal Python interpreter. As an example, the same code in one of my Python virtualenvs will run forever, so maybe ipykernel doesn't like StringIO()s, or does something strange to them?
OK, thanks to the hint in the answer below, I went and inspected PDBIO.py in the GitHub repo for Biopython 1.80 and compared its save method, def save(self, file, select=_select, write_end=True, preserve_atom_numbering=False):, with the one in Biopython 1.79 [the original post shows the first and last bits of the two versions side by side].
Apparently the big difference is the with fhandle: block in version 1.80.
So I realized that changing adaptor.py by adding a subclass of StringIO that looks like:
from io import StringIO
class StringIO(StringIO):
    def __exit__(self, *args, **kwargs):
        print('exiting from subclassed StringIO !!!!!')
        pass
and modifying def get_structure_string(self): like this:
def get_structure_string(self):
    from Bio.PDB import PDBIO
    #from io import StringIO
    io_pdb = PDBIO()
    io_pdb.set_structure(self._entity)
    io_str = StringIO()
    io_pdb.save(io_str)
    return io_str.getvalue()
was enough to get my Biopython 1.80 working in Jupyter with nglview.
That said, I am not sure what the pitfalls are of not closing the StringIO object used for the visualization, but apparently that is what Biopython 1.79 was doing, just as my first answer using an mmap object was doing too (it never closes the mmap_str).
Another way to solve the problem:
I tried to understand git and ended up with this; it seems more coherent with the previous habits in the Biopython project, but I can't push it.
It makes use of as_handle from Bio.File: https://github.com/biopython/biopython/blob/e1902d1cdd3aa9325b4622b25d82fbf54633e251/Bio/File.py#L28
@contextlib.contextmanager
def as_handle(handleish, mode="r", **kwargs):
    r"""Context manager to ensure we are using a handle.

    Context manager for arguments that can be passed to SeqIO and AlignIO read, write,
    and parse methods: either file objects or path-like objects (strings, pathlib.Path
    instances, or more generally, anything that can be handled by the builtin 'open'
    function).

    When given a path-like object, returns an open file handle to that path, with provided
    mode, which will be closed when the manager exits.

    All other inputs are returned, and are *not* closed.

    Arguments:
     - handleish - Either a file handle or path-like object (anything which can be
       passed to the builtin 'open' function, such as str, bytes,
       pathlib.Path, and os.DirEntry objects)
     - mode - Mode to open handleish (used only if handleish is a string)
     - kwargs - Further arguments to pass to open(...)

    Examples
    --------
    >>> from Bio import File
    >>> import os
    >>> with File.as_handle('seqs.fasta', 'w') as fp:
    ...     fp.write('>test\nACGT')
    ...
    10
    >>> fp.closed
    True

    >>> handle = open('seqs.fasta', 'w')
    >>> with File.as_handle(handle) as fp:
    ...     fp.write('>test\nACGT')
    ...
    10
    >>> fp.closed
    False
    >>> fp.close()
    >>> os.remove("seqs.fasta")  # tidy up
    """
    try:
        with open(handleish, mode, **kwargs) as fp:
            yield fp
    except TypeError:
        yield handleish
Could anyone pass it along? [Of course it needs to be checked; my tests are OK, but I am a novice.]

Python multiprocessing apply_async "assert left > 0" AssertionError

I am trying to load numpy files asynchronously in a Pool:
self.pool = Pool(2, maxtasksperchild = 1)
...
nextPackage = self.pool.apply_async(loadPackages, (...))
for fi in np.arange(len(files)):
    packages = nextPackage.get(timeout=30)
    # preload the next package asynchronously. It will be available
    # by the time it is required.
    nextPackage = self.pool.apply_async(loadPackages, (...))
The method "loadPackages":
def loadPackages(... (2 strings & 2 ints) ...):
    print("This isn't printed!")
    packages = {
        "TRUE": np.load(gzip.GzipFile(path1, "r")),
        "FALSE": np.load(gzip.GzipFile(path2, "r"))
    }
    return packages
Before even the first "package" is loaded, the following error occurs:
Exception in thread Thread-8:
Traceback (most recent call last):
  File "C:\Users\roman\Anaconda3\envs\tsc1\lib\threading.py", line 914, in _bootstrap_inner
    self.run()
  File "C:\Users\roman\Anaconda3\envs\tsc1\lib\threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\roman\Anaconda3\envs\tsc1\lib\multiprocessing\pool.py", line 463, in _handle_results
    task = get()
  File "C:\Users\roman\Anaconda3\envs\tsc1\lib\multiprocessing\connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "C:\Users\roman\Anaconda3\envs\tsc1\lib\multiprocessing\connection.py", line 318, in _recv_bytes
    return self._get_more_data(ov, maxsize)
  File "C:\Users\roman\Anaconda3\envs\tsc1\lib\multiprocessing\connection.py", line 337, in _get_more_data
    assert left > 0
AssertionError
I monitor the resources closely: Memory is not an issue, I still have plenty left when the error occurs.
The unzipped files are just plain multidimensional numpy arrays.
Individually, using a Pool with a simpler method works, and loading the file like that works. Only in combination it fails.
(All this happens in a custom keras generator. I doubt this helps but who knows.) Python 3.5.
What could the cause of this issue be? How can this error be interpreted?
Thank you for your help!
There is a bug in the Python C core code that prevents data responses bigger than 2 GB from returning correctly to the main thread.
You need to either split the data into smaller chunks, as suggested in another answer, or not use multiprocessing for this function.
I reported this bug to the Python bugs list (https://bugs.python.org/issue34563) and created a PR (https://github.com/python/cpython/pull/9027) to fix it, but it will probably take a while to get released (UPDATE: the fix is present in Python 3.8.0+).
If you are interested, you can find more details on what causes the bug in the bug description at the link I posted.
I think I've found a workaround: retrieving data in small chunks. In my case it was a list of lists.
I had:
for i in range(0, NUMBER_OF_THREADS):
    print('MAIN: Getting data from process ' + str(i) + ' proxy...')
    X_train.extend(ListasX[i]._getvalue())
    Y_train.extend(ListasY[i]._getvalue())
    ListasX[i] = None
    ListasY[i] = None
    gc.collect()
Changed to:
CHUNK_SIZE = 1024
for i in range(0, NUMBER_OF_THREADS):
    print('MAIN: Getting data from process ' + str(i) + ' proxy...')
    for k in range(0, len(ListasX[i]), CHUNK_SIZE):
        X_train.extend(ListasX[i][k:k+CHUNK_SIZE])
        Y_train.extend(ListasY[i][k:k+CHUNK_SIZE])
    ListasX[i] = None
    ListasY[i] = None
    gc.collect()
And now it seems to work, possibly by serializing less data at a time.
So maybe if you can segment your data into smaller portions you can overcome the issue. Good luck!

Porting pickle py2 to py3 strings become bytes

I have a pickle file that was created with python 2.7 that I'm trying to port to python 3.6. The file is saved in py 2.7 via pickle.dumps(self.saved_objects, -1)
and loaded in python 3.6 via loads(data, encoding="bytes") (from a file opened in rb mode). If I try opening in r mode and pass encoding=latin1 to loads I get UnicodeDecode errors. When I open it as a byte stream it loads, but literally every string is now a byte string. Every object's __dict__ keys are all b"a_variable_name" which then generates attribute errors when calling an_object.a_variable_name because __getattr__ passes a string and __dict__ only contains bytes. I feel like I've tried every combination of arguments and pickle protocols already. Apart from forcibly converting all objects' __dict__ keys to strings I'm at a loss. Any ideas?
** Skip to 4/28/17 update for better example
-------------------------------------------------------------------------------------------------------------
** Update 4/27/17
This minimum example illustrates my problem:
From py 2.7.13
import pickle
class test(object):
    def __init__(self):
        self.x = u"test ¢"  # including a unicode str breaks things
t = test()
dumpstr = pickle.dumps(t)
>>> dumpstr
"ccopy_reg\n_reconstructor\np0\n(c__main__\ntest\np1\nc__builtin__\nobject\np2\nNtp3\nRp4\n(dp5\nS'x'\np6\nVtest \xa2\np7\nsb."
From py 3.6.1
import pickle
class test(object):
    def __init__(self):
        self.x = "xyz"
dumpstr = b"ccopy_reg\n_reconstructor\np0\n(c__main__\ntest\np1\nc__builtin__\nobject\np2\nNtp3\nRp4\n(dp5\nS'x'\np6\nVtest \xa2\np7\nsb."
t = pickle.loads(dumpstr, encoding="bytes")
>>> t
<__main__.test object at 0x040E3DF0>
>>> t.x
Traceback (most recent call last):
File "<pyshell#15>", line 1, in <module>
t.x
AttributeError: 'test' object has no attribute 'x'
>>> t.__dict__
{b'x': 'test ¢'}
>>>
-------------------------------------------------------------------------------------------------------------
Update 4/28/17
To re-create my issue I'm posting my actual raw pickle data here
The pickle file was created in python 2.7.13, windows 10 using
with open("raw_data.pkl", "wb") as fileobj:
pickle.dump(library, fileobj, protocol=0)
(protocol 0 so it's human readable)
To run it you'll need classes.py
# classes.py
class Library(object): pass
class Book(object): pass
class Student(object): pass
class RentalDetails(object): pass
And the test script here:
# load_pickle.py
import pickle, sys, itertools, os

raw_pkl = "raw_data.pkl"
is_py3 = sys.version_info.major == 3
read_modes = ["rb"]
encodings = ["bytes", "utf-8", "latin-1"]
fix_imports_choices = [True, False]
files = ["raw_data_%s.pkl" % x for x in range(3)]

def py2_test():
    with open(raw_pkl, "rb") as fileobj:
        loaded_object = pickle.load(fileobj)
    print("library dict: %s" % (loaded_object.__dict__.keys()))
    return loaded_object

def py2_dumps():
    library = py2_test()
    for protcol, path in enumerate(files):
        print("dumping library to %s, protocol=%s" % (path, protcol))
        with open(path, "wb") as writeobj:
            pickle.dump(library, writeobj, protocol=protcol)

def py3_test():
    # this test iterates over the different options trying to load
    # the data pickled with py2 into a py3 environment
    print("starting py3 test")
    for (read_mode, encoding, fix_import, path) in itertools.product(
            read_modes, encodings, fix_imports_choices, files):
        py3_load(path, read_mode=read_mode, fix_imports=fix_import, encoding=encoding)

def py3_load(path, read_mode, fix_imports, encoding):
    from traceback import print_exc
    print("-" * 50)
    print("path=%s, read_mode = %s fix_imports = %s, encoding = %s" % (path, read_mode, fix_imports, encoding))
    if not os.path.exists(path):
        print("start this file with py2 first")
        return
    try:
        with open(path, read_mode) as fileobj:
            loaded_object = pickle.load(fileobj, fix_imports=fix_imports, encoding=encoding)
            # print the object's __dict__
            print("library dict: %s" % (loaded_object.__dict__.keys()))
            # consider the test a failure if any member attributes are saved as bytes
            test_passed = not any((isinstance(k, bytes) for k in loaded_object.__dict__.keys()))
            print("Test %s" % ("Passed!" if test_passed else "Failed"))
    except Exception:
        print_exc()
        print("Test Failed")
        input("Press Enter to continue...")
    print("-" * 50)

if is_py3:
    py3_test()
else:
    # py2_test()
    py2_dumps()
Put all three files in the same directory and run c:\python27\python load_pickle.py first, which will create one pickle file for each of the three protocols. Then run the same command with Python 3 and notice that every version converts the __dict__ keys to bytes. I had it working for about 6 hours, but for the life of me I can't figure out how I broke it again.
In short, you're hitting bug 22005 with datetime.date objects in the RentalDetails objects.
That can be worked around with the encoding='bytes' parameter, but that leaves your classes with __dict__ containing bytes:
>>> library = pickle.loads(pickle_data, encoding='bytes')
>>> dir(library)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'str' and 'bytes'
It's possible to manually fix that based on your specific data:
def fix_object(obj):
    """Decode obj.__dict__ containing bytes keys"""
    obj.__dict__ = dict((k.decode("ascii"), v) for k, v in obj.__dict__.items())

def fix_library(library):
    """Walk all library objects and decode __dict__ keys"""
    fix_object(library)
    for student in library.students:
        fix_object(student)
    for book in library.books:
        fix_object(book)
        for rental in book.rentals:
            fix_object(rental)
But that's fragile and enough of a pain you should be looking for a better option.
1) Implement __getstate__/__setstate__ that maps datetime objects to a non-broken representation, for instance:
class Event(object):
    """Example class working around datetime pickling bug"""
    def __init__(self):
        self.date = datetime.date.today()
    def __getstate__(self):
        state = self.__dict__.copy()
        state["date"] = state["date"].toordinal()
        return state
    def __setstate__(self, state):
        self.__dict__.update(state)
        self.date = datetime.date.fromordinal(self.date)
2) Don't use pickle at all. Along the lines of __getstate__/__setstate__, you can just implement to_dict/from_dict methods or similar in your classes for saving their content as json or some other plain format.
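A minimal sketch of that idea, reusing the Event example above (the JSON layout is just illustrative):

import datetime
import json

class Event(object):
    def __init__(self, date=None):
        self.date = date or datetime.date.today()

    def to_dict(self):
        # store the date as an ISO string so plain JSON can represent it
        return {"date": self.date.isoformat()}

    @classmethod
    def from_dict(cls, data):
        year, month, day = (int(part) for part in data["date"].split("-"))
        return cls(date=datetime.date(year, month, day))

# round-trip through a neutral, version-independent format
saved = json.dumps(Event().to_dict())
restored = Event.from_dict(json.loads(saved))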
A final note, having a backreference to library in each object shouldn't be required.
You should treat pickle data as specific to the (major) version of Python that created it.
(See Gregory Smith's message w.r.t. issue 22005.)
The best way to get around this is to write a Python 2.7 program to read the pickled data, and write it out in a neutral format.
Taking a quick look at your actual data, it seems to me that an SQLite database is appropriate as an interchange format, since the Books contain references to a Library and RentalDetails. You could create separate tables for each.
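A rough sketch of that export step, run under Python 2.7 (shown here dumping to JSON for brevity rather than the suggested SQLite; the attribute handling is hypothetical, and nested objects or backreferences such as book.library would need their own records):

import json
import pickle

with open("raw_data.pkl", "rb") as fileobj:
    library = pickle.load(fileobj)

def plain(obj):
    # str() flattens awkward values such as datetime.date; adjust per attribute as needed
    return {k: str(v) for k, v in obj.__dict__.items()}

export = {
    "books": [plain(book) for book in library.books],
    "students": [plain(student) for student in library.students],
    "rentals": [plain(rental) for book in library.books for rental in book.rentals],
}

with open("library.json", "w") as out:
    json.dump(export, out, indent=2)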
Question: Porting pickle py2 to py3 strings become bytes
The encoding='latin-1' given below is OK.
Your problem with b'' keys is the result of using encoding='bytes'.
This results in dict keys being unpickled as bytes instead of str.
The problem data are the datetime.date values '\x07á\x02\x10', starting at line 56 in raw_data.pkl.
It's a known issue, as already pointed out:
Unpickling python2 datetime under python3
http://bugs.python.org/issue22005
As a workaround, I patched pickle.py and got the unpickled object, e.g.
book.library.books[0].rentals[0].rental_date=2017-02-16
This works for me:
t = pickle.loads(dumpstr, encoding="latin-1")
Output:
<__main__.test object at 0xf7095fec>
t.__dict__={'x': 'test ¢'}
test ¢
Tested with Python:3.4.2

Python AttributeError: function 'Search' not found

I am trying to control a Tektronix RSA306 Spectrum Analyzer by using the API. The program finds the RSA300API.dll file but throws an error when searching and connecting to the device. The program I am running is an example from Tektronix. The setup I am currently using is Python 2.7.12 x64(Anaconda 4.1.1) on 64 bit Windows 7.
from ctypes import *
import numpy as np
import matplotlib.pyplot as plt
I am locating the .dll file with:
rsa300 = WinDLL("RSA300API.dll")
The error occurs when executing the search function:
longArray = c_long*10
deviceIDs = longArray()
deviceSerial = c_wchar_p('')
numFound = c_int(0)
serialNum = c_char_p('')
nomenclature = c_char_p('')
header = IQHeader()

rsa300.Search(byref(deviceIDs), byref(deviceSerial), byref(numFound))

if numFound.value == 1:
    rsa300.Connect(deviceIDs[0])
else:
    print('Unexpected number of instruments found.')
    exit()
When running, the following error messages appear:
C:\Anaconda2\python.exe C:/Tektronix/RSA_API/lib/x64/trial
<WinDLL 'RSA300API.dll', handle e47b0000 at 3ae4e80>
Traceback (most recent call last):
File "C:/Tektronix/RSA_API/lib/x64/trial", line 44, in <module>
rsa300.Search(byref(deviceIDs), byref(deviceSerial), byref(numFound))
File "C:\Anaconda2\lib\ctypes\__init__.py", line 376, in __getattr__
func = self.__getitem__(name)
File "C:\Anaconda2\lib\ctypes\__init__.py", line 381, in __getitem__
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: function 'Search' not found
The issue that I am having is that the 'Search' function is not found. What would be the solution to this problem?
Tektronix application engineer here.
The problem here is a mismatch of API versions. Your code is referencing an old version of the API (RSA300API.dll) and the error message is referencing a newer version of the API (RSA_API.dll). Make sure you have installed the most current version of the API and that you reference the correct dll in your code.
Here is a link to download the latest version of the RSA API (as of 11/1/16):
http://www.tek.com/model/rsa306-software
Here is a link to download the API documentation (as of 11/1/16). There is an Excel spreadsheet attached to this document that outlines the differences between old functions and new functions:
http://www.tek.com/spectrum-analyzer/rsa306-manual-6
Function names were changed in the new version for the sake of clarity and consistency. The old version of the API didn't have prefixes for most functions, and it was unclear which functions were grouped together just from reading the function names. The new version of the API applies prefixes to all functions, and it is now much easier to tell what functional group a given function is in just by reading its declaration. For example, the old search and connect functions were simply called Search() and Connect(), and the new versions of the functions are called DEVICE_Search() and DEVICE_Connect().
Note: I use cdll.LoadLibrary("RSA_API.dll") to load the dll rather than WinDLL().
DEVICE_Search() has slightly different arguments than Search(). Due to different argument data types, the new DEVICE_Search() function doesn't play as well with ctypes as the old Search() function does, but I've found a method that works (see code below).
Here is the search_connect() function I use at the beginning of my RSA control scripts:
from ctypes import *
import os

"""
################################################################
C:\Tektronix\RSA306 API\lib\x64 needs to be added to the
PATH system environment variable
################################################################
"""
os.chdir("C:\\Tektronix\\RSA_API\\lib\\x64")
rsa = cdll.LoadLibrary("RSA_API.dll")

"""#################CLASSES AND FUNCTIONS#################"""
def search_connect():
    #search/connect variables
    numFound = c_int(0)
    intArray = c_int*10
    deviceIDs = intArray()
    #this is absolutely asinine, but it works
    deviceSerial = c_char_p('longer than the longest serial number')
    deviceType = c_char_p('longer than the longest device type')
    apiVersion = c_char_p('api')

    #get API version
    rsa.DEVICE_GetAPIVersion(apiVersion)
    print('API Version {}'.format(apiVersion.value))

    #search
    ret = rsa.DEVICE_Search(byref(numFound), deviceIDs,
                            deviceSerial, deviceType)

    if ret != 0:
        print('Error in Search: ' + str(ret))
        exit()
    if numFound.value < 1:
        print('No instruments found. Exiting script.')
        exit()
    elif numFound.value == 1:
        print('One device found.')
        print('Device type: {}'.format(deviceType.value))
        print('Device serial number: {}'.format(deviceSerial.value))
        ret = rsa.DEVICE_Connect(deviceIDs[0])
        if ret != 0:
            print('Error in Connect: ' + str(ret))
            exit()
    else:
        print('Unexpected number of devices found, exiting script.')
        exit()

twisted xmlrpc and numpy float 64 exception

I'm using numpy to do some stuff and then serve the results via a twisted/XMLRPC server. If the result is a numpy float64, I get an exception, probably because twisted can't handle this type. In fact, if I downgrade the result to a plain float with x=float(x), everything is OK.
This is not so good, because if I forget this workaround somewhere, it's a pain.
Do you have any better solution?
server:
from twisted.web import xmlrpc, server
import numpy as np

class MioServer(xmlrpc.XMLRPC):
    """
    An example object to be published.
    """
    def xmlrpc_test_np(self):
        return np.sqrt(2)

if __name__ == '__main__':
    from twisted.internet import reactor
    r = MioServer()
    reactor.listenTCP(7080, server.Site(r))
    reactor.run()
client:
import xmlrpclib

if __name__ == '__main__':
    x = xmlrpclib.ServerProxy('http://localhost:7080/')
    print x.test_np()
Exception:
Traceback (most recent call last):
File "C:\Users\Stone\.eclipse\org.eclipse.platform_4.3.0_1709980481_win32_win32_x86\plugins\org.python.pydev_2.8.2.2013090511\pysrc\pydevd.py", line 1446, in <module>
debugger.run(setup['file'], None, None)
File "C:\Users\Stone\.eclipse\org.eclipse.platform_4.3.0_1709980481_win32_win32_x86\plugins\org.python.pydev_2.8.2.2013090511\pysrc\pydevd.py", line 1092, in run
pydev_imports.execfile(file, globals, locals) #execute the script
File "C:\Users\Stone\Documents\FastDose\src\Beagle\Prove e test\xmlrpc_client.py", line 28, in <module>
print x.test_np()
File "C:\Python27\lib\xmlrpclib.py", line 1224, in __call__
return self.__send(self.__name, args)
File "C:\Python27\lib\xmlrpclib.py", line 1578, in __request
verbose=self.__verbose
File "C:\Python27\lib\xmlrpclib.py", line 1264, in request
return self.single_request(host, handler, request_body, verbose)
File "C:\Python27\lib\xmlrpclib.py", line 1297, in single_request
return self.parse_response(response)
File "C:\Python27\lib\xmlrpclib.py", line 1473, in parse_response
return u.close()
File "C:\Python27\lib\xmlrpclib.py", line 793, in close
raise Fault(**self._stack[0])
xmlrpclib.Fault: <Fault 8002: "Can't serialize output: cannot marshal <type 'numpy.float64'> objects">
This has nothing to do with twisted. If you read the error message you posted you see near the end that the error arises in xmlrpclib.py.
The xml-rpc implementation uses marshal to serialize objects. However, the marshalling done by xml-rpc does not support handling third party objects like numpy.ndarray. The reason that it works when you convert to float is that the built-in float type is supported.
Before offering my solution, I should point out that this exact same thing has been asked elsewhere in several places easily found via google (1 2), and I am stealing my answers from there.
To do what you want, you can convert your numpy array to something that can be serialized. The simplest way to do this is to write flatten/unflatten functions. You would then call the flattener when sending, and the unflattener when receiving. Here's an example (taken from this post):
from cStringIO import StringIO
from numpy.lib import format

def unflatten(s):
    f = StringIO(s)
    arr = format.read_array(f)
    return arr

def flatten(arr):
    f = StringIO()
    format.write_array(f, arr)
    s = f.getvalue()
    return s
An even simpler thing to do would be to call
<the array you want to send>.tolist()
on the sending side to convert to a python list, and then call
np.array(<the list you received>)
on the receiving side.
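Applied to the server and client from the question, a minimal sketch (switching the return value to an array so there is something to convert; only the marked lines change):

# server side: return a plain Python list instead of a numpy type
def xmlrpc_test_np(self):
    return np.sqrt(np.arange(4)).tolist()

# client side: rebuild the numpy array from the received list
import numpy as np
result = np.array(x.test_np())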
The only drawback of doing this is that you have to explicitly call the flattener and unflattener when you send and receive data. Although this is a bit more typing, it's not much, and if you forget the program will fail loudly (with the same error you already experienced) rather than silently doing something wrong.
I gather from your question that you don't like this, and would rather find a way to make numpy arrays work directly with xml-rpc without any explicit flattening/unflattening. I think this may not be possible, because the xml-rpc documentation specifically says that the only third party objects that can be serialized are new style classes with a __dict__ attribute, and in this case the keys must be strings and the values must be other conformable types.
So you see, if you want to support numpy arrays directly, it seems that you have to modify the way xml-rpc's marshalling works. It would be nice if you could just add some kind of method to a subclass of ndarray to support being marshalled, but it looks like that's not how it works.
I hope this helps you understand what's going on, and why the existing solutions use explicit flattening/unflattening.
-Daniel
P.S. This has made me curious about how to extend xml-rpc in python to support other types of objects, so I've posted my own question about that.
I have found this way: before sending the result, I pass it to the function to_float():
def iterable(x):
    try:
        iter(x)
    except:  # not iterable
        return False
    else:  # iterable
        return True

def to_float(x):
    from numpy import float64, ndarray
    if type(x) == dict:
        res = dict()
        for name in iter(x):
            res[name] = to_float(x[name])
        return res
    elif type(x) == ndarray:
        return map(float, x.tolist())
    elif type(x) == float64:
        return float(x)
    elif iterable(x) and not isinstance(x, str):
        res = []
        for item in x:
            if type(item) == float64:
                res.append(float(item))
            elif type(item) == ndarray:
                res.append(map(float, item.tolist()))
            else:
                res.append(item)
        return res
    else:
        return x
Tried for quite some time to transfer a 640x480x3 image with xmlrpc. Neither the proposed "tolist()" nor the "flatten" solution worked for me. Finally found this solution:
Sender:
def getCamPic():
    cam = cv2.VideoCapture(0)
    ret, img = cam.read()  # read() returns a (success, frame) tuple
    transferImg = img.tostring()
    return transferImg
Receiver:
transferImg = proxy.getCamPic()
rawImg = np.fromstring(transferImg.data,dtype=np.uint8)
img = np.reshape(rawImg,(640,480,3))
As I know my cam's resolution, the hard-coding is not an issue for me. An improved version might include this information in transferImg?
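One way to include that information, sketched with the Python 2 xmlrpclib used elsewhere in this thread (the helper names are hypothetical; the raw bytes are wrapped in Binary so XML-RPC can carry them):

import numpy as np
import xmlrpclib  # xmlrpc.client on Python 3

def pack_image(img):
    # ship shape and dtype alongside the pixel bytes
    return {
        "shape": list(img.shape),
        "dtype": str(img.dtype),
        "data": xmlrpclib.Binary(img.tobytes()),
    }

def unpack_image(payload):
    raw = np.frombuffer(payload["data"].data, dtype=payload["dtype"])
    return raw.reshape(payload["shape"])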
