My issue is that instances of a custom class were saved with pickle.dump. Since those files were saved, the custom class has been changed, and now pickle.load raises this error. Is it a problem with the saved file?
The error:
File "/cprprod/extern/lib/python2.7/pickle.py", line 1378, in load
return Unpickler(file).load()
File "/cprprod/extern/lib/python2.7/pickle.py", line 858, in load
dispatch[key](self)
File "/cprprod/extern/lib/python2.7/pickle.py", line 1070, in load_inst
self._instantiate(klass, self.marker())
File "/cprprod/extern/lib/python2.7/pickle.py", line 1060, in _instantiate
value = klass(*args)
Is there anything I can do to load the file?
The code
file = open(filename,'rb')
obj = pickle.load(file)
will give me the error.
Here is some minimal code which can reproduce the error:
import pickle
class foo:
    def __init__(self,a):
        self.a = a
    def __str__(self):
        return str(self.a)
obj = foo(1)
with open('junk','wb') as f:
    pickle.dump(obj,f)
class foo:
    def __init__(self,a,b):
        self.a = a
        self.b = b
    def __str__(self):
        return '%s %s'%(self.a,self.b)
    def __getinitargs__(self):
        return (self.a,self.b)
with open('junk','rb') as f:
    obj = pickle.load(f)
print str(obj)
Given the contrived code that I posted on your behalf in the question, we can "fix" this error as:
with open('junk','rb') as f:
    position = f.tell()  # remember where this pickle starts
    try:
        obj = pickle.load(f)
    except Exception as e:
        print e
        a = foo.__getinitargs__
        del foo.__getinitargs__
        f.seek(position)  # rewind to the start of the pickle before retrying
        obj = pickle.load(f)
        foo.__getinitargs__ = a
print str(obj)
Now the instance has been unpickled, but it has no attribute b, because the old pickle never stored one.
If you added __getinitargs__(), it is up to you to make sure your new class can handle the arguments passed to __init__(). Old data that doesn't include the __getinitargs__ data will still cause __init__ to be called, but with no arguments.
Make the arguments to __init__ optional via keyword arguments:
def __init__(self, otherarg=None):
    if otherarg is None:
        # created from an old-revision pickle. Handle separately.
        # The pickle will be loaded *normally* and data will still be set normally
        return
    self.otherarg = otherarg
When loading the old-style pickle, the data for these classes will still be restored. You can use __setstate__() to transform the internal state as needed.
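A minimal sketch of the __setstate__() approach, written for Python 3. The class name Foo, the attribute b and its default of None are illustrative assumptions, not taken from the original code. The unpickler hands __setstate__() the saved attribute dictionary, so the method can fill in whatever an old-revision pickle lacks:

```python
class Foo:
    def __init__(self, a=None, b=None):
        self.a = a
        self.b = b

    def __setstate__(self, state):
        # 'state' is the attribute dict that was stored in the pickle.
        # Old-revision pickles have no 'b', so supply a default.
        state.setdefault("b", None)
        self.__dict__.update(state)


# Simulate what the unpickler does for an old-revision pickle:
# create the instance without calling __init__, then restore its state.
old = Foo.__new__(Foo)
old.__setstate__({"a": 1})
print(old.a, old.b)
```

Once such objects are loaded, dumping them again produces pickles that carry the new attribute.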
Alternatively, temporarily remove the __getinitargs__ method from the class:
initargs = foo.__getinitargs__.__func__
del foo.__getinitargs__
obj = pickle.load(f)
foo.__getinitargs__ = initargs
and re-dump your pickles from the now-loaded objects with __getinitargs__ reinstated.
I've tested both methods and in both cases the old data is loaded correctly and you can then dump your objects again to a new pickle file with __getinitargs__ just fine.
You might want to modify the custom class so that the second parameter is optional. This would keep backward compatibility with your pickled objects.
I have a dilemma writing a context/resource manager wrapper for a file object which passes Pylint, for use with with: Do I put the wrapped open call in __init__, or in __enter__?
Conceptually, this is a class that wraps open by accepting either a filename, the string -, or a file object (such as sys.stdin) and does 'the right thing'. In other words, if it's a filename other than -, it opens the file and manages it as a resource; otherwise, it either chooses a default file object (expected to be sys.stdin or sys.stdout) if the filename is -, or if it is a file object, uses that file object unchanged.
I see two possibilities, neither of which are working out right now: The wrapped open goes in the __init__ constructor, or it goes in the __enter__ context management method. If I put it in the __init__ constructor, which examples on SO suggest, Pylint -- which I must pass -- fails on the open with R1732: Consider using 'with' for resource-allocating operations (consider-using-with). If I put the open in the __enter__ method, Pylint is happy, but I am not sure this is the correct practice, plus the problem is, in one use case, I need the file object in order to initialize a base class (and Pylint won't let me call the base class constructor in the __enter__ method).
Some example code is in order. Here is code that opens in the constructor:
import io
class ManagedFile:
    '''Manage a file, which could be an unopened filename, could be
    '-' for stdin/stdout, or could be an existing filehandle'''
    def __init__(self, file_in, handle_default, open_kwargs):
        ''' Open a file if given a filename, or handle_default if -,
        or if it's a file object already, just pass it through.
        :param file_in: Valid filename, '-', or file-like object
        :param handle_default: What to return if file is '-'
        :param open_kwargs: Dictionary of options to pass to open() if used
        '''
        self.file_handle = None
        self.file_in = None
        if isinstance(file_in, io.IOBase):
            self.file_handle = file_in
        elif isinstance(file_in, str):
            if file_in is None or file_in == "-":
                self.file_handle = handle_default
            else:
                # open the name passed in; self.file_in is still None at this point
                self.file_handle = open(file_in, **open_kwargs)
                self.file_in = file_in
        else:
            raise TypeError('File specified must be string or file object')
    def __enter__(self):
        self.file_handle.__enter__()
        return self.file_handle
    def __exit__(self, err_type, err_value, traceback):
        self.file_handle.__exit__(err_type, err_value, traceback)
        self.file_handle.close()
        self.file_in = None
    def handle(self):
        '''Return handle of file that was opened'''
        return self.file_handle
And here is how I would do it with the open call in the __enter__ method:
class ManagedFile:
    '''Manage a file, which could be an unopened filename, could be
    '-' for stdin/stdout, or could be an existing filehandle'''
    def __init__(self, file_in, handle_default, open_kwargs):
        ''' Open a file if given a filename, or handle_default if -,
        or if it's a file object already, just pass it through.
        :param file_in: Valid filename, '-', or file-like object
        :param handle_default: What to return if file is '-'
        :param open_kwargs: Dictionary of options to pass to open() if used
        :return: Managed file object
        '''
        self.managed = False
        self.file_handle = None
        self.open_kwargs = {'mode': 'r'}
        self.file_in = None
        if isinstance(file_in, io.IOBase):
            self.file_handle = file_in
        elif isinstance(file_in, str):
            if file_in is None or file_in == "-":
                self.file_handle = handle_default
            else:
                self.file_in = file_in
                self.open_kwargs = open_kwargs
                self.managed = True
        else:
            raise TypeError('File specified must be string or file object')
    def __enter__(self):
        if self.managed:
            self.file_handle = open(self.file_in, **self.open_kwargs)
        self.file_handle.__enter__()
        return self.file_handle
    def __exit__(self, err_type, err_value, traceback):
        self.file_handle.__exit__(err_type, err_value, traceback)
        self.file_handle.close()
        self.managed = False
        self.file_in = None
    def handle(self):
        '''Return handle of file that was opened'''
        return self.file_handle
I've pored over many SO questions & answers, but haven't been able to triangulate the exact answer I'm looking for, particularly one that accounts for why Pylint flags an error when I ostensibly do the right thing.
Undoubtedly I've committed some Python errors in this code so any other ancillary correction would be welcome. Other ideas are also welcome but please don't get too fancy on me.
Ideally the class would itself behave as a full-fledged file object, but right now I'm focusing on something simple: Something that just manages a file handle (i.e. a reference to a regular file object). Extra gratitude if someone can provide some hints on turning it into a file object.
Python version is 3.8.10; platform is Linux.
If you don't need this to be a class, this feels like it's a lot simpler.
import contextlib
import io
import sys
def open_or_stdout(file_or_path, mode):
    """
    Returns a context manager with either the file path opened,
    the file object passed through or standard output
    """
    if isinstance(file_or_path, io.IOBase):
        # Input is already a file-like object. Just pass it through
        return file_or_path
    if file_or_path == "-":
        return contextlib.nullcontext(sys.stdout)
    # Otherwise treat the argument as a path and open it
    return open(file_or_path, mode=mode)
If you want to stick with the class, I think you need to return the value of self.file_handle.__enter__() in your implementation of __enter__() rather than self.file_handle.
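Another way to keep the open call inside a with statement is a generator-based context manager. This is only a sketch under assumptions: the name managed_file and its signature are made up here, and it differs from the function above in that a file object passed in is left open for the caller to close. Since open is used in a with statement, it should not trigger R1732, though that has not been checked against a specific Pylint version.

```python
import contextlib
import io
import sys


@contextlib.contextmanager
def managed_file(file_in, handle_default=sys.stdout, **open_kwargs):
    """Yield a usable file object; close it only if it was opened here."""
    if isinstance(file_in, io.IOBase):
        # Already a file object: the caller owns it, so leave it open.
        yield file_in
    elif file_in == "-":
        # Conventional placeholder for a standard stream.
        yield handle_default
    elif isinstance(file_in, str):
        # A real filename: open it inside 'with' so it is always closed.
        with open(file_in, **open_kwargs) as handle:
            yield handle
    else:
        raise TypeError("File specified must be string or file object")
```

Usage would look like `with managed_file("-") as out: out.write("hello\n")`, or `with managed_file(path, mode="w") as out: ...` for a real file.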
I have a pickle file that was created with python 2.7 that I'm trying to port to python 3.6. The file is saved in py 2.7 via pickle.dumps(self.saved_objects, -1)
and loaded in python 3.6 via loads(data, encoding="bytes") (from a file opened in rb mode). If I try opening in r mode and pass encoding=latin1 to loads I get UnicodeDecode errors. When I open it as a byte stream it loads, but literally every string is now a byte string. Every object's __dict__ keys are all b"a_variable_name" which then generates attribute errors when calling an_object.a_variable_name because __getattr__ passes a string and __dict__ only contains bytes. I feel like I've tried every combination of arguments and pickle protocols already. Apart from forcibly converting all objects' __dict__ keys to strings I'm at a loss. Any ideas?
** Skip to 4/28/17 update for better example
-------------------------------------------------------------------------------------------------------------
** Update 4/27/17
This minimum example illustrates my problem:
From py 2.7.13
import pickle
class test(object):
    def __init__(self):
        self.x = u"test ¢" # including a unicode str breaks things
t = test()
dumpstr = pickle.dumps(t)
>>> dumpstr
"ccopy_reg\n_reconstructor\np0\n(c__main__\ntest\np1\nc__builtin__\nobject\np2\nNtp3\nRp4\n(dp5\nS'x'\np6\nVtest \xa2\np7\nsb."
From py 3.6.1
import pickle
class test(object):
    def __init__(self):
        self.x = "xyz"
dumpstr = b"ccopy_reg\n_reconstructor\np0\n(c__main__\ntest\np1\nc__builtin__\nobject\np2\nNtp3\nRp4\n(dp5\nS'x'\np6\nVtest \xa2\np7\nsb."
t = pickle.loads(dumpstr, encoding="bytes")
>>> t
<__main__.test object at 0x040E3DF0>
>>> t.x
Traceback (most recent call last):
File "<pyshell#15>", line 1, in <module>
t.x
AttributeError: 'test' object has no attribute 'x'
>>> t.__dict__
{b'x': 'test ¢'}
>>>
-------------------------------------------------------------------------------------------------------------
Update 4/28/17
To re-create my issue I'm posting my actual raw pickle data here
The pickle file was created in python 2.7.13, windows 10 using
with open("raw_data.pkl", "wb") as fileobj:
    pickle.dump(library, fileobj, protocol=0)
(protocol 0 so it's human readable)
To run it you'll need classes.py
# classes.py
class Library(object): pass
class Book(object): pass
class Student(object): pass
class RentalDetails(object): pass
And the test script here:
# load_pickle.py
import pickle, sys, itertools, os
raw_pkl = "raw_data.pkl"
is_py3 = sys.version_info.major == 3
read_modes = ["rb"]
encodings = ["bytes", "utf-8", "latin-1"]
fix_imports_choices = [True, False]
files = ["raw_data_%s.pkl" % x for x in range(3)]
def py2_test():
    with open(raw_pkl, "rb") as fileobj:
        loaded_object = pickle.load(fileobj)
    print("library dict: %s" % (loaded_object.__dict__.keys()))
    return loaded_object
def py2_dumps():
    library = py2_test()
    for protocol, path in enumerate(files):
        print("dumping library to %s, protocol=%s" % (path, protocol))
        with open(path, "wb") as writeobj:
            pickle.dump(library, writeobj, protocol=protocol)
def py3_test():
    # this test iterates over the different options trying to load
    # the data pickled with py2 into a py3 environment
    print("starting py3 test")
    for (read_mode, encoding, fix_import, path) in itertools.product(read_modes, encodings, fix_imports_choices, files):
        py3_load(path, read_mode=read_mode, fix_imports=fix_import, encoding=encoding)
def py3_load(path, read_mode, fix_imports, encoding):
    from traceback import print_exc
    print("-" * 50)
    print("path=%s, read_mode = %s fix_imports = %s, encoding = %s" % (path, read_mode, fix_imports, encoding))
    if not os.path.exists(path):
        print("start this file with py2 first")
        return
    try:
        with open(path, read_mode) as fileobj:
            loaded_object = pickle.load(fileobj, fix_imports=fix_imports, encoding=encoding)
            # print the object's __dict__
            print("library dict: %s" % (loaded_object.__dict__.keys()))
            # consider the test a failure if any member attributes are saved as bytes
            test_passed = not any((isinstance(k, bytes) for k in loaded_object.__dict__.keys()))
            print("Test %s" % ("Passed!" if test_passed else "Failed"))
    except Exception:
        print_exc()
        print("Test Failed")
    input("Press Enter to continue...")
    print("-" * 50)
if is_py3:
    py3_test()
else:
    # py2_test()
    py2_dumps()
Put all 3 in the same directory and run c:\python27\python load_pickle.py first, which will create one pickle file for each of the 3 protocols. Then run the same command with python 3 and notice that it converts the __dict__ keys to bytes. I had it working for about 6 hours, but for the life of me I can't figure out how I broke it again.
In short, you're hitting bug 22005 with datetime.date objects in the RentalDetails objects.
That can be worked around with the encoding='bytes' parameter, but that leaves your classes with __dict__ containing bytes:
>>> library = pickle.loads(pickle_data, encoding='bytes')
>>> dir(library)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'str' and 'bytes'
It's possible to manually fix that based on your specific data:
def fix_object(obj):
    """Decode obj.__dict__ containing bytes keys"""
    obj.__dict__ = dict((k.decode("ascii"), v) for k, v in obj.__dict__.items())
def fix_library(library):
    """Walk all library objects and decode __dict__ keys"""
    fix_object(library)
    for student in library.students:
        fix_object(student)
    for book in library.books:
        fix_object(book)
        for rental in book.rentals:
            fix_object(rental)
But that's fragile and enough of a pain you should be looking for a better option.
1) Implement __getstate__/__setstate__ that maps datetime objects to a non-broken representation, for instance:
import datetime
class Event(object):
    """Example class working around datetime pickling bug"""
    def __init__(self):
        self.date = datetime.date.today()
    def __getstate__(self):
        state = self.__dict__.copy()
        state["date"] = state["date"].toordinal()
        return state
    def __setstate__(self, state):
        self.__dict__.update(state)
        self.date = datetime.date.fromordinal(self.date)
2) Don't use pickle at all. Along the lines of __getstate__/__setstate__, you can just implement to_dict/from_dict methods or similar in your classes for saving their content as json or some other plain format.
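As a rough illustration of option 2, here is one way to_dict/from_dict could look with JSON. The Event class, its name field and the choice of ISO date strings are assumptions made for this sketch, and date.fromisoformat() needs Python 3.7 or later on the reading side.

```python
import datetime
import json


class Event(object):
    """Hypothetical class saved as JSON instead of being pickled."""

    def __init__(self, name, date):
        self.name = name
        self.date = date

    def to_dict(self):
        # Store the date as an ISO 8601 string, which JSON can hold.
        return {"name": self.name, "date": self.date.isoformat()}

    @classmethod
    def from_dict(cls, data):
        # date.fromisoformat() requires Python 3.7 or later.
        return cls(data["name"], datetime.date.fromisoformat(data["date"]))


event = Event("rental", datetime.date(2017, 2, 16))
text = json.dumps(event.to_dict())
restored = Event.from_dict(json.loads(text))
print(text)
print(restored.date)
```

Because the file contains only strings and numbers, it reads the same way under Python 2 and Python 3.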
A final note, having a backreference to library in each object shouldn't be required.
You should treat pickle data as specific to the (major) version of Python that created it.
(See Gregory Smith's message w.r.t. issue 22005.)
The best way to get around this is to write a Python 2.7 program to read the pickled data, and write it out in a neutral format.
Taking a quick look at your actual data, it seems to me that an SQLite database is appropriate as an interchange format, since the Books contain references to a Library and RentalDetails. You could create separate tables for each.
Question: Porting pickle py2 to py3 strings become bytes
The encoding='latin-1' shown below is OK.
Your problem with b'' is the result of using encoding='bytes'.
This results in dict keys being unpickled as bytes instead of as str.
The problem data are the datetime.date values '\x07á\x02\x10', starting at line 56 in raw-data.pkl.
It's a known issue, as already pointed out.
Unpickling python2 datetime under python3
http://bugs.python.org/issue22005
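Both effects can be reproduced under Python 3 without the original data. The two byte strings below are hand-written protocol 0 streams, an assumption standing in for what Python 2 would emit: one for a dict with a str key, and one for datetime.date(2017, 2, 16).

```python
import datetime
import pickle

# A dict {'x': 1} whose key is a Python 2 str (opcode S).
dict_stream = b"(dS'x'\nI1\ns."
# datetime.date(2017, 2, 16); its state is a 4-byte Python 2 str.
date_stream = b"cdatetime\ndate\n(S'\\x07\\xe1\\x02\\x10'\ntR."

# The same stream gives bytes keys or str keys depending on encoding.
as_bytes = pickle.loads(dict_stream, encoding="bytes")
as_text = pickle.loads(dict_stream, encoding="latin-1")
print(as_bytes)
print(as_text)

# With encoding="bytes" the date state stays binary and loads fine.
date = pickle.loads(date_stream, encoding="bytes")
print(date)
```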
For a workaround, I have patched pickle.py and got unpickled object, e.g.
book.library.books[0].rentals[0].rental_date=2017-02-16
This will work for me:
t = pickle.loads(dumpstr, encoding="latin-1")
Output:
<__main__.test object at 0xf7095fec>
t.__dict__={'x': 'test ¢'}
test ¢
Tested with Python:3.4.2
I have server and client programs that communicate with each other through a network socket.
What I want is to send a directory entry (scandir.DirEntry) obtained from scandir.scandir() through the socket.
For now I am using pickle and cPickle modules and have come up with the following (excerpt only):
import scandir, pickle
s = scandir.scandir("D:\\PYTHON")
entry = s.next()
data = pickle.dumps(entry)
However, I am getting the following error stack:
File "untitled.py", line 5, in <module>
data = pickle.dumps(item)
File "C:\Python27\Lib\pickle.py", line 1374, in dumps
Pickler(file, protocol).dump(obj)
File "C:\Python27\Lib\pickle.py", line 224, in dump
self.save(obj)
File "C:\Python27\Lib\pickle.py", line 306, in save
rv = reduce(self.proto)
File "C:\Python27\Lib\copy_reg.py", line 70, in _reduce_ex
raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle DirEntry objects
How can I get rid of this error?
I have heard of using marshal or JSON.
UPDATE: JSON is not dumping all the data within the object.
Is there any completely different way to do so to send the object through the socket?
Thanks in advance for any help.
Yes, os.DirEntry objects are intended to be short-lived, not really kept around or serialized. If you need the data in them to be serialized, looks like you've figured that out in your own answer -- serialize (pickle) a dict version of the attributes you need.
To deserialize into an object that walks and quacks like an os.DirEntry instance, create a PseudoDirEntry class that mimics the things you need.
Note that you can directly serialize the stat object already, which saves you picking the fields out of that.
Combined, that would look like this:
class PseudoDirEntry:
    def __init__(self, name, path, is_dir, stat):
        self.name = name
        self.path = path
        self._is_dir = is_dir
        self._stat = stat
    def is_dir(self):
        return self._is_dir
    def stat(self):
        return self._stat
And then:
>>> import os, pickle
>>> entry = list(os.scandir())[0]
>>> pickled = pickle.dumps({'name': entry.name, 'path': entry.path, 'is_dir': entry.is_dir(), 'stat': entry.stat()})
>>> loaded = pickle.loads(pickled)
>>> pseudo = PseudoDirEntry(loaded['name'], loaded['path'], loaded['is_dir'], loaded['stat'])
>>> pseudo.name
'.DS_Store'
>>> pseudo.is_dir()
False
>>> pseudo.stat()
os.stat_result(st_mode=33188, st_ino=8370294, st_dev=16777220, st_nlink=1, st_uid=502, st_gid=20, st_size=8196, st_atime=1478356967, st_mtime=1477601172, st_ctime=1477601172)
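The two halves can be wrapped in a pair of helpers, one for each end of the connection. This is a sketch for Python 3.6 or later using os.scandir; the helper names entry_to_bytes and entry_from_bytes are made up here, and the temporary directory only exists to give the example something to scan.

```python
import os
import pickle
import tempfile


class PseudoDirEntry:
    def __init__(self, name, path, is_dir, stat):
        self.name = name
        self.path = path
        self._is_dir = is_dir
        self._stat = stat

    def is_dir(self):
        return self._is_dir

    def stat(self):
        return self._stat


def entry_to_bytes(entry):
    """Sender side: pickle the plain data of a DirEntry."""
    fields = {'name': entry.name, 'path': entry.path,
              'is_dir': entry.is_dir(), 'stat': entry.stat()}
    return pickle.dumps(fields)


def entry_from_bytes(data):
    """Receiver side: rebuild an object that mimics a DirEntry."""
    fields = pickle.loads(data)
    return PseudoDirEntry(fields['name'], fields['path'],
                          fields['is_dir'], fields['stat'])


with tempfile.TemporaryDirectory() as folder:
    # Create one empty file so that the scan finds an entry.
    open(os.path.join(folder, "example.txt"), "w").close()
    with os.scandir(folder) as entries:
        payloads = [entry_to_bytes(entry) for entry in entries]

restored = entry_from_bytes(payloads[0])
print(restored.name, restored.is_dir(), restored.stat().st_size)
```

Each payload is an ordinary bytes object, so it can be written to the socket on one side and passed to entry_from_bytes on the other.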
Well I myself have figured out that for instances of non-standard classes like this scandir.DirEntry, the best way is to convert the class member data into a (possibly nested) combination of standard objects like (list, dict, etc.).
For example, in the particular case of scandir.DirEntry, it can be done as follows.
import scandir, pickle
s = scandir.scandir("D:\\PYTHON")
entry = s.next()
# first convert the stat object to st_
st = entry.stat()
st_ = {'st_mode':st.st_mode, 'st_size':st.st_size,\
'st_atime':st.st_atime, 'st_mtime':st.st_mtime,\
'st_ctime':st.st_ctime}
# now convert the entry object to entry_
entry_ = {'name':entry.name, 'is_dir':entry.is_dir(), \
'path':entry.path, 'stat':st_}
# one may need some other class member data also as necessary
# now pickle the converted entry_
data = pickle.dumps(entry_)
Although for my purpose, I only require the data, after the unpickling in the other end, one may need to reconstruct the unpickled entry_ to unpickled scandir.DirEntry object 'entry'. However, I am yet to figure out how to reconstruct the class instance and set the data for the behaviour of methods like is_dir(), stat().
This represents a simple class, that I have made to try and practice OOP.
import csv
import logging
class LoaderCSV:
    def __init__(self, file):
        self.file = file
        if file is None:
            logging.warning('Missing input file.')
    def load(self):
        with open(self.file) as f:
            holder = csv.reader(f)
            file_data = list(holder)
        return file_data
What happens is when I call this class with:
data = LoaderCSV.load(input_file)
I get
line 14, in load
with open(self.file) as f:
AttributeError: 'str' object has no attribute 'file'
I must be messing something up, but can't understand what. My previous attempt worked just fine this way. I just don't understand why
self.file
does not pass the value, assigned to the argument, when it is defined under __init__
The problem is you're calling an instance method as a static method, so your filename is being passed in instead of self. The proper way to do this would be like:
loader = LoaderCSV(input_file)
data = loader.load()
This will pass in loader as the self parameter, allowing you to access the file name in the object's file field.
Check out the Python documentation on classes for more information.
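To see why the string ends up as self, here is a cut-down sketch. The load body is a stand-in that returns the stored name instead of reading a file, and "input.csv" is a made-up file name.

```python
class LoaderCSV:
    def __init__(self, file):
        self.file = file

    def load(self):
        # Stand-in body: just report which file would be read.
        return self.file


loader = LoaderCSV("input.csv")
# Calling through the instance binds it to self automatically ...
via_instance = loader.load()
# ... which is equivalent to passing the instance explicitly.
via_class = LoaderCSV.load(loader)

try:
    # Here the string itself becomes self, so self.file fails.
    LoaderCSV.load("input.csv")
except AttributeError as error:
    message = str(error)

print(via_instance, via_class)
print(message)
```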
You need to create the LoaderCSV object first, then call the load method on that object.
loader = LoaderCSV(input_file)
data = loader.load()
The way to use the instance method load is to make an instance of your class, and then call the method on that instance. Like this:
myloader = LoaderCSV(input_file)
data = myloader.load()
or succinctly:
data = LoaderCSV(input_file).load()
I've a class in python that contains a static method. I want to mock.patch it in order to see if it was called. When trying to do it I get an error:
AttributeError: path.to.A does not have the attribute 'foo'
My setup can be simplified to:
class A:
    @staticmethod
    def foo():
        bla bla
Now the test code that fails with error:
def test():
    with mock.patch.object("A", "foo") as mock_helper:
        mock_helper.return_value = ""
        A.some_other_static_function_that_could_call_foo()
        assert mock_helper.call_count == 1
You can always use patch as a decorator, my preferred way of patching things:
from mock import patch
@patch('absolute.path.to.class.A.foo')
def test(mock_foo):
    mock_foo.return_value = ''
    # ... continue with test here
EDIT: Your error seems to hint that you have a problem elsewhere in your code. Possibly some signal or trigger that requires this method that is failing?
I was getting that same error message when trying to patch a method using the @patch decorator.
Here is the full error I got.
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/tornado/testing.py", line 136, in __call__
result = self.orig_method(*args, **kwargs)
File "/usr/local/lib/python3.6/unittest/mock.py", line 1171, in patched
arg = patching.__enter__()
File "/usr/local/lib/python3.6/unittest/mock.py", line 1243, in __enter__
original, local = self.get_original()
File "/usr/local/lib/python3.6/unittest/mock.py", line 1217, in get_original
"%s does not have the attribute %r" % (target, name)
AttributeError: <module 'py-repo.models.Device' from
'/usr/share/projects/py-repo/models/Device.py'> does not have the attribute 'get_device_from_db'
What I ended up doing to fix this was changing the patch decorator I used
from
@patch('py-repo.models.Device.get_device_from_db')
to
@patch.object(DeviceModel, 'get_device_from_db')
I really wish I could explain further why that was the issue but I'm still pretty new to Python myself. The patch documentation was especially helpful in figuring out what was available to work with. Important: I should note that get_device_from_db uses the @staticmethod decorator which may be changing things. Hope it helps though.
What worked for me:
@patch.object(RedisXComBackend, '_handle_conn')
def test_xcoms(self, mock_method: MagicMock):
    mock_method.return_value = fakeredis.FakeStrictRedis()
'_handle_conn' (static function) looks like this:
@staticmethod
def _handle_conn():
    redis_hook = RedisHook()
    conn: Redis = redis_hook.get_conn()