Unpickling "None" object in Python - python

I am using redis to try to save a request's session object. Based on how to store a complex object in redis (using redis-py), I have:
import pickle
from redis import Redis

def get_object_redis(key, r):
    saved = r.get(key)
    obj = pickle.loads(saved)
    return obj

redis = Redis()
s = get_object_redis('saved', redis)
I have situations where there is no saved session and 'saved' evaluates to None. In this case I get:
TypeError: must be string or buffer, not None
What's the best way to deal with this?

There are several ways to deal with it. This is what they would have in common:
def get_object_redis(key, r):
    saved = r.get(key)
    if saved is None:
        # maybe add code here
        return ...  # return something you expect
    obj = pickle.loads(saved)
    return obj
You need to make it clear what you expect if a key is not found.
Version 1
An example would be to just return None:
def get_object_redis(key, r):
    saved = r.get(key)
    if saved is None:
        return None
    obj = pickle.loads(saved)
    return obj

redis = Redis()
s = get_object_redis('saved', redis)
s is then None. This may be bad because you need to handle that somewhere, and you cannot tell whether the key was not found or whether it was found and the stored value really was None.
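If you still want to return something, one way to tell the two cases apart (my own sketch, not part of the original answer) is a module-level sentinel instead of None:

_MISSING = object()  # unique sentinel: cannot collide with a real stored value

def get_object_redis(key, r):
    saved = r.get(key)
    if saved is None:
        return _MISSING
    return pickle.loads(saved)

s = get_object_redis('saved', redis)
if s is _MISSING:
    ...  # handle the absent session explicitly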
Version 2
You create an object, maybe based on the key, that you can construct because you know what lies behind a key.
class KeyWasNotFound(object):
    # just an example class
    # maybe you have something useful in mind
    def __init__(self, key):
        self.key = key

def get_object_redis(key, r):
    saved = r.get(key)
    if saved is None:
        return KeyWasNotFound(key)
    obj = pickle.loads(saved)
    return obj
Usually, if identity is important, you would store the object after you created it, to return the same object for the key.
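A minimal sketch of that caching idea (my own addition, assuming a module-level cache is acceptable):

_not_found_cache = {}

def get_object_redis(key, r):
    saved = r.get(key)
    if saved is None:
        # return the same placeholder object for repeated misses on this key
        if key not in _not_found_cache:
            _not_found_cache[key] = KeyWasNotFound(key)
        return _not_found_cache[key]
    return pickle.loads(saved)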
Version 3
TypeError is a very generic error. You can create your own error class instead. This would be my preferred way, because I do not like version 1 and I do not know which object would be useful to return.
class NoRedisObjectFoundForKey(KeyError):
    pass

def get_object_redis(key, r):
    saved = r.get(key)
    if saved is None:
        raise NoRedisObjectFoundForKey(key)
    obj = pickle.loads(saved)
    return obj
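Callers then decide what a missing key means, for example (a usage sketch, assuming an empty session is an acceptable default):

redis = Redis()
try:
    s = get_object_redis('saved', redis)
except NoRedisObjectFoundForKey:
    s = {}  # no saved session yet, start with an empty one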

Related

Is it really impossible to unpickle a Python class if the original python file has been deleted?

Suppose you have the following:
file = 'hey.py'

class hey:
    def __init__(self):
        self.you = 1

ins = hey()
temp = open("cool_class", "wb")
pickle.dump(ins, temp)
temp.close()
Now suppose you delete the file hey.py and you run the following code:
pkl_file = open("cool_class", 'rb')
obj = pickle.load(pkl_file)
pkl_file.close()
You'll get an error. I get that it's probably the case that if you don't have the file hey.py with the class and its attributes at the top level, then you can't open the class with pickle. But it has to be the case that I can find out what the attributes of the serialized class are, so that I can reconstruct the deleted file and open the class. I have pickles that are 2 years old, I have deleted the file that I used to construct them, and I just have to find out what the attributes of those classes are so that I can reopen these pickles.
#####UPDATE
I know from the error messages the name of the module that originally contained the old class; let's just call it 'hey.py'. And I know the name of the class; let's call it 'you'. But even after recreating the module and building a class called 'you', I still can't get the pickle to open. So I wrote this code in the hey.py module:
class hey:
    def __init__(self):
        self.hey = 1

    def __setstate__(self):
        self.__dict__ = ''
        self.you = 1
But I get the error message: TypeError: __init__() takes 1 positional argument but 2 were given
#########UPDATE 2:
I changed the code from
class hey:
to
class hey():
I then got an AttributeError but it doesn't tell me what attribute is missing. I then performed
obj= pickletools.dis(file)
And got an error on the pickletools.py file here
def _genops(data, yield_end_pos=False):
    if isinstance(data, bytes_types):
        data = io.BytesIO(data)

    if hasattr(data, "tell"):
        getpos = data.tell
    else:
        getpos = lambda: None

    while True:
        pos = getpos()
        code = data.read(1)
        opcode = code2op.get(code.decode("latin-1"))
        if opcode is None:
            if code == b"":
                raise ValueError("pickle exhausted before seeing STOP")
            else:
                raise ValueError("at position %s, opcode %r unknown" % (
                                 "<unknown>" if pos is None else pos,
                                 code))
        if opcode.arg is None:
            arg = None
        else:
            arg = opcode.arg.reader(data)
        if yield_end_pos:
            yield opcode, arg, pos, getpos()
        else:
            yield opcode, arg, pos
        if code == b'.':
            assert opcode.name == 'STOP'
            break
At this line:
code = data.read(1)
saying: AttributeError: 'str' object has no attribute 'read'
I will now try the other methods in the pickletools
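For what it's worth, the AttributeError above comes from passing a file path (a str) rather than the pickle data itself; pickletools.dis expects bytes or an open binary file. A sketch of the call that should work (assuming 'cool_class' is the pickle file):

import pickletools

with open("cool_class", "rb") as f:
    data = f.read()      # the raw pickle bytes, not a path string
pickletools.dis(data)    # prints the opcode stream, including the module and class names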
########### UPDATE 3
I wanted to see what happened when I saved an object composed mostly of a dictionary, where some of the values in the dictionary were classes. Here is the class in question:
class fss(frozenset):
    def __init__(self, *args, **kwargs):
        super(frozenset, self).__init__()

    def __str__(self):
        str1 = lbr + "{}" + rbr
        return str1.format(','.join(str(x) for x in self))
Now keep in mind that the object pickled is mostly a dictionary and that class exists within the dictionary. After performing
obj= pickletools.genops(file)
I get the following output:
[screenshots of the pickletools opcode output omitted]
I don't see how I would be able to construct the class referred to with that data if I hadn't known what the class was.
############### UPDATE #4
@AKX Thanks for helping me out. I am able to see how your code works, but my pickled file was saved 2 years ago and its module and class have long since been deleted, and I cannot open it into a bytes-like object, which to me seems to be a necessity.
So the path of the file is
file = 'hey.pkl'
pkl_file = open(file, 'rb')
x = MagicUnpickler(io.BytesIO(pkl_file)).load()
This returns the error:
TypeError: a bytes-like object is required, not '_io.BufferedReader'
But I thought the object was a bytes object since I opened it with open(file, 'rb')
############ UPDATE #5
Actually, I think with AKX's help I've solved the problem.
So using the code:
pkl_file = open(name, 'rb')
x = MagicUnpickler(pkl_file).load()
I then created two blank modules which once contained the classes found in the saved pickle, but I did not have to put the classes in them. I was getting an error in the file pickle.py, here:
def load_reduce(self):
    stack = self.stack
    args = stack.pop()
    func = stack[-1]
    try:
        stack[-1] = func(*args)
    except TypeError:
        pass

dispatch[REDUCE[0]] = load_reduce
So after catching that error, everything worked. I really want to thank AKX for helping me out. I have actually been trying to solve this problem for about 5 years, because I use pickles far more often than most programmers. I used to not understand that altering a class ruins any pickled files saved with that class, so I ran into this problem again and again. And now that I'm going back over some code which is 2 years old, and it looks like some of the files were deleted, I'm going to need this code a lot in the future. So I really appreciate your help in getting this problem solved.
Well, with a bit of hacking and magic, sure, you can hydrate missing classes, but I'm not guaranteeing this will work for all pickle data you may encounter; for one, this doesn't touch the __setstate__/__reduce__ protocols, so I don't know if they work.
Given a script file (so72863050.py in my case):
import io
import pickle
import types
from logging import Formatter

# Create a couple empty classes. Could've just used `class C1`,
# but we're coming back to this syntax later.
C1 = type('C1', (), {})
C2 = type('C2', (), {})

# Create an instance or two, add some data...
inst = C1()
inst.child1 = C2()
inst.child1.magic = 42
inst.child2 = C2()
inst.child2.mystery = 'spooky'
inst.child2.log_formatter = Formatter('heyyyy %(message)s')  # To prove we can unpickle regular classes still
inst.other_data = 'hello'
inst.some_dict = {'a': 1, 'b': 2}

# Pickle the data!
pickle_bytes = pickle.dumps(inst)

# Let's erase our memory of these two classes:
del C1
del C2

try:
    print(pickle.loads(pickle_bytes))
except Exception as exc:
    pass  # Can't get attribute 'C1' on <module '__main__'> – yep, it certainly isn't there!
we now have successfully created some pickle data that we can't load anymore, since we forgot about those two classes. Now, since the unpickling mechanism is customizable, we can derive a magic unpickler, that in the face of certain defeat (or at least an AttributeError), synthesizes a simple class from thin air:
# Could derive from Unpickler, but that may be a C class, so our tracebacks would be less helpful
class MagicUnpickler(pickle._Unpickler):
    def __init__(self, fp):
        super().__init__(fp)
        self._magic_classes = {}

    def find_class(self, module, name):
        try:
            return super().find_class(module, name)
        except AttributeError:
            return self._create_magic_class(module, name)

    def _create_magic_class(self, module, name):
        cache_key = (module, name)
        if cache_key not in self._magic_classes:
            cls = type(f'<<Emulated Class {module}:{name}>>', (types.SimpleNamespace,), {})
            self._magic_classes[cache_key] = cls
        return self._magic_classes[cache_key]
Now, when we run that magic unpickler against a stream from the aforebuilt pickle_bytes that plain ol' pickle.loads() couldn't load...
x = MagicUnpickler(io.BytesIO(pickle_bytes)).load()
print(x)
print(x.child1.magic)
print(x.child2.mystery)
print(x.child2.log_formatter._style._fmt)
prints out
<<Emulated Class __main__:C1>>(child1=<<Emulated Class __main__:C2>>(magic=42), child2=<<Emulated Class __main__:C2>>(mystery='spooky'), other_data='hello', some_dict={'a': 1, 'b': 2})
42
spooky
heyyyy %(message)s
Hey, magic!
The error in function load_reduce(self) can be re-created by:
class Y(set):
    pass

pickle_bytes = io.BytesIO(pickle.dumps(Y([2, 3, 4, 5])))
del Y
print(MagicUnpickler(pickle_bytes).load())
AKX's answer does not solve cases where the class inherits from built-in base classes such as set, dict, or list.
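A rough workaround for that case (my own sketch, not from AKX's answer): when the lost class inherited from a builtin, the REDUCE opcode calls the emulated class with positional arguments, which SimpleNamespace rejects, so the emulated class can simply accept and keep them:

import types

class MagicUnpicklerWithArgs(MagicUnpickler):
    def _create_magic_class(self, module, name):
        cache_key = (module, name)
        if cache_key not in self._magic_classes:
            def __init__(self, *args, **kwargs):
                types.SimpleNamespace.__init__(self, **kwargs)
                if args:
                    # e.g. the contents of the original set/list/dict
                    self.reduce_args = args
            cls = type(f'<<Emulated Class {module}:{name}>>',
                       (types.SimpleNamespace,), {'__init__': __init__})
            self._magic_classes[cache_key] = cls
        return self._magic_classes[cache_key]

With that change, the Y(set) example above should load into an emulated object whose reduce_args holds the original elements instead of raising the TypeError.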

How to check lookup depth into a nested dictionary used as a class attribute?

I created a nested dictionary based on the AttrDict found here:
Object-like attribute access for nested dictionary
I modified it to contain str commands in the "leaves" that get executed when the value is read from or written to:
commands = {'root': {'com': {'read': 'READ_CMD', 'write': 'WRITE_CMD'} } }

class AttrTest:
    def __init__(self):
        self.__dict__['attr'] = AttrDict(commands)

test = AttrTest()
data = test.attr.root.com.read   # data = value read with the command
test.attr.root.com.write = data  # data = value written on the com port
While it works beautifully, I'd like to:
Avoid people getting access to attr/root/com, as these return a sub-level dictionary
Have people access attr.root.com directly (through __getattribute__/__setattr__)
Currently, I'm facing the following problems:
As said, when accessing the 'trunk' of the nested dict, I get a partial dict of the 'leaves'
When accessing attr.root.com it returns {'read': 'READ_CMD', 'write': 'WRITE_CMD'}
If I detect a read, I do a forward lookup and return the value, but then attr.root.com.read fails
Is it possible to know what the final level is that Python will request in the "path"?
To block access to attr/root
To read/write the value accessing attr.root.com directly (using forward lookup)
To return the needed partial dict only if attr.root.com.read or attr.root.com.write are requested
Currently I've found nothing that allows me to control how deep the lookup is expected to go.
Thanks for your consideration.
For a given attribute lookup you cannot determine how many others will follow; this is how Python works. In order to resolve x.y.z, first the object x.y needs to be retrieved before the subsequent attribute lookup (x.y).z can be performed.
What you can do however, is return a proxy object that represents the (partial) path instead of the actual underlying object which is stored in the dict. So for example if you did test.attr.com then this would return a proxy object which represents the path attr.com to-be-looked up on the test object. Only when you encounter a read or write leaf in the path, you would resolve the path and read/write the data.
The following is a sample implementation which uses an AttrDict based on __getattr__ to provide the Proxy objects (so you don't have to intercept __getattribute__):
from functools import reduce

class AttrDict(dict):
    def __getattr__(self, name):
        return Proxy(self, (name,))

    def _resolve(self, path):
        return reduce(lambda d, k: d[k], path, self)

class Proxy:
    def __init__(self, obj, path):
        object.__setattr__(self, '_obj', obj)
        object.__setattr__(self, '_path', path)

    def __str__(self):
        return f"Path<{'.'.join(self._path)}>"

    def __getattr__(self, name):
        if name == 'read':
            return self._obj._resolve(self._path)[name]
        else:
            return type(self)(self._obj, (*self._path, name))

    def __setattr__(self, name, value):
        if name != 'write' or name not in (_dict := self._obj._resolve(self._path)):
            raise AttributeError(f'Cannot set attribute {name!r} for {self}')
        _dict[name] = value

commands = {'root': {'com': {'read': 'READ_CMD', 'write': 'WRITE_CMD'} } }
test = AttrDict({'attr': commands})

print(f'{test.attr = !s}')                # Path<attr>
print(f'{test.attr.root = !s}')           # Path<attr.root>
print(f'{test.attr.root.com = !s}')       # Path<attr.root.com>
print(f'{test.attr.root.com.read = !s}')  # READ_CMD

test.attr.root.com.write = 'test'
test.attr.root.write = 'illegal'  # raises AttributeError

Python Pickle not saving entire object

I'm trying to pickle out a list of objects where the objects contain a list. When I open the pickled file I can see all the data in my objects except for the list. I'm putting code below so this makes more sense.
Object that contains a list.
class TestPickle:
    testNumber = None
    testList = []

    def addNumber(self, value):
        self.testNumber = value

    def getNumber(self):
        return self.testNumber

    def addTestList(self, value):
        self.testList.append(value)

    def getTestList(self):
        return self.testList
In this example I create a list of the above objects (I'm adding one object to keep it brief):
testPKL = TestPickle()
testList = []
testPKL.addNumber(12)
testPKL.addTestList(1)
testPKL.addTestList(2)
testList.append(testPKL)
with open(os.path.join(os.path.curdir, 'test.pkl'), 'wb') as f:
    pickle.dump(testList, f)
Here is an example of me opening the pickled file and trying to access the data. I can only retrieve testNumber from above; testList comes back as an empty list.
pklResult = None
with open(os.path.join(os.path.curdir, 'test.pkl'), 'rb') as f:
    pklResult = pickle.load(f)

for result in pklResult:
    print result.getNumber()    # returns 12
    print result.testNumber     # returns 12
    print result.getTestList()  # returns []
    print result.testList       # returns []
I think I'm missing something obvious here, but I'm not having any luck spotting it. Thanks for any guidance.
testNumber and testList both start out as class attributes. testNumber is of an immutable type, so addNumber assigns to self.testNumber and that assignment creates a new instance attribute. testList is of a mutable type and is only modified in place (append), so no instance attribute is ever created and the data stays on the shared class attribute.
You can verify it -
print testPKL.__dict__
{'testNumber': 12}
print result.__dict__
{'testNumber': 12}
So when you access result.testList, it looks for class attribute TestPickle.testList, which is [] in your case.
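A quick demonstration of the mechanism (my own sketch): pickle saves the instance's __dict__, and the appended data never ends up there.

class Shared:
    items = []                # class attribute, shared by every instance

a = Shared()
a.items.append(1)             # mutates the class attribute in place
print(Shared.items)           # [1]  -- the data lives on the class
print(a.__dict__)             # {}   -- nothing here for pickle to save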
Solution
You are storing instances in the pickle, so use instance attributes. Modify the TestPickle class as below:
class TestPickle:
    def __init__(self):
        self.testNumber = None
        self.testList = []

    def addNumber(self, value):
        self.testNumber = value

    def getNumber(self):
        return self.testNumber

    def addTestList(self, value):
        self.testList.append(value)

    def getTestList(self):
        return self.testList
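With instance attributes, the data survives the round trip; a quick check (a sketch re-using the original steps):

testPKL = TestPickle()
testPKL.addNumber(12)
testPKL.addTestList(1)
testPKL.addTestList(2)
print(testPKL.__dict__)  # {'testNumber': 12, 'testList': [1, 2]} -- both now saved by pickle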

GAE converting dictionary to NDB datastore entity

I would like to ask for some guidance on a small task that I am trying to solve.
I am experimenting with a small app that uses JSON data to save entities.
I know that you can easily convert a dict to an entity by just creating the model but, I am trying to build a more generic approach that would convert any dict to an entity.
My steps are:
Get the dict.
Validate that the dict keys correspond to the entity model's property definitions by reading the cls.__dict__ of the model.
Try to unpack the validated properties in the model class constructor (create the model instance).
return it.
So far I am OK, but my lack of Python knowledge is either constraining me or confusing me.
Maybe I am also forgetting, or unaware of, a simpler way to do it.
So here is it:
@classmethod
def entity_from_dict(cls, parent_key, dict):
    valid_properties = {}
    logging.info(cls.__dict__)
    for property, value in dict.iteritems():
        if property in cls.__dict__:  # should not iterate over functions, classmethods, and @property
            logging.info(cls.__dict__[property])        # this outputs eg: StringProperty('title', required=True)
            logging.info(type(cls.__dict__[property]))  # this is more interesting <class 'google.appengine.ext.ndb.model.StringProperty'>
            valid_properties.update({property: value})

    # Update the id from the dict
    if 'id' in dict:  # if not creating a new entity
        valid_properties['id'] = dict['id']

    # Add the parent
    valid_properties['parent'] = parent_key

    #logging.info(valid_properties)
    try:
        entity = cls(**valid_properties)
    except Exception as e:
        logging.exception('Could not create entity \n' + repr(e))
        return False

    return entity
My problem is that I want to validate only ndb.Property attributes and not @classmethod or @property ones as well, because those cause a conflict.
I am also using expando classes, so any property in the dict that is extra gets stored.
How can I check against these specific types?
Solved it as @Tim Hoffman proposed, using the ._properties of the NDB model.
A thing I didn't know is that via ._properties I could get the model definition's properties; I thought it would only return the instance properties :-).
Also I did not use populate, because I find that it does the same as passing the validated dict unpacked into the model's constructor ;-)
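For the record, the two forms I compared look roughly like this (a sketch with a hypothetical MyModel; note that parent and id still have to go through the constructor):

# constructor with the validated dict unpacked
entity = MyModel(parent=parent_key, **valid_properties)

# or: construct first, then populate() the property values
entity = MyModel(parent=parent_key)
entity.populate(**valid_properties)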
So here it is:
@classmethod
def entity_from_dict(cls, parent_key, data_dict):
    valid_properties = {}
    for cls_property in cls._properties:
        if cls_property in data_dict:
            valid_properties.update({cls_property: data_dict[cls_property]})

    #logging.info(valid_properties)
    # Update the id from the data_dict
    if 'id' in data_dict:  # if creating a new entity
        valid_properties['id'] = data_dict['id']

    # Add the parent
    valid_properties['parent'] = parent_key

    try:
        entity = cls(**valid_properties)
    except Exception as e:
        logging.exception('Could not create entity \n' + repr(e))
        return False

    return entity
Python's JSON dump, which we use when converting models to JSON for export, turns non-string values into strings. Therefore Jimmy Kane's method throws an error due to model incompatibility. To avoid this problem I updated his method and added a function named prop_literal, just for converting non-string values encapsulated in strings back into their literal types.
I also added entity.put() to save the entity to the datastore, because that was the aim :)
def prop_literal(prop_type, prop_val):
    """
    Convert a non-string value encapsulated in a string into its literal type.
    """
    if "Integer" in prop_type:
        return int(prop_val)
    elif "Float" in prop_type:
        return float(prop_val)
    elif "DateTime" in prop_type:
        # skip it; it is locale-dependent anyway
        return None
    elif ("String" in prop_type) or ("Text" in prop_type):
        return prop_val
    elif "Bool" in prop_type:
        return True if prop_val == True else False
    else:
        return prop_val
def entity_from_dict(cls, parent_key, data_dict):
    valid_properties = {}
    for cls_property in cls._properties:
        if cls_property in data_dict:
            prop_type = str(cls._properties[cls_property])
            # logging.info(prop_type)
            real_val = prop_literal(prop_type, data_dict[cls_property])
            try:
                valid_properties.update({cls_property: real_val})
            except Exception as ex:
                # logging.info("Error while transferring data: " + str(ex))
                pass
        else:
            # logging.info("prop skipped")
            pass

    #logging.info(valid_properties)
    # Update the id from the data_dict
    if 'id' in data_dict:  # if creating a new entity
        valid_properties['id'] = data_dict['id']

    # Add the parent
    valid_properties['parent'] = parent_key

    try:
        entity = cls(**valid_properties)
        logging.info(entity)
        entity.put()
    except Exception as e:
        logging.exception('Could not create entity \n' + repr(e))
        return False

    return entity
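A small usage sketch of prop_literal on its own (the property reprs are assumptions based on the logging output shown earlier, e.g. StringProperty('title', required=True)):

print(prop_literal("IntegerProperty('count')", "42"))    # -> 42 (int)
print(prop_literal("FloatProperty('ratio')", "0.5"))     # -> 0.5 (float)
print(prop_literal("StringProperty('title')", "hello"))  # -> 'hello'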

Python / YAML: How to initialize additional objects not just from the YAML file, within loadConfig?

I have what I think is a small misconception with loading some YAML objects. I defined the class below.
What I want to do is load some objects with the overridden loadConfig function for YAMLObjects. Some of these come from my .yaml file, but others should be built out of objects loaded from the YAML file.
For instance, in the class below, I load a member object named "keep" which is a string naming some items to keep in the region. But I want to also parse this into a list and have the list stored as a member object too. And I don't want the user to have to give both the string and list version of this parameter in the YAML.
My current workaround has been to override the __getattr__ function inside Region and make it create the defaults if a lookup doesn't find them. But this is clunky and more complicated than needed for just initializing objects.
What convention am I misunderstanding here? Why doesn't the loadConfig method create the additional attributes that are not found in the YAML?
import yaml, pdb

class Region(yaml.YAMLObject):
    yaml_tag = u'!Region'

    def __init__(self, name, keep, drop):
        self.name = name
        self.keep = keep
        self.drop = drop
        self.keep_list = self.keep.split("+")
        self.drop_list = self.drop.split("+")
        self.pattern = "+".join(self.keep_list) + "-" + "-".join(self.drop_list)
    ###

    def loadConfig(self, yamlConfig):
        yml = yaml.load_all(file(yamlConfig))
        for data in yml:
            # These get created fine
            self.name = data["name"]
            self.keep = data["keep"]
            self.drop = data["drop"]

            # These do not get created.
            self.keep_list = self.keep.split("+")
            self.drop_list = self.drop.split("+")
            self.pattern = "+".join(self.keep_list) + "-" + "-".join(self.drop_list)
    ###
### End Region

if __name__ == "__main__":
    my_yaml = "/home/path/to/test.yaml"
    region_iterator = yaml.load_all(file(my_yaml))

    # Set a debug breakpoint to play with region_iterator and
    # confirm the extra stuff isn't created.
    pdb.set_trace()
And here is test.yaml so you can run all of this and see what I mean:
Regions:
  # Note: the string conventions below are for an
  # existing system. This is a shortened, representative
  # example.
  Market1:
    !Region
    name: USAndGB
    keep: US+GB
    drop: !!null
  Market2:
    !Region
    name: CanadaAndAustralia
    keep: CA+AU
    drop: !!null
And here, for example, is what it looks like for me when I run this in an IPython shell and explore the loaded object:
In [57]: %run "/home/espears/testWorkspace/testRegions.py"
--Return--
> /home/espears/testWorkspace/testRegions.py(38)<module>()->None
-> pdb.set_trace()
(Pdb) region_iterator
<generator object load_all at 0x1139d820>
(Pdb) tmp = region_iterator.next()
(Pdb) tmp
{'Regions': {'Market2': <__main__.Region object at 0x1f858550>, 'Market1': <__main__.Region object at 0x11a91e50>}}
(Pdb) us = tmp['Regions']['Market1']
(Pdb) us
<__main__.Region object at 0x11a91e50>
(Pdb) us.name
'USAndGB'
(Pdb) us.keep
'US+GB'
(Pdb) us.keep_list
*** AttributeError: 'Region' object has no attribute 'keep_list'
A pattern I have found useful for working with yaml for classes that are basically storage is to have the loader use the constructor so that objects are created in the same way as when you make them normally. If I understand what you are attempting to do correctly, this kind of structure might be useful:
import inspect
import yaml
import numpy as np
from collections import OrderedDict

class Serializable(yaml.YAMLObject):
    __metaclass__ = yaml.YAMLObjectMetaclass

    @property
    def _dict(self):
        dump_dict = OrderedDict()
        for var in inspect.getargspec(self.__init__).args[1:]:
            if getattr(self, var, None) is not None:
                item = getattr(self, var)
                if isinstance(item, np.ndarray) and item.ndim == 1:
                    item = list(item)
                dump_dict[var] = item
        return dump_dict

    @classmethod
    def to_yaml(cls, dumper, data):
        return ordered_dump(dumper, '!{0}'.format(data.__class__.__name__),
                            data._dict)

    @classmethod
    def from_yaml(cls, loader, node):
        fields = loader.construct_mapping(node, deep=True)
        return cls(**fields)

def ordered_dump(dumper, tag, data):
    value = []
    node = yaml.nodes.MappingNode(tag, value)
    for key, item in data.iteritems():
        node_key = dumper.represent_data(key)
        node_value = dumper.represent_data(item)
        value.append((node_key, node_value))
    return node
You would then want to have your Region class inherit from Serializable, and remove the loadConfig stuff. The code I posted inspects the constructor to see what data to save to the yaml file, and then when loading a yaml file calls the constructor with that same set of data. That way you just have to get the logic right in your constructor and the yaml loading should get it for free.
That code was ripped from one of my projects, apologies in advance if it doesn't quite work. It is also slightly more complicated than it needs to be because I wanted to control the order of output by using OrderedDict. You could replace my ordered_dump function with a call to dumper.represent_dict.
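For the Region class from the question, that could look roughly like this (a sketch under the assumption that all derived attributes are rebuilt in __init__, so from_yaml only needs the raw YAML fields; the splits guard against the null drop values in the sample YAML):

class Region(Serializable):
    yaml_tag = u'!Region'

    def __init__(self, name, keep, drop):
        self.name = name
        self.keep = keep
        self.drop = drop
        # derived attributes are recomputed here, so they never need to appear in the YAML
        self.keep_list = keep.split("+") if keep else []
        self.drop_list = drop.split("+") if drop else []
        self.pattern = "+".join(self.keep_list) + "-" + "-".join(self.drop_list)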
