Python Pickle not saving entire object - python

I'm trying to pickle a list of objects where each object contains a list. When I open the pickled file I can see all the data in my objects except the list. I'm putting code below so this makes more sense.
Object that contains a list.
class TestPickle:
    testNumber = None
    testList = []
    def addNumber(self, value):
        self.testNumber = value
    def getNumber(self):
        return self.testNumber
    def addTestList(self, value):
        self.testList.append(value)
    def getTestList(self):
        return self.testList
In this example I create a list of the above objects (I'm adding only one object to keep it brief).
testPKL = TestPickle()
testList = []
testPKL.addNumber(12)
testPKL.addTestList(1)
testPKL.addTestList(2)
testList.append(testPKL)
with open(os.path.join(os.path.curdir, 'test.pkl'), 'wb') as f:
    pickle.dump(testList, f)
Here is an example of me opening the pickled file and trying to access the data. I can only retrieve the testNumber from above; testList comes back as an empty list.
pklResult = None
with open(os.path.join(os.path.curdir, 'test.pkl'), 'rb') as f:
    pklResult = pickle.load(f)
for result in pklResult:
    print result.getNumber() # returns 12
    print result.testNumber # returns 12
    print result.getTestList() # returns []
    print result.testList # returns []
I think I'm missing something obvious here but I'm not having any luck spotting it. Thanks for any guidance.

testNumber and testList are both class attributes initially. Assigning to self.testNumber rebinds the name, which creates a new instance attribute. testList, on the other hand, is mutable and is modified in place with append, so no instance attribute is created and it remains a class attribute.
You can verify this:
print testPKL.__dict__
{'testNumber': 12}
print result.__dict__
{'testNumber': 12}
So when you access result.testList, it looks for class attribute TestPickle.testList, which is [] in your case.
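A minimal Python 3 sketch of the same effect, using a hypothetical `Shared` class rather than the asker's code. It relies on the fact that, by default, pickle stores an instance's `__dict__` and refers to the class only by name, so data that lives only on the class is not written to the file.

```python
class Shared:
    number = None   # class attribute
    items = []      # class attribute: one list shared by all instances

first = Shared()
first.number = 12        # assignment binds a new instance attribute
first.items.append(1)    # in-place mutation: no instance attribute is created

second = Shared()        # a brand-new instance sees the same list

print(vars(first))                   # {'number': 12}
print(second.items)                  # [1]
print(first.items is Shared.items)   # True
```

Only what shows up in `vars(first)` travels with the pickled instance, which is why the list appears empty in a fresh process.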
Solution
You are pickling an instance, so store the data in instance attributes. Modify the TestPickle class as below:
class TestPickle:
    def __init__(self):
        self.testNumber = None
        self.testList = []
    def addNumber(self, value):
        self.testNumber = value
    def getNumber(self):
        return self.testNumber
    def addTestList(self, value):
        self.testList.append(value)
    def getTestList(self):
        return self.testList
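To check a fix like this, one can confirm that the list now lives in the instance `__dict__` (the part pickle stores by default) and is no longer shared between instances. The sketch below is Python 3 and uses a trimmed-down, hypothetical `Holder` class standing in for `TestPickle`.

```python
class Holder:
    def __init__(self):
        self.number = None
        self.items = []      # a fresh list for every instance

    def add_item(self, value):
        self.items.append(value)

a = Holder()
b = Holder()
a.add_item(1)
a.add_item(2)

print(vars(a))   # {'number': None, 'items': [1, 2]}
print(vars(b))   # {'number': None, 'items': []}
```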

Related

pickle, dill and cloudpickle returning field as empty dict on custom class after process termination

I have an object of a custom class that I am trying to serialize and permanently store.
When I serialize it, store it, load it and use it in the same run, it works fine. It only messes up when I've ended the process and then try to load it again from the pickle file. This is the code that works fine:
first_model = NgramModel(3, name="debug")
for paragraph in text:
    first_model.train(paragraph_to_sentences(text))
# paragraph to sentences just uses regex to do the equivalent of splitting by punctuation
print(first_model.context_options)
# context_options is a dict (counter)
first_model = NgramModel.load_existing_model("debug")
#load_existing_model loads the pickle file. Look in the class code
print(first_model.context_options)
However, when I run this alone, it prints an empty counter:
first_model = NgramModel.load_existing_model("debug")
print(first_model.context_options)
This is a shortened version of the class file (the only two methods that touch the pickle/dill are update_pickle_state and load_existing_model):
import os
import dill
from itertools import count
from collections import Counter
from os import path
class NgramModel:
    context_options: dict[tuple, set[str]] = {}
    ngram_count: Counter[tuple] = Counter()
    n = 0
    pickle_path: str = None
    num_paragraphs = 0
    num_sentences = 0
    def __init__(self, n: int, **kwargs):
        self.n = n
        self.pickle_path = NgramModel.pathify(kwargs.get('name', NgramModel.gen_pickle_name())) #use name if exists else generate random name
    def train(self, paragraph_as_list: list[str]):
        '''really the central method that coordinates everything else. Takes a list of sentences, generates data(n-grams) from each, updates the fields, and saves the instance (self) to a pickle file'''
        self.num_paragraphs += 1
        for sentence in paragraph_as_list:
            self.num_sentences += 1
            generated = self.generate_Ngrams(sentence)
            self.ngram_count.update(generated)
            for ngram in generated:
                self.add_to_set(ngram)
        self.update_pickle_state()
    def update_pickle_state(self):
        '''saves instance to pickle file'''
        file = open(self.pickle_path, "wb")
        dill.dump(self, file)
        file.close()
    @staticmethod
    def load_existing_model(name: str):
        '''returns object from pickle file'''
        path = NgramModel.pathify(name)
        file = open(path, "rb")
        obj: NgramModel = dill.load(file)
        return obj
    def generate_Ngrams(self, string: str):
        '''ref: https://www.analyticsvidhya.com/blog/2021/09/what-are-n-grams-and-how-to-implement-them-in-python/'''
        words = string.split(" ")
        words = ["<start>"] * (self.n - 1) + words + ["<end>"] * (self.n - 1)
        list_of_tup = []
        for i in range(len(words) + 1 - self.n):
            list_of_tup.append((tuple(words[i + j] for j in range(self.n - 1)), words[i + self.n - 1]))
        return list_of_tup
    def add_to_set(self, ngram: tuple[tuple[str, ...], str]):
        if ngram[0] not in self.context_options:
            self.context_options[ngram[0]] = set()
        self.context_options[ngram[0]].add(ngram[1])
    @staticmethod
    def pathify(name):
        '''converts name to path'''
        return f"models/{name}.pickle"
    @staticmethod
    def gen_pickle_name():
        for i in count():
            new_name = f"unnamed-pickle-{i}"
            if not path.exists(NgramModel.pathify(new_name)):
                return new_name
All the other fields print properly and are complete and correct except the two dicts.
The problem is that context_options is a mutable class member, not an instance member. If I had to guess, dill is only pickling instance members, since the class definition holds class members. That would account for why you see a "filled-out" context_options when you're working in the same shell but not when you load fresh: you're using the dirtied class member in the former case.
It's for stuff like this that you generally don't want to use mutable class members (or similarly, mutable default values in function signatures). More typical is to use something like context_options: dict[tuple, set[str]] = None and then check if it's None in the __init__ to set it to a default value, e.g., an empty dict. Alternatively, you could use a @dataclass and provide a field initializer, i.e.:
@dataclasses.dataclass
class NgramModel:
    context_options: dict[tuple, set[str]] = dataclasses.field(default_factory=dict)
    ...
You can observe what I mean about it being a mutable class member with, for instance...
if __name__ == '__main__':
    ng = NgramModel(3, name="debug")
    print(ng.context_options) # {}
    ng.context_options[("foo", "bar")] = {"baz", "qux"}
    print(ng.context_options) # {('foo', 'bar'): {'baz', 'qux'}}
    ng2 = NgramModel(3, name="debug")
    print(ng2.context_options) # {('foo', 'bar'): {'baz', 'qux'}}
I would expect a brand new ng2 to have the same context that the brand new ng had - empty (or whatever an appropriate default is).
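The None-default pattern mentioned above could look roughly like the following. This is only a sketch: `Model` is a hypothetical, cut-down stand-in for `NgramModel`, and passing `context_options` as an optional constructor argument is an assumption made for illustration, not something the original class does.

```python
from collections import Counter
from typing import Optional

class Model:
    def __init__(self, n: int, context_options: Optional[dict] = None):
        self.n = n
        # Build the mutable containers per instance instead of sharing them on the class.
        self.context_options = {} if context_options is None else context_options
        self.ngram_count = Counter()

m1 = Model(3)
m1.context_options[("foo", "bar")] = {"baz"}
m2 = Model(3)

print(m2.context_options)   # {}
print(sorted(vars(m1)))     # ['context_options', 'n', 'ngram_count']
```

Because both containers now appear in `vars(m1)`, they are part of the instance state that pickle and dill serialize by default.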

Unpickling "None" object in Python

I am using redis to try to save a request's session object. Based on how to store a complex object in redis (using redis-py), I have:
def get_object_redis(key,r):
    saved = r.get(key)
    obj = pickle.loads(saved)
    return obj
redis = Redis()
s = get_object_redis('saved',redis)
I have situations where there is no saved session and 'saved' evaluates to None. In this case I get:
TypeError: must be string or buffer, not None
What's the best way to deal with this?
There are several ways to deal with it. This is what they would have in common:
def get_object_redis(key,r):
    saved = r.get(key)
    if saved is None:
        # maybe add code here
        return ... # return something you expect
    obj = pickle.loads(saved)
    return obj
You need to make it clear what you expect if a key is not found.
Version 1
An example would be you just return None:
def get_object_redis(key,r):
    saved = r.get(key)
    if saved is None:
        return None
    obj = pickle.loads(saved)
    return obj
redis = Redis()
s = get_object_redis('saved',redis)
s is then None. This may be bad because you need to handle that somewhere and you do not know whether it was not found or it was found and really None.
Version 2
You create an object, maybe based on the key, that you can construct because you know what lies behind a key.
class KeyWasNotFound(object):
    # just an example class
    # maybe you have something useful in mind
    def __init__(self, key):
        self.key = key
def get_object_redis(key,r):
    saved = r.get(key)
    if saved is None:
        return KeyWasNotFound(key)
    obj = pickle.loads(saved)
    return obj
Usually, if identity is important, you would store the object after you created it, to return the same object for the key.
Version 3
TypeError is a very generic error. You can create your own error class. This would be the preferred way for me, because I do not like version 1 and do not know which object would be useful to return.
class NoRedisObjectFoundForKey(KeyError):
    pass
def get_object_redis(key,r):
    saved = r.get(key)
    if saved is None:
        raise NoRedisObjectFoundForKey(key)
    obj = pickle.loads(saved)
    return obj
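On the calling side, version 3 might be used as in the Python 3 sketch below. The `FakeRedis` stand-in, the `load_session` helper, and the choice of an empty dict as the fallback session are assumptions for illustration only.

```python
import pickle

class NoRedisObjectFoundForKey(KeyError):
    pass

class FakeRedis:
    """Hypothetical in-memory stand-in; only get/set are modelled."""
    def __init__(self):
        self._store = {}

    def set(self, key, value):
        self._store[key] = value

    def get(self, key):
        return self._store.get(key)   # None when the key is absent

def get_object_redis(key, r):
    saved = r.get(key)
    if saved is None:
        raise NoRedisObjectFoundForKey(key)
    return pickle.loads(saved)

def load_session(r):
    try:
        return get_object_redis('saved', r)
    except NoRedisObjectFoundForKey:
        return {}   # assumed fallback: start with an empty session

r = FakeRedis()
empty = load_session(r)
r.set('saved', pickle.dumps({'user': 'alice'}))
restored = load_session(r)

print(empty)      # {}
print(restored)   # {'user': 'alice'}
```

Since the exception subclasses KeyError, existing `except KeyError` handlers would also catch it.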

Printing an object python class

I wrote the following program:
def split_and_add(invoer):
    rij = invoer.split('=')
    rows = []
    for line in rij:
        rows.append(process_row(line))
    return rows
def process_row(line):
    temp_coordinate_row = CoordinatRow()
    rij = line.split()
    for coordinate in rij:
        coor = process_coordinate(coordinate)
        temp_coordinate_row.add_coordinaterow(coor)
    return temp_coordinate_row
def process_coordinate(coordinate):
    cords = coordinate.split(',')
    return Coordinate(int(cords[0]),int(cords[1]))
bestand = file_input()
rows = split_and_add(bestand)
for row in range(0,len(rows)-1):
    rij = rows[row].weave(rows[row+1])
    print rij
With this class:
class CoordinatRow(object):
    def __init__(self):
        self.coordinaterow = []
    def add_coordinaterow(self, coordinate):
        self.coordinaterow.append(coordinate)
    def weave(self,other):
        lijst = []
        for i in range(len(self.coordinaterow)):
            lijst.append(self.coordinaterow[i])
            try:
                lijst.append(other.coordinaterow[i])
            except IndexError:
                pass
        self.coordinaterow = lijst
        return self.coordinaterow
However there is an error in
for row in range(0,len(rows)-1):
    rij = rows[row].weave(rows[row+1])
    print rij
The outcome of the print statement is as follows:
[<Coordinates.Coordinate object at 0x021F5630>, <Coordinates.Coordinate object at 0x021F56D0>]
It seems as if the program doesn't access the actual object and print it. What am I doing wrong here?
This isn't an error. This is exactly what it means for Python to "access the actual object and print it". This is what the default string representation for a class looks like.
If you want to customize the string representation of your class, you do that by defining a __repr__ method. The typical way to do it is to write a method that returns something that looks like a constructor call for your class.
Since you haven't shown us the definition of Coordinate, I'll make some assumptions here:
class Coordinate(object):
    def __init__(self, x, y):
        self.x, self.y = x, y
    # your other existing methods
    def __repr__(self):
        return '{}({}, {})'.format(type(self).__name__, self.x, self.y)
If you don't define this yourself, you end up inheriting __repr__ from object, which looks something like:
return '<{} object at {:#010x}>'.format(type(self).__qualname__, id(self))
Sometimes you also want a more human-readable version of your objects. In that case, you also want to define a __str__ method:
    def __str__(self):
        return '<{}, {}>'.format(self.x, self.y)
Now:
>>> c = Coordinate(1, 2)
>>> c
Coordinate(1, 2)
>>> print(c)
<1, 2>
But notice that the __str__ of a list calls __repr__ on all of its members:
>>> cs = [c]
>>> print(cs)
[Coordinate(1, 2)]
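If the human-readable form is wanted for every element of a list, one option is to call str on each element and join the results. A small sketch, reusing the assumed `Coordinate` definition from above:

```python
class Coordinate(object):
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        return '{}({}, {})'.format(type(self).__name__, self.x, self.y)

    def __str__(self):
        return '<{}, {}>'.format(self.x, self.y)

cs = [Coordinate(1, 2), Coordinate(3, 4)]

as_repr = str(cs)                        # the list uses __repr__ for its members
as_str = ', '.join(str(c) for c in cs)   # explicit str() on each member

print(as_repr)   # [Coordinate(1, 2), Coordinate(3, 4)]
print(as_str)    # <1, 2>, <3, 4>
```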

what's the reason of changes of copy function in UserDict.py

copy function defined by this:
def copy(self):
    if self.__class__ is UserDict:
        return UserDict(self.data.copy())
    import copy
    data = self.data
    # why use try? use return copy.copy(self) instead
    try:
        self.data = {}
        c = copy.copy(self)
    finally:
        self.data = data
    c.update(self)
    return c
Why is try-finally used here? Is self.data cleared first? What exception could be raised here?
If you ignore the try / finally, the code is:
data = self.data
self.data = {}
c = copy.copy(self)
self.data = data
c.update(self)
Note the self.data = {} line. For some reason, the person who wrote this code felt that the copy would work better if self.data was set to an empty dictionary before calling copy.copy(), and then the actual data was copied over using update().
The point of the finally is to ensure that self.data is restored to its original value, no matter what happens in copy.copy().
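The restore-on-any-outcome behaviour can be seen in isolation with a small Python 3 sketch. The `Box` class, the `with_empty_data` helper, and the deliberately failing action are hypothetical and only mirror the shape of the `copy` method above.

```python
class Box:
    def __init__(self, data):
        self.data = data

def with_empty_data(box, action):
    saved = box.data
    try:
        box.data = {}          # temporarily swap in an empty dict
        return action(box)
    finally:
        box.data = saved       # runs on normal return and on exception alike

def failing_action(box):
    raise RuntimeError('copy failed')

box = Box({'a': 1})
try:
    with_empty_data(box, failing_action)
except RuntimeError:
    pass

print(box.data)   # {'a': 1}
```

Without the finally clause, an exception inside the action would leave the object holding the empty dict.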

List callbacks?

Is there any way to make a list call a function every time the list is modified?
For example:
>>> l = [1, 2, 3]
>>> def callback():
...     print "list changed"
>>> apply_callback(l, callback) # Possible?
>>> l.append(4)
list changed
>>> l[0] = 5
list changed
>>> l.pop(0)
list changed
5
Borrowing from the suggestion by @sr2222, here's my attempt. (I'll use a decorator without the syntactic sugar):
import sys
_pyversion = sys.version_info[0]
def callback_method(func):
    def notify(self,*args,**kwargs):
        for _,callback in self._callbacks:
            callback()
        return func(self,*args,**kwargs)
    return notify
class NotifyList(list):
    extend = callback_method(list.extend)
    append = callback_method(list.append)
    remove = callback_method(list.remove)
    pop = callback_method(list.pop)
    __delitem__ = callback_method(list.__delitem__)
    __setitem__ = callback_method(list.__setitem__)
    __iadd__ = callback_method(list.__iadd__)
    __imul__ = callback_method(list.__imul__)
    #Take care to return a new NotifyList if we slice it.
    if _pyversion < 3:
        __setslice__ = callback_method(list.__setslice__)
        __delslice__ = callback_method(list.__delslice__)
        def __getslice__(self,*args):
            return self.__class__(list.__getslice__(self,*args))
    def __getitem__(self,item):
        if isinstance(item,slice):
            return self.__class__(list.__getitem__(self,item))
        else:
            return list.__getitem__(self,item)
    def __init__(self,*args):
        list.__init__(self,*args)
        self._callbacks = []
        self._callback_cntr = 0
    def register_callback(self,cb):
        self._callbacks.append((self._callback_cntr,cb))
        self._callback_cntr += 1
        return self._callback_cntr - 1
    def unregister_callback(self,cbid):
        for idx,(i,cb) in enumerate(self._callbacks):
            if i == cbid:
                self._callbacks.pop(idx)
                return cb
        else:
            return None
if __name__ == '__main__':
    A = NotifyList(range(10))
    def cb():
        print ("Modify!")
    #register a callback
    cbid = A.register_callback(cb)
    A.append('Foo')
    A += [1,2,3]
    A *= 3
    A[1:2] = [5]
    del A[1:2]
    #Add another callback. They'll be called in order (oldest first)
    def cb2():
        print ("Modify2")
    A.register_callback(cb2)
    print ("-"*80)
    A[5] = 'baz'
    print ("-"*80)
    #unregister the first callback
    A.unregister_callback(cbid)
    A[5] = 'qux'
    print ("-"*80)
    print (A)
    print (type(A[1:3]))
    print (type(A[1:3:2]))
    print (type(A[5]))
The great thing about this is if you realize you forgot to consider a particular method, it's just 1 line of code to add it. (For example, I forgot __iadd__ and __imul__ until just now :)
EDIT
I've updated the code slightly to be py2k and py3k compatible. Additionally, slicing creates a new object of the same type as the parent. Please feel free to continue poking holes in this recipe so I can make it better. This actually seems like a pretty neat thing to have on hand ...
You'd have to subclass list and modify __setitem__.
class NotifyingList(list):
    def __init__(self, *args, **kwargs):
        # initialise the underlying list so initial items are not lost
        super(NotifyingList, self).__init__(*args, **kwargs)
        self.on_change_callbacks = []
    def __setitem__(self, index, value):
        for callback in self.on_change_callbacks:
            callback(self, index, value)
        super(NotifyingList, self).__setitem__(index, value)
notifying_list = NotifyingList()
def print_change(list_, index, value):
    print 'Changing index %d to %s' % (index, value)
notifying_list.on_change_callbacks.append(print_change)
As noted in comments, it's more than just __setitem__.
You might even be better served by building an object that implements the list interface and dynamically adds and removes descriptors to and from itself in place of the normal list machinery. Then you can reduce your callback calls to just the descriptor's __get__, __set__, and __delete__.
I'm almost certain this can't be done with the standard list.
I think the cleanest way would be to write your own class to do this (perhaps inheriting from list).
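As a sketch of the "write your own class" route in Python 3, one could build on `collections.abc.MutableSequence` instead of subclassing list. The class name `CallbackList` and its single `callback` argument are hypothetical. Only three mutating primitives need to be written, because the mixin methods (append, extend, pop, remove, += and so on) are implemented in terms of them.

```python
from collections.abc import MutableSequence

class CallbackList(MutableSequence):
    def __init__(self, iterable=(), callback=None):
        self._items = list(iterable)
        self._callback = callback

    def _notify(self):
        if self._callback is not None:
            self._callback()

    # Read-only operations: no notification.
    def __getitem__(self, index):
        return self._items[index]

    def __len__(self):
        return len(self._items)

    # The three mutation primitives; every mixin mutator goes through these.
    def __setitem__(self, index, value):
        self._items[index] = value
        self._notify()

    def __delitem__(self, index):
        del self._items[index]
        self._notify()

    def insert(self, index, value):
        self._items.insert(index, value)
        self._notify()

    def __repr__(self):
        return 'CallbackList({!r})'.format(self._items)

events = []
l = CallbackList([1, 2, 3], callback=lambda: events.append('list changed'))
l.append(4)
l[0] = 5
popped = l.pop(0)

print(popped)        # 5
print(len(events))   # 3
print(l)             # CallbackList([2, 3, 4])
```

In this sketch the callback fires after each change, and bulk operations such as extend fire it once per element, since they are built from repeated single-item calls. The result is also not a list subclass, so `isinstance(l, list)` is False.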
