Saving and Loading a class instance from file - python

an essential part of my project is being able to save and load class instances to a file. For further context, my class has both a set of attributes as well as a few methods.
So far, I've tried using pickle, but it's not working quite as expected. For starters, it's not loading the methods, nor it's letting me add attributes that I've defined initially; in other words, it's not really making a copy of the class I can work with.
Relevant Code:
class Brick(object):
def __init__(self, name, filename=None, areaMin=None, areaMax=None, kp=None):
self.name = name
self.filename = filename
self.areaMin = areaMin
self.areaMax = areaMax
self.kp = kp
self.__kpsave = None
if filename != None:
self.__logfile = file(filename, 'w')
def __getstate__(self):
f = self.__logfile
self.__kpsave = []
for point in self.kp:
temp = (point.pt, point.size, point.angle, point.response, point.octave, point.class_id)
self.__kpsave.append(temp)
return (self.name, self.areaMin, self.areaMax, self.__kpsave,
f.name, f.tell())
def __setstate__(self, state):
self.value, self.areaMin, self.areaMax, self.__kpsave, name, position = state
f = file(name, 'w')
f.seek(position)
self.__logfile = f
self.filename = name
self.kp = []
for point in self.__kpsave:
temp = cv2.KeyPoint(x=point[0][0], y=point[0][1], _size=point[1], _angle=point[2], _response=point[3],
_octave=point[4], _class_id=point[5])
self.kp.append(temp)
def calculateORB(self, img):
pass #I've omitted the actual method here
(There are a few more attributes and methods, but they're not relevant)
Now, this class definition works just fine when creating new instances: I can make a new Brick with just the name, I can then set areaMin or any other attribute, and I can use pickle(cPickle) to dump the current instance to a file just fine (I'm using those getstate and setstate because pickle won't work with OpenCV's Keypoint elements).
The problem comes, of course, when I do load the instance: using pickle load() I can load the instance from a file, and the values I set previously will be there (ie I can access areaMin just fine if I did set a value for it) but I can't access either methods or add values to any of the other attributes if I never changed their values. I've noticed that I don't need to import my class definition either if I'm simply pickling from a completely different source file.
Since all I want to do is build a "database" of sorts from my class objects, what's the best way to approach this? I know something that should work is to simply write a .Save() method that writes a .py source file where I essentially create an instance of the class, so I can then .Load() which will do exec and eval as appropriate, however, this seems like the worst possible way to do this, so, how should I actually do this?
Thanks.

You should not try to do I/O inside your __getstate__ and __setstate__ methods - those are called by Pickle, and the expted result is just an in-memory object that can be further pickled.
Moreover, if your "Point" class in the "self.kp" attribute is just a regular Python class, there is no need for you to customize pickling at all -
What you have to worry about is to deal with the I/O at the point you call Pickle. If you really need to load different instances independently, you could resort to the "shelve" module, or, better yet, use pickle.dumps and store the resulting string in a DBMS (which can be the built-in sqlite).
All in all:
class Point(object):
...
class Brick(object):
def __init__(self, point, ...):
self.kp = point
Then, to save a single object to a file:
with open("filename.pickle", "wb") as file_:
pickle.dump(my_brick, file_, -1)
and restore with:
my_brick = pickle.load(open("filename.pickle", "rb", -1)
To store several instances and recover all at once, you could just dump then in sequence to the same open file, and them read one by one until you got a fault due to "empty file" - or ou can simply add all objects you want to save to a List, and pickle the whole list at once.
To save and retrieve arbitrary objects that you can retrieve giving some attrbute like "name" or "id" - you can resort to the shelve module: https://docs.python.org/3/library/shelve.html or use a real database if you need complex queries and such. Trying to write your own ad hoc binary format to allow for searching the required instance is an horrible idea - as you'd have to implement all the protocol for that file 0 reading, writting, safeguards, corner cases, and such.

Related

Implement an optional context manager in Python

We have a codebase with the following use pattern:
factory = DataFactory(args)
dataset = factory.download_and_cache_big_dataset(key)
metadata = dataset.get_some_metadata()
Currently, download_and_cache_big_dataset fetches a very large file from S3 and puts it somewhere. Among other things, it does
filename = get_s3_key(key)
filepath = os.path.join(get_tmp_dir(), filename)
s3.download_file(key, filepath)
return BigFileClass(filepath) # gets stored in a class somewhere
However, this file doesn't get deleted. This is fine when this function is called sparingly and relies on file caching, but bad when it is called repeatedly and we don't want to fill up the disk. Is there a way to refactor the code with a context manager such that we can use it as
factory = DataFactory(args)
with factory.download_and_cache_big_dataset(key) as dataset:
metadata = dataset.get_some_metadata()
# do something with metadata
# file gets automatically deleted
But critically, without breaking the existing usage, so that the other code works as is? Or will there need to be a different method that returns the context manager?
Since you return an instance of BigFileClass to handle/represent the data, I would suggest the following.
I'm assuming that the data file is unique to each instance.
Add an instance variable to BigFileClass to keep track of the path of the data file.
Add a __del__ method to BigFileClass in which the data file is removed.
Edit: If you want to use BigFileClass as a contextmanager, define __enter__ and __exit__ methods for BigFileClass. The only thing that __enter__ has to do in this case is basically return self.
I would leave the task of deleting the file to the __del__ method (when the reference count for a BigFileClass reaches 0). It doesn't feel right to have the class instance still around when you have already deleted the data file.
Remark w.r.t. architecture.
The use of a factory seems like an unnecessary complication to me. IMO, download_and_cache_big_dataset could just be a function returning a BigFileClass instance.

Unpickle class instances by interating over a list

I am trying to unpickle various class instances which are saved in separate .pkl files by iterating over a list containing all the class instances (each class instance appends itself to the appropriate list when instantiated).
This works:
# LOAD IN INGREDIENT INSTANCES
for each in il:
with open('Ingredients/{}.pkl'.format(each), 'rb') as f:
globals()[each] = pickle.load(f)
For example, one ingredient is Aubergine:
print(Aubergine)
output:
Name: Aubergine
Price: £1.00
Portion Size: 1
However, this doesn't work:
# LOAD IN RECIPE INSTANCES
for each in rl:
with open('Recipes/{}.pkl'.format(each.name), 'rb') as f:
globals()[each] = pickle.load(f)
I can only assume that the issue stems from each.name being used for the file names of the recipes, whereas each is used for the ingredient file names. This is intentional, however, as the name attribute of the recipes is formatted for the end-user (i.e. contains white space etc.) I think this may be the issue, but I am not sure.
Both the ingredient and recipe classes use:
def __repr__(self):
return self.name
For example:
I have a recipe class instance SausageAubergineRagu, for which self.name is 'Sausage & Aubergine Ragu', and this is inside the list rl. I have tried testing this individually:
input:
rl
output:
[Sausage & Aubergine Ragu]
So I believe that this code:
# LOAD IN RECIPE INSTANCES
for each in rl:
with open('Recipes/{}.pkl'.format(each.name), 'rb') as f:
globals()[each] = pickle.load(f)
...should result in this:
with open('Recipes/Sausage & Aubergine Ragu.pkl', 'rb') as f:
globals()[SausageAubergineRagu] = pickle.load(f)
But attempting to access the recipe class instances results in a NameError.
One final note - please don't ask why I am doing things this way. Instead help me to address and solve the problem, so I can make it work, and understand what is going on. Appreciated :)
The NameError you are getting is Python telling you that you are trying to use a variable that hasn't been defined yet.
You aren't defining SausageAubergineRagu before you use it in this line:
globals()[SausageAubergineRagu] = pickle.load(f)
In your first example, you are adding keys and values to globals. You are using instances of recipes (each) as keys, and the pickled data as values.
In your second example, you are attempting to do the same thing, but instead of using instances of recipes (each) as keys, you are using SausageAubergineRagu, which is undefined.
How is Python supposed to know what SausageAubergineRagu is? If you want that line to work, you will need to define it first, or use something that is already defined, like each, which is what you do in your other snippet.
Honestly, using instances of custom classes as keys in globals seems bizarre to me anyway (usually people use strings), but since you apparently want to make it work, the answer is simple:
Define SausageAubergineRagu before attempting to use it as a key in a dictionary.

How to dynamically create new python class files?

I currently have a CSV file with over 200 entries, where each line needs to be made into its own class file. These classes will be inheriting from a base class with some field variables that it will inherit and set values to based on the CSV file. Additionally, the name of the python module will need to be based off an entry of the CSV file.
I really don't want to manually make over 200 individual python class files, and was wondering if there was a way to do this easily. Thanks!
edit* I'm definitely more of a java/C# coder so I'm not too familiar with python.
Some more details: I'm trying to create an AI for an already existing web game, which I can extract live data from via a live stream text box.
There are over 200 moves that a player can use each turn, and each move is vastly different. I could possibly create new instances of a move class as it's being used, but then I would have to loop through a database of all the moves and its effects each time the move is used, which seems very inefficient. Hence, I was thinking of creating classes of every move with the same name as it would appear in the text box so that I could create new instances of that specific move more quickly.
As others have stated, you usually want to be doing runtime class generation for this kind of thing, rather than creating individual files.
But I thought: what if you had some good reason to do this, like just making class templates for a bunch of files, so that you could go in and expand them later? Say I plan on writing a lot of code, so I'd like to automate the boilerplate code parts, that way I'm not stuck doing tedious work.
Turns out writing a simple templating engine for Python classes isn't that hard. Here's my go at it, which is able to do templating from a csv file.
from os import path
from sys import argv
import csv
INIT = 'def __init__'
def csvformat(csvpath):
""" Read a csv file containing a class name and attrs.
Returns a list of the form [['ClassName', {'attr':'val'}]].
"""
csv_lines = []
with open(csvpath) as f:
reader = csv.reader(f)
_ = [csv_lines.append(line)
for line in reader]
result = []
for line in csv_lines:
attr_dict = {}
attrs = line[1:]
last_attr = attrs[0]
for attr in attrs[1:]:
if last_attr:
attr_dict[last_attr] = attr
last_attr = ''
else:
last_attr = attr
result.append([line[0], attr_dict])
return result
def attr_template(attrs):
""" Format a list of default attribute setting code. """
attr_list = []
for attr, val in attrs.items():
attr_list.append(str.format(' if {} is None:\n', attr, val))
attr_list.append(str.format(' self.{} = {}\n', attr, val))
attr_list.append(' else:\n')
attr_list.append(str.format(' self.{} = {}\n', attr, attr))
return attr_list
def import_template(imports):
""" Import superclasses.
Assumes the .py files are named based on the lowercased class name.
"""
imp_lines = []
for imp in imports:
imp_lines.append(str.format('from {} import {}\n',
imp.lower(), imp))
return imp_lines
def init_template(attrs):
""" Template a series of optional arguments based on a dict of attrs.
"""
init_string = 'self'
for key in attrs:
init_string += str.format(', {}=None', key)
return init_string
def gen_code(foldername, superclass, name, attrs):
""" Generate python code in foldername.
Uses superclass for the superclass, name for the class name,
and attrs as a dict of {attr:val} for the generated class.
Writes to a file with lowercased name as the name of the class.
"""
imports = [superclass]
pathname = path.join(foldername, name.lower() + '.py')
with open(pathname, 'w') as pyfile:
_ = [pyfile.write(imp)
for imp
in import_template(imports)]
pyfile.write('\n')
pyfile.write((str.format('class {}({}):\n', name, superclass)))
pyfile.write((str.format(' {}({}):\n',
INIT, init_template(attrs))))
_ = [pyfile.write(attribute)
for attribute
in attr_template(attrs)]
pyfile.write(' super().__init__()')
def read_and_generate(csvpath, foldername, superclass):
class_info = csvformat(csvpath)
for line in class_info:
gen_code(foldername, superclass, *line)
def main():
read_and_generate(argv[1], argv[2], argv[3])
if __name__ == "__main__":
main()
The above takes a csvfile formatted like this as its first argument (here, saved as a.csv):
Magistrate,foo,42,fizz,'baz'
King,fizz,'baz'
Where the first field is the class name, followed by the attribute name and its default value. The second argument is the path to the output folder.
If I make a folder called classes and create a classes/mysuper.py in it with a basic class structure:
class MySuper():
def __init__(*args, **kwargs):
pass
And then run the code like this:
$ python3 codegen.py a.csv classes MySuper
I get the files classes/magistrate.py with the following contents:
from mysuper import MySuper
class Magistrate(MySuper):
def __init__(self, fizz=None, foo=None):
if fizz is None:
self.fizz = 'baz'
else:
self.fizz = fizz
if foo is None:
self.foo = 42
else:
self.foo = foo
super().__init__()
And classes/king.py:
from mysuper import MySuper
class King(MySuper):
def __init__(self, fizz=None):
if fizz is None:
self.fizz = 'baz'
else:
self.fizz = fizz
super().__init__()
You can actually load them and use them, too!
$ cd classes
classes$ python3 -i magistrate.py
>>> m = Magistrate()
>>> m.foo
42
>>> m.fizz
'baz'
>>>
The above generates Python 3 code, which is what I'm used to, so you will need to make some small changes for it to work in Python 2.
First of all, you don't have to seperate python classes by files - it is more common to group them by functionality to modules and packages (ref to What's the difference between a Python module and a Python package?). Furthermore, 200 similar classes sound like a quite unusual design - are they really needed or could you e.g. use a dict to store some properties?
And of course you can just write a small python script, read in the csv, and generate one ore more .py files containing the classes (lines of text written to the file).
Should be just a few lines of code depending on the level of customization.
If this list changes, you even don't have to write the classes to a file: You can just generate them on the fly.
If you tell us how far you got or more details about the problem, we could help in completing the code...
Instead of generating .py files, read in the csv and do dynamic type creation. This way, if the csv changes, you can be sure that your types are dynamically regenerated.

Python pickle load doesn't work when program is run again

I'm trying to store a class instance that contains one variable using pickle.
class Scm1:
keys = {'b':0, 'i':0, 's':0}
The first thing I do in my program is check if the pickled file is exists. If it does, I attempt to load the data using pickle load. If it doesn't (this only happens the very first time the program is run), I create two instances of this class t1 = Scm1() and t2 = Scm1(). Then, in my program, I modify the entries in the keys field. At the end, I attempt to store the instances to a file. For this, I add the two instances to a dictionary -- tmpDict = {'t1':t1, 't2':t2} and execute a pickle dump using tmpDict as the object. When I load the data using pickle load right after the dump, I get what I expect (the data is set to what it was during the program). However, when I run the program again (this time the file exists) and load the data, all entries in the keys field for the two objects (t1 and t2) are 0. Why is it that when I'm able to get the correct results when I do pickle load prior to my program ending and not again when I rerun the program. I'm new to python and so I'm not sure if I'm expecting pickle to work the right way. Sorry for not being able to paste more code snippets as it's for a school assignment.
class Scm1:
keys = {'b':0, 'i':0, 's':0}
keys is a class variable, so all instances of the class share it. Class variables are not pickled when you store the instances, as they do not belong to the instances.
You should use instance attributes instead:
class Scm1:
def __init__(self):
self.keys = {'b':0, 'i':0, 's':0}

python load from shelve - can I retain the variable name?

I'm teaching myself how to write a basic game in python (text based - not using pygame). (Note: I haven't actually gotten to the "game" part per-se, because I wanted to make sure I have the basic core structure figured out first.)
I'm at the point where I'm trying to figure out how I might implement a save/load scenario so a game session could persist beyond a signle running of the program. I did a bit of searching and everything seems to point to pickling or shelving as the best solutions.
My test scenario is for saving and loading a single instance of a class. Specifically, I have a class called Characters(), and (for testing's sake) a sigle instance of that class assigned to a variable called pc. Instances of the Character class have an attribute called name which is originally set to "DEFAULT", but will be updated based on user input at the initial setup of a new game. For ex:
class Characters(object):
def __init__(self):
self.name = "DEFAULT"
pc = Characters()
pc.name = "Bob"
I also have (or will have) a large number of functions that refer to various instances using the variables they are asigned to. For example, a made up one as a simplified example might be:
def print_name(character):
print character.name
def run():
print_name(pc)
run()
I plan to have a save function that will pack up the pc instance (among other info) with their current info (ex: with the updated name). I also will have a load function that would allow a user to play a saved game instead of starting a new one. From what I read, the load could work something like this:
*assuming info was saved to a file called "save1"
*assuming the pc instance was shelved with "pc" as the key
import shelve
mysave = shelve.open("save1")
pc = mysave["pc"]
My question is, is there a way for the shelve load to "remember" the variable name assotiated with the instance, and automatically do that << pc = mysave["pc"] >> step? Or a way for me to store that variable name as a string (ex as the key) and somehow use that string to create the variable with the correct name (pc)?
I will need to "save" a LOT of instances, and can automate that process with a loop, but I don't know how to automate the unloading to specific variable names. Do I really have to re-asign each one individually and explicitly? I need to asign the instances back to the apropriate variable names bc I have a bunch of core functions that refer to specific instances using variable names (like the example I gave above).
Ideas? Is this possible, or is there an entirely different solution that I'm not seeing?
Thanks!
~ribs
Sure, it's possible to do something like that. Since a shelf itself is like a dictionary, just save all the character instances in a real dictionary instance inside it using their variable's name as the key. For example:
class Character(object):
def __init__(self, name="DEFAULT"):
self.name = name
pc = Character("Bob")
def print_name(character):
print character.name
def run():
print_name(pc)
run()
import shelve
mysave = shelve.open("save1")
# save all Character instances without the default name
mysave["all characters"] = {varname:value for varname,value in
globals().iteritems() if
isinstance(value, Character) and
value.name != "DEFAULT"}
mysave.close()
del pc
mysave = shelve.open("save1")
globals().update(mysave["all characters"])
mysave.close()
run()

Categories