How to save changes to a Python object?

I have a Python dictionary of objects from a class that I have created in one file. It is of the form {string : object}, with several key, value pairs.
My goal is to have a method in a separate file change an attribute of certain objects in the dictionary, and to save those changes to the objects while keeping them in the dictionary.
I've tried using pickle, but it doesn't seem to save the changes to the objects within the dictionary.
Basic Idea of what I'm doing right now and what is wrong with it:
File #1:
import pickle

class A:
    def __init__(self):
        self.value = 0

a = A()
dict = {"Test": a}
pickle.dump(dict, open("save.p", "wb"))
File #2:
import pickle

dict = pickle.load(open("save.p", "rb"))
dict["Test"].value += 1
print(dict["Test"].value)
pickle.dump(dict, open("save.p", "wb"))
So when I run File #2 the first time, it should print 1, and it does.
But when I run File #2 the second time, I want it to print 2; instead it prints 1 again, because the change to the value was not saved.
It could be that I am using pickle incorrectly...
Any help would be appreciated! Thanks!

From the pickle documentation:
Note that none of the class’s code or data is pickled
See pickling class instances for the right way to do it.
Also, class A does not exist in the unpickling environment; that can't be a good thing. Classes are unpickled by name, if I read the doc right.
BTW, I'd use json over pickle so you can open the file between two runs and inspect it yourself to understand what happens. There are a few advantages to using json over pickle, and a few to using pickle over json; here's a comparison between pickle and json.
Oh, and avoid naming your variables dict or any other existing builtin: it shadows them and can lead to very strange behaviors.
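Since classes are unpickled by name, the fix is to put class A in a module that both scripts can import. A minimal sketch (the file names models.py, file1.py and file2.py are hypothetical):

# models.py -- shared module holding the class definition
class A:
    def __init__(self):
        self.value = 0

# file1.py -- creates and saves the dictionary (run once)
import pickle
from models import A

d = {"Test": A()}
with open("save.p", "wb") as f:
    pickle.dump(d, f)

# file2.py -- loads, mutates, and saves again
import pickle
import models  # the class must be importable when unpickling

with open("save.p", "rb") as f:
    d = pickle.load(f)
d["Test"].value += 1
print(d["Test"].value)  # goes up by 1 on every run
with open("save.p", "wb") as f:
    pickle.dump(d, f)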

Related

How does pickle save and load a DataFrame?

I wonder how the pickle module saves and loads objects. I saved a DataFrame object to a file on disk,
import pandas as pd
import pickle

df = pd.read_excel(r".\test.xlsx")
with open("o.pkl", "wb") as file:
    pickle.dump(df, file)
then I uninstalled pandas and tried to load the DataFrame from the file, but I get the error "Exception has occurred: ModuleNotFoundError: No module named 'pandas'":
import pickle

with open("o.pkl", "rb") as file:
    e = pickle.load(file)
My question is: does the pickle module somehow use pandas when loading a DataFrame? If so, how is that done?
Pickle by default will go and import the class.
In this case, if you do not have pandas installed when you run the second snippet, it won't work by default (see below for more info on that default behaviour).
Quick primer on pickling
Essentially, everything in Python is an instance of a class, in some shape or form.
When you make a DataFrame, such as when you use pandas.read_excel, you create an instance of the DataFrame class. To create that instance you need:
the class definition (containing information about methods and attributes)
something that creates the instance from some input data
You can create instances of a class normally by directly instantiating the class, or by using another method/function. Example:
# This makes a string, '12345' by directly invoking the str constructor
s = str(12345)
# This makes a list by using the split method of the string
l = s.split('3')
Pickle works just the same. When you unpickle, you need the class definition as well as the function which transforms some input data (your .pkl file) into the instance.
A reference to the class definition (its module and name) is stored in the pickled data, but none of the class's code, nor the supporting imports and code outside the class, will be.
This means that even if you override the default behaviour and manage to construct a DataFrame, it won't work, because you're missing pandas. When you try to invoke a method on the DataFrame, Python will try to access code that doesn't live in the original class definition. That code lives in other modules of the pandas package, so it is never captured in the pickle, and your code becomes quite unhappy at this point.
Can I override the default behaviour for unpickling?
Yes, you can do this -- you can override the import behaviour by using a custom unpickler. That's described here in the Python doc: restricting globals (Python official doc).
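For illustration, here is a minimal sketch in the spirit of the restricting-globals example from the docs (the all-or-nothing policy below is an assumption; a real version would whitelist specific classes):

import io
import pickle

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Refuse to resolve any global instead of importing it; a real
        # version would allow chosen (module, name) pairs through
        # super().find_class(module, name).
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name))

# Plain containers need no class lookup, so they load fine:
print(RestrictedUnpickler(io.BytesIO(pickle.dumps([1, 2, 3]))).load())

class Point:
    pass

# A class instance triggers find_class and is rejected:
try:
    RestrictedUnpickler(io.BytesIO(pickle.dumps(Point()))).load()
except pickle.UnpicklingError as exc:
    print(exc)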
I've run into a similar thing before where it needed a specific pandas version, but I didn't investigate. Running across your post here, I read some of the documentation and came across this line:
When a class instance is unpickled, its __init__() method is usually not invoked. The default behaviour first creates an uninitialized instance and then restores the saved attributes.
https://docs.python.org/3.8/library/pickle.html#pickle-inst
So to unpickle an arbitrary class instance, it has to be able to access the initialization method of that class. If the class isn't present, it can't do that.
That same page also says:
Similarly, when class instances are pickled, their class’s code and data are not pickled along with them. Only the instance data are pickled.
If I make a pandas DataFrame, I can access df.__class__ which will return pandas.core.frame.DataFrame
Putting this all together on that page, here's what I think happens:
Pickling df saves the instance data, which includes the __class__ attribute
Unpickling goes and looks for this class to access its __setstate__ method
If the module containing this class definition can't be found: error!
Short answer: it saves that information.
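You can watch this happen by disassembling a pickle with the standard pickletools module (a quick illustration using a plain class rather than a DataFrame):

import pickle
import pickletools

class Point:
    def __init__(self):
        self.x = 1

# The disassembly shows the strings '__main__' and 'Point' feeding a
# STACK_GLOBAL opcode: the module and class name are stored, not the
# class's code.
pickletools.dis(pickle.dumps(Point()))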

Use Python for Creating JSON

I want to use Python for creating JSON.
Since I found no library which can help me, I want to know whether it's possible to inspect the order of the classes in a Python file.
Example
# example.py
class Foo:
    pass

class Bar:
    pass
If I import example, I want to know the order of the classes. In this case it is [Foo, Bar] and not [Bar, Foo].
Is this possible? If "yes", how?
Background
I am not happy with yaml/json. I have the vague idea to create config via Python classes (only classes, not instantiation to objects).
Answers which help me to get to my goal (Create JSON with a tool which is easy and fun to use) are welcome.
The inspect module can tell the line numbers of the class declarations:
import inspect

def get_classes(module):
    for name, value in inspect.getmembers(module):
        if inspect.isclass(value):
            _, line = inspect.getsourcelines(value)
            yield line, name
So the following code:
import example

for line, name in sorted(get_classes(example)):
    print(line, name)
Prints:
2 Foo
5 Bar
First up, as I see it, there are two things you can do:
Continue using Python source files as configuration files. (I won't recommend this; it's analogous to using a bulldozer to strike a nail, or converting a shotgun into a wheel.)
Switch to something like TOML, JSON or YAML for configuration files; they are designed for the job.
Nothing in JSON or YAML prevents them from holding "ordered" key-value pairs. Python's dict type is unordered by default (at least till 3.5) and its list type is ordered; these map directly to JSON's object and array respectively when using the default loaders. Just use something like Python's OrderedDict when deserializing them and voilà, you preserve order (see the sketch below).
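For example, a minimal sketch of order-preserving deserialization with the standard json module:

import json
from collections import OrderedDict

text = '{"first": 1, "second": 2, "third": 3}'

# object_pairs_hook makes the decoder build OrderedDicts, so the
# key order from the file survives the round-trip.
config = json.loads(text, object_pairs_hook=OrderedDict)
print(list(config))        # ['first', 'second', 'third']
print(json.dumps(config))  # serializes back in the same order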
With that out of the way, if you really want to use Python source files for the configuration, I suggest trying to process the file using the ast module. Abstract Syntax Trees are a powerful tool for syntax level analysis.
I whipped up a quick script for extracting class line numbers and names from a file.
You (or anyone, really) can use it as-is, or extend it with more checks for whatever you need.
import sys
import ast
import json


class ClassNodeVisitor(ast.NodeVisitor):
    def __init__(self):
        super(ClassNodeVisitor, self).__init__()
        self.class_defs = []

    def visit(self, node):
        super(ClassNodeVisitor, self).visit(node)
        return self.class_defs

    def visit_ClassDef(self, node):
        self.class_defs.append(node)


def read_file(fpath):
    with open(fpath) as f:
        return f.read()


def get_classes_from_text(text):
    tree = ast.parse(text)
    class_extractor = ClassNodeVisitor()
    li = []
    for definition in class_extractor.visit(tree):
        li.append([definition.lineno, definition.name])
    return li


def main():
    fpath = sys.argv[1]  # path to the Python file to inspect
    try:
        text = read_file(fpath)
    except Exception as e:
        print("Could not load file due to " + repr(e))
        return 1
    print(json.dumps(get_classes_from_text(text), indent=4))


if __name__ == '__main__':
    sys.exit(main())
Here's a sample run on the following file:
input_file.py:
class Foo:
    pass


class Bar:
    pass
Output:
$ py_to_json.py input_file.py
[
    [
        1,
        "Foo"
    ],
    [
        5,
        "Bar"
    ]
]
If I import example,
If you're going to import the module, the example module needs to be on the import path. Importing means executing any Python code in the example module. This is a pretty big security hole: you're loading a user-editable file in the same context as the rest of the application.
I'm assuming that since you care about preserving class-definition order, you also care about preserving the order of definitions within each class.
It is worth pointing out that this is now the default behavior in Python, since Python 3.6.
Also see PEP 520: Preserving Class Attribute Definition Order.
(Moving my comments to an answer)
That's a great vague idea. You should give Figura a shot! It does exactly that.
(Full disclosure: I'm the author of Figura.)
I should point out the order of declarations is not preserved in Figura, and also not in json.
I'm not sure about order-preservation in YAML, but I did find this on wikipedia:
... according to the specification, mapping keys do not have an order
It might be the case that specific YAML parsers maintain the order, though they aren't required to.
You can use a metaclass to record each class's creation time, and later sort the classes by it.
This works in Python 2:
class CreationTimeMetaClass(type):
    creation_index = 0

    def __new__(cls, clsname, bases, dct):
        dct['__creation_index__'] = cls.creation_index
        cls.creation_index += 1
        return type.__new__(cls, clsname, bases, dct)

__metaclass__ = CreationTimeMetaClass

class Foo: pass
class Bar: pass

classes = [cls for cls in globals().values() if hasattr(cls, '__creation_index__')]
print(sorted(classes, key=lambda cls: cls.__creation_index__))
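The module-level __metaclass__ hook is gone in Python 3; a rough equivalent there (just a sketch, passing the metaclass explicitly) would be:

class CreationTimeMetaClass(type):
    creation_index = 0

    def __new__(cls, clsname, bases, dct):
        dct['__creation_index__'] = cls.creation_index
        cls.creation_index += 1
        return type.__new__(cls, clsname, bases, dct)

class Foo(metaclass=CreationTimeMetaClass): pass
class Bar(metaclass=CreationTimeMetaClass): pass

classes = [cls for cls in globals().values() if hasattr(cls, '__creation_index__')]
print(sorted(classes, key=lambda cls: cls.__creation_index__))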
The standard json module is easy to use and works well for reading and writing JSON config files.
Objects are not ordered within JSON structures, but lists/arrays are, so put order-dependent information into a list.
I have used classes as a configuration tool: I derived them from a base class that was customised by the particular class variables. Using the class like this meant I did not need a factory class. For example:
from .artifact import Application
class TempLogger(Application): partno='03459'; path='c:/apps/templog.exe'; flag=True
class GUIDisplay(Application): partno='03821'; path='c:/apps/displayer.exe'; flag=False
in the installation script
from .install import Installer
from app_configs import TempLogger, GUIDisplay

installer = Installer(apps=(TempLogger(), GUIDisplay()))
installer.baseline('1.4.3.3475')
print(installer.versions())
print(installer.bill_of_materials())
One should use the right tools for the job, so perhaps Python classes are not the right tool if you need ordering.
Another Python tool I have used to create JSON files is the Mako templating system. It is very powerful. We used it to populate variables like IP addresses into static JSON files that were then read by C++ programs.
I'm not sure if this answers your question, but it might be relevant. Take a look at the excellent attrs module. It's great for creating classes to use as data types.
Here's an example from glyph's blog (creator of Twisted Python):
import attr

@attr.s
class Point3D(object):
    x = attr.ib()
    y = attr.ib()
    z = attr.ib()
It saves you writing a lot of boilerplate code - you get things like str representation and comparison for free, and the module has a convenient asdict function which you can pass to the json library:
>>> p = Point3D(1, 2, 3)
>>> str(p)
'Point3D(x=1, y=2, z=3)'
>>> p == Point3D(1, 2, 3)
True
>>> json.dumps(attr.asdict(p))
'{"y": 2, "x": 1, "z": 3}'
The module uses a strange naming convention, but read attr.s as "attrs" and attr.ib as "attrib" and you'll be okay.
Just touching on the point about creating JSON from Python: there is an excellent library called jsonpickle which lets you dump Python objects to JSON. (Using this alone or with the other methods mentioned here, you can probably get what you wanted.)
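A quick sketch of jsonpickle round-tripping an object (the Server class here is just for illustration):

import jsonpickle

class Server:
    def __init__(self, host, port):
        self.host = host
        self.port = port

encoded = jsonpickle.encode(Server("localhost", 8080))
print(encoded)  # JSON text including a "py/object" tag with the class path

restored = jsonpickle.decode(encoded)
print(restored.host, restored.port)  # localhost 8080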

Calling a function from a dictionary, dictionary in imported settings file

So I have a dictionary with a bunch of names that I use to call functions. It works fine, but I prefer to put it in my settings file. If I do so, though, I get errors from the settings file saying that there are no functions by that name (even though I'm not calling them at the time). Any workarounds?
def callfunct(id, time):
    pass  # stuff here

def callotherfunct(id, time):
    pass  # stuff here

dict = {"blah blah": callfunct, "blah blah blah": callfunct, "otherblah": callotherfunct}
dict[str(nameid)](id, time)
Hope this makes sense. I'm also open to other ideas, but basically I have about 50 of these definitions, with unique names passed as nameid that need to call specific functions; that's why I do it the way I do, so that I can add new names quickly. It would obviously be even quicker if I could get the dictionary into the settings file seamlessly as well.
If you try
def f_one(id, time):
    pass

def f_two(id, time):
    pass

d = {"blah blah": "f_one", "blah blah blah": "f_one", "otherblah": "f_two"}

locals()[d[str(nameid)]](id, time)
(replacing the dictionary initialization with just loading the config file with the string names of the functions you want to call), does that work?
If not, there needs to be a little more info: What does the config file look like, and how are you loading it?
I'm guessing the reason the config-file part isn't working is that you're trying to reference the functions directly from the config file, which shouldn't work. The approach above takes whatever string is stored in the config file and looks it up in the locals() dictionary (if you're inside a function, you'll have to use globals() instead); see the sketch below.
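As a fuller sketch of the pattern (the settings format and the dispatch helper are assumptions for illustration): keep only strings in the settings file, then resolve them to functions at call time.

import json

def callfunct(id, time):
    print("callfunct", id, time)

def callotherfunct(id, time):
    print("callotherfunct", id, time)

# settings.json would contain only names, e.g.:
# {"blah blah": "callfunct", "otherblah": "callotherfunct"}
settings = json.loads('{"blah blah": "callfunct", "otherblah": "callotherfunct"}')

def dispatch(nameid, id, time):
    func = globals()[settings[str(nameid)]]  # resolve string -> function
    return func(id, time)

dispatch("otherblah", 42, "12:00")  # prints: callotherfunct 42 12:00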
You could initialise the dictionary with the looked up function only when you attempt to access it:
d = {}
d.setdefault('func1', globals()['func1'])()

lazy load dictionary

I have a dictionary called fsdata at module level (like a global variable).
The content gets read from the file system. It should load its data once, on first access. Up to now it loads the data while the module is being imported; this should be optimized.
If no code accesses fsdata, the content should not be read from the file system (save CPU/IO).
Loading should also happen if you check the dictionary's boolean value:
if mymodule.fsdata:
    do_something()
Update: Some code already uses mymodule.fsdata, and I don't want to change the other places. It should stay a variable, not become a function, and mymodule needs to remain a module, since it is already used in a lot of code.
I think you should use a Future/Promise, like this: https://gist.github.com/2935416
The main point: you create not the object itself, but a 'promise' about the object, which behaves like the object.
You can replace your module with an object that has descriptor semantics:
import sys

class FooModule(object):
    @property
    def bar(self):
        print("get")

sys.modules[__name__] = FooModule()
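Applied to this question, a fuller sketch of the same trick (the loader _load_fsdata is a hypothetical stand-in for the real file-system read) keeps mymodule.fsdata looking like a plain variable while loading lazily:

# mymodule.py
import sys

def _load_fsdata():
    # hypothetical stand-in for the real file-system read
    return {"key": "value"}

class _LazyModule(object):
    _cache = None

    @property
    def fsdata(self):
        if _LazyModule._cache is None:  # first access: load once
            _LazyModule._cache = _load_fsdata()
        return _LazyModule._cache

_lazy = _LazyModule()
_lazy._module = sys.modules[__name__]  # keep the real module object alive
sys.modules[__name__] = _lazy

Since if mymodule.fsdata: accesses the attribute, the truth-value check triggers the load too, as the question requires.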
Take a look at http://pypi.python.org/pypi/apipkg for a packaged approach.
You could just create a simple function that memoizes the data:
fsdata = []

def get_fsdata():
    if not fsdata:
        fsdata.append(load_fsdata_from_file())
    return fsdata[0]
(I'm using a list as that's an easy way to make a variable global without mucking around with the global keyword).
Now instead of referring to module.fsdata you can just call module.get_fsdata().

Design of a python pickleable object that describes a file

I would like to create a class that describes a file resource and then pickle it. This part is straightforward. To be concrete, let's say that I have a class "A" that has methods to operate on a file. I can pickle this object if it does not contain a file handle.

I want to be able to create a file handle in order to access the resource described by "A". If I have an open() method in class "A" that opens and stores the file handle for later use, then "A" is no longer pickleable. (I'll add here that opening the file includes some non-trivial indexing which cannot be cached, since it is third-party code, so closing and reopening when needed is not without expense.)

I could code class "A" as a factory that can generate file handles to the described file, but that could result in multiple file handles accessing the file contents simultaneously. I could use another class "B" to handle the opening of the file in class "A", including locking, etc. I am probably overthinking this, but any hints would be appreciated.
The question isn't too clear; what it looks like is that:
you have a third-party module which has picklable classes
those classes may contain references to files, which makes the classes themselves not picklable because open files aren't picklable.
Essentially, you want to make open files picklable. You can do this fairly easily, with certain caveats. Here's an incomplete but functional sample:
import pickle

class PicklableFile(object):
    def __init__(self, fileobj):
        self.fileobj = fileobj

    def __getattr__(self, key):
        return getattr(self.fileobj, key)

    def __getstate__(self):
        ret = self.__dict__.copy()
        ret['_file_name'] = self.fileobj.name
        ret['_file_mode'] = self.fileobj.mode
        ret['_file_pos'] = self.fileobj.tell()
        del ret['fileobj']
        return ret

    def __setstate__(self, state):
        self.fileobj = open(state['_file_name'], state['_file_mode'])
        self.fileobj.seek(state['_file_pos'])
        del state['_file_name']
        del state['_file_mode']
        del state['_file_pos']
        self.__dict__.update(state)

f = PicklableFile(open("/tmp/blah"))
print(f.readline())
data = pickle.dumps(f)
f2 = pickle.loads(data)
print(f2.read())
Caveats and notes, some obvious, some less so:
This class should operate directly on the file object you got from open. If you're using wrapper classes on files, like gzip.GzipFile, those should go above this, not below it. Logically, treat this as a decorator class on top of file.
If the file doesn't exist when you unpickle, it can't be unpickled and will throw an exception.
If it's a different file, the behavior may or may not make sense.
If the file mode includes file creation ('w+') and the file doesn't exist, it'll be created; we don't know what file permissions to use, since they aren't stored with the file. If this is important (it probably shouldn't be), store the correct permissions in the class when you first create it.
If the file isn't seekable, trying to seek to the old position may raise IOError; if you're using a file like that you'll need to decide how to handle that.
The file classes in Python 2 and Python 3 are different; there's no file class in Python 3. Even if you're only using Python 2 right now, don't subclass file.
I'd steer away from doing this; having pickled data dependent on external files not changing and staying in the same place is brittle. This makes it difficult to even relocate files, since your pickled data won't make sense.
If you open a pointer to a file, pickle it, then attempt to reconstitute it later, there is no guarantee that the file will still be available for opening.
To elaborate, the file pointer really represents a connection to the file. Just like a database connection, you can't "pickle" the other end of the connection, so this won't work.
Is it possible to keep the file pointer around in memory in its own process instead?
It sounds like you know you can't pickle the handle, and you're ok with that, you just want to pickle the part that can be pickled. As your object stands now, it can't be pickled because it has the handle. Do I have that right? If so, read on.
The pickle module will let your class describe its own state to pickle, for exactly these cases. You want to define your own __getstate__ method. The pickler will invoke it to get the state to be pickled; only if the method is missing does it fall back to the default of trying to pickle all the attributes.
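A minimal sketch of that idea (class and attribute names are illustrative): drop the handle in __getstate__ and reopen on demand after unpickling.

import pickle

class FileResource(object):
    def __init__(self, path):
        self.path = path
        self._handle = None  # opened on demand, never pickled

    def open_file(self):
        if self._handle is None:
            self._handle = open(self.path)
        return self._handle

    def __getstate__(self):
        state = self.__dict__.copy()
        state['_handle'] = None  # strip the unpicklable handle
        return state

r = FileResource("/tmp/blah")
r.open_file()
r2 = pickle.loads(pickle.dumps(r))  # works: no live handle in the state
r2.open_file()  # the unpickled copy reopens the file when needed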
