Unpickle sometimes makes blank objects - python

I'm trying to use pickle to save a custom class; something very much like the code below (though with a few methods defined on the class, and several more dicts and such for data). However, often when I run this, pickle and then unpickle, I lose whatever data was in the class, and it's as if I had created a new blank instance.
import pickle
class MyClass:
    VERSION = 1
    some_data = {}
    more_data = set()
    def save(self,filename):
        with open(filename, 'wb') as f:
            p = pickle.Pickler(f)
            p.dump(self)
    def load(filename):
        with open(filename,'rb') as ifile:
            u = pickle.Unpickler(ifile)
            obj = u.load()
        return obj
I was wondering if this had something to do with the memo of the pickle class, but I don't feel like it should. When it doesn't work, I look at my generated file and it looks something like this: (Obviously not meant to be readable, but it obviously contains no data)
€c__main__
MyClass
q
Anyways, I hope this is enough for someone to understand what might possibly be going on here, or what to look at.

The problem you're having is that you're using mutable class variables to hold your data, rather than putting the data into instance variables.
The pickle module only saves the data stored directly on the instance, not class variables that can also be accessed via self. When your unpickled instance turns out to have no data, it probably means that the class no longer holds the data from the previous run, so the instance can't reach it any more.
Using class variables that way will probably cause you other problems too, as the data will be shared by all instances of the class! Here's a Python console session that illustrates the issue:
>>> class Foo(object):
...     class_var = []
...     def __init__(self, value):
...         self.class_var.append(value)
...
>>> f1 = Foo(1)
>>> f1.class_var
[1]
>>> f2 = Foo(2)
>>> f2.class_var
[1, 2]
That's probably not what you wanted. But it gets worse!
>>> f1.class_var
[1, 2]
The data you thought had belonged to f1 has been changed by the creation of f2. In fact, f1.class_var is the very same object as f2.class_var (it is also available via Foo.class_var directly, without going through any instances at all).
So, using a class variable is almost certainly not what you want. Instead, write an __init__ method for the class that creates a new value and saves it as an instance variable:
>>> class Bar(object):
...     def __init__(self, value):
...         self.instance_var = [] # creates a separate list for each instance!
...         self.instance_var.append(value)
...
>>> b1 = Bar(1)
>>> b1.instance_var
[1]
>>> b2 = Bar(2)
>>> b2.instance_var # doesn't include value from b1
[2]
>>> b1.instance_var # b1's data is unchanged
[1]
Pickle will handle this class as you expect. All of its data is in the instances, so you should never end up with an empty instance when you unpickle.
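To make the difference concrete, here is a minimal sketch (the class names Shared and PerInstance are made up for illustration). It relies on the fact that, by default, pickle stores an instance's own __dict__, which is the mapping that vars() returns, so inspecting vars() shows what would end up in the file:

```python
import pickle

class Shared(object):
    data = {}  # class attribute: lives on the class, not on instances

class PerInstance(object):
    def __init__(self):
        self.data = {}  # instance attribute: lives in the instance __dict__

shared = Shared()
shared.data["answer"] = 42  # mutates the dict stored on the class

own = PerInstance()
own.data["answer"] = 42  # mutates the dict stored on this instance

# What pickle would save by default for each object:
shared_state = vars(shared)  # empty, the data sits on the class
own_state = vars(own)        # contains the 'data' dict

# Round-tripping the instance state keeps the data intact.
restored_state = pickle.loads(pickle.dumps(own_state))

print(shared_state)
print(restored_state)
```

In a fresh interpreter the Shared class starts again with an empty data dict, which is why an unpickled Shared instance looks blank.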

Related

Can't access global variables from Python server

I am working on a small game server in Python using the class SimpleWebSocketServer found here. Everything works great, but the problem is each time I want to access a variable, I have to use self.variable_name. Let me give an example.
class SimpleEcho(WebSocket):
    times_played_since_reset = 0
    def handleMessage(self):
        global times_played_since_reset
        print times_played_since_reset
Whenever I try accessing times_played_since_reset using global it doesn't work and the server quits. Make it self.times_played_since_reset and everything works.
This variable needs to be affected by EVERY client connected. Unfortunately, when I use self, each client only affects its own instance. I need the client to affect the class-wide variable instead.
You might want to consider using a mutable type for times_played_since_reset if you want it to be shared between all instances of the class.
Integers are immutable, so a statement such as self.times_played_since_reset += 1 rebinds the name on the instance instead of changing the shared class attribute. As mentioned in the comments above, you could explicitly modify the class variable by doing something like SimpleEcho.times_played_since_reset += 1; however, an instance only sees that change as long as an instance attribute of the same name has not been explicitly set on it.
For instance, take this example class:
class Foo(object):
    bar = 1
If we create two instances:
>>> x = Foo()
>>> y = Foo()
Then:
>>> x.bar
1
>>> y.bar
1
And if we do:
>>> Foo.bar += 1
Then
>>> x.bar
2
>>> y.bar
2
But if we do:
>>> x.bar = 7
>>> Foo.bar +=1
Then:
>>> x.bar
7
>>> y.bar
3
If instead you were to use a mutable type like a list, for example:
class Foo(object):
    bar = [1]
then whether you modify Foo.bar[0] or <instance>.bar[0], all current and future instances would see the change. This is because they all reference the same list, and you have modified the contents of the list rather than changing the specific object the variable points to.
However, if you were to assign a new list via <instance>.bar = [78], only that instance would see the change and all other instances (current and future) would still reference the original list that was defined in the class definition.
To access global variables from python server, use:
(Class Name).variable
SimpleEcho.times_played_since_reset
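A small sketch of the difference, using a made-up Client class rather than the real SimpleWebSocketServer API: updating through the class name changes the shared counter, while updating through self silently creates a per-instance copy.

```python
class Client(object):
    times_played = 0  # class-wide counter shared by all clients

    def play_shared(self):
        # Assigning through the class updates the one shared value.
        Client.times_played += 1

    def play_shadowing(self):
        # Reads the class value, then binds a new attribute on this instance.
        self.times_played += 1

a = Client()
b = Client()

a.play_shared()
b.play_shared()
shared_after_two = Client.times_played

b.play_shadowing()  # b now has its own times_played attribute
b_own_value = b.times_played
shared_after_shadow = Client.times_played

print(shared_after_two, b_own_value, shared_after_shadow, a.times_played)
```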

How to get the object for a given class name in Python?

Is there any way to get the object name when the class name is known? If there are multiple objects for a class, they also need to be printed.
class A():
    pass
Assume that someone has created objects of class A in some other files. I want to look up all instances of class A.
If you are the one creating the class you can simply store weak-references when instantiating the class:
import weakref
class A(object):
    instances = []
    def __init__(self):
        A.instances.append(weakref.ref(self))
a, b, c = A(), A(), A()
instances = [ref() for ref in A.instances if ref() is not None]
Using weak references allows the instances to be deallocated before the class.
See the weakref module for details on what it does.
Note that you may be able to use this technique even with classes that you didn't write. You simply have to monkey-patch the class.
For example:
def track_instances(cls):
    cls.instances = []  # the class needs a list to hold the weak references
    cls._old_init = cls.__init__
    def init(self, *args, **kwargs):
        cls.instances.append(weakref.ref(self))
        # call through the class so that self is passed exactly once
        cls._old_init(self, *args, **kwargs)
    cls.__init__ = init
    return cls
Then you can do:
track_instances(ExternalClass)
And all instances created after the execution of this statement will be found in ExternalClass.instances.
Depending on the class you may have to replace __new__ instead of __init__.
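As a usage sketch of the monkey-patching idea, here is a self-contained version. ExternalClass is a hypothetical stand-in for a class you did not write, and live_instances is a helper name invented for this example; it assumes the class has no existing attribute called instances.

```python
import gc
import weakref

class ExternalClass(object):  # hypothetical class defined elsewhere
    def __init__(self, name):
        self.name = name

def track_instances(cls):
    original_init = cls.__init__
    cls.instances = []  # assumes the class has no attribute of this name

    def init(self, *args, **kwargs):
        # Record a weak reference, then run the original initialiser.
        cls.instances.append(weakref.ref(self))
        original_init(self, *args, **kwargs)

    cls.__init__ = init
    return cls

def live_instances(cls):
    # Dereference each weak reference and drop the ones that have died.
    return [obj for obj in (ref() for ref in cls.instances) if obj is not None]

track_instances(ExternalClass)
first = ExternalClass("first")
second = ExternalClass("second")
names_before = sorted(obj.name for obj in live_instances(ExternalClass))

del second
gc.collect()  # make sure the deleted instance is really gone
names_after = [obj.name for obj in live_instances(ExternalClass)]

print(names_before, names_after)
```

Only instances created after track_instances runs are recorded, and the weak references do not keep any of them alive.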
You can do this even without any special code in the class, simply using the garbage collector:
import gc
candidates = gc.get_referrers(cls_object)
instances = [candidate for candidate in candidates if isinstance(candidate, cls_object)]
And you can always obtain the class object since you can find it using object.__subclasses__ method:
cls_object = next(cls for cls in object.__subclasses__() if cls.__name__ == cls_name)
(assuming there is only one class with that name; otherwise you should try all of them)
However I cannot think of a situation where this is the right thing to do, so avoid this code in real applications.
I've done some testing and I believe that this solution may not work for built-in classes or classes defined in C extensions.
If that is your situation, the last resort is to use gc.get_objects() to retrieve all tracked objects. However, this only works if the objects support cyclic garbage collection, so there isn't a method that works in every possible situation.
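Putting the class-name lookup and the gc.get_objects() fallback together might look like the sketch below. The helper names iter_subclasses and find_instances and the demo class are invented for this example. Note that object.__subclasses__() only lists direct subclasses, so the sketch walks the hierarchy, and it matches on the exact type, so instances of further subclasses with a different name are not returned.

```python
import gc

def iter_subclasses(base=object):
    """Yield every class derived from base, directly or indirectly."""
    seen = set()  # ids of classes already visited
    stack = [base]
    while stack:
        current = stack.pop()
        # type.__subclasses__(current) also works when current is a metaclass
        for sub in type.__subclasses__(current):
            if id(sub) not in seen:
                seen.add(id(sub))
                stack.append(sub)
                yield sub

def find_instances(class_name):
    """Return gc-tracked objects whose exact type is named class_name."""
    classes = [c for c in iter_subclasses() if c.__name__ == class_name]
    return [obj for obj in gc.get_objects()
            if any(type(obj) is c for c in classes)]

class TrackedWidgetDemo(object):  # hypothetical class used for the demo
    pass

w1 = TrackedWidgetDemo()
w2 = TrackedWidgetDemo()
found = find_instances("TrackedWidgetDemo")
print(len(found))
```

This scans every tracked object in the process, so it is better suited to debugging than to regular application code.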
Here is a version that gets the instances from memory. I wouldn't recommend using this in live code, but it can be convenient for debugging:
import weakref
class SomeClass(object):
    register = []
    def __init__(self):
        self.register.append(weakref.ref(self))
a = SomeClass()
b = SomeClass()
c = SomeClass()
# Now the magic :)
import gc
def get_instances(class_name):
    # Get the objects from memory
    for instance in gc.get_objects():
        # Try and get the actual class
        class_ = getattr(instance, '__class__', None)
        # Only return if the class has the name we want
        if class_ and getattr(class_, '__name__', None) == class_name:
            yield instance
print list(get_instances('SomeClass'))
Python provides the types module, which defines classes for built-in types, and the locals() and globals() functions, which return dictionaries of the local and global variables in the application.
One quick way to find objects by type is to do this.
import types
for varname, var_instance in locals().items():
    if type(var_instance) == types.InstanceType and var_instance.__class__.__name__ == 'CLASS_NAME_YOU_ARE_LOOKING_FOR':
        print "This instance was found:", varname, var_instance
It's worth going through the Python library documentation and reading the docs for the modules that work with code directly. Some of these are inspect, gc, types, codeop, code, imp, ast, bdb and pdb. The IDLE source code is also very informative.
Instances are created within a namespace:
def some_function():
    some_object = MyClass()
In this case, some_object is a name inside the "namespace" of the function that points at a MyClass instance. Once you leave the namespace (i.e., the function ends), Python's garbage collection cleans up the name and the instance.
If some other location also held a reference to the object, the cleanup wouldn't happen.
So: no, there's no place where a list of instances is maintained.
It would be a different case were you to use a database with an ORM (object-relational mapper). In Django's ORM you can do MyClass.objects.all() if MyClass is a database object. Something to look into if you really need the functionality.
Update: See Bakuriu's answer. The garbage collector (which I mentioned) knows about all the instances :-) And he suggests the "weakref" module that prevents my won't-be-cleaned-up problem.
You cannot get names for all the instances, as they may not all have names, or the names they do have may not be in scope. You may be able to get the instances themselves.
If you are willing to keep track of the instances yourself, use a WeakSet:
import weakref
class SomeClass(object):
    instances = weakref.WeakSet()
    def __init__(self):
        self.instances.add(self)
>>> instances = [SomeClass(), SomeClass(), SomeClass()]
>>> other = SomeClass()
>>> SomeClass.instances
<_weakrefset.WeakSet object at 0x0291F6F0>
>>> list(SomeClass.instances)
[<__main__.SomeClass object at 0x0291F710>, <__main__.SomeClass object at 0x0291F730>, <__main__.SomeClass object at 0x028F0150>, <__main__.SomeClass object at 0x0291F210>]
Note that just deleting a name may not destroy the instance. other still exists until it is garbage collected:
>>> del other
>>> list(SomeClass.instances)
[<__main__.SomeClass object at 0x0291F710>, <__main__.SomeClass object at 0x0291F730>, <__main__.SomeClass object at 0x028F0150>, <__main__.SomeClass object at 0x0291F210>]
>>> import gc
>>> gc.collect()
0
>>> list(SomeClass.instances)
[<__main__.SomeClass object at 0x0291F710>, <__main__.SomeClass object at 0x0291F730>, <__main__.SomeClass object at 0x0291F210>]
If you don't want to track them manually, then it is possible to use gc.get_objects() and filter out the instances you want, but that means you have to filter through all the objects in your program every time you do this. Even in the above example that means processing nearly 12,000 objects to find the 3 instances you want.
>>> [g for g in gc.get_objects() if isinstance(g, SomeClass)]
[<__main__.SomeClass object at 0x0291F210>, <__main__.SomeClass object at 0x0291F710>, <__main__.SomeClass object at 0x0291F730>]
>>> class TestClass:
...     pass
...
>>> foo = TestClass()
>>> for i in dir():
...     if isinstance(eval(i), TestClass):
...         print(i)
...
foo
>>>
Finally found a way to get through.
As I know the class name, I search the garbage collector (gc) for the object created for that class, like this:
def find_sample_key():  # hypothetical function name
    for instance in gc.get_objects():
        if str(type(instance)).find("dict") != -1:
            for k in instance.keys():
                if str(k).find("Sample") != -1:
                    return k
The above code returns an instance of the class, which looks like this. Unfortunately, it's in string format, which doesn't suit the requirement. It should be of 'obj' type.
<mod_example.Sample object at 0x6f55250>
From the above value, parse the id (0x6f55250) and get the object reference based on the id.
import ast, gc
obj_id = '0x6f55250'  # the id as parsed from the string above
for obj in gc.get_objects():
    # Convert the hex string to an integer before comparing
    if id(obj) == ast.literal_eval(obj_id):
        required_obj = obj
Hence required_obj will hold the object reference exactly in the 'obj' format.
:-)

Better way to initialize python ctypes structure field

Is there a better way to initialize a ctypes field that is meant to be static/constant than what I have below?
from ctypes import *
class foo(LittleEndianStructure):
    _fields_ = [
        ("signature", c_ulonglong),
    ]
    def __init__(self):
        super(LittleEndianStructure,self).__init__()
        self.signature = 0x896489648964
f = foo()
print hex(f.signature)
For example, I was hoping I could do something similar to how you could do it with a normal python object:
class bar:
    signature = 0x896489648964
b = bar()
print hex(b.signature)
The short answer is no, you can't do this, and shouldn't want to.
Your normal Python object sample doesn't do what you think. It's not automatically initializing an instance attribute; it's creating a class attribute instead.
They work similarly in some cases, but they're not the same thing. For example, compare:
>>> class Foo(object):
...     bar=[]
...     def __init__(self):
...         self.baz=[]
...
>>> f1 = Foo()
>>> f2 = Foo()
>>> f1.bar.append(100)
>>> f1.baz.append(100)
>>> f2.bar
[100]
>>> f2.baz
[]
Here, f1 and f2 each initialize their own baz, but they do not automatically initialize their own bar—they share a single bar with every other instance.
And, more directly relevant to this case:
>>> f1.__dict__
{'baz': [100]}
The bar class attribute is not part of f1's dictionary.
So, translating the same thing to ctypes, your "signature" would not be a member of your structure if you made it a class attribute—that is, it wouldn't be laid out in memory as part of each instance. Which would defeat the entire purpose of having it.
If you know C++, it may help to look at it in C++ terms.
A class attribute, like bar above, is sort of* like a static member variable in C++, while an instance attribute is like a normal instance member variable.
In this C++ code:
struct bar {
    static const long signature = 0x896489648964;
};
… each bar is actually an empty structure; there's a single bar::signature stored somewhere else in memory. You can reference it through bar instances, but only because the compiler turns b1.signature into bar::signature.
* The reason I say "sort of" is that Python class attributes can be overridden by subclasses, while C++ static members can't. They really are just global variables, and subclasses can only "hide" them.
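The memory-layout point can be checked with a short sketch. The class names WithField and WithClassAttr, the payload field and the raw_bytes helper are made up for this illustration; the signature value is the one from the question.

```python
import ctypes
import struct

SIGNATURE = 0x896489648964

class WithField(ctypes.LittleEndianStructure):
    _fields_ = [("signature", ctypes.c_ulonglong)]

    def __init__(self, *args, **kwargs):
        super(WithField, self).__init__(*args, **kwargs)
        self.signature = SIGNATURE  # stored inside the structure's memory

class WithClassAttr(ctypes.LittleEndianStructure):
    _fields_ = [("payload", ctypes.c_uint32)]
    signature = SIGNATURE  # plain Python class attribute, not a struct field

def raw_bytes(obj):
    # Copy the structure's underlying memory into a bytes object.
    return ctypes.string_at(ctypes.addressof(obj), ctypes.sizeof(obj))

expected = struct.pack("<Q", SIGNATURE)  # 8 bytes, little-endian
field_bytes = raw_bytes(WithField())
attr_bytes = raw_bytes(WithClassAttr())

print(field_bytes == expected, len(attr_bytes))
```

The class attribute is still readable as WithClassAttr().signature, but it never appears in the bytes that would be handed to C code.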

Dumping a subclass of gtk.ListStore using pickle

I am trying to dump a custom class using pickle. The class was subclassed from gtk.ListStore, since that made it easier to store particular data and then display it using gtk. This can be reproduced as shown here.
import gtk
import pickle
import os
class foo(gtk.ListStore):
    pass
if __name__=='__main__':
    x = foo(str)
    with open(os.path.expandvars('%userprofile%\\temp.txt'),'w') as f:
        pickle.dump(x,f)
The solution that I have tried was to add a __getstate__ function into my class. As far as I understand the documentation, this should take precedence for pickle so that it no longer tries to serialize the ListStore, which it is unable to do. However, I still get an identical error from pickle.dump when I try to pickle my object. The error can be reproduced as follows.
import gtk
import pickle
import os
class foo(gtk.ListStore):
    def __getstate__(self):
        return 'bar'
if __name__=='__main__':
    x = foo(str)
    with open(os.path.expandvars('%userprofile%\\temp.txt'),'w') as f:
        pickle.dump(x,f)
In each case, pickle.dump raises a TypeError, "can't pickle ListStore objects". Using print statements, I have verified that the __getstate__ function is run when using pickle.dump. I don't see any hints as to what to do next from the documentation, and so I'm in a bit of a bind. Any suggestions?
With this method you can even use json instead of pickle for your purpose.
Here is a quick working example to show you the steps you need to employ to pickle "unpicklable types" like gtk.ListStore. Essentially you need to do a few things:
Define __reduce__ which returns a function and arguments needed to reconstruct the instance.
Determine the column types for your ListStore. The method self.get_column_type(0) returns a Gtype, so you will need to map this back to the corresponding Python type. I've left that as an exercise - in my example I've employed a hack to get the column types from the first row of values.
Your _new_foo function will need to rebuild the instance.
Example:
import gtk, os, pickle
def _new_foo(cls, coltypes, rows):
    inst = cls.__new__(cls)
    inst.__init__(*coltypes)
    for row in rows:
        inst.append(row)
    return inst
class foo(gtk.ListStore):
    def __reduce__(self):
        rows = [list(row) for row in self]
        # hack - to be correct you'll really need to use
        # `self.get_column_type` and map it back to Python's
        # corresponding type.
        coltypes = [type(c) for c in rows[0]]
        return _new_foo, (self.__class__, coltypes, rows)
x = foo(str, int)
x.append(['foo', 1])
x.append(['bar', 2])
s = pickle.dumps(x)
y = pickle.loads(s)
print list(y[0])
print list(y[1])
Output:
['foo', 1]
['bar', 2]
When you subclass object, object.__reduce__ takes care of calling __getstate__. It would seem that since this is a subclass of gtk.ListStore, the default implementation of __reduce__ tries to pickle the data for reconstructing a gtk.ListStore object first, then calls your __getstate__, but since the gtk.ListStore can't be pickled, it refuses to pickle your class. The problem should go away if you try to implement __reduce__ and __reduce_ex__ instead of __getstate__.
>>> class Foo(gtk.ListStore):
...     def __init__(self, *args):
...         super(Foo, self).__init__(*args)
...         self._args = args
...     def __reduce_ex__(self, proto=None):
...         return type(self), self._args, self.__getstate__()
...     def __getstate__(self):
...         return 'foo'
...     def __setstate__(self, state):
...         print state
...
>>> x = Foo(str)
>>> pickle.loads(pickle.dumps(x))
foo
<Foo object at 0x18be1e0 (__main__+Foo-v3 at 0x194bd90)>
As an addition, you may want to consider other serializers, such as json. There you take full control of the serialization process by defining yourself how custom classes are to be serialized. Plus, by default they come without the security issues of pickle.
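A rough sketch of the json route follows. Since gtk may not be installed, Table is a hypothetical stand-in for a ListStore-like container, and the payload layout (a "columns" list of type names plus a "rows" list) is an assumption made for this example, not anything gtk prescribes.

```python
import json

class Table(object):
    """Hypothetical stand-in for a ListStore-like container."""
    def __init__(self, *column_types):
        self.column_types = column_types
        self.rows = []

    def append(self, row):
        self.rows.append(list(row))

# Only these column types are supported by this sketch.
_TYPE_NAMES = {str: "str", int: "int", float: "float"}
_TYPES_BY_NAME = dict((name, typ) for typ, name in _TYPE_NAMES.items())

def table_to_json(table):
    # Store the column types by name, because type objects are not JSON data.
    payload = {
        "columns": [_TYPE_NAMES[t] for t in table.column_types],
        "rows": table.rows,
    }
    return json.dumps(payload)

def table_from_json(text):
    payload = json.loads(text)
    table = Table(*[_TYPES_BY_NAME[name] for name in payload["columns"]])
    for row in payload["rows"]:
        table.append(row)
    return table

original = Table(str, int)
original.append(["foo", 1])
original.append(["bar", 2])
restored = table_from_json(table_to_json(original))

print(restored.rows)
```

For a real gtk.ListStore the same two functions would read the rows with list(row) and rebuild the store from the saved column types.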

Why does python + pylons "remember" previously specified class variables?

I have a simple form in python + pylons that submits to a controller. However, each page load doesn't seem to be a fresh instantiation of the class. Rather, class variables specified on the previous page load are still accessible.
What's going on here? And what's the solution?
A common programmer oversight is forgetting that a list [] used as a default argument or as a class-level initialiser is evaluated only once. If you have class variables such as lists, I recommend you initialise them in __init__ instead. I'll give you an example.
>>> class Example(object):
...     a = []
...     def __init__(self):
...         self.b = []
...
>>> foo = Example()
>>> bar = Example()
>>> foo.a
[]
>>> bar.a
[]
>>> foo.b
[]
>>> bar.b
[]
>>> foo.a.append(1)
>>> foo.b.append(2)
>>> foo.a
[1]
>>> foo.b
[2]
>>> bar.a
[1]
>>> bar.b
[]
Pylons uses a multi-threaded application server and variables are not cleared from request to request. This is a performance issue, as re-instantiating entire class trees would be expensive. Instead of storing the data returned by the user in a class, use a sessions system (Pylons comes with one or use something like Beaker) or back-end database like SQLAlchemy, SQLObject, or PyMongo.
Additionally, due to the multi-threaded nature of the framework, you should avoid shared objects (like globals) like the plague unless you are very careful to ensure you are using them in a thread-safe way (e.g. read-only). Certain Pylons-supplied objects (request/response) have been written to be thread-local, so don't worry about those.
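To illustrate what "thread-local" means here, the sketch below uses the standard library's threading.local. This is not Pylons code; request_state, handle and results are names invented for the example, and each thread stands in for one request.

```python
import threading

request_state = threading.local()  # each thread sees its own attributes
results = {}                       # shared dict, each thread writes its own key

def handle(user):
    # Values set here are invisible to every other thread.
    request_state.user = user
    results[user] = request_state.user

threads = [threading.Thread(target=handle, args=(name,))
           for name in ("alice", "bob")]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The main thread never set request_state.user, so it has no such attribute.
main_has_user = hasattr(request_state, "user")

print(results, main_has_user)
```

A plain class attribute or module global in the same position would be shared by all threads, which is the situation described in the question.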
