Python : serialise class hierarchy - python

I have to serialise a dynamically created class hierarchy. And a bunch of objects - instances of the latter classes.
Python pickle is not much help; its wiki says "Classes ... cannot be pickled". Or there may be some trick that I cannot figure out.
Performance requirement:
Deserialization should be pretty fast, because the serialised stuff serves as a cache and should save me the work of creating the same class hierarchy.
Details:
classes are created dynamically using type and sometimes meta-classes.

If you provide a custom object.__reduce__() method I believe you can still use pickling.
Normally, when pickling, the class import path is stored, plus instance state. On unpickling, the class is imported, and a new instance is created using the stored state. This is why pickling cannot work with dynamic classes out of the box: there is nothing to import.
The object.__reduce__() method lets you store a different instance factory. The callable returned by this function is stored (again by import path), and called with specified arguments to produce an instance. This instance is then used to apply state to, in the same way a regular instance would be unpickled:
def class_factory(name):
    return globals()[name]()
class SomeDynamicClass(object):
    def __reduce__(self):
        return (class_factory, (type(self).__name__,), self.__dict__)
Here __reduce__ returns a function, the arguments for the function, and the instance state.
All you need to do then is provide the right arguments to the factory function so that it can recreate the class and return an instance of it. The factory will be used instead of importing the class directly.
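A fuller round trip might look like the sketch below. It assumes each dynamic class can be rebuilt from its name plus a tuple of field names; `make_class`, `instance_factory`, the `fields` attribute and the `_class_cache` registry are hypothetical names used for illustration, not part of the answer above.

```python
import pickle

_class_cache = {}  # hypothetical registry so each dynamic class is built only once


def make_class(name, fields):
    """Create (or fetch) a dynamic class; `fields` is illustrative metadata."""
    key = (name, fields)
    if key not in _class_cache:
        _class_cache[key] = type(name, (object,), {
            "fields": fields,
            "__reduce__": _reduce_instance,
        })
    return _class_cache[key]


def instance_factory(name, fields):
    """Module-level factory; pickle stores it by import path."""
    cls = make_class(name, fields)
    return cls.__new__(cls)  # empty instance, state is applied afterwards


def _reduce_instance(self):
    # Factory, arguments for the factory, and the instance state.
    cls = type(self)
    return (instance_factory, (cls.__name__, cls.fields), self.__dict__)


Point = make_class("Point", ("x", "y"))
p = Point()
p.x, p.y = 1, 2

q = pickle.loads(pickle.dumps(p))
print(type(q).__name__, q.x, q.y)
```

Only the factory has to be importable on the loading side; the class itself is recreated (or looked up) when the factory runs.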

Classes are normal python objects, so, in theory, should be picklable, if you provide __reduce__ (or implement other pickle protocol methods) for them. Try to define __reduce__ on their metaclass.
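One related route, sketched here as an assumption rather than as what the answer above literally describes, is to register a reducer for the metaclass through the standard `copyreg` module instead of writing a `__reduce__` method. `DynamicMeta`, `rebuild_class` and `reduce_class` are hypothetical names, and the sketch only covers classes whose namespace holds picklable values.

```python
import copyreg
import pickle


class DynamicMeta(type):
    """Hypothetical metaclass shared by all dynamically created classes."""


def rebuild_class(name, bases, namespace):
    """Module-level helper; pickle stores it by import path."""
    return DynamicMeta(name, bases, namespace)


def reduce_class(cls):
    # Skip the descriptors that type() recreates itself and that cannot be pickled.
    namespace = {k: v for k, v in vars(cls).items()
                 if k not in ("__dict__", "__weakref__")}
    return rebuild_class, (cls.__name__, cls.__bases__, namespace)


# Tell pickle how to reduce every class whose metaclass is DynamicMeta.
copyreg.pickle(DynamicMeta, reduce_class)

Shape = DynamicMeta("Shape", (object,), {"sides": 0})
clone = pickle.loads(pickle.dumps(Shape))
print(clone.__name__, clone.sides, clone is Shape)
```

Methods stored in the namespace are plain functions, which pickle still saves by import path, so dynamically generated functions would need their own handling.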

Related

Verifying that an object instance complies with ABC in Python

I have an API that receives a serialized representation of an object that I expect to comply with a particular interface (known at development time). The serialized data I receive includes details that are used to create implementations of the methods/properties of this interface, so the actual object gets constructed at runtime; however, method names, signatures, property types, etc. are expected to match those known from the interface at development time. I would like to be able to construct this object at runtime, and then verify interface compliance, preferably failing immediately once an invalid object is constructed, not just when I try to invoke a method that's not there.
I am new to Python, so I am not sure if there is an idiomatic way of doing such a check. I have investigated using Abstract Base Classes, and annotating my constructed object with such a class. Using annotations is convenient during development time because I can get intellisense in VSCode, but they are not used to verify that my constructed object implements the ABC correctly at runtime - for example, when it is passed into a method as a parameter, like this:
def my_method(self, generated_object: MyABC):
Is there another approach to doing what I have described (casting/coercing to the ABC, or perhaps using a different language feature)? Or is my best bet to implement my own validator that will compare the methods/properties on the constructed object vs those on the ABC?
import abc
class Base(abc.ABC):
    @abc.abstractmethod
    def i_require_this(self):
        pass
class Concrete(Base):
    def __init__(self):
        return
concrete = Concrete()
TypeError: Can't instantiate abstract class Concrete with abstract methods i_require_this
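For an object that is assembled at runtime, the same check can be triggered by building the class with `type()` on top of the ABC and instantiating it right away. This is a minimal sketch; `MyABC`, `run` and `build_object` are hypothetical names, and the check only covers the presence of the abstract names, not their signatures or property types.

```python
import abc


class MyABC(abc.ABC):
    @abc.abstractmethod
    def run(self):
        """Every generated object must provide run()."""


def build_object(methods):
    """Create a class from a dict of callables and instantiate it at once."""
    generated = type("Generated", (MyABC,), dict(methods))
    return generated()  # raises TypeError if an abstract method is missing


ok = build_object({"run": lambda self: "done"})
print(ok.run(), isinstance(ok, MyABC))

try:
    build_object({})  # no run() supplied
except TypeError as exc:
    print("rejected:", exc)
```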

Why must instance variables be defined inside of methods?

Why must instance variables be defined inside of methods? In other words why must self only be used to define new variables inside of methods in a class. Why can't you define variables using self as part of the class, but outside of methods.
"Instance variables are those variables for which each class object has its own copy" - this definition doesn't say anything about methods. So, given that the definition doesn't mention methods, why can't I define an instance variable (in other words use self to define a new variable) inside of a class, but outside of a method?
Python requires the object reference (implicit or explicit this in Java, for example) to be explicit. Inside methods -- bound functions -- the first param in the function definition is the instance. (This is conventionally called self but you can use any name.)
If you define
class C:
    x = 1
there is no self reference, unlike, e.g. Java, where this is implicit.
Because the mechanisms which Python uses to deal with OOP are very simple. There's no special syntax to define classes really; the class keyword is a very thin layer over what amounts to creating a dict. Everything you define inside a class Foo: block basically ends up as the contents of Foo.__dict__. So there's no syntax to define attributes of the instance resulting from calling Foo(). You add instance attributes simply by attaching them to the object you get from calling Foo(), which is self in __init__ or other instance methods.
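A small sketch of where each name ends up (the class and attribute names are only illustrative):

```python
class Foo:
    x = 1                 # defined in the class block: stored on the class

    def __init__(self):
        self.y = 2        # attached to the instance that __init__ receives


f = Foo()
print("x" in Foo.__dict__, "x" in f.__dict__)
print("y" in Foo.__dict__, "y" in f.__dict__)
print("__init__" in Foo.__dict__)
```

`f.x` still works because attribute lookup falls back to the class when the instance has no entry of that name.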
To answer that, you need to know a little about how the Python interpreter works.
In general, every class and every method definition is a separate object.
When you call a method, the class instance is passed as the first parameter to the method. That is how the method knows which instance it is running on (and therefore where to attach instance variables).
This, however, only applies to instance methods.
Of course you can also create class methods with @classmethod; these take the class as their first argument instead of an instance and therefore cannot be used to create variables on self.
Why must instance variables be defined inside of methods?
They don't. You can define them from anywhere, as long as you have an instance (of a mutable type):
class Foo(object):
    pass
f = Foo()
f.bar = 42
print(f.bar)
In other words why must self only be used to define new variables inside of methods in a class. Why can't you define variables using self as part of the class, but outside of methods.
self (which is only a naming convention, there's absolutely nothing magical here) is used to represent the current instance. How could you use it at the class block's top level, where you don't have any instance at all (and not even the class itself, FWIW)?
Defining the class "members" at the class top-level is mostly a static languages thing, where "objects" are mainly (technically) structs (C style structs, or Pascal style records if you prefer) with a statically defined memory structure.
Python is a dynamic language, which instead uses dicts as supporting data structure, so someobj.attribute is usually (minus computed attributes etc) resolved as someobj.__dict__["attribute"] (and someobj.attribute = value as someobj.__dict__["attribute"] = value).
So 1/ it doesn't NEED to have a fixed, explicitly defined data structure, and 2/ yet it DOES need to have an instance in the end to set an attribute on it.
Note that you can force a class to use a fixed memory structure (instead of a plain dict) using __slots__, but you will still need to set the values from within a method (canonically __init__, which exists for this very reason: initializing the instance's attributes).
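Both points can be seen in a short sketch (the class names `Plain` and `Slotted` are only illustrative):

```python
class Plain:
    pass


p = Plain()
p.bar = 42                  # same effect as p.__dict__["bar"] = 42
p.__dict__["baz"] = 7       # visible afterwards as a normal attribute
print(p.__dict__, p.baz)


class Slotted:
    __slots__ = ("x",)      # fixed layout, no per-instance __dict__

    def __init__(self, x):
        self.x = x          # values are still set on an instance


s = Slotted(1)
try:
    s.y = 2                 # "y" is not declared in __slots__
except AttributeError as exc:
    print("rejected:", exc)
```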

Questions related to classes

I have a problem understanding some concepts of data structures in Python, in the following code.
class Stack(object): #1
    def __init__(self): #2
        self.items = []
    def isEmpty(self):
        return self.items == []
    def push(self, item):
        self.items.append(item)
    def pop(self):
        return self.items.pop()
    def peak(self):
        return self.items[len(self.items)-1]
    def size(self):
        return len(self.items)
s = Stack()
s.push(3)
s.push(7)
print(s.peak())
print(s.size())
s.pop()
print(s.size())
print(s.isEmpty())
I don't understand what is this object argument
I replaced it with (obj) and it generated an error, why?
I tried to remove it and it worked perfectly, why?
Why do I have __init__ to set a constructor?
self is an argument, but how does it get passed? And which object does it represent, the class itself?
Thanks.
object is a class, from which class Stack inherits. There is no
class obj, hence error. However, you can define a class that does
not inherit from anything (at least, in Python 2).
self represents an object on which the method is called; for
example when you do s.pop(), self inside method pop refers to
the same object as s - it is not a class, it is an instance of the class.
1
object here is the class your new class inherits from. There is already a base class named object, but there is no class named obj, which is why replacing object with obj would cause an error. Anyway, in your example code it is not needed at all, since all classes in Python 3 implicitly extend the object class.
2
__init__ is the constructor of the object and self there represents the object that you are creating itself, not the class, just like in the other methods you made.
Point 1:
Some history required here... Originally Python had two distinct kinds of types, those implemented in C (whether in the stdlib or C extensions) and those implemented in Python with the class statement. Python 2.2 introduced a new object model (known as "new-style classes") to unify both, but kept the "classic" (aka "old-style") model for compatibility. This new model also introduced quite a lot of goodies like support for computed attributes, cooperative super calls via the super() object, metaclasses etc, all of which come from the builtin object base class.
So in Python 2.2.x to 2.7.x, you can either create a new-style class by inheriting from object (or any subclass of object) or an old-style one by not inheriting from object (nor - obviously - any subclass of object).
In Python 2.7, since your example Stack class does not use any feature of the new object model, it works equally well as an 'old-style' or a 'new-style' class, but try to add a custom metaclass or a computed attribute and it will break in one way or another.
Python 3 totally removed support for old-style classes, and object is the default base class if you don't explicitly specify one, so whatever you do your class WILL inherit from object and will work equally well with or without an explicit parent class.
You can read this for more details.
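In Python 3 this is easy to check; the class names below are only illustrative:

```python
class Implicit:
    pass


class Explicit(object):
    pass


# Both classes have object as their only base in Python 3.
print(Implicit.__bases__, Explicit.__bases__)
print(issubclass(Implicit, object), isinstance(Implicit(), object))
```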
Point 2.1 - I'm not sure I understand the question actually, but anyway:
In Python, objects are not fixed C-struct-like structures with a fixed set of attributes, but dict-like mappings (well, there are exceptions but let's ignore them for the moment). The set of attributes of an object is composed of the class attributes (methods mainly, but really any name defined at the class level) that are shared between all instances of the class, and instance attributes (belonging to a single instance) which are stored in the instance's __dict__. This implies that you don't define the set of instance attributes at the class level (like in Java or C++ etc), but set them on the instance itself.
The __init__ method is there so you can make sure each instance is initialised with the desired set of attributes. It's kind of an equivalent of a Java constructor, but instead of being only used to pass arguments at instantiation, it's also responsible for defining the set of instance attributes for your class (which you would, in Java, define at the class level).
Point 2.2 : self is the current instance of the class (the instance on which the method is called), so if s is an instance of your Stack class, s.push(42) is equivalent to Stack.push(s, 42).
Note that the argument doesn't have to be called self (which is only a convention, albeit a very strong one), the important part is that it's the first argument.
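A short sketch of both statements, using a trimmed-down Stack whose first parameter is deliberately named `this` instead of `self`:

```python
class Stack:
    def __init__(this):          # any name works for the first parameter
        this.items = []

    def push(this, item):
        this.items.append(item)


s = Stack()
s.push(42)            # the instance is passed implicitly
Stack.push(s, 43)     # the same call with the instance passed explicitly
print(s.items)
```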
How s gets passed as self when calling s.push(42) is a bit intricate at first, but it is an interesting example of how to use a small feature set to build a larger one. You can find a detailed explanation of the whole mechanism here, so I won't bother reposting it.
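In very short form, and only as a sketch of the mechanism: functions are descriptors, so looking one up through an instance calls its `__get__` method, which returns a bound method that remembers the instance.

```python
class Stack:
    def __init__(self):
        self.items = []

    def push(self, item):
        self.items.append(item)


s = Stack()
plain_function = Stack.__dict__["push"]      # what the class block stored
bound = plain_function.__get__(s, Stack)     # what s.push does behind the scenes
bound(42)
print(bound.__self__ is s, bound.__func__ is plain_function, s.items)
```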

What's the best way to extend the functionality of factory-produced classes outside of the module in python?

I've been reading lots of previous SO discussions of factory functions, etc. and still don't know what the best (pythonic) approach is to this particular situation. I'll admit up front that I am imposing a somewhat artificial constraint on the problem in that I want my solution to work without modifying the module I am trying to extend: I could make modifications to it, but let's assume that it must remain as-is because I'm trying to understand best practice in this situation.
I'm working with the http://pypi.python.org/pypi/icalendar module, which handles parsing from and serializing to the Icalendar spec (hereafter ical). It parses the text into a hierarchy of dictionary-like "component" objects, where every "component" is an instance of a trivial derived class implementing the different valid ical types (VCALENDAR, VEVENT, etc.) and they are all spit out by a recursive factory from the common parent class:
class Component(...):
    @classmethod
    def from_ical(cls, ...)
I have created a 'CalendarFile' class that extends the ical 'Calendar' class and includes a generator function of its own:
class CalendarFile(Calendar):
    @classmethod
    def from_file(cls, ics):
which opens a file (ics) and passes it on:
instance = cls.from_ical(f.read())
It initializes and modifies some other things in instance and then returns it. The problem is that instance ends up being a Calendar object instead of a CalendarFile object, in spite of cls being CalendarFile. Short of going into the factory function of the ical module and fiddling around in there, is there any way to essentially "recast" that object as a 'CalendarFile'?
The alternatives (again without modifying the original module) that I have considered are:
make the CalendarFile class a has-a Calendar class (each instance creates its own internal instance of a Calendar object), but that seems methodically stilted.
fiddle with the returned object to give it the methods it needs (I know there's a term for creating a customized object but it escapes me).
make the additional methods into functions and just have them work with instances of Calendar.
or perhaps the answer is that I shouldn't be trying to subclass from a module in the first place, and this type of code belongs in the module itself.
Again I'm trying to understand what the "best" approach is and also learn if I'm missing any alternatives. Thanks.
Normally, I would expect an alternative constructor defined as a classmethod to simply call the class's standard constructor, transforming the arguments that it receives into valid arguments to the standard constructor.
>>> class Toy(object):
...     def __init__(self, x):
...         self.x = abs(x)
...     def __repr__(self):
...         return 'Toy({})'.format(self.x)
...     @classmethod
...     def from_string(cls, s):
...         return cls(int(s))
...
>>> Toy.from_string('5')
Toy(5)
In most cases, I would strongly recommend something like this approach; this is the gold standard for alternative constructors.
But this is a special case.
I've now looked over the source, and I think the best way to add a new class is to edit the module directly; otherwise, scrap inheritance and take option one (your "has-a" option). The different classes are all slightly differentiated versions of the same container class -- they shouldn't really even be separate classes. But if you want to add a new class in the idiom of the code as it is written, you have to add a new class to the module itself. Furthermore, from_iter is deceptively named; it's not really a constructor at all. I think it should be a standalone function. It builds a whole tree of components linked together, and the code that builds the individual components is buried in a chain of calls to various factory functions that also should be standalone functions but aren't. IMO much of that code ought to live in __init__ where it would be useful to you for subclassing, but it doesn't.
Indeed, none of the subclasses of Component even add any methods. By adding methods to your subclass of Calendar, you're completely disregarding the actual idiom of the code. I don't like its idiom very much but by disregarding that idiom, you're making it even worse. If you don't want to modify the original module, then forget about inheritance here and give your object a has-a relationship to Calendar objects. Don't modify __class__; establish your own OO structure that follows standard OO practices.
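A has-a version could be sketched roughly as follows. The `Calendar` class here is a stand-in written for this example, not the real icalendar API; it only mimics the behaviour described in the question, where the factory returns a plain Calendar whatever class it is called on. The `__getattr__` delegation is an optional convenience, not something the answer above prescribes.

```python
import os
import tempfile


class Calendar:
    """Stand-in for the library class; this is NOT the real icalendar API."""

    def __init__(self, text):
        self.text = text

    @classmethod
    def from_ical(cls, text):
        # Mimics the question: a plain Calendar comes back regardless of cls.
        return Calendar(text)

    def to_ical(self):
        return self.text


class CalendarFile:
    """Has-a wrapper: owns a Calendar instead of inheriting from it."""

    def __init__(self, calendar, path):
        self._calendar = calendar
        self.path = path

    @classmethod
    def from_file(cls, path):
        with open(path) as f:
            return cls(Calendar.from_ical(f.read()), path)

    def __getattr__(self, name):
        # Called only when normal lookup fails; forward to the wrapped object.
        if name == "_calendar":
            raise AttributeError(name)
        return getattr(self._calendar, name)


with tempfile.NamedTemporaryFile("w", suffix=".ics", delete=False) as tmp:
    tmp.write("BEGIN:VCALENDAR\nEND:VCALENDAR\n")

cal_file = CalendarFile.from_file(tmp.name)
print(type(cal_file).__name__, cal_file.to_ical() == cal_file.text)
os.remove(tmp.name)
```

The wrapper is always a CalendarFile, whatever the library's factory returns, and the extra methods live on it rather than on a subclass the library never creates.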

Python: Pickle derived classes as if they were an instance of the base class

I want to define a base class so that when derived class instances are pickled, they are pickled as if they are instances of the base class. This is because the derived classes may exist on the client side of the pickling but not on the server side, but this is not important to the server since it only needs information from the base class. I don't want to have to dynamically create classes for every client.
The base class is simply an "object handle" which contains an ID, with methods defined on the server, but I would like the client to be able to subclass the server classes and define new methods (which would only be seen by the client, but that doesn't matter).
I believe you can do it by giving the object a __reduce__ method, returning a tuple, the first part of which should be BaseClass.__new__ (this will be called when loading the object in unpickling). See the pickle documentation (Python 2, Python 3) for the full details. I haven't attempted this.
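A minimal sketch of that idea follows. It uses the base class itself as the callable rather than BaseClass.__new__, which is a simplification on my part; `Handle` and `ClientHandle` are hypothetical names, and any state added by the subclass is deliberately dropped.

```python
import pickle


class Handle:
    """Base class known to both client and server."""

    def __init__(self, id):
        self.id = id

    def __reduce__(self):
        # Always name the base class, whatever the actual subclass is.
        return (Handle, (self.id,))


class ClientHandle(Handle):
    """Client-only subclass; the unpickling side never needs to import it."""

    def describe(self):
        return "handle #{}".format(self.id)


restored = pickle.loads(pickle.dumps(ClientHandle(2)))
print(type(restored).__name__, restored.id)
```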
Depending on what you're doing, it might be easier to use a simpler serialisation format like JSON, and have code on each side to reconstruct the relevant objects.
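With JSON, each side only exchanges plain data and rebuilds whatever class it knows about. A rough sketch, where the `"type"` tag and the helper names are assumptions made for this example:

```python
import json


class Handle:
    def __init__(self, id):
        self.id = id


def dump_handle(handle):
    """Serialise only what the other side needs: a type tag and the ID."""
    return json.dumps({"type": "Handle", "id": handle.id})


def load_handle(payload):
    """Rebuild a base-class Handle from the JSON payload."""
    data = json.loads(payload)
    if data.get("type") != "Handle":
        raise ValueError("unexpected payload type: {!r}".format(data.get("type")))
    return Handle(data["id"])


class ClientHandle(Handle):
    """Client-only subclass; it never appears in the payload."""


payload = dump_handle(ClientHandle(2))
server_side = load_handle(payload)
print(payload, type(server_side).__name__, server_side.id)
```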
You can change an object's class dynamically in Python:
import pickle  # cPickle in Python 2
class Foo(object):
    def __init__(self):
        self.id = 1
class Bar(Foo):
    def derived_class_method(self): pass
bar = Bar()
bar.id = 2
bar.__class__ = Foo # changes `bar`'s class to Foo
bar_pickled = pickle.dumps(bar)
bar2 = pickle.loads(bar_pickled)
bar.__class__ = Bar # reset `bar`'s class to Bar
print(repr(bar2))
# <__main__.Foo object at 0xb76b08ec>
print(bar2.id)
# 2
I'm not sure using this is the best design decision, however. I like Thomas K's idea of using JSON.
