I have a class whose instances need to format output as instructed by the user. There's a default format, which can be overridden. I implemented it like this:
class A:
    def __init__(self, params):
        # ...
        # by default, print all float values as percentages with 2 decimals
        self.format_functions = {float: lambda x: '{:.2%}'.format(x)}

    def __str__(self):
        # uses self.format_functions to format output
        # ...
a = A(params)
print(a)  # uses default output formatting

# overriding default output formatting:
# floats printed as percentages with 3 decimal digits; bools printed as Y / N
a.format_functions = {float: lambda x: '{:.3%}'.format(x),
                      bool: lambda x: 'Y' if x else 'N'}
print(a)
Is this OK? Let me know if there is a better way to design this.
Unfortunately, I also need to pickle instances of this class. But only functions defined at the top level of a module can be pickled; lambda functions are unpicklable, so my format_functions instance attribute breaks pickling.
I tried rewriting this to use a class method instead of lambda functions, but still no luck for the same reason:
class A:
    @classmethod
    def default_float_format(cls, x):
        return '{:.2%}'.format(x)

    def __init__(self, params):
        # ...
        # by default, print all float values as percentages with 2 decimals
        self.format_functions = {float: self.default_float_format}

    def __str__(self):
        # uses self.format_functions to format output
        # ...

a = A(params)
pickle.dumps(a)  # Can't pickle <class 'method'>: attribute lookup builtins.method failed
Note that pickling here doesn't work even if I don't override the defaults; just the fact that I assigned self.format_functions = {float : self.default_float_format} breaks it.
What to do? I'd rather not pollute the namespace and break encapsulation by defining default_float_format at the module level.
Incidentally, why in the world does pickle create this restriction? It certainly feels like a gratuitous and substantial pain to the end user.
To pickle class instances or functions (and therefore methods), Python's pickle depends on their names being available as module-level globals. The reference to the method in your dictionary points to a name that is not available in the global namespace - better said, the "module namespace".
You could circumvent that by customizing the pickling of your class with the __getstate__ and __setstate__ methods. But since the formatting function does not depend on any information from the object or the class itself (and even if some formatting function did, you could pass that in as parameters), I think you would be better off defining a function outside of the class scope.
This does work (Python 3.2):
import pickle

def default_float_format(x):
    return '{:.2%}'.format(x)

class A:
    def __init__(self, params):
        # ...
        # by default, print all float values as percentages with 2 decimals
        self.format_functions = {float: default_float_format}

    def __str__(self):
        # uses self.format_functions to format output
        pass

a = A(1)
pickle.dumps(a)
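For completeness, here is a minimal sketch of the __getstate__/__setstate__ route mentioned above: drop the unpicklable functions when pickling, and rebuild the defaults when unpickling. The rebuild logic is an assumption here; any user overrides would be lost unless you also persist a picklable description of them.
import pickle

class A:
    def __init__(self):
        # by default, print all float values as percentages with 2 decimals
        self.format_functions = {float: lambda x: '{:.2%}'.format(x)}

    def __getstate__(self):
        # copy the instance dict, but leave the unpicklable functions out
        state = self.__dict__.copy()
        del state['format_functions']
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # rebuild the default formatters on unpickling
        self.format_functions = {float: lambda x: '{:.2%}'.format(x)}

a = A()
restored = pickle.loads(pickle.dumps(a))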
If you use the dill module, either of your two approaches will just "work" as is. dill can pickle lambda as well as instances of classes and also class methods.
No need to pollute the namespace and break encapsulation, as you said you didn't want to do… but the other answer does.
dill is basically ten years or so worth of finding the right copy_reg function that registers how to serialize the majority of objects in standard python. Nothing special or tricky, it just takes time. So why doesn't pickle do this for us? Why does pickle have this restriction?
Well, if you look at the pickle docs, the answer is there:
https://docs.python.org/2/library/pickle.html#what-can-be-pickled-and-unpickled
Basically: Functions and classes are pickled by reference.
This means pickle does not work on objects defined in __main__, and it also doesn't work on many dynamically modified objects. dill registers __main__ as a module, so it has a valid namespace. dill also gives you the option not to pickle by reference, so you can serialize dynamically modified objects… and class instances, class methods (bound and unbound), and so on.
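A minimal sketch of the dill route, assuming the lambda-based class from the question:
import dill

class A:
    def __init__(self):
        # the lambda that plain pickle chokes on
        self.format_functions = {float: lambda x: '{:.2%}'.format(x)}

a = A()
restored = dill.loads(dill.dumps(a))  # works: dill serializes the lambda by value
print(restored.format_functions[float](0.5))  # 50.00%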
Related
If someone writes a class in Python and doesn't define their own __repr__() method, a default one is provided for them. Suppose we want to write a function with the same (or similar) behavior as that default __repr__(), and we want it to behave that way even when the class's actual __repr__() has been overloaded. How might we do it?
class DemoClass:
    def __init__(self):
        self.var = 4

    def __repr__(self):
        return str(self.var)

def true_repr(x):
    # [magic happens here]
    s = "I'm not implemented yet"
    return s

obj = DemoClass()
print(obj.__repr__())
print(true_repr(obj))
Desired Output:
print(obj.__repr__()) prints 4, but print(true_repr(obj)) prints something like:
<__main__.DemoClass object at 0x0000000009F26588>
You can use object.__repr__(obj). This works because the default repr behavior is defined in object.__repr__.
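For example, with the DemoClass from the question:
obj = DemoClass()
print(repr(obj))             # 4 (the overloaded __repr__)
print(object.__repr__(obj))  # <__main__.DemoClass object at 0x...> (the default)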
Note, the best answer is probably just to use object.__repr__ directly, as the others have pointed out. But one could implement that same functionality roughly as:
>>> def true_repr(x):
... type_ = type(x)
... module = type_.__module__
... qualname = type_.__qualname__
... return f"<{module}.{qualname} object at {hex(id(x))}>"
...
So, given some class A whose __repr__ has been overloaded to return 'hahahahaha':
>>> A()
hahahahaha
>>> true_repr(A())
'<__main__.A object at 0x106549208>'
>>>
Typically we can use object.__repr__ for that, but this will use the "object" repr for every item, so:
>>> object.__repr__(4)
'<int object at 0xa6dd20>'
since an int is an object, just with __repr__ overridden.
If you want to go up one level of overwriting, we can use super(..):
>>> super(type(4), 4).__repr__() # going up one level
'<int object at 0xa6dd20>'
For an int that thus again means we will print <int object at ...>, but if we subclass int, it will use the __repr__ of int again:
class special_int(int):
    def __repr__(self):
        return 'Special int'
Then it will look like:
>>> s = special_int(4)
>>> super(type(s), s).__repr__()
'4'
What we do here is create a proxy object with super(..). super will walk the method resolution order (MRO) of the object and try to find the first class (from a superclass of s) that overrides the function. With single inheritance that is the closest parent that overrides the function, but if multiple inheritance is involved, it is more tricky. We thus select the __repr__ of that parent, and call that function.
This is also a rather unusual application of super, since usually the class (here type(s)) is fixed and does not depend on the type of s itself; otherwise, multiple such super(..) calls would result in an infinite loop.
But it is usually a bad idea to break overriding anyway. The reason a programmer overrides a function is to change the behavior. Not respecting that can occasionally yield a useful function, but frequently it means the code's contracts are no longer satisfied. For example, a programmer who overrides __eq__ will usually also override __hash__; if you take the hash from one class but the real __eq__ from another, things will start breaking.
Calling magic functions directly is also frequently seen as an antipattern, so it is better to avoid that as well.
Here's a simple class created declaratively:
class Person:
    def say_hello(self):
        print("hello")
And here's a similar class, but it was defined by invoking the metaclass manually:
def say_hello(self):
    print("sayolala")

say_hello.__qualname__ = 'Person.say_hello'
TalentedPerson = type('Person', (), {'say_hello': say_hello})
I'm interested to know whether they are indistinguishable. Is it possible to detect such a difference from the class object itself?
>>> def was_defined_declaratively(cls):
... # dragons
...
>>> was_defined_declaratively(Person)
True
>>> was_defined_declaratively(TalentedPerson)
False
This should not matter, at all. Even if we dig for more attributes that differ, it should be possible to inject these attributes into the dynamically created class.
Now, even without the source file around (from which things like inspect.getsource can find their way, but see below), class body statements should have a corresponding code object that is run at some point. The dynamically created class won't have a code body (although if, instead of calling type(...), you call types.new_class, you can give the dynamic class a custom code object as well). So, as for my first statement: it should be possible to render both classes indistinguishable.
As for locating the code object without relying on the source file: other than through inspect.getsource, it can be reached through a method's __code__ attribute, which records co_filename and co_firstlineno (I suppose one would then have to parse the file and locate the class statement above co_firstlineno).
And yes, there it is:
given a module, you can use module.__loader__.get_code('full.path.to.module') - this will return a code object. That object has a co_consts attribute, a sequence of all constants compiled in that module - among them are the code objects for the class bodies themselves. And these have the line numbers, and code objects for the nested declared methods as well.
So, a naive implementation could be:
import sys, types

def was_defined_declarative(cls):
    module_name = cls.__module__
    module = sys.modules[module_name]
    module_code = module.__loader__.get_code(module_name)
    return any(
        code_obj.co_name == cls.__name__
        for code_obj in module_code.co_consts
        if isinstance(code_obj, types.CodeType)
    )
That covers the simple cases. If the class body might be inside another function, or nested inside another class body, you have to search the .co_consts attribute of every code object in the file recursively. The same applies if you find it safer to check attributes beyond cls.__name__ to be sure you got the right class.
And again, while this will work for "well behaved" classes, it is possible to dynamically create all these attributes if needed - but that would ultimately require replacing the code object for the module in sys.modules - it starts to get rather more cumbersome than simply providing a __qualname__ for the methods.
Update
This version compares all strings defined inside all methods of the candidate class. It works with the given example classes - more accuracy can be achieved by comparing other class members, such as class attributes, and other method attributes, such as variable names and possibly even bytecode. (For some reason, the code objects for the methods differ between the module's code object and the class body, even though code objects are supposed to be immutable.)
I will leave the implementation above, which only compares the class names, as it should be better for understanding what is going on.
def was_defined_declarative(cls):
    module_name = cls.__module__
    module = sys.modules[module_name]
    module_code = module.__loader__.get_code(module_name)
    cls_methods = set(obj for obj in cls.__dict__.values()
                      if isinstance(obj, types.FunctionType))
    cls_meth_strings = [string for method in cls_methods
                        for string in method.__code__.co_consts
                        if isinstance(string, str)]
    for candidate_code_obj in module_code.co_consts:
        if not isinstance(candidate_code_obj, types.CodeType):
            continue
        if candidate_code_obj.co_name != cls.__name__:
            continue
        candidate_meth_strings = [string for method_code in candidate_code_obj.co_consts
                                  if isinstance(method_code, types.CodeType)
                                  for string in method_code.co_consts
                                  if isinstance(string, str)]
        if candidate_meth_strings == cls_meth_strings:
            return True
    return False
It is not possible to detect such a difference at runtime with Python.
You can check the source files with a third-party tool, but not from within the language itself: no matter how you define your classes, they are reduced to objects that the interpreter knows how to manage.
Everything else is syntactic sugar, and it dies at the preprocessing step, before the interpreter operates on the resulting objects.
Metaprogramming as a whole is a technique that brings you closer to the compiler/interpreter's work, revealing some of the type's traits and giving you the freedom to work on the type with code.
It is possible — somewhat.
inspect.getsource(TalentedPerson) will fail with an OSError, whereas it will succeed with Person. This only works, though, if you don't have a class of that name in the file where it was defined:
if your file contains both of these definitions, and TalentedPerson also believes it is Person, then inspect.getsource will simply find Person's definition.
Obviously this relies on the source code still being around and findable by inspect — this won't work with compiled code, e.g. in the REPL, can be tricked, and is sort of cheating. The actual code objects don't differ AFAIK.
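A minimal sketch of that heuristic, with all of the caveats above (it is cheating, and it only works while the source is findable):
import inspect

def was_defined_declaratively(cls):
    try:
        inspect.getsource(cls)  # raises OSError when no source can be located
        return True
    except OSError:
        return False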
I have a lookup table like this:
lookup = {
    "<class 'Minority.mixin'>": report_minority,
    "<class 'Majority.mixin'>": report_majority,
}

def report(o):
    h = lookup[str(type(o))]
    h()
It looks awkward to me, because the key is precariously linked to how type() represents a class as a string. If one day Python changes the way it represents class types as strings, all code like this breaks. So I want some advice from the pros: is this kind of key considered good or bad? Thanks.
The question would more be why are you doing this in the first place? What advantage do you think using strings has?
Class objects are hashable, so you can use Minority.mixin and Majority.mixin directly as keys. When you do so, you ensure the key is always the exact same object: provided your classes are globals in their respective modules, they are effectively singletons in your program. Moreover, there is no chance of accidental confusion when you later refactor your code, rename modules, and end up with a different type that has that exact repr() output.
So unless you have a specific usecase where you can't use the class directly, you should not be using the string representations.
(And even if you were to generate classes with a factory function, using a base class with isinstance checks, or extracting a base class from the MRO, would probably be preferable.)
So, for your usecase, stick to:
lookup = {
    Minority.mixin: report_minority,
    Majority.mixin: report_majority,
}

def report(o):
    h = lookup[type(o)]
    h()
Next, if you ensure that report_minority and report_majority are functions (and not methods, for example), you can use functools.singledispatch() and dispense with your mapping altogether:
from functools import singledispatch

@singledispatch
def report(o):
    raise ValueError('Unhandled type {}'.format(type(o)))

@report.register(Minority.mixin)
def report_minority(o):
    # handle a Minority instance
    ...

@report.register(Majority.mixin)
def report_majority(o):
    # handle a Majority instance
    ...
Note that this won't work with methods, because dispatch is based on the type of the first argument, and for methods that argument is always self.
Single dispatch also handles subclasses much better than your str-based mapping, or even the direct class mapping.
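A quick illustration of the subclass point, using made-up stand-in classes (Base and Child are hypothetical names):
from functools import singledispatch

class Base: pass
class Child(Base): pass

@singledispatch
def report(o):
    raise ValueError('Unhandled type {}'.format(type(o)))

@report.register(Base)
def _(o):
    print('handled as Base')

report(Child())  # prints 'handled as Base'; a plain type(o) dict lookup would raise KeyError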
Just remove the str from the lookup and from the dictionary. So
lookup = {
    Minority.mixin: report_minority,
    Majority.mixin: report_majority,
}

def report(o):
    h = lookup[type(o)]
    h()
This fixes your immediate concern and is fairly reasonable code. However, dispatching on the type is exactly what OO is for, so why not make those varying report functions methods on the o object, determined by its class? Then you could just write:
o.report()
and the correct variant would be obtained from the class.
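A minimal sketch of that approach, reusing the class names from the question (their real bodies are assumptions here):
class Minority:
    def report(self):
        print('reporting a Minority')

class Majority:
    def report(self):
        print('reporting a Majority')

for o in (Minority(), Majority()):
    o.report()  # each class supplies its own variant; no lookup table needed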
Take a look at this script. I don't think it's perfect, but it works, and I think it's more elegant than your solution:
#!/usr/bin/env python3

class Person:
    pass

def display_person():
    print("Person !")

class Car:
    pass

def display_car():
    print("Car !")

lookup = {
    Person: display_person,
    Car: display_car,
}

def report(o):
    lookup[o.__class__]()

report(Person())
report(Car())
Edit: modified for Python 3.
I want to clarify how variables are declared in Python.
I have seen variable declaration as
class writer:
    path = ""
sometimes, there is no explicit declaration but just initialization using __init__:
def __init__(self, name):
    self.name = name
I understand the purpose of __init__, but is it advisable to declare variable in any other functions?
How can I create a variable to hold a custom type?
class writer:
    path = ""  # string value
    customObj = ??
Okay, first things first.
There is no such thing as "variable declaration" or "variable initialization" in Python.
There is simply what we call "assignment", but should probably just call "naming".
Assignment means "this name on the left-hand side now refers to the result of evaluating the right-hand side, regardless of what it referred to before (if anything)".
foo = 'bar'  # the name 'foo' is now a name for the string 'bar'
foo = 2 * 3  # the name 'foo' stops being a name for the string 'bar',
             # and starts being a name for the integer 6, resulting from the multiplication
As such, Python's names (a better term than "variables", arguably) don't have associated types; the values do. You can re-apply the same name to anything regardless of its type, but the thing still has behaviour that's dependent upon its type. The name is simply a way to refer to the value (object). This answers your second question: You don't create variables to hold a custom type. You don't create variables to hold any particular type. You don't "create" variables at all. You give names to objects.
Second point: Python follows a very simple rule when it comes to classes, that is actually much more consistent than what languages like Java, C++ and C# do: everything declared inside the class block is part of the class. So, functions (def) written here are methods, i.e. part of the class object (not stored on a per-instance basis), just like in Java, C++ and C#; but other names here are also part of the class. Again, the names are just names, and they don't have associated types, and functions are objects too in Python. Thus:
class Example:
    data = 42
    def method(self): pass
Classes are objects too, in Python.
So now we have created an object named Example, which represents the class of all things that are Examples. This object has two user-supplied attributes (In C++, "members"; in C#, "fields or properties or methods"; in Java, "fields or methods"). One of them is named data, and it stores the integer value 42. The other is named method, and it stores a function object. (There are several more attributes that Python adds automatically.)
These attributes still aren't really part of the object, though. Fundamentally, an object is just a bundle of more names (the attribute names), until you get down to things that can't be divided up any more. Thus, values can be shared between different instances of a class, or even between objects of different classes, if you deliberately set that up.
Let's create an instance:
x = Example()
Now we have a separate object named x, which is an instance of Example. The data and method are not actually part of the object, but we can still look them up via x because of some magic that Python does behind the scenes. When we look up method, in particular, we will instead get a "bound method" (when we call it, x gets passed automatically as the self parameter, which cannot happen if we look up Example.method directly).
What happens when we try to use x.data?
When we examine it, it's looked up in the object first. If it's not found in the object, Python looks in the class.
However, when we assign to x.data, Python will create an attribute on the object. It will not replace the class' attribute.
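Concretely, continuing with the x instance from above:
print(x.data)        # 42: not found on the instance, so the class attribute is found
x.data = 1000        # assignment creates an attribute on the instance itself
print(x.data)        # 1000: the instance attribute now shadows the class attribute
print(Example.data)  # 42: the class attribute itself is unchanged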
This allows us to do object initialization. Python will automatically call the class' __init__ method on new instances when they are created, if present. In this method, we can simply assign to attributes to set initial values for that attribute on each object:
class Example:
    name = "Ignored"

    def __init__(self, name):
        self.name = name

    # rest as before
Now we must specify a name when we create an Example, and each instance has its own name. Python will ignore the class attribute Example.name whenever we look up the .name of an instance, because the instance's attribute will be found first.
One last caveat: modification (mutation) and assignment are different things!
In Python, strings are immutable. They cannot be modified. When you do:
a = 'hi '
b = a
a += 'mom'
You do not change the original 'hi ' string. That is impossible in Python. Instead, you create a new string 'hi mom', and cause a to stop being a name for 'hi ', and start being a name for 'hi mom' instead. We made b a name for 'hi ' as well, and after re-applying the a name, b is still a name for 'hi ', because 'hi ' still exists and has not been changed.
But lists can be changed:
a = [1, 2, 3]
b = a
a += [4]
Now b is [1, 2, 3, 4] as well, because we made b a name for the same thing that a named, and then we changed that thing. We did not create a new list for a to name, because Python simply treats += differently for lists.
This matters for objects because if you had a list as a class attribute, and used an instance to modify the list, then the change would be "seen" in all other instances. This is because (a) the data is actually part of the class object, and not any instance object; (b) because you were modifying the list and not doing a simple assignment, you did not create a new instance attribute hiding the class attribute.
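A minimal sketch of that pitfall (Holder is a made-up example class):
class Holder:
    items = []  # class attribute: a single list shared by every instance

a = Holder()
b = Holder()
a.items.append('x')  # mutation, not assignment: no instance attribute is created
print(b.items)       # ['x'] - the change is visible through every instance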
This might be six years late, but as of Python 3.5 you can give a hint about a variable's type like this:
variable_name: type_name
or with a type comment on an assignment:
variable_name = value  # type: type_name
This hint has no effect in the core Python interpreter, but many tools will use it to aid the programmer in writing correct code.
So in your case(if you have a CustomObject class defined), you can do:
customObj: CustomObject
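For example, assuming a CustomObject class exists, a checker such as mypy will flag a mismatched assignment even though CPython itself runs it happily:
class CustomObject:
    pass

customObj: CustomObject = CustomObject()  # OK
customObj = "oops"  # runs fine, but a type checker reports an incompatible assignment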
See PEP 484 and PEP 526 for more info.
There's no need to declare new variables in Python. If we're talking about variables in functions or modules, no declaration is needed. Just assign a value to a name where you need it: mymagic = "Magic". Variables in Python can hold values of any type, and you can't restrict that.
Your question specifically asks about classes, objects and instance variables though. The idiomatic way to create instance variables is in the __init__ method and nowhere else — while you could create new instance variables in other methods, or even in unrelated code, it's just a bad idea. It'll make your code hard to reason about or to maintain.
So for example:
class Thing(object):
    def __init__(self, magic):
        self.magic = magic
Easy. Now instances of this class have a magic attribute:
thingo = Thing("More magic")
# thingo.magic is now "More magic"
Creating variables in the namespace of the class itself leads to different behaviour altogether. It is functionally different, and you should only do it if you have a specific reason to. For example:
class Thing(object):
    magic = "Magic"
    def __init__(self):
        pass
Now try:
thingo = Thing()
Thing.magic = 1
# thingo.magic is now 1
Or:
class Thing(object):
    magic = ["More", "magic"]
    def __init__(self):
        pass

thing1 = Thing()
thing2 = Thing()
thing1.magic.append("here")
# thing1.magic AND thing2.magic is now ["More", "magic", "here"]
This is because the namespace of the class itself is different to the namespace of the objects created from it. I'll leave it to you to research that a bit more.
The take-home message is that idiomatic Python is to (a) initialise object attributes in your __init__ method, and (b) document the behaviour of your class as needed. You don't need full-blown Sphinx-level documentation for everything you ever write, but at least a few comments on any details you or someone else will need in order to pick it up later.
For scoping purpose, I use:
custom_object = None
Variables have scope, so yes, it is appropriate to have variables that are specific to your function. You don't always have to be explicit about their definition; usually you can just use them. Only if you want to do something specific to the type of the variable, like appending to a list, do you need to define it before you start using it. A typical example of this:
list = []  # (note that this shadows the built-in name 'list')
for i in stuff:
    list.append(i)
By the way, this is not really a good way to setup the list. It would be better to say:
list = [i for i in stuff] # list comprehension
...but I digress.
Your other question.
The custom object should be a class itself.
class CustomObject:  # always capitalize the class name... this is not syntax, just style.
    pass

customObj = CustomObject()
As of Python 3, you can explicitly declare variables by type.
For instance, to declare an integer one can do it as follows:
x: int = 3
or:
def f(x: int):
    return x
see this question for more detailed info about it:
Explicitly declaring a variable type in Python
Wishing to avoid a situation like this:
>>> class Point:
...     x = 0
...     y = 0
...
>>> a = Point()
>>> a.X = 4  # whoops, typo creates new attribute capital X
I created the following object to be used as a superclass:
class StrictObject(object):
    def __setattr__(self, item, value):
        if item in dir(self):
            object.__setattr__(self, item, value)
        else:
            raise AttributeError("Attribute " + item + " does not exist.")
While this seems to work, the python documentation says of dir():
Note: Because dir() is supplied primarily as a convenience for use at an interactive prompt, it tries to supply an interesting set of names more than it tries to supply a rigorously or consistently defined set of names, and its detailed behavior may change across releases. For example, metaclass attributes are not in the result list when the argument is a class.
Is there a better way to check if an object has an attribute?
Much better ways.
The most common way is "we're all consenting adults": you don't do any checking, and you leave it up to the user. Any checking you do makes the code less flexible in its use.
But if you really want to do this, there is __slots__, available on all classes in Python 3.x and on new-style classes in Python 2.x:
By default, instances of both old and new-style classes have a dictionary for attribute storage. This wastes space for objects having very few instance variables. The space consumption can become acute when creating large numbers of instances.
The default can be overridden by defining __slots__ in a new-style class definition. The __slots__ declaration takes a sequence of instance variables and reserves just enough space in each instance to hold a value for each variable. Space is saved because __dict__ is not created for each instance.
Without a __dict__ variable, instances cannot be assigned new variables not listed in the __slots__ definition. Attempts to assign to an unlisted variable name raises AttributeError. If dynamic assignment of new variables is desired, then add '__dict__' to the sequence of strings in the __slots__ declaration.
For example:
class Point(object):
    __slots__ = ("x", "y")

point = Point()
point.x = 5  # OK
point.y = 1  # OK
point.X = 4  # AttributeError is raised
And finally, the proper way to check if an object has a certain attribute is not to use dir, but to use the built-in function hasattr(object, name).
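Applied to the StrictObject idea from the question, a hasattr-based check might look like this (a sketch, not a recommendation over __slots__; note that attributes must already exist, e.g. as class attributes, before they can be assigned):
class StrictObject(object):
    def __setattr__(self, item, value):
        if hasattr(self, item):
            object.__setattr__(self, item, value)
        else:
            raise AttributeError("Attribute " + item + " does not exist.")

class Point(StrictObject):
    x = 0
    y = 0

p = Point()
p.x = 4  # OK: 'x' exists on the class
p.X = 4  # AttributeError: Attribute X does not exist.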
I don't think it's a good idea to write code to prevent such errors. These "static" checks should be the job of your IDE. Pylint will warn you about assigning attributes outside of __init__, thus preventing typo errors. It also flags many other problems and potential problems, and it can easily be used from PyDev.
In such a situation you should look at what the Python standard library may offer you. Did you consider namedtuple?
from collections import namedtuple

Point = namedtuple("Point", "x, y")
a = Point(1, 3)
print(a.x, a.y)
Because Point is now immutable, your problem simply can't happen. The drawback, naturally, is that you can't e.g. just add 1 to a field of a in place; you have to create a completely new instance:
x, y = a
b = Point(x + 1, y)
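namedtuple also provides the _replace method for exactly this, which saves the manual unpacking:
b = a._replace(x=a.x + 1)  # Point(x=2, y=3)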