I have just recently battled a bug in Python. It was one of those silly newbie bugs, but it got me thinking about the mechanisms of Python (I'm a long time C++ programmer, new to Python). I will lay out the buggy code and explain what I did to fix it, and then I have a couple of questions...
The scenario: I have a class called A that has a dictionary data member. Here is its code (simplified, of course):
class A:
    dict1 = {}

    def add_stuff_to_1(self, k, v):
        self.dict1[k] = v

    def print_stuff(self):
        print(self.dict1)
The class using this code is class B:
class B:
    def do_something_with_a1(self):
        a_instance = A()
        a_instance.print_stuff()
        a_instance.add_stuff_to_1('a', 1)
        a_instance.add_stuff_to_1('b', 2)
        a_instance.print_stuff()

    def do_something_with_a2(self):
        a_instance = A()
        a_instance.print_stuff()
        a_instance.add_stuff_to_1('c', 1)
        a_instance.add_stuff_to_1('d', 2)
        a_instance.print_stuff()

    def do_something_with_a3(self):
        a_instance = A()
        a_instance.print_stuff()
        a_instance.add_stuff_to_1('e', 1)
        a_instance.add_stuff_to_1('f', 2)
        a_instance.print_stuff()

    def __init__(self):
        self.do_something_with_a1()
        print("---")
        self.do_something_with_a2()
        print("---")
        self.do_something_with_a3()
Notice that every call to do_something_with_aX() initializes a new "clean" instance of class A, and prints the dictionary before and after the addition.
The bug (in case you haven't figured it out yet):
>>> b_instance = B()
{}
{'a': 1, 'b': 2}
---
{'a': 1, 'b': 2}
{'a': 1, 'c': 1, 'b': 2, 'd': 2}
---
{'a': 1, 'c': 1, 'b': 2, 'd': 2}
{'a': 1, 'c': 1, 'b': 2, 'e': 1, 'd': 2, 'f': 2}
In the second initialization of class A, the dictionaries are not empty, but start with the contents of the last initialization, and so forth. I expected them to start "fresh".
What solves this "bug" is obviously adding:
self.dict1 = {}
In the __init__ constructor of class A. However, that made me wonder:
What is the meaning of the "dict1 = {}" initialization at the point of dict1's declaration (first line in class A)? Is it meaningless?
What mechanism of instantiation causes the reference from the previous instance to be reused?
If I add "self.dict1 = {}" in the constructor (or any other data member), how does it not affect the dictionary member of previously initialized instances?
EDIT: Following the answers I now understand that by declaring a data member and not referring to it in the __init__ or somewhere else as self.dict1, I'm practically defining what's called in C++/Java a static data member. By calling it self.dict1 I'm making it "instance-bound".
What you keep referring to as a bug is the documented, standard behavior of Python classes.
Declaring a dict outside of __init__, as you initially did, declares a class-level variable. It is created only once, when the class is defined, and every new object reuses this same dict. To create instance variables, you assign them on self in __init__; it's as simple as that.
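As a minimal sketch of that fix (the class name PerInstance is hypothetical, not taken from the question), assigning the dict on self inside __init__ gives every object its own dictionary:

```python
class PerInstance:
    def __init__(self):
        # A new dict is created on every instantiation and bound to this instance.
        self.dict1 = {}

    def add_stuff_to_1(self, k, v):
        self.dict1[k] = v


first = PerInstance()
first.add_stuff_to_1('a', 1)
second = PerInstance()  # starts with its own empty dict

print(first.dict1)   # {'a': 1}
print(second.dict1)  # {}
```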
When you access an attribute of an instance, say self.foo, Python first looks for 'foo' in self.__dict__. If it is not found there, Python looks for 'foo' in TheClass.__dict__ (and then in the base classes).
In your case, dict1 belongs to class A, not to the instance.
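The lookup order can be observed directly with vars(), which returns an object's __dict__. This is only an illustrative sketch; the class name Shared is made up for the example:

```python
class Shared:
    dict1 = {}  # stored in Shared.__dict__, not on any instance


obj = Shared()

in_instance = 'dict1' in vars(obj)        # the instance dict is empty
in_class = 'dict1' in vars(Shared)        # the name lives on the class
same_object = obj.dict1 is Shared.dict1   # the instance lookup falls back to the class

print(in_instance, in_class, same_object)  # False True True
```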
@Matthew: Please review the difference between a class member and an object member in object-oriented programming. This problem happens because the declaration of the original dict makes it a class member, not an object member (as was the original poster's intent). Consequently, it exists once for (is shared across) all instances of the class (i.e. once for the class itself, as a member of the class object), so the behaviour is perfectly correct.
Python's class declarations are executed as a code block, and any local variable definitions (of which function definitions are a special kind) are stored in the constructed class object. Due to the way attribute lookup works in Python, if an attribute is not found on the instance, the value on the class is used.
There is an interesting article about the class syntax on the History of Python blog.
If this is your code:
class ClassA:
    dict1 = {}

a = ClassA()
Then you probably expected this to happen inside Python:
class ClassA:
    __defaults__['dict1'] = {}

a = instance(ClassA)
# a bit of pseudo-code here:
for name, value in ClassA.__defaults__:
    a.<name> = value
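That is not quite what happens: nothing is copied onto the instance. When dict1 is not found on the instance, the lookup falls back to the class, so every instance sees the very same dict object. Sharing a reference instead of copying a value is the default behaviour everywhere in Python. Look at this code: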
a = {}
b = a
a['foo'] = 'bar'
print(b)  # {'foo': 'bar'}
I was looking into how the order in which you declare the classes to inherit from affects the Method Resolution Order (detailed here by Raymond Hettinger). I was using this to elegantly create an ordered counter with this code:
from collections import Counter, OrderedDict

class OrderedCounter(Counter, OrderedDict):
    pass

counts = OrderedCounter([1, 2, 3, 1])
print(*counts.items())
>>> (1, 2) (2, 1) (3, 1)
I was trying to understand why the following didn't work similarly:
class OrderedCounter(OrderedDict, Counter):
    pass

counts = OrderedCounter([1, 2, 3, 1])
print(*counts.items())
>>> TypeError: 'int' object is not iterable
I understand that, on a fundamental level, this is because the OrderedCounter object uses OrderedDict.__init__() in the second example, which according to the documentation only accepts "[items]". In the first example, however, Counter.__init__() is used, which according to the documentation accepts "[iterable-or-mapping]", so it can take the list as input.
I wanted to understand this interaction more specifically, so I went to look at the actual source. When I looked at OrderedDict.__init__(), I noticed that after some error handling it calls self.update(*args, **kwds). However, the code simply has the line update = MutableMapping.update, which I can't find much documentation on.
I guess I would just like a more concrete answer as to why the second code block doesn't work.
Note: For context, I have a decent amount of programming experience, but I'm new to Python and to OOP in Python.
TLDR: How/Why does the Method Resolution Order interfere with the second code block?
In your second example, class OrderedCounter(OrderedDict, Counter):, the object looks in OrderedDict first, which uses the update method from MutableMapping.
MutableMapping is an abstract base class in collections.abc. Its update method source is here. You can see that if the other argument is not a mapping, it iterates over other, unpacking a key and a value on each iteration.
for key, value in other:
    self[key] = value
If other is a sequence of tuples it would work.
>>> other = ((1, 2), (3, 4))
>>> for key, value in other:
...     print(key, value)
...
1 2
3 4
>>>
But if other is a sequence of single items it will throw the error when it tries to unpack a single value into two names/variables.
>>> other = (1, 2, 3, 4)
>>> for key, value in other:
...     print(key, value)
...
Traceback (most recent call last):
  File "<pyshell#50>", line 1, in <module>
    for key,value in other:
TypeError: cannot unpack non-iterable int object
>>>
Whereas collections.Counter's update method calls a different function if other is not a Mapping.
else:
    _count_elements(self, iterable)
_count_elements adds a key for each new item (starting from a count of zero) and adds one to the count for every occurrence it sees.
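The two update behaviours can be compared side by side. This is a small sketch using only the public collections API; the variable names are made up for the example:

```python
from collections import Counter, OrderedDict

pairs = [("a", 1), ("b", 2)]
singles = ["a", "b", "a"]

ordered = OrderedDict()
ordered.update(pairs)      # each element is unpacked into (key, value)

counter = Counter()
counter.update(singles)    # each element is counted

try:
    OrderedDict().update([1, 2, 3, 1])  # plain ints cannot be unpacked into pairs
    pair_update_failed = False
except TypeError:
    pair_update_failed = True

print(list(ordered.items()))  # [('a', 1), ('b', 2)]
print(counter["a"], counter["b"])  # 2 1
```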
As you probably discovered, if a class inherits from two classes, Python looks in the first class to find an attribute; if it isn't there, it looks in the second class.
>>> class A:
...     def __init__(self):
...         pass
...     def f(self):
...         print('class A')
...
>>> class B:
...     def __init__(self):
...         pass
...     def f(self):
...         print('class B')
...
>>> class C(A, B):
...     pass
...
>>> c = C()
>>> c.f()
class A
>>> class D(B, A):
...     pass
...
>>> d = D()
>>> d.f()
class B
In the MRO, children precede their parents and the order of appearance in __bases__ is respected.
In the first example, Counter is a subclass of dict. When OrderedDict is provided along with Counter, the MRO becomes OrderedCounter, Counter, OrderedDict, dict, object: Counter's __init__ and update are found first, and OrderedDict takes the place of dict as Counter's next parent, so the code works seamlessly.
In the second example, OrderedDict is again a subclass of dict. When Counter is provided after OrderedDict, the MRO becomes OrderedCounter, OrderedDict, Counter, dict, object, so OrderedDict's __init__ and update are found before Counter's and try to unpack each list element as a key/value pair, which is counter intuitive (pun intended). Hence the error!
I hope this layman's explanation helps you.
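To check the two linearizations for yourself, you can print __mro__ for both orderings. This is a sketch; the class names CounterFirst and OrderedDictFirst are hypothetical stand-ins for the two OrderedCounter definitions in the question:

```python
from collections import Counter, OrderedDict


class CounterFirst(Counter, OrderedDict):
    pass


class OrderedDictFirst(OrderedDict, Counter):
    pass


counter_first_mro = [cls.__name__ for cls in CounterFirst.__mro__]
ordered_first_mro = [cls.__name__ for cls in OrderedDictFirst.__mro__]

counts = CounterFirst([1, 2, 3, 1])  # Counter.__init__ is found first

try:
    OrderedDictFirst([1, 2, 3, 1])   # OrderedDict.__init__ is found first
    second_order_failed = False
except TypeError:
    second_order_failed = True

print(counter_first_mro)
print(ordered_first_mro)
```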
I found that some classes contain an __init__ function, and some don't. I'm confused about the following.
What is the difference between these two pieces of code:
class Test1(object):
    i = 1
and
class Test2(object):
    def __init__(self):
        self.i = 1
I know that the instances created by these two classes, and the way of getting their instance variable, are pretty much the same. But is there any kind of "default" or "hidden" initialization mechanism in Python behind the scenes when we don't define the __init__ function for a class? And why can't I write the first code this way:
class Test1(object):
    self.i = 1
Those are my questions. Thank you very much!
Thank you very much Antti Haapala! Your answer gives me further understanding of my questions. Now, I understand that they are different in a way that one is a "class variable", and the other is a "instance variable". But, as I tried it further, I got yet another confusing problem.
Here is what it is. I created 2 new classes for understanding what you said:
class Test3(object):
    class_variable = [1]

    def __init__(self):
        self.instance_variable = [2]

class Test4(object):
    class_variable = 1

    def __init__(self):
        self.instance_variable = 2
As you said in the answer to my first questions, I understand the class_variable is a "class variable" general to the class, and should be passed or changed by reference to the same location in the memory. And the instance_variable would be created distinctly for different instances.
But as I tried out, what you said is true for the Test3's instances, they all share the same memory. If I change it in one instance, its value changes wherever I call it.
But that's not true for instances of Test4. Shouldn't the int in the Test4 class also be changed by reference?
>>> i1 = Test3()
>>> i2 = Test3()
>>> i1.class_variable.append(2)
>>> i2.class_variable
[1, 2]
>>> j1 = Test4()
>>> j2 = Test4()
>>> j1.class_variable = 3
>>> j2.class_variable
1
Why is that? Does "=" create an "instance variable" named "class_variable" on j1 without changing the original "Test4.class_variable"? And does the "append" method just operate on the "class variable"?
Again, thank you for your exhaustive explanation of the most boring basic concepts to a newbie of Python. I really appreciate that!
In Python, instance attributes (such as self.i) are stored in the instance dictionary (instance.__dict__). All the variable declarations in the class body are stored as attributes of the class.
Thus
class Test(object):
    i = 1
is equivalent to
class Test(object):
    pass

Test.i = 1
If no __init__ method is defined, the newly created instance usually starts with an empty instance dictionary, meaning that no instance attributes are defined.
Now, when Python performs an attribute get (as in print(instance.i)), it first looks for the attribute named i set on the instance. If that fails, the i attribute is looked up on type(instance) instead (that is, the class attribute i).
So you can do things like:
class Test:
    i = 1

t = Test()
print(t.i)  # prints 1
t.i += 1
print(t.i)  # prints 2
but what this actually does is:
>>> class Test(object):
...     i = 1
...
>>> t = Test()
>>> t.__dict__
{}
>>> t.i += 1
>>> t.__dict__
{'i': 2}
There is no i attribute on the newly created t at all! Thus in t.i += 1 the .i was looked up in the Test class for reading, but the new value was set into the t.
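A short sketch that makes the read-from-class, write-to-instance split visible; the second instance named other is added only for illustration:

```python
class Test:
    i = 1


t = Test()
t.i += 1        # reads Test.i (1), then stores 2 in t.__dict__
other = Test()  # an untouched instance still falls back to the class value

print(Test.i)       # 1
print(vars(t))      # {'i': 2}
print(other.i)      # 1
print(vars(other))  # {}
```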
If you use __init__:
>>> class Test2(object):
...     def __init__(self):
...         self.i = 1
...
>>> t2 = Test2()
>>> t2.__dict__
{'i': 1}
The newly created instance t2 will already have the attribute set.
Now in the case of immutable value such as int there is not that much difference. But suppose that you used a list:
class ClassHavingAList():
    the_list = []
vs
class InstanceHavingAList():
    def __init__(self):
        self.the_list = []
Now, if you create 2 instances of both:
>>> c1 = ClassHavingAList()
>>> c2 = ClassHavingAList()
>>> i1 = InstanceHavingAList()
>>> i2 = InstanceHavingAList()
>>> c1.the_list is c2.the_list
True
>>> i1.the_list is i2.the_list
False
>>> c1.the_list.append(42)
>>> c2.the_list
[42]
c1.the_list and c2.the_list refer to the exactly same list object in memory, whereas i1.the_list and i2.the_list are distinct. Modifying the c1.the_list looks as if the c2.the_list also changes.
This is because the attribute itself is not set, it is just read. The c1.the_list.append(42) is identical in behaviour to
getattr(c1, 'the_list').append(42)
That is, it only tries to read the value of the attribute the_list on c1 and, if it is not found there, looks it up on the class. The append does not change the attribute; it just changes the object that the attribute points to.
Now if you were to write an example that superficially looks the same:
c1.the_list = c1.the_list + [42]
It would work identically to
original = getattr(c1, 'the_list')
new_value = original + [42]
setattr(c1, 'the_list', new_value)
and do a completely different thing: first of all, original + [42] would create a new list object. Then the attribute the_list would be created on c1 and set to point to this new list. (Note that c1.the_list += [42] is subtler for lists: it extends the shared list in place and then also binds that same list on c1.) That is, in the case of instance.attribute, if the attribute is "read from", it can be looked up in the class (or superclass) if it is not set on the instance, but if it is written to, as in instance.attribute = something, it will always be set on the instance.
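The three cases (mutating through a read, rebinding with +, and augmented assignment with +=) can be told apart with a small experiment. This is only a sketch; the class name Shared and the instance names are hypothetical:

```python
class Shared:
    items = []


a, b, c = Shared(), Shared(), Shared()

a.items.append(1)        # read only: mutates the single class-level list
b.items = b.items + [2]  # builds a new list and binds it on b
c.items += [3]           # extends the class-level list in place, then binds it on c

print(Shared.items)  # [1, 3]
print(b.items)       # [1, 2]
print(vars(a))       # {}
print(vars(c)["items"] is Shared.items)  # True
```

Note that the += case both changes the shared list and creates an instance attribute, which is why it is easy to misread.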
As for this:
class Test1(object):
    self.i = 1
Such thing does not work in Python, because there is no self defined when the class body (that is all lines of code within the class) is executed - actually, the class is created only after all the code in the class body has been executed. The class body is just like any other piece of code, only the defs and variable assignments will create methods and attributes on the class instead of setting global variables.
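A small sketch of the class body behaving like ordinary code; the class names Config and Broken are invented for this example:

```python
class Config:
    base = 10
    doubled = base * 2  # names assigned earlier in the body are visible here

    def describe(self):
        return "base=%d doubled=%d" % (self.base, self.doubled)


try:
    class Broken:
        self.i = 1  # no name 'self' exists while the class body runs
except NameError as exc:
    error_message = str(exc)

print(Config.doubled)        # 20
print(Config().describe())   # base=10 doubled=20
print(error_message)
```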
I understood my newly added question. Thanks to Antti Haapala.
Now, when Python performs an attribute get (as in print(instance.i)), it first looks for the attribute named i set on the instance. If that fails, the i attribute is looked up on type(instance) instead (that is, the class attribute i).
After a few tests, I'm now clear about why this happens:
>>> j1 = Test4()
>>> j2 = Test4()
>>> j1.class_variable = 3
>>> j2.class_variable
1
The code
j1.class_variable = 3
actually creates a new instance variable on j1 without changing the class variable. That's the difference between "=" and methods like "append".
I'm a Python newbie coming from C++. So, at first glance, that's weird to me, since I never thought of creating a new instance variable that is not declared in the class just by using "=". It's really a big difference between C++ and Python.
Now I got it, thank you all.
I was reading about Python descriptors and came across this line:
Python first looks for the member in the instance dictionary. If it's
not found, it looks for it in the class dictionary.
I am really confused about what the instance dictionary is and what the class dictionary is.
Can anyone please explain this to me with code?
I was thinking of them as the same thing.
An instance dict holds a reference to all objects and values assigned to the instance, and the class level dict holds all references at the class namespace.
Take the following example:
>>> class A(object):
...     def foo(self, bar):
...         self.zoo = bar
...
>>> i = A()
>>> i.__dict__ # instance dict is empty
{}
>>> i.foo('hello') # assign a value to an instance
>>> i.__dict__
{'zoo': 'hello'} # this is the instance level dict
>>> i.z = {'another':'dict'}
>>> i.__dict__
{'z': {'another': 'dict'}, 'zoo': 'hello'} # all at instance level
>>> A.__dict__.keys() # at the CLASS level, only holds items in the class's namespace
['__dict__', '__module__', 'foo', '__weakref__', '__doc__']
I think you can understand it with this example.
class Demo(object):
    class_dict = {}  # Class dict, common for all instances

    def __init__(self, d):
        self.instance_dict = d  # Instance dict, different for each instance
It's always possible to add an instance attribute on the fly like this:
demo = Demo({1: "demo"})
demo.new_dict = {}  # A new instance dictionary defined just for this instance
demo2 = Demo({2: "demo2"})  # This instance has only the one instance dictionary defined in the __init__ method
So, in the above example, the demo instance now has 2 instance dictionaries: one added outside the class, and one that is added to each instance in the __init__ method. The demo2 instance has just 1 instance dictionary, the one added in the __init__ method.
Apart from that, both the instances have a common dictionary - the class dictionary.
Those dicts are the internal way of representing the object or class-wide namespaces.
Suppose we have a class:
class C(object):
    def f(self):
        print "Hello!"

c = C()
At this point, f is a method defined in the class dict (f in C.__dict__, and C.f is an unbound method in terms of Python 2.7).
c.f() will make the following steps:
look for f in c.__dict__ and fail
look for f in C.__dict__ and succeed
call C.f(c)
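The steps above can be sketched as a small function. This is a deliberately simplified model written for illustration (the helper name simple_lookup is hypothetical): it ignores descriptors and __getattr__, so it returns the plain function stored on the class instead of a bound method.

```python
def simple_lookup(obj, name):
    """Simplified attribute lookup: instance dict first, then each class in the MRO."""
    instance_dict = vars(obj)
    if name in instance_dict:
        return instance_dict[name]
    for klass in type(obj).__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    raise AttributeError(name)


class C(object):
    def f(self):
        return "Hello!"


c = C()
found_on_class = simple_lookup(c, "f")  # the plain function stored in C.__dict__
greeting = found_on_class(c)            # equivalent to C.f(c)

c.f = lambda: "Bonjour!"                # now stored in c.__dict__
found_on_instance = simple_lookup(c, "f")

print(greeting)             # Hello!
print(found_on_instance())  # Bonjour!
```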
Now, let's do a trick:
def f_french():
    print "Bonjour!"

c.f = f_french
We've just modified the object's own dict. That means c.f() will now print Bonjour!. This does not affect the original class behaviour, so other instances of C will still speak English.
Class dict is shared among all the instances (objects) of the class, while each instance (object) has its own separate copy of instance dict.
You can define attributes separately on a per-instance basis rather than for the whole class.
For example:
class A(object):
    an_attr = 0

a1 = A()
a2 = A()
a1.another_attr = 1
Now a2 will not have another_attr. That is part of a1's instance dict rather than the class dict.
Rohit Jain has the simplest python code to explain this quickly. However, understanding the same ideas in Java can be useful, and there is much more information about class and instance variables here
I am trying to add class attributes dynamically, but not at the instance level. E.g. what I can do manually as:
class Foo(object):
    a = 1
    b = 2
    c = 3
I'd like to be able to do with:
class Foo(object):
    dct = {'a' : 1, 'b' : 2, 'c' : 3}
    for key, val in dct.items():
        <update the Foo namespace here>
I'd like to be able to do this without a call to the class from outside the class (so it's portable), or without additional classes/decorators. Is this possible?
Judging from your example code, you want to do this at the same time you create the class. In this case, assuming you're using CPython, you can use locals().
class Foo(object):
    locals().update(a=1, b=2, c=3)
This works because while a class is being defined, locals() refers to the class namespace. It's implementation-specific behavior and may not work in later versions of Python or alternative implementations.
A less dirty-hacky version that uses a class factory is shown below. The basic idea is that your dictionary is converted to a class by way of the type() constructor, and this is then used as the base class for your new class. For convenience of defining attributes with a minimum of syntax, I have used the ** convention to accept the attributes.
def dicty(*bases, **attrs):
    if not bases:
        bases = (object,)
    return type("<from dict>", bases, attrs)

class Foo(dicty(a=1, b=2, c=3)):
    pass

# if you already have the dict, use unpacking
dct = dict(a=1, b=2, c=3)

class Foo(dicty(**dct)):
    pass
This is really just syntactic sugar for calling type() yourself. This works fine, for instance:
class Foo(type("<none>", (object,), dict(a=1, b=2, c=3))):
    pass
Do you mean something like this:
def update(obj, dct):
    for key, val in dct.items():
        setattr(obj, key, val)  # setattr is a builtin function, not a method on obj
Then just go
update(Foo, {'a': 1, 'b': 2, 'c': 3})
This works, because a class is just an object too ;)
If you want to move everything into the class, then try this:
class Foo(object):
    # Python 2 syntax: the callable assigned to __metaclass__ builds the class
    __metaclass__ = lambda t, p, a: type(t, p, a['dct'])
    dct = {'a': 1, 'b': 2, 'c': 3}
This will create a new class, with the members in dct, but all other attributes will not be present - so, you want to alter the last argument to type to include the stuff you want. I found out how to do this here: What is a metaclass in Python?
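For Python 3, where the __metaclass__ attribute is ignored, the same idea might look like the sketch below. The metaclass name DictAttrsMeta is hypothetical, and the choice to merge dct into the namespace (keeping the other attributes and dropping dct itself) is an assumption about what is wanted:

```python
class DictAttrsMeta(type):
    """Metaclass that expands a 'dct' entry of the class body into class attributes."""

    def __new__(mcls, name, bases, namespace):
        namespace = dict(namespace)
        # Remove 'dct' and merge its items into the class namespace.
        namespace.update(namespace.pop("dct", {}))
        return super().__new__(mcls, name, bases, namespace)


class Foo(metaclass=DictAttrsMeta):
    dct = {'a': 1, 'b': 2, 'c': 3}
    other = "kept"


print(Foo.a, Foo.b, Foo.c)  # 1 2 3
print(Foo.other)            # kept
```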
The accepted answer is a nice approach. However, one downside is you end up with an additional parent object in the MRO inheritance chain that isn't really necessary and might even be confusing:
>>> Foo.__mro__
(<class '__main__.Foo'>, <class '__main__.<from dict>'>, <class 'object'>)
Another approach would be to use a decorator. Like so:
def dicty(**attrs):
    def decorator(cls):
        # vars(cls) is a read-only proxy for classes, so set each attribute explicitly
        for name, value in attrs.items():
            setattr(cls, name, value)
        return cls
    return decorator

@dicty(**some_class_attr_namespace)
class Foo():
    pass
In this way, you avoid an additional object in the inheritance chain. The @decorator syntax is just a pretty way of saying:
Foo = dicty(a=1, b=2, c=3)(Foo)
Is there any way to translate this Java code into Python?
class Foo
{
    final static private List<Thingy> thingies =
        ImmutableList.of(thing1, thing2, thing3);
}
i.e. thingies is an immutable private list of Thingy objects that belongs to the Foo class rather than to its instances.
I know how to define static class variables from this question, Static class variables in Python, but I don't know how to make them immutable and private.
In Python the convention is to use a _ prefix on attribute names to mean protected and a __ prefix to mean private. This isn't enforced by the language; programmers are expected to know not to write code that relies on data that isn't public.
If you really wanted to enforce immutability, you could use a metaclass[docs] (the class of a class). Just modify __setattr__ and __delattr__ to raise exceptions when someone attempts to modify it, and make it a tuple (an immutable list) [docs].
class FooMeta(type):
    """A type whose .thingies attribute can't be modified."""

    def __setattr__(cls, name, value):
        if name == "thingies":
            raise AttributeError("Cannot modify .thingies")
        else:
            return type.__setattr__(cls, name, value)

    def __delattr__(cls, name):
        if name == "thingies":
            raise AttributeError("Cannot delete .thingies")
        else:
            return type.__delattr__(cls, name)

thing1, thing2, thing3 = range(3)

# Python 3 syntax; in Python 2, set __metaclass__ = FooMeta in the class body instead
class Foo(metaclass=FooMeta):
    thingies = (thing1, thing2, thing3)
    other = [1, 2, 3]
Examples
print(Foo.thingies)          # prints "(0, 1, 2)"
Foo.thingies = (1, 2)        # raises an AttributeError
del Foo.thingies             # raises an AttributeError
Foo.other = Foo.other + [4]  # no exception
print(Foo.other)             # prints "[1, 2, 3, 4]"
It would still technically be possible to modify these by bypassing the metaclass (for example, by calling type.__setattr__ directly), but this should be enough to deter most users. It is very difficult to entirely secure Python objects.
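To illustrate how thin the protection is, here is a sketch under the assumption of Python 3 (the metaclass name FrozenThingiesMeta and the class name Guarded are hypothetical). Normal assignment is blocked, but calling type.__setattr__ directly skips the metaclass override:

```python
class FrozenThingiesMeta(type):
    def __setattr__(cls, name, value):
        if name == "thingies":
            raise AttributeError("Cannot modify .thingies")
        super().__setattr__(name, value)


class Guarded(metaclass=FrozenThingiesMeta):
    thingies = (0, 1, 2)


try:
    Guarded.thingies = ()  # goes through FrozenThingiesMeta.__setattr__
    blocked = False
except AttributeError:
    blocked = True

before = Guarded.thingies

# Deliberately bypass the override by calling the base implementation.
type.__setattr__(Guarded, "thingies", (9,))
after = Guarded.thingies

print(blocked, before, after)  # True (0, 1, 2) (9,)
```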
You can't do either of those things in Python, not in the sense you do them in Java, anyway.
By convention, names prefixed with an underscore are considered private and should not be accessed outside the implementation, but nothing in Python enforces this convention. It's considered more of a warning that you're messing with an implementation detail that may change without warning in a future version of the code.
You can make it un-writeable (subtly different from immutable) by using properties, but there is no way to make it private -- that goes against Python's philosophy.
class Foo(object):  # don't need 'object' in Python 3
    @property
    def thingies(self):
        return 'thing1', 'thing2', 'thing3'

f = Foo()
print(f.thingies)
# ('thing1', 'thing2', 'thing3')
f.thingies = 9
# Traceback (most recent call last):
#   File "test.py", line 8, in <module>
#     f.thingies = 9
# AttributeError: can't set attribute
Whether it's immutable or not depends on what you return; if you return a mutable object you may be able to mutate that and have those changes show up in the instance/class.
class FooMutable(object):
    _thingies = [1, 2, 3]

    @property
    def thingies(self):
        return self._thingies

foo = FooMutable()
foo.thingies.append(4)
print(foo.thingies)
# [1, 2, 3, 4]
This will let you mutate thingies, and because the object returned is the same object kept in the instance/class the changes will be reflected on subsequent access.
Compare that with:
class FooMutable(object):
    @property
    def thingies(self):
        return [1, 2, 3]

foo = FooMutable()
foo.thingies.append(4)
print(foo.thingies)
# [1, 2, 3]
Because a brand new list is returned each time, changes to it are not reflected in subsequent accesses.
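A middle ground, sketched here with a hypothetical class name FooReadOnly, is to keep the data in a private list but hand out a tuple copy, so callers get neither a mutable object nor a reference to the internal list:

```python
class FooReadOnly:
    _thingies = [1, 2, 3]

    @property
    def thingies(self):
        # A tuple copy: callers cannot append to it, and the private list stays untouched.
        return tuple(self._thingies)


foo = FooReadOnly()
snapshot = foo.thingies

try:
    snapshot.append(4)  # tuples have no append method
    append_worked = True
except AttributeError:
    append_worked = False

print(snapshot)                # (1, 2, 3)
print(append_worked)           # False
print(FooReadOnly._thingies)   # [1, 2, 3]
```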
You want to look into the property() function. It allows you to define your own custom Getter and Setter for a member attribute of a class. It might look something like this:
class myClass(object):
    _x = "Hard Coded Value"

    def set_x(self, val): return
    def get_x(self): return self._x
    def del_x(self): return

    x = property(get_x, set_x, del_x, "I'm an immutable property named 'x'")
I haven't used it enough to be certain whether it can be used to create something "private" so you'd have to delve into that yourself, but isinstance may help.
You can achieve the final part using type hints*. As others have said, __ achieves the private aspect well enough, so
from typing import List
from typing_extensions import Final

class Foo:
    __thingies: Final[List[Thingy]] = ImmutableList.of(thing1, thing2, thing3)
I'll leave the definition of ImmutableList to you. A tuple will probably do.
*with the usual caveat that users can ignore them
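A runnable variant of the same idea, assuming Python 3.8 or later (where Final is available from typing) and using a tuple of ints in place of ImmutableList and Thingy. The thingies classmethod is a hypothetical accessor added only so the value can be read:

```python
from typing import Final, Tuple


class Foo:
    # Name mangling stores this as _Foo__thingies; Final is enforced only by type checkers.
    __thingies: Final[Tuple[int, ...]] = (1, 2, 3)

    @classmethod
    def thingies(cls):
        return cls.__thingies


print(Foo.thingies())                  # (1, 2, 3)
print(hasattr(Foo, "_Foo__thingies"))  # True
print(hasattr(Foo, "__thingies"))      # False
```

Nothing here is enforced at runtime: Foo._Foo__thingies can still be read or rebound by code that chooses to ignore the convention.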