I was playing around with the Python 3 interpreter and came across the Decimal class in the decimal module. The documentation indicates that objects created from the class are immutable. I tried to look into the Python library code to see exactly how the programmers made the class immutable. Unfortunately, I was unable to discover exactly what they had done (mainly because the class itself is quite large).
I am curious to know if anyone understands how this class achieves its immutable state. I am thinking it may have something to do with overriding the __getattr__() and __setattr__() methods within the class, but I could not find any evidence of this.
I also noticed that this class somehow hides the attributes it defines in its __slots__ variable. Perhaps the class uses some type of metaclass technique to achieve this as well as its immutability.
If anyone has any ideas I would appreciate your feedback. Thanks
Good question. Doesn't seem invincibly immutable to me. Python 2.7.2:
>>> from decimal import Decimal
>>> d = Decimal('1.23')
>>> d
Decimal('1.23')
>>> f = d
>>> f
Decimal('1.23')
>>> d._exp = 1
>>> f
Decimal('1.23E+3')
I do see that the documentation says it's immutable. I guess they mean if you use the documented interface, it's immutable. E.g.
>>> d += 100
>>> f
Decimal('1.23E+3')
>>> d
Decimal('1330')
The way this works is simply that Decimal does not define the augmented-assignment operator overloads (methods like __iadd__). When __iadd__ is missing, d += 100 falls back to d = d.__add__(100), which returns a new object and rebinds the name d while leaving the original object untouched.
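To make that concrete, here is a minimal sketch of my own (not the decimal module's actual code) showing what happens with a class that defines __add__ but no __iadd__:

class Number:
    def __init__(self, value):
        self.value = value
    def __add__(self, other):
        # returns a brand new object instead of modifying self
        return Number(self.value + other)

d = Number(1)
f = d
d += 100            # no __iadd__, so this falls back to d = d.__add__(100)
print(d is f)       # False: the name d is now bound to a new object
print(f.value)      # 1: the original object is untouched

The same rebinding is what keeps the original Decimal object intact as long as you stick to the documented arithmetic interface.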
I was introducing the is operator to my students when I noticed that there is an inconsistency in its behavior between Python 3.6 (and older) and Python 3.7.
Launch a python shell and run:
5/2 is 2.5
Or:
(1, 2, 3) is (1, 2, 3)
In 3.6.x you get False for both, but in 3.7 both turn out to be True.
My expectation was that the result should be True as I thought immutable numeric objects (or a tuple of them) have just one instance.
It seems that at least my thought was not right in the previous versions of Python.
Does anyone know what changes have been made which explain this new behaviour?
I'm not sure about the reasons and source for this, but my guess is that it has something to do with in-line (constant) optimizations.
If you assign these values to variables in separate statements, the identity check results in False, same as before.
>>> 5/2 is 2.5
True
>>> a = 5/2
>>> a is 2.5
False
An interesting note on the new folding optimisation: since Python is "all runtime", there is no way to optimize everything ahead of time, but the compiler tries hard, sharing folded constants across whatever it compiles as a single unit:
>>> a = 3.14
>>> b = 3.14
>>> a is b
False
>>> a = 3.14; b = 3.14
>>> a is b
True
My expectation was that the result should be True as I thought immutable numeric objects (or a tuple of them) have just one instance.
That expectation is questionable - there's no such thing guaranteed by the Python language.
is is quite a tricky operator because you really need to know when it's appropriate to use it.
For example:
>>> 5 / 2 is 2.5
>>> (1, 2, 3) is (1, 2, 3)
These are not appropriate uses of is in the general case. They may be appropriate if you want to check what line/function optimizations (interning) Python is doing but I guess that wasn't the desired use-case here.
is should only be used if you want to compare to constants (that are guaranteed to only have one instance)! The guaranteed built-in constants are:
None
NotImplemented
Ellipsis (also known as ...)
True
False
__debug__
Or your own constant-like instances:
_sentinel = object()
def func(a=_sentinel):
    return a is _sentinel
Or when you explicitly assign variables to a new name:
a = b
a is b # <- that's expected to be True
Does anyone know what changes have been made which explains this new behaviour?
Probably the peephole optimizer now optimizes more cases (tuples and mathematical expressions). For example, "AST-level Constant folding" (https://bugs.python.org/issue29469) has been added in CPython 3.7 (I intentionally wrote CPython here because it is not something that was added to the Python 3.7 language specification).
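One way to see the folding (CPython-specific; the exact bytecode differs between versions) is the dis module:

>>> import dis
>>> dis.dis(compile("x = 5 / 2", "<demo>", "exec"))
  1           0 LOAD_CONST               0 (2.5)
              2 STORE_NAME               0 (x)
              4 LOAD_CONST               1 (None)
              6 RETURN_VALUE

In CPython 3.7 the division has already been folded into the constant 2.5 before the code runs.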
I believe this behavior is due to moving constant folding from the peephole optimizer to the new AST-level optimizer (both are compile-time steps, but the AST optimizer can fold more cases), which, as mentioned in https://docs.python.org/3/whatsnew/3.7.html#optimizations, is now able to perform optimizations more consistently. (Contributed by Eugene Toder and INADA Naoki in bpo-29469 and bpo-11549.)
Re:
My expectation was that the result should be True as I thought immutable numeric objects (or a tuple of them) have just one instance.
Immutability says nothing about object identity. Before you call an object mutable or immutable, it is first of all an object, and objects in Python are created at run time; so there is no reason to connect mutability to object creation and identity. There are, however, exceptions like this one, or small-object interning in both previous and current versions, where mostly for the sake of optimization this rule (object creation at run time) gets bent. Read https://stackoverflow.com/a/38189759/2867928 for more details.
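A quick interactive sketch of that small-object interning (a CPython implementation detail, not a language guarantee):

>>> a = 256
>>> b = 256
>>> a is b      # CPython caches small ints, so both names share one object
True
>>> a = 257
>>> b = 257
>>> a is b      # typically False in an interactive session: two separate objects
False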
Why should immutable objects that are equal share the same instance?
When using is in Python, you are essentially asking whether a and b point to the same object in memory. It's not as if Python keeps one canonical copy of every immutable literal. Whether the comparison returns True here is an implementation detail (interning), and it can just as easily return False if you choose a different literal. Take a look at this:
>>> a = "wtf"
>>> b = "wtf"
>>> a is b
True
>>> a = "wtf!"
>>> b = "wtf!"
>>> a is b
False
>>> a, b = "wtf!", "wtf!"
>>> a is b
True
If you want to avoid this, do not use is on objects you didn't explicitly bind to the same reference; when you care about values, compare with == instead.
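For example (my own quick check), value comparison is reliable where identity is not:

>>> a = "wtf!"
>>> b = "wtf!"
>>> a == b      # compares values, so this is reliably True
True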
I know that if you create your own object you can define your own methods on that object.
my_object_instance.mymethod()
I also know you can define infix functions with the infix package.
obj1 |func| obj2
What I want is the ability to define a function which accepts an existing type in postfix notation.
For example given a list l we may want to check if it is sorted. Defining a typical function might give us
if is_sorted(l): #dosomething
but it might be more idiomatic if one could write
if l.is_sorted(): #dosomething
Is this possible without creating a custom type?
The correct way is inheritance: create a custom type by inheriting from list and adding the new functionality. Monkeypatching is not a strength of Python. But since you specifically asked:
Is this possible without creating a custom type?
What kindall mentioned stands: Python does not allow it. But since nothing in the implementation is truly read-only, you can approximate the result by hacking the class dict.
>>> def is_sorted(my_list):
...     return sorted(my_list) == my_list
...
>>> import gc
>>> gc.get_referents(list.__dict__)[0]['is_sorted'] = is_sorted
>>> [1,2,3].is_sorted()
True
>>> [1,3,2].is_sorted()
False
The new "method" will appear in vars(list), the name will be there in dir([]), and it will also be available/usable on instances which were created before the monkeypatch was applied.
This approach uses the garbage collector interface to obtain, via the class mappingproxy, a reference to the underlying dict. And garbage collection by reference counting is a CPython implementation detail. Suffice it to say, this is dangerous/fragile and you should not use it in any serious code.
If you like this kind of feature, you might enjoy Ruby as a programming language.
Python does not generally allow monkey-patching of built-in types because the common built-in types aren't written in Python (but rather C) and do not allow the class dictionary to be modified. You have to subclass them to add methods as you want to.
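For completeness, a minimal sketch of the subclassing approach both answers recommend (the names are mine, not from the question):

class MyList(list):
    def is_sorted(self):
        # compare the list to a sorted copy of itself
        return sorted(self) == list(self)

print(MyList([1, 2, 3]).is_sorted())   # True
print(MyList([1, 3, 2]).is_sorted())   # False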
I'm hoping someone can explain why searching a list of object references is so much slower than searching a normal list. This is using the Python "in" keyword to search, which I thought runs at compiled-C speed. I thought a list is just an array of object references (pointers), so the search should be extremely fast. Both lists are exactly 412236 bytes in memory.
Normal list (takes 0.000 seconds to search):
alist = ['a' for x in range(100000)]
if 'b' in alist:
    print("Found")
List of object references (takes 0.469 !! seconds to search):
class Spam:
    pass

spamlist = [Spam() for x in range(100000)]
if Spam() in spamlist:
    print("Found")
Edit: So apparently this has something to do with old-style classes having way more overhead than new-style classes. My script that was bogging down with only 400 objects can now easily handle up to 10000 objects simply by making all my classes inherit from the "object" class. Just when I thought I knew Python!
I've read about new-style vs old-style before but it was never mentioned that old-style classes can be up to 100x slower than new style ones. What is the best way to search a list of object instances for a particular instance?
1. Keep using the "in" statement but make sure all classes are new style.
2. Perform some other type of search using the "is" statement like:
[obj for obj in spamlist if obj is target]
3. Some other more Pythonic way?
This is mostly due to the different special method lookup mechanics of old-style classes.
>>> timeit.timeit("Spam() in l", """
... # Old-style
... class Spam: pass
... l = [Spam() for i in xrange(100000)]""", number=10)
3.0454677856675403
>>> timeit.timeit("Spam() in l", """
... # New-style
... class Spam(object): pass
... l = [Spam() for i in xrange(100000)]""", number=10)
0.05137817007346257
>>> timeit.timeit("'a' in l", 'l = ["b" for i in xrange(100000)]', number=10)
0.03013876870841159
As you can see, the version where Spam inherits from object runs much faster, almost as fast as the case with strings.
The in operator for lists uses == to compare items for equality. == is defined to try the objects' __eq__ methods, their __cmp__ methods, and pointer comparison, in that order.
For old-style classes, this is implemented in a straightforward but slow manner. Python has to actually look for the __eq__ and __cmp__ methods in each instance's dict and the dicts of each instance's class and superclasses. __coerce__ gets looked up too, as part of the 3-way compare process. When none of these methods actually exist, that's something like 12 dict lookups just to get to the pointer comparison. There's a bunch of other overhead besides the dict lookups, and I'm not actually sure which aspects of the process are the most time-consuming, but suffice it to say that the procedure is more expensive than it could be.
For built-in types and new-style classes, things are better. First, Python doesn't look for special methods on the instance's dict. This saves some dict lookups and enables the next part. Second, type objects have C-level function pointers corresponding to the Python-level special methods. When a special method is implemented in C or doesn't exist, the corresponding function pointer allows Python to skip the method lookup procedure entirely. This means that in the new-style case, Python can quickly detect that it should skip straight to the pointer comparison.
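To make the comparison path concrete, here is a small illustration of my own (not from the question) showing that in relies on the elements' __eq__:

class Token(object):
    def __init__(self, name):
        self.name = name
    def __eq__(self, other):
        # invoked while `in` walks the list looking for an equal element
        return isinstance(other, Token) and self.name == other.name

tokens = [Token('a'), Token('b')]
print(Token('b') in tokens)   # True: found via __eq__
print(Token('z') in tokens)   # False: __eq__ returned False for every element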
As for what you should do, I'd recommend using in and new-style classes. If you find that this operation is becoming a bottleneck, but you need old-style classes for backward compatibility, any(x is y for y in l) runs about 20 times faster than x in l:
>>> timeit.timeit('x in l', '''
... class Foo: pass
... x = Foo(); l = [Foo()] * 100000''', number=10)
2.8618816054721936
>>> timeit.timeit('any(x is y for y in l)', '''
... class Foo: pass
... x = Foo(); l = [Foo()] * 100000''', number=10)
0.12331640524583776
This is not a direct answer to your question, but it is very good background for anyone who wants to understand how the 'in' keyword works under the hood:
ceval sourcecode : ceval.c source code
abstract.c sourcecode : abstract.c source code
From the mail : mail about 'in' keywords
Explanation from the mail thread:
I'm curious enough about this (OK, I admit it, I like to be right, too
;) to dig in to the details, if anyone is interested...one of the
benefits of Python being open-source is you can find out how it works...
First step, look at the bytecodes:
>>> import dis
>>> def f(x, y):
...     return x in y
...
>>> dis.dis(f)
2 0 LOAD_FAST 0 (x)
3 LOAD_FAST 1 (y)
6 COMPARE_OP 6 (in)
9 RETURN_VALUE
So in is implemented as a COMPARE_OP. Looking in ceval.c for
COMPARE_OP, it has some optimizations for a few fast compares, then
calls cmp_outcome() which, for 'in', calls PySequence_Contains().
PySequence_Contains() is implemented in abstract.c. If the container
implements __contains__, that is called, otherwise
_PySequence_IterSearch() is used.
_PySequence_IterSearch() calls PyObject_GetIter() to construct an
iterator on the sequence, then goes into an infinite loop (for (;;))
calling PyIter_Next() on the iterator until the item is found or the
call to PyIter_Next() returns an error.
PyObject_GetIter() is also in abstract.c. If the object has an
__iter__() method, that is called, otherwise PySeqIter_New() is called
to construct an iterator.
PySeqIter_New() is implemented in iterobject.c. Its next() method is in
iter_iternext(). This method calls __getitem__() on its wrapped object
and increments an index for next time.
So, though the details are complex, I think it is pretty fair to say
that the implementation uses a while loop (in _PySequence_IterSearch())
and a counter (wrapped in PySeqIter_Type) to implement 'in' on a
container that defines __getitem__ but not __iter__.
By the way the implementation of 'for' also calls PyObject_GetIter(), so
it uses the same mechanism to generate an iterator for a sequence that
defines __getitem__().
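As a quick illustration of that fallback path (my own toy class, not from the mail): a class that defines __getitem__ but no __iter__ still works with in and for, because Python builds a sequence iterator that calls __getitem__ with 0, 1, 2, ... until IndexError is raised.

class Squares:
    def __getitem__(self, index):
        if index >= 5:
            raise IndexError
        return index * index

print(9 in Squares())      # True: found by iterating __getitem__
print(list(Squares()))     # [0, 1, 4, 9, 16]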
Python interns the short string 'a', so every element of the first list points to the same object. Each Spam() call, on the other hand, creates a distinct object, and dereferencing all of those pointers touches many different areas of RAM. The performance difference may have something to do with hardware cache hits/misses.
Obviously the performance difference would be even greater if you included list creation time in your results (instead of just Spam() in spamlist). Also try x = Spam(); x in spamlist to see if that makes a difference.
I am curious how any(imap(equalsFunc, spamlist)) compares.
Using the test alist = ['a' for x in range(100000)] can be very misleading because of string interning. It turns out that Python will intern (in most cases) short immutables -- especially strings -- so that they are all the same object.
Demo:
>>> alist=['a' for x in range(100000)]
>>> len(alist)
100000
>>> len({id(x) for x in alist})
1
You can see that while a list of 100000 strings is created it is only comprised of one interned object.
A fairer case would be to use a call to object() to guarantee that each element is a unique Python object:
>>> olist=[object() for x in range(100000)]
>>> len(olist)
100000
>>> len({id(x) for x in olist})
100000
If you compare the in operator with olist you will find the timing to be similar.
I have a framework with some C-like language. Now I'm re-writing that framework and the language is being replaced with Python.
I need to find appropriate Python replacement for the following code construction:
SomeFunction(&arg1)
What this does is a C-style pass-by-reference so the variable can be changed inside the function call.
My ideas:
just return the value like v = SomeFunction(arg1)
is not so good, because my generic function can have a lot of arguments, like SomeFunction(1,2,'qqq','vvv',.... and many more),
and I want to give the user the ability to get the value she wants.
Return the collection of all the arguments, no matter whether they have changed or not, like: resulting_list = SomeFunction(1,2,'qqq','vvv',.... and many more) interesting_value = resulting_list[3]
this can be improved by giving names to the values and returning a dictionary: interesting_value = resulting_list['magic_value1']
It's not good because we have constructions like
DoALotOfStaff( [SomeFunction1(1,2,3,&arg1,'qq',val2),
SomeFunction2(1,&arg2,v1),
AnotherFunction(),
...
], flags1, my_var,... )
And I wouldn't like to burden the user with lists of lists of variables, with names or indexes she (the user) should know. The kind-of-references would be very useful here...
Final Response
I compiled all the answers with my own ideas and was able to produce the solution. It works.
Usage
SomeFunction(1,12, get.interesting_value)
AnotherFunction(1, get.the_val, 'qq')
Explanation
Anything prefixed with get. is a kind-of reference, and its value will be filled in by the function. There is no need to define the value beforehand.
Limitation - currently I support only numbers and strings, but these are sufficient for my use-case.
Implementation
wrote a Getter class which overrides __getattribute__ and produces any variable on demand
all newly created variables have a pointer to their container Getter and support a set(self, value) method
when set() is called, it checks whether the value is an int or a string and creates an object inheriting from int or str accordingly, but with the same set() method added. With this new object we replace our instance in the Getter container (a simplified sketch follows below)
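A minimal sketch of the idea (my own simplified version, not the author's actual code; it stores the value on a placeholder object instead of subclassing int/str):

class _Ref(object):
    """Placeholder whose value the called function fills in via set()."""
    def __init__(self):
        self.value = None
    def set(self, value):
        self.value = value

class Getter(object):
    def __getattr__(self, name):
        # only called for attributes that do not exist yet:
        # create a placeholder on demand and cache it under that name
        ref = _Ref()
        setattr(self, name, ref)
        return ref

get = Getter()

def some_function(a, b, out):
    out.set(a + b)                        # the "write through the reference"

some_function(1, 12, get.interesting_value)
print(get.interesting_value.value)        # 13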
Thank you everybody. I will mark as "answer" the response which led me on my way, but all of you helped me somehow.
I would say that your best, cleanest bet would be to construct an object containing the values to be passed and/or modified. This single object can be passed in as a single parameter (and will automatically be passed by reference), and its members can be modified to return the new values.
This will simplify the code enormously and you can cope with optional parameters, defaults, etc., cleanly.
>>> class C:
... def __init__(self):
... self.a = 1
... self.b = 2
...
>>> c = C()
>>> def f(o):
... o.a = 23
...
>>> f(c)
>>> c
<__main__.C instance at 0x7f6952c013f8>
>>> c.a
23
>>>
Note
I am sure that you could extend this idea to have a class of parameter that carried immutable and mutable data into your function, with fixed member names, plus storing the names of the parameters actually passed, and then on return mapping the mutable values back onto the caller's parameter names. This technique could then be wrapped in a decorator.
I have to say that it sounds like a lot of work compared to re-factoring your existing code to a more object oriented design.
This is how Python works already:
def func(arg):
    arg += ['bar']

arg = ['foo']
func(arg)
print arg
Here, the change to arg automatically propagates back to the caller.
For this to work, you have to be careful to modify the arguments in place instead of re-binding them to new objects. Consider the following:
def func(arg):
    arg = arg + ['bar']

arg = ['foo']
func(arg)
print arg
Here, func rebinds arg to refer to a brand new list and the caller's arg remains unchanged.
Python doesn't come with this sort of thing built in. You could make your own class which provides this behavior, but it will only support a slightly more awkward syntax where the caller would construct an instance of that class (equivalent to a pointer in C) before calling your functions. It's probably not worth it. I'd return a "named tuple" (look it up) instead--I'm not sure any of the other ways are really better, and some of them are more complex.
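A quick sketch of the named-tuple suggestion (the names are illustrative, not from the question):

from collections import namedtuple

# bundle all "output parameters" into one named result
Result = namedtuple('Result', ['status', 'interesting_value'])

def some_function(a, b):
    return Result(status='ok', interesting_value=a + b)

res = some_function(1, 12)
print(res.interesting_value)   # 13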
There is a major inconsistency here. The drawbacks you're describing against the proposed solutions are related to such subtle rules of good design, that your question becomes invalid. The whole problem lies in the fact that your function violates the Single Responsibility Principle and other guidelines related to it (function shouldn't have more than 2-3 arguments, etc.). There is really no smart compromise here:
either you accept one of the proposed solutions (i.e. Steve Barnes's answer concerning your own wrappers or John Zwinck's answer concerning usage of named tuples) and refrain from focusing on good design subtleties (as your whole design is bad anyway at the moment)
or you fix the design. Then your current problem will disappear as you won't have the God Objects/Functions (the name of the function in your example - DoALotOfStuff really speaks for itself) to deal with anymore.
example:
a_list = [1, 2, 3]
a_list.len() # doesn't work
len(a_list) # works
Python being (very) object oriented, I don't understand why the 'len' function isn't a method inherited by the object.
Plus, I keep trying the wrong solution since it seems like the logical one to me.
Guido's explanation is here:
First of all, I chose len(x) over x.len() for HCI reasons (def __len__() came much later). There are two intertwined reasons actually, both HCI:
(a) For some operations, prefix notation just reads better than postfix — prefix (and infix!) operations have a long tradition in mathematics which likes notations where the visuals help the mathematician thinking about a problem. Compare the ease with which we rewrite a formula like x*(a+b) into x*a + x*b to the clumsiness of doing the same thing using a raw OO notation.
(b) When I read code that says len(x) I know that it is asking for the length of something. This tells me two things: the result is an integer, and the argument is some kind of container. To the contrary, when I read x.len(), I have to already know that x is some kind of container implementing an interface or inheriting from a class that has a standard len(). Witness the confusion we occasionally have when a class that is not implementing a mapping has a get() or keys() method, or something that isn’t a file has a write() method.
Saying the same thing in another way, I see ‘len‘ as a built-in operation. I’d hate to lose that. /…/
The short answer: 1) backwards compatibility and 2) there's not enough of a difference for it to really matter. For a more detailed explanation, read on.
The idiomatic Python approach to such operations is special methods which aren't intended to be called directly. For example, to make x + y work for your own class, you write a __add__ method. To make sure that int(spam) properly converts your custom class, write a __int__ method. To make sure that len(foo) does something sensible, write a __len__ method.
This is how things have always been with Python, and I think it makes a lot of sense for some things. In particular, this seems like a sensible way to implement operator overloading. As for the rest, different languages disagree; in Ruby you'd convert something to an integer by calling spam.to_i directly instead of saying int(spam).
You're right that Python is an extremely object-oriented language and that having to call an external function on an object to get its length seems odd. On the other hand, len(silly_walks) isn't any more onerous than silly_walks.len(), and Guido has said that he actually prefers it (http://mail.python.org/pipermail/python-3000/2006-November/004643.html).
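A small illustration of the special-method convention described above (the class and values are mine):

class Basket:
    def __init__(self, items):
        self.items = items
    def __len__(self):            # makes len(basket) work
        return len(self.items)
    def __int__(self):            # makes int(basket) work
        return len(self.items)
    def __add__(self, other):     # makes basket + other work
        return Basket(self.items + other.items)

b = Basket(['egg', 'spam']) + Basket(['ham'])
print(len(b))   # 3
print(int(b))   # 3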
It just isn't.
You can, however, do:
>>> [1,2,3].__len__()
3
Adding a __len__() method to a class is what makes the len() magic work.
This way fits in better with the rest of the language. The convention in python is that you add __foo__ special methods to objects to make them have certain capabilities (rather than e.g. deriving from a specific base class). For example, an object is
callable if it has a __call__ method
iterable if it has an __iter__ method,
supports access with [] if it has __getitem__ and __setitem__.
...
One of these special methods is __len__ which makes it have a length accessible with len().
Maybe you're looking for __len__. If that method exists, then len(a) calls it:
>>> class Spam:
...     def __len__(self): return 3
...
>>> s = Spam()
>>> len(s)
3
Well, there actually is a length method, it is just hidden:
>>> a_list = [1, 2, 3]
>>> a_list.__len__()
3
The len() built-in function appears to be simply a wrapper for a call to the object's hidden __len__() method.
Not sure why they made the decision to implement things this way though.
There is some good info at the link below on why certain things are functions and others are methods. It does indeed cause some inconsistencies in the language.
http://mail.python.org/pipermail/python-dev/2008-January/076612.html