About Python dynamic instantiation

I am trying to figure out what's happening here. I want to keep a map (aka dict) of string keys and class values, in order to create new instances at runtime. I omitted the Farm class definition which is not important here.
Well, given the following code:
d = dict(farm = Farm)
# Dynamic instantiation with assignment
f1 = d["farm"]()
f2 = d["farm"]()
print(f1)
print(f2)
# Dynamic instantiation without assignment
print(d["farm"]())
print(d["farm"]())
I get the following output:
C:\Python3\python.exe E:/Programacion/Python/PythonGame/Prueba.py
<BuildingManager.Farm object at 0x00F7B330>
<BuildingManager.Farm object at 0x00F7B730>
<BuildingManager.Farm object at 0x00F7BAD0>
<BuildingManager.Farm object at 0x00F7BAD0>
Process finished with exit code 0
Note that when I print them without being assigned, the ref is the same (0x00F7BAD0).
Why does instantiation in Python always return the same object?

Why does instantiation in Python always return the same object?
It doesn't. Look again at the IDs returned by your output:
Only the last one is recycled. And it's still not the same object.
So why are those two last IDs the same, while the first two are different?
In the first two cases you assign to a variable. That variable is kept around for the full execution of your program. Thus, each of the two objects is unique, and remains unique.
Then, there is the third instantiation (the first print statement). This object is created, printed, but never assigned to any variable. Thus, after printing, Python can forget about it. And it does.
In the last instantiation (the second print statement), Python creates a new Farm instance, but it happens to get the same ID as the instance that was not kept around (number 3). The ID (in CPython, the memory address) is simply reused; under the hood this is efficient as well, since that memory space is available.
Thus, you see a recycled ID, even though it is in fact a new instance.

Python didn't return the same object; it returned a new object that just happened to be created at the same address as the previous one. When print(d["farm"]()) is executed, a new object is created and its address is printed. Since there are no references to it, it is available for garbage collection as soon as print returns. When the second print(d["farm"]()) is executed, it just happens to create the object at the same address. Note that this won't happen when you assign the return value to a variable, since an object can't be garbage collected as long as there are references to it.
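The point can be checked directly. This is a minimal sketch with a stand-in Farm class (the original definition was omitted from the question); note that address reuse for temporaries is a CPython implementation detail, not something the language guarantees:

```python
class Farm:
    pass

d = dict(farm=Farm)

# Holding both instances in names keeps them alive simultaneously,
# so their ids are guaranteed to differ.
f1 = d["farm"]()
f2 = d["farm"]()
assert id(f1) != id(f2)

# A temporary instance is reclaimed as soon as id()/print() returns,
# so the next temporary *may* land at the same address -- but CPython
# does not promise it.
print(id(d["farm"]()))
print(id(d["farm"]()))
```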

Related

Why are references to python values, that are function parameters, stored on the stack(frame) in CPython?

Python works with reference counting. That means that if there are no more references to a value, the memory for that value is recycled. In other words: as long as there is at least one remaining reference, the object is not deleted and the memory is not released.
Let's consider the following example:
def myfn():
    result = work_with(BigObj())  # reference 1 to BigObj is on the stack frame.
                                  # Not yet counting any reference inside of
                                  # the work_with function.
                                  # After work_with returns: the stack frame
                                  # and reference 1 are deleted; the memory of
                                  # BigObj is released.
    return result

def work_with(big_obj):  # here we have another reference to BigObj
    big_obj = None  # let's assume that we need more memory and we don't
                    # need big_obj any more.
                    # The reference inside work_with is deleted. However,
                    # there is still the reference on the stack, so the
                    # memory is not released until work_with returns.
    other_big_obj = BigObj()  # we need the memory for another BigObj -> we may
                              # run out of memory here
So my question is:
Why does CPython hold an additional reference to values which are passed to functions on the stack? Is there any special purpose behind this or is it just an "unlucky" implementation detail?
My first thought on this is:
To prevent the reference count from dropping to zero. However, we still have a live reference inside the called function, so this does not make sense to me.
It is the way CPython passes parameters to a function. The frame holds a reference to its argument to allow passing temporary objects. And the frame is destroyed only when the function returns, so all parameters get an additional reference during the function call.
This is the reason why the doc for sys.getrefcount says:
The count returned is generally one higher than you might expect, because it includes the (temporary) reference as an argument to getrefcount().
In fact, in the callee, the reference to the arguments is known to be a borrowed reference, meaning that the callee never has to decrement it. So when you set it to None it will not destroy the object.
A different implementation would be possible, where the callee would decrement the reference to its arguments. The benefit would be that it would allow immediate destruction of temporaries. But the drawback would be that the callee would have to explicitly decrement the reference count of all its parameters. At the C level, ref counting is already tedious, and I assume that the Python implementers made that choice for simplicity.
By the way, it only matters when you pass a large temporary object to a function which is not the most common use case.
TL/DR: IMHO there is no real rationale for preventing a function to immediately destroy a temporary, it is just a consequence of the general implementation of functions in CPython.
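The extra per-call reference is easy to observe with sys.getrefcount. A minimal sketch; the exact count reported is a CPython implementation detail:

```python
import sys

x = object()  # one strong reference: the name x

# getrefcount reports one more than you might expect, because the
# argument itself creates a temporary reference on the call's frame
# (a CPython implementation detail).
print(sys.getrefcount(x))  # typically 2: the name x plus the argument
```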

Applying `id` to a method is always different in IPython. Can someone explain this odd behavior?

I ran across a Python behavior that seems very strange to me, and I would like to understand it.
Normally I expect the id function to always return the same value when I pass the same object to it. In CPython it corresponds to the location of the object in memory.
When I create an object and apply id the result is always the same, but when I use id on a bound method of the object the result changes. Why is this? Is a new method being created each time I get the method attribute?
I first noticed this in IPython, but it was more difficult to make a script that shows the same behavior. Maybe this is partially an IPython thing?
I did manage to write a small block that partially recreates the behavior.
# Create an object
class Foo(object):
    def bar(self):
        pass

obj = Foo()

for _ in range(10):
    print(id(obj))
# ... prints the same number

for _ in range(10):
    print(id(obj.bar))
# ... in this case the first number is different and the rest are the same
This is slightly different than just pasting the line print(id(obj.bar)) into IPython a bunch of times because the returned ids are mostly consistent. However, when I just run the above code as a python script, all the numbers are the same, so it seems that this is an IPython quirk. I guess the question now is: why?
Every time you retrieve a method from a class instance, you get a bound method which will fill in the instance as the first parameter (self) when called. A new bound method is created every time. However, in your test, only one bound method exists at a time; the previous becomes eligible for garbage collection before you create the next one. It is therefore likely (but by no means guaranteed) that the new bound method will be allocated at the same address as the one just freed, and therefore will have the same id. If you collected them all in a list, so that they all existed at the same time, they would definitely have distinct ids.
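The last point can be verified directly. A minimal sketch: keeping every bound method alive at once forces distinct ids, while all of them wrap the same underlying function:

```python
class Foo:
    def bar(self):
        pass

obj = Foo()

# Each attribute access creates a fresh bound-method object. Keeping them
# all alive in a list guarantees distinct ids.
methods = [obj.bar for _ in range(5)]
assert len(set(id(m) for m in methods)) == 5

# They are distinct objects, but all wrap the same plain function.
assert all(m.__func__ is Foo.bar for m in methods)
```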
The bound method is not the same as unbound class function:
In [539]: Foo.bar
Out[539]: <function __main__.Foo.bar>
In [540]: id(Foo.bar)
Out[540]: 2951600788
In [541]: obj=Foo()
In [542]: obj.bar
Out[542]: <bound method Foo.bar of <__main__.Foo object at 0xaf63c0cc>>
In [543]: id(obj.bar)
Out[543]: 2942557836
In [544]: obj1=Foo()
In [545]: id(obj1.bar) # different obj, different bound method
Out[545]: 2996305612
In [546]: id(obj.bar) # different from the previous time
Out[546]: 2942663116
So it is creating a new bound method each time you reference it.
All the bound methods link to the same unbound method, Foo.bar:
In [549]: obj.bar.__func__
Out[549]: <function __main__.Foo.bar>
In [550]: id(obj.bar.__func__)
Out[550]: 2951600788
In [551]: id(obj1.bar.__func__)
Out[551]: 2951600788
This is because Python creates objects at run time. When you start your script, the object is created once, and you see the same number in each loop iteration. When you run the print in separate IPython entries, a new object is created each time.

Reusing names in Python to save memory

The Hitchhiker’s Guide to Python states that it's good to reuse variable names.
foo = Spam()
bar = foo.eggs()
And I agree w/ it. It makes code readable.
What if the variable holds 40 MB of data? Will it copy itself, so we end up with 80 MB in total?
foo = buffer # 40 MB.
bar = foo.resize((50, 50)) # +40?
I know that the memory will be released eventually, but I still don't think it's a good idea to have twice the memory usage at one point of the app only because of readability. It's a special case, but on the other hand, special cases aren't special enough, huh?
Python assignment just copies the reference value to the target object. No data is copied. A Python variable is just a name in a Python namespace dictionary, plus a value which is the reference to the object.
Actually, you should be careful with assignments. Any Python assignment means sharing the reference value; Python assignment never means copying the target object. When working with immutable objects like strings or numbers, no problem can arise. However, when you assign any mutable object (a list, a dictionary, a set, some user object), you should know that afterwards you are only giving the target object a different name (access via another copy of the reference value).
The same holds for passing the object as a function/method argument.
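A quick way to see that assignment and argument passing share references rather than copying data (a minimal sketch):

```python
a = [1, 2, 3]
b = a  # copies only the reference, not the data

assert a is b  # both names refer to the same object
b.append(4)
assert a == [1, 2, 3, 4]  # a mutation through one name is visible via the other

def extend(seq):
    seq.append(5)  # the parameter is just another reference to the same list

extend(a)
assert b == [1, 2, 3, 4, 5]
```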
If you absolutely must have that data entirely in memory before you resize it (rather than only reading in the bits you care about), you can do this:
foo = buffer()
bar = foo.resize((50, 50))
del foo
or equivalently:
bar = buffer().resize((50, 50))
both of these make the result of buffer immediately available for garbage collection as soon as this code is run.
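Whether del actually releases the object right away can be probed with a weak reference. A sketch: Buffer is a hypothetical stand-in for the 40 MB object in the question, and immediate reclamation is CPython refcounting behavior, not a language guarantee:

```python
import weakref

class Buffer:
    """Hypothetical stand-in for the large buffer in the question."""
    pass

foo = Buffer()
probe = weakref.ref(foo)  # observe the object without keeping it alive

del foo  # drop the only strong reference
# In CPython the refcount hits zero and the object is reclaimed
# immediately (no cycles are involved); other implementations may
# delay this until a garbage-collection pass.
assert probe() is None
```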
Also, it can be perfectly reasonable to reuse the variable name in this case - if the lines are immediately one after the other in your code, and especially if foo.resize returns the same type of object as foo (as it seems to), then:
foo = buffer
foo = foo.resize((50, 50))
is perfectly fine. The advice is to not reuse the name for a completely unrelated variable - so that a person reading your code can see the variable and just skip up to wherever it was first assigned to understand what it is. When one of them is just a one-off "stepping stone" to get the actual object you care about, there's only a trivial risk of confusing a reader.

How to create dangling pointer (in stack or heap) in python

I was wondering, is there any way to create a dangling pointer in Python? I guess we have to manually delete an object, for example, and then the reference to that object will point at a location that has no meaning for the program.
I found this example here:
import weakref

class Object:
    pass

o = Object()  # new instance
print("o id is:", id(o))
r = weakref.ref(o)
print("r id is:", id(r))
o2 = r()
print("o2 id is:", id(o2))
print("r() id is:", id(r()))
print(o is o2)
del o, o2
print(r(), r)  # if the referent no longer exists, calling the reference object returns None
o = r()  # r is a weak reference object
if o is None:
    # referent has been garbage collected
    print("Object has been deallocated; can't frobnicate.")
else:
    print("Object is still live!")
    o.do_something_useful()
In this example, which one is the dangling pointer/reference? Is it o or r? I am confused.
Is it also possible to create dangling pointers on the stack? If you please, give me some simple examples so I can understand how it goes.
Thanks in advance.
All Python objects live on the heap. The stack is only used for function calls.
Calling a weakref object dereferences it and gives you a strong reference to the object, if the object is still around. Otherwise, you get None. In the latter case, you might call the weakref "dangling" (r in your example).
However, Python does not have any notion of a "dangling pointer" in the same way that C does. It's not possible (barring a bug in Python, a buggy extension module or misuse of a module like ctypes) to create a name (strong reference) that refers to a deleted object, because by definition strong references keep their referents alive. On the other hand, weak references are not really dangling pointers, since they are automatically resolved to None if their referents are deleted.
Note that with ctypes abuse it is possible to create a "real" dangling pointer:
import ctypes
a = (1, 2, 3)
ctypes.pythonapi.Py_DecRef(ctypes.py_object(a))
print(a)
What happens when you print a is now undefined. It might crash the interpreter, print (1, 2, 3), print other tuples, or execute a random function. Of course, this is only possible because you abused ctypes; it's not something that you should ever do.
Barring a bug in Python or an extension, there is no way to refer to a deallocated object. Weak references refer to the object as long as it is alive, while not contributing to keeping it alive. The moment the object is deallocated, the weak reference evaluates to None, so you never get the dangling object. (Even the callback of the weak reference is called after the object has already been deallocated and the weakref dereferences to None, so you cannot resurrect it, either.)
If you could refer to a real deallocated object, Python would most likely crash on first access, because the memory previously held by the object would be reused and the object's type and other slots would contain garbage. Python objects are never allocated on the stack.
If you have a use case why you need to make use of a dangling object, you should present the use case in the form of a question.
If you create a weak reference, it becomes "dangling" when the referenced object is deleted (when its reference count reaches zero, or it is part of a closed cycle of objects not referenced by anything else). This is possible because the weakref doesn't increase the reference count itself (that's the whole point of a weak reference).
When this happens, every time you try to "dereference" the weakref object (call it), it returns None.
It is important to remember that in Python variables are actually names, pointing at objects. They are actually "strong references". Example:
import weakref

class A:
    pass

# a new object is created, and the name "x" is set to reference the object,
# giving a reference count of 1
x = A()

# a weak reference is created referencing the object that the name x references;
# the reference count is still 1, though, because x is still the only strong
# reference
weak_reference = weakref.ref(x)

# the only strong reference to the object is deleted (x), reducing the reference
# count to 0; this means that the object is destroyed, and at this point
# "weak_reference" becomes dangling, and calls to it return None
del x
assert weak_reference() is None

How are closures implemented?

"Learning Python, 4th Ed." mentions that:
the enclosing scope variable is looked up when the nested functions
are later called..
However, I thought that when a function exits, all of its local references disappear.
def makeActions():
    acts = []
    for i in range(5):  # Tries to remember each i
        acts.append(lambda x: i ** x)  # All remember same last i!
    return acts
makeActions()[n] is the same for every n because the variable i is somehow looked up at call time. How does Python look up this variable? Shouldn't it not exist at all because makeActions has already exited? Why doesn't Python do what the code intuitively suggests, and define each function by replacing i with its current value within the for loop as the loop is running?
I think it's pretty obvious what happens when you think of i as a name not some sort of value. Your lambda function does something like "take x: look up the value of i, calculate i**x" ... so when you actually run the function, it looks up i just then so i is 4.
You can also use the current number, but you have to make Python bind it to another name:
def makeActions():
    def make_lambda(j):
        return lambda x: j * x  # the j here is still a name, but now it won't change any more
    acts = []
    for i in range(5):
        # now you're pushing the current i as a value to another scope and
        # binding it there, under a new name
        acts.append(make_lambda(i))
    return acts
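A common alternative to the helper function is the default-argument idiom, which also forces early binding. A minimal sketch, using the i ** x body from the question:

```python
def makeActions():
    acts = []
    for i in range(5):
        # j=i evaluates i *now* and stores the value as a default,
        # so each lambda keeps its own j
        acts.append(lambda x, j=i: j ** x)
    return acts

acts = makeActions()
assert acts[2](3) == 8   # 2 ** 3
assert acts[4](2) == 16  # 4 ** 2
```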
It might seem confusing, because you are often taught that a variable and its value are the same thing, which is true, but only in languages that actually use variables. Python has no variables, but names instead.
About your comment, actually i can illustrate the point a bit better:
i = 5
myList = [i, i, i]
i = 6
print(myList) # myList is still [5, 5, 5].
You said you changed i to 6, but that is not what actually happened: i = 6 means "I have a value, 6, and I want to name it i". The fact that you already used i as a name matters nothing to Python; it will just reassign the name, not change its value (that only works with variables).
You could say that in myList = [i, i, i], whatever value i currently points to (the number 5) gets three new names: mylist[0], mylist[1], mylist[2]. That's the same thing that happens when you call a function: The arguments are given new names. But that is probably going against any intuition about lists ...
This explains the behavior in the example: you assign mylist[0]=5, mylist[1]=5, mylist[2]=5; no wonder they don't change when you reassign i. If i were something mutable, for example a list, then changing i would be reflected in all entries of myList too, because you would just have different names for the same value!
The simple fact that you can use mylist[0] on the left-hand side of a = proves that it is indeed a name. I like to call = the assign-name operator: it takes a name on the left and an expression on the right, then evaluates the expression (calling functions, looking up the values behind names) until it has a value, and finally gives the name to the value. It does not change anything.
For Mark's comment about compiling functions:
Well, references (and pointers) only make sense when we have some sort of addressable memory. The values are stored somewhere in memory and references lead you that place. Using a reference means going to that place in memory and doing something with it. The problem is that none of these concepts are used by Python!
The Python VM has no concept of memory - values float somewhere in space and names are little tags connected to them (by a little red string). Names and values exist in separate worlds!
This makes a big difference when you compile a function. If you have references, you know the memory location of the object you refer to. Then you can simply replace the reference with this location.
Names on the other hand have no location, so what you have to do (during runtime) is follow that little red string and use whatever is on the other end. That is the way Python compiles functions: wherever there is a name in the code, it adds an instruction that will figure out what that name stands for.
So basically Python does fully compile functions, but names are compiled as lookups in the nesting namespaces, not as some sort of reference to memory.
When you use a name, the Python compiler will try to figure out which namespace it belongs to. This results in an instruction to load that name from the namespace it found.
Which brings you back to your original problem: in lambda x: i ** x, the i is compiled as a lookup in the makeActions namespace (because i was used there). Python has no idea, nor does it care, about the value behind it (it does not even have to be a valid name). Once that code runs, the i gets looked up in its original namespace and gives the more or less expected value.
What happens when you create a closure:
The closure is constructed with a pointer to the frame (or roughly, block) that it was created in: in this case, the for block.
The closure actually assumes shared ownership of that frame, by incrementing the frame's ref count and stashing the pointer to that frame in the closure. That frame, in turn, keeps around references to the frames it was enclosed in, for variables that were captured further up the stack.
The value of i in that frame keeps changing as long as the for loop is running – each assignment to i updates the binding of i in that frame.
Once the for loop exits, the frame is popped off the stack, but it isn't thrown away as it might usually be! Instead, it's kept around because the closure's reference to the frame is still active. At this point, though, the value of i is no longer updated.
When the closure is invoked, it picks up whatever value of i is in the parent frame at the time of invocation. Since in the for loop you create closures, but don't actually invoke them, the value of i upon invocation will be the last value it had after all the looping was done.
Future calls to makeActions will create different frames. You won't reuse the for loop's previous frame, or update that previous frame's i value, in that case.
In short: frames are garbage-collected just like other Python objects, and in this case, an extra reference is kept around to the frame corresponding to the for block so it doesn't get destroyed when the for loop goes out of scope.
To get the effect you want, you need to have a new frame created for each value of i you want to capture, and each lambda needs to be created with a reference to that new frame. You won't get that from the for block itself, but you could get that from a call to a helper function which will establish the new frame. See THC4k's answer for one possible solution along these lines.
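As a side note, in CPython the mechanism described above is implemented with cell objects rather than whole retained frames; the captured variable can be inspected through the function's __closure__ attribute. A minimal sketch:

```python
def makeAction():
    i = 1
    act = lambda x: i ** x  # closes over the name i
    i = 4                   # rebinding i updates the shared cell
    return act

act = makeAction()
assert act(2) == 16  # the lambda sees the *final* value of i: 4 ** 2

# The captured variable lives in a cell object, reachable via __closure__;
# the lambda reads the cell's contents at call time.
assert act.__closure__[0].cell_contents == 4
```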
The local references persist because they're contained in the local scope, which the closure keeps a reference to.
I thought that when a function exits, all of its local references disappear.
Except for those locals which are closed over in a closure. Those do not disappear, even when the function to which they are local has returned.
Intuitively one might think i would be captured in its current state but that is not the case. Think of each layer as a dictionary of name value pairs.
Level 1:
acts
i
Level 2:
x
Every time you create a closure for the inner lambda you are capturing a reference to level one. I can only assume that the run-time will perform a look-up of the variable i, starting in level 2 and making its way to level 1. Since you are not executing these functions immediately they will all use the final value of i.
Experts?
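Running the original example confirms the late lookup described in this thread (a minimal sketch):

```python
def makeActions():
    acts = []
    for i in range(5):
        acts.append(lambda x: i ** x)  # every lambda looks up the same i
    return acts

acts = makeActions()
# After the loop, i is 4 in the enclosing scope, so every
# function computes 4 ** x.
assert [f(2) for f in acts] == [16, 16, 16, 16, 16]
```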
