Reusing names in Python to save memory

The Hitchhiker’s Guide to Python states that it's good to reuse variable names.
foo = Spam()
bar = foo.eggs()
And I agree with it. It makes code readable.
What if the variable is 40 MB of data? Will it be copied, giving 80 MB in total?
foo = buffer # 40 MB.
bar = foo.resize((50, 50)) # +40?
I know that the memory will be released when the function finishes executing, but I still don't think it's a good idea to double the memory usage at one point in the app just for the sake of readability. It's a special case, but on the other hand, special cases aren't special enough, huh?

Python assignment just copies the reference to the target name. There is no data copying. A Python variable is just a name in a namespace dictionary, plus a value that is a reference to the object.
Actually, you should be careful with assignments. Any Python assignment means sharing the reference. Python assignment never means copying the object. With immutable objects like strings or numbers, no problem can arise. However, when you assign a mutable object (a list, a dictionary, a set, some user object), you should know that you are only giving the same object another name (access via another copy of the reference).
The same holds for passing the object as a function/method argument.
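A minimal sketch of the point above, using made-up names (original, alias, independent, append_marker): assignment and argument passing both hand over the same object, and only an explicit copy produces a second one.

```python
def append_marker(items):
    # The parameter is just another name for the caller's list,
    # so this mutation is visible outside the function.
    items.append("marker")

original = [1, 2, 3]
alias = original               # no data is copied; both names refer to one list
independent = list(original)   # an explicit (shallow) copy creates a second list

append_marker(alias)

print(alias is original)        # True: same object
print(independent is original)  # False: separate object
print(original)                 # [1, 2, 3, 'marker']
print(independent)              # [1, 2, 3]
```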

If you absolutely must have that data entirely in memory before you resize it (rather than only reading in the bits you care about), you can do this:
foo = buffer()
bar = foo.resize((50, 50))
del foo
or equivalently:
bar = buffer().resize((50, 50))
Both of these make the result of buffer() available for garbage collection as soon as this code has run.
Also, it can be perfectly reasonable to reuse the variable name in this case - if the lines are immediately one after the other in your code, and especially if foo.resize returns the same type of object as foo (as it seems to), then:
foo = buffer
foo = foo.resize((50, 50))
is perfectly fine. The advice is to not reuse the name for a completely unrelated variable - so that a person reading your code can see the variable and just skip up to wherever it was first assigned to understand what it is. When one of them is just a one-off "stepping stone" to get the actual object you care about, there's only a trivial risk of confusing a reader.
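To see that rebinding the name releases the large object, here is a rough sketch. The Buffer class is a hypothetical stand-in for whatever the question's buffer really is, and the immediate release shown relies on CPython's reference counting; other implementations may free the object later.

```python
import weakref

class Buffer:
    """Hypothetical stand-in for a large image-like object."""
    def __init__(self, size):
        self.size = size
        self.data = bytearray(size[0] * size[1])

    def resize(self, size):
        # Returns a new, independent Buffer; the old one is not modified.
        return Buffer(size)

foo = Buffer((1000, 1000))
watcher = weakref.ref(foo)   # observes the object without keeping it alive

foo = foo.resize((50, 50))   # rebinding drops the only strong reference to the big buffer

print(watcher() is None)     # True on CPython, where reference counting frees it at once
print(foo.size)              # (50, 50)
```

So the two buffers only coexist while resize is running, whichever name you choose for the result.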

Related

Do multiple immutable objects having the same value point to a single object in memory?

Let's say a = 10000000000 and b = 10000000000, i.e. both a and b have the same value.
When I print id() of a and b, the two are always the same, no matter how many times I run the code.
The same holds for float, string, boolean and tuple, but not for lists, sets and dictionaries.
Does that mean that when multiple variables of an immutable type have exactly the same value, they always point to a single object in memory, so a is b will always return True, whereas multiple variables of a mutable type with the same value each point to their own object in memory, so a is b will always return False?

...it always point...
In general yes, but it is not guaranteed. It is a form of internal optimization in Python known as interning.
You should look at it like something that does not matter for immutables, something transparent for the language user. If the object has a value that cannot change, it does not matter what instance of the objects of that type (and with that value) you are reading. That is why you can live with having only one.
As for tuples, note that the objects they contain can change if those objects are mutable; only the tuple itself cannot (that is, which objects it holds and how many).
So for immutables you do not have to worry.
For mutables, you should be careful, not with Python internal optimizations but with the code you write. Because you can have many names referring to the same instance (that now can be changed through any one of these references) and one change will be reflected in all of them. This is more tricky when passing mutables as arguments, because far away code can change the object (what was passed was a copy of the reference to the object, not a copy of the object itself).
It is your responsibility to manage things with mutables. You can create new instances with the same values (copies) or share the objects. You can even pass copies as arguments to protect yourself from unintended side effects of calls.
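A short sketch of passing a copy to protect yourself from side effects. The function normalise and the sample data are invented for illustration; copy.deepcopy is the standard library call.

```python
import copy

def normalise(rows):
    # Mutates its argument in place: whoever passed it in sees the change.
    for row in rows:
        row.sort()
    return rows

data = [[3, 1, 2], [9, 8]]
snapshot = copy.deepcopy(data)   # fully independent copy, including the inner lists

normalise(snapshot)              # only the copy is modified

print(data)       # [[3, 1, 2], [9, 8]]
print(snapshot)   # [[1, 2, 3], [8, 9]]

# A tuple is immutable, but a mutable object inside it can still change.
pair = ([], "fixed")
pair[0].append(1)
print(pair)       # ([1], 'fixed')
```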

About Python dynamic instantiation

I am trying to figure out what's happening here. I want to keep a map (aka dict) of string keys and class values, in order to create new instances at runtime. I omitted the Farm class definition which is not important here.
Well, given the following code:
d = dict(farm = Farm)
# Dynamic instantiation with assignment
f1 = d["farm"]()
f2 = d["farm"]()
print(f1)
print(f2)
# Dynamic instantiation without assignment
print(d["farm"]())
print(d["farm"]())
I get the next output:
C:\Python3\python.exe E:/Programacion/Python/PythonGame/Prueba.py
<BuildingManager.Farm object at 0x00F7B330>
<BuildingManager.Farm object at 0x00F7B730>
<BuildingManager.Farm object at 0x00F7BAD0>
<BuildingManager.Farm object at 0x00F7BAD0>
Process finished with exit code 0
Note that when I print them without being assigned, the ref is the same (0x00F7BAD0).
Why does instantiation in Python always return the same object?

Why does instantiation in Python always return the same object?
It doesn't. Look again at the IDs in your output: only the last one is recycled, and it's still not the same object.
So why are those last two IDs the same, when the first two are different?
In the first two cases you assign to a variable. That variable is kept around for the full execution of your program. Thus, each of the two objects is unique, and remains unique.
Then, there is the third instantiation (the first print statement). This object is created, printed, but never assigned to any variable. Thus, after printing, Python can forget about it. And it does.
In the last instantiation (second print statement), Python creates a new Farm instance, but assigns it the same ID as the one that is not kept around (number 3). That is just convenience, and under the hood, this is probably efficient as well (the memory space is available.)
Thus, you see a recycled ID, even though it is in fact a new instance.

Python didn't return the same object; it returned a new object that just happened to be created at the same address as the previous one. When print(d["farm"]()) is executed, a new object is created and its address is printed. Since there are no references to it, it's available for garbage collection as soon as print returns. When the second print(d["farm"]()) is executed, it just happens to create the object at the same address. Note that this won't happen when you assign the return value to a variable, since an object can't be garbage collected as long as there are references to it.
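The difference can be reproduced with a small sketch. Farm here is an empty placeholder class, not the one from the question, and the address reuse in the second half is a CPython implementation detail that may or may not show up on a given run.

```python
class Farm:
    """Hypothetical placeholder for the class from the question."""

kept_a = Farm()
kept_b = Farm()
# Both objects are alive at the same time, so their ids must differ.
print(id(kept_a) != id(kept_b))   # True

# Each temporary below can be freed right after id() returns, so the
# second one may land at the same address. This is allowed, not guaranteed.
first_temp_id = id(Farm())
second_temp_id = id(Farm())
print(first_temp_id == second_temp_id)   # often True on CPython; do not rely on it
```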

Are numbers considered objects in python?

I am aware that numeric values are immutable in python. I have also read how everything is an object in python. I just want to know if numeric types are also objects in python. Because if they are objects, then the variables are actually reference variables right? Does it mean that if I pass a number to a function and modify it inside a function, then two number objects with two references are created? Is there a concept of primitive data types in python?
Note: I too was thinking of them as objects. But visualizing it in Python Tutor says something different:
http://www.pythontutor.com/visualize.html#mode=edit
def test(a):
    a += 10
b = 100
test(b)
Or is it a defect in the visualization tool?

Are numeric types objects?
>>> isinstance(1, object)
True
Apparently they are. :-).
Note that you might need to adjust your mental model of an object a little. It seems to me that you're thinking of object as something that is "mutable" -- that isn't the case. In reality, we need to think of python names as a reference to an object. That object may hold references to other objects.
name = something
Here, the right hand side is evaluated -- All the names are resolved into objects and the result of the expression (an object) is referenced by "name".
Ok, now let's consider what happens when you pass something to a function.
def foo(x):
    x = 2
z = 3
foo(z)
print(z)
What do we expect to happen here? Well, first we create the function foo. Next, we create the object 3 and reference it by the name z. After that, we look up the value that z references and pass that value to foo. Upon entering foo, that value gets referenced by the (local) name x. We then create the object 2 and reference it by the local name x. Note, x has nothing to do with the global z -- They're independent references. Just because they were referencing the same object when you entered the function doesn't mean that they have to reference the same object for all time. We can change what a name references at any point by using an assignment statement.
Note, your example with += may seem to complicate things, but you can think of a += 10 as a = a + 10 if it helps in this context. For more information on += check out: When is "i += x" different from "i = i + x" in Python?

Everything in Python is an object, and that includes the numbers. There are no "primitive" types, only built-in types.
Numbers, however, are immutable. When you perform an operation with a number, you are creating a new number object.
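A small sketch of what this means for the example in the question. The name add_ten is made up; it returns the new value plus a flag saying whether the local name ended up on a different object.

```python
def add_ten(a):
    before = id(a)
    a += 10   # int is immutable: this builds a new object and rebinds the local name
    return a, before != id(a)

b = 100
result, rebound = add_ten(b)

print(b)         # 100: the caller's name still refers to the original object
print(result)    # 110
print(rebound)   # True: the local name now refers to a different object
print(isinstance(b, object))   # True
```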

memory management with objects and lists in python

I am trying to understand how exactly assignment operators, constructors and parameters passed in functions work in python specifically with lists and objects. I have a class with a list as a parameter. I want to initialize it to an empty list and then want to populate it using the constructor. I am not quite sure how to do it.
Let's say my class is --
class A:
    List = [] # Point 1
    def __init1__(self, begin=[]): # Point 2
        for item in begin:
            self.List.append(item)
    def __init2__(self, begin): # Point 3
        List = begin
    def __init3__(self, begin=[]): # Point 4
        List = list()
        for item in begin:
            self.List.append(item)
listObj = A()
del(listObj)
b = listObj
I have the following questions. It will be awesome if someone could clarify what happens in each case --
1. Is declaring an empty list like in Point 1 valid? What is created? A variable pointing to NULL?
2. Which of Point 2 and Point 3 are valid constructors? In Point 3 I am guessing that a new copy of the list passed in (begin) is not made and instead the variable List will be pointing to the pointer "begin". Is a new copy of the list made if I use the constructor as in Point 2?
3. What happens when I delete the object using del? Is the list deleted as well or do I have to call del on the List before calling del on the containing object? I know Python uses GC but if I am concerned about cleaning unused memory even before GC kicks in is it worth it?
4. Also assigning an object of type A to another only makes the second one point to the first right? If so how do I do a deep copy? Is there a feature to overload operators? I know python is probably much simpler than this and hence the question.
EDIT:
5. I just realized that using Point 2 and Point 3 does not make a difference. The items from the list begin are only copied by reference and a new copy is not made. To do that I have to create a new list using list(). This makes sense after I see it I guess.
Thanks!

In order:
1. Using this form is simply syntactic sugar for calling the list constructor, i.e. you are creating a new (empty) list. It will be bound to the class itself (it is a static field) and will be the same for all instances.
2. Apart from the constructor name, which must always be __init__, both are valid forms, but they mean different things.
The first constructor can be called with a list as argument or without. If it is called without arguments, the empty list passed as default is used within (this empty list is created once during class definition, and not once per constructor call), so no items are added to the static list.
The second must be called with a list parameter, or Python will complain with an error. But without the self. prefix, as you have written it, it would just create a new local variable named List, accessible only within the constructor, and leave the static A.List variable unchanged.
3. Deleting will only unlink a reference to the object, without actually deleting anything. Once all references are removed, however, the garbage collector is free to clear the memory as needed.
It is usually a bad idea to try to control the garbage collector. Instead, just make sure you don't hold references to objects you no longer need and let it do its work.
4. Assigning a variable with an object will only create a new reference to the same object, yes. To create a deep copy, use copy.deepcopy from the copy module or write your own.
Operator overloading (use with care, it can make things more confusing instead of clearer if misused) can be done by overriding some special methods in the class definition.
About your edit: like I pointed out above, when writing List=list() inside the constructor, without the self. (or better, since the variable is static, A.) prefix, you are just creating a new local variable, and not overriding the one you defined in the class body.
For reference, the usual way to handle a list as default argument is by using a None placeholder:
class A(object):
    def __init__(self, arg=None):
        self.startvalue = list(arg) if arg is not None else list()
        # making a defensive copy of arg to keep the original intact
As an aside, do take a look at the python tutorial. It is very well written and easy to follow and understand.
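The three behaviours discussed in this answer can be checked with a short sketch. The names SharedList, OwnList and remember are invented for illustration and are not from the question.

```python
class SharedList:
    items = []                      # class attribute: one list shared by every instance

    def add(self, value):
        self.items.append(value)    # mutates the shared class-level list


class OwnList:
    def __init__(self, begin=None):
        # A fresh list per instance; list(begin) copies, so the caller's list stays intact.
        self.items = list(begin) if begin is not None else []

    def add(self, value):
        self.items.append(value)


def remember(value, seen=[]):       # the default list is created once, at definition time
    seen.append(value)
    return seen


s1, s2 = SharedList(), SharedList()
s1.add("x")
print(s2.items)                     # ['x']: s2 sees the change made through s1

source = [1, 2]
o1, o2 = OwnList(source), OwnList()
o1.add(3)
print(source, o1.items, o2.items)   # [1, 2] [1, 2, 3] []

remember("a")
print(remember("b"))                # ['a', 'b']: the same default list was reused
```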

"It will be awesome if someone could clarify what happens in each case" isn't that the purpose of the dis module ?
http://docs.python.org/2/library/dis.html
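For instance, a rough sketch of using dis on a tiny function (rebind is a made-up name). The exact opcode names differ between Python versions, so no particular listing is shown here.

```python
import dis

def rebind(x):
    x = 2      # rebinding the local name
    return x

# Collect the opcode names; the exact set depends on the Python version.
opnames = [instruction.opname for instruction in dis.get_instructions(rebind)]
print(opnames)

dis.dis(rebind)   # prints a human-readable listing to stdout
```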

How are closures implemented?

"Learning Python, 4th Ed." mentions that:
the enclosing scope variable is looked up when the nested functions are later called..
However, I thought that when a function exits, all of its local references disappear.
def makeActions():
    acts = []
    for i in range(5): # Tries to remember each i
        acts.append(lambda x: i ** x) # All remember same last i!
    return acts
makeActions()[n] is the same for every n because the variable i is somehow looked up at call time. How does Python look up this variable? Shouldn't it not exist at all because makeActions has already exited? Why doesn't Python do what the code intuitively suggests, and define each function by replacing i with its current value within the for loop as the loop is running?

I think it's pretty obvious what happens when you think of i as a name not some sort of value. Your lambda function does something like "take x: look up the value of i, calculate i**x" ... so when you actually run the function, it looks up i just then so i is 4.
You can also use the current number, but you have to make Python bind it to another name:
def makeActions():
    def make_lambda(j):
        return lambda x: j ** x # the j here is still a name, but now it won't change anymore
    acts = []
    for i in range(5):
        # now you're pushing the current i as a value to another scope and
        # binding it there, under a new name
        acts.append(make_lambda(i))
    return acts
It might seem confusing, because you often get taught that a variable and its value are the same thing -- which is true, but only in languages that actually use variables. Python has no variables, but names instead.
About your comment, actually I can illustrate the point a bit better:
i = 5
myList = [i, i, i]
i = 6
print(myList) # myList is still [5, 5, 5].
You said you changed i to 6, but that is not what actually happened: i=6 means "I have a value, 6, and I want to name it i". The fact that you already used i as a name doesn't matter to Python; it will just reassign the name, not change its value (that only works with variables).
You could say that in myList = [i, i, i], whatever value i currently points to (the number 5) gets three new names: myList[0], myList[1], myList[2]. That's the same thing that happens when you call a function: The arguments are given new names. But that is probably going against any intuition about lists ...
This can explain the behavior in the example: You assign myList[0]=5, myList[1]=5, myList[2]=5 - no wonder they don't change when you reassign the i. If i was something mutable, for example a list, then changing i would reflect on all entries in myList too, because you just have different names for the same value!
The simple fact that you can use myList[0] on the left hand of a = proves that it is indeed a name. I like to call = the assign name operator: It takes a name on the left, and an expression on the right, then evaluates the expression (call function, look up the values behind names) until it has a value and finally gives the name to the value. It does not change anything.
For Mark's comment about compiling functions:
Well, references (and pointers) only make sense when we have some sort of addressable memory. The values are stored somewhere in memory and references lead you to that place. Using a reference means going to that place in memory and doing something with it. The problem is that none of these concepts are used by Python!
The Python VM has no concept of memory - values float somewhere in space and names are little tags connected to them (by a little red string). Names and values exist in separate worlds!
This makes a big difference when you compile a function. If you have references, you know the memory location of the object you refer to. Then you can simply replace the reference with this location.
Names on the other hand have no location, so what you have to do (during runtime) is follow that little red string and use whatever is on the other end. That is the way Python compiles functions: Wherever there is a name in the code, it adds an instruction that will figure out what that name stands for.
So basically Python does fully compile functions, but names are compiled as lookups in the nesting namespaces, not as some sort of reference to memory.
When you use a name, the Python compiler will try to figure out which namespace it belongs to. This results in an instruction to load that name from the namespace it found.
Which brings you back to your original problem: In lambda x: i ** x, the i is compiled as a lookup in the makeActions namespace (because i was used there). Python has no idea about the value behind it, nor does it care (it does not even have to be a valid name). Once that code runs, the i gets looked up in its original namespace and gives the more or less expected value.
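Both behaviours can be checked side by side. This is only a sketch: make_actions_late mirrors the question's code, while make_actions_bound uses the common default-argument idiom, which is an alternative to the helper function rather than something from this answer.

```python
def make_actions_late():
    acts = []
    for i in range(5):
        # i is looked up when the lambda is called, after the loop has finished.
        acts.append(lambda x: i ** x)
    return acts

def make_actions_bound():
    acts = []
    for i in range(5):
        # The default value is evaluated now, so each lambda keeps its own i.
        acts.append(lambda x, i=i: i ** x)
    return acts

print([f(2) for f in make_actions_late()])    # [16, 16, 16, 16, 16]
print([f(2) for f in make_actions_bound()])   # [0, 1, 4, 9, 16]
```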

What happens when you create a closure:
The closure is constructed with a pointer to the frame (or roughly, block) that it was created in: in this case, the for block.
The closure actually assumes shared ownership of that frame, by incrementing the frame's ref count and stashing the pointer to that frame in the closure. That frame, in turn, keeps around references to the frames it was enclosed in, for variables that were captured further up the stack.
The value of i in that frame keeps changing as long as the for loop is running – each assignment to i updates the binding of i in that frame.
Once the for loop exits, the frame is popped off the stack, but it isn't thrown away as it might usually be! Instead, it's kept around because the closure's reference to the frame is still active. At this point, though, the value of i is no longer updated.
When the closure is invoked, it picks up whatever value of i is in the parent frame at the time of invocation. Since in the for loop you create closures, but don't actually invoke them, the value of i upon invocation will be the last value it had after all the looping was done.
Future calls to makeActions will create different frames. You won't reuse the for loop's previous frame, or update that previous frame's i value, in that case.
In short: frames are garbage-collected just like other Python objects, and in this case, an extra reference is kept around to the frame corresponding to the for block so it doesn't get destroyed when the for loop goes out of scope.
To get the effect you want, you need to have a new frame created for each value of i you want to capture, and each lambda needs to be created with a reference to that new frame. You won't get that from the for block itself, but you could get that from a call to a helper function which will establish the new frame. See THC4k's answer for one possible solution along these lines.

The local references persist because they're contained in the local scope, which the closure keeps a reference to.

I thought that when a function exits, all of its local references disappear.
Except for those locals which are closed over in a closure. Those do not disappear, even when the function to which they are local has returned.
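In CPython you can inspect what a closure keeps alive: each closed-over name is stored in a cell object reachable through the function's __closure__ attribute. A small sketch, with the made-up names make_counter_reader, read and count:

```python
def make_counter_reader():
    count = 0
    def read():
        return count   # count is a free variable of read
    count = 42         # rebinding after read was defined is still visible to it
    return read

reader = make_counter_reader()

print(reader.__code__.co_freevars)            # ('count',)
print(reader.__closure__[0].cell_contents)    # 42
print(reader())                               # 42
```

The cell holds the binding of the name, not a snapshot of its value, which is why read sees 42 rather than 0.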

Intuitively one might think i would be captured in its current state but that is not the case. Think of each layer as a dictionary of name value pairs.
Level 1:
acts
i
Level 2:
x
Every time you create a closure for the inner lambda you are capturing a reference to level one. I can only assume that the run-time will perform a look-up of the variable i, starting in level 2 and making its way to level 1. Since you are not executing these functions immediately they will all use the final value of i.
Experts?
