How to delete a function argument early? - python

I'm writing a function which takes a huge argument, and runs for a long time. It needs the argument only halfway. Is there a way for the function to delete the value pointed to by the argument if there are no more references to it?
I was able to get it deleted as soon as the function returns, like this:
def f(m):
    print 'S1'
    m = None
    #__import__('gc').collect()  # Uncommenting this doesn't help.
    print 'S2'
class M(object):
    def __del__(self):
        print '__del__'
f(M())
This prints:
S1
S2
__del__
I need:
S1
__del__
S2
I was also trying def f(*args): and def f(**kwargs), but it didn't help, I still get __del__ last.
Please note that my code is relying on the fact that CPython uses reference counting, and __del__ gets called as soon as an object's reference count drops to zero. I want the reference count of a function argument to drop to zero in the middle of a function. Is this possible?
Please note that I know of a workaround: passing a list of arguments:
def f(ms):
    print 'S1'
    del ms[:]
    print 'S2'
class M(object):
    def __del__(self):
        print '__del__'
f([M()])
This prints:
S1
__del__
S2
Is there a way to get the early deletion without changing the API (e.g. introducing lists to the arguments)?
If it's hard to get a portable solution which works in many Python implementations, I need something which works in the most recent CPython 2.7. It doesn't have to be documented.

From the documentation:
CPython implementation detail: CPython currently uses a reference-counting scheme with (optional) delayed detection of cyclically linked garbage, which collects most objects as soon as they become unreachable, but is not guaranteed to collect garbage containing circular references. See the documentation of the gc module for information on controlling the collection of cyclic garbage. Other implementations act differently and CPython may change. Do not depend on immediate finalization of objects when they become unreachable (ex: always close files).
Short of modifying the interpreter yourself, you cannot achieve what you want. __del__ will be called when the interpreter decides to do it.

It looks like it's not possible to do the early deletion in CPython 2.7 without changing the API of the f function.

Related

Why are references to python values, that are function parameters, stored on the stack(frame) in CPython?

Python works with reference counting. That means that when there are no more references to a value, the memory of that value is recycled. In other words, as long as at least one reference remains, the object is not deleted and its memory is not released.
Let's consider the following example:
def myfn():
    result = work_with(BigObj())  # reference 1 to BigObj is on the stack frame.
                                  # Not yet counting any
                                  # reference inside of work_with function
                                  # after work_with returns: The stack frame
                                  # and reference 1 are deleted. memory of BigObj
                                  # is released
    return result
def work_with(big_obj):  # here we have another reference to BigObj
    big_obj = None  # let's assume, that need more memory and we don't
                    # need big_obj any_more
                    # the reference inside work_with is deleted. However,
                    # there is still the reference on the stack. So the
                    # memory is not released until work_with returns
    other_big_obj = BigObj()  # we need the memory for another BigObj -> we may run
                              # out of memory here
So my question is:
Why does CPython hold an additional reference to values which are passed to functions on the stack? Is there any special purpose behind this or is it just an "unlucky" implementation detail?
My first thought on this is:
To prevent the reference count from dropping to zero. However, we still have a live reference inside the called function, so this does not make any sense to me.
It is the way CPython passes parameters to a function. The frame holds a reference to its argument to allow passing temporary objects. And the frame is destroyed only when the function returns, so all parameters get an additional reference during the function call.
This is the reason why the doc for sys.getrefcount says:
The count returned is generally one higher than you might expect, because it includes the (temporary) reference as an argument to getrefcount().
In fact, in the callee, the reference to the arguments is known to be a borrowed reference, meaning that the callee never has to decrement it. So when you set it to None it will not destroy the object.
A different implementation would be possible, where the callee decrements the reference to its arguments. The benefit would be that it would allow immediate destruction of temporaries. But the drawback would be that the callee would have to explicitly decrement the reference count of all its parameters. At C level, ref counting is already tedious, and I assume that Python implementers made that choice for simplicity.
By the way, it only matters when you pass a large temporary object to a function which is not the most common use case.
TL;DR: IMHO there is no real rationale for preventing a function from immediately destroying a temporary; it is just a consequence of the general implementation of functions in CPython.
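To see what a particular interpreter does with a temporary argument, a small probe can record the order of events instead of assuming it. This is only a sketch: Probe, events and work_with are hypothetical names, and the outcome depends on the interpreter and its version, so no particular order is claimed here.

```python
events = []  # hypothetical log of what happened, in order


class Probe(object):
    """Stand-in for a large temporary object (illustrative only)."""

    def __del__(self):
        events.append('__del__')


def work_with(big_obj):
    events.append('S1')
    big_obj = None  # drop the callee's own reference
    events.append('S2')


work_with(Probe())  # pass a temporary, as in the question

# True when nothing was finalised between 'S1' and 'S2', i.e. something
# outside work_with kept the temporary alive while the function was running.
kept_alive_until_return = events[:2] == ['S1', 'S2']
print(events, kept_alive_until_return)
```

If kept_alive_until_return is True, rebinding the parameter was not enough to free the object during the call on that interpreter.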

python - gc unreachable when reload()

I have this code, save as so.py:
import gc
gc.set_debug(gc.DEBUG_STATS|gc.DEBUG_LEAK)
class GUI():
    #########################################
    def set_func(self):
        self.functions = {}
        self.functions[100] = self.userInput
    #########################################
    def userInput(self):
        a = 1
g = GUI()
g.set_func()
print gc.collect()
print gc.garbage
And this is the output:
I have two questions:
Why does gc.collect() not report unreachable objects on the first import? It reports them only on reload().
Is there a quick way to fix this function-mapping circular reference, i.e. self.functions[100] = self.userInput? My old project has a lot of these function-mapping circular references, and I'm looking for a quick, one-line way to change the code. Currently what I do is "del g.functions" for all these functions at the end.
The first time you import the module nothing is being collected because you have a reference to the so module and all other objects are referenced by it, so they are all alive and the garbage collector has nothing to collect.
When you reload(so) what happens is that the module is reexecuted, overriding all previous references and thus now the old values don't have any reference anymore.
You do have a reference cycle in:
self.functions[100] = self.userInput
since self.userInput is a bound method it has a reference to self. So now self has a reference to the functions dictionary which has a reference to the userInput bound method which has a reference to self and the gc will collect those objects.
It depends on what you are trying to do. From your code it is not clear how you are using that self.functions dictionary, and depending on that, different options may be viable.
The simplest way to break the cycle is to simply not create the self.functions attribute, but pass the dictionary around explicitly.
If self.functions only references bound methods you could store the name of the methods instead of the method itself:
self.functions[100] = self.userInput.__name__
and then you can call the method doing:
getattr(self, self.functions[100])()
or you can do:
from operator import methodcaller
call_method = methodcaller(self.functions[100])
call_method(self) # calls self.userInput()
I don't really understand what you mean by "Currently what i do is del g.functions for all this functions at the end." Which functions are you talking about?
Also, is this really a problem? Are you experiencing a real memory leak?
Note that the garbage collector reports the objects as unreachable not as uncollectable. This means that the objects are freed even if they are part of a reference cycle. So no memory leak should happen.
In fact adding del g.functions is useless because the objects are going to be freed anyway, so the one line fix is to simply remove all those del statements, since they don't do anything at all.
They end up in gc.garbage because gc.DEBUG_LEAK implies the flag gc.DEBUG_SAVEALL, which makes the collector put all unreachable objects into gc.garbage and not just the uncollectable ones.
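The effect of gc.DEBUG_SAVEALL can be checked in isolation. The following is a minimal sketch, assuming a current CPython; Node, make_cycle and nodes_in_garbage are hypothetical helpers that are not part of the question's code.

```python
import gc


class Node(object):
    """Hypothetical class used only to build a reference cycle."""


def make_cycle():
    node = Node()
    node.me = node  # the object refers to itself: a reference cycle


def nodes_in_garbage():
    return sum(1 for obj in gc.garbage if isinstance(obj, Node))


gc.collect()       # start from a clean slate
del gc.garbage[:]

make_cycle()
gc.collect()
without_saveall = nodes_in_garbage()  # the cycle is freed, nothing is kept

gc.set_debug(gc.DEBUG_SAVEALL)
make_cycle()
gc.collect()
with_saveall = nodes_in_garbage()     # the cycle is parked in gc.garbage

gc.set_debug(0)    # restore the defaults and let the saved objects go
del gc.garbage[:]
gc.collect()

print(without_saveall, with_saveall)
```

Under these assumptions the Node instance shows up in gc.garbage only while the flag is set, which matches the explanation that a non-empty gc.garbage under DEBUG_LEAK does not by itself indicate a leak.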
The nature of reload is that the module is re-executed. The new definitions supersede the old ones, so the old values become unreachable. By contrast, on the first import, there are no superseded definitions, so naturally there is nothing to become unreachable.
One way is to pass the functions object as a parameter to set_func, and do not assign it as an instance attribute. This will break the cycle while still allowing you to pass the functions object to where it's needed.
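The difference between storing bound methods and storing method names can be observed with a weak reference. This is a sketch under the assumption of CPython's reference counting; CyclicGUI, NameGUI and dies_without_gc are hypothetical names introduced for illustration.

```python
import gc
import weakref


class CyclicGUI(object):
    def set_func(self):
        # Bound method stored on the instance: self -> dict -> method -> self
        self.functions = {100: self.user_input}

    def user_input(self):
        return 1


class NameGUI(object):
    def set_func(self):
        # Only the method name is stored, so nothing points back to self
        self.functions = {100: 'user_input'}

    def call(self, key):
        return getattr(self, self.functions[key])()

    def user_input(self):
        return 1


def dies_without_gc(cls):
    """Return True if an instance is freed by reference counting alone."""
    obj = cls()
    obj.set_func()
    probe = weakref.ref(obj)
    del obj
    return probe() is None


gc.disable()  # keep the cycle collector out of the measurement
try:
    cyclic_freed = dies_without_gc(CyclicGUI)
    named_freed = dies_without_gc(NameGUI)
finally:
    gc.enable()
gc.collect()  # clean up the cycle left behind by CyclicGUI

gui = NameGUI()
gui.set_func()
result = gui.call(100)  # dispatch by name still reaches user_input

print(cyclic_freed, named_freed, result)
```

On CPython the cyclic variant should survive until the collector runs, while the name-based variant should be freed as soon as the last name is deleted.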

python creates everything from heap?

In C/C++, a local variable created inside a function lives on the stack.
http://effbot.org/zone/call-by-object.htm
CLU objects exist independently of procedure activations. Space
for objects is allocated from a dynamic storage area /.../ In
theory, all objects continue to exist forever. In practice, the
space used by an object may be reclaimed when the object is no
longer accessible to any CLU program.
Does this mean objects in Python are created on the heap (as with malloc in C/C++), and that objects are deallocated when there is no name associated with them (like smart pointers)?
Example:
def foo(a):
    result = []
    result.append(a)
    return result
foo("hello")
myList = foo("bye")
So the first result([]) was created in the heap and got deallocated because there's no name associated with it?
Yes, all Python objects live on the heap (at least on CPython). They are reference-counted: they are de-allocated when the last reference to the object disappears. (CPython also has a garbage collector to break cycles.)
In CPython your first list disappears as soon as the function returns, since you did not bind the return value to a name and the reference count dropped to zero. In other implementations the object may live longer, until the garbage collector kicks in.
Some objects (like open files) have resources attached that are automatically freed when the object is deallocated, but because of the above it is not recommended to rely on this. Resources should be closed explicitly when you are done with them.
Yes, all values in CPython are allocated on the heap and reference-counted to know when to deallocate them. Unlike in C, there is no way to know in most cases if a value will outlive its function, so the only safe thing to do is to heap-allocate everything.
Certainly you could do some analysis and determine that certain values are never passed to functions and thus couldn't escape, but that's of limited use in Python and the extra overhead probably wouldn't be worth it.
As a supplement to the other answers, here's one way to track when garbage-collection happens, using the special method __del__:
class Test(object):
    def __init__(self, name):
        self.name = name
    def __del__(self):
        print "deleting {0}".format(self.name)
print "discarded instance creation"
Test("hello")
print "saved instance creation"
myList = Test("bye")
print "program done"
Output:
discarded instance creation
deleting hello
saved instance creation
program done
deleting bye
For more in-depth data, see the gc module.
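As a complement to the __del__ trace, weak references can show the same thing for the foo example from the question without printing from a finaliser. This is a sketch that assumes CPython's reference counting; TrackedList and probes are hypothetical names, and a list subclass is used because plain lists cannot be weakly referenced.

```python
import weakref


class TrackedList(list):
    """list subclass, needed because plain lists do not support weakrefs."""


probes = []  # weak references to every list that foo creates


def foo(a):
    result = TrackedList()
    result.append(a)
    probes.append(weakref.ref(result))
    return result


foo("hello")           # return value is discarded
my_list = foo("bye")   # return value is bound to a name

first_alive = probes[0]() is not None
second_alive = probes[1]() is not None
print(first_alive, second_alive)
```

On CPython the first list should already be gone while the second stays alive for as long as my_list refers to it; on implementations without reference counting the first one may linger.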

Are there stack based variables in Python?

If I do this:
def foo():
    a = SomeObject()
Is 'a' destroyed immediately after leaving foo? Or does it wait for some GC to happen?
Yes and no. The object will get destroyed after you leave foo (as long as nothing else has a reference to it), but whether it is immediate or not is an implementation detail, and will vary.
In CPython (the standard python implementation), refcounting is used, so the item will immediately be destroyed. There are some exceptions to this, such as when the object contains cyclical references, or when references are held to the enclosing frame (eg. an exception is raised that retains a reference to the frame's variables.)
In implementations like Jython or IronPython, however, the object won't be finalised until the garbage collector kicks in.
As such, you shouldn't rely on timely finalisation of objects, but should only assume that an object will be destroyed at some point after the last reference goes. When you do need some cleanup to be done based on the lexical scope, either explicitly call a cleanup method, or look at the with statement introduced in Python 2.6 (available in 2.5 with "from __future__ import with_statement").
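Scope-based cleanup with the with statement might look like the sketch below. Connection and log are hypothetical; a real resource would wrap a file, socket or database handle.

```python
log = []  # hypothetical record of what happened, in order


class Connection(object):
    """Illustrative resource that cleans up in __exit__, not in __del__."""

    def __enter__(self):
        log.append('open')
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        log.append('close')  # runs on normal exit and on exceptions
        return False         # do not swallow the exception

    def query(self):
        log.append('query')
        raise RuntimeError('simulated failure')


try:
    with Connection() as conn:
        conn.query()
except RuntimeError:
    log.append('handled')

print(log)
```

The cleanup runs when the block is left, independently of when (or whether) the interpreter finalises the object.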

Bad Practice to run code in constructor thats likely to fail?

my question is rather a design question.
In Python, if code in your "constructor" fails, the object ends up not being defined. Thus:
someInstance = MyClass("test123") #lets say that constructor throws an exception
someInstance.doSomething() # will fail, name someInstance not defined.
I do have a situation, though, where a lot of code copying would occur if I removed the error-prone code from my constructor. Basically my constructor fills a few attributes (via IO, where a lot can go wrong) that can be accessed with various getters. If I removed the code from the constructor, I'd have 10 getters with copy-pasted code, something like:
is attribute really set?
do some IO actions to fill the attribute
return the contents of the variable in question
I dislike that, because all my getters would contain a lot of code. Instead of that I perform my IO operations in a central location, the constructor, and fill all my attributes.
Whats a proper way of doing this?
There is a difference between a constructor in C++ and an __init__ method
in Python. In C++, the task of a constructor is to construct an object. If it fails,
no destructor is called. Therefore if any resources were acquired before an
exception was thrown, the cleanup should be done before exiting the constructor.
Thus, some prefer two-phase construction with most of the construction done
outside the constructor (ugh).
Python has a much cleaner two-phase construction (construct, then
initialize). However, many people confuse an __init__ method (initializer)
with a constructor. The actual constructor in Python is called __new__.
Unlike in C++, it does not take an instance, but
returns one. The task of __init__ is to initialize the created instance.
If an exception is raised in __init__, the destructor __del__ (if any)
will be called as expected, because the object was already created (even though it was not properly initialized) by the time __init__ was called.
Answering your question:
In Python, if code in your
"constructor" fails, the object ends
up not being defined.
That's not precisely true. If __init__ raises an exception, the object is
created but not initialized properly (e.g., some attributes are not
assigned). But at the time that it's raised, you probably don't have any references to
this object, so the fact that the attributes are not assigned doesn't matter. Only the destructor (if any) needs to check whether the attributes actually exist.
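The sequence described here can be traced with a small experiment. This is a sketch written for Python 3 on CPython; Widget, read_name and events are hypothetical names, and the exact moment at which __del__ runs depends on when the interpreter releases the exception's traceback.

```python
import gc

events = []  # hypothetical log of the construction steps


def read_name(source):
    # Hypothetical I/O step that always fails
    raise IOError('simulated I/O failure')


class Widget(object):
    def __new__(cls, *args, **kwargs):
        events.append('__new__')  # the instance is created here
        return super(Widget, cls).__new__(cls)

    def __init__(self, source):
        events.append('__init__')
        self.name = read_name(source)  # raises, so self.name is never set

    def __del__(self):
        # The finaliser has to cope with a half-initialised instance
        events.append('__del__, name set: %r' % hasattr(self, 'name'))


try:
    widget = Widget('settings.ini')
except IOError:
    events.append('caught')

gc.collect()  # nudge interpreters that finalise lazily

try:
    widget
    widget_bound = True
except NameError:
    widget_bound = False

print(events, widget_bound)
```

The instance exists by the time __init__ runs, the name widget is never bound, and the finaliser sees an object without its name attribute.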
Whats a proper way of doing this?
In Python, initialize objects in __init__ and don't worry about exceptions.
In C++, use RAII.
Update [about resource management]:
In garbage collected languages, if you are dealing with resources, especially limited ones such as database connections, it's better not to release them in the destructor.
This is because objects are destroyed in a non-deterministic way, and if you happen
to have a loop of references (which is not always easy to tell), and at least one of the objects in the loop has a destructor defined, they will never be destroyed (this applies to CPython before 3.4; later versions can collect such cycles).
Garbage collected languages have other means of dealing with resources. In Python, it's a with statement.
In C++ at least, there is nothing wrong with putting failure-prone code in the constructor - you simply throw an exception if an error occurs. If the code is needed to properly construct the object, there really is no alternative (although you can abstract the code into subfunctions, or better into the constructors of subobjects). The worst practice is to half-construct the object and then expect the user to call other functions to complete the construction somehow.
It is not bad practice per se.
But I think you may be after something different here. In your example the doSomething() method will not be called when the MyClass constructor fails. Try the following code:
class MyClass:
    def __init__(self, s):
        print s
        raise Exception("Exception")
    def doSomething(self):
        print "doSomething"
try:
    someInstance = MyClass("test123")
    someInstance.doSomething()
except:
    print "except"
It should print:
test123
except
For your software design you could ask the following questions:
What should the scope of the someInstance variable be? Who are its users? What are their requirements?
Where and how should the error be handled for the case that one of your 10 values is not available?
Should all 10 values be cached at construction time or cached one-by-one when they are needed the first time?
Can the I/O code be refactored into a helper method, so that doing something similar 10 times does not result in code repetition?
...
I'm not a Python developer, but in general, it's best to avoid complex/error-prone operations in your constructor. One way around this would be to put a "LoadFromFile" or "Init" method in your class to populate the object from an external source. This load/init method must then be called separately after constructing the object.
One common pattern is two-phase construction, also suggested by Andy White.
First phase: Regular constructor.
Second phase: Operations that can fail.
Integration of the two: Add a factory method to do both phases and make the constructor protected/private to prevent instantiation outside the factory method.
Oh, and I'm not a Python developer either.
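In Python the factory idea is often written as a classmethod. The sketch below is an adaptation rather than a literal translation: Python has no private constructors, so the factory simply does the failure-prone work first and only then calls a trivial __init__. Config and from_lines are hypothetical names.

```python
class Config(object):
    def __init__(self, values):
        # Plain assignment only: nothing here is expected to fail
        self.values = dict(values)

    @classmethod
    def from_lines(cls, lines):
        # The failure-prone parsing happens before any instance exists
        values = {}
        for number, line in enumerate(lines, 1):
            key, sep, value = line.partition('=')
            if not sep:
                raise ValueError('line %d has no "=": %r' % (number, line))
            values[key.strip()] = value.strip()
        return cls(values)


config = Config.from_lines(['host = example.org', 'port = 8080'])

try:
    Config.from_lines(['broken line'])
    failed = False
except ValueError:
    failed = True  # no half-built Config object was ever created

print(config.values, failed)
```

Callers still get a single call that either returns a usable object or raises, which is the same contract as a constructor that throws.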
If the code to initialise the various values is really extensive enough that copying it is undesirable (which it sounds like it is in your case) I would personally opt for putting the required initialisation into a private method, adding a flag to indicate whether the initialisation has taken place, and making all accessors call the initialisation method if it has not initialised yet.
In threaded scenarios you may have to add extra protection in case initialisation is only allowed to occur once for valid semantics (which may or may not be the case since you are dealing with a file).
Again, I've got little experience with Python; however, in C# it's better to try to avoid having a constructor that throws an exception. An example of why that springs to mind is when you want to call the constructor at a point where it's not possible to surround it with a try {} catch {} block, for example when initialising a field in a class:
class MyClass
{
    private MySecondClass mySecondClass = new MySecondClass();
    // Rest of class
}
If the constructor of MySecondClass throws an exception that you wish to handle inside MyClass then you need to refactor the above - it's certainly not the end of the world, but avoiding it is a nice-to-have.
In this case my approach would probably be to move the failure-prone initialisation logic into an initialisation method, and have the getters call that initialisation method before returning any values.
As an optimisation you should have the getter (or the initialisation method) set some sort of "IsInitialised" boolean to true, to indicate that the (potentially costly) initialisation does not need to be done again.
In pseudo-code (C# because I'll just mess up the syntax of Python):
class MyClass
{
    private bool IsInitialised = false;
    private string myString;
    public void Init()
    {
        // Put initialisation code here
        this.IsInitialised = true;
    }
    public string MyString
    {
        get
        {
            if (!this.IsInitialised)
            {
                this.Init();
            }
            return myString;
        }
    }
}
This is of course not thread-safe, but I don't think multithreading is used that commonly in Python, so this is probably a non-issue for you.
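A Python rendering of the same lazy-initialisation idea could look like the sketch below. Settings, fake_loader and the host/port attributes are hypothetical, and the lock is an addition for the threaded case mentioned earlier rather than part of the C# pseudo-code.

```python
import threading


class Settings(object):
    def __init__(self, loader):
        self._loader = loader  # hypothetical callable that does the I/O
        self._lock = threading.Lock()
        self._initialised = False
        self._values = {}

    def _ensure_initialised(self):
        if self._initialised:
            return
        with self._lock:  # only one thread runs the loader
            if not self._initialised:
                self._values = self._loader()
                self._initialised = True

    @property
    def host(self):
        self._ensure_initialised()
        return self._values['host']

    @property
    def port(self):
        self._ensure_initialised()
        return self._values['port']


calls = []


def fake_loader():
    calls.append('load')  # stands in for the real, failure-prone I/O
    return {'host': 'example.org', 'port': 8080}


settings = Settings(fake_loader)
created_without_io = (calls == [])  # constructing did no I/O
first = settings.host               # triggers the single load
second = settings.port              # served from the cached values

print(created_without_io, first, second, calls)
```

Each getter stays a two-line wrapper, and an exception from the loader surfaces at the first attribute access instead of at construction time.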
seems Neil had a good point: my friend just pointed me to this:
http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization
which is basically what Neil said...
