Python C API: omitted variable assignment causes unexpected behaviour

While using python with pyroot (a python interface to a CERN data analysis package named ROOT), I encountered the following strange behaviour:
print ROOT.TFile(fname).GetListOfKeys()
outputs None while the seemingly semantically equivalent code
f=ROOT.TFile(fname)
print f.GetListOfKeys()
outputs the expected <ROOT.THashList object ("THashList") at 0x13f0fa0>.
While this is hardly the first bug I have encountered while working with ROOT, this time I am quite puzzled that python allows this bug to happen.
I reckon that somehow the reference count for the TFile object goes wrong in the first example, and that it gets deleted before GetListOfKeys is actually called. (After setting ROOT.TFile.__del__ to be some print command, this is indeed what happens.)
The way I see it, after ROOT.TFile(fname) gets executed, but before GetListOfKeys() is called, the pointer to the TFile object is on the stack. Therefore, the reference count should not be zero and the destructor should not be called until GetListOfKeys() returns.
Can anyone shed some light on why this happens?
On a related note, is there a way to prevent Python from ever deleting my objects implicitly just because the reference count drops to zero? I tried gc.disable(), and it did not change the results. Is there any more elegant solution than appending the objects to some globally defined write-only list?

LLDB Python scripting create variable

I am using LLDB Python scripting support to add custom Variable Formatting for a complex C++ class type in XCode.
This is working well for simple situations, but I have hit a wall when I need to call a method which uses a pass-by-reference parameter, which it populates with results. This would require me to create a variable to pass here, but I can't find a way to do this.
I have tried using the target's CreateValueFromData method, as below, but this doesn't seem to work.
import lldb

def MyClass(valobj, internal_dict):
    class2_type = valobj.target.FindFirstType('class2')
    process = valobj.process
    class2Data = [0]
    data = lldb.SBData.CreateDataFromUInt32Array(process.GetByteOrder(), process.GetAddressByteSize(), class2Data)
    valobj.target.CreateValueFromData("testClass2", data, class2_type)
    valobj.EvaluateExpression("getType(testClass2)")
    class2Val = valobj.frame.FindVariable("testClass2")
    if not class2Val.error.success:
        return class2Val.error.description
    return class2Val.GetValueAsUnsigned()
Is there some way to be able to achieve what I'm trying to do?
SBValue names are just labels for the SBValue; they aren't guaranteed to exist as symbols in the target. For instance, if the value you are formatting is an ivar of some other object, its name will be the ivar name... And lldb does not inject new SBValues' names into the symbol table - that would end up causing lots of name collisions. So they don't exist in the namespace the expression evaluator queries when looking up names.
If the variable you are formatting is a pointer, you can get the pointer value and cons up an expression that casts the pointer value to the appropriate type for your getType function, and pass that to your function. If the value is not a pointer, you can still use SBValue.AddressOf to get the memory location of the value. If the value exists only in lldb (AddressOf will return an invalid address) then you would have to push it to the target with SBProcess.AllocateMemory/WriteMemory, but that should only happen if you have another data formatter that makes these objects out of whole cloth for its own purposes.
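For the pointer case, the "cons up an expression" approach amounts to building a cast expression as a string. A hypothetical sketch (the address, getType, and class2 are illustrative values from the question, not real API output):

```python
# Hypothetical sketch: ptr would come from SBValue.GetValueAsUnsigned(),
# and getType / class2 are the function and type from the question.
ptr = 0x13f0fa0
expr = "getType(*(class2 *)0x%x)" % ptr
# In the formatter you would then run: valobj.EvaluateExpression(expr)
assert expr == "getType(*(class2 *)0x13f0fa0)"
```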
It's better not to call functions in formatters if you can help it. But if you really must call a function in your data formatter, you should do so judiciously.
They can cause performance problems on every step operation: if you have an array of 100 elements of this type, your formatter will require 100 function calls in the target to render the array... That's 200 context switches between your process and the debugger, plus a bunch of memory reads and writes.
Also, since you can't ensure that the data in your value is correct (it might represent a variable that has not been initialized yet, or already deallocated) you either need to have your function handle bad data, or at least be prepared for the expression to crash. lldb can clean up the stack and suppress the exception from crashes, but it can't undo any side-effects the expression might have had before crashing.
For instance, if the function you called took some lock before crashing that it was expecting to release on the way out, your formatter will damage the state of the program. So you have to be careful what you call...
And by default, EvaluateExpression will allow all threads to run so that expressions don't deadlock against a lock held by another thread. You probably don't want that to happen, since that means looking at the locals of one thread will "change" the state of another thread. So you really should only call functions you are sure don't take locks. And use the version of EvaluateExpression that takes an SBExpressionOptions, in which you set StopOthers to True and SetTryAllThreads to False.

Why is a Python generator frame's (gi_frame) f_back attribute always None?

The title is pretty self-explanatory. I'm doing something like:
gen = obj  # some running generator instance
frame = gen.gi_frame
prevframe = frame.f_back
But I always get None for prevframe. Why is this the case? Also, is there some workaround for this?
CONTEXT: I'm trying to write a simple call stack method to determine what called a particular function. I'm using twisted manhole and telnetting into a running process, where I then execute these commands but I can't seem to access the previous frames.
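A minimal reproduction of the observed behaviour:

```python
def gen():
    yield 1

g = gen()
next(g)  # advance to the first yield; g is now suspended

assert g.gi_frame is not None      # the frame itself still exists
assert g.gi_frame.f_back is None   # but its parent link has been cleared
```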
To the best of my knowledge, this is both intentional and cannot be worked around. The code in cpython responsible for it is here, which indicates that the reference to the previous frame is broken as soon as the generator yields (or excepts out) in order to prevent issues with reference counting. It also appears that the intended behavior is that the generator's previous frame is swapped out every time it's entered, so while it's not running, the notion of "the parent frame" doesn't make much sense.
The correct way to do this, at least in the post-mortem context, is to use traceback objects, which have their frame lists linked in the reverse order, tb_next instead of f_back.
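A short sketch of walking a traceback forward with tb_next (illustrative function names):

```python
import sys

def inner():
    raise ValueError('boom')

def outer():
    inner()

names = []
try:
    outer()
except ValueError:
    tb = sys.exc_info()[2]
    while tb is not None:
        # tb_next walks towards the frame where the exception was raised
        names.append(tb.tb_frame.f_code.co_name)
        tb = tb.tb_next

assert names[-2:] == ['outer', 'inner']
```

Unlike f_back, the tb_next chain is preserved in the traceback object, so the full call path at the point of the raise can be recovered post-mortem.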

How to delete a function argument early?

I'm writing a function which takes a huge argument, and runs for a long time. It needs the argument only halfway. Is there a way for the function to delete the value pointed to by the argument if there are no more references to it?
I was able to get it deleted as soon as the function returns, like this:
def f(m):
    print 'S1'
    m = None
    #__import__('gc').collect()  # Uncommenting this doesn't help.
    print 'S2'

class M(object):
    def __del__(self):
        print '__del__'

f(M())
This prints:
S1
S2
__del__
I need:
S1
__del__
S2
I was also trying def f(*args): and def f(**kwargs):, but it didn't help; I still get __del__ last.
Please note that my code relies on the fact that Python has reference counting, and that __del__ gets called as soon as an object's reference count drops to zero. I want the reference count of a function argument to drop to zero in the middle of the function. Is this possible?
Please note that I know of a workaround: passing a list of arguments:
def f(ms):
    print 'S1'
    del ms[:]
    print 'S2'

class M(object):
    def __del__(self):
        print '__del__'

f([M()])
This prints:
S1
__del__
S2
Is there a way to get the early deletion without changing the API (e.g. introducing lists to the arguments)?
If it's hard to get a portable solution which works in many Python implementations, I need something which works in the most recent CPython 2.7. It doesn't have to be documented.
From the documentation:
CPython implementation detail: CPython currently uses a reference-counting scheme with (optional) delayed detection of cyclically linked garbage, which collects most objects as soon as they become unreachable, but is not guaranteed to collect garbage containing circular references. See the documentation of the gc module for information on controlling the collection of cyclic garbage. Other implementations act differently and CPython may change. Do not depend on immediate finalization of objects when they become unreachable (ex: always close files).
Short of modifying the interpreter yourself, you cannot achieve what you want. __del__ will be called when the interpreter decides to do it.
It looks like it's not possible to do the early deletion in CPython 2.7 without changing the API of the f function.
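One way to see why rebinding m inside f is not enough, at least in CPython: the caller's evaluation stack keeps its own reference to the argument until the call returns. A small sketch using sys.getrefcount (CPython-specific; the exact count is an implementation detail):

```python
import sys

class M(object):
    pass

def f(m):
    # References here: the local name m, the caller's evaluation stack
    # (which holds the temporary passed to f for the duration of the
    # call), and getrefcount's own argument -- typically 3 in CPython.
    n = sys.getrefcount(m)
    m = None  # drops only the local reference
    return n

count = f(M())
assert count >= 3
```

This is why the list workaround succeeds: emptying the list removes the only reference the caller's stack does not also hold.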

Temporary variables in Mathematica

I have written a package for Mathematica called MathOO. In short, it allows you to use object orientation in Mathematica just like you do in Python. Please read the following article in Voofie/MathOO for details:
MathOO: Adding Python style Object Orientation to Mathematica with MathOO (1.0 beta launch) [Alternative to Objectica]
The problem I encountered is that I would like to have a garbage collector, so that users don't have to explicitly delete objects after using them. For instance:
NewClass[Object1]
Object1.$init$[self_]:= Return[];
In the above two lines, I just defined Object1 to be a new class, and the constructor to be an empty function. If you are familiar with Python, you should see the similarity with __init__().
To instantiate an Object1, I do:
object1 = new[Object1][]
The output is:
Out: object$13
Here, object$13 is a temporary variable. What I want is that when there are no more references to this temporary variable, it should be deleted automatically. But it doesn't work as expected. I have identified the problem to be the following:
In: y = Module[{x}, x[1] = 2; x]
Out: x$117
In: FullDefinition[y]
Out: y = x$117
Attributes[x$117] = {Temporary}
x$117[1] = 2
Since y holds a reference to x$117, x$117 is not removed yet. Now let's delete the reference by setting the value of y to 1:
In: y = 1;
However, x$117 is still here:
In: Definition[x$117]
Out: Attributes[x$117] = {Temporary}
x$117[1] = 2
But I expected the variable to be removed since it is no longer referenced. From the manual of Mathematica, it said:
Temporary symbols are removed if they are no longer referenced:
So, is it a bug of Mathematica? Or is there any workaround methods? I am using Mathematica 7.0. Thank you very much.
Mathematica really does garbage-collect Temporary variables when they have no more references. That said, there are two reasons your x$117 is not garbage collected.
Remember that Module uses lexical scoping, so the module variables are only "local" in the sense that they are given a unique name "var$modnum" and the Temporary attribute.
Since you gave your x a DownValue, it must be cleared before x can be garbage collected.
Your y was set to be the temporary variable x$... and the output was assigned to Out[]. So you also need to clear the history: Unprotect[In, Out]; Clear[In, Out]; Protect[In, Out];.
Then your Module example seems to be properly garbage collected.
When using your MathOO package (which I downloaded yesterday, but haven't played with yet), maybe you can just set $HistoryLength to some finite number, and recommend that users suppress the output of instantiations: object1 = new[Object1][];
Mathematica is a string rewriting system (at the bottom) (sort of) (not really) (but really) (ANYWAY...) The DownValue "x$117[1] = 2" is a string rewriting rule that it is not entirely inaccurate to imagine is an entry in an associative array. The array is named "x$117" and the entry is the pair {1,2}. As long as there is an entry in the array, the symbol "x$117" is referenced and will not be GCed by Mma.
Your best bet is to Remove[] symbols when they are destructed or go out of scope. (Clear[] is insufficient since lingering attributes, messages, or defaults associated with symbols are not eliminated by Clear[] and so Mma will still hold live references to the symbol.)

Python garbage collection

I have created some Python code which creates an object in a loop, and in every iteration overwrites this object with a new one of the same type. This is done 10,000 times, and Python takes up 7 MB of memory every second until my 3 GB of RAM is used. Does anyone know of a way to remove the objects from memory?
I think this is a circular-reference problem (though the question isn't explicit about it).
One way to solve this problem is to manually invoke garbage collection. When you manually run garbage collector, it will sweep circular referenced objects too.
import gc
for i in xrange(10000):
    j = myObj()
    processObj(j)
    # assuming the reference count is not zero, but the
    # object won't remain usable after this iteration
    if not i % 100:
        gc.collect()
Don't run the garbage collector too often here, because it has its own overhead; e.g., if you run the garbage collector on every iteration, interpretation will become extremely slow.
You haven't provided enough information - this depends on the specifics of the object you are creating and what else you're doing with it in the loop. If the object does not create circular references, it should be deallocated on the next iteration. For example, the code
for x in range(100000):
    obj = " " * 10000000
will not result in ever-increasing memory allocation.
This is an old error that was corrected for some types in Python 2.5. What was happening was that Python was not so good at collecting things like empty lists/dictionaries/tuples/floats/ints. In Python 2.5 this was fixed... mostly. However, floats and ints are singletons for comparisons, so once one of those is created it stays around as long as the interpreter is alive. I've been bitten by this worst when dealing with large amounts of floats, since they have a nasty habit of being unique. This behaviour was characterized for Python 2.4, with an update noting it was folded into Python 2.5.
The best way I've found around it is to upgrade to python 2.5 or newer to take care of the lists/dictionaries/tuples issue. For numbers the only solution is to not let large amounts of numbers get into python. I've done it with my own wrapper to a c++ object, but I have the impression that numpy.array will give similar results.
As a postscript, I have no idea what has happened to this in Python 3, but I'm suspicious that numbers are still part of a singleton. So the memory leak is actually a feature of the language.
If you're creating circular references, your objects won't be deallocated immediately, but have to wait for a GC cycle to run.
You could use the weakref module to address this problem, or explicitly del your objects after use.
I found that in my case (with Python 2.5.1), with circular references involving classes that have __del__() methods, not only was garbage collection not happening in a timely manner, the __del__() methods of my objects were never getting called, even when the script exited. So I used weakref to break the circular references and all was well.
Kudos to Miles who provided all the information in his comments for me to put this together.
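A minimal sketch of the weakref approach (Node is an illustrative class): storing the back-pointer as a weak reference means there is no strong cycle, so ordinary reference counting reclaims objects immediately without waiting for a GC cycle:

```python
import weakref

class Node(object):
    def __init__(self):
        self.parent = None   # will hold a weakref, not a strong reference
        self.children = []

root = Node()
child = Node()
root.children.append(child)
child.parent = weakref.ref(root)   # back-pointer is weak: no cycle

assert child.parent() is root      # call the weakref to dereference it

del root
# With no strong cycle, the parent was reclaimed immediately by
# reference counting, so the weakref now returns None.
assert child.parent() is None
```

The same pattern applies when the classes define __del__: because there is no cycle, the finalizers run promptly instead of being deferred (or, in old CPython versions, skipped entirely).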
Here's one thing you can do at the REPL to force a dereferencing of a variable:
>>> x = 5
>>> x
5
>>> del x
>>> x
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined
weakref can be used for code with circular object structures, as in the example explained above.
