I have written a package for Mathematica called MathOO. In short, it allows you to use object orientation in Mathematica just like you do in Python. Please read the following article in Voofie/MathOO for details:
MathOO: Adding Python style Object Orientation to Mathematica with MathOO (1.0 beta launch) [Alternative to Objectica]
The problem I encountered is that I would like to have a garbage collector, so that users don't have to explicitly delete an object after using it. For instance:
NewClass[Object1]
Object1.$init$[self_]:= Return[];
In the above two lines, I just defined Object1 to be a new class, and the constructor to be an empty function. If you are familiar with Python, you should see the similarity with __init__().
To instantiate an Object1, I do:
object1 = new[Object1][]
The output is:
Out: object$13
Here, object$13 is a temporary variable. What I want is that, when there are no references left to this temporary variable, it is deleted automatically. But it doesn't work as expected. I have narrowed the problem down to the following:
In: y = Module[{x}, x[1] = 2; x]
Out: x$117
In: FullDefinition[y]
Out: y = x$117
Attributes[x$117] = {Temporary}
x$117[1] = 2
Since y holds a reference to x$117, x$117 is not removed yet. Now let's delete the reference by setting the value of y to 1:
In: y = 1;
However, x$117 is still here:
In: Definition[x$117]
Out: Attributes[x$117] = {Temporary}
x$117[1] = 2
But I expected the variable to be removed since it is no longer referenced. The Mathematica manual says:
Temporary symbols are removed if they are no longer referenced:
So, is this a bug in Mathematica? Or is there a workaround? I am using Mathematica 7.0. Thank you very much.
Mathematica really does garbage collect Temporary variables when they have no more references. That said, there are two reasons that your x$117 is not garbage collected.
Remember that Module uses lexical scoping, so the module variables are only "local" in the sense that they are given a unique name "var$modnum" and the Temporary attribute.
Since you gave your x a DownValue, it must be cleared before x can be garbage collected.
Your y was set to be the temporary variable x$... and the output was assigned to Out[]. So you also need to clear the history: Unprotect[In, Out]; Clear[In, Out]; Protect[In, Out];.
Then your Module example seems to be properly garbage collected.
When using your MathOO package (which I downloaded yesterday, but haven't played with yet), maybe you can just set $HistoryLength to some finite number and recommend that users suppress the output of instantiations with a trailing semicolon: object1 = new[Object1][];
Mathematica is a string rewriting system (at the bottom) (sort of) (not really) (but really) (ANYWAY...) The DownValue "x$117[1] = 2" is a string rewriting rule that it is not entirely inaccurate to imagine is an entry in an associative array. The array is named "x$117" and the entry is the pair {1,2}. As long as there is an entry in the array, the symbol "x$117" is referenced and will not be GCed by Mma.
Your best bet is to Remove[] symbols when they are destructed or go out of scope. (Clear[] is insufficient since lingering attributes, messages, or defaults associated with symbols are not eliminated by Clear[] and so Mma will still hold live references to the symbol.)
Related
I'm looking for an example that purposely makes a memory leak in Python.
It should be as short and simple as possible and ideally not use non-standard dependencies (that could simply do the memory leak in C code) or multi-threading/processing.
I've seen memory leaks achieved before but only when bad things were being done to libraries such as matplotlib. Also, there are many questions about how to find and fix memory leaks in Python, but they all seem to be big programs with lots of external dependencies.
The reason for asking this is about how good Python's GC really is. I know it detects reference cycles. However, can it be tricked? Is there some way to leak memory? It may be impossible to solve the most restrictive version of this problem. In that case, I'm very happy to see a rigorous argument why. Ideally, the answer should refer to the actual implementation and not just state that "an ideal garbage collector would be ideal and disallow memory leaks".
For nitpicking purposes: An ideal solution to the problem would be a program like this:
# Use Python version at least v3.10
# May use imports.
# Bonus points for only standard library.
# If the problem is unsolvable otherwise (please argue that it is),
# then you may use e.g. Numpy, Scipy, Pandas. Minus points for Matplotlib.
def memleak():
    # do whatever you want but only within this function
    # No global variables!
    # Bonus points for no observable side-effects (besides memory use)
    ...

for _ in range(100):
    memleak()
The function must return and be called multiple times. Goals in order of bonus points (high number = many bonus points):
1. The program keeps using more memory until it crashes.
2. After calling the function multiple times (e.g. the 100 specified above), the program may continue doing other (normal) things such that the memory leaked during the function is never freed.
3. Like 2, but the memory cannot be freed, even by calling gc manually and similar means.
One way to "trick" CPython's garbage collector into leaking memory is by invalidating an object's reference count. We can do this by creating an extraneous strong reference that never gets deleted.
To create a new strong reference, we need to invoke Py_IncRef (or Py_NewRef) from Python's C API. This can be done via ctypes.pythonapi:
import ctypes
import sys
# Create C API callable
inc_ref = ctypes.pythonapi.Py_IncRef
inc_ref.argtypes = [ctypes.py_object]
inc_ref.restype = None
# Create an arbitrary object.
obj = object()
# Print the number of references to obj.
# This should be 2:
# - one for the global variable 'obj'
# - one for the argument inside of 'sys.getrefcount'
print(sys.getrefcount(obj))
# Create a new strong reference.
inc_ref(obj)
# Print the number of references to obj.
# This should be 3 now.
print(sys.getrefcount(obj))
outputs
2
3
Concretely, you can write your memleak function as
import ctypes
def memleak():
    # Create C API callable
    inc_ref = ctypes.pythonapi.Py_IncRef
    inc_ref.argtypes = [ctypes.py_object]
    inc_ref.restype = None
    # Allocate a large object
    obj = list(range(10_000_000))
    # Increment its ref count
    inc_ref(obj)
    # obj will have a dangling reference after this function exits

memleak()  # leaks memory
An object with a dangling strong reference will never be freed by reference counting, and won't be detected as an unreachable object by the optional garbage collector. Running gc manually via
gc.collect()
will have no effect.
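A small-scale way to check this claim, sketched under the assumption of CPython (where ctypes.pythonapi is available). Payload and leak_one are hypothetical names introduced only for this illustration; a weak reference serves as a probe that reports whether the object still exists.

```python
import ctypes
import gc
import weakref


class Payload:
    """Hypothetical stand-in for a large object; instances support weak references."""


# Same C API callable as in the answer above.
inc_ref = ctypes.pythonapi.Py_IncRef
inc_ref.argtypes = [ctypes.py_object]
inc_ref.restype = None


def leak_one():
    obj = Payload()
    inc_ref(obj)             # extra strong reference that nothing will release
    return weakref.ref(obj)  # a weak reference does not keep obj alive by itself


probe = leak_one()
gc.collect()                 # neither refcounting nor the cycle collector frees it
still_alive = probe() is not None
print(still_alive)           # True
```

Without the inc_ref call, the probe would be dead as soon as leak_one returns, because the local name obj held the only strong reference.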
While using python with pyroot (a python interface to a CERN data analysis package named ROOT), I encountered the following strange behaviour:
print ROOT.TFile(fname).GetListOfKeys()
outputs None while the seemingly semantically equivalent code
f=ROOT.TFile(fname)
print f.GetListOfKeys()
outputs the expected <ROOT.THashList object ("THashList") at 0x13f0fa0>.
While this is hardly the first bug I have encountered while working with ROOT, this time I am quite puzzled that python allows this bug to happen.
I reckon that somehow, the reference count for the TFile object gets wrong in the first example, and that it gets deleted before GetListOfKeys is actually called. (After setting ROOT.TFile.__del__ to be some print command, this is indeed what happens.)
The way I see it, after ROOT.TFile(fname) gets executed, but before GetListOfKeys() is called, the pointer to the TFile object is on the stack. Therefore, the reference count should not be zero and the destructor should not be called until GetListOfKeys() returns.
Can anyone shed some light on why this happens?
On a related note, is there a way to prevent Python from ever deleting my objects implicitly just because the reference count becomes zero? I tried gc.disable(), and it did not change the results. Is there any more elegant solution than appending the objects to some globally defined write-only list?
I have a nested dictionary containing a bunch of data on a number of different objects (where I mean object in the non-programming sense of the word). The format of the dictionary is allData[i][someDataType], where i is a number designation of the object that I have data on, and someDataType is a specific data array associated with the object in question.
Now, I have a function that I have defined that requires a particular data array for a calculation to be performed for each object. The data array is called cleanFDF. So I feed this to my function, along with a bunch of other things it requires to work. I call it like this:
rm.analyze4complexity(allData[i]['cleanFDF'], other data, other data, other data)
Inside the function itself, I straight away re-assign the cleanFDF data to another variable name, namely clFDF. I.e. The end result is:
clFDF = allData[i]['cleanFDF']
I then have to zero out all of the data that lies below a certain threshold, as such:
clFDF[ clFDF < threshold ] = 0
OK - the function works as it is supposed to. But now when I try to plot the original cleanFDF data back in the main script, the entries that got zeroed out in clFDF are also zeroed out in allData[i]['cleanFDF']. WTF? Obviously something is happening here that I do not understand.
To make matters even weirder (from my point of view), I've tried to do a bodgy kludge to get around this by 'saving' the array to another variable before calling the function. I.e. I do
saveFDF = allData[i]['cleanFDF']
then run the function, then update the cleanFDF entry with the 'saved' data:
allData[i].update( {'cleanFDF':saveFDF} )
but somehow, simply performing clFDF[ clFDF < threshold ] = 0 within the function modifies clFDF, saveFDF and allData[i]['cleanFDF'] in the main friggin' script, zeroing out all the entries at the same array indexes! It is like they are all associated global variables somehow, but I've made no such declarations anywhere...
I am a hopeless Python newbie, so no doubt I'm not understanding something about how it works. Any help would be greatly appreciated!
You are passing the value at allData[i]['cleanFDF'] by reference (decent explanation at https://stackoverflow.com/a/430958/337678). Any changes made to it will be made to the object it refers to, which is still the same object as the original, just assigned to a different variable.
Making a deep copy of the data will likely fix your issue (the standard library's copy module provides deepcopy, which should do the trick ;)).
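A minimal sketch of the difference, using a plain list as a hypothetical stand-in for the data array (the names all_data and zero_below are made up for this illustration, and the values are arbitrary):

```python
import copy


def zero_below(values, threshold):
    """Zero out entries below threshold, modifying the list in place."""
    for idx, value in enumerate(values):
        if value < threshold:
            values[idx] = 0
    return values


# Hypothetical stand-in for allData[i]['cleanFDF'].
all_data = {0: {"cleanFDF": [0.2, 1.5, 0.1, 3.0]}}

# Plain assignment creates a second name for the very same list object.
alias = all_data[0]["cleanFDF"]
same_object = alias is all_data[0]["cleanFDF"]

# Working on a deep copy leaves the original untouched.
cleaned = zero_below(copy.deepcopy(all_data[0]["cleanFDF"]), 1.0)
original_after_copy = list(all_data[0]["cleanFDF"])

# Working on the alias changes the original as well.
zero_below(alias, 1.0)
original_after_alias = all_data[0]["cleanFDF"]

print(same_object)           # True
print(cleaned)               # [0, 1.5, 0, 3.0]
print(original_after_copy)   # [0.2, 1.5, 0.1, 3.0]
print(original_after_alias)  # [0, 1.5, 0, 3.0]
```

If the data is a NumPy array, as the boolean-mask indexing in the question suggests, the array's own copy() method should serve the same purpose.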
Everything is a reference in Python.
def function(y):
    y.append('yes')
    return y

example = list()
function(example)
print(example)
This prints ['yes'] even though I am not directly changing the variable 'example'.
See Why does list.append evaluate to false?, Python append() vs. + operator on lists, why do these give different results?, Python lists append return value.
The Hitchhiker’s Guide to Python states that it's good to avoid reusing variable names.
foo = Spam()
bar = foo.eggs()
And I agree w/ it. It makes code readable.
What if the variable is 40 MB of data? Will it copy itself and take 80 MB in total?
foo = buffer # 40 MB.
bar = foo.resize((50, 50)) # +40?
I know that the memory will be released when the function finishes executing, but I still don't think it's a good idea to have twice the memory usage at one point in the app only for the sake of readability. It's like a special case, but on the other hand, special cases aren't special enough, huh?
A Python assignment just copies the reference value to the target name. There is no data copying. A Python variable is just a name in a Python system dictionary plus a value, which is the reference to the object.
Actually, you should be careful with assignments. Any Python assignment means sharing the reference value. Python assignment never means copying the target object. With immutable objects like strings or numbers, no problem can appear. However, when you assign any mutable object (a list, a dictionary, a set, some user object), you should know that you are only giving the same object a different name (access via another copy of the reference value).
The same holds for passing the object as a function/method argument.
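A short sketch of both points, assignment and argument passing. The names big, alias and append_marker are hypothetical and chosen only for this illustration:

```python
def append_marker(items):
    """Mutate the argument in place; the caller sees the change."""
    items.append("marker")


big = list(range(1_000_000))  # one large list object
alias = big                   # a second name for the same object, no copy made

shares_identity = alias is big
append_marker(alias)          # the function receives the same object too
length_seen_via_big = len(big)

independent = list(big)       # an explicit shallow copy is a new object
is_new_object = independent is not big

print(shares_identity)        # True
print(length_seen_via_big)    # 1000001
print(is_new_object)          # True
```

Only the explicit list(big) call allocates a second list; the assignment and the function call merely hand around the reference.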
If you absolutely must have that data entirely in memory before you resize it (rather than only reading in the bits you care about), you can do this:
foo = buffer()
bar = foo.resize((50, 50))
del foo
or equivalently:
bar = buffer().resize((50, 50))
Both of these make the result of buffer() available for garbage collection as soon as this code has run.
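The effect of del can be observed with a weak reference. This is only a sketch: Buffer is a hypothetical class standing in for whatever buffer() returns in the question, and the immediate release relies on CPython's reference counting.

```python
import weakref


class Buffer:
    """Hypothetical stand-in for the large object in the question."""

    def __init__(self, size):
        self.data = bytearray(size)

    def resize(self, shape):
        # Returns a new, smaller Buffer; the original is not modified.
        return Buffer(shape[0] * shape[1])


foo = Buffer(1_000_000)
probe = weakref.ref(foo)  # a weak reference does not keep foo alive
bar = foo.resize((50, 50))

alive_before_del = probe() is not None
del foo                   # drops the last strong reference to the big buffer
alive_after_del = probe() is not None

print(alive_before_del)   # True
print(alive_after_del)    # False on CPython
```

Other Python implementations may release the object later, so the second result is not guaranteed outside CPython.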
Also, it can be perfectly reasonable to reuse the variable name in this case - if the lines are immediately one after the other in your code, and especially if foo.resize returns the same type of object as foo (as it seems to), then:
foo = buffer
foo = foo.resize((50, 50))
is perfectly fine. The advice is to not reuse the name for a completely unrelated variable - so that a person reading your code can see the variable and just skip up to wherever it was first assigned to understand what it is. When one of them is just a one-off "stepping stone" to get the actual object you care about, there's only a trivial risk of confusing a reader.
"Learning Python, 4th Ed." mentions that:
the enclosing scope variable is looked up when the nested functions
are later called..
However, I thought that when a function exits, all of its local references disappear.
def makeActions():
    acts = []
    for i in range(5):                 # Tries to remember each i
        acts.append(lambda x: i ** x)  # All remember same last i!
    return acts
makeActions()[n] is the same for every n because the variable i is somehow looked up at call time. How does Python look up this variable? Shouldn't it not exist at all because makeActions has already exited? Why doesn't Python do what the code intuitively suggests, and define each function by replacing i with its current value within the for loop as the loop is running?
I think it's pretty obvious what happens when you think of i as a name not some sort of value. Your lambda function does something like "take x: look up the value of i, calculate i**x" ... so when you actually run the function, it looks up i just then so i is 4.
You can also use the current number, but you have to make Python bind it to another name:
def makeActions():
    def make_lambda(j):
        return lambda x: j ** x  # the j here is still a name, but now it won't change anymore
    acts = []
    for i in range(5):
        # now you're pushing the current i as a value to another scope and
        # bind it there, under a new name
        acts.append(make_lambda(i))
    return acts
It might seem confusing, because you often get taught that a variable and its value are the same thing -- which is true, but only in languages that actually use variables. Python has no variables, but names instead.
About your comment, actually I can illustrate the point a bit better:
i = 5
myList = [i, i, i]
i = 6
print(myList) # myList is still [5, 5, 5].
You said you changed i to 6, but that is not what actually happened: i = 6 means "I have a value, 6, and I want to name it i". The fact that you already used i as a name matters nothing to Python; it will just reassign the name, not change its value (that only works with variables).
You could say that in myList = [i, i, i], whatever value i currently points to (the number 5) gets three new names: myList[0], myList[1], myList[2]. That's the same thing that happens when you call a function: the arguments are given new names. But that is probably going against any intuition about lists ...
This can explain the behavior in the example: you assign myList[0]=5, myList[1]=5, myList[2]=5, so no wonder they don't change when you reassign i. If i were something mutable, for example a list, then mutating it in place would show up in all entries of myList too, because you just have different names for the same value!
The simple fact that you can use myList[0] on the left-hand side of a = proves that it is indeed a name. I like to call = the assign name operator: it takes a name on the left and an expression on the right, then evaluates the expression (call functions, look up the values behind names) until it has a value, and finally gives the name to the value. It does not change anything.
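To make the contrast between rebinding a name and mutating a value concrete, here is a small sketch that extends the myList example with a mutable object. The names shared and holders are hypothetical additions for this illustration:

```python
i = 5
my_list = [i, i, i]
i = 6                    # rebinds the name i; the list still refers to 5
after_rebind = list(my_list)

shared = [0]
holders = [shared, shared, shared]  # three more names for one list object
shared.append(1)         # mutates the object in place, no rebinding
after_mutation = holders

print(after_rebind)      # [5, 5, 5]
print(after_mutation)    # [[0, 1], [0, 1], [0, 1]]
```

Rebinding shared to a new list (shared = [9]) would leave holders untouched, just as i = 6 left my_list untouched.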
For Mark's comment about compiling functions:
Well, references (and pointers) only make sense when we have some sort of addressable memory. The values are stored somewhere in memory and references lead you to that place. Using a reference means going to that place in memory and doing something with it. The problem is that none of these concepts are used by Python!
The Python VM has no concept of memory - values float somewhere in space and names are little tags connected to them (by a little red string). Names and values exist in separate worlds!
This makes a big difference when you compile a function. If you have references, you know the memory location of the object you refer to. Then you can simply replace the reference with this location.
Names on the other hand have no location, so what you have to do (during runtime) is follow that little red string and use whatever is on the other end. That is the way Python compiles functions: wherever there is a name in the code, it adds an instruction that will figure out what that name stands for.
So basically Python does fully compile functions, but names are compiled as lookups in the nesting namespaces, not as some sort of reference to memory.
When you use a name, the Python compiler will try to figure out which namespace it belongs to. This results in an instruction to load that name from the namespace it found.
Which brings you back to your original problem: in lambda x: i ** x, the i is compiled as a lookup in the makeActions namespace (because i was used there). Python has no idea what value is behind it, nor does it care (it does not even have to be a valid name). Once that code runs, i gets looked up in its original namespace and gives the more or less expected value.
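This lookup can be inspected on CPython through the function's code object and closure cells. A minimal sketch, reusing makeActions from the question (the attribute names are CPython's; the variable names holding the results are made up here):

```python
def makeActions():
    acts = []
    for i in range(5):
        acts.append(lambda x: i ** x)
    return acts


acts = makeActions()

# The lambda's code object records i as a free variable, not as a value.
free_names = acts[0].__code__.co_freevars

# Every lambda holds the same cell for i, and the cell holds the last value.
shared_cell = acts[0].__closure__[0] is acts[4].__closure__[0]
cell_value = acts[0].__closure__[0].cell_contents

results = [act(2) for act in acts]

print(free_names)   # ('i',)
print(shared_cell)  # True
print(cell_value)   # 4
print(results)      # [16, 16, 16, 16, 16]
```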
What happens when you create a closure:
The closure is constructed with a pointer to the frame (or roughly, scope) that it was created in: in this case, the call to makeActions, since a for loop does not open a scope of its own in Python.
The closure actually assumes shared ownership of that frame, by incrementing the frame's ref count and stashing the pointer to that frame in the closure. That frame, in turn, keeps around references to the frames it was enclosed in, for variables that were captured further up the stack.
The value of i in that frame keeps changing as long as the for loop is running: each assignment to i updates the binding of i in that frame.
Once makeActions returns, the frame is popped off the stack, but it isn't thrown away as it might usually be! Instead, it's kept around because the closure's reference to the frame is still active. At this point, though, the value of i is no longer updated.
When the closure is invoked, it picks up whatever value of i is in the parent frame at the time of invocation. Since in the for loop you create closures, but don't actually invoke them, the value of i upon invocation will be the last value it had after all the looping was done.
Future calls to makeActions will create different frames. You won't reuse the previous call's frame, or update that previous frame's i value, in that case.
In short: frames are garbage-collected just like other Python objects, and in this case, an extra reference is kept around to the frame of the makeActions call so it doesn't get destroyed when the function returns.
To get the effect you want, you need to have a new frame created for each value of i you want to capture, and each lambda needs to be created with a reference to that new frame. You won't get that from the for block itself, but you could get that from a call to a helper function which will establish the new frame. See THC4k's answer for one possible solution along these lines.
The local references persist because they're contained in the local scope, which the closure keeps a reference to.
I thought that when a function exits, all of its local references disappear.
Except for those locals which are closed over in a closure. Those do not disappear, even when the function to which they are local has returned.
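A minimal sketch of a closed-over local outliving its function. make_counter is a hypothetical example, not taken from the question:

```python
def make_counter():
    count = 0               # local to make_counter

    def increment():
        nonlocal count      # rebinds the closed-over local
        count += 1
        return count

    return increment


counter = make_counter()    # make_counter has already returned here
first = counter()
second = counter()

other = make_counter()      # a fresh call gets its own, independent count
other_first = other()

print(first, second, other_first)  # 1 2 1
```

The count variable survives between calls to counter because the inner function still refers to it, and each call to make_counter creates a separate one.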
Intuitively one might think i would be captured in its current state but that is not the case. Think of each layer as a dictionary of name value pairs.
Level 1:
acts
i
Level 2:
x
Every time you create a closure for the inner lambda you are capturing a reference to level one. I can only assume that the run-time will perform a look-up of the variable i, starting in level 2 and making its way to level 1. Since you are not executing these functions immediately they will all use the final value of i.
Experts?
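The lookup-at-call-time behaviour described above can be checked directly. The sketch below also contrasts it with a default argument, which is a different technique from the ones in this thread: a default value is evaluated when the lambda is defined, so it captures i "in its current state". The function names make_late and make_early are made up for this illustration.

```python
def make_late():
    acts = []
    for i in range(5):
        acts.append(lambda x: i ** x)       # i is looked up when called
    return acts


def make_early():
    acts = []
    for i in range(5):
        acts.append(lambda x, i=i: i ** x)  # default evaluated at definition
    return acts


late = [act(2) for act in make_late()]
early = [act(2) for act in make_early()]

print(late)   # [16, 16, 16, 16, 16]
print(early)  # [0, 1, 4, 9, 16]
```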