I would like to ask whether using the Delegate Pattern in Python leads to circular references and, if so, what the best way to implement it would be so that the object and its delegate are garbage collected.
In Objective C, the above problem is avoided by using a weak reference to the delegate. In C++, we don't call delete on the delegate. I've found a link to Python's weak reference module here: http://docs.python.org/library/weakref.html. It seems like a plausible approach might be to create a weak reference to refer to the instance variable using this module but I'm not sure.
As I've googled this question and was not able to find answers to it, I'm wondering whether this is even a problem in Python or if there is a common solution (without the need for the weakref module) that I'm unaware of? Also, I did search stackoverflow before asking but the questions I found either deal with circular imports or delegate pattern in general and not specific to Python and the problem of circular references.
Thanks in advance for any replies.
Listed below is some code for a toy example to help illustrate my question. I've implemented code in this way and it works but I'm not sure whether memory is garbage collected at the end.
class A(object):
    def __init__(self):
        self.delegate = None
        # Some other instance variables that keep track of state for performing some tasks.

    def doSomething(self):
        if self.delegate is not None:
            self.delegate.doSomething()
        else:
            print('Cannot perform task because delegate is not set.')

    # Other methods not shown.

class B(object):
    def __init__(self):
        self.a = A() # Need to keep object 'a' from garbage collected so as to preserve its state information.
        self.a.delegate = self # Is this a circular reference? How to 'fix' it so that A and B will eventually be garbage collected?

    def doSomething(self):
        print('B doing something')

    # Other methods not shown.
EDIT:
After reading some of the replies, I decided to clarify my question. I understand that Python has garbage collection. What I wasn't sure about was whether it will perform garbage collection on circular referenced objects. My worry stems from the following passage from Python's docs:
CPython implementation detail: CPython currently uses a
reference-counting scheme with (optional) delayed detection of
cyclically linked garbage, which collects most objects as soon as they
become unreachable, but is not guaranteed to collect garbage
containing circular references. See the documentation of the gc module
for information on controlling the collection of cyclic garbage. Other
implementations act differently and CPython may change. Do not depend
on immediate finalization of objects when they become unreachable (ex:
always close files).
The passage in its original form can be found here: http://docs.python.org/reference/datamodel.html The bold setting is mine.
The following post provides a clearer explanation on the problem of circular referenced objects and why it would prevent garbage collection on those objects (at least in a typical setting): http://www.electricmonk.nl/log/2008/07/07/python-destructor-and-garbage-collection-notes/.
Further, I just came across Alex Martelli's reply to the following question on whether Python users should worry about circular references: Should I worry about circular references in Python? From his answer, I gather that circular referenced objects will eventually be garbage collected, but with some overhead. Whether that overhead is significant depends on the program.
Further, he mentioned using Python's weakref module but did not explicitly say how.
Hence, I would like to add the following questions to clarify some unresolved issues:
The docs say garbage collection is not guaranteed for circular referenced objects. But from the replies it appears that is not the case. So have I misunderstood the passage or are there further details that I've missed?
I suppose using a weak reference, as stated in Alex's reply and my question, would avoid the overhead problem entirely?
Again thanks for the replies.
Python already does garbage collection. You only need to do something special if you write your own container types in C, as extensions.
Demo: Run this program and watch the memory usage not climb.
class C(object):
    pass

def circular():
    for x in range(10**4):
        for y in range(10**4):
            a = C()
            b = C()
            a.x = b
            b.x = a

circular()
Footnote: The following function doesn't do anything, delete it.
def setDelegate(self, delegate):
    self.delegate = delegate
Instead of calling x.setDelegate(y), you can use x.delegate = y. You can overload member access in Python, so there's no benefit to writing a method.
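To illustrate the point about overloading member access, here is a minimal sketch. The names `Widget` and `Printer` and the validation rule are made up for illustration; the only point is that a plain attribute can later be turned into a `property` without callers having to switch to a setter method.

```python
class Widget(object):
    """Hypothetical class: `delegate` looks like a plain attribute to callers."""

    def __init__(self):
        self._delegate = None

    @property
    def delegate(self):
        return self._delegate

    @delegate.setter
    def delegate(self, value):
        # Illustrative check only: require the method the owner will call.
        if value is not None and not hasattr(value, "doSomething"):
            raise TypeError("delegate must provide doSomething()")
        self._delegate = value


class Printer(object):
    def doSomething(self):
        return "done"


w = Widget()
w.delegate = Printer()          # goes through the setter; no setDelegate() needed
result = w.delegate.doSomething()

try:
    w.delegate = 42             # an int has no doSomething(), so the setter rejects it
    rejected = False
except TypeError:
    rejected = True
```

Callers keep writing `x.delegate = y` either way, which is why the explicit setter method adds nothing.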
Why wouldn't it be garbage collected at the end? When the script is over and the Python process exits, all of the memory it used is released and (eventually) recovered by the OS.
If you're running this in a long-running program, the memory will be reclaimed once nothing else refers to A and B any more.
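One way to check this claim on CPython is sketched below. `A` and `B` mirror the toy classes from the question; `probe`, `alive_before` and `alive_after` are names introduced here for illustration. A weak reference is used only to observe whether the object still exists, and `gc.collect()` is called so the result does not depend on when the cycle detector would run by itself.

```python
import gc
import weakref


class A(object):
    def __init__(self):
        self.delegate = None


class B(object):
    def __init__(self):
        self.a = A()
        self.a.delegate = self   # B -> A -> B: a reference cycle


b = B()
probe = weakref.ref(b)           # a weak reference does not keep b alive
alive_before = probe() is not None

del b                            # drop the only outside reference to the cycle
gc.collect()                     # ask the cycle detector to run now
alive_after = probe() is not None

print(alive_before, alive_after)
```

Without the explicit `gc.collect()` the cycle would still be reclaimed, just at some later point chosen by the collector.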
Related
Pretty simple question:
I have some code to show some graphs, and it prepares data for the graphs, and I don't want to waste memory (limited)... is there a way to have a "local scope" so when we get to the end, everything inside is freed?
I come from C++ where you can define code inside { ... } so at the end everything is freed, and you don't have to care about anything
Anything like that in python?
The only thing I can think of is:
def tmp():
    ... code ...

tmp()
but it is very ugly, and for sure I don't want to list every del x at the end
If anything holds a reference to your object, it cannot be freed. By default, anything at the global scope is going to be held in the global namespace (globals()), and as far as the interpreter knows, the very next line of source code could reference it (or, another module could import it from this current module), so globals cannot be implicitly freed, ever.
This forces your hand to either explicitly delete references to objects with del, or to put them within the local scope of a function. This may seem ugly, but if you follow the philosophy that a function should do one thing and one thing well (thanks Unix!), you will already have segmented your code into functions. In the one-off cases where you allocate a lot of memory early on in your function, and no longer need it midway through, you can del the reference to it.
I know this isn't the answer you want to hear, but it's the reality of Python. You could accomplish something similar by nesting function defs or classes inside, but this is kinda hacky (or in the class case, which wouldn't require calling/instantiating, extremely hacky).
I will also mention that there is a built-in gc module for interacting with the garbage collector. With it, you can trigger an immediate garbage collection (otherwise Python will eventually get around to collecting the things you del refs to), as well as inspect which objects refer to a given object.
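A short sketch of those calls, assuming CPython. `Node`, `payload` and `holder` are hypothetical names; `sys.getrefcount` comes from the `sys` module rather than `gc`, and is included here because it is the usual way to see a reference count.

```python
import gc
import sys


class Node(object):
    pass


payload = Node()
before = sys.getrefcount(payload)
holder = [payload]                    # one extra reference, held by a list
after = sys.getrefcount(payload)

# gc.get_referrers lists the objects that refer directly to payload.
referrers = gc.get_referrers(payload)
held_by_list = any(r is holder for r in referrers)

del holder                            # drop that extra reference again
unreachable = gc.collect()            # force a full collection right now

print(before, after, held_by_list, unreachable)
```

`gc.collect()` returns the number of unreachable objects it found, so a value of zero simply means there was no cyclic garbage at that moment.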
If you're curious where the allocations are happening, you can also use the built-in tracemalloc module to trace them.
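As a minimal sketch of that workflow: take a snapshot before and after the code of interest and compare them. The list of `bytes` objects below is just a stand-in for "data prepared for the graphs".

```python
import tracemalloc

tracemalloc.start()

before = tracemalloc.take_snapshot()
data = [bytes(1000) for _ in range(1000)]   # stand-in workload, roughly 1 MB
after = tracemalloc.take_snapshot()

# Compare the snapshots grouped by source line; the biggest changes come first.
top_stats = after.compare_to(before, "lineno")
for stat in top_stats[:3]:
    print(stat)

current, peak = tracemalloc.get_traced_memory()   # bytes traced now / at peak
tracemalloc.stop()
```

The printed lines point at the file and line number responsible for each allocation, which is usually enough to find the structure worth freeing early.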
The mechanism that handles freeing memory in Python is called the "garbage collector", and it means there's no reason to use del in the overwhelming majority of Python code.
When programming in Python, you are "not supposed" to care about such low-level things as allocating and freeing memory for your variables.
That being said, putting your code into functions (although preferably called something clearer than tmp()) is most definitely a good idea, as it will make your code much more readable and "Pythonic".
Coming from C++, you have already stumbled on one of the main differences (drawbacks) of Python: memory management. Python's garbage collector will delete objects once they fall out of scope. Freeing the memory of objects, however, doesn't guarantee that this memory will actually be returned to the system; instead, a rather big portion may be kept reserved by the Python program even if it is not used. If you face a memory problem and want to give memory back to the system, the only safe method is to run the memory-intensive function in a separate process. Every Python process has its own interpreter, and any memory consumed by the process is returned to the system when the process exits.
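A sketch of the separate-process idea. To keep it self-contained this uses `subprocess` with a child interpreter rather than `multiprocessing`; the principle is the same. `CHILD_CODE` and `run_in_child` are hypothetical names, and the "job" is a placeholder that builds a large list and reports only a small summary.

```python
import json
import subprocess
import sys

# Placeholder for a memory-hungry job; it runs in a child interpreter.
CHILD_CODE = """
import json
big = list(range(2_000_000))      # large temporary structure
print(json.dumps({"total": sum(big)}))
"""


def run_in_child(code):
    """Run `code` in a fresh Python process and return its JSON output."""
    completed = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)


summary = run_in_child(CHILD_CODE)
print(summary)
```

Only the small result crosses the process boundary; everything the child allocated goes back to the operating system when it exits.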
I'm doing some things in Python (3.3.3), and I came across something that is confusing me since to my understanding classes get a new id each time they are called.
Lets say you have this in some .py file:
class someClass: pass
print(someClass())
print(someClass())
The above returns the same id, which is confusing me since I'm calling it twice, so it shouldn't be the same, right? Is this how Python works when the same class is called twice in a row or not? It gives a different id when I wait a few seconds, but if I do it at the same time, like the example above, it doesn't seem to work that way, which is confusing me.
>>> print(someClass());print(someClass())
<__main__.someClass object at 0x0000000002D96F98>
<__main__.someClass object at 0x0000000002D96F98>
It returns the same thing, but why? I also notice it with ranges for example
for i in range(10):
    print(someClass())
Is there any particular reason for Python doing this when the class is called quickly? I didn't even know Python did this, or is it possibly a bug? If it is not a bug can someone explain to me how to fix it or a method so it generates a different id each time the method/class is called? I'm pretty puzzled on how that is doing it because if I wait, it does change but not if I try to call the same class two or more times.
The id of an object is only guaranteed to be unique during that object's lifetime, not over the entire lifetime of a program. The two someClass objects you create only exist for the duration of the call to print - after that, they are available for garbage collection (and, in CPython, deallocated immediately). Since their lifetimes don't overlap, it is valid for them to share an id.
It is also unsurprising in this case, because of a combination of two CPython implementation details: first, it does garbage collection by reference counting (with some extra magic to avoid problems with circular references), and second, the id of an object is related to the value of the underlying pointer for the variable (ie, its memory location). So, the first object, which was the most recent object allocated, is immediately freed - it isn't too surprising that the next object allocated will end up in the same spot (although this potentially also depends on details of how the interpreter was compiled).
If you are relying on several objects having distinct ids, you might keep them around - say, in a list - so that their lifetimes overlap. Otherwise, you might implement a class-specific id that has different guarantees - eg:
class SomeClass:
    next_id = 0

    def __init__(self):
        self.id = SomeClass.next_id
        SomeClass.next_id += 1
If you read the documentation for id, it says:
Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
And that's exactly what's happening: you have two objects with non-overlapping lifetimes, because the first one is already out of scope before the second one is ever created.
But don't trust that this will always happen, either. Especially if you need to deal with other Python implementations, or with more complicated classes. All that the language says is that these two objects may have the same id() value, not that they will. And the fact that they do depends on two implementation details:
The garbage collector has to clean up the first object before your code even starts to allocate the second object—which is guaranteed to happen with CPython or any other ref-counting implementation (when there are no circular references), but pretty unlikely with a generational garbage collector as in Jython or IronPython.
The allocator under the covers has to have a very strong preference for reusing recently-freed objects of the same type. This is true in CPython, which has multiple layers of fancy allocators on top of basic C malloc, but most of the other implementations leave a lot more to the underlying virtual machine.
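The guarantee and the non-guarantee can be separated in a short sketch. The variable names are introduced here for illustration; only the first comparison is promised by the language, while the second depends on the implementation details just listed.

```python
class someClass:
    pass


first = someClass()
second = someClass()                      # both objects are alive at once
distinct_while_alive = id(first) != id(second)   # guaranteed by the language

# Once the first object is gone, its id is free to be reused.
old_id = id(first)
del first
third = someClass()
may_match = id(third) == old_id           # True or False, depending on the allocator

print(distinct_while_alive, may_match)
```

Code that needs stable identities should therefore keep the objects alive, or assign its own ids.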
One last thing: The fact that the object.__repr__ happens to contain a substring that happens to be the same as the id as a hexadecimal number is just an implementation artifact of CPython that isn't guaranteed anywhere. According to the docs:
If at all possible, this should look like a valid Python expression that could be used to recreate an object with the same value (given an appropriate environment). If this is not possible, a string of the form <...some useful description…> should be returned.
The fact that CPython's object happens to put hex(id(self)) in its repr (actually, I believe it's doing the equivalent of sprintf-ing its pointer through %p, but since CPython's id just returns the same pointer cast to a long, that ends up being the same) isn't guaranteed anywhere. Even if it has been true since… before object even existed in the early 2.x days. You're safe to rely on it for this kind of simple "what's going on here" debugging at the interactive prompt, but don't try to use it beyond that.
I sense a deeper problem here. You should not be relying on id to track unique instances over the lifetime of your program. You should simply see it as a non-guaranteed memory location indicator for the duration of each object instance. If you immediately create and release instances then you may very well create consecutive instances in the same memory location.
Perhaps what you need to do is track a class static counter that assigns each new instance with a unique id, and increments the class static counter for the next instance.
It's releasing the first instance since it wasn't retained, then since nothing has happened to the memory in the meantime, it instantiates a second time to the same location.
Try calling the following:
a = someClass()
for i in range(0,44):
    print(someClass())
print(a)
You'll see something different. Why? Because the memory that was released by the first object in the "for" loop was reused. On the other hand, a's memory is not reused since a is retained.
An example where the memory location (and id) is not released is:
print([someClass() for i in range(10)])
Now the ids are all unique.
Will it cause a memory leak if they cannot be cleaned up by the GC?
It's a standard issue with garbage collection.
It's not about memory leaks, but about the circular references themselves, and about other kinds of resources managed by those objects that may need cleanup. The references create a dependency - you can't delete the referrer until all objects it references are deleted, because it may need to do something with those referred-to objects during its cleanup.
As a contrived example, two objects may each have log files, and during their cleanups may need to write log messages both to their own log file and to the other one. You can't clean up either object first, as by doing so you leave the other object unable to perform its cleanup.
The basic rule is that you can have either reliable destructors (as in C++) or garbage collection (as in Python, Java...), but not both. Though in principle, a static analysis of code (or even a visual inspection in most cases) can tell you which classes might have this circular reference problem.
From the docs for gc.garbage:
Python doesn’t collect such cycles automatically because, in general, it isn’t possible for Python to guess a safe order in which to run the __del__() methods. If you know a safe order, you can force the issue by examining the garbage list, and explicitly breaking cycles due to your objects within the list.
It depends on what you are doing in __del__. If you are using it to handle references to other objects, it may be so.
Some discussion is in the docs. A more appropriate question is what you are trying to do in __del__, and whether it should be done explicitly somewhere else in the code.
Note: I'm not talking about preventing the rebinding of a variable. I'm talking about preventing the modification of the memory that the variable refers to, and of any memory that can be reached from there by following the nested containers.
I have a large data structure, and I want to expose it to other modules, on a read-only basis. The only way to do that in Python is to deep-copy the particular pieces I'd like to expose - prohibitively expensive in my case.
I am sure this is a very common problem, and it seems like a constant reference would be the perfect solution. But I must be missing something. Perhaps constant references are hard to implement in Python. Perhaps they don't quite do what I think they do.
Any insights would be appreciated.
While the answers are helpful, I haven't seen a single reason why const would be either hard to implement or unworkable in Python. I guess "un-Pythonic" would also count as a valid reason, but is it really? Python does do scrambling of private instance variables (starting with __) to avoid accidental bugs, and const doesn't seem to be that different in spirit.
EDIT: I just offered a very modest bounty. I am looking for a bit more detail about why Python ended up without const. I suspect the reason is that it's really hard to implement to work perfectly; I would like to understand why it's so hard.
It's the same as with private methods: as consenting adults, authors of code should agree on an interface without the need for force. Because really, really enforcing the contract is hard, and doing it the half-assed way leads to hackish code in abundance.
Use get-only descriptors, and state clearly in your documentation that this data is meant to be read-only. After all, a determined coder could probably find a way to use your code in ways you didn't think of anyway.
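A minimal sketch of a get-only attribute, under the assumption that the exposed structure is a dict. `Catalog` and its contents are hypothetical; `types.MappingProxyType` is a standard-library read-only view and is used here as one possible way to avoid copying.

```python
from types import MappingProxyType


class Catalog(object):
    """Hypothetical owner of a large dict that other modules may only read."""

    def __init__(self):
        self._prices = {"apple": 3, "pear": 5}

    @property
    def prices(self):
        # Getter only: assigning to catalog.prices raises AttributeError.
        # The proxy is a live read-only view of the dict; no copy is made.
        return MappingProxyType(self._prices)


catalog = Catalog()
view = catalog.prices

try:
    view["apple"] = 4                 # the proxy refuses item assignment
    writable = True
except TypeError:
    writable = False

print(view["apple"], writable)
```

Note that the protection is shallow: mutable values stored inside the dict can still be modified through the view, so this does not give the deep const-ness the question asks about.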
In PEP 351, Barry Warsaw proposed a protocol for "freezing" any mutable data structure, analogous to the way that frozenset makes an immutable set. Frozen data structures would be hashable and so capable of being used as keys in dictionaries.
The proposal was discussed on python-dev, with Raymond Hettinger's criticism the most detailed.
It's not quite what you're after, but it's the closest I can find, and should give you some idea of the thinking of the Python developers on this subject.
There are many design questions about any language, the answer to most of which is "just because". It's pretty clear that constants like this would go against the ideology of Python.
You can make a read-only class attribute, though, using descriptors. It's not trivial, but it's not very hard. The way it works is that you can make properties (things that look like attributes but call a method on access) using the property decorator; if you make a getter but not a setter property then you will get a read-only attribute. The reason for the metaclass programming is that since __init__ receives a fully-formed instance of the class, you actually can't set the attributes to what you want at this stage! Instead, you have to set them on creation of the class, which means you need a metaclass.
Code from this recipe:
# simple read only attributes with meta-class programming

# method factory for an attribute get method
def getmethod(attrname):
    def _getmethod(self):
        return self.__readonly__[attrname]
    return _getmethod

class metaClass(type):
    def __new__(cls,classname,bases,classdict):
        readonly = classdict.get('__readonly__',{})
        for name,default in readonly.items():
            classdict[name] = property(getmethod(name))
        return type.__new__(cls,classname,bases,classdict)

class ROClass(object):
    __metaclass__ = metaClass
    __readonly__ = {'a':1,'b':'text'}

if __name__ == '__main__':
    def test1():
        t = ROClass()
        print t.a
        print t.b

    def test2():
        t = ROClass()
        t.a = 2

    test1()
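The recipe uses Python 2 syntax (the `__metaclass__` attribute and the `print` statement). A rough Python 3 port is sketched below; it is an adaptation rather than the recipe author's code. The metaclass is renamed `MetaClass`, and the try/except at the end is added here to show the read-only behaviour.

```python
def getmethod(attrname):
    """Build a getter that looks the value up in the class's __readonly__ dict."""
    def _getmethod(self):
        return self.__readonly__[attrname]
    return _getmethod


class MetaClass(type):
    def __new__(cls, classname, bases, classdict):
        readonly = classdict.get('__readonly__', {})
        for name in readonly:
            # A property with a getter and no setter is read-only.
            classdict[name] = property(getmethod(name))
        return type.__new__(cls, classname, bases, classdict)


class ROClass(metaclass=MetaClass):   # Python 3 spelling of __metaclass__
    __readonly__ = {'a': 1, 'b': 'text'}


t = ROClass()
print(t.a)
print(t.b)

try:
    t.a = 2                           # no setter was defined
    assigned = True
except AttributeError:
    assigned = False
```

As with the original, the values live in a class-level dict, so a determined caller can still change `ROClass.__readonly__` directly.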
While one programmer writing code is a consenting adult, two programmers working on the same code seldom are consenting adults. More so if they do not value the beauty of the code but rather their deadlines or research funds.
For such adults there is some type safety, provided by Enthought's Traits.
You could look into Constant and ReadOnly traits.
For some additional thoughts, there is a similar question posed about Java here:
Why is there no Constant feature in Java?
When asking why Python has decided against constant references, I think it's helpful to think of how they would be implemented in the language. Should Python have some sort of special declaration, const, to create variable references that can't be changed? Why not allow variables to be declared a float/int/whatever then...these would surely help prevent programming bugs as well. While we're at it, adding class and method modifiers like protected/private/public/etc. would help enforce compile-time checking against illegal uses of these classes. ...pretty soon, we've lost the beauty, simplicity, and elegance that is Python, and we're writing code in some sort of bastard child of C++/Java.
Python also currently passes everything by reference. This would be some sort of special pass-by-reference-but-flag-it-to-prevent-modification...a pretty special case (and as the Zen of Python indicates, just "un-Pythonic").
As mentioned before, without actually changing the language, this type of behaviour can be implemented via classes & descriptors. It may not prevent modification from a determined hacker, but we are consenting adults. Python didn't necessarily decide against providing this as an included module ("batteries included") - there was just never enough demand for it.
I'm using the fantastic eric4 IDE to code Python. It has a built-in tool called 'cyclops', which is apparently looking for cycles. After running it, it gives me a bunch of big bold red letters declaring there to be a multitude of cycles in my code. The problem is that the output is nearly indecipherable; there's no way I'm going to understand what a cycle is by reading its output. I've browsed the web for hours and can't seem to find so much as a blog post. When the cycles pile up to a certain point, the profiler and debugger stop working :(.
My question is: what are cycles, how do I know when I'm making a cycle, and how do I avoid making cycles in Python? Thanks.
A cycle (or "reference loop") is two or more objects referring to each other, e.g.:
alist = []
anoth = [alist]
alist.append(anoth)
or
class Child(object): pass
class Parent(object): pass
c = Child()
p = Parent()
c.parent = p
p.child = c
Of course, these are extremely simple examples with cycles of just two items; real-life examples are often longer and harder to spot. There's no magic bullet telling you that you just made a cycle -- you just need to watch for it. The gc module (whose specific job is to garbage-collect unreachable cycles) can help you diagnose existing cycles (when you set the appropriate debug flags). The weakref module can help you to avoid building cycles when you do need (e.g.) a child and parent to know about each other without creating a reference cycle (make just one of the two mutual references into a weak ref or proxy, or use the handy weak-dictionary containers that the module supplies).
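A minimal sketch of the weak-reference approach, reusing the `Parent`/`Child` names from the example above. The `_parent_ref` attribute and the `parent` property are choices made here for illustration; the same shape works for an object and its delegate. The immediate cleanup after `del p` relies on CPython's reference counting.

```python
import weakref


class Parent(object):
    def __init__(self):
        self.child = None


class Child(object):
    def __init__(self, parent):
        # Weak back-reference: it does not keep the parent alive.
        self._parent_ref = weakref.ref(parent)

    @property
    def parent(self):
        return self._parent_ref()     # None once the parent has been freed


p = Parent()
c = Child(p)
p.child = c                           # strong reference in one direction only
linked = c.parent is p

del p                                 # no cycle, so CPython frees the parent at once
parent_gone = c.parent is None

print(linked, parent_gone)
```

Because only one direction is a strong reference, no cycle is formed and the cycle detector never has to get involved; the price is that code using `c.parent` must cope with it being `None`.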
All Cyclops tells you is whether there are objects in your code that refer to themselves through a chain of other objects. This used to be an issue in python, because the garbage collector wouldn't handle these kinds of objects correctly. That problem has since been, for the most part, fixed.
Bottom line: if you're not observing a memory leak, you don't need to worry about the output of Cyclops in most instances.