I'm using the fantastic Eric4 IDE to code Python. It has a built-in tool called 'Cyclops', which is apparently looking for cycles. After running it, it gives me a bunch of big bold red letters declaring there to be a multitude of cycles in my code. The problem is that the output is nearly indecipherable; there's no way I'm going to understand what a cycle is by reading it. I've browsed the web for hours and can't seem to find so much as a blog post. When the cycles pile up to a certain point, the profiler and debugger stop working :(.
My questions: what are cycles, how do I know when I'm making a cycle, and how do I avoid making cycles in Python? Thanks.
A cycle (or "reference loop") is two or more objects referring to each other, e.g.:
alist = []
anoth = [alist]
alist.append(anoth)
or
class Child(object): pass
class Parent(object): pass
c = Child()
p = Parent()
c.parent = p
p.child = c
Of course, these are extremely simple examples with cycles of just two items; real-life examples are often longer and harder to spot. There's no magic bullet telling you that you just made a cycle -- you just need to watch for it. The gc module (whose specific job is to garbage-collect unreachable cycles) can help you diagnose existing cycles (when you set the appropriate debug flags). The weakref module can help you to avoid building cycles when you do need (e.g.) a child and parent to know about each other without creating a reference cycle (make just one of the two mutual references into a weak ref or proxy, or use the handy weak-dictionary containers that the module supplies).
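For example, here is a minimal sketch (the class names are just illustrations) of using a weak reference so a child can know its parent without creating a cycle:

```python
import weakref

class Child(object):
    pass

class Parent(object):
    pass

c = Child()
p = Parent()
p.child = c                 # strong reference: the parent keeps the child alive
c.parent = weakref.ref(p)   # weak reference: no reference cycle is created

print(c.parent() is p)      # call the weak ref to dereference it -> True
del p                       # nothing strong refers to the Parent any more...
print(c.parent())           # ...so it is freed at once; the weak ref returns None
```

Because only one direction of the relationship is strong, plain reference counting can reclaim both objects, with no work for the cycle collector.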
All Cyclops tells you is whether there are objects in your code that refer to themselves through a chain of other objects. This used to be an issue in Python, because the garbage collector wouldn't handle these kinds of objects correctly. That problem has since been, for the most part, fixed.
Bottom line: if you're not observing a memory leak, you don't need to worry about the output of Cyclops in most instances.
Related
Pretty simple question:
I have some code to show some graphs, and it prepares data for the graphs, and I don't want to waste memory (limited)... is there a way to have a "local scope" so when we get to the end, everything inside is freed?
I come from C++ where you can define code inside { ... } so at the end everything is freed, and you don't have to care about anything
Anything like that in python?
The only thing I can think of is:
def tmp():
    ... code ...

tmp()
but that's very ugly, and I certainly don't want to list all the del x statements at the end
If anything holds a reference to your object, it cannot be freed. By default, anything at the global scope is going to be held in the global namespace (globals()), and as far as the interpreter knows, the very next line of source code could reference it (or, another module could import it from this current module), so globals cannot be implicitly freed, ever.
This forces your hand to either explicitly delete references to objects with del, or to put them within the local scope of a function. This may seem ugly, but if you follow the philosophy that a function should do one thing and one thing well (thanks, Unix!), you will already be segmenting your code into functions. For the one-off exceptions where you allocate a lot of memory early in a function and no longer need it midway through, you can del the reference to it.
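A small sketch of that one-off case (the function and its data are invented): reduce the large input to a small summary, then del the reference before doing the rest of the work:

```python
def first_letter_counts(words):
    # reduce the potentially huge input to a small summary...
    counts = {}
    for word in words:
        counts[word[0]] = counts.get(word[0], 0) + 1
    # ...then drop this function's reference to the big list early,
    # so it can be collected before any remaining work runs
    del words
    # ... further processing that only needs `counts` ...
    return counts

print(first_letter_counts(["apple", "avocado", "banana"]))  # {'a': 2, 'b': 1}
```

Note that del only removes this function's reference; the caller's reference (if any) still keeps the object alive.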
I know this isn't the answer you want to hear, but it's the reality of Python. You could accomplish something similar by nesting function defs or classes inside, but this is kind of hacky (or in the class case, which wouldn't require calling/instantiating, extremely hacky).
I will also mention that there is a built-in gc module for interacting with the garbage collector. With it, you can trigger an immediate garbage collection (otherwise Python will eventually get around to collecting the things you del refs to), as well as inspect how many references a given object has.
If you're curious where the allocations are happening, you can also use the built in tracemalloc module to trace said allocations.
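A quick sketch of both tools together (the list comprehension is just a stand-in for real allocations):

```python
import gc
import tracemalloc

tracemalloc.start()

data = [list(range(1000)) for _ in range(100)]  # some allocations to observe

# show the top source lines by allocated size
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)

del data
print(gc.collect())  # force a collection; returns the number of unreachable objects found
```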
The mechanism that handles freeing memory in Python is called the garbage collector, and its existence means there's no reason to use del in the overwhelming majority of Python code.
When programming in Python, you are "not supposed" to care about such low-level things as allocating and freeing memory for your variables.
That being said, putting your code into functions (although preferably called something clearer than tmp()) is most definitely a good idea, as it will make your code much more readable and "Pythonic".
Coming from C++, you've already stumbled onto one of the main differences (drawbacks) of Python: memory management. Python's garbage collector will delete all objects that fall out of scope. However, freeing an object's memory doesn't guarantee that the memory actually returns to the system; instead, a rather large portion may be kept reserved by the Python process even when unused. If you face a memory problem and want to give memory back to the system, the only safe method is to run the memory-intensive function in a separate process. Every process has its own interpreter, and any memory consumed by a process returns to the system when that process exits.
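One way to sketch that (the computation here is a made-up stand-in): run the hungry part in a child interpreter via the stdlib subprocess module, so its memory goes back to the OS when the child exits:

```python
import subprocess
import sys

# the memory-hungry computation runs in its own interpreter;
# only the small printed result crosses back to the parent
code = "print(sum(i * i for i in range(10**6)))"
done = subprocess.run([sys.executable, "-c", code],
                      capture_output=True, text=True, check=True)
result = int(done.stdout)
print(result)   # the child's memory has already been returned to the OS
```

The multiprocessing module offers a richer way to do the same thing when you need to pass Python objects back and forth rather than text.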
I would like to ask whether using the Delegate Pattern in Python leads to circular references, and if so, what the best way to implement it would be to ensure the object and its delegate will be garbage collected.
In Objective-C, the above problem is avoided by using a weak reference to the delegate. In C++, we don't call delete on the delegate. I've found a link to Python's weakref module here: http://docs.python.org/library/weakref.html. It seems like a plausible approach might be to use this module to hold the instance variable as a weak reference, but I'm not sure.
As I've googled this question and was not able to find answers to it, I'm wondering whether this is even a problem in Python or if there is a common solution (without the need for the weakref module) that I'm unaware of? Also, I did search stackoverflow before asking but the questions I found either deal with circular imports or delegate pattern in general and not specific to Python and the problem of circular references.
Thanks in advance for any replies.
Listed below is some code for a toy example to help illustrate my question. I've implemented code in this way and it works but I'm not sure whether memory is garbage collected at the end.
class A(object):
    def __init__(self):
        self.delegate = None
        # Some other instance variables that keep track of state for performing some tasks.

    def doSomething(self):
        if self.delegate is not None:
            self.delegate.doSomething()
        else:
            print('Cannot perform task because delegate is not set.')

    # Other methods not shown.

class B(object):
    def __init__(self):
        self.a = A()  # Need to keep object 'a' from being garbage collected so as to preserve its state information.
        self.a.delegate = self  # Is this a circular reference? How to 'fix' it so that A and B will eventually be garbage collected?

    def doSomething(self):
        print('B doing something')

    # Other methods not shown.
EDIT:
After reading some of the replies, I decided to clarify my question. I understand that Python has garbage collection. What I wasn't sure about was whether it will perform garbage collection on circularly referenced objects. My worries stem from the following passage from Python's docs:
CPython implementation detail: CPython currently uses a
reference-counting scheme with (optional) delayed detection of
cyclically linked garbage, which collects most objects as soon as they
become unreachable, but is not guaranteed to collect garbage
containing circular references. See the documentation of the gc module
for information on controlling the collection of cyclic garbage. Other
implementations act differently and CPython may change. Do not depend
on immediate finalization of objects when they become unreachable (ex:
always close files).
The passage in its original form can be found here: http://docs.python.org/reference/datamodel.html. The emphasis is mine.
The following post provides a clearer explanation on the problem of circular referenced objects and why it would prevent garbage collection on those objects (at least in a typical setting): http://www.electricmonk.nl/log/2008/07/07/python-destructor-and-garbage-collection-notes/.
Further, I just came across Alex Martellli's reply to the following question on whether Python users should worry about circular reference: Should I worry about circular references in Python? From his answer, I gather that even though circular referenced objects will eventually be garbage collected BUT there would be overheads. Whether it is significant depends on the program.
Further, he mentioned to use Python's weakref module but did not explicitly say how.
Hence, I would like to add the following questions to clarify some unresolved issues:
The docs say garbage collection is not guaranteed for circularly referenced objects, but from the replies it appears that is not the case. So have I misunderstood the passage, or are there further details that I've missed?
I suppose using a weak reference, as stated in Alex's reply and my question, would avoid the overhead problem entirely?
Again thanks for the replies.
Python already does garbage collection. You only need to do something special if you write your own container types in C, as extensions.
Demo: Run this program and watch the memory usage not climb.
class C(object):
    pass

def circular():
    for x in range(10**4):
        for y in range(10**4):
            a = C()
            b = C()
            a.x = b
            b.x = a

circular()
Footnote: the following function doesn't do anything useful, so delete it:

def setDelegate(self, delegate):
    self.delegate = delegate
Instead of calling x.setDelegate(y), you can use x.delegate = y. You can overload member access in Python, so there's no benefit to writing a method.
Why wouldn't it be garbage collected at the end? When the script is over and Python completes execution, the entire section of memory will be marked for garbage collection and (eventually) OS recovery.
If you're running this in a long-running program, once A and B are both dereferenced, then the memory will be reclaimed.
OK, I've got two really big classes (over 1k lines each) that I currently have split up into multiple smaller ones, which are then recombined using multiple inheritance. Now I'm wondering if there is a cleaner/better, more Pythonic way of doing this. Completely factoring them out would result in endless amounts of self.otherself.do_something calls, which I don't think is the way it should be done.
To make things clear here's what it currently looks like:
# GUI.py
from gui_events import GUIEvents    # event handlers
from gui_helpers import GUIHelpers  # helper methods that don't directly modify the GUI

class GUI(gtk.Window, GUIEvents, GUIHelpers):
    # general stuff here
One problem that results from this is Pylint complaining, giving me trillions of "init not called" / "undefined attribute" / "attribute accessed before definition" warnings.
EDIT:
You may want to take a look at the code, to get a picture of what the whole thing actually is.
http://github.com/BonsaiDen/Atarashii/tree/next/atarashii/usr/share/pyshared/atarashii/
Please note, I'm really trying everything to keep this thing as DRY as possible. I'm using Pylint to detect code duplication, and the only thing it complains about is the imports.
If you want to use multiple inheritance to combine everything into one big class (it might make sense to do this), then you can refactor each of the parent classes so that every method and property is either private (starts with '__') or has a short 2-3 character prefix unique to that class. For example, all the methods and properties in your GUIEvents class could start with ge_, and everything in GUIHelpers could start with gh_. By doing this, you'll achieve some of the clarity of using separate sub-class instances (self.ge.doSomething() vs self.ge_doSomething()) and you'll avoid conflicting member names, which is the main risk when combining such large classes into one.
Start by finding classes that model real world concepts that your application needs to work with. Those are natural candidates for classes.
Try to avoid multiple inheritance as much as possible; it's rarely useful and always somewhat confusing. Instead, look to use functional composition ("HAS-A" relationships) to give rich attributes to your objects made of other objects.
Remember to make each method do one small, specific thing; this necessarily entails breaking up methods that do too many things into smaller pieces.
Refactor cases where you find many such methods are duplicating each other's functionality; this is another way to find natural collections of functionality that deserve to be in a distinct class.
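A tiny sketch of the HAS-A style (all names invented) as an alternative to mixing everything together via inheritance:

```python
class EventHandlers(object):
    def on_click(self):
        return "clicked"

class Helpers(object):
    def format_title(self, text):
        return text.title()

class Window(object):
    def __init__(self):
        # composition: the window HAS-A handlers object and HAS-A helpers object
        self.events = EventHandlers()
        self.helpers = Helpers()

    def click(self):
        # the window forwards to its parts instead of inheriting from them
        return self.events.on_click()

w = Window()
print(w.click())                          # clicked
print(w.helpers.format_title("my app"))   # My App
```

Each part can then be tested and reused on its own, and name collisions between the parts become impossible.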
I think this is more of a general OO-design problem than Python problem. Python pretty much gives you all the classic OOP tools, conveniently packaged. You'd have to describe the problem in more detail (e.g. what do the GUIEvents and GUIHelpers classes contain?)
One Python-specific aspect to consider is the following: Python supports multiple programming paradigms, and often the best solution is not OOP. This may be the case here. But again, you'll have to throw in more details to get a meaningful answer.
Your code may be substantially improved by implementing a Model-View-Controller design. Depending on how your GUI and tool are setup, you may also benefit from "widgetizing" portions of your GUI, so that rather than having one giant Model-View-Controller, you have a main Model-View-Controller that manages a bunch of smaller Model-View-Controllers, each for distinct portions of your GUI. This would allow you to break up your tool and GUI into many classes, and you may be able to reuse portions of it, reducing the total amount of code you need to maintain.
While Python does support multiple programming paradigms, for GUI tools the best solution will nearly always be an object-oriented design.
One possibility is to assign imported functions to class attributes:
In file a_part_1.py:
def add(self, n):
    self.n += n

def __init__(self, n):
    self.n = n
And in main class file:
import a_part_1
class A:
    __init__ = a_part_1.__init__
    add = a_part_1.add
Or if you don't want to update main file when new methods are added:
class A: pass

import a_part_1

for k, v in a_part_1.__dict__.items():
    if callable(v):
        setattr(A, k, v)
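As a self-contained sketch of why this works (the two helper functions stand in for those imported from a_part_1): plain functions assigned to class attributes are looked up like ordinary methods:

```python
def _init(self, n):
    self.n = n

def _add(self, n):
    self.n += n

class A:
    pass

# functions assigned as class attributes become regular methods of A
A.__init__ = _init
A.add = _add

a = A(1)
a.add(2)
print(a.n)  # 3
```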
I have been working on some code. My usual approach is to first solve all of the pieces of the problem, creating the loops and other pieces of code I need as I work through the problem and then if I expect to reuse the code I go back through it and group the parts of code together that I think should be grouped to create functions.
I have just noticed that creating functions and calling them seems to be much more efficient than writing lines of code and deleting containers as I am finished with them.
for example:
def someFunction(aList):
    do things to aList
    that create a dictionary
    return aDict
seems to release more memory at the end than
do things to aList
that create a dictionary
del aList
Is this expected behavior?
EDIT added example code
When this function finishes running, the PF usage shows an increase of about 100 MB; filingList has about 8 million lines.
from collections import defaultdict

def getAllCIKS(filingList):
    cikDICT = defaultdict(int)
    for filing in filingList:
        if filing.startswith('.'):
            del filing
            continue
        cik = filing.split('^')[0].strip()
        cikDICT[cik] += 1
        del filing
    ciklist = cikDICT.keys()   # note: Python 2; in Python 3 use sorted(cikDICT)
    ciklist.sort()
    return ciklist

allCIKS = getAllCIKS(open(r'c:\filinglist.txt').readlines())
If I run this instead I show an increase of almost 400 mb
from collections import defaultdict

cikDICT = defaultdict(int)
for filing in open(r'c:\filinglist.txt').readlines():
    if filing.startswith('.'):
        del filing
        continue
    cik = filing.split('^')[0].strip()
    cikDICT[cik] += 1
    del filing
ciklist = cikDICT.keys()   # note: Python 2; in Python 3 use sorted(cikDICT)
ciklist.sort()
del cikDICT
EDIT
I have been playing around with this some more today. My observation and question should be refined a bit, since my focus has been on the PF usage. Unfortunately I can only poke at this between my other tasks. However, I am starting to wonder about references versus copies. If I create a dictionary from a list, does the dictionary hold a copy of the values that came from the list, or does it hold references to the values in the list? My bet is that the values are copied instead of referenced.
Another thing I noticed is that items in the GC list were items from containers that were deleted. Does that make sense? So I have a list, and suppose each of the items in the list was [(aTuple), anInteger, [another list]]. When I started learning how to manipulate and inspect gc objects, I found those objects in the gc even though the list had been forcefully deleted, and even though I passed the 0, 1 and 2 values to the method (whose name I don't remember) to try to delete them anyway.
I appreciate the insights people have been sharing. Unfortunately I am always interested in figuring out how things work under the hood.
Maybe you used some local variables in your function, which are implicitly released by reference counting at the end of the function, while they are not released at the end of your code segment?
You can use the Python garbage collector interface provided to more closely examine what (if anything) is being left around in the second case. Specifically, you may want to check out gc.get_objects() to see what is left uncollected, or gc.garbage to see if you have any reference cycles.
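For instance, a sketch using gc.DEBUG_SAVEALL so that collected cycle members are kept in gc.garbage for inspection instead of being freed:

```python
import gc

gc.set_debug(gc.DEBUG_SAVEALL)  # collected objects are kept in gc.garbage instead of freed

class Node(object):
    pass

a = Node()
b = Node()
a.other = b
b.other = a   # a two-object reference cycle
del a, b      # unreachable now, but only the cycle collector can reclaim it

found = gc.collect()
cycle_seen = any(isinstance(obj, Node) for obj in gc.garbage)
print(found, cycle_seen)  # the collector found our unreachable cycle

gc.set_debug(0)
del gc.garbage[:]  # let the saved objects actually be freed
```

Remember to reset the debug flags and clear gc.garbage when you're done, or the saved objects will themselves look like a leak.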
Some extra memory is freed when you return from a function, but that's exactly as much extra memory as was allocated to call the function in the first place. In any case, if you're seeing a large difference, that's likely an artifact of the state of the runtime, and it is not something you should really be worrying about. If you are running low on memory, the way to solve the problem is to keep more data on disk using things like B-trees (or just use a database), or to use algorithms that use less memory. Also, keep an eye out for making unnecessary copies of large data structures.
The real memory savings in creating functions is in your short-term memory. By moving something into a function, you reduce the amount of detail you need to remember by encapsulating part of the minutia away.
Maybe you should re-engineer your code to get rid of unnecessary variables (that may not be freed instantly)... how about the following snippet?
myfile = open(r"c:\fillinglist.txt")
ciklist = sorted(set(x.split("^")[0].strip() for x in myfile if not x.startswith(".")))
EDIT: I don't know why this answer was voted down... Maybe because it's short? Or maybe because the dude who voted was unable to understand how this one-liner does the same as the code in the question without creating unnecessary temporary containers?
Sigh...
I asked another question about copying lists, and the answers, particularly the one directing me to look at deepcopy, caused me to think about some dictionary behavior. The problem I was experiencing had to do with the fact that the original list is never garbage collected because the dictionary maintains references to it. I need to use the information about weakref in the Python docs.
Objects referenced by dictionaries seem to stay alive. I think (but am not sure) that the process of pushing the dictionary out of the function forces the copy process and kills the object. This is not complete; I need to do some more research.
I have a block of code that basically intializes several classes, but they are placed in a sequential order, as later ones reference early ones.
For some reason the last one initializes before the first one... it seems to me there is some sort of threading going on. What I need to know is how I can stop it from doing this.
Is there some way to make a class init do something similar to sending a return value?
Or maybe I could use the class in an if statement of some sort to check if the class has already been initialized?
I'm a bit new to Python and am migrating from C, so I'm still getting used to the little differences like naming conventions.
CPython has a global interpreter lock, so unless you explicitly create threads, everything runs in a single thread and in sequence.
My guess is that some side effect initializes the last class from a different place than you expect. Throw an exception in __init__ of that last class to see where it gets called.
Spaces vs. tabs issue... ugh. >.>
Well, at least it works now. I admit that I kind of miss the braces from C instead of forced indentation. It's quite a handy prototyping language, though. Maybe I'll grow to love it more when I get a better grasp of it.