Is my method right for garbage collecting circular referenced objects? - python

While playing with my newly created html module, I used the weakref module to get around the circular reference problem. Everything seems to work, but I am not sure about the approach I followed, and especially not about the Scope class below. I tried to make a minimal working example (Here is a link for full code). The Html class is just for creating HTML output from Python objects; for simplicity, the example below does not actually do that.
# encoding: utf-8
from __future__ import print_function, unicode_literals
import weakref

class Scope(object):
    def __init__(self):
        self.ref_holder = set()

    def add(self, obj):
        self.ref_holder.add(obj)

    def __enter__(self):
        return self

    def __exit__(self, *args, **kwargs):
        self.ref_holder = None

class Html(object):
    def __init__(self, parent=None, tag="", scope=None):
        self.scope = scope
        if parent is None:
            self.parent = None
        elif not isinstance(parent, weakref.ProxyTypes):
            # weakref.proxy() of a non-callable object has type ProxyType,
            # not CallableProxyType; ProxyTypes covers both proxy types
            self.parent = weakref.proxy(parent)
            if self.scope:
                self.scope.add(parent)
            elif parent.scope:
                parent.scope.add(self)
        else:
            self.parent = parent
        self.tag = tag
        if self.scope:
            self.scope.add(self)
        self.children = []

    def append(self, html):
        if isinstance(html, basestring):
            html = Html(tag=html)
            return self.append(html)
        elif isinstance(html, self.__class__):
            self.children.append(html)
            return html
        else:
            raise Exception("Unknown type")

    def __unicode__(self):
        return 'Html "{tag}" children = {children}'.format(
            tag=self.tag, children=list(map(str, self.children)))

    def __str__(self):
        return self.__unicode__()

if __name__ == "__main__":
    with Scope() as scope:
        test_form = Html(tag="form", scope=scope)
        test_form.append(Html(tag="label"))
        test_input = Html(tag="input")
        test_form.append(test_input)
        print(test_form)
Here are my concerns, and I would appreciate your guidance:
I named the reference-holder class Scope. It just holds references to objects, even those not assigned to any variable, so that the Html objects are not garbage collected (note: in the real code, some objects can change their parent/child relationship, after which no strong reference to the object is left).
I could simply hold the object references in a list and delete it afterwards, but using a with statement seems nicer. Is Scope the right name for this class, and is the way I hold references right? Is there a better way than mine to hold strong references to objects created on the fly?
I believe setting Scope.ref_holder to None when the with statement exits frees all the strong references, after which gc collects the objects. I tested this by disabling gc and calling gc.collect(), and no objects were reported as unreachable. Am I right to assume that this method guarantees there is no leak?
EDIT
I added the link for full source code.
Code is compatible with Python 2.7.
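The third concern (making sure nothing leaks once the with block exits) can also be checked more directly than by counting unreachable objects: keep weakref.ref "watchers" on the objects and confirm they are all dead after gc.collect(). A minimal sketch, assuming a simplified Node class (a hypothetical stand-in for Html) and a Scope class written the same way as in the question:

```python
import gc
import weakref


class Scope(object):
    """Holds strong references until the with block ends (same idea as the question)."""

    def __init__(self):
        self.ref_holder = set()

    def add(self, obj):
        self.ref_holder.add(obj)

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.ref_holder = None  # drop every strong reference at once
        return False            # do not swallow exceptions


class Node(object):
    """Hypothetical stand-in for Html: strong refs to children, weak ref to parent."""

    def __init__(self, parent=None):
        self.children = []
        self.parent = weakref.proxy(parent) if parent is not None else None
        if parent is not None:
            parent.children.append(self)


def build_and_release():
    watchers = []  # weak references only, so they never keep anything alive
    with Scope() as scope:
        root = Node()
        scope.add(root)
        for _ in range(3):
            watchers.append(weakref.ref(Node(root)))
        watchers.append(weakref.ref(root))
        del root  # from here on only the scope keeps the tree alive
        alive_inside = all(w() is not None for w in watchers)
    gc.collect()  # also collects cycles, not only refcount-zero objects
    freed_after = all(w() is None for w in watchers)
    return alive_inside, freed_after


print(build_and_release())  # expected: (True, True)
```

Because the parent link is a weak proxy, the tree has no reference cycle, so dropping the scope's set is enough to free it; gc.collect() is there only to make the check robust if a cycle sneaks in.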

I think the key point here is that you are using __exit__ together with the with statement in Python.
Basically it should not happen like this. In __enter__ you return self, and __exit__ should remove all the references. Are you sure that no exception is raised in the middle? Due to an exception, __exit__ may return False and your garbage collection may not be complete.
However, to answer your question: setting ref_holder to None in the __exit__ method is a forceful garbage cleanup, and I think it completely depends on the OS. If self.ref_holder is somehow linked with the addresses of those objects, they may be cleaned up properly.
Try del self.ref_holder instead; it ensures a more thorough cleanup.

Perhaps a bit OT: why explicit class for scope?
How about something simpler instead?
import contextlib

@contextlib.contextmanager
def scope():
    rv = set()
    try:
        yield rv
    finally:
        # if you must be explicit, or
        # if you want a side effect on leaked scopes:
        rv.clear()
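A short usage sketch of the generator-based scope above (Obj is a hypothetical placeholder class). It also shows what happens when the body raises: the finally clause still runs, so the set is emptied, and the exception continues to propagate.

```python
import contextlib


@contextlib.contextmanager
def scope():
    rv = set()
    try:
        yield rv
    finally:
        rv.clear()  # runs on normal exit and when the body raises


class Obj(object):
    """Hypothetical placeholder for objects created on the fly."""


with scope() as refs:
    refs.add(Obj())
    refs.add(Obj())
    count_inside = len(refs)
count_after = len(refs)  # same set object, emptied by the finally clause

raised = False
try:
    with scope() as refs2:
        refs2.add(Obj())
        raise ValueError("boom")
except ValueError:
    raised = True  # the exception still propagates out of the with block
```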

Related

Call a class method only once

I created the following class:
import loader
import pandas as pd

class SavTool(pd.DataFrame):
    def __init__(self, path):
        pd.DataFrame.__init__(self, data=loader.Loader(path).data)

    @property
    def path(self):
        return path

    @property
    def meta_dict(self):
        return loader.Loader(path).dict
When the class is instantiated, the instance is a pandas DataFrame, which I want to extend with other attributes such as the path to the file and a dictionary containing meta information (called 'meta_dict').
What I want is the following: the dictionary 'meta_dict' shall be mutable. Namely, the following should work:
df = SavTool("somepath")
df.meta_dict["new_key"] = "new_value"
print df.meta_dict["new_key"]
But what happens is that every time I use the syntax 'df.meta_dict' the method 'meta_dict' is called and the original 'meta_dict' from loader.Loader is returned such that 'df.meta_dict' cannot be changed. Therefore, the syntax leads to "KeyError: 'new_key'". 'meta_dict' shall be called only once and then never again if it is used/called a second/third... time. The second/third... time 'meta_dict' should just be an attribute, in this case a dictionary.
How can I fix this? Maybe the whole design of the class is bad and should be changed (I'm new to using classes)? Thanks for your answers!
When you call loader.Loader you create a new instance of the dictionary each time. The @property decorator doesn't cache anything for you; it just wraps a possibly complicated getter behind a clean attribute interface for the caller.
Something like this should work. I also updated the path variable so it's bound correctly on the class and returned in the path property correctly.
import loader
import pandas as pd

class SavTool(pd.DataFrame):
    def __init__(self, path):
        pd.DataFrame.__init__(self, data=loader.Loader(path).data)
        self._path = path
        self._meta_dict = loader.Loader(path).dict

    @property
    def path(self):
        return self._path

    @property
    def meta_dict(self):
        return self._meta_dict

    def update_meta_dict(self, **kwargs):
        self._meta_dict.update(kwargs)
Another way to just cache the variable is by using hasattr:
@property
def meta_dict(self):
    if not hasattr(self, "_meta_dict"):
        self._meta_dict = loader.Loader(self._path).dict
    return self._meta_dict
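The caching idea can be tried without pandas or the asker's loader module. In this minimal sketch MetaHolder and fake_load_meta are hypothetical names: the property loads the dictionary on first access and afterwards always returns the same dict object, so keys added through obj.meta_dict[...] persist.

```python
class MetaHolder(object):
    """Hypothetical stand-in for SavTool, without the DataFrame base class."""

    def __init__(self, path, load_meta):
        self._path = path
        self._load_meta = load_meta  # hypothetical loader: path -> new dict
        self._meta_dict = None       # not loaded yet

    @property
    def path(self):
        return self._path

    @property
    def meta_dict(self):
        # Load on first access only; afterwards return the same dict object,
        # so changes made through obj.meta_dict[...] are kept.
        if self._meta_dict is None:
            self._meta_dict = self._load_meta(self._path)
        return self._meta_dict


calls = []


def fake_load_meta(path):
    calls.append(path)
    return {"source": path}


obj = MetaHolder("somepath", fake_load_meta)
obj.meta_dict["new_key"] = "new_value"
print(obj.meta_dict["new_key"])  # new_value
print(len(calls))                # 1: the loader ran only once
```

On Python 3.8+, functools.cached_property does the same load-once caching without the explicit None check.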

Share plugin resources with implemented permission rules

I have multiple scripts that export the same interface, and they're executed using execfile() in an isolated scope.
The thing is, I want them to share some resources so that each new script doesn't have to load them again from scratch, losing startup speed and using an unnecessary amount of RAM.
In reality the scripts are much better encapsulated and guarded against malicious plug-ins than in the example below, and that's where my problems begin.
The thing is, I want the script that creates a resource to be able to fill it with data, remove data or remove the resource, and of course access its data.
But other scripts shouldn't be able to change another script's resource, just read it. I want to be sure that newly installed plug-ins cannot interfere with already loaded and running ones by abusing shared resources.
Example:
class SharedResources:
    # Here should be a shared resource manager that I tried to write
    # but got stuck. That's why I ask this long and convoluted question!
    # Some beginning:
    def __init__ (self, owner):
        self.owner = owner

    def __call__ (self):
        # Here we should return some object that will do
        # required stuff. Read more for details.
        pass

class plugin (dict):
    def __init__ (self, filename):
        dict.__init__(self)
        # Here some checks and filling with secure versions of __builtins__ etc.
        # ...
        self["__name__"] = "__main__"
        self["__file__"] = filename
        # Add a shared resources manager to this plugin
        self["SharedResources"] = SharedResources(filename)
        # And then:
        execfile(filename, self, self)

    # Expose the plug-in interface to outside world:
    def __getattr__ (self, a):
        return self[a]

    def __setattr__ (self, a, v):
        self[a] = v

    def __delattr__ (self, a):
        del self[a]
    # Note: I didn't use self.__dict__ because this makes encapsulation easier.
    # In future I won't use object itself at all but separate dict to do it. For now let it be
----------------------------------------
# An example of two scripts that would use shared resource and be run with plugins["name"] = plugin("<filename>"):
# Presented code is same in both scripts, what comes after will be different.
def loadSomeResource ():
    # Do it here...
    return loadedresource

# Then Load this resource if it's not already loaded in shared resources, if it isn't then add loaded resource to shared resources:
shr = SharedResources() # This would be an instance allowing access to shared resources
if not shr.has_key("Default Resources"):
    shr.create("Default Resources")
if not shr["Default Resources"].has_key("SomeResource"):
    shr["Default Resources"].add("SomeResource", loadSomeResource())
resource = shr["Default Resources"]["SomeResource"]
# And then we use normally resource variable that can be any object.
# Here I Used category "Default Resources" to add and/or retrieve a resource named "SomeResource".
# I want more categories so that plugins that deal with audio aren't mixed with plug-ins that deal with video for instance. But this is not strictly needed.
# Here comes code specific for each plug-in that will use shared resource named "SomeResource" from category "Default Resources".
...
# And end of plugin script!
----------------------------------------
# And then, in main program we load plug-ins:
import os
plugins = {} # Here we store all loaded plugins
for x in os.listdir("plugins"):
    # listdir() returns bare file names, so prefix the directory
    plugins[x] = plugin(os.path.join("plugins", x))
Let's say that our two scripts are stored in the plugins directory and both use some WAVE files loaded into memory.
The plugin that loads first will load the WAVE and put it into RAM.
The other plugin will be able to access the already loaded WAVE, but not to replace or delete it and thus mess with the other plugin.
Now, I want each resource to have an owner (some id, or the filename of the plugin script), and I want the resource to be writable only by its owner.
No tweaking or workarounds should enable the other plugin to tamper with the first one's resources.
I almost did it and then got stuck, and my head is spinning with concepts that, when implemented, do the thing only partially.
This is eating at me, so I cannot concentrate any more. Any suggestion is more than welcome!
Adding:
This is what I use now without any safety included:
# Dict that will hold a category of resources (should implement some security):
class ResourceCategory (dict):
    def __getattr__ (self, i): return self[i]
    def __setattr__ (self, i, v): self[i] = v
    def __delattr__ (self, i): del self[i]

SharedResources = {} # Resource pool

class ResourceManager:
    def __init__ (self, owner):
        self.owner = owner

    def add (self, category, name, value):
        if not SharedResources.has_key(category):
            SharedResources[category] = ResourceCategory()
        SharedResources[category][name] = value

    def get (self, category, name):
        return SharedResources[category][name]

    def rem (self, category, name=None):
        if name is None: del SharedResources[category]
        else: del SharedResources[category][name]

    def __call__ (self, category):
        if not SharedResources.has_key(category):
            SharedResources[category] = ResourceCategory()
        return SharedResources[category]

    __getattr__ = __getitem__ = __call__
    # When securing, this must not be left as this, it is unsecure, can provide a way back to SharedResources pool:
    has_category = has_key = SharedResources.has_key
Now a plugin capsule:
class plugin(dict):
    def __init__ (self, path, owner):
        dict.__init__(self)
        self["__name__"] = "__main__"
        # etc. etc.
        # And when adding resource manager to the plugin, register it with this plugin as an owner
        self["SharedResources"] = ResourceManager(owner)
        # ...
        execfile(path, self, self)
        # ...
Example of a plugin script:
#-----------------------------------
# Get a category we want. (Using __call__() ) Note: If a category doesn't exist, it is created automatically.
AudioResource = SharedResources("Audio")
# Use an MP3 resource (let's say a bytestring):
if not AudioResource.has_key("Beep"):
    f = open("./sounds/beep.mp3", "rb")
    AudioResource.Beep = f.read()
    f.close()
# Take a reference out for fast access and nicer look:
beep = AudioResource.Beep # BTW, immutables don't propagate as references by themselves, do they? A copy will be returned, so RAM usage will increase instead. Immutables would have to be wrapped in a composed data type.
This works perfectly but, as I said, messing resources is too much easy here.
I would like an instance of ResourceManager() to be in charge to whom return what version of stored data.
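On the question in the last comment of the plugin script: in Python, assignment, dict lookup and argument passing never copy an object, whether it is mutable or immutable; immutability only means the object cannot be changed in place. A short check (payload is an illustrative stand-in for a loaded sound file):

```python
# A bytes object standing in for a sound file loaded into memory.
payload = b"RIFF" + b"\x00" * 1024

pool = {"Audio": {"Beep": payload}}

beep = pool["Audio"]["Beep"]
same_object = beep is payload  # lookup returns the stored object itself


def takes_reference(data):
    # Passing an argument binds a new name to the same object; nothing is copied.
    return data is payload


passed_without_copy = takes_reference(beep)
print(same_object and passed_without_copy)  # True
```

So taking a local reference like beep does not increase RAM usage; wrapping immutables in a container is not needed for that purpose.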
So, my general approach would be this.
Have a central shared resource pool. Access through this pool would be read-only for everybody. Wrap all data in the shared pool so that no one "playing by the rules" can edit anything in it.
Each agent (plugin) maintains knowledge of what it "owns" at the time it loads it. It keeps a read/write reference for itself, and registers a reference to the resource to the centralized read-only pool.
When a plugin is loaded, it gets a reference to the central, read-only pool that it can register new resources with.
So, only addressing the issue of python native data structures (and not instances of custom classes), a fairly locked down system of read-only implementations is as follows. Note that the tricks that are used to lock them down are the same tricks that someone could use to get around the locks, so the sandboxing is very weak if someone with a little python knowledge is actively trying to break it.
try:
    # the ABC aliases directly in collections were removed in Python 3.10
    import collections.abc as _col
except ImportError:
    import collections as _col  # Python 2
import sys

if sys.version_info >= (3, 0):
    immutable_scalar_types = (bytes, complex, float, int, str)
else:
    immutable_scalar_types = (basestring, complex, float, int, long)

# calling this will circumvent any control an object has on its own attribute lookup
getattribute = object.__getattribute__

# types that will be safe to return without wrapping them in a proxy
immutable_safe = immutable_scalar_types

def add_immutable_safe(cls):
    # decorator for adding a new class to the immutable_safe collection
    # Note: only ImmutableProxyContainer uses it in this initial
    # implementation
    global immutable_safe
    immutable_safe += (cls,)
    return cls

def get_proxied(proxy):
    # circumvent normal object attribute lookup
    return getattribute(proxy, "_proxied")

def set_proxied(proxy, proxied):
    # circumvent normal object attribute setting
    object.__setattr__(proxy, "_proxied", proxied)

def immutable_proxy_for(value):
    # Proxy for known container types, reject all others
    if isinstance(value, _col.Sequence):
        return ImmutableProxySequence(value)
    elif isinstance(value, _col.Mapping):
        return ImmutableProxyMapping(value)
    elif isinstance(value, _col.Set):
        return ImmutableProxySet(value)
    else:
        raise NotImplementedError(
            "Return type {} from an ImmutableProxyContainer not supported".format(
                type(value)))

@add_immutable_safe
class ImmutableProxyContainer(object):
    # the only names that are allowed to be looked up on an instance through
    # normal attribute lookup
    _allowed_getattr_fields = ()

    def __init__(self, proxied):
        set_proxied(self, proxied)

    def __setattr__(self, name, value):
        # never allow attribute setting through normal mechanism
        raise AttributeError(
            "Cannot set attributes on an ImmutableProxyContainer")

    def __getattribute__(self, name):
        # enforce attribute lookup policy
        allowed_fields = getattribute(self, "_allowed_getattr_fields")
        if name in allowed_fields:
            return getattribute(self, name)
        raise AttributeError(
            "Cannot get attribute {} on an ImmutableProxyContainer".format(name))

    def __repr__(self):
        proxied = get_proxied(self)
        return "{}({})".format(type(self).__name__, repr(proxied))

    def __len__(self):
        # works for all currently supported subclasses
        return len(get_proxied(self))

    def __hash__(self):
        # will error out if proxied object is unhashable
        proxied = getattribute(self, "_proxied")
        return hash(proxied)

    def __eq__(self, other):
        proxied = get_proxied(self)
        if isinstance(other, ImmutableProxyContainer):
            other = get_proxied(other)
        return proxied == other

class ImmutableProxySequence(ImmutableProxyContainer, _col.Sequence):
    _allowed_getattr_fields = ("count", "index")

    def __getitem__(self, index):
        proxied = get_proxied(self)
        value = proxied[index]
        if isinstance(value, immutable_safe):
            return value
        return immutable_proxy_for(value)

class ImmutableProxyMapping(ImmutableProxyContainer, _col.Mapping):
    _allowed_getattr_fields = ("get", "keys", "values", "items")

    def __getitem__(self, key):
        proxied = get_proxied(self)
        value = proxied[key]
        if isinstance(value, immutable_safe):
            return value
        return immutable_proxy_for(value)

    def __iter__(self):
        proxied = get_proxied(self)
        for key in proxied:
            if not isinstance(key, immutable_scalar_types):
                # If mutable keys are used, returning them could be dangerous.
                # If owner never puts a mutable key in, then integrity should
                # be okay. tuples and frozensets should be okay as keys, but
                # are not supported in this implementation for simplicity.
                raise NotImplementedError(
                    "keys of type {} not supported in "
                    "ImmutableProxyMapping".format(type(key)))
            yield key

class ImmutableProxySet(ImmutableProxyContainer, _col.Set):
    _allowed_getattr_fields = ("isdisjoint", "_from_iterable")

    def __contains__(self, value):
        return value in get_proxied(self)

    def __iter__(self):
        proxied = get_proxied(self)
        for value in proxied:
            if isinstance(value, immutable_safe):
                yield value
            else:
                # wrap anything that is not already safe to hand out
                yield immutable_proxy_for(value)

    @classmethod
    def _from_iterable(cls, it):
        return set(it)
NOTE: this is only tested on Python 3.4, but I tried to write it to be compatible with both Python 2 and 3.
Make the root of the shared resources a dictionary. Give an ImmutableProxyMapping of that dictionary to the plugins.
private_shared_root = {}
public_shared_root = ImmutableProxyMapping(private_shared_root)
Create an API where the plugins can register new resources to the public_shared_root, probably on a first-come-first-served basis (if it's already there, you can't register it). Pre-populate private_shared_root with any containers you know you're going to need, or any data you want to share with all plugins but you know you want to be read-only.
It might be convenient if the convention for the keys in the shared root mapping were all strings, like file-system paths (/home/dalen/local/python) or dotted paths like python library objects (os.path.expanduser). That way collision detection is immediate and trivial/obvious if plugins try to add the same resource to the pool.
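One possible shape for that registration API is sketched below. It is an illustration, not part of the answer above: ResourcePool, OwnerHandle, handle_for, register and remove are all hypothetical names, and it uses types.MappingProxyType (Python 3.3+) as the read-only view instead of the ImmutableProxyMapping class. The host issues each plugin a handle bound to its owner id; every plugin can read everything, but only the owner can remove its resources.

```python
import types


class ResourcePool(object):
    """First-come-first-served resource pool (hypothetical; Python 3.3+).

    Not a security boundary: like the proxies above, it only stops code
    that plays by the rules.
    """

    def __init__(self):
        self._values = {}  # key -> resource
        self._owners = {}  # key -> owner id
        # Read-only live view: shows later registrations, rejects item assignment.
        self.public = types.MappingProxyType(self._values)

    def handle_for(self, owner):
        # Called by the host program once per plugin; the plugin only gets its handle.
        return OwnerHandle(self, owner)


class OwnerHandle(object):
    """Write access limited to the keys this owner registered."""

    def __init__(self, pool, owner):
        self._pool = pool
        self.owner = owner
        self.public = pool.public

    def register(self, key, value):
        if key in self._pool._values:
            raise KeyError("{!r} is already registered".format(key))
        self._pool._values[key] = value
        self._pool._owners[key] = self.owner

    def remove(self, key):
        if self._pool._owners.get(key) != self.owner:
            raise PermissionError("{!r} is not owned by {!r}".format(key, self.owner))
        del self._pool._values[key]
        del self._pool._owners[key]


pool = ResourcePool()
plugin_a = pool.handle_for("plugin_a.py")
plugin_b = pool.handle_for("plugin_b.py")

plugin_a.register("Audio/Beep", b"\x00\x01")
beep = plugin_b.public["Audio/Beep"]  # every plugin can read it

try:
    plugin_b.remove("Audio/Beep")  # but only the owner may remove it
    b_could_remove = True
except PermissionError:
    b_could_remove = False

try:
    plugin_b.public["Audio/Beep"] = b""  # and the view itself is read-only
    view_writable = True
except TypeError:
    view_writable = False
```

MappingProxyType is shallow: a mutable value (say, a list) stored in the pool can still be mutated by any reader, which is exactly the gap the ImmutableProxy* wrappers above are meant to close.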

Python lists being collected by GC

I've been reading about weak and strong references in Python, specifically regarding errors that look like
ReferenceError: weakly-referenced object no longer exists
Here I have a basic RPC interface that passes objects from client to server, where the server then saves those objects into a predefined class. Here's a basic outline of all the structures in my code. Note the behavior of "flags":
Client side:
# target = 'file.txt', flags = [(tuple, tuple), (tuple, tuple)]
def file_reminder(self, flags, target):
    target = os.path.abspath(target)
    c = rpyc.connect("localhost", port)
    # flags can be referenced here
    return c.root.file_reminder(flags, target)
Server side:
import threading
import rpyc

class FileReminder(object):
    def __init__(self, flags, target):
        self.flags = flags
        self.target = target

    def __str__(self):
        return str(self.flags) + self.target

class EventLoop(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)  # required before start() can be called
        self.reminders = []

    def add_reminder(self, reminder):
        # reminder.flags can be referenced here
        self.reminders.append(reminder)

    def run(self):
        while True:
            for reminder in self.reminders:
                # reminder.flags is no longer defined here
                print reminder

# defined last, because its class body creates an EventLoop
class MyService(rpyc.Service):
    jobs = EventLoop()
    jobs.start()  # start() returns None, so don't assign its result to jobs

    # this is what's called from the client side
    def exposed_file_reminder(self, flags, target):
        reminder = FileReminder(flags, target)
        self.jobs.add_reminder(reminder)
        # reminder.flags can be referenced here
        return "Added a new reminder"
The issue here is the "flags" argument always throwing a ReferenceError when printed in the thread (or manipulated in any way within the Thread's run() function). Note, target is processed just fine. When I change "flags" to an immutable, like a string, no ReferenceError is popping up. This is making my head scratch so any help would be appreciated!
Using Python GC on Compound Objects, I was able to fix this, although I do not know if it was done using "best practices".
Here's what I think the error was: although there were many references to the list itself, there were no explicit references to the tuples within that list. What I did to fix it was to create a local copy of the list when a FileReminder is instantiated.
For example
def __init__(self, flags, target):
    self.flags = []
    for flag in flags:
        self.flags.append(flag)  # append to the new list, not the one being iterated
    self.target = target
This seems to work!
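The error message quoted in the question is the one a weakref proxy raises once the object it points to is gone. The sketch below reproduces it with the standard weakref module only (rpyc is not involved, and RemoteList is a hypothetical name; plain lists cannot be weakly referenced, hence the wrapper class). It shows why copying the items while the referent is still alive, as in the fix above, keeps them usable afterwards.

```python
import gc
import weakref


class RemoteList(object):
    """Hypothetical object that supports weak references (plain lists do not)."""

    def __init__(self, items):
        self.items = list(items)

    def __iter__(self):
        return iter(self.items)


owner = RemoteList([(1, 2), (3, 4)])
proxy = weakref.proxy(owner)  # does not keep `owner` alive

# Copy the contents while the referent still exists.
snapshot = [flag for flag in proxy]

del owner     # drop the only strong reference
gc.collect()  # not needed on CPython, harmless elsewhere

try:
    list(proxy)
    still_alive = True
except ReferenceError:  # "weakly-referenced object no longer exists"
    still_alive = False

print(snapshot)  # [(1, 2), (3, 4)]
```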

How to implement a strategy pattern with runtime selection of a method?

Context
I'm trying to implement some variant of strategy pattern in Python 2.7.
I want to be able to instantiate a 'my_strategy' base class, but switch between different implementations of a 'score' method at run-time.
I will have many common methods in 'my_strategy' but a bunch of 'score' implementations.
The main illustrates how I want to use it.
Here the scoring implementation is dummy of course.
What I tried (i.e. My code so far)
strategy.py:
from algo_one import *
#from algo_two import *

class my_strategy ( object ):
    def __init__(self, candidate = ""):
        self.candidate = candidate
        self.method = 'default'
        self.no = 10
        self._algo = algo_one

    def set_strategy(self, strategy='default'):
        self.strategy = strategy
        if self.strategy == 'algo_one':
            self._algo = algo_one
        elif self.strategy == 'algo_two':
            # self._algo = algo_two
            pass
        else:
            self._algo = None

    def score(self, *args):
        if len(args) > 0:
            self.candidate = args[0]
        self._algo.score(self.candidate)

if __name__ == "__main__":
    s = my_strategy()
    s.strategy = 'algo_one'
    s.candidate = "hello world"
    print s.score()
    print s.score("hi")
    # s.set_method('algo_two')
    # print s.score("hi")
I want to save the selected strategy in some sort of private pointer to the sub-class method.
algo_one.py:
from strategy import my_strategy
class algo_one ( my_strategy ):
    def score(self, candidate):
        return len(candidate)*self.no
I could have a class-less method, but later I'll need to access public variables of the base class.
algo_two.py:
from strategy import my_strategy
class algo_two ( my_strategy ):
    def score(self, candidate):
        return len(candidate)*3
I have an empty __init__.py too.
The errors
1.
in score self._algo.score(self.candidate)
TypeError: unbound method score() must be called with algo_one instance as first argument (got str instance instead)
2.
If I uncomment the import of the second strategy:
from algo_two import *
I get the following error.
ImportError: cannot import name my_strategy
My guess is that I run into some sort of circular dependency.
3.
from algo_one import *
This is obviously not pretty (unable to detect undefined names), but if I
from algo_one import algo_one
I get
ImportError: cannot import name algo_one
Question
I think the errors are intertwined and that my approach, as a whole, may be flawed. If not just addressing the error, I'm looking for suggestions to improve the design. Or any comment, really. Also I'm open to suggestions regarding the title of this question. Thank you!
You make it much more complicated than it needs to be. Python functions are first-class objects, so the simplest way to implement the strategy pattern in Python is to pass a 'strategy' function to your "context" object (the one that uses the strategy). The nice part is that any callable object (i.e. any object implementing the __call__ method) will work.
def default_score_strategy(scorer):
    return len(scorer.candidate) * 3

def universal_answer_score_strategy(scorer):
    return 42 # definitely the universal answer <g>

class ComplicatedStrategy(object):
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, scorer):
        return len(scorer.candidate) * self.factor

class Scorer(object):
    def __init__(self, candidate="", strategy=default_score_strategy):
        self.candidate = candidate
        self.strategy = strategy

    def score(self):
        return self.strategy(self)

s1 = Scorer("foo")
s2 = Scorer("bar", strategy=universal_answer_score_strategy)
s3 = Scorer("baaz", strategy=ComplicatedStrategy(365))
Note that your strategies don't have to be in the same module as the Scorer class (except the default one, of course), and that the module containing the Scorer class doesn't have to import the strategies' modules, nor know anything about where the strategies are defined:
# main.py
from mylib.scores import Scorer
from myapp.strategies import my_custom_strategy
s = Scorer("yadda", my_custom_strategy)
You don't instantiate your algo object in the __init__ method. Remember, to instantiate a class object, you need to call it:
self._algo = algo_one()
Yes, that's a circular dependency. I don't see however why algo_one and algo_two need to inherit from my_strategy at all. Just make them plain objects, or inherit a base class stored somewhere else. Or, keep them all in the same file - there's no reason to necessarily have classes in separate files in Python.
This is the same problem as 2.
One of your main problems is that your algorithms try to subclass your base class, which is a huge design flaw (you already noticed that). Use a simple method binding instead, which deals with all the necessary things:
def algo_one(candidate):
    # do stuff
    return "A fluffy unicorn"

def algo_two(candidate):
    # do some other stuff
    return "Awesome rabbits"

# not really necessary, just to make it easier to add new algorithms
STRATEGIES = { "one": algo_one, "two": algo_two }

class Strategy(object):
    def __init__(self):
        ...

    def set_strategy(self, which):
        if which not in STRATEGIES:
            raise ValueError("'%s' is an unknown strategy" % which)
        # compatibility checks about the entries in STRATEGIES omitted here
        self._algo = STRATEGIES[which]

    def score(self, *args):
        # ...
        return self._algo(...)
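A runnable version of this dictionary-based selection, with the elided parts filled in by assumptions taken from the question (a `no` attribute of 10, and the two dummy scoring rules). Each strategy function receives the Strategy instance, which covers the requirement to access public attributes of the "base class" without any subclassing or circular imports.

```python
def algo_one(scorer, candidate):
    # Receiving the Strategy instance gives access to its public attributes.
    return len(candidate) * scorer.no


def algo_two(scorer, candidate):
    return len(candidate) * 3


STRATEGIES = {"one": algo_one, "two": algo_two}


class Strategy(object):
    def __init__(self, candidate="", which="one"):
        self.candidate = candidate
        self.no = 10
        self.set_strategy(which)

    def set_strategy(self, which):
        if which not in STRATEGIES:
            raise ValueError("'%s' is an unknown strategy" % which)
        self._algo = STRATEGIES[which]

    def score(self, candidate=None):
        if candidate is not None:
            self.candidate = candidate
        return self._algo(self, self.candidate)


s = Strategy("hello world")
print(s.score())       # 110
print(s.score("hi"))   # 20
s.set_strategy("two")  # switch implementation at run time
print(s.score("hi"))   # 6
```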
If you need a more complex approach (this however depends on your requirements), in which everyone knows about each other, split the algorithms and strategy chooser into different classes referencing each other (shortened version below):
import weakref

class ScoreAlgo(object):
    def __init__(self, parent):
        # if you need a back-reference, a weak proxy avoids creating a
        # reference cycle between the algorithm and its strategy object
        self._strategy = weakref.proxy(parent)

    def score(self, candidate):
        return None

class Strategy(object):
    def __init__(self):
        ...

    def set_strategy(self, ...):
        ...
        self._algo = ScoreAlgo(self)

    def score(self, ...):
        return self._algo.score(...)
(If you need a huge variety of algorithms, you should make ScoreAlgo an abstract base class, for which subclasses have to implement the score() method).
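The abstract-base-class variant could look like this minimal sketch (LengthTimesTen and Incomplete are hypothetical names). ScoreAlgo itself cannot be instantiated, and neither can a subclass that forgets to implement score().

```python
import abc

# Works on Python 2 and 3; on Python 3.4+ you can subclass abc.ABC instead.
ABC = abc.ABCMeta("ABC", (object,), {})


class ScoreAlgo(ABC):
    @abc.abstractmethod
    def score(self, candidate):
        """Return a score for candidate."""


class LengthTimesTen(ScoreAlgo):
    def score(self, candidate):
        return len(candidate) * 10


class Incomplete(ScoreAlgo):
    pass  # score() not implemented


print(LengthTimesTen().score("hi"))  # 20

try:
    Incomplete()
    incomplete_rejected = False
except TypeError:  # abstract methods must be overridden before instantiation
    incomplete_rejected = True
```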
You also could use a mixin pattern (which is a bit more formal than the method binding) or several other ways. This however depends on your overall requirements.
EDIT: I just added a return to both def score(): stubs to avoid confusion about why they might not return anything.

python: closures and classes

I need to register an atexit function for use with a class (see Foo below for an example) that, unfortunately, I have no direct way of cleaning up via a method call: other code, that I don't have control over, calls Foo.start() and Foo.end() but sometimes doesn't call Foo.end() if it encounters an error, so I need to clean up myself.
I could use some advice on closures in this context:
import atexit

class Foo:
    def cleanup(self):
        # do something here
        pass

    def start(self):
        def do_cleanup():
            self.cleanup()
        atexit.register(do_cleanup)

    def end(self):
        # cleanup is no longer necessary... how do we unregister?
        pass
Will the closure work properly, e.g. in do_cleanup, is the value of self bound correctly?
How can I unregister an atexit() routine?
Is there a better way to do this?
edit: this is Python 2.6.5
Make a global registry plus a single function that calls every cleaner in it, and remove cleaners from the registry when they are no longer needed.
import atexit

cleaners = set()

def _call_cleaners():
    for cleaner in list(cleaners):
        cleaner()

atexit.register(_call_cleaners)

class Foo(object):
    def cleanup(self):
        if self.cleaned:
            raise RuntimeError("ALREADY CLEANED")
        self.cleaned = True

    def start(self):
        self.cleaned = False
        cleaners.add(self.cleanup)

    def end(self):
        self.cleanup()
        cleaners.remove(self.cleanup)
I think the code is fine. There's no way to unregister, but you can set a boolean flag that would disable cleanup:
import atexit

class Foo:
    def __init__(self):
        self.need_cleanup = True

    def cleanup(self):
        # do something here
        print 'clean up'

    def start(self):
        def do_cleanup():
            if self.need_cleanup:
                self.cleanup()
        atexit.register(do_cleanup)

    def end(self):
        # cleanup is no longer necessary... how do we unregister?
        self.need_cleanup = False
Lastly, bear in mind that atexit handlers don't get called if "the program is killed by a signal not handled by Python, when a Python fatal internal error is detected, or when os._exit() is called."
self is bound correctly inside the callback to do_cleanup, but in fact if all you are doing is calling the method you might as well use the bound method directly.
You use atexit.unregister() to remove the callback, but there is a catch here as you must unregister the same function that you registered and since you used a nested function that means you have to store a reference to that function. If you follow my suggestion of using a bound method then you still have to save a reference to it:
import atexit

class Foo:
    def cleanup(self):
        # do something here
        pass

    def start(self):
        self._cleanup = self.cleanup # Need to save the bound method for unregister
        atexit.register(self._cleanup)

    def end(self):
        atexit.unregister(self._cleanup)
Note that it is still possible for your code to exit without calling the atexit registered functions, for example if the process is aborted with ctrl+break on Windows or killed with SIGABRT on Linux.
Also as another answer suggests you could just use __del__ but that can be problematic for cleanup while a program is exiting as it may not be called until after other globals it needs to access have been deleted.
Edited to note that when I wrote this answer the question didn't specify Python 2.x. Oh well, I'll leave the answer here anyway in case it helps anyone else.
Since shanked deleted his posting, I'll speak in favor of __del__ again:
import atexit, weakref

class Handler:
    def __init__(self, obj):
        self.obj = weakref.ref(obj)

    def cleanup(self):
        if self.obj is not None:
            obj = self.obj()
            if obj is not None:
                obj.cleanup()

class Foo:
    def __init__(self):
        self.start()

    def cleanup(self):
        print "cleanup"
        self.cleanup_handler = None

    def start(self):
        self.cleanup_handler = Handler(self)
        atexit.register(self.cleanup_handler.cleanup)

    def end(self):
        if self.cleanup_handler is None:
            return
        self.cleanup_handler.obj = None
        self.cleanup()

    def __del__(self):
        self.end()

a1=Foo()
a1.end()
a1=Foo()
a2=Foo()
del a2
a3=Foo()
a3.m=a3
This supports the following cases:
objects where .end is called regularly; cleanup right away
objects that are released without .end being called; cleanup when the last reference goes away
objects living in cycles; cleanup atexit
objects that are kept alive; cleanup atexit
Notice that it is important that the cleanup handler holds a weak reference to the object, as it would otherwise keep the object alive.
Edit: Cycles involving Foo will not be garbage-collected, since Foo implements __del__. To allow for the cycle being deleted at garbage collection time, the cleanup must be taken out of the cycle.
class Cleanup:
    cleaned = False

    def cleanup(self):
        if self.cleaned:
            return
        print "cleanup"
        self.cleaned = True

    def __del__(self):
        self.cleanup()

class Foo:
    def __init__(self):...

    def start(self):
        self.cleaner = Cleanup()
        atexit.register(Handler(self).cleanup)

    def cleanup(self):
        self.cleaner.cleanup()

    def end(self):
        self.cleanup()
It's important that the Cleanup object has no references back to Foo.
Why don't you try it? It only took me a minute to check.
(Answer: Yes)
However, you can simplify it. The closure isn't needed.
import atexit

class Foo:
    def cleanup(self):
        pass

    def start(self):
        atexit.register(self.cleanup)
To avoid cleaning up twice, just check in the cleanup method whether a cleanup is still needed before doing it.
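Putting those two suggestions together (register the bound method directly, and make cleanup safe to call more than once) gives a minimal sketch like the following; needs_cleanup and cleanup_calls are hypothetical attribute names, the latter only there to make the behaviour observable.

```python
import atexit


class Foo(object):
    def __init__(self):
        self.needs_cleanup = False
        self.cleanup_calls = 0  # only here to make the behaviour observable

    def start(self):
        self.needs_cleanup = True
        atexit.register(self.cleanup)  # bound method: self is already bound

    def end(self):
        self.cleanup()

    def cleanup(self):
        # Idempotent: end() and the exit hook may both call it.
        if not self.needs_cleanup:
            return
        self.needs_cleanup = False
        self.cleanup_calls += 1


f = Foo()
f.start()
f.end()      # cleans up now
f.cleanup()  # a second call is a no-op, as is the atexit call at exit
print(f.cleanup_calls)  # 1
```

One trade-off: the registered bound method keeps the instance alive until the interpreter exits. On Python 3, end() could additionally call atexit.unregister(self.cleanup) if that matters.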
