Locking/unlocking in constructor/destructor in Python

I have a class that is only ever accessed externally through static methods. Those static methods create an object of the class to use within the method; when they return, the object is presumably destroyed. The class is a getter/setter for a couple of config files, and now I need to place thread locks on access to those files.
Since I have several different static methods that all need read/write access to the config files and all create objects in the scope of the method, I was thinking of acquiring my locks inside the object constructor and releasing them in the destructor.
My coworker expressed concern that this could potentially leave the class locked forever if something went wrong. He also mentioned something about how the destructor in Python is called by the garbage collector, but we're both relatively new to Python, so that's an unknown.
Is this a reasonable solution, or should I just lock/unlock in each of the methods themselves?
class A:
    rateLock = threading.RLock()
    chargeLock = threading.RLock()

    @staticmethod
    def doZStuff():
        a = A()
        a.doStuff('Z')

    @staticmethod
    def doYStuff():
        a = A()
        a.doStuff('Y')

    @synchronized(lock)
    def doStuff(self, type):
        if type == 'Z':
            otherstuff()
        elif type == 'Y':
            evenmorestuff()
Is it even possible to get it to work that way, with the decorator on doStuff() instead of doZStuff()?
Update
Thanks for the answers, everyone. The problem I'm facing is mostly due to the fact that it doesn't really make sense to access my module asynchronously, but this is just part of an API, and the team accessing our stuff through the API was complaining about concurrency issues. So I don't need the perfect solution; I'm just trying to make it so they can't crash our side or get garbage data back.

import threading

class A:
    rateLock = threading.RLock()
    chargeLock = threading.RLock()

    def doStuff(self, ratefile, chargefile):
        with A.rateLock:
            with open(ratefile) as f:
                pass  # ... read/write the rate file here
        with A.chargeLock:
            with open(chargefile) as f:
                pass  # ... read/write the charge file here
Using the with statement guarantees that the (R)Lock is acquired and released in pairs. The release will happen even if an exception occurs within the with-block.
You might also want to place your locks around the file-access block (with open(...) as ...) as tightly as you can, so that the locks are not held longer than necessary.
Finally, the creation and garbage collection of a = A() will not affect the locks if (as above) the locks are class attributes (as opposed to instance attributes). Class attributes live in A.__dict__ rather than a.__dict__, so the locks will not be garbage collected until A itself is.
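You can see the difference in a quick interactive session (a sketch; the class body is abbreviated):

>>> import threading
>>> class A(object):
...     rateLock = threading.RLock()
...
>>> a = A()
>>> 'rateLock' in a.__dict__
False
>>> 'rateLock' in A.__dict__
True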

You are right about the garbage collection, so it is not a good idea.
Look into decorators for writing synchronized functions.
Example: http://code.activestate.com/recipes/465057-basic-synchronization-decorator/
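A minimal version of such a decorator might look like this (a sketch along the lines of that recipe, not a verbatim copy; my_lock and do_z_stuff are made-up names):

import functools
import threading

def synchronized(lock):
    # Return a decorator that holds `lock` for the duration of each call
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with lock:
                return func(*args, **kwargs)
        return wrapper
    return decorator

my_lock = threading.RLock()

@synchronized(my_lock)
def do_z_stuff():
    pass  # only one thread at a time executes this body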
edit
I'm still not 100% sure what you have in mind, so my suggestion may be wrong:
class A:
    lockZ = threading.RLock()
    lockY = threading.RLock()

    @staticmethod
    @synchronized(lockZ)
    def doZStuff():
        a = A()
        a.doStuff('Z')

    @staticmethod
    @synchronized(lockY)
    def doYStuff():
        a = A()
        a.doStuff('Y')

    def doStuff(self, type):
        if type == 'Z':
            otherstuff()
        elif type == 'Y':
            evenmorestuff()

However, if you HAVE TO acquire and release locks in constructors and destructors, then you really, really, really should give your design another chance. You should change your basic assumptions.
In any application, a lock should be held for as short a time as possible. That means that in probably 90% of all cases, you will acquire the lock in the same method that releases it.
There should hardly ever be a reason to lock/unlock an object in RAII style. That is not what locks were meant for. ;)
Let me give you an example: you manage some resources, and those resources can be read from many threads at once, but only one thread can write to them.
In a "naive" implementation you would have one lock per object, and whenever someone wants to write to it, you lock it. When multiple threads want to write, access is synchronized fairly, all safe and well. BUT: when one thread says "write", everyone else stalls until that thread decides to release the lock.
Please understand that locks, mutexes - all these primitives were created to synchronize only a few lines of your source code. So instead of making the lock part of your writable object, hold a lock only for the very short time where it really is required. You will have to invest more time and thought in your interfaces, but locks/mutexes were never meant to be held for more than a few microseconds.

I don't know which platform you are on, but if you need to lock a file, you should probably use flock(), if it is available, instead of rolling your own locking routines.
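On Unix-like systems, that might look roughly like this (a sketch; fcntl is unavailable on Windows, and 'myconfig.conf' is just an illustrative path):

import fcntl

with open('myconfig.conf', 'r+') as f:
    fcntl.flock(f, fcntl.LOCK_EX)   # block until we hold an exclusive lock
    try:
        data = f.read()             # read/modify the file while locked
    finally:
        fcntl.flock(f, fcntl.LOCK_UN)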
Since you've mentioned that you are new to Python, I must say that most of the time threads are not the solution in Python. If your activity is CPU-bound, you should consider using multiprocessing; remember, because of the GIL there is no concurrent execution of Python bytecode in threads (this is true for most cases). If your activity is I/O-bound, which I guess is the case, you should perhaps consider using an event-driven framework like Twisted. That way you won't have to worry about deadlocks at all, I promise :)
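For the CPU-bound case, a minimal multiprocessing sketch might look like this (crunch is a made-up stand-in for real work):

from multiprocessing import Pool

def crunch(n):
    # stand-in for some CPU-bound computation
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    pool = Pool(processes=4)                  # four worker processes
    results = pool.map(crunch, [100000] * 8)  # runs in parallel, one task per process
    pool.close()
    pool.join()
    print(results)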

Releasing locks in the destructor is risky, as has already been mentioned, because of the garbage collector: when the __del__() method is called is decided entirely by the GC (usually when the refcount reaches zero), and in some cases, if you have circular references, it might never be called at all, even when the program exits.
If each class instance handles one specific config file, you can put a lock object from the threading module inside it.
Some example code of this:
from threading import Lock

class ConfigFile:
    def __init__(self, file):
        self.file = file
        self.lock = Lock()

    def write(self, data):
        self.lock.acquire()
        try:
            pass  # do stuff with the file
        finally:
            self.lock.release()

# Function that uses a ConfigFile object
def some_static_method():
    config = ConfigFile('myconfig.conf')
    config.write('some data')
You can also use the lock in a with statement, like:

def write(self, data):
    with self.lock:
        pass  # do stuff with the file

and Python will acquire and release the lock for you, even if an error happens while doing stuff with the file.

Related

How to design a Python class with a thread member that gets garbage collected

I have created a class A using the following pattern:

class A:
    def __init__(self):
        self.worker = threading.Thread(target=self.workToDo)
        self.worker.setDaemon(True)
        self.worker.start()

    def workToDo(self):
        while True:
            print("Work")

However, this object never gets garbage collected. I assume that this is due to a circular dependency between the running thread and its parent.
How can I design a class that starts a periodic thread to do some work, stops this thread on destruction, and gets destructed as soon as all obvious references to the parent object go out of scope?
I tried to stop the thread in the __del__ method, but this method is never called (I assume due to the circular dependency).
There is no circular dependency, and the garbage collector is doing exactly what it is supposed to do. Look at the method workToDo:

def workToDo(self):
    while True:
        print("Work")

Once you start the thread, this method will run forever. It contains a variable named self: the instance of class A that originally launched the thread. As long as this method continues to run, there is an active reference to the instance of A, and therefore it cannot be garbage collected.
This can easily be demonstrated with the following little program:
import threading
import time

def workToDo2():
    while True:
        print("Work2")
        time.sleep(0.5)

class A:
    def __init__(self):
        self.worker = threading.Thread(target=workToDo2, daemon=True)
        self.worker.start()

    def workToDo(self):
        while True:
            print("Work")
            time.sleep(0.5)

    def __del__(self):
        print("del")

A()
time.sleep(5.0)
If you change the function that starts the thread from self.workToDo to workToDo2, the __del__ method fires almost immediately. In that case the thread does not reference the object created by A(), so it can be safely garbage collected.
Your statement of the problem is based on a false assumption about how the garbage collector works. There is no such concept as "obvious reference" - there is either a reference or there isn't.
The threads continue to run whether the object that launched them is garbage collected or not. You really should design Python threads so there is a mechanism to exit from them cleanly, unless they are true daemons and can continue to run without harming anything.
I understand the urge to avoid trusting your users to call some sort of explicit close function. But the Python philosophy is "we're all adults here," so IMO this problem is not a good use of your time.
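If you do want a deterministic shutdown, one common pattern is to give the worker an explicit stop signal instead of relying on __del__ (a sketch; Worker and its method names are made up):

import threading

class Worker(object):
    def __init__(self):
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run)
        self._thread.daemon = True
        self._thread.start()

    def _run(self):
        while not self._stop_event.is_set():
            print("Work")
            self._stop_event.wait(0.5)  # sleep, but wake up early on stop()

    def stop(self):
        self._stop_event.set()
        self._thread.join()

w = Worker()
# ... later, instead of relying on garbage collection:
w.stop()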
Syntax of a destructor declaration:

def __del__(self):
    pass  # body of destructor

Note: references to an object are also deleted when the object goes out of scope or when the program ends.
Example: a simple destructor. By using the del keyword we delete all references to the object obj, so the destructor is invoked automatically.
Python program to illustrate a destructor:

class Employee:
    # Initializing
    def __init__(self):
        print('Employee created.')

    # Deleting (calling destructor)
    def __del__(self):
        print('Destructor called, Employee deleted.')

obj = Employee()
del obj

When are return values garbage collected?

I'm trying to understand how the Python garbage collector functions and if there is anything I can do to control when an object is collected. I wrote this test:
>>> class Test:
...     def __del__(self):
...         print 'Delete ' + str(self)
...
>>> def fun():
...     return Test()
...
>>> fun()
<__main__.Test instance at 0x0000000002989E48>
>>> fun()
Delete <__main__.Test instance at 0x0000000002989E48>
<__main__.Test instance at 0x00000000023E2488>
>>> fun()
Delete <__main__.Test instance at 0x00000000023E2488>
<__main__.Test instance at 0x0000000002989C48>
As you can see, the Test instance, although I keep no reference to it, is not deleted until the next time I call fun. Is this simply an accident (could it have been deleted at any other point), or is there a specific reason why it is deleted only when I call fun again? Is there anything I can do to ensure it gets deleted if I don't keep a reference to it?
The "contact" of the Python garbage collector (like all garbage collectors) is that it will release an object sometime after the last reachable reference to it disappears.
Because CPython uses reference counting, as an implementation detail it will release most garbage objects (specifically non-cyclic objects) immediately after the last reachable reference to them disappears. This is not a guarantee of the Python language, and is not true of other Python implementations like PyPy, Jython, IronPython, so relying on it is generally considered to be poor practice.
In your case, what you're observing with the object being collected after the function is called again has little to do with the behaviour of the garbage collector, but is rather due to the way the interactive interpreter shell works.
When you evaluate an expression in the interactive prompt, the resulting value is automatically saved in the variable _, so you can get it back if you discover that you still want it only after you've seen it printed. So after your fun() calls, there is still a reference to the return value. Then when you evaluate another expression (anything else, it doesn't have to involve fun again), _ is overwritten with the new value, allowing the old one to be garbage collected.
This only happens for expressions directly entered at the interactive prompt, so it won't delay collection of objects within functions or when your Python code is imported or run as a script.
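You can see the _ mechanism at work by evaluating any other expression after calling fun() - the old return value is released as soon as _ is rebound (a sketch of a Python 2 session; the addresses are illustrative):

>>> fun()
<__main__.Test instance at 0x...>
>>> 1 + 1
Delete <__main__.Test instance at 0x...>
2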
Try explicitly calling del on the returned value:
returned_value = fun()
del returned_value
But finalizers like __del__ can be problematic; as you have already seen, one issue is that when they get called is not deterministic. It is also possible for a finalizer to resurrect the object being deleted, for example by sticking a reference to it in a global list.
If you need to release resources (besides just raw memory) - things like unlocking locks, closing files, or releasing database connections - use a context manager and scope its lifetime with the with statement. Many of these resources are already context managers. For example, a threading.Lock can be locked and unlocked implicitly using with:
# "with" statement will call the __enter__ method of self.lock,
# which will block until self.lock can be locked
with self.lock:
# do thread-synchronized stuff here
# self.lock is automatically released here - at then end of
# the "with" block, the lock's __exit__ method is called, which
# releases the lock. This will get called even if the block is
# exited by a raised exception

Why aren't destructors guaranteed to be called on interpreter exit?

From the Python docs:
It is not guaranteed that __del__() methods are called for objects that still exist when the interpreter exits.
Why not? What problems would occur if this guarantee were made?
I'm not convinced by the previous answers here.
First, note that the example given does not prevent __del__ methods from being called during exit. In fact, current CPythons will call the __del__ method given: twice in the case of Python 2.7 and once in the case of Python 3.4. So this can't be the "killer example" showing why the guarantee is not made.
I think the statement in the docs is not motivated by a design principle that calling the destructors would be bad. Not least because it seems that in CPython 3.4 and up they are always called as you would expect and this caveat seems to be moot.
Instead I think the statement simply reflects the fact that the CPython implementation has sometimes not called all destructors on exit (presumably for ease of implementation reasons).
The situation seems to be that CPython 3.4 and 3.5 do always call all destructors on interpreter exit.
CPython 2.7 by contrast does not always do this. Certainly __del__ methods are usually not called on objects which have cyclic references, because those objects cannot be deleted if they have a __del__ method. The garbage collector won't collect them. While the objects do disappear when the interpreter exits (of course) they are not finalized and so their __del__ methods are never called. This is no longer true in Python 3.4 after the implementation of PEP 442.
However, it seems that Python 2.7 also does not finalize objects that have cyclic references, even if they have no destructors, if they only become unreachable during the interpreter exit.
Presumably this behaviour is sufficiently particular and difficult to explain that it is best expressed simply by a generic disclaimer - as the docs do.
Here's an example:
class Foo(object):
    def __init__(self):
        print("Foo init running")

    def __del__(self):
        print("Destructor Foo")

class Bar(object):
    def __init__(self):
        print("Bar1 init running")
        self.bar = self
        self.foo = Foo()

b = Bar()
# del b
With the del b commented out, the destructor in Foo is not called in Python 2.7 though it is in Python 3.4.
With the del b added, then the destructor is called (at interpreter exit) in both cases.
If you did some nasty things, you could find yourself with an undeletable object which Python would try to delete forever:

class Phoenix(object):
    def __del__(self):
        print "Deleting an Oops"
        global a
        a = self

a = Phoenix()
Relying on __del__ isn't great in any event, as Python doesn't guarantee when an object will be deleted (especially objects with cyclic references). That said, perhaps turning your class into a context manager is a better solution... then you can guarantee that cleanup code is called even in the case of an exception, etc.
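A minimal sketch of that idea (ManagedResource and the file path are made up for illustration):

class ManagedResource(object):
    def __enter__(self):
        self.f = open('myconfig.conf')   # acquire the resource
        return self.f

    def __exit__(self, exc_type, exc_value, traceback):
        self.f.close()                   # runs even if the block raised
        return False                     # don't suppress exceptions

with ManagedResource() as f:
    data = f.read()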
One example where the destructor is not called is if you exit inside a method. Have a look at this example:
class Foo(object):
    def __init__(self):
        print("Foo init running")

    def __del__(self):
        print("Destructor Foo")

class Bar(object):
    def __init__(self):
        print("Bar1 init running")
        self.bar = self
        self.foo = Foo()

    def __del__(self):
        print("Destructor Bar")

    def stop(self):
        del self.foo
        del self
        exit(1)

b = Bar()
b.stop()
The output is:
Bar1 init running
Foo init running
Destructor Foo
As we delete foo explicitly, its destructor is called - but not the destructor of bar!
And if we do not delete foo explicitly, it is not destructed properly either:
class Foo(object):
    def __init__(self):
        print("Foo init running")

    def __del__(self):
        print("Destructor Foo")

class Bar(object):
    def __init__(self):
        print("Bar1 init running")
        self.bar = self
        self.foo = Foo()

    def __del__(self):
        print("Destructor Bar")

    def stop(self):
        exit(1)

b = Bar()
b.stop()
Output:
Bar1 init running
Foo init running
I don't think this is because doing the deletions would cause problems. It's more that the Python philosophy is not to encourage developers to rely on the use of object deletion, because the timing of these deletions cannot be predicted - it is up to the garbage collector when it occurs.
If the garbage collector may defer deleting unused objects for an unknown amount of time after they go out of scope, then relying on side effects that happen during the object deletion is not a very robust or deterministic strategy. RAII is not the Python way. Instead Python code handles cleanup using context managers, decorators, and the like.
Worse, in complicated situations, such as with object cycles, the garbage collector might not ever detect that objects can be deleted. This situation has improved as Python has matured. But because of exceptions to the expected GC behaviour like this, it is unwise for Python developers to rely on object deletion.
I speculate that interpreter exit is another complicated situation where the Python devs, especially for older versions of Python, were not completely strict about making sure the GC delete ran on all objects.
Likely because most programmers would assume that destructors should only be called on dead (already unreachable) objects, and here, on exit, we would be invoking them on live objects.
If the developer was not expecting a destructor call on a live object, some nasty undefined behaviour may result. At the very least, something must be done to force-close the application after a timeout if it hangs - but then some destructors may not be called.
Java's Runtime.runFinalizersOnExit has been deprecated for the same reason.

If I have a @staticmethod in my Python webapp, do I need to protect it with threading.RLock()?

I have a webapp built in Python running off the Paste server. If I have declared a @staticmethod that assigns state to method-scoped variables, do I have to protect it with e.g. threading.RLock() (or is there a better way) in order to prevent multiple HTTP requests (I'm assuming Paste, as a server, contains some sort of thread pool for serving incoming requests) from interfering with each other's state?
I should point out I am using Grok as my framework.
so -
@staticmethod
def doSomeStuff():
    abc = 1
    # ...some code...
    abc = 5
Given the above, is it thread-safe inside Grok/Paste between threads (again, assuming requests are dealt with in threads)?
Local variables are created separately for each method call, no matter whether it's a static method, a class method, a non-static method, or a stand-alone function - the same way as in Java. Unless you explicitly copy references to those objects somewhere outside, so that they survive the method and can be accessed from other threads, you don't have to lock anything.
For example, this is safe unless CoolClass uses any shared state between instances:
def my_safe_method(*args):
    my_cool_object = CoolClass()
    my_cool_object.populate_from_stuff(*args)
    return my_cool_object.result()
This is likely to be unsafe since the object reference may be shared between the threads (depends on what get_cool_inst does):
def my_suspicious_method(*args):
    my_cool_object = somewhere.get_cool_inst()
    my_cool_object.populate_from_stuff(*args)
    # another thread received the same instance
    # and modified it
    # (my_cool_object is still local, but it's a reference to a shared object)
    return my_cool_object.result()
This can be unsafe too, if publish shares the reference:
def my_suspicious_method(*args):
    my_cool_object = CoolClass()
    # puts it somewhere in the global namespace; other threads access it
    publish(my_cool_object)
    my_cool_object.prepare(*args)
    # another thread modifies it now
    return my_cool_object.result()
EDIT: The code sample you've provided is completely thread-safe; @staticmethod doesn't change anything in that respect.

setattr, object deletion and cyclic garbage collection

I would like to understand how object deletion works in Python. Here is a very simple bunch of code:
class A(object):
    def __init__(self):
        setattr(self, "test", self._test)

    def _test(self):
        print "Hello, World!"

    def __del__(self):
        print "I'm dying!"

class B(object):
    def test(self):
        print "Hello, World!"

    def __del__(self):
        print "I'm dying"

print "----------Test on A"
A().test()
print "----------Test on B"
B().test()
Pythonistas will recognize that I'm running a Python 2.x version; more specifically, this code runs on a Python 2.7.1 setup.
This code outputs the following:
----------Test on A
Hello, World!
----------Test on B
Hello, World!
I'm dying
Surprisingly, the A object is not deleted. I can understand why, since the setattr statement in __init__ produces a circular reference. But this one seems easy to resolve.
Finally, this page in the Python documentation (Supporting Cyclic Garbage Collection) shows that it's possible to deal with this kind of circular reference.
I would like to know:
why do I never go through my __del__ method in class A?
if my diagnosis about the circular reference is right, why does my object subclass not support cyclic garbage collection?
finally, how do I deal with this kind of setattr if I really want to go through __del__?
Note: in A, if the setattr points to another function of my module, there's no problem.
Fact 1
Instance methods are normally stored on the class. The interpreter first looks them up in the instance __dict__, which fails, and then looks on the class, which succeeds.
When you dynamically set the instance method of A in __init__, you create a reference to it in the instance dictionary. Since the bound method holds a reference back to the instance, the reference is circular, so the refcount will never go to zero and the reference counter will not clean A up.
>>> class A(object):
...     def _test(self): pass
...     def __init__(self):
...         self.test = self._test
...
>>> a = A()
>>> a.__dict__['test'].im_self  # the instance itself - hence the cycle
Fact 2
The garbage collector is what Python uses to deal with circular references. Unfortunately, it can't handle objects with __del__ methods, since in general it can't determine a safe order to call them. Instead, it just puts all such objects in gc.garbage. You can then go look there to break cycles, so they can be freed. From the docs
gc.garbage
A list of objects which the collector found to be unreachable but could not be freed (uncollectable objects). By default, this list contains only objects with __del__() methods. Objects that have __del__() methods and are part of a reference cycle cause the entire reference cycle to be uncollectable, including objects not necessarily in the cycle but reachable only from it. Python doesn't collect such cycles automatically because, in general, it isn't possible for Python to guess a safe order in which to run the __del__() methods. If you know a safe order, you can force the issue by examining the garbage list, and explicitly breaking cycles due to your objects within the list. Note that these objects are kept alive even so by virtue of being in the garbage list, so they should be removed from garbage too. For example, after breaking cycles, do del gc.garbage[:] to empty the list. It's generally better to avoid the issue by not creating cycles containing objects with __del__() methods, and garbage can be examined in that case to verify that no such cycles are being created.
Therefore
Don't make cyclic references on objects with __del__ methods if you want them to be garbage collected.
You should read the documentation on the __del__ method rather carefully - specifically, the part where objects with __del__ methods change the way the collector works.
The gc module provides some hooks where you can clean this up yourself.
I suspect that simply not having a __del__ method here would result in your object being properly cleaned up. You can verify this by looking through gc.garbage and seeing if your instance of A is present.
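For instance, a quick check might look like this (a sketch; on Python 2, where a __del__ method blocks cycle collection, the instance lands in gc.garbage):

import gc

a = A()        # the class from the question: __del__ plus a cycle via self.test
del a
gc.collect()   # force a collection pass

# the uncollectable instance is now in gc.garbage
for obj in gc.garbage:
    print(obj)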
