Python destructor based on try/finally + yield?

I've been testing a dirty hack inspired by http://docs.python.org/2/library/contextlib.html .
The main idea is to bring the try/finally idea to the class level and get a reliable and simple class destructor.
class Foo():
    def __init__(self):
        self.__res_mgr__ = self.__acquire_resources__()
        self.__res_mgr__.next()

    def __acquire_resources__(self):
        try:
            # Acquire some resources here
            print "Initialize"
            self.f = 1
            yield
        finally:
            # Release the resources here
            print "Releasing Resources"
            self.f = 0
f = Foo()
print "testing resources"
print f.f
But it always gives me:
Initialize
testing resources
1
and never "Releasing Resources". I'm basing my hope on:
As of Python version 2.5, the yield statement is now allowed in the
try clause of a try ... finally construct. If the generator is not
resumed before it is finalized (by reaching a zero reference count or
by being garbage collected), the generator-iterator’s close() method
will be called, allowing any pending finally clauses to execute. Source link
But it seems that when the class member is garbage collected together with the instance, their reference counts don't decrease, so the generator's close(), and thus finally, is never called. As for the second part of the quote,
"or by being garbage collected"
I just don't know why it's not true. Any chance to make this utopia work? :)
BTW this works on module level:
def f():
    try:
        print "ack"
        yield
    finally:
        print "release"

a = f()
a.next()
print "testing"
Output will be as I expect:
ack
testing
release
NOTE: In my task I can't use a with block because I release the resource inside the thread's end_callback (which runs outside any with). So I wanted a reliable destructor for cases when the callback isn't called for some reason.

The problem you are having is caused by a reference cycle and an implicit __del__ defined on your generator (it's so implicit that CPython doesn't actually show __del__ when you introspect; only the C-level tp_del exists, and no Python-visible __del__ is created). Basically, when a generator has a yield inside:
A try block, or equivalently
A with block
it has an implicit __del__-like implementation. On Python 3.3 and earlier, if a reference cycle contains an object whose class implements __del__ (technically, has tp_del in CPython), unless the cycle is manually broken, the cyclic garbage collector cannot clean it up, and just sticks it in gc.garbage (import gc to gain access), because it doesn't know which objects (if any) must be collected first to clean up "nicely".
Because your class's __acquire_resources__(self) contains a reference to the instance's self, you form a reference cycle:
self -> self.__res_mgr__ (generator object) -> generator frame (whose locals include) -> self
Because of this reference cycle, and the fact that the generator has a try/finally in it (creating tp_del equivalent to __del__), the cycle is uncollectable, and your finally block never gets executed unless you manually advance self.__res_mgr__ (which defeats the whole purpose).
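(You can check that the finally block itself is fine: finalizing the generator by hand does run it; it's just that nothing ever triggers that finalization automatically. A quick sketch:)

f = Foo()
print f.f              # 1
f.__res_mgr__.close()  # raises GeneratorExit at the yield point,
                       # which prints "Releasing Resources"
print f.f              # 0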
Your experiment happens to display this problem automatically because the reference cycle is implicit/automatic, but any accidental reference cycle where an object in the cycle has a class with __del__ will trigger the same problem. So even if you just did:
class Foo():
    def __init__(self):
        # Acquire some resources here
        print "Initialize"
        self.f = 1

    def __del__(self):
        # Release the resources here
        print "Releasing Resources"
        self.f = 0
if the "resources" involved could conceivably lead to a reference cycle with an instance of Foo, you'd have the same problem.
The solution here is one or both of:
Make your class a context manager so users provide the information necessary for deterministic finalization (by using with blocks), as well as providing an explicit cleanup method (e.g. close) for when with blocks aren't feasible (e.g. when the object is part of another object's state that is cleaned up through its own resource management). This is also the only way to provide deterministic cleanup on most non-CPython interpreters, where reference-counting semantics have never been used (so all finalizers are called non-deterministically, if at all).
Move to Python 3.4 or higher, where PEP 442 resolves the issue with uncollectable cyclic garbage (it's technically still possible to produce such cycles on CPython, but only via third party extensions that continue to use tp_del instead of updating to use the tp_finalize slot that allows cyclic garbage to be cleaned properly). It's still non-deterministic cleanup (if a reference cycle exists, you're waiting on the cyclic gc to run, sometime), but it's possible, where pre-3.4, cyclic garbage of this sort could not be cleaned up at all.
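For the first option, a minimal sketch of what Foo from the question might look like (the close method name is my choice, not part of any standard protocol):

class Foo(object):
    def __init__(self):
        # Acquire some resources here
        print "Initialize"
        self.f = 1

    def close(self):
        # Explicit cleanup for when a with block isn't feasible,
        # e.g. from the thread's end_callback
        if self.f:
            print "Releasing Resources"
            self.f = 0

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.close()

# Deterministic cleanup when with is usable:
with Foo() as foo:
    print "testing resources"
    print foo.f

# Otherwise, call close() explicitly (e.g. in the end_callback):
foo = Foo()
foo.close()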


Understanding multi-threading and locks in Python (concept and example)

I did research on multi-threading for a programming project using it (first-timer here...). I would appreciate it if you deemed my statements below correct or, rather, commented on the ones that are wrong or need correction.
1. A lock is an object which can be passed to functions, methods, ... by reference. A function (in this example) can then make use of that lock object reference in order to safely operate on data (a variable in this example). It does this by acquiring the lock, modifying the variable and then releasing the lock.
2. A thread can be created to target a function, which may obtain a reference to a lock (to then achieve what is stated above).
3. A lock does not protect a specific variable, object, etc.
4. A lock does not protect or do anything unless it is acquired (and released).
5. Thus, it is the responsibility of the programmer to use the lock in order to achieve the desired protection.
6. If a lock is acquired inside a function executed by thread A, this has no immediate influence on any other running thread B. Not even if the functions targeted by threads A and B have a reference to the same lock object.
7. Only if the function targeted by thread B wants to acquire the same lock (i.e. via the same referenced lock object), which was already acquired by the function targeted by thread A at that time, does the lock convey influence on both threads, in that thread B will pause further execution until the function targeted by thread A releases the lock again.
8. Thus, a locked lock only ever pauses execution of a thread if its targeted function wants (and waits) to acquire the very same lock itself. So, by acquiring the lock, thread A can only prevent thread B from acquiring the same lock, nothing more, nothing less.
If I want to use a lock to prevent race conditions when setting a variable, I (as the programmer) need to:
9. pass a lock to all functions targeted by threads that will want to set the variable, and
10. acquire the lock in every function, every time, before I set the variable (and release it afterwards). (*)
11. If I create even just one thread targeting a function without providing it a reference to the lock object and let it set the variable, or
12. if I set the variable via a thread whose targeted function has the lock object but doesn't acquire it prior to the operation, I will have failed to implement thread-safe setting of the variable.
(*) The lock should be held as long as the variable must not be accessed by other threads. Right now, I like to compare that to a database transaction: I lock the database (~ acquire a lock) until my set of instructions is completed, then I commit (~ release the lock).
Example: If I wanted to create a class whose member _value should be set in a thread-safe fashion, I would implement one of these two versions:
import threading

class Version1:
    def __init__(self):
        self._value: int = 0
        self._lock: threading.Lock = threading.Lock()

    def getValue(self) -> int:
        """Getting won't be protected in this example."""
        return self._value

    def setValue(self, val: int) -> None:
        """This will be made thread-safe by the member lock."""
        with self._lock:
            self._value = val
v1 = Version1()
t1_1 = threading.Thread(target=v1.setValue, args=(1,)).start()
t1_2 = threading.Thread(target=v1.setValue, args=(2,)).start()
class Version2:
    def __init__(self):
        self._value: int = 0

    def getValue(self) -> int:
        """Getting won't be protected in this example."""
        return self._value

    def setValue(self, val: int, lock: threading.Lock) -> None:
        """This will be made thread-safe by the injected lock."""
        with lock:
            self._value = val
v2 = Version2()
l = threading.Lock()
t2_1 = threading.Thread(target=v2.setValue, args=(1, l)).start()
t2_2 = threading.Thread(target=v2.setValue, args=(2, l)).start()
13. In Version1, I, as the class provider, can guarantee that setting _value is always thread-safe...
14. ...because in Version2, the user of my class might pass two different lock objects to the two spawned threads and thus render the lock protection useless.
15. If I want to give the user of my class the freedom to include the setting of _value in a larger collection of steps that should be executed in a thread-safe manner, I could inject a Lock reference into Version1's __init__ function and assign it to the _lock member. Thus, the thread-safe operation of the class would be guaranteed while still allowing the user of the class to use "her own" lock for that purpose.
A score from 0-15 will now rate how well I have (mis)understood locks... :-D
1. It's also quite common to use global variables for locks. It depends on what the lock is protecting.
2. True, although somewhat meaningless. Any function can use a lock, not just the function that's the target of a thread.
3. If you mean there's no direct link between a lock and the data it protects, that's true. But you can define a data structure that contains a value that needs protecting and a reference to its lock.
4. True. Although as I say in 3, you can define a data structure that packages the data and lock. You could make this a class and have the class methods automatically acquire the lock as needed.
5. Correct. But see 4 for how you can automate this.
6. Correct.
7. Correct.
8. Correct.
9. Correct if it's not a global lock.
10. Partially correct. You should also often acquire the lock if you're merely reading the variable. If reading the object is not atomic (e.g. it's a list and you're reading multiple elements, or you read the same scalar variable several times and expect it to be stable), you need to prevent another thread from modifying it while you're reading.
11. Correct.
12. Correct.
13. Correct. This is an example of what I described above in 3 and 4.
14. Correct. Which is why the design in 13 is often better.
15. This is tricky, because the granularity of the locking needs to reflect all the objects that need to be protected. Your class only protects the assignment of that one variable: it will release the lock before all the other steps associated with the caller-provided lock have been completed.
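A sketch of what 15 might look like, keeping the granularity caveat above in mind: to let the caller hold the injected lock around a larger group of steps and still call setValue() inside that region, a reentrant lock can be used (Version3, and the choice of threading.RLock, are my additions, not from the question):

import threading

class Version3:
    """Like Version1, but the lock can be injected (statement 15)."""
    def __init__(self, lock=None):
        self._value: int = 0
        # A caller-provided lock, or a private one by default.
        self._lock = lock if lock is not None else threading.RLock()

    def getValue(self) -> int:
        return self._value

    def setValue(self, val: int) -> None:
        with self._lock:
            self._value = val

shared = threading.RLock()
v3 = Version3(shared)

# Caller-side grouping of several steps into one critical section:
with shared:
    current = v3.getValue()
    v3.setValue(current + 1)  # no deadlock, because RLock is reentrant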

When does exception handling unexpectedly influence object lifetimes?

The Python reference on the data model notes that
catching an exception with a ‘try…except’ statement may keep objects alive.
It seems rather obvious that exceptions change control flow, potentially leading to different objects remaining referenced. Why is it explicitly mentioned? Is there a potential for memory leaks here?
An exception stores a traceback, which stores all child frames ("function calls") between raising and excepting. Frames reference all local names and their values, preventing the garbage collection of local names and values.
This means that an exception handler should promptly finish handling exceptions to allow child locals to be cleaned up. Still, a function cannot rely on its locals being collectable immediately after the function ends.
As a result, patterns such as RAII cannot be relied on to be prompt, even on reference-counted implementations. When prompt cleanup is required, objects should provide a means for explicit cleanup (for use in finally blocks) or, preferably, automatic cleanup (for use in with blocks).
Objects, values and types
[…]
Programs are strongly recommended to explicitly close such objects. The ‘try…finally’ statement and the ‘with’ statement provide convenient ways to do this.
One can observe this with a class that marks when it is garbage collected.
class Collectible:
    def __init__(self, name):
        self.name = name

    def __del__(self, print=print):
        # default argument keeps print usable even during interpreter shutdown
        print("Collecting", self.name)

def inner():
    local_name = Collectible("inner local value")
    raise RuntimeError("This is a drill")

def outer():
    local_name = Collectible("outer local value")
    inner()

try:
    outer()
except RuntimeError as e:
    print(f"handling a {type(e).__name__}: {e}")
On CPython, the output shows that the handler runs before the locals are collected:
handling a RuntimeError: This is a drill
Collecting inner local value
Collecting outer local value
Note that CPython uses reference counting, which already leads to cleanup as soon as possible. Other implementations may further and arbitrarily delay cleanup.
Well, AFAIK, if the exception references some object or another, those won't be collected until the exception itself is collected; also, if the except block happens to reference some object, that would postpone its collection until after the block is over. I wonder if there are other, less obvious ways in which catching an exception could affect garbage collection.
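For example, reusing the Collectible class from the answer above (a sketch; the timing shown is CPython's reference-counting behaviour): storing the caught exception somewhere keeps its traceback, and therefore the callee's locals, alive well past the handler.

saved = None

def fail():
    obj = Collectible("local kept alive by the traceback")
    raise RuntimeError("stored")

try:
    fail()
except RuntimeError as e:
    saved = e  # keeps e.__traceback__ and its frames referenced

print("handler done, exception still referenced")
saved = None  # dropping the last reference allows collection
print("reference dropped")

On CPython, the "Collecting ..." line appears only once saved is cleared, between the two prints.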

Will a ContextVar leak memory in async logic if not reset after Exception?

If I have a structure in an async webserver like
import contextvars
...
my_context_var = contextvars.ContextVar("var")
@app.route("/foo")  # decorator from webserver
async def some_web_endpoint():
    local_ctx_var = my_context_var.set(params.get("bar"))  # app sets params
    await some_function_that_can_raise()
    local_ctx_var.reset()
Will it leak memory if I don't wrap the ContextVar in a finally: block and some_function_that_can_raise() raises an Exception?
(without such a case, .reset() would never be called)
try:
    await some_function_that_can_raise()
finally:
    local_ctx_var.reset()
.. or is it safe to assume the value will be destroyed when the request scope ends?
The async example in the upstream docs doesn't actually bother .reset()-ing it at all!
In such a case, .reset() is redundant as it happens right before the context is cleaned up anyways.
To add some more context (ha), I'm recently learning about ContextVars and I assume the second is the case.
local_ctx_var is the only name which refers to the Token (from .set()), and as the name is deleted when the request scope ends, the local value should become a candidate for garbage collection, preventing a potential leak and making .reset() unnecessary for short-lived scopes (hooray)
..but I'm not absolutely certain, and while there's some extremely helpful information on the subject, it muddles the mixture slightly
What happens if I don't reset Python's ContextVars? (implies it'll be GC'd as one would expect)
Context variables in Python (explicitly uses finally:)
Yes - the previous value of the context var is kept in the token object in this case. There is this rather similar question, where one of the answers runs a simple benchmark to assert that calling context_var.set() multiple times and discarding the return value does not consume memory, compared to, say, creating a new string and keeping a reference to it.
Given the benchmark, I made some further experimentation and concluded there is no leak - in fact, in code like the above, calling reset is indeed redundant - it is useful if you'd have to restore the previous value inside a loop construct for some reason.
The new value is set on top of the last saved context; the value set in the current version of the context is simply discarded along the way, and the only references to it are those left in the tokens, if any. In other words: what preserves the previous values in a "stack like" way are calls to contextvars.copy_context and Context.run only, not ContextVar.set.
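A small snippet illustrating that token behaviour (the variable names are arbitrary):

import contextvars

var = contextvars.ContextVar("var", default="initial")

token1 = var.set("first")
token2 = var.set("second")  # "first" now survives only inside token2
print(var.get())            # second

var.reset(token2)           # restores the previous value
print(var.get())            # first

var.reset(token1)
print(var.get())            # initial (the default again)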

What is a "runtime context"?

(Edited for even more clarity)
I'm reading the Python book (Python Essential Reference by Beazley) and he says:
The with statement allows a series of statements to execute inside a
runtime context that is controlled by an object that serves as a context manager.
Here is an example:
with open("debuglog", "a") as f:
    f.write("Debugging\n")
    statements
    f.write("Done\n")
He goes on to say:
The with obj statement accepts an optional as var specifier. If given, the value
returned by obj.__enter__() is placed into var. It is important to emphasize
that obj is not necessarily the value assigned to var.
I understand the mechanics of what the with keyword does: a file object is returned by open and that object is accessible via f within the body of the block. I also understand that __enter__() and eventually __exit__() will be called.
But what exactly is a run-time context? A few low-level details would be nice, or an example in C. Could someone clarify what exactly a "context" is and how it might relate to other languages (C, C++)? My understanding of a context was the environment, e.g. a Bash shell executes ls in the context of all the (env-displayed) shell variables.
With the with keyword, yes, f is accessible to the body of the block, but isn't that just scoping? E.g. for x in y: here x is not scoped within the block and retains its value outside the block. Is this what Beazley means when he talks about a 'runtime context', that f is scoped only within the block and loses all significance outside the with-block? Why does he say that the statements "execute inside a runtime context"? Is this like an "eval"?
I understand that open returns an object that is "not ... assigned to var"?
Why isn't it assigned to var? What does Beazley mean by making a statement like that?
The with statement was introduced in PEP 343. This PEP also introduced a new term, "context manager", and defined what that term means.
Briefly, a "context manager" is an object that has special method functions .__enter__() and .__exit__(). The with statement guarantees that the .__enter__() method will be called to set up the block of code indented under the with statement, and also guarantees that the .__exit__() method function will be called at the time of exit from the block of code (no matter how the block is exited; for example, if the code raises an exception, .__exit__() will still be called).
http://www.python.org/dev/peps/pep-0343/
http://docs.python.org/2/reference/datamodel.html?highlight=context%20manager#with-statement-context-managers
The with statement is now the preferred way to handle any task that has a well-defined setup and teardown. Working with a file, for example:
with open(file_name) as f:
    # do something with file
You know the file will be properly closed when you are done.
Another great example is a resource lock:
with acquire_lock(my_lock):
    # do something
You know the code won't run until you get the lock, and as soon as the code is done the lock will be released. I don't often do multithreaded coding in Python, but when I did, this statement made sure that the lock was always released, even in the face of an exception.
P.S. I did a Google search online for examples of context managers and I found this nifty one: a context manager that executes a Python block in a specific directory.
http://ralsina.me/weblog/posts/BB963.html
EDIT:
The runtime context is the environment that is set up by the call to .__enter__() and torn down by the call to .__exit__(). In my example of acquiring a lock, the block of code runs in the context of having a lock available. In the example of reading a file, the block of code runs in the context of the file being open.
There isn't any secret magic inside Python for this. There is no special scoping, no internal stack, and nothing special in the parser. You simply write two method functions, .__enter__() and .__exit__() and Python calls them at specific points for your with statement.
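For instance, here is a hypothetical context manager written as a plain class, just to show that there is nothing more to it than those two method functions:

import time

class Timed:
    """Hypothetical example: times the block it wraps."""
    def __enter__(self):
        self.start = time.time()
        return self  # this value is what 'as var' receives

    def __exit__(self, exc_type, exc_value, traceback):
        print("block took", time.time() - self.start, "seconds")
        return False  # returning False lets exceptions propagate

with Timed() as t:
    total = sum(range(1000000))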
Look again at this section from the PEP:
Remember, PEP 310 proposes roughly this syntax (the "VAR =" part is optional):
with VAR = EXPR:
    BLOCK
which roughly translates into this:
VAR = EXPR
VAR.__enter__()
try:
    BLOCK
finally:
    VAR.__exit__()
In both examples, BLOCK is a block of code that runs in a specific runtime context that is set up by the call to VAR.__enter__() and torn down by VAR.__exit__().
There are two main benefits to the with statement and the way it is all set up.
The more concrete benefit is that it's "syntactic sugar". I would much rather write a two-line with statement than a six-line sequence of statements; it's easier to write the shorter one, it looks nicer and is easier to understand, and it is easier to get right. Six lines versus two means more chances to screw things up. (And before the with statement, I was usually sloppy about wrapping file I/O in a try block; I only did it sometimes. Now I always use with and always get the exception handling.)
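Concretely, compare the old pattern for files with the with version:

# Without the with statement:
f = open("debuglog", "a")
try:
    f.write("Debugging\n")
finally:
    f.close()

# With it:
with open("debuglog", "a") as f:
    f.write("Debugging\n")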
The more abstract benefit is that this gives us a new way to think about designing our programs. Raymond Hettinger, in a talk at PyCon 2013, put it this way: when we are writing programs we look for common parts that we can factor out into functions. If we have code like this:
A
B
C
D
E
F
B
C
D
G
we can easily make a function:
def BCD():
    B
    C
    D

A
BCD()
E
F
BCD()
G
But we have never had a really clean way to do this with setup/teardown. When we have a lot of code like this:
A
BCD()
E
A
XYZ()
E
A
PDQ()
E
Now we can define a context manager and rewrite the above:
with contextA:
    BCD()

with contextA:
    XYZ()

with contextA:
    PDQ()
So now we can think about our programs and look for setup/teardown that can be abstracted into a "context manager". Raymond Hettinger showed several new "context manager" recipes he had invented (and I'm racking my brain trying to remember an example or two for you).
EDIT: Okay, I just remembered one. Raymond Hettinger showed a recipe, that will be built in to Python 3.4, for using a with statement to ignore an exception within a block. See it here: https://stackoverflow.com/a/15566001/166949
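(That recipe became contextlib.suppress in Python 3.4:)

import os
from contextlib import suppress

# Ignore the error if the file is already gone:
with suppress(FileNotFoundError):
    os.remove("somefile.tmp")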
P.S. I've done my best to give the sense of what he was saying... if I have made any mistake or misstated anything, it's on me and not on him. (And he posts on StackOverflow sometimes so he might just see this and correct me if I've messed anything up.)
EDIT: You've updated the question with more text. I'll answer it specifically as well.
is this what Beazley means when he talks about a 'runtime context', that f is scoped only within the block and loses all significance outside the with-block? Why does he say that the statements "execute inside a runtime context"? Is this like an "eval"?
Actually, f is not scoped only within the block. When you bind a name using the as keyword in a with statement, the name remains bound after the block.
The "runtime context" is an informal concept and it means "the state set up by the .__enter__() method function call and torn down by the .__exit__() method function call." Again, I think the best example is the one about getting a lock before the code runs. The block of code runs in the "context" of having the lock.
I understand that open returns an object that is "not ... assigned to var"?? Why isn't it assigned to var? What does Beazley mean by making a statement like that?
Okay, suppose we have an object, let's call it k. k implements a "context manager", which means that it has method functions k.__enter__() and k.__exit__(). Now we do this:
with k as x:
    # do something
What David Beazley wants you to know is that x will not necessarily be bound to k. x will be bound to whatever k.__enter__() returns. k.__enter__() is free to return a reference to k itself, but is also free to return something else. In this case:
with open(some_file) as f:
    # do something
The call to open() returns an open file object, which works as a context manager, and its .__enter__() method function really does just return a reference to itself.
I think most context managers return a reference to self. Since it's an object it can have any number of member variables, so it can return any number of values in a convenient way. But it isn't required.
For example, there could be a context manager that starts a daemon running in the .__enter__() function, and returns the process ID number of the daemon from the .__enter__() function. Then the .__exit__() function would shut down the daemon. Usage:
with start_daemon("parrot") as pid:
    print("Parrot daemon running as PID {}".format(pid))
    daemon = lookup_daemon_by_pid(pid)
    daemon.send_message("test")
But you could just as well return the context manager object itself with any values you need tucked inside:
with start_daemon("parrot") as daemon:
    print("Parrot daemon running as PID {}".format(daemon.pid))
    daemon.send_message("test")
If we need the PID of the daemon, we can just put it in a .pid member of the object. And later if we need something else we can just tuck that in there as well.
The with statement takes care that on entry, the __enter__ method is called and the given var is set to whatever __enter__ returns.
In most cases, that is the object which was worked on previously. In the file case it is, but e.g. for a database, not the connection object but a cursor object is returned.
The file example can be extended like this:
f1 = open("debuglog", "a")
with f1 as f2:
    print f1 is f2
which will print True as here, the file object is returned by __enter__. (From its point of view, self.)
A database works like
d = connect(...)
with d as c:
    print d is c  # False
    print d, c
The with clause is terminated by a call to __exit__(), which is given the state of execution of the clause: either success or failure. In this case, the __exit__() method can act appropriately.
In the file example, the file is closed no matter if there was an error or not.
In the database example, normally the transaction is committed on success and rolled back on failure.
The context manager is for easy initialisation and cleanup of exactly such things: files, databases, etc.
There is no direct correspondence in C or C++ that I am aware of.
C has no concept of exceptions, so none can be caught in an __exit__(). C++ has exceptions, and there seem to be ways to do so (look below at the comments).
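As a sketch, an __exit__() that reacts to success or failure might look like this (the conn object and its commit/rollback/cursor methods are hypothetical stand-ins for a real database API):

class Transaction:
    """Commit on success, roll back on error (illustrative only)."""
    def __init__(self, conn):
        self.conn = conn

    def __enter__(self):
        return self.conn.cursor()  # what 'as c' receives

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is None:
            self.conn.commit()    # block finished without an exception
        else:
            self.conn.rollback()  # an exception is propagating
        return False              # never suppress the exception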

Is this Python producer-consumer lockless approach thread-safe?

I recently wrote a program that used a simple producer/consumer pattern. It initially had a bug related to improper use of threading.Lock that I eventually fixed. But it made me think about whether it's possible to implement the producer/consumer pattern in a lockless manner.
Requirements in my case were simple:
One producer thread.
One consumer thread.
Queue has place for only one item.
Producer can produce next item before the current one is consumed. The current item is therefore lost, but that's OK for me.
Consumer can consume current item before the next one is produced. The current item is therefore consumed twice (or more), but that's OK for me.
So I wrote this:
QUEUE_ITEM = None

# this is executed in one threading.Thread object
def producer():
    global QUEUE_ITEM
    while True:
        i = produce_item()
        QUEUE_ITEM = i

# this is executed in another threading.Thread object
def consumer():
    global QUEUE_ITEM
    while True:
        i = QUEUE_ITEM
        consume_item(i)
My question is: Is this code thread-safe?
Immediate comment: this code isn't really lockless - I use CPython and it has the GIL.
I tested the code a little and it seems to work. It translates to some LOAD and STORE ops which are atomic because of the GIL. But I also know that the del x operation isn't atomic when x implements the __del__ method. So if my item has a __del__ method and some nasty scheduling happens, things may break. Or not?
Another question is: What kind of restrictions (for example on produced items' type) do I have to impose to make the above code work fine?
My questions are only about the theoretical possibility of exploiting CPython's and the GIL's quirks in order to come up with a lockless (i.e. no locks like threading.Lock explicitly in code) solution.
Trickery will bite you. Just use Queue to communicate between threads.
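For reference, a minimal sketch of that approach (the module is queue in Python 3, Queue in Python 2; the produce_item/consume_item stubs stand in for the question's placeholders). Note that a bounded Queue blocks instead of overwriting or re-consuming:

import queue      # named Queue in Python 2
import threading
import time

q = queue.Queue(maxsize=1)  # one slot, as in the question

def produce_item():          # stub for the question's producer work
    time.sleep(0.1)
    return "item"

def consume_item(item):      # stub for the question's consumer work
    print(item)

def producer():
    while True:
        q.put(produce_item())    # blocks while the slot is full

def consumer():
    while True:
        consume_item(q.get())    # blocks until an item is available

threading.Thread(target=producer, daemon=True).start()
threading.Thread(target=consumer, daemon=True).start()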
Yes this will work in the way that you described:
That the producer may produce a skippable element.
That the consumer may consume the same element.
But I also know that the del x operation isn't atomic when x implements the __del__ method. So if my item has a __del__ method and some nasty scheduling happens, things may break.
I don't see a del here. If a del happens in consume_item, then the del may occur in the producer thread. I don't think this would be a "problem".
Don't bother using this though. You will end up using up CPU on pointless polling cycles, and it is not any faster than using a queue with locks since Python already has a global lock.
This is not really thread safe because producer could overwrite QUEUE_ITEM before consumer has consumed it and consumer could consume QUEUE_ITEM twice. As you mentioned, you're OK with that but most people aren't.
Someone with more knowledge of CPython internals will have to answer your more theoretical questions.
I think it's possible that a thread is interrupted while producing/consuming, especially if the items are big objects.
Edit: this is just a wild guess. I'm no expert.
Also the threads may produce/consume any number of items before the other one starts running.
You can use a list as the queue as long as you stick to append/pop since both are atomic.
QUEUE = []

# this is executed in one threading.Thread object
def producer():
    global QUEUE
    while True:
        i = produce_item()
        QUEUE.append(i)

# this is executed in another threading.Thread object
def consumer():
    global QUEUE
    while True:
        try:
            i = QUEUE.pop(0)
        except IndexError:
            # queue is empty
            continue
        consume_item(i)
In a class scope like below, you can even clear the queue.
class Atomic(object):
    def __init__(self):
        self.queue = []

    # this is executed in one threading.Thread object
    def producer(self):
        while True:
            i = produce_item()
            self.queue.append(i)

    # this is executed in another threading.Thread object
    def consumer(self):
        while True:
            try:
                i = self.queue.pop(0)
            except IndexError:
                # queue is empty
                continue
            consume_item(i)

    # There's the possibility the producer is still working on its current item.
    def clear_queue(self):
        self.queue = []
You'll have to find out which list operations are atomic by looking at the bytecode generated.
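For example (the helper function is just for illustration):

import dis

def push(q, item):
    q.append(item)

# The disassembly shows that the mutation happens inside a single
# call into C code (list.append); CPython holds the GIL for the
# whole call, which is what makes append atomic in practice.
dis.dis(push)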
The __del__ could be a problem, as you said. It could be avoided, if only there were a way to prevent the garbage collector from invoking the __del__ method on the old object before we finish assigning the new one to QUEUE_ITEM. We would need something like:
increase the reference counter on the old object
assign a new one to QUEUE_ITEM
decrease the reference counter on the old object
I'm afraid I don't know if it is possible, though.
