I'm sure this is not a very Pythonic situation, and I'm not actually using this in any production code; I'm just considering how (or whether) this could work. It doesn't have to be Python specific, but I'd like a solution that at least WORKS within the Python framework.
Basically, I have a thread-safe singleton object that implements __enter__ and __exit__ (so it can be used with with):
import threading

class Singleton:
    lock = threading.Lock()  # shared, class-level lock

    def __enter__(self):
        self.lock.acquire()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.lock.release()
In my example, one thread gets the singleton, and inside the with statement it enters an infinite loop.
def infinite():
    with Singleton():
        while True:
            pass
The goal of this experiment is to get the infinite thread out of its infinite loop WITHOUT killing the thread, specifically by using the Singleton object. First I was thinking of using an exception raised from a different thread:
class Singleton:
    ...

    def killMe(self):
        # the hope: interrupt whichever thread is inside the with block
        raise RuntimeError
But this obviously doesn't raise the exception in the other thread. What I thought next is that since the enter and exit methods acquire a class variable lock, is there any method that can be called on the Lock that will cause the thread that has acquired it to throw an exception?
Or, what I would probably do in C++ is just delete this, or somehow call the destructor of the object from itself. Is there ANY way to do this in Python? I know that if it's possible it will be a total hack job. But again, this is basically a thought experiment.
In Python, there is a somewhat undocumented way of raising an exception in another thread, though there are some caveats. See these recipes for "killable threads":
http://code.activestate.com/recipes/496960-thread2-killable-threads/
http://sebulba.wikispaces.com/recipe+thread2
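The core of both recipes is a ctypes call into CPython's PyThreadState_SetAsyncExc. A minimal sketch of the idea (CPython-only, and best-effort: the exception is delivered at the next bytecode boundary, so it cannot interrupt a blocking C call such as a lock acquire):

import ctypes

def async_raise(thread, exc_type):
    # Schedule exc_type to be raised in the given threading.Thread.
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_long(thread.ident), ctypes.py_object(exc_type))
    if res == 0:
        raise ValueError("invalid thread id")
    elif res > 1:
        # undo the call if more than one thread state was affected
        ctypes.pythonapi.PyThreadState_SetAsyncExc(
            ctypes.c_long(thread.ident), None)
        raise SystemError("PyThreadState_SetAsyncExc failed")

Because your infinite loop is pure Python bytecode, an exception raised this way would break the thread out of while True: pass.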
Related
I have a relatively simple scenario in some Python code where I have two threads, one of which sets a value and the other of which waits for it to be set. My instinct was to reach for threading.Condition to implement this, but I began to wonder whether I could simply use threading.Event instead.
So, I have something like this:
value = None
readyToRead = threading.Event()

def set():
    # executes in thread 1
    global value
    value = computeValue()
    readyToRead.set()

def get():
    # executes in thread 2
    readyToRead.wait()
    useValue(value)
I suppose I am uneasy because access to value is not actually mutex protected and I think in some languages at least it might not be safe simply to rely on the ordering implied by the statements in the code.
Is this a valid use of Event in Python?
Yes, this is a valid use case for Event. Because value is assigned before readyToRead.set() is called, and readyToRead.wait() doesn't return until after set(), the reading thread is guaranteed to see the new value; for a single write like this, no extra mutex around value is needed. If you increase the number of threads, every reader has to wait on the event; in that case you could use a semaphore instead.
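For reference, a runnable version of the pattern from the question (computeValue and useValue replaced with trivial stand-ins):

import threading

value = None
readyToRead = threading.Event()

def setter():
    global value
    value = 42           # stand-in for computeValue()
    readyToRead.set()    # publish: every wait() returns after this

def getter():
    readyToRead.wait()   # blocks until setter() calls set()
    print(value)         # sees 42; the write happened before set()

t_get = threading.Thread(target=getter)
t_set = threading.Thread(target=setter)
t_get.start()
t_set.start()
t_set.join()
t_get.join()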
Suppose I have a pool with a few processes inside of a class that I use to do some processing, like this:
from multiprocessing import Pool

class MyClass:
    def __init__(self):
        self.pool = Pool(processes=NUM_PROCESSES)
        self.pop = []
        self.finished = []

    def gen_pop(self):
        self.pop = [self.pool.apply_async(Item.test, (Item(),))
                    for _ in range(NUM_PROCESSES)]
        while not self.check():
            continue
        # Do some other stuff

    def check(self):
        self.finished = filter(lambda t: self.pop[t].ready(),
                               range(NUM_PROCESSES))
        new_pop = []
        for f in self.finished:
            new_pop.append(self.pop[f].get(timeout=1))
            self.pop[f] = None
        # Do some other stuff
When I run this code I get a cPickle.PicklingError which states that a <type 'function'> can't be pickled. What this tells me is that one of the apply_async functions has not returned yet so I am attempting to append a running function to another list. But this shouldn't be happening because all running calls should have been filtered out using the ready() function.
On a related note, the actual nature of the Item class is unimportant, but what is important is that at the top of my Item.test function I have a print statement which is supposed to fire for debugging purposes. However, it never fires. This tells me that the function has been initiated but has not actually started execution.
So then, it appears that ready() does not actually tell me whether a call has finished execution. What exactly does ready() do, and how should I edit my code so that I can filter out the processes that are still running?
Multiprocessing uses the pickle module internally to pass data between processes, so everything you submit must be picklable. See the documentation's list of what can be pickled; an object's method is not on that list.
To solve this quickly just use a wrapper function around the method:
def wrap_item_test(item):
    return item.test()  # return the result so .get() can retrieve it
class MyClass:
    def gen_pop(self):
        self.pop = [self.pool.apply_async(wrap_item_test, (Item(),))
                    for _ in range(NUM_PROCESSES)]
        while not self.check():
            continue
To answer the question you asked: .ready() is really telling you whether .get() may block. If .ready() returns True, .get() will not block, but if .ready() returns False, .get() may block (or it may not: it's quite possible the async call will complete before you get around to calling .get()).
So, e.g., the timeout = 1 in your .get() serves no purpose: since you only call .get() if .ready() returned True, you already know for a fact that .get() won't block.
But .get() not blocking does not imply the async call was successful, or even that a worker process started working on the call: as the docs say,
If the remote call raised an exception then that exception will be reraised by get().
That is, e.g., if the async call couldn't be performed at all, .ready() will return True and .get() will (re)raise the exception that prevented the attempt from working.
That appears to be what's happening in your case, although we have to guess because you didn't post runnable code, and didn't include the traceback.
Note that if what you really want to know is whether the async call completed normally, after already getting True back from .ready(), then .successful() is the method to call.
It's pretty clear that, whatever Item.test may be, it's flatly impossible to pass it as a callable to .apply_async(), due to pickle restrictions. That explains why Item.test never prints anything (it's never actually called!), why .ready() returns True (the .apply_async() call failed), and why .get() raises an exception (because .apply_async() encountered an exception while trying to pickle one of its arguments - probably Item.test).
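To see the distinction in isolation, here's a small self-contained sketch (with a hypothetical stand-in worker, not the Item.test from the question):

from multiprocessing import Pool
import time

def work(x):
    time.sleep(0.1)
    return x * 2

if __name__ == '__main__':
    pool = Pool(processes=2)
    res = pool.apply_async(work, (21,))
    print(res.ready())       # most likely False: still running
    res.wait()               # block until the call completes
    print(res.ready())       # True: .get() will not block now
    print(res.successful())  # True: the call did not raise
    print(res.get())         # 42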
I would like to handle a NameError exception by injecting the desired missing variable into the frame and then continuing the execution from the last attempted instruction.
The following pseudo-code should illustrate my needs.
import inspect

def function():
    return missing_var

try:
    print function()
except NameError:
    frame = inspect.trace()[-1][0]
    # inject the missing variable
    frame.f_globals["missing_var"] = ...
    # continue frame execution from the last attempted instruction
    exec frame.f_code from frame.f_lasti  # (not real Python)
Read the whole unittest on repl.it
Notes
As pointed out by ivan_pozdeev in his answer, this is known as resumption.
After more research, I found Veedrac's answer to the question Resuming program at line number in the context before an exception using a custom sys.excepthook posted by lc2817 very interesting. It relies on Richie Hindle's work.
Background
The code runs in a slave process, which is controlled by a parent. Tasks (functions, really) are written in the parent and later passed to the slave using dill. I expect some tasks (running in the slave process) to try to access variables from outer scopes in the parent, and I'd like the slave to request those variables from the parent on the fly.
p.s.: I don't expect this magic to run in a production environment.
On the contrary to what various commenters are saying, "resume-on-error" exception handling is possible in Python. The library fuckit.py implements said strategy. It steamrollers errors by rewriting the source code of your module at import time, inserting try...except blocks around every statement and swallowing all exceptions. So perhaps you could try a similar sort of tactic?
It goes without saying: that library is intended as a joke. Don't ever use it in production code.
You mentioned that your use case is to trap references to missing names. Have you thought about using metaprogramming to run your code in the context of a "smart" namespace such as a defaultdict? (This is perhaps only marginally less of a bad idea than fuckit.py.)
from collections import defaultdict

class NoMissingNamesMeta(type):
    @classmethod
    def __prepare__(meta, name, bases):
        # this defaultdict becomes the namespace of the class body, so
        # looking up an unknown name yields "foo" instead of raising
        return defaultdict(lambda: "foo")

class MyClass(metaclass=NoMissingNamesMeta):
    x = y + "bar"  # y doesn't exist
>>> MyClass.x
'foobar'
NoMissingNamesMeta is a metaclass - a language construct for customising the behaviour of the class statement. Here we're using the __prepare__ method to customise the dictionary which will be used as the class's namespace during creation of the class. Thus, because we're using a defaultdict instead of a regular dictionary, a class whose metaclass is NoMissingNamesMeta will never get a NameError. Any names referred to during the creation of the class will be auto-initialised to "foo".
This approach is similar to @AndréFratelli's idea of manually requesting the lazily-initialised data from a Scope object. In production I'd do that, not this. The metaclass version requires less typing to write the client code, but at the expense of a lot more magic. (Imagine yourself debugging this code in two years, trying to understand why non-existent variables are dynamically being brought into scope!)
The "resumption" exception handling technique has proven to be problematic, that's why it's missing from C++ and later languages.
Your best bet is to use a while loop that doesn't resume where the exception was thrown but rather repeats from a predetermined place:
while True:
    try:
        do_something()
    except NameError as e:
        handle_error()
    else:
        break
You really can't resume a frame after an exception has unwound its stack, so you'd have to deal with the issue beforehand. If your requirement is to generate these variables on the fly (which wouldn't be recommended, but you seem to understand that), then you'd have to actually request them. You could implement a mechanism for that (such as a global custom Scope class instance that overrides __getitem__, or something like the __dir__ function), but not in the way you are asking for.
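A minimal sketch of such a Scope mechanism (the fetch callback is a hypothetical stand-in for whatever IPC would ask the parent process for the value):

class Scope(object):
    # A namespace whose unknown names are requested on demand.
    def __init__(self, fetch):
        self._fetch = fetch    # callback that asks the parent process
        self._values = {}

    def __getitem__(self, name):
        if name not in self._values:
            self._values[name] = self._fetch(name)  # request on the fly
        return self._values[name]

# usage sketch: task code reads scope["missing_var"] explicitly
# instead of relying on a bare name and trapping NameError
scope = Scope(fetch=lambda name: "value from parent")
print(scope["missing_var"])  # -> value from parent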
In the question How do I "cd" in python, the accepted answer recommended wrapping the os.chdir call in a class to make the return to your original dir exception safe. Here was the recommended code:
import os

class Chdir:
    def __init__(self, newPath):
        self.savedPath = os.getcwd()
        os.chdir(newPath)

    def __del__(self):
        os.chdir(self.savedPath)
Could someone elaborate on how this works to make an unsafe call exception safe?
Thread safety and exception safety are not really the same thing at all. Wrapping the os.chdir call in a class like this is an attempt to make it exception safe not thread safe.
Exception safety is something you'll frequently hear C++ developers talk about. It isn't talked about nearly as much in the Python community. From Boost's Exception-Safety in Generic Components document:
Informally, exception-safety in a component means that it exhibits reasonable behavior when an exception is thrown during its execution. For most people, the term “reasonable” includes all the usual expectations for error-handling: that resources should not be leaked, and that the program should remain in a well-defined state so that execution can continue.
So the idea in the code snippet you supplied is to ensure that in the case of an exception, the program will return to a well-defined state. In this case, the process will be returned to the directory it started from, whether os.chdir itself fails or something else causes an exception to be thrown and the Chdir instance to be deleted.
This pattern of using an object that exists merely for cleaning up is a form of "Resource Acquisition Is Initialization", or "RAII". This technique is very popular in C++, but is not so popular in Python for a few reasons:
Python has try...finally, which serves pretty much the same purpose and is the more common idiom in Python.
Destructors (__del__) in Python are unreliable/unpredictable in some implementations, so using them this way is somewhat discouraged. In CPython they happen to be very reliable and predictable as long as cycles aren't involved (i.e., when deletion is handled by reference counting), but in other implementations (Jython, and I believe also IronPython) deletion happens when the garbage collector gets around to it, which could be much later. (Interestingly, this doesn't stop most Python programmers from relying on __del__ to close their opened files.)
Python has garbage collection, so you don't need to be quite as careful about cleanup as you do in C++. (I'm not saying you don't have to be careful at all, just that in the common situations you can rely on the gc to do the right thing for you.)
A more "pythonic" way of writing the above code would be:
saved_path = os.getcwd()
os.chdir(new_path)
try:
    pass  # code that does stuff in new_path goes here
finally:
    os.chdir(saved_path)
The direct answer to the question is: it doesn't; the posted code is horrible.
Something like the following could be reasonable to make it "exception safe" (though it's much better to avoid chdir entirely and use full paths instead):
saved_path = os.getcwd()
try:
    os.chdir(newPath)
    do_work()
finally:
    os.chdir(saved_path)
And this precise behavior can also be written into a context manager.
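For instance, a sketch of such a context manager:

import os
from contextlib import contextmanager

@contextmanager
def chdir(new_path):
    saved_path = os.getcwd()
    try:
        os.chdir(new_path)
        yield
    finally:
        os.chdir(saved_path)

# usage:
# with chdir(new_path):
#     do_work()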
__del__ is called when the instance is about to be destroyed. So when you instantiate this class, the current working directory is saved to an instance attribute and then, well, os.chdir is called. When the instance is destroyed (for whatever reason) the current directory is changed to its old value.
This looks a bit incorrect to me. As far as I know, you must call the parent's __del__ in your overridden __del__, so it should be more like this:
class Chdir(object):
    def __init__(self, new_path):
        self.saved_path = os.getcwd()
        os.chdir(new_path)

    def __del__(self):
        os.chdir(self.saved_path)
        # (note: object itself has no __del__, so this call would
        # actually raise AttributeError as written)
        super(Chdir, self).__del__()
That is, unless I am missing something, of course.
(By the way, can't you do the same using contextmanager?)
This code alone is neither thread-safe nor exception-safe. Actually, I'm not really sure what you mean by exception-safe. The following code comes to mind:
try:
    pass  # something thrilling
except:
    pass
And this is a terrible idea. Exceptions are not something to blindly guard against; well-written code catches exceptions and does something useful with them.
I recently wrote a program that used a simple producer/consumer pattern. It initially had a bug related to improper use of threading.Lock that I eventually fixed. But it made me wonder whether it's possible to implement the producer/consumer pattern in a lockless manner.
Requirements in my case were simple:
One producer thread.
One consumer thread.
Queue has place for only one item.
The producer can produce the next item before the current one is consumed. The current item is therefore lost, but that's OK for me.
The consumer can consume the current item before the next one is produced. The current item is therefore consumed twice (or more), but that's OK for me.
So I wrote this:
QUEUE_ITEM = None

# this is executed in one threading.Thread object
def producer():
    global QUEUE_ITEM
    while True:
        i = produce_item()
        QUEUE_ITEM = i

# this is executed in another threading.Thread object
def consumer():
    global QUEUE_ITEM
    while True:
        i = QUEUE_ITEM
        consume_item(i)
My question is: Is this code thread-safe?
Immediate comment: this code isn't really lockless; I use CPython and it has the GIL.
I tested the code a little and it seems to work. It translates to some LOAD and STORE ops, which are atomic because of the GIL. But I also know that the del x operation isn't atomic when x implements the __del__ method. So if my item has a __del__ method and some nasty scheduling happens, things may break. Or not?
Another question is: What kind of restrictions (for example on produced items' type) do I have to impose to make the above code work fine?
My questions are only about theoretical possibility to exploit CPython's and GIL's quirks in order to come up with lockless (i.e. no locks like threading.Lock explicitly in code) solution.
Trickery will bite you. Just use Queue to communicate between threads.
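For example (a sketch reusing produce_item/consume_item from the question; note that a Queue blocks instead of overwriting or re-consuming, which is usually what you want anyway):

from Queue import Queue  # "from queue import Queue" on Python 3
import threading

q = Queue(maxsize=1)  # a one-slot queue, as in the question

def producer():
    while True:
        q.put(produce_item())   # blocks while the slot is full

def consumer():
    while True:
        consume_item(q.get())   # blocks while the slot is empty

threading.Thread(target=producer).start()
threading.Thread(target=consumer).start()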
Yes, this will work in the way that you described:
That the producer may produce a skippable element.
That the consumer may consume the same element.
But I also know that the del x operation isn't atomic when x implements the __del__ method. So if my item has a __del__ method and some nasty scheduling happens, things may break.
I don't see a del here. If a del happens in consume_item, then the __del__ may occur in the producer thread. I don't think this would be a "problem".
Don't bother using this though. You will end up using up CPU on pointless polling cycles, and it is not any faster than using a queue with locks since Python already has a global lock.
This is not really thread safe because producer could overwrite QUEUE_ITEM before consumer has consumed it and consumer could consume QUEUE_ITEM twice. As you mentioned, you're OK with that but most people aren't.
Someone with more knowledge of CPython internals will have to answer your more theoretical questions.
I think it's possible that a thread is interrupted while producing/consuming, especially if the items are big objects.
Edit: this is just a wild guess. I'm no expert.
Also the threads may produce/consume any number of items before the other one starts running.
You can use a list as the queue as long as you stick to append/pop since both are atomic.
QUEUE = []

# this is executed in one threading.Thread object
def producer():
    global QUEUE
    while True:
        i = produce_item()
        QUEUE.append(i)

# this is executed in another threading.Thread object
def consumer():
    global QUEUE
    while True:
        try:
            i = QUEUE.pop(0)
        except IndexError:
            # queue is empty
            continue
        consume_item(i)
With a class like the one below, you can even clear the queue.
class Atomic(object):
    def __init__(self):
        self.queue = []

    # this is executed in one threading.Thread object
    def producer(self):
        while True:
            i = produce_item()
            self.queue.append(i)

    # this is executed in another threading.Thread object
    def consumer(self):
        while True:
            try:
                i = self.queue.pop(0)
            except IndexError:
                # queue is empty
                continue
            consume_item(i)

    # there's the possibility the producer is still working on its current item
    def clear_queue(self):
        self.queue = []
You'll have to find out which list operations are atomic by looking at the bytecode generated.
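For example, with the dis module (a sketch):

import dis

def append_item(queue, item):
    queue.append(item)

dis.dis(append_item)
# list.append is implemented in C, so the single bytecode instruction
# that calls it cannot be interrupted by a thread switch in CPython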
The __del__ could be a problem, as you said. It could be avoided if only there were a way to prevent the garbage collector from invoking the __del__ method on the old object before we finish assigning the new one to QUEUE_ITEM. We would need something like:
increase the reference counter on the old object
assign a new one to `QUEUE_ITEM`
decrease the reference counter on the old object
I'm afraid I don't know whether that is possible, though.
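Edit: in pure Python, those three steps amount to holding an extra reference across the assignment. A sketch (it only controls where in the producer the __del__ fires, which may or may not be enough):

def publish(new_item):
    global QUEUE_ITEM
    old = QUEUE_ITEM       # extra reference keeps the old item alive
    QUEUE_ITEM = new_item  # rebinding no longer drops the last reference
    del old                # the old item's __del__ (if any) runs here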