Multiprocessing, what does pool.ready do? - python

Suppose I have a pool with a few processes inside of a class that I use to do some processing, like this:
class MyClass:
    def __init__(self):
        self.pool = Pool(processes = NUM_PROCESSES)
        self.pop = []
        self.finished = []
    def gen_pop(self):
        self.pop = [ self.pool.apply_async(Item.test, (Item(),)) for _ in range(NUM_PROCESSES) ]
        while (not self.check()):
            continue
        # Do some other stuff
    def check(self):
        self.finished = filter(lambda t: self.pop[t].ready(), range(NUM_PROCESSES))
        new_pop = []
        for f in self.finished:
            new_pop.append(self.pop[f].get(timeout = 1))
            self.pop[f] = None
        # Do some other stuff
When I run this code I get a cPickle.PicklingError which states that a <type 'function'> can't be pickled. What this tells me is that one of the apply_async functions has not returned yet so I am attempting to append a running function to another list. But this shouldn't be happening because all running calls should have been filtered out using the ready() function.
On a related note, the actual nature of the Item class is unimportant, but what is important is that at the top of my Item.test function I have a print statement which is supposed to fire for debugging purposes. However, that does not occur. This tells me that the function has been submitted but has not actually started execution.
So it appears that ready() does not actually tell me whether a call has finished execution. What exactly does ready() do, and how should I edit my code so that I can filter out the processes that are still running?

Multiprocessing uses the pickle module internally to pass data between processes, so your data must be picklable. See the list of what is considered picklable: an object method is not in that list.
To solve this quickly just use a wrapper function around the method:
def wrap_item_test(item):
    # Return the result so that .get() on the AsyncResult can deliver it
    return item.test()

class MyClass:
    def gen_pop(self):
        self.pop = [ self.pool.apply_async(wrap_item_test, (Item(),)) for _ in range(NUM_PROCESSES) ]
        while (not self.check()):
            continue
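A quick way to find out whether something can cross the process boundary is to try pickling it directly. The following is only a sketch: is_picklable is a hypothetical helper, not part of multiprocessing, and the exact exception type raised for an unpicklable object varies between Python versions, so several are caught.

```python
import math
import pickle

def is_picklable(obj):
    """Return True if pickle.dumps() accepts obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # Which of these is raised depends on the object and the Python version
        return False
    return True

# A function that can be looked up by name in an importable module is fine
importable_ok = is_picklable(math.sqrt)
# A lambda has no importable name, so it cannot be pickled by reference
lambda_ok = is_picklable(lambda x: x)
```

Running the same check on the callable and on each argument before handing them to apply_async() can show which one is the culprit.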

To answer the question you asked, .ready() is really telling you whether .get() may block: if .ready() returns True, .get() will not block, but if .ready() returns False, .get() may block (or it may not: quite possible the async call will complete before you get around to calling .get()).
So, e.g., the timeout = 1 in your .get() serves no purpose: since you only call .get() if .ready() returned True, you already know for a fact that .get() won't block.
But .get() not blocking does not imply the async call was successful, or even that a worker process started working on it: as the docs say,
If the remote call raised an exception then that exception will be reraised by get().
That is, e.g., if the async call couldn't be performed at all, .ready() will return True and .get() will (re)raise the exception that prevented the attempt from working.
That appears to be what's happening in your case, although we have to guess because you didn't post runnable code, and didn't include the traceback.
Note that if what you really want to know is whether the async call completed normally, after already getting True back from .ready(), then .successful() is the method to call.
It's pretty clear that, whatever Item.test may be, it's flatly impossible to pass it as a callable to .apply_async(), due to pickle restrictions. That explains why Item.test never prints anything (it's never actually called!), why .ready() returns True (the .apply_async() call failed), and why .get() raises an exception (because .apply_async() encountered an exception while trying to pickle one of its arguments - probably Item.test).
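To make the difference between ready() and successful() concrete, here is a minimal sketch. It assumes Python 3 and uses multiprocessing.pool.ThreadPool instead of a process pool only to stay self-contained (no pickling involved); the AsyncResult objects it returns offer the same ready(), successful() and get() methods. The functions ok, boom and collect are hypothetical names made up for the example.

```python
from multiprocessing.pool import ThreadPool

def ok(x):
    return x * 2

def boom(x):
    raise ValueError("failed on %r" % (x,))

def collect(results):
    """Wait for every AsyncResult, then split outcomes into values and errors."""
    values, errors = [], []
    for r in results:
        r.wait()              # after this, r.ready() is True
        if r.successful():    # only meaningful once the call is ready
            values.append(r.get())
        else:
            try:
                r.get()       # re-raises the exception from the call
            except Exception as exc:
                errors.append(exc)
    return values, errors

pool = ThreadPool(2)
results = [pool.apply_async(ok, (21,)), pool.apply_async(boom, (1,))]
values, errors = collect(results)
pool.close()
pool.join()
```

Both results end up ready, but only the first is successful; the second one hands back its exception through get().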

Related

python mock.mock_calls does not follow calls made inside @timeout decorated function

I'm unit-testing a function with mock, and trying to test if a call to mock object was made. In the code below the requests.post object is mocked, and I track the requests.post.mock_calls list.
In the following code the
Code construct:
import logging
import requests
import timeout_decorator

logger = logging.getLogger(__name__)

def block_funds(trader_id, amounts):
    @timeout_decorator.timeout(3, use_signals=False)
    def _block_funds(block_url, amounts):
        # requests.post.mock_calls empty here
        result = requests.post(url=block_url, data=amounts)
        # requests.post.mock_calls here has correct call recorded
        return result.status_code
    block_url = 'http:/someurl/somepath/{trader_id}'.format(trader_id=trader_id)
    try:
        # requests.post.mock_calls empty here
        code = _block_funds(block_url, amounts)
        # requests.post.mock_calls empty again here
    except timeout_decorator.TimeoutError as ex:
        logger.error('request failed')
        code = 500
    return code
After the call to code = _block_funds(block_url, amounts) I expect the mock object to keep a record of all calls to it, but the mock_calls list is empty as soon as execution leaves the internal timeout-wrapped function _block_funds(). The mock object is certainly the same: I'm following the mock IDs to ensure the object has not changed.
What am I doing wrong, and how can I make the mock not forget its calls?
I've found the issue: it's in the timeout decorator, specifically the use_signals=False part. As per the timeout decorator documentation, to use timeouts correctly in my scenario (a multithreaded web application) you must not use signals and rely on multiprocessing instead. In that mode the decorated function runs in a separate process, so the call is presumably recorded on that process's copy of the mock and never reaches the mock in the test process. If I remove use_signals=False or remove the decorator completely, it works fine.
My solution for now will be to also mock the decorator itself and avoid the issue.
Correction
Directly mocking the decorator turned out to be impractical. Instead I've wrapped the decorated call in a function and mocked that wrapper:
def timeout_post(**kwargs):
    @timeout_decorator.timeout(3, use_signals=False)
    def _post(**kwargs):
        return requests.post(**kwargs)
    return _post(**kwargs)

How not to return to a calling function?

In Python, is there a way to not return to the calling function if a certain event happens in the called function? For example:
def acquire_image(sdkobject):
    ret = sdkobject.PrepareAcquisition()
    error_check(ret)
    ret = sdkobject.StartAcquisition()
    error_check(ret)
error_check is a function that checks the return code to see if the SDK call had an error. If there is an error, I would like to not go back to acquire_image but go to another function to reinitialise the SDK and start from the beginning again. Is there a pythonic way of doing this?
Have your error_check function raise an exception (like SDKError) if there is a problem, then run all the commands in a while loop.
class SDKError(Exception):
    pass

# Perhaps define a separate exception for each possible
# error code, and make a dict that maps error codes to the
# appropriate exception.
class SDKType1Error(SDKError):
    pass

class SDKType5Error(SDKError):
    pass

sdk_errors = {
    1: SDKType1Error,
    5: SDKType5Error,
}

# Either return, if there was no error, or raise
# the appropriate exception
def error_check(return_code):
    if return_code == 0:
        return  # No error
    else:
        # Fall back to the generic SDKError for unmapped codes
        raise sdk_errors.get(return_code, SDKError)

# Example of how to deal with specific SDKErrors subclasses, or a generic
# catch-all SDKError
def acquire_image(sdkobject):
    while True:
        try:
            # initialize sdk here
            error_check(sdkobject.PrepareAcquisition())
            error_check(sdkobject.StartAcquisition())
        except SDKType1Error:
            # Special handling for this error
            pass
        except SDKError:
            pass
        else:
            break
Return the error and use an if condition in the calling function to check whether the returned value indicates an error; if it does, call the reinitialization code from there.
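A minimal sketch of that return-code approach follows. FakeSDK, its Reinitialise method, acquire_with_retry and max_attempts are all hypothetical names invented for the illustration; only PrepareAcquisition and StartAcquisition come from the question.

```python
class FakeSDK:
    """Hypothetical SDK stand-in: fails until it has been reinitialised once."""
    def __init__(self):
        self.initialised = False
        self.reinit_count = 0

    def Reinitialise(self):
        self.initialised = True
        self.reinit_count += 1

    def PrepareAcquisition(self):
        return 0 if self.initialised else 1   # 0 means no error

    def StartAcquisition(self):
        return 0

def acquire_image(sdkobject):
    """Return 0 on success, or the first non-zero SDK return code."""
    for step in (sdkobject.PrepareAcquisition, sdkobject.StartAcquisition):
        ret = step()
        if ret != 0:
            return ret
    return 0

def acquire_with_retry(sdkobject, max_attempts=3):
    """The caller decides what to do with an error: reinitialise and retry."""
    for _ in range(max_attempts):
        if acquire_image(sdkobject) == 0:
            return True
        sdkobject.Reinitialise()
    return False

sdk = FakeSDK()
acquired = acquire_with_retry(sdk)
```

The recovery logic stays in the caller, at the cost of checking a return value after every step.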
Use return for the happy scenario
Returning to the calling function is done by a simple return or return response.
Use it for the typical run of your code, when all goes well.
Throw an exception when something goes wrong
If something goes wrong, call raise Exception(). In many situations your code does not have to do it explicitly; it throws the exception on its own.
You may even use your own Exception subclasses to pass the caller more information about what went wrong.
It took me a while to learn this approach, and it made my coding much simpler and shorter.
Do not care about what your calling code will do with it
Let your function do the task, or fail if there are problems.
Trying to take over the client's responsibility in your function will mess up your code and will not be a complete solution anyway.
Things to avoid
Ignore who is calling you
In OOP this is the principle of client anonymity. Just serve the request and do not care who is calling.
Do not attempt to use exceptions as a replacement for returning a value
Sometimes people use the fact that an exception can pass some information to the caller. But this is rather an antipattern (there are always exceptions).
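As an illustration of passing more information to the caller through your own exception, here is a small sketch. The return_code attribute and the message format are assumptions made for the example, not something the answers above prescribe.

```python
class SDKError(Exception):
    """Exception that carries the failing return code along with a message."""
    def __init__(self, return_code, message="SDK call failed"):
        super().__init__("%s (code %d)" % (message, return_code))
        self.return_code = return_code

def error_check(return_code):
    # Happy scenario: simply return; otherwise fail loudly
    if return_code != 0:
        raise SDKError(return_code)

try:
    error_check(5)
except SDKError as exc:
    failed_code = exc.return_code   # details travel with the exception
    description = str(exc)
```

The function that raises does not need to know who catches the exception or what they do with the code.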

Bad form to return None in __init__ in python

I was tinkering around with some classes and I came upon a situation where I wanted to cut off __init__ before it got a chance to do anything more. To do so, I simply put a null return statement at the end of the block I wanted to cut off at. For example:
class Example(object):
    def __init__(self):
        #lots and lots of code
        if so_and_so:
            return
Is it bad form to return inside __init__, even if it's just under this circumstance?
For any function, not only __init__, using a plain return is equivalent to returning None, and if you don't use return in a function, None is implicitly returned anyway.
Therefore, it is perfectly fine to use return inside __init__.
(The exception to the rule above is generator functions in Python 2, inside which you may only use return and not return None, so the two are not equivalent there. Since Python 3.3 a generator may return a value.)
Returning in the middle of __init__ will simply cut off the object's initialization. It will not prevent the object from being created, nor interrupt the program's flow in any way.
How long is a piece of string? If you don't need to initialize any further, then there's no point in continuing. Since None is returned implicitly at the end anyway (and in fact should be the only thing ever returned by __init__()), returning it early if initialization has been completed is fine.
OTOH, if you need to abort initialization then you should raise an exception instead.
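The three cases discussed here can be put side by side in a short sketch. The class names and the config and size parameters are made up for the illustration.

```python
class Example(object):
    def __init__(self, config=None):
        self.options = {}
        if config is None:
            return                    # early exit: the object is still created
        self.options.update(config)

class Strict(object):
    def __init__(self, size):
        if size < 0:
            # Aborting initialization: no instance is handed to the caller
            raise ValueError("size must be non-negative")
        self.size = size

class Broken(object):
    def __init__(self):
        return 42                     # anything but None fails at instantiation

empty = Example()
filled = Example({"debug": True})
```

Instantiating Broken raises a TypeError, which is why None is the only value __init__ should ever return.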

How to understand Tornado gen.coroutine

I'm a newbie to Tornado, but not to Python.
I'm trying to write an async client for CouchDB (using the couchdb package).
After a day of research and Googling, I found that every post simply uses AsyncHTTPClient as an example without explaining why and how gen.coroutine works.
The source code is too complicated for me to understand because it is decorator after decorator, and AsyncHTTPClient is a bad example for me...
Frankly, for me it was only after reading the source that I (partially) understood the logic.
It sort of works like this if you have decorated somefunction() with tornado.gen.coroutine:
1. In the statement yield somefunction(), somefunction() is actually called.
2. Since it's a wrapper, it is not your code that is executed first but tornado.gen.coroutine. It runs your code until the first yield in somefunction().
3. If a Future (a placeholder) is yielded (back to the decorator code!), Tornado says: "OK, when this future resolves, schedule a task (an IOLoop callback) on the IOLoop so that it calls some another_callback()".
4. To track how far the execution of your somefunction() has gone, Tornado maintains a special object called Runner. It remembers the last yield statement which blocked the execution of your somefunction(), and it is first run when the gen decorator is executed.
5. After point 3, where the Future was "registered", this Runner returns from its main run() method and the decorator exits, returning a Future of its own.
6. When the Future from point 3 is ready, it adds a task to the IOLoop, which then calls another_callback. The latter is a special callback created by Tornado; in short, it run()s the same Runner that was running when the newly resolved Future was yielded in point 3.
7. The Runner uses the .send() method to inject the value of the newly resolved Future back into your somefunction(), which causes it to be assigned to a variable (if any) in your function in a statement like this:
    a = yield some_other_coroutine_for_example_async_client_fetch()
OK, that's the gist. There are lots of details, some of which I cannot wrap my head around, especially exception handling, but HTH.
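The steps above can be imitated with plain generators. The sketch below is a toy model, not Tornado's implementation: it assumes Python 3, uses concurrent.futures.Future as the placeholder object, a deque as a stand-in for the IOLoop's callback queue, and a nested step() function in the role of the Runner. Exception handling is left out entirely.

```python
from collections import deque
from concurrent.futures import Future

ready_callbacks = deque()   # toy stand-in for the IOLoop callback queue

def coroutine(func):
    """Toy decorator: drive a generator, resuming it when yielded futures resolve."""
    def wrapper(*args, **kwargs):
        outer = Future()                # the future handed back to the caller
        gen = func(*args, **kwargs)

        def step(value=None):
            try:
                yielded = gen.send(value)       # run until the next yield
            except StopIteration as stop:
                outer.set_result(stop.value)    # the generator has finished
                return
            # When the yielded future resolves, queue a callback that resumes us
            yielded.add_done_callback(
                lambda f: ready_callbacks.append(lambda: step(f.result())))

        step()
        return outer
    return wrapper

pending = Future()              # plays the role of an unfinished fetch

@coroutine
def fetch_length():
    body = yield pending        # suspended here until pending resolves
    return len(body)

result = fetch_length()         # runs up to the first yield, then returns
done_before = result.done()
pending.set_result("hello")     # the "I/O" completes
while ready_callbacks:
    ready_callbacks.popleft()() # the loop runs the scheduled resume
```

After the loop has drained its queue, result holds the value returned by fetch_length().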
The first answer covers a lot. I just want to offer a simpler abstraction.
@gen.coroutine
def caller():
    res = yield foo()
The idea of a coroutine is to execute foo() asynchronously, especially when foo() does a lot of I/O or network work.
The yield starts the execution of foo() and transfers control out, say, to the caller. It leaves a Runner to register this foo() task as a Future object. Then, when foo() successfully returns a result, the magic happens: the Runner sends the result back as the value of the yield expression (note the difference between the value a yield hands outward and the value the yield expression evaluates to).
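The difference between the value a yield hands outward and the value the yield expression evaluates to can be seen with a plain generator, no Tornado required. The names worker, first and second are just for illustration.

```python
def worker():
    # "yielded to caller" travels outward; whatever is sent in comes back here
    received = yield "yielded to caller"
    yield "got %s" % received

g = worker()
first = next(g)             # the value the generator yields outward
second = g.send("result")   # "result" becomes the value of the yield expression
```

In Tornado's case it is the Runner that calls send() with the resolved Future's result.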

Forcing an interrupt between threads through a singleton object (academic)

I'm sure this is not a very pythonic situation. But I'm not actually using this in any production code, I'm just considering how (if?) this could work. It doesn't have to be python specific, but I'd like a solution that at least WORKS within python framework.
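Basically, I have a thread safe singleton object that implements __enter__ and __exit__ (so it can be used with with).
import threading

class Singleton(object):
    l = threading.Lock()
    def __enter__(self):
        self.l.acquire()
        return self
    def __exit__(self, exc_type, exc_value, traceback):
        self.l.release()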
In my example, one thread gets the singleton, and inside the with statement it enters an infinite loop.
def infinite():
    with Singleton():
        while True:
            pass
The goal of this experiment is to get the infinite thread out of its infinite loop WITHOUT killing the thread. Specifically using the Singleton object. First I was thinking of using an exception called from a different thread:
class Singleton(object):
    ....
    def killMe(self):
        raise exception
But this obviously doesn't raise the exception in the other thread. What I thought next is that since the enter and exit methods acquire a class variable lock, is there any method that can be called on the Lock that will cause the thread that has acquired it to throw an exception?
Or, what I would probably do in C++ is just delete this or somehow call the destructor of the object from itself. Is there ANY way to do this in python? I know that if it's possible it will be a total hack job. But again, this is basically a thought experiment.
In Python, there is a somewhat undocumented way of raising an exception in another thread, though there are some caveats. See this recipe for "killable threads":
http://code.activestate.com/recipes/496960-thread2-killable-threads/
http://sebulba.wikispaces.com/recipe+thread2
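Recipes of this kind appear to build on the CPython C API function PyThreadState_SetAsyncExc, reachable through ctypes. The sketch below is a CPython-only illustration under that assumption; StopLoop, async_raise and the module-level lock are hypothetical names, and the loop sleeps briefly instead of spinning so the example stays light. The exception is only delivered once the target thread is executing Python bytecode again, so a thread blocked in a long C call will not notice it until that call returns.

```python
import ctypes
import threading
import time

class StopLoop(Exception):
    """Hypothetical exception used to break the worker out of its loop."""

def async_raise(thread, exc_type):
    """Ask CPython to raise exc_type inside the given thread."""
    affected = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_ulong(thread.ident), ctypes.py_object(exc_type))
    if affected == 0:
        raise ValueError("no thread with that id")

lock = threading.Lock()
started = threading.Event()
outcome = []

def infinite():
    try:
        with lock:                  # released by the with statement on the way out
            started.set()
            while True:
                time.sleep(0.01)
    except StopLoop:
        outcome.append("interrupted")

worker = threading.Thread(target=infinite)
worker.daemon = True                # do not keep the interpreter alive on failure
worker.start()
started.wait()
async_raise(worker, StopLoop)       # raised from the main thread
worker.join(timeout=5)
```

The thread is not killed: it leaves the loop through normal exception handling, and the with statement releases the lock.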
