I want to use pipes to talk to the process instances in my pool, but I'm getting an error:
Let self.__p be an instance of Pool():
(master_pipe, worker_pipe) = Pipe()

self.__p.apply_async(_worker_task,
                     (handler_info,
                      context_info,
                      worker_pipe))
When I execute this, I get the following error [for every instance, obviously]:
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/queues.py", line 376, in get
task = get()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/queues.py", line 376, in get
TypeError: Required argument 'handle' (pos 1) not found
self.run()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 114, in run
return recv()
return recv()
self._target(*self._args, **self._kwargs)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/pool.py", line 102, in worker
TypeError: Required argument 'handle' (pos 1) not found
TypeError: Required argument 'handle' (pos 1) not found
task = get()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/queues.py", line 376, in get
return recv()
TypeError: Required argument 'handle' (pos 1) not found
The error is specifically referring to the Connection instance that I'm trying to pass. If I make it "None", the workers fork without error.
I don't understand this since, as the documentation emphasizes through example, I can easily pass the same argument to a Process() and have it work perfectly:
from multiprocessing import Pipe, Process

def call_me(p):
    print("Here: %s" % (p))

(master, worker) = Pipe()
p = Process(target=call_me, args=(worker,))
p.start()
# prints: Here: <read-write Connection, handle 6>
p.join()
It looks like this bug (http://bugs.python.org/issue4892), which is noted in this discussion: Python 2.6 send connection object over Queue / Pipe / etc
The pool forks child processes initially, with pipes for communicating tasks/results to/from the child processes. It's in communicating your Pipe object over the existing pipe that it blows up, not in the forking. (The failure happens when the child process tries a get() on the queue abstraction.)
It looks like the problem arises because of how the Pipe object is pickled/unpickled for communication.
In the second case that you noted, the pipe is passed to a process instance and then forked - thus the difference in behavior.
I can't imagine that actively communicating with pool processes outside of pure task distribution was an intended use case for multiprocessing pool though. State/protocol-wise, that would imply that you would want more control over the process. That would require more context than what the general Pool object could ever know.
This is possible to solve by using the initializer and initargs arguments when you create the pool and its processes. Admittedly there has to be a global variable involved as well. However if you put the worker code in a separate module, it doesn't look all that bad. And it is only global to that process. :-)
A typical case is that you want your worker processes to add items to a multiprocessing queue. Since a queue is tied to a specific place in memory, pickling will not work. Even if it did, it would only copy information about the fact that some process has a queue, which is the opposite of what we want here. We want to share the same queue.
So here is a meta code example:
The module containing the worker code; we call it "worker_module":
def worker_init(_the_queue):
    global the_queue
    the_queue = _the_queue

def do_work(_a_string):
    # Add something to the queue
    the_queue.put("the string " + _a_string)
And the creation of the pool, followed by having it do something:
# Import our functions
from worker_module import worker_init, do_work

# Good idea: call it MPQueue to not confuse it with the other Queue
from multiprocessing import Queue as MPQueue
from multiprocessing import Pool

the_queue = MPQueue()

# Initialize the workers; it is only during initialization that we can pass the_queue
the_pool = Pool(processes=3, initializer=worker_init, initargs=[the_queue])

# Do the work
the_pool.apply(do_work, ["my string"])

# The string is now on the queue
my_string = the_queue.get(True)
This is a bug which has been fixed in Python 3. The easiest solution is to pass the queue through the Pool's initializer, as suggested in the other answer.
Using Python's multiprocessing on Windows will require many arguments to be "picklable" while passing them to child processes.
import multiprocessing

class Foobar:
    def __getstate__(self):
        print("I'm being pickled!")

def worker(foobar):
    print(foobar)

if __name__ == "__main__":
    # Uncomment this on Linux
    # multiprocessing.set_start_method("spawn")
    foobar = Foobar()
    process = multiprocessing.Process(target=worker, args=(foobar,))
    process.start()
    process.join()
The documentation mentions this explicitly several times:
Picklability
Ensure that the arguments to the methods of proxies are picklable.
[...]
Better to inherit than pickle/unpickle
When using the spawn or forkserver start methods many types from multiprocessing need to be picklable so that child processes can use them. However, one should generally avoid sending shared objects to other processes using pipes or queues. Instead you should arrange the program so that a process which needs access to a shared resource created elsewhere can inherit it from an ancestor process.
[...]
More picklability
Ensure that all arguments to Process.__init__() are picklable. Also, if you subclass Process then make sure that instances will be picklable when the Process.start method is called.
However, I noticed two main differences between "multiprocessing pickle" and the standard pickle module, and I have trouble making sense of all of this.
A multiprocessing.Queue() is not "picklable" yet is passable to child processes
import pickle
from multiprocessing import Queue, Process

def worker(queue):
    pass

if __name__ == "__main__":
    queue = Queue()

    # RuntimeError: Queue objects should only be shared between processes through inheritance
    pickle.dumps(queue)

    # Works fine
    process = Process(target=worker, args=(queue,))
    process.start()
    process.join()
Objects defined in "__main__" are picklable yet not passable to child processes
import pickle
from multiprocessing import Process

def worker(foo):
    pass

if __name__ == "__main__":
    class Foo:
        pass

    foo = Foo()

    # Works fine
    pickle.dumps(foo)

    # AttributeError: Can't get attribute 'Foo' on <module '__mp_main__' from 'C:\\Users\\Delgan\\test.py'>
    process = Process(target=worker, args=(foo,))
    process.start()
    process.join()
If multiprocessing does not use pickle internally, then what are the inherent differences between these two ways of serializing objects?
Also, what does "inherit" mean in the context of multiprocessing? How am I supposed to prefer it over pickle?
When a multiprocessing.Queue is passed to a child process, what is actually sent is a file descriptor (or handle) obtained from pipe, which must have been created by the parent before creating the child. The error from pickle is to prevent attempts to send a Queue over another Queue (or similar channel), since it’s too late to use it then. (Unix systems do actually support sending a pipe over certain kinds of socket, but multiprocessing doesn’t use such features.) It’s expected to be “obvious” that certain multiprocessing types can be sent to child processes that would otherwise be useless, so no mention is made of the apparent contradiction.
Since the “spawn” start method can’t create the new process with any Python objects already created, it has to re-import the main script to obtain relevant function/class definitions. It doesn’t set __name__ like the original run for obvious reasons, so anything that is dependent on that setting will not be available. (Here, it is unpickling that failed, which is why your manual pickling works.)
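For illustration, a minimal sketch (not from the original post): moving the class definition out of the __main__ guard lets the re-imported __mp_main__ module find it, so the same example works under spawn as well.
import pickle
from multiprocessing import Process

class Foo:          # defined at module level, so __mp_main__ can import it
    pass

def worker(foo):
    print(foo)

if __name__ == "__main__":
    foo = Foo()
    pickle.dumps(foo)                              # still works
    process = Process(target=worker, args=(foo,))  # now also works under spawn
    process.start()
    process.join()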
The fork methods start the children with the parent’s objects (at the time of the fork only) still existing; this is what is meant by inheritance.
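As a hedged sketch of that inheritance (assuming a Unix system where the default start method is fork): the child never receives the queue as an argument, it simply uses the object that already existed in the parent at fork time.
from multiprocessing import Process, Queue

results = Queue()   # created by the parent before the child exists

def worker():
    # Not passed in: the forked child inherited `results` (and its underlying pipe).
    results.put("hello from the child")

if __name__ == "__main__":
    p = Process(target=worker)
    p.start()
    print(results.get())   # -> "hello from the child"
    p.join()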
Hi, I have a problem with multiprocessing in Python 3.7.
I've made a listener that should wait for a response from a server without blocking the rest of the program (asynchronous communication):
self = cl.appendSend('bar', base_list)
print("client creates a new concurrent listener for the server's response")
multiprocessing.set_start_method("spawn")
queue = multiprocessing.Queue()
process = multiprocessing.Process(target = cl.appendResponse(), args=(self))
process.start()
print("listener active")
thread = threading.Thread(target= waitingPrinter(), args=(process, queue))
print(thread)
This is where everything is started, but the line process = multiprocessing.Process(target = cl.appendResponse(), args=(self)) runs once, completes, and then simply runs again after it is done. The debugger never leaves this line.
The method run in the process is:
def appendResponse(self):
    print("concurrent listener active")
    msgrcv = self.chan.receive_from(self.server)
    print("concurrent listener received a response")
    return msgrcv  # pass it to the caller
Sadly, because of copyright I can't really post more, but the method runs through fine the first time and fails the second time with the message:
Traceback (most recent call last):
  File "D:/Verteile Systeme 2/neues Lab/git/vs2lab/lab2/rpc/runcl.py", line 27, in <module>
    process = multiprocessing.Process(target = cl.appendResponse(), args=(self))
  File "C:\Program Files (x86)\Python37-32\lib\multiprocessing\process.py", line 82, in __init__
    self._args = tuple(args)
TypeError: 'Client' object is not iterable
So I am wondering: why does the process with cl.appendResponse() even start when it is bound to the Process, rather than waiting for process.start()? And, if that isn't already answered by the above, why does it then run a second time? And of course, how can I fix that?
Also, is there a way to replace the process with a thread and still get a return value?
I am having a lot of trouble with processes and return values.
target = cl.appendResponse() will run the function and return the result to target.
The correct syntax would be target=cl.appendResponse which will tell Process to run cl.appendResponse on start().
The cause of the apparent immediate execution of the process has been correctly stated by philipp in their answer.
The target argument to Process takes a callable object, that is to be invoked by the run() method. Your code passes whatever is returned by self.chan.receive_from(self.server).
There is no subprocess running in or from the line process = multiprocessing.Process(target = cl.appendResponse(), args=(self)). Your method runs in the main process and blocks it.
On a side note: you will have the exact same issue with your thread, for the same reason: thread = threading.Thread(target= waitingPrinter(), args=(process, queue))
After your method has finished executing in the main process, the initialization of your process object raises the TypeError inside the __init__ method of the BaseProcess class.
You pass an argument, self, to your process, but you do it incorrectly. The args argument requires a tuple of arguments. Creating a tuple through a literal needs a trailing comma if only a single value is specified: args=(self,). Your code effectively passes self, i.e. a Client object, directly, which is not iterable and thus causes the error.
In your case, appendResponse appears to be a bound method of the Client object. It will receive the self argument through the inner workings of Python's class system. Passing it explicitly through the process will raise another TypeError for passing two positional arguments to a method that only takes one. Unless appendSend returns something other than the Client instance cl that you call it on, drop the args parameter in the process instantiation.
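Putting both points together, a minimal corrected sketch (assuming appendResponse needs no arguments beyond self, and waitingPrinter takes the process and queue as in the question) might look like this:
# Pass the callables themselves, without calling them, and drop the bogus args.
process = multiprocessing.Process(target=cl.appendResponse)
process.start()

thread = threading.Thread(target=waitingPrinter, args=(process, queue))
thread.start()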
On another side note: the start method spawn is the only one available on Windows and thus the default. Unless your code needs to run under Unix using that start method, this line is redundant: multiprocessing.set_start_method("spawn")
I am running a piece of code using a multiprocessing pool. The code works on a data set and fails on another one. Clearly the issue is data driven - Having said that I am not clear where to begin troubleshooting as the error I receive is the following. Any hints for a starting point would be most helpful. Both sets of data are prepared using the same code - so I don't expect there to be a difference - yet here I am.
Also see the comment from Robert: we differ on OS and Python version (I have 3.4, he has 3.6) and use quite different data sets, yet the error is identical down to the lines in the Python code.
My suspicions:
1. There is a memory limit per core that is being enforced.
2. There is some period of time after which the pool checks on the process, finds it is not finished, and gives up.
Exception in thread Thread-9:
Traceback (most recent call last):
  File "C:\Program Files\Python\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\threading.py", line 911, in _bootstrap_inner
    self.run()
  File "C:\Program Files\Python\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\threading.py", line 859, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Program Files\Python\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\multiprocessing\pool.py", line 429, in _handle_results
    task = get()
  File "C:\Program Files\Python\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\multiprocessing\connection.py", line 251, in recv
    return ForkingPickler.loads(buf.getbuffer())
TypeError: __init__() missing 1 required positional argument: 'message'
I think the issue is that langdetect quietly declares a hidden global detector factory here https://github.com/Mimino666/langdetect/blob/master/langdetect/detector_factory.py#L120:
def init_factory():
    global _factory
    if _factory is None:
        _factory = DetectorFactory()
        _factory.load_profile(PROFILES_DIRECTORY)

def detect(text):
    init_factory()
    detector = _factory.create()
    detector.append(text)
    return detector.detect()

def detect_langs(text):
    init_factory()
    detector = _factory.create()
    detector.append(text)
    return detector.get_probabilities()
In my experience, this kind of thing can cause issues with multiprocessing by running afoul of the way multiprocessing shares resources in memory across processes and manages namespaces in the workers and the master process, though the exact mechanism in this case is a black box to me. I fixed it by adding a call to the init_factory function to my pool initialization function:
import signal

import requests
from requests.adapters import HTTPAdapter
from langdetect.detector_factory import init_factory

def worker_init_corpus(stops_in):
    global sess
    global stops
    sess = requests.Session()
    sess.mount("http://", HTTPAdapter(max_retries=10))
    stops = stops_in
    signal.signal(signal.SIGINT, signal.SIG_IGN)
    init_factory()
FYI: The "sess" logic is to provide each worker with an http connection pool for requests, for similar issues when using that module with multiprocessing pools. If you don't do this, the workers do all their http communication back up through the parent process because that's where the hidden global http connection pool is by default, and then everything is painfully slow. This is one of the issues I've run into that made me suspect a similar cause here.
Also, to further reduce potential confusion: stops is for providing the stopword list I'm using to the mapped function. And the signal call is to force pools to exit nicely when hit with a user interrupt (ctrl-c). Otherwise they often get orphaned and just keep on chugging along after the parent process dies.
Then my pool is initialized like this:
self.pool = mp.Pool(mp.cpu_count()-2, worker_init_corpus, (self.stops,))
I also wrapped my call to detect in a try/except block for LangDetectException:
try:
    posting_out["lang"] = detect(posting_out["job_description"])
except LangDetectException:
    posting_out["lang"] = "none"
But this doesn't fix it on its own. I'm pretty confident that the initialization is the fix.
Thanks to Robert - focusing on langdetect revealed that possibly one of my text entries was empty:
LangDetectException: No features in text
A rookie mistake, possibly due to encoding errors. I'm re-running after filtering those out and will keep you (Robert) posted.
I was throwing a custom exception somewhere in the code, and it was being thrown in most of my processes (in the pool). About 90% of my processes went to sleep because this exception occurred in them. But, instead of getting a normal traceback, I get this cryptic error. Mine was on Linux, though.
To debug this, I removed the pool and ran the code sequentially.
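For what it's worth, here is a hedged, self-contained sketch of how a custom exception can produce exactly this kind of cryptic TypeError (MyError is a made-up name): BaseException pickles itself as (type, self.args), so an __init__ with extra required parameters cannot be reconstructed on the receiving end of the pool's result pipe.
import pickle

class MyError(Exception):
    def __init__(self, message, code):
        super().__init__(message)   # self.args becomes (message,), losing 'code'
        self.code = code

err = MyError("boom", 42)
blob = pickle.dumps(err)   # pickling succeeds
pickle.loads(blob)         # TypeError: __init__() missing 1 required positional argument: 'code'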
I am trying to create a class than can run a separate process to go do some work that takes a long time, launch a bunch of these from a main module and then wait for them all to finish. I want to launch the processes once and then keep feeding them things to do rather than creating and destroying processes. For example, maybe I have 10 servers running the dd command, then I want them all to scp a file, etc.
My ultimate goal is to create a class for each system that keeps track of the information for the system in which it is tied to like IP address, logs, runtime, etc. But that class must be able to launch a system command and then return execution back to the caller while that system command runs, to followup with the result of the system command later.
My attempt is failing because I cannot send an instance method of a class over the pipe to the subprocess via pickle. Those are not pickleable. I therefore tried to fix it various ways but I can't figure it out. How can my code be patched to do this? What good is multiprocessing if you can't send over anything useful?
Is there any good documentation of multiprocessing being used with class instances? The only way I can get the multiprocessing module to work is on simple functions. Every attempt to use it within a class instance has failed. Maybe I should pass events instead? I don't understand how to do that yet.
import multiprocessing
import sys
import re

class ProcessWorker(multiprocessing.Process):
    """
    This class runs as a separate process to execute worker's commands in parallel
    Once launched, it remains running, monitoring the task queue, until "None" is sent
    """

    def __init__(self, task_q, result_q):
        multiprocessing.Process.__init__(self)
        self.task_q = task_q
        self.result_q = result_q
        return

    def run(self):
        """
        Overloaded function provided by multiprocessing.Process. Called upon start() signal
        """
        proc_name = self.name
        print '%s: Launched' % (proc_name)
        while True:
            next_task_list = self.task_q.get()
            if next_task_list is None:
                # Poison pill means shutdown
                print '%s: Exiting' % (proc_name)
                self.task_q.task_done()
                break
            next_task = next_task_list[0]
            print '%s: %s' % (proc_name, next_task)
            args = next_task_list[1]
            kwargs = next_task_list[2]
            answer = next_task(*args, **kwargs)
            self.task_q.task_done()
            self.result_q.put(answer)
        return
# End of ProcessWorker class
class Worker(object):
    """
    Launches a child process to run commands from derived classes in separate processes,
    which sit and listen for something to do
    This base class is called by each derived worker
    """
    def __init__(self, config, index=None):
        self.config = config
        self.index = index

        # Launch the ProcessWorker for anything that has an index value
        if self.index is not None:
            self.task_q = multiprocessing.JoinableQueue()
            self.result_q = multiprocessing.Queue()

            self.process_worker = ProcessWorker(self.task_q, self.result_q)
            self.process_worker.start()
            print "Got here"
            # Process should be running and listening for functions to execute
        return

    def enqueue_process(target):  # No self, since it is a decorator
        """
        Used to place a command target from this class object into the task_q
        NOTE: Any function decorated with this must use fetch_results() to get the
        target task's result value
        """
        def wrapper(self, *args, **kwargs):
            self.task_q.put([target, args, kwargs])  # FAIL: target is a class instance method and can't be pickled!
        return wrapper

    def fetch_results(self):
        """
        After all processes have been spawned by multiple modules, this command
        is called on each one to retrieve the results of the call.
        This blocks until the execution of the item in the queue is complete
        """
        self.task_q.join()          # Wait for it to finish
        return self.result_q.get()  # Return the result

    @enqueue_process
    def run_long_command(self, command):
        print "I am running number % as process "%number, self.name

        # In here, I will launch a subprocess to run a long-running system command
        # p = Popen(command), etc
        # p.wait(), etc
        return

    def close(self):
        self.task_q.put(None)
        self.task_q.join()

if __name__ == '__main__':
    config = ["some value", "something else"]
    index = 7
    workers = []
    for i in range(5):
        worker = Worker(config, index)
        worker.run_long_command("ls /")
        workers.append(worker)
    for worker in workers:
        worker.fetch_results()

    # Do more work... (this would actually be done in a distributor in another class)

    for worker in workers:
        worker.close()
Edit: I tried to move the ProcessWorker class and the creation of the multiprocessing queues outside of the Worker class and then tried to manually pickle the worker instance. Even that doesn't work and I get an error:
RuntimeError: Queue objects should only be shared between processes through inheritance
But I am only passing references of those queues into the worker instance?? I am missing something fundamental. Here is the modified code from the main section:
if __name__ == '__main__':
    config = ["some value", "something else"]
    index = 7
    workers = []
    for i in range(1):
        task_q = multiprocessing.JoinableQueue()
        result_q = multiprocessing.Queue()
        process_worker = ProcessWorker(task_q, result_q)
        worker = Worker(config, index, process_worker, task_q, result_q)
        something_to_look_at = pickle.dumps(worker)  # FAIL: Doesn't like queues??
        process_worker.start()
        worker.run_long_command("ls /")
So, the problem was that I was assuming that Python was doing some sort of magic that is somehow different from the way that C++/fork() works. I somehow thought that Python only copied the class, not the whole program into a separate process. I seriously wasted days trying to get this to work because all of the talk about pickle serialization made me think that it actually sent everything over the pipe. I knew that certain things could not be sent over the pipe, but I thought my problem was that I was not packaging things up properly.
This all could have been avoided if the Python docs gave me a 10,000 ft view of what happens when this module is used. Sure, it tells me what the methods of the multiprocessing module do and gives me some basic examples, but what I want to know is the "Theory of Operation" behind the scenes! Here is the kind of information I could have used. Please chime in if my answer is off. It will help me learn.
When you start a process using this module, the whole program is copied into another process. But since it is not the "__main__" process and my code was checking for that, it doesn't fire off yet another process infinitely. It just stops and sits out there waiting for something to do, like a zombie. Everything that was initialized in the parent at the time of calling multiprocessing.Process() is all set up and ready to go. Once you put something in the multiprocessing.Queue or shared memory, or pipe, etc. (however you are communicating), the separate process receives it and gets to work. It can draw upon all imported modules and setup just as if it were the parent. However, once some internal state variables change in the parent or the separate process, those changes are isolated. Once the process is spawned, it becomes your job to keep them in sync if necessary, either through a queue, pipe, shared memory, etc.
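A tiny sketch of that isolation (assuming the default fork start method on a Unix system; the names are made up for illustration):
from multiprocessing import Process
import time

counter = 0   # exists before the fork, so the child inherits this value

def child():
    time.sleep(0.5)
    print("child sees counter = %d" % counter)   # still 0

if __name__ == "__main__":
    p = Process(target=child)
    p.start()
    counter = 100    # changed after the fork: invisible to the child
    p.join()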
I threw out the code and started over, but now I am only putting one extra function out in the ProcessWorker, an "execute" method that runs a command line. Pretty simple. I don't have to worry about launching and then closing a bunch of processes this way, which has caused me all kinds of instability and performance issues in the past in C++. When I switched to launching processes at the beginning and then passing messages to those waiting processes, my performance improved and it was very stable.
BTW, I looked at this link to get help, which threw me off because the example made me think that methods were being transported across the queues: http://www.doughellmann.com/PyMOTW/multiprocessing/communication.html
The second example of the first section used "next_task()" that appeared (to me) to be executing a task received via the queue.
Instead of attempting to send a method itself (which is impractical), try sending a name of a method to execute.
Provided that each worker runs the same code, it's a matter of a simple getattr(self, task_name).
I'd pass tuples (task_name, task_args), where task_args were a dict to be directly fed to the task method:
next_task_name, next_task_args = self.task_q.get()
if next_task_name:
    task = getattr(self, next_task_name)
    answer = task(**next_task_args)
    ...
else:
    # poison pill, shut down
    break
REF: https://stackoverflow.com/a/14179779
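A hedged sketch of the matching enqueue side (it assumes the task methods, such as run_long_command, are defined where getattr(self, ...) in the worker loop can find them): only a plain string and a dict cross the pipe, and both pickle without trouble.
def enqueue_process(target):                 # decorator, still no self
    def wrapper(self, *args, **kwargs):
        # Ship only picklable data: the method name and its keyword arguments.
        # (Callers would then pass keyword arguments, e.g. run_long_command(command="ls /").)
        self.task_q.put((target.__name__, kwargs))
    return wrapper

# Shutdown could use a (None, None) pair as the poison pill so the
# unpacking in the worker loop above still works:
# self.task_q.put((None, None))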
The answer from Jan 6 at 6:03 by David Lynch is not factually correct when it says that he was misled by http://www.doughellmann.com/PyMOTW/multiprocessing/communication.html.
The code and examples provided are correct and work as advertised. next_task() is executing a task received via the queue -- try and understand what the Task.__call__() method is doing.
In my case, what tripped me up was syntax errors in my implementation of run(). It seems that the sub-process will not report this and just fails silently, leaving things stuck in weird loops! Make sure you have some kind of syntax checker running, e.g. Flymake/Pyflakes in Emacs.
Debugging via multiprocessing.log_to_stderr() helped me narrow down the problem.
My apologies for the long-ish post up front. Hopefully it'll give enough context for a solution. I've tried to create a utility function that will take any number of old classmethods and stick them into a multi-threaded queue:
class QueuedCall(threading.Thread):
    def __init__(self, name, queue, fn, args, cb):
        threading.Thread.__init__(self)
        self.name = name
        self._cb = cb
        self._fn = fn
        self._queue = queue
        self._args = args
        self.daemon = True
        self.start()

    def run(self):
        r = self._fn(*self._args) if self._args is not None \
            else self._fn()
        if self._cb is not None:
            self._cb(self.name, r)
        self._queue.task_done()
Here's what my calling code looks like (within a class):
data = {}

def __op_complete(name, r):
    data[name] = r

q = Queue.Queue()
socket.setdefaulttimeout(5)

q.put(QueuedCall('twitter', q, Twitter.get_status, [5,], __op_complete))
q.put(QueuedCall('so_answers', q, StackExchange.get_answers,
                 ['api.stackoverflow.com', 534476, 5], __op_complete))
q.put(QueuedCall('so_user', q, StackExchange.get_user_info,
                 ['api.stackoverflow.com', 534476], __op_complete))
q.put(QueuedCall('p_answers', q, StackExchange.get_answers,
                 ['api.programmers.stackexchange.com', 23901, 5], __op_complete))
q.put(QueuedCall('p_user', q, StackExchange.get_user_info,
                 ['api.programmers.stackexchange.com', 23901], __op_complete))
q.put(QueuedCall('fb_image', q, Facebook.get_latest_picture, None, __op_complete))

q.join()
return data
The problem that I'm running into here is that it seems to work every time on a fresh server restart, but fails every second or third request, with the error:
ValueError: task_done() called too many times
This error presents itself in a random thread every second or third request, so it's rather difficult to nail down exactly what the problem is.
Anyone have any ideas and/or suggestions?
Thanks.
Edit:
I had added prints in an effort to debug this (quick and dirty rather than logging). One print statement (print 'running thread: %s' % self.name) in the first line of run and another right before calling task_done() (print 'thread done: %s' % self.name).
The output of a successful request:
running thread: twitter
running thread: so_answers
running thread: so_user
running thread: p_answers
thread done: twitter
thread done: so_user
running thread: p_user
thread done: so_answers
running thread: fb_image
thread done: p_answers
thread done: p_user
thread done: fb_image
The output of an unsuccessful request:
running thread: twitter
running thread: so_answers
thread done: twitter
thread done: so_answers
running thread: so_user
thread done: so_user
running thread: p_answers
thread done: p_answers
Exception in thread p_answers:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
    self.run()
  File "/home/demian/src/www/projects/demianbrecht/demianbrecht/demianbrecht/helpers.py", line 37, in run
    self._queue.task_done()
  File "/usr/lib/python2.7/Queue.py", line 64, in task_done
    raise ValueError('task_done() called too many times')
ValueError: task_done() called too many times
running thread: p_user
thread done: p_user
running thread: fb_image
thread done: fb_image
Your approach to this problem is "unconventional". But ignoring that for now ... the issue is simply that in the code you have given
q.put(QueuedCall('twitter', q, Twitter.get_status, [5,], __op_complete))
it is clearly possible for the following workflow to occur
1. A thread is constructed and started by QueuedCall.__init__.
2. It is then put into the queue q. However, before the Queue completes its logic for inserting the item, the independent thread has already finished its work and attempted to call q.task_done(). That causes the error you have (task_done() has been called before the object was safely put into the queue).
How should it be done? You don't insert threads into queues. Queues hold data that threads process. So instead you:
1. Create a Queue. Insert into it the jobs you want done (e.g. as a function, the args it wants, and the callback).
2. Create and start worker threads.
3. A worker thread calls q.get() to get a function to invoke, invokes it, and then calls q.task_done() to let the queue know the item was handled; see the sketch below.
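A rough, self-contained sketch of that pattern (the names and the number of workers are illustrative, not taken from the question):
import threading
import Queue  # `queue` on Python 3

q = Queue.Queue()
data = {}

def on_done(name, result):            # illustrative callback
    data[name] = result

def worker():
    while True:
        name, fn, args, cb = q.get()  # block until a job is available
        try:
            cb(name, fn(*args))
        finally:
            q.task_done()             # only ever called after a successful get()

for _ in range(4):                    # a small, fixed pool of worker threads
    t = threading.Thread(target=worker)
    t.daemon = True
    t.start()

q.put(("square", lambda x: x * x, [5], on_done))   # jobs, not threads, go on the queue
q.join()                                           # blocks until every job is done
print(data)                                        # {'square': 25}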
I may be misunderstanding here, but I'm not sure you're using the Queue correctly.
From a brief survey of the docs, it looks like the idea is that you can use the put method to put work into a Queue, then another thread can call get to get some work out of it, do the work, and then call task_done when it has finished.
What your code appears to do is put instances of QueuedCall into a queue. Nothing ever gets from the queue, but the QueuedCall instances are also passed a reference to the queue they're being inserted into, and they do their work (which they know about intrinsically, not because they get it from the queue) and then call task_done.
If my reading of all that is correct (and you don't call the get method from somewhere else I can't see), then I believe I understand the problem.
The issue is that the QueuedCall instances have to be created before they can be put on the queue, and the act of creating one starts its work in another thread. If the thread finishes its work and calls task_done before the main thread has managed to put the QueuedCall into the queue, then you can get the error you see.
I think it only works when you run it the first time by accident. The GIL 'helps' you a lot; it's not very likely that the QueuedCall thread will actually gain the GIL and begin running immediately. The fact that you don't actually care about the Queue other than as a counter also 'helps' this appear to work: it doesn't matter if the QueuedCall hasn't hit the queue yet so long as it's not empty (this QueuedCall can just task_done another element in the queue, and by the time that element calls task_done this one will hopefully be in the queue, and it can be marked as done by that). And adding sleep also makes the new threads wait a bit, giving the main thread time to make sure they're actually in the queue, which is why that masks the problem as well.
Also note that, as far as I can tell from some quick fiddling with an interactive shell, your queue is actually still full at the end, because you never actually get anything out of it. It's just received a number of task_done messages equal to the number of things that were put in it, so the join works.
I think you'll need to radically redesign the way your QueuedCall class works, or use a different synchronisation primitive than a Queue. A Queue is designed to be used to queue work for worker threads that already exist. Starting a thread from within a constructor for an object that you put on the queue isn't really a good fit.