Finding exception in python multiprocessing

I have a bit of python code that looks like this:
from multiprocessing import Pool, cpu_count

procs = cpu_count() - 1
if serial or procs == 1:
    results = map(do_experiment, experiments)
else:
    pool = Pool(processes=procs)
    results = pool.map(do_experiment, experiments)
It runs fine when I set the serial flag, but it gives the following error when the Pool is used. When I try to print something from do_experiment nothing shows up, so I can't try/catch there and print a stack trace.
Exception in thread Thread-2:
Traceback (most recent call last):
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 530, in __bootstrap_inner
self.run()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 483, in run
self.__target(*self.__args, **self.__kwargs)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/pool.py", line 285, in _handle_tasks
put(task)
TypeError: 'NoneType' object is not callable
What is a good way to proceed debugging this?

I went back in my git history until I found a commit where things were still working.
I added a class to my code that extends dict so that keys can be accessed with a dot (so dict.foo instead of dict["foo"]). Multiprocessing did not take kindly to this; using an ordinary dict solved the problem.
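For illustration, here is a minimal sketch (not the asker's exact class; the class names are made up) of how an attribute-access dict can break the pickling that Pool.map relies on to ship work to the child processes: a __getattr__ that returns None for unknown names also answers pickle's probe for special names such as __getstate__ with None, and calling that None can produce exactly the "'NoneType' object is not callable" error seen above.

import pickle

class AttrDict(dict):
    """Hypothetical dict subclass allowing d.foo as well as d["foo"]."""
    def __getattr__(self, name):
        # Convenient, but this also answers pickle's lookup of __getstate__
        # with None, and calling that None raises
        # TypeError: 'NoneType' object is not callable.
        return self.get(name)

d = AttrDict(foo=1)
print(d.foo)        # 1
pickle.dumps(d)     # may blow up with the NoneType error described above

# A variant that raises AttributeError for missing names pickles cleanly:
class SafeAttrDict(dict):
    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)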

Related

"Dictionary size changed during iteration" from Pebble ProcessPool

We have some parallel processing code built around Pebble; it has been working robustly for quite some time, but we seem to have run into an odd edge case.
Based on the exception trace (and the rock-simple code feeding it) I suspect that it's actually a bug in Pebble but who knows.
The code feeding the process pool is pretty trivial:
pool = ProcessPool(max_workers=10, max_tasks=10)
for path in filepaths:
    try:
        future = pool.schedule(function=self.analyse_file, args=[path], timeout=30)
        future.add_done_callback(self.process_result)
    except Exception as e:
        print("Exception fired: " + str(e))  # NOT where the exception is firing
pool.close()
pool.join()
So in essence, we schedule a bunch of work to run, close out the pool, then wait for the pool to complete the scheduled tasks. NOTE: the exception is not being thrown in the schedule loop; it gets fired AFTER we call join().
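For context, exceptions raised inside analyse_file itself would normally surface in the done callback rather than in the schedule loop; a minimal sketch of such a callback (hypothetical, standing in for self.process_result, relying on the concurrent.futures-style API that Pebble futures follow):

def process_result(future):
    # future.result() re-raises whatever the worker raised, e.g. a timeout
    # or pebble.ProcessExpired if the worker process died.
    try:
        result = future.result()
    except Exception as error:
        print("Analysis failed: %r" % error)
    else:
        print("Analysis succeeded: %r" % result)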
This is the exception stack trace:
Traceback (most recent call last):
File "/home/user/.pyenv/versions/3.6.0/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/home/user/.pyenv/versions/3.6.0/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/home/user/.pyenv/versions/scrapeapp/lib/python3.6/site-packages/pebble/pool/process.py", line 150, in task_scheduler_loop
pool_manager.schedule(task)
File "/home/user/.pyenv/versions/scrapeapp/lib/python3.6/site-packages/pebble/pool/process.py", line 198, in schedule
self.worker_manager.dispatch(task)
File "/home/user/.pyenv/versions/scrapeapp/lib/python3.6/site-packages/pebble/pool/process.py", line 327, in dispatch
self.pool_channel.send(WorkerTask(task.id, task.payload))
File "/home/user/.pyenv/versions/scrapeapp/lib/python3.6/site-packages/pebble/pool/channel.py", line 66, in send
return self.writer.send(obj)
File "/home/user/.pyenv/versions/3.6.0/lib/python3.6/multiprocessing/connection.py", line 206, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "/home/user/.pyenv/versions/3.6.0/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
RuntimeError: dictionary changed size during iteration
I think it has to be some weird race condition, as the code will work flawlessly on some datasets but fail at what appears to be a random point on another dataset.
We were using Pebble 4.3.1 when we first ran into the issue (the same version we'd had since the beginning); upgrading to 4.5.0 made no change.
Has anybody run into similar issues with Pebble in the past? If so, what was your fix?

Dask Distributed: Getting some errors after computations

I am running Dask Distributed on Linux CentOS 7, with a Python 3.6.2 installation. My computation seems to be going fine (I am still improving my code, but I am able to get some results), but I keep getting some Python errors apparently linked to the tornado module. I am only launching a one-node standalone Dask distributed cluster.
Here is the most common example:
Exception in thread Client loop:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.6/site-packages/tornado/ioloop.py", line 832, in start
self._run_callback(self._callbacks.popleft())
AttributeError: 'NoneType' object has no attribute 'popleft'
And here is another one:
tornado.application - ERROR - Exception in callback <bound method WorkStealing.balance of <distributed.stealing.WorkStealing object at 0x7f752ce6d6a0>>
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/tornado/ioloop.py", line 1026, in _run
return self.callback()
File "/usr/local/lib/python3.6/site-packages/distributed/stealing.py", line 248, in balance
sat = s.rprocessing[key]
KeyError: 'read-block-9024000000-e3fefd2110094168cc0505db69b326e0'
Do you have any idea why? Should I close some connections or stop the standalone cluster?
Yes, if you don't close down the Tornado IOLoop before exiting the process then it can die in an unpleasant way. Fortunately this shouldn't affect your application, except by looking unpleasant.
You might submit a bug report about this; it's still something that we should fix.
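For instance, a minimal sketch (the scheduler address and the work submitted are hypothetical); closing the client shuts its Tornado IOLoop down before the interpreter exits, which is what avoids the noisy traceback:

from dask.distributed import Client

def increment(x):
    return x + 1

# The scheduler address below is hypothetical; adapt it to your cluster.
client = Client("tcp://127.0.0.1:8786")
try:
    futures = client.map(increment, range(10))
    results = client.gather(futures)
finally:
    client.close()   # shuts the client's IOLoop down cleanly before exit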

Receive return value from different thread in another module

I am trying to replicate C# code in Python which runs an action on a thread, waits for it to finish, and returns a value. Essentially, the method RunAndWait is in a helper class because that method is called multiple times.
C# code is as follows:
public static bool RunAndWait(Action _action, long _timeout)
{
    Task t = Task.Run(() =>
    {
        Log.Message(Severity.MESSAGE, "Executing " + _action.Method.Name);
        _action();
    });

    if (!t.Wait(Convert.ToInt32(_timeout)))
    {
        Log.Message(Severity.ERROR, "Executing " + _action.Method.Name + " timedout. Could not execute MCS command.");
        throw new AssertFailedException();
    }

    t.Dispose();
    t = null;
    return true;
}
In Python I have been struggling with a few things. Firstly, there seem to be several types of queues, so I simply picked the import that seemed to work: import Queue. Secondly, I receive a TypeError as below.
Traceback (most recent call last):
File "C:/Users/JSC/Documents/Git/EnterprisePlatform/Enterprise/AI.App.Tool.AutomatedMachineTest/Scripts/monkey.py",
line 9, in
File "C:\Users\JSC\Documents\Git\EnterprisePlatform\Enterprise\AI.App.Tool.AutomatedMachineTest\Scripts\Libs\MonkeyHelper.py",
line 4, in RunCmdAndWait
TypeError: module is not callable
Here is the python code for monkey:
from Libs.CreateConnection import CreateMcsConnection
import Libs.MonkeyHelper as mh
import Queue
q = Queue.Queue()
to = 5000 #timeout
mh.RunCmdAndWait(CreateMcsConnection, to, q)
serv, con = q.get()
and MonkeyHelper.py:
import threading

def RunCmdAndWait(CmdToRun, timeout, q):
    t = threading(group=None, target=CmdToRun, arg=q)
    t.start()
    t.join(timeout=timeout)
I am not sure what I am doing wrong. I am fairly new to python. Could someone please help me out?
Edit
t = threading.Thread(group=None, target=CmdToRun, args=q)
correcting the line above brought up another error:
Exception in thread Thread-1:
Traceback (most recent call last):
File "C:\Program Files (x86)\IronPython 2.7\Lib\threading.py", line 552, in _Thread__bootstrap_inner
self.run()
File "C:\Program Files (x86)\IronPython 2.7\Lib\threading.py", line 505, in run
self.__target(*self.__args, **self.__kwargs)
AttributeError: Queue instance has no attribute '__len'
Is that because Thread expects multiple args, or because the queue is still empty at this point? From what I've seen, the queue is just being passed as an argument to receive the return value. Is that the right way to go?
Edit2
Changed t = threading.Thread(group=None, target=CmdToRun, args=q) to t = threading.Thread(group=None, target=CmdToRun, args=(q,))
The change yields the TypeError below, which seems weird to me since Thread is expecting a tuple.
Exception in thread Thread-1:
Traceback (most recent call last):
File "C:\Program Files (x86)\IronPython 2.7\Lib\threading.py", line 552, in _Thread__bootstrap_inner
self.run()
File "C:\Program Files (x86)\IronPython 2.7\Lib\threading.py", line 505, in run
self.__target(*self.__args, **self.__kwargs)
TypeError: tuple is not callable
threading is a module. You likely mean to replace
t = threading(group=None, target=CmdToRun, arg=q)
with
t = threading.Thread(group=None, target=CmdToRun, args=(q,))
args is an argument tuple.
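Putting the corrections together, a minimal runnable sketch (the worker below is a hypothetical stand-in for CreateMcsConnection, and note that Thread.join takes its timeout in seconds rather than milliseconds):

# MonkeyHelper.py
import threading

def RunCmdAndWait(CmdToRun, timeout, q):
    # threading.Thread is the class; args must be a tuple of arguments
    t = threading.Thread(group=None, target=CmdToRun, args=(q,))
    t.start()
    t.join(timeout=timeout)
    return not t.is_alive()   # False means the command timed out

# monkey.py -- usage, with a hypothetical worker that puts its result on
# the queue, the way CreateMcsConnection is expected to
import Queue

def worker(q):
    q.put(("serv", "con"))

q = Queue.Queue()
if RunCmdAndWait(worker, 5, q):
    serv, con = q.get()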

Error in Python multiprocessing process

I am trying to write Python code with multiple processes whose structure and flow are something like this:
import multiprocessing
import ctypes
import time
import errno

m=multiprocessing.Manager()
mylist=m.list()
var1=m.Value('i',0)
var2=m.Value('i',1)
var3=m.Value('i',2)
var4=m.Value(ctypes.c_char_p,"a")
var5=m.Value(ctypes.c_char_p,"b")
var6=3
var7=4
var8=5
var9=6
var10=7

def func(var1,var2,var4,var5,mylist):
    i=0
    try:
        if var1.value==0:
            print var2.value,var4.value,var5.value
            mylist.append(time.time())
        elif var1.value==1:
            i=i+2
            print var2.value+2,var4.value,var5.value
            mylist.append(time.time())
    except IOError as e:
        if e.errno==errno.EPIPE:
            var3.value=var3.value+1
            print "Error"

def work():
    for i in range(var3.value):
        print i,var6,var7,var8,var9,var10
        p=multiprocessing.Process(target=func,args=(var1,var2,var4,var5,mylist))
        p.start()

work()
When I run this code, sometimes it works perfectly, sometimes it does not run for the exact number of loop counts, and sometimes I get the following error:
0
1
Process Process-2:
Traceback (most recent call last):
File "/usr/lib64/python2.6/multiprocessing/process.py", line 232, in _bootstrap
self.run()
File "/usr/lib64/python2.6/multiprocessing/process.py", line 88, in run
self._target(*self._args, **self._kwargs)
File "dummy.py", line 19, in func
if var1.value==0:
File "/usr/lib64/python2.6/multiprocessing/managers.py", line 1005, in get
return self._callmethod('get')
File "/usr/lib64/python2.6/multiprocessing/managers.py", line 722, in _callmethod
self._connect()
File "/usr/lib64/python2.6/multiprocessing/managers.py", line 709, in _connect
conn = self._Client(self._token.address, authkey=self._authkey)
File "/usr/lib64/python2.6/multiprocessing/connection.py", line 149, in Client
answer_challenge(c, authkey)
File "/usr/lib64/python2.6/multiprocessing/connection.py", line 383, in answer_challenge
message = connection.recv_bytes(256) # reject large message
EOFError
What does this error mean and what am I doing wrong here? Kindly guide me to the correct path. I am using CentOS 6.5.
Working with shared variables in multiprocessing is tricky. Because of the Python Global Interpreter Lock (GIL), threads cannot run Python code in parallel, so the multiprocessing module launches tasks in separate processes instead, BUT those processes do not share memory.
In your case you need shared state, so you use shared memory. But what happens here is that you have several processes trying to read the same memory at the same time. To avoid memory corruption, a process locks the memory address it is currently reading, forbidding the other processes from accessing it until it finishes reading.
Here you have multiple processes trying to evaluate var1.value in the first if branch of your func: the first process reads the value, and the others are blocked, raising an error.
To avoid this mechanism, you should always manage the lock of your shared variables yourself.
You can try the following syntax:
var1=multiprocessing.Value('i',0) # create shared variable
var1.acquire() # get the lock : it will wait until lock is available
var1.value # read the value
var1.release() # release the lock
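Equivalently, the synchronized wrapper returned by multiprocessing.Value exposes its lock through get_lock(), which can be used as a context manager (a minimal sketch):

import multiprocessing

var1 = multiprocessing.Value('i', 0)   # shared integer with an associated lock

with var1.get_lock():                  # acquired on entry, released on exit
    var1.value += 1                    # read-modify-write is safe while held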
External documentation:
Locks: https://docs.python.org/2/library/multiprocessing.html#synchronization-between-processes
GIL: https://docs.python.org/2/glossary.html#term-global-interpreter-lock

multiprocessing Pool hangs when there is an exception in any of the threads

I am new to Python and am trying a multiprocessing.Pool program to process files. It works fine as long as there are no exceptions, but if any of the threads/processes hits an exception, the whole program waits for that thread.
snippet of the code:
cp = ConfigParser.ConfigParser()
cp.read(gdbini)
for table in cp.sections():
    jobs.append(table)
#print jobs
poolreturn = pool.map(worker, jobs)
pool.close()
pool.join()
Failure Message:
Traceback (most recent call last):
File "/opt/cnet-python/default-2.6/lib/python2.6/threading.py", line 525, in __bootstrap_inner
self.run()
File "/opt/cnet-python/default-2.6/lib/python2.6/threading.py", line 477, in run
self.__target(*self.__args, **self.__kwargs)
File "/opt/cnet-python/default-2.6/lib/python2.6/multiprocessing/pool.py", line 259, in _handle_results
task = get()
TypeError: ('__init__() takes exactly 3 arguments (2 given)', <class 'ConfigParser.NoOptionError'>, ("No option 'inputfilename' in section: 'section-1'",))
I went ahead and added an exception handler to terminate the process:
try:
    ifile=cp.get(table,'inputfilename')
except (ConfigParser.NoSectionError, ConfigParser.NoOptionError):
    usage("One of Parameter not found for"+ table)
    terminate()
but it still waits; not sure what's missing.
In Python 3.2+ this works as expected. For Python 2, this bug was fixed in r74545 and will be available in Python 2.7.3. In the meantime, you can use the configparser library, which is a backport of configparser from 3.2+. Check it out.
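For example, a minimal sketch assuming the PyPI configparser backport is installed (pip install configparser); the file and section names here are hypothetical:

# On Python 2 this imports the backport; on Python 3 it is the stdlib module.
from configparser import ConfigParser, NoOptionError, NoSectionError

cp = ConfigParser()
cp.read("jobs.ini")                      # hypothetical config file
try:
    ifile = cp.get("section-1", "inputfilename")
except (NoSectionError, NoOptionError) as e:
    print("Missing configuration: %s" % e)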
I had the same issue. It happens when a worker process raises a user exception which has a custom constructor. Make sure your exception (ConfigParser.NoOptionError in that case) initializes the base exception with exactly two arguments:
class NoOptionError(ValueError):
    def __init__(self, message, *args):
        super(NoOptionError, self).__init__(message, args)
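A quick way to check whether a custom exception will survive the trip back from a worker is to round-trip it through pickle; a minimal sketch, where BadError is a hypothetical example of the problematic pattern (a constructor that takes more arguments than it passes to the base class):

import pickle

class BadError(Exception):
    """Hypothetical exception whose constructor breaks unpickling."""
    def __init__(self, option, section):
        Exception.__init__(self, "No option %r in section %r" % (option, section))
        self.option = option
        self.section = section

err = BadError("inputfilename", "section-1")
try:
    pickle.loads(pickle.dumps(err))   # unpickling calls BadError(*err.args)
except TypeError as e:
    # This is the same TypeError the pool's result-handler thread hits,
    # which is what makes pool.map() hang on older Python 2 versions.
    print("not picklable: %s" % e)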
