Python multiprocessing pool apply_async error

I'm trying to evaluate a number of processes in a multiprocessing pool but keep running into errors and I can't work out why... There's a simplified version of the code below:
class Object_1():
    def add_good_spd_column(self):
        def calculate_correlations(arg1, arg2, arg3):
            return {'a': 1}

        processes = {}
        pool = Pool(processes=6)
        for i in range(1, 10):
            processes[i] = pool.apply_async(calculate_correlations,
                                            args=(arg1, arg2, arg3,))
        correlations = {}
        for i in range(0, 10):
            correlations[i] = processes[i].get()
This returns the following error:
Traceback (most recent call last):
File "./02_results.py", line 116, in <module>
correlations[0] = processes[0].get()
File "/opt/anaconda3/lib/python3.5/multiprocessing/pool.py", line 608, in get
raise self._value
File "/opt/anaconda3/lib/python3.5/multiprocessing/pool.py", line 385, in
_handle_tasks
put(task)
File "/opt/anaconda3/lib/python3.5/multiprocessing/connection.py", line 206, in send
self._send_bytes(ForkingPickler.dumps(obj))
File "/opt/anaconda3/lib/python3.5/multiprocessing/reduction.py", line 50, in dumps
cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'SCADA.add_good_spd_column.<locals>.calculate_correlations'
When I call the following:
processes[0].successful()
I get the following error:
Traceback (most recent call last):
File "./02_results.py", line 116, in <module>
print(processes[0].successful())
File "/opt/anaconda3/lib/python3.5/multiprocessing/pool.py", line 595, in
successful
assert self.ready()
AssertionError
Is this because the process isn't actually finished before the .get() is called? The function being evaluated just returns a dictionary which should definitely be pickle-able...
Cheers,

The error occurs because pickling a function that is nested inside another function is not supported, and multiprocessing.Pool needs to pickle the function you pass to apply_async in order to execute it in a worker process. Move the function to the top level of the module, or make it an instance method of the class. Keep in mind that if you make it an instance method, the instance of the class itself must also be picklable.
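For example, a minimal sketch of the first option, reusing the names from the question (the method signature here is an assumption):
from multiprocessing import Pool

# At module level the function can be pickled by reference,
# so Pool can hand it to its worker processes.
def calculate_correlations(arg1, arg2, arg3):
    return {'a': 1}

class Object_1():
    def add_good_spd_column(self, arg1, arg2, arg3):
        processes = {}
        pool = Pool(processes=6)
        for i in range(10):
            processes[i] = pool.apply_async(calculate_correlations,
                                            args=(arg1, arg2, arg3))
        pool.close()
        pool.join()
        return {i: res.get() for i, res in processes.items()}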
And yes, the assertion error when calling successful() occurs because you're calling it before a result is ready. From the docs:
successful()
Return whether the call completed without raising an exception. Will raise AssertionError if the result is not ready.
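In other words, block until the result is ready before querying it. A minimal sketch:
res = pool.apply_async(calculate_correlations, args=(arg1, arg2, arg3))
res.wait()               # block until the worker has finished
if res.successful():     # safe now: the result is ready
    correlations[0] = res.get()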

Related

How do I use Python's multiprocessing.Value with .map()?

I'm testing some code using multiprocessing to try to understand it, but I'm struggling to get the Value to work. What am I doing wrong? It says p doesn't exist.
Here's my code:
from multiprocessing import Pool, Value
from ctypes import c_int

if __name__ == "__main__":
    p = Value(c_int, 1)

def addone(a):
    print(a)
    with p.get_lock():
        print(p.value)
        p.value += 1

if __name__ == "__main__":
    with Pool() as po:
        po.map(addone, range(19))
    print(p.value)
And I get this error:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "C:\Users\(I removed this)\AppData\Local\Programs\Python\Python39\lib\multiprocessing\pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "C:\Users\(I removed this)\AppData\Local\Programs\Python\Python39\lib\multiprocessing\pool.py", line 48, in mapstar
return list(map(*args))
File "C:\Users\(I removed this)\Desktop\Python\make\multiTest.py", line 10, in addone
with p.get_lock():
NameError: name 'p' is not defined
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File (I removed this), line 15, in <module>
po.map(addone,range(19))
File "C:\Users\(I removed this)\AppData\Local\Programs\Python\Python39\lib\multiprocessing\pool.py", line 364, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "C:\Users\(I removed this)\AppData\Local\Programs\Python\Python39\lib\multiprocessing\pool.py", line 771, in get
raise self._value
NameError: name 'p' is not defined
What should I do?
You only want a single instance of the shared value, p, to be operated on by the pool processes executing your worker function, addone. The way to do this is to use the initializer and initargs arguments of the multiprocessing.pool.Pool class to initialize a global variable, p, in each pool process with the shared value created by the main process. Explicitly passing p to a pool worker function as an additional argument will not work; it results in the cryptic error: RuntimeError: Synchronized objects should only be shared between processes through inheritance.
from multiprocessing import Pool, Value
from ctypes import c_int

def init_pool_processes(shared_value):
    global p
    p = shared_value

def addone(a):
    print(a)
    with p.get_lock():
        print(p.value)
        p.value += 1

if __name__ == "__main__":
    p = Value(c_int, 1)
    with Pool(initializer=init_pool_processes, initargs=(p,)) as po:
        po.map(addone, range(19))
    print(p.value)

Pathos (Python module) behaves differently in IDE and shell

I am trying to understand how to use the Pathos package to run a function that calls a function. It was my understanding that the advantage of Pathos over the standard multiprocessing package is that it allows functions defined inside other functions to be used. However, I can't seem to make it work. Here is the simplest example I could come up with:
import dill
import pathos
from pathos.multiprocessing import ProcessingPool

def testf(x):
    print(x)

pool = ProcessingPool(nodes=3)
out2 = pool.map(testf, [1, 2, 3, 4, 5, 6, 7, 8, 9])
pool.close()
Output:
multiprocess.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/james/.local/lib/python3.6/site-packages/multiprocess/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/home/james/.local/lib/python3.6/site-packages/multiprocess/pool.py", line 44, in mapstar
return list(map(*args))
File "/home/james/.local/lib/python3.6/site-packages/pathos/helpers/mp_helper.py", line 14, in <lambda>
func = lambda args: f(*args)
File "<input>", line 2, in testf
NameError: name 'print' is not defined
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<input>", line 5, in <module>
File "/home/james/.local/lib/python3.6/site-packages/pathos/multiprocessing.py", line 136, in map
return _pool.map(star(f), zip(*args)) # chunksize
File "/home/james/.local/lib/python3.6/site-packages/multiprocess/pool.py", line 260, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/home/james/.local/lib/python3.6/site-packages/multiprocess/pool.py", line 608, in get
raise self._value
NameError: name 'print' is not defined
Edit: It seems that if I paste this code into a shell console, it works fine. No dice if I run it from my IDE of choice, PyCharm. So now my question is why the same code behaves differently in the same version of the Python interpreter (3.6.1) depending on whether it is run from a shell or the in-app console.

Receive return value from different thread in another module

I am trying to replicate C# code in Python which executes a thread, waits for it to finish, and returns a value. Essentially, the method RunAndWait is in a helper class because it is called multiple times.
C# code is as follows:
public static bool RunAndWait(Action _action, long _timeout)
{
    Task t = Task.Run(() =>
    {
        Log.Message(Severity.MESSAGE, "Executing " + _action.Method.Name);
        _action();
    });

    if (!t.Wait(Convert.ToInt32(_timeout)))
    {
        Log.Message(Severity.ERROR, "Executing " + _action.Method.Name + " timedout. Could not execute MCS command.");
        throw new AssertFailedException();
    }

    t.Dispose();
    t = null;
    return true;
}
In Python I have been struggling with a few things. Firstly, there seem to be different queue types, and I simply picked the import that seemed to work: import Queue. Secondly, I receive a TypeError as below.
Traceback (most recent call last):
File "C:/Users/JSC/Documents/Git/EnterprisePlatform/Enterprise/AI.App.Tool.AutomatedMachineTest/Scripts/monkey.py",
line 9, in
File "C:\Users\JSC\Documents\Git\EnterprisePlatform\Enterprise\AI.App.Tool.AutomatedMachineTest\Scripts\Libs\MonkeyHelper.py",
line 4, in RunCmdAndWait
TypeError: module is not callable
Here is the python code for monkey:
from Libs.CreateConnection import CreateMcsConnection
import Libs.MonkeyHelper as mh
import Queue

q = Queue.Queue()
to = 5000  # timeout
mh.RunCmdAndWait(CreateMcsConnection, to, q)
serv, con = q.get()
and MonkeyHelper.py:
import threading

def RunCmdAndWait(CmdToRun, timeout, q):
    t = threading(group=None, target=CmdToRun, arg=q)
    t.start()
    t.join(timeout=timeout)
I am not sure what I am doing wrong. I am fairly new to python. Could someone please help me out?
Edit
t = threading.Thread(group=None, target=CmdToRun, args=q)
correcting the line above brought up another error:
Exception in thread Thread-1:
Traceback (most recent call last):
File "C:\Program Files (x86)\IronPython 2.7\Lib\threading.py", line 552, in _Thread__bootstrap_inner
self.run()
File "C:\Program Files (x86)\IronPython 2.7\Lib\threading.py", line 505, in run
self.target(*self.__args, **self.__kwargs)
AttributeError: Queue instance has no attribute '__len'
Is that because Thread expects multiple args, or because the queue is still empty at this point? From what I've seen, the queue is just passed as an argument to receive the return value. Is that the right way to go?
Edit2
Changed t = threading.Thread(group=None, target=CmdToRun, args=q) to t = threading.Thread(group=None, target=CmdToRun, args=(q,))
The change yields the TypeError below, which seems weird to me since Thread is expecting a tuple.
Exception in thread Thread-1:
Traceback (most recent call last):
File "C:\Program Files (x86)\IronPython 2.7\Lib\threading.py", line 552, in _Thread__bootstrap_inner
self.run()
File "C:\Program Files (x86)\IronPython 2.7\Lib\threading.py", line 505, in run
self.__target(*self.__args, **self.__kwargs)
TypeError: tuple is not callable
threading is a module. You likely mean to replace
t = threading(group=None, target=CmdToRun, arg=q)
with
t = threading.Thread(group=None, target=CmdToRun, args=(q,))
args is an argument tuple.
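Putting it together, MonkeyHelper.py could look like the sketch below; the timeout check mirroring the C# behaviour is an assumption, and note that join() takes seconds, not the milliseconds the C# version used:
import threading

def RunCmdAndWait(CmdToRun, timeout, q):
    # threading.Thread is the class inside the threading module;
    # args must be a tuple, hence (q,)
    t = threading.Thread(group=None, target=CmdToRun, args=(q,))
    t.start()
    t.join(timeout=timeout)  # timeout is in seconds
    if t.is_alive():
        # join() timed out, mirroring the AssertFailedException in the C# code
        raise RuntimeError("Executing " + CmdToRun.__name__ + " timed out")
    return True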

Convert a multi-threaded Python to a multi-process one using concurrent futures

I have the following working code (Python 3.5) which uses concurrent.futures to parse files in a threaded manner and then does some post-processing on the results as they come back (in any order).
from concurrent import futures

with futures.ThreadPoolExecutor(max_workers=4) as executor:
    # A dictionary mapping each future to the filename it was submitted for
    jobs = {}
    # Loop through the files and run the parse function for each one,
    # sending it the file name along with the kwargs in parser_variables.
    # The results can come back in any order.
    for this_file in files_list:
        job = executor.submit(parse_log_file.parse, this_file, **parser_variables)
        jobs[job] = this_file
    # Collect the completed jobs as they finish
    for job in futures.as_completed(jobs):
        debug.checkpointer("Multi-threaded Parsing File finishing")
        # Retrieve the result of the job and the file it was based on (jobs[job])
        result_content = job.result()
        this_file = jobs[job]
I want to convert this to use processes instead of threads, because threads don't offer any speedup. In theory I just need to change ThreadPoolExecutor to ProcessPoolExecutor.
The problem is, if I do that I get this exception:
Process Process-2:
Traceback (most recent call last):
File "C:\Python35\lib\multiprocessing\process.py", line 254, in _bootstrap
self.run()
File "C:\Python35\lib\multiprocessing\process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "C:\Python35\lib\concurrent\futures\process.py", line 169, in _process_worker
call_item = call_queue.get(block=True)
File "C:\Python35\lib\multiprocessing\queues.py", line 113, in get
return ForkingPickler.loads(res)
TypeError: Required argument 'fileno' (pos 1) not found
Traceback (most recent call last):
File "c:/myscript/main.py", line 89, in <module>
main()
File "c:/myscript/main.py", line 59, in main
system_counters = process_system(system, filename)
File "c:\myscript\per_system.py", line 208, in process_system
system_counters = process_filelist(**file_handling_variables)
File "c:\myscript\per_logfile.py", line 31, in process_filelist
results_list = job.result()
File "C:\Python35\lib\concurrent\futures\_base.py", line 398, in result
return self.__get_result()
File "C:\Python35\lib\concurrent\futures\_base.py", line 357, in __get_result
raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
I think that this might have something to do with pickling, but googling for the error hasn't found anything.
How do I convert the above to use multiple processes?
It turns out this is because one of the things I'm passing inside parser_variables is a class (a reader from a third-party module). If I remove the class, the above works fine.
For whatever reason, pickle doesn't seem to be able to handle this particular object.
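A common workaround is to pass only picklable arguments (such as the file path) and construct the problematic object inside the worker process. A sketch, where make_reader is a hypothetical stand-in for building the third-party reader, the reader= keyword on parse is likewise an assumption, and parser_variables is assumed to no longer contain the reader:
from concurrent import futures

def parse_wrapper(this_file, parser_variables):
    # Hypothetical: build the non-picklable reader here, in the worker,
    # so it never has to cross the process boundary via pickle
    reader = make_reader(this_file)
    return parse_log_file.parse(this_file, reader=reader, **parser_variables)

with futures.ProcessPoolExecutor(max_workers=4) as executor:
    jobs = {executor.submit(parse_wrapper, this_file, parser_variables): this_file
            for this_file in files_list}
    for job in futures.as_completed(jobs):
        result_content = job.result()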

Python: eliminating stack traces into library code?

When I get a runtime exception from the standard library, it's almost always a problem in my code and not in the library code. Is there a way to truncate the exception stack trace so that it doesn't show the guts of the library package?
For example, I would like to get this:
Traceback (most recent call last):
File "./lmd3-mkhead.py", line 71, in <module>
main()
File "./lmd3-mkhead.py", line 66, in main
create()
File "./lmd3-mkhead.py", line 41, in create
headver1[depotFile]=rev
TypeError: Data values must be of type string or None.
and not this:
Traceback (most recent call last):
File "./lmd3-mkhead.py", line 71, in <module>
main()
File "./lmd3-mkhead.py", line 66, in main
create()
File "./lmd3-mkhead.py", line 41, in create
headver1[depotFile]=rev
File "/usr/anim/modsquad/oses/fc11/lib/python2.6/bsddb/__init__.py", line 276, in __setitem__
_DeadlockWrap(wrapF) # self.db[key] = value
File "/usr/anim/modsquad/oses/fc11/lib/python2.6/bsddb/dbutils.py", line 68, in DeadlockWrap
return function(*_args, **_kwargs)
File "/usr/anim/modsquad/oses/fc11/lib/python2.6/bsddb/__init__.py", line 275, in wrapF
self.db[key] = value
TypeError: Data values must be of type string or None.
update: added an answer with the code, thanks to the pointer from Alex.
The traceback module in Python's standard library lets you format error tracebacks to your liking while an exception is propagating. You can use this power either in the except leg of a try/except statement, or in a function you've installed as sys.excepthook, which gets called if and when an exception propagates all the way up; quoting the docs:
In an interactive session this happens just before control is returned to the prompt; in a Python program this happens just before the program exits. The handling of such top-level exceptions can be customized by assigning another three-argument function to sys.excepthook.
Here's a simple, artificial example:
>>> import sys
>>> import traceback
>>> def f(n):
... if n<=0: raise ZeroDivisionError
... f(n-1)
...
>>> def excepthook(type, value, tb):
... traceback.print_exception(type, value, tb, 3)
...
>>> sys.excepthook = excepthook
>>> f(8)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 3, in f
File "<stdin>", line 3, in f
ZeroDivisionError
as you see, without needing a try/except, you can easily limit the traceback to (for example) the first three levels -- even though we know by design that there were 9 nested levels when the exception was raised.
You want something more sophisticated than a simple limit on levels, so you'll need to call traceback.format_exception, which gives you a list of lines rather than printing it, then "prune" from that list the lines that are about modules you never want to see in your tracebacks, and finally emit the remaining lines (typically to sys.stderr, but, whatever!-).
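For instance, a sketch of that pruning approach (treating everything under the site-packages directory as "library code" is an assumption; adjust the test to taste):
import sys
import traceback
import distutils.sysconfig

def pruned_excepthook(etype, value, tb):
    pylibdir = distutils.sysconfig.get_python_lib(1, 1)
    # format_exception returns one string per stack frame (plus the
    # header and the final error message); drop the entries that
    # point into the library directory
    lines = traceback.format_exception(etype, value, tb)
    sys.stderr.write(''.join(ln for ln in lines if pylibdir not in ln))

sys.excepthook = pruned_excepthook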
Thanks to the pointer from Alex, here's the code:
def trimmedexceptions(type, value, tb, pylibdir=None, lev=None):
    """trim system packages from the exception printout"""
    if pylibdir is None:
        import traceback, distutils.sysconfig
        pylibdir = distutils.sysconfig.get_python_lib(1, 1)
        nlev = trimmedexceptions(type, value, tb, pylibdir, 0)
        traceback.print_exception(type, value, tb, nlev)
    else:
        fn = tb.tb_frame.f_code.co_filename
        if tb.tb_next is None or fn.startswith(pylibdir):
            return lev
        else:
            return trimmedexceptions(type, value, tb.tb_next, pylibdir, lev+1)

import sys
sys.excepthook = trimmedexceptions

# --- test code ---
def f1(): f2()
def f2(): f3()

def f3():
    import xmlrpclib
    proxy = xmlrpclib.ServerProxy('http://nosuchserver')
    proxy.f()

f1()
Which yields this stack trace:
Traceback (most recent call last):
File "./tsttraceback.py", line 47, in <module>
f1()
File "./tsttraceback.py", line 40, in f1
def f1(): f2()
File "./tsttraceback.py", line 41, in f2
def f2(): f3()
File "./tsttraceback.py", line 45, in f3
proxy.f()
gaierror: [Errno -2] Name or service not known
The traceback module is probably what you want. Here's one example that might help:
import traceback

try:
    your_main()
except:
    lines = traceback.format_exc()
    print lines[:lines.find('File "/usr')]
(This obviously won't work if there's an exception outside the library, and might not exactly fit your needs, but it's one way of using the traceback module.)
Put an unqualified try...except at the top of your code (i.e., in your "main") or set sys.excepthook. You can then format the stack trace however you'd like.
