Pathos (python module) behaves differently in IDE and shell - python

I am trying to understand how to use the Pathos package to run a function that calls a function. It was my understanding that an advantage of Pathos over the main multiprocessing package was that it allowed functions inside functions. However, I can't seem to make it work. Here is the simplest example I could come up with:
def testf(x):
    print(x)
import dill
import pathos
from pathos.multiprocessing import ProcessingPool
pool = ProcessingPool(nodes=3)
out2 = pool.map(testf, [1,2,3,4,5,6,7,8,9])
pool.close()
Output:
multiprocess.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/james/.local/lib/python3.6/site-packages/multiprocess/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/home/james/.local/lib/python3.6/site-packages/multiprocess/pool.py", line 44, in mapstar
return list(map(*args))
File "/home/james/.local/lib/python3.6/site-packages/pathos/helpers/mp_helper.py", line 14, in <lambda>
func = lambda args: f(*args)
File "<input>", line 2, in testf
NameError: name 'print' is not defined
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<input>", line 5, in <module>
File "/home/james/.local/lib/python3.6/site-packages/pathos/multiprocessing.py", line 136, in map
return _pool.map(star(f), zip(*args)) # chunksize
File "/home/james/.local/lib/python3.6/site-packages/multiprocess/pool.py", line 260, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/home/james/.local/lib/python3.6/site-packages/multiprocess/pool.py", line 608, in get
raise self._value
NameError: name 'print' is not defined
Edit: It seems that if I paste this code into a shell console, it works fine. No dice if I run it from my IDE of choice, PyCharm. So now my question is: why would the same code work differently in the same version of the Python interpreter (3.6.1) depending on whether it is run from a shell or from the in-app console?

Related

Fuzzywuzzy returns 'ratio' not defined in Pycharm only

Why might I be getting a 'NameError: name 'ratio' is not defined' error when I attempt to use fuzzywuzzy in PyCharm? I have no issues using it in IDLE or Python's 32-bit app.
I've reviewed similar "works in IDLE but not PyCharm" topics; however, the ones I found related only to import name typos, UTF encoding, and identical function/file names, and I have ruled those out.
Example:
Using: Python 3.7, Windows 10, FuzzyWuzzy version: 0.17
In IDLE -
>>> from fuzzywuzzy import fuzz
>>> fuzz.ratio('test', 'test2')
89
In PyCharm's Python console:
from fuzzywuzzy import fuzz
fuzz.ratio('test', 'test2')
RETURNS:
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "C:\Users\xx188\AppData\Local\Programs\Python\Python37-32\lib\site-packages\fuzzywuzzy\utils.py", line 38, in decorator
return func(*args, **kwargs)
File "C:\Users\xx188\AppData\Local\Programs\Python\Python37-32\lib\site-packages\fuzzywuzzy\utils.py", line 29, in decorator
return func(*args, **kwargs)
File "C:\Users\xx188\AppData\Local\Programs\Python\Python37-32\lib\site-packages\fuzzywuzzy\utils.py", line 47, in decorator
return func(*args, **kwargs)
File "C:\Users\xx188\AppData\Local\Programs\Python\Python37-32\lib\site-packages\fuzzywuzzy\fuzz.py", line 28, in ratio
return utils.intr(100 * m.ratio())
File "C:\Users\xx188\AppData\Local\Programs\Python\Python37-32\lib\site-packages\fuzzywuzzy\StringMatcher.py", line 64, in ratio
self._ratio = ratio(self._str1, self._str2)
NameError: name 'ratio' is not defined

cProfiler working weirdly with multiprocessing

I got an error for this code:
from pathos.multiprocessing import ProcessingPool
def diePlz(im):
    print('Whoopdepoop!')

def caller():
    im = 1
    pool = ProcessingPool()
    pool.map(diePlz,[im,im,im,im])

if __name__=='__main__':
    caller()
when I ran it under cProfile (python3 -m cProfile testProfiler.py):
multiprocess.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/rohit/.local/lib/python3.6/site-packages/multiprocess/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/home/rohit/.local/lib/python3.6/site-packages/multiprocess/pool.py", line 44, in mapstar
return list(map(*args))
File "/home/rohit/.local/lib/python3.6/site-packages/pathos/helpers/mp_helper.py", line 15, in <lambda>
func = lambda args: f(*args)
File "testProfiler.py", line 3, in diePlz
print('Whoopdepoop!')
NameError: name 'print' is not defined
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/usr/lib/python3.6/cProfile.py", line 160, in <module>
main()
File "/usr/lib/python3.6/cProfile.py", line 153, in main
runctx(code, globs, None, options.outfile, options.sort)
File "/usr/lib/python3.6/cProfile.py", line 20, in runctx
filename, sort)
File "/usr/lib/python3.6/profile.py", line 64, in runctx
prof.runctx(statement, globals, locals)
File "/usr/lib/python3.6/cProfile.py", line 100, in runctx
exec(cmd, globals, locals)
File "testProfiler.py", line 11, in <module>
caller()
File "testProfiler.py", line 8, in caller
pool.map(diePlz,[im,im,im,im])
File "/home/rohit/.local/lib/python3.6/site-packages/pathos/multiprocessing.py", line 137, in map
return _pool.map(star(f), zip(*args)) # chunksize
File "/home/rohit/.local/lib/python3.6/site-packages/multiprocess/pool.py", line 266, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/home/rohit/.local/lib/python3.6/site-packages/multiprocess/pool.py", line 644, in get
raise self._value
NameError: name 'print' is not defined
But when I ran it without the cProfiler:
$ python3 testProfiler.py
Whoopdepoop!
Whoopdepoop!
Whoopdepoop!
Whoopdepoop!
The code that I've provided is a minimal working example of the problem. There is a much larger program that I want to debug, but I am not able to do so because cProfile keeps raising weird errors.
In this case, the point of importance is
NameError: name 'print' is not defined
which means python3 is not able to recognize print itself. In my code, it was not able to recognize range.
So, I realize this is a long time after the original post, but I have this exact same issue.
In my case I was getting the exact same error as the original post - python builtin functions such as print() or len() resulted in errors like this:
NameError: name 'len' is not defined
I'm currently running multiprocess version 0.70.11.1 and dill version 0.3.3 (components of pathos that make process based parallelism work).
In an issue comment (https://github.com/uqfoundation/pathos/issues/129#issuecomment-536081859), one of the package authors recommends trying:
import dill
dill.settings['recurse'] = True
At least in my case, the above fixed the error!

Python multiprocessing pool apply_async error

I'm trying to evaluate a number of processes in a multiprocessing pool but keep running into errors and I can't work out why... There's a simplified version of the code below:
from multiprocessing import Pool
class Object_1():
    def add_good_spd_column(self):
        def calculate_correlations(arg1, arg2, arg3):
            return {'a': 1}
        processes = {}
        pool = Pool(processes=6)
        for i in range(1, 10):
            processes[i] = pool.apply_async(calculate_correlations,
                                            args=(arg1, arg2, arg3,))
        correlations = {}
        for i in range(0, 10):
            correlations[i] = processes[i].get()
This returns the following error:
Traceback (most recent call last):
File "./02_results.py", line 116, in <module>
correlations[0] = processes[0].get()
File "/opt/anaconda3/lib/python3.5/multiprocessing/pool.py", line 608, in get
raise self._value
File "/opt/anaconda3/lib/python3.5/multiprocessing/pool.py", line 385, in _handle_tasks
put(task)
File "/opt/anaconda3/lib/python3.5/multiprocessing/connection.py", line 206, in send
self._send_bytes(ForkingPickler.dumps(obj))
File "/opt/anaconda3/lib/python3.5/multiprocessing/reduction.py", line 50, in dumps
cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'SCADA.add_good_spd_column.<locals>.calculate_correlations'
When I call the following:
processes[0].successful()
I get the following error:
Traceback (most recent call last):
File "./02_results.py", line 116, in <module>
print(processes[0].successful())
File "/opt/anaconda3/lib/python3.5/multiprocessing/pool.py", line 595, in successful
assert self.ready()
AssertionError
Is this because the process isn't actually finished before the .get() is called? The function being evaluated just returns a dictionary which should definitely be pickle-able...
Cheers,
The error is occurring because pickling a function nested in another function is not supported, and multiprocessing.Pool needs to pickle the function you pass as an argument to apply_async in order to execute it in a worker process. You have to move the function to the top level of the module, or make it an instance method of the class. Keep in mind that if you make it an instance method, the instance of the class itself must also be picklable.
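The difference between a module-level function and a nested one can be seen with pickle alone, without starting any worker processes. The following is a minimal sketch, not the asker's actual code: the function names are borrowed from the question, the bodies are placeholders, and add_good_spd_column is written here as a plain function (an assumption made to keep the example short).

```python
import pickle


def calculate_correlations(arg1, arg2, arg3):
    # Module-level function: pickle stores a reference to its qualified name.
    return {'a': 1}


def add_good_spd_column():
    # Nested function: it only exists inside this call, so pickle has no
    # importable name to refer to.
    def nested_calculate_correlations(arg1, arg2, arg3):
        return {'a': 1}
    return nested_calculate_correlations


# Pickling the module-level function works and it can be loaded again.
restored = pickle.loads(pickle.dumps(calculate_correlations))

# Pickling the nested function fails; this is the same failure that
# apply_async reports when it tries to send the task to a worker.
try:
    pickle.dumps(add_good_spd_column())
    nested_error = None
except (AttributeError, pickle.PicklingError) as exc:
    nested_error = exc

print(restored(1, 2, 3))
print(type(nested_error).__name__)
```

Both exception types are caught because the exact type raised for a local object depends on the Python version.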
And yes, the assertion error when calling successful() occurs because you're calling it before a result is ready. From the docs:
successful()
Return whether the call completed without raising an exception. Will raise AssertionError if the result is not ready.
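One way to avoid the error from successful() is to wait for the result first. The sketch below assumes a thread-backed pool (multiprocessing.pool.ThreadPool) so that it runs as a plain script; it returns the same kind of AsyncResult as Pool.apply_async. slow_task is a hypothetical stand-in for calculate_correlations. From Python 3.7 onwards a not-ready result raises ValueError instead of AssertionError, so both are caught.

```python
import time
from multiprocessing.pool import ThreadPool


def slow_task():
    # Hypothetical stand-in for the real work; sleeps so the result
    # is not ready immediately after submission.
    time.sleep(0.5)
    return {'a': 1}


pool = ThreadPool(processes=1)
result = pool.apply_async(slow_task)

# Asking too early raises AssertionError (Python < 3.7) or ValueError (3.7+).
try:
    result.successful()
    raised_early = False
except (AssertionError, ValueError):
    raised_early = True

# wait() blocks until the task has finished; after that the checks are safe.
result.wait()
finished = result.ready()
ok = result.successful()
value = result.get()

pool.close()
pool.join()
```

Calling get() directly also blocks until the result is available, so the explicit wait() is only needed when successful() has to be called before get().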

Multiprocessing in Python 3.4 does not work with imported module

I am using multiprocessing in Python 3.4.3 to speed up my code. I have a problem with getting back my results. I have tried the following simple code, which works just fine.
import numpy
from multiprocessing import Pool
from functools import partial
from OpenDutchWordnet import Wn_grid_parser, le, les, synset, relation
def funct(arg1, value):
    return arg1 * value
if __name__ == '__main__':
    #------FOR TESTING-------
    t=[1,2,3,4]
    arg1=4
    pool=Pool(processes=1)
    func=partial(funct, arg1)
    print("func: ", func)
    m4=pool.map(func,t)
    print(m4)
    #------/FOR TESTING-------
Of course, I would like to run more than one process. The code I would actually like to run is the following:
import numpy
from multiprocessing import Pool
from functools import partial
from OpenDutchWordnet import Wn_grid_parser, le, les, synset, relation
def funct2(arg1, value):
    return arg1.get_relations(value)
if __name__ == '__main__':
    myparser= Wn_grid_parser(Wn_grid_parser.odwn)
    l_sensesofwoord = myparser.lemma_get_generator("man")
    sense=l_sensesofwoord[0]
    synsetid_sense=sense.get_synset_id()
    t=["has_hyperonym", "has_holonym"]
    arg1=myparser.synsets_find_synset(synsetid_sense)
    pool=Pool()  # create the worker pool
    f=partial(funct2, arg1)
    print("f is: ", f)
    m1=pool.map(f,t)
When running this code, I get the following error message.
f is: functools.partial(<function funct2 at 0x00000000046D5378>, <synset.Synset object at 0x000000005011DDA0>)
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "C:\Python34\lib\multiprocessing\pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "C:\Python34\lib\multiprocessing\pool.py", line 44, in mapstar
return list(map(*args))
File "C:\Users\UTRSB\AppData\Local\Continuum\Anaconda3\mycode\multi.py", line 14, in funct2
return numpy.asarray(arg1.get_relations(value))
File "C:\Python34\lib\site-packages\OpenDutchWordnet\synset.py", line 98, in get_relations
for relation_el in self.synset_el.iterfind(xml_query)]
File "C:\Python34\lib\site-packages\OpenDutchWordnet\synset.py", line 97, in <listcomp>
return [Relation(relation_el)
File "C:\Python34\lib\site-packages\lxml\_elementpath.py", line 156, in select
for elem in result:
File "C:\Python34\lib\site-packages\lxml\_elementpath.py", line 88, in select
for elem in result:
File "C:\Python34\lib\site-packages\lxml\_elementpath.py", line 89, in select
for e in elem.iterchildren(tag):
File "lxml.etree.pyx", line 1363, in lxml.etree._Element.iterchildren (src\lxml\lxml.etree.c:50501)
File "lxml.etree.pyx", line 2730, in lxml.etree.ElementChildIterator.__cinit__ (src\lxml\lxml.etree.c:66739)
File "apihelpers.pxi", line 24, in lxml.etree._assertValidNode (src\lxml\lxml.etree.c:14133)
AssertionError: invalid Element proxy at 53353160
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:/Users/UTRSB/AppData/Local/Continuum/Anaconda3/mycode/multi.py", line 52, in <module>
m1=pool.map(f,t)
File "C:\Python34\lib\multiprocessing\pool.py", line 260, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "C:\Python34\lib\multiprocessing\pool.py", line 599, in get
raise self._value
AssertionError: invalid Element proxy at 53353160
I have also tried another way: result=pool.apply_async(geefAlleGloss,[p])
This works just fine, but when I call get() to obtain the results with answer=result.get(), I end up with the same error.
I think the error is somewhere in the map function. At first, I thought it had something to do with the modules imported from OpenDutchWordnet. But since the partial function works, the error should be caused by the get() and/or map() function.
I would appreciate any help.

Why does Python refuse to execute this code in a new subprocess?

I am trying to make a very simple application that allows for people to define their own little python scripts within the application. I want to execute the code in a new process to make it easy to kill later. Unfortunately, Python keeps giving me the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/dist-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 540, in runfile
execfile(filename, namespace)
File "/home/skylion/Documents/python_exec test.py", line 19, in <module>
code_process = Process(target=exec_, args=(user_input_code))
File "/usr/lib/python2.7/multiprocessing/process.py", line 104, in __init__
self._args = tuple(args)
TypeError: 'code' object is not iterable
>>>
My code is posted below
user_input_string = '''
import os
world_name='world'
robot_name='default_body + os.path.sep'
joint_names=['hingejoint0', 'hingejoint1', 'hingejoint2', 'hingejoint3', 'hingejoint4', 'hingejoint5', 'hingejoint6', 'hingejoint7', 'hingejoint8']
print(joint_names)
'''
def exec_(arg):
    exec(arg)
user_input_code = compile(user_input_string, 'user_defined', 'exec')
from multiprocessing import Process
code_process = Process(target=exec_, args=(user_input_code))
code_process.start()
What am I missing? Is there something wrong with my user_input_string? With my compile options? Any help would be appreciated.
I believe args must be a tuple. To create a single-element tuple, add a comma like so: args=(user_input_code,)
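The effect of the trailing comma can be checked without starting a process. This is a small sketch: the compiled snippet is a placeholder for user_input_code from the question, and the call to tuple() mirrors the self._args = tuple(args) line shown in the traceback.

```python
# Placeholder for the compiled user code from the question.
user_input_code = compile("x = 1", "user_defined", "exec")

# Parentheses alone only group the expression: this is still the code object.
wrong_args = (user_input_code)

# The trailing comma is what makes a one-element tuple.
right_args = (user_input_code,)

# Process.__init__ calls tuple(args); a bare code object is not iterable.
try:
    tuple(wrong_args)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(type(wrong_args).__name__)
print(type(right_args).__name__, len(right_args))
print(error_message)
```

With args=(user_input_code,) the tuple() call inside Process succeeds and exec_ receives the code object as its single argument.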
