I am using Python 3.6.
I am trying to use multiprocessing from inside a class method, shown below as SubmitJobsUsingMultiProcessing(), which in turn calls another class method.
I keep running into this error: TypeError: can't pickle _thread.RLock objects.
I have no idea what this means. I suspect that the line below that establishes a connection to a webserver API might be responsible, but I am all at sea as to why.
I am not a proper programmer (I code as part of a portfolio modeling team), so if this is an obvious question please pardon my ignorance, and many thanks in advance.
import sys, time, pickle
import multiprocessing as mp, functools
from collections import OrderedDict
# `Client` comes from the SOAP library in use; its import is not shown in this snippet.

def SubmitJobsUsingMultiProcessing(self, PartitionsOfAnalysisDates, PickleTheJobIdsDict=True):
    if self.ExportSetResult == "SUCCESS":
        NumPools = mp.cpu_count()
        PoolObj = mp.Pool(NumPools)
        userId, clientId, password, expSetName = self.userId, self.clientId, self.password, self.expSetName
        PartialFunctor = functools.partial(self.SubmitJobsAsOfDate, userId=userId, clientId=clientId, password=password, expSetName=expSetName)
        Result = PoolObj.map(self.SubmitJobsAsOfDate, PartitionsOfAnalysisDates)
        BatchJobIDs = OrderedDict((key, val) for Dct in Result for key, val in Dct.items())
        f_pickle = open(self.JobIdPickleFileName, 'wb')
        pickle.dump(BatchJobIDs, f_pickle, -1)
        f_pickle.close()
def SubmitJobsAsOfDate(self, ListOfDatesForBatchJobs, userId, clientId, password, expSetName):
    client = Client(self.url, proxy=self.proxysettings)
    if self.ExportSetResult != "SUCCESS":
        print("The export set creation was not successful...exiting")
        sys.exit()

    BatchJobIDs = OrderedDict()
    NumJobsSubmitted = 0
    CurrentProcessID = mp.current_process()
    for AnalysisDate in ListOfDatesForBatchJobs:
        jobName = "Foo_" + str(AnalysisDate)
        print('Sending job from process : ', CurrentProcessID, ' : ', jobName)
        jobId = client.service.SubmitExportJob(userId, clientId, password, expSetName, AnalysisDate, jobName, False)
        BatchJobIDs[AnalysisDate] = jobId
        NumJobsSubmitted += 1

        # Sleep for 30 secs every 100 jobs
        if NumJobsSubmitted % 100 == 0:
            print('100 jobs have been submitted thus far from process : ', CurrentProcessID, '---Sleeping for 30 secs to avoid the SSL time out error')
            time.sleep(30)

    self.BatchJobIDs = BatchJobIDs
    return BatchJobIDs
Below is the traceback:
Traceback (most recent call last):
File "C:\Program Files\JetBrains\PyCharm 2017.2.3\helpers\pydev\pydevd.py", line 1599, in <module>
globals = debugger.run(setup['file'], None, None, is_module)
File "C:\Program Files\JetBrains\PyCharm 2017.2.3\helpers\pydev\pydevd.py", line 1026, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "C:\Program Files\JetBrains\PyCharm 2017.2.3\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:/Users/trpff85/PycharmProjects/QuantEcon/BDTAPIMultiProcUsingPathos.py", line 289, in <module>
BDTProcessObj.SubmitJobsUsingMultiProcessing(Partitions)
File "C:/Users/trpff85/PycharmProjects/QuantEcon/BDTAPIMultiProcUsingPathos.py", line 190, in SubmitJobsUsingMultiProcessing
Result = PoolObj.map(self.SubmitJobsAsOfDate, PartitionsOfAnalysisDates)
File "C:\Users\trpff85\AppData\Local\Continuum\anaconda3\lib\multiprocessing\pool.py", line 266, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "C:\Users\trpff85\AppData\Local\Continuum\anaconda3\lib\multiprocessing\pool.py", line 644, in get
raise self._value
File "C:\Users\trpff85\AppData\Local\Continuum\anaconda3\lib\multiprocessing\pool.py", line 424, in _handle_tasks
put(task)
File "C:\Users\trpff85\AppData\Local\Continuum\anaconda3\lib\multiprocessing\connection.py", line 206, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "C:\Users\trpff85\AppData\Local\Continuum\anaconda3\lib\multiprocessing\reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
TypeError: can't pickle _thread.RLock objects
I am struggling with a similar problem. There was a bug in Python <= 3.5 whereby _thread.RLock objects did not raise an error when pickled (they cannot be pickled). For a Pool object to work, the function and its arguments must be passed to it from the main process, and this relies on pickling (pickling is a means of serialising objects). In my case the RLock object is somewhere in the logging module. I suspect your code will work fine on 3.5. Good luck. See this bug resolution.
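For what it's worth, the usual way out of this particular trap is to keep self (and anything holding a lock or a live connection) out of what gets pickled: make the worker a module-level function that receives only plain, picklable arguments and builds its own client. A minimal sketch, assuming suds is the SOAP library behind Client (adapt the import and names to your code):

import functools
import multiprocessing as mp

def submit_jobs_as_of_date(dates, url, proxysettings, userId, clientId, password, expSetName):
    # Build the connection inside the worker so it is never pickled.
    from suds.client import Client  # assumption: swap in your actual SOAP library
    client = Client(url, proxy=proxysettings)
    job_ids = {}
    for analysis_date in dates:
        job_name = "Foo_" + str(analysis_date)
        job_ids[analysis_date] = client.service.SubmitExportJob(
            userId, clientId, password, expSetName, analysis_date, job_name, False)
    return job_ids

Inside SubmitJobsUsingMultiProcessing, you would then map over a partial of the module-level function instead of the bound method:

worker = functools.partial(submit_jobs_as_of_date,
                           url=self.url, proxysettings=self.proxysettings,
                           userId=self.userId, clientId=self.clientId,
                           password=self.password, expSetName=self.expSetName)
Result = PoolObj.map(worker, PartitionsOfAnalysisDates)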
Related
I have a very large Web API project using Flask and Python. It is used for testing some electronic hardware automatically.
The program uses some threading in order to run a web UI while a server runs some services (SSH, serial, VISA) among others.
The program was originally coded in Python 2.7 and works just fine with that version. Right now, I am trying to update it to Python 3.8 for obvious reasons.
As I am updating the project, I'm having trouble with the copy library. It is supposed to copy a _thread.RLock object and send it to another thread, but it keeps giving me an error. Here is the traceback that I get:
Traceback (most recent call last):
File "c:\git_files\[...]\nute\route_config\flask_api_testbench.py", line 208, in _hook_run
super(FlaskAPITestbench, self).hook_run()
File "c:\git_files\[...]\nute\core\testbench\base.py", line 291, in hook_run
while self.state_machine():
File "c:\git_files\[...]\nute\core\testbench\base.py", line 304, in state_machine
on_input=self.state_testrun
File "c:\git_files\[...]\nute\core\testbench\base.py", line 380, in wait_for_input_or_testrun
self.hook_load_testrun(config_with_input)
File "c:\git_files\[...]\nute\core\testbench\base.py", line 428, in hook_load_testrun
self.interface.load_testrun(self.load_testrun(config))
File "c:\git_files\[...]\nute\core\testbench\base.py", line 461, in load_testrun
testrun = self.test_loader.load_testrun(config, context_type=self.TestRunContext)
File "c:\git_files\[...]\nute\core\testrun\loader.py", line 89, in load_testrun
testrun_template = process_all_loaders(self.batchers, _process_batcher)
File "c:\git_files\[...]\nute\core\config\loader.py", line 127, in process_all_loaders
return fn(loader)
File "c:\git_files\[...]\nute\core\testrun\loader.py", line 85, in _process_batcher
batcher.batch_testrun(testrun_template, config, context)
File "c:\git_files\[...]\nute\batcher\python_module_batcher.py", line 21, in batch_testrun
batch_module.main(testrun, context)
File "C:\GIT_Files\[...]\pyscripts\script\patest\_batch.py", line 168, in main
test.suite(ImpedanceTest)
File "c:\git_files\[...]\nute\core\testrun\base.py", line 213, in suite
testsuite = testsuite_instance_or_class()
File "c:\git_files\[...]\nute\core\functions\helpers.py", line 233, in __new__
cls._attach_nodes_to(template)
File "c:\git_files\[...]\nute\core\functions\helpers.py", line 271, in _attach_nodes_to
node = root.import_testcase(testcase)
File "c:\git_files\[...]\nute\core\functions\specific.py", line 307, in import_testcase
test_node = testcase.copy(cls=self.__class__)
File "c:\git_files\[...]\nute\core\functions\base.py", line 645, in copy
value = copy(value)
File "c:\users\[...]\.conda\envs\py37\lib\copy.py", line 96, in copy
rv = reductor(4)
TypeError: can't pickle _thread.RLock objects
It works fine in Python 2.7, but not with Python 3.x. I've tried 3.7.10, 3.8.9 and 3.9.6, with the same result.
Here's the implementation of my wrapper around copy:
from copy import copy
...

def copy(self, cls=None):  # instance method wrapping copy.copy
    if cls is None:
        cls = self.__class__
    new_self = cls()
    for key, value in self.__dict__.items():
        # if key == "multithread_lock":
        #     continue
        if self.should_copy_attribute(key, value):
            # Handle recursion by pointing to the new object instead of copying.
            if value is self:
                value = new_self
            else:
                value = copy(value)  # This is where it fails
            new_self.__dict__[key] = value
    return new_self
As you can see from the commented-out part, skipping the copy of any _thread.RLock object makes the program work, but then I need to refresh the web UI manually to see it running, since the thread doesn't work.
Any idea why this works on Python 2.7 but not on newer versions? Thanks in advance.
So I found out that a _thread.RLock() object cannot be copied. I just added a condition to skip copying such objects, and it works fine.
For the web UI not refreshing, I downgraded to a lower version of Flask-SocketIO and it worked just fine.
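For anyone who needs the condition: the C-level lock type has no importable public name, so one approach (assuming standard CPython, where threading.RLock() returns a _thread.RLock instance) is to derive the type from a throwaway instance and share, rather than copy, any attribute of that type. A rough sketch:

import threading
from copy import copy

# Derive the unpicklable C lock type from an instance.
RLOCK_TYPE = type(threading.RLock())

def safe_copy_value(value):
    # Locks can be neither copied nor pickled; share the original instead.
    if isinstance(value, RLOCK_TYPE):
        return value
    return copy(value)

In the copy() method above, value = copy(value) then becomes value = safe_copy_value(value).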
I have a simple Python script that uses the BeautifulSoup (bs4) library and multiprocessing to do some web scraping. Initially the script would not complete because I exceeded the recursion limit, but then I found out that BeautifulSoup trees cannot be pickled and so cause issues with multiprocessing, so I followed one recommendation in the top answer, which was to raise the limit: sys.setrecursionlimit(25000)
This worked fine for a couple of weeks with no issues (as far as I could tell), but today I restarted the script and some of the processes no longer work. I now get this error:
Traceback (most recent call last):
File "C:/Users/user/PycharmProjects/foo/single_items/single_item.py", line 243, in <module>
Process(target=instance.constant_thread).start()
File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\process.py", line 112, in start
self._popen = self._Popen(self)
File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
reduction.dump(process_obj, to_child)
File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\site-packages\bs4\element.py", line 1449, in __getattr__
"'%s' object has no attribute '%s'" % (self.__class__, tag))
MemoryError: stack overflow
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
I am not sure what it means, but here is a pseudocode example of the script I am running:
class foo:
    def __init__(self, url):
        self.url = url

    def constant_scrape(self):
        while True:
            rq = make_get_request(self.url)
            soup = BeautifulSoup(rq)

if __name__ == '__main__':
    sys.setrecursionlimit(25000)

    url_list = [...]
    for url in url_list:
        instance = foo(url)
        Process(target=instance.constant_scrape).start()
Update 1:
It seems that it is the same URLs that crash every time, even though each URL is of (seemingly) the same HTML format as the URLs that do not crash.
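One commonly suggested alternative to raising the recursion limit is to make sure nothing BeautifulSoup-related is ever pickled: use a module-level function as the Process target and reduce each soup to plain built-ins inside the worker. A rough sketch of that pattern (requests, html.parser and the extracted field are assumptions, not the actual code):

import requests
from bs4 import BeautifulSoup
from multiprocessing import Process

def constant_scrape(url):
    while True:
        rq = requests.get(url)
        soup = BeautifulSoup(rq.text, 'html.parser')
        # Keep only plain strings; the soup never leaves this process.
        title = soup.title.string if soup.title else None
        print(url, title)

if __name__ == '__main__':
    url_list = ['https://example.com']  # placeholder
    for url in url_list:
        Process(target=constant_scrape, args=(url,)).start()

Because the target is a plain function and the only argument is a string, nothing that touches the parse tree crosses a process boundary, so sys.setrecursionlimit is no longer needed.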
I had a function for making a logger proxy that could be safely passed to multiprocessing workers and log back to the main logger with essentially the following code:
import logging
from multiprocessing.managers import BaseManager

class SimpleGenerator:
    def __init__(self, obj): self._obj = obj
    def __call__(self): return self._obj

def get_logger_proxy(logger):
    class LoggerManager(BaseManager): pass
    logger_generator = SimpleGenerator(logger)
    LoggerManager.register('logger', callable=logger_generator)
    logger_manager = LoggerManager()
    logger_manager.start()
    logger_proxy = logger_manager.logger()
    return logger_proxy

logger = logging.getLogger('test')
logger_proxy = get_logger_proxy(logger)
This worked great on python 2.7 through 3.7. I could pass the resulting logger_proxy to workers and they would log information, which would then be properly sent back to the main logger.
However, on python 3.8.2 (and 3.8.0) I get the following:
Traceback (most recent call last):
File "junk.py", line 20, in <module>
logger_proxy = get_logger_proxy(logger)
File "junk.py", line 13, in get_logger_proxy
logger_manager.start()
File "/anaconda3/envs/py3.8/lib/python3.8/multiprocessing/managers.py", line 579, in start
self._process.start()
File "/anaconda3/envs/py3.8/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/anaconda3/envs/py3.8/lib/python3.8/multiprocessing/context.py", line 283, in _Popen
return Popen(process_obj)
File "/anaconda3/envs/py3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/anaconda3/envs/py3.8/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/anaconda3/envs/py3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File "/anaconda3/envs/py3.8/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'get_logger_proxy.<locals>.LoggerManager'
So it seems they changed something about ForkingPickler such that it can no longer handle the closure-local class in my get_logger_proxy function.
My question is: how can I fix this to work in Python 3.8? Or is there a better way to get a logger proxy that will work in Python 3.8 the way this one did for previous versions?
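The traceback goes through popen_spawn_posix, which hints at the underlying change: Python 3.8 switched the default multiprocessing start method on macOS from fork to spawn, so the manager process must now pickle things that fork never needed to. A class defined inside a function cannot be pickled by reference, so one workaround is to hoist it to module level. A minimal sketch, under the assumption that the local class is the only blocker:

import logging
from multiprocessing.managers import BaseManager

class SimpleGenerator:
    def __init__(self, obj): self._obj = obj
    def __call__(self): return self._obj

# Module-level, so spawn can pickle a reference to it by qualified name.
class LoggerManager(BaseManager):
    pass

def get_logger_proxy(logger):
    LoggerManager.register('logger', callable=SimpleGenerator(logger))
    logger_manager = LoggerManager()
    logger_manager.start()
    return logger_manager.logger()

logger = logging.getLogger('test')
logger_proxy = get_logger_proxy(logger)

Loggers pickle by name in Python 3, so the registered callable should survive the spawn; anything else unpicklable reachable from the manager would still fail.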
Using Python 3.x, I am trying to iterate over a dictionary of datasets (NetCDF4 datasets). They are just files...
I want to examine each dataset in a separate process:
import multiprocessing as mp

def DoProcessWork(datasetId, dataset):
    parameter = dataset.variables["so2"]
    print(parameter[0, 0, 0, 0])

if __name__ == '__main__':
    mp.set_start_method('spawn')

    processes = []
    for key, dataset in datasets.items():
        p = mp.Process(target=DoProcessWork, args=(key, dataset,))
        p.start()
        processes.append(p)
When I run my program, I get an error saying the object is not picklable:
File "C:\Program Files (x86)\Python36-32\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "C:\Program Files (x86)\Python36-32\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Program Files (x86)\Python36-32\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Program Files (x86)\Python36-32\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
reduction.dump(process_obj, to_child)
File "C:\Program Files (x86)\Python36-32\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
File "netCDF4\_netCDF4.pyx", line 1992, in netCDF4._netCDF4.Dataset.__reduce__ (netCDF4\_netCDF4.c:16805)
NotImplementedError: Dataset is not picklable
What am I doing wrong? How can I fix this?
Could it be that opening the file is done on another process, and so I am getting an error because I am trying to pass data loaded in one process to another process?
multiprocessing needs to serialize (pickle) the inputs in order to pass them to the new process, which will run DoProcessWork. In your case the dataset object is the problem; see the list of what can be pickled.
A possible workaround would be to use multiprocessing with another function that opens the dataset file itself and then calls DoProcessWork on it.
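A minimal sketch of that workaround, assuming each dataset can be identified by a (picklable) file path and reopened in the child:

import multiprocessing as mp
from netCDF4 import Dataset

def DoProcessWorkFromFile(datasetId, path):
    # Open the file inside the worker; only the path string is pickled.
    dataset = Dataset(path)
    parameter = dataset.variables["so2"]
    print(parameter[0, 0, 0, 0])
    dataset.close()

if __name__ == '__main__':
    mp.set_start_method('spawn')
    dataset_paths = {"ds1": "file1.nc"}  # hypothetical: paths, not open Datasets
    processes = []
    for key, path in dataset_paths.items():
        p = mp.Process(target=DoProcessWorkFromFile, args=(key, path))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()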
I tried to copy an example from this multiprocessing lecture by Jesse Noller (as recommended in another SO post): http://pycon.blip.tv/file/1947354?filename=Pycon-IntroductionToMultiprocessingInPython630.mp4
But for some reason I'm getting an error, as though it's ignoring my function definitions.
I'm on Windows XP (win32), which I know has restrictions with regard to the multiprocessing library in 2.6 that require everything to be pickleable.
from multiprocessing import Process
import time

def sleeper(wait):
    print 'Sleeping for ' + wait + ' seconds'
    time.sleep(wait)
    print 'Sleeping complete'

def doIT():
    p = Process(target=sleeper, args=(9,))
    p.start()
    time.sleep(5)
    p.join()

if __name__ == '__main__':
    doIT()
Output:
Evaluating mypikklez.py
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Python26\lib\multiprocessing\forking.py", line 342, in main
self = load(from_parent)
File "C:\Python26\lib\pickle.py", line 1370, in load
return Unpickler(file).load()
File "C:\Python26\lib\pickle.py", line 858, in load
dispatch[key](self)
File "C:\Python26\lib\pickle.py", line 1090, in load_global
klass = self.find_class(module, name)
File "C:\Python26\lib\pickle.py", line 1126, in find_class
klass = getattr(mod, name)
AttributeError: 'module' object has no attribute 'sleeper'
The error causing the issue is: AttributeError: 'module' object has no attribute 'sleeper'.
As simple a function as it is, I can't understand what the holdup would be.
This is just for self-teaching purposes of basic concepts; I'm not trying to pre-optimize any real-world issue.
Thanks.
It seems from the traceback that you are running the code directly in the Python interpreter (REPL).
Don't do that. Save the code in a file and run it from the file instead, with the command:
python myfile.py
That will solve your issue.
As an unrelated note, this line is wrong:
print 'Sleeping for ' + wait + ' seconds'
It should be:
print 'Sleeping for %d seconds' % (wait,)
This is because you can't concatenate str and int objects (Python is strongly typed).