How should I write a logger proxy? - python

I had a function for making a logger proxy that could be safely passed to multiprocessing workers and log back to the main logger with essentially the following code:
import logging
from multiprocessing.managers import BaseManager

class SimpleGenerator:
    def __init__(self, obj): self._obj = obj
    def __call__(self): return self._obj

def get_logger_proxy(logger):
    class LoggerManager(BaseManager): pass
    logger_generator = SimpleGenerator(logger)
    LoggerManager.register('logger', callable=logger_generator)
    logger_manager = LoggerManager()
    logger_manager.start()
    logger_proxy = logger_manager.logger()
    return logger_proxy

logger = logging.getLogger('test')
logger_proxy = get_logger_proxy(logger)
This worked great on Python 2.7 through 3.7. I could pass the resulting logger_proxy to workers and they would log information, which was then properly sent back to the main logger.
However, on Python 3.8.2 (and 3.8.0) I get the following:
Traceback (most recent call last):
File "junk.py", line 20, in <module>
logger_proxy = get_logger_proxy(logger)
File "junk.py", line 13, in get_logger_proxy
logger_manager.start()
File "/anaconda3/envs/py3.8/lib/python3.8/multiprocessing/managers.py", line 579, in start
self._process.start()
File "/anaconda3/envs/py3.8/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/anaconda3/envs/py3.8/lib/python3.8/multiprocessing/context.py", line 283, in _Popen
return Popen(process_obj)
File "/anaconda3/envs/py3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/anaconda3/envs/py3.8/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/anaconda3/envs/py3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File "/anaconda3/envs/py3.8/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'get_logger_proxy.<locals>.LoggerManager'
So it seems that they changed something about ForkingPickler that makes it unable to handle the closure in my get_logger_proxy function.
My question is: how can I fix this to work in Python 3.8? Or is there a better way to get a logger proxy that will work in Python 3.8 the way this one did for previous versions?
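A minimal sketch of one possible workaround, under the assumption that the only problem is the class defined inside get_logger_proxy (local classes cannot be pickled by reference): move LoggerManager to module level and guard the top-level calls with __main__ so that everything crossing the process boundary stays picklable. This is a sketch, not verified against 3.8's spawn start method:

import logging
from multiprocessing.managers import BaseManager

class SimpleGenerator:
    def __init__(self, obj): self._obj = obj
    def __call__(self): return self._obj

# Module-level class: picklable by reference, unlike the local class in get_logger_proxy.
class LoggerManager(BaseManager): pass

def get_logger_proxy(logger):
    LoggerManager.register('logger', callable=SimpleGenerator(logger))
    logger_manager = LoggerManager()
    logger_manager.start()
    return logger_manager.logger()

if __name__ == '__main__':
    # The guard matters under the spawn start method, which re-imports this module in the child.
    logger = logging.getLogger('test')
    logger_proxy = get_logger_proxy(logger)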

Related

BS4 MemoryError: stack overflow and EOFError: Ran out of input when using multiprocessing in python

I have a simple Python script that uses BS4 (BeautifulSoup) and multiprocessing to do some web scraping. Initially the script would not complete because I exceeded the recursion limit, but then I found out that BeautifulSoup trees cannot be pickled, which causes issues with multiprocessing, so I followed one recommendation in the top answer, which was to do the following: sys.setrecursionlimit(25000)
This worked fine for a couple of weeks with no issues (as far as I could tell), but today I restarted the script and some of the processes fail.
I now get this error:
Traceback (most recent call last):
File "C:/Users/user/PycharmProjects/foo/single_items/single_item.py", line 243, in <module>
Process(target=instance.constant_thread).start()
File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\process.py", line 112, in start
self._popen = self._Popen(self)
File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
reduction.dump(process_obj, to_child)
File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\site-packages\bs4\element.py", line 1449, in __getattr__
"'%s' object has no attribute '%s'" % (self.__class__, tag))
MemoryError: stack overflow
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
I am not sure what it means, but here is a pseudocode example of the script I have running:
import sys
from multiprocessing import Process
from bs4 import BeautifulSoup

class foo:
    def __init__(self, url):
        self.url = url

    def constant_scrape(self):
        while True:
            rq = make_get_request(self.url)  # placeholder for the actual HTTP request
            soup = BeautifulSoup(rq)

if __name__ == '__main__':
    sys.setrecursionlimit(25000)
    url_list = [...]
    for url in url_list:
        instance = foo(url)
        Process(target=instance.constant_scrape).start()
Update 1:
It seems to be the same URLs that crash every time, even though each URL has (seemingly) the same HTML format as the URLs that do not crash.
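For what it's worth, one direction that avoids raising the recursion limit at all (a sketch, not a tested fix for this script) is to make sure no BeautifulSoup objects are ever stored on an object that gets pickled when a Process starts: pass only the URL string to a module-level function and keep the soup, and anything extracted from it, entirely inside the child. The requests call and the title extraction below are stand-ins for whatever make_get_request and the real scraping do:

from multiprocessing import Process

import requests
from bs4 import BeautifulSoup


def scrape_forever(url):
    # Runs entirely in the child process, so the soup never crosses a process boundary.
    while True:
        rq = requests.get(url)
        soup = BeautifulSoup(rq.text, "html.parser")
        # Keep only plain Python data (strings, numbers, dicts) out of the soup.
        title = soup.title.get_text(strip=True) if soup.title else None
        print(url, title)


if __name__ == "__main__":
    url_list = ["https://example.com"]  # hypothetical placeholder URLs
    for url in url_list:
        # Only the picklable url string is sent to the child.
        Process(target=scrape_forever, args=(url,)).start()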

Why can't a CUDA model be initialized in the __init__ method of a class that inherits multiprocessing.Process?

Here is my code:
from MyDetector import Helmet_Detector
from multiprocessing import Process

class Processor(Process):
    def __init__(self):
        super().__init__()
        self.helmet_detector = Helmet_Detector()

    def run(self):
        print(111)

if __name__ == '__main__':
    p = Processor()
    p.start()
As you can see, the class 'Processor' inherits multiprocessing.Process, and Helmet_Detector is a YOLO model using CUDA. But when I ran it, the error occurred as follows:
THCudaCheck FAIL file=C:\w\1\s\tmp_conda_3.7_075911\conda\conda-bld\pytorch_1579075223148\work\torch/csrc/generic/StorageSharing.cpp line=245 error=71 : operation not supported
Traceback (most recent call last):
File "E:/python-tasks/WHU-CSTECH/Processor.py", line 17, in <module>
p.start()
File "C:\Anaconda\lib\multiprocessing\process.py", line 112, in start
self._popen = self._Popen(self)
File "C:\Anaconda\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Anaconda\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Anaconda\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
reduction.dump(process_obj, to_child)
File "C:\Anaconda\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
File "C:\Anaconda\lib\site-packages\torch\multiprocessing\reductions.py", line 242, in reduce_tensor
event_sync_required) = storage._share_cuda_()
RuntimeError: cuda runtime error (71) : operation not supported at C:\w\1\s\tmp_conda_3.7_075911\conda\conda-bld\pytorch_1579075223148\work\torch/csrc/generic/StorageSharing.cpp:245
Then I tried to initialize the Helmet_Detector in the run method instead:
def run(self):
    print(111)
    self.helmet_detector = Helmet_Detector()
No error occurred. Could anyone please tell me the reason for this and how I could solve the problem? Thank you!
The error occurs because multiprocessing requires Process objects to be picklable so that their data can be transferred to the process being created, i.e. the object is serialised and deserialised. To overcome the issue, lazily instantiate the Helmet_Detector object (hint: try a property in Python); see the sketch below.
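A minimal sketch of that lazy-instantiation idea using a property (same names as in the question; the point is that nothing CUDA-related exists on the instance when it is pickled, and the model is only built once run() executes in the child process):

from multiprocessing import Process

from MyDetector import Helmet_Detector


class Processor(Process):
    def __init__(self):
        super().__init__()
        self._helmet_detector = None  # no CUDA objects created (or pickled) here

    @property
    def helmet_detector(self):
        # Built lazily on first access, which only happens inside run(), i.e. in the child.
        if self._helmet_detector is None:
            self._helmet_detector = Helmet_Detector()
        return self._helmet_detector

    def run(self):
        print(111)
        detector = self.helmet_detector  # the model is constructed here


if __name__ == '__main__':
    p = Processor()
    p.start()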
Edit:
As per the comment by @jodag, you should use PyTorch's multiprocessing module instead of the standard multiprocessing library.
Example:
import torch.multiprocessing as mp

class Processor(mp.Process):
    ...

multiprocessing pickling error: _pickle.PicklingError: Can't pickle <function myProcess at 0x02B2D420>: it's not the same object as __main__.myProcess

I'm reading and applying code from a Python book, and I can't get multiprocessing to work in the simple example below:
import multiprocessing

def myProcess():
    print("Currently Executing Child Process")
    print("This process has it's own instance of the GIL")

print("Executing Main Process")
print("Creating Child Process")
myProcess = multiprocessing.Process(target=myProcess)
myProcess.start()
myProcess.join()
print("Child Process has terminated, terminating main process")
My platform is Windows 10 64-bit, and using if __name__ == "__main__": doesn't work in this case. What's wrong here? This code should work in Python 3.5 and above; the Python version I use is 3.7. Full error message below:
C:\Users\Xian\AppData\Local\Programs\Python\Python37-32\python.exe "C:/OneDrive/Utilizing sub-process.py"
Traceback (most recent call last):
File "C:/OneDrive/Utilizing sub-process.py", line 25, in <module>
myProcess.start()
File "C:\Users\Xian\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\process.py", line 112, in start
self._popen = self._Popen(self)
File "C:\Users\Xian\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\Xian\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Users\Xian\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
reduction.dump(process_obj, to_child)
File "C:\Users\Xian\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function myProcess at 0x02B2D420>: it's not the same object as __main__.myProcess
Try this:
def test():
    import multiprocessing
    multiprocessing.set_start_method("fork")
    p = multiprocessing.Process(target=xxx)
    p.start()
See the Python docs on multiprocessing contexts and start methods.
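Note that the "fork" start method is not available on Windows, which is the asker's platform. Another thing worth checking (a guess based on the error text rather than a confirmed diagnosis) is that the script rebinds the module-level name myProcess to the Process object, so when the target function is pickled, __main__.myProcess no longer refers to it. A minimal sketch that renames the variable and keeps the __main__ guard:

import multiprocessing


def myProcess():
    print("Currently Executing Child Process")
    print("This process has it's own instance of the GIL")


if __name__ == "__main__":
    print("Executing Main Process")
    print("Creating Child Process")
    # Different variable name, so __main__.myProcess still refers to the function.
    child = multiprocessing.Process(target=myProcess)
    child.start()
    child.join()
    print("Child Process has terminated, terminating main process")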

can't pickle _thread.RLock objects when using a webservice

I am using python 3.6
I am trying to use multiprocessing from inside a class method shown below by the name SubmitJobsUsingMultiProcessing() which further calls another class method in turn.
I keep running into this error: TypeError: can't pickle _thread.RLock objects.
I have no idea what this means. I have a suspicion that the line below trying to establish a connection to a webserver API might be responsible, but I am at a loss to understand why.
I am not a proper programmer(code as a part of a portfolio modeling team) so if this is an obvious question please pardon my ignorance and many thanks in advance.
import multiprocessing as mp, functools

def SubmitJobsUsingMultiProcessing(self, PartitionsOfAnalysisDates, PickleTheJobIdsDict=True):
    if self.ExportSetResult == "SUCCESS":
        NumPools = mp.cpu_count()
        PoolObj = mp.Pool(NumPools)
        userId, clientId, password, expSetName = self.userId, self.clientId, self.password, self.expSetName
        PartialFunctor = functools.partial(self.SubmitJobsAsOfDate, userId=userId, clientId=clientId, password=password, expSetName=expSetName)
        Result = PoolObj.map(self.SubmitJobsAsOfDate, PartitionsOfAnalysisDates)
        BatchJobIDs = OrderedDict((key, val) for Dct in Result for key, val in Dct.items())
        f_pickle = open(self.JobIdPickleFileName, 'wb')
        pickle.dump(BatchJobIDs, f_pickle, -1)
        f_pickle.close()

def SubmitJobsAsOfDate(self, ListOfDatesForBatchJobs, userId, clientId, password, expSetName):
    client = Client(self.url, proxy=self.proxysettings)
    if self.ExportSetResult != "SUCCESS":
        print("The export set creation was not successful...exiting")
        sys.exit()
    BatchJobIDs = OrderedDict()
    NumJobsSubmitted = 0
    CurrentProcessID = mp.current_process()
    for AnalysisDate in ListOfDatesForBatchJobs:
        jobName = "Foo_" + str(AnalysisDate)
        print('Sending job from process : ', CurrentProcessID, ' : ', jobName)
        jobId = client.service.SubmitExportJob(userId, clientId, password, expSetName, AnalysisDate, jobName, False)
        BatchJobIDs[AnalysisDate] = jobId
        NumJobsSubmitted += 1
        # Sleep for 30 secs every 100 jobs
        if NumJobsSubmitted % 100 == 0:
            print('100 jobs have been submitted thus far from process : ', CurrentProcessID, '---Sleeping for 30 secs to avoid the SSL time out error')
            time.sleep(30)
    self.BatchJobIDs = BatchJobIDs
    return BatchJobIDs
Below is the trace:
Traceback (most recent call last):
File "C:\Program Files\JetBrains\PyCharm 2017.2.3\helpers\pydev\pydevd.py", line 1599, in <module>
globals = debugger.run(setup['file'], None, None, is_module)
File "C:\Program Files\JetBrains\PyCharm 2017.2.3\helpers\pydev\pydevd.py", line 1026, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "C:\Program Files\JetBrains\PyCharm 2017.2.3\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:/Users/trpff85/PycharmProjects/QuantEcon/BDTAPIMultiProcUsingPathos.py", line 289, in <module>
BDTProcessObj.SubmitJobsUsingMultiProcessing(Partitions)
File "C:/Users/trpff85/PycharmProjects/QuantEcon/BDTAPIMultiProcUsingPathos.py", line 190, in SubmitJobsUsingMultiProcessing
Result = PoolObj.map(self.SubmitJobsAsOfDate, PartitionsOfAnalysisDates)
File "C:\Users\trpff85\AppData\Local\Continuum\anaconda3\lib\multiprocessing\pool.py", line 266, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "C:\Users\trpff85\AppData\Local\Continuum\anaconda3\lib\multiprocessing\pool.py", line 644, in get
raise self._value
File "C:\Users\trpff85\AppData\Local\Continuum\anaconda3\lib\multiprocessing\pool.py", line 424, in _handle_tasks
put(task)
File "C:\Users\trpff85\AppData\Local\Continuum\anaconda3\lib\multiprocessing\connection.py", line 206, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "C:\Users\trpff85\AppData\Local\Continuum\anaconda3\lib\multiprocessing\reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
TypeError: can't pickle _thread.RLock objects
I am struggling with a similar problem. There was a bug in <=3.5 whereby _thread.RLock objects did not raise an error when pickled (they cannot actually be pickled). For the Pool object to work, a function and its arguments must be passed to it from the main process, and this relies on pickling (pickling is a means of serialising objects). In my case the RLock object is somewhere in the logging module. I suspect your code will work fine on 3.5. Good luck. See this bug resolution.
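One workaround that often helps in situations like this (a sketch under the assumption that the unpicklable RLock is reached through self, e.g. via a logger or client stored on the instance; it is not the answer's own approach) is to give Pool.map a module-level function that receives only plain, picklable arguments and builds the web-service client inside the worker. The function names below (submit_jobs_for_dates, submit_all) are hypothetical:

import functools
import multiprocessing as mp
from collections import OrderedDict


def submit_jobs_for_dates(dates, url, proxysettings, userId, clientId, password, expSetName):
    # Module-level worker: only the plain, picklable arguments above cross the process boundary.
    # The web-service client is built here, inside the worker, so it (and any RLock it drags
    # along) never has to be pickled. Client is the class used in the question; its import
    # is not shown there.
    client = Client(url, proxy=proxysettings)
    job_ids = OrderedDict()
    for analysis_date in dates:
        job_name = "Foo_" + str(analysis_date)
        job_ids[analysis_date] = client.service.SubmitExportJob(
            userId, clientId, password, expSetName, analysis_date, job_name, False)
    return job_ids


def submit_all(partitions, url, proxysettings, userId, clientId, password, expSetName):
    worker = functools.partial(submit_jobs_for_dates,
                               url=url, proxysettings=proxysettings, userId=userId,
                               clientId=clientId, password=password, expSetName=expSetName)
    with mp.Pool(mp.cpu_count()) as pool:
        results = pool.map(worker, partitions)
    return OrderedDict((k, v) for d in results for k, v in d.items())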

IPython ipengineapp creation with keyword arguments

I am trying to write a script that will start an new engine.
Using some code from IPython source I have:
[engines.py]
def make_engine(**kwargs):
    from IPython.parallel.apps import ipengineapp as app
    app.launch_new_instance(**kwargs)

if __name__ == '__main__':
    make_engine(file='./profiles/security/ipcontroller-engine.json', config='./profiles/e2.py')
If I run this with python engines.py on the command line, I run into a configuration problem and my traceback is:
Traceback (most recent call last):
File "engines.py", line 30, in <module>
make_engine(file='./profiles/security/ipcontroller-engine.json', config='./profiles/e2.py')
File "engines.py", line 20, in make_engine
app.launch_new_instance(**kwargs)
File "/Users/martin/anaconda/lib/python2.7/site-packages/IPython/config/application.py", line 562, in launch_instance
app = cls.instance(**kwargs)
File "/Users/martin/anaconda/lib/python2.7/site-packages/IPython/config/configurable.py", line 354, in instance
inst = cls(*args, **kwargs)
File "<string>", line 2, in __init__
File "/Users/martin/anaconda/lib/python2.7/site-packages/IPython/config/application.py", line 94, in catch_config_error
app.print_help()
File "/Users/martin/anaconda/lib/python2.7/site-packages/IPython/config/application.py", line 346, in print_help
self.print_options()
File "/Users/martin/anaconda/lib/python2.7/site-packages/IPython/config/application.py", line 317, in print_options
self.print_alias_help()
File "/Users/martin/anaconda/lib/python2.7/site-packages/IPython/config/application.py", line 281, in print_alias_help
cls = classdict[classname]
KeyError: 'BaseIPythonApplication'
If I do a super ugly hack like the following, it works:
def make_engine():
    from IPython.parallel.apps import ipengineapp as app
    app.launch_new_instance()

if __name__ == '__main__':
    from sys import argv
    argv = ['--file=./profiles/security/ipcontroller-engine.json', '--config=./profiles/e2.py']  # OUCH this is ugly!
    make_engine()
Why can't I pass the keyword arguments in the launch_new_instance method?
What are the right keyword arguments?
Where can I get the entry point to entering my configuration options?
Thanks,
Martin
The way to instantiate a new ipengine using the IPEngineApp api is:
def make_engine():
    from IPython.parallel.apps.ipengineapp import IPEngineApp
    lines1 = "a_command()"
    app1 = IPEngineApp()
    app1.url_file = './profiles/security/ipcontroller-engine.json'
    app1.cluster_id = 'e2'
    app1.startup_command = lines1
    app1.init_engine()
    app1.start()
However, this starts a new ipengine process that takes control of the script's execution, so there is no way to start multiple engines in the same script using this method.
Thus I had to fall back on the subprocess module to spawn all additional new ipengines:
import subprocess
import os

pids = []
for num in range(1, 3):
    args = ["ipengine", "--config", os.path.abspath("./profiles/e%d.py" % num),
            "--file", os.path.abspath("./profiles/security/ipcontroller-engine.json")]
    pid = subprocess.Popen(args).pid
    pids.append(pid)
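If the parent script also needs to wait for the engines (or shut them down), a small variation is to keep the subprocess.Popen objects instead of only their pids; a sketch:

import subprocess
import os

procs = []
for num in range(1, 3):
    args = ["ipengine", "--config", os.path.abspath("./profiles/e%d.py" % num),
            "--file", os.path.abspath("./profiles/security/ipcontroller-engine.json")]
    procs.append(subprocess.Popen(args))

try:
    for proc in procs:
        proc.wait()          # block until each engine exits
except KeyboardInterrupt:
    for proc in procs:
        proc.terminate()     # shut the engines down on Ctrl-C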
