How to do multiprocessing with class instances - python

The program is designed to set up a process-creation listener on various IPs on the network. The code is:
import multiprocessing
from wmi import WMI

dynaIP = ['192.168.165.1', '192.168.165.2', '192.168.165.3', '192.168.165.4']

class WindowsMachine:
    def __init__(self, ip):
        self.ip = ip
        self.connection = WMI(self.ip)
        self.created_process = multiprocessing.Process(target=self.monitor_created_process, args=(self.connection,))
        self.created_process.start()

    def monitor_created_process(self, remote_pc):
        while True:
            created_process = remote_pc.Win32_Process.watch_for("creation")
            print('Creation:', created_process.Caption, created_process.ProcessId, created_process.CreationDate)
        return created_process

if __name__ == '__main__':
    for ip in dynaIP:
        print('Running', ip)
        WindowsMachine(ip)
When running the code I get the following error:
Traceback (most recent call last):
File "U:/rmarshall/Work For Staff/ROB/_Python/__Python Projects Code/multipro_instance_stack_question.py", line 26, in <module>
WindowsMachine(ip)
File "U:/rmarshall/Work For Staff/ROB/_Python/__Python Projects Code/multipro_instance_stack_question.py", line 14, in __init__
self.created_process.start()
File "C:\Python33\lib\multiprocessing\process.py", line 111, in start
self._popen = Popen(self)
File "C:\Python33\lib\multiprocessing\forking.py", line 248, in __init__
dump(process_obj, to_child, HIGHEST_PROTOCOL)
File "C:\Python33\lib\multiprocessing\forking.py", line 166, in dump
ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <class 'PyIID'>: attribute lookup builtins.PyIID failed
I have looked at other questions surrounding this issue, but I feel none have clearly explained the workaround for pickling class instances.
Is anyone able to demonstrate this?

The problem here is that multiprocessing pickles the arguments to the process in order to pass them around. The WMI class is not pickleable, so it cannot be passed as an argument to multiprocessing.Process.
If you want this to work, you can either:
switch to using threads instead of processes (see the threading module)
create the WMI object in monitor_created_process
I'd recommend the former, because there doesn't seem to be much use in creating full-blown processes here. A minimal sketch of the threading approach is shown below.
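A rough sketch of the threading variant, assuming the same wmi setup as the question (depending on the pywin32 version, each thread may additionally need pythoncom.CoInitialize() before creating the connection; untested without a Windows/WMI environment):

    import threading
    from wmi import WMI

    dynaIP = ['192.168.165.1', '192.168.165.2', '192.168.165.3', '192.168.165.4']

    class WindowsMachine:
        def __init__(self, ip):
            self.ip = ip
            # Threads share the parent's memory, so nothing has to be pickled.
            self.monitor_thread = threading.Thread(target=self.monitor_created_process)
            self.monitor_thread.start()

        def monitor_created_process(self):
            # Create the WMI connection inside the thread that uses it;
            # COM objects are generally not safe to pass between threads.
            connection = WMI(self.ip)
            while True:
                created_process = connection.Win32_Process.watch_for("creation")
                print('Creation:', created_process.Caption,
                      created_process.ProcessId, created_process.CreationDate)

    if __name__ == '__main__':
        machines = [WindowsMachine(ip) for ip in dynaIP]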

Related

How should I write a logger proxy?

I had a function for making a logger proxy that could be safely passed to multiprocessing workers and log back to the main logger with essentially the following code:
import logging
from multiprocessing.managers import BaseManager

class SimpleGenerator:
    def __init__(self, obj): self._obj = obj
    def __call__(self): return self._obj

def get_logger_proxy(logger):
    class LoggerManager(BaseManager): pass
    logger_generator = SimpleGenerator(logger)
    LoggerManager.register('logger', callable=logger_generator)
    logger_manager = LoggerManager()
    logger_manager.start()
    logger_proxy = logger_manager.logger()
    return logger_proxy

logger = logging.getLogger('test')
logger_proxy = get_logger_proxy(logger)
This worked great on Python 2.7 through 3.7. I could pass the resulting logger_proxy to workers, and they would log information that was then properly sent back to the main logger.
However, on Python 3.8.2 (and 3.8.0) I get the following:
Traceback (most recent call last):
File "junk.py", line 20, in <module>
logger_proxy = get_logger_proxy(logger)
File "junk.py", line 13, in get_logger_proxy
logger_manager.start()
File "/anaconda3/envs/py3.8/lib/python3.8/multiprocessing/managers.py", line 579, in start
self._process.start()
File "/anaconda3/envs/py3.8/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/anaconda3/envs/py3.8/lib/python3.8/multiprocessing/context.py", line 283, in _Popen
return Popen(process_obj)
File "/anaconda3/envs/py3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/anaconda3/envs/py3.8/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/anaconda3/envs/py3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File "/anaconda3/envs/py3.8/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'get_logger_proxy.<locals>.LoggerManager'
So it seems that they changed something about ForkingPickler that makes it unable to handle the closure in my get_logger_proxy function.
My question is, how can I fix this to work in Python 3.8? Or is there a better way to get a logger proxy that will work in Python 3.8 the way this one did for previous versions?
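The traceback goes through popen_spawn_posix, i.e. the manager's server process is being started with the spawn method (the default on macOS since Python 3.8), and spawn pickles classes by qualified name, which fails for a class defined inside a function. A hedged sketch of one workaround consistent with that reading is to hoist the manager class to module level (this assumes the logger itself pickles, which plain getLogger loggers do, by name):

    import logging
    from multiprocessing.managers import BaseManager

    class SimpleGenerator:
        def __init__(self, obj): self._obj = obj
        def __call__(self): return self._obj

    # Module level, so the spawn-based pickler can resolve it by qualified name.
    class LoggerManager(BaseManager): pass

    def get_logger_proxy(logger):
        LoggerManager.register('logger', callable=SimpleGenerator(logger))
        logger_manager = LoggerManager()
        logger_manager.start()
        return logger_manager.logger()

    if __name__ == '__main__':
        logger = logging.getLogger('test')
        logger_proxy = get_logger_proxy(logger)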

Why can't a CUDA model be initialized in the __init__ method of a class that inherits multiprocessing.Process?

Here is my code:
from MyDetector import Helmet_Detector
from multiprocessing import Process

class Processor(Process):
    def __init__(self):
        super().__init__()
        self.helmet_detector = Helmet_Detector()

    def run(self):
        print(111)

if __name__ == '__main__':
    p = Processor()
    p.start()
As you can see, the class Processor inherits multiprocessing.Process, and Helmet_Detector is a YOLO model using CUDA. But when I ran it, the error occurred as follows:
THCudaCheck FAIL file=C:\w\1\s\tmp_conda_3.7_075911\conda\conda-bld\pytorch_1579075223148\work\torch/csrc/generic/StorageSharing.cpp line=245 error=71 : operation not supported
Traceback (most recent call last):
File "E:/python-tasks/WHU-CSTECH/Processor.py", line 17, in <module>
p.start()
File "C:\Anaconda\lib\multiprocessing\process.py", line 112, in start
self._popen = self._Popen(self)
File "C:\Anaconda\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Anaconda\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Anaconda\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
reduction.dump(process_obj, to_child)
File "C:\Anaconda\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
File "C:\Anaconda\lib\site-packages\torch\multiprocessing\reductions.py", line 242, in reduce_tensor
event_sync_required) = storage._share_cuda_()
RuntimeError: cuda runtime error (71) : operation not supported at C:\w\1\s\tmp_conda_3.7_075911\conda\conda-bld\pytorch_1579075223148\work\torch/csrc/generic/StorageSharing.cpp:245
Then I tried to initialize the Helmet_Detector in the run method:
def run(self):
    print(111)
    self.helmet_detector = Helmet_Detector()
No error occurred. Could anyone please tell me the reason for this and how could I solve this problem? Thank you!
The error occurs because multiprocessing requires Process objects to be picklable so that their data can be transferred to the process being created, i.e. serialization and deserialization of the object. A CUDA-backed model created in __init__ becomes part of that pickled state. Suggestion to overcome the issue: lazily instantiate the Helmet_Detector object (hint: try a property in Python); a sketch of that idea follows.
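A hedged sketch of that lazy-instantiation idea (Helmet_Detector is taken from the question; untested without the model):

    from multiprocessing import Process
    from MyDetector import Helmet_Detector  # assumed from the question

    class Processor(Process):
        def __init__(self):
            super().__init__()
            self._helmet_detector = None  # nothing CUDA-related is created here

        @property
        def helmet_detector(self):
            # Created on first access, which happens inside the child process,
            # so the model never has to be pickled across the process boundary.
            if self._helmet_detector is None:
                self._helmet_detector = Helmet_Detector()
            return self._helmet_detector

        def run(self):
            print(111)
            detector = self.helmet_detector  # instantiated here, in the child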
Edit:
As per the comment by @jodag, you should use PyTorch's multiprocessing library instead of the standard multiprocessing library:
Example:
import torch.multiprocessing as mp

class Processor(mp.Process):
    .
    .
    .

Error using pexpect and multiprocessing? error "TypeError: cannot serialize '_io.TextIOWrapper' object"

I have a Python 3.7 script on a Linux machine where I am trying to run a function in multiple threads, but when I try I receive the following error:
Traceback (most recent call last):
File "./test2.py", line 43, in <module>
pt.ping_scanx()
File "./test2.py", line 39, in ping_scanx
par = Parallel(function=self.pingx, parameter_list=list, thread_limit=10)
File "./test2.py", line 19, in __init__
self._x = self._pool.starmap(function, parameter_list, chunksize=1)
File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 276, in starmap
return self._map_async(func, iterable, starmapstar, chunksize).get()
File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 657, in get
raise self._value
File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 431, in _handle_tasks
put(task)
File "/usr/local/lib/python3.7/multiprocessing/connection.py", line 206, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "/usr/local/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
TypeError: cannot serialize '_io.TextIOWrapper' object
This is the sample code that I am using to demonstrate the issue:
#!/usr/local/bin/python3.7
from multiprocessing import Pool
import pexpect  # Used to run SSH for sessions

class Parallel:
    def __init__(self, function, parameter_list, thread_limit=4):
        # Create new pool to hold our jobs
        self._pool = Pool(processes=thread_limit)
        self._x = self._pool.starmap(function, parameter_list, chunksize=1)

class PingTest():
    def __init__(self):
        self._pex = None

    def connect(self):
        self._pex = pexpect.spawn("ssh snorton@127.0.0.1")

    def pingx(self, target_ip, source_ip):
        print("PING {} {}".format(target_ip, source_ip))

    def ping_scanx(self):
        self.connect()
        list = [['8.8.8.8', '96.53.16.93'],
                ['8.8.8.8', '96.53.16.93']]
        par = Parallel(function=self.pingx, parameter_list=list, thread_limit=10)

pt = PingTest()
pt.ping_scanx()
If I don't include the line with pexpect.spawn, the error doesn't happen. Can someone explain why I am getting the error, and suggest a way to fix it?
With multiprocessing.Pool you're actually calling the function in separate processes, not threads. Processes cannot share Python objects unless they are serialized first and transmitted over inter-process communication channels, which is what multiprocessing.Pool does for you behind the scenes, using pickle as the serializer. Since pexpect.spawn opens a terminal device as a file-like TextIOWrapper object, and you store the returned object on the PingTest instance and then pass the bound method self.pingx to Pool.starmap, the pool tries to serialize self, including the pexpect.spawn object in the _pex attribute. That fails because TextIOWrapper does not support serialization.
Since your function is I/O-bound, you should use threading instead, via the multiprocessing.dummy module, for more efficient parallelization and, more importantly in this case, to allow the pexpect.spawn object to be shared across the threads with no serialization needed.
Change:
from multiprocessing import Pool
to:
from multiprocessing.dummy import Pool
Demo: https://repl.it/#blhsing/WiseYoungExperiments
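As a self-contained illustration of the same point, here is a hedged sketch that shares an unpicklable file object across thread-pool workers; the class and file name are made up for the example:

    from multiprocessing.dummy import Pool  # a thread pool with the Pool API

    class Scanner:
        def __init__(self):
            # A TextIOWrapper: unpicklable, so a process-backed Pool would fail here.
            self._log = open('/tmp/scan.log', 'w')

        def ping(self, target_ip, source_ip):
            # Threads share self, so self._log never needs to be serialized.
            self._log.write('PING {} {}\n'.format(target_ip, source_ip))

        def scan(self):
            pairs = [('8.8.8.8', '96.53.16.93'), ('8.8.8.8', '96.53.16.93')]
            with Pool(processes=10) as pool:
                pool.starmap(self.ping, pairs, chunksize=1)

    if __name__ == '__main__':
        Scanner().scan()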

Getting TypeError: can't pickle _thread.lock objects

I am querying MongoDB to get a list of dictionaries, and for each dict in the list I am doing some comparison of values. Based on the result of the comparison, I store the dictionary's values, the comparison result, and other calculated values in a MongoDB collection. I am trying to do this with multiprocessing and am getting this error.
def save_for_doc(doc_id):
    # function to get the fields of doc
    fields = get_fields(doc_id)
    no_of_process = 5
    doc_col_size = 30000
    chunk_size = round(doc_col_size/no_of_process)
    chunk_ranges = range(0, no_of_process*chunk_size, chunk_size)
    processes = [multiprocessing.Process(target=save_similar_docs,
                                         args=(doc_id, client, fields, chunks, chunk_size))
                 for chunks in chunk_ranges]
    for prc in processes:
        prc.start()

def save_similar_docs(doc_id, client, fields, chunks, chunk_size):
    # This function processes the args and saves the results to MongoDB. Does not
    # return anything, as the end result is stored directly.
Below is the error:
File "H:/Desktop/Performance Improvement/With_Process_Pool.py", line 144,
in save_for_doc
prc.start()
File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 105,
in start
self._popen = self._Popen(self)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 223,
in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 322,
in _Popen
return Popen(process_obj)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\popen_spawn_win32.py",
line 65, in __init__
reduction.dump(process_obj, to_child)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\reduction.py", line 60,
in dump
ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle _thread.lock objects
What does this error mean? Please explain, and how can I get past it?
The documentation says that you can't copy a client from a parent process to a child process: the MongoClient object cannot be copied, so you have to create the connection after you fork.
On Unix systems the multiprocessing module spawns processes using fork(). Care must be taken when using instances of MongoClient with fork(). Specifically, instances of MongoClient must not be copied from a parent process to a child process. Instead, the parent process and each child process must create their own instances of MongoClient.
http://api.mongodb.com/python/current/faq.html#id21
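A hedged sketch of that pattern applied to the code above (pymongo's MongoClient; the database and collection names are placeholders):

    import multiprocessing
    from pymongo import MongoClient

    def save_similar_docs(doc_id, fields, chunk, chunk_size):
        # Create the client inside the child process, after the fork/spawn,
        # instead of passing one in from the parent (it holds thread locks).
        client = MongoClient('localhost', 27017)
        collection = client.mydb.similar_docs  # placeholder db/collection names
        # ... compare and save results here ...

    def save_for_doc(doc_id, fields):
        no_of_process = 5
        doc_col_size = 30000
        chunk_size = round(doc_col_size / no_of_process)
        processes = [
            multiprocessing.Process(
                target=save_similar_docs,
                # Note: no client in args, only picklable values.
                args=(doc_id, fields, chunk, chunk_size))
            for chunk in range(0, no_of_process * chunk_size, chunk_size)
        ]
        for prc in processes:
            prc.start()
        for prc in processes:
            prc.join()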

python multiprocessing pickling/manager/misc error (from PMOTW)

I'm having some trouble getting the following code to run in Eclipse on Windows. The code is from Doug Hellmann:
import random
import multiprocessing
import time

class ActivePool:
    def __init__(self):
        super(ActivePool, self).__init__()
        self.mgr = multiprocessing.Manager()
        self.active = self.mgr.list()
        self.lock = multiprocessing.Lock()

    def makeActive(self, name):
        with self.lock:
            self.active.append(name)

    def makeInactive(self, name):
        with self.lock:
            self.active.remove(name)

    def __str__(self):
        with self.lock:
            return str(self.active)

def worker(s, pool):
    name = multiprocessing.current_process().name
    with s:
        pool.makeActive(name)
        print('Activating {} now running {}'.format(name, pool))
        time.sleep(random.random())
        pool.makeInactive(name)

if __name__ == '__main__':
    pool = ActivePool()
    s = multiprocessing.Semaphore(3)
    jobs = [
        multiprocessing.Process(
            target=worker,
            name=str(i),
            args=(s, pool),
        )
        for i in range(10)
    ]
    for j in jobs:
        j.start()
    for j in jobs:
        j.join()
    print('Now running: %s' % str(pool))
I get the following error, which I assume is due to some pickling issue with passing in pool as an argument to Process.
Traceback (most recent call last):
File "E:\Eclipse_Workspace\CodeExamples\FromCodes\CodeTest.py", line 50, in <module>
j.start()
File "C:\Users\Bob\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "C:\Users\Bob\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\Bob\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Users\Bob\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
reduction.dump(process_obj, to_child)
File "C:\Users\Bob\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
File "C:\Users\Bob\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\connection.py", line 939, in reduce_pipe_connection
dh = reduction.DupHandle(conn.fileno(), access)
File "C:\Users\Bob\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\connection.py", line 170, in fileno
self._check_closed()
File "C:\Users\Bob\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\connection.py", line 136, in _check_closed
raise OSError("handle is closed")
OSError: handle is closed
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\Bob\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\spawn.py", line 99, in spawn_main
new_handle = reduction.steal_handle(parent_pid, pipe_handle)
File "C:\Users\Bob\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\reduction.py", line 87, in steal_handle
_winapi.DUPLICATE_SAME_ACCESS | _winapi.DUPLICATE_CLOSE_SOURCE)
PermissionError: [WinError 5] Access is denied
A similar question's answer seems to suggest that I initialize pool with a function call at the top level, but I don't know how to apply that to this example. Do I initialize ActivePool in worker? That seems to defeat the spirit of Hellmann's example.
Another answer suggests I use __getstate__ and __setstate__ to remove unpickleable objects and reconstruct them when unpickling (see the sketch after the solved edit below), but I don't know a good way to do this with proxy objects like Manager, and I actually don't know what the unpickleable object is.
Is there any way I can make this example work with minimal changes? I really wish to understand what is going on under the hood. Thanks!
Edit - Problem Solved:
The pickling issue was pretty obvious in hindsight. The ActivePool's __init__ contained a Manager() object, which seems unpicklable. The code runs normally, as per Hellmann's example, if we remove self.mgr and initialize the list proxy object in one line:
def __init__(self):
    super(ActivePool, self).__init__()
    self.active = multiprocessing.Manager().list()
    self.lock = multiprocessing.Lock()
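For reference, the __getstate__/__setstate__ route mentioned in the question would look roughly like the following sketch: exclude the unpicklable SyncManager itself from the pickled state and keep only the proxy. This is a sketch under those assumptions, not verified on Windows:

    import multiprocessing

    class ActivePool:
        def __init__(self):
            super(ActivePool, self).__init__()
            self.mgr = multiprocessing.Manager()
            self.active = self.mgr.list()
            self.lock = multiprocessing.Lock()

        def __getstate__(self):
            # Drop the SyncManager; the list proxy carries the address of the
            # manager's server process and can reconnect on its own.
            state = self.__dict__.copy()
            del state['mgr']
            return state

        def __setstate__(self, state):
            self.__dict__.update(state)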
Comment: The 'join()' was in the Hellmann example, but I forgot to add it to the code snippet. Any other ideas?
I'm running Linux and it works as expected; Windows behaves differently. Read understanding-multiprocessing-shared-memory-management-locks-and-queues-in-pyt
To determine which parameter of args=(s, pool) raises the error, remove one and use it as a global.
Change:
def worker(s):
    ...

args=(s,),
Note: There is no need to enclose a multiprocessing.Manager().list() with a Lock(). This is not the culprit of your error, though.
Question: Is there any way I can make this example work with minimal changes?
Your __main__ process terminates, and therefore all started processes die at an unpredictable point of execution. Simply add a .join() at the end to make __main__ wait until all processes are done:
for j in jobs:
    j.join()
print('EXIT __main__')
Tested with Python: 3.4.2
