I am trying to parallelize code in Python by using multiprocessing.Process with a Julia function as the target.
The function works fine when I call it directly, i.e. when I execute:
if __name__ == "__main__":
    import julia
    julia.Julia(compiled_modules=False)
    julia.Pkg_jl.func_jl(*args)
However, I get an error when I use the same function as the target of a Process.
This is the code:
from multiprocessing import Process
import julia
julia.Julia(compiled_modules=False)

class JuliaProcess(object):
    ...
    def _wrapper(self, *args):
        ret = julia.Pkg_jl.func_jl(args)
        self.queue.put(ret)  # save the result of the function
    def run(self, *args):
        p = Process(target=self._wrapper, args=args)
        self.processes.append(p)  # store the Process object
        p.start()
    ...

if __name__ == "__main__":
    ...
    Jlproc = JuliaProcess()
    Jlproc.run(some_args)
The error occurs when the Process starts, with the following output:
fatal: error thrown and no exception handler available.
ReadOnlyMemoryError()
unknown function (ip: 0x7f9df81cb8f0)
...
If I try to compile the Julia modules inside the _wrapper function, i.e.:
from multiprocessing import Process
import julia

class JuliaProcess(object):
    ...
    def _wrapper(self, *args):
        julia.Julia(compiled_modules=False)
        ret = julia.Pkg_jl.func_jl(args)
        self.queue.put(ret)  # save the result of the function
    def run(self, *args):
        p = Process(target=self._wrapper, args=args)
        self.processes.append(p)  # store the Process object
        p.start()
    ...

if __name__ == "__main__":
    ...
    Jlproc = JuliaProcess()
    Jlproc.run(some_args)
I have the following error:
raise JuliaError(u'Exception \'{}\' occurred while calling julia code:\n{}'
julia.core.JuliaError: Exception 'ReadOnlyMemoryError' occurred while calling julia code:
const PyCall = Base.require(Base.PkgId(Base.UUID("438e738f-606a-5dbb-bf0a-cddfbfd45ab0"), "PyCall"))
...
Does anyone know what is happening, and whether it is possible to parallelize Julia functions from Python in this way?
I finally solved the error.
The syntax is not the problem; what matters is the process in which the Julia runtime is initialized and its modules are precompiled.
In the first code, the error comes from the call
julia.Julia(compiled_modules=False)
being executed at module level, i.e. in the parent process right after Julia is imported.
The second code works fine because that expression is evaluated (and the modules are precompiled) inside the target process.
Below, I share an example that works fine if you have Julia and PyCall duly installed.
#!/usr/bin/env python3
# coding=utf-8
from multiprocessing import Process, Queue
import julia

class JuliaProcess(object):
    def __init__(self):
        self.processes = []
        self.queue = Queue()

    def _wrapper(self, *args):
        julia.Julia(compiled_modules=False)
        from julia import LinearAlgebra as LA
        ret = LA.dot(args[0], args[1])
        self.queue.put(ret)  # save the result of the function

    def run(self, *args):
        p = Process(target=self._wrapper, args=args)
        self.processes.append(p)  # store the Process object
        p.start()

    def wait(self):
        self.rets = []
        for p in self.processes:
            ret = self.queue.get()
            self.rets.append(ret)
        for p in self.processes:
            p.join()

if __name__ == "__main__":
    jp = JuliaProcess()
    jp.run([1, 5, 6], [1, 3, 2])
    jp.wait()
    print(jp.rets)
I have two Python scripts and I want them to communicate with each other. Specifically, I want script Communication.py to send an array to script Process.py when required by the latter. I've used multiprocessing.Process and multiprocessing.Pipe to make it work. My code works, but I want to handle SIGINT and SIGTERM gracefully. I've tried the following, but it does not exit gracefully:
Process.py
from multiprocessing import Process, Pipe
from Communication import arraySender
import time
import signal

class GracefulKiller:
    kill_now = False

    def __init__(self):
        signal.signal(signal.SIGINT, self.exit_gracefully)
        signal.signal(signal.SIGTERM, self.exit_gracefully)

    def exit_gracefully(self, *args):
        self.kill_now = True

def main():
    parent_conn, child_conn = Pipe()
    p = Process(target=arraySender, args=(child_conn, True))
    p.start()
    print(parent_conn.recv())

if __name__ == '__main__':
    killer = GracefulKiller()
    while not killer.kill_now:
        main()
Communication.py
import numpy
from multiprocessing import Process, Pipe

def arraySender(child_conn, sendData):
    if sendData:
        child_conn.send(numpy.random.randint(0, high=10, size=15, dtype=int))
        child_conn.close()
What am I doing wrong?
I strongly suspect you are running this under Windows because I think the code you have should work under Linux. This is why it is important to always tag your questions concerning Python and multiprocessing with the actual platform you are on.
The problem appears to be that, in addition to your main process, the child process you create in function main is also receiving the signals. The usual solution would be to add calls such as signal.signal(signal.SIGINT, signal.SIG_IGN) to your arraySender worker function. But there are two problems with this:
There is a race condition: the signal could be received by the child process before it has a chance to ignore signals.
Regardless, ignoring signals this way does not seem to work when you are using multiprocessing.Process (perhaps that class does its own signal handling that overrides these calls).
The solution is to create a multiprocessing pool and initialize each pool process to ignore signals before you submit any tasks. Another advantage of using a pool (although here a pool size of 1 suffices, since you never have more than one task running at a time) is that the process is created only once and can then be reused.
As an aside, there is an inconsistency in your GracefulKiller class: you mix the class attribute kill_now with an instance attribute kill_now that gets created when you execute self.kill_now = True. So when the main process tests killer.kill_now, it is accessing the class attribute until self.kill_now is set to True, at which point it starts accessing the instance attribute.
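A minimal, standalone illustration of that shadowing (the class name here is made up for the example):

class Killer:
    kill_now = False          # class attribute

k = Killer()
print(k.kill_now)             # False, looked up on the class
k.kill_now = True             # creates an instance attribute that shadows it
print(k.kill_now)             # True, looked up on the instance
print(Killer.kill_now)        # still False, the class attribute is unchanged

With that in mind, here is the reworked code using a pool: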
from multiprocessing import Pool, Pipe
import time
import signal
import numpy

class GracefulKiller:
    def __init__(self):
        self.kill_now = False  # Instance attribute
        signal.signal(signal.SIGINT, self.exit_gracefully)
        signal.signal(signal.SIGTERM, self.exit_gracefully)

    def exit_gracefully(self, *args):
        self.kill_now = True

def init_pool_processes():
    signal.signal(signal.SIGINT, signal.SIG_IGN)
    signal.signal(signal.SIGTERM, signal.SIG_IGN)

def arraySender(sendData):
    if sendData:
        return numpy.random.randint(0, high=10, size=15, dtype=int)

def main(pool):
    result = pool.apply(arraySender, args=(True,))
    print(result)

if __name__ == '__main__':
    # Create pool with only 1 process:
    pool = Pool(1, initializer=init_pool_processes)
    killer = GracefulKiller()
    while not killer.kill_now:
        main(pool)
    pool.close()
    pool.join()
Ideally GracefulKiller should be a singleton class so that regardless of how many times GracefulKiller was instantiated by a process, you would be calling signal.signal only once for each type of signal you want to handle:
class Singleton(type):
    def __init__(self, *args, **kwargs):
        self.__instance = None
        super().__init__(*args, **kwargs)

    def __call__(self, *args, **kwargs):
        if self.__instance is None:
            self.__instance = super().__call__(*args, **kwargs)
        return self.__instance

class GracefulKiller(metaclass=Singleton):
    def __init__(self):
        self.kill_now = False  # Instance attribute
        signal.signal(signal.SIGINT, self.exit_gracefully)
        signal.signal(signal.SIGTERM, self.exit_gracefully)

    def exit_gracefully(self, *args):
        self.kill_now = True
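Assuming the Singleton metaclass and GracefulKiller definitions above, a quick check of the resulting behaviour:

killer_a = GracefulKiller()
killer_b = GracefulKiller()
print(killer_a is killer_b)  # True: both names refer to the same instance,
                             # so signal.signal ran only once per signal type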
I am trying to share a pool object between multiple processes using the following code
from multiprocessing import Process, Pool
import time

pool = Pool(5)

def print_hello():
    time.sleep(1)
    return "hello"

def pipeline():
    print("In pipeline")
    msg = pool.apply_async(print_hello()).get(timeout=1.5)
    print("In pipeline")
    print(msg)

def post():
    p = Process(target=pipeline)
    p.start()
    return

if __name__ == '__main__':
    post()
    print("Returned from post")
However, the code exits with the timeout since get() does not return. I believe this has to do with pool being a globally accessible variable, because it works just fine when I move pool to be local to the pipeline function. Can anyone suggest a workaround for this problem?
Edit: I finally got it working by using a thread instead of a process to run the pipeline function.
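A minimal sketch of that thread-based workaround, assuming the same print_hello function; this is a hypothetical reconstruction, not the poster's exact fix:

from multiprocessing import Pool
from threading import Thread
import time

def print_hello():
    time.sleep(1)
    return "hello"

def pipeline(pool):
    # the thread runs inside the parent process, so it can use the pool directly
    msg = pool.apply_async(print_hello).get(timeout=1.5)
    print(msg)

if __name__ == '__main__':
    pool = Pool(5)  # created once, in the main process
    t = Thread(target=pipeline, args=(pool,))
    t.start()
    t.join()
    pool.close()
    pool.join()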
I am writing a module in which one function needs to use Pool from the multiprocessing library in Python 3.6. I have done some research on the problem, and it seems that you cannot use if __name__ == "__main__" because the code is not being run from main. I have also noticed that the Python pool processes get initialized in my task manager but essentially get stuck.
So for example:
class myClass()
    ...
    # lots of different functions here
    ...
    def multiprocessFunc()
        # do stuff in here

    def funcThatCallsMultiprocessFunc()
        array = [array of filenames to be called]
        if __name__ == "__main__":
            p = Pool(processes=20)
            p.map_async(multiprocessFunc, array)
I tried removing the if __name__ == "__main__" part, but still no dice. Any help would be appreciated.
It seems to me that you have just missed out a self. in your code. I should think this will work:
class myClass():
    ...
    # lots of different functions here
    ...
    def multiprocessFunc(self, file):
        # do stuff in here

    def funcThatCallsMultiprocessFunc(self):
        array = [array of filenames to be called]
        p = Pool(processes=20)
        p.map_async(self.multiprocessFunc, array)  # added self. here
Now, having done some experiments, I see that map_async can take quite some time to start up (I think because multiprocessing has to create the processes), and any test code might call funcThatCallsMultiprocessFunc and then quit before the Pool has even got started.
In my tests I had to wait for over 10 seconds after funcThatCallsMultiprocessFunc before calls to multiprocessFunc started. But once started, they seemed to run just fine.
This is the actual code I've used:
MyClass.py
from multiprocessing import Pool
import time
import string

class myClass():
    def __init__(self):
        self.result = None

    def multiprocessFunc(self, f):
        time.sleep(1)
        print(f)
        return f

    def funcThatCallsMultiprocessFunc(self):
        array = [c for c in string.ascii_lowercase]
        print(array)
        p = Pool(processes=20)
        p.map_async(self.multiprocessFunc, array, callback=self.done)
        p.close()

    def done(self, arg):
        self.result = 'Done'
        print('done', arg)
Run.py
from MyClass import myClass
import time

def main():
    c = myClass()
    c.funcThatCallsMultiprocessFunc()
    for i in range(30):
        print(i, c.result)
        time.sleep(1)

if __name__=="__main__":
    main()
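If it is acceptable to block until the results are ready, the AsyncResult returned by map_async can simply be waited on with get(); here is a minimal standalone sketch with a hypothetical worker function, not part of the code above:

from multiprocessing import Pool
import string
import time

def work(c):
    time.sleep(1)
    return c.upper()

if __name__ == "__main__":
    with Pool(processes=20) as p:
        async_result = p.map_async(work, string.ascii_lowercase)
        results = async_result.get()  # blocks until every task has finished
    print(results)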
The if __name__ == '__main__' construct is an import guard. You want to use it to stop multiprocessing from re-running your setup code on import.
In your case, you can leave out this guard in the class definition. Just be sure to protect the execution entry point of the class in the calling file, like this:
def apply_async_with_callback():
    pool = mp.Pool(processes=30)
    for i in range(z):
        pool.apply_async(parallel_function, args=(i, x, y), callback=callback_function)
    pool.close()
    pool.join()
    print("Multiprocessing done!")

if __name__ == '__main__':
    apply_async_with_callback()
import multiprocessing as mp
import time as t

class MyProcess(mp.Process):
    def __init__(self, target, args, name):
        mp.Process.__init__(self, target=target, args=args)
        self.exit = mp.Event()
        self.name = name
        print("{0} initiated".format(self.name))

    def run(self):
        while not self.exit.is_set():
            pass
        print("Process {0} exited.".format(self.name))

    def shutdown(self):
        print("Shutdown initiated for {0}.".format(self.name))
        self.exit.set()

def f(x):
    while True:
        print(x)
        x = x+1

if __name__ == "__main__":
    p = MyProcess(target=f, args=[3], name="function")
    p.start()
    #p.join()
    t.wait(2)
    p.shutdown()
I'm trying to extend the multiprocessing.Process class to add a shutdown method, in order to be able to exit a function that could potentially have to run for an undefined amount of time. Following the instructions from Python Multiprocessing Exit Elegantly How? and adding the argument passing I came up with myself only gets me this output:
function initiated
Shutdown initiated for function.
Process function exited.
But there is no output from the method f(x) itself. It seems that the actual process target never gets started. I'm obviously doing something wrong, but I just can't figure out what. Any ideas?
Thanks!
The sane way to handle this situation is, where possible, to have the background task cooperate in the exit mechanism by periodically checking the exit event. For that, there's no need to subclass Process: you can rewrite your background task to include that check. For example, here's your code rewritten using that approach:
import multiprocessing as mp
import time as t

def f(x, exit_event):
    while not exit_event.is_set():
        print(x)
        x = x+1
    print("Exiting")

if __name__ == "__main__":
    exit_event = mp.Event()
    p = mp.Process(target=f, args=(3, exit_event), name="function")
    p.start()
    t.sleep(2)
    exit_event.set()
    p.join()
If that's not an option (for example, because you can't modify the code that's being run in the background job), then you can use the Process.terminate method. But you should be aware that using it is dangerous: the child process won't have an opportunity to clean up properly, so for example if it's shut down while holding a multiprocessing lock, no other process will be able to acquire that lock, creating a risk of deadlock. It's far better to have the child cooperate in the shutdown if possible.
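A minimal sketch of that terminate-based fallback, using a variant of the f from the question with a short sleep added so the output does not flood the terminal:

import multiprocessing as mp
import time

def f(x):
    # a background job we cannot modify to check an exit event
    while True:
        print(x)
        x = x + 1
        time.sleep(0.5)

if __name__ == "__main__":
    p = mp.Process(target=f, args=(3,))
    p.start()
    time.sleep(2)
    p.terminate()  # forcibly stops the child; no cleanup runs inside it
    p.join()
    print("exit code:", p.exitcode)  # on Unix this is negative, indicating termination by a signal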
The solution to this problem is to call super().run() in your class's run method.
Of course, because of the while True loop this will make your function run forever, and the event you defined will not end it.
You can use the Process.terminate() method to end your process instead.
import multiprocessing as mp
import time as t

class MyProcess(mp.Process):
    def __init__(self, target, args, name):
        mp.Process.__init__(self, target=target, args=args)
        self.name = name
        print("{0} initiated".format(self.name))

    def run(self):
        print("Process {0} started.".format(self.name))
        super().run()

    def shutdown(self):
        print("Shutdown initiated for {0}.".format(self.name))
        self.terminate()

def f(x):
    while True:
        print(x)
        t.sleep(1)
        x += 1

if __name__ == "__main__":
    p = MyProcess(target=f, args=(3,), name="function")
    p.start()
    # p.join()
    t.sleep(5)
    p.shutdown()
I understand that multiprocessing.Queue has to be passed to a subprocess through inheritance. However, when I try passing a Pipe to a subprocess through message passing, as in the following code, the error I get doesn't say that "Pipe can only be shared between processes through inheritance". Instead it fails at q.get() with the error TypeError: Required argument 'handle' (pos 1) not found. I'm wondering, is it possible to do this at all? Assuming the pipes are implemented using Linux named pipes, all that matters is the name of the pipe, and that could be the state that gets serialized and passed between processes, right?
from multiprocessing import Process, Pipe, Queue

def reader(q):
    output_p = q.get()
    msg = output_p.recv()
    while msg is not None:
        msg = output_p.recv()

if __name__ == '__main__':
    q = Queue()
    reader_p = Process(target=reader, args=(q,))
    reader_p.start()  # Launch the reader process

    output_p, input_p = Pipe(True)
    q.put(output_p)

    input_p.send('MyMessage')
    input_p.send(None)
    reader_p.join()
This is a bug which has been fixed in Python 3.
Your code in Python 3 works flawlessly.
noxadofox gave the correct answer here. I'm adding an example I devised to validate that pipes do not require inheritance. In this example I create a second pipe after the executor has started its two processes and pass it to those existing processes as a parameter.
""" Multiprocessing pipe and queue test """
import multiprocessing
import concurrent.futures
import time
class Example:
def __init__(self):
manager = multiprocessing.Manager()
q = manager.Queue()
executor = concurrent.futures.ProcessPoolExecutor(max_workers=2)
pipe_out_1, pipe_in_1 = multiprocessing.Pipe(duplex=True)
executor.submit(self.requester, q, pipe_in_1)
executor.submit(self.worker, q, pipe_out_1)
print(executor._processes)
pipe_out_2, pipe_in_2 = multiprocessing.Pipe(duplex=True)
executor.submit(self.requester, q, pipe_in_2)
executor.submit(self.worker, q, pipe_out_2)
print(executor._processes)
#staticmethod
def worker(q, pipe_out):
task = q.get()
print('worker got task {}'.format(task))
pipe_out.send(task + '-RESPONSE')
print('loop_proc sent')
#staticmethod
def requester(q, pipe_in):
q.put('TASK')
response = pipe_in.recv()
print('requester got response {}'.format(response))
time.sleep(2)
if __name__ == '__main__':
Example()
time.sleep(30)