How to use multiprocessing with class instances in Python?

I am trying to create a class that can run a separate process to go do some work that takes a long time, launch a bunch of these from a main module, and then wait for them all to finish. I want to launch the processes once and then keep feeding them things to do rather than creating and destroying processes. For example, maybe I have 10 servers running the dd command, then I want them all to scp a file, etc.
My ultimate goal is to create a class for each system that keeps track of the information for the system it is tied to, like IP address, logs, runtime, etc. But that class must be able to launch a system command and then return execution back to the caller while that system command runs, to follow up with the result of the system command later.
My attempt is failing because I cannot send an instance method of a class over the pipe to the subprocess via pickle. Those are not picklable. I therefore tried to fix it in various ways, but I can't figure it out. How can my code be patched to do this? What good is multiprocessing if you can't send over anything useful?
Is there any good documentation of multiprocessing being used with class instances? The only way I can get the multiprocessing module to work is on simple functions. Every attempt to use it within a class instance has failed. Maybe I should pass events instead? I don't understand how to do that yet.
import multiprocessing
import sys
import re
class ProcessWorker(multiprocessing.Process):
    """
    This class runs as a separate process to execute worker's commands in parallel
    Once launched, it remains running, monitoring the task queue, until "None" is sent
    """
    def __init__(self, task_q, result_q):
        multiprocessing.Process.__init__(self)
        self.task_q = task_q
        self.result_q = result_q
        return
    def run(self):
        """
        Overloaded function provided by multiprocessing.Process. Called upon start() signal
        """
        proc_name = self.name
        print '%s: Launched' % (proc_name)
        while True:
            next_task_list = self.task_q.get()
            if next_task_list is None:
                # Poison pill means shutdown
                print '%s: Exiting' % (proc_name)
                self.task_q.task_done()
                break
            next_task = next_task_list[0]
            print '%s: %s' % (proc_name, next_task)
            args = next_task_list[1]
            kwargs = next_task_list[2]
            answer = next_task(*args, **kwargs)
            self.task_q.task_done()
            self.result_q.put(answer)
        return
# End of ProcessWorker class
class Worker(object):
    """
    Launches a child process to run commands from derived classes in separate processes,
    which sit and listen for something to do
    This base class is called by each derived worker
    """
    def __init__(self, config, index=None):
        self.config = config
        self.index = index
        # Launch the ProcessWorker for anything that has an index value
        if self.index is not None:
            self.task_q = multiprocessing.JoinableQueue()
            self.result_q = multiprocessing.Queue()
            self.process_worker = ProcessWorker(self.task_q, self.result_q)
            self.process_worker.start()
            print "Got here"
            # Process should be running and listening for functions to execute
        return
    def enqueue_process(target):  # No self, since it is a decorator
        """
        Used to place a command target from this class object into the task_q
        NOTE: Any function decorated with this must use fetch_results() to get the
        target task's result value
        """
        def wrapper(self, *args, **kwargs):
            self.task_q.put([target, args, kwargs])  # FAIL: target is a class instance method and can't be pickled!
        return wrapper
    def fetch_results(self):
        """
        After all processes have been spawned by multiple modules, this command
        is called on each one to retrieve the results of the call.
        This blocks until the execution of the item in the queue is complete
        """
        self.task_q.join()  # Wait for it to finish
        return self.result_q.get()  # Return the result
    @enqueue_process
    def run_long_command(self, command):
        print "I am running %s" % command
        # In here, I will launch a subprocess to run a long-running system command
        # p = Popen(command), etc
        # p.wait(), etc
        return
    def close(self):
        self.task_q.put(None)
        self.task_q.join()
if __name__ == '__main__':
    config = ["some value", "something else"]
    index = 7
    workers = []
    for i in range(5):
        worker = Worker(config, index)
        worker.run_long_command("ls /")
        workers.append(worker)
    for worker in workers:
        worker.fetch_results()
    # Do more work... (this would actually be done in a distributor in another class)
    for worker in workers:
        worker.close()
Edit: I tried to move the ProcessWorker class and the creation of the multiprocessing queues outside of the Worker class and then tried to manually pickle the worker instance. Even that doesn't work; I get the error
RuntimeError: Queue objects should only be shared between processes through inheritance.
But I am only passing references of those queues into the worker instance?? I am missing something fundamental. Here is the modified code from the main section:
if __name__ == '__main__':
    config = ["some value", "something else"]
    index = 7
    workers = []
    for i in range(1):
        task_q = multiprocessing.JoinableQueue()
        result_q = multiprocessing.Queue()
        process_worker = ProcessWorker(task_q, result_q)
        worker = Worker(config, index, process_worker, task_q, result_q)
        something_to_look_at = pickle.dumps(worker)  # FAIL: Doesn't like queues??
        process_worker.start()
        worker.run_long_command("ls /")

So, the problem was that I was assuming that Python was doing some sort of magic that is somehow different from the way that C++/fork() works. I somehow thought that Python only copied the class, not the whole program into a separate process. I seriously wasted days trying to get this to work because all of the talk about pickle serialization made me think that it actually sent everything over the pipe. I knew that certain things could not be sent over the pipe, but I thought my problem was that I was not packaging things up properly.
This all could have been avoided if the Python docs gave me a 10,000 ft view of what happens when this module is used. Sure, they tell me what the methods of the multiprocessing module do and give me some basic examples, but what I want to know is the "Theory of Operation" behind the scenes! Here is the kind of information I could have used. Please chime in if my answer is off. It will help me learn.
When you start a process using this module, the whole program is copied into another process. (On Unix this is done with fork(); on Windows the child instead starts a fresh interpreter and re-imports the main module, which is why the if __name__ == '__main__': check matters.) Since the new process is not the "__main__" process and my code was checking for that, it doesn't fire off yet another process infinitely. It just sits out there waiting for something to do. With fork(), everything that was initialized in the parent at the time start() was called is set up and ready to go. Once you put something in the multiprocessing.Queue, shared memory, pipe, etc. (however you are communicating), the separate process receives it and gets to work. It can draw upon all imported modules and setup just as if it were the parent. However, once some internal state variables change in the parent or the separate process, those changes are isolated. Once the process is spawned, it becomes your job to keep them in sync if necessary, through a queue, pipe, shared memory, etc.
I threw out the code and started over, but now I am only putting one extra function out in the ProcessWorker, an "execute" method that runs a command line. Pretty simple. I don't have to worry about launching and then closing a bunch of processes this way, which has caused me all kinds of instability and performance issues in the past in C++. When I switched to launching processes at the beginning and then passing messages to those waiting processes, my performance improved and it was very stable.
BTW, I looked at this link to get help, which threw me off because the example made me think that methods were being transported across the queues: http://www.doughellmann.com/PyMOTW/multiprocessing/communication.html
The second example of the first section used "next_task()" that appeared (to me) to be executing a task received via the queue.
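The design described above (processes started once, then fed command lines over a queue) can be sketched as follows. This is a minimal illustration for Python 3.7+, not the author's actual code: command_worker and STOP are hypothetical names, each task is a (task_id, argv) tuple executed with subprocess.run, and a None poison pill tells a worker to exit. Only plain data crosses the queues, so nothing unpicklable is ever sent.

```python
import multiprocessing
import subprocess
import sys

STOP = None  # poison pill: a worker that receives it leaves its loop


def command_worker(task_q, result_q):
    """Run command lines from task_q until STOP arrives.

    Each task is (task_id, argv); each result is (task_id, returncode, stdout).
    """
    while True:
        task = task_q.get()
        if task is STOP:
            task_q.task_done()
            break
        task_id, argv = task
        # Only plain data (an id and an argument list) crossed the queue
        proc = subprocess.run(argv, capture_output=True, text=True)
        result_q.put((task_id, proc.returncode, proc.stdout))
        task_q.task_done()


if __name__ == "__main__":
    task_q = multiprocessing.JoinableQueue()
    result_q = multiprocessing.Queue()
    # Start the workers once; afterwards they just wait for work
    workers = [multiprocessing.Process(target=command_worker, args=(task_q, result_q))
               for _ in range(3)]
    for w in workers:
        w.start()

    # Stand-in commands: each one just prints its index
    commands = [[sys.executable, "-c", "print(%d)" % i] for i in range(5)]
    for i, argv in enumerate(commands):
        task_q.put((i, argv))
    task_q.join()  # returns once every command has been marked done

    for task_id, code, out in sorted(result_q.get() for _ in commands):
        print(task_id, code, out.strip())

    # Shut down: one poison pill per worker, then wait for them to exit
    for _ in workers:
        task_q.put(STOP)
    for w in workers:
        w.join()
```

Because the workers outlive individual commands, the same processes could be fed a second batch (the scp step from the question, say) before the poison pills are sent.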

Instead of attempting to send a method itself (which is impractical), try sending a name of a method to execute.
Provided that each worker runs the same code, it's a matter of a simple getattr(self, task_name).
I'd pass tuples (task_name, task_args), where task_args were a dict to be directly fed to the task method:
next_task_name, next_task_args = self.task_q.get()
if next_task_name:
    task = getattr(self, next_task_name)
    answer = task(**next_task_args)
    ...
else:
    # poison pill, shut down
    break
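A fuller version of that loop, as a hedged sketch in Python 3: DispatchWorker and its add/greet methods are hypothetical examples. Only the method name (a string) and a dict of keyword arguments travel through the queue; the child looks the method up with getattr on its own copy of the worker object.

```python
import multiprocessing


class DispatchWorker(multiprocessing.Process):
    """Runs tasks by method name, so only strings and dicts are pickled."""

    def __init__(self, task_q, result_q):
        super().__init__()
        self.task_q = task_q
        self.result_q = result_q

    # Hypothetical task methods; any method of the worker can be requested by name
    def add(self, a, b):
        return a + b

    def greet(self, name):
        return "hello, %s" % name

    def run(self):
        while True:
            next_task_name, next_task_args = self.task_q.get()
            if not next_task_name:
                break  # poison pill, shut down
            task = getattr(self, next_task_name)  # look the method up locally
            self.result_q.put((next_task_name, task(**next_task_args)))


if __name__ == "__main__":
    task_q, result_q = multiprocessing.Queue(), multiprocessing.Queue()
    worker = DispatchWorker(task_q, result_q)
    worker.start()
    task_q.put(("add", {"a": 2, "b": 3}))
    task_q.put(("greet", {"name": "world"}))
    task_q.put((None, {}))
    print(result_q.get())  # ('add', 5)
    print(result_q.get())  # ('greet', 'hello, world')
    worker.join()
```

A misspelled task name surfaces as an AttributeError inside the child, so in practice the parent may want to validate names before queuing them.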

REF: https://stackoverflow.com/a/14179779
Answer on Jan 6 at 6:03 by David Lynch is not factually correct when he says that he was misled by
http://www.doughellmann.com/PyMOTW/multiprocessing/communication.html.
The code and examples provided are correct and work as advertised. next_task() is executing a task received via the queue; try to understand what the Task.__call__() method is doing.
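To see why next_task() works there, here is a minimal Python 3 sketch in the spirit of that example (the Task and consumer names are illustrative, not copied from PyMOTW). Each queued item is an instance of a module-level class, which pickle stores as a reference to the class plus the instance's attributes; calling the unpickled instance runs Task.__call__ in the consumer process. No function or method object is sent.

```python
import multiprocessing


class Task:
    """A picklable unit of work: data plus a __call__ method that does the work."""

    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __call__(self):
        return "%s * %s = %s" % (self.a, self.b, self.a * self.b)


def consumer(task_q, result_q):
    while True:
        next_task = task_q.get()  # an unpickled copy of a Task instance
        if next_task is None:  # poison pill
            break
        result_q.put(next_task())  # calling the instance runs Task.__call__


if __name__ == "__main__":
    task_q, result_q = multiprocessing.Queue(), multiprocessing.Queue()
    p = multiprocessing.Process(target=consumer, args=(task_q, result_q))
    p.start()
    for i in range(3):
        task_q.put(Task(i, i + 1))
    task_q.put(None)
    for _ in range(3):
        print(result_q.get())
    p.join()
```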
In my case, what tripped me up was syntax errors in my implementation of run(). It seems that the subprocess does not report this and just fails silently, leaving things stuck in weird loops! Make sure you have some kind of syntax checker running, e.g. Flymake/Pyflakes in Emacs.
Debugging via multiprocessing.log_to_stderr() helped me narrow down the problem.
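A minimal sketch of enabling that logging (Python 3; the worker function is a placeholder). multiprocessing.log_to_stderr() attaches a stderr handler to the multiprocessing logger and returns that logger; setting its level to logging.DEBUG also surfaces the module's internal progress messages, such as processes starting up and exiting with a given exit code.

```python
import logging
import multiprocessing


def worker():
    # Placeholder work; messages logged here from the child also reach stderr
    multiprocessing.get_logger().info("child doing work")


if __name__ == "__main__":
    logger = multiprocessing.log_to_stderr()  # adds a stderr handler, returns the logger
    logger.setLevel(logging.DEBUG)
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()
    logger.info("child exit code: %s", p.exitcode)
```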

Related

How to call method from different class using multiprocess pool python

How do I call a method from a different class (in a different module) using a multiprocessing Pool in Python?
My aim is to start a process which keeps running until some task is provided; once the task is completed, it goes back to waiting.
Below is the code, which has three modules. The Reader class is my runtime task; I hand the execution of its reader method to ProcessExecutor.
ProcessExecutor is the process pool; it stays in its while loop until some task is provided to it.
The main module initiates everything.
Module 1
class Reader(object):
    def __init__(self, message):
        self.message = message
    def reader(self):
        print self.message
Module 2
class ProcessExecutor():
    def run(self, queue):
        print 'Before while loop'
        while True:
            print 'Reached Run'
            try:
                pair = queue.get()
                print 'Running process'
                print pair
                func = pair.get('target')
                arguments = pair.get('args', None)
                if arguments is None:
                    func()
                else:
                    func(arguments)
                queue.task_done()
            except Exception:
                print Exception.message
Main module
from process_helper import ProcessExecutor
from reader import Reader
import multiprocessing
import Queue
if __name__=='__main__':
    queue = Queue.Queue()
    myReader = Reader('Hi')
    ps = ProcessExecutor()
    pool = multiprocessing.Pool(2)
    pool.apply_async(ps.run, args=(queue, ))
    param = {'target': myReader.reader}
    queue.put(param)
Code executed without any error:
C:\Python27\python.exe C:/Users/PycharmProjects/untitled1/main/main.py
Process finished with exit code 0
The code runs, but it never reaches the run method. I am not sure whether it is possible to call a method of a different class using multiple processes.
I tried apply_async, map, and apply, but none of them work.
All the examples I found online call the target method from the script where the main method is implemented.
I am using Python 2.7.
Please help.
Your first problem is that you just exit without waiting on anything. You have a Pool, a Queue, and an AsyncResult, but you just ignore all of them and exit as soon as you've created them. You should be able to get away with only waiting on the AsyncResult (after that, there's no more work to do, so who cares what you abandon), except for the fact that you're trying to use Queue.task_done, which doesn't make any sense without a Queue.join on the other side, so you need to wait on that as well.
Your second problem is that you're using the Queue from the Queue module, instead of the one from the multiprocessing module. The Queue module only works across threads in the same process.
Also, you can't call task_done on a plain Queue; that's only a method for the JoinableQueue subclass.
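To make the task_done()/join() pairing concrete, a minimal Python 3 sketch (consume is an illustrative name): the consumer calls task_done() exactly once for every item it takes off a multiprocessing.JoinableQueue, and the producer's join() returns only after all items, poison pill included, have been marked done.

```python
import multiprocessing


def consume(q, results):
    while True:
        item = q.get()
        try:
            if item is None:  # poison pill
                return
            results.put(item * item)
        finally:
            q.task_done()  # exactly once per get(), including the pill


if __name__ == "__main__":
    q = multiprocessing.JoinableQueue()
    results = multiprocessing.Queue()
    p = multiprocessing.Process(target=consume, args=(q, results))
    p.start()
    for n in range(5):
        q.put(n)
    q.put(None)
    q.join()  # returns once all six items have been marked done
    print(sorted(results.get() for _ in range(5)))  # [0, 1, 4, 9, 16]
    p.join()
```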
Once you've gotten to the point where the pool tries to actually run a task, you will get the problem that, in Python 2, bound methods can't be pickled unless you write a pickler for them. Doing that is a pain, even though it's the right way. The traditional workaround (hacky and cheesy, but everyone did it, and it works) is to wrap each method you want to call in a top-level function. The modern solution is to use the third-party dill or cloudpickle libraries, which know how to pickle bound methods, and how to hook into multiprocessing. You should definitely look into them. But, to keep things simple, I'll show you the workaround.
Notice that, because you've created an extra queue to pass methods onto, in addition to the one built into the pool, you'll need the workaround for both targets.
With these problems fixed, your code looks like this:
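from process_helper import ProcessExecutor
from reader import Reader
import multiprocessing
def call_run(ps):
    # Relies on the global `queue` being inherited through fork(); with the
    # spawn start method (the default on Windows) it is not defined in the worker
    ps.run(queue)
def call_reader(reader):
    return reader.reader()
if __name__=='__main__':
    queue = multiprocessing.JoinableQueue()
    myReader = Reader('Hi')
    ps = ProcessExecutor()
    pool = multiprocessing.Pool(2)
    res = pool.apply_async(call_run, args=(ps,))
    param = {'target': call_reader, 'args': myReader}
    queue.put(param)
    # ps.run() loops forever (its while loop has no exit), so this get() never returns
    print res.get()
    queue.join()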
You have additional bugs beyond this in your ProcessExecutor, but I'm not going to debug everything for you. This gets you past the initial hurdles, and shows the answer to the specific question you were asking about. Also, I'm not sure what the point of all that code is. You seem to be trying to replace what Pool already does on top of Pool, only in a more complicated but less powerful way, but I'm not entirely sure.
Meanwhile, here's a program that does what I think you want, with no problems, by just throwing away that ProcessExecutor and everything that goes with it:
from reader import Reader
import multiprocessing
def call_reader(reader):
    return reader.reader()
if __name__=='__main__':
    myReader = Reader('Hi')
    pool = multiprocessing.Pool(2)
    res = pool.apply_async(call_reader, args=(myReader,))
    print res.get()
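The pickling limitation described in this answer is specific to Python 2, which the question uses. In current Python 3 releases the standard pickle module can serialize a bound method of a picklable instance (it is stored as the instance plus the method name), so the wrapper function is not needed there. A minimal sketch; Reader is redefined inline so the example runs on its own, and its reader method returns the message instead of printing it:

```python
import multiprocessing
import pickle


class Reader(object):
    def __init__(self, message):
        self.message = message

    def reader(self):
        return self.message


if __name__ == "__main__":
    my_reader = Reader('Hi')
    # A bound method survives a pickle round trip in Python 3 ...
    print(pickle.loads(pickle.dumps(my_reader.reader))())  # Hi
    with multiprocessing.Pool(2) as pool:
        # ... so it can be handed to the pool directly, with no top-level wrapper
        res = pool.apply_async(my_reader.reader)
        print(res.get(timeout=10))  # Hi
```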

Running multiple independent python scripts concurrently

My goal is to create one main Python script that executes multiple independent Python scripts on Windows Server 2012 at the same time. One of the benefits, in my mind, is that I can point Task Scheduler to one main.py script as opposed to multiple .py scripts. My server has 1 CPU. I have read about multiprocessing, threading, and subprocess, which only added to my confusion a bit. I am basically running multiple trading scripts for different stock symbols, all at the same time, after the market opens at 9:30 EST. Following is my attempt, but I have no idea whether it is right. Any direction/feedback is highly appreciated!
import subprocess
subprocess.Popen(["python", '1.py'])
subprocess.Popen(["python", '2.py'])
subprocess.Popen(["python", '3.py'])
subprocess.Popen(["python", '4.py'])
I think I'd do it like this:
from multiprocessing import Pool
def _call_api(symbol):
    # Placeholder for the real API call
    return symbol
def do_stuff_with_stock_symbol(symbol):
    return _call_api(symbol)
if __name__ == '__main__':
    symbols = ["GOOG", "AAPL", "TSLA"]
    p = Pool(len(symbols))
    results = p.map(do_stuff_with_stock_symbol, symbols)
    print(results)
(Modified example from multiprocessing introduction: https://docs.python.org/3/library/multiprocessing.html#introduction)
Consider using a constant pool size if you deal with a lot of stock symbols, because every python process will use some amount of memory.
Also, please note that using threads might be a lot better if you are dealing with an I/O bound workload (calling an API, writing and reading from disk). Processes really become necessary with python when dealing with compute bound workloads (because of the global interpreter lock).
An example using threads and the concurrent futures library would be:
import concurrent.futures
TIMEOUT = 60
def _call_api(symbol, timeout):
    # Placeholder for the real API call
    return symbol
def do_stuff_with_stock_symbol(symbol, timeout):
    return _call_api(symbol, timeout)
if __name__ == '__main__':
    symbols = ["GOOG", "AAPL", "TSLA"]
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(symbols)) as executor:
        results = {executor.submit(do_stuff_with_stock_symbol, symbol, TIMEOUT): symbol for symbol in symbols}
        for future in concurrent.futures.as_completed(results):
            symbol = results[future]
            try:
                data = future.result()
            except Exception as exc:
                print('{} generated an exception: {}'.format(symbol, exc))
            else:
                print('stock symbol: {}, result: {}'.format(symbol, data))
(Modified example from: https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor-example)
Note that threads will still use some memory, but less than processes.
You could use asyncio or green threads if you want to reduce memory consumption per stock symbol to a minimum, but at some point you will run into network bandwidth problems because of all the concurrent API calls :)
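A minimal asyncio sketch of that suggestion (Python 3.7+). fetch_symbol is hypothetical and asyncio.sleep stands in for a real non-blocking API call; an asyncio.Semaphore caps how many calls are in flight at once, which is one way to keep the bandwidth problem in check. All coroutines run in a single thread, so there is no per-symbol thread or process.

```python
import asyncio

MAX_CONCURRENT = 10  # assumed cap on simultaneous API calls


async def fetch_symbol(symbol, limit):
    # Hypothetical stand-in for a non-blocking API call
    async with limit:
        await asyncio.sleep(0.1)
        return symbol, "data for %s" % symbol


async def main(symbols):
    limit = asyncio.Semaphore(MAX_CONCURRENT)
    # gather() runs the coroutines concurrently and returns results in input order
    return await asyncio.gather(*(fetch_symbol(s, limit) for s in symbols))


if __name__ == "__main__":
    for symbol, data in asyncio.run(main(["GOOG", "AAPL", "TSLA"])):
        print("stock symbol: {}, result: {}".format(symbol, data))
```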
While what you're asking might not be the best way to handle what you're doing, I've wanted to do similar things in the past, and it took a while to find what I needed, so to answer your question:
I'm not promising this to be the "best" way to do it, but it worked in my use case.
I created a class I wanted to use to extend threading.
thread.py
"""
Extends threading.Thread giving access to a Thread object which will accept
A thread_id, thread name, and a function at the time of instantiation. The
function will be called when the threads start() method is called.
"""
import threading
class Thread(threading.Thread):
def __init__(self, thread_id, name, func):
threading.Thread.__init__(self)
self.threadID = thread_id
self.name = name
# the function that should be run in the thread.
self.func = func
def run(self):
return self.func()
I needed some work done that was part of another package
work_module.py
import...
def func_that_does_work():
    # do some work
    pass
def more_work():
    # do some work
    pass
Then the main script I wanted to run
main.py
from thread import Thread
import work_module as wm
mythreads = []
mythreads.append(Thread(1, "a_name", wm.func_that_does_work))
mythreads.append(Thread(2, "another_name", wm.more_work))
for t in mythreads:
    t.start()
The threads finish when run() returns. Because this class extends Thread from threading, all the options described in the docs are available: https://docs.python.org/3/library/threading.html
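For comparison, the standard threading.Thread constructor already accepts a target callable, its args, and a name, so the same effect is available without a subclass; calling join() makes the main script wait for every thread. A minimal Python 3 sketch; the two work functions are stand-ins for the ones in work_module, and the shared results list is an illustrative addition:

```python
import threading


def func_that_does_work(results):
    results.append("func_that_does_work done")  # placeholder for real work


def more_work(results):
    results.append("more_work done")  # placeholder for real work


if __name__ == "__main__":
    results = []  # threads share memory, so both can append to the same list
    threads = [
        threading.Thread(target=func_that_does_work, args=(results,), name="a_name"),
        threading.Thread(target=more_work, args=(results,), name="another_name"),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for both threads before continuing
    print(sorted(results))
```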
If all you're looking to do is automate the startup, creating a .bat file is a great and simple alternative to trying to do it with another python script.
The example linked in the comments shows how to do it with bash on Unix-based machines, but batch files can do something very similar with the START command:
start_py.bat:
START "" /B "path\to\python.exe" "path\to\script_1.py"
START "" /B "path\to\python.exe" "path\to\script_2.py"
START "" /B "path\to\python.exe" "path\to\script_3.py"
The full syntax for START can be found here.

Python Unittest and Multiprocessing fail when run sequentially

I'm attempting to create unittests for my application which uses multiple processes, but have been having strange issues when attempting to run all the tests together. Basically when running tests individually they pass without issue but when run sequentially, such as when running all tests in the file, some tests will fail.
What I'm seeing is that many Python processes are being created, but they aren't closing when the test is reported as passed. For example, if 2 tests are run that each generate 5 processes, then 10 Python processes show up in the system monitor.
I've tried using terminate and join but neither work. Is there a way to force a test to correctly close all processes that it generated before running the next test?
I'm running Python 2.7 in Ubuntu 16.04.
Edit:
It's a fairly large code base, so here is a simplified example.
import unittest
from multiprocessing import Pipe, Process
class BaseDevice:
    # Various methods
    pass
class BaseInstr(BaseDevice, Process):
    def __init__(self, pipe):
        Process.__init__(self)
        self.pipe = pipe
    def run(self):
        # Do stuff and wait for terminate message on pipe
        pass
    # Various other higher level methods
class BaseCompountInstrument(BaseInstr):
    def __init__(self, pipe):
        # Create multiple instruments, usually done with config file but simplified here
        BaseInstr.__init__(self, pipe)
        instrlist = list()
        for _ in range(5):
            masterpipe, slavepipe = Pipe()
            instrlist.append([BaseInstr(slavepipe), masterpipe])
    def run(self):
        pass
        # Listen for message from pipe, send messages to sub-instruments
    def shutdown(self):
        # When shutdown message received, send to all sub-instruments
        pass
class test(unittest.TestCase):
    def setUp(self):
        # Load up a configuration file from the sample configs so that they're updated
        self.parentConn, self.childConn = Pipe()
        self.instr = BaseCompountInstrument(self.childConn)
        self.instr.start()
    def tearDown(self):
        self.parentConn.send("shutdown")  # Propagates to all sub-instruments
    def test1(self):
        pass
    def test2(self):
        pass
After struggling with this for a while (2 days, actually), I found a solution that is not technically wrong, but it removes all the parallel code you have (only in tests, only in tests...).
I use the mock package to mock functions (which, I realize now, has been part of the unittest module since Python 3.3 xD). With it you can assume that a certain function worked well, fix its return value, or change the function itself.
So I did the last option: Change the function itself.
In my case I used a list of Process objects (because Pool didn't work for me) and Manager lists to share data between the processes.
My original code would be something like this:
import multiprocessing as mp
manager = mp.Manager()
list_data = manager.list()
list_return = manager.list()
def parallel_function(list_data, list_return):
    while len(list_data) > 0:
        # Do things and make sure to "pop" the data in list_data
        list_return.append(return_data)
    return None
# Create as many processes as images or cpus, the lesser number
processes = [mp.Process(target=parallel_function,
                        args=(list_data, list_return))
             for num_p in range(mp.cpu_count())]
for p in processes:
    p.start()
for p in processes:
    p.join(10)
So in my test I mock the function Process.__init__ from the multiprocessing module so that it runs my parallel_function instead of creating a new process.
In the test file, before any test, define the same function you are trying to parallelize:
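def fake_process(self, target=None, args=(), kwargs=None, **unused):
    # Stands in for Process.__init__, which is called as
    # Process(target=..., args=(list_data, list_return)), so unpack args here
    list_data, list_return = args
    while len(list_data) > 0:
        # Do things and make sure to "pop" the data in list_data
        list_return.append(return_data)
    return None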
Then, before the definition of any test method that executes this part of the code, add decorators that override Process.__init__ (and turn start and join into no-ops):
from mock import patch  # on Python 3.3+: from unittest.mock import patch
@patch('multiprocessing.Process.__init__', new=fake_process)
@patch('multiprocessing.Process.start', new=lambda x: None)
@patch('multiprocessing.Process.join', new=lambda x, y: None)
def test_from_the_hell(self):
    # Do things
    pass
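If you use Manager data structures, you don't need Locks to protect individual operations such as append() or pop(), because each call is carried out as a single operation by the manager process. A check-then-act sequence, such as testing len() and then calling pop(), can still race between processes, though.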
I hope this will help any other lost soul who is trying to test multiprocessing code.
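A self-contained version of that approach, as a sketch under a few assumptions: it uses Python 3's unittest.mock, the work is a hypothetical squaring step, and the fake __init__ simply calls target(*args, **kwargs) immediately instead of duplicating the function body. Plain lists replace the Manager lists because, once Process is patched, nothing runs in another process.

```python
import multiprocessing as mp
import unittest
from unittest.mock import patch


def parallel_function(list_data, list_return):
    while len(list_data) > 0:
        item = list_data.pop()
        list_return.append(item * item)  # stand-in for the real work


def run_in_parallel(list_data, list_return, n_procs=2):
    processes = [mp.Process(target=parallel_function, args=(list_data, list_return))
                 for _ in range(n_procs)]
    for p in processes:
        p.start()
    for p in processes:
        p.join(10)


def fake_process_init(self, target=None, args=(), kwargs=None, **unused):
    # Replaces Process.__init__: run the target right away, in the test process
    target(*args, **(kwargs or {}))


class ParallelFunctionTest(unittest.TestCase):
    @patch('multiprocessing.Process.__init__', new=fake_process_init)
    @patch('multiprocessing.Process.start', new=lambda self: None)
    @patch('multiprocessing.Process.join', new=lambda self, timeout=None: None)
    def test_runs_without_child_processes(self):
        data, out = [1, 2, 3], []
        run_in_parallel(data, out)
        self.assertEqual(sorted(out), [1, 4, 9])
        self.assertEqual(data, [])
```

Run it with python -m unittest as usual; outside the decorated test, run_in_parallel still starts real processes.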

Strange blocking behavior with python multiprocessing queue put() and get()

I have written a class in Python 2.7 (under Linux) that uses multiple processes to manipulate a database asynchronously. I encountered a very strange blocking behaviour when using multiprocessing.Queue.put() and multiprocessing.Queue.get() which I can't explain.
Here is a simplified version of what I do:
import time
from multiprocessing import Process, Queue
class MyDB(object):
    def __init__(self):
        self.inqueue = Queue()
        p1 = Process(target = self._worker_process, kwargs={"inqueue": self.inqueue})
        p1.daemon = True
        started = False
        while not started:
            try:
                p1.start()
                started = True
            except:
                time.sleep(1)
        # Sometimes I start a second, identical process, but it makes no difference to my problem
        p2 = Process(target = self._worker_process, kwargs={"inqueue": self.inqueue})
        #blahblah... (same as above)
    @staticmethod
    def _worker_process(inqueue):
        while True:
            #--------------this blocks despite data having arrived------------
            op = inqueue.get(block = True)
            #do something with specified operation
            #---------------problem area end--------------------
            print "if this text gets printed, the problem was solved"
    def delete_parallel(self, key, rawkey = False):
        someid = ...blahblah
        #--------------this section blocked when I was posting the question but for unknown reasons it's fine now
        self.inqueue.put({"optype": "delete", "kwargs": {"key":key, "rawkey":rawkey}, "callid": someid}, block = True)
        #--------------problem area end----------------
        print "if you see this text, there was no blocking or block was released"
If I run the code above inside a test (in which I call delete_parallel on the MyDB object), then everything works, but if I run it in the context of my entire application (importing other stuff, including pygtk), strange things happen:
For some reason self.inqueue.get blocks and never releases despite self.inqueue having the data in its buffer. When I instead call self.inqueue.get(block = False, timeout = 1) then the call finishes by raising Queue.Empty, despite the queue containing data. qsize() returns 1 (suggests that data is there) while empty() returns True (suggests that there is no data).
Now clearly there must be something somewhere else in my application that renders self.inqueue unusable by causing acquisition of some internal semaphore. However, I don't know what to look for. Eclipse debugging becomes useless once a blocking semaphore is reached.
Edit 8 (cleaning up and summarizing my previous edits) Last time I had a similar problem, it turned out that pygtk was hijacking the global interpreter lock, but I solved it by calling gobject.threads_init() before I called anything else. Could this issue be related?
When I introduce a print "successful reception" after the get() method and execute my application in a terminal, the same behaviour happens at first. When I then terminate by pressing CTRL+D, I suddenly get the string "successful reception" in between messages. This looks to me like some other process/thread is terminated and releases the lock that blocks the process that is stuck at get().
Since the process that was stuck terminates later, I still see the message. What kind of process could externally mess with a Queue like that? self.inqueue is only accessed inside my class.
Right now it seems to come down to this queue which won't return anything despite the data being there:
The get() method seems to get stuck when it attempts to receive the actual data from some internal pipe. The last line before my debugger hangs is
res = self._recv()
which is inside multiprocessing.queues.Queue.get().
Tracking this internal Python code further, I find the assignments
self._recv = self._reader.recv and self._reader, self._writer = Pipe(duplex=False).
Edit 9
I'm currently trying to hunt down the import that causes it. My application is quite complex, with hundreds of classes and each class importing a lot of other classes, so it's a pretty painful process. I have found a first candidate class which uses 3 different MyDB instances when I track all its imports (but doesn't access MyDB.inqueue at any time as far as I can tell). The strange thing is, it's basically just a wrapper, and the wrapped class works just fine when imported on its own. This also means that it uses MyDB without freezing. As soon as I import the wrapper (which imports that class), I have the blocking issue.
I started rewriting the wrapper by gradually reusing the old code. I'm testing each time I introduce a couple of new lines until I will hopefully see which line will cause the problem to return.
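multiprocessing.Queue uses an internal feeder thread to move items from put() into the underlying pipe. If you are using GTK, it will break this thread (pygtk holds the GIL unless threads are initialized), so you will need to call gobject.threads_init() before anything else.
It should be noted that qsize() only returns an approximate size of the queue. For multiprocessing.Queue, put() counts an item immediately, but get() and empty() only see it once the feeder thread has written it to the pipe; while that thread is stalled, qsize() can return 1 and empty() True at the same time, which matches what you are seeing.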

How to control a simulation in Python

I have a fairly high-level question about Python and running interactive simulations. Here is the setup:
I am porting to Python some simulation software I originally wrote in Smalltalk (VW). It is a kind of Recurrent Neural Network controlled interactively from a graphical interface. The interface allows the manipulation of most of the network's parameters in real time, in addition to controlling the simulation itself (starting it, stopping it, etc). In the original Smalltalk implementation, I had two processes running with different priority levels:
The interface itself with a higher priority
The neural network running forever at a lower priority
Communication between the two processes was trivial, because all Smalltalk processes share the same address space (the Object memory).
I am now starting to realize that replicating a similar setup in Python is not so trivial. The threading module does not allow its threads to share address space, as far as I can tell. The multiprocessing module does, but in a rather complex way (with Queues, etc).
So I am starting to think that my Smalltalk perspective is leading me astray and I am approaching a relatively simple problem from the wrong angle altogether. Problem is, I don't know what is the right angle! How would you recommend I approach the problem? I am fairly new to Python (obviously) and more than willing to learn. But I would greatly appreciate suggestions on how to frame the issues and which multiprocessing modules (if any!) I should delve into.
Thanks,
Stefano
I'll offer my take on how to approach this problem. Within the multiprocessing module the Pipe and Queue IPC mechanisms are really the best way to go; in spite of the added complexity you allude to, it's worth learning how they work. The Pipe is fairly straightforward so I'll use that to illustrate.
Here's the code, followed by some explanation:
import sys
import os
import random
import time
import multiprocessing
class computing_task(multiprocessing.Process):
    def __init__(self, name, pipe):
        # call this before anything else
        multiprocessing.Process.__init__(self)
        # then any other initialization
        self.name = name
        self.ipcPipe = pipe
        self.number1 = 0.0
        self.number2 = 0.0
        sys.stdout.write('[%s] created: %f\n' % (self.name, self.number1))
    # Do some kind of computation
    def someComputation(self):
        try:
            count = 0
            while True:
                count += 1
                self.number1 = (random.uniform(0.0, 10.0)) * self.number2
                sys.stdout.write('[%s]\t%d \t%g \t%g\n' % (self.name, count, self.number1, self.number2))
                # Send result via pipe to parent process.
                # Can send lists, whatever - anything picklable.
                self.ipcPipe.send([self.name, self.number1])
                # Get new data from parent process
                newData = self.ipcPipe.recv()
                self.number2 = newData[0]
                time.sleep(0.5)
        except KeyboardInterrupt:
            return
    def run(self):
        sys.stdout.write('[%s] started ... process id: %s\n'
                         % (self.name, os.getpid()))
        self.someComputation()
        # When done, send final update to parent process and close pipe.
        self.ipcPipe.send([self.name, self.number1])
        self.ipcPipe.close()
        sys.stdout.write('[%s] task completed: %f\n' % (self.name, self.number1))
def main():
    # Create pipe
    parent_conn, child_conn = multiprocessing.Pipe()
    # Instantiate an object which contains the computation
    # (give "child process pipe" to the object so it can phone home :) )
    computeTask = computing_task('foo', child_conn)
    # Start process
    computeTask.start()
    # Continually send and receive updates to/from the child process
    try:
        while True:
            # receive data from child process
            result = parent_conn.recv()
            print "recv: ", result
            # send new data to child process
            parent_conn.send([random.uniform(0.0, 1.0)])
    except KeyboardInterrupt:
        computeTask.join()
        parent_conn.close()
        print "joined, exiting"
if (__name__ == "__main__"):
    main()
I have encapsulated the computing to be done inside a class derived from Process. This isn't strictly necessary but makes the code easier to understand and extend, in most cases. From the main process you can start your computing task with the start() method on an instance of this class (this will start a separate process to run the contents of your object).
As you can see, we use Pipe in the parent process to create two connectors ("ends" of the pipe) and give one to the child while the parent holds the other. Each of these connectors is a two-way communication mechanism between the processes holding the ends, with send() and recv() methods for doing what their names imply. In this example I've used the pipe to transmit lists of numbers and text, but in general you can send lists, tuples, objects, or anything that's picklable (i.e. serializable with Python's pickle facility). So you've got some latitude for what you send back and forth between processes.
So you set up your connectors, invoke start() on your new process, and you're off and computing. Here we're just multiplying two numbers, but you can see it's being done "interactively" in the subprocess with updates sent from the parent. Likewise the parent process is informed regularly of new results from the computing process.
Note that the connector's recv() method is blocking, i.e. if the other end hasn't sent anything yet, recv() will wait until something is there to read, and prevent anything else from happening in the meantime. So just be aware of that.
Hope this helps. Again, this is a barebones example and in real life you'll want to do more error handling, possibly use poll() on the connection objects, and so forth, but hopefully this conveys the major ideas and gets you started.
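The poll() suggestion can look like this: Connection.poll(timeout) returns whether data is waiting, blocking for at most timeout seconds, so the parent (for example a GUI loop) can check for results without getting stuck in recv(). A minimal Python 3 sketch; drain is an illustrative helper, not a library function:

```python
import multiprocessing


def drain(conn, timeout=0.0):
    """Return every message already waiting on conn, blocking at most about timeout seconds."""
    messages = []
    while conn.poll(timeout):  # True only if there is something to recv()
        messages.append(conn.recv())
        timeout = 0.0  # after the first message, only take what is already there
    return messages


if __name__ == "__main__":
    parent_conn, child_conn = multiprocessing.Pipe()
    child_conn.send(['foo', 1.5])
    child_conn.send(['foo', 2.5])
    print(drain(parent_conn, timeout=0.1))  # [['foo', 1.5], ['foo', 2.5]]
    print(drain(parent_conn))               # [] (nothing waiting, returns at once)
```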
