raspberry pi 3 multiprocessing queue synchronization between 2 processes - python

I have written some simple code using the multiprocessing library to build an extra process apart from the main code (2 processes in total). I wrote this code on W7 Professional x64 through Anaconda-Spyder v3.2.4 and it works almost as I want, except that when I run it the memory consumption of my second process (not the main one) keeps increasing until it reaches the total capacity and the computer gets stuck and freezes (you can see this in the Windows Task Manager).
"""
Example to print data from a function using multiprocessing library
Created on Thu Jan 30 12:07:49 2018
author: Kevin Machado Gamboa
Contct: ing.kevin#hotmail.com
"""
from time import time
import numpy as np
from multiprocessing import Process, Queue, Event
t0=time()
def ppg_parameters(hr, minR, ampR, minIR, ampIR, t):
HR = float(hr)
f= HR * (1/60)
# Spo2 Red signal function
sR = minR + ampR * (0.05*np.sin(2*np.pi*t*3*f)
+ 0.4*np.sin(2*np.pi*t*f) + 0.25*np.sin(2*np.pi*t*2*f+45))
# Spo2 InfraRed signal function
sIR = minIR + ampIR * (0.05*np.sin(2*np.pi*t*3*f)
+ 0.4*np.sin(2*np.pi*t*f) + 0.25*np.sin(2*np.pi*t*2*f+45))
return sR, sIR
def loop(q):
"""
generates the values of the function ppg_parameters
"""
hr = 60
ampR = 1.0814 # amplitud for Red signal
minR = 0.0 # Desplacement from zero for Red signal
ampIR = 1.12 # amplitud for InfraRed signal
minIR = 0.7 # Desplacement from zero for Red signal
# infinite loop to generate the signal
while True:
t = time()-t0
y = ppg_parameters(hr, minR, ampR, minIR, ampIR, t)
q.put([t, y[0], y[1]])
if __name__ == "__main__":
_exit = Event()
q = Queue()
p = Process(target=loop, args=(q,))
p.start()
# starts the main process
while q.qsize() != 1:
try:
data = q.get(True,2) # takes each data from the queue
print(data[0], data[1], data[2])
except KeyboardInterrupt:
p.terminate()
p.join()
print('supposed to stop')
break
Why is this happening? Is it perhaps the while loop of my 2nd process? I don't know. I haven't seen this issue anywhere.
Moreover, if I run the same code on my RPi 3 Model B, there is a point where it raises an error saying the queue is empty, as if the main process were running faster than process two.
Any guess as to why this is happening, or any suggestion or link, would be helpful.
Thanks

It looks like inside your infinite loop you are adding to the queue, and I'm guessing that you are adding data faster than the other process can take it off the queue.
You could check the queue size periodically from inside the infinite loop and if it is over a certain amount (say 500 items), then you could sleep for a few seconds and then check again.
https://docs.python.org/2/library/queue.html#Queue.Queue.qsize
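A minimal sketch of that throttling idea (the 500-item threshold and the sleep interval are arbitrary choices, and generate_sample is a hypothetical stand-in for the real signal computation; note that multiprocessing.Queue.qsize() is not implemented on every platform, e.g. macOS, in which case a bounded Queue(maxsize=...) is an alternative):

import time
from multiprocessing import Process, Queue

def generate_sample():
    # placeholder for the real signal computation
    return time.time()

def producer(q):
    """Producer that backs off when the queue grows too large."""
    while True:
        if q.qsize() > 500:      # threshold is arbitrary; tune for your data rate
            time.sleep(2)        # give the consumer time to drain the queue
            continue
        q.put(generate_sample())

if __name__ == "__main__":
    q = Queue()
    Process(target=producer, args=(q,), daemon=True).start()
    for _ in range(10):
        print(q.get(True, 2))    # consume a few items, then exit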

Related

Best way to use parallel computing within a for-loop in Python [duplicate]

This is probably a trivial question, but how do I parallelize the following loop in python?
# setup output lists
output1 = list()
output2 = list()
output3 = list()

for j in range(0, 10):
    # calc individual parameter value
    parameter = j * offset
    # call the calculation
    out1, out2, out3 = calc_stuff(parameter=parameter)
    # put results into correct output list
    output1.append(out1)
    output2.append(out2)
    output3.append(out3)
I know how to start single threads in Python, but I don't know how to "collect" the results.
Multiple processes would be fine too - whatever is easiest for this case. I'm currently using Linux, but the code should run on Windows and Mac as well.
What's the easiest way to parallelize this code?
Using multiple threads on CPython won't give you better performance for pure-Python code due to the global interpreter lock (GIL). I suggest using the multiprocessing module instead:
import multiprocessing

pool = multiprocessing.Pool(4)
out1, out2, out3 = zip(*pool.map(calc_stuff, range(0, 10 * offset, offset)))
Note that this won't work in the interactive interpreter.
To avoid the usual FUD around the GIL: There wouldn't be any advantage to using threads for this example anyway. You want to use processes here, not threads, because they avoid a whole bunch of problems.
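For completeness, here is a minimal self-contained script version of the same idea; calc_stuff is a stand-in for the real function and offset is an example value, since the question leaves both unspecified, and the __main__ guard is what lets it run as a script rather than interactively:

import multiprocessing

def calc_stuff(parameter):
    # stand-in for the real computation
    return parameter, parameter ** 2, parameter ** 3

if __name__ == "__main__":
    offset = 2  # example value only
    with multiprocessing.Pool(4) as pool:
        out1, out2, out3 = zip(*pool.map(calc_stuff, range(0, 10 * offset, offset)))
    print(out1, out2, out3)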
from joblib import Parallel, delayed

def process(i):
    return i * i

results = Parallel(n_jobs=2)(delayed(process)(i) for i in range(10))
print(results)  # prints [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
The above works beautifully on my machine (Ubuntu, package joblib was pre-installed, but can be installed via pip install joblib).
Taken from https://blog.dominodatalab.com/simple-parallelization/
Edit on Mar 31, 2021: On joblib, multiprocessing, threading and asyncio
joblib in the above code uses import multiprocessing under the hood (and thus multiple processes, which is typically the best way to run CPU work across cores - because of the GIL)
You can let joblib use multiple threads instead of multiple processes, but this (or using import threading directly) is only beneficial if the threads spend considerable time on I/O (e.g. read/write to disk, send an HTTP request). For I/O work, the GIL does not block the execution of another thread
Since Python 3.7, as an alternative to threading, you can parallelise work with asyncio, but the same advice applies as for import threading (though in contrast to the latter, only one thread is used; on the plus side, asyncio has a lot of nice features that are helpful for async programming)
Using multiple processes incurs overhead. Think about it: Typically, each process needs to initialise/load everything you need to run your calculation. You need to check yourself if the above code snippet improves your wall time. Here is another one, for which I confirmed that joblib produces better results:
import time
from joblib import Parallel, delayed

def countdown(n):
    while n > 0:
        n -= 1
    return n

t = time.time()
for _ in range(20):
    print(countdown(10**7), end=" ")
print(time.time() - t)
# takes ~10.5 seconds on medium sized Macbook Pro

t = time.time()
results = Parallel(n_jobs=2)(delayed(countdown)(10**7) for _ in range(20))
print(results)
print(time.time() - t)
# takes ~6.3 seconds on medium sized Macbook Pro
To parallelize a simple for loop, joblib brings a lot of value over raw use of multiprocessing: not only the short syntax, but also things like transparent bunching of iterations when they are very fast (to remove the overhead) and capturing of the traceback of the child process, for better error reporting. A small illustration of the bunching knob follows.
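For instance, joblib's Parallel exposes a batch_size argument (defaulting to 'auto') that controls this bunching; a small illustrative sketch, with tiny_task as a made-up workload:

from joblib import Parallel, delayed

def tiny_task(i):
    return i * i

# batch_size='auto' (the default) groups very fast iterations into batches
# to amortize inter-process dispatch overhead; an int fixes the batch size.
results = Parallel(n_jobs=2, batch_size="auto")(
    delayed(tiny_task)(i) for i in range(10_000)
)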
Disclaimer: I am the original author of joblib.
This IS the easiest way to do it!
You can use asyncio. (Documentation can be found here.) It is used as a foundation for multiple Python asynchronous frameworks that provide high-performance network and web servers, database connection libraries, distributed task queues, etc. Plus, it has both high-level and low-level APIs to accommodate any kind of problem.
import asyncio

def background(f):
    def wrapped(*args, **kwargs):
        return asyncio.get_event_loop().run_in_executor(None, f, *args, **kwargs)
    return wrapped

@background
def your_function(argument):
    # your code here
    ...
Now this function will be run in parallel whenever it is called, without putting the main program into a wait state. You can use it to parallelize a for loop as well: the loop itself is still sequential, but every iteration runs in parallel to the main program as soon as the interpreter gets there.
1. Firing loop in parallel to main thread without any waiting
import time

@background
def your_function(argument):
    time.sleep(5)
    print('function finished for ' + str(argument))

for i in range(10):
    your_function(i)

print('loop finished')
This produces the following output:
loop finished
function finished for 4
function finished for 8
function finished for 0
function finished for 3
function finished for 6
function finished for 2
function finished for 5
function finished for 7
function finished for 9
function finished for 1
Update: May 2022
Although this answers the original question, there are ways to wait for the loops to finish, as requested in upvoted comments, so I am adding them here as well. The keys to the implementations are asyncio.gather() and run_until_complete(). Consider the following functions:
import asyncio
import time

def background(f):
    def wrapped(*args, **kwargs):
        return asyncio.get_event_loop().run_in_executor(None, f, *args, **kwargs)
    return wrapped

@background
def your_function(argument, other_argument):  # Added another argument
    time.sleep(5)
    print(f"function finished for {argument=} and {other_argument=}")

def code_to_run_before():
    print('This runs Before Loop!')

def code_to_run_after():
    print('This runs After Loop!')
2. Run in parallel but wait for finish
code_to_run_before() # Anything you want to run before, run here!
loop = asyncio.get_event_loop() # Have a new event loop
looper = asyncio.gather(*[your_function(i, 1) for i in range(1, 5)]) # Run the loop
results = loop.run_until_complete(looper) # Wait until finish
code_to_run_after() # Anything you want to run after, run here!
This produces the following output:
This runs Before Loop!
function finished for argument=2 and other_argument=1
function finished for argument=3 and other_argument=1
function finished for argument=1 and other_argument=1
function finished for argument=4 and other_argument=1
This runs After Loop!
3. Run multiple loops in parallel and wait for finish
code_to_run_before() # Anything you want to run before, run here!
loop = asyncio.get_event_loop() # Have a new event loop
group1 = asyncio.gather(*[your_function(i, 1) for i in range(1, 2)]) # Run all the loops you want
group2 = asyncio.gather(*[your_function(i, 2) for i in range(3, 5)]) # Run all the loops you want
group3 = asyncio.gather(*[your_function(i, 3) for i in range(6, 9)]) # Run all the loops you want
all_groups = asyncio.gather(group1, group2, group3) # Gather them all
results = loop.run_until_complete(all_groups) # Wait until finish
code_to_run_after() # Anything you want to run after, run here!
This produces the following output:
This runs Before Loop!
function finished for argument=3 and other_argument=2
function finished for argument=1 and other_argument=1
function finished for argument=6 and other_argument=3
function finished for argument=4 and other_argument=2
function finished for argument=7 and other_argument=3
function finished for argument=8 and other_argument=3
This runs After Loop!
4. Loops running sequentially but iterations of each loop running in parallel to one another
code_to_run_before()  # Anything you want to run before, run here!

for loop_number in range(3):
    loop = asyncio.get_event_loop()  # Have a new event loop
    looper = asyncio.gather(*[your_function(i, loop_number) for i in range(1, 5)])  # Run the loop
    results = loop.run_until_complete(looper)  # Wait until finish
    print(f"finished for {loop_number=}")

code_to_run_after()  # Anything you want to run after, run here!
This produces the following output:
This runs Before Loop!
function finished for argument=3 and other_argument=0
function finished for argument=4 and other_argument=0
function finished for argument=1 and other_argument=0
function finished for argument=2 and other_argument=0
finished for loop_number=0
function finished for argument=4 and other_argument=1
function finished for argument=3 and other_argument=1
function finished for argument=2 and other_argument=1
function finished for argument=1 and other_argument=1
finished for loop_number=1
function finished for argument=1 and other_argument=2
function finished for argument=4 and other_argument=2
function finished for argument=3 and other_argument=2
function finished for argument=2 and other_argument=2
finished for loop_number=2
This runs After Loop!
Update: June 2022
In its current form, this may not run on some versions of Jupyter Notebook, because Jupyter Notebook already runs its own event loop. To make it work on such Jupyter versions, nest_asyncio (which nests the event loop, as the name suggests) is the way to go. Just import and apply it at the top of the cell:
import nest_asyncio
nest_asyncio.apply()
And all the functionality discussed above should be accessible in a notebook environment as well.
What's the easiest way to parallelize this code?
Use a PoolExecutor from concurrent.futures. Compare the original code with this, side by side. First, the most concise way to approach this is with executor.map:
...
with ProcessPoolExecutor() as executor:
    for out1, out2, out3 in executor.map(calc_stuff, parameters):
        ...
or broken down by submitting each call individually:
...
with ThreadPoolExecutor() as executor:
    futures = []
    for parameter in parameters:
        futures.append(executor.submit(calc_stuff, parameter))
    for future in futures:
        out1, out2, out3 = future.result()  # this will block
...
Leaving the context signals the executor to free up resources
You can use threads or processes and use the exact same interface.
A working example
Here is working example code that will demonstrate the value of each executor:
Put this in a file - futuretest.py:
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from time import time
from http.client import HTTPSConnection

def processor_intensive(arg):
    def fib(n):  # recursive, processor intensive calculation (avoid n > 36)
        return fib(n-1) + fib(n-2) if n > 1 else n
    start = time()
    result = fib(arg)
    return time() - start, result

def io_bound(arg):
    start = time()
    con = HTTPSConnection(arg)
    con.request('GET', '/')
    result = con.getresponse().getcode()
    return time() - start, result

def manager(PoolExecutor, calc_stuff):
    if calc_stuff is io_bound:
        inputs = ('python.org', 'stackoverflow.com', 'stackexchange.com',
                  'noaa.gov', 'parler.com', 'aaronhall.dev')
    else:
        inputs = range(25, 32)
    timings, results = list(), list()
    start = time()
    with PoolExecutor() as executor:
        for timing, result in executor.map(calc_stuff, inputs):
            # put results into correct output list:
            timings.append(timing), results.append(result)
    finish = time()
    print(f'{calc_stuff.__name__}, {PoolExecutor.__name__}')
    print(f'wall time to execute: {finish-start}')
    print(f'total of timings for each call: {sum(timings)}')
    print(f'time saved by parallelizing: {sum(timings) - (finish-start)}')
    print(dict(zip(inputs, results)), end='\n\n')

def main():
    for computation in (processor_intensive, io_bound):
        for pool_executor in (ProcessPoolExecutor, ThreadPoolExecutor):
            manager(pool_executor, calc_stuff=computation)

if __name__ == '__main__':
    main()
And here's the output for one run of python -m futuretest:
processor_intensive, ProcessPoolExecutor
wall time to execute: 0.7326343059539795
total of timings for each call: 1.8033506870269775
time saved by parallelizing: 1.070716381072998
{25: 75025, 26: 121393, 27: 196418, 28: 317811, 29: 514229, 30: 832040, 31: 1346269}
processor_intensive, ThreadPoolExecutor
wall time to execute: 1.190223217010498
total of timings for each call: 3.3561410903930664
time saved by parallelizing: 2.1659178733825684
{25: 75025, 26: 121393, 27: 196418, 28: 317811, 29: 514229, 30: 832040, 31: 1346269}
io_bound, ProcessPoolExecutor
wall time to execute: 0.533886194229126
total of timings for each call: 1.2977914810180664
time saved by parallelizing: 0.7639052867889404
{'python.org': 301, 'stackoverflow.com': 200, 'stackexchange.com': 200, 'noaa.gov': 301, 'parler.com': 200, 'aaronhall.dev': 200}
io_bound, ThreadPoolExecutor
wall time to execute: 0.38941240310668945
total of timings for each call: 1.6049387454986572
time saved by parallelizing: 1.2155263423919678
{'python.org': 301, 'stackoverflow.com': 200, 'stackexchange.com': 200, 'noaa.gov': 301, 'parler.com': 200, 'aaronhall.dev': 200}
Processor-intensive analysis
When performing processor intensive calculations in Python, expect the ProcessPoolExecutor to be more performant than the ThreadPoolExecutor.
Due to the Global Interpreter Lock (a.k.a. the GIL), threads cannot use multiple processors, so expect the time for each calculation and the wall time (elapsed real time) to be greater.
IO-bound analysis
On the other hand, when performing IO bound operations, expect ThreadPoolExecutor to be more performant than ProcessPoolExecutor.
Python's threads are real OS threads. They can be put to sleep by the operating system and reawakened when their information arrives.
Final thoughts
I suspect that multiprocessing will be slower on Windows, since Windows doesn't support forking so each new process has to take time to launch.
You can nest multiple threads inside multiple processes, but it's recommended to not use multiple threads to spin off multiple processes.
If faced with a heavy processing problem in Python, you can trivially scale with additional processes - but not so much with threading.
There are a number of advantages to using Ray:
You can parallelize over multiple machines in addition to multiple cores (with the same code).
Efficient handling of numerical data through shared memory (and zero-copy serialization).
High task throughput with distributed scheduling.
Fault tolerance.
In your case, you could start Ray and define a remote function
import ray

ray.init()

@ray.remote(num_return_vals=3)
def calc_stuff(parameter=None):
    # Do something.
    return 1, 2, 3
and then invoke it in parallel
output1, output2, output3 = [], [], []

# Launch the tasks.
for j in range(10):
    id1, id2, id3 = calc_stuff.remote(parameter=j)
    output1.append(id1)
    output2.append(id2)
    output3.append(id3)

# Block until the results have finished and get the results.
output1 = ray.get(output1)
output2 = ray.get(output2)
output3 = ray.get(output3)
To run the same example on a cluster, the only line that would change would be the call to ray.init(). The relevant documentation can be found here.
Note that I'm helping to develop Ray.
I found joblib very useful. Please see the following example:
from joblib import Parallel, delayed

def yourfunction(k):
    s = 3.14 * k * k
    print("Area of a circle with a radius", k, "is:", s)

element_run = Parallel(n_jobs=-1)(delayed(yourfunction)(k) for k in range(1, 10))
n_jobs=-1: use all available cores
Dask futures; I'm surprised no one has mentioned it yet . . .
from dask.distributed import Client

client = Client(n_workers=8)  # In this example I have 8 cores and processes (can also use threads if desired)

def my_function(i):
    output = <code to execute in the for loop here>
    return output

futures = []
for i in <whatever you want to loop across here>:
    future = client.submit(my_function, i)
    futures.append(future)

results = client.gather(futures)
client.close()
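A filled-in version of the same pattern, using a trivial stand-in for my_function and a made-up input range, just to show the shape of a runnable script:

from dask.distributed import Client

def my_function(i):
    return i * i  # stand-in for the real loop body

if __name__ == "__main__":
    client = Client(n_workers=4)          # processes by default
    futures = [client.submit(my_function, i) for i in range(10)]
    results = client.gather(futures)      # blocks until all futures finish
    print(results)
    client.close()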
Why don't you use threads, and one mutex to protect one global list?
import os
import re
import time
import sys
import thread
from threading import Thread

class thread_it(Thread):
    def __init__(self, param):
        Thread.__init__(self)
        self.param = param

    def run(self):
        mutex.acquire()
        output.append(calc_stuff(self.param))
        mutex.release()

threads = []
output = []
mutex = thread.allocate_lock()

for j in range(0, 10):
    current = thread_it(j * offset)
    threads.append(current)
    current.start()

for t in threads:
    t.join()

# here you have the output list filled with data
keep in mind, you will be as fast as your slowest thread
Thanks @iuryxavier
from multiprocessing import Pool
from multiprocessing import cpu_count

def add_1(x):
    return x + 1

if __name__ == "__main__":
    pool = Pool(cpu_count())
    results = pool.map(add_1, range(10**12))
    pool.close()  # 'TERM'
    pool.join()   # 'KILL'
The concurrent wrappers by the tqdm library are a nice way to parallelize longer-running code. tqdm provides feedback on the current progress and remaining time through a smart progress meter, which I find very useful for long computations.
Loops can be rewritten to run as concurrent threads through a simple call to thread_map, or as concurrent multi-processes through a simple call to process_map:
import time
from tqdm.contrib.concurrent import thread_map, process_map

def calc_stuff(num, multiplier):
    time.sleep(1)
    return num, num * multiplier

if __name__ == "__main__":
    # let's parallelize this for loop:
    # results = [calc_stuff(i, 2) for i in range(64)]
    loop_idx = range(64)
    multiplier = [2] * len(loop_idx)

    # either with threading:
    results_threading = thread_map(calc_stuff, loop_idx, multiplier)

    # or with multi-processing:
    results_processes = process_map(calc_stuff, loop_idx, multiplier)
Let's say we have an async function
async def work_async(self, student_name: str, code: str, loop):
    """
    Some async function
    """
    # Do some async processing
That needs to be run on a large array. Some attributes are passed in directly and some are taken from a property of a dictionary element in the array.
async def process_students(self, student_name: str, loop):
    market = sys.argv[2]
    subjects = [...]  # Some large array
    batchsize = 5
    for i in range(0, len(subjects), batchsize):
        batch = subjects[i:i+batchsize]
        await asyncio.gather(*(self.work_async(student_name,
                                               sub['Code'],
                                               loop)
                               for sub in batch))
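A self-contained sketch of the same batching idea outside of a class; work_async here is a stand-in coroutine and the inputs are made up:

import asyncio

async def work_async(item):
    await asyncio.sleep(0.1)   # stand-in for real async work
    return item * 2

async def process_all(items, batch_size=5):
    results = []
    for i in range(0, len(items), batch_size):
        batch = items[i:i + batch_size]
        # run one batch concurrently, then move on to the next batch
        results.extend(await asyncio.gather(*(work_async(x) for x in batch)))
    return results

if __name__ == "__main__":
    print(asyncio.run(process_all(list(range(12)))))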
This could be useful when implementing multiprocessing and parallel/distributed computing in Python.
YouTube tutorial on using techila package
Techila is a distributed computing middleware, which integrates directly with Python using the techila package. The peach function in the package can be useful in parallelizing loop structures. (The following code snippet is from the Techila Community Forums.)
techila.peach(funcname = 'theheavyalgorithm', # Function that will be called on the compute nodes/ Workers
files = 'theheavyalgorithm.py', # Python-file that will be sourced on Workers
jobs = jobcount # Number of Jobs in the Project
)
Have a look at this:
http://docs.python.org/library/queue.html
This might not be the right way to do it, but I'd do something like the following.
Actual code:
from multiprocessing import Process, JoinableQueue as Queue
from queue import Empty

class CustomWorker(Process):
    def __init__(self, workQueue, out1, out2, out3):
        Process.__init__(self)
        self.input = workQueue
        self.out1 = out1
        self.out2 = out2
        self.out3 = out3

    def run(self):
        while True:
            try:
                value = self.input.get()
                # value modifier
                temp1, temp2, temp3 = self.calc_stuff(value)
                self.out1.put(temp1)
                self.out2.put(temp2)
                self.out3.put(temp3)
                self.input.task_done()
            except Empty:
                return
                # Catch things better here

    def calc_stuff(self, param):
        out1 = param * 2
        out2 = param * 4
        out3 = param * 8
        return out1, out2, out3

def Main():
    inputQueue = Queue()
    for i in range(10):
        inputQueue.put(i)
    out1 = Queue()
    out2 = Queue()
    out3 = Queue()
    processes = []
    for x in range(2):
        p = CustomWorker(inputQueue, out1, out2, out3)
        p.daemon = True
        p.start()
        processes.append(p)
    inputQueue.join()
    while not out1.empty():
        print(out1.get())
        print(out2.get())
        print(out3.get())

if __name__ == '__main__':
    Main()
Hope that helps.
A very simple example of parallel processing is:
from multiprocessing import Process

output1 = list()
output2 = list()
output3 = list()

def yourfunction():
    for j in range(0, 10):
        # calc individual parameter value
        parameter = j * offset
        # call the calculation
        out1, out2, out3 = calc_stuff(parameter=parameter)
        # put results into correct output list
        output1.append(out1)
        output2.append(out2)
        output3.append(out3)

if __name__ == '__main__':
    p = Process(target=yourfunction)
    p.start()
    p.join()

Starting a large number of dependent process in async using python multiprocessing

Problem: I have a DAG (directed acyclic graph)-like structure for starting the execution of some massive data processing on a machine. Some of the processes can only be started when their parent data processing is completed, because there are multiple levels of processing. I want to use the Python multiprocessing library to handle all of it on one single machine as a first goal, and later scale to execute on different machines using Managers. I have no prior experience with Python multiprocessing. Can anyone suggest whether it is a good library to begin with? If yes, some basic implementation idea would do just fine. If not, what else can be used to do this in Python?
Example:
A -> B
B -> D, E, F, G
C -> D
In the above example I want to kick off A and C first (in parallel); after their successful execution, the remaining processes would just wait for B to finish first. As soon as B finishes its execution, all the other processes will start.
P.S.: Sorry, I cannot share the actual data because it is confidential, though I tried to make the idea clear using the example.
I'm a big fan of using processes and queues for things like this.
Like so:
from multiprocessing import Process, Queue
from Queue import Empty as QueueEmpty
import time

# example process functions
def processA(queueA, queueB):
    while True:
        try:
            data = queueA.get_nowait()
            if data == 'END':
                break
        except QueueEmpty:
            time.sleep(2)  # wait some time for data to enter queue
            continue
        # do stuff with data
        queueB.put(data)

def processB(queueB, _):
    while True:
        try:
            data = queueB.get_nowait()
            if data == 'END':
                break
        except QueueEmpty:
            time.sleep(2)  # wait some time for data to enter queue
            continue
        # do stuff with data

# helper functions for starting and stopping processes
def start_procs(num_workers, target_function, args):
    procs = []
    for _ in range(num_workers):
        p = Process(target=target_function, args=args)
        p.start()
        procs.append(p)
    return procs

def shutdown_process(proc_lst, queue):
    for _ in proc_lst:
        queue.put('END')
    for p in proc_lst:
        try:
            p.join()
        except KeyboardInterrupt:
            break

queueA = Queue(<size of queue> * 3)  # needs to be a bit bigger than actual. 3x works well for me
queueB = Queue(<size of queue>)
queueC = Queue(<size of queue>)
queueD = Queue(<size of queue>)

procsA = start_procs(number_of_workers, processA, (queueA, queueB))
procsB = start_procs(number_of_workers, processB, (queueB, None))

# feed some data to processA
[queueA.put(data) for data in start_data]

# shutdown processes
shutdown_process(procsA, queueA)
shutdown_process(procsB, queueB)

# etc, etc. You could arrange the start, stop, and data feed statements to arrive at the dag behaviour you desire
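For the specific A/B/C/D/E/F/G example in the question, the same Process/join primitives can express the dependency order directly; this is only a minimal sketch with placeholder task bodies, and it waits for both A and C before B, which is slightly stricter than the DAG requires:

from multiprocessing import Process

def task(name):
    print(f"running {name}")   # placeholder for the real data processing

def run_level(names):
    procs = [Process(target=task, args=(n,)) for n in names]
    for p in procs:
        p.start()
    for p in procs:
        p.join()               # wait for this level before starting the next

if __name__ == "__main__":
    run_level(["A", "C"])           # A and C have no parents, run them in parallel
    run_level(["B"])                # B depends on A
    run_level(["D", "E", "F", "G"]) # D depends on B and C; E, F, G depend on B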

Python numpy.fft very slow (10x slower) when run in subprocess

I've found that numpy.fft.fft (and its variants) is very slow when run in the background. Here is an example of what I'm talking about:
import numpy as np
import multiprocessing as mproc
import time
import sys

# the producer function, which will run in the background and produce data
def Producer(dataQ):
    numFrames = 5
    n = 0
    while n < numFrames:
        data = np.random.rand(3000, 200)
        dataQ.put(data)   # send the data to the consumer
        time.sleep(0.1)   # sleep so we don't overload the CPU
        n += 1

# the consumer function, which will run in the background and consume data from the producer
def Consumer(dataQ):
    while True:
        data = dataQ.get()
        t1 = time.time()
        fftdata = np.fft.rfft(data, n=3000*5)
        tDiff = time.time() - t1
        print("Elapsed time is %0.3f" % tDiff)
        time.sleep(0.01)
        sys.stdout.flush()

# the "if __name__ == '__main__':" guard is necessary so this block runs
# only when the program is started by the user, not in the child processes
if __name__ == '__main__':
    data = np.random.rand(3000, 200)
    t1 = time.time()
    fftdata = np.fft.rfft(data, n=3000*5, axis=0)
    tDiff = time.time() - t1
    print("Elapsed time is %0.3f" % tDiff)

    # generate a queue for transferring data between the producer and the consumer
    dataQ = mproc.Queue(4)

    # start up the processes
    producerProcess = mproc.Process(target=Producer, args=[dataQ], daemon=False)
    consumerProcess = mproc.Process(target=Consumer, args=[dataQ], daemon=False)

    print("starting up processes")
    producerProcess.start()
    consumerProcess.start()

    time.sleep(10)  # let the program run for 10 seconds
    producerProcess.terminate()
    consumerProcess.terminate()
The output it produces on my machine:
Elapsed time is 0.079
starting up processes
Elapsed time is 0.859
Elapsed time is 0.861
Elapsed time is 0.878
Elapsed time is 0.863
Elapsed time is 0.758
As you can see, it is roughly 10x slower when run in the background, and I can't figure out why this would be the case. The time.sleep() calls should ensure that the other processes (the main process and the producer process) aren't doing anything when the FFT is being computed, so it should be able to use all the cores. I've checked CPU utilization through the Windows Task Manager and it seems to use about 25% when numpy.fft.fft is called heavily, in both the single-process and multiprocess cases.
Anyone have an idea what's going on?
The main problem is that your fft call in the background process is:
fftdata = np.fft.rfft(data, n=3000*5)
rather than:
fftdata = np.fft.rfft(data, n=3000*5, axis=0)
which for me made all the difference.
There are a few other things worth noting. Rather than having the time.sleep() everywhere, why not just let the processor take care of this itself? Furthermore, rather than suspending the main thread, you can use
consumerProcess.join()
and then have the producer process run dataQ.put(None) once it has finished loading the data, and break out of the loop in the consumer process, i.e.:
def Consumer(dataQ):
    while True:
        data = dataQ.get()
        if data is None:
            break
        ...
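Putting those two suggestions together, a minimal sketch of the sentinel-plus-join shutdown; the producer and consumer bodies are reduced stand-ins for the originals:

import multiprocessing as mproc
import numpy as np

def Producer(dataQ, num_frames=5):
    for _ in range(num_frames):
        dataQ.put(np.random.rand(3000, 200))
    dataQ.put(None)                 # sentinel: tells the consumer to stop

def Consumer(dataQ):
    while True:
        data = dataQ.get()
        if data is None:            # stop cleanly instead of being terminated
            break
        np.fft.rfft(data, n=3000 * 5, axis=0)

if __name__ == '__main__':
    dataQ = mproc.Queue(4)
    p = mproc.Process(target=Producer, args=(dataQ,))
    c = mproc.Process(target=Consumer, args=(dataQ,))
    p.start()
    c.start()
    p.join()                        # no terminate() calls needed
    c.join()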

Parallelly started Processes in Python executing serially?

I have an understanding problem/question concerning the multiprocessing library of Python:
Why do different processes started (almost) simultaneously at least seem to execute serially instead of in parallel?
The task is to control a universe of a large number of particles (a particle being a set of x/y/z coordinates and a mass) and perform various analyses on them while taking advantage of a multi-processor environment. Specifically for the example shown below I want to calculate the center of mass of all particles.
Because the task specifically says to use multiple processors I didn't use the thread library as there is this GIL-thingy in place that constrains the execution to one processor.
Here's my code:
from multiprocessing import Process, Lock, Array, Value
from random import random
import math
from time import time

def exercise2(noOfParticles, noOfProcs):
    startingTime = time()
    particles = []
    processes = []
    centerCoords = Array('d', [0, 0, 0])
    totalMass = Value('d', 0)
    lock = Lock()

    # create all particles
    for i in range(noOfParticles):
        p = Particle()
        particles.append(p)

    for i in range(noOfProcs):
        # determine the number of particles every process needs to analyse
        particlesPerProcess = math.ceil(noOfParticles / noOfProcs)
        # create noOfProcs Processes, each with a different set of particles
        p = Process(target=processBatch, args=(
            particles[i*particlesPerProcess:(i+1)*particlesPerProcess],
            centerCoords,        # handle to shared memory
            totalMass,           # handle to shared memory
            lock,                # handle to lock
            'batch' + str(i)),   # also pass name of process for easier logging
            name='batch' + str(i))
        processes.append(p)
        print('created proc:', i)

    # start all processes
    for p in processes:
        p.start()  # here, the program waits for the started process to terminate. why?

    # wait for all processes to finish
    for p in processes:
        p.join()

    # normalize the coordinates
    centerCoords[0] /= totalMass.value
    centerCoords[1] /= totalMass.value
    centerCoords[2] /= totalMass.value
    print(centerCoords[:])
    print('total time used', time() - startingTime, ' seconds')

class Particle():
    """a particle is a very simple physical object, having a set of x/y/z coordinates and a mass.
    All values are randomly set at initialization of the object"""
    def __init__(self):
        self.x = random() * 1000
        self.y = random() * 1000
        self.z = random() * 1000
        self.m = random() * 10

    def printProperties(self):
        attrs = vars(self)
        print('\n'.join("%s: %s" % item for item in attrs.items()))

def processBatch(particles, centerCoords, totalMass, lock, name):
    """calculates the mass-weighted sum of all coordinates of all particles as well as the sum of all masses.
    Writes the results into the shared memory centerCoords and totalMass, using lock"""
    print(name, ' started')
    mass = 0
    centerX = 0
    centerY = 0
    centerZ = 0
    for p in particles:
        centerX += p.m * p.x
        centerY += p.m * p.y
        centerZ += p.m * p.z
        mass += p.m
    with lock:
        centerCoords[0] += centerX
        centerCoords[1] += centerY
        centerCoords[2] += centerZ
        totalMass.value += mass
    print(name, ' ended')

if __name__ == '__main__':
    exercise2(2**16, 6)
Now I'd expect all processes to start at about the same time and execute in parallel. But when I look at the output of the program, it looks as if the processes were executing serially:
created proc: 0
created proc: 1
created proc: 2
created proc: 3
created proc: 4
created proc: 5
batch0 started
batch0 ended
batch1 started
batch1 ended
batch2 started
batch2 ended
batch3 started
batch3 ended
batch4 started
batch4 ended
batch5 started
batch5 ended
[499.72234074100135, 497.26586187539453, 498.9208784328791]
total time used 4.7220001220703125 seconds
Also, when stepping through the program using the Eclipse debugger, I can see how the program always waits for one process to terminate before starting the next one, at the line marked with a comment ending in 'why?'. Of course, this might just be the debugger, but the output produced in a normal run shows exactly the above picture.
Are those processes executing in parallel and I just can't see it due to some stdout sharing problem?
If the processes are executing serially: why? And how can I make them run in parallel?
Any help on understanding this is greatly appreciated.
I executed the above code from PyDev and from command line using Python 3.2.3 on a Windows 7 Machine with a dual core Intel processor.
Edit:
Due to the output of the program I misunderstood the problem: The processes are actually running in parallel, but the overhead of pickling large amounts of data and sending it to the subprocesses takes so long that it completely distorts the picture.
Moving the creation of the particles (i.e. the data) to the subprocesses so that they don't have to be pickled in the first place removed all problems and resulted in a useful, parallel execution of the program.
To solve the task, I will therefore have to keep the particles in shared memory so they don't have to be passed to subprocesses.
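A minimal sketch of that change: each worker creates its own particles (so nothing large is pickled), and only the small aggregated sums travel back through the shared Array and Value, mirroring the aggregation code above:

from multiprocessing import Process, Lock, Array, Value
from random import random

def process_batch(n_particles, centerCoords, totalMass, lock):
    # create the particles locally instead of receiving them pickled
    cx = cy = cz = mass = 0.0
    for _ in range(n_particles):
        x, y, z, m = random() * 1000, random() * 1000, random() * 1000, random() * 10
        cx += m * x
        cy += m * y
        cz += m * z
        mass += m
    with lock:
        centerCoords[0] += cx
        centerCoords[1] += cy
        centerCoords[2] += cz
        totalMass.value += mass

if __name__ == '__main__':
    centerCoords, totalMass, lock = Array('d', [0, 0, 0]), Value('d', 0), Lock()
    procs = [Process(target=process_batch,
                     args=(2**16 // 6, centerCoords, totalMass, lock))
             for _ in range(6)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print([c / totalMass.value for c in centerCoords[:]])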
I ran your code on my system (Python 2.6.5) and it returned almost instantly with results, which makes me think that perhaps your task size is so small that the processes finish before the next can begin (note that starting a process is slower than spinning up a thread). I question the total time used 4.7220001220703125 seconds in your results, because that's about 40x longer than it took my system to run the same code. I scaled up the number of particles to 2**20, and I got the following results:
('created proc:', 0)
('created proc:', 1)
('created proc:', 2)
('created proc:', 3)
('created proc:', 4)
('created proc:', 5)
('batch0', ' started')
('batch1', ' started')
('batch2', ' started')
('batch3', ' started')
('batch4', ' started')
('batch5', ' started')
('batch0', ' ended')
('batch1', ' ended')
('batch2', ' ended')
('batch3', ' ended')
('batch5', ' ended')
('batch4', ' ended')
[500.12090773656854, 499.92759577086059, 499.97075039983588]
('total time used', 5.1031057834625244, ' seconds')
That's more in line with what I would expect. What do you get if you increase the task size?

