I'd like to do multi-core processing on very long lists (not NumPy arrays!), but I can't get it to work, and the examples I have found don't help much either. My idea is to split the list into several equal-sized parts, do something with the data in each part, and return the modified data. The example operation below is obviously simple; in reality it contains a number of if-statements and for-loops.
ncore = 4
size = 100
vectors = []
for icore in range(ncore):
    vectors.append([vector[ind:ind+size] for ind in range(icore * size, (icore + 1) * size)])
and the functions
from multiprocessing import Process

def some_func(vector):
    return [val*val for val in vector]

if True:
    procs = []
    for vector in vectors:
        # print(name)
        proc = Process(target=some_func, args=(vector,))
        procs.append(proc)
        proc.start()

    # complete the processes
    for proc in procs:
        proc.join()
But there is no output.
Solutions?
thanks, Andreas
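For reference, the return value of a Process target is simply discarded, which is why nothing comes back here: some_func does run, but its result never reaches the parent. Below is a minimal sketch of the chunk-and-collect idea using a Pool instead; the chunks helper and the dummy vector are only illustrative stand-ins, not part of the original code:

from multiprocessing import Pool

def some_func(vector):
    # stand-in for the real work (if-statements, for-loops, ...)
    return [val * val for val in vector]

def chunks(seq, n):
    # split seq into n roughly equal-sized consecutive parts
    step = (len(seq) + n - 1) // n
    return [seq[i:i + step] for i in range(0, len(seq), step)]

if __name__ == "__main__":
    ncore = 4
    vector = list(range(400))  # dummy data standing in for the very long list
    with Pool(ncore) as pool:
        parts = pool.map(some_func, chunks(vector, ncore))
    result = [x for part in parts for x in part]  # flatten back into one list
    print(len(result), result[:5])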
I am working on a project where I am using multiprocessing and trying to minimise the run time. (I have measured that one process takes around 4 seconds, so 8 processes running in parallel should take about the same, or say 6 to 7 seconds at most.)
In the list of arguments, a Manager.list() (let's call it main_list) is a common argument passed to every process; each process appends a list to main_list after processing a txt file (which involves conversions, transformations and multiplications of hex data).
The same procedure is followed in all 8 processes.
Using Manager.list(), it was taking around 22 seconds, and I wanted a way to reduce this time. I am now using a Queue to achieve my goal, but it seems like the queue will not be effective for this method?
import time
import multiprocessing as mp

def square(x, q):
    q.put((x, x*x))

if __name__=='__main__':
    qout = mp.Queue()
    processes = []
    t1 = time.perf_counter()
    for i in range(10):
        p = mp.Process(target=square, args=(i, qout))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
    unsorted_result = [qout.get() for p in processes]
    result = [t[1] for t in sorted(unsorted_result)]
    t2 = time.perf_counter()
    print(t2-t1)
    print(result)
OUTPUT
0.7646916
I want to be sure whether I can use a Queue this way instead of Manager.list() to reduce this time.
I am sorry for not sharing the actual code.
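One caveat about the Queue pattern above, taken from the multiprocessing documentation rather than from anything specific to your code: a process that has put items on a multiprocessing.Queue will not terminate until those items have been flushed to the underlying pipe, so joining all the workers before draining the queue can deadlock once the items get large. Draining first is the safer ordering; a sketch of the same loop rearranged that way:

import time
import multiprocessing as mp

def square(x, q):
    q.put((x, x * x))

if __name__ == '__main__':
    qout = mp.Queue()
    processes = []
    t1 = time.perf_counter()
    for i in range(10):
        p = mp.Process(target=square, args=(i, qout))
        p.start()
        processes.append(p)
    # drain the queue first, then join, to avoid the documented flush/join deadlock
    unsorted_result = [qout.get() for _ in processes]
    for p in processes:
        p.join()
    result = [t[1] for t in sorted(unsorted_result)]
    t2 = time.perf_counter()
    print(t2 - t1)
    print(result)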
See my comment to your question. This would be the solution using a multiprocessing pool with method map:
from multiprocessing import Pool

def square(x):
    return x * x

if __name__=='__main__':
    # Create a pool with 10 processes:
    pool = Pool(10)
    result = pool.map(square, range(10))
    print(result)
    pool.close()
    pool.join()
Prints:
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
The managed list that you were using is represented by a proxy object. Every append operation you do on that list results in a message being sent to a thread running in a process started by the multiprocessing.SyncManager instance that was created when you presumably called multiprocessing.Manager(). It is in that process where the actual list resides. So managed lists are generally not the most efficient solution available.
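To illustrate the cost described above: every operation on the proxy is a round trip to the manager process, so appending once per item means one round trip per item, while building the results locally and touching the proxy once per worker means only a handful of round trips in total. A rough sketch of that batching idea (process_file_batched and the squaring are made-up placeholders for your real per-file work):

import multiprocessing

def process_file_batched(shared_list, items):
    # do all the per-item work locally first ...
    local_results = [item * item for item in items]  # placeholder for the real processing
    # ... then hit the proxy a single time instead of once per item
    shared_list.extend(local_results)

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    shared_list = manager.list()
    items = list(range(1000))
    workers = []
    for start in range(0, len(items), 250):
        p = multiprocessing.Process(target=process_file_batched,
                                    args=(shared_list, items[start:start + 250]))
        p.start()
        workers.append(p)
    for p in workers:
        p.join()
    print(len(shared_list))  # 1000 items collected with only 4 proxy calls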
How can I run a multiprocessing pool where run1, run2 and run3 are processed asynchronously, using a multiprocessing tool in Python? I am trying to pass the values (10,2,4), (55,6,8), (9,8,7) to run1, run2, run3 respectively.
import multiprocessing

def Numbers(number, number2, divider):
    value = number * number2 / divider
    return value

if __name__ == "__main__":
    with multiprocessing.Pool(3) as pool:  # 3 processes
        run1, run2, run3 = pool.map(Numbers, [(10,2,4), (55,6,8), (9,8,7)])  # map input & output
You just need to use method starmap instead of map, which, according to the documentation:
Like map() except that the elements of the iterable are expected to be iterables that are unpacked as arguments.
Hence an iterable of [(1,2), (3, 4)] results in [func(1,2), func(3,4)].
import multiprocessing

def Numbers(number, number2, divider):
    value = number * number2 / divider
    return value

if __name__ == "__main__":
    with multiprocessing.Pool(3) as pool:  # 3 processes
        run1, run2, run3 = pool.starmap(Numbers, [(10,2,4), (55,6,8), (9,8,7)])  # map input & output
    print(run1, run2, run3)
Prints:
5.0 41.25 10.285714285714286
Note
This is the correct way of doing what you want to do, but you will not find that using multiprocessing for such a trivial worker function improves performance; in fact, it will degrade performance because of the overhead of creating the pool and of passing arguments and results from one address space to another.
Python's multiprocessing library does, however, have a wrapper for piping data between a parent and child process: the Manager, which provides shared-data utilities such as a shared dictionary. There is a good Stack Overflow post about the topic.
Using multiprocessing you can pass unique arguments and a shared dictionary to each process, and you must ensure each process writes to a different key in the dictionary.
An example of this applied to your case is as follows:
import multiprocessing

def worker(process_key, return_dict, compute_array):
    """worker function"""
    number = compute_array[0]
    number2 = compute_array[1]
    divider = compute_array[2]
    return_dict[process_key] = number * number2 / divider

if __name__ == "__main__":
    manager = multiprocessing.Manager()
    return_dict = manager.dict()
    jobs = []
    compute_arrays = [[10, 2, 4], [55, 6, 8], [9, 8, 7]]
    for i in range(len(compute_arrays)):
        p = multiprocessing.Process(target=worker, args=(
            i, return_dict, compute_arrays[i]))
        jobs.append(p)
        p.start()

    for proc in jobs:
        proc.join()
    print(return_dict)
Edit: The information from Booboo is much more precise. I originally had a recommendation for threading here, which I'm removing as it's certainly not the right tool in Python due to the GIL.
Is there a way to reduce memory consumption when working with Python's pool.map?
To give a short example: worker() does some heavy lifting and returns a larger array...
def worker():
    # cpu time intensive tasks
    return large_array
...and a Pool maps over some large sequence:
with mp.Pool(mp.cpu_count()) as p:
    result = p.map(worker, large_sequence)
Considering this setup, obviously, result will allocate a large portion of the system's memory. However, the final operation on the result is:
final_result = np.sum(result, axis=0)
Thus, NumPy effectively does nothing more than reduce the iterable with a sum operation:
final_result = reduce(lambda x, y: x + y, result)
This, of course, suggests consuming the results of pool.map as they come in and garbage-collecting each one after it has been folded into the sum, which would eliminate the need to store all the values first.
I could set up an mp.Queue that the results go into and write a queue-consuming worker that sums them up, but this would (1) require significantly more lines of code and (2) feel like a (potentially slower) workaround to me rather than clean code.
Is there a way to reduce the results returned by an mp.Pool operation directly as they come in?
The iterator mappers imap and imap_unordered seem to do the trick:
#!/usr/bin/env python3
import multiprocessing
import numpy as np

def worker(a):
    # cpu time intensive tasks
    large_array = np.ones((20, 30)) + a
    return large_array

if __name__ == '__main__':
    arraysum = np.zeros((20, 30))
    large_sequence = range(20)
    num_cpus = multiprocessing.cpu_count()
    with multiprocessing.Pool(processes=num_cpus) as p:
        for large_array in p.imap_unordered(worker, large_sequence):
            arraysum += large_array
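One extra knob that may matter for long input sequences: imap and imap_unordered hand work to the pool in chunks of 1 by default, so for many small tasks a larger chunksize cuts down the per-task communication overhead. The same loop with an explicit chunk size (the value 8 is just an example, not a tuned number):

import multiprocessing
import numpy as np

def worker(a):
    # cpu time intensive tasks
    return np.ones((20, 30)) + a

if __name__ == '__main__':
    arraysum = np.zeros((20, 30))
    with multiprocessing.Pool() as p:
        # chunksize=8 ships 8 inputs per task message instead of 1
        for large_array in p.imap_unordered(worker, range(200), chunksize=8):
            arraysum += large_array
    print(arraysum[0, 0])  # 0 + 1 + ... + 199 = 19900.0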
This is my first time trying to use multiprocessing in Python. I'm trying to parallelize my function fun over my dataframe df by row. The callback function is just to append results to an empty list that I'll sort through later.
Is this the correct way to use apply_async? Thanks so much.
import multiprocessing as mp

function_results = []
async_results = []

p = mp.Pool()  # by default should use number of processors
for row in df.iterrows():
    r = p.apply_async(fun, (row,), callback=function_results.extend)
    async_results.append(r)
for r in async_results:
    r.wait()
p.close()
p.join()
It looks like using map or imap_unordered (depending on whether you need your results to be ordered or not) would better suit your needs:
import multiprocessing as mp

# prepare stuff

if __name__ == "__main__":
    p = mp.Pool()
    function_results = list(p.imap_unordered(fun, df.iterrows()))  # unordered
    # function_results = p.map(fun, df.iterrows())  # ordered
    p.close()
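One detail to keep in mind with either approach: df.iterrows() yields (index, row) tuples, so whatever function is mapped receives that whole tuple as its single argument and has to unpack it. A small self-contained sketch of that shape (the tiny DataFrame and the squaring merely stand in for the real df and fun):

import multiprocessing as mp
import pandas as pd

def fun(index_and_row):
    # iterrows() hands us an (index, Series) pair
    index, row = index_and_row
    return index, row['a'] ** 2  # placeholder for the real per-row work

if __name__ == '__main__':
    df = pd.DataFrame({'a': range(10)})
    with mp.Pool() as p:
        function_results = list(p.imap_unordered(fun, df.iterrows()))
    print(sorted(function_results))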
I have written a function that returns a Pandas data frame (one sample per row, one descriptor per column) and takes as input a list of peptides (biological sequences as strings). my_function(pep_list) takes pep_list as a parameter, iterates over each peptide sequence in pep_list, calculates its descriptors, combines all the data into a Pandas data frame and returns the df.
For example:
pep_list = ["DAAAAEF", "DAAAREF", "DAAANEF", "DAAADEF", "DAAACEF", "DAAAEEF", "DAAAQEF", "DAAAGEF", "DAAAHEF", "DAAAIEF", "DAAALEF", "DAAAKEF"]
I want to parallelise this code with the algorithm below:
1. Get the number of processors available:
n = multiprocessing.cpu_count()
2. Split pep_list into n sub-lists (pseudocode: sub_list_of_pep_list = pep_list / n), e.g. for n = 4:
sub_list_of_pep_list = [["DAAAAEF","DAAAREF","DAAANEF"], ["DAAADEF","DAAACEF","DAAAEEF"], ["DAAAQEF","DAAAGEF","DAAAHEF"], ["DAAAIEF","DAAALEF","DAAAKEF"]]
3. Run my_function() on each core (example with 4 cores):
df0 = my_function(sub_list_of_pep_list[0])
df1 = my_function(sub_list_of_pep_list[1])
df2 = my_function(sub_list_of_pep_list[2])
df3 = my_function(sub_list_of_pep_list[3])
4. Join all the results: df = concat([df0, df1, df2, df3])
5. Return df, ideally with an n-times speed-up.
Please suggest the most suitable library to implement this method.
Thanks and regards.
Updated
With some reading I was able to write code that works as I expected:
1. without parallelisation it takes ~10 seconds for 10 peptide sequences
2. with two processes it takes ~6 seconds for 12 peptides
3. with four processes it takes ~4 seconds for 12 peptides
from multiprocessing import Process

def func1():
    structure_gen(pep_seq=["DAAAAEF", "DAAAREF", "DAAANEF"])

def func2():
    structure_gen(pep_seq=["DAAAQEF", "DAAAGEF", "DAAAHEF"])

def func3():
    structure_gen(pep_seq=["DAAADEF", "DAAALEF"])

def func4():
    structure_gen(pep_seq=["DAAAIEF", "DAAALEF"])

if __name__ == '__main__':
    p1 = Process(target=func1)
    p1.start()
    p2 = Process(target=func2)
    p2.start()
    p3 = Process(target=func3)
    p3.start()
    p4 = Process(target=func4)
    p4.start()
    p1.join()
    p2.join()
    p3.join()
    p4.join()
but this code works easily with 10 peptides; I am not able to make it work for a pep_list containing 1 million peptides.
Thanks
multiprocessing.Pool.map is what you're looking for.
Try this:
from multiprocessing import Pool
import numpy as np
from pandas import concat

# I recommend using more partitions than processes,
# this way the work can be balanced.
# Of course this only makes sense if pep_list is bigger than
# the one you provide. If not, change this to 8 or so.
n = 50

# create indices for the partitions
ix = np.linspace(0, len(pep_list), n+1, endpoint=True, dtype=int)
# create partitions using the indices
sub_lists = [pep_list[i1:i2] for i1, i2 in zip(ix[:-1], ix[1:])]

p = Pool()
try:
    # p.map will return a list of dataframes which are to be
    # concatenated
    df = concat(p.map(my_function, sub_lists))
finally:
    p.close()
The pool will automatically contain as many worker processes as there are available cores, but you can override this number if you want to; just have a look at the docs.
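If you do want to pin the worker count rather than rely on the default, it is just the processes argument to Pool; a small self-contained sketch (the dummy my_function here only measures sequence lengths, and 2 workers is an arbitrary choice):

from multiprocessing import Pool, cpu_count

def my_function(sub_list):
    # stand-in for the real descriptor calculation on one sub-list
    return [len(pep) for pep in sub_list]

if __name__ == '__main__':
    pep_list = ['DAAAAEF', 'DAAAREF', 'DAAANEF', 'DAAADEF']
    print(cpu_count())             # what Pool() would use by default
    with Pool(processes=2) as p:   # explicit worker count
        results = p.map(my_function, [pep_list[:2], pep_list[2:]])
    print(results)                 # [[7, 7], [7, 7]]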