Result Array is displayed as empty after trying to append values into it.
I have even declared result as global inside function.
Any suggestions?
I tried this:
import multiprocessing

res = []
inputData = [a, b, c, d]

def function(data):
    values = [some_Number_1, some_Number_2]
    return values

def parallel_run(function, inputData):
    cpu_no = 4
    if len(inputData) < cpu_no:
        cpu_no = len(inputData)
    p = multiprocessing.Pool(cpu_no)
    global resultsAr
    resultsAr = p.map(function, inputData, chunksize=1)
    p.close()
    p.join()

print('res = ', res)
This happens since you're misunderstanding the basic point of multiprocessing: the child process spawned by multiprocessing.Process is separate from the parent process, and thus any modifications to data (including global variables) in the child process(es) are not propagated into the parent.
You will need to use multiprocessing-specific data types (queues and pipes), or the higher-level APIs provided by e.g. multiprocessing.Pool, to get data out of the child process(es).
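For illustration, here is a minimal queue-based sketch of mine (not the asker's code) showing one way a child process can hand results back to the parent:

import multiprocessing

def worker(x, queue):
    # Send the result back to the parent instead of mutating a global.
    queue.put(x * x)

if __name__ == "__main__":
    queue = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=worker, args=(i, queue)) for i in range(5)]
    for p in procs:
        p.start()
    # Drain one item per child; note the arrival order is not guaranteed.
    results = [queue.get() for _ in procs]
    for p in procs:
        p.join()
    print(results)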
For your application, the high-level recipe would be:
import multiprocessing

def square(v):
    return v * v

def main():
    arr = [1, 2, 3, 4, 5]
    with multiprocessing.Pool() as p:
        squared = p.map(square, arr)
    print(squared)

if __name__ == "__main__":
    main()
However, you'll likely find that this is massively slower than not using multiprocessing at all, due to the overheads involved in such a small task.
Welcome to Stack Overflow, Suyash!
The problem is that multiprocessing.Process is, as its name says, a separate process. You can imagine it almost as if you were running your script again from the terminal, with very little connection to the parent script.
Therefore, it has its own copy of the result array, which it modifies and prints.
The result in the "main" process is unmodified.
To convince yourself of this, try to print id(res) in both __main__ and in square(). You'll see they are different.
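To see this concretely, here is a tiny sketch of mine (not the asker's code): the child appends to res and prints it, yet the parent's list stays empty afterwards.

import multiprocessing

res = []

def child():
    res.append(42)                 # modifies the child's own copy only
    print("child sees:", res)      # [42]

if __name__ == "__main__":
    p = multiprocessing.Process(target=child)
    p.start()
    p.join()
    print("parent sees:", res)     # still [] in the parent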
How can I run a pool of multiple processes where run1-3 are processed asynchronously, using a multiprocessing tool in Python? I am trying to pass the values (10,2,4), (55,6,8), (9,8,7) to run1, run2, run3 respectively.
import multiprocessing

def Numbers(number, number2, divider):
    value = number * number2 / divider
    return value

if __name__ == "__main__":
    with multiprocessing.Pool(3) as pool:  # 3 processes
        run1, run2, run3 = pool.map(Numbers, [(10,2,4),(55,6,8),(9,8,7)])  # map input & output
You just need to use the method starmap instead of map, which, according to the documentation:
Like map() except that the elements of the iterable are expected to be iterables that are unpacked as arguments.
Hence an iterable of [(1,2), (3, 4)] results in [func(1,2), func(3,4)].
import multiprocessing

def Numbers(number, number2, divider):
    value = number * number2 / divider
    return value

if __name__ == "__main__":
    with multiprocessing.Pool(3) as pool:  # 3 processes
        run1, run2, run3 = pool.starmap(Numbers, [(10,2,4),(55,6,8),(9,8,7)])  # map input & output
        print(run1, run2, run3)
Prints:
5.0 41.25 10.285714285714286
Note
This is the correct way of doing what you want to do, but you will not find that using multiprocessing for such a trivial worker function will improve performance; in fact, it will degrade performance due to the overhead in creating the pool and passing arguments and results to and from one address space to another.
Python's multiprocessing library does, however, have a wrapper for piping data between a parent and child process: the Manager, which has shared data utilities such as a shared dictionary. There is a good Stack Overflow post here about the topic.
Using multiprocessing you can pass unique arguments and a shared dictionary to each process, and you must ensure each process writes to a different key in the dictionary.
An example of this in use given your example is as follows:
import multiprocessing

def worker(process_key, return_dict, compute_array):
    """worker function"""
    number = compute_array[0]
    number2 = compute_array[1]
    divider = compute_array[2]
    return_dict[process_key] = number * number2 / divider

if __name__ == "__main__":
    manager = multiprocessing.Manager()
    return_dict = manager.dict()
    jobs = []
    compute_arrays = [[10, 2, 4], [55, 6, 8], [9, 8, 7]]
    for i in range(len(compute_arrays)):
        p = multiprocessing.Process(target=worker, args=(
            i, return_dict, compute_arrays[i]))
        jobs.append(p)
        p.start()
    for proc in jobs:
        proc.join()
    print(return_dict)
Edit: the information from Booboo is much more precise. I originally had a recommendation for threading, which I'm removing as it's certainly not the right tool here in Python due to the GIL.
I am trying to run my simulations in a thread pool and store my results for each repetition in a global numpy array. However, I run into problems doing that, and I am observing a really interesting behavior with the following (simplified) code (Python 3.7):
import random

import numpy as np
from multiprocessing import Pool, Lock

log_mutex = Lock()
thread_count = 4  # example value; the real script sets this elsewhere
repetition_count = 5
data_array = np.zeros(shape=(repetition_count, 3, 200), dtype=float)

def record_results(repetition_index, data_array, log_mutex):
    log_mutex.acquire()
    print("Start record {}".format(repetition_index))
    # Do some stuff and modify data_array, e.g.:
    data_array[repetition_index, 0, 53] = 12.34
    print("Finish record {}".format(repetition_index))
    log_mutex.release()

def run(repetition_index):
    global log_mutex
    global data_array
    # do some simulation
    record_results(repetition_index, data_array, log_mutex)

if __name__ == "__main__":
    random.seed()
    with Pool(thread_count) as p:
        print(p.map(run, range(repetition_count)))
The issue is: I get the correct "Start record & Finish record" outputs, e.g. Start record 1... Finish record 1. However, the slices of the numpy array that are modified by each thread are not kept in the global variable. In other words, the elements that were modified by thread 1 are still zero, and thread 4 overwrites different parts of the array.
One additional remark: the address of the global array, which I retrieve with
print(hex(id(data_array))), is the same for all threads inside their log_mutex.acquire() ... log_mutex.release() sections.
Am I missing something? Are there multiple copies of the global data_array stored for each thread? I am observing behavior like that, but it should not be the case when I use the global keyword, or am I wrong?
Looks like you're running the run function using multiple processes, not multiple threads. Try something like this instead:
import numpy as np
from threading import Thread, Lock

log_mutex = Lock()
repetition_count = 5
data_array = np.zeros(shape=(repetition_count, 3, 200), dtype=float)

def record_results(repetition_index, data_array, log_mutex):
    log_mutex.acquire()
    print("Start record {}".format(repetition_index))
    # Do some stuff and modify data_array, e.g.:
    data_array[repetition_index, 0, 53] = 12.34
    print("Finish record {}".format(repetition_index))
    log_mutex.release()

def run(repetition_index):
    global log_mutex
    global data_array
    record_results(repetition_index, data_array, log_mutex)

if __name__ == "__main__":
    threads = []
    for i in range(repetition_count):
        t = Thread(target=run, args=[i])
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
Update:
To do this with multiple processes, you would need to use multiprocessing.RawArray to instantiate your array; the size of the array is the product repetition_count * 3 * 200. Within each process, create a view on the array using np.frombuffer, and reshape it accordingly. While this will be very fast, I discourage this style of programming as it relies on global shared memory objects, which are error-prone in larger programs.
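A rough sketch of that shared-memory route, assuming the (repetition_count, 3, 200) shape from the question; the initializer pattern and the names here are mine, not the asker's:

import numpy as np
from multiprocessing import Pool, RawArray

SHAPE = (5, 3, 200)   # (repetition_count, 3, 200)

def init_worker(shared_raw, shape):
    # Give every worker a numpy view onto the same shared buffer.
    global data_array
    data_array = np.frombuffer(shared_raw, dtype=np.float64).reshape(shape)

def run(repetition_index):
    # Each repetition writes only to its own slice, so no lock is needed.
    data_array[repetition_index, 0, 53] = 12.34

if __name__ == "__main__":
    raw = RawArray('d', SHAPE[0] * SHAPE[1] * SHAPE[2])
    with Pool(initializer=init_worker, initargs=(raw, SHAPE)) as p:
        p.map(run, range(SHAPE[0]))
    # The parent sees the children's writes through its own view of the buffer.
    result = np.frombuffer(raw, dtype=np.float64).reshape(SHAPE)
    print(result[2, 0, 53])   # 12.34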
If possible, I suggest removing the global data_array and instead instantiate an array in each call to record_results, which you would return in run. The p.map call will return a list of arrays, which you can convert to a numpy array and recover the shape and contents of the global data_array in your original implementation. This will incur a communication cost, but it's a cleaner approach to managing concurrency and eliminates the need for locks.
It's generally a good idea to minimize inter-process communication, but unless performance is critical, I don't think shared memory is the right solution. With p.map, you'll want to avoid returning large objects, but the object sizes in your snippet are very small (600*8 bytes).
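And a sketch of the return-based alternative, again with the question's simulation reduced to a single placeholder write:

import numpy as np
from multiprocessing import Pool

repetition_count = 5

def run(repetition_index):
    # Build this repetition's (3, 200) slice locally and return it.
    result = np.zeros((3, 200), dtype=float)
    result[0, 53] = 12.34          # stand-in for the real simulation
    return result

if __name__ == "__main__":
    with Pool() as p:
        slices = p.map(run, range(repetition_count))
    data_array = np.array(slices)  # shape (repetition_count, 3, 200)
    print(data_array.shape)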
I have a Python function that requires the user to enter an array of data; at which point the function works on the data and produces two arrays which are returned to the main program. In this question I am including a greatly simplified example from which I hope I can solicit some advice or help. I have created a function titled "Test_Function" that requires the programmer supply an array of data titled "Array", which in this case has a length of 5000. The function works on the data and produces two arrays titled "Result1" and "Result2" which are returned to the user in the main program as the variables "Res1" and "Res2". I would like to thread the function "Test_Function" so that one thread works on half of the input array and the other thread works on the other half, and then combine the results back together in the main program for both output arrays "Result1" and "Result2"/"Res1" and "Res2". I described a scenario where I would produce two threads, but I would like to make it generic enough so that it could run a user-defined number of threads. How do I do this with the thread functionality?
import numpy as np

def Test_Function(Array):
    Result1 = Array * np.pi * (1 - Array)
    Result2 = Array + 478.5 + (1 / Array)
    return (np.array(Result1, dtype=float), np.array(Result2, dtype=float))

#---------------------------------------------------------------------------
if __name__ == "__main__":
    Dependent_Array = np.linspace(1.0, 5000.0, num=5000)
    Res1, Res2 = Test_Function(Dependent_Array)
# eof
You could use a ThreadPool to which you assign async tasks:
import numpy as np
from multiprocessing.pool import ThreadPool

def sub1(Array):
    return Array * np.pi * (1 - Array)

def sub2(Array):
    return Array + 478.5 + (1 / Array)

def Test_Function(Array):
    pool = ThreadPool(processes=2)
    res1 = pool.apply_async(sub1, (Array,))
    res2 = pool.apply_async(sub2, (Array,))
    return (np.array(res1.get(), dtype=float), np.array(res2.get(), dtype=float))

if __name__ == "__main__":
    Dependent_Array = np.linspace(1.0, 5000.0, num=5000)
    Res1, Res2 = Test_Function(Dependent_Array)
This module is not very well documented, but, basically, it creates a pool of workers to which you can assign subroutines to execute. The method get() of the result object created by apply_async will return the result only when the corresponding thread has finished its operations.
Yesterday I asked a question: Reading data in parallel with multiprocess
I got very good answers, and I implemented the solution mentioned in the answer I marked as correct.
import os
import multiprocessing
import pandas as pd

def read_energies(motif):
    os.chdir("blabla/working_directory")
    complx_ener = pd.DataFrame()
    # complex function to fill that dataframe
    lig_ener = pd.DataFrame()
    # complex function to fill that dataframe
    return motif, complx_ener, lig_ener

COMPLEX_ENERGIS = {}
LIGAND_ENERGIES = {}
p = multiprocessing.Pool(processes=CPU)
for x in p.imap_unordered(read_energies, peptide_kd.keys()):
    COMPLEX_ENERGIS[x[0]] = x[1]
    LIGAND_ENERGIES[x[0]] = x[2]
However, this solution takes the same amount of time as if I just iterated over peptide_kd.keys() and filled up the DataFrames one by one. Why is that so? Is there a way to fill up the desired dicts in parallel and actually get a speed increase? I am running it on a 48-core HPC.
You are incurring a good amount of overhead in (1) starting up each process, and (2) having to copy the pandas.DataFrame (etc.) across several processes. If you just need to have a dict filled in parallel, I'd suggest using a shared memory dict. If no key will be overwritten, then it's easy and you don't have to worry about locks.
(Note I'm using multiprocess below, which is a fork of multiprocessing -- but only so I can demonstrate from the interpreter, otherwise, you'd have to do the below from __main__).
>>> from multiprocess import Process, Manager
>>>
>>> def f(d, x):
... d[x] = x**2
...
>>> manager = Manager()
>>> d = manager.dict()
>>> job = [Process(target=f, args=(d, i)) for i in range(5)]
>>> _ = [p.start() for p in job]
>>> _ = [p.join() for p in job]
>>> print(d)
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
This solution doesn't make copies of the dict to share across processes, so that part of the overhead is reduced. For large objects like a pandas.DataFrame, it can be significant compared to the cost of a simple operation like x**2. Similarly, spawning a Process can take time, and you may be able to do the above even faster (for lightweight objects) by using threads (i.e. from multiprocess.dummy instead of multiprocess, for either your originally posted solution or mine above).
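For instance, a thread-based variant of the originally posted loop could look like the sketch below. It reuses read_energies, CPU and peptide_kd from the question and swaps in the thread-backed pool from the standard library's multiprocessing.dummy (which multiprocess.dummy mirrors); threads only pay off here if read_energies is I/O-bound or releases the GIL.

from multiprocessing.dummy import Pool as ThreadPool   # threads, same Pool API

COMPLEX_ENERGIS = {}
LIGAND_ENERGIES = {}
with ThreadPool(CPU) as p:
    for motif, complx_ener, lig_ener in p.imap_unordered(read_energies,
                                                         peptide_kd.keys()):
        COMPLEX_ENERGIS[motif] = complx_ener
        LIGAND_ENERGIES[motif] = lig_ener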
If you do need to share DataFrames (as your code suggests instead of as the question asks), you might be able to do it by creating a shared memory numpy.ndarray.
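One possible way to do that today (my substitution for the shared ndarray idea, not necessarily what this answer originally had in mind) is the multiprocessing.shared_memory module, available since Python 3.8, wrapped in a numpy.ndarray view:

import numpy as np
from multiprocessing import Process, shared_memory   # shared_memory: Python 3.8+

def worker(shm_name, shape, dtype):
    # Attach to the existing shared block and wrap it in an ndarray view.
    shm = shared_memory.SharedMemory(name=shm_name)
    arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    arr[0, 0] = 42.0          # write one cell; the parent will see it
    shm.close()

if __name__ == "__main__":
    template = np.zeros((4, 3), dtype=np.float64)
    shm = shared_memory.SharedMemory(create=True, size=template.nbytes)
    shared = np.ndarray(template.shape, dtype=template.dtype, buffer=shm.buf)
    shared[:] = template      # initialise the shared block

    p = Process(target=worker, args=(shm.name, template.shape, template.dtype))
    p.start()
    p.join()

    print(shared[0, 0])       # 42.0: the child's write is visible here
    shm.close()
    shm.unlink()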
I have a dataset df of trader transactions.
I have 2 levels of for loops as follows:
smartTrader = []

for asset in range(len(Assets)):
    df = df[df['Assets'] == asset]
    # I have some more calculations here
    for trader in range(len(df['TraderID'])):
        # I have some calculations here; if the trader is successful, I add his ID
        # to the list as follows
        smartTrader.append(df['TraderID'][trader])
    # some more calculations here which are related to the first for loop.
I would like to parallelise the calculations for each asset in Assets, and I also want to parallelise the calculations for each trader for every asset. After ALL these calculations are done, I want to do additional analysis based on the list of smartTrader.
This is my first attempt at parallel processing, so please be patient with me, and I appreciate your help.
If you use pathos, which provides a fork of multiprocessing, you can easily nest parallel maps. pathos is built for easily testing combinations of nested parallel maps -- which are direct translations of nested for loops.
It provides a selection of maps that are blocking, non-blocking, iterative, asynchronous, serial, parallel, and distributed.
>>> from pathos.pools import ProcessPool, ThreadPool
>>> amap = ProcessPool().amap
>>> tmap = ThreadPool().map
>>> from math import sin, cos
>>> print(amap(tmap, [sin,cos], [range(10),range(10)]).get())
[[0.0, 0.8414709848078965, 0.9092974268256817, 0.1411200080598672, -0.7568024953079282, -0.9589242746631385, -0.27941549819892586, 0.6569865987187891, 0.9893582466233818, 0.4121184852417566], [1.0, 0.5403023058681398, -0.4161468365471424, -0.9899924966004454, -0.6536436208636119, 0.2836621854632263, 0.9601702866503661, 0.7539022543433046, -0.14550003380861354, -0.9111302618846769]]
Here this example uses a processing pool and a thread pool, where the thread map call is blocking, while the processing map call is asynchronous (note the get at the end of the last line).
Get pathos here: https://github.com/uqfoundation
or with:
$ pip install git+https://github.com/uqfoundation/pathos.git#master
Nested parallelism can be done elegantly with Ray, a system that allows you to easily parallelize and distribute your Python code.
Assume you want to parallelize the following nested program
def inner_calculation(asset, trader):
    return trader

def outer_calculation(asset):
    return asset, [inner_calculation(asset, trader) for trader in range(5)]

inner_results = []
outer_results = []
for asset in range(10):
    outer_result, inner_result = outer_calculation(asset)
    outer_results.append(outer_result)
    inner_results.append(inner_result)
# Then you can filter inner_results to get the final output.
Below is the Ray code parallelizing the above code:
Use the @ray.remote decorator for each function that we want to execute concurrently in its own process. A remote function returns a future (i.e., an identifier to the result) rather than the result itself.
When invoking a remote function f(), use the remote modifier, i.e., f.remote().
Use the ids_to_vals() helper function to convert a nested list of ids to values.
Note the program structure is identical. You only need to add remote and then convert the futures (ids) returned by the remote functions to values using the ids_to_vals() helper function.
import ray

ray.init()

# Define inner calculation as a remote function.
@ray.remote
def inner_calculation(asset, trader):
    return trader

# Define outer calculation to be executed as a remote function.
@ray.remote(num_return_vals=2)
def outer_calculation(asset):
    return asset, [inner_calculation.remote(asset, trader) for trader in range(5)]

# Helper to convert a nested list of object ids to a nested list of corresponding objects.
def ids_to_vals(ids):
    if isinstance(ids, ray.ObjectID):
        ids = ray.get(ids)
        if isinstance(ids, ray.ObjectID):
            return ids_to_vals(ids)
    if isinstance(ids, list):
        results = []
        for id in ids:
            results.append(ids_to_vals(id))
        return results
    return ids

outer_result_ids = []
inner_result_ids = []
for asset in range(10):
    outer_result_id, inner_result_id = outer_calculation.remote(asset)
    outer_result_ids.append(outer_result_id)
    inner_result_ids.append(inner_result_id)

outer_results = ids_to_vals(outer_result_ids)
inner_results = ids_to_vals(inner_result_ids)
There are a number of advantages of using Ray over the multiprocessing module. In particular, the same code will run on a single machine as well as on a cluster of machines. For more advantages of Ray see this related post.
Probably threading, from the standard Python library, is the most convenient approach:
import threading

def worker(trader_id):
    # Do your calculations here
    return

threads = []
for asset in range(len(Assets)):
    df = df[df['Assets'] == asset]
    for trader in range(len(df['TraderID'])):
        t = threading.Thread(target=worker, args=(trader,))
        threads.append(t)
        t.start()

# add a semaphore or join here if you need to synchronize results for all traders
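One way to handle that synchronization point (a sketch of mine with placeholder calculations, not the asker's real ones) is to join the threads and guard the shared list with a lock:

import threading

results_lock = threading.Lock()
smartTrader = []

def worker(trader_id):
    # ... real calculations would go here ...
    is_successful = True            # placeholder for the real success test
    if is_successful:
        with results_lock:          # guard the shared list
            smartTrader.append(trader_id)

threads = [threading.Thread(target=worker, args=(t,)) for t in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()                        # wait before the follow-up analysis
print(smartTrader)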
Instead of using for, use map:
smartTrader = []
m = map(calculations_as_a_function,
        [df[df['Assets'] == asset]
         for asset in range(len(Assets))])
smartTrader.extend(m)  # collect all mapped results into the list
From then on, you can try different parallel map implementations, such as multiprocessing's or Stackless's.
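For instance, swapping in multiprocessing's pool map might look like the sketch below; df, Assets and calculations_as_a_function are the names from the question, and the function body here is only a placeholder.

from multiprocessing import Pool

def calculations_as_a_function(asset_df):
    # Placeholder: return the trader IDs judged "smart" for this asset slice.
    return list(asset_df['TraderID'])

if __name__ == "__main__":
    asset_slices = [df[df['Assets'] == asset] for asset in range(len(Assets))]
    with Pool() as pool:
        per_asset = pool.map(calculations_as_a_function, asset_slices)
    smartTrader = [trader for ids in per_asset for trader in ids]
    print(smartTrader)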