Parallelise nested for-loop in IPython

I have a nested for loop in my python code that looks something like this:
results = []
for azimuth in azimuths:
    for zenith in zeniths:
        # Do various bits of stuff
        # Eventually get a result
        results.append(result)
I'd like to parallelise this loop on my 4 core machine to speed it up. Looking at the IPython parallel programming documentation (http://ipython.org/ipython-doc/dev/parallel/parallel_multiengine.html#quick-and-easy-parallelism) it seems that there is an easy way to use map to parallelise iterative operations.
However, to do that I need to have the code inside the loop as a function (which is easy to do), and then map across this function. The problem I have is that I can't get an array to map this function across: itertools.product() produces an iterator, which I can't seem to use with the map function.
Am I barking up the wrong tree by trying to use map here? Is there a better way to do it? Or is there some way to use itertools.product and then do parallel execution with a function mapped across the results?

To parallelize every call, you just need to get a list for each argument. You can use itertools.product + zip to get this:
allzeniths, allazimuths = zip(*itertools.product(zeniths, azimuths))
Then you can use map:
amr = dview.map(f, allzeniths, allazimuths)
To go a bit deeper into the steps, here's an example:
zeniths = range(1,4)
azimuths = range(6,8)
product = list(itertools.product(zeniths, azimuths))
# [(1, 6), (1, 7), (2, 6), (2, 7), (3, 6), (3, 7)]
So we have a "list of pairs", but what we really want is a single list for each argument, i.e. a "pair of lists". This is exactly what the slightly weird zip(*product) syntax gets us:
allzeniths, allazimuths = zip(*itertools.product(zeniths, azimuths))
print allzeniths
# (1, 1, 2, 2, 3, 3)
print allazimuths
# (6, 7, 6, 7, 6, 7)
Now we just map our function onto those two lists, to parallelize nested for loops:
def f(z, a):
    return z * a

dview.map(f, allzeniths, allazimuths)
And there's nothing special about there being only two - this method should extend to an arbitrary number of nested loops.
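As a rough sketch of that claim (not from the original answer), a third loop variable just adds one more flattened list; radii and g here are made-up placeholders, and the built-in map stands in for dview.map:
import itertools

zeniths = range(1, 4)
azimuths = range(6, 8)
radii = range(10, 12)   # hypothetical third loop variable

# zip(*product) turns the list of triples into three parallel argument lists.
allzeniths, allazimuths, allradii = zip(*itertools.product(zeniths, azimuths, radii))

def g(z, a, r):
    return z * a * r

# With an IPython view this would be dview.map(g, allzeniths, allazimuths, allradii);
# the built-in map shows the same call shape serially.
results = list(map(g, allzeniths, allazimuths, allradii))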

I assume you are using IPython 0.11 or later. First of all define a simple function.
def foo(azimuth, zenith):
    # Do various bits of stuff
    # Eventually get a result
    return result
Then use IPython's fine parallel suite to parallelize your problem. First start a controller with 5 engines attached (#CPUs + 1) by starting a cluster in a terminal window (if you installed IPython 0.11 or later this program should be present):
ipcluster start -n 5
In your script connect to the controller and transmit all your tasks. The controller will take care of everything.
from IPython.parallel import Client

c = Client()                  # here is where the client establishes the connection
lv = c.load_balanced_view()   # this object represents the engines (workers)

tasks = []
for azimuth in azimuths:
    for zenith in zeniths:
        tasks.append(lv.apply(foo, azimuth, zenith))

result = [task.get() for task in tasks]  # blocks until all results are back
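For the same setup, a load-balanced map is a slightly more compact alternative to collecting individual apply results by hand. This is only a sketch: it assumes the cluster above is running, foo is defined as before, and the azimuth/zenith ranges are placeholders, flattened with the zip(*itertools.product(...)) trick from the first answer.
import itertools
from IPython.parallel import Client

azimuths = range(0, 360, 90)   # placeholder ranges, not from the question
zeniths = range(0, 90, 10)

c = Client()
lv = c.load_balanced_view()

# Flatten the nested loops into one argument list per parameter.
allazimuths, allzeniths = zip(*itertools.product(azimuths, zeniths))

# One task per (azimuth, zenith) pair, spread across the engines.
ar = lv.map(foo, allazimuths, allzeniths)
result = ar.get()   # blocks until all results are back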

I'm not really familiar with IPython, but an easy solution would seem to be to parallelize the outer loop only.
def f(azimuth):
    results = []
    for zenith in zeniths:
        # compute result
        results.append(result)
    return results

allresults = map(f, azimuths)
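If the plain built-in map is too slow, the same shape carries over to a process pool's map. A minimal self-contained sketch, with a dummy computation standing in for the real per-zenith work:
from multiprocessing import Pool

zeniths = range(0, 90, 10)
azimuths = range(0, 360, 90)

def f(azimuth):
    # placeholder for the real per-(azimuth, zenith) computation
    return [azimuth * zenith for zenith in zeniths]

if __name__ == '__main__':
    with Pool(processes=4) as pool:
        allresults = pool.map(f, azimuths)
    print(allresults)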

If you actually want to run your code in parallel, use concurrent.futures:
import itertools
import concurrent.futures

def _work_horse(azimuth, zenith):
    # DO HEAVY WORK HERE
    return result

futures = []
with concurrent.futures.ProcessPoolExecutor() as executor:
    for arg_set in itertools.product(zeniths, azimuths):
        futures.append(executor.submit(_work_horse, *arg_set))
    executor.shutdown(wait=True)

# Will time out after one hour.
results = [future.result(3600) for future in futures]
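When every call takes the same positional arguments, executor.map is a slightly shorter alternative to collecting futures by hand. A self-contained sketch with a dummy workload (the real _work_horse body is not shown in the answer):
import itertools
import concurrent.futures

zeniths = range(1, 4)
azimuths = range(6, 8)

def _work_horse(zenith, azimuth):
    # stand-in for the heavy work
    return zenith * azimuth

if __name__ == '__main__':
    pairs = list(itertools.product(zeniths, azimuths))
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # executor.map takes one iterable per positional argument,
        # so unzip the pairs into two parallel sequences.
        results = list(executor.map(_work_horse, *zip(*pairs)))
    print(results)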

If you want to keep the structure of your loop, you can try using Ray (docs), which is a framework for writing parallel and distributed Python. The one requirement is that you would have to separate out the work that can be parallelized into its own function.
You can import Ray like this:
import ray
# Start Ray. This creates some processes that can do work in parallel.
ray.init()
Then, your script would look like this:
# Add this line to signify that the function can be run in parallel (as a
# "task"). Ray will load-balance different `work` tasks automatically.
@ray.remote
def work(azimuth, zenith):
    # Do various bits of stuff
    # Eventually get a result
    return result

results = []
for azimuth in azimuths:
    for zenith in zeniths:
        # Store a future, which represents the future result of `work`.
        results.append(work.remote(azimuth, zenith))

# Block until the results are ready with `ray.get`.
results = ray.get(results)

Related

Unable To Display Result Array In Python Multiprocessing

The result array is displayed as empty after trying to append values to it.
I have even declared the result as global inside the function.
Any suggestions?
Here is my code:
import multiprocessing

res = []
inputData = [a, b, c, d]

def function(data):
    values = [some_Number_1, some_Number_2]
    return values

def parallel_run(function, inputData):
    cpu_no = 4
    if len(inputData) < cpu_no:
        cpu_no = len(inputData)
    p = multiprocessing.Pool(cpu_no)
    global resultsAr
    resultsAr = p.map(function, inputData, chunksize=1)
    p.close()
    p.join()
    print('res = ', res)
This happens since you're misunderstanding the basic point of multiprocessing: the child process spawned by multiprocessing.Process is separate from the parent process, and thus any modifications to data (including global variables) in the child process(es) are not propagated into the parent.
You will need to use multiprocessing-specific data types (queues and pipes), or the higher-level APIs provided by e.g. multiprocessing.Pool, to get data out of the child process(es).
For your application, the high-level recipe would be:
import multiprocessing

def square(v):
    return v * v

def main():
    arr = [1, 2, 3, 4, 5]
    with multiprocessing.Pool() as p:
        squared = p.map(square, arr)
    print(squared)
– however you'll likely find that this is massively slower than not using multiprocessing due to the overheads involved in such a small task.
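For completeness, the lower-level route mentioned above (a multiprocessing.Queue) looks roughly like this; it is more verbose than Pool.map and usually unnecessary for a task this small:
import multiprocessing

def square_worker(values, queue):
    # Runs in the child process; results must be sent back explicitly.
    for v in values:
        queue.put(v * v)

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=square_worker, args=([1, 2, 3, 4, 5], queue))
    p.start()
    squared = [queue.get() for _ in range(5)]
    p.join()
    print(squared)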
Welcome to StackOverflow, Suyash!
The problem is that multiprocessing.Process is, as its name says, a separate process. You can imagine it almost as if you were running your script again from the terminal, with very little connection to the parent script.
Therefore, it has its own copy of the result array, which it modifies and prints.
The result in the "main" process is unmodified.
To convince yourself of this, try to print id(res) in both __main__ and in square(). You'll see they are different.
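Another quick experiment along the same lines (a sketch, not the asker's code) is to let the child append to the list and then inspect the parent's copy after join; the parent's list stays empty:
import multiprocessing

res = []

def worker():
    res.append(42)              # modifies the child's copy only
    print('child sees:', res)   # [42]

if __name__ == '__main__':
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()
    print('parent sees:', res)  # still []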

Using Cython to create parallel threads without prange

I have a recursive function that does something similar to the following:
import numpy as np
from copy import copy

shared_data = np.random.randn(6, 5, 3)

def grow(current_data, level):
    grown_data = []
    if level < shared_data.shape[0] - 1:
        nlevel = level + 1
        valid = ((shared_data[nlevel] - current_data[-1])**2).sum(axis=-1) < 1
        for new_data in shared_data[nlevel, valid]:
            continue_data = copy(current_data)
            continue_data.append(new_data)
            grown_data.extend(grow(continue_data, level+1))
    else:
        grown_data.append(current_data)
    return grown_data

begin_data = np.random.randn(3)
print(grow([begin_data], 0))
I am wondering if there is some way to start new parallel threads in Cython to handle the processing of each entry in the grow function, in order to speed this type of recursion up. While the above sample code runs relatively fast, the actual code is slower (a) because it does more than the simple distance calculation included above and (b) because the data it operates on is more like size (3000, 10, 3), which even for this simple example is prohibitively slow, at least on my machine.
One thought I had was to use a list/queue to add recursive jobs to instead of calling them directly, and then, on each return from grow, use a prange loop to process the jobs in the list/queue in parallel, but I'm afraid this would recreate threads all the time and reduce efficiency.

How to get return value from thread in Python?

I do some computationally expensive tasks in Python and found the threading module for parallelization. I have a function which does the computation and returns an ndarray as the result. Now I want to know how I can parallelize my function and get back the calculated arrays from each thread.
The following example is greatly simplified, with lightweight functions and calculations.
import numpy as np
import threading

def calculate_result(input):
    a = np.linspace(1.0, 1000.0, num=10000)  # just an example
    result = input * a
    return result

input = [1, 2, 3, 4]

for i in range(len(input)):
    t = threading.Thread(target=calculate_result, args=(input[i],))
    t.start()
    # Here I want to receive the return value from the thread
I am looking for a way to get the return value from the thread / function for each thread, because in my task each thread calculates different values.
I found another question (how to get the return value from a thread in python?) where someone has a similar problem (without ndarrays), which is handled with ThreadPool and async...
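For reference, that ThreadPool/async approach applied to the simplified example above might look roughly like this (a sketch, not taken from the linked answer; map_async returns an AsyncResult whose get() yields one ndarray per input):
import numpy as np
from multiprocessing.pool import ThreadPool

def calculate_result(value):
    a = np.linspace(1.0, 1000.0, num=10000)
    return value * a

if __name__ == '__main__':
    inputs = [1, 2, 3, 4]
    pool = ThreadPool(processes=4)
    async_result = pool.map_async(calculate_result, inputs)
    results = async_result.get()   # list of ndarrays, one per input
    pool.close()
    pool.join()
    print([r.shape for r in results])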
-------------------------------------------------------------------------------
Thanks for your answers!
Thanks to your help, I am now looking for a way to solve my problem with the multiprocessing module. To give you a better understanding of what I do, see the following explanation.
Explanation:
My 'input_data' is an ndarray with 282240 elements of type uint32.
In the 'calculation_function()' I use a for loop to calculate a result from every 12 bits and put it into the 'output_data'.
Because this is very slow, I split my input_data into e.g. 4 or 8 parts and calculate each part in the calculation_function().
Now I am looking for a way to parallelize the 4 or 8 function calls.
The order of the data is essential, because the data is an image and each pixel has to be at the correct position. So function call no. 1 calculates the first pixel and the last function call the last pixel of the image.
The calculations work fine and the image can be completely rebuilt from my algorithm, but I need the parallelization to speed up time-critical aspects.
Summary:
One input ndarray is divided into 4 or 8 parts, each containing 70560 or 35280 uint32 values. From every 12 bits I calculate one pixel, using 4 or 8 function calls. Each function returns one ndarray with 188160 or 94080 pixels. All return values are put together in a row and reshaped into an image.
What already works:
The calculations work and I can reconstruct my image.
Problem:
The function calls are executed serially, one after another, so each image reconstruction is very slow.
Main goal:
Speed up the image reconstruction by parallelizing the function calls.
Code:
def decompress(payload, WIDTH, HEIGHT):
    # INPUTS / OUTPUTS
    n_threads = 4
    img_input = np.fromstring(payload, dtype='uint32')
    img_output = np.zeros((WIDTH * HEIGHT), dtype=np.uint32)
    n_elements_part = np.int(len(img_input) / n_threads)
    input_part = np.zeros((n_threads, n_elements_part)).astype(np.uint32)
    output_part = np.zeros((n_threads, np.int(n_elements_part / 3 * 8))).astype(np.uint32)

    # DEFINE PARTS (here 4 different ones)
    start = np.zeros(n_threads).astype(np.int)
    end = np.zeros(n_threads).astype(np.int)
    for i in range(0, n_threads):
        start[i] = i * n_elements_part
        end[i] = (i + 1) * n_elements_part - 1

    # COPY IMAGE DATA
    for idx in range(0, n_threads):
        input_part[idx, :] = img_input[start[idx]:end[idx] + 1]

    for idx in range(0, n_threads):  # the following line is the function call that should be parallelized
        output_part[idx, :] = decompress_part2(input_part[idx], output_part[idx])

    # COPY PARTS INTO THE IMAGE
    img_output[0:188160] = output_part[0, :]
    img_output[188160:376320] = output_part[1, :]
    img_output[376320:564480] = output_part[2, :]
    img_output[564480:752640] = output_part[3, :]

    # RESHAPE IMAGE
    img_output = np.reshape(img_output, (HEIGHT, WIDTH))
    return img_output
Please excuse my beginner programming style :)
I am just looking for a way to parallelize the function calls with the multiprocessing module and get back the returned ndarrays.
Thank you so much for your help!
You can use a process pool from the multiprocessing module:
from multiprocessing.dummy import Pool

def test(a):
    return a

p = Pool(3)
a = p.starmap(test, zip([1, 2, 3]))
print(a)
p.close()
p.join()
kar's answer works; however, keep in mind that he is using the .dummy module, which might be limited by the GIL. Here's more info on it:
multiprocessing.dummy in Python is not utilising 100% cpu
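If the per-part work is CPU-bound, as the 12-bit decompression above appears to be, a process-based pool sidesteps the GIL. Here is a rough sketch; decompress_part2 is replaced by a trivial stand-in because the real routine is not shown in the question:
import numpy as np
import multiprocessing

def decompress_part2(input_part, output_part):
    # placeholder standing in for the asker's real 12-bit unpacking routine
    return input_part * 2

def parallel_decompress(input_parts, output_parts, n_workers=4):
    # One task per part; starmap preserves input order, so the parts
    # can be stitched back into the image exactly as before.
    with multiprocessing.Pool(n_workers) as pool:
        return pool.starmap(decompress_part2, zip(input_parts, output_parts))

if __name__ == '__main__':
    parts_in = [np.arange(6, dtype=np.uint32) for _ in range(4)]
    parts_out = [np.zeros(16, dtype=np.uint32) for _ in range(4)]
    print(parallel_decompress(parts_in, parts_out))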

Python: How to run nested parallel process in python?

I have a dataset df of trader transactions.
I have 2 levels of for loops as follows:
smartTrader = []

for asset in range(len(Assets)):
    df = df[df['Assets'] == asset]
    # I have some more calculations here
    for trader in range(len(df['TraderID'])):
        # I have some calculations here, If trader is successful, I add his ID
        # to the list as follows
        smartTrader.append(df['TraderID'][trader])
    # some more calculations here which are related to the first for loop.
I would like to parallelise the calculations for each asset in Assets, and I also want to parallelise the calculations for each trader for every asset. After ALL these calculations are done, I want to do additional analysis based on the list of smartTrader.
This is my first attempt at parallel processing, so please be patient with me, and I appreciate your help.
If you use pathos, which provides a fork of multiprocessing, you can easily nest parallel maps. pathos is built for easily testing combinations of nested parallel maps -- which are direct translations of nested for loops.
It provides a selection of maps that are blocking, non-blocking, iterative, asynchronous, serial, parallel, and distributed.
>>> from pathos.pools import ProcessPool, ThreadPool
>>> amap = ProcessPool().amap
>>> tmap = ThreadPool().map
>>> from math import sin, cos
>>> print amap(tmap, [sin,cos], [range(10),range(10)]).get()
[[0.0, 0.8414709848078965, 0.9092974268256817, 0.1411200080598672, -0.7568024953079282, -0.9589242746631385, -0.27941549819892586, 0.6569865987187891, 0.9893582466233818, 0.4121184852417566], [1.0, 0.5403023058681398, -0.4161468365471424, -0.9899924966004454, -0.6536436208636119, 0.2836621854632263, 0.9601702866503661, 0.7539022543433046, -0.14550003380861354, -0.9111302618846769]]
Here this example uses a processing pool and a thread pool, where the thread map call is blocking, while the processing map call is asynchronous (note the get at the end of the last line).
Get pathos here: https://github.com/uqfoundation
or with:
$ pip install git+https://github.com/uqfoundation/pathos.git#master
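Applied to the asset/trader structure of the question, the nesting might look roughly like this; process_asset and process_trader are hypothetical stand-ins for the real calculations, and pathos must be installed:
from pathos.pools import ProcessPool, ThreadPool

def process_trader(trader):
    # stand-in for the per-trader calculation
    return trader * 2

def process_asset(asset):
    # inner thread map over this asset's traders; one outer process task per asset
    traders = range(5)
    return ThreadPool().map(process_trader, traders)

if __name__ == '__main__':
    results = ProcessPool().map(process_asset, range(3))
    print(results)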
Nested parallelism can be done elegantly with Ray, a system that allows you to easily parallelize and distribute your Python code.
Assume you want to parallelize the following nested program
def inner_calculation(asset, trader):
    return trader

def outer_calculation(asset):
    return asset, [inner_calculation(asset, trader) for trader in range(5)]

inner_results = []
outer_results = []

for asset in range(10):
    outer_result, inner_result = outer_calculation(asset)
    outer_results.append(outer_result)
    inner_results.append(inner_result)

# Then you can filter inner_results to get the final output.
Below is the Ray code parallelizing the above code:
Use the @ray.remote decorator for each function that we want to execute concurrently in its own process. A remote function returns a future (i.e., an identifier to the result) rather than the result itself.
When invoking a remote function f(), use the remote modifier, i.e., f.remote().
Use the ids_to_vals() helper function to convert a nested list of ids to values.
Note the program structure is identical. You only need to add remote and then convert the futures (ids) returned by the remote functions to values using the ids_to_vals() helper function.
import ray

ray.init()

# Define inner calculation as a remote function.
@ray.remote
def inner_calculation(asset, trader):
    return trader

# Define outer calculation to be executed as a remote function.
@ray.remote(num_return_vals=2)
def outer_calculation(asset):
    return asset, [inner_calculation.remote(asset, trader) for trader in range(5)]

# Helper to convert a nested list of object ids to a nested list of corresponding objects.
def ids_to_vals(ids):
    if isinstance(ids, ray.ObjectID):
        ids = ray.get(ids)
        if isinstance(ids, ray.ObjectID):
            return ids_to_vals(ids)
    if isinstance(ids, list):
        results = []
        for id in ids:
            results.append(ids_to_vals(id))
        return results
    return ids

outer_result_ids = []
inner_result_ids = []
for asset in range(10):
    outer_result_id, inner_result_id = outer_calculation.remote(asset)
    outer_result_ids.append(outer_result_id)
    inner_result_ids.append(inner_result_id)

outer_results = ids_to_vals(outer_result_ids)
inner_results = ids_to_vals(inner_result_ids)
There are a number of advantages of using Ray over the multiprocessing module. In particular, the same code will run on a single machine as well as on a cluster of machines. For more advantages of Ray see this related post.
Probably threading, from the standard Python library, is the most convenient approach:
import threading

def worker(id):
    # Do your calculations here
    return

threads = []
for asset in range(len(Assets)):
    df = df[df['Assets'] == asset]
    for trader in range(len(df['TraderID'])):
        t = threading.Thread(target=worker, args=(trader,))
        threads.append(t)
        t.start()

# add a semaphore here if you need to synchronize results for all traders.
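Note that the threads above never report results back and are never joined. A hedged variant (with a dummy per-trader calculation) that collects results into a shared list and waits for all threads before any follow-up analysis; CPU-bound work will still be serialized by the GIL:
import threading

results = []
results_lock = threading.Lock()

def worker(trader):
    value = trader * 2          # stand-in for the real calculation
    with results_lock:
        results.append((trader, value))

threads = []
for trader in range(10):        # one thread per trader, for illustration
    t = threading.Thread(target=worker, args=(trader,))
    threads.append(t)
    t.start()

for t in threads:
    t.join()                    # wait for all traders before further analysis
print(results)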
Instead of using for, use map:
smartTrader = []

m = map(calculations_as_a_function,
        [df[df['Assets'] == asset]
         for asset in range(len(Assets))])
smartTrader.extend(m)
From then on, you can try different parallel map implementations, such as multiprocessing's or Stackless'.
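For instance, swapping the built-in map for multiprocessing's Pool.map might look like this sketch; calculations_as_a_function and the per-asset data are dummy stand-ins:
from multiprocessing import Pool

def calculations_as_a_function(asset_df):
    # stand-in for the per-asset calculation; returns the "smart" trader IDs
    return [tid for tid in asset_df if tid % 2 == 0]

if __name__ == '__main__':
    per_asset_data = [[1, 2, 3], [4, 5], [6, 7, 8]]   # dummy stand-in for the df slices
    with Pool(4) as pool:
        smartTrader = []
        for ids in pool.map(calculations_as_a_function, per_asset_data):
            smartTrader.extend(ids)
    print(smartTrader)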

Create a summary Pandas DataFrame using concat/append via a for loop

Can't get my mind around this...
I read a bunch of spreadsheets, do a bunch of calculations and then want to create a summary DF from each set of calculations. I can create the initial df but don't know how to control my loops so that I:
create the initial DF (the first time through the loop), and
if it has already been created, append the next DF (last two rows) for each additional tab.
I just can't wrap my head around how to create the right loop so that once the first one is done the subsequent ones get appended.
My current code looks like this (it just prints each tab's results separately rather than creating a new consolidated sumdf with just the last 2 rows of each tab's results):
# make summary
area_tabs = ['5', '12']
for area_tab in area_tabs:
    actdf, aname = get_data(area_tab)
    lastq, fcast_yr, projections, yrahead, aname, actdf, merged2, mergederrs, montdist, ols_test, mergedfcst = do_projections(actdf)
    sumdf = merged2[-2:]
    sumdf['name'] = aname  # <<< I'll be doing a few more calculations here as well
    print sumdf
Still a newb learning basic python loop techniques :-(
Often a neater way than writing for loops, especially if you are planning on using the result, is to use a list comprehension over a function:
def get_sumdf(area_tab):  # perhaps you can name better?
    actdf, aname = get_data(area_tab)
    lastq, fcast_yr, projections, yrahead, aname, actdf, merged2, mergederrs, montdist, ols_test, mergedfcst = do_projections(actdf)
    sumdf = merged2[-2:]
    sumdf['name'] = aname  # <<< I'll be doing a few more calculations here as well
    return sumdf
[get_sumdf(area_tab) for area_tab in area_tabs]
and concat:
pd.concat([get_sumdf(area_tab) for area_tab in area_tabs])
or you can also use a generator expression:
pd.concat(get_sumdf(area_tab) for area_tab in area_tabs)
To explain my comment re named tuples and dictionaries, I think this line is difficult to read and ripe for bugs:
lastq,fcast_yr,projections,yrahead,aname,actdf,merged2,mergederrs,montdist,ols_test,mergedfcst=do_projections(actdf)
A trick is to have do_projections return a named tuple, rather than a tuple:
from collections import namedtuple
Projection = namedtuple('Projection', ['lastq', 'fcast_yr', 'projections', 'yrahead', 'aname', 'actdf', 'merged2', 'mergederrs', 'montdist', 'ols_test', 'mergedfcst'])
then inside do_projections:
return (1, 2, 3, 4, ...) # don't do this
return Projection(1, 2, 3, 4, ...) # do this
return Projection(last_q=last_q, fcast_yr=f_cast_yr, ...) # or this
I think this avoids bugs and is a lot cleaner, especially to access the results later.
projections = do_projections(actdf)
projections.aname
Do the initialisation outside the for loop. Something like this:
# make summary
area_tabs = ['5', '12']
if not area_tabs:
    return  # nothing to do

# init the first frame
actdf, aname = get_data(area_tabs[0])
lastq, fcast_yr, projections, yrahead, aname, actdf, merged2, mergederrs, montdist, ols_test, mergedfcst = do_projections(actdf)
sumdf = merged2[-2:]
sumdf['name'] = aname

for area_tab in area_tabs[1:]:
    actdf, aname = get_data(area_tab)
    lastq, fcast_yr, projections, yrahead, aname, actdf, merged2, mergederrs, montdist, ols_test, mergedfcst = do_projections(actdf)
    sumdf = merged2[-2:]
    sumdf['name'] = aname  # <<< I'll be doing a few more calculations here as well
    print sumdf
You can further improve the code by putting the common steps into a function.
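A rough sketch of that refactor; get_data and do_projections here are trivial stand-ins for the asker's real functions, just to show the shape of pulling the common steps into one helper and concatenating at the end:
import pandas as pd

# Hypothetical stand-ins for the asker's get_data/do_projections.
def get_data(area_tab):
    return pd.DataFrame({'x': range(5)}), 'area_' + area_tab

def do_projections(actdf):
    return actdf.assign(proj=actdf['x'] * 2)

def get_summary(area_tab):
    actdf, aname = get_data(area_tab)
    merged2 = do_projections(actdf)
    sumdf = merged2[-2:].copy()
    sumdf['name'] = aname
    return sumdf

area_tabs = ['5', '12']
summary = pd.concat(get_summary(t) for t in area_tabs)
print(summary)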
