Is there a way to reduce memory consumption when working with Python's pool.map?
To give a short example: worker() does some heavy lifting and returns a larger array...
def worker(item):
    # cpu time intensive tasks
    return large_array
...and a Pool maps over some large sequence:
with mp.Pool(mp.cpu_count()) as p:
    result = p.map(worker, large_sequence)
Considering this setup, obviously, result will allocate a large portion of the system's memory. However, the final operation on the result is:
final_result = np.sum(result, axis=0)
Thus, NumPy effectively does nothing else than reducing with a sum operation on the iterable:
final_result = reduce(lambda x, y: x + y, result)
This, of course, would make it possible to consume the results of pool.map as they come in and garbage-collect them after reducing, eliminating the need to store all the values first.
I could write some mp.Queue that results go into and then a queue-consuming worker that sums them up, but this would (1) require significantly more lines of code and (2) feel like a (potentially slower) hack-around rather than clean code.
Is there a way to reduce results returned by a mp.Pool operation directly as they come in?
The iterator mappers imap and imap_unordered seem to do the trick:
#!/usr/bin/env python3
import multiprocessing
import numpy as np

def worker(a):
    # cpu time intensive tasks
    large_array = np.ones((20, 30)) + a
    return large_array

if __name__ == '__main__':
    arraysum = np.zeros((20, 30))
    large_sequence = range(20)
    num_cpus = multiprocessing.cpu_count()
    with multiprocessing.Pool(processes=num_cpus) as p:
        for large_array in p.imap_unordered(worker, large_sequence):
            arraysum += large_array
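The same loop can also be written as a fold with functools.reduce, which mirrors the reduce expression from the question. This is only a minimal sketch under stated assumptions: worker and streaming_sum are hypothetical names, and plain integers stand in for the large arrays so that the example needs nothing beyond the standard library (NumPy arrays support the same + operator).

```python
import functools
import multiprocessing
import operator


def worker(a):
    # Stand-in for the CPU-intensive task; returns a small value
    # instead of a large array (an assumption made for brevity).
    return a * a


def streaming_sum(results):
    # Fold the results one at a time; each item can be freed
    # as soon as it has been added to the running total.
    return functools.reduce(operator.add, results)


if __name__ == "__main__":
    with multiprocessing.Pool() as pool:
        # imap_unordered yields results as workers finish them, so the
        # parent never builds the full list that pool.map would return.
        lazy_results = pool.imap_unordered(worker, range(20), chunksize=4)
        total = streaming_sum(lazy_results)
    print(total)
```

The memory saving depends on the parent consuming results about as fast as they arrive; results that finish early still wait in the pool's internal buffer until the loop reaches them.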
Suppose I have two independent functions. I'd like to call them concurrently, using python's concurrent.futures.ThreadPoolExecutor. Is there a way to call them using Executor and ensure they are returned in order of submission?
I understand this is possible with Executor.map, but I am looking to parallelize two separate functions, not one function with an iterable input.
I have example code below, but it doesn't guarantee that fn_a will return first (by design of the wait function).
from concurrent.futures import ThreadPoolExecutor, wait
import time

def fn_a():
    t_sleep = 0.5
    print("fn_a: Wait {} seconds".format(t_sleep))
    time.sleep(t_sleep)
    ret = t_sleep * 5  # Do unique work
    return "fn_a: return {}".format(ret)

def fn_b():
    t_sleep = 1.0
    print("fn_b: Wait {} seconds".format(t_sleep))
    time.sleep(t_sleep)
    ret = t_sleep * 10  # Do unique work
    return "fn_b: return {}".format(ret)

if __name__ == "__main__":
    with ThreadPoolExecutor() as executor:
        futures = []
        futures.append(executor.submit(fn_a))
        futures.append(executor.submit(fn_b))
        complete_futures, incomplete_futures = wait(futures)
        for f in complete_futures:
            print(f.result())
I'm also interested in knowing whether there is a way to do this with joblib.
I think I found a reasonable option using a lambda and partials. The partials let me pass arguments to some functions in the parallelized iterable but not to others.
from functools import partial
import concurrent.futures

fns = [partial(fn_a), partial(fn_b)]
data = []
with concurrent.futures.ThreadPoolExecutor() as executor:
    for result in executor.map(lambda x: x(), fns):
        data.append(result)
Since it is using executor.map, it returns in order.
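If the lambda feels indirect, another option is to keep the futures in a list and read their results in that order. wait() returns sets, which is why the order is lost in the question's code; the list of futures itself still reflects submission order. A minimal sketch, where run_in_order is a hypothetical helper and the sleep times are arbitrary:

```python
from concurrent.futures import ThreadPoolExecutor
import time


def fn_a():
    time.sleep(0.2)  # fn_a is deliberately the slower task here
    return "fn_a done"


def fn_b():
    time.sleep(0.05)
    return "fn_b done"


def run_in_order(*functions):
    # Submit everything first so the calls overlap, then read the results
    # from the list of futures, which preserves submission order.
    with ThreadPoolExecutor() as executor:
        futures = [executor.submit(fn) for fn in functions]
        return [future.result() for future in futures]


if __name__ == "__main__":
    print(run_in_order(fn_a, fn_b))
```

Even though fn_b finishes first, its result comes second because future.result() blocks on each future in turn.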
I am new to Python and have tried several approaches to multiprocessing without seeing any benefit.
I have a task that involves three methods, x, y and z. What I have tried so far is:
def foo:
    iterate over the lines in a text file:
        call_method_x()
        result from method x, say x1
        call_method_y()  # this uses x1
        result from method y, say y1
        for i in range(4):
            multiprocessing.Process(target=call_method_z())  # this uses y1
I used multiprocessing on method_z because it is the most CPU-intensive.
I also tried it another way:
def foo:
    call method_x()
    call method_y()
    call method_z()

def main():
    import concurrent.futures
    with concurrent.futures.ProcessPoolExecutor() as executor:
        executor.map(foo())
Which one seems more appropriate? I measured the execution time, but there was not much of a difference. The constraint is that method_x(), then method_y(), and then method_z() must run in that order, because each uses the output of the previous one. Both approaches work, but neither shows a significant benefit from multiprocessing.
Please let me know if I am missing something here.
You can use multiprocessing.Pool from Python, something like:
from multiprocessing import Pool

def method_x(line):
    # do something with the line
    pass

def method_y(line):
    x1 = method_x(line)
    # do something with x1

def method_z(line):
    y1 = method_y(line)
    # do something with y1

def call_home():
    with open(<path-to-file>) as f:
        data = f.readlines()
    with Pool(6) as p:
        p.map(method_z, data)
First, all lines are read into the variable data. Then 6 processes are started, and each line is handed to whichever of the 6 processes is free.
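A runnable variant of this pattern, using concurrent.futures as in the question's second attempt, might look like the sketch below. The bodies of method_x, method_y and method_z are hypothetical stand-ins, and process_line is a helper name introduced only for illustration.

```python
from concurrent.futures import ProcessPoolExecutor


def method_x(line):
    # Hypothetical step 1: normalise the line.
    return line.strip()


def method_y(x1):
    # Hypothetical step 2: split the normalised line into words.
    return x1.split()


def method_z(y1):
    # Hypothetical step 3 (the CPU-heavy one in the question): here just a count.
    return len(y1)


def process_line(line):
    # The three steps depend on each other, so they run sequentially per line;
    # the parallelism comes from handling different lines in different processes.
    return method_z(method_y(method_x(line)))


if __name__ == "__main__":
    lines = ["a b c\n", "d e\n", "f\n"]
    with ProcessPoolExecutor() as executor:
        # map takes the function itself plus an iterable,
        # not the result of calling the function.
        counts = list(executor.map(process_line, lines, chunksize=2))
    print(counts)
```

A speed-up should only be expected when the per-line work is large compared with the cost of sending each line to a worker and the result back; a larger chunksize reduces that overhead for many short lines.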
I am currently trying to implement a class that does intensive calculations:
import random
import multiprocessing as mp

class IntensiveStuff:
    def __init__(self):
        self.N = 20
        self.nb_process = 4
        self.set_of_things = set()

    def lunch_multiprocessing(self):
        processes = []
        for i in range(self.nb_process):
            processes.append(mp.Process(target=self.process_method, args=()))
        [x.start() for x in processes]
        [x.join() for x in processes]
        self.set_of_things = ...  # I want all the sub_set of 'process_method' updated in set_of_things

    def process_method(self):
        sub_set = set()
        for _ in range(self.N):
            sub_set.add(random.randint(0, 100))
I want to run independent calculations, put the results in a sub_set for each process, and merge all the sub_sets into set_of_things (the elements are objects in the real code).
I have tried using Queue without success. Any advice?
P.S.: I have tried to reproduce the code in "Can a set() be shared between Python processes?" but without any luck either.
Processes don't share memory by default, but they can communicate via pipes, sockets, etc. The multiprocessing module has special objects for this (I believe they use pipes under the hood). multiprocessing.Queue should also work, but I often use these two objects:
multiprocessing.Manager().list() and
multiprocessing.Manager().dict()
results = mp.Manager().list()

# now a bit of your code
processes = []
for i in range(self.nb_process):
    processes.append(mp.Process(target=self.process_method, args=(results,)))

def process_method(self, results):
    sub_set = set()
    for _ in range(self.N):
        sub_set.add(random.randint(0, 100))
    results.append(sub_set)  # or what you really need to add to results
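An alternative that avoids a Manager altogether is to return each sub_set as the task's result and merge the sets in the parent process. The sketch below assumes that approach fits the real code; make_subset and merge_subsets are hypothetical names, and the per-task seed is only there to make the example reproducible.

```python
import multiprocessing as mp
import random


def make_subset(seed, n=20):
    # Each task uses its own generator so the processes
    # do not depend on shared random state.
    rng = random.Random(seed)
    return {rng.randint(0, 100) for _ in range(n)}


def merge_subsets(subsets):
    # set().union accepts any number of iterables and
    # returns one combined set.
    return set().union(*subsets)


if __name__ == "__main__":
    with mp.Pool(4) as pool:
        # pool.map sends each returned set back to the parent process.
        subsets = pool.map(make_subset, range(4))
    set_of_things = merge_subsets(subsets)
    print(len(set_of_things))
```

The elements must be picklable to travel between processes, which also holds for objects placed in a Manager list or a Queue.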
I have a dataset df of trader transactions.
I have 2 levels of for loops as follows:
smartTrader = []
for asset in range(len(Assets)):
    df = df[df['Assets'] == asset]
    # I have some more calculations here
    for trader in range(len(df['TraderID'])):
        # I have some calculations here. If trader is successful, I add his ID
        # to the list as follows
        smartTrader.append(df['TraderID'][trader])
    # some more calculations here which are related to the first for loop.
I would like to parallelise the calculations for each asset in Assets, and I also want to parallelise the calculations for each trader for every asset. After ALL these calculations are done, I want to do additional analysis based on the list of smartTrader.
This is my first attempt at parallel processing, so please be patient with me, and I appreciate your help.
If you use pathos, which provides a fork of multiprocessing, you can easily nest parallel maps. pathos is built for easily testing combinations of nested parallel maps -- which are direct translations of nested for loops.
It provides a selection of maps that are blocking, non-blocking, iterative, asynchronous, serial, parallel, and distributed.
>>> from pathos.pools import ProcessPool, ThreadPool
>>> amap = ProcessPool().amap
>>> tmap = ThreadPool().map
>>> from math import sin, cos
>>> print(amap(tmap, [sin,cos], [range(10),range(10)]).get())
[[0.0, 0.8414709848078965, 0.9092974268256817, 0.1411200080598672, -0.7568024953079282, -0.9589242746631385, -0.27941549819892586, 0.6569865987187891, 0.9893582466233818, 0.4121184852417566], [1.0, 0.5403023058681398, -0.4161468365471424, -0.9899924966004454, -0.6536436208636119, 0.2836621854632263, 0.9601702866503661, 0.7539022543433046, -0.14550003380861354, -0.9111302618846769]]
This example uses a processing pool and a thread pool, where the thread map call is blocking, while the processing map call is asynchronous (note the get at the end of the last line).
Get pathos here: https://github.com/uqfoundation
or with:
$ pip install git+https://github.com/uqfoundation/pathos.git#master
Nested parallelism can be done elegantly with Ray, a system that allows you to easily parallelize and distribute your Python code.
Assume you want to parallelize the following nested program
def inner_calculation(asset, trader):
    return trader

def outer_calculation(asset):
    return asset, [inner_calculation(asset, trader) for trader in range(5)]

inner_results = []
outer_results = []
for asset in range(10):
    outer_result, inner_result = outer_calculation(asset)
    outer_results.append(outer_result)
    inner_results.append(inner_result)
# Then you can filter inner_results to get the final output.
Below is the Ray code parallelizing the above code:
Use the @ray.remote decorator for each function that we want to execute concurrently in its own process. A remote function returns a future (i.e., an identifier to the result) rather than the result itself.
When invoking a remote function f(), use the remote modifier, i.e., f.remote().
Use the ids_to_vals() helper function to convert a nested list of ids to values.
Note that the program structure is identical. You only need to add remote and then convert the futures (ids) returned by the remote functions to values using the ids_to_vals() helper function.
import ray

ray.init()

# Define inner calculation as a remote function.
@ray.remote
def inner_calculation(asset, trader):
    return trader

# Define outer calculation to be executed as a remote function.
# (Newer Ray releases name this option num_returns.)
@ray.remote(num_return_vals = 2)
def outer_calculation(asset):
    return asset, [inner_calculation.remote(asset, trader) for trader in range(5)]

# Helper to convert a nested list of object ids to a nested list of corresponding objects.
# (Newer Ray releases name this class ray.ObjectRef.)
def ids_to_vals(ids):
    if isinstance(ids, ray.ObjectID):
        ids = ray.get(ids)
    if isinstance(ids, ray.ObjectID):
        return ids_to_vals(ids)
    if isinstance(ids, list):
        results = []
        for id in ids:
            results.append(ids_to_vals(id))
        return results
    return ids

outer_result_ids = []
inner_result_ids = []
for asset in range(10):
    outer_result_id, inner_result_id = outer_calculation.remote(asset)
    outer_result_ids.append(outer_result_id)
    inner_result_ids.append(inner_result_id)

outer_results = ids_to_vals(outer_result_ids)
inner_results = ids_to_vals(inner_result_ids)
There are a number of advantages of using Ray over the multiprocessing module. In particular, the same code will run on a single machine as well as on a cluster of machines. For more advantages of Ray see this related post.
Threading, from the standard Python library, is probably the most convenient approach:
import threading

def worker(id):
    # Do your calculations here
    return

threads = []
for asset in range(len(Assets)):
    df = df[df['Assets'] == asset]
    for trader in range(len(df['TraderID'])):
        t = threading.Thread(target=worker, args=(trader,))
        threads.append(t)
        t.start()
    # add a semaphore here if you need to synchronize results for all traders.
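One simple way to wait for all traders without a semaphore is to join every thread and collect the results in a list guarded by a lock. The following is a sketch under assumptions: run_workers is a hypothetical helper, and the success criterion (even trader IDs) is invented purely for illustration.

```python
import threading


def worker(trader_id, results, lock):
    # Hypothetical calculation: treat even IDs as successful traders.
    if trader_id % 2 == 0:
        with lock:  # protect the shared list while appending
            results.append(trader_id)


def run_workers(trader_ids):
    results = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(trader_id, results, lock))
        for trader_id in trader_ids
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # block until this worker has finished
    # Threads may finish in any order, so sort for a stable result.
    return sorted(results)


if __name__ == "__main__":
    print(run_workers(range(6)))
```

After the last join() returns, every worker has finished, so any follow-up analysis on the collected list can start safely.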
Instead of using for, use map:
import functools

smartTrader = []
m = map(calculations_as_a_function,
        [df[df['Assets'] == asset]
         for asset in range(len(Assets))])
# Assumes calculations_as_a_function returns a list of trader IDs for one asset.
smartTrader = functools.reduce(lambda acc, ids: acc + ids, m, smartTrader)
From then on, you can try different parallel map implementations, such as multiprocessing's or Stackless Python's.
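As an illustration of swapping in a parallel map, the sketch below uses ProcessPoolExecutor.map from the standard library. It rests on assumptions: TRANSACTIONS is hypothetical data standing in for the DataFrame, and smart_traders_for_asset with its "positive profit" criterion is an invented placeholder for the real per-asset calculation.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import chain

# Hypothetical data: (asset, trader_id, profit) rows standing in for df.
TRANSACTIONS = [
    (0, "t1", 5.0), (0, "t2", -1.0),
    (1, "t1", -2.0), (1, "t3", 4.0),
]


def smart_traders_for_asset(asset):
    # Hypothetical success criterion: a positive profit on this asset.
    return [
        trader
        for row_asset, trader, profit in TRANSACTIONS
        if row_asset == asset and profit > 0
    ]


def collect(per_asset_results):
    # Flatten the per-asset lists into a single smartTrader list.
    return list(chain.from_iterable(per_asset_results))


if __name__ == "__main__":
    assets = sorted({row_asset for row_asset, _, _ in TRANSACTIONS})
    with ProcessPoolExecutor() as executor:
        # executor.map has the same shape as the built-in map,
        # but runs each asset in a separate process.
        smart_trader = collect(executor.map(smart_traders_for_asset, assets))
    print(smart_trader)
```

Because executor.map yields results in input order, the flattened list has the same order as the sequential loop would produce.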