Why does python multiprocessing script slow down after a while?

Why does python multiprocessing script slow down after a while? - python

I read an old question Why does this python multiprocessing script slow down after a while? and many others before posting this one. They do not answer the problem I'm having.
IDEA OF THE SCRIPT.
The script generates arrays, 256x256, in a serialised loop. Elements of an array are calculated one-by-one from a list that contains dictionaries with relevant params, one dictionary per an array element (256x256 in total per a list). The list is the way for me to enable parallel calculations.
THE PROBLEM.
In the beginning, the generation of the data speeds up from a dozen seconds up-to a few seconds. Then, after a few iterations, it starts slowing down a fraction of a second with each new array generated to the point it takes forever to calculate anything.
Additional info.
I am using a pool.map function. After making a few small changes to identify which element is being calculated, I also tried using map_async. Unfortunately, it is slower because I need to init the pool each time I finish calculating an array.
When using the pool.map, I init the pool once before anything starts. In this way, I hope to save time initializing the pool in comparison to map_async.
CPU shows low usage, up to ~18%.
In my instance, a hard-drive isn't a bottleneck. All the data necessary for calculations is in RAM. I also do not save data onto a hard-drive keeping everything in RAM.
I also checked if the problem persists if I use a different number of cores, 2-24. No changes either.
I made some additional tests by running and terminating a pool, a. each time an array is generated, b. every 10 arrays. I noticed that in each case execution of the code slows down compared to the previous pool's execution time, i.e. if the previous slowed down to 5s, another one will be 5.Xs and so on. The only time the execution doesn't slow down is when I run the code serially.
Working env: Windows 10, Python 3.7, conda 4.8.2, Spyder 4.
THE QUESTION: Why multiprocessing slows down after a while in the case where only CPU & RAM are involved (no hard-drive slowdown)? Any idea?
UPDATED CODE:
import multiprocessing as mp
from tqdm import tqdm
import numpy as np
import random
def wrapper_(arg):
return tmp.generate_array_elements(
self=arg['self'],
nu1=arg['nu1'],
nu2=arg['nu2'],
innt=arg['innt'],
nu1exp=arg['nu1exp'],
nu2exp=arg['nu2exp'],
ii=arg['ii'],
jj=arg['jj'],
llp=arg['self'].llp,
rr=arg['self'].rr,
)
class tmp:
def __init__(self, multiprocessing, length, n_of_arrays):
self.multiprocessing = multiprocessing
self.inshape = (length,length)
self.length = length
self.ll_len = n_of_arrays
self.num_cpus = 8
self.maxtasksperchild = 10000
self.rr = 0
"""original function is different, modified to return something"""
"""for the example purpose, lp is not relevant here but in general is"""
def get_ll(self, lp):
return [random.sample((range(self.length)),int(np.random.random()*12)+1) for ii in range(self.ll_len)]
"""original function is different, modified to return something"""
def get_ip(self): return np.random.random()
"""original function is different, modified to return something"""
def get_op(self): return np.random.random(self.length)
"""original function is different, modified to return something"""
def get_innt(self, nu1, nu2, ip):
return nu1*nu2/ip
"""original function is different, modified to return something"""
def __get_pp(self, nu1):
return np.exp(nu1)
"""dummy function for the example purpose"""
def dummy_function(self):
"""do important stuff"""
return
"""dummy function for the example purpose"""
def dummy_function_2(self, result):
"""do important stuff"""
return np.reshape(result, np.inshape)
"""dummy function for the example purpose"""
def dummy_function_3(self):
"""do important stuff"""
return
"""original function is different, modified to return something"""
"""for the example purpose, lp is not relevant here but in general is"""
def get_llp(self, ll, lp):
return [{'a': np.random.random(), 'b': np.random.random()} for ii in ll]
"""NOTE, lp is not used here for the example purpose but
in the original code, it's very important variable containg
relevant data for calculations"""
def generate(self, lp={}):
"""create a list that is used to the creation of 2-D array"""
"""providing here a dummy pp param to get_ll"""
ll = self.get_ll(lp)
ip = self.get_ip()
self.op = self.get_op()
"""length of args_tmp = self.length * self.length = 256 * 256"""
args_tmp = [
{'self': self,
'nu1': nu1,
'nu2': nu2,
'ii': ii,
'jj': jj,
'innt': np.abs(self.get_innt(nu1, nu2, ip)),
'nu1exp': np.exp(1j*nu1*ip),
'nu2exp': np.exp(1j*nu2*ip),
} for ii, nu1 in enumerate(self.op) for jj, nu2 in enumerate(self.op)]
pool = {}
if self.multiprocessing:
pool = mp.Pool(self.num_cpus, maxtasksperchild=self.maxtasksperchild)
"""number of arrays is equal to len of ll, here 300"""
for ll_ in tqdm(ll):
"""Generate data"""
self.__generate(ll_, lp, pool, args_tmp)
"""Create a pool of CPU threads"""
if self.multiprocessing:
pool.terminate()
def __generate(self, ll, lp, pool = {}, args_tmp = []):
"""In the original code there are plenty other things done in the code
using class' methods, they are not shown here for the example purpose"""
self.dummy_function()
self.llp = self.get_llp(ll, lp)
"""originally the values is taken from lp"""
self.rr = self.rr
if self.multiprocessing and pool:
result = pool.map(wrapper_, args_tmp)
else:
result = [wrapper_(arg) for arg in args_tmp]
"""In the original code there are plenty other things done in the code
using class' methods, they are not shown here for the example purpose"""
result = self.dummy_function_2(result)
"""original function is different"""
def generate_array_elements(self, nu1, nu2, llp, innt, nu1exp, nu2exp, ii = 0, jj = 0, rr=0):
if rr == 1 and self.inshape[0] - 1 - jj < ii:
return 0
elif rr == -1 and ii > jj:
return 0
elif rr == 0:
"""do nothing"""
ll1 = []
ll2 = []
"""In the original code there are plenty other things done in the code
using class' methods, they are not shown here for the example purpose"""
self.dummy_function_3()
for kk, ll in enumerate(llp):
ll1.append(
self.__get_pp(nu1) *
nu1*nu2*nu1exp**ll['a']*np.exp(1j*np.random.random())
)
ll2.append(
self.__get_pp(nu2) *
nu1*nu2*nu2exp**ll['b']*np.exp(1j*np.random.random())
)
t1 = sum(ll1)
t2 = sum(ll2)
result = innt*np.abs(t1 - t2)
return result
g = tmp(False, 256, 300)
g.generate()

It is hard to tell what is going on in your algorithm. I don't know a lot about multiprocessing but it is probably safer to stick with functions and avoid passing self down into the pooled processes. This is done when you pass args_tmp to wrapper_ in pool.map(). Also overall, try to reduce how much data is passed between the parent and child processes in general. I try to move the generation of the lp list into the pool workers to prevent passing excessive data.
Lastly, altough I don't think it matters in this example code but you should be either cleaning up after using pool or using pool with with.
I rewrote some of your code to try things out and this seems faster but I'm not 100% it adheres to your algorithm. Some of the variable names are hard to distinguish.
This runs a lot faster for me but it is hard to tell if it is producing your solutions accurately. My final conclusion if this is accurate is that the extra data passing was significantly slowing down the pool workers.
#main.py
if __name__ == '__main__':
import os
import sys
file_dir = os.path.dirname(__file__)
sys.path.append(file_dir)
from tmp import generate_1
parallel = True
generate_1(parallel)
#tmp.py
import multiprocessing as mp
import numpy as np
import random
from tqdm import tqdm
from itertools import starmap
def wrapper_(arg):
return arg['self'].generate_array_elements(
nu1=arg['nu1'],
nu2=arg['nu2'],
ii=arg['ii'],
jj=arg['jj'],
lp=arg['self'].lp,
nu1exp=arg['nu1exp'],
nu2exp=arg['nu2exp'],
innt=arg['innt']
)
def generate_1(parallel):
"""create a list that is used to the creation of 2-D array"""
il = np.random.random(256)
"""generating params for parallel data generation"""
"""some params are also calculated here to speed up the calculation process
because they are always the same so they can be calculated just once"""
"""this code creates a list of 256*256 elements"""
args_tmp = [
{
'nu1': nu1,
'nu2': nu2,
'ii': ii,
'jj': jj,
'innt': np.random.random()*nu1+np.random.random()*nu2,
'nu1exp': np.exp(1j*nu1),
'nu2exp': np.exp(1j*nu2),
} for ii, nu1 in enumerate(il) for jj, nu2 in enumerate(il)]
"""init pool"""
"""get list of arrays to generate"""
ip_list = [random.sample((range(256)),int(np.random.random()*12)+1) for ii in range(300)]
map_args = [(idx, ip, args_tmp) for idx, ip in enumerate(ip_list)]
"""separate function to do other important things"""
if parallel:
with mp.Pool(8, maxtasksperchild=10000) as pool:
result = pool.starmap(start_generate_2, map_args)
else:
result = starmap(start_generate_2, map_args)
# Wrap iterator in list call.
return list(result)
def start_generate_2(idx, ip, args_tmp):
print ('starting {idx}'.format(idx=idx))
runner = Runner()
result = runner.generate_2(ip, args_tmp)
print ('finished {idx}'.format(idx=idx))
return result
class Runner():
def generate_2(self, ip, args_tmp):
"""NOTE, the method is much more extensive and uses other methods of the class"""
"""so it must remain a method of the class that is not static!"""
self.lp = [{'a': np.random.random(), 'b': np.random.random()} for ii in ip]
"""this part creates 1-D array of the length of args_tmp, that's 256*256"""
result = map(wrapper_, [dict(args, self=self) for args in args_tmp])
"""it's then reshaped to 2-D array"""
result = np.reshape(list(result), (256,256))
return result
def generate_array_elements(self, nu1, nu2, ii, jj, lp, nu1exp, nu2exp, innt):
"""doing heavy calc"""
""""here is something else"""
if ii > jj: return 0
ll1 = []
ll2 = []
for kk, ll in enumerate(lp):
ll1.append(nu1*nu2*nu1exp**ll['a']*np.exp(1j*np.random.random()))
ll2.append(nu1*nu2*nu2exp**ll['b']*np.exp(1j*np.random.random()))
t1 = sum(ll1)
t2 = sum(ll2)
result = innt*np.abs(t1 - t2)
return result
I'm adding a generic template to show an architecture where you would split the preparation of the shared args away from the task runner and still use classes. The strategy here would be do not create too many tasks(300 seems faster than trying to split them down to 64000), and don't pass too much data to each task. The interface of launch_task should be kept as simple as possible, which in my refactoring of your code would be equivalent to start_generate_2.
import multiprocessing
from itertools import starmap
class Launcher():
def __init__(self, parallel):
self.parallel = parallel
def generate_shared_args(self):
return [(i, j) for i, j in enumerate(range(300))]
def launch(self):
shared_args = self.generate_shared_args()
if self.parallel:
with multiprocessing.Pool(8) as pool:
result = pool.starmap(launch_task, shared_args)
else:
result = starmap(launch_task, shared_args)
# Wrap in list to resolve iterable.
return list(result)
def launch_task(i, j):
task = Task(i, j)
return task.run()
class Task():
def __init__(self, i, j):
self.i = i
self.j = j
def run(self):
return self.i + self.j
if __name__ == '__main__':
parallel = True
launcher = Launcher(parallel)
print(launcher.launch())
There is a warning about the cleanup of pool in the pool documentation here: https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool
The first item discusses avoiding shared state and specifically large amounts of data.
https://docs.python.org/3/library/multiprocessing.html#programming-guidelines

Ian Wilson's suggestions were very helpful and one them helped to resolve the issue. That's why his answer is marked as the correct one.
As he suggested, it's better to call pool on a smaller number of tasks. So instead of calling pool.map for each array (N) that is created 256*256 times for each array's element (so N*256*256 tasks in total), now I call pool.map on the function that calculates the whole array so just N times. The array calculation inside the function is done in a serialised way.
I'm still sending self as a param because it's needed in the function but it doesn't have any impact on the performance.
That small change speeds-up a calculation of an array from 7-15s up to 1.5it/s-2s/it!
CURRENT CODE:
import multiprocessing as mp
import tqdm
import numpy as np
import random
def wrapper_(arg):
return tmp.generate_array_elements(
self=arg['self'],
nu1=arg['nu1'],
nu2=arg['nu2'],
innt=arg['innt'],
nu1exp=arg['nu1exp'],
nu2exp=arg['nu2exp'],
ii=arg['ii'],
jj=arg['jj'],
llp=arg['self'].llp,
rr=arg['self'].rr,
)
"""NEW WRAPPER HERE"""
"""Sending self doesn't have bad impact on the performance, at least I don't complain :)"""
def generate(arg):
tmp._tmp__generate(arg['self'], arg['ll'], arg['lp'], arg['pool'], arg['args_tmp'])
class tmp:
def __init__(self, multiprocessing, length, n_of_arrays):
self.multiprocessing = multiprocessing
self.inshape = (length,length)
self.length = length
self.ll_len = n_of_arrays
self.num_cpus = 8
self.maxtasksperchild = 10000
self.rr = 0
"""original function is different, modified to return something"""
"""for the example purpose, lp is not relevant here but in general is"""
def get_ll(self, lp):
return [random.sample((range(self.length)),int(np.random.random()*12)+1) for ii in range(self.ll_len)]
"""original function is different, modified to return something"""
def get_ip(self): return np.random.random()
"""original function is different, modified to return something"""
def get_op(self): return np.random.random(self.length)
"""original function is different, modified to return something"""
def get_innt(self, nu1, nu2, ip):
return nu1*nu2/ip
"""original function is different, modified to return something"""
def __get_pp(self, nu1):
return np.exp(nu1)
"""dummy function for the example purpose"""
def dummy_function(self):
"""do important stuff"""
return
"""dummy function for the example purpose"""
def dummy_function_2(self, result):
"""do important stuff"""
return np.reshape(result, np.inshape)
"""dummy function for the example purpose"""
def dummy_function_3(self):
"""do important stuff"""
return
"""original function is different, modified to return something"""
"""for the example purpose, lp is not relevant here but in general is"""
def get_llp(self, ll, lp):
return [{'a': np.random.random(), 'b': np.random.random()} for ii in ll]
"""NOTE, lp is not used here for the example purpose but
in the original code, it's very important variable containg
relevant data for calculations"""
def generate(self, lp={}):
"""create a list that is used to the creation of 2-D array"""
"""providing here a dummy pp param to get_ll"""
ll = self.get_ll(lp)
ip = self.get_ip()
self.op = self.get_op()
"""length of args_tmp = self.length * self.length = 256 * 256"""
args_tmp = [
{'self': self,
'nu1': nu1,
'nu2': nu2,
'ii': ii,
'jj': jj,
'innt': np.abs(self.get_innt(nu1, nu2, ip)),
'nu1exp': np.exp(1j*nu1*ip),
'nu2exp': np.exp(1j*nu2*ip),
} for ii, nu1 in enumerate(self.op) for jj, nu2 in enumerate(self.op)]
pool = {}
"""MAJOR CHANGE IN THIS PART AND BELOW"""
map_args = [{'self': self, 'idx': (idx, len(ll)), 'll': ll, 'lp': lp, 'pool': pool, 'args_tmp': args_tmp} for idx, ll in enumerate(ll)]
if self.multiprocessing:
pool = mp.Pool(self.num_cpus, maxtasksperchild=self.maxtasksperchild)
for _ in tqdm.tqdm(pool.imap_unordered(generate_js_, map_args), total=len(map_args)):
pass
pool.close()
pool.join()
pbar.close()
else:
for map_arg in tqdm.tqdm(map_args):
generate_js_(map_arg)
def __generate(self, ll, lp, pool = {}, args_tmp = []):
"""In the original code there are plenty other things done in the code
using class' methods, they are not shown here for the example purpose"""
self.dummy_function()
self.llp = self.get_llp(ll, lp)
"""originally the values is taken from lp"""
self.rr = self.rr
"""REMOVED PARALLEL CALL HERE"""
result = [wrapper_(arg) for arg in args_tmp]
"""In the original code there are plenty other things done in the code
using class' methods, they are not shown here for the example purpose"""
result = self.dummy_function_2(result)
"""original function is different"""
def generate_array_elements(self, nu1, nu2, llp, innt, nu1exp, nu2exp, ii = 0, jj = 0, rr=0):
if rr == 1 and self.inshape[0] - 1 - jj < ii:
return 0
elif rr == -1 and ii > jj:
return 0
elif rr == 0:
"""do nothing"""
ll1 = []
ll2 = []
"""In the original code, there are plenty other things done in the code
using class' methods, they are not shown here for the example purpose"""
self.dummy_function_3()
for kk, ll in enumerate(llp):
ll1.append(
self.__get_pp(nu1) *
nu1*nu2*nu1exp**ll['a']*np.exp(1j*np.random.random())
)
ll2.append(
self.__get_pp(nu2) *
nu1*nu2*nu2exp**ll['b']*np.exp(1j*np.random.random())
)
t1 = sum(ll1)
t2 = sum(ll2)
result = innt*np.abs(t1 - t2)
return result
g = tmp(False, 256, 300)
g.generate()
Thank you Ian, again.

Related

Parallelization within a python object

I am working on a simulation where I need to compute an expensive numerical integral at many different time points. Each integrand is a function of the time it is sampling up to, so I must evaluate each of the points independently. Because each integral is independent of all others, this can be implemented in an embarrassingly parallel fashion.
I would like to run this on an HPC cluster, so I have attempted to parallelize this process using mpi4py; however, my current implementation causes each processor to do the entire calculation (including the scattering to other cores) rather than have only the for loop inside of the object parallelized. As written, with n cores this takes n times as long as with one core (not a good sign...).
Because the only step which takes any amount of time is the computation itself, I would like everything except that specific for loop to run on the root node.
Below is a pseudo-code reduction of my current implementation:
import numpy as np
from mpi4py import MPI
COMM = MPI.COMM_WORLD
class Integrand:
def __init__(self, t_max, dt, **kwargs):
self.t_max = t_max
self.dt = dt
self.time_sample = np.arange(0, self.t_max, self.dt)
self.function_args = kwargs
self.final_result = np.empty_like(self.time_sample)
def do_integration(self):
if COMM.rank == 0:
times_partitioned = split(self.time_sample, COMM.size)
else:
times_partitioned = None
times_partitioned = COMM.scatter(times_partitioned, root=0)
results = np.empty(times_partitioned.shape, dtype=complex)
for counter, t in enumerate(times_partitioned):
results = computation(self, t, **self.function_args)
results = MPI.COMM_WORLD.gather(results, root=0)
if COMM.rank is 0:
##inter-leaf back together
for i in range(COMM.size):
self.final_result[i::COMM.size] = results[i]
if __name__ = '__main__':
kwargs_set = [kwargs1, kwargs2, kwargs3, ..., kwargsN]
for kwargs in kwargs_set:
integrand_object = Integrand(**kwargs)
integrand_object.do_integration()
save_and_plot_results(integrand_object.final_result)

A simple way to parallelize this problem without drastically changing how the class is called/used is to make use of a decorator. The decorator (shown below) makes it so that rather than creating the same object on every core, each core creates an object with the chunk of the time steps it needs to evaluate. After they have all been evaluated it gathers their results and returns a single object with the full result to one core. This particular implementation changes the class functionality slightly by forcing evaluation of the integral at creation time.
from functools import wraps
import numpy as np
from mpi4py import MPI
COMM = MPI.COMM_WORLD
def parallelize_integrand(integral_class):
def split(container, count):
return [container[_i::count] for _i in range(count)]
#wraps(integral_class)
def wrapper(*args,**kwargs):
int_object = integral_class(*args, **kwargs)
time_sample_total = int_object.time_sample
if COMM.rank is 0:
split_time = split(time_sample_total,COMM.size)
final_result = np.empty_like(int_object.result)
else:
split_time = None
split_time = COMM.scatter(split_time, root=0)
int_object.time_sample = split_time
int_object.do_integration()
result = int_object.result
result = COMM.gather(result, root=0)
if COMM.rank is 0:
for i in range(COMM.size):
final_result[i::COMM.size] = result[i]
int_object.time_sample = time_sample_total
int_object.result = final_result
return int_object
#parallelize_integrand
class Integrand:
def __init__(self, t_max, dt, **kwargs):
self.t_max = t_max
self.dt = dt
self.time_sample = np.arange(0, self.t_max, self.dt)
self.kwargs = kwargs
self.result = np.empty_like(self.time_sample)
def do_integration(self):
for counter, t in enumerate(self.time_sample):
result[counter] = computation(self, t, **self.kwargs)
if __name__ = '__main__':
kwargs_set = [kwargs1, kwargs2, kwargs3, ..., kwargsN]
for kwargs in kwargs_set:
integrand_object = Integrand(**kwargs)
save_and_plot_results(integrand_object.result)

pyspark cache values in a spark worker

I am writing a python library that will be called by a pyspark code. As part of this library there is a slow function.
I would like to cache the results of this function so that a table is kept in memory. (At least in each worker).
For example:
def slow_function(x):
time.sleep(10)
return x*2
class CacheSlowFunction():
def __init__(self):
self.values = {}
def slow_function(x):
if x in self.values:
return self.values[x]
else:
res = slow_function(x)
self.values[x] = res
return res
def main(x):
csf = CacheSlowFunction()
s = 0
for i in range(x):
s += csf.slow_function(i)
return s
and the code is called from spark with something like:
map(main, [i for i in range(10000)])
Now the code will create a table (self.values) for each call. Is it possible to have this table at least shared across computations done on the same worker?

Multiprocessing pool: How to call an arbitrary list of methods on a list of class objects

A cleaned up version of the code including the solution to the problem (thanks #JohanL!) can be found as a Gist on GitHub.
The following code snipped (CPython 3.[4,5,6]) illustrates my intention (as well as my problem):
from functools import partial
import multiprocessing
from pprint import pprint as pp
NUM_CORES = multiprocessing.cpu_count()
class some_class:
some_dict = {'some_key': None, 'some_other_key': None}
def some_routine(self):
self.some_dict.update({'some_key': 'some_value'})
def some_other_routine(self):
self.some_dict.update({'some_other_key': 77})
def run_routines_on_objects_in_parallel_and_return(in_object_list, routine_list):
func_handle = partial(__run_routines_on_object_and_return__, routine_list)
with multiprocessing.Pool(processes = NUM_CORES) as p:
out_object_list = list(p.imap_unordered(
func_handle,
(in_object for in_object in in_object_list)
))
return out_object_list
def __run_routines_on_object_and_return__(routine_list, in_object):
for routine_name in routine_list:
getattr(in_object, routine_name)()
return in_object
object_list = [some_class() for item in range(20)]
pp([item.some_dict for item in object_list])
new_object_list = run_routines_on_objects_in_parallel_and_return(
object_list,
['some_routine', 'some_other_routine']
)
pp([item.some_dict for item in new_object_list])
verification_object_list = [
__run_routines_on_object_and_return__(
['some_routine', 'some_other_routine'],
item
) for item in object_list
]
pp([item.some_dict for item in verification_object_list])
I am working with a list of objects of type some_class. some_class has a property, a dictionary, named some_dict and a few methods, which can modify the dict (some_routine and some_other_routine). Sometimes, I want to call a sequence of methods on all the objects in the list. Because this is computationally intensive, I intend to distribute the objects over multiple CPU cores (using multiprocessing.Pool and imap_unordered - the list order does not matter).
The routine __run_routines_on_object_and_return__ takes care of calling the list of methods on one individual object. From what I can tell, this is working just fine. I am using functools.partial for simplifying the structure of the code a bit - the multiprocessing pool therefore has to handle the list of objects as an input parameter only.
The problem is ... it does not work. The objects contained in the list returned by imap_unordered are identical to the objects I fed into it. The dictionaries within the objects look just like before. I have used similar mechanisms for working on lists of dictionaries directly without a glitch, so I somehow suspect that there is something wrong with modifying an object property which happens to be a dictionary.
In my example, verification_object_list contains the correct result (though it is generated in a single process/thread). new_object_list is identical to object_list, which should not be the case.
What am I doing wrong?
EDIT
I found the following question, which has an actually working and applicable answer. I modified it a bit following my idea of calling a list of methods on every object and it works:
import random
from multiprocessing import Pool, Manager
class Tester(object):
def __init__(self, num=0.0, name='none'):
self.num = num
self.name = name
def modify_me(self):
self.num += random.normalvariate(mu=0, sigma=1)
self.name = 'pla' + str(int(self.num * 100))
def __repr__(self):
return '%s(%r, %r)' % (self.__class__.__name__, self.num, self.name)
def init(L):
global tests
tests = L
def modify(i_t_nn):
i, t, nn = i_t_nn
for method_name in nn:
getattr(t, method_name)()
tests[i] = t # copy back
return i
def main():
num_processes = num = 10 #note: num_processes and num may differ
manager = Manager()
tests = manager.list([Tester(num=i) for i in range(num)])
print(tests[:2])
args = ((i, t, ['modify_me']) for i, t in enumerate(tests))
pool = Pool(processes=num_processes, initializer=init, initargs=(tests,))
for i in pool.imap_unordered(modify, args):
print("done %d" % i)
pool.close()
pool.join()
print(tests[:2])
if __name__ == '__main__':
main()
Now, I went a bit further and introduced my original some_class into the game, which contains a the described dictionary property some_dict. It does NOT work:
import random
from multiprocessing import Pool, Manager
from pprint import pformat as pf
class some_class:
some_dict = {'some_key': None, 'some_other_key': None}
def some_routine(self):
self.some_dict.update({'some_key': 'some_value'})
def some_other_routine(self):
self.some_dict.update({'some_other_key': 77})
def __repr__(self):
return pf(self.some_dict)
def init(L):
global tests
tests = L
def modify(i_t_nn):
i, t, nn = i_t_nn
for method_name in nn:
getattr(t, method_name)()
tests[i] = t # copy back
return i
def main():
num_processes = num = 10 #note: num_processes and num may differ
manager = Manager()
tests = manager.list([some_class() for i in range(num)])
print(tests[:2])
args = ((i, t, ['some_routine', 'some_other_routine']) for i, t in enumerate(tests))
pool = Pool(processes=num_processes, initializer=init, initargs=(tests,))
for i in pool.imap_unordered(modify, args):
print("done %d" % i)
pool.close()
pool.join()
print(tests[:2])
if __name__ == '__main__':
main()
The diff between working and not working is really small, but I still do not get it:
diff --git a/test.py b/test.py
index b12eb56..0aa6def 100644
--- a/test.py
+++ b/test.py
## -1,15 +1,15 ##
import random
from multiprocessing import Pool, Manager
+from pprint import pformat as pf
-class Tester(object):
- def __init__(self, num=0.0, name='none'):
- self.num = num
- self.name = name
- def modify_me(self):
- self.num += random.normalvariate(mu=0, sigma=1)
- self.name = 'pla' + str(int(self.num * 100))
+class some_class:
+ some_dict = {'some_key': None, 'some_other_key': None}
+ def some_routine(self):
+ self.some_dict.update({'some_key': 'some_value'})
+ def some_other_routine(self):
+ self.some_dict.update({'some_other_key': 77})
def __repr__(self):
- return '%s(%r, %r)' % (self.__class__.__name__, self.num, self.name)
+ return pf(self.some_dict)
def init(L):
global tests
## -25,10 +25,10 ## def modify(i_t_nn):
def main():
num_processes = num = 10 #note: num_processes and num may differ
manager = Manager()
- tests = manager.list([Tester(num=i) for i in range(num)])
+ tests = manager.list([some_class() for i in range(num)])
print(tests[:2])
- args = ((i, t, ['modify_me']) for i, t in enumerate(tests))
+ args = ((i, t, ['some_routine', 'some_other_routine']) for i, t in enumerate(tests))
What is happening here?

Your problem is due to two things; namely that you are using a class variable and that you are running your code in different processes.
Since different processes do not share memory, all objects and parameters must be pickled and sent from the original process to the process that executes it. When the parameter is an object, its class is not sent with it. Instead the receiving process uses its own blueprint (i.e. class).
In your current code, you pass the object as a parameter, update it and return it. However, the updates are not made to the object, but rather to the class itself, since you are updating a class variable. However, this update is not sent back to your main process, and therefore you are left with your not updated class.
What you want to do, is to make some_dict a part of your object, rather than of your class. This is easily done by an __init__() method. Thus modify some_class as:
class some_class:
def __init__(self):
self.some_dict = {'some_key': None, 'some_other_key': None}
def some_routine(self):
self.some_dict.update({'some_key': 'some_value'})
def some_other_routine(self):
self.some_dict.update({'some_other_key': 77})
This will make your program work as you intend it to. You almost always want to setup your object in an __init__() call, rather than as class variables, since in the latter case the data will be shared between all instances (and can be updated by all). That is not normally what you want, when you encapsulate data and state in an object of a class.
EDIT: It seems I was mistaken in whether the class is sent with the pickled object. After further inspection of what happens, I think also the class itself, with its class variables are pickled. Since, if the class variable is updated before sending the object to the new process, the updated value is available. However it is still the case that the updates done in the new process are not relayed back to the original class.

UnitTesting questions to check my coding

My assignment is to test my Statslist for reasonable inputs that might go into using my StatsList class. I'm very confused on how to make unittests for these questions provided. I've correctly done the first one in class. The questions are:
What is the count, mean, median, and mode of an empty list (a StatsList that nothing's been appended to)?
What is the count, mean, median, and mode of a list with one value?
What is the count, mean, median, and mode of a list with two values?
If values are inserted in the wrong order (as in the example above), does the median still work?
If values are inserted in order, does the median still work?
If there are multiple values that all appear the same number of times, what will the mode be?
Coding:
import unittest
import statslist
class StatsTest(unittest.TestCase):
def test_Append(self):
sl = statslist.StatsList()
self.assertEqual(0, sl.count())
sl.append(10)
self.assertEqual(1, sl.count())
def test_OneValue(self):
def test_Mean(self):
self.assert
def test_Median(self):
def test_Mode(self):
if __name__ == '__main__':
unittest.main()
Coding for StatsList:
class StatsList:
def __init__(self):
self.sum = 0
self.nums = []
def append(self, number):
self.nums.append(number)
def count(self):
count = len(self.nums)
return count
def mean(self):
for num in self.nums:
self.sum = self.sum + num
return self.sum /len(self.nums)
def median(self):
self.nums.sort()
midPos = self.count() // 2
if self.count() % 2 == 0:
median = (nums[midPos] + nums[midPos-1]) / 2.0
else:
median = self.nums[midPos]
return median
def mode(self):
counts= {}
for num in self.nums:
counts[num] = counts.get(num,0) + 1
mode = max(counts, key = counts.get)
return mode
def byFreq(pair):
return pair[1]
def main():
l = StatsList()
l.append(1)
l.append(11)
l.append(3)
l.append(1)
l.append(4)
print("Count:", l.count()) # should print 5
print("Mean:", l.mean()) # should print 4.0
print("Median:", l.median()) # should print 3
print("Mode:", l.mode()) # should print 1
if __name__ == '__main__':
main()

You can write something like this:
import unittest
import statslist
class StatsTest(unittest.TestCase):
def test_append(self):
sl = statslist.StatsList()
self.assertEqual(0, sl.count())
sl.append(10)
self.assertEqual(1, sl.count())
def test_one_value(self):
# given
sl = statslist.StatsList()
# when
sl.append(10)
# then
self.assertEqual(1, sl.count())
self.assertEqual(10, sl.mean())
self.assertEqual(10, sl.median())
self.assertEqual(10, sl.mode())
def test_two_values(self):
# given
sl = statslist.StatsList()
# when
sl.append(10)
sl.append(11)
# then
self.assertEqual(1, sl.count())
self.assertEqual(10, sl.mean())
self.assertEqual(10, sl.median())
self.assertEqual(10, sl.mode())
def test_median_wrong_order(self):
# given
sl = statslist.StatsList()
# when
sl.append(12)
sl.append(13)
sl.append(11)
# then
self.assertEqual(12, sl.median())
def test_median_in_order(self):
# given
sl = statslist.StatsList()
# when
sl.append(11)
sl.append(12)
sl.append(13)
# then
self.assertEqual(12, sl.median())
def test_mode_with_multiple_vals_same_num_of_times(self):
# given
sl = statslist.StatsList()
# when
sl.append(11)
sl.append(11)
sl.append(12)
sl.append(12)
sl.append(13)
# then
self.assertEqual(11, sl.mode())
The idea of unit tests is to make sure your code actually works the way it is supposed to. It is a great way to discover bugs early and to prevent you having to spend countless hours debugging that weird bug that's just happened in production.
Your unit test should cover all (or most) edge cases. This brings an additional advantage: it automatically documents your code and helps others refactor your code later because they can just run the unit tests and if there is an error after refactoring, that probably means they've done something wrong.
Depending on your needs, you could improve your code to automatically track the stats when an element is added. This would make the mean(), median(), count() and mode() execute with O(1) complexity, however depending on the algorithms used it might slow down the append() method.

Python decorator with multiprocessing fails

I would like to use a decorator on a function that I will subsequently pass to a multiprocessing pool. However, the code fails with "PicklingError: Can't pickle : attribute lookup __builtin__.function failed". I don't quite see why it fails here. I feel certain that it's something simple, but I can't find it. Below is a minimal "working" example. I thought that using the functools function would be enough to let this work.
If I comment out the function decoration, it works without an issue. What is it about multiprocessing that I'm misunderstanding here? Is there any way to make this work?
Edit: After adding both a callable class decorator and a function decorator, it turns out that the function decorator works as expected. The callable class decorator continues to fail. What is it about the callable class version that keeps it from being pickled?
import random
import multiprocessing
import functools
class my_decorator_class(object):
def __init__(self, target):
self.target = target
try:
functools.update_wrapper(self, target)
except:
pass
def __call__(self, elements):
f = []
for element in elements:
f.append(self.target([element])[0])
return f
def my_decorator_function(target):
#functools.wraps(target)
def inner(elements):
f = []
for element in elements:
f.append(target([element])[0])
return f
return inner
#my_decorator_function
def my_func(elements):
f = []
for element in elements:
f.append(sum(element))
return f
if __name__ == '__main__':
elements = [[random.randint(0, 9) for _ in range(5)] for _ in range(10)]
pool = multiprocessing.Pool(processes=4)
results = [pool.apply_async(my_func, ([e],)) for e in elements]
pool.close()
f = [r.get()[0] for r in results]
print(f)

The problem is that pickle needs to have some way to reassemble everything that you pickle. See here for a list of what can be pickled:
http://docs.python.org/library/pickle.html#what-can-be-pickled-and-unpickled
When pickling my_func, the following components need to be pickled:
An instance of my_decorator_class, called my_func.
This is fine. Pickle will store the name of the class and pickle its __dict__ contents. When unpickling, it uses the name to find the class, then creates an instance and fills in the __dict__ contents. However, the __dict__ contents present a problem...
The instance of the original my_func that's stored in my_func.target.
This isn't so good. It's a function at the top-level, and normally these can be pickled. Pickle will store the name of the function. The problem, however, is that the name "my_func" is no longer bound to the undecorated function, it's bound to the decorated function. This means that pickle won't be able to look up the undecorated function to recreate the object. Sadly, pickle doesn't have any way to know that object it's trying to pickle can always be found under the name __main__.my_func.
You can change it like this and it will work:
import random
import multiprocessing
import functools
class my_decorator(object):
def __init__(self, target):
self.target = target
try:
functools.update_wrapper(self, target)
except:
pass
def __call__(self, candidates, args):
f = []
for candidate in candidates:
f.append(self.target([candidate], args)[0])
return f
def old_my_func(candidates, args):
f = []
for c in candidates:
f.append(sum(c))
return f
my_func = my_decorator(old_my_func)
if __name__ == '__main__':
candidates = [[random.randint(0, 9) for _ in range(5)] for _ in range(10)]
pool = multiprocessing.Pool(processes=4)
results = [pool.apply_async(my_func, ([c], {})) for c in candidates]
pool.close()
f = [r.get()[0] for r in results]
print(f)
You have observed that the decorator function works when the class does not. I believe this is because functools.wraps modifies the decorated function so that it has the name and other properties of the function it wraps. As far as the pickle module can tell, it is indistinguishable from a normal top-level function, so it pickles it by storing its name. Upon unpickling, the name is bound to the decorated function so everything works out.

I also had some problem using decorators in multiprocessing. I'm not sure if it's the same problem as yours:
My code looked like this:
from multiprocessing import Pool
def decorate_func(f):
def _decorate_func(*args, **kwargs):
print "I'm decorating"
return f(*args, **kwargs)
return _decorate_func
#decorate_func
def actual_func(x):
return x ** 2
my_swimming_pool = Pool()
result = my_swimming_pool.apply_async(actual_func,(2,))
print result.get()
and when I run the code I get this:
Traceback (most recent call last):
File "test.py", line 15, in <module>
print result.get()
File "somedirectory_too_lengthy_to_put_here/lib/python2.7/multiprocessing/pool.py", line 572, in get
raise self._value
cPickle.PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
I fixed it by defining a new function to wrap the function in the decorator function, instead of using the decorator syntax
from multiprocessing import Pool
def decorate_func(f):
def _decorate_func(*args, **kwargs):
print "I'm decorating"
return f(*args, **kwargs)
return _decorate_func
def actual_func(x):
return x ** 2
def wrapped_func(*args, **kwargs):
return decorate_func(actual_func)(*args, **kwargs)
my_swimming_pool = Pool()
result = my_swimming_pool.apply_async(wrapped_func,(2,))
print result.get()
The code ran perfectly and I got:
I'm decorating
4
I'm not very experienced at Python, but this solution solved my problem for me

If you want the decorators too bad (like me), you can also use the exec() command on the function string, to circumvent the mentioned pickling.
I wanted to be able to pass all the arguments to an original function and then use them successively. The following is my code for it.
At first, I made a make_functext() function to convert the target function object to a string. For that, I used the getsource() function from the inspect module (see doctumentation here and note that it can't retrieve source code from compiled code etc.). Here it is:
from inspect import getsource
def make_functext(func):
ft = '\n'.join(getsource(func).split('\n')[1:]) # Removing the decorator, of course
ft = ft.replace(func.__name__, 'func') # Making function callable with 'func'
ft = ft.replace('#§ ', '').replace('#§', '') # For using commented code starting with '#§'
ft = ft.strip() # In case the function code was indented
return ft
It is used in the following _worker() function that will be the target of the processes:
def _worker(functext, args):
scope = {} # This is needed to keep executed definitions
exec(functext, scope)
scope['func'](args) # Using func from scope
And finally, here's my decorator:
from multiprocessing import Process
def parallel(num_processes, **kwargs):
def parallel_decorator(func, num_processes=num_processes):
functext = make_functext(func)
print('This is the parallelized function:\n', functext)
def function_wrapper(funcargs, num_processes=num_processes):
workers = []
print('Launching processes...')
for k in range(num_processes):
p = Process(target=_worker, args=(functext, funcargs[k])) # use args here
p.start()
workers.append(p)
return function_wrapper
return parallel_decorator
The code can finally be used by defining a function like this:
#parallel(4)
def hello(args):
#§ from time import sleep # use '#§' to avoid unnecessary (re)imports in main program
name, seconds = tuple(args) # unpack args-list here
sleep(seconds)
print('Hi', name)
... which can now be called like this:
hello([['Marty', 0.5],
['Catherine', 0.9],
['Tyler', 0.7],
['Pavel', 0.3]])
... which outputs:
This is the parallelized function:
def func(args):
from time import sleep
name, seconds = tuple(args)
sleep(seconds)
print('Hi', name)
Launching processes...
Hi Pavel
Hi Marty
Hi Tyler
Hi Catherine
Thanks for reading, this is my very first post. If you find any mistakes or bad practices, feel free to leave a comment. I know that these string conversions are quite dirty, though...

If you use this code for your decorator:
import multiprocessing
from types import MethodType
DEFAULT_POOL = []
def run_parallel(_func=None, *, name: str = None, context_pool: list = DEFAULT_POOL):
class RunParallel:
def __init__(self, func):
self.func = func
def __call__(self, *args, **kwargs):
process = multiprocessing.Process(target=self.func, name=name, args=args, kwargs=kwargs)
context_pool.append(process)
process.start()
def __get__(self, instance, owner):
return self if instance is None else MethodType(self, instance)
if _func is None:
return RunParallel
else:
return RunParallel(_func)
def wait_context(context_pool: list = DEFAULT_POOL, kill_others_if_one_fails: bool = False):
finished = []
for process in context_pool:
process.join()
finished.append(process)
if kill_others_if_one_fails and process.exitcode != 0:
break
if kill_others_if_one_fails:
# kill unfinished processes
for process in context_pool:
if process not in finished:
process.kill()
# wait for every process to be dead
for process in context_pool:
process.join()
Then you can use it like this, in these 4 examples:
#run_parallel
def m1(a, b="b"):
print(f"m1 -- {a=} {b=}")
#run_parallel(name="mym2", context_pool=DEFAULT_POOL)
def m2(d, cc="cc"):
print(f"m2 -- {d} {cc=}")
a = 1/0
class M:
#run_parallel
def c3(self, k, n="n"):
print(f"c3 -- {k=} {n=}")
#run_parallel(name="Mc4", context_pool=DEFAULT_POOL)
def c4(self, x, y="y"):
print(f"c4 -- {x=} {y=}")
if __name__ == "__main__":
m1(11)
m2(22)
M().c3(33)
M().c4(44)
wait_context(kill_others_if_one_fails=True)
The output will be:
m1 -- a=11 b='b'
m2 -- 22 cc='cc'
c3 -- k=33 n='n'
(followed by the exception raised in method m2)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Why does python multiprocessing script slow down after a while? - python

Related

Parallelization within a python object

pyspark cache values in a spark worker

Multiprocessing pool: How to call an arbitrary list of methods on a list of class objects

UnitTesting questions to check my coding

Python decorator with multiprocessing fails

Categories

Resources