Multiprocessing for a range of loops in Python?

I have a very big array to create, with over 10^7 columns, that needs to be filtered/modified depending on some criteria. There is a set of 24 different criteria (2x4x3 due to the combinations), which means the filtering/modification needs to be done 24 times, with each result saved in a different specified directory.
Since this takes a very long time, I am looking into using multiprocessing to speed up the process. Can anyone help me out? Here is an example of the code:
import itertools
import numpy as np

sample_size = 1000000
variables = 25
x_array = np.random.rand(variables, sample_size)

x_dir = ['x1', 'x2']
y_dir = ['y1', 'y2', 'y3', 'y4']
z_dir = ['z1', 'z2', 'z3']
x_directories = [0, 1]
y_directories = [0, 1, 2, 3]
z_directories = [0, 1, 2]

directory_combinations = itertools.product(x_directories, y_directories, z_directories)

for k, t, h in directory_combinations:
    target_dir = main_dir + '/' + x_dir[k] + '/' + y_dir[t] + '/' + z_dir[h]
    for i in range(sample_size):
        # x_array gets filtered/modified
        # x_array gets saved in target_dir directory as a dataframe after modification
Basically, with multiprocessing I am hoping either for each loop to be handled by a single core out of the 16 I have available, or for each loop iteration to be sped up by using all 16 cores.
Many thanks in advance!

Take the loop and rewrite its body as a function. The loop

for k, t, h in directory_combinations:

becomes, for example:

def func(k, t, h):
    ...

pool = multiprocessing.Pool(12)
pool.starmap_async(func, directory_combinations, 32)

This spawns 12 processes that apply func to each tuple of three arguments; the data is handed to the processes in chunks of 32 tuples.
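As a fuller illustration, here is a minimal sketch of that idea. It assumes main_dir, the *_dir lists, and x_array from the question are defined at module level (so worker processes can see them), and that the actual filtering/saving goes inside func. Since starmap_async returns immediately, the sketch waits on the result before the pool is torn down:

import itertools
import multiprocessing

def func(k, t, h):
    target_dir = main_dir + '/' + x_dir[k] + '/' + y_dir[t] + '/' + z_dir[h]
    # ... filter/modify a copy of x_array and save it to target_dir ...

if __name__ == '__main__':
    combos = list(itertools.product(x_directories, y_directories, z_directories))
    with multiprocessing.Pool(12) as pool:
        result = pool.starmap_async(func, combos, chunksize=32)
        result.wait()  # block until all 24 target directories have been written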

The following code first creates x_array in shared memory and initializes each process in the pool with a global variable x_array that refers to this shared array.
I would move the code that makes a copy of this global x_array, processes it, and then writes out the dataframe into a function, worker, which is passed the target directory as an argument.
import itertools
import numpy as np
import ctypes
import multiprocessing as mp

SAMPLE_SIZE = 1000000
VARIABLES = 25

def to_numpy_array(shared_array, shape):
    '''Create a numpy array backed by a shared memory Array.'''
    arr = np.ctypeslib.as_array(shared_array)
    return arr.reshape(shape)

def to_shared_array(arr, ctype):
    shared_array = mp.Array(ctype, arr.size, lock=False)
    temp = np.frombuffer(shared_array, dtype=arr.dtype)
    temp[:] = arr.flatten(order='C')
    return shared_array

def init_pool(shared_array, shape):
    global x_array
    # Recreate x_array using the shared memory array:
    x_array = to_numpy_array(shared_array, shape)

def worker(target_dir):
    # make a copy of x_array with np.copy
    x_array_copy = np.copy(x_array)
    for i in range(SAMPLE_SIZE):
        # x_array_copy gets filtered/modified
        ...
    # x_array_copy gets saved in target_dir directory as a dataframe after modification

def main():
    main_dir = '.'  # for example
    x_dir = ['x1', 'x2']
    y_dir = ['y1', 'y2', 'y3', 'y4']
    z_dir = ['z1', 'z2', 'z3']
    x_directories = [0, 1]
    y_directories = [0, 1, 2, 3]
    z_directories = [0, 1, 2]
    directory_combinations = itertools.product(x_directories, y_directories, z_directories)
    target_dirs = [main_dir + '/' + x_dir[k] + '/' + y_dir[t] + '/' + z_dir[h] for k, t, h in directory_combinations]
    x_array = np.random.rand(VARIABLES, SAMPLE_SIZE)
    shape = x_array.shape
    # Create the array in shared memory; c_double matches the float values from np.random.rand:
    shared_array = to_shared_array(x_array, ctypes.c_double)
    # Recreate x_array using the shared memory array as the base:
    x_array = to_numpy_array(shared_array, shape)
    # Create a pool of 12 processes, sharing the array with each process:
    pool = mp.Pool(12, initializer=init_pool, initargs=(shared_array, shape))
    pool.map(worker, target_dirs)

# This is required for Windows:
if __name__ == '__main__':
    main()

Related

Parallelizing with itertools and numba

I've been working on a project for a while now that requires calculating some very large datasets, and very quickly have moved beyond anything that my meager Excel knowledge could handle. In the last few days I've started learning Python, which has helped with handling the size of data I'm dealing with, but the estimated processing time for these datasets is looking to be incredibly long (possibly a couple hundred years on my laptop).
The bottleneck here is an equation that could produce trillions or quadrillions of results, since it is calculating every combination of 6 different lists and running it through an equation that you'll see in the code. The code works just fine as is, but it isn't feasible for datasets larger than the example I included. A real dataset would be something more like Set1S, 2S, and 3S being 50 items each, and Set12A... being about 2500 items each (50x50 in this case; these sets always have a length equal to the square of the first 3 lists, but I'm keeping things short and simple here).
I'm well aware that the amount of results is absolutely huge, but want to start with as large a dataset as I can, so I can see how much I can reduce the input sizes without greatly impacting the results when I plot a cumulative% histogram.
# Calculator
import numpy as np

Set1S = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15])
Set2S = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15])
Set3S = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15])
Set12A = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15])
Set23A = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15])
Set13A = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15])

# Define an empty array to add results
BlockVol = []

from itertools import product

# itertools iterates through all combinations of lists
for i, j, k, a, b, c in product(Set1S, Set2S, Set3S, Set12A, Set23A, Set13A):
    # This is the bottleneck equation, with large input datasets
    BlockVol.append(abs(i*j*k*np.sin(a)*np.sin(b)*np.sin(c)))

arr = np.array(BlockVol)

# manipulate the result list a couple ways
BlockVol = np.cbrt(BlockVol)
BlockVol = BlockVol*12

# quick check of the size of the results list
len(BlockVol)
This took me about 3 minutes or so for 11.3M results, just from eyeballing the clock.
I've learned about @njit and prange in the last day or so, but am a bit stuck trying to translate my work into this format. I do have a desktop PC with a pretty good GPU, so I think I could speed things up by a lot. I'm well aware that the code below is a big garbage fire that doesn't do anything, but I'm hoping that I'm at least getting the point across on what I'm trying to do.
It seems that the way to go is to define a function with my 6 input lists, but I'm just not sure how to fuse the itertools product and njit together.
import numpy as np
from itertools import product
from numba import njit, prange

@njit(parallel = True)
def BlockVolCalc(Set1S, Set2S, Set3S, Set12A, Set23A, Set13A):
    numRows = len(Set12A)
    BlockVol = np.zeros(numRows)
    for i, j, k, a, b, c in product(Set1S, Set2S, Set3S, Set12A, Set23A, Set13A):
        BlockVol.append(abs(i*j*k*np.sin(a)*np.sin(b)*np.sin(c)))
    arr = np.array(BlockVol)
    BlockVol = np.cbrt(BlockVol)
    BlockVol = BlockVol*12
    len(BlockVol)
Any help is much appreciated, as this is all very new and overwhelming.
Thank you!
I solved your task using NumPy code alone; it is usually nicer to use plain NumPy instead of the heavier Numba when possible, and the NumPy-only code below should be about as fast as an equivalent Numba solution.
My code is roughly 2800 times faster than your reference code; the timing is measured at the end of the code.
In the code below, the BlockValCalcRef(...) function is just your reference code organized as a function, and BlockVolCalc(...) is my NumPy-based function that should give a large speedup. At the end I call assert np.allclose(...) to check that both solutions give the same results.
I also simplified set creation a bit, using a single N parameter to generate the sets; in your real code you just provide the sets you need.
In order to solve the task I did several things:
Instead of computing np.sin(...) many times for the same values, I precomputed the sines just once for Set12A, Set23A, and Set13A. I also precomputed np.abs(...) for all sets.
To compute the cross product I used NumPy indexing of the form [None, None, :, None, None, None], which lets NumPy's broadcasting build the full product array (see the small illustration below).
I also have an idea for making the code about 6 times faster still, although even at the current speed you'll fill the whole RAM of your machine in a matter of seconds. Currently the cross product multiplies 6 numbers at every element; instead, one can compute the product of the first K - 1 sets and then multiply that array by the K-th set to get the product of K sets. This gives roughly a 6x speedup (because there are 6 sets), since you then need just one multiplication per step instead of 6.
Update: I've implemented a second, improved version of the function, BlockVolCalc2(...), following the paragraph above. It reaches a 2800x speedup; for larger N it will probably be even faster.
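As a small aside, here is a hedged illustration of the broadcasting trick mentioned above, using two short hypothetical sets a and b instead of the six from the question:

import numpy as np

a = np.array([1., 2., 3.])
b = np.array([10., 20.])
outer = a[:, None] * b[None, :]   # shape (3, 2): every pair a[i] * b[j]
print(outer.ravel())              # [10. 20. 20. 40. 30. 60.]

With six sets, the same None-indexing produces a 6-dimensional array holding every combination, which is then flattened with ravel(). The full solution follows: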
import numpy as np, time

N = 7
Set1S = np.arange(1, N + 1)
Set2S = np.arange(1, N + 1)
Set3S = np.arange(1, N + 1)
Set12A = np.arange(1, N + 1)
Set23A = np.arange(1, N + 1)
Set13A = np.arange(1, N + 1)

def BlockValCalcRef(Set1S, Set2S, Set3S, Set12A, Set23A, Set13A):
    BlockVol = []
    from itertools import product
    for i, j, k, a, b, c in product(Set1S, Set2S, Set3S, Set12A, Set23A, Set13A):
        BlockVol.append(abs(i*j*k*np.sin(a)*np.sin(b)*np.sin(c)))
    return np.array(BlockVol)

def BlockVolCalc(Set1S, Set2S, Set3S, Set12A, Set23A, Set13A):
    Set1S, Set2S, Set3S = np.abs(Set1S), np.abs(Set2S), np.abs(Set3S)
    Set12A, Set23A, Set13A = np.abs(np.sin(Set12A)), np.abs(np.sin(Set23A)), np.abs(np.sin(Set13A))
    return (
        Set1S[:, None, None, None, None, None] *
        Set2S[None, :, None, None, None, None] *
        Set3S[None, None, :, None, None, None] *
        Set12A[None, None, None, :, None, None] *
        Set23A[None, None, None, None, :, None] *
        Set13A[None, None, None, None, None, :]
    ).ravel()

def BlockVolCalc2(Set1S, Set2S, Set3S, Set12A, Set23A, Set13A):
    Set1S, Set2S, Set3S = np.abs(Set1S), np.abs(Set2S), np.abs(Set3S)
    Set12A, Set23A, Set13A = np.abs(np.sin(Set12A)), np.abs(np.sin(Set23A)), np.abs(np.sin(Set13A))
    prod = np.ones((1,), dtype = np.float32)
    for s in reversed([Set1S, Set2S, Set3S, Set12A, Set23A, Set13A]):
        prod = (s[:, None] * prod[None, :]).ravel()
    return prod

# -------- Testing Correctness and Time Measuring --------

tb = time.time()
a0 = BlockValCalcRef(Set1S, Set2S, Set3S, Set12A, Set23A, Set13A)
t0 = time.time() - tb
print(f'base time {round(t0, 4)} sec')

tb = time.time()
a1 = BlockVolCalc(Set1S, Set2S, Set3S, Set12A, Set23A, Set13A)
t1 = time.time() - tb
print(f'improved time {round(t1, 4)} sec, speedup {round(t0 / t1, 2)}x')

tb = time.time()
a2 = BlockVolCalc2(Set1S, Set2S, Set3S, Set12A, Set23A, Set13A)
t2 = time.time() - tb
print(f'improved2 time {round(t2, 4)} sec, speedup {round(t0 / t2, 2)}x')

assert np.allclose(a0, a1)
assert np.allclose(a0, a2)
Output:
base time 2.7569 sec
improved time 0.0015 sec, speedup 1834.83x
improved2 time 0.001 sec, speedup 2755.09x
My function embedded into your initial code will look like the linked example.
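Since that link isn't reproduced here, a minimal sketch of what the embedding could look like, reusing the names from the question and the BlockVolCalc2 function above, is:

# Hypothetical embedding sketch: replace the itertools.product loop from the
# question with a single call to BlockVolCalc2, then keep the post-processing.
BlockVol = BlockVolCalc2(Set1S, Set2S, Set3S, Set12A, Set23A, Set13A)
BlockVol = np.cbrt(BlockVol)
BlockVol = BlockVol * 12
print(len(BlockVol))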
I also created a TensorFlow-based variant of the code, which will use all of your CPU cores and the GPU. It needs TensorFlow installed once via python -m pip install --upgrade numpy tensorflow:
import numpy as np

N = 18
Set1S, Set2S, Set3S, Set12A, Set23A, Set13A = [np.arange(1 + i, N + 1 + i) for i in range(6)]
dtype = np.float32

def Prepare(Set1S, Set2S, Set3S, Set12A, Set23A, Set13A):
    import numpy as np
    Set12A, Set23A, Set13A = np.sin(Set12A), np.sin(Set23A), np.sin(Set13A)
    return [np.abs(s).astype(dtype) for s in [Set1S, Set2S, Set3S, Set12A, Set23A, Set13A]]

sets = Prepare(Set1S, Set2S, Set3S, Set12A, Set23A, Set13A)

def ProcessNP(sets):
    import numpy as np
    res = np.ones((1,), dtype = dtype)
    for s in reversed(sets):
        res = (s[:, None] * res[None, :]).ravel()
    res = np.cbrt(res) * 12
    return res

def ProcessTF(sets, *, state = {}):
    if 'graph' not in state:
        import os
        os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
        import numpy as np, tensorflow as tf
        tf.compat.v1.disable_eager_execution()
        cpus = tf.config.list_logical_devices('CPU')
        #print(f"CPUs: {[e.name for e in cpus]}")
        gpus = tf.config.list_logical_devices('GPU')
        #print(f"GPUs: {[e.name for e in gpus]}")
        print(f"GPU: {len(gpus) > 0}")
        state['graph'] = tf.Graph()
        state['sess'] = tf.compat.v1.Session(graph = state['graph'])
        #tf.device(cpus[0].name if len(gpus) == 0 else gpus[0].name)
        with state['sess'].as_default(), state['graph'].as_default():
            res = tf.ones((1,), dtype = dtype)
            state['inp'] = []
            for s in reversed(sets):
                sph = tf.compat.v1.placeholder(dtype, s.shape)
                state['inp'].insert(0, sph)
                res = sph[:, None] * res[None, :]
                res = tf.reshape(res, (tf.size(res),))
            res = tf.math.pow(res, 1 / 3) * 12
            state['out'] = res
        def Run(sets):
            with state['sess'].as_default(), state['graph'].as_default():
                return tf.compat.v1.get_default_session().run(
                    state['out'], {ph: s for ph, s in zip(state['inp'], sets)}
                )
        state['run'] = Run
    return state['run'](sets)

# ------------ Testing ------------

npa, tfa = ProcessNP(sets), ProcessTF(sets)
assert np.allclose(npa, tfa)

from timeit import timeit
print('Nums:', round(npa.size / 10 ** 6, 3), 'M')
timeit_num = 2
print('NP:', round(timeit(lambda: ProcessNP(sets), number = timeit_num) / timeit_num, 3), 'sec')
print('TF:', round(timeit(lambda: ProcessTF(sets), number = timeit_num) / timeit_num, 3), 'sec')
On my 2-core CPU it prints:
GPU: False
Nums: 34.012 M
NP: 3.487 sec
TF: 1.185 sec

How to build a numpy matrix (from scratch, not existing before) adding calculated columns in a for loop

This is a school task (parallel normalization of each column of a matrix). Besides other problems you may see, I found it particularly difficult to find something as easy as a list = [] that you can list.append() entire lists to in a loop, without predefining dimensions.
Here is what I have so far with the line in question at the end. Thank you in advance for any help!
from multiprocessing import Pool
import numpy as np

def fct_norm(col):
    mn = col.min()
    mx = col.max()
    col_norm = np.zeros((6, 1))
    for i in range(6):
        col_norm[i, 0] = (col[i] - mn) / (mx - mn)
    return col_norm

if __name__ == "__main__":
    pool = Pool()
    arr = np.random.uniform(0, 100, size=(6, 3))
    # maybe predefine arr_norm here?
    for i in range(2):
        print("i = ", i)
        col = arr[:, i]
        result = pool.map(fct_norm, [col])
        norm_arr = HOW_TO_ADD_EACH_RESULT_COLUMN_TO_A_NEW_ARRAY?
The function you need to concatenate a number of columns is np.hstack. However, a big problem is that pool.map is not used correctly in the original code.
As written, there is no parallel execution over the columns, since each call to pool.map gets only a single column. The idea is to pass an iterable with several values at the same time - in this case, multiple columns - to pool.map.
Since iterating over a NumPy array yields rows rather than columns, the matrix must be transposed first (using the .T attribute). Also, after the pool is finished, it is good practice to close it. One way to handle this automatically is to use a context manager (i.e., the with Pool() as pool: construct), which closes the pool for you.
This all taken together gives the following solution:
from multiprocessing import Pool
import numpy as np

def fct_norm(col):
    mn = col.min()
    mx = col.max()
    col_norm = np.zeros((6, 1))
    for i in range(6):
        col_norm[i, 0] = (col[i] - mn) / (mx - mn)
    return col_norm

if __name__ == "__main__":
    arr = np.random.uniform(0, 100, size=(6, 3))
    with Pool() as pool:
        norm_arr = np.hstack(pool.map(fct_norm, arr.T))
    # Here norm_arr is available for further operations.
Thus, the whole operation can be performed in two lines.
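As a quick, hedged sanity check (assuming the 6x3 arr from above, placed where the comment says norm_arr is available), each column of norm_arr should now span the range 0 to 1:

print(norm_arr.shape)                              # (6, 3): one normalized column per input column
print(norm_arr.min(axis=0), norm_arr.max(axis=0))  # each column has min 0.0 and max 1.0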

Numba vectorize for function with no input

I want to parallelize a function using numba.vectorize, but my function doesn't take any input. Currently, I use a dummy array and dummy input for my function that is never used.
Is there a more elegant/fast way (possibly without using numba.vectorize)?
Code example (not my actual code, just a demonstration of how I discard the input):
import numpy as np
from numba import vectorize

@vectorize(["int32(int32)"], nopython=True)
def particle_path(discard_me):
    x = 0
    for _ in range(10):
        x += np.random.uniform(0, 1)
    return np.int32(x)

arr = particle_path(np.empty(1024, dtype=np.int32))
print(arr)
If you'll simply be dealing with 1D arrays, you can use the following approach, where the array is instantiated outside the function. There doesn't seem to be any reason to use vectorize here; you can achieve the goal with jit alone, although you do have to write the loop over the array elements explicitly:
import numpy as np
from numba import jit

@jit(nopython=True)
def particle_path(out):
    for i in range(len(out)):
        x = 0
        for _ in range(10):
            x += np.random.uniform(0, 1)
        out[i] = x

arr = np.empty(1024, dtype=np.int32)
particle_path(arr)
You can similarly deal with arrays of any dimension using the flat attribute (and make sure to use .size to get the total number of elements in the array):
@jit(nopython=True)
def particle_path(out):
    for i in range(out.size):
        x = 0
        for _ in range(10):
            x += np.random.uniform(0, 1)
        out.flat[i] = x

arr = np.empty(1024, dtype=np.int32)
particle_path(arr)
Finally, you can create the array inside your function if you need a new array each time you call it (use the versions above instead if you'll be calling the function repeatedly and want to overwrite the same array, saving the time of re-allocating it over and over again):
@jit(nopython=True)
def particle_path(num):
    out = np.empty(shape=num, dtype=np.int32)
    for i in range(num):
        x = 0
        for _ in range(10):
            x += np.random.uniform(0, 1)
        out[i] = x
    return out

arr = particle_path(1024)

Problems faced by Manually Implementing a FIFOQueue in tensorflow

I am trying to come up with a method that implements a FIFOQueue in TensorFlow. On every iteration, the goal is to assign a placeholder a certain number and then store it in a Variable named buffer. After each assignment, I increment an index. The buffer size is [5], so the index should range from 0 to 4. Finally, once the buffer is full, I set buffer[0:4] to buffer[1:5] and then put the new value into buffer[4]. Here is my code:
import tensorflow as tf
import numpy as np
import random

dim = 30
lst = []
for i in range(dim):
    lst.append(random.randint(1, 10))
data = np.reshape(lst, [dim, 1])
print(lst)

# create a buffer:
buffer_input = tf.placeholder(tf.int32, shape=[1])
buffer = tf.Variable(tf.zeros([5], tf.int32))
index = tf.Variable(tf.constant(0))

def fillBufferBeforeFilled():
    update_op1 = tf.scatter_update(buffer, indices=[index], updates=buffer_input)
    index_assign_add = tf.assign_add(index, 1)
    return update_op1, index_assign_add

def fillBufferAfterFilled():
    tmp = tf.slice(buffer, begin=[0], size=[4])
    update_op2 = tf.scatter_update(buffer, indices=[0, 1, 2, 3], updates=tmp)
    update_op3 = tf.scatter_update(buffer, indices=[index], updates=buffer_input)
    return update_op2, update_op3

cond = tf.cond(tf.equal(index, 4), lambda: fillBufferBeforeFilled(), lambda: fillBufferAfterFilled())

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(dim):
        cond_ = sess.run(cond, feed_dict={buffer_input: data[i]})
        buf = sess.run(buffer, feed_dict={buffer_input: data[i]})
        print('buf: ', buf)
Problem: The index Variable is not incremented after each call, while the first element of the buffer is being assigned to the value passed to the placeholder.
I would like to know why I'm getting this behavior and what is the solution to this problem.
any help is much appreciated!!
You've mixed up the order of the branches in tf.cond; it should be
cond = tf.cond(tf.equal(index, 4), lambda: fillBufferAfterFilled(), lambda: fillBufferBeforeFilled())
I can get your code running and it mostly works, but the updates aren't quite right; I suspect you'll need to add some tf.control_dependencies calls to force things to happen in the right order.
Here is the solution:
import tensorflow as tf
import numpy as np
import random

dim = 30
lst = []
for i in range(dim):
    lst.append(random.randint(1, 10))
data = np.reshape(lst, [dim, 1])
print(lst)

# create a buffer:
buffer_input = tf.placeholder(tf.int32, shape=[1])
buffer = tf.Variable(tf.zeros([5], tf.int32))
index = tf.Variable(-1, tf.int32)

def fillBufferBeforeFilled():
    index_assign_add = tf.assign_add(index, 1)
    with tf.control_dependencies([index_assign_add]):
        update_op1 = tf.scatter_update(buffer, indices=[index], updates=buffer_input)
    return update_op1, index_assign_add

def fillBufferAfterFilled():
    tmp = tf.slice(buffer, begin=[1], size=[4])
    update_op2 = tf.scatter_update(buffer, indices=[0, 1, 2, 3], updates=tmp)
    with tf.control_dependencies([update_op2]):
        update_op3 = tf.scatter_update(buffer, indices=[index], updates=buffer_input)
    return update_op2, update_op3

cond = tf.cond(tf.equal(index, 4), lambda: fillBufferAfterFilled(), lambda: fillBufferBeforeFilled())

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(dim):
        cond_ = sess.run(cond, feed_dict={buffer_input: data[i]})
        buf = sess.run(buffer, feed_dict={buffer_input: data[i]})
        print('buf: ', buf)

returning a two dimensional array by multiprocessing

In the following code, which is an example of my main code, I have tried to use pathos.multiprocessing to speed up the iterations of a loop. The output of each iteration, which is implemented with multiprocessing, is a 2-D array. I used pathos.multiprocessing instead of multiprocessing since I wanted to use it in a class method. I used the apipe method of pathos.multiprocessing to collect the output in a list, but it returns an empty list. I have no idea why it fails.
import numpy as np
import random
import pathos.multiprocessing as mp

class Testsystematics(object):
    def __init__(self, x, y, NTH = None, THMIN = None, THMAX = None, NRESAMPLE = None):
        self.x = x
        self.y = y
        self.nbins = NTH
        self.bmin = THMIN
        self.bmax = THMAX
        self.nresample = NRESAMPLE
        self.bins = np.linspace(self.bmin, self.bmax, self.nbins+1, True).astype(np.float)
        self.sample = np.array([[random.choice(range(len(self.y))) for _ in xrange(len(self.y))] for i in range(self.nresample)])
        self.result_list = []

    def log_result(self, result):
        self.result_list.append(result)

    def bootstrapping(self, k):
        xi_p = np.zeros(self.nbins, float)
        xi_m = np.zeros(self.nbins, float)
        nind = np.zeros(self.nbins, float)
        for i in range(len(self.x)):
            for j in range(len(self.x)):
                if (i != j):
                    sep = np.sqrt(self.x[i]**2 + self.x[j]**2)
                    index = np.searchsorted(self.bins, sep, side='right') - 1
                    sind = np.sin(sep)
                    if ((sep < self.bins[-1]) and (sep >= self.bins[0])):
                        xi_p[index] += sind*(np.mean(y) - np.median(y))
                        xi_m[index] += sind*np.std(y)
                        nind[index] += 1.0
        for i in range(self.nbins):
            xi_p[i] = xi_p[i]/nind[i]
            xi_m[i] = xi_m[i]/nind[i]
        return np.vstack((xi_p, xi_m))

    def twopcf(self):
        if (self.sys_type == 1):
            pool = mp.ProcessingPool(16)
            for n in range(self.nresample):
                pool.apipe(self.bootstrapping, args=(n,), callback=self.log_result)

shape, scale = 0.5, 0.6
x = np.random.gamma(shape, scale, 10000)
mu1, sigma1 = 0, 0.5  # mean and standard deviation
mu2, sigma2 = 0.1, 0.7  # mean and standard deviation
y = np.random.normal(mu1, sigma1, 1000) + np.random.normal(mu2, sigma2, 1000)
sysTest = Testsystematics(x, y, NTH = 10, THMIN = 0, THMAX = 5, NRESAMPLE = 100)
any suggestion?
I'm the pathos author. I tried your code, and it runs, but produces no error and produces no result in result_list. I believe that is because you are using apipe incorrectly. The correct use of apipe is as follows:
>>> import pathos
>>> def squared(x):
... return x**2
...
>>> pool = pathos.multiprocessing.ProcessingPool()
>>> res = pool.apipe(squared, 5)
>>> res.get()
25
self.bootstrapping takes self and k, so you have to provide a k in the apipe call when calling it as an instance method. There is no callback -- if you want a callback, you'd need to add one to your function.
Note that the return value is retrieved by (1) getting a return object, and (2) by calling get on the return object.
From your use of apipe within a for loop, I'd suggest you use pool.amap (or pool.imap) instead -- then the for loop itself can run in parallel.
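For instance, a minimal sketch (assuming the Testsystematics class above, and dropping the sys_type check and the callback) of how twopcf could collect all the bootstrap results with amap:

    def twopcf(self):
        pool = mp.ProcessingPool(16)
        # run all resamples in parallel; amap returns an async result object
        results = pool.amap(self.bootstrapping, range(self.nresample))
        self.result_list = results.get()  # one 2-D array per resample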
