I need some help because I have been trying for two days and I can't figure out how to do this. I have a function compute_desc that takes multiple arguments (5 to be exact) and I would like to run it in parallel.
I have this for now:
def compute_desc(coord, radius, coords, feat, verbose):
    # Compute here my descriptors
    return my_desc  # numpy array (1x10 dimensions)

def main():
    points = np.random.rand(1000000, 4)
    coords = points[:, 0:3]
    feat = points[:, 3]
    all_features = np.empty((1000000, 10))
    all_features[:] = np.nan
    scales = [0.5, 1, 2]
    for radius in scales:
        for index, coord in enumerate(coords):
            all_features[index, :] = compute_desc(coord,
                                                  radius,
                                                  coords,
                                                  feat,
                                                  False)
I would like to parallelize this. I saw several solutions with a Pool, but I don't understand how it works.
I tried with a pool.map(), but I can only send one argument to the function.
Here is my solution (it doesn't work):
all_features = [pool.map(compute_desc, zip(point, repeat([radius,
                                                          coords,
                                                          feat,
                                                          False]
                                                         )
                                           )
                         )]
but I doubt it can work with a numpy array.
EDIT
This is my minimum code with a pool (it works now):
import numpy as np
from multiprocessing import Pool
from itertools import repeat

def compute_desc(coord, radius, coords, feat, verbose):
    # Compute here my descriptors
    my_desc = np.random.rand(1, 10)
    return my_desc

def compute_desc_pool(args):
    coord, radius, coords, feat, verbose = args
    return compute_desc(coord, radius, coords, feat, verbose)

def main():
    points = np.random.rand(1000000, 4)
    coords = points[:, 0:3]
    feat = points[:, 3]
    scales = [0.5, 1, 2]
    for radius in scales:
        with Pool() as pool:
            args = zip(points, repeat(radius),
                       repeat(coords),
                       repeat(feat),
                       repeat(False))
            feat_one_scale = pool.map(compute_desc_pool, args)
        feat_one_scale = np.array(feat_one_scale)
        if radius == scales[0]:
            all_features = feat_one_scale
        else:
            all_features = np.hstack([all_features, feat_one_scale])
    # Other stuff
The generic solution is to pass to Pool.map a sequence of tuples, each tuple holding one set of arguments for your worker function, and then to unpack the tuple in the worker function.
So, just change your function to accept only one argument, a tuple of your arguments, which you already prepared with zip and passed to Pool.map. Then simply unpack args to variables:
def compute_desc(args):
    coord, radius, coords, feat, verbose = args
    # Compute here my descriptors
Also, Pool.map should work with numpy types too, since after all, they are valid Python types.
Just be sure to properly zip 5 sequences, so your function receives a 5-tuple. You don't need to iterate over point in coords, zip will do that for you:
args = zip(coords, repeat(radius), repeat(coords), repeat(feat), repeat(False))
# args is a list of [(coords[0], radius, coords, feat, False), (coords[1], ... )]
(If you pass a single point as the first sequence to zip, zip will iterate over that point, which in this case is a 3-element array.)
Your Pool.map line should look like:
for radius in scales:
    args = zip(coords, repeat(radius), repeat(coords), repeat(feat), repeat(False))
    feat_one_scale = pool.map(compute_desc_pool, args)
    # other stuff
A solution specific to your case, where all arguments except one are fixed, could be to use functools.partial (as the other answer suggests). Furthermore, you don't even need to unpack coords in the first argument; just pass the index [0..n] in coords, since each invocation of your worker function already receives the complete coords array. See the sketch below.
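A minimal sketch of that index-based variant, assuming the compute_desc function, coords, feat and radius from the question (compute_desc_by_index is a hypothetical helper, not part of either answer):

import numpy as np
from functools import partial
from multiprocessing import Pool

def compute_desc_by_index(radius, coords, feat, verbose, index):
    # coords is already passed in full, so look the single point up by index
    return compute_desc(coords[index], radius, coords, feat, verbose)

with Pool() as pool:
    worker = partial(compute_desc_by_index, radius, coords, feat, False)
    feat_one_scale = np.array(pool.map(worker, range(len(coords))))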
I assume from your example that four of those five arguments would be constant to all calls to compute_desc_pool. If so, then you can use partial to do this.
from functools import partial
....
def compute_desc_pool(coord, radius, coords, feat, verbose):
    return compute_desc(coord, radius, coords, feat, verbose)
def main():
    points = np.random.rand(1000000, 4)
    coords = points[:, 0:3]
    feat = points[:, 3]
    feat_one_scale = np.empty((1000000, 10))
    feat_one_scale[:] = np.nan
    scales = [0.5, 1, 2]
    pool = Pool()
    for radius in scales:
        # Bind the four fixed arguments by keyword so that pool.map supplies
        # each item of coords as the remaining positional argument, coord.
        worker = partial(compute_desc_pool, radius=radius, coords=coords,
                         feat=feat, verbose=False)
        feat_one_scale = pool.map(worker, coords)
Related
I'm trying to speed up my processing of a PIL.Image, where I divide the image into small parts, search for the most similar image inside a database and then replace the original small part of the image with this found image.
This is the described function:
def work_image(img, lenx, leny, neigh, split_dict, img_train_rot):
    constructed_img = Image.new(mode='L', size=img.size)
    for x in range(0, img.size[0], lenx):
        for y in range(0, img.size[1], leny):
            box = (x, y, x+lenx, y+leny)
            split_img = img.crop(box)
            res = neigh.kneighbors(np.asarray(split_img).ravel().reshape((1, -1)))
            #look up the found image part in img_train_rot and define the position as new_box
            constructed_img.paste(img_train_rot[i].crop(new_box), (x, y))
    return constructed_img
Now I wanted to parallelize this function, since, for example, each row of image parts could be processed entirely on its own.
I came up with this approach using multiprocessing.Pool:
def work_image_parallel(leny, neigh, split_dict, img_train_rot, img_slice):
    constructed_img_slice = Image.new(mode='L', size=img_slice.size)
    for y in range(0, img_slice.size[1], leny):
        box = (0, y, img_slice.size[0], y+leny)
        img_part = img_slice.crop(box)
        res = neigh.kneighbors(np.asarray(img_part).ravel().reshape((1, -1)))
        #look up the found image part in img_train_rot and define the position as new_box
        constructed_img_slice.paste(img_train_rot[i].crop(new_box), (0, y))
    return constructed_img_slice

if __name__ == '__main__':
    lenx, leny = 16, 16
    #define my image database and so on
    neigh = setup_nearest_neighbour(train_imgs, n_neighbors=1)
    test_img = test_imgs[0]
    func = partial(work_image_parallel, leny, neigh, split_dict, img_train_rot)
    pool = multiprocessing.Pool()
    try:
        res = pool.map(func, [test_img.crop((x, 0, x+lenx, test_img.size[1]))
                              for x in range(0, test_img.size[0], lenx)])
    finally:
        pool.close()
        pool.join()
    test_result2 = Image.new(mode='L', size=test_img.size)
    for i in range(len(res)):
        test_result2.paste(res[i], box=(i*lenx, 0, i*lenx + lenx, test_result2.size[1]))
However, this parallelized version isn't exactly faster than the normal version, and if I decrease the size of my image division, the parallelized version throws an AssertionError (other posts said this might be because the data size to be sent between the processes becomes too big).
Hence my question: did I do something wrong? Is multiprocessing not the right approach here? Or why doesn't multiprocessing decrease the computation time, given that the workload per image slice should be big enough to offset the time needed to create the processes?
Any help would be appreciated.
Disclaimer: I am not that familiar with PIL, so you should take a close look at the PIL method calls, which may need some "adjustment" on your part, since there is no way for me to actually test this.
First, I observe that you will probably be making a lot of repeated invocations of your worker function work_image_parallel and that some of those arguments being passed to that function might be quite large (all of this depends, of course, on how large your images are). Rather than repeatedly passing such potentially large arguments, I would prefer to copy these arguments once to each process in your pool and instantiate them as global variables. This is accomplished with a pool initializer function.
Second, I have attempted to modify your work_image_parallel function to be as close as possible to your original work_image function, except that it now deals with just a single x, y coordinate pair that is passed to it. In that way more of the work is being done by your subprocesses. I have also tried to reduce the number of pasting operations required (if I have correctly understood what is going on).
Third, because the images may be quite large, I am using a generator expression to create the arguments to be used with imap_unordered instead of map. This is because the number of x, y pairs can be quite large in a very large image, and map requires that its iterable argument be such that its length can be computed, so that an efficient chunksize value can be determined (see the docs). With imap_unordered, we should specify an explicit chunksize value to be efficient (the default is 1 if unspecified) when we expect that the iterable could be large. If you know that you are dealing with relatively small images, so that the size of the x_y_args iterable would not be unreasonably memory-inefficient if stored as a list, then you could just use map with the default chunksize value of None and have the pool compute the value for you. The advantage of using imap_unordered is that results do not have to be returned in order, so processing could be faster.
def init_pool(the_img, the_img_train_rot, the_neigh, the_split_dict):
    global img, img_train_rot, neigh, split_dict
    img = the_img
    img_train_rot = the_img_train_rot
    neigh = the_neigh
    split_dict = the_split_dict

def work_image_parallel(lenx, leny, t):
    x, y = t
    box = (x, y, x+lenx, y+leny)
    split_img = img.crop(box)
    res = neigh.kneighbors(np.asarray(split_img).ravel().reshape((1, -1)))
    #look up the found image part in img_train_rot and define the position as new_box
    # return original x, y values used:
    return x, y, img_train_rot[i].crop(new_box)

def compute_chunksize(iterable_size, pool_size):
    chunksize, remainder = divmod(iterable_size, 4 * pool_size)
    if remainder:
        chunksize += 1
    return chunksize
if __name__ == '__main__':
    lenx, leny = 16, 16
    #define my image database and so on
    neigh = setup_nearest_neighbour(train_imgs, n_neighbors=1)
    test_img = test_imgs[0]
    func = partial(work_image_parallel, lenx, leny)
    # in case this is a very large image, use a generator expression
    x_y_args = ((x, y) for x in range(0, test_img.size[0], lenx)
                       for y in range(0, test_img.size[1], leny))
    # approximate size of x_y_args:
    iterable_size = (test_img.size[0] // lenx) * (test_img.size[1] // leny)
    pool_size = multiprocessing.cpu_count()
    chunksize = compute_chunksize(iterable_size, pool_size)
    pool = multiprocessing.Pool(pool_size, initializer=init_pool,
                                initargs=(test_img, img_train_rot, neigh, split_dict))
    test_result2 = Image.new(mode='L', size=test_img.size)
    try:
        # use imap or imap_unordered when the iterable is a generator to avoid
        # conversion of the iterable to a list, but specify a suitable chunksize
        # for efficiency in case the iterable is very large:
        for x, y, res in pool.imap_unordered(func, x_y_args, chunksize=chunksize):
            test_result2.paste(res, (x, y))
    finally:
        pool.close()
        pool.join()
Update (break up image into bigger slices)
def init_pool(the_img, the_img_train_rot, the_neigh, the_split_dict):
    global img, img_train_rot, neigh, split_dict
    img = the_img
    img_train_rot = the_img_train_rot
    neigh = the_neigh
    split_dict = the_split_dict

def work_image_parallel(lenx, leny, x):
    img_slice = img.crop((x, 0, x+lenx, img.size[1]))
    constructed_img_slice = Image.new(mode='L', size=img_slice.size)
    for y in range(0, img_slice.size[1], leny):
        box = (0, y, img_slice.size[0], y+leny)
        img_part = img_slice.crop(box)
        res = neigh.kneighbors(np.asarray(img_part).ravel().reshape((1, -1)))
        #look up the found image part in img_train_rot and define the position as new_box
        constructed_img_slice.paste(img_train_rot[i].crop(new_box), (0, y))
    return constructed_img_slice

if __name__ == '__main__':
    lenx, leny = 16, 16
    #define my image database and so on
    neigh = setup_nearest_neighbour(train_imgs, n_neighbors=1)
    test_img = test_imgs[0]
    pool_size = multiprocessing.cpu_count()
    pool = multiprocessing.Pool(pool_size, initializer=init_pool,
                                initargs=(test_img, img_train_rot, neigh, split_dict))
    func = partial(work_image_parallel, lenx, leny)
    try:
        test_result2 = Image.new(mode='L', size=test_img.size)
        x = 0
        for res in pool.map(func, range(0, test_img.size[0], lenx)):
            test_result2.paste(res, box=(x, 0, x + lenx, test_result2.size[1]))
            x += lenx
    finally:
        pool.close()
        pool.join()
I want the legend of the plot to show the values from my list, but what I get is the element index, not the value itself. I don't know how to fix it. I'm referring to the plt.plot line. Thanks for the help.
import matplotlib.pyplot as plt
import numpy as np
x = np.random.random(1000)
y = np.random.random(1000)
n = len(x)
d_ij = []
for i in range(n):
    for j in range(i+1, n):
        a = np.sqrt((x[i]-x[j])**2 + (y[i]-y[j])**2)
        d_ij.append(a)

epsilon = np.linspace(0.01, 1, num=10)
sigma = np.linspace(0.01, 1, num=10)

def lj_pot(epsi, sig, d):
    result = []
    for i in range(len(d)):
        a = 4*epsi*((sig/d[i])**12 - (sig/d[i])**6)
        result.append(a)
    return result

for i in range(len(epsilon)):
    for j in range(len(sigma)):
        a = epsilon[i]
        b = sigma[j]
        plt.cla()
        plt.ylim([-1.5, 1.5])
        plt.xlim([0, 2])
        plt.plot(sorted(d_ij), lj_pot(epsilon[i], sigma[j], sorted(d_ij)), label='epsilon = %d, sigma =%d' % (a, b))
        plt.legend()
        plt.savefig("epsilon_%d_sigma_%d.png" % (i, j))
        plt.show()
Your code is a bit unpythonic, so I tried to clean it up to the best of my knowledge. numpy.random.random and numpy.random.uniform(0, 1) are basically the same; however, the latter also lets you pass the shape of the array you want back, in this case an array with 1000 rows and two columns, (1000, 2). I then use a little unpacking magic to assign the two columns of the returned array to x and y on the same line.
numpy.hypot does what the name suggests and calculates the hypotenuse of x and y. It can also do that for each entry of arrays of the same size, saving you the for loops, which you should try to avoid in Python since they are pretty slow.
You used plt for all your plotting, which is fine as long as you only have one figure, but I would recommend to be as explicit as possible, according to one of Python's key notions:
explicit is better than implicit.
I recommend you read through this guide, in particular the section called 'Stateful Versus Stateless Approaches'. I changed your commands accordingly.
It is also very unpythonic to loop over items of a list using the index of the item in the list like you did (for i in range(len(list)): item = list[i]). You can just reference the item directly (for item in list:).
Lastly, I changed your formatted strings to the more convenient f-strings; have a read here. (Incidentally, the immediate cause of your legend problem is the %d format specifier, which truncates your float epsilon and sigma values to integers; formatting them as floats, for example with f-strings, fixes that.)
import matplotlib.pyplot as plt
import numpy as np
def pot(epsi, sig, d):
    result = 4*epsi*((sig/d)**12 - (sig/d)**6)
    return result

# I am not sure why you would create the independent variable this way,
# maybe you are simulating something. In that case, the code below is
# simpler than your version and should achieve the same.
# x, y = zip(*np.random.uniform(0, 1, (1000, 2)))
# d = np.array(sorted(np.hypot(x, y)))

# If you only want to plot your pot function then creating the value range
# like this is just fine.
d = np.linspace(0.001, 1, 1000)
epsilons = sigmas = np.linspace(0.01, 1, num=10)

fig, ax = plt.subplots()
ax.set_xlim([0, 2])
ax.set_ylim([-1.5, 1.5])

line = None
for epsilon in epsilons:
    for sigma in sigmas:
        if line is None:
            line = ax.plot(
                d, pot(epsilon, sigma, d),
                label=f'epsilon = {epsilon}, sigma = {sigma}'
            )[0]
            fig.legend()
        else:
            line.set_data(d, pot(epsilon, sigma, d))
        # plt.savefig(f"epsilon_{epsilon}_sigma_{sigma}.png")

fig.show()
I have a problem where I build some matrices, let's call them A, that depend on two integer parameters, p1 and p2, each taking values from 0 to 5.
Is there a way in Python to store the eigenvalues and eigenvectors of A in an "object", called B, such that something like B(1,2)[i] (or B[1,2,i]) will return the eigenvalues (for i=0) or eigenvectors (for i=1) of the matrix A built with p1 = 1 and p2 = 2?
Currently I am storing the eigenvectors in a dictionary as in the simple example below, but I think it is a dirty hack. I would appreciate any suggestions.
Example:
import numpy as np

# Build A matrices
def Amatrix(p1, p2):
    return np.array([[p1, p2/10], [p2/10, -p1]])

# Empty dict
eigenvec_dict = {}
for p1 in range(3):
    for p2 in range(2):
        label = str(p1) + str(p2)
        eigenvec_dict[label] = np.linalg.eigh(Amatrix(p1, p2))
eigenvec_dict.keys()
Out[9]: ['11', '10', '00', '01', '20', '21']
eigenvec_dict["01"][0]
Out[10]: array([-1., 1.])
eigenvec_dict["01"][1]
Out[11]:
array([[-0.70710678,  0.70710678],
       [ 0.70710678,  0.70710678]])
I would use an object that takes a list of points (a point is better represented as a tuple than as a string) and calculates the eighs immediately.
__getitem__ is overridden so that, for example, eigh[0, 1, 0] returns the eigenvalues for the point (0, 1). The internal data structure is still a dict, but it is wrapped in an object and can be accessed cleanly from outside.
import numpy as np

# class to store eigen values / vectors
class EigenH(object):
    def __init__(self, points):
        self.eighstore = self._create_eighstore(points)

    def _create_eighstore(self, points):
        eighstore = {}
        for point in points:
            eighs = np.linalg.eigh(self._get_amatrix(point))
            eighstore[point] = eighs
        return eighstore

    def _get_amatrix(self, point):
        p1, p2 = point
        return np.array([[p1, p2/10.], [p2/10., -p1]])

    def __getitem__(self, key):
        return self.eighstore[key[:2]][key[2]]

    def keys(self):
        return self.eighstore.keys()

# create point list
points = []
for p1 in range(3):
    for p2 in range(2):
        # I prefer tuples over strings in this case
        points.append((p1, p2))

# instantiate class
eigh = EigenH(points)

# get eigen values
print(eigh[0, 1, 0])

# get eigen vectors
print(eigh[0, 1, 1])

# all available eighs
print(eigh.keys())
I need to convert an SVG description of a path, ie. something like:
M400 597 C235 599 478 607 85 554 C310 675 2 494 399 718 C124 547 569 828 68 400 C-108 317 304 703 96 218 L47 215 L400 290 C602 -146 465 467 550 99 L548 35 L706 400 L580 686 C546 614 591 672 529 629 L400 597 Z
into a list of all the pixels that would fall along that path (assuming the canvas is the size of the monitor). As you can see, the paths I need to work with amount to scribbles and are quite complex. Ideally, I'd like to generate such a path and then convert the entire thing to a pixel-by-pixel description, i.e.
p= [(403, 808), (403, 807), (403, 805), (403, 802), (403, 801), (403, 800), (403, 799),
(403, 797), (403, 794), (403, 792), (402, 789), (401, 787), (400, 785), (399, 784),
(399, 783), (398, 782)] # ... it'd be much longer, but you get the idea
Alternatively, I'd be as content with any means of generating paths with curve and line constituents (as in, SVG is simply how I've achieved this so far). The context is a bit peculiar; it's for an experiment in cognitive psychology, wherein I need to gradually animate a dot traversing a path generated according to certain rules, and export that path as pixel data.
To do the animation, I'm intending to simply redraw the dot at each x,y position along the path—hence the need for said list.
My math skills aren't great (I came to code from design, not CS) and the paths will get quite complex, meaning that computing these points with math alone is... maybe not beyond me, but definitely more demanding than I'm aiming for.
Libraries, tricks, strategies—all welcome & appreciated.
I needed to convert an SVG path into discrete points for some weird purposes. Apparently there is no lightweight library for doing that, so I ended up creating my own parser.
Input files mostly consist of Bezier curves. I wrote a quick function for that:
def cubic_bezier_sample(start, control1, control2, end):
    inputs = np.array([start, control1, control2, end])
    cubic_bezier_matrix = np.array([
        [-1,  3, -3,  1],
        [ 3, -6,  3,  0],
        [-3,  3,  0,  0],
        [ 1,  0,  0,  0]
    ])
    partial = cubic_bezier_matrix.dot(inputs)
    return (lambda t: np.array([t**3, t**2, t, 1]).dot(partial))

def quadratic_sample(start, control, end):
    # A quadratic Bezier curve is equivalent to a cubic one whose control
    # points sit two thirds of the way from each endpoint towards the
    # single quadratic control point (degree elevation).
    start, control, end = map(np.asarray, (start, control, end))
    control1 = start + 2.0 * (control - start) / 3.0
    control2 = end + 2.0 * (control - end) / 3.0
    return cubic_bezier_sample(start, control1, control2, end)
It can be used as follows to generate 10 samples:
n = 10
curve = cubic_bezier_sample((50, 0), (50, 100), (100, 100), (50, 0))
points = [curve(float(t)/n) for t in range(0, n + 1)]
The code requires numpy. You can also do the dot products without numpy if you want. I am able to get the curve parameters with svg.path, for example as sketched below.
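Here is a minimal sketch of that parsing step (not from the original answer), assuming the svg.path package is installed and the cubic_bezier_sample / quadratic_sample functions above are in scope; segment coordinates in svg.path are complex numbers, hence the small conversion helper:

import numpy as np
from svg.path import parse_path, CubicBezier, QuadraticBezier, Line

def to_xy(z):
    # svg.path stores points as complex numbers: real part = x, imaginary part = y
    return (z.real, z.imag)

path = parse_path("M400 597 C235 599 478 607 85 554 L47 215")
samples = []
for segment in path:
    if isinstance(segment, CubicBezier):
        curve = cubic_bezier_sample(to_xy(segment.start), to_xy(segment.control1),
                                    to_xy(segment.control2), to_xy(segment.end))
        samples.extend(curve(t) for t in np.linspace(0, 1, 10))
    elif isinstance(segment, QuadraticBezier):
        curve = quadratic_sample(to_xy(segment.start), to_xy(segment.control),
                                 to_xy(segment.end))
        samples.extend(curve(t) for t in np.linspace(0, 1, 10))
    elif isinstance(segment, Line):
        # straight segments only need linear interpolation between the endpoints
        p0, p1 = np.array(to_xy(segment.start)), np.array(to_xy(segment.end))
        samples.extend(p0 + t * (p1 - p0) for t in np.linspace(0, 1, 10))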
Full code example
import numpy as np
import matplotlib.pyplot as plt
def cubic_bezier_sample(start, control1, control2, end):
    inputs = np.array([start, control1, control2, end])
    cubic_bezier_matrix = np.array([
        [-1,  3, -3,  1],
        [ 3, -6,  3,  0],
        [-3,  3,  0,  0],
        [ 1,  0,  0,  0]
    ])
    partial = cubic_bezier_matrix.dot(inputs)
    return (lambda t: np.array([t**3, t**2, t, 1]).dot(partial))
# == control points ==
start = np.array([0, 0])
control1 = np.array([60, 5])
control2 = np.array([40, 95])
end = np.array([100, 100])
# number of segments to generate
n_segments = 100
# get curve segment generator
curve = cubic_bezier_sample(start, control1, control2, end)
# get points on curve
points = np.array([curve(t) for t in np.linspace(0, 1, n_segments)])
# == plot ==
controls = np.array([start, control1, control2, end])
# segmented curve
plt.plot(points[:, 0], points[:, 1], '-')
# control points
plt.plot(controls[:,0], controls[:,1], 'o')
# misc lines
plt.plot([start[0], control1[0]], [start[1], control1[1]], '-', lw=1)
plt.plot([control2[0], end[0]], [control2[1], end[1]], '-', lw=1)
plt.show()
I found most of my answer in a different post from unutbu (second answer).
This is my modification of his basic function, with some extra functionality to solve my problem above. I'll have to write a similar function for line segments, but that's obviously much easier, and between them will be able to piece together any combination of curved and linear segments to achieve my goal as described above.
def pascal_row(n):
    # This is over my designer's brain, but unutbu says:
    # "This returns the nth row of Pascal's Triangle"
    result = [1]
    x, numerator = 1, n
    for denominator in range(1, n//2+1):
        # print(numerator, denominator, x)
        x *= numerator
        x //= denominator  # integer division keeps the coefficients exact in Python 3
        result.append(x)
        numerator -= 1
    if n & 1 == 0:
        # n is even
        result.extend(reversed(result[:-1]))
    else:
        result.extend(reversed(result))
    return result

def bezier_interpolation(origin, destination, control_o, control_d=None):
    points = [origin, control_o, control_d, destination] if control_d else [origin, control_o, destination]
    n = len(points)
    combinations = pascal_row(n - 1)

    def bezier(transitions):
        # I don't really understand any of this math, but it works!
        result = []
        for t in transitions:
            t_powers = (t ** i for i in range(n))
            u_powers = reversed([(1 - t) ** i for i in range(n)])
            coefficients = [c * a * b for c, a, b in zip(combinations, t_powers, u_powers)]
            result.append(
                list(sum([coef * p for coef, p in zip(coefficients, ps)]) for ps in zip(*points)))
        return result

    def line_segments(points, size):
        # it's more convenient for my purposes to have the pairs of x,y
        # coordinates that eventually become the very small line segments
        # that constitute my "curves"
        for i in range(0, len(points), size):
            yield points[i:i + size]

    # unutbu's function creates waaay more points than needed, and
    # these extend well through the "destination" point I want to end at, so,
    # I keep inspecting the line segments until one of them passes through
    # my intended stop point (ie. "destination") and then manually stop
    # collecting, returning the subset I want; probably not efficient,
    # but it works
    break_next = False
    segments = []
    for pos in line_segments(bezier([0.01 * t for t in range(101)]), 2):
        if break_next:
            segments.append([break_next, destination])
            break
        try:
            if [int(i) for i in pos[0]] == destination:
                break_next = pos[0]
                continue
            segments.append(pos)
        except IndexError:
            # not guaranteed to get an even number of points from bezier()
            break
    return segments
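The straight-line counterpart mentioned above might look something like this minimal sketch (not part of the original answer; linear_interpolation is a hypothetical name, and it returns the same kind of small two-point segments as bezier_interpolation):

def linear_interpolation(origin, destination, steps=100):
    # Walk from origin to destination in equal increments and pair the points
    # up into small two-point segments, like bezier_interpolation does.
    (x0, y0), (x1, y1) = origin, destination
    points = [(x0 + (x1 - x0) * t / steps, y0 + (y1 - y0) * t / steps)
              for t in range(steps + 1)]
    return [points[i:i + 2] for i in range(0, len(points) - 1, 2)]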
The svg path interpolator JavaScript library converts svg paths to polygon point data with options for sample size and fidelity. This library supports the full SVG specification and will account for transforms on paths. It takes an svg input and produces JSON representing the interpolated points.
In the following code, which is an example of my main code, I have tried to use pathos.multiprocessing to speed up the iterations of a loop. The output of each iteration, which is implemented with multiprocessing, is a 2-D array. I used pathos.multiprocessing instead of multiprocessing since I wanted to use it in my class method. I have used the apipe method of pathos.multiprocessing to collect the output in a list, but it returns an empty list. I have no idea why it fails.
import numpy as np
import random
import pathos.multiprocessing as mp

class Testsystematics(object):
    def __init__(self, x, y, NTH=None, THMIN=None, THMAX=None, NRESAMPLE=None):
        self.x = x
        self.y = y
        self.nbins = NTH
        self.bmin = THMIN
        self.bmax = THMAX
        self.nresample = NRESAMPLE
        self.bins = np.linspace(self.bmin, self.bmax, self.nbins+1, True).astype(np.float)
        self.sample = np.array([[random.choice(range(len(self.y))) for _ in xrange(len(self.y))] for i in range(self.nresample)])
        self.result_list = []

    def log_result(self, result):
        self.result_list.append(result)

    def bootstrapping(self, k):
        xi_p = np.zeros(self.nbins, float)
        xi_m = np.zeros(self.nbins, float)
        nind = np.zeros(self.nbins, float)
        for i in range(len(self.x)):
            for j in range(len(self.x)):
                if (i != j):
                    sep = np.sqrt(self.x[i]**2 + self.x[j]**2)
                    index = np.searchsorted(self.bins, sep, side='right') - 1
                    sind = np.sin(sep)
                    if ((sep < self.bins[-1]) and (sep >= self.bins[0])):
                        xi_p[index] += sind*(np.mean(y) - np.median(y))
                        xi_m[index] += sind*np.std(y)
                        nind[index] += 1.0
        for i in range(self.nbins):
            xi_p[i] = xi_p[i]/nind[i]
            xi_m[i] = xi_m[i]/nind[i]
        return np.vstack((xi_p, xi_m))

    def twopcf(self):
        if (self.sys_type == 1):
            pool = mp.ProcessingPool(16)
            for n in range(self.nresample):
                pool.apipe(self.bootstrapping, args=(n,), callback=self.log_result)

shape, scale = 0.5, 0.6
x = np.random.gamma(shape, scale, 10000)
mu1, sigma1 = 0, 0.5   # mean and standard deviation
mu2, sigma2 = 0.1, 0.7 # mean and standard deviation
y = np.random.normal(mu1, sigma1, 1000) + np.random.normal(mu2, sigma2, 1000)

sysTest = Testsystematics(x, y, NTH=10, THMIN=0, THMAX=5, NRESAMPLE=100)
Any suggestions?
I'm the pathos author. I tried your code, and it runs, but it produces no error and no result in result_list. I believe that is because you are using apipe incorrectly. The correct use of apipe is as follows:
>>> import pathos
>>> def squared(x):
... return x**2
...
>>> pool = pathos.multiprocessing.ProcessingPool()
>>> res = pool.apipe(squared, 5)
>>> res.get()
25
self.bootstrapping takes self and k, so you have to provide a k in the pipe call when calling it as an instance method. There is no callback; if you want a callback, you'd need to add one to your function.
Note that the return value is retrieved by (1) getting a return object, and (2) by calling get on the return object.
Your use of apipe within a for loop points me to suggest you use pool.amap (or pool.imap) instead; then you can do the for loop in parallel, for example as sketched below.
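A minimal sketch of that suggestion (not from the original answer), assuming the Testsystematics class from the question; amap returns an async result whose get() blocks until all workers are done:

def twopcf(self):
    pool = mp.ProcessingPool(16)
    # run bootstrapping for every resample index in parallel
    async_result = pool.amap(self.bootstrapping, range(self.nresample))
    # block for all results instead of relying on a callback
    self.result_list = async_result.get()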