python multiprocessing import Pool, cpu_count: causes forever loop

The code using multiprocessing causes an infinite loop.
I'm building an iris recognition system; this is the matching function. Everything works fine until the multiprocessing part.
I'm attaching a screenshot of the error output below so that you get a better idea.
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
Code:
##-----------------------------------------------------------------------------
## Import
##-----------------------------------------------------------------------------
import numpy as np
from os import listdir
from fnmatch import filter
import scipy.io as sio
from multiprocessing import Pool, cpu_count
from itertools import repeat
import warnings
warnings.filterwarnings("ignore")
##-----------------------------------------------------------------------------
## Function
##-----------------------------------------------------------------------------
def matching(template_extr, mask_extr, temp_dir, threshold=0.38):
    """
    Description:
        Match the extracted template with database.
    Input:
        template_extr - Extracted template.
        mask_extr     - Extracted mask.
        threshold     - Threshold of distance.
        temp_dir      - Directory contains templates.
    Output:
        List of strings of matched files, 0 if not, -1 if no registered sample.
    """
    # Get the number of accounts in the database
    n_files = len(filter(listdir(temp_dir), '*.mat'))
    if n_files == 0:
        return -1

    # Use all cores to calculate Hamming distances
    args = zip(
        sorted(listdir(temp_dir)),
        repeat(template_extr),
        repeat(mask_extr),
        repeat(temp_dir),
    )
    with Pool(processes=cpu_count()) as pools:
        result_list = pools.starmap(matchingPool, args)

    filenames = [result_list[i][0] for i in range(len(result_list))]
    hm_dists = np.array([result_list[i][1] for i in range(len(result_list))])

    # Remove NaN elements
    ind_valid = np.where(hm_dists > 0)[0]
    hm_dists = hm_dists[ind_valid]
    filenames = [filenames[idx] for idx in ind_valid]

    # Threshold and give the result ID
    ind_thres = np.where(hm_dists <= threshold)[0]

    # Return
    if len(ind_thres) == 0:
        return 0
    else:
        hm_dists = hm_dists[ind_thres]
        filenames = [filenames[idx] for idx in ind_thres]
        ind_sort = np.argsort(hm_dists)
        return [filenames[idx] for idx in ind_sort]
#------------------------------------------------------------------------------
def calHammingDist(template1, mask1, template2, mask2):
    """
    Description:
        Calculate the Hamming distance between two iris templates.
    Input:
        template1 - The first template.
        mask1     - The first noise mask.
        template2 - The second template.
        mask2     - The second noise mask.
    Output:
        hd - The Hamming distance as a ratio.
    """
    # Initialize
    hd = np.nan

    # Shift template left and right, use the lowest Hamming distance
    for shifts in range(-8, 9):
        template1s = shiftbits(template1, shifts)
        mask1s = shiftbits(mask1, shifts)

        mask = np.logical_or(mask1s, mask2)
        nummaskbits = np.sum(mask == 1)
        totalbits = template1s.size - nummaskbits

        C = np.logical_xor(template1s, template2)
        C = np.logical_and(C, np.logical_not(mask))
        bitsdiff = np.sum(C == 1)

        if totalbits == 0:
            hd = np.nan
        else:
            hd1 = bitsdiff / totalbits
            if hd1 < hd or np.isnan(hd):
                hd = hd1

    # Return
    return hd
#------------------------------------------------------------------------------
def shiftbits(template, noshifts):
    """
    Description:
        Shift the bit-wise iris patterns.
    Input:
        template - The template to be shifted.
        noshifts - The number of shift operators, positive for right
                   direction and negative for left direction.
    Output:
        templatenew - The shifted template.
    """
    # Initialize
    templatenew = np.zeros(template.shape)
    width = template.shape[1]
    s = 2 * np.abs(noshifts)
    p = width - s

    # Shift
    if noshifts == 0:
        templatenew = template
    elif noshifts < 0:
        x = np.arange(p)
        templatenew[:, x] = template[:, s + x]
        x = np.arange(p, width)
        templatenew[:, x] = template[:, x - p]
    else:
        x = np.arange(s, width)
        templatenew[:, x] = template[:, x - s]
        x = np.arange(s)
        templatenew[:, x] = template[:, p + x]

    # Return
    return templatenew
#------------------------------------------------------------------------------
def matchingPool(file_temp_name, template_extr, mask_extr, temp_dir):
    """
    Description:
        Perform matching session within a Pool of parallel computation
    Input:
        file_temp_name - File name of the examining template
        template_extr  - Extracted template
        mask_extr      - Extracted mask of noise
    Output:
        hm_dist - Hamming distance
    """
    # Load each account
    data_template = sio.loadmat('%s%s' % (temp_dir, file_temp_name))
    template = data_template['template']
    mask = data_template['mask']

    # Calculate the Hamming distance
    hm_dist = calHammingDist(template_extr, mask_extr, template, mask)
    return (file_temp_name, hm_dist)
How can I remove multiprocessing and have the code still work fine?
Screenshots: Dropbox link

Use Python's itertools.starmap() instead of Pool.starmap().
Hope it helps.
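For reference, a minimal sketch of that swap inside matching() (an untested editorial illustration): itertools.starmap is lazy, so wrap it in list() to get the same result_list the Pool version produced.

from itertools import starmap

# Sequential drop-in for the Pool block: no child processes are spawned,
# so the bootstrapping RuntimeError cannot occur.
result_list = list(starmap(matchingPool, args))

Alternatively, if you want to keep multiprocessing, the RuntimeError itself points at the standard fix: only call matching() from under an if __name__ == '__main__': guard in the main module, as the error message suggests.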

Related

Getting specific values from ASCII table

I'm currently creating a genetic algorithm and am trying to restrict the candidate characters to certain ASCII values so the algorithm runs a bit faster. In the code below I draw values between 9 and 127, but I only need the values 9-10 and 32-127 from the ASCII table, and I'm not sure how to get only those specific values. The code below is written in Python.
import numpy as np

TARGET_PHRASE = """The smartest and fastest Pixel yet.
Google Tensor: Our first custom-built processor.
The first processor designed by Google and made for Pixel, Tensor makes the new Pixel phones our most powerful yet.
The most advanced Pixel Camera ever.
Capture brilliant color and vivid detail with Pixels best-in-class computational photography and new pro-level lenses."""  # target DNA

POP_SIZE = 4000          # population size
CROSS_RATE = 0.8         # mating probability (DNA crossover)
MUTATION_RATE = 0.00001  # mutation probability
N_GENERATIONS = 100000

DNA_SIZE = len(TARGET_PHRASE)
TARGET_ASCII = np.fromstring(TARGET_PHRASE, dtype=np.uint8)  # convert string to number
ASCII_BOUND = [9, 127]

class GA(object):
    def __init__(self, DNA_size, DNA_bound, cross_rate, mutation_rate, pop_size):
        self.DNA_size = DNA_size
        DNA_bound[1] += 1
        self.DNA_bound = DNA_bound
        self.cross_rate = cross_rate
        self.mutate_rate = mutation_rate
        self.pop_size = pop_size
        self.pop = np.random.randint(*DNA_bound, size=(pop_size, DNA_size)).astype(np.int8)  # int8 for convert to ASCII

    def translateDNA(self, DNA):  # convert to readable string
        return DNA.tostring().decode('ascii')

    def get_fitness(self):  # count how many character matches
        match_count = (self.pop == TARGET_ASCII).sum(axis=1)
        return match_count

    def select(self):
        fitness = self.get_fitness()  # add a small amount to avoid all zero fitness
        idx = np.random.choice(np.arange(self.pop_size), size=self.pop_size, replace=True, p=fitness/fitness.sum())
        return self.pop[idx]

    def crossover(self, parent, pop):
        if np.random.rand() < self.cross_rate:
            i_ = np.random.randint(0, self.pop_size, size=1)  # select another individual from pop
            cross_points = np.random.randint(0, 2, self.DNA_size).astype(np.bool)  # choose crossover points
            parent[cross_points] = pop[i_, cross_points]  # mating and produce one child
        return parent

    def mutate(self, child):
        for point in range(self.DNA_size):
            if np.random.rand() < self.mutate_rate:
                child[point] = np.random.randint(*self.DNA_bound)  # choose a random ASCII index
        return child

    def evolve(self):
        pop = self.select()
        pop_copy = pop.copy()
        for parent in pop:  # for every parent
            child = self.crossover(parent, pop_copy)
            child = self.mutate(child)
            parent[:] = child
        self.pop = pop

if __name__ == '__main__':
    ga = GA(DNA_size=DNA_SIZE, DNA_bound=ASCII_BOUND, cross_rate=CROSS_RATE,
            mutation_rate=MUTATION_RATE, pop_size=POP_SIZE)
    for generation in range(N_GENERATIONS):
        fitness = ga.get_fitness()
        best_DNA = ga.pop[np.argmax(fitness)]
        best_phrase = ga.translateDNA(best_DNA)
        print('Gen', generation, ': ', best_phrase)
        if best_phrase == TARGET_PHRASE:
            break
        ga.evolve()
You need a custom method to generate random samples in the ranges 9-10 and 32-127, like:

def my_rand(pop_size, DNA_size):
    bold1 = [9, 10]
    bold2 = list(range(32, 127))
    bold = bold1 + bold2
    pop = np.random.choice(bold, (pop_size, DNA_size)).astype(np.int8)
    return pop

then call this method in place of line 29, like:

delete -- self.pop = np.random.randint(*DNA_bound, size=(pop_size, DNA_size)).astype(np.int8)  # int8 for convert to ASCII
call   -- self.pop = my_rand(pop_size, DNA_size)
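As a quick sanity check (an illustrative addition, not part of the original answer), you can verify that only the permitted codes are ever drawn:

pop = my_rand(POP_SIZE, DNA_SIZE)
allowed = [9, 10] + list(range(32, 127))
assert np.all(np.isin(pop, allowed))   # every value is in 9-10 or 32-126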

multiprocessing.Pool returns different length of output compared to the input iterable

I wrote a Python program which I want to parallelize using multiprocessing.Pool when the program is called (MyProgram.__call__()). The expected output is a list of dictionaries (dicts) with the same length as the input list images. However, when I test it with an input of length 60 using a multiprocessing.Pool over 20 CPUs, I get an output of only length 41.
Below is my code:
from acat.utilities import neighbor_shell_list, get_adj_matrix, get_max_delta_sum_path
from acat.build.adlayer import StochasticPatternGenerator as SPG
from acat.build.ordering import RandomOrderingGenerator as ROG
from ase.build import fcc111
from ase.io import read
from multiprocessing import Pool
import networkx as nx
import numpy as np
import os

class MyProgram(object):
    def __init__(self, alpha=.75, n_jobs=os.cpu_count()):
        self.alpha = alpha
        self.n_jobs = n_jobs

    def __call__(self, images):
        # Parallelization
        pool = Pool(self.n_jobs)
        dicts = pool.map(self.get_dict, images)
        return dicts

    def get_dict(self, atoms):
        d = {}
        numbers = atoms.numbers
        nblist = neighbor_shell_list(atoms, dx=0.3, neighbor_number=1, mic=True)
        A = get_adj_matrix(nblist)
        for i in range(len(A)):
            nbrs = np.where(A[i] == 1)[0]
            An = A[nbrs,:][:,nbrs]
            Gn = nx.from_numpy_matrix(An)
            path = max(nx.all_simple_paths(Gn, source=0, target=next(Gn.neighbors(0))),
                       key=lambda x: len(x))
            path_numbers = list(numbers[nbrs[path]])
            sorted_numbers = get_max_delta_sum_path(path_numbers)
            lab1 = str(numbers[i])
            lab2 = lab1 + ':' + ','.join(map(str, sorted_numbers))
            labs = [lab1, lab2]
            for idx, lab in enumerate(labs):
                if idx == 0:
                    factor = 1
                elif idx == 1:
                    factor = self.alpha
                if lab in d:
                    d[lab] += factor
                else:
                    d[lab] = factor
        return d

if __name__ == '__main__':
    MP = MyProgram(alpha=.75, n_jobs=20)
    slab = fcc111('Pt', (4, 4, 4))
    slab.center(vacuum=5., axis=2)
    rog = ROG(slab, elements=['Ni', 'Pt'])
    rog.run(num_gen=10)
    slabs = read('orderings.traj', index=':')
    spg = SPG(slabs, surface='fcc111',
              adsorbate_species=['CO','OH','C'],
              min_adsorbate_distance=3.,
              composition_effect=True)
    spg.run(num_gen=60, action='add', unique=False)
    images = read('patterns.traj', index=':')
    dicts = MP(images)
    print(len(images))
    print(len(dicts))
Output:
60
41
Does anyone know why multiprocessing.Pool returns an output of a different length from the input? Unfortunately, I cannot reproduce this phenomenon with simplified code. In case anyone wants to run my code, you only need to install acat with pip3 install acat. Thanks in advance.
Try changing __call__ to be:

with Pool(self.n_jobs) as pool:
    dicts = pool.map(self.get_dict, images)
return dicts

I suspect that the problem is that __call__ returns before all the jobs are done; len may somehow only be seeing the completed jobs rather than all of them.
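For context, a minimal sketch of the whole method after that change (an editorial illustration, assuming the MyProgram class above):

def __call__(self, images):
    # pool.map blocks until one result per input image has arrived;
    # the with-block then tears the pool down before __call__ returns.
    with Pool(self.n_jobs) as pool:
        dicts = pool.map(self.get_dict, images)
    return dicts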

Why is my interpolation not working properly in my function?

I have a fairly long code that processes spectra, and along the way I need an interpolation of some points. I used to have all this code written line-by-line without any functions, and it all worked properly, but now I'm converting it into two large functions so that I can call it on other models more easily in the future. Below is my code. (I have more code after the last line here that plots some things, but that's not relevant to my issue; I've tested this with a bunch of print lines and learned that my issue arises when I call the interpolation function inside my process function.)
import re
import numpy as np
import scipy.interpolate

# Required files and lists
filename = 'bpass_spectra.txt'              # number of columns = 4
extinctionfile = 'ExtinctionLawPoints.txt'  # R_V = 4.0
datalist = []

if filename == 'bpass_spectra.txt':
    filetype = 4
else:
    filetype = 1

if extinctionfile == 'ExtinctionLawPoints.txt':
    R_V = 4.0
else:
    R_V = 1.0  # to be determined

# Constants
h = 4.1357e-15  # Planck's constant [eV s]
c = float(3e8)  # speed of light [m/s]

# Inputs
beta = 2.0    # power used in extinction law
R = 1.0       # star formation rate [Msun/yr]
z = 1.0       # redshift
M_gas = 1.0   # mass of gas
M_halo = 2e41 # mass of dark matter halo

# Read spectra file
f = open(filename, 'r')
rawlines = f.readlines()
met = re.findall('Z\s=\s(\d*\.\d+)', rawlines[0])
del rawlines[0]
for i in range(len(rawlines)):
    newlist = rawlines[i].split(' ')
    datalist.append(newlist)

# Read extinction curve data file
rawpoints = open(extinctionfile, 'r').readlines()

def interpolate(R_V, rawpoints, Elist, i):
    pointslist = []
    if R_V == 4.0:
        for i in range(len(rawpoints)):
            newlst = re.split('(?!\S)\s(?=\S)|(?!\S)\s+(?=\S)', rawpoints[i])
            pointslist.append(newlst)
    pointslist = pointslist[3:]
    lambdalist = [float(item[0]) for item in pointslist]
    k_abslist = [float(item[4]) for item in pointslist]
    xvallist = [(c*h)/(lamb*1e-6) for lamb in lambdalist]
    k_interp = scipy.interpolate.interp1d(xvallist, k_abslist)
    return k_interp(Elist[i])
# Processing function
def process(interpolate, filetype, datalist, beta, R, z, M_gas, M_halo, met):
    speclist = []
    if filetype == 4:
        metallicity = float(met[0])
        Elist = [float(item[0]) for item in datalist]
        speclambdalist = [h*c*1e9/E for E in Elist]
        met1list = [float(item[1]) for item in datalist]
        speclist.extend(met1list)
    klist, Tlist = [None]*len(speclist), [None]*len(speclist)
    if metallicity > 0.0052:
        DGRlist = [50.0*np.exp(-2.21)*metallicity]*len(speclist)  # dust to gas ratio
    elif metallicity <= 0.0052:
        DGRlist = [((50.0*metallicity)**3.15)*np.exp(-0.96)]*len(speclist)
    for i in range(len(speclist)):
        if Elist[i] <= 4.1357e-3:   # frequencies <= 10^12 Hz
            klist[i] = 0.1*(float(Elist[i])/(1000.0*h))**beta  # extinction law [cm^2/g]
        elif Elist[i] > 4.1357e-3:  # frequencies > 10^12 Hz
            klist[i] = interpolate(R_V, rawpoints, Elist, i)   # interpolated function's value at Elist[i]
    print "KLIST (INTERPOLATION) ELEMENTS 0 AND 1000:", klist[0], klist[1000]
    return
The output from the print line is KLIST (INTERPOLATION) ELEMENTS 0 AND 1000: 52167.31734159269 52167.31734159269.
When I run my old code without functions and print klist[0] and klist[1000] the same way, I get different values for each. In this new code, I get back two identical values from this line. This shouldn't be the case, so the interpolation must not be working correctly inside my function (maybe it's not being performed on each point in the loop?). Does anyone have any insight? It would be unreasonable to post my entire code with all the text files it uses here (they're very large), so I'm not expecting anyone to run it, but rather to examine how I use and call my functions.
Edit: Below is the original version of my code up to the interpolation point without the functions (which works).
import re
import numpy as np
import scipy.interpolate

filename = 'bpass_spectra.txt'
extinctionfile = 'ExtinctionLawPoints.txt'  # from R_V = 4.0
pointslist = []
datalist = []
speclist = []

# Constants
h = 4.1357e-15  # Planck's constant [eV s]
c = float(3e8)  # speed of light [m/s]

# Read spectra file
f = open(filename, 'r')
rawspectra = f.readlines()
met = re.findall('Z\s=\s(\d*\.\d+)', rawspectra[0])
del rawspectra[0]
for i in range(len(rawspectra)):
    newlist = rawspectra[i].split(' ')
    datalist.append(newlist)

# Read extinction curve data file
rawpoints = open(extinctionfile, 'r').readlines()
for i in range(len(rawpoints)):
    newlst = re.split('(?!\S)\s(?=\S)|(?!\S)\s+(?=\S)', rawpoints[i])
    pointslist.append(newlst)
pointslist = pointslist[3:]
lambdalist = [float(item[0]) for item in pointslist]
k_abslist = [float(item[4]) for item in pointslist]
xvallist = [(c*h)/(lamb*1e-6) for lamb in lambdalist]
k_interp = scipy.interpolate.interp1d(xvallist, k_abslist)

# Create new lists
Elist = [float(item[0]) for item in datalist]
speclambdalist = [h*c*1e9/E for E in Elist]
z1list = [float(item[1]) for item in datalist]
speclist.extend(z1list)
met = met[0]
klist = [None]*len(speclist)
Loutlist = [None]*len(speclist)
Tlist = [None]*len(speclist)

# Define parameters
b = 2.0      # power used in extinction law (beta)
R = 1.0      # star formation rate [Msun/yr]
z = 1.0      # redshift
Mgas = 1.0   # mass of gas
Mhalo = 2e41 # mass of dark matter halo

if float(met) > 0.0052:
    DGRlist = [50.0*np.exp(-2.21)*float(met)]*len(speclist)
elif float(met) <= 0.0052:
    DGRlist = [((50.0*float(met))**3.15)*np.exp(-0.96)]*len(speclist)
for i in range(len(speclist)):
    if float(Elist[i]) <= 4.1357e-3:   # frequencies <= 10^12 Hz
        klist[i] = 0.1*(float(Elist[i])/(1000.0*h))**b  # extinction law [cm^2/g]
    elif float(Elist[i]) > 4.1357e-3:  # frequencies > 10^12 Hz
        klist[i] = k_interp(Elist[i])  # interpolated function's value at Elist[i]
print "KLIST (INTERPOLATION) ELEMENTS 0 AND 1000:", klist[0], klist[1000]
The output from this print line is KLIST (INTERPOLATION) ELEMENTS 0 AND 1000: 7779.275435560996 58253.589270674354.
You are passing i as an argument to interpolate, and then also using i as the loop variable inside interpolate. Once the for i in range(len(rawpoints)) loop in interpolate finishes, i has been rebound to its last value, len(rawpoints)-1. The interpolate function will therefore always return the same value, k_interp(Elist[i]), which is equivalent to k_interp(Elist[len(rawpoints)-1]), no matter which index you passed in. You need to either define a new variable within your loop (e.g. for not_i in range(len(rawpoints))) or use a different variable for the Elist argument. Consider the following change to interpolate:
def interpolate(R_V, rawpoints, Elist, j):
    pointslist = []
    if R_V == 4.0:
        for i in range(len(rawpoints)):
            newlst = re.split('(?!\S)\s(?=\S)|(?!\S)\s+(?=\S)', rawpoints[i])
            pointslist.append(newlst)
    pointslist = pointslist[3:]
    lambdalist = [float(item[0]) for item in pointslist]
    k_abslist = [float(item[4]) for item in pointslist]
    xvallist = [(c*h)/(lamb*1e-6) for lamb in lambdalist]
    k_interp = scipy.interpolate.interp1d(xvallist, k_abslist)
    return k_interp(Elist[j])
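The rebinding pitfall is easy to reproduce in isolation (a toy example added for illustration, not from the original post):

def f(i):
    for i in range(3):  # the loop rebinds the argument i
        pass
    return i            # always 2, whatever was passed in

print(f(10))  # prints 2, not 10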

implicitly restart Lanczos method

I want to write simple toy code for the implicitly restarted Lanczos method.
Without implicit restarting the code works perfectly, but when I turn on the restart I cannot get a proper solution.
To my knowledge, the newly constructed w should be orthogonal to all of the new Lanczos vectors. For the first restart the orthogonality is well preserved, but from the second restart onward the orthogonality breaks down significantly and the program does not find proper eigenvalues.
I have already spent several tens of hours trying to fix it and have almost given up. Here is my Python code:
"""
Author: Sunghwan Choi
Date Created: June 19, 2017
Python Version: 2.7 or 3.5
Reference for Lanczos algorithm
http://www.netlib.org/utk/people/JackDongarra/etemplates/node104.html
Reference for implicit restart
http://www.netlib.org/utk/people/JackDongarra/etemplates/node118.html
"""
import numpy as np
from scipy.sparse.linalg import eigsh
#from scipy.sparse import eye
from scipy.sparse import coo_matrix
from numpy import eye
def clustering(eigvals,eigvecs,tol=1e-2):
ret_eigvals=[]
ret_eigvecs=[]
for i in range(len(eigvals)):
for ret_eigval, ret_eigvec in zip (ret_eigvals,ret_eigvecs):
if (np.abs(eigvals[i]/ret_eigval-1.0)<tol ):
break
else:
ret_eigvals.append(eigvals[i])
ret_eigvecs.append(eigvecs[:,i])
ret_eigvals=np.array(ret_eigvals)
ret_eigvecs=np.array(ret_eigvecs).T
return ret_eigvals,ret_eigvecs
def check_conv(matrix, cal_eigval, cal_eigvec, tol):
indices=[]
for i in range(len(cal_eigval)):
if(np.linalg.norm(matrix.dot(cal_eigvec[:,i]) - cal_eigval[i]*cal_eigvec[:,i])< tol):
indices.append(i)
return indices
################ input
size=1600
max_step=20000
which='SA'
#implicit=False
implicit=True
energy_range=[0.0,6.0]
tol = 1e-5
n_eig=6
n_tol_check=40 # n_tol_check>n_eig ==0
######################
# generate 1D harmonic oscillator
h=0.1
matrix=-5/2*eye(size)
matrix+=4/3*(eye(size,k=1)+eye(size,k=-1))
matrix+=-1/12*(eye(size,k=2)+eye(size,k=-2))
matrix=-0.5*matrix/(h*h)
distance =lambda index: (index-size/2)*h
matrix+=np.diagflat( list(map( lambda i: 0.5*distance(i)**2, range(size))))
# solve eigenvalue problem to check validity
true_eigval,true_eigvec = eigsh(matrix,k=50,which=which)
indices = np.all([true_eigval>energy_range[0], true_eigval<energy_range[1]],axis=0)
true_eigval = true_eigval[indices]
true_eigvec = true_eigvec[:,indices]
#initialize variables
alpha=[]; beta=[]
index_v=0
restart_interval = n_tol_check+n_eig if implicit is not False else max_step
T = np.zeros((restart_interval,restart_interval))
v = np.zeros((size,restart_interval))
#Q=np.eye(restart_interval)
#generate initial vector
np.random.seed(1)
initial_vec = np.random.random(size)
#initial_vec = np.loadtxt("tmp")
w = v[:,index_v] = initial_vec/np.linalg.norm(initial_vec)
init_beta = np.linalg.norm(w)
# start Lanczos i_step
for i_step in range(max_step):
if (i_step is 0):
v[:,index_v] = w/init_beta
else:
v[:,index_v] = w/T[index_v,index_v-1]
w=matrix.dot(v[:,index_v])
if (i_step is 0):
w=w-init_beta*v[:,index_v-1]
else:
w=w-T[index_v,index_v-1]*v[:,index_v-1]
T[index_v,index_v]=np.dot(w,v[:,index_v])
w -=T[index_v,index_v]*v[:,index_v]
#check convergence
if ((i_step+1)%n_tol_check==n_eig and i_step>n_eig):
# calculate eigenval of T matrix
cal_eigval, cal_eigvec_= np.linalg.eigh(T[:index_v+1,:index_v+1])
cal_eigvec = np.dot(v[:,:index_v+1],cal_eigvec_)
#check tolerance
conv_indices = check_conv(matrix, cal_eigval, cal_eigvec,tol)
#filter energy_range
indices = np.all([cal_eigval[conv_indices]>energy_range[0], cal_eigval[conv_indices]<energy_range[1]],axis=0)
#check clustering
conv_cal_eigval,conv_cal_eigvec = clustering((cal_eigval[conv_indices])[indices], (cal_eigvec[conv_indices])[indices])
if (len(conv_cal_eigval)>=n_eig):
break
# implicit restarting
if (implicit is True):
Q=np.eye(restart_interval)
# do shift & QR decomposition
indices = np.argsort(np.abs(cal_eigval-np.mean(energy_range)))
for index in indices[n_eig:]:
new_Q,new_R = np.linalg.qr(T-cal_eigval[index]*np.eye(len(T)))
T = np.dot(new_Q.T,np.dot(T,new_Q))
v = np.dot(v,new_Q)
Q = np.dot(Q,new_Q)
w=v[:,n_eig]*T[n_eig,n_eig-1]+w*Q[-1,n_eig-1]
v[:,n_eig:]=0.0
T[:,n_eig:] = 0.0
T[n_eig:,:] = 0.0
#for debug
#print(np.dot(w.T, v))
# reset index
index_v=n_eig-1
index_v+=1
T[index_v,index_v-1]=np.linalg.norm(w)
T[index_v-1,index_v]=np.linalg.norm(w)
else:
print("not converged")
exit(-1)
print ("energy window: (", energy_range[0],",",energy_range[1],")")
print ("true eigenvalue")
print(true_eigval)
print ("eigenvalue from Lanczos w/ implicit restart (",i_step+1,")")
print(conv_cal_eigval)
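A standard diagnostic for exactly this symptom (an editorial suggestion, not part of the original post) is to explicitly re-orthogonalize w against the kept Lanczos vectors after each step, placed right after w is updated inside the loop:

# Full Gram-Schmidt re-orthogonalization of w against the stored basis
# v[:, :index_v+1]; if this restores convergence, the restart update of w
# is the likely culprit.
for j in range(index_v + 1):
    w -= np.dot(w, v[:, j]) * v[:, j]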

Parallel Processing - Pool - Python

I'm trying to learn how to use multiprocessing in Python.
I read about multiprocessing, and I'm trying to do something like this:
I have the following class (partial code), which has a method to produce Voronoi diagrams:
class ImageData:
    def generate_voronoi_diagram(self, seeds):
        """
        Generate a voronoi diagram with *seeds* seeds
        :param seeds: the number of seed in the voronoi diagram
        """
        nx = []
        ny = []
        gs = []
        for i in range(seeds):
            # Generate a cell position
            pos_x = random.randrange(self.width)
            pos_y = random.randrange(self.height)
            nx.append(pos_x)
            ny.append(pos_y)
            # Save the f(x,y) data
            x = Utils.translate(pos_x, 0, self.width, self.range_min, self.range_max)
            y = Utils.translate(pos_y, 0, self.height, self.range_min, self.range_max)
            z = Utils.function(x, y)
            gs.append(z)
        for y in range(self.height):
            for x in range(self.width):
                # Return the Euclidean norm
                d_min = math.hypot(self.width - 1, self.height - 1)
                j = -1
                for i in range(seeds):
                    # The distance from a cell to x, y point being considered
                    d = math.hypot(nx[i] - x, ny[i] - y)
                    if d < d_min:
                        d_min = d
                        j = i
                self.data[x][y] = gs[j]
I have to generate a large number of these diagrams, which consumes a lot of time, so I thought this is a typical problem to be parallelized.
I was doing this in the "normal" approach, like this:
if __name__ == "__main__":
    entries = []
    for n in range(images):
        entry = ImD.ImageData(width, height)
        entry.generate_voronoi_diagram(seeds)
        entry.generate_heat_map_image("ImagesOutput/Entries/Entry" + str(n))
        entries.append(entry)
To parallelize this, I tried the following:

if __name__ == "__main__":
    entries = []
    seeds = np.random.poisson(100)
    p = Pool()
    entry = ImD.ImageData(width, height)
    res = p.apply_async(entry.generate_voronoi_diagram,(seeds))
    entries.append(entry)
    entry.generate_heat_map_image("ImagesOutput/Entries/EntryX")
But besides the fact that it doesn't work even to generate a single diagram, I don't know how to specify that this has to be done N times.
Any help would be very appreciated.
Thanks.
Python's multiprocessing doesn't share memory (unless you explicitly tell it to). That means that you won't see "side effects" of any function that gets run in a worker process. Your generate_voronoi_diagram method works by adding data to an entry value, which is a side effect. In order to see the results, you need to pass the data back as a return value from your function.
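A tiny self-contained demonstration of this (an illustrative Python 3 addition, not from the original answer):

from multiprocessing import Pool

def mutate(lst):
    lst.append(1)  # mutates only the worker process's copy
    return lst

if __name__ == "__main__":
    data = []
    with Pool(1) as p:
        result = p.apply(mutate, (data,))
    print(data)    # [] -- the parent's list is untouched
    print(result)  # [1] -- the change comes back only as the return value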
Here's one approach that handles the entry instance as an argument and return value:
def do_voroni(entry, seeds):
    entry.generate_voronoi_diagram(seeds)
    return entry
Now, you can use this function in your worker processes (note that starmap_async returns an AsyncResult, so .get() is needed to obtain the list of results):

if __name__ == "__main__":
    entries = [ImD.ImageData(width, height) for _ in range(images)]
    seeds = numpy.random.poisson(100, images)  # array of values
    pool = multiprocessing.Pool()
    for i, e in enumerate(pool.starmap_async(do_voroni, zip(entries, seeds)).get()):
        e.generate_heat_map_image("ImagesOutput/Entries/Entry{:02d}".format(i))

The e values in the loop are not references to the values in the entries list. Rather, they're copies of those objects, which have been passed out to the worker process (which added data to them) and then passed back.
I might be wrong, but I think you should use

res = p.apply_async(entry.generate_voronoi_diagram, (seeds,))
res.get(timeout=1)

You may get Can't pickle type 'instancemethod' (bound methods can't be pickled on Python 2).
I think the easiest way is something like:

import random
from multiprocessing import Pool

class ImageData:
    def generate_voronoi_diagram(self, seeds):
        ...  # body elided in the original ("ooxx")

    def generate_heat_map_image(self, path):
        ...  # body elided in the original ("ooxx")

def allinone(obj, seeds, path):
    obj.generate_voronoi_diagram(seeds)
    obj.generate_heat_map_image(path)

if __name__ == "__main__":
    entries = []
    seeds = random.random()
    p = Pool()
    entry = ImageData()
    res = p.apply_async(allinone, (entry, seeds, 'tmp.txt'))
    res.get(timeout=1)
