Python, how to optimize this code

I tried to optimize the code below but I cannot figure out how to improve the computation speed. I tried Cython but the performance was about the same as in plain Python.
Is it possible to improve the performance without rewriting everything in C/C++?
Thanks for any help
import numpy as np

heightSequence = 400
widthSequence = 400
nHeights = 80

DOF = np.zeros((heightSequence, widthSequence), dtype=np.float64)
contrast = np.float64(np.random.rand(heightSequence, widthSequence, nHeights))
initDOF = np.zeros([heightSequence, widthSequence], dtype=np.float64)
initContrast = np.zeros([heightSequence, widthSequence, nHeights], dtype=np.float64)
initHeight = np.float64(np.r_[0:nHeights:1.0])
initPixelContrast = np.array(([0 for ii in range(nHeights)]), dtype=np.float64)

# for each row
for row in range(heightSequence):
    # for each col
    for col in range(widthSequence):
        # initialize variables
        height = initHeight  # array ndim = 1
        c = initPixelContrast  # array ndim = 1
        # for each height
        for indexHeight in range(0, nHeights):
            # get contrast profile for current pixel
            tempC = contrast[:, :, indexHeight]
            c[indexHeight] = tempC[row, col]
        # save original contrast
        # originalC = c
        # originalHeight = height
        # remove profile before maximum and after minimum contrast
        idxMaxContrast = np.argmax(c)
        c = c[idxMaxContrast:]
        height = height[idxMaxContrast:]
        idxMinContrast = np.argmin(c) + 1
        c = c[0:idxMinContrast]
        height = height[0:idxMinContrast]
        # remove some refraction
        if (len(c) <= 1) | (np.max(c) <= 0):
            DOF[row, col] = 0
        else:
            # linear fitting of profile contrast
            P = np.polyfit(height, c, 1)
            m = P[0]
            q = P[1]
            # remove some refraction
            if m >= 0:
                DOF[row, col] = 0
            else:
                DOF[row, col] = -q / m
    print 'row=%i/%i' % (row, heightSequence)

# set range of DOF
DOF[DOF < 0] = 0
DOF[DOF > nHeights] = 0

By looking at the code it seems that you can get rid of the two outer loops completely, converting the code to a vectorised form. The np.polyfit call must then be replaced by some other expression, but the coefficients of a linear fit are easy to compute in closed form, also vectorised. The last if-else can then be turned into an np.where call.
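For illustration, here is a minimal sketch of that idea, assuming the whole (untrimmed) height profile is fitted for every pixel; the per-pixel argmax/argmin trimming from the original code is left out, so this is not a drop-in replacement:

import numpy as np

H, W, nH = 400, 400, 80
contrast = np.random.rand(H, W, nH)
height = np.arange(nH, dtype=np.float64)

# closed-form least squares for c = m*h + q, per pixel:
#   m = cov(h, c) / var(h),  q = mean(c) - m*mean(h)
h_mean = height.mean()
h_var = ((height - h_mean)**2).mean()
c_mean = contrast.mean(axis=2)                                    # shape (H, W)
cov_hc = ((height - h_mean) * (contrast - c_mean[:, :, None])).mean(axis=2)
m = cov_hc / h_var                                                # slope per pixel
q = c_mean - m * h_mean                                           # intercept per pixel

# the last if-else as an np.where call
with np.errstate(divide='ignore', invalid='ignore'):
    DOF = np.where(m < 0, -q / m, 0.0)
DOF[(DOF < 0) | (DOF > nH)] = 0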


Partition a cloud of points in 3D equally into octants

Is there a better way to do this? Not necessarily prettier, although it would be nice.
P is an [N, 3] array, the cloud of points.
P -= np.sum(P, axis=0) / P.shape[0]
Map = np.arange(P.shape[0])  # an array (not a list), so it supports boolean indexing
p_0 = Map[P[:,0] <= 0]
p_1 = Map[P[:,0] > 0]
p_0_0 = p_0[P[p_0,1] <= 0]
p_0_1 = p_0[P[p_0,1] > 0]
p_1_0 = p_1[P[p_1,1] <= 0]
p_1_1 = p_1[P[p_1,1] > 0]
p_0_0_0 = p_0_0[P[p_0_0,2] <= 0]
p_0_0_1 = p_0_0[P[p_0_0,2] > 0]
p_0_1_0 = p_0_1[P[p_0_1,2] <= 0]
p_0_1_1 = p_0_1[P[p_0_1,2] > 0]
p_1_0_0 = p_1_0[P[p_1_0,2] <= 0]
p_1_0_1 = p_1_0[P[p_1_0,2] > 0]
p_1_1_0 = p_1_1[P[p_1_1,2] <= 0]
p_1_1_1 = p_1_1[P[p_1_1,2] > 0]
Or in other words, is there a way to compound conditions, like:
Oct_0_0_0 = Map[P[:,0] <= 0 and P[:,1] <= 0 and P[:,2] <= 0]
I’m assuming a loop won’t be better than this… not sure.
Thanks in advance.
Instead of repeatedly slicing and keeping lists of the indices, I'd instead recommend creating a single array that maps the index of the point to the octant it belongs to. I'd argue that this is a more natural way of doing it in numpy. So for instance with
octants = (P > 0) @ 2**np.arange(P.shape[1])
the n-th entry of octants is the index (here in the range 0, 1, 2, ..., 7) of the octant that the n-th point of P belongs to. This works by checking each coordinate for whether it is positive or not. That gives three boolean values per point, which we can interpret as the binary expansion of that index. (In fact, the line above works for indexing the 2^d-ants in any number of dimensions d.)
To demonstrate this solution, the following snippet makes a point cloud and colours the points according to their octant:
import numpy as np
P = np.random.rand(10000, 3)*2-1
quadrants = (P > 0) @ 2**np.arange(3)
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.scatter(*P.T, c=quadrants, cmap='jet')
plt.show()
If you still need to extract an array of the indices of a specific octant, say octant (0, 1, 1), this corresponds to finding the corresponding decimal number, which is 0*2^0 + 1*2^1 + 1*2^2 = 6, which you can do with e.g.
p_0_1_1 = np.where(quadrants == np.array([0,1,1]) @ 2**np.arange(3))
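(If you then want the index arrays for all eight octants at once, one possible one-liner, building on the quadrants labels computed above, is:

octant_indices = [np.flatnonzero(quadrants == k) for k in range(8)]

where octant_indices[6] corresponds to p_0_1_1 from above.)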
All in all, I’m going with an even messier version,
p_x = P[:,0] < 0
p_0 = np.nonzero(p_x)[0]
p_1 = np.nonzero(~p_x)[0]
p_0_y = P[p_0,1] < 0
p_1_y = P[p_1,1] < 0
p_0_0 = p_0[p_0_y]
p_0_1 = p_0[~p_0_y]
p_1_0 = p_1[p_1_y]
p_1_1 = p_1[~p_1_y]
p_0_0_z = P[p_0_0,2] < 0
p_0_1_z = P[p_0_1,2] < 0
p_1_0_z = P[p_1_0,2] < 0
p_1_1_z = P[p_1_1,2] < 0
p_0_0_0 = p_0_0[p_0_0_z]
p_0_0_1 = p_0_0[~p_0_0_z]
p_0_1_0 = p_0_1[p_0_1_z]
p_0_1_1 = p_0_1[~p_0_1_z]
p_1_0_0 = p_1_0[p_1_0_z]
p_1_0_1 = p_1_0[~p_1_0_z]
p_1_1_0 = p_1_1[p_1_1_z]
p_1_1_1 = p_1_1[~p_1_1_z]
I have a religious belief (I mean based on thin air) that a comparison is fundamentally cheaper than an arithmetic operation.
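For completeness on the compound-condition part of the question: NumPy does support this, but through the element-wise operators & and | rather than the Python keywords and/or, and each comparison needs its own parentheses because & binds more tightly than <=. With Map = np.arange(P.shape[0]) as above:

Oct_0_0_0 = Map[(P[:, 0] <= 0) & (P[:, 1] <= 0) & (P[:, 2] <= 0)]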

Is there a way to use numpy array functions to get this same effect?

I am trying to filter out all non-gray values within a given tolerance with the following code. It gives the expected results but runs too slowly to use in practice. Is there a way to do the following using numpy operations?
for i in range(height):
    for j in range(width):
        r, g, b = im_arr[i][j]
        r = (r + 150) / 2
        g = (g + 150) / 2
        b = (b + 150) / 2
        mean = (r + g + b) / 3
        diffr = abs(mean - r)
        diffg = abs(mean - g)
        diffb = abs(mean - b)
        maxdev = 2
        if (diffr + diffg + diffb) > maxdev:
            im_arr[i][j][0] = 0
            im_arr[i][j][1] = 0
            im_arr[i][j][2] = 0
Looping in plain Python is slow: one of the advantages of numpy is that traversing the arrays is highly optimized. Without commenting on the algorithm itself, you can get the same result using only numpy, which will be much faster.
Since im_arr is an image, it is very likely that its dtype is np.uint8.
That is only 8 bits, so you have to be careful about overflow. In your code, when you add 150 to a number, the result will be of type np.int64. But if you add 150 to an 8-bit np.ndarray, the result will still be of type np.uint8 and it can overflow.
You can either change the array type (using astype) or add a float, which will automatically promote the array to float:
mod_img = (im_arr + 150.) / 2  # the trailing dot in "150." is important
signed_dif = mod_img - np.mean(mod_img, axis=2, keepdims=True)
collapsed_dif = np.sum(np.abs(signed_dif), axis=2)
maxdev = 2
im_arr[collapsed_dif > maxdev] = 0
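A quick illustration of both the overflow pitfall and the astype alternative mentioned above (a sketch assuming an 8-bit image im_arr as in the question):

import numpy as np

a = np.array([200], dtype=np.uint8)
print(a + 150)   # stays uint8 and wraps around: [94] (350 % 256)
print(a + 150.)  # promoted to float64: [350.]

# the astype route instead of adding a float:
mod_img = (im_arr.astype(np.float64) + 150) / 2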
This can be done without any loop. I'll try to break every step out into a dedicated line:
import numpy as np
im_arr = np.random.rand(300,400,3) # Assuming this is roughly what your image looks like
img_shifted = (im_arr + 150) / 2 # This can be done in one go
mean_v = np.mean(img_shifted, axis=2) # Compute the mean along the channel axis
diff_img = np.abs(mean_v[:,:,None] - img_shifted) # Broadcasting to subtract n x m from n x m x k
maxdev = 2
selection = np.sum(diff_img, axis=2) > maxdev
im_arr[selection] = 0 # Using fancy indexing with booleans

Slicing a NumPy array with arrays of beginning and ending indices

I am writing code to simulate Continuous Time Random Walk phenomena with a function in Python. My code works correctly so far, but I would like to exploit the indexing abilities of NumPy arrays to improve the speed. In the code below I am generating an ensemble of trajectories, so I have to loop over each of them while generating it. Is it somehow possible to index the NumPy array x in such a way that I can get rid of the loop over Nens (the for loop in the code snippet below)?
for k in range(Nens):
    # start building the trajectory
    stop = 0
    i = 0
    while (stop < Nt):
        # reset the stop time
        start = stop
        # increment the stop till the next waiting time
        stop += int(trand[i,k])  # trand is a 2D numpy array
        # update the trajectory (x is also a 2D array)
        x[start:stop,k] = x[start-1,k] \
            + (1-int(abs((x[start-1,k]+xrand[i,k])/(xmax))))* xrand[i,k] \
            - int(abs((x[start-1,k]+xrand[i,k])/(xmax)))*np.sign(x[start-1,k]/xrand[i,k])* xrand[i,k]
        i += 1
    print i
return T, x
In this code, start and stop are scalar integers. A plausible method that I can think of is to index the array in a way in which both start and stop are 1D NumPy integer arrays. But I have found that while a single start/stop pair can be used to slice the numpy array, slicing from an array of beginning indices to an array of ending indices is not possible.
EDIT 1 (MWE):
The following is the function that I have written, which produces a random-walk trajectory if given the appropriate parameters:
def ctrw_ens2d(sig,tau,sig2,tau2,alpha,xmax,Nens,Nt=1000,dt=1.0):
    # first define the array for time
    T = np.arange(0,Nt,1)*dt
    # generate at least Nt random time increments based on Poisson
    # distribution (you will use only less than that)
    trand = np.random.exponential(tau, (2*Nt,Nens,1))
    xrand = np.random.normal(0.0,sig,(2*Nt,Nens,2))
    Xdist = np.random.lognormal(-1,0.9,(Nens))
    Xdist = np.clip(Xdist,2*sig,12*sig)
    trand2 = np.random.exponential(tau2, (2*Nt,Nens,1))
    xrand2 = np.random.normal(0.0,sig2,(2*Nt,Nens,2))
    # make a zero array of trajectory
    x1 = np.zeros((Nt,Nens))
    x2 = np.zeros((Nt,Nens))
    y1 = np.zeros((Nt,Nens))
    y2 = np.zeros((Nt,Nens))
    for k in range(Nens):
        # start building the trajectory
        stop = 0
        i = 0
        while (stop < Nt):
            # reset the stop time
            start = stop
            # increment the stop till the next waiting time
            stop += int(trand[i,k,0])
            # update the trajectory
            r1 = np.sqrt(x1[start-1,k]**2 + y1[start-1,k]**2)
            rr = np.linalg.norm(xrand[i,k])
            x1[start:stop,k] = x1[start-1,k] \
                + (1-int(abs((r1+rr)/(Xdist[k]))))* xrand[i,k,0] \
                - int(abs((r1+rr)/(Xdist[k])))* \
                np.sign(x1[start-1,k]/xrand[i,k,0])* xrand[i,k,0]
            y1[start:stop,k] = y1[start-1,k] \
                + (1-int(abs((r1+rr)/(Xdist[k]))))* xrand[i,k,1] \
                - int(abs((r1+rr)/(Xdist[k])))* \
                np.sign(y1[start-1,k]/xrand[i,k,1])* xrand[i,k,1]
            i += 1
        # randomly add jumps in between, at a later stage
        stop = 1
        i = 0
        while (stop < Nt):
            # reset the stop time
            start = stop
            # increment the stop till the next waiting time
            stop += int(trand2[i,k,0])
            # update the trajectory
            x2[start:stop,k] = x2[start-1,k] + xrand2[i,k,0]
            y2[start:stop,k] = y2[start-1,k] + xrand2[i,k,1]
            i += 1
    return T, (x1+x2), (y1+y2)
A simple run of the above function is given below,
Tmin = 0.61 # in ps
Tmax = 1000 # in ps
NT = int(Tmax/Tmin)*10
delt = (Tmax-0.0)/NT
print "Delta T, No. of timesteps:",delt,NT
Dint = 0.21 #give it Ang^2/ps
sig = 0.3 #in Ang
xmax = 5.*sig
tau = sig**2/(2*Dint)/delt # from ps, convert it into the required units according to delt
print "Waiting time for confined motion (in Delta T units)",tau
Dj = 0.03 # in Ang^2/ps
tau2 = 10 # in ps
sig2 = np.sqrt(2*Dj*tau2)
print "Sigma 2:", sig2
tau2 = tau2/delt
alpha = 1
tim, xtall, ytall = ctrw_ens2d(sig,tau,sig2,tau2,alpha,xmax,100,Nt=NT,dt=delt)
The generated trajectories can be plotted as follows,
rall = np.stack((xtall,ytall),axis=-1)
print rall.shape
print xtall.shape
print rall[:,99,:].shape
k = 19
plt.plot(xtall[:,k],ytall[:,k])
Starting with a zero array, the loop

while stop < Nt:
    start = stop
    stop += randint()
    x[start:stop] = x[start-1] + rand()

will create a series of steps.
A step can be achieved with the cumulative sum of an impulse:

while stop < Nt:
    start = stop
    stop += randint()
    x[start] = any()
np.cumsum(x, out=x)
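A toy example with fixed numbers makes the impulse-plus-cumsum idea concrete:

import numpy as np

x = np.zeros(8)
x[[2, 5]] = [1.0, -0.5]   # impulses at the step starts
np.cumsum(x, out=x)
print(x)                  # [0.  0.  1.  1.  1.  0.5 0.5 0.5]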
This applies to both the first and second loop.
The (x2, y2) pair is the easier to vectorise, because its increments do not depend on the previous values.
The (x1, y1) pair still requires a while loop, but each iteration can be vectorised across the ensemble.
The final result looks like this:
def ctrw_ens2d_vectorized(sig,tau,sig2,tau2,alpha,xmax,Nens,Nt=1000,dt=1.0):
    # first define the array for time
    T = np.arange(0,Nt,1)*dt
    # generate at least Nt random time increments based on Poisson
    # distribution (you will use only less than that)
    trand = np.random.exponential(tau, (2*Nt,Nens,1))
    xrand = np.random.normal(0.0,sig,(2*Nt,Nens,2))
    Xdist = np.random.lognormal(-1,0.9,(Nens))
    Xdist = np.clip(Xdist,2*sig,12*sig)
    trand2 = np.random.exponential(tau2, (2*Nt,Nens,1)).astype(np.int64)
    xrand2 = np.random.normal(0.0,sig2,(2*Nt,Nens,2))
    # make a zero array of trajectory
    x1 = np.zeros((Nt,Nens))
    x2 = np.zeros((Nt,Nens))
    y1 = np.zeros((Nt,Nens))
    y2 = np.zeros((Nt,Nens))

    # randomly add jumps in between, at a later stage
    stop = 1 + np.cumsum(trand2[:,:,0], axis=0)
    # vectorize the indices
    I, J = np.indices(stop.shape)
    m = stop < Nt            # vectorized loop stopping condition
    I = I[m]
    J = J[m]                 # indices only for the executed iterations
    # update x
    x2[stop[I,J], J] = xrand2[I,J,0]
    y2[stop[I,J], J] = xrand2[I,J,1]
    np.cumsum(x2, axis=0, out=x2)
    np.cumsum(y2, axis=0, out=y2)

    # this part is more complicated and I vectorized on axis 1
    stop = np.zeros(Nens, dtype=np.int64)
    start = np.zeros(Nens, dtype=np.int64)
    k = np.arange(Nens)
    zx1 = np.zeros_like(x1[0])
    zy1 = np.zeros_like(y1[0])
    assert np.all(trand > 0)
    m = k
    i = 0
    while np.any(stop < Nt):
        start[:] = stop
        stop[m] += trand[i,m,0].astype(np.int64)
        m = k[stop < Nt]
        r1 = np.sqrt(zx1[m]**2 + zy1[m]**2)
        rr = np.linalg.norm(xrand[i,m,:], axis=-1)  # axis requires numpy 1.8
        tx = (1-(abs((r1+rr)/(Xdist[m]))).astype(np.int64))* xrand[i,m,0] \
            - (abs((r1+rr)/(Xdist[m]))).astype(np.int64)* \
            np.sign(zx1[m]/xrand[i,m,0])* xrand[i,m,0]
        ty = (1-(abs((r1+rr)/(Xdist[m]))).astype(np.int64))* xrand[i,m,1] \
            - (abs((r1+rr)/(Xdist[m]))).astype(np.int64)* \
            np.sign(zy1[m]/xrand[i,m,1])* xrand[i,m,1]
        zx1[m] += tx[:] * (start[m] < stop[m])
        zy1[m] += ty[:] * (start[m] < stop[m])
        x1[start[m],m] = tx[:]
        y1[start[m],m] = ty[:]
        i += 1
    np.cumsum(x1, axis=0, out=x1)
    np.cumsum(y1, axis=0, out=y1)
    return T, (x1+x2), (y1+y2)
This runs ~8x faster than the original code here.

Faster iteration in a for loop over 2D arrays

I have a problem with optimization when computing errors for disparity map estimation.
To compute the errors I create a class with a method for each error measure. I need to iterate over every pixel to get an error.
These arrays are big, since I am iterating over 1937 x 1217 images. Do you know how to optimize this?
Here is the code of my method:
EDIT:
def mreError(self):
    s_gt = self.ref_disp_norm
    s_all = self.disp_bin
    s_r = self.disp_norm
    s_gt = s_gt.astype(np.float32)
    s_r = s_r.astype(np.float32)
    n, m = s_gt.shape
    all_arr = []
    for i in range(0, n):
        for j in range(0, m):
            if s_all[i,j] == 255:
                if s_gt[i,j] == 0:
                    sub_mre = 0
                else:
                    sub_mre = np.abs(s_gt[i,j] - s_r[i,j]) / s_gt[i,j]
                all_arr.append(sub_mre)
    mre_all = np.mean(all_arr)
    return mre_all
A straight up vectorisation of your method would be
def method_1(self):
    # get s_gt, s_all, s_r
    sub_mre = np.zeros(s_gt.shape, dtype=np.float32)
    idx = s_gt != 0
    sub_mre[idx] = np.abs((s_gt[idx] - s_r[idx]) / s_gt[idx])
    return np.mean(sub_mre[s_all == 255])
But since you're doing your averaging only for pixels where s_all is 255, you could also filter for those first and then do the rest
def method_2(self):
    # get s_gt, s_all, s_r
    idx = s_all == 255
    s_gt = s_gt[idx].astype(np.float32)
    s_r = s_r[idx].astype(np.float32)
    sub_mre = np.zeros_like(s_gt)
    idx = s_gt != 0
    sub_mre[idx] = np.abs((s_gt[idx] - s_r[idx]) / s_gt[idx])
    return np.mean(sub_mre)
Personally, I would favour the first method unless the second one turns out to be much faster. Calling the function only once and spending, for example, 40 ms vs 5 ms is not noticeable, and the readability of the function matters more.
You could simply use array operators instead of applying them to every element inside a for loop:
import numpy as np

# Creating 2000x2000 Test-Data
s_gt = np.random.randint(0,2,(2000,2000)).astype(np.float32)
s_r = np.random.randint(0,2,(2000,2000)).astype(np.float32)
s_all = np.random.randint(0,256,(2000,2000)).astype(np.float32)

def calc(s_gt, s_r, s_all):
    n, m = s_gt.shape
    all_arr = []
    for i in range(0, n):
        for j in range(0, m):
            if s_gt[i,j] == 0:
                sub_mre = 0
            else:
                sub_mre = np.abs(s_gt[i,j] - s_r[i,j]) / s_gt[i,j]
            if s_all[i,j] == 255:
                all_arr.append(sub_mre)
    mre_all = np.mean(all_arr)
    return mre_all

def calc_optimized(s_gt, s_r, s_all):
    sub_mre = np.abs((s_gt-s_r)/s_gt)
    sub_mre[s_gt==0] = 0
    return np.mean(sub_mre[s_all == 255])
When I test the speed of the two different approaches:
%time calc(s_gt, s_r, s_all)
Wall time: 27.6 s
Out[53]: 0.24686379928315413
%time calc_optimized(s_gt, s_r, s_all)
Wall time: 63.3 ms
__main__:34: RuntimeWarning: divide by zero encountered in true_divide
__main__:34: RuntimeWarning: invalid value encountered in true_divide
Out[54]: 0.2468638
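As an aside, the two RuntimeWarnings come from evaluating (s_gt-s_r)/s_gt before the zero entries of s_gt are masked out. A possible variant that avoids them, using the where argument of np.divide, is:

def calc_optimized_quiet(s_gt, s_r, s_all):
    sub_mre = np.zeros_like(s_gt)
    # divide only where the denominator is non-zero; other entries stay 0
    np.divide(np.abs(s_gt - s_r), s_gt, out=sub_mre, where=(s_gt != 0))
    return np.mean(sub_mre[s_all == 255])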
You can also just make the image grey first (this will speed up the calculations substantially); go check this link for how you can do it.

Boolean index did not match indexed array along dimension 0; dimension is 2 but corresponding boolean dimension is 18

I am trying to learn image processing in Python, but I get an error:
bins_num = int(brick_lbp.max() + 1)
brick_hist = np.histogram(brick_lbp, normed=True, bins=bins_num, range=(0, bins_num))
lbp_features = [brick_rot_lbp, grass_rot_lbp, wall_rot_lbp]
min_score = 1000  # Set a very large best score value initially
idx = 0  # To keep track of the winner
for feature in lbp_features:
    histogram, _ = np.histogram(feature, normed=True, bins=bins_num, range=(0, bins_num))
    p = np.asarray(brick_hist)
    q = np.asarray(histogram)
    filter_idx = np.logical_and(p != 0, q != 0)
    score = np.sum(p[filter_idx] * np.diff(p[filter_idx] / q[filter_idx]))
    if score < min_score:
        min_score = score
        winner = idx
    idx = idx + 1
boolean index did not match indexed array along dimension 0; dimension is 2 but corresponding boolean dimension is 18
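The error comes from the fact that np.histogram returns a (histogram, bin_edges) tuple: brick_hist above is that 2-element tuple, so p = np.asarray(brick_hist) has size 2 along dimension 0, while the boolean mask filter_idx has one entry per bin (18 here). A minimal fix is to unpack the tuple, just as is already done for histogram inside the loop:

brick_hist, _ = np.histogram(brick_lbp, normed=True, bins=bins_num, range=(0, bins_num))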
