I understand the basics of how vectorization works, but I'm struggling to see how to apply that knowledge to my use case. I have a working algorithm for some image processing. However, the particular algorithm that I'm working with doesn't process the entire image: a border is left unprocessed to account for the "window" that gets shifted around the image.
I'm trying to use this to better understand NumPy's vectorization, but I can't figure out how to account for the window and the border. Below is what I have in vanilla Python (with the actual algorithm redacted; I'm only asking for help on how to vectorize). I looked into np.fromfunction and a few other options, but have had no luck. Any suggestions would be welcome at this point.
half_k = k_size // 2
U = np.zeros(img_a.shape, dtype=np.float64)
V = np.zeros(img_b.shape, dtype=np.float64)
for y in range(half_k, img_a.shape[0] - half_k):
    for x in range(half_k, img_a.shape[1] - half_k):
        # init variables for window calc goes here
        for j in range(y - half_k, y + half_k + 1):
            for i in range(x - half_k, x + half_k + 1):
                # stuff init-ed above gets added to here
        # final calc on things calculated in windows goes here
        U[y, x] = one_of_the_window_calculations
        V[y, x] = the_other_one
return U, V
I think you can start by creating an array of the indices of the patches with a function like this get_patch_idx:
def get_patch_idx(ind, array_shape, step):
    row_nums, col_nums = array_shape
    col_idx = ind - (ind // col_nums) * col_nums if ind % col_nums != 0 else col_nums
    row_idx = ind // col_nums
    if col_idx + step == col_nums or row_idx + step == row_nums \
            or col_idx - step == -1 or row_idx - step == -1:
        raise ValueError
    upper = [(row_idx-1)*col_nums + col_idx-1, (row_idx-1)*col_nums + col_idx, (row_idx-1)*col_nums + col_idx+1]
    middle = [row_idx*col_nums + col_idx-1, row_idx*col_nums + col_idx, row_idx*col_nums + col_idx+1]
    lower = [(row_idx+1)*col_nums + col_idx-1, (row_idx+1)*col_nums + col_idx, (row_idx+1)*col_nums + col_idx+1]
    return [upper, middle, lower]
Assume you have a (10, 8) array and half_k is 1:
test = np.linspace(1, 80, 80).reshape(10, 8) * 2
mask = np.arange(80).reshape(10, 8)[1:-1, 1:-1].ravel().astype(int)
in which the indices in mask are the allowed (interior) positions. Then you can create an array of indices of the patches:
patches_inds = np.array([get_patch_idx(ind, test.shape, 1) for ind in mask])
With patches_inds, patches of the original array test can be sliced out with np.take:
patches = np.take(test, patches_inds)
This avoids the nested window loops entirely; only a single comprehension over the valid positions remains.
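As a side note, on NumPy >= 1.20 numpy.lib.stride_tricks.sliding_window_view extracts the same patches without computing flat indices by hand. A minimal sketch, with the per-window calculation stubbed in as a sum since the real one is redacted:
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

k_size = 3
img_a = np.random.rand(10, 8)
windows = sliding_window_view(img_a, (k_size, k_size))
# windows.shape == (10 - k_size + 1, 8 - k_size + 1, k_size, k_size)
U_inner = windows.sum(axis=(2, 3))  # stand-in for the redacted per-window calc
# U_inner covers only the interior; the half_k border is excluded,
# exactly matching the bounds of the original loops.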
I am trying to construct two sliding windows over a multivariate sequence of data (m*n). The first window should be fixed and the second one rolls over the data samples. Both windows have the same size. I followed this post (distance calculations, FFT, sliding window) and built upon it, but I am struggling to implement it. The Python snippet of the code is below:
def sliding_dist(data, window_size):
    window_size = 10
    n = len(data)
    dist = np.zeros(n)
    for i in range(n - window_size):
        fixed_window = data[i:window_size, :]    # note: this slice shrinks as i grows
        rolling_window = data[i:i + window_size, :]
        distance = np.linalg.norm(fixed_window - rolling_window)
        dist[i] = distance
    return dist
My problem is how to fix the first window and keep the second one rolling over the n data samples. Logically, I am not quite sure whether the implementation of the for loop and the indexes is correct, or whether I should change them. I also have an issue with the indexes: I got an error about the shape: ValueError: operands could not be broadcast together with shapes (9,3) (10,3)
Edit:
I just edited the code to take the fixed window out of the for loop. It solved the previous error ValueError: operands could not be broadcast together with shapes (9,3) (10,3):
def sliding_dist(data, window_size):
    window_size = 10
    n = len(data)
    dist = np.zeros(n)
    fixed_window = data[:window_size, :]
    for i in range(n - window_size):
        rolling_window = data[i:i + window_size, :]
        distance = np.linalg.norm(fixed_window - rolling_window)
        dist[i] = distance
    return dist
I am still not sure whether the way I implemented the two sliding windows is correct. Could anyone share some thoughts and help on this?
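For reference, a fully vectorized sketch of the same computation, assuming data is an (n, m) float array and NumPy >= 1.20 for sliding_window_view; it reproduces the loop above without iterating in Python:
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def sliding_dist_vec(data, window_size=10):
    n, m = data.shape
    fixed_window = data[:window_size, :]                          # (w, m)
    windows = sliding_window_view(data, (window_size, m))[:, 0]   # (n - w + 1, w, m)
    norms = np.linalg.norm(windows - fixed_window, axis=(1, 2))   # Frobenius norm per window
    dist = np.zeros(n)
    dist[:n - window_size] = norms[:n - window_size]              # same range as the loop
    return dist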
I'm trying to define a function in Python that performs a sliding window on multiple signals (and uses the resulting sliding windows as input for ripser).
What I want to achieve is this (examples with sines, sorry for bad drawing skills):
[picture describing my goal]
I have 14 signals of 10000 points each, so a 14 x 10000 matrix, and I want to perform a sliding window on all the signals at once, correlating them by grouping the points of all the signals that fall in each window, given its dimension.
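For what it's worth, here is a minimal sketch of the grouping I mean, assuming sliding_window_view from NumPy >= 1.20 and an illustrative window length dim:
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

signals = np.random.rand(14, 10000)   # 14 signals of 10000 points
dim = 100                             # assumed window length
windows = sliding_window_view(signals, dim, axis=1)         # (14, 10000 - dim + 1, dim)
windows = windows.transpose(1, 0, 2).reshape(-1, 14 * dim)  # one row per window position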
I first tried using the code written by Christopher Tralie, but it gives me an error on the dimension of X, so now I'm trying to modify it.
import numpy as np
import scipy.interpolate

def slidingWindowMultipleSignals(I, dim, Tau, dT):
    '''
    Performs the sliding window on multiple signals.
    Author: Christopher J. Tralie
    '''
    N = I.shape[0]  # Number of frames
    P = I.shape[1]  # Number of pixels (possibly after PCA)
    pix = np.arange(P)
    NWindows = int(np.floor((N - dim * Tau) / dT))
    X = np.zeros((NWindows, dim * P))
    idx = np.arange(N)
    for i in range(NWindows):
        idxx = dT * i + Tau * np.arange(dim)
        start = int(np.floor(idxx[0]))
        end = int(np.ceil(idxx[-1])) + 2
        if end >= I.shape[0]:
            X = X[0:i, :]
            break
        f = scipy.interpolate.interp2d(pix, idx[start:end + 1],
                                       I[idx[start:end + 1], :], kind='linear')
        X[i, :] = f(pix, idxx).flatten()
    return X
The problem is that I don't know what to modify to make it do what I described with the image.
Can someone point me in the right direction?
I suspect the problem is located in the line
NWindows = int(np.floor((N-dim*Tau)/dT))
specifically in the use of /. I'd check the dtypes of dim, Tau and dT: under Python 2, / performs floor division when both operands are integers but true division when either is a float, so the window count can change depending on the types you pass in.
Also, Python expects the body of the function to be indented, which it isn't in your example.
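For example (values chosen just for illustration):
# Python 2: 7 / 2 == 3 (floor division), but 7.0 / 2 == 3.5 (true division)
# Python 3: 7 / 2 == 3.5 in both cases
# Explicit floor division behaves the same in both versions:
N, dim, Tau, dT = 10000, 100, 1, 5
NWindows = int((N - dim * Tau) // dT)  # == 1980 regardless of Python version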
I have some code for calculating missing values in an image, based on neighbouring values in a 2D circular window. It also uses the values from one or more temporally-adjacent images at the same locations (i.e. the same 2D window shifted in the 3rd dimension).
For each position that is missing, I need to calculate the value based not necessarily on all the values available in the whole window, but only on the spatially-nearest n cells that do have values (in both images / Z-axis positions), where n is some value less than the total number of cells in the 2D window.
At the moment it's much quicker to calculate for everything in the window, because the sorting needed to find the nearest n cells with data is the slowest part of the function: it has to be repeated for every gap, even though the distances in window coordinates never change. I'm not sure the repetition is necessary; I feel I should be able to compute the sorted distances once and then mask them while selecting only the available cells.
Here's my code for selecting the data to use within a window of the gap cell location:
# radius will in reality be ~100
radius = 2
y,x = np.ogrid[-radius:radius+1, -radius:radius+1]
dist = np.sqrt(x**2 + y**2)
circle_template = dist > radius
# this will in reality be a very large 3 dimensional array
# representing daily images with some gaps, indicated by 0s
dataStack = np.zeros((2,5,5))
dataStack[1] = (np.random.random(25) * 100).reshape(dist.shape)
dataStack[0] = (np.random.random(25) * 100).reshape(dist.shape)
testdata = dataStack[1]
alternatedata = dataStack[0]
random_gap_locations = (np.random.random(25) * 30).reshape(dist.shape) > testdata
testdata[random_gap_locations] = 0
testdata[radius, radius] = 0
# in reality we will go through every gap (zero) location in the data
# for each image and for each gap use slicing to get a window of
# size (radius*2+1, radius*2+1) around it from each image, with the
# gap being at the centre i.e.
# testgaplocation = [radius, radius]
# and the variables testdata, alternatedata below will refer to these
# slices
locations_to_exclude = np.logical_or(circle_template,
                                     np.logical_or(testdata == 0, alternatedata == 0))
# the places that are inside the circular mask and where both images
# have data
locations_to_include = ~locations_to_exclude
number_available = np.count_nonzero(locations_to_include)
# we only want to do the interpolation calculations from the nearest n
# locations that have data available, n will be ~100 in reality
number_required = 3
available_distances = dist[locations_to_include]
available_data = testdata[locations_to_include]
available_alternates = alternatedata[locations_to_include]
if number_available > number_required:
    # In this case we need to find the closest number_required elements, based
    # on distances recorded in dist, from available_data and available_alternates.
    # Having to repeat this argsort for each gap cell calculation is slow and feels
    # like it should be avoidable
    sortedDistanceIndices = available_distances.argsort(kind='mergesort', axis=None)
    requiredIndices = sortedDistanceIndices[0:number_required]
    selected_data = np.take(available_data, requiredIndices)
    selected_alternates = np.take(available_alternates, requiredIndices)
else:
    # we just use available_data and available_alternates as they are...
    pass
# now do stuff with the selected data to calculate a value for the gap cell
This works, but over half of the total time of the function is taken by the argsort of the masked spatial distance data (~900 µs of a total 1.4 ms, and this function will be running tens of billions of times, so this is an important difference!).
I am sure that I must be able to just do this argsort once, outside of the function, when the spatial distance window is originally set up, and then include those sort indices in the masking, to get the first number_required indices without having to re-do the sort. The answer might involve putting the various bits that we are extracting from into a record array, but I can't figure out how, if so. Can anyone see how I can make this part of the process more efficient?
So you want to do the sorting outside of the loop:
sorted_dist_idcs = dist.argsort(kind='mergesort', axis=None)
Then, using some variables from the original code, this is what I could come up with, though it still feels like a major round-trip...
loc_to_incl_sorted = locations_to_include.take(sorted_dist_idcs)
sorted_dist_idcs_to_incl = sorted_dist_idcs[loc_to_incl_sorted]
required_idcs = sorted_dist_idcs_to_incl[:number_required]
selected_data = testdata.take(required_idcs)
selected_alternates = alternatedata.take(required_idcs)
Note that required_idcs refers to locations in testdata and not available_data as in the original code. In this snippet I used take for the purpose of conveniently indexing the flattened array.
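As a quick illustration of take with flat indices:
import numpy as np

a = np.arange(12).reshape(3, 4)
np.take(a, [0, 5, 11])  # flat (C-order) indexing -> array([ 0,  5, 11])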
@moarningsun: thanks for the comment and answer. These got me on the right track, but don't quite work for me when the gap is < radius from the edge of the data: in that case I use a window around the gap cell which is "trimmed" to the data bounds. In that situation the indices reflect the "full" window and thus can't be used to select cells from the bounded window.
Unfortunately I edited that part of my code out when I clarified the original question but it's turned out to be relevant.
I've realised now that if you use argsort again on the output of argsort then you get ranks; i.e. the position that each item would have when the overall array was sorted. We can safely mask these and then take the smallest number_required of them (and do this on a structured array to get the corresponding data at the same time).
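A quick illustration of the rank trick:
import numpy as np

a = np.array([3.0, 1.0, 2.0])
order = a.argsort(kind='mergesort')  # indices that would sort a -> [1, 2, 0]
ranks = order.argsort()              # rank of each element     -> [2, 0, 1]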
This implies another sort within the loop, but in fact we can use partitioning rather than a full sort, because all we need is the smallest num_required items. If num_required is substantially less than the number of data items then this is much faster than doing the argsort.
For example, with num_required = 80 and num_available = 15000, the full argsort takes ~900 µs whereas argpartition followed by an index and slice to get the first 80 takes ~110 µs. We still need to do the argsort to get the ranks at the outset (rather than just partitioning based on distance) in order to get the stability of the mergesort, and thus get the "right one" when distance is not unique.
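The pattern, with sizes like those quoted above (a sketch, not the full function):
import numpy as np

vals = np.random.rand(15000)
k = 80
idx = np.argpartition(vals, k)[:k]  # indices of the k smallest values, in no particular order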
My code as shown below now runs in ~610 µs on real data, including the actual calculations that aren't shown here. I'm happy with that now, but there seem to be several other, apparently minor, factors that can influence the runtime and that are hard to understand.
For example, putting circle_template in the structured array along with dist, ranks, and another field not shown here doubles the runtime of the overall function (even if we don't access circle_template in the loop!). Even worse, using np.partition on the structured array with order=['ranks'] increases the overall function runtime by almost two orders of magnitude compared to using np.argpartition as shown below!
# radius will in reality be ~100
radius = 2
y,x = np.ogrid[-radius:radius+1, -radius:radius+1]
dist = np.sqrt(x**2 + y**2)
circle_template = dist > radius
ranks = dist.argsort(axis=None, kind='mergesort').argsort().reshape(dist.shape)
diam = radius * 2 + 1
# putting circle_template in this array too doubles overall function runtime!
# NB: in the real code the data dtypes come from the full-size arrays
# (dayDataStack, spatialDist); float64 is assumed here so the snippet runs
fullWindowArray = np.zeros((diam, diam), dtype=[('ranks', ranks.dtype.str),
                                                ('thisdata', np.float64),
                                                ('alternatedata', np.float64),
                                                ('dist', dist.dtype.str)])
fullWindowArray['ranks'] = ranks
fullWindowArray['dist'] = dist
# this will in reality be a very large 3 dimensional array
# representing daily images with some gaps, indicated by 0s
dataStack = np.zeros((2,5,5))
dataStack[1] = (np.random.random(25) * 100).reshape(dist.shape)
dataStack[0] = (np.random.random(25) * 100).reshape(dist.shape)
testdata = dataStack[1]
alternatedata = dataStack[0]
random_gap_locations = (np.random.random(25) * 30).reshape(dist.shape) > testdata
testdata[random_gap_locations] = 0
testdata[radius, radius] = 0
# in reality we will loop here to go through every gap (zero) location in the data
# for each image
gapz, gapy, gapx = 1, radius, radius
desLeft, desRight = gapx - radius, gapx + radius+1
desTop, desBottom = gapy - radius, gapy + radius+1
extentB, extentR = dataStack.shape[1:]
# handle the case where the gap is < search radius from the edge of
# the data. If this is the case, we can't use the full
# diam * diam window
dataL = max(0, desLeft)
maskL = 0 if desLeft >= 0 else abs(dataL - desLeft)
dataT = max(0, desTop)
maskT = 0 if desTop >= 0 else abs(dataT - desTop)
dataR = min(desRight, extentR)
maskR = diam if desRight <= extentR else diam - (desRight - extentR)
dataB = min(desBottom,extentB)
maskB = diam if desBottom <= extentB else diam - (desBottom - extentB)
# get the slice that we will be working within
# ranks and dist are already populated
boundedWindowArray = fullWindowArray[maskT:maskB, maskL:maskR]
boundedWindowArray['alternatedata'] = alternatedata[dataT:dataB, dataL:dataR]
boundedWindowArray['thisdata'] = testdata[dataT:dataB, dataL:dataR]
# circle_template is deliberately not a field of the structured array
# (see above), so mask with a matching slice of it instead
locations_to_exclude = np.logical_or(circle_template[maskT:maskB, maskL:maskR],
                                     np.logical_or(boundedWindowArray['thisdata'] == 0,
                                                   boundedWindowArray['alternatedata'] == 0))
# the places that are inside the circular mask and where both images
# have data
locations_to_include = ~locations_to_exclude
number_available = np.count_nonzero(locations_to_include)
# we only want to do the interpolation calculations from the nearest n
# locations that have data available, n will be ~100 in reality
number_required = 3
data_to_use = boundedWindowArray[locations_to_include]
if number_available > number_required:
    # argpartition seems to be very fast when number_required is
    # substantially < data_to_use.size.
    # But partition on the structured array itself with order=['ranks']
    # is almost 2 orders of magnitude slower!
    reqIndices = np.argpartition(data_to_use['ranks'], number_required)[:number_required]
    data_to_use = np.take(data_to_use, reqIndices)
else:
    # we just use available_data and available_alternates as they are...
    pass
# now do stuff with the selected data to calculate a value for the gap cell
I want to plot an approximation of the number "pi", which is generated by a function of two uniformly distributed random variables. The goal is to show that with a larger sample size the function value approaches "pi".
Here is my function for pi:
import numpy as np
import numpy.random as rnd

def pi(n):
    x = rnd.uniform(low=-1, high=1, size=n)  # n = size of draw
    y = rnd.uniform(low=-1, high=1, size=n)
    a = x**2 + y**2 <= 1      # 1 if random draw is inside the unit circle, else 0
    ac = np.count_nonzero(a)  # count 1's
    af = float(ac)            # create float for precision
    pi = (af / n) * 4         # compute pi estimate dependent on size of draw
    return pi
My problem:
I want to create a lineplot that plots the values from pi() dependent on n.
My first attempt was:
def pipl(n):
    for i in np.arange(1, n):
        plt.plot(np.arange(1, n), pi(i))
    print plt.show()

pipl(100)
which returns:
ValueError: x and y must have same first dimension
My second guess was to use an iterator:
def y(n):
    n = np.arange(1, n)
    for i in n:
        y = pi(i)
        print y

y(1000)
which results in:
3.13165829146
3.16064257028
3.06519558676
3.19839679359
3.13913913914
so the algorithm isn't far off; however, I need the output as a data type which matplotlib can read.
I read:
http://docs.scipy.org/doc/numpy/reference/routines.array-creation.html#routines-array-creation
and tried to implement the function like:
...
y = np.array(pi(i))
...
or
...
y = pi(i)
y = np.array(y)
...
and all the other functions that are available from that page. However, I can't seem to get my iterated y values into a form that matplotlib can read.
I am fairly new to Python, so please be considerate with my simple request. I am really stuck here and can't seem to solve this issue by myself.
Your help is really appreciated.
You can try this:
def pipl(n):
    plt.plot(np.arange(1, n), [pi(i) for i in np.arange(1, n)])
    print plt.show()

pipl(100)
which gives me this plot.
If you want to stay with your iterable approach you can use NumPy's fromiter() to collect the results into an array, like:
def pipl(n):
    for i in np.arange(1, n):
        yield pi(i)

n = 100
plt.plot(np.arange(1, n), np.fromiter(pipl(n), dtype='float32'))
But I think NumPy's vectorize would be even better in this case; it makes the resulting code much more readable (to me). With this approach you don't need the pipl function anymore.
# vectorize the function pi
pi_vec = np.vectorize(pi)
# define all n's
n = np.arange(1,101)
# and plot
plt.plot(n, pi_vec(n))
A little side note: naming a function pi when it doesn't return the true pi seems kinda tricky to me.
Alright, I had this homework recently (don't worry, I've already done it, but in C++) and I got curious how I could do it in Python. The problem is about two light sources that emit light. I won't get into details though.
Here's the code (which I've managed to optimize a bit in the latter part):
import math, array
import numpy as np
from PIL import Image

size = (800, 800)
width, height = size
s1x = width * 1. / 8
s1y = height * 1. / 8
s2x = width * 7. / 8
s2y = height * 7. / 8
r, g, b = (255, 255, 255)
arr = np.zeros((width, height, 3))
hy = math.hypot
print 'computing distances (%s by %s)' % size,
for i in xrange(width):
    if i % (width / 10) == 0:
        print i,
    if i % 20 == 0:
        print '.',
    for j in xrange(height):
        d1 = hy(i - s1x, j - s1y)
        d2 = hy(i - s2x, j - s2y)
        arr[i][j] = abs(d1 - d2)
print ''
arr2 = np.zeros((width, height, 3), dtype="uint8")
for ld in [200, 116, 100, 84, 68, 52, 36, 20, 8, 4, 2]:
    print 'now computing image for ld = ' + str(ld)
    arr2 *= 0
    arr2 += abs(arr % ld - ld / 2) * (r, g, b) / (ld / 2)
    print 'saving image...'
    ar2img = Image.fromarray(arr2)
    ar2img.save('ld' + str(ld).rjust(4, '0') + '.png')
    print 'saved as ld' + str(ld).rjust(4, '0') + '.png'
I have managed to optimize most of it, but there's still a huge performance gap in the part with the two nested for loops, and I can't seem to think of a way to bypass that using common array operations... I'm open to suggestions :D
Edit:
In response to Vlad's suggestion, I'll post the problem's details:
There are 2 light sources, each emitting light as a sinusoidal wave:
E1 = E0*sin(omega1*time+phi01)
E2 = E0*sin(omega2*time+phi02)
we consider omega1=omega2=omega=2*PI/T and phi01=phi02=phi0 for simplicity
by considering x1 to be the distance of a point on the plane from the first source, the light at that point is
Ep1 = E0*sin(omega*time - 2*PI*x1/lambda + phi0)
where
lambda = speed of light * T (period of oscillation)
Considering both light sources on the plane, the formula becomes
Ep = 2*E0*cos(PI*(x2-x1)/lambda) * sin(omega*time - PI*(x1+x2)/lambda + phi0)
and from that we can see that the intensity of the light is maximum when
PI*(x2-x1)/lambda = (2*k) * PI/2
and minimum when
PI*(x2-x1)/lambda = (2*k+1) * PI/2
and varies in between, where k is an integer.
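In code, the time-independent amplitude factor is what the images visualize; a sketch with assumed values for E0 and lambda:
import numpy as np

E0, lam = 1.0, 20.0                                    # assumed values for illustration
width = height = 800
i, j = np.mgrid[0:width, 0:height]
x1 = np.hypot(i - width / 8.0, j - height / 8.0)       # distance to source 1
x2 = np.hypot(i - 7.0 * width / 8.0, j - 7.0 * height / 8.0)
amplitude = 2 * E0 * np.cos(np.pi * (x2 - x1) / lam)   # extreme where (x2 - x1) is a multiple of lam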
For a given moment in time, given the coordinates of the light sources, and for a known lambda and E0, we had to write a program to draw how the light looks.
IMHO I think I optimized the problem as much as it can be...
Interference patterns are fun, aren't they?
So, first off, this is going to be minor, because running this program as-is on my laptop takes a mere twelve and a half seconds.
But let's see what can be done about doing the first bit through numpy array operations, shall we? Basically, you want:
arr[i][j] = abs(hypot(i-s1x,j-s1y) - hypot(i-s2x,j-s2y))
For all i and j.
So, since numpy has a hypot function that works on numpy arrays, let's use that. Our first challenge is to get an array of the right size with every element equal to i and another with every element equal to j. But this isn't too hard; in fact, an answer below points me at the wonderful numpy.mgrid, which I didn't know about before, and which does just this:
array_i,array_j = np.mgrid[0:width,0:height]
There is the slight matter of making your (width, height)-sized array into (width,height,3) to be compatible with your image-generation statements, but that's pretty easy to do:
arr = (arr * np.ones((3,1,1))).transpose(1,2,0)
Then we plug this into your program, and let things be done by array operations:
import math, array
import numpy as np
from PIL import Image

size = (800, 800)
width, height = size
s1x = width * 1. / 8
s1y = height * 1. / 8
s2x = width * 7. / 8
s2y = height * 7. / 8
r, g, b = (255, 255, 255)
array_i, array_j = np.mgrid[0:width, 0:height]
arr = np.abs(np.hypot(array_i - s1x, array_j - s1y) -
             np.hypot(array_i - s2x, array_j - s2y))
arr = (arr * np.ones((3, 1, 1))).transpose(1, 2, 0)
arr2 = np.zeros((width, height, 3), dtype="uint8")
for ld in [200, 116, 100, 84, 68, 52, 36, 20, 8, 4, 2]:
    print 'now computing image for ld = ' + str(ld)
    # Rest as before
And the new time is... 8.2 seconds. So you save maybe four whole seconds. On the other hand, that's almost exclusively in the image generation stages now, so maybe you can tighten them up by only generating the images you want.
If you use array operations instead of loops, it is much, much faster. For me, the image generation is now what takes the longest. Instead of your two i, j loops, I have this:
I,J = np.mgrid[0:width,0:height]
D1 = np.hypot(I - s1x, J - s1y)
D2 = np.hypot(I - s2x, J - s2y)
arr = np.abs(D1-D2)
# triplicate into 3 layers
arr = np.array((arr, arr, arr)).transpose(1,2,0)
# .. continue program
The basic thing to remember for the future is: this is not about optimization; using array forms in numpy is just using it like it is supposed to be used. With experience, your future projects should not take the detour through Python loops; the array forms should be the natural form.
What we did here was really simple. Instead of math.hypot we found numpy.hypot and used it. Like all such numpy functions, it accepts ndarrays as arguments, and does exactly what we want.
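For instance:
import numpy as np

np.hypot(np.array([3.0, 5.0]), np.array([4.0, 12.0]))  # -> array([ 5., 13.])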
List comprehensions are generally faster than explicit for loops. For example, instead of
for j in xrange(height):
    d1 = hy(i - s1x, j - s1y)
    d2 = hy(i - s2x, j - s2y)
    arr[i][j] = abs(d1 - d2)
You'd write
arr[i] = np.array([abs(hy(i - s1x, j - s1y) - hy(i - s2x, j - s2y))
                   for j in xrange(height)])[:, None]  # column shape broadcasts across the 3 channels
On the other hand, if you're really trying to "optimize", then you might want to reimplement this algorithm in C, and use SWIG or the like to call it from python.
The only change that comes to my mind is to move some operations out of the loop:
for i in xrange(width):
    if i % (width / 10) == 0:
        print i,
    if i % 20 == 0:
        print '.',
    arri = arr[i]
    is1x = i - s1x
    is2x = i - s2x
    for j in xrange(height):
        d1 = hy(is1x, j - s1y)
        d2 = hy(is2x, j - s2y)
        arri[j] = abs(d1 - d2)
The improvement, if any, will probably be minor though.