I am reading an image captured through OpenCV and want to map a function to every pixel value in the image. The result is an m x n x 3 NumPy array, where m and n are the height and width of the image and the three values are the corresponding blue, green, and red values for each pixel.
My first thought was to run a nested for loop over every value in the image. However, it takes a long time to run, so I am looking for a more efficient way to loop over the image quickly.
Here is the nested for loop:
a = list()
for row in img:
    for col in row:
        a.append(np.sqrt(np.prod(col[1:])))
adjusted = np.asarray(a).reshape((img.shape[0], img.shape[1]))
This code works, but I would like to make it run faster. I know vectorization could be an option, but I do not know how to apply it to only part of an array rather than the whole array. To do this, I think I could reshape it with img.reshape((np.prod(img.shape[:2]), 3)) and then loop over each set of three values, but I do not know the correct function/iterator to use.
Also, if OpenCV/NumPy/SciPy has a function that does just this, it would be a great help. I'm open to other options as well, but I wanted to give some ideas that I had.
In the end, I want to take the input and compute the geometric mean of the red and green values for each pixel, creating an m x n array of the geometric means. Any help would be appreciated!
This can be vectorized using the axis parameter of np.prod(). Setting axis=-1 causes the product to be taken only over the last axis.
To restrict the product to the last two channels, index the array to extract only those channels with img[..., 1:].
You can replace your code with the following line:
adjusted = np.sqrt(np.prod(img[..., 1:], axis=-1))
For fun, let's profile these two functions using some simulated data:
import numpy as np

img = np.random.random((100, 100, 3))

def original_function(img):
    a = []
    for row in img:
        for col in row:
            a.append(np.sqrt(np.prod(col[1:])))
    adjusted = np.asarray(a).reshape((img.shape[0], img.shape[1]))
    return adjusted

def improved_function(img):
    return np.sqrt(np.prod(img[:, :, 1:], axis=-1))
>>> %timeit -n 100 original_function(img)
100 loops, best of 3: 55.5 ms per loop
>>> %timeit -n 100 improved_function(img)
100 loops, best of 3: 115 µs per loop
500x improvement in speed! The beauty of numpy vectorization :)
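For completeness, the reshape idea mentioned in the question works too; here is a small sketch of it (the astype call is my addition, since a real image loaded with cv2.imread is uint8 and the green*red product would otherwise overflow):
flat = img.reshape(-1, 3).astype(np.float64)   # one row of [B, G, R] values per pixel
adjusted = np.sqrt(flat[:, 1] * flat[:, 2]).reshape(img.shape[:2])
This gives the same m x n result as the axis=-1 version above.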
I've implemented a k-means clustering algorithm in Python, and now I want to label new data with the clusters I got from my algorithm. My approach is to iterate through every data point and every centroid to find the minimum distance and the centroid associated with it, but I wonder if there are simpler or shorter ways to do it.
def assign_cluster(clusterDict, data):
    clusterList = []
    label = []
    cen = list(clusterDict.values())
    for i in range(len(data)):
        for j in range(len(cen)):
            # if cen[j] has the minimum distance to data[i],
            # then clusterList[i] = cen[j]
Here clusterDict is a dictionary whose keys are the labels [0, 1, 2, ...] and whose values are the coordinates of the centroids.
Can someone help me implement this?
This is a good use case for numba, because it lets you express this as a simple double loop without a big performance penalty, which in turn allows you to avoid the excessive extra memory of using np.tile to replicate the data across a third dimension just to do it in a vectorized manner.
Borrowing the standard vectorized numpy implementation from the other answer, I have these two implementations:
import numba
import numpy as np

def kmeans_assignment(centroids, points):
    num_centroids, dim = centroids.shape
    num_points, _ = points.shape
    # Tile and reshape both arrays into `[num_points, num_centroids, dim]`.
    centroids = np.tile(centroids, [num_points, 1]).reshape([num_points, num_centroids, dim])
    points = np.tile(points, [1, num_centroids]).reshape([num_points, num_centroids, dim])
    # Compute all distances (for all points and all centroids) at once and
    # select the min centroid for each point.
    distances = np.sum(np.square(centroids - points), axis=2)
    return np.argmin(distances, axis=1)

@numba.jit
def kmeans_assignment2(centroids, points):
    P, C = points.shape[0], centroids.shape[0]
    distances = np.zeros((P, C), dtype=np.float32)
    for p in range(P):
        for c in range(C):
            distances[p, c] = np.sum(np.square(centroids[c] - points[p]))
    return np.argmin(distances, axis=1)
Then for some sample data, I did a few timing experiments:
In [12]: points = np.random.rand(10000, 50)
In [13]: centroids = np.random.rand(30, 50)
In [14]: %timeit kmeans_assignment(centroids, points)
196 ms ± 6.78 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [15]: %timeit kmeans_assignment2(centroids, points)
127 ms ± 12.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
I won't go so far as to say that the numba version is certainly faster than the np.tile version, but it's clearly very close while not incurring the extra memory cost of np.tile.
In fact, I noticed on my laptop that when I make the shapes larger and use (10000, 1000) for the shape of points and (200, 1000) for the shape of centroids, np.tile generates a MemoryError, while the numba function runs in under 5 seconds with no memory error.
Separately, I actually noticed a slowdown when using numba.jit on the first version (with np.tile), which is likely due to the extra array creation inside the jitted function combined with the fact that there's not much numba can optimize when you're already calling only vectorized functions.
And I also did not notice any significant improvement in the second version when trying to shorten the code with broadcasting, e.g. shortening the double loop to
for p in range(P):
    distances[p, :] = np.sum(np.square(centroids - points[p, :]), axis=1)
did not really help anything (and would use more memory when repeatedly broadcasting points[p, :] across all of centroids).
This is one of the really nice benefits of numba. You really can write the algorithm in a very straightforward, loop-based way that matches standard descriptions of the algorithm, with finer-grained control over how the code translates into memory consumption or broadcasting... all without giving up runtime performance.
An efficient way to perform the assignment phase is with a vectorized computation. This approach assumes that you start with two 2D arrays, points and centroids, with the same number of columns (the dimensionality of the space) but possibly different numbers of rows. Using tiling (np.tile), we can compute the distance matrix in a batch and then select the closest cluster for each point.
Here's the code:
def kmeans_assignment(centroids, points):
    num_centroids, dim = centroids.shape
    num_points, _ = points.shape
    # Tile and reshape both arrays into `[num_points, num_centroids, dim]`.
    centroids = np.tile(centroids, [num_points, 1]).reshape([num_points, num_centroids, dim])
    points = np.tile(points, [1, num_centroids]).reshape([num_points, num_centroids, dim])
    # Compute all distances (for all points and all centroids) at once and
    # select the min centroid for each point.
    distances = np.sum(np.square(centroids - points), axis=2)
    return np.argmin(distances, axis=1)
See this GitHub gist for a complete runnable example.
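To tie this back to the question's clusterDict, here is a sketch under the assumption that its keys 0, 1, 2, ... are the labels and its values are the centroid coordinates (the example data below is made up for illustration):
import numpy as np

# Hypothetical data in the question's format.
clusterDict = {0: [0.0, 0.0], 1: [5.0, 5.0], 2: [0.0, 5.0]}
data = np.array([[0.1, 0.2], [4.8, 5.1], [0.3, 4.9]])

labels = sorted(clusterDict)                        # [0, 1, 2]
centroids = np.array([clusterDict[k] for k in labels])

assignments = kmeans_assignment(centroids, data)    # index of the closest centroid per point
label = [labels[i] for i in assignments]            # cluster label for each data point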
Consider a regular matrix that represents nodes numbered as shown in the figure:
I want to make a list with all the triangles represented in the figure, which would result in the following two-dimensional list: [[0,1,4],[1,5,4],[1,2,5],[2,6,5],...,[11,15,14]]
Assuming that the dimensions of the matrix are (Nr x Nc) ((4 x 4) in this case), I was able to achieve this result with the following code:
import numpy as np

def MakeFaces(Nr, Nc):
    Nfaces = (Nr-1)*(Nc-1)*2
    Faces = np.zeros((Nfaces, 3), dtype=np.int32)
    for r in range(Nr-1):
        for c in range(Nc-1):
            fi = (r*(Nc-1)+c)*2
            l1 = r*Nc+c
            l2 = l1+1
            l3 = l1+Nc
            l4 = l3+1
            Faces[fi] = [l1, l2, l3]
            Faces[fi+1] = [l2, l4, l3]
    return Faces
However, the double loop makes this approach quite slow. Is there a way to use NumPy in a smart way to do this faster?
We can play a multi-dimensional game based on slicing and multi-dimensional assignment, both of which NumPy handles very efficiently -
def MakeFacesVectorized1(Nr, Nc):
    out = np.empty((Nr-1, Nc-1, 2, 3), dtype=int)
    r = np.arange(Nr*Nc).reshape(Nr, Nc)
    out[:, :, 0, 0] = r[:-1, :-1]
    out[:, :, 1, 0] = r[:-1, 1:]
    out[:, :, 0, 1] = r[:-1, 1:]
    out[:, :, 1, 1] = r[1:, 1:]
    out[:, :, :, 2] = r[1:, :-1, None]
    out.shape = (-1, 3)
    return out
Runtime test and verification -
In [226]: Nr,Nc = 100, 100
In [227]: np.allclose(MakeFaces(Nr, Nc), MakeFacesVectorized1(Nr, Nc))
Out[227]: True
In [228]: %timeit MakeFaces(Nr, Nc)
100 loops, best of 3: 11.9 ms per loop
In [229]: %timeit MakeFacesVectorized1(Nr, Nc)
10000 loops, best of 3: 133 µs per loop
In [230]: 11900/133.0
Out[230]: 89.47368421052632
Around 90x speedup for Nr, Nc = 100, 100!
You can achieve a similar result without any explicit loops if you recast the problem correctly. One way would be to imagine the result as three arrays, each containing one of the vertices: first, second and third. You can then zip up or otherwise convert the arrays into whatever format you like in a fairly inexpensive operation.
You start with the actual matrix. This will make indexing and selecting elements much easier:
m = np.arange(Nr * Nc).reshape(Nr, Nc)
The first array will contain all the 90-degree corners:
c1 = np.concatenate((m[:-1, :-1].ravel(), m[1:, 1:].ravel()))
m[:-1, :-1] are the corners that are at the top, m[1:, 1:] are the corners that are at the bottom.
The second array will contain the corresponding top acute corners:
c2 = np.concatenate((m[:-1, 1:].ravel(), m[:-1, 1:].ravel()))
And the third array will contain the bottom corners:
c3 = np.concatenate((m[1:, :-1].ravel(), m[1:, :-1].ravel()))
You can now get an array like your original one back by zipping:
faces = list(zip(c1, c2, c3))
I am sure that you can find ways to improve this algorithm, but it is a start.
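If you would rather end up with a NumPy array like the Faces array from the question instead of a list of tuples, one option is to column-stack the three vertex arrays. As a caveat, this ordering lists all upper triangles first and then all lower triangles, which differs from the interleaved ordering produced by MakeFaces:
faces_arr = np.column_stack((c1, c2, c3))   # shape ((Nr-1)*(Nc-1)*2, 3)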
I am trying to downsample a fixed [Mx1] vector into any given [Nx1] dimensions using an averaging method. I have a dynamic window size that changes every time depending on the required output array. So in some cases I get lucky and get an integer window size that fits perfectly, and sometimes I get a floating-point number as a window size. But how can I use a floating-point window size to make an [Nx1] vector from a fixed [Mx1] vector?
Below is the code that I have tried:
import math
import numpy as np
import scipy
from scipy.spatial.distance import cosine  # assuming SciPy's cosine distance is used

chunk = 0.35

def fixed_meanVector(vec, chunk):
    size = (vec.size*chunk)  # size of output according to the chunk
    R = (vec.size/size)      # window size to transform the array into chunk size
    pad_size = math.ceil(float(vec.size)/R)*R - vec.size
    vec_padded = np.append(vec, np.zeros(pad_size)*np.NaN)
    print "Org Vector: ", vec.size, "output Size: ", size, "Windows Size: ", R, "Padding size", pad_size
    newVec = scipy.nanmean(vec_padded.reshape(-1, R), axis=1)
    print "New Vector shape: ", newVec.shape
    return newVec

print "Word Mean of N values Similarity: ", cosine(fixed_meanVector(vector1, chunk),
                                                   fixed_meanVector(vector2, chunk))
Output:
New Vector shape: (200,)
Org Vector: 400 output Size: 140.0 Windows Size: 2.85714285714 Padding size 0.0
New Vector shape: (200,)
0.46111661289
In the above example, I need to downsample the [Mx1] ([400x1]) vector to [Nx1] ([140x1]) dimensions. So, dynamically, a window size of [2.857x1] can be used to downsample the [Mx1] vector. But in this case I get a [200x1] vector as my output instead of [140x1], because the floating-point window size gets floored (floor(2.857) = 2), so the vector is downsampled with a [2x1] window instead.
The padding is zero because my window size fits the new [Nx1] dimensions perfectly. So, is there any way to use such window sizes to downsample an [Mx1] vector?
It is possible, but not natural, to vectorize this as soon as M % N > 0, because the number of cells used to build each element of the result is not constant (between 3 and 4 in your case).
The natural method is to run through the array, adjusting at each bin:
The idea is to fill each bin until it overflows, then cut off the overflow (the carry) and keep it for the next bin. The last carry is always zero with integer arithmetic.
The code:
import numpy as np

def resized(data, N):
    M = data.size
    res = np.empty(N, data.dtype)
    carry = 0
    m = 0
    for n in range(N):
        sum = carry
        # Fill bin n with whole input cells until it would overflow.
        while m*N - n*M < M:
            sum += data[m]
            m += 1
        # The fraction of the last cell that overflows into the next bin.
        carry = (m - (n+1)*M/N)*data[m-1]
        sum -= carry
        res[n] = sum*N/M
    return res
Test:
In [5]: resized(np.ones(7),3)
Out[5]: array([ 1., 1., 1.])
In [6]: %timeit resized(rand(400),140)
1000 loops, best of 3: 1.43 ms per loop
It works, but not very quickly. Fortunately, you can speed it up with numba:
from numba import jit
resized2=jit(resized)
In [7]: %timeit resized2(rand(400),140)
1 loops, best of 3: 8.21 µs per loop
Probably faster than any pure NumPy solution (here for M = 3*N):
In [8]: %timeit rand(402).reshape(-1,3).mean(1)
10000 loops, best of 3: 39.2 µs per loop
Note that it also works when N > M:
In [9]: resized(arange(4.),9)
Out[9]: array([ 0. , 0. , 0.75, 1. , 1.5 , 2. , 2.25, 3. , 3. ])
You're doing it wrong: you build a window for your required decimation, not the other way around.
Nyquist says you can't have bandwidth above fs/2, or you'll get nasty aliasing.
So to solve it you don't just "average", but low-pass filter, so that frequencies above fs/2 are below your acceptable noise floor.
Moving averages are a valid type of low-pass filter; you're just applying it to the wrong array.
The usual pipeline for arbitrary decimation is:
Upsample -> Lowpass -> Downsample
So, to be able to arbitrarily decimate from N to M samples, the algorithm is:
1. Find the LCM of your current sample count N and your target sample count M.
2. Upsample by LCM/N.
3. Design a filter with a stop frequency ws <= M/LCM.
4. Downsample by LCM/M.
What you call the averaging method is a FIR filter with a rectangular window.
If you use the first zero of that window's frequency response as the stop band, then you can calculate the needed window size K from
2/K <= M/LCM
so you must use windows of size:
K = ceil(2*LCM/M)
Obviously, you don't need to implement all of this yourself. Just design a proper window with ws <= M/LCM and apply it using scipy.signal.resample.
And if the ceil applied to the window size messes up your results, don't use rectangular windows; there are tons of better filters you can use.
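As a concrete illustration (my own sketch, assuming SciPy's polyphase resampler is acceptable here), scipy.signal.resample_poly performs exactly this upsample -> low-pass -> downsample pipeline in one call. For the question's 400 -> 140 case, LCM(400, 140) = 2800, so the factors are up = 2800/400 = 7 and down = 2800/140 = 20:
import numpy as np
from scipy.signal import resample_poly

vec = np.random.rand(400)                  # stand-in for the question's vector1
out = resample_poly(vec, up=7, down=20)    # upsample by 7, low-pass filter, downsample by 20
# out.shape == (140,)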
I am trying to vectorize a loop using NumPy but am struggling to achieve the desired results. I have an array of pixel values, so 3 dimensions, say (512, 512, 3), and I need to iterate over each x, y and calculate another value using a specific index in the third dimension. An example of this code in a standard loop is as follows:
for i in xrange(width):
    for j in xrange(height):
        temp = math.sqrt((scalar1-array[j,i,1])**2+(scalar2-array[j,i,2])**2)
What I am currently doing is this:
temp = np.sqrt((scalar1-array[:,:,1])**2+(scalar2-array[:,:,2])**2)
The temp array I get from this has the desired dimensions (x, y), but some of the values differ from the loop implementation. How can I eliminate the loop and compute this example efficiently in NumPy?
Thanks in advance!
Edit:
Here is code that is giving me differing results for temp and temp2; obviously temp2 is just the calculation for one cell:
temp = np.sqrt((cb_key-fg_cbcr_array[:,:,1])**2+(cr_key-fg_cbcr_array[:,:,2])**2)
temp2 = np.sqrt((cb_key-fg_cbcr_array[500,500,1])**2+(cr_key-fg_cbcr_array[500,500,2])**2)
print temp[500, 500]
print temp2
The output for the above is
12.039
94.069123521
The scalars are definitely initialized and the array is generated from an image using
fg = PIL.Image.open('fg.jpg')
fg_cbcr = fg.convert("YCbCr")
fg_cbcr_array = np.array(fg_cbcr)
Edit 2:
OK, so I have tracked it down to a problem with my array. I am not sure why yet, but it works when the array is generated with np.random.random and not when loading from a file using PIL as above.
Your vectorized solution is correct.
In your for loop, temp is a scalar that only keeps the last computed value.
Use np.sqrt instead of math.sqrt for vectorized inputs.
You should not use array as a variable name, since it can shadow np.array.
I checked using the following code, which may give you some tip about where the error may be:
import numpy as np

width = 512
height = 512
scalar1 = 1
scalar2 = 2

a = np.random.random((height, width, 3))
tmp = np.zeros((height, width))
for i in xrange(width):
    for j in xrange(height):
        tmp[j,i] = np.sqrt((scalar1-a[j,i,1])**2+(scalar2-a[j,i,2])**2)

tmp2 = np.sqrt((scalar1-a[:,:,1])**2+(scalar2-a[:,:,2])**2)

np.allclose(tmp, tmp2)
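One likely explanation for the PIL discrepancy mentioned in Edit 2 (an assumption on my part, since the question does not show the array's dtype): np.array(fg_cbcr) produces a uint8 array, and uint8 subtraction and squaring wrap around instead of producing negative or large intermediate values. Casting to float before the computation avoids that:
# Assumption: fg_cbcr_array has dtype uint8, so promote it to float first.
fg_float = fg_cbcr_array.astype(np.float64)
temp = np.sqrt((cb_key - fg_float[:, :, 1])**2 + (cr_key - fg_float[:, :, 2])**2)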
I've got three 3D arrays, which are the red, green and blue channels of a 3D RGB image. What is an elegant way in NumPy to create a histogram volume of the input channels?
The operation would be equivalent to
""" assume R, G and B are 3D arrays and output is a 3D array filled with zeros """
for x in x_dim:
for y in y_dim:
for z in z_dim:
output[ R[x][y][z] ][ G[x][y][z] ][ B[x][y][z] ] += 1
This code is too slow for large images. Can NumPy improve the efficiency of the above algorithm?
You can do it using numpy.histogramdd, but, as you say, the method proposed by @jozzas won't work. What you have to do is flatten each of your three 3D arrays and then combine them into a 2D array of shape (x_dim*y_dim*z_dim, 3), which you pass to histogramdd. The fact that your original data are 3D is a red herring, since the spatial information is irrelevant to calculating the histogram.
Here is an example using random data in the channel cubes:
import numpy

n = 400  # approximate largest cube size that works on my laptop

# Fill channel cubes with random 8-bit integers
r = numpy.random.randint(256, size=(n,n,n)).astype(numpy.uint8)
g = numpy.random.randint(256, size=(n,n,n)).astype(numpy.uint8)
b = numpy.random.randint(256, size=(n,n,n)).astype(numpy.uint8)

# Reorder data into a form suitable for histogramming
data = numpy.vstack((r.flat, g.flat, b.flat)).astype(numpy.uint8).T

# Destroy originals to save space
del(r); del(g); del(b)

m = 256  # size of 3d histogram cube
hist, edges = numpy.histogramdd(
    data, bins=m, range=((-0.5,255.5),(-0.5,255.5),(-0.5,255.5))
)

# Check that it worked
assert hist.sum() == n**3, 'Failed to conserve pixels'
This does use a lot more memory than you would expect because histogramdd seems to be using 64-bit floats to do its work, even though we are sending it 8-bit integers.
Assuming 8-bit channels, the 3-tuple of integers (R,G,B) can be thought of as a single number in base 256: R*256**2 + G*256 + B. Thus we can convert the 3 arrays R,G,B into a single array of "color values" and use np.bincount to produce the desired histogram.
import numpy as np

def using_bincount(r, g, b):
    r = r.ravel().astype('int32')
    g = g.ravel().astype('int32')
    b = b.ravel().astype('int32')
    output = np.zeros((base*base*base), dtype='int32')
    result = np.bincount(r*base**2 + g*base + b)
    output[:len(result)] += result
    output = output.reshape((base, base, base))
    return output

def using_histogramdd(r, g, b):
    data = np.vstack((r.flat, g.flat, b.flat)).astype(np.uint8).T
    del(r); del(g); del(b)
    hist, edges = np.histogramdd(
        data, bins=base, range=([0,base],[0,base],[0,base])
    )
    return hist

np.random.seed(0)
n = 200
base = 256
r = np.random.randint(base, size=(n,n,n)).astype(np.uint8)
g = np.random.randint(base, size=(n,n,n)).astype(np.uint8)
b = np.random.randint(base, size=(n,n,n)).astype(np.uint8)

if __name__ == '__main__':
    bhist = using_bincount(r, g, b)
    hhist = using_histogramdd(r, g, b)
    assert np.allclose(bhist, hhist)
These timeit results suggest using_bincount is faster than using_histogramdd, perhaps because histogramdd is built for handling floats and bins which are ranges, while bincount is solely for counting integers.
% python -mtimeit -s'import test' 'test.using_bincount(test.r,test.g,test.b)'
10 loops, best of 3: 1.07 sec per loop
% python -mtimeit -s'import test' 'test.using_histogramdd(test.r,test.g,test.b)'
10 loops, best of 3: 8.42 sec per loop
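For reference, here is a slightly shorter variant of the bincount idea (my own sketch, assuming a NumPy version where np.bincount accepts the minlength argument), which avoids the manual zero-padding of the output array:
def using_ravel_bincount(r, g, b, base=256):
    # Encode each (R, G, B) triple as a single integer in [0, base**3), then count.
    idx = np.ravel_multi_index((r.ravel().astype(np.intp),
                                g.ravel().astype(np.intp),
                                b.ravel().astype(np.intp)),
                               (base, base, base))
    return np.bincount(idx, minlength=base**3).reshape(base, base, base)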
You can use numpy's histogramdd to compute the histogram of an n-dimensional array. If you don't want a histogram for each 2d slice, be sure to set the bins for that dimension to 1.
To get the overall histogram, you could compute them individually for the R, G and B channels and then take the maximum value of the three for each position.