I'm new to NumPy in general, so this is probably an easy question, but I'm clueless as to how to solve it.
I'm trying to implement the k-nearest-neighbours algorithm for classification of a data set.
There are two arrays named new_points and points that have shapes (30,4)
and (120,4) respectively (4 being the number of properties of each element).
I'm trying to calculate the distance between each new point and all old points using NumPy broadcasting:
def calc_no_loop(new_points, points):
    return np.sum((new_points-points)**2,axis=1)
# doesn't work; here is the log:
ValueError: operands could not be broadcast together with shapes (30,4) (120,4)
However, as per the rules of broadcasting, two arrays of shapes (30,4) and (120,4) are incompatible,
so I would appreciate any insight on how to solve this (using .reshape perhaps, I'm not sure).
Please note that I have already implemented the same function using one and two loops, but can't implement it without a loop:
def calc_two_loops(new_points, points):
    m, n = len(new_points), len(points)
    d = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            d[i, j] = np.sum((new_points[i] - points[j])**2)
    return d
def calc_one_loop(new_points, points):
    m, n = len(new_points), len(points)
    d = np.zeros((m, n))
    for i in range(m):
        # axis=1 keeps one squared distance per old point instead of a single scalar
        d[i] = np.sum((new_points[i] - points)**2, axis=1)
    return d
Let's create a smaller example:
nNew = 3; nOld = 5 # Number of new / old points
# New points
new_points = np.arange(100, 100 + nNew * 4).reshape(nNew, 4)
# Old points
points = np.arange(10, 10 + nOld * 8, 2).reshape(nOld, 4)
To compute the differences alone, run:
dfr = new_points[:, np.newaxis, :] - points[np.newaxis, :, :]
So far we have the differences in each property for every pair of points (every new point with every old point).
The shape of dfr is (3, 5, 4):
first dimension: the index of the new point,
second dimension: the index of the old point,
third dimension: the difference in each property.
Then, to sum squares of differences by points, run:
d = np.power(dfr, 2).sum(axis=2)
and this is your result.
For my sample data, the result is:
array([[31334, 25926, 21030, 16646, 12774],
[34230, 28566, 23414, 18774, 14646],
[37254, 31334, 25926, 21030, 16646]], dtype=int32)
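As a rough check of the broadcasting approach above, here is a minimal sketch that wraps it in a function and compares it against the two-loop version from the question on arrays of the original shapes. The random test data and the seed are assumptions made only for this illustration.

```python
import numpy as np

def calc_no_loop(new_points, points):
    # (m, 1, k) - (1, n, k) broadcasts to (m, n, k); summing over the
    # last axis leaves one squared distance per (new, old) pair.
    diff = new_points[:, np.newaxis, :] - points[np.newaxis, :, :]
    return np.sum(diff ** 2, axis=2)

def calc_two_loops(new_points, points):
    # Reference implementation taken from the question.
    m, n = len(new_points), len(points)
    d = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            d[i, j] = np.sum((new_points[i] - points[j]) ** 2)
    return d

rng = np.random.default_rng(0)      # assumed test data, not from the question
new_points = rng.random((30, 4))
points = rng.random((120, 4))

d = calc_no_loop(new_points, points)
print(d.shape)                                             # (30, 120)
print(np.allclose(d, calc_two_loops(new_points, points)))  # True
```

Note that the result holds squared distances; take np.sqrt of it if actual Euclidean distances are needed, although for ranking nearest neighbours the squared values give the same order.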
You have 30 new points and 120 old points, so if I understand you correctly, you want the result to be an array of distances with shape (120,30).
You could do
import numpy as np
points = np.random.random(120*4).reshape(120,4)
new_points = np.random.random(30*4).reshape(30,4)
def calc_no_loop(new_points, points):
    res = np.zeros([len(points[:,0]),len(new_points[:,0])])
    for idx in range(len(points[:,0])):
        res[idx,:] = np.sum((points[idx,:]-new_points)**2,axis=1)
    return np.sqrt(res)
test = calc_no_loop(new_points,points)
print(np.shape(test))
print(test)
Which gives
(120, 30)
[[0.67166838 0.78096694 0.94983683 ... 1.00960301 0.48076185 0.56419991]
[0.88156338 0.54951826 0.73919191 ... 0.87757896 0.76305462 0.52486626]
[0.85271938 0.56085692 0.73063341 ... 0.97884167 0.90509791 0.7505591 ]
...
[0.53968258 0.64514941 0.89225849 ... 0.99278462 0.31861253 0.44615026]
[0.51647526 0.58611128 0.83298535 ... 0.86669406 0.64931403 0.71517123]
[1.08515826 0.64626221 0.6898687 ... 0.96882542 1.08075076 0.80144746]]
But from your function name above I get the notion that you do not want a loop? Then you could do this instead:
def calc_no_loop(new_points, points):
    # Expand both arrays to shape (120, 30, 4) so they can be subtracted elementwise
    new_points1 = np.repeat(new_points[np.newaxis,...],len(points),axis=0)
    points1 = np.repeat(points[:,np.newaxis,:],len(new_points),axis=1)
    return np.sqrt(np.sum((new_points1-points1)**2 ,axis=2))
test = calc_no_loop(new_points,points)
print(np.shape(test))
print(test)
which has output:
(120, 30)
[[0.67166838 0.78096694 0.94983683 ... 1.00960301 0.48076185 0.56419991]
[0.88156338 0.54951826 0.73919191 ... 0.87757896 0.76305462 0.52486626]
[0.85271938 0.56085692 0.73063341 ... 0.97884167 0.90509791 0.7505591 ]
...
[0.53968258 0.64514941 0.89225849 ... 0.99278462 0.31861253 0.44615026]
[0.51647526 0.58611128 0.83298535 ... 0.86669406 0.64931403 0.71517123]
[1.08515826 0.64626221 0.6898687 ... 0.96882542 1.08075076 0.80144746]]
i.e. the same result. Note that I added np.sqrt() to the result, which you may have forgotten in your example above.
I am relatively new to python and numpy and am trying to cluster a dense matrix with floating point numbers and having dimensions of 256x256 using spectral clustering. Since the affinity matrix will be of size 65536x65536, a full affinity matrix cannot be computed (due to memory limitations). As such, I am currently calculating the affinity between a given matrix entry and its 5x5 local neighbourhood, and build a sparse graph (in 3-tuple representation).
To do so, I am using for loops (basically a sliding-window approach), which I think is not the most efficient way of doing it.
import numpy as np
def getAffinity(f1, f2):
    return np.exp(-np.linalg.norm(np.absolute(f1 - f2))/ 2.1)
G = np.arange(256*256).reshape((256,256))
dim1 = 256 # Dimension 1 of matrix
dim2 = 256 # Dimension 2 of matrix
values = np.zeros(1623076, dtype=np.float32) # To hold affinities
rows = np.zeros(1623076, dtype=np.int32) # To hold row index
cols = np.zeros(1623076, dtype=np.int32) # To hold column index
index = 0 # Running position in values / rows / cols
for i in range(dim1):
    for j in range(dim2):
        current = G[i, j]
        for k in range(np.maximum(0, i-2), np.minimum(dim1 , i+3)): # traverse rows
            for l in range(np.maximum(0, j-2), np.minimum(dim2 , j+3)): # traverse columns
                rows[index] = i*dim2 + j # row-major flat index of the centre entry
                cols[index] = k*dim2 + l # row-major flat index of the neighbour
                values[index] = getAffinity(current, G[k, l])
                index += 1
I was wondering whether there are any other efficient ways of achieving the same goal.
Here is a sparse matrix approach. It is >800x faster than the loopy code.
import numpy as np
from scipy import sparse
from time import perf_counter as pc
T = []
T.append(pc())
def getAffinity(f1, f2):
    return np.exp(-np.linalg.norm(np.absolute(f1 - f2))/ 2.1)
G = 2*np.arange(256*256).reshape((256,256))
dim1 = 256 # Dimension 1 of matrix
dim2 = 256 # Dimension 2 of matrix
values = np.zeros(1623076, dtype=np.float32) # To hold affinities
rows = np.zeros(1623076, dtype=np.int32) # To hold row index
cols = np.zeros(1623076, dtype=np.int32) # To hold column index
index = 0 # Running position in values / rows / cols
for i in range(dim1):
    for j in range(dim2):
        current = G[i, j]
        for k in range(np.maximum(0, i-2), np.minimum(dim1 , i+3)): # traverse rows
            for l in range(np.maximum(0, j-2), np.minimum(dim2 , j+3)): # traverse columns
                rows[index] = i*dim2 + j # row-major flat index of the centre entry
                cols[index] = k*dim2 + l # row-major flat index of the neighbour
                values[index] = getAffinity(current, G[k, l])
                index += 1
T.append(pc())
affs_OP = sparse.coo_matrix((values,(rows,cols))).tocsr()
import scipy.sparse as sp
def getAffinity(f1, f2): # similar to @PaulPanzer, I don't think OP is right
    return np.exp(-np.abs(f1 - f2)/ 2.1)
def affinity_block(dim = 256, dist = 2):
    i = np.arange(-dist, dist+1)
    init_block = sp.dia_matrix((np.ones((i.size, dim)), i), (dim, dim))
    out = sp.kron(init_block, init_block).tocoo()
    out.data = getAffinity(Gf[out.row], Gf[out.col])
    return out
T.append(pc())
Gf = G.ravel()
offsets = np.concatenate((np.mgrid[1:3,-2:3].reshape(2,-1).T,np.mgrid[:1,1:3].reshape(2,-1).T), axis=0)
def make_diag(yo,xo):
    o = 256*yo+xo
    diag = np.exp(-np.abs(Gf[o:]-Gf[:-o])/2.1)
    if xo>0:
        diag[:xo-256].reshape(-1,256)[:,-xo:] = 0
    elif xo<0:
        diag[:xo].reshape(-1,256)[:,:-xo] = 0
        diag[xo:] = 0
    return diag
diags = [make_diag(*o) for o in offsets]
offsets = np.sum(offsets*[256,1], axis=1)
affs_pp = sparse.diags([*diags,[np.ones(256*256)],*diags],np.concatenate([offsets,[0],-offsets]))
T.append(pc())
affs_df = affinity_block()
T.append(pc())
print("OP: {:.3f} s convert OP to sparse matrix: {:.3f} s pp {:.3f} s df: {:.3f} s".format(*np.diff(T)))
diff = affs_pp-affs_OP
diff *= diff.sign()
md = diff.max()
print(f"max deviation pp-OP: {md}")
print(f"number of different entries pp-df: {(affs_pp-affs_df).nnz}")
Sample run:
OP: 23.392 s convert OP to sparse matrix: 0.020 s pp 0.025 s df: 0.093 s
max deviation pp-OP: 2.0616356788405454e-08
number of different entries pp-df: 0
A bit of explanation, in 1D first to keep it simple. Let's imagine an actual sliding window, so we can use time as an intuitive axis:
space
+-------------->
|
t | xo... x: window center
i | oxo.. o: window off center
m | .oxo. .: non window
e | ..oxo
| ...ox
v
Time here is actually equivalent to space because we move with constant speed. We can now see that all the window points can be described as three diagonals. The offsets are 0, 1 and -1, but note that because the affinities are symmetric and the one for 0 is trivial, we only need to calculate them for 1.
Now let's skip to 2D. The smallest example we can do is a 3x3 window in a 4x4 array. In row-major order this looks like:
xo..oo..........
oxo.ooo.........
.oxo.ooo........
..ox..oo........
oo..xo..oo......
ooo.oxo.ooo.....
.ooo.oxo.ooo....
..oo..ox..oo....
....oo..xo..oo..
....ooo.oxo.ooo.
.....ooo.oxo.ooo
......oo..ox..oo
........oo..xo..
........ooo.oxo.
.........ooo.oxo
..........oo..ox
The relevant offsets are (0,1), (1,-1), (1,0), (1,1), or in row-major order 0x4+1 = 1, 1x4-1 = 3, 1x4+0 = 4, 1x4+1 = 5. Also note that most of these diagonals are not complete; the missing bits are explained by the row-major indexing wrapping around. For example, at z = (y,x) with x = 3, the flat neighbor z+1 is not actually the right neighbor (y,x+1); because of the line jump, it is (y+1,0). The if-else clause in the code above blanks out the appropriate bits of each diagonal.
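To make the diagonal picture concrete, here is a small sketch that rebuilds the 4x4 / 3x3-window pattern by brute force and lists which diagonals are occupied. The helper name window_mask is made up for this illustration and is not part of the answer's code.

```python
import numpy as np

def window_mask(height, width, dist=1):
    """Dense boolean matrix: entry [p, q] is True when the flat (row-major)
    pixels p and q are at most `dist` rows and `dist` columns apart."""
    y, x = np.divmod(np.arange(height * width), width)  # row-major coordinates
    near_y = np.abs(y[:, None] - y[None, :]) <= dist
    near_x = np.abs(x[:, None] - x[None, :]) <= dist
    return near_y & near_x

mask = window_mask(4, 4, dist=1)
rows, cols = np.nonzero(mask)

# Occupied diagonals, as column index minus row index
print(np.unique(cols - rows).tolist())
# [-5, -4, -3, -1, 0, 1, 3, 4, 5]

# The offset-1 diagonal has gaps where the flat index wraps to the next row
print(np.diagonal(mask, offset=1).astype(int).tolist())
# [1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
```

The positive offsets 1, 3, 4 and 5 match the ones listed above, and the zeros in the offset-1 diagonal are exactly the wrapped entries that need to be blanked.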
@DanielF's strategy is similar but takes advantage of the block structure evident in the figure.
xo.. oo.. .... ....
oxo. ooo. .... ....
.oxo .ooo .... ....
..ox ..oo .... ....
oo.. xo.. oo.. ....
ooo. oxo. ooo. ....
.ooo .oxo .ooo ....
..oo ..ox ..oo ....
.... oo.. xo.. oo..
.... ooo. oxo. ooo.
.... .ooo .oxo .ooo
.... ..oo ..ox ..oo
.... .... oo.. xo..
.... .... ooo. oxo.
.... .... .ooo .oxo
.... .... ..oo ..ox
This seems to be a slightly more elegant and extensible, albeit somewhat (4x) slower, way to do the same thing as @PaulPanzer:
import numpy as np
import scipy.sparse as sp
from functools import reduce
def getAffinity(f1, f2): # similar to @PaulPanzer, I don't think OP is right
    return np.exp(-np.abs(f1 - f2)/ 2.1)
def affinity_block(G, dist = 2):
    Gf = G.ravel()
    i = np.arange(-dist, dist+1)
    init_blocks = [1]
    # One banded (dim x dim) block per axis of G
    for dim in G.shape:
        init_blocks.append(sp.dia_matrix((np.ones((i.size, dim)), i), (dim, dim)))
    # The Kronecker product of the blocks marks every pair of entries inside the window
    out = reduce(sp.kron, init_blocks).tocoo()
    out.data = getAffinity(Gf[out.row], Gf[out.col])
    return out
This allows non-square G matrices, and higher dimensions.
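As a sanity check of the Kronecker-product idea on a non-square grid, the following sketch uses dense NumPy arrays instead of scipy.sparse and compares the result with a brute-force neighbourhood test. The helper names band, kron_mask and brute_force_mask are assumptions made for this illustration only.

```python
import numpy as np

def band(n, dist):
    """Dense n x n 0/1 matrix with ones on the diagonals -dist..dist."""
    idx = np.arange(n)
    return (np.abs(idx[:, None] - idx[None, :]) <= dist).astype(int)

def kron_mask(shape, dist):
    """Kronecker product of one band matrix per axis (dense analogue of the
    sparse construction above)."""
    out = np.ones((1, 1), dtype=int)
    for n in shape:
        out = np.kron(out, band(n, dist))
    return out

def brute_force_mask(shape, dist):
    """Compare every pair of grid coordinates directly."""
    coords = np.indices(shape).reshape(len(shape), -1).T  # row-major coordinates
    diff = np.abs(coords[:, None, :] - coords[None, :, :])
    return (diff <= dist).all(axis=2).astype(int)

shape = (3, 5)  # a small non-square example
print(np.array_equal(kron_mask(shape, 2), brute_force_mask(shape, 2)))  # True
```

The dense version is only meant for small grids; for the 256x256 case the sparse construction is what keeps memory use manageable.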
I have the following line of code:
np.argmin(distances, axis = 0)
Here distances is a distance matrix between k centroids and n points, so it's a k x n matrix.
With this line of code I'm trying to find the closest centroid for each point by taking the argmin along axis 0.
My goal is to have similar vectorized code without the axis argument, as it is not implemented in the fork of NumPy I'm using.
Any help would be nice :)
Here's a vectorized one -
def partial_argsort(a):
    idar = np.zeros(a.max()+1,dtype=int)
    idar[a] = np.arange(len(a))
    return idar[np.sort(a)]
def argmin_0(a):
    # Define a scaling array to scale each col such that each col is
    # offset against its previous one
    s = (a.max()+1)*np.arange(a.shape[1])
    # Scale each col, flatten with col-major order. Find global partial-argsort.
    # With the offsetting, those argsort indices would be limited to per-col
    # Subtract each group of ncols elements based on the offsetting.
    m,n = a.shape
    a1D = (a+s).T.ravel()
    return partial_argsort(a1D)[::m]-m*np.arange(n)
Sample run for verification -
In [442]: np.random.seed(0)
     ...: a = np.random.randint(11,9999,(1000,1000))
     ...: idx0 = argmin_0(a)
     ...: idx1 = a.argmin(0)
     ...: r = np.arange(len(idx0))
     ...: print((a[idx0,r] == a[idx1,r]).all())
True
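To see the offsetting trick on something small enough to follow by hand, here is the same pair of functions applied to a 3x3 array. The input values are made up for this illustration. Because the values are used as indices into a lookup array, the approach assumes non-negative integer data, and with tied minima in a column it may pick a different row than np.argmin would.

```python
import numpy as np

def partial_argsort(a):
    # Lookup table: value -> position in a (values must be non-negative ints)
    idar = np.zeros(a.max() + 1, dtype=int)
    idar[a] = np.arange(len(a))
    return idar[np.sort(a)]

def argmin_0(a):
    m, n = a.shape
    # Shift column j by j * (max + 1) so the columns occupy disjoint value ranges
    s = (a.max() + 1) * np.arange(n)
    a1D = (a + s).T.ravel()  # column-major flattening
    # After a global sort, every m-th entry is the minimum of one column
    return partial_argsort(a1D)[::m] - m * np.arange(n)

a = np.array([[7, 2, 9],
              [3, 8, 4],
              [5, 6, 1]])
print(argmin_0(a).tolist())  # [1, 0, 2]
```

Here the shifted, flattened array is [7, 3, 5, 12, 18, 16, 29, 24, 21], so the smallest entry of each group of three sits at flat positions 1, 3 and 8, which map back to rows 1, 0 and 2.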
I'm currently trying to implement video stabilization using OpenCV and Python.
I use the following function to calculate rotation:
def accumulate_rotation(src, theta_x, theta_y, theta_z, timestamps, prev, current, f, gyro_delay=None, gyro_drift=None, shutter_duration=None):
    if prev == current:
        return src
    pts = []
    pts_transformed = []
    for x in range(10):
        current_row = []
        current_row_transformed = []
        pixel_x = x * (src.shape[1] / 10)
        for y in range(10):
            pixel_y = y * (src.shape[0] / 10)
            current_row.append([pixel_x, pixel_y])
            if shutter_duration:
                y_timestamp = current + shutter_duration * (pixel_y - src.shape[0] / 2)
            else:
                y_timestamp = current
            transform = getAccumulatedRotation(src.shape[1], src.shape[0], theta_x, theta_y, theta_z, timestamps, prev,
                                               current, f, gyro_delay, gyro_drift)
            output = cv2.perspectiveTransform(np.array([[pixel_x, pixel_y]], dtype="float32"), transform)
            current_row_transformed.append(output)
        pts.append(current_row)
        pts_transformed.append(current_row_transformed)
    o = utilities.meshwarp(src, pts_transformed)
    return o
I get the following error when it gets to output = cv2.perspectiveTransform(np.array([[pixel_x, pixel_y]], dtype="float32"), transform):
cv2.error: /Users/travis/build/skvark/opencv-python/opencv/modules/core/src/matmul.cpp:2271: error: (-215) scn + 1 == m.cols in function perspectiveTransform
Any help or suggestions would really be appreciated.
This implementation really needs to be changed in a future version, or the docs should be more clear.
From the OpenCV docs for perspectiveTransform():
src – input two-channel (...) floating-point array
Emphasis on "two-channel" added by me.
>>> A = np.array([[0, 0]], dtype=np.float32)
>>> A.shape
(1, 2)
So we see from here that A is just a single-channel matrix, that is, two-dimensional. One row, two cols. You instead need a two-channel image, i.e., a three-dimensional matrix where the length of the third dimension is 2 or 3 depending on if you're sending in 2D or 3D points.
Long story short, you need to add one more set of brackets to make the set of points you're sending in three-dimensional, where the x values are in the first channel, and the y values are in the second channel.
>>> A = np.array([[[0, 0]]], dtype=np.float32)
>>> A.shape
(1, 1, 2)
Also, as suggested in the comments:
If you have an array points of shape (n_points, dimension) (i.e. dimension is 2 or 3), a nice way to re-format it for this use-case is points[np.newaxis]
It's not intuitive, and though it's documented, it's not very explicit on that point. That's all you need. I've answered an identical question before, but for the cv2.transform() function.
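The shape handling described above can be sketched with NumPy alone. The snippet below does not call OpenCV; it only shows how a single point and an (n_points, 2) array end up with the three-dimensional layout discussed here. The coordinate values are made up for the illustration.

```python
import numpy as np

# A single 2D point with the extra pair of brackets: shape (1, 1, 2)
single = np.array([[[10.0, 20.0]]], dtype=np.float32)
print(single.shape)   # (1, 1, 2)

# Several points stored as (n_points, dimension)
points = np.array([[0, 0], [320, 240], [640, 480]], dtype=np.float32)
print(points.shape)   # (3, 2)

# Adding a leading axis gives one "row" of n two-channel points
batched = points[np.newaxis]
print(batched.shape)  # (1, 3, 2)

# x values live in the first channel, y values in the second
print(batched[0, :, 0].tolist())  # [0.0, 320.0, 640.0]
print(batched[0, :, 1].tolist())  # [0.0, 240.0, 480.0]
```

An array shaped like single or batched is what would then be passed as the first argument in place of the two-dimensional array from the question.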
Here's the code:
x = range(-6,7)
tmp1 = []
for i in range(len(x)):
    tmp1.append(math.exp(-(i*i)/(2*self.sigma*self.sigma)))
max_tmp1 = max(tmp1)
mod_tmp1 = []
for i in range(len(tmp1)):
    mod_tmp1.append(max_tmp1 - i)
ht1 = np.kron(np.ones((9,1)),tmp1)
sht1 = sum(ht1.flatten(1))
mean = sht1/(13*9)
ht1 = ht1 - mean
ht1 = ht1/sht1
print ht1.shape
h = np.zeros((16,16))
for i in range(0, 9):
    for j in range(0, 13):
        h[i+3, j+1] = ht1[i, j]
for i in range(0, 10):
    ag = 15*i
    np.append(h, scipy.misc.imrotate(h, ag, 'bicubic'))
R = []
print h.shape
print self.img.shape
for i in range(0, 11):
    print 'here'
    R[i] = scipy.signal.convolve2d(self.img, h[i], mode = 'same')
rt = np.zeros(self.img.shape)
x, y = self.img.shape
The error I get states:
ValueError: object of too small depth for desired array
It looks to me as if the problem is that you're setting h up wrongly. I assume you want h[i] to be a 16x16 array suitable for convolving with, but that's not what you've actually made it, for a couple of different reasons.
I suggest you change the loop with the imrotate calls to this:
h = [scipy.misc.imrotate(h, 15*i, 'bicubic') for i in range(10)]
(What your existing code does is: first set up h as a single 16x16 array; then, repeatedly: compute a rotated version, "flatten" both h and that to make 256-element vectors, compute the result of appending them to make a 512-element vector, and throw the result away. numpy.append doesn't operate in place, and defaults to flattening its arguments before it appends. Neither of those is what you want!)
The list comprehension above will give you a 10-element Python list containing rotated versions of your convolution kernel.
... Oh, I see that your loop computing R actually wants 11 kernels, not 10. Make it range(11), then. (Your original code generated rotations of 0, 0, 15, 30, ..., 135 degrees, but I'm guessing 0, 15, 30, ..., 150 degrees is more likely to be what you want.)
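The two np.append pitfalls mentioned above are easy to reproduce in isolation. In this sketch the array named rotated is only a stand-in for a rotated kernel (an assumption for the illustration); no image rotation is performed.

```python
import numpy as np

h = np.zeros((16, 16))
rotated = np.ones((16, 16))  # stand-in for a rotated copy of the kernel

# np.append returns a new array; without an axis argument it flattens both inputs
result = np.append(h, rotated)
print(h.shape)       # (16, 16)  -> h itself is unchanged
print(result.shape)  # (512,)    -> flattened and concatenated

# Collecting the kernels in a list keeps each one as a separate 16x16 array
kernels = [h, rotated]
print(len(kernels), kernels[1].shape)  # 2 (16, 16)

# If a single array is preferred, np.stack adds a new leading axis
stacked = np.stack(kernels)
print(stacked.shape)  # (2, 16, 16)
```

Either the list or the stacked array can then be indexed as kernels[i] or stacked[i] to get one two-dimensional kernel at a time.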
I'm working in a 3D context. I have some objects in this space which are represented by x, y, z positions.
# My object names (in my real context they are pheromone "points")
A = 1
B = 2
C = 3
D = 4
# My current way of storing their positions
pheromones_positions = {
    (25, 25, 60): [A, D],
    (10, 90, 30): [B],
    (5, 85, 8): [C]
}
My objective is to find which points (pheromones) are near (within a given distance of) a given location. I do this simply with:
from math import sqrt
def calc_distance(a, b):
    return sqrt((a[0]-b[0])**2+(a[1]-b[1])**2+(a[2]-b[2])**2)
def found_in_dict(search, points, distance):
    for point in points:
        if calc_distance(search, point) <= distance:
            return points[point]
founds = found_in_dict((20, 20, 55), pheromones_positions, 10)
# found [1, 4] (A and D)
But with a lot of pheromones it's very slow (testing them one by one...). How can I organize these 3D positions to find "positions within a given distance of a given position" more quickly?
Are there algorithms or libraries (NumPy?) that can help me with this?
You should compute all (squared) distances at once. With NumPy you can simply subtract the target point of size 1x3 from the (nx3) array of all position coordinates and sum the squared coordinate differences to obtain an array with n elements:
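positions = np.array(list(pheromones_positions.keys()))  # (n, 3) array of coordinates
squaredDistances = np.sum((positions - (20, 20, 55))**2, axis=1)
idx = np.where(squaredDistances <= 10**2)[0]  # indices of all positions within distance 10
values = list(pheromones_positions.values())
print([values[i] for i in idx])
Output:
[[1, 4]]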
By the way: Since your return statement is within the for-loop over all points, it will stop iterating after finding a first point. So you might miss a second or third match.
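The early-return issue noted above can be avoided by collecting every match before returning. The sketch below is one way to do that in plain Python; the function name find_all_in_dict and the third position in the sample dictionary are hypothetical, chosen so that more than one position falls within range.

```python
from math import sqrt

def calc_distance(a, b):
    return sqrt((a[0]-b[0])**2 + (a[1]-b[1])**2 + (a[2]-b[2])**2)

def find_all_in_dict(search, points, distance):
    """Collect the objects stored at every position within `distance` of `search`."""
    found = []
    for point, objects in points.items():
        if calc_distance(search, point) <= distance:
            found.extend(objects)  # keep going instead of returning at the first hit
    return found

pheromones_positions = {
    (25, 25, 60): [1, 4],
    (10, 90, 30): [2],
    (22, 18, 50): [3],  # hypothetical position close to the search point
}
print(find_all_in_dict((20, 20, 55), pheromones_positions, 10))  # [1, 4, 3]
```

This still tests every position one by one, so it only fixes the missed matches; the vectorized NumPy computation above is what addresses the speed.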