How to efficiently order a numpy matrix

How to efficiently order a numpy matrix - python

I have this numpy array
matrix = np.array([[ 0.8, 0.2, 0.1],
[ 1. , 0. , 0. ],
[ 0. , 0. , 1. ]])
and I would like to filter to return, for each row of matrix the indices in decreasing value order.
For example, this would be
np.array([[0, 1, 2], [0, 1, 2], [2, 0, 1]])
I know I could use np.argsort, but this doesn't seem to be returning the right output. I tried changing the axis to different values, but that doesn't help either.

Probably the easiest way to get your desired output would be:
(-matrix).argsort(axis=1)
# array([[0, 1, 2],
# [0, 1, 2],
# [2, 0, 1]])

I think np.argsort does seem to do the trick, you just need to make sure to flip the matrix horizontally to make it decreasing order:
>>>matrix = np.array(
[[ 0.8, 0.2, 0.1],
[ 1. , 0. , 0. ],
[ 0. , 0. , 1. ]])
>>> np.fliplr(np.argsort(matrix))
array([[0, 1, 2],
[0, 2, 1],
[2, 1, 0]])
This should be the right output unless you have any requirements for sorting ties. Right now the flipping would make the rightmost tie the first index. If you wanted to match your exact output, where the leftmost index is first you could do a bit of juggling:
# Flip the array first and get the indices
>>> flipped = np.argsort(np.fliplr(matrix))
# Subtract the width of your array to reverse the indices
# Flip the array to be in descending order
>>> np.fliplr(abs(flipped - flipped.shape[1]))
array([[0, 1, 2],
[0, 1, 2],
[2, 0, 1]])

Related

Understanding _r from numpy

a = np.zeros([4, 4])
b = np.ones([4, 4])
#vertical stacking(ROW WISE)
print(np.r_[a,b])
print(np.r_[[1,2,3],0,0,[4,5,6]])
# output is
[[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]]
[[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]]
[1 2 3 0 0 4 5 6]
But here np._r doesn't perform vertical stacking, but does horizontal stacking. How does np._r work? Would be grateful for any help

In [324]: a = np.zeros([4, 4],int)
...: b = np.ones([4, 4],int)
In [325]: np.r_[a,b]
Out[325]:
array([[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1]])
This is a row stack; same as vstack. And since the arrays are already 2d, concatenate is enough:
In [326]: np.concatenate((a,b), axis=0)
Out[326]:
array([[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1]])
With the mix of 1d and scalars, r_ is the same as hstack:
In [327]: np.r_[[1,2,3],0,0,[4,5,6]]
Out[327]: array([1, 2, 3, 0, 0, 4, 5, 6])
In [328]: np.hstack([[1,2,3],0,0,[4,5,6]])
Out[328]: array([1, 2, 3, 0, 0, 4, 5, 6])
In [329]: np.concatenate([[1,2,3],0,0,[4,5,6]],axis=0)
...
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 1 dimension(s) and the array at index 1 has 0 dimension(s)
concatenate fails because of the scalars. The other methods first convert those to 1d arrays.
In both case, r_ does
Translates slice objects to concatenation along the first axis.
r_ is actually an instance of a special class, with its own __getitem__ method, that allows us to use [] instead of (). It also means it can take slices as inputs (which are actually rendered as np.arange or np.linspace).
r_ takes an optional initial string argument, which if consisting of 3 numbers, can control the concatenate axis, and control how inputs are adjusted to matching dimensions. See the docs for details, and np.lib.index_tricks.py file for more details.
In order of importance I think the concatenate functions are:
np.concatenate # base
np.vstack # easy join 1d arrays into 2d
np.stack # generalize np.array
np.hstack # saves specifying axis
np.r_
np.c_
r_ and c_ can do neat things when mixing arrays of different shapes, but it all boils down to using concatanate correctly.

How to invert only negative elements in numpy matrix?

I have a matrix containing positive and negative numbers like this:
>>> source_matrix
array([[-4, -2, 0],
[-5, 0, 4],
[ 0, 6, 5]])
I'd like to had a copy of this matrix with inverted negatives:
>>> result
array([[-0.25, -0.5, 0],
[-0.2, 0, 4],
[ 0, 6, 5]])

Firstly, since your desired array is gonna contain float type you need to determine the array's dtype at creation time as float. The reason for that is because if you assign the float results of the inverted sub-array they'll automatically be casted to float. Secondly, you need to find the negative numbers in your array and then use a simple indexing in order to grab them and use np.true_divide() to perform the inversion.
In [25]: arr = np.array([[-4, -2, 0],
...: [-5, 0, 4],
...: [ 0, 6, 5]], dtype=np.float)
...:
...:
In [26]: mask = arr < 0
In [27]: arr[mask] = np.true_divide(1, arr[mask])
In [28]: arr
Out[28]:
array([[-0.25, -0.5 , 0. ],
[-0.2 , 0. , 4. ],
[ 0. , 6. , 5. ]])

You can also achieve this without masking, by using the where and out params of true_divide.
a = np.array([[-4, -2, 0],
[-5, 0, 4],
[ 0, 6, 5]], dtype=np.float)
np.true_divide(1, a, out=a, where=a<0)
Giving the result:
array([[-0.25, -0.5 , 0. ],
[-0.2 , 0. , 4. ],
[ 0. , 6. , 5. ]])
The where= parameter is passed an array of the same dimensions as your two inputs. Where this evaluates to True the divide is performed. Where it evaluates to False, the original input, passed in via out= is output into the result unchanged.

Python, issue with ties using argsort

I have the following problem with sorting a 2D array using the function argsort.
More precisely, let's assume I have 5 points and have calculated the euclidean distances between them, which are stored in the 2D array D:
D=np.array([[0,0.3,0.4,0.2,0.5],[0.3,0,0.2,0.6,0.1],
[0.4,0.2,0,0.5,0],[0.2,0.6,0.5,0,0.7],[0.5,0.1,0,0.7,0]])
D
array([[ 0. , 0.3, 0.4, 0.2, 0.5],
[ 0.3, 0. , 0.2, 0.6, 0.1],
[ 0.4, 0.2, 0. , 0.5, 0. ],
[ 0.2, 0.6, 0.5, 0. , 0.7],
[ 0.5, 0.1, 0. , 0.7, 0. ]])
Each element D[i,j] (i,j=0,...,4) shows the distance between point i and point j. The diagonal entries are of course equal to zero, as they show the distance of a point to itself. However, there can be 2 or more points which overlap. For instance, in this particular case, point 4 is located in the same position of point 2, so that the the distances D[2,4] and D[4,2] are equal to zero.
Now, I want to sort this array D: for each point i I want to know the indices of its neighbour points, from the closest to the furthest one. Of course, for a given point i the first point/index in the sorted array should be i, i.e. the the closest point to point 1 is 1. I used the function argsort:
N = np.argsort(D)
N
array([[0, 3, 1, 2, 4],
[1, 4, 2, 0, 3],
[2, 4, 1, 0, 3],
[3, 0, 2, 1, 4],
[2, 4, 1, 0, 3]])
This function sorts the distances properly until it gets to point 4: the first entry of the 4th row (counting from zero) is not 4 (D[4,4]=0) as I would like. I would like the 4th row to be [4, 2, 1, 0, 3]. The first entry is 2, because points 2 and 4 overlap so that D[2,4]=D[4,2], and between the same value entries D[2,4]=0 and D[4,2]=0, argsort selects always the first one.
Is there a way to fix this so that the sorted array N[i,j] of D[i,j] always starts with the indices corresponding to the diagonal entries D[i,i]=0?
Thank you for your help,
MarcoC

One way would be to fill the diagonal elements with something lesser than global minimum and then use argsort -
In [286]: np.fill_diagonal(D,D.min()-1) # Or use -1 for filling
# if we know beforehand that the global minimum is 0
In [287]: np.argsort(D)
Out[287]:
array([[0, 3, 1, 2, 4],
[1, 4, 2, 0, 3],
[2, 4, 1, 0, 3],
[3, 0, 2, 1, 4],
[4, 2, 1, 0, 3]])
If you don't want the input array to be changed, make a copy and then do the diagonal filling.

How about this:
import numpy as np
D = np.array([[ 0. , 0.3, 0.4, 0.2, 0.5],
[ 0.3, 0. , 0.2, 0.6, 0.1],
[ 0.4, 0.2, 0. , 0.5, 0. ],
[ 0.2, 0.6, 0.5, 0. , 0.7],
[ 0.5, 0.1, 0. , 0.7, 0. ]])
s = np.argsort(D)
line = np.argwhere(s[:,0] != np.arange(D.shape[0]))[0,0]
column = np.argwhere(s[line,:] == line)[0,0]
s[line,0], s[line, column] = s[line, column], s[line,0]
Just find the lines that don't have the dioganal element in front using numpy.argwhere, then the column to swap and then swap the elements. Then s contains what you want in the end.
This works for your example. In a general case, where numpy.argwhere can contain several elements, one would have to run a loop over those elements instead of just typing [0,0] at the end of the two lines of code above.
Hope I could help.

Replace all elements of a matrix by their inverses

I've got a simple problem and I can't figure out how to solve it.
Here is a matrix: A = np.array([[1,0,3],[0,7,9],[0,0,8]]).
I want to find a quick way to replace all elements of this matrix by their inverses, excluding of course the zero elements.
I know, thanks to the search engine of Stackoverflow, how to replace an element by a given value with a condition. On the contrary, I do not figure out how to replace elements by new elements depending on the previous ones (e.g. squared elements, inverses, etc.)

Use 1. / A (notice the dot for Python 2):
>>> A
array([[1, 0, 3],
[0, 7, 9],
[0, 0, 8]], dtype)
>>> 1./A
array([[ 1. , inf, 0.33333333],
[ inf, 0.14285714, 0.11111111],
[ inf, inf, 0.125 ]])
Or if your array has dtype float, you can do it in-place without warnings:
>>> A = np.array([[1,0,3], [0,7,9], [0,0,8]], dtype=np.float64)
>>> A[A != 0] = 1. / A[A != 0]
>>> A
array([[ 1. , 0. , 0.33333333],
[ 0. , 0.14285714, 0.11111111],
[ 0. , 0. , 0.125 ]])
Here we use A != 0 to select only those elements that are non-zero.
However if you try this on your original array you'd see
array([[1, 0, 0],
[0, 0, 0],
[0, 0, 0]])
because your array could only hold integers, and inverse of all others would have been rounded down to 0.
Generally all of the numpy stuff on arrays does element-wise vectorized transformations so that to square elements,
>>> A = np.array([[1,0,3],[0,7,9],[0,0,8]])
>>> A * A
array([[ 1, 0, 9],
[ 0, 49, 81],
[ 0, 0, 64]])

And just a note on Antti Haapala's answer, (Sorry, I can't comment yet)
if you wanted to keep the 0's, you could use
B=1./A #I use the 1. to make sure it uses floats
B[B==np.inf]=0

How to get euclidean distance on a 3x3x3 array in numpy

say I have a (3,3,3) array like this.
array([[[1, 1, 1],
[1, 1, 1],
[0, 0, 0]],
[[2, 2, 2],
[2, 2, 2],
[2, 2, 2]],
[[3, 3, 3],
[3, 3, 3],
[1, 1, 1]]])
How do I get the 9 values corresponding to euclidean distance between each vector of 3 values and the zeroth values?
Such as doing a numpy.linalg.norm([1,1,1] - [1,1,1]) 2 times, and then doing norm([0,0,0] - [0,0,0]), and then norm([2,2,2] - [1,1,1]) 2 times, norm([2,2,2] - [0,0,0]), then norm([3,3,3] - [1,1,1]) 2 times, and finally norm([1,1,1] - [0,0,0]).
Any good ways to vectorize this? I want to store the distances in a (3,3,1) matrix.
The result would be:
array([[[0. ],
[0. ],
[0. ]],
[[1.73],
[1.73],
[3.46]]
[[3.46],
[3.46],
[1.73]]])

keepdims argument is added in numpy 1.7, you can use it to keep the sum axis:
np.sum((x - [1, 1, 1])**2, axis=-1, keepdims=True)**0.5
the result is:
[[[ 0. ]
[ 0. ]
[ 0. ]]
[[ 1.73205081]
[ 1.73205081]
[ 1.73205081]]
[[ 3.46410162]
[ 3.46410162]
[ 0. ]]]
Edit
np.sum((x - x[0])**2, axis=-1, keepdims=True)**0.5
the result is:
array([[[ 0. ],
[ 0. ],
[ 0. ]],
[[ 1.73205081],
[ 1.73205081],
[ 3.46410162]],
[[ 3.46410162],
[ 3.46410162],
[ 1.73205081]]])

You might want to consider scipy.spatial.distance.cdist(), which efficiently computes distances between pairs of points in two collections of inputs (with a standard euclidean metric, among others). Here's example code:
import numpy as np
import scipy.spatial.distance as dist
i = np.array([[[1, 1, 1],
[1, 1, 1],
[0, 0, 0]],
[[2, 2, 2],
[2, 2, 2],
[2, 2, 2]],
[[3, 3, 3],
[3, 3, 3],
[1, 1, 1]]])
n,m,o = i.shape
# compute euclidean distances of each vector to the origin
# reshape input array to 2-D, as required by cdist
# only keep diagonal, as cdist computes all pairwise distances
# reshape result, adapting it to input array and required output
d = dist.cdist(i.reshape(n*m,o),i[0]).reshape(n,m,o).diagonal(axis1=2).reshape(n,m,1)
d holds:
array([[[ 0. ],
[ 0. ],
[ 0. ]],
[[ 1.73205081],
[ 1.73205081],
[ 3.46410162]],
[[ 3.46410162],
[ 3.46410162],
[ 1.73205081]]])
The big caveat of this approach is that we're calculating n*m*o distances, when we only need n*m (and that it involves an insane amount of reshaping).

I'm doing something similar that is to compute the the sum of squared distances (SSD) for each pair of frames in video volume. I think that it could be helpful for you.
video_volume is a a single 4d numpy array. This array should have dimensions
(time, rows, cols, 3) and dtype np.uint8.
Output is a square 2d numpy array of dtype float. output[i,j] should contain
the SSD between frames i and j.
video_volume = video_volume.astype(float)
size_t = video_volume.shape[0]
output = np.zeros((size_t, size_t), dtype = np.float)
for i in range(size_t):
for j in range(size_t):
output[i, j] = np.square(video_volume[i,:,:,:] - video_volume[j,:,:,:]).sum()

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to efficiently order a numpy matrix - python

Probably the easiest way to get your desired output would be: (-matrix).argsort(axis=1) # array([[0, 1, 2], # [0, 1, 2], # [2, 0, 1]])

Related

Understanding _r from numpy

How to invert only negative elements in numpy matrix?

Python, issue with ties using argsort

Replace all elements of a matrix by their inverses

How to get euclidean distance on a 3x3x3 array in numpy

Categories

Resources