Python: issue with ties using argsort

I have the following problem with sorting a 2D array using the function argsort.
More precisely, let's assume I have 5 points and have calculated the Euclidean distances between them, which are stored in the 2D array D:
D = np.array([[0, 0.3, 0.4, 0.2, 0.5], [0.3, 0, 0.2, 0.6, 0.1],
              [0.4, 0.2, 0, 0.5, 0], [0.2, 0.6, 0.5, 0, 0.7],
              [0.5, 0.1, 0, 0.7, 0]])
D
array([[ 0. ,  0.3,  0.4,  0.2,  0.5],
       [ 0.3,  0. ,  0.2,  0.6,  0.1],
       [ 0.4,  0.2,  0. ,  0.5,  0. ],
       [ 0.2,  0.6,  0.5,  0. ,  0.7],
       [ 0.5,  0.1,  0. ,  0.7,  0. ]])
Each element D[i,j] (i,j=0,...,4) gives the distance between point i and point j. The diagonal entries are of course equal to zero, as they give the distance of a point to itself. However, two or more points can overlap. In this particular case, point 4 is located at the same position as point 2, so that the distances D[2,4] and D[4,2] are equal to zero.
Now, I want to sort this array D: for each point i I want to know the indices of its neighbouring points, from the closest to the furthest one. Of course, for a given point i the first point/index in the sorted array should be i itself, i.e. the closest point to point i is point i. I used the function argsort:
N = np.argsort(D)
N
array([[0, 3, 1, 2, 4],
       [1, 4, 2, 0, 3],
       [2, 4, 1, 0, 3],
       [3, 0, 2, 1, 4],
       [2, 4, 1, 0, 3]])
This function sorts the distances properly until it gets to point 4: the first entry of the 4th row (counting from zero) is not 4 (D[4,4]=0) as I would like; I would like the 4th row to be [4, 2, 1, 0, 3]. The first entry is 2 because points 2 and 4 overlap, so that D[4,2]=D[4,4]=0, and between entries with the same value argsort always selects the one with the lower index first.
Is there a way to fix this so that the sorted array N[i,j] of D[i,j] always starts with the indices corresponding to the diagonal entries D[i,i]=0?
Thank you for your help,
MarcoC

One way would be to fill the diagonal elements with something smaller than the global minimum and then use argsort -
In [286]: np.fill_diagonal(D, D.min()-1)  # Or fill with -1 if we know
                                          # beforehand that the global minimum is 0
In [287]: np.argsort(D)
Out[287]:
array([[0, 3, 1, 2, 4],
       [1, 4, 2, 0, 3],
       [2, 4, 1, 0, 3],
       [3, 0, 2, 1, 4],
       [4, 2, 1, 0, 3]])
If you don't want the input array to be changed, make a copy and then do the diagonal filling.
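A minimal sketch of that copy-based variant, using the distance array from the question:

```python
import numpy as np

D = np.array([[0. , 0.3, 0.4, 0.2, 0.5],
              [0.3, 0. , 0.2, 0.6, 0.1],
              [0.4, 0.2, 0. , 0.5, 0. ],
              [0.2, 0.6, 0.5, 0. , 0.7],
              [0.5, 0.1, 0. , 0.7, 0. ]])

D2 = D.copy()                        # leave the original distances untouched
np.fill_diagonal(D2, D2.min() - 1)   # the diagonal is now strictly the smallest value
N = np.argsort(D2)
print(N[4])                          # [4 2 1 0 3]
```

Since every diagonal entry is now strictly smaller than anything else in its row, each row of N is guaranteed to start with its own index, regardless of how ties elsewhere are broken.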

How about this:
import numpy as np
D = np.array([[ 0. , 0.3, 0.4, 0.2, 0.5],
              [ 0.3, 0. , 0.2, 0.6, 0.1],
              [ 0.4, 0.2, 0. , 0.5, 0. ],
              [ 0.2, 0.6, 0.5, 0. , 0.7],
              [ 0.5, 0.1, 0. , 0.7, 0. ]])
s = np.argsort(D)
line = np.argwhere(s[:,0] != np.arange(D.shape[0]))[0,0]
column = np.argwhere(s[line,:] == line)[0,0]
s[line,0], s[line, column] = s[line, column], s[line,0]
Just find the rows that don't have the diagonal element in front using numpy.argwhere, then find the column to swap with, and swap the elements. Afterwards s contains what you want.
This works for your example. In the general case, where numpy.argwhere can return several rows, one would have to loop over those rows instead of just writing [0,0] at the end of the two lines of code above.
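That generalization could be sketched like this, looping over every row whose first sorted index is not the diagonal one:

```python
import numpy as np

D = np.array([[0. , 0.3, 0.4, 0.2, 0.5],
              [0.3, 0. , 0.2, 0.6, 0.1],
              [0.4, 0.2, 0. , 0.5, 0. ],
              [0.2, 0.6, 0.5, 0. , 0.7],
              [0.5, 0.1, 0. , 0.7, 0. ]])

s = np.argsort(D)
# every row whose first entry is not its own (diagonal) index
bad_rows = np.argwhere(s[:, 0] != np.arange(D.shape[0]))[:, 0]
for line in bad_rows:
    # where the diagonal index ended up in this row
    column = np.argwhere(s[line, :] == line)[0, 0]
    s[line, 0], s[line, column] = s[line, column], s[line, 0]
```

Note that this only swaps two equal-distance entries, so the sorted order of the row is preserved.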
Hope I could help.

Related

Losing decimal when doing array operation in Python

I tried to make a function that divides each column by its column sum, and here is what I came up with:
A = np.array([[1,2,3,4],[1,2,3,4],[1,2,3,4]])
print(A)
A = A.T
Asum = A.sum(axis=1)
print(Asum)
for i in range(len(Asum)):
    A[:,i] = A[:,i]/Asum[i]
I'm hoping for a decimal matrix, but it automatically turns into integers and gives me a zero matrix. Where did I go wrong?
You must change:
Asum = A.sum(axis=1)
to:
Asum = A.sum(axis=0)
to get the column-by-column sums.
Also you can get the division easily with numpy.divide:
np.divide(A, Asum)
# array([[0.1, 0.1, 0.1],
#        [0.2, 0.2, 0.2],
#        [0.3, 0.3, 0.3],
#        [0.4, 0.4, 0.4]])
Or simply with:
A/Asum
Your A is integer dtype; assigned floats get truncated. If A started as a float array your iteration would work. But you don't need to iterate to perform this calculation:
In [108]: A = np.array([[1,2,3,4],[1,2,3,4],[1,2,3,4]]).T
In [109]: A
Out[109]:
array([[1, 1, 1],
       [2, 2, 2],
       [3, 3, 3],
       [4, 4, 4]])
In [110]: Asum = A.sum(axis=1)
In [111]: Asum
Out[111]: array([ 3, 6, 9, 12])
A is (4,3), Asum is (4,). If we make it (4,1):
In [114]: Asum[:,None]
Out[114]:
array([[ 3],
       [ 6],
       [ 9],
       [12]])
we can perform the divide without iteration (review broadcasting if necessary):
In [115]: A/Asum[:,None]
Out[115]:
array([[0.33333333, 0.33333333, 0.33333333],
       [0.33333333, 0.33333333, 0.33333333],
       [0.33333333, 0.33333333, 0.33333333],
       [0.33333333, 0.33333333, 0.33333333]])
sum has a keepdims parameter that makes this kind of calculation easier:
In [117]: Asum = A.sum(axis=1, keepdims=True)
In [118]: Asum
Out[118]:
array([[ 3],
       [ 6],
       [ 9],
       [12]])
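Completing that example, the keepdims result broadcasts directly against A, so the division needs no loop at all (a minimal sketch with the array from the question):

```python
import numpy as np

A = np.array([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]).T
Asum = A.sum(axis=1, keepdims=True)  # shape (4, 1) instead of (4,)
result = A / Asum                    # broadcasts across the 3 columns
print(result[0])                     # [0.33333333 0.33333333 0.33333333]
```

Because true division always produces floats, result has float dtype even though A is integer, which sidesteps the truncation problem from the question.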

How to efficiently order a numpy matrix

I have this numpy array
matrix = np.array([[ 0.8, 0.2, 0.1],
                   [ 1. , 0. , 0. ],
                   [ 0. , 0. , 1. ]])
and I would like to return, for each row of the matrix, the indices in decreasing value order.
For example, this would be
np.array([[0, 1, 2], [0, 1, 2], [2, 0, 1]])
I know I could use np.argsort, but this doesn't seem to be returning the right output. I tried changing the axis to different values, but that doesn't help either.
Probably the easiest way to get your desired output would be:
(-matrix).argsort(axis=1)
# array([[0, 1, 2],
#        [0, 1, 2],
#        [2, 0, 1]])
I think np.argsort does seem to do the trick, you just need to make sure to flip the matrix horizontally to make it decreasing order:
>>> matrix = np.array([[ 0.8, 0.2, 0.1],
...                    [ 1. , 0. , 0. ],
...                    [ 0. , 0. , 1. ]])
>>> np.fliplr(np.argsort(matrix))
array([[0, 1, 2],
       [0, 2, 1],
       [2, 1, 0]])
This should be the right output unless you have requirements for sorting ties. Right now the flipping makes the rightmost tie come first. If you wanted to match your exact output, where the leftmost index comes first, you could do a bit of juggling:
# Flip the array first and get the indices
>>> flipped = np.argsort(np.fliplr(matrix))
# Map the flipped-column indices back to the original columns,
# then flip the result to be in descending order
>>> np.fliplr(flipped.shape[1] - 1 - flipped)
array([[0, 1, 2],
       [0, 1, 2],
       [2, 0, 1]])
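If the goal is simply a descending argsort where the leftmost index wins ties, one more option (a sketch) is to negate the matrix and request a stable sort, which preserves the original left-to-right order of equal elements:

```python
import numpy as np

matrix = np.array([[0.8, 0.2, 0.1],
                   [1. , 0. , 0. ],
                   [0. , 0. , 1. ]])

# kind='stable' keeps tied values in their original order,
# so after negation the leftmost of any tie comes first
idx = np.argsort(-matrix, axis=1, kind='stable')
print(idx)
# [[0 1 2]
#  [0 1 2]
#  [2 0 1]]
```

This avoids the flipping and index arithmetic entirely, at the cost of negating the input.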

How to invert only negative elements in numpy matrix?

I have a matrix containing positive and negative numbers like this:
>>> source_matrix
array([[-4, -2,  0],
       [-5,  0,  4],
       [ 0,  6,  5]])
I'd like to have a copy of this matrix with the negative entries inverted:
>>> result
array([[-0.25, -0.5 ,  0.  ],
       [-0.2 ,  0.  ,  4.  ],
       [ 0.  ,  6.  ,  5.  ]])
Firstly, since your desired array is going to contain floats, you need to create the array with a float dtype; otherwise, when you assign the float results of the inversion to an integer array, they will automatically be cast back to integers. Secondly, find the negative numbers in your array with a boolean mask, grab them with simple indexing, and use np.true_divide() to perform the inversion.
In [25]: arr = np.array([[-4, -2, 0],
    ...:                 [-5,  0, 4],
    ...:                 [ 0,  6, 5]], dtype=float)
In [26]: mask = arr < 0
In [27]: arr[mask] = np.true_divide(1, arr[mask])
In [28]: arr
Out[28]:
array([[-0.25, -0.5 ,  0.  ],
       [-0.2 ,  0.  ,  4.  ],
       [ 0.  ,  6.  ,  5.  ]])
You can also achieve this without masking, by using the where and out params of true_divide.
a = np.array([[-4, -2, 0],
              [-5,  0, 4],
              [ 0,  6, 5]], dtype=float)
np.true_divide(1, a, out=a, where=a<0)
Giving the result:
array([[-0.25, -0.5 ,  0.  ],
       [-0.2 ,  0.  ,  4.  ],
       [ 0.  ,  6.  ,  5.  ]])
The where= parameter takes a boolean array broadcastable to the shape of the inputs. Where it evaluates to True, the division is performed; where it evaluates to False, the value already present in out= is left in the result unchanged.
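For comparison, here is a sketch that avoids both the in-place mask assignment and the out= parameter by using np.where twice; the safe array substitutes a dummy 1.0 in the zero slots so the division never divides by zero, and those slots are then discarded by the outer where:

```python
import numpy as np

a = np.array([[-4, -2, 0],
              [-5,  0, 4],
              [ 0,  6, 5]], dtype=float)

safe = np.where(a == 0, 1.0, a)          # dummy values where a == 0; never selected below
result = np.where(a < 0, 1.0 / safe, a)  # invert only the negative entries
```

The original array a is left unchanged, which matches the question's request for a modified copy.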

Replace all elements of a matrix by their inverses

I've got a simple problem and I can't figure out how to solve it.
Here is a matrix: A = np.array([[1,0,3],[0,7,9],[0,0,8]]).
I want to find a quick way to replace all elements of this matrix by their inverses, excluding of course the zero elements.
Thanks to Stack Overflow's search engine, I know how to replace an element by a given value under a condition. However, I cannot figure out how to replace elements by new values that depend on the previous ones (e.g. squared elements, inverses, etc.).
Use 1. / A (notice the dot for Python 2):
>>> A
array([[1, 0, 3],
       [0, 7, 9],
       [0, 0, 8]])
>>> 1./A
array([[ 1.        ,         inf,  0.33333333],
       [        inf,  0.14285714,  0.11111111],
       [        inf,         inf,  0.125     ]])
Or if your array has dtype float, you can do it in-place without warnings:
>>> A = np.array([[1,0,3], [0,7,9], [0,0,8]], dtype=np.float64)
>>> A[A != 0] = 1. / A[A != 0]
>>> A
array([[ 1.        ,  0.        ,  0.33333333],
       [ 0.        ,  0.14285714,  0.11111111],
       [ 0.        ,  0.        ,  0.125     ]])
Here we use A != 0 to select only those elements that are non-zero.
However if you try this on your original array you'd see
array([[1, 0, 0],
       [0, 0, 0],
       [0, 0, 0]])
because your array can only hold integers, so the inverses of all elements other than 1 are truncated down to 0.
Generally, NumPy operations on arrays are element-wise and vectorized; for example, to square the elements:
>>> A = np.array([[1,0,3],[0,7,9],[0,0,8]])
>>> A * A
array([[ 1,  0,  9],
       [ 0, 49, 81],
       [ 0,  0, 64]])
And just a note on Antti Haapala's answer (sorry, I can't comment yet): if you wanted to keep the 0's, you could use
B = 1./A   # the 1. makes sure float division is used
B[B == np.inf] = 0
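A variant of that note (a sketch) that also silences the divide-by-zero RuntimeWarning while the intermediate infs exist:

```python
import numpy as np

A = np.array([[1, 0, 3], [0, 7, 9], [0, 0, 8]])

with np.errstate(divide='ignore'):  # suppress the divide-by-zero warning
    B = 1. / A                      # zeros become inf for the moment
B[np.isinf(B)] = 0                  # restore the zeros
```

Using np.isinf rather than B == np.inf also catches -inf, which would arise if the array contained negative zeros or if -1/0 divisions occurred.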

How to convert 2d numpy array into binary indicator matrix for max value

Assuming I have a 2d numpy array indicating probabilities for m samples in n classes (the probabilities sum to 1 for each sample).
Assuming each sample can only be in one category, I want to create a new array with the same shape as the original, but with only binary values indicating which class had the highest probability.
Example:
[[0.2, 0.3, 0.5], [0.7, 0.1, 0.1]]
should be converted to:
[[0, 0, 1], [1, 0, 0]]
It seems argmax already does almost what I want, but instead of the indices I want an indicator matrix as described above.
Seems simple, but somehow I can't figure it out using standard numpy functions. I could use regular python loops of course, but it seems there should be a simpler way.
In case multiple classes have the same probability, I would prefer a solution which only selects one of the classes (I don't care which in this case).
Thanks!
Here's one way:
In [112]: a
Out[112]:
array([[ 0.2,  0.3,  0.5],
       [ 0.7,  0.1,  0.1]])
In [113]: a == a.max(axis=1, keepdims=True)
Out[113]:
array([[False, False,  True],
       [ True, False, False]], dtype=bool)
In [114]: (a == a.max(axis=1, keepdims=True)).astype(int)
Out[114]:
array([[0, 0, 1],
       [1, 0, 0]])
(But this will give a True value for each occurrence of the maximum in a row. See Divakar's answer for a nice way to select just the first occurrence of the maximum.)
In case of ties (two or more elements being the highest one in a row), where you want to select only one, here's one approach to do so with np.argmax and broadcasting -
(A.argmax(1)[:,None] == np.arange(A.shape[1])).astype(int)
Sample run -
In [296]: A
Out[296]:
array([[ 0.2,  0.3,  0.5],
       [ 0.5,  0.5,  0. ]])
In [297]: (A.argmax(1)[:,None] == np.arange(A.shape[1])).astype(int)
Out[297]:
array([[0, 0, 1],
       [1, 0, 0]])
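An equivalent sketch builds the indicator matrix by indexing an identity matrix with the argmax result: each argmax value selects the matching one-hot row of np.eye, and argmax already resolves ties by taking the first maximum.

```python
import numpy as np

A = np.array([[0.2, 0.3, 0.5],
              [0.5, 0.5, 0. ]])

# Each argmax index picks out one row of the identity matrix
onehot = np.eye(A.shape[1], dtype=int)[A.argmax(1)]
print(onehot)
# [[0 0 1]
#  [1 0 0]]
```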
