So I have a 2D array that, when sorted in descending order by the second column using a[np.argsort(-a[:,1])], looks like this:
array([[ 30. , 98.7804878 ],
[ 24. , 98.7804878 ],
[ 21. , 98.7804878 ],
[ 26. , 98.7804878 ],
[ 20. , 98.70875179],
[ 4. , 98.27833572],
[ 1. , 7.10186514]])
Now I want to break ties by sorting ascending on the first ("id") column, so it looks like this:
array([[ 21. , 98.7804878 ],
[ 24. , 98.7804878 ],
[ 26. , 98.7804878 ],
[ 30. , 98.7804878 ],
[ 20. , 98.70875179],
[ 4. , 98.27833572],
[ 1. , 7.10186514]])
I can't figure out how to do it, even if I first take the rows with the highest percentages and then try to order those by id.
You can use np.lexsort for this:
>>> a[np.lexsort((a[:, 0], -a[:, 1]))]
array([[ 21. , 98.7804878 ],
[ 24. , 98.7804878 ],
[ 26. , 98.7804878 ],
[ 30. , 98.7804878 ],
[ 20. , 98.70875179],
[ 4. , 98.27833572],
[ 1. , 7.10186514]])
This sorts primarily by -a[:, 1] and breaks ties with a[:, 0] (np.lexsort treats its last key as the primary sort key), returning an array of indices that you can use to index a.
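The key order is easy to get backwards: np.lexsort treats its last key as the primary sort key. A minimal sketch on a toy array (values are my own):

```python
import numpy as np

# Toy array: column 1 is a score (sort descending), column 0 is an id
# used to break ties (sort ascending).
a = np.array([[30.0, 98.78],
              [21.0, 98.78],
              [ 1.0,  7.10],
              [24.0, 98.78]])

# lexsort sorts by its LAST key first, so -a[:, 1] (descending score)
# is the primary key and a[:, 0] (ascending id) breaks the ties.
order = np.lexsort((a[:, 0], -a[:, 1]))
result = a[order]
print(result)
# The three 98.78 rows come out as ids 21, 24, 30; the 7.10 row is last.
```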
Input:
array([[ 1. , 5. , 1. ],
[ 10. , 7. , 1.5],
[ 6.9, 5. , 1. ],
[ 19. , 9. , 100. ],
[ 11. , 11. , 11. ]])
Expected Output:
array([[ 19. , 9. , 100. ],
[ 11. , 11. , 11. ],
[ 10. , 7. , 1.5],
[ 6.9, 5. , 1. ],
[ 1. , 5. , 1. ]])
I tried doing the below:
for i in M:
    ls = i.mean()
    x = np.append(i, ls)
    print(x)  # found the mean
After this I am unable to arrange the rows based on the mean value of each row. All I can do is arrange each row in descending order, but that is not what I wanted.
You can do this:
In [405]: row_idxs = np.argsort(np.mean(a * -1, axis=1))
In [406]: a[row_idxs, :]
Out[406]:
array([[ 19. , 9. , 100. ],
[ 11. , 11. , 11. ],
[ 10. , 7. , 1.5],
[ 6.9, 5. , 1. ],
[ 1. , 5. , 1. ]])
argsort returns the indices that would sort the row means in ascending order; multiplying by -1 negates the means, so ascending order on the negated values gives descending order on the originals.
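The same idiom can be written without the * -1 trick by reversing the sorted indices; a small sketch with made-up values:

```python
import numpy as np

a = np.array([[ 1. ,  5. ,   1. ],
              [10. ,  7. ,   1.5],
              [19. ,  9. , 100. ]])

# Row means, indices sorted ascending, then reversed for descending order.
row_idxs = np.argsort(a.mean(axis=1))[::-1]
descending = a[row_idxs]
print(descending)
# The row with the largest mean ([19, 9, 100]) comes first.
```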
I have a 2D array of floats, and I want to delete a point from it, but suppose it's so big that I can't just find its index and grab it by hand.
How can I delete this point, with a loop and without a loop? The following is the 2D array, and I want to delete [ 32.9, 23. ]:
[[ 1. , -1.4],
[ -2.9, -1.5],
[ -3.6, -2. ],
[ 1.5, 1. ],
[ 24. , 11. ],
[ -1. , 1.4],
[ 2.9, 1.5],
[ 3.6, 2. ],
[ -1.5, -1. ],
[ -24. , -11. ],
[ 32.9, 23. ],
[-440. , 310. ]]
I tried this but it doesn't work:
this_point = np.asarray([32.9, 23.])
[x for x in y if x == point]
del datapoints[this_point]
np.delete(datapoints, len(datapoints), axis=0)
for this_point in datapoints:
    del this_point
When I do this, this_point stays in after printing all the points. What should I do?
Python can remove a list element by content, but NumPy deletes only by index. So, use np.where to find the index of the matching row:
import numpy as np
a = np.array([[ 1. , -1.4],
[ -2.9, -1.5],
[ -3.6, -2. ],
[ 1.5, 1. ],
[ 24. , 11. ],
[ -1. , 1.4],
[ 2.9, 1.5],
[ 3.6, 2. ],
[ -1.5, -1. ],
[ -24. , -11. ],
[ 32.9, 23. ],
[-440. , 310. ]])
find = np.array([32.9,23.])
row = np.where( (a == find).all(axis=1))
print( row )
print(np.delete( a, row, axis=0 ) )
Output:
(array([10], dtype=int64),)
[[ 1. -1.4]
[ -2.9 -1.5]
[ -3.6 -2. ]
[ 1.5 1. ]
[ 24. 11. ]
[ -1. 1.4]
[ 2.9 1.5]
[ 3.6 2. ]
[ -1.5 -1. ]
[ -24. -11. ]
[-440. 310. ]]
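One caveat: exact == comparison can miss the row when the float values carry rounding error, so np.isclose is a safer match. A sketch of the same deletion with a tolerance (array abbreviated):

```python
import numpy as np

a = np.array([[   1. ,  -1.4],
              [  24. ,  11. ],
              [  32.9,  23. ],
              [-440. , 310. ]])

target = np.array([32.9, 23.0])

# Keep every row that is NOT close to the target in all columns.
mask = ~np.isclose(a, target).all(axis=1)
filtered = a[mask]
print(filtered)
# The [32.9, 23.] row is gone; the other three rows remain.
```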
I have a pandas dataframe of people who live at different locations (latitude, longitude, floor number). I would like to assign 3 people to each group, with the groups numbered from 1 to n. Important: these 3 people are at different locations in terms of latitude, longitude and floor. At the end of this process, every person is assigned to exactly one group. My dataframe's length is a multiple of 9 (e.g. 18 people).
Example:
Here is my dataframe:
array_data=([[ 50.56419 , 8.67667 , 2. , 160. ],
[ 50.5740356, 8.6718179, 1. , 5. ],
[ 50.5746321, 8.6831284, 3. , 202. ],
[ 50.5747453, 8.6765588, 4. , 119. ],
[ 50.5748992, 8.6611471, 2. , 260. ],
[ 50.5748992, 8.6611471, 3. , 102. ],
[ 50.575 , 8.65985 , 2. , 267. ],
[ 50.5751 , 8.66027 , 2. , 7. ],
[ 50.5751 , 8.66027 , 2. , 56. ],
[ 50.57536 , 8.67741 , 1. , 194. ],
[ 50.57536 , 8.67741 , 1. , 282. ],
[ 50.5755255, 8.6884584, 0. , 276. ],
[ 50.5755273, 8.674282 , 3. , 167. ],
[ 50.57553 , 8.6826 , 2. , 273. ],
[ 50.5755973, 8.6847492, 0. , 168. ],
[ 50.5756757, 8.6846139, 4. , 255. ],
[ 50.57572 , 8.65965 , 0. , 66. ],
[ 50.57591 , 8.68175 , 1. , 187. ]])
all_persons = pd.DataFrame(data=array_data) # convert back to dataframe
all_persons.rename(columns={0: 'latitude', 1: 'longitude', 2:'floor', 3:'id'}, inplace=True) # rename columns
This was my approach: Google Colab Link to my solution
How can I create this group column? As you can see, my approach doesn't work correctly.
temp = ()
temp += (pd.concat([df.loc[users_group_1]], keys=[1], names=['group']),)
temp += (pd.concat([df.loc[users_group_2]], keys=[2], names=['group']),)
temp += (pd.concat([df.loc[users_group_3]], keys=[3], names=['group']),)
df = pd.concat(temp)
Here users_group_1 etc. are placeholders for the index labels of the users in each group. Of course you can do this in a loop and locate the users you need in a more elegant way.
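For the group column itself, one simple sketch is to number consecutive chunks of 3 after sorting by location. The sort key here is an assumption of mine, not the asker's exact matching logic, and the data is abbreviated:

```python
import numpy as np
import pandas as pd

all_persons = pd.DataFrame(
    {'latitude':  [50.5642, 50.5740, 50.5746, 50.5747, 50.5749, 50.5749],
     'longitude': [8.6767, 8.6718, 8.6831, 8.6766, 8.6611, 8.6611],
     'floor':     [2, 1, 3, 4, 2, 3],
     'id':        [160, 5, 202, 119, 260, 102]})

# Sort by location, then number consecutive chunks of 3 as groups 1..n.
ordered = all_persons.sort_values(['latitude', 'longitude', 'floor'])
ordered['group'] = np.arange(len(ordered)) // 3 + 1
print(ordered)
```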
I thought this would be super easy, but I am struggling a little. I have a data structure as follows:
array([[ 5. , 3.40166205],
[ 10. , 2.72778882],
[ 15. , 2.31881804],
[ 20. , 2.50643777],
[ 1. , 3.94076063],
[ 2. , 3.80598599],
[ 3. , 3.67121134],
[ 6. , 3.2668874 ],
[ 7. , 3.13211276],
[ 8. , 2.99733811],
[ 9. , 2.86256347],
[ 11. , 2.64599467],
[ 12. , 2.56420051],
[ 13. , 2.48240635],
[ 14. , 2.4006122 ],
[ 16. , 1.8280531 ],
[ 17. , 1.74625894],
[ 18. , 1.66446479],
[ 19. , 1.58267063],
[ 20. , 1.50087647]])
And I want to sort it ONLY on the first column ... so it is ordered as follows:
array([[1. , 3.9],
[2. , 3.8],
... ,
[20. , 1.5]])
np.sort doesn't seem to work, as it sorts each row's elements independently rather than reordering the rows. I've also used itemgetter:
from operator import itemgetter
sorted(data, key=itemgetter(1))
But this doesn't give me the output I'm looking for.
Help appreciated!
This is a common NumPy idiom. You can use argsort (on the first column) + NumPy indexing here:
x[x[:, 0].argsort()]
array([[ 1. , 3.94076063],
[ 2. , 3.80598599],
[ 3. , 3.67121134],
[ 5. , 3.40166205],
[ 6. , 3.2668874 ],
[ 7. , 3.13211276],
[ 8. , 2.99733811],
[ 9. , 2.86256347],
[ 10. , 2.72778882],
[ 11. , 2.64599467],
[ 12. , 2.56420051],
[ 13. , 2.48240635],
[ 14. , 2.4006122 ],
[ 15. , 2.31881804],
[ 16. , 1.8280531 ],
[ 17. , 1.74625894],
[ 18. , 1.66446479],
[ 19. , 1.58267063],
[ 20. , 2.50643777],
[ 20. , 1.50087647]])
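Note that the input contains two rows with 20. in the first column; if ties should keep their original relative order, pass kind='stable' to argsort. A short sketch with a trimmed-down array:

```python
import numpy as np

x = np.array([[20. , 2.50643777],
              [ 1. , 3.94076063],
              [20. , 1.50087647]])

# A stable sort keeps the two 20-rows in their original relative order.
order = x[:, 0].argsort(kind='stable')
print(x[order])
# The 1-row moves to the front; the two 20-rows keep their input order.
```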
I have two arrays in NumPy:
a1 =
array([[ 262.99182129, 213. , 1. ],
[ 311.98925781, 271.99050903, 2. ],
[ 383. , 342. , 3. ],
[ 372.16494751, 348.83505249, 4. ],
[ 214.55493164, 137.01008606, 5. ],
[ 138.29714966, 199.75 , 6. ],
[ 289.75 , 220.75 , 7. ],
[ 239. , 279. , 8. ],
[ 130.75 , 348.25 , 9. ]])
a2 =
array([[ 265.78259277, 212.99705505, 1. ],
[ 384.23312378, 340.99707031, 3. ],
[ 373.66967773, 347.96688843, 4. ],
[ 217.91461182, 137.2791748 , 5. ],
[ 141.35340881, 199.38366699, 6. ],
[ 292.24401855, 220.83808899, 7. ],
[ 241.53366089, 278.56951904, 8. ],
[ 133.26490784, 347.14279175, 9. ]])
Actually there will be thousands of rows.
But as you can see, the third column in a2 does not have the value 2.0.
What I simply want is to remove from a1 the rows whose 3rd column values are not found in any row of a2.
What's the NumPy way/shortcut to do this fast?
One option is to use np.in1d to check whether each value in the third column of a1 is present in the third column of a2, and then use the resulting Boolean array to index the rows of a1.
You can do this as follows:
>>> a1[np.in1d(a1[:, 2], a2[:, 2])]
array([[ 262.99182129, 213. , 1. ],
[ 383. , 342. , 3. ],
[ 372.16494751, 348.83505249, 4. ],
[ 214.55493164, 137.01008606, 5. ],
[ 138.29714966, 199.75 , 6. ],
[ 289.75 , 220.75 , 7. ],
[ 239. , 279. , 8. ],
[ 130.75 , 348.25 , 9. ]])
The row in a1 with 2 in the third column is not in this array, as required.
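In newer NumPy versions, np.isin is the recommended equivalent of np.in1d, and inverting the same mask also gives you the rows that were dropped. A small sketch with abbreviated values:

```python
import numpy as np

a1 = np.array([[262.99, 213.0, 1.0],
               [311.99, 272.0, 2.0],
               [383.0 , 342.0, 3.0]])
a2 = np.array([[265.78, 213.0, 1.0],
               [384.23, 341.0, 3.0]])

# Keep rows of a1 whose third-column value appears in a2's third column.
mask = np.isin(a1[:, 2], a2[:, 2])
kept = a1[mask]
dropped = a1[~mask]
print(kept)     # rows with ids 1 and 3
print(dropped)  # the row with id 2
```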