numpy sort arrays based on last column values [duplicate] - python

This question already has answers here:
Sorting arrays in NumPy by column
(16 answers)
Closed 4 years ago.
import numpy as np
a = np.array([[5,9,44],
              [5,12,43],
              [5,33,11]])
b = np.sort(a,axis=0)
print(b) # not what I want
# [[ 5  9 11]
#  [ 5 12 43]
#  [ 5 33 44]]
# desired output:
# [[5,33,11],
#  [5,12,43],
#  [5,9,44]]
np.sort changes the rows completely (sorting each column from lowest to highest), but I would like to keep the rows intact. I want to reorder the rows by the value in their last column, while the contents of each row stay unchanged. Is there a Pythonic way to do this?
Thanks

ind=np.argsort(a[:,-1])
b=a[ind]
EDIT
When you pass axis=0 to sort, it sorts every column independently. What you want instead is to get the indices that would sort the selected column (-1 selects the last column), and then use those indices to reorder the rows of your original array.

a[a[:,-1].argsort()]
may work for you
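
A minimal runnable sketch of the argsort approach above, using the array from the question. The name `order` and the `kind="stable"` argument are additions for illustration; a stable sort keeps rows with equal last-column values in their original relative order.

```python
import numpy as np

a = np.array([[5, 9, 44],
              [5, 12, 43],
              [5, 33, 11]])

# Indices that would sort the last column in ascending order
order = np.argsort(a[:, -1], kind="stable")

# Fancy indexing with those indices reorders whole rows
b = a[order]
print(b)
# [[ 5 33 11]
#  [ 5 12 43]
#  [ 5  9 44]]
```

For descending order, reversing the index array with `order[::-1]` should work, although that also reverses the order of tied rows.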

Related

Is there a method to multiply only certain elements in a numpy array [duplicate]

This question already has answers here:
How to multiply a scalar throughout a specific column within a NumPy array?
(3 answers)
Closed 1 year ago.
Suppose I have a numpy array like so:
a = ([[4, 9], [38, 8], [90, 10]...[8545, 17]])
Where the first element is a location ID and the second is the amount of time spent at each location in minutes. I want to convert these times into seconds which requires me to multiply every other value by 60.
As this is a very long array, what would be the most time-efficient method for converting these times?
Use this:
import numpy as np
# a is your original list of [location, time] pairs
a = np.array(a)
a[:, 1] *= 60
It simply multiplies the second column of your array a by 60 to convert the time values into seconds.
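
A small self-contained sketch of the same idea. The four sample rows are made up for illustration (the question's full data is elided), and the array is assumed to have an integer dtype wide enough that multiplying by 60 does not overflow.

```python
import numpy as np

# Hypothetical sample data: [location ID, minutes spent]
a = np.array([[4, 9], [38, 8], [90, 10], [8545, 17]])

# In-place multiplication touches only the second column,
# so the location IDs in the first column are left alone
a[:, 1] *= 60
print(a)
# [[   4  540]
#  [  38  480]
#  [  90  600]
#  [8545 1020]]
```

Because the operation is vectorized and in place, it avoids a Python-level loop and does not allocate a second full-size array.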

Getting the top N values and their coordinates from a 2D Numpy array [duplicate]

This question already has answers here:
Efficient way to take the minimum/maximum n values and indices from a matrix using NumPy
(3 answers)
Closed 2 years ago.
I have a 2D numpy array "bigrams" of shape (851, 851) with float values inside. I want to get the top ten values from this array and I want their coordinates.
I know that np.amax(bigrams) can return the single highest value, so that's basically what I want but then for the top ten.
As a NumPy beginner, I wrote some code using a loop to get the top values per row and then np.where() to get the coordinates, but I feel there must be a smarter way to solve this.
You can flatten the array and use argsort.
import numpy as np
idxs = np.argsort(bigrams.ravel())[-10:] # flat indices of the ten largest values, in ascending order
rows, cols = idxs//851, idxs%851 # 851 is the number of columns in bigrams
print(bigrams[rows,cols])
An alternative is a partial sort with argpartition, which avoids sorting the whole array.
partition = np.argpartition(bigrams.ravel(),-10)[-10:]
max_ten = bigrams[partition//851,partition%851]
With argpartition you still get the top ten values and their coordinates, but they are not sorted among themselves. You can sort this smaller array of ten values afterwards if you want.
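
The two answers can be combined into one helper that does not hard-code the array width. This is only a sketch: the function name `top_n` and the small 3x3 sample array are hypothetical, and `np.unravel_index` is swapped in for the manual `//` and `%` arithmetic.

```python
import numpy as np

def top_n(arr, n):
    """Return the n largest values of a 2D array and their (row, col) coordinates,
    ordered from largest to smallest."""
    flat = arr.ravel()
    # Partial sort: the last n positions hold the indices of the n largest values
    idx = np.argpartition(flat, -n)[-n:]
    # Sort just those n candidates in descending order
    idx = idx[np.argsort(flat[idx])[::-1]]
    # Convert flat indices back to 2D coordinates
    rows, cols = np.unravel_index(idx, arr.shape)
    return flat[idx], rows, cols

# Hypothetical small stand-in for the (851, 851) "bigrams" array
bigrams = np.array([[0.1, 0.9, 0.3],
                    [0.7, 0.2, 0.8],
                    [0.5, 0.4, 0.6]])

values, rows, cols = top_n(bigrams, 3)
print(values)      # [0.9 0.8 0.7]
print(rows, cols)  # [0 1 1] [1 2 0]
```

For the array in the question, the call would be `top_n(bigrams, 10)`.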

How to change value of remainder of a row in a numpy array once a certain condition is met? [duplicate]

This question already has answers here:
Can NumPy take care that an array is (nonstrictly) increasing along one axis?
(2 answers)
Closed 3 years ago.
I have a 2d numpy array of the form:
array = [[0,0,0,1,0], [0,1,0,0,0], [1,0,0,0,0]]
I'd like to go through each row, iterate over the entries until the value 1 is found, then replace every subsequent value in that row with a 1. The output would then look like:
array = [[0,0,0,1,1], [0,1,1,1,1], [1,1,1,1,1]]
My actual data set is very large, so I was wondering if there is a specialized numpy function that does something like this, or if there's an obvious way to do it that I'm missing.
Thanks!
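You can use np.apply_along_axis.
import numpy as np
array = np.array([[0,0,0,1,0], [0,1,0,0,0], [1,0,0,0,0]])
def myfunc(l):
    # find the index of the first 1; stop at the end if the row has none
    i = 0
    while i < len(l) and l[i] != 1:
        i += 1
    # keep i zeros, then fill the rest of the row with ones
    return [0]*i + [1]*(len(l)-i)
print(np.apply_along_axis(myfunc, 1, array))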
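
Since the data set is described as very large, a vectorized alternative may be worth considering. The sketch below is a different technique from the answer above: it uses a running maximum along each row, and it assumes the array contains only 0s and 1s. The name `filled` is introduced here for illustration.

```python
import numpy as np

array = np.array([[0, 0, 0, 1, 0],
                  [0, 1, 0, 0, 0],
                  [1, 0, 0, 0, 0]])

# Running maximum along each row: once a 1 has been seen,
# every later entry in that row becomes 1
filled = np.maximum.accumulate(array, axis=1)
print(filled)
# [[0 0 0 1 1]
#  [0 1 1 1 1]
#  [1 1 1 1 1]]
```

This avoids calling a Python function once per row, which is what apply_along_axis does internally.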

Python: Selecting every Nth row of a matrix [duplicate]

This question already has answers here:
Pythonic way to return list of every nth item in a larger list
(9 answers)
Closed 4 years ago.
Does anyone know how to select multiple rows of a matrix to form a new one? For example, I would like to select every 3rd row of a matrix and build a new matrix from these rows.
Many thanks for your help,
Nicolas
An example using NumPy's ndarray to create a matrix with 10 rows and 3 columns:
import numpy as np
matrix = np.ndarray(shape=(10,3)) #uninitialized 10x3 array, contents are arbitrary
rows = np.shape(matrix)[0] #number of rows
columns = np.shape(matrix)[1] #number of columns
l = range(rows)[0::3] #indexes of every third row, starting with the first
new_matrix = np.ndarray(shape=(len(l),columns)) #your new matrix
for i in range(len(l)):
    new_matrix[i] = matrix[l[i]] #copy every third row from matrix into new_matrix
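
The same selection can usually be written with a single slice along the first axis, without a loop. A minimal sketch, using `np.arange` to build a 10x3 sample matrix with known values (the sample data is an assumption for illustration):

```python
import numpy as np

# Sample 10x3 matrix with values 0..29 so the result is easy to check
matrix = np.arange(30).reshape(10, 3)

# start:stop:step slicing on the row axis takes rows 0, 3, 6, 9
new_matrix = matrix[::3]
print(new_matrix)
# [[ 0  1  2]
#  [ 9 10 11]
#  [18 19 20]
#  [27 28 29]]
```

Note that a basic slice returns a view of the original data. If the new matrix must be independent of the original, `matrix[::3].copy()` makes a separate copy.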

Slicing a Data frame by checking consecutive elements [duplicate]

This question already has answers here:
Pandas: Drop consecutive duplicates
(8 answers)
Closed 4 years ago.
I have a DataFrame indexed by time, and one of its columns (which takes 2 distinct values) looks like [x,x,y,y,x,x,x,y,y,y,y,x]. I want to slice this DataFrame so that the column has no consecutive repeated values, keeping the first row of each run. In this example the result would be [x,y,x,y,x].
Still trying to figure it out...
Thanks!!
Assuming you have a df like the one below:
import pandas as pd
df=pd.DataFrame(['x','x','y','y','x','x','x','y','y','y','y','x'])
We use shift to compare each value with the previous one and keep only the rows where they differ:
df[df[0].shift()!=df[0]]
Out[142]:
    0
0   x
2   y
4   x
7   y
11  x
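
A self-contained sketch of the shift comparison, written out step by step. The column name `state` and the variable `is_run_start` are hypothetical names chosen for readability; the question's real DataFrame is indexed by time, which is not reproduced here.

```python
import pandas as pd

# Hypothetical column name "state"; values taken from the question
df = pd.DataFrame({"state": list("xxyyxxxyyyyx")})

# shift() moves the column down by one row, so each value is compared
# with its predecessor; the first row compares against NaN and is kept
is_run_start = df["state"] != df["state"].shift()

result = df[is_run_start]
print(result)
```

The original index labels are preserved in `result`, so with a time index each kept row still carries the timestamp at which its run started.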
You can just loop through the column and save the last value seen:
import pandas as pd
df=pd.DataFrame(['x','x','y','y','x','x','x','y','y','y','y','x'])
keep = [] # index labels of the rows that start a new run
old = object() # sentinel that equals no real value, so the first row is kept
for idx, value in df[0].items():
    if value != old:
        keep.append(idx)
        old = value
df2 = df.loc[keep]
EDIT:
Or, for a plain list, use itertools.groupby:
>>> L=[1,1,1,1,1,1,2,3,4,4,5,1,2]
>>> from itertools import groupby
>>> [x[0] for x in groupby(L)]
[1, 2, 3, 4, 5, 1, 2]
