Python: Selecting every Nth row of a matrix [duplicate] - python

This question already has answers here:
Pythonic way to return list of every nth item in a larger list
(9 answers)
Closed 4 years ago.
Does anyone know how to select multiple rows of a matrix to form a new one? For example, I would like to select every 3rd row of a matrix and build a new matrix from these rows.
Many thanks for your help,
Nicolas

An example using NumPy's ndarray, with a matrix of 10 rows and 3 columns:
import numpy as np
matrix = np.arange(30).reshape(10, 3)  # example data; np.ndarray(shape=(10, 3)) would leave the values uninitialized
rows, columns = matrix.shape  # number of rows and columns
l = range(rows)[0::3]  # indices of every third row, starting with the first
new_matrix = np.empty((len(l), columns), dtype=matrix.dtype)  # your new matrix
for i in range(len(l)):
    new_matrix[i] = matrix[l[i]]  # copy every third row from matrix into new_matrix
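The same selection can probably be written without a loop, since NumPy arrays accept a start:stop:step slice along the first axis. A minimal sketch, assuming the example data below (the names every_third and from_third are only for illustration):

```python
import numpy as np

matrix = np.arange(30).reshape(10, 3)  # example data, 10 rows x 3 columns

# Basic slicing along the first axis returns a view, not a copy
every_third = matrix[::3]        # rows 0, 3, 6, 9
from_third = matrix[2::3]        # rows 2, 5, 8, if "every 3rd" means the 3rd, 6th, 9th row
new_matrix = every_third.copy()  # copy only if an independent array is needed
```

Because the slice is a view, writing to every_third also changes matrix; the explicit copy avoids that.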

Related

Getting the top N values and their coordinates from a 2D Numpy array [duplicate]

This question already has answers here:
Efficient way to take the minimum/maximum n values and indices from a matrix using NumPy
(3 answers)
Closed 2 years ago.
I have a 2D numpy array "bigrams" of shape (851, 851) with float values inside. I want to get the top ten values from this array and I want their coordinates.
I know that np.amax(bigrams) can return the single highest value, which is basically what I want, but for the top ten.
As a NumPy noob, I wrote some code using a loop to get the top values per row and then np.where() to get the coordinates, but I feel there must be a smarter way to solve this.
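You can flatten the array and use argsort.
idxs = np.argsort(bigrams.ravel())[-10:]  # flat indices of the ten largest values, in ascending order
rows, cols = idxs//851, idxs%851  # convert flat indices back to (row, column); 851 is the number of columns
print(bigrams[rows,cols])
An alternative would be to do a partial sort with argpartition.
partition = np.argpartition(bigrams.ravel(),-10)[-10:]  # flat indices of the ten largest values, in no particular order
max_ten = bigrams[partition//851,partition%851]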
You will get the top ten values and their coordinates, but they won't be sorted. You can sort this smaller array of ten values later if you want.
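If you would rather not hard-code 851, np.unravel_index can do the conversion from flat indices to coordinates for any shape. A rough sketch that combines it with argpartition and then sorts only the ten winners; the random data is just a stand-in for the real bigrams array:

```python
import numpy as np

rng = np.random.default_rng(0)
bigrams = rng.random((851, 851))  # stand-in data with the shape from the question

n = 10
flat = bigrams.ravel()
top = np.argpartition(flat, -n)[-n:]     # flat indices of the n largest values, unordered
top = top[np.argsort(flat[top])[::-1]]   # order those n indices from largest to smallest value
rows, cols = np.unravel_index(top, bigrams.shape)  # back to 2D coordinates
top_values = bigrams[rows, cols]
```

Here rows[i], cols[i] is the coordinate of top_values[i], with the largest value first.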

Fill an empty matrix and work with the rows in python [duplicate]

This question already has answers here:
Filling a 2D matrix in numpy using a for loop
(2 answers)
Closed 2 years ago.
I have a function that calculates a prediction of shape 1 x 250. I do this B times. To save the predictions from all B runs, I would like to fill an empty B x 250 matrix row-wise. Which Python method is best for filling an empty matrix so that I can access the rows in a simple way (for example, to subtract the rows from each other)?
There must be something better than the following:
indx = np.argsort(y_real[:,1])
print(len(indx)) #250
hat_y_b = []
for b in range(0,B):
    y_pred = function()
    hat_y_b = np.append(hat_y_b, y_pred[indx,1], axis=0)
I'm not sure I understand your question fully, but here are some general tips on dealing with numpy arrays:
In general, appending to np arrays is very slow, because np.append copies the whole array on every call. Instead, allocate the entire array at the start, and fill it with array indexing:
import numpy as np
# first, make an array with 250 rows and B cols.
# Each col will be a single prediction vector
all_b_arr = np.zeros((250, B))
# if no lower bound is specified, it is assumed to be 0
for idx in range(B):
    # This array indexing means "every row (:), column idx"
    all_b_arr[:, idx] = function()
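If you want the B x 250 layout from the question (one prediction per row), the same preallocation idea works with the axes swapped. A minimal sketch, where predict() is a hypothetical stand-in for the real prediction function and B = 5 is just an example value:

```python
import numpy as np

B = 5            # number of runs (example value)
n_points = 250   # length of one prediction

rng = np.random.default_rng(0)

def predict():
    """Hypothetical stand-in for the real prediction function; returns shape (250,)."""
    return rng.normal(size=n_points)

hat_y_b = np.empty((B, n_points))  # one row per run, allocated once
for b in range(B):
    hat_y_b[b] = predict()         # fill row b

diff = hat_y_b[0] - hat_y_b[1]             # difference between two runs
centered = hat_y_b - hat_y_b.mean(axis=0)  # subtract the mean prediction from every row
```

Rows are then available as hat_y_b[b], and row-wise arithmetic such as the two subtractions above needs no further loops.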

clustering algorithm in python [duplicate]

This question already has answers here:
Numpy: Get random set of rows from 2D array
(10 answers)
Closed 3 years ago.
I'm trying to code this algorithm but I'm struggling with it, and step 6 is confusing me. My code so far is at the bottom.
1. Set a positive value for K.
2. Select K different rows from the data matrix at random.
3. For each of the selected rows
   a. Copy its values to a new list, let us call it c. Each element of c is a number. (At the end of step 3, you should have the lists 𝑐1 , 𝑐2 , … , 𝑐𝐾. Each of these should have the same number of columns as the data matrix.)
4. For each row i in the data matrix
   a. Calculate the Manhattan distance between data row 𝐷′ 𝑖 and each of the lists 𝑐1 , 𝑐2 , … , 𝑐𝐾.
   b. Assign the row 𝐷′ 𝑖 to the cluster of the nearest c. For instance, if the nearest c is 𝑐3 then assign row i to the cluster 3 (i.e. you should have a list whose ith entry is equal to 3, let's call this list S).
5. If the previous step does not change S, stop.
6. For each k = 1, 2, …, K
   a. Update 𝑐𝑘. Each element j of 𝑐𝑘 should be equal to the median of the column 𝐷′𝑗 but only taking into consideration those rows that have been assigned to cluster k.
7. Go to Step 4.
Notice that in the above K is not the same thing as k.
# This is what I have so far:
def clustering(matrix,k):
    for i in k:
I'm stuck on how to choose the rows randomly, and I also don't understand what steps 5 and 6 mean. Could someone explain?
You need np.random.choice.
Use this:
import numpy as np
# some data with 10 rows and 5 columns
X=np.random.rand(10,5)
def clustering(X,k):
    # create the random row indices (selector)
    random_selector = np.random.choice(range(X.shape[0]), size=k, replace=False) # replace=False to get unique samples
    # randomly select k rows
    sampled_X = X[random_selector] # X[random_selector].shape = (k,5)
    .
    .
    .
    return #SOMETHING
Now you can continue working on your ?homework?
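As for what steps 5 and 6 mean: step 5 is the stopping rule (stop once no row changes cluster), and step 6 moves each centre to the column-wise median of the rows currently assigned to it. One possible reading of steps 2 to 7 is sketched below. The function name k_medians, the seed and max_iter parameters, the handling of empty clusters, and the 0-based cluster labels (0 to K-1 instead of 1 to K) are my own assumptions, not part of the assignment text:

```python
import numpy as np

def k_medians(X, K, seed=0, max_iter=100):
    """Sketch of the listed steps; returns (S, centres) with 0-based labels."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Steps 2-3: copy K distinct, randomly chosen rows as the initial centres
    centres = X[rng.choice(X.shape[0], size=K, replace=False)].copy()
    S = None
    for _ in range(max_iter):
        # Step 4: Manhattan distance from every row to every centre, shape (n_rows, K)
        dist = np.abs(X[:, None, :] - centres[None, :, :]).sum(axis=2)
        new_S = dist.argmin(axis=1)  # index of the nearest centre for each row
        # Step 5: stop when the assignment did not change
        if S is not None and np.array_equal(new_S, S):
            break
        S = new_S
        # Step 6: move each centre to the column-wise median of its rows
        for k in range(K):
            members = X[S == k]
            if len(members) > 0:     # assumption: leave an empty cluster's centre unchanged
                centres[k] = np.median(members, axis=0)
        # Step 7: loop back to step 4
    return S, centres

X_demo = np.array([[0, 0], [0, 1], [10, 10], [10, 11]])
S, centres = k_medians(X_demo, K=2)
```

On this tiny demo the two nearby pairs of rows end up in separate clusters; on real data the result can depend on which rows are picked in step 2.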

How to change value of remainder of a row in a numpy array once a certain condition is met? [duplicate]

This question already has answers here:
Can NumPy take care that an array is (nonstrictly) increasing along one axis?
(2 answers)
Closed 3 years ago.
I have a 2d numpy array of the form:
array = [[0,0,0,1,0], [0,1,0,0,0], [1,0,0,0,0]]
I'd like to go to each of the rows, iterate over the entries until the value 1 is found, then replace every subsequent value in that row with a 1. The output would then look like:
array = [[0,0,0,1,1], [0,1,1,1,1], [1,1,1,1,1]]
My actual data set is very large, so I was wondering if there is a specialized numpy function that does something like this, or if there's an obvious way to do it that I'm missing.
Thanks!
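You can use np.apply_along_axis.
import numpy as np
array = np.array([[0,0,0,1,0], [0,1,0,0,0], [1,0,0,0,0]])
def myfunc(l):
    # find the position of the first 1 (or the end of the row if there is none)
    i = 0
    while i < len(l) and l[i] != 1:
        i += 1
    # keep zeros before that position, ones from there on
    return [0]*i + [1]*(len(l)-i)
print(np.apply_along_axis(myfunc, 1, array))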
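Note that np.apply_along_axis still calls the Python function once per row, so it may be slow on a very large array. Since the data contains only 0s and 1s, a running maximum along each row should give the same result in a single vectorized call. A small sketch, assuming the input really is limited to 0 and 1:

```python
import numpy as np

array = np.array([[0, 0, 0, 1, 0],
                  [0, 1, 0, 0, 0],
                  [1, 0, 0, 0, 0]])

# Running maximum along each row: once a 1 has been seen, it carries forward
filled = np.maximum.accumulate(array, axis=1)
print(filled)
```

A row without any 1 simply stays all zeros.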

numpy sort arrays based on last column values [duplicate]

This question already has answers here:
Sorting arrays in NumPy by column
(16 answers)
Closed 4 years ago.
import numpy as np
a = np.array([[5,9,44],
              [5,12,43],
              [5,33,11]])
b = np.sort(a,axis=0)
print(b) # not what I want
# [[ 5 9 11]
# [ 5 12 43]
# [ 5 33 44]]
#desired output:
#[[5,33,11],
# [5,12,43],
# [5,9,44]]
np.sort with axis=0 sorts each column independently (from lowest to highest), which breaks up the rows, but I would like to keep the rows intact. I would like to order the rows by their last column value while leaving the contents of each row untouched. Is there a Pythonic way to do this?
Thanks
ind=np.argsort(a[:,-1])
b=a[ind]
EDIT
When you use axis in sort, it sorts every column individually. What you want is to get the indices that sort the selected column (-1 refers to the last column) and then reorder the original array with them.
a[a[:,-1].argsort()]
may work for you
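Applied to the array from the question, the argsort approach can be sketched as follows. The kind="stable" argument and the descending variant are optional extras I added for illustration:

```python
import numpy as np

a = np.array([[5, 9, 44],
              [5, 12, 43],
              [5, 33, 11]])

# Row order that sorts the last column; "stable" keeps tied rows in their original order
order = np.argsort(a[:, -1], kind="stable")
ascending = a[order]         # rows ordered by last column, lowest first
descending = a[order[::-1]]  # same rows, highest last-column value first
print(ascending)
```

Each row is moved as a whole, so the values inside a row stay together.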
