Python numpy sampling a 2D array for n rows

I have a numpy array as follows, I want to take a random sample of n rows.
array([[996.924, 265.879, 191.655],
       [996.924, 265.874, 191.655],
       [996.925, 265.884, 191.655],
       [997.294, 265.621, 192.224],
       [997.294, 265.643, 192.225],
       [997.304, 265.652, 192.223]], dtype=float32)
I've tried:
rows_id = random.sample(range(0,arr.shape[1]-1), 1)
row = arr[rows_id, :]
But this index mask only returns a single row; I want to return n rows as a numpy array, without duplicates.

You have three key issues. First, arr.shape[1] returns the number of columns, while you want the number of rows, arr.shape[0]. Second, the stop parameter of range is exclusive, so you don't need the -1. Third, the last argument to random.sample is the number of rows to draw, which you set to 1 instead of n.
That said, np.random.choice with replace=False is a better fit here: it draws n distinct indices in one call.

Try this, where x is your original array:
import numpy as np

n = 2  # number of rows to sample
idx = np.random.choice(len(x), n, replace=False)  # n distinct row indices
result = x[idx]  # fancy indexing returns the selected rows as a new array
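On a recent numpy, the same idea with the newer Generator API might look like this (a minimal sketch; the seedless default_rng() call and n = 2 are placeholders):
import numpy as np

rng = np.random.default_rng()
n = 2
rows = rng.choice(x, size=n, replace=False, axis=0)  # sample n distinct rows directly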

Related

Numpy: adding rows each step and then working by columns

I'm trying to do the following in Python using Numpy.
At every step I receive a row of values by calling a function; to keep it simple, assume I call GetRowOfValues().
After 5 rows I want to sum each column,
and return a full row which is the sum of the 5 rows received.
Does anyone have an idea how to implement this using numpy?
Thanks for the help.
I'm assuming that rows have a fixed length n and that their values are of float data type.
import numpy as np

n = 10  # adjust according to your need
cache = np.empty((5, n), dtype=float)  # allocate an empty 5 x n array

cycle = True
while cycle:
    for i in range(5):
        cache[i, :] = GetRowOfValues()  # save result of function call in i-th row
    column_sum = np.sum(cache, axis=0)  # sum by column
    # your logic here...
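To try this end to end, you could stub out GetRowOfValues; the stub below is hypothetical, only there to make the snippet runnable:
import numpy as np

n = 10

def GetRowOfValues():
    # hypothetical stub: returns a row of n random floats
    return np.random.rand(n)

cache = np.empty((5, n), dtype=float)
for i in range(5):
    cache[i, :] = GetRowOfValues()
column_sum = np.sum(cache, axis=0)
print(column_sum)  # one row holding the column-wise sum of the 5 rows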

How best to randomly select a number of non-zero elements from an array with many duplicate integers

I need to randomly select x non-zero integers from an unsorted 1D numpy array containing y integer elements, including an unknown number of zeros as well as duplicate integers. The output should include duplicate integers if the random draw picks them. What is the best way to achieve this?
One option is to select the non-zero elements first and then use the generator's choice() method (with the replace parameter set to True or False, depending on whether the same element may be drawn more than once) to select the required number of elements.
Something like this:
import numpy as np

rng = np.random.default_rng()  # the generator interface recommended by numpy
n = 4  # number of non-zero samples
arr = np.array([1, 2, 0, 0, 4, 2, 3, 0, 0, 4, 2, 1])
non_zero_arr = arr[arr != 0]  # keep only the non-zero elements
sample = rng.choice(non_zero_arr, n, replace=True)
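One caveat worth noting: with replace=False the draw fails once n exceeds the number of non-zero elements, so a small guard may be useful (a sketch, reusing the names above):
if n <= non_zero_arr.size:
    sample = rng.choice(non_zero_arr, n, replace=False)  # each position drawn at most once
else:
    raise ValueError("not enough non-zero elements to sample without replacement")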

Numpy setting every column in a matrix to a certain value matching a condition

I have a matrix D and sort every row with the indices (argsort). I'm trying to set the values of some_matrix at indices 1-5 of np.argsort(D) to 1. What I have below does what I need, but is there a way to do this in one line with numpy arrays?
some_matrix = np.zeros((n, n))
for i in range(n):
    some_matrix[i, np.argsort(D)[i, 1:5]] = 1
Firstly, note that you don't need a full sort, only a partition around positions 1 and 4 (I assume you need elements 1, 2, 3, 4, because that's what your code does). So let's use that:
#assuming you want indices 1,2,3,4 of the sorted array, in any order
indices = np.argpartition(D, (1, 4), axis=1)[:, 1:5]
Now, for each row, we have the column indices of its 2nd through 5th smallest elements (this is equivalent to indices = np.argsort(D, axis=1)[:, 1:5], but will be faster for large arrays; note the four indices may come back in any order). All that remains is to set these elements to 1:
np.put_along_axis(some_matrix, indices, 1, axis=1)
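Putting it together on a small example (the 5x5 D below is made up for illustration):
import numpy as np

n = 5
D = np.random.rand(n, n)
some_matrix = np.zeros((n, n))
indices = np.argpartition(D, (1, 4), axis=1)[:, 1:5]
np.put_along_axis(some_matrix, indices, 1, axis=1)
# each row of some_matrix now has ones in the columns holding the
# 2nd through 5th smallest values of the corresponding row of D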

How can I compare two matrices row-wise in python?

I have two matrices with the same number of columns but a different number of rows, one is a lot larger.
matA = [[1,0,1],[0,0,0],[1,1,0]], matB = [[0,0,0],[1,0,1],[0,0,0],[1,1,1],[1,1,0]]
Both of them are numpy matrices
I am trying to find how many times each row of matA appears in matB and put that in an array. In this case the array becomes arr = [1, 2, 1], because the first row of matA appears once in matB, the second row appears twice, and the last row once.
Related: Find unique rows in numpy.array; What is a faster way to get the location of unique rows in numpy.
Here is a solution:
import numpy as np
A = np.array([[1,0,1],[0,0,0],[1,1,0]])
B = np.array([[0,0,0],[1,0,1],[0,0,0],[1,1,1],[1,1,0]])
# stack the rows, A has to be first
combined = np.concatenate((A, B), axis=0) #or np.vstack
unique, unique_indices, unique_counts = np.unique(
    combined, return_index=True, return_counts=True, axis=0)
print(unique)
print(unique_indices)
print(unique_counts)
# now we need to derive your desired result from the unique
# indices and counts
# we know the number of rows in A
n_rows_in_A = A.shape[0]
# so we know that the indices from 0 to (n_rows_in_A - 1)
# in unique_indices are rows that appear first or only in A
indices_A = np.nonzero(unique_indices < n_rows_in_A)[0] #first
#indices_A1 = np.argwhere(unique_indices < n_rows_in_A)
print(indices_A)
#print(indices_A1)
unique_indices_A = unique_indices[indices_A]
unique_counts_A = unique_counts[indices_A]
print(unique_indices_A)
print(unique_counts_A)
# now we need to subtract one count from the unique_counts:
# that's the one occurrence in A itself that we are not interested in
unique_counts_A -= 1
print(unique_indices_A)
print(unique_counts_A)
# this is nearly the result we want
# we still need to sort it and account for rows that
# appear in B but not in A
# will do that later...
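For reference, a much shorter broadcasting sketch (my addition, not part of the answer above) computes the counts directly, assuming both matrices fit comfortably in memory:
# compare every row of A against every row of B, then count full-row matches
counts = (A[:, None, :] == B[None, :, :]).all(axis=2).sum(axis=1)
print(counts)  # [1 2 1] for the example above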

Numpy: Finding correspondences in one array by uniques of another array, arbitrary length

I have a problem where I have two arrays, one with identifiers which can occur multiple times, let's say
import numpy as np
ind = np.random.randint(0,10,(100,))
and another one of the same length that contains some info, in this case boolean, for each of the elements identified by ind. They are sorted correspondingly.
dat = np.random.randint(0,2,(100,)).astype(np.bool8)
I'm looking for a (faster?) way to do the following: compute np.any() over dat for each unique identifier in ind. The number of occurrences per identifier is random, as in the example. What I'm doing now is
uniques = np.unique(ind)
result = np.empty(len(uniques), dtype=bool)
for i, uni in enumerate(uniques):
    result[i] = np.any(dat[ind == uni])
which is sort of slow. Any ideas?
Approach #1
Mask ind with dat to keep only the identifiers whose flag is True, get the binned counts with np.bincount and see which bins have at least one occurrence -
result = np.bincount(ind[dat])>0
If ind has negative numbers, offset it with the min value -
ar = ind[dat]
result = np.bincount(ar-ar.min())>0
Approach #2
One more with np.unique -
unq = np.unique(ind[dat])
n = len(np.unique(ind))
result = np.zeros(n,dtype=bool)
result[unq] = 1
We can use pandas to get n:
import pandas as pd
n = pd.Series(ind).nunique()
Approach #3
One more with indexing -
ar = ind[dat]
result = np.zeros(ar.max()+1,dtype=bool)
result[ar] = 1
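A further variant (my addition, not from the answer above) folds the per-bin any into one unbuffered pass with ufunc.at, assuming ind is non-negative:
result = np.zeros(ind.max() + 1, dtype=bool)
np.logical_or.at(result, ind, dat)  # OR each dat flag into its identifier's bin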
