Given an m x n NumPy array
X = np.array([
[1, 2],
[10, 20],
[100, 200]
])
how can I find the index of a row, i.e. [10, 20] -> 1?
n could be anything (2, 3, ...), so I could also have m x 3 arrays
Y = np.array([
[1, 2, 3],
[10, 20, 30],
[100, 200, 300]
])
so I need to pass a vector of size n, in this case n=3, i.e. a vector [10, 20, 30], to get its row index 1. Again, n could be any value, like 100 or 1000.
NumPy arrays could be big, so I don't want to convert them to lists to use .index().
In case the array contains duplicates of the row you are looking for, the function below returns all matching indices.
def find_rows(source, target):
    return np.where((source == target).all(axis=1))[0]
looking = [10, 20, 30]
Y = np.array([[1, 2, 3],
[10, 20, 30],
[100, 200, 300],
[10, 20, 30]])
print(find_rows(source=Y, target=looking)) # [1, 3]
You can use numpy.equal, which broadcasts the row vector and compares it against each row of the original array. If all elements of a row are equal to the target, the row is identical to the target:
import numpy as np
np.flatnonzero(np.equal(X, [10, 20]).all(1))
# [1]
np.flatnonzero(np.equal(Y, [10, 20, 30]).all(1))
# [1]
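To make the broadcasting step concrete, here is a small sketch that prints the intermediate boolean arrays. The helper name row_index and its -1 return value for a missing row are assumptions made for illustration; they are not part of the answer above.

```python
import numpy as np

X = np.array([[1, 2],
              [10, 20],
              [100, 200]])

# Broadcasting compares the target against every row at once,
# giving one boolean per element.
elementwise = np.equal(X, [10, 20])
print(elementwise)
# [[False False]
#  [ True  True]
#  [False False]]

# A row matches only if every element in it matched.
row_matches = elementwise.all(axis=1)
print(row_matches)  # [False  True False]


def row_index(source, target):
    """Return the index of the first row equal to target, or -1 if none.

    Hypothetical helper; the -1 convention is an assumption.
    """
    hits = np.flatnonzero(np.equal(source, target).all(axis=1))
    return int(hits[0]) if hits.size else -1


print(row_index(X, [10, 20]))  # 1
print(row_index(X, [10, 99]))  # -1
```

Nothing here converts the array to a list; the comparison and the reduction both run inside NumPy.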
You can make a function as follows:
def get_index(seq, *arrays):
    for array in arrays:
        try:
            # .all(axis=1) keeps only rows where every element matches
            return np.where((array == seq).all(axis=1))[0][0]
        except IndexError:
            pass
then:
>>>get_index([10,20,30],Y)
1
Or with just indexing (this returns an array of all matching row indices):
>>>np.where((Y==[10,20,30]).all(axis=1))[0]
array([1])
Related
This is a follow up on this question.
From a 2d array, create another 2d array composed of randomly selected values from original array (values not shared among rows) without using a loop
I am looking for a way to create a 2D array whose rows are randomly selected unique values (non-repeating) from another row, without using a loop.
Here is a way to do it using a loop.
pool = np.random.randint(0, 30, size=[4,5])
seln = np.empty([4,3], int)
for i in range(0, pool.shape[0]):
    seln[i] = np.random.choice(pool[i], 3, replace=False)
print('pool = ', pool)
print('seln = ', seln)
>pool = [[ 1 11 29 4 13]
[29 1 2 3 24]
[ 0 25 17 2 14]
[20 22 18 9 29]]
seln = [[ 8 12 0]
[ 4 19 13]
[ 8 15 24]
[12 12 19]]
Here is a method that does not use a loop; however, it can select the same value multiple times in each row.
pool = np.random.randint(0, 30, size=[4,5])
print(pool)
array([[ 4, 18, 0, 15, 9],
[ 0, 9, 21, 26, 9],
[16, 28, 11, 19, 24],
[20, 6, 13, 2, 27]])
# New array shape
new_shape = (pool.shape[0],3)
# Indices where to randomly choose from
ix = np.random.choice(pool.shape[1], new_shape)
array([[0, 3, 3],
[1, 1, 4],
[2, 4, 4],
[1, 2, 1]])
ixs = (ix.T + range(0,np.prod(pool.shape),pool.shape[1])).T
array([[ 0, 3, 3],
[ 6, 6, 9],
[12, 14, 14],
[16, 17, 16]])
pool.flatten()[ixs].reshape(new_shape)
array([[ 4, 15, 15],
[ 9, 9, 9],
[11, 24, 24],
[ 6, 13, 6]])
I am looking for a method that does not use a loop and in which a value that has been selected from a row cannot be selected again.
Here is a way without explicit looping. However, it requires generating an array of random numbers of the size of the original array. That said, the generation is done using compiled code so it should be pretty fast. It can fail if you happen to generate two identical numbers, but the chance of that happening is essentially zero.
m,n = 4,5
pool = np.random.randint(0, 30, size=[m,n])
new_width = 3
mask = np.argsort(np.random.rand(m,n))<new_width
pool[mask].reshape(m, new_width)
How it works:
We generate a random array of floats and argsort it. By default, argsort works along the last axis, which for a 2D array is axis 1, so each row is handled independently: the (i, j) entry of the result is the column index of the element that would land in position j if you sorted the i-th row. Since the floats are random, each row of the result is a random permutation of 0, ..., n-1.
We then mark all entries of this array that are less than new_width. Each row contains the numbers 0, ..., n-1 in a random order, so exactly new_width of them are less than new_width. This means each row of mask has exactly new_width entries that are True, and the rest are False (a comparison between an ndarray and a scalar is applied component-wise).
Finally, the boolean mask is applied to the original data to grab new_width many entries from each row.
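The three steps can be checked with a short sketch. The seeded generator and the names perm, seln_mask and seln_idx are assumptions added for illustration. The second variant, based on np.take_along_axis, is a different technique from the mask in the answer and is shown only for comparison.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
m, n, new_width = 4, 5, 3
pool = rng.integers(0, 30, size=(m, n))

# Each row of perm is a random permutation of 0..n-1.
perm = np.argsort(rng.random((m, n)), axis=1)

# Exactly new_width entries per row are smaller than new_width.
mask = perm < new_width
assert (mask.sum(axis=1) == new_width).all()

# Boolean indexing flattens row by row, so reshape restores the rows.
# Selected values keep their original left-to-right order.
seln_mask = pool[mask].reshape(m, new_width)

# Variant: use the first new_width permutation entries as column indices,
# which also randomises the order within each row.
seln_idx = np.take_along_axis(pool, perm[:, :new_width], axis=1)

print(seln_mask)
print(seln_idx)
```

Both variants pick new_width distinct positions per row. If the pool itself contains repeated values in a row, the same value can still appear twice because it sits at two different positions.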
You could also use np.vectorize for your loop solution, although that is just shorthand for a loop.
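As a rough sketch of the np.vectorize idea, the per-row call from the loop version can be wrapped with a signature so that whole rows are passed in. The function name pick_three is hypothetical, and this is still a Python-level loop under the hood, so no speed-up should be expected.

```python
import numpy as np

pool = np.random.randint(0, 30, size=[4, 5])


def pick_three(row):
    # Same per-row call as in the loop version.
    return np.random.choice(row, 3, replace=False)


# The signature tells np.vectorize to pass whole rows (length n) and
# to expect length-k vectors back; internally it still loops in Python.
pick_rows = np.vectorize(pick_three, signature="(n)->(k)")

seln = pick_rows(pool)
print(seln.shape)  # (4, 3)
```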
For example, I have a list of N B x H tensors (i.e. an N x B x H tensor) and a list of N vectors (i.e. an N x B tensor). I want to multiply each B x H tensor in the list with the corresponding B-dimensional tensor, resulting in an N x H tensor.
I know how to use a single for-loop with PyTorch to implement the computation, but is there a vectorised implementation (i.e. no for-loop, just PyTorch/numpy operations)?
You could achieve this with torch.bmm() and some torch.squeeze()/torch.unsqueeze().
I am personally rather fond of the more generic torch.einsum() (which I find more readable):
import torch
import numpy as np
A = torch.from_numpy(np.array([[[1, 10, 100], [2, 20, 200], [3, 30, 300]],
[[4, 40, 400], [5, 50, 500], [6, 60, 600]]]))
B = torch.from_numpy(np.array([[ 1, 2, 3],
[-1, -2, -3]]))
AB = torch.einsum("nbh,nb->nh", A, B)
print(AB)
# tensor([[ 14, 140, 1400],
# [ -32, -320, -3200]])
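The batched-matrix-multiply route mentioned at the start of this answer can be sketched in NumPy, where np.matmul plays the role of torch.bmm() and adding or dropping a length-1 axis plays the role of unsqueeze/squeeze. This is an analogue under that assumption, not the PyTorch code itself. The names AB_matmul and AB_loop are made up for the example.

```python
import numpy as np

A = np.array([[[1, 10, 100], [2, 20, 200], [3, 30, 300]],
              [[4, 40, 400], [5, 50, 500], [6, 60, 600]]])  # N x B x H
B = np.array([[1, 2, 3],
              [-1, -2, -3]])                                 # N x B

# Turn each B-vector into a 1 x B row matrix, multiply it with the
# matching B x H matrix, then drop the singleton axis again.
AB_matmul = np.matmul(B[:, None, :], A)[:, 0, :]            # N x H

# Reference: the explicit per-item loop the question wants to avoid.
AB_loop = np.stack([B[i] @ A[i] for i in range(A.shape[0])])

print(AB_matmul)
# [[   14   140  1400]
#  [  -32  -320 -3200]]
```

The result matches the einsum output above, which is a quick way to convince yourself that the subscripts "nbh,nb->nh" sum over the b axis only.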
I'm using a for loop to pull values from 2 arrays and then put them together in a new array. I can get the last row of the array, but not all the rows before it. (I would like to append each row to the same array.) Any ideas?
import numpy as np
# Array inputs
a = np.array([0, 1, 2])
b = np.array([[1, 10, 20, 30], [2, 40, 50, 60], [3, 70, 80, 90]])
Note that the a array will be random depending on what file is loaded. So for one file, it might be:
([1, 1, 2])
And then for another file it might be:
([0, 2, 1])
So, each value in the a array selects a row of the b array, the last 3 values of that row are taken, and the position within a is recorded as the index.
From these 2 arrays, I want a new array like this:
# ([[0, 10, 20, 30],
# [1, 40, 50, 60],
# [2, 70, 80, 90]])
Here's my loop:
# Loop to put all values in c and d arrays:
for index, value in enumerate(np.nditer(a)):
    c = b[value][1:4]
    d = index
    # Stack c and d array into e
    e = np.hstack((c, d))
But returns this:
([2, 70, 80, 90]) # Only last line of loop.
# I wish to get last line and all lines before it.
Without using a for loop, you can copy the contents of b to e and then replace the first column of every row of e (using e[:,0]) with a:
import numpy as np
# Array inputs
a = np.array([0, 1, 2])
b = np.array([[1, 10, 20, 30], [2, 40, 50, 60], [3, 70, 80, 90]])
e = np.copy(b)
e[:,0] = a
print(e)
Result:
[[ 0 10 20 30]
[ 1 40 50 60]
[ 2 70 80 90]]
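If a is read the way the question's loop uses it, as a list of row numbers to look up in b, a fancy-indexing sketch covers the case where a is not simply 0, 1, 2. This reading of the question is an assumption, and the variable name values is made up for the example.

```python
import numpy as np

a = np.array([0, 2, 1])  # example lookup order, as in the question
b = np.array([[1, 10, 20, 30], [2, 40, 50, 60], [3, 70, 80, 90]])

# b[a, 1:4] picks row a[i] of b for every i and keeps its last 3 values.
values = b[a, 1:4]

# Prepend the running index 0..len(a)-1 as the first column.
e = np.column_stack((np.arange(len(a)), values))
print(e)
# [[ 0 10 20 30]
#  [ 1 70 80 90]
#  [ 2 40 50 60]]
```

With a = [0, 1, 2] this gives the same array as the copy-and-assign version above.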
I have the following matrix:
M = np.matrix([[1,2,3,4,5,6,7,8,9,10],
[11,12,13,14,15,16,17,18,19,20],
[21,22,23,24,25,26,27,28,29,30]])
And I receive a vector indexing the columns of the matrix:
index = np.array([1,1,2,2,2,2,3,4,4,4])
This vector has 4 different values, so my objective is to create a list containing four new matrices so that the first matrix is made by the first two columns of M, the second matrix is made by columns 3 to 6 and so on:
M1 = np.matrix([[1,2],[11,12],[21,22]])
M2 = np.matrix([[3,4,5,6],[13,14,15,16],[23,24,25,26]])
M3 = np.matrix([[7],[17],[27]])
M4 = np.matrix([[8,9,10],[18,19,20],[28,29,30]])
l = [M1, M2, M3, M4]
I need to do this in an automated way, since the number of rows and columns of M as well as the indexing scheme are not fixed. How can I do this?
There are 3 points to note:
For a variable number of variables, as in this case, the recommended solution is to use a dictionary.
You can use simple numpy indexing for the individual case.
Unless you have a very specific reason, use numpy.array instead of numpy.matrix.
Combining these points, you can use a dictionary comprehension:
d = {k: np.array(M[:, np.where(index==k)[0]]) for k in np.unique(index)}
Result:
{1: array([[ 1, 2],
[11, 12],
[21, 22]]),
2: array([[ 3, 4, 5, 6],
[13, 14, 15, 16],
[23, 24, 25, 26]]),
3: array([[ 7],
[17],
[27]]),
4: array([[ 8, 9, 10],
[18, 19, 20],
[28, 29, 30]])}
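If a plain list of blocks is wanted, as in the question, and equal labels always sit next to each other in index (an assumption that holds for the example but is not stated in general), np.split offers another route. The name boundaries is made up for this sketch.

```python
import numpy as np

M = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
              [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
              [21, 22, 23, 24, 25, 26, 27, 28, 29, 30]])
index = np.array([1, 1, 2, 2, 2, 2, 3, 4, 4, 4])

# Column positions where the label changes: [2, 6, 7].
boundaries = np.flatnonzero(np.diff(index)) + 1

# Split the columns at those positions; one block per run of labels.
blocks = np.split(M, boundaries, axis=1)

for block in blocks:
    print(block.shape)
# (3, 2)
# (3, 4)
# (3, 1)
# (3, 3)
```

If the same label can appear in several separate runs, this produces one block per run, so the dictionary comprehension above is the safer choice in that case.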
import numpy as np
M = np.matrix([[1,2,3,4,5,6,7,8,9,10],
[11,12,13,14,15,16,17,18,19,20],
[21,22,23,24,25,26,27,28,29,30]])
index = np.array([1,1,2,2,2,2,3,4,4,4])
m = [[], [], [], []]  # one list of column positions per label 1..4
for i, c in enumerate(index):
    m[c-1].append(i)  # label c owns column i
for idx in m:
    print(M[:, idx])
This is a little hard-coded: I assumed you will always want 4 matrices. You can change it to make it more general.
I have several multidimensional arrays that have been zipped into a single list and am trying to remove values from the list according to a selection criteria applied to a single array. Specifically I have the 4 arrays, all of which have the same shape, that have all been zipped into one list of arrays:
in: array1.shape
out: (5,3)
...
in: array4.shape
out: (5,3)
in: array1
out: ([[0, 1, 1],
[0, 0, 1],
[0, 0, 1],
[0, 1, 1],
[0, 0, 0]])
in: array4
out: ([[20, 16, 20],
[15, 19, 17],
[21, 24, 23],
[22, 22, 26],
[27, 24, 23]])
in: fullarray = zip(array1,...,array4)
in: fullarray[0]
out: (array([0, 1, 1]), array([3, 4, 5]), array([33, 34, 35]), array([20, 16, 20]))
I am trying to iterate over the values from a single target array within each set of arrays and select the values from each array with the same index as the target array when the value equals 20. I doubt I explained that clearly so I'll give an example.
in: fullarray[0]
out: (array([0, 1, 1]), array([3, 4, 5]), array([33, 34, 35]), array([20, 16, 20]))
What I want is to iterate over the values of the fourth array in fullarray[x] and, where the value equals 20, take the value at the same index from each array and append them to a new list as an array.
So the output for fullarray[0] would be [[0, 3, 33, 20], [1, 5, 35, 20]].
My previous attempts have all generated a variety of error messages (example below). Any help would be appreciated.
in: for i in g:
for n in i:
if n == 3:
for k in n:
if k == 0:
newlist.append(i[k])
out: for i in fullarray:
2 for n in i:
----> 3 if n == 3:
4 for k in n:
5 if k == 0:
ValueError: The truth value of an array with more than one element is ambiguous.
Use a.any() or a.all()
Edit for modified question:
Here's a piece of code that does what you want. The time complexity could probably be improved, though.
from numpy import array
fullarray = [(array([0, 1, 1]), array([3, 4, 5]), array([33, 34, 35]), array([20, 16, 20]))]
newlist = []
for arrays in fullarray:
    for idx, value in enumerate(arrays[3]):
        if value == 20:
            # arr, not array, to avoid shadowing the imported numpy.array
            newlist.append([arr[idx] for arr in arrays])
print(newlist)
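For one tuple of equally long arrays, the same selection can be sketched without an explicit loop by stacking the arrays as columns and applying a boolean mask. The names stacked and selected are assumptions made for this example.

```python
import numpy as np

arrays = (np.array([0, 1, 1]), np.array([3, 4, 5]),
          np.array([33, 34, 35]), np.array([20, 16, 20]))

# Put the four arrays side by side: row i holds the i-th value of each.
stacked = np.column_stack(arrays)          # shape (3, 4)

# Keep the rows where the fourth array equals 20.
selected = stacked[arrays[3] == 20]
print(selected)
# [[ 0  3 33 20]
#  [ 1  5 35 20]]
```

To cover the whole list, the same two lines could be applied to each tuple in fullarray and the results collected.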
Old answer: Assuming all your arrays are the same size you could do the following:
full[idx] contains a tuple with the values of all your arrays at the index idx in the order you zipped them.
import numpy as np
ar1 = np.array([1] * 8)
ar2 = np.array([2] * 8)
full = list(zip(ar1, ar2))  # list() so it can be indexed in Python 3
print(full)
newlist = []
for idx, v in enumerate(ar1):
    if v != 0:
        newlist.append(full[idx])  # Here you get a tuple such as (ar1[idx], ar2[idx])
But if len(ar1) > len(ar2), it is going to throw an exception, so keep that in mind and adjust your code accordingly.