Row wise element search in an array - python

I have a vector (say v = (1, 5, 7)) and an array:
a = [ [1, 2, 3],
[4, 5, 6],
[7, 8, 9] ]
What would be the most efficient way to find the index of each element of v in the corresponding row of a? For example, the output here would be
b = (0, 1, 0), since 1 is at index 0 of the first row, 5 is at index 1 of the second row, and so on.

You can convert v to a column vector with [:,None], compare it with a so that broadcasting matches each v[i] against row i of a, and finally use np.where to get the column indices of the matches -
np.where(a == v[:,None])[1]
Sample run -
In [34]: a
Out[34]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
In [35]: v
Out[35]: array([1, 5, 7])
In [36]: np.where(a == v[:,None])[1]
Out[36]: array([0, 1, 0])
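The sketch below uses the same a and v as the sample run and prints the intermediate boolean mask, so you can see what the broadcasting produces. The row check is an extra guard I added, not part of the answer: np.where(...)[1] only gives one index per row when every row has exactly one match.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
v = np.array([1, 5, 7])

# v[:, None] has shape (3, 1); comparing it with a (3, 3) broadcasts
# each v[i] against the whole row a[i]
mask = a == v[:, None]
print(mask)

rows, cols = np.where(mask)
# cols is "one index per row" only if each row contributed exactly one match
if np.array_equal(rows, np.arange(a.shape[0])):
    b = cols
else:
    raise ValueError("some row has zero or several matches")
print(b)  # [0 1 0]
```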
In case there are multiple elements in a row of a that match the corresponding element of v, you can use np.argmax to get the index of the first match in each row, like so -
np.argmax(a == v[:,None],axis=1)
Sample run -
In [57]: a
Out[57]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 7]])
In [58]: v
Out[58]: array([1, 5, 7])
In [59]: np.argmax(a == v[:,None],axis=1)
Out[59]: array([0, 1, 0])
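One caveat with np.argmax: a row with no match is all False, and argmax then returns 0, which looks like a valid index. A minimal sketch that flags such rows, assuming -1 is an acceptable "not found" sentinel (the function name first_match_per_row is hypothetical):

```python
import numpy as np

def first_match_per_row(a, v, missing=-1):
    """Index of the first occurrence of v[i] in row a[i], or `missing` if absent."""
    mask = a == np.asarray(v)[:, None]
    idx = np.argmax(mask, axis=1)  # 0 for rows that are all False
    # keep argmax's answer only for rows that actually contain a match
    return np.where(mask.any(axis=1), idx, missing)

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 7]])
print(first_match_per_row(a, [1, 5, 7]).tolist())  # [0, 1, 0]
print(first_match_per_row(a, [1, 9, 7]).tolist())  # [0, -1, 0]
```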

>>> a = [ [1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>> v = (1, 5, 7)
>>> b = tuple([a[idx].index(val) for idx, val in enumerate(v)])
>>> b
(0, 1, 0)

You can use a list comprehension:
[a[idx].index(val) for idx, val in enumerate(v)]
Here enumerate yields (index, value) pairs, and list.index returns the index of the first appearance of val in the corresponding row.
If you need a tuple as the return value, convert it at the end:
b = tuple([a[idx].index(val) for idx, val in enumerate(v)])
Note that index raises a ValueError if val is not found in the corresponding row of a.
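If you would rather get a sentinel than an exception, a pure-Python sketch can use next() with a default value. The helper name find_in_rows and the -1 sentinel are my own assumptions; zip pairs each row with its value instead of enumerate.

```python
def find_in_rows(a, v, missing=-1):
    """For each i, return the index of v[i] in row a[i], or `missing` if absent."""
    result = []
    for row, val in zip(a, v):
        # next() returns the default instead of raising when nothing matches
        result.append(next((j for j, x in enumerate(row) if x == val), missing))
    return tuple(result)

a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(find_in_rows(a, (1, 5, 7)))  # (0, 1, 0)
print(find_in_rows(a, (1, 0, 7)))  # (0, -1, 0)
```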

Related

remove array() from return

I am creating a function that takes a tuple of two lists as data and returns the data sorted in increasing order with respect to the first list's values (this isn't very important to my question, but it gives context). Here is what I have:
import numpy as np
def sort_data(data):
    """ (tuple) -> tuple
    data is a tuple of two lists.
    Returns a copy of the input tuple sorted in
    non-decreasing order with respect to the
    data[0]
    >>> sort_data(([5, 1, 7], [1, 2, 3]))
    ([1, 5, 7], [2, 1, 3])
    >>> sort_data(([2, 4, 8], [1, 2, 3]))
    ([2, 4, 8], [1, 2, 3])
    >>> sort_data( ([11, 4, -5], [1, 2, 3]))
    ([-5, 4, 11], [3, 2, 1])
    """
    ([a,b,c],[d,e,f]) = data
    x = [a,b,c]
    y = [d,e,f]
    xarray = np.array(x)
    yarray = np.array(y)
    x1 = np.argsort(xarray)
    xsort = (xarray[x1])
    ysort = (yarray[x1])
    #remove array()
    return ([xsort],[ysort])
This works, but the return value is slightly off. For example, I want this, as shown in my docstring:
>>> sort_data(([5, 1, 7], [1, 2, 3]))
([1, 5, 7], [2, 1, 3])
but instead I got this:
([array([1, 5, 7])], [array([2, 1, 3])])
How could I remove the array() so that I just have the two lists in a tuple as my return value? I tried to convert it to a tuple, but then it is two tuples, when I only want one.
Since you are using argsort, you can sort both rows together:
In [78]: data = ([5, 1, 7], [1, 2, 3])
Make an array from the tuple of lists:
In [79]: arr = np.array(data)
In [80]: arr
Out[80]:
array([[5, 1, 7],
[1, 2, 3]])
sorting index:
In [81]: idx = np.argsort(arr[0])
In [82]: idx
Out[82]: array([1, 0, 2])
apply it to the columns:
In [83]: arr[:,idx]
Out[83]:
array([[1, 5, 7],
[2, 1, 3]])
make that array a list:
In [84]: arr[:,idx].tolist()
Out[84]: [[1, 5, 7], [2, 1, 3]]
Since you are given a tuple of lists, there should also be a way to do this sorting with Python's built-in sorted and its key argument, but I haven't used that nearly as much as NumPy.
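Applied to the function in the question, the minimal change is to return x[order].tolist() instead of wrapping each array in a list. The sketch below also drops the [a,b,c] unpacking so it works for lists of any length; kind="stable" is my own choice so that equal keys keep their original order.

```python
import numpy as np

def sort_data(data):
    """(tuple) -> tuple

    Return a copy of data (a tuple of two lists) with both lists reordered
    so that data[0] is in non-decreasing order.
    """
    x = np.array(data[0])
    y = np.array(data[1])
    order = np.argsort(x, kind="stable")
    # tolist() turns the arrays back into plain Python lists of ints
    return (x[order].tolist(), y[order].tolist())

print(sort_data(([5, 1, 7], [1, 2, 3])))  # ([1, 5, 7], [2, 1, 3])
```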
I don't know if this is the best way:
In [11]: data = ([5, 1, 7], [1, 2, 3])
sort the first list, recording each value's index as well:
In [12]: x=sorted([(v,i) for i,v in enumerate(data[0])], key=lambda x:x[0])
In [13]: x
Out[13]: [(1, 1), (5, 0), (7, 2)]
extract that index:
In [14]: idx = [i[1] for i in x]
In [15]: idx
Out[15]: [1, 0, 2]
use that to return both sublists:
In [16]: [[d[i] for i in idx] for d in data]
Out[16]: [[1, 5, 7], [2, 1, 3]]
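A slightly shorter variant of the same idea, without NumPy, sorts the index positions directly by using key=first.__getitem__ instead of building (value, index) pairs. The name sort_by_first is hypothetical; sorted is stable, so ties keep their original order.

```python
def sort_by_first(data):
    """Reorder both lists in data so that data[0] is non-decreasing."""
    first, second = data
    # positions of `first`, ordered by the values they point to
    order = sorted(range(len(first)), key=first.__getitem__)
    return ([first[i] for i in order], [second[i] for i in order])

print(sort_by_first(([5, 1, 7], [1, 2, 3])))  # ([1, 5, 7], [2, 1, 3])
```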

Compare two 3d Numpy array and return unmatched values with index and later recreate them without loop

I am currently working on a problem where one requirement is to compare two 3D NumPy arrays, return the unmatched values with their index positions, and later recreate the same array. Currently, the only approach I can think of is to loop over the arrays to get the values while comparing, and again while recreating. The problem is scale: there will be hundreds of arrays, and looping affects the latency of the overall application. I would be thankful if anyone can help me make better use of NumPy comparisons with minimal or no loops. Dummy code is below:
def compare_array(final_array_list):
    base_array = None
    i = 0
    for array in final_array_list:
        if i == 0:
            base_array = array
        else:
            index = np.where(base_array != array)
            # getting index like (array([0, 1]), array([1, 1]), array([2, 2]))
            # to access all unmatched values I need to loop. Need to avoid loop here
        i = i + 1
    # desired return value, something like:
    # [base_array, [unmatched values (8, 10) and their index (array([0, 1]), array([1, 1]), array([2, 2]))], ...]
# similarly recreate array1 back
def recreate_array(array_list):
    # need to avoid looping while recreating array back
    # should return a list of arrays, i.e. [base_array, array_1]
    raise NotImplementedError
# creating dummy array
base_array = np.array([[[1, 2, 3], [3, 4, 5]], [[5, 6, 7], [7, 8, 9]]])
array_1 = b = np.array([[[1, 2,3], [3, 4,8]], [[5, 6,7], [7, 8,10]]])
final_array_list = [base_array,array_1, ...... ]
#compare base_array with other arrays and get unmatched values (like 8,10 in array_1) and their index
difff_array = compare_array(final_array_list)
# recreate array1 from the base array after receiving unmatched value and its index value
recreate_array(difff_array)
I think this may be what you're looking for:
base_array = np.array([[[1, 2, 3], [3, 4, 5]], [[5, 6, 7], [7, 8, 9]]])
array_1 = b = np.array([[[1, 2,3], [3, 4,8]], [[5, 6,7], [7, 8,10]]])
match_mask = (base_array == array_1)
idx_unmatched = np.argwhere(~match_mask)
# idx_unmatched:
# array([[0, 1, 2],
# [1, 1, 2]])
# values associated with idx_unmatched:
values_unmatched = base_array[tuple(idx_unmatched.T)]
# values_unmatched:
# array([5, 9])
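Note that the values above come from base_array (5 and 9). To recreate array_1 later, you need array_1's values at those positions (8 and 10, as in the question). A minimal sketch, assuming both arrays have the same shape: store the index tuple plus array_1's values, then rebuild with a single fancy-indexing assignment instead of a loop (np.nonzero(c) returns the same tuple as np.where(c)).

```python
import numpy as np

base_array = np.array([[[1, 2, 3], [3, 4, 5]], [[5, 6, 7], [7, 8, 9]]])
array_1 = np.array([[[1, 2, 3], [3, 4, 8]], [[5, 6, 7], [7, 8, 10]]])

# Store only what differs: the positions and array_1's values there
diff_idx = np.nonzero(base_array != array_1)  # tuple of index arrays, one per axis
diff_vals = array_1[diff_idx]                 # values 8 and 10

# Recreate array_1 from base_array without a Python loop
recreated = base_array.copy()
recreated[diff_idx] = diff_vals
print(np.array_equal(recreated, array_1))  # True
```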
I'm not sure I understand what you mean by "recreate them" (completely recreate them? why not use the arrays themselves?).
I can help you, though, by noting that there are plenty of functions that vectorize with NumPy, and as a general rule of thumb, do not use for loops unless G-d himself tells you to :)
For example:
If a and b are np.arrays of the same shape (or of shapes that broadcast together), the simple a == b returns a boolean NumPy array of that shape: True where the two are equal at that coordinate, and False otherwise.
The function np.where(c), called with a single argument, treats c as a boolean array and returns the indexes at which c is True, as a tuple of index arrays (one per dimension).
To clarify:
Here I create two arrays, where b differs from a only in three entries that are set to -1.
Note what a == b looks like at the end.
>>> a = np.random.randint(low=0, high=10, size=(4, 4))
>>> b = np.copy(a)
>>> b[2, 3] = -1
>>> b[0, 1] = -1
>>> b[1, 1] = -1
>>> a
array([[9, 9, 3, 4],
[8, 4, 6, 7],
[8, 4, 5, 5],
[1, 7, 2, 5]])
>>> b
array([[ 9, -1, 3, 4],
[ 8, -1, 6, 7],
[ 8, 4, 5, -1],
[ 1, 7, 2, 5]])
>>> a == b
array([[ True, False, True, True],
[ True, False, True, True],
[ True, True, True, False],
[ True, True, True, True]])
Now the function np.where: its output is a bit tricky, but easy to use. For a 2D input it returns two arrays of the same size: the first holds the row indices and the second the column indices of the places where the given array is True.
>>> np.where(a == b)
(array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3], dtype=int64), array([0, 2, 3, 0, 2, 3, 0, 1, 2, 0, 1, 2, 3], dtype=int64))
Now you can "fix" the b array to match a, by replacing b's values at the indexes where it differs from a with a's values at those indexes:
>>> b[np.where(a != b)]
array([-1, -1, -1])
>>> b[np.where(a != b)] = a[np.where(a != b)]
>>> np.all(a == b)
True
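Since the question mentions hundreds of arrays, one way to avoid even the outer loop is to stack them into a single array and compare once, letting base_array broadcast over the new first axis. This sketch assumes all arrays have the same shape as base_array; array_2 is a made-up extra example.

```python
import numpy as np

base_array = np.array([[[1, 2, 3], [3, 4, 5]], [[5, 6, 7], [7, 8, 9]]])
array_1 = np.array([[[1, 2, 3], [3, 4, 8]], [[5, 6, 7], [7, 8, 10]]])
array_2 = base_array.copy()
array_2[0, 0, 0] = -1          # hypothetical second array with one change
others = [array_1, array_2]    # stands in for the rest of final_array_list

# Shape (n_arrays, 2, 2, 3); base_array broadcasts along axis 0
stacked = np.stack(others)
diff_idx = np.nonzero(stacked != base_array)  # first index array says which array
diff_vals = stacked[diff_idx]

# Rebuild all arrays at once: copies of base_array patched with the differences
rebuilt = np.broadcast_to(base_array, stacked.shape).copy()
rebuilt[diff_idx] = diff_vals
print(np.array_equal(rebuilt, stacked))  # True
```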

How to sort a numpy matrix using a mask?

I have two matrices, A and B, which look like this:
A = array([[2, 2, 1, 0, 8],
[8, 2, 0, 3, 7],
[3, 2, 6, 5, 3],
[1, 4, 2, 5, 8],
[2, 3, 7, 0, 3]])
B = array([[3, 7, 6, 8, 3],
[0, 7, 4, 4, 3],
[1, 2, 0, 0, 4],
[8, 6, 6, 7, 1],
[8, 1, 0, 4, 8]])
I am trying to sort each row of A and B, BUT I need each row of B to be reordered with the mask (argsort) from A.
I tried this:
mask = A.argsort()
A = A[mask]
B = B[mask]
However, the result has shape (5, 5, 5).
The next snippet works, but it uses two Python-level loops. I need something faster. Does anybody have an idea?
A = [row[order] for row, order in zip(A,mask)]
B = [row[order] for row, order in zip(B,mask)]
You can use fancy indexing. The result has the same shape as your index arrays broadcast together. Your column index is already the right shape. A row index of shape (A.shape[0], 1) would broadcast correctly:
r = np.arange(A.shape[0]).reshape(-1, 1)
c = np.argsort(A)
A = A[r, c]
B = B[r, c]
The reason that your original index didn't work out is that you were indexing with a single dimension, which selects entire rows based on each location. This would have failed if you had more columns than rows.
A simpler way would be to follow what the argsort docs suggest:
A = np.take_along_axis(A, mask, axis=-1)
B = np.take_along_axis(B, mask, axis=-1)
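A self-contained check, using the A and B from the question, that both approaches give the same result. kind="stable" is my own addition: row 0 of A contains two 2s, and a stable sort makes the permutation applied to B deterministic.

```python
import numpy as np

A = np.array([[2, 2, 1, 0, 8],
              [8, 2, 0, 3, 7],
              [3, 2, 6, 5, 3],
              [1, 4, 2, 5, 8],
              [2, 3, 7, 0, 3]])
B = np.array([[3, 7, 6, 8, 3],
              [0, 7, 4, 4, 3],
              [1, 2, 0, 0, 4],
              [8, 6, 6, 7, 1],
              [8, 1, 0, 4, 8]])

# Stable sort keeps equal values of A in their original order
mask = np.argsort(A, axis=-1, kind="stable")

# Option 1: a (5, 1) row index broadcast against the (5, 5) column index
r = np.arange(A.shape[0])[:, None]
A1, B1 = A[r, mask], B[r, mask]

# Option 2: take_along_axis does the same broadcasting internally
A2 = np.take_along_axis(A, mask, axis=-1)
B2 = np.take_along_axis(B, mask, axis=-1)

print(np.array_equal(A1, A2), np.array_equal(B1, B2))  # True True
print(A1[0], B1[0])  # [0 1 2 2 8] [8 6 3 7 3]
```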

Delete one element from each row of a NumPy array

import numpy as np
a=np.array([[1,2,3], [4,5,6], [7,8,9]])
k = [0, 1, 2]
print(np.delete(a, k, 1))
This returns
[]
But, the result I really want is
[[2,3],
[4,6],
[7,8]]
I want to delete the first element (indexed as 0) from a[0], the second (indexed as 1) from a[1], and the third (indexed as 2) from a[2].
Any thoughts?
Here's an approach using boolean indexing -
m,n = a.shape
out = a[np.arange(n) != np.array(k)[:,None]].reshape(m,-1)
If you would like to persist with np.delete, you could calculate the linear indices and then delete those after flattening the input array, like so -
m,n = a.shape
del_idx = np.arange(m)*n + k
out = np.delete(a.ravel(),del_idx,axis=0).reshape(m,-1)
Sample run -
In [94]: a
Out[94]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
In [95]: k = [0, 2, 1]
In [96]: m,n = a.shape
In [97]: a[np.arange(n) != np.array(k)[:,None]].reshape(m,-1)
Out[97]:
array([[2, 3],
[4, 5],
[7, 9]])
In [98]: del_idx = np.arange(m)*n + k
In [99]: np.delete(a.ravel(),del_idx,axis=0).reshape(m,-1)
Out[99]:
array([[2, 3],
[4, 5],
[7, 9]])
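The sample run uses a square array, where m == n. A quick check on a non-square array (3 rows, 4 columns, chosen for illustration) shows that the linear index of element (i, k[i]) in the flattened array is i*n + k[i], and that both approaches then agree:

```python
import numpy as np

a = np.arange(1, 13).reshape(3, 4)  # m=3 rows, n=4 columns
k = [0, 3, 1]                       # column to drop from each row
m, n = a.shape

# Boolean mask: True everywhere except at (i, k[i])
out_mask = a[np.arange(n) != np.array(k)[:, None]].reshape(m, -1)

# Row i starts at offset i*n in the flattened array
del_idx = np.arange(m) * n + np.array(k)
out_delete = np.delete(a.ravel(), del_idx).reshape(m, -1)

print(out_mask.tolist())  # [[2, 3, 4], [5, 6, 7], [9, 11, 12]]
print(np.array_equal(out_mask, out_delete))  # True
```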

Finding differences between all values in a List

I want to find the differences between all values in a numpy array and append it to a new list.
Example: a = [1,4,2,6]
result : newlist= [3,1,5,3,2,2,1,2,4,5,2,4]
i.e. for each value i of a, compute its difference from each of the other values in the list.
At this point I have been unable to find a solution.
You can do this:
a = [1,4,2,6]
newlist = [abs(x - y) for i, x in enumerate(a) for j, y in enumerate(a) if i != j]
Output:
print(newlist)
[3, 1, 5, 3, 2, 2, 1, 2, 4, 5, 2, 4]
I believe what you are trying to do is calculate the absolute differences between elements of the input list, excluding the self-differences. With that in mind, here is a vectorized approach (also known as array programming) -
# Input list
a = [1,4,2,6]
import numpy as np
# Convert input list to a numpy array
arr = np.array(a)
# Calculate absolute differences between each element
# against all elements to give us a 2D array
sub_arr = np.abs(arr[:,None] - arr)
# Get diagonal indices for the 2D array
N = arr.size
rem_idx = np.arange(N)*(N+1)
# Remove the diagonal elements for the final output
out = np.delete(sub_arr,rem_idx)
Sample run to show the outputs at each step -
In [60]: a
Out[60]: [1, 4, 2, 6]
In [61]: arr
Out[61]: array([1, 4, 2, 6])
In [62]: sub_arr
Out[62]:
array([[0, 3, 1, 5],
[3, 0, 2, 2],
[1, 2, 0, 4],
[5, 2, 4, 0]])
In [63]: rem_idx
Out[63]: array([ 0, 5, 10, 15])
In [64]: out
Out[64]: array([3, 1, 5, 3, 2, 2, 1, 2, 4, 5, 2, 4])
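An equivalent way to drop the diagonal, instead of computing its flattened indices, is boolean indexing with an off-diagonal mask built from np.eye. Boolean indexing returns the selected elements in row-major order, so the result matches the np.delete output above.

```python
import numpy as np

a = [1, 4, 2, 6]
arr = np.array(a)
sub_arr = np.abs(arr[:, None] - arr)

# True everywhere except the diagonal (the self-differences)
off_diag = ~np.eye(arr.size, dtype=bool)
out = sub_arr[off_diag]
print(out.tolist())  # [3, 1, 5, 3, 2, 2, 1, 2, 4, 5, 2, 4]
```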
