Related
I am trying to rewrite the following snippet of Matlab code about outer product of matrices into python code,
function Y = matlab_outer_product(X,x)
A = reshape(X, [size(X) ones(1,ndims(x))]);
B = reshape(x, [ones(1,ndims(X)) size(x)]);
Y = squeeze(bsxfun(#times,A,B));
end
My one-to-one translation of this to python code is as following (considering how the shape of numpy array and matlab matrices are arranged),
def python_outer_product(X, x):
X_shape = list(X.shape)
x_shape = list(x.shape)
A = X.reshape(*list(np.ones(np.ndim(x),dtype=int)),*X_shape)
B = x.reshape(*x_shape,*list(np.ones(np.ndim(X),dtype=int)))
Y = A*B
return Y.squeeze()
Then trying the inputs, for instance,
matlab_outer_product([1,2],[[3,4];[5,6]])
python_out_product(np.array([[1,2]], np.array([[3,4],[5,6]])))
The outputs don't quite match. In matlab, it outputs
output(:,:,1) = [[3,5];[6,10]]
output(:,:,2) = [[4,6];[8,12]]
In python, it outputs
output = array([
[[ 3, 6],
[ 4, 8]],
[[ 5, 10],
[ 6, 12]]
])
They're almost identical, but not quite. I wonder what's wrong with code and how to change the python code to match with matlab output?
In full gory detail (since my MATLAB memory is old):
Octave
>> X = [1,2];
>> x = [[3,4];[5,6]];
>> A = reshape(X, [size(X) ones(1,ndims(x))]);
>> B = reshape(x, [ones(1,ndims(X)) size(x)]);
>> A
A =
1 2
>> B
B =
ans(:,:,1,1) = 3
ans(:,:,2,1) = 5
ans(:,:,1,2) = 4
ans(:,:,2,2) = 6
>> bsxfun(#times,A,B)
ans =
ans(:,:,1,1) =
3 6
ans(:,:,2,1) =
5 10
ans(:,:,1,2) =
4 8
ans(:,:,2,2) =
6 12
>> squeeze(bsxfun(#times,A,B))
ans =
ans(:,:,1) =
3 5
6 10
ans(:,:,2) =
4 6
8 12
You start with a (1,2) and (2,2), expand the second to (1,1,2,2). The bsxfun produces a (1,2,2,2) which is squeezed to (2,2,2).
A is X reshaped to [1 2 1 1], but the two outer size 1 dimensions are squeeze out, resulting in no change.
This MATLAB outter is a bit convoluted, using bsxfun to perform elementwise multiplication of (1,2,1,1) with (1,1,1,2). At least in Octave it's the same as
A.*B
In numpy
In [77]: X
Out[77]: array([[1, 2]]) # (1,2)
In [78]: x
Out[78]:
array([[3, 4], # (2,2)
[5, 6]])
Note that the MATLAB/Octave x when flattened has elements (3,5,4,6), while the numpy ravel is [3,4,5,6].
In numpy I can simply do:
In [79]: X[:,:,None,None]*x
Out[79]:
array([[[[ 3, 4], (1,2,2,2)
[ 5, 6]],
[[ 6, 8],
[10, 12]]]])
or without the extra size 1 dimension of X:
In [84]: (X[0,:,None,None]*x)
Out[84]:
array([[[ 3, 4],
[ 5, 6]],
[[ 6, 8],
[10, 12]]])
In [85]: (X[0,:,None,None]*x).ravel()
Out[85]: array([ 3, 4, 5, 6, 6, 8, 10, 12])
compare that with the Octave ravel
>> squeeze(bsxfun(#times,A,B))(:)'
ans =
3 6 5 10 4 8 6 12
We could add a transpose to the numpy
In [96]: (X[0,:,None,None]*x).transpose(2,1,0).ravel()
Out[96]: array([ 3, 6, 5, 10, 4, 8, 6, 12])
In [97]: (X[0,:,None,None]*x).transpose(2,1,0)
Out[97]:
array([[[ 3, 6],
[ 5, 10]],
[[ 4, 8],
[ 6, 12]]])
At least in numpy we can tweak the dimension order in lots of ways, so I won't try to suggest an optimal. I still think it's better to write code that's "natural" to numpy than to slavishly match the MATLAB order.
another try
I realized, above, that the MATLAB is just doing A*.B with
(1,2,1,1) arrays (1,1,1,2), where the extra 1's were added to "broadcast".
Using transpose to the same dimension outermost (leading in numpy)
In [5]: X = X.T; x = x.T
In [6]: X.shape
Out[6]: (2, 1)
In [7]: x.shape
Out[7]: (2, 2)
In [8]: x
Out[8]:
array([[3, 5],
[4, 6]])
In [9]: x.ravel()
Out[9]: array([3, 5, 4, 6]) # compare with MATLAB (:)'
Elementwise multiplication with the same dimension expansion:
In [10]: X[None,None,:,:]*x[:,:,None,None]
Out[10]:
array([[[[ 3],
[ 6]],
[[ 5],
[10]]],
[[[ 4],
[ 8]],
[[ 6],
[12]]]])
In [11]: _.shape
Out[11]: (2, 2, 2, 1) # compare with octave (1,2,2,2)
In [12]: __.squeeze()
Out[12]:
array([[[ 3, 6],
[ 5, 10]],
[[ 4, 8],
[ 6, 12]]])
the ravel is the same as Octave:
In [13]: ___.ravel()
Out[13]: array([ 3, 6, 5, 10, 4, 8, 6, 12])
expand_dims can be used instead of the indexing. Internally it uses reshape:
In [15]: np.expand_dims(X,(0,1)).shape
Out[15]: (1, 1, 2, 1)
In [16]: np.expand_dims(x,(2,3)).shape
Out[16]: (2, 2, 1, 1)
I am currently working on a problem where in one requirement I need to compare two 3d NumPy arrays and return the unmatched values with their index position and later recreate the same array. Currently, the only approach I can think of is to loop across the arrays to get the values during comparing and later recreating. The problem is with scale as there will be hundreds of arrays and looping effects the Latency of the overall application. I would be thankful if anyone can help me with better utilization of NumPy comparison while using minimal or no loops. A dummy code is below:
def compare_array(final_array_list):
base_array = None
i = 0
for array in final_array_list:
if i==0:
base_array =array[0]
else:
index = np.where(base_array != array)
#getting index like (array([0, 1]), array([1, 1]), array([2, 2]))
# to access all unmatched values I need to loop.Need to avoid loop here
i=i+1
return [base_array, [unmatched value (8,10)and its index (array([0, 1]), array([1, 1]), array([2, 2])],..]
# similarly recreate array1 back
def recreate_array(array_list):
# need to avoid looping while recreating array back
return list of array #i.e. [base_array, array_1]
# creating dummy array
base_array = np.array([[[1, 2, 3], [3, 4, 5]], [[5, 6, 7], [7, 8, 9]]])
array_1 = b = np.array([[[1, 2,3], [3, 4,8]], [[5, 6,7], [7, 8,10]]])
final_array_list = [base_array,array_1, ...... ]
#compare base_array with other arrays and get unmatched values (like 8,10 in array_1) and their index
difff_array = compare_array(final_array_list)
# recreate array1 from the base array after receiving unmatched value and its index value
recreate_array(difff_array)
I think this may be what you're looking for:
base_array = np.array([[[1, 2, 3], [3, 4, 5]], [[5, 6, 7], [7, 8, 9]]])
array_1 = b = np.array([[[1, 2,3], [3, 4,8]], [[5, 6,7], [7, 8,10]]])
match_mask = (base_array == array_1)
idx_unmatched = np.argwhere(~match_mask)
# idx_unmatched:
# array([[0, 1, 2],
# [1, 1, 2]])
# values with associated with idx_unmatched:
values_unmatched = base_array[tuple(idx_unmatched.T)]
# values_unmatched:
# array([5, 9])
I'm not sure I understand what you mean by "recreate them" (completely recreate them? why not use the arrays themselves?).
I can help you though by noting that ther are plenty of functions which vectorize with numpy, and as a general rule of thumb, do not use for loops unless G-d himself tells you to :)
For example:
If a, b are any np.arrays (regardless of dimensions), the simple a == b will return a numpy array of the same size, with boolean values. Trues = they are equal in this coordinate, and False otherwise.
The function np.where(c), will convert c to a boolean np.array, and return you the indexes in which c is True.
To clarify:
Here I instantiate two arrays, with b differing from a with -1 values:
Note what a==b is, at the end.
>>> a = np.random.randint(low=0, high=10, size=(4, 4))
>>> b = np.copy(a)
>>> b[2, 3] = -1
>>> b[0, 1] = -1
>>> b[1, 1] = -1
>>> a
array([[9, 9, 3, 4],
[8, 4, 6, 7],
[8, 4, 5, 5],
[1, 7, 2, 5]])
>>> b
array([[ 9, -1, 3, 4],
[ 8, -1, 6, 7],
[ 8, 4, 5, -1],
[ 1, 7, 2, 5]])
>>> a == b
array([[ True, False, True, True],
[ True, False, True, True],
[ True, True, True, False],
[ True, True, True, True]])
Now the function np.where, which output is a bit tricky, but can be used easily. This will return two arrays of the same size: the first array is the rows and the second array is the columns at places in which the given array is True.
>>> np.where(a == b)
(array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3], dtype=int64), array([0, 2, 3, 0, 2, 3, 0, 1, 2, 0, 1, 2, 3], dtype=int64))
Now you can "fix" the b array to match a, by switching the values of b ar indexes in which it differs from a, to be a's indexes:
>>> b[np.where(a != b)]
array([-1, -1, -1])
>>> b[np.where(a != b)] = a[np.where(a != b)]
>>> np.all(a == b)
True
I've been trying to look up how np.diag_indices work, and for examples of them, however the documentation for it is a bit light. I know this creates a diagonal array through your matrix, however I want to change the diagonal array (I was thinking of using a loop to change its dimensions or something along those lines).
I.E.
say we have a 3x2 matrix:
[[1 2]
[3 4]
[5 6]]
Now if I use np.diag_indices it will form a diagonal array starting at (0,0) and goes through (1,1).
[1 4]
However, I'd like this diagonal array to then shift one down. So now it starts at (0,1) and goes through (1,2).
[3 6]
However there are only 2 arguments for np.diag_indices, neither of which from the looks of it enable me to do this. Am I using the wrong tool to try and achieve this? If so, what tools can I use to create a changing diagonal array that goes through my matrix? (I'm looking for something that will also work on larger matrices like a 200x50).
The code for diag_indices is simple, so simple that I've never used it:
idx = arange(n)
return (idx,) * ndim
In [68]: np.diag_indices(4,2)
Out[68]: (array([0, 1, 2, 3]), array([0, 1, 2, 3]))
It just returns a tuple of arrays, the arange repeated n times. It's useful for indexing the main diagonal of a square matrix, e.g.
In [69]: arr = np.arange(16).reshape(4,4)
In [70]: arr
Out[70]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
In [71]: arr[np.diag_indices(4,2)]
Out[71]: array([ 0, 5, 10, 15])
The application is straight forward indexing with two arrays that match in shape.
It works on other shapes - if they are big enogh.
np.diag applied to the same array does the same thing:
In [72]: np.diag(arr)
Out[72]: array([ 0, 5, 10, 15])
but it also allows for offset:
In [73]: np.diag(arr, 1)
Out[73]: array([ 1, 6, 11])
===
Indexing with diag_indices does allow us to change that diagonal:
In [78]: arr[np.diag_indices(4,2)] += 10
In [79]: arr
Out[79]:
array([[10, 1, 2, 3],
[ 4, 15, 6, 7],
[ 8, 9, 20, 11],
[12, 13, 14, 25]])
====
But we don't have to use diag_indices to generate the desired indexing arrays:
In [80]: arr = np.arange(1,7).reshape(3,2)
In [81]: arr
Out[81]:
array([[1, 2],
[3, 4],
[5, 6]])
selecting values from 1st 2 rows, and columns:
In [82]: arr[np.arange(2), np.arange(2)]
Out[82]: array([1, 4])
In [83]: arr[np.arange(2), np.arange(2)] += 10
In [84]: arr
Out[84]:
array([[11, 2],
[ 3, 14],
[ 5, 6]])
and for a difference selection of rows:
In [85]: arr[np.arange(1,3), np.arange(2)] += 20
In [86]: arr
Out[86]:
array([[11, 2],
[23, 14],
[ 5, 26]])
The relevant documentation section on advanced indexing with integer arrays: https://numpy.org/doc/stable/reference/arrays.indexing.html#purely-integer-array-indexing
I have a 2d and 1d array. I am looking to find the two rows that contain at least once the values from the 1d array as follows:
import numpy as np
A = np.array([[0, 3, 1],
[9, 4, 6],
[2, 7, 3],
[1, 8, 9],
[6, 2, 7],
[4, 8, 0]])
B = np.array([0,1,2,3])
results = []
for elem in B:
results.append(np.where(A==elem)[0])
This works and results in the following array:
[array([0, 5], dtype=int64),
array([0, 3], dtype=int64),
array([2, 4], dtype=int64),
array([0, 2], dtype=int64)]
But this is probably not the best way of proceeding. Following the answers given in this question (Search Numpy array with multiple values) I tried the following solutions:
out1 = np.where(np.in1d(A, B))
num_arr = np.sort(B)
idx = np.searchsorted(B, A)
idx[idx==len(num_arr)] = 0
out2 = A[A == num_arr[idx]]
But these give me incorrect values:
In [36]: out1
Out[36]: (array([ 0, 1, 2, 6, 8, 9, 13, 17], dtype=int64),)
In [37]: out2
Out[37]: array([0, 3, 1, 2, 3, 1, 2, 0])
Thanks for your help
If you need to know whether each row of A contains ANY element of array B without interest in which particular element of B it is, the following script can be used:
input:
np.isin(A,B).sum(axis=1)>0
output:
array([ True, False, True, True, True, True])
Since you're dealing with a 2D array* you can use broadcasting to compare B with raveled version of A. This will give you the respective indices in a raveled shape. Then you can reverse the result and get the corresponding indices in original array using np.unravel_index.
In [50]: d = np.where(B[:, None] == A.ravel())[1]
In [51]: np.unravel_index(d, A.shape)
Out[51]: (array([0, 5, 0, 3, 2, 4, 0, 2]), array([0, 2, 2, 0, 0, 1, 1, 2]))
^
# expected result
* From documentation: For 3-dimensional arrays this is certainly efficient in terms of lines of code, and, for small data sets, it can also be computationally efficient. For large data sets, however, the creation of the large 3-d array may result in sluggish performance.
Also, Broadcasting is a powerful tool for writing short and usually intuitive code that does its computations very efficiently in C. However, there are cases when broadcasting uses unnecessarily large amounts of memory for a particular algorithm. In these cases, it is better to write the algorithm's outer loop in Python. This may also produce more readable code, as algorithms that use broadcasting tend to become more difficult to interpret as the number of dimensions in the broadcast increases.
Is something like this what you are looking for?
import numpy as np
from itertools import combinations
A = np.array([[0, 3, 1],
[9, 4, 6],
[2, 7, 3],
[1, 8, 9],
[6, 2, 7],
[4, 8, 0]])
B = np.array([0,1,2,3])
for i in combinations(A, 2):
if np.all(np.isin(B, np.hstack(i))):
print(i[0], ' ', i[1])
which prints the following:
[0 3 1] [2 7 3]
[0 3 1] [6 2 7]
note: this solution does NOT require the rows be consecutive. Please let me know if that is required.
I'm struggling to select the specific columns per row of a NumPy matrix.
Suppose I have the following matrix which I would call X:
[1, 2, 3]
[4, 5, 6]
[7, 8, 9]
I also have a list of column indexes per every row which I would call Y:
[1, 0, 2]
I need to get the values:
[2]
[4]
[9]
Instead of a list with indexes Y, I can also produce a matrix with the same shape as X where every column is a bool / int in the range 0-1 value, indicating whether this is the required column.
[0, 1, 0]
[1, 0, 0]
[0, 0, 1]
I know this can be done with iterating over the array and selecting the column values I need. However, this will be executed frequently on big arrays of data and that's why it has to run as fast as it can.
I was thus wondering if there is a better solution?
If you've got a boolean array you can do direct selection based on that like so:
>>> a = np.array([True, True, True, False, False])
>>> b = np.array([1,2,3,4,5])
>>> b[a]
array([1, 2, 3])
To go along with your initial example you could do the following:
>>> a = np.array([[1,2,3], [4,5,6], [7,8,9]])
>>> b = np.array([[False,True,False],[True,False,False],[False,False,True]])
>>> a[b]
array([2, 4, 9])
You can also add in an arange and do direct selection on that, though depending on how you're generating your boolean array and what your code looks like YMMV.
>>> a = np.array([[1,2,3], [4,5,6], [7,8,9]])
>>> a[np.arange(len(a)), [1,0,2]]
array([2, 4, 9])
You can do something like this:
In [7]: a = np.array([[1, 2, 3],
...: [4, 5, 6],
...: [7, 8, 9]])
In [8]: lst = [1, 0, 2]
In [9]: a[np.arange(len(a)), lst]
Out[9]: array([2, 4, 9])
More on indexing multi-dimensional arrays: http://docs.scipy.org/doc/numpy/user/basics.indexing.html#indexing-multi-dimensional-arrays
Recent numpy versions have added a take_along_axis (and put_along_axis) that does this indexing cleanly.
In [101]: a = np.arange(1,10).reshape(3,3)
In [102]: b = np.array([1,0,2])
In [103]: np.take_along_axis(a, b[:,None], axis=1)
Out[103]:
array([[2],
[4],
[9]])
It operates in the same way as:
In [104]: a[np.arange(3), b]
Out[104]: array([2, 4, 9])
but with different axis handling. It's especially aimed at applying the results of argsort and argmax.
A simple way might look like:
In [1]: a = np.array([[1, 2, 3],
...: [4, 5, 6],
...: [7, 8, 9]])
In [2]: y = [1, 0, 2] #list of indices we want to select from matrix 'a'
range(a.shape[0]) will return array([0, 1, 2])
In [3]: a[range(a.shape[0]), y] #we're selecting y indices from every row
Out[3]: array([2, 4, 9])
You can do it by using iterator. Like this:
np.fromiter((row[index] for row, index in zip(X, Y)), dtype=int)
Time:
N = 1000
X = np.zeros(shape=(N, N))
Y = np.arange(N)
##Aशwini चhaudhary
%timeit X[np.arange(len(X)), Y]
10000 loops, best of 3: 30.7 us per loop
#mine
%timeit np.fromiter((row[index] for row, index in zip(X, Y)), dtype=int)
1000 loops, best of 3: 1.15 ms per loop
#mine
%timeit np.diag(X.T[Y])
10 loops, best of 3: 20.8 ms per loop
Another clever way is to first transpose the array and index it thereafter. Finally, take the diagonal, its always the right answer.
X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
Y = np.array([1, 0, 2, 2])
np.diag(X.T[Y])
Step by step:
Original arrays:
>>> X
array([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12]])
>>> Y
array([1, 0, 2, 2])
Transpose to make it possible to index it right.
>>> X.T
array([[ 1, 4, 7, 10],
[ 2, 5, 8, 11],
[ 3, 6, 9, 12]])
Get rows in the Y order.
>>> X.T[Y]
array([[ 2, 5, 8, 11],
[ 1, 4, 7, 10],
[ 3, 6, 9, 12],
[ 3, 6, 9, 12]])
The diagonal should now become clear.
>>> np.diag(X.T[Y])
array([ 2, 4, 9, 12]
The answer from hpaulj using take_along_axis should be the accepted one.
Here is a derived version with an N-dim index array:
>>> arr = np.arange(20).reshape((2,2,5))
>>> idx = np.array([[1,0],[2,4]])
>>> np.take_along_axis(arr, idx[...,None], axis=-1)
array([[[ 1],
[ 5]],
[[12],
[19]]])
Note that the selection operation is ignorant about the shapes. I used this to refine a possibly vector-valued argmax result from histogram by fitting parabolas:
def interpol(arr):
i = np.argmax(arr, axis=-1)
a = lambda Δ: np.squeeze(np.take_along_axis(arr, i[...,None]+Δ, axis=-1), axis=-1)
frac = .5*(a(1) - a(-1)) / (2*a(0) - a(-1) - a(1)) # |frac| < 0.5
return i + frac
Note the squeeze to remove the dimension of size 1 resulting in the same shape of i and frac, the integer and fractional part of the peak position.
I'm quite sure that it is possible to avoid the lambda, but would the interpolation formula still look nice?