I am trying to rewrite the following MATLAB snippet, which computes an outer product of matrices, in Python:
function Y = matlab_outer_product(X,x)
A = reshape(X, [size(X) ones(1,ndims(x))]);
B = reshape(x, [ones(1,ndims(X)) size(x)]);
Y = squeeze(bsxfun(@times,A,B));
end
My one-to-one translation into Python is as follows (taking into account how the shapes of numpy arrays and MATLAB matrices are arranged):
def python_outer_product(X, x):
    X_shape = list(X.shape)
    x_shape = list(x.shape)
    A = X.reshape(*list(np.ones(np.ndim(x),dtype=int)),*X_shape)
    B = x.reshape(*x_shape,*list(np.ones(np.ndim(X),dtype=int)))
    Y = A*B
    return Y.squeeze()
Then trying the inputs, for instance,
matlab_outer_product([1,2],[[3,4];[5,6]])
python_outer_product(np.array([[1,2]]), np.array([[3,4],[5,6]]))
The outputs don't quite match. In matlab, it outputs
output(:,:,1) = [[3,5];[6,10]]
output(:,:,2) = [[4,6];[8,12]]
In python, it outputs
output = array([
[[ 3, 6],
[ 4, 8]],
[[ 5, 10],
[ 6, 12]]
])
They're almost identical, but not quite. What is wrong with my code, and how should I change the Python code to match the MATLAB output?
In full gory detail (since my MATLAB memory is old):
Octave
>> X = [1,2];
>> x = [[3,4];[5,6]];
>> A = reshape(X, [size(X) ones(1,ndims(x))]);
>> B = reshape(x, [ones(1,ndims(X)) size(x)]);
>> A
A =
1 2
>> B
B =
ans(:,:,1,1) = 3
ans(:,:,2,1) = 5
ans(:,:,1,2) = 4
ans(:,:,2,2) = 6
>> bsxfun(@times,A,B)
ans =
ans(:,:,1,1) =
3 6
ans(:,:,2,1) =
5 10
ans(:,:,1,2) =
4 8
ans(:,:,2,2) =
6 12
>> squeeze(bsxfun(@times,A,B))
ans =
ans(:,:,1) =
3 5
6 10
ans(:,:,2) =
4 6
8 12
You start with a (1,2) and (2,2), expand the second to (1,1,2,2). The bsxfun produces a (1,2,2,2) which is squeezed to (2,2,2).
A is X reshaped to [1 2 1 1], but the two trailing size-1 dimensions are dropped automatically, so A is unchanged.
This MATLAB outer product is a bit convoluted, using bsxfun to perform elementwise multiplication of (1,2,1,1) with (1,1,2,2). At least in Octave it's the same as
A.*B
In numpy
In [77]: X
Out[77]: array([[1, 2]]) # (1,2)
In [78]: x
Out[78]:
array([[3, 4], # (2,2)
[5, 6]])
Note that the MATLAB/Octave x when flattened has elements (3,5,4,6), while the numpy ravel is [3,4,5,6].
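The difference in flattening order comes from MATLAB/Octave storing arrays column-major while numpy defaults to row-major. A small sketch (variable names are only for illustration) showing that numpy can produce the column-major order on request:

```python
import numpy as np

x = np.array([[3, 4],
              [5, 6]])

row_major = x.ravel()           # numpy default (C order)
col_major = x.ravel(order="F")  # column-major, the order of MATLAB/Octave x(:)'

print(row_major)
print(col_major)
```

Here row_major holds 3, 4, 5, 6 and col_major holds 3, 5, 4, 6.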
In numpy I can simply do:
In [79]: X[:,:,None,None]*x
Out[79]:
array([[[[ 3, 4], (1,2,2,2)
[ 5, 6]],
[[ 6, 8],
[10, 12]]]])
or without the extra size 1 dimension of X:
In [84]: (X[0,:,None,None]*x)
Out[84]:
array([[[ 3, 4],
[ 5, 6]],
[[ 6, 8],
[10, 12]]])
In [85]: (X[0,:,None,None]*x).ravel()
Out[85]: array([ 3, 4, 5, 6, 6, 8, 10, 12])
compare that with the Octave ravel
>> squeeze(bsxfun(@times,A,B))(:)'
ans =
3 6 5 10 4 8 6 12
We could add a transpose to the numpy
In [96]: (X[0,:,None,None]*x).transpose(2,1,0).ravel()
Out[96]: array([ 3, 6, 5, 10, 4, 8, 6, 12])
In [97]: (X[0,:,None,None]*x).transpose(2,1,0)
Out[97]:
array([[[ 3, 6],
[ 5, 10]],
[[ 4, 8],
[ 6, 12]]])
At least in numpy we can tweak the dimension order in lots of ways, so I won't try to suggest an optimal one. I still think it's better to write code that's "natural" to numpy than to slavishly match the MATLAB order.
another try
I realized, above, that the MATLAB is just doing A.*B with
(1,2,1,1) and (1,1,2,2) arrays, where the extra 1's were added to "broadcast".
Transposing puts the same dimensions outermost (leading in numpy):
In [5]: X = X.T; x = x.T
In [6]: X.shape
Out[6]: (2, 1)
In [7]: x.shape
Out[7]: (2, 2)
In [8]: x
Out[8]:
array([[3, 5],
[4, 6]])
In [9]: x.ravel()
Out[9]: array([3, 5, 4, 6]) # compare with MATLAB (:)'
Elementwise multiplication with the same dimension expansion:
In [10]: X[None,None,:,:]*x[:,:,None,None]
Out[10]:
array([[[[ 3],
[ 6]],
[[ 5],
[10]]],
[[[ 4],
[ 8]],
[[ 6],
[12]]]])
In [11]: _.shape
Out[11]: (2, 2, 2, 1) # compare with octave (1,2,2,2)
In [12]: __.squeeze()
Out[12]:
array([[[ 3, 6],
[ 5, 10]],
[[ 4, 8],
[ 6, 12]]])
the ravel is the same as Octave:
In [13]: ___.ravel()
Out[13]: array([ 3, 6, 5, 10, 4, 8, 6, 12])
expand_dims can be used instead of the indexing. Internally it uses reshape:
In [15]: np.expand_dims(X,(0,1)).shape
Out[15]: (1, 1, 2, 1)
In [16]: np.expand_dims(x,(2,3)).shape
Out[16]: (2, 2, 1, 1)
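If the goal is only that the numpy result agrees with the MATLAB one index for index (so that Y[:, :, 0] equals MATLAB's output(:,:,1)), one possible sketch is to let np.multiply.outer place the axes of x after those of X. The function name outer_like_matlab is made up for this illustration, and it assumes both inputs are plain numeric arrays:

```python
import numpy as np

def outer_like_matlab(X, x):
    """Return Y with Y[i, j, k, l] = X[i, j] * x[k, l], size-1 axes removed."""
    X = np.asarray(X)
    x = np.asarray(x)
    # np.multiply.outer appends x's axes after X's axes, which is the same
    # index layout that the MATLAB reshape/bsxfun code builds.
    return np.multiply.outer(X, x).squeeze()

Y = outer_like_matlab(np.array([[1, 2]]), np.array([[3, 4], [5, 6]]))
print(Y.shape)
print(Y[:, :, 0])  # compare with MATLAB output(:,:,1)
print(Y[:, :, 1])  # compare with MATLAB output(:,:,2)
```

Note that numpy prints a 3D array as blocks along the first axis, while MATLAB prints blocks along the last axis, so the displayed arrays can look different even when the elements agree.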
I've been trying to look up how np.diag_indices works, and for examples of it, but the documentation is a bit light. I know it creates a diagonal array through your matrix; however, I want to change the diagonal array (I was thinking of using a loop to change its dimensions or something along those lines).
I.E.
say we have a 3x2 matrix:
[[1 2]
[3 4]
[5 6]]
Now if I use np.diag_indices it will form a diagonal array starting at (0,0) and goes through (1,1).
[1 4]
However, I'd like this diagonal array to then shift one down. So now it starts at (0,1) and goes through (1,2).
[3 6]
However there are only 2 arguments for np.diag_indices, neither of which from the looks of it enable me to do this. Am I using the wrong tool to try and achieve this? If so, what tools can I use to create a changing diagonal array that goes through my matrix? (I'm looking for something that will also work on larger matrices like a 200x50).
The code for diag_indices is simple, so simple that I've never used it:
idx = arange(n)
return (idx,) * ndim
In [68]: np.diag_indices(4,2)
Out[68]: (array([0, 1, 2, 3]), array([0, 1, 2, 3]))
It just returns a tuple of arrays, the arange repeated ndim times. It's useful for indexing the main diagonal of a square matrix, e.g.
In [69]: arr = np.arange(16).reshape(4,4)
In [70]: arr
Out[70]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
In [71]: arr[np.diag_indices(4,2)]
Out[71]: array([ 0, 5, 10, 15])
The application is straightforward: indexing with two arrays that match in shape.
It works on other shapes, if they are big enough.
np.diag applied to the same array does the same thing:
In [72]: np.diag(arr)
Out[72]: array([ 0, 5, 10, 15])
but it also allows for offset:
In [73]: np.diag(arr, 1)
Out[73]: array([ 1, 6, 11])
===
Indexing with diag_indices does allow us to change that diagonal:
In [78]: arr[np.diag_indices(4,2)] += 10
In [79]: arr
Out[79]:
array([[10, 1, 2, 3],
[ 4, 15, 6, 7],
[ 8, 9, 20, 11],
[12, 13, 14, 25]])
====
But we don't have to use diag_indices to generate the desired indexing arrays:
In [80]: arr = np.arange(1,7).reshape(3,2)
In [81]: arr
Out[81]:
array([[1, 2],
[3, 4],
[5, 6]])
Selecting values from the first two rows and columns:
In [82]: arr[np.arange(2), np.arange(2)]
Out[82]: array([1, 4])
In [83]: arr[np.arange(2), np.arange(2)] += 10
In [84]: arr
Out[84]:
array([[11, 2],
[ 3, 14],
[ 5, 6]])
and for a different selection of rows:
In [85]: arr[np.arange(1,3), np.arange(2)] += 20
In [86]: arr
Out[86]:
array([[11, 2],
[23, 14],
[ 5, 26]])
The relevant documentation section on advanced indexing with integer arrays: https://numpy.org/doc/stable/reference/arrays.indexing.html#purely-integer-array-indexing
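The index-array approach above can be wrapped in a small helper for arbitrary shapes. This is only a sketch: shifted_diag_indices is a hypothetical name, and it handles non-negative row offsets only.

```python
import numpy as np

def shifted_diag_indices(shape, row_offset=0):
    """Index arrays for a diagonal starting `row_offset` rows down (hypothetical helper)."""
    n_rows, n_cols = shape
    # The diagonal stops when it runs out of rows or columns.
    length = max(min(n_rows - row_offset, n_cols), 0)
    cols = np.arange(length)
    return cols + row_offset, cols

arr = np.arange(1, 7).reshape(3, 2)
rows, cols = shifted_diag_indices(arr.shape, row_offset=1)
shifted = arr[rows, cols]   # the values 3 and 6
arr[rows, cols] = 0         # the same index arrays also allow assignment

# Works the same way on a larger, non-square array.
big = np.zeros((200, 50))
big_rows, big_cols = shifted_diag_indices(big.shape, row_offset=160)
```

For reading only, np.diag(arr, -1) returns the same shifted diagonal; the index arrays are needed when the values should be modified in place.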
I'm working with 2D numpy arrays which exhibit variable sizes, in terms of the number of rows and columns. I'd like to pad this array with zeros both before the start of the first row and at the end of the last row, but I'd like the start/end of the zeros to be offset in a different way for each column of data.
So the original 2D array:
1 2 3
4 5 6
7 8 9
A Normal example of padding:
0 0 0
0 0 0
1 2 3
4 5 6
7 8 9
0 0 0
Modified Padding with offsets (what I'm trying to do):
0 0 0
1 0 0
4 0 3
7 2 6
0 5 9
0 8 0
Does numpy possess any functions which can replicate the last example in an extendable manner for variable numbers of rows/columns, and which avoid for loops and other computationally slow approaches?
Here's a vectorized one with broadcasting and boolean-indexing -
def create_padded_array(a, row_start, n_rows):
    r = np.arange(n_rows)[:,None]
    row_start = np.asarray(row_start)
    mask = (r >= row_start) & (r < row_start+a.shape[0])
    out = np.zeros(mask.shape, dtype=a.dtype)
    out.T[mask.T] = a.ravel('F')
    return out
Sample run -
In [184]: a
Out[184]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
In [185]: create_padded_array(a, row_start=[1,3,2], n_rows=6)
Out[185]:
array([[0, 0, 0],
[1, 0, 0],
[4, 0, 3],
[7, 2, 6],
[0, 5, 9],
[0, 8, 0]])
Sorry for the trouble, but I think I found the answer that I was looking for.
I can use numpy.pad to create an arbitrary number of filler zeros at the end of my original array. There is also a function called numpy.roll which can then be used to shift all array elements along a given axis by a set number of positions down the column.
After a quick test, it looks like this is extendable for an arbitrary number of matrix elements and allows a unique offset along each column.
Thanks to everyone for their responses to this question!
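The post above does not show code, so the following is only a sketch of one way the pad-then-shift idea could look. np.roll applies the same shift to a whole axis, so this sketch emulates a per-column roll with np.take_along_axis instead; the names offsets and n_rows are assumptions, and it assumes every offset plus the original row count fits within n_rows (otherwise values would wrap around).

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
offsets = np.array([1, 3, 2])   # assumed: rows of leading zeros for each column
n_rows = 6                      # assumed: total rows in the padded result

# Pad zeros at the bottom only, then shift each column down by its own offset.
padded = np.pad(a, ((0, n_rows - a.shape[0]), (0, 0)), mode="constant")

# Output row r of column j is read from padded row (r - offsets[j]) mod n_rows,
# which is what rolling column j down by offsets[j] would do.
src_rows = (np.arange(n_rows)[:, None] - offsets) % n_rows
out = np.take_along_axis(padded, src_rows, axis=0)
print(out)
```

With these inputs, out reproduces the "Modified Padding with offsets" layout from the question.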
To my knowledge there is no such numpy function with those exact specific requirements, however what you can do is have your array:
In [10]: arr = np.array([(1,2,3),(4,5,6),(7,8,9)])
In [11]: arr
Out[11]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
Then pad it:
In [12]: arr = np.pad(arr, ((2,1),(0,0)), 'constant', constant_values=(0))
In [13]: arr
Out[13]:
array([[0, 0, 0],
[0, 0, 0],
[1, 2, 3],
[4, 5, 6],
[7, 8, 9],
[0, 0, 0]])
Then you can randomize with shuffle (which I assume is what you want to do).
Note that np.random.shuffle only shuffles whole rows; if that is satisfactory for your needs, then:
In [14]: np.random.shuffle(arr)
In [15]: arr
Out[15]:
array([[7, 8, 9],
[4, 5, 6],
[0, 0, 0],
[0, 0, 0],
[0, 0, 0],
[1, 2, 3]])
If this is not satisfactory you can do this:
First create a 1D array:
In [16]: arr = np.arange(1,10)
In [17]: arr
Out[17]: array([1, 2, 3, 4, 5, 6, 7, 8, 9])
Then pad your array with zeros:
In [18]: arr = np.pad(arr, (6,3), 'constant', constant_values = (0))
In [19]: arr
Out[19]: array([0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0])
Then you shuffle the array:
In [20]: np.random.shuffle(arr)
In [21]: arr
Out[21]: array([4, 0, 0, 5, 0, 0, 3, 0, 0, 0, 8, 0, 7, 2, 1, 6, 0, 9])
Finally you reshape to the desired format:
In [22]: np.reshape(arr,[6,3])
Out[22]:
array([[4, 0, 0],
[5, 0, 0],
[3, 0, 0],
[0, 8, 0],
[7, 2, 1],
[6, 0, 9]])
Although this may seem lengthy, it is much quicker for large data sets than using for loops or other Python control structures. If by offsets you mean changing the amount of randomness, you can shuffle only portions of the 1D array and then combine them with the rest of the data, so that only the portion you want is shuffled.
(If what you mean by offsets is different from my assumption above please clarify in a comment)
I'm struggling to select the specific columns per row of a NumPy matrix.
Suppose I have the following matrix which I would call X:
[1, 2, 3]
[4, 5, 6]
[7, 8, 9]
I also have a list of column indexes per every row which I would call Y:
[1, 0, 2]
I need to get the values:
[2]
[4]
[9]
Instead of a list of indexes Y, I can also produce a matrix with the same shape as X where every entry is a bool (or an int, 0 or 1) indicating whether this is the required column.
[0, 1, 0]
[1, 0, 0]
[0, 0, 1]
I know this can be done with iterating over the array and selecting the column values I need. However, this will be executed frequently on big arrays of data and that's why it has to run as fast as it can.
I was thus wondering if there is a better solution?
If you've got a boolean array you can do direct selection based on that like so:
>>> a = np.array([True, True, True, False, False])
>>> b = np.array([1,2,3,4,5])
>>> b[a]
array([1, 2, 3])
To go along with your initial example you could do the following:
>>> a = np.array([[1,2,3], [4,5,6], [7,8,9]])
>>> b = np.array([[False,True,False],[True,False,False],[False,False,True]])
>>> a[b]
array([2, 4, 9])
You can also add in an arange and do direct selection on that, though depending on how you're generating your boolean array and what your code looks like YMMV.
>>> a = np.array([[1,2,3], [4,5,6], [7,8,9]])
>>> a[np.arange(len(a)), [1,0,2]]
array([2, 4, 9])
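One caveat if the mask arrives as 0/1 integers rather than booleans: indexing with an integer array is treated as integer (fancy) indexing, not as a mask. A minimal sketch of two ways around that, assuming exactly one 1 per row (the name mask01 is only for illustration):

```python
import numpy as np

X = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
mask01 = np.array([[0, 1, 0],
                   [1, 0, 0],
                   [0, 0, 1]])

# Option 1: convert to a real boolean mask first.
picked_bool = X[mask01.astype(bool)]

# Option 2: recover the column index of the 1 in each row, then index.
cols = mask01.argmax(axis=1)
picked_idx = X[np.arange(len(X)), cols]
```

Both give the values 2, 4, 9 here. Boolean selection returns elements in row-major order, so the one-per-row assumption is what keeps the result aligned with the rows.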
You can do something like this:
In [7]: a = np.array([[1, 2, 3],
...: [4, 5, 6],
...: [7, 8, 9]])
In [8]: lst = [1, 0, 2]
In [9]: a[np.arange(len(a)), lst]
Out[9]: array([2, 4, 9])
More on indexing multi-dimensional arrays: http://docs.scipy.org/doc/numpy/user/basics.indexing.html#indexing-multi-dimensional-arrays
NumPy 1.15 added take_along_axis (and put_along_axis), which do this indexing cleanly.
In [101]: a = np.arange(1,10).reshape(3,3)
In [102]: b = np.array([1,0,2])
In [103]: np.take_along_axis(a, b[:,None], axis=1)
Out[103]:
array([[2],
[4],
[9]])
It operates in the same way as:
In [104]: a[np.arange(3), b]
Out[104]: array([2, 4, 9])
but with different axis handling. It's especially aimed at applying the results of argsort and argmax.
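A short sketch of the argmax use case, together with put_along_axis as the assignment counterpart (the array values are made up for illustration):

```python
import numpy as np

a = np.array([[1, 9, 3],
              [8, 5, 6],
              [7, 2, 4]])

# Column of the largest value in each row.
idx = np.argmax(a, axis=1)

# The index array needs the same number of dimensions as `a`, hence [:, None].
row_max = np.take_along_axis(a, idx[:, None], axis=1)

# put_along_axis writes at the same positions; it modifies its first argument in place.
zeroed = a.copy()
np.put_along_axis(zeroed, idx[:, None], 0, axis=1)
```

Here row_max is a (3, 1) column holding 9, 8, 7, and zeroed is a copy of a with each row's maximum replaced by 0.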
A simple way might look like:
In [1]: a = np.array([[1, 2, 3],
...: [4, 5, 6],
...: [7, 8, 9]])
In [2]: y = [1, 0, 2] #list of indices we want to select from matrix 'a'
range(a.shape[0]) supplies the row indices 0, 1, 2 (np.arange(a.shape[0]) would give array([0, 1, 2]))
In [3]: a[range(a.shape[0]), y] #we're selecting y indices from every row
Out[3]: array([2, 4, 9])
You can do it with an iterator, like this:
np.fromiter((row[index] for row, index in zip(X, Y)), dtype=int)
Time:
N = 1000
X = np.zeros(shape=(N, N))
Y = np.arange(N)
#@Aशwini चhaudhary
%timeit X[np.arange(len(X)), Y]
10000 loops, best of 3: 30.7 us per loop
#mine
%timeit np.fromiter((row[index] for row, index in zip(X, Y)), dtype=int)
1000 loops, best of 3: 1.15 ms per loop
#mine
%timeit np.diag(X.T[Y])
10 loops, best of 3: 20.8 ms per loop
Another clever way is to first transpose the array and then index it. Finally, take the diagonal; it always holds the right answer.
X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
Y = np.array([1, 0, 2, 2])
np.diag(X.T[Y])
Step by step:
Original arrays:
>>> X
array([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12]])
>>> Y
array([1, 0, 2, 2])
Transpose to make it possible to index it right.
>>> X.T
array([[ 1, 4, 7, 10],
[ 2, 5, 8, 11],
[ 3, 6, 9, 12]])
Get rows in the Y order.
>>> X.T[Y]
array([[ 2, 5, 8, 11],
[ 1, 4, 7, 10],
[ 3, 6, 9, 12],
[ 3, 6, 9, 12]])
The diagonal should now become clear.
>>> np.diag(X.T[Y])
array([ 2, 4, 9, 12])
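As a quick cross-check, the three approaches from this thread can be run side by side on the same small input. This is only a sketch; the variable names by_index, by_iter, and by_diag are made up here.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
Y = np.array([1, 0, 2, 2])

# Integer-array indexing: one (row, column) pair per row.
by_index = X[np.arange(len(X)), Y]

# Python-level iteration over the rows.
by_iter = np.fromiter((row[i] for row, i in zip(X, Y)), dtype=X.dtype)

# Transpose, index, take the diagonal. X.T[Y] has shape (len(Y), len(X)),
# so this builds a full intermediate array just to read its diagonal.
by_diag = np.diag(X.T[Y])
```

All three agree on the values 2, 4, 9, 12. The size of the intermediate array in the diagonal approach is a plausible reason for it being the slowest in the timings above.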
The answer from hpaulj using take_along_axis should be the accepted one.
Here is a derived version with an N-dim index array:
>>> arr = np.arange(20).reshape((2,2,5))
>>> idx = np.array([[1,0],[2,4]])
>>> np.take_along_axis(arr, idx[...,None], axis=-1)
array([[[ 1],
[ 5]],
[[12],
[19]]])
Note that the selection operation is agnostic to the shapes. I used this to refine a possibly vector-valued argmax result from a histogram by fitting parabolas:
def interpol(arr):
    i = np.argmax(arr, axis=-1)
    a = lambda Δ: np.squeeze(np.take_along_axis(arr, i[...,None]+Δ, axis=-1), axis=-1)
    frac = .5*(a(1) - a(-1)) / (2*a(0) - a(-1) - a(1)) # |frac| < 0.5
    return i + frac
Note the squeeze to remove the dimension of size 1 resulting in the same shape of i and frac, the integer and fractional part of the peak position.
I'm quite sure that it is possible to avoid the lambda, but would the interpolation formula still look nice?
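One possible way to drop the lambda is a small nested function. The sketch below uses the same three-point parabola formula; the names interpol_peak and at are made up here, and it assumes the maximum is never in the first or last bin (otherwise the neighbor lookup would wrap around or go out of range).

```python
import numpy as np

def interpol_peak(arr):
    """Sub-bin peak position along the last axis via a three-point parabola fit."""
    arr = np.asarray(arr, dtype=float)
    i = np.argmax(arr, axis=-1)

    def at(delta):
        # Value at the peak index shifted by delta, with the same shape as i.
        idx = np.expand_dims(i, -1) + delta
        return np.take_along_axis(arr, idx, axis=-1).squeeze(axis=-1)

    left, mid, right = at(-1), at(0), at(1)
    frac = 0.5 * (right - left) / (2 * mid - left - right)
    return i + frac

# Two sampled parabolas with true peaks at 2.3 and 3.6.
x = np.arange(6)
samples = np.stack([-(x - 2.3) ** 2, -(x - 3.6) ** 2])
peaks = interpol_peak(samples)
```

Because the sampled curves are exact parabolas, peaks recovers 2.3 and 3.6 up to floating-point rounding.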
I have the following to calculate the difference of a matrix, i.e. the i-th element - the (i-1) element.
How can I (easily) calculate the difference for each element horizontally and vertically? With a transpose?
import numpy as np
inputarr = np.arange(12)
inputarr.shape = (3,4)
inputarr+=1
#shift one position
newarr = list()
for x in inputarr:
    newarr.append(np.hstack((np.array([0]),x[:-1])))
z = np.array(newarr)
print(inputarr)
print('first differences')
print(inputarr-z)
Output
[[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]]
first differences
[[1 1 1 1]
[5 1 1 1]
[9 1 1 1]]
Check out numpy.diff.
From the documentation:
Calculate the n-th order discrete difference along given axis.
The first order difference is given by out[n] = a[n+1] - a[n] along
the given axis, higher order differences are calculated by using diff
recursively.
An example:
>>> import numpy as np
>>> a = np.arange(12).reshape((3,4))
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> np.diff(a,axis = 1) # row-wise
array([[1, 1, 1],
[1, 1, 1],
[1, 1, 1]])
>>> np.diff(a, axis = 0) # column-wise
array([[4, 4, 4, 4],
[4, 4, 4, 4]])
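np.diff returns one element fewer along the chosen axis, whereas the loop in the question keeps the first element of each row. In NumPy versions that support the prepend argument of np.diff, the question's output can be reproduced as in this sketch:

```python
import numpy as np

inputarr = np.arange(1, 13).reshape(3, 4)

# Differences within each row, treating the element before the first as 0.
horizontal = np.diff(inputarr, axis=1, prepend=0)

# Differences down each column, treating the row above the first as zeros.
vertical = np.diff(inputarr, axis=0, prepend=0)

print(horizontal)
print(vertical)
```

Here horizontal matches the "first differences" output in the question, and vertical is the same computation along the other axis, so no transpose is needed.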