Here's what I think an axis is. I want to know if my understanding is correct.
Count the opening brackets [ from the left, indexing from 0; that count is the axis.
eg1) [[1,2],[3,4]]
For [, the elements that have one [ to their left are the elements along axis 0 ([[ for axis 1, and so on).
0 axis: you see `[`: [x, y] where x = [1,2], y=[3,4]
1 axis: you see `[[`: [[x, y]] where x = [1,3], y = [2,4]
eg2) [[[1,2,3], [4,5,6]], [[7,8,9], [10,11,12]]]
0 axis: you see `[` [x, y] where x = [[1,2,3], [4,5,6]], y= [[7,8,9], [10,11,12]]
1 axis: you see `[[`, x = [1,2,3], [7,8,9] y = [4,5,6], [10,11,12]
2 axis: you see `[[[`, x = [1,4,7,10] y = [2,5,8,11] z = [3,6,9,12]
If there were a function that takes values along an axis, I could verify whether I'm right, but...
the closest thing I found was np.take.
For the (2,2) shape array:
In [13]: arr = np.array([[1,2],[3,4]])
In [14]: arr
Out[14]:
array([[1, 2],
[3, 4]])
Python unpacking just iterates on the array - that is on the first dimension, 'rows':
In [15]: x, y = arr
In [16]: x,y
Out[16]: (array([1, 2]), array([3, 4]))
To unpack columns, we could transpose the array, so the 2nd dimension is first. But I think a list comprehension is clearer:
In [17]: x, y = [arr[:,i] for i in range(2)]
In [18]: x,y
Out[18]: (array([1, 3]), array([2, 4]))
For the 3d array:
In [19]: arr = np.arange(1,13).reshape(2,2,3)
In [20]: arr
Out[20]:
array([[[ 1, 2, 3],
[ 4, 5, 6]],
[[ 7, 8, 9],
[10, 11, 12]]])
The results of iterating on the first dimension should be obvious - the 2 blocks or panes.
You got the 2nd dimension right. For the third, the results are 3 (2,2) arrays:
In [21]: x,y,z=[arr[:,:,i] for i in range(arr.shape[2])]
In [22]: x
Out[22]:
array([[ 1, 4],
[ 7, 10]])
In [23]: y
Out[23]:
array([[ 2, 5],
[ 8, 11]])
In [24]: z
Out[24]:
array([[ 3, 6],
[ 9, 12]])
Unpacking x,y,z = from this transpose would also work:
In [25]: arr.transpose(2,0,1)
Out[25]:
array([[[ 1, 4],
[ 7, 10]],
[[ 2, 5],
[ 8, 11]],
[[ 3, 6],
[ 9, 12]]])
np.take can be used like my indexing:
[np.take(arr,i,2) for i in range(3)]
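To check the reading of axes empirically, here is a small sketch (not from the answer; `slices_along` is a hypothetical helper name) that compares `np.take` with `np.moveaxis`, which moves the chosen axis to the front so that ordinary iteration walks along it:

```python
import numpy as np

arr = np.arange(1, 13).reshape(2, 2, 3)

def slices_along(a, axis):
    """Hypothetical helper: the sub-arrays obtained by fixing one index on `axis`."""
    # moveaxis brings `axis` to the front, so plain iteration walks along it
    return list(np.moveaxis(a, axis, 0))

for axis in range(arr.ndim):
    via_take = [np.take(arr, i, axis=axis) for i in range(arr.shape[axis])]
    via_move = slices_along(arr, axis)
    same = all(np.array_equal(p, q) for p, q in zip(via_take, via_move))
    print(axis, len(via_move), same)   # 0 2 True / 1 2 True / 2 3 True
```

For axis 2 this yields the three (2,2) arrays shown above, e.g. `slices_along(arr, 2)[0]` is `[[1, 4], [7, 10]]`.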
Related
Let's say I have a NumPy array
a = np.array(
[[[1,2,3],
[4,5,6]],
[[7,8,9],
[10,11,12]]])
Its shape is (2,2,3).
I'd like to make it look like this:
a = np.array(
[[1,2,3],
[7,8,9],
[4,5,6],
[10,11,12]]
)
whose shape is (4,3).
If I use reshape, it looks like this:
a = np.array(
[[1,2,3],
[4,5,6],
[7,8,9],
[10,11,12]]
)
Which is NOT what I want. How to do this?
One way using numpy.stack and vstack:
np.vstack(np.stack(a, 1))
Output:
array([[ 1, 2, 3],
[ 7, 8, 9],
[ 4, 5, 6],
[10, 11, 12]])
Using indexing, you can create an idx list that specifies which row of the reshaped a goes to which position in the new array, i.e. idx is a rearranging (permutation) list:
idx = [0, 2, 1, 3]
a = a.reshape(4, 3)[idx]
a is first reshaped to the intended shape, (4,3), and then rearranged by idx. idx[1] = 2 means that the row at index 2 of the reshaped a is moved to index 1 of the new a.
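The idx list does not have to be written by hand. A sketch, assuming a 3-D array of shape (m, n, k) where you want row j of every block before row j+1 of any block: number the rows of the reshaped array, lay those numbers out as (m, n), transpose, and flatten.

```python
import numpy as np

a = np.arange(1, 13).reshape(2, 2, 3)
m, n, k = a.shape

# Row r of a.reshape(m*n, k) is a[r // n, r % n].  We want the rows ordered
# with the block index varying fastest: a[0,0], a[1,0], a[0,1], a[1,1], ...
idx = np.arange(m * n).reshape(m, n).T.ravel()   # for (2,2,3): [0, 2, 1, 3]
result = a.reshape(m * n, k)[idx]
```

This reproduces `idx = [0, 2, 1, 3]` for the question's array and generalizes to other block counts and sizes.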
Here is a more pythonic version of your problem.
This uses concatenate to append the rows of your array.
a = np.array(
[[[1,2,3],
[4,5,6]],
[[7,8,9],
[10,11,12]]]
)
def transform_2d(a_arr):
    ncol = a_arr.shape[1]        # number of rows in each block (axis 1)
    result = a_arr[:, 0]         # first row of every block
    for i in range(1, ncol):
        result = np.concatenate((result, a_arr[:, i]))
    return result
print(transform_2d(a))
First use transpose (or swapaxes) to bring the desired rows together:
In [268]: a.transpose(1,0,2)
Out[268]:
array([[[ 1, 2, 3],
[ 7, 8, 9]],
[[ 4, 5, 6],
[10, 11, 12]]])
Then reshape:
In [269]: a.transpose(1,0,2).reshape(-1,3)
Out[269]:
array([[ 1, 2, 3],
[ 7, 8, 9],
[ 4, 5, 6],
[10, 11, 12]])
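The same idea wrapped as a small function (the name `interleave_blocks` is hypothetical), using `swapaxes(0, 1)`, which for a 3-D array is the same as `transpose(1, 0, 2)`. A sketch that also checks it against the `np.vstack(np.stack(a, 1))` answer:

```python
import numpy as np

def interleave_blocks(a):
    """Reorder the rows of a 3-D array (m, n, k) so that row j of every block
    comes before row j+1 of any block; the result has shape (n*m, k)."""
    # reshape copies here, because the swapped view is not contiguous
    return a.swapaxes(0, 1).reshape(-1, a.shape[-1])

a = np.array([[[1, 2, 3], [4, 5, 6]],
              [[7, 8, 9], [10, 11, 12]]])
print(interleave_blocks(a))   # rows: [1 2 3], [7 8 9], [4 5 6], [10 11 12]
print(np.array_equal(interleave_blocks(a), np.vstack(np.stack(a, 1))))  # True
```

`np.stack(a, 1)` builds the same (n, m, k) arrangement as the swapaxes, which is why the two answers agree.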
I am trying to rewrite the following snippet of MATLAB code, which computes an outer product of matrices, in Python:
function Y = matlab_outer_product(X,x)
  A = reshape(X, [size(X) ones(1,ndims(x))]);
  B = reshape(x, [ones(1,ndims(X)) size(x)]);
  Y = squeeze(bsxfun(@times,A,B));
end
My one-to-one translation into Python is as follows (taking into account how the shapes of NumPy arrays and MATLAB matrices are arranged):
def python_outer_product(X, x):
    X_shape = list(X.shape)
    x_shape = list(x.shape)
    A = X.reshape(*list(np.ones(np.ndim(x),dtype=int)),*X_shape)
    B = x.reshape(*x_shape,*list(np.ones(np.ndim(X),dtype=int)))
    Y = A*B
    return Y.squeeze()
Then I try some inputs, for instance:
matlab_outer_product([1,2],[[3,4];[5,6]])
python_outer_product(np.array([[1,2]]), np.array([[3,4],[5,6]]))
The outputs don't quite match. In MATLAB, it outputs
output(:,:,1) = [[3,5];[6,10]]
output(:,:,2) = [[4,6];[8,12]]
In Python, it outputs
output = array([
[[ 3, 6],
[ 4, 8]],
[[ 5, 10],
[ 6, 12]]
])
They're almost identical, but not quite. What's wrong with my code, and how can I change the Python code to match the MATLAB output?
In full gory detail (since my MATLAB memory is old):
Octave
>> X = [1,2];
>> x = [[3,4];[5,6]];
>> A = reshape(X, [size(X) ones(1,ndims(x))]);
>> B = reshape(x, [ones(1,ndims(X)) size(x)]);
>> A
A =
1 2
>> B
B =
ans(:,:,1,1) = 3
ans(:,:,2,1) = 5
ans(:,:,1,2) = 4
ans(:,:,2,2) = 6
>> bsxfun(@times,A,B)
ans =
ans(:,:,1,1) =
3 6
ans(:,:,2,1) =
5 10
ans(:,:,1,2) =
4 8
ans(:,:,2,2) =
6 12
>> squeeze(bsxfun(@times,A,B))
ans =
ans(:,:,1) =
3 5
6 10
ans(:,:,2) =
4 6
8 12
You start with a (1,2) and (2,2), expand the second to (1,1,2,2). The bsxfun produces a (1,2,2,2) which is squeezed to (2,2,2).
A is X reshaped to [1 2 1 1], but the two trailing size-1 dimensions are dropped (MATLAB doesn't keep trailing singleton dimensions), so it looks unchanged.
This MATLAB outer is a bit convoluted, using bsxfun to perform elementwise multiplication of a (1,2,1,1) array with a (1,1,2,2) array. At least in Octave it's the same as
A.*B
In numpy
In [77]: X
Out[77]: array([[1, 2]]) # (1,2)
In [78]: x
Out[78]:
array([[3, 4], # (2,2)
[5, 6]])
Note that the MATLAB/Octave x when flattened has elements (3,5,4,6), while the numpy ravel is [3,4,5,6].
In numpy I can simply do:
In [79]: X[:,:,None,None]*x
Out[79]:
array([[[[ 3, 4],     # (1,2,2,2)
[ 5, 6]],
[[ 6, 8],
[10, 12]]]])
or without the extra size 1 dimension of X:
In [84]: (X[0,:,None,None]*x)
Out[84]:
array([[[ 3, 4],
[ 5, 6]],
[[ 6, 8],
[10, 12]]])
In [85]: (X[0,:,None,None]*x).ravel()
Out[85]: array([ 3, 4, 5, 6, 6, 8, 10, 12])
compare that with the Octave ravel
>> squeeze(bsxfun(@times,A,B))(:)'
ans =
3 6 5 10 4 8 6 12
We could add a transpose to the numpy version:
In [96]: (X[0,:,None,None]*x).transpose(2,1,0).ravel()
Out[96]: array([ 3, 6, 5, 10, 4, 8, 6, 12])
In [97]: (X[0,:,None,None]*x).transpose(2,1,0)
Out[97]:
array([[[ 3, 6],
[ 5, 10]],
[[ 4, 8],
[ 6, 12]]])
At least in numpy we can tweak the dimension order in lots of ways, so I won't try to suggest an optimal one. I still think it's better to write code that's "natural" to numpy than to slavishly match the MATLAB order.
Another try:
As noted above, the MATLAB code is just doing A.*B with a (1,2,1,1) array and a (1,1,2,2) array, where the extra 1s were added to "broadcast".
Use transpose to put the same dimension outermost (leading, in numpy), compensating for MATLAB's column-major order:
In [5]: X = X.T; x = x.T
In [6]: X.shape
Out[6]: (2, 1)
In [7]: x.shape
Out[7]: (2, 2)
In [8]: x
Out[8]:
array([[3, 5],
[4, 6]])
In [9]: x.ravel()
Out[9]: array([3, 5, 4, 6]) # compare with MATLAB (:)'
Elementwise multiplication with the same dimension expansion:
In [10]: X[None,None,:,:]*x[:,:,None,None]
Out[10]:
array([[[[ 3],
[ 6]],
[[ 5],
[10]]],
[[[ 4],
[ 8]],
[[ 6],
[12]]]])
In [11]: _.shape
Out[11]: (2, 2, 2, 1) # compare with octave (1,2,2,2)
In [12]: __.squeeze()
Out[12]:
array([[[ 3, 6],
[ 5, 10]],
[[ 4, 8],
[ 6, 12]]])
The ravel is the same as Octave's:
In [13]: ___.ravel()
Out[13]: array([ 3, 6, 5, 10, 4, 8, 6, 12])
expand_dims can be used instead of the indexing. Internally it uses reshape:
In [15]: np.expand_dims(X,(0,1)).shape
Out[15]: (1, 1, 2, 1)
In [16]: np.expand_dims(x,(2,3)).shape
Out[16]: (2, 2, 1, 1)
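A further sketch, not from the answer above (`matlab_order_outer` is a hypothetical name): if what matters is MATLAB's index semantics, Y(a,b,c) = X(a)*x(b,c) after the squeeze, rather than its memory layout, it may be enough to keep the reshapes in MATLAB's order, with X's dimensions first. The question's translation swapped the two reshapes, which is why its result has x's dimensions first. This is the same as `np.multiply.outer` followed by `squeeze`.

```python
import numpy as np

def matlab_order_outer(X, x):
    # Same order as the MATLAB reshapes: X's dimensions first, then x's.
    A = X.reshape(X.shape + (1,) * x.ndim)
    B = x.reshape((1,) * X.ndim + x.shape)
    return (A * B).squeeze()

X = np.array([[1, 2]])
x = np.array([[3, 4], [5, 6]])
Y = matlab_order_outer(X, x)     # shape (2, 2, 2)

# Y[:, :, k] holds the same values as MATLAB's output(:,:,k+1)
print(Y[:, :, 0])   # values [[3, 5], [6, 10]]
print(Y[:, :, 1])   # values [[4, 6], [8, 12]]
```

Only the printed layout differs from MATLAB, because numpy displays the array by its first index rather than in (:,:,k) pages.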
I have an ndarray defined as follows:
dataset = np.ndarray(shape=(len(image_files), image_size, image_size),
dtype=np.float32)
This array represents a collection of images of size image_size * image_size.
So I can write dataset[0] and get a 2D table corresponding to the image with index 0.
Now I would like to have one additional field for each image in this array. For instance, for the image at index 0 I would like to store the number 123, and for the image at index 321 I would like to store the number 50000.
What is the simplest way to add this additional data field to the existing ndarray?
What is the appropriate way to access data in the new array after adding this additional dimension?
If you shuffle an index array instead of the dataset itself, you can keep track of the original 'identifiers':
idx = np.arange(len(image_files))
np.random.shuffle(idx)
shuffle_set = dataset[idx]
Illustration:
In [20]: x = np.arange(12).reshape(6,2)
...: idx = np.arange(6)
...: np.random.shuffle(idx)
In [21]: x
Out[21]:
array([[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7],
[ 8, 9],
[10, 11]])
In [22]: x[idx] # shuffled
Out[22]:
array([[ 4, 5],
[ 0, 1],
[ 2, 3],
[ 6, 7],
[10, 11],
[ 8, 9]])
In [23]: idx1=np.argsort(idx)
In [24]: idx
Out[24]: array([2, 0, 1, 3, 5, 4])
In [25]: idx1
Out[25]: array([1, 2, 0, 3, 5, 4])
In [26]: Out[22][idx1] # recover original order
Out[26]:
array([[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7],
[ 8, 9],
[10, 11]])
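The same index array keeps a per-image value, like the 123 or 50000 in the question, aligned with its image: store those values in a separate 1-D array (`labels` below is a hypothetical name, and the images and values are made up) and shuffle both with the same `idx`. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
dataset = rng.random((6, 4, 4), dtype=np.float32)  # 6 tiny stand-in "images"
labels = np.array([123, 7, 42, 50000, 9, 1])      # one extra value per image

idx = rng.permutation(len(dataset))  # shuffled positions 0..5
shuffled_images = dataset[idx]
shuffled_labels = labels[idx]        # same idx, so image/label pairs stay together

# position j in the shuffled arrays came from original position idx[j];
# argsort(idx) undoes the shuffle, as in In [23] above
restore = np.argsort(idx)
```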
NumPy arrays are fundamentally tensors, i.e. they have a fixed, rectangular shape: each axis has one length that applies to every element along it. Take, for example:
import numpy as np
x = np.array([[[1,2],[3,4]],
[[5,6],[7,8]]
])
print(x.shape) # Here we have two 2x2s. Shape = (2,2,2)
If I want to associate x[0] with the number 5 and x[1] with the number 7, then that would be something like (if it were possible):
x = np.array([[[1,2],[3,4]],5,
[[5,6],[7,8]],7
])
But such a thing is impossible, since it would "in some sense" have a shape like (2,((2,2),1)), or something else that is ambiguous. Such an object is not a NumPy array or a tensor: it doesn't have fixed axis sizes, and all NumPy arrays must have fixed axis sizes. Hence, if you wish to store the new information, the simplest way to do it is to create another array.
x = np.array([[[1,2],[3,4]],
[[5,6],[7,8]],
])
y = np.array([5,7])
Now x[0] corresponds to y[0] and x[1] corresponds to y[1]. x has shape (2,2,2) and y has shape (2,).
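With two aligned arrays, the extra value is read with the same index as the image, and it can also drive a selection. A minimal sketch reusing `x` and `y` from above:

```python
import numpy as np

x = np.array([[[1, 2], [3, 4]],
              [[5, 6], [7, 8]]])
y = np.array([5, 7])

print(x[1], y[1])            # the second 2x2 block and its associated number, 7

# a boolean mask built from y selects the matching blocks of x
blocks_with_7 = x[y == 7]    # shape (1, 2, 2)

for block, value in zip(x, y):   # iterate over (block, number) pairs
    print(value, block.sum())
```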
I've been trying to look up how np.diag_indices works and find examples of it, but the documentation is a bit light. I know it creates a diagonal array through a matrix; however, I want to change that diagonal array (I was thinking of using a loop to change its dimensions or something along those lines).
For example,
say we have a 3x2 matrix:
[[1 2]
[3 4]
[5 6]]
Now if I use np.diag_indices, it forms a diagonal array starting at (0,0) and going through (1,1):
[1 4]
However, I'd like this diagonal array to then shift one down, so that it starts at (1,0) and goes through (2,1), in (row, column) order:
[3 6]
However, np.diag_indices takes only 2 arguments, and from the looks of it neither enables me to do this. Am I using the wrong tool to achieve this? If so, what tools can I use to create a changing diagonal array that goes through my matrix? (I'm looking for something that will also work on larger matrices like a 200x50.)
The code for diag_indices is simple, so simple that I've never used it:
idx = arange(n)
return (idx,) * ndim
In [68]: np.diag_indices(4,2)
Out[68]: (array([0, 1, 2, 3]), array([0, 1, 2, 3]))
It just returns a tuple of arrays, the arange repeated ndim times. It's useful for indexing the main diagonal of a square matrix, e.g.
In [69]: arr = np.arange(16).reshape(4,4)
In [70]: arr
Out[70]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
In [71]: arr[np.diag_indices(4,2)]
Out[71]: array([ 0, 5, 10, 15])
The application is straightforward indexing with two arrays that match in shape.
It works on other shapes, if they are big enough.
np.diag applied to the same array does the same thing:
In [72]: np.diag(arr)
Out[72]: array([ 0, 5, 10, 15])
but it also allows for offset:
In [73]: np.diag(arr, 1)
Out[73]: array([ 1, 6, 11])
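For the question's 3x2 example, a negative offset gives the diagonal shifted one row down. As far as I know, `np.diag` on a 2-D array returns a read-only view in recent NumPy versions, so it is for reading values; writing to a diagonal needs index arrays instead.

```python
import numpy as np

arr = np.arange(1, 7).reshape(3, 2)   # [[1, 2], [3, 4], [5, 6]]

main = np.diag(arr)        # offset 0: elements (0,0), (1,1) -> [1, 4]
below = np.diag(arr, -1)   # offset -1: elements (1,0), (2,1) -> [3, 6]
above = np.diag(arr, 1)    # offset +1: element (0,1)        -> [2]
```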
===
Indexing with diag_indices does allow us to change that diagonal:
In [78]: arr[np.diag_indices(4,2)] += 10
In [79]: arr
Out[79]:
array([[10, 1, 2, 3],
[ 4, 15, 6, 7],
[ 8, 9, 20, 11],
[12, 13, 14, 25]])
====
But we don't have to use diag_indices to generate the desired indexing arrays:
In [80]: arr = np.arange(1,7).reshape(3,2)
In [81]: arr
Out[81]:
array([[1, 2],
[3, 4],
[5, 6]])
Selecting values from the first 2 rows and columns:
In [82]: arr[np.arange(2), np.arange(2)]
Out[82]: array([1, 4])
In [83]: arr[np.arange(2), np.arange(2)] += 10
In [84]: arr
Out[84]:
array([[11, 2],
[ 3, 14],
[ 5, 6]])
And for a different selection of rows:
In [85]: arr[np.arange(1,3), np.arange(2)] += 20
In [86]: arr
Out[86]:
array([[11, 2],
[23, 14],
[ 5, 26]])
The relevant documentation section on advanced indexing with integer arrays: https://numpy.org/doc/stable/reference/arrays.indexing.html#purely-integer-array-indexing
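Building on the integer-array indexing above, here is a sketch of a helper (the name `diag_offset_indices` is hypothetical, not a NumPy function) that returns the (rows, cols) index arrays for any offset diagonal, using the same sign convention as `np.diag`. Because it returns index arrays, the result can be used for assignment, and a loop can walk the diagonal down a larger matrix such as a 200x50:

```python
import numpy as np

def diag_offset_indices(shape, k=0):
    """Hypothetical helper: (rows, cols) index arrays for the k-th diagonal of a
    2-D array, with np.diag's convention (k > 0 above, k < 0 below)."""
    nrows, ncols = shape
    if k >= 0:                              # starts at (0, k)
        n = max(0, min(nrows, ncols - k))
        return np.arange(n), np.arange(n) + k
    n = max(0, min(nrows + k, ncols))       # starts at (-k, 0)
    return np.arange(n) - k, np.arange(n)

arr = np.arange(1, 7).reshape(3, 2)
print(arr[diag_offset_indices(arr.shape, -1)])   # [3 6]

# Walk the diagonal down a 200x50 matrix, modifying it in place each step
big = np.zeros((200, 50), dtype=int)
for k in range(0, -big.shape[0], -1):
    rows, cols = diag_offset_indices(big.shape, k)
    big[rows, cols] += 1
```

After the loop every element on or below the main diagonal has been visited exactly once, so `big` equals `np.tril(np.ones((200, 50), dtype=int))`.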
I'm struggling to select the specific columns per row of a NumPy matrix.
Suppose I have the following matrix, which I'll call X:
[1, 2, 3]
[4, 5, 6]
[7, 8, 9]
I also have a list of column indexes, one per row, which I'll call Y:
[1, 0, 2]
I need to get the values:
[2]
[4]
[9]
Instead of the list of indexes Y, I can also produce a matrix with the same shape as X where every entry is a bool (or a 0/1 int) indicating whether that is the required column.
[0, 1, 0]
[1, 0, 0]
[0, 0, 1]
I know this can be done with iterating over the array and selecting the column values I need. However, this will be executed frequently on big arrays of data and that's why it has to run as fast as it can.
I was thus wondering if there is a better solution?
If you've got a boolean array you can do direct selection based on that like so:
>>> a = np.array([True, True, True, False, False])
>>> b = np.array([1,2,3,4,5])
>>> b[a]
array([1, 2, 3])
To go along with your initial example you could do the following:
>>> a = np.array([[1,2,3], [4,5,6], [7,8,9]])
>>> b = np.array([[False,True,False],[True,False,False],[False,False,True]])
>>> a[b]
array([2, 4, 9])
You can also add in an arange and do direct selection on that, though depending on how you're generating your boolean array and what your code looks like YMMV.
>>> a = np.array([[1,2,3], [4,5,6], [7,8,9]])
>>> a[np.arange(len(a)), [1,0,2]]
array([2, 4, 9])
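If the 0/1 mask is what you have, `argmax` along each row turns it back into the column-index list (assuming exactly one True per row), which then works with the arange form above. Note also that `a[mask]` returns the True entries in row-major order, so it gives one value per row only when every row has exactly one True. A minimal sketch:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
mask = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]], dtype=bool)

cols = mask.argmax(axis=1)                # first True in each row: [1, 0, 2]
picked = a[np.arange(len(a)), cols]       # [2, 4, 9]
assert np.array_equal(picked, a[mask])    # same result when each row has one True
```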
You can do something like this:
In [7]: a = np.array([[1, 2, 3],
...: [4, 5, 6],
...: [7, 8, 9]])
In [8]: lst = [1, 0, 2]
In [9]: a[np.arange(len(a)), lst]
Out[9]: array([2, 4, 9])
More on indexing multi-dimensional arrays: http://docs.scipy.org/doc/numpy/user/basics.indexing.html#indexing-multi-dimensional-arrays
Recent numpy versions have added a take_along_axis (and put_along_axis) that does this indexing cleanly.
In [101]: a = np.arange(1,10).reshape(3,3)
In [102]: b = np.array([1,0,2])
In [103]: np.take_along_axis(a, b[:,None], axis=1)
Out[103]:
array([[2],
[4],
[9]])
It operates in the same way as:
In [104]: a[np.arange(3), b]
Out[104]: array([2, 4, 9])
but with different axis handling. It's especially aimed at applying the results of argsort and argmax.
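For example, applying per-row `argsort` and `argmax` results (a sketch with made-up data):

```python
import numpy as np

a = np.array([[30, 10, 20],
              [ 9,  7,  8]])

order = np.argsort(a, axis=1)                          # per-row sorting indices
sorted_rows = np.take_along_axis(a, order, axis=1)     # same as np.sort(a, axis=1)

best = np.argmax(a, axis=1)                            # shape (2,)
row_max = np.take_along_axis(a, best[:, None], axis=1) # shape (2, 1): [[30], [9]]
```

The index array must have the same number of dimensions as `a`, hence the `[:, None]` on the argmax result, just as with `b[:,None]` above.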
A simple way might look like:
In [1]: a = np.array([[1, 2, 3],
...: [4, 5, 6],
...: [7, 8, 9]])
In [2]: y = [1, 0, 2] #list of indices we want to select from matrix 'a'
range(a.shape[0]) yields 0, 1, 2, and numpy treats it like the index array [0, 1, 2]:
In [3]: a[range(a.shape[0]), y] #we're selecting y indices from every row
Out[3]: array([2, 4, 9])
You can do it using an iterator, like this:
np.fromiter((row[index] for row, index in zip(X, Y)), dtype=int)
Time:
N = 1000
X = np.zeros(shape=(N, N))
Y = np.arange(N)
##Aशwini चhaudhary
%timeit X[np.arange(len(X)), Y]
10000 loops, best of 3: 30.7 us per loop
#mine
%timeit np.fromiter((row[index] for row, index in zip(X, Y)), dtype=int)
1000 loops, best of 3: 1.15 ms per loop
#mine
%timeit np.diag(X.T[Y])
10 loops, best of 3: 20.8 ms per loop
Another clever way is to first transpose the array and then index it. Finally, take the diagonal: it always holds the right answer (though it builds a full len(Y) x len(X) intermediate, which makes it much slower on big arrays).
X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
Y = np.array([1, 0, 2, 2])
np.diag(X.T[Y])
Step by step:
Original arrays:
>>> X
array([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12]])
>>> Y
array([1, 0, 2, 2])
Transpose so that indexing with Y selects columns of X:
>>> X.T
array([[ 1, 4, 7, 10],
[ 2, 5, 8, 11],
[ 3, 6, 9, 12]])
Get rows in the Y order.
>>> X.T[Y]
array([[ 2, 5, 8, 11],
[ 1, 4, 7, 10],
[ 3, 6, 9, 12],
[ 3, 6, 9, 12]])
The diagonal should now become clear.
>>> np.diag(X.T[Y])
array([ 2, 4, 9, 12])
The answer from hpaulj using take_along_axis should be the accepted one.
Here is a derived version with an N-dim index array:
>>> arr = np.arange(20).reshape((2,2,5))
>>> idx = np.array([[1,0],[2,4]])
>>> np.take_along_axis(arr, idx[...,None], axis=-1)
array([[[ 1],
[ 5]],
[[12],
[19]]])
Note that the selection works regardless of the number of leading dimensions. I used this to refine a possibly vector-valued argmax result from a histogram by fitting parabolas:
def interpol(arr):
    i = np.argmax(arr, axis=-1)
    # value at offset Δ from the peak along the last axis, same shape as i
    a = lambda Δ: np.squeeze(np.take_along_axis(arr, i[...,None]+Δ, axis=-1), axis=-1)
    frac = .5*(a(1) - a(-1)) / (2*a(0) - a(-1) - a(1)) # |frac| < 0.5
    return i + frac
Note the squeeze to remove the dimension of size 1 resulting in the same shape of i and frac, the integer and fractional part of the peak position.
I'm quite sure that it is possible to avoid the lambda, but would the interpolation formula still look nice?
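One way to avoid the lambda, offered as a sketch rather than a definitive version: gather the three neighbours of each peak in a single `take_along_axis` call and unpack them with `np.moveaxis`. Like the original, it assumes the peak is not at either end of the last axis (there, `i + 1` would be out of bounds and `i - 1` would wrap around to the last element).

```python
import numpy as np

def interpol(arr):
    # np.asarray so that 1-D input (scalar argmax) also supports [..., None]
    i = np.asarray(np.argmax(arr, axis=-1))
    # gather the samples at offsets -1, 0, +1 around each peak: shape (..., 3)
    nbrs = np.take_along_axis(arr, i[..., None] + np.array([-1, 0, 1]), axis=-1)
    ym, y0, yp = np.moveaxis(nbrs, -1, 0)   # each has the same shape as i
    frac = .5 * (yp - ym) / (2 * y0 - ym - yp)  # |frac| < 0.5
    return i + frac
```

For sampled parabolas the refinement is exact: rows `-(x - 2.3)**2` and `-(x - 3.6)**2` over `x = np.arange(6)` give peaks at 2.3 and 3.6.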