import numpy as np
x = np.random.randn(2, 3, 4)
mask = np.array([1, 0, 1, 0], dtype=np.bool)
y = x[0, :, mask]
z = x[0, :, :][:, mask]
print(y)
print(z)
print(y.T)
Why does doing the above operation in two steps result in the transpose of doing it in one step?
Here's the same behavior with a list index:
In [87]: x=np.arange(2*3*4).reshape(2,3,4)
In [88]: x[0,:,[0,2]]
Out[88]:
array([[ 0, 4, 8],
[ 2, 6, 10]])
In [89]: x[0,:,:][:,[0,2]]
Out[89]:
array([[ 0, 2],
[ 4, 6],
[ 8, 10]])
In the 2nd case, x[0,:,:] returns a (3,4) array, and the next index picks 2 columns.
In the 1st case, it first selects on the first and last dimensions, and appends the slice (the middle dimension). The 0 and [0,2] produce a 2 dimension, and the 3 from the middle is appended, giving (2,3) shape.
This is a case of mixed basic and advanced indexing.
http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#combining-advanced-and-basic-indexing
In the first case, the dimensions resulting from the advanced indexing operation come first in the result array, and the subspace dimensions after that.
This is not an easy case to comprehend or explain. Basically there's some ambiguity as to what the final dimension should be. It tries to illustrate with an example x[:,ind_1,:,ind_2] where ind_1 and ind_2 are 3d (or together broadcast to that).
Earlier attempts to explain this are:
How does numpy order array slice indices?
Combining slicing and broadcasted indexing for multi-dimensional numpy arrays
===========================
A way around this problem is to replace the slice with an array - a column vector
In [221]: x[0,np.array([0,1,2])[:,None],[0,2]]
Out[221]:
array([[ 0, 2],
[ 4, 6],
[ 8, 10]])
In [222]: np.ix_([0],[0,1,2],[0,2])
Out[222]:
(array([[[0]]]), array([[[0],
[1],
[2]]]), array([[[0, 2]]]))
In [223]: x[np.ix_([0],[0,1,2],[0,2])]
Out[223]:
array([[[ 0, 2],
[ 4, 6],
[ 8, 10]]])
Though this last case is 3d, (1,3,2). ix_ didn't like the scalar 0. An alternate way of using ix_:
In [224]: i,j=np.ix_([0,1,2],[0,2])
In [225]: x[0,i,j]
Out[225]:
array([[ 0, 2],
[ 4, 6],
[ 8, 10]])
And here's a way of getting the same numbers, but in a (2,1,3) array:
In [232]: i,j=np.ix_([0,2],[0])
In [233]: x[j,:,i]
Out[233]:
array([[[ 0, 4, 8]],
[[ 2, 6, 10]]])
Related
arr = np.arange(12).reshape((3, 2, 2))
indices = np.array([0, 1, 1])
expected_outcome = np.array([[0, 1], [6, 7], [10, 11]])
I'm trying to index this array of shape (3,2,2) with an array of shape (3) containing the y-index of the value I want to get. I tried to make it work with for in statement, but is there an elegant way to do it with numpy?
So you want arr[0,0,:], arr[1,1,:], arr[2,1,:]?
How about
In [179]: arr[[0,1,2], [0,1,1]]
Out[179]:
array([[ 0, 1],
[ 6, 7],
[10, 11]])
I have and ndarray defined in the following way:
dataset = np.ndarray(shape=(len(image_files), image_size, image_size),
dtype=np.float32)
This array represents a collection of images of size image_size * image_size.
So I can say, dataset[0] and get a 2D table corresponding to an image with index 0.
Now I would like to have one additional field for each image in this array. For instance, for image located at index 0, I would like to store number 123, for an image located at index 321 I would like to store number 50000.
What is the simplest way to add this additional data field to the existing ndarray?
What is the appropriate way to access data in the new array after adding this additional dimension?
If you shuffle an index array instead of the dataset itself, you can keep track of the original 'identifiers'
idx = np.arange(len(image_files))
np.random.shuffle(idx)
shuffle_set = dataset[idx]
illustration:
In [20]: x = np.arange(12).reshape(6,2)
...: idx = np.arange(6)
...: np.random.shuffle(idx)
In [21]: x
Out[21]:
array([[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7],
[ 8, 9],
[10, 11]])
In [22]: x[idx] # shuffled
Out[22]:
array([[ 4, 5],
[ 0, 1],
[ 2, 3],
[ 6, 7],
[10, 11],
[ 8, 9]])
In [23]: idx1=np.argsort(idx)
In [24]: idx
Out[24]: array([2, 0, 1, 3, 5, 4])
In [25]: idx1
Out[25]: array([1, 2, 0, 3, 5, 4])
In [26]: Out[22][idx1] # recover original order
Out[26]:
array([[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7],
[ 8, 9],
[10, 11]])
Numpy arrays are fundamentally tensors, i.e., they have a shape that is absolute across the axes. Meaning that the shape is fixed and not variable. Take for example,
import numpy as np
x = np.array([[[1,2],[3,4]],
[[5,6],[7,8]]
])
print(x.shape) #Here we have two, 2x2s. Shape = (2,2,2)
If I want to associate x[0] to the number 5 and x[1] to the number 7, then that would be something like (if it was possible):
x = np.array([[[1,2],[3,4]],5,
[[5,6],[7,8]],7
])
But such thing is impossible, since it would "in some sense" have a shape that corresponds to (2,((2,2),1)), or something else that is ambiguous. Such an object is not a numpy array or a tensor. It doesn't have fixed axis sizes. All numpy arrays must have fixed axis sizes. Hence, if you wish to store the new information, the only way to do it, is to create another array.
x = np.array([[[1,2],[3,4]],
[[5,6],[7,8]],
])
y = np.array([5,7])
Now x[0] corresponds to y[0] and x[1] corresponds to y[1]. x has shape (2,2,2) and y has shape (2,).
I've been trying to look up how np.diag_indices work, and for examples of them, however the documentation for it is a bit light. I know this creates a diagonal array through your matrix, however I want to change the diagonal array (I was thinking of using a loop to change its dimensions or something along those lines).
I.E.
say we have a 3x2 matrix:
[[1 2]
[3 4]
[5 6]]
Now if I use np.diag_indices it will form a diagonal array starting at (0,0) and goes through (1,1).
[1 4]
However, I'd like this diagonal array to then shift one down. So now it starts at (0,1) and goes through (1,2).
[3 6]
However there are only 2 arguments for np.diag_indices, neither of which from the looks of it enable me to do this. Am I using the wrong tool to try and achieve this? If so, what tools can I use to create a changing diagonal array that goes through my matrix? (I'm looking for something that will also work on larger matrices like a 200x50).
The code for diag_indices is simple, so simple that I've never used it:
idx = arange(n)
return (idx,) * ndim
In [68]: np.diag_indices(4,2)
Out[68]: (array([0, 1, 2, 3]), array([0, 1, 2, 3]))
It just returns a tuple of arrays, the arange repeated n times. It's useful for indexing the main diagonal of a square matrix, e.g.
In [69]: arr = np.arange(16).reshape(4,4)
In [70]: arr
Out[70]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
In [71]: arr[np.diag_indices(4,2)]
Out[71]: array([ 0, 5, 10, 15])
The application is straight forward indexing with two arrays that match in shape.
It works on other shapes - if they are big enogh.
np.diag applied to the same array does the same thing:
In [72]: np.diag(arr)
Out[72]: array([ 0, 5, 10, 15])
but it also allows for offset:
In [73]: np.diag(arr, 1)
Out[73]: array([ 1, 6, 11])
===
Indexing with diag_indices does allow us to change that diagonal:
In [78]: arr[np.diag_indices(4,2)] += 10
In [79]: arr
Out[79]:
array([[10, 1, 2, 3],
[ 4, 15, 6, 7],
[ 8, 9, 20, 11],
[12, 13, 14, 25]])
====
But we don't have to use diag_indices to generate the desired indexing arrays:
In [80]: arr = np.arange(1,7).reshape(3,2)
In [81]: arr
Out[81]:
array([[1, 2],
[3, 4],
[5, 6]])
selecting values from 1st 2 rows, and columns:
In [82]: arr[np.arange(2), np.arange(2)]
Out[82]: array([1, 4])
In [83]: arr[np.arange(2), np.arange(2)] += 10
In [84]: arr
Out[84]:
array([[11, 2],
[ 3, 14],
[ 5, 6]])
and for a difference selection of rows:
In [85]: arr[np.arange(1,3), np.arange(2)] += 20
In [86]: arr
Out[86]:
array([[11, 2],
[23, 14],
[ 5, 26]])
The relevant documentation section on advanced indexing with integer arrays: https://numpy.org/doc/stable/reference/arrays.indexing.html#purely-integer-array-indexing
How do I combine multiple column vectors into a Matrix? For example, if I have 3 10 x 1 vectors, how do I put them into a 10 x 3 matrix?
Here's what I've tried so far:
D0 =np.array([[np.cos(2*np.pi*f*time)],[np.sin(2*np.pi*f*time)],np.ones((len(time),1)).transpose()],'float').transpose()
this gives me something like this ,
[[[ 1.00000000e+00 0.00000000e+00 1.00000000e+00]]
[[ 9.99999741e-01 7.19053432e-04 1.00000000e+00]]
[[ 9.99998966e-01 1.43810649e-03 1.00000000e+00]]
...
[[ 9.99998966e-01 -1.43810649e-03 1.00000000e+00]]
[[ 9.99999741e-01 -7.19053432e-04 1.00000000e+00]]
[[ 1.00000000e+00 -2.15587355e-14 1.00000000e+00]]]
but, I don't think this is right, it looks more like an array of lists (and I couldn't do matrix multiplication with this form)...I tried numpy.concatenate as well, but that didn't work for me either...Looking into stack next....
In Matlab notation, I need to get this into a form
D0 =[cos(2*pi*f *t1), sin(2*pi*f*t1) ,1; cos(2*pi*f*t2), sin(2*pi*f*t2) ,1;....] etc
So that I can find the least squares solution s_hat:
s_hat = (D0^T D0)^-1(D0^T x)
where x is another input vector containing the samples of the sinusoid I'm trying to fit.
In Matlab, I could just type
D0 = [cos(2*np.pi*f*time),sin(2*np.pi*f*time), repmat(1,len(time),1)]
to create the D0 matrix. How do I do this in python?
Thank you!
Here you have equivalent complete examples in Matlab and Python/NumPy:
% Matlab
f = 0.1;
time = [0; 1; 2; 3];
D0 = [cos(2*pi*f*time), sin(2*pi*f*time), repmat(1,length(time),1)]
# Python
import numpy as np
f = 0.1
time = np.array([0, 1, 2, 3])
D0 = np.array([np.cos(2*np.pi*f*time), np.sin(2*np.pi*f*time), np.ones(time.size)]).T
print(D0)
Note that unlike Matlab, Python/NumPy has no special syntax to distinguish rows from columns (, vs. ; in Matlab). Similarly, a 1D NumPy array has no notion of either being a "column" or "row" vector. When merging several 1D NumPy arrays into a single 2D array, as above, each 1D array ends up as a row in the 2D array. As you want them as columns, you need to transpose the 2D array, here accomplished simply by the .T attribute.
If the arrays really are (10,1) shape, then simply concatenate:
In [60]: x,y,z = np.ones((10,1),int), np.zeros((10,1),int), np.arange(10)[:,None]
In [61]: np.concatenate([x,y,z], axis=1)
Out[61]:
array([[1, 0, 0],
[1, 0, 1],
[1, 0, 2],
[1, 0, 3],
[1, 0, 4],
[1, 0, 5],
[1, 0, 6],
[1, 0, 7],
[1, 0, 8],
[1, 0, 9]])
If they are actually 1d, you'll have to fiddle with dimensions in one way or other. For example reshape or add a dimension as I did with z above. Or use some function that does that for you:
In [62]: x,y,z = np.ones((10,),int), np.zeros((10,),int), np.arange(10)
In [63]: z.shape
Out[63]: (10,)
In [64]: np.array([x,y,z]).shape
Out[64]: (3, 10)
In [65]: np.array([x,y,z]).T # transpose
Out[65]:
array([[1, 0, 0],
[1, 0, 1],
[1, 0, 2],
[1, 0, 3],
[1, 0, 4],
[1, 0, 5],
[1, 0, 6],
[1, 0, 7],
[1, 0, 8],
[1, 0, 9]])
np.array([...]) joins the arrays on a new initial dimension. Remember in Python/numpy the first dimension is the outermost one (MATLAB is the reverse).
stack variants tweak the dimensions, and then do concatenate:
In [66]: np.stack([x,y,z],axis=1).shape
Out[66]: (10, 3)
In [67]: np.column_stack([x,y,z]).shape
Out[67]: (10, 3)
In [68]: np.vstack([x,y,z]).shape
Out[68]: (3, 10)
===
D0 =np.array([[np.cos(2*np.pi*f*time)],[np.sin(2*np.pi*f*time)],np.ones((len(time),1)).transpose()],'float').transpose()
I'm guessing f is a scalar, and time is a 1d array (shape (10,))
[np.cos(2*np.pi*f*time)]
wraps a (10,) in [], which when turned into an array becomes (1,10) shape.
np.ones((len(time),1)).transpose() is (10,1) transposed to (1,10).
np.array(....) of these creates a (3,1,10) array. Transpose of that is (10,1,3).
If you dropped the [] and shape that created (1,10) arrays:
D0 =np.array([np.cos(2*np.pi*f*time), np.sin(2*np.pi*f*time), np.ones((len(time))]).transpose()
would join 3 (10,) arrays to make (3,10), which then transposes to (10,3).
Alternatively,
D0 =np.concatenate([[np.cos(2*np.pi*f*time)], [np.sin(2*np.pi*f*time)], np.ones((1,len(time),1))], axis=0)
joins the 3 (1,10) arrays to make a (3,10), which you can transpose.
I find it weird that numpy.power has no axis argument... is it because there is a better/safer way to achieve the same goal (elevating each 2D array in a 3D array to the power of a 1D array).
Suppose you have a (3,10,10) array (A) and you want to elevate each (10,10) array to the power of elements in array B of shape (3,).
You should be able to do it by using np.power(A,B,axis=0), right?
Yet it yields the following TypeError :
TypeError: 'axis' is an invalid keyword to ufunc 'power'
Since it seems that power does not have an axis or axes argument (despite being an ufunc), what is the preferred way to do it ?
There may be a solution using the ufunc.reduce method but I don't really see how that would work with numpy.power...
For now I do :
np.array([A[i,:,:]**B[i] for i in range(3)])
But it looks ugly and is probably less efficient than a numpy method would be.
Thanks
power is not a reduction operation: it does not reduce a collection of numbers to a single number, so an axis argument doesn't make sense. Operations such as sum or max are reductions, so it is meaningful to specify an axis along which to apply the reduction.
The operation that you want is broadcasting. Here's a smaller example, with A having shape (3, 2, 2) and B having shape (3,). We can't write np.power(A, B), because the shapes are not compatible for broadcasting. We first have to add trivial dimensions to B to give it the shape (3, 1, 1). That can be done with, for example, B[:, np.newaxis, np.newaxis] or B.reshape(-1, 1, 1).
In [100]: A
Out[100]:
array([[[1, 1],
[3, 3]],
[[3, 2],
[1, 1]],
[[3, 2],
[1, 3]]])
In [101]: B
Out[101]: array([2, 1, 3])
In [102]: np.power(A, B[:, np.newaxis, np.newaxis])
Out[102]:
array([[[ 1, 1],
[ 9, 9]],
[[ 3, 2],
[ 1, 1]],
[[27, 8],
[ 1, 27]]])
The value of np.newaxis is None, so you'll often see expressions that use None instead of np.newaxis. You can also using the ** operator instead of the function power:
In [103]: A ** B[:, None, None]
Out[103]:
array([[[ 1, 1],
[ 9, 9]],
[[ 3, 2],
[ 1, 1]],
[[27, 8],
[ 1, 27]]])