Why doesn't np.newaxis always append the axis where I put it? See example:
import numpy as np
M = np.array([[1,4],[2,5], [3,4]])
M[:,np.newaxis].shape
# returns (3, 1, 2)
According to the documentation describing newaxis:
The added dimension is the position of the newaxis object in the selection tuple.
In your example, newaxis is in the second position of the tuple, hence the new dimension of length 1 is inserted there.
This is analogous to selecting a value at a particular index. For a three-dimensional A, you would use A[:,0] to retrieve the index 0 value from the second axis, not the third axis.
If you want to add the new axis in the last position of the tuple, you could write M[:,:,np.newaxis] or alternatively use the ellipsis notation:
>>> M[...,np.newaxis].shape
(3, 2, 1)
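As a minimal sketch of the rule above, the same M can be indexed with newaxis in each possible position. The variable names are only for illustration, and np.expand_dims is shown as an equivalent spelling, not something the answer itself relies on.

```python
import numpy as np

M = np.array([[1, 4], [2, 5], [3, 4]])  # shape (3, 2)

# The new length-1 axis lands wherever np.newaxis sits in the index tuple.
first = M[np.newaxis]        # newaxis in position 0
middle = M[:, np.newaxis]    # newaxis in position 1
last = M[..., np.newaxis]    # newaxis after all existing axes

print(first.shape, middle.shape, last.shape)

# np.expand_dims names the target position explicitly.
same_as_last = np.expand_dims(M, axis=-1)
print(same_as_last.shape)
```

Omitted trailing axes are treated as full slices, which is why M[:, np.newaxis] behaves like M[:, np.newaxis, :].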
# split into inputs and outputs
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)
Can someone explain the second line of code with reference to specific documentation? I know it's slicing, but I couldn't find any reference for the notation ":-1" anywhere. Please point to the specific part of the documentation.
Thank you
It results in slicing, most probably using numpy, and it is being applied to data of shape (610, 14).
Per the docs:
Indexing on ndarrays
ndarrays can be indexed using the standard Python x[obj] syntax, where x is the array and obj the selection. There are different kinds of indexing available depending on obj: basic indexing, advanced indexing and field access.
1D array
Slicing a 1-dimensional array is much like slicing a list
import numpy as np
np.random.seed(0)
array_1d = np.random.random((5,))
print(len(array_1d.shape))
1
NOTE: The len of the array shape tells you the number of dimensions.
We can use standard python list slicing on the 1D array.
# get the last element
print(array_1d[-1])
0.4236547993389047
# get everything up to but excluding the last element
print(array_1d[:-1])
[0.5488135 0.71518937 0.60276338 0.54488318]
2D array
array_2d = np.random.random((5, 1))
print(len(array_2d.shape))
2
Think of a 2-dimensional array like a data frame. It has rows (the 0th axis) and columns (the 1st axis). numpy grants us the ability to slice these axes independently by separating them with a comma (,).
# the 0th row and all columns
print(array_2d[0, :])
[0.79172504]
# the 1st row and everything after + all columns
print(array_2d[1:, :])
[[0.52889492]
[0.56804456]
[0.92559664]
[0.07103606]]
# the 1st through second to last row + the last column
print(array_2d[1:-1, -1])
[0.52889492 0.56804456 0.92559664]
Your Example
# split into inputs and outputs
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)
Note that data must have at least 2 dimensions, i.e. len(data.shape) >= 2 (otherwise you'd get an IndexError).
This means data[:, :-1] is keeping all "rows" and slicing up to, but not including, the last "column". Likewise, data[:, -1] is keeping all "rows" and selecting only the last "column".
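As a small illustration, assuming a made-up data array of shape (4, 3) in place of the (610, 14) one from the question, the split looks like this:

```python
import numpy as np

# Hypothetical stand-in for the question's data: 4 rows, 3 columns.
data = np.arange(12).reshape(4, 3)

X = data[:, :-1]  # all rows, every column except the last
y = data[:, -1]   # all rows, only the last column (the integer index drops that axis)

print(X.shape, y.shape)
```

X keeps two dimensions because both axes were sliced, while y is one-dimensional because the column axis was selected with a single integer.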
It's important to know that when you slice an ndarray using a colon (:), you will get an array with the same number of dimensions.
print(len(array_2d[1:, :-1].shape)) # 2
But if you "select" a specific index (i.e. don't use a colon), you may reduce the dimensions.
print(len(array_2d[1, :-1].shape)) # 1, because I selected a single index value on the 0th axis
print(len(array_2d[1, -1].shape)) # 0, because I selected a single index value on both the 0th and 1st axes
You can, however, select a list of indices on either axis (assuming they exist).
print(len(array_2d[[1], [-1]].shape)) # 1
print(len(array_2d[[1, 3], :].shape)) # 2
This slicing notation is explained here https://docs.python.org/3/tutorial/introduction.html#strings
-1 means the last element, -2 the second from last, and so on. For example, if there are 8 elements in a list, -1 is equivalent to 7 (not 8, because indexing starts from 0).
Keep in mind that plain Python nested lists are indexed by chaining brackets (for example a[1][5:7]; note that chained slices such as a[1:3][5:7] both apply to the outer list rather than to two different axes), while numpy arrays also accept a comma-separated syntax ([8:10, 12:14]) that slices each axis of a multidimensional array. However, -1 always means the same thing. Here is the numpy documentation for slicing https://numpy.org/doc/stable/user/basics.indexing.html
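A short sketch of the difference between list slicing and array slicing, using a made-up 3x4 example (the names nested, arr, chained and block are only for illustration):

```python
import numpy as np

nested = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
arr = np.array(nested)

# Chained slices on a list both act on the outer list:
# nested[1:3] keeps rows 1 and 2, then [0:1] keeps the first of those rows.
chained = nested[1:3][0:1]

# A comma-separated index on an array slices rows and columns independently.
block = arr[1:3, 0:1]

# Negative indices mean the same thing in both cases.
last_row_list = nested[-1]
last_row_array = arr[-1]

print(chained)
print(block)
```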
I have a 4-dimensional array. I am going to turn it into a 1-dim array. I use numpy ravel and it works fine with the default parameters.
However I would also like the positions/indices in the 4-dim array.
I want something like this as row in my output.
x,y,z,w,value
With x being the first dimension of my initial array and so on.
The obvious approach is iteration, however I was told to avoid it when I can.
for i in range(test.shape[0]):
    for j in range(test.shape[1]):
        for k in range(test.shape[2]):
            for l in range(test.shape[3]):
                print(i,j,k,l,test[(i,j,k,l)])
It will be too slow when I use a larger dataset.
Is there a way to configure ravel to do this, or any other approach faster than iteration?
Use np.indices with sparse=False, combined with np.concatenate to build the array. np.indices provides the first n columns, and np.concatenate appends the last one:
test = np.random.randint(10, size=(3, 5, 4, 2))
index = np.indices(test.shape, sparse=False) # shape: (4, 3, 5, 4, 2)
data = np.concatenate((index, test[None, ...]), axis=0).reshape(test.ndim + 1, -1).T
A more detailed breakdown:
index is a (4, *test.shape) array, with one element per dimension.
To make test concatenatable with index, you need to prepend a unit dimension, which is what test[None, ...] does. None is synonymous with np.newaxis, and Ellipsis, or ..., means "all the remaining dimensions".
When you concatenate along axis=0, you are appending test to the array of indices. Each element of index along the first axis is now a 5-element array containing the index followed by the value. The remaining axes reflect the shape of test, but besides that, you have what you want.
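The goal is to flatten out the trailing dimensions, so you get a (5, N) array, where N = np.prod(test.shape). That's what the final reshape does. test.ndim + 1 is the size of the index plus 1 for the value. -1 can appear at most once in a reshape; it means "whatever size makes the element count match", which here is the product of all the remaining dimensions. The trailing .T then transposes the result to (N, 5), giving one x, y, z, w, value row per element.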
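The steps above can be checked against plain iteration on a small array. This is only a sketch: the random generator, the seed and the name expected are my own choices, and the loop-based reference exists purely for comparison.

```python
import numpy as np

rng = np.random.default_rng(0)
test = rng.integers(10, size=(3, 5, 4, 2))

index = np.indices(test.shape, sparse=False)                 # (4, 3, 5, 4, 2)
stacked = np.concatenate((index, test[None, ...]), axis=0)   # (5, 3, 5, 4, 2)
data = stacked.reshape(test.ndim + 1, -1).T                  # (120, 5)

# Reference result built with plain iteration, only for checking.
expected = np.array([idx + (test[idx],) for idx in np.ndindex(*test.shape)])

print(data.shape, np.array_equal(data, expected))
```

Each row of data holds the four indices followed by the value, in the same order the nested loops would visit them.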
I have a tensor x in PyTorch, say of shape (5,3,2,6), and another tensor idx of shape (5,3,2,1) which contains an index for every element of the first tensor. I want to slice the first tensor with the indices in the second tensor. I tried x = x[idx], but I get an unexpected shape when I really want the result to be of shape (5,3,2) or (5,3,2,1).
I'll try to give an easier example:
Let's say
x = torch.Tensor([[10, 20, 30],
                  [8, 4, 43]])
idx = torch.Tensor([[0],
                    [2]])
I want something like
y = x[idx]
such that 'y' outputs [[10],[43]] or something like that.
The indices give the position of the wanted elements in the last dimension. For the example above, where x.shape = (2,3), the last dimension is the columns, so the indices in 'idx' are column indices. I want this, but for more than 2 dimensions.
From what I understand from the comments, you need idx to index into the last dimension, with each entry of idx lining up with the same position in x along all the other dimensions. In that case (this is the numpy version; you can convert it to torch):
ind = np.indices(idx.shape)
ind[-1] = idx
x[tuple(ind)]
output:
[[10]
[43]]
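For the 4-dimensional shapes mentioned in the question, the same idea can be sketched as below. The random inputs are made up for illustration, and np.take_along_axis is an alternative I am adding for comparison, not part of the answer above.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((5, 3, 2, 6))
idx = rng.integers(6, size=(5, 3, 2, 1))  # one index into the last axis per position

# Approach from the answer: full index grids, with the last one replaced by idx.
ind = np.indices(idx.shape)
ind[-1] = idx
picked = x[tuple(ind)]                    # shape (5, 3, 2, 1)

# Alternative, not from the answer: take_along_axis performs the same lookup.
picked_alt = np.take_along_axis(x, idx, axis=-1)

print(picked.shape, np.array_equal(picked, picked_alt))
```

If the (5, 3, 2) shape is preferred, picked[..., 0] drops the trailing length-1 axis.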
You can use range, and squeeze idx to get the proper dimension, like
x[range(x.size(0)), idx.squeeze()]
tensor([10., 43.])
# or
x[range(x.size(0)), idx.squeeze()].unsqueeze(1)
tensor([[10.],
[43.]])
Here's the one that works in PyTorch using gather. The idx needs to be in torch.int64 format which the following line will ensure (note the lowercase of 't' in tensor).
idx = torch.tensor([[0],
                    [2]])
torch.gather(x, 1, idx) # 1 is the axis to index here
tensor([[10.],
[43.]])
Is there a way of computing the index of the minimum or maximum value of an array after applying a function (i.e. the equivalent of matlab find)?
In other words consider something like:
a = [1,-3,-10,3]
np.find_max(a,lambda x:abs(x))
Should return 2.
I could write a loop for this obviously but I assume it would be faster to use an inbuilt numpy function if one existed.
Use argmax, according to the documentation:
numpy.argmax(a, axis=None, out=None)
Returns the indices of the maximum values along an axis.
Parameters:
a : array_like
    Input array.
axis : int, optional
    By default, the index is into the flattened array, otherwise along the specified axis.
out : array, optional
    If provided, the result will be inserted into this array. It should be of the appropriate shape and dtype.
Returns:
index_array : ndarray of ints
    Array of indices into the array. It has the same shape as a.shape with the dimension along axis removed.
See also:
ndarray.argmax, argmin
amax : The maximum value along a given axis.
unravel_index : Convert a flat index into an index tuple.
Notes:
In case of multiple occurrences of the maximum values, the indices corresponding to the first occurrence are returned.
import numpy as np
a = [1, -3, -10, 3]
print(np.argmax(np.abs(a)))
Output:
2
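If the pattern is needed often, it could be wrapped in a small helper. The sketch below uses a hypothetical function name, argmax_by, which is not part of numpy, and assumes the supplied function accepts a whole array. It also shows np.unravel_index, mentioned in the documentation above, for multidimensional input.

```python
import numpy as np

def argmax_by(values, func):
    """Index of the element whose func(value) is largest (hypothetical helper)."""
    transformed = func(np.asarray(values))  # func must accept an array
    return int(np.argmax(transformed))

a = [1, -3, -10, 3]
print(argmax_by(a, np.abs))

# For multidimensional input, argmax returns an index into the flattened array;
# np.unravel_index converts it back to per-axis coordinates.
b = np.array([[1, -3], [-10, 3]])
flat = np.argmax(np.abs(b))
coords = np.unravel_index(flat, b.shape)
print(flat, coords)
```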
I am trying to replace a specific row of NaN's in a 3-D array (filled with NaN's) with rows of known integer values from a specific column in a text file (ex: 24 rows of column 8). Is there a method to perform this replacement that I have missed in my search for help?
My most recent trial code (of many) is as follows:
import numpy as np
tfile = "C:\...\Lee_Gilmer_MEM_GA_01_02_2015.txt"
data = np.genfromtxt(tfile, dtype=None)
#creation of empty 24 hour global matrix
s_array = np.empty((24,361,720))
s_array[:] = np.NAN
#Get values from column 8
c_data = data[:,7]
#Replace all 24 NaN's slices of row 1 column 1 with corresponding 24 row values from column 8
s_array[:,0:1,0:1] = c_data
print s_array
This produces a result of:
ValueError: could not broadcast input array from shape (24) into shape (24,1,1)
When I print out the shape of c_data, I get:
(24L,)
Is this at all possible to do without having to use a loop and replacing each one individually?
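The error message tells you pretty much everything you need to know: the array slice on the left-hand side of the assignment has a shape of (24, 1, 1), whereas the right-hand side has shape (24,). Broadcasting aligns shapes from the trailing axis, so the 24 values would have to fit along the last axis, which has length 1. Since (24,) cannot be broadcast to (24, 1, 1), numpy raises a ValueError.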
There are two ways to solve this:
1. Make the shape of the LHS (24,) rather than (24, 1, 1). A nice way to do this would be to index with an integer rather than a slice for the last two dimensions:
s_array[:, 0, 0] = c_data
2. Reshape c_data to match the shape of the LHS:
s_array[:, 0:1, 0:1] = c_data.reshape(24, 1, 1)
I think option 1 is a lot more readable.
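Both options can be compared on a small example. This is a sketch under assumptions: c_data is a made-up stand-in for the 24 values read from the file, and the grid is shrunk to (24, 3, 4) instead of (24, 361, 720) to keep it light.

```python
import numpy as np

c_data = np.arange(24, dtype=float)  # stand-in for the 24 values read from the file

# Small grids instead of (24, 361, 720), filled with NaN.
a = np.full((24, 3, 4), np.nan)
b = np.full((24, 3, 4), np.nan)

a[:, 0, 0] = c_data                          # option 1: integer indices, LHS shape (24,)
b[:, 0:1, 0:1] = c_data.reshape(24, 1, 1)    # option 2: RHS reshaped to (24, 1, 1)

print(np.allclose(a, b, equal_nan=True))

# The original assignment fails because (24,) cannot broadcast to (24, 1, 1).
raised = False
try:
    b[:, 0:1, 0:1] = c_data
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```

Only the [:, 0, 0] entries are overwritten; every other element stays NaN.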