I have a 4-dimensional array that I want to turn into a 1-dimensional array. I use numpy ravel, and it works fine with the default parameters.
However I would also like the positions/indices in the 4-dim array.
I want something like this as row in my output.
x,y,z,w,value
With x being the first dimension of my initial array and so on.
The obvious approach is iteration, however I was told to avoid it when I can.
for i in range(test.shape[0]):
    for j in range(test.shape[1]):
        for k in range(test.shape[2]):
            for l in range(test.shape[3]):
                print(i, j, k, l, test[(i, j, k, l)])
It will be too slow when I use a larger dataset.
Is there a way to configure ravel to do this, or any other approach that is faster than iteration?
Use np.indices with sparse=False, combined with np.concatenate to build the array. np.indices provides the first n columns, and np.concatenate appends the last one:
test = np.random.randint(10, size=(3, 5, 4, 2))
index = np.indices(test.shape, sparse=False) # shape: (4, 3, 5, 4, 2)
data = np.concatenate((index, test[None, ...]), axis=0).reshape(test.ndim + 1, -1).T
A more detailed breakdown:
index is a (4, *test.shape) array, with one element per dimension.
To make test concatenatable with index, you need to prepend a unit dimension, which is what test[None, ...] does. None is synonymous with np.newaxis, and Ellipsis, or ..., means "all the remaining dimensions".
When you concatenate along axis=0, you are appending test to the array of indices. Each element of index along the first axis is now a 5-element array containing the index followed by the value. The remaining axes reflect the shape of test, but besides that, you have what you want.
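The goal is to flatten out the trailing dimensions, so you get a (5, N) array, where N = np.prod(test.shape). That's what the final reshape does. test.ndim + 1 is the size of the index plus 1 for the value. -1 can appear at most once in a reshape. It means "whatever size makes the element count match", which here is the product of all the remaining dimensions. The trailing .T then transposes the result to (N, 5), so each row reads x, y, z, w, value.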
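As a rough check of the approach above, the sketch below compares the vectorized result with rows built position by position. The seeded generator and the names rng and expected are assumptions added for illustration; they are not part of the original answer.

```python
import numpy as np

# Assumption: a seeded generator, only so the example is reproducible
rng = np.random.default_rng(0)
test = rng.integers(10, size=(3, 5, 4, 2))

# One index grid per axis, stacked on a new leading axis: shape (4, 3, 5, 4, 2)
index = np.indices(test.shape)

# Append the values as a fifth entry, flatten the trailing axes, then transpose
data = np.concatenate((index, test[None, ...]), axis=0).reshape(test.ndim + 1, -1).T

# Reference rows built one position at a time (slow, used only for comparison)
expected = np.array([(*pos, test[pos]) for pos in np.ndindex(test.shape)])

print(data.shape)                      # (120, 5)
print(np.array_equal(data, expected))  # True
```

Each row of data is x, y, z, w, value, in the same order that ravel visits the elements with its default parameters.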
# split into inputs and outputs
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)
Can someone explain the second line of code with reference to specific documentation? I know it's slicing, but I couldn't find any reference for the notation ":-1" anywhere. Please point to the specific part of the documentation.
Thank you
This is slicing, most probably on a numpy array; here it is applied to data of shape (610, 14).
Per the docs:
Indexing on ndarrays
ndarrays can be indexed using the standard Python x[obj] syntax, where x is the array and obj the selection. There are different kinds of indexing available depending on obj: basic indexing, advanced indexing and field access.
1D array
Slicing a 1-dimensional array is much like slicing a list
import numpy as np
np.random.seed(0)
array_1d = np.random.random((5,))
print(len(array_1d.shape))
1
NOTE: The len of the array shape tells you the number of dimensions.
We can use standard python list slicing on the 1D array.
# get the last element
print(array_1d[-1])
0.4236547993389047
# get everything up to but excluding the last element
print(array_1d[:-1])
[0.5488135 0.71518937 0.60276338 0.54488318]
2D array
array_2d = np.random.random((5, 1))
print(len(array_2d.shape))
2
Think of a 2-dimensional array like a data frame. It has rows (the 0th axis) and columns (the 1st axis). numpy grants us the ability to slice these axes independently by separating them with a comma (,).
# the 0th row and all columns
print(array_2d[0, :])
[0.79172504]
# the 1st row and everything after + all columns
print(array_2d[1:, :])
[[0.52889492]
[0.56804456]
[0.92559664]
[0.07103606]]
# the 1st through second to last row + the last column
print(array_2d[1:-1, -1])
[0.52889492 0.56804456 0.92559664]
Your Example
# split into inputs and outputs
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)
Note that data must have at least 2 dimensions, i.e. data.ndim >= 2 (otherwise you'd get an IndexError).
This means data[:, :-1] is keeping all "rows" and slicing up to, but not including, the last "column". Likewise, data[:, -1] is keeping all "rows" and selecting only the last "column".
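To see the split on something small, here is a minimal sketch. The 4x3 array below is a hypothetical stand-in for the real data of shape (610, 14).

```python
import numpy as np

# Hypothetical stand-in for `data`: 4 rows, 3 columns, last column is the target
data = np.arange(12).reshape(4, 3)

# All rows, every column except the last / all rows, only the last column
X, y = data[:, :-1], data[:, -1]

print(X.shape, y.shape)  # (4, 2) (4,)
print(X.tolist())        # [[0, 1], [3, 4], [6, 7], [9, 10]]
print(y.tolist())        # [2, 5, 8, 11]
```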
It's important to know that when you slice an ndarray using a colon (:), you will get an array with the same number of dimensions.
print(len(array_2d[1:, :-1].shape)) # 2
But if you "select" a specific index (i.e. don't use a colon), you may reduce the dimensions.
print(len(array_2d[1, :-1].shape)) # 1, because I selected a single index value on the 0th axis
print(len(array_2d[1, -1].shape)) # 0, because I selected a single index value on both the 0th and 1st axes
You can, however, select a list of indices on either axis (assuming they exist).
print(len(array_2d[[1], [-1]].shape)) # 1
print(len(array_2d[[1, 3], :].shape)) # 2
This slicing notation is explained here https://docs.python.org/3/tutorial/introduction.html#strings
-1 means last element, -2 - second from last, etc. For example, if there are 8 elements in a list, -1 is equivalent to 7 (not 8 because indexing starts from 0)
Keep in mind that plain python nested lists are indexed one level at a time (for example lst[1][5:7]), while numpy arrays also accept a comma-separated syntax ([8:10, 12:14]) that lets you slice several dimensions at once. However, -1 always means the same thing. Here is the numpy documentation for slicing: https://numpy.org/doc/stable/user/basics.indexing.html
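A short sketch of the difference between list indexing and numpy's comma-separated indexing. The names nested and arr are made up for this illustration.

```python
import numpy as np

nested = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
arr = np.array(nested)

# -1 counts from the end in both cases
print(nested[-1])        # [7, 8, 9]
print(arr[-1].tolist())  # [7, 8, 9]

# Lists: a second [] is applied to the result of the first one,
# so this picks the last of the first two rows
print(nested[0:2][-1])   # [4, 5, 6]

# numpy: the comma separates axes, so this picks the last column of the first two rows
print(arr[0:2, -1].tolist())  # [3, 6]
```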
I have a matrix D and sort every row by index (argsort). I'm trying to set the values of some_matrix at indices 1-5 of np.argsort(D) to 1. What I have below does what I need, but is there a way to do this in one line with numpy arrays?
some_matrix = np.zeros((n, n))
for i in range(n):
    some_matrix[i, np.argsort(D)[i, 1:5]] = 1
Firstly, note that you don't need a full sort, only a partition of elements 1-4 (I assume you need elements 1,2,3,4, because that's what your code does). So let's use that:
#assuming you want indices 1,2,3,4 of the sorted array, in any order
indices = np.argpartition(D, (1, 4), axis=1)[:, 1:5]
Now we've got the indices of D at sorted positions 1 through 4 of each row, i.e. the second through fifth smallest elements (this is similar to indices = np.argsort(D, 1)[:, 1:5], but will be faster for large arrays). All we need is to set these elements to 1:
np.put_along_axis(some_matrix, indices, 1, axis=1)
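A quick way to convince yourself the two versions agree is to run both on random data. This is only a sketch: n = 8, the seeded generator, and the names looped and vectorized are assumptions for the example.

```python
import numpy as np

# Assumption: small random matrix; continuous values, so ties are not expected
rng = np.random.default_rng(0)
n = 8
D = rng.random((n, n))

# Loop version from the question
looped = np.zeros((n, n))
for i in range(n):
    looped[i, np.argsort(D)[i, 1:5]] = 1

# Vectorized version: partition so sorted positions 1..4 hold the right elements
vectorized = np.zeros((n, n))
indices = np.argpartition(D, (1, 4), axis=1)[:, 1:5]
np.put_along_axis(vectorized, indices, 1, axis=1)

print(np.array_equal(looped, vectorized))  # True
print(vectorized.sum(axis=1).tolist())     # four ones per row
```

The order of the four indices within a row may differ between argsort and argpartition, but that does not matter here because every selected position gets the same value.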
I have a 2D numpy array of float values, a, whose shape is (10^6, 3).
I want to know which rows are greater than np.array([25.0, 25.0, 25.0]) in every column, and then output the rows that satisfy this condition.
My code appears as follows.
# Create an empty array
a_cut = np.empty(shape=(0, 3), dtype=float)
minimum = np.array([25.0, 25.0, 25.0])
for i in range(len(a)):
    if a[i,:].all() > minimum.all():
        a_cut = np.append(a_cut, a[i,:], axis=0)
However, the code is inefficient. After a few hours, it still has not produced a result.
Is there a way to improve the speed of this loop?
np.append re-allocates the entire array every time you call it. It is basically the same as np.concatenate: use it very sparingly. The goal is to perform the entire operation in bulk.
You can construct a mask:
mask = (a > minimum).all(axis=1)
Then select:
a_cut = a[mask, :]
You may get a slight improvement from using indices instead of a boolean mask:
a_cut = a[np.flatnonzero(mask), :]
Indexing with fewer indices than there are dimensions applies the indices to the leading dimensions, so you can do
a_cut = a[mask]
The one liner is therefore:
a_cut = a[(a > minimum).all(1)]
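Here is the mask approach on a tiny array, as a sketch. The four rows are made-up values chosen so that the effect of the strict comparison is visible.

```python
import numpy as np

# Made-up data: only rows 0 and 3 exceed the minimum in every column
a = np.array([[30.0, 26.0, 40.0],
              [30.0, 10.0, 40.0],
              [25.0, 26.0, 27.0],
              [99.0, 98.0, 97.0]])
minimum = np.array([25.0, 25.0, 25.0])

# Compare element-wise (minimum broadcasts across rows), then require all columns
mask = (a > minimum).all(axis=1)
a_cut = a[mask]

print(mask.tolist())   # [True, False, False, True]
print(a_cut.tolist())  # [[30.0, 26.0, 40.0], [99.0, 98.0, 97.0]]
```

Row 2 is dropped because 25.0 > 25.0 is False; use >= if equal values should pass.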
I have a tensor x in PyTorch, let's say of shape (5,3,2,6), and another tensor idx of shape (5,3,2,1) which contains an index into the last dimension for every position in the first tensor. I want to slice the first tensor with the indices of the second tensor. I tried x = x[idx] but I get a weird dimensionality when I really want it to be of shape (5,3,2) or (5,3,2,1).
I'll try to give an easier example:
Let's say
x = torch.Tensor([[10, 20, 30],
                  [8, 4, 43]])
idx = torch.Tensor([[0],
                    [2]])
I want something like
y = x[idx]
such that 'y' outputs [[10],[43]] or something like that.
The indices represent the position of the wanted elements in the last dimension. For the example above, where x.shape = (2,3), the last dimension is the columns, so the indices in 'idx' are column indices. I want this, but for more than 2 dimensions.
From what I understand from the comments, you need idx to be index in the last dimension and each index in idx corresponds to similar index in x (except for the last dimension). In that case (this is the numpy version, you can convert it to torch):
ind = np.indices(idx.shape)
ind[-1] = idx
x[tuple(ind)]
output:
[[10]
[43]]
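For the 4-dimensional shapes from the question, a NumPy sketch might look like the following. The random inputs are assumptions, and np.take_along_axis is offered only as a possible shorter alternative to compare against, not as part of the original answer.

```python
import numpy as np

# Assumption: random stand-ins with the shapes from the question
rng = np.random.default_rng(0)
x = rng.random((5, 3, 2, 6))
idx = rng.integers(0, 6, size=(5, 3, 2, 1))  # integer indices into the last axis

# Approach from the answer: full index grids, with the last one replaced by idx
ind = np.indices(idx.shape)
ind[-1] = idx
picked = x[tuple(ind)]

# Possible alternative: take_along_axis picks along one axis directly
also = np.take_along_axis(x, idx, axis=-1)

print(picked.shape)                  # (5, 3, 2, 1)
print(np.array_equal(picked, also))  # True
```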
You can use range together with squeeze to give idx the proper dimension (this assumes idx holds integer indices, i.e. torch.int64), like
x[range(x.size(0)), idx.squeeze()]
tensor([10., 43.])
# or
x[range(x.size(0)), idx.squeeze()].unsqueeze(1)
tensor([[10.],
[43.]])
Here's the one that works in PyTorch using gather. The idx needs to be in torch.int64 format which the following line will ensure (note the lowercase of 't' in tensor).
idx = torch.tensor([[0],
[2]])
torch.gather(x, 1, idx) # 1 is the axis to index here
tensor([[10.],
[43.]])
For a numpy array I have found that
x = numpy.array([]).reshape(0,4)
is fine and allows me to append (0,4) arrays to x without the array losing its structure (i.e. it doesn't just become a list of numbers). However, when I try
x = numpy.array([]).reshape(2,3)
it throws an error. Why is this?
This output will explain what it means to reshape an array...
np.array([2, 3, 4, 5, 6, 7]).reshape(2, 3)
Output -
array([[2, 3, 4],
[5, 6, 7]])
So reshaping just means rearranging the existing elements of an array into a new shape. Intuitively, reshape(0, 4) means converting the current array into a format with 0 rows and 4 columns. 0 rows means no elements, so it works because your array is empty. Similarly, (2, 3) means 2 rows and 3 columns, which is 6 elements...
reshape is not an 'append' function. It reshapes the array you give it to the dimensions you want.
np.array([]).reshape(0,4) works because you reshape a zero element array to a 0x4(=0 elements) array.
np.array([]).reshape(2,3) doesn't work because you're trying to reshape a zero element array to a 2x3(=6 elements) array.
To create a 2x3 array filled with zeros, use np.zeros((2,3)) instead.
And in case you're wondering, numpy arrays can't be appended to in place: np.append and np.concatenate return a new array each time. A common workaround is to collect the pieces in a list, append what you want, and then convert back to a numpy array. Preferably, you only create a numpy array when you don't mean to append data later.
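The points above can be sketched as follows. The name chunks and its contents are hypothetical; they stand in for whatever (n, 4) blocks you would otherwise append one at a time.

```python
import numpy as np

empty = np.array([])
print(empty.size)                 # 0
print(empty.reshape(0, 4).shape)  # (0, 4), since 0 * 4 == 0 elements

# 2 * 3 == 6 elements cannot come from a 0-element array
try:
    empty.reshape(2, 3)
except ValueError as err:
    print("reshape failed:", err)

# Hypothetical blocks to accumulate: keep them in a list, join once at the end
chunks = [np.ones((2, 4)), np.zeros((3, 4))]
x = np.concatenate([empty.reshape(0, 4), *chunks], axis=0)
print(x.shape)  # (5, 4)
```

Joining once at the end avoids re-allocating the whole array on every append.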