Adding an additional dimension to ndarray - python

I have an ndarray defined in the following way:
dataset = np.ndarray(shape=(len(image_files), image_size, image_size),
dtype=np.float32)
This array represents a collection of images of size image_size * image_size.
So I can say, dataset[0] and get a 2D table corresponding to an image with index 0.
Now I would like to have one additional field for each image in this array. For instance, for image located at index 0, I would like to store number 123, for an image located at index 321 I would like to store number 50000.
What is the simplest way to add this additional data field to the existing ndarray?
What is the appropriate way to access data in the new array after adding this additional dimension?

If you shuffle an index array instead of the dataset itself, you can keep track of the original 'identifiers':
idx = np.arange(len(image_files))
np.random.shuffle(idx)
shuffle_set = dataset[idx]
illustration:
In [20]: x = np.arange(12).reshape(6,2)
...: idx = np.arange(6)
...: np.random.shuffle(idx)
In [21]: x
Out[21]:
array([[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7],
[ 8, 9],
[10, 11]])
In [22]: x[idx] # shuffled
Out[22]:
array([[ 4, 5],
[ 0, 1],
[ 2, 3],
[ 6, 7],
[10, 11],
[ 8, 9]])
In [23]: idx1=np.argsort(idx)
In [24]: idx
Out[24]: array([2, 0, 1, 3, 5, 4])
In [25]: idx1
Out[25]: array([1, 2, 0, 3, 5, 4])
In [26]: Out[22][idx1] # recover original order
Out[26]:
array([[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7],
[ 8, 9],
[10, 11]])

NumPy arrays are fundamentally tensors, i.e., every axis has a single fixed length, so the shape cannot vary from one element to the next. Take for example,
import numpy as np
x = np.array([[[1,2],[3,4]],
[[5,6],[7,8]]
])
print(x.shape) #Here we have two, 2x2s. Shape = (2,2,2)
If I want to associate x[0] to the number 5 and x[1] to the number 7, then that would be something like (if it was possible):
x = np.array([[[1,2],[3,4]],5,
[[5,6],[7,8]],7
])
But such a thing is impossible, since it would "in some sense" have a shape that corresponds to (2,((2,2),1)), or something else that is ambiguous. Such an object is not a numpy array or a tensor, because it doesn't have fixed axis sizes, and all numpy arrays must have fixed axis sizes. Hence, if you wish to store the new information, the only way to do it is to create another array.
x = np.array([[[1,2],[3,4]],
[[5,6],[7,8]],
])
y = np.array([5,7])
Now x[0] corresponds to y[0] and x[1] corresponds to y[1]. x has shape (2,2,2) and y has shape (2,).
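As a minimal sketch of how the parallel-array approach combines with the index shuffle from the other answer: the sizes and the values in `labels` below are hypothetical stand-ins for the question's data, chosen only for illustration.

```python
import numpy as np

# Hypothetical stand-in for the question's dataset: 4 images of size 3x3.
rng = np.random.default_rng(0)
dataset = rng.random((4, 3, 3), dtype=np.float32)
labels = np.array([123, 7, 50000, 42])  # one extra number per image

# Shuffle both arrays with the same permutation so the pairs stay aligned.
idx = rng.permutation(len(dataset))
shuffled_images = dataset[idx]
shuffled_labels = labels[idx]

# Position k of the shuffled set came from position idx[k] of the original.
k = 0
print(shuffled_labels[k] == labels[idx[k]])
```

Because both arrays are indexed with the same `idx`, image `k` and number `k` always refer to the same original item.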

Related

Numpy: Indexing 3D matrix using 1D array

arr = np.arange(12).reshape((3, 2, 2))
indices = np.array([0, 1, 1])
expected_outcome = np.array([[0, 1], [6, 7], [10, 11]])
I'm trying to index this array of shape (3,2,2) with an array of shape (3) containing the y-index of the value I want to get. I tried to make it work with a for ... in loop, but is there an elegant way to do it with numpy?
So you want arr[0,0,:], arr[1,1,:], arr[2,1,:]?
How about
In [179]: arr[[0,1,2], [0,1,1]]
Out[179]:
array([[ 0, 1],
[ 6, 7],
[10, 11]])
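The same lookup can be written without hard-coding the first index list. This is a small sketch using the question's `arr` and `indices`; the names `rows` and `result` are introduced here for illustration.

```python
import numpy as np

arr = np.arange(12).reshape((3, 2, 2))
indices = np.array([0, 1, 1])

# Pair each position along axis 0 with its chosen index along axis 1.
rows = np.arange(arr.shape[0])
result = arr[rows, indices]
print(result)
```

The two index arrays are matched element by element, so `result[k]` is `arr[k, indices[k]]`.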

How to slice 2D Torch tensor individually per row?

I have a 2D tensor in Pytorch that I would like to slice:
x = torch.rand((3, 5))
In this example, the tensor has 3 rows and I want to slice x, creating a new tensor y that also has 3 rows and num_cols columns.
What's challenging for me is that I want to slice different columns per row. All I have is x, num_cols, and idx, which is a tensor holding the start index from where to slice.
Example:
What I have is num_cols=2, idx=[1,2,3] and
x=torch.arange(15).reshape((3,-1)) =
tensor([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
What I want is
y=
tensor([[ 1, 2],
[ 7, 8],
[13, 14]])
What's the "torch"-way of doing this? I know, I can slice if I get a boolean mask somehow, but I don't know how to construct that with idx and num_cols without normal Python loops.
You could use fancy indexing together with broadcasting. Another solution might be to use torch.gather which is similar to numpy's take_along_axis. Your idx array would need to be extended with the extra column:
x = torch.arange(15).reshape(3,-1)
idx = torch.tensor([1,2,3])
idx = torch.column_stack([idx, idx+1])
torch.gather(x, 1, idx)
output:
tensor([[ 1, 2],
[ 7, 8],
[13, 14]])
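For comparison, here is a sketch of the fancy-indexing-plus-broadcasting idea, written in NumPy rather than PyTorch (the answer notes that `torch.gather` is similar to `take_along_axis`). The variable names `start`, `cols`, `y_fancy` and `y_take` are chosen for this example.

```python
import numpy as np

x = np.arange(15).reshape(3, -1)
start = np.array([1, 2, 3])   # first column to take in each row
num_cols = 2

# Broadcasting: (3, 1) + (2,) gives a (3, 2) matrix of column indices.
cols = start[:, None] + np.arange(num_cols)

# Fancy indexing: pair row i with the columns listed in cols[i].
y_fancy = x[np.arange(x.shape[0])[:, None], cols]

# The same selection with take_along_axis.
y_take = np.take_along_axis(x, cols, axis=1)
print(y_fancy)
```

Building `cols` by broadcasting avoids stacking `idx`, `idx+1`, ... by hand when `num_cols` is larger than 2.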

How to add a dimension to a numpy array in Python

I have an array of size (214, 144) and I need it to be (214,144,1). Is there an easy way to do this in Python? The dimensions are supposed to be (Days, Times, Stations); since I only have one station's data, that dimension would be 1. It would also be great if the code were flexible enough to work for, say, 2 stations (e.g. changing the dimension size from (428,288) to (214,144,2)).
You could use reshape:
>>> a = numpy.array([[1,2,3,4,5,6],[7,8,9,10,11,12]])
>>> a.shape
(2, 6)
>>> a.reshape((2, 6, 1))
array([[[ 1],
[ 2],
[ 3],
[ 4],
[ 5],
[ 6]],
[[ 7],
[ 8],
[ 9],
[10],
[11],
[12]]])
>>> _.shape
(2, 6, 1)
Besides changing the shape from (x, y) to (x, y, 1), you could use (x, y/n, n) as well, but you may want to specify the element order (the order argument) depending on the input:
>>> a.reshape((2, 3, 2))
array([[[ 1, 2],
[ 3, 4],
[ 5, 6]],
[[ 7, 8],
[ 9, 10],
[11, 12]]])
>>> a.reshape((2, 3, 2), order='F')
array([[[ 1, 4],
[ 2, 5],
[ 3, 6]],
[[ 7, 10],
[ 8, 11],
[ 9, 12]]])
1) To add a dimension to an array a of arbitrary dimensionality:
b = numpy.reshape (a, list (numpy.shape (a)) + [1])
Explanation:
You get the shape of a, turn it into a list, concatenate 1 to that list, and use that list as the new shape in a reshape operation.
2) To specify subdivisions of the dimensions, and have the size of the last dimension calculated automatically, use -1 for the size of the last dimension. e.g.:
b = numpy.reshape(a, [numpy.size(a,0)//2, numpy.size(a,1)//2, -1])
The shape of b in this case will be [214,144,4].
(obviously you could combine the two approaches if necessary):
b = numpy.reshape (a, numpy.append (numpy.array (numpy.shape (a))//2, -1))
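Two other common ways to add a trailing axis of length 1 are indexing with `np.newaxis` and calling `np.expand_dims`. The sketch below also shows one possible way to handle two stations, under the assumption (not stated in the question) that each station's data arrives as its own (214, 144) array.

```python
import numpy as np

a = np.zeros((214, 144))          # one station: (Days, Times)

b = a[..., np.newaxis]            # np.newaxis is an alias for None
c = np.expand_dims(a, axis=-1)    # same result via a function call
print(b.shape, c.shape)

# Assumption: a second station's data is a separate (214, 144) array.
station_2 = np.ones((214, 144))
both = np.stack([a, station_2], axis=-1)
print(both.shape)
```

Stacking along a new last axis keeps each station's (Days, Times) grid intact, whereas a plain reshape of a larger 2D array depends on how the values happen to be laid out.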

Numpy array slicing using colons

I am trying to learn numpy array slicing.
But this is a syntax I cannot seem to understand.
What does
a[:,1] do?
I ran it in Python.
a = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16])
a = a.reshape(2,2,2,2)
a[:,1]
Output:
array([[[ 5, 6],
[ 7, 8]],
[[13, 14],
[15, 16]]])
Can someone explain to me the slicing and how it works. The documentation doesn't seem to answer this question.
Another question would be would there be a way to generate the a array using something like
np.array(1:16) or something like in python where
x = [x for x in range(16)]
The commas in slicing are to separate the various dimensions you may have. In your first example you are reshaping the data to have 4 dimensions each of length 2. This may be a little difficult to visualize so if you start with a 2D structure it might make more sense:
>>> a = np.arange(16).reshape((4, 4))
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
>>> a[0] # access the first "row" of data
array([0, 1, 2, 3])
>>> a[0, 2] # access the 3rd column (index 2) in the first row of the data
2
If you want to access multiple values using slicing you can use the colon to express a range:
>>> a[:, 1] # get the entire 2nd (index 1) column
array([ 1, 5, 9, 13])
>>> a[1:3, -1] # get the second and third elements from the last column
array([ 7, 11])
>>> a[1:3, 1:3] # get the data in the second and third rows and columns
array([[ 5, 6],
[ 9, 10]])
You can do steps too:
>>> a[::2, ::2] # get every other element (column-wise and row-wise)
array([[ 0, 2],
[ 8, 10]])
Hope that helps. Once that makes more sense you can look into things like adding dimensions by using None or np.newaxis, or using the ... ellipsis:
>>> a[:, None].shape
(4, 1, 4)
You can find more here: http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
It might pay to explore the shape and individual entries as we go along.
Let's start with
>>> a = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16])
>>> a.shape
(16,)
This is a one-dimensional array of length 16.
Now let's try
>>> a = a.reshape(2,2,2,2)
>>> a.shape
(2, 2, 2, 2)
It's a multi-dimensional array with 4 dimensions.
Let's see the 0, 1 element:
>>> a[0, 1]
array([[5, 6],
[7, 8]])
Since there are two dimensions left, it's a matrix of two dimensions.
Now a[:, 1] says: take a[i, 1] for all possible values of i:
>>> a[:, 1]
array([[[ 5, 6],
[ 7, 8]],
[[13, 14],
[15, 16]]])
It gives you an array where the first item is a[0, 1], and the second item is a[1, 1].
To answer the second part of your question (generating arrays of sequential values) you can use np.arange(start, stop, step) or np.linspace(start, stop, num_elements). Both of these return a numpy array with the corresponding range of values.
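A short sketch tying both parts together: it builds the question's array with `np.arange` and contrasts the slice `a[:1]` with the comma form `a[:, 1]`. The names `first` and `second` are introduced here for illustration.

```python
import numpy as np

# Values 1..16; the stop value of arange is exclusive.
a = np.arange(1, 17).reshape(2, 2, 2, 2)

first = a[:1]     # slice on axis 0 only: all four dimensions are kept
second = a[:, 1]  # all of axis 0, index 1 on axis 1: that axis is dropped
print(first.shape)
print(second.shape)
```

A slice keeps its axis (possibly with a shorter length), while an integer index removes the axis, which is why the two results have different numbers of dimensions.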

Numpy 3D array transposed when indexed in single step vs two steps

import numpy as np
x = np.random.randn(2, 3, 4)
mask = np.array([1, 0, 1, 0], dtype=bool)
y = x[0, :, mask]
z = x[0, :, :][:, mask]
print(y)
print(z)
print(y.T)
Why does doing the above operation in two steps result in the transpose of doing it in one step?
Here's the same behavior with a list index:
In [87]: x=np.arange(2*3*4).reshape(2,3,4)
In [88]: x[0,:,[0,2]]
Out[88]:
array([[ 0, 4, 8],
[ 2, 6, 10]])
In [89]: x[0,:,:][:,[0,2]]
Out[89]:
array([[ 0, 2],
[ 4, 6],
[ 8, 10]])
In the 2nd case, x[0,:,:] returns a (3,4) array, and the next index picks 2 columns.
In the 1st case, it first selects on the first and last dimensions, and appends the slice (the middle dimension). The 0 and [0,2] produce a 2 dimension, and the 3 from the middle is appended, giving (2,3) shape.
This is a case of mixed basic and advanced indexing.
http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#combining-advanced-and-basic-indexing
In the first case, the dimensions resulting from the advanced indexing operation come first in the result array, and the subspace dimensions after that.
This is not an easy case to comprehend or explain. Basically there's some ambiguity as to what the final dimension should be. The documentation tries to illustrate this with an example x[:,ind_1,:,ind_2] where ind_1 and ind_2 are 3d (or together broadcast to that).
Earlier attempts to explain this are:
How does numpy order array slice indices?
Combining slicing and broadcasted indexing for multi-dimensional numpy arrays
===========================
A way around this problem is to replace the slice with an array - a column vector
In [221]: x[0,np.array([0,1,2])[:,None],[0,2]]
Out[221]:
array([[ 0, 2],
[ 4, 6],
[ 8, 10]])
In [222]: np.ix_([0],[0,1,2],[0,2])
Out[222]:
(array([[[0]]]), array([[[0],
[1],
[2]]]), array([[[0, 2]]]))
In [223]: x[np.ix_([0],[0,1,2],[0,2])]
Out[223]:
array([[[ 0, 2],
[ 4, 6],
[ 8, 10]]])
Though this last case is 3d, (1,3,2). ix_ didn't like the scalar 0. An alternate way of using ix_:
In [224]: i,j=np.ix_([0,1,2],[0,2])
In [225]: x[0,i,j]
Out[225]:
array([[ 0, 2],
[ 4, 6],
[ 8, 10]])
And here's a way of getting the same numbers, but in a (2,1,3) array:
In [232]: i,j=np.ix_([0,2],[0])
In [233]: x[j,:,i]
Out[233]:
array([[[ 0, 4, 8]],
[[ 2, 6, 10]]])
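The shapes make the difference between the one-step and two-step forms easy to check. This sketch reuses the question's indexing expressions on a deterministic array so the results can be compared directly.

```python
import numpy as np

x = np.arange(2 * 3 * 4).reshape(2, 3, 4)
mask = np.array([True, False, True, False])

y = x[0, :, mask]        # scalar and mask are separated by a slice
z = x[0, :, :][:, mask]  # two steps: basic indexing first, then the mask

print(y.shape)
print(z.shape)
print(np.array_equal(y.T, z))
```

In the one-step form the dimension produced by the advanced indices is placed first and the sliced dimension follows, so `y` is the transpose of `z`.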
