Generically creating object with n preceding singleton dimensions - python

Say I have matrix X with X.ndim = n. I now want to create a new matrix that has n "singleton" dimensions.
For example, when n=2, and I create a new range, I want to create it like
>>> bar = np.arange(0, 5)[np.newaxis, np.newaxis, ...]
>>> bar.shape
(1, 1, 5)
such that it has 2 singleton dimensions. Say n = 5. How would I generically generate my bar such that it has shape (1,1,1,1,1,5)?

One way could be to create the new array specifying the ndmin parameter:
>>> np.array(np.arange(5), ndmin=6).shape
(1, 1, 1, 1, 1, 5)
NumPy adds the new dimensions on the left.
Alternatively you could use reshape and pass in a tuple specifying the required shape:
>>> np.arange(5).reshape((1,)*5 + (5,)).shape
(1, 1, 1, 1, 1, 5)

>>> n=5
>>> bar = np.arange(0, 5)[(np.newaxis, )*n]
>>> bar.shape
(1, 1, 1, 1, 1, 5)
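The reshape idea above can be wrapped in a small helper so that n is taken from another array's ndim. This is only a sketch; the name prepend_singletons is made up for illustration and is not a NumPy function.

```python
import numpy as np

def prepend_singletons(values, n):
    """Return a view of `values` with n length-1 axes added in front."""
    values = np.asarray(values)
    # (1,) * n builds the leading singleton dimensions; the original shape follows.
    return values.reshape((1,) * n + values.shape)

X = np.zeros((3, 4))                      # X.ndim == 2
bar = prepend_singletons(np.arange(5), X.ndim)
print(bar.shape)                          # (1, 1, 5)
print(prepend_singletons(np.arange(5), 5).shape)  # (1, 1, 1, 1, 1, 5)
```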

Related

How to split multi-dimensional arrays based on the unique indices of another array?

I have two torch tensors a and b:
import torch
torch.manual_seed(0) # for reproducibility
a = torch.rand(size = (5, 10, 1))
b = torch.tensor([3, 3, 1, 5, 3, 1, 0, 2, 1, 2])
I want to split the 2nd dimension of a (dim = 1, since dimensions are zero-indexed) based on the unique values in b.
What I have tried so far:
# find the unique values and unique indices of b
unique_values, unique_indices = torch.unique(b, return_inverse = True)
# split a in where dim = 1, based on unique indices
l = torch.tensor_split(a, unique_indices, dim = 1)
I was expecting l to be a list of n tensors, where n is the number of unique values in b. I was also expecting the tensors to have the shape (5, number of elements corresponding to unique_values, 1).
However, I get the following:
print(l)
(tensor([[[0.8198],
[0.9971],
[0.6984]],
[[0.7262],
[0.7011],
[0.2038]],
[[0.1147],
[0.3168],
[0.6965]],
[[0.0340],
[0.9442],
[0.8802]],
[[0.6833],
[0.7529],
[0.8579]]]), tensor([], size=(5, 0, 1)), tensor([], size=(5, 0, 1)), tensor([[[0.9971],
[0.6984],
[0.5675]],
[[0.7011],
[0.2038],
[0.6511]],
[[0.3168],
[0.6965],
[0.9143]],
[[0.9442],
[0.8802],
[0.0012]],
[[0.7529],
[0.8579],
[0.6870]]]), tensor([], size=(5, 0, 1)), tensor([], size=(5, 0, 1)), tensor([], size=(5, 0, 1)), tensor([[[0.8198],
[0.9971]],
[[0.7262],
[0.7011]],
[[0.1147],
[0.3168]],
[[0.0340],
[0.9442]],
[[0.6833],
[0.7529]]]), tensor([], size=(5, 0, 1)), tensor([[[0.9971]],
[[0.7011]],
[[0.3168]],
[[0.9442]],
[[0.7529]]]), tensor([[[0.6984],
[0.5675],
[0.8352],
[0.2056],
[0.5932],
[0.1123],
[0.1535],
[0.2417]],
[[0.2038],
[0.6511],
[0.7745],
[0.4369],
[0.5191],
[0.6159],
[0.8102],
[0.9801]],
[[0.6965],
[0.9143],
[0.9351],
[0.9412],
[0.5995],
[0.0652],
[0.5460],
[0.1872]],
[[0.8802],
[0.0012],
[0.5936],
[0.4158],
[0.4177],
[0.2711],
[0.6923],
[0.2038]],
[[0.8579],
[0.6870],
[0.0051],
[0.1757],
[0.7497],
[0.6047],
[0.1100],
[0.2121]]]))
Why do I get empty tensors like tensor([], size=(5, 0, 1)) and how would I achieve what I want to achieve?
From your description of the desired result:
I was also expecting the tensors to have the shape (5, number of elements corresponding to unique_values, 1).
I believe you are looking for the count (or frequency) of unique values. If you want to keep using torch.unique, then you can provide the return_counts argument combined with a call to torch.cumsum.
Something like this should work:
>>> _, counts = torch.unique(b, return_counts=True)
>>> indices = torch.cumsum(counts, dim=0)
>>> splits = torch.tensor_split(a, indices[:-1], dim=1)
Let's have a look:
>>> for x in splits:
...     print(x.shape)
torch.Size([5, 1, 1])
torch.Size([5, 3, 1])
torch.Size([5, 2, 1])
torch.Size([5, 3, 1])
torch.Size([5, 1, 1])
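As for the empty tensors: torch.tensor_split treats a tensor argument as a list of positions at which to cut, so passing the inverse indices [3, 3, 1, ...] asks for slices such as a[:, 3:3] and a[:, 3:1], which are empty. Note also that cutting at the cumulative counts gives chunks of the right sizes, but the chunks are contiguous columns of a in their original order. If the goal is to collect the columns that belong to each unique value of b, one possible sketch (an assumption about the intent, not the only way) is a boolean mask per value:

```python
import torch

torch.manual_seed(0)  # for reproducibility
a = torch.rand(size=(5, 10, 1))
b = torch.tensor([3, 3, 1, 5, 3, 1, 0, 2, 1, 2])

unique_values = torch.unique(b)  # sorted: 0, 1, 2, 3, 5

# For each unique value, keep the columns of `a` (dim 1) where `b` matches it.
groups = [a[:, b == v] for v in unique_values]

for v, g in zip(unique_values.tolist(), groups):
    print(v, tuple(g.shape))
```

Each group then has shape (5, k, 1), where k is the number of times the value occurs in b.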
Are you looking for the index_select method?
You have correctly obtained your unique values in unique_values.
Now what you need to do is:
l = a.index_select(1, unique_values)

How to select indices according to another tensor in pytorch

The task seems to be simple, but I cannot figure out how to do it.
So what I have are two tensors:
an indices tensor indices with shape (2, 5, 2), where the last dimension holds the indices in the x and y dimensions
a "value tensor" value with shape (2, 5, 2, 16, 16), where I want the last two dimensions to be selected with the x and y indices
To be more concrete, the indices are between 0 and 15 and I want to get an output:
out = value[:, :, :, x_indices, y_indices]
The shape of the output should therefore be (2, 5, 2). Can anybody help me here? Thanks a lot!
Edit:
I tried the suggestion with gather, but unfortunately it does not seem to work (I changed the dimensions, but it doesn't matter):
First I generate a coordinate grid:
y_t = torch.linspace(-1., 1., 16, device='cpu').reshape(16, 1).repeat(1, 16).unsqueeze(-1)
x_t = torch.linspace(-1., 1., 16, device='cpu').reshape(1, 16).repeat(16, 1).unsqueeze(-1)
grid = torch.cat((y_t, x_t), dim=-1).permute(2, 0, 1).unsqueeze(0)
grid = grid.unsqueeze(1).repeat(1, 3, 1, 1, 1)
In the next step, I am creating some indices. In this case, I always take index 1:
indices = torch.ones([1, 3, 2], dtype=torch.int64)
Next, I am using your method:
indices = indices.unsqueeze(-1).unsqueeze(-1)
new_coords = torch.gather(grid, -1, indices).squeeze(-1).squeeze(-1)
Finally, I manually select index 1 for x and y coordinate:
new_coords_manual = grid[:, :, :, 1, 1]
This outputs the following new coordinates:
new_coords
tensor([[[-1.0000, -0.8667],
[-1.0000, -0.8667],
[-1.0000, -0.8667]]])
new_coords_manual
tensor([[[-0.8667, -0.8667],
[-0.8667, -0.8667],
[-0.8667, -0.8667]]])
As you can see, it only works for one dimension. Do you have an idea how to fix that?
What you could do is flatten the first three axes together and apply torch.gather:
>>> grid.flatten(start_dim=0, end_dim=2).shape
torch.Size([6, 16, 16])
>>> torch.gather(grid.flatten(0, 2), dim=1, index=indices)
tensor([[[-0.8667, -0.8667],
[-0.8667, -0.8667],
[-0.8667, -0.8667]]])
As explained on the documentation page, this will perform:
out[i][j][k] = input[i][index[i][j][k]][k] # if dim == 1
I figured it out, thanks again @Ivan for your help! :)
The problem was that I unsqueezed on the last dimensions, whereas I should have unsqueezed in the middle dimensions so that the indices are at the end:
y_t = torch.linspace(-1., 1., 16, device='cpu').reshape(16, 1).repeat(1, 16).unsqueeze(-1)
x_t = torch.linspace(-1., 1., 16, device='cpu').reshape(1, 16).repeat(16, 1).unsqueeze(-1)
grid = torch.cat((y_t, x_t), dim=-1).permute(2, 0, 1).unsqueeze(0)
grid = grid.unsqueeze(1).repeat(2, 3, 1, 1, 1)
indices = torch.ones([2, 3, 2], dtype=torch.int64).unsqueeze(-2).unsqueeze(-2)
new_coords = torch.gather(grid, 3, indices).squeeze(-2).squeeze(-2)
new_coords_manual = grid[:, :, :, 1, 1]
Now new_coords equals new_coords_manual.
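For the shapes in the original question, integer-array ("advanced") indexing is another option that avoids gather altogether. The sketch below assumes one (x, y) pair per position in the first two dimensions, applied to every entry of the third dimension; the random value and indices tensors are placeholders, not data from the question.

```python
import torch

torch.manual_seed(0)
value = torch.rand(2, 5, 2, 16, 16)
indices = torch.randint(0, 16, (2, 5, 2))  # last dim holds (x, y)

x = indices[..., 0]  # shape (2, 5)
y = indices[..., 1]  # shape (2, 5)

# Index tensors for the three leading axes, shaped to broadcast to (2, 5, 2).
b = torch.arange(2).view(2, 1, 1)
n = torch.arange(5).view(1, 5, 1)
c = torch.arange(2).view(1, 1, 2)

# All five index tensors broadcast together, so out[i, j, k] is
# value[i, j, k, x[i, j], y[i, j]].
out = value[b, n, c, x.unsqueeze(-1), y.unsqueeze(-1)]
print(out.shape)  # torch.Size([2, 5, 2])
```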

Change shape of nparray

import numpy as np

image1 = np.zeros((120, 120))
image2 = np.zeros((120, 120))
image3 = np.zeros((120, 120))

pack1 = np.array([image1, image2, image3])
pack2 = np.array([image1, image2, image3])

result = np.array([pack1, pack2])
print(result.shape)
The result is:
(2, 3, 120, 120)
Question: how can I make an array with shape (2, 120, 120, 3) from the same data without mixing it up?
Use np.rollaxis to move (OK, roll) a single axis to a specified position:
>>> a.shape
(2, 3, 11, 11)
>>> np.rollaxis(a, 0, 4).shape
(3, 11, 11, 2)
Here the syntax is "roll the zeroth axis so that it becomes the 4th in the new array".
Notice that rollaxis creates a view and does not copy:
>>> np.rollaxis(a, 0, 4).base is a
True
An alternative (and often more readable) way would be to use the fact that np.transpose accepts a tuple of where to place the axes. Observe:
>>> np.transpose(a, (1, 2, 3, 0)).shape
(3, 11, 11, 2)
>>> np.transpose(a, (1, 2, 3, 0)).base is a
True
Here the syntax is "permute the axes so that what was the zeroth axis in the original array becomes the 4th axis in the new array"
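Applied to the shapes in the question, the axis to move is axis 1 (the three images), and it has to end up last. A minimal sketch with random data standing in for the images, using np.moveaxis and the equivalent np.transpose call:

```python
import numpy as np

rng = np.random.default_rng(0)
result = rng.random((2, 3, 120, 120))    # (pack, image, row, col)

moved = np.moveaxis(result, 1, -1)       # move the image axis to the end
also = np.transpose(result, (0, 2, 3, 1))

print(moved.shape)                       # (2, 120, 120, 3)

# Each image is still intact: channel c of pack p is the original image.
print(np.array_equal(moved[1, :, :, 2], result[1, 2]))  # True
```

Both calls return views, so no data is copied.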
You can transpose your packs
pack1 = np.array([image1,image2,image3]).T
pack2 = np.array([image1,image2,image3]).T
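and the result has your desired shape. Note that .T reverses all axes, so each pack gets shape (120, 120, 3) but the rows and columns of every image are swapped as well.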
The (relatively) new stack function gives more control than np.array over how arrays are joined.
Use stack to join them on a new last axis:
In [24]: pack1=np.stack((image1,image2,image3),axis=2)
In [25]: pack1.shape
Out[25]: (120, 120, 3)
In [26]: pack2=np.stack((image1,image2,image3),axis=2)
then join on a new first axis (same as np.array()):
In [27]: result=np.stack((pack1,pack2),axis=0)
In [28]: result.shape
Out[28]: (2, 120, 120, 3)

Dimensionality agnostic (generic) cartesian product [duplicate]

I'm looking to generate the cartesian product of a relatively large number of arrays to span a high-dimensional grid. Because of the high dimensionality, it won't be possible to store the result of the cartesian product computation in memory; rather it will be written to hard disk. Because of this constraint, I need access to the intermediate results as they are generated. What I've been doing so far is this:
for x in range(0, 10):
    for y in range(0, 10):
        for z in range(0, 10):
            writeToHdd(x, y, z)
which, apart from being very nasty, is not scalable (i.e. it would require me writing as many loops as dimensions). I have tried to use the solution proposed here, but that is a recursive solution, which therefore makes it quite hard to obtain the results on the fly as they are being generated. Is there any 'neat' way to do this other than having a hardcoded loop per dimension?
In plain Python, you can generate the Cartesian product of a collection of iterables using itertools.product.
>>> import itertools
>>> arrays = range(0, 2), range(4, 6), range(8, 10)
>>> list(itertools.product(*arrays))
[(0, 4, 8), (0, 4, 9), (0, 5, 8), (0, 5, 9), (1, 4, 8), (1, 4, 9), (1, 5, 8), (1, 5, 9)]
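Because itertools.product is lazy, the tuples can be written out one at a time instead of being collected with list(). The sketch below streams them to a CSV file; the function name write_product and the file name are made up for illustration and stand in for whatever writeToHdd does in the question. Note that product keeps a copy of each input iterable in memory, but not the product itself.

```python
import csv
import itertools
import os
import tempfile

def write_product(iterables, path):
    """Stream the Cartesian product to a CSV file, one row per tuple."""
    count = 0
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        # Each tuple is produced on demand and written immediately.
        for row in itertools.product(*iterables):
            writer.writerow(row)
            count += 1
    return count

axes = [range(0, 10)] * 3  # any number of dimensions works
path = os.path.join(tempfile.gettempdir(), "product_demo.csv")
n_rows = write_product(axes, path)
print(n_rows)  # 1000
```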
In NumPy, you can combine numpy.meshgrid (passing sparse=True to avoid expanding the product in memory) with numpy.broadcast, which iterates over the broadcast result lazily:
>>> arrays = np.arange(0, 2), np.arange(4, 6), np.arange(8, 10)
>>> grid = np.meshgrid(*arrays, sparse=True)
>>> [tuple(int(v) for v in row) for row in np.broadcast(*grid)]
[(0, 4, 8), (0, 4, 9), (1, 4, 8), (1, 4, 9), (0, 5, 8), (0, 5, 9), (1, 5, 8), (1, 5, 9)]
I think I figured out a nice way using a memory mapped file:
import numpy as np

def carthesian_product_mmap(vectors, filename, mode='w+'):
    '''
    Vectors should be a tuple of `numpy.ndarray` vectors. You could
    also make it more flexible, and include some error checking
    '''
    # Make a meshgrid with `copy=False` to create views
    grids = np.meshgrid(*vectors, copy=False, indexing='ij')
    # The shape for concatenating the grids from meshgrid
    shape = grids[0].shape + (len(vectors),)
    # Find the "highest" dtype necessary
    dtype = np.result_type(*vectors)
    # Instantiate the memory mapped file
    M = np.memmap(filename, dtype, mode, shape=shape)
    # Fill the memmap with the grids
    for i, grid in enumerate(grids):
        M[..., i] = grid
    # Make sure the data is written to disk (optional?)
    M.flush()
    # Reshape to put it in the right format for the Cartesian product
    return M.reshape((-1, len(vectors)))
But I wonder if you really need to store the whole Cartesian product (there's a lot of data duplication). Is it not an option to generate the rows of the product at the moment they're needed?
It seems you just want to loop over an arbitrary number of dimensions. My generic solution for this is to keep a list of indices, increment the first one on each step, and carry any overflow into the next dimension.
Example:
n = 3  # number of dimensions
N = 1  # highest index value per dimension
idx = [0]*n
while True:
    print(idx)
    # increase first dimension
    idx[0] += 1
    # handle overflows
    for i in range(0, n-1):
        if idx[i] > N:
            # reset this dimension and increase next higher dimension
            idx[i] = 0
            idx[i+1] += 1
    if idx[-1] > N:
        # overflow in the last dimension, we are finished
        break
Gives:
[0, 0, 0]
[1, 0, 0]
[0, 1, 0]
[1, 1, 0]
[0, 0, 1]
[1, 0, 1]
[0, 1, 1]
[1, 1, 1]
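NumPy has something similar built in: np.ndindex iterates over all index tuples for a given shape, and np.ndenumerate yields (index, value) pairs for an existing array.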
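As a short illustration, np.ndindex produces the same index combinations as the counter above. It varies the last index fastest, so each tuple is reversed here to reproduce the order printed above (first index fastest):

```python
import numpy as np

n = 3      # number of dimensions
size = 2   # number of index values per dimension (N + 1 in the loop above)

for idx in np.ndindex(*(size,) * n):
    # Reverse the tuple so the first dimension changes fastest.
    print(list(idx[::-1]))
```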

Shape of array python

Suppose I create a 2 dimensional array
import numpy as np
m = np.random.normal(0, 1, size=(1000, 2))
q = np.zeros(shape=(1000, 1))
print(m[:, 0] - q)
When I take m[:,0].shape I get (1000,) as opposed to (1000,1) which is what I want. How do I coerce m[:,0] to a (1000,1) array?
By selecting the 0th column in particular, as you've noticed, you reduce the dimensionality:
>>> m = np.random.normal(0, 1, size=(5, 2))
>>> m[:,0].shape
(5,)
You have a lot of options to get a 5x1 object back out. You can index using a list, rather than an integer:
>>> m[:, [0]].shape
(5, 1)
You can ask for "all the columns up to but not including 1":
>>> m[:,:1].shape
(5, 1)
Or you can use None (or np.newaxis), which is a general trick to extend the dimensions:
>>> m[:,0,None].shape
(5, 1)
>>> m[:,0][:,None].shape
(5, 1)
>>> m[:,0, None, None].shape
(5, 1, 1)
Finally, you can reshape:
>>> m[:,0].reshape(5,1).shape
(5, 1)
but I'd use one of the other methods for a case like this.
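The shape matters in the original example because of broadcasting: subtracting a (1000, 1) array from a (1000,) array does not work element by element but produces a full (1000, 1000) result. A quick check, using the same m and q as in the question:

```python
import numpy as np

m = np.random.normal(0, 1, size=(1000, 2))
q = np.zeros(shape=(1000, 1))

# (1000,) against (1000, 1) broadcasts to (1000, 1000).
wrong = m[:, 0] - q
# Keeping the column two-dimensional gives the element-wise difference.
right = m[:, [0]] - q

print(wrong.shape)  # (1000, 1000)
print(right.shape)  # (1000, 1)
```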
