I want to create a three-dimensional table that contains numbers. For two-dimensional data I would just use a CSV file or a pandas.DataFrame, but these did not seem as easy to use for three dimensions, so I decided to use xarray. If you think there are easier solutions, feel free to tell me: I only want to create the array, save it to a file and read it back later. Each column won't have more than 100 elements.
I create the array (and access it) using
import numpy as np
import xarray as xr
da = xr.DataArray(
    np.ones((4, 4, 4), dtype=int),
    [
        ("x-col", ['a','b','c','d']),
        ('y-col', ['A','B','C','D']),
        ('z-col', [1,2,3,4])
    ]
)
da.loc['a','C',2]=4
da
which gives
array([[[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 4, 1, 1],
[1, 1, 1, 1]],
[[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1]],
[[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1]],
[[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1]]])
Coordinates:
x-col (x-col) <U1 'a' 'b' 'c' 'd'
y-col (y-col) <U1 'A' 'B' 'C' 'D'
z-col (z-col) int32 1 2 3 4
Attributes: (0)
as expected. However, if I access the number using da.loc['a','C',2], I still get
array(4)
Coordinates:
x-col () <U1 'a'
y-col () <U1 'C'
z-col () int32 2
Attributes: (0)
Is there a way to get the number 4 without the wrapping?
Also, can you suggest an elegant way to store the DataArray on disk so I can use it later?
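A minimal sketch, assuming a reasonably recent xarray: selecting a single cell still returns a 0-d DataArray that carries its scalar coordinates, and `.item()` unwraps it into a plain Python scalar (`.values` gives a 0-d NumPy array instead). The names `cell`, `value` and `raw` are only illustrative.

```python
import numpy as np
import xarray as xr

# Same 4 x 4 x 4 array as in the question.
da = xr.DataArray(
    np.ones((4, 4, 4), dtype=int),
    [
        ("x-col", ["a", "b", "c", "d"]),
        ("y-col", ["A", "B", "C", "D"]),
        ("z-col", [1, 2, 3, 4]),
    ],
)
da.loc["a", "C", 2] = 4

cell = da.loc["a", "C", 2]  # still a 0-d DataArray with scalar coordinates
value = cell.item()         # plain Python scalar, no wrapping
raw = cell.values           # 0-d NumPy array, coordinates dropped

print(value)  # 4
```

For storage, xarray provides `da.to_netcdf("data.nc")` and `xr.open_dataarray("data.nc")`; both need a netCDF backend such as netCDF4 or scipy to be installed, and the file name here is only an example.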
I have a 1D PyTorch tensor containing integers between 0 and n-1. I need to create a 2D PyTorch tensor with n-1 columns, where each row is the sequence 0 to n-1 excluding the corresponding value in the first tensor. How can I achieve this efficiently?
Ex:
n = 3
a = torch.Tensor([0, 1, 2, 1, 2, 0])
# desired output
b = [
[1, 2],
[0, 2],
[0, 1],
[0, 2],
[0, 1],
[1, 2]
]
Typically, a.numel() >> n.
Detailed Explanation:
The first element of a is 0, hence it has to map to the sequence [0, 1, 2] excluding 0, which is [1, 2].
Similarly, the second element of a is 1, hence it has to map to [0, 2] and so on.
PS: I actually have an additional batch dimension, which I've excluded here for simplicity. Hence, I need the solution to be easily extendable to one additional dimension.
We can construct a tensor containing all the desired sequences and index it with tensor a.
import torch
n = 3
a = torch.Tensor([0, 1, 2, 1, 2, 0]) # using torch.tensor is recommended
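def exclude_gather(a, n):
    # Row k of `sequences` is 0..n-1 with k removed, i.e. an (n, n-1) lookup table.
    sequences = torch.nonzero(torch.arange(n) != torch.arange(n)[:,None], as_tuple=True)[1].reshape(-1, n-1)
    # Pick one row of the table per entry of a (indices must be integers).
    return sequences[a.long()]
exclude_gather(a, n)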
Output
tensor([[1, 2],
[0, 2],
[0, 1],
[0, 2],
[0, 1],
[1, 2]])
We can add a batch dimension with functorch.vmap (in PyTorch 2.0 and later, the same function is also available as torch.func.vmap):
from functorch import vmap
n = 4
b = torch.Tensor([[0, 1, 2, 1, 3, 0],[0, 3, 1, 0, 2, 1]])
vmap(exclude_gather, in_dims=(0, None))(b, n)
Output
tensor([[[1, 2, 3],
[0, 2, 3],
[0, 1, 3],
[0, 2, 3],
[0, 1, 2],
[1, 2, 3]],
[[1, 2, 3],
[0, 1, 2],
[0, 2, 3],
[1, 2, 3],
[0, 1, 3],
[0, 2, 3]]])
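Because advanced indexing already broadcasts over any leading shape of the index tensor, the batch dimension may not need vmap at all. A minimal sketch reusing the lookup-table idea of exclude_gather, with the batch b and n = 4 from the example above:

```python
import torch

def exclude_gather(a, n):
    # (n, n-1) lookup table: row k is 0..n-1 with k removed.
    sequences = torch.nonzero(
        torch.arange(n) != torch.arange(n)[:, None], as_tuple=True
    )[1].reshape(-1, n - 1)
    # Indexing with an integer tensor of any shape returns shape (*a.shape, n-1),
    # so a leading batch dimension is handled automatically.
    return sequences[a.long()]

n = 4
b = torch.tensor([[0, 1, 2, 1, 3, 0], [0, 3, 1, 0, 2, 1]])
out = exclude_gather(b, n)
print(out.shape)  # torch.Size([2, 6, 3])
```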
All you have to do is initialize a multi-dimensional array with all possible indices using torch.arange(). After that, remove the indices you don't want from each tensor using a boolean mask.
import torch
a = torch.Tensor([0, 1, 2, 1, 2, 0])
n = 3
b = [torch.arange(n) for i in range(len(a))]
c = [b[i]!=a[i] for i in range(len(b))]
# use the boolean array as a mask to apply on b
d = [b[i][c[i]] for i in range(len(b))]
print(d) # a list of tensors; torch.stack(d) turns it into a single 2D tensor
This prints [tensor([1, 2]), tensor([0, 2]), tensor([0, 1]), tensor([0, 2]), tensor([0, 1]), tensor([1, 2])], which torch.stack(d) converts into a single 6 x 2 tensor.
This can be extended to multiple dimensions as well.
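The per-row loop can also be vectorized by building the whole mask in one comparison. A minimal sketch, assuming every entry of a lies in 0..n-1 so that each row keeps exactly n-1 values; the function name exclude_mask is hypothetical:

```python
import torch

def exclude_mask(a, n):
    a = a.long()
    # One row 0..n-1 per entry of a; a[..., None] broadcasts, so extra
    # leading (batch) dimensions work the same way.
    candidates = torch.arange(n).expand(*a.shape, n)
    mask = candidates != a[..., None]
    # Boolean indexing flattens the result; every row kept n-1 values,
    # so it can be reshaped back.
    return candidates[mask].reshape(*a.shape, n - 1)

a = torch.Tensor([0, 1, 2, 1, 2, 0])  # float input, as in the question
print(exclude_mask(a, 3))
```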
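The following does the trick, because column i of the result is i, shifted up by one wherever a <= i (which skips over the excluded value):
b = []
for i in range(n-1):
    b.append(i * torch.ones_like(a) + (a <= i))
b = torch.stack(b, dim=1)
Since n << a.numel(), the loop over the n-1 columns should not be very costly.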
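The same rule, column i equals i plus one where a <= i, can be applied to all columns at once by broadcasting, which removes the Python loop entirely. A minimal sketch; the name exclude_shift is hypothetical:

```python
import torch

def exclude_shift(a, n):
    cols = torch.arange(n - 1)            # column indices 0..n-2
    # a[..., None] <= cols compares each entry of a with every column index;
    # adding the result shifts values at or past the excluded one up by 1.
    return cols + (a[..., None] <= cols).long()

a = torch.Tensor([0, 1, 2, 1, 2, 0])
print(exclude_shift(a, 3))
```

Because only a[..., None] is broadcast, the extra batch dimension mentioned in the question needs no changes.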
Say I have two matrices, A and B:
A = np.array([[1, 3, 2],
[2, 2, 3],
[3, 1, 1]])
B = np.array([[0, 1, 0],
[1, 1, 0],
[1, 1, 1]])
I want to take one column in A and multiply it by each column in B element-wise, then proceed to the next column in A. So, using just one column as an example, I will use A[:,0] (values 1,2,3), and multiply it by each column in B to get this:
array([[0, 1, 0],
[2, 2, 0],
[3, 3, 3]])
I've implemented this using np.einsum like so:
np.einsum('i,ij->ij',A[:,0],B)
I then want to generate a 3D matrix with the depth dimension corresponding to the multiplication by each column in A, which I implemented using a for loop:
np.stack([np.einsum('i,ij->ij',A[:,i],B) for i in range(0,A.shape[1])])
This returns my desired array:
array([[[0, 1, 0],
[2, 2, 0],
[3, 3, 3]],
[[0, 3, 0],
[2, 2, 0],
[1, 1, 1]],
[[0, 2, 0],
[3, 3, 0],
[1, 1, 1]]])
How would I go about doing this without the loop? Can this be done purely with np.einsum? Is there another function in NumPy that will do this more simply?
Here's a simple way:
A.T[:,:,None]*B
Indexing with None as the last index adds a new trailing axis, so A.T[:,:,None] has shape (3, 3, 1) and broadcasts against B (shape (3, 3)) in the element-wise multiplication.
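How about this code?
A.T.reshape(3, 3, 1) * B
Reshaping to add a trailing axis of length 1 has the same effect as indexing with None. For a non-square A, use A.T.reshape(A.shape[1], A.shape[0], 1) * B.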
Keeping with your usage of einsum:
np.einsum('ij,ik->jik', A, B)
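All three answers compute out[i, j, k] = A[j, i] * B[j, k]. A small check comparing them against the loop from the question, using the question's A and B:

```python
import numpy as np

A = np.array([[1, 3, 2], [2, 2, 3], [3, 1, 1]])
B = np.array([[0, 1, 0], [1, 1, 0], [1, 1, 1]])

# Reference result from the question's loop.
looped = np.stack([np.einsum('i,ij->ij', A[:, i], B) for i in range(A.shape[1])])

broadcast = A.T[:, :, None] * B                         # new axis via None
reshaped = A.T.reshape(A.shape[1], A.shape[0], 1) * B   # new axis via reshape
einsummed = np.einsum('ij,ik->jik', A, B)               # single einsum call

for result in (broadcast, reshaped, einsummed):
    assert np.array_equal(result, looped)
```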
Is there a way in Python/NumPy/SciPy to dynamically create a list of integers in a specific range, which can vary, and in which the numbers are ordered according to a distribution such as normal (Gaussian), exponential, or linear? I imagine something like this, for range 3:
[1,2,3]
[2,1,2]
[1,2,1]
[3,2,1]
for range 4:
[1,2,3,4]
[2,1,1,2]
[1,2,2,1]
[4,3,2,1]
for range 5:
[1,2,3,4,5]
[2,1,0,1,2]
[1,2,3,2,1]
[5,4,3,2,1]
We can use np.minimum to generate the symmetric third row. The second row is the third row subtracted from 3. The first and last rows are the range from 1 to n and its reverse, respectively.
Stacking these rows into a 2D array gives one approach:
def ranged_arr(n):
    r = np.arange(n)+1
    row3 = np.minimum(r,r[::-1])
    return np.c_[r, 3-row3, row3, r[::-1]].T
We could also use np.vstack to do the stacking (np.row_stack is an alias for it and is deprecated as of NumPy 2.0):
np.vstack((r, 3-row3, row3, r[::-1]))
Sample runs -
In [106]: ranged_arr(n=3)
Out[106]:
array([[1, 2, 3],
[2, 1, 2],
[1, 2, 1],
[3, 2, 1]])
In [107]: ranged_arr(n=4)
Out[107]:
array([[1, 2, 3, 4],
[2, 1, 1, 2],
[1, 2, 2, 1],
[4, 3, 2, 1]])
In [108]: ranged_arr(n=5)
Out[108]:
array([[1, 2, 3, 4, 5],
[2, 1, 0, 1, 2],
[1, 2, 3, 2, 1],
[5, 4, 3, 2, 1]])
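A self-contained version of the same function using np.vstack, so it runs without deprecation warnings on recent NumPy; it reproduces the sample runs above:

```python
import numpy as np

def ranged_arr(n):
    r = np.arange(n) + 1              # 1, 2, ..., n
    row3 = np.minimum(r, r[::-1])     # symmetric: rises toward the middle, falls back to 1
    # Rows: ascending, 3 minus the symmetric row, the symmetric row, descending.
    return np.vstack((r, 3 - row3, row3, r[::-1]))

print(ranged_arr(5))
```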
I'd like to create a generator that yields an array on the fly. For example:
import numpy as np
def my_gen():
    c = np.ones(5)
    j = 0
    t = 10
    while j < t:
        c[0] = j
        yield c
        j += 1
With a simple for loop:
for g in my_gen():
    print(g)
I get what I want. But with list(my_gen()), I get a list whose elements are always the same.
I dug a little deeper and found that when I yield c.tolist() instead of c, everything works.
I just cannot explain this strange behaviour.
That is because c always refers to the same NumPy array object; the generator function only changes an element inside c.
When you simply print, each iteration prints the complete c array as it is at that moment, so the printed values are correct.
But when you use list(my_gen()), you keep adding references to the same c array into the list, so any later change to that array is also reflected in the elements added earlier.
It works when you yield c.tolist() because that creates a new list from the array each time, so you add new list objects and later changes to c do not affect the lists added earlier.
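A small demonstration of the shared reference, plus a variant that keeps NumPy arrays by yielding c.copy() instead of converting to a list (the name my_gen_copy is hypothetical):

```python
import numpy as np

def my_gen():
    c = np.ones(5)
    for j in range(10):
        c[0] = j
        yield c            # the same array object every time

def my_gen_copy():
    c = np.ones(5)
    for j in range(10):
        c[0] = j
        yield c.copy()     # an independent snapshot per iteration

shared = list(my_gen())
copies = list(my_gen_copy())

print(all(x is shared[0] for x in shared))  # True: ten references to one array
print(shared[0][0])                          # 9.0: the last value written
print([float(x[0]) for x in copies])
```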
An alternative is a generator that yields a copy of a list. I'm keeping np.ones() as a convenient way of creating the numbers, but converting it to a list right away (just once), since array.tolist() is relatively expensive.
I yield c[:] to avoid that 'current version' problem.
def gen_c():
    c = np.ones(5,dtype=int).tolist()
    j = 0
    t = 10
    while j < t:
        c[0] = j
        yield c[:]
        j += 1
In [54]: list(gen_c())
Out[54]:
[[0, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[2, 1, 1, 1, 1],
[3, 1, 1, 1, 1],
[4, 1, 1, 1, 1],
[5, 1, 1, 1, 1],
[6, 1, 1, 1, 1],
[7, 1, 1, 1, 1],
[8, 1, 1, 1, 1],
[9, 1, 1, 1, 1]]
In [55]: np.array(list(gen_c()))
Out[55]:
array([[0, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[2, 1, 1, 1, 1],
[3, 1, 1, 1, 1],
[4, 1, 1, 1, 1],
[5, 1, 1, 1, 1],
[6, 1, 1, 1, 1],
[7, 1, 1, 1, 1],
[8, 1, 1, 1, 1],
[9, 1, 1, 1, 1]])
OK, I think it's because this generator returns the same reference each time, so it always yields the same object. If I yield np.array(c) instead, it should work.
I have the following 3 x 3 x 3 numpy array called a (the comments will make sense after you read the rest of the question):
array([[[8, 1, 0], # irrelevant 1 (is at position 1 rather than 0)
[1, 7, 5], # the 1 on this line is what I am after!
[1, 4, 9]], # irrelevant 1 (out of the "cross")
[[4, 0, 1], # irrelevant 1 (is at position 2 rather than 0)
[1, 0, 1], # I'm only after the first 1 on this line!
[6, 2, 1]], # irrelevant 1 (is at position 2 rather than 0)
[[0, 2, 2],
[0, 6, 7],
[3, 4, 9]]])
Furthermore, I have this list of indexes, called idx, that refers to the "central cross" of said matrix:
[array([0, 1, 1, 1, 2]), array([1, 0, 1, 2, 1])]
EDIT: I call it "cross" as it marks the central column and row in the following:
>>> a[..., 0]
array([[8, 1, 1],
[4, 1, 6],
[0, 0, 3]])
What I would like to obtain is the indexes of all those arrays located at idx whose first value is 1, but I'm struggling in understanding how to use numpy.where() in the right way. Since...
>>> a[..., 0][idx]
array([1, 4, 1, 6, 0])
...I tried...
>>> np.where(a[..., 0][idx] == 1)
(array([0, 2]),)
...but as you can see it returns the index of the sliced array, not of a, while I would like to get:
[array([0, 1]), array([1, 1])] # as a[0, 1, 0] and a[1, 1, 0] are equal to 1.
Thank you in advance for your help!
PS: In the comments it was suggested that I describe a broader scenario of applicability. Although it is not what I am using it for, I suppose this could be used to process images, as many 2D libraries do, with a source layer, a destination layer and a mask (see for example cairo). In this case the mask would be the idx array, and one might imagine working with the R channel of RGB colors (a[..., 0]).
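You can translate the indices back using idx (on NumPy 1.23 and later, index with tuple(idx), because a plain list of arrays is no longer treated as a multidimensional index):
>>> w = np.where(a[..., 0][tuple(idx)] == 1)[0]
>>> np.array(idx).T[w]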
array([[0, 1],
[1, 1]])
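To get the result in exactly the per-axis form asked for in the question, index each array in idx with w instead of transposing. A minimal sketch using the question's a and idx:

```python
import numpy as np

a = np.array([[[8, 1, 0], [1, 7, 5], [1, 4, 9]],
              [[4, 0, 1], [1, 0, 1], [6, 2, 1]],
              [[0, 2, 2], [0, 6, 7], [3, 4, 9]]])
idx = [np.array([0, 1, 1, 1, 2]), np.array([1, 0, 1, 2, 1])]

# tuple(idx) selects one element per (row, col) pair along the "cross".
cross_first = a[..., 0][tuple(idx)]
w = np.where(cross_first == 1)[0]       # positions within the cross

# Map those positions back to indices of a, one array per axis.
hits = [axis_idx[w] for axis_idx in idx]
print(hits)  # [array([0, 1]), array([1, 1])]
```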