Add dimensions to an array from a list - python

The structure of my array 'cama' is the following:
shape(cama)
>>>(365, 720, 1440)
And the shape of my 'lon_list' is the following:
shape(lon_list)
>>>(1440,)
What I want is to add or append lon_list to cama. So that I end up with an array with the following dimensions:
shape(new_cama)
>>>(365, 720, 1440, 1440)
I've tried:
new_cama = np.concatenate((cama, lon_list))
ValueError: all the input arrays must have same number of dimensions
Any suggestions?

Dimensions are multiplicative. This means if you have:
a = [[1, 2, 3], [4, 5, 6]]
You have a (2 x 3) array of 6 elements. If you want to add another dimension, say b=[10, 20, 30, 40], you will end up with an array of (2 x 3 x 4) = 24 elements, so you need to provide a way to complete these missing (24 - 4 - 6) = 14 elements. You cannot 'append' a dimension: increasing the dimension means appending newdimiension - 1 elements in all dimensions.
So what can we do with shapes like this ? Well, you can broadcast your arrays to match the corresponding dimension and combine them in some way:
In[52]: a = np.array([[1,2,3], [4,5,6]])
In[53]: b = np.array([10, 20, 30, 40])
In[54]: a
Out[54]:
array([[1, 2, 3],
[4, 5, 6]])
In[55]: b
Out[55]: array([10, 20, 30, 40])
In[56]: c = a[:, :, None] * b[None, None, :]
In[57]: c
Out[57]:
array([[[ 10, 20, 30, 40],
[ 20, 40, 60, 80],
[ 30, 60, 90, 120]],
[[ 40, 80, 120, 160],
[ 50, 100, 150, 200],
[ 60, 120, 180, 240]]])
In[58]: c.shape
Out[58]: (2L, 3L, 4L)
Here I have multiplied them, but it is of course possible to use any other combination: the key thing to understand is that:
You can only apply operations to arrays of similar shapes
You can 'emulate' bigger shapes via broadcasting.
This way, we can combine a (2, 3) and a (4,) arrays by 'viewing' them both as (2, 3, 4) arrays and combining them in some way. But you have to have them make sense as arrays of similar shapes.
Alternatively, you can concatenate arrays, if all their dimensions but one differ. Say you have a (2, 3, 4, 5) array and a (2, 3, 7, 5) array: then it makes sense to concatenate them in a (2, 3, 4+7, 5) array. You can again do that by 'emulating' shapes (basically by repeating the pattern along any missing dimension) but from your question, it's unclear this is what you're trying to achieve...

Related

Sum of rows based on index with Numpy

I have a 2D array composed of 2D vectors and a 1D array of indices.
How can I add / sumvthe rows of the 2D array that share the same index, using numpy?
Example:
arr = np.array([[48, -51], [-15, -55], [26, -49], [-13, -17], [-67, -7], [23, -48], [-29, -64], [37, 68]])
idx = np.array([0, 1, 1, 2, 2, 3, 3, 4])
#desired output
array([[48, -51],
[11, -104],
[-80, -24],
[-6, -112],
[ 37, 68]])
Notice how the original array arr is of shape (8, 2), and the result of the operation is (5, 2).
If the indices are not always grouped, apply np.argsort first:
order = np.argsort(idx)
You can compute the locations of the sums using np.diff followed by np.flatnonzero to get the indices. We'll also prepend zero and shift everything by 1:
breaks = np.flatnonzero(np.concatenate(([1], np.diff(idx[order])))
breaks can now be used as an argument to np.add.reduceat:
result = np.add.reduceat(arr[order, :], breaks, axis=0)
If the indices are already grouped, you don't need to use order at all:
breaks = np.flatnonzero(np.concatenate(([1], np.diff(idx)))
result = np.add.reduceat(arr, breaks, axis=0)
You can use pandas for the purpose:
pd.DataFrame(arr).groupby(idx).sum().to_numpy()
Output:
array([[ 48, -51],
[ 11, -104],
[ -80, -24],
[ -6, -112],
[ 37, 68]])

Is there a way to conditionally index 3D-numpy array?

Having an array A with the shape (2,6, 60), is it possible to index it based on a binary array B of shape (6,)?
The 6 and 60 is quite arbitrary, they are simply the 2D data I wish to access.
The underlying thing I am trying to do is to calculate two variants of the 2D data (in this case, (6,60)) and then efficiently select the ones with the lowest total sum - that is where the binary (6,) array comes from.
Example: For B = [1,0,1,0,1,0] what I wish to receive is equal to stacking
A[1,0,:]
A[0,1,:]
A[1,2,:]
A[0,3,:]
A[1,4,:]
A[0,5,:]
but I would like to do it by direct indexing and not a for-loop.
I have tried A[B], A[:,B,:], A[B,:,:] A[:,:,B] with none of them providing the desired (6,60) matrix.
import numpy as np
A = np.array([[4, 4, 4, 4, 4, 4], [1, 1, 1, 1, 1, 1]])
A = np.atleast_3d(A)
A = np.tile(A, (1,1,60)
B = np.array([1, 0, 1, 0, 1, 0])
A[B]
Expected results are a (6,60) array containing the elements from A as described above, the received is either (2,6,60) or (6,6,60).
Thank you in advance,
Linus
You can generate a range of the indices you want to iterate over, in your case from 0 to 5:
count = A.shape[1]
indices = np.arange(count) # np.arange(6) for your particular case
>>> print(indices)
array([0, 1, 2, 3, 4, 5])
And then you can use that to do your advanced indexing:
result_array = A[B[indices], indices, :]
If you always use the full range from 0 to length - 1 (i.e. 0 to 5 in your case) of the second axis of A in increasing order, you can simplify that to:
result_array = A[B, indices, :]
# or the ugly result_array = A[B, np.arange(A.shape[1]), :]
Or even this if it's always 6:
result_array = A[B, np.arange(6), :]
An alternative solution using np.take_along_axis (from version 1.15 - docs)
import numpy as np
x = np.arange(2*6*6).reshape((2,6,6))
m = np.zeros(6, int)
m[0] = 1
#example: [1, 0, 0, 0, 0, 0]
np.take_along_axis(x, m[None, :, None], 0) #add dimensions to mask to match array dimensions
>>array([[[36, 37, 38, 39, 40, 41],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35]]])

Tensorflow, how to multiply a 2D tensor (matrix) by corresponding elements in a 1D vector

I have a 2D matrix M of shape [batch x dim], I have a vector V of shape [batch]. How can I multiply each of the columns in the matrix by the corresponding element in the V? That is:
I know an inefficient numpy implementation would look like this:
import numpy as np
M = np.random.uniform(size=(4, 10))
V = np.random.randint(4)
def tst(M, V):
rows = []
for i in range(len(M)):
col = []
for j in range(len(M[i])):
col.append(M[i][j] * V[i])
rows.append(col)
return np.array(rows)
In tensorflow, given two tensors, what is the most efficient way to achieve this?
import tensorflow as tf
sess = tf.InteractiveSession()
M = tf.constant(np.random.normal(size=(4,10)), dtype=tf.float32)
V = tf.constant([1,2,3,4], dtype=tf.float32)
In NumPy, we would need to make V 2D and then let broadcasting do the element-wise multiplication (i.e. Hadamard product). I am guessing, it should be the same on tensorflow. So, for expanding dims on tensorflow, we can use tf.newaxis (on newer versions) or tf.expand_dims or a reshape with tf.reshape -
tf.multiply(M, V[:,tf.newaxis])
tf.multiply(M, tf.expand_dims(V,1))
tf.multiply(M, tf.reshape(V, (-1, 1)))
In addition to #Divakar's answer, I would like to make a note that the order of M and V don't matter. It seems that tf.multiply also does broadcasting during multiplication.
Example:
In [55]: M.eval()
Out[55]:
array([[1, 2, 3, 4],
[2, 3, 4, 5],
[3, 4, 5, 6]], dtype=int32)
In [56]: V.eval()
Out[56]: array([10, 20, 30], dtype=int32)
In [57]: tf.multiply(M, V[:,tf.newaxis]).eval()
Out[57]:
array([[ 10, 20, 30, 40],
[ 40, 60, 80, 100],
[ 90, 120, 150, 180]], dtype=int32)
In [58]: tf.multiply(V[:, tf.newaxis], M).eval()
Out[58]:
array([[ 10, 20, 30, 40],
[ 40, 60, 80, 100],
[ 90, 120, 150, 180]], dtype=int32)

Torch sum a tensor along an axis

How do I sum over the columns of a tensor?
torch.Size([10, 100]) ---> torch.Size([10])
The simplest and best solution is to use torch.sum().
To sum all elements of a tensor:
torch.sum(x) # gives back a scalar
To sum over all rows (i.e. for each column):
torch.sum(x, dim=0) # size = [ncol]
To sum over all columns (i.e. for each row):
torch.sum(x, dim=1) # size = [nrow]
It should be noted that the dimension summed over is eliminated from the resulting tensor.
Alternatively, you can use tensor.sum(axis) where axis indicates 0 and 1 for summing over rows and columns respectively, for a 2D tensor.
In [210]: X
Out[210]:
tensor([[ 1, -3, 0, 10],
[ 9, 3, 2, 10],
[ 0, 3, -12, 32]])
In [211]: X.sum(1)
Out[211]: tensor([ 8, 24, 23])
In [212]: X.sum(0)
Out[212]: tensor([ 10, 3, -10, 52])
As, we can see from the above outputs, in both cases, the output is a 1D tensor. If you, on the other hand, wish to retain the dimension of the original tensor in the output as well, then you've set the boolean kwarg keepdim to True as in:
In [217]: X.sum(0, keepdim=True)
Out[217]: tensor([[ 10, 3, -10, 52]])
In [218]: X.sum(1, keepdim=True)
Out[218]:
tensor([[ 8],
[24],
[23]])
If you have tensor my_tensor, and you wish to sum across the second array dimension (that is, the one with index 1, which is the column-dimension, if the tensor is 2-dimensional, as yours is), use torch.sum(my_tensor,1) or equivalently my_tensor.sum(1) see documentation here.
One thing that is not mentioned explicitly in the documentation is: you can sum across the last array-dimension by using -1 (or the second-to last dimension, with -2, etc.)
So, in your example, you could use: outputs.sum(1) or torch.sum(outputs,1), or, equivalently, outputs.sum(-1) or torch.sum(outputs,-1). All of these would give the same result, an output tensor of size torch.Size([10]), with each entry being the sum over the all rows in a given column of the tensor outputs.
To illustrate with a 3-dimensional tensor:
In [1]: my_tensor = torch.arange(24).view(2, 3, 4)
Out[1]:
tensor([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
In [2]: my_tensor.sum(2)
Out[2]:
tensor([[ 6, 22, 38],
[54, 70, 86]])
In [3]: my_tensor.sum(-1)
Out[3]:
tensor([[ 6, 22, 38],
[54, 70, 86]])
Based on doc https://pytorch.org/docs/stable/generated/torch.sum.html
it should be
dim (int or tuple of python:ints) – the dimension or dimensions to reduce.
dim=0 means reduce row dimensions: condense all rows = sum by col
dim=1 means reduce col dimensions: condense cols= sum by row
Torch sum along multiple axis or dimensions
Just for the sake of completeness (I could not find it easily) I include how to sum along multiple dimensions with torch.sum which is heavily used in computer vision tasks where you have to reduce along H and W dimensions.
If you have an image x with shape C x H x W and want to compute the average pixel intensity value per channel you could do:
avg = torch.sum(x, dim=(1,2)) / (H*W) # Sum along (H,W) and norm

Interpolating an array within an astropy table column

I have a multiband catalog of radiation sources (from SourceExtractor, if you care to know), which I have read into an astropy table in the following form:
Source # | FLUX_APER_BAND1 | FLUXERR_APER_BAND1 ... FLUX_APER_BANDN | FLUXERR_APER_BANDN
1 np.array(...) np.array(...) ... np.array(...) np.array(...)
...
The arrays in FLUX_APER_BAND1, FLUXERR_APER_BAND1, etc. each have 14 elements, which give the number of photon counts for a given source in a given band, within 14 different distances from the center of the source (aperture photometry). I have the array of apertures (2, 3, 4, 6, 8, 10, 14, 20, 28, 40, 60, 80, 100, and 160 pixels), and I want to interpolate the 14 samples into a single (assumed) count at some other aperture a.
I could iterate over the sources, but the catalog has over 3000 of them, and that's not very pythonic or very efficient (interpolating 3000 objects in 8 bands would take a while). Is there a way of interpolating all the arrays in a single column simultaneously, to the same aperture? I tried simply applying np.interp, but that threw ValueError: object too deep for desired array, as well as np.vectorize(np.interp), but that threw ValueError: object of too small depth for desired array. It seems like aggregation should also be possible over the contents of a single column, but I can't make sense of the documentation.
Can someone shed some light on this? Thanks in advance!
I'm not familiar with the format of an astropy table, but it looks like it could be represented as a three-dimensional numpy array, with axes for source, band and aperture. If that is the case, you can use, for example, scipy.interpolate.interp1d. Here's a simple example.
In [51]: from scipy.interpolate import interp1d
Make some sample data. The "table" y is 3-D, with shape (2, 3, 14). Think of it as the array holding the counts for 2 sources, 3 bands and 14 apertures.
In [52]: x = np.array([2, 3, 4, 6, 8, 10, 14, 20, 28, 40, 60, 80, 100, 160])
In [53]: y = np.array([[x, 2*x, 3*x], [x**2, (x+1)**3/400, (x**1.5).astype(int)]])
In [54]: y
Out[54]:
array([[[ 2, 3, 4, 6, 8, 10, 14, 20, 28,
40, 60, 80, 100, 160],
[ 4, 6, 8, 12, 16, 20, 28, 40, 56,
80, 120, 160, 200, 320],
[ 6, 9, 12, 18, 24, 30, 42, 60, 84,
120, 180, 240, 300, 480]],
[[ 4, 9, 16, 36, 64, 100, 196, 400, 784,
1600, 3600, 6400, 10000, 25600],
[ 0, 0, 0, 0, 1, 3, 8, 23, 60,
172, 567, 1328, 2575, 10433],
[ 2, 5, 8, 14, 22, 31, 52, 89, 148,
252, 464, 715, 1000, 2023]]])
Create the interpolator. This creates a linear interpolator by default. (Check out the docstring for different interpolators. Also, before calling interp1d, you might want to transform your data in such a way that linear interpolation is appropriate.) I use axis=2 to create an interpolator of the aperture axis. f will be a function that takes an aperture value and returns an array with shape (2,3).
In [55]: f = interp1d(x, y, axis=2)
Take a look at a couple y slices. These correspond to apertures 2 and 3 (i.e. x[0] and x[1]).
In [56]: y[:,:,0]
Out[56]:
array([[2, 4, 6],
[4, 0, 2]])
In [57]: y[:,:,1]
Out[57]:
array([[3, 6, 9],
[9, 0, 5]])
Use the interpolator to get the values at apertures 2, 2.5 and 3. As expected, the values at 2 and 3 match the values in y.
In [58]: f(2)
Out[58]:
array([[ 2., 4., 6.],
[ 4., 0., 2.]])
In [59]: f(2.5)
Out[59]:
array([[ 2.5, 5. , 7.5],
[ 6.5, 0. , 3.5]])
In [60]: f(3)
Out[60]:
array([[ 3., 6., 9.],
[ 9., 0., 5.]])
About being Pythonic, key aspects of that are simplicity, readability, and practicality. If your case is really a one-off (i.e. you'll be doing the 3000 x 8 interpolations a few times rather than a million times), then the fastest and most easily understood solution would be the simple one of just iterating with Python loops. By fastest I mean from the time you know your question until the time you have an answer from your code.
The overhead of looping and calling a function 24000 times is quite small in human / astronomer time scales, and definitely much lower than writing a stack-overflow post. :-)

Categories