Interpolating an array within an astropy table column - python

I have a multiband catalog of radiation sources (from SourceExtractor, if you care to know), which I have read into an astropy table in the following form:
Source # | FLUX_APER_BAND1 | FLUXERR_APER_BAND1 | ... | FLUX_APER_BANDN | FLUXERR_APER_BANDN
1        | np.array(...)   | np.array(...)      | ... | np.array(...)   | np.array(...)
...
The arrays in FLUX_APER_BAND1, FLUXERR_APER_BAND1, etc. each have 14 elements, which give the number of photon counts for a given source in a given band, within 14 different distances from the center of the source (aperture photometry). I have the array of apertures (2, 3, 4, 6, 8, 10, 14, 20, 28, 40, 60, 80, 100, and 160 pixels), and I want to interpolate the 14 samples into a single (assumed) count at some other aperture a.
I could iterate over the sources, but the catalog has over 3000 of them, and that's not very pythonic or very efficient (interpolating 3000 objects in 8 bands would take a while). Is there a way of interpolating all the arrays in a single column simultaneously, to the same aperture? I tried simply applying np.interp, but that threw ValueError: object too deep for desired array, as well as np.vectorize(np.interp), but that threw ValueError: object of too small depth for desired array. It seems like aggregation should also be possible over the contents of a single column, but I can't make sense of the documentation.
Can someone shed some light on this? Thanks in advance!

I'm not familiar with the format of an astropy table, but it looks like it could be represented as a three-dimensional numpy array, with axes for source, band and aperture. If that is the case, you can use, for example, scipy.interpolate.interp1d. Here's a simple example.
In [50]: import numpy as np
In [51]: from scipy.interpolate import interp1d
Make some sample data. The "table" y is 3-D, with shape (2, 3, 14). Think of it as the array holding the counts for 2 sources, 3 bands and 14 apertures.
In [52]: x = np.array([2, 3, 4, 6, 8, 10, 14, 20, 28, 40, 60, 80, 100, 160])
In [53]: y = np.array([[x, 2*x, 3*x], [x**2, (x+1)**3//400, (x**1.5).astype(int)]])  # floor division keeps y an integer array
In [54]: y
Out[54]:
array([[[    2,     3,     4,     6,     8,    10,    14,    20,    28,
            40,    60,    80,   100,   160],
        [    4,     6,     8,    12,    16,    20,    28,    40,    56,
            80,   120,   160,   200,   320],
        [    6,     9,    12,    18,    24,    30,    42,    60,    84,
           120,   180,   240,   300,   480]],

       [[    4,     9,    16,    36,    64,   100,   196,   400,   784,
          1600,  3600,  6400, 10000, 25600],
        [    0,     0,     0,     0,     1,     3,     8,    23,    60,
           172,   567,  1328,  2575, 10433],
        [    2,     5,     8,    14,    22,    31,    52,    89,   148,
           252,   464,   715,  1000,  2023]]])
Create the interpolator. This creates a linear interpolator by default. (Check out the docstring for other kinds of interpolation. Also, before calling interp1d, you might want to transform your data so that linear interpolation is appropriate.) I use axis=2 to interpolate along the aperture axis. f will be a function that takes an aperture value and returns an array with shape (2, 3).
In [55]: f = interp1d(x, y, axis=2)
Take a look at a couple of y slices. These correspond to apertures 2 and 3 (i.e. x[0] and x[1]).
In [56]: y[:,:,0]
Out[56]:
array([[2, 4, 6],
       [4, 0, 2]])

In [57]: y[:,:,1]
Out[57]:
array([[3, 6, 9],
       [9, 0, 5]])
Use the interpolator to get the values at apertures 2, 2.5 and 3. As expected, the values at 2 and 3 match the values in y.
In [58]: f(2)
Out[58]:
array([[ 2.,  4.,  6.],
       [ 4.,  0.,  2.]])

In [59]: f(2.5)
Out[59]:
array([[ 2.5,  5. ,  7.5],
       [ 6.5,  0. ,  3.5]])

In [60]: f(3)
Out[60]:
array([[ 3.,  6.,  9.],
       [ 9.,  0.,  5.]])
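Back to the original problem: the same trick works directly on the astropy table, since a column whose cells hold length-14 arrays can be viewed as an (n_sources, 14) array. A minimal sketch, assuming the table is called catalog and the band column names are as in the question (substitute your own names and target aperture):

import numpy as np
from scipy.interpolate import interp1d

apertures = np.array([2, 3, 4, 6, 8, 10, 14, 20, 28, 40, 60, 80, 100, 160])

# Stack the per-band flux columns into one (n_sources, n_bands, 14) array.
flux_cols = ['FLUX_APER_BAND1', 'FLUX_APER_BAND2']  # ... one name per band
counts = np.stack([np.asarray(catalog[name]) for name in flux_cols], axis=1)

# One interpolator handles every source and band at once.
f = interp1d(apertures, counts, axis=2)
counts_at_a = f(12.0)  # shape (n_sources, n_bands); 12.0 is an example aperture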

About being Pythonic, key aspects of that are simplicity, readability, and practicality. If your case is really a one-off (i.e. you'll be doing the 3000 x 8 interpolations a few times rather than a million times), then the fastest and most easily understood solution would be the simple one of just iterating with Python loops. By fastest I mean from the time you know your question until the time you have an answer from your code.
The overhead of looping and calling a function 24000 times is quite small on human / astronomer time scales, and definitely much lower than the cost of writing a Stack Overflow post. :-)
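For concreteness, a minimal sketch of that simple loop, again assuming a table called catalog with the question's column names, and using np.interp for the per-source interpolation:

import numpy as np

apertures = np.array([2, 3, 4, 6, 8, 10, 14, 20, 28, 40, 60, 80, 100, 160])
a = 12.0  # example target aperture

results = {}
for name in ['FLUX_APER_BAND1', 'FLUX_APER_BAND2']:  # one name per band
    # np.interp(x, xp, fp) linearly interpolates the samples fp taken at xp
    results[name] = np.array([np.interp(a, apertures, row)
                              for row in catalog[name]])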

Related

2D cross-correlation of a 6x6 array with three 3x3 kernels

I have a 6x6 matrix: e.g. matrix A
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35]])
I also have a 3x3x3 matrix: e.g. matrix B
array([[[1, 7, 2],
        [5, 9, 3],
        [2, 8, 6]],

       [[3, 4, 6],
        [6, 8, 9],
        [4, 2, 8]],

       [[6, 4, 7],
        [8, 7, 8],
        [4, 4, 7]]])
Finally, I have a 3x4x4 matrix C (3 slices, each with 4 rows and 4 columns), that's empty (filled with 0s).
I want to multiply each "slice" of B (i.e. B[0,:,:], B[1,:,:], B[2,:,:]) with A. However, for each slice I want to multiply B in "windows", sliding by 1 each time across A till I cannot go further, at which point I move back to the beginning, slide 1 unit down, and again slide across one by one, multiplying B with A till the end; then move down and repeat till I run out of border. The results are stored in the corresponding slice of matrix C, so my result would be a [3x4x4] matrix.
Ex. (the multiplication is a dot product giving a scalar value, np.sum(np.multiply(x, y))), so...
imagining B "overtop" of A, starting in the upper left corner, I multiply that 3x3 part of A with B's first [3x3] slice, storing the result in C...
referring to the element located in the 1st row and 1st column of the 1st slice of C...
C[0,0,0] = 340, because [[0,1,2],[6,7,8],[12,13,14]] dot [[1,7,2],[5,9,3],[2,8,6]]
sliding the B matrix over by 1 on A, and storing my 2nd result in C...
C[0,0,1] = 383, because [[1,2,3],[7,8,9],[13,14,15]] dot [[1,7,2],[5,9,3],[2,8,6]]
Then repeat this procedure of sliding across and down, for B[1,:,:] and B[2,:,:] over A again, storing the results in C[1,:,:] and C[2,:,:] respectively.
What is a good way to do this?
I think you're asking about 2D cross-correlation with three different kernels, rather than straightforward matrix multiplication.
The following piece of code is not the most efficient way to do this, but does this give you the answer you are looking for? I'm using scipy.signal.correlate2d to achieve 2D correlation here...
>>> from scipy.signal import correlate2d
>>> C = np.dstack([correlate2d(A, B[:, :, i], 'valid') for i in range(B.shape[2])])
>>> C.shape
(4, 4, 3)
>>> C
array([[[ 333,  316,  464],
        [ 372,  369,  520],
        [ 411,  422,  576],
        [ 450,  475,  632]],

       [[ 567,  634,  800],
        [ 606,  687,  856],
        [ 645,  740,  912],
        [ 684,  793,  968]],

       [[ 801,  952, 1136],
        [ 840, 1005, 1192],
        [ 879, 1058, 1248],
        [ 918, 1111, 1304]],

       [[1035, 1270, 1472],
        [1074, 1323, 1528],
        [1113, 1376, 1584],
        [1152, 1429, 1640]]])
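Note that this slices B along its last axis (B[:, :, i]), so C comes out with shape (4, 4, 3). If, as the question describes, the three kernels are stacked along B's first axis, slice with B[i] instead and stack the results along a new first axis; a sketch that reproduces the question's expected layout and its value C[0,0,0] = 340:
>>> C = np.stack([correlate2d(A, B[i], 'valid') for i in range(B.shape[0])])
>>> C.shape
(3, 4, 4)
>>> C[0, 0, 0]
340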
Here's a more "fun" way of doing this which doesn't use scipy, but using stride_tricks instead. I'm not sure if it's more efficient:
>>> import numpy.lib.stride_tricks as st
>>> s, t = A.strides
>>> i, j = A.shape
>>> k, l, m = B.shape
>>> # view A as all of its overlapping k-by-l windows, without copying
>>> D = st.as_strided(A, shape=(i-k+1, j-l+1, k, l), strides=(s, t, s, t))
>>> E = np.einsum('ijkl,klm->ijm', D, B)
>>> (E == C).all()
True
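On NumPy 1.20 and later, numpy.lib.stride_tricks.sliding_window_view builds the same windowed view without the error-prone manual stride arithmetic; a minimal sketch:
>>> from numpy.lib.stride_tricks import sliding_window_view
>>> D = sliding_window_view(A, (3, 3))  # shape (4, 4, 3, 3)
>>> E = np.einsum('ijkl,klm->ijm', D, B)
>>> (E == C).all()
True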

Tricky numpy argmax on last dimension of 3-dimensional ndarray

If I have an array of shape (9, 1, 3):
array([[[  6,  12, 108]],

       [[122, 112,  38]],

       [[ 57, 101,  62]],

       [[119,  76, 177]],

       [[ 46,  62,   2]],

       [[127,  61, 155]],

       [[  5,   6, 151]],

       [[  5,   8, 185]],

       [[109, 167,  33]]])
I want to find the index, along the first dimension, of the row containing the overall maximum; in this case the maximum is 185, so the answer is index 7.
I guess the solution is linked to reshaping but I can't wrap my head around it. Thanks for any help!
I'm not sure what's tricky about it. But, one way to get the index of the greatest element along the last axis would be by using np.max and np.argmax like:
# find `max` element along last axis
# and get the index using `argmax` where `arr` is your array
In [53]: np.argmax(np.max(arr, axis=2))
Out[53]: 7
Alternatively, as @PaulPanzer suggested in the comments, you could use:
In [63]: np.unravel_index(np.argmax(arr), arr.shape)
Out[63]: (7, 0, 2)
In [64]: arr[(7, 0, 2)]
Out[64]: 185
You may have to do it like this:
data = np.array([[[  6,  12, 108]],
                 [[122, 112,  38]],
                 [[ 57, 101,  62]],
                 [[119,  76, 177]],
                 [[ 46,  62,   2]],
                 [[127,  61, 155]],
                 [[  5,   6, 151]],
                 [[  5,   8, 185]],
                 [[109, 167,  33]]])

# max within each row, then the row whose max is largest
np.argmax(data[:, 0, :].max(axis=1))
7
(Indexing only the last column, as in data[:, 0][:, 2], happens to give 7 for this data, but it would miss a maximum that fell in the first or second column.)

Fill matrix diagonal with different values for each python numpy

I saw a function numpy.fill_diagonal which assigns the same value to all diagonal elements. But I want to assign a different random value to each diagonal element. How can I do it in Python? Maybe using SciPy or other libraries?
That the docs call the fill value a scalar is an existing documentation bug; in fact, any value that can be broadcast here is OK.
Fill diagonal works fine with array-likes:
>>> a = np.arange(1, 10).reshape(3, 3)
>>> a
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
>>> np.fill_diagonal(a, [99, 42, 69])
>>> a
array([[99,  2,  3],
       [ 4, 42,  6],
       [ 7,  8, 69]])
It's a stride trick, since the diagonal elements are regularly spaced by the array's width + 1.
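You can see that trick by hand for a square, C-contiguous array: stepping through the flat data with a stride of width + 1 lands exactly on the diagonal elements. A minimal sketch of the equivalent slicing (not how you should normally do it; np.fill_diagonal also handles non-square and Fortran-ordered arrays):
>>> a = np.zeros((3, 3), dtype=int)
>>> a.flat[::a.shape[1] + 1] = [99, 42, 69]
>>> a
array([[99,  0,  0],
       [ 0, 42,  0],
       [ 0,  0, 69]])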
And according to the docstring, it is a faster implementation than using np.diag_indices:
Notes
-----
.. versionadded:: 1.4.0
This functionality can be obtained via `diag_indices`, but internally
this version uses a much faster implementation that never constructs the
indices and uses simple slicing.
You can use np.diag_indices to get those indices and then simply index into the array with those and assign values.
Here's a sample run to illustrate it -
In [86]: arr  # input array
Out[86]:
array([[13, 69, 35, 98, 16],
       [93, 42, 72, 51, 65],
       [51, 33, 96, 43, 53],
       [15, 26, 16, 17, 52],
       [31, 54, 29, 95, 80]])

# Get row, col indices
In [87]: row, col = np.diag_indices(arr.shape[0])

# Assign values, let's say from an array to illustrate
In [88]: arr[row, col] = np.array([100, 200, 300, 400, 500])

In [89]: arr
Out[89]:
array([[100,  69,  35,  98,  16],
       [ 93, 200,  72,  51,  65],
       [ 51,  33, 300,  43,  53],
       [ 15,  26,  16, 400,  52],
       [ 31,  54,  29,  95, 500]])
You can also use np.diag_indices_from, which would probably be more idiomatic, like so -
row, col = np.diag_indices_from(arr)
Note: the function you tried, np.fill_diagonal, would work just fine too. This is discussed in a previous Q&A - Numpy modify ndarray diagonal.
Create an identity matrix with n dimensions (take input from the user). Fill the diagonals of that matrix with the multiples of the number provided by the user.
arr = np.eye(4)
j = 3
np.fill_diagonal(arr, 6)
for i, x in zip(range(4), range(1, 5)):
    arr[i, i] = arr[i, i] * x  # main diagonal: 6, 12, 18, 24
    arr[i, j] = 6 * (j + 1)    # anti-diagonal: 24, 18, 12, 6
    j -= 1
arr
output:
array([[ 6.,  0.,  0., 24.],
       [ 0., 12., 18.,  0.],
       [ 0., 12., 18.,  0.],
       [ 6.,  0.,  0., 24.]])
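The explicit loop isn't needed: np.fill_diagonal itself broadcasts an array argument, and filling the diagonal of the left-right-flipped view fills the anti-diagonal (an idiom shown in the fill_diagonal docs). A sketch producing the same result:
n = 4
arr = np.zeros((n, n))
np.fill_diagonal(arr, 6 * np.arange(1, n + 1))             # 6, 12, 18, 24
np.fill_diagonal(np.fliplr(arr), 6 * np.arange(n, 0, -1))  # 24, 18, 12, 6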

Alternative for numpy.choose that allows an arbitrary or at least more than 32 arguments?

With my code I'm running into an issue where the numpy.choose method does not accept all the arguments, since it is limited by NPY_MAXARGS (=32). Is there an alternative that allows an arbitrary number of argument arrays, or at least more than 32, and is as fast as numpy.choose?
choices = [np.arange(0,100)]*100
selection = [0] * 100
np.choose(selection, choices)
>> ValueError: Need between 2 and (32) array objects (inclusive).
Any help would be appreciated... :)
The indices can be given as lists. Assuming that selections has the same length as choices:
b = numpy.array(choices)
result = b[range(len(selections)), selections]
will give the value in choices specified by the index in selections. See it in action:
>>> numpy.random.seed(1)
>>> b = numpy.random.randint(0, 100, (5, 10))
>>> b
array([[37, 12, 72,  9, 75,  5, 79, 64, 16,  1],
       [76, 71,  6, 25, 50, 20, 18, 84, 11, 28],
       [29, 14, 50, 68, 87, 87, 94, 96, 86, 13],
       [ 9,  7, 63, 61, 22, 57,  1,  0, 60, 81],
       [ 8, 88, 13, 47, 72, 30, 71,  3, 70, 21]])
>>> selections = numpy.random.randint(0, 10, 5)
>>> selections
array([1, 9, 3, 4, 8])
>>> result = b[range(len(selections)), selections]
>>> result
array([12, 28, 68, 22, 70])
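On NumPy 1.15 and later, np.take_along_axis expresses the same per-row lookup without constructing the row-index range; a minimal sketch using the arrays above:
>>> numpy.take_along_axis(b, selections[:, None], axis=1).ravel()
array([12, 28, 68, 22, 70])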
choose has the 32 object limit because it broadcasts the arrays together. Consider the error messages for these two actions:
In [982]: np.arange(33).choose(np.ones((33,33)))
...
ValueError: Need at least 1 and at most 32 array objects.
In [983]: np.broadcast(*range(33))
...
ValueError: Need at least 1 and at most 32 array objects.
An example exploiting that broadcasting, picking values from a 2d array, a scalar, and a 1d array:
In [998]: np.diag([2,1,0]).choose((np.arange(9).reshape(3,3), 0,[.1,.2,.3]))
Out[998]:
array([[ 0.1,  1. ,  2. ],
       [ 3. ,  0. ,  5. ],
       [ 6. ,  7. ,  8. ]])
As @Benjamin shows, np.choose can be used to select items from the successive columns of a 2d array - provided there aren't more than 32 columns:
In [1002]: M=np.arange(9).reshape(3,3)
In [1003]: np.array([2,0,1]).choose(M)
Out[1003]: array([6, 1, 5])
In [1004]: M[[2,0,1],[0,1,2]]
Out[1004]: array([6, 1, 5])
It was in just such a context that I recall first seeing this 32-array limit on choose, and it is one of the few contexts in which I've seen choose used in an answer.
It is implemented as the compiled functions PyArray_Choose and array_choose:
https://github.com/numpy/numpy/blob/0b2e590ec18942f8f149ab2306b80da86b04eaeb/numpy/core/src/multiarray/item_selection.c
https://github.com/numpy/numpy/blob/945c308e96fb815729e8f8aeb0ad6b39b8bdf84a/numpy/core/src/multiarray/methods.c
I don't see any uses of this function in other compiled numpy code, and apart from testing it sees little use in the rest of numpy.
I know this is 6 years old, but since this is the first result on Google for the Need at least 0 and at most 32 array objects error, I thought I'd add this.
Depending on what you're trying to do, you can probably just use advanced indexing (using an array to index another array):
choices = np.array([np.random.random(5) for _ in range(5)])
keys = np.random.randint(choices.shape[0], size=10)  # randint's high is exclusive, so no -1
choices[keys]
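Applied to the original example, the same advanced indexing handles 100 choice arrays without hitting the 32-argument limit:
choices = np.array([np.arange(0, 100)] * 100)  # shape (100, 100)
selection = np.zeros(100, dtype=int)
result = choices[np.arange(len(selection)), selection]  # shape (100,)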

Add dimensions to an array from a list

The structure of my array 'cama' is the following:
shape(cama)
>>>(365, 720, 1440)
And the shape of my 'lon_list' is the following:
shape(lon_list)
>>>(1440,)
What I want is to add or append lon_list to cama. So that I end up with an array with the following dimensions:
shape(new_cama)
>>>(365, 720, 1440, 1440)
I've tried:
new_cama = np.concatenate((cama, lon_list))
ValueError: all the input arrays must have same number of dimensions
Any suggestions?
Dimensions are multiplicative. This means if you have:
a = [[1, 2, 3], [4, 5, 6]]
You have a (2 x 3) array of 6 elements. If you want to add another dimension, say b = [10, 20, 30, 40], you will end up with an array of (2 x 3 x 4) = 24 elements, so you need to provide a way to fill in the (24 - 6 - 4) = 14 missing elements. You cannot simply 'append' a dimension: adding one means specifying values for every combination of the old indices with the new axis's indices.
So what can we do with shapes like this ? Well, you can broadcast your arrays to match the corresponding dimension and combine them in some way:
In [52]: a = np.array([[1, 2, 3], [4, 5, 6]])
In [53]: b = np.array([10, 20, 30, 40])
In [54]: a
Out[54]:
array([[1, 2, 3],
       [4, 5, 6]])
In [55]: b
Out[55]: array([10, 20, 30, 40])
In [56]: c = a[:, :, None] * b[None, None, :]
In [57]: c
Out[57]:
array([[[ 10,  20,  30,  40],
        [ 20,  40,  60,  80],
        [ 30,  60,  90, 120]],

       [[ 40,  80, 120, 160],
        [ 50, 100, 150, 200],
        [ 60, 120, 180, 240]]])
In [58]: c.shape
Out[58]: (2L, 3L, 4L)
Here I have multiplied them, but it is of course possible to use any other combination. The key things to understand are that:
- You can only apply operations to arrays of similar shapes.
- You can 'emulate' bigger shapes via broadcasting.
This way, we can combine a (2, 3) and a (4,) arrays by 'viewing' them both as (2, 3, 4) arrays and combining them in some way. But you have to have them make sense as arrays of similar shapes.
Alternatively, you can concatenate arrays if all their dimensions but one agree. Say you have a (2, 3, 4, 5) array and a (2, 3, 7, 5) array: then it makes sense to concatenate them into a (2, 3, 4+7, 5) array. You can again do that by 'emulating' shapes (basically by repeating the pattern along any missing dimension), but from your question it's unclear whether this is what you're trying to achieve.
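A minimal sketch of that concatenation case:
import numpy as np

p = np.zeros((2, 3, 4, 5))
q = np.ones((2, 3, 7, 5))
r = np.concatenate((p, q), axis=2)  # all axes agree except axis 2
r.shape  # (2, 3, 11, 5)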
