Creating multidimensional arrays and interpolating in Python

I have 8 arrays; when each one is plotted, it gives 'x vs. detection probability'. I want to combine these arrays so that I can perform a multidimensional interpolation to find the detection probability from variables in each of the dimensions.
Here are a couple of my arrays as an example.
In [3]: mag_rec
Out[3]:
array([[ 1.35000000e+01, 0.00000000e+00],
[ 1.38333333e+01, 5.38461538e-01],
[ 1.41666667e+01, 5.84158416e-01],
[ 1.45000000e+01, 6.93771626e-01],
[ 1.48333333e+01, 7.43629344e-01],
[ 1.51666667e+01, 8.30774480e-01],
[ 1.55000000e+01, 8.74700571e-01],
[ 1.58333333e+01, 8.84866920e-01],
[ 1.61666667e+01, 8.95135908e-01],
[ 1.65000000e+01, 8.97150997e-01],
[ 1.68333333e+01, 8.90416846e-01],
[ 1.71666667e+01, 8.90911598e-01],
[ 1.75000000e+01, 8.90111460e-01],
[ 1.78333333e+01, 8.89567069e-01],
[ 1.81666667e+01, 8.82184730e-01],
[ 1.85000000e+01, 8.76020265e-01],
[ 1.88333333e+01, 8.54947843e-01],
[ 1.91666667e+01, 8.43505477e-01],
[ 1.95000000e+01, 8.24739363e-01],
[ 1.98333333e+01, 7.70070922e-01],
[ 2.01666667e+01, 6.33006993e-01],
[ 2.05000000e+01, 4.45367502e-01],
[ 2.08333333e+01, 2.65029636e-01],
[ 2.11666667e+01, 1.22023390e-01],
[ 2.15000000e+01, 4.02201524e-02],
[ 2.18333333e+01, 1.51190986e-02],
[ 2.21666667e+01, 8.75088215e-03],
[ 2.25000000e+01, 4.39466969e-03],
[ 2.28333333e+01, 3.65476525e-03]])
and
In [5]: lmt_mag
Out[5]:
array([[ 16.325 , 0.35 ],
[ 16.54166667, 0.39583333],
[ 16.75833333, 0.35555556],
[ 16.975 , 0.29666667],
[ 17.19166667, 0.42222222],
[ 17.40833333, 0.38541667],
[ 17.625 , 0.4875 ],
[ 17.84166667, 0.41956242],
[ 18.05833333, 0.45333333],
[ 18.275 , 0.45980392],
[ 18.49166667, 0.46742424],
[ 18.70833333, 0.4952381 ],
[ 18.925 , 0.49423077],
[ 19.14166667, 0.53375 ],
[ 19.35833333, 0.56239316],
[ 19.575 , 0.52217391],
[ 19.79166667, 0.55590909],
[ 20.00833333, 0.57421227],
[ 20.225 , 0.5729304 ],
[ 20.44166667, 0.61708204],
[ 20.65833333, 0.63968037],
[ 20.875 , 0.65627395],
[ 21.09166667, 0.66177885],
[ 21.30833333, 0.69375 ],
[ 21.525 , 0.67083333],
[ 21.95833333, 0.88333333],
[ 22.175 , 0.85833333]])
How, in Python, would I go about combining these arrays into a multidimensional array? (More arrays will have to be included)
Further to this, once I have this multidimensional array, is scipy.ndimage.interpolation.map_coordinates the fastest way to interpolate on this?

You can concatenate your arrays with numpy.concatenate((a1, a2, ...), axis=0), and for adding or rearranging dimensions NumPy has several different functions that you can use depending on your need.
e.g. Demo:
>>> import numpy as np
>>> a = np.array([[1, 2], [3, 4]])
>>> b = np.array([[5, 6]])
>>> np.concatenate((a, b), axis=0)
array([[1, 2],
[3, 4],
[5, 6]])
>>> y = np.concatenate((a, b), axis=0)
>>> np.expand_dims(y, axis=0)
array([[[1, 2],
[3, 4],
[5, 6]]])
>>> np.expand_dims(y, axis=2)
array([[[1],
[2]],
[[3],
[4]],
[[5],
[6]]])

Related

How do I apply cumulative sum to a numpy column based on another column having the same values?

I have numpy arrays with the following structure:
array([[ 113.555 , 1506. ],
[ 113.595 , 1460. ],
[ 113.605 , 3900. ],
[ 113.605 , 3002. ],
[ 113.612 , 3434. ],
...
[ 113.6351 , 2095. ]])
And I would like to group the values of column two for those rows with the same value in first column.
So the above would become:
array([[ 113.555 , 1506. ],
[ 113.595 , 1460. ],
[ 113.605 , 6902. ],
[ 113.612 , 3434. ],
...
[ 113.6351 , 2095. ]])
and will have one less row. The order shall be respected. The arrays are always ordered by the first column.
Which would be the numpy way of implementing this? Is there any method from the API that can be used?
I have tried to iterate and check for the previous value, but it does not seem like the right way to do it in numpy.
One option is a dictionary comprehension (note that this relies on dicts preserving insertion order, and it only merges runs of at most two equal keys):
a = np.array([[ 113.555 , 1506. ],
[ 113.595 , 1460. ],
[ 113.605 , 3900. ],
[ 113.605 , 3002. ],
[ 113.612 , 3434. ],
[ 113.6351 , 2095. ]])
np.array([[key, value] for key, value in iter({i[0]:i[1] if a[c][0]!=a[c-1][0] else a[c][1]+a[c-1][1] for c,i in enumerate(a) }.items())])
Output:
array([[  113.555 ,  1506.    ],
       [  113.595 ,  1460.    ],
       [  113.605 ,  6902.    ],
       [  113.612 ,  3434.    ],
       [  113.6351,  2095.    ]])
There is a pretty simple solution that randomly occurred to me. We can use the normal cumulative sum as a building block. I'll explain the idea before showing the code.
Consider this example:
keys = [0, 0, 1, 2, 2, 2, 3, 3]
values = [1, 2, 3, 4, 5, 6, 7, 8]
We compute the cumulative sum over the values:
psums = [1, 3, 6, 10, 15, 21, 28, 36]
The values that interest us are the last values per sequence of equal keys (plus the very last value). How do we get this? In scalar code, keys[i] != keys[i + 1], in vectorized form keys[:-1] != keys[1:] (plus the very last value).
keys  = [0, 0, 1,  2,  2,  2,  3,  3]
psums = [1, 3, 6, 10, 15, 21, 28, 36]
            ^  ^          ^       ^
diffs = [0, 1, 1,  0,  0,  1,  0,  1]
ends  = [3, 6, 21, 36]
Now it should be easy to see that the final result that we want is the difference between the value and its predecessor, except for the first value. np.append(ends[0], ends[1:] - ends[:-1])
Putting this all together:
arr = np.array([[ 113.555 , 1506. ],
[ 113.595 , 1460. ],
[ 113.605 , 3900. ],
[ 113.605 , 3002. ],
[ 113.612 , 3434. ],
[ 113.6351 , 2095. ]])
keys = arr[:, 0]
values = arr[:, 1]
psums = np.cumsum(values)
diffs = np.append(keys[:-1] != keys[1:], True)
ends = psums[diffs]
sums = np.append(ends[0], ends[1:] - ends[:-1])
result = np.stack((keys[diffs], sums), axis=-1)
result = array([[ 113.555 , 1506. ],
[ 113.595 , 1460. ],
[ 113.605 , 6902. ],
[ 113.612 , 3434. ],
[ 113.6351, 2095. ]])
Warning
This approach is numerically unstable when used for floating point. A small sum at the end of the list is computed as the difference of two large partial sums. This will lead to catastrophic cancellation.
However, for integers it works fine. Even with overflow, the wrap-around ensures that the final result is okay.
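If the cancellation matters for your float data, a variant of the same idea sums each run of equal keys directly with np.add.reduceat instead of differencing partial sums (still assuming the array is sorted by the first column):

```python
import numpy as np

arr = np.array([[113.555 , 1506.],
                [113.595 , 1460.],
                [113.605 , 3900.],
                [113.605 , 3002.],
                [113.612 , 3434.],
                [113.6351, 2095.]])

keys = arr[:, 0]
# Index where each run of equal keys starts
starts = np.flatnonzero(np.append(True, keys[1:] != keys[:-1]))
# Sum each run directly -- no difference of large partial sums
sums = np.add.reduceat(arr[:, 1], starts)
result = np.stack((keys[starts], sums), axis=-1)
```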

Creating a matrix of matrices using numpy.array()

I've been trying to create a matrix of matrices using the numpy function numpy.array() and am facing difficulties
I'm specifically trying to create the following matrix
[
[
[ [
[ 1 ,2 ] [ 1 , 2 ]
[ 3 ,4 ] [ 3 , 4 ]
] , ]
]
[
[ [
[ 1 ,2 ] [ 1 , 2 ]
[ 3 ,4 ] [ 3 , 4 ]
] , ]
]
]
more precisely like this one
I've tried the following line in Jupyter
x = np.array( [
[ [ 1,2 ] ,[ 3, 4] ] , [ [ 1,2 ] ,[ 3, 4] ] ,
[ [ 1,2 ] ,[ 3, 4] ] , [ [ 1,2 ] ,[ 3, 4] ]
])
but what it does is put all the 2x2 matrices in row-wise form.
I'm not able to take 2 (2x2) matrices in row form and replicate them in columns, or 2 (2x2) matrices in column form and replicate them into rows.
Any idea how to create this using numpy.array() or any other approach (using numpy functions)?
It seems simple, but I'm finding it difficult to formulate the code.
Thanks in advance.
>>> a = np.array([[[[1,2],[3,4]], [[1,2], [3,4]]], [[[1,2],[3,4]], [[1,2], [3,4]]]])
>>> a
array([[[[1, 2],
[3, 4]],
[[1, 2],
[3, 4]]],
[[[1, 2],
[3, 4]],
[[1, 2],
[3, 4]]]])
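Since all four 2x2 blocks in this example are identical, a sketch using np.tile builds the same array without typing out the nested lists:

```python
import numpy as np

block = np.array([[1, 2], [3, 4]])
# Repeat the block along two new leading axes -> shape (2, 2, 2, 2)
a = np.tile(block, (2, 2, 1, 1))
```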

Python - add 1D-array as column of 2D

I want to add a vector as the first column of my 2D array which looks like :
[[ 1. 0. 0. nan]
[ 4. 4. 9.97 1. ]
[ 4. 4. 27.94 1. ]
[ 2. 1. 4.17 1. ]
[ 3. 2. 38.22 1. ]
[ 4. 4. 31.83 1. ]
[ 3. 4. 41.87 1. ]
[ 2. 1. 18.33 1. ]
[ 4. 4. 33.96 1. ]
[ 2. 1. 5.65 1. ]
[ 3. 3. 40.74 1. ]
[ 2. 1. 10.04 1. ]
[ 2. 2. 53.15 1. ]]
I want to add an array of 13 elements as the first column of the matrix. I tried np.column_stack and np.append, but they are for 1D vectors or don't work because I can't choose axis=1 and can only do np.append(peak_values, results).
I have a very simple option for you using numpy -
x = np.array( [[ 3.9427767, -4.297677 ],
[ 3.9427767, -4.297677 ],
[ 3.9427767, -4.297677 ],
[ 3.9427767, -4.297677 ],
[ 3.942777 , -4.297677 ],
[ 3.9427767, -4.297677 ],
[ 3.9427767, -4.297677 ],
[ 3.9427767 ,-4.297677 ],
[ 3.9427767, -4.297677 ],
[ 3.9427772 ,-4.297677 ]])
b = np.arange(10).reshape(-1,1)
np.concatenate((b, x), axis=1)
Output-
array([[ 0. , 3.9427767, -4.297677 ],
[ 1. , 3.9427767, -4.297677 ],
[ 2. , 3.9427767, -4.297677 ],
[ 3. , 3.9427767, -4.297677 ],
[ 4. , 3.942777 , -4.297677 ],
[ 5. , 3.9427767, -4.297677 ],
[ 6. , 3.9427767, -4.297677 ],
[ 7. , 3.9427767, -4.297677 ],
[ 8. , 3.9427767, -4.297677 ],
[ 9. , 3.9427772, -4.297677 ]])
You can use reshape(-1, 1) to transform the 1d array you'd like to prepend along axis 1 into a 2d array with a single column. At that point, the arrays only differ in shape along the second axis and np.concatenate accepts the arguments:
>>> import numpy as np
>>> a = np.arange(12).reshape(3, 4)
>>> b = np.arange(3)
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> b
array([0, 1, 2])
>>> b.reshape(-1, 1) # preview the reshaping...
array([[0],
[1],
[2]])
>>> np.concatenate((b.reshape(-1, 1), a), axis=1)
array([[ 0, 0, 1, 2, 3],
[ 1, 4, 5, 6, 7],
[ 2, 8, 9, 10, 11]])
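As a shorthand, np.column_stack treats 1-D inputs as columns, so the reshape is done for you:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b = np.arange(3)
result = np.column_stack((b, a))   # b becomes the first column
```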
For the simplest answer, you probably don't even need numpy; with plain lists you can prepend a value to each row:
new_array = [[v] + list(row) for v, row in zip(first_column, your_array)]
Here first_column is the 13-element vector you want to prepend.
I would suggest using NumPy, which will let you do what you want easily. Here is a plain-Python example of squaring the even entries of a list; you can index the result with something like even_squares[0]:
nums = [0, 1, 2, 3, 4]
even_squares = [x ** 2 for x in nums if x % 2 == 0]
print(even_squares)  # Prints "[0, 4, 16]"

Filter a numpy array based on largest value

I have a numpy array which holds 4-dimensional vectors with the format (x, y, z, w).
The size of the array is N x 4. The (x, y, z) entries are spatial locations, and w holds a particular measurement at that location. There can be multiple measurements (stored as floats) associated with one (x, y, z) position.
What I would like to do is filter the array, so that I get a new array where I get the maximum measurement corresponding with each (x, y, z) position.
So if my data is like:
x, y, z, w1
x, y, z, w2
x, y, z, w3
where w1 is greater than w2 and w3, the filtered data would be:
x, y, z, w1
So more concretely, say I have data like:
[[ 0.7732126 0.48649481 0.29771819 0.91622924]
[ 0.7732126 0.48649481 0.29771819 1.91622924]
[ 0.58294263 0.32025559 0.6925856 0.0524125 ]
[ 0.58294263 0.32025559 0.6925856 0.05 ]
[ 0.58294263 0.32025559 0.6925856 1.7 ]
[ 0.3239913 0.7786444 0.41692853 0.10467392]
[ 0.12080023 0.74853649 0.15356663 0.4505753 ]
[ 0.13536096 0.60319054 0.82018125 0.10445047]
[ 0.1877724 0.96060999 0.39697999 0.59078612]]
This should return
[[ 0.7732126 0.48649481 0.29771819 1.91622924]
[ 0.58294263 0.32025559 0.6925856 1.7 ]
[ 0.3239913 0.7786444 0.41692853 0.10467392]
[ 0.12080023 0.74853649 0.15356663 0.4505753 ]
[ 0.13536096 0.60319054 0.82018125 0.10445047]
[ 0.1877724 0.96060999 0.39697999 0.59078612]]
This is convoluted, but it is probably as good as you are going to get using numpy only...
First, we use lexsort to put all entries with the same coordinates together. With a being your sample array:
>>> perm = np.lexsort(a[:, 3::-1].T)
>>> a[perm]
array([[ 0.12080023, 0.74853649, 0.15356663, 0.4505753 ],
[ 0.7732126 , 0.48649481, 0.29771819, 0.91622924],
[ 0.7732126 , 0.48649481, 0.29771819, 1.91622924],
[ 0.1877724 , 0.96060999, 0.39697999, 0.59078612],
[ 0.3239913 , 0.7786444 , 0.41692853, 0.10467392],
[ 0.58294263, 0.32025559, 0.6925856 , 0.0524125 ],
[ 0.58294263, 0.32025559, 0.6925856 , 0.05 ],
[ 0.58294263, 0.32025559, 0.6925856 , 1.7 ],
[ 0.13536096, 0.60319054, 0.82018125, 0.10445047]])
Note that by reversing the axis, we are sorting by x, breaking ties with y, then z, then w.
Because it is the maximum we are looking for, we just need to take the last entry in every group, which is a pretty straightforward thing to do:
>>> a_sorted = a[perm]
>>> last = np.concatenate((np.all(a_sorted[:-1, :3] != a_sorted[1:, :3], axis=1),
[True]))
>>> a_unique_max = a_sorted[last]
>>> a_unique_max
array([[ 0.12080023, 0.74853649, 0.15356663, 0.4505753 ],
[ 0.13536096, 0.60319054, 0.82018125, 0.10445047],
[ 0.1877724 , 0.96060999, 0.39697999, 0.59078612],
[ 0.3239913 , 0.7786444 , 0.41692853, 0.10467392],
[ 0.58294263, 0.32025559, 0.6925856 , 1.7 ],
[ 0.7732126 , 0.48649481, 0.29771819, 1.91622924]])
If you would rather not have the output sorted, but keep them in the original order they came up in the original array, you can also get that with the aid of perm:
>>> a_unique_max[np.argsort(perm[last])]
array([[ 0.7732126 , 0.48649481, 0.29771819, 1.91622924],
[ 0.58294263, 0.32025559, 0.6925856 , 1.7 ],
[ 0.3239913 , 0.7786444 , 0.41692853, 0.10467392],
[ 0.12080023, 0.74853649, 0.15356663, 0.4505753 ],
[ 0.13536096, 0.60319054, 0.82018125, 0.10445047],
[ 0.1877724 , 0.96060999, 0.39697999, 0.59078612]])
This will only work for the maximum, and it comes as a by-product of the sorting. If you are after a different function, say the product of all same-coordinates entries, you could do something like:
>>> first = np.concatenate(([True],
np.all(a_sorted[:-1, :3] != a_sorted[1:, :3], axis=1)))
>>> a_unique_prods = np.multiply.reduceat(a_sorted, np.nonzero(first)[0])
And you will have to play a little around with these results to assemble your return array.
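One way to do that assembly (a sketch with a shortened sample array, reducing only the w column so that the coordinates are not multiplied together):

```python
import numpy as np

a = np.array([[0.7732126 , 0.48649481, 0.29771819, 0.91622924],
              [0.7732126 , 0.48649481, 0.29771819, 1.91622924],
              [0.58294263, 0.32025559, 0.6925856 , 0.0524125 ]])

perm = np.lexsort(a[:, 3::-1].T)
a_sorted = a[perm]
first = np.concatenate(([True],
                        np.all(a_sorted[:-1, :3] != a_sorted[1:, :3], axis=1)))
# Reduce only the measurement column, then reattach the coordinates
prods = np.multiply.reduceat(a_sorted[:, 3], np.flatnonzero(first))
result = np.column_stack((a_sorted[first, :3], prods))
```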
I see that you already got the pointer towards pandas in the comments. FWIW, here's how you can get the desired behavior, assuming you don't care about the final sort order since groupby changes it up.
In [14]: arr
Out[14]:
array([[ 0.7732126 , 0.48649481, 0.29771819, 0.91622924],
[ 0.7732126 , 0.48649481, 0.29771819, 1.91622924],
[ 0.58294263, 0.32025559, 0.6925856 , 0.0524125 ],
[ 0.58294263, 0.32025559, 0.6925856 , 0.05 ],
[ 0.58294263, 0.32025559, 0.6925856 , 1.7 ],
[ 0.3239913 , 0.7786444 , 0.41692853, 0.10467392],
[ 0.12080023, 0.74853649, 0.15356663, 0.4505753 ],
[ 0.13536096, 0.60319054, 0.82018125, 0.10445047],
[ 0.1877724 , 0.96060999, 0.39697999, 0.59078612]])
In [15]: import pandas as pd
In [16]: pd.DataFrame(arr)
Out[16]:
0 1 2 3
0 0.773213 0.486495 0.297718 0.916229
1 0.773213 0.486495 0.297718 1.916229
2 0.582943 0.320256 0.692586 0.052413
3 0.582943 0.320256 0.692586 0.050000
4 0.582943 0.320256 0.692586 1.700000
5 0.323991 0.778644 0.416929 0.104674
6 0.120800 0.748536 0.153567 0.450575
7 0.135361 0.603191 0.820181 0.104450
8 0.187772 0.960610 0.396980 0.590786
In [17]: pd.DataFrame(arr).groupby([0,1,2]).max().reset_index()
Out[17]:
0 1 2 3
0 0.120800 0.748536 0.153567 0.450575
1 0.135361 0.603191 0.820181 0.104450
2 0.187772 0.960610 0.396980 0.590786
3 0.323991 0.778644 0.416929 0.104674
4 0.582943 0.320256 0.692586 1.700000
5 0.773213 0.486495 0.297718 1.916229
You can start off by lex-sorting the input array to bring entries with identical first three elements together. Then create another 2D array to hold the last-column entries, such that the elements corresponding to each duplicate triplet go into the same row. Next, find the max along axis=1 of this 2D array to get the final max output for each unique triplet. Here's the implementation, assuming A is the input array -
# Lex sort A
sortedA = A[np.lexsort(A[:,:-1].T)]
# Mask of start of unique first three columns from A
start_unqA = np.append(True,~np.all(np.diff(sortedA[:,:-1],axis=0)==0,axis=1))
# Counts of unique first three columns from A
counts = np.bincount(start_unqA.cumsum()-1)
mask = np.arange(counts.max()) < counts[:,None]
# Group A's last column into rows based on uniqueness from first three columns
grpA = np.empty(mask.shape)
grpA.fill(np.nan)
grpA[mask] = sortedA[:,-1]
# Concatenate unique first three columns from A and
# corresponding max values for each such unique triplet
out = np.column_stack((sortedA[start_unqA,:-1],np.nanmax(grpA,axis=1)))
Sample run -
In [75]: A
Out[75]:
array([[ 1, 1, 1, 96],
[ 1, 2, 2, 48],
[ 2, 1, 2, 33],
[ 1, 1, 1, 24],
[ 1, 1, 1, 94],
[ 2, 2, 2, 5],
[ 2, 1, 1, 17],
[ 2, 2, 2, 62]])
In [76]: sortedA
Out[76]:
array([[ 1, 1, 1, 96],
[ 1, 1, 1, 24],
[ 1, 1, 1, 94],
[ 2, 1, 1, 17],
[ 2, 1, 2, 33],
[ 1, 2, 2, 48],
[ 2, 2, 2, 5],
[ 2, 2, 2, 62]])
In [77]: out
Out[77]:
array([[ 1., 1., 1., 96.],
[ 2., 1., 1., 17.],
[ 2., 1., 2., 33.],
[ 1., 2., 2., 48.],
[ 2., 2., 2., 62.]])
You can use logical indexing.
I will use random data for an example:
>>> myarr = np.random.random((6, 4))
>>> print(myarr)
[[ 0.7732126 0.48649481 0.29771819 0.91622924]
[ 0.58294263 0.32025559 0.6925856 0.0524125 ]
[ 0.3239913 0.7786444 0.41692853 0.10467392]
[ 0.12080023 0.74853649 0.15356663 0.4505753 ]
[ 0.13536096 0.60319054 0.82018125 0.10445047]
[ 0.1877724 0.96060999 0.39697999 0.59078612]]
To get the row or rows where the last column is the greatest, do this:
>>> greatest = myarr[myarr[:, 3]==myarr[:, 3].max()]
>>> print(greatest)
[[ 0.7732126 0.48649481 0.29771819 0.91622924]]
What this does is it gets the last column of myarr, and finds the maximum of that column, finds all the elements of that column equal to the maximum, and then gets the corresponding rows.
You can use np.argmax
x[np.argmax(x[:,3]),:]
>>> x = np.random.random((5,4))
>>> x
array([[ 0.25461146, 0.35671081, 0.54856798, 0.2027313 ],
[ 0.17079029, 0.66970362, 0.06533572, 0.31704254],
[ 0.4577928 , 0.69022073, 0.57128696, 0.93995176],
[ 0.29708841, 0.96324181, 0.78859008, 0.25433235],
[ 0.58739451, 0.17961551, 0.67993786, 0.73725493]])
>>> x[np.argmax(x[:,3]),:]
array([ 0.4577928 , 0.69022073, 0.57128696, 0.93995176])

Round labels and sum values in label-value pair 2d-numpy array

I have a 2d-Numpy array containing basically label-value pairs. I have combined several of these matrices, but I'm hoping to round the label to 4 decimal places and sum the values, such that:
[[70.00103, 1],
[70.02474, 1],
[70.02474, 1],
[70.024751, 1],
[71.009100, 1],
[79.0152, 1],
[79.0152633, 1],
[79.0152634, 1]]
becomes
[[70.001, 1],
[70.0247, 2],
[70.0248, 1],
[71.0091, 1],
[79.0152, 1],
[79.0153, 2]]
Any thoughts on how one might accomplish this in a speedy manner, using either numpy or pandas? Thanks!
In [10]:
import numpy as np
x=np.array([[70.00103, 1],[70.02474, 1],[70.02474, 1],[70.024751, 1],[71.009100, 1],[79.0152, 1],[79.0152633, 1],[79.0152634,1]])
x[:,0]=x[:,0].round(4)
x
Out[10]:
array([[ 70.001 , 1. ],
[ 70.0247, 1. ],
[ 70.0247, 1. ],
[ 70.0248, 1. ],
[ 71.0091, 1. ],
[ 79.0152, 1. ],
[ 79.0153, 1. ],
[ 79.0153, 1. ]])
In [14]:
import pandas as pd
pd.DataFrame(x).groupby(0).sum()
Out[14]:
70.0010 1
70.0247 2
70.0248 1
71.0091 1
79.0152 1
79.0153 2
That's what np.around is for:
>>> A=np.array([[70.00103, 1],
... [70.02474, 1],
... [70.02474, 1],
... [70.024751, 1],
... [71.009100, 1],
... [79.0152, 1],
... [79.0152633, 1],
... [79.0152634, 1]])
>>>
>>> np.around(A, decimals=4)
array([[ 70.001 , 1. ],
[ 70.0247, 1. ],
[ 70.0247, 1. ],
[ 70.0248, 1. ],
[ 71.0091, 1. ],
[ 79.0152, 1. ],
[ 79.0153, 1. ],
[ 79.0153, 1. ]])
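Combining the two steps without pandas (a sketch: round, stable-sort so equal labels end up adjacent, then sum each run with np.add.reduceat):

```python
import numpy as np

x = np.array([[70.00103, 1], [70.02474, 1], [70.02474, 1], [70.024751, 1],
              [71.009100, 1], [79.0152, 1], [79.0152633, 1], [79.0152634, 1]])

labels = np.around(x[:, 0], 4)
order = np.argsort(labels, kind='stable')
labels, values = labels[order], x[order, 1]
# Start index of each run of equal labels
starts = np.flatnonzero(np.append(True, labels[1:] != labels[:-1]))
result = np.column_stack((labels[starts], np.add.reduceat(values, starts)))
```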
