Round labels and sum values in label-value pair 2d-numpy array - python

I have a 2d-numpy array that is essentially a list of label-value pairs. I have combined several of these matrices, but now I'm hoping to round the label to 4 decimal places and sum the values, such that:
[[70.00103, 1],
[70.02474, 1],
[70.02474, 1],
[70.024751, 1],
[71.009100, 1],
[79.0152, 1],
[79.0152633, 1],
[79.0152634, 1]]
becomes
[[70.001, 1],
[70.0247, 2],
[70.0248, 1],
[71.0091, 1],
[79.0152, 1],
[79.0153, 2]]
Any thoughts on how one might accomplish this in a speedy manner, using either numpy or pandas? Thanks!

In [10]:
import numpy as np
x=np.array([[70.00103, 1],[70.02474, 1],[70.02474, 1],[70.024751, 1],[71.009100, 1],[79.0152, 1],[79.0152633, 1],[79.0152634,1]])
x[:,0]=x[:,0].round(4)
x
Out[10]:
array([[ 70.001 , 1. ],
[ 70.0247, 1. ],
[ 70.0247, 1. ],
[ 70.0248, 1. ],
[ 71.0091, 1. ],
[ 79.0152, 1. ],
[ 79.0153, 1. ],
[ 79.0153, 1. ]])
In [14]:
import pandas as pd
pd.DataFrame(x).groupby(0).sum()
Out[14]:
70.0010 1
70.0247 2
70.0248 1
71.0091 1
79.0152 1
79.0153 2
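If you want a plain numpy array back rather than a DataFrame, as_index=False keeps the rounded label as a column (a small sketch continuing the session above):
pd.DataFrame(x).groupby(0, as_index=False).sum().values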

That's what np.around is for:
>>> A=np.array([[70.00103, 1],
... [70.02474, 1],
... [70.02474, 1],
... [70.024751, 1],
... [71.009100, 1],
... [79.0152, 1],
... [79.0152633, 1],
... [79.0152634, 1]])
>>>
>>> np.around(A, decimals=4)
array([[ 70.001 , 1. ],
[ 70.0247, 1. ],
[ 70.0247, 1. ],
[ 70.0248, 1. ],
[ 71.0091, 1. ],
[ 79.0152, 1. ],
[ 79.0153, 1. ],
[ 79.0153, 1. ]])
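Note that np.around only handles the rounding half of the problem; the duplicate labels still need their values summed. One pure-numpy option for that (a sketch using the A above) is np.unique with return_inverse plus np.bincount:
rounded = np.around(A[:, 0], decimals=4)
labels, inverse = np.unique(rounded, return_inverse=True)  # unique sorted labels + position of each row
sums = np.bincount(inverse, weights=A[:, 1])               # sum the values per label
result = np.column_stack((labels, sums))
np.unique returns the labels sorted, which matches the desired output here.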

Related

How do I apply cumulative sum to a numpy column based on another column having the same values?

I have numpy arrays with the following structure:
array([[ 113.555 , 1506. ],
[ 113.595 , 1460. ],
[ 113.605 , 3900. ],
[ 113.605 , 3002. ],
[ 113.612 , 3434. ],
...
[ 113.6351 , 2095. ]])
And I would like to sum the values of column two for those rows that have the same value in the first column.
So the above would become:
array([[ 113.555 , 1506. ],
[ 113.595 , 1460. ],
[ 113.605 , 6902. ],
[ 113.612 , 3434. ],
...
[ 113.6351 , 2095. ]])
and will have one row fewer. The order must be preserved; the arrays are always sorted by the first column.
Which would be the numpy way of implementing this? Is there any method from the API that can be used?
I have tried to iterate and check for the previous value, but it does not seem like the right way to do it in numpy.
One option is a dictionary comprehension:
a = np.array([[ 113.555 , 1506. ],
[ 113.595 , 1460. ],
[ 113.605 , 3900. ],
[ 113.605 , 3002. ],
[ 113.612 , 3434. ],
[ 113.6351 , 2095. ]])
sums = {i[0]: i[1] if a[c][0] != a[c-1][0] else a[c][1] + a[c-1][1]
        for c, i in enumerate(a)}
np.array([[key, value] for key, value in sums.items()])
Output:
array([[ 113.555 , 1506.    ],
       [ 113.595 , 1460.    ],
       [ 113.605 , 6902.    ],
       [ 113.612 , 3434.    ],
       [ 113.6351, 2095.    ]])
Note, however, that this only merges runs of two equal keys: with three or more equal keys in a row, only the sum of the last two values survives, so the cumulative-sum approach below is more general.
There is a pretty simple solution that randomly occurred to me. We can use the normal cumulative sum as a building block. I'll explain the idea before showing the code.
Consider this example:
keys = [0, 0, 1, 2, 2, 2, 3, 3]
values = [1, 2, 3, 4, 5, 6, 7, 8]
We compute the cumulative sum over the values:
psums = [1, 3, 6, 10, 15, 21, 28, 36]
The values that interest us are the last values per sequence of equal keys (plus the very last value). How do we get this? In scalar code, keys[i] != keys[i + 1], in vectorized form keys[:-1] != keys[1:] (plus the very last value).
keys  = [0, 0, 1,  2,  2,  2,  3,  3]
psums = [1, 3, 6, 10, 15, 21, 28, 36]
            ^  ^          ^       ^
diffs = [0, 1, 1,  0,  0,  1,  0,  1]
ends  = [3, 6, 21, 36]
Now it should be easy to see that the final result we want is the difference between each of these values and its predecessor, except for the first value:
np.append(ends[0], ends[1:] - ends[:-1])
Putting this all together:
arr = np.array([[ 113.555 , 1506. ],
[ 113.595 , 1460. ],
[ 113.605 , 3900. ],
[ 113.605 , 3002. ],
[ 113.612 , 3434. ],
[ 113.6351 , 2095. ]])
keys = arr[:, 0]
values = arr[:, 1]
psums = np.cumsum(values)
diffs = np.append(keys[:-1] != keys[1:], True)
ends = psums[diffs]
sums = np.append(ends[0], ends[1:] - ends[:-1])
result = np.stack((keys[diffs], sums), axis=-1)
result then holds:
array([[ 113.555 , 1506.    ],
[ 113.595 , 1460. ],
[ 113.605 , 6902. ],
[ 113.612 , 3434. ],
[ 113.6351, 2095. ]])
Warning
This approach is numerically unstable when used for floating point. A small sum at the end of the list is computed as the difference of two large partial sums. This will lead to catastrophic cancellation.
However, for integers it works fine. Even with overflow, the wrap-around ensures that the final result is okay.
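If floating point accuracy does matter, one alternative (a sketch reusing keys and values from above) avoids the subtraction of partial sums entirely by letting np.add.reduceat sum each run of equal keys directly:
starts = np.flatnonzero(np.append(True, keys[1:] != keys[:-1]))  # first index of each run
sums = np.add.reduceat(values, starts)                           # sum within each run
result = np.stack((keys[starts], sums), axis=-1)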

Numpy get values from np.argmin indices [duplicate]

This question already has answers here:
How to take elements along a given axis, given by their indices?
(4 answers)
indexing a numpy array with indices from another array
(1 answer)
Closed 4 years ago.
Let's say I have d1, d2 and d3 as follows. t is a variable where I've combined my arrays, and m contains the indices of the smallest values.
>>> d1
array([[ 0.9850916 , 0.95004463, 1.35728604, 1.18554035],
[ 0.47624542, 0.45561795, 0.6231743 , 0.94746001],
[ 0.74008166, 0. , 1.59774065, 1.00423774],
[ 0.86173439, 0.70940862, 1.0601817 , 0.96112015],
[ 1.03413477, 0.64874991, 1.27488263, 0.80250053]])
>>> d2
array([[ 0.27301946, 0.38387185, 0.93215524, 0.98851404],
[ 0.17996978, 0. , 0.41283798, 0.15204035],
[ 0.10952115, 0.45561795, 0.5334015 , 0.75242805],
[ 0.4600214 , 0.74100962, 0.16743427, 0.36250385],
[ 0.60984208, 0.35161234, 0.44580535, 0.6713633 ]])
>>> d3
array([[ 0. , 0.19658541, 1.14605925, 1.18431945],
[ 0.10697428, 0.27301946, 0.45536417, 0.11922118],
[ 0.42153386, 0.9850916 , 0.28225364, 0.82765657],
[ 1.04940684, 1.63082272, 0.49987388, 0.38596938],
[ 0.21015723, 1.07007177, 0.22599987, 0.89288339]])
>>> t = np.array([d1, d2, d3])
>>> t
array([[[ 0.9850916 , 0.95004463, 1.35728604, 1.18554035],
[ 0.47624542, 0.45561795, 0.6231743 , 0.94746001],
[ 0.74008166, 0. , 1.59774065, 1.00423774],
[ 0.86173439, 0.70940862, 1.0601817 , 0.96112015],
[ 1.03413477, 0.64874991, 1.27488263, 0.80250053]],
[[ 0.27301946, 0.38387185, 0.93215524, 0.98851404],
[ 0.17996978, 0. , 0.41283798, 0.15204035],
[ 0.10952115, 0.45561795, 0.5334015 , 0.75242805],
[ 0.4600214 , 0.74100962, 0.16743427, 0.36250385],
[ 0.60984208, 0.35161234, 0.44580535, 0.6713633 ]],
[[ 0. , 0.19658541, 1.14605925, 1.18431945],
[ 0.10697428, 0.27301946, 0.45536417, 0.11922118],
[ 0.42153386, 0.9850916 , 0.28225364, 0.82765657],
[ 1.04940684, 1.63082272, 0.49987388, 0.38596938],
[ 0.21015723, 1.07007177, 0.22599987, 0.89288339]]])
>>> m = np.argmin(t, axis=0)
>>> m
array([[2, 2, 1, 1],
[2, 1, 1, 2],
[1, 0, 2, 1],
[1, 0, 1, 1],
[2, 1, 2, 1]])
From m and t, I want to extract the actual minimum values, as shown below. How do I do this, preferably in an efficient way?
array([ [ 0. , 0.19658541, 0.93215524, 0.98851404],
[ 0.10697428, 0. , 0.41283798, 0.11922118],
[ 0.10952115, 0. , 0.28225364, 0.75242805],
[ 0.4600214 , 0.70940862, 0.16743427, 0.36250385],
[ 0.21015723, 0.35161234, 0.22599987, 0.6713633 ]])
If the minimum itself is all you need, you can use np.min(t, axis=0).
If you want to use the indices for custom indexing, you can use choose:
m.choose(t) # This will return the same thing.
It can also be written as
np.choose(m, t)
Which returns:
array([[0. , 0.19658541, 0.93215524, 0.98851404],
[0.10697428, 0. , 0.41283798, 0.11922118],
[0.10952115, 0. , 0.28225364, 0.75242805],
[0.4600214 , 0.70940862, 0.16743427, 0.36250385],
[0.21015723, 0.35161234, 0.22599987, 0.6713633 ]])
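One caveat: np.choose supports at most 32 choice arrays. If you may stack more arrays than that, np.take_along_axis (available in numpy 1.15+) performs the same selection without that limit:
np.take_along_axis(t, m[np.newaxis], axis=0)[0]  # same result as np.choose(m, t)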

Python - add 1D-array as column of 2D

I want to add a vector as the first column of my 2D array, which looks like:
[[ 1. 0. 0. nan]
[ 4. 4. 9.97 1. ]
[ 4. 4. 27.94 1. ]
[ 2. 1. 4.17 1. ]
[ 3. 2. 38.22 1. ]
[ 4. 4. 31.83 1. ]
[ 3. 4. 41.87 1. ]
[ 2. 1. 18.33 1. ]
[ 4. 4. 33.96 1. ]
[ 2. 1. 5.65 1. ]
[ 3. 3. 40.74 1. ]
[ 2. 1. 10.04 1. ]
[ 2. 2. 53.15 1. ]]
I want to add an array of 13 elements as the first column of the matrix. I tried np.column_stack and np.append, but they either expect 1D arrays or don't work because I can't choose axis=1 and can only do np.append(peak_values, results).
Here is a very simple option using numpy:
x = np.array( [[ 3.9427767, -4.297677 ],
[ 3.9427767, -4.297677 ],
[ 3.9427767, -4.297677 ],
[ 3.9427767, -4.297677 ],
[ 3.942777 , -4.297677 ],
[ 3.9427767, -4.297677 ],
[ 3.9427767, -4.297677 ],
[ 3.9427767 ,-4.297677 ],
[ 3.9427767, -4.297677 ],
[ 3.9427772 ,-4.297677 ]])
b = np.arange(10).reshape(1, -1)
np.concatenate((b.T, x), axis=1)
Output-
array([[ 0. , 3.9427767, -4.297677 ],
[ 1. , 3.9427767, -4.297677 ],
[ 2. , 3.9427767, -4.297677 ],
[ 3. , 3.9427767, -4.297677 ],
[ 4. , 3.942777 , -4.297677 ],
[ 5. , 3.9427767, -4.297677 ],
[ 6. , 3.9427767, -4.297677 ],
[ 7. , 3.9427767, -4.297677 ],
[ 8. , 3.9427767, -4.297677 ],
[ 9. , 3.9427772, -4.297677 ]])
Improving on this answer by removing the unnecessary transposition: you can use reshape(-1, 1) to turn the 1-D array you'd like to prepend into a 2-D array with a single column. At that point the arrays differ in shape only along the second axis, and np.concatenate accepts them:
>>> import numpy as np
>>> a = np.arange(12).reshape(3, 4)
>>> b = np.arange(3)
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> b
array([0, 1, 2])
>>> b.reshape(-1, 1) # preview the reshaping...
array([[0],
[1],
[2]])
>>> np.concatenate((b.reshape(-1, 1), a), axis=1)
array([[ 0, 0, 1, 2, 3],
[ 1, 4, 5, 6, 7],
[ 2, 8, 9, 10, 11]])
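For what it's worth, np.column_stack and np.insert wrap up the same reshaping logic, so either of these one-liners (with the same a and b as above) should give the identical result:
>>> np.column_stack((b, a))      # stacks 1-D b as a column automatically
>>> np.insert(a, 0, b, axis=1)   # inserts b as column 0 of a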
If your data are plain Python lists, you don't even need numpy. With new_column as the list you want to prepend:
new_array = [[v] + list(row) for v, row in zip(new_column, your_array)]
(Simply calling your_array.append(new_column) would not add a column; it would tack the whole vector on as a single element.)
I would suggest using numpy; it will let you do what you want easily. Here is an example of squaring the even elements of a list; you can index elements with something like nums[0].
nums = [0, 1, 2, 3, 4]
even_squares = [x ** 2 for x in nums if x % 2 == 0]
print(even_squares)  # prints [0, 4, 16]

Creating Multidimensional arrays and interpolating in Python

I have 8 arrays; when each one is plotted, it gives 'x vs. detection probability'. I want to combine these arrays so that I can perform a multidimensional interpolation to find the detection probability from variables in each of the dimensions.
Here are a couple of my arrays as an example.
In [3]: mag_rec
Out[3]:
array([[ 1.35000000e+01, 0.00000000e+00],
[ 1.38333333e+01, 5.38461538e-01],
[ 1.41666667e+01, 5.84158416e-01],
[ 1.45000000e+01, 6.93771626e-01],
[ 1.48333333e+01, 7.43629344e-01],
[ 1.51666667e+01, 8.30774480e-01],
[ 1.55000000e+01, 8.74700571e-01],
[ 1.58333333e+01, 8.84866920e-01],
[ 1.61666667e+01, 8.95135908e-01],
[ 1.65000000e+01, 8.97150997e-01],
[ 1.68333333e+01, 8.90416846e-01],
[ 1.71666667e+01, 8.90911598e-01],
[ 1.75000000e+01, 8.90111460e-01],
[ 1.78333333e+01, 8.89567069e-01],
[ 1.81666667e+01, 8.82184730e-01],
[ 1.85000000e+01, 8.76020265e-01],
[ 1.88333333e+01, 8.54947843e-01],
[ 1.91666667e+01, 8.43505477e-01],
[ 1.95000000e+01, 8.24739363e-01],
[ 1.98333333e+01, 7.70070922e-01],
[ 2.01666667e+01, 6.33006993e-01],
[ 2.05000000e+01, 4.45367502e-01],
[ 2.08333333e+01, 2.65029636e-01],
[ 2.11666667e+01, 1.22023390e-01],
[ 2.15000000e+01, 4.02201524e-02],
[ 2.18333333e+01, 1.51190986e-02],
[ 2.21666667e+01, 8.75088215e-03],
[ 2.25000000e+01, 4.39466969e-03],
[ 2.28333333e+01, 3.65476525e-03]])
and
In [5]: lmt_mag
Out[5]:
array([[ 16.325 , 0.35 ],
[ 16.54166667, 0.39583333],
[ 16.75833333, 0.35555556],
[ 16.975 , 0.29666667],
[ 17.19166667, 0.42222222],
[ 17.40833333, 0.38541667],
[ 17.625 , 0.4875 ],
[ 17.84166667, 0.41956242],
[ 18.05833333, 0.45333333],
[ 18.275 , 0.45980392],
[ 18.49166667, 0.46742424],
[ 18.70833333, 0.4952381 ],
[ 18.925 , 0.49423077],
[ 19.14166667, 0.53375 ],
[ 19.35833333, 0.56239316],
[ 19.575 , 0.52217391],
[ 19.79166667, 0.55590909],
[ 20.00833333, 0.57421227],
[ 20.225 , 0.5729304 ],
[ 20.44166667, 0.61708204],
[ 20.65833333, 0.63968037],
[ 20.875 , 0.65627395],
[ 21.09166667, 0.66177885],
[ 21.30833333, 0.69375 ],
[ 21.525 , 0.67083333],
[ 21.95833333, 0.88333333],
[ 22.175 , 0.85833333]])
How, in Python, would I go about combining these arrays into a multidimensional array? (More arrays will have to be included)
Further to this, once I have this multidimensional array, is scipy.ndimage.interpolation.map_coordinates the fastest way to interpolate on this?
You can concatenate your arrays with numpy.concatenate((a1, a2, ...), axis=0), and for adding dimensions numpy has several functions (such as np.expand_dims) that you can use depending on your needs.
For example:
>>> import numpy as np
>>> a = np.array([[1, 2], [3, 4]])
>>> b = np.array([[5, 6]])
>>> y = np.concatenate((a, b), axis=0)
>>> y
array([[1, 2],
[3, 4],
[5, 6]])
>>> np.expand_dims(y, axis=0)
array([[[1, 2],
[3, 4],
[5, 6]]])
>>> np.expand_dims(y, axis=2)
array([[[1],
[2]],
[[3],
[4]],
[[5],
[6]]])
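As for the interpolation part of the question: scipy.ndimage.interpolation.map_coordinates works in index space, so you would have to convert variable values to fractional array indices yourself. If your detection probabilities lie on a regular grid of the input variables, a more direct route is scipy.interpolate.RegularGridInterpolator. A sketch, where the axes and probability grid are hypothetical stand-ins for your real data:
import numpy as np
from scipy.interpolate import RegularGridInterpolator

mag = np.linspace(13.5, 22.83, 29)     # hypothetical axis built from mag_rec
lmt = np.linspace(16.325, 22.175, 27)  # hypothetical axis built from lmt_mag
prob = np.random.rand(29, 27)          # stand-in for the real probability grid

interp = RegularGridInterpolator((mag, lmt), prob)
interp([[18.0, 19.5]])                 # probability at mag=18.0, lmt_mag=19.5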

How to get euclidean distance on a 3x3x3 array in numpy

say I have a (3,3,3) array like this.
array([[[1, 1, 1],
[1, 1, 1],
[0, 0, 0]],
[[2, 2, 2],
[2, 2, 2],
[2, 2, 2]],
[[3, 3, 3],
[3, 3, 3],
[1, 1, 1]]])
How do I get the 9 values corresponding to the euclidean distance between each vector of 3 values and the corresponding vector in the zeroth block?
Such as doing a numpy.linalg.norm([1,1,1] - [1,1,1]) 2 times, and then doing norm([0,0,0] - [0,0,0]), and then norm([2,2,2] - [1,1,1]) 2 times, norm([2,2,2] - [0,0,0]), then norm([3,3,3] - [1,1,1]) 2 times, and finally norm([1,1,1] - [0,0,0]).
Any good ways to vectorize this? I want to store the distances in a (3,3,1) matrix.
The result would be:
array([[[0.  ],
        [0.  ],
        [0.  ]],
       [[1.73],
        [1.73],
        [3.46]],
       [[3.46],
        [3.46],
        [1.73]]])
The keepdims argument was added in numpy 1.7; you can use it to keep the summed axis:
np.sum((x - [1, 1, 1])**2, axis=-1, keepdims=True)**0.5
the result is:
[[[ 0.        ]
  [ 0.        ]
  [ 1.73205081]]
 [[ 1.73205081]
  [ 1.73205081]
  [ 1.73205081]]
 [[ 3.46410162]
  [ 3.46410162]
  [ 0.        ]]]
Edit: to subtract the first block x[0] (as the question asks) rather than a hardcoded [1, 1, 1]:
np.sum((x - x[0])**2, axis=-1, keepdims=True)**0.5
the result is:
array([[[ 0.        ],
        [ 0.        ],
        [ 0.        ]],
       [[ 1.73205081],
        [ 1.73205081],
        [ 3.46410162]],
       [[ 3.46410162],
        [ 3.46410162],
        [ 1.73205081]]])
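In recent numpy versions, np.linalg.norm also accepts axis and keepdims arguments, so the same computation can be written more directly as:
np.linalg.norm(x - x[0], axis=-1, keepdims=True)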
You might want to consider scipy.spatial.distance.cdist(), which efficiently computes distances between pairs of points in two collections of inputs (with a standard euclidean metric, among others). Here's example code:
import numpy as np
import scipy.spatial.distance as dist
i = np.array([[[1, 1, 1],
[1, 1, 1],
[0, 0, 0]],
[[2, 2, 2],
[2, 2, 2],
[2, 2, 2]],
[[3, 3, 3],
[3, 3, 3],
[1, 1, 1]]])
n,m,o = i.shape
# compute euclidean distances of each vector to the origin
# reshape input array to 2-D, as required by cdist
# only keep diagonal, as cdist computes all pairwise distances
# reshape result, adapting it to input array and required output
d = dist.cdist(i.reshape(n*m,o),i[0]).reshape(n,m,o).diagonal(axis1=2).reshape(n,m,1)
d holds:
array([[[ 0. ],
[ 0. ],
[ 0. ]],
[[ 1.73205081],
[ 1.73205081],
[ 3.46410162]],
[[ 3.46410162],
[ 3.46410162],
[ 1.73205081]]])
The big caveat of this approach is that we're calculating n*m*o distances, when we only need n*m (and that it involves an insane amount of reshaping).
I'm doing something similar: computing the sum of squared distances (SSD) for each pair of frames in a video volume. I think it could be helpful for you.
Here, video_volume is a single 4d numpy array with dimensions (time, rows, cols, 3) and dtype np.uint8. The output is a square 2d numpy array of dtype float, where output[i, j] contains the SSD between frames i and j.
video_volume = video_volume.astype(float)
size_t = video_volume.shape[0]
output = np.zeros((size_t, size_t), dtype=float)  # note: np.float was removed in numpy 1.24; use float
for i in range(size_t):
    for j in range(size_t):
        output[i, j] = np.square(video_volume[i] - video_volume[j]).sum()
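The double loop can also be vectorized using the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2*a.b (a sketch, with video_volume and size_t as above):
flat = video_volume.reshape(size_t, -1)    # one flattened row per frame
sq = np.einsum('ij,ij->i', flat, flat)     # squared norm of each frame
output = sq[:, None] + sq[None, :] - 2.0 * flat @ flat.T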
