How can a two-column Python list be shuffled?

It is straightforward to shuffle a Python list:
>>> import random
>>> random.seed(100)
>>> a = [1, 2, 3, 4, 5, 6, 7]
>>> random.shuffle(a)
>>> a
[7, 1, 5, 6, 4, 3, 2]
How can a two-column Python list be shuffled?
So, to be specific, I have a list of the following representative form:
[
    [0.121282446303956, 2.1595318494748978, 0.43386778589085612],
    [1],
    [0.121282446303956, 2.1595318494748978, 0.43386778589085612],
    [2],
    [0.121282446303956, 2.1595318494748978, 0.43386778589085612],
    [3],
    [0.121282446303956, 2.1595318494748978, 0.43386778589085612],
    [4],
    # ... etc.
]
You can think of this as data that captures three characteristics of an event (the three high-precision numbers) in one column and the event numbers in a second column.
I want to shuffle this two-column list such that the events with their corresponding event numbers are shuffled. Note that I do not want to shuffle the list of event characteristics. So, in this example, the result of a shuffle could be the following:
[
    [0.121282446303956, 2.1595318494748978, 0.43386778589085612],
    [3],
    [0.121282446303956, 2.1595318494748978, 0.43386778589085612],
    [4],
    [0.121282446303956, 2.1595318494748978, 0.43386778589085612],
    [1],
    [0.121282446303956, 2.1595318494748978, 0.43386778589085612],
    [2],
    # ... etc.
]
You can see that the order of the events has been changed from 1, 2, 3, 4 to 3, 4, 1, 2.
What would be a good way to do this type of shuffle?

NumPy allows in-place shuffling of array slices, so you can shuffle just the odd-indexed rows (the event numbers):
import numpy as np
a = np.array(a, dtype=object)  # the rows have different lengths, so build an object array
np.random.shuffle(a[1::2])     # shuffles only the event-number rows, in place
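If converting to a NumPy array feels heavy here, a plain-Python sketch of the same idea (not from the original answer) shuffles only the odd-indexed event-number rows and leaves the characteristics rows where they are; it assumes a is still the nested list from the question:
import random

event_numbers = a[1::2]            # this slice is a copy, so a itself is untouched
random.shuffle(event_numbers)
shuffled = []
for chars, evt in zip(a[0::2], event_numbers):
    shuffled.extend([chars, evt])  # rebuild the alternating layout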

Related

How do I apply cumulative sum to a numpy column based on another column having the same values?

I have numpy arrays with the following structure:
array([[ 113.555 , 1506. ],
[ 113.595 , 1460. ],
[ 113.605 , 3900. ],
[ 113.605 , 3002. ],
[ 113.612 , 3434. ],
...
[ 113.6351 , 2095. ]])
And I would like to sum the values of column two for those rows that have the same value in the first column.
So the above would become:
array([[ 113.555 , 1506. ],
[ 113.595 , 1460. ],
[ 113.605 , 6902. ],
[ 113.612 , 3434. ],
...
[ 113.6351 , 2095. ]])
and will have one row fewer. The order should be preserved; the arrays are always sorted by the first column.
What would be the NumPy way of implementing this? Is there a method in the API that can be used for it?
I have tried to iterate and check for the previous value, but it does not seem like the right way to do it in numpy.
You could use a dictionary comprehension:
a = np.array([[ 113.555 , 1506. ],
[ 113.595 , 1460. ],
[ 113.605 , 3900. ],
[ 113.605 , 3002. ],
[ 113.612 , 3434. ],
[ 113.6351 , 2095. ]])
summed = {
    row[0]: row[1] if a[c][0] != a[c - 1][0] else a[c][1] + a[c - 1][1]
    for c, row in enumerate(a)
}
np.array([[key, value] for key, value in summed.items()])
Output:
array([[ 113.555 , 1506.    ],
       [ 113.595 , 1460.    ],
       [ 113.605 , 6902.    ],
       [ 113.612 , 3434.    ],
       [ 113.6351, 2095.    ]])
Note that this only handles runs of at most two equal keys; for a longer run, only its last two values would be summed.
There is a pretty simple solution that randomly occurred to me. We can use the normal cumulative sum as a building block. I'll explain the idea before showing the code.
Consider this example:
keys = [0, 0, 1, 2, 2, 2, 3, 3]
values = [1, 2, 3, 4, 5, 6, 7, 8]
We compute the cumulative sum over the values:
psums = [1, 3, 6, 10, 15, 21, 28, 36]
The values that interest us are the last values per sequence of equal keys (plus the very last value). How do we get this? In scalar code, keys[i] != keys[i + 1], in vectorized form keys[:-1] != keys[1:] (plus the very last value).
keys  = [0, 0, 1,  2,  2,  2,  3,  3]
psums = [1, 3, 6, 10, 15, 21, 28, 36]
            ^  ^           ^       ^
diffs = [0, 1, 1,  0,  0,  1,  0,  1]
ends  = [3, 6, 21, 36]
Now it should be easy to see that the final result we want is, for each entry, the difference between the value and its predecessor, except for the first value, which is kept as-is:
np.append(ends[0], ends[1:] - ends[:-1])
Putting this all together:
arr = np.array([[ 113.555 , 1506. ],
[ 113.595 , 1460. ],
[ 113.605 , 3900. ],
[ 113.605 , 3002. ],
[ 113.612 , 3434. ],
[ 113.6351 , 2095. ]])
keys = arr[:, 0]
values = arr[:, 1]
psums = np.cumsum(values)
diffs = np.append(keys[:-1] != keys[1:], True)
ends = psums[diffs]
sums = np.append(ends[0], ends[1:] - ends[:-1])
result = np.stack((keys[diffs], sums), axis=-1)
result = array([[ 113.555 , 1506. ],
[ 113.595 , 1460. ],
[ 113.605 , 6902. ],
[ 113.612 , 3434. ],
[ 113.6351, 2095. ]])
Warning
This approach is numerically unstable when used for floating point. A small sum at the end of the list is computed as the difference of two large partial sums. This will lead to catastrophic cancellation.
However, for integers it works fine. Even with overflow, the wrap-around ensures that the final result is okay.
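If the floating-point case matters, a numerically safer sketch of the same grouping idea (not part of the original answer) is to sum each run directly with np.add.reduceat instead of differencing large partial sums:
import numpy as np

arr = np.array([[113.555, 1506.], [113.595, 1460.], [113.605, 3900.],
                [113.605, 3002.], [113.612, 3434.], [113.6351, 2095.]])
keys = arr[:, 0]
values = arr[:, 1]
starts = np.flatnonzero(np.r_[True, keys[1:] != keys[:-1]])  # first row of every run of equal keys
sums = np.add.reduceat(values, starts)                       # each run is summed directly
result = np.stack((keys[starts], sums), axis=-1)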

Creating a matrix of matrices using numpy.array()

I've been trying to create a matrix of matrices using the NumPy function numpy.array() and am facing difficulties.
I'm specifically trying to create the following matrix, i.e. a 2x2 grid in which every cell is the 2x2 matrix [[1, 2], [3, 4]]:
[
    [ [[1, 2], [3, 4]],  [[1, 2], [3, 4]] ],
    [ [[1, 2], [3, 4]],  [[1, 2], [3, 4]] ]
]
I've tried the following line in Jupyter
x = np.array( [
[ [ 1,2 ] ,[ 3, 4] ] , [ [ 1,2 ] ,[ 3, 4] ] ,
[ [ 1,2 ] ,[ 3, 4] ] , [ [ 1,2 ] ,[ 3, 4] ]
])
but what it does is put all the 2x2 matrices in row-wise form, i.e. an array of shape (4, 2, 2).
I'm not able to take two 2x2 matrices in row form and replicate them along columns, or two 2x2 matrices in column form and replicate them along rows.
Any idea how to create this using numpy.array() or any other approach (using NumPy functions)?
It seems simple, but I'm finding it difficult to formulate the code.
Thanks in advance.
>>> a = np.array([[[[1,2],[3,4]], [[1,2], [3,4]]], [[[1,2],[3,4]], [[1,2], [3,4]]]])
>>> a
array([[[[1, 2],
         [3, 4]],
        [[1, 2],
         [3, 4]]],
       [[[1, 2],
         [3, 4]],
        [[1, 2],
         [3, 4]]]])
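If writing the nested literal by hand gets unwieldy, one alternative sketch (not from the original answer) is to tile a single 2x2 block into a 2x2 grid of blocks:
import numpy as np

block = np.array([[1, 2], [3, 4]])
# repeat the block along two new leading axes -> shape (2, 2, 2, 2)
a = np.tile(block, (2, 2, 1, 1))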

Python - add 1D-array as column of 2D

I want to add a vector as the first column of my 2D array, which looks like:
[[ 1. 0. 0. nan]
[ 4. 4. 9.97 1. ]
[ 4. 4. 27.94 1. ]
[ 2. 1. 4.17 1. ]
[ 3. 2. 38.22 1. ]
[ 4. 4. 31.83 1. ]
[ 3. 4. 41.87 1. ]
[ 2. 1. 18.33 1. ]
[ 4. 4. 33.96 1. ]
[ 2. 1. 5.65 1. ]
[ 3. 3. 40.74 1. ]
[ 2. 1. 10.04 1. ]
[ 2. 2. 53.15 1. ]]
I want to add an array of 13 elements as the first column of the matrix. I tried np.column_stack and np.append, but they either expect a 1D vector or don't work because I can't choose axis=1 and can only do np.append(peak_values, results).
I have a very simple option for you using NumPy:
x = np.array( [[ 3.9427767, -4.297677 ],
[ 3.9427767, -4.297677 ],
[ 3.9427767, -4.297677 ],
[ 3.9427767, -4.297677 ],
[ 3.942777 , -4.297677 ],
[ 3.9427767, -4.297677 ],
[ 3.9427767, -4.297677 ],
[ 3.9427767 ,-4.297677 ],
[ 3.9427767, -4.297677 ],
[ 3.9427772 ,-4.297677 ]])
b = np.arange(10).reshape(1, -1)  # a row vector, shape (1, 10)
np.concatenate((b.T, x), axis=1)  # transpose it to a column before concatenating
Output:
array([[ 0. , 3.9427767, -4.297677 ],
[ 1. , 3.9427767, -4.297677 ],
[ 2. , 3.9427767, -4.297677 ],
[ 3. , 3.9427767, -4.297677 ],
[ 4. , 3.942777 , -4.297677 ],
[ 5. , 3.9427767, -4.297677 ],
[ 6. , 3.9427767, -4.297677 ],
[ 7. , 3.9427767, -4.297677 ],
[ 8. , 3.9427767, -4.297677 ],
[ 9. , 3.9427772, -4.297677 ]])
Improving on this answer by removing the unnecessary transposition: you can use reshape(-1, 1) to turn the 1D array you'd like to prepend along axis 1 into a 2D array with a single column. At that point the arrays differ in shape only along the second axis, and np.concatenate accepts them:
>>> import numpy as np
>>> a = np.arange(12).reshape(3, 4)
>>> b = np.arange(3)
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> b
array([0, 1, 2])
>>> b.reshape(-1, 1) # preview the reshaping...
array([[0],
[1],
[2]])
>>> np.concatenate((b.reshape(-1, 1), a), axis=1)
array([[ 0, 0, 1, 2, 3],
[ 1, 4, 5, 6, 7],
[ 2, 8, 9, 10, 11]])
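Two other NumPy helpers cover the same case without an explicit reshape; this is just a small sketch reusing the a and b from above, not part of either answer:
import numpy as np

a = np.arange(12).reshape(3, 4)
b = np.arange(3)
np.column_stack((b, a))     # promotes the 1D input to a column before stacking
np.insert(a, 0, b, axis=1)  # inserts b as a new column at index 0
Both calls produce the same (3, 5) array shown above.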
For the simplest answer, you probably don't even need numpy.
Try the following:
new_array = []
new_array.append(your_array)
That's it.
I would suggest using NumPy; it will let you do what you want easily.
Here is an example of squaring the even values in a list; you can index the result with something like nums[0].
nums = [0, 1, 2, 3, 4]
even_squares = [x ** 2 for x in nums if x % 2 == 0]
print(even_squares)  # prints [0, 4, 16]

Creating Multidimensional arrays and interpolating in Python

I have 8 arrays; when each one is plotted it gives 'x Vs. Detection Probability'. I want to combine these arrays so that I can perform a multidimensional interpolation to find the detection probability from variables in each of the dimensions.
Here are a couple of my arrays as an example.
In [3]: mag_rec
Out[3]:
array([[ 1.35000000e+01, 0.00000000e+00],
[ 1.38333333e+01, 5.38461538e-01],
[ 1.41666667e+01, 5.84158416e-01],
[ 1.45000000e+01, 6.93771626e-01],
[ 1.48333333e+01, 7.43629344e-01],
[ 1.51666667e+01, 8.30774480e-01],
[ 1.55000000e+01, 8.74700571e-01],
[ 1.58333333e+01, 8.84866920e-01],
[ 1.61666667e+01, 8.95135908e-01],
[ 1.65000000e+01, 8.97150997e-01],
[ 1.68333333e+01, 8.90416846e-01],
[ 1.71666667e+01, 8.90911598e-01],
[ 1.75000000e+01, 8.90111460e-01],
[ 1.78333333e+01, 8.89567069e-01],
[ 1.81666667e+01, 8.82184730e-01],
[ 1.85000000e+01, 8.76020265e-01],
[ 1.88333333e+01, 8.54947843e-01],
[ 1.91666667e+01, 8.43505477e-01],
[ 1.95000000e+01, 8.24739363e-01],
[ 1.98333333e+01, 7.70070922e-01],
[ 2.01666667e+01, 6.33006993e-01],
[ 2.05000000e+01, 4.45367502e-01],
[ 2.08333333e+01, 2.65029636e-01],
[ 2.11666667e+01, 1.22023390e-01],
[ 2.15000000e+01, 4.02201524e-02],
[ 2.18333333e+01, 1.51190986e-02],
[ 2.21666667e+01, 8.75088215e-03],
[ 2.25000000e+01, 4.39466969e-03],
[ 2.28333333e+01, 3.65476525e-03]])
and
In [5]: lmt_mag
Out[5]:
array([[ 16.325 , 0.35 ],
[ 16.54166667, 0.39583333],
[ 16.75833333, 0.35555556],
[ 16.975 , 0.29666667],
[ 17.19166667, 0.42222222],
[ 17.40833333, 0.38541667],
[ 17.625 , 0.4875 ],
[ 17.84166667, 0.41956242],
[ 18.05833333, 0.45333333],
[ 18.275 , 0.45980392],
[ 18.49166667, 0.46742424],
[ 18.70833333, 0.4952381 ],
[ 18.925 , 0.49423077],
[ 19.14166667, 0.53375 ],
[ 19.35833333, 0.56239316],
[ 19.575 , 0.52217391],
[ 19.79166667, 0.55590909],
[ 20.00833333, 0.57421227],
[ 20.225 , 0.5729304 ],
[ 20.44166667, 0.61708204],
[ 20.65833333, 0.63968037],
[ 20.875 , 0.65627395],
[ 21.09166667, 0.66177885],
[ 21.30833333, 0.69375 ],
[ 21.525 , 0.67083333],
[ 21.95833333, 0.88333333],
[ 22.175 , 0.85833333]])
How, in Python, would I go about combining these arrays into a multidimensional array? (More arrays will have to be included)
Further to this, once I have this multidimensional array, is scipy.ndimage.interpolation.map_coordinates the fastest way to interpolate on this?
You can concatenate your arrays with numpy.concatenate((a1, a2, ...), axis=0), and for adding dimensions NumPy has several functions (for example numpy.expand_dims) that you can use depending on your needs.
e.g. Demo:
>>> import numpy as np
>>> a = np.array([[1, 2], [3, 4]])
>>> b = np.array([[5, 6]])
>>> y = np.concatenate((a, b), axis=0)
>>> y
array([[1, 2],
       [3, 4],
       [5, 6]])
>>> np.expand_dims(y, axis=0)
array([[[1, 2],
        [3, 4],
        [5, 6]]])
>>> np.expand_dims(y, axis=2)
array([[[1],
        [2]],

       [[3],
        [4]],

       [[5],
        [6]]])
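For combining several equally shaped two-column arrays into one higher-dimensional array, np.stack may be closer to what the question asks for. Here is a small sketch with placeholder random data standing in for mag_rec, lmt_mag, etc. (the real arrays would first need a common length):
import numpy as np

a1, a2, a3 = (np.random.rand(27, 2) for _ in range(3))
stacked = np.stack((a1, a2, a3), axis=0)       # shape (3, 27, 2): adds a new leading axis
joined = np.concatenate((a1, a2, a3), axis=0)  # shape (81, 2): joins along the existing axis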

Python matplotlib errorbar issue

Given these numpy arrays
x = [0 1 2 3 4 5 6 7 8 9]
y = [[ 0. ]
[-0.02083473]
[ 0.08819923]
[ 0.9454764 ]
[ 0.80604627]
[ 0.82189822]
[ 0.73613942]
[ 0.64519742]
[ 0.56973868]
[ 0.612912 ]]
c = [[ 0. 0. ]
[-0.09127286 0.04960341]
[-0.00300709 0.17940555]
[ 0.82319693 1.06775586]
[ 0.74512774 0.8669648 ]
[ 0.75177669 0.89201975]
[ 0.63606087 0.83621797]
[ 0.57786173 0.7125331 ]
[ 0.46722312 0.67225423]
[ 0.54951714 0.67630685]]
I want to plot the graph of x vs. y, with error bars using the values in c. I tried
plt.errorbar(x, y, yerr=c)
But the interpreter is giving me this error:
File "C:\Python\32\lib\site-packages\matplotlib\axes.py", line 3846, in vlines
for thisx, (thisymin, thisymax) in zip(x,Y)]
File "C:\Python\32\lib\site-packages\matplotlib\axes.py", line 3846, in <listcomp>
for thisx, (thisymin, thisymax) in zip(x,Y)]
ValueError: too many values to unpack (expected 2)
The value of x in zip is
[0 1 2 3 4 5 6 7 8 9]
and the value of Y in zip is
[[[ 0. 0. ]
[ 0.07043814 -0.11210759]
[ 0.09120632 0.08519214]
[ 0.12227947 1.76867333]
[ 0.06091853 1.55117401]
[ 0.07012153 1.57367491]
[ 0.10007855 1.3722003 ]
[ 0.06733568 1.22305915]
[ 0.10251555 1.0369618 ]
[ 0.06339486 1.16242914]]
[[ 0. 0. ]
[-0.07043814 0.02876869]
[-0.09120632 0.26760478]
[-0.12227947 2.01323226]
[-0.06091853 1.67301107]
[-0.07012153 1.71391797]
[-0.10007855 1.57235739]
[-0.06733568 1.35773052]
[-0.10251555 1.2419929 ]
[-0.06339486 1.28921885]]]
I've read around and it looks like my code should be correct (a stupid assumption, but I can't find evidence to the contrary... yet), but it looks like errorbar doesn't like the 2d array. The documentation says that yerr can be a 2d array, with the first column being the min error and the second being the max.
What is it that I'm doing wrong here?
There were some problems with the code, which I corrected below: x needs commas between the numbers, y has to be flattened to a 1D vector, and yerr expects a 2xN array (first row the lower errors, second row the upper errors), so c is transposed. With those fixes it works with no problem.
import numpy
import pylab
arr = numpy.asarray
x = arr([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) # put comma between numbers
y = arr([[ 0. ], # make it vector
[-0.02083473],
[ 0.08819923],
[ 0.9454764 ],
[ 0.80604627],
[ 0.82189822],
[ 0.73613942],
[ 0.64519742],
[ 0.56973868],
[ 0.612912 ]]).flatten()
c = arr([[ 0. , 0. ],
[-0.09127286, 0.04960341],
[-0.00300709, 0.17940555],
[ 0.82319693, 1.06775586],
[ 0.74512774, 0.8669648 ],
[ 0.75177669, 0.89201975],
[ 0.63606087, 0.83621797],
[ 0.57786173, 0.7125331 ],
[ 0.46722312, 0.67225423],
[ 0.54951714, 0.67630685]]).T # transpose
pylab.errorbar(x, y, yerr=c)
pylab.show()
and the result is the expected plot of x against y with asymmetric error bars.
Good luck.
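As a standalone illustration of the shape requirement (a sketch with made-up data, not part of the answer above): yerr wants shape (2, N), and newer Matplotlib versions also reject negative error values, so passing the absolute magnitudes may be necessary:
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = np.arange(10)
y = rng.random(10)
c = rng.random((10, 2)) * 0.2                   # one (lower, upper) pair per point, shape (10, 2)
plt.errorbar(x, y, yerr=np.abs(c).T, fmt='o-')  # transpose to shape (2, 10)
plt.show()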
