Python - add 1D-array as column of 2D

I want to add a vector as the first column of my 2D array which looks like :
[[ 1. 0. 0. nan]
[ 4. 4. 9.97 1. ]
[ 4. 4. 27.94 1. ]
[ 2. 1. 4.17 1. ]
[ 3. 2. 38.22 1. ]
[ 4. 4. 31.83 1. ]
[ 3. 4. 41.87 1. ]
[ 2. 1. 18.33 1. ]
[ 4. 4. 33.96 1. ]
[ 2. 1. 5.65 1. ]
[ 3. 3. 40.74 1. ]
[ 2. 1. 10.04 1. ]
[ 2. 2. 53.15 1. ]]
I want to add an array of 13 elements as the first column of the matrix. I tried np.stack_column and np.append, but they either expect a 1D vector or don't work, because I can't choose axis=1 and can only do np.append(peak_values, results).

I have a very simple option for you using numpy:
import numpy as np
x = np.array( [[ 3.9427767, -4.297677 ],
[ 3.9427767, -4.297677 ],
[ 3.9427767, -4.297677 ],
[ 3.9427767, -4.297677 ],
[ 3.942777 , -4.297677 ],
[ 3.9427767, -4.297677 ],
[ 3.9427767, -4.297677 ],
[ 3.9427767 ,-4.297677 ],
[ 3.9427767, -4.297677 ],
[ 3.9427772 ,-4.297677 ]])
b = np.arange(10).reshape(-1, 1)  # column vector of shape (10, 1)
np.concatenate((b, x), axis=1)  # no transpose: b.T would have shape (1, 10) and not match x
Output:
array([[ 0. , 3.9427767, -4.297677 ],
[ 1. , 3.9427767, -4.297677 ],
[ 2. , 3.9427767, -4.297677 ],
[ 3. , 3.9427767, -4.297677 ],
[ 4. , 3.942777 , -4.297677 ],
[ 5. , 3.9427767, -4.297677 ],
[ 6. , 3.9427767, -4.297677 ],
[ 7. , 3.9427767, -4.297677 ],
[ 8. , 3.9427767, -4.297677 ],
[ 9. , 3.9427772, -4.297677 ]])

As a refinement of the answer above: no transposition is needed. You can use reshape(-1, 1) to turn the 1D array you'd like to prepend into a 2D array with a single column. At that point the two arrays differ in shape only along the second axis, so np.concatenate accepts them with axis=1:
>>> import numpy as np
>>> a = np.arange(12).reshape(3, 4)
>>> b = np.arange(3)
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> b
array([0, 1, 2])
>>> b.reshape(-1, 1) # preview the reshaping...
array([[0],
[1],
[2]])
>>> np.concatenate((b.reshape(-1, 1), a), axis=1)
array([[ 0, 0, 1, 2, 3],
[ 1, 4, 5, 6, 7],
[ 2, 8, 9, 10, 11]])
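As a complementary sketch (not part of the original answers), np.column_stack accepts the 1D array directly and treats it as a column, and np.insert can place it at a chosen column index. The arrays a and b are the same toy data as above.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b = np.arange(3)

# column_stack turns 1D inputs into columns before stacking
stacked = np.column_stack((b, a))

# np.insert places b as a new column at index 0 along axis 1
inserted = np.insert(a, 0, b, axis=1)

print(stacked)
```

Both results should match the np.concatenate output shown above; which one to use is mostly a matter of taste.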

For the simplest answer, you probably don't even need numpy.
Try the following:
new_array = []
new_array.append(your_array)
That's it.

I would suggest using Numpy. It will allow you to easily do what you want.
Here is an example that squares the even numbers in a list. You can access a single element with something like nums[0].
nums = [0, 1, 2, 3, 4]
even_squares = [x ** 2 for x in nums if x % 2 == 0]
print(even_squares)  # Prints "[0, 4, 16]"

Related

How do I combine two numpy arrays so for each row of the first array I append all rows from the second one?

I have the following numpy arrays:
theta_array =
array([[ 1, 10],
[ 1, 11],
[ 1, 12],
[ 1, 13],
[ 1, 14],
[ 2, 10],
[ 2, 11],
[ 2, 12],
[ 2, 13],
[ 2, 14],
[ 3, 10],
[ 3, 11],
[ 3, 12],
[ 3, 13],
[ 3, 14],
[ 4, 10],
[ 4, 11],
[ 4, 12],
[ 4, 13],
[ 4, 14]])
XY_array =
array([[ 44.0394952 , 505.81099922],
[ 61.03882938, 515.97253226],
[ 26.69851841, 525.18083012],
[ 46.78487831, 533.42309602],
[ 45.77188401, 545.42988355],
[ 81.12969132, 554.78767379],
[ 54.178463 , 565.8716283 ],
[ 41.58952084, 574.76827133],
[ 85.24956815, 585.1355127 ],
[ 80.73726733, 595.49446033],
[ 22.70625059, 605.59017175],
[ 40.66810604, 615.26308629],
[ 47.16694695, 624.39222332],
[ 48.72499541, 633.19846364],
[ 50.68589921, 643.72334885],
[ 38.42731134, 654.68595883],
[ 47.39519707, 666.28232866],
[ 58.07767155, 673.9572227 ],
[ 72.11393347, 683.68307373],
[ 53.70872932, 694.65509894],
[ 82.08237952, 704.5868817 ],
[ 46.64069738, 715.18427515],
[ 40.46032478, 723.91308011],
[ 75.69090892, 733.69595658],
[120.61447884, 745.31322786],
[ 60.17764744, 754.89747186],
[ 87.15961973, 766.24040447],
[ 82.93872713, 773.01518252],
[ 93.56688906, 785.60640153],
[ 70.0474047 , 793.81792947],
[104.3613818 , 805.40234676],
[108.39253837, 814.75002114],
[ 78.97643673, 824.95386427],
[ 85.69096895, 834.44797862],
[ 53.07112931, 844.39555058],
[111.49875807, 855.660508 ],
[ 70.88824958, 865.53417489],
[ 79.55499469, 875.31303945],
[ 60.86941464, 885.85235946],
[101.06017712, 896.69986636],
[ 74.55823544, 905.87417231],
[113.24705653, 915.19350121],
[ 94.21920882, 925.87933273],
[ 63.26478103, 933.70804578],
[ 95.97827181, 945.76196917],
[ 80.48623318, 955.60422694],
[ 80.03451808, 964.39856485],
[ 73.86032436, 973.91032818],
[103.96923524, 984.24366761],
[ 93.20663129, 995.44618851]])
I am trying to combine both, so for each combination of theta_array I get all combinations from XY_array.
I am aware of this post, so I have done this:
np.array(np.meshgrid(theta_array, XY_array)).T.reshape(-1,4)
But this generates:
array([[ 1. , 44.0394952 , 1. , 505.81099922],
[ 1. , 61.03882938, 1. , 515.97253226],
[ 1. , 26.69851841, 1. , 525.18083012],
...,
[ 14. , 73.86032436, 14. , 973.91032818],
[ 14. , 103.96923524, 14. , 984.24366761],
[ 14. , 93.20663129, 14. , 995.44618851]])
and the problem requires:
array([[ 1. , 1. , 44.0394952 , 505.81099922],
[ 1. , 1. , 61.03882938, 515.97253226],
[ 1. , 1. , 26.69851841, 525.18083012],
...,
[ 14. , 14. , 73.86032436, 973.91032818],
[ 14. , 14. , 103.96923524, 984.24366761],
[ 14. , 14. , 93.20663129, 995.44618851]])
Which would be the way of doing this combination/aggregation in numpy?
EDIT:
There is a mistake in the above process as the combined arrays do not lead to the generation of that matrix. With separate vectors for each column the actual solution to merge this is:
dataset = np.array(np.meshgrid(theta0_range, theta1_range, X)).T.reshape(-1,3)
And later the Y vector can be added as an additional column.
You can reorder the "columns" after using meshgrid with [:,[0,2,1,3]] and if you need to make the list dynamic because of a large number of columns, then you can see the end of my answer:
np.array(np.meshgrid(theta_array, XY_array)).T.reshape(-1,4)[:,[0,2,1,3]]
Output:
array([[ 1. , 1. , 44.0394952 , 505.81099922],
[ 1. , 1. , 61.03882938, 515.97253226],
[ 1. , 1. , 26.69851841, 525.18083012],
...,
[ 14. , 14. , 73.86032436, 973.91032818],
[ 14. , 14. , 103.96923524, 984.24366761],
[ 14. , 14. , 93.20663129, 995.44618851]])
If you have many columns you could dynamically create this list: [0,2,1,3] with list comprehension. For example:
n = new_arr.shape[1]*2
# even indices first, then odd indices
lst = list(range(0, n, 2)) + list(range(1, n, 2))
lst
[0, 2, 4, 6, 1, 3, 5, 7]
Then, you could rewrite to:
np.array(np.meshgrid(theta_array, XY_array)).T.reshape(-1,4)[:,lst]
You can use itertools.product:
from itertools import product
out = np.array([*product(theta_array, XY_array)])
out = out.reshape(out.shape[0],-1)
Output:
array([[ 1. , 10. , 44.0394952 , 505.81099922],
[ 1. , 10. , 61.03882938, 515.97253226],
[ 1. , 10. , 26.69851841, 525.18083012],
...,
[ 4. , 14. , 73.86032436, 973.91032818],
[ 4. , 14. , 103.96923524, 984.24366761],
[ 4. , 14. , 93.20663129, 995.44618851]])
That said, this looks very much like an XY-problem. What are you trying to do with this array?
As a side reference, here is a comparison of execution times for the two solutions. For this specific operation, the itertools version takes roughly 10 times longer than its NumPy equivalent.
%%time
for i in range(1000):
    z = np.array(np.meshgrid(theta_array, XY_array)).T.reshape(-1,4)[:,[0,2,1,3]]
CPU times: user 299 ms, sys: 0 ns, total: 299 ms
Wall time: 328 ms
%%time
for i in range(1000):
    z = np.array([*product(theta_array, XY_array)])
    z = z.reshape(z.shape[0],-1)
CPU times: user 2.79 s, sys: 474 µs, total: 2.79 s
Wall time: 2.84 s
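If the goal is literally "for each row of the first array, append every row of the second", one possible sketch uses np.repeat and np.tile, which keeps whole rows together. The helper name cross_rows and the small input arrays are made up for illustration; this is an assumption about the intent, not the asker's confirmed requirement.

```python
import numpy as np

def cross_rows(left, right):
    """Pair every row of `left` with every row of `right` (hypothetical helper)."""
    # repeat each left row len(right) times: l0, l0, ..., l1, l1, ...
    left_rep = np.repeat(left, len(right), axis=0)
    # tile the whole right block len(left) times: r0, r1, ..., r0, r1, ...
    right_tiled = np.tile(right, (len(left), 1))
    return np.hstack((left_rep, right_tiled))

left = np.array([[1, 10], [2, 11]])
right = np.array([[0.5, 1.5], [2.5, 3.5], [4.5, 5.5]])
combined = cross_rows(left, right)
print(combined)
```

The row order is the same as the itertools.product version, with the result shape being (len(left) * len(right), left.shape[1] + right.shape[1]).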

Numpy get values from np.argmin indices [duplicate]

Let's say I've d1, d2 and d3 as following. t is a variable where I've combined my arrays and m contains the indices of the smallest value.
>>> d1
array([[ 0.9850916 , 0.95004463, 1.35728604, 1.18554035],
[ 0.47624542, 0.45561795, 0.6231743 , 0.94746001],
[ 0.74008166, 0. , 1.59774065, 1.00423774],
[ 0.86173439, 0.70940862, 1.0601817 , 0.96112015],
[ 1.03413477, 0.64874991, 1.27488263, 0.80250053]])
>>> d2
array([[ 0.27301946, 0.38387185, 0.93215524, 0.98851404],
[ 0.17996978, 0. , 0.41283798, 0.15204035],
[ 0.10952115, 0.45561795, 0.5334015 , 0.75242805],
[ 0.4600214 , 0.74100962, 0.16743427, 0.36250385],
[ 0.60984208, 0.35161234, 0.44580535, 0.6713633 ]])
>>> d3
array([[ 0. , 0.19658541, 1.14605925, 1.18431945],
[ 0.10697428, 0.27301946, 0.45536417, 0.11922118],
[ 0.42153386, 0.9850916 , 0.28225364, 0.82765657],
[ 1.04940684, 1.63082272, 0.49987388, 0.38596938],
[ 0.21015723, 1.07007177, 0.22599987, 0.89288339]])
>>> t = np.array([d1, d2, d3])
>>> t
array([[[ 0.9850916 , 0.95004463, 1.35728604, 1.18554035],
[ 0.47624542, 0.45561795, 0.6231743 , 0.94746001],
[ 0.74008166, 0. , 1.59774065, 1.00423774],
[ 0.86173439, 0.70940862, 1.0601817 , 0.96112015],
[ 1.03413477, 0.64874991, 1.27488263, 0.80250053]],
[[ 0.27301946, 0.38387185, 0.93215524, 0.98851404],
[ 0.17996978, 0. , 0.41283798, 0.15204035],
[ 0.10952115, 0.45561795, 0.5334015 , 0.75242805],
[ 0.4600214 , 0.74100962, 0.16743427, 0.36250385],
[ 0.60984208, 0.35161234, 0.44580535, 0.6713633 ]],
[[ 0. , 0.19658541, 1.14605925, 1.18431945],
[ 0.10697428, 0.27301946, 0.45536417, 0.11922118],
[ 0.42153386, 0.9850916 , 0.28225364, 0.82765657],
[ 1.04940684, 1.63082272, 0.49987388, 0.38596938],
[ 0.21015723, 1.07007177, 0.22599987, 0.89288339]]])
>>> m = np.argmin(t, axis=0)
>>> m
array([[2, 2, 1, 1],
[2, 1, 1, 2],
[1, 0, 2, 1],
[1, 0, 1, 1],
[2, 1, 2, 1]])
From m and t, I want to calculate the actual values as following. How do I do this? ... preferably, the efficient way?
array([ [ 0. , 0.19658541, 0.93215524, 0.98851404],
[ 0.10697428, 0. , 0.41283798, 0.11922118],
[ 0.10952115, 0. , 0.28225364, 0.75242805],
[ 0.4600214 , 0.70940862, 0.16743427, 0.36250385],
[ 0.21015723, 0.35161234, 0.22599987, 0.6713633 ]])
If the minimum is all you need, you can use np.min(t, axis=0).
If you want to index with m explicitly, you can use choose:
m.choose(t) # This will return the same thing.
It can also be written as
np.choose(m, t)
Which returns:
array([[0. , 0.19658541, 0.93215524, 0.98851404],
[0.10697428, 0. , 0.41283798, 0.11922118],
[0.10952115, 0. , 0.28225364, 0.75242805],
[0.4600214 , 0.70940862, 0.16743427, 0.36250385],
[0.21015723, 0.35161234, 0.22599987, 0.6713633 ]])
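Another option, sketched here as an addition to the answer, is np.take_along_axis, which is designed to pair with argmin/argsort-style index arrays. The data below is random stand-in data with the same shape as t in the question, not the original values. np.choose has, as far as I know, a fairly small upper limit on the number of choice arrays in many NumPy versions, so this may also scale better when there are many stacked arrays.

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.random((3, 5, 4))        # stand-in for np.array([d1, d2, d3])
m = np.argmin(t, axis=0)         # shape (5, 4), indices along axis 0

# indices must have the same number of dimensions as t, so add a leading axis
picked = np.take_along_axis(t, m[np.newaxis, ...], axis=0)[0]

# equivalent plain fancy indexing with broadcast row/column indices
rows = np.arange(t.shape[1])[:, None]
cols = np.arange(t.shape[2])
picked_fancy = t[m, rows, cols]
```

Both picked and picked_fancy should equal np.min(t, axis=0).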

NumPy: removing rows in an array if one column's value does not match

I have two arrays in NumPy:
a1 =
array([[ 262.99182129, 213. , 1. ],
[ 311.98925781, 271.99050903, 2. ],
[ 383. , 342. , 3. ],
[ 372.16494751, 348.83505249, 4. ],
[ 214.55493164, 137.01008606, 5. ],
[ 138.29714966, 199.75 , 6. ],
[ 289.75 , 220.75 , 7. ],
[ 239. , 279. , 8. ],
[ 130.75 , 348.25 , 9. ]])
a2 =
array([[ 265.78259277, 212.99705505, 1. ],
[ 384.23312378, 340.99707031, 3. ],
[ 373.66967773, 347.96688843, 4. ],
[ 217.91461182, 137.2791748 , 5. ],
[ 141.35340881, 199.38366699, 6. ],
[ 292.24401855, 220.83808899, 7. ],
[ 241.53366089, 278.56951904, 8. ],
[ 133.26490784, 347.14279175, 9. ]])
Actually there will be thousands of rows.
But as you can see, the third column in a2 does not have the value 2.0.
What I simply want is to remove from a1 the rows whose 3rd column values are not found in any row of a2.
What's the NumPy way/shortcut to do this fast?
One option is to use np.in1d to check whether each value in the third column (index 2) of a1 appears in the third column of a2, and then use the resulting Boolean array to index the rows of a1.
You can do this as follows:
>>> a1[np.in1d(a1[:, 2], a2[:, 2])]
array([[ 262.99182129, 213. , 1. ],
[ 383. , 342. , 3. ],
[ 372.16494751, 348.83505249, 4. ],
[ 214.55493164, 137.01008606, 5. ],
[ 138.29714966, 199.75 , 6. ],
[ 289.75 , 220.75 , 7. ],
[ 239. , 279. , 8. ],
[ 130.75 , 348.25 , 9. ]])
The row in a1 with 2 in the third column is not in this array, as required.
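In more recent NumPy versions, np.isin is the recommended spelling of the same membership test (to my knowledge, np.in1d is deprecated as of NumPy 2.0). A minimal sketch with shortened stand-in arrays, not the full data from the question:

```python
import numpy as np

# shortened stand-ins for a1 and a2; the third column is the key
a1 = np.array([[262.99, 213.00, 1.0],
               [311.99, 271.99, 2.0],
               [383.00, 342.00, 3.0]])
a2 = np.array([[265.78, 213.00, 1.0],
               [384.23, 341.00, 3.0]])

# True where the key of an a1 row also occurs as a key in a2
mask = np.isin(a1[:, 2], a2[:, 2])
filtered = a1[mask]
print(filtered)
```

Because the keys are floats, this relies on exact equality, which is fine for integer-valued labels like these but would need a tolerance-based comparison for arbitrary floats.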

Sample with replacement from existing array

I have a matrix A with shape 1.6M rows and 400 columns.
One of the columns in A (call it the output column) has binary values (0,1) with a predominance of 0's.
I want to create a new matrix B (same shape as A) by sampling rows in A with replacement such that the distribution of 0's and 1's in the output column of B becomes 50/50.
What is an efficient way to do this using python/numpy?
You could do this by:
Creating a list of all rows with 0 in the "output column" (called outputZeros), and a list of all rows with 1 in the output column (called outputOnes); then,
Sampling with replacement from outputZeros and outputOnes 1.6M times.
Here's a small example. It's not clear to me if you want the rows in B to be in any particular order, so here they first include 0s, then include 1s.
In [1]: import numpy as np, random
In [2]: A = np.random.rand(10, 2)
In [3]: A
In [4]: A[:7, 1] = 0
In [5]: A[7:, 1] = 1
In [6]: A
Out[6]:
array([[ 0.70126052, 0. ],
[ 0.51161067, 0. ],
[ 0.76511966, 0. ],
[ 0.91257144, 0. ],
[ 0.97024895, 0. ],
[ 0.55817776, 0. ],
[ 0.55963466, 0. ],
[ 0.6318139 , 1. ],
[ 0.90176108, 1. ],
[ 0.76033151, 1. ]])
In [7]: outputZeros = np.where(A[:, 1] == 0)[0]
In [8]: outputZeros
Out[8]: array([0, 1, 2, 3, 4, 5, 6])
In [9]: outputOnes = np.where(A[:, 1] == 1)[0]
In [10]: outputOnes
Out[10]: array([7, 8, 9])
In [11]: B = np.zeros((10, 2))
In [12]: for i in range(10):
   ....:     if i < 5:
   ....:         B[i, :] = A[random.choice(outputZeros), :]
   ....:     else:
   ....:         B[i, :] = A[random.choice(outputOnes), :]
   ....:
In [13]: B
Out[13]:
array([[ 0.97024895, 0. ],
[ 0.97024895, 0. ],
[ 0.76511966, 0. ],
[ 0.76511966, 0. ],
[ 0.51161067, 0. ],
[ 0.90176108, 1. ],
[ 0.76033151, 1. ],
[ 0.6318139 , 1. ],
[ 0.6318139 , 1. ],
[ 0.76033151, 1. ]])
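For 1.6M rows, a Python-level loop may be slow. A vectorized sketch of the same idea draws all indices at once with a NumPy Generator. The function name balanced_resample and its parameters are hypothetical, and the final shuffle is an assumption, since the question does not say whether row order matters.

```python
import numpy as np

def balanced_resample(A, label_col, rng):
    """Resample rows of A with replacement so the label column is ~50/50 (sketch)."""
    n = A.shape[0]
    zeros = np.flatnonzero(A[:, label_col] == 0)
    ones = np.flatnonzero(A[:, label_col] == 1)
    half = n // 2
    # draw row indices for each class in one call, with replacement
    idx = np.concatenate((rng.choice(zeros, size=half, replace=True),
                          rng.choice(ones, size=n - half, replace=True)))
    rng.shuffle(idx)  # optional: mix the two classes
    return A[idx]

rng = np.random.default_rng(0)
A = rng.random((10, 2))
A[:7, 1] = 0
A[7:, 1] = 1
B = balanced_resample(A, 1, rng)
```

B has the same shape as A, with half of its rows drawn from each class.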
I would create a new 1D numpy array filled with values from numpy.random.random_integers(low, high=None, size=None) and swap that new array with the old one.

Python matplotlib errorbar issue

Given these numpy arrays
x = [0 1 2 3 4 5 6 7 8 9]
y = [[ 0. ]
[-0.02083473]
[ 0.08819923]
[ 0.9454764 ]
[ 0.80604627]
[ 0.82189822]
[ 0.73613942]
[ 0.64519742]
[ 0.56973868]
[ 0.612912 ]]
c = [[ 0. 0. ]
[-0.09127286 0.04960341]
[-0.00300709 0.17940555]
[ 0.82319693 1.06775586]
[ 0.74512774 0.8669648 ]
[ 0.75177669 0.89201975]
[ 0.63606087 0.83621797]
[ 0.57786173 0.7125331 ]
[ 0.46722312 0.67225423]
[ 0.54951714 0.67630685]]
I want to plot the graph of x, y with error bars using the values in c. I tried
plt.errorbar(x, y, yerr=c)
But the interpreter is giving me this error:
File "C:\Python\32\lib\site-packages\matplotlib\axes.py", line 3846, in vlines
for thisx, (thisymin, thisymax) in zip(x,Y)]
File "C:\Python\32\lib\site-packages\matplotlib\axes.py", line 3846, in <listcomp>
for thisx, (thisymin, thisymax) in zip(x,Y)]
ValueError: too many values to unpack (expected 2)
The value of x in zip is
[0 1 2 3 4 5 6 7 8 9]
and the value of Y in zip is
[[[ 0. 0. ]
[ 0.07043814 -0.11210759]
[ 0.09120632 0.08519214]
[ 0.12227947 1.76867333]
[ 0.06091853 1.55117401]
[ 0.07012153 1.57367491]
[ 0.10007855 1.3722003 ]
[ 0.06733568 1.22305915]
[ 0.10251555 1.0369618 ]
[ 0.06339486 1.16242914]]
[[ 0. 0. ]
[-0.07043814 0.02876869]
[-0.09120632 0.26760478]
[-0.12227947 2.01323226]
[-0.06091853 1.67301107]
[-0.07012153 1.71391797]
[-0.10007855 1.57235739]
[-0.06733568 1.35773052]
[-0.10251555 1.2419929 ]
[-0.06339486 1.28921885]]]
I've read around and it looks like my code should be correct (a stupid assumption, but I can't find evidence to the contrary... yet), but it looks like errorbar doesn't like the 2d array. The documentation says that yerr can be a 2d array, with the first column being the min error and the second being the max.
What is it that I'm doing wrong here?
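There were a few problems with the code, which I corrected below so that it runs: the numbers in x need commas between them, y has to be flattened into a 1D vector, and c has to be transposed so that yerr has shape (2, N).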
import numpy
import pylab
arr = numpy.asarray
x = arr([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) # put comma between numbers
y = arr([[ 0. ], # make it vector
[-0.02083473],
[ 0.08819923],
[ 0.9454764 ],
[ 0.80604627],
[ 0.82189822],
[ 0.73613942],
[ 0.64519742],
[ 0.56973868],
[ 0.612912 ]]).flatten()
c = arr([[ 0. , 0. ],
[-0.09127286, 0.04960341],
[-0.00300709, 0.17940555],
[ 0.82319693, 1.06775586],
[ 0.74512774, 0.8669648 ],
[ 0.75177669, 0.89201975],
[ 0.63606087, 0.83621797],
[ 0.57786173, 0.7125331 ],
[ 0.46722312, 0.67225423],
[ 0.54951714, 0.67630685]]).T # transpose
pylab.errorbar(x, y, yerr=c)
pylab.show()
Good luck.
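One further point, offered as a hedged note: matplotlib interprets yerr as distances from y (first row below, second row above), not as absolute bounds. The values in c look like absolute lower and upper bounds around y, which is an assumption based on the numbers in the question. If so, they could be converted as sketched below, using only the first four points from the question.

```python
import numpy as np

# first four points from the question
y = np.array([0.0, -0.02083473, 0.08819923, 0.9454764])
c = np.array([[0.0, 0.0],
              [-0.09127286, 0.04960341],
              [-0.00300709, 0.17940555],
              [0.82319693, 1.06775586]])

# assumption: c[:, 0] is the lower bound and c[:, 1] the upper bound
lower = y - c[:, 0]               # distance below each point
upper = c[:, 1] - y               # distance above each point
yerr = np.vstack((lower, upper))  # shape (2, N), as errorbar expects

# then, with matplotlib: plt.errorbar(x, y, yerr=yerr)
```

Passing c.T directly, as in the answer above, treats the bounds themselves as error sizes, so the bars would not span from the lower to the upper bound.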
