I am initializing samples from two multivariate Gaussian distributions like so, and trying to implement a machine learning algorithm that draws a decision boundary between the classes:
import numpy as np
import matplotlib.pyplot as plt
import torch
import random
mu0 = [-2,-2]
mu1 = [2, 2]
cov = np.array([[1, 0],[0, 1]])
X = np.random.randn(10,2)
L = np.linalg.cholesky(cov)
Y0 = mu0 + X @ L.T
Y1 = mu1 + X @ L.T
I have two separated circles and I am trying to stack Y0 and Y1, shuffle them, and then break them into training and testing splits. First I append the class labels to the data, and then stack.
n,m = Y1.shape
class0 = np.zeros((n,1))
class1 = np.ones((n,1))
Y_0 = np.hstack((Y0,class0))
Y_1 = np.hstack((Y1,class1))
data = np.vstack((Y_0,Y_1))
Now, when I try to call random.shuffle(data), the zero class takes over and I end up with only a small number of class-one instances.
random.shuffle(data)
Here is my data before shuffling:
print(data)
[[-3.16184428 -1.89491433 0. ]
[ 0.2710061 -1.41000924 0. ]
[-3.50742027 -2.04238337 0. ]
[-1.39966859 -1.57430259 0. ]
[-0.98356629 -3.02299622 0. ]
[-0.49583458 -1.64067853 0. ]
[-2.62577229 -2.32941225 0. ]
[-1.16005269 -2.76429318 0. ]
[-1.88618759 -2.79178253 0. ]
[-1.34790868 -2.10294791 0. ]
[ 0.83815572 2.10508567 1. ]
[ 4.2710061 2.58999076 1. ]
[ 0.49257973 1.95761663 1. ]
[ 2.60033141 2.42569741 1. ]
[ 3.01643371 0.97700378 1. ]
[ 3.50416542 2.35932147 1. ]
[ 1.37422771 1.67058775 1. ]
[ 2.83994731 1.23570682 1. ]
[ 2.11381241 1.20821747 1. ]
[ 2.65209132 1.89705209 1. ]]
and after shuffling:
data
array([[-0.335667 , -0.60826166, 0. ],
[-0.335667 , -0.60826166, 0. ],
[-0.335667 , -0.60826166, 0. ],
[-0.335667 , -0.60826166, 0. ],
[-2.22547604, -1.62833794, 0. ],
[-3.3287687 , -2.37694753, 0. ],
[-3.2915737 , -1.31558952, 0. ],
[-2.23912202, -1.54625136, 0. ],
[-0.335667 , -0.60826166, 0. ],
[-2.23912202, -1.54625136, 0. ],
[-2.11217077, -2.70157476, 0. ],
[-3.25714184, -2.7679462 , 0. ],
[-3.2915737 , -1.31558952, 0. ],
[-2.22547604, -1.62833794, 0. ],
[ 0.73756329, 1.46127708, 1. ],
[ 1.88782923, 1.29842524, 1. ],
[ 1.77452396, 2.37166206, 1. ],
[ 1.77452396, 2.37166206, 1. ],
[ 3.664333 , 3.39173834, 1. ],
[ 3.664333 , 3.39173834, 1. ]])
Why is random.shuffle deleting my data? I just need all twenty rows to be shuffled, but it is repeating rows and I am losing data. I am not assigning the result of random.shuffle to a variable; I am simply calling random.shuffle(data). Is there another way to simply shuffle my data?
This happens because the swap used internally by random.shuffle does not work on a multi-dimensional ndarray:
# Python 3.10.7 random.py
class Random(_random.Random):
    ...
    def shuffle(self, x, random=None):
        ...
        if random is None:
            randbelow = self._randbelow
            for i in reversed(range(1, len(x))):
                # pick an element in x[:i+1] with which to exchange x[i]
                j = randbelow(i + 1)
                x[i], x[j] = x[j], x[i]  # <----------------
        ...
    ...
Indexing a row of a multi-dimensional array returns a view rather than a copy, which prevents the swap from exchanging the rows properly. For more information, you can refer to this question.
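A minimal demonstration of the failure mode, using made-up values: the right-hand side of the swap yields views rather than copies, so the first assignment overwrites the data that the second assignment then reads back, and one row ends up duplicated:
>>> x = np.array([[1., 1.], [2., 2.]])
>>> x[0], x[1] = x[1], x[0]  # x[1] and x[0] are views into x, not copies
>>> x
array([[2., 2.],
       [2., 2.]])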
A better choice is numpy.random.Generator.shuffle:
>>> data
array([[-1.88985877, -2.97312795, 0. ],
[-1.52352452, -2.19633099, 0. ],
[-2.06297352, -1.36627294, 0. ],
[-1.47460488, -2.09410403, 0. ],
[-1.18753167, -1.71069966, 0. ],
[-1.92878766, -1.19545861, 0. ],
[-2.4858627 , -2.66525855, 0. ],
[-2.97169999, -1.46985506, 0. ],
[-2.11395907, -2.19108576, 0. ],
[-2.63976951, -1.66742147, 0. ],
[ 2.11014123, 1.02687205, 1. ],
[ 2.47647548, 1.80366901, 1. ],
[ 1.93702648, 2.63372706, 1. ],
[ 2.52539512, 1.90589597, 1. ],
[ 2.81246833, 2.28930034, 1. ],
[ 2.07121234, 2.80454139, 1. ],
[ 1.5141373 , 1.33474145, 1. ],
[ 1.02830001, 2.53014494, 1. ],
[ 1.88604093, 1.80891424, 1. ],
[ 1.36023049, 2.33257853, 1. ]])
>>> rng = np.random.default_rng()
>>> rng.shuffle(data, 0)
>>> data
array([[-1.92878766, -1.19545861, 0. ],
[-2.97169999, -1.46985506, 0. ],
[ 2.07121234, 2.80454139, 1. ],
[ 1.36023049, 2.33257853, 1. ],
[ 1.93702648, 2.63372706, 1. ],
[-2.11395907, -2.19108576, 0. ],
[-2.63976951, -1.66742147, 0. ],
[ 1.02830001, 2.53014494, 1. ],
[ 2.11014123, 1.02687205, 1. ],
[ 1.88604093, 1.80891424, 1. ],
[-1.47460488, -2.09410403, 0. ],
[ 2.52539512, 1.90589597, 1. ],
[-1.18753167, -1.71069966, 0. ],
[-1.88985877, -2.97312795, 0. ],
[ 2.81246833, 2.28930034, 1. ],
[-2.06297352, -1.36627294, 0. ],
[ 1.5141373 , 1.33474145, 1. ],
[-2.4858627 , -2.66525855, 0. ],
[-1.52352452, -2.19633099, 0. ],
[ 2.47647548, 1.80366901, 1. ]])
In this example, numpy.random.shuffle also works, because the OP only needs to shuffle along the first axis, but numpy.random.Generator.shuffle is the recommended API for new code and supports shuffling along other axes.
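Since the OP ultimately wants training and testing splits, here is a hedged sketch of one way to continue (the 80/20 split ratio is my assumption, not something stated in the question), using Generator.permutation, which returns a shuffled copy instead of shuffling in place:
>>> rng = np.random.default_rng()
>>> shuffled = rng.permutation(data, axis=0)  # shuffled copy of the rows
>>> n_train = int(0.8 * len(shuffled))        # assumed 80/20 split
>>> train, test = shuffled[:n_train], shuffled[n_train:]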
I have a 2D array of floats and I want to delete a point from it, but suppose the array is so big that I can't just look up the point's index and grab it by hand.
How can I delete this point, both with a loop and without a loop? The following is the 2D array, and I want to delete [ 32.9, 23. ]:
[[ 1. , -1.4],
[ -2.9, -1.5],
[ -3.6, -2. ],
[ 1.5, 1. ],
[ 24. , 11. ],
[ -1. , 1.4],
[ 2.9, 1.5],
[ 3.6, 2. ],
[ -1.5, -1. ],
[ -24. , -11. ],
[ 32.9, 23. ],
[-440. , 310. ]]
I tried this but it doesn't work:
this_point = np.asarray([ 32.9, 23.])
[x for x in y if x == point]
del datapoints[this_point]
np.delete(datapoints,len(datapoints), axis=0)
for this_point in datapoints:
    del this_point
When I do this, this_point is still there when I print all the points. What should I do?
Python lists can remove an element by value, but NumPy removes only by index. So, use np.where to find the index of the matching row:
import numpy as np
a = np.array([[ 1. , -1.4],
[ -2.9, -1.5],
[ -3.6, -2. ],
[ 1.5, 1. ],
[ 24. , 11. ],
[ -1. , 1.4],
[ 2.9, 1.5],
[ 3.6, 2. ],
[ -1.5, -1. ],
[ -24. , -11. ],
[ 32.9, 23. ],
[-440. , 310. ]])
find = np.array([32.9,23.])
row = np.where( (a == find).all(axis=1))
print( row )
print(np.delete( a, row, axis=0 ) )
Output:
(array([10], dtype=int64),)
[[ 1. -1.4]
[ -2.9 -1.5]
[ -3.6 -2. ]
[ 1.5 1. ]
[ 24. 11. ]
[ -1. 1.4]
[ 2.9 1.5]
[ 3.6 2. ]
[ -1.5 -1. ]
[ -24. -11. ]
[-440. 310. ]]
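One caveat: since the values are floats, exact == comparison only matches a row stored bit-for-bit identically to the search value. If the target point may carry rounding error, a tolerance-based test with np.isclose is safer; a small sketch under that assumption, reusing the a and find arrays above:
tol_row = np.where(np.isclose(a, find).all(axis=1))  # rows matching within tolerance
print(np.delete(a, tol_row, axis=0))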
I want to add a vector as the first column of my 2D array, which looks like this:
[[ 1. 0. 0. nan]
[ 4. 4. 9.97 1. ]
[ 4. 4. 27.94 1. ]
[ 2. 1. 4.17 1. ]
[ 3. 2. 38.22 1. ]
[ 4. 4. 31.83 1. ]
[ 3. 4. 41.87 1. ]
[ 2. 1. 18.33 1. ]
[ 4. 4. 33.96 1. ]
[ 2. 1. 5.65 1. ]
[ 3. 3. 40.74 1. ]
[ 2. 1. 10.04 1. ]
[ 2. 2. 53.15 1. ]]
I want to add an array of 13 elements as the first column of the matrix. I tried np.column_stack and np.append, but either it expects a 1D vector or it doesn't work because I can't choose axis=1 and can only do np.append(peak_values, results).
Here is a very simple option for you using NumPy:
x = np.array( [[ 3.9427767, -4.297677 ],
[ 3.9427767, -4.297677 ],
[ 3.9427767, -4.297677 ],
[ 3.9427767, -4.297677 ],
[ 3.942777 , -4.297677 ],
[ 3.9427767, -4.297677 ],
[ 3.9427767, -4.297677 ],
[ 3.9427767 ,-4.297677 ],
[ 3.9427767, -4.297677 ],
[ 3.9427772 ,-4.297677 ]])
b = np.arange(10).reshape(1,-1)
np.concatenate((b.T, x), axis=1)
Output:
array([[ 0. , 3.9427767, -4.297677 ],
[ 1. , 3.9427767, -4.297677 ],
[ 2. , 3.9427767, -4.297677 ],
[ 3. , 3.9427767, -4.297677 ],
[ 4. , 3.942777 , -4.297677 ],
[ 5. , 3.9427767, -4.297677 ],
[ 6. , 3.9427767, -4.297677 ],
[ 7. , 3.9427767, -4.297677 ],
[ 8. , 3.9427767, -4.297677 ],
[ 9. , 3.9427772, -4.297677 ]])
Improving on this answer by removing the unnecessary transposition, you can indeed use reshape(-1, 1) to transform the 1D array you'd like to prepend along axis 1 into a 2D array with a single column. At that point, the two arrays differ in shape only along the second axis, and np.concatenate accepts them:
>>> import numpy as np
>>> a = np.arange(12).reshape(3, 4)
>>> b = np.arange(3)
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> b
array([0, 1, 2])
>>> b.reshape(-1, 1) # preview the reshaping...
array([[0],
[1],
[2]])
>>> np.concatenate((b.reshape(-1, 1), a), axis=1)
array([[ 0, 0, 1, 2, 3],
[ 1, 4, 5, 6, 7],
[ 2, 8, 9, 10, 11]])
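As an aside, np.column_stack does the single-column reshape implicitly, so the same result can also be written without the explicit reshape (shown here as an equivalent alternative, not a correction):
>>> np.column_stack((b, a))
array([[ 0,  0,  1,  2,  3],
       [ 1,  4,  5,  6,  7],
       [ 2,  8,  9, 10, 11]])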
For the simplest answer, you probably don't even need numpy.
Try the following:
new_array = []
new_array.append(your_array)
That's it.
I would suggest using Numpy. It will allow you to easily do what you want.
Here is an example of squaring the even numbers in a list; you can index it with something like nums[0].
nums = [0, 1, 2, 3, 4]
even_squares = [x ** 2 for x in nums if x % 2 == 0]
print(even_squares)  # Prints "[0, 4, 16]"
So I have a 2D array that, when sorted in descending order by the second column using a[np.argsort(-a[:,1])], looks like this:
array([[ 30. , 98.7804878 ],
[ 24. , 98.7804878 ],
[ 21. , 98.7804878 ],
[ 26. , 98.7804878 ],
[ 20. , 98.70875179],
[ 4. , 98.27833572],
[ 1. , 7.10186514]])
Now I want ties in the second column to be ordered by the lowest "id" (the first column), so it looks like this:
array([[ 21. , 98.7804878 ],
[ 24. , 98.7804878 ],
[ 26. , 98.7804878 ],
[ 30. , 98.7804878 ],
[ 20. , 98.70875179],
[ 4. , 98.27833572],
[ 1. , 7.10186514]])
I can't figure out how to do it, even if I take the rows with the highest percentages first and then try to order them by id.
You can use np.lexsort for this:
>>> a[np.lexsort((a[:, 0], -a[:, 1]))]
array([[ 21. , 98.7804878 ],
[ 24. , 98.7804878 ],
[ 26. , 98.7804878 ],
[ 30. , 98.7804878 ],
[ 20. , 98.70875179],
[ 4. , 98.27833572],
[ 1. , 7.10186514]])
This sorts primarily by -a[:, 1] and breaks ties with a[:, 0], returning an array of indices that you can use to index a.
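Note that np.lexsort treats the last key in the tuple as the primary key, which is why -a[:, 1] appears last. A small illustration on toy data:
>>> ids = np.array([30, 24, 21])
>>> pct = np.array([98.78, 98.78, 98.78])
>>> np.lexsort((ids, -pct))  # primary key -pct (all tied), ties broken by ids ascending
array([2, 1, 0])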
I have two arrays in NumPy:
a1 =
array([[ 262.99182129, 213. , 1. ],
[ 311.98925781, 271.99050903, 2. ],
[ 383. , 342. , 3. ],
[ 372.16494751, 348.83505249, 4. ],
[ 214.55493164, 137.01008606, 5. ],
[ 138.29714966, 199.75 , 6. ],
[ 289.75 , 220.75 , 7. ],
[ 239. , 279. , 8. ],
[ 130.75 , 348.25 , 9. ]])
a2 =
array([[ 265.78259277, 212.99705505, 1. ],
[ 384.23312378, 340.99707031, 3. ],
[ 373.66967773, 347.96688843, 4. ],
[ 217.91461182, 137.2791748 , 5. ],
[ 141.35340881, 199.38366699, 6. ],
[ 292.24401855, 220.83808899, 7. ],
[ 241.53366089, 278.56951904, 8. ],
[ 133.26490784, 347.14279175, 9. ]])
Actually there will be thousands of rows.
But as you can see, the third column in a2 does not have the value 2.0.
What I simply want is to remove from a1 the rows whose 3rd column values are not found in any row of a2.
What's the NumPy way/shortcut to do this fast?
One option is to use np.in1d to check whether each value in the third column of a1 is also present in the third column of a2, and use the resulting Boolean array to index the rows of a1.
You can do this as follows:
>>> a1[np.in1d(a1[:, 2], a2[:, 2])]
array([[ 262.99182129, 213. , 1. ],
[ 383. , 342. , 3. ],
[ 372.16494751, 348.83505249, 4. ],
[ 214.55493164, 137.01008606, 5. ],
[ 138.29714966, 199.75 , 6. ],
[ 289.75 , 220.75 , 7. ],
[ 239. , 279. , 8. ],
[ 130.75 , 348.25 , 9. ]])
The row in a1 with 2 in the third column is not in this array, as required.
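In recent NumPy versions, np.isin is the recommended replacement for np.in1d and reads the same way here:
>>> a1[np.isin(a1[:, 2], a2[:, 2])]  # same result as the np.in1d version above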