Assign values from 1d NumPy into classes - python

If I have a 1d array:
arr = np.array([ 5.243618, 5.219185, 4.755633, 5.685147, 5.2342 , 6.06918 ,
5.324837, 4.857919, 5.768971, 4.310884, 4.442189, 4.883281,
4.591852, 5.8325 , 5.865175, 5.642187, 5.941979, 6.30038 ,
6.475276, 4.598086, 5.822819, 5.938378, 6.271719, 5.465492,
4.230573, 4.331199, 4.912246, 4.878696, 5.393229, 4.857071,
4.95928 , 4.83672 , 5.530075, 4.233449, 5.591468, 4.546228,
4.710242, 4.880406, 4.279519, 4.461141, 6.168588, 6.074305,
5.720245, 6.127273, 5.79335 , 6.176584, 5.04695 , 5.80022 ,
5.899088, 5.925466, 5.095225, 6.33216 , 6.335905, 3.918357,
4.703728, 4.605504, 5.216878, 6.144148, 4.883721, 5.601009,])
and a list containing upper bounds:
bins = [4.9122459999999997, 5.3932289999999998, 5.7202450000000002, 6.0743049999999998, 6.475276]
I'd like to return an array of equal size to arr, containing the bin number for each value (1, 1, 0, 2, 1, 3, 1 etc.)
I've tried np.split() with the bins (patently wrong), but I can't find a simple method to do this.

You can use NumPy's digitize function to assign each value to a bin:
np.digitize(arr, bins)
The output contains the index of the bin that each data point belongs to. See the np.digitize documentation for details.
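Because bins holds upper bounds, values that sit exactly on a bound (such as the maximum, 6.475276) deserve a check. The sketch below uses the first seven values of arr plus the maximum, and compares the default behaviour with right=True, which treats each edge as an inclusive upper bound. The names sample, default_idx and inclusive_idx are illustrative only.

```python
import numpy as np

# First seven values of arr from the question, plus the maximum value
sample = np.array([5.243618, 5.219185, 4.755633, 5.685147,
                   5.2342, 6.06918, 5.324837, 6.475276])
bins = [4.9122459999999997, 5.3932289999999998, 5.7202450000000002,
        6.0743049999999998, 6.475276]

# Default (right=False): bins[i-1] <= x < bins[i], so a value equal to
# the last edge falls past the end and gets index len(bins)
default_idx = np.digitize(sample, bins)

# right=True: bins[i-1] < x <= bins[i], so each edge is an inclusive upper bound
inclusive_idx = np.digitize(sample, bins, right=True)

print(default_idx)
print(inclusive_idx)
```

If every value must land in one of the five bins (0 to 4), right=True is probably the closer match to "upper bounds"; for values strictly between edges the two calls agree.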

Related

Removing multiple values from ndarray at random

I need to remove multiple elements, specifically 11 samples (rows), from a NumPy array with shape (5891, 10), so that it can be reshaped into a 3D array of shape (-1, 6, 10), i.e. with a second dimension of 6. I need some help in this regard.
array([[-0.0296606 , -0.86639415, 1.31166578, ..., -0.56398655,
-0.62098712, -0.60561292],
[-0.08361501, -0.8338129 , 1.59085632, ..., -0.44607017,
-0.51810143, -0.73432292],
[-0.56023046, -0.90793786, 1.70571559, ..., -0.53988458,
0.16418027, -0.62065893],
...,
[ 0.08385978, -0.85598757, 2.09466405, ..., -0.53553566,
-0.41929891, -0.67636976],
[-0.1878731 , -0.8483329 , 1.93933521, ..., -0.66563641,
-0.43016374, -0.63886954],
[-0.06811212, -0.9358068 , 0.99574035, ..., -0.62080424,
-0.1695455 , -0.8211152 ]])
arr = np.random.random((5891, 10))
# set a static seed if you want reproducibility of the choices
rng = np.random.default_rng(seed=42)
# choose all but 11 rows
chosen = rng.choice(arr, size=arr.shape[0] - 11, replace=False, axis=0)
# and reshape
out = chosen.reshape((-1, 6, 10))
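Note that rng.choice returns the kept rows in shuffled order by default. If the original row order matters, one possible variation (a sketch, not part of the answer above) is to draw the 11 row indices to drop and remove them with np.delete. The names drop and kept are illustrative.

```python
import numpy as np

arr = np.random.random((5891, 10))
rng = np.random.default_rng(seed=42)

# Pick 11 distinct row indices to remove
drop = rng.choice(arr.shape[0], size=11, replace=False)

# np.delete keeps the remaining rows in their original order
kept = np.delete(arr, drop, axis=0)

# 5891 - 11 = 5880 rows, which is divisible by 6
out = kept.reshape((-1, 6, 10))
print(out.shape)
```

The count of 11 follows the question; since 5891 leaves a remainder of 5 when divided by 6, dropping 5 rows would also give a row count divisible by 6.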

cv2.perspectiveTransform() not performing the operation

I want to apply a transformation matrix to a set of points. So the set of points:
points = np.array([[0 ,20], [0, 575], [0, 460]])
And I want to use the matrix I calculated with cv2.getPerspectiveTransform() which is a 3x3 matrix.
matrix = np.array([
[ -4. , -3. , 1920. ],
[ -2.25 , -1.6875 , 1080. ],
[ -0.0020833, -0.0015625, 1. ]])
Then I pass the array and a matrix to the following function:
def poly_points_transform(poly_points, matrix):
    poly_points_transformed = np.empty_like(poly_points)
    for i in range(len(poly_points)):
        point = np.array([[poly_points[i]]])
        transformed_point = cv2.perspectiveTransform(point, matrix)
        np.append(poly_points_transformed, transformed_point)
    return poly_points_transformed
It doesn't throw an error, but it just copies the src array into poly_points_transformed. It might be something really rudimentary. If so, I apologize, but could someone give me a hint about what is wrong? Thanks in advance.
We may solve it with one line of code:
transformed_point = cv2.perspectiveTransform(np.array([points], np.float64), matrix)[0]
As Micka commented, cv2.perspectiveTransform takes a list of points (and returns a list of points as output).
np.array([points]) is used because cv2.perspectiveTransform expects a 3D array.
For details see trouble getting cv.transform to work.
np.float64 is used in case the dtype of points is int32 (the method accepts float64 and float32 types).
[0] is used for removing the redundant dimension (convert from 3D to 2D).
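To fix the loop, replace np.append(poly_points_transformed, transformed_point) with:
poly_points_transformed[i] = transformed_point[0]
np.append() returns a new array instead of modifying its argument in place, so the original call had no effect on poly_points_transformed. Since the array is already allocated with np.empty_like(poly_points), assigning by index is the right approach.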
Code sample:
import cv2
import numpy as np
points = np.array([[0.0 ,20.0], [0.0, 575.0], [0.0, 460.0]])
matrix = np.array([
[ -4. , -3. , 1920. ],
[ -2.25 , -1.6875 , 1080. ],
[ -0.0020833, -0.0015625, 1. ]])
# transformed_point = cv2.perspectiveTransform(np.array([points], np.float64), matrix)[0]
def poly_points_transform(poly_points, matrix):
    poly_points_transformed = np.empty_like(poly_points)
    for i in range(len(poly_points)):
        point = np.array([[poly_points[i]]])
        transformed_point = cv2.perspectiveTransform(point, matrix)
        poly_points_transformed[i] = transformed_point[0]  # instead of np.append(poly_points_transformed, transformed_point)
    return poly_points_transformed
poly_points_transformed = poly_points_transform(points, matrix)
The result is:
poly_points_transformed =
array([[1920., 1080.],
[1920., 1080.],
[1920., 1080.]])
Why are we getting [1920.0, 1080.0] value for all the transformed points?
Let's transform the middle point mathematically:
Multiply the matrix by the point (with 1 as the third coordinate):
[ -4.       , -3.       , 1920. ]   [  0]
[ -2.25     , -1.6875   , 1080. ] * [575] =
[ -0.0020833, -0.0015625,    1. ]   [  1]
p = matrix @ np.array([[0.0], [575.0], [1.0]]) =
[1.950000e+02]
[1.096875e+02]
[1.015625e-01]
Now divide the coordinates by the last element (converting homogeneous coordinates to Euclidean coordinates):
[1.950000e+02/1.015625e-01]              [1920]
[1.096875e+02/1.015625e-01] = p / p[2] = [1080]
[1.015625e-01/1.015625e-01]              [   1]
The equivalent Euclidean point is [1920, 1080].
The transformation matrix may be wrong, because it transforms every input point whose x coordinate equals 0 to the same output point.
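The hand calculation can be repeated for all three points at once with plain NumPy. This is only a sketch of the homogeneous-coordinate arithmetic described above, not a replacement for cv2.perspectiveTransform; the helper name apply_homography is made up for illustration.

```python
import numpy as np

points = np.array([[0.0, 20.0], [0.0, 575.0], [0.0, 460.0]])
matrix = np.array([
    [-4.0, -3.0, 1920.0],
    [-2.25, -1.6875, 1080.0],
    [-0.0020833, -0.0015625, 1.0]])


def apply_homography(pts, H):
    """Apply a 3x3 projective matrix H to an (N, 2) array of points."""
    # Append a 1 to every point to get homogeneous coordinates, shape (N, 3)
    homogeneous = np.hstack([pts, np.ones((pts.shape[0], 1))])
    # Multiply each point by H (row-vector form, hence the transpose)
    mapped = homogeneous @ H.T
    # Divide x and y by the third coordinate to return to Euclidean coordinates
    return mapped[:, :2] / mapped[:, 2:3]


result = apply_homography(points, matrix)
print(result)
```

All three rows of result come out at approximately [1920, 1080], which agrees with the output of the loop and supports the suspicion that the matrix itself is the problem.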

How to scale and print an array based on its minimum and maximum value?

I'm trying to scale the following NumPy array based on its minimum and maximum values.
array = [[17405.051 17442.4 17199.6 17245.65 ]
[17094.949 17291.75 17091.15 17222.75 ]
[17289. 17294.9 17076.551 17153. ]
[17181.85 17235.1 17003.9 17222. ]]
Formula used is:
m=(x-xmin)/(xmax-xmin)
wherein m is an individually scaled item, x is an individual item, xmax is the highest value and xmin is the smallest value of the array.
My question is how do I print the scaled array?
P.S. - I can't use MinMaxScaler as I need to scale a given number (outside the array) by plugging it in the mentioned formula with xmin & xmax of the given array.
I tried scaling the individual items by iterating over the array but I'm unable to put together the scaled array.
I'm new to NumPy, any suggestions would be welcome.
Thank you.
Use the ndarray.min() and ndarray.max() methods, or np.ptp(), which returns the range (max - min) of the values in the array. (The ndarray.ptp() method was removed in NumPy 2.0; the np.ptp() function is still available.)
>>> ar = np.array([[17405.051, 17442.4, 17199.6, 17245.65 ],
... [17094.949, 17291.75, 17091.15, 17222.75 ],
... [17289., 17294.9, 17076.551, 17153. ],
... [17181.85, 17235.1, 17003.9, 17222. ]])
>>> min_val = ar.min()
>>> range_val = np.ptp(ar)
>>> (ar - min_val) / range_val
array([[0.91482554, 1. , 0.44629418, 0.55131129],
[0.2076374 , 0.65644242, 0.19897377, 0.4990878 ],
[0.65017104, 0.663626 , 0.16568073, 0.34002281],
[0.40581528, 0.527252 , 0. , 0.49737742]])
I think you should learn more about the basic operations of NumPy.
import numpy as np
array_list = [[17405.051, 17442.4, 17199.6, 17245.65 ],
[17094.949, 17291.75, 17091.15, 17222.75 ],
[17289., 17294.9, 17076.551, 17153., ],
[17181.85, 17235.1, 17003.9, 17222. ]]
# Convert list into numpy array
array = np.array(array_list)
# Create empty list
scaled_array_list=[]
for x in array:
    m = (x - np.min(array))/(np.max(array)-np.min(array))
    scaled_array_list.append(m)
# Convert list into numpy array
scaled_array = np.array(scaled_array_list)
scaled_array
My version is by iterating over the array as you said.
You can also put everything in a function and reuse it later:
def scaler(array_to_scale):
    # Create empty list
    scaled_array_list = []
    # Use the function's own argument rather than the global array
    for x in array_to_scale:
        m = (x - np.min(array_to_scale)) / (np.max(array_to_scale) - np.min(array_to_scale))
        scaled_array_list.append(m)
    # Convert list into numpy array
    scaled_array = np.array(scaled_array_list)
    return scaled_array
# Here it is our input
array_list = [[17405.051, 17442.4, 17199.6, 17245.65 ],
[17094.949, 17291.75, 17091.15, 17222.75 ],
[17289., 17294.9, 17076.551, 17153., ],
[17181.85, 17235.1, 17003.9, 17222. ]]
# Convert list into numpy array
array = np.array(array_list)
scaler(array)
Output:
array([[0.91482554, 1. , 0.44629418, 0.55131129],
[0.2076374 , 0.65644242, 0.19897377, 0.4990878 ],
[0.65017104, 0.663626 , 0.16568073, 0.34002281],
[0.40581528, 0.527252 , 0. , 0.49737742]])
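The question also asks about scaling a number that is not in the array, using the array's own xmin and xmax. One way to cover both cases is a small helper that takes the bounds explicitly. This is a minimal sketch; the name min_max_scale and the sample value 17300.0 are assumptions made for illustration.

```python
import numpy as np

ar = np.array([[17405.051, 17442.4, 17199.6, 17245.65],
               [17094.949, 17291.75, 17091.15, 17222.75],
               [17289., 17294.9, 17076.551, 17153.],
               [17181.85, 17235.1, 17003.9, 17222.]])


def min_max_scale(values, xmin, xmax):
    """Apply m = (x - xmin) / (xmax - xmin) to a scalar or an array."""
    return (np.asarray(values, dtype=float) - xmin) / (xmax - xmin)


xmin, xmax = ar.min(), ar.max()

# Scale the whole array at once; broadcasting removes the need for a loop
scaled = min_max_scale(ar, xmin, xmax)
print(scaled)

# Scale a number from outside the array with the same bounds
outside = min_max_scale(17300.0, xmin, xmax)
print(outside)
```

A value outside the range [xmin, xmax] would scale to something below 0 or above 1, which may or may not be acceptable depending on the use case.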

Normalizing vectors contained in an array

I've got an array, called X, where every element is a 2d-vector itself. The diagonal of this array is filled with nothing but zero-vectors.
Now I need to normalize every vector in this array, without changing the structure of it.
First I tried to calculate the norm of every vector and put it in an array, called N. After that I wanted to divide every element of X by every element of N.
Two problems occurred:
1) Many entries of N are zero, which is obviously a problem when I try to divide by them.
2) The shapes of the arrays don't match, so np.divide() doesn't work as expected.
Beyond that, I don't think it's a good idea to calculate N like this, because later on I want to be able to do the same with more than two vectors.
import numpy as np
# Example array
X = np.array([[[0, 0], [1, -1]], [[-1, 1], [0, 0]]])
# Array containing the norms
N = np.vstack((np.linalg.norm(X[0], axis=1), np.linalg.norm(X[1],
axis=1)))
R = np.divide(X, N)
I want the output to look like this:
R = np.array([[[0, 0], [0.70710678, -0.70710678]], [[-0.70710678, 0.70710678], [0, 0]]])
You do not need to use sklearn. Just define a function and then use a list comprehension:
Assuming that the 0th dimension of X is equal to the number of 2D arrays that you have, use this:
import numpy as np
# Example array
X = np.array([[[0, 0], [1, -1]], [[-1, 1], [0, 0]]])
def stdmtx(X):
    X = X - X.mean(axis=1)[:, np.newaxis]
    X = X / X.std(axis=1, ddof=1)[:, np.newaxis]
    return np.nan_to_num(X)
R = np.array([stdmtx(X[i,:,:]) for i in range(X.shape[0])])
The desired output R:
array([[[ 0. , 0. ],
[ 0.70710678, -0.70710678]],
[[-0.70710678, 0.70710678],
[ 0. , 0. ]]])
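Note that stdmtx standardizes each row (subtracts the mean, then divides by the sample standard deviation). That reproduces the expected output for this example, but in general it is not the same as dividing by the Euclidean norm: for a vector such as [3, 4] it would give roughly [-0.707, 0.707] rather than [0.6, 0.8]. A sketch of a norm-based alternative follows, assuming the vectors lie along the last axis; the function name normalize_vectors is illustrative.

```python
import numpy as np


def normalize_vectors(X):
    """Divide every vector along the last axis by its Euclidean norm.

    Zero vectors are left as zeros instead of producing NaN.
    """
    X = np.asarray(X)
    # keepdims=True gives shape (..., 1), so the norms broadcast against X
    norms = np.linalg.norm(X, axis=-1, keepdims=True)
    # Divide only where the norm is non-zero; other entries keep the 0 from `out`
    return np.divide(X, norms,
                     out=np.zeros(X.shape, dtype=float),
                     where=norms != 0)


X = np.array([[[0, 0], [1, -1]], [[-1, 1], [0, 0]]])
R = normalize_vectors(X)
print(R)
```

Because the norm is taken along the last axis, the same call should work unchanged for a larger grid of vectors, which addresses the concern about building N by hand.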

Compare two numpy arrays by first Column and create a third numpy array by concatenating two arrays

I have two 2D NumPy arrays which are used to plot simulation results.
The first column of both arrays a and b contains the time intervals, and the second column contains the data to be plotted. The two arrays have different shapes: a is (500, 2) and b is (600, 2). I want to compare these two arrays by their first column and create a third array containing the matches found for the first column of a. If no match is found, put 0 in the third column.
Is there any numpy trick to do this?
For instance:
a=[[0.002,0.998],
[0.004,0.997],
[0.006,0.996],
[0.008,0.995],
[0.010,0.993]]
b= [[0.002,0.666],
[0.004,0.665],
[0.0041,0.664],
[0.0042,0.664],
[0.0043,0.664],
[0.0044,0.663],
[0.0045,0.663],
[0.0005,0.663],
[0.006,0.663],
[0.0061,0.662],
[0.008,0.661]]
expected output
c= [[0.002,0.998,0.666],
[0.004,0.997,0.665],
[0.006,0.996,0.663],
[0.008,0.995,0.661],
[0.010,0.993, 0 ]]
A quick solution I can think of:
import numpy as np
a = np.array([[0.002, 0.998],
[0.004, 0.997],
[0.006, 0.996],
[0.008, 0.995],
[0.010, 0.993]])
b = np.array([[0.002, 0.666],
[0.004, 0.665],
[0.0041, 0.664],
[0.0042, 0.664],
[0.0043, 0.664],
[0.0044, 0.663],
[0.0045, 0.663],
[0.0005, 0.663],
[0.0006, 0.663],
[0.00061, 0.662],
[0.0008, 0.661]])
c = []
for row in a:
    # Indices of the rows in b whose time equals this row's time
    index = np.where(b[:,0] == row[0])[0]
    if np.size(index) != 0:
        c.append([row[0], row[1], b[index[0], 1]])
    else:
        c.append([row[0], row[1], 0])
print(c)
As pointed out in the comments above, there seems to be a data entry error in the example.
import numpy as np
i = np.intersect1d(a[:,0], b[:,0])
overlap = np.vstack([i, a[np.isin(a[:,0], i), 1], b[np.isin(b[:,0], i), 1]]).T
underlap = np.setdiff1d(a[:,0], b[:,0])
underlap = np.vstack([underlap, a[np.isin(a[:,0], underlap), 1], underlap*0]).T
fast_c = np.vstack([overlap, underlap])
This works by taking the intersection of the first columns of a and b using intersect1d, and then using isin (the replacement for the older in1d, which is deprecated in NumPy 2.0) to pick out the matching second-column values.
vstack stacks its inputs vertically, and the transpose is needed to get the right dimensions (a very fast operation).
Then setdiff1d finds the times in a that are not in b, and the result is completed by putting 0s in the third column.
With the a and b defined above, fast_c is:
array([[ 0.002, 0.998, 0.666],
[ 0.004, 0.997, 0.665],
[ 0.006, 0.996, 0. ],
[ 0.008, 0.995, 0. ],
[ 0.01 , 0.993, 0. ]])
The following works both for numpy arrays and simple python lists.
c = [[*x, y[1]] for x in a for y in b if x[0] == y[0]]
d = [[*x, 0] for x in a if x[0] not in [y[0] for y in b]]
c.extend(d)
Someone braver than I am could try to make this one line.
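Exact == comparison of floating-point times can miss matches when the values come from separate computations. A possible vectorized sketch is shown below. It assumes each time in a matches at most one row of b, uses np.isclose with its default tolerances (which may need tuning for real data), and takes b from the question rather than from the answers.

```python
import numpy as np

a = np.array([[0.002, 0.998],
              [0.004, 0.997],
              [0.006, 0.996],
              [0.008, 0.995],
              [0.010, 0.993]])
b = np.array([[0.002, 0.666],
              [0.004, 0.665],
              [0.0041, 0.664],
              [0.0042, 0.664],
              [0.0043, 0.664],
              [0.0044, 0.663],
              [0.0045, 0.663],
              [0.0005, 0.663],
              [0.006, 0.663],
              [0.0061, 0.662],
              [0.008, 0.661]])

# match[i, j] is True when a[i, 0] and b[j, 0] agree within tolerance
match = np.isclose(a[:, [0]], b[:, 0])

# Does each row of a have a partner in b, and if so, which one comes first?
has_match = match.any(axis=1)
first_hit = match.argmax(axis=1)  # 0 when there is no match; masked out below

# Take the value from b where a match exists, otherwise 0
third = np.where(has_match, b[first_hit, 1], 0.0)
c = np.column_stack([a, third])
print(c)
```

The comparison matrix has len(a) * len(b) entries, which is small for arrays of 500 and 600 rows but could become costly for much larger inputs.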
