max value from specified column in numpy array - python

I need the maximum values from a list of NumPy arrays, each with 3 columns.
Specifically, for each array I need the row with the maximum value in the last column.
In this case the maxima are 57.65048981 for the first array, 58.3501091 for the second and 56.86465836 for the third. How can I get these 3 values in one array, together with the 2 values in the preceding columns of the same row?
[array([[ 402. , 242. , 57.65048981],
[ 401. , 243. , 56.32482529]]),
array([[ 356. , 257. , 53.3116188 ],
[ 355. , 258. , 53.69690704],
[ 356. , 258. , 57.52435684],
[ 355. , 259. , 56.98838806],
[ 356. , 259. , 57.81959152],
[ 354. , 260. , 55.90369415],
[ 355. , 260. , 58.14822769],
[ 356. , 260. , 58.3501091 ],
[ 354. , 261. , 55.1479187 ],
[ 355. , 261. , 58.20180893],
[ 354. , 262. , 54.5345459 ]]),
array([[ 386. , 260. , 56.86465836],
[ 386. , 261. , 54.28659439],
[ 386. , 259. , 56.53445435]])]
The result of this should be:
[[402, 242, 57.65048981],
[356 ,260, 58.3501091],
[386 ,260, 56.86465836]]

I think there's an error in your "results"
np.array([arr[np.argmax(arr[:, 2]), :] for arr in arrays])
returns
array([[ 402. , 242. , 57.65048981],
[ 356. , 260. , 58.3501091 ],
[ 386. , 260. , 56.86465836]])
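As a self-contained sketch of the answer above (using a shortened copy of the question's data, so the second array has fewer rows than the original): the arrays have different row counts, so they cannot be stacked into a single 3D array, which is why a list comprehension over the list is a reasonable choice here. The names `arrays` and `best_rows` are only illustrative.

```python
import numpy as np

# Shortened copy of the sample data; each array has a different number of rows.
arrays = [
    np.array([[402., 242., 57.65048981],
              [401., 243., 56.32482529]]),
    np.array([[356., 257., 53.3116188],
              [356., 260., 58.3501091],
              [354., 262., 54.5345459]]),
    np.array([[386., 260., 56.86465836],
              [386., 261., 54.28659439],
              [386., 259., 56.53445435]]),
]

# For each array, find the row index of the largest value in column 2
# and keep that whole row.
best_rows = np.array([arr[np.argmax(arr[:, 2])] for arr in arrays])
print(best_rows)
```

If several rows share the maximum, `np.argmax` returns the first of them.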

Related

How do I compute start (x0,y0) and end (x1,y1) coordinates for a vector plot from two 3D arrays?

I have two numpy arrays x and y which I wish to use to compute start (x0, y0) and end coordinates (x1, y1) for a vector plot and return a ColumnDataSource.
Some extra details:
To reduce the density, only pick every VECTOR_GRID_SIZEth coordinate of the vector field, so if VECTOR_GRID_SIZE is 10 then only pick every 10th grid line. See "subsampling every nth entry in a numpy array" (https://stackoverflow.com/questions/25876640/subsampling-every-nth-entry-in-a-numpy-array) and numpy.indices (https://numpy.org/doc/stable/reference/generated/numpy.indices.html).
To reduce the cluttered visuals, scale the length of the vectors so that they are no longer than one grid cell diagonal. See numpy.linalg.norm (https://numpy.org/doc/stable/reference/generated/numpy.linalg.norm.html).
Sorry in advance if this question is vague; I'm not really sure how to go about this.
from bokeh.models import ColumnDataSource

def get_vector(vx_wind, vy_wind):
    # Compute start (x0, y0) and end coordinates (x1, y1) for the vector plot
    # and return a ColumnDataSource
    # To reduce the density, only pick every VECTOR_GRID_SIZEth coordinate of the vector field
    # so if VECTOR_GRID_SIZE is 10 then only pick every 10th grid line, you can do this with the numpy indexing
    # https://stackoverflow.com/questions/25876640/subsampling-every-nth-entry-in-a-numpy-array
    # Have a look at the numpy.indices function https://numpy.org/doc/stable/reference/generated/numpy.indices.html
    # To reduce the cluttered visuals, scale the length of the vector such that they are not longer
    # than one grid cell diagonal. Have a look at the np.linalg.norm function
    # https://numpy.org/doc/stable/reference/generated/numpy.linalg.norm.html
    return ColumnDataSource(dict(
        x0=[],
        y0=[],
        x1=[],
        y1=[],
    ))
The data being used looks like so
x
array([[[ -1.9352316 , -2.1784391 , -2.0420794 , ..., -7.7259355 ,
-8.222389 , -8.678087 ],
[ -1.9066508 , -2.138837 , -1.9983222 , ..., -7.724714 ,
-8.213853 , -8.661865 ],
[ -1.8783709 , -2.0985072 , -1.9538376 , ..., -7.722766 ,
-8.204756 , -8.64526 ],
...,
[ -3.9277816 , -4.369289 , -4.192238 , ..., -10.524209 ,
-10.877747 , -11.316841 ],
[ -3.9103518 , -4.363982 , -4.2031007 , ..., -10.462545 ,
-10.822174 , -11.269867 ],
[ -3.8947043 , -4.358904 , -4.2136602 , ..., -10.403587 ,
-10.768843 , -11.224494 ]],
[[ -1.825511 , -2.1051912 , -1.9596707 , ..., -7.8445425 ,
-8.280411 , -8.609112 ],
[ -1.7917014 , -2.0628197 , -1.9154303 , ..., -7.862482 ,
-8.28994 , -8.6136055 ],
[ -1.8305721 , -2.0650976 , -1.9168247 , ..., -7.7345357 ,
-8.211504 , -8.650529 ],
...,
[ -3.7763813 , -4.182658 , -4.0067177 , ..., -10.80915 ,
-11.230576 , -11.72405 ],
[ -3.553316 , -3.9992642 , -3.8580554 , ..., -10.715811 ,
-11.127258 , -11.63514 ],
[ -3.6532574 , -4.1288366 , -4.0021563 , ..., -10.53067 ,
-10.913573 , -11.3961 ]],
[[ -1.7675097 , -2.0511622 , -1.8993446 , ..., -7.6926303 ,
-8.149524 , -8.561256 ],
[ -1.6994653 , -1.9638515 , -1.807301 , ..., -7.6823363 ,
-8.141736 , -8.568396 ],
[ -1.7125058 , -1.9276817 , -1.7541373 , ..., -7.647284 ,
-8.143199 , -8.618127 ],
...,
[ -3.6395886 , -4.0289826 , -3.874856 , ..., -11.132759 ,
-11.638063 , -12.186226 ],
[ -3.4105558 , -3.8367655 , -3.7143204 , ..., -10.905692 ,
-11.356341 , -11.8825655 ],
[ -3.5878556 , -4.0547514 , -3.9376018 , ..., -10.61081 ,
-11.002459 , -11.483203 ]],
...,
[[ 0.12312252, 0.12312252, -3.1158295 , ..., 1.4167148 ,
1.2708584 , 1.2205316 ],
[ 0.12312252, 0.12312252, -3.2214224 , ..., 1.2588239 ,
1.0818043 , 1.0105078 ],
[ 0.12312252, 0.12312252, -3.0888753 , ..., 1.2072893 ,
1.0733879 , 1.0432893 ],
...,
[ 2.1753364 , 2.551509 , 2.9152517 , ..., -0.41788816,
-0.59589154, -0.6281264 ],
[ 3.5570922 , 3.8387778 , 4.121702 , ..., -0.21907938,
-0.3638638 , -0.5040648 ],
[ 3.439614 , 3.715447 , 4.0491242 , ..., -0.03998844,
-0.13260394, -0.20173173]],
[[ 0.12312252, 0.12312252, -3.0228846 , ..., 1.5080769 ,
1.4269344 , 1.4237957 ],
[ 0.12312252, 0.12312252, -3.056627 , ..., 1.4476027 ,
1.3561925 , 1.3412501 ],
[ 0.12312252, 0.12312252, -2.8762548 , ..., 1.4169012 ,
1.304064 , 1.2805291 ],
...,
[ 1.8998985 , 2.3315022 , 2.7449026 , ..., -0.18343997,
-0.30102775, -0.25929037],
[ 3.1145933 , 3.3163126 , 3.594128 , ..., -0.13163733,
-0.254448 , -0.19267306],
[ 3.302767 , 3.4859457 , 3.813427 , ..., -0.04629464,
-0.14632823, -0.06244416]],
[[ 0.12312252, 0.12312252, -2.7911987 , ..., 1.5441452 ,
1.4401604 , 1.42046 ],
[ 0.12312252, 0.12312252, -2.7635627 , ..., 1.5532548 ,
1.4525807 , 1.4371556 ],
[ 0.12312252, 0.12312252, -2.7272394 , ..., 1.5624781 ,
1.4651129 , 1.4540106 ],
...,
[ 2.4319267 , 2.8567944 , 3.3883882 , ..., 0.05471662,
-0.02231047, 0.0564903 ],
[ 2.4382203 , 2.8614144 , 3.3819 , ..., 0.04200558,
-0.03283606, 0.04848568],
[ 2.4449396 , 2.8661656 , 3.375133 , ..., 0.02939283,
-0.04327111, 0.04050524]]], dtype=float32)
y
array([[[-2.595199 , -3.2169511 , -3.2727983 , ..., 1.3928391 ,
1.6229352 , 1.8929038 ],
[-2.558872 , -3.1848092 , -3.2498548 , ..., 1.4353906 ,
1.6272032 , 1.8064046 ],
[-2.5326447 , -3.1423392 , -3.2141132 , ..., 1.2896425 ,
1.5389758 , 1.8589873 ],
...,
[ 0.14270303, 0.24559933, 0.16122879, ..., -3.4562018 ,
-2.9151714 , -2.5166512 ],
[ 0.20630346, 0.31290913, 0.21521369, ..., -3.5718331 ,
-3.0277667 , -2.6177022 ],
[ 0.4457795 , 0.52072036, 0.40494448, ..., -3.7770114 ,
-3.2327695 , -2.780641 ]],
[[-2.6011267 , -3.1902573 , -3.2494836 , ..., 1.4165395 ,
1.6578158 , 1.9397547 ],
[-2.5944035 , -3.1658921 , -3.2389994 , ..., 1.4419188 ,
1.6282779 , 1.7864251 ],
[-2.5417545 , -3.0854318 , -3.1678715 , ..., 1.3133304 ,
1.5582792 , 1.8606572 ],
...,
[ 0.02837605, 0.12706836, 0.04636747, ..., -3.1712863 ,
-2.601889 , -2.1924775 ],
[ 0.14692819, 0.25369537, 0.15824656, ..., -3.4253662 ,
-2.8756504 , -2.465939 ],
[ 0.47017992, 0.54757214, 0.4331736 , ..., -3.777063 ,
-3.2332487 , -2.781751 ]],
[[-2.6055431 , -3.1633894 , -3.2261672 , ..., 1.4398497 ,
1.6922122 , 1.9859586 ],
[-2.6524377 , -3.1590784 , -3.232543 , ..., 1.4359448 ,
1.656704 , 1.8844774 ],
[-2.5750177 , -3.049425 , -3.1341977 , ..., 1.5483334 ,
1.7405636 , 1.9164655 ],
...,
[ 0.3395448 , 0.40663922, 0.29736227, ..., -3.2353017 ,
-2.608729 , -2.0984588 ],
[ 0.4134232 , 0.48698184, 0.37462074, ..., -3.501945 ,
-2.937528 , -2.4711974 ],
[ 0.49486905, 0.57474923, 0.46182927, ..., -3.7773702 ,
-3.233875 , -2.7830174 ]],
...,
[[ 3.0361044 , 3.0361044 , 4.0558887 , ..., 2.4582958 ,
2.3423316 , 2.3435678 ],
[ 3.0361044 , 3.0361044 , 4.090932 , ..., 2.4705472 ,
2.363343 , 2.3656795 ],
[ 3.0361044 , 3.0361044 , 4.1839056 , ..., 2.5986 ,
2.4977689 , 2.4689748 ],
...,
[ 3.2534566 , 3.3729532 , 2.9451652 , ..., -1.7741623 ,
-1.8449925 , -1.9917284 ],
[ 2.8036075 , 2.9033844 , 2.4022186 , ..., -1.5723983 ,
-1.5823556 , -1.6814709 ],
[ 3.2926936 , 3.4109974 , 3.108499 , ..., -1.3745431 ,
-1.3424737 , -1.4097861 ]],
[[ 3.0361044 , 3.0361044 , 4.1630526 , ..., 2.471031 ,
2.355262 , 2.3562658 ],
[ 3.0361044 , 3.0361044 , 4.323666 , ..., 2.4462888 ,
2.342822 , 2.4139795 ],
[ 3.0361044 , 3.0361044 , 4.369047 , ..., 2.7487423 ,
2.689608 , 2.5143237 ],
...,
[ 4.774482 , 4.858542 , 4.4763894 , ..., -1.7574229 ,
-1.8401223 , -1.9464349 ],
[ 4.2394996 , 4.168938 , 3.623238 , ..., -1.605789 ,
-1.6330541 , -1.706359 ],
[ 3.278389 , 3.4058824 , 3.120429 , ..., -1.363395 ,
-1.3358322 , -1.4080203 ]],
[[ 3.0361044 , 3.0361044 , 4.276292 , ..., 2.4835618 ,
2.3681564 , 2.368855 ],
[ 3.0361044 , 3.0361044 , 4.387421 , ..., 2.4378831 ,
2.3277998 , 2.3953032 ],
[ 3.0361044 , 3.0361044 , 4.360895 , ..., 2.6621757 ,
2.5886424 , 2.4301517 ],
...,
[ 4.551075 , 4.6568046 , 4.374236 , ..., -1.550195 ,
-1.5861742 , -1.6662221 ],
[ 4.3284426 , 4.288225 , 3.8846107 , ..., -1.4945599 ,
-1.5059901 , -1.5768548 ],
[ 3.2622411 , 3.3990464 , 3.1308765 , ..., -1.3521166 ,
-1.3291104 , -1.4061832 ]]], dtype=float32)
x.shape
y.shape
(500, 500, 100)
(500, 500, 100)
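One possible way to fill in the stub is sketched below. It rests on several assumptions that the question does not confirm: the function receives a single 2D slice of each array (the question's arrays are 3D, and which axis is time is not stated), coordinates are plain grid indices, and "one grid cell diagonal" refers to a cell of the subsampled grid. The name `vector_endpoints` and the `step` parameter are hypothetical, and the function returns a plain dict so that it runs without Bokeh.

```python
import numpy as np

VECTOR_GRID_SIZE = 10  # assumed value, taken from the example in the question


def vector_endpoints(vx, vy, step=VECTOR_GRID_SIZE):
    """Sketch: start and end points for a subsampled 2D vector field."""
    # Keep every step-th grid line in both directions.
    vx_s = vx[::step, ::step]
    vy_s = vy[::step, ::step]

    # np.indices gives row/column index grids for the subsampled field;
    # multiplying by step maps them back to original grid coordinates.
    rows, cols = np.indices(vx_s.shape)
    x0 = cols * float(step)
    y0 = rows * float(step)

    # Length of every vector (norm over the stacked component axis).
    lengths = np.linalg.norm(np.stack([vx_s, vy_s]), axis=0)

    # Scale so that the longest vector spans exactly one (subsampled)
    # grid cell diagonal; all other vectors are shorter.
    diagonal = step * np.sqrt(2.0)
    max_len = lengths.max()
    scale = diagonal / max_len if max_len > 0 else 1.0

    return {
        "x0": x0.ravel(),
        "y0": y0.ravel(),
        "x1": (x0 + vx_s * scale).ravel(),
        "y1": (y0 + vy_s * scale).ravel(),
    }
```

With Bokeh installed, the result could be wrapped as `ColumnDataSource(vector_endpoints(x[..., 0], y[..., 0]))`, assuming the last axis indexes time steps.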

Having trouble indexing specific vectors in dataset

This is what the dataset looks like. Type: dict, len: 7500.
I am having trouble indexing specific values within the arrays. There are 46 vectors per array (23 x/y pairs). If possible, I would like to select a specific vector for all the 'sequences' in the dataset. Ideally, I would like to separate the 46 vectors into 23 different variables so that I can cluster them.
Some of the code that I have tried:
for sequence, vectors in df.items():
    df1 = np.array_split(vectors, 23, axis=1)
    print(df1)
The problem was that I couldn't get Python to recognize the x, y pairs.
df1 = pd.DataFrame
I also tried converting this into a DataFrame but still could not index it.
'sequence_7303': array([[ 38.382774 , -1.6118518 , 3.3157895 , ..., 7.757037 ,
-26.928228 , -35.36 ],
[ 38.282295 , -1.6118518 , 3.3157895 , ..., 7.6562963 ,
-27.591389 , -35.22904 ],
[ 38.282295 , -1.7125926 , 3.3157895 , ..., 7.6562963 ,
-28.264595 , -35.108147 ],
...,
[ 51.84689 , -0.60444444, 49.33493 , ..., -16.42074 ,
51.997604 , -24.127409 ],
[ 51.94737 , -0.9066667 , 49.736843 , ..., -16.42074 ,
52.36938 , -24.973629 ],
[ 51.94737 , -1.1081481 , 50.038277 , ..., -16.32 ,
52.751198 , -25.81985 ]], dtype=float32),
'sequence_7302': array([[ 40.100502 , 1.3293233, -7.8090453, ..., 2.863158 ,
-27.753767 , 5.419549 ],
[ 39.994976 , 1.2270677, -8.125628 , ..., 2.7609022,
-28.830153 , 5.153684 ],
[ 39.889446 , 1.2270677, -8.547738 , ..., 2.7609022,
-29.906534 , 4.88782 ],
...,
[ 51.497486 , 3.2721806, 34.190952 , ..., -2.5563908,
42.569847 , 19.121803 ],
[ 51.603016 , 3.3744361, 34.296482 , ..., -2.5563908,
43.023617 , 19.172932 ],
[ 51.603016 , 3.4766917, 34.296482 , ..., -2.4541354,
43.477386 , 19.22406 ]], dtype=float32),
'sequence_1465': array([[ 33.635933 , -0.09883721, -23.654943 , ..., -3.8546512 ,
-24.752851 , -3.8546512 ],
[ 33.53612 , -0.09883721, -24.153992 , ..., -4.3488374 ,
-25.13213 , -4.7046514 ],
[ 33.43631 , 0. , -24.752851 , ..., -4.94186 ,
-25.511406 , -5.5546513 ],
...,
[ 50.40399 , 8.203488 , 2.8944867 , ..., 12.156977 ,
48.95675 , 30.175 ],
[ 50.40399 , 8.401163 , 2.9942966 , ..., 12.3546505 ,
49.276142 , 30.352907 ],
[ 50.30418 , 8.5 , 3.1939163 , ..., 12.552325 ,
49.60551 , 30.540699 ]], dtype=float32),
'sequence_642': array([[ 39.011856 , -1.8658537 , 1.7638341 , ..., -20.939024 ,
-17.534584 , 23.737804 ],
[ 39.011856 , -1.7621951 , 1.5563241 , ..., -20.835365 ,
-17.596838 , 23.665243 ],
[ 39.011856 , -1.6585366 , 1.3488142 , ..., -20.731709 ,
-17.659092 , 23.603048 ],
...,
[ 43.265812 , 2.1768293 , 0. , ..., -18.34756 ,
2.2618577 , 9.598781 ],
[ 43.369564 , 2.1768293 , -0.10375495, ..., -18.34756 ,
3.0503953 , 9.515854 ],
[ 43.369564 , 2.1768293 , -0.20750989, ..., -18.34756 ,
3.838933 , 9.432927 ]], dtype=float32),
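A minimal sketch of one way to separate the 23 x/y pairs, assuming the 46 columns are interleaved as x0, y0, x1, y1, ... (the question does not state the column order) and using a small random dict as a stand-in for the real data. The names `data`, `split_points` and `point_7` are hypothetical.

```python
import numpy as np

# Hypothetical stand-in for the real dict: 3 sequences, 5 rows, 46 columns each.
rng = np.random.default_rng(0)
data = {
    f"sequence_{i}": rng.normal(size=(5, 46)).astype(np.float32)
    for i in range(3)
}


def split_points(vectors, n_points=23):
    """Reshape (n_rows, 46) into (n_rows, 23, 2): one (x, y) pair per point."""
    return vectors.reshape(vectors.shape[0], n_points, 2)


# Select the same point (index 7 here) from every sequence.
point_7 = {name: split_points(v)[:, 7, :] for name, v in data.items()}

# Or unpack all 23 points of one sequence into a list of (n_rows, 2) arrays.
points = [split_points(data["sequence_0"])[:, k, :] for k in range(23)]
```

If the columns are instead laid out as all x values followed by all y values, the reshape would need to be adapted (for example `vectors.reshape(n_rows, 2, 23)` followed by a transpose).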

matrix.dot(inv(matrix)) isn't equal to identity matrix

I've been stuck on this for hours: I don't understand why the matrix D below doesn't equal the identity matrix:
import numpy as np
from numpy.linalg import inv
A = np.random.randint(50, size=(100, 2))
V = A.dot(A.T)
D = V.dot(inv(V))
D
The result I get is below:
array([[ 3.26611328, 7.87890625, 14.1953125 , ..., 2. ,
-5. , -24. ],
[ -5.91061401, -26.05834961, 5.30126953, ..., -10. ,
8. , -16. ],
[ -2.64431763, 3.55639648, 3.10107422, ..., -0.5 ,
-5. , -4. ],
...,
[ -2.62512207, -7.78222656, 10.26367188, ..., -6. ,
18. , 0. ],
[ -3.0625 , 14. , -4. , ..., -0.0625 ,
0. , 8. ],
[ 2. , -7. , 16. , ..., -7.5 ,
-8. , -4. ]])
Thank you for your help
I've found my issue:
I was trying to compute inv() of a matrix whose determinant is 0 (V = A.dot(A.T) is 100x100 but has rank at most 2), which is why the result wasn't correct.
D = A.T.dot(A)
inv(D).dot(D)
With this 2x2 matrix I do get the identity matrix.
Thank you
Habib
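The singularity explanation can be checked numerically. The sketch below uses a seeded random generator instead of `np.random.randint` so that it is reproducible; the variable names `G`, `rank_V` and `identity_check` are illustrative. Because A has only 2 columns, A.dot(A.T) is a 100x100 matrix of rank at most 2, so `inv` on it returns numerical noise, while the 2x2 product A.T.dot(A) is generally invertible.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(0, 50, size=(100, 2)).astype(float)

V = A @ A.T  # 100x100, but rank at most 2, hence singular
G = A.T @ A  # 2x2, full rank unless the columns of A are linearly dependent

rank_V = np.linalg.matrix_rank(V)
identity_check = np.allclose(np.linalg.inv(G) @ G, np.eye(2))

print(rank_V)          # at most 2
print(identity_check)
```

When a generalised inverse of a singular matrix is actually wanted, `np.linalg.pinv` is the usual tool, but `V @ pinv(V)` is a projection rather than the identity.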

Sort 2D NumPy array by one of the columns

I thought this would be super easy but I am struggling a little. I have a data structure as follows
array([[ 5. , 3.40166205],
[ 10. , 2.72778882],
[ 15. , 2.31881804],
[ 20. , 2.50643777],
[ 1. , 3.94076063],
[ 2. , 3.80598599],
[ 3. , 3.67121134],
[ 6. , 3.2668874 ],
[ 7. , 3.13211276],
[ 8. , 2.99733811],
[ 9. , 2.86256347],
[ 11. , 2.64599467],
[ 12. , 2.56420051],
[ 13. , 2.48240635],
[ 14. , 2.4006122 ],
[ 16. , 1.8280531 ],
[ 17. , 1.74625894],
[ 18. , 1.66446479],
[ 19. , 1.58267063],
[ 20. , 1.50087647]])
And I want to sort it ONLY on the first column ... so it is ordered as follows:
array([[1. , 3.9],
[2. , 3.8],
... ,
[20. , 1.5]])
np.sort doesn't seem to work, as it flattens the array. I've also tried itemgetter
from operator import itemgetter
sorted(data, key=itemgetter(1))
But this doesn't give me the output I'm looking for.
Help appreciated!
This is a common numpy idiom. You can use argsort (on the first column) + numpy indexing here -
x[x[:, 0].argsort()]
array([[ 1. , 3.94076063],
[ 2. , 3.80598599],
[ 3. , 3.67121134],
[ 5. , 3.40166205],
[ 6. , 3.2668874 ],
[ 7. , 3.13211276],
[ 8. , 2.99733811],
[ 9. , 2.86256347],
[ 10. , 2.72778882],
[ 11. , 2.64599467],
[ 12. , 2.56420051],
[ 13. , 2.48240635],
[ 14. , 2.4006122 ],
[ 15. , 2.31881804],
[ 16. , 1.8280531 ],
[ 17. , 1.74625894],
[ 18. , 1.66446479],
[ 19. , 1.58267063],
[ 20. , 2.50643777],
[ 20. , 1.50087647]])
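A small runnable sketch of the same idiom, using a shortened copy of the data. The sample contains two rows whose first column is 20; passing `kind="stable"` to `argsort` is an optional addition (not part of the answer above) that guarantees tied rows keep their original relative order.

```python
import numpy as np

# Shortened copy of the question's data, including the duplicated key 20.
x = np.array([[5., 3.40166205],
              [20., 2.50643777],
              [1., 3.94076063],
              [20., 1.50087647],
              [2., 3.80598599]])

# argsort on the first column gives the row order; indexing applies it
# to whole rows, so the second column travels with the first.
sorted_x = x[x[:, 0].argsort(kind="stable")]
print(sorted_x)
```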

NumPy: removing rows in an array if one column's value does not match

I have two arrays in NumPy:
a1 =
array([[ 262.99182129, 213. , 1. ],
[ 311.98925781, 271.99050903, 2. ],
[ 383. , 342. , 3. ],
[ 372.16494751, 348.83505249, 4. ],
[ 214.55493164, 137.01008606, 5. ],
[ 138.29714966, 199.75 , 6. ],
[ 289.75 , 220.75 , 7. ],
[ 239. , 279. , 8. ],
[ 130.75 , 348.25 , 9. ]])
a2 =
array([[ 265.78259277, 212.99705505, 1. ],
[ 384.23312378, 340.99707031, 3. ],
[ 373.66967773, 347.96688843, 4. ],
[ 217.91461182, 137.2791748 , 5. ],
[ 141.35340881, 199.38366699, 6. ],
[ 292.24401855, 220.83808899, 7. ],
[ 241.53366089, 278.56951904, 8. ],
[ 133.26490784, 347.14279175, 9. ]])
Actually there will be thousands of rows.
But as you can see, the third column in a2 does not have the value 2.0.
What I simply want is to remove from a1 the rows whose 3rd column values are not found in any row of a2.
What's the NumPy way/shortcut to do this fast?
One option is to use np.in1d to check whether each of the values in column 2 of a1 is in column 2 of a2 and use the resulting Boolean array to index the rows of a1.
You can do this as follows:
>>> a1[np.in1d(a1[:, 2], a2[:, 2])]
array([[ 262.99182129, 213. , 1. ],
[ 383. , 342. , 3. ],
[ 372.16494751, 348.83505249, 4. ],
[ 214.55493164, 137.01008606, 5. ],
[ 138.29714966, 199.75 , 6. ],
[ 289.75 , 220.75 , 7. ],
[ 239. , 279. , 8. ],
[ 130.75 , 348.25 , 9. ]])
The row in a1 with 2 in the third column is not in this array, as required.
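Note that `np.in1d` is deprecated as of NumPy 2.0 in favour of `np.isin`, which gives the same mask for 1D inputs. A minimal sketch with a shortened, rounded copy of the question's data (the names `mask` and `filtered` are illustrative):

```python
import numpy as np

a1 = np.array([[262.99, 213.00, 1.0],
               [311.99, 271.99, 2.0],
               [383.00, 342.00, 3.0],
               [372.16, 348.84, 4.0]])
a2 = np.array([[265.78, 213.00, 1.0],
               [384.23, 341.00, 3.0],
               [373.67, 347.97, 4.0]])

# True where the id in a1's third column also appears in a2's third column.
mask = np.isin(a1[:, 2], a2[:, 2])
filtered = a1[mask]
print(filtered)
```

Because the ids are stored as floats, this relies on them being exactly equal in both arrays, which holds for whole-number values like these.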
