Having trouble indexing specific vectors in dataset - python

This is what the dataset looks like. Type: dict, len: 7500.
I am having trouble trying to index specific values within the arrays! There are 46 vectors per array (23 x/y pairs). If possible I am trying to select a specific vector for all the 'sequences' in the dataset. Or ideally I am trying to separate the 46 vectors into 23 different variables to cluster them!
Some of the code that I have tried:
for sequence, vectors in df.items():
    df1 = np.array_split(vectors, 23, axis=1)
    print(df1)
The problem was I couldn't get python to recognize the x, y.
df1 = pd.DataFrame
I tried converting this into a dataframe but still could not index it.
'sequence_7303': array([[ 38.382774 , -1.6118518 , 3.3157895 , ..., 7.757037 ,
-26.928228 , -35.36 ],
[ 38.282295 , -1.6118518 , 3.3157895 , ..., 7.6562963 ,
-27.591389 , -35.22904 ],
[ 38.282295 , -1.7125926 , 3.3157895 , ..., 7.6562963 ,
-28.264595 , -35.108147 ],
...,
[ 51.84689 , -0.60444444, 49.33493 , ..., -16.42074 ,
51.997604 , -24.127409 ],
[ 51.94737 , -0.9066667 , 49.736843 , ..., -16.42074 ,
52.36938 , -24.973629 ],
[ 51.94737 , -1.1081481 , 50.038277 , ..., -16.32 ,
52.751198 , -25.81985 ]], dtype=float32),
'sequence_7302': array([[ 40.100502 , 1.3293233, -7.8090453, ..., 2.863158 ,
-27.753767 , 5.419549 ],
[ 39.994976 , 1.2270677, -8.125628 , ..., 2.7609022,
-28.830153 , 5.153684 ],
[ 39.889446 , 1.2270677, -8.547738 , ..., 2.7609022,
-29.906534 , 4.88782 ],
...,
[ 51.497486 , 3.2721806, 34.190952 , ..., -2.5563908,
42.569847 , 19.121803 ],
[ 51.603016 , 3.3744361, 34.296482 , ..., -2.5563908,
43.023617 , 19.172932 ],
[ 51.603016 , 3.4766917, 34.296482 , ..., -2.4541354,
43.477386 , 19.22406 ]], dtype=float32),
'sequence_1465': array([[ 33.635933 , -0.09883721, -23.654943 , ..., -3.8546512 ,
-24.752851 , -3.8546512 ],
[ 33.53612 , -0.09883721, -24.153992 , ..., -4.3488374 ,
-25.13213 , -4.7046514 ],
[ 33.43631 , 0. , -24.752851 , ..., -4.94186 ,
-25.511406 , -5.5546513 ],
...,
[ 50.40399 , 8.203488 , 2.8944867 , ..., 12.156977 ,
48.95675 , 30.175 ],
[ 50.40399 , 8.401163 , 2.9942966 , ..., 12.3546505 ,
49.276142 , 30.352907 ],
[ 50.30418 , 8.5 , 3.1939163 , ..., 12.552325 ,
49.60551 , 30.540699 ]], dtype=float32),
'sequence_642': array([[ 39.011856 , -1.8658537 , 1.7638341 , ..., -20.939024 ,
-17.534584 , 23.737804 ],
[ 39.011856 , -1.7621951 , 1.5563241 , ..., -20.835365 ,
-17.596838 , 23.665243 ],
[ 39.011856 , -1.6585366 , 1.3488142 , ..., -20.731709 ,
-17.659092 , 23.603048 ],
...,
[ 43.265812 , 2.1768293 , 0. , ..., -18.34756 ,
2.2618577 , 9.598781 ],
[ 43.369564 , 2.1768293 , -0.10375495, ..., -18.34756 ,
3.0503953 , 9.515854 ],
[ 43.369564 , 2.1768293 , -0.20750989, ..., -18.34756 ,
3.838933 , 9.432927 ]], dtype=float32),
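One way to get at a single x/y pair for every sequence is to reshape each array rather than split it. The sketch below is only an illustration: it assumes each array has shape (n_frames, 46) and that the columns are interleaved as x0, y0, x1, y1, and so on, which the question does not confirm. The `data` dict and the `split_pairs` helper are hypothetical stand-ins for the real dataset.

```python
import numpy as np

# Hypothetical stand-in for the real dict of 7500 sequences.
rng = np.random.default_rng(0)
data = {
    f"sequence_{i}": rng.normal(size=(5 + i, 46)).astype(np.float32)
    for i in range(3)
}

def split_pairs(vectors):
    """Reshape (n_frames, 46) into (n_frames, 23, 2).

    Assumes the columns are ordered x0, y0, x1, y1, ..., x22, y22.
    """
    return vectors.reshape(vectors.shape[0], 23, 2)

# Select one x/y pair (here pair 4) for every sequence in the dataset.
pair_index = 4
selected = {name: split_pairs(v)[:, pair_index, :] for name, v in data.items()}

# Or separate all 23 pairs of a single sequence into 23 arrays of shape (n_frames, 2).
pairs = split_pairs(data["sequence_0"])
per_pair = [pairs[:, k, :] for k in range(23)]
```

If the columns are instead stored as all x values followed by all y values, the reshape would need to be `vectors.reshape(n_frames, 2, 23)` followed by a transpose.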

Related

How do I compute start (x0,y0) and end (x1,y1) coordinates for a vector plot from two 3D arrays?

I have two numpy arrays x and y which I wish to use to compute start (x0, y0) and end coordinates (x1, y1) for a vector plot and return a ColumnDataSource.
Some extra details:
To reduce the density, only pick every VECTOR_GRID_SIZEth coordinate of the vector field, so if VECTOR_GRID_SIZE is 10 then only pick every 10th grid line. See "subsampling every nth entry in a numpy array" (https://stackoverflow.com/questions/25876640/subsampling-every-nth-entry-in-a-numpy-array) and numpy.indices (https://numpy.org/doc/stable/reference/generated/numpy.indices.html).
To reduce the cluttered visuals, scale the length of the vectors so that they are no longer than one grid cell diagonal. See numpy.linalg.norm (https://numpy.org/doc/stable/reference/generated/numpy.linalg.norm.html).
Sorry in advance if this question is vague; I am not really sure how to go about this.
from bokeh.models import ColumnDataSource

def get_vector(vx_wind, vy_wind):
    # Compute start (x0, y0) and end coordinates (x1, y1) for the vector plot
    # and return a ColumnDataSource
    # To reduce the density, only pick every VECTOR_GRID_SIZEth coordinate of the vector field
    # so if VECTOR_GRID_SIZE is 10 then only pick every 10th grid line, you can do this with the numpy indexing
    # https://stackoverflow.com/questions/25876640/subsampling-every-nth-entry-in-a-numpy-array
    # Have a look at the numpy.indices function https://numpy.org/doc/stable/reference/generated/numpy.indices.html
    # To reduce the cluttered visuals, scale the length of the vector such that they are not longer
    # than one grid cell diagonal. Have a look at the np.linalg.norm function
    # https://numpy.org/doc/stable/reference/generated/numpy.linalg.norm.html
    return ColumnDataSource(dict(
        x0=[],
        y0=[],
        x1=[],
        y1=[],
    ))
The data being used looks like so
x
array([[[ -1.9352316 , -2.1784391 , -2.0420794 , ..., -7.7259355 ,
-8.222389 , -8.678087 ],
[ -1.9066508 , -2.138837 , -1.9983222 , ..., -7.724714 ,
-8.213853 , -8.661865 ],
[ -1.8783709 , -2.0985072 , -1.9538376 , ..., -7.722766 ,
-8.204756 , -8.64526 ],
...,
[ -3.9277816 , -4.369289 , -4.192238 , ..., -10.524209 ,
-10.877747 , -11.316841 ],
[ -3.9103518 , -4.363982 , -4.2031007 , ..., -10.462545 ,
-10.822174 , -11.269867 ],
[ -3.8947043 , -4.358904 , -4.2136602 , ..., -10.403587 ,
-10.768843 , -11.224494 ]],
[[ -1.825511 , -2.1051912 , -1.9596707 , ..., -7.8445425 ,
-8.280411 , -8.609112 ],
[ -1.7917014 , -2.0628197 , -1.9154303 , ..., -7.862482 ,
-8.28994 , -8.6136055 ],
[ -1.8305721 , -2.0650976 , -1.9168247 , ..., -7.7345357 ,
-8.211504 , -8.650529 ],
...,
[ -3.7763813 , -4.182658 , -4.0067177 , ..., -10.80915 ,
-11.230576 , -11.72405 ],
[ -3.553316 , -3.9992642 , -3.8580554 , ..., -10.715811 ,
-11.127258 , -11.63514 ],
[ -3.6532574 , -4.1288366 , -4.0021563 , ..., -10.53067 ,
-10.913573 , -11.3961 ]],
[[ -1.7675097 , -2.0511622 , -1.8993446 , ..., -7.6926303 ,
-8.149524 , -8.561256 ],
[ -1.6994653 , -1.9638515 , -1.807301 , ..., -7.6823363 ,
-8.141736 , -8.568396 ],
[ -1.7125058 , -1.9276817 , -1.7541373 , ..., -7.647284 ,
-8.143199 , -8.618127 ],
...,
[ -3.6395886 , -4.0289826 , -3.874856 , ..., -11.132759 ,
-11.638063 , -12.186226 ],
[ -3.4105558 , -3.8367655 , -3.7143204 , ..., -10.905692 ,
-11.356341 , -11.8825655 ],
[ -3.5878556 , -4.0547514 , -3.9376018 , ..., -10.61081 ,
-11.002459 , -11.483203 ]],
...,
[[ 0.12312252, 0.12312252, -3.1158295 , ..., 1.4167148 ,
1.2708584 , 1.2205316 ],
[ 0.12312252, 0.12312252, -3.2214224 , ..., 1.2588239 ,
1.0818043 , 1.0105078 ],
[ 0.12312252, 0.12312252, -3.0888753 , ..., 1.2072893 ,
1.0733879 , 1.0432893 ],
...,
[ 2.1753364 , 2.551509 , 2.9152517 , ..., -0.41788816,
-0.59589154, -0.6281264 ],
[ 3.5570922 , 3.8387778 , 4.121702 , ..., -0.21907938,
-0.3638638 , -0.5040648 ],
[ 3.439614 , 3.715447 , 4.0491242 , ..., -0.03998844,
-0.13260394, -0.20173173]],
[[ 0.12312252, 0.12312252, -3.0228846 , ..., 1.5080769 ,
1.4269344 , 1.4237957 ],
[ 0.12312252, 0.12312252, -3.056627 , ..., 1.4476027 ,
1.3561925 , 1.3412501 ],
[ 0.12312252, 0.12312252, -2.8762548 , ..., 1.4169012 ,
1.304064 , 1.2805291 ],
...,
[ 1.8998985 , 2.3315022 , 2.7449026 , ..., -0.18343997,
-0.30102775, -0.25929037],
[ 3.1145933 , 3.3163126 , 3.594128 , ..., -0.13163733,
-0.254448 , -0.19267306],
[ 3.302767 , 3.4859457 , 3.813427 , ..., -0.04629464,
-0.14632823, -0.06244416]],
[[ 0.12312252, 0.12312252, -2.7911987 , ..., 1.5441452 ,
1.4401604 , 1.42046 ],
[ 0.12312252, 0.12312252, -2.7635627 , ..., 1.5532548 ,
1.4525807 , 1.4371556 ],
[ 0.12312252, 0.12312252, -2.7272394 , ..., 1.5624781 ,
1.4651129 , 1.4540106 ],
...,
[ 2.4319267 , 2.8567944 , 3.3883882 , ..., 0.05471662,
-0.02231047, 0.0564903 ],
[ 2.4382203 , 2.8614144 , 3.3819 , ..., 0.04200558,
-0.03283606, 0.04848568],
[ 2.4449396 , 2.8661656 , 3.375133 , ..., 0.02939283,
-0.04327111, 0.04050524]]], dtype=float32)
y
array([[[-2.595199 , -3.2169511 , -3.2727983 , ..., 1.3928391 ,
1.6229352 , 1.8929038 ],
[-2.558872 , -3.1848092 , -3.2498548 , ..., 1.4353906 ,
1.6272032 , 1.8064046 ],
[-2.5326447 , -3.1423392 , -3.2141132 , ..., 1.2896425 ,
1.5389758 , 1.8589873 ],
...,
[ 0.14270303, 0.24559933, 0.16122879, ..., -3.4562018 ,
-2.9151714 , -2.5166512 ],
[ 0.20630346, 0.31290913, 0.21521369, ..., -3.5718331 ,
-3.0277667 , -2.6177022 ],
[ 0.4457795 , 0.52072036, 0.40494448, ..., -3.7770114 ,
-3.2327695 , -2.780641 ]],
[[-2.6011267 , -3.1902573 , -3.2494836 , ..., 1.4165395 ,
1.6578158 , 1.9397547 ],
[-2.5944035 , -3.1658921 , -3.2389994 , ..., 1.4419188 ,
1.6282779 , 1.7864251 ],
[-2.5417545 , -3.0854318 , -3.1678715 , ..., 1.3133304 ,
1.5582792 , 1.8606572 ],
...,
[ 0.02837605, 0.12706836, 0.04636747, ..., -3.1712863 ,
-2.601889 , -2.1924775 ],
[ 0.14692819, 0.25369537, 0.15824656, ..., -3.4253662 ,
-2.8756504 , -2.465939 ],
[ 0.47017992, 0.54757214, 0.4331736 , ..., -3.777063 ,
-3.2332487 , -2.781751 ]],
[[-2.6055431 , -3.1633894 , -3.2261672 , ..., 1.4398497 ,
1.6922122 , 1.9859586 ],
[-2.6524377 , -3.1590784 , -3.232543 , ..., 1.4359448 ,
1.656704 , 1.8844774 ],
[-2.5750177 , -3.049425 , -3.1341977 , ..., 1.5483334 ,
1.7405636 , 1.9164655 ],
...,
[ 0.3395448 , 0.40663922, 0.29736227, ..., -3.2353017 ,
-2.608729 , -2.0984588 ],
[ 0.4134232 , 0.48698184, 0.37462074, ..., -3.501945 ,
-2.937528 , -2.4711974 ],
[ 0.49486905, 0.57474923, 0.46182927, ..., -3.7773702 ,
-3.233875 , -2.7830174 ]],
...,
[[ 3.0361044 , 3.0361044 , 4.0558887 , ..., 2.4582958 ,
2.3423316 , 2.3435678 ],
[ 3.0361044 , 3.0361044 , 4.090932 , ..., 2.4705472 ,
2.363343 , 2.3656795 ],
[ 3.0361044 , 3.0361044 , 4.1839056 , ..., 2.5986 ,
2.4977689 , 2.4689748 ],
...,
[ 3.2534566 , 3.3729532 , 2.9451652 , ..., -1.7741623 ,
-1.8449925 , -1.9917284 ],
[ 2.8036075 , 2.9033844 , 2.4022186 , ..., -1.5723983 ,
-1.5823556 , -1.6814709 ],
[ 3.2926936 , 3.4109974 , 3.108499 , ..., -1.3745431 ,
-1.3424737 , -1.4097861 ]],
[[ 3.0361044 , 3.0361044 , 4.1630526 , ..., 2.471031 ,
2.355262 , 2.3562658 ],
[ 3.0361044 , 3.0361044 , 4.323666 , ..., 2.4462888 ,
2.342822 , 2.4139795 ],
[ 3.0361044 , 3.0361044 , 4.369047 , ..., 2.7487423 ,
2.689608 , 2.5143237 ],
...,
[ 4.774482 , 4.858542 , 4.4763894 , ..., -1.7574229 ,
-1.8401223 , -1.9464349 ],
[ 4.2394996 , 4.168938 , 3.623238 , ..., -1.605789 ,
-1.6330541 , -1.706359 ],
[ 3.278389 , 3.4058824 , 3.120429 , ..., -1.363395 ,
-1.3358322 , -1.4080203 ]],
[[ 3.0361044 , 3.0361044 , 4.276292 , ..., 2.4835618 ,
2.3681564 , 2.368855 ],
[ 3.0361044 , 3.0361044 , 4.387421 , ..., 2.4378831 ,
2.3277998 , 2.3953032 ],
[ 3.0361044 , 3.0361044 , 4.360895 , ..., 2.6621757 ,
2.5886424 , 2.4301517 ],
...,
[ 4.551075 , 4.6568046 , 4.374236 , ..., -1.550195 ,
-1.5861742 , -1.6662221 ],
[ 4.3284426 , 4.288225 , 3.8846107 , ..., -1.4945599 ,
-1.5059901 , -1.5768548 ],
[ 3.2622411 , 3.3990464 , 3.1308765 , ..., -1.3521166 ,
-1.3291104 , -1.4061832 ]]], dtype=float32)
x.shape
y.shape
(500, 500, 100)
(500, 500, 100)
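The steps described in the comments can be sketched roughly as follows. This is a sketch under assumptions, not a definitive solution: it works on one 2D slice of the 3D arrays (which axis to slice is a guess), it treats "one grid cell" as one cell of the subsampled grid, and it returns a plain dict that could be passed to ColumnDataSource. The name `get_vector_coords` is hypothetical.

```python
import numpy as np

VECTOR_GRID_SIZE = 10  # keep every 10th grid line

def get_vector_coords(vx, vy, step=VECTOR_GRID_SIZE):
    """Return start/end coordinates for a subsampled 2D vector field.

    vx, vy: 2D arrays of shape (rows, cols) holding the x and y components.
    """
    vx = np.asarray(vx, dtype=float)
    vy = np.asarray(vy, dtype=float)

    # Row and column index of every grid point, then keep every `step`-th one.
    rows, cols = np.indices(vx.shape)
    x0 = cols[::step, ::step].astype(float)
    y0 = rows[::step, ::step].astype(float)
    vx_s = vx[::step, ::step]
    vy_s = vy[::step, ::step]

    # Length of each kept vector (2-norm over the stacked components).
    length = np.linalg.norm(np.stack([vx_s, vy_s]), axis=0)

    # Scale so the longest vector spans exactly one (subsampled) cell diagonal.
    diagonal = step * np.sqrt(2.0)
    max_len = length.max()
    scale = diagonal / max_len if max_len > 0 else 0.0

    x1 = x0 + vx_s * scale
    y1 = y0 + vy_s * scale
    return dict(x0=x0.ravel(), y0=y0.ravel(), x1=x1.ravel(), y1=y1.ravel())

# Usage with synthetic data; slicing the last axis is an assumption.
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 50, 4)).astype(np.float32)
y = rng.normal(size=(50, 50, 4)).astype(np.float32)
coords = get_vector_coords(x[:, :, 0], y[:, :, 0])
```

Using one global scale factor keeps the relative lengths of the vectors intact; clipping each vector individually would be an alternative if only the longest ones need shortening.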

matrix.dot(inv(matrix)) isn't equal to identity matrix

I have been stuck on this for hours: I don't understand why the D matrix below doesn't equal the identity matrix:
A = np.random.randint(50, size=(100, 2))
V = A.dot(A.T)
D = V.dot(inv(V))
D
The result I get is below:
array([[ 3.26611328, 7.87890625, 14.1953125 , ..., 2. ,
-5. , -24. ],
[ -5.91061401, -26.05834961, 5.30126953, ..., -10. ,
8. , -16. ],
[ -2.64431763, 3.55639648, 3.10107422, ..., -0.5 ,
-5. , -4. ],
...,
[ -2.62512207, -7.78222656, 10.26367188, ..., -6. ,
18. , 0. ],
[ -3.0625 , 14. , -4. , ..., -0.0625 ,
0. , 8. ],
[ 2. , -7. , 16. , ..., -7.5 ,
-8. , -4. ]])
Thank you for your help
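I've found my issue:
I was trying to compute inv() of a matrix whose determinant is 0, which is why the result wasn't correct. Since A is 100x2, V = A.dot(A.T) is a 100x100 matrix of rank at most 2, so it is singular.
D = A.T.dot(A)
inv(D).dot(D)
With that 2x2 matrix I do get the identity matrix.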
Thank you
Habib
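A quick check of why the 100x100 product cannot be inverted, written as a small sketch. It assumes `inv` is `numpy.linalg.inv` (the question does not show the import) and uses a seeded generator in place of `np.random.randint` so the run is reproducible.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(1, 50, size=(100, 2)).astype(float)

V = A @ A.T   # 100x100, but its rank cannot exceed the 2 columns of A
G = A.T @ A   # 2x2 Gram matrix, invertible when the columns of A are independent

# The rank shows V is singular, so inv(V) returns numerical garbage.
rank_V = np.linalg.matrix_rank(V)

# The small Gram matrix inverts cleanly.
product = np.linalg.inv(G) @ G
is_identity = np.allclose(product, np.eye(2))
```

When an inverse-like object is still needed for a singular matrix, `np.linalg.pinv` gives the Moore-Penrose pseudo-inverse, although `V @ pinv(V)` is then a projection rather than the identity.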

Sort 2D NumPy array by one of the columns

I thought this would be super easy but I am struggling a little. I have a data structure as follows
array([[ 5. , 3.40166205],
[ 10. , 2.72778882],
[ 15. , 2.31881804],
[ 20. , 2.50643777],
[ 1. , 3.94076063],
[ 2. , 3.80598599],
[ 3. , 3.67121134],
[ 6. , 3.2668874 ],
[ 7. , 3.13211276],
[ 8. , 2.99733811],
[ 9. , 2.86256347],
[ 11. , 2.64599467],
[ 12. , 2.56420051],
[ 13. , 2.48240635],
[ 14. , 2.4006122 ],
[ 16. , 1.8280531 ],
[ 17. , 1.74625894],
[ 18. , 1.66446479],
[ 19. , 1.58267063],
[ 20. , 1.50087647]])
And I want to sort it ONLY on the first column ... so it is ordered as follows:
array([[1. , 3.9],
[2. , 3.8],
... ,
[20. , 1.5]])
np.sort doesn't seem to work as it moves array to a flat structure. I've also used itemgetter
from operator import itemgetter
sorted(data, key=itemgetter(1))
But this doesn't give me the output I'm looking for.
Help appreciated!
This is a common numpy idiom. You can use argsort (on the first column) + numpy indexing here -
x[x[:, 0].argsort()]
array([[ 1. , 3.94076063],
[ 2. , 3.80598599],
[ 3. , 3.67121134],
[ 5. , 3.40166205],
[ 6. , 3.2668874 ],
[ 7. , 3.13211276],
[ 8. , 2.99733811],
[ 9. , 2.86256347],
[ 10. , 2.72778882],
[ 11. , 2.64599467],
[ 12. , 2.56420051],
[ 13. , 2.48240635],
[ 14. , 2.4006122 ],
[ 15. , 2.31881804],
[ 16. , 1.8280531 ],
[ 17. , 1.74625894],
[ 18. , 1.66446479],
[ 19. , 1.58267063],
[ 20. , 2.50643777],
[ 20. , 1.50087647]])
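The output above has two rows with 20 in the first column, and the default argsort does not guarantee their relative order. A small sketch of two ways to make that explicit, using a shortened, rounded version of the data from the question:

```python
import numpy as np

data = np.array([
    [5.0, 3.4],
    [20.0, 2.5],
    [1.0, 3.9],
    [20.0, 1.5],
    [2.0, 3.8],
])

# Sort on the first column only; a stable sort keeps tied rows in input order.
by_first = data[data[:, 0].argsort(kind="stable")]

# Sort on the first column, breaking ties with the second column.
# np.lexsort treats the LAST key as the primary one.
order = np.lexsort((data[:, 1], data[:, 0]))
by_first_then_second = data[order]
```

Both results keep each row intact, which is what `np.sort(data)` without an index array does not do.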

Having trouble determining a 2D feature matrix structure to feed into machine learning algorithm

I am training an emotion recognition system that detects emotions through facial movement. As a result, I have formed a 4-dimensional matrix that I am trying to reduce to 2 dimensions.
Features that make up the 4D matrix:
Number of videos (and each video will be assigned emotion label)
Number of frames per video
Direction of the facial landmarks per frame
Speed of the facial landmarks per frame
The important features that I am trying to train with:
The left side is the speed (hypotenuse between same facial landmark each frame)
The right side is direction (arctan of the x and y values of the same facial landmark each frame)
The 4D matrix that I am stuck with and trying to reduce to 2D
>> main.shape
(60, 17, 68, 2)
# 60 videos, 17 frames per video, 68 facial landmarks, 2 features (speed and direction)
>> main
array([[[[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0. ],
...,
[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0. ]],
[[ 1. , 1. ],
[ 1.41421356, 0.78539816],
[ 1.41421356, 0.78539816],
...,
[ 3. , 1. ],
[ 3. , 1. ],
[ 3. , 1. ]],
[[ 0. , 0. ],
[ -1.41421356, 0.78539816],
[ -1.41421356, 0.78539816],
...,
[ 2. , 1. ],
[ 3. , 1. ],
[ 3. , 1. ]],
...,
[[ 1. , 1. ],
[ 1.41421356, -0.78539816],
[ 1.41421356, -0.78539816],
...,
[ -1.41421356, 0.78539816],
[ 1. , 1. ],
[ -1.41421356, 0.78539816]],
[[ 2.23606798, -0.46364761],
[ 2.82842712, -0.78539816],
[ 2.23606798, -0.46364761],
...,
[ 1. , 0. ],
[ 0. , 0. ],
[ 1. , 1. ]],
[[ -1.41421356, -0.78539816],
[ -2.23606798, -0.46364761],
[ -2.23606798, -0.46364761],
...,
[ 1.41421356, -0.78539816],
[ 1.41421356, -0.78539816],
[ 2.23606798, -1.10714872]]],
[[[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0. ],
...,
[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0. ]],
[[ 2. , 1. ],
[ 2.23606798, -1.10714872],
[ 1.41421356, -0.78539816],
...,
[ -2. , -0. ],
[ -1. , -0. ],
[ -1.41421356, -0.78539816]],
[[ 2. , 1. ],
[ -2.23606798, 1.10714872],
[ -1.41421356, 0.78539816],
...,
[ 1. , 1. ],
[ -1. , -0. ],
[ -1. , -0. ]],
...,
[[ -2. , -0. ],
[ -3. , -0. ],
[ -4.12310563, -0.24497866],
...,
[ 0. , 0. ],
[ -1. , -0. ],
[ -2.23606798, 1.10714872]],
[[ -2.23606798, 1.10714872],
[ -1.41421356, 0.78539816],
[ -2.23606798, 1.10714872],
...,
[ -2.23606798, 0.46364761],
[ -1.41421356, 0.78539816],
[ -1.41421356, 0.78539816]],
[[ 2. , 1. ],
[ 1.41421356, 0.78539816],
[ 2.82842712, 0.78539816],
...,
[ 1. , 1. ],
[ 1. , 1. ],
[ -2.23606798, -1.10714872]]],
[[[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0. ],
...,
[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0. ]],
[[ 1. , 1. ],
[ 0. , 0. ],
[ 1. , 1. ],
...,
[ -3. , -0. ],
[ -2. , -0. ],
[ 0. , 0. ]],
[[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0. ],
...,
[ 1.41421356, 0.78539816],
[ 1. , 0. ],
[ 0. , 0. ]],
...,
[[ 1. , 0. ],
[ 1. , 1. ],
[ 0. , 0. ],
...,
[ 2. , 1. ],
[ 3. , 1. ],
[ 3. , 1. ]],
[[ -7.28010989, 1.29249667],
[ -7.28010989, 1.29249667],
[ -8.54400375, 1.21202566],
...,
[-22.02271555, 1.52537305],
[ 22.09072203, -1.48013644],
[ 22.36067977, -1.39094283]],
[[ 1. , 0. ],
[ 1.41421356, -0.78539816],
[ 1. , 0. ],
...,
[ -1.41421356, -0.78539816],
[ 1. , 1. ],
[ 1.41421356, 0.78539816]]],
...,
[[[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0. ],
...,
[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0. ]],
[[ 5.38516481, 0.38050638],
[ 5.09901951, 0.19739556],
[ 4.47213595, -0.46364761],
...,
[ -1.41421356, 0.78539816],
[ -2.82842712, 0.78539816],
[ -5. , 0.64350111]],
[[ -6.32455532, 0.32175055],
[ -6.08276253, -0.16514868],
[ -5.65685425, -0.78539816],
...,
[ 3.60555128, 0.98279372],
[ 5. , 0.92729522],
[ 5.65685425, 0.78539816]],
...,
[[ -3.16227766, -0.32175055],
[ -3.60555128, -0.98279372],
[ 5. , 1. ],
...,
[ 12.08304597, 1.14416883],
[ 13.15294644, 1.418147 ],
[ 14.31782106, 1.35970299]],
[[ 3.60555128, -0.5880026 ],
[ 4.47213595, -1.10714872],
[ 6. , 1. ],
...,
[-20.39607805, 1.37340077],
[-21.02379604, 1.52321322],
[-22.09072203, 1.48013644]],
[[ 1. , 1. ],
[ -1.41421356, 0.78539816],
[ 1. , 1. ],
...,
[ 4.12310563, 1.32581766],
[ 4. , 1. ],
[ 4.12310563, 1.32581766]]],
[[[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0. ],
...,
[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0. ]],
[[ 0. , 0. ],
[ 1. , 1. ],
[ -2.23606798, 1.10714872],
...,
[ -3.16227766, 0.32175055],
[ 1. , 1. ],
[ 1.41421356, -0.78539816]],
[[ 1. , 1. ],
[ 1. , 1. ],
[ 1. , 1. ],
...,
[ 3. , 1. ],
[ 2. , 1. ],
[ -1.41421356, 0.78539816]],
...,
[[ 5.38516481, -1.19028995],
[ 4.47213595, -1.10714872],
[ 4.12310563, -1.32581766],
...,
[ 2.23606798, -0.46364761],
[ 1. , 1. ],
[ -1. , -0. ]],
[[ -5.38516481, 1.19028995],
[ -4.12310563, 1.32581766],
[ -3.16227766, 1.24904577],
...,
[ 0. , 0. ],
[ 1. , 0. ],
[ 1.41421356, -0.78539816]],
[[ 8.06225775, 1.44644133],
[ -7.07106781, -1.42889927],
[ 6. , 1. ],
...,
[ -3.16227766, -0.32175055],
[ -3.16227766, -0.32175055],
[ -3.16227766, -0.32175055]]],
[[[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0. ],
...,
[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0. ]],
[[ -2.23606798, 0.46364761],
[ -1.41421356, 0.78539816],
[ -2.23606798, 0.46364761],
...,
[ 1. , 0. ],
[ 1. , 0. ],
[ 1. , 1. ]],
[[ -2.23606798, -0.46364761],
[ -1.41421356, -0.78539816],
[ 2. , 1. ],
...,
[ 0. , 0. ],
[ 1. , 0. ],
[ 1. , 0. ]],
...,
[[ 1. , 0. ],
[ 1. , 1. ],
[ -2.23606798, -1.10714872],
...,
[ 19.02629759, 1.51821327],
[ 19. , 1. ],
[-19.10497317, -1.46591939]],
[[ 3.60555128, 0.98279372],
[ 3.60555128, 0.5880026 ],
[ 5. , 0.64350111],
...,
[ 7.28010989, -1.29249667],
[ 7.61577311, -1.16590454],
[ 8.06225775, -1.05165021]],
[[ -7.28010989, 1.29249667],
[ -5. , 0.92729522],
[ -5.83095189, 0.5404195 ],
...,
[ 20.09975124, 1.47112767],
[ 21.02379604, 1.52321322],
[-20.22374842, -1.42190638]]]])
The direction and speed features are the most important ones, as they represent the movement of each facial landmark per frame, and I am trying to get the machine learning algorithm to train based on them.
I tried reshaping three of the dimensions into one long vector (mashing speed, direction, and frame together) to form a 2D matrix, fed it into sklearn's SVM, and got rather low accuracy. I expected this, since I figured there is no way the ML algorithm could tell the different features apart within the single giant vector; it would assume that everything in the vector is the same kind of feature.
The 2D matrix I was forced to make for sklearn's SVM, by flattening the frames, landmarks, and speed/direction values of each video into one vector, and which gave low accuracy:
>> main
array([[ 0. , 0. , 0. , ..., -0.78539816,
2.23606798, -1.10714872],
[ 0. , 0. , 0. , ..., 1. ,
-2.23606798, -1.10714872],
[ 0. , 0. , 0. , ..., 1. ,
1.41421356, 0.78539816],
...,
[ 0. , 0. , 0. , ..., 1. ,
4.12310563, 1.32581766],
[ 0. , 0. , 0. , ..., -0.32175055,
-3.16227766, -0.32175055],
[ 0. , 0. , 0. , ..., 1.52321322,
-20.22374842, -1.42190638]])
>> main.shape
(60, 2312)
I want to preserve the speed and direction features, but have to represent them in a 2D matrix that takes into account the frames in the video.
The emotion label will be attached to each of the 17 frames in each video. (so basically, the 17 frame video will be labeled as an emotion)
Is there any smart way in reshaping and reducing the 4D matrix that would accomplish this?
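For reference, the two most direct 2D layouts can be written as plain reshapes. This is only a sketch of the bookkeeping, not a claim about which layout trains better; `main` and `labels` here are random stand-ins with the shapes from the question, and the six emotion classes are a made-up number.

```python
import numpy as np

n_videos, n_frames, n_landmarks, n_features = 60, 17, 68, 2
rng = np.random.default_rng(0)
main = rng.normal(size=(n_videos, n_frames, n_landmarks, n_features))
labels = rng.integers(0, 6, size=n_videos)  # hypothetical emotion ids, one per video

# Layout 1: one row per video, everything flattened (the (60, 2312) matrix).
per_video = main.reshape(n_videos, -1)

# Layout 2: one row per frame, with each frame inheriting its video's label.
per_frame = main.reshape(n_videos * n_frames, n_landmarks * n_features)
frame_labels = np.repeat(labels, n_frames)

# In both layouts the last axis is flattened as speed_0, direction_0, speed_1, ...
# so the two feature types can still be separated by column stride.
speed_columns = per_frame[:, 0::2]
direction_columns = per_frame[:, 1::2]
```

Because the column order is fixed by the reshape, speed and direction can be scaled separately before training, for example with different normalization for each stride.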
So, the way that you've framed the question you will absolutely see poor accuracy and there's very little you can do to change it. Assigning a single emotion to a video (depending on your corpus), is generally inaccurate enough that any machine learning algorithm will have trouble learning the signal you're trying to pull out.
Additionally, you've framed the problem as a time-series problem, which is going to make your life a headache, especially if you're using off-the-shelf sklearn algorithms, which are very poorly suited for this kind of task.
If at all possible, you should instead frame your problem as a computer vision problem. You should attempt to predict on each individual frame, what the emotion content is. If you don't have a dataset with that level of granularity, you just aren't going to see great accuracy.
It's a bit of a departure from the way you asked the question, but as asked, the problem is intractable. Here is, instead, the way that you should approach the problem:
Label individual frames with emotional content
Train an image-based algorithm to categorize those tagged frames
Convolutional neural networks will likely give you the best performance for any image-based problem where you have a decently-sized dataset
If that is not an option, you need to develop a 1d feature representation of the image. I would personally suggest using indico's image features API. Once you have this representation a typical algorithm like an SVM will work great.
If accuracy is not quite to your liking but is getting close, I would recommend using a pre-processing/data-augmentation pipeline like the one detailed here. Granted, that example is for plankton identification, but the basic approach is identical.
If the accuracy still isn't up to snuff, and you need to predict on the entire video you will then want to aggregate your results to give accurate results over the entire video
One method is to train a convolutional neural network on a graph of the predictions you've made over the video. This is kind of weird, but might work pretty well
A good approach would be to use a bayesian method, assuming each prediction has a certain level of confidence, and combining the prediction distributions over the video.
The best approach is to treat this as an ensemble learning problem. Luckily, ensemble learning is a very well-studied and understood problem. You can find details of how to combine multiple predictions in this format here.
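As a much simpler stand-in for the aggregation ideas above, per-frame class probabilities can be averaged over each video (soft voting). This sketch is not the Bayesian or ensemble method itself; the function name and the probability values are hypothetical.

```python
import numpy as np

def aggregate_video_predictions(frame_probs):
    """Average per-frame class probabilities into one prediction per video.

    frame_probs: array of shape (n_videos, n_frames, n_classes).
    Returns (predicted class per video, mean probabilities per video).
    """
    mean_probs = frame_probs.mean(axis=1)
    return mean_probs.argmax(axis=1), mean_probs

# Two videos, three frames each, two emotion classes.
frame_probs = np.array([
    [[0.6, 0.4], [0.7, 0.3], [0.4, 0.6]],
    [[0.1, 0.9], [0.4, 0.6], [0.3, 0.7]],
])
video_labels, video_probs = aggregate_video_predictions(frame_probs)
```

Averaging treats every frame as equally reliable; weighting frames by classifier confidence would be one step toward the Bayesian combination described above.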
I hope this has been helpful! Let me know if you have any more questions.
Disclaimer: I am the CEO of indico, so I may be biased in recommending its use.

max value from specified column in numpy array

We need the maximum value from numpy arrays with 3 columns.
For example, I need the maximum value of the last column for each array.
In this case the result is: 57.65048981 for the first array, 58.3501091 for the second and 56.86465836 for the third. How can I get these 3 values in an array, together with the 2 values in the preceding columns?
[array([[ 402. , 242. , 57.65048981],
[ 401. , 243. , 56.32482529]]),
array([[ 356. , 257. , 53.3116188 ],
[ 355. , 258. , 53.69690704],
[ 356. , 258. , 57.52435684],
[ 355. , 259. , 56.98838806],
[ 356. , 259. , 57.81959152],
[ 354. , 260. , 55.90369415],
[ 355. , 260. , 58.14822769],
[ 356. , 260. , 58.3501091 ],
[ 354. , 261. , 55.1479187 ],
[ 355. , 261. , 58.20180893],
[ 354. , 262. , 54.5345459 ]]),
array([[ 386. , 260. , 56.86465836],
[ 386. , 261. , 54.28659439],
[ 386. , 259. , 56.53445435]])]
The result of this should be:
[[402, 242, 57.65048981],
[356 ,260, 58.3501091],
[386 ,260, 56.86465836]]
I think there's an error in your "results"
np.array([arr[np.argmax(arr[:, 2]), :] for arr in arrays])
returns
array([[ 402. , 242. , 57.65048981],
[ 356. , 260. , 58.3501091 ],
[ 386. , 260. , 56.86465836]])
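A self-contained version of the one-liner, using a shortened copy of the arrays from the question (the second array is abbreviated to three rows here), might look like this:

```python
import numpy as np

arrays = [
    np.array([[402.0, 242.0, 57.65048981],
              [401.0, 243.0, 56.32482529]]),
    np.array([[356.0, 257.0, 53.3116188],
              [356.0, 260.0, 58.3501091],
              [354.0, 262.0, 54.5345459]]),
    np.array([[386.0, 260.0, 56.86465836],
              [386.0, 261.0, 54.28659439],
              [386.0, 259.0, 56.53445435]]),
]

# For each array, find the row whose last column is largest and keep the whole row.
best_rows = np.array([arr[np.argmax(arr[:, 2])] for arr in arrays])
```

A list comprehension is needed because the arrays have different numbers of rows, so they cannot be stacked into one regular 3D array first.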
