After a smoothing procedure, I have a problem with the landmark registration in this line:
skfda.preprocessing.registration.landmark_registration_warping(fd, land)
It returns the following error:
ValueError: `x` must be strictly increasing sequence.
fd is an FDataGrid (the usual type used to represent the functions) with 5 samples, while land is an array of the landmarks that I want to align; each row is an increasing sequence of points (see below).
land <- array([[[0.1 , 0.134, 0.258, 0.292, 0.328, 0.558, 0.602],
[0.1 , 0.126, 0.23 , 0.256, 0.292, 0.454, 0.474],
[0.1 , 0.148, 0.25 , 0.278, 0.34 , 0.514, 0.568],
[0.1 , 0.116, 0.25 , 0.276, 0.298, 0.508, 0.612],
[0.1 , 0.132, 0.258, 0.286, 0.376, 0.59 , 0.648]]])
fd <-
Can somebody help me? I'm using the scikit-fda package to perform this kind of analysis.
This is the link to the function that I'm using:
https://fda.readthedocs.io/en/latest/modules/preprocessing/autosummary/skfda.preprocessing.registration.landmark_registration.html#skfda.preprocessing.registration.landmark_registration
I had this error when finding my own landmarks. I forgot to pass in the actual domain value at that point (in my case the peak(s) I wanted to align). Once I did that, my error changed to: ValueError: Sample points must be within the domain range. Which brings me to my next point:
Manually specifying the end-result landmark locations allowed the code to run and, from what I can tell, "work." I'm not sure if this is a bug or if I am doing something wrong myself. However, the examples they provide explicitly state that the end-result landmark locations shouldn't have to be specified.
Additionally, the end-result landmark locations do not end up exactly at the specified points; they end up at the closest point in the grid_points array. This may not be obvious or a problem for high-sample-rate data, but the demo GAIT data scikit-fda provides has only 20 sample points, so it is clearly visible that the landmarks do not go exactly where specified. This is also the case when converting to a basis representation. One could experiment with the interpolation options and see if that helps.
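For reference, here is a minimal sketch (my own, assuming the location keyword described in the linked documentation; adjust to your skfda version) of how manually supplying the target locations could look:
import numpy as np
import skfda
# land as posted has shape (1, 5, 7); reshape it to (n_samples, n_landmarks),
# which is what I understand the registration functions expect for 1-D domains
land = np.asarray(land).reshape(5, -1)
# manually chosen target locations for the aligned landmarks, e.g. the
# per-landmark mean across the 5 samples (they must lie inside fd.domain_range)
location = land.mean(axis=0)
fd_registered = skfda.preprocessing.registration.landmark_registration(
    fd, land, location=location)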
I'm trying to detect a specific pattern in real-time data (a time series). For the visualization, I'll show the data in two parts here.
Pattern: the shape I'm searching for in the time series.
DataWindow: the data buffer (window) I slide in real time to keep track of history.
Here is my recorded data (the red boxes show the pattern that I want to detect), but this can differ since it is real-time:
The above data doesn't have a lot of noise (at least for this collection); at the resolution I'm looking at, the peaks (sinusoidal-looking peaks, I would say) are distinguishable at first glance. That is why applying a moving-average filter does not help me at all.
The image below shows some samples of the real-time data; in the saved data, the plotter applies extrapolation to draw a continuous plot. In general, the data samples look like the image below, possibly with higher resolution.
As an initial attempt, I tried Spike Detection in a Time-Series using a moving average, and it did not work as I expected.
I've also tried some solutions from the thread Detecting patterns from two arrays of data in Python, and the results are not good enough for me to flag the patterns at run time (there are many false positives).
Also, as you might notice from the saved real-time data, the patterns can have different scales and, most importantly, different offsets. I think that is why applying the above solutions to my problem does not give distinguishable results.
To give some examples to try out, these can be used as the Pattern and DataWindow:
Pattern = [5.9, 5.6, 4.08, 2.57, 2.78, 4.78, 7.3, 7.98, 4.81, 5.57, 4.7]
SampleTarget = [4.74, 4.693, 4.599, 4.444, 3.448, 2.631, 1.845, 2.032, 2.415, 3.714, 5.184, 5.82, 5.61, 4.841, 3.802, 3.11]
SampleTarget2 = [5.898, 5.91, 5.62, 5.25, 4.72, 4.09, 3.445, 2.91, 2.7, 2.44, 2.515, 2.79, 3.25, 3.915, 4.72, 5.65, 6.28, 7.15, 7.81, 8.2, 7.9, 7.71, 7.32, 6.88, 6.44, 6.0, 5.58, 5.185, 4.88, 4.72, 4.69, 4.82]
I am trying to solve this problem in Python as a proof of concept.
UPDATE: A dataset has been added; it includes the first two red boxes and a slightly wider region as well, as shown in the saved real-time data: dataset
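Since offset and scale are the main obstacle, here is a minimal sketch of one idea I considered (my own assumption, not a verified solution): z-normalize both the Pattern and each sliding slice of the window before comparing, so offset and scale drop out:
import numpy as np
def znorm(x):
    # remove offset and scale: zero mean, unit variance
    x = np.asarray(x, dtype=float)
    s = x.std()
    return (x - x.mean()) / s if s > 0 else x - x.mean()
def match_scores(pattern, window):
    # Pearson correlation of the pattern against every slice of the window
    # (values in [-1, 1]; higher means more similar, regardless of offset/scale)
    p = znorm(pattern)
    n = len(p)
    return [float(np.dot(p, znorm(window[i:i + n]))) / n
            for i in range(len(window) - n + 1)]
Pattern = [5.9, 5.6, 4.08, 2.57, 2.78, 4.78, 7.3, 7.98, 4.81, 5.57, 4.7]
SampleTarget2 = [5.898, 5.91, 5.62, 5.25, 4.72, 4.09, 3.445, 2.91, 2.7, 2.44, 2.515, 2.79, 3.25, 3.915, 4.72, 5.65, 6.28, 7.15, 7.81, 8.2, 7.9, 7.71, 7.32, 6.88, 6.44, 6.0, 5.58, 5.185, 4.88, 4.72, 4.69, 4.82]
print(max(match_scores(Pattern, SampleTarget2)))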
You can compute the gradient of the data and use a threshold on it to identify the features. Here I use a triple mask to capture the down/up/down shape of the feature.
I commented the code to give you the main steps, so I hope it is easy to follow.
import pandas as pd
import matplotlib.pyplot as plt
# read data
s = pd.read_csv('sin_peaks.txt', header=None)[0]
# 0 5.574537
# 1 5.736071
# 2 5.965132
# 3 6.164344
# 4 6.172413
thresh = 0.5 # threshold of derivative
span = 10 # max span of the feature (in number of points)
# calculate gradient
# if the points are not evenly spaced
# you should also divide by the spacing
s2 = s.diff()
# get points outside of threshold
m1 = s2.lt(-thresh)
m2 = s2.gt(thresh)
# extend masks
m1_fw = m1.where(m1).ffill(limit=span)
m1_bw = m1.where(m1).bfill(limit=span)
m2_fbw = m2.where(m2).ffill(limit=span).bfill(limit=span)
# slice data where all conditions are met:
# a down-slope within "span" before, a down-slope within "span" after, and an up-slope nearby
peaks = s[m1_fw & m1_bw & m2_fbw]
# group peaks
groups = peaks.index.to_series().diff().ne(1).cumsum()
# plot identified features
ax = s.plot(label='data')
s.diff().plot(ax=ax, label='gradient')
ax.legend()
ax.axhline(thresh, ls=':', c='k')
ax.axhline(-thresh, ls=':', c='k')
for _, group in peaks.groupby(groups):
    start = group.index[0]
    stop = group.index[-1]
    ax.axvspan(start, stop, color='k', alpha=0.1)
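If you want the detected segments as data rather than only shading them on the plot (my addition, same assumptions as the code above), the same grouping can be turned into (start, stop) index pairs; note that thresh and span will likely need tuning to your sampling rate and signal amplitude:
# collect each detected feature as a (start_index, stop_index) pair
segments = [(g.index[0], g.index[-1]) for _, g in peaks.groupby(groups)]
print(segments)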
I have a problem with two objective functions, three variables, and zero constraints.
I also have a search space for these variables, read from a CSV file.
Is it possible to use pymoo with that search space of variables (instead of xl and xu) to get the best combination of them that maximizes the two functions?
import math
import numpy as np
from pymoo.core.problem import Problem  # from pymoo.model.problem import Problem in older pymoo versions

class MyProblem(Problem):

    def __init__(self):
        super().__init__(n_var=3,
                         n_obj=2,
                         n_constr=0,
                         # I want to use the search space of the three variables (I already have)
                         xl=np.array([0.0, 0.0, 0.0]),
                         xu=np.array([1.0, 1.0, 1.0]))

    def _evaluate(self, X, out, *args, **kwargs):
        # maximizing the triangle area of the three variables
        # (note: math.sin expects radians; use math.sin(math.radians(120)) if 120 degrees is intended)
        f1 = -1 * (0.5 * math.sin(120) * (X[:, 0] * X[:, 1] + X[:, 2] * X[:, 1] + X[:, 0] * X[:, 2]))
        # maximizing the sum of the variables
        f2 = -1 * (X[:, 0] + X[:, 1] + X[:, 2])
        out["F"] = np.column_stack([f1, f2])

problem = MyProblem()
When I use xl and xu, it always ends up with the combination of ones [1.0, 1.0, 1.0], but I want to get the best combination out of my own numpy multidimensional array.
import csv
import numpy as np

with open("sample_data/dimensions.csv", 'r') as f:
    dimensions = list(csv.reader(f, delimiter=","))

dimensions = np.array(dimensions[1:])                   # drop the header row
dimensions = np.array(dimensions[:, 1:], dtype=float)   # drop the first (index) column; np.float is deprecated
dimensions
that looks like the following:
array([[0.27 , 0.45 , 0.23 ],
[0. , 0.23 , 0.09 ],
[0.82 , 0.32 , 0.27 ],
[0.64 , 0.55 , 0.32 ],
[0.77 , 0.55 , 0.36 ],
[0.25 , 0.86 , 0.18 ],
[0. , 0.68 , 0.09 ],...])
Thanks for your help!
Have you tried sampling with a numpy.array?
class pymoo.algorithms.nsga2.NSGA2(self, pop_size=100, sampling=numpy.array)
where (from pymoo API)
The sampling process defines the initial set of solutions which are
the starting point of the optimization algorithm. Here, you have three
different options by passing
(i) A Sampling implementation which is an implementation of a random
sampling method.
(ii) A Population object containing the variables to be evaluated
initially OR already evaluated solutions (F needs to be set in this
case).
(iii) Pass a two dimensional numpy.array with (n_individuals, n_var)
which contains the variable space values for each individual.
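For example, a minimal sketch of option (iii) (my own, assuming dimensions is the (n_individuals, n_var) array read from the CSV above; the import path differs between pymoo releases):
from pymoo.algorithms.nsga2 import NSGA2  # pymoo.algorithms.moo.nsga2 in newer releases
from pymoo.optimize import minimize
# use the CSV rows as the initial population (one row per individual)
algorithm = NSGA2(pop_size=len(dimensions), sampling=dimensions)
res = minimize(problem, algorithm, ('n_gen', 100), verbose=True)
print(res.X)  # decision variables of the non-dominated solutions
print(res.F)  # corresponding objective values
Note that this only seeds the initial population: crossover and mutation will still produce values anywhere inside xl/xu. Restricting the search strictly to the CSV rows would need a custom sampling or repair operator.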
I'm tasked with using Parzen windows with the radial basis function kernel to determine which label to give to a given point.
My training data set has 4 dimensions (4 features per point).
My training label set contains the labels (which can be 0, 1, 2, ... depending on how many classes we have) for all the points in my training set (it's a 1D array).
My test data set contains a couple of points with 4 dimensions but no labels, so it's an n×4 array.
We're interested in giving labels for each of the points in my test data set.
I first compute the RBF kernel $k(x_i,x)$ (using Python and numpy):
for (i, ex) in enumerate(test_data):
    squared_distances = (np.sum((np.abs(ex - self.train_inputs)) ** 2, axis=1)) ** (1.0 / 2)
    k = np.exp(- squared_distances/2*(np.square(self.sigma)))
Let's assume that test_data looks like this :
[[ 0.40614 1.3492 -1.4501 -0.55949]
[ -1.3887 -4.8773 6.4774 0.34179]
[ -3.7503 -13.4586 17.5932 -2.7771 ]
[ -3.5637 -8.3827 12.393 -1.2823 ]
[ -2.5419 -0.65804 2.6842 1.1952 ]]
ex is a point from the test data set; here is an example:
[ 0.40614 1.3492 -1.4501 -0.55949]
self.train_inputs is the training data set and it looks like this
[[ 3.6216 8.6661 -2.8073 -0.44699]
[ 4.5459 8.1674 -2.4586 -1.4621 ]
[ 3.866 -2.6383 1.9242 0.10645]
...
[-1.1667 -1.4237 2.9241 0.66119]
[-2.8391 -6.63 10.4849 -0.42113]
[-4.5046 -5.8126 10.8867 -0.52846]]
k is an array containing the kernel values k(x_i, x) between every x_i (in self.train_inputs) and our current test point x (which is ex in the code).
k = [0.99837982 0.9983832 0.99874063 ... 0.9988909 0.99706044 0.99698724]
It's of the same length as the number of points in self.train_inputs.
My understanding of the radial basis function is that the closer the training points are to the test point, the greater the value of k(current training point, test point). However, k can never exceed 1 or be below 0.
So the goal is to select the training point that is closest to the test point. We do this by finding which entry of k has the greatest value. Then we take its index and use that same index on the array containing the labels, which gives the label we want our test point to take.
In code it translates to this (the additional code goes below the first code snippet above):
    best_arg = np.argmax(k)  # index of the greatest value in k
    classes_pred[i] = self.train_labels[best_arg]  # use that index to select the label from the train labels array
Here self.train_labels looks like :
[0. 0. 0. ... 1. 1. 1.]
This approach gives for ex = [ 0.40614 1.3492 -1.4501 -0.55949] and k = [0.99837982 0.9983832 0.99874063 ... 0.9988909 0.99706044 0.99698724] :
index 818 for the position of the greatest value in the current k, and 1. as the label, since self.train_labels[818] = 1.
However, it seems that I'm doing this wrong. Compared with an algorithm already implemented by my teacher, I get some of the labels wrong (especially when we have more than two classes). My question is: am I doing this wrong? If so, where? I'm new to machine learning, btw.
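Two things stand out to me (my own reading, not a verified answer). First, because of operator precedence, squared_distances/2*(np.square(self.sigma)) divides by 2 and then multiplies by sigma squared, whereas the RBF kernel is exp(-d^2 / (2*sigma^2)); also, squared_distances is actually the Euclidean distance, since the ** (1.0/2) undoes the square. Second, a Parzen-window classifier normally sums the kernel values per class and picks the class with the largest sum, instead of taking the single training point with the largest k. A minimal sketch under those assumptions:
import numpy as np
def parzen_rbf_predict(train_inputs, train_labels, test_data, sigma):
    # Parzen-window (kernel density) classification with an RBF kernel
    train_inputs = np.asarray(train_inputs, dtype=float)
    labels = np.asarray(train_labels).astype(int)
    n_classes = labels.max() + 1
    preds = np.empty(len(test_data), dtype=int)
    for i, ex in enumerate(test_data):
        sq_dist = np.sum((ex - train_inputs) ** 2, axis=1)  # squared Euclidean distances
        k = np.exp(-sq_dist / (2 * sigma ** 2))             # RBF kernel values in (0, 1]
        votes = np.bincount(labels, weights=k, minlength=n_classes)
        preds[i] = np.argmax(votes)                         # class with the largest total kernel weight
    return preds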
I am trying to build a multidimensional Gaussian model using scipy.stats.multivariate_normal. I am trying to use the output of scipy.stats.multivariate_normal.pdf() to figure out whether a test value fits reasonably well in the observed distribution.
From what I understand, high values indicate a better fit to the given model, and low values otherwise.
However, in my dataset I see extremely large pdf(x) results, which leads me to question whether I understand things correctly. The area under the pdf curve must be 1, so very large values are hard to comprehend.
For e.g., consider:
x = [-0.0007569417915494715, -0.01394295997613827, 0.000982078369890444, -0.03633664354397629, -0.03730583036106844, 0.013920453054506978, -0.08115836865224338, -0.07208494497398354, -0.06255237023298793, -0.0531888840386906, -0.006823760545565131]
mean = [0.01663645201261102, 0.07800335614699873, 0.016291452384234965, 0.012042931155488702, 0.0042637244100103885, 0.016531331606477996, -0.021702714746699842, -0.05738646649459681, 0.00921296058625439, 0.027940994009345254, 0.07548111758006244]
covariance = [[0.07921927017771506, 0.04780185747873293, 0.0788086850274493, 0.054129466248481264, 0.018799028456661045, 0.07523731808137141, 0.027682748950487425, -0.007296954729572955, 0.07935165417756569, 0.0569381100965656, 0.04185848489472492], [0.04780185747873293, 0.052300105044833595, 0.047749467098423544, 0.03254872837949123, 0.010582358713999951, 0.045792252383799206, 0.01969282984717051, -0.006089301208961258, 0.05067712814145293, 0.03146214776997301, 0.04452949330387575], [0.0788086850274493, 0.047749467098423544, 0.07841809405745602, 0.05374461924031552, 0.01871005609017673, 0.07487015790787396, 0.02756781074862818, -0.007327131572569985, 0.07895548129950304, 0.056417456686115544, 0.04181063355048408], [0.054129466248481264, 0.03254872837949123, 0.05374461924031552, 0.04538801863296238, 0.015795381235224913, 0.05055944754764062, 0.02017033995851422, -0.006505939129684573, 0.05497361331950649, 0.043858860182247515, 0.029356699144606032], [0.018799028456661045, 0.010582358713999951, 0.01871005609017673, 0.015795381235224913, 0.016260640022897347, 0.015459548918222347, 0.0064542528152879705, -0.0016656858963383602, 0.018761682220822192, 0.015361512546799405, 0.009832025009280924], [0.07523731808137141, 0.045792252383799206, 0.07487015790787396, 0.05055944754764062, 0.015459548918222347, 0.07207012779105286, 0.026330967917717253, -0.006907504360835279, 0.0753380831201204, 0.05335128471397023, 0.03998397595850863], [0.027682748950487425, 0.01969282984717051, 0.02756781074862818, 0.02017033995851422, 0.0064542528152879705, 0.026330967917717253, 0.020837940236441078, -0.003320408544812026, 0.027859582829638897, 0.01967636950969646, 0.017105000942890598], [-0.007296954729572955, -0.006089301208961258, -0.007327131572569985, -0.006505939129684573, -0.0016656858963383602, -0.006907504360835279, -0.003320408544812026, 0.024529061074105817, -0.007869287828047853, -0.006228903058681195, -0.0058974553248417995], [0.07935165417756569, 0.05067712814145293, 0.07895548129950304, 0.05497361331950649, 0.018761682220822192, 0.0753380831201204, 0.027859582829638897, -0.007869287828047853, 0.08169291677188911, 0.05731196406065222, 0.04450058445993234], [0.0569381100965656, 0.03146214776997301, 0.056417456686115544, 0.043858860182247515, 0.015361512546799405, 0.05335128471397023, 0.01967636950969646, -0.006228903058681195, 0.05731196406065222, 0.05064023101024737, 0.02830810316675855], [0.04185848489472492, 0.04452949330387575, 0.04181063355048408, 0.029356699144606032, 0.009832025009280924, 0.03998397595850863, 0.017105000942890598, -0.0058974553248417995, 0.04450058445993234, 0.02830810316675855, 0.040658283674780395]]
For this, if I compute y = multivariate_normal.pdf(x, mean, covariance),
the result is 342562705.3859754.
How could this be the case? Am I missing something?
Thanks.
This is fine. The probability density function can be larger than 1 at a specific point. It's the integral that must be equal to 1.
The intuition that the value cannot exceed 1 is correct for the probability mass function of a discrete variable. However, for continuous variables, the pdf is not a probability; it's a density that is integrated to obtain a probability. That is, its integral from minus infinity to infinity, over all dimensions, is equal to 1.
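A quick one-dimensional illustration (my own example): a normal distribution with a very small standard deviation already has a density far above 1 at its mean, yet it still integrates to 1. In 11 dimensions, the peak density is 1/sqrt((2*pi)^d * det(cov)), which becomes huge when the covariance determinant is tiny, so values like the one above are plausible:
import numpy as np
from scipy.stats import norm
print(norm.pdf(0.0, loc=0.0, scale=0.01))  # about 39.9, already greater than 1
cov = np.asarray(covariance)               # the covariance matrix from the question
d = cov.shape[0]
print(1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov)))  # peak density of the fitted Gaussian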
Hi everyone. I've been digging a bit into computer vision using Python and OpenCV, and I was trying to calibrate two cameras I've bought in order to do some 3D stereo reconstruction, but I'm having some problems with it.
I've mostly followed this tutorial to calibrate the cameras separately (I apply it to both of them), and then I intend to use cv2.stereoCalibrate to get the relative calibration.
With the single-camera calibration everything seems to be working correctly: I get a very low reprojection error and, as far as my knowledge goes, the matrices seem to look OK. Here are the results of the single-camera calibration.
cameraMatrix1 and distCoeffs1:
[[ 951.3607329 0. 298.74117671]
[ 0. 954.23088299 219.20548594]
[ 0. 0. 1. ]]
[[ -1.07320015e-01 -5.56147908e-01 -1.13339913e-03 1.85969704e-03
2.24131322e+00]]
cameraMatrix2 and distCoeffs2:
[[ 963.41078117 0. 362.85971342]
[ 0. 965.66793023 175.63216871]
[ 0. 0. 1. ]]
[[ -3.31491728e-01 2.26020466e+00 3.86190151e-03 -2.32988011e-03
-9.82275646e+00]]
So after having those I do the following (I fix the intrinsics as I already know them from the previous calibrations):
stereocalibration_criteria = (cv2.TERM_CRITERIA_MAX_ITER + cv2.TERM_CRITERIA_EPS, 100, 1e-5)
stereocalibration_flags = cv2.CALIB_FIX_INTRINSIC
stereocalibration_retval, cameraMatrix1, distCoeffs1, cameraMatrix2, distCoeffs2, R, T, E, F = cv2.stereoCalibrate(
    objpoints, imgpoints_left, imgpoints_right,
    cameraMatrix1, distCoeffs1, cameraMatrix2, distCoeffs2,
    gray_left.shape[::-1],
    criteria=stereocalibration_criteria, flags=stereocalibration_flags)
I've tried several times to change the flags of stereoCalibrate and to swap the matrices, in case I was mistaken about the order and that mattered, but I'm still stuck with this and get a retval of around 30 (and when I then try to rectify the images, the result is of course a disaster).
I've also tried using some calibration images from the internet and I get the same result, so I assume the problem is not with the images I've taken. If anyone can point me in the right direction or knows what the issue could be, it would be very welcome.
Turns out that the order of the images I was using was not the same for the right and left cameras... I was using:
images_left = glob.glob('Calibration/images/set1/left*' + images_format)
images_right = glob.glob('Calibration/images/set1/right*' + images_format)
When I should have been using something more like:
images_left = sorted(glob.glob('Calibration/images/set1/left*' + images_format))
images_right = sorted(glob.glob('Calibration/images/set1/right*' + images_format))
This is because glob returns the images in an arbitrary (filesystem-dependent) order, so I was trying to match the wrong image pairs. Now I finally get a retval of about 0.4, which is not that bad.
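As a quick sanity check (my own addition, assuming the left/right images share a common suffix such as left01.png / right01.png), you can verify that the two sorted lists actually pair up before calibrating:
import os
for left, right in zip(images_left, images_right):
    left_id = os.path.basename(left).replace('left', '')
    right_id = os.path.basename(right).replace('right', '')
    assert left_id == right_id, f"mismatched pair: {left} vs {right}"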