I've been working on designing a Kalman filter for a few weeks now, but I'm pretty sure I'm making a major error because my results are terrible. My common sense tells me it's because I'm using an already-existing matrix as my predicted state instead of using a transition matrix, but I'm not sure how to solve that if it indeed is the issue. By the way, this is my first time using Kalman filtering, so I may be missing basic stuff.
Here is a detailed explanation:
I have 2 datasets of 81036 observations each, with each observation including 6 datapoints (i.e., I end up with 2 matrices of shape 81036 x 6). The first dataset is the measured state and the other is the predicted state. I want to end up with Python code that filters the data using both states, and I need the final covariance and error estimates. Here's the main part of my code:
import numpy as np
#nb of observations
nn=81036
#nb of datapoints
ns=6
#import
ps=np.genfromtxt('.......csv', delimiter=',')
ms=np.genfromtxt('.......csv', delimiter=',')
##kalman filtering with covariance
#initialize data (lazy initialization using means of columns)
xi=np.mean(ms,axis=0)
for i in np.arange(nn):
    #errors
    d=ms[i,:]-xi
    d2=ps[i,:]-xi
    #covariance matrices
    P=np.zeros((ns,ns))
    R=np.zeros((ns,ns))
    for j in np.arange(ns):
        for s in np.arange(ns):
            P[j,s]=d[j]*d[s]
            R[j,s]=d2[j]*d2[s]
    #Gain
    k=P*(P+R)**-1
    #Update estimate
    xi=xi+np.matmul(k,d2)
    #Uncertainty/error
    I=np.identity(ns)
    mlt=np.matmul((I-k),P)
    mlt=np.matmul(mlt,((I-k).T))
    mlt2=np.matmul(k,R)
    mlt2=np.matmul(mlt2,k.T)
    Er=mlt+mlt2
When I run this code, my filtered state xi goes through the roof, so I'm pretty sure this is not the correct code. I've tried to fix it in several ways (e.g., I tried to calculate the covariance matrix in the standard way I'm used to, D'D/n; I tried to remove my predicted state matrix and simply add random noise to my measured state instead), but nothing seems to work. I also tried some available libraries for Kalman filtering (as well as libraries in Matlab and R), but they either work in 1D only or need me to specify variables like the transition matrix, which I don't have. I'm at my wits' end here, so I'd appreciate any help.
I've found the solution to this issue. Huge props to Kani for their comment, as it pointed me in the right direction.
It turns out that the issue is simply in the calculation of k. Although the equation is correct, the inverse function was not working properly because of the very small values in some instances of R and P. To solve this, I used the pseudoinverse instead, so the line for calculating k became as follows:
k = P @ np.linalg.pinv(P + R)
Note that this might not be as accurate as the inverse in other cases, but it does the trick here.
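To see why this helps, here is a tiny standalone check (not part of my actual code or data) of how the pseudoinverse behaves when P + R is nearly singular:
import numpy as np

# Toy stand-in for a nearly singular P + R (values chosen purely for illustration)
S = np.array([[1e-17, 0.0],
              [0.0,   1.0]])

print(np.linalg.inv(S))   # the tiny diagonal entry explodes to ~1e17
print(np.linalg.pinv(S))  # the tiny singular value is treated as zero instead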
I have a data set of 15,497 sets of values. The first graph shows the raw data (pendulum angle vs. sample number), which, obviously, looks awful. It should look like the second graph, which shows the filtered data. Part of the assignment is introducing a mean filter to "smooth" the data, making it look like the data on the 2nd graph. The data is put into np.arrays in Python, but using np.arrays I can't seem to figure out how to introduce a mean filter.
I'm interested in applying a mean filter to theta in my Python code, as theta holds the values on the y-axis of the plots. The code is included so you can easily see how the data file is read in.
There is a whole world of filtering techniques, and there is not a single unique 'mean filter'. Moreover, there are causal and non-causal filters (i.e., filters that only use past values vs. filters that also use future values). I'm going to assume you want a mean filter of size N, as that is pretty standard. Then, to apply this filter, convolve your 'theta' vector with a mean kernel.
I suggest printing the mean kernel and studying how it looks with different N. Then you may understand how it is averaging the values in the signal. I also urge you to think about why convolution is applying this filter to theta. I'll help you by telling you to think about the equivalent multiplication in the frequency domain. Also, investigate the different modes in the convolution function, as this may be more tailored for the specific solution you desire.
import numpy as np

N = 2
mean_kernel = np.ones(N) / N                               # each of the N taps gets weight 1/N
filtered_sig = np.convolve(sig, mean_kernel, mode='same')  # moving average, same length as sig
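If it helps, here is a small self-contained sketch (with a made-up noisy signal standing in for your theta, since I don't have your data file) that prints the kernel for a few values of N and compares the 'same' and 'valid' modes:
import numpy as np

# Made-up pendulum-like signal: a sine wave plus noise, standing in for theta
t = np.linspace(0, 10, 200)
theta = np.sin(t) + 0.2 * np.random.randn(t.size)

for N in (2, 5, 15):
    mean_kernel = np.ones(N) / N
    print(N, mean_kernel)  # see how the weights spread out as N grows
    same = np.convolve(theta, mean_kernel, mode='same')    # same length as theta, edge effects at the ends
    valid = np.convolve(theta, mean_kernel, mode='valid')  # slightly shorter, no edge effects
    print(same.shape, valid.shape)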
I'm trying to use KernelPCA for reducing the dimensionality of a dataset to 2D (both for visualization purposes and for further data analysis).
I experimented with computing KernelPCA using an RBF kernel at various values of gamma, but the result is unstable:
(each frame is a slightly different value of Gamma, where Gamma is varying continuously from 0 to 1)
Looks like it is not deterministic.
Is there a way to stabilize it/make it deterministic?
Code used to generate transformed data:
from sklearn.decomposition import KernelPCA

def pca(X, gamma1):
    kpca = KernelPCA(kernel="rbf", fit_inverse_transform=True, gamma=gamma1)
    X_kpca = kpca.fit_transform(X)
    #X_back = kpca.inverse_transform(X_kpca)
    return X_kpca
KernelPCA should be deterministic and evolve continuously with gamma. It is different from RBFSampler that does have built-in randomness in order to provide an efficient (more scalable) approximation of the RBF kernel.
However what can change in KernelPCA is the order of the principal components: in scikit-learn they are returned sorted in order of descending eigenvalue, so if you have 2 eigenvalues close to each other it could be that the order changes with gamma.
My guess (from the gif) is that this is what is happening here: the axes along which you are plotting are not constant so your data seems to jump around.
Could you provide the code you used to produce the gif?
I'm guessing it is a plot of the data points along the 2 first principal components but it would help to see how you produced it.
You could try to further inspect it by looking at the values of kpca.alphas_ (the eigenvectors) for each value of gamma.
Hope this makes sense.
EDIT: As you noted, it looks like the points are reflected against the axis; the most plausible explanation is that one of the eigenvectors flips its sign (note this does not affect the eigenvalue).
I put in a simple gist to reproduce the issue (you'll need a Jupyter notebook to run it). You can see the sign-flipping when you change the value of gamma.
As a complement, note that this kind of discrepancy happens only because you fit the KernelPCA object several times. Once you have settled on a particular gamma value and fit kpca once, you can call transform several times and get consistent results.
For the classical PCA the docs mention that:
Due to implementation subtleties of the Singular Value Decomposition (SVD), which is used in this implementation, running fit twice on the same matrix can lead to principal components with signs flipped (change in direction). For this reason, it is important to always use the same estimator object to transform data in a consistent fashion.
I don't know about the behavior of a single KernelPCA object that you would fit several times (I did not find anything relevant in the docs).
It does not apply to your case though as you have to fit the object with several gamma values.
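If the sign flips are the only problem, one convention-based workaround (my own sketch, nothing from the scikit-learn docs) is to normalise the sign of each returned component before plotting, e.g. by forcing the largest-magnitude entry of each column to be positive:
import numpy as np

def fix_signs(X_kpca):
    # Flip each column so that its largest-magnitude entry is positive.
    # This removes the arbitrary sign of each principal component, so plots
    # for nearby gamma values no longer appear mirrored against an axis.
    X = X_kpca.copy()
    for j in range(X.shape[1]):
        i = np.argmax(np.abs(X[:, j]))
        if X[i, j] < 0:
            X[:, j] = -X[:, j]
    return X

Applying fix_signs to each transformed result before plotting should keep consecutive frames from mirroring each other.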
So... I can't give you a definitive answer on why KernelPCA is not deterministic. The behavior resembles the differences I've observed between the results of PCA and RandomizedPCA. PCA is deterministic, but RandomizedPCA is not, and sometimes the eigenvectors are flipped in sign relative to the PCA eigenvectors.
That leads me to a vague idea of how you might get more deterministic results... maybe: use RBFSampler with a fixed seed:
from sklearn.decomposition import PCA
from sklearn.kernel_approximation import RBFSampler

def pca(X, gamma1):
    # Approximate the RBF kernel feature map with a fixed seed, then run
    # ordinary (deterministic) PCA on those features
    kernvals = RBFSampler(gamma=gamma1, random_state=0).fit_transform(X)
    X_kpca = PCA().fit_transform(kernvals)
    return X_kpca
I have written python (2.7.3) code wherein I aim to create a weighted sum of 16 data sets, and compare the result to some expected value. My problem is to find the weighting coefficients which will produce the best fit to the model. To do this, I have been experimenting with scipy's optimize.minimize routines, but have had mixed results.
Each of my individual data sets is stored as a 15x15 ndarray, so their weighted sum is also a 15x15 array. I define my own 'model' of what the sum should look like (also a 15x15 array), and quantify the goodness of fit between my result and the model using a basic least squares calculation.
R=np.sum(np.abs(model/np.max(model)-myresult)**2)
'myresult' is produced as a function of some set of parameters 'wts'. I want to find the set of parameters 'wts' which will minimise R.
To do so, I have been trying this:
res = minimize(get_best_weightings,wts,bounds=bnds,method='SLSQP',options={'disp':True,'eps':100})
Where my objective function is:
def get_best_weightings(wts):
    wts_tr=wts[0:16]
    wts_ti=wts[16:32]
    for i,j in enumerate(portlist):
        originalwtsr[j]=wts_tr[i]
        originalwtsi[j]=wts_ti[i]
    realwts=originalwtsr
    imagwts=originalwtsi
    myresult=make_weighted_beam(realwts,imagwts,1)
    R=np.sum((np.abs(modelbeam/np.max(modelbeam)-myresult))**2)
    return R
The input (wts) is an ndarray of shape (32,), and the output, R, is just some scalar, which should get smaller as my fit gets better. By my understanding, this is exactly the sort of problem ("Minimization of scalar function of one or more variables.") which scipy.optimize.minimize is designed to optimize (http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.optimize.minimize.html ).
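For reference, here is a minimal toy version of that call pattern (a made-up quadratic objective standing in for my beam model, just to show the shapes involved):
import numpy as np
from scipy.optimize import minimize

target = np.linspace(-1.0, 1.0, 32)        # made-up "true" weights

def toy_objective(wts):
    # Scalar goodness-of-fit, analogous to R above
    return np.sum((wts - target) ** 2)

x0 = np.zeros(32)
bnds = [(-4000, 4000)] * 32
res = minimize(toy_objective, x0, bounds=bnds, method='SLSQP')
print(res.fun, res.x[:4])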
However, when I run the code, although the optimization routine seems to iterate over different values of all the elements of wts, only a few of them seem to 'stick'. That is, all but four of the values are returned unchanged from my initial guess. To illustrate, I plot the values of my initial guess for wts (in blue) and the optimized values (in red). You can see that for most elements, the two lines overlap.
Image:
http://imgur.com/p1hQuz7
Changing just these few parameters is not enough to get a good answer, and I can't understand why the other parameters aren't also being optimised. I suspect that maybe I'm not understanding the nature of my minimization problem, so I'm hoping someone here can point out where I'm going wrong.
I have experimented with a variety of minimize's built-in methods (I am by no means committed to SLSQP, or certain that it's the most appropriate choice), and with a variety of 'step sizes' eps. The bounds I am using for my parameters are all (-4000, 4000). I only have scipy version 0.11, so I haven't tested the basinhopping routine to get the global minimum (this needs 0.12). I have looked at minimize.brute, but haven't tried implementing it yet - thought I'd check if anyone can steer me in a better direction first.
Any advice appreciated! Sorry for the wall of text and the possibly (probably?) idiotic question. I can post more of my code if necessary, but it's pretty long and unpolished.
I currently want to calculate all-pair document similarity using cosine similarity and Tfidf features in python. My basic approach is the following:
from sklearn.feature_extraction.text import TfidfVectorizer
#c = [doc1, doc2, ..., docn]
vec = TfidfVectorizer()
X = vec.fit_transform(c)
del vec
Y = X * X.T
Works perfectly fine, but unfortunately, not for my very large datasets. X has shape (350363, 2526183) and hence the output matrix Y should have shape (350363, 350363). X is very sparse due to the tfidf features and hence easily fits into memory (around 2GB only). Yet, the multiplication gives me a memory error after running for some time (even though the memory is not yet full; I suppose scipy is clever enough to anticipate the memory the result will need).
I have already tried to play around with the dtypes without any success. I have also made sure that numpy and scipy have their BLAS libraries linked, though this does not have an effect on the csr_matrix dot functionality, as it is implemented in C. I thought of maybe using things like memmap, but I am not sure about that.
Does anyone have an idea of how to best approach this?
Even though X is sparse, X * X.T probably won't be; notice that it just needs one nonzero common element in a given pair of rows. You are working on an NLP task, so I am pretty sure that there are huge numbers of words which occur in nearly all documents (and, as said before, it does not have to be one word shared by all pairs, just one (possibly different) word for each pair). As a result you get a matrix of 350363^2, i.e. about 122,000,000,000, elements; if you don't have 200GB of RAM, it does not look computable. Try to perform much more aggressive filtering of words in order to force X * X.T to be sparse (remove many common words).
In general you won't be able to compute the Gram matrix of big data unless you enforce sparsity of X * X.T, so that most of your vectors' pairs (documents) have 0 "similarity". It can be done in numerous ways; the easiest is to set some threshold T under which you treat <a,b> as 0, compute the dot product yourself, and create an entry in the resulting sparse matrix iff <a,b> > T.
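A rough sketch of that thresholding idea (my own illustration; the chunk size and threshold T are arbitrary and need tuning for your data):
import scipy.sparse as sp

def thresholded_gram(X, T=0.2, chunk=500):
    # X is assumed to be a scipy.sparse CSR matrix of TF-IDF features.
    # Compute X * X.T one block of rows at a time, zeroing entries <= T
    # so the assembled result stays sparse.
    blocks = []
    for start in range(0, X.shape[0], chunk):
        block = sp.csr_matrix(X[start:start + chunk].dot(X.T))
        block.data[block.data <= T] = 0.0
        block.eliminate_zeros()
        blocks.append(block)
    return sp.vstack(blocks, format='csr')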
You may want to look at the random_projection module in scikit-learn. The Johnson-Lindenstrauss lemma says that a random projection matrix is guaranteed to preserve pairwise distances up to some tolerance eta, which is a hyperparameter when you calculate the number of random projections needed.
To cut a long story short, the scikit-learn class SparseRandomProjection seen here is a transformer to do this for you. If you run it on X after vec.fit_transform you should see a fairly large reduction in feature size.
The formula from sklearn.random_projection.johnson_lindenstrauss_min_dim shows that to preserve distances up to a 10% tolerance, you only need johnson_lindenstrauss_min_dim(350363, .1) = 10942 features. This is an upper bound, so you may be able to get away with much fewer. Even a 1% tolerance would only need johnson_lindenstrauss_min_dim(350363, .01) = 1028192 features, which is still significantly less than you have right now.
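Roughly, it would slot into your pipeline like this (parameter values are illustrative, and c is the document list from your question):
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.random_projection import SparseRandomProjection, johnson_lindenstrauss_min_dim

# c = [doc1, doc2, ..., docn]  -- the same document list as in the question
X = TfidfVectorizer().fit_transform(c)

n_comp = johnson_lindenstrauss_min_dim(X.shape[0], eps=0.1)   # ~10942 for 350363 samples
srp = SparseRandomProjection(n_components=n_comp, random_state=0)
X_small = srp.fit_transform(X)   # (350363, n_comp) -- far fewer columns than before

# The full 350363 x 350363 similarity matrix is still large, so combine this
# with the chunking/thresholding idea above when forming X_small * X_small.T.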
EDIT:
Simple thing to try - if your data is dtype='float64', try using 'float32'. That alone can save a massive amount of space, especially if you do not need double precision.
If the issue is that you cannot store the "final matrix" in memory either, I would recommend working with the data in an HDFStore (as provided by pandas, backed by PyTables). This link has some good starter code, and you could iteratively calculate chunks of your dot product and write them to disk. I have been using this extensively in a recent project on a 45GB dataset, and could provide more help if you decide to go this route.
What you could do is slice a row and a column of X, multiply those and save the resulting row to a file. Then move to the next row and column.
It is still the same amount of calculation work but you wouldn't run out of memory.
Using multiprocessing.Pool.map() or multiprocessing.Pool.map_async() you might be able to speed it up, provided you use numpy.memmap() to read the matrix in the mapped function. And you would probably have to write each of the calculated rows to a separate file to merge them later. If you were to return the row from the mapped function, it would have to be transferred back to the original process, which would take a lot of memory and IPC bandwidth.
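A minimal single-process sketch of the row-at-a-time idea (illustrative file name; note the output file holds n*n float32 values, so it will be very large on disk):
import numpy as np

def rowwise_dot_to_disk(X, out_path='similarities.dat'):
    # Compute X * X.T one row at a time and stream the result to a memory-mapped
    # file on disk, so only one dense row is ever held in memory.
    n = X.shape[0]
    out = np.memmap(out_path, dtype='float32', mode='w+', shape=(n, n))
    for i in range(n):
        row = X[i].dot(X.T)                       # 1 x n similarities for document i
        out[i, :] = row.toarray().astype('float32').ravel()
    out.flush()
    return out_path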
Using a Microsoft Kinect, I am collecting depth data about an object. From these data, I create a "cloud" of points (point cloud), which, when plotted, allow me to view the object that I scanned using the Kinect.
However, I would like to be able to collect multiple point clouds from different "views" and align them. More specifically, I would like to use an algorithm such as Iterative Closest Point (ICP) to do so, transforming each point in my point cloud by calculating the rotation and translation between each cloud that I collect and the previously-collected cloud.
However, while I understand the process behind ICP, I do not understand how I would implement it in 3D. Perhaps it is my lack of mathematical experience or my lack of experience with frameworks such as OpenCV, but I cannot find a solution. I would like to avoid libraries such as the Point Cloud Library, which do this sort of thing for me, since I would like to do it myself.
Any and all suggestions are appreciated (if there is a solution that involves OpenCV/python that I can work on, that would be even better!)
I am currently struggling with ICP myself. Here is what I have gathered so far:
ICP consists of three steps:
Given two point clouds A and B, find pairs of points between A and B that probably represent the same point in space. Often this is done simply by matching each point with its closest neighbor in the other cloud, but you can use additional features such as color, texture or surface normal to improve the matching. Optionally you can then discard the worst matches.
Given this list of correspondence pairs, find the optimal transformation from A to B
Apply this transformation to all points in A
Repeat these three steps until you converge on an acceptable solution.
Step one is easy, although there are lots of ways to optimize it for speed (it is the major performance bottleneck of ICP) and for accuracy (it is the main source of errors). OpenCV can help you there with the FLANN library.
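For prototyping step one in Python without FLANN, scipy's cKDTree also works; a small sketch (my own substitution, not something from the FLANN docs):
import numpy as np
from scipy.spatial import cKDTree

def closest_point_pairs(A, B):
    # For each point in A (n x 3), find the index of its nearest neighbour in B (m x 3).
    tree = cKDTree(B)
    dists, idx = tree.query(A)
    return dists, idx   # pair A[i] with B[idx[i]]; dists can be used to discard the worst matches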
I assume your troubles are with step two, finding the best transformation given a list of correspondences.
One common approach works with Singular Value Decomposition (SVD). Here is a rough sketch of the algorithm. Searching for ICP & SVD will give a lot of further references.
Take the list of corresponding points A1..An and B1..Bn from step 1
Calculate the centroid Ca of all points in A and the centroid Cb of all points in B
Calculate the 3x3 covariance matrix M:
M = (A1 - Ca)(B1 - Cb)^T + ... + (An - Ca)(Bn - Cb)^T
Use SVD to calculate the 3x3 Matrices U and V for M
(OpenCV has a function to perform SVD)
Calculate R = U * V^T.
This is your desired optimal rotation matrix.
Calculate the optimal translation as Cb - R*Ca
The optimal transformation is the combination of R and this translation
Please note that I have not yet implemented this algorithm myself, so I am only paraphrasing what I read.
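Here is a hedged numpy sketch of steps 2-3 following the outline above. Note the outline writes R = U * V^T; which of the two transposed conventions applies depends on how M is set up, and the version below follows the Kabsch formulation I'm familiar with, including the usual determinant check against reflections (which the outline omits):
import numpy as np

def best_fit_transform(A, B):
    # A and B are n x 3 arrays of corresponding points (row i of A pairs with row i of B).
    # Returns R (3x3) and t (3,) such that B is approximately A @ R.T + t.
    Ca, Cb = A.mean(axis=0), B.mean(axis=0)
    M = (A - Ca).T @ (B - Cb)            # 3x3 cross-covariance matrix from the outline
    U, S, Vt = np.linalg.svd(M)
    R = Vt.T @ U.T                       # optimal rotation
    if np.linalg.det(R) < 0:             # guard against a reflection
        Vt[-1, :] = -Vt[-1, :]
        R = Vt.T @ U.T
    t = Cb - R @ Ca                      # optimal translation, as in the outline
    return R, t

Applying the result is then just A_new = A @ R.T + t, repeated with fresh correspondences until convergence.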
A very good introduction to ICP, including accelerated variants, can be found in Rusinkiewicz's old paper here.
A new ICP algorithm is now in OpenCV contrib (the surface matching module). It also incorporates variants of various types (including Rusinkiewicz's and more):
http://docs.opencv.org/3.0-beta/modules/surface_matching/doc/surface_matching.html
For MATLAB implementation:
http://www.mathworks.co.jp/matlabcentral/fileexchange/47152-icp-registration-using-efficient-variants-and-multi-resolution-scheme/content/icp_mod_point_plane_pyr.m
@tdirdal:
Ok then I may not be looking at the correct code.
I am talking about this package link:
The code starts with constructing a transformation matrix and then loads a *.ply which contains a mesh (faces and vertices). The rest of the code depends on the mesh that has been loaded.
I have a very simple problem and I would appreciate it if you could let me know how I can solve it using the ICP method. I have the following two point clouds. P2 is a subset of P39, and I would like to find P2 in P39. Please let me know how I can use your MATLAB package to solve this problem.
P2:
11.2706 -5.3392 1.1903
13.6194 -4.8500 2.6222
8.8809 -3.8407 1.1903
10.7704 -2.1800 2.6222
8.5570 -1.0346 1.1903
13.1808 -2.5632 1.1903
P39:
-1.9977 -4.1434 -1.6750
-4.3982 -3.5743 -3.1069
-6.8065 -3.0071 -1.6751
-9.2169 -2.4386 -3.1070
-11.6285 -1.8696 -1.6751
-16.4505 -0.7305 -1.6751
-14.0401 -1.3001 -3.1070
-18.8577 -0.1608 -3.1070
-25.9398 -0.8647 -3.1070
-30.1972 -4.6857 -3.1069
-28.2349 -2.5200 -3.1069
-29.5843 -0.2496 -1.6751
-31.1688 -2.0974 -3.1070
-21.2580 0.4093 -1.6751
-23.6450 0.9838 -3.1070
-26.0636 1.5073 -1.6751
-28.4357 1.9258 -3.1070