Python library for GPS trajectory pre-processing? - python

I'd like to know if there is any implemented python library for GPS trajectory pre-processing such as compression, smoothing, filtering, etc.

Expanding on my comment, a Kalman filter is the usual choice for estimating position and velocity from noisy sensor readings.
Here's what Wikipedia has to say on the topic (emphasis mine):
The Kalman filter is an algorithm, commonly used since the 1960s for improving vehicle navigation (among other applications, although aerospace is typical), that yields an optimized estimate of the system's state (e.g. position and velocity). The algorithm works recursively in real time on streams of noisy input observation data (typically, sensor measurements) and filters out errors using a least-squares curve-fit optimized with a mathematical prediction of the future state generated through a modeling of the system's physical characteristics.
The Kalman filter is the basic version; there's also the extended Kalman filter and unscented Kalman filter (though my control systems lecturer never got around to telling us what those were actually used for.)
@stark has provided a link to an implementation of the Kalman filter in Python (not sure of the quality). You may be able to find others, or roll your own with scipy.
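To make this concrete, here is a minimal sketch of a constant-velocity Kalman filter in numpy for smoothing one coordinate of a noisy track. The noise values q and r are invented assumptions you would tune to your receiver; a real GPS filter would track both coordinates together.

    import numpy as np

    def kalman_smooth(measurements, dt=1.0, q=1e-3, r=1.0):
        # State is [position, velocity]; we observe position only.
        F = np.array([[1.0, dt], [0.0, 1.0]])  # constant-velocity transition
        H = np.array([[1.0, 0.0]])             # measurement picks out position
        Q = q * np.eye(2)                      # process noise (assumed)
        R = np.array([[r]])                    # measurement noise (assumed)
        x = np.array([[measurements[0]], [0.0]])
        P = np.eye(2)
        smoothed = []
        for z in measurements:
            x = F @ x                          # predict
            P = F @ P @ F.T + Q
            y = np.array([[z]]) - H @ x        # innovation
            S = H @ P @ H.T + R
            K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
            x = x + K @ y                      # update
            P = (np.eye(2) - K @ H) @ P
            smoothed.append(x[0, 0])
        return np.array(smoothed)

    noisy = np.cumsum(np.ones(50)) + np.random.normal(0, 2.0, 50)
    print(kalman_smooth(noisy)[:5])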

Not GPS-specific, but numpy has general statistics and scientific algorithms. For example, if you want to make a best-fit line to a series of points, you would run a linear regression on the data.
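For instance, a quick sketch with numpy's polyfit (the data here is synthetic):

    import numpy as np

    # Fit a straight line (degree-1 polynomial) to noisy points.
    x = np.linspace(0, 10, 50)
    y = 2.5 * x + 1.0 + np.random.normal(0, 1.0, x.size)
    slope, intercept = np.polyfit(x, y, deg=1)
    print(f"best-fit line: y = {slope:.2f}x + {intercept:.2f}")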

Related

Roadmap from sensor data to predictive maintenance

I am new to these topics. I have researched a lot of articles on this issue. There are a lot of different techniques, but I am confused because I don't know where to start.
According to my research, the first important thing is that I must preprocess the raw sensor data. There are some techniques; FFT is one of them. (But how can I learn about all the techniques? I have not seen them all in one place.)
Then I can start on the statistical calculations for processing.
I have not been able to draw up a roadmap. Can you help with this, or suggest books or anything else?
Welcome to SO. To leverage this site, hover your mouse over the fft tag on your question, then click "View tag" and hit "learn more". After reading the info page on fft, hit "Votes" to see the highest-voted posts here on SO; those questions and answers will get you into the ballpark.
I highly suggest you master the details explained here: Discrete Fourier Transform - Simple Step by Step
An Interactive Guide To The Fourier Transform
https://betterexplained.com/articles/an-interactive-guide-to-the-fourier-transform/
Intuitive Understanding of the Fourier Transform and FFTs
https://www.youtube.com/watch?v=FjmwwDHT98c
An Intuitive Discrete Fourier Transform Tutorial
http://practicalcryptography.com/miscellaneous/machine-learning/intuitive-guide-discrete-fourier-transform/
How to get frequency from fft result?
I could go on mentioning nuggets from my notes; however, I will leave you with this excerpt from an excellent book:
http://www.dspguide.com/ch10/6.htm
The Discrete Time Fourier Transform (DTFT) is the member of the Fourier transform family that operates on aperiodic, discrete signals. The best way to understand the DTFT is how it relates to the DFT. To start, imagine that you acquire an N sample signal, and want to find its frequency spectrum. By using the DFT, the signal can be decomposed into sine and cosine waves, with frequencies equally spaced between zero and one-half of the sampling rate. As discussed in the last chapter, padding the time domain signal with zeros makes the period of the time domain longer, as well as making the spacing between samples in the frequency domain narrower. As N approaches infinity, the time domain becomes aperiodic, and the frequency domain becomes a continuous signal. This is the DTFT, the Fourier transform that relates an aperiodic, discrete signal, with a periodic, continuous frequency spectrum.
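To make the DFT half of that concrete, here is a small numpy sketch that decomposes an N-sample signal and maps each FFT bin to its frequency; the sampling rate and test tones are invented for illustration.

    import numpy as np

    fs = 1000.0                                  # sampling rate in Hz (assumed)
    t = np.arange(0, 1.0, 1.0 / fs)              # N = 1000 samples
    x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

    spectrum = np.fft.rfft(x)                    # DFT of the real-valued signal
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)  # bin index -> frequency in Hz
    print(freqs[np.argmax(np.abs(spectrum))])    # dominant tone: ~50.0 Hz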
The first step will be data cleaning and feature extraction. You need to prepare the data in a format that is applicable to Machine Learning algorithms. I recommend my paper "Generic Data Imputation and Feature Extraction for Signals from Multifunctional Printers". It is about preparing data from IoT signals for further application of ML algorithms.

Algorithms to model non-linear relationship between two vectors

I want to build a model that describes a curve that fits the data shown in the scatterplot. I thought it would be straightforward using sklearn, but the choice and application of the different methods gets rather confusing.
Which algorithms would you use to tackle this problem?
This is really a question for CrossValidated rather than a Python question.
Your data seems to strongly indicate a simple underlying model which is linear until the very end, when it perhaps becomes polynomial.
As a first step, if possible, I would investigate this phenomenon. It's unusual. Perhaps there's something wrong with the data source. But maybe not. For example, a physical phenomenon with two distinct phases might produce data like these.
As to models, I would suggest natural cubic splines for this data. They are simple and involve cutting the data up into windows which you fit with cubic polynomials (a special case of which is a line).
You might also consider smoothing splines, and local regression.
For information on these, see the free online textbook, An Introduction to Statistical Learning.
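As a rough starting point, here is a sketch of a smoothing spline with scipy (the synthetic data and the smoothing factor s are assumptions; tune s against your real scatterplot):

    import numpy as np
    from scipy.interpolate import UnivariateSpline

    # Synthetic stand-in: linear almost everywhere, polynomial at the end.
    x = np.linspace(0, 10, 100)
    y = x + np.where(x > 8, (x - 8) ** 3, 0.0) + np.random.normal(0, 0.3, x.size)

    spline = UnivariateSpline(x, y, k=3, s=10)   # cubic smoothing spline
    y_fit = spline(x)                            # evaluate the fitted curve
    print(y_fit[:5])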

How to get a time series based on a spectrogram in Python?

I have a time series and generate its spectrogram in Python with matplotlib.pyplot.specgram.
After I make some analysis and changes, I need to convert the spectrogram back into a time series.
Is there any function in matplotlib or in other library that I can use directly? Or if not, could you please elaborate on which direction I should work on?
Any help is appreciated.
Matplotlib is a library for plotting data. Generally, if you're trying to do any computation, you'd use a library suited to that.
numpy is a very popular library for doing numerical computation in Python. It just so happens they have a fairly extensive set of fft and ifft methods.
I would check them out here and see if they can solve your problem.
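As a toy illustration of the round trip (the signal and the low-pass cutoff are arbitrary assumptions):

    import numpy as np

    fs = 1000.0
    t = np.arange(0, 1.0, 1.0 / fs)
    x = np.sin(2 * np.pi * 5 * t) + 0.3 * np.random.normal(size=t.size)

    X = np.fft.rfft(x)                        # forward transform
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    X[freqs > 20] = 0.0                       # "analysis and changes": crude low-pass
    x_back = np.fft.irfft(X, n=x.size)        # back to a time series
    print(x_back[:5])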
One thing commonly done (for example in the source separation community) is to reuse the phase data of the original signal (before transformations were applied to it). The result is much better than null or random phase, and not so far from algorithms that aim to reconstruct the phase information from scratch.
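A minimal sketch of that phase-reuse trick with scipy.signal's STFT pair (the signal and the magnitude modification are placeholders):

    import numpy as np
    from scipy.signal import stft, istft

    fs = 8000
    x = np.sin(2 * np.pi * 440 * np.arange(2 * fs) / fs)   # placeholder signal

    f, t, S = stft(x, fs=fs, nperseg=512)
    magnitude, phase = np.abs(S), np.angle(S)              # split the spectrogram
    magnitude[f > 1000, :] = 0.0                           # some arbitrary modification
    _, x_back = istft(magnitude * np.exp(1j * phase), fs=fs, nperseg=512)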
A classic reconstruction algorithm is Griffin & Lim's, described in the paper "Signal estimation from modified short-time Fourier transform". It is iterative, and each iteration requires a full STFT and inverse STFT, which makes it quite costly.
This problem is indeed an active area of research; a search for STFT + reconstruction + magnitude will yield plenty of papers that aim to improve on Griffin & Lim in terms of signal quality and/or computational efficiency.
You can find a detailed discussion in this thread on DSP Stack Exchange.
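For a feel of the Griffin & Lim iteration, here is a rough sketch built on scipy.signal's stft/istft; the iteration count and STFT parameters are arbitrary assumptions, not a tuned implementation.

    import numpy as np
    from scipy.signal import stft, istft

    def griffin_lim(magnitude, n_iter=50, nperseg=256):
        # Start from random phase and repeatedly enforce the target magnitude.
        spec = magnitude * np.exp(2j * np.pi * np.random.rand(*magnitude.shape))
        for _ in range(n_iter):
            _, x = istft(spec, nperseg=nperseg)        # back to the time domain
            _, _, spec_hat = stft(x, nperseg=nperseg)  # re-analyze
            spec = magnitude * np.exp(1j * np.angle(spec_hat))  # keep magnitude
        _, x = istft(spec, nperseg=nperseg)
        return x

    # Demo: discard the phase of a known signal, then reconstruct it.
    sig = np.sin(2 * np.pi * 10 * np.arange(4096) / 1000.0)
    _, _, S = stft(sig, nperseg=256)
    rebuilt = griffin_lim(np.abs(S))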

Classifying a Distribution of Points for Object Identification

I have some points that I need to classify. Given the collection of these points, I need to say which other (known) distribution they match best. For example, given the points in the top left distribution, my algorithm would have to say whether they are a better match to the 2nd, 3rd, or 4th distribution. (Here the bottom-left would be correct due to the similar orientations)
I have some background in Machine Learning, but I am no expert. I was thinking of using Gaussian Mixture Models, or perhaps Hidden Markov Models (I have previously classified signatures with these, which is a similar problem).
I would appreciate any help as to which approach to use for this problem. As background information, I am working with OpenCV and Python, so I would most likely not have to implement the chosen algorithm from scratch, I just want a pointer to know which algorithms would be applicable to this problem.
Disclaimer: I originally wanted to post this on the Mathematics section of StackExchange, but I lacked the necessary reputation to post images. I felt that my point could not be made clear without showing some images, so I posted it here instead. I believe that it is still relevant to Computer Vision and Machine Learning, as it will eventually be used for object identification.
EDIT:
I read and considered some of the answers given below, and would now like to add some new information. My main reason for not wanting to model these distributions as a single Gaussian is that eventually I will also have to be able to discriminate between distributions. That is, there might be two different and separate distributions representing two different objects, and then my algorithm should be aware that only one of the two distributions represents the object that we are interested in.
I think this depends on where exactly the data comes from and what sort of assumptions you would like to make as to its distribution. The points above can easily be drawn even from a single Gaussian distribution, in which case the estimation of parameters for each one and then the selection of the closest match are pretty simple.
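A minimal sketch of that single-Gaussian route, assuming each known distribution is available as an (n, 2) array of points: fit a mean and covariance to each, then pick the candidate under which the query points are most likely.

    import numpy as np
    from scipy.stats import multivariate_normal

    def best_gaussian_match(query, known_sets):
        # Fit one Gaussian per known set; score the query by log-likelihood.
        scores = []
        for pts in known_sets:
            mu = pts.mean(axis=0)
            cov = np.cov(pts, rowvar=False)
            scores.append(multivariate_normal(mu, cov).logpdf(query).sum())
        return int(np.argmax(scores))

    rng = np.random.default_rng(0)
    known = [rng.multivariate_normal(m, [[1, 0.8], [0.8, 1]], 200)
             for m in ([0, 0], [5, 5], [0, 5])]
    query = rng.multivariate_normal([5, 5], [[1, 0.8], [0.8, 1]], 50)
    print(best_gaussian_match(query, known))  # expect 1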
Alternatively you could go for the discriminative option, i.e. calculate whatever statistics you think may be helpful in determining the class a set of points belongs to and perform classification using SVM or something similar. This can be viewed as embedding these samples (sets of 2d points) in a higher-dimensional space to get a single vector.
Also, if the data is actually as simple as in this example, you could just do a principal component analysis and match by the first eigenvector.
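A quick numpy sketch of that PCA idea: take the eigenvector with the largest eigenvalue of each point cloud's covariance matrix, and compare orientations (the cosine similarity here is my own choice of measure).

    import numpy as np

    def principal_axis(points):
        # First principal component of a set of 2-D points.
        cov = np.cov(points - points.mean(axis=0), rowvar=False)
        vals, vecs = np.linalg.eigh(cov)
        return vecs[:, np.argmax(vals)]

    def orientation_similarity(a, b):
        # |cos(angle)| between principal axes; sign does not matter.
        return abs(np.dot(principal_axis(a), principal_axis(b)))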
You should just fit the distributions to the data, determine the chi^2 deviation for each one, and look at an F-test. See for instance these notes on model fitting.
You might also want to consider non-parametric techniques (e.g. multivariate kernel density estimation on each of your new data sets) in order to compare the statistics or distances of the estimated distributions. In Python, scipy.stats.gaussian_kde provides an implementation.
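A short sketch of that non-parametric route with scipy, scoring a query set by its mean log-density under each estimated distribution:

    import numpy as np
    from scipy.stats import gaussian_kde

    def best_kde_match(query, known_sets):
        # gaussian_kde expects data shaped (n_dims, n_points), hence the .T
        scores = [np.log(gaussian_kde(pts.T)(query.T)).mean() for pts in known_sets]
        return int(np.argmax(scores))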

python - audio classification of equal length samples / 'vocoder' thingy

Is anybody able to supply links, advice, or other forms of help with the following?
Objective - use Python to classify 10-second audio samples so that afterwards I can speak into a microphone and have Python pick out and play snippets (faded together) of the closest matches from a database.
My objective is not to have the closest match, and I don't care what the source of the audio samples is, so the result is probably of no use other than speaking in noise (fun).
I would like the Python app to be able to find a specific match (of the FFT, for example) within the 10-second samples in the database. I guess the real-time sampling of the microphone will use a 100-millisecond buffer.
Any ideas? FFT? What db? Other?
In order to do this, you need three things:
Segmentation (decide how to make your audio samples)
Feature Extraction (decide what audio feature (e.g. FFT) you care about)
Distance Metric (decide what the "closest" sample is)
Segmentation: you currently describe using 10-second samples. I think you might have better results with shorter segments (closer to 100-1000ms) in order to get something that fits the changes in the voice better.
Feature Extraction: you mention using the FFT. The zero-crossing rate is surprisingly OK considering how simple it is. If you want to get fancier, MFCCs or the spectral centroid are probably the way to go.
Distance Metric: most people use the Euclidean distance, but there are also fancier ones like the Manhattan distance, cosine distance, and earth mover's distance.
For a database, if you have a small enough set of samples, you might try just loading everything up into a kdtree so that you can do fast distance calculations, and just hold it in memory.
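A hedged sketch of that in-memory approach with scipy's cKDTree, assuming every snippet has already been reduced to a fixed-length feature vector (averaged FFT magnitudes, MFCC statistics, or similar):

    import numpy as np
    from scipy.spatial import cKDTree

    rng = np.random.default_rng(42)
    db_features = rng.normal(size=(1000, 20))  # placeholder: 1000 snippets, 20-dim features
    tree = cKDTree(db_features)

    query = rng.normal(size=20)                # feature vector of the live mic buffer
    dist, idx = tree.query(query, k=3)         # 3 nearest snippets (Euclidean)
    print(idx, dist)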
Good luck! It sounds like a fun project.
Try searching for algorithms on "music fingerprinting".
You could try some typical short-term feature extraction (e.g. energy, zero-crossing rate, MFCCs, spectral features, chroma, etc.) and then model your segment through a vector of feature statistics. Then you could use a simple distance-based classifier (e.g. kNN) in order to retrieve the "closest" training samples from a manually labelled set, given an unknown "query".
Check out my library for several Python audio analysis functionalities: pyAudioAnalysis
