Roadmap from sensor data to predictive maintenance - python

I am new to these topics. I have read a lot of articles on the subject and found many different techniques, but I am confused because I don't know where to start.
According to my research, the first important step is preprocessing the raw sensor data. There are several techniques for this; the FFT is one of them. (But how can I find and learn all of the techniques? I have not seen them all listed in one place.)
Then I move on to statistical calculations on the processed data.
I have not drawn up a roadmap yet. Can you help with this, or suggest books or other resources?

Welcome to SO. To get the most out of this site, hover your mouse over the fft tag on your question, click View tag, then hit Learn more. After reading the info page on fft, hit Votes to see the highest-voted posts here on SO. Those questions and answers will get you into the ballpark.
I highly suggest you master the details explained here: Discrete Fourier Transform - Simple Step by Step
An Interactive Guide To The Fourier Transform
https://betterexplained.com/articles/an-interactive-guide-to-the-fourier-transform/
Intuitive Understanding of the Fourier Transform and FFTs
https://www.youtube.com/watch?v=FjmwwDHT98c
An Intuitive Discrete Fourier Transform Tutorial
http://practicalcryptography.com/miscellaneous/machine-learning/intuitive-guide-discrete-fourier-transform/
How to get frequency from fft result?
I could go on mentioning nuggets from my notes; however, I will leave you with this excerpt from an excellent book:
http://www.dspguide.com/ch10/6.htm
The Discrete Time Fourier Transform (DTFT) is the member of the Fourier transform family that operates on aperiodic,
discrete signals. The best way to understand the DTFT is how it relates to the DFT. To start, imagine that you
acquire an N sample signal, and want to find its frequency spectrum. By using the DFT, the signal can be
decomposed into sine and cosine waves, with frequencies equally spaced between zero and one-half of the
sampling rate. As discussed in the last chapter, padding the time domain signal with zeros makes the period
of the time domain longer, as well as making the spacing between samples in the frequency domain narrower.
As N approaches infinity, the time domain becomes aperiodic, and the frequency domain becomes a continuous signal.
This is the DTFT, the Fourier transform that relates an aperiodic, discrete signal, with a periodic,
continuous frequency spectrum
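
To connect the excerpt to practice, here is a minimal sketch of reading frequencies off an FFT result with NumPy. The sampling rate, signal length, and the two test tones are assumptions made up for the illustration; they do not come from the question.

```python
import numpy as np

fs = 1000.0                      # assumed sampling rate in Hz
n = 1000                         # assumed number of samples (1 second of data)
t = np.arange(n) / fs

# Synthetic test signal: a strong 120 Hz tone plus a weaker 300 Hz tone.
signal = np.sin(2 * np.pi * 120.0 * t) + 0.5 * np.sin(2 * np.pi * 300.0 * t)

spectrum = np.fft.rfft(signal)           # DFT of a real signal: bins 0 .. n/2
freqs = np.fft.rfftfreq(n, d=1.0 / fs)   # frequency in Hz of each bin, 0 .. fs/2

# Scale magnitudes to tone amplitudes (valid for bins other than DC and Nyquist).
amplitude = 2.0 * np.abs(spectrum) / n

peak_freq = freqs[np.argmax(amplitude)]  # frequency of the strongest component
print(peak_freq)
```

The bins are spaced fs / n apart and run from zero to half the sampling rate, which is the "equally spaced between zero and one-half of the sampling rate" statement in the excerpt.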

The first step will be data cleaning and feature extraction. You need to prepare the data in a format that is applicable to Machine Learning algorithms. I recommend my paper "Generic Data Imputation and Feature Extraction for Signals from Multifunctional Printers". It is about preparing data from IoT signals for further application of ML algorithms.
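
As a rough idea of what feature extraction can look like, here is a minimal sketch that cuts a signal into fixed-size windows and computes a few common statistics per window. This is not the method from the paper above; the function name window_features, the chosen features, and the synthetic 50 Hz vibration signal are all assumptions for illustration.

```python
import numpy as np
from scipy.stats import kurtosis

def window_features(x, fs, window_size):
    """Return one row per window: RMS, peak-to-peak, kurtosis, dominant frequency."""
    freqs = np.fft.rfftfreq(window_size, d=1.0 / fs)
    rows = []
    for i in range(len(x) // window_size):
        w = x[i * window_size:(i + 1) * window_size]
        w = w - w.mean()                      # remove the DC offset
        magnitude = np.abs(np.fft.rfft(w))
        rows.append([
            np.sqrt(np.mean(w ** 2)),         # RMS level
            w.max() - w.min(),                # peak-to-peak amplitude
            kurtosis(w),                      # excess kurtosis (impulsiveness)
            freqs[np.argmax(magnitude)],      # dominant frequency in Hz
        ])
    return np.array(rows)

# Hypothetical sensor signal: 50 Hz vibration with a little noise, 5 seconds at 1 kHz.
rng = np.random.default_rng(0)
fs = 1000.0
t = np.arange(5000) / fs
vibration = np.sin(2 * np.pi * 50.0 * t) + 0.05 * rng.standard_normal(t.size)

features = window_features(vibration, fs, window_size=1000)
print(features.shape)
```

Each row of the resulting matrix can then be fed to a machine learning model as one sample.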

Related

Waveform frequency resolution for FFT – any way to increase it?

This article https://www.bitweenie.com/listings/fft-zero-padding/ gives a simple relation between time-length of the input data to the FFT and the minimum distance between two frequencies that can be distinguished in the FFT. The article calls this Waveform frequency resolution.
In other words; if two input-frequencies are closer in frequency than 1/time-length_of_input_data, they will show as only one peak in the FFT-plot.
My question is: is there a way to increase this Waveform frequency resolution? I am finding it difficult to work with rather short data-series due to this limitation.
As an example, if I use a combination of sine series with periods 9.5, 10, and 11 over 240 datapoints I cannot distinguish between the different frequencies.
To have good frequency resolution you need a long time series.
This is a fundamental issue, called the uncertainty principle. It cannot be overcome within Fourier analysis (Fourier transform, DFT, short-time Fourier transform and so on).
Also note that zero padding will not overcome this issue.
It gives more points in the frequency domain, in the sense that the same spectral information is sampled more densely, but it will not make peaks sharper or more separated.
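
A minimal sketch of that point, using the three periods and the 240 data points from the question (the padding factor of 4 is an arbitrary assumption): every fourth bin of the zero-padded spectrum is identical to a bin of the unpadded one, so padding only interpolates between values that were already determined by the 240 samples.

```python
import numpy as np

n = 240
t = np.arange(n)
# Sum of three sines with periods 9.5, 10 and 11 samples, as in the question.
x = sum(np.sin(2 * np.pi * t / period) for period in (9.5, 10.0, 11.0))

plain = np.fft.rfft(x)              # 121 bins, spacing 1/240 cycles per sample
padded = np.fft.rfft(x, n=4 * n)    # zero-padded to 960 samples: 481 bins, spacing 1/960

# The padded spectrum sampled at every 4th bin reproduces the unpadded spectrum.
same_information = np.allclose(padded[::4], plain)
print(len(plain), len(padded), same_information)
```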
The only way to overcome the uncertainty principle is to make further assumptions on the data.
If for example you know that there is only a single frequency component, it is possible to determine its frequency more accurately than the uncertainty principle predicts.
You can also use transforms such as the Wigner-Ville transform. It is not bound by the uncertainty principle, but it generates "cross terms", i.e. frequency component artifacts. However, when you only have a few frequency components this might be acceptable, depending on the use case.
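
To illustrate the single-frequency case mentioned above, here is a minimal sketch that assumes the data contain exactly one sinusoid plus a little noise. It takes the FFT peak as a coarse estimate and then refines it by least-squares fitting a sine/cosine pair on a fine frequency grid. The noise level, phase, and grid density are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 240
t = np.arange(n)
f_true = 1.0 / 9.5                                  # cycles per sample
x = np.sin(2 * np.pi * f_true * t + 0.3) + 0.05 * rng.standard_normal(n)

# Coarse estimate: the FFT bin with the largest magnitude (bin spacing is 1/n).
freqs = np.fft.rfftfreq(n)
f_coarse = freqs[np.argmax(np.abs(np.fft.rfft(x)))]

def residual(f):
    """Sum of squared errors of the best-fitting sinusoid at frequency f."""
    basis = np.column_stack([np.sin(2 * np.pi * f * t), np.cos(2 * np.pi * f * t)])
    coef, *_ = np.linalg.lstsq(basis, x, rcond=None)
    return np.sum((x - basis @ coef) ** 2)

# Fine search within one bin on either side of the coarse estimate.
candidates = np.linspace(f_coarse - 1.0 / n, f_coarse + 1.0 / n, 2001)
f_fine = candidates[np.argmin([residual(f) for f in candidates])]

print(abs(f_coarse - f_true), abs(f_fine - f_true))
```

The refinement only works because of the extra assumption (one component); with several close components the same fit would give a misleading answer.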

How to determine the frequency range of interested sound with ambient noise

I'm very new to signal processing. I have two sound recordings, each 2 seconds long and sampled at 10 kHz. I have imported the data into Python; both sound_1 and sound_2 are numpy arrays of length 20000.
sound_1 contains a water flow sound (which I'm interested in) plus environmental noise (which I'm not), while sound_2 contains only environmental noise.
I'm looking for an algorithm (or package) that can help me determine the frequency range of the water flow sound. I think that if I can find the frequency range, I can use an inverse Fourier transform to filter out the environmental noise.
However, my ultimate purpose is to extract the water flow sound from sound_1 and eliminate the environmental noise, so other approaches would be welcome too.
I'm currently looking at this post: Python frequency detection
But I don't understand how they can find the frequency from only one sound signal. I think we need to compare at least two signals (one that contains the sound I am interested in and one that doesn't) so that we can find the difference.
Since sound_1 contains both water flow and environmental noise, there's no straightforward way of extracting the water flow. The Fourier transform will get you all frequencies in the signal, irrespective of the source.
The way to approach this is to get the frequencies of the environmental noise from sound_2 and then remove them from sound_1. Once that is done, you can extract the frequencies from the denoised sound_1.
One popular approach to this kind of noise reduction is spectral gating. Essentially, you first determine what the noise sounds like and then remove its smoothed spectrum from your signal. Smoothing is crucial, as sound is a wave, a continuous entity. If you simply chop discrete frequencies out of the wave, you will get very poor results (the audio will sound unnatural and robotic). The amount of smoothing you apply determines how much noise is reduced (mind that it is never truly removed; you will always get some residue).
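
To make the idea concrete, here is a deliberately simplified sketch of spectral gating built on scipy.signal.stft and istft. It is not the Audacity or noisereduce implementation; the function name spectral_gate, the threshold of 1.5 standard deviations, the 3x3 mask smoothing, and the synthetic 1 kHz "water" tone are all assumptions for the illustration.

```python
import numpy as np
from scipy.ndimage import uniform_filter
from scipy.signal import istft, stft

def spectral_gate(noisy, noise_only, fs, nperseg=512, n_std=1.5, smooth=3):
    """Attenuate time-frequency cells that do not rise above the noise profile."""
    # Noise profile: per-frequency mean + n_std * std of the noise magnitude.
    _, _, noise_spec = stft(noise_only, fs=fs, nperseg=nperseg)
    noise_mag = np.abs(noise_spec)
    threshold = noise_mag.mean(axis=1) + n_std * noise_mag.std(axis=1)

    # Gate: keep cells above the threshold, then smooth the mask in time and frequency.
    _, _, spec = stft(noisy, fs=fs, nperseg=nperseg)
    mask = (np.abs(spec) > threshold[:, None]).astype(float)
    mask = uniform_filter(mask, size=smooth)

    _, cleaned = istft(spec * mask, fs=fs, nperseg=nperseg)
    return cleaned[:len(noisy)]

# Synthetic stand-ins for sound_1 (signal + noise) and sound_2 (noise only).
rng = np.random.default_rng(0)
fs = 10_000
t = np.arange(2 * fs) / fs
tone = np.sin(2 * np.pi * 1000.0 * t)
sound_1 = tone + 0.5 * rng.standard_normal(t.size)
sound_2 = 0.5 * rng.standard_normal(t.size)

cleaned = spectral_gate(sound_1, sound_2, fs)
rms_before = np.sqrt(np.mean((sound_1 - tone) ** 2))
rms_after = np.sqrt(np.mean((cleaned - tone) ** 2))
print(rms_before, rms_after)
```

Raising n_std or the smoothing size trades more noise suppression against more damage to the wanted sound, which is the residue trade-off described above.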
To the concrete solution.
As you're new to the subject, I'd recommend first seeing how noise reduction works in software that will do the work for you. Audacity is an excellent choice. I linked the manual for noise reduction, but there are plenty of tutorials out there.
Once you know what you want to get, you can either implement spectral gating yourself or use an existing package. Audacity has an excellent implementation in C++, but it may prove difficult for a newcomer to port. I'd recommend starting with the noisereduce package, which is based on the Audacity implementation. If you use it, you will be done in a few lines.
Here's a snippet:
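import noisereduce as nr
from scipy.io import wavfile
# load the recording that contains water flow plus noise
rate, data = wavfile.read("sound_1.wav")
# load the noise-only recording; wavfile.read returns (rate, samples)
_, noisy_part = wavfile.read("sound_2.wav")
# perform noise reduction (noisereduce 1.x call; versions 2.x and later use
# nr.reduce_noise(y=data, sr=rate, y_noise=noisy_part) instead)
reduced_noise = nr.reduce_noise(audio_clip=data, noise_clip=noisy_part, verbose=True)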
Now simply run FFT on the reduced_noise to discover the frequencies of water flow.
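
If you mainly want the frequency range, a simpler option is to compare the averaged power spectra of the two recordings directly. The sketch below uses scipy.signal.welch on synthetic stand-ins for sound_1 and sound_2; the 1 to 2 kHz "water" band, the noise levels, and the power ratio threshold of 4 are assumptions picked for the example, not properties of real water flow.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, welch

rng = np.random.default_rng(0)
fs = 10_000
n = 2 * fs

# Hypothetical "water flow": white noise band-limited to 1-2 kHz.
sos = butter(4, [1000, 2000], btype="bandpass", fs=fs, output="sos")
water = sosfiltfilt(sos, 3.0 * rng.standard_normal(n))

sound_1 = water + rng.standard_normal(n)   # water flow + environmental noise
sound_2 = rng.standard_normal(n)           # environmental noise only

# Averaged power spectral densities (detrending disabled so the DC bin is comparable).
freqs, psd_1 = welch(sound_1, fs=fs, nperseg=1024, detrend=False)
_, psd_2 = welch(sound_2, fs=fs, nperseg=1024, detrend=False)

# Frequencies where sound_1 carries clearly more power than the noise alone.
ratio = psd_1 / psd_2
band = freqs[ratio > 4.0]
band_low, band_high = band.min(), band.max()
print(band_low, band_high)
```

With real recordings the two noise backgrounds will not match this well, so the threshold usually needs tuning and isolated bins above it should be treated with suspicion.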
Here's how I am using noisereduce. In this part I am determining the frequency statistics.

How to get a time series based on a spectrogram in Python?

I have a time series and generate its spectrogram in Python with matplotlib.pyplot.specgram.
After I make some analysis and changes I need to convert the spectrogram back into time series.
Is there any function in matplotlib or in other library that I can use directly? Or if not, could you please elaborate on which direction I should work on?
Your warm help is appreciated.
Matplotlib is a library for plotting data. Generally if you're trying to do any computation you'd use a library suited for that.
numpy is a very popular library for doing numerical computation in Python. It just so happens they have a fairly extensive set of fft and ifft methods.
I would check them out here and see if they can solve your problem.
One thing commonly done (for example in the source separation community) is to use the phase data of the original signal (before the transformations were applied to it). The result is much better than with null or random phase, and not so far from algorithms that aim to reconstruct the phase information from scratch.
A classic reconstruction algorithm is Griffin&Lim's, described in the paper "Signal estimation from modified short-time Fourier transform". This is an iterative algorithm, each iteration requires a full STFT / inverse STFT, which makes it quite costly.
This problem is indeed an active area of research, a search for STFT + reconstruction + magnitude will yield plenty of papers aiming at improving on Griffin&Lim in terms of signal quality and/or computational efficiency.
You can find a detailed discussion here: Thread on DSP Stack Exchange
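
A minimal sketch of the "keep the original phase" approach, using scipy.signal.stft and istft rather than matplotlib (specgram only keeps what it needs for plotting). The test signal, the window length of 256, and the edit (zeroing everything above 1000 Hz) are assumptions for the illustration.

```python
import numpy as np
from scipy.signal import istft, stft

fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440.0 * t) + 0.3 * np.sin(2 * np.pi * 1320.0 * t)

# Forward transform: keep magnitude (what a spectrogram shows) AND phase.
freqs, times, spec = stft(x, fs=fs, nperseg=256)
magnitude, phase = np.abs(spec), np.angle(spec)

# Unmodified round trip recovers the time series.
_, x_rec = istft(magnitude * np.exp(1j * phase), fs=fs, nperseg=256)
x_rec = x_rec[:len(x)]
roundtrip_ok = np.allclose(x, x_rec)

# Edit the magnitude, then invert using the original phase.
edited = magnitude.copy()
edited[freqs > 1000.0, :] = 0.0
_, y = istft(edited * np.exp(1j * phase), fs=fs, nperseg=256)
y = y[:len(x)]

target = np.sin(2 * np.pi * 440.0 * t)
err_before = np.sqrt(np.mean((x - target) ** 2))
err_after = np.sqrt(np.mean((y - target) ** 2))
print(roundtrip_ok, err_before, err_after)
```

If the original phase is not available, an iterative method such as Griffin and Lim's is needed instead.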

Background noise removal from audio signals using FFT Python

I am currently doing a project at university where I am distinguishing between different instruments playing notes of the same pitch using python.
I have recorded various notes on different instruments using a microphone attached to a computer.
I have also recorded background for the room.
So far I have plots for different notes on different instruments, where on the y-axis I have the amplitude in dB: 20*log10(|FFT(signal)|)
And on the x-axis I have DFT sample frequencies
Some of the harmonic peaks are small enough (or the background is large enough) for noise to be a factor-(can't post images as I'm a noob!) my problem is calculating the level of uncertainty in the height of the peaks when accounting for background noise.
My question is:
Well, how to calculate the level of uncertainty in the height of the peaks (their relative harmonic amplitudes) when accounting for background noise.
Some ideas:
What dB threshold I should use when classifying what is a harmonic peak and what is attributable to noise (should I discount a peak lower than the maximum background (~28000dB) or the mean (~15000) or perhaps twice one of these values)?
Also, to take account of the noise introduced by background, is it legitimate to subtract the value in FFT bin n for the background, from FFT bin n for my instrument recording?
Also I have looked at this post how can the noise be removed from a recorded sound,using fft in MATLAB? , there seem to be very differing opinions on there.
If it's relevant I can post segments of my code, though I'm wary of putting too much up in case of classmate plagiarism.
Links to literature that would help with the project would be very much appreciated. (Still at the stage where I'm plotting the data every which way I can think of to look for distinguishing attributes for each instrument).
Thanks in advance
You seem to be asking many questions. Let me start by answering your first one:
Well, how to calculate the level of uncertainty in the height of the
peaks (their relative harmonic amplitudes) when accounting for
background noise.
You would expect the sound to summate linearly (to a first order approximation). The natural thing to do would be to do some recordings of only the background and then measure the mean amplitude and standard deviation of the harmonics in the background.
As an example, say you are looking at 3 harmonics: 20 kHz, 11 kHz and 33 kHz. You do some recordings of only background and find mean amplitudes of 1.3 dB, 2.2 dB and 2.3 dB with standard deviations of, say, +/-0.1, +/-0.2 and +/-0.4 dB. You now have an uncertainty estimate and a mean background level to subtract for each harmonic.
There are smarter ways to do this but it's a start.
Now then, to get on to your second question
What dB threshold I should use when classifying what is a harmonic
peak and what is attributable to noise (should I discount a peak lower
than the maximum background (~28000dB) or the mean (~15000) or perhaps
twice one of these values)?
If a peak lies within the noise mean plus its uncertainty (one or two standard deviations; the choice is arbitrary and depends on convention), you cannot say it is significant; a peak that clearly exceeds that level is. E.g. if you find that the noise level at 3 kHz is 1.2 dB with an uncertainty of +/- 0.3 dB and you measure your harmonic to be 1.3 dB with an uncertainty (measured in the same way) of 0.1 dB, then it's not significant.
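
The procedure above can be sketched as follows. Everything here is synthetic and assumed for illustration: 20 background-only recordings of white noise, a 440 Hz note, a 2 standard deviation rule, and the helper names spectrum_db and is_significant.

```python
import numpy as np

def spectrum_db(x):
    """Amplitude spectrum in dB (small offset avoids log of zero)."""
    amplitude = 2.0 * np.abs(np.fft.rfft(x)) / len(x)
    return 20.0 * np.log10(amplitude + 1e-12)

rng = np.random.default_rng(0)
fs = 8000
n = fs                      # 1 second of data, so bins are 1 Hz apart
t = np.arange(n) / fs

# Several background-only recordings give a mean and spread per frequency bin.
background_db = np.array(
    [spectrum_db(0.1 * rng.standard_normal(n)) for _ in range(20)]
)
noise_mean_db = background_db.mean(axis=0)
noise_std_db = background_db.std(axis=0, ddof=1)

# Instrument recording: a 440 Hz note on top of the same kind of background.
note = np.sin(2 * np.pi * 440.0 * t) + 0.1 * rng.standard_normal(n)
note_db = spectrum_db(note)

def is_significant(freq_hz, n_std=2.0):
    """True if the peak exceeds the background mean by n_std standard deviations."""
    k = int(round(freq_hz * n / fs))
    return note_db[k] > noise_mean_db[k] + n_std * noise_std_db[k]

print(is_significant(440.0), note_db[440], noise_mean_db[440], noise_std_db[440])
```

Averaging in dB follows the numbers quoted in the answer; averaging linear power and converting afterwards is an equally defensible convention and gives slightly different values.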
Now for the third part:
Also, to take account of the noise introduced by background, is it
legitimate to subtract the value in FFT bin n for the background, from
FFT bin n for my instrument recording?
Yes (generally speaking). If you really want to convince yourself of this you can either A) do some simulations with summating waves and take an FFT of them, B) do an experiment and the same as in A, or C) go through the mathematics of Fourier transforms.
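
Option A can be tried in a few lines. The sketch below uses an assumed 440 Hz tone and white-noise background. It shows that the complex FFT values add exactly, while subtracting magnitudes (which is what you do when you subtract plotted bin heights) is only approximate because the phases are ignored.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs) / fs
instrument = np.sin(2 * np.pi * 440.0 * t)
background = 0.1 * rng.standard_normal(t.size)
recording = instrument + background          # sounds add linearly

# Complex spectra: FFT(a + b) equals FFT(a) + FFT(b).
complex_ok = np.allclose(
    np.fft.rfft(recording),
    np.fft.rfft(instrument) + np.fft.rfft(background),
)

# Magnitudes: |FFT(recording)| - |FFT(background)| only approximates |FFT(instrument)|.
mag_diff = np.abs(np.fft.rfft(recording)) - np.abs(np.fft.rfft(background))
mag_error = np.abs(mag_diff - np.abs(np.fft.rfft(instrument)))

print(complex_ok, mag_error.max())
```

In practice the background in the instrument recording is a different realisation from the one recorded separately, so only its statistics (mean level and spread per bin) can be subtracted, not its exact values.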
With regard to literature, I think that would depend on what you're doing specifically, if you're a physics student "Mathematical Methods in the Physical Sciences" by Mary Boas treats Fourier transforms well, if you're a computer scientist/engineer you probably want something different.
Let me know if you need more help.
As a musician [bassoon], a neural network researcher, and someone who uses the FFT to compress bird song, I have a few suggestions:
Musical instruments are defined by the bore: straight, conical, or a combination. This results in an emphasis of different harmonics that will help distinguish them.
Double reed, single reed, flute, and brass instruments have different vibration patterns.
The FFT can resolve sound into the fundamental and harmonics.
Train a neural network to recognize harmonic patterns [normalize if the fundamentals differ?]: use a separate input for each frequency bin AND a separate output for each instrument [or family? It may be difficult to distinguish between saxes, for instance, while oboe, English horn, bassoon, and contra-bassoon may be distinguished]. I like neural nets with at least 3 layers and had excellent results doing OCR with 4 layers [2 internal].

Python library for GPS trajectory pre-processing?

I'd like to know if there is any implemented python library for GPS trajectory pre-processing such as compression, smoothing, filtering, etc.
Expanding on my comment, a Kalman filter is the usual choice for estimating position and velocity from noisy sensor readings.
Here's what Wikipedia has to say on the topic (emphasis mine:)
The Kalman filter is an algorithm, commonly used since the 1960s for
improving vehicle navigation (among other applications, although
aerospace is typical), that yields an optimized estimate of the
system's state (e.g. position and velocity). The algorithm works
recursively in real time on streams of noisy input observation data
(typically, sensor measurements) and filters out errors using a
least-squares curve-fit optimized with a mathematical prediction of
the future state generated through a modeling of the system's physical
characteristics.
The Kalman filter is the basic version; there's also the extended Kalman filter and unscented Kalman filter (though my control systems lecturer never got around to telling us what those were actually used for.)
@stark has provided a link to an implementation of the Kalman filter in Python (not sure of the quality). You may be able to find others, or roll your own with scipy.
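
If you do roll your own, a basic version is short. Here is a minimal sketch of a linear Kalman filter with a constant-velocity model along a single axis, written with plain numpy. The noise levels, the initial covariance, and the synthetic track are assumptions for the illustration; real GPS data would need at least two coordinates and tuned noise parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps, dt = 200, 1.0
true_pos = 1.0 * np.arange(n_steps)                     # constant speed: 1 unit per step
measured = true_pos + 5.0 * rng.standard_normal(n_steps)  # noisy position fixes

F = np.array([[1.0, dt], [0.0, 1.0]])   # state transition for [position, velocity]
H = np.array([[1.0, 0.0]])              # only position is measured
Q = 1e-3 * np.array([[dt**4 / 4, dt**3 / 2], [dt**3 / 2, dt**2]])  # process noise
R = np.array([[25.0]])                  # measurement noise variance (std 5)

x = np.array([measured[0], 0.0])        # initial state from the first fix
P = np.diag([25.0, 10.0])               # initial uncertainty
filtered = [x[0]]

for z in measured[1:]:
    # Predict the next state from the motion model.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update the prediction with the new measurement.
    innovation = z - H @ x
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ innovation
    P = (np.eye(2) - K @ H) @ P
    filtered.append(x[0])

filtered = np.array(filtered)
rmse_raw = np.sqrt(np.mean((measured - true_pos) ** 2))
rmse_filtered = np.sqrt(np.mean((filtered - true_pos) ** 2))
print(rmse_raw, rmse_filtered)
```

Q controls how quickly the filter follows manoeuvres: larger values track turns and accelerations better but smooth less.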
Not GPS-specific, but numpy has general statistics and scientific algorithms. For example, if you want to make a best-fit line to a series of points, you would run a linear regression on the data.
