I know it is possible to create a .wav file from frequency domain data (magnitude + phase), but I would like to know how close that would be to the real (original) sound. Does it depend on the frequency step, for example, or on something else?
Second question:
I need to write a code that takes a frequency domain data (magnitude + phase) to build a wav file.
To get started, I wrote the following code, which creates a fake signal and runs it through an FFT (at this point I have the kind of input (magnitude + phase) that I would expect for my target code). But it doesn't seem to work correctly; could you please help?
import numpy as np
from scipy import pi
import matplotlib.pyplot as plt
#%matplotlib inline
from scipy.fftpack import fft

plot_min = 0    # plot range in samples (renamed from min/max to avoid
plot_max = 400  # shadowing the Python builtins)

def calculateFFT(timeStep, micDataX, micDataY):
    n = micDataX.size
    FFT = np.fft.fft(micDataY)
    fft_amplitude = 2 * abs(FFT) / n
    fft_phase = np.angle(FFT)
    fft_freq = np.fft.fftfreq(n, d=timeStep)  # not used; frequencies are created manually elsewhere (see pi_fFreqDomainCreateConstantBW); kept here to compare sizes
    upper_bound = int(n / 2)
    return fft_freq[1:upper_bound], fft_amplitude[1:upper_bound], fft_phase[1:upper_bound]

def calculateI_FFT(n, amplitude_spect, phase_spect):
    data = list()
    for mag, phase in zip(amplitude_spect, phase_spect):
        data.append((mag * n / 2) * (np.cos(phase) + 1j * np.sin(phase)))
    i_data = np.fft.irfft(data)
    return i_data
# sampling rate and time vector
start_time = 0        # sec
end_time = 2
sampling_rate = 1000  # Hz
N = (end_time - start_time) * sampling_rate

# Freq domain peaks
peak1_hz = 60   # freq of first peak
peak1_mag = 25
peak2_hz = 270  # freq of second peak
peak2_mag = 2

# Vibration data generation
time = np.linspace(start_time, end_time, N)
vib_data = peak1_mag * np.sin(2 * pi * peak1_hz * time) + peak2_mag * np.sin(2 * pi * peak2_hz * time)

# Data plotting
plt.plot(time[plot_min:plot_max], vib_data[plot_min:plot_max])

# fft
time_step = 1 / sampling_rate
fft_freq, fft_data, fft_phase = calculateFFT(time_step, time, vib_data)

# ifft
i_data = calculateI_FFT(N, fft_data, fft_phase)

# plotting
plt.plot(time[plot_min:plot_max], i_data[plot_min:plot_max])
plt.xlabel("Time (s)")
plt.ylabel("Vibration (g)")
plt.title("Time domain")
plt.show()
The output signal screenshot is attached (blue for the original signal, orange for the reconstructed one).
Thank you!
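For reference, here is a minimal round-trip sketch showing that the reconstruction can be exact when the full one-sided spectrum is kept, including the DC term at index 0, which the code above drops; the signal parameters mirror the example above:

import numpy as np

fs = 1000                      # sampling rate (Hz), mirroring the example above
t = np.arange(0, 2, 1 / fs)    # endpoint excluded, so the sines stay periodic
x = 25 * np.sin(2 * np.pi * 60 * t) + 2 * np.sin(2 * np.pi * 270 * t)

# One-sided spectrum; keep the DC term (index 0) and rfft's native scaling
X = np.fft.rfft(x)
mag, phase = np.abs(X), np.angle(X)

# Rebuild the complex spectrum from magnitude + phase, then invert
X_rec = mag * np.exp(1j * phase)
x_rec = np.fft.irfft(X_rec, n=len(x))  # pass n so odd lengths round-trip too

print(np.allclose(x, x_rec))  # True: exact to floating-point precision

This also bears on the first question: as long as the full magnitude and phase are kept at the original frequency resolution, the reconstruction is exact to floating-point precision; accuracy is lost only when bins are dropped, rescaled, or the frequency step is coarsened.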
I'm trying to plot Amplitude (dBFS) vs. Time (s) for an audio (.wav) file using matplotlib. I managed to do that with the following code:
def convert_to_decibel(sample):
    ref = 32768  # Using a signed 16-bit PCM wav file, so 2^15 = 32768 is the max value.
    if sample != 0:
        return 20 * np.log10(abs(sample) / ref)
    else:
        return 20 * np.log10(0.000001)
from scipy.io.wavfile import read as readWav
from scipy.fftpack import fft
import matplotlib.pyplot as gplot1
import matplotlib.pyplot as gplot2
import numpy as np
import struct
import gc
wavfile1 = '/home/user01/audio/speech.wav'
wavsamplerate1, wavdata1 = readWav(wavfile1)
wavdlen1 = wavdata1.size
wavdtype1 = wavdata1.dtype
gplot1.rcParams['figure.figsize'] = [15, 5]
pltaxis1 = gplot1.gca()
gplot1.axhline(y=0, c="black")
gplot1.xticks(np.arange(0, 10, 0.5))
gplot1.yticks(np.arange(-200, 200, 5))
gplot1.grid(linestyle = '--')
wavdata3 = np.array([convert_to_decibel(i) for i in wavdata1], dtype=np.int16)
yvals3 = wavdata3
t3 = wavdata3.size / wavsamplerate1
xvals3 = np.linspace(0, t3, wavdata3.size)
pltaxis1.set_xlim([0, t3 + 2])
pltaxis1.set_title('Amplitude (dBFS) vs Time(s)')
pltaxis1.plot(xvals3, yvals3, '-')
which gives the following output:
I had also plotted the Power Spectral Density (PSD, in dBm) using the code below:
from scipy.signal import welch as psd # Computes PSD using Welch's method.
fpsd, wPSD = psd(wavdata1, wavsamplerate1, nperseg=1024)
gplot2.rcParams['figure.figsize'] = [15, 5]
pltpsdm = gplot2.gca()
gplot2.axhline(y=0, c="black")
pltpsdm.plot(fpsd, 20*np.log10(wPSD))
gplot2.xticks(np.arange(0, 4000, 400))
gplot2.yticks(np.arange(-150, 160, 10))
pltpsdm.set_xlim([0, 4000])
pltpsdm.set_ylim([-150, 150])
gplot2.grid(linestyle = '--')
which gives the output as:
The second output above, using Welch's method, is more presentable. The dBFS plot, though informative, is not very presentable, IMO. Is this because of:
the difference in the domains (time in case of 1st output vs frequency in the 2nd output)?
the way plot function is implemented in pyplot?
Also, is there a way I can plot my dBFS output as a peak-to-peak style of plot just like in my PSD (dBm) plot rather than a dense stem plot?
Any pointers, answers, or suggestions would be much appreciated, as I'm just a beginner with matplotlib and plotting in Python in general.
TL;DR
This has nothing to do with pyplot.
The frequency domain is different from the time domain, but that's not why you didn't get what you want.
The calculation of dBFS in your code is wrong.
You should frame your data, calculate the RMS or peak in every frame, and then convert that value to dBFS, instead of applying the transformation to every sample point.
When we talk about amplitude, we are talking about a periodic signal. And when we read in a series of data from a sound file, we read in a series of sample points of a signal (which may or may not be periodic). The value of every sample point represents, say, a voltage or sound pressure value sampled at a specific time.
We assume that, within a very short time interval, maybe 10ms for example, the signal is stationary. Every such interval is called a frame.
Usually some window function is applied to each frame to reduce the sudden change at its edges; if you do nothing to a frame, you have effectively applied a rectangular window.
An example: when the sampling frequency of your sound is 44100 Hz, a 10 ms frame contains 44100 * 0.01 = 441 sample points. That's what the nperseg argument in your psd call means, but it has nothing to do with dBFS.
Given the knowledge above, now we can talk about the amplitude.
There are two common ways to get an amplitude value for each frame:
The most straightforward one is to take the maximum (peak) value in every frame.
The other is to calculate the RMS (Root Mean Square) of every frame.
After that, the peak or RMS values can be converted to dBFS values.
Let's start coding:
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile

# Determine full scale (maximum possible amplitude) from the bit depth.
# Signed 16-bit PCM peaks at 2**15 = 32768.
bit_depth = 16
full_scale = 2 ** (bit_depth - 1)

# dBFS function
to_dbFS = lambda x: 20 * np.log10(x / full_scale)

# Read in the wave file
fname = "01.wav"
fs, data = wavfile.read(fname)

# Determine the frame length (number of sample points in a frame) and the total
# number of frames from the window length (how long a frame is in seconds)
window_length = 0.01
signal_length = data.shape[0]
frame_length = int(window_length * fs)
nframes = signal_length // frame_length

# Get frames by broadcasting. No overlaps are used.
idx = frame_length * np.arange(nframes)[:, None] + np.arange(frame_length)
frames = data[idx].astype("int64")  # Convert to int64 to avoid integer overflow

# Get RMS and peaks
rms = ((frames**2).sum(axis=1) / frame_length) ** .5
peaks = np.abs(frames).max(axis=1)

# Convert them to dBFS
dbfs_rms = to_dbFS(rms)
dbfs_peak = to_dbFS(peaks)

# Plotting: get the time arrays for every sample point and for every frame
frame_time = np.arange(nframes) * window_length
data_time = np.linspace(0, signal_length / fs, signal_length)

f, ax = plt.subplots()
ax.plot(data_time, data, color="k", alpha=.3)

# Plot the dBFS values on a twin-x Axes, since the y limits of the raw data
# and the dBFS values are not comparable
tax = ax.twinx()
tax.plot(frame_time, dbfs_rms, label="RMS")
tax.plot(frame_time, dbfs_peak, label="Peak")
tax.legend()
f.tight_layout()

# Save several details
f.savefig("whole.png", dpi=300)
ax.set_xlim(1, 2)
f.savefig("1-2sec.png", dpi=300)
ax.set_xlim(1.295, 1.325)
f.savefig("1.2-1.3sec.png", dpi=300)
The whole time span looks like this (the unit of the right axis is dBFS):
And the voiced part looks like this:
You can see that the dBFS values increase as the amplitude increases at the vowel onset:
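If smoother curves are wanted, overlapping frames are a common variant of the framing above. A sketch assuming 50% overlap, reusing the names (data, fs, frame_length, signal_length, to_dbFS) defined in the code above:

import numpy as np

# 50% overlap: the hop between frame starts is half a frame
hop = frame_length // 2
nframes_ov = (signal_length - frame_length) // hop + 1
idx = hop * np.arange(nframes_ov)[:, None] + np.arange(frame_length)
frames = data[idx].astype("int64")

rms = ((frames**2).sum(axis=1) / frame_length) ** .5
dbfs_rms = to_dbFS(rms)

# Frame times now advance by hop / fs seconds instead of a full window
frame_time = np.arange(nframes_ov) * hop / fs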
I have a 5-second sound signal of a propeller. I need to find the RPM of the propeller by finding the frequency of the envelopes.
import wave
import numpy as np
import matplotlib.pyplot as plt
raw = wave.open('/content/drive/MyDrive/Demon.wav','r')
signal = raw.readframes(-1)
signal = np.frombuffer(signal , dtype="int16")
frate = raw.getframerate()
time = np.linspace(0,len(signal) / frate,num = len(signal))
plt.figure(1)
plt.title("Sound Wave")
plt.xlabel("Time")
plt.plot(time, signal)
plt.show()
Here is the link to the sound file itself: https://sndup.net/5v3j
Since it is a 5-second signal with 80,000 samples, I want to see it in detail by looking at a 1-second part of the signal.
partial_signal = signal[1:16000]
partial_time = time[1:16000]
plt.plot(partial_time,partial_signal)
plt.show()
Output of the plot is shown below.
Edit: It looks like the image will not show up; here is a link to it:
https://imgur.com/P5lnSM1
Now I need to find the frequency of the envelopes, and thus the RPM of the propeller, using only Python.
You can do that quite easily with a fast Fourier transform (FFT) applied on the signal amplitude. Here is an example:
import wave
import numpy as np
import matplotlib.pyplot as plt
from scipy.fft import rfft, rfftfreq
from scipy.ndimage import gaussian_filter
raw = wave.open('Demon.wav','r')
signal = raw.readframes(-1)
signal = np.frombuffer(signal , dtype="int16")
frate = raw.getframerate()
time = np.linspace(0,len(signal) / frate,num = len(signal))
# Compute the squared amplitude (power) of the sound signal
signalAmplitude = signal.astype(np.float64)**2
# Filter the signal to remove very short-timed amplitude modulations (<= 1 ms)
signalAmplitude = gaussian_filter(signalAmplitude, sigma=frate/1000)
# Compute the frequency amplitude of the FFT signal
tmpFreq = np.abs(rfft(signalAmplitude))
# Get the associated practical frequency for this signal
hzFreq = rfftfreq(signal.shape[0], d=1/frate)
finalFrequency = hzFreq[1+tmpFreq[1:].argmax()]
print(finalFrequency)
# Show sound frequency diagram
plt.xticks(np.arange(21))
plt.xlim([1, 20]) # Show only interesting low frequencies
plt.plot(hzFreq, tmpFreq)
plt.show()
The frequency diagram is the following:
The final detected frequency is 3.0 Hz, which is very consistent with what we can hear.
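Converting that envelope frequency to RPM is then a one-liner; the blade count below is an assumption, since it isn't given in the question:

# finalFrequency comes from the script above (3.0 Hz for this recording)
# If the envelope follows the shaft rotation directly:
shaft_rpm = finalFrequency * 60                 # 180 RPM
# If it is the blade-pass rate, divide by the (assumed) blade count:
n_blades = 3                                    # hypothetical; use the real count
shaft_rpm_bp = finalFrequency * 60 / n_blades   # 60 RPM for three blades
print(shaft_rpm, shaft_rpm_bp)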
I'm trying to do the following:
Extract the melody of me asking a question (the word "Hey?" recorded to wav) so I get a melody pattern that I can apply to any other recorded/synthesized speech (basically how F0 changes in time).
Use polynomial interpolation (Lagrange?) so I get a function that describes the melody (approximately of course).
Apply the function to another recorded voice sample. (eg. word "Hey." so it's transformed to a question "Hey?", or transform the end of a sentence to sound like a question [eg. "Is it ok." => "Is it ok?"]). Voila, that's it.
What have I done so far? Where am I?
Firstly, I dived into the math behind the FFT and signal processing (the basics). I want to do it programmatically, so I decided to use Python.
I performed the FFT on the entire "Hey?" voice sample and got data in the frequency domain (please don't mind the y-axis units, I haven't normalized them).
So far so good. Then I decided to divide my signal into chunks to get clearer frequency information, peaks and so on. This was a blind shot, me trying to grasp the idea of manipulating the frequency and analyzing the audio data. It got me nowhere, however, at least not in the direction I wanted.
Now, if I took those peaks, got an interpolated function from them, applied that function to another voice sample (a part of a voice sample that is also FFT'd, of course) and performed an inverse FFT, I wouldn't get what I wanted, right?
I would only be changing the magnitude, so it wouldn't affect the melody itself (I think).
Then I used the spectrogram and pyin functions from librosa to extract the real F0-in-time, the melody of asking the question "Hey?". And as we would expect, we can clearly see an increase in frequency value:
And a non-question statement looks like this; let's say it's more or less constant:
The same applies to a longer speech sample:
Now, I assume that I have the blocks to build my algorithm/process, but I still don't know how to assemble them, because there are some blanks in my understanding of what's going on under the hood.
I figure I need to find a way to map the F0-in-time curve from the spectrogram to the "pure" FFT data, get an interpolated function from it, and then apply that function to another voice sample.
Is there an elegant (inelegant would be OK too) way to do this? I need to be pointed in the right direction, because I can feel I'm close, but I'm basically stuck.
The code behind the charts above is taken straight from the librosa docs and other Stack Overflow questions; it's just a draft/POC, so please don't comment on style, if you could :)
fft in chunks:
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
import os
file = os.path.join("dir", "hej_n_nat.wav")
fs, signal = wavfile.read(file)
CHUNK = 1024
afft = np.abs(np.fft.fft(signal[0:CHUNK]))
freqs = np.linspace(0, fs, CHUNK)[0:int(fs / 2)]
spectrogram_chunk = freqs / np.amax(freqs * 1.0)
# Plot spectral analysis
plt.plot(freqs[0:250], afft[0:250])
plt.show()
spectrogram:
import librosa.display
import numpy as np
import matplotlib.pyplot as plt
import os
file = os.path.join("/path/to/dir", "hej_n_nat.wav")
y, sr = librosa.load(file, sr=44100)
f0, voiced_flag, voiced_probs = librosa.pyin(y, fmin=librosa.note_to_hz('C2'), fmax=librosa.note_to_hz('C7'))
times = librosa.times_like(f0)
D = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)
fig, ax = plt.subplots()
img = librosa.display.specshow(D, x_axis='time', y_axis='log', ax=ax)
ax.set(title='pYIN fundamental frequency estimation')
fig.colorbar(img, ax=ax, format="%+2.f dB")
ax.plot(times, f0, label='f0', color='cyan', linewidth=2)
ax.legend(loc='upper right')
plt.show()
Hints, questions and comments much appreciated.
The problem was that I didn't know how to modify the fundamental frequency (F0). By modifying it, I mean modifying F0 and its harmonics as well.
The spectrograms in question show, for each point in time, the power (dB) at each frequency bin.
Since I know which time bin holds which frequency from the melody (the green line below)...
...I need to compute a function that represents that green line, so I can apply it to other speech samples.
So I need to use some interpolation method that takes the sampled F0 points as parameters.
One should remember that for exact interpolation the polynomial degree must be one less than the number of points. The example doesn't have that, unfortunately, but the effect is acceptable for a prototype.
def _get_bin_nr(val, bins):
    the_bin_no = np.nan
    for b in range(0, bins.size - 1):
        if bins[b] <= val < bins[b + 1]:
            the_bin_no = b
        elif val > bins[bins.size - 1]:
            the_bin_no = bins.size - 1
    return the_bin_no

def calculate_pattern_poly_coeff(file_name):
    y_source, sr_source = librosa.load(os.path.join(ROOT_DIR, file_name), sr=sr)
    f0_source, voiced_flag, voiced_probs = librosa.pyin(y_source, fmin=librosa.note_to_hz('C2'),
                                                        fmax=librosa.note_to_hz('C7'), pad_mode='constant',
                                                        center=True, frame_length=4096, hop_length=512, sr=sr_source)
    all_freq_bins = librosa.core.fft_frequencies(sr=sr, n_fft=n_fft)
    f0_freq_bins = list(filter(lambda x: np.isfinite(x), map(lambda val: _get_bin_nr(val, all_freq_bins), f0_source)))
    return np.polynomial.polynomial.polyfit(np.arange(0, len(f0_freq_bins), 1), f0_freq_bins, 3)

def calculate_pattern_poly_func(coefficients):
    # np.polynomial.polynomial.polyfit returns coefficients lowest degree
    # first, while np.poly1d expects highest degree first, so reverse them
    return np.poly1d(coefficients[::-1])
The method calculate_pattern_poly_coeff calculates the polynomial coefficients.
Using NumPy's poly1d, I can build a function that can modify the speech. How?
I just need to move all values up or down vertically at a certain point in time.
For instance, if I move all frequencies at the 0.75-second time bin up by a factor of 3, the frequencies there increase and the melody at that point sounds higher.
Code:
def transform(sentence_audio_sample, mode=None, show_spectrograms=False, frames_from_end_to_transform=12):
    # cut out silence
    y_trimmed, idx = librosa.effects.trim(sentence_audio_sample, top_db=60, frame_length=256, hop_length=64)
    stft_original = librosa.stft(y_trimmed, hop_length=hop_length, pad_mode='constant', center=True)
    rolled = stft_original.copy()
    source_frames_count = np.shape(rolled)[1]
    sentence_ending_first_frame = source_frames_count - frames_from_end_to_transform
    sentence_len = source_frames_count
    for i in range(sentence_ending_first_frame + 1, sentence_len):
        if mode == 'question':
            by = int(_question_pattern(i) / 500)
        elif mode == 'exclamation':
            by = int(_exclamation_pattern(i) / 500)
        else:
            by = 0
        rolled = _roll_column(rolled, i, by)
    transformed_data = librosa.istft(rolled, hop_length=hop_length, center=True)
    return transformed_data

def _roll_column(two_d_array, column, shift):
    two_d_array[:, column] = np.roll(two_d_array[:, column], shift)
    return two_d_array
In this case I am simply rolling frequencies up or down at a given time bin.
This needs polishing, as it doesn't take into account the actual state of the transformed sample; it just rolls it up/down by the factor calculated from the polynomial function computed earlier.
You can check the full code of my project on GitHub; the "audio" package contains the pattern calculator and the audio transform algorithm described above.
Feel free to ask if something's unclear :)
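For completeness, a sketch of how the pieces above presumably wire together; the file name is hypothetical, and _question_pattern is assumed to be the fitted melody polynomial evaluated at a frame index, which transform() references but the post does not define:

# Hypothetical glue code; names follow the functions defined above
coeffs = calculate_pattern_poly_coeff("hej_question.wav")  # hypothetical file
_question_pattern = calculate_pattern_poly_func(coeffs)

# transform(..., mode='question') then shifts each trailing STFT frame by
# int(_question_pattern(i) / 500) bins, as shown above
print(_question_pattern(10))  # predicted frequency-bin index for frame 10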
I have been thinking about this for a long time, but I can't figure out what the problem is. I hope you can help me, thank you.
F(w), a Gaussian function:
$$F(w) = \frac{1}{\sqrt{2\pi}\, s}\, e^{-\frac{(w-\mu)^2}{2 s^2}}$$
Code:
import numpy as np
from matplotlib import pyplot as plt
from math import pi
from scipy.fft import fft
def F_S(w, mu, sig):
    return (np.exp(-np.power(w-mu, 2)/(2 * np.power(sig, 2))))/(np.power(2*pi, 0.5)*sig)
w=np.linspace(-5,5,100)
plt.plot(w, np.real(np.fft.fft(F_S(w, 0, 1))))
plt.show()
Result:
As was mentioned before, you want the absolute value, not the real part.
A minimal example, showing the re/im and abs/phase spectra:
import numpy as np
import matplotlib.pyplot as p
%matplotlib inline
n=1001 # add 1 to keep the interval a round number when using linspace
t = np.linspace(-5, 5, n ) # presumed to be time
dt=t[1]-t[0] # time resolution
print(f'sampling every {dt:.3f} sec , so at {1/dt:.1f} Sa/sec, max. freq will be {1/2/dt:.1f} Hz')
y = np.exp(-(t**2)/0.01) # signal in time
fr= np.fft.fftshift(np.fft.fftfreq(n, dt)) # shift helps with sorting the frequencies for better plotting
ft=np.fft.fftshift(np.fft.fft(y)) # fftshift only necessary for plotting in sequence
p.figure(figsize=(20,12))
p.subplot(231)
p.plot(t,y,'.-')
p.xlabel('time (secs)')
p.title('signal in time')
p.subplot(232)
p.plot(fr,np.abs(ft), '.-',lw=0.3)
p.xlabel('freq (Hz)')
p.title('spectrum, abs');
p.subplot(233)
p.plot(fr,np.real(ft), '.-',lw=0.3)
p.xlabel('freq (Hz)')
p.title('spectrum, real');
p.subplot(235)
p.plot(fr,np.angle(ft), '.-', lw=0.3)
p.xlabel('freq (Hz)')
p.title('spectrum, phase');
p.subplot(236)
p.plot(fr,np.imag(ft), '.-',lw=0.3)
p.xlabel('freq (Hz)')
p.title('spectrum, imag');
You have to change from the time scale to the frequency scale.
When you compute an FFT you get a symmetric transformation, i.e., a mirror of the positive curve onto the negative side. Usually you will only look at the positive side.
Also, you should take care with the sample rate: the FFT transforms time-domain input into the frequency domain, so the time step (sample rate) of the input matters. Pass it as np.fft.fftfreq(n, d=timestep) to get the frequency axis for your sample rate.
If you simply want to take the FFT of a normally distributed signal, here is another question about it, with some good explanations of why you are getting this behavior:
Fourier transform of a Gaussian is not a Gaussian, but thats wrong! - Python
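A minimal sketch of that mirror symmetry: for real-valued input, each positive-frequency bin is the complex conjugate of the matching negative-frequency bin, which is why usually only the positive half is plotted:

import numpy as np

x = np.random.rand(8)  # any real-valued signal
X = np.fft.fft(x)
N = len(x)

# Hermitian symmetry of real input: X[k] == conj(X[N-k]) for k = 1..N-1
print(np.allclose(X[1:], np.conj(X[1:][::-1])))      # True

# np.fft.rfft exploits this and returns only the non-negative half
print(np.allclose(np.fft.rfft(x), X[:N // 2 + 1]))   # True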
There are two mistakes in your code:
Don't take the real part; take the absolute value when plotting.
From the docs:
If A = fft(a, n), then A[0] contains the zero-frequency term (the mean of the signal), which is always purely real for real inputs. Then A[1:n/2] contains the positive-frequency terms, and A[n/2+1:] contains the negative-frequency terms, in order of decreasingly negative frequency.
You can rearrange the elements with np.fft.fftshift.
The working code:
import numpy as np
from matplotlib import pyplot as plt
from math import pi
from scipy.fftpack import fft, fftshift
def F_S(w, mu, sig):
    return (np.exp(-np.power(w-mu, 2)/(2 * np.power(sig, 2))))/(np.power(2*pi, 0.5)*sig)
w=np.linspace(-5,5,100)
plt.plot(w, fftshift(np.abs(np.fft.fft(F_S(w, 0, 1)))))
plt.show()
Also, you might want to consider scaling the x axis too.
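A sketch of that x-axis scaling, reusing F_S from above: the frequency-bin spacing follows from the sample spacing of w, so np.fft.fftfreq turns bin indices into physical frequencies:

import numpy as np
from matplotlib import pyplot as plt
from math import pi
from scipy.fftpack import fftshift

def F_S(w, mu, sig):
    return (np.exp(-np.power(w-mu, 2)/(2 * np.power(sig, 2))))/(np.power(2*pi, 0.5)*sig)

w = np.linspace(-5, 5, 100)
dw = w[1] - w[0]                                 # sample spacing of the input axis
freqs = fftshift(np.fft.fftfreq(len(w), d=dw))   # matching frequency axis
spectrum = fftshift(np.abs(np.fft.fft(F_S(w, 0, 1))))

plt.plot(freqs, spectrum)  # the x axis is now in cycles per unit of w
plt.show()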
Could you please advise me on the following:
I gather data from an Arduino ADC and store the data in a list on a Raspberry Pi 4 with Python 3.
The list is called 'dataList' and contains 1024 10-bit samples. This all works fine: I can reproduce the sampled signal on the Raspberry.
I would like to use the power spectrum of the acquired signal using numpy FFT.
I tried the code below. This should illustrate what I'm trying to do; however, it produces incoherent output. The sampled signal has a frequency of about 300 Hz. I would be very grateful for any hints in the right direction!
def show_FFT(window):
    fft = np.fft.fft(dataList, 1024, -1, None)
    for X_value in range(0, 512, 1):
        Y_value = fft[X_value]
        gfxdraw.pixel(window, X_value, int(abs(Y_value)), black)
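Beyond the syntax fixes above, a hedged sketch of the power spectrum itself; fs (the effective ADC sampling rate) is an assumption, since it isn't stated in the question, and dataList is the 1024-sample list described above:

import numpy as np
import matplotlib.pyplot as plt

fs = 2048  # assumed effective sampling rate in Hz; use the real ADC rate here

# dataList holds the 1024 ADC samples described above
spectrum = np.abs(np.fft.rfft(dataList)) ** 2      # power spectrum
freqs = np.fft.rfftfreq(len(dataList), d=1 / fs)   # frequency axis in Hz

plt.plot(freqs, spectrum)
plt.xlabel("Frequency (Hz)")
plt.ylabel("Power")
plt.show()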
As you mentioned in your question, you have a data set with X starting from 0, but keep in mind that numpy.fft.fft computes a discrete Fourier transform (DFT) of equally spaced samples, and for this approach the data set should cover a symmetric range from -x to x. You can simply try it with a Gaussian function, change the parameters as you wish, and see the results.
Since you didn't give any data set here, I'll use a general case with the code below:
import numpy as np
from scipy import interpolate
import matplotlib.pyplot as plt
# create sample data
x = np.random.rand(50)  # unequally spaced measurement points
x.sort()
y = np.exp(-x*x)        # measured signal
Based on the answer here, you can resample your data onto equally spaced points:
f = interpolate.interp1d(x, y)
num = 500
xx = np.linspace(x[0], x[-1], num)
yy = f(xx)
plt.close('all')
plt.plot(x,y,'bo')
plt.plot(xx,yy, 'g.-')
plt.show()
Then you can make your x data symmetric very simply:
x = xx
y = yy
xsample = x - ((x.max() - x.min()) / 2)
xsample = xsample - (xsample.max() + xsample.min()) / 2  # center around zero
x = xsample
Then if you take the FFT you will get the correct result:
ysample = yy
ysample_fft = np.fft.fftshift(np.abs(np.fft.fft(ysample / ysample.max()))) / np.sqrt(len(ysample))
plt.plot(xsample, ysample_fft / ysample_fft.max(), 'b--')
plt.show()