I am using SpectrumLab for logging meteors with an SDR. SpectrumLab records waterfall screenshots and a wav file of an event. I am trying to reproduce the waterfall screenshot of SpectrumLab from the wave file but the pattern looks different:
import librosa
import matplotlib.pyplot as plt
import librosa.display
audio_data = 'event20230131_101027_26.wav'
x , sr = librosa.load(audio_data)
plt.figure(figsize=(14, 5))
librosa.display.waveshow(x, sr=sr)
X = librosa.stft(x)
Xdb = librosa.amplitude_to_db(abs(X))
plt.figure(figsize=(14, 5))
librosa.display.specshow(Xdb, sr=sr, x_axis='time', y_axis='log')
plt.colorbar()
plt.show()
Screenshot of SpectrumLab:
Screenshot generated with librosa:
Wave file which was recorded together with the screenshot in SpectrumLab:
Wavefile
Related
I am unable to render correctly a matplotlib plot that contains hatch pattern to a pdf via reportlab.
Here is below a sample code that highlights the problem. The example is a simple but reportlab is needed in the project I am working on, so I cannot output straight to pdf from matplotlib.
from io import BytesIO
import matplotlib.pyplot as plt
import numpy as np
from reportlab.graphics import renderPDF
from reportlab.pdfgen import canvas
from svglib.svglib import svg2rlg
x = np.arange(100)
y = np.random.randint(50, 90, 100)
fig = plt.figure()
plt.plot(x, y, "o", color="b")
plt.axhspan(ymin=40, ymax=55, hatch="/", facecolor="none")
plt.show()
imgdata = BytesIO()
fig.savefig(imgdata, format="svg")
imgdata.seek(0)
drawing = svg2rlg(imgdata)
drawing.scale(0.5, 0.5)
c = canvas.Canvas("test.pdf")
renderPDF.draw(drawing, c, 10, 200)
c.drawString(100, 100, "No hatch pattern rendered")
c.showPage()
c.save()
The expected plot output is (as it is from matplotlib):
however, when exporting to pdf, I'm getting:
The culprit here seems to be svg2rlg that does not the following from the svg output of matplotlib:
Can't handle color: url(#h06c0184332)
Is there a way to circumvent this problem while keeping the output from matplotlib as svg?
I am having some odd vertical scaling issues with librosa.feature.melspectrogram(). It seems that when I use librosa.load() with sr=None, the Hz scale doesn't coincide with the intended spectrographic features. To investigate this further, I looked at a pure 1,000Hz tone which I got from https://www.mediacollege.com/audio/tone/download/
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
filename = '1kHz_44100Hz_16bit_05sec.wav'
y1, sr1 = librosa.load(filename,sr=None)
y2, sr2 = librosa.load(filename)
fig, ax = plt.subplots(1,2)
S = librosa.feature.melspectrogram(y1, sr=sr1, n_mels=128)
S_DB = librosa.power_to_db(S, ref=np.max)
librosa.display.specshow(S_DB, sr=sr1, y_axis='mel', ax=ax[0]);
ax[0].title.set_text(f"sr1={sr1}\nload(filename,sr=None)")
S = librosa.feature.melspectrogram(y2, sr=sr2, n_mels=128)
S_DB = librosa.power_to_db(S, ref=np.max)
librosa.display.specshow(S_DB, sr=sr2, y_axis='mel', ax=ax[1]);
ax[1].title.set_text(f"sr2={sr2}\nload(filename)")
plt.tight_layout()
I'm not sure why the 1kHz tone is not lining up in both spectrograms. I would suspect the one with sr=None to be the more accurate as it is using the actual samplerate from the file. Would anyone know why there is a difference? The feature in the left plot is obviously not at 1kHz, but more like 800Hz or so. Thanks.
I am doing my final project at university: pitch estimation from song recording using convolutional neural network (CNN). I want to retrieve pitches existed in a song recording. For CNN input, I am using a spectrogram.
I am using MIR-QBSH dataset with pitch vectors as data label. Before processing the audio to CNN (each audio has 8 sec duration in .wav files of 8 KHz, 8 bit, mono), I need to pre-process the audio into a spectrogram representation.
I have found 3 ways to generate a spectrogram, the code are listed below.
Audio example I am using in this code is available here.
Imports:
import librosa
import numpy as np
import matplotlib.pyplot as plt
import librosa.display
from numpy.fft import *
import math
import wave
import struct
from scipy.io import wavfile
Spectrogram A
x, sr = librosa.load('audio/00020_2003_person1.wav', sr=None)
window_size = 1024
hop_length = 512
n_mels = 128
time_steps = 384
window = np.hanning(window_size)
stft= librosa.core.spectrum.stft(x, n_fft = window_size, hop_length = hop_length, window=window)
out = 2 * np.abs(stft) / np.sum(window)
plt.figure(figsize=(12, 4))
ax = plt.axes()
plt.set_cmap('hot')
librosa.display.specshow(librosa.amplitude_to_db(out, ref=np.max), y_axis='log', x_axis='time',sr=sr)
plt.savefig('spectrogramA.png', bbox_inches='tight', transparent=True, pad_inches=0.0 )
Spectrogram B
x, sr = librosa.load('audio/00020_2003_person1.wav', sr=None)
X = librosa.stft(x)
Xdb = librosa.amplitude_to_db(abs(X))
# plt.figure(figsize=(14, 5))
librosa.display.specshow(Xdb, sr=sr, x_axis='time', y_axis='hz')
Spectrogram C
# Read the wav file (mono)
samplingFrequency, signalData = wavfile.read('audio/00020_2003_person1.wav')
print(samplingFrequency)
print(signalData)
# Plot the signal read from wav file
plt.subplot(111)
plt.specgram(signalData,Fs=samplingFrequency)
plt.xlabel('Time')
plt.ylabel('Frequency')
Spectrogram results are displayed below:
My question is, from the 3 spectrograms I have listed above, which spectrogram is best to use for input to CNN and why should I use that spectrogram type? I am currently having difficulty to find their differences, as well as their pros and cons.
I have 276 audio file (.wav). I want to plot signal and spectrogram together like following code:
import matplotlib.pyplot as plot
from scipy.io import wavfile
samplingfrequency, signaldata = wavfile.read('/home/narges/dataset/seri1.16khz.128kbps/Voice Recorder/N00xxxx/N000200.wav')
plot.subplot(211)
plot.title('spec of vowel')
plot.plot(signaldata)
plot.xlabel('sample')
plot.ylabel('amp')
plot.subplot(212)
plot.specgram(signaldata,Fs=samplingfrequency)
plot.xlabel('time')
plot.ylabel('freq')
plot.show()
But i want to read all folders and sub-folders. I use this code for reading all folders and sub-folders:
path = Path('/home/narges/dataset/dataset-CV-16kHz-128kbps/train/').glob('**/*.wav')
wavs = [str(wavf) for wavf in path if wavf.is_file()]
wavs.sort()
And i'm using this code for saving Speaker ID:
number_of_files=len(wavs)
spk_ID = [wavs[i].split('/')[-1].lower() for i in range(number_of_files)]
Now, how can i change following code for read all of my .wav file in directory (path) and plot signal and spectrogram (like first code), and save it with the name of spk_ID?
def graph_spectrogram(wav_file):
wavs = rate, data = wavfile.read('/home/narges/dataset/dataset-CV-16kHz-128kbps/train/speaker_00/s_00_0_00.wav')
pxx, freq, bins, im = plt.specgram(x=data, Fs=rate, noverlap=384, NFFT=512)
spk_ID = [wavs.split('/')[-1].lower()]
plt.xlabel('time')
plt.ylabel('freq')
plt.savefig('xyz.png',bbox_inches='tight', dpi=300, frameon='false')
if __name__=='__main__':
graph_spectrogram('...')
I'm using this code and it's correct, if any one want to use:
from pathlib import Path
import matplotlib.pyplot as plot
from scipy.io import wavfile
path = Path('/home/narges/dataset/dataset-CV-16kHz-128kbps/train/').glob('**/*.wav')
wavs = [str(wavf) for wavf in path if wavf.is_file()]
wavs.sort()
number_of_files=len(wavs)
spk_ID = [wavs[i].split('/')[-1].lower() for i in range(number_of_files)]
for i in range(number_of_files):
samplingfrequency, signaldata = wavfile.read(wavs[i])
pxx, freq, bins, im = plot.specgram(x=signaldata, Fs=samplingfrequency, noverlap=384, NFFT=512)
plot.title('spec of vowel')
plot.xlabel('time')
plot.ylabel('freq')
plot.savefig("spk_ID:{}.png".format(spk_ID[i]), bbox_inches='tight', dpi=300, frameon='false')
I have just read a wav file with scipy and now I want to make the plot of the file using matplotlib, on the "y scale" I want to see the aplitude and over the "x scale" I want to see the numbers of frames!
Any help how can I do this??
Thank you!
from scipy.io.wavfile import read
import numpy as np
from numpy import*
import matplotlib.pyplot as plt
a=read("C:/Users/Martinez/Desktop/impulso.wav")
print a
You can call wave lib to read an audio file.
To plot the waveform, use the "plot" function from matplotlib
import matplotlib.pyplot as plt
import numpy as np
import wave
import sys
spf = wave.open("wavfile.wav", "r")
# Extract Raw Audio from Wav File
signal = spf.readframes(-1)
signal = np.fromstring(signal, "Int16")
# If Stereo
if spf.getnchannels() == 2:
print("Just mono files")
sys.exit(0)
plt.figure(1)
plt.title("Signal Wave...")
plt.plot(signal)
plt.show()
you will have something like:
To Plot the x-axis in seconds you need get the frame rate and divide by size of your signal, you can use linspace function from numpy to create a Time Vector spaced linearly with the size of the audio file and finally you can use plot again like plt.plot(Time,signal)
import matplotlib.pyplot as plt
import numpy as np
import wave
import sys
spf = wave.open("Animal_cut.wav", "r")
# Extract Raw Audio from Wav File
signal = spf.readframes(-1)
signal = np.fromstring(signal, "Int16")
fs = spf.getframerate()
# If Stereo
if spf.getnchannels() == 2:
print("Just mono files")
sys.exit(0)
Time = np.linspace(0, len(signal) / fs, num=len(signal))
plt.figure(1)
plt.title("Signal Wave...")
plt.plot(Time, signal)
plt.show()
New plot x-axis in seconds:
Alternatively, if you want to use SciPy, you may also do the following:
from scipy.io.wavfile import read
import matplotlib.pyplot as plt
# read audio samples
input_data = read("Sample.wav")
audio = input_data[1]
# plot the first 1024 samples
plt.plot(audio[0:1024])
# label the axes
plt.ylabel("Amplitude")
plt.xlabel("Time")
# set the title
plt.title("Sample Wav")
# display the plot
plt.show()
Here's a version that will also handle stereo inputs, based on the answer by #ederwander
import matplotlib.pyplot as plt
import numpy as np
import wave
file = 'test.wav'
with wave.open(file,'r') as wav_file:
#Extract Raw Audio from Wav File
signal = wav_file.readframes(-1)
signal = np.fromstring(signal, 'Int16')
#Split the data into channels
channels = [[] for channel in range(wav_file.getnchannels())]
for index, datum in enumerate(signal):
channels[index%len(channels)].append(datum)
#Get time from indices
fs = wav_file.getframerate()
Time=np.linspace(0, len(signal)/len(channels)/fs, num=len(signal)/len(channels))
#Plot
plt.figure(1)
plt.title('Signal Wave...')
for channel in channels:
plt.plot(Time,channel)
plt.show()
Here is the code to draw a waveform and a frequency spectrum of a wavefile
import wave
import numpy as np
import matplotlib.pyplot as plt
signal_wave = wave.open('voice.wav', 'r')
sample_rate = 16000
sig = np.frombuffer(signal_wave.readframes(sample_rate), dtype=np.int16)
For the whole segment of the wave file
sig = sig[:]
For partial segment of the wave file
sig = sig[25000:32000]
Separating stereo channels
left, right = data[0::2], data[1::2]
Plot the waveform (plot_a) and the frequency spectrum (plot_b)
plt.figure(1)
plot_a = plt.subplot(211)
plot_a.plot(sig)
plot_a.set_xlabel('sample rate * time')
plot_a.set_ylabel('energy')
plot_b = plt.subplot(212)
plot_b.specgram(sig, NFFT=1024, Fs=sample_rate, noverlap=900)
plot_b.set_xlabel('Time')
plot_b.set_ylabel('Frequency')
plt.show()
Just an observation (I cannot add comment).
You will receive the following mesage:
DeprecationWarning: Numeric-style type codes are deprecated and will
resultin an error in the future.
Do not use np.fromstring with binaries. Instead of signal = np.fromstring(signal, 'Int16'), it's preferred to use signal = np.frombuffer(signal, dtype='int16').
Here is a version that handles mono/stereo and 8-bit/16-bit PCM.
import matplotlib.pyplot as plt
import numpy as np
import wave
file = 'test.wav'
wav_file = wave.open(file,'r')
#Extract Raw Audio from Wav File
signal = wav_file.readframes(-1)
if wav_file.getsampwidth() == 1:
signal = np.array(np.frombuffer(signal, dtype='UInt8')-128, dtype='Int8')
elif wav_file.getsampwidth() == 2:
signal = np.frombuffer(signal, dtype='Int16')
else:
raise RuntimeError("Unsupported sample width")
# http://schlameel.com/2017/06/09/interleaving-and-de-interleaving-data-with-python/
deinterleaved = [signal[idx::wav_file.getnchannels()] for idx in range(wav_file.getnchannels())]
#Get time from indices
fs = wav_file.getframerate()
Time=np.linspace(0, len(signal)/wav_file.getnchannels()/fs, num=len(signal)/wav_file.getnchannels())
#Plot
plt.figure(1)
plt.title('Signal Wave...')
for channel in deinterleaved:
plt.plot(Time,channel)
plt.show()
I suppose I could've put this in a comment, but building slightly on the answers from both #ederwander and #TimSC, I wanted to make something more fine (as in detailed) and aesthetically pleasing. The code below creates what I think is a very nice waveform of a stereo or mono wave file (I didn't need a title so I just commented that out, nor did I need the show method - just needed to save the image file).
Here's an example of a stereo wav rendered:
And the code, with the differences I mentioned:
import matplotlib.pyplot as plt
import numpy as np
import wave
file = '/Path/to/my/audio/file/DeadMenTellNoTales.wav'
wav_file = wave.open(file,'r')
#Extract Raw Audio from Wav File
signal = wav_file.readframes(-1)
if wav_file.getsampwidth() == 1:
signal = np.array(np.frombuffer(signal, dtype='UInt8')-128, dtype='Int8')
elif wav_file.getsampwidth() == 2:
signal = np.frombuffer(signal, dtype='Int16')
else:
raise RuntimeError("Unsupported sample width")
# http://schlameel.com/2017/06/09/interleaving-and-de-interleaving-data-with-python/
deinterleaved = [signal[idx::wav_file.getnchannels()] for idx in range(wav_file.getnchannels())]
#Get time from indices
fs = wav_file.getframerate()
Time=np.linspace(0, len(signal)/wav_file.getnchannels()/fs, num=len(signal)/wav_file.getnchannels())
plt.figure(figsize=(50,3))
#Plot
plt.figure(1)
#don't care for title
#plt.title('Signal Wave...')
for channel in deinterleaved:
plt.plot(Time,channel, linewidth=.125)
#don't need to show, just save
#plt.show()
plt.savefig('/testing_folder/deadmentellnotales2d.png', dpi=72)
I came up with a solution that's more flexible and more performant:
Downsampling is used to achieve two samples per second. This is achieved by calculating the average of absolute values for each window. The result looks like the waveforms from streaming sites like SoundCloud.
Multi-channel is supported (thanks #Alter)
Numpy is used for each operation, which is much more performant than looping through the array.
The file is processed in batches to support very large files.
import matplotlib.pyplot as plt
import numpy as np
import wave
import math
file = 'audiofile.wav'
with wave.open(file,'r') as wav_file:
num_channels = wav_file.getnchannels()
frame_rate = wav_file.getframerate()
downsample = math.ceil(frame_rate * num_channels / 2) # Get two samples per second!
process_chunk_size = 600000 - (600000 % frame_rate)
signal = None
waveform = np.array([])
while signal is None or signal.size > 0:
signal = np.frombuffer(wav_file.readframes(process_chunk_size), dtype='int16')
# Take mean of absolute values per 0.5 seconds
sub_waveform = np.nanmean(
np.pad(np.absolute(signal), (0, ((downsample - (signal.size % downsample)) % downsample)), mode='constant', constant_values=np.NaN).reshape(-1, downsample),
axis=1
)
waveform = np.concatenate((waveform, sub_waveform))
#Plot
plt.figure(1)
plt.title('Waveform')
plt.plot(waveform)
plt.show()