I'm a python beginner and as a learning project I'm doing a SSTV encoder using Wraase SC2-120 methods.
SSTV for those who don't know is a technique sending images through radio as sound and to be decoded in the receiving end back to an image. Wraase SC2-120 is one of many types of encoding but it's one of the more simpler ones that support color.
I've been able to create a system that takes an image and converts it to an array. Then take that array and create the needed values for the luminance and chrominance needed for the encoder.
I'm using then this block to create a value between 1500hz - 2300hz for the method.
def ChrominanceAsHertz(value=0.0):
value = 800 * value
value -= value % 128 # Test. Results were promising but too much noise
value += 1500
return int(value)
You can ignore the modulus operation. It is just my way of playing around with the data for "fun" and experiments.
I then clean the audio to avoid having too many of the same values in the same array and add their duration together to achieve a cleaner sound
cleanTone = []
cleanDuration = []
for i in range(len(hertzData)-1):
# If the next tone is not the same
# Add it to the cleantone array
# with it's initial duration
if hertzData[i] != hertzData[i+1]:
cleanTone.append(hertzData[i])
cleanDuration.append(durationData[i])
# else add the duration of the current hertz to the clean duration array
else:
# the current duration is the last inserted duration
currentDur = cleanDuration[len(cleanDuration)-1]
# Add the new duration to the current duration
currentDur += durationData[i]
cleanDuration[len(cleanDuration)-1] = currentDur
My array handling can use some work but it's not why I'm here for now.
The result is a array where no consecutive values are the same and the duration of that tone is still correct.
I then create a sinewave array using this block
audio = []
for i in range(len(cleanTone)):
sineAudio = AudioGen.SineWave(cleanTone[i], cleanDuration[i])
for sine in sineAudio:
audio.append(sine)
The sinewave function is
def SineWave( freq=440, durationMS = 500, sample_rate = 44100.0 ):
num_samples = durationMS * (sample_rate / 1000)
audio = []
for i in range(int(num_samples)):
audio.insert(i, np.sin(2 * np.pi * freq * (i / sample_rate)))
return audio
It works as intended. It creates a sinewave of the frequency I want and for the duration I want.
The problem is that when I create the .wav file then with wave the sinewaves created are not smoothly transitioning.
Screenshot of a closeup of what I mean.
Sinusoidal wave artifacts
The audio file has these immense screeches and cracks because of these artifacts that the above method produces, seeing as how it takes a single frequency and duration with no regard of where the last tone ended and starts a new.
What I've tried to do to remedy these is to refactor that SineWave method to take in a whole array and create the sinewaves consecutively right after one another in hopes of achieving a clean sound but it still did the same thing.
I also tried "smoothing" the generated audio array then with a simple filtering operation from this post.
0.7 * audio[1:-1] + 0.15 * ( audio[2:] + audio[:-2] )
but the results again were not satisfying and the artifacts were still present.
I've also started to look into Fourier Transforms, mainly FFT (fast fourier transform) but I'm not that familiar with them yet to know exactly what it is that I'm trying to do and code.
For SSTV to work the changes in frequency have to sometimes be very fast. 0.3ms fast to be exact, so I'm kinda lost on how to achieve this without loosing too much data in the process.
TL;DR
My sinewave function is producing artifacts inbetween tone changes that cause scratches and unwanted pops. How to not do that?
What you need to transfer from one wave-snippet to the next is the phase. You have to start the next wave with the phase you ended the previous phase.
def SineWave( freq=440, durationMS = 500, phase = 0, sample_rate = 44100.0):
num_samples = int(durationMS * (sample_rate / 1000))
audio = []
for i in range(num_samples):
audio.insert(i, np.sin(2 * np.pi * freq * (i / sample_rate) + phase))
phase = (phase + 2 * np.pi * freq * (num_samples / sample_rate)) % (2 * np.pi)
return audio, phase
In your main loop pass through the phase from one wave fragment to the next:
audio = []
phase = 0
for i in range(len(cleanTone)):
sineAudio, phase = AudioGen.SineWave(cleanTone[i], cleanDuration[i], phase)
for sine in sineAudio:
audio.append(sine)
Related
I am trying to generate a "beep" sound with a constant tone of 2350 Hz. I am using the code (which I got here) below to generate a WAV file with this tone that has a duration of 0.5 seconds.
import math
import wave
import struct
# Audio will contain a long list of samples (i.e. floating point numbers describing the
# waveform). If you were working with a very long sound you'd want to stream this to
# disk instead of buffering it all in memory list this. But most sounds will fit in
# memory.
audio = []
sample_rate = 44100.0
def append_silence(duration_milliseconds=500):
"""
Adding silence is easy - we add zeros to the end of our array
"""
num_samples = duration_milliseconds * (sample_rate / 1000.0)
for x in range(int(num_samples)):
audio.append(0.0)
return
def append_sinewave(
freq=440.0,
duration_milliseconds=500,
volume=1.0):
"""
The sine wave generated here is the standard beep. If you want something
more aggresive you could try a square or saw tooth waveform. Though there
are some rather complicated issues with making high quality square and
sawtooth waves... which we won't address here :)
"""
global audio # using global variables isn't cool.
num_samples = duration_milliseconds * (sample_rate / 1000.0)
for x in range(int(num_samples)):
audio.append(volume * math.sin(2 * math.pi * freq * ( x / sample_rate )))
return
def save_wav(file_name):
# Open up a wav file
wav_file=wave.open(file_name,"w")
# wav params
nchannels = 1
sampwidth = 2
# 44100 is the industry standard sample rate - CD quality. If you need to
# save on file size you can adjust it downwards. The stanard for low quality
# is 8000 or 8kHz.
nframes = len(audio)
comptype = "NONE"
compname = "not compressed"
wav_file.setparams((nchannels, sampwidth, sample_rate, nframes, comptype, compname))
# WAV files here are using short, 16 bit, signed integers for the
# sample size. So we multiply the floating point data we have by 32767, the
# maximum value for a short integer. NOTE: It is theortically possible to
# use the floating point -1.0 to 1.0 data directly in a WAV file but not
# obvious how to do that using the wave module in python.
for sample in audio:
wav_file.writeframes(struct.pack('h', int( sample * 32767.0 )))
wav_file.close()
return
append_sinewave(volume=1, freq=2350)
save_wav("output.wav")
When run the code below (using Librosa) to generate a spectrogram of the WAV file I see this:
Spectrogram:
Code:
beepData,beep_sample_rate = librosa.load(beepSoundPath, sr=44100)
D = librosa.stft(beepData)
S_db = librosa.amplitude_to_db(np.abs(D), ref=np.max)
librosa.display.specshow(S_db)
The problem is the extra frequencies at the beginning and end of the spectrogram. How can I get rid of these unwanted frequencies?
These are artifacts of the STFT / FFT process, because there are discontinuities at the start/end of a window. You can try to use librosa.stft(..., center=False), which should eliminate the one in the start. Then you may need to trim off / ignore the last output segments as well. At least half of n_fft parameter.
I am trying to detect sudden loud noises in audio recordings. One way I have found to do this is by creating a spectrogram of the audio and adding the values of each column. By graphing the sum of the values in each column, one can see a spike every time there is a sudden loud noise. The problem is that, in my use case, I need to play a beep tone (with a frequency of 2350 Hz), while the audio is being recorded. The spectrogram of the beep looks like this:
As you can see, at the beginning and the end of this beep (which is a simple tone with a frequency of 2350 Hz), there are other frequencies present, which I have been unsuccessful in removing. These unwanted frequencies cause a spike when summing up the columns of the spectrogram, at the beginning and at the end of the beep. I want to avoid this because I don't want my beep to be detected as a sudden loud noise. See the spectrogram below for reference:
Here is what the graph of the sum of each column in the spectrogram:
Obviously, I want to avoid having false positives in my algorithm. So I need some way of getting rid of the spikes caused by the beginning and end of the beep. One idea that I have had so far is to add random noise with a low decibel value above and/or below the 2350 Hz line in the beep spectrogram above. This would ideally, create a tone that sounds very similar to the original, but instead of creating a spike when I add up all the values in the column, it would create more of a plateau. Is this idea a feasible solution to my problem? If so, how would I go about creating a beep sound that has random noise like I described above using python? Is there another, easier solution to my problem that I am overlooking?
Currently, I am using the following code to generate my beep sound:
import math
import wave
import struct
audio = []
sample_rate = 44100.0
def append_sinewave(
freq=440.0,
duration_milliseconds=500,
volume=1.0):
"""
The sine wave generated here is the standard beep. If you want something
more aggresive you could try a square or saw tooth waveform. Though there
are some rather complicated issues with making high quality square and
sawtooth waves... which we won't address here :)
"""
global audio # using global variables isn't cool.
num_samples = duration_milliseconds * (sample_rate / 1000.0)
for x in range(int(num_samples)):
audio.append(volume * math.sin(2 * math.pi * freq * ( x / sample_rate )))
return
def save_wav(file_name):
# Open up a wav file
wav_file=wave.open(file_name,"w")
# wav params
nchannels = 1
sampwidth = 2
# 44100 is the industry standard sample rate - CD quality. If you need to
# save on file size you can adjust it downwards. The stanard for low quality
# is 8000 or 8kHz.
nframes = len(audio)
comptype = "NONE"
compname = "not compressed"
wav_file.setparams((nchannels, sampwidth, sample_rate, nframes, comptype, compname))
# WAV files here are using short, 16 bit, signed integers for the
# sample size. So we multiply the floating point data we have by 32767, the
# maximum value for a short integer. NOTE: It is theortically possible to
# use the floating point -1.0 to 1.0 data directly in a WAV file but not
# obvious how to do that using the wave module in python.
for sample in audio:
wav_file.writeframes(struct.pack('h', int( sample * 32767.0 )))
wav_file.close()
return
append_sinewave(volume=1, freq=2350)
save_wav("output.wav")
Not really an answer - more of a question.
You're asking the speaker to go from stationary to a sine wave instantaneously - that is quite hard to do (though the frequencies aren't that high). If it does manage it, then the received signal should be the convolution of the top hat and the sine wave (sort of like what you are seeing, but without having some data and knowing what you're doing for the spectrogram it's hard to tell).
In either case you could check this by smoothing the start and end of your tone. Something like this for your tone generation:
tr = 0.05 # rise time, in seconds
tf = duration_milliseconds / 1000 # finish time of tone, in seconds
for x in range(int(num_samples)):
t = x / sample_rate # Time of sample in seconds
# Calculate a bump function
bump_function = 1
if 0 < t < tr: # go smoothly from 0 to 1 at the start of the tone
tp = 1 - t / tr
bump_function = math.e * math.exp(1/(tp**2 - 1))
elif tf - tr < t < tf: # go smoothly from 1 to 0 at the end of the tone
tp = 1 + (t - tf) / tr
bump_function = math.e * math.exp(1/(tp**2 - 1))
audio.append(volume * bump_function * math.sin(2 * math.pi * freq * t))
You might need to tune the rise time a bit. With this form of bump function you know that you have a full volume tone from tr after the start to tr before the end. Lots of other functions exist, but if this smooths the start/stop effects in your spectrogram then you at least know why they are there. And prevention is generally better than trying to remove the effect in post-processing.
I'm trying to make a wavetable synthesizer in Python for the first time (based off an example I found here https://blamsoft.com/tutorials/expanse-creating-wavetables/) but the resultant sound I'm getting doesn't sound tonal at all. My output is just a low grainy buzz. I'm pretty new to making wavetables in Python and I was wondering if anybody might be able to tell me what I'm missing in order to write an A440 sine wavetable to the file "wavetable.wav" and have it actually produce a pure sine tone? Here's what I have at the moment:
import wave
import struct
import numpy as np
frame_count = 256
frame_size = 2048
sps = 44100
freq_hz = 440
file = "wavetable.wav" #write waveform to file
wav_file = wave.open(file, 'w')
wav_file.setparams((1, 2, sps, frame_count, 'NONE', 'not compressed'))
values = bytes(0)
for i in range(frame_count):
for ii in range(frame_size):
sample = np.sin((float(ii)/frame_size) * (i+128)/256 * 2 * np.pi * freq_hz/sps) * 65535
if sample < 0:
sample = 0
sample -= 32768
sample = int(sample)
values += struct.pack('h', sample)
wav_file.writeframes(values)
wav_file.close()
print("Generated " + file)
The sine function I have inside the for loop is probably the part I understand the least because I just went by the example verbatim. I'm used to making sine functions like (y = Asin(2πfx)) but I'm not sure what the purpose is of multiplying by ((i+128)/256) and 65535 (16-bit amplitude resolution?). I'm also not sure what the purpose is of subtracting 32768 from each sample. Is anyone able to clarify what I'm missing and maybe point me in the right direction? Am I going about this the wrong way? Any help is appreciated!
If you just wanted to generate sound data ahead of time and then dump it all into a file, and you’re also comfortable using NumPy, I’d suggest using it with a library like SoundFile. Then there’s no need to delimit the data into frames.
Starting with a naïve approach (using numpy.sin, not trying to optimize things yet), one ends with something like this:
from math import tau
import numpy as np
import soundfile as sf
file_path = 'sine.flac'
sample_rate = 48_000 # hertz
duration = 1.0 # seconds
frequency = 432.0 # hertz
amplitude = 0.8 # (not in decibels!)
start_phase = 0.0 # at what phase to start
sample_count = floor(sample_rate * duration)
# cyclical frequency in sample^-1
omega = frequency * tau / sample_rate
# all phases for which we want to sample our sine
phases = np.linspace(start_phase, start_phase + omega * sample_count,
sample_count, endpoint=False)
# our sine wave samples, generated all at once
audio = amplitude * np.sin(phases)
# now write to file
fmt, sub = 'FLAC', 'PCM_24'
assert sf.check_format(fmt, sub) # to make sure we ask the correct thing beforehand
sf.write(file_path, audio, sample_rate, format=fmt, subtype=sub)
This will be a mono sound, you can write stereo using 2d arrays (see NumPy and SoundFile’s docs).
But note that to make a wavetable specifically, you need to be sure it contains just a single period (or an integer number of periods) of the wave exactly, so the playback of the wavetable will be without clicks and have a correct frequency.
You can play chunked sound in real time in Python too, using something like PyAudio. (I’ve not yet used that, so at least for a time this answer would lack code related to that.)
Finally, frankly, all above is unrelated to the generation of sound data from a wavetable: you just pick a wavetable from somewhere, that doesn’t do much for actual synthesis. Here is a simple starting algorithm for that. Assume you want to play back a chunk of sample_count samples and have a wavetable stored in wavetable, a single period which loops perfectly and is normalized. And assume your current wave phase is start_phase, frequency is frequency, sample rate is sample_rate, amplitude is amplitude. Then:
# indices for the wavetable values; this is just for `np.interp` to work
wavetable_period = float(len(wavetable))
wavetable_indices = np.linspace(0, wavetable_period,
len(wavetable), endpoint=False)
# frequency of the wavetable played at native resolution
wavetable_freq = sample_rate / wavetable_period
# start index into the wavetable
start_index = start_phase * wavetable_period / tau
# code above you run just once at initialization of this wavetable ↑
# code below is run for each audio chunk ↓
# samples of wavetable per output sample
shift = frequency / wavetable_freq
# fractional indices into the wavetable
indices = np.linspace(start_index, start_index + shift * sample_count,
sample_count, endpoint=False)
# linearly interpolated wavetavle sampled at our frequency
audio = np.interp(indices, wavetable_indices, wavetable,
period=wavetable_period)
audio *= amplitude
# at last, update `start_index` for the next chunk
start_index += shift * sample_count
Then you output the audio. Though there are better ways to play back a wavetable, linear interpolation is at least a fine start. Frequency slides are also possible with this approach: just compute indices in another way, no longer spaced uniformly.
I am analyzing the spectrogram's of .wav files. But after getting the code to finally work, I've run into a small issue. After saving the spectrograms of 700+ .wav files I realize that they all essentially look the same!!! This is not because they are the same audio file, but because I don't know how to change the scale of the plot to be smaller(so I can make out the differences).
I've already tried to fix this issue by looking at this StackOverflow post
Changing plot scale by a factor in matplotlib
I'll show the graph of two different .wav files below
This is .wav #1
This is .wav #2
Believe it or not, these are two different .wav files, but they look super similar. And a computer especially won't be able to pick up the differences in these two .wav files if the scale is this broad.
My code is below
def individualWavToSpectrogram(myAudio, fileNameToSaveTo):
print(myAudio)
#Read file and get sampling freq [ usually 44100 Hz ] and sound object
samplingFreq, mySound = wavfile.read(myAudio)
#Check if wave file is 16bit or 32 bit. 24bit is not supported
mySoundDataType = mySound.dtype
#We can convert our sound array to floating point values ranging from -1 to 1 as follows
mySound = mySound / (2.**15)
#Check sample points and sound channel for duel channel(5060, 2) or (5060, ) for mono channel
mySoundShape = mySound.shape
samplePoints = float(mySound.shape[0])
#Get duration of sound file
signalDuration = mySound.shape[0] / samplingFreq
#If two channels, then select only one channel
#mySoundOneChannel = mySound[:,0]
#if one channel then index like a 1d array, if 2 channel index into 2 dimensional array
if len(mySound.shape) > 1:
mySoundOneChannel = mySound[:,0]
else:
mySoundOneChannel = mySound
#Plotting the tone
# We can represent sound by plotting the pressure values against time axis.
#Create an array of sample point in one dimension
timeArray = numpy.arange(0, samplePoints, 1)
#
timeArray = timeArray / samplingFreq
#Scale to milliSeconds
timeArray = timeArray * 1000
plt.rcParams['agg.path.chunksize'] = 100000
#Plot the tone
plt.plot(timeArray, mySoundOneChannel, color='Black')
#plt.xlabel('Time (ms)')
#plt.ylabel('Amplitude')
print("trying to save")
plt.savefig('/Users/BillyBobJoe/Desktop/' + fileNameToSaveTo + '.jpg')
print("saved")
#plt.show()
#plt.close()
How can I modify this code to increase the sensitivity of the graphing so that the differences between two .wav files is made more distinct?
Thanks!
[UPDATE]
I have tried using
plt.xlim((0, 16000))
But this just adds whitespace to the right of the graph
like
I need a way to change the scale of each unit. so that the graph is filled out when I change the x axis from 0 - 16000
If the question is: how to limit the scale on the xaxis, say to between 0 and 1000, you can do as follows:
plt.xlim((0, 1000))
I am trying to detect the pitch of a B3 note played with a guitar. The audio can be found here.
This is the spectrogram:
As you can see, it is visible that the fundamental pitch is about 250Hz which corresponds to the B3 note.
It also contains a good amount of harmonics and that is why I chose to use HPS from here. I am using this code for detecting the pitch:
def freq_from_hps(signal, fs):
"""Estimate frequency using harmonic product spectrum
Low frequency noise piles up and overwhelms the desired peaks
"""
N = len(signal)
signal -= mean(signal) # Remove DC offset
# Compute Fourier transform of windowed signal
windowed = signal * kaiser(N, 100)
# Get spectrum
X = log(abs(rfft(windowed)))
# Downsample sum logs of spectra instead of multiplying
hps = copy(X)
for h in arange(2, 9): # TODO: choose a smarter upper limit
dec = decimate(X, h)
hps[:len(dec)] += dec
# Find the peak and interpolate to get a more accurate peak
i_peak = argmax(hps[:len(dec)])
i_interp = parabolic(hps, i_peak)[0]
# Convert to equivalent frequency
return fs * i_interp / N # Hz
My sampling rate is 40000. However, instead of getting a result close to 250Hz (B3 note), I am getting 0.66Hz. How is this possible?
I also tried with an autocorrelation method from the same repo but I also get bad results like 10000Hz.
Thanks to an answer I understand I have to apply a filter to remove the low frequencies in the signal. How do I do that? Are there multiple methods to do that, and which one is recommended?
STATUS UPDATE:
The high-pass filter proposed by the answer is working. If I apply the function in the answer to my audio signal, it correctly displays about 245Hz. However, I would like to filter the whole signal, not only a part of it. A note could lie in the middle of the signal or a signal contain more than one note (I know a solution is onset detection, but I am curious to know why this isn't working). That is why I edited the code to return filtered_audio.
The problem is that if I do that, even though the noise has been correctly removed (see screenshot). I get 0.05 as a result.
Based on the distances between the harmonics in the spectrogram, I would estimate the pitch to be about 150-200 Hz. So, why doesn't the pitch detection algorithm detect the pitch that we can see by eye in the spectrogram? I have a few guesses:
The note only lasts for a few seconds. At the beginning, there is a beautiful harmonic stack with 10 or more harmonics! These quickly fade away and are not even visible after 5 seconds. If you are trying to estimate the pitch of the entire signal, your estimate might be contaminated by the "pitch" of the sound from 5-12 seconds. Try computing the pitch only for the first 1-2 seconds.
There is too much low frequency noise. In the spectrogram, you can see a lot of power between 0 and 64 Hz. This is not part of the harmonics, so you could try removing it with a high-pass filter.
Here is some code that does the job:
import numpy as np
from scipy.io import wavfile
from scipy import signal
import matplotlib.pyplot as plt
from frequency_estimator import freq_from_hps
# downloaded from https://github.com/endolith/waveform-analyzer/
filename = 'Vocaroo_s1KZzNZLtg3c.wav'
# downloaded from http://vocaroo.com/i/s1KZzNZLtg3c
# Parameters
time_start = 0 # seconds
time_end = 1 # seconds
filter_stop_freq = 70 # Hz
filter_pass_freq = 100 # Hz
filter_order = 1001
# Load data
fs, audio = wavfile.read(filename)
audio = audio.astype(float)
# High-pass filter
nyquist_rate = fs / 2.
desired = (0, 0, 1, 1)
bands = (0, filter_stop_freq, filter_pass_freq, nyquist_rate)
filter_coefs = signal.firls(filter_order, bands, desired, nyq=nyquist_rate)
# Examine our high pass filter
w, h = signal.freqz(filter_coefs)
f = w / 2 / np.pi * fs # convert radians/sample to cycles/second
plt.plot(f, 20 * np.log10(abs(h)), 'b')
plt.ylabel('Amplitude [dB]', color='b')
plt.xlabel('Frequency [Hz]')
plt.xlim((0, 300))
# Apply high-pass filter
filtered_audio = signal.filtfilt(filter_coefs, [1], audio)
# Only analyze the audio between time_start and time_end
time_seconds = np.arange(filtered_audio.size, dtype=float) / fs
audio_to_analyze = filtered_audio[(time_seconds >= time_start) &
(time_seconds <= time_end)]
fundamental_frequency = freq_from_hps(audio_to_analyze, fs)
print 'Fundamental frequency is {} Hz'.format(fundamental_frequency)