I am using pydub to do some experiments with an audio file. After I load it I want to take some parts and analyze them further with numpy, so I extract the raw data as explained here: Pydub raw audio data.
from pydub import AudioSegment
from pydub.utils import get_array_type
import array
import numpy as np

song = AudioSegment.from_file("dodos_delight.m4a")

# Take the first 17.5 seconds
start = 0
end = 17.5 * 1000  # pydub slices in milliseconds
guitar = song[start:end]

# Now get raw data as an array
bit_depth = song.sample_width * 8
array_type = get_array_type(bit_depth)
fs = song.frame_rate
guitar_np = array.array(array_type, guitar.raw_data)
guitar_t = np.arange(0, len(guitar_np) / fs, 1 / fs)
However, len(guitar_np)/fs = 35, which doesn't make sense: it's exactly double what it should be. The only way it would come out as 17.5 is if fs were doubled, but the points are taken 1/fs apart in time.
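One thing worth checking here (an assumption on my part; the post doesn't say whether the file is stereo): pydub's raw_data interleaves all channels, so a stereo file yields two samples per frame and exactly doubles len(guitar_np) / fs. A minimal check:

print(song.channels)  # if this prints 2, the doubling is explained

# take a single channel so the sample count matches the 17.5 s slice
left = guitar.split_to_mono()[0]
left_np = array.array(array_type, left.raw_data)
print(len(left_np) / fs)  # ~17.5 for a stereo source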
If I try to save the data like this
from scipy.io.wavfile import write
rate = fs
scaled = np.int16(guitar_np / np.max(np.abs(guitar_np)) * 32767)
write('test.wav', rate, scaled)
I get a super slow version of it, and the only way to make it sound like the original is to save it with rate = fs*2.
Any thoughts?
I'm a Python beginner, and as a learning project I'm writing an SSTV encoder using the Wraase SC2-120 mode.
SSTV, for those who don't know, is a technique for sending images through radio as sound, to be decoded back into an image at the receiving end. Wraase SC2-120 is one of many encodings, but it is one of the simpler ones that supports color.
I've been able to create a system that takes an image and converts it to an array, then takes that array and creates the luminance and chrominance values needed for the encoder.
I'm then using this block to create a value between 1500 Hz and 2300 Hz for the method:
def ChrominanceAsHertz(value=0.0):
    value = 800 * value
    value -= value % 128  # Test. Results were promising but too much noise
    value += 1500
    return int(value)
You can ignore the modulus operation. It is just my way of playing around with the data for "fun" and experiments.
I then clean the audio to avoid having too many of the same values in the same array, adding their durations together to achieve a cleaner sound:
cleanTone = []
cleanDuration = []
for i in range(len(hertzData) - 1):
    # If the next tone is not the same,
    # add it to the cleanTone array
    # with its initial duration
    if hertzData[i] != hertzData[i+1]:
        cleanTone.append(hertzData[i])
        cleanDuration.append(durationData[i])
    # else add the duration of the current hertz
    # to the last inserted clean duration
    else:
        cleanDuration[-1] += durationData[i]
My array handling could use some work, but that's not why I'm here for now.
The result is an array where no consecutive values are the same and the duration of each tone is still correct.
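As an aside, this run-length collapsing can be written more compactly with itertools.groupby, which also sidesteps the index bookkeeping around the first and last runs; a minimal sketch, assuming hertzData and durationData are parallel lists as above:

from itertools import groupby

cleanTone = []
cleanDuration = []
# group consecutive equal tones and sum the durations within each run
for tone, run in groupby(zip(hertzData, durationData), key=lambda p: p[0]):
    cleanTone.append(tone)
    cleanDuration.append(sum(d for _, d in run))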
I then create a sine wave array using this block:
audio = []
for i in range(len(cleanTone)):
    sineAudio = AudioGen.SineWave(cleanTone[i], cleanDuration[i])
    for sine in sineAudio:
        audio.append(sine)
The sine wave function is:
def SineWave(freq=440, durationMS=500, sample_rate=44100.0):
    num_samples = int(durationMS * (sample_rate / 1000))
    audio = []
    for i in range(num_samples):
        audio.append(np.sin(2 * np.pi * freq * (i / sample_rate)))
    return audio
It works as intended: it creates a sine wave of the frequency I want and for the duration I want.
The problem is that when I then create the .wav file with wave, the generated sine waves do not transition smoothly.
[Screenshot: a close-up of the sinusoidal wave artifacts between tones]
The audio file has these immense screeches and cracks because of these artifacts, since the method above takes a single frequency and duration and starts a new wave with no regard for where the last tone ended.
What I tried in order to remedy this was to refactor the SineWave method to take in a whole array and create the sine waves consecutively, right after one another, in hopes of achieving a clean sound, but it still did the same thing.
I also tried "smoothing" the generated audio array then with a simple filtering operation from this post.
0.7 * audio[1:-1] + 0.15 * ( audio[2:] + audio[:-2] )
but the results again were not satisfying and the artifacts were still present.
I've also started to look into Fourier transforms, mainly the FFT (fast Fourier transform), but I'm not familiar enough with them yet to know exactly what I'm trying to do and code.
For SSTV to work, the changes in frequency sometimes have to be very fast (0.3 ms fast, to be exact), so I'm kind of lost on how to achieve this without losing too much data in the process.
TL;DR
My sine wave function produces artifacts between tone changes that cause scratches and unwanted pops. How do I avoid that?
What you need to carry over from one wave snippet to the next is the phase: you have to start the next wave at the phase where the previous one ended.
def SineWave(freq=440, durationMS=500, phase=0, sample_rate=44100.0):
    num_samples = int(durationMS * (sample_rate / 1000))
    audio = []
    for i in range(num_samples):
        audio.append(np.sin(2 * np.pi * freq * (i / sample_rate) + phase))
    # advance the phase by the length of this fragment, modulo one full cycle
    phase = (phase + 2 * np.pi * freq * (num_samples / sample_rate)) % (2 * np.pi)
    return audio, phase
In your main loop, pass the phase through from one wave fragment to the next:
audio = []
phase = 0
for i in range(len(cleanTone)):
    sineAudio, phase = AudioGen.SineWave(cleanTone[i], cleanDuration[i], phase)
    for sine in sineAudio:
        audio.append(sine)
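To check the result by ear, a minimal sketch for writing the accumulated samples out as a wav file (this assumes the 44100 Hz default sample rate used in SineWave; scipy is an extra dependency here):

import numpy as np
from scipy.io.wavfile import write

data = np.asarray(audio)
# scale the float samples into the int16 range before writing
scaled = np.int16(data / np.max(np.abs(data)) * 32767)
write('sstv_test.wav', 44100, scaled)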
I'm trying to make a wavetable synthesizer in Python for the first time (based on an example I found here: https://blamsoft.com/tutorials/expanse-creating-wavetables/), but the resulting sound doesn't sound tonal at all. My output is just a low, grainy buzz. I'm pretty new to making wavetables in Python and I was wondering if anybody might be able to tell me what I'm missing in order to write an A440 sine wavetable to the file "wavetable.wav" and have it actually produce a pure sine tone? Here's what I have at the moment:
import wave
import struct
import numpy as np
frame_count = 256
frame_size = 2048
sps = 44100
freq_hz = 440
file = "wavetable.wav" #write waveform to file
wav_file = wave.open(file, 'w')
wav_file.setparams((1, 2, sps, frame_count, 'NONE', 'not compressed'))
values = bytes(0)
for i in range(frame_count):
    for ii in range(frame_size):
        sample = np.sin((float(ii)/frame_size) * (i+128)/256 * 2 * np.pi * freq_hz/sps) * 65535
        if sample < 0:
            sample = 0
        sample -= 32768
        sample = int(sample)
        values += struct.pack('h', sample)
wav_file.writeframes(values)
wav_file.close()
print("Generated " + file)
The sine function I have inside the for loop is probably the part I understand the least, because I just followed the example verbatim. I'm used to writing sine functions like y = A·sin(2πfx), but I'm not sure what the purpose is of multiplying by ((i+128)/256) and by 65535 (16-bit amplitude resolution?). I'm also not sure why 32768 is subtracted from each sample. Can anyone clarify what I'm missing and maybe point me in the right direction? Am I going about this the wrong way? Any help is appreciated!
If you just want to generate sound data ahead of time and then dump it all into a file, and you're also comfortable using NumPy, I'd suggest using it with a library like SoundFile. Then there's no need to delimit the data into frames.
Starting with a naïve approach (using numpy.sin, not trying to optimize things yet), you end up with something like this:
from math import tau, floor
import numpy as np
import soundfile as sf

file_path = 'sine.flac'
sample_rate = 48_000  # hertz
duration = 1.0  # seconds
frequency = 432.0  # hertz
amplitude = 0.8  # (not in decibels!)
start_phase = 0.0  # at what phase to start

sample_count = floor(sample_rate * duration)

# cyclical frequency in sample^-1
omega = frequency * tau / sample_rate

# all phases for which we want to sample our sine
phases = np.linspace(start_phase, start_phase + omega * sample_count,
                     sample_count, endpoint=False)

# our sine wave samples, generated all at once
audio = amplitude * np.sin(phases)

# now write to file
fmt, sub = 'FLAC', 'PCM_24'
assert sf.check_format(fmt, sub)  # to make sure we ask the correct thing beforehand
sf.write(file_path, audio, sample_rate, format=fmt, subtype=sub)
This will be a mono sound; you can write stereo using 2D arrays (see NumPy's and SoundFile's docs).
But note that to make a wavetable specifically, you need to be sure it contains exactly a single period (or an integer number of periods) of the wave, so that playback of the wavetable is click-free and has the correct frequency.
You can play chunked sound in real time in Python too, using something like PyAudio. (I've not used it myself yet, so treat the following sketch as a starting point rather than tested code.)
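A minimal blocking-mode sketch, reusing the audio and sample_rate from the example above (the PyAudio calls are the standard documented ones, but this is an assumption-laden outline, not a tested implementation):

import pyaudio

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paFloat32, channels=1,
                rate=sample_rate, output=True)
# PyAudio's blocking write expects raw bytes; convert the samples to float32 first
stream.write(audio.astype(np.float32).tobytes())
stream.stop_stream()
stream.close()
p.terminate()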
Finally, frankly, all of the above is unrelated to generating sound data from a wavetable: just picking a wavetable from somewhere doesn't do much for actual synthesis. Here is a simple starting algorithm for that. Assume you want to play back a chunk of sample_count samples and have a wavetable stored in wavetable, a single period which loops perfectly and is normalized. Also assume your current wave phase is start_phase, the frequency is frequency, the sample rate is sample_rate, and the amplitude is amplitude. Then:
# indices for the wavetable values; this is just for `np.interp` to work
wavetable_period = float(len(wavetable))
wavetable_indices = np.linspace(0, wavetable_period,
                                len(wavetable), endpoint=False)

# frequency of the wavetable played at native resolution
wavetable_freq = sample_rate / wavetable_period

# start index into the wavetable
start_index = start_phase * wavetable_period / tau

# code above you run just once at initialization of this wavetable ↑
# code below is run for each audio chunk ↓

# samples of wavetable per output sample
shift = frequency / wavetable_freq

# fractional indices into the wavetable
indices = np.linspace(start_index, start_index + shift * sample_count,
                      sample_count, endpoint=False)

# linearly interpolated wavetable sampled at our frequency
audio = np.interp(indices, wavetable_indices, wavetable,
                  period=wavetable_period)
audio *= amplitude

# at last, update `start_index` for the next chunk
start_index += shift * sample_count
Then you output the audio. Though there are better ways to play back a wavetable, linear interpolation is at least a fine start. Frequency slides are also possible with this approach: just compute indices in another way, no longer spaced uniformly.
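To make that concrete, a small end-to-end sketch (the table size, chunk length, and constants here are my own illustrative choices, not from the original answer): build a one-period sine wavetable and render a 0.1 s chunk of A440 with the interpolation scheme above.

from math import tau
import numpy as np

# one period of a sine, 2048 samples, normalized to [-1, 1]
wavetable = np.sin(np.linspace(0, tau, 2048, endpoint=False))

sample_rate = 48_000
frequency = 440.0
amplitude = 0.8
sample_count = 4_800  # 0.1 s worth of output samples
start_phase = 0.0

# one-time initialization
wavetable_period = float(len(wavetable))
wavetable_indices = np.linspace(0, wavetable_period,
                                len(wavetable), endpoint=False)
wavetable_freq = sample_rate / wavetable_period
start_index = start_phase * wavetable_period / tau

# per-chunk rendering
shift = frequency / wavetable_freq
indices = np.linspace(start_index, start_index + shift * sample_count,
                      sample_count, endpoint=False)
audio = amplitude * np.interp(indices, wavetable_indices, wavetable,
                              period=wavetable_period)
start_index += shift * sample_count  # carry the phase into the next chunk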
I currently have the following code, which produces a sine wave of varying frequencies using the pyaudio module:
import pyaudio
import numpy as np

p = pyaudio.PyAudio()

volume = 0.5
fs = 44100
duration = 1
f = 440

samples = (np.sin(2 * np.pi * np.arange(fs * duration) * f / fs)
           ).astype(np.float32).tobytes()

stream = p.open(format=pyaudio.paFloat32,
                channels=1,
                rate=fs,
                output=True)
stream.write(samples)
However, instead of playing the sound, is there any way to make it so that the sound is written into an audio file?
Add this import at the top of your code:
from scipy.io.wavfile import write
Then add this code at the bottom. This worked for me (here s is the float sample array, i.e. the samples before the .tobytes() call):
scaled = numpy.int16(s / numpy.max(numpy.abs(s)) * 32767)
write('test.wav', 44100, scaled)
Using scipy.io.wavfile.write as suggested by @h lee produced the desired results:
import numpy
from scipy.io.wavfile import write

volume = 1
sample_rate = 44100
duration = 10
frequency = 1000

samples = (numpy.sin(2 * numpy.pi * numpy.arange(sample_rate * duration)
                     * frequency / sample_rate)).astype(numpy.float32)
write('test.wav', sample_rate, samples)
Another example can be found on the documentation: https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.wavfile.write.html
Handle your audio input as a numpy array like I did here in the second answer, but instead of just processing the frames and sending the data back to PyAudio, save each frame in a new output_array. Then, when the processing is done, you can use that output_array to write the result to a .wav or .mp3 file.
If you do that, however, the sound will still play. If you don't want to play the sound you have two options: either use blocking mode, or, if you want to stick with non-blocking mode and the callbacks, do the following (a sketch follows the list):
1. Erase output=True so that it defaults to False.
2. Add an input=True parameter.
3. In your callback, do not return ret_data; return None instead.
4. Keep count of the frames you've processed so that, when you're done, you return paComplete as the second value of the returned tuple.
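A minimal sketch of that callback pattern (the names output_array and total_frame_count are mine, and total_frame_count is a hypothetical count you would derive from your input; this assumes the float32 format used elsewhere in this thread):

import pyaudio
import numpy as np

output_array = []
frames_left = total_frame_count  # hypothetical: how many frames you expect

def callback(in_data, frame_count, time_info, status):
    global frames_left
    frames = np.frombuffer(in_data, dtype=np.float32)
    output_array.append(frames)  # save the frames instead of playing them back
    frames_left -= frame_count
    flag = pyaudio.paContinue if frames_left > 0 else pyaudio.paComplete
    return None, flag

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paFloat32, channels=1, rate=44100,
                input=True, stream_callback=callback)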
I am trying to write a Python script that can demodulate an FSK modulated audio file and return the data encoded in the audio. The data being transmitted is GPS NMEA strings which are embedded as the audio channel in video files. Basically, text is encoded with FSK modulation, and I am trying to retrieve the text using Python. The device I am using to encode the data can also decode it, so I have been able to generate the correct output, but I need to be able to do it using software.
I have done some background reading to introduce myself to signal processing and FSK, and I have looked at example scripts (e.g. this one and minimodem).
I managed to write a Python script that runs successfully, although the output is incorrect: the correct output derived from the encoding/decoding device has 8,280 raw binary (0 and 1) characters, while the Python output has 1,344,786. I think I am missing a symbol synchronizer, but I'm not sure how that works.
My question now is: how can I add symbol synchronization to the script and/or symbol timing? Are there better examples or explanations of how to do FSK demodulation in Python? I would appreciate any feedback or direction. Thank you.
Here's my script so far:
from scipy.io.wavfile import read
import numpy as np
import wave
import matplotlib.pyplot as plt
import scipy.signal as signal
from scipy.signal import blackman, butter, hilbert
from scipy.fftpack import fft, rfft, rfftfreq, irfft
import binascii

# Read in data; 'wav' allows getting parameters, 'wav1' holds the actual signal data
wavfile = 'Sample4_160224_mono.wav'
wav = wave.open(wavfile, 'r')
params = wav.getparams()
N = params[3]  # sample count
wav1 = read(wavfile)
wav2 = wav1[1][0:N]
duration = float(params[3]) / params[2]
n_samples = len(wav2)
Fs = params[2]
nyq = 0.5 * Fs  # Nyquist rate
Fbit = (params[2] * params[0] * 16) / 100
print("Fbit", Fbit)

# Windowing function
w = blackman(n_samples)
print("W is", w)

# FFT
wfft = rfft(wav2 * w)
wfft_norm = wfft / N
wfft_norm = abs(wfft_norm[range(N // 2)])

# Working with frequencies...
freqs = rfftfreq(len(wfft_norm))
index = np.argmax(np.abs(wfft))  # index of the maximum absolute value of the windowed FFT
freq = freqs[index]  # frequency at that index
freq_range = [freq - 0.01, freq + 0.01]
freq_in_Hz = abs(freq * params[2])  # convert to Hz
freq_range_Hz = [abs(freq_range[0] * params[2]), abs(freq_range[1] * params[2])]

# Differentiator
diff = np.diff(wav2)

# Envelope detector
env = np.abs(hilbert(diff))
print("ENV", len(env))

# Low-pass filter
h = signal.firwin(numtaps=10, cutoff=freq_range[1], nyq=nyq)
filt = signal.lfilter(h, 1, env)

# Signal's mean
mean = np.mean(filt)

# Do some crazy stuff to get binary **maybe wrong**
rx_data = []
sampled_signal = env[int(Fs / Fbit / 2) : params[3] + 1]
for bit in sampled_signal:
    if bit > mean:
        rx_data.append(1)
    else:
        rx_data.append(0)

# Save raw binary output
rx_data1 = ''.join(map(str, rx_data))
outfile1 = open('FSK_wav6_output_binary.txt', 'w')
outfile1.write(rx_data1)
outfile1.close()
It seems that you use multiple channels and the sound you need is embedded in one of them.
So far I have found a few problems in your script:
The Nyquist rate is not half the rate of your sound. It is the rate needed to sample the original sound wave, and it should be at least twice the highest frequency in the sound. Hence,
nyq = 0.5 * Fs
is wrong.
If you take advantage of the noiseless sound to demodulate, then the Differentiator can be omitted.
For the low-pass filter:
h = signal.firwin(numtaps=10, cutoff=freq_range[1], nyq=nyq)
the cutoff frequency is your data sample rate; please read this.
filt is the final signal from which you can extract the specific data you desire.
How to choose points in sampled_signal to recreate the original signal depends on the ratio between the original signal rate and the sampling rate. As in the first link you provided, assuming the data were written at 11025 Hz and the sampling or recording rate is 44100 Hz, the code you gave:
sampled_signal = env[int(Fs / Fbit / 2) : params[3] + 1]
should be:
sampled_signal = filt[int(Fs / Fbit * 2) : params[3] : int(Fs / Fbit * 4)]
where Fs/Fbit*2 is the beginning, params[3] is the ending, and Fs/Fbit*4 is the step length.
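The same idea in a slightly more general form (a sketch reusing the question's Fs, Fbit, filt, and mean, and assuming Fbit really is the bit rate, which the question's formula may not guarantee): sample the filtered envelope once per bit, at the center of each bit period, and threshold against the mean:

samples_per_bit = int(Fs / Fbit)
# one decision per bit, taken at the center of each bit period
centers = np.arange(samples_per_bit // 2, len(filt), samples_per_bit)
rx_data = (filt[centers] > mean).astype(int)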
The correct output derived from the encoding/decoding device has 8,280 raw binary (0 and 1) characters, the Python output has 1,344,786.
That is normal: because of the different rates you get many samples per bit. You can add some special characters acting as a start sign and an end sign to your text, and try to find them; then you might find the data with the correct length you need.
What I am trying to achieve is the following: I need the frequency values of a sound file (.wav) for analysis. I know a lot of programs will give a visual graph (spectrogram) of the values, but I need the raw data. I know this can be done with an FFT and should be fairly easy to script in Python, but I'm not sure how to do it exactly.
So let's say that a signal in a file is 0.4 s long; then I would like multiple measurements, giving an output array with, for each timepoint the program measures, the value (frequency) it found (and possibly power (dB) too). The complicated thing is that I want to analyse bird songs, and they often have harmonics, or the signal spans a range of frequencies (e.g. 1000-2000 Hz). I would like the program to output this information as well, since it is important for the analysis I would like to do with the data :)
Now there is a piece of code that looked very much like what I want, but I think it does not give me all the values I need... (thanks to Justin Peel for posting this to a different question :)) I gather that I need numpy and pyaudio, but unfortunately I am not familiar with Python, so I am hoping a Python expert can help me with this.
Source Code:
# Read in a WAV and find the freq's
import pyaudio
import wave
import struct
import numpy as np

chunk = 2048

# open up a wave
wf = wave.open('test-tones/440hz.wav', 'rb')
swidth = wf.getsampwidth()
RATE = wf.getframerate()

# use a Blackman window
window = np.blackman(chunk)

# open stream
p = pyaudio.PyAudio()
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                channels=wf.getnchannels(),
                rate=RATE,
                output=True)

# read some data
data = wf.readframes(chunk)

# play stream and find the frequency of each chunk
while len(data) == chunk * swidth:
    # write data out to the audio stream
    stream.write(data)
    # unpack the data and multiply by the Blackman window
    indata = np.array(struct.unpack("%dh" % (len(data) // swidth),
                                    data)) * window
    # take the FFT and square each value
    fftData = abs(np.fft.rfft(indata)) ** 2
    # find the maximum
    which = fftData[1:].argmax() + 1
    # use quadratic interpolation around the max
    if which != len(fftData) - 1:
        y0, y1, y2 = np.log(fftData[which-1:which+2:])
        x1 = (y2 - y0) * .5 / (2 * y1 - y2 - y0)
        # find the frequency and output it
        thefreq = (which + x1) * RATE / chunk
        print("The freq is %f Hz." % thefreq)
    else:
        thefreq = which * RATE / chunk
        print("The freq is %f Hz." % thefreq)
    # read some more data
    data = wf.readframes(chunk)

if data:
    stream.write(data)
stream.close()
p.terminate()
I'm not sure if this is what you want, if you just want the FFT:
import scikits.audiolab, scipy
x, fs, nbits = scikits.audiolab.wavread(filename)
X = scipy.fft(x)
If you want the magnitude response:
import pylab
Xdb = 20*scipy.log10(scipy.absolute(X))
f = scipy.linspace(0, fs, len(Xdb))
pylab.plot(f, Xdb)
pylab.show()
I think what you need to do is a Short-Time Fourier Transform (STFT). Basically, you take FFTs of multiple, partially overlapping windows of the signal, one for each point in time, and then find the peak for each point in time. I haven't done this myself, but I've looked into it in the past and this is definitely the way to go.
There's some Python code to do an STFT here and here.
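SciPy also ships an STFT these days; a minimal sketch (scipy.signal.stft is my suggestion, not part of the original answer, and 'birdsong.wav' is a hypothetical file name) that prints the peak frequency and its power for each time slice:

import numpy as np
from scipy.io import wavfile
from scipy import signal

fs, x = wavfile.read('birdsong.wav')  # hypothetical input file
if x.ndim > 1:
    x = x[:, 0]  # take one channel if the file is stereo

# FFTs of 2048-sample windows, overlapping by half
f, t, Zxx = signal.stft(x, fs=fs, nperseg=2048)
power = np.abs(Zxx) ** 2

# peak frequency (and its power) at each timepoint
for i, ti in enumerate(t):
    pi = power[:, i].argmax()
    print("t=%.3f s  peak=%.1f Hz  power=%.1f dB"
          % (ti, f[pi], 10 * np.log10(power[pi, i] + 1e-12)))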