Sound generated not being saved to a file, as it should

I generate a 440 Hz sine wave and play it with PyAudio as expected, but even though I use the same sample array to save a WAV file, the file does not contain the same sound, and I can't figure out why.
Here is the code:
import wave
import numpy as np
import pyaudio
p = pyaudio.PyAudio()
volume = 0.5 # range [0.0, 1.0]
fs = 44100 # sampling rate, Hz, must be integer
duration = 2.0 # in seconds, may be float
f = 440.0 # sine frequency, Hz, may be float
channels = 1
# open stream (2)
stream = p.open(format=pyaudio.paFloat32,
channels=channels,
rate=fs,
output=True)
def get_value(i):
    return np.sin(f * np.pi * float(i) / float(fs))
samples = np.array([get_value(a) for a in range(0, fs)]).astype(np.float32)
for i in range(0, int(duration)):
    stream.write(samples, fs)
wf = wave.open("test.wav", 'wb')
wf.setnchannels(channels)
wf.setsampwidth(3)
wf.setframerate(fs)
wf.setnframes(int(fs * duration))
wf.writeframes(samples)
wf.close()
# stop stream (4)
stream.stop_stream()
stream.close()
# close PyAudio (5)
p.terminate()
https://gist.github.com/badjano/c727b20429295e2695afdbc601f2334b

I think the main problem is that you are using the float32 data type, which is not supported by the wave module.
You can use int16 or int32 or you can use 24-bit integers with some manual conversion.
Since you are using wf.setsampwidth(3), I assume you want to use 24-bit data?
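If 24-bit output is what you want, keep in mind that the wave module only accepts raw bytes, so each sample has to be packed into 3 little-endian bytes by hand. A minimal sketch, assuming a mono float array with values in [-1.0, 1.0]; the helper name float_to_pcm24_bytes is hypothetical and not taken from the tutorial mentioned below:

```python
import wave

import numpy as np


def float_to_pcm24_bytes(sig):
    """Convert floats in [-1.0, 1.0] to packed 24-bit little-endian PCM bytes."""
    max_int = 2 ** 23 - 1
    scaled = np.round(np.asarray(sig, dtype=np.float64) * max_int)
    ints = np.clip(scaled, -2 ** 23, max_int).astype("<i4")
    # View each little-endian int32 as 4 bytes and keep the 3 low-order bytes.
    return ints.view(np.uint8).reshape(-1, 4)[:, :3].tobytes()


fs = 44100
t = np.arange(int(2.0 * fs)) / fs
samples = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # note the factor of 2

with wave.open("test24.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(3)  # 3 bytes per sample = 24 bit
    wf.setframerate(fs)
    wf.writeframes(float_to_pcm24_bytes(samples))
```

Dropping the most significant byte is safe because, for values in the 24-bit range, that byte only carries the sign extension.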
I've written a little tutorial about the wave module (including how to handle 24-bit data) and an overview of different modules for handling sound files.
You may also be interested in my tutorial about creating a simple signal.
Since you are already using NumPy, I recommend using a library that supports NumPy arrays out-of-the-box and does all the conversions for you.
My personal preference would be to use the soundfile module, but I'm quite biased.
For playback, I would also recommend using a library that supports NumPy. Here my suggestion is the sounddevice module, but I'm very biased here as well.
If you want to follow my suggestions, your code might become something like this (including the handling of volume and fixing a missing factor of 2 in the sine's argument):
from __future__ import division
import numpy as np
import sounddevice as sd
import soundfile as sf
volume = 0.5 # range [0.0, 1.0]
fs = 44100 # sampling rate, Hz
duration = 2.0 # in seconds
f = 440.0 # sine frequency, Hz
t = np.arange(int(duration * fs)) / fs
samples = volume * np.sin(2 * np.pi * f * t)
sf.write('myfile.wav', samples, fs, subtype='PCM_24')
sd.play(samples, fs)
sd.wait()
UPDATE:
If you want to keep using PyAudio, that's fine.
But you'll have to manually convert the floating point array (with values from -1.0 to 1.0) to integers in the appropriate range, depending on the data type you want to use.
The first link I mentioned above contains the file utility.py which has a function float2pcm() to do just that.
Here's an abbreviated version of that function:
def float2pcm(sig, dtype='int16'):
    i = np.iinfo(dtype)
    abs_max = 2 ** (i.bits - 1)
    offset = i.min + abs_max
    return (sig * abs_max + offset).clip(i.min, i.max).astype(dtype)
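For example, float2pcm() can be combined with the wave module to write the same samples that are played through PyAudio. This is a sketch assuming 16-bit mono output; the function is repeated so that the block runs on its own, and the PyAudio call is only shown as a comment:

```python
import wave

import numpy as np


def float2pcm(sig, dtype='int16'):
    """Abbreviated float-to-integer conversion, as in the answer above."""
    i = np.iinfo(dtype)
    abs_max = 2 ** (i.bits - 1)
    offset = i.min + abs_max
    return (sig * abs_max + offset).clip(i.min, i.max).astype(dtype)


volume = 0.5
fs = 44100
duration = 2.0
f = 440.0

# Note the factor of 2 in the sine's argument.
t = np.arange(int(fs * duration)) / fs
samples = (volume * np.sin(2 * np.pi * f * t)).astype(np.float32)

# With PyAudio: stream.write(samples.tobytes()) plays these float32 samples.

with wave.open("test.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)  # int16 -> 2 bytes per sample
    wf.setframerate(fs)
    # WAV sample data is little-endian, hence the explicit '<i2'.
    wf.writeframes(float2pcm(samples, 'int16').astype('<i2').tobytes())
```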

Related

Remove unwanted frequencies from tone

I am trying to generate a "beep" sound with a constant tone of 2350 Hz. I am using the code below (which I got here) to generate a WAV file with this tone that lasts 0.5 seconds.
import math
import wave
import struct
# Audio will contain a long list of samples (i.e. floating point numbers describing the
# waveform). If you were working with a very long sound you'd want to stream this to
# disk instead of buffering it all in memory like this. But most sounds will fit in
# memory.
audio = []
sample_rate = 44100.0
def append_silence(duration_milliseconds=500):
    """
    Adding silence is easy - we add zeros to the end of our array
    """
    num_samples = duration_milliseconds * (sample_rate / 1000.0)
    for x in range(int(num_samples)):
        audio.append(0.0)
    return
def append_sinewave(
        freq=440.0,
        duration_milliseconds=500,
        volume=1.0):
    """
    The sine wave generated here is the standard beep. If you want something
    more aggressive you could try a square or saw tooth waveform. Though there
    are some rather complicated issues with making high quality square and
    sawtooth waves... which we won't address here :)
    """
    global audio # using global variables isn't cool.
    num_samples = duration_milliseconds * (sample_rate / 1000.0)
    for x in range(int(num_samples)):
        audio.append(volume * math.sin(2 * math.pi * freq * ( x / sample_rate )))
    return
def save_wav(file_name):
    # Open up a wav file
    wav_file=wave.open(file_name,"w")
    # wav params
    nchannels = 1
    sampwidth = 2
    # 44100 is the industry standard sample rate - CD quality. If you need to
    # save on file size you can adjust it downwards. The standard for low quality
    # is 8000 or 8kHz.
    nframes = len(audio)
    comptype = "NONE"
    compname = "not compressed"
    wav_file.setparams((nchannels, sampwidth, sample_rate, nframes, comptype, compname))
    # WAV files here are using short, 16 bit, signed integers for the
    # sample size. So we multiply the floating point data we have by 32767, the
    # maximum value for a short integer. NOTE: It is theoretically possible to
    # use the floating point -1.0 to 1.0 data directly in a WAV file but not
    # obvious how to do that using the wave module in python.
    for sample in audio:
        # '<h' = little-endian 16-bit integer, the byte order WAV uses
        wav_file.writeframes(struct.pack('<h', int( sample * 32767.0 )))
    wav_file.close()
    return
append_sinewave(volume=1, freq=2350)
save_wav("output.wav")
When I run the code below (using Librosa) to generate a spectrogram of the WAV file, I see this:
[Spectrogram of output.wav]
Code:
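import librosa
import librosa.display
import numpy as np
beepSoundPath = "output.wav"  # the file written by save_wav above
beepData, beep_sample_rate = librosa.load(beepSoundPath, sr=44100)
D = librosa.stft(beepData)
S_db = librosa.amplitude_to_db(np.abs(D), ref=np.max)
librosa.display.specshow(S_db)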
The problem is the extra frequencies at the beginning and end of the spectrogram. How can I get rid of these unwanted frequencies?

These are artifacts of the STFT/FFT process: the analysis windows at the start and end of the signal contain a discontinuity. You can try librosa.stft(..., center=False), which should eliminate the artifact at the start. You may then need to trim off or ignore the last output frames as well, covering at least half of the n_fft parameter.
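To see why the edges stand out, one can compare how much of a frame's energy lies more than 200 Hz away from 2350 Hz for a frame that straddles the start of the beep versus a frame fully inside it. The sketch below uses plain NumPy with a Hann window, n_fft=2048 and n_fft // 2 zeros of padding on each side, roughly mimicking librosa's centered framing; these framing details (and the 200 Hz band) are assumptions, and librosa's actual default padding mode depends on its version:

```python
import numpy as np

sr = 44100
freq = 2350.0
n_fft = 2048
t = np.arange(int(0.5 * sr)) / sr
beep = np.sin(2 * np.pi * freq * t)

# Centered framing: pad n_fft // 2 zeros on both sides (constant padding).
padded = np.pad(beep, n_fft // 2)
window = np.hanning(n_fft)
bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
far = np.abs(bin_freqs - freq) > 200.0  # bins more than 200 Hz from the tone


def leakage(frame_start):
    """Fraction of a frame's spectral energy more than 200 Hz from the tone."""
    frame = padded[frame_start:frame_start + n_fft] * window
    power = np.abs(np.fft.rfft(frame)) ** 2
    return power[far].sum() / power.sum()


edge = leakage(0)  # first centered frame: half padding, half beep
inside = leakage(len(padded) // 2 - n_fft // 2)  # frame in the middle of the beep
print(f"edge frame: {edge:.2e}, interior frame: {inside:.2e}")
```

The edge frame sees the tone switch on abruptly, which spreads energy across many bins; that spread is what shows up as extra frequencies in the dB-scaled spectrogram.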

How can I extract the frequencies from a WAV file - python

I'm creating a WAV file at a 44100 Hz sample rate that contains a list of specific note frequencies separated by silences of constant duration, for example 440 Hz, silence, 351 Hz, silence, etc.
Now I want to read the WAV file back and get the exact list of frequencies.
How can I do that?
Thanks!
This is my note to WAV code:
# !/usr/bin/python
# based on : www.daniweb.com/code/snippet263775.html
import math
import wave
import struct
import txtToNote
# Audio will contain a long list of samples (i.e. floating point numbers describing the
# waveform). If you were working with a very long sound you'd want to stream this to
# disk instead of buffering it all in memory like this. But most sounds will fit in
# memory.
import wavToNote
audio = []
sample_rate = 44100.0
def append_silence(duration_milliseconds=500):
    """
    Adding silence is easy - we add zeros to the end of our array
    """
    num_samples = duration_milliseconds * (sample_rate / 1000.0)
    for x in range(int(num_samples)):
        audio.append(0.0)
    return
def append_sinewave(freq=440.0, duration_milliseconds=1000, volume=1.0):
    """
    The sine wave generated here is the standard beep. If you want something
    more aggressive you could try a square or saw tooth waveform. Though there
    are some rather complicated issues with making high quality square and
    sawtooth waves... which we won't address here :)
    """
    global audio # using global variables isn't cool.
    num_samples = duration_milliseconds * (sample_rate / 1000.0)
    print("audio:")
    for x in range(int(num_samples)):
        audio.append(volume * math.sin(2 * math.pi * freq * (x / sample_rate)))
    return
def revers_audio():
    print("hi")
def save_wav(file_name):
    # Open up a wav file
    wav_file = wave.open(file_name, "w")
    # wav params
    nchannels = 1
    sampwidth = 2
    # 44100 is the industry standard sample rate - CD quality. If you need to
    # save on file size you can adjust it downwards. The standard for low quality
    # is 8000 or 8kHz.
    nframes = len(audio)
    comptype = "NONE"
    compname = "not compressed"
    wav_file.setparams((nchannels, sampwidth, sample_rate, nframes, comptype, compname))
    # WAV files here are using short, 16 bit, signed integers for the
    # sample size. So we multiply the floating point data we have by 32767, the
    # maximum value for a short integer. NOTE: It is theoretically possible to
    # use the floating point -1.0 to 1.0 data directly in a WAV file but not
    # obvious how to do that using the wave module in python.
    for sample in audio:
        # '<h' = little-endian 16-bit integer, the byte order WAV uses
        wav_file.writeframes(struct.pack('<h', int(sample * 32767.0)))
    wav_file.close()
    return
# txt_to_note is my function that returns a frequency (float) for each letter in the string
# (the letters are note chords!)
# for simplicity ['a' = 220, 'b' = 467, 'c' = 351, 'd' = 367]
for i in txtToNote.txt_to_note("abcda"):
    append_sinewave(freq=i)
    append_silence()
save_wav("output.wav")
if __name__ == '__main__':
    pass

You'd read the signal from the file, split it into windowed chunks whose length depends on your desired temporal resolution, run an FFT on each chunk to get frequency data, and find the peaks.
You can find windowing functions and the FFT in the SciPy library.
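A minimal sketch of that approach for the 16-bit mono files written by save_wav above, using NumPy's FFT in place of SciPy's. It relies on the silence between notes to separate them and reports one frequency per note; the 100 ms chunk length, the RMS silence threshold and the zero-padding factor are arbitrary assumptions, and read_wav_mono16 and note_frequencies are hypothetical helper names:

```python
import wave

import numpy as np


def read_wav_mono16(path):
    """Read a 16-bit mono WAV file into floats in [-1, 1] plus its sample rate."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError("expected 16-bit mono audio")
        fs = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    return np.frombuffer(raw, dtype="<i2") / 32768.0, fs


def note_frequencies(sig, fs, chunk_s=0.1, silence_rms=0.01, pad_factor=10):
    """Return the peak frequency (Hz) of each note, using silence to split notes."""
    n = int(chunk_s * fs)
    window = np.hanning(n)
    nfft = n * pad_factor  # zero-padding interpolates the spectrum on a finer grid
    bin_freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    notes = []
    in_note = False
    for start in range(0, len(sig) - n + 1, n):
        chunk = sig[start:start + n]
        if np.sqrt(np.mean(chunk ** 2)) < silence_rms:
            in_note = False  # silence ends the current note
            continue
        if not in_note:
            # First loud chunk of a new note: take the spectral peak.
            spectrum = np.abs(np.fft.rfft(chunk * window, nfft))
            notes.append(float(bin_freqs[np.argmax(spectrum)]))
            in_note = True
    return notes
```

Usage: `sig, fs = read_wav_mono16("output.wav")` followed by `note_frequencies(sig, fs)`. Zero-padding only interpolates the spectrum, so the real accuracy still depends on the chunk length and on each note being a stable tone.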

How to write pyaudio output into audio file?

I currently have the following code, which produces a sine wave of varying frequencies using the pyaudio module:
import pyaudio
import numpy as np
p = pyaudio.PyAudio()
volume = 0.5
fs = 44100
duration = 1
f = 440
samples = (np.sin(2 * np.pi * np.arange(fs * duration) * f /
fs)).astype(np.float32).tobytes()
stream = p.open(format = pyaudio.paFloat32,
channels = 1,
rate = fs,
output = True)
stream.write(samples)
However, instead of playing the sound, is there any way to make it so that the sound is written into an audio file?

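Add this import at the top of your code:
from scipy.io.wavfile import write
Also, add this code at the bottom of your code. Note that samples in the question has already been converted to bytes, so the float array is recomputed here as s. This worked for me.
s = np.sin(2 * np.pi * np.arange(fs * duration) * f / fs)
scaled = np.int16(s / np.max(np.abs(s)) * 32767)
write('test.wav', 44100, scaled)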

Using scipy.io.wavfile.write as suggested by @h lee produced the desired results:
import numpy
from scipy.io.wavfile import write
volume = 1
sample_rate = 44100
duration = 10
frequency = 1000
samples = (numpy.sin(2 * numpy.pi * numpy.arange(sample_rate * duration)
* frequency / sample_rate)).astype(numpy.float32)
write('test.wav', sample_rate, samples)
Another example can be found on the documentation: https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.wavfile.write.html

Handle your audio input as a NumPy array, as I did here in the second answer, but instead of just processing the frames and sending the data back to PyAudio, save each frame in a new output_array. When the processing is done, you can use that output_array to write a .wav or .mp3 file.
If you do that, however, the sound will still play. If you don't want to play the sound, you have two options: use the blocking mode or, if you want to stick with the non-blocking mode and callbacks, do the following:
Remove output=True so that it defaults to False.
Add an input=True parameter.
In your callback, do not return ret_data; return None instead.
Keep count of the frames you've processed so that, when you're done, you return paComplete as the second value of the returned tuple.
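The collect-then-write idea can be sketched without any audio device: process the signal one buffer at a time, keep every processed buffer, and write the concatenated result once at the end. Here process_frame is a hypothetical stand-in for whatever your callback does, the buffer size of 1024 is an assumption, and the file is written with the standard wave module as 16-bit PCM:

```python
import wave

import numpy as np

fs = 44100
frames_per_buffer = 1024


def process_frame(frame):
    """Hypothetical processing step; here it just halves the volume."""
    return 0.5 * frame


signal = np.sin(2 * np.pi * 440.0 * np.arange(fs) / fs).astype(np.float32)

output_frames = []  # collects every processed buffer
for start in range(0, len(signal), frames_per_buffer):
    output_frames.append(process_frame(signal[start:start + frames_per_buffer]))

output_array = np.concatenate(output_frames)

with wave.open("processed.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(fs)
    # Scale floats in [-1, 1] to little-endian 16-bit integers.
    wf.writeframes((np.clip(output_array, -1.0, 1.0) * 32767).astype("<i2").tobytes())
```

Appending to a Python list and concatenating once at the end avoids reallocating a growing NumPy array on every buffer.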

Demodulating an FSK signal in Python

I am trying to write a Python script that can demodulate an FSK modulated audio file and return the data encoded in the audio. The data being transmitted is GPS NMEA strings which are embedded as the audio channel in video files. Basically, text is encoded with FSK modulation, and I am trying to retrieve the text using Python. The device I am using to encode the data can also decode it, so I have been able to generate the correct output, but I need to be able to do it using software.
I have done some background reading to introduce myself to signal processing and FSK, and I have looked at example scripts (e.g. this one and minimodem).
I managed to write a Python script that runs successfully, although the output is incorrect. The correct output derived from the encoding/decoding device has 8,280 raw binary (0 and 1) characters, the Python output has 1,344,786. I think I am missing a symbol synchronizer, but I'm not sure how this works.
My question now is: how can I add symbol synchronization to the script and/or symbol timing? Are there better examples or explanations of how to do FSK demodulation in Python? I would appreciate any feedback or direction. Thank you.
Here's my script so far:
from scipy.io.wavfile import read
import numpy as np
import wave
import matplotlib.pyplot as plt
import scipy.signal as signal
from scipy.signal import blackman, butter
from scipy.fftpack import fft, rfft, rfftfreq, irfft
import scipy.signal.signaltools as sigtool
import binascii
# Read in data; 'wav' allows getting parameters, 'wav1' is actual signal data
wavfile = 'Sample4_160224_mono.wav'
wavfile1 = open(wavfile, 'r')
wav = wave.open(wavfile, 'r')
wav_1 = read(wavfile1)
params = wav.getparams()
N = params[3] #Sample size
wav1 = read(wavfile1)
wav2 = wav1[1][0:N]
duration = float(params[3] / params[2])
n_samples = len(wav2)
Fs = params[2]
nyq = 0.5 * Fs #Nyquist rate
Fbit = (params[2]*params[0]*16)/100
print "Fbit", Fbit
# Windowing function
w = blackman(n_samples)
print "W is", w
# FFT
wfft = rfft(wav2 * w)
wfft_norm = wfft/N
wfft_norm = abs(wfft_norm[range(N/2)])
# Working with frequencies...
freqs = rfftfreq(len(wfft_norm))
index = np.argmax(np.abs(wfft)) #Returns the index of the maximum absolute value of the windowed FFT
freq = freqs[index] #Returns the frequency from the above index
freq_range = [freq - 0.01, freq + 0.01]
freq_in_Hz = abs(freq * params[2]) #Converts the Hz
freq_range_Hz = [abs(freq_range[0] * params[2]), abs(freq_range[1] * params[2])]
# Differentiator
diff = np.diff(wav2)
# Envelope detector
env = np.abs(sigtool.hilbert(diff))
print "ENV", len(env)
# Low-pass filter
h = signal.firwin(numtaps = 10, cutoff = freq_range[1], nyq = nyq)
filt = signal.lfilter(h, 1, env)
# Signal's mean
mean = np.mean(filt)
#Do some crazy stuff to get binary **maybe wrong**
rx_data = []
sampled_signal = env[Fs/Fbit/2:params[3]+1:]
for bit in sampled_signal:
    if bit > mean:
        rx_data.append(int(1))
    else:
        rx_data.append(int(0))
# Save raw binary output
rx_data1 = ''.join(map(str, (rx_data)))
outfile1 = open('FSK_wav6_output_binary.txt', 'w')
outfile1.write(rx_data1)
outfile1.close()

It seems that you are using multiple channels and the sound you need is embedded in one of them.
So far I have found a few problems in your script:
The Nyquist rate is not half the rate of your sound. It is the rate at which the original sound wave could be sampled, and it should be at least 2 times the sound sampling rate. Hence,
nyq = 0.5 * Fs
is wrong.
If you take advantage of the sound being noiseless when demodulating, the differentiator can be omitted.
For the low-pass filter:
h = signal.firwin(numtaps = 10, cutoff = freq_range[1], nyq = nyq)
the cutoff frequency should be your data sample rate; please read this.
filt is the final signal from which you can extract the specific data you want.
How to choose points in sampled_signal to recreate the original signal depends on the ratio between the original signal rate and the sampling rate. As in the first link you provided, assuming the data were written at 11025 Hz and the sampling (recording) rate is 44100 Hz, the code you gave:
sampled_signal = env[Fs/Fbit/2:params[3]+1:]
should be:
sampled_signal = filt[Fs/Fbit*2:params[3]:Fs/Fbit*4]
where Fs/Fbit*2 is the beginning, params[3] is the ending, Fs/Fbit*4 is the step length.
The correct output derived from the encoding/decoding device has 8,280 raw binary (0 and 1) characters, the Python output has 1,344,786.
This is expected because of the different sample rates. You can add some special characters to your text to act as start and end markers, then search for them; that way you may find the data with the correct length.
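To make the "one decision per bit period" idea concrete, here is a minimal non-coherent FSK demodulator that correlates each bit-length slice of the signal with the mark and space tones and picks the stronger one. It assumes the bit timing is already known (the first bit starts at sample 0), which is exactly what a symbol synchronizer or start marker would have to provide. The Bell 202-style tones (1200/2200 Hz), 1200 baud and 48 kHz sample rate are illustrative assumptions, not parameters of the device in the question:

```python
import numpy as np

fs = 48000
baud = 1200
mark, space = 1200.0, 2200.0  # tone for a 1 bit and for a 0 bit
spb = fs // baud  # samples per bit (40 here)


def fsk_modulate(bits):
    """Continuous-phase FSK: integrate the instantaneous frequency."""
    inst_freq = np.repeat([mark if b else space for b in bits], spb)
    phase = 2 * np.pi * np.cumsum(inst_freq) / fs
    return np.sin(phase)


def fsk_demodulate(sig, n_bits):
    """Decide each bit by comparing tone correlations over one bit period."""
    n = np.arange(spb)
    ref_mark = np.exp(-2j * np.pi * mark * n / fs)
    ref_space = np.exp(-2j * np.pi * space * n / fs)
    bits = []
    for k in range(n_bits):
        chunk = sig[k * spb:(k + 1) * spb]
        # Magnitudes make the decision independent of the tone's phase.
        bits.append(int(abs(chunk @ ref_mark) > abs(chunk @ ref_space)))
    return bits
```

For a real recording, fs, baud and the two tone frequencies would have to match the encoder, and the slicing offset would come from the synchronization step rather than being fixed at 0.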

Generating sine wave sound in Python

I need to generate a sine wave sound in Python, and I need to be able to control frequency, duration, and relative volume. By 'generate' I mean that I want it to play through the speakers immediately, not save to a file.
What is the easiest way to do this?

Version with numpy:
import time
import numpy as np
import pyaudio
p = pyaudio.PyAudio()
volume = 0.5 # range [0.0, 1.0]
fs = 44100 # sampling rate, Hz, must be integer
duration = 5.0 # in seconds, may be float
f = 440.0 # sine frequency, Hz, may be float
# generate samples, note conversion to float32 array
samples = (np.sin(2 * np.pi * np.arange(fs * duration) * f / fs)).astype(np.float32)
# per @yahweh's comment, explicitly convert to a bytes sequence
output_bytes = (volume * samples).tobytes()
# for paFloat32 sample values must be in range [-1.0, 1.0]
stream = p.open(format=pyaudio.paFloat32,
channels=1,
rate=fs,
output=True)
# play. May repeat with different volume values (if done interactively)
start_time = time.time()
stream.write(output_bytes)
print("Played sound for {:.2f} seconds".format(time.time() - start_time))
stream.stop_stream()
stream.close()
p.terminate()
Version without numpy:
import array
import math
import time
import pyaudio
p = pyaudio.PyAudio()
volume = 0.5 # range [0.0, 1.0]
fs = 44100 # sampling rate, Hz, must be integer
duration = 5.0 # in seconds, may be float
f = 440.0 # sine frequency, Hz, may be float
# generate samples, note conversion to float32 array
num_samples = int(fs * duration)
samples = [volume * math.sin(2 * math.pi * k * f / fs) for k in range(0, num_samples)]
# per @yahweh's comment, explicitly convert to a bytes sequence
output_bytes = array.array('f', samples).tobytes()
# for paFloat32 sample values must be in range [-1.0, 1.0]
stream = p.open(format=pyaudio.paFloat32,
channels=1,
rate=fs,
output=True)
# play. May repeat with different volume values (if done interactively)
start_time = time.time()
stream.write(output_bytes)
print("Played sound for {:.2f} seconds".format(time.time() - start_time))
stream.stop_stream()
stream.close()
p.terminate()

ivan-onys gave an excellent answer, but there is a small addition to it:
this script will produce a sound 4 times shorter than expected because PyAudio's write method needs float32 data as a byte string, but when you pass a NumPy array to this method, it converts the whole array as a single entity to a string. Therefore, you have to convert the data in the NumPy array to a byte sequence yourself, like this:
samples = (np.sin(2*np.pi*np.arange(fs*duration)*f/fs)).astype(np.float32).tobytes()
and you have to change this line as well:
stream.write(samples)

One of the more consistent and easy-to-install ways to deal with sound in Python is the Pygame multimedia library.
I'd recommend using it: the pygame.sndarray submodule lets you manipulate numbers in a data vector that becomes a high-level sound object, which can be played with the pygame.mixer module.
The documentation on the pygame.org site should be enough to get started with the sndarray module.

Today for Python 3.5+ the best way is to install the packages recommended by the developer.
http://people.csail.mit.edu/hubert/pyaudio/
For Debian do
sudo apt-get install python3-all-dev portaudio19-dev
before trying to install pyaudio

The script from ivan_onys produces a signal that is four times shorter than intended. If a TypeError is returned when volume is a float, try adding .tobytes() to the following line instead.
stream.write((volume*samples).tobytes())
@mm_: float32 = 32 bits and 8 bits = 1 byte, so a float32 sample takes 4 bytes. When a float32 array is passed to stream.write, its length (the sample count) is divided by 4 to get the number of frames, so only a quarter of the samples is played. Passing samples.tobytes() instead makes the length equal to the byte count, which corrects for this quartering.
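The arithmetic can be checked without an audio device. This sketch assumes that, when num_frames is not given, PyAudio's stream.write derives the frame count as len(data) // (channels * sample size), which is what both comments above describe; frames_played is a hypothetical helper that mirrors that computation:

```python
import numpy as np

fs = 44100
duration = 1.0
f = 440.0
channels = 1
bytes_per_sample = np.dtype(np.float32).itemsize  # 4 bytes for paFloat32

samples = np.sin(2 * np.pi * np.arange(int(fs * duration)) * f / fs).astype(np.float32)


def frames_played(data):
    """Frame count computed from len(data), as when num_frames is omitted."""
    return len(data) // (channels * bytes_per_sample)


print(frames_played(samples))            # 11025: len() of the array counts samples
print(frames_played(samples.tobytes()))  # 44100: len() of bytes counts bytes
```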

In the Bregman Lab toolbox there is a set of functions that does exactly what you want. This Python module is a little buggy, but you can adapt its code to write your own functions.
