So I'm trying to generate a controllable tone with Python. I'm using PyAudio, and the basic code looks like this (source):
import pyaudio
import numpy as np
p = pyaudio.PyAudio()
fs = 44100 # sampling rate, Hz, must be integer
duration = 1.0 # in seconds, may be float
f = 440.0 # sine frequency, Hz, may be float
# generate samples, note conversion to float32 array
samples = (np.sin(2*np.pi*np.arange(fs*duration)*f/fs)).astype(np.float32)
# for paFloat32 sample values must be in range [-1.0, 1.0]
stream = p.open(format=pyaudio.paFloat32,
channels=1,
rate=fs,
output=True)
# play. May repeat with different volume values (if done interactively)
stream.write(samples.tobytes())
stream.stop_stream()
stream.close()
p.terminate()
The problem is that when I play two different frequencies in sequence, the transition isn't smooth. I think it is because of the discontinuity between the end of the first tone and the beginning of the second. I'm trying to smooth the transition with this code:
fstart = 440.0     # first frequency, Hz
fstop = 523.6      # second frequency, Hz
transition = 0.5   # transition time, seconds
finest = 10        # number of intermediate steps in the transition
duration = 2.0     # duration of the first and second tones, seconds
samples = np.array([], dtype=np.float32)
samples = np.append(samples, np.sin(2*np.pi*np.arange(fs*duration)*fstart/fs)).astype(np.float32)
for x in range(finest):
    freq = fstart + (fstop - fstart)*x/finest
    step = transition/finest  # length of each intermediate tone, seconds
    samples = np.append(samples, np.sin(2*np.pi*np.arange(fs*step)*freq/fs)).astype(np.float32)
samples = np.append(samples, np.sin(2*np.pi*np.arange(fs*duration)*fstop/fs)).astype(np.float32)
But as expected, it still didn't sound right, and when finest is set high, the sound is badly distorted. Any idea how to make this smooth, like a frequency-sweep sound?
Thank you.
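One approach that gives a smooth sweep is to keep the phase continuous across frequency changes: build the instantaneous frequency for every sample and integrate it with a cumulative sum, instead of restarting the sine at each step. A minimal sketch reusing the variables above (the linear ramp is just one possible sweep shape):
import numpy as np
import pyaudio
fs = 44100          # sampling rate, Hz
fstart = 440.0      # first frequency, Hz
fstop = 523.6       # second frequency, Hz
duration = 2.0      # duration of the start and stop tones, seconds
transition = 0.5    # sweep time between them, seconds
# instantaneous frequency for every output sample
freqs = np.concatenate([
    np.full(int(fs*duration), fstart),
    np.linspace(fstart, fstop, int(fs*transition)),
    np.full(int(fs*duration), fstop),
])
# integrating the frequency gives a phase with no jumps, hence no clicks
phase = 2*np.pi*np.cumsum(freqs)/fs
samples = np.sin(phase).astype(np.float32)
p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paFloat32, channels=1, rate=fs, output=True)
stream.write(samples.tobytes())
stream.stop_stream()
stream.close()
p.terminate()
Because the phase never jumps, there is no discontinuity at the frequency changes and therefore no click.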
Related
I saw a question posted on Stack Overflow about generating a sound based on a sine wave in Python. Here is the code that produced the output I was looking for:
import pyaudio
import numpy as np
p = pyaudio.PyAudio()
volume = 0.5 # range [0.0, 1.0]
fs = 44100 # sampling rate, Hz, must be integer
duration = 1.0 # in seconds, may be float
f = 440.0 # sine frequency, Hz, may be float
# generate samples, note conversion to float32 array
samples = (np.sin(2*np.pi*np.arange(fs*duration)*f/fs)).astype(np.float32)
# for paFloat32 sample values must be in range [-1.0, 1.0]
stream = p.open(format=pyaudio.paFloat32,
channels=1,
rate=fs,
output=True)
# play. May repeat with different volume values (if done interactively)
stream.write((volume*samples).tobytes())
stream.stop_stream()
stream.close()
p.terminate()
What if, instead of using the line...
samples = (np.sin(2*np.pi*np.arange(fs*duration)*f/fs)).astype(np.float32)
I want to write whatever lines of code it takes so that samples (the NumPy array) contains the values read from a specific wave file instead. For example, if I have a .wav file that is 1 second long and has a sine wave stored in it, those samples would 'replace' the line of code above. Is there a way to do that?
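For what it's worth, one way to do that, sketched under the assumption that the file is mono 16-bit PCM (the file name is hypothetical):
import wave
import numpy as np
# read a mono 16-bit PCM wav file into float32 samples in [-1.0, 1.0]
wf = wave.open("sine.wav", "rb")   # hypothetical file name
assert wf.getnchannels() == 1 and wf.getsampwidth() == 2
fs = wf.getframerate()             # use this as the stream's rate
raw = wf.readframes(wf.getnframes())
wf.close()
samples = np.frombuffer(raw, dtype=np.int16).astype(np.float32) / 32768.0
# 'samples' can now replace the generated sine array in the code above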
I want to play an ultrasonic tone at one-second intervals indefinitely, and I use this code to do it. But it produces an audible click when switching between the ultrasonic tone and the sleep. Is there any way I can avoid this?
import pyaudio
import time
import numpy
p = pyaudio.PyAudio()
volume = 0.5 # range [0.0, 1.0]
fs = 44100 # sampling rate, Hz, must be integer
duration = 1 # in seconds, may be float
f = 19000.0 # sine frequency, Hz, may be float
# generate samples, note conversion to float32 array
samples = (numpy.sin(2*numpy.pi*numpy.arange(fs*duration)*f/fs)).astype(numpy.float32)
# for paFloat32 sample values must be in range [-1.0, 1.0]
stream = p.open(format=pyaudio.paFloat32,
channels=1,
rate=fs,
output=True)
# play. May repeat with different volume values (if done interactively)
while True:
stream.write((volume*samples).tobytes())
time.sleep(0.5)
stream.stop_stream()
stream.close()
p.terminate()
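The click happens because the tone starts and stops abruptly at an arbitrary phase. A common remedy is to round the buffer to a whole number of cycles and apply a short fade-in/fade-out envelope; a minimal sketch (the 5 ms fade length is an assumption, adjust to taste):
import numpy as np
fs = 44100
f = 19000.0
duration = 1.0
# round the buffer to a whole number of cycles so it ends near zero phase
n_cycles = int(f*duration)
n_samples = int(round(n_cycles*fs/f))
tone = np.sin(2*np.pi*f*np.arange(n_samples)/fs).astype(np.float32)
# 5 ms linear fade-in and fade-out to soften the onset and offset
fade = int(0.005*fs)
envelope = np.ones(n_samples, dtype=np.float32)
envelope[:fade] = np.linspace(0.0, 1.0, fade)
envelope[-fade:] = np.linspace(1.0, 0.0, fade)
tone *= envelope
# write 'tone' in the while loop instead of 'samples'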
We have designed a script that records two wav files:
1. Records the ambient noise
2. Records the ambient noise with voice
We are then going to use those two wav files as inputs to a third function that should subtract the ambient-noise wav file from the ambient-noise-with-voice one. The only problem is that when we run the script and call the combination() function, the resulting wav file simply combines the two preceding wav files. Our goal is to get an output in which the ambient noise is reduced and the voice is heard louder than it. Here is our script below:
import pyaudio
import wave
import matplotlib.pyplot as plt
import numpy as np
import scipy.io.wavfile
import scipy.signal as sp
def ambient():
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
CHUNK = 1024
RECORD_SECONDS = 5
WAVE_OUTPUT_FILENAME = "ambientnoise.wav"
audio = pyaudio.PyAudio()
# start Recording
stream = audio.open(format=FORMAT, channels=CHANNELS,
rate=RATE, input=True,
frames_per_buffer=CHUNK)
print ("recording...")
frames = []
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
data = stream.read(CHUNK)
frames.append(data)
print ("finished recording")
# stop Recording
stream.stop_stream()
stream.close()
audio.terminate()
waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
waveFile.setnchannels(CHANNELS)
waveFile.setsampwidth(audio.get_sample_size(FORMAT))
waveFile.setframerate(RATE)
waveFile.writeframes(b''.join(frames))
waveFile.close()
x = scipy.io.wavfile.read('ambientnoise.wav')
n = x[1]
y = np.zeros(n.shape)
y = n.cumsum(axis=0)
times = np.linspace(0, len(n), len(n))
plt.title("Plot 261 $speech1.wav\n $Secades, M.F.\spadesuit SIGNLAB \spadesuit 6Feb2018$")
plt.xlabel("n")
plt.ylabel("$speech1.wav$")
plt.plot(times,n)
plt.show()
def voice():
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
CHUNK = 1024
RECORD_SECONDS = 5
WAVE_OUTPUT_FILENAME = "ambientwithvoice.wav"
audio = pyaudio.PyAudio()
# start Recording
stream = audio.open(format=FORMAT, channels=CHANNELS,
rate=RATE, input=True,
frames_per_buffer=CHUNK)
print ("recording...")
frames = []
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
data = stream.read(CHUNK)
frames.append(data)
print ("finished recording")
# stop Recording
stream.stop_stream()
stream.close()
audio.terminate()
waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
waveFile.setnchannels(CHANNELS)
waveFile.setsampwidth(audio.get_sample_size(FORMAT))
waveFile.setframerate(RATE)
waveFile.writeframes(b''.join(frames))
waveFile.close()
x = scipy.io.wavfile.read('ambientwithvoice.wav')
n = x[1]
y = np.zeros(n.shape)
y = n.cumsum(axis=0)
times = np.linspace(0, len(n), len(n))
plt.title("Plot 261 $speech1.wav\n $Secades, M.F.\spadesuit SIGNLAB \spadesuit 6Feb2018$")
plt.xlabel("n")
plt.ylabel("$speech1.wav$")
plt.plot(times,n)
plt.show()
def combination():
rate1,Data1 = scipy.io.wavfile.read('ambientnoise.wav')
rate2,Data2 = scipy.io.wavfile.read('ambientwithvoice.wav')
new_Data = [0]*len(Data1)
for i in range(0,len(Data1)):
new_Data[i] = Data2[i] + Data1[i]
new_Data = np.array(new_Data)
scipy.io.wavfile.write('filtered.wav', rate1, new_Data)
x = scipy.io.wavfile.read('ambientwithvoice.wav')
n = x[1]
y = np.zeros(n.shape)
y = n.cumsum(axis=0)
times = np.linspace(0, len(n), len(n))
plt.title("Plot 261 $speech1.wav\n $Secades, M.F.\spadesuit SIGNLAB \spadesuit 6Feb2018$")
plt.xlabel("n")
plt.ylabel("$speech1.wav$")
plt.plot(times,n)
plt.show()
We have designed a script that records two wav files: 1. Records the ambient noise 2. Records the ambient noise with voice
This means that, while the ambient noise goes on continuously in the background, two different recordings are made, one after the other. The first records only the noise; the second also has speech in it.
To simplify the explanation, let's assume the speech is not present (maybe the speaker simply said nothing). The approach should work the same way: noise from the first recording is used to reduce the noise in the second recording; it does not matter whether another signal is present in the second recording or not. We know we were successful if the noise is reduced.
The situation looks like this: [waveform plots of the two noise recordings omitted]
Now let's combine the two recordings, either by adding them or by subtracting one from the other: [plots of the combined signals omitted]
Apparently, neither approach reduced the noise. Looking closely, the situation got worse: the noise amplitude in the resulting signal is higher than in either of the two recordings!
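This is easy to verify with synthetic data: for two independent noise signals the power adds, whether you add or subtract the samples. A quick check with Gaussian noise:
import numpy as np
rng = np.random.default_rng(0)
noise1 = rng.standard_normal(44100)  # stand-in for the first recording
noise2 = rng.standard_normal(44100)  # stand-in for the second recording
print(noise1.std(), noise2.std())    # ~1.0 each
print((noise2 - noise1).std())       # ~1.41, i.e. sqrt(2): worse
print((noise2 + noise1).std())       # ~1.41 as well: adding is no better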
In order to work, the signal we subtract must be an exact replica of the noise in the speech signal (or at least a reasonable approximation). There lies the problem: we do not know the noise signal, because it looks different every time we record it.
So what can we do?
Use a second microphone that records the noise at the same time as the speech, but does not record the speaker.
Apply domain knowledge (#1): if you know, for example, that the noise lies in a different frequency range than the speech, a filter can reduce the noise part (see the sketch after this list).
Apply domain knowledge (#2): if the noise is predictable (e.g. something periodic like a fan or an engine), create a mathematical model that predicts the noise and subtract that from the speech signal.
If the noise is "real noise" (statistically independent and broad-band) such as Gaussian white-noise, we're pretty much out of luck.
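As an illustration of option #1, a sketch using scipy.signal; the 300-3400 Hz speech band is an assumption about the signals, not something derived from the recordings:
import numpy as np
import scipy.io.wavfile
import scipy.signal as sp
rate, data = scipy.io.wavfile.read('ambientwithvoice.wav')
if data.ndim > 1:
    data = data.mean(axis=1)  # mix stereo down to mono
# 4th-order Butterworth band-pass around the assumed speech band
sos = sp.butter(4, [300, 3400], btype='bandpass', fs=rate, output='sos')
filtered = sp.sosfilt(sos, data.astype(np.float64))
scipy.io.wavfile.write('filtered.wav', rate, filtered.astype(np.int16))
This only helps to the extent that the noise energy actually lies outside the chosen band.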
import pyaudio
import numpy as np
RATE=44100
block = 64
pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paFloat32,
channels=1,
rate=RATE,
output=True)
while True:
x = np.arange(block,dtype=np.float32)
output = np.cos(2*np.pi*2000*x/44100)
output = output.tobytes()
stream.write(output)
I want to play a 2000 Hz cosine wave with a block size of 64. Why does the tone change when I change the block size? It should be a fixed tone at a fixed frequency whatever the block size is, shouldn't it?
Thank you for your reply.
I'm not sure what you are trying to achieve with your calculation. For a 2 kHz sound you need 2000 sine periods every second, i.e. 2000 periods per 44100 samples, or one period every ~22 samples (about 0.5 ms). The best way to find such formulas is to grab pen and paper and work out what you actually want (how frequency, sampling rate, and desired block length actually combine). One possible way is below, but try to understand the math behind it (untested):
import pyaudio
import numpy as np
RATE=44100
FREQUENCY = 2000
pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paFloat32,
channels=1,
rate=RATE,
output=True)
wave_len = float(RATE) / FREQUENCY   # ~22 samples per period
# use a whole number of periods so the buffer loops without a phase jump:
# 4410 samples = exactly 200 periods of 2000 Hz at a 44100 Hz rate
sample_len = 4410
# x goes from 0 to 1 over the first period, 1 to 2 over the second, ...
x = np.arange(sample_len, dtype=np.float32) / wave_len
# each unit step of x is one full sine period: 0..1 -> 0..1..0..-1..0
# yes, I prefer sin over cos
output = np.sin(2 * np.pi * x)
output = output.tobytes()
# no need to recreate the pattern every cycle
while True:
stream.write(output)
I'm trying to write a naive low-pass filter in Python.
Values of the Fourier transform above a specific frequency should be set equal to 0, right?
As far as I know, that should work.
But after an inverse Fourier transform, what I get is just noise.
Program1 records RECORD_SECONDS from the microphone and writes the FFT data to the file fft.bin.
Program2 reads from this file, does an IFFT, and plays the result on the speakers.
In addition, I found that even a very small change to the FFT causes Program2 to fail.
Where is my mistake?
Program1:
import pickle
import pyaudio
import wave
import numpy as np
CHUNK = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 1 #1-mono, 2-stereo
RATE = 44100
RECORD_SECONDS = 2
p = pyaudio.PyAudio()
stream = p.open(format=FORMAT,
channels=CHANNELS,
rate=RATE,
input=True,
frames_per_buffer=CHUNK)
f = open("fft.bin", "wb")
Tsamp = 1./RATE
#arguments for a fft
fft_x_arg = np.fft.rfftfreq(CHUNK/2, Tsamp)
#max freq
Fmax = 4000
print("* recording")
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
#read one chunk from mic
SigString = stream.read(CHUNK)
#convert string to int
SigInt = np.fromstring(SigString, 'int')
#calculate fft
fft_Sig = np.fft.rfft(SigInt)
"""
#apply low pass filter, maximum freq = Fmax
j=0
for value in fft_x_arg:
if value > Fmax:
fft_Sig[j] = 0
j=j+1
"""
#write one chunk of data to file
pickle.dump(fft_Sig,f)
print("* done recording")
f.close()
stream.stop_stream()
stream.close()
p.terminate()
Program2:
import pyaudio
import pickle
import numpy as np
CHUNK = 1024
p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16,
channels=1,
rate=44100/2, #anyway, why 44100 Hz plays twice faster than normal?
output=True)
f = open("fft.bin", "rb")
#load first value from file
fft_Sig = pickle.load(f)
#calculate ifft and cast do int
SigInt = np.int16(np.fft.irfft(fft_Sig))
#convert once more - to string
SigString = np.ndarray.tostring(SigInt)
while SigString != '':
#play sound
stream.write(SigString)
fft_Sig = pickle.load(f)
SigInt = np.int16(np.fft.irfft(fft_Sig))
SigString = np.ndarray.tostring(SigInt)
f.close()
stream.stop_stream()
stream.close()
p.terminate()
FFTs operate on complex numbers. You can feed them real numbers (they are treated as complex values whose imaginary part is 0), but the output of a general FFT is always complex.
More importantly, np.fromstring(SigString, 'int') parses the 16-bit paInt16 samples with the wrong dtype: 'int' is the platform integer (4 or 8 bytes per value, not 2), so both your sample count and the sample values come out wrong. Parse the buffer as np.int16 instead; this is also the likely reason playback runs at the wrong speed.
Also, check how your FFT library scales the inverse transform: NumPy's np.fft.irfft already applies the 1/N factor, so don't apply it twice. And keep in mind that the frequency range of a full FFT is half negative, approximately -1/(2T) <= f < 1/(2T). BTW, 1/(2T) is known as the Nyquist frequency, and for real input data the negative half of the FFT output mirrors the positive half as a complex conjugate: if y(f) = F{x(t)} (where F{} is the forward Fourier transform), then y(-f) == conj(y(f)). That symmetry is why np.fft.rfft/np.fft.irfft only store the non-negative half, and why rfftfreq should be given the full chunk length (CHUNK), not CHUNK/2.
I think you need to read up a bit more on DSP algorithms using FFTs. What you're trying to do is called a brick-wall filter.
Also, something that will help you a lot is matplotlib: plot the data at intermediate steps to see where things are going wrong.
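To make that concrete, here is a minimal sketch of a brick-wall low-pass applied to one chunk, with the dtype and transforms handled as described above (chunk size and cutoff taken from the question; note that filtering chunk-by-chunk without overlap still leaves audible seams at block boundaries):
import numpy as np
RATE = 44100
CHUNK = 1024
FMAX = 4000
def lowpass_chunk(raw_bytes):
    # paInt16 delivers 2-byte samples, so parse as int16, not 'int'
    sig = np.frombuffer(raw_bytes, dtype=np.int16)
    spectrum = np.fft.rfft(sig)                    # complex, len(sig)//2 + 1 bins
    freqs = np.fft.rfftfreq(len(sig), d=1.0/RATE)  # frequency of each bin, Hz
    spectrum[freqs > FMAX] = 0                     # brick-wall cutoff
    # irfft returns real samples and already includes the 1/N scaling
    return np.fft.irfft(spectrum, n=len(sig)).astype(np.int16).tobytes()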