I'm trying to write naiv low pass filter using Python.
Values of the Fourier Transformant higher than a specific frequency should be equal to 0, right?
As far as I know that should to work.
But after an inverse fourier transformation what I get is just noise.
Program1 records RECORD_SECONDS from microphone and writes information about fft in fft.bin file.
Program2 reads from this file, do ifft and plays result on speakers.
In addition, I figured out, that every, even very little change in fft causes Program2 to fail.
Where do I make mistake?
Program1:
import pickle
import pyaudio
import wave
import numpy as np
CHUNK = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 1 #1-mono, 2-stereo
RATE = 44100
RECORD_SECONDS = 2
p = pyaudio.PyAudio()
stream = p.open(format=FORMAT,
channels=CHANNELS,
rate=RATE,
input=True,
frames_per_buffer=CHUNK)
f = open("fft.bin", "wb")
Tsamp = 1./RATE
#arguments for a fft
fft_x_arg = np.fft.rfftfreq(CHUNK/2, Tsamp)
#max freq
Fmax = 4000
print("* recording")
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
#read one chunk from mic
SigString = stream.read(CHUNK)
#convert string to int
SigInt = np.fromstring(SigString, 'int')
#calculate fft
fft_Sig = np.fft.rfft(SigInt)
"""
#apply low pass filter, maximum freq = Fmax
j=0
for value in fft_x_arg:
if value > Fmax:
fft_Sig[j] = 0
j=j+1
"""
#write one chunk of data to file
pickle.dump(fft_Sig,f)
print("* done recording")
f.close()
stream.stop_stream()
stream.close()
p.terminate()
Program2:
import pyaudio
import pickle
import numpy as np
CHUNK = 1024
p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16,
channels=1,
rate=44100/2, #anyway, why 44100 Hz plays twice faster than normal?
output=True)
f = open("fft.bin", "rb")
#load first value from file
fft_Sig = pickle.load(f)
#calculate ifft and cast do int
SigInt = np.int16(np.fft.irfft(fft_Sig))
#convert once more - to string
SigString = np.ndarray.tostring(SigInt)
while SigString != '':
#play sound
stream.write(SigString)
fft_Sig = pickle.load(f)
SigInt = np.int16(np.fft.irfft(fft_Sig))
SigString = np.ndarray.tostring(SigInt)
f.close()
stream.stop_stream()
stream.close()
p.terminate()
FFTs operate on complex numbers. You might be able to feed them real numbers (which will get converted to complex by setting the imaginary part to 0) but their outputs will always be complex.
This is probably throwing off your sample counting by 2 among other things. It should also be trashing your output because you're not converting back to real data.
Also, you forgot to apply a 1/N scale factor to the IFFT output. And you need to keep in mind that the frequency range of an FFT is half negative, that is it's approximately the range -1/(2T) <= f < 1/(2T). BTW, 1/(2T) is known as the Nyquist frequency, and for real input data, the negative half of the FFT output will mirror the positive half (i.e. for y(f) = F{x(t)} (where F{} is the forward Fourier transform) y(f) == y(-f).
I think you need to read up a bit more on DSP algorithms using FFTs. What you're trying to do is called a brick wall filter.
Also, something that will help you a lot is matplotlib, which will help you see what the data looks like at intermediate steps. You need to look at this intermediate data to find out where things are going wrong.
Related
We have designed a code that records two wav files:
1. Records the ambient noise
2. Records the ambient noise with voice
We are then going to use those two wav files as inputs for our third def function that will subtract the ambient noise wav file from the ambient noise with voice. The only problem is that when we run the script and call the combination() function, the resulting wav file combines the two preceding wav files. Our goal is to get an output where the ambient noise will be reduced and the voice will be the one heard louder than it. Here is our script below:
import pyaudio
import wave
import matplotlib.pyplot as plt
import numpy as np
import scipy.io.wavfile
import scipy.signal as sp
def ambient():
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
CHUNK = 1024
RECORD_SECONDS = 5
WAVE_OUTPUT_FILENAME = "ambientnoise.wav"
audio = pyaudio.PyAudio()
# start Recording
stream = audio.open(format=FORMAT, channels=CHANNELS,
rate=RATE, input=True,
frames_per_buffer=CHUNK)
print ("recording...")
frames = []
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
data = stream.read(CHUNK)
frames.append(data)
print ("finished recording")
# stop Recording
stream.stop_stream()
stream.close()
audio.terminate()
waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
waveFile.setnchannels(CHANNELS)
waveFile.setsampwidth(audio.get_sample_size(FORMAT))
waveFile.setframerate(RATE)
waveFile.writeframes(b''.join(frames))
waveFile.close()
x = scipy.io.wavfile.read('ambientnoise.wav')
n = x[1]
y = np.zeros(n.shape)
y = n.cumsum(axis=0)
times = np.linspace(0, len(n), len(n))
plt.title("Plot 261 $speech1.wav\n $Secades, M.F.\spadesuit SIGNLAB \spadesuit 6Feb2018$")
plt.xlabel("n")
plt.ylabel("$speech1.wav$")
plt.plot(times,n)
plt.show()
def voice():
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
CHUNK = 1024
RECORD_SECONDS = 5
WAVE_OUTPUT_FILENAME = "ambientwithvoice.wav"
audio = pyaudio.PyAudio()
# start Recording
stream = audio.open(format=FORMAT, channels=CHANNELS,
rate=RATE, input=True,
frames_per_buffer=CHUNK)
print ("recording...")
frames = []
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
data = stream.read(CHUNK)
frames.append(data)
print ("finished recording")
# stop Recording
stream.stop_stream()
stream.close()
audio.terminate()
waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
waveFile.setnchannels(CHANNELS)
waveFile.setsampwidth(audio.get_sample_size(FORMAT))
waveFile.setframerate(RATE)
waveFile.writeframes(b''.join(frames))
waveFile.close()
x = scipy.io.wavfile.read('ambientwithvoice.wav')
n = x[1]
y = np.zeros(n.shape)
y = n.cumsum(axis=0)
times = np.linspace(0, len(n), len(n))
plt.title("Plot 261 $speech1.wav\n $Secades, M.F.\spadesuit SIGNLAB \spadesuit 6Feb2018$")
plt.xlabel("n")
plt.ylabel("$speech1.wav$")
plt.plot(times,n)
plt.show()
def combination():
rate1,Data1 = scipy.io.wavfile.read('ambientnoise.wav')
rate2,Data2 = scipy.io.wavfile.read('ambientwithvoice.wav')
new_Data = [0]*len(Data1)
for i in range(0,len(Data1)):
new_Data[i] = Data2[i] + Data1[i]
new_Data = np.array(new_Data)
scipy.io.wavfile.write('filtered.wav', rate1, new_Data)
x = scipy.io.wavfile.read('ambientwithvoice.wav')
n = x[1]
y = np.zeros(n.shape)
y = n.cumsum(axis=0)
times = np.linspace(0, len(n), len(n))
plt.title("Plot 261 $speech1.wav\n $Secades, M.F.\spadesuit SIGNLAB \spadesuit 6Feb2018$")
plt.xlabel("n")
plt.ylabel("$speech1.wav$")
plt.plot(times,n)
plt.show()
We have designed a code that records two wav files: 1. Records the ambient noise 2. Records the ambient noise with voice
This means, that while the ambient noise is continuously going on in the background two different recordings are made, one after the other. The first records only the noise, the second also has speech in it.
To simplify the explanation, let's assume the speech is not present (maybe the speaker simply said nothing). This should work similarly; noise from the first recording should be used to reduce the noise in the second recording - it does not matter if there is another signal present in the second recording or not. We know we were successful if the noise is reduced.
The situation looks like this:
Now let's combine the two recordings either by adding them or by subtracting:
Apparently, neither approach reduced the noise. Looking closely, the situation got worse: the noise amplitude in the resulting signal is higher than in either of the two recordings!
In order to work, the signal we subtract must be an exact replicate of noise in the speech signal (or at least a reasonable approximation). There lies the problem: we do not know the noise signal, because every time we record it looks differently.
So what can we do?
Use a second microphone that records the noise at the same time as the speech, but does not record the speaker.
Apply domain knowledge (#1): if you know for example that the noise is in a different frequency range than the speech signal filters can reduce the noise part.
Apply domain knowledge (#2): if the noise is predictable (e.g. something periodic like a fan or an engine) create a mathematical model that predicts the noise and subtract that from the speech signal.
If the noise is "real noise" (statistically independent and broad-band) such as Gaussian white-noise, we're pretty much out of luck.
I am trying to achieve active noise reduction in python. My project is composed of two set of codes:
sound recording code
sound filtering code
What I aim for is that when you run the program, it will start recording through the microphone. After you've finished recording there will be a saved file called "file1.wav" When you play that file, it is the one that you recorded originally. After you're finished with that, you will now put "file1.wav" through a filter by calling "fltrd()". This will create a second wav file in the same folder and that second wav file is supposedly the one with less/reduced noise. Now my problem is that the second wav file is enhancing noise instead of reducing it. Can anyone please troubleshoot my code? :(
Here is my code below:
import pyaudio
import wave
import matplotlib.pyplot as plt
import numpy as np
import scipy.io.wavfile
import scipy.signal as sp
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
CHUNK = 1024
RECORD_SECONDS = 5
WAVE_OUTPUT_FILENAME = "file1.wav"
audio = pyaudio.PyAudio()
# start Recording
stream = audio.open(format=FORMAT, channels=CHANNELS,
rate=RATE, input=True,
frames_per_buffer=CHUNK)
print ("recording...")
frames = []
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
data = stream.read(CHUNK)
frames.append(data)
print ("finished recording")
# stop Recording
stream.stop_stream()
stream.close()
audio.terminate()
waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
waveFile.setnchannels(CHANNELS)
waveFile.setsampwidth(audio.get_sample_size(FORMAT))
waveFile.setframerate(RATE)
waveFile.writeframes(b''.join(frames))
waveFile.close()
x = scipy.io.wavfile.read('file1.wav')
n = x[1]
y = np.zeros(n.shape)
y = n.cumsum(axis=0)
def fltrd():
n,x = scipy.io.wavfile.read('file1.wav')
a2 = x.cumsum(axis=0)
a3 = np.asarray(a2, dtype = np.int16)
scipy.io.wavfile.write('file2.wav',n,a3)
Actual noise filtering is difficult and intense. However, an simple noise filter using high and low pass filter can be easily created using pydub library. See here for more details (install, requirements etc)
Also see here for more details on low and high pass filter using pydub.
Basic idea is to take a audio file and then pass it through both low and high pass filter such that audio above and below certain threahold will be highly attenuated (in effect demonstrating filtering).
Although, this will not affect any noise falling in pass-band for which you will need to look at other noise cancellation techniques.
from pydub import AudioSegment
from pydub import low_passfilter
from pydub import high_pass_filter
from pydub.playback import play
song = AudioSegment.from_wav('file1.wav')
#Freq in Hz ,Adjust as per your needs
new = song.low_pass_filter(5000).high_pass_filter(200)
play(new)
I'm in a need for a program that transforms tones recorded by the microphone into keybord presses. Example: if somebody sings at a frequency between 400hz and 600hz at the microphone and the average tone is 550hz, then i store the average frequency in the var 'tom', and the key "G" of my keyboard is pressed.
Even tho i'm newbye at programming, i searched and figured out a way to do so,
by using Audiopy at python language, by recording small WAV files, i could then read those and get a number as an average frequency, and with this number and some ifs and elifs, press keys (not that hard to find a code to press keys), in an enormous WHILE that repeats the process while the program runs, and so i would have the process of talking, reading the small files the talk would produce, and then transforming into key presses, according to the tone.
The main problem is that I have no idea on how to transform the WAV files i've been recording on a single average number of frequency. Can somebody help me with this? Or with the big picture? Cuz i know this method is not a really good one. Thanks! I was using this code to record, that I found on the Audiopy website:
import pyaudio
import wave
import numpy as np
import pyaudio
CHUNK = 2048
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
RECORD_SECONDS = 1
WAVE_OUTPUT_FILENAME = "output1.wav"
p = pyaudio.PyAudio()
stream = p.open(format=FORMAT,
channels=CHANNELS,
rate=RATE,
input=True,
frames_per_buffer=CHUNK)
print("* recording")
frames = []
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
data = stream.read(CHUNK)
frames.append(data)
print("* done chunk")
stream.stop_stream()
stream.close()
p.terminate()
To press the keys, this other code:
import win32com.client
shell = win32com.client.Dispatch("WScript.Shell")
if tom >= 400 and tom<=500:
shell.SendKeys("G")
PS.: I'm using Windows
You can use the Fourier transform to convert sound into frequencies.
More specifically, use the one-dimensional discrete Fourier Transform provided by numpy.fft.rfft.
An example to read a single second from a stereo WAV file and extract the frequencies.
import wave
import numpy as np
with wave.open('input.wav', 'r') as wr:
sz = wr.getframerate() # Read and process 1 second.
da = np.fromstring(wr.readframes(sz), dtype=np.int16)
left, right = da[0::2], da[1::2] # separate into left and right channel
lf, rf = np.absolute(np.fft.rfft(left)), np.absolute(np.fft.rfft(right))
The lf and rf are numpy arrays containing the intensity of each frequency. Using numpy.argmax you can get the index (frequency) with the highest strength.
But try it and graph the result using e.g. matplotlib. You'll see that there are probably multiple peaks in the data. For example you might find a peak at 50 Hz or 60 Hz. This is most probably interference from mains electricity and should be ignored by zero-ing out the data.
Example for 60 Hz:
lf[55:65], rf[55:65] = 0, 0
Below is an example plot made with matplotlib from a one-second sound clip. The top graph shows the samples from the WAV file while the bottom one shows the same data converted to frequencies. This is a graph of a person speaking, so there are many peaks. The highest is around 200 Hz.
In order to record a 2 second wav file I used PyAudio (with Pyzo) and the following classical code to record a sound and save it :
import pyaudio
import wave
chunk = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 44100
RECORD_SECONDS = 2
WAVE_OUTPUT_FILENAME = "my_path//a_test_2.wav"
p = pyaudio.PyAudio()
# Création et initialisation de l'objet stream...
s = p.open(format = FORMAT,
channels = CHANNELS,
rate = RATE,
input = True,
frames_per_buffer = chunk)
print("---recording---")
d = []
print((RATE / chunk) * RECORD_SECONDS)
for i in range(0, (RATE // chunk * RECORD_SECONDS)):
data = s.read(chunk)
d.append(data)
#s.write(data, chunk)
print("---done recording---")
s.close()
p.terminate()
wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(p.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(b''.join(d))
wf.close()
Then I used it, saying "aaa". Everything's fine, no error.
And when I read the wav file, no "aaa" could be heard. I visualized the file in Audacity and I could see everything was just silence (0). So it seems Pyzo doesn't know where my microphone is, because it didn't use it. What can I do ? Any idea ?
Or maybe it didn't write all the data recorded, but I don't know why.
I have already checked that my microphone is 16 bits and has a 44100 rate.
You'll need to do get this working step-by-step. To make sure that you're recording from the mic, I would suggest printing out the max of each chunk as you read it. Then you should see, in real time, a difference between background noise and your speech. For example:
import audioop
# all the setup stuff, then in your main data loop:
for i in range(0, (RATE // chunk * RECORD_SECONDS)):
data = s.read(chunk)
mx = audioop.max(data, 2)
print mx
Usually this difference between background noise and speech is more than 10x, so you can easily see the size change in the numbers as they fly by.
Also, at the start, list all of your microphones to make sure that you're using the right one (get_device_info_by_index). For example, you could be reading from the "line in" rather than the microphone.
What I am trying to achieve is the following: I need the frequency values of a sound file (.wav) for analysis. I know a lot of programs will give a visual graph (spectrogram) of the values but I need to raw data. I know this can be done with FFT and should be fairly easily scriptable in python but not sure how to do it exactly.
So let's say that a signal in a file is .4s long then I would like multiple measurements giving an output as an array for each timepoint the program measures and what value (frequency) it found (and possibly power (dB) too). The complicated thing is that I want to analyse bird songs, and they often have harmonics or the signal is over a range of frequency (e.g. 1000-2000 Hz). I would like the program to output this information as well, since this is important for the analysis I would like to do with the data :)
Now there is a piece of code that looked very much like I wanted, but I think it does not give me all the values I want.... (thanks to Justin Peel for posting this to a different question :)) So I gather that I need numpy and pyaudio but unfortunately I am not familiar with python so I am hoping that a Python expert can help me on this?
Source Code:
# Read in a WAV and find the freq's
import pyaudio
import wave
import numpy as np
chunk = 2048
# open up a wave
wf = wave.open('test-tones/440hz.wav', 'rb')
swidth = wf.getsampwidth()
RATE = wf.getframerate()
# use a Blackman window
window = np.blackman(chunk)
# open stream
p = pyaudio.PyAudio()
stream = p.open(format =
p.get_format_from_width(wf.getsampwidth()),
channels = wf.getnchannels(),
rate = RATE,
output = True)
# read some data
data = wf.readframes(chunk)
# play stream and find the frequency of each chunk
while len(data) == chunk*swidth:
# write data out to the audio stream
stream.write(data)
# unpack the data and times by the hamming window
indata = np.array(wave.struct.unpack("%dh"%(len(data)/swidth),\
data))*window
# Take the fft and square each value
fftData=abs(np.fft.rfft(indata))**2
# find the maximum
which = fftData[1:].argmax() + 1
# use quadratic interpolation around the max
if which != len(fftData)-1:
y0,y1,y2 = np.log(fftData[which-1:which+2:])
x1 = (y2 - y0) * .5 / (2 * y1 - y2 - y0)
# find the frequency and output it
thefreq = (which+x1)*RATE/chunk
print "The freq is %f Hz." % (thefreq)
else:
thefreq = which*RATE/chunk
print "The freq is %f Hz." % (thefreq)
# read some more data
data = wf.readframes(chunk)
if data:
stream.write(data)
stream.close()
p.terminate()
I'm not sure if this is what you want, if you just want the FFT:
import scikits.audiolab, scipy
x, fs, nbits = scikits.audiolab.wavread(filename)
X = scipy.fft(x)
If you want the magnitude response:
import pylab
Xdb = 20*scipy.log10(scipy.absolute(X))
f = scipy.linspace(0, fs, len(Xdb))
pylab.plot(f, Xdb)
pylab.show()
I think that what you need to do is a Short-time Fourier Transform(STFT). Basically, you do multiple partially overlapping FFTs and add them together for each point in time. Then you would find the peak for each point in time. I haven't done this myself, but I've looked into it some in the past and this is definitely the way to go forward.
There's some Python code to do a STFT here and here.