I need a program that turns tones recorded by the microphone into keyboard presses. Example: if somebody sings into the microphone at a frequency between 400 Hz and 600 Hz and the average tone is 550 Hz, then I store the average frequency in the variable 'tom', and the "G" key on my keyboard is pressed.
Even though I'm new to programming, I searched and figured out a way to do it using PyAudio in Python: record small WAV files, read each one and reduce it to a single average frequency, then use that number with some if/elif checks to press keys (code to press keys is not hard to find). All of this would sit inside one big while loop that repeats for as long as the program runs, so that speech is recorded into small files, the files are read, and the tone is turned into key presses.
The main problem is that I have no idea how to turn the WAV files I've been recording into a single average frequency. Can somebody help me with this? Or with the big picture? I know this method is not a really good one. Thanks! I was using this code to record, which I found on the PyAudio website:
import pyaudio
import wave
import numpy as np
CHUNK = 2048
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
RECORD_SECONDS = 1
WAVE_OUTPUT_FILENAME = "output1.wav"
p = pyaudio.PyAudio()
stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)
print("* recording")
frames = []
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)
print("* done chunk")
stream.stop_stream()
stream.close()
p.terminate()
To press the keys, this other code:
import win32com.client
shell = win32com.client.Dispatch("WScript.Shell")
if tom >= 400 and tom<=500:
    shell.SendKeys("G")
PS.: I'm using Windows
You can use the Fourier transform to convert sound into frequencies.
More specifically, use the one-dimensional discrete Fourier Transform provided by numpy.fft.rfft.
An example to read a single second from a stereo WAV file and extract the frequencies.
import wave
import numpy as np

with wave.open('input.wav', 'rb') as wr:
    sz = wr.getframerate()  # Number of frames in 1 second.
    # 16-bit stereo assumed: samples are interleaved left, right, left, ...
    da = np.frombuffer(wr.readframes(sz), dtype=np.int16)
left, right = da[0::2], da[1::2]  # separate into left and right channel
# magnitude of each frequency bin, per channel
lf, rf = np.absolute(np.fft.rfft(left)), np.absolute(np.fft.rfft(right))
The lf and rf are numpy arrays containing the magnitude of each frequency. Because exactly one second of audio was transformed, the bins are 1 Hz apart, so the index returned by numpy.argmax is the frequency in Hz of the strongest component.
But try it and graph the result using e.g. matplotlib. You'll see that there are probably multiple peaks in the data. For example, you might find a peak at 50 Hz or 60 Hz. This is most probably interference from mains electricity and should be ignored by zeroing out those bins.
Example for 60 Hz:
lf[55:65], rf[55:65] = 0, 0
Below is an example plot made with matplotlib from a one-second sound clip. The top graph shows the samples from the WAV file while the bottom one shows the same data converted to frequencies. This is a graph of a person speaking, so there are many peaks. The highest is around 200 Hz.
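To connect this back to the question, here is a minimal sketch of reducing one block of samples to a single frequency and mapping it to a key. The names `dominant_frequency` and `key_for_tone` are made up for illustration, the search band of 80 Hz to 1000 Hz is an assumption (it also keeps mains hum out of the search), and a synthetic 550 Hz tone stands in for microphone data.

```python
import numpy as np

def dominant_frequency(samples, rate, fmin=80.0, fmax=1000.0):
    """Return the strongest frequency in Hz between fmin and fmax (assumed band)."""
    samples = np.asarray(samples, dtype=np.float64)
    # A Hann window reduces leakage from tones that fall between bins.
    window = np.hanning(len(samples))
    spectrum = np.abs(np.fft.rfft(samples * window))
    # rfftfreq gives the frequency in Hz of every bin for this block length.
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)
    band = (freqs >= fmin) & (freqs <= fmax)
    return float(freqs[band][np.argmax(spectrum[band])])

def key_for_tone(tom):
    """Hypothetical mapping from frequency to key, following the question's example."""
    if 400 <= tom <= 600:
        return "G"
    return None

# Synthetic one-second test tone at 550 Hz instead of a real recording.
rate = 44100
t = np.arange(rate) / rate
test_tone = np.sin(2 * np.pi * 550 * t)

tom = dominant_frequency(test_tone, rate)
key = key_for_tone(tom)
print(tom, key)
```

With real input, the bytes returned by `stream.read(CHUNK)` could be decoded with `np.frombuffer(data, dtype=np.int16)` and passed in directly, which avoids writing WAV files at all. Shorter blocks react faster but give coarser frequency resolution (`rate / len(samples)` Hz per bin).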
Related
I would like to know how to get the sound pressure level in dB. The input should be the signal from the PC's microphone (a real-time audio signal). The output should be the real-time sound pressure level of the input signal. This is my simple code.
import pyaudio
import numpy as np
import wave
from threading import Thread
from pysine import sine
import math
import time
def print_sound():
    CHUNK = 1024
    FORMAT = pyaudio.paInt16
    CHANNELS = 1
    RATE = 44100
    pa = pyaudio.PyAudio()
    stream = pa.open(format=FORMAT,
                     channels=CHANNELS,
                     rate=RATE,
                     input=True,
                     frames_per_buffer=CHUNK)
    buffer = []
    while True:
        string_audio_data = stream.read(CHUNK)
        audio_data = np.frombuffer(string_audio_data, np.int16)
        volume_norm = np.linalg.norm(audio_data)*10
        dfft = 10.*np.log10(abs(np.fft.rfft(audio_data)))
        print(int(volume_norm))

print_sound()
and this error is
Warning (from warnings module):
  File "C:\Users\Admin\1.py", line 27
    dfft = 10.*np.log10(abs(np.fft.rfft(audio_data)))
RuntimeWarning: divide by zero encountered in log10
As @Thierry pointed out above, it will not be possible to get a dB SPL value without calibrating your speaker/microphone system against actual measurements. It is possible (as it seems you've done with your code snippet) to report relative levels.
These are the minimum requirements you need to report absolute dB SPL:
dynamic range of your ADC (max/min voltage range) - what is the loudest sound that can be captured by the ADC of the computer.
the sensitivity of your microphone (mV/Pa). For a sound of X Pascals pressure, how many mV (or sample-related measurement) are produced
speaker calibration - given an input signal with Y rms, how many Pa (and thus dB SPL) will the output sound be?
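The relative level mentioned above can be sketched as follows. This is only an illustration, assuming 16-bit samples as in the question: it reports dB relative to digital full scale (dBFS), not dB SPL, and the function name `level_dbfs` and the `eps` floor are my own choices. The floor also sidesteps the "divide by zero encountered in log10" warning for silent blocks.

```python
import numpy as np

FULL_SCALE = 32768.0  # largest magnitude an int16 sample can take

def level_dbfs(block, eps=1e-12):
    """RMS level of one block in dB relative to full scale (0 dBFS = maximum)."""
    x = np.asarray(block, dtype=np.float64) / FULL_SCALE
    rms = np.sqrt(np.mean(x ** 2))
    # Clamp to eps so that pure silence gives a finite value instead of -inf.
    return 20.0 * np.log10(max(rms, eps))

# Synthetic blocks instead of microphone data.
n = np.arange(1024)
silence = np.zeros(1024, dtype=np.int16)
half_scale = np.rint(16384 * np.sin(2 * np.pi * 64 * n / 1024)).astype(np.int16)

print(level_dbfs(silence))     # finite floor value, no warning
print(level_dbfs(half_scale))  # roughly -9 dBFS for a half-scale sine
```

Turning such a value into dB SPL would still require adding an offset measured against a calibrated reference, as the list above describes; that offset cannot be derived from the samples alone.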
I am trying to achieve active noise reduction in Python. My project is composed of two pieces of code:
sound recording code
sound filtering code
What I aim for is that when you run the program, it will start recording through the microphone. After you've finished recording there will be a saved file called "file1.wav" When you play that file, it is the one that you recorded originally. After you're finished with that, you will now put "file1.wav" through a filter by calling "fltrd()". This will create a second wav file in the same folder and that second wav file is supposedly the one with less/reduced noise. Now my problem is that the second wav file is enhancing noise instead of reducing it. Can anyone please troubleshoot my code? :(
Here is my code below:
import pyaudio
import wave
import matplotlib.pyplot as plt
import numpy as np
import scipy.io.wavfile
import scipy.signal as sp
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
CHUNK = 1024
RECORD_SECONDS = 5
WAVE_OUTPUT_FILENAME = "file1.wav"
audio = pyaudio.PyAudio()
# start Recording
stream = audio.open(format=FORMAT, channels=CHANNELS,
                    rate=RATE, input=True,
                    frames_per_buffer=CHUNK)
print ("recording...")
frames = []
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)
print ("finished recording")
# stop Recording
stream.stop_stream()
stream.close()
audio.terminate()
waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
waveFile.setnchannels(CHANNELS)
waveFile.setsampwidth(audio.get_sample_size(FORMAT))
waveFile.setframerate(RATE)
waveFile.writeframes(b''.join(frames))
waveFile.close()
x = scipy.io.wavfile.read('file1.wav')
n = x[1]
y = np.zeros(n.shape)
y = n.cumsum(axis=0)
def fltrd():
    n,x = scipy.io.wavfile.read('file1.wav')
    a2 = x.cumsum(axis=0)
    a3 = np.asarray(a2, dtype = np.int16)
    scipy.io.wavfile.write('file2.wav',n,a3)
Real noise filtering is difficult and computationally intensive. However, a simple noise filter built from a high-pass and a low-pass filter can easily be created with the pydub library. See here for more details (install, requirements etc.)
Also see here for more details on low- and high-pass filters using pydub.
The basic idea is to take an audio file and pass it through both a low-pass and a high-pass filter, so that audio above and below certain thresholds is strongly attenuated (in effect demonstrating filtering).
However, this will not affect any noise that falls inside the pass band; for that you will need to look at other noise cancellation techniques.
from pydub import AudioSegment
from pydub.playback import play

# low_pass_filter and high_pass_filter are AudioSegment methods,
# so no separate imports are needed for them.
song = AudioSegment.from_wav('file1.wav')
# Cutoff frequencies in Hz; adjust as per your needs
new = song.low_pass_filter(5000).high_pass_filter(200)
play(new)
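For comparison with the question's `fltrd()`: a running sum (`cumsum`) behaves like an integrator, so it boosts low frequencies and can overflow when cast back to int16, which may be why the result sounds noisier. A moving average is one very simple smoother that does attenuate high-frequency noise. The sketch below uses numpy only; `moving_average`, `to_int16` and the width of 9 samples are illustrative assumptions, and a synthetic noisy tone stands in for file1.wav.

```python
import numpy as np

def moving_average(samples, width=9):
    """Smooth a mono (N,) or multi-channel (N, channels) signal with a boxcar filter."""
    x = np.asarray(samples, dtype=np.float64)
    kernel = np.ones(width) / width  # weights sum to 1, so the level is preserved
    if x.ndim == 1:
        return np.convolve(x, kernel, mode="same")
    # Filter each channel separately along the time axis.
    cols = [np.convolve(x[:, c], kernel, mode="same") for c in range(x.shape[1])]
    return np.stack(cols, axis=1)

def to_int16(x):
    """Round and clip before casting so that values cannot wrap around."""
    return np.clip(np.rint(x), -32768, 32767).astype(np.int16)

# Synthetic test signal: a 220 Hz tone plus white noise.
rng = np.random.default_rng(0)
rate = 44100
t = np.arange(rate) / rate
clean = 8000 * np.sin(2 * np.pi * 220 * t)
noisy = clean + rng.normal(0, 1000, size=rate)

smoothed = moving_average(noisy, width=9)
err_before = np.sqrt(np.mean((noisy - clean) ** 2))
err_after = np.sqrt(np.mean((smoothed - clean) ** 2))
print(err_before, err_after)
```

A wider window removes more noise but also dulls the signal itself, so the width is a trade-off rather than a fixed recommendation. With real data, the array returned by `scipy.io.wavfile.read` could be passed to `moving_average` and the result through `to_int16` before writing.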
import pyaudio
import numpy as np
RATE=44100
block = 64
pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paFloat32,
                 channels=1,
                 rate=RATE,
                 output=True)
while True:
    x = np.arange(block,dtype=np.float32)
    output = np.cos(2*np.pi*2000*x/44100)
    output = output.tobytes()
    stream.write(output)
I want to play a cosine wave with a frequency of 2000 Hz and a block size of 64. Why does the tone change when I change the block size? It should be a fixed tone at that frequency whatever the block size is, shouldn't it?
Thank you for your reply.
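I'm not sure what you are trying to achieve with your calculation. The tone changes because every block restarts the cosine at phase 0: unless a block holds a whole number of periods, each block boundary is a jump in the waveform, and the jump repeats at a rate that depends on the block size. For a 2 kHz sound you need 2000 sine periods every second, i.e. every 44100 samples, which is one period every ~22 samples or 0.5 ms. The best way to find such formulas is to grab pen and paper and work out what you actually want (how to combine frequency, sampling rate and desired block length). One possible way is below, but try to understand the math behind it (untested):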
import pyaudio
import numpy as np
RATE=44100
FREQUENCY = 2000
pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paFloat32,
                 channels=1,
                 rate=RATE,
                 output=True)
# One second of samples holds exactly FREQUENCY whole periods, so the
# pattern can be repeated without a jump at the seam.
sample_len = RATE
wave_len = float(RATE) / FREQUENCY # ~22 samples per wave
# x goes from 0 to 1 for approx index 0..wave_len-1, 1..2 for wave_len..2wave_len-1, ...
x = np.arange(sample_len,dtype=np.float32)/wave_len
# 0..1 -> 0..1..0..-1..0; 1..2 -> 0..1..0..-1..0
# yes, I prefer sin over cos
output = np.sin(2*np.pi*x).astype(np.float32)
output = output.tobytes()
# no need to recreate the pattern every cycle
while True:
    stream.write(output)
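If the audio has to be produced in small blocks (for example 64 samples, as in the question), another option is to keep a running sample index so that the phase carries over from one block to the next. The following is a sketch of that idea without any audio output; `tone_blocks` is a made-up helper name, and wrapping the index at `rate` assumes the frequency is a whole number of Hz.

```python
import numpy as np

def tone_blocks(frequency, rate, block, n_blocks):
    """Yield float32 cosine blocks whose phase continues across block boundaries."""
    start = 0
    for _ in range(n_blocks):
        # Sample indices continue where the previous block stopped.
        n = np.arange(start, start + block)
        yield np.cos(2 * np.pi * frequency * n / rate).astype(np.float32)
        # For an integer frequency the waveform repeats every `rate` samples,
        # so wrapping keeps the index small without changing the phase.
        start = (start + block) % rate

# Ten blocks of 64 samples should equal one uninterrupted 640-sample cosine.
blocks = list(tone_blocks(2000, 44100, 64, 10))
joined = np.concatenate(blocks)
reference = np.cos(2 * np.pi * 2000 * np.arange(640) / 44100).astype(np.float32)
print(np.allclose(joined, reference))
```

Each yielded block could be passed to `stream.write(block.tobytes())` on a `paFloat32` output stream; the pitch then no longer depends on the block size.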
I'm trying to write a naive low-pass filter in Python.
The values of the Fourier transform above a specific frequency should be set to 0, right?
As far as I know, that should work.
But after the inverse Fourier transform, all I get is noise.
Program1 records RECORD_SECONDS from the microphone and writes the FFT data to the file fft.bin.
Program2 reads from this file, does the IFFT and plays the result on the speakers.
In addition, I found that every change to the FFT data, even a very small one, causes Program2 to fail.
Where is my mistake?
Program1:
import pickle
import pyaudio
import wave
import numpy as np
CHUNK = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 1 #1-mono, 2-stereo
RATE = 44100
RECORD_SECONDS = 2
p = pyaudio.PyAudio()
stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)
f = open("fft.bin", "wb")
Tsamp = 1./RATE
#arguments for a fft
fft_x_arg = np.fft.rfftfreq(CHUNK/2, Tsamp)
#max freq
Fmax = 4000
print("* recording")
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    #read one chunk from mic
    SigString = stream.read(CHUNK)
    #convert string to int
    SigInt = np.fromstring(SigString, 'int')
    #calculate fft
    fft_Sig = np.fft.rfft(SigInt)
    """
    #apply low pass filter, maximum freq = Fmax
    j=0
    for value in fft_x_arg:
        if value > Fmax:
            fft_Sig[j] = 0
        j=j+1
    """
    #write one chunk of data to file
    pickle.dump(fft_Sig,f)
print("* done recording")
f.close()
stream.stop_stream()
stream.close()
p.terminate()
Program2:
import pyaudio
import pickle
import numpy as np
CHUNK = 1024
p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16,
                channels=1,
                rate=44100/2, #anyway, why 44100 Hz plays twice faster than normal?
                output=True)
f = open("fft.bin", "rb")
#load first value from file
fft_Sig = pickle.load(f)
#calculate ifft and cast do int
SigInt = np.int16(np.fft.irfft(fft_Sig))
#convert once more - to string
SigString = np.ndarray.tostring(SigInt)
while SigString != '':
    #play sound
    stream.write(SigString)
    fft_Sig = pickle.load(f)
    SigInt = np.int16(np.fft.irfft(fft_Sig))
    SigString = np.ndarray.tostring(SigInt)
f.close()
stream.stop_stream()
stream.close()
p.terminate()
FFTs operate on complex numbers. You can feed them real numbers (which are treated as complex values with an imaginary part of 0), but the output of the forward transform is always complex.
This is probably throwing off your sample counting by 2 among other things. It will also trash your output if you do not convert back to real data correctly.
Also, check the 1/N scale factor on the IFFT output: some FFT libraries leave it to you, while numpy's inverse transforms apply it by default. And you need to keep in mind that half of an FFT's frequency range is negative, that is, it covers approximately -1/(2T) <= f < 1/(2T). BTW, 1/(2T) is known as the Nyquist frequency, and for real input data the negative half of the FFT output mirrors the positive half as its complex conjugate, i.e. for y(f) = F{x(t)} (where F{} is the forward Fourier transform), y(-f) == conj(y(f)).
I think you need to read up a bit more on DSP algorithms using FFTs. What you're trying to do is called a brick wall filter.
Also, something that will help you a lot is matplotlib, which will help you see what the data looks like at intermediate steps. You need to look at this intermediate data to find out where things are going wrong.
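A brick-wall low-pass on a single chunk can be sketched with numpy's real-input transforms, which handle the mirrored negative half and the 1/N scaling internally. This is an illustration under assumptions, not a fix verified against the question's setup: it assumes 16-bit mono samples, `brick_wall_lowpass` is a made-up name, and a synthetic two-tone chunk stands in for microphone data. Note that decoding `paInt16` bytes with a wider integer dtype would yield fewer values than were recorded, which may be related to the playback speed problem in the question.

```python
import numpy as np

def brick_wall_lowpass(block, rate, fmax):
    """Zero all FFT bins above fmax (Hz) and return the filtered int16 block."""
    x = np.asarray(block, dtype=np.float64)
    spectrum = np.fft.rfft(x)
    # Frequencies must be computed for the actual block length, not half of it.
    freqs = np.fft.rfftfreq(len(x), d=1.0 / rate)
    spectrum[freqs > fmax] = 0
    # irfft returns real samples; n=len(x) restores the original length.
    y = np.fft.irfft(spectrum, n=len(x))
    return np.clip(np.rint(y), -32768, 32767).astype(np.int16)

# Synthetic chunk: one tone below and one above the 4000 Hz cutoff.
rate, n = 44100, 1024
t = np.arange(n)
low = 5000 * np.sin(2 * np.pi * 20 * t / n)    # about 861 Hz
high = 5000 * np.sin(2 * np.pi * 200 * t / n)  # about 8613 Hz
raw = np.rint(low + high).astype(np.int16).tobytes()

block = np.frombuffer(raw, dtype=np.int16)  # decode with the recording's dtype
filtered = brick_wall_lowpass(block, rate, fmax=4000)
print(np.max(np.abs(filtered - low)))
```

Filtering each chunk independently like this tends to produce audible artifacts at the chunk boundaries for real signals, so plotting the intermediate data, as suggested above, is a good way to judge whether the result is acceptable.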
What I am trying to achieve is the following: I need the frequency values of a sound file (.wav) for analysis. I know a lot of programs will give a visual graph (spectrogram) of the values, but I need the raw data. I know this can be done with an FFT and should be fairly easy to script in Python, but I'm not sure exactly how to do it.
So let's say that a signal in a file is .4s long then I would like multiple measurements giving an output as an array for each timepoint the program measures and what value (frequency) it found (and possibly power (dB) too). The complicated thing is that I want to analyse bird songs, and they often have harmonics or the signal is over a range of frequency (e.g. 1000-2000 Hz). I would like the program to output this information as well, since this is important for the analysis I would like to do with the data :)
Now there is a piece of code that looked very much like I wanted, but I think it does not give me all the values I want.... (thanks to Justin Peel for posting this to a different question :)) So I gather that I need numpy and pyaudio but unfortunately I am not familiar with python so I am hoping that a Python expert can help me on this?
Source Code:
# Read in a WAV and find the freq's
import pyaudio
import wave
import numpy as np
chunk = 2048
# open up a wave
wf = wave.open('test-tones/440hz.wav', 'rb')
swidth = wf.getsampwidth()
RATE = wf.getframerate()
# use a Blackman window
window = np.blackman(chunk)
# open stream
p = pyaudio.PyAudio()
stream = p.open(format =
                p.get_format_from_width(wf.getsampwidth()),
                channels = wf.getnchannels(),
                rate = RATE,
                output = True)
# read some data
data = wf.readframes(chunk)
# play stream and find the frequency of each chunk
while len(data) == chunk*swidth:
    # write data out to the audio stream
    stream.write(data)
    # unpack the data and multiply by the Blackman window
    indata = np.array(wave.struct.unpack("%dh"%(len(data)//swidth),
                                         data))*window
    # Take the fft and square each value
    fftData=abs(np.fft.rfft(indata))**2
    # find the maximum
    which = fftData[1:].argmax() + 1
    # use quadratic interpolation around the max
    if which != len(fftData)-1:
        y0,y1,y2 = np.log(fftData[which-1:which+2:])
        x1 = (y2 - y0) * .5 / (2 * y1 - y2 - y0)
        # find the frequency and output it
        thefreq = (which+x1)*RATE/chunk
        print("The freq is %f Hz." % (thefreq))
    else:
        thefreq = which*RATE/chunk
        print("The freq is %f Hz." % (thefreq))
    # read some more data
    data = wf.readframes(chunk)
if data:
    stream.write(data)
stream.close()
p.terminate()
I'm not sure if this is what you want, if you just want the FFT:
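import numpy as np
import scipy.io.wavfile
# scipy.io.wavfile is used here in place of the unmaintained scikits.audiolab reader
fs, x = scipy.io.wavfile.read(filename)  # filename: path to your WAV file
X = np.fft.fft(x, axis=0)  # axis=0 is the time axis, also for stereo files
If you want the magnitude response:
import matplotlib.pyplot as plt
Xdb = 20*np.log10(np.absolute(X))
f = np.linspace(0, fs, len(Xdb))
plt.plot(f, Xdb)
plt.show()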
I think that what you need is a Short-time Fourier Transform (STFT). Basically, you take multiple partially overlapping FFTs, one for each short window of the signal, and line them up along the time axis. Then you find the peak (or peaks) for each point in time. I haven't done this myself, but I've looked into it in the past and this is definitely the way forward.
There's some Python code to do a STFT here and here.
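As a rough illustration of the STFT idea, here is a numpy-only sketch that slides a windowed FFT over the signal and reports, for each frame, its centre time, the strongest frequency and that peak's level in dB. The function name `stft_peaks`, the frame length of 2048 and the hop of 512 samples are assumptions chosen for the example, and a synthetic 0.4 s signal replaces a real recording.

```python
import numpy as np

def stft_peaks(samples, rate, frame=2048, hop=512):
    """Return (times, peak frequencies in Hz, peak levels in dB) per frame."""
    x = np.asarray(samples, dtype=np.float64)
    window = np.hanning(frame)
    freqs = np.fft.rfftfreq(frame, d=1.0 / rate)
    times, peak_freqs, peak_db = [], [], []
    # Frames overlap because hop is smaller than frame.
    for start in range(0, len(x) - frame + 1, hop):
        spectrum = np.abs(np.fft.rfft(x[start:start + frame] * window))
        k = int(np.argmax(spectrum[1:])) + 1  # skip the DC bin
        times.append((start + frame / 2) / rate)
        peak_freqs.append(freqs[k])
        # Small offset avoids log10(0); the dB values are relative, not calibrated.
        peak_db.append(20 * np.log10(spectrum[k] + 1e-12))
    return np.array(times), np.array(peak_freqs), np.array(peak_db)

# Synthetic 0.4 s signal: 1000 Hz for the first half, 2000 Hz for the second.
rate = 44100
t = np.arange(int(0.4 * rate)) / rate
signal = np.where(t < 0.2, np.sin(2 * np.pi * 1000 * t), np.sin(2 * np.pi * 2000 * t))

times, peak_freqs, peak_db = stft_peaks(signal, rate)
for tm, f, db in zip(times[::10], peak_freqs[::10], peak_db[::10]):
    print("%.3f s  %.1f Hz  %.1f dB" % (tm, f, db))
```

For signals with harmonics or a band of energy, such as bird song, keeping the whole `spectrum` of every frame (or every bin above some threshold) instead of only the single largest bin would preserve that information. The frequency resolution is `rate / frame` Hz, so longer frames resolve pitch better at the cost of time resolution.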