Why does the tone change when I change the block size?

import pyaudio
import numpy as np

RATE = 44100
block = 64

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paFloat32,
                 channels=1,
                 rate=RATE,
                 output=True)

while True:
    x = np.arange(block, dtype=np.float32)
    output = np.cos(2*np.pi*2000*x/44100)
    output = output.tobytes()
    stream.write(output)
I want to play a 2000 Hz cosine wave with a block size of 64. Why does the tone change when I change the block size? It should be a fixed tone at that frequency whatever the block size is, shouldn't it?

I'm not sure what you are trying to achieve with your calculation. For a 2 kHz sound you need 2000 sine waves every second, i.e. one sine wave every ~22 samples (44100/2000) or every 0.5 ms. The best way to find such formulas is to grab pen and paper and work out what you actually want (how to combine frequency, sampling rate, and desired block length). One possible way is below, but try to understand the math behind it (untested):
import pyaudio
import numpy as np

RATE = 44100
FREQUENCY = 2000

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paFloat32,
                 channels=1,
                 rate=RATE,
                 output=True)

sample_len = 4000.0
wave_len = float(RATE) / FREQUENCY  # ~22 samples per wave

# x goes from 0 to 1 for approx. index 0..wave_len-1, 1 to 2 for wave_len..2*wave_len-1, ...
x = np.arange(sample_len, dtype=np.float32) / wave_len

# 0..1 -> 0..1..0..-1..0; 1..2 -> 0..1..0..-1..0
# yes, I prefer sin over cos
output = np.sin(2*np.pi*x)
output = output.tobytes()

# no need to recreate the pattern every cycle
while True:
    stream.write(output)
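Two details are worth noting. First, the question's loop restarts the phase at zero on every block; 64 samples do not contain a whole number of 22.05-sample periods, so the waveform jumps at each block boundary, and changing the block size changes the size of that jump (and hence the perceived tone). Second, 4000 samples is not an exact multiple of 22.05 either, so even the precomputed buffer above clicks slightly at the loop seam. A minimal sketch (untested, not from the original answer) that keeps a running sample index so the phase stays continuous for any block size:

import pyaudio
import numpy as np

RATE = 44100
FREQUENCY = 2000
BLOCK = 64  # any block size works, the phase carries over

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paFloat32,
                 channels=1,
                 rate=RATE,
                 output=True)

n = 0  # running sample index, continues across blocks
while True:
    x = np.arange(n, n + BLOCK, dtype=np.float32)
    samples = np.sin(2*np.pi*FREQUENCY*x/RATE).astype(np.float32)
    stream.write(samples.tobytes())
    # wrapping once per second keeps x small enough for float32 precision;
    # the wrap is seamless because FREQUENCY is a whole number of cycles per second
    n = (n + BLOCK) % RATE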

Related

How to generate frequency sweep sound in python

So I'm trying to generate a controllable tone with python. I'm using PyAudio and the simple code is like this (source):
import pyaudio
import numpy as np

p = pyaudio.PyAudio()

fs = 44100       # sampling rate, Hz, must be integer
duration = 1.0   # in seconds, may be float
f = 440.0        # sine frequency, Hz, may be float

# generate samples, note conversion to float32 array
samples = (np.sin(2*np.pi*np.arange(fs*duration)*f/fs)).astype(np.float32)

# for paFloat32 sample values must be in range [-1.0, 1.0]
stream = p.open(format=pyaudio.paFloat32,
                channels=1,
                rate=fs,
                output=True)

# play. May repeat with different volume values (if done interactively)
stream.write(samples.tobytes())

stream.stop_stream()
stream.close()
p.terminate()
The problem is that when I play two different frequencies in sequence, the transition isn't smooth. I think it's because of the discontinuity between the end of the first frequency and the beginning of the second. I'm trying to get a smooth transition with this code:
fstart = 440.0     # first frequency
fstop = 523.6      # second frequency
transition = 0.5   # transition time
finest = 10        # fineness of the transition
duration = 2.0     # duration of first and second frequency

samples = np.asarray([])
samples = np.append(samples, (np.sin(2*np.pi*np.arange(fs*duration)*fstart/fs)).astype(np.float32)).astype(np.float32)
for x in range(0, finest):
    freq = fstart + (fstop-fstart)*x/finest
    time = transition/finest
    samples = np.append(samples, (np.sin(2*np.pi*np.arange(fs*time)*freq/fs)).astype(np.float32)).astype(np.float32)
samples = np.append(samples, (np.sin(2*np.pi*np.arange(fs*duration)*fstop/fs)).astype(np.float32)).astype(np.float32)
But as expected, it also didn't go well, and when finest is set high the sound is very distorted. Any idea how to make this smooth, like a frequency sweep sound?
Thank you.
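A common fix (sketched here for reference; it is not from the original post) is to make the phase continuous instead of concatenating segments that each restart at phase zero: build the instantaneous frequency curve for the whole signal, integrate it into a phase, and take the sine of the accumulated phase:

import numpy as np

fs = 44100
fstart, fstop = 440.0, 523.6
duration = 2.0    # seconds of steady tone before and after
transition = 0.5  # seconds of sweep in between

# instantaneous frequency, sample by sample
freqs = np.concatenate([
    np.full(int(fs*duration), fstart),
    np.linspace(fstart, fstop, int(fs*transition)),
    np.full(int(fs*duration), fstop),
])

# phase is the cumulative sum (discrete integral) of frequency -> no jumps anywhere
phase = 2*np.pi*np.cumsum(freqs)/fs
samples = np.sin(phase).astype(np.float32)

The resulting samples array can be written to the stream exactly as in the first snippet; since every sample's phase follows continuously from the previous one, there are no clicks at the segment boundaries.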

How to write pyaudio output into audio file?

I currently have the following code, which produces a sine wave of varying frequencies using the pyaudio module:
import pyaudio
import numpy as np

p = pyaudio.PyAudio()

volume = 0.5
fs = 44100
duration = 1
f = 440

samples = (np.sin(2 * np.pi * np.arange(fs * duration) * f / fs)).astype(np.float32).tobytes()

stream = p.open(format=pyaudio.paFloat32,
                channels=1,
                rate=fs,
                output=True)
stream.write(samples)
However, instead of playing the sound, is there any way to make it so that the sound is written into an audio file?
Add this code at the top of your code:
from scipy.io.wavfile import write
Also, add this code at the bottom, where s stands for the float sample array (in your code, the samples before the .tobytes() call):
scaled = numpy.int16(s / numpy.max(numpy.abs(s)) * 32767)
write('test.wav', 44100, scaled)
This worked for me.
Using scipy.io.wavfile.write as suggested by @h lee produced the desired results:
import numpy
from scipy.io.wavfile import write

volume = 1
sample_rate = 44100
duration = 10
frequency = 1000

samples = (numpy.sin(2 * numpy.pi * numpy.arange(sample_rate * duration)
                     * frequency / sample_rate)).astype(numpy.float32)

write('test.wav', sample_rate, samples)
Another example can be found on the documentation: https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.wavfile.write.html
Handle your audio input as a numpy array like I did here in the second answer, but instead of just processing the frames and sending the data back to PyAudio, save each frame in a new output_array. Then, when the processing is done, you can use that output_array to write it to a .wav or .mp3 file.
If you do that, however, the sound will still play. If you don't want to play the sound you have two options: either use blocking mode, or, if you want to stick with non-blocking mode and the callbacks, do the following (see the sketch after this list):
Remove output=True so that it falls back to its default of False.
Add an input=True parameter.
In your callback do not return ret_data; return None instead.
Keep count of the frames you've processed so that, when you're done, you return paComplete as the second value of the returned tuple.
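A minimal sketch of that callback pattern, with illustrative names (output_frames, RECORD_SECONDS and the chosen format are not from the original answer):

import time
import wave
import pyaudio

RATE, CHUNK, RECORD_SECONDS = 44100, 1024, 5
total_chunks = int(RATE / CHUNK * RECORD_SECONDS)
output_frames = []  # collects raw frames for the file

def callback(in_data, frame_count, time_info, status):
    output_frames.append(in_data)  # save instead of playing back
    done = len(output_frames) >= total_chunks
    # returning None as the data means nothing is sent to an output device
    return None, (pyaudio.paComplete if done else pyaudio.paContinue)

p = pyaudio.PyAudio()
sample_width = p.get_sample_size(pyaudio.paInt16)
stream = p.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                input=True,  # capture only; output stays at its default of False
                frames_per_buffer=CHUNK,
                stream_callback=callback)

stream.start_stream()
while stream.is_active():  # runs until the callback returns paComplete
    time.sleep(0.1)
stream.stop_stream()
stream.close()
p.terminate()

with wave.open('output.wav', 'wb') as wf:
    wf.setnchannels(1)
    wf.setsampwidth(sample_width)
    wf.setframerate(RATE)
    wf.writeframes(b''.join(output_frames))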

Transforming small WAV files into a single frequency value (PYTHON)

I need a program that transforms tones recorded by the microphone into keyboard presses. Example: if somebody sings into the microphone at a frequency between 400 Hz and 600 Hz and the average tone is 550 Hz, then I store the average frequency in the variable tom, and the key "G" of my keyboard is pressed.
Even though I'm a newbie at programming, I searched and figured out a way to do this: using PyAudio, I can record small WAV files, then read those and get a number as the average frequency, and with this number and some ifs and elifs press keys (it's not that hard to find code for pressing keys), all inside an enormous while loop that repeats the process as long as the program runs. So I would have the process of talking, reading the small files the talk produces, and then transforming them into key presses according to the tone.
The main problem is that I have no idea how to turn the WAV files I've been recording into a single average frequency number. Can somebody help me with this? Or with the big picture? Because I know this method is not a really good one. Thanks! I was using this code to record, which I found on the PyAudio website:
import pyaudio
import wave
import numpy as np

CHUNK = 2048
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
RECORD_SECONDS = 1
WAVE_OUTPUT_FILENAME = "output1.wav"

p = pyaudio.PyAudio()
stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)

print("* recording")
frames = []
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)
print("* done chunk")

stream.stop_stream()
stream.close()
p.terminate()
To press the keys, this other code:
import win32com.client

shell = win32com.client.Dispatch("WScript.Shell")
if tom >= 400 and tom <= 500:
    shell.SendKeys("G")
P.S.: I'm using Windows.
You can use the Fourier transform to convert sound into frequencies.
More specifically, use the one-dimensional discrete Fourier Transform provided by numpy.fft.rfft.
Here is an example that reads a single second from a stereo WAV file and extracts the frequencies:
import wave
import numpy as np

with wave.open('input.wav', 'r') as wr:
    sz = wr.getframerate()  # read and process 1 second
    da = np.frombuffer(wr.readframes(sz), dtype=np.int16)
    left, right = da[0::2], da[1::2]  # separate into left and right channel
    lf, rf = np.absolute(np.fft.rfft(left)), np.absolute(np.fft.rfft(right))
The lf and rf are numpy arrays containing the intensity of each frequency. Using numpy.argmax you can get the index (frequency) with the highest strength.
But try it and graph the result using e.g. matplotlib. You'll see that there are probably multiple peaks in the data. For example, you might find a peak at 50 Hz or 60 Hz. This is most probably interference from mains electricity and should be ignored by zeroing out that part of the data. Since exactly one second of audio was read, bin k of the rfft output corresponds to k Hz, so for 60 Hz hum:
lf[55:65], rf[55:65] = 0, 0
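To get the single average frequency the question asks for (the tom variable), take the index of the strongest remaining bin with numpy.argmax, as suggested above. A minimal sketch under the same one-second, 1 Hz-per-bin assumption (the function name is made up for this example):

import numpy as np

def dominant_frequency(mag_spectrum, hum_band=(55, 65)):
    # Assumes a 1-second window, so bin index == frequency in Hz.
    spec = mag_spectrum.copy()
    spec[hum_band[0]:hum_band[1]] = 0  # zero out mains interference first
    return int(np.argmax(spec))

# e.g. with lf from the snippet above:
# tom = dominant_frequency(lf)
# if 400 <= tom <= 500:
#     shell.SendKeys("G")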
An example plot made with matplotlib from a one-second sound clip (image not reproduced here) showed the samples from the WAV file in the top graph and the same data converted to frequencies in the bottom one. It was a recording of a person speaking, so there were many peaks; the highest was around 200 Hz.

Filtering a wav file using python

So I recently successfully built a system that records, plots, and plays back an audio WAV file entirely with Python. Now I'm trying to put some filtering and audio mixing in between the recording and the plotting/playback, but I have no idea where to start. Right now I want to read in the initial WAV file, apply a low-pass filter, and then re-pack the newly filtered data into a new WAV file. Here is the code I used to plot the initial data once I recorded it:
import matplotlib.pyplot as plt
import numpy as np
import wave
import sys

spf = wave.open('wavfile.wav', 'r')

# Extract raw audio from WAV file
signal = spf.readframes(-1)
signal = np.frombuffer(signal, dtype=np.int16)

plt.figure(1)
plt.title('Signal Wave...')
plt.plot(signal)
And here is some code i used to generate a test audio file of a single tone:
import numpy as np
import wave
import struct

freq = 440.0
data_size = 40000
fname = "High_A.wav"
frate = 11025.0
amp = 64000.0

sine_list_x = []
for x in range(data_size):
    sine_list_x.append(np.sin(2*np.pi*freq*(x/frate)))

wav_file = wave.open(fname, "w")
nchannels = 1
sampwidth = 2
framerate = int(frate)
nframes = data_size
comptype = "NONE"
compname = "not compressed"
wav_file.setparams((nchannels, sampwidth, framerate, nframes,
                    comptype, compname))
for s in sine_list_x:
    wav_file.writeframes(struct.pack('h', int(s*amp/2)))
wav_file.close()
I'm not really sure how to apply said audio filter and repack it, though. Any help and/or advice you could offer would be greatly appreciated.
First step: what kind of audio filter do you need?
Choose the filtered band:
Low-pass filter: removes the highest frequencies from your audio signal
High-pass filter: removes the lowest frequencies from your audio signal
Band-pass filter: removes both the highest and lowest frequencies from your audio signal
For the following steps, I assume you need a low-pass filter.
Choose your cutoff frequency
The cutoff frequency is the frequency where your signal is attenuated by -3 dB.
Your example signal is 440 Hz, so let's choose a cutoff frequency of 400 Hz. The 440 Hz signal then lies above the cutoff and is attenuated (by more than -3 dB) by the 400 Hz low-pass filter.
Choose your filter type
According to this other Stack Overflow answer:
Filter design is beyond the scope of Stack Overflow - that's a DSP problem, not a programming problem. Filter design is covered by any DSP textbook - go to your library. I like Proakis and Manolakis' Digital Signal Processing. (Ifeachor and Jervis' Digital Signal Processing isn't bad either.)
To keep the example simple, I suggest using a moving average filter (as a simple low-pass filter).
See Moving average:
Mathematically, a moving average is a type of convolution and so it can be viewed as an example of a low-pass filter used in signal processing
This moving-average low-pass filter is a basic filter, and it is quite easy to use and to understand.
Its only parameter is the window length.
The relationship between the moving-average window length and the cutoff frequency requires a little mathematics and is explained here.
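For reference, the formula used in the code below follows from the approximate -3 dB cutoff of an N-point moving average, F ≈ 0.442947 / sqrt(N² - 1), where F is the cutoff normalized to the sample rate; solving for N (and noting that 0.442947² ≈ 0.196202, the constant in the code):

\[
\frac{0.442947}{\sqrt{N^2 - 1}} = F
\quad\Longrightarrow\quad
N = \sqrt{1 + \frac{0.442947^2}{F^2}} = \frac{\sqrt{F^2 + 0.196202}}{F}
\]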
The code will be
import math
sampleRate = 11025.0
cutOffFrequency = 400.0
freqRatio = cutOffFrequency / sampleRate
N = int(math.sqrt(0.196201 + freqRatio**2) / freqRatio)
So, in the example, the window length will be 12
Second step: coding the filter
Hand-made moving average
See this specific discussion on how to create a moving average in Python.
The solution from Alleo is:
def running_mean(x, windowSize):
    cumsum = numpy.cumsum(numpy.insert(x, 0, 0))
    return (cumsum[windowSize:] - cumsum[:-windowSize]) / windowSize

filtered = running_mean(signal, N)
Using lfilter
Alternatively, as suggested by dpwilson, we can also use scipy.signal.lfilter:
win = numpy.ones(N)
win *= 1.0/N
filtered = scipy.signal.lfilter(win, [1], signal).astype(channels.dtype)
(channels here is the array produced by interpret_wav in the full script below.)
Third step: putting it all together
import matplotlib.pyplot as plt
import numpy as np
import wave
import sys
import math
import contextlib

fname = 'test.wav'
outname = 'filtered.wav'

cutOffFrequency = 400.0

# from http://stackoverflow.com/questions/13728392/moving-average-or-running-mean
def running_mean(x, windowSize):
    cumsum = np.cumsum(np.insert(x, 0, 0))
    return (cumsum[windowSize:] - cumsum[:-windowSize]) / windowSize

# from http://stackoverflow.com/questions/2226853/interpreting-wav-data/2227174#2227174
def interpret_wav(raw_bytes, n_frames, n_channels, sample_width, interleaved=True):
    if sample_width == 1:
        dtype = np.uint8   # unsigned char
    elif sample_width == 2:
        dtype = np.int16   # signed 2-byte short
    else:
        raise ValueError("Only supports 8 and 16 bit audio formats.")

    channels = np.frombuffer(raw_bytes, dtype=dtype)

    if interleaved:
        # channels are interleaved, i.e. sample N of channel M follows sample N of channel M-1 in raw data
        channels.shape = (n_frames, n_channels)
        channels = channels.T
    else:
        # channels are not interleaved. All samples from channel M occur before all samples from channel M-1
        channels.shape = (n_channels, n_frames)

    return channels

with contextlib.closing(wave.open(fname, 'rb')) as spf:
    sampleRate = spf.getframerate()
    ampWidth = spf.getsampwidth()
    nChannels = spf.getnchannels()
    nFrames = spf.getnframes()

    # Extract raw audio from multi-channel WAV file
    signal = spf.readframes(nFrames * nChannels)
    channels = interpret_wav(signal, nFrames, nChannels, ampWidth, True)

    # get window size
    # from http://dsp.stackexchange.com/questions/9966/what-is-the-cut-off-frequency-of-a-moving-average-filter
    freqRatio = cutOffFrequency / sampleRate
    N = int(math.sqrt(0.196196 + freqRatio**2) / freqRatio)

    # use moving average (only on first channel)
    filtered = running_mean(channels[0], N).astype(channels.dtype)

    wav_file = wave.open(outname, "w")
    wav_file.setparams((1, ampWidth, sampleRate, nFrames, spf.getcomptype(), spf.getcompname()))
    wav_file.writeframes(filtered.tobytes('C'))
    wav_file.close()
The sox library can be used for static noise removal.
I found this gist, which has some useful commands as examples.

naive filtering using fft in python

I'm trying to write a naive low-pass filter using Python.
Values of the Fourier transform above a specific frequency should be set equal to 0, right?
As far as I know, that ought to work.
But after the inverse Fourier transform, what I get is just noise.
Program1 records RECORD_SECONDS from the microphone and writes the FFT data to the file fft.bin.
Program2 reads from this file, does the IFFT, and plays the result on the speakers.
In addition, I found that every change to the FFT data, even a very small one, causes Program2 to fail.
Where is my mistake?
Program1:
import pickle
import pyaudio
import wave
import numpy as np

CHUNK = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 1  # 1-mono, 2-stereo
RATE = 44100
RECORD_SECONDS = 2

p = pyaudio.PyAudio()
stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)

f = open("fft.bin", "wb")

Tsamp = 1./RATE
# arguments for the fft
fft_x_arg = np.fft.rfftfreq(CHUNK/2, Tsamp)
# max freq
Fmax = 4000

print("* recording")
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    # read one chunk from mic
    SigString = stream.read(CHUNK)
    # convert string to int
    SigInt = np.fromstring(SigString, 'int')
    # calculate fft
    fft_Sig = np.fft.rfft(SigInt)
    """
    # apply low pass filter, maximum freq = Fmax
    j = 0
    for value in fft_x_arg:
        if value > Fmax:
            fft_Sig[j] = 0
        j = j + 1
    """
    # write one chunk of data to file
    pickle.dump(fft_Sig, f)
print("* done recording")

f.close()
stream.stop_stream()
stream.close()
p.terminate()
Program2:
import pyaudio
import pickle
import numpy as np

CHUNK = 1024

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16,
                channels=1,
                rate=44100/2,  # anyway, why does 44100 Hz play twice as fast as normal?
                output=True)

f = open("fft.bin", "rb")

# load first value from file
fft_Sig = pickle.load(f)
# calculate ifft and cast to int
SigInt = np.int16(np.fft.irfft(fft_Sig))
# convert once more - to string
SigString = np.ndarray.tostring(SigInt)

while SigString != '':
    # play sound
    stream.write(SigString)
    fft_Sig = pickle.load(f)
    SigInt = np.int16(np.fft.irfft(fft_Sig))
    SigString = np.ndarray.tostring(SigInt)

f.close()
stream.stop_stream()
stream.close()
p.terminate()
FFTs operate on complex numbers. You might be able to feed them real numbers (which get converted to complex by setting the imaginary part to 0), but their outputs will always be complex.
This is probably throwing off your sample counting by a factor of 2, among other things. It should also be trashing your output, because you're not converting back to real data.
Also, you forgot to apply a 1/N scale factor to the IFFT output. And you need to keep in mind that the frequency range of an FFT is half negative: it covers approximately the range -1/(2T) <= f < 1/(2T). By the way, 1/(2T) is known as the Nyquist frequency, and for real input data the negative half of the FFT output mirrors the positive half: for y(f) = F{x(t)} (where F{} is the forward Fourier transform), y(-f) == conj(y(f)), so the magnitudes are symmetric.
I think you need to read up a bit more on DSP algorithms using FFTs. What you're trying to do is called a brick-wall filter.
Also, something that will help you a lot is matplotlib, which will let you see what the data looks like at intermediate steps. You need to look at this intermediate data to find out where things are going wrong.
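For illustration, here is a minimal sketch (not from the original answer) of a brick-wall low-pass on one int16 chunk using NumPy's real-input FFT helpers, which handle the conjugate-symmetric negative half and the 1/N normalization internally; the function name and cutoff are illustrative:

import numpy as np

RATE = 44100
FMAX = 4000  # cutoff frequency in Hz, as in the question

def brickwall_lowpass(chunk_bytes, rate=RATE, fmax=FMAX):
    # interpret the raw paInt16 bytes as 16-bit samples
    sig = np.frombuffer(chunk_bytes, dtype=np.int16)
    spec = np.fft.rfft(sig)                        # complex, positive half only
    freqs = np.fft.rfftfreq(len(sig), d=1.0/rate)  # frequency of each bin in Hz
    spec[freqs > fmax] = 0                         # zero everything above the cutoff
    out = np.fft.irfft(spec, n=len(sig))           # real output, correctly normalized
    return np.int16(out).tobytes()                 # back to raw bytes for playback

Without the zeroing line this round-trips the chunk essentially unchanged, which is an easy sanity check to plot with matplotlib before adding the filter.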
