I am trying to record microphone input, run an endpointing algorithm that finds where the speech starts and ends, and save a new wave file containing only the speech.
I managed to save a file, but when I play the recording it plays only half of the speech sequence.
1) What format should the array be in in order for me to successfully save it?
2) How can I convert it to that format?
I am using the following code for microphone recording, and the mistake is somewhere in the way I save the file:
If I call writeframes(frames) directly, it saves the complete 3-second mic input just fine.
import pyaudio
import wave
import numpy as np

FORMAT = pyaudio.paInt16 # 16-bit samples
CHANNELS = 1
RATE = 44100
CHUNK = 1024 # number of samples read from the buffer at a time
RECORD_SECONDS = 3
WAVE_OUTPUT_FILENAME = "file.wav"
audio = pyaudio.PyAudio()
# start Recording
stream = audio.open(format=FORMAT,
channels=CHANNELS,
rate=RATE, input=True,
frames_per_buffer=CHUNK)
print "recording..."
frames = []
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)
print "finished recording"
# stop Recording
stream.stop_stream()
stream.close()
audio.terminate()
frames = ''.join(frames)
# important! convert from string to int
amplitudeSamples = np.fromstring(frames, np.int16)
# Perform endpointing algorithm where I compute start and end indexes
# within amplitudeSamples array
voiceSample = amplitudeSamples[start:end]
# Here lies the problem
waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
waveFile.setnchannels(1)
waveFile.setsampwidth(2)
waveFile.setframerate(RATE)
waveFile.writeframes(voiceSample)
#waveFile.writeframesraw(voiceSample) # doesn't work either
waveFile.close()
Convert the numpy array to a string of bytes before writing it:
waveFile.writeframes(voiceSample.tostring())
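For example, here is a minimal sketch of the save step with that conversion in place; the output filename and the array contents are placeholders, and on current NumPy tobytes() is the equivalent of tostring():
import wave
import numpy as np

# placeholder: voiceSample would be the int16 slice produced by the endpointing step
voiceSample = np.zeros(44100, dtype=np.int16)

waveFile = wave.open("speech_only.wav", 'wb')   # illustrative output name
waveFile.setnchannels(1)
waveFile.setsampwidth(2)                        # 2 bytes per sample = 16-bit
waveFile.setframerate(44100)
waveFile.writeframes(voiceSample.tobytes())     # convert the array to raw bytes first
waveFile.close()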
I am trying to build a system that modifies the sound from my microphone while I talk. For example, when I use Google Meet or Zoom and run the program, I want to apply a filter such as pitch scaling to my microphone in real time.
I tried this code to open my microphone:
import pyaudio
import wave
chunk = 1024 # Record in chunks of 1024 samples
sample_format = pyaudio.paInt16 # 16 bits per sample
channels = 2
fs = 44100 # Record at 44100 samples per second
seconds = 3
filename = "output.wav"
p = pyaudio.PyAudio() # Create an interface to PortAudio
print('Recording')
stream = p.open(format=sample_format,
channels=channels,
rate=fs,
frames_per_buffer=chunk,
input=True)
frames = [] # Initialize array to store frames
# Store data in chunks for 3 seconds
for i in range(0, int(fs / chunk * seconds)):
    data = stream.read(chunk)
    frames.append(data)
# Stop and close the stream
stream.stop_stream()
stream.close()
# Terminate the PortAudio interface
p.terminate()
But I can only apply a filter to the sound data as I read it; I cannot apply the filter as a default setting of the microphone itself.
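For the "apply the filter as I read it" part, a minimal sketch of per-chunk processing could look like the following: each chunk is read from the microphone, modified, and written straight to an output stream. The gain change is only a stand-in for a real pitch-scaling filter, the stream is mono for simplicity, and routing the result into Meet or Zoom as a microphone would additionally need a virtual audio device, which this sketch does not cover:
import numpy as np
import pyaudio

chunk = 1024
p = pyaudio.PyAudio()

# one stream for the microphone, one for playback (or a virtual output device)
in_stream = p.open(format=pyaudio.paInt16, channels=1, rate=44100,
                   input=True, frames_per_buffer=chunk)
out_stream = p.open(format=pyaudio.paInt16, channels=1, rate=44100,
                    output=True, frames_per_buffer=chunk)

try:
    while True:
        data = in_stream.read(chunk)
        samples = np.frombuffer(data, dtype=np.int16)
        # placeholder "filter": halve the volume; a pitch-scaling filter
        # would transform `samples` here instead
        filtered = (samples * 0.5).astype(np.int16)
        out_stream.write(filtered.tobytes())
except KeyboardInterrupt:
    pass
finally:
    in_stream.stop_stream(); in_stream.close()
    out_stream.stop_stream(); out_stream.close()
    p.terminate()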
Heads up, this is my first real programming project, but I'm really dedicated to making it work and would love some input.
I've written a program that records sound using PyAudio and then halts the recording once the sound intensity drops under a certain threshold. To do this, I took the audio data, turned it into integer data, averaged each chunk of data the program collected, and then set the program to halt after the average dropped below a threshold that I picked after some trial and error.
The issue is that the averages of the data chunks don't seem to actually correlate with the intensity of the audio input, and the program sometimes drops below the threshold and halts recording even when there is significant input (e.g., constant music playing). Below is the code:
import pyaudio
import struct
import wave
def record(outputFile):
    # defining audio variables
    chunk = 1024
    FORMAT = pyaudio.paInt16
    CHANNELS = 1
    RATE = 44100
    Y = 100

    # Calling pyaudio module and starting recording
    p = pyaudio.PyAudio()
    stream = p.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=chunk)
    stream.start_stream()
    print("Starting!")

    # Recording data until under threshold
    frames = []
    while True:
        # Converting chunk data into integers
        data = stream.read(chunk)
        data_int = struct.unpack(str(2*chunk) + 'B', data)
        # Finding average intensity per chunk
        avg_data = sum(data_int)/len(data_int)
        print(str(avg_data))
        # Recording chunk data
        frames.append(data)
        if avg_data < Y:
            break

    # Stopping recording
    stream.stop_stream()
    stream.close()
    p.terminate()
    print("Ending recording!")

    # Saving file with wave module
    wf = wave.open(outputFile, 'wb')
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(p.get_sample_size(FORMAT))
    wf.setframerate(RATE)
    wf.writeframes(b''.join(frames))
    wf.close()

record('outputFile1.wav')
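For reference, here is a minimal sketch of a per-chunk level measure that does track loudness: unpack the bytes as signed 16-bit samples ('h' rather than 'B') and take the RMS instead of a plain average, since signed samples average out near zero regardless of volume. The threshold shown is an illustrative placeholder, not a tested value:
import math
import struct

def chunk_rms(data, chunk=1024):
    """Root-mean-square level of one chunk of 16-bit mono audio."""
    samples = struct.unpack(str(chunk) + 'h', data)  # signed 16-bit, not unsigned bytes
    return math.sqrt(sum(s * s for s in samples) / len(samples))

# Inside the recording loop one could then do, for some tuned threshold:
# if chunk_rms(data) < 500:   # 500 is an illustrative placeholder
#     break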
I want to know how to get signals from the different channels of a PC sound card using Python.
On one channel I want to get a simple signal like a sine wave, and on another I want to get a square wave.
I know that I can get a signal using pyaudio like this:
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 44100
CHUNK = 1024
p = pyaudio.PyAudio()
stream = p.open(
format=FORMAT,
channels=CHANNELS,
rate=RATE, input=True,
frames_per_buffer=CHUNK) #I get one signal
But using this method I can only get one signal at a time (one stream), and I need to get two simultaneous signals (two streams).
It looks like your only issue here is that you are only specifying one channel!
FORMAT = pyaudio.paInt16
CHANNELS = 1 # Change this to 2 !!!
RATE = 44100
CHUNK = 1024
p = pyaudio.PyAudio()
stream = p.open(
format=FORMAT,
channels=CHANNELS,
rate=RATE, input=True,
frames_per_buffer=CHUNK) #I get one signal
Just change the line where you specify the channels to:
CHANNELS = 2
Then you will receive 2 channels of audio :-)
You can increase this number further to record more channels at once, provided your audio interface has enough inputs.
Edit 1:
For example, this will let you read the two channels of incoming audio:
frames = []
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)
print("finished recording")
Your data (the two channel inputs, interleaved sample by sample) will then be stored in the frames list.
For a full 2 channel example see git hosted Python code here.
I was able to solve this by splitting the list in two and changing CHANNELS=2.
If sinal is the array I get from the PyAudio stream,
sinal[::2] is one channel and sinal[1::2] is the other.
There's an example below:
sinal = np.frombuffer(stream.read(CHUNK, exception_on_overflow=False), np.int16)
sinal[::2] # channel 1, for example
sinal[1::2] # channel 2, for example
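The same slicing can be expressed with a reshape, which generalizes if you later record more than two channels. This is only a sketch assuming the stream and CHUNK from the snippet above and 16-bit stereo input:
import numpy as np

# read one chunk of interleaved 16-bit stereo data (L, R, L, R, ...)
raw = stream.read(CHUNK, exception_on_overflow=False)
interleaved = np.frombuffer(raw, dtype=np.int16)

# reshape to (n_frames, 2): column 0 is channel 1, column 1 is channel 2
stereo = interleaved.reshape(-1, 2)
channel1 = stereo[:, 0]   # equivalent to interleaved[::2]
channel2 = stereo[:, 1]   # equivalent to interleaved[1::2]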
We have written a script that records two wav files:
1. Records the ambient noise
2. Records the ambient noise with voice
We are then going to use those two wav files as inputs for a third function that will subtract the ambient-noise wav file from the ambient-noise-with-voice file. The only problem is that when we run the script and call the combination() function, the resulting wav file simply combines the two preceding wav files. Our goal is an output in which the ambient noise is reduced and the voice is heard more clearly above it. Here is our script below:
import pyaudio
import wave
import matplotlib.pyplot as plt
import numpy as np
import scipy.io.wavfile
import scipy.signal as sp
def ambient():
    FORMAT = pyaudio.paInt16
    CHANNELS = 2
    RATE = 44100
    CHUNK = 1024
    RECORD_SECONDS = 5
    WAVE_OUTPUT_FILENAME = "ambientnoise.wav"
    audio = pyaudio.PyAudio()

    # start Recording
    stream = audio.open(format=FORMAT, channels=CHANNELS,
                        rate=RATE, input=True,
                        frames_per_buffer=CHUNK)
    print("recording...")
    frames = []
    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        frames.append(data)
    print("finished recording")

    # stop Recording
    stream.stop_stream()
    stream.close()
    audio.terminate()

    waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
    waveFile.setnchannels(CHANNELS)
    waveFile.setsampwidth(audio.get_sample_size(FORMAT))
    waveFile.setframerate(RATE)
    waveFile.writeframes(b''.join(frames))
    waveFile.close()

    x = scipy.io.wavfile.read('ambientnoise.wav')
    n = x[1]
    y = np.zeros(n.shape)
    y = n.cumsum(axis=0)
    times = np.linspace(0, len(n), len(n))
    plt.title("Plot 261 $speech1.wav\n $Secades, M.F.\spadesuit SIGNLAB \spadesuit 6Feb2018$")
    plt.xlabel("n")
    plt.ylabel("$speech1.wav$")
    plt.plot(times, n)
    plt.show()
def voice():
    FORMAT = pyaudio.paInt16
    CHANNELS = 2
    RATE = 44100
    CHUNK = 1024
    RECORD_SECONDS = 5
    WAVE_OUTPUT_FILENAME = "ambientwithvoice.wav"
    audio = pyaudio.PyAudio()

    # start Recording
    stream = audio.open(format=FORMAT, channels=CHANNELS,
                        rate=RATE, input=True,
                        frames_per_buffer=CHUNK)
    print("recording...")
    frames = []
    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        frames.append(data)
    print("finished recording")

    # stop Recording
    stream.stop_stream()
    stream.close()
    audio.terminate()

    waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
    waveFile.setnchannels(CHANNELS)
    waveFile.setsampwidth(audio.get_sample_size(FORMAT))
    waveFile.setframerate(RATE)
    waveFile.writeframes(b''.join(frames))
    waveFile.close()

    x = scipy.io.wavfile.read('ambientwithvoice.wav')
    n = x[1]
    y = np.zeros(n.shape)
    y = n.cumsum(axis=0)
    times = np.linspace(0, len(n), len(n))
    plt.title("Plot 261 $speech1.wav\n $Secades, M.F.\spadesuit SIGNLAB \spadesuit 6Feb2018$")
    plt.xlabel("n")
    plt.ylabel("$speech1.wav$")
    plt.plot(times, n)
    plt.show()
def combination():
    rate1, Data1 = scipy.io.wavfile.read('ambientnoise.wav')
    rate2, Data2 = scipy.io.wavfile.read('ambientwithvoice.wav')
    new_Data = [0] * len(Data1)
    for i in range(0, len(Data1)):
        new_Data[i] = Data2[i] + Data1[i]
    new_Data = np.array(new_Data)
    scipy.io.wavfile.write('filtered.wav', rate1, new_Data)

    x = scipy.io.wavfile.read('ambientwithvoice.wav')
    n = x[1]
    y = np.zeros(n.shape)
    y = n.cumsum(axis=0)
    times = np.linspace(0, len(n), len(n))
    plt.title("Plot 261 $speech1.wav\n $Secades, M.F.\spadesuit SIGNLAB \spadesuit 6Feb2018$")
    plt.xlabel("n")
    plt.ylabel("$speech1.wav$")
    plt.plot(times, n)
    plt.show()
We have designed a code that records two wav files: 1. Records the ambient noise 2. Records the ambient noise with voice
This means that, while the ambient noise is continuously going on in the background, two different recordings are made, one after the other. The first records only the noise; the second also has speech in it.
To simplify the explanation, let's assume the speech is not present (maybe the speaker simply said nothing). This should work similarly; noise from the first recording should be used to reduce the noise in the second recording - it does not matter if there is another signal present in the second recording or not. We know we were successful if the noise is reduced.
The situation looks like this: two separate recordings, each containing a different realization of the noise.
Now let's combine the two recordings, either by adding them or by subtracting one from the other.
Apparently, neither approach reduced the noise. Looking closely, the situation got worse: the noise amplitude in the resulting signal is higher than in either of the two recordings!
For the subtraction to work, the signal we subtract must be an exact replica of the noise in the speech recording (or at least a reasonable approximation). Therein lies the problem: we do not know the noise signal, because every time we record it, it looks different.
So what can we do?
Use a second microphone that records the noise at the same time as the speech, but does not record the speaker.
Apply domain knowledge (#1): if you know, for example, that the noise lies in a different frequency range than the speech, a filter can reduce the noise part (see the sketch after this list).
Apply domain knowledge (#2): if the noise is predictable (e.g. something periodic like a fan or an engine), create a mathematical model that predicts the noise and subtract that from the speech signal.
If the noise is "real noise" (statistically independent and broad-band) such as Gaussian white-noise, we're pretty much out of luck.
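To make option #1 concrete, here is a minimal sketch of a band-pass filter applied to the noisy speech recording with scipy.signal (imported as sp in the script above). The 300-3400 Hz speech band and the 4th-order Butterworth design are illustrative assumptions, not values from the original post:
import numpy as np
import scipy.io.wavfile
import scipy.signal as sp

# read the noisy speech recording produced by the script above
rate, data = scipy.io.wavfile.read('ambientwithvoice.wav')

# design a Butterworth band-pass filter around the main speech band
# (300-3400 Hz and order 4 are illustrative assumptions)
sos = sp.butter(4, [300.0, 3400.0], btype='bandpass', fs=rate, output='sos')

# filter each channel; data has shape (n_samples, n_channels) for a stereo file
filtered = sp.sosfilt(sos, data.astype(np.float64), axis=0)

# clip to the 16-bit range and write the result back as PCM
filtered = np.clip(filtered, -32768, 32767)
scipy.io.wavfile.write('speechband.wav', rate, filtered.astype(np.int16))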
In order to record a 2-second wav file, I used PyAudio (with Pyzo) and the following classic code to record a sound and save it:
import pyaudio
import wave
chunk = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 44100
RECORD_SECONDS = 2
WAVE_OUTPUT_FILENAME = "my_path//a_test_2.wav"
p = pyaudio.PyAudio()
# Create and initialize the stream object...
s = p.open(format = FORMAT,
channels = CHANNELS,
rate = RATE,
input = True,
frames_per_buffer = chunk)
print("---recording---")
d = []
print((RATE / chunk) * RECORD_SECONDS)
for i in range(0, (RATE // chunk * RECORD_SECONDS)):
    data = s.read(chunk)
    d.append(data)
    #s.write(data, chunk)
print("---done recording---")
s.close()
p.terminate()
wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(p.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(b''.join(d))
wf.close()
Then I used it, saying "aaa". Everything went fine, no errors.
But when I play the wav file, no "aaa" can be heard. I visualized the file in Audacity and I could see everything was just silence (zeros). So it seems Pyzo doesn't know where my microphone is, because it didn't use it. What can I do? Any ideas?
Or maybe it didn't write all the data recorded, but I don't know why.
I have already checked that my microphone is 16-bit and has a 44100 Hz sample rate.
You'll need to get this working step by step. To make sure that you're recording from the mic, I would suggest printing out the max of each chunk as you read it. Then you should see, in real time, a difference between background noise and your speech. For example:
import audioop
# all the setup stuff, then in your main data loop:
for i in range(0, (RATE // chunk * RECORD_SECONDS)):
    data = s.read(chunk)
    mx = audioop.max(data, 2)
    print(mx)
Usually this difference between background noise and speech is more than 10x, so you can easily see the size change in the numbers as they fly by.
Also, at the start, list all of your microphones to make sure that you're using the right one (get_device_info_by_index). For example, you could be reading from the "line in" rather than the microphone.
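For that device check, a minimal sketch using PyAudio's device-enumeration calls might look like this (device indices and names differ per machine; the index passed to input_device_index below is purely illustrative):
import pyaudio

p = pyaudio.PyAudio()
for i in range(p.get_device_count()):
    info = p.get_device_info_by_index(i)
    if info['maxInputChannels'] > 0:   # only show devices you can record from
        print(i, info['name'], info['maxInputChannels'])

# then open the stream on the device you actually want, e.g. index 1 (illustrative):
# s = p.open(format=pyaudio.paInt16, channels=1, rate=44100,
#            input=True, input_device_index=1, frames_per_buffer=1024)
p.terminate()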