I am trying to build a system that can modify the sound from my microphone while I talk. For example, when I use Google Meet or Zoom and run the program, I want to apply a filter such as pitch scaling to my microphone in real time.
I tried this code to open my microphone:
import pyaudio
import wave
chunk = 1024 # Record in chunks of 1024 samples
sample_format = pyaudio.paInt16 # 16 bits per sample
channels = 2
fs = 44100 # Record at 44100 samples per second
seconds = 3
filename = "output.wav"
p = pyaudio.PyAudio() # Create an interface to PortAudio
print('Recording')
stream = p.open(format=sample_format,
                channels=channels,
                rate=fs,
                frames_per_buffer=chunk,
                input=True)
frames = [] # Initialize array to store frames
# Store data in chunks for 3 seconds
for i in range(0, int(fs / chunk * seconds)):
    data = stream.read(chunk)
    frames.append(data)
# Stop and close the stream
stream.stop_stream()
stream.close()
# Terminate the PortAudio interface
p.terminate()
But I can only apply a filter to the sound data when I read it; I cannot apply the filter as a default setting of the microphone.
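For context, real-time processing is usually built on exactly this kind of read-process-write loop. Below is a minimal sketch (my own illustration, not part of the code above): it reads a chunk from the microphone, applies a placeholder "filter" (a simple gain), and writes the result to an output stream. Routing that output to a virtual microphone device that Meet/Zoom can select (e.g. VB-CABLE) is an assumption and is not shown here.

import numpy as np
import pyaudio

chunk = 1024
rate = 44100

p = pyaudio.PyAudio()
stream_in = p.open(format=pyaudio.paInt16, channels=1, rate=rate,
                   input=True, frames_per_buffer=chunk)
stream_out = p.open(format=pyaudio.paInt16, channels=1, rate=rate,
                    output=True, frames_per_buffer=chunk)

try:
    while True:
        # read raw bytes and interpret them as signed 16-bit samples
        samples = np.frombuffer(stream_in.read(chunk), dtype=np.int16)
        # placeholder "filter": halve the volume (swap in pitch shifting etc.)
        processed = (samples * 0.5).astype(np.int16)
        stream_out.write(processed.tobytes())
except KeyboardInterrupt:
    pass
finally:
    stream_in.close()
    stream_out.close()
    p.terminate()

The output here goes to the default speakers; making it appear as a microphone to other applications requires a virtual audio device, which PyAudio itself does not provide.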
Heads up, this is my first real programming project, but I'm really dedicated to making it work and would love some input.
I've written a program that attempts to record sound using PyAudio and then halt the recording once the sound intensity has dropped under a certain threshold. To do this, I took the audio data, turned it into integer data, averaged each chunk of data the program collected, and set the program to halt after the average dropped below a threshold I picked after some trial and error. The issue is that the averages of the data chunks don't seem to actually correlate with the intensity of the audio input, and the program sometimes drops below the threshold and halts recording even when there is significant input (e.g., constant music playing). Below is the code:
import pyaudio
import struct
import wave
def record(outputFile):
    # defining audio variables
    chunk = 1024
    FORMAT = pyaudio.paInt16
    CHANNELS = 1
    RATE = 44100
    Y = 100

    # Calling pyaudio module and starting recording
    p = pyaudio.PyAudio()
    stream = p.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=chunk)
    stream.start_stream()
    print("Starting!")

    # Recording data until under threshold
    frames = []
    while True:
        # Converting chunk data into integers
        data = stream.read(chunk)
        data_int = struct.unpack(str(2 * chunk) + 'B', data)
        # Finding average intensity per chunk
        avg_data = sum(data_int) / len(data_int)
        print(str(avg_data))
        # Recording chunk data
        frames.append(data)
        if avg_data < Y:
            break

    # Stopping recording
    stream.stop_stream()
    stream.close()
    p.terminate()
    print("Ending recording!")

    # Saving file with wave module
    wf = wave.open(outputFile, 'wb')
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(p.get_sample_size(FORMAT))
    wf.setframerate(RATE)
    wf.writeframes(b''.join(frames))
    wf.close()

record('outputFile1.wav')
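A likely culprit (an assumption on my part, since the question does not confirm it) is that struct.unpack with the 'B' format interprets the buffer as unsigned bytes, so the average sits near 128 regardless of how loud the input is. A small sketch of measuring chunk intensity as the RMS of the signed 16-bit samples instead:

import numpy as np

def chunk_rms(data):
    # interpret the raw chunk as signed 16-bit samples and return the RMS,
    # which rises and falls with loudness
    samples = np.frombuffer(data, dtype=np.int16).astype(np.float64)
    return np.sqrt(np.mean(samples ** 2))

Inside the while loop, avg_data = chunk_rms(data) could replace the byte average; the threshold Y would then need to be re-tuned against RMS values.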
I need to slow down short bursts of spoken audio captured over a mic and then play them back in real time in a Python script. I can capture and play back audio fine, without changing the speed, using an input and an output stream with PyAudio, but I can't work out how to slow it down.
I've seen this post, which uses pydub to do something similar for audio from a file, but I can't work out how to modify it for my purposes.
Just to stress the key point from the question title, "(not from mp3/wav or any other file type)": I want to do this in real time with short bursts, ideally <= ~0.1 s, so I just want to work with data read in from a PyAudio stream.
Does anyone who has experience with pydub know if it might do what I need?
NB I realise that the output would lag further and further behind and that there might be buffering issues; however, I'm only doing this for short bursts of up to 30 seconds and only want to slow the speech down by ~10%.
So it turns out it was very, very simple.
Once I looked into the pydub and pyaudio code bases, I realised that by simply specifying a lower value for the 'rate' parameter on the output audio stream (speaker) than on the input audio stream (mic), the stream.write() function would handle it for me.
I had been expecting that a physical manipulation of the raw data would be required to transform the data into a larger buffer.
Here's a simple example:
import pyaudio
FORMAT = pyaudio.paInt16
CHANNELS = 1
FRAME_RATE = 44100
CHUNK = 1024*4
# simply modify the value for the 'rate' parameter to change the playback speed
# <1 === slow down; >1 === speed up
FRAMERATE_OFFSET = 0.8
audio = pyaudio.PyAudio()
#output stream
stream_out = audio.open(format=FORMAT,
                        channels=CHANNELS,
                        rate=int(FRAME_RATE * FRAMERATE_OFFSET),
                        output=True)

# open input stream to start recording mic audio
stream_in = audio.open(format=FORMAT,
                       channels=CHANNELS,
                       rate=FRAME_RATE,
                       input=True)

for i in range(1):
    # modify the chunk multiplier below to capture longer time durations
    data = stream_in.read(CHUNK * 25)
    stream_out.write(data)
stream_out.stop_stream()
stream_out.close()
audio.terminate()
To make this operational I'll need to set up a shared-memory data buffer and set up a subprocess to handle the output, so that I don't miss anything significant from the input signal.
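For illustration only, here is a rough sketch of that producer/consumer idea using a thread and a queue instead of shared memory and a subprocess (the names buf and playback_worker are mine; stream_in, stream_out, FRAME_RATE and CHUNK come from the snippet above):

import queue
import threading

buf = queue.Queue()

def playback_worker(out_stream):
    # drain the buffer and write to the (slower) output stream
    while True:
        data = buf.get()
        if data is None:  # sentinel: stop the worker
            break
        out_stream.write(data)

worker = threading.Thread(target=playback_worker, args=(stream_out,), daemon=True)
worker.start()

# keep reading the mic without blocking on playback (roughly 30 s here)
for _ in range(int(FRAME_RATE / CHUNK * 30)):
    buf.put(stream_in.read(CHUNK))

buf.put(None)
worker.join()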
Here is what I did.
import wave
channels = 1
swidth = 2
multiplier = 0.2
spf = wave.open('flute-a4.wav', 'rb')
fr=spf.getframerate() # frame rate
signal = spf.readframes(-1)
wf = wave.open('ship.wav', 'wb')
wf.setnchannels(channels)
wf.setsampwidth(swidth)
wf.setframerate(fr*multiplier)
wf.writeframes(signal)
wf.close()
I used the flute sample from this repo.
As mentioned in the comments, by simply increasing or decreasing the sampling frequency / frame rate, you can speed up or slow down audio. If you plan to do it from the microphone in real time, one idea is to record in chunks of a few seconds, play back the slowed-down audio, and then move on to recording again.
Here's an example using sounddevice, which is basically a slight mod of my answer here.
We record audio for 4 seconds, in a loop, 3 times, and play each chunk back immediately with a frame-rate offset (> 1 to speed up, < 1 to slow down). A 1-second delay is added so the playback can complete before we start a new chunk.
import sounddevice as sd
import numpy as np
import scipy.io.wavfile as wav
import time
fs=44100
duration = 4 # seconds
#fs_offset = 1.3  # speed up
fs_offset = 0.8   # slow down

for count in range(1, 4):
    myrecording = sd.rec(duration * fs, samplerate=fs, channels=2, dtype='float64')
    print("Recording Audio chunk {} for {} seconds".format(count, duration))
    sd.wait()
    print("Recording complete, Playing chunk {} with offset {} ".format(count, fs_offset))
    sd.play(myrecording, fs * fs_offset)
    sd.wait()
    print("Playing chunk {} Complete".format(count))
    time.sleep(1)
Output:
$python sdaudio.py
Recording Audio chunk 1 for 4 seconds
Recording complete, Playing chunk 1 with offset 0.8
Playing chunk 1 Complete
Recording Audio chunk 2 for 4 seconds
Recording complete, Playing chunk 2 with offset 0.8
Playing chunk 2 Complete
Recording Audio chunk 3 for 4 seconds
Recording complete, Playing chunk 3 with offset 0.8
Playing chunk 3 Complete
Here's an example using PyAudio for recording from the microphone and pydub for playback. You could also use PyAudio's blocking write capability to modify the outgoing audio, but I used pydub since you referred to a pydub-based solution. This is a mod of code from here.
import pyaudio
import wave
from pydub import AudioSegment
from pydub.playback import play
import time
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
CHUNK = 1024
RECORD_SECONDS = 4
#FRAMERATE_OFFSET = 1.4 #speedup
FRAMERATE_OFFSET = 0.7 #slowdown
WAVE_OUTPUT_FILENAME = "file.wav"
def get_audio():
    audio = pyaudio.PyAudio()

    # start Recording
    stream = audio.open(format=FORMAT, channels=CHANNELS,
                        rate=RATE, input=True,
                        frames_per_buffer=CHUNK)
    frames = []
    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        frames.append(data)

    # stop Recording
    stream.stop_stream()
    stream.close()
    audio.terminate()

    # save to file
    waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
    waveFile.setnchannels(CHANNELS)
    waveFile.setsampwidth(audio.get_sample_size(FORMAT))
    waveFile.setframerate(RATE * FRAMERATE_OFFSET)
    waveFile.writeframes(b''.join(frames))
    waveFile.close()

for count in range(1, 4):
    print("recording segment {} ....".format(count))
    get_audio()
    print("Playing segment {} .... at offset {}".format(count, FRAMERATE_OFFSET))
    audio_chunk = AudioSegment.from_wav(WAVE_OUTPUT_FILENAME)
    play(audio_chunk)
    print("Finished playing segment {} .... at offset {}".format(count, FRAMERATE_OFFSET))
    time.sleep(1)
Output:
$python slowAudio.py
recording segment 1 ....
Playing segment 1 .... at offset 0.7
Finished playing segment 1 .... at offset 0.7
recording segment 2 ....
Playing segment 2 .... at offset 0.7
Finished playing segment 2 .... at offset 0.7
recording segment 3 ....
Playing segment 3 .... at offset 0.7
This question has been answered here.
from pydub import AudioSegment
sound = AudioSegment.from_file(…)
def speed_change(sound, speed=1.0):
    # Manually override the frame_rate. This tells the computer how many
    # samples to play per second
    sound_with_altered_frame_rate = sound._spawn(sound.raw_data, overrides={
        "frame_rate": int(sound.frame_rate * speed)
    })

    # convert the sound with altered frame rate to a standard frame rate
    # so that regular playback programs will work right. They often only
    # know how to play audio at standard frame rate (like 44.1k)
    return sound_with_altered_frame_rate.set_frame_rate(sound.frame_rate)
slow_sound = speed_change(sound, 0.75)
fast_sound = speed_change(sound, 2.0)
I want to know how to get signals from different channels of a PC sound card using Python.
On one channel, I want to get a simple signal like a sine wave, and from another, a square wave.
I know that I can get a signal using pyaudio like this:
import pyaudio

FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 44100
CHUNK = 1024

p = pyaudio.PyAudio()
stream = p.open(
    format=FORMAT,
    channels=CHANNELS,
    rate=RATE, input=True,
    frames_per_buffer=CHUNK)  # I get one signal
But using this method I can only get one signal at a time (one stream), and I need to get two simultaneous signals (two streams).
It looks like your only issue here is you are only specifying one channel!
FORMAT = pyaudio.paInt16
CHANNELS = 1 # Change this to 2 !!!
RATE = 44100
CHUNK = 1024
p = pyaudio.PyAudio()
stream = p.open(
    format=FORMAT,
    channels=CHANNELS,
    rate=RATE, input=True,
    frames_per_buffer=CHUNK)  # I get one signal
Just change the line where you specify the channels to:
CHANNELS = 2
Then you will receive 2 channels of audio :-)
You can increase this number more to record more channels at once if your audio interface has enough inputs.
Edit 1:
This will let you read 2 streams of incoming audio for example:
frames = []
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)
print("finished recording")
Your 2 channels of audio (the two inputs, interleaved sample by sample) will then be stored in the frames list.
For a full 2 channel example see git hosted Python code here.
I was able to solve it by splitting the list in two and changing CHANNELS = 2.
If signal is the list I get from the pyaudio function,
signal[::2] is one channel and signal[1::2] is the other.
There's an example below:
sinal = np.frombuffer(stream.read(CHUNK, exception_on_overflow=False), np.int16)
sinal[::2]   # channel 1, for example
sinal[1::2]  # channel 2, for example
We have designed code that records two wav files:
1. Records the ambient noise
2. Records the ambient noise with voice
We then use those two wav files as inputs to a third function that is supposed to subtract the ambient-noise wav file from the ambient-noise-with-voice wav file. The only problem is that when we run the script and call the combination() function, the resulting wav file simply combines the two preceding wav files. Our goal is an output in which the ambient noise is reduced and the voice is heard more loudly than it. Here is our script:
import pyaudio
import wave
import matplotlib.pyplot as plt
import numpy as np
import scipy.io.wavfile
import scipy.signal as sp
def ambient():
    FORMAT = pyaudio.paInt16
    CHANNELS = 2
    RATE = 44100
    CHUNK = 1024
    RECORD_SECONDS = 5
    WAVE_OUTPUT_FILENAME = "ambientnoise.wav"

    audio = pyaudio.PyAudio()

    # start Recording
    stream = audio.open(format=FORMAT, channels=CHANNELS,
                        rate=RATE, input=True,
                        frames_per_buffer=CHUNK)
    print("recording...")
    frames = []
    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        frames.append(data)
    print("finished recording")

    # stop Recording
    stream.stop_stream()
    stream.close()
    audio.terminate()

    waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
    waveFile.setnchannels(CHANNELS)
    waveFile.setsampwidth(audio.get_sample_size(FORMAT))
    waveFile.setframerate(RATE)
    waveFile.writeframes(b''.join(frames))
    waveFile.close()

    x = scipy.io.wavfile.read('ambientnoise.wav')
    n = x[1]
    y = np.zeros(n.shape)
    y = n.cumsum(axis=0)
    times = np.linspace(0, len(n), len(n))

    plt.title("Plot 261 $speech1.wav\n $Secades, M.F.\spadesuit SIGNLAB \spadesuit 6Feb2018$")
    plt.xlabel("n")
    plt.ylabel("$speech1.wav$")
    plt.plot(times, n)
    plt.show()
def voice():
    FORMAT = pyaudio.paInt16
    CHANNELS = 2
    RATE = 44100
    CHUNK = 1024
    RECORD_SECONDS = 5
    WAVE_OUTPUT_FILENAME = "ambientwithvoice.wav"

    audio = pyaudio.PyAudio()

    # start Recording
    stream = audio.open(format=FORMAT, channels=CHANNELS,
                        rate=RATE, input=True,
                        frames_per_buffer=CHUNK)
    print("recording...")
    frames = []
    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        frames.append(data)
    print("finished recording")

    # stop Recording
    stream.stop_stream()
    stream.close()
    audio.terminate()

    waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
    waveFile.setnchannels(CHANNELS)
    waveFile.setsampwidth(audio.get_sample_size(FORMAT))
    waveFile.setframerate(RATE)
    waveFile.writeframes(b''.join(frames))
    waveFile.close()

    x = scipy.io.wavfile.read('ambientwithvoice.wav')
    n = x[1]
    y = np.zeros(n.shape)
    y = n.cumsum(axis=0)
    times = np.linspace(0, len(n), len(n))

    plt.title("Plot 261 $speech1.wav\n $Secades, M.F.\spadesuit SIGNLAB \spadesuit 6Feb2018$")
    plt.xlabel("n")
    plt.ylabel("$speech1.wav$")
    plt.plot(times, n)
    plt.show()
def combination():
    rate1, Data1 = scipy.io.wavfile.read('ambientnoise.wav')
    rate2, Data2 = scipy.io.wavfile.read('ambientwithvoice.wav')

    new_Data = [0] * len(Data1)
    for i in range(0, len(Data1)):
        new_Data[i] = Data2[i] + Data1[i]
    new_Data = np.array(new_Data)
    scipy.io.wavfile.write('filtered.wav', rate1, new_Data)

    x = scipy.io.wavfile.read('ambientwithvoice.wav')
    n = x[1]
    y = np.zeros(n.shape)
    y = n.cumsum(axis=0)
    times = np.linspace(0, len(n), len(n))

    plt.title("Plot 261 $speech1.wav\n $Secades, M.F.\spadesuit SIGNLAB \spadesuit 6Feb2018$")
    plt.xlabel("n")
    plt.ylabel("$speech1.wav$")
    plt.plot(times, n)
    plt.show()
We have designed a code that records two wav files: 1. Records the ambient noise 2. Records the ambient noise with voice
This means that, while the ambient noise goes on continuously in the background, two different recordings are made, one after the other. The first records only the noise; the second also has speech in it.
To simplify the explanation, let's assume the speech is not present (maybe the speaker simply said nothing). This should work similarly; noise from the first recording should be used to reduce the noise in the second recording - it does not matter if there is another signal present in the second recording or not. We know we were successful if the noise is reduced.
The situation looks like this:
Now let's combine the two recordings either by adding them or by subtracting:
Apparently, neither approach reduced the noise. Looking closely, the situation got worse: the noise amplitude in the resulting signal is higher than in either of the two recordings!
In order for this to work, the signal we subtract must be an exact replica of the noise in the speech signal (or at least a reasonable approximation). There lies the problem: we do not know the noise signal, because every time we record it, it looks different.
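A quick numeric illustration of why this fails (my own example, not from the original answer): subtracting two independent noise recordings adds their variances instead of cancelling them.

import numpy as np

rng = np.random.default_rng(0)
noise_1 = rng.normal(0, 1, 100_000)  # noise in recording 1
noise_2 = rng.normal(0, 1, 100_000)  # noise in recording 2 (independent)

print(np.std(noise_1), np.std(noise_2), np.std(noise_2 - noise_1))
# roughly 1.0, 1.0, 1.41 -- the "noise-subtracted" signal is noisier than either recording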
So what can we do?
Use a second microphone that records the noise at the same time as the speech, but does not record the speaker.
Apply domain knowledge (#1): if you know, for example, that the noise occupies a different frequency range than the speech, a filter can reduce the noise part (see the filtering sketch after this list).
Apply domain knowledge (#2): if the noise is predictable (e.g. something periodic like a fan or an engine) create a mathematical model that predicts the noise and subtract that from the speech signal.
If the noise is "real noise" (statistically independent and broad-band) such as Gaussian white-noise, we're pretty much out of luck.
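As a concrete illustration of the frequency-filtering idea in (#1) above (assuming, purely for the sake of the example, that the noise sits below the speech band), a high-pass filter with scipy.signal could look like this; the 100 Hz cutoff is arbitrary and would have to be chosen from the actual noise spectrum:

import numpy as np
import scipy.io.wavfile
import scipy.signal as sp

rate, data = scipy.io.wavfile.read('ambientwithvoice.wav')

# 4th-order Butterworth high-pass; the 100 Hz cutoff is only an example
sos = sp.butter(4, 100, btype='highpass', fs=rate, output='sos')
filtered = sp.sosfilt(sos, data, axis=0)

scipy.io.wavfile.write('filtered_highpass.wav', rate, filtered.astype(np.int16))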
I am trying to record microphone input, run an endpointing algorithm that finds where the speech starts and ends, and save a new wave file containing only the speech.
I managed to save a file, but when I play the recording it only plays half of the speech segment.
1) What format should the array be in in order for me to succesfuly save it?
2) How can I convert it to that format?
I am using the following code for microphone recording, and the mistake is somewhere in the way I save the file:
If I call writeframes(frames) instead, it nicely saves the complete 3-second mic input.
import pyaudio
import wave
import numpy as np

FORMAT = pyaudio.paInt16  # We use 16bit format per sample
CHANNELS = 1
RATE = 44100
CHUNK = 1024  # 1024 bytes of data read from a buffer
RECORD_SECONDS = 3
WAVE_OUTPUT_FILENAME = "file.wav"

audio = pyaudio.PyAudio()

# start Recording
stream = audio.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE, input=True,
                    frames_per_buffer=CHUNK)
print("recording...")
frames = []

for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)
print("finished recording")

# stop Recording
stream.stop_stream()
stream.close()
audio.terminate()

frames = b''.join(frames)

# important! convert from raw bytes to int
amplitudeSamples = np.fromstring(frames, np.int16)

# Perform endpointing algorithm where I compute start and end indexes
# within amplitudeSamples array
voiceSample = amplitudeSamples[start:end]

# Here lies the problem
waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
waveFile.setnchannels(1)
waveFile.setsampwidth(2)
waveFile.setframerate(RATE)
waveFile.writeframes(voiceSample)
# waveFile.writeframesraw(voiceSample)  # doesn't work either
waveFile.close()
Convert the numpy array back to a byte string before writing it (voiceSample.tobytes() is the equivalent, non-deprecated call in current NumPy):
waveFile.writeframes(voiceSample.tostring())