I need to slow down short bursts of spoken audio captured from a mic and play them out in realtime from a Python script. I can capture and play back audio without changing the speed using an input and an output stream in PyAudio, but I can't work out how to slow it down.
I've seen this post, which uses pydub to do something similar for audio from a file, but I can't work out how to modify it for my purposes.
Just to stress the key point from the question title, "(not from mp3/wav or any other file type)": I want to do this in realtime with short bursts, ideally <= ~0.1s, so I only want to work with data read from a PyAudio stream.
Does anyone who has experience with pydub know if it might do what I need?
NB: I realise that the output would lag further and further behind and that there might be buffering issues. However, I'm only doing this for short bursts of up to 30 seconds and only want to slow the speech down by ~10%.
So it turns out it was very simple.
Once I looked into the pydub and PyAudio code bases, I realised that by simply specifying a lower value for the 'rate' parameter on the output audio stream (speaker) than on the input audio stream (mic), the stream.write() function would handle it for me.
I had been expecting that a physical manipulation of the raw data would be required to transform the data into a larger buffer.
Here's a simple example:
import pyaudio
FORMAT = pyaudio.paInt16
CHANNELS = 1
FRAME_RATE = 44100
CHUNK = 1024*4
# simply modify the value for the 'rate' parameter to change the playback speed
# <1 === slow down; >1 === speed up
FRAMERATE_OFFSET = 0.8
audio = pyaudio.PyAudio()
# output stream
stream_out = audio.open(format=FORMAT,
                        channels=CHANNELS,
                        rate=int(FRAME_RATE * FRAMERATE_OFFSET),
                        output=True)
# open input stream to start recording mic audio
stream_in = audio.open(format=FORMAT,
                       channels=CHANNELS,
                       rate=FRAME_RATE,
                       input=True)
for i in range(1):
    # modify the chunk multiplier below to capture longer time durations
    data = stream_in.read(CHUNK*25)
    stream_out.write(data)
# release both streams, then PortAudio itself
stream_in.stop_stream()
stream_in.close()
stream_out.stop_stream()
stream_out.close()
audio.terminate()
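As a rough check on the numbers, the playback time and the resulting lag can be worked out from the two rates. The helper names below are illustrative and not part of PyAudio, and the sketch assumes the output device really runs at the requested rate.

```python
def playback_seconds(n_frames, capture_rate, offset):
    """Time needed to play n_frames when the output runs at capture_rate * offset."""
    output_rate = int(capture_rate * offset)
    return n_frames / output_rate


def lag_seconds(capture_seconds, offset):
    """How far playback has fallen behind after capture_seconds of input."""
    return capture_seconds / offset - capture_seconds


# The example reads CHUNK * 25 = 102400 frames at 44100 Hz
# and plays them through a stream opened at int(44100 * 0.8) Hz.
frames = 1024 * 4 * 25
captured = frames / 44100
played = playback_seconds(frames, 44100, 0.8)
print(round(captured, 2), round(played, 2))
```

For the case in the question (30 seconds of speech slowed by about 10%), `lag_seconds(30, 0.9)` gives roughly 3.3 seconds of accumulated delay.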
To make this operational I'll need to set up a shared memory data buffer and a subprocess to handle the output so that I don't miss anything significant from the input signal.
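One possible shape for that buffering is sketched below. It swaps in a worker thread and a `queue.Queue` rather than a subprocess with shared memory, purely to keep the example short; whether a thread is sufficient for a given machine is an assumption to verify. `pump`, `read_chunk` and `write_chunk` are hypothetical names, and stand-in callables replace the real streams so the sketch runs without audio hardware.

```python
import queue
import threading


def pump(read_chunk, write_chunk, n_chunks, max_buffered=64):
    """Read n_chunks with read_chunk() while a worker thread plays them."""
    buffer = queue.Queue(maxsize=max_buffered)

    def player():
        while True:
            data = buffer.get()
            if data is None:  # sentinel: capture has finished
                break
            write_chunk(data)

    worker = threading.Thread(target=player)
    worker.start()
    for _ in range(n_chunks):
        buffer.put(read_chunk())  # blocks only when the buffer is full
    buffer.put(None)
    worker.join()


# Stand-ins for stream_in.read(CHUNK) and stream_out.write(data).
captured = iter(bytes([i]) * 4 for i in range(10))
played = []
pump(lambda: next(captured), played.append, n_chunks=10)
print(len(played))
```

With real streams, the two callables would be something like `lambda: stream_in.read(CHUNK)` and `stream_out.write`. Because the slower output drains the queue more slowly than the input fills it, `max_buffered` bounds how much lag can build up before capture starts to block.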
Here is what I did.
import wave
channels = 1
swidth = 2
multiplier = 0.2
spf = wave.open('flute-a4.wav', 'rb')
fr = spf.getframerate()  # frame rate
signal = spf.readframes(-1)
spf.close()
# write the same frames back with a scaled frame rate in the header
wf = wave.open('ship.wav', 'wb')
wf.setnchannels(channels)
wf.setsampwidth(swidth)
wf.setframerate(int(fr * multiplier))
wf.writeframes(signal)
wf.close()
I used flute from this repo.
As mentioned in the comments, simply increasing or decreasing the sampling frequency (frame rate) speeds up or slows down the audio. If you are planning to do this from a microphone in realtime, one idea is to record in chunks of a few seconds, play the slowed-down audio, and then move on to recording again.
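The effect of changing only the declared frame rate can be checked without any audio hardware. The sketch below works on in-memory WAV data with the standard `wave` module; `retime_wav` and `wav_duration` are hypothetical helper names, and the test signal is one second of silence.

```python
import io
import wave


def retime_wav(wav_bytes, multiplier):
    """Return a copy of a WAV file whose header frame rate is scaled."""
    with wave.open(io.BytesIO(wav_bytes), 'rb') as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())
    out = io.BytesIO()
    with wave.open(out, 'wb') as dst:
        dst.setnchannels(params.nchannels)
        dst.setsampwidth(params.sampwidth)
        dst.setframerate(int(params.framerate * multiplier))
        dst.writeframes(frames)  # sample data is copied unchanged
    return out.getvalue()


def wav_duration(wav_bytes):
    """Duration in seconds implied by the WAV header."""
    with wave.open(io.BytesIO(wav_bytes), 'rb') as w:
        return w.getnframes() / w.getframerate()


# One second of 16-bit mono silence at 44100 Hz as a test signal.
buf = io.BytesIO()
with wave.open(buf, 'wb') as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(44100)
    w.writeframes(b'\x00\x00' * 44100)
original = buf.getvalue()
slowed = retime_wav(original, 0.8)
print(wav_duration(original), wav_duration(slowed))
```

The frames are identical in both files; only the rate at which a player is told to consume them differs, which is why the pitch drops along with the speed.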
Here's an example using sounddevice, which is basically a slight modification of my answer here.
We record audio for 4 seconds, 3 times in a loop, and play it back immediately with a frame rate offset (> 1 to speed up, < 1 to slow down). A 1 second delay is added so that audio playback completes before we start a new chunk.
import sounddevice as sd
import numpy as np
import scipy.io.wavfile as wav
import time
fs = 44100
duration = 4  # seconds
# fs_offset = 1.3  # speed up
fs_offset = 0.8  # slow down
for count in range(1, 4):
    myrecording = sd.rec(duration * fs, samplerate=fs, channels=2, dtype='float64')
    print("Recording Audio chunk {} for {} seconds".format(count, duration))
    sd.wait()
    print("Recording complete, Playing chunk {} with offset {} ".format(count, fs_offset))
    sd.play(myrecording, int(fs * fs_offset))
    sd.wait()
    print("Playing chunk {} Complete".format(count))
    time.sleep(1)
Output:
$python sdaudio.py
Recording Audio chunk 1 for 4 seconds
Recording complete, Playing chunk 1 with offset 0.8
Playing chunk 1 Complete
Recording Audio chunk 2 for 4 seconds
Recording complete, Playing chunk 2 with offset 0.8
Playing chunk 2 Complete
Recording Audio chunk 3 for 4 seconds
Recording complete, Playing chunk 3 with offset 0.8
Playing chunk 3 Complete
Here's an example using PyAudio for recording from the microphone and pydub for playback, although you can also use PyAudio's blocking "wire" capability to modify the outgoing audio. I used pydub since you referred to a pydub-based solution. This is a modification of code from here.
import pyaudio
import wave
from pydub import AudioSegment
from pydub.playback import play
import time
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
CHUNK = 1024
RECORD_SECONDS = 4
#FRAMERATE_OFFSET = 1.4 #speedup
FRAMERATE_OFFSET = 0.7 #slowdown
WAVE_OUTPUT_FILENAME = "file.wav"
def get_audio():
    audio = pyaudio.PyAudio()
    # start Recording
    stream = audio.open(format=FORMAT, channels=CHANNELS,
                        rate=RATE, input=True,
                        frames_per_buffer=CHUNK)
    frames = []
    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        frames.append(data)
    # stop Recording
    stream.stop_stream()
    stream.close()
    audio.terminate()
    # save to file, declaring the altered frame rate in the WAV header
    waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
    waveFile.setnchannels(CHANNELS)
    waveFile.setsampwidth(audio.get_sample_size(FORMAT))
    waveFile.setframerate(int(round(RATE * FRAMERATE_OFFSET)))
    waveFile.writeframes(b''.join(frames))
    waveFile.close()
for count in range(1, 4):
    print("recording segment {} ....".format(count))
    get_audio()
    print("Playing segment {} .... at offset {}".format(count, FRAMERATE_OFFSET))
    audio_chunk = AudioSegment.from_wav(WAVE_OUTPUT_FILENAME)
    play(audio_chunk)
    print("Finished playing segment {} .... at offset {}".format(count, FRAMERATE_OFFSET))
    time.sleep(1)
Output:
$python slowAudio.py
recording segment 1 ....
Playing segment 1 .... at offset 0.7
Finished playing segment 1 .... at offset 0.7
recording segment 2 ....
Playing segment 2 .... at offset 0.7
Finished playing segment 2 .... at offset 0.7
recording segment 3 ....
Playing segment 3 .... at offset 0.7
This question has been answered here.
from pydub import AudioSegment
sound = AudioSegment.from_file(…)
def speed_change(sound, speed=1.0):
    # Manually override the frame_rate. This tells the computer how many
    # samples to play per second
    sound_with_altered_frame_rate = sound._spawn(sound.raw_data, overrides={
        "frame_rate": int(sound.frame_rate * speed)
    })
    # convert the sound with altered frame rate to a standard frame rate
    # so that regular playback programs will work right. They often only
    # know how to play audio at standard frame rate (like 44.1k)
    return sound_with_altered_frame_rate.set_frame_rate(sound.frame_rate)
slow_sound = speed_change(sound, 0.75)
fast_sound = speed_change(sound, 2.0)
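The second step of `speed_change`, converting back to a standard frame rate, is a resampling operation. How pydub performs it internally is not shown here; the sketch below is a plain linear-interpolation resampler over a list of mono samples, with hypothetical function names, meant only to illustrate why the number of samples changes while the final rate stays fixed.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample a mono sequence from src_rate to dst_rate by linear interpolation."""
    if not samples:
        return []
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    out = []
    for i in range(n_out):
        pos = i * step
        left = min(int(pos), len(samples) - 1)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


def speed_change_samples(samples, frame_rate, speed):
    """Reinterpret samples at frame_rate * speed, then resample to frame_rate."""
    return resample_linear(samples, int(frame_rate * speed), frame_rate)


ramp = list(range(1000))
slow = speed_change_samples(ramp, 44100, 0.75)
fast = speed_change_samples(ramp, 44100, 2.0)
print(len(ramp), len(slow), len(fast))
```

At a speed of 0.75 the same audio is spread over more samples at the original rate, so it plays for longer; at 2.0 every other sample is kept and it plays for half the time.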
Related
I am trying to build a system that modifies the sound from my microphone while I talk. For example, when I use Google Meet or Zoom and run the program, I want to apply a filter such as pitch scaling to my microphone in real time.
I tried this code to open my microphone:
import pyaudio
import wave
chunk = 1024 # Record in chunks of 1024 samples
sample_format = pyaudio.paInt16 # 16 bits per sample
channels = 2
fs = 44100 # Record at 44100 samples per second
seconds = 3
filename = "output.wav"
p = pyaudio.PyAudio() # Create an interface to PortAudio
print('Recording')
stream = p.open(format=sample_format,
                channels=channels,
                rate=fs,
                frames_per_buffer=chunk,
                input=True)
frames = []  # Initialize array to store frames
# Store data in chunks for 3 seconds
for i in range(0, int(fs / chunk * seconds)):
    data = stream.read(chunk)
    frames.append(data)
# Stop and close the stream
stream.stop_stream()
stream.close()
# Terminate the PortAudio interface
p.terminate()
I can apply a filter to the sound data only after I read it, but I cannot apply the filter as a default setting of the microphone.
I am trying to achieve active noise reduction in Python. My project is composed of two pieces of code:
sound recording code
sound filtering code
What I aim for is that when you run the program, it will start recording through the microphone. After you've finished recording there will be a saved file called "file1.wav" When you play that file, it is the one that you recorded originally. After you're finished with that, you will now put "file1.wav" through a filter by calling "fltrd()". This will create a second wav file in the same folder and that second wav file is supposedly the one with less/reduced noise. Now my problem is that the second wav file is enhancing noise instead of reducing it. Can anyone please troubleshoot my code? :(
Here is my code below:
import pyaudio
import wave
import matplotlib.pyplot as plt
import numpy as np
import scipy.io.wavfile
import scipy.signal as sp
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
CHUNK = 1024
RECORD_SECONDS = 5
WAVE_OUTPUT_FILENAME = "file1.wav"
audio = pyaudio.PyAudio()
# start Recording
stream = audio.open(format=FORMAT, channels=CHANNELS,
rate=RATE, input=True,
frames_per_buffer=CHUNK)
print ("recording...")
frames = []
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)
print ("finished recording")
# stop Recording
stream.stop_stream()
stream.close()
audio.terminate()
waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
waveFile.setnchannels(CHANNELS)
waveFile.setsampwidth(audio.get_sample_size(FORMAT))
waveFile.setframerate(RATE)
waveFile.writeframes(b''.join(frames))
waveFile.close()
x = scipy.io.wavfile.read('file1.wav')
n = x[1]
y = np.zeros(n.shape)
y = n.cumsum(axis=0)
def fltrd():
    n,x = scipy.io.wavfile.read('file1.wav')
    a2 = x.cumsum(axis=0)
    a3 = np.asarray(a2, dtype = np.int16)
    scipy.io.wavfile.write('file2.wav',n,a3)
Actual noise filtering is difficult and computationally intensive. However, a simple noise filter using high-pass and low-pass filters can easily be created with the pydub library. See here for more details (install, requirements etc).
Also see here for more details on low- and high-pass filters using pydub.
The basic idea is to take an audio file and pass it through both a low-pass and a high-pass filter, so that audio above and below certain thresholds is strongly attenuated (in effect demonstrating filtering).
This will not affect any noise falling in the pass band, however; for that you will need to look at other noise cancellation techniques.
from pydub import AudioSegment
from pydub.playback import play
# low_pass_filter and high_pass_filter are called as AudioSegment methods,
# so they need no separate import
song = AudioSegment.from_wav('file1.wav')
# Freq in Hz, adjust as per your needs
new = song.low_pass_filter(5000).high_pass_filter(200)
play(new)
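To illustrate what the two stages do to different frequencies, here is a sketch of textbook single-pole RC filters in plain Python. It is not claimed to match pydub's implementation; the function names and the three test tones (20 Hz hum, 1000 Hz "voice", 18000 Hz hiss) are assumptions chosen for the demonstration.

```python
import math


def low_pass(samples, cutoff_hz, frame_rate):
    """Single-pole RC low-pass: attenuates content above cutoff_hz."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / frame_rate
    alpha = dt / (rc + dt)
    out, prev = [], 0.0
    for x in samples:
        prev = prev + alpha * (x - prev)
        out.append(prev)
    return out


def high_pass(samples, cutoff_hz, frame_rate):
    """Single-pole RC high-pass: attenuates content below cutoff_hz."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / frame_rate
    alpha = rc / (rc + dt)
    out, prev_y, prev_x = [], 0.0, 0.0
    for x in samples:
        prev_y = alpha * (prev_y + x - prev_x)
        prev_x = x
        out.append(prev_y)
    return out


def rms(samples):
    """Root-mean-square level of a sequence."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def sine(freq_hz, frame_rate=44100, seconds=0.5):
    """Unit-amplitude sine tone as a list of floats."""
    n = int(frame_rate * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / frame_rate) for i in range(n)]


def band(samples, frame_rate=44100):
    """Same chain as above: low-pass at 5000 Hz, then high-pass at 200 Hz."""
    return high_pass(low_pass(samples, 5000, frame_rate), 200, frame_rate)


hum, voice, hiss = sine(20), sine(1000), sine(18000)
gains = [rms(band(s)) / rms(s) for s in (hum, voice, hiss)]
print([round(g, 2) for g in gains])
```

The tone inside the pass band keeps most of its level while the tones outside it are reduced, but a single-pole filter rolls off gently, so out-of-band noise is attenuated rather than removed.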
I've looked at all the example code but can't seem to read audio from my RME card using PyAudio.
Here is the code I'm using:
from __future__ import division  # must come before every other import
import wave
import scipy.io.wavfile as waveIO
import pyaudio
import time
p = pyaudio.PyAudio()
def record_audio(RECORD_SECONDS, MIC_channel):
    CHUNK = 1024
    FORMAT = pyaudio.paInt16
    CHANNELS = 1
    RATE = 44100
    stream_in = p.open(rate=RATE,
                       channels=CHANNELS,
                       format=FORMAT,
                       input=True,
                       input_device_index=MIC_channel,
                       frames_per_buffer=CHUNK)
    frames = b""
    t = time.time()
    for i in range(0, int((RATE / CHUNK) * RECORD_SECONDS)):
        data = stream_in.read(CHUNK)
        frames = frames + data
    elapsed = time.time() - t
    print('I read the data in %f seconds' % elapsed)
    stream_in.stop_stream()
    stream_in.close()
    return frames
record_audio(5, 40)
I expect this code to read 5 seconds of audio. However, the audio I get is just noise, so I added the timer lines to check. The answer I get is 'I read the data in 0.001 seconds'.
The length of the frames string is correct (sample_rate*2 (bytes per sample) * 5 seconds).
The number of frames is also correct. So it just looks like the data=stream_in.read(CHUNK) line takes up almost no time while I expected it to take CHUNK/RATE.
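For reference, the figures such a loop should produce if every read blocks until a full chunk is available can be computed directly. `expected_capture` is a hypothetical helper, and the defaults assume 16-bit mono as in the code above.

```python
def expected_capture(rate, chunk, seconds, sample_width=2, channels=1):
    """Return (reads, frames, bytes, seconds) expected from a blocking capture loop."""
    reads = int((rate / chunk) * seconds)
    frames = reads * chunk
    n_bytes = frames * sample_width * channels
    return reads, frames, n_bytes, frames / rate


reads, frames, n_bytes, duration = expected_capture(44100, 1024, 5)
per_read = 1024 / 44100  # seconds of audio delivered by each read
print(reads, frames, n_bytes, round(duration, 3), round(per_read, 4))
```

Each read should therefore take about 23 ms and the whole loop just under 5 seconds, so a total of 0.001 seconds means the reads are returning without waiting for the device.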
In order to record a 2 second wav file I used PyAudio (with Pyzo) and the following classic code to record a sound and save it:
import pyaudio
import wave
chunk = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 44100
RECORD_SECONDS = 2
WAVE_OUTPUT_FILENAME = "my_path//a_test_2.wav"
p = pyaudio.PyAudio()
# Create and initialize the stream object...
s = p.open(format=FORMAT,
           channels=CHANNELS,
           rate=RATE,
           input=True,
           frames_per_buffer=chunk)
print("---recording---")
d = []
print((RATE / chunk) * RECORD_SECONDS)
for i in range(0, (RATE // chunk * RECORD_SECONDS)):
    data = s.read(chunk)
    d.append(data)
    #s.write(data, chunk)
print("---done recording---")
s.close()
p.terminate()
wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(p.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(b''.join(d))
wf.close()
Then I used it, saying "aaa". Everything's fine, no error.
But when I played the wav file, no "aaa" could be heard. I opened the file in Audacity and saw that everything was just silence (0). So it seems Pyzo doesn't know where my microphone is, because it didn't use it. What can I do? Any ideas?
Or maybe it didn't write all the data recorded, but I don't know why.
I have already checked that my microphone is 16 bits and has a 44100 rate.
You'll need to get this working step by step. To make sure that you're recording from the mic, I would suggest printing out the max of each chunk as you read it. Then you should see, in real time, a difference between background noise and your speech. For example:
import audioop
# all the setup stuff, then in your main data loop:
for i in range(0, (RATE // chunk * RECORD_SECONDS)):
    data = s.read(chunk)
    mx = audioop.max(data, 2)  # peak absolute sample value; 2 = bytes per sample
    print(mx)
Usually this difference between background noise and speech is more than 10x, so you can easily see the size change in the numbers as they fly by.
Also, at the start, list all of your microphones to make sure that you're using the right one (get_device_info_by_index). For example, you could be reading from the "line in" rather than the microphone.
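Since the `audioop` module was removed from the standard library in Python 3.13, the same per-chunk peak check can be sketched with `struct` alone. `chunk_peak` is a hypothetical helper and assumes 16-bit little-endian mono data, as produced by paInt16 on typical hardware; the two packed chunks stand in for real microphone reads.

```python
import struct


def chunk_peak(data):
    """Largest absolute sample value in a chunk of 16-bit little-endian audio."""
    count = len(data) // 2
    samples = struct.unpack('<%dh' % count, data[:count * 2])
    return max((abs(s) for s in samples), default=0)


# Stand-ins for chunks read during background noise and during speech.
quiet = struct.pack('<4h', 12, -30, 25, -8)
loud = struct.pack('<4h', 900, -4200, 3100, -150)
print(chunk_peak(quiet), chunk_peak(loud))
```

If every chunk from the real stream reports a peak of 0, the stream is delivering silence, which points at the wrong input device rather than at the file-writing code.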
Is it possible to play a certain part of a .wav file in Python?
I'd like to have a function play(file, start, length) that plays the audiofile file from start seconds and stops playing after length seconds. Is this possible, and if so, what library do I need?
This is possible and can be easy in Python.
PyAudio is a nice library that you can use to play your audio.
First you need to decode the audio file (wav, mp3, etc.); this step converts the audio data into numbers (short int or float32).
Then you need to convert the seconds into the equivalent sample positions so you can cut the signal at the region of interest. To do this, multiply the frame rate by the number of seconds you want.
Here is a simple example for wav files:
import pyaudio
import numpy as np
import wave
File = 'ederwander.wav'
start = 12
length = 7
chunk = 1024
spf = wave.open(File, 'rb')
signal = spf.readframes(-1)
# decode the raw bytes into 16-bit samples (assumes a 16-bit mono file)
signal = np.frombuffer(signal, dtype=np.int16)
p = pyaudio.PyAudio()
stream = p.open(format=p.get_format_from_width(spf.getsampwidth()),
                channels=spf.getnchannels(),
                rate=spf.getframerate(),
                output=True)
# convert seconds to sample positions and cut out the region of interest
first = start * spf.getframerate()
pos = spf.getframerate() * length
signal = signal[first:first + pos]
inc = 0
sig = signal[inc:inc + chunk]
# play chunk by chunk until the slice comes back empty
while len(sig) > 0:
    stream.write(sig.tobytes())
    inc = inc + chunk
    sig = signal[inc:inc + chunk]
stream.close()
p.terminate()
spf.close()
I know that this is a rather old question, but I just needed the exact same thing and for me ederwander's example seems a little bit too complicated.
Here is my shorter (and commented) solution:
import pyaudio
import wave
# set desired values
start = 7
length = 3
# open wave file
wave_file = wave.open('myWaveFile.wav', 'rb')
# initialize audio
py_audio = pyaudio.PyAudio()
stream = py_audio.open(format=py_audio.get_format_from_width(wave_file.getsampwidth()),
                       channels=wave_file.getnchannels(),
                       rate=wave_file.getframerate(),
                       output=True)
# skip unwanted frames
n_frames = int(start * wave_file.getframerate())
wave_file.setpos(n_frames)
# write desired frames to audio buffer
n_frames = int(length * wave_file.getframerate())
frames = wave_file.readframes(n_frames)
stream.write(frames)
# close and terminate everything properly
stream.close()
py_audio.terminate()
wave_file.close()
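The seek-and-read step above can be wrapped in a helper and exercised without a sound card. `read_segment` is a hypothetical name, and the test audio is built in memory; a real `play(file, start, length)` would pass the returned frames to `stream.write()` on an output stream opened with the returned parameters.

```python
import io
import wave


def read_segment(wav_source, start, length):
    """Return (params, frames) for `length` seconds of audio from `start` seconds."""
    with wave.open(wav_source, 'rb') as wav:
        rate = wav.getframerate()
        # clamp so that a start beyond the end yields an empty segment
        first = min(int(start * rate), wav.getnframes())
        wav.setpos(first)
        frames = wav.readframes(int(length * rate))
        return wav.getparams(), frames


# Three seconds of 8 kHz, 16-bit mono test audio, one distinct value per second.
buf = io.BytesIO()
with wave.open(buf, 'wb') as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(8000)
    w.writeframes(b'\x01\x00' * 8000 + b'\x02\x00' * 8000 + b'\x03\x00' * 8000)
buf.seek(0)
params, segment = read_segment(buf, start=1, length=1.5)
print(params.framerate, len(segment))
```

`wav_source` can be a filename or an open binary file object, since `wave.open` accepts both. Because positions are counted in frames rather than samples, this works unchanged for stereo files.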