I do not understand the example material for pyaudio. It seems they had written an entire small program and it threw me off.
How do I just play a single audio file?
Format is not an issue, I just want to know the bare minimum code I need to play an audio file.
May be this small wrapper (warning: created on knees) of their example will help you to understand the meaning of code they wrote.
import pyaudio
import wave
import sys
class AudioFile:
chunk = 1024
def __init__(self, file):
""" Init audio stream """
self.wf = wave.open(file, 'rb')
self.p = pyaudio.PyAudio()
self.stream = self.p.open(
format = self.p.get_format_from_width(self.wf.getsampwidth()),
channels = self.wf.getnchannels(),
rate = self.wf.getframerate(),
output = True
)
def play(self):
""" Play entire file """
data = self.wf.readframes(self.chunk)
while data != b'':
self.stream.write(data)
data = self.wf.readframes(self.chunk)
def close(self):
""" Graceful shutdown """
self.stream.close()
self.p.terminate()
# Usage example for pyaudio
a = AudioFile("1.wav")
a.play()
a.close()
The example seems pretty clear to me. You simply save the example as playwav.py call:
python playwav.py my_fav_wav.wav
The wave example with some extra comments:
import pyaudio
import wave
import sys
# length of data to read.
chunk = 1024
# validation. If a wave file hasn't been specified, exit.
if len(sys.argv) < 2:
print "Plays a wave file.\n\n" +\
"Usage: %s filename.wav" % sys.argv[0]
sys.exit(-1)
'''
************************************************************************
This is the start of the "minimum needed to read a wave"
************************************************************************
'''
# open the file for reading.
wf = wave.open(sys.argv[1], 'rb')
# create an audio object
p = pyaudio.PyAudio()
# open stream based on the wave object which has been input.
stream = p.open(format =
p.get_format_from_width(wf.getsampwidth()),
channels = wf.getnchannels(),
rate = wf.getframerate(),
output = True)
# read data (based on the chunk size)
data = wf.readframes(chunk)
# play stream (looping from beginning of file to the end)
while data:
# writing to the stream is what *actually* plays the sound.
stream.write(data)
data = wf.readframes(chunk)
# cleanup stuff.
wf.close()
stream.close()
p.terminate()
This way requires ffmpeg for pydub, but can play not only wave files:
import pyaudio
import sys
from pydub import AudioSegment
if len(sys.argv) <= 1:
print('No File Name!')
sys.exit(1)
chunk = 1024
fn = ' '.join(sys.argv[1:])
pd = AudioSegment.from_file(fn)
p = pyaudio.PyAudio()
stream = p.open(format =
p.get_format_from_width(pd.sample_width),
channels = pd.channels,
rate = pd.frame_rate,
output = True)
i = 0
data = pd[:chunk]._data
while data:
stream.write(data)
i += chunk
data = pd[i:i + chunk]._data
stream.close()
p.terminate()
sys.exit(0)
Related
I am trying to dynamically modify the volume of a song by messing with numpy arrays. And I did it, but I can hear some strange noises in addition to the main song. I think the problem is "numpy.int8".
import pyaudio
import wave
import numpy
def audio_datalist_set_volume(datalist, volume):
""" Change value of list of audio chunks """
sound_level = (volume / 100.)
chunk = numpy.frombuffer(datalist, numpy.int8)
chunk = chunk * sound_level
return chunk.astype(numpy.int8)
CHUNK = 1024
# if len(sys.argv) < 2:
# print("Plays a wave file.\n\nUsage: %s filename.wav" % sys.argv[0])
# sys.exit(-1)
wf = wave.open("your_audio.wav", 'rb')
# instantiate PyAudio (1)
p = pyaudio.PyAudio()
# open stream (2)
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
channels=wf.getnchannels(),
rate=wf.getframerate(),
output=True)
# read data
data = wf.readframes(CHUNK)
# play stream (3)
while len(data) > 0:
nda = audio_datalist_set_volume(data, 20)
stream.write(nda)
data = wf.readframes(CHUNK)
# stop stream (4)
stream.stop_stream()
stream.close()
# close PyAudio (5)
p.terminate()
Update: I think I have found the problem: numpy.int8 should be changed for "b" or "i1". Is it a proper decision?
If you want to handle arbitrary wav files, you should make sure the numpy data type matches the format of the stream:
def audio_datalist_set_volume(datalist, volume, numpy_type):
""" Change value of list of audio chunks """
sound_level = (volume / 100.)
chunk = numpy.frombuffer(datalist, numpy_type)
chunk = chunk * sound_level
return chunk
# ...
numpy_type = {
pyaudio.paFloat32: np.float32,
pyaudio.paInt32: np.int32,
# numpy can't do 24-bit ints :(
pyaudio.paInt16: np.int16,
pyaudio.paInt8: np.int8,
pyaudio.paUInt8: np.uint8,
}[stream._format]
Using the example from: https://realpython.com/playing-and-recording-sound-python/#python-sounddevice_1 , I'm getting the following error when using Thonny: "Backend terminated or disconnected. Use 'Stop/Restart' to restart.". When I run the program in terminal, I get this as the error: "OSError: [Errno -9981] Input overflowed". The example code (which isn't threaded) can work in both terminal and Thonny if I modify it to not throw an exception on overflow "data = stream.read(chunk, exception_on_overflow = False)" but does not work when in a new thread. I've also tried changing the chunk size to be larger and smaller to no avail. When I have the overflow exception for the threaded version, I get a different error in terminal: "Segmentation fault". I'm running Raspbian 10 Buster and Python 3.7.3, if someone could test/see if it works, thanks.
import time
import board
import busio
import digitalio
from adafruit_mcp230xx.mcp23017 import MCP23017
from adafruit_debouncer import Debouncer
i2c = busio.I2C(board.SCL, board.SDA)
mcp = MCP23017(i2c)
import threading
import pyaudio
import wave
import sys
import subprocess
record = False
def background_audio_recording():
#chunk = 1024 # Record in chunks of 1024 samples
chunk = 1024
sample_format = pyaudio.paInt16 # 16 bits per sample
channels = 2
fs = 44100 # Record at 44100 samples per second
#seconds = 3
filename = "output.wav"
p = pyaudio.PyAudio() # Create an interface to PortAudio
stream = p.open(format=sample_format,
channels=channels,
rate=fs,
frames_per_buffer=chunk,
#input_device_index = 2,
input=True)
frames = [] # Initialize array to store frames
while record:
#data = stream.read(chunk, exception_on_overflow = False)
data = stream.read(chunk)
frames.append(data)
'''
for i in range(0, int(fs / chunk * seconds)):
data = stream.read(chunk)
frames.append(data)
'''
# Stop and close the stream
stream.stop_stream()
stream.close()
# Terminate the PortAudio interface
p.terminate()
print('Finished recording')
# Save the recorded data as a WAV file
wf = wave.open(filename, 'wb')
wf.setnchannels(channels)
wf.setsampwidth(p.get_sample_size(sample_format))
wf.setframerate(fs)
wf.writeframes(b''.join(frames))
wf.close()
print("File Saved")
return
button1PinSetup = mcp.get_pin(0) # GPA0
button1PinSetup.direction = digitalio.Direction.INPUT
button1PinSetup.pull = digitalio.Pull.UP
button1Pin = Debouncer(button1PinSetup)
button2PinSetup = mcp.get_pin(1) # GPA1
button2PinSetup.direction = digitalio.Direction.INPUT
button2PinSetup.pull = digitalio.Pull.UP
button2Pin = Debouncer(button2PinSetup)
while True:
button1Pin.update()
button2Pin.update()
if button1Pin.fell:
print("Record")
if record == True:
print("Already Recording")
else:
record = True
threading.Thread(target=background_audio_recording).start()
if button2Pin.fell:
print("Recording Stopped")
record = False
It seems as though, I needed to create the PyAudio object as a global, as well as within the thread (for each additional recording attempt). I got it working with the following code
import pyaudio
import wave
import time
import threading
import board
import busio
import digitalio
from adafruit_mcp230xx.mcp23017 import MCP23017
from adafruit_debouncer import Debouncer
i2c = busio.I2C(board.SCL, board.SDA)
mcp = MCP23017(i2c)
form_1 = pyaudio.paInt16 # 16-bit resolution
chans = 2 # 1 channel
samp_rate = 44100 # 44.1kHz sampling rate
chunk = 4096 # 2^12 samples for buffer
#record_secs = 3 # seconds to record
dev_index = 2 # device index found by p.get_device_info_by_index(ii)
wav_output_filename = 'test1.wav' # name of .wav file
audio = pyaudio.PyAudio() # create pyaudio instantiation
record = False
def background_audio_recording():
audio = pyaudio.PyAudio() # create pyaudio instantiation
# create pyaudio stream
stream = audio.open(format = form_1,rate = samp_rate,channels = chans, \
input_device_index = dev_index,input = True, \
frames_per_buffer=chunk)
print("recording")
frames = []
# loop through stream and append audio chunks to frame array
#for ii in range(0,int((samp_rate/chunk)*record_secs)):
while record == True:
data = stream.read(chunk)
frames.append(data)
print("finished recording")
# stop the stream, close it, and terminate the pyaudio instantiation
stream.stop_stream()
stream.close()
audio.terminate()
# save the audio frames as .wav file
wavefile = wave.open(wav_output_filename,'wb')
wavefile.setnchannels(chans)
wavefile.setsampwidth(audio.get_sample_size(form_1))
wavefile.setframerate(samp_rate)
wavefile.writeframes(b''.join(frames))
wavefile.close()
return
button1PinSetup = mcp.get_pin(0) # GPA0
button1PinSetup.direction = digitalio.Direction.INPUT
button1PinSetup.pull = digitalio.Pull.UP
button1Pin = Debouncer(button1PinSetup)
button2PinSetup = mcp.get_pin(1) # GPA1
button2PinSetup.direction = digitalio.Direction.INPUT
button2PinSetup.pull = digitalio.Pull.UP
button2Pin = Debouncer(button2PinSetup)
while True:
button1Pin.update()
button2Pin.update()
if button1Pin.fell:
print("Record")
if record == True:
print("Already Recording")
else:
record = True
threading.Thread(target=background_audio_recording).start()
if button2Pin.fell:
print("Recording Stopped")
record = False
I'm trying to play a .wav file for a random number of milliseconds before proceeding to the next file in my code. What is the best way to do so?
I currently have the following code:
#!/usr/bin/env python
from random import randint
import time
import pyaudio
import wave
while True:
# random number to indicate wav file name
x = randint(1,45)
print ("Note: %d" % x)
# play wav file
chunk = 1024
f = wave.open(r"%d.wav" % x,"rb")
p = pyaudio.PyAudio()
stream = p.open(format = p.get_format_from_width(f.getsampwidth()),
channels = f.getnchannels(),
rate = f.getframerate(),
output = True)
data = f.readframes(chunk)
while data != '':
stream.write(data)
data = f.readframes(chunk)
stream.stop_stream()
stream.close()
p.terminate()
Try this out, not sure if it's clean code but it works.
#!/usr/bin/env python
from random import randint
import time
import pyaudio
import threading
import multiprocessing
import wave
# play wav file
chunk = 1024
def playAudio(x):
f = wave.open(r"%d.wav" % x,"rb")
p = pyaudio.PyAudio()
stream = p.open(format = p.get_format_from_width(f.getsampwidth()),
channels = f.getnchannels(),
rate = f.getframerate(),
output = True)
data = f.readframes(chunk)
while data != '':
stream.write(data)
data = f.readframes(chunk)
stream.stop_stream()
stream.close()
p.terminate()
if __name__ == '__main__':
while True:
x = randint(1,45)
p = multiprocessing.Process(target=playAudio, args=(x,))
p.start()
time.sleep(5)
p.terminate()
p.join()
I have recently been working with pocket sphinx in python. I have successfully got the
example below to work recognising a recorded wav.
#!/usr/bin/env python
import sys,os
def decodeSpeech(hmmd,lmdir,dictp,wavfile):
"""
Decodes a speech file
"""
try:
import pocketsphinx as ps
import sphinxbase
except:
print """Pocket sphinx and sphixbase is not installed
in your system. Please install it with package manager.
"""
speechRec = ps.Decoder(hmm = hmmd, lm = lmdir, dict = dictp)
wavFile = file(wavfile,'rb')
wavFile.seek(44)
speechRec.decode_raw(wavFile)
result = speechRec.get_hyp()
return result[0]
if __name__ == "__main__":
hmdir = "/home/jaganadhg/Desktop/Docs_New/kgisl/model/hmm/wsj1"
lmd = "/home/jaganadhg/Desktop/Docs_New/kgisl/model/lm/wsj/wlist5o.3e-7.vp.tg.lm.DMP"
dictd = "/home/jaganadhg/Desktop/Docs_New/kgisl/model/lm/wsj/wlist5o.dic"
wavfile = "/home/jaganadhg/Desktop/Docs_New/kgisl/sa1.wav"
recognised = decodeSpeech(hmdir,lmd,dictd,wavfile)
print "%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%"
print recognised
print "%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%"
The problem is how can I do real time speech recognition from a microphone? In
a while loop with a if statement so that if a set word is recognised from the microphone
a function can be called?
The code for realtime recognition looks like this:
config = Decoder.default_config()
config.set_string('-hmm', path.join(MODELDIR, 'en-us/en-us'))
config.set_string('-lm', path.join(MODELDIR, 'en-us/en-us.lm.bin'))
config.set_string('-dict', path.join(MODELDIR, 'en-us/cmudict-en-us.dict'))
config.set_string('-logfn', '/dev/null')
decoder = Decoder(config)
import pyaudio
p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=1024)
stream.start_stream()
in_speech_bf = False
decoder.start_utt()
while True:
buf = stream.read(1024)
if buf:
decoder.process_raw(buf, False, False)
if decoder.get_in_speech() != in_speech_bf:
in_speech_bf = decoder.get_in_speech()
if not in_speech_bf:
decoder.end_utt()
print 'Result:', decoder.hyp().hypstr
decoder.start_utt()
else:
break
decoder.end_utt()
You can also use gstreamer python bindings in pocketsphinx, check livedemo.py
Try this. Pocketsphinx is now a GStreamer plugin.
This is the code I see on the internet and I've modified a few things to really listen to the words very bad and slow
You can help me modify it for good. It is built on ubuntu 16.04 LTS
I do not know much about programming
Looking forward to help
# -*- encoding: utf-8 -*-
#!/usr/bin/env python
from pocketsphinx.pocketsphinx import *
from sphinxbase.sphinxbase import *
import os
import pyaudio
import wave
import audioop
from collections import deque
import time
import math;import Mic
"""
Written by Sophie Li, 2016
http://blog.justsophie.com/python-speech-to-text-with-pocketsphinx/
"""
class SpeechDetector:
def __init__(self):
# Microphone stream config.
self.CHUNK = 1024 # CHUNKS of bytes to read each time from mic
self.FORMAT = pyaudio.paInt16
self.CHANNELS = 1
self.RATE = 16000
self.SILENCE_LIMIT = 1 # Silence limit in seconds. The max ammount of seconds where
# only silence is recorded. When this time passes the
# recording finishes and the file is decoded
self.PREV_AUDIO = 0.5 # Previous audio (in seconds) to prepend. When noise
# is detected, how much of previously recorded audio is
# prepended. This helps to prevent chopping the beginning
# of the phrase.
self.THRESHOLD = 4500
self.num_phrases = -1
# These will need to be modified according to where the pocketsphinx folder is
MODELDIR = "/home/l/Desktop/pocketsphinx/model/en-us"
# Create a decoder with certain model
config = Decoder.default_config()
config.set_string('-hmm', os.path.join(MODELDIR, '/home/l/Desktop/pocketsphinx/model/en-us/en-us/'))
config.set_string('-lm', os.path.join(MODELDIR, '/home/l/Desktop/pocketsphinx/model/en-us/en-us.lm.bin'))
config.set_string('-dict', os.path.join(MODELDIR, '/home/l/Desktop/pocketsphinx/model/en-us/cmudict-en-us.dict'))
config.set_string('-keyphrase', 'no one')
config.set_float('-kws_threshold', 1e+20)
# Creaders decoder object for streaming data.
self.decoder = Decoder(config)
def setup_mic(self, num_samples=50):
""" Gets average audio intensity of your mic sound. You can use it to get
average intensities while you're talking and/or silent. The average
is the avg of the .2 of the largest intensities recorded.
"""
#print "Getting intensity values from mic."
p = pyaudio.PyAudio()
stream = p.open(format=self.FORMAT,
channels=self.CHANNELS,
rate=self.RATE,
input=True,
frames_per_buffer=self.CHUNK)
values = [math.sqrt(abs(audioop.avg(stream.read(self.CHUNK), 4)))
for x in range(num_samples)]
values = sorted(values, reverse=True)
r = sum(values[:int(num_samples * 0.2)]) / int(num_samples * 0.2)
#print " Finished "
#print " Average audio intensity is ", r
stream.close()
p.terminate()
if r < 3000:
self.THRESHOLD = 3500
else:
self.THRESHOLD = r + 100
def save_speech(self, data, p):
"""
Saves mic data to temporary WAV file. Returns filename of saved
file
"""
filename = 'output_'+str(int(time.time()))
# writes data to WAV file
data = ''.join(data)
wf = wave.open(filename + '.wav', 'wb')
wf.setnchannels(1)
wf.setsampwidth(p.get_sample_size(pyaudio.paInt16))
wf.setframerate(16000) # TODO make this value a function parameter?
wf.writeframes(data)
wf.close()
return filename + '.wav'
def decode_phrase(self, wav_file):
self.decoder.start_utt()
stream = open(wav_file, "rb")
while True:
buf = stream.read(1024)
if buf:
self.decoder.process_raw(buf, False, False)
else:
break
self.decoder.end_utt()
words = []
[words.append(seg.word) for seg in self.decoder.seg()]
return words
def run(self):
"""
Listens to Microphone, extracts phrases from it and calls pocketsphinx
to decode the sound
"""
self.setup_mic()
#Open stream
p = pyaudio.PyAudio()
stream = p.open(format=self.FORMAT,
channels=self.CHANNELS,
rate=self.RATE,
input=True,
frames_per_buffer=self.CHUNK)
audio2send = []
cur_data = '' # current chunk of audio data
rel = self.RATE/self.CHUNK
slid_win = deque(maxlen=self.SILENCE_LIMIT * rel)
#Prepend audio from 0.5 seconds before noise was detected
prev_audio = deque(maxlen=self.PREV_AUDIO * rel)
started = False
while True:
cur_data = stream.read(self.CHUNK)
slid_win.append(math.sqrt(abs(audioop.avg(cur_data, 4))))
if sum([x > self.THRESHOLD for x in slid_win]) > 0:
if started == False:
print "Bắt đầu ghi âm"
started = True
audio2send.append(cur_data)
elif started:
print "Hoàn thành ghi âm"
filename = self.save_speech(list(prev_audio) + audio2send, p)
r = self.decode_phrase(filename)
print "RESULT: ", r
# hot word for me " no one" if r.count('one') and r.count("no") > 0 the end programs
if r.count("one") > 0 and r.count("no") > 0:
Mic.playaudiofromAudio().play("/home/l/Desktop/PROJECT/Audio/beep_hi.wav")
os.remove(filename)
return
# Removes temp audio file
os.remove(filename)
# Reset all
started = False
slid_win = deque(maxlen=self.SILENCE_LIMIT * rel)
prev_audio = deque(maxlen= 0.5 * rel)
audio2send = []
print "Chế độ nghe ..."
else:
prev_audio.append(cur_data)
print "* Hoàn thành nghe"
stream.close()
p.terminate()
I looked at this question: pyaudio help play a file
While this question did get answered I never got a clear answer of where to actually put the song file.
This is the code for playing a WAVE file:
""" Play a WAVE file. """
import pyaudio
import wave
import sys
chunk = 1024
if len(sys.argv) < 2:
print "Plays a wave file.\n\n" +\
"Usage: %s filename.wav" % sys.argv[0]
sys.exit(-1)
wf = wave.open(sys.argv[1], 'rb')
p = pyaudio.PyAudio()
# open stream
stream = p.open(format =
p.get_format_from_width(wf.getsampwidth()),
channels = wf.getnchannels(),
rate = wf.getframerate(),
output = True)
# read data
data = wf.readframes(chunk)
# play stream
while data != '':
stream.write(data)
data = wf.readframes(chunk)
stream.close()
p.terminate()
I've looked through the code but I can't find anything in the code where I actually insert the music file itself. When I press the "Play" button in my program (I use wxform with this program) nothing is played.
The magic line is:
wf = wave.open(sys.argv[1], 'rb')
This seems to say that the first argument to the script (sys.argv[1]) is used as the input for waves.
I don't know anything of pyaudio but it seems pretty clear that the song file is the first argument that is passed to the program when you execute it. Look att this line: wf = wave.open(sys.argv[1], 'rb') Just change to sys.arg[1] to 'c:/filename.wav' or something.
And the program won't run as it is written now if you don't pass any argument to it. Because of the if len(sys.argv) < 2 block
Just know a few things of python, pyaudio but it seems that the song file is the first argument that is passed to the program when you execute it.
Just change insert an argument like this :
python your-python_file.py sound_file.wav
regards.
Here is a solution :
Comment the If Statment and directly add the file name to play
import pyaudio
import wave
import sys
CHUNK = 1024
#if len(sys.argv) < 2:
# print("Plays a wave file.\n\nUsage: %s output.wav" % sys.argv[0])
# sys.exit(-1)
wf = wave.open("output.wav", 'rb')
p = pyaudio.PyAudio()
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
channels=wf.getnchannels(),
rate=wf.getframerate(),
output=True)
data = wf.readframes(CHUNK)
while data != '':
stream.write(data)
data = wf.readframes(CHUNK)
stream.stop_stream()
stream.close()
p.terminate()