The Question
I want to load an audio file of any type (mp3, m4a, flac, etc) and write it to an output stream.
I tried using pydub, but it loads the entire file at once which takes forever and runs out of memory easily.
I also tried using python-vlc, but it's been unreliable and too much of a black box.
So, how can I open large audio files chunk-by-chunk for streaming?
Edit #1
I found half of a solution here, but I'll need to do more research for the other half.
TL;DR: Use subprocess and ffmpeg to convert the file to wav data, and pipe that data into np.frombuffer. The problem is, the subprocess still has to finish before frombuffer is used.
...unless it's possible to have the pipe written to on 1 thread while np reads it from another thread, which I haven't tested yet. For now, this problem is not solved.
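For what it's worth, a second thread may not be needed: a pipe opened with subprocess.Popen can be read while the child process is still writing to it. Below is a minimal sketch of that idea, not a tested ffmpeg setup. The helper name stream_command_output, the chunk size, the file name, and the particular ffmpeg output options are all assumptions for illustration.

```python
import subprocess
import sys


def stream_command_output(command, chunk_size=4096):
    """Yield the stdout of `command` in chunks while the process is still running."""
    process = subprocess.Popen(command, stdout=subprocess.PIPE)
    try:
        while True:
            chunk = process.stdout.read(chunk_size)
            if not chunk:  # empty bytes means the child closed its stdout
                break
            yield chunk
    finally:
        process.stdout.close()
        process.wait()


# Hypothetical ffmpeg invocation (file name and format choices are assumptions):
# decode the input to raw 16-bit mono PCM at 44.1 kHz and write it to stdout.
FFMPEG_COMMAND = [
    "ffmpeg", "-i", "my_audio_file.mp3",
    "-f", "s16le", "-acodec", "pcm_s16le", "-ac", "1", "-ar", "44100", "-",
]

if __name__ == "__main__":
    # Stand-in child process so the sketch runs without ffmpeg installed.
    demo_command = [sys.executable, "-c",
                    "import sys; sys.stdout.buffer.write(bytes(10000))"]
    total = sum(len(chunk) for chunk in stream_command_output(demo_command))
    print(total, "bytes received")
```

With ffmpeg installed, passing FFMPEG_COMMAND instead of demo_command should yield raw PCM chunks, each of which could be handed to np.frombuffer(chunk, dtype=np.int16) as it arrives.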
I think the miniaudio Python package can be helpful. You can stream an audio file like this:
import miniaudio
audio_path = "my_audio_file.mp3"
target_sampling_rate = 44100 # the input audio will be resampled at this sampling rate
n_channels = 1 # either 1 or 2
waveform_duration = 30 # in seconds
offset = 15 # this means that we read only in the interval [15s, duration of file]
waveform_generator = miniaudio.stream_file(
    filename = audio_path,
    sample_rate = target_sampling_rate,
    seek_frame = int(offset * target_sampling_rate),
    frames_to_read = int(waveform_duration * target_sampling_rate),
    output_format = miniaudio.SampleFormat.FLOAT32,
    nchannels = n_channels)
for waveform in waveform_generator:
    pass # do something with the waveform....
I know for sure that this works on mp3, ogg, wav, and flac, but for some reason it does not work on mp4/aac, and I am still looking for a way to read mp4/aac.
I want to play a base64-encoded sound in Python. I've tried using pygame.mixer, but all I get is a hiss of white noise.
This is an example of my code:
import pygame
coinflip = b'data:audio/ogg;base64,T2dnUwACAAAAAAAAAACYZ...' # Truncated for brevity
flip = pygame.mixer.Sound(coinflip)
ch = flip.play()
while ch.get_busy():
    pygame.time.wait(100)
The pygame mixer works well if I import a wav/mp3/ogg file, but I want to write a compact self-contained program that doesn't need external files, so I'm trying to embed a base64 encoded version of the sound in the Python code.
NB: The solution doesn't need to be using pygame, but it would be preferable since I'm already using it elsewhere in the program.
The reason you hear white noise is that you are trying to play audio data with a different encoding than expected.
I think the documentation is not 100% clear about this, but it states that a Sound object represents actual sound sample data. It can be loaded from a file or a buffer. Apparently, when using a buffer, it does expect raw sample data, not some base64-encoded data (and not even raw MP3 or OGG file data).
Note that there has been an issue reported about this on the GitHub repository.
So there are two things you can do:
1. Get the raw bytes of your sound (e.g. using pygame.mixer.Sound(filename).get_raw(), or for simple sounds you could create them mathematically) and encode them in base64 format.
2. Wrap the original (MP3/OGG encoded) file data in a BytesIO object, which is a file-like object, so the Sound module will treat it like a file and properly decode it.
Note that in both cases, you still need to base64-decode the data first! The pygame module doesn't automatically do that for you.
Since you want a small file, option 2 is the best. But I'll give examples of both solutions.
Example 1
If you have the raw sample data, you could use that directly as the buffer argument for pygame.mixer.Sound(). Note that the sample data must match the frequency, bit size and number of channels used by the mixer. The following is a small example that plays a 400 Hz sine wave tone.
import base64
import pygame
# The following bytes object consists of 160 signed 8-bit samples,
# which are base64 encoded. When played at 8000 Hz, it results in a
# tone of 400 Hz. The duration of the sound is 0.02 s, so it should
# be looped 50 times per second for longer sounds.
base64_encoded_sound_data = b'''
pygame.mixer.init(frequency=8000, size=8, channels=1, allowedchanges=0)
sound_data = base64.b64decode(base64_encoded_sound_data)
sound = pygame.mixer.Sound(sound_data)
ch = sound.play(loops=50) # repeat the 0.02 s sample for about one second
while ch.get_busy():
    pygame.time.wait(100)
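Since the sample data itself is not shown above, here is a rough sketch of how such a buffer could be created mathematically and then base64-encoded. It follows the numbers from the comment (160 signed 8-bit samples, 8000 Hz, 400 Hz tone); the function and constant names are made up for illustration, and whether the mixer expects signed or unsigned samples depends on how it is initialised, so treat the format as an assumption to verify.

```python
import base64
import math
import struct

SAMPLE_RATE = 8000   # Hz, assumed to match the mixer frequency
TONE_HZ = 400
NUM_SAMPLES = 160    # 0.02 s at 8000 Hz, i.e. 8 full periods of the tone


def make_sine_samples(num_samples=NUM_SAMPLES, tone_hz=TONE_HZ,
                      sample_rate=SAMPLE_RATE):
    """Return raw signed 8-bit mono samples of a sine tone as bytes."""
    values = [
        round(127 * math.sin(2 * math.pi * tone_hz * n / sample_rate))
        for n in range(num_samples)
    ]
    return struct.pack(f"{num_samples}b", *values)


raw_samples = make_sine_samples()
# The encoded bytes can be pasted into the program as a bytes literal.
encoded = base64.b64encode(raw_samples)
print(len(raw_samples), "raw bytes ->", len(encoded), "base64 characters")
```

Decoding encoded with base64.b64decode() gives back exactly the raw sample bytes that would be passed to pygame.mixer.Sound().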
Example 2
If you want to use an MP3 or OGG file (which is generally much smaller), you could do it like the following example:
import base64
import io
import pygame
# Your base64-encoded data here.
# NOTE: Do NOT include the "data:audio/ogg;base64," part.
base64_encoded_sound_file_data = b'T2dnUwACAAAAAAAAAACY...' # Truncated for brevity
sound_file_data = base64.b64decode(base64_encoded_sound_file_data)
assert sound_file_data.startswith(b'OggS') # just to prove it is an Ogg Vorbis file
sound_file = io.BytesIO(sound_file_data)
# The following line will only work with VALID data. With above example data it will fail.
sound = pygame.mixer.Sound(sound_file)
ch = sound.play()
while ch.get_busy():
    pygame.time.wait(100)
I would have preferred to use real data in this example as well, but the smallest useful Ogg file I could find was 9 kB, which would add about 120 long lines of data, and I don't think that is appropriate for a Stack Overflow answer. But if you replace it with your own data (which is hopefully a valid Ogg audio file), it should work.
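If the embedded string still carries the "data:audio/ogg;base64," prefix, it has to be stripped before decoding. A small sketch of that step follows; decode_data_uri is a hypothetical helper name and the payload is a tiny made-up stand-in, not a playable Ogg file.

```python
import base64
import io


def decode_data_uri(data_uri):
    """Strip an optional 'data:<mime>;base64,' prefix and return the decoded bytes."""
    if data_uri.startswith(b"data:"):
        # Everything up to and including the first comma is the header.
        _header, _, payload = data_uri.partition(b",")
    else:
        payload = data_uri
    return base64.b64decode(payload)


# Made-up payload for illustration only; real data would be a whole Ogg file.
example_uri = b"data:audio/ogg;base64," + base64.b64encode(b"OggS\x00demo")
sound_file = io.BytesIO(decode_data_uri(example_uri))
print(sound_file.getvalue()[:4])  # b'OggS'
```

The resulting BytesIO object is what would be handed to pygame.mixer.Sound() in Example 2.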
I know there are similar questions about this out there, but I haven't found an answer to this problem yet.
My (albeit unoriginal) goal is to create a Virtual Assistant in Python and eventually run it on a Raspberry Pi. I started with the simplest way to record/play audio with Python and naturally found pyaudio (as well as a swath of other libraries: speechrecognition, sounddevices, etc.). My problem, however, is that I seem unable to even make a recording.
First I run the following script to see what devices are available:
>>> import pyaudio
>>> audio = pyaudio.PyAudio()
>>> for i in range(audio.get_device_count()):
...     print(i, audio.get_device_info_by_index(i).get('name'))
0 Built-in Microphone
1 Built-in Output
I then use the device index to identify what I am using to record (0 Built-in Microphone) in the following script, which should record a 3 second clip and save it to test1.wav:
import pyaudio
import wave
import io
form_1 = pyaudio.paInt16 # 16-bit resolution
chans = 1 # 1 channel
samp_rate = 44100 # 44.1kHz sampling rate
chunk = 4096 # 2^12 samples for buffer
record_secs = 3 # seconds to record
dev_index = 0 # device index found by audio.get_device_info_by_index(i)
wav_output_filename = 'test1.wav' # name of .wav file
audio = pyaudio.PyAudio() # create pyaudio instantiation
# create pyaudio stream
stream = audio.open(format = form_1,rate = samp_rate,channels = chans,
                    input_device_index = dev_index,input = True,
                    frames_per_buffer = chunk)
frames = []
# loop through stream and append audio chunks to frame array
for ii in range(0,int((samp_rate/chunk)*record_secs)):
    data = stream.read(chunk)
    frames.append(data)
print("finished recording")
print(frames)
# stop the stream, close it, and terminate the pyaudio instantiation
stream.stop_stream()
stream.close()
audio.terminate()
# save the audio frames as .wav file
wavefile = wave.open(wav_output_filename,'wb')
wavefile.setnchannels(chans)
wavefile.setsampwidth(audio.get_sample_size(form_1))
wavefile.setframerate(samp_rate)
wavefile.writeframes(b''.join(frames))
wavefile.close()
However, the output of print(frames) is a list of byte strings that resemble:
This suggests to me that the Python script never actually accesses the microphone and just records nothing for 3 seconds.
Anyone have any suggestions here? Anything would help, been at it for well over 6 hours now.
I am running macOS Mojave 10.14 as an FYI.
I can confirm that it is localized to whatever my setup is on this mac. I was able to transfer the exact same code to a different machine and it worked correctly. Any thoughts on what might be keeping python from having access to the microphone?
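One thing that may be worth checking: since macOS 10.14 (Mojave), applications need explicit permission to use the microphone (System Preferences > Security & Privacy > Privacy > Microphone), and a process without it may simply receive silence. That this is the cause here is only an assumption. A quick way to confirm that the recording really is pure silence is sketched below; is_all_silence is a hypothetical helper, and frames is the list built by the recording script.

```python
def is_all_silence(frames):
    """Return True if every byte in every recorded chunk is zero."""
    return all(not any(chunk) for chunk in frames)


# Made-up chunks for illustration; real chunks come from stream.read(chunk).
silent_frames = [bytes(8), bytes(8)]
noisy_frames = [bytes(8), b"\x00\x01\xff\x7f"]
print(is_all_silence(silent_frames), is_all_silence(noisy_frames))  # True False
```

If the check returns True for a real recording, the terminal or IDE that launches Python is a plausible place to look for a missing microphone permission.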
I am trying to find a way in python to play a section of an audio file given a start and end time.
For example, say I have an audio file that is 1 min in duration. I want to play the section from 0:30 to 0:45 seconds.
I do not want to process or splice the file, only playback of the given section.
Any suggestions would be greatly appreciated!
I found a great solution using pydub:
from pydub import AudioSegment
from pydub.playback import play
audiofile = "audio.wav" # path to audio file (placeholder)
start_ms = 30 * 1000 # start of clip in milliseconds (0:30)
end_ms = 45 * 1000 # end of clip in milliseconds (0:45)
sound = AudioSegment.from_file(audiofile, format="wav")
splice = sound[start_ms:end_ms]
play(splice)
Step one is to get Python to play the entire audio file; several libraries are available for this. See if the library has a time-specific API call. Failing that, you can roll up your sleeves and implement it yourself: read the audio file into a buffer, or stream the file and stop streaming at the end of the chosen section.
Another alternative is to use command-line tools like ffmpeg, the Swiss Army knife of audio processing. ffmpeg has command-line parameters for a specific start and stop time. Also look at its sibling, ffplay.
Similar to ffplay/ffmpeg is another command line audio tool called sox
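As a rough sketch of the command-line route, the snippet below builds an ffplay command that seeks to the start time and plays for a fixed duration. The helper name build_ffplay_command and the file name are assumptions; actually running the command requires ffplay to be installed and on the PATH.

```python
import subprocess


def build_ffplay_command(path, start_s, end_s):
    """Build an ffplay command that plays only the section [start_s, end_s]."""
    duration = end_s - start_s
    if duration <= 0:
        raise ValueError("end_s must be greater than start_s")
    return [
        "ffplay",
        "-nodisp",             # do not open a display window
        "-autoexit",           # quit when playback finishes
        "-ss", str(start_s),   # seek to the start position, in seconds
        "-t", str(duration),   # play for this many seconds
        path,
    ]


command = build_ffplay_command("audio.wav", 30, 45)  # hypothetical file name
print(command)
# To actually play the section (requires ffplay):
# subprocess.run(command, check=True)
```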
Use PyMedia and Player. Look at the functions SeekTo() and SeekEndTime(). I think you will be able to find a right solution after playing around with these functions.
I always have trouble installing external libraries and if you are running your code on a server and you don't have sudo privileges then it becomes even more cumbersome. Don't even get me started on ffmpeg installation.
So, here's an alternative solution with scipy and native IPython that avoids the hassle of installing some other library.
from scipy.io import wavfile # to read and write audio files
import IPython.display # to play them in a Jupyter notebook without the hassle of some other library
def PlayAudioSegment(filepath, start, end, channel='none'):
    # get sample rate and audio data
    sample_rate, audio_data = wavfile.read(filepath) # where filepath = 'directory/audio.wav'
    # get length in minutes of audio file
    print('duration: ', audio_data.shape[0] / sample_rate / 60, 'min')
    ## splice the audio with preferred start and end times (assumes a stereo file)
    spliced_audio = audio_data[start * sample_rate : end * sample_rate, :]
    ## choose left or right channel if preferred (0 or 1 for left and right, respectively; or leave as a string to keep as stereo)
    spliced_audio = spliced_audio[:, channel] if type(channel) == int else spliced_audio
    ## playback natively with IPython; shape needs to be (nChannel,nSamples)
    return IPython.display.Audio(spliced_audio.T, rate=sample_rate)
Use like this:
filepath = 'directory_with_file/audio.wav'
start = 30 # in seconds
end = 45 # in seconds
channel = 0 # left channel
PlayAudioSegment(filepath, start, end, channel)
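If even scipy is more than you want, the standard-library wave module can jump straight to the start time and read only the requested section, so the whole file never has to be loaded. This is a sketch under the assumption of an uncompressed PCM WAV file; read_wav_section is a made-up name, and the in-memory one-second silent file exists only so the example runs on its own.

```python
import io
import wave


def read_wav_section(wav_file, start_s, end_s):
    """Return (params, raw_frames) for the section between start_s and end_s seconds."""
    with wave.open(wav_file, "rb") as reader:
        rate = reader.getframerate()
        start_frame = min(int(start_s * rate), reader.getnframes())
        reader.setpos(start_frame)  # jump to the start of the section
        frames = reader.readframes(int((end_s - start_s) * rate))
        return reader.getparams(), frames


# Build a 1-second silent mono 16-bit WAV in memory for the demonstration.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(1)
    writer.setsampwidth(2)
    writer.setframerate(8000)
    writer.writeframes(bytes(2 * 8000))
buffer.seek(0)

params, section = read_wav_section(buffer, 0.25, 0.75)
print(len(section) // (params.sampwidth * params.nchannels), "frames read")
```

The raw frames returned here can be written to any playback stream that accepts PCM data with the same sample width, channel count, and rate.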
Just found out this interesting python package pydub which converts any audio file to mp3, wav, etc.
As far as I have read its documentation, the process is as follows:
read the mp3 audio file using from_mp3()
creates a wav file using export().
Just curious if there is a way to access the sampling rate and the audio signal (as a 1-dimensional array, assuming it is mono) directly from the mp3 file without converting it to a wav file. I am working on thousands of audio files, and it might be expensive to convert all of them to wav files.
If you aren't interested in the actual audio content of the file, you may be able to use pydub.utils.mediainfo():
>>> from pydub.utils import mediainfo
>>> info = mediainfo("/path/to/file.mp3")
>>> print(info['sample_rate'])
>>> print(info['channels'])
This uses ffprobe (or libav's avprobe, whichever is installed) and returns all kinds of info. I suggest giving it a try :)
Should be much faster than opening each mp3 using AudioSegment.from_mp3(…)
frame_rate is the sample rate, so you can get it like this:
from pydub import AudioSegment
filename = "hoge.wav"
myaudio = AudioSegment.from_file(filename)
print(myaudio.frame_rate)
I want to adjust the volume of an mp3 file while it is playing by turning a potentiometer. I am reading the potentiometer signal over serial from an Arduino board with Python scripts. With the pydub library I am able to read the file, but I cannot adjust its volume while it is playing. This is the code I have after a long search.
I have included only the pydub part. For your information, I am using VLC media player for changing the volume.
>>> from pydub import AudioSegment
>>> song = AudioSegment.from_wav(r"C:\Users\RAJU\Desktop\En_Iniya_Ponnilave.wav")
While the file is playing, I cannot adjust the value. Please, could someone explain how to do it?
First you need to decode your audio signal to raw audio and split it into X frames. Then you can manipulate the audio, and at every frame you can change the volume, the pitch, the speed, etc.!
To change the volume you just need to multiply your raw audio vector by a factor (this can be your potentiometer data signal).
This factor can be different depending on whether your vector is in short int or floating-point format!
One way to get raw audio data from wav files in python is using wave lib
import wave
import numpy
spf = wave.open('wavfile.wav','r')
#Extract Raw Audio from Wav File
signal = spf.readframes(-1)
decoded = numpy.frombuffer(signal, dtype=numpy.float32)
Now you can multiply the decoded vector by a factor. For example, to increase the volume by 10 dB, compute 10^(dB/20), which in Python is 10**(10/20) = 3.1623:
newsignal = decoded * 3.1623
Now you need to encode the vector again to play the new framed audio. You can use "from struct import pack" and pyaudio to do it!
import pyaudio
from struct import pack
p = pyaudio.PyAudio()
stream = p.open(
    format = pyaudio.paFloat32,
    channels = 1,
    rate = 44100,
    output = True,
    input = True)
EncodeAgain = pack("%df"%(len(newsignal)), *list(newsignal))
Finally, play your framed audio. Note that you do this for every frame, playing them in a loop; the processing is fast enough that the latency can be imperceptible!
PS: This example is for floating-point format!
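The decibel arithmetic above can be sketched in a few lines of plain Python. The function names are made up for illustration, and the clipping to [-1.0, 1.0] is an added assumption about the floating-point sample range rather than something stated in the answer.

```python
def db_to_gain(db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10 ** (db / 20)


def apply_gain(samples, db):
    """Scale floating-point samples and clip the result to [-1.0, 1.0]."""
    gain = db_to_gain(db)
    return [max(-1.0, min(1.0, sample * gain)) for sample in samples]


print(round(db_to_gain(10), 4))  # 3.1623
louder = apply_gain([0.1, -0.2, 0.5], 10)
print(louder)
```

A potentiometer reading could be mapped to a dB value (or directly to a linear factor) and passed in for each new frame.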
Ederwander, as you said, I have tried coding it, but when packing the data I get all zeros, so it is not streaming. I understand the problem may be in converting between the data types. This is the code I have written. Please look at it and suggest a fix.
import sys
import serial
import time
import os
from pydub import AudioSegment
import wave
from struct import pack
import numpy
import pyaudio
CHUNK = 1024
wf = wave.open(r'C:\Users\RAJU\Desktop\En_Iniya_Ponnilave.wav', 'rb')
# instantiate PyAudio (1)
p = pyaudio.PyAudio()
# open stream (2)
stream = p.open(format = p.get_format_from_width(wf.getsampwidth()),channels = wf.getnchannels(),rate = wf.getframerate(),output = True)
# read data
data_read = wf.readframes(CHUNK)
decoded = numpy.fromstring(data_read, 'int32', sep = '');
data = decoded*3.123
EncodeAgain = struct.pack(h,data)
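One possible source of trouble in code like this is a mismatch between the sample type used for decoding and the file's real sample width, followed by packing floats with an integer format. The sketch below shows one way to scale a 16-bit WAV chunk by chunk using only the standard library. It is an illustration under stated assumptions, not a diagnosis of the code above: the names scale_chunk and copy_with_volume are made up, the file must contain 16-bit samples, and the host is assumed to be little-endian like the WAV data.

```python
import array
import io
import wave


def scale_chunk(chunk, factor):
    """Scale 16-bit signed PCM bytes by factor, clipping to the int16 range."""
    samples = array.array("h", chunk)  # assumes a little-endian host
    for i, value in enumerate(samples):
        samples[i] = max(-32768, min(32767, int(value * factor)))
    return samples.tobytes()


def copy_with_volume(source, destination, factor, chunk_frames=1024):
    """Read a 16-bit WAV chunk by chunk and write a volume-scaled copy."""
    with wave.open(source, "rb") as reader, wave.open(destination, "wb") as writer:
        if reader.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit samples")
        writer.setparams(reader.getparams())
        while True:
            chunk = reader.readframes(chunk_frames)
            if not chunk:
                break
            writer.writeframes(scale_chunk(chunk, factor))


# Made-up sample values for illustration; the last one is clipped.
demo = array.array("h", [100, -100, 20000]).tobytes()
print(list(array.array("h", scale_chunk(demo, 2.0))))  # [200, -200, 32767]
```

For live playback, the writer.writeframes() call could be swapped for a write to an output stream opened with the file's own format, with factor re-read from the potentiometer before each chunk.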