I want to play a base64-encoded sound in Python. I've tried using pygame.mixer, but all I get is a hiss of white noise.
This is an example of my code:
import pygame
coinflip = b'data:audio/ogg;base64,T2dnUwACAAAAAAAAAACYZ...' # Truncated for brevity
flip = pygame.mixer.Sound(coinflip)
ch = flip.play()
while ch.get_busy():
    pygame.time.wait(100)
The pygame mixer works well if I import a wav/mp3/ogg file, but I want to write a compact self-contained program that doesn't need external files, so I'm trying to embed a base64 encoded version of the sound in the Python code.
NB: The solution doesn't need to be using pygame, but it would be preferable since I'm already using it elsewhere in the program.
The reason you hear white noise is that you are trying to play audio data with a different encoding than expected.
I think the documentation is not 100% clear about this, but it states that a Sound object represents actual sound sample data. It can be loaded from a file or a buffer. Apparently, when using a buffer, it does expect raw sample data, not some base64-encoded data (and not even raw MP3 or OGG file data).
Note that there has been an issue reported about this on the GitHub repository.
So there are two things you can do:
1. Get the raw bytes of your sound (e.g. using pygame.mixer.Sound(filename).get_raw(), or for simple sounds you could create them mathematically) and encode that in base64 format.
2. Wrap the original (MP3/OGG encoded) file data in a BytesIO object, which is a file-like object, so the Sound module will treat it like a file and properly decode it.
Note that in both cases, you still need to base64-decode the data first! The pygame module doesn't automatically do that for you.
Since you want a small file, option 2 is the best. But I'll give examples of both solutions.
Example 1
If you have the raw sample data, you could use that directly as the buffer argument for pygame.mixer.Sound(). Note that the sample data must match the frequency, bit size and number of channels used by the mixer. The following is a small example that plays a 400 Hz sine wave tone.
import base64
import pygame
# The following bytes object consists of 160 unsigned 8-bit samples,
# which are base64-encoded. When played at 8000 Hz, it results in a
# tone of 400 Hz. The duration of the sound is 0.02 s, so it should
# be looped 50 times per second for longer sounds.
base64_encoded_sound_data = b'''
gKfK5vj/+ObKp39YNRkHAQcZNViAp8rm+P/45sqngFg1GQcBBxk1WI
Cnyub4//jmyqeAWDUZBwEHGTVYf6fK5vj/+ObKp39YNRkHAQcZNViA
p8rm+P/45sqngFg1GQcBBxk1WH+nyub4//jmyqd/WDUZBwEHGTVYf6
fK5vj/+ObKp39YNRkHAQcZNViAp8rm+P/45sqnf1g1GQcBBxk1WA==
'''
pygame.mixer.init(frequency=8000, size=8, channels=1, allowedchanges=0)
sound_data = base64.b64decode(base64_encoded_sound_data)
sound = pygame.mixer.Sound(sound_data)
ch = sound.play(loops=50)
while ch.get_busy():
    pygame.time.wait(100)
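For the "create them mathematically" route mentioned above, here is a minimal sketch of how sample data like this could be generated with the standard library only. The function name sine_wave_u8 and its defaults are my own choices for illustration, not part of pygame. It assumes unsigned 8-bit mono samples centred on 128, which is what the mixer expects with size=8 and channels=1.

```python
import base64
import math


def sine_wave_u8(frequency=400, sample_rate=8000, n_samples=160):
    """Return unsigned 8-bit mono samples of a sine wave centred on 128.

    Hypothetical helper for illustration; the values stay within 1..255.
    """
    return bytes(
        int(128 + 127 * math.sin(2 * math.pi * frequency * n / sample_rate))
        for n in range(n_samples)
    )


# 160 samples at 8000 Hz is 0.02 s, i.e. 8 full periods of a 400 Hz tone.
samples = sine_wave_u8()

# This is the text you would paste into the program as a bytes literal.
encoded = base64.b64encode(samples)
```

The encoded text should start the same way as the data in the example; individual samples at the zero crossings may differ by one because of floating-point rounding, which is not audible.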
Example 2
If you want to use an MP3 or OGG file (which is generally much smaller), you could do it like the following example:
import base64
import io
import pygame
# Your base64-encoded data here.
# NOTE: Do NOT include the "data:audio/ogg;base64," part.
base64_encoded_sound_file_data = b'T2dnUwACAAAAAAAAAACY...' # Truncated for brevity
pygame.mixer.init()
sound_file_data = base64.b64decode(base64_encoded_sound_file_data)
assert sound_file_data.startswith(b'OggS') # just to prove it is an Ogg file
sound_file = io.BytesIO(sound_file_data)
# The following line will only work with VALID data. With above example data it will fail.
sound = pygame.mixer.Sound(sound_file)
ch = sound.play()
while ch.get_busy():
    pygame.time.wait(100)
I would have preferred to use real data in this example as well, but the smallest useful Ogg file I could find was 9 kB, which would add about 120 long lines of data, and I don't think that is appropriate for a Stack Overflow answer. But if you replace it with your own data (which is hopefully a valid Ogg audio file), it should work.
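If your data still carries the "data:audio/ogg;base64," header from the question, a small helper can strip it before decoding. This is only a sketch: decode_data_uri is a name I made up, and the payload below is a stand-in, not a playable Ogg file.

```python
import base64
import io


def decode_data_uri(data):
    """Decode base64 bytes, with or without a leading data-URI header.

    Hypothetical helper: everything up to the first comma is treated as
    the header (e.g. b"data:audio/ogg;base64") and discarded.
    """
    if data.startswith(b"data:"):
        _header, _, data = data.partition(b",")
    return base64.b64decode(data)


# Stand-in payload for illustration only; use your real encoded file here.
example = b"data:audio/ogg;base64," + base64.b64encode(b"OggS\x00\x02")

# A file-like object that could be passed to pygame.mixer.Sound().
sound_file = io.BytesIO(decode_data_uri(example))
```

With real data, sound_file takes the place of the BytesIO object in Example 2.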
Related
The Question
I want to load an audio file of any type (mp3, m4a, flac, etc) and write it to an output stream.
I tried using pydub, but it loads the entire file at once which takes forever and runs out of memory easily.
I also tried using python-vlc, but it's been unreliable and too much of a black box.
So, how can I open large audio files chunk-by-chunk for streaming?
Edit #1
I found half of a solution here, but I'll need to do more research for the other half.
TL;DR: Use subprocess and ffmpeg to convert the file to wav data, and pipe that data into np.frombuffer. The problem is, the subprocess still has to finish before frombuffer is used.
...unless it's possible to have the pipe written to on 1 thread while np reads it from another thread, which I haven't tested yet. For now, this problem is not solved.
I think the Python package https://github.com/irmen/pyminiaudio can be helpful. You can stream an audio file like this:
import miniaudio
audio_path = "my_audio_file.mp3"
target_sampling_rate = 44100 # the input audio will be resampled at this sampling rate
n_channels = 1 # either 1 or 2
waveform_duration = 30 # in seconds
offset = 15 # this means that we read only in the interval [15s, duration of file]
waveform_generator = miniaudio.stream_file(
    filename=audio_path,
    sample_rate=target_sampling_rate,
    seek_frame=int(offset * target_sampling_rate),
    frames_to_read=int(waveform_duration * target_sampling_rate),
    output_format=miniaudio.SampleFormat.FLOAT32,
    nchannels=n_channels)
for waveform in waveform_generator:
    pass # do something with the waveform....
I know for sure that this works on mp3, ogg, wav and flac, but for some reason it does not work on MP4/AAC, and I am actually looking for a way to read MP4/AAC.
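As for the subprocess idea in the question's edit: the ffmpeg process does not have to finish before you start reading, because its stdout pipe can be read in fixed-size blocks while it is still decoding. Below is a rough sketch of that approach. The function names are mine, it assumes ffmpeg is on the PATH, and I have not tried it on every format.

```python
import io
import subprocess


def iter_chunks(stream, chunk_size=4096):
    """Yield blocks of up to chunk_size bytes from a file-like object until EOF."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


def stream_pcm(path, sample_rate=44100, channels=2, chunk_size=4096):
    """Yield raw 16-bit little-endian PCM chunks while ffmpeg decodes `path`.

    Hypothetical helper; assumes an ffmpeg executable is available.
    """
    cmd = [
        "ffmpeg", "-loglevel", "error", "-i", path,
        "-f", "s16le", "-acodec", "pcm_s16le",
        "-ar", str(sample_rate), "-ac", str(channels), "-",
    ]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        yield from iter_chunks(proc.stdout, chunk_size)


# The chunking itself can be checked without ffmpeg:
demo_chunks = list(iter_chunks(io.BytesIO(b"abcdefghij"), 4))
```

Each chunk from stream_pcm could then be handed to np.frombuffer(chunk, dtype=np.int16) or written straight to the output stream, so memory use stays bounded by the chunk size.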
I have a function that can generate WAV audio frames into a list. Is there any way I can play audio from that list without using an intermediate file to generate an AudioSegment object?
EDIT: For reference, this is my code.
I managed to solve this by using a BytesIO object. Since my library uses wave.open, I can just input an IO-like object to save to and read from. I'm not sure that this is the most pythonic answer, but that is what I used.
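For anyone wanting to do the same, the BytesIO approach could look roughly like this. It is a sketch, not my exact code: frames_to_wav_bytes is a made-up name, and it assumes each list element is a bytes object of 16-bit mono PCM.

```python
import io
import wave


def frames_to_wav_bytes(frames, sample_rate=44100, sample_width=2, channels=1):
    """Write a list of raw PCM byte strings into an in-memory WAV file.

    Hypothetical helper; returns a BytesIO positioned at the start.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(channels)
        writer.setsampwidth(sample_width)
        writer.setframerate(sample_rate)
        writer.writeframes(b"".join(frames))
    buffer.seek(0)
    return buffer


# Two made-up blocks of 100 frames each, for illustration.
frames = [b"\x00\x00" * 100, b"\xff\x7f" * 100]
wav_file = frames_to_wav_bytes(frames)

# Reading it back works just like reading a file on disk.
with wave.open(wav_file, "rb") as reader:
    n_frames = reader.getnframes()
```

As far as I know, pydub's AudioSegment.from_wav() also accepts a file handle, so the buffer should be usable there after another seek(0).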
I suggest instantiating AudioSegment() objects directly like so:
from pydub import AudioSegment
sound = AudioSegment(
    # raw audio data (bytes)
    data=b'…',
    # 2 byte (16 bit) samples
    sample_width=2,
    # 44.1 kHz frame rate
    frame_rate=44100,
    # stereo
    channels=2
)
addendum: I see you're generating sound in your linked code snippet. You may be interested in pydub's audio generators
from pydub.generators import Sine
from pydub import AudioSegment
sine_generator = Sine(300)
# 0.1 sec silence
silence = AudioSegment.silent(duration=100)
dot = sine_generator.to_audio_segment(duration=150)
dash = sine_generator.to_audio_segment(duration=300)
signal = [dot, dot, dot, dash, dash, dash, dot, dot, dot]
output = AudioSegment.empty()
for piece in signal:
    output += piece + silence
and one final note: iteratively extending an AudioSegment like this can get slow. You might want to do something like this Mixer example
I was trying to combine multiple wav files in python using pydub but the output song's playback speed was kinda slower than I wanted. So I referred to this question and tried the same.
import os, glob
import random
from pydub import AudioSegment
FRAMERATE = 44100 # The frequency of default wav file
OUTPUT_FILE = 'MySong/random.wav'
audio_data = [AudioSegment.from_wav(wavfile)
              for wavfile in glob.glob(os.path.join('wav_files/', '*.wav'))]
my_music = sum([random.choice(audio_data) for i in range(100)])
my_music = my_music.set_frame_rate(FRAMERATE * 4)
my_music.export(OUTPUT_FILE, format='wav')
But this isn't working. Is there any technical reason I'm unaware of, or is there any better way of doing it?
To increase pace without changing pitch, you’ll need to do something a little fancier than changing the frame rate (which will give you a “chipmunk” effect).
If you’re dealing with spoken word, you can try stripping out silence with the (unfortunately undocumented) functions in pydub.silence.
You can also look at AudioSegment().speedup() which is a naive attempt at resampling. You can also make a copy of that function and try to improve it (and contribute back to pydub?)
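To see what "changing the frame rate" does, here is a small sketch using only the standard wave module (not pydub; the helper names are mine). It keeps the samples untouched and only relabels the frame rate in the header, so the same audio plays back in less time, and the pitch rises by the same factor.

```python
import io
import wave


def make_silent_wav(seconds=1, frame_rate=8000):
    """Build a mono 16-bit WAV of silence in memory (illustration only)."""
    out = io.BytesIO()
    with wave.open(out, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)
        writer.setframerate(frame_rate)
        writer.writeframes(b"\x00\x00" * (frame_rate * seconds))
    return out.getvalue()


def wav_duration(wav_bytes):
    """Return the playback duration in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as reader:
        return reader.getnframes() / reader.getframerate()


def relabel_frame_rate(wav_bytes, factor):
    """Copy the samples unchanged but declare a frame rate `factor` times higher."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as reader:
        params = reader.getparams()
        frames = reader.readframes(reader.getnframes())
    out = io.BytesIO()
    with wave.open(out, "wb") as writer:
        writer.setnchannels(params.nchannels)
        writer.setsampwidth(params.sampwidth)
        writer.setframerate(int(params.framerate * factor))
        writer.writeframes(frames)
    return out.getvalue()


original = make_silent_wav()
faster = relabel_frame_rate(original, 2)
durations = (wav_duration(original), wav_duration(faster))
```

Doubling the declared rate halves the duration; with real audio instead of silence, everything would also sound an octave higher.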
I've found pyDub, and it seems like just what I need:
http://pydub.com/
The only issue is with generating silence. Can pyDub do this?
Essentially the workflow I want is:
Take all the WAV files in a directory
Piece them together in filename order with 1 sec of silence in between
Generate a single MP3 of the result
Is this possible? I realize I could create a WAV of silence and do it that way (spacer GIF flashback, anyone?), but I'd prefer to generate the silence programmatically, because I may want to experiment with the duration of silence and/or the bitrate of the MP3.
I greatly appreciate any responses.
The pydub sequences are composed of pydub.AudioSegment instances. The pydub quickstart documentation only shows how to create AudioSegments from files.
However, reading the source, or even more easily, running pydoc pydub.AudioSegment reveals:
pydub.AudioSegment = class AudioSegment(__builtin__.object)
| AudioSegments are *immutable* objects representing segments of audio
| that can be manipulated using python code.
…
| silent(cls, duration=1000) from __builtin__.type
| Generate a silent audio segment.
| duration specified in milliseconds (default: 1000ms).
which would be called like (following the usage in the quick start guide):
from pydub import AudioSegment
second_of_silence = AudioSegment.silent() # use default
second_of_silence = AudioSegment.silent(duration=1000) # or be explicit
Now second_of_silence would be an AudioSegment, just like song in the example
song = AudioSegment.from_wav("never_gonna_give_you_up.wav")
and could be manipulated, composed, etc. with no blank audio files needed.
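Putting the pieces together, the workflow from the question could be sketched as below. The function names and default values are my own, and the pydub part needs pydub plus ffmpeg installed for the MP3 export, so treat it as a starting point rather than tested code.

```python
import glob
import os
from functools import reduce


def join_with_gap(pieces, gap):
    """Concatenate pieces with `gap` between neighbours, using only `+`.

    Works for anything that supports `+`, including AudioSegment objects.
    Raises TypeError if `pieces` is empty.
    """
    return reduce(lambda joined, piece: joined + gap + piece, pieces)


def wavs_to_mp3(directory, out_path, gap_ms=1000, bitrate="128k"):
    """Join all WAV files in `directory` (filename order) into one MP3."""
    # Imported here so the helper above can be used without pydub installed.
    from pydub import AudioSegment

    paths = sorted(glob.glob(os.path.join(directory, "*.wav")))
    segments = [AudioSegment.from_wav(path) for path in paths]
    silence = AudioSegment.silent(duration=gap_ms)
    combined = join_with_gap(segments, silence)
    combined.export(out_path, format="mp3", bitrate=bitrate)


# The joining logic can be tried out with plain strings:
joined_demo = join_with_gap(["a", "b", "c"], "-")
```

Changing gap_ms or bitrate lets you experiment with the silence duration and the MP3 bitrate without touching any audio files.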
Short question
When using the Python Image Library, does open() immediately decompress the image file?
Details
I would like to measure the decompression time of compressed images (jpeg, png...), as I read that it's supposed to be a good measure of an image's "complexity" (a blank image will be decompressed quickly, and so will a purely random image, since it will not have been compressed at all, so the most "interesting" images are supposed to have the longest decompression time). So I wrote the following python program:
# complexity.py
from PIL import Image
from cStringIO import StringIO
import time
import sys
def mesure_complexity(image_path, iterations = 10000):
    with open(image_path, "rb") as f:
        data = f.read()
    data_io = StringIO(data)
    t1 = time.time()
    for i in xrange(iterations):
        data_io.seek(0)
        Image.open(data_io, "r")
    t2 = time.time()
    return t2 - t1
def main():
    for filepath in sys.argv[1:]:
        print filepath, mesure_complexity(filepath)
if __name__ == '__main__':
    main()
It can be used like this:
#python complexity.py blank.jpg blackandwhitelogo.jpg trees.jpg random.jpg
blank.jpg 1.66653203964
blackandwhitelogo.jpg 1.33399987221
trees.jpg 1.62251782417
random.jpg 0.967066049576
As you can see, I'm not getting the expected results at all, especially for the blank.jpg file: it should be the one with the lowest "complexity" (quickest decompression time). So either the article I read is utterly wrong (I really doubt it, it was a serious scientific article), or PIL is not doing what I think it's doing. Maybe the actual conversion to a bitmap is done lazily, when it's actually needed? But then why would the open delays differ? The smallest jpg file is of course the blank image, and the largest is the random image. This really does not make sense.
Note 1: when running the program multiple times, I get roughly the same results: the results are absurd, but stable. ;-)
Note 2: all images have the same size (width x height).
Edit
I just tried with png images instead of jpeg, and now everything behaves as expected. Cool! I just sorted about 50 images by complexity, and they do look more and more "complex". I checked the article (BTW, it's an article by Jean-Paul Delahaye in 'Pour la Science', April 2013): the author actually mentions that he used only loss-less compression algorithms. So I guess the answer is that open does decompress the image, but my program did not work because I should have used images compressed with loss-less algorithms only (png, but not jpeg).
Glad you got it sorted out. Anyway, the open() method is indeed a lazy operation – as stated in the documentation, to ensure that the image will be loaded, use image.load(), as this will actually force PIL / Pillow to interpret the image data (which is also stated in the linked documentation).
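To compare the two, you could time open() alone against open() followed by load(). The sketch below uses Python 3 and Pillow; the function names are mine, and Pillow is imported inside the functions so the timing helper also runs without it.

```python
import io
import time


def time_calls(func, iterations=100):
    """Return the total time in seconds taken by `iterations` calls of func()."""
    start = time.perf_counter()
    for _ in range(iterations):
        func()
    return time.perf_counter() - start


def open_only(data):
    """Parse just the header; pixel data is not decoded yet."""
    from PIL import Image
    return Image.open(io.BytesIO(data))


def open_and_load(data):
    """Force the full decode by calling load() after open()."""
    from PIL import Image
    image = Image.open(io.BytesIO(data))
    image.load()
    return image


# Sanity check of the timing helper with a trivial function:
calls = []
elapsed = time_calls(lambda: calls.append(1), iterations=5)
```

With data = open(path, "rb").read(), comparing time_calls(lambda: open_only(data)) against time_calls(lambda: open_and_load(data)) should show how much of the work is deferred until load().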