I've found pyDub, and it seems like just what I need:
http://pydub.com/
The only issue is with generating silence. Can pyDub do this?
Essentially the workflow I want is:
Take all the WAV files in a directory
Piece them together in filename order with 1 sec of silence in between
Generate a single MP3 of the result
Is this possible? I realize I could create a WAV of silence and do it that way (spacer GIF flashback, anyone?), but I'd prefer to generate the silence programmatically, because I may want to experiment with the duration of silence and/or the bitrate of the MP3.
I greatly appreciate any responses.
The pydub sequences are composed of pydub.AudioSegment instances. The pydub quickstart documentation only shows how to create AudioSegments from files.
However, reading the source, or even more easily, running pydoc pydub.AudioSequence reveals
pydub.AudioSegment = class AudioSegment(__builtin__.object)
| AudioSegments are *immutable* objects representing segments of audio
| that can be manipulated using python code.
…
| silent(cls, duration=1000) from __builtin__.type
| Generate a silent audio segment.
| duration specified in milliseconds (default: 1000ms).
which would be called like (following the usage in the quick start guide):
from pydub import AudioSegment
second_of_silence = AudioSegment.silent() # use default
second_of_silence = AudioSegment.silent(duration=1000) # or be explicit
now second_of_silence would be an AudioSegement just like song in the example
song = AudioSegment.from_wav("never_gonna_give_you_up.wav")
and could be manipulated, composed, etc. with no blank audio files needed.
Related
I want to play a base64-encoded sound in Python, I've tried using Pygame.mixer but all I get is a hiss of white noise.
This is an example of my code:
import pygame
coinflip = b'data:audio/ogg;base64,T2dnUwACAAAAAAAAAACYZ...' # Truncated for brevity
flip = pygame.mixer.Sound(coinflip)
ch = flip.play()
while ch.get_busy():
pygame.time.wait(100)
The pygame mixer works well if I import a wav/mp3/ogg file, but I want to write a compact self-contained program that doesn't need external files, so I'm trying to embed a base64 encoded version of the sound in the Python code.
NB: The solution doesn't need to be using pygame, but it would be preferable since I'm already using it elsewhere in the program.
The reason you hear white noise is because you try to play audio data with a diffrent encoding then expected.
I think the documentation is not 100% clear about this, but it states that a Sound object represents actual sound sample data. It can be loaded from a file or a buffer. Apparently, when using a buffer, it does expect raw sample data, not some base64-encoded data (and not even raw MP3 or OGG file data).
Note that there has been an issue reported about this on the GitHub repository.
So there are two things you can do:
Get the raw bytes of your sound (e.g. using pygame.mixer.Sound(filename).get_raw(), or for simple sounds you could create them mathematically) and decode that in base64 format.
Wrap the original (MP3/OGG encoded) file data in a BytesIO object, which is a file-like object, so the Sound module will treat it like a file and properly decode it.
Note that in both cases, you still need to base64-decode the data first! The pygame module doesn't automatically do that for you.
Since you want a small file, option 2 is the best. But I'll give examples of both solutions.
Example 1
If you have the raw sample data, you could use that directly as the buffer argument for pygame.mixer.Sound(). Note that the sample data must match the frequency, bit size and number of channels used by the mixer. The following is a small example that plays a 400 Hz sine wave tone.
import base64
import pygame
# The following bytes object consists of 160 signed 8-bit samples,
# which are base64 encoded, When played at 8000 Hz, it results in a
# tone of 400 Hz. The duration of the sound is 0.02 Hz, so it should
# be looped 50 times per second for longer sounds.
base64_encoded_sound_data = b'''
gKfK5vj/+ObKp39YNRkHAQcZNViAp8rm+P/45sqngFg1GQcBBxk1WI
Cnyub4//jmyqeAWDUZBwEHGTVYf6fK5vj/+ObKp39YNRkHAQcZNViA
p8rm+P/45sqngFg1GQcBBxk1WH+nyub4//jmyqd/WDUZBwEHGTVYf6
fK5vj/+ObKp39YNRkHAQcZNViAp8rm+P/45sqnf1g1GQcBBxk1WA==
'''
pygame.mixer.init(frequency=8000, size=8, channels=1, allowedchanges=0)
sound_data = base64.b64decode(base64_encoded_sound_data)
sound = pygame.mixer.Sound(sound_data)
ch = sound.play(loops=50)
while ch.get_busy():
pygame.time.wait(100)
Example 2
If you want to use a MP3 or OGG file (which is generally much smaller), you could do it like the following example
import base64
import io
import pygame
# Your base64-encoded data here.
# NOTE: Do NOT include the "data:audio/ogg;base64," part.
base64_encoded_sound_file_data = b'T2dnUwACAAAAAAAAAACY...' # Truncated for brevity
pygame.mixer.init()
sound_file_data = base64.b64decode(base64_encoded_sound_file_data)
assert sound_file_data.startswith(b'OggS') # just to prove it is an Ogg Vorbis file
sound_file = io.BytesIO(sound_file_data)
# The following line will only work with VALID data. With above example data it will fail.
sound = pygame.mixer.Sound(sound_file)
ch = sound.play()
while ch.get_busy():
pygame.time.wait(100)
I would have preferred to use real data in this example as well, but the smallest useful Ogg file I could find was 9 kB, which would add about 120 long lines of data, and I don't think that is appropriate for a Stack Overflow answer. But if you replace it with your own data (which is hopefully a valid Ogg audio file), it should work.
I am more or less following the code below to merge two audio files. It mostly works, where audio segment can export both the original files and the combined file to a folder. These play fine in finder (Mac). However, when brought into a music app like Ableton, the waveform is distorted and sounds like digital garbage. I have a feeling this is because this code is messing with the wav header.
I have also noted the combined sound is showing a bitrate of 32 in the finder file info, whereas I am specifically outputting it as bitrate='24'
Any theories?
from pydub import AudioSegment
sound1 = AudioSegment.from_file("1.wav", format="wav")
sound2 = AudioSegment.from_file("2.wav", format="wav")
# Overlay sound2 over sound1 at position 0
overlay = sound1.overlay(sound2, position=0)
# simple export
file_handle = overlay.export("output.wav", format="wav", bitrate='24')
Note to others, I solved this by moving to using Sox (or PySox) instead of AudioSegment, which seams to work much more reliably with all the features I was looking for.
I am trying to find a way in python to play a section of an audio file given a start and end time.
For example, say I have an audio file that is 1 min in duration. I want to play the section from 0:30 to 0:45 seconds.
I do not want to process or splice the file, only playback of the given section.
Any suggestions would be greatly appreciated!
Update:
I found a great solution using pydub:
https://github.com/jiaaro/pydub
from pydub import AudioSegment
from pydub.playback import play
audiofile = #path to audiofile
start_ms = #start of clip in milliseconds
end_ms = #end of clip in milliseconds
sound = AudioSegment.from_file(audiofile, format="wav")
splice = sound[start_ms:end_ms]
play(splice)
step one is to get your python to play entire audio file ... several libraries are available for this ... see if the library has a time specific api call ... you can always roll up your sleeves and implement this yourself after you read the audio file into a buffer or possibly stream the file and stop streaming at end of chosen time section
Another alternative is to leverage command line tools like ffmpeg which is the Swiss Army Knife of audio processing ... ffmpeg has command line input parms to do time specific start and stop ... also look at its sibling ffplay
Similar to ffplay/ffmpeg is another command line audio tool called sox
Use PyMedia and Player. Look at the functions SeekTo() and SeekEndTime(). I think you will be able to find a right solution after playing around with these functions.
I always have trouble installing external libraries and if you are running your code on a server and you don't have sudo privileges then it becomes even more cumbersome. Don't even get me started on ffmpeg installation.
So, here's an alternative solution with scipy and native IPython that avoids the hassle of installing some other library.
from scipy.io import wavfile # to read and write audio files
import IPython #to play them in jupyter notebook without the hassle of some other library
def PlayAudioSegment(filepath, start, end, channel='none'):
# get sample rate and audio data
sample_rate, audio_data = wavfile.read(filepath) # where filepath = 'directory/audio.wav'
#get length in minutes of audio file
print('duration: ', audio_data.shape[0] / sample_rate / 60,'min')
## splice the audio with prefered start and end times
spliced_audio = audio_data[start * sample_rate : end * sample_rate, :]
## choose left or right channel if preferred (0 or 1 for left and right, respectively; or leave as a string to keep as stereo)
spliced_audio = spliced_audio[:,channel] if type(channel)==int else spliced_audio
## playback natively with IPython; shape needs to be (nChannel,nSamples)
return IPython.display.Audio(spliced_audio.T, rate=sample_rate)
Use like this:
filepath = 'directory_with_file/audio.wav'
start = 30 # in seconds
end = 45 # in seconds
channel = 0 # left channel
PlayAudioSegment(filepath,start,end,channel)
I was trying to combine multiple wav files in python using pydub but the output song's playback speed was kinda slower than I wanted. So I referred to this question and tried the same.
import os, glob
import random
from pydub import AudioSegment
FRAMERATE = 44100 # The frequency of default wav file
OUTPUT_FILE = 'MySong/random.wav'
audio_data = [AudioSegment.from_wav(wavfile)
for wavfile in glob.glob(os.path.join('wav_files/', '*.wav'))]
my_music = sum([random.choice(audio_data)for i in range(100)])
my_music = my_music.set_frame_rate(FRAMERATE * 4)
my_music.export(OUTPUT_FILE, format='wav')
But this isn't working. Is there any technical reason I'm unaware of, or is there any better way of doing it?
to increase pace without changing pitch, you’ll need to do something a little fancier than changing the frame rate (which will give you a “chipmunk” effect).
If you’re dealing with spoken word, you can try stripping out silence with the (unfortunately undocumented) functions in pydub.silence.
You can also look at AudioSegment().speedup() which is a naive attempt at resampling. You can also make a copy of that function and try to improve it (and contribute back to pydub?)
I'd like to use pyDub to take a long WAV file of individual words (and silence in between) as input, then strip out all the silence, and output the remaining chunks is individual WAV files. The filenames can just be sequential numbers, like 001.wav, 002.wav, 003.wav, etc.
The "Yet another Example?" example on the Github page does something very similar, but rather than outputting separate files, it combines the silence-stripped segments back together into one file:
from pydub import AudioSegment
from pydub.utils import db_to_float
# Let's load up the audio we need...
podcast = AudioSegment.from_mp3("podcast.mp3")
intro = AudioSegment.from_wav("intro.wav")
outro = AudioSegment.from_wav("outro.wav")
# Let's consider anything that is 30 decibels quieter than
# the average volume of the podcast to be silence
average_loudness = podcast.rms
silence_threshold = average_loudness * db_to_float(-30)
# filter out the silence
podcast_parts = (ms for ms in podcast if ms.rms > silence_threshold)
# combine all the chunks back together
podcast = reduce(lambda a, b: a + b, podcast_parts)
# add on the bumpers
podcast = intro + podcast + outro
# save the result
podcast.export("podcast_processed.mp3", format="mp3")
Is it possible to output those podcast_parts fragments as individual WAV files? If so, how?
Thanks!
The example code is pretty simplified, you'll probably want to look at the strip_silence function:
https://github.com/jiaaro/pydub/blob/2644289067aa05dbb832974ac75cdc91c3ea6911/pydub/effects.py#L98
And then just export each chunk instead of combining them.
The main difference between the example and the strip_silence function is the example looks at one millisecond slices, which doesn't count low frequency sound very well since one waveform of a 40hz sound, for example, is 25 milliseconds long.
The answer to your original question though, is that all those slices of the original audio segment are also audio segments, so you can just call the export method on them :)
update: you may want to take a look at the silence utilities I've just pushed up into the master branch; especially split_on_silence() which could do this (assuming the right specific arguments) like so:
from pydub import AudioSegment
from pydub.silence import split_on_silence
sound = AudioSegment.from_mp3("my_file.mp3")
chunks = split_on_silence(sound,
# must be silent for at least half a second
min_silence_len=500,
# consider it silent if quieter than -16 dBFS
silence_thresh=-16
)
you could export all the individual chunks as wav files like this:
for i, chunk in enumerate(chunks):
chunk.export("/path/to/ouput/dir/chunk{0}.wav".format(i), format="wav")
which would make output each one named "chunk0.wav", "chunk1.wav", "chunk2.wav", and so on