I was trying to combine multiple wav files in python using pydub but the output song's playback speed was kinda slower than I wanted. So I referred to this question and tried the same.
import os, glob
import random
from pydub import AudioSegment
FRAMERATE = 44100 # The frequency of default wav file
OUTPUT_FILE = 'MySong/random.wav'
audio_data = [AudioSegment.from_wav(wavfile)
              for wavfile in glob.glob(os.path.join('wav_files/', '*.wav'))]
my_music = sum([random.choice(audio_data) for i in range(100)])
my_music = my_music.set_frame_rate(FRAMERATE * 4)
my_music.export(OUTPUT_FILE, format='wav')
But this isn't working. Is there any technical reason I'm unaware of, or is there any better way of doing it?
To increase the pace without changing the pitch, you’ll need to do something a little fancier than changing the frame rate (which will give you a “chipmunk” effect).
If you’re dealing with spoken word, you can try stripping out silence with the (unfortunately undocumented) functions in pydub.silence.
You can also look at AudioSegment().speedup() which is a naive attempt at resampling. You can also make a copy of that function and try to improve it (and contribute back to pydub?)
I have a function that can generate WAV audio frames into a list. Is there any way I can play audio from that list without using an intermediate file to generate an AudioSegment object?
EDIT: For reference, this is my code.
I managed to solve this by using a BytesIO object. Since my library uses wave.open, I can just input an IO-like object to save to and read from. I'm not sure that this is the most pythonic answer, but that is what I used.
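A minimal sketch of that in-memory approach, using only the standard library. The frame list here is a made-up stand-in for whatever the generating function returns, and 16-bit mono at 44.1 kHz is an assumption.

```python
import io
import math
import struct
import wave

FRAME_RATE = 44100

# Hypothetical frame source: 0.1 s of a 440 Hz tone as 16-bit mono frames.
frames = [struct.pack("<h", int(32767 * math.sin(2 * math.pi * 440 * n / FRAME_RATE)))
          for n in range(FRAME_RATE // 10)]

# Write a complete WAV file into memory instead of onto disk.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(1)
    writer.setsampwidth(2)
    writer.setframerate(FRAME_RATE)
    writer.writeframes(b"".join(frames))

# Rewind so the next reader starts at the WAV header.
buffer.seek(0)

with wave.open(buffer, "rb") as reader:
    frame_count = reader.getnframes()
    round_trip = reader.readframes(frame_count)

# After another buffer.seek(0), the same buffer could be passed to
# AudioSegment.from_wav(buffer), which accepts file-like objects.
```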
I suggest instantiating AudioSegment() objects directly like so:
from pydub import AudioSegment
sound = AudioSegment(
    # raw audio data (bytes)
    data=b'…',
    # 2 byte (16 bit) samples
    sample_width=2,
    # 44.1 kHz frame rate
    frame_rate=44100,
    # stereo
    channels=2
)
Addendum: I see you're generating sound in your linked code snippet. You may be interested in pydub's audio generators:
from pydub.generators import Sine
from pydub import AudioSegment
sine_generator = Sine(300)
# 0.1 sec silence
silence = AudioSegment.silent(duration=100)
dot = sine_generator.to_audio_segment(duration=150)
dash = sine_generator.to_audio_segment(duration=300)
signal = [dot, dot, dot, dash, dash, dash, dot, dot, dot]
output = AudioSegment.empty()
for piece in signal:
    output += piece + silence
And one final note: iteratively extending an AudioSegment like this can get slow. You might want to do something like this Mixer example.
I need to filter out frequencies above 5 kHz in a wav file. I did some research and found the Butterworth filter, but couldn't apply it.
Assume that I have a mono wav file. I read it, then I want to use a low-pass filter to remove frequencies above 5 kHz.
Here is what I've done so far. I read the file, read the frames, and convert them to numerical values.
from pydub import AudioSegment
song = AudioSegment.from_wav("audio.wav")
frame_count = int(song.frame_count())
all_frames = [song.get_frame(i) for i in range(frame_count)]
def sample_to_int(sample):
    # 16-bit WAV samples are little-endian signed integers
    return int.from_bytes(sample, byteorder="little", signed=True)
int_freqs = [sample_to_int(frame) for frame in all_frames]
If I change values > 5000 to 0, is that enough? I don't think that's the way. I'm very confused and would be glad of any help.
Pydub includes a low-pass filter, so there's no need for you to implement it yourself:
from pydub import AudioSegment
song = AudioSegment.from_wav("audio.wav")
new = song.low_pass_filter(5000)
It's "documented" in effects.py.
I've found pyDub, and it seems like just what I need:
http://pydub.com/
The only issue is with generating silence. Can pyDub do this?
Essentially the workflow I want is:
Take all the WAV files in a directory
Piece them together in filename order with 1 sec of silence in between
Generate a single MP3 of the result
Is this possible? I realize I could create a WAV of silence and do it that way (spacer GIF flashback, anyone?), but I'd prefer to generate the silence programmatically, because I may want to experiment with the duration of silence and/or the bitrate of the MP3.
I greatly appreciate any responses.
The pydub sequences are composed of pydub.AudioSegment instances. The pydub quickstart documentation only shows how to create AudioSegments from files.
However, reading the source, or even more easily, running pydoc pydub.AudioSegment reveals:
pydub.AudioSegment = class AudioSegment(__builtin__.object)
| AudioSegments are *immutable* objects representing segments of audio
| that can be manipulated using python code.
…
| silent(cls, duration=1000) from __builtin__.type
| Generate a silent audio segment.
| duration specified in milliseconds (default: 1000ms).
which would be called like (following the usage in the quick start guide):
from pydub import AudioSegment
second_of_silence = AudioSegment.silent() # use default
second_of_silence = AudioSegment.silent(duration=1000) # or be explicit
now second_of_silence would be an AudioSegment just like song in the example
song = AudioSegment.from_wav("never_gonna_give_you_up.wav")
and could be manipulated, composed, etc. with no blank audio files needed.
I'd like to use pyDub to take a long WAV file of individual words (and silence in between) as input, then strip out all the silence, and output the remaining chunks as individual WAV files. The filenames can just be sequential numbers, like 001.wav, 002.wav, 003.wav, etc.
The "Yet another Example?" example on the Github page does something very similar, but rather than outputting separate files, it combines the silence-stripped segments back together into one file:
from pydub import AudioSegment
from pydub.utils import db_to_float
from functools import reduce  # reduce is no longer a builtin on Python 3
# Let's load up the audio we need...
podcast = AudioSegment.from_mp3("podcast.mp3")
intro = AudioSegment.from_wav("intro.wav")
outro = AudioSegment.from_wav("outro.wav")
# Let's consider anything that is 30 decibels quieter than
# the average volume of the podcast to be silence
average_loudness = podcast.rms
silence_threshold = average_loudness * db_to_float(-30)
# filter out the silence
podcast_parts = (ms for ms in podcast if ms.rms > silence_threshold)
# combine all the chunks back together
podcast = reduce(lambda a, b: a + b, podcast_parts)
# add on the bumpers
podcast = intro + podcast + outro
# save the result
podcast.export("podcast_processed.mp3", format="mp3")
Is it possible to output those podcast_parts fragments as individual WAV files? If so, how?
Thanks!
The example code is pretty simplified; you'll probably want to look at the strip_silence function:
https://github.com/jiaaro/pydub/blob/2644289067aa05dbb832974ac75cdc91c3ea6911/pydub/effects.py#L98
And then just export each chunk instead of combining them.
The main difference between the example and the strip_silence function is that the example looks at one-millisecond slices, which doesn't measure low-frequency sound very well, since one cycle of a 40 Hz sound, for example, is 25 milliseconds long.
The answer to your original question though, is that all those slices of the original audio segment are also audio segments, so you can just call the export method on them :)
update: you may want to take a look at the silence utilities I've just pushed up into the master branch; especially split_on_silence() which could do this (assuming the right specific arguments) like so:
from pydub import AudioSegment
from pydub.silence import split_on_silence
sound = AudioSegment.from_mp3("my_file.mp3")
chunks = split_on_silence(sound,
    # must be silent for at least half a second
    min_silence_len=500,
    # consider it silent if quieter than -16 dBFS
    silence_thresh=-16
)
You could export all the individual chunks as wav files like this:
for i, chunk in enumerate(chunks):
    chunk.export("/path/to/output/dir/chunk{0}.wav".format(i), format="wav")
which would name the output files "chunk0.wav", "chunk1.wav", "chunk2.wav", and so on.
I am trying to write some code that will extract the amplitude data from an mp3 as a function of time. I wrote up a rough version in MATLAB a while back using this function: http://labrosa.ee.columbia.edu/matlab/mp3read.html. However, I am having trouble finding a Python equivalent.
I've done a lot of research, and so far I've gathered that I need to use something like mpg321 to convert the .mp3 into a .wav. I haven't been able to figure out how to get that to work.
The next step will be reading the data from the .wav file, which I also haven't had any success with. Has anyone done anything similar or could recommend some libraries to help with this? Thanks!
You can use the subprocess module to call mpg123:
import subprocess
import sys
inname = 'foo.mp3'
outname = 'out.wav'
try:
    subprocess.check_call(['mpg123', '-w', outname, inname])
except subprocess.CalledProcessError as e:
    # mpg123 exited with a non-zero status
    print(e)
    sys.exit(1)
For reading wav files you should use the wave module, like this:
import wave
import numpy as np
wr = wave.open('input.wav', 'r')
sz = 44100 # Read and process 1 second at a time.
da = np.frombuffer(wr.readframes(sz), dtype=np.int16)  # fromstring is deprecated for binary data
wr.close()
left, right = da[0::2], da[1::2]
After that, left and right contain the samples of the left and right channels, respectively (assuming a 16-bit stereo file).
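To get amplitude as a function of time, each sample index can be divided by the frame rate. The sketch below uses only the standard library (`array` instead of NumPy) and builds a small stereo WAV in memory so that it is self-contained; the 8 kHz rate and the 100 Hz tone are arbitrary choices for the example.

```python
import array
import io
import math
import sys
import wave

FRAME_RATE = 8000

# Build a tiny stereo 16-bit WAV in memory: a 100 Hz tone on the left
# channel and silence on the right.
stereo = array.array("h")
for n in range(FRAME_RATE // 10):
    stereo.append(int(20000 * math.sin(2 * math.pi * 100 * n / FRAME_RATE)))
    stereo.append(0)
if sys.byteorder == "big":
    stereo.byteswap()  # WAV data is little-endian

source = io.BytesIO()
with wave.open(source, "wb") as writer:
    writer.setnchannels(2)
    writer.setsampwidth(2)
    writer.setframerate(FRAME_RATE)
    writer.writeframes(stereo.tobytes())
source.seek(0)

# Read it back and pair every left-channel sample with its timestamp.
with wave.open(source, "rb") as reader:
    rate = reader.getframerate()
    samples = array.array("h", reader.readframes(reader.getnframes()))
if sys.byteorder == "big":
    samples.byteswap()

left = samples[0::2]
right = samples[1::2]
times = [n / rate for n in range(len(left))]  # seconds since the start
```

Here `times[i]` is the moment at which `left[i]` and `right[i]` were sampled, so plotting `left` against `times` gives the amplitude curve.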
You can find a more elaborate example here.
Here is a project in pure python where you can decode an MP3 file about 10x slower than realtime: http://portalfire.wordpress.com/category/pymp3/
The rest is done by Fourier mathematics etc.:
How to analyse frequency of wave file
and have a look at the python module wave:
http://docs.python.org/2/library/wave.html
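As a small illustration of the Fourier step, the following sketch finds the dominant frequency of a signal with NumPy's FFT routines. The 440 Hz test tone is generated in place of samples read from a real file, so the input is an assumption.

```python
import numpy as np

FRAME_RATE = 44100

# Hypothetical input: one second of a 440 Hz tone as 16-bit samples, standing
# in for one channel read with the wave module.
t = np.arange(FRAME_RATE) / FRAME_RATE
samples = (10000 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)

# Magnitude spectrum of the real-valued signal and the frequency of each bin.
spectrum = np.abs(np.fft.rfft(samples))
frequencies = np.fft.rfftfreq(len(samples), d=1.0 / FRAME_RATE)

dominant = frequencies[np.argmax(spectrum)]
print("dominant frequency: %.1f Hz" % dominant)
```

For amplitude over time rather than frequency content, the raw samples are already what is needed and no transform is required.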
The Pymedia library seems to be stable and to deal with what you need.