pydub audio playback is *extremely* loud for non 16-bit files - python

I have some audio samples (from SampleSwap) which I am working with in pydub. Most of them have a sample depth / bits per sample of 16, while others are 24 or 32. Loading them looks something like this:
import pydub
a = pydub.AudioSegment.from_file('16bit_file.wav')
b = pydub.AudioSegment.from_file('24bit_file.wav')
The problem I am running into is when I try to get them to play back:
from pydub.playback import play
play(a)
play(b)
While the 16-bit files play normally, the 24-bit files are all earth-shatteringly loud, seriously to the point of potential speaker damage. With my computer set to minimum volume, the 24-bit playback is about as loud as regular music would be on maximum volume. It's super distorted, sharp, and clipped.
I'm pretty sure I've isolated it to be a problem of bit-depth. The sounds all play normally when played in other software. I can convert the problem sounds to be 16-bit either using sox or using pydub.AudioSegment.set_sample_width(2) and the issue goes away. I have also gone directly through simpleaudio to do the playback (copying the code from pydub, here) and get the same issue.
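For reference, that set_sample_width workaround looks something like this (the filename is just an example):
import pydub
from pydub.playback import play

seg = pydub.AudioSegment.from_file('24bit_file.wav')
if seg.sample_width != 2:
    seg = seg.set_sample_width(2)  # down-convert to 16-bit samples before playing
play(seg)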
The main problem is that I am writing some code for working with audio which I would like to share, but I do not want users to experience the physical or mental damage of hearing one of these busted sounds. My only idea for a workaround is to immediately convert the bit depth of any user-loaded sounds, i.e. lock audio playback to 16-bit files only; this works for the files I am testing, but a) I don't know if it holds true for all sounds/computers, and b) I thought this shouldn't be an issue in pydub anyway. I also thought to somehow check the volume of the sound before playing (using e.g. a.dBFS or a.max), but I haven't found anything that seems to be reliable (either the metric isn't really correlated with the volume, or the value seems to be more of an indication of the dynamic range provided by the extra bits).
So my questions are:
Why do I get this alarmingly loud, distorted playback in pydub when playing non-16-bit files?
What can I do to prevent it?
Am I missing something obvious here about audio playback?
I understand this is (hopefully) not so reproducible; I could try to record it and post if that would be helpful. I can also point out the sounds I am using on SampleSwap, but the problem really seems to be caused by any file that is not 16-bit (i.e. I can convert a sound to be 32-bit and generate the issue).
Here's some version info:
ffmpeg 4.4
PyAudio 0.2.11
pydub 0.25.1
simpleaudio 1.0.4
And the issue is on a 2019 MacBook Pro, Catalina 10.15.7. I've also tested my Windows 10 Desktop (with similar versions as above), but rather than the issue above, I just get silence.

Related

Automate tasks with Python

I am looking for a way to automate tasks in external programs with Python.
I have large audio files in AAC format. I need to convert them to mp3, and then amplify them (avoiding the distortion).
I wrote a program with the pydub library that works great with small files, but my files are too large (longer than 2 hours, or about 200 MB) and I run out of memory (because that library stores the full file in RAM, I think). I can't split the file into chunks because I could not merge them again for the same reason, and I need the file in one piece.
So I would like to write a program that opens another program to convert the file to mp3 (MediaHuman Audio Converter) and then amplifies the converted file with yet another program (WavePad Audio Editor), but I don't know if this is possible.
At the moment I'm doing this manually, but it involves a lot of waiting and fewer than 10 clicks spread throughout the process, which is tedious.
Here is the program I wrote. I transcribed it to remove some functions that are not relevant to this process, and I translated the comments, variables and other things into English, so it may have some errors, but the original program works well:
import glob
import os
from pydub import AudioSegment

# convert to mp3 at 128 kbps
sound = AudioSegment.from_file("input-file.aac")
sound.export("output-file.mp3", format="mp3", bitrate="128k")

# sound.max_dBFS shows how far below the limit the highest sample is (in dB)
sound = AudioSegment.from_file("output-file.mp3", format="mp3")
max_gain_without_distortion = -1 * sound.max_dBFS

# increase volume by "max_gain_without_distortion" dB
louder_song = sound + max_gain_without_distortion

# save the louder song
louder_song.export("output-louder.mp3", format="mp3")
PC specifications:
OS: Windows 10 Pro 64-bit
RAM: 4 GB
CPU: dual-core 3 GHz
Python version: 3.7.1
pydub version: v0.23.1-0-g46782a9
ffmpeg/avlib version: Build: ffmpeg-20190219-ff03418-win32-static
As agreed in the comments, I am going to propose a command-line tool as the solution: FFmpeg. Here's the command you need:
ffmpeg -i input-file.aac -b:a 128k -filter:a loudnorm output.mp3
This uses the loudnorm filter. You can also apply gain directly as explained in the docs, but you should expect inferior results. Normalization can be done in a number of ways; I suggest reading this post.
By combining it with e.g. find . -name '*.aac' -type f you can easily find and convert all the files in a directory tree.
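If you'd rather drive that same command from Python instead of the shell, a rough equivalent could look like this (a sketch I haven't tuned; the paths are illustrative):
import pathlib
import subprocess

# walk the tree and convert every .aac file, writing the .mp3 next to it
for src in pathlib.Path('.').rglob('*.aac'):
    dst = src.with_suffix('.mp3')
    subprocess.run(
        ['ffmpeg', '-y', '-i', str(src), '-b:a', '128k', '-filter:a', 'loudnorm', str(dst)],
        check=True,  # raise if ffmpeg fails on a file
    )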
If you're set on using Python throughout, you can also check out the Python bindings. Basics:
import ffmpeg
ffmpeg.input('stereo.aac').output('mono.mp3').run()
Initially I was going to propose using SoX (Sound eXchange), the Swiss Army knife of audio manipulation. It's not Python, though it has Python bindings: pysox. However, it turns out it does not support the aac format (though it handles dozens of other formats). I thought it would be interesting to mention it anyway, as one could convert to a more popular format with ffmpeg first and pipe the result to sox. The latter has many more options for modifying the audio stream.
Convert wav to mp3 at a 128 kbps bitrate:
sox input-file.wav -C 128 output-file.mp3
The OP asks to "increase volume by max_gain_without_distortion dB", and for this we can use either gain or norm, as explained in the docs:
sox input-file.wav -C 128 output-file.mp3 gain -n -3
Per the docs, the -n option normalises the audio to 0 dB FSD; it is often used in conjunction with a negative gain-dB to the effect that the audio is normalised to a given level below 0 dB.
sox --norm input-file.wav -C 128 output-file.mp3

Converting .ul files

I recently copied a bunch of audio files, which are feedback left during a phone call.
The vast majority of them are mp3, but a small percentage are files ending in a .ul extension, which I believe is ULAW.
I have tried to play them in Audacity and VLC, but get garbled sounds. I suspect they are corrupted, but I'd like to confirm that by attempting to convert them to another audio format.
Would anyone be able to recommend a library to do that?
I know Python has the audioop module but I do not know enough to start messing with the audio data.
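A minimal sketch, assuming the files are raw, headerless u-law at the usual telephony settings (8 kHz, mono), using the audioop module mentioned above to expand them to a normal wav file:
import audioop
import wave

# read the raw u-law bytes from the file
with open('feedback.ul', 'rb') as f:
    ulaw_data = f.read()

# expand u-law to 16-bit linear PCM (2 bytes per sample)
pcm_data = audioop.ulaw2lin(ulaw_data, 2)

# telephony audio is typically 8 kHz mono; adjust if the source differs
with wave.open('feedback.wav', 'wb') as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(8000)
    out.writeframes(pcm_data)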

Pure python library for MIDI to Score (Notes) and/or Audio Translation

I want something that abstracts away MIDI events, to extract/synthesize pitch/duration/dynamic/onset (e.g. loud D# quarter note on the 4th beat).
fluidsynth and timidity work, but I'd prefer a pure python library. I can't find anything but bindings here.
midiutil makes MIDIs and pygame plays them, but I want something that can both synthesize raw audio data and quantize the notes (i.e. as they would be represented in sheet music, not as midi events / pulses / "pitch" / etc).
EDIT: these don't quite do it (either not in python, or too low-level, or "do it yourself"):
Get note data from MIDI file
Python: midi to audio stream
What you probably want is a process called "quantization" which matches the midi events to the closest note length.
I wrote such an app in C back in 1999:
http://www.findthatzipfile.com/search-3558240-hZIP/winrar-winzip-download-midi2tone.zip.htm
(I don't have source any more, sorry)
The process itself is not very complex. I just brute-forced different note lengths to find the closest match. MIDI event pitches themselves map directly to notes, so no conversion is needed there.
The MIDI format itself is not very complex, so I suggest you find a pure Python MIDI reading library and then apply the algorithm on top of that.
https://github.com/vishnubob/python-midi
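As a rough illustration of that quantization step (my sketch, with an arbitrary grid of note lengths, assuming the durations have already been converted to beats by whatever MIDI reader you pick):
NOTE_LENGTHS = {
    4.0: 'whole',
    2.0: 'half',
    1.0: 'quarter',
    0.5: 'eighth',
    0.25: 'sixteenth',
}

def quantize(duration_in_beats):
    # brute force: pick the grid value closest to the measured duration
    closest = min(NOTE_LENGTHS, key=lambda length: abs(length - duration_in_beats))
    return NOTE_LENGTHS[closest]

print(quantize(0.98))  # 'quarter'
print(quantize(0.55))  # 'eighth'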
Have you tried Mingus? It works with pyFluidSynth: http://code.google.com/p/mingus/wiki/tutorialFluidsynth

Python Creating raw audio

I use Windows 7. All I want to do is create raw audio and stream it to a speaker. After that, I want to create classes that can generate sine progressions (basically, a tone that slowly gets more and more shrill). After that, I want to put my raw audio into audio codecs and containers like .WAV and .MP3 without going insane. How might I be able to achieve this in Python without using dependencies that don't come with a standard install?
I looked up a great deal of files, descriptions, and related questions from here and all over the internet. I read about PCM and ADPCM, as well as A/D Converters. Where I get lost is somewhere between the ratio of byte input --> Kbps output, and all that stuff.
Really, all I want is for somebody to please be able to point me in the right direction to learn the audio formats precisely, and how to use them in Python (but first I want to start with raw audio).
This question really has two parts:
How do I generate audio signals
How do I play audio signals through the speakers.
I wrote a simple wrapper around the python std lib's wave module, called pydub, which you can look at (on github) as a point of reference for how to manipulate raw audio data.
I generally just export the audio data to a file and then play it using VLC player. IMHO there's no reason to write a bunch of code to playback audio unless you're making a synthesizer or a game or some other realtime app.
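If you do want to start from scratch, here is a bare-bones sketch (my example values, nothing pydub-specific) of generating a sine tone with the standard library and writing it to a wav file:
import math
import struct
import wave

SAMPLE_RATE = 44100   # samples per second
DURATION = 2.0        # seconds
FREQUENCY = 440.0     # Hz (A4)

# 16-bit signed PCM samples of a sine wave
samples = [
    int(32767 * math.sin(2 * math.pi * FREQUENCY * n / SAMPLE_RATE))
    for n in range(int(SAMPLE_RATE * DURATION))
]

with wave.open('tone.wav', 'wb') as wav_file:
    wav_file.setnchannels(1)           # mono
    wav_file.setsampwidth(2)           # 2 bytes = 16-bit samples
    wav_file.setframerate(SAMPLE_RATE)
    wav_file.writeframes(struct.pack('<' + 'h' * len(samples), *samples))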
Anyway, I hope that helps you get started :)

High level audio crossfading library for python

I am looking for a high level audio library that supports crossfading for python (and that works in linux). In fact crossfading a song and saving it is about the only thing I need.
I tried pyechonest but I find it really slow. Working with multiple songs at the same time is hard on memory too (I tried to crossfade about 10 songs into one, but I got out-of-memory errors and my script was using 1.4 GB of memory). So now I'm looking for something else that works with Python.
I have no idea if anything like that exists; if not, are there good command-line tools for this? I could write a wrapper around such a tool.
A list of Python sound libraries.
Play a Sound with Python
PyGame or Snack would work, but for this, I'd use something like audioop.
Basic first steps here: merge background audio file
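For a higher-level route, pydub (which comes up elsewhere on this page) can crossfade and export in a couple of lines. A minimal sketch, with illustrative filenames and a 5-second overlap:
from pydub import AudioSegment

song1 = AudioSegment.from_file('song1.mp3')
song2 = AudioSegment.from_file('song2.mp3')

# overlap the end of song1 with the start of song2 over 5000 ms
# (both songs must be longer than the crossfade)
combined = song1.append(song2, crossfade=5000)
combined.export('crossfaded.mp3', format='mp3')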
A scriptable solution using external tools AviSynth and avs2wav or WAVI:
Create an AviSynth script file:
test.avs
v=ColorBars()
a1=WAVSource("audio1.wav").FadeOut(50)
a2=WAVSource("audio2.wav").Reverse.FadeOut(50).Reverse
AudioDub(v,a1+a2)
The script fades out audio1 and stores that in a1, then fades in audio2 (the Reverse/FadeOut/Reverse chain applies the fade at the start) and stores that in a2.
a1 and a2 are concatenated and then dubbed onto a ColorBars screen pattern to make a video.
You can't just work with audio alone - a valid video must be generated.
I kept the script as simple as possible for demonstration purposes. Google for more details on audio processing via AviSynth.
Now using avs2wav (or WAVI) you can render the audio:
avs2wav.exe test.avs combined.wav
or
wavi.exe test.avs combined.wav
Good luck!
Some references:
How to edit with Avisynth
AviSynth filters reference
