Exporting Audio for Google Speech using pydub


I'm trying to export audio files to LINEAR16 for Google Speech and I notice that they specify little-endian byte ordering. I'm using pydub to export to 'raw' format, but I can't tell from the documentation (or the source) whether the exported files are in little or big endian format?
I'm using the following command for exporting:
audio = pydub.AudioSegment.from_file(self.mFilePathName, "mp4")
fullFileNameRaw = "audio.raw"
audio.export(fullFileNameRaw, format='raw')
Thank you.
-K

According to this answer, standard (RIFF) wave files are little endian. Pydub uses the stdlib wave module to write wave files, so I'm guessing it is little endian. (If you write the file with the wave headers, it does in fact have RIFF at the beginning.)
Looking into it a little further though, it seems like it may depend on the hardware platform's endianness. x86 and AMD64 are both little endian though so that covers basically all the places people would run pydub (I think?)
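If you would rather not rely on the platform, one option is to normalise the byte order yourself before sending the data. Below is a minimal sketch using only the standard library. The function name is hypothetical, and it assumes you already have 16-bit PCM bytes in hand (for example from an AudioSegment's raw_data) and that they are in the machine's native byte order unless you say otherwise.

```python
import array
import sys


def to_little_endian_pcm16(raw: bytes, source_byteorder: str = sys.byteorder) -> bytes:
    """Return 16-bit PCM bytes in little-endian order (hypothetical helper)."""
    if len(raw) % 2:
        raise ValueError("16-bit PCM data must contain an even number of bytes")
    if source_byteorder == "little":
        # Already in the order LINEAR16 expects; nothing to do.
        return raw
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    samples.byteswap()          # swap the two bytes of every sample
    return samples.tobytes()


# Two big-endian samples (0x0102 and 0x0304) become little-endian.
print(to_little_endian_pcm16(b"\x01\x02\x03\x04", "big"))
```

On a little-endian machine the function returns the input unchanged, so it is cheap to call unconditionally.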

Related

pydub audio playback is extremely loud for non 16-bit files

I have some audio samples (from SampleSwap) which I am working with in pydub. Most of them have a sample-depth / bits per sample of 16, while others are 24 or 32. Looks something like this:
import pydub
a = pydub.AudioSegment.from_file('16bit_file.wav')
b = pydub.AudioSegment.from_file('24bit_file.wav')
The problem I am running into is when I try to get them to play back:
from pydub.playback import play
play(a)
play(b)
While the 16-bit files play normally, the 24-bit files are all Earth-shatteringly loud, like seriously to the point of potential speaker damage. With my computer set to minimum volume, the 24-bit play back is about as loud as regular music would play back on maximum volume. It's super distorted, sharp, and clipped.
I'm pretty sure I've isolated it to be a problem of bit-depth. The sounds all play normally when played in other software. I can convert the problem sounds to be 16-bit either using sox or using pydub.AudioSegment.set_sample_width(2) and the issue goes away. I have also gone directly through simpleaudio to do the playback (copying the code from pydub, here) and get the same issue.
The main problem is I am writing some code for working with audio which I would like to share, but I do not want users to experience the physical or mental damage from hearing one of these busted sounds. My only idea of a workaround is to immediately convert the bit-depth of any user-loaded sounds/lock audio playback to 16-bit files only; this works for the files I am testing, but a) I don't know if it holds true for all sounds/computers, and b) I thought this shouldn't be an issue in pydub anyway. I also thought to somehow check the volume of the sound before playing (using e.g. a.dBFS or a.max), but I haven't found anything that seems to be reliable (either the metric isn't really correlated with the volume, or the value seems to be more of an indication of the dynamic range provided by the extra bits).
So my questions are:
Why do I get this alarmingly loud, distorted playback in pydub when playing non-16-bit files?
What can I do to prevent it?
Am I missing something obvious here about audio playback?
I understand this is (hopefully) not so reproducible; I could try to record it and post if that would be helpful. I can also point out the sounds I am using on SampleSwap, but the problem really seems to be caused by any file that is not 16-bit (i.e. I can convert a sound to be 32-bit and generate the issue).
Here's some version info:
ffmpeg 4.4
PyAudio 0.2.11
pydub 0.25.1
simpleaudio 1.0.4
And the issue is on a 2019 MacBook Pro, Catalina 10.15.7. I've also tested my Windows 10 Desktop (with similar versions as above), but rather than the issue above, I just get silence.
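For reference, the 16-bit workaround described above can be pictured at the byte level. This is only a sketch of the idea, not pydub's implementation: it assumes packed little-endian 24-bit PCM and simply keeps the two most significant bytes of each sample. The function name is hypothetical.

```python
def pcm24le_to_pcm16le(raw: bytes) -> bytes:
    """Reduce packed little-endian 24-bit PCM to 16-bit by dropping the low byte."""
    if len(raw) % 3:
        raise ValueError("24-bit PCM data must be a multiple of 3 bytes")
    out = bytearray()
    for i in range(0, len(raw), 3):
        # Bytes are ordered least significant first, so bytes 1 and 2 of
        # each sample hold the top 16 bits (including the sign).
        out += raw[i + 1:i + 3]
    return bytes(out)


# The most negative and most positive 24-bit samples map to the 16-bit extremes.
print(pcm24le_to_pcm16le(b"\x00\x00\x80" + b"\xff\xff\x7f"))
```

A player that receives 24-bit data but interprets it with the wrong sample width would read these bytes very differently, which is one plausible way to end up with loud, distorted output.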

Compressing a video file in python with the standard library

Is there a way to effectively compress a video file with the standard library of python? I wrote a quick script to accomplish this, but it barely compresses the video file. Take a look:
import sys
import zlib
with open('Some_Video.mp4', 'rb') as f:
    original_data = f.read()
original_size = sys.getsizeof(original_data)
compress_data = zlib.compress(original_data, level=5)
compressed_size = sys.getsizeof(compress_data)
print(original_size)
print(compressed_size)
This was the output:
2793876
2788282
Why is the difference so small, and how can I compress further?

Video files are already compressed. You cannot compress them further, at least not significantly.
Your only option would be to decompress them, and then recompress them with a more effective compressor, e.g. HEVC.

I believe the small reduction in file size is due to zlib being a lossless compression library, and mp4 is already a compressed format, so there's little room for improvement.
From the standard library, lzma claims to have the best compression ratio. But keep in mind it's also lossless so I would not expect that much difference.
I recommend you use the third-party lib ffmpeg-python. It's a wrapper for the command line application ffmpeg, which would let you transcode your mp4 using better encoders like h265.
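To see why a lossless compressor gains so little here, it can help to compare data with and without redundancy. The sketch below uses random bytes as a stand-in for an already-compressed video (an assumption for illustration, not a real mp4) and measures sizes with len(), which counts only the payload, whereas sys.getsizeof also includes the bytes object's own overhead.

```python
import os
import zlib


def compression_ratio(data: bytes, level: int = 5) -> float:
    """Compressed size divided by original size (lower means better compression)."""
    return len(zlib.compress(data, level)) / len(data)


# Random bytes stand in for already-compressed video: no redundancy is left.
already_compressed = os.urandom(100_000)
# Highly repetitive data, by contrast, is full of redundancy.
repetitive = b"frame" * 20_000

print(f"random-like data: {compression_ratio(already_compressed):.3f}")
print(f"repetitive data:  {compression_ratio(repetitive):.3f}")
```

The random-like input should come out at roughly its original size, while the repetitive input shrinks to a tiny fraction.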

Automate tasks with Python

I am looking for a way to automate tasks in external programs with Python.
I have large audio files in AAC format. I need to convert them to mp3, and then amplify them (avoiding the distortion).
I wrote a program with the pydub library that works great with small files, but my files are too large (longer than 2 hours, or 200 MB) and I run out of memory (because that library stores the full file in RAM, I think). I can't split the file into chunks because I could not merge them again for the same reason, and I need the file in one piece.
So, I would like to write a program that opens another program to convert the file to mp3 (MediaHuman Audio Converter) and then amplifies the converted file with another program (WavePad audio editor), but I don't know if this is possible.
At present, I'm doing that manually, but that takes a long time of waiting and requires fewer than 10 clicks (spread throughout the process), which is tedious.
I leave the program I wrote. I transcribed it to remove some functions that are not relevant and are not related to this process, plus I translated the comments, variables and other things into English, so it may have some errors but the original program works well:
import glob
import os
from pydub import AudioSegment
# convert to mp3 at 128 kbps
sound = AudioSegment.from_file("input-file.aac")
sound.export("output-file.mp3", format="mp3", bitrate="128k")
# sound.max_dBFS shows how far below the limit the highest sample is (in dB)
sound = AudioSegment.from_file("output-file.mp3", format="mp3")
max_gain_without_distortion = -1 * sound.max_dBFS
# increase volume by "max_gain_without_distortion" dB
song = AudioSegment.from_mp3("output-file.mp3")
louder_song = song + max_gain_without_distortion
# save louder song
louder_song.export("output.mp3", format='mp3')
PC specifications: ///
OS: windows 10 pro 64 bits ///
RAM: 4gb ///
CPU: dualcore 3ghz ///
PYTHON VERSION: 3.7.1 ///
Pydub version: v0.23.1-0-g46782a9 ///
ffmpeg/avlib version: "Build: ffmpeg-20190219-ff03418-win32-static" ///

As agreed in comments, as a solution I am going to propose using a command line tool: FFmpeg. Here's the command you need:
ffmpeg -i input-file.aac -b:a 128k -filter:a loudnorm output.mp3
This uses the loudnorm filter. You can also apply gain directly as explained in the docs, but one should expect inferior results. Normalization can be done in a number of ways; I suggest reading this post.
By combining it with e.g. find . -name '*.wav' -type f you can easily find and convert all files in a directory tree.
If you're bent on using Python, you can check Python bindings. Basics:
import ffmpeg
ffmpeg.input('stereo.aac').output('mono.mp3').run()
Initially I was going to propose using sox: Sound eXchange, the Swiss Army knife of audio manipulation. It's not Python, though it has Python bindings: pysox. However, it turned out it does not support the aac format (it still supports dozens of other formats). I thought it could be interesting to mention it anyway, as one could first convert to a more popular format with ffmpeg and pipe the results to sox. The latter has many more options for modifying the audio stream.
Convert wav to mp3 at 128 kbit/s (-r sets the sample rate, not the bitrate; for mp3 output the bitrate is given with -C):
sox input-file.wav -C 128 output-file.mp3
The OP asks to "increase volume by max_gain_without_distortion dB" and for this we can use either gain or norm as explained in docs:
sox input-file.wav -C 128 output-file.mp3 gain -n -3
According to the docs, the -n option normalises the audio to 0dB FSD; it is often used in conjunction with a negative gain-dB to the effect that the audio is normalised to a given level below 0dB.
sox --norm input-file.wav -C 128 output-file.mp3
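Since the question asks about driving external programs from Python, the ffmpeg approach can also be scripted with the standard subprocess module, so nothing is loaded into Python's memory. This is a rough sketch: the function names are hypothetical, it assumes ffmpeg is on the PATH, and it uses -b:a to set the audio bitrate.

```python
import subprocess


def build_ffmpeg_command(src: str, dst: str, bitrate: str = "128k") -> list:
    """Build the ffmpeg argument list for convert-and-normalise (hypothetical helper)."""
    return ["ffmpeg", "-i", src, "-b:a", bitrate, "-filter:a", "loudnorm", dst]


def convert(src: str, dst: str, bitrate: str = "128k") -> None:
    """Run ffmpeg as a child process; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_ffmpeg_command(src, dst, bitrate), check=True)


# Show the command that would be executed, without running ffmpeg here.
print(" ".join(build_ffmpeg_command("input-file.aac", "output.mp3")))
```

Calling convert("input-file.aac", "output.mp3") would then do the whole job in one step, and a loop over glob.glob("*.aac") would cover a folder of files.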

Converting .ul files

I recently copied a bunch of audio files, which are feedback left during a phone call.
The vast majority of them are mp3, but a small percentage are files ending in a .ul extension, which I believe is ULAW.
I have tried to play them in Audacity and VLC, but get garbled sounds. I suspect they are corrupted, but I'd like to confirm that by attempting to convert them to another audio format.
Would anyone be able to recommend a library to do that?
I know Python has the audioop module but I do not know enough to start messing with the audio data.
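One way to test whether the files are really corrupted is to decode them yourself and listen to the result. The sketch below assumes the files are headerless G.711 mu-law, mono, at 8000 Hz, which is typical for telephony but is only a guess about these particular files. It uses the standard wave module and does not depend on audioop; the function names are hypothetical.

```python
import wave


def ulaw_byte_to_pcm16(byte: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit linear sample."""
    u = ~byte & 0xFF                      # mu-law bytes are stored inverted
    magnitude = (((u & 0x0F) << 3) + 0x84) << ((u >> 4) & 0x07)
    magnitude -= 0x84                     # remove the encoding bias
    return -magnitude if u & 0x80 else magnitude


def ulaw_to_wav(ul_path: str, wav_path: str, sample_rate: int = 8000) -> None:
    """Write a mono 16-bit WAV from a headerless mu-law file (assumed layout)."""
    with open(ul_path, "rb") as f:
        encoded = f.read()
    pcm = b"".join(
        ulaw_byte_to_pcm16(b).to_bytes(2, "little", signed=True) for b in encoded
    )
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(pcm)


# 0xFF is mu-law silence; 0x00 and 0x80 are the largest negative/positive values.
print(ulaw_byte_to_pcm16(0xFF), ulaw_byte_to_pcm16(0x00), ulaw_byte_to_pcm16(0x80))
```

If the resulting WAV still sounds garbled at 8000 Hz, the files may be A-law, use a different sample rate, or indeed be damaged.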

Importing audio track (wav or aiff) in Python

I have an audio track in AIFF format. I would like to open this audio file with Python, and import the amplitudes of the sound and perform some mathematical analysis such as Fourier Transform, etc.
Is this possible in Python?
Are there libraries or modules, which allow me to acquire an audio file?
Throughout my search, I have found scipy.io.wavfile, which works for WAV audio files.
Are there other libraries to import audio files in Python?
Is there something similar for AIFF files?
Obviously, I can convert the AIFF into a WAV file, but I would like to import the AIFF file directly, if possible.
As a side question: are there some more specific (by specific, I mean better than Python) programming languages to perform such kind of analysis and acquisition of audio files?

Python comes with AIFF support as part of the standard library -- see the aifc module.
This module provides support for reading and writing AIFF and AIFF-C
files. AIFF is Audio Interchange File Format, a format for storing
digital audio samples in a file. AIFF-C is a newer version of the
format that includes the ability to compress the audio data.
Depending on what your end goals are, you may be more productive using a tool like PureData that's designed just for working with audio and has things like reading audio files and performing ffts as primitives.
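Whichever reader you end up using, uncompressed AIFF stores its samples as big-endian signed integers, so the raw frame bytes need decoding before any analysis. Here is a minimal sketch for 16-bit data using the standard struct module. The function name is hypothetical, and it assumes you already have the frame bytes (for example from aifc's readframes; note that aifc was removed from the standard library in Python 3.13).

```python
import struct


def split_pcm16_be(frames: bytes, nchannels: int) -> list:
    """Decode interleaved big-endian 16-bit frames into one sample list per channel."""
    count = len(frames) // 2
    samples = struct.unpack(f">{count}h", frames[:count * 2])
    # Samples are interleaved (ch0, ch1, ch0, ch1, ...), so stride by channel count.
    return [list(samples[ch::nchannels]) for ch in range(nchannels)]


# Two stereo frames: left channel 1, 2 and right channel -1, -2.
demo = struct.pack(">4h", 1, -1, 2, -2)
print(split_pcm16_be(demo, 2))
```

The per-channel lists can then be handed to whatever Fourier transform routine you prefer.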

Yes, I also came across this problem using scipy.io.wavfile. I looked up the problem and see that Scikits might be interesting to get around this wave only solution.
https://sites.google.com/site/ldpyproject/scikits-audiolab
As for Pure Data, I use it a lot, but of course it depends on what you wish to do with your sound file.
