Automate tasks with Python

I am looking for a way to automate tasks in external programs with Python.
I have large audio files in AAC format. I need to convert them to MP3 and then amplify them (while avoiding distortion).
I wrote a program with the pydub library that works great with small files, but my files are too large (longer than 2 hours or 200 MB) and I run out of memory (because that library stores the full file in RAM, I think). I can't split the file into chunks, because I couldn't merge them again afterwards for the same reason, and I need the file in one piece.
So I would like to write a program that opens another program to convert the file to MP3 (MediaHuman Audio Converter) and then amplifies the converted file with yet another program (WavePad Audio Editor), but I don't know if this is possible.
At the moment I'm doing this manually, but it involves a lot of waiting and fewer than 10 clicks (spread throughout the process), which is tedious.
Below is the program I wrote. I transcribed it to remove some functions that are not relevant to this process, and I translated the comments, variables and other things into English, so it may have some errors, but the original program works well:
from pydub import AudioSegment

# convert to mp3 at a 128 kbps bitrate
sound = AudioSegment.from_file("input-file.aac")
sound.export("output-file.mp3", format="mp3", bitrate="128k")

# sound.max_dBFS shows how far below the limit the highest sample is (in dB)
sound = AudioSegment.from_file("output-file.mp3", format="mp3")
max_gain_without_distortion = -1 * sound.max_dBFS

# increase volume by "max_gain_without_distortion" dB
louder_song = sound + max_gain_without_distortion

# save the louder song
louder_song.export("output.mp3", format="mp3")
PC specifications:
OS: Windows 10 Pro 64-bit
RAM: 4 GB
CPU: dual-core 3 GHz
Python version: 3.7.1
pydub version: v0.23.1-0-g46782a9
ffmpeg/avlib version: "Build: ffmpeg-20190219-ff03418-win32-static"

As agreed in the comments, as a solution I am going to propose using a command-line tool: FFmpeg. Here's the command you need:
ffmpeg -i input-file.aac -b:a 128k -filter:a loudnorm output.mp3
using loudnorm (note that -b:a sets the audio bitrate; -b:v would set a video bitrate). You can also apply gain directly as explained in the docs, but one should expect inferior results. Normalization can be done in a number of ways; I suggest reading this post.
By combining it with e.g. find . -name '*.wav' -type f you can easily find and convert all files in a directory tree.
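If you would rather stay in Python, here is a minimal sketch of the same batch idea (assuming ffmpeg is on your PATH; glob is used instead of find since the OP is on Windows):
import glob
import subprocess

# convert and normalize every .aac file under the current directory
for path in glob.glob('**/*.aac', recursive=True):
    out = path[:-4] + '.mp3'
    subprocess.run(
        ['ffmpeg', '-i', path, '-b:a', '128k', '-filter:a', 'loudnorm', out],
        check=True,
    )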
If you're bent on using Python, you can check Python bindings. Basics:
import ffmpeg
ffmpeg.input('stereo.aac').output('mono.mp3').run()
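The conversion with the loudnorm pass from above would look roughly like this with those bindings (a sketch I haven't run; filter and audio_bitrate are part of ffmpeg-python's API):
import ffmpeg

(
    ffmpeg
    .input('input-file.aac')
    .filter('loudnorm')                            # same loudness normalization as above
    .output('output.mp3', audio_bitrate='128k')    # maps to -b:a 128k
    .run()
)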
Initially I was going to propose sox: Sound eXchange, the Swiss Army knife of audio manipulation. It's not Python, though it has Python bindings: pysox. However, it turned out it does not support the aac format (it still handles dozens of other formats). I thought it would be interesting to mention anyway, as one could convert to a more popular format with ffmpeg first and pipe the result to sox. The latter has many more options for modifying an audio stream.
Convert wav to mp3 at a 128 kbps bitrate (sox's -C option sets the mp3 compression/bitrate; -r would set the sample rate, which is a different thing):
sox input-file.wav -C 128 output-file.mp3
The OP asks to "increase volume by max_gain_without_distortion dB", and for this we can use either gain or norm as explained in the docs:
sox input-file.wav -C 128 output-file.mp3 gain -n -3
Per the docs, "The -n option normalises the audio to 0dB FSD; it is often used in conjunction with a negative gain-dB to the effect that the audio is normalised to a given level below 0dB."
sox --norm input-file.wav -C 128 output-file.mp3
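For completeness, the pysox bindings expose the same effects; a rough, untested sketch (and again, sox would need a non-aac input):
import sox

# normalize to -3 dB and write an mp3
tfm = sox.Transformer()
tfm.norm(-3)
tfm.build('input-file.wav', 'output-file.mp3')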

Related

How to efficiently extract specific sections of a video (Python/FFmpeg)?

I have a bunch of videos from which I want to extract specific sections (either as videos or as frames). I get the specific sections from a .json file where the start and end frames are stored under labels like 'cat in video' and 'dog in video'. I have an existing method in Python using OpenCV (the method mentioned here), but I found a one-liner using ffmpeg which is a lot faster and more efficient than my Python script, except that I have to manually fill in the start and end frames in this command.
ffmpeg -i in.mp4 -vf select='between(n\,x\,y)' -vsync 0 frames%d.png
I read a few questions about working with .json files in a shell script, or passing arguments to a batch script, which looks quite complicated and might mess up my system. Since I'm not familiar with working with .json files in a shell/batch script, I'm not sure how to start. Can anyone point me in the right direction on how to make a batch script that can read variables from a .json file and feed them into my ffmpeg command?
Since you're already familiar with Python, I suggest you use it to parse the JSON files; then you can use the ffmpeg-python library, which is an ffmpeg binding for Python. It also has a crop function, which I assume is what you need.
An alternative would be to use os.system('ffmpeg <arguments>') calls from a Python script, which allows you to run external tools from the script.
Python natively supports JSON with its built-in json package.
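Putting those pieces together, a minimal sketch (the JSON layout with labels mapping to [start, end] frame pairs is a guess at your format, and subprocess is used in place of os.system since it handles arguments more safely):
import json
import subprocess

with open('sections.json') as f:
    sections = json.load(f)   # e.g. {"cat in video": [120, 360], ...}

for label, (start, end) in sections.items():
    out_pattern = label.replace(' ', '_') + '_frame%d.png'
    subprocess.run(
        ['ffmpeg', '-i', 'in.mp4',
         '-vf', f"select=between(n\\,{start}\\,{end})",
         '-vsync', '0', out_pattern],
        check=True,
    )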
As for doing this in Python, here is an alternative approach you can try, using my ffmpegio-core package:
import ffmpegio
ffmpegio.transcode('in.mp4', 'frames%d.png', vf=f"select='between(n\,{x}\,{y})'", vsync=0)
If the videos are constant frame rate, it could be faster to specify the start and end timestamps as input options:
fs = ffmpegio.probe.video_streams_basic('in.mp4')[0]['frame_rate']
ffmpegio.transcode('in.mp4', 'frames%d.png', ss_in=x/fs, to_in=y/fs, vsync=0)
If you don't know the frame rate, you end up calling ffprobe and ffmpeg for each file, so there is a tradeoff; but if your input video is long, it could be worthwhile.
If speed is your primary goal, though, calling FFmpeg directly is always the fastest.
ffmpegio GitHub repo

pydub audio playback is *extremely* loud for non-16-bit files

I have some audio samples (from SampleSwap) which I am working with in pydub. Most of them have a sample depth / bits per sample of 16, while others are 24 or 32. It looks something like this:
import pydub
a = pydub.AudioSegment.from_file('16bit_file.wav')
b = pydub.AudioSegment.from_file('24bit_file.wav')
The problem I am running into is when I try to get them to play back:
from pydub.playback import play
play(a)
play(b)
While the 16-bit files play normally, the 24-bit files are all Earth-shatteringly loud, seriously to the point of potential speaker damage. With my computer set to minimum volume, the 24-bit playback is about as loud as regular music would be on maximum volume. It's super distorted, sharp, and clipped.
I'm pretty sure I've isolated it to a problem of bit depth. The sounds all play normally in other software. I can convert the problem sounds to 16-bit either using sox or using pydub.AudioSegment.set_sample_width(2), and the issue goes away. I have also gone directly through simpleaudio for the playback (copying the code from pydub, here) and get the same issue.
The main problem is that I am writing some code for working with audio which I would like to share, but I do not want users to experience the physical or mental damage of hearing one of these busted sounds. My only workaround idea is to immediately convert the bit depth of any user-loaded sounds / lock audio playback to 16-bit files only; this works for the files I am testing, but a) I don't know if it holds true for all sounds/computers, and b) I thought this shouldn't be an issue in pydub anyway. I also thought to somehow check the volume of the sound before playing (using e.g. a.dBFS or a.max), but I haven't found anything that seems reliable (either the metric isn't really correlated with the volume, or the value seems to be more of an indication of the dynamic range provided by the extra bits).
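For reference, the guard I'm currently using looks roughly like this (it just forces everything to 16-bit before playback, which works for my test files):
from pydub import AudioSegment
from pydub.playback import play

def safe_play(segment: AudioSegment):
    # force 16-bit samples before playback to avoid the blown-out volume
    if segment.sample_width != 2:
        segment = segment.set_sample_width(2)
    play(segment)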
So my questions are:
Why do I get this alarmingly loud, distorted playback in pydub when playing non-16-bit files?
What can I do to prevent it?
Am I missing something obvious here about audio playback?
I understand this is (hopefully) not so reproducible; I could try to record it and post it if that would be helpful. I can also point out the sounds I am using on SampleSwap, but the problem really seems to be caused by any file that is not 16-bit (i.e. I can convert a sound to 32-bit and reproduce the issue).
Here's some version info:
ffmpeg 4.4
PyAudio 0.2.11
pydub 0.25.1
simpleaudio 1.0.4
The issue is on a 2019 MacBook Pro running Catalina 10.15.7. I've also tested my Windows 10 desktop (with similar versions as above), but rather than the issue above, I just get silence.

Exporting Audio for Google Speech using pydub

I'm trying to export audio files to LINEAR16 for Google Speech, and I notice that they specify little-endian byte ordering. I'm using pydub to export to 'raw' format, but I can't tell from the documentation (or the source) whether the exported files are in little- or big-endian format.
I'm using the following command for exporting:
audio = pydub.AudioSegment.from_file(self.mFilePathName, "mp4")
fullFileNameRaw = "audio.raw"
audio.export(fullFileNameRaw, format='raw')
Thank you.
-K
According to this answer, standard (RIFF) wave files are little-endian. pydub uses the stdlib wave module to write wave files, so I'm guessing it is little-endian (if you write the file with the wave headers, it does in fact have RIFF at the beginning).
Looking into it a little further, though, it seems like it may depend on the hardware platform's endianness. x86 and AMD64 are both little-endian, though, so that covers basically all the places people would run pydub (I think?).
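Either way, LINEAR16 also expects 16-bit mono PCM at a known sample rate, so it may be worth pinning those explicitly before exporting; a sketch (the 16 kHz rate is an assumption, use whatever rate you declare to the Speech API):
import pydub

audio = pydub.AudioSegment.from_file("input.mp4", "mp4")
audio = (
    audio
    .set_channels(1)        # mono
    .set_sample_width(2)    # 16-bit samples
    .set_frame_rate(16000)  # assumed rate; must match the API config
)
audio.export("audio.raw", format="raw")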

Proper params to convert mp3 to aac using avconv

I am writing a little Python script that converts mp3 files to aac. I am doing so using avconv, and I am wondering if I am doing it right.
Currently my command looks like this:
avconv -i input.mp3 -ab 80k output.aac
This brings me to my first question: I am using -ab 80k, as this works with my test files. On some files I can go higher and use 100k, but I'd prefer to always have it at the highest setting. Is there a way to say that?
The other question: I am using it in a Python script. Currently I call it as a subprocess. What I'd prefer is not to do so, as this forces me to write a file to disk and then load it again when everything is done. Is there a way to do it only in memory? I am returning the file afterwards using web.py and don't need or want it on my disk, so it would be cool not to have to use temporary files at all.
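Something along these lines is what I'm hoping for (untested; pipe:1 asks avconv to write to stdout, and -f adts forces the aac container since it can't be guessed from a filename):
import subprocess

# run the conversion and capture the encoded bytes instead of writing a file
proc = subprocess.run(
    ['avconv', '-i', 'input.mp3', '-ab', '80k', '-f', 'adts', 'pipe:1'],
    stdout=subprocess.PIPE, check=True,
)
aac_bytes = proc.stdout   # hand these to web.py without touching disk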
Thanks for any tips and tricks :)
I don't have the -ab option, but if it is equivalent to -ar (specify the sample rate), I should point out that your ears won't be able to tell the difference between 80k and anything higher.
On the subject of temporary files, have you considered using /tmp, or a specific tmpfs file system created for the purpose?
Edit:
In response to the comment about tempfiles: yes, you still use them, but you create them in /tmp or in a tmpfs file system that you have created for the job. It should get cleared on reboot, but I would expect you to delete the file once you have passed it on anyway.
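In Python that could look something like this (a sketch using the stdlib tempfile module; dir='/tmp' is the point made above):
import subprocess
import tempfile

# create the output file in /tmp; it is removed when the context exits
with tempfile.NamedTemporaryFile(suffix='.aac', dir='/tmp') as tmp:
    subprocess.run(
        ['avconv', '-y', '-i', 'input.mp3', '-ab', '80k', tmp.name],
        check=True,
    )
    data = tmp.read()   # bytes to pass on to web.py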
On the other point, about lossless aac, I may come back to you later.
Edit 2:
As I suspected, the aac format is described as the logical successor to mp3, and I strongly suspect (you or someone else may know different) that, since mp3 is what is termed lossy compression, i.e. with bits (no pun intended) missing, your desire to convert losslessly is doomed, inasmuch as the source is already lossy.
Of course, being no expert in the matter, I stand to be corrected.
Edit 3:
Your comment about too many frames leads me to believe that you are conflating the two avconv options -ar and -b.
-b is used for video and specifies the output bit rate for video and audio. The way you are using it, I suspect it is attempting to apply the same bit rate to audio and video, but there is a limit on the audio stream.
You would have to use -b:v to tell avconv to set the video bit rate and leave the audio rate alone.
I suggest that you drop the -ab option and use -ar instead, as that is audio-only.

Pure python library for MIDI to Score (Notes) and/or Audio Translation

I want something that abstracts away MIDI events, to extract/synthesize pitch/duration/dynamics/onset (e.g. a loud D# quarter note on the 4th beat).
fluidsynth and timidity work, but I'd prefer a pure Python library. I can't find anything but bindings here.
midiutil makes MIDI files and pygame plays them, but I want something that can both synthesize raw audio data and quantize the notes (i.e. as they would be represented in sheet music, not as MIDI events / pulses / "pitch" / etc.).
EDIT: these don't quite do it (either not in python, or too low-level, or "do it yourself"):
Get note data from MIDI file
Python: midi to audio stream
What you probably want is a process called "quantization", which matches the MIDI events to the closest note length.
I wrote such an app in C in 1999:
http://www.findthatzipfile.com/search-3558240-hZIP/winrar-winzip-download-midi2tone.zip.htm
(I don't have source any more, sorry)
The process itself is not very complex: I just brute-forced different note lengths to find the closest match. MIDI event pitches map directly to notes, so no conversion is needed there.
The MIDI format itself is not very complex, so I suggest you find a pure Python MIDI reading library and then apply the algorithm on top of that, for example:
https://github.com/vishnubob/python-midi
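That brute-force matching could look something like this in Python (a sketch; it assumes you already have note durations in ticks from a MIDI reader and know the file's ticks-per-quarter-note resolution):
# candidate note lengths, in quarter notes: sixteenth up to whole note
NOTE_LENGTHS = [0.25, 0.5, 1.0, 2.0, 4.0]

def quantize(duration_ticks, ticks_per_quarter):
    # pick the note length whose tick count is closest to the actual duration
    return min(NOTE_LENGTHS,
               key=lambda n: abs(n * ticks_per_quarter - duration_ticks))

# e.g. with 480 ticks per quarter note, 490 ticks snaps to a quarter note
print(quantize(490, 480))   # -> 1.0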
Have you tried mingus? It works with pyFluidSynth: http://code.google.com/p/mingus/wiki/tutorialFluidsynth
