.raw to .wav via pydub(AudioSegment) sounds noisy - python

I want to convert a .raw audio file to a .wav audio file, so I use the code below with pydub's AudioSegment:
from pydub import AudioSegment
final = AudioSegment.from_file('input.raw', format='raw', frame_rate=8000, channels=1, sample_width=1).export('result.wav', format='wav')
However, the output file 'result.wav' sounds very noisy. Actually, I'm not sure whether 'input.raw' itself has clear sound, because it was captured from the RTP packets of a VoIP phone call.
So my question is: should the output (.wav) file sound clear if the input (.raw) file is not corrupted? I'm wondering what the problem is: a corrupted file, or incorrect code?

I ran into a similar issue when converting PCMU raw audio to WAV, and I reached out to the author of pydub via an issue on GitHub. Here was his response:
pydub assumes any file is a raw wave if the filename ends with raw. And also doesn't have a way to inject the -ar 8000 into the conversion command (to tell ffmpeg that the audio is at 8000 samples per second)
So the workaround is to open the file manually and explicitly tell pydub the file's format, like so:
from pydub import AudioSegment
# open the file ourselves so that pydub doesn't try to inspect the file name
with open('input.raw', 'rb') as raw_audio_f:
    # explicitly tell pydub the format of your file
    # use `ffmpeg -formats | grep PCM` to figure out which string value to use
    sound = AudioSegment.from_file(raw_audio_f, format="mulaw")
# override the default sample rate with the rate we know is correct
sound.frame_rate = 8000
# then export it; pydub's export() defaults to MP3, so ask for WAV explicitly
sound.export('result.wav', format='wav')
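One plausible cause of the noise in the question: RTP audio from VoIP calls is often G.711 μ-law (PCMU), where each byte is a logarithmically companded sample, so reading the bytes as plain 8-bit linear PCM (sample_width=1) garbles them. For comparison, here is a minimal stdlib-only sketch of the expansion the "mulaw" format performs, assuming the file is headerless, mono μ-law at 8000 Hz. The function names are hypothetical. (The stdlib audioop.ulaw2lin used to do this, but audioop was removed in Python 3.13.)

```python
import io
import struct
import wave


def ulaw_to_linear(byte):
    """Expand one G.711 mu-law byte to a signed 16-bit linear sample."""
    u = ~byte & 0xFF                  # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07        # segment number
    mantissa = u & 0x0F               # step within the segment
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84  # 0x84 is the G.711 bias
    return -magnitude if sign else magnitude


def mulaw_to_wav_bytes(raw_bytes, frame_rate=8000):
    """Hypothetical helper: wrap headerless mono mu-law data in a 16-bit PCM WAV."""
    samples = [ulaw_to_linear(b) for b in raw_bytes]
    # native byte order, which is what the wave module expects for writeframes()
    pcm = struct.pack("=%dh" % len(samples), *samples)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)             # 2 bytes = 16-bit samples
        w.setframerate(frame_rate)
        w.writeframes(pcm)
    return buf.getvalue()


# Usage (paths from the question):
# with open('input.raw', 'rb') as f:
#     wav_bytes = mulaw_to_wav_bytes(f.read())
# with open('result.wav', 'wb') as f:
#     f.write(wav_bytes)
```

If the result still sounds noisy, the payload may be a different codec (for example A-law/PCMA), or the RTP headers may not have been stripped from the capture.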

Related

Using os.system() to convert audio files sample rate

I have started working on an NLP project, and as a first step I need to downsample the audio files. I found a script that does this automatically, but although I can use it to downsample my audio, I'm struggling to understand how it works.
import os
def convert_audio(audio_path, target_path, remove=False):
    """This function sets the audio `audio_path` to:
        - 16000Hz Sampling rate
        - one audio channel ( mono )
    Params:
        audio_path (str): the path of audio wav file you want to convert
        target_path (str): target path to save your new converted wav file
        remove (bool): whether to remove the old file after converting
    Note that this function requires ffmpeg installed in your system."""
    os.system(f"ffmpeg -i {audio_path} -ac 1 -ar 16000 {target_path}")
    # os.system(f"ffmpeg -i {audio_path} -ac 1 {target_path}")
    if remove:
        os.remove(audio_path)
This is the code that's giving me trouble. I don't understand how the 4th line from the bottom (the os.system(...) call) works; I believe that is the line that resamples the audio files.
The repo it comes from:
https://github.com/x4nth055/pythoncode-tutorials/
If anyone knows how this works, or knows a better way to downsample audio files, I'd love to hear it. Thanks!
Have you used ffmpeg before? Its docs describe these options clearly (though some audio background may be needed to understand them):
-ac[:stream_specifier] channels (input/output,per-stream) Set the number of audio channels. For output streams it is set by default to the number of input audio channels. For input streams this option only makes sense for audio grabbing devices and raw demuxers and is mapped to the corresponding demuxer options.
-ar[:stream_specifier] freq (input/output,per-stream) Set the audio sampling frequency. For output streams it is set by default to the frequency of the corresponding input stream. For input streams this option only makes sense for audio grabbing devices and raw demuxers and is mapped to the corresponding demuxer options.
Explanation of os.system (from the Python docs):
Execute the command (a string) in a subshell... On Windows, the return value is that returned by the system shell after running command. The shell is given by the Windows environment variable COMSPEC: it is usually cmd.exe, which returns the exit status of the command run; on systems using a non-native shell, consult your shell documentation.
For a better understanding, I suggest printing the command first:
cmd_str = f"ffmpeg -i {audio_path} -ac 1 -ar 16000 {target_path}"
print(cmd_str) # then you can copy paste to cmd/bash and run
os.system(cmd_str)
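As the print tip shows, the f-string only builds a shell command line; ffmpeg does the resampling because of -ac 1 (mono) and -ar 16000 (16 kHz). A sketch of the same call with subprocess.run and an argument list instead of os.system, so paths containing spaces are passed intact and a non-zero exit status raises instead of being ignored. The function name ffmpeg_resample_cmd is hypothetical; shlex.join needs Python 3.8+.

```python
import shlex
import subprocess


def ffmpeg_resample_cmd(audio_path, target_path, sample_rate=16000, channels=1):
    """Same ffmpeg invocation as the os.system() line, as an argument list."""
    return [
        "ffmpeg",
        "-i", str(audio_path),      # input file
        "-ac", str(channels),       # number of output audio channels (1 = mono)
        "-ar", str(sample_rate),    # output sampling rate in Hz; ffmpeg resamples to it
        str(target_path),           # output file; container inferred from the extension
    ]


def convert_audio(audio_path, target_path, sample_rate=16000, channels=1):
    cmd = ffmpeg_resample_cmd(audio_path, target_path, sample_rate, channels)
    print(shlex.join(cmd))              # shell-quoted form you can paste into bash
    subprocess.run(cmd, check=True)     # raises CalledProcessError on a non-zero exit status
```

For example, ffmpeg_resample_cmd("in file.wav", "out.wav") prints as ffmpeg -i 'in file.wav' -ac 1 -ar 16000 out.wav, whereas the unquoted f-string would split "in file.wav" into two arguments.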

Write .3gp file into .wav format python Flask server

I need to take a .3gp audio file recorded on the Android front end and convert it to .wav on the Python Flask back end for further processing. Can anyone suggest a method or library to convert .3gp audio to .wav?
import flask
import werkzeug.utils
audiofile = flask.request.files['file']
filename = werkzeug.utils.secure_filename(audiofile.filename)
audiofile.save('Audio/' + filename)
I'm currently using this code, which receives the audio file as .3gp. I need to convert it to .wav.
Update: you can also do it with ffmpeg.
Method 1: use ffmpeg (installation instructions for Ubuntu: https://github.com/adaptlearning/adapt_authoring/wiki/Installing-FFmpeg#installing-ffmpeg-in-ubuntu)
In bash:
ffmpeg -i path/to/3gp.3gp path/to/wav.wav
or in Python (which runs the same shell command):
import os
os.system('ffmpeg -i path/to/3gp.3gp path/to/wav.wav')
Method 2: convert .3gp to .mp3, then .mp3 to .wav.
Use ftransc (https://pypi.org/project/ftransc/) to convert .3gp to .mp3. It currently has no Python API, so either run it in bash (you also need to give the destination; check its help):
ftransc -f mp3 filename.3gp
or call it from Python:
import os
os.system('ftransc -f mp3 filename.3gp')
Then use pydub (https://github.com/jiaaro/pydub#installation) to convert the .mp3 to .wav:
from pydub import AudioSegment
newAudio = AudioSegment.from_mp3('path/to/mp3')
newAudio.export('path/to/destination.wav', format="wav")
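To tie Method 1 to the Flask snippet in the question, a minimal sketch of converting the saved upload in place, assuming the ffmpeg binary is on PATH. Both function names are hypothetical, and the "-y" flag (overwrite a .wav left over from an earlier upload with the same name) is an addition of this sketch, not part of Method 1.

```python
import subprocess
from pathlib import Path


def wav_target(saved_path) -> Path:
    """Same folder and base name as the uploaded file, with a .wav extension."""
    return Path(saved_path).with_suffix(".wav")


def convert_upload_to_wav(saved_path) -> Path:
    """Hypothetical helper: convert the saved .3gp upload with ffmpeg."""
    src = Path(saved_path)
    dst = wav_target(src)
    # Method 1's command as an argument list; "-y" overwrites an existing .wav
    subprocess.run(["ffmpeg", "-y", "-i", str(src), str(dst)], check=True)
    return dst
```

In the view, after audiofile.save('Audio/' + filename), calling convert_upload_to_wav('Audio/' + filename) would return the path of the .wav for further processing.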

How to validate mp4 file or audio files in general with python?

I have a REST API built with Django REST framework. One of its serializers accepts a Base64 file, which is our audio file. What I want is simply to check and validate the decoded file, so I know whether it is a valid MP4 (or any valid audio type in general).
The problem is that sometimes the audio file is corrupted after upload and save and can't be played, so this validation is essential to confirm whether the file was sent correctly or was already corrupted when it was sent.
I have been searching Google for anything that can do this simple task, but all I found was how to play or manipulate audio. I didn't even find anything that raises an exception when opening an invalid file.
For more info: I'm using django-extra-fields, and I use Base64FileField to implement my audio file field. The library provides an example of this for PDFs; I'm trying to do the same for audio, but what is holding me back is the audio check itself.
The PDF example:
import io
import logging
import PyPDF2
from drf_extra_fields.fields import Base64FileField
logger = logging.getLogger(__name__)
class PDFBase64File(Base64FileField):
    ALLOWED_TYPES = ['pdf']
    def get_file_extension(self, filename, decoded_file):
        try:
            PyPDF2.PdfFileReader(io.BytesIO(decoded_file))
        except PyPDF2.utils.PdfReadError as e:
            logger.warning(e)
        else:
            return 'pdf'
What is done so far:
class AudioBase64File(Base64FileField):
    ALLOWED_TYPES = (
        'amr',
        'ogg',
        'm4a',
        '3gp',
        'aac',
        'mp4',
        'mp3',
        'flac'
    )
    INVALID_FILE_MESSAGE = ("Please upload a valid audio.")
    INVALID_TYPE_MESSAGE = ("The type of the audio couldn't be determined.")
    def get_file_extension(self, filename, decoded_file):
        # missing validation
        return 'mp4'
You can use ffmpeg.
Decode the whole file and check whether any errors are reported; ffmpeg reports every error it hits while reading the file.
You could also skip parts of the file to make this faster, but decoding a file without producing any output is already fast and should be good enough:
ffmpeg -v error -i file.mp4 -f null - 2>error.log
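A minimal sketch of running that same command from the serializer, assuming the ffmpeg binary is on PATH. The helper names are hypothetical and not part of django-extra-fields. The decoded bytes are written to a temporary file rather than piped in, since some MP4 files need a seekable input to be demuxed.

```python
import os
import subprocess
import tempfile


def decode_ok(returncode: int, stderr: str) -> bool:
    """True only if ffmpeg exited cleanly AND printed nothing at the 'error' log level."""
    return returncode == 0 and stderr.strip() == ""


def ffmpeg_can_decode(decoded_file: bytes) -> bool:
    """Hypothetical helper: decode the bytes with ffmpeg, discarding the output."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(decoded_file)
        result = subprocess.run(
            ["ffmpeg", "-v", "error", "-i", path, "-f", "null", "-"],
            stdin=subprocess.DEVNULL,   # keep ffmpeg from waiting on keyboard input
            capture_output=True,
            text=True,
            errors="replace",           # tolerate non-UTF-8 bytes in ffmpeg's messages
        )
        return decode_ok(result.returncode, result.stderr)
    finally:
        os.remove(path)
```

Mirroring the PDF example, AudioBase64File.get_file_extension could return an extension only when ffmpeg_can_decode(decoded_file) is true, and fall through (returning None) otherwise.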
See also: How can I check the integrity of a video file (avi, mpeg, mp4…)?

EOF in scipy.io.wavfile.read

I was trying to read the data of a WAV file using scipy.io.wavfile.read, but it always raises this error: ValueError: Unexpected end of file.
I went through (I think) all the related questions on this site, but none of the answers worked. I have also tried writing the filename as r'Mozart 40 Allegro.wav'.
import scipy.io.wavfile
sample,data=scipy.io.wavfile.read('Mozart 40 Allegro.wav')
print(data)
Note: others mentioned that my WAV file may be corrupt, so I downloaded a sample WAV file instead. This was the result:
WavFileWarning: Chunk (non-data) not understood, skipping it.
  WavFileWarning)
Is there any way to get the WAV file I need in a form that isn't corrupt and doesn't produce that second message?
Thank you
Thanks: initially I used an online converter, but such converters do a very bad job of keeping the file intact in the precise format. VLC can handle such errors, but these libraries can't. Always use sox for converting and similar tasks, and don't forget to include the required extra files (the LAME files) if you are working with MP3.
I have had similar problems with files that weren't created with proper headers. To solve it, I first converted the file from WAV to WAV with ffmpeg, which rewrites the WAV header metadata.
The steps should be more or less:
ffmpeg -i "Mozart 40 Allegro.wav" -f wav -acodec pcm_s16le -ar 22050 -ac 1 "Mozart 40 Allegro_.wav"
The newly created file should then have proper metadata, so it should no longer raise the error when opened in Python:
sample,data=scipy.io.wavfile.read('Mozart 40 Allegro_.wav')
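To see what is actually wrong with the header before re-encoding, a stdlib sketch that walks the RIFF chunks of a WAV file. One common cause of "Unexpected end of file" is a chunk whose declared size claims more bytes than the file contains; the "Chunk (non-data) not understood" warning usually refers to an extra chunk (for example LIST metadata) that scipy skips. Both function names are hypothetical.

```python
import struct


def riff_chunks(data: bytes):
    """Yield (chunk_id, declared_size, bytes_available) for each chunk of a RIFF/WAVE file."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12  # skip 'RIFF', the overall size field and 'WAVE'
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack("<4sI", data[pos:pos + 8])
        # fewer bytes available than declared means the file was cut short
        available = min(size, len(data) - pos - 8)
        yield chunk_id.decode("ascii", "replace"), size, available
        pos += 8 + size + (size & 1)  # chunk bodies are padded to an even length


def truncated_chunks(data: bytes):
    """Chunks whose declared size runs past the end of the file."""
    return [c for c in riff_chunks(data) if c[2] < c[1]]
```

For example, reading 'Mozart 40 Allegro.wav' in binary mode and printing list(riff_chunks(...)) shows which chunks exist and whether the data chunk is shorter than its header says.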
Use underscores instead of spaces in the file name:
sample,data=scipy.io.wavfile.read('Mozart_40_Allegro.wav')
Try the soundfile library instead:
import soundfile as sf
data, samplerate = sf.read("file")  # sf.read returns a (data, samplerate) tuple

How to fix error 400 "Specify FLAC encoding to match file header"?

I am using the Google Speech-to-Text API.
Below is my Python code:
from google.cloud import speech_v1p1beta1 as speech
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="C:\\Users\\chetan.patil\\Speech Recognition-db71b5de7c80.json" #Specified key
client=speech.SpeechClient()
speech_file="Chetan_Recording_20Secflac.flac" #import file
with open(speech_file,'rb') as audio_file:
content=audio_file.read()
audio=speech.types.RecognitionAudio(content=content)
config=speech.types.RecognitionConfig(encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16,
language_code='en_US',enable_speaker_diarization=True,audio_channel_count=1,
sample_rate_hertz=44100)
response = client.recognize(config, audio)
When I run the last line of code, it gives the error "400 Specify FLAC encoding to match file header".
When I tried a .wav file instead, it gave the error "400 Must use single channel (mono) audio, but WAV header indicates 2 channels".
Can anyone please help me on this?
Removing the encoding configuration entirely also seems to work: drop encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16 from the config, since the encoding can be inferred from the audio file's header.
When i run the last code of line. It gives error as "400 Specify FLAC encoding to match file header"
You need speech.enums.RecognitionConfig.AudioEncoding.FLAC to process FLAC files.
Even i tried with .wav file then its giving error as "400 Must use single channel (mono) audio, but WAV header indicates 2 channels"
The WAV file should indeed be mono; it looks like you tried a stereo file.
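Both errors come from the config disagreeing with the file's header. A minimal stdlib sketch that reads the header values the question hard-codes (encoding, sample_rate_hertz, audio_channel_count), assuming either a FLAC file (whose first metadata block is always STREAMINFO) or a plain PCM WAV that Python's wave module can open. audio_header_info is a hypothetical name, and the returned dict is not a Google API object; its values would be copied into RecognitionConfig by hand.

```python
import io
import wave


def audio_header_info(data: bytes) -> dict:
    """Hypothetical helper: read format details from a FLAC or WAV header."""
    if data[:4] == b"fLaC":
        # 'fLaC', then a 4-byte metadata block header, then the 34-byte STREAMINFO block
        streaminfo = data[8:42]
        # bytes 10..17 pack: sample rate (20 bits), channels-1 (3 bits),
        # bits per sample-1 (5 bits), total samples (36 bits)
        packed = int.from_bytes(streaminfo[10:18], "big")
        return {
            "encoding": "FLAC",
            "sample_rate_hertz": packed >> 44,
            "audio_channel_count": ((packed >> 41) & 0x7) + 1,
            "bits_per_sample": ((packed >> 36) & 0x1F) + 1,
        }
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        with wave.open(io.BytesIO(data), "rb") as w:
            return {
                # LINEAR16 means 16-bit signed PCM; other widths are left as None here
                "encoding": "LINEAR16" if w.getsampwidth() == 2 else None,
                "sample_rate_hertz": w.getframerate(),
                "audio_channel_count": w.getnchannels(),
                "bits_per_sample": w.getsampwidth() * 8,
            }
    raise ValueError("neither a FLAC nor a RIFF/WAVE header")
```

If audio_channel_count comes back as 2, either downmix first (for example ffmpeg -i in.wav -ac 1 out.wav) or declare the real count in the config and check the Speech-to-Text docs for how multi-channel audio is handled.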
