make_chunks from an audio stream (wav or mp3) in Python

I want to write a python program that writes chunks from an audio file. I can write chunks from an audio file available locally using the following code:
from pydub import AudioSegment
from pydub.utils import make_chunks

myaudio = AudioSegment.from_file("file1.wav", "wav")
chunk_length_ms = 10000  # pydub calculates in milliseconds
chunks = make_chunks(myaudio, chunk_length_ms)  # make chunks of ten seconds each

# Export all of the individual chunks as wav files
for i, chunk in enumerate(chunks):
    chunk_name = "chunk{0}.wav".format(i)
    print("exporting", chunk_name)
    chunk.export(chunk_name, format="wav")
The above code creates chunks of 10000 milliseconds from the audio file "file1.wav". But I want to write chunks from an audio stream, and the stream could be wav or mp3. Can someone help me with this?

Convert each audio chunk to a numpy array using .get_array_of_samples():
np.array(chunks[0].get_array_of_samples())

Related

Python - Reading a large audio file to a stream?

The Question
I want to load an audio file of any type (mp3, m4a, flac, etc) and write it to an output stream.
I tried using pydub, but it loads the entire file at once which takes forever and runs out of memory easily.
I also tried using python-vlc, but it's been unreliable and too much of a black box.
So, how can I open large audio files chunk-by-chunk for streaming?
Edit #1
I found half of a solution here, but I'll need to do more research for the other half.
TL;DR: Use subprocess and ffmpeg to convert the file to wav data, and pipe that data into np.frombuffer. The problem is, the subprocess still has to finish before frombuffer is used.
...unless it's possible to have the pipe written to on 1 thread while np reads it from another thread, which I haven't tested yet. For now, this problem is not solved.
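The incremental half of that idea can be sketched with a stand-in producer process; in real use the producer command would be the ffmpeg invocation (not shown here), while the stand-in below just emits known int16 samples so the chunked `np.frombuffer` pattern can be seen end to end:

```python
import subprocess
import sys
import numpy as np

# Stand-in producer: writes 16000 little-endian int16 samples to stdout.
# Replace this command with your ffmpeg pipeline in real use.
producer_cmd = [
    sys.executable, "-c",
    "import sys, struct; sys.stdout.buffer.write(struct.pack('<16000h', *range(16000)))",
]

proc = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)

chunk_bytes = 4096  # read the pipe in fixed-size pieces instead of all at once
pieces = []
leftover = b""
while True:
    buf = proc.stdout.read(chunk_bytes)
    if not buf:
        break
    buf = leftover + buf
    usable = len(buf) - (len(buf) % 2)  # int16 needs an even byte count
    pieces.append(np.frombuffer(buf[:usable], dtype=np.int16))
    leftover = buf[usable:]
proc.wait()

samples = np.concatenate(pieces)
```

Because each `read` returns as soon as bytes are available on the pipe, the numpy conversion proceeds while the producer is still running, which is the behavior the edit above was asking about.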
I think the python package https://github.com/irmen/pyminiaudio can be helpful. You can stream an audio file like this:
import miniaudio

audio_path = "my_audio_file.mp3"
target_sampling_rate = 44100  # the input audio will be resampled at this sampling rate
n_channels = 1  # either 1 or 2
waveform_duration = 30  # in seconds
offset = 15  # this means that we read only in the interval [15s, duration of file]

waveform_generator = miniaudio.stream_file(
    filename=audio_path,
    sample_rate=target_sampling_rate,
    seek_frame=int(offset * target_sampling_rate),
    frames_to_read=int(waveform_duration * target_sampling_rate),
    output_format=miniaudio.SampleFormat.FLOAT32,
    nchannels=n_channels)

for waveform in waveform_generator:
    # do something with the waveform....
    pass
I know for sure that this works on mp3, ogg, wav and flac, but for some reason it does not on mp4/aac, and I am actually looking for a way to read mp4/aac.

Audio profanity filter using python

I am trying to write a python program in which the user inputs a video file and the program mutes/beeps the curse/bad words in it and outputs a filtered video file; basically a profanity filter.
First, I converted the video file into .wav format; then I am trying to apply the audio profanity filter to the .wav file, and will write that .wav file back into the video.
So far, I am able to make chunks of the audio file and extract text from each 5-second audio chunk using the speech_recognition library. But if words overlap between chunks I will not be able to detect them. I then check whether the chunk's text is found in the curse-words list, and reduce the dB of that audio chunk, which mutes it (suggest another way to make a beep sound instead of muting it).
I am confused about whether my approach is right.
I just want an audio profanity filter using python; up till now I am able to extract text and make chunks only.
import speech_recognition as sr
import os
import codecs
from pydub.utils import make_chunks
from pydub import AudioSegment
from pydub.playback import play

curse_words = []
# Curse words list
with codecs.open("words_final.txt", "r") as f0:
    sentences_lines = f0.read().splitlines()
for sentences in sentences_lines:
    curse_words.append(sentences)
# print(curse_words)

# create a speech recognition object
r = sr.Recognizer()

# a function that splits the audio file into chunks
# and applies speech recognition
def get_large_audio_transcription(path):
    """
    Split the large audio file into chunks
    and apply speech recognition on each of them.
    """
    # open the audio file using pydub
    sound = AudioSegment.from_wav(path)
    chunk_length_ms = 5000  # pydub calculates in milliseconds
    chunks = make_chunks(sound, chunk_length_ms)  # make chunks of five seconds
    folder_name = "audio-chunks"
    # create a directory to store the audio chunks
    if not os.path.isdir(folder_name):
        os.mkdir(folder_name)
    whole_text = ""
    # process each chunk
    for i, audio_chunk in enumerate(chunks, start=1):
        # export the audio chunk and save it in
        # the `folder_name` directory.
        chunk_filename = os.path.join(folder_name, f"chunk{i}.wav")
        audio_chunk.export(chunk_filename, format="wav")
        # recognize the chunk
        with sr.AudioFile(chunk_filename) as source:
            audio_listened = r.record(source)
            # try converting it to text
            try:
                text = r.recognize_google(audio_listened, language="en-US")
                wav_file = AudioSegment.from_file(chunk_filename, format="wav")
                # placeholder: play silence instead of the offending chunk
                silent_wav_file = AudioSegment.silent(duration=8000)
                play(silent_wav_file)
            except sr.UnknownValueError as e:
                print("Error:", str(e))
            else:
                text = f"{text.capitalize()}. "
                print(chunk_filename, ":", text)
                whole_text += text
    # return the text for all chunks detected
    return whole_text

path = "Welcome.wav"
print("\nFull text:", get_large_audio_transcription(path))
# Will implement a loop to sum all chunks and make a final filtered .wav file
First extract audio.wav from the video file. Then, using this solution, get the timestamps for each word from that audio. You can then mute/beep the bad words in the audio and join it back to the video.
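Once you have per-word timestamps, the muting step itself can be sketched with the stdlib wave module: zero out the samples in a (start, end) interval of a 16-bit mono file. The file names and timestamps below are hypothetical:

```python
import wave
import struct

def mute_interval(in_path, out_path, start_s, end_s):
    """Copy a 16-bit mono wav, silencing samples in [start_s, end_s)."""
    with wave.open(in_path, "rb") as wf:
        params = wf.getparams()
        rate = wf.getframerate()
        frames = bytearray(wf.readframes(wf.getnframes()))
    width = params.sampwidth  # bytes per sample (2 for 16-bit)
    lo = int(start_s * rate) * width
    hi = int(end_s * rate) * width
    frames[lo:hi] = b"\x00" * (hi - lo)  # silence the interval
    with wave.open(out_path, "wb") as wf:
        wf.setparams(params)
        wf.writeframes(bytes(frames))

# hypothetical demo: a 1-second 8 kHz mono file of constant samples
with wave.open("demo_in.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(8000)
    wf.writeframes(struct.pack("<8000h", *([1000] * 8000)))

mute_interval("demo_in.wav", "demo_out.wav", 0.25, 0.5)

with wave.open("demo_out.wav", "rb") as wf:
    out = wf.readframes(wf.getnframes())
```

For a beep instead of silence, overwrite the interval with samples of a sine tone at the same sample rate rather than zeros.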

how to append audio frames to wav file python

I have a stream of PCM audio frames coming into my python code. Is there a way to write frames so that they append to an existing .wav file? What I have tried: I take 2 wav files; from one wav file I read the data and write it to an existing wav file.
import numpy
import scipy.io.wavfile

with open('testing_data.wav', 'rb') as fd:
    contents = fd.read()
contents1 = bytearray(contents)
numpy_data = numpy.array(contents1, dtype=float)
scipy.io.wavfile.write("whatstheweatherlike.wav", 8000, numpy_data)
The data is getting appended to the existing wav file, but the wav file gets corrupted when I try to play it in a media player.
With the wave library you can do that with something like:
import wave

audiofile1 = "youraudiofile1.wav"
audiofile2 = "youraudiofile2.wav"
concatenated_file = "youraudiofile3.wav"

frames = []
wave0 = wave.open(audiofile1, 'rb')
frames.append([wave0.getparams(), wave0.readframes(wave0.getnframes())])
wave0.close()
wave1 = wave.open(audiofile2, 'rb')
frames.append([wave1.getparams(), wave1.readframes(wave1.getnframes())])
wave1.close()

result = wave.open(concatenated_file, 'wb')
result.setparams(frames[0][0])
result.writeframes(frames[0][1])
result.writeframes(frames[1][1])
result.close()
And the order of concatenation is exactly the order of the writing here:
result.writeframes(frames[0][1]) #audiofile1
result.writeframes(frames[1][1]) #audiofile2
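For the original use case (new PCM frames arriving over time), note that the wave module has no append mode; one workaround, sketched here with a hypothetical file name, is to read the existing frames back and rewrite the file with the new frames added:

```python
import wave

def append_frames(path, new_frames):
    """Append raw PCM frames to an existing wav file by rewriting it."""
    with wave.open(path, "rb") as wf:
        params = wf.getparams()
        old = wf.readframes(wf.getnframes())
    with wave.open(path, "wb") as wf:
        wf.setparams(params)  # writeframes fixes up the frame count in the header
        wf.writeframes(old + new_frames)

# demo: start with 100 16-bit mono frames, then append 50 more
with wave.open("grow.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(8000)
    wf.writeframes(b"\x00\x00" * 100)

append_frames("grow.wav", b"\x01\x00" * 50)

with wave.open("grow.wav", "rb") as wf:
    total = wf.getnframes()
```

This keeps the RIFF header consistent, which is what the raw-byte approach in the question was corrupting: appending bytes without updating the header leaves the declared data length wrong.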

Convert PyAudio microphone input stream to mp3

I am looking for ways to directly encode mp3 files from the microphone without saving to an intermediate wav file. There are tons of examples for saving to a wav file out there, and a ton of examples for converting a wav file to mp3, but I have had no luck finding a way to save an mp3 directly from the mic. For example, I am using the example below, found on the web, to record to a wav file.
I am hoping to get suggestions on how to convert the frames list (pyaudio stream reads) to an mp3 directly. Or, alternatively, stream the pyaudio microphone input directly to an mp3 via ffmpeg without populating a list/array with read data. Thank you very much!
import pyaudio
import wave
# the file name output you want to record into
filename = "recorded.wav"
# set the chunk size of 1024 samples
chunk = 1024
# sample format
FORMAT = pyaudio.paInt16
# mono, change to 2 if you want stereo
channels = 1
# 44100 samples per second
sample_rate = 44100
record_seconds = 5
# initialize PyAudio object
p = pyaudio.PyAudio()
# open stream object as input & output
stream = p.open(format=FORMAT,
                channels=channels,
                rate=sample_rate,
                input=True,
                output=True,
                frames_per_buffer=chunk)
frames = []
print("Recording...")
for i in range(int(sample_rate / chunk * record_seconds)):
    data = stream.read(chunk)
    frames.append(data)
print("Finished recording.")
# stop and close stream
stream.stop_stream()
stream.close()
# terminate pyaudio object
p.terminate()
# save audio file
# open the file in 'write bytes' mode
wf = wave.open(filename, "wb")
# set the channels
wf.setnchannels(channels)
# set the sample format
wf.setsampwidth(p.get_sample_size(FORMAT))
# set the sample rate
wf.setframerate(sample_rate)
# write the frames as bytes
wf.writeframes(b"".join(frames))
# close the file
wf.close()
I was able to find a way to convert the pyaudio pcm stream to mp3 without saving to an intermediate wav file, using a lame 3.1 binary from rarewares. I'm sure it can be done with ffmpeg as well, but since ffmpeg uses lame to encode to mp3, I thought I would just focus on lame.
For converting the raw pcm array to an mp3 directly, remove all the wave file operations and replace with the following. This pipes the data into lame all in one go.
import subprocess

raw_pcm = b''.join(frames)
l = subprocess.Popen(["lame", "-", "-r", "-m", "m", "recorded.mp3"],
                     stdin=subprocess.PIPE)
l.communicate(input=raw_pcm)
For piping the pcm data into lame as it is read, I used the following. I'm sure you could do this in a stream callback if you wished.
l = subprocess.Popen(["lame", "-", "-r", "-m", "m", "recorded.mp3"],
                     stdin=subprocess.PIPE)
for i in range(int(44100 / chunk * record_seconds)):
    l.stdin.write(stream.read(chunk))
I should note that either way, lame did not start encoding until the data was finished piping in. When piping in the data on each stream read, I assumed the encoding would start right away, but that was not the case.
Also, using .stdin.write may cause some trouble if the stdout and stderr buffers aren't read. Something I need to look into further.
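One way to sidestep that stdin/stderr buffering trouble is to drain the child's stderr on a background thread while writing to stdin. This is sketched below with a stand-in consumer process instead of lame (whose exact buffering behavior is an assumption here):

```python
import subprocess
import sys
import threading

# Stand-in for an encoder: reads all of stdin, reports the size on stderr.
# Replace this command with the lame invocation in real use.
consumer_cmd = [
    sys.executable, "-c",
    "import sys\n"
    "data = sys.stdin.buffer.read()\n"
    "sys.stderr.write('got %d bytes\\n' % len(data))\n",
]

proc = subprocess.Popen(consumer_cmd, stdin=subprocess.PIPE,
                        stderr=subprocess.PIPE)

stderr_lines = []

def drain(pipe):
    # Reading in a thread keeps the child from blocking on a full stderr pipe.
    for line in pipe:
        stderr_lines.append(line)

t = threading.Thread(target=drain, args=(proc.stderr,))
t.start()

for _ in range(10):
    proc.stdin.write(b"\x00" * 1024)  # stand-in for stream.read(chunk)
proc.stdin.close()
proc.wait()
t.join()
```

With the drain thread running, the writer loop can push chunks indefinitely without deadlocking on an unread stderr buffer, which is the failure mode `.stdin.write` alone can hit.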

pydub accessing the sampling rate(Hz) and the audio signal from an mp3 file

Just found out about this interesting python package, pydub, which converts any audio file to mp3, wav, etc.
As far as I have read its documentation, the process is as follows:
read the mp3 audio file using from_mp3()
create a wav file using export().
Just curious if there is a way to access the sampling rate and the audio signal (a 1-dimensional array, supposing it is mono) directly from the mp3 file without converting it to a wav file. I am working with thousands of audio files, and it might be expensive to convert all of them to wav files.
If you aren't interested in the actual audio content of the file, you may be able to use pydub.utils.mediainfo():
>>> from pydub.utils import mediainfo
>>> info = mediainfo("/path/to/file.mp3")
>>> print(info['sample_rate'])
44100
>>> print(info['channels'])
1
This uses avlib's avprobe utility, and returns all kinds of info. I suggest giving it a try :)
Should be much faster than opening each mp3 using AudioSegment.from_mp3(…)
frame_rate means sample_rate, so you can get it like below:
from pydub import AudioSegment
filename = "hoge.wav"
myaudio = AudioSegment.from_file(filename)
print(myaudio.frame_rate)
