I am trying to write a Python program in which the user inputs a video file and the program mutes/beeps the curse words in it, outputting a filtered video file: basically a profanity filter.
First, I converted the video file into .wav format; I am now trying to apply an audio profanity filter to the .wav file, after which I will write that .wav file back into the video.
So far, I am able to split the audio file into 5-second chunks and extract text from each chunk using the speech_recognition library. But if a word overlaps two chunks, I cannot detect it, check whether the chunk's text appears in the curse-words list, and reduce the dB of that chunk to mute it. (Please also suggest a way to overlay a beep sound instead of muting.)
I am not sure whether my approach is right.
I just want an audio profanity filter in Python; up until now I am only able to make chunks and extract text.
import speech_recognition as sr
import os
from pydub.utils import make_chunks
from pydub import AudioSegment
from pydub.playback import play
import codecs
import re
import fileinput
curse_words = []  # avoid naming this `list`, which shadows the built-in
# Curse words list
with codecs.open("words_final.txt", "r") as f0:
    sentences_lines = f0.read().splitlines()
    for sentence in sentences_lines:
        curse_words.append(sentence)
# print(curse_words)
# create a speech recognition object
r = sr.Recognizer()
# a function that splits the audio file into chunks
# and applies speech recognition
def get_large_audio_transcription(path):
    """
    Split the large audio file into chunks
    and apply speech recognition on each of these chunks.
    """
    # open the audio file using pydub
    sound = AudioSegment.from_wav(path)
    chunk_length_ms = 5000  # pydub works in milliseconds
    chunks = make_chunks(sound, chunk_length_ms)  # make 5-second chunks
    folder_name = "audio-chunks"
    # create a directory to store the audio chunks
    if not os.path.isdir(folder_name):
        os.mkdir(folder_name)
    whole_text = ""
    # process each chunk
    for i, audio_chunk in enumerate(chunks, start=1):
        # export the audio chunk and save it in the `folder_name` directory
        chunk_filename = os.path.join(folder_name, f"chunk{i}.wav")
        audio_chunk.export(chunk_filename, format="wav")
        # recognize the chunk
        with sr.AudioFile(chunk_filename) as source:
            audio_listened = r.record(source)
        # try converting it to text
        try:
            text = r.recognize_google(audio_listened, language="en-US")
            # muting experiment: replace the chunk with silence of the same
            # length (playing it back is not needed for filtering)
            # wav_file = AudioSegment.from_file(chunk_filename, format="wav")
            # silent_wav_file = AudioSegment.silent(duration=chunk_length_ms)
            # play(silent_wav_file)
        except sr.UnknownValueError as e:
            print("Error:", str(e))
        else:
            text = f"{text.capitalize()}. "
            print(chunk_filename, ":", text)
            whole_text += text
    # return the text for all chunks detected
    return whole_text
path = "Welcome.wav"
print("\nFull text:", get_large_audio_transcription(path))
# Will implement a loop to sum all chunks and make a final filtered .wav file
First extract audio.wav from the video file. Then, using this solution, get the timestamps for each word in that audio. You can then mute/beep the bad words in the audio and join it back to the video.
So far I have tried:
from moviepy.editor import *
videoclip = VideoFileClip("filename.mp4")
audioclip = AudioFileClip("audioname.mp3")
new_audioclip = CompositeAudioClip([audioclip])
videoclip.audio = new_audioclip
videoclip.write_videofile("new_filename.mp4")
But it takes a very long time.
I'd like to do it without re-encoding. I would also prefer opening the video or audio clip from bytes in MoviePy.
One way to do it is using ffmpeg_merge_video_audio from FFMPEG tools.
ffmpeg_merge_video_audio merges a video file video and an audio file audio into one movie file output.
By default the merging is performed without re-encoding.
Code sample:
from moviepy.video.io import ffmpeg_tools
ffmpeg_tools.ffmpeg_merge_video_audio("filename.mp4", "audioname.mp3", 'new_filename.mp4') # Merge audio and video without re-encoding
Note:
As far as I know, it's not possible to do it "from bytes" using MoviePy.
My current code takes an mp4 video file and adds an mp3 music file to it; the duration of the mp4 is set to the length of the mp3, the clip is resized to 1920x1080 pixels, and finally it saves the finished video.
Result: the finished video plays the mp4 file once and then freezes until the mp3 file ends.
Result that I want: the mp4 file loops until the end of the mp3 file, so it doesn't freeze after one play.
from moviepy.editor import *
import moviepy.editor as mp
import moviepy.video.fx.all as vfx
audio = AudioFileClip("PATH/TO/MP3_FILE")
clip = VideoFileClip("PATH/TO/MP4_FILE").set_duration(audio.duration)
# Set the audio of the clip
clip = clip.set_audio(audio)
#Resizing
clip_resized = clip.resize((1920, 1080))
#Code that doesn't work for looping
newClip = vfx.loop(clip_resized)
# Export the clip
clip_resized.write_videofile("movie_resized.mp4", fps=24)
The code itself works but the mp4 doesn't loop until the end. Thanks in advance.
Three years late on that one, but it's still the top result for this on Google.
If you want to loop the clip for the duration of the audio, MoviePy has a simple loop function that you call on the clip you want to loop; it takes either the number of loops or the duration to loop for.
So in your case it would be
loopedClip = clip_resized.loop(duration = audio.duration)
Or, if you don't want to create a separate clip, then
clip_resized = clip_resized.loop(duration = audio.duration)
If you get this error:
OSError: Error in file ...video.mp4, Accessing time t=101.77-101.81 seconds, with clip duration=101 seconds
it occurs because the looped video no longer matches the audio of the final file. So you have to extract the audio from the video file (or use your custom audio file) and loop it too, the same way as the video:
from moviepy.editor import AudioFileClip, VideoFileClip
import moviepy.audio.fx.all as afx
import moviepy.video.fx.all as vfx

# here I'm using the audio of the original video, but if you have custom audio pass it here
audio = AudioFileClip("video.mp4")
audio = afx.audio_loop(audio, duration=500)  # you can use n=X too
clip1 = VideoFileClip("video.mp4")
clip1 = vfx.loop(clip1, duration=500)  # you can use n=X too
clip1 = clip1.set_audio(audio)
clip1.write_videofile("movie.mp4")
I want to determine exactly when the speech in an audio file starts and ends. First, I am using the speech_recognition library to determine the speech content of the audio file:
import speech_recognition as sr
filename = './dir_name/1.wav'
r = sr.Recognizer()
with sr.AudioFile(filename) as source:
audio_data = r.record(source)
text = r.recognize_google(audio_data, language = "en-US")
print(text)
Running this code I get the correct output (the recognized speech content of the audio file). How can I determine where exactly the speech starts and ends? Let's assume the file contains ONLY speech. I tried different methods, e.g. analysing the amplitude of the signal, its envelope, etc., but did not get good accuracy. So I started using the speech_recognition library, hoping it could be useful here.
I am trying to load an audio file in Python and process it with Google speech recognition.
The problem is that, unlike C++, Python doesn't expose data types, classes, or direct memory access for converting between one data type and another by creating a new object and repacking the data.
I don't understand how it's possible to convert from one data type to another in Python.
The code in question is below:
import speech_recognition as spr
import librosa
audio, sr = librosa.load('sample_data/metal.mp3')
# create a speech recognition object
r = spr.Recognizer()
r.recognize_google(audio)
The error is:
audio_data must be audio data
How do I convert the audio object to be used in google speech recognition
@Mich, I hope you have found a solution by now. If not, please try the below.
First, convert the .mp3 file to .wav format using other methods as a pre-processing step.
import speech_recognition as sr

# Create an instance of the Recognizer class
recognizer = sr.Recognizer()

# Create an audio file instance from the converted file
audio_ex = sr.AudioFile('sample_data/metal.wav')
type(audio_ex)

# Create audio data
with audio_ex as source:
    audiodata = recognizer.record(source)
type(audiodata)

# Extract text
text = recognizer.recognize_google(audio_data=audiodata, language='en-US')
print(text)
You can select the speech language from https://cloud.google.com/speech-to-text/docs/languages
Additionally, you can set the minimum energy (loudness) threshold for the audio using the command below.
recognizer.energy_threshold = 300  # min threshold set to 300
Librosa returns a numpy array; you need to convert it back to PCM bytes. Something like this:
import numpy as np
raw_audio = np.int16(audio / np.max(np.abs(audio)) * 32767).tobytes()
You would probably be better off loading the mp3 with an ffmpeg wrapper without librosa; librosa does its own processing on the audio (normalization, etc.). It's better to work with the raw data.
Try this with the speech recognizer (note that sr.AudioFile only reads WAV/AIFF/FLAC, so use the converted .wav file, and the Recognizer must be created before it is used):
import speech_recognition as spr

r = spr.Recognizer()
with spr.AudioFile('sample_data/metal.wav') as source:
    audio = r.record(source)
r.recognize_google(audio)
I want to write a Python program that writes chunks from an audio file. I can write chunks from an audio file available locally using the following code:
from pydub import AudioSegment
from pydub.utils import make_chunks

myaudio = AudioSegment.from_file("file1.wav", "wav")
chunk_length_ms = 10000  # pydub works in milliseconds
chunks = make_chunks(myaudio, chunk_length_ms)  # make 10-second chunks

# Export all of the individual chunks as wav files
for i, chunk in enumerate(chunks):
    chunk_name = "chunk{0}.wav".format(i)
    print("exporting", chunk_name)
    chunk.export(chunk_name, format="wav")
The above code creates 10000-millisecond chunks of the audio file "file1.wav". But I want to write chunks from an audio stream; the stream could be wav or mp3. Can someone help me with this?
Change the audio chunk to a numpy array using .get_array_of_samples():
np.array(chunks[0].get_array_of_samples())