Output audio clip contains a delay when concatenating mp3 files - python

I use the moviepy library in Python to combine several mp3 audio files into one file.
from moviepy.editor import AudioFileClip, concatenate_audioclips

# audio_list holds the paths of the mp3 files to join
clips = [AudioFileClip(c) for c in audio_list]
final_clip_audio = concatenate_audioclips(clips)
The problem is that the output audio clip contains a delay when it switches from one sound to the next. I want it to play as one unified sound, without any interruption.
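If the gaps come from how the mp3 files are decoded and re-framed, one workaround worth trying is to concatenate at the sample level with a different library. A minimal sketch using pydub (an alternative to moviepy, not a moviepy fix; it needs ffmpeg installed):

from pydub import AudioSegment

# audio_list holds the paths of the mp3 files to join, as above
segments = [AudioSegment.from_mp3(path) for path in audio_list]
combined = sum(segments[1:], segments[0])  # sample-level concatenation
combined.export("combined.mp3", format="mp3")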

Related

How to direct realtime-synthesized sound to individual channels in multichannel audio output in Python

I need to:
1. read in variable data from sensors
2. use those data to generate audio
3. spit out the generated audio to individual audio output channels in real time
My trouble is with item 3.
Parts 1 and 2 have a lot in common with a guitar effects pedal, I should think: take in some variable, then adjust the audio output in real time as the input variable changes, but never stop sending a signal while doing it.
I have had no trouble using pyaudio to drive wav files to specific channels using the mapping[] parameter of pyaudio.play, nor have I had trouble generating sine waves dynamically and sending them out using pyaudio.stream.play.
I'm working with 8 audio output channels. My problem is that stream.play only lets you specify a count of channels; as far as I can tell, I can't say, for example, "stream generated_audio to channel 5".
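One common workaround (a sketch of my own, not an official per-channel routing feature; it assumes the output device really exposes 8 channels) is to open a single 8-channel stream and write interleaved frames in which every channel except the target is zero-filled:

import numpy as np
import pyaudio

RATE = 44100
CHANNELS = 8  # assumes the device exposes 8 output channels
TARGET = 4    # zero-based index, i.e. "channel 5"

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paFloat32, channels=CHANNELS, rate=RATE, output=True)

t = np.arange(RATE) / RATE                # one second of samples
tone = 0.2 * np.sin(2 * np.pi * 440 * t)  # 440 Hz sine wave
frames = np.zeros((len(tone), CHANNELS), dtype=np.float32)
frames[:, TARGET] = tone                  # only channel 5 carries the signal
stream.write(frames.tobytes())            # pyaudio expects interleaved frame data

stream.stop_stream()
stream.close()
p.terminate()

Regenerating the frames buffer in chunks inside a loop (or in a stream callback) lets the synthesized audio follow the sensor data without ever stopping the stream.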

synchronize audio and video using frame timestamps

I'm writing a multi-threaded application in Python 3: one thread grabs frames from a webcam using OpenCV, another records audio frames using pyaudio. Both threads put their frames in separate circular buffers, with an absolute timestamp for every frame.
Now I'd like to create another thread that reads from the buffers, joins audio and video frames together using the timestamp information, and then saves everything to an mp4 file. The only thing I have found is merging already-created audio and video files, for example with ffmpeg, but nothing related to joining frames on the fly.
Do I really need to create the audio and video files before joining them? What I don't understand in that case is how to handle the synchronization.
Any hints will be appreciated.
EDIT
In response to the comments: the timestamps are created by me and are absolute; I use a data structure which contains the actual data (video or audio frame) plus the timestamp. The point is that the audio is recorded with a microphone and the video with a webcam, which are different pieces of hardware and not synchronized.
The webcam grabs a frame, processes it, and puts it in a circular buffer using my data structure (data + timestamp).
The microphone records an audio frame, processes it, and puts it in a circular buffer using my data structure (data + timestamp).
So I have two buffers. I want to pop frames from both and join them together into whatever video file format, matching the timestamps as accurately as possible. My idea is something that can attach an audio frame to a video frame (I will check how closely the timestamps match).
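A minimal sketch of the pairing step (my own illustration; Frame is a hypothetical stand-in for the data + timestamp structure described above, and it assumes timestamps within each buffer increase monotonically):

import collections

# hypothetical stand-in for the "data + timestamp" structure described above
Frame = collections.namedtuple("Frame", ["data", "ts"])

video_buf = collections.deque()  # filled by the webcam thread
audio_buf = collections.deque()  # filled by the microphone thread

def nearest_audio(video_ts):
    """Pop audio frames until reaching the one closest in time to video_ts."""
    best = None
    while audio_buf:
        candidate = audio_buf[0]
        if best is None or abs(candidate.ts - video_ts) <= abs(best.ts - video_ts):
            best = audio_buf.popleft()
        else:
            break  # timestamps are monotonic, so the gap only grows from here
    return best

def next_pair():
    """Return the next matched (video, audio) pair, or None if a buffer is empty."""
    if not video_buf or not audio_buf:
        return None
    video = video_buf.popleft()
    return video, nearest_audio(video.ts)

The matched pairs can then be handed to whatever muxer writes the mp4.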

align audio files that start and end at different times

I have audio recordings that start and end at different times.
audio 1: -----t1--------------------------s1->time
audio 2: ---------t2----s2------------------->time
audio 3: ------------------------t3-------s3->time
audio 1 is the longest and it overlaps with both audio 2 and 3.
audio 2 and audio 3 are short segments, but they do not overlap each other at all.
Is there a python library that does this?
You could first use a python library to read the audio files (numpy or scipy for instance, see https://stackoverflow.com/a/26716031/3244382).
Then you have to determine t and s for each file. If the files are not too noisy, a simple threshold on the audio signal could be sufficient. A slightly more sophisticated approach would be to compute the RMS energy or the envelope (which averages the signal) and apply the threshold to that.
Once you know t and s, you can write a new audio file from those boundaries with the same audio library.
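A minimal sketch of that approach (my own; the window size and threshold are arbitrary values that will need tuning, and it assumes a WAV file that actually contains some audible signal):

import numpy as np
from scipy.io import wavfile

def find_boundaries(path, threshold=0.02, win=1024):
    """Return (t, s): start and end of the audible signal, in seconds."""
    rate, data = wavfile.read(path)
    if data.ndim > 1:
        data = data.mean(axis=1)   # mix down to mono
    x = data.astype(np.float64)
    x /= np.max(np.abs(x))         # normalize to [-1, 1]
    n = len(x) // win
    # RMS energy over non-overlapping windows
    rms = np.sqrt((x[:n * win].reshape(n, win) ** 2).mean(axis=1))
    active = np.nonzero(rms > threshold)[0]
    t = active[0] * win / rate
    s = (active[-1] + 1) * win / rate
    return t, s

With t and s known for every file, aligning them is just a matter of offsetting each one by its own t relative to the reference recording.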

Automatically aligning audio tracks with timings for dubbing screencasts

We have some screencasts that need to be dubbed into various languages, for which we have a textual script in the target language, as shown below:
Beginning Time    Audio Narration
0:00              blah nao lorep iposm...
1:20              xao dok dkjv dwv....
...
We can record each of the above units separately and then align them at the proper beginning times, as given in the above script.
Example:
Input:
Input the N timing values: 0:00, 1:20, ...
Then input the N audio recordings.
Output:
Audio recordings aligned to the above timings. An overflow (a recording that runs past its slot) should be detected and reported individually, whereas an underflow is padded with silence.
Are there any platform-independent audio APIs / software, or a code snippet preferably in python, that would let us align these audio units based on the times provided?
If the input audio files are uncompressed (i.e., WAV files, etc.), the audio library I like to use is libsndfile. It appears to have a python wrapper here: https://code.google.com/p/libsndfile-python/. With that in mind, the rest could be accomplished like so:
1. Open an output audio stream to write audio data to with libsndfile.
2. For each input audio file, open an input stream with libsndfile.
3. Extract the meta-data information for the given audio file from your textual 'script'.
4. Write any silence needed to your master output stream, then write the data from the input stream to the output stream, and note the current position/time. Repeat this step for each input audio file, checking that each clip's target start time is always >= the current position/time noted earlier. If it is not, you have an overlap (an overflow in your terms).
Of course, you have to worry about matching sample rates and so on, but that should be enough to get started. Also, I'm not exactly sure whether you are trying to write a single output file or one per input file, but this answer should be tweakable enough. libsndfile will give you all the information you need (such as clip lengths, etc.), assuming it supports the input file format.
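A minimal sketch of those steps (my own, using the soundfile package, which is a different libsndfile wrapper than the one linked above; it assumes all inputs already share the output's sample rate and channel count):

import numpy as np
import soundfile as sf  # a libsndfile wrapper, not the wrapper linked above

def align_clips(inputs, out_path, rate=44100, channels=1):
    """inputs: list of (start_seconds, wav_path) pairs, sorted by start time."""
    with sf.SoundFile(out_path, "w", samplerate=rate, channels=channels) as out:
        pos = 0  # current write position, in samples
        for start, path in inputs:
            data, sr = sf.read(path)
            if sr != rate:
                raise ValueError(f"{path}: sample rate {sr} != {rate}")
            target = int(start * rate)
            if target < pos:
                raise ValueError(f"{path}: overflow, previous clip runs past {start}s")
            out.write(np.zeros((target - pos, channels)))  # underflow: pad with silence
            out.write(data)
            pos = target + len(data)

# e.g. the script above: first unit at 0:00, second at 1:20 (80 seconds)
align_clips([(0, "unit1.wav"), (80, "unit2.wav")], "dubbed.wav")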

Play audio and video with gnonlin

I've been messing around with GStreamer and gnonlin lately, concatenating segments of video files. But when I dynamically connect the src pad on the composition, I can choose either the audio or the video portion of the files, producing silent playback or video-less audio. How can I attach my composition to an audio converter and a video sink at the same time? Do I have to make two compositions and add the files to both of them?
Yes, gnonlin compositions work on one media type at a time; audio and video are treated separately.
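A sketch of that two-composition layout, written from memory of the old GStreamer 0.10 / gnonlin Python bindings, so treat the element names, properties, and caps strings as assumptions to verify: one composition selects each file's audio stream via the caps property, the other selects the video stream, and each composition's dynamically created pad is linked to its own sink branch.

import gobject
import pygst
pygst.require("0.10")
import gst

gobject.threads_init()
pipeline = gst.Pipeline("player")

def make_composition(name, caps, uri, duration):
    comp = gst.element_factory_make("gnlcomposition", name)
    src = gst.element_factory_make("gnlfilesource")
    src.set_property("uri", uri)
    src.set_property("caps", gst.Caps(caps))  # picks audio OR video from the file
    src.set_property("start", 0)
    src.set_property("duration", duration)
    src.set_property("media-start", 0)
    src.set_property("media-duration", duration)
    comp.add(src)
    return comp

uri = "file:///path/to/clip.ogv"
dur = 5 * gst.SECOND
audio_comp = make_composition("audio-comp", "audio/x-raw-int;audio/x-raw-float", uri, dur)
video_comp = make_composition("video-comp", "video/x-raw-yuv;video/x-raw-rgb", uri, dur)

aconv = gst.element_factory_make("audioconvert")
asink = gst.element_factory_make("autoaudiosink")
vconv = gst.element_factory_make("ffmpegcolorspace")
vsink = gst.element_factory_make("autovideosink")
pipeline.add(audio_comp, video_comp, aconv, asink, vconv, vsink)
gst.element_link_many(aconv, asink)
gst.element_link_many(vconv, vsink)

# composition pads appear dynamically; link each to its own branch
audio_comp.connect("pad-added", lambda comp, pad: pad.link(aconv.get_pad("sink")))
video_comp.connect("pad-added", lambda comp, pad: pad.link(vconv.get_pad("sink")))

pipeline.set_state(gst.STATE_PLAYING)
gobject.MainLoop().run()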
