I am trying to build a sports analysis platform: a deep learning model processes live video (RTMP/webcam) frames and applies overlays, scores, etc., and then I need to combine the result with microphone audio and rebroadcast with audio and video in sync. Since the AI processing takes a variable amount of time per frame, I think I need the presentation timestamps (PTS) of the frames and some way to hand them to ffmpeg, but I'm lost and could not find a similar example of doing this.
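One way to approach this (a sketch of mine, not something from the post): pipe the processed frames into ffmpeg's stdin as raw video and let ffmpeg stamp each frame with wall-clock time on arrival, which absorbs the variable processing delay, while ffmpeg itself captures the microphone. The resolution, audio device, and output URL below are assumptions.

```python
import subprocess
import cv2

W, H = 1280, 720  # assumed output size

cmd = [
    "ffmpeg",
    # Video input: raw BGR frames on stdin, timestamped on arrival so that
    # variable AI processing time is reflected in the PTS.
    "-use_wallclock_as_timestamps", "1",
    "-f", "rawvideo", "-pix_fmt", "bgr24", "-s", f"{W}x{H}", "-i", "pipe:0",
    # Audio input: microphone captured by ffmpeg itself (ALSA device name
    # is an assumption; adjust for your platform).
    "-f", "alsa", "-i", "default",
    "-c:v", "libx264", "-preset", "veryfast", "-c:a", "aac",
    "-f", "flv", "rtmp://example.com/live/stream",  # placeholder URL
]
proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame = cv2.resize(frame, (W, H))
    # ... run the model and draw overlays/scores here ...
    proc.stdin.write(frame.tobytes())
```

Whether wall-clock stamping keeps lip-sync tight enough depends on how much buffering sits between capture and the write to stdin; if it doesn't, the frames' own capture timestamps have to be carried through and applied explicitly, e.g. with a muxing library rather than the CLI.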
I need to:

1. read in variable data from sensors,
2. use those data to generate audio, and
3. spit out the generated audio to individual audio output channels in real time.
My trouble is with item 3.
Parts 1 and 2 have a lot in common with a guitar effects pedal, I should think: take in some variable and adjust the audio output in real time as the input changes, but don't ever stop sending a signal while doing it.
I have had no trouble using pyaudio to drive wav files to specific channels via the mapping[] parameter of pyaudio.play, nor have I had trouble generating sine waves dynamically and sending them out with pyaudio.stream.play.

I'm working with 8 audio output channels. My problem is that stream.play only lets you specify a count of channels; as far as I can tell, I can't say, for example, "stream generated_audio to channel 5".
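One common approach (my sketch, not an API taken from the question): open a single 8-channel output stream and do the routing yourself. PyAudio streams are interleaved, so "send to channel 5" means writing your samples into that slot of every 8-sample frame and zeros into the rest.

```python
import numpy as np
import pyaudio

RATE = 44100
CHANNELS = 8
TARGET = 5        # 0-based index of the destination channel (an assumption)
FRAMES = 1024     # samples per block

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paFloat32, channels=CHANNELS,
                 rate=RATE, output=True)

t0 = 0
while True:
    # Generate the next block of audio (a sine wave as a stand-in for the
    # sensor-driven synthesis).
    t = (np.arange(FRAMES) + t0) / RATE
    t0 += FRAMES
    mono = 0.2 * np.sin(2 * np.pi * 440.0 * t).astype(np.float32)

    # Interleave: each row is one frame of 8 samples (ch0..ch7); only the
    # target channel gets the signal, the others stay silent.
    block = np.zeros((FRAMES, CHANNELS), dtype=np.float32)
    block[:, TARGET] = mono
    stream.write(block.tobytes())
```

Because the zero columns keep every channel fed continuously, you can move the signal between channels (or mix into several) just by changing which columns you write into.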
I am new to object detection using a USB webcam.

I have a USB webcam capable of recording FHD at 30 fps. I've connected this camera to a Linux machine, on a USB 3.0 port, to capture video.

The ffmpeg command line is used to capture a minute-long, 15 fps, 640x720, 5M-bitrate video.

A simple opencv-based python program reads this video file frame by frame using cap.read(). However, I've noticed that when there is a moving object (e.g. a human) in the frame, it becomes very blurry. (Here is a link to an example.) I am wondering if this is normal or if some adjustment is missing.
I am asking because I would like to run an object detection algorithm (SSD + MobileNet v2) on the video I am capturing, but for many of the frames, if the object is moving, detection fails to spot it. (Yes, of course no detection algorithm is perfect for all video analytics, and there are various reasons why detection can fail.)

Could you give pointers on removing the blurriness from these video frames? (See the exposure sketch after these questions for one thing to try.)

1) Is it because the video recording resolution is too low?

2) Is it because the python program is reading at a different frame rate? (approximately 13-14 fps)
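Motion blur like this usually comes from a long sensor exposure (the camera compensating for dim light) rather than from resolution or read rate, so one thing worth trying is more light plus a shorter, manually set exposure. Below is a sketch of forcing manual exposure through OpenCV's V4L2 backend; the property values are driver-specific assumptions to verify against your camera (v4l2-ctl lists what the driver actually accepts).

```python
import cv2

cap = cv2.VideoCapture(0, cv2.CAP_V4L2)

# Switch off auto-exposure. The magic values vary by OpenCV build and driver:
# older V4L2 mappings use 0.25 = manual / 0.75 = auto, newer ones use
# 1 = manual / 3 = auto -- both are assumptions to test on your hardware.
cap.set(cv2.CAP_PROP_AUTO_EXPOSURE, 0.25)

# Shorter exposure = less motion blur (and a darker image). Units are
# driver-specific; experiment.
cap.set(cv2.CAP_PROP_EXPOSURE, 50)

ok, frame = cap.read()
```

The trade-off is brightness: halving the exposure time roughly halves the light gathered per frame, so a shorter exposure usually needs extra illumination or a higher gain setting.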
I'm writing a multi-threaded application in Python 3: one thread grabs frames from a webcam using opencv, another records audio frames using pyaudio. Each thread puts its frames in a separate circular buffer, with an absolute timestamp for every frame.

Now I'd like to create a third thread that reads from both buffers, joins audio and video frames together using the timestamp information, and saves everything to an mp4 file. The only thing I've found is merging already-written audio and video files, for example with ffmpeg, but nothing about joining frames on the fly.

Do I really need to create the audio and video files before joining them? What I don't understand in that case is how to handle synchronization.
Any hints will be appreciated.
EDIT
In response to the comments: the timestamps are created by me and are absolute; I use a data structure which contains the actual data (a video or audio frame) plus its timestamp. The point is that the audio is recorded with a microphone and the video with a webcam, which are different pieces of hardware and not synchronized.

The webcam grabs a frame, processes it, and puts it in a circular buffer using my data structure (data + timestamp).

The microphone records an audio frame, processes it, and puts it in a circular buffer using my data structure (data + timestamp).

So I have 2 buffers, and I want to pop frames from both and join them into whatever video file format, matching the timestamps as accurately as possible. My idea is something that can add an audio frame to a video frame (I will check the timestamp matching myself).
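You shouldn't need intermediate files. One way (my sketch; the post doesn't name a library) is a muxing library such as PyAV, which can encode and interleave both streams into a single mp4, with each frame's pts computed from your absolute timestamps. Resolution, rates, and codecs below are assumptions; note also that the AAC encoder consumes fixed-size frames (typically 1024 samples), so the audio has to be fed in chunks of that size.

```python
import av  # PyAV

container = av.open("out.mp4", mode="w")

video = container.add_stream("h264", rate=30)    # frame times in units of 1/30 s
video.width, video.height, video.pix_fmt = 640, 480, "yuv420p"

audio = container.add_stream("aac", rate=44100)  # sample times in units of 1/44100 s

t0 = None  # first absolute timestamp seen; all pts are relative to it

def mux_video(img_bgr, ts):
    """img_bgr: HxWx3 uint8 array; ts: absolute timestamp in seconds."""
    global t0
    t0 = ts if t0 is None else t0
    frame = av.VideoFrame.from_ndarray(img_bgr, format="bgr24")
    frame.pts = round((ts - t0) * 30)
    for pkt in video.encode(frame):
        container.mux(pkt)

def mux_audio(samples, ts):
    """samples: mono int16 array (ideally 1024 per call); ts in seconds."""
    global t0
    t0 = ts if t0 is None else t0
    frame = av.AudioFrame.from_ndarray(samples.reshape(1, -1),
                                       format="s16", layout="mono")
    frame.sample_rate = 44100
    frame.pts = round((ts - t0) * 44100)
    for pkt in audio.encode(frame):
        container.mux(pkt)

# ... pop (data, timestamp) pairs from the two circular buffers and call
# mux_video()/mux_audio() in timestamp order, then flush both encoders:
for pkt in video.encode():
    container.mux(pkt)
for pkt in audio.encode():
    container.mux(pkt)
container.close()
```

The player then does the synchronization for you: it schedules each stream by its pts, so as long as both pts sequences come from the same absolute clock, the streams line up regardless of which thread produced which frame when.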
I've been messing around with GStreamer and Gnonlin lately. I've been concatenating segments of video files, but when I dynamically connect the src pad on the composition, I can choose either the audio or the video portion of the files, producing silent playback or video-less audio. How can I attach my composition to an audio converter and a video sink at the same time? Do I have to make two compositions and add the files to both of them?
Yes. Gnonlin compositions work on one media type at a time; audio and video are treated separately. So you make one composition for the video stream and a second for the audio stream, add your files to both, and link each composition to its own sink branch.
I have a camera that takes pictures one by one (about 10 pictures per second) and sends them to a PC. I need to show this incoming sequence of images as live video on the PC.

Is it enough just to use some Python GUI framework, create a control that holds a single image, and change the image in the control very fast?
Or would that be just lame? Should I use some sort of video streaming library? If yes, what do you recommend?
Or would that be just lame?
No. It wouldn't work at all.
There's a trick to getting video to work. Apple's QuickTime implements that trick. So do a number of Microsoft products, plus some open-source video playback tools.
There are several closely-related tricks, all of which are a huge pain in the neck.
Compression. Full-sized video is huge. Do the math: 640x480 pixels at 24-bit color and 30 frames per second is 640 x 480 x 3 bytes x 30 ≈ 27 MB of pixel data every second. Without compression, you can't read it in fast enough.
Buffering and Timing. Sometimes the data rates and frame rates don't align well. You need a buffer of ready-to-display frames, and you need a deadly accurate clock to get them to display at exactly the right intervals.
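To illustrate the timing part (my sketch, not from the answer): schedule each frame against a monotonic clock instead of sleeping a fixed interval, so one late frame doesn't push every later frame off its deadline.

```python
import time

FPS = 10.0                      # matches the ~10 images/second in the question
FRAME_INTERVAL = 1.0 / FPS

def play(frames, show):
    """`frames` is an iterable of images; `show` is whatever routine draws
    one image in your GUI (a hypothetical callback, not a real API)."""
    start = time.monotonic()
    for i, frame in enumerate(frames):
        deadline = start + i * FRAME_INTERVAL
        delay = deadline - time.monotonic()
        if delay > 0:
            time.sleep(delay)   # wait until this frame's scheduled time
        show(frame)             # if we're already late, show it immediately
```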
Making a sequence of JPEG images into a movie is what iPhoto and iMovie are for.
Usually, what we do is create a video file from the images and play that file through a standard video player. Making a QuickTime or Flash movie from images isn't that hard. There are a lot of tools to help make movies from images; almost any photo management solution can create a slide show and save it as a movie in some standard format.
Indeed, I think that Graphic Converter can do this.
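For a scriptable route (a sketch of mine; the folder, pattern, codec, and frame rate are all assumptions), OpenCV's VideoWriter can turn a directory of same-sized JPEG frames into a playable movie:

```python
import glob
import cv2

paths = sorted(glob.glob("frames/*.jpg"))   # assumed location of the images
first = cv2.imread(paths[0])
h, w = first.shape[:2]

writer = cv2.VideoWriter("out.avi",
                         cv2.VideoWriter_fourcc(*"MJPG"),  # widely supported codec
                         10.0, (w, h))                     # 10 fps, as in the question
for p in paths:
    writer.write(cv2.imread(p))
writer.release()
```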