Fast multithreaded camera capture and visualization in Python

Challenge:
I want to run three USB cameras at 1600x1300 @ 60 fps on a Jetson Xavier NX using Python.
There are several ways of doing this, but my approach has been:
Main -> Camera 1 thread -> Memory 1 -> Visualization thread 1.
The main thread starts three camera threads and three visualization threads.
The problem is the latency.
I store the images from camera 1 in Memory 1 which is shared with the visualization thread.
There are thread locks around both the shared memory and the cv2.imshow call in the visualization thread.
Is there a way of speeding up the camera visualization? I only get about 16 fps. Is it better to have one visualization thread showing all three images in a single view, or three separate ones as I have now?
The input capture is:
cv2.VideoCapture(Gstreamer_string, cv2.CAP_GSTREAMER)
The output to disk is done inside the GStreamer string by branching the stream to a multifilesink and an appsink. The file sink writes all three cameras at 60 fps; it is just the visualization on screen that takes forever.
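Roughly, each per-camera string has this shape (an illustrative sketch only; the element choices, caps and file naming below are assumptions, not my exact string):

import cv2

# Illustrative capture string: tee the camera stream to a multifilesink (recording)
# and an appsink (what cv2.VideoCapture reads). Caps and elements are assumptions.
Gstreamer_string = (
    "v4l2src device=/dev/video0 ! "
    "image/jpeg, width=1600, height=1300, framerate=60/1 ! tee name=t "
    "t. ! queue ! multifilesink location=cam1_%05d.jpg "
    "t. ! queue ! jpegdec ! videoconvert ! video/x-raw, format=BGR ! "
    "appsink drop=true"
)
cap = cv2.VideoCapture(Gstreamer_string, cv2.CAP_GSTREAMER)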
I have also tried visualizing directly after the capture in the camera thread, without the shared memory, with not much difference. I tend to think the imshow thread lock, which I need in order not to crash/freeze the GUI, is the reason. Perhaps combining all three into one window would be faster.

It is hard to guess without code, but possible bottlenecks may be:
1. cv2.imshow is not very efficient on Jetsons. You may instead use an OpenCV VideoWriter with the GStreamer backend feeding a display sink such as nveglglessink (see the first sketch after this list).
2. Your disk storage may not be able to keep up with three streams at that resolution and 60 fps. Are you using an NVMe SSD? An SD card may be slow depending on the model. Does lowering the framerate help? Are you encoding, or trying to save raw video?
3. OpenCV may also add some overhead. If OpenCV is not required for processing, a pure GStreamer pipeline may be able to display and record (if point 2 is not the issue); see the second sketch below.
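A minimal sketch of point 1, assuming BGR frames coming from the existing capture; the display pipeline string (nvvidconv/nvegltransform into nveglglessink) is an assumption and may need adapting to your JetPack version:

import cv2

width, height, fps = 1600, 1300, 60

# Display pipeline fed by VideoWriter.write(); sync=false avoids blocking on the display clock.
gst_display = (
    "appsrc ! video/x-raw, format=BGR ! queue ! "
    "videoconvert ! video/x-raw, format=BGRx ! "
    "nvvidconv ! nvegltransform ! nveglglessink sync=false"
)

Gstreamer_string = "..."  # your existing capture string, unchanged
cap = cv2.VideoCapture(Gstreamer_string, cv2.CAP_GSTREAMER)
disp = cv2.VideoWriter(gst_display, cv2.CAP_GSTREAMER, 0, float(fps), (width, height))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    disp.write(frame)          # replaces cv2.imshow; no waitKey or GUI lock needed

cap.release()
disp.release()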
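And a rough sketch of point 3: a pure GStreamer pipeline driven from Python through the gi bindings, so no frames pass through OpenCV at all. The source caps, encoder and output filename are assumptions:

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)

# One camera: tee the stream to an on-screen sink and an H.264 recording.
pipeline = Gst.parse_launch(
    "v4l2src device=/dev/video0 ! "
    "video/x-raw, width=1600, height=1300, framerate=60/1 ! "
    "tee name=t "
    "t. ! queue ! nvvidconv ! nvegltransform ! nveglglessink sync=false "
    "t. ! queue ! nvvidconv ! nvv4l2h264enc ! h264parse ! matroskamux ! "
    "filesink location=cam0.mkv"
)
pipeline.set_state(Gst.State.PLAYING)
try:
    GLib.MainLoop().run()
except KeyboardInterrupt:
    pipeline.set_state(Gst.State.NULL)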

Related

Multiprocess/multithread image analysis on Raspberry Pi

I am trying to analyse the face of a driver with a Raspberry Pi while he is driving. I am using a Raspberry Pi 4 and MediaPipe for face recognition with Python, and I am trying to maximise the number of images analysed per second. I have three stages: getting the images from the camera, analysing them with MediaPipe, and showing the images. My first question is: is it better to use multithreading or multiprocessing for these three stages? With multithreading I was able to grab images at 30 fps (the camera limit) and show them at that rate too, but I was only analysing them at 13 fps. I tried multiprocessing but could not figure it out: the first process (getting the images) works, but the other two do not. This image is my VideoGet class, the first process, to show you how I did my multiprocessing, and this code is my function that starts every process together.
I was expecting multiprocessing to be the best thing to do.
I saw that maybe I should use a Pool instead of Process, but I am not sure.
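A minimal sketch of the three-process, queue-based layout being described, assuming an OpenCV camera source; the MediaPipe analysis is stubbed out with a placeholder:

import cv2
import multiprocessing as mp

def capture(frame_q):
    # Process 1: grab frames from the camera and push them downstream.
    cap = cv2.VideoCapture(0)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame_q.put(frame)

def analyse(frame_q, result_q):
    # Process 2: placeholder for the MediaPipe face analysis.
    while True:
        frame = frame_q.get()
        result_q.put(frame)            # replace with MediaPipe processing of `frame`

def show(result_q):
    # Process 3 (here the main process): display the analysed frames.
    while True:
        cv2.imshow("driver", result_q.get())
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

if __name__ == "__main__":
    frame_q, result_q = mp.Queue(maxsize=4), mp.Queue(maxsize=4)
    mp.Process(target=capture, args=(frame_q,), daemon=True).start()
    mp.Process(target=analyse, args=(frame_q, result_q), daemon=True).start()
    show(result_q)                     # keep the GUI in the main process

Frames cross the process boundaries as pickled numpy arrays, so small bounded queues keep memory and latency in check.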

Playing audio in sync with video with frames generated on the fly, real time. Plausible?

I'm a self-taught Python programmer working on a hobby project, but I'm having some difficulty and would like to address what I see as a potential XY problem.
My app takes an audio file as input (converts it to wav) and produces visual representations of the audio (90x90 RGB frames) in the form of numpy arrays. I used to save these frames to a video file using OpenCV, then use ffmpeg to scale the video and add the (original, non-wav) audio over the top, but this meant waiting until the app had finished before playing the file. I would like to play the audio and display the frames as they are generated, in sync. My generation code takes at most 8 ms of a 16 ms frame (60 fps), so I have a reasonable number of cycles to play with.
From my research, I have found that SDL is the most appropriate tool for displaying frames at high speed, and I have managed to make a simple system that displays frames 'in time' by brute-force pixel editing. I have also discovered that SDL can play audio, and it even seems that I could synchronize this with the video as I would like, via the callback function. However, being a decidedly non-C programmer, I am at a loss as to how best to display frames, since directly assigning pixels cannot be the safest or fastest way, and I would like to scale the frames as they are displayed. I am also at a loss as to how best to convert numpy arrays to textures efficiently, as well as how best to control the synchronicity of my generation code, the audio, and the video frames.
I'm not specifically looking for an answer to any of those problems, though advice would be appreciated; I'm just making sure that this is a reasonable way forward. Is SDL/pysdl2 coupled with numpy appropriate in this scenario? Or is this asking too much from Python overall?
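For reference, a minimal PySDL2 sketch of the numpy-array-to-streaming-texture path in question, with scaling left to SDL_RenderCopy; the window size, frame rate and random frames are placeholders, and audio/sync are omitted:

import ctypes
import numpy as np
import sdl2
import sdl2.ext

FRAME_W, FRAME_H = 90, 90          # size of the generated frames
WIN_W, WIN_H = 540, 540            # display size; SDL scales during the copy

sdl2.ext.init()
window = sdl2.ext.Window("viz", size=(WIN_W, WIN_H))
window.show()
renderer = sdl2.ext.Renderer(window)

# Streaming texture the size of the generated frames; RGB24 matches an (h, w, 3) uint8 array.
texture = sdl2.SDL_CreateTexture(
    renderer.sdlrenderer, sdl2.SDL_PIXELFORMAT_RGB24,
    sdl2.SDL_TEXTUREACCESS_STREAMING, FRAME_W, FRAME_H)

def show_frame(frame):
    # Upload one (90, 90, 3) uint8 frame and draw it scaled to the whole window.
    frame = np.ascontiguousarray(frame)
    sdl2.SDL_UpdateTexture(texture, None,
                           frame.ctypes.data_as(ctypes.c_void_p), FRAME_W * 3)
    sdl2.SDL_RenderCopy(renderer.sdlrenderer, texture, None, None)  # None dst = full window
    renderer.present()

# Placeholder driver: random frames at roughly 60 fps.
for _ in range(300):
    show_frame(np.random.randint(0, 255, (FRAME_H, FRAME_W, 3), dtype=np.uint8))
    sdl2.SDL_Delay(16)

sdl2.ext.quit()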

Video Overlay System with OpenCV

I'm trying to implement a video overlay solution such as this one: https://www.videologixinc.com/, where there is no delay in the original source video.
My problem is that, with OpenCV, all the necessary drawings (circles, text, etc.) require the entire frame to be processed and then returned before it can be displayed. Is there any solution where I could just overlay the information on the original source without introducing delay or frame drops? (The additional information can be displayed with delay - drawings, text - but not the original video pipeline.)
Multiprocessing could make things faster, but I would still have delay or frame drops.
I was also wondering whether it would be better to have two simultaneous applications, maybe on two different computers: one to read the frames and do the processing, and another to just receive, somehow, the information and overlay it on the original video pipeline.
Any thoughts? Thank you all!
An example of a data pipeline in this case, without interfering with the original video flow.
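A hedged sketch of that idea in Python/OpenCV: the capture/display loop never waits for the analysis; a worker thread updates the latest overlay data whenever it finishes, and the display draws whatever overlay is currently available (possibly a few frames old). The source, the fake detection and the timing are placeholders:

import threading
import time
import cv2

latest_frame = None              # most recent live frame (shared with the worker)
latest_overlay = None            # most recent overlay result (may lag the live frame)
lock = threading.Lock()

def analyse_worker():
    # Slow analysis runs here and only updates the shared overlay data.
    global latest_overlay
    while True:
        with lock:
            frame = None if latest_frame is None else latest_frame.copy()
        if frame is None:
            time.sleep(0.01)
            continue
        time.sleep(0.2)                          # stand-in for heavy processing
        h, w = frame.shape[:2]
        with lock:
            latest_overlay = (w // 2, h // 2)    # placeholder "detection"

threading.Thread(target=analyse_worker, daemon=True).start()

cap = cv2.VideoCapture(0)        # placeholder source
while True:
    ok, frame = cap.read()
    if not ok:
        break
    with lock:
        latest_frame = frame.copy()
        overlay = latest_overlay
    if overlay is not None:
        cv2.circle(frame, overlay, 30, (0, 0, 255), 2)   # cheap drawing of stale results
    cv2.imshow("live", frame)                            # live path never blocks on analysis
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break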

OpenCV decentralized processing for stereo vision

I have a decent amount of experience with OpenCV and am currently familiarizing myself with stereo vision. I happen to have two JeVois cameras (don't ask why) and was wondering if it was possible to run some sort of code on each camera to distribute the workload and cut down on processing time. It needs to be so that each camera can do part of the overall process (without needing to talk to each other) and the computer they're connected to receives that information and handles the rest of the work. If this is possible, does anyone have any solutions or tips? Thanks in advance!
To generalize the stereo-vision pipeline (look here for a more in-depth treatment):
1. Find the intrinsic/extrinsic values of each camera (good illustration here)
2. Solve for the transformation that will rectify your cameras' images (good illustration here)
3. Capture a pair of images
4. Transform the images according to Step 2
5. Perform stereo-correspondence on that pair of rectified images
If we can assume that your cameras are going to remain perfectly stationary (relative to each other), you'll only need to perform Steps 1 and 2 one time after camera installation.
That leaves you with image capture (duh) and image rectification as general stereo-vision tasks that can be done on each camera without the two communicating (a per-camera rectification sketch follows below).
Additionally, there are some pre-processing techniques (you could try this and this) that have been shown to improve the accuracy of some stereo-correspondence algorithms. These could also be done on each of your image-capture platforms individually.
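A hedged sketch of the per-camera part (Step 4): given the calibration from Steps 1 and 2, each capture platform can rectify its own image with a precomputed map, so only rectified images need to be sent to the host. The calibration values here are placeholders; real ones come from cv2.calibrateCamera / cv2.stereoRectify:

import cv2
import numpy as np

# Placeholder calibration for ONE camera, produced offline in Steps 1-2.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])             # intrinsic matrix
dist = np.zeros(5)                          # distortion coefficients
R = np.eye(3)                               # rectification rotation for this camera
P = K.copy()                                # projection matrix after rectification
size = (640, 480)                           # image size (width, height)

# Precompute the rectification map once; applying it per frame is cheap.
map1, map2 = cv2.initUndistortRectifyMap(K, dist, R, P, size, cv2.CV_16SC2)

def rectify(frame):
    # Step 4 on-device: remap a captured frame into the rectified geometry.
    return cv2.remap(frame, map1, map2, interpolation=cv2.INTER_LINEAR)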

The simplest video streaming?

I have a camera that is taking pictures one by one (about 10 pictures per second) and sending them to a PC. I need to show this incoming sequence of images as live video on the PC.
Is it enough just to use some Python GUI framework, create a control that will hold a single image and just change the image in the control very fast?
Or would that be just lame? Should I use some sort of video streaming library? If yes, what do you recommend?
Or would that be just lame?
No. It wouldn't work at all.
There's a trick to getting video to work. Apple's QuickTime implements that trick. So do a bunch of Microsoft products. Plus some open-source video playback tools.
There are several closely-related tricks, all of which are a huge pain in the neck.
Compression. Full-sized video is huge. Do the math: 640x480 at 24-bit color and 30 frames per second adds up quickly (see the quick calculation after this list). Without compression, you can't read it in fast enough.
Buffering and Timing. Sometimes the data rates and frame rates don't align well. You need a buffer of ready-to-display frames, and you need a deadly accurate clock to display them at exactly the right intervals.
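The quick calculation for the uncompressed case mentioned above:

# Uncompressed 640x480, 24-bit (3 bytes per pixel) video at 30 fps:
bytes_per_frame = 640 * 480 * 3             # 921,600 bytes per frame
bytes_per_second = bytes_per_frame * 30     # 27,648,000 bytes ≈ 27.6 MB/s, ~1.6 GB per minute
print(bytes_per_second)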
Making a sequence of JPEG images into a movie is what iPhoto and iMovie are for.
Usually, what we do is create the video file from the image and play the video file through a standard video player. Making a QuickTime movie or Flash movie from images isn't that hard. There are a lot of tools to help make movies from images. Almost any photo management solution can create a slide show and save it as a movie in some standard format.
Indeed, I think that Graphic Converter can do this.
