I am trying to analyse the face of a driver with a Raspberry Pi 4 while he is driving, using MediaPipe for face recognition in Python. I am trying to maximise performance so I can analyse as many images per second as possible. I have three processes: one gets the images from the camera, one analyses them with MediaPipe, and one shows the images.
My first question is: is it better to use multithreading or multiprocessing for these three processes? With multithreading I was able to get images at 30 fps (the camera's limit) and show them at that rate too, but I was only analysing them at 13 fps. I then tried multiprocessing but I am not able to figure out how: the first process, getting the images, is working, but the other two are not. This image shows my class VideoGet, the first process, so you can see how I did my multiprocessing, and this code is the function that calls every process together.
I was expecting multiprocessing to be the best approach.
I saw that maybe I should use a Pool instead of Process, but I am not sure.
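For reference, here is a stripped-down sketch of the capture -> analyse -> display structure I am aiming for (this is not my actual VideoGet code; the queue and function names are just illustrative, and it assumes the MediaPipe FaceDetection solution):

import cv2
import multiprocessing

def capture(frame_q):
    cap = cv2.VideoCapture(0)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if not frame_q.full():  # drop frames rather than building up latency
            frame_q.put(frame)
    cap.release()

def analyse(frame_q, result_q):
    import mediapipe as mp  # import inside the worker process
    detector = mp.solutions.face_detection.FaceDetection(min_detection_confidence=0.5)
    drawer = mp.solutions.drawing_utils
    while True:
        frame = frame_q.get()
        results = detector.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.detections:
            for detection in results.detections:
                drawer.draw_detection(frame, detection)
        result_q.put(frame)

def display(result_q):
    while True:
        cv2.imshow("driver", result_q.get())
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

if __name__ == "__main__":
    frame_q = multiprocessing.Queue(maxsize=2)
    result_q = multiprocessing.Queue(maxsize=2)
    workers = [multiprocessing.Process(target=capture, args=(frame_q,)),
               multiprocessing.Process(target=analyse, args=(frame_q, result_q)),
               multiprocessing.Process(target=display, args=(result_q,))]
    for w in workers:
        w.start()
    for w in workers:
        w.join()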
Challenge:
I want to run three USB cameras at 1600x1300 @ 60 fps on a Jetson Xavier NX using Python.
Now there are some ways of doing this but my approach has been:
Main -> Camera1 Thread -> Memory 1 -> Visualization thread 1.
The main thread starts up three camera threads and three visualization threads.
The problem is the latency.
I store the images from camera 1 in Memory 1 which is shared with the visualization thread.
There are thread locks on both the shared memory and the cv2.imshow call in the visualization thread.
Is there a way of speeding up the camera visualization? I get about 16 fps. Is it better to have one visualization thread showing all three images in one view, or three separate ones as I have now?
The input capture is:
cv2.VideoCapture(Gstreamer_string, cv2.CAP_GSTREAMER)
The output to disk via the GStreamer string branches the stream to a multifilesink and an appsink. The file sink writes all three streams at 60 fps. It's just the visualization on screen that takes forever.
I have also tried to visualize directly after the capture in the camera thread, without the shared memory; it makes not much difference. I tend to think the imshow thread lock, which I need in order not to crash/freeze the GUI, is the reason. Perhaps combining all three into one window is faster.
It is hard to guess without code, but possible bottlenecks may be:
cv2.imshow is not very efficient on Jetsons. You may instead use an OpenCV VideoWriter with the GStreamer backend writing to a display sink such as nveglglessink (see the sketch after this list).
Your disk storage may not be able to keep up with 3 streams at that resolution at 60 fps. Are you using an NVMe SSD? An SD card may be slow depending on the model. Does lowering the framerate help? Are you encoding, or trying to save raw video?
OpenCV may also add some overhead. If OpenCV is not required for processing, a pure GStreamer pipeline may be able to display and record (if point 2 is not the issue).
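As a rough illustration of the first point, here is a minimal sketch of pushing frames to nveglglessink through an OpenCV VideoWriter with the GStreamer backend; the capture string is a placeholder for your own pipeline, and the display elements (nvvidconv, nvegltransform) may need adjusting for your JetPack version:

import cv2

# Placeholder capture string - substitute your existing GStreamer capture pipeline.
capture_pipeline = (
    "v4l2src device=/dev/video0 ! video/x-raw,width=1600,height=1300,framerate=60/1 ! "
    "videoconvert ! video/x-raw,format=BGR ! appsink drop=true"
)

# Display pipeline: appsrc takes BGR frames from OpenCV, nvvidconv/nvegltransform move
# them into NVMM memory and nveglglessink renders them (Jetson-specific elements).
display_pipeline = (
    "appsrc ! video/x-raw,format=BGR ! queue ! videoconvert ! video/x-raw,format=BGRx ! "
    "nvvidconv ! nvegltransform ! nveglglessink sync=false"
)

cap = cv2.VideoCapture(capture_pipeline, cv2.CAP_GSTREAMER)
writer = cv2.VideoWriter(display_pipeline, cv2.CAP_GSTREAMER, 0, 60.0, (1600, 1300))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    writer.write(frame)  # replaces cv2.imshow, so no GUI thread lock is needed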
I want to make a drone that can detect objects from above. I found examples of background subtraction, but it detects things and then treats the new image as background. I want the drone to come to its waypoint and see if something new is detected.
The drone will fly by itself and the image processing will be done with OpenCV on a Raspberry Pi. How do I write the code in Python for this? I can code in Python. Please tell me what I should follow.
Thanks in advance.
Background subtraction doesn't work well from a drone, and a stabilized camera doesn't help much: you would need to estimate a homography matrix between frames with subpixel accuracy and build a custom background subtraction algorithm on top of it. That kind of work is not a good fit for a Raspberry Pi and Python.
If you know anything about the objects in advance, then try using a neural network for detection. MobileNet v3 can run on a Raspberry Pi (see the sketch after the dataset list below).
For training you can use these datasets:
http://aiskyeye.com/
https://gdo152.llnl.gov/cowc/
http://cvgl.stanford.edu/projects/uav_data/
https://github.com/gjy3035/Awesome-Crowd-Counting#datasets
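Not a full solution, but as a rough sketch of running a lightweight detector on a Raspberry Pi with OpenCV's dnn module; this uses the widely available MobileNet-SSD Caffe model (rather than MobileNet v3), and the model file names are assumptions:

import cv2
import numpy as np

# Assumed model files (downloaded separately); any SSD-style detector loads similarly.
net = cv2.dnn.readNetFromCaffe("MobileNetSSD_deploy.prototxt", "MobileNetSSD_deploy.caffemodel")

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    h, w = frame.shape[:2]
    # 300x300 input with the scale and mean values MobileNet-SSD expects.
    blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)), 0.007843, (300, 300), 127.5)
    net.setInput(blob)
    detections = net.forward()  # shape (1, 1, N, 7): [_, class_id, conf, x1, y1, x2, y2]
    for i in range(detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        if confidence > 0.5:
            x1, y1, x2, y2 = (detections[0, 0, i, 3:7] * np.array([w, h, w, h])).astype(int)
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)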
I have a decent amount of experience with OpenCV and am currently familiarizing myself with stereo vision. I happen to have two JeVois cameras (don't ask why) and was wondering if it was possible to run some sort of code on each camera to distribute the workload and cut down on processing time. It needs to be so that each camera can do part of the overall process (without needing to talk to each other) and the computer they're connected to receives that information and handles the rest of the work. If this is possible, does anyone have any solutions or tips? Thanks in advance!
To generalize the stereo-vision pipeline (look here for a more in-depth treatment):
1. Find the intrinsic/extrinsic values of each camera (good illustration here)
2. Solve for the transformation that will rectify your cameras' images (good illustration here)
3. Capture a pair of images
4. Transform the images according to Step 2
5. Perform stereo correspondence on that pair of rectified images
If we can assume that your cameras are going to remain perfectly stationary (relative to each other), you'll only need to perform Steps 1 and 2 one time after camera installation.
That leaves you with image capture (duh) and the image rectification as general stereo-vision tasks that can be done without the two cameras communicating.
Additionally, there are some pre-processing techniques (you could try this and this) that have been shown to improve the accuracy of some stereo-correspondence algorithms. These could also be done on each of your image-capture platforms individually.
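As a rough sketch of Steps 2-5 with OpenCV, assuming the calibration results from Step 1 (camera matrices K1/K2, distortion D1/D2, rotation R and translation T between the cameras) are already available; the names here are illustrative:

import cv2
import numpy as np

def rectify_and_match(left_raw, right_raw, K1, D1, K2, D2, R, T, image_size):
    # Step 2: rectification transforms (only needs to run once after installation).
    R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(K1, D1, K2, D2, image_size, R, T)
    map1x, map1y = cv2.initUndistortRectifyMap(K1, D1, R1, P1, image_size, cv2.CV_32FC1)
    map2x, map2y = cv2.initUndistortRectifyMap(K2, D2, R2, P2, image_size, cv2.CV_32FC1)

    # Steps 3-4: each capture platform can rectify its own image independently.
    left_rect = cv2.remap(left_raw, map1x, map1y, cv2.INTER_LINEAR)
    right_rect = cv2.remap(right_raw, map2x, map2y, cv2.INTER_LINEAR)

    # Step 5: stereo correspondence on the rectified pair, done on the host computer.
    matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=5)
    disparity = matcher.compute(cv2.cvtColor(left_rect, cv2.COLOR_BGR2GRAY),
                                cv2.cvtColor(right_rect, cv2.COLOR_BGR2GRAY))
    return disparity.astype(np.float32) / 16.0  # SGBM returns fixed-point disparities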
I'm currently trying to build an ANN that can play the online game "Helicopter Game" (see picture below if you're unfamiliar) using only the pixels of screenshots for training.
I've built similar models in OpenAI Universe but was hoping to try my hand at training directly on an online game instead of using an emulator.
The first thing I tried was to use the Selenium screenshot method to capture 100 screenshots at 10 frames per second.
for i in range(100):
    driver.save_screenshot(r'C:\Users\MyName\Desktop\Screenshots\shot' + str(i) + '.png')
    time.sleep(0.1)
But Selenium doesn't seem to be able to handle that kind of speed; it can only capture about 2 or 3 screenshots per second, even when I take away the time delay, and that is before doing any preprocessing of the images.
Does anyone know of a method faster than what I'm trying to accomplish with Selenium?
You can try the MSS module, and more precisely that example, to capture only the relevant part of the screen.
The module can be used with PIL, Numpy and OpenCV for further processing, just check the docs :)
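For example, a minimal sketch of grabbing a fixed screen region with MSS and handing it to OpenCV as a NumPy array; the region coordinates are just placeholders for wherever the game canvas sits on your screen:

import cv2
import numpy as np
import mss

# Placeholder region roughly covering the game canvas; adjust to your screen.
region = {"top": 100, "left": 100, "width": 800, "height": 500}

with mss.mss() as sct:
    for i in range(100):
        shot = sct.grab(region)   # raw BGRA screenshot, much faster than Selenium
        frame = np.array(shot)    # convert to a NumPy array
        frame = cv2.cvtColor(frame, cv2.COLOR_BGRA2BGR)
        cv2.imwrite('shot' + str(i) + '.png', frame)  # or feed it straight to your model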
I have a small project that I am tinkering with. I have a small box and I have attached my camera on top of it. I want to get a notification if anything is added to it or removed from it.
My original idea was to constantly take images and compare them to see the difference, but that approach is not working well: even comparing two images of the same scene gives a difference, and I do not know why.
Can anyone suggest another way to achieve this?
Well, I can suggest a way of doing this. Basically, you can use some kind of object detection coupled with a machine learning algorithm. The way this might work is that you first train your program to recognize the closed box: take, say, 10 pictures of the closed box (just an example) and train on those, so the program can detect when the box is closed. When the box is not closed (i.e. open, missing, or something else), you can code your program to fire off a signal or whatever it is you are trying to do. So the first obvious step is to write code for object detection; there are numerous ways of doing this alone, like Haar classification or support vector machines. Once you have trained your program to look for the closed box, you can run it to predict what's happening in every frame of the camera feed (a rough sketch follows). Hope this answered your question! Cheers!
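As a rough illustration of that workflow, assuming you have already trained a Haar cascade on pictures of the closed box (for example with opencv_traincascade) and saved it as closed_box.xml (a hypothetical file name):

import cv2

# Hypothetical cascade trained on images of the closed box.
cascade = cv2.CascadeClassifier("closed_box.xml")

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(boxes) == 0:
        print("Closed box not detected - fire your notification here")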