So I'm really new here. I'm currently working on a public art project where I need a little help with the programming, because I'm kind of lost among all the code.
First I'll give you a short description of the goal of the work and then state my problem.
I'm putting a webcam in the shop window of a gallery that faces out onto a public street. The webcam is connected to a TV screen that also faces the street, so people see themselves being filmed (like CCTV). Then, if people stand still in front of the camera long enough, the webcam takes an automatic screenshot, which is emailed to an address that runs a script for automatic attachment printing, and the people from the street instantly end up in my gallery, on paper.
(and yes I have permission from the gallery to do this since it is slightly in the grey area of legality)
I come from an art background with an interest in programming, so this was all very, very new for me, but I think I've already made it quite far. I have a Raspberry Pi running OpenCV and put a script on it for deep learning object detection (https://www.pyimagesearch.com/2017/09/18/real-time-object-detection-with-deep-learning-and-opencv/ is the link I used for that).
I also came across loads of pedestrian tracking examples, but haven't found suitable code for a real-time video stream yet.
So what I need from you guys is a little help with adding a timer to the script, so that when people stand still in front of the camera long enough, it takes the screenshot. It's a bit like a reversed security cam script: those react to movement, and I want mine to react to no movement at all.
The automatic attachment printing part I've got covered, I think, because there are already a lot of scripts for that on the internet.
If you have any tips or tricks.. please let me know.
Help a girl out!
Marije
There are a number of things you can try.
Is the camera facing a shopping street? In that case you could go for simple background subtraction. For each frame, apply some preprocessing (e.g. blurring, morphology operations), call findContours and compute the center of the boundingRect for each of these.
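A minimal sketch of what that could look like with OpenCV's built-in MOG2 background subtractor (the blur kernel, morphology kernel and area threshold are just guesses you would tune for your scene):
import cv2

cap = cv2.VideoCapture(0)
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=True)
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    blurred = cv2.GaussianBlur(frame, (5, 5), 0)
    mask = subtractor.apply(blurred)
    # drop shadow pixels (marked 127 by MOG2) and clean up noise
    _, mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel, iterations=2)
    # note: OpenCV 3.x returns (image, contours, hierarchy) here instead
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    centers = []
    for c in contours:
        if cv2.contourArea(c) < 500:  # ignore small blobs; tune for your scene
            continue
        x, y, w, h = cv2.boundingRect(c)
        centers.append((x + w // 2, y + h // 2))
    # `centers` now holds one point per moving blob in this frame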
Another option is to use the inbuilt (and pretrained) HOG PeopleDetector. This is based on SVM (Support Vector Machines), which is another machine learning technique. For this to work efficiently you'd have to tune the parameters adequately. Since you're using a Pi you'd also need to consider the tradeoff between speed and accuracy. Using this technique, we'd be left with rectangles as well, so we can again compute the center.
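A rough sketch of the pre-trained people detector, assuming the default model that ships with OpenCV; winStride, padding, scale and the downscaled resolution are only starting points to balance speed and accuracy on the Pi:
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
cap = cv2.VideoCapture(0)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    small = cv2.resize(frame, (320, 240))  # downscale to keep the Pi responsive
    rects, weights = hog.detectMultiScale(small, winStride=(8, 8),
                                          padding=(8, 8), scale=1.05)
    centers = [(x + w // 2, y + h // 2) for (x, y, w, h) in rects]
    # `centers` again holds one point per detected person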
For both techniques, you'd want to make sure that the center point doesn't fluctuate too much from frame to frame (that would mean the person is moving). For this you'd also want to take into account the framerate and understand that you can't guarantee person detection for every frame.
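And here is one way the "stood still long enough" timer could work, assuming you already compute a `centers` list per frame with either technique; the pixel and time thresholds, and the single-person simplification, are assumptions you'd need to adjust:
import math
import time
import cv2

STILL_PIXELS = 15    # max movement (px) between frames still counted as standing still
STILL_SECONDS = 3.0  # how long someone must stand still before the screenshot

last_center = None
still_since = None

def check_still(centers, frame):
    global last_center, still_since
    if not centers:
        last_center, still_since = None, None
        return
    c = centers[0]  # simplification: only track the first detection
    if last_center is not None and math.hypot(c[0] - last_center[0], c[1] - last_center[1]) < STILL_PIXELS:
        if still_since is None:
            still_since = time.time()
        elif time.time() - still_since >= STILL_SECONDS:
            cv2.imwrite("screenshot.jpg", frame)  # then hand it to the emailing script
            still_since = None                    # reset so it doesn't re-fire immediately
    else:
        still_since = None
    last_center = c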
The caveat of the first technique, while it is easier to understand, is that it will detect ANYTHING that changes from frame to frame: that includes pets, bikes, cars (if on a public street) and so on. You could then consider filtering the detections (e.g. by area or colour).
I am new to computer vision. I have a project that segments the floor in a photo and then changes the floor. I have a good segmentation model, but I don't know how to change the pattern properly. As an example, if somebody uploads this picture;
After that, if this image is given as input;
the output should be like this;
As you can see in the image, the changed floor is rotated and fitted well to the room.
It can be any room picture, but the pattern is always fixed, like the example I gave above. I can do the converting and compositing operations, but I don't know how to get the room rotation and camera angle automatically. I am open to any kind of suggestions or resources on the computer vision topics relevant to this kind of operation.
How can I solve this?
Note: Sorry about my poor English. I hope I managed to describe my problem.
I currently have a work project where my company is using a 3D printer, a couple of servo motors, and a Raspberry Pi to produce a robotic arm. This robotic arm is meant to be visually guided. I am tasked with researching and coding the software for the robot to pick up a specific object and place it at a specific location. I need some help on how I could program the robot to pick up the object using visual guidance.
I've made a set-up where the top-down image will very likely look like the picture below. The objective is to pick up a die (those that you can find in a casino or in a game of monopoly) inside a clearly marked area of operation and place it on a piece of paper with an 'X' marked on it.
This is what I've accomplished thus far:
I am able to use OpenCV and Machine Learning to identify the die and the paper marked with 'X' and extract the coordinates in the image where the die is located (right now my camera is not located directly above the area of operations)
I am able to determine the orientation of the die (e.g. 30 degrees clockwise, 42 degrees anti-clockwise, etc.) - this is important because the robotic arm uses a caliper-like clipper to pick up the die and I would not want it to hold the die insecurely by a corner
I am also able to code a program to move the robotic arm (something like servo1 to turn 30 degrees clockwise, servo2 to turn 40 degrees anti-clockwise, etc)
However, I have no idea how I could code the robotic arm to pick up the die. The die and the X in the photo are for illustration purposes only, their location could change to any spot inside the area of operation.
What is really depressing is that I can't come up with any strategy. Hopefully someone can advise on some principles and recommend some strategies - I'm not asking for code because I am confident I can write my own software. I also don't think the programming language is important at this point, but if it helps, I am using Python for this project - are there any libraries for such a task already? I feel like I have searched the entire web but haven't found any helpful tutorials on this yet.
Also, if it's helpful - I come from a web development background, usually coding in HTML, CSS, and JavaScript. Python was the first language I managed to master to a competent degree before I started coding with web technologies. I had some experience with C in high school but have not coded in C for more than 10 years.
Thanks for any help.
I have built a real-time face detection system using OpenCV in Python. Now I am expanding it into my FYP (final year project) and building an IoT-based smart home automation system using AI. As an initial stage I am implementing a door lock that opens when it detects a face that is in the dataset.
I don't want it to detect the face or open the lock if someone shows a picture of the same person on a mobile phone, which is a security concern.
Please help.
An idea to improve security is asking people to perform something like:
blink one or both eyes
open mouth
turn left or right...
Probably choose one or two of these randomly... This will improve security, but it is not really safe. It may be useful to overcome some accessibility issues.
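A very rough sketch of what one of those challenges (a blink check) could look like, using just the Haar cascades that ship with OpenCV; this is an illustration of the idea, not a robust anti-spoofing solution, and the detection parameters are assumptions:
import cv2

face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")

cap = cv2.VideoCapture(0)
eyes_were_visible = False
blink_seen = False

while not blink_seen:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
        eyes = eye_cascade.detectMultiScale(gray[y:y + h, x:x + w])
        if len(eyes) >= 2:
            eyes_were_visible = True
        elif eyes_were_visible and len(eyes) == 0:
            blink_seen = True  # eyes were visible before, now gone -> treat as a blink
# only unlock if the randomly chosen challenge was actually performed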
Some other ideas I've read elsewhere:
check background of the image (if the camera is fixed)
use infrared camera to detect heat patterns
use two cameras to get a stereoscopic image
I have a video of a road/building and I want to create a 3D model out of it. The scene I am looking at is rigid and the drone is moving. I assume not having any extra info like camera pose, accelerations or GPS position. I would love to find a python implementation that I can adapt to my liking.
So far, I have decided to use the OpenCV calcOpticalFlowFarneback() for optical flow, which seems reasonably fast and accurate. With it, I can get the Fundamental Matrix F with findFundamentalMat(). So far so good.
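For reference, a minimal sketch of that step, assuming two consecutive grayscale frames; sampling the dense Farneback flow on a grid is just one way to turn it into point correspondences for findFundamentalMat:
import cv2
import numpy as np

def fundamental_from_flow(prev_gray, curr_gray, step=16):
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = prev_gray.shape
    ys, xs = np.mgrid[step // 2:h:step, step // 2:w:step]
    pts1 = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(np.float32)
    pts2 = pts1 + flow[ys.ravel(), xs.ravel()]
    # RANSAC keeps only correspondences consistent with a single rigid motion
    F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.99)
    inliers = mask.ravel() == 1
    return F, pts1[inliers], pts2[inliers]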
Now, according to the tutorial I am following here, I am supposed to magically have the Calibration Matrix of the camera, which I obviously don't have nor plan to have available in the future app I am developing.
After some long research, I have found a paper (Self-calibration of a moving camera from point correspondences and fundamental matrices) from 1997 that defines what I am looking for (with a nice summary here). I am looking for the simplest/easiest implementation possible, and I am stuck with these problems:
If the camera I am going to use changes exposure and focus automatically (no zoom), are the intrinsic parameters of the camera going to change?
I am not familiar with the Homotopy Continuation Method for solving equations numerically, plus they seem to be slow.
I intend to use the Extended Kalman Filter, but do not know where to start, knowing that a bad initialization leads to non-convergence.
Digging some more, I found an open-source Multi Camera Self Calibration toolbox written for Octave with a Python wrapper. My last resort will be to break down the code and rewrite it directly in Python. Any other options?
Note: I do not want to use a chessboard nor the planarity constraint.
Is there any other way to very accurately self-calibrate my camera? After 20 years of research since 1997, has anyone come up with a more straightforward method?
Is this a one-shot thing, or are you developing an app to process lots of videos like these automatically?
If the former, I'd rather use an integrated tool like Blender. Look up one of the motion tracking (or "matchmoving") tutorials on youtube to get an idea of it, for example this one.
I have a camera that will be stationary, pointed at an indoors area. People will walk past the camera, within about 5 meters of it. Using OpenCV, I want to detect individuals walking past - my ideal return is an array of detected individuals, with bounding rectangles.
I've looked at several of the built-in samples:
None of the Python samples really apply
The C blob tracking sample looks promising, but doesn't accept live video, which makes testing difficult. It's also the most complicated of the samples, making extracting the relevant knowledge and converting it to the Python API problematic.
The C 'motempl' sample also looks promising, in that it calculates a silhouette from subsequent video frames. Presumably I could then use that to find strongly connected components and extract individual blobs and their bounding boxes - but I'm still left trying to figure out a way to identify blobs found in subsequent frames as the same blob.
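For what it's worth, the simplest version of that last matching step might just pair blobs by nearest centroid between consecutive frames (the distance threshold here is an arbitrary assumption to tune):
import math

MAX_JUMP = 50  # max pixels a blob may move between frames and still count as the same blob

def match_blobs(prev_centroids, curr_centroids, max_jump=MAX_JUMP):
    """Return (prev_index, curr_index) pairs for blobs matched by nearest centroid."""
    matches, used = [], set()
    for i, p in enumerate(prev_centroids):
        best_j, best_d = None, max_jump
        for j, c in enumerate(curr_centroids):
            d = math.hypot(c[0] - p[0], c[1] - p[1])
            if j not in used and d < best_d:
                best_j, best_d = j, d
        if best_j is not None:
            matches.append((i, best_j))
            used.add(best_j)
    return matches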
Is anyone able to provide guidance or samples for doing this - preferably in Python?
The latest SVN version of OpenCV contains an (undocumented) implementation of HOG-based pedestrian detection. It even comes with a pre-trained detector and a python wrapper. The basic usage is as follows:
from cv import *  # old OpenCV (pre-cv2) Python API

storage = CreateMemStorage(0)
img = LoadImage(file)  # or read from camera

# returns one rectangle per detected person
found = list(HOGDetectMultiScale(img, storage, win_stride=(8,8),
                                 padding=(32,32), scale=1.05, group_threshold=2))
So instead of tracking, you might just run the detector in each frame and use its output directly.
See src/cvaux/cvhog.cpp for the implementation and samples/python/peopledetect.py for a more complete python example (both in the OpenCV sources).
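On more recent OpenCV releases the old cv module is gone, but the same pre-trained detector is exposed through cv2; a rough equivalent of the snippet above (parameter values are just reasonable defaults):
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
img = cv2.imread("frame.jpg")  # or a frame from cv2.VideoCapture
rects, weights = hog.detectMultiScale(img, winStride=(8, 8),
                                      padding=(32, 32), scale=1.05)
for (x, y, w, h) in rects:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)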
Nick,
What you are looking for is not people detection, but motion detection. If you tell us a lot more about what you are trying to solve/do, we can answer better.
Anyway, there are many ways to do motion detection depending on what you are going to do with the results. The simplest would be frame differencing followed by thresholding, while a complex one could be proper background modelling -> foreground subtraction -> morphological ops -> connected component analysis, followed by blob analysis if required. Download the OpenCV code and look in the samples directory; you might find what you are looking for. Also, there is an O'Reilly book on OpenCV.
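A bare-bones sketch of the "differencing followed by thresholding" option, assuming consecutive grayscale frames; the threshold values are only starting points:
import cv2

cap = cv2.VideoCapture(0)
ok, prev = cap.read()
prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray, prev)
    _, motion_mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    if cv2.countNonZero(motion_mask) > 500:  # tune for frame size and noise
        pass  # something moved in this frame
    prev = gray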
Hope this helps,
Nand
This is clearly a non-trivial task. You'll have to look into scientific publications for inspiration (Google Scholar is your friend here). Here's a paper about human detection and tracking: Human tracking by fast mean shift mode seeking
This is similar to a project we did as part of a Computer Vision course, and I can tell you right now that it is a hard problem to get right.
You could use foreground/background segmentation, find all blobs and then decide that each one is a person. The problem is that this will not work very well, since people tend to walk together, pass each other and so on, so a blob might very well consist of two persons, and then you will see that blob splitting and merging as they walk along.
You will need some method of discriminating between multiple persons in one blob. This is not a problem I expect anyone to be able to answer in a single SO post.
My advice is to dive into the available research and see if you can find anything there. The problem is not unsolvable, considering that there exist products which do this: Autoliv has a product that detects pedestrians using an IR camera on a car, and I have seen other products that count customers entering and exiting stores.