How can I find the actual real world velocity of an object using the optical flow information obtained from two images? Can anyone help me out?
As the commenters have already said, we need some more information about your problem.
Basically, yes: it is possible to calculate real-world velocity from images.
But all of this depends on the following things:
Is your camera fixed, or is it maybe even moving?
Are you trying to calculate the velocity of any object moving anywhere in the scene, or do you have a fixed lane, like a street filmed with a mounted camera where objects (cars) will always move along one lane?
If the latter, can you take measurements on the street in the real world? For example, by marking points on the pavement (permanently, or just once to find out how many pixels a distance of x meters in the real world covers in your camera image).
If you cannot take those measurements in the real-world scene, you will need to provide information on the angle of the camera to the scene/ground level, the distance of the camera to the scene, and the parameters of your camera.
For calculating the velocity of any tracked object in the scene, you'd probably need all of the latter to really calculate the distances in the scene. But this is much more difficult.
If you have the case of a fixed lane where you want to measure, e.g., a car's velocity, I would prefer the method of measuring or marking points in the real world.
Because if you have this information:
x m = y px
and an object has moved p px in time t (you get that time from the frame rate at which you run your calculation), you can calculate how many pixels it would move in 1 second, and since you know how many pixels correspond to one meter, you know its speed in meters per second (or any other unit you prefer).
You could also just set your two marks in the scene and simply measure how many frames (and therefore how much time) the object needs to move from one mark to the other. This gives you a more averaged velocity: if you do the calculation over small time steps, you may get a noisy result due to segmentation problems, or simply because the changes get smaller the shorter the measured time span is.
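As a rough sketch of the arithmetic for both variants (the function names and all numbers are made up for illustration, and it assumes the object moves along the calibrated lane so that a single pixels-per-meter factor applies):

```python
def speed_from_displacement(pixels_moved, frames_elapsed, fps, pixels_per_meter):
    """Speed in m/s from a pixel displacement measured along the calibrated lane."""
    seconds = frames_elapsed / fps          # time between the two measurements
    meters = pixels_moved / pixels_per_meter
    return meters / seconds


def speed_between_marks(mark_distance_m, frames_elapsed, fps):
    """Average speed in m/s from the frame count between two real-world marks."""
    return mark_distance_m / (frames_elapsed / fps)


# Hypothetical calibration: 10 m in the real world appear as 200 px in the image.
pixels_per_meter = 200 / 10

# Object moved 12 px between two consecutive frames of a 25 fps video.
v_step = speed_from_displacement(12, 1, 25, pixels_per_meter)   # about 15 m/s

# Object needed 20 frames to travel between two marks that are 10 m apart.
v_marks = speed_between_marks(10, 20, 25)                       # about 12.5 m/s
```

The per-frame estimate reacts quickly but inherits every pixel of segmentation noise; the two-mark estimate is smoother because the same error is spread over a longer distance.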
Well, and for segmentation you could simply try a subtraction method. Subtract two or three consecutive frames from each other. Moving objects (and therefore image parts that have changed) will result in non-zero values, whereas the color values of a static image part should subtract to about 0.
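A minimal sketch of that subtraction idea with NumPy, assuming grayscale frames stored as uint8 arrays; the function names and the threshold of 25 are my own choices, not a fixed recipe:

```python
import numpy as np


def motion_mask(prev_frame, curr_frame, threshold=25):
    """Boolean mask of pixels whose gray value changed by more than `threshold`."""
    # Cast to a signed type first: subtracting uint8 arrays would wrap around.
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff > threshold


def three_frame_mask(f0, f1, f2, threshold=25):
    """Pixels that changed in both consecutive pairs.

    This keeps the object's position in the middle frame and drops the
    'ghost' that plain two-frame differencing leaves at the old position.
    """
    return motion_mask(f0, f1, threshold) & motion_mask(f1, f2, threshold)


# Synthetic 1x10 frames: a bright 2-pixel object moving right by 2 px per frame.
f0 = np.zeros((1, 10), dtype=np.uint8)
f1 = np.zeros((1, 10), dtype=np.uint8)
f2 = np.zeros((1, 10), dtype=np.uint8)
f0[0, 2:4] = 200
f1[0, 4:6] = 200
f2[0, 6:8] = 200

two_frame = motion_mask(f0, f1)            # marks old and new position
three_frame = three_frame_mask(f0, f1, f2)  # marks only the position in f1
```

Note that a uniformly colored object that moves less than its own width per frame only shows up at its leading and trailing edges, which is one source of the segmentation noise mentioned above.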
Maybe that helps you with your problem, but of course this depends on your setting and your desired goal. You'll need to provide more information.
This method is quite long but in short:
What you can do is set a value that specifies the distance of the object from the camera.
Then capture the first frame and save it somewhere.
Capture the last frame and save it somewhere.
Apply a threshold to both frames.
Trim all the pixels from the left of the first frame and then do the same for the second frame.
For a detailed tutorial, I think this article may help you a bit.
http://morefunscience.blogspot.in/2012/05/calculating-speed-using-webcam.html
Related
I have a picture of a human eye taken roughly 10 cm away using a mobile phone (no specifications regarding the camera). After some detection and contouring, I got 113 px as the Euclidean distance between the center of the detected iris and the outermost edge of the iris in the image. Dimensions of the image: 483x578 px.
I tried converting the pixels into mm by simply multiplying the number of pixels by the size of a pixel in mm, since 1 px is roughly equal to 0.264 mm, but this gives the proper length only if the image is at a 1:1 scale with respect to the real eye, which is not the case here.
Edit:
Device used: One Plus 7T
Field of view = 117 degrees
Aperture = f/2.2
Distance photo was taken = 10 cm (approx)
Question:
Is there a good way to find the real-world radius of this particular eye with the information I have gathered through processing so far, without including a reference object in the image?
P.S. The actual HVID of the volunteer's iris is 12.40 mm, measured using Sirus (a high-end device for measuring iris radius; I'm trying to reproduce the same measurement using Python and OpenCV).
After months, I was able to come up with a result after a ton of research and lots of trial and error. This is not the ideal answer, but it gave me the expected results with decent precision.
Simply put, in order to measure object size or distance from an image we need multiple parameters. In my case, I was trying to measure the diameter of the iris with a smartphone camera.
To make that possible we need to know the following details prior to the calculation:
1. The Size of the physical sensor (height and width) (usually in mm)
(camera inside the smart phone whose details can be obtained from websites on the internet but you need to know the exact brand and version of the smart phone used)
Note: You cannot use random values for these, otherwise you will get inaccurate results. Every step/constraint must be considered carefully.
2. The Size of the image taken (pixels).
Note: The size of the image can easily be obtained using img.shape, but make sure the image is not cropped. This method relies on the total width/height of the original smartphone image, so any modifications or inconsistencies would lead to inaccurate results.
3. Focal Length of the Physical Sensor (mm)
Note: Information about the focal length of the sensor used can be found on the internet; random values should not be used. Make sure you take images with the autofocus feature disabled so the focal length is preserved. If autofocus is on, the focal length will be constantly changing and the results will be all over the place.
4. Distance at which the image is taken (Very Important)
Note: As Christoph Rackwitz pointed out in the comment section, the distance from which the image is taken must be known and should not be arbitrary. Guessing a number as input will always result in inaccuracy. Make sure you properly measure the distance from the sensor to the object using some sort of measuring tool. There are some depth detection algorithms out there on the internet, but they are not accurate in most cases and need to be calibrated after every single try. That is indeed an option if you don't have a setup for taking consistent photos, but inaccuracies are inevitable, especially for objects like the iris, which require medical precision.
Once you have gathered all this "proper" information, the rest is to plug it into a very simple equation derived from similar triangles.
Object height/width on sensor (mm) = Sensor height/width (mm) × Object height/width (pixels) / Sensor height/width (pixels)
Real Object height (in units) = Distance to Object (in units) × Object height on sensor (mm) / Focal Length (mm)
In the first equation, you must decide along which axis you want to measure. For instance, if the image is taken in portrait and you are measuring the width of the object in the image, then input the width of the image in pixels and the width of the sensor in mm.
Sensor height/width in pixels is nothing but the size of the "image"
Also you must acquire the object size in pixels by any means.
If you are taking image in landscape, make sure you are passing the correct width and height.
Equation 2 is pretty simple as well.
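Here is how the two equations could look in Python. This is only a sketch: the function name is mine, and the numbers in the example are round hypothetical values chosen to make the arithmetic easy to follow, not the specifications of any real phone.

```python
def real_object_size_mm(object_px, image_px, sensor_mm, focal_length_mm, distance_mm):
    """Estimate the real size of an object along one axis using similar triangles.

    object_px       -- object size in the image, in pixels
    image_px        -- full (uncropped) image size along the same axis, in pixels
    sensor_mm       -- physical sensor size along the same axis, in mm
    focal_length_mm -- real focal length of the lens, in mm
    distance_mm     -- measured distance from the camera to the object, in mm
    """
    # Equation 1: size of the object's projection on the sensor.
    size_on_sensor_mm = sensor_mm * object_px / image_px
    # Equation 2: scale the projection up by distance / focal length.
    return distance_mm * size_on_sensor_mm / focal_length_mm


# Hypothetical example: 5.6 mm wide sensor, 4000 px wide image, 4.0 mm focal
# length, object 1000 px wide, photographed from 100 mm away.
width_mm = real_object_size_mm(
    object_px=1000, image_px=4000, sensor_mm=5.6,
    focal_length_mm=4.0, distance_mm=100.0,
)  # about 35 mm
```

Because the result scales linearly with the distance, a 10% error in the measured distance gives a 10% error in the size, which is why guessing the distance does not work.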
Things to consider:
No magnification (Digital magnification can destroy any depth info)
No Autofocus (Already Explained)
No cropping/editing image size/resizing (Already Explained)
No image skewing (rotating the image can make it unfit for measurement)
Do not substitute random values for any of these inputs (Golden Advice)
Do not tilt the camera while taking images (tilting the camera distorts the image, so the object height/width will be altered)
Make sure the object and the camera are exactly in line with each other
Don't rely on the image's EXIF data (the depth information it contains is not accurate at all, so do not use it)
Things I'm unsure of till now:
Lens distortion / Manufacturing defects
Effects of field of view
Perspective Foreshortening due to camera tilt
Depth field cameras
DISCLAIMER: There are multiple ways to solve this issue, but I chose this method, and I highly recommend you explore more and see what you can come up with. You can basically extend this idea to measure pretty much any object using a smartphone (within the limits of the images a normal smartphone can take).
(Please don't try to measure the size of an amoeba with this. It simply won't work, but you can still use some of the advice I have given to your advantage.)
If you have cool ideas or issues with my answer, please feel free to let me know; I would love to discuss them. Feel free to correct me if I have made any mistakes or misunderstood any of these concepts.
Final Note:
No matter how hard you try, you cannot make a smartphone behave like a camera sensor that is specifically designed to take images for measurement purposes. A smartphone can never beat those, but we can certainly use the smartphone camera to achieve similar results up to a certain degree. Keep this in mind; I learnt it the hard way.
I have an air drone with four motors and wanted to make it fly between two straight lines.
The first problem:
Its initial position will be in the middle at a certain height, but because of air disturbances it may deviate (up or down) or (left or right). I have calculated the error when it deviates left or right using the camera, but I still don't know how to calculate the height error (also using the camera, without a pressure sensor).
The second problem:
After calculating these errors, how do I convert them from an integer into an actual movement?
Sorry, I couldn't provide my code. It is too large and complicated.
1) Using a single camera to calculate distance is not enough.
However, if you're using a stereo camera, you can get distance data pretty easily. If you want to avoid using a pressure sensor, you may want to consider a distance sensor (LIDAR or ultrasonic; check the maximum range on these) to measure the height at which your drone flies. In addition, you'll need an error control algorithm, e.g. a PID controller, to make your drone fly at a constant height.
This is a fantastic source for understanding the fundamentals of PID.
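To give a rough idea of how an error turns into a command, here is a bare-bones PID sketch. The class, the gains, and the interpretation of the output (for example, as a thrust correction) are illustrative assumptions; real flight code needs tuning, output limits, and anti-windup.

```python
class PID:
    """Textbook PID controller: output = Kp*e + Ki*integral(e) + Kd*de/dt."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None  # no derivative term on the very first update

    def update(self, setpoint, measurement, dt):
        error = setpoint - measurement
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Hypothetical gains and readings: hold a height of 2.0 m, sampled every 0.1 s.
height_pid = PID(kp=1.0, ki=0.1, kd=0.05)
u1 = height_pid.update(setpoint=2.0, measurement=1.5, dt=0.1)  # drone too low
u2 = height_pid.update(setpoint=2.0, measurement=1.8, dt=0.1)  # closing in
```

The same structure works for the left/right error from the camera: one controller per axis, with each output mapped to whatever your flight controller accepts (motor speed offsets, roll/pitch setpoints, and so on).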
2) For implementation:
In my opinion, this video is great for understanding how your sensor data gets converted into an actual movement, and it will help you draw an analogy to your own setup. You'll also get a head start from the code provided.
I am reading the slides on temporal filtering for a computer vision class (page 108), and I am wondering how we can do temporal filtering for videos.
For example, they say our data is a video in XYT, where X and Y are the spatial domain and T is time.
"How could we create a filter that keeps sharp objects that move at some velocity (vx, vy) while blurring the rest?"
They more or less derive the formula for that, but I'm confused about how to apply it.
How can we do filtering in the Fourier domain, and how should we apply it in general? Can someone please help me with how I should code it?
In that example, they're talking about a specific known speed. For example, if you know that a car is moving left at 2 pixels per frame, it's possible to make a video that blurs everything except that car.
Here's the idea: start at frame 0 of the video. At each pixel, look one frame in the future, and 2 pixels left. You will be looking at the same part of the moving car. Now, imagine you take the average color value between your current pixel and the future pixel (the one that is 2 pixels left, and 1 frame in the future). If your pixel is on the moving car, both pixels will be the exact same color, so taking the average has no effect. On the other hand, if it's NOT on the moving car, they'll be different colors, and so taking the average will have the effect of blurring between them.
Thus, the pixels of the car will be unchanged, but the rest of the video will get a blur. Repeat for each frame. You can also include more frames in your filter; e.g. you could look 2 frames in the future and 4 pixels left, or 1 frame in the past and 2 pixels right.
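Here is a small sketch of that idea in the space-time domain (not the Fourier-domain version from the slides). It assumes a grayscale video stored as a NumPy array of shape (T, H, W) and an integer velocity in pixels per frame; the function name is mine, and np.roll wraps around at the image borders, which a real implementation would want to handle properly.

```python
import numpy as np


def velocity_tuned_average(video, vx, vy, n_frames=2):
    """Average each frame with the next n_frames-1 frames, shifted along (vx, vy).

    Pixels on an object moving at exactly (vx, vy) px/frame line up after the
    shift and stay sharp; everything else is averaged with misaligned pixels.
    """
    T = video.shape[0]
    out = np.zeros_like(video, dtype=float)
    for t in range(T):
        # Near the end of the video there are fewer future frames to use.
        ks = range(min(n_frames, T - t))
        # After the roll, aligned[k][y, x] == video[t + k][y + k*vy, x + k*vx].
        aligned = [
            np.roll(video[t + k], shift=(-k * vy, -k * vx), axis=(0, 1))
            for k in ks
        ]
        out[t] = np.mean(aligned, axis=0)
    return out


# Synthetic 3-frame, 1x12 video: a static bright pixel at x=1 and a "car"
# pixel that starts at x=10 and moves left by 2 px per frame.
video = np.zeros((3, 1, 12))
video[:, 0, 1] = 100.0
for t in range(3):
    video[t, 0, 10 - 2 * t] = 50.0

filtered = velocity_tuned_average(video, vx=-2, vy=0, n_frames=2)
```

In the first filtered frame the car pixel at x=10 keeps its full value, while the static pixel is smeared over x=1 and x=3 at half intensity each.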
Note: this was a teaching example; I don't think there are many real computer vision applications for this (at least, not as a standalone technique), because it's so fragile. If the car speeds up or slows down slightly, it gets blurred.
I recently worked on code that displays a simulation of particles' motion in a periodic space. Concretely, it produces a 2D plot of N points (N ~ 10^4) initially gathered at the center, which then spread out according to their velocities. As it is a periodic space, any point that goes beyond the upper limit is brought back to the lower limit, and vice versa. To illustrate, here are two images:
Initial positions
After a certain time
Each point is supposed to travel horizontally, either to the right or to the left (positive or negative velocity, respectively).
I programmed it in Python, but now, within the scope of my project, I'd like to simulate the same thing on a torus. To give you an idea of what it would look like, please take a look at the following picture:
Transformation from a rectangle to a torus
(Imagine my initial 2D plane is the initial rectangle, which I'd like to transform into the final torus.)
Therefore, in that case we would see every particle moving on the surface of the torus. The first picture above would correspond to particles gathered on a single circle of the torus, and the second picture would correspond to the particles filling up the entire surface of the torus.
Since my code for the previous simulations was written in Python, I am wondering if I can still use it for this task. If so, I'd like some hints on how to do it; otherwise, what would be the best language to use for this?
I hope I have been clear. I apologize in advance for any mistakes I may have made in English.
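For what it's worth, one possible way to keep the existing periodic 2D simulation and only change the display is the standard torus parametrization, sketched below. The function name, the radii R and r, and the choice of which axis wraps around which circle are assumptions for illustration; the returned arrays could then be passed to a 3D scatter plot.

```python
import numpy as np


def plane_to_torus(x, y, Lx, Ly, R=3.0, r=1.0):
    """Map points of a periodic Lx-by-Ly rectangle onto a torus in 3D.

    x runs around the big circle (radius R), y around the tube (radius r).
    The modulo makes positions outside [0, L) wrap, as in the periodic plot.
    """
    theta = 2 * np.pi * (np.asarray(x, dtype=float) % Lx) / Lx
    phi = 2 * np.pi * (np.asarray(y, dtype=float) % Ly) / Ly
    X = (R + r * np.cos(phi)) * np.cos(theta)
    Y = (R + r * np.cos(phi)) * np.sin(theta)
    Z = r * np.sin(phi)
    return X, Y, Z


# Three particles on the line y = 0 of a 10-by-4 periodic box; the third one
# sits exactly one period further along x and so lands on the first one.
X, Y, Z = plane_to_torus([0.0, 5.0, 10.0], [0.0, 0.0, 0.0], Lx=10.0, Ly=4.0)
```

With this mapping, particles that all start at the same x and spread in y lie on one small circle of the tube, and horizontal motion in the rectangle becomes motion around the big circle.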
I am building a system which detects coins that are picked up from a tray. This tray will be kept in a public place. People will pick up one or more coins, but would be expected to put them back after some time.
I would have a live stream through a webcam placed at the top. I will have a calibration step, say at the beginning of the day, that captures the initial state of the tray to be used for comparing with the live feed. A few slots might be empty to begin with, as you can see in the sample image.
I need to detect slots that had a coin initially, but are missing the same at any given point of time during the day.
I am trying out a few approaches using OpenCV:
SSIM difference: I can use SSIM to find diff between my live image frame and initial state. However, a number of slots are larger than the corresponding coin sizes (e.g. top two rows). This could mean that if the coin was originally placed at the center, but was later put back to touch one of the edges, we may get a false positive.
Blob detection: Alternatively, I can pre-feed (or detect) slot co-ordinates. Then do a blob detection within every slot. If a blob was present in the original state, but is missing in a camera frame, this would mean a coin has been picked up. However, accurate blob detection could be a challenge if the contrast between the coin and the tray is low.
I might also need to watch out for slight variations in lighting due to shadows of people moving around.
Any thoughts on these or any pointers on alternate approaches that can be tried out? Is there any analogous implementation that I can learn from?
Many thanks in advance.
Edit: Thanks to #I.Newton's suggestion. For those who stumble upon this question and would benefit from a sample implementation, look here: https://github.com/kewats/computer-vision-samples/tree/master/image-processing/missing-coins-detection
If you have complete control over the lighting conditions, you can use simple color thresholding to solve the problem.
First, make a mask for the boxes. You can do this in several ways: by color threshold, adaptive threshold, Canny edge detection, etc. I used a color threshold.
Then make a mask for the coins by the same method.
Now flood fill your box mask from the center of each of these coins. This retains only the boxes that do not contain a coin.
Now you can compare this with your initial mask to figure out whether all the coins are present.
This does not include frame subtraction. So you need not worry about different position of coin in the box. Only thing you need to make sure is the lighting conditions for making the masks. If you want to make sure the coins are returned to the same box, you should go for template matching etc which again needs effort.
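As a simplified illustration of the comparison step, the sketch below does not reproduce the flood fill. Instead it assumes the slot rectangles are already known (pre-fed, as suggested in the question) and that a boolean coin mask has been produced by thresholding. The function names and the min_fill value are my own.

```python
import numpy as np


def occupied_slots(coin_mask, slots, min_fill=0.2):
    """Indices of slots whose rectangle contains enough coin-mask pixels.

    coin_mask -- 2D boolean array, True where a coin was detected
    slots     -- list of (x, y, w, h) rectangles in pixel coordinates
    """
    occupied = set()
    for i, (x, y, w, h) in enumerate(slots):
        roi = coin_mask[y:y + h, x:x + w]
        if roi.mean() >= min_fill:  # fraction of the slot covered by coin
            occupied.add(i)
    return occupied


def missing_coins(initial_mask, current_mask, slots, min_fill=0.2):
    """Slots that held a coin at calibration time but are empty now."""
    return (occupied_slots(initial_mask, slots, min_fill)
            - occupied_slots(current_mask, slots, min_fill))


# Synthetic tray: two 10x10 slots side by side, one coin centered in each.
slots = [(0, 0, 10, 10), (10, 0, 10, 10)]
initial = np.zeros((10, 20), dtype=bool)
initial[2:8, 2:8] = True      # coin in slot 0
initial[2:8, 12:18] = True    # coin in slot 1

# Later: the coin in slot 0 was put back against the edge, slot 1 is empty.
current = np.zeros((10, 20), dtype=bool)
current[4:10, 0:6] = True

missing = missing_coins(initial, current, slots)
```

Because only the amount of coin inside each slot is compared, a coin returned to a different position within its slot is not reported as missing.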