I am currently learning Python and playing around with TensorFlow.
I have a bunch of images for which I have obtained the landmarks (pixel points) of a person's facial features such as ears and eyes. The detection also provides me with a box (4 coordinates) within which the face lies.
My goal is to normalise all the data from different images into a standard sized rectangle / square and calculate the position of the landmarks relative to the normalised size.
Is there an API that allows me to do this already or should I get cracking and calculate the points myself?
Thanks in advance.
Actually, I think I have figured it out; it's pretty simple maths. Here is what I am going to do (sketch below):
Take every point and subtract the first box point (the box's top-left corner) - this gives me the points as if the box started at [0, 0]
Scale every point by the ratio of the normalised size to the box size
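A minimal sketch of those two steps in Python; the [x_min, y_min, x_max, y_max] box format and the 256x256 target size are just assumptions for illustration:

```python
import numpy as np

def normalize_landmarks(landmarks, box, target_size=(256, 256)):
    """Map landmark (x, y) points into a target_size rectangle."""
    landmarks = np.asarray(landmarks, dtype=np.float64)   # shape (N, 2)
    x_min, y_min, x_max, y_max = box

    # Step 1: shift so the box's top-left corner becomes (0, 0).
    shifted = landmarks - np.array([x_min, y_min])

    # Step 2: scale each axis by target size / box size.
    scale = np.array([target_size[0] / (x_max - x_min),
                      target_size[1] / (y_max - y_min)])
    return shifted * scale

# A landmark at the box centre maps to the centre of the 256x256 target.
print(normalize_landmarks([(150, 200)], box=(100, 100, 200, 300)))  # [[128. 128.]]
```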
I'm beginning to work on a project with OpenCV (in Python), and I'm trying to figure out the best way to tackle the problem I'm facing. I'm trying to get the area of an item in an image...but the surface area I'm looking for is on a 3D object.
So I found this while searching for calculating the area of a contour, but as you can see, this is only for a 2D object. For example, how could I find the area of the red question marks on the sphere in this image? Or the size of the rash on this baby's face (which is most certainly not 2-dimensional)? Is there a way to find the depth of the surface in the image, then use the high colour-gradient difference to find contours and calculate the area based on the depths?
I found this deep learning paper (and associated PyTorch library), but was wondering if there was another way that I was missing...
Thanks to all for any ideas / replies.
I'm working on a project to track a Bluetooth (BT) device using three BT readers in a room. This already works fine and I have some data.
I hope I have done my math correctly, so I can calculate my position using trilateration. It works fine on my paper sheet and in a quick Python script.
I used the following tips:
Trilateration C# How to get back into "normal" coordinates?
Trilateration example in java
and finally
https://math.stackexchange.com/questions/100448/finding-location-of-a-point-on-2d-plane-given-the-distances-to-three-other-know
As I know the coordinates of my 3 receivers in "the real world" and the distances, I am asking myself how to transform this information onto my 2D picture (or SVG).
For instance, how do I convert my three distances of 3 m, 5 m and 6 m into a picture of 600x800 pixels? How do I place the readers on the picture? Any suggestions or real-world hints? What happens if I zoom in or out of the picture? How do I find the coordinates of my position marker on the picture from the real data?
Thanks
You're essentially asking how to draw a map of a small area.
Take the corners of the 600x800 image and decide where they should land in the real world. Ideally they should form a rectangle with the same 3:4 aspect ratio, so that the conversion factor from real-world distance to pixels is the same horizontally and vertically. After that it's just linear interpolation.
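For illustration, a minimal sketch of that mapping in Python; the room dimensions (6 m x 8 m, so the scale is the same 100 px/m on both axes) and the reader positions are made-up values:

```python
def world_to_pixel(x_m, y_m, room_w_m=6.0, room_h_m=8.0,
                   img_w_px=600, img_h_px=800):
    """Map a real-world point (in meters) into a 600x800 image.

    Assumes the image covers exactly room_w_m x room_h_m and that the image
    origin (0, 0) sits at the room's top-left corner.
    """
    sx = img_w_px / room_w_m   # pixels per meter, horizontal (100 here)
    sy = img_h_px / room_h_m   # pixels per meter, vertical (100 here)
    return int(round(x_m * sx)), int(round(y_m * sy))

# Readers at known room coordinates (meters) -> pixel coordinates.
readers_m = [(0.5, 0.5), (5.5, 0.5), (3.0, 7.5)]
readers_px = [world_to_pixel(x, y) for x, y in readers_m]
print(readers_px)                 # [(50, 50), (550, 50), (300, 750)]

# The trilaterated position is converted the same way.
print(world_to_pixel(2.4, 3.1))   # (240, 310)
```

Zooming in or out just means changing the pixels-per-meter factors; the real-world coordinates stay the same.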
Firstly, I wanted to know the metric unit of the 3D points we get from the OpenCV reprojectImageTo3D() function.
Secondly, I have calibrated each camera individually with a chessboard using "mm" as the metric unit, and then used the OpenCV functions to calibrate the stereo system, rectify the stereo pair and compute the disparity map.
Basically I want the distance to the center of a bounding box.
So I compute the disparity map, reproject it to 3D with the reprojectImageTo3D() function, and then take, from those 3D points, the one which corresponds to the center of the bbox (x, y).
But which image should I use to get the center of the bbox, the rectified one or the original?
Lastly, is it better to use the same camera model for a stereo system?
Thank you
During the calibration process (calibrateCamera) you have to provide the point grid of your calibration target. The unit you use there then defines the unit for the rest of the process.
When calling reprojectImageTo3D, you probably used the matrix Q output by stereoRectify, which takes in the individual calibrations (cameraMatrix1, cameraMatrix2). That's where the unit came from.
So in your case you get mm I guess.
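For example, here is a sketch of how the unit gets fixed at calibration time; the 9x6 inner-corner board and the 25 mm square size are assumptions:

```python
import numpy as np
import cv2

square_size_mm = 25.0
cols, rows = 9, 6

# Grid of 3D corner positions on the board plane (z = 0), expressed in mm.
objp = np.zeros((rows * cols, 3), np.float32)
objp[:, :2] = np.mgrid[0:cols, 0:rows].T.reshape(-1, 2) * square_size_mm

# objp is repeated once per calibration image; image_points come from
# cv2.findChessboardCorners on each view (omitted here).
# rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
#     [objp] * num_views, image_points, image_size, None, None)

# Because objp is in mm, the translation vectors, stereoCalibrate's T and the
# points returned by reprojectImageTo3D will all be in mm as well.
```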
reprojectImageTo3D has to use the rectified image, since the disparity is calculated from the rectified images (it wouldn't be properly aligned otherwise). Also, the disparity is calculated relative to the first image given (the left one in the docs). So you should use the left rectified image if you computed the disparity like this: cv::StereoMatcher::compute(left, right)
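As a rough sketch of that pipeline: the rectified grayscale images, the Q matrix from stereoRectify and the bbox (detected on the left rectified image) are assumed to already exist, and the StereoSGBM parameters are placeholders.

```python
import cv2
import numpy as np

def distance_at_bbox_center(left_rect_gray, right_rect_gray, Q, bbox):
    """bbox = (x, y, w, h) in *left rectified* image coordinates."""
    matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128,
                                    blockSize=5)
    # OpenCV returns a fixed-point disparity scaled by 16.
    disparity = matcher.compute(left_rect_gray,
                                right_rect_gray).astype(np.float32) / 16.0

    # Reproject every pixel to 3D; the units follow the calibration (mm here).
    points_3d = cv2.reprojectImageTo3D(disparity, Q)

    x, y, w, h = bbox
    cx, cy = x + w // 2, y + h // 2
    X, Y, Z = points_3d[cy, cx]              # row = y, column = x
    return float(np.linalg.norm([X, Y, Z]))  # Euclidean distance to the camera
```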
I have never used two different cameras, but it makes sense to use the same model. I think that if the two images differ a lot in colour, edges or anything else, that could degrade the disparity quality.
What is actually very important (unless you are only working with still pictures), is to use cameras that can be synchronized by hardware (e.g. GENLOCK signal: https://en.wikipedia.org/wiki/Genlock). If you have a bit of delay between left and right and a moving subject, the disparity can be wrong. This is also true for the calibration.
Hope this helps!
I couldn't find a proper answer to my problem on the web, so I'll ask it here. Let's say we're given two 2D photos of the same place taken from slightly different angles. I've chosen a set of points (edge detection) and found correspondences between them (which point is which on the other photo). Now I need to somehow find out the world coordinates of these points in 3D.
For the last 5 hours I've read a lot about it, but I still can't understand what steps I should follow. I've tried to estimate the motion of the camera using the function recoverPose applied to an essential matrix and the two sets of points on each frame. I can't understand what it gives me once I know the rotation and translation matrices (that recoverPose returned). What should I do in order to achieve my goal?
I also know the calibration matrix of my camera (I use the KITTI dataset). I've read the OpenCV documentation but still don't understand.
It's monocular vision.
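In case it helps, here is a sketch of the usual two-view pipeline that follows recoverPose. It assumes pts1/pts2 are matched Nx2 float arrays and K is the KITTI calibration matrix; with monocular vision the translation, and therefore the reconstruction, is only recovered up to an unknown scale.

```python
import cv2
import numpy as np

def triangulate_two_views(pts1, pts2, K):
    # Essential matrix from the calibrated point correspondences
    # (mask marks the RANSAC inliers if you want to filter the matches).
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)

    # Relative rotation R and unit-scale translation t of the second camera.
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)

    # Projection matrices: first camera at the origin, second at [R | t].
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])

    # triangulatePoints expects 2xN arrays and returns 4xN homogeneous points.
    pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
    pts3d = (pts4d[:3] / pts4d[3]).T       # Nx3, in the first camera's frame
    return pts3d, R, t
```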
I need some help developing some code that segments a binary image into components of a certain pixel density. I've been doing some research in OpenCV algorithms, but before developing my own algorithm to do this, I wanted to ask around to make sure it hasn't been made already.
For instance, in this picture, I have code that imports it as a binary image. However, is there a way to segment the objects in the image from the lines? I would need to segment nodes (corners) and objects (the circle in this case). However, the object does not necessarily have to be a shape.
The solution I thought of was to use pixel density. Most of the picture will be made up of lines, and the objects have a greater pixel density than the lines. Is there a way to segment them out?
Below is a working example of the task.
Original Picture:
Resulting Images after Segmentation of Nodes (intersection of multiple lines) and Components (Electronic components like the Resistor or the Voltage Source in the picture)
You can use an integral image to quickly compute the density of black pixels in a rectangular region. Detection of regions with high density can then be performed with a moving window in varying scales. This would be very similar to how face detection works but using only one super-simple feature.
It might be beneficial to make all edges narrow with something like skeletonizing before computing the integral image to make the result insensitive to wide lines.
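A sketch of that idea, assuming a uint8 binary image where drawn pixels are 255; the window size, step and density threshold are placeholders that would need tuning:

```python
import cv2
import numpy as np

def dense_regions(binary, win=40, step=10, density_thresh=0.35):
    """Return windows whose fraction of drawn pixels exceeds density_thresh."""
    mask = (binary > 0).astype(np.uint8)
    ii = cv2.integral(mask)                # (h+1, w+1) summed-area table

    hits = []
    h, w = mask.shape
    for y in range(0, h - win, step):
        for x in range(0, w - win, step):
            # Sum of the win x win window from four integral-image lookups.
            s = ii[y + win, x + win] - ii[y, x + win] - ii[y + win, x] + ii[y, x]
            if s / float(win * win) >= density_thresh:
                hits.append((x, y, win, win))
    return hits  # candidate boxes; overlapping ones can be merged afterwards
```

Running this at a few window sizes gives the multi-scale behaviour mentioned above.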
OpenCV has some functionality for finding contours that is able to put the contours in a hierarchy. It might be what you are looking for. If not, please add some more information about your expected output!
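If you want to try that route, here is a small sketch (assuming an OpenCV 4.x build, where findContours returns two values, and a hypothetical "circuit.png" input):

```python
import cv2

binary = cv2.imread("circuit.png", cv2.IMREAD_GRAYSCALE)   # hypothetical file
_, binary = cv2.threshold(binary, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

contours, hierarchy = cv2.findContours(binary, cv2.RETR_TREE,
                                       cv2.CHAIN_APPROX_SIMPLE)

# hierarchy[0][i] = [next, previous, first_child, parent]: contours with
# parent == -1 are outermost, and closed shapes (like the circle) get child
# contours for their inner holes.
for i, h in enumerate(hierarchy[0]):
    if h[3] == -1:
        print(i, cv2.contourArea(contours[i]))
```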
If I understand correctly, you want to detect the lines and the circle in your image, right?
If it is the case, have a look at the Hough line transform and Hough circle transform.
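For reference, a sketch of both transforms; the thresholds, radii and the "circuit.png" file name are placeholder values that would need tuning on the actual schematic:

```python
import cv2
import numpy as np

img = cv2.imread("circuit.png")                     # hypothetical file name
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)

# Probabilistic Hough transform: straight wire segments.
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                        minLineLength=40, maxLineGap=5)

# Hough circle transform: round components such as the voltage source.
circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1, minDist=50,
                           param1=150, param2=40, minRadius=10, maxRadius=80)

if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        cv2.line(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
if circles is not None:
    for x, y, r in np.uint16(np.around(circles[0])):
        cv2.circle(img, (x, y), r, (0, 0, 255), 2)
```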