I need to crop an image with subpixel accuracy. For example, I might need to create an interpolated rectangular crop with corners (108.5, 350.9) and (368.3, 230.1) out of an image with dimensions 640x480. How can I achieve this?
Edit: It's a reasonable concession to stretch the cropped area to fit it into a data matrix. However, you can't just snap the borders of the crop to integer coordinates.
Well, I'm not sure I can call this an answer because I don't really know what your question is, but I'll try to shed some light on it.
I suspect your problem arises from a misconception.
First of all, DPI, PPI, or whatever you want to use is nothing but a factor that tells you how many dots, points, or pixels you have per inch. That factor allows you to determine print sizes or convert between pixel dimensions and inch dimensions. For example, a 640-pixel-wide image printed at 300 PPI comes out 640 / 300 ≈ 2.13 inches wide.
That's by no means related to cropping an image.
Cropping a rectangular region is a very common task.
Also having ROIs with sub-pixel coordinates is pretty common as their coordinates often arise from calculations that yield non-integer values.
Usually you simply round coordinates to integers so your problem vanishes.
If you want intensity values at sub-pixel coordinates you can interpolate between neighbouring pixels. But as images cannot have half pixels, you will have to store that information in an image that has more or fewer pixels.
So here's what I would do if I didn't want to use rounded coordinates.
If the fractional part of my coordinate were >= 0.5 I'd add a column or row, otherwise I'd skip that pixel.
If I added a column or row, I would interpolate its values.
But to be honest, I don't see any use case for this, and in my career I have never had to do anything but use integer coordinates for cropping.
You cannot print fractions of pixels and you cannot display them either, so what's the point?
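For completeness, here is what the "just round" baseline looks like with OpenCV/NumPy. This is only a minimal sketch; the file name is a placeholder and the coordinates are the ones from the question, sorted so that (x0, y0) is the top-left corner.

```python
import cv2

# Subpixel rectangle from the question, sorted so (x0, y0) is the top-left corner
x0, y0, x1, y1 = 108.5, 230.1, 368.3, 350.9

img = cv2.imread("input.png")        # "input.png" is a placeholder

# The "just round" baseline: snap the borders to the nearest integer pixels
xi0, yi0 = int(round(x0)), int(round(y0))
xi1, yi1 = int(round(x1)), int(round(y1))

crop = img[yi0:yi1, xi0:xi1]         # NumPy slicing: rows (y) first, then columns (x)
```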
The solution seems to require that you calculate the center of the rectangle you want to crop out of the image, as well as its width and height. Then scale up the entire image until the desired rectangle has integer dimensions and do an ordinary crop. You will generally have to scale the horizontal and vertical dimensions by different amounts, so this will slightly distort the cropped portion, and you will have to account for that distortion in the image encoding format you use.
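If you genuinely need the interpolated crop rather than rounding, one way to implement the idea above is to pick an integer output size, map each output pixel back to its fractional source coordinate, and sample with bilinear interpolation. This is a sketch assuming an OpenCV/NumPy setup; "input.png" is a placeholder.

```python
import cv2
import numpy as np

# Subpixel rectangle from the question, sorted so (x0, y0) is the top-left corner
x0, y0 = 108.5, 230.1
x1, y1 = 368.3, 350.9

# Choose an integer output size; rounding the fractional size introduces
# the slight stretch/distortion mentioned above
out_w = int(round(x1 - x0))          # 260 for a 259.8-wide rectangle
out_h = int(round(y1 - y0))          # 121 for a 120.8-tall rectangle

img = cv2.imread("input.png")        # placeholder file name

# For every output pixel, the fractional source coordinate it samples from
xs = np.linspace(x0, x1, out_w, dtype=np.float32)
ys = np.linspace(y0, y1, out_h, dtype=np.float32)
map_x, map_y = np.meshgrid(xs, ys)

# Bilinear interpolation at those subpixel locations
crop = cv2.remap(img, map_x, map_y, interpolation=cv2.INTER_LINEAR)
```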
I'm trying to translate coordinates from one picture (Res: 311, 271) to another picture (Res: 1920, 1080).
The coordinates don't need to be exact in the second picture; they just need to represent the same vector relative to the center of the images.
Don't know if that makes sense...
Edit:
So far I've tried calculating the offset between the coordinates and the center of the first image and then applying that offset to the bigger image. However, this doesn't seem to work very consistently.
You'll need to use trigonometry.
Say there's some object in the image you're trying to get the vector for. Given the x and y distances from the center of the original image, you can calculate the angle and the hypotenuse (the distance from the center). Simply keep the same angle and scale the hypotenuse to the new image size.
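A rough sketch of that polar approach in Python follows. The function name and the choice of scaling the hypotenuse by the ratio of the image diagonals are my assumptions; you could just as well scale the x and y offsets separately.

```python
import math

def translate_point(x, y, src_size, dst_size):
    """Map a point to the same vector (angle) relative to the image centers."""
    src_w, src_h = src_size
    dst_w, dst_h = dst_size

    # Offset of the point from the source image's center
    dx = x - src_w / 2.0
    dy = y - src_h / 2.0

    # Polar form: angle and hypotenuse (distance from the center)
    angle = math.atan2(dy, dx)
    dist = math.hypot(dx, dy)

    # Keep the angle, scale the hypotenuse with the new image size
    # (here: ratio of the image diagonals, which is an assumption)
    dist *= math.hypot(dst_w, dst_h) / math.hypot(src_w, src_h)

    # Back to Cartesian coordinates around the destination center
    return (dst_w / 2.0 + dist * math.cos(angle),
            dst_h / 2.0 + dist * math.sin(angle))

# e.g. translate_point(100, 50, (311, 271), (1920, 1080))
```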
Regarding the following cv2.inRange(...) invocation:
mask = cv2.inRange(quantized_img, color, color)
Must the 'quantized_img' and 'color' arguments be strictly in HSV, or is it OK to pass an RGB image and an RGB 'color'? RGB seems to work for me, but all the examples I could find are HSV-based, so I'm concerned about correct usage.
Thanks!
In general, use whatever color space you like. RGB/BGR is fine, HSV is fine, something completely made up (with cv.transform) is fine too.
inRange spans a "cube".
Think about it. Imagine a 3D plot with R,G,B axes, or with H,S,V axes. In RGB space, the faces of the cube are aligned with those RGB axes. In HSV space, the faces of the cube are aligned with the HSV axes instead.
Now, a cube spanned in RGB space, when transformed into HSV space, is not aligned with the axes in HSV space. In fact it's not even a cube anymore, but likely some kind of torus or section of a cone or something. Same goes the other way around.
If the area of values you're interested in, in whatever space you choose, is flat or even stick-shaped (instead of a mostly spherical cloud), the cube you have to span might align very badly with the area of values you are interested in, and would have to include a lot of values you aren't interested in.
So you move into another color space where your values of interest are somewhat better aligned with the axes in that space. Then the cube spanned by inRange fits your purpose better.
Imagine a "stick" in RGB space going from the black corner to the white corner. It represents "colors" with no saturation to them (because colors are in the other six corners of the cube). Try spanning a cube over that area. Doesn't fit well.
In HSV space however, it's trivial. Usually it's visualized as a cylinder/inverted cone though... span a thin cylinder in the center: any Hue (angle), any Value (height), with very low Saturation (close to the center axis). If you took HSV as a cube, you'd span a thin wall instead. And it all would fit very well.
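To make that concrete, here is a small sketch of both usages. The file name and the numeric bounds are assumptions; OpenCV loads images as BGR and uses a 0-179 Hue range.

```python
import cv2
import numpy as np

img = cv2.imread("input.png")                  # placeholder; loaded as BGR by OpenCV

# Exact-color mask in BGR, as in the question (lower bound == upper bound)
color = np.array([0, 128, 255])                # some quantized BGR color (assumption)
mask_exact = cv2.inRange(img, color, color)

# Low-saturation ("grayish") mask: hard to box tightly in BGR,
# but a simple axis-aligned range in HSV
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
mask_gray = cv2.inRange(hsv, (0, 0, 0), (179, 40, 255))   # any Hue, low Saturation, any Value
```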
The explanation given by @Christoph Rackwitz is completely correct. I'd just like to add a few tips I've observed myself.
HSV and Lab color spaces are the best ones for color segmentation.
Keep BGR color space as probably the last option.
Do not just blindly start hunting for an HSV or Lab range for your color. Look at other methods too.
Other methods include:
Visualize each color channel of HSV and Lab separately as a grayscale image. You might spot a useful pattern in one of them alone.
One thing that helped in my case was applying Otsu's thresholding to the "Hue" and "Saturation" channels of my image and then performing a bitwise OR on the two outputs. The resulting image had everything I needed without any errors. Experiment with your own input images to look for such patterns; this helps a lot. A sketch follows below.
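A sketch of that Hue/Saturation trick, assuming an OpenCV pipeline; the file name is a placeholder and whether this works depends entirely on your images.

```python
import cv2

img = cv2.imread("input.png")                    # placeholder input
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
h, s, v = cv2.split(hsv)

# Otsu's threshold on the Hue and Saturation channels separately
_, th_h = cv2.threshold(h, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
_, th_s = cv2.threshold(s, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Combine the two masks: keep a pixel if either channel flags it
mask = cv2.bitwise_or(th_h, th_s)
```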
I have a platform whose dimensions I know. I would like to get the positions of objects placed on it as (x, y) while looking through the webcam, with the origin at the top-left corner of the platform. However, I can only view it from a low angle: example
I detect the objects using Otsu thresholding. I want to use the bottom edge of the bounding rectangles and then scale it proportionally with respect to the corners (the best idea I can think of), but I don't know how to implement it. I tried a perspective warp, but it enlarges the objects too much. image with threshold // attempt of warp perspective
Any help or suggestion would be appreciated.
Don't use warp perspective to transform the image to make the table cover the complete image as you did here.
While performing perspective transformations in image processing, try not to transform the image too much.
Below is the image with your table marked with red trapezium that you transformed.
Now transform it into a rectangle, but without distorting it as much as you did. One way is to turn the trapezium into a rectangle by simply moving the shorter edge's vertices so they sit directly above the lower edge's vertices, as shown in green in the image below.
This way, things far from the camera are skewed only slightly in width, which gives better results. An even better way is to shrink the lower edge a little and lengthen the upper edge a little; this skews objects on the table more evenly, as shown below.
Now, as you know the real dimensions of the table and the dimensions of the rectangle in the image, you can do the mapping. Using this, you can determine the exact position of the objects kept on the table.
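One way to implement that final mapping step, without warping the image at all, is a point-only homography from the table's corners in the image to its real-world dimensions. All numeric values below are made-up placeholders.

```python
import cv2
import numpy as np

# Corners of the table as seen in the image (trapezium), in the order
# top-left, top-right, bottom-right, bottom-left -- placeholder values
img_corners = np.float32([[210, 120], [430, 120], [520, 400], [95, 400]])

# Real table size, e.g. in centimetres (placeholder values)
table_w, table_h = 60.0, 40.0
real_corners = np.float32([[0, 0], [table_w, 0],
                           [table_w, table_h], [0, table_h]])

# Homography from image pixels to table coordinates
H = cv2.getPerspectiveTransform(img_corners, real_corners)

# Map the midpoint of an object's bottom edge (image pixels) to (x, y) on the table
pt = np.float32([[[330, 310]]])                 # hypothetical detection
x, y = cv2.perspectiveTransform(pt, H)[0, 0]
print(x, y)                                     # origin at the table's top-left corner
```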
Need to get rectangular shapes from a noisy color segmented image.
The problem is that sometimes the object isn't uniformly the correct color causing holes in the image, or sometimes reflection of the object in the background cause noise/false positive for the color segmentation.
The object could be in any position of the image and of any unknown rectangular size, the holes can occur anywhere inside the object and the noise could occur on any side of the object.
The only known constant is that the object is rectangular in shape.
What's the best way to filter out the noise to the left of the object and get a bounding box around the object?
Using erosion would remove the detail at the bottom of the object and make the bounding box the wrong size.
I can't comment because of my rep, but I think you could try analysing the colored image in other color spaces. Create an upper and a lower bound for the color you want until it selects the object, leaving you with less noise, which you can then filter with erode/dilate/opening/closing.
For example, in my project I wanted to find the bounding box of a color-changing green rectangle, so I tried a lot of different color spaces with a lot of different upper/lower bounds until I finally got something worthwhile. Here is a nice read on what I'm talking about: Docs
You can also try filtering the object by area after dilating it (dilate first so that nearby points connect to one another while the more distant ones, which are the noise, don't; this gives you one big rectangle plus scattered noise, which you then filter out by area). A sketch of this is below.
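For what it's worth, here is a sketch of that dilate-then-filter-by-area idea. The OpenCV 4.x findContours signature is assumed, and the mask file name and kernel size are placeholders.

```python
import cv2

# Binary mask from the color segmentation (placeholder file name)
mask = cv2.imread("segmented_mask.png", cv2.IMREAD_GRAYSCALE)

# Dilate so nearby fragments of the object merge into one blob
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 15))   # size to tune
merged = cv2.dilate(mask, kernel)

# Filter by area: keep only the largest blob and take its bounding box
contours, _ = cv2.findContours(merged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
largest = max(contours, key=cv2.contourArea)
x, y, w, h = cv2.boundingRect(largest)
```

Since the blob was dilated, you may want to shrink the resulting box by roughly the kernel radius afterwards.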
One method is to take histogram projection on both the horizontal and vertical axes, and select the intersection of ranges that have high projections.
The projections are just totals of object pixels in each row and each column. When you are looking for only one rectangle, the values indicate how likely each row/column is to belong to the rectangle.
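A minimal sketch of the projection idea with NumPy; the mask file name and the 0.5 threshold on the projections are assumptions to tune.

```python
import cv2
import numpy as np

mask = cv2.imread("segmented_mask.png", cv2.IMREAD_GRAYSCALE)   # placeholder binary mask

row_proj = (mask > 0).sum(axis=1)    # object pixels per row
col_proj = (mask > 0).sum(axis=0)    # object pixels per column

# Keep rows/columns whose projection is high (threshold is an assumption)
rows = np.where(row_proj > 0.5 * row_proj.max())[0]
cols = np.where(col_proj > 0.5 * col_proj.max())[0]

# Intersection of the selected ranges gives the rectangle's bounding box
y0, y1 = rows.min(), rows.max()
x0, x1 = cols.min(), cols.max()
box = (x0, y0, x1 - x0 + 1, y1 - y0 + 1)       # (x, y, width, height)
```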
I want to make a program that turns a given image into the format of the MNIST dataset, as an exercise to understand the various preprocessing steps involved. But the description the authors give on their site (http://yann.lecun.com/exdb/mnist/) is not entirely straightforward:
The original black and white (bilevel) images from NIST were size normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. the images were centered in a 28x28 image by computing the center of mass of the pixels, and translating the image so as to position this point at the center of the 28x28 field.
So from the original image I have to normalize the digit to fit in a 20x20 box while preserving its aspect ratio (I think they mean the aspect ratio of the actual digit, not the entire image). Still, I really don't know how to do this.
Center of mass: I have found some code online about this, but I don't think I understand the principle. Here is my take: the coordinate of each pixel is a vector from the origin to that point, so for each pixel you multiply its coordinates by the image intensity there, sum everything up, and then divide by the total intensity of the image. I may be wrong about this :(
Translating the image so as to position this point at the center: maybe cook up some translation equation, or maybe use a convolutional filter to facilitate translation, then find a path that leads to the center (Dijkstra's shortest path?).
All in all, I think I still need guidance on this. Can anyone explain these parts for me? Thank you very much.
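For reference, here is a rough sketch of how those steps could fit together with OpenCV/NumPy. It assumes a grayscale image with a white digit on a black background, and it is only my reading of the description above, not the original NIST/MNIST code.

```python
import cv2
import numpy as np

def to_mnist_format(img):
    """Rough MNIST-style preprocessing: white digit on black background assumed."""
    # Crop to the digit's bounding box
    ys, xs = np.nonzero(img)
    digit = img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

    # Size-normalize into a 20x20 box while preserving the aspect ratio;
    # bilinear interpolation produces the grey "anti-aliased" levels
    h, w = digit.shape
    scale = 20.0 / max(h, w)
    new_w = max(1, int(round(w * scale)))
    new_h = max(1, int(round(h * scale)))
    digit = cv2.resize(digit, (new_w, new_h), interpolation=cv2.INTER_LINEAR)

    # Center of mass = intensity-weighted average of the pixel coordinates
    m = cv2.moments(digit)
    cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]

    # Paste into a 28x28 field so the center of mass lands at (14, 14);
    # this is just a translation, no path finding involved
    out = np.zeros((28, 28), dtype=img.dtype)
    x0 = int(np.clip(round(14 - cx), 0, 28 - new_w))
    y0 = int(np.clip(round(14 - cy), 0, 28 - new_h))
    out[y0:y0 + new_h, x0:x0 + new_w] = digit
    return out
```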