I am trying to use an image editor (such as MS Paint or Paint.NET) to draw bounding boxes in a fixed color (such as pure red, RGB = 255, 0, 0) on images, and later load the images in Python (e.g. with OpenCV's imread) and look for pixels with this RGB value (or the BGR value 0, 0, 255) so that I can create labels for object detection.
However, after the image is saved and loaded, I don't see any pixels with such RGB or BGR values. Instead, these pixels fall within a range of values that may be far from what I specified.
I also tried something like this as an experiment:
cv2.rectangle(img_arr, (10, 10), (60, 60), (0, 0, 255), thickness=1)
Right after this statement, I do see pixels with values (0, 0, 255). However, when I run cv2.imwrite and then cv2.imread like this:
cv2.imwrite(full_path_name, img_arr)
and later:
img_arr = cv2.imread(full_path_name)
I notice that in this new img_arr there are no longer any pixels with these BGR values. What is the problem?
Back to the original problem of labeling images for object detection: I don't want to use any labeling tools, as most of them rely on mouse movements, whereas my task is to detect text areas, which requires very accurate bounding boxes so that the later stages of image segmentation and character recognition won't be too hard. Therefore, I prefer a static way of drawing the boxes, so they can be adjusted to be accurate and even be reviewed; when they are final, we create the labels. Will this idea even work?
Thank you very much!
Be careful when using JPEG as intermediate storage for image processing tasks since it is a lossy format and values may differ when you subsequently read them back.
Consider using the lossless PNG format for intermediate storage. Or use NetPBM's PGM (greyscale) or PPM (colour) format for a particularly simple format to read and write, though be aware these cannot persist metadata such as copyrights or EXIF data.
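As a quick check (a minimal sketch; the file names are placeholders), you can draw a rectangle on a synthetic image, write it once as PNG and once as JPEG, and count how many pixels come back with the exact BGR value:

import cv2
import numpy as np

# A synthetic test image: black canvas with a pure red (BGR) rectangle outline
img = np.zeros((100, 100, 3), dtype=np.uint8)
cv2.rectangle(img, (10, 10), (60, 60), (0, 0, 255), thickness=1)

for path in ("boxes.png", "boxes.jpg"):
    cv2.imwrite(path, img)
    loaded = cv2.imread(path)
    # count pixels that are still exactly (0, 0, 255) after the round trip
    exact_red = np.all(loaded == (0, 0, 255), axis=-1).sum()
    print(path, exact_red)

Typically the PNG keeps every red pixel, while the JPEG loses most or all of them to compression artifacts.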
I learned that OpenCV's color order is BGR while Matplotlib's pyplot uses RGB. So I started experimenting with reading and displaying an image using both libraries. Here is the image I experimented with:
It's just a black-and-white image with red color in some parts. Now, when I used pyplot.imshow() to display the copy of the image read by OpenCV, the color of the tie and the shoes changed to blue. The same happened when I used cv2.imshow() to display a copy of the image read by pyplot.imread(). However, the color remains the same when I use cv2.imshow() to display the copy read by cv2.imread(), and when I use plt.imshow() to display the copy read by plt.imread().
I am just curious and would like to know what goes on behind the scenes when such operations are performed. Can anyone help me with that?
Assume you have a vector like this: [0, 0, 255].
You now have two different color encodings: RGB and BGR. In the first case the vector means Blue; in the second it means Red.
Now, let's call RGB_Reader and BGR_Reader two systems that open the vector and display it.
If I open the image with BGR_Reader, I have [0, 0, 255] and I see Red. If I pass it on to RGB_Reader, it is still [0, 0, 255], but now it is displayed as Blue. Whenever I pass it around, I pass [0, 0, 255]; if I open it again with RGB_Reader, it is blue again.
The same happens the other way around.
Does it make sense to you? The vector doesn't change, but the way it is decoded does.
Now introduce another thing, called jpg_encoder. That one is telling people where to put Blue, Red and Green, and will probably re-order things.
That's basically down to the color convention. OpenCV follows the BGR convention, which means that it interprets a triplet (0, 150, 255) as the B, G and R values respectively, while most other libraries follow the more common RGB convention. The reason OpenCV follows the BGR convention is, I guess, legacy (since 1991, maybe).
I would recommend using only OpenCV methods, such as cv2.imread(), cv2.imshow(), cv2.imwrite(), etc., to perform any operation on images. Writing code this way, you will never have to worry about the underlying BGR or RGB ordering; everything will just work.
The problem arises when you want to use OpenCV together with matplotlib or Pillow, etc. In those cases you need to take extra care while passing your image matrix to the respective libraries. Since OpenCV holds the data in BGR format while matplotlib or Pillow expect RGB format, you explicitly need to convert the color order using cv2.cvtColor(img, cv2.COLOR_BGR2RGB), or you may use numpy slicing to swap the first and third channels.
You may consult this answer for demo code that converts OpenCV images to PIL (another Python image processing module) format images.
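A minimal sketch of both conversion options mentioned above (assuming an image file on disk; the file name is a placeholder):

import cv2
import matplotlib.pyplot as plt

img_bgr = cv2.imread("photo.jpg")                    # OpenCV reads in BGR order

# Option 1: explicit colour-space conversion
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)

# Option 2: reverse the channel axis with numpy slicing
img_rgb_sliced = img_bgr[:, :, ::-1]

plt.imshow(img_rgb)                                  # colours now display correctly
plt.show()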
I have a question regarding OpenCV in Python. I have an example image here, and I want to know how to get the size of the object in it using OpenCV in Python.
Here's the sample image
Here's the output I want
I just used Paint to create the output that I want.
A simple approach is to obtain a binary image and then find the bounding box on that image. Here's the result with the width and height (in pixels) of the box drawn onto the image. To determine real-world measurements, you would need calibration information to scale pixels into concrete units (such as centimeters); without it, it would be difficult to convert this to a real-life size.
Code
import cv2
# Load image, grayscale, Gaussian blur, Otsu's threshold
image = cv2.imread("1.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (5,5), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Find bounding box
x,y,w,h = cv2.boundingRect(thresh)
cv2.rectangle(image, (x, y), (x + w, y + h), (36,255,12), 2)
cv2.putText(image, "w={},h={}".format(w,h), (x,y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (36,255,12), 2)
cv2.imshow("thresh", thresh)
cv2.imshow("image", image)
cv2.waitKey()
Here's a quick recommendation: why not use an object detector like YOLO? There are lots of pretrained weights you can download online, and banana is fortunately among the COCO classes.
You can take a look at this github repo: https://github.com/divikshrivastava/drfoodie
Edit:
Take a look here for a sample
https://drive.google.com/file/d/1uJE0hy9mv75Ya3dqBIyw-kn1h87WLxy4/view?usp=drivesdk
In your case, using just an RGB picture, you actually can't.
Nevertheless, a simple and practical way would be the following. You need a reference object of known real-world size, in a measurable unit such as millimeters, placed beforehand at the same distance from the camera as the object of interest. Having detected both objects within the image (reference object and object of interest), you can calculate the "pixels per metric" ratio in order to compute the object's actual size. For further detail, you can check these links: tutorial, and similarly on GitHub.
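A rough sketch of the pixels-per-metric idea (the thresholding step, the contour ordering, and the 20 mm reference width are all assumptions for illustration):

import cv2

image = cv2.imread("scene.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# [-2] keeps this working across OpenCV 3.x and 4.x return signatures
contours = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
contours = sorted(contours, key=cv2.contourArea, reverse=True)

# Suppose contours[0] is the reference object with a known width of 20 mm
ref_w_px = cv2.boundingRect(contours[0])[2]
pixels_per_mm = ref_w_px / 20.0

# Convert another object's width from pixels to millimetres
obj_w_px = cv2.boundingRect(contours[1])[2]
print("object width: %.1f mm" % (obj_w_px / pixels_per_mm))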
Another way would be to use a depth camera, or simply retrieve the distance to the object of interest using alternative techniques, as this answer suggests.
(edit) On the other hand, since your question doesn't clarify whether you mean real-world measurements (i.e. centimeters) or just a measurement in pixels, forgive me if I misled you.
Good day,
I have an image which I generate through a deep learning process. The image is RGB, and its values range from -0.28 to 1.25. Typically I would rescale the image so that the values are floating point between 0 and 1, or integers between 0 and 255. However, I have found that in my current experiment, doing this makes my images much darker. The image type is np.array (float64).
If I plot the image using matplotlib.pyplot then the values of the original image get clipped, but the image is not darkened.
The problem with this is that I am unable to save this version of the image. plt.imsave('image.png', art) gives an error.
When I scale the image I get the below output which is dark. This image can be saved using plt.imsave().
Here is my scaling function:
def scale(img):
    return (img - img.min()) / (img.max() - img.min()) * 255
My questions:
1) Why am I not able to save the first (bright) version of my image? If scaling is the problem, then:
2) Why does scaling make the image dark?
Help is much appreciated.
1) Why am I not able to save the first (bright) version of my image?
It's hard to answer this without seeing the specific error you're getting, but my guess is it might have to do with the range of values in your image. Maybe negative values are an issue, or the fact that you have both negative floats and floats larger than 1.
If I create some fake RGB image data in the range [-0.28, 1.25] and try to save it with plt.imsave(), I get the following error:
ValueError: Floating point image RGB values must be in the 0..1 range.
2) Why does scaling make the image dark?
Scaling your image's pixel values will likely change the appearance.
Imagine you had a light image, such that its values ranged over [200, 255]. When you scale the values, you stretch that range out to [0, 255], and pixels that were previously bright (around 200) end up mapped to black (0). If you have a generally bright image, it will seem darker after scaling. This seems to be the case for you.
As a side note: I would suggest using Pillow or OpenCV rather than Matplotlib if you're doing lots of image-related work :)
EDIT
As @alkasm pointed out in a comment, when you use plt.imshow() to display the image, the values are clipped. This means that the first image has all negative values mapped to 0 and all values greater than 1 mapped to 1. The first image is clipped and saturated, which makes it appear that there are more dark and bright pixels than there really are.
So it's not that the second image is darker, it's that the first image isn't displayed properly.
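To make the difference concrete, here is a small sketch with fake data in the same [-0.28, 1.25] range (the random array is just a stand-in for the generated image):

import numpy as np
import matplotlib.pyplot as plt

art = np.random.uniform(-0.28, 1.25, (64, 64, 3))   # stand-in for the generated image

# What plt.imshow() effectively does: clip into [0, 1]; this can be saved directly
clipped = np.clip(art, 0.0, 1.0)
plt.imsave("clipped.png", clipped)

# What the scale() function does: stretch min..max onto the full range,
# which pushes formerly mid-range pixels towards black
scaled = (art - art.min()) / (art.max() - art.min())
plt.imsave("scaled.png", scaled)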
Input image
I need to group the region in green and get its coordinates, like in this output image. How can I do this in Python?
Please see the attached images for better clarity
First, split out the green channel of the image and apply a threshold to it to obtain a binary image. This binary image contains the objects of the green area. Then dilate the image with a suitable kernel; this makes adjacent objects stick together and merge into one big object. Next, use findContours to get the sizes of all objects, keep the biggest one and remove the others; this image is your mask. Now you can reconstruct the original image (green channel only) with this mask and fit a box to the remaining object.
You can easily find the code for each part; a rough sketch is below.
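A minimal sketch of those steps (the threshold value, kernel size, and file names are guesses and would need tuning):

import cv2
import numpy as np

image = cv2.imread("input.png")
green = image[:, :, 1]                                   # green channel (BGR order)

_, binary = cv2.threshold(green, 100, 255, cv2.THRESH_BINARY)
kernel = np.ones((15, 15), np.uint8)
dilated = cv2.dilate(binary, kernel, iterations=2)       # merge nearby blobs

# keep only the biggest blob and fit a box to it
contours = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
biggest = max(contours, key=cv2.contourArea)
x, y, w, h = cv2.boundingRect(biggest)
cv2.rectangle(image, (x, y), (x + w, y + h), (0, 0, 255), 2)
cv2.imwrite("output.png", image)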
I would like to darken one image based on the mask of an edge-detected second image.
Image 1: Original (greyscale) image
Image 2: Edge detected (to be used as mask)
Image 3: Failed example showing cv2.subtract processing
In my failed example (Image 3), I subtracted the white pixels (255) from the original image but what I want to do is DARKEN the original image based on a mask of the edge detected image.
In this article: How to fast change image brightness with python + OpenCV?, Bill Gates describes how he converts the image to HSV, splits the channels, modifies the Value channel, and then merges them back. This seems like a reasonable approach, but I only want to modify the Value where the mask is white, i.e. where the edge exists.
Ultimately, I am trying to enhance the edge of a low resolution thermal video stream in a similar way to the FLIR One VividIR technology.
I believe that I've made it really far as a complete novice to image processing, OpenCV and Python but after days now of trying just about every function OpenCV offers, I've got myself stuck.
import numpy as np

## get the coordinates of the edge pixels (where the mask is non-zero)
pos = np.where(edge > 0)

## halve the brightness of the original image at those positions
img[pos] //= 2
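Putting it together, a small end-to-end sketch (the use of Canny, its thresholds, and the file names are assumptions):

import cv2
import numpy as np

img = cv2.imread("thermal.png", cv2.IMREAD_GRAYSCALE)
edge = cv2.Canny(img, 50, 150)        # edge mask: 255 on edges, 0 elsewhere

pos = np.where(edge > 0)              # coordinates of the edge pixels
img[pos] //= 2                        # halve the brightness only on the edges

cv2.imwrite("edge_darkened.png", img)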