I was working with the Laplacian function to detect edges in OpenCV when I ran into some confusion regarding the underlying principles behind the code.
The documentation reads an image with the following code and passes it through the Laplacian function.
img = cv2.imread("messi5.jpg", cv2.IMREAD_GRAYSCALE)
lap = cv2.Laplacian(img, cv2.CV_32F, ksize=1)
Now, I am able to understand the code written above pretty well. As I believe, we read in an image, and calculate the Laplacian at each pixel. This value can be bigger or smaller than the original 8-bit unsigned int pixel, so we store it in an array of 32-bit floats.
My confusion begins with the next few lines of code. In the documentation, the image is converted back to an 8-bit unsigned integer using the convertScaleAbs() function, and then displayed as seen below.
lap = cv2.convertScaleAbs(lap)
cv2.imshow("Laplacian", lap)  # imshow needs a window name as its first argument
However, my instructor showed me the following method of converting back to uint8:
lap = np.uint8(np.absolute(lap))
cv2.imshow("Laplacian", lap)  # imshow needs a window name as its first argument
Surprisingly, both solutions display identical images. However, I am unable to understand why this occurs. From what I've seen, np.uint8 simply truncates values (floats, etc.) down to unsigned 8-bit integers. So, for example, 1025 becomes 1, as all the bits beyond the 8th bit are discarded.
Yet this would literally mean that any value of our Laplacian for each pixel would become heavily reduced and muddled. If our Laplacian for a pixel was 1024 (signaling a non-zero second derivative in both x and y dimensions), we would instead have the value 0 on hand (signaling a second derivative of zero and a possible local max/min, or in other words an edge). Thus by my logic, my instructor's solution should fail miserably, but surprisingly everything works fine. Why is this?
On the other hand, I do not have any idea about how convertScaleAbs() works. I'm going to assume it works similarly as my instructor's solution, but I'm not sure. Can someone please explain what's going on?
OpenCV BGR or grayscale images stored as CV_8U (8-bit unsigned, which corresponds to np.uint8) have pixel values from 0 to 255.
So when you use the Laplacian function with ddepth (the desired depth of the destination image) set to cv2.CV_32F, you get this:
lap = cv2.Laplacian(img, cv2.CV_32F, ksize=1)
print(np.amax(lap)) #=> 317.0
print(np.amin(lap)) #=> -315.0
So, you need to convert back to np.uint8, for example:
lap_uint8 = lap.copy()
lap_uint8[lap > 255] = 255
lap_uint8[lap < 0] = 0
lap_uint8 = lap_uint8.astype(np.uint8)
print(np.amax(lap_uint8)) #=> 255
print(np.amin(lap_uint8)) #=> 0
Or use any more straightforward way that does the same.
But you can also set -1 as the argument for ddepth (see the documentation) to get:
lap = cv2.Laplacian(img, -1, ksize=1)
print(np.amax(lap)) #=> 255
print(np.amin(lap)) #=> 0
Taking only the absolute value of the CV_32F result, on the other hand, gives a wrong result, because values can still exceed 255:
lap_abs = np.absolute(lap)
print(np.amax(lap_abs)) #=> 317.0
print(np.amin(lap_abs)) #=> 0.0
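To make the difference between the conversions concrete, here is a minimal NumPy-only sketch on made-up Laplacian values (the array is hypothetical, not taken from messi5.jpg). It contrasts a saturating conversion, which is what cv2.convertScaleAbs is documented to do (scale, take the absolute value, saturate to 8 bits), with a wrapping integer cast. The np.clip emulation is an assumption standing in for the OpenCV call.

```python
import numpy as np

# Hypothetical Laplacian responses stored as 32-bit floats
lap = np.array([-315.0, -20.0, 0.0, 100.0, 317.0], dtype=np.float32)

# Saturating conversion: absolute value, round, then clamp to [0, 255]
saturated = np.clip(np.rint(np.abs(lap)), 0, 255).astype(np.uint8)

# Wrapping conversion: go through int32 first so the wraparound is well
# defined (casting an out-of-range float straight to uint8 is platform dependent)
wrapped = np.abs(lap).astype(np.int32).astype(np.uint8)

print(saturated)  # values above 255 are clamped to 255
print(wrapped)    # values above 255 wrap around modulo 256
```

Values at or below 255 come out the same either way, which is likely why the two displayed images look identical when few pixels exceed that range.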
I am working on image denoising and I'm not very familiar with Python. I applied BM3D to get the denoised picture and I also have the original one.
Now I want to get the noise by doing this:
tmp = img - img_denoised
But it turns out to be a very strange black and white figure like this:
So how can I get a proper noise picture? What I wish to get is image like this:
Edit:
I got an image from the Internet and did the same processing.
after processing:
Edit again:
Providing a simple example:
import cv2
img = cv2.imread("path of the original image")
img_denoised = cv2.imread("path of the denoised image")
tmp = img - img_denoised
cv2.imwrite("test_noise.jpg",tmp)
In your example code, both img and img_denoised are uint8 NumPy arrays. When operating on these arrays, the output is of the same type. These operations are modulo 256. When the result of an operation exceeds 255, it wraps around back to 0, and when the result is negative, it wraps around back to 255. For example:
np.array([5], np.uint8) - np.array([10], np.uint8)
returns array([251], dtype=uint8). Instead of -5, which cannot be represented in a uint8 value, we get 256 - 5 = 251.
The subtraction img - img_denoised results in some values just above zero, which look black, and some values just below zero, which will be stored as values near 255 and look white.
We can solve this in different ways. One is to force the operation to happen with floating-point values:
tmp = img.astype(float) - img_denoised.astype(float)
We now have an array of floats, about half of them negative. But a JPEG file can only store uint8 values, and casting our float values to uint8 will get us back where we started. So we need to shift the origin (the zero value) to a middle-gray (typically 128):
tmp = img.astype(float) - img_denoised.astype(float)
tmp += 128
cv2.imwrite("test_noise.jpg", tmp.astype(np.uint8))
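One detail the snippet above leaves open is what happens when the shifted difference still falls outside 0 to 255. A minimal sketch, assuming we simply clamp such values before casting (the helper name noise_image and the tiny test arrays are hypothetical):

```python
import numpy as np

def noise_image(img, img_denoised, offset=128):
    """Return the difference image shifted to mid-gray and clamped to uint8."""
    # Subtract in floating point so negative differences survive
    diff = img.astype(np.float32) - img_denoised.astype(np.float32)
    # Shift zero to mid-gray, then clamp so the final cast cannot wrap around
    return np.clip(diff + offset, 0, 255).astype(np.uint8)

# Tiny made-up example: differences of -5, +10 and +255
img = np.array([[5, 200, 255]], dtype=np.uint8)
img_denoised = np.array([[10, 190, 0]], dtype=np.uint8)
noise = noise_image(img, img_denoised)
print(noise)
```

For real noise the differences are usually small, so clamping rarely changes anything; it only guards against the occasional large outlier.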
This is very fiddly, but it works. I prefer using a library that takes care of data types for me, so I don't have to think about them when I don't want to. DIPlib is such a library (disclaimer: I'm an author):
import diplib as dip
img = dip.ImageRead("7yJS3.png")
img_denoised = dip.ImageRead("xjQIy.png")
tmp = img - img_denoised
tmp += 128
dip.ImageWrite(tmp, "test_noise.jpg")
In DIPlib, arithmetic operations automatically promote the images to a floating-point type, unless we explicitly prevent it. Saving as JPEG silently casts the image to uint8 (this is where errors will happen if the pixel values are outside the range of the uint8 type).
With limited information it is hard to pin-point the problem. Please provide input images and more code.
Looks like the result image is a binary bitmap, only white or black, no gray. Your tmp image's pixel format is probably incorrect, which might be because img and img_denoised do not have the same pixel format, or because both are wrong. Try displaying your input images to see if they look normal.
Your img and img_denoised should have the same pixel format, maybe 8-bit grayscale or 24-bit RGB, and after img - img_denoised, the result should still have the same pixel format.
It could also be because the data is unsigned; try making it signed, or add 128 to all pixels and see what happens.
What is your image data type/range? 0-1 or 0-255?
If your image is 0-1 float32, the noise image will have a data range of [-1, 1]; around half of the pixels are below 0, and cv2.imshow() displays them as black.
Try
noise = origin - clean
noise = (noise + 1) * 0.5
I have 10 greyscale brain MRI scans from BrainWeb. They are stored as a 4d numpy array, brains, with shape (10, 181, 217, 181). Each of the 10 brains is made up of 181 slices along the z-plane (going through the top of the head to the neck) where each slice is 181 pixels by 217 pixels in the x (ear to ear) and y (eyes to back of head) planes respectively.
All of the brains are of type dtype('float64'). The maximum pixel intensity across all brains is ~1328 and the minimum is ~0. For example, for the first brain, I calculate this by brains[0].max() giving 1328.338086605072 and brains[0].min() giving 0.0003886114541273855. Below is a plot of a slice of brains[0]:
I want to binarize all these brain images by rescaling the pixel intensities from [0, 1328] to {0, 1}. Is my method correct?
I do this by first normalising the pixel intensities to [0, 1]:
normalized_brains = brains/1328
And then by using the binomial distribution to binarize each pixel:
binarized_brains = np.random.binomial(1, (normalized_brains))
The plotted result looks correct:
A 0 pixel intensity represents black (background) and 1 pixel intensity represents white (brain).
I experimented by implementing another method to normalise an image from this post, but it gave me just a black image. This is because np.finfo(np.float64).max is 1.7976931348623157e+308, so the normalization step
normalized_brains = brains/1.7976931348623157e+308
just returned an array of zeros, which in the binarization step also led to an array of zeros.
Am I binarising my images using a correct method?
Your method of converting the image to a binary image basically amounts to random dithering, which is a poor method of creating the illusion of grey values on a binary medium. Old-fashioned print is a binary medium, and printers have fine-tuned the methods to represent grey-value photographs in print over centuries. This process is called halftoning, and is shaped in part by properties of ink on paper that we do not have to deal with in binary images.
So what methods have people come up with outside of print? Ordered dithering (mostly Bayer matrix), and error diffusion dithering. Read more about dithering on Wikipedia. I wrote a blog post showing how to implement all of these methods in MATLAB some years ago.
I would recommend you use error diffusion dithering for your particular application. Here is some code in MATLAB (taken from my blog post linked above) for the Floyd-Steinberg algorithm; I hope that you can translate this to Python:
img = imread('https://i.stack.imgur.com/d5E9i.png');
img = img(:,:,1);
out = double(img);
sz = size(out);
for ii=1:sz(1)
   for jj=1:sz(2)
      old = out(ii,jj);
      %new = 255*(old >= 128); % Original Floyd-Steinberg
      new = 255*(old >= 128+(rand-0.5)*100); % Simple improvement
      out(ii,jj) = new;
      err = new-old;
      if jj<sz(2)
         % right
         out(ii ,jj+1) = out(ii ,jj+1)-err*(7/16);
      end
      if ii<sz(1)
         if jj<sz(2)
            % right-down
            out(ii+1,jj+1) = out(ii+1,jj+1)-err*(1/16);
         end
         % down
         out(ii+1,jj ) = out(ii+1,jj )-err*(5/16);
         if jj>1
            % left-down
            out(ii+1,jj-1) = out(ii+1,jj-1)-err*(3/16);
         end
      end
   end
end
imshow(out)
Resampling the image before applying the dithering greatly improves the results:
img = imresize(img,4);
% (repeat code above)
imshow(out)
NOTE that the above process expects the input to be in the range [0,255]. It is easy to adapt to a different range, say [0,1328] or [0,1], but it is also easy to scale your images to the [0,255] range.
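For readers working in Python, the MATLAB loop above could be translated roughly as follows. This is a sketch rather than a tuned implementation: it uses the plain Floyd-Steinberg threshold (the commented-out "original" line, without the random perturbation) so that the result is deterministic, it assumes a 2D input in the range [0, 255], and the function name floyd_steinberg and the constant test image are made up for illustration.

```python
import numpy as np

def floyd_steinberg(img):
    """Dither a 2D image with values in [0, 255] to the two levels 0 and 255."""
    out = img.astype(np.float64)  # astype returns a copy, the input is untouched
    rows, cols = out.shape
    for i in range(rows):
        for j in range(cols):
            old = out[i, j]
            new = 255.0 if old >= 128 else 0.0  # plain Floyd-Steinberg threshold
            out[i, j] = new
            err = new - old
            # Push the quantization error onto the not-yet-visited neighbours
            if j + 1 < cols:
                out[i, j + 1] -= err * 7 / 16          # right
            if i + 1 < rows:
                if j + 1 < cols:
                    out[i + 1, j + 1] -= err * 1 / 16  # right-down
                out[i + 1, j] -= err * 5 / 16          # down
                if j > 0:
                    out[i + 1, j - 1] -= err * 3 / 16  # left-down
    return out.astype(np.uint8)

# Made-up input: a constant dark-gray image
gray = np.full((32, 32), 64, dtype=np.uint8)
dithered = floyd_steinberg(gray)
print(np.unique(dithered), dithered.mean())
```

The pure-Python double loop is slow for volumes as large as the brain scans in the question, so it would be applied slice by slice, or moved to a compiled implementation, if speed matters.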
Have you tried a threshold on the image?
This is a common way to binarize images, rather than trying to apply a random binomial distribution. You could try something like:
binarized_brains = (brains > threshold_value).astype(int)
which returns an array of 0s and 1s according to whether the image value was less than or greater than your chosen threshold value.
You will have to experiment with the threshold value to find the best one for your images, but it does not need to be normalized first.
If this doesn't work well, you can also experiment with the thresholding options available in the skimage filters package.
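If you would rather not pick the threshold by hand, one common automatic choice is Otsu's method, which selects the threshold that maximises the variance between the two resulting classes. The sketch below is a plain NumPy version written for illustration, not the skimage implementation; the function name otsu_threshold and the synthetic two-level data are assumptions.

```python
import numpy as np

def otsu_threshold(values, bins=256):
    """Return the bin centre that maximises the between-class variance."""
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)                      # pixel count at or below each bin
    w1 = w0[-1] - w0                          # pixel count above each bin
    s0 = np.cumsum(hist * centers)            # running intensity sum
    m0 = s0 / np.maximum(w0, 1)               # mean of the lower class
    m1 = (s0[-1] - s0) / np.maximum(w1, 1)    # mean of the upper class
    between = w0 * w1 * (m0 - m1) ** 2        # between-class variance (unnormalised)
    return centers[np.argmax(between)]

# Synthetic data: 500 "background" values and 500 "brain" values
values = np.concatenate([np.full(500, 10.0), np.full(500, 1000.0)])
t = otsu_threshold(values)
binary = (values > t).astype(np.uint8)
print(t, binary.sum())
```

The same function can be applied to a whole float64 volume, since np.histogram flattens its input and no prior normalisation is needed.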
It is easy in OpenCV. As mentioned, a very common way is to define a threshold, but your result looks like you are assigning random values to your intensities instead of thresholding them.
import cv2
# Read the image as a single-channel grayscale image
im = cv2.imread('brain.png', cv2.IMREAD_GRAYSCALE)
# Otsu's method picks the threshold automatically; the 128 passed here is ignored
(th, brain_bw) = cv2.threshold(im, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
# Alternatively, choose the threshold manually:
# th = (DEFINE HERE)
# im_bin = cv2.threshold(im, th, 255, cv2.THRESH_BINARY)[1]
cv2.imwrite('binBrain.png', brain_bw)
brain
binBrain
I have a matrix consisting of True and False values. I want to print this as an image where all the True values are white and the False values are black. The matrix is called indices. I have tried the following:
indices = indices.astype(int) #To convert the true to 1 and false to 0
indices*=255 #To change all the 1's to 255
cv2.imshow('Indices',indices)
cv2.waitKey()
This displays a fully black image. When I try print (indices==255).sum(), it returns a value of 669, which means that there are 669 elements/pixels in the indices matrix which should be white. But I can only see a pure black image. How can I fix this?
As far as I know, OpenCV represents an image as a matrix of floats ranging from 0 to 1, or as integers with values between the minimum and the maximum of that type.
A Python int has no bounds (except the boundaries of what can be represented with all available memory), and astype(int) produces a 32- or 64-bit NumPy integer whose maximum is far larger than 255, so a value of 255 is displayed as almost black. If you however use np.uint8, you are working with (unsigned) bytes, where the minimum is 0 and the maximum 255.
So there are several options. The two most popular would be:
cast to np.uint8 and then multiply with 255:
indices = indices.astype(np.uint8) #convert to an unsigned byte
indices*=255
cv2.imshow('Indices',indices)
cv2.waitKey()
Use a float representation:
indices = indices.astype(float)
cv2.imshow('Indices',indices)
cv2.waitKey()
Note that you could also choose to use np.uint16, for instance, to use unsigned 16-bit integers. In that case you will have to multiply by 65535. The advantage of this approach is that you can use an arbitrary color depth: although most image formats use 24-bit colors (8 bits per channel), there is no reason not to use 48-bit colors. If, for instance, you are doing image processing for a glossy magazine, it can be beneficial to work with more color depth.
Furthermore, even if the end result uses a 24-bit color palette, it is sometimes better to use a higher color depth for the intermediate steps in image processing.
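As a quick check of the scaling for both bit depths, here is a small sketch on a made-up boolean mask (the 2x2 array is hypothetical and the display calls are left out). The point is that the multiplier has to match the maximum of the chosen integer type.

```python
import numpy as np

# Hypothetical boolean mask standing in for `indices`
indices = np.array([[True, False],
                    [False, True]])

# 8-bit image: True becomes 255 (white), False stays 0 (black)
img8 = indices.astype(np.uint8) * 255

# 16-bit image: white is the maximum of the type, 65535
img16 = indices.astype(np.uint16) * 65535

print(img8.dtype, img8.max())
print(img16.dtype, img16.max())
```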
I am trying to make a tracking program that takes an image and displays where the object with the specified color is:
example: https://imgur.com/a/8LR40
To do this I am using RGB right now, but it is really hard to work with, so I want to convert it into a hue so it is easier to work with. I am trying to use colorsys, but after doing some research I have no idea what parameters it expects and what it returns. I have tried to get a match using colorizer.org, but I get some nonsense.
>>> import colorsys
>>> colorsys.rgb_to_hsv(45,201,18)
(0.3087431693989071, 0.9104477611940298, 201)
Already colorsys is not acting as documented, because at https://docs.python.org/2/library/colorsys.html it says that the output is always a float between 0 and 1, but the value is 201. That is also impossible, as in standard HSV the value is between 0 and 100.
My questions are:
What does colorsys expect as an input?
How do I convert the output to standard HSV? (Hue = 0-360, saturation = 0-100, value = 0-100)
Coordinates in all of these color spaces are floating point values. In the YIQ space, the Y coordinate is between 0 and 1, but the I and Q coordinates can be positive or negative. In all other spaces, the coordinates are all between 0 and 1.
https://docs.python.org/3/library/colorsys.html
You must scale your RGB values from 0-255 to 0-1, that is, divide them by 255. If using Python 2, make sure not to do floor division.
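Putting the two steps together for the colour from the question, a minimal sketch could look like this. The helper name rgb255_to_hsv is made up; the scaling of the output to hue 0-360 and saturation/value 0-100 follows the ranges the question asks for.

```python
import colorsys

def rgb255_to_hsv(r, g, b):
    """Convert 0-255 RGB to HSV with hue in degrees and S, V in percent."""
    # colorsys works on floats in [0, 1], so scale the inputs first
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    # Scale the [0, 1] outputs to the conventional ranges
    return h * 360.0, s * 100.0, v * 100.0

h, s, v = rgb255_to_hsv(45, 201, 18)
print(round(h), round(s), round(v))  # → 111 91 79
```

Writing 255.0 rather than 255 keeps the division floating point under Python 2 as well.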
I am using OpenCV 2 to do some images manipulations in YCbCr color space. For the moment I can detect some noise due to the conversion RGB -> YCbCr and then YCbCr -> RGB, but as said in the documentation:
If you use cvtColor with 8-bit images, the conversion will have some information lost. For many applications, this will not be noticeable but it is recommended to use 32-bit images in applications that need the full range of colors or that convert an image before an operation and then convert back.
So I would like to convert my image to 16 or 32 bits, but I didn't find how to do it with NumPy. Any ideas?
img = cv2.imread(imgNameIn)
# Here I want to convert img in 32 bits
cv2.cvtColor(img, cv2.COLOR_BGR2YCR_CB, img)
# Some image processing ...
cv2.cvtColor(img, cv2.COLOR_YCR_CB2BGR, img)
cv2.imwrite(imgNameOut, img, [cv2.cv.CV_IMWRITE_PNG_COMPRESSION, 0])
Thanks to @moarningsun, problem resolved:
i = cv2.imread(imgNameIn, cv2.CV_LOAD_IMAGE_COLOR) # Make sure the input is 8-bit
img = np.array(i, dtype=np.uint16) # This line only changes the type, not the values
img *= 256 # Now we get the right values in 16-bit format
The accepted answer is not accurate. A 16-bit image has 65536 intensity levels (2^16) hence, values ranging from 0 to 65535.
If one wants to obtain a 16-bit image from an image represented as an array of float ranging from 0 to 1, one has to multiply every coefficient of this array by 65535.
Also, it is good practice to cast the type of your end result as the very last step of the operations you perform.
This is mainly for two reasons:
- If you perform divisions or multiplications by float, the result will return a float and you will need to change the type again.
- In general (in the mathematical sense of the term), a transformation from float to integer can introduce errors. Casting the type at the very end of the operations prevents error propagation.
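The points above can be sketched in a few lines of NumPy. The sample arrays are made up, and the factor 257 for widening 8-bit data is my own addition (it maps 255 exactly onto 65535, whereas 256 stops at 65280); everything stays in floating point until the final cast.

```python
import numpy as np

# Float image in [0, 1] -> 16-bit: multiply by 65535, round, cast last
img_float = np.array([0.0, 0.25, 1.0])
img16 = np.rint(img_float * 65535).astype(np.uint16)

# 8-bit image -> 16-bit: widen the type first, then scale by 257
img8 = np.array([0, 128, 255], dtype=np.uint8)
img16_from8 = img8.astype(np.uint16) * 257

# 8-bit image -> 32-bit float in [0, 1], the form suggested for cvtColor
img32 = img8.astype(np.float32) / 255.0

print(img16, img16_from8, img32.dtype)
```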