In a computer vision course, the teacher says that, first of all, the image should be normalized to remove brightness variations.
The link to the video: https://youtu.be/0WNiYrRjJbM
The formula looks like this:
I = I / ||I||, where I is the image and ||I|| is the magnitude of this image.
Could somebody explain how to implement this normalization using Python and any library, OpenCV for instance? Maybe such a function already exists in some library, ready to use?
What I think is that the magnitude of an image is calculated as m = sqrt(sum(v*v)), where v is the array of values for each point after converting the image to HSV. And then I = v/m, i.e. each point's value divided by the magnitude. But this doesn't work; the result looks strange.
Thanks.
Below is the small piece of code I wrote that does image normalization.
import numpy as np
import cv2
img = cv2.imread("../images/segmentation/peppers_BlueHills.png")
print("img shape = ", img.shape)
print("img type = ", img.dtype)
print("img[0][0]", img[0][0])
#2-norm
norm = np.linalg.norm(img)
print("img norm = ", norm)
img2 = img / norm
#here img2 becomes float64, reducing it to float32
img2 = np.float32(img2)
print("img2 type = ", img2.dtype)
print("img2[0][0]", img2[0][0])
cv2.imwrite('../images/segmentation/NormalizedPeppers_BlueHills.tif', img2)
cv2.imshow('normalizedImg', img2.astype(np.uint8))
cv2.waitKey(0)
cv2.destroyAllWindows()
exit(0)
The output looks like this:
img shape = (384, 512, 3)
img type = uint8
img[0][0] [64 29 62]
img norm = 78180.45637497904
img2 type = float32
img2[0][0] [0.00081862 0.00037094 0.00079304]
The output image looks like a black square.
However, it's possible to equalize the brightness in Photoshop, for instance, to see something.
Each channel (R, G, B) becomes float, and only the TIFF format supports that.
To me it's still not clear what dividing each pixel's brightness by some value (in this case, the 2-norm of the image) gives us. It just makes the image too dark and unreadable, and it doesn't equalize the brightness to make it even across the entire image.
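As an aside (my addition, not the normalization from the course), one way to at least see the result is to stretch it back to the 0-255 range for display with cv2.normalize:
import cv2
import numpy as np

img = cv2.imread("../images/segmentation/peppers_BlueHills.png")
img2 = img / np.linalg.norm(img)  # same 2-norm division as above; values become tiny

# Stretch back to 0-255 purely for viewing.
visible = cv2.normalize(img2, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
cv2.imshow('normalized, rescaled for display', visible)
cv2.waitKey(0)
cv2.destroyAllWindows()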
What do you think about this?
Related
I am doing denoising work and I'm not very familiar with Python. I applied BM3D to get the denoised picture, and I also have the original one.
Now I want to get the noise by doing this:
tmp = img - img_denoised
But it turns out to be a very strange black-and-white figure like this:
So how can I get a proper noise picture? What I wish to get is an image like this:
Edit:
Got an image from the Internet and did the same processing.
After processing:
Edit again:
Providing a simple example:
import cv2
img = cv2.imread("path of the original image")
img_denoised = cv2.imread("path of the denoised image")
tmp = img - img_denoised
cv2.imwrite("test_noise.jpg",tmp)
In your example code, both img and img_denoised are uint8 NumPy arrays. When operating on these arrays, the output is of the same type. These operations are modulo 256. When the result of an operation exceeds 255, it wraps around back to 0, and when the result is negative, it wraps around back to 255. For example:
np.array([5], np.uint8) - np.array([10], np.uint8)
returns array([251], dtype=uint8). Instead of -5, which cannot be represented in a uint8 value, we get 256 - 5 = 251.
The subtraction img - img_denoised results in some values just above zero, which look black, and some values just below zero, which will be stored as values near 255 and look white.
We can solve this in different ways. One is to force the operation to happen with floating-point values:
tmp = img.astype(float) - img_denoised.astype(float)
We now have an array of floats, about half of them negative. But a JPEG file can only store uint8 values, and casting our float values to uint8 will get us back where we started. So we need to shift the origin (the zero value) to a middle-gray (typically 128):
tmp = img.astype(float) - img_denoised.astype(float)
tmp += 128
cv2.imwrite("test_noise.jpg", tmp.astype(np.uint8))
This is very fiddly, but it works. I prefer using a library that takes care of data types for me, so I don't have to think about them when I don't want to. DIPlib is such a library (disclaimer: I'm an author):
import diplib as dip
img = dip.ImageRead("7yJS3.png")
img_denoised = dip.ImageRead("xjQIy.png")
tmp = img - img_denoised
tmp += 128
dip.ImageWrite(tmp, "test_noise.jpg")
In DIPlib, arithmetic operations automatically promote the images to a floating-point type, unless we explicitly prevent it. Saving as JPEG silently casts the image to uint8 (this is where errors will happen if the pixel values are outside the range of the uint8 type).
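If you stay with plain NumPy, one extra guard (my addition, not part of the answer above) is to clip before the final cast, so extreme noise values cannot wrap around:
import numpy as np
import cv2

# img and img_denoised as in the question (uint8 arrays read with cv2.imread).
tmp = img.astype(float) - img_denoised.astype(float)
tmp += 128
tmp = np.clip(tmp, 0, 255)  # keep values inside the uint8 range before casting
cv2.imwrite("test_noise.jpg", tmp.astype(np.uint8))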
With limited information it is hard to pinpoint the problem. Please provide the input images and more code.
It looks like the result image is a binary bitmap, only white or black, with no gray. Your tmp image's pixel format is probably incorrect, which might be because your img and img_denoised do not have the same pixel format, or both are wrong. Try displaying your input images to see if they look normal.
Your img and img_denoised should have the same pixel format, maybe 8-bit grayscale or 24-bit RGB, and after img - img_denoised the result should still have the same pixel format.
It could also be because the data is unsigned; try making it signed, or add 128 to all pixels and see what happens.
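For instance, a sketch of that signed-type idea (int16 is my choice of signed type):
import numpy as np

# Signed difference avoids uint8 wrap-around; +128 shifts zero to mid-gray.
diff = img.astype(np.int16) - img_denoised.astype(np.int16)
shifted = np.clip(diff + 128, 0, 255).astype(np.uint8)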
What is your image data type/range? 0-1 or 0-255?
If your image is 0-1 float32, the noise image will have a data range of [-1, 1]; around half of the pixels are below 0 and are displayed by cv2.imshow() as black.
Try
noise = origin - clean
noise = (noise + 1) * 0.5
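Putting that together, a minimal end-to-end sketch, assuming the inputs are ordinary 0-255 images on disk (the file names are placeholders):
import cv2
import numpy as np

# Hypothetical inputs, scaled to float32 in [0, 1].
origin = cv2.imread("original.png").astype(np.float32) / 255.0
clean = cv2.imread("denoised.png").astype(np.float32) / 255.0

noise = origin - clean      # values in [-1, 1]
noise = (noise + 1) * 0.5   # remap to [0, 1] so negative noise becomes visible

cv2.imshow("noise", noise)  # imshow displays float images on a 0-1 scale
cv2.waitKey(0)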
I want to change the pixel value of a grayscale image using OpenCV.
Assume that I have a grayscale image and I want to set all its pixels to the value 0, one at a time, so that the resultant image is completely black. I tried this, but there is no change in the image:
import cv2

image = cv2.imread('test_image.png', 0)
for i in range(image.shape[0]):
    for j in range(image.shape[1]):
        image[i, j] = 0
Result: displaying the updated image shows no change.
In most cases, you want to avoid using double for loops to modify pixel values, since it is very slow. A better approach is to use NumPy for pixel modification, since OpenCV represents images as NumPy arrays. To achieve your desired result, you can use np.zeros to create a completely black image with the same shape as the original image.
import cv2
import numpy as np
image = cv2.imread("test_image.png", 0)
black = np.zeros(image.shape, np.uint8)
cv2.imshow('image', image)
cv2.imshow('black', black)
cv2.waitKey(0)
For example, with a test image: original (left), result (right).
I would suggest always manipulating a copy of the image, so that the original doesn't get affected in the wrong way. Coming to your question, you can do the following:
import cv2
image = cv2.imread('test_image.png',0)
# Creating a copy of the image to confirm the right operation is performed on the image.
image_copy = image.copy()
image_copy[:, :] = 0  # Setting all values to 0.
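A quick check (my addition) that the copy-based approach leaves the original untouched:
import cv2
import numpy as np

image = cv2.imread('test_image.png', 0)
image_copy = image.copy()
image_copy[:, :] = 0

print(np.count_nonzero(image_copy))       # 0: the copy is completely black
print(np.array_equal(image, image_copy))  # False, unless the source was already black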
I'm trying to resize images retrieved from CIFAR-10 from the original 32x32 to 96x96 for use with MobileNetV2; however, I'm running into this error. I've tried a variety of solutions but nothing seems to work.
My code:
for a in range(len(train_images)):
    train_images[a] = cv2.resize(train_images[a], dsize=(minSize, minSize), interpolation=cv2.INTER_CUBIC)
Error I'm getting:
----> 8 train_images[a] = cv2.resize(train_images[a], dsize=(minSize, minSize), interpolation=cv2.INTER_CUBIC)
ValueError: could not broadcast input array from shape (96,96,3) into shape (32,32,3)
Sometimes you have to convert the image from RGB to grayscale. If that is the problem, the only thing you should do is gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY), resize the image, and then convert back with resized_image = cv2.cvtColor(gray_image, cv2.COLOR_GRAY2RGB). A sketch of this is below.
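A sketch of that suggestion (the input path and the 96x96 target are assumptions on my part):
import cv2

image = cv2.imread("input.png")                       # hypothetical input path
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # drop to one channel
gray_resized = cv2.resize(gray_image, (96, 96), interpolation=cv2.INTER_CUBIC)
resized_image = cv2.cvtColor(gray_resized, cv2.COLOR_GRAY2RGB)  # back to three channels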
I have never run into this error, but if the first option doesn't work, you can try to resize the image with Pillow like this:
import cv2
import numpy
from PIL import Image

im = Image.fromarray(cv2_image)
nx, ny = im.size
im2 = im.resize((nx*2, ny*2), Image.LANCZOS)
cv2_image = cv2.cvtColor(numpy.array(im2), cv2.COLOR_RGB2BGR)
You can make this into a function and call it in the list comprehension. I hope this solves your problem :)
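For instance, the function-plus-comprehension idea might look like this (a sketch; resize_with_pillow, the 96-pixel target, and np.stack to build the new array are my choices):
import numpy as np
import cv2
from PIL import Image

def resize_with_pillow(cv2_image, size=96):
    # Wrap the Pillow resize from above in a reusable helper.
    im = Image.fromarray(cv2_image)
    im2 = im.resize((size, size), Image.LANCZOS)
    return cv2.cvtColor(np.array(im2), cv2.COLOR_RGB2BGR)

# Build a new array instead of writing 96x96 images into 32x32 slots.
train_images = np.stack([resize_with_pillow(img) for img in train_images])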
This is simply because you are reading the 32x32 image from train_images and trying to save the reshaped image (96x96) in the same array, which is impossible!
Try something like:
import numpy as np

# Preallocate the output array (np.empty, not np.array, which would just make a 4-element array).
train_images_reshaped = np.empty((num_images, 96, 96, 3), dtype=train_images.dtype)
for a in range(len(train_images)):
    train_images_reshaped[a] = cv2.resize(train_images[a], dsize=(minSize, minSize), interpolation=cv2.INTER_CUBIC)
There are several interpolation algorithms in OpenCV, such as:
INTER_NEAREST – a nearest-neighbor interpolation
INTER_LINEAR – a bilinear interpolation (used by default)
INTER_AREA – resampling using pixel area relation. It may be a preferred method for image decimation, as it gives moiré-free results. But when the image is zoomed, it is similar to the INTER_NEAREST method.
INTER_CUBIC – a bicubic interpolation over a 4×4 pixel neighborhood
INTER_LANCZOS4 – a Lanczos interpolation over an 8×8 pixel neighborhood
Code:
image_scaled = cv2.resize(image, None, fx=0.75, fy=0.75, interpolation=cv2.INTER_LINEAR)
img_double = cv2.resize(image, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)
image_resize = cv2.resize(image, (200, 300), interpolation=cv2.INTER_AREA)
image_resize = cv2.resize(image, (500, 400), interpolation=cv2.INTER_LANCZOS4)
You can find the details about python implementation here as well: How to resize images in OpenCV python
I am trying to resize a .jpg image with the skimage.transform.resize function. The function returns a weird result (see image below). I am not sure if it is a bug or just wrong use of the function.
import numpy as np
from PIL import Image
from skimage import io, color
from skimage.transform import resize
rgb = io.imread("../../small_dataset/" + file)
# show original image
img = Image.fromarray(rgb, 'RGB')
img.show()
rgb = resize(rgb, (256, 256))
# show resized image
img = Image.fromarray(rgb, 'RGB')
img.show()
Original image:
Resized image:
I already checked skimage resize giving weird output, but I think that my bug has different properties.
Update: The rgb2lab function also has a similar bug.
The problem is that skimage converts the pixel data type of your array after resizing the image. The original image has 8 bits per pixel, of type numpy.uint8, while the resized pixels are numpy.float64 variables.
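You can verify this with a quick check (my snippet; the path is a placeholder):
from skimage import io
from skimage.transform import resize

rgb = io.imread("some_image.jpg")
print(rgb.dtype)                                    # uint8
resized = resize(rgb, (256, 256))
print(resized.dtype, resized.min(), resized.max())  # float64, values in [0, 1]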
The resize operation is correct, but the result is not being correctly displayed. For solving this issue, I propose 2 different approaches:
To change the data structure of the resulting image. Prior to changing to uint8 values, the pixels have to be converted to a 0-255 scale, as they are on a 0-1 normalized scale:
# ...
# Do the OP operations ...
resized_image = resize(rgb, (256, 256))
# Convert the image to a 0-255 scale.
rescaled_image = 255 * resized_image
# Convert to integer data type pixels.
final_image = rescaled_image.astype(np.uint8)
# show resized image
img = Image.fromarray(final_image, 'RGB')
img.show()
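As an aside (my addition, not part of the original answer), skimage itself ships a helper that does this scale-and-cast in one call:
from skimage import img_as_ubyte

# Converts [0, 1] floats to [0, 255] uint8, handling rounding and clipping.
final_image = img_as_ubyte(resized_image)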
Update: This second method is deprecated, as per the scipy.misc.imshow documentation.
To use another library for displaying the image. Taking a look at the Image library documentation, there isn't any mode supporting 3xfloat64 pixel images. However, the scipy.misc library has the appropriate tools for converting the array format in order to display it correctly:
from scipy import misc
# ...
# Do OP operations
misc.imshow(resized_image)
I wrote a little script to transform pictures of chalkboards into a form that I can print off and mark up.
I take an image like this:
Auto-crop it, and binarize it. Here's the output of the script:
I would like to remove the largest connected black regions from the image. Is there a simple way to do this?
I was thinking of eroding the image to eliminate the text and then subtracting the eroded image from the original binarized image, but I can't help thinking that there's a more appropriate method.
Sure, you can just get connected components (of a certain size) with findContours or floodFill and erase them, leaving some smear. However, if you'd like to do it right, you would think about why you have the black area in the first place (a rough sketch of the erase-components route appears at the end of this answer).
You did not use adaptive thresholding (locally adaptive), and this made your output sensitive to shading. Try not to get the black region in the first place by running something like this:
Mat img = imread("desk.jpg", 0);
Mat img2, dst;
pyrDown(img, img2);
adaptiveThreshold(255 - img2, dst, 255, ADAPTIVE_THRESH_MEAN_C, THRESH_BINARY, 9, 10);
imwrite("adaptiveT.png", dst);
imshow("dst", dst);
waitKey(-1);
In the future, you may want to read about adaptive thresholds and how to sample colors locally. I personally found it useful to sample binary colors orthogonally to the image gradient (that is, on both sides of it). This way the samples of white and black are of equal size, which is a big deal, since typically there is more background color, which biases the estimation. Using SWT and MSER may give you even more ideas about text segmentation.
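For reference, the erase-the-components route from the first paragraph could look roughly like this (a sketch using cv2.connectedComponentsWithStats; the file name and area threshold are guesses to tune):
import cv2

binary = cv2.imread("binarized.png", 0)  # hypothetical: the binarized board image
inv = 255 - binary                       # assume the regions to remove are black on white

n, labels, stats, centroids = cv2.connectedComponentsWithStats(inv, connectivity=8)
cleaned = binary.copy()
for i in range(1, n):                        # label 0 is the background
    if stats[i, cv2.CC_STAT_AREA] > 5000:    # "large" threshold: a guess
        cleaned[labels == i] = 255           # erase the big blob to white
cv2.imwrite("cleaned.png", cleaned)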
I tried this:
import numpy as np
import cv2

im = cv2.imread('image.png')
gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
grayout = 255 * np.ones((im.shape[0], im.shape[1], 1), np.uint8)
blur = cv2.GaussianBlur(gray, (5, 5), 1)
thresh = cv2.adaptiveThreshold(blur, 255, 1, 1, 11, 2)
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
wcnt = 0
for item in contours:
    area = cv2.contourArea(item)
    print(wcnt, area)
    [x, y, w, h] = cv2.boundingRect(item)
    if area > 10 and area < 200:
        roi = gray[y:y+h, x:x+w]
        cntd = 0
        # Count the black pixels inside the bounding box.
        for i in range(x, x+w):
            for j in range(y, y+h):
                if gray[j, i] == 0:
                    cntd = cntd + 1
        density = cntd / float(h * w)
        # Keep only sparse (text-like) regions; dense blobs stay erased.
        if density < 0.5:
            for i in range(x, x+w):
                for j in range(y, y+h):
                    grayout[j, i] = gray[j, i]
    wcnt = wcnt + 1
cv2.imwrite('result.png', grayout)
You have to balance two things: removing the black spots while not losing the contents of what is on the board. The output I got is this:
Here is a Python numpy implementation (using my own mahotas package) of the method from the top answer (almost the same, I think):
import mahotas as mh
import numpy as np
Import mahotas & numpy with standard abbreviations.
im = mh.imread('7Esco.jpg', as_grey=1)
Load the image & convert to gray
im2 = im[::2,::2]
im2 = mh.gaussian_filter(im2, 1.4)
Downsample and blur (for speed and noise removal).
im2 = 255 - im2
Invert the image
mean_filtered = mh.convolve(im2.astype(float), np.ones((9,9))/81.)
Mean filtering is implemented "by hand" with a convolution.
imc = im2 > mean_filtered - 4
You might need to adjust the number 4 here, but it worked well for this image.
mh.imsave('binarized.png', (imc*255).astype(np.uint8))
Convert to 8 bits and save in PNG format.