I am trying to enhance an image by first converting it from the RGB color space to the YUV color space and then applying histogram equalization to the Y channel. However, the output image does not look good.
For histogram equalization, I use the method found on Wikipedia.
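(As far as I understand the Wikipedia method, the equalized value for a grey level v is h(v) = round((cdf(v) - cdf_min) / (M * N - cdf_min) * (L - 1)), where cdf is the cumulative histogram, cdf_min is its smallest non-zero value, M * N is the number of pixels and L = 256 is the number of grey levels.)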
Here is the input image:
Here is the output image:
I really don't know where the problem is. Can anyone help me or give me a hint?
Below is my code:
import cv2
import numpy as np

img = cv2.imread('/Users/simon/Documents/DIP/Homework_3/input4.bmp')
shape = img.shape

Y_origin_hist = [0] * 256
U_origin = [[0 for i in range(0, shape[1])] for j in range(0, shape[0])]
V_origin = [[0 for i in range(0, shape[1])] for j in range(0, shape[0])]
Y_hist = [0] * 256

# Read RGB value and calculate YUV value
for i in range(0, shape[0]) :
    for j in range(0, shape[1]) :
        px = img[i,j]
        y = int(0.299 * px[2] + 0.587 * px[1] + 0.114 * px[0])
        u = int(-0.169 * px[2] - 0.331 * px[1] + 0.5 * px[0]) + 128
        v = int(0.5 * px[2] - 0.419 * px[1] - 0.081 * px[0]) + 128
        Y_origin_hist[y] = Y_origin_hist[y] + 1
        U_origin[i][j] = u
        V_origin[i][j] = v

# Histogram equalization
for i in range(0, 256) :
    Y_hist[i] = int(((sum(Y_origin_hist[0:i]) - min(Y_origin_hist) - 1) * 255) / ((shape[0] * shape[1]) - 1))

# Write back to RGB value
for i in range(0, shape[0]) :
    for j in range(0, shape[1]) :
        px = img[i,j]
        px[0] = int(Y_hist[px[0]] + 1.77216 * (U_origin[i][j] - 128) + 0.00099 * (V_origin[i][j] - 128))
        px[1] = int(Y_hist[px[1]] - 0.3437 * (U_origin[i][j] - 128) - 0.71417 * (V_origin[i][j] - 128))
        px[2] = int(Y_hist[px[2]] - 0.00093 * (U_origin[i][j] - 128) + 1.401687 * (V_origin[i][j] - 128))

cv2.imwrite('/Users/simon/Documents/DIP/Homework_3/output4.bmp', img)
In OpenCV's C++ API the + and - operators are overloaded and saturate automatically, so overflows are prevented. This is not the case in Python, where plain NumPy arithmetic on uint8 values wraps around instead. For this reason you should use cv2.add() and cv2.subtract() for the pixel math to get the same results you would get in C++.
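A quick way to see the difference with uint8 values like your pixels:

import cv2
import numpy as np

x = np.uint8([250])
y = np.uint8([10])

print(cv2.add(x, y))       # [[255]]  -> 250 + 10 = 260 is saturated to 255
print(x + y)               # [4]      -> 250 + 10 = 260 % 256 = 4, NumPy wraps around
print(cv2.subtract(y, x))  # [[0]]    -> 10 - 250 is clamped to 0 instead of wrapping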
Related
I watched some tutorials and tried to create a Perlin noise generator in Python.
It takes in a tuple for the number of gradient vectors in the x and y directions and a scale for the distance in pixels between them. It calculates the dot product between each pixel's offset vector and each of the 4 gradient vectors surrounding it, then interpolates those values bilinearly to get the pixel's value.
Here's the code:
from PIL import Image
import numpy as np

scale = 16
size = np.array([8, 8])

vectors = []
for i in range(size[0]):
    for j in range(size[1]):
        rand = np.random.rand() * 2 * np.pi
        vectors.append(np.array([np.cos(rand), np.sin(rand)]))

interpolated_map = np.zeros(size * scale)

def interpolate(x1, x2, w):
    t = (w % scale) / scale
    return (x2 - x1) * t + x1

def dot_product(a, b):
    return a[0] * b[0] + a[1] * b[1]

for i in range(size[1] * scale):
    for j in range(size[0] * scale):
        dot_products = []
        for m in range(4):
            corner_vector_x = round(i / scale) + (m % 2)
            corner_vector_y = round(j / scale) + int(m / 2)
            x = i - corner_vector_x * scale
            y = j - corner_vector_y * scale
            if corner_vector_x >= size[0]:
                corner_vector_x = 0
            if corner_vector_y >= size[1]:
                corner_vector_y = 0
            corner_vector = vectors[corner_vector_x + corner_vector_y * (size[0])]
            distance_vector = np.array([x, y])
            dot_products.append(dot_product(corner_vector, distance_vector))
        x1 = interpolate(dot_products[0], dot_products[1], i)
        x2 = interpolate(dot_products[2], dot_products[3], i)
        interpolated_map[i][j] = (interpolate(x1, x2, j) / 2 + 1) * 255

img = Image.fromarray(interpolated_map)
img.show()
I'm getting this image:
but I should be getting this:
I don't know what's going wrong. I've tried watching multiple different tutorials and reading a bunch of different articles, but the result is always the same.
I have come up with this bilinear interpolation code (included below), but I would like to extend it to 3D, i.e. make it work with an RGB image (a 3D array instead of only a 2D one).
If you have any suggestions on how I can do that, I would love to know.
This is the one-dimensional linear interpolation:
import math

def linear1D_resize(in_array, size):
    """
    `in_array` is the input array.
    `size` is the desired size.
    """
    ratio = (len(in_array) - 1) / (size - 1)
    out_array = []
    for i in range(size):
        low = math.floor(ratio * i)
        high = math.ceil(ratio * i)
        weight = ratio * i - low
        a = in_array[low]
        b = in_array[high]
        out_array.append(a * (1 - weight) + b * weight)
    return out_array
And this is the 2D version:

import math
import numpy as np

def bilinear_resize(image, height, width):
    """
    `image` is a 2-D numpy array.
    `height` and `width` are the desired spatial dimensions of the new 2-D array.
    """
    img_height, img_width = image.shape[:2]
    resized = np.empty([height, width])
    x_ratio = float(img_width - 1) / (width - 1) if width > 1 else 0
    y_ratio = float(img_height - 1) / (height - 1) if height > 1 else 0
    for i in range(height):
        for j in range(width):
            x_l, y_l = math.floor(x_ratio * j), math.floor(y_ratio * i)
            x_h, y_h = math.ceil(x_ratio * j), math.ceil(y_ratio * i)
            x_weight = (x_ratio * j) - x_l
            y_weight = (y_ratio * i) - y_l
            a = image[y_l, x_l]
            b = image[y_l, x_h]
            c = image[y_h, x_l]
            d = image[y_h, x_h]
            pixel = (a * (1 - x_weight) * (1 - y_weight)
                     + b * x_weight * (1 - y_weight)
                     + c * y_weight * (1 - x_weight)
                     + d * x_weight * y_weight)
            resized[i][j] = pixel  # the scalar value computed by the interpolation
    return resized
Check out some of the scipy.ndimage interpolation functions. They will do what you're looking for, and they work on numpy arrays.
They are also full-featured, fast, and have been tested many times.
Richard
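For an RGB array, something like scipy.ndimage.zoom with order=1 (i.e. (bi)linear interpolation) should do it. Here is a minimal sketch; bilinear_resize_rgb is just an illustrative wrapper name:

import numpy as np
from scipy import ndimage

def bilinear_resize_rgb(image, height, width):
    """`image` is an (H, W, 3) numpy array; returns an array of roughly (height, width, 3)."""
    img_height, img_width = image.shape[:2]
    # One zoom factor per axis: rows, columns, and 1 to leave the channel axis alone
    factors = (height / img_height, width / img_width, 1)
    return ndimage.zoom(image, factors, order=1)  # order=1 -> linear interpolation

Note that zoom derives the output shape by rounding the input shape times the zoom factors, so the result can be a pixel off from the exact target size in some cases.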
I'm currently working on a volume rendering project in Python, where I use a compositing ray casting function to produce an image from a 3D volume of voxels. The function (shown below) works correctly but has a very long runtime. Do you have any tips on how to make this function faster? The code is Python 3.6.8 and uses various numpy arrays.
def render_compositing(self, view_matrix: np.ndarray, volume: Volume, image_size: int, image: np.ndarray):
    # Clear the image
    self.clear_image()

    # U, V, View vectors. See documentation in parent's class
    u_vector = view_matrix[0:3]
    v_vector = view_matrix[4:7]
    view_vector = view_matrix[8:11]

    # Center of the image. Image is squared
    image_center = image_size / 2

    # Center of the volume (3-dimensional)
    volume_center = [volume.dim_x / 2, volume.dim_y / 2, volume.dim_z / 2]

    # Define a step size to make the loop faster
    step = 2 if self.interactive_mode else 1

    for i in range(0, image_size, step):
        for j in range(0, image_size, step):
            sum_color = TFColor(0, 0, 0, 0)
            for k in range(0, image_size, step):
                # Get the voxel coordinate X
                voxel_coordinate_x = u_vector[0] * (i - image_center) + v_vector[0] * (j - image_center) + \
                                     view_vector[0] * (k - image_center) + volume_center[0]

                # Get the voxel coordinate Y
                voxel_coordinate_y = u_vector[1] * (i - image_center) + v_vector[1] * (j - image_center) + \
                                     view_vector[1] * (k - image_center) + volume_center[1]

                # Get the voxel coordinate Z
                voxel_coordinate_z = u_vector[2] * (i - image_center) + v_vector[2] * (j - image_center) + \
                                     view_vector[2] * (k - image_center) + volume_center[2]

                color = self.tfunc.get_color(
                    get_voxel(volume, voxel_coordinate_x, voxel_coordinate_y, voxel_coordinate_z))

                sum_color.r = color.a * color.r + (1 - color.a) * sum_color.r
                sum_color.g = color.a * color.g + (1 - color.a) * sum_color.g
                sum_color.b = color.a * color.b + (1 - color.a) * sum_color.b
                sum_color.a = color.a + (1 - color.a) * sum_color.a

            red = sum_color.r
            green = sum_color.g
            blue = sum_color.b
            alpha = sum_color.a

            # Compute the color value (0...255)
            red = math.floor(red * 255) if red < 255 else 255
            green = math.floor(green * 255) if green < 255 else 255
            blue = math.floor(blue * 255) if blue < 255 else 255
            alpha = math.floor(alpha * 255) if alpha < 255 else 255

            # Assign color to the pixel i, j
            image[(j * image_size + i) * 4] = red
            image[(j * image_size + i) * 4 + 1] = green
            image[(j * image_size + i) * 4 + 2] = blue
            image[(j * image_size + i) * 4 + 3] = alpha
I don't understand why you want to use Python for this code. Isn't a shader a better approach if you are concerned about speed?
Anyway, here are a few things that can be done in the current code.
The voxel coordinates can be calculated with numpy: you can make a 3-channel 2D image and compute the x, y, z coordinates for an entire slice (k) in a single shot (a rough sketch of this is below, after these points).
The above step can be further optimized by storing an image of the x, y, z coordinates of the first slice (k = 0) and a constant view_direction * step_size. Every other slice can then be calculated simply as (XYZ at k = 0) + k * step_size.
Use early ray termination by thresholding the alpha value at 0.999 or 0.99. This does not look like much, but it gives a big speed gain.
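As a rough sketch of the first point (slice_coordinates is just an illustrative helper; it assumes view_matrix and volume_center are laid out exactly as in your function):

import numpy as np

def slice_coordinates(view_matrix, volume_center, image_size, k):
    # Same vector layout as in the question
    u_vector = view_matrix[0:3]
    v_vector = view_matrix[4:7]
    view_vector = view_matrix[8:11]
    image_center = image_size / 2

    # Pixel index grids covering the whole image plane, shape (image_size, image_size)
    i, j = np.meshgrid(np.arange(image_size) - image_center,
                       np.arange(image_size) - image_center, indexing='ij')

    # (image_size, image_size, 3) array with the x, y, z voxel coordinates of
    # every pixel of slice k, computed in one shot
    return (i[..., None] * u_vector
            + j[..., None] * v_vector
            + (k - image_center) * view_vector
            + np.asarray(volume_center))

This would replace the three per-pixel voxel_coordinate_* computations in the innermost loop; the transfer-function lookup and the compositing still have to be done per ray.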
My input is a PIL.Image.Image with mode RGB or RGBA, and I need to fill a numpy.ndarray with 3 float values calculated from the RGB values of each pixel. The output array should be indexable by the pixel coordinates. I have found the following way to do it:
import numpy as np
from PIL import Image

def generate_ycbcr(img: Image.Image):
    for r, g, b in img.getdata():
        yield 0.299 * r + 0.587 * g + 0.114 * b
        yield 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
        yield 128 + 0.5 * r - 0.418688 * g - 0.081312 * b

def get_ycbcr_arr(img: Image.Image):
    width, height = img.size
    arr = np.fromiter(generate_ycbcr(img), float, height * width * 3)
    return arr.reshape(height, width, 3)
It works, but I suspect there is a better and/or faster way. Please tell me if there is one, but also if there is not.
N.B.: I know I can convert() the image to YCbCr, and then fill a numpy.array from that, but the conversion is rounded to integer values, which is not what I need.
For starters, you can convert an image directly to a numpy array and use vectorized operations to do what you want:
def get_ycbcr_vectorized(img: Image.Image):
    R, G, B = np.array(img).transpose(2, 0, 1)[:3]  # ignore alpha if present
    Y = 0.299 * R + 0.587 * G + 0.114 * B
    Cb = 128 - 0.168736 * R - 0.331264 * G + 0.5 * B
    Cr = 128 + 0.5 * R - 0.418688 * G - 0.081312 * B
    return np.array([Y, Cb, Cr]).transpose(1, 2, 0)
print(np.array_equal(get_ycbcr_arr(img), get_ycbcr_vectorized(img))) # True
However, are you sure that directly converting to 'YCbCr' will be that much different? I tested the conversion defined in the above function:
import matplotlib.pyplot as plt

def aux():
    # generate every integer R/G/B combination
    R, G, B = np.ogrid[:256, :256, :256]
    Y = 0.299 * R + 0.587 * G + 0.114 * B
    Cb = 128 - 0.168736 * R - 0.331264 * G + 0.5 * B
    Cr = 128 + 0.5 * R - 0.418688 * G - 0.081312 * B

    # plot the maximum error along one of the RGB channels
    for arr, label in zip([Y, Cb, Cr], ['Y', 'Cb', 'Cr']):
        plt.figure()
        plt.imshow((arr - arr.round()).max(-1))
        plt.xlabel('R')
        plt.ylabel('G')
        plt.title(f'max_B ({label} - {label}.round())')
        plt.colorbar()

aux()
plt.show()
The results suggest that the largest absolute error is 0.5, and that errors of that size occur all over the RGB range:
So yeah, this could be a large-ish relative error, but this isn't necessarily a huge issue.
In case the built-in conversion suffices:
arr = np.array(img.convert('YCbCr'))
is all you need.
I was reading some reference code for creating a montage of images, and there are a few lines I don't quite understand.
This is the full code:
def montage(images, saveto='montage.png'):
    """Draw all images as a montage separated by 1 pixel borders.

    Also saves the file to the destination specified by `saveto`.

    Parameters
    ----------
    images : numpy.ndarray
        Input array to create montage of. Array should be:
        batch x height x width x channels.
    saveto : str
        Location to save the resulting montage image.

    Returns
    -------
    m : numpy.ndarray
        Montage image.
    """
    if isinstance(images, list):
        images = np.array(images)
    img_h = images.shape[1]
    img_w = images.shape[2]
    n_plots = int(np.ceil(np.sqrt(images.shape[0])))
    if len(images.shape) == 4 and images.shape[3] == 3:
        m = np.ones(
            (images.shape[1] * n_plots + n_plots + 1,
             images.shape[2] * n_plots + n_plots + 1, 3)) * 0.5
    else:
        m = np.ones(
            (images.shape[1] * n_plots + n_plots + 1,
             images.shape[2] * n_plots + n_plots + 1)) * 0.5
    for i in range(n_plots):
        for j in range(n_plots):
            this_filter = i * n_plots + j
            if this_filter < images.shape[0]:
                this_img = images[this_filter]
                m[1 + i + i * img_h:1 + i + (i + 1) * img_h,
                  1 + j + j * img_w:1 + j + (j + 1) * img_w] = this_img
    plt.imsave(arr=m, fname=saveto)
    return m
For the creation of m, I get the idea that the author is trying to create a scaffold of sorts to place the images on later, but how is the value images.shape[1] * n_plots + n_plots + 1 calculated? Why must the ones be multiplied by 0.5?
Why can't it be images.shape[1] * n_plots only, since that shape should be sufficient for the number of images that could be included in the montage?