I'm trying to convert an image from RGB to LMS, and vice versa, using OpenCV in Python. From what I understand, I am supposed to take a given 3x3 transformation matrix and multiply it by a 3x1 RGB/LMS vector for each pixel. The transformation matrices used can be found here.
I've explored previously asked questions on this site, but unfortunately they're in C++, a language I am not yet proficient in, and I have difficulty understanding how exactly they solved their problems.
Here is my code so far: [Solved as of 2019-05-19]
import numpy as np
import cv2
#Transformation Matrix#
MsRGB = np.zeros((3,3), dtype='float')
MHPE = np.zeros((3,3), dtype='float')
MsRGB = np.array([[0.4124564, 0.3575761, 0.1804375],
[0.2126729, 0.7151522, 0.0721750],
[0.0193339, 0.1191920, 0.9503041]])
MHPE = np.array([[ 0.4002, 0.7076, -0.0808],
[-0.2263, 1.1653, 0.0457],
[ 0, 0, 0.9182]])
Trgb2lms = MHPE @ MsRGB
Tlms2rgb = np.linalg.inv(Trgb2lms)
imgpath = "(insert file directory here)"
imgIN = cv2.imread(imgpath,cv2.IMREAD_UNCHANGED)
imgINrgb = cv2.cvtColor(imgIN, cv2.COLOR_BGR2RGB)
x,y,z = imgINrgb.shape
imgLMS = np.zeros((x,y,z), dtype='float')
imgReshaped = imgINrgb.transpose(2, 0, 1).reshape(3,-1)
imgLMS = Trgb2lms @ imgReshaped #Convert to LMS
imgOUT = Tlms2rgb @ imgLMS #Convert back to RGB
imgLMS = imgLMS.reshape(z, x, y).transpose(1, 2, 0).astype(np.uint8)
imgOUT = imgOUT.reshape(z, x, y).transpose(1, 2, 0).astype(np.uint8)
imgOUT = cv2.cvtColor(imgOUT, cv2.COLOR_RGB2BGR)
cv2.imshow('Input', imgIN)
cv2.imshow('LMS', imgLMS)
cv2.imshow('Output', imgOUT)
cv2.waitKey(0)
cv2.destroyAllWindows()
The code is now able to perform linear transformation on a given RGB image using a given transformation matrix. Results can be found here.
There are a few errors given the context of your question:
T is not defined. Judging from the context of your code, this should be Trgb2lms instead so we need to change those.
From what I can gather from the question, you are applying a linear transformation to all pixels in the image. To do this, you want to reshape the image into a matrix with three rows (one per channel) where each column corresponds to a single pixel, with all pixels unravelled along the columns. In that case, the reshape call is incorrect. You need not only to shuffle the dimensions so that the last dimension comes first, but also to set the last dimension of the reshape to -1. NumPy then infers the number of columns automatically so that it equals the total number of pixels in the image.
Finally, once you do the linear transformation, you need to reshape the matrix back to the original image size. You can use a final reshape call and use x, y and z from the original call you made to infer the image dimensions. Remember that when we reshape, the channels come first so we'll have to permute the dimensions again. You'll also want to go back to unsigned 8-bit precision after we do the transformation.
Also to compare, let's run this through the inverse transformation to make sure we have the original.
Therefore:
import numpy as np
import cv2
#Transformation Matrix#
MsRGB = np.zeros((3,3), dtype='float')
MHPE = np.zeros((3,3), dtype='float')
MsRGB = np.array([[0.4124564, 0.3575761, 0.1804375],
[0.2126729, 0.7151522, 0.0721750],
[0.0193339, 0.1191920, 0.9503041]])
MHPE = np.array([[ 0.4002, 0.7076, -0.0808],
[-0.2263, 1.1653, 0.0457],
[ 0, 0, 0.9182]])
Trgb2lms = MHPE @ MsRGB
# Change
Tlms2rgb = np.linalg.inv(Trgb2lms)
imgpath = "(insert filename here)"
imgIN = cv2.imread(imgpath,cv2.IMREAD_UNCHANGED)
imgINrgb = cv2.cvtColor(imgIN, cv2.COLOR_BGR2RGB)
x,y,z = imgINrgb.shape
imgLMS = np.zeros((x,y,z), dtype='float')
#imgFlatten = imgINrgb.flatten()
# Change
imgReshaped = imgINrgb.transpose(2, 0, 1).reshape(3,-1)
# Change
imgLMS = Trgb2lms @ imgReshaped
imgOUT = Tlms2rgb @ imgLMS
# New
imgLMS = imgLMS.reshape(z, x, y).transpose(1, 2, 0).astype(np.uint8)
imgOUT = imgOUT.reshape(z, x, y).transpose(1, 2, 0).astype(np.uint8)
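The reshape, multiply, reshape-back sequence can also be checked without OpenCV. The following is a minimal sketch, assuming a small random array stands in for the image; the helper name apply_matrix and the test array are illustrative and not part of the original code. It rounds before casting to uint8, because a plain astype(np.uint8) truncates and can turn a round-tripped 254.9999 into 254.

```python
import numpy as np

# Matrices copied from the code above
M_SRGB = np.array([[0.4124564, 0.3575761, 0.1804375],
                   [0.2126729, 0.7151522, 0.0721750],
                   [0.0193339, 0.1191920, 0.9503041]])
M_HPE = np.array([[0.4002, 0.7076, -0.0808],
                  [-0.2263, 1.1653, 0.0457],
                  [0.0, 0.0, 0.9182]])
T_RGB2LMS = M_HPE @ M_SRGB
T_LMS2RGB = np.linalg.inv(T_RGB2LMS)


def apply_matrix(img, T):
    """Apply a 3x3 matrix T to every pixel of an (H, W, 3) array (hypothetical helper)."""
    h, w, c = img.shape
    flat = img.reshape(-1, c).astype(np.float64).T  # shape (3, H*W), one pixel per column
    out = T @ flat                                  # transform all pixels at once
    return out.T.reshape(h, w, c)                   # back to (H, W, 3)


# Stand-in for an RGB image; a real one would come from cv2.imread + cvtColor
rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(4, 5, 3), dtype=np.uint8)

lms = apply_matrix(rgb, T_RGB2LMS)
back = apply_matrix(lms, T_LMS2RGB)

# Round before casting so floating-point error does not truncate values
restored = np.clip(np.rint(back), 0, 255).astype(np.uint8)
```

Keeping the LMS result as floating point until display avoids clipping LMS values that fall outside 0 to 255.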
I'm new to Python, and I'm trying to deconstruct image bands as arrays of numbers by applying the Singular Value Decomposition (SVD) to them and then putting them back together with matplotlib.image and the Image module from PIL. An SVD may also be written as a sum of dyads s1u1v1T + ... + sKuKvKT, and the point in decomposing it in this way is that a near-perfect approximation of the image can be made from just a few of those dyads, so less data is required.
There must be something wrong with the calculation, though because result_r, result_g, and result_b look like this when converted to Images, and new_image looks like this.
For an example of what this should look like, here are the first dyads of the layers of this image. The image that I'm using (April23.jpg) is this.
import matplotlib.image as image
import numpy.linalg as la
import numpy as np
from PIL import Image
def getcolumn(j, m):
    col = []
    for i in range(len(m)):
        col.append(m[i][j])
    return col
def extractCols(U):
    Ucols = []
    for j in range(len(U[0])):
        Ucols.append(getcolumn(j, U))
    return np.asarray(Ucols)
def vectorMultiply(u, v):
    matrix = []
    for i in range(len(u)):
        newVec = []
        for j in range(len(v)):
            newVec.append(u[i] * v[j])
        matrix.append(newVec)
    return np.asarray(matrix)
im = Image.open('C:/Users/<user>/Desktop/img/April23.jpg')
im.load()
sim = Image.Image.split(im)
rsim = sim[0].save("rsim.jpg") # image bands as images
gsim = sim[1].save("gsim.jpg")
bsim = sim[2].save("bsim.jpg")
# image bands as arrays of numbers
arsim = image.imread('C:/Users/<user>/Desktop/img/rsim.jpg')
agsim = image.imread('C:/Users/<user>/Desktop/img/gsim.jpg')
absim = image.imread('C:/Users/<user>/Desktop/img/bsim.jpg')
ur, sr, vhr = la.svd(arsim, False) # SVD on each band
ug, sg, vhg = la.svd(agsim, False)
ub, sb, vhb = la.svd(absim, False)
urcols = extractCols(ur)
ugcols = extractCols(ug)
ubcols = extractCols(ub)
# calculating the first dyads
result_r = np.multiply(sr[0], vectorMultiply(urcols[0], vhr[0]))
result_g = np.multiply(sg[0], vectorMultiply(ugcols[0], vhg[0]))
result_b = np.multiply(sb[0], vectorMultiply(ubcols[0], vhb[0]))
r = Image.fromarray(result_r, "L")
g = Image.fromarray(result_g, "L")
b = Image.fromarray(result_b, "L")
new_image = Image.merge("RGB", (r, g, b))
What am I missing, here? It seems to be something with the calculations. I figured for a matrix one would have to extract the columns, say the column [1, 2, 3] from a matrix [[1,...], [2,...], [3,...]], since each element of the matrix is a row. So, I wrote extractCols() for that. numpy's matrix add and multiply seem to be fine. I wrote vectorMultiply because np.dot(), np.multiply(), and np.matmul() didn't seem to realize that u was a column and kept saying the dimensions didn't match up. I tested it and it seemed to do what I wanted it to. I was also thinking that maybe the "rows" of U are actually the columns already and don't need to be extracted, but that didn't work either. I've also tried not using np.asarray() without any luck.
Any advice is appreciated.
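For reference, the dyad sum described above can be sketched with NumPy alone. This is a minimal sketch, assuming a random array stands in for one image band; the helper names rank_k_approx and to_uint8 are illustrative. Two things it relies on: np.linalg.svd returns U with the singular vectors as columns, so u[:, i] is the i-th column without any manual extraction, and np.outer builds the dyad directly. It also converts the float result to uint8 before it would be handed to Image.fromarray(..., "L"), since passing a float array in that mode is one likely cause of garbled output.

```python
import numpy as np


def rank_k_approx(band, k):
    """Sum of the first k dyads s_i * u_i * v_i^T of a 2D array (hypothetical helper)."""
    u, s, vh = np.linalg.svd(band.astype(np.float64), full_matrices=False)
    approx = np.zeros(band.shape, dtype=np.float64)
    for i in range(k):
        # u[:, i] is the i-th column of U, vh[i] is the i-th row of V^T
        approx += s[i] * np.outer(u[:, i], vh[i])
    return approx


def to_uint8(a):
    """Round and clip a float array into the 0..255 range expected by an 8-bit image."""
    return np.clip(np.rint(a), 0, 255).astype(np.uint8)


# Stand-in for one colour band of an image
rng = np.random.default_rng(1)
band = rng.integers(0, 256, size=(6, 8)).astype(np.uint8)

first_dyad = rank_k_approx(band, 1)             # rank-1 approximation
full = rank_k_approx(band, min(band.shape))     # all dyads reproduce the band
```

With PIL available, to_uint8(first_dyad) is the kind of array that Image.fromarray(..., "L") expects.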
I'm trying to sort an image by luminosity using NumPy, which I'm new to. I've managed to create a random image and sort it.
def create_image(output, width, height, arr):
    array = np.zeros([height, width, 3], dtype=np.uint8)
    numOfSwatches = len(arr)
    swatchWidth = int(width/ numOfSwatches)
    for i in range (0, numOfSwatches):
        m = i * swatchWidth
        r = (i+1) * swatchWidth
        array[:, m:r] = arr[i]
    img = Image.fromarray(array)
    img.save(output)
Which creates this image:
So far so good. Only now I want to switch from creating random images to loading them and then sorting them.
#!/usr/bin/python3
import numpy as np
from PIL import Image
# --------------------------------------------------------------
def load_image( infilename ) :
    img = Image.open( infilename )
    img.load()
    data = np.asarray( img, dtype = "int32" )
    return data
# --------------------------------------------------------------
def lum (r,g,b):
    return math.sqrt( .241 * r + .691 * g + .068 * b )
myImageFile = "random_colours.png"
imageNP = load_image(myImageFile)
imageNP.sort(key=lambda rgb: lum(*rgb) )
The image should look like this:
The error I get is TypeError: 'key' is an invalid keyword argument for this function. I may have created the NumPy array incorrectly, as it worked when it was a random NumPy array.
I have never used PIL, but the following approach hopefully works (I'm not sure, as I can't reproduce your exact examples), and of course there might be more efficient ways to do so.
I'm using your functions, having changed the math.sqrt function to np.sqrt in the lum function, as it is better suited to vector calculations. By the way, I believe this won't work with an int32 type array (as in your load_image function).
The key part is NumPy's argsort function (last line), which gives the indices that would sort the given array. It is applied to a single row of the luminosity array (exploiting symmetry, since every row of the swatch image is identical) and the result is then used to index the columns of img_array.
# Create random image
np.random.seed(4)
img = create_image('test.png', 75, 75, np.random.random((25,3))*255)
# Convert to Numpy array and calculate luminosity
img_array = np.array(img, dtype = np.uint8)
luminosity = lum(img_array[...,0], img_array[...,1], img_array[...,2])
# Sort by luminosity and convert to image again
img_sorted = Image.fromarray(img_array[:,luminosity[0].argsort()])
The original picture:
And the luminosity-sorted one:
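The argsort step can be tried in isolation on a tiny hand-made array. This is a minimal sketch, assuming an image made of vertical swatches so that every row is identical; the function name sort_columns_by_luminosity and the three test colours are illustrative only.

```python
import numpy as np


def lum(r, g, b):
    # Same weights as the lum function above, vectorised with np.sqrt
    return np.sqrt(0.241 * r + 0.691 * g + 0.068 * b)


def sort_columns_by_luminosity(img_array):
    """Reorder the columns of an (H, W, 3) swatch image by luminosity (hypothetical helper)."""
    data = img_array.astype(np.float64)
    luminosity = lum(data[..., 0], data[..., 1], data[..., 2])
    # Every row is identical for vertical swatches, so row 0 is enough
    order = luminosity[0].argsort()
    return img_array[:, order]


# Three one-pixel-wide swatches: white, black, grey
swatches = np.array([[255, 255, 255], [0, 0, 0], [128, 128, 128]], dtype=np.uint8)
img_array = np.tile(swatches, (2, 1, 1))  # shape (2, 3, 3)

img_sorted = sort_columns_by_luminosity(img_array)
```

For an image whose rows differ, a single row no longer represents the whole column, so a different key (for example the mean luminosity per column) would be needed.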
I implemented FFT-based convolution in PyTorch and compared the result with spatial convolution via the conv2d() function. The convolution filter used is an average filter. The conv2d() function produced a smoothed output due to average filtering, as expected, but the FFT-based convolution returned a much blurrier output.
I have attached the code and outputs here -
spatial convolution -
from PIL import Image, ImageOps
import torch
from matplotlib import pyplot as plt
from torchvision.transforms import ToTensor
import torch.nn.functional as F
import numpy as np
im = Image.open("/kaggle/input/tiger.jpg")
im = im.resize((256,256))
gray_im = im.convert('L')
gray_im = ToTensor()(gray_im)
gray_im = gray_im.squeeze()
fil = torch.tensor([[1/9,1/9,1/9],[1/9,1/9,1/9],[1/9,1/9,1/9]])
conv_gray_im = gray_im.unsqueeze(0).unsqueeze(0)
conv_fil = fil.unsqueeze(0).unsqueeze(0)
conv_op = F.conv2d(conv_gray_im,conv_fil)
conv_op = conv_op.squeeze()
plt.figure()
plt.imshow(conv_op, cmap='gray')
FFT-based convolution -
def fftshift(image):
    sh = image.shape
    x = np.arange(0, sh[2], 1)
    y = np.arange(0, sh[3], 1)
    xm, ym = np.meshgrid(x,y)
    shifter = (-1)**(xm + ym)
    shifter = torch.from_numpy(shifter)
    return image*shifter
shift_im = fftshift(conv_gray_im)
padded_fil = F.pad(conv_fil, (0, gray_im.shape[0]-fil.shape[0], 0, gray_im.shape[1]-fil.shape[1]))
shift_fil = fftshift(padded_fil)
fft_shift_im = torch.rfft(shift_im, 2, onesided=False)
fft_shift_fil = torch.rfft(shift_fil, 2, onesided=False)
shift_prod = fft_shift_im*fft_shift_fil
shift_fft_conv = fftshift(torch.irfft(shift_prod, 2, onesided=False))
fft_op = shift_fft_conv.squeeze()
plt.figure('shifted fft')
plt.imshow(fft_op, cmap='gray')
original image -
spatial convolution output -
fft-based convolution output -
Could someone kindly explain the issue?
The main problem with your code is that Torch doesn't do complex numbers. The output of its FFT is a 3D array whose third dimension holds two values, one for the real component and one for the imaginary. Consequently, the element-wise multiplication is not a complex multiplication.
There is currently no complex multiplication defined in Torch (see this issue), so we'll have to define our own.
A minor issue, but also important if you want to compare the two convolution operations, is the following:
The FFT takes the origin of its input in the first element (top-left pixel for an image). To avoid a shifted output, you need to generate a padded kernel where the origin of the kernel is the top-left pixel. This is quite tricky, actually...
Your current code:
fil = torch.tensor([[1/9,1/9,1/9],[1/9,1/9,1/9],[1/9,1/9,1/9]])
conv_fil = fil.unsqueeze(0).unsqueeze(0)
padded_fil = F.pad(conv_fil, (0, gray_im.shape[0]-fil.shape[0], 0, gray_im.shape[1]-fil.shape[1]))
generates a padded kernel where the origin is in pixel (1,1), rather than (0,0). It needs to be shifted by one pixel in each direction. NumPy has a function roll that is useful for this; I don't know the Torch equivalent (I'm not at all familiar with Torch). This should work:
fil = torch.tensor([[1/9,1/9,1/9],[1/9,1/9,1/9],[1/9,1/9,1/9]])
# Work on the 2D kernel so the pad widths and roll axes match its two dimensions
padded_fil = fil.numpy()
padded_fil = np.pad(padded_fil, ((0, gray_im.shape[0]-fil.shape[0]), (0, gray_im.shape[1]-fil.shape[1])), mode='constant')
# Move the kernel centre from (1,1) to the origin (0,0)
padded_fil = np.roll(padded_fil, -1, axis=(0, 1))
padded_fil = torch.from_numpy(padded_fil)
Finally, your fftshift function, applied to the spatial-domain image, causes the frequency-domain image (the result of the FFT applied to the image) to be shifted such that the origin is in the middle of the image, rather than the top-left. This shift is useful when looking at the output of the FFT, but is pointless when computing the convolution.
Putting these things together, the convolution is now:
def complex_multiplication(t1, t2):
    real1, imag1 = t1[:,:,0], t1[:,:,1]
    real2, imag2 = t2[:,:,0], t2[:,:,1]
    return torch.stack([real1 * real2 - imag1 * imag2, real1 * imag2 + imag1 * real2], dim = -1)
fft_im = torch.rfft(gray_im, 2, onesided=False)
fft_fil = torch.rfft(padded_fil, 2, onesided=False)
fft_conv = torch.irfft(complex_multiplication(fft_im, fft_fil), 2, onesided=False)
Note that you can do one-sided FFTs to save a bit of computation time:
fft_im = torch.rfft(gray_im, 2, onesided=True)
fft_fil = torch.rfft(padded_fil, 2, onesided=True)
fft_conv = torch.irfft(complex_multiplication(fft_im, fft_fil), 2, onesided=True, signal_sizes=gray_im.shape)
Here the frequency domain is about half the size as in the full FFT, but it is only redundant parts that are left out. The result of the convolution is unchanged.
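The same idea can be verified in plain NumPy, which has a native complex dtype, so the product of the two FFTs is already a complex multiplication. This is a minimal sketch that swaps NumPy in for Torch, assuming a random array stands in for the grayscale image; the function names fft_convolve_same and direct_circular_mean are illustrative. It pads the kernel to the image size, rolls its centre to (0,0), and compares the result against a direct circular average.

```python
import numpy as np


def fft_convolve_same(image, kernel):
    """Circular convolution via the FFT with the kernel origin moved to (0, 0)."""
    h, w = image.shape
    kh, kw = kernel.shape
    padded = np.zeros((h, w), dtype=np.float64)
    padded[:kh, :kw] = kernel
    # Shift the kernel centre from (kh//2, kw//2) to the top-left pixel
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    product = np.fft.fft2(image) * np.fft.fft2(padded)  # complex multiplication
    return np.real(np.fft.ifft2(product))


def direct_circular_mean(image, size=3):
    """Reference: average over a size x size neighbourhood with wrap-around borders."""
    r = size // 2
    out = np.zeros(image.shape, dtype=np.float64)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            out += np.roll(image, (dy, dx), axis=(0, 1))
    return out / size ** 2


rng = np.random.default_rng(2)
image = rng.random((8, 10))
kernel = np.full((3, 3), 1 / 9)

result = fft_convolve_same(image, kernel)
reference = direct_circular_mean(image)
```

Because the FFT convolution wraps around at the borders, only the interior result[1:-1, 1:-1] should be compared with an unpadded spatial convolution such as conv2d.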
I'm trying to calibrate a fisheye camera using the OpenCV 3.0.0 Python bindings (with an asymmetric circle grid), but I'm having trouble formatting the object and image point arrays correctly. My current source looks like this:
import cv2
import glob
import numpy as np
def main():
    circle_diameter = 4.5
    circle_radius = circle_diameter/2.0
    pattern_width = 4
    pattern_height = 11
    num_points = pattern_width*pattern_height
    images = glob.glob('*.bmp')
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
    imgpoints = []
    objpoints = []
    obj = []
    for i in range(pattern_height):
        for j in range(pattern_width):
            obj.append((
                float(2*j + i % 2)*circle_radius,
                float(i*circle_radius),
                0
            ))
    for name in images:
        image = cv2.imread(name)
        grayimage = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        retval, centers = cv2.findCirclesGrid(grayimage, (pattern_width, pattern_height), flags=(cv2.CALIB_CB_ASYMMETRIC_GRID + cv2.CALIB_CB_CLUSTERING))
        imgpoints_tmp = np.zeros((num_points, 2))
        if retval:
            for i in range(num_points):
                imgpoints_tmp[i, 0] = centers[i, 0, 0]
                imgpoints_tmp[i, 1] = centers[i, 0, 1]
            imgpoints.append(imgpoints_tmp)
            objpoints.append(obj)
    # Conversion to numpy array
    imgpoints = np.array(imgpoints, dtype=np.float32)
    objpoints = np.array(objpoints, dtype=np.float32)
    K, D = cv2.fisheye.calibrate(objpoints, imgpoints, image_size=(1280, 800), K=None, D=None)
if __name__ == '__main__':
    main()
The error message is:
OpenCV Error: Assertion failed (objectPoints.type() == CV_32FC3 || objectPoints.type() == CV_64FC3) in cv::fisheye::calibrate
objpoints has shape (31,44,3).
So objpoints array needs to be formatted in a different way, but I'm not able to achieve the correct layout. Maybe someone can help here?
In the sample of OpenCV (Camera Calibration) they set the objp to objp2 = np.zeros((8*9,3), np.float32)
However, in omnidirectional camera or fisheye camera, it should be:
objp = np.zeros((1,8*9,3), np.float32)
Idea is from here Calibrate fisheye lens using OpenCV — part 1
The correct layout of objpoints is a list of numpy arrays with len(objpoints) = "number of pictures" and each entry being a numpy array.
Please have a look at the official help. OpenCV documentation talks about "vectors", which is equivalent of a list or numpy.array. In this instance a "vector of vectors" can be interpreted as a list of numpy.arrays.
The data type is correct, but the shape is not. The expected shape of objpoints is (n_observations, 1, n_corners_per_observation, 3). Therefore, the code in your case should be:
objpoints = np.array(objpoints, dtype=np.float32).reshape(
    -1,
    1,
    pattern_width * pattern_height,
    3
)
or, more generally:
objpoints = np.array(objpoints, dtype=np.float32).reshape(
    n_observations,
    1,
    n_corners_per_observation,
    3
)
The error message is slightly misleading.
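The layout described above can be built and inspected with NumPy before anything is passed to OpenCV. This is a minimal sketch, assuming the 4 x 11 asymmetric grid and 4.5 circle diameter from the question and 31 detected images; the helper name asymmetric_grid_points is illustrative, and whether cv2.fisheye.calibrate accepts this exact layout is taken from the answers here rather than verified in this snippet.

```python
import numpy as np


def asymmetric_grid_points(pattern_width, pattern_height, circle_radius):
    """Object points of an asymmetric circle grid, shape (N, 3), z = 0 (hypothetical helper)."""
    obj = [((2 * j + i % 2) * circle_radius, i * circle_radius, 0.0)
           for i in range(pattern_height)
           for j in range(pattern_width)]
    return np.array(obj, dtype=np.float32)


grid = asymmetric_grid_points(4, 11, 4.5 / 2.0)   # shape (44, 3)

# One copy of the grid per image in which the pattern was found
n_observations = 31
objpoints = np.tile(grid, (n_observations, 1, 1)).reshape(n_observations, 1, -1, 3)

# The matching image points would use the same layout with 2 coordinates,
# i.e. shape (n_observations, 1, 44, 2); zeros are placeholders here
imgpoints = np.zeros((n_observations, 1, grid.shape[0], 2), dtype=np.float32)
```

Printing objpoints.shape and objpoints.dtype before calibration is a quick way to catch the CV_32FC3 / CV_64FC3 assertion early.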
Didn't find a satisfying answer here so I messed around and eventually got this chunk to work:
calibration_flags = cv2.fisheye.CALIB_RECOMPUTE_EXTRINSIC + cv2.fisheye.CALIB_CHECK_COND + cv2.fisheye.CALIB_FIX_SKEW
# lists with each element a [1,n_points,_] array of type float32
obj_points = [np.random.rand(1,10,3).astype(np.float32)]
fisheye_points = [np.random.rand(1,10,2).astype(np.float32)]
# initialize empty variables of correct size and type, where total_num_points is summed across all arrays in each above list
rvecs = [np.zeros((1, 1, 3), dtype=np.float32) for i in range(total_num_points)]
tvecs = [np.zeros((1, 1, 3), dtype=np.float32) for i in range(total_num_points)]
D = np.zeros([4,1]).astype(np.float32)
K = np.zeros([3,3]).astype(np.float32)
outputs = cv2.fisheye.calibrate(obj_points,fisheye_points,(1920,1080),K,D,rvecs,tvecs)
I am trying to define a window that scans across an image, I want to find the average RGB values in each window and output them.
I have managed to get the average RGB values for the entire image like this:
img = cv2.imread('images/0021.jpg')
mean = cv2.mean(img)
print mean[0]
print mean[1]
print mean[2]
Gives:
#Output
51.0028081597
63.1069849537
123.663025174
How could I apply this mean function to a moving window and output the values for each window?
EDIT:
Here is what I have now:
img = cv2.imread('images/0021.jpg')
def new(img):
    rows,cols = img.shape
    final = np.zeros((rows, cols, 3, 3))
    for x in (0,1,2):
        for y in (0,1,2):
            img1 = np.vstack((img[x:],img[:x]))
            img1 = np.column_stack((img1[:,y:],img1[:,:y]))
            final[x::3,y::3] = np.swapaxes(img1.reshape(rows/3,3,cols/3,-1),1,2)
    b,g,r = cv2.split(final)
    rgb_img = cv2.merge([r,g,b])
    mean = cv2.mean(rgb_img)
    print mean[0]
    print mean[1]
    print mean[2]
But now I am getting zero output.
I wrote a script similar to the given links. It basically divides your img to 3*3 parts and then computes mean (and standard deviation) of each part. With a little array optimization I think you can use it real time/on video.
PS: Divisions should be integer division
EDIT: now the script gives 9 outputs each represent a mean of its own region.
import numpy as np
import cv2
img=cv2.imread('aerial_me.jpg')
scale=3
y_len,x_len,_=img.shape
mean_values=[]
for y in range(scale):
    for x in range(scale):
        # Integer division keeps the slice bounds usable as indices
        cropped_image=img[(y*y_len)//scale:((y+1)*y_len)//scale,
                          (x*x_len)//scale:((x+1)*x_len)//scale]
        mean_val,std_dev=cv2.meanStdDev(cropped_image)
        mean_val=mean_val[:3]
        mean_values.append([mean_val])
mean_values=np.asarray(mean_values)
print(mean_values.reshape(3,3,3))
The output is bgr mean values of each window:
[[[ 69.63661573 66.75843063 65.02066449]
[ 118.39233345 114.72655391 116.14441964]
[ 159.26887164 143.40760348 144.63208436]]
[[ 75.50831044 107.45708276 103.0781851 ]
[ 108.46450034 141.52005495 139.84878949]
[ 122.67583265 154.86071992 153.67907072]]
[[ 83.67678571 131.45284169 128.27706902]
[ 86.57919815 129.09968235 128.64439389]
[ 90.1102402 135.33173999 132.86622807]]]
[Finished in 0.5s]
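The same per-region means can be computed without the explicit loops by reshaping the image into blocks. This is a minimal sketch in plain NumPy, assuming the image is cropped so that its height and width are exact multiples of scale (the loop above instead lets the regions differ by a pixel); the function name block_means and the test array are illustrative.

```python
import numpy as np


def block_means(img, scale):
    """Mean of each of the scale x scale regions of an (H, W, C) image (hypothetical helper)."""
    h, w, c = img.shape
    bh, bw = h // scale, w // scale
    # Drop any leftover rows/columns so the image splits evenly
    trimmed = img[:bh * scale, :bw * scale].astype(np.float64)
    blocks = trimmed.reshape(scale, bh, scale, bw, c)
    # Average over the pixel axes inside each block
    return blocks.mean(axis=(1, 3))


# Stand-in for an image loaded with cv2.imread
img = np.arange(6 * 6 * 3, dtype=np.float64).reshape(6, 6, 3)
means = block_means(img, 3)   # shape (3, 3, 3): region row, region column, channel
```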
Filter with a kernel of shape equal to your window, and values all equal to 1/window_area. The result is the local average you seek (also known as a "box blur" operation).
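In OpenCV this filter is available as cv2.blur. To see what it computes, here is a minimal sketch of the same local average in plain NumPy, assuming NumPy 1.20 or later for sliding_window_view and keeping only windows that lie fully inside the image; the function name sliding_window_means and the random test image are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sliding_window_means(img, k):
    """Per-channel mean of every k x k window of an (H, W, C) image (hypothetical helper)."""
    windows = sliding_window_view(img.astype(np.float64), (k, k), axis=(0, 1))
    # windows has shape (H-k+1, W-k+1, C, k, k); average over the window axes
    return windows.mean(axis=(-2, -1))


rng = np.random.default_rng(3)
img = rng.integers(0, 256, size=(5, 6, 3), dtype=np.uint8)

local_means = sliding_window_means(img, 3)   # shape (3, 4, 3)
```

Each entry local_means[i, j] holds the channel means of the window whose top-left corner is pixel (i, j).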