I'm trying to use a Laplace Filter via TensorFlow tf.nn.conv2d on my image. But the output is super weird and I don't have a clue what I did wrong.
I load my picture via:
import numpy as np
import tensorflow as tf

file = tf.io.read_file("corgi.jpg")
uint_image = tf.io.decode_jpeg(file, 1)
image = tf.cast(uint_image,tf.float32)
kernel = tf.constant(np.array([[1,  1, 1],
                               [1, -8, 1],
                               [1,  1, 1]]), dtype=tf.float32)
convoluted_image = convoluteTest(image, kernel)
rs_convoluted_image = tf.reshape(convoluted_image,
                                 [tf.shape(image)[0] - tf.shape(kernel)[0] + 1,
                                  tf.shape(image)[1] - tf.shape(kernel)[0] + 1, 1])
casted_image = tf.cast(rs_convoluted_image, tf.uint8)
encoded = tf.io.encode_jpeg(casted_image)
tf.io.write_file("corgi-tensor-laplace.jpg", encoded)
But the image tensor can't be passed to tf.nn.conv2d directly, since conv2d expects a 4D tensor.
This function here reshapes and applies my laplace filter:
def convoluteTest(image_tensor, kernel_tensor):
    shape = tf.shape(image_tensor)
    reshaped_image_tensor = tf.reshape(image_tensor, [1, shape[0].numpy(), shape[1].numpy(), 1])
    reshaped_kernel_tensor = tf.reshape(kernel_tensor,
                                        [tf.shape(kernel_tensor)[0].numpy(), tf.shape(kernel_tensor)[0].numpy(), 1, 1])
    convoluted = tf.nn.conv2d(reshaped_image_tensor, reshaped_kernel_tensor, strides=[1, 1, 1, 1], padding='VALID')
    return convoluted
Original Picture:
Failed laplace:
Update:
Greyish output:
What did I do wrong? I can't wrap my head around this...
I believe the problem is that casted_image = tf.cast(rs_convoluted_image, tf.uint8) mishandles values outside of [0, 255]: the Laplacian response contains negative values and values above 255, and the uint8 cast mangles those instead of mapping them sensibly into the displayable range.
I think you are missing a normalization step back to the [0, 255] range before casting to uint8.
Try
normalized_convolved = (rs_convoluted_image - tf.reduce_min(rs_convoluted_image)) / (tf.reduce_max(rs_convoluted_image) - tf.reduce_min(rs_convoluted_image))
normalized_convolved = normalized_convolved * 255
casted_image = tf.cast(normalized_convolved, tf.uint8)
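Alternatively (a minimal sketch, assuming you only care about the positive part of the response and want to keep the usual Laplacian look rather than stretch the histogram), you could clip instead of normalizing:

# continues from the snippet above; clamp into the displayable range before the uint8 cast
clipped = tf.clip_by_value(rs_convoluted_image, 0.0, 255.0)
casted_image = tf.cast(clipped, tf.uint8)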
Related
In PyTorch I have an RGB tensor imgA with batch size 256. I want to retain the green channel for the first 128 batch items and the red channel for the remaining 128, something like below:
imgA[:128,2,:,:] = imgA[:128,1,:,:]
imgA[128:,2,:,:] = imgA[128:,0,:,:]
imgA = imgA[:,2,:,:].unsqueeze(1)
or the same can be achieved like this:
imgA = torch.cat((imgA[:128,1,:,:].unsqueeze(1),imgA[128:,0,:,:].unsqueeze(1)),dim=0)
but as I have multiple such images like imgA, imgB, imgC, etc., what is the fastest way of achieving the above goal?
A slicing-based solution can be achieved using torch.gather and repeat_interleave:
select = torch.tensor([1, 0], device=imgA.device)
imgA = imgA.gather(dim=1, index=select.repeat_interleave(128, dim=0).view(256, 1, 1, 1).expand(-1, -1, *imgA.shape[-2:]))
You can also do that using matrix multiplication and repeat_interleave:
# select c=1 for first half and c=0 for second
select = torch.tensor([[0, 1],[1, 0],[0, 0]], dtype=imgA.dtype, device=imgA.device)
imgA = torch.einsum('cb,bchw->bhw',select.repeat_interleave(128, dim=1), imgA).unsqueeze(dim=1)
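As a quick sanity check (a minimal sketch on random data, assuming the 256/128 split from the question), the gather variant can be compared against the original slicing recipe:

import torch

imgA = torch.randn(256, 3, 8, 8)
expected = torch.cat((imgA[:128, 1, :, :].unsqueeze(1), imgA[128:, 0, :, :].unsqueeze(1)), dim=0)

select = torch.tensor([1, 0], device=imgA.device)
idx = select.repeat_interleave(128, dim=0).view(256, 1, 1, 1).expand(-1, -1, *imgA.shape[-2:])
gathered = imgA.gather(dim=1, index=idx)

assert torch.equal(gathered, expected)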
I tried to apply INT8 quantization before an FP32 matrix multiplication, and then requantize the accumulated INT32 output back to INT8. I suspect there are a couple of mix-ups somewhere in the process, and I'm stuck spotting them.
data flow [Affine Quantization]:
input(fp32) -> quant(int8) ____\ matmul(int32) -> requant(int8) ->deq(fp32)
input(fp32) -> quant(int8) ----/
My Pseudo Code
INPUT(FP32) :
Embedded Words in Tensor (shape : [1, 4, 1024, 256]) A and B (B is the same as A)
input A (= B): [image omitted]
EXPECTING OUTPUT(FP32) :
Embedded Words in Tensor (shape : [1, 4, 1024, 1024]) AB(after matrix multiplication to itself)
do while(true):
    # convert A and B from FP32 into INT8
    A_zero_offset = torch.empty(A.shape)
    A_zero_offset = torch.zeros_like(A_zero_offset)  # offset to be zero  [Question 1]
    scale = 255 / (torch.max(A) - torch.min(B))      # 2^8 - 1 = 255
    A_quantized = np.round((A - A_zero_offset) * scale)
    # likewise
    B_quantized = A_quantized

    AB = A_quantized.matmul(B_quantized.transpose(-1, -2))
    # now the accumulated datatype is INT32

    AB_offset = torch.empty(AB.shape)
    AB_offset = AB_offset.new_full(AB.shape, torch.min(AB))  # offset to be AB's min element  [Question 1]
    scale_AB = 255 / (torch.max(AB) - torch.min(AB))         # [Question 2]
    AB_requantized = np.round((AB - AB_offset) * scale_AB)

    # dequantize AB (INT8 at the status quo) into FP32
    # [Question 3]
[Question 1]: Does it make sense to set A's offset to zero and AB's offset to min(AB)?
[Question 2]: How should the scale be calculated: with "max(AB) - min(AB)", or by some other method?
[Question 3]: In the end, which operations (in particular, which scale and offset) do I have to apply to dequantize the result back into FP32?
I believe this approach is totally wrong, because every embedded word tensor has different max and min values, so this bug breaks the continuity of your data. I assume you are aware that you lose information anyway, because you can't squeeze (map) fp32 into int8 while keeping the same tensor shapes.
import torch
import numpy as np
# create Pseudo tensor
a = torch.tensor([[0.654654, 1.654687, -0.5645365],
                  [5.687646, -5.662354, 0.6546646]], dtype=torch.float32)
print(a.dtype)
print(a)
# torch.float32
# tensor([[ 0.6547, 1.6547, -0.5645],
# [ 5.6876, -5.6624, 0.6547]])
b = a.clone().int()
print(b)
# tensor([[ 0, 1, 0],
# [ 5, -5, 0]], dtype=torch.int32)
# converting to int8; note the range here is -128 to 127
c = a.clone().to(torch.int8)
print(c)
# tensor([[ 0, 1, 0],
# [ 5, -5, 0]], dtype=torch.int8)
# converting to uint8; note the range here is 0 to 255
d = a.clone().byte()
print(d)
# tensor([[ 0, 1, 0],
# [ 5, 251, 0]], dtype=torch.uint8)
Your approach (wrong):
A, B = a
A_zero_offset = torch.empty(A.shape)
A_zero_offset = torch.zeros_like(A_zero_offset) # offset to be zero **[Question1]**
scale = 255 / (torch.max(A) - torch.min(B)) # 2^8 - 1 = 255
A_quantized = np.round((A - A_zero_offset) * scale)
print(A_quantized.dtype)
print(A_quantized)
# torch.float32
# tensor([ 23., 58., -20.])
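Regarding [Question 2] and [Question 3], here is a minimal sketch of the textbook per-tensor affine scheme (an assumption about what you are aiming for, not a reference implementation for any particular hardware): the scale is the real-valued size of one integer step, the zero point is the quantized value of real 0.0, and dequantization is simply the inverse affine map. With zero offsets (symmetric quantization), the INT32 accumulator of a matmul dequantizes with scale_A * scale_B.

import torch

def quantize(x, qmin=0, qmax=255):
    # per-tensor affine quantization: x is approximately (q - zero_point) * scale
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - (x.min() / scale).item()))
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q - zero_point) * scale

x = torch.randn(4, 4)
q, scale, zp = quantize(x)
x_hat = dequantize(q, scale, zp)
print((x - x_hat).abs().max())  # quantization error, roughly bounded by scale / 2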
This is the MATLAB code I want to replicate in OpenCV:
[~, threshold] = edge(I, 'sobel');
fudgeFactor = .5;
BWs = edge(I,'sobel', threshold * fudgeFactor);
figure, imshow(BWs), title('binary gradient mask');
This is my test image:
Cell image
I have tried things like
blurred_gray = cv2.GaussianBlur(gray_image,(3,3),0)
sobelx = cv2.Sobel(blurred_gray,cv2.CV_8U,1,0,ksize=3)
sobely = cv2.Sobel(blurred_gray,cv2.CV_8U,0,1,ksize=3)
And the output I got is:
sobelx
sobely
I tried adding sobelx and sobely because I read that they're partial derivatives, but the resulting image looks the same as the ones above, and varying the ksize didn't help.
This is the output I need:
edge image
Could someone please tell me what I'm doing wrong and what I should do to get the same result image?
The MATLAB implementation of the Sobel edge detection isn't visible, so we can only guess exactly what is happening. The only hint we get is from the documentation of edge, which states that when the 'sobel' option is used it
Finds edges at those points where the gradient of the image I is
maximum, using the Sobel approximation to the derivative.
It's not stated, but taking the maximum of the gradient is more complicated than simply taking the local maximums in the image. Instead we want to find local maximums with respect to the gradient direction. Unfortunately the actual code used by MATLAB for this operation is hidden.
Looking at the code that is available in edge, it appears that they use 4*mean(magnitude) for the threshold in the thinning operation, so I'm using this combined with your fudge factor. The orientated_non_max_suppression function is far from optimal, but I wrote it for readability over performance.
import cv2
import numpy as np
import scipy.ndimage.filters
gray_image = cv2.imread('cell.png', cv2.IMREAD_GRAYSCALE).astype(dtype=np.float32)
def orientated_non_max_suppression(mag, ang):
    ang_quant = np.round(ang / (np.pi / 4)) % 4
    winE = np.array([[0, 0, 0],
                     [1, 1, 1],
                     [0, 0, 0]])
    winSE = np.array([[1, 0, 0],
                      [0, 1, 0],
                      [0, 0, 1]])
    winS = np.array([[0, 1, 0],
                     [0, 1, 0],
                     [0, 1, 0]])
    winSW = np.array([[0, 0, 1],
                      [0, 1, 0],
                      [1, 0, 0]])
    magE = non_max_suppression(mag, winE)
    magSE = non_max_suppression(mag, winSE)
    magS = non_max_suppression(mag, winS)
    magSW = non_max_suppression(mag, winSW)
    mag[ang_quant == 0] = magE[ang_quant == 0]
    mag[ang_quant == 1] = magSE[ang_quant == 1]
    mag[ang_quant == 2] = magS[ang_quant == 2]
    mag[ang_quant == 3] = magSW[ang_quant == 3]
    return mag

def non_max_suppression(data, win):
    data_max = scipy.ndimage.filters.maximum_filter(data, footprint=win, mode='constant')
    data_max[data != data_max] = 0
    return data_max
# compute sobel response
sobelx = cv2.Sobel(gray_image, cv2.CV_32F, 1, 0, ksize=3)
sobely = cv2.Sobel(gray_image, cv2.CV_32F, 0, 1, ksize=3)
mag = np.hypot(sobelx, sobely)
ang = np.arctan2(sobely, sobelx)
# threshold
fudgefactor = 0.5
threshold = 4 * fudgefactor * np.mean(mag)
mag[mag < threshold] = 0
# non-maximal suppression
mag = orientated_non_max_suppression(mag, ang)
# alternative but doesn't consider gradient direction
# mag = skimage.morphology.thin(mag.astype(np.bool)).astype(np.float32)
# create mask
mag[mag > 0] = 255
mag = mag.astype(np.uint8)
Results on the Cell
Python
MATLAB
Results on MATLAB's peppers.png (built-in)
Python
MATLAB
The MATLAB implementation must use something a little different but it looks like this gets pretty close.
My goal is to transform an image in such a way that three source points are mapped to three target points in an empty array. I have already solved finding the correct affine matrix; however, I cannot apply an affine transformation to a color image.
More specifically, I am struggling with the correct use of the scipy.ndimage.interpolation.affine_transform method. As this question and its answers point out, the affine_transform method can be somewhat unintuitive (especially regarding offset calculation); however, user timday shows how to apply a rotation and a shearing on an image and position it in another array, while user geodata gives more background information.
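(For reference, a minimal sketch of scipy's convention, which is part of what makes the offset feel backwards: the matrix passed to affine_transform maps output coordinates to input coordinates, i.e. output[o] = input[matrix @ o + offset].)

import numpy as np
from scipy import ndimage

# 2x "zoom out": output pixel o samples input pixel 2*o
img = np.arange(16.0).reshape(4, 4)
out = ndimage.affine_transform(img, np.eye(2) * 2, offset=0, output_shape=(2, 2), order=1)
print(out)  # [[0., 2.], [8., 10.]]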
My problem is to generalize the approach shown there (1) to color images and (2) to an arbitrary transformation which I calculated myself.
This is my code (which should run as is on your computer):
import numpy as np
from scipy import ndimage
import matplotlib.pyplot as plt
def calcAffineMatrix(sourcePoints, targetPoints):
    # For three source- and three target points, find the affine transformation
    # Function works correctly, not part of the question
    A = []
    b = []
    for sp, trg in zip(sourcePoints, targetPoints):
        A.append([sp[0], 0, sp[1], 0, 1, 0])
        A.append([0, sp[0], 0, sp[1], 0, 1])
        b.append(trg[0])
        b.append(trg[1])
    result, resids, rank, s = np.linalg.lstsq(np.array(A), np.array(b))
    a0, a1, a2, a3, a4, a5 = result

    # Ignoring offset here, later use timday's suggested offset calculation
    affineTrafo = np.array([[a0, a1, 0], [a2, a3, 0], [0, 0, 1]], 'd')

    # Testing the correctness of transformation matrix
    for i, _ in enumerate(sourcePoints):
        src = sourcePoints[i]
        src.append(1.)
        trg = targetPoints[i]
        trg.append(1.)
        at = affineTrafo.copy()
        at[2, 0:2] = [a4, a5]
        assert(np.array_equal(np.round(np.array(src).dot(at)), np.array(trg)))
    return affineTrafo
# Prepare source image
sourcePoints = [[162., 112.], [130., 112.], [162., 240.]]
targetPoints = [[180., 102.], [101., 101.], [190., 200.]]
image = np.empty((300, 300, 3), dtype='uint8')
image[:] = 255
# Mark border for better visibility
image[0:2, :] = 0
image[-3:-1, :] = 0
image[:, 0:2] = 0
image[:, -3:-1] = 0
# Mark source points in red
for sp in sourcePoints:
    sp = [int(u) for u in sp]
    image[sp[1] - 5:sp[1] + 5, sp[0] - 5:sp[0] + 5, :] = np.array([255, 0, 0])
# Show image
plt.subplot(3, 1, 1)
plt.imshow(image)
# Prepare array in which the image is placed
array = np.empty((400, 300, 3), dtype='uint8')
array[:] = 255
a2 = array.copy()
# Mark target points in blue
for tp in targetPoints:
    tp = [int(u) for u in tp]
    a2[tp[1] - 2:tp[1] + 2, tp[0] - 2:tp[0] + 2] = [0, 0, 255]
# Show array
plt.subplot(3, 1, 2)
plt.imshow(a2)
# Next 5 program lines are actually relevant for question:
# Calculate affine matrix
affineTrafo = calcAffineMatrix(sourcePoints, targetPoints)
# This follows the c_in-c_out method proposed in linked stackoverflow issue
# extended for color channel (no translation here)
c_in = np.array([sourcePoints[0][0], sourcePoints[0][1], 0])
c_out = np.array([targetPoints[0][0], targetPoints[0][1], 0])
offset = (c_in - np.dot(c_out, affineTrafo))
# Affine transform!
ndimage.interpolation.affine_transform(image, affineTrafo, order=2, offset=offset,
                                       output=array, output_shape=array.shape,
                                       cval=255)
# Mark blue target points in array, expected to be above red source points
for tp in targetPoints:
    tp = [int(u) for u in tp]
    array[tp[1] - 2:tp[1] + 2, tp[0] - 2:tp[0] + 2] = [0, 0, 255]
plt.subplot(3, 1, 3)
plt.imshow(array)
plt.show()
Other approaches I tried include working with the inverse, transpose or both of affineTrafo:
affineTrafo = np.linalg.inv(affineTrafo)
affineTrafo = affineTrafo.T
affineTrafo = np.linalg.inv(affineTrafo.T)
affineTrafo = np.linalg.inv(affineTrafo).T
In his answer, geodata shows how to calculate the matrix that affine_transform needs to do a scaling and a rotation:
If one wants a scaling S first and then a rotation R it holds that T=R*S and therefore T.inv=S.inv*R.inv (note the reversed order).
Which I tried to copy using matrix decomposition (decomposing my affine transformation into a rotation, a scaling and another rotation):
u, s, v = np.linalg.svd(affineTrafo[:2,:2])
uInv = np.linalg.inv(u)
sInv = np.linalg.inv(np.diag((s)))
vInv = np.linalg.inv(v)
affineTrafo[:2, :2] = uInv.dot(sInv).dot(vInv)
Again, without success.
For all of my results, it's not (only) an offset problem. It is clearly visible from the pictures that the relative positions of source and target points do not correspond.
I searched the web and stackoverflow and did not find an answer for my problem. Please help me! :)
I finally got it working thanks to AlexanderReynolds' hint to use another library. This is of course a workaround; I could not get it working with scipy's affine_transform, so I used OpenCV's cv2.warpAffine instead. In case this is helpful to anyone else, this is my code:
import numpy as np
import matplotlib.pyplot as plt
import cv2
# Prepare source image
sourcePoints = [[162., 112.], [130., 112.], [162., 240.]]
targetPoints = [[180., 102.], [101., 101.], [190., 200.]]
image = np.empty((300, 300, 3), dtype='uint8')
image[:] = 255
# Mark border for better visibility
image[0:2, :] = 0
image[-3:-1, :] = 0
image[:, 0:2] = 0
image[:, -3:-1] = 0
# Mark source points in red
for sp in sourcePoints:
    sp = [int(u) for u in sp]
    image[sp[1] - 5:sp[1] + 5, sp[0] - 5:sp[0] + 5, :] = np.array([255, 0, 0])
# Show image
plt.subplot(3, 1, 1)
plt.imshow(image)
# Prepare array in which the image is placed
array = np.empty((400, 300, 3), dtype='uint8')
array[:] = 255
a2 = array.copy()
# Mark target points in blue
for tp in targetPoints:
    tp = [int(u) for u in tp]
    a2[tp[1] - 2:tp[1] + 2, tp[0] - 2:tp[0] + 2] = [0, 0, 255]
# Show array
plt.subplot(3, 1, 2)
plt.imshow(a2)
# Calculate affine matrix and transform image
M = cv2.getAffineTransform(np.float32(sourcePoints), np.float32(targetPoints))
array = cv2.warpAffine(image, M, (array.shape[1], array.shape[0]), borderValue=[255, 255, 255])  # dsize is (width, height)
# Mark blue target points in array, expected to be above red source points
for tp in targetPoints:
    tp = [int(u) for u in tp]
    array[tp[1] - 2:tp[1] + 2, tp[0] - 2:tp[0] + 2] = [0, 0, 255]
plt.subplot(3, 1, 3)
plt.imshow(array)
plt.show()
Comments:
Interesting how it worked almost immediately after changing the library. After having spent more than a day trying to get it to work with scipy, this is a lesson for myself to change libraries faster.
In case someone wants to find a (least squares) approximation of an affine transformation based on more than three points, this is how you get the matrix that works with cv2.warpAffine:
Code:
def calcAffineMatrix(sourcePoints, targetPoints):
    # For three or more source and target points, find the affine transformation
    A = []
    b = []
    for sp, trg in zip(sourcePoints, targetPoints):
        A.append([sp[0], 0, sp[1], 0, 1, 0])
        A.append([0, sp[0], 0, sp[1], 0, 1])
        b.append(trg[0])
        b.append(trg[1])
    result, resids, rank, s = np.linalg.lstsq(np.array(A), np.array(b))
    a0, a1, a2, a3, a4, a5 = result
    affineTrafo = np.float32([[a0, a2, a4], [a1, a3, a5]])
    return affineTrafo
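For example, a hypothetical call with four point pairs (the fourth pair below is made up for illustration), reusing the image and the 400x300 canvas from the script above:

sourcePoints = [[162., 112.], [130., 112.], [162., 240.], [40., 40.]]
targetPoints = [[180., 102.], [101., 101.], [190., 200.], [62., 35.]]
M = calcAffineMatrix(sourcePoints, targetPoints)  # 2x3 float32 matrix, least-squares fit
warped = cv2.warpAffine(image, M, (300, 400), borderValue=[255, 255, 255])  # dsize is (width, height)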
I have the following blurring kernel I need to apply to every pixel in an RGB image
[ 0.0625 0.25 0.375 0.25 0.0625 ]
So, the pseudo-code looks something like this in NumPy:
for i in range(rows):
    for j in range(cols):
        for k in range(3):
            final[i][j][k] = image[i-2][j][k]*0.0625 + \
                             image[i-1][j][k]*0.25 + \
                             image[i][j][k]*0.375 + \
                             image[i+1][j][k]*0.25 + \
                             image[i+2][j][k]*0.0625
I've tried searching for a similar question, but never found this sort of data access in the computation.
How do I perform the above function for a Theano tensor matrix?
You can use the conv2d function for this task. See the reference here, and maybe you can also read the example tutorial here. Notes for this solution:
Because your kernel is symmetrical, you can ignore the filter_flip parameter
conv2d takes 4D input and kernel shapes as parameters, so you need to reshape them first
conv2d sums over every channel (I think in your case the 'k' variable indexes RGB, right? That is called a channel), so you should separate the channels first
This is my example code; I use a simpler kernel here:
import numpy as np
import theano
import theano.tensor as T
from theano.tensor.nnet import conv2d
# original image
img = [[[1, 2, 3, 4],   # R channel
        [1, 1, 1, 1],   #
        [2, 2, 2, 2]],  #
       [[1, 1, 1, 1],   # G channel
        [2, 2, 2, 2],   #
        [1, 2, 3, 4]],  #
       [[1, 1, 1, 1],   # B channel
        [1, 2, 3, 4],   #
        [2, 2, 2, 2]]]  #
# separate and reshape each channel to 4D
R = np.asarray([[img[0]]], dtype='float32')
G = np.asarray([[img[1]]], dtype='float32')
B = np.asarray([[img[2]]], dtype='float32')
# 4D kernel from the original : [1,0,1]
kernel = np.asarray([[[[1],[0],[1]]]], dtype='float32')
# theano convolution
t_img = T.ftensor4("t_img")
t_kernel = T.ftensor4("t_kernel")
result = conv2d(
    input=t_img,
    filters=t_kernel,
    filter_shape=(1, 1, 1, 3),
    border_mode='half')
f = theano.function([t_img,t_kernel],result)
# compute each channel
R = f(R,kernel)
G = f(G,kernel)
B = f(B,kernel)
# reshape again
img = np.asarray([R,G,B])
img = np.reshape(img,(3,3,4))
print(img)
If you have anything to discuss about the code, please comment. Hope it helps.
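Following the same idea, here is a minimal sketch with the actual 5-tap kernel from the question applied vertically (an assumption: each RGB channel is treated as its own single-channel image, demonstrated on random data):

import numpy as np
import theano
import theano.tensor as T
from theano.tensor.nnet import conv2d

# 5-tap vertical kernel, shaped (output channels, input channels, rows, cols) = (1, 1, 5, 1)
kernel = np.asarray([0.0625, 0.25, 0.375, 0.25, 0.0625], dtype='float32').reshape(1, 1, 5, 1)

t_img = T.ftensor4("t_img")
t_kernel = T.ftensor4("t_kernel")
blur = theano.function([t_img, t_kernel],
                       conv2d(t_img, t_kernel, border_mode='half'))

# stack the 3 channels along the batch axis: shape (3, 1, rows, cols)
image = np.random.rand(3, 1, 32, 32).astype('float32')
blurred = blur(image, kernel)  # shape (3, 1, 32, 32)
final = blurred[:, 0]          # back to (3, rows, cols)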