Upsampling images in frequency domain using Pytorch - python

I'm trying to upsample an RGB image in the frequency domain, using PyTorch. I'm using this article, which works with grayscale images, as a reference. Since PyTorch processes the channels individually, I figure the colorspace is irrelevant here.
The basic steps outlined by this article are:
Perform FFT on the image.
Pad the FFT with zeros.
Perform inverse FFT.
I wrote the following code to do this:
import torch
import torch.nn.functional as F
import cv2
import numpy as np
img = src = cv2.imread('orig.png')
torch_img = torch.from_numpy(img).to(torch.float32).permute(2, 0, 1) / 255.
fft = torch.fft.fft2(torch_img, norm="forward")
fr = fft.real
fi = fft.imag
fr = F.pad(fr, (fft.shape[-1]//2, fft.shape[-1]//2, fft.shape[-2]//2, fft.shape[-2]//2), mode='constant', value=0)
fi = F.pad(fi, (fft.shape[-1]//2, fft.shape[-1]//2, fft.shape[-2]//2, fft.shape[-2]//2), mode='constant', value=0)
fft_hires = torch.complex(fr, fi)
inv = torch.fft.ifft2(fft_hires, norm="forward").real
print(inv.max(), inv.min())
img = (inv.permute(1, 2, 0).detach()).clamp(0, 1)
img = (255 * img).numpy().astype(np.uint8)
cv2.imwrite('hires.png', img)
The original image:
The upscaled image:
Another interesting thing to note is the maximum and minimum values of the image pixels after performing IFFT: they are 2.2729 and -1.8376 respectively. Ideally, they should be 1.0 and 0.0.
Can someone please explain what's wrong here?

The usual convention for the DFT is to treat the first sample as the 0 Hz component. But you need to have the 0 Hz component in the center in order for the padding to make sense. Most FFT tools provide a shift function to circularly shift your result so that the 0 Hz component is in the center. In PyTorch you need to perform torch.fft.fftshift after the FFT and torch.fft.ifftshift right before taking the inverse FFT to put the 0 Hz component back in the upper left corner.
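To see what the shift does, here is a minimal sketch (my illustration, using the frequency bins of a length-6 signal):
import torch
freqs = torch.fft.fftfreq(6)
print(freqs)                      # tensor([ 0.0000,  0.1667,  0.3333, -0.5000, -0.3333, -0.1667])
print(torch.fft.fftshift(freqs))  # 0 Hz moves to the center: tensor([-0.5000, -0.3333, -0.1667,  0.0000,  0.1667,  0.3333])
Applied to your code: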
import torch
import torch.nn.functional as F
import cv2
import numpy as np
img = src = cv2.imread('orig.png')
torch_img = torch.from_numpy(img).to(torch.float32).permute(2, 0, 1) / 255.
# note the fftshift
fft = torch.fft.fftshift(torch.fft.fft2(torch_img, norm="forward"))
fr = fft.real
fi = fft.imag
fr = F.pad(fr, (fft.shape[-1]//2, fft.shape[-1]//2, fft.shape[-2]//2, fft.shape[-2]//2), mode='constant', value=0)
fi = F.pad(fi, (fft.shape[-1]//2, fft.shape[-1]//2, fft.shape[-2]//2, fft.shape[-2]//2), mode='constant', value=0)
# note the ifftshift
fft_hires = torch.fft.ifftshift(torch.complex(fr, fi))
inv = torch.fft.ifft2(fft_hires, norm="forward").real
print(inv.max(), inv.min())
img = (inv.permute(1, 2, 0).detach()).clamp(0, 1)
img = (255 * img).numpy().astype(np.uint8)
cv2.imwrite('hires.png', img)
which produces the following hires.png
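A note on the norm="forward" argument (my reading of the torch.fft documentation, not something stated in the answer): the forward transform divides by the number of elements and the inverse applies no scaling, so amplitudes survive the size change unchanged. With the default norm="backward", the inverse divides by the new, 4x larger element count, so you would have to compensate yourself, roughly:
# hedged sketch with the default norm (pad fr and fi exactly as above)
fft = torch.fft.fftshift(torch.fft.fft2(torch_img))              # forward pass unscaled
inv = torch.fft.ifft2(torch.fft.ifftshift(fft_hires)).real * 4.  # output has 4x the samples, so scale up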

Related

How Do I Develop a negative film image using python

I have tried inverting a negative film image's color with the bitwise_not() function in Python, but it has this blue tint. I would like to know how I could develop a negative film image that looks somewhat good. Here's the outcome of what I did. (I just cropped the negative image for a new test I was doing, so don't mind that.)
If you don't use the exact maximum and minimum, but the 1st and 99th percentiles, or something nearby (0.1%?), you'll get nicer contrast. It cuts away outliers due to noise, compression, etc.
Additionally, you may want to mess with gamma, or scale the values linearly, to achieve white balance.
I'll apply a "gray world assumption" and scale each plane so the mean is gray. I'll also mess with gamma, but that's just messing around.
And... all of that completely ignores gamma mapping, both of the "negative" and of the outputs.
import numpy as np
import cv2 as cv
import skimage
im = cv.imread("negative.png")
(bneg,gneg,rneg) = cv.split(im)
def stretch(plane):
    # take 1st and 99th percentile
    imin = np.percentile(plane, 1)
    imax = np.percentile(plane, 99)
    # stretch the image
    plane = (plane - imin) / (imax - imin)
    return plane
b = 1 - stretch(bneg)
g = 1 - stretch(gneg)
r = 1 - stretch(rneg)
bgr = cv.merge([b,g,r])
cv.imwrite("positive.png", bgr * 255)
b = 1 - stretch(bneg)
g = 1 - stretch(gneg)
r = 1 - stretch(rneg)
# gray world
b *= 0.5 / b.mean()
g *= 0.5 / g.mean()
r *= 0.5 / r.mean()
bgr = cv.merge([b,g,r])
cv.imwrite("positive_grayworld.png", bgr * 255)
b = 1 - np.clip(stretch(bneg), 0, 1)
g = 1 - np.clip(stretch(gneg), 0, 1)
r = 1 - np.clip(stretch(rneg), 0, 1)
# goes in the right direction
b = skimage.exposure.adjust_gamma(b, gamma=b.mean()/0.5)
g = skimage.exposure.adjust_gamma(g, gamma=g.mean()/0.5)
r = skimage.exposure.adjust_gamma(r, gamma=r.mean()/0.5)
bgr = cv.merge([b,g,r])
cv.imwrite("positive_gamma.png", bgr * 255)
Here's what happens when gamma is applied to the inverted picture... a reasonably tolerable transfer function results from applying the same factor twice, instead of applying its inverse.
Trying to "undo" the gamma while ignoring that the values were inverted... causes serious distortions:
And the min/max values for contrast stretching also affect the whole thing.
A simple photo of a negative simply won't do. It'll include stray light that offsets the black point, at the very least. You need a proper scan of the negative.
Here is one simple way to do that in Python/OpenCV. Basically one stretches each channel of the image to full dynamic range separately. Then recombines. Then inverts.
Input:
import cv2
import numpy as np
import skimage.exposure
# read image
img = cv2.imread('boys_negative.png')
# separate channels
r,g,b = cv2.split(img)
# stretch each channel
r_stretch = skimage.exposure.rescale_intensity(r, in_range='image', out_range=(0,255)).astype(np.uint8)
g_stretch = skimage.exposure.rescale_intensity(g, in_range='image', out_range=(0,255)).astype(np.uint8)
b_stretch = skimage.exposure.rescale_intensity(b, in_range='image', out_range=(0,255)).astype(np.uint8)
# combine channels
img_stretch = cv2.merge([r_stretch, g_stretch, b_stretch])
# invert
result = 255 - img_stretch
cv2.imshow('input', img)
cv2.imshow('result', result)
cv2.waitKey(0)
cv2.destroyAllWindows()
# save results
cv2.imwrite('boys_negative_inverted.jpg', result)
Result:
Caveat: This works for this image, but may not be a universal solution for all images.
ADDITION
In the above, I did not clip when stretching, as I wanted to preserve all information. But if one wants to clip and use skimage.exposure.rescale_intensity for stretching, then it is easy enough, as follows:
import cv2
import numpy as np
import skimage.exposure
# read image
img = cv2.imread('boys_negative.png')
# separate channels
r,g,b = cv2.split(img)
# compute clip points -- clip 1% only on high side
clip_rmax = np.percentile(r, 99)
clip_gmax = np.percentile(g, 99)
clip_bmax = np.percentile(b, 99)
clip_rmin = np.percentile(r, 0)
clip_gmin = np.percentile(g, 0)
clip_bmin = np.percentile(b, 0)
# stretch each channel
r_stretch = skimage.exposure.rescale_intensity(r, in_range=(clip_rmin,clip_rmax), out_range=(0,255)).astype(np.uint8)
g_stretch = skimage.exposure.rescale_intensity(g, in_range=(clip_gmin,clip_gmax), out_range=(0,255)).astype(np.uint8)
b_stretch = skimage.exposure.rescale_intensity(b, in_range=(clip_bmin,clip_bmax), out_range=(0,255)).astype(np.uint8)
# combine channels
img_stretch = cv2.merge([r_stretch, g_stretch, b_stretch])
# invert
result = 255 - img_stretch
cv2.imshow('input', img)
cv2.imshow('result', result)
cv2.waitKey(0)
cv2.destroyAllWindows()
# save results
cv2.imwrite('boys_negative_inverted2.jpg', result)
Result:

Add noise with varying grain size in Python

I am trying to add noise to an image to imitate the real-world noise created by high ISO settings in a camera.
from skimage.util import random_noise
import numpy as np
import random
# im_arr: the input image as a float array (assumed already loaded)
val = random.uniform(0.036, 0.107)
noisy_img = random_noise(im_arr, mode='gaussian', var=val ** 2)
noisy_img = (255 * noisy_img).astype(np.uint8)
That code works fine, but the size of the noise grain is always 1 pixel. I really want to have a varying size of the noise grain. How can I achieve that?
It's very challenging to imitate the varying grain size of noise from high ISO settings.
One of the reasons is that the source of the varying grain is not a purely physical effect.
Some of the grain comes from digital noise reduction (image processing) artifacts that differ from camera to camera.
I thought about a relatively simple solution:
Add random noise at different resolutions.
Resize the different resolutions to the original image size.
Sum the resized images to form a "noise image" (with zero mean).
Add the "noise image" to the original (clean) image.
A lot of tuning is required - selecting the resolutions, setting different noise levels for different resolutions, selecting the resizing interpolation method...
I don't think it's going to be exactly what you are looking for, but it applies "noise with varying grain size", and may give you a lead.
Code sample:
from skimage.util import random_noise
from skimage.io import imsave
from skimage.transform import resize
import random
import numpy as np
im_arr = np.full((256, 320), 0.5) # Original image - use gray image for testing
rows, cols = im_arr.shape
val = 0.036 #random.uniform(0.036, 0.107) # Use constant variance (for testing).
# Full resolution
noise_im1 = np.zeros((rows, cols))
noise_im1 = random_noise(noise_im1, mode='gaussian', var=val**2, clip=False)
# Half resolution
noise_im2 = np.zeros((rows//2, cols//2))
noise_im2 = random_noise(noise_im2, mode='gaussian', var=(val*2)**2, clip=False) # Use val*2 (needs tuning...)
noise_im2 = resize(noise_im2, (rows, cols)) # Upscale to original image size
# Quarter resolution
noise_im3 = np.zeros((rows//4, cols//4))
noise_im3 = random_noise(noise_im3, mode='gaussian', var=(val*4)**2, clip=False) # Use val*4 (needs tuning...)
noise_im3 = resize(noise_im3, (rows, cols)) # What is the interpolation method?
noise_im = noise_im1 + noise_im2 + noise_im3 # Sum the noise in multiple resolutions (the mean of noise_im is around zero).
noisy_img = im_arr + noise_im # Add noise_im to the input image.
noisy_img = np.round((255 * noisy_img)).clip(0, 255).astype(np.uint8)
imsave('noisy_img.png', noisy_img)
Result:
Your question suggests that you want spatially correlated noise, whereby neighboring pixels share some information.
If you don't really care about what that correlation structure looks like, you can use a simple smoothing kernel to generate noise with coarser granularity.
One way to achieve that would be:
from skimage.data import shepp_logan_phantom
from skimage.util import random_noise
from scipy.ndimage import correlate
import matplotlib.pyplot as plt
import numpy as np
# Granularity = 1
im_arr = shepp_logan_phantom()
val = 0.05
noisy_img = random_noise(im_arr, mode='gaussian', var=val)
# Correlated noise to increase granularity
# Generate random noise like skimage's random_noise does
noise = np.random.normal(scale=np.sqrt(val), size=im_arr.shape)
# Create a smoothing kernel
weights = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]]) / 5
# Apply it to the noise
noise_corr = correlate(noise, weights)
# Apply noise to image and clip
noisy_img_corr = np.clip(im_arr + noise_corr, 0, 1)
fig, (ax1, ax2) = plt.subplots(ncols=2)
ax1.imshow(noisy_img)
ax1.set_title("Uncorrelated noise")
ax1.axis("off")
ax2.imshow(noisy_img_corr)
ax2.set_title("Correlated noise")
ax2.axis("off")
Or you could come up with a better noise model from first principles if you know where the noise in your camera is coming from. There are some ideas here: https://graphics.stanford.edu/courses/cs178-10/lectures/noise-27apr10-150dpi-med.pdf.
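For instance, a minimal sketch of one physical source, shot noise, assuming the clean image im_arr is scaled to [0, 1] and a hypothetical exposure of 1000 photons per pixel at full brightness:
import numpy as np
rng = np.random.default_rng()
photons = im_arr * 1000                   # hypothetical photon counts per pixel
shot_noisy = rng.poisson(photons) / 1000  # Poisson shot noise: variance grows with signal
shot_noisy = np.clip(shot_noisy, 0, 1)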
Rotem's answer is the best implementation.
I (the original poster) use the following code to expand on his implementation for color images, using PIL for image I/O, in case anyone needs it later:
from skimage.transform import resize
import numpy as np
from skimage.util import random_noise
from PIL import Image
def gen_noise_mask(rows, cols):
    val = 0.036  # random.uniform(0.036, 0.107) # Use constant variance (for testing).
    # Full resolution
    noise_im1 = np.zeros((rows, cols))
    noise_im1 = random_noise(noise_im1, mode='gaussian', var=val ** 2, clip=False)
    # Half resolution
    noise_im2 = np.zeros((rows // 2, cols // 2))
    noise_im2 = random_noise(noise_im2, mode='gaussian', var=(val * 2) ** 2, clip=False)  # Use val*2 (needs tuning...)
    noise_im2 = resize(noise_im2, (rows, cols))  # Upscale to original image size
    # Quarter resolution
    noise_im3 = np.zeros((rows // 4, cols // 4))
    noise_im3 = random_noise(noise_im3, mode='gaussian', var=(val * 4) ** 2, clip=False)  # Use val*4 (needs tuning...)
    noise_im3 = resize(noise_im3, (rows, cols))  # What is the interpolation method?
    noise_im = noise_im1 + noise_im2 + noise_im3  # Sum the noise in multiple resolutions (the mean of noise_im is around zero).
    return noise_im

def noiseGenerator(im):
    im_arr = np.asarray(im)
    rows, cols, depth = im_arr.shape
    rgba_array = np.zeros((rows, cols, depth), 'float64')
    for d in range(0, depth):
        rgba_array[..., d] += gen_noise_mask(rows, cols)
    noisy_img = im_arr / 255 + rgba_array  # Add the noise to the input image.
    noisy_img = np.round((255 * noisy_img)).clip(0, 255).astype(np.uint8)
    return Image.fromarray(noisy_img)
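A hypothetical usage example (photo.jpg is a stand-in for any RGB input file):
im = Image.open('photo.jpg').convert('RGB')
noisy = noiseGenerator(im)
noisy.save('photo_noisy.png')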

Get mask of image without using OpenCV

I'm trying the following to get the mask out of this image, but unfortunately I fail.
import numpy as np
import skimage.color
import skimage.filters
import skimage.io
# get filename, sigma, and threshold value from command line
filename = 'pathToImage'
# read and display the original image
image = skimage.io.imread(fname=filename)
skimage.io.imshow(image)
# blur and grayscale before thresholding
blur = skimage.color.rgb2gray(image)
blur = skimage.filters.gaussian(blur, sigma=2)
# perform inverse binary thresholding
mask = blur < 0.8
# use the mask to select the "interesting" part of the image
sel = np.ones_like(image)
sel[mask] = image[mask]
# display the result
skimage.io.imshow(sel)
How can I obtain the mask?
Is there a general approach that would work for this image as well, without custom fine-tuning and changing parameters?
Apply high contrast (maximum possible value).
Convert to a black & white image using a high threshold (I've used 250).
Min filter (value = 8).
Max filter (value = 8).
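A rough sketch of those four steps without OpenCV, using PIL and scipy (the filename, the contrast factor, and the final inversion are my assumptions; the answer gives no code):
import numpy as np
from PIL import Image, ImageEnhance
from scipy.ndimage import minimum_filter, maximum_filter
im = Image.open('bottle.jpg').convert('L')     # hypothetical input file
im = ImageEnhance.Contrast(im).enhance(10.0)   # push contrast hard; the factor is a guess
bw = (np.asarray(im) >= 250).astype(np.uint8)  # black & white via a high threshold
bw = minimum_filter(bw, size=8)                # min filter (value=8)
bw = maximum_filter(bw, size=8)                # max filter (value=8)
mask = bw == 0                                 # object mask, assuming a bright background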
Here is how you can get a rough mask using only the skimage library methods:
import numpy as np
from skimage.io import imread, imsave
from skimage.feature import canny
from skimage.color import rgb2gray
from skimage.filters import gaussian
from skimage.morphology import dilation, erosion, selem
from skimage.measure import find_contours
from skimage.draw import polygon
def get_mask(img):
    kernel = selem.rectangle(7, 6)
    dilate = dilation(canny(rgb2gray(img), 0), kernel)
    dilate = dilation(dilate, kernel)
    dilate = dilation(dilate, kernel)
    erode = erosion(dilate, kernel)
    mask = np.zeros_like(erode)
    rr, cc = polygon(*find_contours(erode)[0].T)
    mask[rr, cc] = 1
    return gaussian(mask, 7) > 0.74

def save_img_masked(file):
    img = imread(file)[..., :3]
    mask = get_mask(img)
    result = np.zeros_like(img)
    result[mask] = img[mask]
    imsave("masked_" + file, result)

save_img_masked('belt.png')
save_img_masked('bottle.jpg')
Resulting masked_belt.png:
Resulting masked_bottle.jpg:
One approach uses the fact that the background changes color only very slowly. Here I apply the gradient magnitude to each of the channels and compute the norm of the result, giving me an image highlighting the quicker changes in color. The watershed of this (with sufficient tolerance) should have one or more regions covering the background and touching the image edge. After identifying those regions, and doing a bit of cleanup we get these results (red line is the edge of the mask, overlaid on the input image):
I did have to adjust the tolerance; with a lower tolerance in the first case, more of the shadow is seen as object. I think it should be possible to set the tolerance based on the statistics of the gradient image, but I have not tried.
There are no other parameters to tweak here; the minimum object area, 300, is quite safe. An alternative would be to keep only the single largest object.
This is the code, using DIPlib (disclaimer: I'm an author). out is the mask image, not the outline as displayed above.
import diplib as dip
import numpy as np
# Case 1:
img = dip.ImageRead('Pa9DO.png')
img = img[362:915, 45:877] # cut out actual image
img = img(slice(0,2)) # remove alpha channel
tol = 7
# Case 2:
#img = dip.ImageRead('jTnVr.jpg')
#tol = 1
# Compute gradient
gm = dip.Norm(dip.GradientMagnitude(img))
# Compute watershed with tolerance
lab = dip.Watershed(gm, connectivity=1, maxDepth=tol, flags={'correct','labels'})
# Identify regions touching the image edge
ll = np.unique(np.concatenate((
np.unique(lab[:,0]),
np.unique(lab[:,-1]),
np.unique(lab[0,:]),
np.unique(lab[-1,:]))))
# Remove regions touching the image edge
out = dip.Image(lab.Sizes(), dt='BIN')
out.Fill(1)
for l in ll:
    if l != 0:  # label zero is for the watershed lines
        out = out - (lab == l)
# Remove watershed lines
out = dip.Opening(out, dip.SE(3, 'rectangular'))
# Remove small regions
out = dip.AreaOpening(out, filterSize=300)
# Display
dip.Overlay(img, dip.Dilation(out, 3) - out).Show()

Linear-Blurring an Image

I'm trying to blur an image by mapping each pixel to the average of the N pixels to the right of it (in the same row). My iterative solution produces good output, but my linear-algebra solution produces bad output.
From testing, I believe my kernel-matrix is correct; and, I know the last N rows don't get blurred, but that's fine for now. I'd appreciate any hints or solutions.
Here are the iterative-solution output (good), the linear-algebra output (bad), and the original image. This is the failing linear-algebra code:
def blur(orig_img):
    # turn image-mat into a vector
    flattened_img = orig_img.flatten()
    L = flattened_img.shape[0]
    N = 3
    # kernel
    kernel = np.zeros((L, L))
    for r, row in enumerate(kernel[0:-N]):
        row[r:r+N] = [round(1/N, 3)]*N
    print(kernel)
    # blur the img
    print('starting blurring')
    blurred_img = np.matmul(kernel, flattened_img)
    blurred_img = blurred_img.reshape(orig_img.shape)
    return blurred_img
The equation I'm modelling is, for each pixel i: blurred[i] = (1/N) * (img[i] + img[i+1] + ... + img[i+N-1]).
One option might be to just use a kernel and a convolution?
For example, if we load a grayscale image like so:
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from scipy import ndimage
# load a grayscale image in a hackish way (average the RGB channels)
image = np.asarray(Image.open('cup.jpg')).mean(axis=2)
plt.imshow(image)
plt.title('Gray scale image')
plt.show()
Now one can use a kernel and convolution. For example, to create a filter that acts on a single row and computes the value of the center pixel as the difference between the pixels to its right and left, one can do the following:
# Create a kernel that takes the difference between neighbors horizontal pixes
k = np.array([[-1,0,1]])
plt.subplot(121)
plt.title('Kernel')
plt.imshow(k)
plt.subplot(122)
plt.title('Output')
plt.imshow(ndimage.convolve(image, k, mode='constant', cval=0.0))
plt.show()
Therefore, one can blur an image by mapping each pixel to the average of the N pixels to the right of it by creating the appropriate kernel.
# Create a kernel that takes the average of N pixels to the right
n = 10
k = np.zeros(n*2); k[n:] = 1/n  # zeros in one half make the average one-sided instead of centered
k = k[np.newaxis, ...]          # shape (1, 2n): act along rows only
plt.subplot(121)
plt.title('Kernel')
plt.imshow(k)
plt.subplot(122)
plt.title('Output')
plt.imshow(ndimage.convolve(image, k, mode='constant', cval=0.0))
plt.show()
The issue was incorrect usage of cv2.imshow() when displaying the output image. It expects floating-point pixel values to be in [0, 1]; the normalization step near the bottom of the code below takes care of that:
def blur(orig_img):
    flattened_img = orig_img.flatten()
    L = flattened_img.shape[0]
    N = int(round(0.1 * orig_img.shape[0], 0))
    # mask (A)
    mask = np.zeros((L, L))
    for r, row in enumerate(mask[0:-N]):
        row[r:r+N] = [round(1/N, 2)]*N
    # blurred img = A * flattened_img
    print('starting blurring')
    blurred_img = np.matmul(mask, flattened_img)
    blurred_img = blurred_img.reshape(orig_img.shape)
    cv2.imwrite('blurred_img.png', blurred_img)
    # normalize img to [0, 1]
    blurred_img = (blurred_img - blurred_img.min()) / (blurred_img.max() - blurred_img.min())
    return blurred_img
Amended output
Thank you to @CrisLuengo for identifying the issue.

Sobel filter implementation in scipy

I tried to implement the Sobel_X filter with scipy's convolve2d function.
I compared my results with those from this function:
from scipy.signal import convolve2d
from scipy import misc
from skimage.exposure import rescale_intensity
import cv2
import numpy as np
#https://www.pyimagesearch.com/2016/07/25/convolutions-with-opencv-and-python/
def convolve(image, kernel):
    # grab the spatial dimensions of the image, along with
    # the spatial dimensions of the kernel
    (iH, iW) = image.shape[:2]
    (kH, kW) = kernel.shape[:2]
    # allocate memory for the output image, taking care to
    # "pad" the borders of the input image so the spatial
    # size (i.e., width and height) are not reduced
    pad = (kW - 1) // 2
    image = cv2.copyMakeBorder(image, pad, pad, pad, pad,
                               cv2.BORDER_REPLICATE)
    output = np.zeros((iH, iW), dtype="float32")
    # loop over the input image, "sliding" the kernel across
    # each (x, y)-coordinate from left-to-right and top to
    # bottom
    for y in np.arange(pad, iH + pad):
        for x in np.arange(pad, iW + pad):
            # extract the ROI of the image by extracting the
            # *center* region of the current (x, y)-coordinates
            # dimensions
            roi = image[y - pad:y + pad + 1, x - pad:x + pad + 1]
            # perform the actual convolution by taking the
            # element-wise multiplication between the ROI and
            # the kernel, then summing the matrix
            k = (roi * kernel).sum()
            # store the convolved value in the output (x, y)-
            # coordinate of the output image
            output[y - pad, x - pad] = k
    # rescale the output image to be in the range [0, 255]
    output = rescale_intensity(output, in_range=(0, 255))
    output = (output * 255).astype("uint8")
    # return the output image
    return output
Here are the Sobel_X kernel and the comparison code.
sobelX = np.array((
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1]), dtype="int")
testim = misc.face(gray=True)
convolved_func = convolve(testim, sobelX)
convolved_np = convolve2d(testim, sobelX, boundary='symm', mode='same')
cv2.imshow("Face", np.hstack((convolved_func, np.array(convolved_np, dtype="uint8"))))
cv2.waitKey(0)
cv2.destroyAllWindows()
As you can see here, the results are entirely different.
I can't work out how to implement these filters so that they produce the same results.
Should I somehow change the filter function, or is there maybe something special in numpy needed to implement it, right?
I tried to adapt the function for scipy as in this and that examples, but the results are the same or worse (I've got a black image).
You will get slightly different results.
Do thresholding to remove all values which are less than 0:
convolved_np[convolved_np<0]=0
That will give you something similar, but still not the same; some artifacts remain.
The functions do in fact differ: scipy's convolve2d performs a true convolution, which flips the kernel before sliding it, while the loop implementation above computes a correlation (no flip). Flipping the Sobel X kernel negates it, so the two raw outputs have opposite signs, and the different boundary handling ('symm' reflection versus cv2.BORDER_REPLICATE) and output scaling add further discrepancies. If you can add more to this answer, I will appreciate it.
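Given that, a hedged sketch: computing the correlation directly with scipy.signal.correlate2d (which does not flip the kernel) and approximating the custom function's clipping should bring the two outputs much closer, though the boundary handling ('symm' reflection versus cv2.BORDER_REPLICATE) will still differ at the edges:
from scipy.signal import correlate2d
corr_np = correlate2d(testim, sobelX, boundary='symm', mode='same')
corr_np = np.clip(corr_np, 0, 255).astype("uint8")  # roughly the clip-and-scale that the custom convolve applies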
