I have a numpy array of images of shape (N, H, W, C) where N is the number of images, H the image height, W the image width and C the RGB channels.
I would like to standardize my images channel-wise, so for each image I would like to channel-wise subtract the image channel's mean and divide by its standard deviation.
I did this in a loop, which works, but it is very inefficient, and because it makes copies my RAM is filling up.
def standardize(img):
    mean = np.mean(img)
    std = np.std(img)
    img = (img - mean) / std
    return img

standardized_images = []
for img in rgb_images:
    r_channel = standardize(img[:,:,0])
    g_channel = standardize(img[:,:,1])
    b_channel = standardize(img[:,:,2])
    normalized_image = np.stack([r_channel, g_channel, b_channel], axis=-1)
    standardized_images.append(normalized_image)
standardized_images = np.array(standardized_images)
How can I do this more efficiently making use of numpy's capabilities?
Perform the ufunc reductions (mean, std) along the second and third axes, keeping the reduced dimensions intact so that they broadcast against the input array in the subtraction and division steps:
mean = np.mean(rgb_images, axis=(1,2), keepdims=True)
std = np.std(rgb_images, axis=(1,2), keepdims=True)
standardized_images_out = (rgb_images - mean) / std
Boost the performance further by re-using the already computed means when evaluating the standard deviation directly from its definition, as suggested in this solution:
std = np.sqrt(((rgb_images - mean)**2).mean((1,2), keepdims=True))
Packaged into a function, with the reduction axes as a parameter:
from __future__ import division  # only needed on Python 2
import numpy as np

def normalize_meanstd(a, axis=None):
    # axis param denotes axes along which mean & std reductions are to be performed
    mean = np.mean(a, axis=axis, keepdims=True)
    std = np.sqrt(((a - mean)**2).mean(axis=axis, keepdims=True))
    return (a - mean) / std

standardized_images = normalize_meanstd(rgb_images, axis=(1,2))
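If RAM is the main concern, a minimal in-place sketch (my own addition, assuming rgb_images is already a floating-point array that may be overwritten) avoids allocating a second full-size copy:

# assumes rgb_images is float; convert once with rgb_images = rgb_images.astype(np.float32) if needed
mean = rgb_images.mean(axis=(1, 2), keepdims=True)
std = rgb_images.std(axis=(1, 2), keepdims=True)
rgb_images -= mean  # in-place subtraction
rgb_images /= std   # in-place division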
Related
I would like to know how to calculate the mean and the std of a given dataset of RGB images.
For example, with ImageNet we have imagenet_stats: ([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]).
I tried:
rgb_values = [np.mean(Image.open(img).getdata(), axis=0)/255 for img in imgs_path]
np.mean(rgb_values, axis=0)
np.std(rgb_values, axis=0)
I am not sure that the values I get are correct.
Which could be a better implementation?
Two solutions:
The first solution iterates over the images. It is MUCH slower than the second solution, and it uses the same amount of memory, because it first loads and then stores all the images in a list. So it is strictly worse than the second solution, unless you change how your images are loaded and process them one by one from disk.
The second solution needs to hold all images in memory at the same time. It is MUCH faster, because it is fully vectorized.
First solution (iterating over the images):
For each channel: R, G, B, here is how to calculate the means and stds of all the pixels in all the images:
Requirement:
Each image has the same number of pixels.
If this is not the case - use the second solution (below).
import numpy as np
from PIL import Image

images_rgb = [np.array(Image.open(img).getdata()) / 255. for img in imgs_path]
# Each image_rgb is of shape (n, 3),
# where n is the number of pixels in each image,
# and 3 are the channels: R, G, B.

means = []
for image_rgb in images_rgb:
    means.append(np.mean(image_rgb, axis=0))
mu_rgb = np.mean(means, axis=0)  # mu_rgb.shape == (3,)

variances = []
for image_rgb in images_rgb:
    var = np.mean((image_rgb - mu_rgb) ** 2, axis=0)
    variances.append(var)
std_rgb = np.sqrt(np.mean(variances, axis=0))  # std_rgb.shape == (3,)
Proof
... that the mean and std will be the same whether calculated image by image like this, or using all pixels at once:
Let's say each image has n pixels (with values vals_i), and there are m images.
Then there are (n*m) pixels.
The real_mean of all pixels in all vals_is is:
total_sum = sum(vals_1) + sum(vals_2) + ... + sum(vals_m)
real_mean = total_sum / (n*m)
Adding up the means of each image individually (each image's mean is its sum divided by its n pixels):
sum_of_means = sum(vals_1) / n + sum(vals_2) / n + ... + sum(vals_m) / n
             = (sum(vals_1) + sum(vals_2) + ... + sum(vals_m)) / n
             = total_sum / n
Now, what is the relationship between real_mean and sum_of_means? As you can see,
real_mean = sum_of_means / m
so averaging the per-image means (np.mean(means) above) gives exactly the real mean.
Analogously, using the formula for the standard deviation, the real_std of all pixels in all vals_is is:
sum_of_square_diffs = sum((vals_1 - real_mean) ** 2)
                    + sum((vals_2 - real_mean) ** 2)
                    + ...
                    + sum((vals_m - real_mean) ** 2)
real_std = sqrt( sum_of_square_diffs / (n*m) )
If you look at this equation from another angle, you can see that the quantity under the square root is the average of the per-image variances, each computed over n values with respect to real_mean, which is exactly what np.mean(variances) computes above.
Verification
Real mean and std:
rng = np.random.default_rng(0)
vals = rng.integers(1, 100, size=100) # data
mu = np.mean(vals)
print(mu)
print(np.std(vals))
50.93 # real mean
28.048976808432776 # real standard deviation
Comparing it to the image-by-image approach:
n_images = 10
means = []
for subset in np.split(vals, n_images):
    means.append(np.mean(subset))
new_mu = np.mean(means)

variances = []
for subset in np.split(vals, n_images):
    var = np.mean((subset - mu) ** 2)  # note: diffs are taken w.r.t. the overall mean mu, as in the derivation above
    variances.append(var)

print(new_mu)
print(np.sqrt(np.mean(variances)))
50.92999999999999 # calculated mean
28.048976808432784 # calculated standard deviation
Second solution (fully vectorized):
Using all the pixels of all images at once.
rgb_values = np.concatenate(
    [Image.open(img).getdata() for img in imgs_path],
    axis=0
) / 255.
# rgb_values.shape == (n, 3),
# where n is the total number of pixels in all images,
# and 3 are the 3 channels: R, G, B.
# Each value is in the interval [0; 1]
mu_rgb = np.mean(rgb_values, axis=0) # mu_rgb.shape == (3,)
std_rgb = np.std(rgb_values, axis=0) # std_rgb.shape == (3,)
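If neither holding all the images in memory nor the equal-pixel-count requirement fits your data, a running-sums variant (my own sketch, not one of the two solutions above) processes one image at a time and still gives the exact per-channel statistics:

import numpy as np
from PIL import Image

pixel_count = 0
channel_sum = np.zeros(3)
channel_sum_sq = np.zeros(3)
for img_path in imgs_path:
    rgb = np.array(Image.open(img_path).getdata()) / 255.  # shape (n, 3)
    pixel_count += rgb.shape[0]
    channel_sum += rgb.sum(axis=0)
    channel_sum_sq += (rgb ** 2).sum(axis=0)
mu_rgb = channel_sum / pixel_count                            # mu_rgb.shape == (3,)
std_rgb = np.sqrt(channel_sum_sq / pixel_count - mu_rgb ** 2) # std_rgb.shape == (3,)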
Say I have a batch of images in the form of tensors with dimensions (B x C x W x H) where B is the batch size, C is the number of channels in the image, and W and H are the width and height of the image respectively. I'm looking to use the transforms.Normalize() function to normalize my images with respect to the mean and standard deviation of the dataset across the C image channels, meaning that I want a resulting tensor in the form 1 x C. Is there a straightforward way to do this?
I tried torch.view(C, -1).mean(1) and torch.view(C, -1).std(1) but I get the error:
view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
Edit
After looking into how view() works in PyTorch, I now realize why my approach doesn't work; however, I still can't figure out how to get the per-channel mean and standard deviation.
Note that variances add, not standard deviations. See detailed explanation here: https://apcentral.collegeboard.org/courses/ap-statistics/classroom-resources/why-variances-add-and-why-it-matters
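A quick numeric illustration (my own example, constructed so that both batches share the same mean, where averaging variances is exact):

import torch
a = torch.tensor([1., 1., 5., 5.])  # std 2, var 4
b = torch.tensor([3., 3., 3., 3.])  # std 0, var 0
print(torch.cat([a, b]).std(unbiased=False))                            # tensor(1.4142), the true combined std
print(torch.sqrt((a.var(unbiased=False) + b.var(unbiased=False)) / 2))  # tensor(1.4142), averaging variances matches
print((a.std(unbiased=False) + b.std(unbiased=False)) / 2)              # tensor(1.), averaging stds does not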
Here is the modified code:
nimages = 0
mean = 0.0
var = 0.0
for i_batch, batch_target in enumerate(trainloader):
    batch = batch_target[0]
    # Rearrange batch to be the shape of [B, C, W * H]
    batch = batch.view(batch.size(0), batch.size(1), -1)
    # Update total number of images
    nimages += batch.size(0)
    # Compute mean and std here
    mean += batch.mean(2).sum(0)
    var += batch.var(2).sum(0)

mean /= nimages
var /= nimages
std = torch.sqrt(var)

print(mean)
print(std)
You just need to rearrange the batch tensor in the right way: from [B, C, W, H] to [B, C, W * H] by:
batch = batch.view(batch.size(0), batch.size(1), -1)
Here is a complete usage example on random data:
Code:
import torch
from torch.utils.data import TensorDataset, DataLoader
data = torch.randn(64, 3, 28, 28)
labels = torch.zeros(64, 1)
dataset = TensorDataset(data, labels)
loader = DataLoader(dataset, batch_size=8)
nimages = 0
mean = 0.
std = 0.
for batch, _ in loader:
    # Rearrange batch to be the shape of [B, C, W * H]
    batch = batch.view(batch.size(0), batch.size(1), -1)
    # Update total number of images
    nimages += batch.size(0)
    # Compute mean and std here
    mean += batch.mean(2).sum(0)
    std += batch.std(2).sum(0)
# Final step
mean /= nimages
std /= nimages
print(mean)
print(std)
Output:
tensor([-0.0029, -0.0022, -0.0036])
tensor([0.9942, 0.9939, 0.9923])
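Since the question mentions transforms.Normalize(), the per-channel statistics computed above can be plugged straight into it (a small sketch, assuming torchvision is installed):

import torchvision.transforms as transforms

normalize = transforms.Normalize(mean=mean.tolist(), std=std.tolist())
normalized_img = normalize(data[0])  # normalizes a single (C, H, W) image tensor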
I have been stuck here for some time now. I cannot understand what I am doing wrong in calculating the displacement vectors along the x-axis and y-axis using the Lucas-Kanade method.
I implemented it as given in the above Wikipedia link. Here is what I have done:
import cv2
import numpy as np
img_a = cv2.imread("./images/1.png",0)
img_b = cv2.imread("./images/2.png",0)
# Calculate gradient along x and y axis
ix = cv2.Sobel(img_a, cv2.CV_64F, 1, 0, ksize = 3, scale = 1.0/3.0)
iy = cv2.Sobel(img_a, cv2.CV_64F, 0, 1, ksize = 3, scale = 1.0/3.0)
# Calculate temporal difference between the 2 images
it = img_b - img_a
ix = ix.flatten()
iy = iy.flatten()
it = -it.flatten()
A = np.vstack((ix, iy)).T
atai = np.linalg.inv(np.dot(A.T,A))
atb = np.dot(A.T, it)
v = np.dot(np.dot(np.linalg.inv(np.dot(A.T,A)),A.T),it)
print(v)
This code runs without an error but it prints an array of 2 values! I had expected the v matrix to be of the same size as that of the image. Why does this happen? What am I doing incorrectly?
PS: I know there are methods directly available with OpenCV but I want to write this simple algorithm (as also given in the Wikipedia link shared above) myself.
To properly compute the Lucas–Kanade optical flow estimate you need to solve the system of two equations for every pixel, using information from its neighborhood, not for the image as a whole.
This is the recipe (notation refers to that used on the Wikipedia page):
Compute the image gradient (A) for the first image (ix, iy in the OP) using any method (Sobel is OK, I prefer Gaussian derivatives; note that it is important to apply the right scaling in Sobel: 1/8).
ix = cv2.Sobel(img_a, cv2.CV_64F, 1, 0, ksize = 3, scale = 1.0/8.0)
iy = cv2.Sobel(img_a, cv2.CV_64F, 0, 1, ksize = 3, scale = 1.0/8.0)
Compute the structure tensor (A^T W A): Axx = ix * ix, Axy = ix * iy, Ayy = iy * iy. Each of these three images must be smoothed with a Gaussian filter (this is the windowing). For example,
Axx = cv2.GaussianBlur(ix * ix, (0,0), 5)
Axy = cv2.GaussianBlur(ix * iy, (0,0), 5)
Ayy = cv2.GaussianBlur(iy * iy, (0,0), 5)
These three images together form the structure tensor, which is a 2x2 symmetric matrix at each pixel. For a pixel at (i,j), the matrix is:
| Axx(i,j) Axy(i,j) |
| Axy(i,j) Ayy(i,j) |
Compute the temporal gradient (b) by subtracting the two images (it in the OP).
it = img_b - img_a
Compute A^T W b: Abx = ix * it, Aby = iy * it, and smooth these two images with the same Gaussian filter as above.
Abx = cv2.GaussianBlur(ix * it, (0,0), 5)
Aby = cv2.GaussianBlur(iy * it, (0,0), 5)
Compute the inverse of A^T W A (a symmetric, positive-definite matrix) and multiply by A^T W b. Note that this inverse is of the 2x2 matrix at each pixel, not of the images as a whole. You can write this out as a set of simple arithmetic operations on the images Axx, Axy, Ayy, Abx and Aby.
The inverse of the matrix A^T W A is given by:
| Ayy -Axy |
| -Axy Axx | / ( Axx*Ayy - Axy*Axy )
so you can write the solution as
norm = Axx*Ayy - Axy*Axy
vx = ( Ayy * Abx - Axy * Aby ) / norm
vy = ( Axx * Aby - Axy * Abx ) / norm
If the image is natural, it will have at least a tiny bit of noise, and norm will not have zeros. But for artificial images norm could have zeros, meaning you can't divide by it. Simply adding a small value to it will avoid division by zero errors: norm += 1e-6.
The size of the Gaussian filter is chosen as a compromise between precision and allowed motion speed: a larger filter will yield less precise results, but will work with larger shifts between images.
Typically, vx and vy are only evaluated where the two eigenvalues of the matrix A^T W A are sufficiently large (if at least one of them is small, the result is inaccurate or possibly wrong).
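Putting the steps above together, a minimal NumPy/OpenCV sketch of the per-pixel solution could look like this (the function name dense_lucas_kanade and the choice sigma=5.0 are mine, and the images are cast to float so the temporal difference does not wrap around for uint8 inputs):

import cv2
import numpy as np

def dense_lucas_kanade(img_a, img_b, sigma=5.0):
    # Work in floating point so the temporal difference does not wrap around
    img_a = img_a.astype(np.float64)
    img_b = img_b.astype(np.float64)
    # Image gradient of the first image (note the 1/8 Sobel scaling)
    ix = cv2.Sobel(img_a, cv2.CV_64F, 1, 0, ksize=3, scale=1.0/8.0)
    iy = cv2.Sobel(img_a, cv2.CV_64F, 0, 1, ksize=3, scale=1.0/8.0)
    # Temporal gradient
    it = img_b - img_a
    # Structure tensor (A^T W A) and A^T W b, all smoothed with the same Gaussian window
    Axx = cv2.GaussianBlur(ix * ix, (0, 0), sigma)
    Axy = cv2.GaussianBlur(ix * iy, (0, 0), sigma)
    Ayy = cv2.GaussianBlur(iy * iy, (0, 0), sigma)
    Abx = cv2.GaussianBlur(ix * it, (0, 0), sigma)
    Aby = cv2.GaussianBlur(iy * it, (0, 0), sigma)
    # Closed-form inverse of the 2x2 matrix at each pixel
    norm = Axx * Ayy - Axy * Axy + 1e-6  # small constant avoids division by zero
    vx = (Ayy * Abx - Axy * Aby) / norm
    vy = (Axx * Aby - Axy * Abx) / norm
    return vx, vy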
Using DIPlib (disclosure: I'm an author) this is all very easy because it supports images with a matrix at each pixel. You would do this as follows:
import diplib as dip
img_a = dip.ImageRead("./images/1.png")
img_b = dip.ImageRead("./images/2.png")
A = dip.Gradient(img_a, [1.0])
b = img_b - img_a
ATA = dip.Gauss(A * dip.Transpose(A), [5.0])
ATb = dip.Gauss(A * b, [5.0])
v = dip.Inverse(ATA) * ATb
I'm trying to implement the loss function of the classic Image Colorization paper by Levin et al (2004) in Tensorflow/Keras:
This is the weights equation (correlation between intensities):
y is every neighboring pixel of x in a 3x3 window and w is the weight for each of these pixels.
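For reference, the weight formula (my reconstruction from the paper and from the code in the answer below; the original post shows it only as an image) is roughly:

w_xy = 1 + (Y(x) - mu_x) * (Y(y) - mu_x) / sigma_x^2

where Y(.) is the pixel intensity and mu_x and sigma_x^2 are the mean and variance of the intensities in the 3x3 window centered at x (the answer below uses a fixed sigma in place of the local variance).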
The weights require computing the mean and variance for the neighborhood of every pixel.
I couldn't find a function that would allow me to write this loss function in a symbolic way, and I'm thinking I should write it in a loop where I calculate the w for each window.
How can I write this loss function in TensorFlow, either in a symbolic way or with loops?
Thanks so much.
EDIT: Here's the code I've come up for calculating the weights in Numpy:
import cv2
import numpy as np
im = cv2.resize(cv2.imread('./Image.jpg', 0), (256, 256)) / np.float32(255.0)
M = 3
N = 3
# Split the image into 3x3 windows
windows = [im[x:x + M, y:y + N] for x in range(0, im.shape[0], M) for y in range(0, im.shape[1], N)]
# Calculate the correlation for each window
weights = [1 + np.corrcoef(tile) for tile in windows]
I think this code computes the value in your formula:
import tensorflow as tf
from itertools import product
SIGMA = 1.0
dtype = tf.float32
# Input images batch
img = tf.placeholder(dtype, [None, None, None])
img_shape = tf.shape(img)
img_height = img_shape[1]
img_width = img_shape[2]
# Compute 3 x 3 block means
mean_filter = tf.ones((3, 3), dtype) / 9
img_mean = tf.nn.conv2d(img[:, :, :, tf.newaxis],
                        mean_filter[:, :, tf.newaxis, tf.newaxis],
                        [1, 1, 1, 1], 'VALID')[:, :, :, 0]
# Remove 1px border
img_clip = img[:, 1:-1, 1:-1]
# Difference between pixel intensity and its block mean
x_diff = img_clip - img_mean
# Compute neighboring pixel loss contributions
contributions = []
for i, j in product((-1, 0, 1), repeat=2):
    if i == j == 0: continue
    # Take "shifted" image (axis 1 is the height, axis 2 is the width)
    displaced_img = img[:, 1 + i:img_height - 1 + i, 1 + j:img_width - 1 + j]
    # Compute difference with mean of corresponding pixel block
    y_diff = displaced_img - img_mean
    # Weights formula
    weight = 1 + x_diff * y_diff / (SIGMA ** 2)
    # Contribution of this displaced image to the loss of each pixel
    contribution = weight * displaced_img
    contributions.append(contribution)
contributions = tf.add_n(contributions)
# Compute loss value
loss = tf.reduce_sum(tf.squared_difference(img_clip, contributions))
The loss for the pixels along the image border is not computed, since in principle it is not well defined in the formula, although you could make a few changes to take them into account if you want (change the convolution padding to 'SAME', pad where necessary, etc.).
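Since the graph uses TF1-style placeholders, a minimal usage sketch (the random batch here is just a stand-in for real grayscale image data) would be:

import numpy as np

with tf.Session() as sess:
    batch = np.random.rand(4, 64, 64).astype(np.float32)  # shape [batch, height, width]
    loss_value = sess.run(loss, feed_dict={img: batch})
    print(loss_value)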
This is a mean squared error over 3 x 3 windows, right?
Sounds like a GLCM matrix for texture analysis. Do you want to apply this loss function to every 3x3 window in the image?
I think it is better to first build the function that makes this calculation with random weights in NumPy, and then try to build it with TF for the optimization.
This is more a question on theory of Gaussian filters, than specific coding question.
I've got an implementation of a 2D D.O.G. filter in python. I want to make noise masks at different spatial frequency bands, e.g. 1-5 cpd. To do this I first create a white noise array, and then I will apply the DoG filters to bandpass-filter the noise across different spatial frequency ranges.
Is there a way to explicitly define the bandwidth of a Difference of Gaussian filter from the parameters of each contributing Gaussian filter?
(BONUS Q's: Would it be possible to take a Fourier transform of each of these Gaussians and then view this as a spectrum of their individual bandwidths, and then of the DoG bandwidth? What would the units be in the Fourier space? How could I convert this into a spatial frequency scale? Sorry, lots of questions.)
Many thanks,
NOTE: I use the conv2 function below, rather than the built-in Python 2D convolutions, for speed (other applications).
import numpy as np
import math
import matplotlib.pylab as plt
from scipy.ndimage.filters import convolve

def Gaussian2D(GCenter, Gamp, Ggamma, Gconst): #new_theta > 0.4:
    """
    Produces a 2D Gaussian pulse *EDITED BY WMBM

    Parameters
    ----------
    GCenter : int
        Centre point of Gaussian pulse
    Gamp : int
        Amplitude of Gaussian pulse
    Ggamma : int
        FWHM of Gaussian pulse
    Gconst : float
        Unknown parameter of density function

    Returns
    ----------
    GKernel : array_like
        Gaussian kernel
    """
    new_theta = math.sqrt(Gconst**-1)*Ggamma
    SizeHalf = int(math.floor(9*new_theta))
    [y, x] = np.meshgrid(np.arange(-SizeHalf,SizeHalf+1), np.arange(-SizeHalf,SizeHalf+1))
    part1 = (x-GCenter[0])**2 + (y-GCenter[1])**2
    GKernel = Gamp*np.exp(-0.5*Ggamma**-2*Gconst*part1)
    return GKernel
def conv2(x, y, mode='same'):
    """
    Emulate the Matlab function conv2 from Mathworks.

    Usage:
    z = conv2(x,y,mode='same')
    """
    if not(mode == 'same'):
        raise Exception("Mode not supported")

    # Add singleton dimensions
    if (len(x.shape) < len(y.shape)):
        dim = x.shape
        for i in range(len(x.shape),len(y.shape)):
            dim = (1,) + dim
        x = x.reshape(dim)
    elif (len(y.shape) < len(x.shape)):
        dim = y.shape
        for i in range(len(y.shape),len(x.shape)):
            dim = (1,) + dim
        y = y.reshape(dim)

    origin = ()
    # Apparently, the origin must be set in a special way to reproduce
    # the results of scipy.signal.convolve and Matlab
    for i in range(len(x.shape)):
        if ( (x.shape[i] - y.shape[i]) % 2 == 0 and
             x.shape[i] > 1 and
             y.shape[i] > 1):
            origin = origin + (-1,)
        else:
            origin = origin + (0,)

    z = convolve(x, y, mode='constant', origin=origin)
    return z
# Create white noise array
N=50 # Noise array dimension
A=10 # Noise amplitude
noise = np.random.rand(N,N)*A
# Gaussian filter parameters
GCenter=[0,0]
Gconst=1
# First gaussian filter
cutoff_f1 = 0.05 # < pi/10
gamma1 = 1/(2*np.pi*cutoff_f1) #minimum gamma == 0.5
Gamp1 = 1/(2*np.pi*gamma1)
filtr1 = Gaussian2D([0,0],Gamp1,gamma1,Gconst)
# Second gaussian filter
cutoff_f2 = 0.04 # < pi/10
gamma2 = 1/(2*np.pi*cutoff_f2) #minimum gamma == 0.5
Gamp2 = 1/(2*np.pi*gamma2)
filtr2 = Gaussian2D([0,0],Gamp2,gamma2,Gconst)
# Convolve filters with noise
noise_filtr1 = conv2(noise, filtr1, mode='same')
noise_filtr2 = conv2(noise, filtr2, mode='same')
# Difference of Gaussian Output
noise_out = noise_filtr1- noise_filtr2
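Regarding the bonus questions: the Fourier transform of a Gaussian with spatial standard deviation gamma (in pixels) is another Gaussian with standard deviation 1/(2*pi*gamma) in cycles per pixel, so with the parameterisation above the two filters have frequency-domain widths equal to cutoff_f1 and cutoff_f2, and the DoG transfer function is their difference, i.e. a band-pass. A minimal sketch of this (my own addition, assuming both filters are normalised to unit DC gain; convert cycles per pixel to cycles per degree by multiplying by your display's pixels per degree):

f = np.linspace(0, 0.5, 500)              # spatial frequency in cycles per pixel
H1 = np.exp(-0.5 * (f / cutoff_f1) ** 2)  # normalised frequency response of filter 1
H2 = np.exp(-0.5 * (f / cutoff_f2) ** 2)  # normalised frequency response of filter 2
plt.plot(f, H1 - H2)                      # band-pass response of the DoG
plt.xlabel('Spatial frequency (cycles per pixel)')
plt.ylabel('Normalised DoG response')
plt.show()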