Say I have a batch of images in the form of tensors with dimensions (B x C x W x H) where B is the batch size, C is the number of channels in the image, and W and H are the width and height of the image respectively. I'm looking to use the transforms.Normalize() function to normalize my images with respect to the mean and standard deviation of the dataset across the C image channels, meaning that I want a resulting tensor in the form 1 x C. Is there a straightforward way to do this?
I tried torch.view(C, -1).mean(1) and torch.view(C, -1).std(1) but I get the error:
view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
Edit
After looking into how view() works in PyTorch, I know realize why my approach doesn't work; however, I still can't figure out how to get the per-channel mean and standard deviation.
Note that variances add, not standard deviations. See detailed explanation here: https://apcentral.collegeboard.org/courses/ap-statistics/classroom-resources/why-variances-add-and-why-it-matters
Here is the modified code:
nimages = 0
mean = 0.0
var = 0.0
for i_batch, batch_target in enumerate(trainloader):
batch = batch_target[0]
# Rearrange batch to be the shape of [B, C, W * H]
batch = batch.view(batch.size(0), batch.size(1), -1)
# Update total number of images
nimages += batch.size(0)
# Compute mean and std here
mean += batch.mean(2).sum(0)
var += batch.var(2).sum(0)
mean /= nimages
var /= nimages
std = torch.sqrt(var)
print(mean)
print(std)
You just need to rearrange batch tensor in a right way: from [B, C, W, H] to [B, C, W * H] by:
batch = batch.view(batch.size(0), batch.size(1), -1)
Here is complete usage example on random data:
Code:
import torch
from torch.utils.data import TensorDataset, DataLoader
data = torch.randn(64, 3, 28, 28)
labels = torch.zeros(64, 1)
dataset = TensorDataset(data, labels)
loader = DataLoader(dataset, batch_size=8)
nimages = 0
mean = 0.
std = 0.
for batch, _ in loader:
# Rearrange batch to be the shape of [B, C, W * H]
batch = batch.view(batch.size(0), batch.size(1), -1)
# Update total number of images
nimages += batch.size(0)
# Compute mean and std here
mean += batch.mean(2).sum(0)
std += batch.std(2).sum(0)
# Final step
mean /= nimages
std /= nimages
print(mean)
print(std)
Output:
tensor([-0.0029, -0.0022, -0.0036])
tensor([0.9942, 0.9939, 0.9923])
Related
I would like to know I to calculate the mean and the std of a given dataset of RGB images.
For example, with imagenet we have imagenet_stats: ([0.485, 0.456, 0.406], [0.229, 0.224, 0.225].
I tried:
rgb_values = [np.mean(Image.open(img).getdata(), axis=0)/255 for img in imgs_path]
np.mean(rgb_values, axis=0)
np.std(rgb_values, axis=0)
I am not sure that the values I get are correct.
Which could be a better implementation?
Two solutions:
The first solution iterates over the images. It is MUCH slower than the second solution, and it uses the same amount of memory because it first loads and then stores all the images in a list. So it is strictly worse than the second solution, unless you will change how your images are loaded - load and process them one by one from disc.
The second solution needs to hold all images in memory at the same time. It is MUCH faster, because it is fully vectorized.
First solution (iterating over the images):
For each channel: R, G, B, here is how to calculate the means and stds of all the pixels in all the images:
Requirement:
Each image has the same number of pixels.
If this is not the case - use the second solution (below).
images_rgb = [np.array(Image.open(img).getdata()) / 255. for img in imgs_path]
# Each image_rgb is of shape (n, 3),
# where n is the number of pixels in each image,
# and 3 are the channels: R, G, B.
means = []
for image_rgb in images_rgb:
means.append(np.mean(image_rgb, axis=0))
mu_rgb = np.mean(means, axis=0) # mu_rgb.shape == (3,)
variances = []
for image_rgb in images_rgb:
var = np.mean((image_rgb - mu_rgb) ** 2, axis=0)
variances.append(var)
std_rgb = np.sqrt(np.mean(variances, axis=0)) # std_rgb.shape == (3,)
Proof
... that the mean and std will be same if calculated like this, and if calculated using all pixels at once:
Let's say each image has n pixels (with values vals_i), and there are m images.
Then there are (n*m) pixels.
The real_mean of all pixels in all vals_is is:
total_sum = sum(vals_1) + sum(vals_2) + ... + sum(vals_m)
real_mean = total_sum / (n*m)
Adding up the means of each image individually:
sum_of_means = sum(vals_1) / m + sum(vals_2) / m + ... + sum(vals_m) / m
= (sum(vals_1) + sum(vals_2) + ... + sum(vals_m)) / m
Now, what is the relationship between the real_mean and sum_of_means? - As you can see,
real_mean = sum_of_means / n
Analogously, using the formula for standard deviation, the real_std of all pixels in all vals_is is:
sum_of_square_diffs = sum(vals_1 - real_mean) ** 2
+ sum(vals_2 - real_mean) ** 2
+ ...
+ sum(vals_m - real_mean) ** 2
real_std = sqrt( total_sum / (n*m) )
If you look at this equation from another angle, you can see that real_std is basically the average of average variances of n values in m images.
Verification
Real mean and std:
rng = np.random.default_rng(0)
vals = rng.integers(1, 100, size=100) # data
mu = np.mean(vals)
print(mu)
print(np.std(vals))
50.93 # real mean
28.048976808432776 # real standard deviation
Comparing it to the image-by-image approach:
n_images = 10
means = []
for subset in np.split(vals, n_images):
means.append(np.mean(subset))
new_mu = np.mean(means)
variances = []
for subset in np.split(vals, n_images):
var = np.mean((subset - mu) ** 2)
variances.append(var)
print(new_mu)
print(np.sqrt(np.mean(variances)))
50.92999999999999 # calculated mean
28.048976808432784 # calculated standard deviation
Second solution (fully vectorized):
Using all the pixels of all images at once.
rgb_values = np.concatenate(
[Image.open(img).getdata() for img in imgs_path],
axis=0
) / 255.
# rgb_values.shape == (n, 3),
# where n is the total number of pixels in all images,
# and 3 are the 3 channels: R, G, B.
# Each value is in the interval [0; 1]
mu_rgb = np.mean(rgb_values, axis=0) # mu_rgb.shape == (3,)
std_rgb = np.std(rgb_values, axis=0) # std_rgb.shape == (3,)
I'm training a U-Net for extracting the area of buildings from satellite images. The results are not bad but I want to sharp the contours of the figures inside the image.
In order to improve it, I'm trying to use a weight map of the contours or borders of the figure inside the image.
Therefore, I'm trying to construct a map of weights with high values - e.g. 10 - on the borders and the values decaying from both sides. But I didn't know how to do it yet.
I have adapted from another code a solution that works for this. Here is the code:
def unet_weight_map(y, wc=None, w0 = 10, sigma = 20):
"""
Generate weight maps as specified in the U-Net paper
for boolean mask.
"U-Net: Convolutional Networks for Biomedical Image Segmentation"
https://arxiv.org/pdf/1505.04597.pdf
Parameters
----------
mask: Numpy array
2D array of shape (image_height, image_width) representing binary mask
of objects.
wc: dict
Dictionary of weight classes.
w0: int
Border weight parameter.
sigma: int
Border width parameter.
Returns
-------
Numpy array
Training weights. A 2D array of shape (image_height, image_width).
"""
y = y.reshape(y.shape[0], y.shape[1])
labels = label(y)
no_labels = labels == 0
label_ids = sorted(np.unique(labels))
if len(label_ids) > 0:
distances = np.zeros((y.shape[0], y.shape[1], len(label_ids)))
for i, label_id in enumerate(label_ids):
distances[:,:,i] = distance_transform_edt(labels != label_id)
distances = np.sort(distances, axis=2)
d1 = distances[:,:,0]
d2 = distances[:,:,1]
w = w0 * np.exp(-1/2*((d1 + d2) / sigma)**2) * no_labels
else:
w = np.zeros_like(y)
if wc:
class_weights = np.zeros_like(y)
for k, v in wc.items():
class_weights[y == k] = v
w = w + class_weights
return w
wc = {
0: 0, # background
1: 1 # objects
}
w = unet_weight_map(img, wc)
Input:
Output:
If someone has a better solution, please!
I have a tensor input of dimensions (B,C,H,W) and I would like to find a correlation matrix of the input. The code I am using is :
def corr(x):
"""
x: [B, C, H, W]
"""
# [B, C, H, W] -> [B, C, H * W]
x = x.view((x.size(0), x.size(1), -1))
# estimated covariance
x = x - x.mean(dim=-1, keepdim=True)
factor = 1 / (x.shape[-1] - 1)
cov = factor * (x # x.transpose(-1, -2))
return torch.div(cov,torch.diagonal(cov, dim1=-2, dim2=-1))
So I rechecked myself and it looks like I am getting good results for the cov variable in a function but when I try to normalize it to get the correlation, the result's range is very strange, there are values above 1 and below -1, and overall the solution does not seem to be right.
Any suggestions on how to solve the problem?
I'm trying to implement the loss function of the classic Image Colorization paper by Levin et al (2004) in Tensorflow/Keras:
This is the weights equation (correlation between intensities):
y is every neighboring pixel of x in a 3x3 window and w is the weight for each of these pixels.
The weights require computing the mean and variance for the neighborhood of every pixel.
I couldn't find a function that would allow me to write this loss function in a symbolic way, and I'm thinking I should write it in a loop where I calculate the w for each window.
How can I write this Loss function in Tensorflow In a Symbolic way or in loops?
Thanks so much.
EDIT: Here's the code I've come up for calculating the weights in Numpy:
import cv2
import numpy as np
im = cv2.resize(cv2.imread('./Image.jpg', 0), (256, 256)) / np.float32(255.0)
M = 3
N = 3
# Split the image into 3x3 windows
windows = [im[x:x + M, y:y + N] for x in range(0, im.shape[0], M) for y in range(0, im.shape[1], N)]
# Calculate the correlation for each window
weights = [1 + np.corrcoef(tile) for tile in windows]
I think this code computes the value in your formula:
import tensorflow as tf
from itertools import product
SIGMA = 1.0
dtype = tf.float32
# Input images batch
img = tf.placeholder(dtype, [None, None, None])
img_shape = tf.shape(img)
img_height = img_shape[1]
img_width = img_shape[2]
# Compute 3 x 3 block means
mean_filter = tf.ones((3, 3), dtype) / 9
img_mean = tf.nn.conv2d(img[:, :, :, tf.newaxis],
mean_filter[:, :, tf.newaxis, tf.newaxis],
[1, 1, 1, 1], 'VALID')[:, :, :, 0]
# Remove 1px border
img_clip = img[:, 1:-1, 1:-1]
# Difference between pixel intensity and its block mean
x_diff = img_clip - img_mean
# Compute neighboring pixel loss contributions
contributions = []
for i, j in product((-1, 0, 1), repeat=2):
if i == j == 0: continue
# Take "shifted" image
displaced_img = img[:, 1 + i:img_width - 1 + i, 1 + j:img_height - 1 + j]
# Compute difference with mean of corresponding pixel block
y_diff = displaced_img - img_mean
# Weights formula
weight = 1 + x_diff * y_diff / (SIGMA ** 2)
# Contribution of this displaced image to the loss of each pixel
contribution = weight * displaced_img
contributions.append(contribution)
contributions = tf.add_n(contributions)
# Compute loss value
loss = tf.reduce_sum(tf.squared_difference(img_clip, contributions))
The loss for the pixels along the image border is not computed, since in principle is not well defined in the formula, although you could make a few changes to take them into account if you want (change convolution to "'SAME'", pad where necessary, etc.).
this is a mean squared error of a 3 x 3 windows. right?
sounds like a GLCM matrix for texture analysis do you want apply this loss function for every 3x3 windows in the image?
I think that is better build the function that make this calculation with a Random weight in Numpy so after try build with TF to try a optimization.
I have a numpy array of images of shape (N, H, W, C) where N is the number of images, H the image height, W the image width and C the RGB channels.
I would like to standardize my images channel-wise, so for each image I would like to channel-wise subtract the image channel's mean and divide by its standard deviation.
I did this in a loop, which worked, however it is very inefficient and as it makes a copy my RAM is getting too full.
def standardize(img):
mean = np.mean(img)
std = np.std(img)
img = (img - mean) / std
return img
for img in rgb_images:
r_channel = standardize(img[:,:,0])
g_channel = standardize(img[:,:,1])
b_channel = standardize(img[:,:,2])
normalized_image = np.stack([r_channel, g_channel, b_channel], axis=-1)
standardized_images.append(normalized_image)
standardized_images = np.array(standardized_images)
How can I do this more efficiently making use of numpy's capabilities?
Perform the ufunc reductions (mean, std) along the second and third axes, while keeping the dims intact that help in broadcasting later on with the division step -
mean = np.mean(rgb_images, axis=(1,2), keepdims=True)
std = np.std(rgb_images, axis=(1,2), keepdims=True)
standardized_images_out = (rgb_images - mean) / std
Boost the performance further by re-using the average values to compute standard-deviation, according to its formula and hence inspired by this solution , like so -
std = np.sqrt(((rgb_images - mean)**2).mean((1,2), keepdims=True))
Packaging into a function with the axes for reductions as a parameter, we would have -
from __future__ import division
def normalize_meanstd(a, axis=None):
# axis param denotes axes along which mean & std reductions are to be performed
mean = np.mean(a, axis=axis, keepdims=True)
std = np.sqrt(((a - mean)**2).mean(axis=axis, keepdims=True))
return (a - mean) / std
standardized_images = normalize_meanstd(rgb_images, axis=(1,2))