Subtract mean from image - python

I'm implementing a CNN with Theano. According to the paper, I have to apply this image preprocessing before training the CNN:
We extracted RGB patches of 61x61 dimensions associated with each poselet activation, subtracted the mean and used this data to train the convnet model shown in Table 1
Can you tell me what "subtracted the mean" means? Tell me if these steps are correct (this is what I understood):
1) Compute the mean of the red channel, the green channel and the blue channel for the whole image
2) For each pixel, subtract the red-channel mean from the red value, the green-channel mean from the green value, and likewise for the blue channel
3) Is it correct to end up with negative values, or do I have to take the absolute value?
Thanks all!!

You should read the paper carefully, but most probably they mean the mean of the patches: you have N patches of 61x61 pixels, each equivalent to a vector of length 61^2 (with three channels, 3*61^2). They simply compute the mean of each dimension, i.e. the mean over these N vectors with respect to each of the 3*61^2 dimensions. As a result they obtain a mean vector of length 3*61^2 (or a mean matrix / mean patch, if you prefer) and subtract it from all N patches. The resulting patches will have negative values; that is perfectly fine, you should not take the absolute value, since neural networks prefer this kind of centered data.
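For concreteness, here is a minimal numpy sketch of that interpretation (the number of patches and the variable names are my own assumptions, not from the paper):

import numpy as np

# hypothetical stack of N RGB patches of 61x61 pixels, shape (N, 61, 61, 3)
patches = np.random.randint(0, 256, size=(1000, 61, 61, 3)).astype(np.float32)

# mean patch: one value per (row, column, channel) position, averaged over the N patches
mean_patch = patches.mean(axis=0)   # shape (61, 61, 3)

# center every patch; negative values are expected and fine
centered = patches - mean_patch
print(centered.mean())              # close to 0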

I would assume the mean mentioned in the paper is the mean over all images used in the training set (computed separately for each channel).
Several indications:
Caffe is a library for ConvNets. In their tutorial they mention a "compute image mean" step: http://caffe.berkeleyvision.org/gathered/examples/imagenet.html
For this they use the following script: https://github.com/BVLC/caffe/blob/master/examples/imagenet/make_imagenet_mean.sh
which does what I indicated.
Google played around with ConvNets and published their code here: https://github.com/google/deepdream/blob/master/dream.ipynb and they also use the mean of the training set.
This is of course only indirect evidence, since I cannot explain why this is done. In fact, I stumbled over this question while trying to figure out precisely that.
EDIT:
In the meantime I found a source confirming my claim (highlighting added by me):
There are three common forms of data preprocessing a data matrix X [...]
Mean subtraction is the most common form of preprocessing. It involves subtracting the mean across every individual feature in the data, and has the geometric interpretation of centering the cloud of data around the origin along every dimension. In numpy, this operation would be implemented as: X -= np.mean(X, axis = 0). With images specifically, for convenience it can be common to subtract a single value from all pixels (e.g. X -= np.mean(X)), or to do so separately across the three color channels.
As we can see, the whole dataset is used to compute the mean.
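As an illustration only (the array shapes and names below are mine, not from Caffe or the paper), computing and subtracting a per-channel mean over the whole training set could look like this:

import numpy as np

# hypothetical training set of RGB images, shape (N, H, W, 3)
X_train = np.random.rand(5000, 61, 61, 3).astype(np.float32)

# one mean per color channel, computed over all images and all pixel positions
channel_mean = X_train.mean(axis=(0, 1, 2))   # shape (3,)

# subtract the same training-set mean from training and test data alike
X_train_centered = X_train - channel_mean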


How to calculate batch normalization with python?

While implementing batch normalization in Python from scratch, I got confused. Please see below.
A paper presents some figures about normalization methods, and I think it may not be correct: both the description and the figure look wrong to me.
Description from the paper:
Figure from the paper:
In my opinion, the representation of batch normalization in the original paper is not correct, so I am posting the issue here for discussion.
I think batch normalization should look like the following figure.
The key point is how to calculate mean and std.
With feature maps of shape (batch_size, channel_number, width, height), should it be
mean = X.mean(axis=(0, 2, 3), keepdims=True)
or
mean = X.mean(axis=(0, 1), keepdims=True)
Which one is correct?
You should calculate the mean and std across all pixels in the images of the batch, so use the axis=(0, 2, 3) argument.
If the channels have roughly the same distributions, you may calculate the mean and std across channels as well; in that case just use mean() and std() without an axis argument.
The figure in the article is correct: it takes the mean and std across H and W (the image dimensions) for each batch. The channel dimension is simply not shown in the 3D cube.
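To make the axis=(0, 2, 3) convention concrete, here is a minimal from-scratch sketch of the forward pass (gamma, beta and eps are the usual learnable scale, shift and numerical-stability constant; this is an illustration, not the paper's reference code):

import numpy as np

def batch_norm_forward(X, gamma, beta, eps=1e-5):
    # X has shape (batch_size, channels, height, width);
    # statistics are computed per channel, over the batch and spatial dimensions
    mean = X.mean(axis=(0, 2, 3), keepdims=True)   # shape (1, C, 1, 1)
    var = X.var(axis=(0, 2, 3), keepdims=True)
    X_hat = (X - mean) / np.sqrt(var + eps)
    return gamma * X_hat + beta

X = np.random.randn(8, 3, 32, 32)
gamma = np.ones((1, 3, 1, 1))
beta = np.zeros((1, 3, 1, 1))
out = batch_norm_forward(X, gamma, beta)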

MNIST dataset conversion

I've trained a decision tree on a handwritten-digit dataset in which each digit is represented by 8 x-y points sampled along the trajectory of the number. The test dataset given (for an assignment) is MNIST, which provides the pixel intensities of a 28x28 bitmap image. I need to sample 8 points along the trajectory of the number so that the classifier performs well.
I'm doing this in Python. I don't know what to do with the image to sample those points. Any package or procedure would help.
Simply index the array as you would any other array; the pixel intensities are merely ints, e.g. val = arr[3, 9].
You do not have the stroke direction in MNIST, hence you cannot reliably infer such positions.
You can do the opposite though: render the stroke information as a pixel image, train a classifier on that, and then test it on MNIST.
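A rough sketch of that opposite direction, rendering 8 sampled (x, y) stroke points into a 28x28 bitmap (the point format and the [0, 1] normalization are assumptions about your handwritten dataset):

import numpy as np

def points_to_image(points, size=28):
    # render a sequence of (x, y) points with coordinates in [0, 1] as a size x size bitmap
    img = np.zeros((size, size), dtype=np.float32)
    for x, y in points:
        col = min(int(x * (size - 1)), size - 1)
        row = min(int(y * (size - 1)), size - 1)
        img[row, col] = 1.0
    return img

# hypothetical digit described by 8 normalized stroke points
stroke = [(0.1, 0.9), (0.2, 0.7), (0.3, 0.5), (0.4, 0.3),
          (0.5, 0.2), (0.6, 0.3), (0.7, 0.5), (0.8, 0.7)]
bitmap = points_to_image(stroke)

To get closer to MNIST's appearance you would also want to draw line segments between consecutive points and blur the result, but the indexing above is the core idea.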
There's an MNIST sequence dataset from Edwin de Jong:
Paper: https://arxiv.org/pdf/1611.03068.pdf
Blog: https://edwin-de-jong.github.io/blog/mnist-sequence-data/
Github: https://github.com/edwin-de-jong/mnist-digits-as-stroke-sequences/
and an MNIST classification using RNN by Ryan Epp:
https://www.ryanepp.com/blog/mnist-classification-using-stroke-paths
In both projects, the direction a stroke takes at a T-junction depends on the algorithm and is often counterintuitive. This means there is more to learn for sequences, since many different stroke patterns produce the same image.

t-SNE High Dimension Data Visualisation

I have a Twitter corpus which I am using to build a sentiment analysis application. The corpus has 5k tweets which have been hand-labelled as negative, neutral or positive.
To represent the text, I am using gensim's pretrained word2vec vectors. Each word is mapped to 300 dimensions. For a tweet, I add all of its word vectors to get a single 300-dimensional vector. Thus every tweet is mapped to a single vector of 300 dimensions.
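For reference, a minimal sketch of that representation using gensim's downloader (the exact pretrained model name, the whitespace tokenization and the handling of out-of-vocabulary words are my assumptions, not part of the question):

import numpy as np
import gensim.downloader

# pretrained 300-dimensional word2vec vectors (large download)
kv = gensim.downloader.load("word2vec-google-news-300")

def tweet_vector(tweet, kv, dim=300):
    words = [w for w in tweet.lower().split() if w in kv]
    if not words:
        return np.zeros(dim)
    # sum the word vectors to get one vector per tweet
    return np.sum([kv[w] for w in words], axis=0)

vec = tweet_vector("great phone but terrible battery", kv)   # shape (300,)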
I am visualizing my data using t-SNE (the tsne Python package). See the attached image 1: red points = negative tweets, blue points = neutral tweets, green points = positive tweets.
Question:
In the plot there is no clear separation (boundary) among the data points. Can I assume this will also be the case for the original points in 300 dimensions?
I.e. if points overlap in the t-SNE plot, do they also overlap in the original space, and vice versa?
Question: In the plot there is no clear separation (boundary) among the data points. Can I assume this will also be the case for the original points in 300 dimensions?
In most cases, no. By reducing dimensions you will probably lose some information.
The case where you can reduce dimensionality without losing information is when the data is zero in some dimensions (for example, a line in 3-dimensional space) or when some dimensions are linearly dependent on others.
There are a few tricks to test how well a dimensionality reduction technique works. For example:
You can use PCA to reduce the dimension from 300 down to, say, 10. Compute the sum of all 300 eigenvalues (the original space) and the sum of the 10 largest eigenvalues (these correspond to the eigenvectors that will be used for the reduction). The ratio sum(top-10-eigenvalues)/sum(300-eigenvalues) is the fraction of variance retained, and one minus it is roughly the fraction lost. This is not exactly "information", but it is close to it.
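A small sketch of that check with scikit-learn; the 300-dimensional data here is random, just to show the mechanics:

import numpy as np
from sklearn.decomposition import PCA

X = np.random.randn(5000, 300)                   # stand-in for the tweet vectors

pca = PCA(n_components=10).fit(X)
retained = pca.explained_variance_ratio_.sum()   # sum(top-10) / sum(all 300)
print("variance retained: %.2f%%, lost: %.2f%%" % (100 * retained, 100 * (1 - retained)))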

Caffe, how to predict from a pretrained net

I'm using this code to load my net:
net = caffe.Classifier(MODEL_FILE, PRETRAINED,
                       mean=np.load(caffe_root + 'python/caffe/imagenet/ilsvrc_2012_mean.npy').mean(1).mean(1),
                       channel_swap=(2, 1, 0),
                       raw_scale=255,
                       image_dims=(256, 256))
I have doubts about three lines.
1- mean=np.load(caffe_root + 'python/caffe/imagenet/ilsvrc_2012_mean.npy').mean(1).mean(1)
What is mean? Should I use this mean value or another one? And if so, where can I get a custom mean value? I'm using a custom dataset.
2- channel_swap=(2,1,0)
What does channel_swap mean? And again, should I use this value or a custom one?
And the last one:
3- raw_scale=255
What is raw_scale? And what value should I use?
I'm using Cohn Kanade dataset. All images are 64x64 and in grayscale.
The channel_swap is to reverse RGB into BGR, which is apparently necessary if you use the reference ImageNet model, based on a comment in [1]. In your case the images are grayscale, so you probably do not have three channels. You might need to set it to (0, 0, 0), but even that might not help (I am unsure about the exact implementation of channel_swap). If that does not help, the simplest solution might be to preprocess your data by splitting every pixel into three equal values (R, G and B). After that you can drop channel_swap altogether, because your channels have the same value and swapping them is a no-op.
Mean is what will be subtracted from your input data to center it. (Remember that neural networks need the data to have zero mean, while input images usually have a positive mean, hence the need for the subtraction.) The mean you subtract should be the same one that was used for training, so using the mean from the file associated with the model is correct. I am not sure, however, whether you should call .mean(1) on it -- did you get that line from some example? If so, it is most likely the correct thing to do.
raw_scale is the scale of your input data. The model expects pixels to be normalized, so if your input data has values between 0 and 255, then raw_scale set to 255 is correct. If your data has values between 0 and 1, then raw_scale should be set to 1.
Finally, based on my understanding of the comment in [2], you do not need to provide image_dims.
[1] https://github.com/BVLC/caffe/blob/master/python/caffe/io.py#L204
[2] https://github.com/BVLC/caffe/blob/master/python/caffe/classifier.py#L18
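To get a mean for your own dataset (instead of ilsvrc_2012_mean.npy), one option is to average your training images yourself. A hedged sketch, assuming the images are already loaded as a numpy array in Caffe's (N, channels, height, width) layout and that the file names are placeholders:

import numpy as np

# hypothetical training images, shape (N, channels, height, width)
train_imgs = np.load("train_images.npy")

# per-channel mean, analogous to calling .mean(1).mean(1) on the ILSVRC mean blob
custom_mean = train_imgs.mean(axis=(0, 2, 3))   # shape (channels,)
np.save("custom_mean.npy", custom_mean)

# then pass it to caffe.Classifier(..., mean=custom_mean, ...)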
I agree with the comments of #Ishamael on channel_swap and mean. I just want to add a further clarification on raw_scale. Assuming that images are loaded with caffe.io.load_image, values are always in the range 0 to 1 [1]. Just to note that:
While Python represents images in [0, 1], certain Caffe models such as CaffeNet and AlexNet represent images in [0, 255], so the raw_scale of these models must be 255.
I also think it's wise to check the input image values before feeding them to the data layer of the network, in order to choose an appropriate raw_scale; a sketch of that check is given after the reference below.
Thank you.
[1] https://github.com/BVLC/caffe/blob/master/python/caffe/io.py#L224
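A quick way to do that check, assuming your images can be read with caffe.io.load_image (the path is a placeholder):

import caffe

img = caffe.io.load_image("some_image.png")   # returns float values in [0, 1]
print(img.shape, img.min(), img.max())
# if the values are in [0, 1] and the model expects [0, 255], keep raw_scale=255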

How would I translate this equation into code?

I am working in Python, and I am trying to compute a weight matrix for a graph of pixels, where the weight of each edge depends on the pixels' "feature" similarity (F(i) - F(j)) and their location similarity (X(i) - X(j)). "Features" include intensity, color and texture.
Right now I have it implemented and it is working, but not for color images. At first I tried to simply average the RGB values of each pixel to convert the entire image to greyscale, but that didn't work as I had hoped, and I have read through a paper that suggests a different method.
They say to use this: F(i) = [v, v*s*sin(h), v*s*cos(h)](i)
where h, s and v are the HSV color values.
I am just confused by the notation. What is this supposed to mean? What does it mean to have three different terms separated by commas inside square brackets? I'm also confused about what the (i) at the end is supposed to mean. Shouldn't F(i) for any given pixel be a single number, so that F(i) - F(j) can be carried out?
I'm not asking for someone to do this for me I just need some clarification.
Features can be vectors, and you can calculate the distance between vectors:
import numpy
f1 = numpy.array([1, 2, 3])
f2 = numpy.array([0, 2, 3])
distance = numpy.linalg.norm(f1 - f2)
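For completeness, a small sketch of the F(i) = [v, v*s*sin(h), v*s*cos(h)] feature for a single pixel, using the standard colorsys module (treating the hue as an angle; whether the paper measures h in degrees or radians is an assumption you should check against the paper):

import math
import colorsys
import numpy

def pixel_feature(r, g, b):
    # map an RGB pixel with values in [0, 1] to the 3-element feature vector
    h, s, v = colorsys.rgb_to_hsv(r, g, b)   # colorsys returns h, s, v in [0, 1]
    angle = 2 * math.pi * h                  # interpret the hue as an angle in radians
    return numpy.array([v, v * s * math.sin(angle), v * s * math.cos(angle)])

f_i = pixel_feature(0.8, 0.2, 0.1)
f_j = pixel_feature(0.7, 0.3, 0.1)
print(numpy.linalg.norm(f_i - f_j))          # the distance used in the edge weight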
