Increasing accuracy by changing batch-size and input image size - python

I am extracting a road network from satellite imagery. Herein the pixel classification is binary ( 0 = non-road, 1 = road). Hence, the mask of the complete satellite image which is 6400 x 6400 pixels shows one large road network where each road is connected to another road. For the implementation of the U-net I divided that large image in 625 images of 256 x 256 pixels.
My question is: Can a neural network easier find structure with an increase in batch size (thus can it find structure between different batches), or can it only find structure if the input image size is enlarged?

If your model is a regular convolutional network (without any weird hacks), the samples in a batch will not be connected to each other.
Depending on which loss function you use, the batch size might be important too. For regular functions (available 'mse', 'binary_crossentropy', 'categorical_crossentropy', etc.), they all keep the samples independent from each other. But some losses might consider the entire batch. (F1 metrics, for instance). If you're using a loss function that doesn't treat samples independently, then the batch size is very important.
That said, having a bigger batch size may help the net to find its way more easily, since one image might push weights towards one direction, while another may want a different direction. The mean results of all images in the batch should then be more representative of a general weight update.
Now, entering an experimenting field (we never know everything about neural networks until we test them), consider this comparison:
a batch with 1 huge image versus
a batch of patches of the same image
Both will have the same amount of data, and for a convolutional network, it wouldn't make a drastic difference. But for the first case, the net will probably be better at finding connections between roads, maybe find more segments where the road might be covered by something, while the small patches, being full of borders might be looking more into textures and be not good at identifying these gaps.
All of this is, of course, a guess. Testing is the best.
My net in a GPU cannot really use big patches, which is bad for me...

Related

Massive neural network training time increase by inverting images in a data set

I have been working with neural networks for a few months now and I have a little mystery that I can't solve on my own.
I wanted to create and train a neural network which can identify simple geometric shapes (squares, circles, and triangles) in 56*56 pixel greyscale images. If I use images with a black background and a white shape, everything work pretty well. The training time is about 18 epochs and the accuracy is pretty close to 100% (usually 99.6 % - 99.8%).
But all that changes when I invert the images (i.e., now a white background and black shapes). The training time skyrockets to somewhere around 600 epochs and during the first 500-550 epochs nothing really happens. The loss barely decreases in those first 500-550 epochs and it just seems like something is "stuck".
Why does the training time increase so much and how can I reduce it (if possible)?
Color inversion
You have to essentially “switch” WxH pixels, hence touching every possible pixel during augmentation for every image, which amounts to lots of computation.
In total it would be DxWxH operations per epoch (D being size of your dataset).
You might want to precompute these and feed your neural network with them afterwards.
Loss
It is harder for neural networks as white is encoded with 1, while black is encoded with 0. Inverse giving us 0 for white and 1 for black.
This means most of neural network weights are activated by background pixels!
What is more, every sensible signal (0 in case of inversion) is multiplied by zero value and has not effect on final loss.
With hard {0, 1} encoding neural network tries to essentially get signal from the background (now 1 for black pixels) which is mostly meaningless (each weight will tend to zero or almost zero as it bears little to no information) and what it does instead is fitting distribution to your labels (if I predict 1 only I will get smaller loss, no matter the input).
Experiment if you are bored
Try to encode your pixels with smooth values, e.g. white being 0.1 and black being 0.9 (although 1.0 might work okayish, although more epochs might be needed as 0 is very hard to obtain via backprop) and see what the results are.

Number of Conv2d Layers and Filters for Small Image Classification Task

If I'm working with a dataset where I have ~100,000 training images and ~20,000 validation images, each of size 32 x 32 x 3, how does the size and the dimensions of my dataset affect the number of Conv2d layers I have in my CNN? My intuition is to use fewer Conv2d layers, 2-3, because any more than 3 layers will be working with parts of the image that are too small to gain relevant data from.
In addition, does it make sense to have layers with a large number of filters, >128? My thought is that when dealing with small images, it doesn't make sense to have a large number of parameters.
Since you have the exact input size like the images in Cifar10 and Cifar100 just have a look what people tried out.
In general you can start with something like a ResNet18. Also I don't quite understand why you say
because any more than 3 layers will be working with parts of the image that are too small to gain relevant data from.
As long as you don't downsample using something like max pooling or a conv with padding 1 and stride 2. The size of 32x32 will be the same and only the number of channels will change depending on the network.
Designing networks is almost always at looking what did other people do and what worked for them and starting from there. You almost never want to do it from scratch on your own, since the iteration cycles are just to long and models released by researches from Google, Facebook ... had way more resources then you will ever have to find something good.

Keras - Using large numbers of features

I'm developing a Keras NN that predicts the label using 20,000 features. I can build the network, but have to use system RAM since the model is too large to fit in my GPU, which has meant it's taken days to run the model on my machine. The input is currently 500,20000,1 to an output of 500,1,1
-I'm using 5,000 nodes in the first fully connected (Dense) layer. Is this sufficient for the number of features?
-Is there a way of reducing the dimensionality so as to run it on my GPU?
I suppose each input entry has size (20000, 1) and you have 500 entries which make up your database?
In that case you can start by reducing the batch_size, but I also suppose that you mean that even the network weights don't fit in you GPU memory. In that case the only thing (that I know of) that you can do is dimensionality reduction.
You have 20000 features, but it is highly unlikely that all of them are important for the output value. With PCA (Principal Component Analysis) you can check the importance of all you parameters and you will probably see that only a few of them combined will be 90% or more important for the end result. In this case you can disregard the unimportant features and create a network that predicts the output based on let's say only 1000 (or even less) features.
An important note: The only reason I can think of where you would need that many features, is if you are dealing with an image, a spectrum (you can see a spectrum as a 1D image), ... In this case I recommend looking into convolutional neural networks. They are not fully-connected, which removes a lot of trainable parameters while probably performing even better.

How to tackle different Image dimensions for classification

I am working on a problem where I have to classify images into different groups. I am a beginner and working with Keras with simple sequence model. How should I tackle the problem of images with different dimension in below code e.g. some images have dimension 2101583 while some have 210603 etc. Please suggest.
model.add(Dense(100,input_dim = ?,activation= "sigmoid"))
model.add(Dense(100,input_dim = ?,activation= "sigmoid"))
For simple classification algorithms in Keras the input should always be the same size. Since Feedforward Neural Networks are consisting of an input layer, one to many hidden layers and one output layer, with all nodes connected, you should always have an Input at each Input-Node. Moreover, the shape and other hyperparameters of the Neural Network are static, so you can't change the number of inputs, and therefor the size for each Image in one Neural Network.
The best practice for your case would be to either downsize all Images to the size of your smallest Image, or upsize all Images to the size of your largest Image.
Downsizing
With downsizing you would actively delete pixels from your Image, including information contained in the pixels. This can lead to overfitting, but would decrease the computational time, too.
Upsizing
With upsizing you would add pixels to your image, without adding information. This would increase computational time, but you would keep the inforamtion of each Image.
For a good start I would suggest you to try and downsize your Images to the smallest one. This is a common practice in science as well [1]. One library to do so is OpenCV, for implementation issues, please refer to the multiple questions on Stackoverflow:
Python - resize image
Opencv-Python-Resizing image

How to train different size of image using cnn? [duplicate]

I am trying to train my model which classifies images.
The problem I have is, they have different sizes. how should i format my images/or model architecture ?
You didn't say what architecture you're talking about. Since you said you want to classify images, I'm assuming it's a partly convolutional, partly fully connected network like AlexNet, GoogLeNet, etc. In general, the answer to your question depends on the network type you are working with.
If, for example, your network only contains convolutional units - that is to say, does not contain fully connected layers - it can be invariant to the input image's size. Such a network could process the input images and in turn return another image ("convolutional all the way"); you would have to make sure that the output matches what you expect, since you have to determine the loss in some way, of course.
If you are using fully connected units though, you're up for trouble: Here you have a fixed number of learned weights your network has to work with, so varying inputs would require a varying number of weights - and that's not possible.
If that is your problem, here's some things you can do:
Don't care about squashing the images. A network might learn to make sense of the content anyway; does scale and perspective mean anything to the content anyway?
Center-crop the images to a specific size. If you fear you're losing data, do multiple crops and use these to augment your input data, so that the original image will be split into N different images of correct size.
Pad the images with a solid color to a squared size, then resize.
Do a combination of that.
The padding option might introduce an additional error source to the network's prediction, as the network might (read: likely will) be biased to images that contain such a padded border.
If you need some ideas, have a look at the Images section of the TensorFlow documentation, there's pieces like resize_image_with_crop_or_pad that take away the bigger work.
As for just don't caring about squashing, here's a piece of the preprocessing pipeline of the famous Inception network:
# This resizing operation may distort the images because the aspect
# ratio is not respected. We select a resize method in a round robin
# fashion based on the thread number.
# Note that ResizeMethod contains 4 enumerated resizing methods.
# We select only 1 case for fast_mode bilinear.
num_resize_cases = 1 if fast_mode else 4
distorted_image = apply_with_random_selector(
distorted_image,
lambda x, method: tf.image.resize_images(x, [height, width], method=method),
num_cases=num_resize_cases)
They're totally aware of it and do it anyway.
Depending on how far you want or need to go, there actually is a paper here called Spatial Pyramid Pooling in Deep Convolution Networks for Visual Recognition that handles inputs of arbitrary sizes by processing them in a very special way.
Try making a spatial pyramid pooling layer. Then put it after your last convolution layer so that the FC layers always get constant dimensional vectors as input . During training , train the images from the entire dataset using a particular image size for one epoch . Then for the next epoch , switch to a different image size and continue training .

Categories