for my exam based around data crunching, we've received a small simpsons dataset of 4 characters (Bart, Homer, Lisa, Marge) to build a convolutional neural network around. However, the dataset contains only a rather small amount of images: around 2200 to split into test & train.
Since I'm very new to neural networks and deep learning, is it acceptable to augment my data (i'm turning the images X degrees 9 times) and splitting my data afterwards using sklearn's testtrainsplit function.
Since I've made this change, I'm getting a training and test accuracy of around 95% after 50 epochs with my current model. Since that's more than I've expected to get, I started questioning if augmenting test-data mainly is accepted without having a biased or wrong result in the end.
so:
a) Can you augment your data before splitting it with sklearn's TrainTestSplit without influencing your results in a wrong way?
b) if my method is wrong, what's another method I could try out?
Thanks in advance!
One should augment the data after Train and Test split. To work correctly one needs to make sure to augment data only from the train split.
If one augments data and before splitting the dataset, it will likely inject small variations of the train dataset into the test dataset. Thus the network will be overestimating its accuracy (and it might be over-fitting as well, among other issues).
A good way to avoid this pitfall it is to augment the data after the original dataset was split.
A lot of libraries implement python generators that randomly apply one or more combination of image modifications to augment the data. These might include
Image rotation
Image Shearing
Image zoom ( Cropping and re-scaling)
Adding noise
Small shift in hue
Image shifting
Image padding
Image Blurring
Image embossing
This github library has a good overview of classical image augmentation techniques: https://github.com/aleju/imgaug ( I have not used this library. Thus cannot endorse it speed or implementation quality, but their overview in README.md seems to be quite comprehensive.)
Some neural network libraries already have some utilities to do that. For example: Keras has methods for Image Preprocessing https://keras.io/preprocessing/image/
Related
I'm working on a project that requires training a PyTorch framework NN on a very large dataset of images. Some of these images are completely irrelevant to the problem, and but these irrelevant images are not labelled as such. However, there are some metrics I can use to calculate if they are irrelevant (e.g. summing all the pixel values would give me a good sense of which are the relevant images and which are not). What I would ideally like to do is have a Dataloader that can take in a Dataset class, and create batches only with the relevant images. The Dataset class would just know the list of images and their labels, and the Dataloader would interpret whether or not the image it is making a batch with is relevant or not, and would then only make batches with relevant images.
To apply this to an example, lets say I have a dataset of black and white images. The white images are irrelevant, but they are not labelled as such. I want to be able to load batches from a file location, and have these batches only contain the black images. I could filter at some point by summing all the pixels and finding it equals to 0.
What I am wondering is if a custom Dataset, Dataloader, or Sampler would be able to solve this task for me? I already have written a custom Dataset that stores the directory of all the saved images, and a list of all the images in that directory, and can return an image with its label in the getitem function. Is there something more I should add there to filter out certain images? Or should that filter be applied in a custom Dataloader, or Sampler?
Thank you!
I'm assuming that your image dataset belongs to two classes (0 or 1) but it's unlabeled. As #PranayModukuru mentioned that you can determine the similarity by using some measure (e.g aggregating all the pixels intensity values of a image, as you mentioned) in the getitem function in tour custom Dataset class.
However, determining the similarity in getitem function while training your model will make the training process very slow. So, i would recommend you to approximate the similarity before start training (not in the getitem function). Moreover if your image dataset is comprised of complex images (not black and white images) it's better to use a pretrained deep learning model (e.g. resnet or autoencoder) for dimentionality reduction followed by applying clustering approach (e.g. agglomerative clustering) to label your image.
In the second approach you only need to label your images for exactly one time and if you apply augmentation on images while training you don't need to re-determine the similarity (label) in the getitem funcion. On the other hand, in the first approach you need to determine the similarity (label) every time (after applying transformation on images) in the getitem function which is redundant, unnecessary and time consuming.
Hope this will help.
It sounds like your goal is to totally remove the irrelevant images from training.
The best way to deal with this would be to figure out the filenames of all the relevant images up front and save their filenames to a csv or something. Then pass only the good filenames to your dataset.
The reason is you will run through your dataset multiple times during training. This means you will be loading, analyzing and discarding irrelevant images over and over again, which is a waste of compute.
It's better to do this sort of preprocessing/filtering once upfront.
I have a small acoustic dataset of human sounds which I would like to augment and later pass to a binary classifier.
I am familiar with data augmentation for images, but how is it done for acoustic datasets?
I've found 2 related answers regarding autoencoders and SpecAugment with Pytorch & TorchAudio
but I would like to hear your thoughts about the audio-specific "best method".
It really depends on what are you trying to achieve, what your classifier is designed for and how it works.
Depending on the above, you can for example cut the audio differently (if you are feeding the classifier with cut audio segments, and that makes sense in your particular case). You can also augment it with some background noise (artificial like white noise, or recorded one) with different signal to noise ratio - this should additionally make the classifier more robust against noise.
Where can I find details to implement siamese networks to perform image similarity and to retrieve the most similar image from a dataset
It is difficult to get a large number of image data for all the classes, so only a few images, eg 10 images for some classes, are available for most of the classes.
SIFT or ORB seems to perform poorly on some classes.
My project is to differentiate between the license plates based on the states of the UAE. Here I upload few example images.
When there is few training data, no matter how annoying it sounds, the best approach is usually to collect more. Deep networks are infamously data hungry and their performance is poor when the data is scarce. This said, there are approaches that might help you:
Transfer learning
Data augmentation
In transfer learning, you take an already trained deep net (e.g. ResNet50), which was trained for some other task (e.g. ImageNet), fix all its network weights except for the weights in the last few layers and train on your task of interest.
Data augmentation slightly modifies your training data in some predictable way. In your case you can rotate your image by a small angle, apply a perspective transformation, scale the image intensities or slightly change the colors. You apply a different set of these operations with different parameters every time you want to use a particular training image. This way you generate new training examples enlarging your training set.
When preparing train set for neural network training, I find two possible way.
The traditional way: calculate the mean on whole training set, and minus this fixed mean value per image before sending to network. Processing standard deviation in the similar way.
I find tensorflow provides a function tf.image.per_image_standardization that do normalization on single image.
I wonder which way is more appropriate?
Both ways are possible and the choice mostly depends on the way you read the data.
Whole training set normalization is convenient when you can load the whole dataset at once into a numpy array. E.g., MNIST dataset is usually loaded fully into memory. This way is also preferable in terms of convergence, when the individual images vary significantly: two training images, one is mostly white and the other is mostly black, will have very different means.
Per image normalization is convenient when the images are loaded one by one or in small batches, for example from the TFRecord. It's also the only viable option when the dataset is too large too fit in memory. In this case, it's better to organize the input pipeline in tensorflow and transform the image tensors just like other tensors in the graph. I've seen pretty good accuracy with this normalization in CIFAR-10, so it's a viable way, despite the issues stated earlier. Also note that you can reduce the negative effect via batch normalization.
I have a project that use Deep CNN to classify parking lot. My idea is to classify every space whether there is a car or not. and my question is, how do i prepare my image dataset to train my model ?
i have downloaded PKLot dataset for training included negative and positive image.
should i turn all my data training image to grayscale ? should i rezise all my training image to one fix size? (but if i resize my training image to one fixed size, i have landscape and portrait image). Thanks :)
This is an extremely vague question since every image processing algorithm has different approaches to extracting features. However, in your parking lot example, you would probably need to do RGB to Greyscale conversion and Size normalization among other image processing techniques.
A great starting point would be in this link: http://www.scipy-lectures.org/advanced/image_processing/
First detect the cars present in the image, and obtain their size and alignment. Then go for segmentation and labeling of the parking lot by fixing a suitable size and alignment.
as you want use pklot dataset for training your machine and test with real data, the best approach is to make both datasets similar and homological, they must be normalized , fixed sized , gray-scaled and parameterized shapes. then you can use Scale-invariant feature transform (SIFT) for image feature extraction as basic method.the exact definition often depends on the problem or the type of application. Since features are used as the starting point and main primitives for subsequent algorithms, the overall algorithm will often only be as good as its feature detector. you can use these types of image features based on your problem:
Corners / interest points
Edges
Blobs / regions of interest points
Ridges
...