I have a project that uses a deep CNN to classify parking lot occupancy. My idea is to classify every space according to whether there is a car in it or not. My question is: how do I prepare my image dataset to train my model?
I have downloaded the PKLot dataset for training, which includes negative and positive images.
Should I convert all my training images to grayscale? Should I resize all my training images to one fixed size? (But if I resize to one fixed size, I have both landscape and portrait images.) Thanks :)
This is an extremely vague question, since every image processing algorithm takes a different approach to extracting features. However, in your parking lot example you would probably need RGB-to-grayscale conversion and size normalization, among other image processing techniques.
A great starting point is this link: http://www.scipy-lectures.org/advanced/image_processing/
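For the parking lot case, a minimal preprocessing sketch along those lines might look like the following (assuming OpenCV; the 64x64 target size and the file paths are arbitrary placeholders, and padding before resizing would better preserve aspect ratio for mixed portrait/landscape crops):

```python
# Minimal sketch: grayscale conversion + size normalization with OpenCV.
# The 64x64 target size is an arbitrary choice, not a PKLot requirement.
import cv2
import numpy as np

def preprocess(path, size=(64, 64)):
    img = cv2.imread(path)                        # loaded as BGR
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # colour -> grayscale
    resized = cv2.resize(gray, size)              # normalize to one fixed size
    return resized.astype(np.float32) / 255.0     # scale pixels to [0, 1]

# X_train = np.stack([preprocess(p) for p in training_paths])  # training_paths is assumed
```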
First detect the cars present in the image and obtain their size and alignment. Then segment and label the parking spaces using a suitable fixed size and alignment.
Since you want to use the PKLot dataset for training and then test on real data, the best approach is to make both datasets similar and homogeneous: they should be normalized, resized to a fixed size, gray-scaled, and have consistently parameterized shapes. Then you can use the Scale-Invariant Feature Transform (SIFT) for image feature extraction as a basic method. The exact definition of a "feature" often depends on the problem or the type of application; since features are the starting point and main primitives for subsequent algorithms, the overall algorithm will often only be as good as its feature detector. Depending on your problem, you can use these types of image features (see the SIFT sketch after this list):
Corners / interest points
Edges
Blobs / regions of interest points
Ridges
...
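As an illustration, here is a hedged SIFT extraction sketch with OpenCV (the file name is a placeholder, and cv2.SIFT_create may require a recent OpenCV or the contrib package depending on your version):

```python
# Sketch of SIFT keypoint/descriptor extraction with OpenCV.
import cv2

img = cv2.imread("parking_space.png", cv2.IMREAD_GRAYSCALE)  # placeholder file
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
if descriptors is not None:
    print(len(keypoints), descriptors.shape)  # N keypoints, each with a 128-dim descriptor
```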
I'm working on a project that requires training a PyTorch NN on a very large dataset of images. Some of these images are completely irrelevant to the problem, but these irrelevant images are not labelled as such. However, there are some metrics I can use to tell whether they are irrelevant (e.g. summing all the pixel values would give me a good sense of which images are relevant and which are not). What I would ideally like to do is have a DataLoader that can take in a Dataset class and create batches only with the relevant images. The Dataset class would just know the list of images and their labels, and the DataLoader would determine whether the image it is adding to a batch is relevant, and would then only make batches with relevant images.
To apply this to an example, let's say I have a dataset of black and white images. The white images are irrelevant, but they are not labelled as such. I want to be able to load batches from a file location and have these batches only contain the black images. I could filter at some point by summing all the pixels and checking whether the sum equals 0.
What I am wondering is whether a custom Dataset, DataLoader, or Sampler would be able to solve this task for me. I have already written a custom Dataset that stores the directory of all the saved images and a list of all the images in that directory, and can return an image with its label in the __getitem__ function. Is there something more I should add there to filter out certain images? Or should that filter be applied in a custom DataLoader or Sampler?
Thank you!
I'm assuming that your image dataset belongs to two classes (0 or 1) but is unlabelled. As @PranayModukuru mentioned, you can determine the similarity using some measure (e.g. aggregating all the pixel intensity values of an image, as you mentioned) in the __getitem__ function of your custom Dataset class.
However, determining the similarity in __getitem__ while training your model will make the training process very slow. So I would recommend approximating the similarity before you start training (not in __getitem__). Moreover, if your dataset is made up of complex images (not just black and white ones), it's better to use a pretrained deep learning model (e.g. a ResNet or an autoencoder) for dimensionality reduction, followed by a clustering approach (e.g. agglomerative clustering) to label your images.
With the second approach you only need to label your images once, and if you apply augmentation while training you don't need to re-determine the similarity (label) in __getitem__. With the first approach, on the other hand, you need to determine the similarity (label) every time (after applying transformations to the images) in __getitem__, which is redundant, unnecessary and time consuming.
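As a rough sketch of that second approach (the choice of ResNet-18, the transforms and the image_paths list are assumptions for illustration, not a prescription):

```python
# Label the dataset once, before training: pretrained ResNet features
# followed by agglomerative clustering into two groups.
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image
from sklearn.cluster import AgglomerativeClustering

resnet = models.resnet18(pretrained=True)
resnet.fc = torch.nn.Identity()      # drop the classifier head, keep the embedding
resnet.eval()

tfm = T.Compose([T.Resize((224, 224)), T.ToTensor(),
                 T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

@torch.no_grad()
def embed(paths):
    feats = [resnet(tfm(Image.open(p).convert("RGB")).unsqueeze(0)).squeeze(0).numpy()
             for p in paths]
    return np.stack(feats)

# image_paths is an assumed list of file paths; inspect the two clusters afterwards
# to decide which one corresponds to the "relevant" images.
labels = AgglomerativeClustering(n_clusters=2).fit_predict(embed(image_paths))
```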
Hope this will help.
It sounds like your goal is to totally remove the irrelevant images from training.
The best way to deal with this would be to figure out the filenames of all the relevant images up front and save their filenames to a csv or something. Then pass only the good filenames to your dataset.
The reason is you will run through your dataset multiple times during training. This means you will be loading, analyzing and discarding irrelevant images over and over again, which is a waste of compute.
It's better to do this sort of preprocessing/filtering once upfront.
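A minimal sketch of that idea (the directory name and the pixel-sum rule from the black/white example are placeholders):

```python
# One-off pass: keep only the relevant filenames (all-black images in the toy
# example, i.e. pixel sum == 0), save them to a CSV, and let the Dataset read that.
import csv
import os
import numpy as np
from PIL import Image
from torch.utils.data import Dataset

root = "images/"                                   # placeholder directory
good = [f for f in sorted(os.listdir(root))
        if np.asarray(Image.open(os.path.join(root, f))).sum() == 0]
with open("relevant.csv", "w", newline="") as fh:
    csv.writer(fh).writerows([[f] for f in good])

class FilteredDataset(Dataset):
    """Only ever sees the pre-filtered filenames, so no work is wasted per epoch."""
    def __init__(self, root, csv_path, transform=None):
        with open(csv_path) as fh:
            self.files = [row[0] for row in csv.reader(fh)]
        self.root, self.transform = root, transform

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        img = Image.open(os.path.join(self.root, self.files[idx])).convert("RGB")
        return self.transform(img) if self.transform else img
```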
I trained a road sign detection network. In the training data, the sign occupies the entire frame.
However, in the images I want to use for prediction, road signs occupy a much smaller part of the frame.
Predictions for such images are not very good; however, if I crop to just the sign, the predictions are fine.
How do I go about generating predictions for larger images?
I haven't been able to find an answer in similar questions unfortunately.
It sounds like you're trying to solve a different kind of problem when you want to extend your classification of individual signs to "detecting" them and classifying them inside a larger image.
You have (at least) a couple of options:
Create a sliding window that sweeps the image and makes a classification at each step. This way, when the window hits the sign, it will return a good classification. But you'll quickly realize that this is not very practical or efficient: the window size and step size become additional parameters to tune, and as you'll see in the next option, there are object-detection-specific methods that already solve this exact problem. (A rough sliding-window sketch follows this list.)
You can try an object detection architecture. This will require you to build a training dataset that is different from the one you used for image classification. You'll need many (hundreds or thousands) of the "large" versions of your images that contain (and in some cases don't contain) the signs you want to identify. You'll need an annotation tool to locate and label those signs, and then you can train a network to locate and label them.
Some of the architectures to look up for that second option include YOLO, Single Shot Detector (SSD), and Faster R-CNN, to name a few.
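For completeness, here is the rough sliding-window sketch mentioned in the first option; `model`, the window size, stride and threshold are all assumptions, and a real detector is usually the better choice:

```python
# Naive sliding-window classification over a large image.
import numpy as np

def sliding_window_predict(image, model, win=64, stride=32, threshold=0.9):
    """Return (x, y, score) for windows the classifier flags as containing a sign."""
    hits = []
    h, w = image.shape[:2]
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            patch = image[y:y + win, x:x + win]
            # assumes a Keras-style classifier with a single sigmoid output
            score = float(model.predict(patch[np.newaxis, ...]).ravel()[0])
            if score >= threshold:
                hits.append((x, y, score))
    return hits
```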
I have a bunch of images (basically a resized subset of the CelebA dataset), and I have a binary label for each of those images. The images are people's faces.
The problem is: I don't know which characteristic those labels correspond to.
Do you have any method to "backtest" features and labels? I have no idea how to determine what those labels correspond to.
I have tried visualizing those images again and again, trying to understand the similarities between them, without success.
I have tried an SVM classifier and then plotted its coefficients to determine what the classifier was targeting, without success.
Thank you for the help
No. From an information theory standpoint, there is no way to reconstruct an arbitrary, abstract concept from a set of observations. This is equivalent to most research using the scientific method: you're trying to guess the classification rationale from a finite set of data. A guess that does a reliable job of predicting observed behaviour is called a "theory".
Does that help frame the way you look at the problem you're facing?
Where can I find details on implementing Siamese networks to perform image similarity and to retrieve the most similar image from a dataset?
It is difficult to get a large number of images for all the classes, so only a few images (e.g. 10 for some classes) are available for most of the classes.
SIFT and ORB seem to perform poorly on some classes.
My project is to differentiate between license plates based on the states of the UAE. I have uploaded a few example images.
When there is little training data, no matter how annoying it sounds, the best approach is usually to collect more. Deep networks are infamously data hungry, and their performance is poor when data is scarce. That said, there are approaches that might help you:
Transfer learning
Data augmentation
In transfer learning, you take an already trained deep net (e.g. ResNet50), which was trained for some other task (e.g. ImageNet), freeze all of its weights except those in the last few layers, and train on your task of interest.
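A hedged sketch of that idea with torchvision (the class count and the choice to retrain only the final layer are placeholders, not a fixed recipe):

```python
# Transfer learning: take an ImageNet-pretrained ResNet50, freeze its weights,
# and train only a new final layer on the plate-classification task.
import torch
import torch.nn as nn
import torchvision.models as models

num_classes = 7                                            # placeholder class count
model = models.resnet50(pretrained=True)
for param in model.parameters():
    param.requires_grad = False                            # freeze pretrained weights
model.fc = nn.Linear(model.fc.in_features, num_classes)    # new head, trainable

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
# ...then run a standard training loop, updating only model.fc.
```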
Data augmentation slightly modifies your training data in some predictable way. In your case you can rotate the image by a small angle, apply a perspective transformation, scale the image intensities, or slightly change the colours. You apply a different set of these operations with different parameters every time you use a particular training image, generating new training examples and enlarging your training set.
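For example, something along these lines with torchvision transforms (the angles, distortion scale and jitter strengths are arbitrary):

```python
# Random small rotations, perspective changes and colour/intensity jitter,
# re-sampled every time an image is loaded.
import torchvision.transforms as T

train_transform = T.Compose([
    T.RandomRotation(degrees=10),                      # rotate by a small angle
    T.RandomPerspective(distortion_scale=0.2, p=0.5),  # perspective transformation
    T.ColorJitter(brightness=0.2, contrast=0.2,
                  saturation=0.2, hue=0.05),           # intensity/colour changes
    T.ToTensor(),
])
# Used inside the Dataset, each epoch sees a differently transformed copy
# of every training image, effectively enlarging the training set.
```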
For my exam based around data crunching, we've received a small Simpsons dataset of 4 characters (Bart, Homer, Lisa, Marge) to build a convolutional neural network around. However, the dataset contains only a rather small number of images: around 2200 to split into train and test sets.
Since I'm very new to neural networks and deep learning: is it acceptable to augment my data (I'm rotating the images by X degrees, 9 times each) and then split it afterwards using sklearn's train_test_split function?
Since making this change, I'm getting training and test accuracies of around 95% after 50 epochs with my current model. That's more than I expected, so I started questioning whether augmenting data that ends up in the test set is acceptable, or whether it biases the final result.
so:
a) Can you augment your data before splitting it with sklearn's train_test_split without skewing your results?
b) If my method is wrong, what's another method I could try?
Thanks in advance!
One should augment the data after the train/test split, and make sure to augment only the data from the train split.
If one augments the data before splitting the dataset, small variations of the training images will likely leak into the test dataset, so you will overestimate the network's accuracy (and it might be over-fitting as well, among other issues).
A good way to avoid this pitfall is to augment the data only after the original dataset has been split.
A lot of libraries implement Python generators that randomly apply one or more combinations of image modifications to augment the data. These might include:
Image rotation
Image Shearing
Image zoom (cropping and re-scaling)
Adding noise
Small shift in hue
Image shifting
Image padding
Image Blurring
Image embossing
This GitHub library has a good overview of classical image augmentation techniques: https://github.com/aleju/imgaug (I have not used this library, so I cannot vouch for its speed or implementation quality, but the overview in its README.md seems quite comprehensive.)
Some neural network libraries already include utilities for this. For example, Keras has image preprocessing methods: https://keras.io/preprocessing/image/
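A minimal sketch with Keras' ImageDataGenerator, applied only to the training split (the parameter values and the X_train/y_train arrays are assumptions, and the import path may differ depending on your Keras version):

```python
# Split first, then augment only the training data.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rotation_range=15,        # image rotation
    shear_range=0.1,          # image shearing
    zoom_range=0.1,           # crop and re-scale
    width_shift_range=0.1,    # image shifting
    height_shift_range=0.1,
    fill_mode="nearest",      # how shifted/rotated border pixels are filled
)
train_generator = train_datagen.flow(X_train, y_train, batch_size=32)
# model.fit(train_generator, epochs=50, validation_data=(X_test, y_test))
# The test split stays untouched, so the reported accuracy is not inflated.
```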