I'm building a CNN that will tell me whether a person has brain damage. I'm planning to use the tf Inception v3 model and the build_image_data.py script to build the TFRecords.
The dataset is composed of brain scans. Every scan has about 100 images (different head poses, angles). On some images the damage is visible, but on others it is not. I can't label all images from a scan as damage-positive (or negative), because some of them would be labeled wrong (if the scan is positive for damage, but that is not visible on a specific image).
Is there a way to label the whole scan as positive/negative and train the network that way?
And after training is done, pass a whole scan (not a single image) as input to the network and classify it.
It looks like multiple instance learning might be your approach. Check out these two papers:
Multiple Instance Learning Convolutional Neural Networks for Object Recognition
Classifying and segmenting microscopy images with deep multiple instance learning
The last one is implemented by @dancsalo (not sure if he has a Stack Overflow account) here.
It looks like the second paper deals with very large images and breaks them into sub-images, but labels the entire image. So it is like labeling a bag of images with one label instead of having to make a label for each sub-image. In your case, you might be able to construct a matrix of images, i.e. a 10 image x 10 image master image for each of the scans...
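To make the "master image" idea concrete, here is a minimal NumPy sketch that tiles 100 equally-sized grayscale slices into a single 10 x 10 grid image. The slice size, ordering, and function name are assumptions you would adapt to your scans:

```python
import numpy as np

def tile_scan(slices, grid=(10, 10)):
    """Tile individual scan slices into one large 'master' image.

    Assumes `slices` is a list of equally-sized 2-D grayscale arrays
    (100 slices for a 10 x 10 grid), in whatever order you choose.
    """
    rows, cols = grid
    assert len(slices) == rows * cols, "expected one slice per grid cell"
    h, w = slices[0].shape
    master = np.zeros((rows * h, cols * w), dtype=slices[0].dtype)
    for idx, img in enumerate(slices):
        r, c = divmod(idx, cols)
        master[r * h:(r + 1) * h, c * w:(c + 1) * w] = img
    return master

# Example: 100 dummy 256x256 slices -> one 2560x2560 master image,
# which then carries the single positive/negative label for the scan.
scan = [np.random.rand(256, 256).astype(np.float32) for _ in range(100)]
master_image = tile_scan(scan)
```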
Let us know if you do this and if it works well on your data set!
I'm working on a project that requires training a PyTorch NN on a very large dataset of images. Some of these images are completely irrelevant to the problem, but these irrelevant images are not labelled as such. However, there are some metrics I can use to tell whether they are irrelevant (e.g. summing all the pixel values would give me a good sense of which images are relevant and which are not). What I would ideally like to do is have a DataLoader that can take in a Dataset class and create batches only with the relevant images. The Dataset class would just know the list of images and their labels, and the DataLoader would decide whether or not the image it is adding to a batch is relevant, and would then only make batches with relevant images.
To apply this to an example, let's say I have a dataset of black and white images. The white images are irrelevant, but they are not labelled as such. I want to be able to load batches from a file location and have these batches contain only the black images. I could filter at some point by summing all the pixels and checking that the sum equals 0.
What I am wondering is whether a custom Dataset, DataLoader, or Sampler would be able to solve this task for me. I have already written a custom Dataset that stores the directory of all the saved images and a list of all the images in that directory, and can return an image with its label in the __getitem__ function. Is there something more I should add there to filter out certain images? Or should that filter be applied in a custom DataLoader or Sampler?
Thank you!
I'm assuming that your image dataset belongs to two classes (0 or 1) but is unlabeled. As @PranayModukuru mentioned, you can determine the similarity by using some measure (e.g. aggregating all the pixel intensity values of an image, as you mentioned) in the __getitem__ function of your custom Dataset class.
However, determining the similarity in the __getitem__ function while training your model will make the training process very slow. So I would recommend that you approximate the similarity before you start training (not in the __getitem__ function). Moreover, if your image dataset is composed of complex images (not black and white images), it's better to use a pretrained deep learning model (e.g. a ResNet or an autoencoder) for dimensionality reduction, followed by a clustering approach (e.g. agglomerative clustering) to label your images.
With the second approach you only need to label your images exactly once, and if you apply augmentation to the images while training you don't need to re-determine the similarity (label) in the __getitem__ function. On the other hand, with the first approach you need to determine the similarity (label) every time (after applying transformations to the images) in the __getitem__ function, which is redundant, unnecessary and time-consuming.
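A rough sketch of that second approach, assuming a flat directory of RGB-loadable images, an ImageNet-pretrained ResNet as a fixed feature extractor, and agglomerative clustering into two groups (the paths, the model choice, and the number of clusters are all assumptions):

```python
import os

import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image
from sklearn.cluster import AgglomerativeClustering

# Pretrained ResNet used as a fixed feature extractor (strip the final fc layer).
backbone = models.resnet50(pretrained=True)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image_dir = "data/images"  # hypothetical directory of all images
paths = sorted(os.path.join(image_dir, f) for f in os.listdir(image_dir))

features = []
with torch.no_grad():
    for p in paths:
        img = Image.open(p).convert("RGB")
        features.append(backbone(preprocess(img).unsqueeze(0)).squeeze(0).numpy())

# Cluster once, before training; the cluster ids act as relevant/irrelevant
# pseudo-labels (inspect the clusters to see which one holds the relevant images).
labels = AgglomerativeClustering(n_clusters=2).fit_predict(features)
relevant_paths = [p for p, lab in zip(paths, labels) if lab == 0]
```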
Hope this will help.
It sounds like your goal is to totally remove the irrelevant images from training.
The best way to deal with this would be to figure out the filenames of all the relevant images up front and save them to a CSV or similar, then pass only the good filenames to your Dataset (see the sketch below).
The reason is you will run through your dataset multiple times during training. This means you will be loading, analyzing and discarding irrelevant images over and over again, which is a waste of compute.
It's better to do this sort of preprocessing/filtering once upfront.
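A minimal sketch of that workflow, assuming a flat directory of grayscale images and using the "pixel sum equals 0" test from the question as the relevance filter (the directory name and CSV filename are placeholders):

```python
import csv
import os

import numpy as np
from PIL import Image
from torch.utils.data import Dataset

IMAGE_DIR = "data/images"             # hypothetical location of all images
RELEVANT_CSV = "relevant_images.csv"  # hypothetical output file

# One-off preprocessing pass: keep only the black (relevant) images,
# i.e. those whose pixel sum is 0, and record their filenames.
with open(RELEVANT_CSV, "w", newline="") as f:
    writer = csv.writer(f)
    for name in sorted(os.listdir(IMAGE_DIR)):
        img = np.array(Image.open(os.path.join(IMAGE_DIR, name)).convert("L"))
        if img.sum() == 0:            # relevance test from the question
            writer.writerow([name])

class FilteredImageDataset(Dataset):
    """Dataset that only ever sees the pre-filtered filenames."""

    def __init__(self, image_dir, csv_path, transform=None):
        with open(csv_path) as f:
            self.names = [row[0] for row in csv.reader(f)]
        self.image_dir = image_dir
        self.transform = transform

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        img = Image.open(os.path.join(self.image_dir, self.names[idx]))
        return self.transform(img) if self.transform else img
```

Because the filtering runs once up front, every epoch afterwards only loads images that are actually used.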
I'm somewhat new to coding but am currently trying to use applied machine learning for a biological research project. I am trying to create a neural network that will take a green-channel protein image as input and predict the red channel of the same image.
I have a dataset of 10,000 40x40 images that have both red- and green-channel variants (so 20,000 images in total). To clarify, the images are all grayscale; the red channel is just used to signify that proteins stained with a red dye appear in a different spot than proteins stained with a green dye. Basically, I want to input an image of one set of proteins and have the network guess the location of the other set, then compare with the known paired image of the other set.
I am pretty new to Python and Generative Deep Learning. I understand how it works conceptually, but am wondering what the best architecture for this project would be. I was considering a GAN but am unsure how the discriminator would be trained in this case.
Are there any architectures anyone could recommend I take a look at that I could modify for the purpose of this project?
I am training an object detection network using TensorFlow's Object Detection API:
https://github.com/tensorflow/models/tree/master/research/object_detection
I can successfully train a network based on my own images and labels.
However, I have a large dataset of images that do not contain any of my labeled objects, and I want to be able to train the network to not detect anything in these images.
From what I understand of TensorFlow object detection, I need to give it a set of images and corresponding XML files that box and label the objects in each image. The scripts convert the XML to CSV and then to another format for training, and do not allow XML files that have no objects.
How can I provide images and XML files that have no objects?
Or, how does the network learn what is not an object?
For example, if you want to detect "hot dogs" you can train it with a set of images containing hot dogs. But how do you train it on what is not a hot dog?
An Object Detection CNN can learn what is not an object, simply by letting it see examples of images without any labels.
There are two main architecture types:
two-stage, with a first stage for object/region proposals (RPN), and a second stage for classification and bounding-box refinement;
one-stage, which directly classifies and regresses bounding boxes based on the feature vector corresponding to a given cell in the feature map.
In either case, there is a part which is responsible for deciding what is an object and what is not. In an RPN you have an "objectness" score, and in one-stage detectors there is the classification confidence, where you usually have a background class (i.e. everything which is not one of the supported classes).
So in both cases, if a specific region in an image doesn't contain any supported class, you teach the CNN to decrease the objectness score or increase the background confidence accordingly.
You might want to take a look at this solution.
For the TensorFlow Object Detection API to include your negative examples, you need to add the negative examples to the CSV file you have created from the XML, either by modifying the script that generates the CSV file or by adding the examples afterwards.
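As an illustration of the second option (adding the negatives to the CSV afterwards), here is a hedged pandas sketch. It assumes the common filename,width,height,class,xmin,ymin,xmax,ymax column layout produced by the widely used xml_to_csv.py scripts, and that your generate_tfrecord.py variant is adjusted to emit zero boxes for rows with an empty class field; all file and directory names are placeholders:

```python
import os

import pandas as pd
from PIL import Image

labels = pd.read_csv("train_labels.csv")   # hypothetical output of xml_to_csv.py
negatives_dir = "negatives"                # hypothetical folder of object-free images

rows = []
for name in os.listdir(negatives_dir):
    w, h = Image.open(os.path.join(negatives_dir, name)).size
    # No class and no box: the TFRecord entry should contain the image but zero objects.
    rows.append({"filename": name, "width": w, "height": h,
                 "class": "", "xmin": "", "ymin": "", "xmax": "", "ymax": ""})

labels = pd.concat([labels, pd.DataFrame(rows)], ignore_index=True)
labels.to_csv("train_labels_with_negatives.csv", index=False)
```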
To generate XML files without class labels using LabelImg, you can press "Verify Image".
Where can I find details on implementing Siamese networks to perform image similarity and retrieve the most similar image from a dataset?
It is difficult to get a large amount of image data for all the classes, so only a few images (e.g. 10 for some classes) are available for most of the classes.
SIFT and ORB seem to perform poorly on some classes.
My project is to differentiate between license plates based on the states of the UAE. Here I have uploaded a few example images.
When there is little training data, no matter how annoying it sounds, the best approach is usually to collect more. Deep networks are infamously data-hungry and their performance is poor when data is scarce. That said, there are approaches that might help you:
Transfer learning
Data augmentation
In transfer learning, you take an already-trained deep net (e.g. ResNet50), which was trained for some other task (e.g. ImageNet), fix all its weights except for those in the last few layers, and train it on your task of interest.
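A minimal PyTorch sketch of that idea, freezing an ImageNet-pretrained ResNet50 and training only a new classification head (the number of classes and the hyperparameters are placeholders):

```python
import torch
import torch.nn as nn
import torchvision.models as models

NUM_CLASSES = 7  # hypothetical number of plate classes

model = models.resnet50(pretrained=True)
for param in model.parameters():
    param.requires_grad = False                # freeze the pretrained backbone

# Replace the final layer with a fresh, trainable head for your classes.
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
# ...then train as usual; only model.fc receives gradient updates.
```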
Data augmentation slightly modifies your training data in some predictable way. In your case you can rotate your image by a small angle, apply a perspective transformation, scale the image intensities or slightly change the colors. You apply a different set of these operations with different parameters every time you want to use a particular training image. This way you generate new training examples enlarging your training set.
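For example, with torchvision these augmentations could be expressed as a random transform pipeline that is re-sampled every time a training image is loaded (all parameter values below are illustrative):

```python
import torchvision.transforms as T

# Each training load applies a different random rotation, perspective warp,
# color/intensity change and small rescale, effectively enlarging the dataset.
train_transform = T.Compose([
    T.RandomRotation(degrees=5),
    T.RandomPerspective(distortion_scale=0.2, p=0.5),
    T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.1),
    T.RandomResizedCrop(224, scale=(0.9, 1.0)),
    T.ToTensor(),
])
# Pass `train_transform` to your Dataset so a new variant is drawn every epoch.
```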
Honestly, I'm just stuck and can't think. I have worked hard to create an amazing model that can read letters, but how do I move on to words, sentences, paragraphs and full papers?
This is a general question so forgive me for not providing code, but assume I have successfully trained a network at recognizing letters of many kinds and many fonts, with all sorts of different noise and distortions in the image.
(Just to be technical: the images the model is trained on are 36x36 grayscale images only, and the model is a simple classifier with some conv2d layers.)
Now I want to use this well-trained model, with all its parameters, and give it something to read, turning it into a full OCR program. This is where I'm stuck. I want to give the program a photo/scan of a paper and have it recognize all the letters. But how do I "predict" with my model when the image is obviously larger than the single-letter images it was trained on?
I have tried adding an additional layer of conv2d that would try to read features of parts of the image, but that was too complicated and I couldn't figure it out.
I have also looked at OpenCV programs that recognize where there is text in the image and crop it out, but none that I could find separates out single letters that could then be fed to the trained model to read.
What is my next step from here?
If the fonts of the letters are the same throughout the whole image, you could use the so-called "sliding window" technique:
You start from the upper-left corner and slide your scan window to the right by the size of a letter until you reach the end of the paper.
The sliding window will be the size of the scanned letter, and when its contents are input to your neural network, it will output the letter. Save those letters somewhere (see the sketch below).
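A rough sketch of such a loop, assuming the page is a 2-D grayscale NumPy array and the letter classifier exposes a Keras-style model.predict on 36x36 crops (window size, stride and normalization are assumptions to tune):

```python
import numpy as np

def sliding_window_ocr(page, model, win=36, stride=36):
    """Slide a letter-sized window over a grayscale page and classify each crop.

    `page` is a 2-D numpy array; `model.predict` is assumed to take a batch of
    shape (N, 36, 36, 1) and return class probabilities, as in the letter
    classifier described above.
    """
    letters = []
    h, w = page.shape
    for y in range(0, h - win + 1, stride):
        row = []
        for x in range(0, w - win + 1, stride):
            crop = page[y:y + win, x:x + win].astype(np.float32) / 255.0
            probs = model.predict(crop[np.newaxis, ..., np.newaxis], verbose=0)
            row.append(int(np.argmax(probs)))   # predicted letter class id
        letters.append(row)
    return letters                              # grid of predicted letters
```

Note that a fixed stride only works if the letters sit on a regular grid; otherwise you would first segment lines and characters (e.g. with OpenCV contours) and classify each crop.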
Other methods would include changing your neural network and being smarter about detecting blobs of text on the scanned paper.
If you are looking for an off-the-shelf solution, take a look at Tesseract OCR.
Check out the following links for ideas:
STN-OCR: A single Neural Network for Text Detection and Text Recognition
STN-OCR on Medium
Attention-based Extraction of Structured Information from Street View Imagery
Another Attention-based OCR Repo
A model using both CNN and LSTM