How do I run two separate deep learning-based models together? - python

I trained a deep learning-based detection network to detect and locate some objects, and a deep learning-based classification network to classify the color of the detected objects. Now I want to combine these two networks so that I can detect an object and also classify its color, but I am having trouble combining them and running them together. How do I call the classifier while running detection?
They are in two different frameworks: the classifier is based on Keras with a TensorFlow backend, while the detection network uses the OpenCV DNN module.

I have read your question, and from it I can infer that your classification network takes its input from the output of your first network (the object locator); i.e., the objects located by the first network are passed to the second network, which in turn classifies their color. The pipeline you are describing is a sequential one: supply input to the first network, get its output, apply some trigger to activate the second network, feed the output of the first network into the second, and finally get the output of the second network. You can run the two networks on separate GPUs.
The trigger that calls the second network can be something as simple as saving the cropped object to local storage and having a function that watches for changes in the file structure (a new file being added). If this function returns true, you can grab the cropped image and run the second network with it as input. That said, if both models live in the same process, you can simply pass the crop in memory, as sketched below.
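A minimal sketch of such a sequential pipeline, assuming a YOLO-style detector loaded through the OpenCV DNN module and a Keras color classifier. The file names, input sizes, and output decoding are placeholders you would replace with your own:

    import cv2
    import numpy as np
    from tensorflow.keras.models import load_model

    # Stage 1: detector via the OpenCV DNN module (hypothetical model files).
    detector = cv2.dnn.readNet("detector.weights", "detector.cfg")
    # Stage 2: color classifier trained in Keras (hypothetical model file).
    classifier = load_model("color_classifier.h5")

    image = cv2.imread("frame.jpg")
    h, w = image.shape[:2]
    blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    detector.setInput(blob)
    # Output layout assumed YOLO-like: rows of [cx, cy, bw, bh, objectness, ...];
    # adjust the decoding to your detector's architecture.
    detections = detector.forward()

    for det in detections:
        if float(det[4]) < 0.5:            # skip low-confidence detections
            continue
        cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
        x1, y1 = max(int(cx - bw / 2), 0), max(int(cy - bh / 2), 0)
        x2, y2 = int(cx + bw / 2), int(cy + bh / 2)
        crop = image[y1:y2, x1:x2]         # the "trigger": pass the crop directly

        # Resize/normalize to whatever your classifier was trained on (assumed 64x64).
        crop = cv2.resize(crop, (64, 64)).astype("float32") / 255.0
        color_probs = classifier.predict(crop[np.newaxis, ...])[0]
        print("color class id:", int(np.argmax(color_probs)))

Passing the crop in memory avoids the file-system round trip; the disk-based trigger is only needed if the two models must run in separate processes.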

Related

How can I use 2 images as a training sample in PyTorch?

I just began learning deep learning, and my first homework is to build a leaf-classification system based on convolutional neural networks. I built a ResNet-34 model with code from GitHub. However, my teacher told me that the basic training unit in his dataset is an image pair: I should use 2 images (photos of the same leaf under different lighting conditions) as the input, combining two 3-channel images into one 6-channel image. But I don't know how to input 2 images and combine them into 6 channels. How can I do that? Are there any functions for it? Should I modify the structure of the ResNet network?
This is my dataset; you can see that every two images show the same leaf.
You have several issues to tackle (a sketch covering these points follows the list):
1. You need a Dataset whose __getitem__ method returns 2 images (and a label), instead of the basic one that returns a single image and a label. You'll probably need to write a custom dataset.
2. Make sure the augmentations you apply are applied in the same manner to both images of each pair.
3. You need to modify the ResNet-34 network to take 2 images as input instead of one. See, e.g., this answer for how that can be done.
4. You need to change the first convolution layer to have 6 input channels instead of 3.
5. If you want to use pre-trained weights, you will not be able to load the existing state_dict of ResNet-34 because of changes #3 and #4; you'll have to copy the weights over manually the first time.
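A minimal PyTorch sketch of points 1, 4, and 5, assuming pairs are given as a list of file-path tuples; all names here are placeholders:

    import torch
    import torch.nn as nn
    from torch.utils.data import Dataset
    from torchvision import models
    from torchvision.io import read_image

    class LeafPairDataset(Dataset):
        def __init__(self, pairs, labels):
            self.pairs = pairs      # list of (path_a, path_b) tuples
            self.labels = labels    # list of integer class labels

        def __len__(self):
            return len(self.pairs)

        def __getitem__(self, idx):
            path_a, path_b = self.pairs[idx]
            img_a = read_image(path_a).float() / 255.0   # 3 x H x W
            img_b = read_image(path_b).float() / 255.0   # 3 x H x W
            # Apply any augmentation identically to img_a and img_b here (point 2).
            x = torch.cat([img_a, img_b], dim=0)          # 6 x H x W
            return x, self.labels[idx]

    num_leaf_classes = 10  # placeholder: set to your number of species
    model = models.resnet34(weights=models.ResNet34_Weights.DEFAULT)
    old_conv1 = model.conv1
    # Point 4: first conv now takes 6 input channels instead of 3.
    model.conv1 = nn.Conv2d(6, 64, kernel_size=7, stride=2, padding=3, bias=False)
    with torch.no_grad():
        # Point 5: reuse the pretrained 3-channel filters for both image halves,
        # halving them so the expected activation scale stays roughly the same.
        model.conv1.weight.copy_(old_conv1.weight.repeat(1, 2, 1, 1) / 2)
    model.fc = nn.Linear(model.fc.in_features, num_leaf_classes)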

Using a Siamese model to obtain an embedding by cutting off one half?

I used Keras to build a Siamese network following the coding format of one of the questions posted (please see the code sample here). To explain briefly: I built a Siamese network using a pretrained EfficientNet so that each copy of the network produces a dense layer, and the two dense outputs are then combined into an L1-similarity output.
However, at prediction time I only want to obtain the dense output of one of the branches (as an embedding). I plan on using a variety of unsupervised learning methods (including KNN) on these outputs.
During prediction, how can I ask Keras to run only one copy of my network graph with a single input? Can I extract only a part of the NN graph? I don't want to always have to generate pairs of images, or pay the cost of running 2 images when I only need one output.
Let me just make sure that I understand your question and context: you are using a Siamese network (EfficientNet) and you want to generate embeddings for your input images. In other words, you only want to save the image encodings from one of the two ConvNet branches?
If that is the case, I don't really see the point of building a Siamese network at all. Just go for a single ConvNet (using EfficientNet), because if you use the Siamese network model, it will always ask you for image pairs.
If you go for only a single ConvNet model, and you identify the layer whose output you want to use as the embedding, then you can use tf.keras.backend.function like this:
get_layer_output = tf.keras.backend.function(
    [fine_tuned_model.layers[0].input],
    [fine_tuned_model.layers[-2].output])
Then, at prediction time, you can call it like this:
features = get_layer_output([x])[0]
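If the shared twin was built as a nested sub-model, another option is to pull it out and use it directly. A sketch assuming the sub-model is named "encoder" (adjust the name to your own architecture):

    import tensorflow as tf

    # Grab the shared branch out of the trained Siamese model.
    encoder = siamese_model.get_layer("encoder")   # assumed sub-model name

    # One forward pass per image, no pairing required.
    embeddings = encoder.predict(images)           # images: (N, H, W, 3)

The resulting embeddings can then be fed straight into KNN or any other unsupervised method.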

How to train Tensorflow Object Detection images that do not contain objects?

I am training an object detection network using TensorFlow's Object Detection API,
https://github.com/tensorflow/models/tree/master/research/object_detection
I can successfully train a network based on my own images and labels.
However, I have a large dataset of images that do not contain any of my labeled objects, and I want to be able to train the network to not detect anything in these images.
From what I understand, with TensorFlow object detection I need to give it a set of images and corresponding XML files that box and label the objects in each image. The scripts convert the XML to CSV and then to TFRecords for training, and they do not allow XML files that contain no objects.
How can I provide images and XML files that contain no objects?
Or, how does the network learn what is not an object?
For example, if you want to detect "hot dogs", you can train it with a set of images containing hot dogs. But how do you train it on what is not a hot dog?
An object detection CNN can learn what is not an object simply by seeing examples of images without any labels.
There are two main architecture types:
- two-stage, where the first stage proposes objects/regions (RPN) and the second performs classification and bounding-box fine-tuning;
- one-stage, which directly classifies and regresses bounding boxes based on the feature vector corresponding to a certain cell in the feature map.
In any case, there is a part of the network responsible for deciding what is an object and what is not: in an RPN you have the "objectness" score, and in one-stage detectors there is the classification confidence, which usually includes a background class (i.e., everything that is not one of the supported classes).
So in both cases, when a specific region in an image doesn't contain any supported class, you teach the CNN to decrease the objectness score or to increase the background confidence, correspondingly.
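As a toy illustration (not the actual API internals): a one-stage head predicts, per cell or anchor, scores for [background, class 1, ..., class K], and a negative example is simply trained toward class 0:

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    scores = np.array([0.2, 1.5, -0.3])   # [background, hot dog, hamburger] (made-up values)
    probs = softmax(scores)
    target = 0                            # this cell contains no supported object
    loss = -np.log(probs[target])         # cross-entropy pushes background prob up
    print(loss)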
You might want to take a look at this solution.
For the TensorFlow Object Detection API to include your negative examples, you need to add them to the CSV file you have created from the XML files, either by modifying the script that generates the CSV or by appending the examples afterwards.
To generate XML files without class labels in LabelImg, press "Verify Image".
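One way to do the "appending afterwards" step is a small script like this sketch. The column layout assumed here follows the common xml_to_csv/generate_tfrecord convention, and your TFRecord conversion script must be able to handle rows with empty class and box fields:

    import csv
    from PIL import Image

    def append_negatives(csv_path, image_paths):
        with open(csv_path, "a", newline="") as f:
            writer = csv.writer(f)
            for path in image_paths:
                width, height = Image.open(path).size
                # Columns: filename, width, height, class, xmin, ymin, xmax, ymax.
                # Empty class/box fields mark the image as background-only.
                writer.writerow([path, width, height, "", "", "", "", ""])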

Does the Inception model label multiple objects in one image?

I used retrain.py to train TensorFlow on my own dataset of traffic signs, but it seems it doesn't capture multiple objects in one image. I am using label_image.py to detect the objects in my image. I have an image with two road signs that exist in my dataset, but I get only one sign with high accuracy; it doesn't detect the other sign.
You have misunderstood what a classification CNN does. Inception is built and trained to classify an image, not objects in an image. For this reason you will only get a single result from label_image.py, as it uses softmax to generate a confidence that the image as a whole belongs to a certain class.
It does not identify individual objects, as I explained to you in another question here: Save Image of Detected object from image using Tensor-flow
If you are trying to detect multiple signs, then you will need to use object detection models.
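To make the distinction concrete: a classifier produces one probability vector for the whole image, while a detector produces a variable-length list of (box, class, score) results. A schematic comparison with made-up values:

    import numpy as np

    # Classification (Inception + label_image.py style): one vector per image.
    class_probs = np.array([0.91, 0.06, 0.03])   # sums to 1 -> exactly one top label
    print(int(np.argmax(class_probs)))           # single class id, even with 2 signs present

    # Detection: a list of boxes, each with its own class and score.
    detections = [
        {"box": (12, 40, 88, 120), "class_id": 0, "score": 0.93},
        {"box": (150, 35, 210, 110), "class_id": 1, "score": 0.88},
    ]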

Multiple artificial neural networks

I am trying to set up a Multiple Artificial Neural Network, as shown in image (a) of the figure below:
[figure (a): several independent expert networks whose outputs feed into a final decision stage; source image omitted]
I want each of the networks to work independently on its own domain. The individual networks must be built and trained for their specific tasks. The final decision will be made based on the results of the individual networks, often called expert networks or agents.
Because of privacy, I cannot share my data.
I am trying to set this up with TensorFlow in Python. Do you have an idea of how I could do it, if it is achievable? So far I have not found any examples of this.
The way to go about this is to take the outputs of the two networks, concatenate the resulting output tensors (reshaping them if needed), and then pass them into the final network. Take a look here for the concatenation documentation and here for an example of taking the output from one network and feeding it into another. This should give you a place to start.
As for (a), it is simple: just train the expert networks beforehand, load them when you are training the final network, and then do the concatenation on their outputs, as sketched below.
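A minimal sketch with the Keras functional API, assuming two pretrained experts saved to disk; the paths, input shapes, and layer sizes are placeholders:

    import tensorflow as tf

    # Load the independently trained expert networks and freeze them.
    expert_a = tf.keras.models.load_model("expert_a.h5")   # assumed path
    expert_b = tf.keras.models.load_model("expert_b.h5")   # assumed path
    expert_a.trainable = False
    expert_b.trainable = False

    input_a = tf.keras.Input(shape=(64,))    # assumed input shapes
    input_b = tf.keras.Input(shape=(128,))

    # Concatenate the expert outputs and feed them to the decision network.
    merged = tf.keras.layers.Concatenate()([expert_a(input_a), expert_b(input_b)])
    hidden = tf.keras.layers.Dense(32, activation="relu")(merged)
    output = tf.keras.layers.Dense(1, activation="sigmoid")(hidden)

    final_model = tf.keras.Model(inputs=[input_a, input_b], outputs=output)
    final_model.compile(optimizer="adam", loss="binary_crossentropy")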
Hope this helps
