Target: I want to use the pretrained Faster R-CNN model to extract features from an image.
What I have tried: I use the code below to build the model:
import torchvision.models as models
from PIL import Image
import torchvision.transforms as T
import torch
# download the pretrained fasterrcnn model
model = models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()
model.cuda()
# drop the children from index 2 onward (keep only the first two)
modules = list(model.children())[:2]
model_t = torch.nn.Sequential(*modules)
# load image and extract features
img = Image.open('data/person.jpg')
transform = T.Compose([T.ToTensor()])
img_t = transform(img)
batch_t = torch.unsqueeze(img_t, 0).cuda()
ft = model_t(batch_t)
Error: But I got the following error:
TypeError: conv2d(): argument 'input' (position 1) must be Tensor, not tuple
Please help! Thank you!
Print model.modules to see the layer names. Then delete a layer with:
del model.my_layer_name
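The error itself comes from wrapping the children in nn.Sequential: the first child of the torchvision Faster R-CNN is the transform, which returns a tuple, and the backbone's conv2d cannot consume a tuple. A minimal sketch of one common workaround (assuming the standard torchvision layout, where the model exposes a .backbone attribute) is to call the backbone directly:

import torch
import torchvision.models as models

model = models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval().cuda()

batch_t = torch.rand(1, 3, 224, 224).cuda()  # stand-in for the batch from the question
with torch.no_grad():
    features = model.backbone(batch_t)  # an OrderedDict of FPN feature maps
for name, fmap in features.items():
    print(name, fmap.shape)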
from keras_multi_head import MultiHeadAttention
import keras
from keras.layers import Dense,Input,Multiply
from keras import backend as K
from keras.layers.core import Dropout, Layer
from keras.models import Sequential,Model
import numpy as np
import tensorflow as tf
from self_attention_layer import Encoder
## multi source attention
class Multi_source_attention(keras.Model):
    def __init__(self, read_n, embed_dim, num_heads, ff_dim, num_layers):
        super().__init__()
        self.read_n = read_n
        self.embed_dim = embed_dim
        self.num_heads = num_heads
        self.ff_dim = ff_dim
        self.num_layers = num_layers
        self.get_weights = Dense(49, activation='relu', name="get_weights")

    def compute_output_shape(self, input_shape):
        # ([batch,7,7,256],[1,256])
        return input_shape

    def call(self, inputs):
        ## weights matrix
        # (1,49)
        weights_res = self.get_weights(inputs[1])
        # (1,7,7)
        weights = tf.reshape(weights_res, (1, 7, 7))
        # (256,7,7)
        weights = tf.tile(weights, [256, 1, 1])
        ## img from mobilenet
        img = tf.reshape(inputs[0], [-1, 7, 7])
        inter_res = tf.multiply(img, weights)
        inter_res = tf.reshape(inter_res, (-1, 256, 49))
        print(inter_res.shape)
        att = Encoder(self.embed_dim, self.num_heads, self.ff_dim, self.num_layers)(inter_res)
        return att
I'm trying to construct a network that implements the part circled in the image. The output from the LSTM is (1,256) and the output from the preceding MobileNet is (batch,7,7,256). The LSTM output is then transformed into a weights matrix of shape (7,7).
But the problem is that the output from MobileNet has a batch dimension. I have no idea how to deal with the batch, or how to set up a parameter to constrain it. Could someone give me a tip?
Also, if I remove the compute_output_shape() function, a NotImplementedError occurs, yet the official Keras docs tell me I don't need to override that function. Could someone explain that?
compute_output_shape() is crucial when customizing a layer. If summary() is called, the corresponding graph is generated, in which the input and output shapes are shown for every layer; compute_output_shape() is what reports the output shape.
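As for the batch dimension: a minimal sketch (my own rewrite of the call logic, using broadcasting instead of tf.tile so the batch size never has to be known, with shapes taken from the question) could look like this:

import tensorflow as tf
from keras.layers import Dense

get_weights = Dense(49, activation='relu')  # same layer as in the question

def apply_lstm_weights(img_feats, lstm_out):
    # img_feats: (batch, 7, 7, 256), lstm_out: (1, 256)
    weights = tf.reshape(get_weights(lstm_out), (1, 7, 7, 1))
    # broadcasting stretches (1,7,7,1) over both the batch and channel axes
    weighted = img_feats * weights
    # (batch, 7, 7, 256) -> (batch, 256, 49), matching the original reshape
    return tf.reshape(tf.transpose(weighted, [0, 3, 1, 2]), (-1, 256, 49))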
I want to reduce the object detection model size. To that end, I tried optimizing a Faster R-CNN object detection model with the pytorch-mobile optimizer, but the generated .pt zip file is the same size as the original model.
I used the code mentioned below:
import torch
import torchvision
from torch.utils.mobile_optimizer import optimize_for_mobile
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()
script_model = torch.jit.script(model)
script_model_vulkan = optimize_for_mobile(script_model, backend='Vulkan')
torch.jit.save(script_model_vulkan, "frcnn.pth")
You have to quantize your model first.
Follow the steps here,
and then use these methods:
from torch.utils.mobile_optimizer import optimize_for_mobile
script_model_vulkan = optimize_for_mobile(script_model, backend='Vulkan')
torch.jit.save(script_model_vulkan, "frcnn.pth")
EDIT:
Quantization process for a resnet50 model:
import torchvision
model = torchvision.models.resnet50(pretrained=True)
import os
import torch
def print_model_size(mdl):
    torch.save(mdl.state_dict(), "tmp.pt")
    print("%.2f MB" % (os.path.getsize("tmp.pt") / 1e6))
    os.remove('tmp.pt')
print_model_size(model) # will print original model size
backend = "qnnpack"
model.qconfig = torch.quantization.get_default_qconfig(backend)
torch.backends.quantized.engine = backend
model_static_quantized = torch.quantization.prepare(model, inplace=False)
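# (static quantization would normally run representative calibration data through the prepared model here)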
model_static_quantized = torch.quantization.convert(model_static_quantized, inplace=False)
print_model_size(model_static_quantized) ## will print quantized model size
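From there, a possible follow-up (a sketch I haven't tested, reusing model_static_quantized from above) is to script the quantized model and run the same mobile optimization pass as in the question:

import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

# script the quantized model, then apply the mobile optimization pass
script_model = torch.jit.script(model_static_quantized)
script_model_optimized = optimize_for_mobile(script_model)  # default CPU backend
torch.jit.save(script_model_optimized, "resnet50_quantized.pth")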
I'm using Keras' pre-trained VGG16 model, following this link: Transfer learning. I'm trying to predict the content of an image:
# example of using a pre-trained model as a classifier
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.applications.vgg16 import preprocess_input
from keras.applications.vgg16 import decode_predictions
from keras.applications.vgg16 import VGG16
# load an image from file
image = load_img('dog.jpg', target_size=(224, 224))
# convert the image pixels to a numpy array
image = img_to_array(image)
# reshape data for the model
image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
# prepare the image for the VGG model
image = preprocess_input(image)
# load the model
model = VGG16()
# predict the probability across all output classes
yhat = model.predict(image)
# convert the probabilities to class labels
label = decode_predictions(yhat)
# retrieve the most likely result, e.g. highest probability
label = label[0][0]
# print the classification
print('%s (%.2f%%)' % (label[1], label[2]*100))
Full Error Output:
ValueError: decode_predictions expects a batch of predictions (i.e. a 2D array of shape (samples, 2622)) for V1 or (samples, 8631) for V2. Found array with shape: (1, 1000)
This is link to a seemingly similar question on SO.
Any comments and suggestions highly appreciated. Thank you!
I ran your code and it works properly. Since I do not have your image dog.jpg, I used a color jpg image of an Afghan dog, and the network identified it correctly as an Afghan hound. So I suspect there is something amiss with your image. yhat is a 1 x 1000 array, as expected. Ensure your image is an RGB image.
Thank you for your help. I was running this in Colab and had earlier test code where, in a different cell, I had imported:
from keras_vggface.vggface import VGGFace
from keras_vggface.utils import preprocess_input
from keras_vggface.utils import decode_predictions
That was the reason for the error.
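For anyone hitting the same thing: those keras_vggface imports shadow the VGG16 versions of preprocess_input and decode_predictions, which expect 2622/8631 classes instead of 1000 (hence the error above). One way to keep both around (a sketch) is to import the modules rather than the bare names:

# importing the modules keeps the two decode_predictions variants apart
from keras.applications import vgg16
from keras_vggface import utils as vggface_utils

# yhat is the (1, 1000) prediction array from the question's model.predict(image)
label = vgg16.decode_predictions(yhat)            # keras.applications VGG16
# label = vggface_utils.decode_predictions(yhat)  # for keras_vggface models instead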
I have converted the .pb file to a tflite file using bazel. Now I want to load this tflite model in my Python script, just to test whether it gives me the correct output or not.
You can use TensorFlow Lite Python interpreter to load the tflite model in a python shell, and test it with your input data.
The code will be like this:
import numpy as np
import tensorflow as tf
# Load TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="converted_model.tflite")
interpreter.allocate_tensors()
# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# Test model on random input data.
input_shape = input_details[0]['shape']
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()
# The function `get_tensor()` returns a copy of the tensor data.
# Use `tensor()` in order to get a pointer to the tensor.
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)
The above code is from the TensorFlow Lite official guide; for more detailed information, read this.
Using TensorFlow Lite models in Python:
The verbosity of TensorFlow Lite is powerful because it allows you more control, but in many cases you just want to pass input and get an output, so I made a class that wraps this logic:
The following works with classification models from tfhub.dev, for example: https://tfhub.dev/tensorflow/lite-model/mobilenet_v2_1.0_224/1/metadata/1
# Usage (labels is a list of the model's class-name strings)
model = TensorflowLiteClassificationModel("path/to/model.tflite", labels)
label, probability = model.run_from_filepath("path/to/image.jpeg")[0]  # top prediction
import tensorflow as tf
import numpy as np
from PIL import Image
class TensorflowLiteClassificationModel:
    def __init__(self, model_path, labels, image_size=224):
        self.interpreter = tf.lite.Interpreter(model_path=model_path)
        self.interpreter.allocate_tensors()
        self._input_details = self.interpreter.get_input_details()
        self._output_details = self.interpreter.get_output_details()
        self.labels = labels
        self.image_size = image_size

    def run_from_filepath(self, image_path):
        input_data_type = self._input_details[0]["dtype"]
        # force RGB so grayscale files don't arrive with a missing channel axis
        image = np.array(
            Image.open(image_path).convert("RGB").resize((self.image_size, self.image_size)),
            dtype=input_data_type,
        )
        if input_data_type == np.float32:
            image = image / 255.
        # add the batch dimension the interpreter expects: (1, image_size, image_size, 3)
        image = np.expand_dims(image, axis=0)
        return self.run(image)

    def run(self, image):
        """
        args:
          image: a (1, image_size, image_size, 3) np.array

        Returns list of [label, probability], of type List<str, float>
        """
        self.interpreter.set_tensor(self._input_details[0]["index"], image)
        self.interpreter.invoke()
        tflite_interpreter_output = self.interpreter.get_tensor(self._output_details[0]["index"])
        probabilities = np.array(tflite_interpreter_output[0])

        # create list of ["label", probability], ordered by descending probability
        label_to_probabilities = []
        for i, probability in enumerate(probabilities):
            label_to_probabilities.append([self.labels[i], float(probability)])
        return sorted(label_to_probabilities, key=lambda element: element[1], reverse=True)
Caution
However, you'll need to modify this to support different use cases, since I am passing images as input and getting classification ([label, probability]) output. If you need text input (NLP), or other kinds of output (object detection returns bounding boxes, labels, and probabilities; some classifiers return just labels), you'll have to adapt the input and output handling accordingly.
Also, if you are expecting different-size image inputs, you'd have to change the input size and reallocate the model (self.interpreter.allocate_tensors()). This is slow (inefficient). It's better to use the platform's resizing functionality (e.g. the Android graphics library) instead of a TensorFlow Lite model to do the resizing. Alternatively, you could do the resizing with a separate model, which would be much quicker to allocate_tensors() for.
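If you do occasionally need a different input size, a minimal sketch of resizing the interpreter in place (assuming a float-input model and a hypothetical 448x448 shape, and paying the reallocation cost each time) looks like this:

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="path/to/model.tflite")
input_index = interpreter.get_input_details()[0]["index"]

# resize the input tensor, then reallocate (the slow step this answer warns about)
interpreter.resize_tensor_input(input_index, [1, 448, 448, 3])
interpreter.allocate_tensors()

interpreter.set_tensor(input_index, np.zeros((1, 448, 448, 3), dtype=np.float32))
interpreter.invoke()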
I use the pre-trained VGG-16 model from Keras.
My working source code so far is like this:
from keras.applications.vgg16 import VGG16
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.applications.vgg16 import preprocess_input
from keras.applications.vgg16 import decode_predictions
model = VGG16()
print(model.summary())
image = load_img('./pictures/door.jpg', target_size=(224, 224))
image = img_to_array(image) #output Numpy-array
image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
image = preprocess_input(image)
yhat = model.predict(image)
label = decode_predictions(yhat)
label = label[0][0]
print('%s (%.2f%%)' % (label[1], label[2]*100))
I found out that the model is trained on 1000 classes. Is there any possibility to get the list of the classes this model is trained on? Printing out the prediction labels is not an option, because only 5 are returned.
Thanks in advance.
You could use decode_predictions and pass the total number of classes via the top=1000 parameter (its default value is only 5).
Or you could look at how Keras does this internally: It downloads the file imagenet_class_index.json (and usually caches it in ~/.keras/models/). This is a simple json file containing all class labels.
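A minimal sketch of reading that cached file directly (assuming Keras has already downloaded it to the usual location):

import json
import os

# imagenet_class_index.json maps "0".."999" to [wordnet_id, human_readable_label]
path = os.path.expanduser("~/.keras/models/imagenet_class_index.json")
with open(path) as f:
    class_index = json.load(f)

classes = [class_index[str(i)][1] for i in range(1000)]
print(len(classes), classes[:3])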
I think if you do something like this:
vgg16 = keras.applications.vgg16.VGG16(include_top=True,
                                       weights='imagenet',
                                       input_tensor=None,
                                       input_shape=None,
                                       pooling=None,
                                       classes=1000)
keras.applications.vgg16.decode_predictions(np.expand_dims(np.arange(1000), 0), top=1000)
(decode_predictions is a module-level function, not a model method, and it expects a 2D array.) Substitute your prediction array for np.arange(1000). Untested code so far.
Link to training labels here I think: http://image-net.org/challenges/LSVRC/2014/browse-synsets
If you edit your code a little bit, you can get a list of all top predictions for the example you provided. TensorFlow's decode_predictions returns a list of lists of class prediction tuples. So first, add the top=1000 argument, as @YSelf recommended: label = decode_predictions(yhat, top=1000). Then change label = label[0][0] to label = label[0][:] to select all of the predictions. label would look like this:
[('n04252225', 'snowplow', 0.4144803),
('n03796401', 'moving_van', 0.09205707),
('n04461696', 'tow_truck', 0.08912289),
('n03930630', 'pickup', 0.07173037),
('n04467665', 'trailer_truck', 0.048759833),
('n02930766', 'cab', 0.043586567),
('n04037443', 'racer', 0.036957625), ...]
From here you just need to unpack the tuples. If you want just the list of the 1000 class names, you can call [y for (x, y, z) in label] and you will have your list of all 1000 classes. The output would look like this:
['snowplow',
'moving_van',
'tow_truck',
'pickup',
'trailer_truck',
'cab',
'racer', ...]
This single line will print out all class names and indices:
decode_predictions(np.expand_dims(np.arange(1000), 0), top=1000)