I don't understand how to use a trained .pt model - python

I have a trained .pt model that was trained to classify pets. Now I'm trying to create a new Google Colab notebook that uses this model to classify a new image, but I only get the number 961 instead of a class. I'm new to neural models, so I don't quite understand this. This is my code:
from google.colab import drive
drive.mount('/content/drive')
import os
os.chdir('/content/drive/MyDrive/pet_detection/models')
import torch
model_dict = torch.load('oxford_animal_frozen_1.pt', map_location=torch.device('cpu'))
print(model_dict.keys())
model_state_dict = model_dict['model_state_dict']
import torchvision.models as models
model = models.resnet101()
model.load_state_dict(model_state_dict, strict=False)
model.eval()
from PIL import Image
import torchvision.transforms as transforms
# Load the image
img = Image.open('/content/drive/MyDrive/photo_2023-02-17_15-26-39.jpg')
# Define the transformation to apply to the image
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
# Apply the transformation and convert the image to a tensor
img_tensor = transform(img).unsqueeze(0)
# Make a prediction with the pre-trained model
outputs = model(img_tensor)
# Get the predicted class label
_, predicted = torch.max(outputs, 1)
# Print the predicted class label
print(predicted.item())
In training oxford_animal_frozen_1.pt I used resnet101, and the classes of the training data are:
{'english', 'great', 'staffordshire', 'Russian', 'wheaten', 'newfoundland', 'saint', 'american', 'Bombay', 'miniature', 'yorkshire', 'pomeranian', 'Bengal', 'shiba', 'chihuahua', 'British', 'beagle', 'japanese', 'Ragdoll', 'Siamese', 'Sphynx', 'Maine', 'boxer', 'scottish', 'samoyed', 'basset', 'german', 'pug', 'leonberger', 'keeshond', 'Egyptian', 'Birman', 'Persian', 'Abyssinian', 'havanese'}
I tried to fix it with ChatGPT but it doesn't help at all. I expect to get one of the 37 pet classes, but I get a number instead.
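A likely cause, for reference: models.resnet101() builds a classifier head with 1000 ImageNet outputs, and load_state_dict(..., strict=False) silently skips the checkpoint's mismatched 37-way fc weights, so 961 is an ImageNet class index. A minimal sketch of a fix, assuming the checkpoint really was trained with a 37-output final layer and that classes were ordered alphabetically by the training data loader (both assumptions need checking against the training script):

import os
import torch
import torchvision.models as models

NUM_CLASSES = 37  # assumption: the trained head had 37 outputs

model = models.resnet101()
model.fc = torch.nn.Linear(model.fc.in_features, NUM_CLASSES)  # match the trained head
checkpoint = torch.load('oxford_animal_frozen_1.pt', map_location='cpu')
model.load_state_dict(checkpoint['model_state_dict'])  # strict=True now verifies every key loads
model.eval()

# assumption: training used torchvision's ImageFolder, which sorts class
# folders alphabetically; rebuild that list from the training directory
# ('.../pet_detection/data' is a hypothetical path)
class_names = sorted(os.listdir('/content/drive/MyDrive/pet_detection/data'))

With the head sizes matching, predicted.item() becomes an index into class_names (print(class_names[predicted.item()])) instead of an index into ImageNet's 1000 labels.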

Related

How to resave new images (accessed by image_dataset_from_directory) classified by CNN model into new label folders

I used Kaggle's Dog Emotions Prediction dataset (https://www.kaggle.com/datasets/devzohaib/dog-emotions-prediction) to train a CNN model with TensorFlow in Python. I saved the labeled Kaggle jpeg images on my local drive "C:/.../CNN_model/train_images/..." with each labeled image in its corresponding subfolder ("angry", "happy", "relaxed", "sad").
I read in the training (ds_train) and validation (ds_val) jpeg images using "tf.keras.utils.image_dataset_from_directory" to train the model. I read in separate raw unlabeled jpeg images (ds_raw) from a single folder "C:/...CNN_model/raw_images" to run through the fitted CNN model to predict the dog emotions (pictures of my wife's dog).
My challenge is to make the predictions useful for my wife - she needs the resulting labeled jpeg images of her dog moved to their corresponding labeled folder based on the CNN model prediction. I have the predictions in a vector, however, I am struggling to understand how to move (or copy) the raw images from their original folder to new subfolders corresponding to their predicted emotion label.
How do I apply these predictions to the raw jpeg image dataset (ds_raw) to resave (or copy) the images to their own labeled subfolders? A sample directory would be:
"C:/.../CNN_model/labeled_predictions/" containing four subfolders ("0-angry", "1-happy", "2-relaxed", "3-sad").
Below are parts of my code showing how the model was trained and used to predict; I can provide more if desired. Thank you.
import tensorflow as tf
import multiprocessing
import numpy as np  # needed below for np.argmax
from sklearn import preprocessing
from tensorflow.keras.models import Sequential
from keras.layers import Input, RandomRotation, RandomFlip, Conv2D, MaxPool2D, BatchNormalization, Dropout, Flatten, Dense
# Read in the training/validation images (saved to my local drive from Kaggle)
kaggle_dir = 'C:/...CNN_model/train_images/'  # labeled subfolders include "angry", "happy", "relaxed", "sad"
raw_dir = 'C:/...CNN_model/raw_images/'  # no labeled subfolders
ds_train = tf.keras.utils.image_dataset_from_directory(
    kaggle_dir,
    labels = 'inferred',
    label_mode = 'categorical',
    class_names = ['angry', 'happy', 'relaxed', 'sad'],
    color_mode = 'grayscale',
    batch_size = 32,
    image_size = (96, 96),
    shuffle = True,
    seed = 123,
    validation_split = .2,
    subset = 'training')
ds_val = tf.keras.utils.image_dataset_from_directory(
    kaggle_dir,
    labels = 'inferred',
    label_mode = 'categorical',
    class_names = ['angry', 'happy', 'relaxed', 'sad'],
    color_mode = 'grayscale',
    batch_size = 32,
    image_size = (96, 96),
    shuffle = True,
    seed = 123,
    validation_split = .2,
    subset = 'validation')
# Read in raw unclassified images
ds_raw = tf.keras.utils.image_dataset_from_directory(
    'C:/.../raw_images/',
    labels = None,
    color_mode = 'grayscale',
    batch_size = 32,
    image_size = (96, 96),
    shuffle = False,  # keep file order stable so predictions can be mapped back to files
    seed = 123)
# Modeling
model = Sequential([...])
# Input (96,96,1), 4 relu Conv2D/MaxPool2D/BatchNorm/Dropout layers, flatten and softmax dense layer
model.compile(optimizer = 'adam',
    loss = 'categorical_crossentropy',
    metrics = ['accuracy'])
fit_history = model.fit(ds_train,  # fit_generator is deprecated; fit accepts tf.data datasets
    epochs = 150,
    validation_data = ds_val,
    workers = multiprocessing.cpu_count() - 1)
# Prediction
raw_predict = model.predict(ds_raw, workers = multiprocessing.cpu_count() - 1)
labels = np.argmax(raw_predict, axis = -1)
print(labels)
# [1,0,2,1,2...]
lb = preprocessing.LabelBinarizer()
lb = lb.fit_transform(labels)
print(lb)
# [[0,1,0,0],[1,0,0,0],[0,0,1,0],[0,1,0,0],[0,0,1,0]...]
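A minimal sketch of the copy step, assuming ds_raw is built with shuffle = False so that the order of raw_predict matches ds_raw.file_paths (an attribute that image_dataset_from_directory attaches to the returned dataset; verify it exists in your TF version), and reusing the subfolder names from the question:

import os
import shutil
import numpy as np

out_dir = 'C:/.../CNN_model/labeled_predictions/'
class_names = ['0-angry', '1-happy', '2-relaxed', '3-sad']

labels = np.argmax(raw_predict, axis=-1)  # one class index per raw image
for path, label in zip(ds_raw.file_paths, labels):
    dest = os.path.join(out_dir, class_names[label])
    os.makedirs(dest, exist_ok=True)  # create the labeled subfolder on first use
    shutil.copy2(path, dest)  # copies the jpeg; use shutil.move to move instead

The key design point is disabling shuffling for ds_raw: if the dataset is shuffled, the prediction vector can no longer be mapped back to file names.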

ImageNet classification challenge: Achieving top-5 error of 0.99472 on test set using VGG11

I recently took an ImageNet pre-trained VGG11 network and made predictions on the ImageNet test dataset. Upon submitting this file to the evaluation server, I received an email with the following text:
Error: 0.99607 (top-5) 0.99898 (top-1)
Per-class error (classes 1-1000):
1 1
1 1
1 1
...
Does this mean that my top-5 accuracy is 1-0.99607=0.393%? If so then the score is too low.
Could you please point out where I could be going wrong? Here is the code for reference.
P.S.: I have checked that the images are loaded and predicted upon in alphabetical order.
import torch
from torchvision import models, transforms, datasets
from tqdm import tqdm

vgg11 = models.vgg11(pretrained=True)
vgg11.to(torch.device("cuda"))
vgg11.eval()

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
test_loader = torch.utils.data.DataLoader(
    datasets.ImageFolder("test_dataset",
                         transforms.Compose([
                             transforms.Resize(256),
                             transforms.CenterCrop(224),
                             transforms.ToTensor(),
                             normalize
                         ])),
    batch_size=32, shuffle=False)

fp = open("predictions.txt", "w")
for a, b in tqdm(test_loader):
    preds = vgg11(a.cuda())
    _, preds = torch.topk(preds, k=5, dim=1)
    preds = preds.cpu().detach().numpy()
    for i in range(len(preds)):
        fp.write(" ".join(str(j) for j in preds[i]) + "\n")
fp.close()
Based on your code, I believe the error comes from a lack of normalization. I don't have the environment to test on the ImageNet test set, so I made a small example with 4 random cat images from the internet (links: image1, image2, image3, image4).
The test code is below:
import torch
from torchvision import models
import numpy as np
import cv2
import os
with torch.no_grad():
    vgg11 = models.vgg11(pretrained=True)
    vgg11.eval()

    mean = torch.tensor([0.485, 0.456, 0.406])
    std = torch.tensor([0.229, 0.224, 0.225])

    def read_image(image_path, size=224):
        image = cv2.imread(image_path)
        image = cv2.resize(image, (size, size))
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB).astype(np.float32)
        image = torch.tensor(image).permute(2, 0, 1).unsqueeze(0) / 255.
        image = (image - mean[None, :, None, None]) / std[None, :, None, None]
        return image

    from_path = './../test_image/'
    cat_name = ['cat1', 'cat2', 'cat3', 'cat4']
    images = torch.empty(0, 3, 224, 224)
    for name in cat_name:
        image_path = os.path.join(from_path, f'{name}.png')
        image = read_image(image_path)
        images = torch.cat((images, image), 0)

    preds = vgg11(images.float()).detach().cpu().numpy()
    result = np.argmax(preds, axis=1)
    print(result)
Without normalization, the result is ['Egyptian cat', 'sock', 'Komodo dragon', 'doormat'] ([285, 806, 48, 539]).
With normalization, the result is ['tabby cat', 'tabby cat', 'leopard', 'Egyptian cat'] ([281 281 288 285]).
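As a follow-up, a sketch of mapping those indices to readable names, assuming a plain-text imagenet_classes.txt with one label per line (a hypothetical file; the pretrained weights here don't bundle label strings):

# hypothetical label file: 1000 lines, one ImageNet class name per line
with open('imagenet_classes.txt') as f:
    idx_to_name = [line.strip() for line in f]
print([idx_to_name[i] for i in result])  # e.g. [281, 281, 288, 285] -> tabby, tabby, leopard, Egyptian cat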

Decode prediction of custom keras model

Some days ago I started with ML, as I wanted to build an hCaptcha solver. I have everything ready; I just need to train a model that will classify the captcha images so I can send a request with the correct answer and get the captcha token.
I've looked into some tutorials on how to train my own model with several classes. I have it the following way:
1 trainer folder, 1 validation folder and 1 testing folder. In the trainer and validation folders there are subfolders named airplane, truck, boat, train, ... each containing approximately 20 images. The testing folder holds some random images related to my classes.
I have trained the model and it seems I'm getting an accuracy of 1. Then I take some of the random testing images and try to predict them using the saved model. It does its job and predicts them, returning an array of numbers. The thing is, I don't know how to decode those predictions, nor how to see the class list with each class's representative integer before predicting.
I'm super new on this so I'm sure anything will help :)
My code below:
import os
from keras.preprocessing import image
from keras.models import Sequential
from keras import layers
from keras.models import load_model
import numpy as np
trainer_path = "./img/trainer"
validator_path = "./img/validator"
testing_path = "./img/tester"
WIDTH = 128
HEIGHT = 128
BATCH = 30
EPOCHS = 15
train_dataset = image.image_dataset_from_directory(
    trainer_path,
    label_mode="int",
    batch_size=BATCH,
    image_size=(WIDTH, HEIGHT)
)
validator_dataset = image.image_dataset_from_directory(
    validator_path,
    label_mode="int",
    batch_size=BATCH,
    image_size=(WIDTH, HEIGHT)
)

model = Sequential([
    layers.Input((WIDTH, HEIGHT, 3)),
    layers.Conv2D(16, 3, padding="same"),
    layers.Conv2D(32, 3, padding="same"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10)
])

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"]
)

model_fit = model.fit(
    train_dataset,
    epochs=EPOCHS,
    validation_data=validator_dataset,
    verbose=2
)

# loading the saved model
model = load_model("./model")

for i in os.listdir(testing_path):
    img = image.load_img(testing_path + "/" + i, target_size=(WIDTH, HEIGHT, 3))
    img_array = image.img_to_array(img)
    img_batch = np.expand_dims(img_array, axis=0)

    prediction = model.predict(img_batch)
    print(prediction)
    print()
Output example:
[[ 875.5614 3123.8257 1521.7046 90.056526 335.5274
-785.3671 1075.9199 1105.3068 -14.917503 -3745.6494 ]]
You have to apply an activation function to the last Dense layer. If you want to classify the image, it should be softmax (you will get probabilities for all classes); here is the link:
https://keras.io/api/layers/activations/
As for class names, they are sorted alphanumerically; you can also pass the class_names argument. Here is the link to the arguments of this function:
https://www.tensorflow.org/api_docs/python/tf/keras/utils/image_dataset_from_directory
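A minimal sketch of the decoding step, assuming the model still ends in the raw-logit Dense(10) layer from the question (softmax can equally be applied after the fact) and that the class order comes from the training dataset:

import numpy as np
import tensorflow as tf

# image_dataset_from_directory sorts class folders alphanumerically
class_names = train_dataset.class_names

prediction = model.predict(img_batch)         # raw logits, shape (1, 10)
probs = tf.nn.softmax(prediction[0]).numpy()  # convert logits to probabilities
idx = int(np.argmax(probs))
print(class_names[idx], float(probs[idx]))    # predicted class and its confidence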

Input image size of Faster-RCNN model in Pytorch

I'm trying to implement a Faster-RCNN model with PyTorch.
In its structure, the first element of the model is a transform.
from torchvision.models.detection import fasterrcnn_resnet50_fpn
model = fasterrcnn_resnet50_fpn(pretrained=True)
print(model.transform)
GeneralizedRCNNTransform(
    Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    Resize(min_size=(800,), max_size=1333, mode='bilinear')
)
When images pass through Resize(), they come out as (800, h) or (w, 1333), according to the ratio of width to height.
for i in range(2):
    _, image, target = testset.__getitem__(i)
    img = image.unsqueeze(0)
    output, _ = model.transform(img)
Before Transform : torch.Size([512, 640])
After Transform : [(800, 1000)]
Before Transform : torch.Size([315, 640])
After Transform : [(656, 1333)]
My question is how those resized outputs are produced, and why this method is used. I can't find the information in the paper, and I can't understand the source code for the transform in fasterrcnn_resnet50_fpn.
Sorry for my English.
GeneralizedRCNN data transform:
https://github.com/pytorch/vision/blob/922db3086e654871c35cd80c2c01eabb65d78475/torchvision/models/detection/generalized_rcnn.py#L15
performs the data transformation on the inputs to feed into the model
min_size: minimum size of the image to be rescaled before feeding it to the backbone.
max_size: maximum size of the image to be rescaled before feeding it to the backbone.
https://github.com/pytorch/vision/blob/main/torchvision/models/detection/faster_rcnn.py#L256
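The resizing rule itself is easy to reproduce. A minimal sketch (it matches the shapes printed in the question): scale so the short side reaches min_size, unless that would push the long side past max_size, in which case the scale is capped:

def rcnn_resize(h, w, min_size=800, max_size=1333):
    # scale the short side to min_size, but never let the long side exceed max_size
    scale = min(min_size / min(h, w), max_size / max(h, w))
    return round(h * scale), round(w * scale)

print(rcnn_resize(512, 640))  # (800, 1000)
print(rcnn_resize(315, 640))  # (656, 1333)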
I couldn't find out why the defaults are min 800 and max 1333 either; I didn't find anything in the research paper.
But as the first layer is a Conv layer and the network input ends up at a fixed size, I apply many other augmentations, such as mirroring and random cropping, inspired by SSD-based networks. Hence I would prefer to do all augmentation in a separate place once instead of twice.
I would assume the model works best during validation on images with shapes and other properties as close as possible to the training data,
though you can experiment with custom min_size and max_size:
from torchvision.models.detection import fasterrcnn_resnet50_fpn
import torch

min_size = 900  # changed from default
max_size = 1433  # changed from default
image_mean = [0.485, 0.456, 0.406]
image_std = [0.229, 0.224, 0.225]
model = fasterrcnn_resnet50_fpn(pretrained=True, min_size=min_size, max_size=max_size,
                                image_mean=image_mean, image_std=image_std)

# batch of 4 images, 11 bboxes each
images, boxes = torch.rand(4, 3, 600, 1200), torch.rand(4, 11, 4)
boxes[:, :, 2:] += boxes[:, :, :2]  # ensure x2 > x1 and y2 > y1
labels = torch.randint(1, 91, (4, 11))
images = list(image for image in images)
targets = []
for i in range(len(images)):
    d = {}
    d['boxes'] = boxes[i]
    d['labels'] = labels[i]
    targets.append(d)
output = model(images, targets)
Or you can write your transforms completely yourself:
https://pytorch.org/vision/stable/transforms.html

from torchvision.transforms import transforms as T
model = fasterrcnn_resnet50_fpn(pretrained=True)
model.transform = T.Compose([...])  # check torchvision.transforms for more
Hope this helps.

ValueError: Dimensions must be equal, but are 3 and 3072 for 'loss/output_1_loss/mul' (op: 'Mul') with input shapes: [?,3], [?,3072]

I have an error in my code, and I've read the documentation, but it still errors.
What does "dimensions must be equal" mean? I had already added some layers to my model before calling model.fit().
This is my code:
# USAGE
# python train_simple_nn.py --dataset animals --model output/simple_nn.model --label-bin output/simple_nn_lb.pickle --plot output/simple_nn_plot.png
# set the matplotlib backend so figures can be saved in the background
import matplotlib
matplotlib.use("Agg")
# import the necessary packages
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from keras.models import Sequential
from keras.layers.core import Dense
from keras.optimizers import SGD
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import argparse
import random
import pickle
import cv2
import os
from keras import layers
import tensorflow as tf
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True,
                help="path to input dataset of images")
ap.add_argument("-m", "--model", required=True,
                help="path to output trained model")
ap.add_argument("-l", "--label-bin", required=True,
                help="path to output label binarizer")
ap.add_argument("-p", "--plot", required=True,
                help="path to output accuracy/loss plot")
args = vars(ap.parse_args())
# initialize the data and labels
print("[INFO] loading images...")
data = []
labels = []
# grab the image paths and randomly shuffle them
imagePaths = sorted(list(paths.list_images(args["dataset"])))
random.seed(42)
random.shuffle(imagePaths)
# loop over the input images
for imagePath in imagePaths:
    # load the image, resize it to 32x32 pixels (ignoring aspect ratio),
    # flatten the 32x32x3 = 3072 pixel image into a list, and store the
    # image in the data list
    image = cv2.imread(imagePath)
    image = cv2.resize(image, (32, 32)).flatten()
    data.append(image)
    # extract the class label from the image path and update the
    # labels list
    label = imagePath.split(os.path.sep)[-2]
    labels.append(label)
# scale the raw pixel intensities to the range [0, 1]
data = np.array(data, dtype="float") / 255.0
labels = np.array(labels)
# partition the data into training and testing splits using 75% of
# the data for training and the remaining 25% for testing
(trainX, testX, trainY, testY) = train_test_split(data,
    labels, test_size=0.25, random_state=42)
# convert the labels from integers to vectors (for 2-class, binary
# classification you should use Keras' to_categorical function
# instead as the scikit-learn's LabelBinarizer will not return a
# vector)
lb = LabelBinarizer()
trainY = lb.fit_transform(trainY)
testY = lb.transform(testY)
# define the 3072-1024-512-3 architecture using Keras
model = tf.keras.Sequential()
tf.keras.layers.Dense(1024, input_shape=(3072,), activation="sigmoid")
tf.keras.layers.Dense(512, activation="sigmoid")
tf.keras.layers.Dense(len(lb.classes_), activation="softmax")
# initialize our initial learning rate and # of epochs to train for
INIT_LR = 0.01
EPOCHS = 75
# compile the model using SGD as our optimizer and categorical
# cross-entropy loss (you'll want to use binary_crossentropy
# for 2-class classification)
print("[INFO] training network...")
opt = tf.keras.optimizers.SGD(lr=INIT_LR)
model.compile(loss="categorical_crossentropy", optimizer=opt,
metrics=["accuracy"])
# train the neural network
H = model.fit(trainX, trainY, validation_data=(testX, testY),
epochs=EPOCHS, batch_size=32)
# evaluate the network
print("[INFO] evaluating network...")
predictions = model.predict(testX, batch_size=32)
print(classification_report(testY.argmax(axis=1),
predictions.argmax(axis=1), target_names=lb.classes_))
# plot the training loss and accuracy
N = np.arange(0, EPOCHS)
plt.style.use("ggplot")
plt.figure()
plt.plot(N, H.history["loss"], label="train_loss")
plt.plot(N, H.history["val_loss"], label="val_loss")
plt.plot(N, H.history["acc"], label="train_acc")
plt.plot(N, H.history["val_acc"], label="val_acc")
plt.title("Training Loss and Accuracy (Simple NN)")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend()
plt.savefig(args["plot"])
# save the model and label binarizer to disk
print("[INFO] serializing network and label binarizer...")
model.save(args["model"])
f = open(args["label_bin"], "wb")
f.write(pickle.dumps(lb))
f.close()
and the error :
ValueError: Dimensions must be equal, but are 3 and 3072 for 'loss/output_1_loss/mul' (op: 'Mul') with input shapes: [?,3], [?,3072]
in model.fit(). How can I solve it?
The problem in your code is here:
model = tf.keras.Sequential()
tf.keras.layers.Dense(1024, input_shape=(3072,), activation="sigmoid")
tf.keras.layers.Dense(512, activation="sigmoid")
tf.keras.layers.Dense(len(lb.classes_), activation="softmax")
You define those layers, but you never add them to your model.
When using the sequential model, you need to add those layers to your model, via the .add() method.
Change those lines to:
model.add(tf.keras.layers.Dense(1024, input_shape=(3072,), activation="sigmoid"))
model.add(tf.keras.layers.Dense(512, activation="sigmoid"))
model.add(tf.keras.layers.Dense(len(lb.classes_), activation="softmax"))
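Equivalently, you can pass the layers straight to the Sequential constructor:

model = tf.keras.Sequential([
    tf.keras.layers.Dense(1024, input_shape=(3072,), activation="sigmoid"),
    tf.keras.layers.Dense(512, activation="sigmoid"),
    tf.keras.layers.Dense(len(lb.classes_), activation="softmax"),
])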
