I am working on a classification problem in PyTorch. I built a custom dataset and have computed the per-channel mean and std values over all samples in the training set; each sample has 4 channels.
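For reference, a minimal sketch of one way such per-channel statistics can be computed; the train_dataset name and the accumulation loop below are illustrative assumptions, not the code actually used:

import torch

# Sketch: accumulate per-channel mean/std over the training set,
# assuming each sample is a (4, H, W) float tensor.
channel_sum = torch.zeros(4)
channel_sq_sum = torch.zeros(4)
n_pixels = 0
for image, _ in train_dataset:
    channel_sum += image.sum(dim=(1, 2))
    channel_sq_sum += (image ** 2).sum(dim=(1, 2))
    n_pixels += image.shape[1] * image.shape[2]
mean = channel_sum / n_pixels
std = (channel_sq_sum / n_pixels - mean ** 2).sqrt()

The dataset's __getitem__ then looks like this: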
def __getitem__(self, index: int):
    images = self.loadImg(self.file[index])  # returns dict
    label = torch.tensor([self.label[index]], dtype=torch.int64)

    for name, image in images.items():
        image = torch.from_numpy(image).float().unsqueeze(0)
        images[name] = self.transform(image)
        if self.train_transform:
            images[name] = self.transform_cpu(images[name].to("cpu"))

    images = {key: image for key, image in images.items()}
    image = torch.stack([image.squeeze() for image in images.values()])
    # image.size() -> (4, W, H)

    if config.NORMALIZE:  # True
        self.norm_T(image)  # norm_T = T.Compose([T.Normalize([0.1704, 0.1610, 0.1710, 0.1615], [0.2887, 0.2859, 0.2890, 0.2861])])

    for img in image:  # show imgs
        T.ToPILImage()(img).show()
        time.sleep(1)

    return image, label
In the last for loop, I display every channel one by one. When I inspect the channels, I see that some of them are not affected by normalization: they look exactly like the freshly loaded, unmodified image. The remaining channels, however, are clearly affected by the normalization.
Why is normalization applied to only some of the channels?
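One thing worth checking as an aside (this is an assumption about the torchvision version in use, not a confirmed cause): in recent torchvision releases T.Normalize returns a new tensor unless inplace=True is set, so the result of norm_T(image) normally has to be assigned back. A minimal sketch of that pattern:

if config.NORMALIZE:  # True
    image = self.norm_T(image)  # keep the returned, normalized tensor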
Currently I'm trying to train a Matterport Mask R-CNN with custom classes and a custom dataset on Colab. I followed this tutorial:
https://github.com/TannerGilbert/MaskRCNN-Object-Detection-and-Segmentation
Instead of using images of the same size, my images have different sizes. I spent hours matching the masks (.json) to the image sizes, but finally it is working:
The following lines are loading two sample images and displaying the segmentation mask:
# Load and display random samples
image_ids = np.random.choice(dataset_train.image_ids, 2)
for image_id in image_ids:
    image, img_height, img_width = dataset_train.load_image(image_id)
    mask, class_ids = dataset_train.load_mask(image_id, img_height, img_width)
    visualize.display_top_masks(image, mask, class_ids, dataset_train.class_names)
load_image() in utils.py looks like this:
def load_image(self, image_id):
    """Load the specified image and return a [H,W,3] Numpy array.
    """
    # Load image
    image = skimage.io.imread(self.image_info[image_id]['path'])
    img_height, img_width, num_channels = image.shape
    # If grayscale. Convert to RGB for consistency.
    if image.ndim != 3:
        image = skimage.color.gray2rgb(image)
    # If has an alpha channel, remove it for consistency
    if num_channels == 4:
        image = image[..., :3]
    return image, img_height, img_width
load_mask() looks like this:
def load_mask(self, image_id, img_height, img_width):
    # get details of image
    info = self.image_info[image_id]
    # define box file location
    path = info['annotation']
    # load XML
    masks, classes = self.extract_masks(path, img_height, img_width)
    return masks, np.asarray(classes, dtype='int32')
And extract_masks() looks like this:
def extract_masks(self, filename, img_height, img_width):
    json_file = os.path.join(filename)
    with open(json_file) as f:
        img_anns = json.load(f)

    masks = np.zeros([img_height, img_width, len(img_anns['shapes'])], dtype='uint8')
    classes = []
    for i, anno in enumerate(img_anns['shapes']):
        mask = np.zeros([img_height, img_width], dtype=np.uint8)
        cv2.fillPoly(mask, np.array([anno['points']], dtype=np.int32), 1)
        masks[:, :, i] = mask
        classes.append(self.class_names.index(anno['label']))
    return masks, classes
Now we are getting to the curious part...
After continuing with my code by creating the model
# Create model in training mode
model = modellib.MaskRCNN(mode="training", config=config,
                          model_dir=MODEL_DIR)
...and choosing the weights to start with
# Which weights to start with?
init_with = "coco"  # imagenet, coco, or last

if init_with == "imagenet":
    model.load_weights(model.get_imagenet_weights(), by_name=True)
elif init_with == "coco":
    # Load weights trained on MS COCO, but skip layers that
    # are different due to the different number of classes
    # See README for instructions to download the COCO weights
    model.load_weights(COCO_MODEL_PATH, by_name=True,
                       exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                                "mrcnn_bbox", "mrcnn_mask"])
elif init_with == "last":
    # Load the last model you trained and continue training
    model.load_weights(model.find_last(), by_name=True)
...I get to the point where training should start:
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=5,
            layers='heads')
I'm receiving the following error:
/content/drive/My Drive/Colab/Mask_RCNN/mrcnn/model.py in load_image_gt()
1210 # Load image and mask
1211 image, img_height, img_width = dataset.load_image(image_id)
-> 1212 mask, class_ids = dataset.load_mask(image_id, img_height, img_width)
1213 original_shape = image.shape
1214 image, window, scale, padding, crop = utils.resize_image(
TypeError: load_mask() missing 2 required positional arguments: 'img_height' and 'img_width'
I noticed that load_image_gt() is different from the above-mentioned load_image(). I've already adjusted load_image_gt() as follows (I added #<------- to the relevant lines):
def load_image_gt(dataset, config, image_id, augment=False, augmentation=None,
                  use_mini_mask=False):
    """Load and return ground truth data for an image (image, mask, bounding boxes).

    augment: (deprecated. Use augmentation instead). If true, apply random
        image augmentation. Currently, only horizontal flipping is offered.
    augmentation: Optional. An imgaug (https://github.com/aleju/imgaug) augmentation.
        For example, passing imgaug.augmenters.Fliplr(0.5) flips images
        right/left 50% of the time.
    use_mini_mask: If False, returns full-size masks that are the same height
        and width as the original image. These can be big, for example
        1024x1024x100 (for 100 instances). Mini masks are smaller, typically,
        224x224 and are generated by extracting the bounding box of the
        object and resizing it to MINI_MASK_SHAPE.

    Returns:
    image: [height, width, 3]
    shape: the original shape of the image before resizing and cropping.
    class_ids: [instance_count] Integer class IDs
    bbox: [instance_count, (y1, x1, y2, x2)]
    mask: [height, width, instance_count]. The height and width are those
        of the image unless use_mini_mask is True, in which case they are
        defined in MINI_MASK_SHAPE.
    """
    # Load image and mask
    image, img_height, img_width = dataset.load_image(image_id)  #<-------
    mask, class_ids = dataset.load_mask(image_id, img_height, img_width)  #<-------
    original_shape = image.shape
    image, window, scale, padding, crop = utils.resize_image(
        image,
        min_dim=config.IMAGE_MIN_DIM,
        min_scale=config.IMAGE_MIN_SCALE,
        max_dim=config.IMAGE_MAX_DIM,
        mode=config.IMAGE_RESIZE_MODE)
    mask = utils.resize_mask(mask, scale, padding, crop)

    # Random horizontal flips.
    # TODO: will be removed in a future update in favor of augmentation
    if augment:
        logging.warning("'augment' is deprecated. Use 'augmentation' instead.")
        if random.randint(0, 1):
            image = np.fliplr(image)
            mask = np.fliplr(mask)

    # Augmentation
    # This requires the imgaug lib (https://github.com/aleju/imgaug)
    if augmentation:
        import imgaug

        # Augmenters that are safe to apply to masks
        # Some, such as Affine, have settings that make them unsafe, so always
        # test your augmentation on masks
        MASK_AUGMENTERS = ["Sequential", "SomeOf", "OneOf", "Sometimes",
                           "Fliplr", "Flipud", "CropAndPad",
                           "Affine", "PiecewiseAffine"]

        def hook(images, augmenter, parents, default):
            """Determines which augmenters to apply to masks."""
            return augmenter.__class__.__name__ in MASK_AUGMENTERS

        # Store shapes before augmentation to compare
        image_shape = image.shape
        mask_shape = mask.shape
        # Make augmenters deterministic to apply similarly to images and masks
        det = augmentation.to_deterministic()
        image = det.augment_image(image)
        # Change mask to np.uint8 because imgaug doesn't support np.bool
        mask = det.augment_image(mask.astype(np.uint8),
                                 hooks=imgaug.HooksImages(activator=hook))
        # Verify that shapes didn't change
        assert image.shape == image_shape, "Augmentation shouldn't change image size"
        assert mask.shape == mask_shape, "Augmentation shouldn't change mask size"
        # Change mask back to bool
        mask = mask.astype(np.bool)

    # Note that some boxes might be all zeros if the corresponding mask got cropped out.
    # and here is to filter them out
    _idx = np.sum(mask, axis=(0, 1)) > 0
    mask = mask[:, :, _idx]
    class_ids = class_ids[_idx]
    # Bounding boxes. Note that some boxes might be all zeros
    # if the corresponding mask got cropped out.
    # bbox: [num_instances, (y1, x1, y2, x2)]
    bbox = utils.extract_bboxes(mask)

    # Active classes
    # Different datasets have different classes, so track the
    # classes supported in the dataset of this image.
    active_class_ids = np.zeros([dataset.num_classes], dtype=np.int32)
    source_class_ids = dataset.source_class_ids[dataset.image_info[image_id]["source"]]
    active_class_ids[source_class_ids] = 1

    # Resize masks to smaller size to reduce memory usage
    if use_mini_mask:
        mask = utils.minimize_mask(bbox, mask, config.MINI_MASK_SHAPE)

    # Image meta data
    image_meta = compose_image_meta(image_id, original_shape, image.shape,
                                    window, scale, active_class_ids)

    return image, img_height, img_width, image_meta, class_ids, bbox, mask  #<-------
I don't know why the two required arguments are missing, because img_height and img_width are defined, aren't they?
In my opinion the code is exactly the same as the "# Load and display random samples" code mentioned above.
I would be very grateful if someone could help.
Many thanks in advance!
You must set the width and height values in load_yourdatasetname() via self.add_image() and retrieve them in the load_mask() function.
Example:
class Covid19Dataset(utils.Dataset):

    def load_covid19(self, dataset_dir, subset):
        """Load a subset of the covid-19 dataset.
        dataset_dir: Root directory of the dataset.
        subset: Subset to load: train or val
        """
        # Add classes. We have three classes to add.
        self.add_class("covid19", 1, "lung right")
        self.add_class("covid19", 2, "lung left")
        self.add_class("covid19", 3, "infection")

        # Train or validation dataset?
        assert subset in ["train", "val"]
        dataset_dir = os.path.join(dataset_dir, subset)
        dataset_Image_dir = os.path.join(dataset_dir, "Images")
        dataset_Labels_dir = os.path.join(dataset_dir, "Labels")

        # get image names in dataset directory
        filenamesInDir = [f for f in listdir(dataset_Image_dir) if isfile(join(dataset_Image_dir, f))]
        png_filenames = []
        for name in filenamesInDir:
            # Skip if file does not end with .png
            if not name.endswith(".png"):
                continue
            png_filenames.append(name)

        # Add images
        for a in png_filenames:
            image_path = os.path.join(dataset_Image_dir, a)
            label_path = os.path.join(dataset_Labels_dir, a)
            image = skimage.io.imread(image_path)
            height, width = image.shape[:2]

            self.add_image(
                "covid19",
                image_id=a,  # use file name as a unique image id
                path=image_path,
                label_path=label_path,
                width=width, height=height)

    def load_mask(self, image_id):
        """Generate instance masks for an image.
        Returns:
            masks: A bool array of shape [height, width, instance count] with
                one mask per instance.
            class_ids: a 1D array of class IDs of the instance masks.
        """
        # If not a covid19 dataset image, delegate to parent class.
        image_info = self.image_info[image_id]
        if image_info["source"] != "covid19":
            return super(self.__class__, self).load_mask(image_id)

        # [height, width, instance_count]
        info = self.image_info[image_id]
        label = skimage.io.imread(info["label_path"])
        Class_names = np.unique(label)
        Class_names = Class_names[1:]
        mask = np.zeros([info["height"], info["width"], len(Class_names)],
                        dtype=np.uint8)
        class_ids = []
        for i, p in enumerate(Class_names):
            mask[label == p, i] = 1
            class_ids.append(i + 1)

        # Return mask, and array of class IDs of each instance.
        class_ids = np.array(class_ids, dtype=np.int32)
        mask = np.rot90(mask)
        return mask.astype(np.bool), class_ids

    def image_reference(self, image_id):
        """Return the path of the image."""
        info = self.image_info[image_id]
        if info["source"] == "covid19":
            return info["path"]
        else:
            super(self.__class__, self).image_reference(image_id)

    def load_image(self, image_id):
        """Load the specified image and return a [H,W,3] Numpy array.
        """
        # Load image
        image = skimage.io.imread(self.image_info[image_id]['path'])
        # image = rotate(image, 90, resize=False)
        image = np.rot90(image)
        # If grayscale. Convert to RGB for consistency.
        if image.ndim != 3:
            image = skimage.color.gray2rgb(image)
        # If has an alpha channel, remove it for consistency
        if image.shape[-1] == 4:
            image = image[..., :3]
        return image
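A brief usage sketch of the class above (the dataset path is a placeholder; prepare() is the standard Matterport utils.Dataset call):

dataset_train = Covid19Dataset()
dataset_train.load_covid19("/path/to/covid19_dataset", "train")  # placeholder path
dataset_train.prepare()

# load_mask() now only needs the image id; width and height were stored via add_image()
mask, class_ids = dataset_train.load_mask(dataset_train.image_ids[0])

Stored this way, width and height are available inside load_mask() from self.image_info, so the stock load_image_gt() in model.py does not need to be changed.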
I am trying to feed my image ROI into the TensorFlow classifier I took from here. The idea is to first run a simple filter, get rectangle candidates, and then check (using the network) whether each rectangle (ROI) is actually what I am looking for.
class ScrewDetector:
    def __init__(self):
        self.session = None  # an internal variable needed for inception network

        # to keep the screw data in
        self.screw_data = dict()

        # load the labels of the classification: screw / non-screw
        self.class_labels = [line.rstrip() for line in tf.gfile.GFile(home + "/imagine_weights/screw_detector/retrained_labels.txt")]

        # prepare the network
        with tf.gfile.FastGFile(home + "/weights/screw_detector/retrained_graph.pb", 'rb') as f:
            graph_def = tf.GraphDef()  # the graph-graph_def is a saved copy of a TensorFlow graph, object initialization
            graph_def.ParseFromString(f.read())  # parse serialized protocol buffer data into variable
            _ = tf.import_graph_def(graph_def, name='')  # import a serialized TensorFlow GraphDef protocol buffer, extract objects in the GraphDef as tf.Tensor

        # start the session
        with tf.Session() as self.session:
            self.softmax_tensor = self.session.graph.get_tensor_by_name('final_result:0')

    def detect_screw(self):
        # get a copy and resize it
        img_raw = self.cv_image.copy()
        resized_img = cv2.resize(img_raw, (0, 0), fx=RESIZE_FACTOR, fy=RESIZE_FACTOR)

        # grayscale it
        gray = cv2.cvtColor(resized_img, cv2.COLOR_BGR2GRAY)

        # detect circles in the image
        circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, 1, 100, param1=50, param2=35, minRadius=15, maxRadius=30)

        # ensure at least some circles were found
        if circles is not None:
            # convert the (x, y) coordinates and radius of the circles to integers
            circles = np.round(circles[0, :]).astype("int")

            # get a counter
            screw_id = 0

            # loop over the (x, y) coordinates and radius of the circles
            for (x, y, r) in circles:
                # draw the circle in the output image, then draw a rectangle corresponding to the center of the circle
                # cv2.circle(resized_img, (x, y), r, (0, 255, 0), 4)
                cv2.rectangle(resized_img, (x - r, y - r), (x + r, y + r), (0, 0, 255), 5)

                # get the above rectangle as ROI
                screw_roi = resized_img[y:y + r, x:x + r]

                # feed it into the network
                # import IPython; IPython.embed()
                predictions = self.session.run(self.softmax_tensor, feed_dict={screw_id: [screw_roi.flatten()]})

                # get prediction values in array back
                top_k = predictions[0].argsort()[-len(predictions[0]):][::-1]

                # output
                for node_id in top_k:
                    human_string = self.class_labels[node_id]
                    score = predictions[0][node_id]
                    print('%s (score = %.5f)' % (human_string, score))

                # if it is a screw, go on, save its coordinates and append into the network
                # remap in the original image
                scaled_point = (round(x * (1 / RESIZE_FACTOR)), round(y * (1 / RESIZE_FACTOR)))

                # append to the dict
                self.screw_data[scaled_point] = r * RESIZE_FACTOR

                # iterate the counter
                screw_id += screw_id

            # publish the result, which is an image (scaled)
            result_image_msg = Image()
            try:
                result_image_msg = self.bridge.cv2_to_imgmsg(resized_img, "bgr8")
                # print(self.screw_data)
            except CvBridgeError as e:
                print("Could not make it through the cv bridge of death.")

            self.result_image_pub.publish(result_image_msg)
        else:
            print("No detection of circles.")
but I get:
TypeError: Cannot interpret feed_dict key as Tensor: Can not convert a int into a Tensor.
I do know that the variables screw_id and screw_roi are not empty. And I do know that one needs to feed a dictionary in, which is why I was trying to do that in the first place. But I can't get it running because of the error above.
Any thoughts?
EDIT: So normally, this code loads the image and conducts the prediction as follows:
image_data = tf.gfile.FastGFile(image_path, 'rb').read()
softmax_tensor = sess.graph.get_tensor_by_name('final_result:0')
predictions = sess.run(softmax_tensor, {'DecodeJpeg/contents:0': image_data})
All I want is to turn this into a form which operates with the image ROI provided during the operation. It can't be too complicated.
It's not rocket science, it turns out.
One needs to convert the image so that a string of image bytes can be passed in, because that's what the graph's DecodeJpeg/contents:0 input (fed through sess.run()) expects.
If you don't have a file that you want to load from the file system, then the following is the way:
image_data = cv2.imencode('.jpg', screw_roi)[1].tostring() # pass a string of image bytes
after this, you simply can run:
predictions = self.session.run(self.softmax_tensor, {'DecodeJpeg/contents:0': image_data})
That's it.
feed_dict expects a dictionary with tensors (e.g. placeholders) as keys, used to populate those tensors with the specified values. Your code snippet doesn't show how screw_id is initialized, but I bet it's not a tensor of any kind; hence your error.
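A minimal sketch of the expected feed_dict usage with the TF 1.x API (the placeholder and values below are illustrative, not taken from the code above):

import tensorflow as tf  # TF 1.x API, matching the question

x = tf.placeholder(tf.float32, shape=[None, 3])  # the tensor itself is the feed_dict key
y = tf.reduce_sum(x, axis=1)

with tf.Session() as sess:
    # feed_dict maps the placeholder tensor (not a Python int) to concrete values
    print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0]]}))  # -> [6.]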
I'm trying to classify images using SIFT-computed local descriptors with Bag of Visual Words, KMeans clustering and histograms.
I've read a lot of SO answers and tried to follow these instructions; however, it feels like I don't understand how the whole pipeline should work. Below is the code I've implemented, and it runs really slowly.
That's why I'm asking this question: to clarify my understanding of using SIFT descriptors for classification and verify my code implementation.
I hope to get feedback on my understanding and get some help in improving my knowledge of the concept.
Firstly, I've written a class wrapper for SIFT. My wrapper computes SIFT descriptors on image patches using a sliding window, and it uses RootSIFT for the descriptor computation. The function detectAndCompute is its main function: it takes an image as an argument, crops it into several sub-images using a sliding window, computes RootSIFT descriptors for each sub-image, and unites all the descriptors from all sub-images into a single matrix of descriptors.
class DenseRootSIFT(object):
    def __init__(self):
        self.sift = cv2.xfeatures2d.SIFT_create()

    def detectAndCompute(self, image, step_size=12, window_size=(10, 10)):
        if window_size is None:
            winH, winW = image.shape[:2]
            window_size = (winW // 4, winH // 4)

        descriptors = np.array([], dtype=np.float32).reshape(0, 128)
        for crop in self._crop_image(image, step_size, window_size):
            crop = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
            descs = self._detectAndCompute(crop)[1]
            if descs is not None:
                descriptors = np.vstack([descriptors, self._detectAndCompute(crop)[1]])
        return descriptors

    def _detect(self, image):
        return self.sift.detect(image)

    def _compute(self, image, kps, eps=1e-7):
        kps, descs = self.sift.compute(image, kps)
        if len(kps) == 0:
            return [], None

        descs /= (descs.sum(axis=1, keepdims=True) + eps)
        descs = np.sqrt(descs)
        return kps, descs

    def _detectAndCompute(self, image):
        kps = self._detect(image)
        return self._compute(image, kps)

    def _sliding_window(self, image, step_size, window_size):
        for y in xrange(0, image.shape[0], step_size):
            for x in xrange(0, image.shape[1], step_size):
                yield (x, y, image[y:y + window_size[1], x:x + window_size[0]])

    def _crop_image(self, image, step_size=12, window_size=(10, 10)):
        crops = []
        winH, winW = window_size
        for (x, y, window) in self._sliding_window(image, step_size=step_size, window_size=(winW, winH)):
            if window.shape[0] != winH or window.shape[1] != winW:
                continue
            crops.append(image[y:y + winH, x:x + winW])
        return np.array(crops)
Below I post my class called DenseRootSiftPreparator, which should provide tools for extracting SIFT features from an image and preparing them for further classification (particularly with LinearSVC from sklearn).
So, I follow this process:
Generate a codebook (the _generate_codebook function in the class below). The codebook is generated by applying mini-batch KMeans clustering with 2048 clusters. As output, the function returns a 2048 x 128 matrix.
Then I try to create a histogram for each image in the dataset by following the instructions I've posted above. A histogram for a single image is created using the _create_histogram function. At first, the histogram is initialized with zeros. Then the descriptors are computed for the input image, and for each descriptor I try to find the index of the closest descriptor in the previously generated codebook (using KDTree from scipy) and increment the value of the histogram at that index. Then I L2-normalize the histogram array and return it. The same process is repeated for each image, and it is very, very slow.
Here's the code for DenseRootSiftPreparator:
class DenseRootSiftPreparator(object):
    def __init__(self, histogram_size=2048):
        self.X = []
        self.dense_root_sift = DenseRootSIFT()
        self.histogram_size = histogram_size

    def fit(self, image_dataset, y=None):
        # @param image_dataset - array of images in OpenCV format
        self.X = image_dataset

    def extract_descriptors_and_prepare_for_classification(self, image):
        return self._get_histograms_for_image(image)

    def _get_histograms_for_image(self, image):
        codebook = self._generate_codebook(image)
        histograms = []
        for img in self.X:
            histogram = self._create_histogram(img, self.histogram_size, codebook)
            histograms.append(histogram)
        return histograms

    def _create_histogram(self, image, hist_size, codebook):
        histogram = np.zeros(hist_size)
        descriptors = self.dense_root_sift.detectAndCompute(image, window_size=None)
        tree = spatial.KDTree(codebook)
        for i in xrange(len(descriptors)):
            histogram[tree.query(descriptors[i])[1]] += 1
        return normalize(histogram[:, np.newaxis], axis=0).ravel()

    def _generate_codebook(self, image):
        descriptors = self.dense_root_sift.detectAndCompute(image, window_size=None)
        kmeans = MiniBatchKMeans(n_clusters=2048, batch_size=128,
                                 n_init=10, max_no_improvement=10)
        kmeans.fit(descriptors)
        codebook = kmeans.cluster_centers_[:]
        return codebook
I would test my code in the following way:
images = get_images_dataset()
test_input_img = cv2.imread('test_input_image.jpg')
histogram_extractor = DenseRootSiftPreparator()
histogram_extractor.fit(images)
hists = histogram_extractor.extract_descriptors_and_prepare_for_classification(test_input_img)
Here are my imports (just in case):
import numpy as np
from scipy import spatial
import cv2
from cv2.xfeatures2d import SIFT_create
from sklearn.cluster import MiniBatchKMeans
from sklearn.preprocessing import normalize
My main questions:
Is my understanding of creating the Bag of Visual Words model using SIFT descriptors correct?
If not, what am I doing wrong? What could be done better?
Do the functions I've described above work as they should, or am I missing something?
Is there a way to make the SIFT descriptor preparation for classification better and more efficient?
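On the last point, one common speed-up (shown here only as a hedged sketch, assuming a descriptor matrix and a precomputed codebook like the ones above) is to query all descriptors against the codebook in a single batched call instead of a per-descriptor Python loop:

import numpy as np
from scipy.spatial import cKDTree

def histogram_from_descriptors(descriptors, codebook):
    # Single batched nearest-neighbour query; cKDTree is the C implementation of KDTree.
    tree = cKDTree(codebook)
    _, nearest = tree.query(descriptors)  # index of the closest codeword per descriptor
    hist = np.bincount(nearest, minlength=len(codebook)).astype(np.float64)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist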
I want to extract face descriptors from photos of people. This is what I've done so far:
First, I detected faces in the photos using the OpenCV library in Python.
Then I saved each detected face as a separate image (a rough sketch of these two steps follows below).
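For illustration only, a minimal sketch of such a detect-and-crop step with OpenCV's bundled Haar cascade (the image path, cascade choice and output naming are assumptions, not my exact code):

import cv2

cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
img = cv2.imread("photo.jpg")  # placeholder input photo
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for i, (x, y, w, h) in enumerate(faces):
    cv2.imwrite("detectedface_%d.jpg" % i, img[y:y + h, x:x + w])  # save each face crop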
Next, I have to extract a descriptor from each face image.
For this I have downloaded the VGG Face Caffe model CNN from here: http://www.robots.ox.ac.uk/~vgg/software/vgg_face/
To extract the descriptor, first I did this:
net = caffe.Net('CAFFE_FACE_deploy.prototxt','CAFFE_FACE.caffemodel',caffe.TEST)
img = caffe.io.load_image( "detectedface.jpg" )
img = img[:,:,::-1]*255.0
avg = np.array([129.1863,104.7624,93.5940])
img = img - avg
img = img.transpose((2,0,1))
img = img[None,:]
out = net.forward_all( data = img )
But it gives a dimension mismatch error saying the data should have dimensions (50,3,224,224) instead of (50,3,490,490).
Then I tried this:
# input preprocessing: 'data' is the name of the input blob == net.inputs[0]
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2,0,1))
transformer.set_mean('data', np.load(caffe_root + 'python/caffe/imagenet/ilsvrc_2012_mean.npy').mean(1).mean(1)) # mean pixel
transformer.set_raw_scale('data', 255) # the reference model operates on images in [0,255] range instead of [0,1]
transformer.set_channel_swap('data', (2,1,0)) # the reference model has channels in BGR order instead of RGB
net.blobs['data'].reshape(50,3,224,224)
net.blobs['data'].data[...] = transformer.preprocess('data', caffe.io.load_image('detectedface.jpg'))
out = net.forward()
feats = net.blobs['fc7'].data[0]
Here, when I print feats, it shows all zeros. Why is that?