I am trying to understand computer vision models better by exploring how they work. To learn how to interpret feature vectors, I'm trying to use PyTorch to extract a feature vector from an image. Below is the code I've pieced together from various places.
import torch
import torch.nn as nn
import torchvision.models as models
import torchvision.transforms as transforms
from torch.autograd import Variable
from PIL import Image
img=Image.open("Documents/01235.png")
# Load the pretrained model
model = models.resnet18(pretrained=True)
# Use the model object to select the desired layer
layer = model._modules.get('avgpool')
# Set model to evaluation mode
model.eval()
transforms = torchvision.transforms.Compose([
    torchvision.transforms.Resize(256),
    torchvision.transforms.CenterCrop(224),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
def get_vector(image_name):
    # Load the image with Pillow library
    img = Image.open("Documents/Documents/Driven Data Competitions/Hateful Memes Identification/data/01235.png")
    # Create a PyTorch Variable with the transformed image
    t_img = transforms(img)
    # Create a vector of zeros that will hold our feature vector
    # The 'avgpool' layer has an output size of 512
    my_embedding = torch.zeros(512)
    # Define a function that will copy the output of a layer
    def copy_data(m, i, o):
        my_embedding.copy_(o.data)
    # Attach that function to our selected layer
    h = layer.register_forward_hook(copy_data)
    # Run the model on our transformed image
    model(t_img)
    # Detach our copy function from the layer
    h.remove()
    # Return the feature vector
    return my_embedding

pic_vector = get_vector(img)
When I do this I get the following error:
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [64, 3, 7, 7], but got 3-dimensional input of size [3, 224, 224] instead
I'm sure this is an elementary error, but I can't seem to figure out how to fix it. It was my impression that the ToTensor transform would make my data 4-dimensional, but it seems it's either not working correctly or I'm misunderstanding it. I'd appreciate any help or resources I can use to learn more about this!
All the default nn.Modules in PyTorch expect an additional batch dimension. If the input to a module has shape (B, ...) then the output will be (B, ...) as well (though the later dimensions may change depending on the layer). This behavior allows efficient inference on batches of B inputs simultaneously. To make your code conform, you can just unsqueeze an additional unitary dimension onto the front of the t_img tensor before sending it into your model, making it a (1, ...) tensor. You will also need to flatten the output of layer before storing it if you want to copy it into your one-dimensional my_embedding tensor.
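For example, a quick shape check (just a sketch) of what unsqueeze does:

import torch

t_img = torch.zeros(3, 224, 224)   # what the transforms produce: (C, H, W)
print(t_img.unsqueeze(0).shape)    # torch.Size([1, 3, 224, 224]), i.e. (B, C, H, W)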
A couple of other things:
You should infer within a torch.no_grad() context to avoid computing gradients since you won't be needing them (note that model.eval() just changes the behavior of certain layers like dropout and batch normalization, it doesn't disable construction of the computation graph, but torch.no_grad() does).
I assume this is just a copy-paste issue, but transforms is both the name of an imported module and of a global variable, which invites confusion.
o.data just returns the tensor o detached from autograd; it is not an independent copy. In the old Variable interface (circa PyTorch 0.3.1 and earlier) this used to be necessary, but the Variable interface was deprecated back in PyTorch 0.4.0 and .data no longer does anything useful; now its use just creates confusion. Unfortunately, many tutorials are still being written using this old and unnecessary interface.
Updated code is then as follows:
import torch
import torchvision
import torchvision.models as models
from PIL import Image
img = Image.open("Documents/01235.png")
# Load the pretrained model
model = models.resnet18(pretrained=True)
# Use the model object to select the desired layer
layer = model._modules.get('avgpool')
# Set model to evaluation mode
model.eval()
transforms = torchvision.transforms.Compose([
    torchvision.transforms.Resize(256),
    torchvision.transforms.CenterCrop(224),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
def get_vector(image):
    # Create a PyTorch tensor with the transformed image
    t_img = transforms(image)
    # Create a vector of zeros that will hold our feature vector
    # The 'avgpool' layer has an output size of 512
    my_embedding = torch.zeros(512)
    # Define a function that will copy the output of a layer
    def copy_data(m, i, o):
        my_embedding.copy_(o.flatten())      # <-- flatten
    # Attach that function to our selected layer
    h = layer.register_forward_hook(copy_data)
    # Run the model on our transformed image
    with torch.no_grad():                    # <-- no_grad context
        model(t_img.unsqueeze(0))            # <-- unsqueeze
    # Detach our copy function from the layer
    h.remove()
    # Return the feature vector
    return my_embedding

pic_vector = get_vector(img)
Instead of this:

model(t_img)

just do:

model(t_img[None])

This adds an extra batch dimension, so the input will have shape [1, 3, 224, 224] and the forward pass will work.
I wish to take the FFT of the input dataset loaded using ImageDataGenerator. Taking the FFT will double the number of channels, because I stack the real and imaginary parts of the complex FFT output along the channels dimension. The preprocessing_function attribute of the ImageDataGenerator class should output a Numpy tensor with the same shape as the input, so I could not use that.
I tried applying tf.signal.fft2d directly to the ImageDataGenerator.flow_from_directory() output, but it consumes too much RAM, causing the program to crash on Google Colab. Another way I tried was to add a custom layer computing the FFT as the first layer of my neural network, but this adds to the training time. So I wish to do it as a pre-processing step.
Could anyone kindly suggest an efficient way to apply such a function to the output of ImageDataGenerator?
You can write a custom input pipeline instead of ImageDataGenerator (the example below uses tf.data), but I have no reason to think this will be any faster than doing the FFT in the first layer. It seems like a costly operation either way, since tf.signal.fft2d takes complex64 or complex128 dtypes. So the input needs casting to complex, and then casting back, because neural network weights are tf.float32 and other image-processing functions don't take complex dtypes.
import tensorflow as tf

labels = ['Cats', 'Dogs', 'Others']

def read_image(file_name):
    image = tf.io.read_file(file_name)
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.convert_image_dtype(image, tf.float32)
    image = tf.image.resize_with_pad(image, target_height=224, target_width=224)
    image = tf.cast(image, tf.complex64)
    # Note: fft2d transforms the two innermost dimensions of the tensor
    image = tf.signal.fft2d(image)
    # Derive the label from the directory name in the file path
    label = tf.strings.split(file_name, '\\')[-2]
    label = tf.where(tf.equal(label, labels))
    return image, label

ds = tf.data.Dataset.list_files(r'path\to\my\pictures\*\*.jpg')
ds = ds.map(read_image)
next(iter(ds))
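If you go this route, you would typically also batch and prefetch the dataset before training; a minimal sketch (the batch size is arbitrary):

# tf.data.AUTOTUNE requires a recent TF; on older versions use tf.data.experimental.AUTOTUNE
ds = ds.batch(32).prefetch(tf.data.AUTOTUNE)
# model.fit(ds, ...)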
(Apologies for the long post)
All,
I want to use the bottleneck features from a pretrained Inceptionv3 model to predict classification for my input images. Before training a model and predicting classification, I tried 3 different approaches for extracting the bottleneck features.
My 3 approaches yielded different bottleneck features (not just different values; even the sizes were different).
Size of my bottleneck features from Approach 1 and 2: (number of input images) x 3 x 3 x 2048
Size of my bottleneck features from Approach 3: (number of input images) x 2048
Why are the sizes different between the Keras-based InceptionV3 model and the native TensorFlow model? My guess is that when I say include_top=False in Keras, I'm not extracting the 'pool_3/_reshape:0' layer. Is this correct? If yes, how do I extract the 'pool_3/_reshape:0' layer in Keras? If my guess is incorrect, what am I missing?
I compared the bottleneck feature values from Approaches 1 and 2 and they were significantly different. I think I'm feeding them the same input images, because I resize and rescale my images before I even read them as input for my script. I have no options set for my ImageDataGenerator in Approach 1, and according to the documentation for that class all the default values leave my input images unchanged. I have set shuffle to False, so I assumed that predict_generator and predict read images in the same order. What am I missing?
Please note:
My input images are in RGB format (so the number of channels = 3) and I resized all of them to 150x150. I used the preprocess_input function from inceptionv3.py to preprocess all my images.
def preprocess_input(image):
    image /= 255.
    image -= 0.5
    image *= 2.
    return image
Approach 1: Used Keras with tensorflow as backend, an ImageDataGenerator to read my data and model.predict_generator to compute bottleneck features
I followed the example (section "Using the bottleneck features of a pre-trained network: 90% accuracy in a minute") from the Keras blog. Instead of the VGG model listed there, I used InceptionV3. Below is the snippet of code I used.
(Code not shown here, but before the code below I: read all input images, resized them to 150x150x3, rescaled them according to the preprocess_input function mentioned above, and saved the resized and rescaled images.)
train_datagen = ImageDataGenerator()
train_generator = train_datagen.flow_from_directory(my_input_dir, target_size=(150,150),shuffle=False, batch_size=16)
# get bottleneck features
# use pre-trained model and exclude top layer - which is used for classification
pretrained_model = InceptionV3(include_top=False, weights='imagenet', input_shape=(150,150,3))
bottleneck_features_train_v1 = pretrained_model.predict_generator(train_generator,len(train_generator.filenames)//16)
Approach 2: Used Keras with tensorflow as backend, my own reader and model.predict to compute bottleneck features
The only difference between this approach and the earlier one is that I used my own reader to read the input images.
(Code not shown here, but before the code below I: read all input images, resized them to 150x150x3, rescaled them according to the preprocess_input function mentioned above, and saved the resized and rescaled images.)
# inputImages is a numpy array of size <number of input images x 150 x 150 x 3>
inputImages = readAllJPEGsInFolderAndMergeAsRGB(my_input_dir)
# get bottleneck features
# use pre-trained model and exclude top layer - which is used for classification
pretrained_model = InceptionV3(include_top=False, weights='imagenet', input_shape=(img_width, img_height, 3))
bottleneck_features_train_v2 = pretrained_model.predict(inputImages, batch_size=16)
Approach 3: Used TensorFlow (no Keras) to compute bottleneck features
I followed retrain.py to extract bottleneck features for my input images. Please note that the weights for that script can be obtained from http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz
As mentioned in that example, I used bottleneck_tensor_name = 'pool_3/_reshape:0' as the layer from which to extract and compute bottleneck features. As in the first two approaches, I used the resized and rescaled images as input to the script, and I called this feature list bottleneck_features_train_v3.
Thank you so much
Different results between 1 and 2
Since you haven't shown your code, my (possibly wrong) guess is that the problem is that you did not use preprocess_input when declaring the ImageDataGenerator:
from keras.applications.inception_v3 import preprocess_input
train_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
Make sure, though, that your saved image files have pixel values ranging from 0 to 255 (24-bit depth).
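A quick way to sanity-check one of your saved files (just a sketch; the file name is hypothetical):

from PIL import Image
import numpy as np

arr = np.array(Image.open('some_saved_image.jpg'))  # hypothetical path
print(arr.dtype, arr.min(), arr.max())  # expect uint8 with values in [0, 255]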
Different shapes between 1 and 3
There are three possible types of model in this case:
include_top=True -> this returns the classes
include_top=False (only) -> this implies pooling=None (no final pooling layer)
include_top=False, pooling='avg' or pooling='max' -> has a final pooling layer
So, since your model is declared without an explicit pooling=something, it doesn't have the final pooling layer in Keras, and the outputs still have the spatial dimensions.
Solve that simply by adding a pooling at the end. One of these:
pretrained_model = InceptionV3(include_top=False, pooling = 'avg', weights='imagenet', input_shape=(img_width, img_height, 3))
pretrained_model = InceptionV3(include_top=False, pooling = 'max', weights='imagenet', input_shape=(img_width, img_height, 3))
Not sure which one the model in the tgz file is using.
As an alternative, you can also get another layer from the Tensorflow model, the one coming immediately before 'pool_3'.
You can look into the Keras implementation of inceptionv3 here:
https://github.com/keras-team/keras/blob/master/keras/applications/inception_v3.py
So, the default parameters are:

def InceptionV3(include_top=True,
                weights='imagenet',
                input_tensor=None,
                input_shape=None,
                pooling=None,
                classes=1000):
Notice that the default is pooling=None; then, when building the model, the code is:
if include_top:
    # Classification block
    x = GlobalAveragePooling2D(name='avg_pool')(x)
    x = Dense(classes, activation='softmax', name='predictions')(x)
else:
    if pooling == 'avg':
        x = GlobalAveragePooling2D()(x)
    elif pooling == 'max':
        x = GlobalMaxPooling2D()(x)

# Ensure that the model takes into account
# any potential predecessors of `input_tensor`.
if input_tensor is not None:
    inputs = get_source_inputs(input_tensor)
else:
    inputs = img_input
# Create model.
model = Model(inputs, x, name='inception_v3')
So if you do not specify pooling, the bottleneck features are extracted without any pooling layer; you need to specify whether you want average or max pooling applied on top of these features.
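A quick shape check illustrating the difference (just a sketch; weights=None is used so nothing is downloaded, since only the output shapes matter here):

from keras.applications.inception_v3 import InceptionV3
import numpy as np

no_pool = InceptionV3(include_top=False, weights=None, input_shape=(150, 150, 3))
avg_pool = InceptionV3(include_top=False, pooling='avg', weights=None, input_shape=(150, 150, 3))

x = np.zeros((1, 150, 150, 3), dtype='float32')
print(no_pool.predict(x).shape)   # (1, 3, 3, 2048) -> what Approaches 1 and 2 produce
print(avg_pool.predict(x).shape)  # (1, 2048)       -> what Approach 3 produces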
Is it possible to pass a filter array containing my own filters to Conv2D as a parameter, instead of just the number of filters?
filters = [[[1,0,0],[1,0,0],[1,0,0]],
           [[1,0,0],[0,1,0],[0,0,1]],
           [[0,1,0],[0,1,0],[0,1,0]],
           [[0,0,1],[0,0,1],[0,0,1]]]

model = Sequential()
model.add(Conv2D(filters, (3, 3), activation='relu', input_shape=(3, 1024, 1024), data_format='channels_first'))
The accepted answer is right, but it would certainly be more useful with a complete example, similar to the one provided in this excellent TensorFlow example showing what Conv2D does.
For Keras, this is:
from keras.models import Sequential
from keras.layers import Conv2D
import numpy as np

# Keras version of this example:
# https://stackoverflow.com/questions/34619177/what-does-tf-nn-conv2d-do-in-tensorflow
# Requires a custom kernel initializer to set the kernel to the value from the example
# kernel = [[1,0,1],[2,1,0],[0,0,1]]
# image  = [[4,3,1,0],[2,1,0,1],[1,2,4,1],[3,1,0,2]]
# output = [[14, 6],[6,12]]

# Set image
image = [[4,3,1,0],[2,1,0,1],[1,2,4,1],[3,1,0,2]]
# Pad to "channels_last" format,
# which is [batch, width, height, channels] = [1, 4, 4, 1]
image = np.expand_dims(np.expand_dims(np.array(image), 2), 0)

# Initializer to set the kernel to the required value
def kernel_init(shape):
    kernel = np.zeros(shape)
    kernel[:, :, 0, 0] = np.array([[1,0,1],[2,1,0],[0,0,1]])
    return kernel

# Build Keras model
model = Sequential()
model.add(Conv2D(1, [3, 3], kernel_initializer=kernel_init,
                 input_shape=(4, 4, 1), padding="valid"))
model.build()

# To apply the existing filter, we use predict with no training
out = model.predict(image)
print(out[0, :, :, 0])
which outputs
[[14, 6]
[6, 12]]
as expected.
You must keep in mind that the purpose of a Conv2D network is to train these filter values. In a traditional image-processing task using morphological filters, we are supposed to design the filter kernels ourselves and then slide them across the whole image (convolution).
In a deep learning approach we are trying to do the same task. But here we assume we don't know in advance which filters should be used, although we know exactly what we are looking for (the labeled images). When we train a convolutional neural network, we show it what we want and ask it to find its own weights, i.e. the filter values.
So, in this context, we just define how many filters we want to train (in your case, 4 filters) and how they will be initialized. Their weights are then set by training the network.
There are many ways to initialize your filter weights (e.g. setting them all to zero or one, or using a random function so that distinct image characteristics are captured by them). By default, the Keras Conv2D layer uses the 'glorot_uniform' initializer, as specified in https://keras.io/layers/convolutional/#conv2d.
If you really want to initialize your filter weights in the way you have shown, you can write your own function (take a look at https://keras.io/initializers/) and pass it via the kernel_initializer parameter:
model.add(Conv2D(number_of_filters, (3, 3), activation='relu', input_shape=(3, 1024, 1024), kernel_initializer=your_function, data_format='channels_first'))
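For instance, here is a minimal sketch of such an initializer for the four 3x3 patterns from your question. It assumes each pattern should be replicated across all three input channels; adjust the assignment if you intend something else.

import numpy as np

# The four 3x3 patterns from the question
my_filters = np.array([[[1,0,0],[1,0,0],[1,0,0]],
                       [[1,0,0],[0,1,0],[0,0,1]],
                       [[0,1,0],[0,1,0],[0,1,0]],
                       [[0,0,1],[0,0,1],[0,0,1]]], dtype='float32')

def my_filter_init(shape, dtype=None):
    # shape is (kernel_h, kernel_w, in_channels, out_channels) = (3, 3, 3, 4) here
    kernel = np.zeros(shape, dtype='float32')
    for out_ch in range(shape[3]):
        for in_ch in range(shape[2]):
            # Assumption: same 3x3 pattern for every input channel
            kernel[:, :, in_ch, out_ch] = my_filters[out_ch]
    return kernel

model.add(Conv2D(4, (3, 3), activation='relu', input_shape=(3, 1024, 1024),
                 kernel_initializer=my_filter_init, data_format='channels_first'))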
I am trying to use a function that uses some OpenCV function on the image. But the data I am getting is a tensor and I am not able to convert it into an image.
def image_func(img):
    img = cv2.cvtColor(img, cv2.COLOR_BGR2YUV)
    img = cv2.resize(img, (200, 66))
    return img

model = Sequential()
model.add(Lambda(image_func, input_shape=(r, c, ch), output_shape=(r, c, ch)))
When I run this snippet it throws an error in the cvtColor function saying that img is not a numpy array. I printed out img and it seemed to be a tensor.
I do not know how to change the tensor to an image and then return the tensor as well. I want the model to have this layer.
If I cannot achieve this with a lambda layer what else can I do?
You are confusing the symbolic operations in the Lambda layer with the numerical operations in a plain Python function.
Basically, your custom operation accepts numerical inputs but not symbolic ones. To fix this, you need something like tf.py_func in TensorFlow.
In addition, you have not considered backpropagation. In short, although this layer is non-parametric and non-learnable, you still need to take care of its gradient.
import tensorflow as tf
import numpy as np
from keras.layers import Input, Conv2D, Layer
from keras.models import Model
from keras import backend as K
import cv2

def image_func(img):
    img = cv2.cvtColor(img, cv2.COLOR_BGR2YUV)
    img = cv2.resize(img, (200, 66))
    return img.astype('float32')

def image_tensor_func(img4d):
    results = []
    for img3d in img4d:
        rimg3d = image_func(img3d)
        results.append(np.expand_dims(rimg3d, axis=0))
    return np.concatenate(results, axis=0)

class CustomLayer(Layer):
    def call(self, xin):
        xout = tf.py_func(image_tensor_func,
                          [xin],
                          'float32',
                          stateful=False,
                          name='cvOpt')
        xout = K.stop_gradient(xout)  # explicitly set no grad
        xout.set_shape([xin.shape[0], 66, 200, xin.shape[-1]])  # explicitly set output shape
        return xout

    def compute_output_shape(self, sin):
        return (sin[0], 66, 200, sin[-1])

x = Input(shape=(None, None, 3))
f = CustomLayer(name='custom')(x)
y = Conv2D(1, (1, 1), padding='same')(f)

model = Model(inputs=x, outputs=y)
model.summary()
Now you can test this layer with some dummy data.
a = np.random.randn(2, 100, 200, 3)
b = model.predict(a)
print(b.shape)

model.compile('sgd', loss='mse')
model.fit(a, b)
I'm going to assume the image_func function does what you want (the color conversion and resize). Note that an image is represented by a numpy array. Since you are using the TensorFlow backend, you are operating over Tensors (this you knew).
The job now is to convert a Tensor to a numpy array. To do that we need to evaluate the Tensor, and in order to do that we need to grab a TensorFlow session.
Use the get_session() method of the keras backend module to grab the current tensorflow session.
Here is the docstring for get_session()
def get_session():
    """Returns the TF session to be used by the backend.
    If a default TensorFlow session is available, we will return it.
    Else, we will return the global Keras session.
    If no global Keras session exists at this point:
    we will create a new global session.
    Note that you can manually set the global session
    via `K.set_session(sess)`.
    # Returns
        A TensorFlow session.
    """
So try:
def image_func(img):
    from keras import backend as K
    sess = K.get_session()
    img = sess.run(img)  # now img is a proper numpy array
    img = cv2.cvtColor(img, cv2.COLOR_BGR2YUV)
    img = cv2.resize(img, (200, 66))
    return img
Note: I haven't been able to test this.
EDIT: I just tested this and it won't work (as you noticed). The Lambda layer needs to return a Tensor; computation flows through Tensors, and the operation also needs to be smooth in the sense of differentiation.
I see that essentially the Lambda is just changing the color space and resizing the image, so why don't you do this in a pre-processing step instead?
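A minimal sketch of doing it as pre-processing instead (assuming raw_images is a list of BGR uint8 frames, e.g. read with cv2.imread):

import cv2
import numpy as np

def preprocess(images):
    out = []
    for img in images:
        img = cv2.cvtColor(img, cv2.COLOR_BGR2YUV)
        img = cv2.resize(img, (200, 66))   # dsize is (width, height) -> 66x200 output
        out.append(img.astype('float32'))
    return np.stack(out, axis=0)           # shape (N, 66, 200, 3)

# x_train = preprocess(raw_images)
# model.fit(x_train, y_train, ...)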
I have been given a trained neural network in torch and I need to rebuild it exactly in tensorflow. I believe I have correctly defined the network's architecture in tensorflow but I am having trouble transferring the weight and bias tensors. Using a third party package, I converted all the weight and bias tensors from the torch network to numpy arrays then wrote them to disk. I can load them back into my python program but I cannot figure out a way to assign them to the corresponding layers in my tensorflow network.
For instance, I have a convolution layer defined in tensorflow as
kernel_1 = tf.Variable(tf.truncated_normal([11, 11, 3, 64], stddev=0.1))
conv_kernel_1 = tf.nn.conv2d(input, kernel_1, [1, 4, 4, 1], padding='SAME')
biases_1 = tf.Variable(tf.zeros([64]))
bias_layer_1 = tf.nn.bias_add(conv_kernel_1, biases_1)
According to the tensorflow documentation, the tf.nn.conv2d operation uses the shape defined in the kernel_1 variable to construct the weight tensor. However, I cannot figure out how to access that weight tensor to set it to the weight array I have loaded from file.
Is it possible to explicitly set the weight tensor? And if so, how?
(The same question applies to bias tensor.)
If you have the weights and biases in a NumPy array, it should be easy to connect them into your TensorFlow network:
weights_1_array = ... # ndarray of weights for layer 1
biases_1_array = ... # ndarray of biases for layer 1
conv_kernel_1 = tf.nn.conv2d(input, weights_1_array, [1, 4, 4, 1], padding='SAME')
bias_layer_1 = tf.nn.bias_add(conv_kernel_1, biases_1_array)
Note that you must ensure that weights_1_array and biases_1_array are in the correct data format. See the documentation for tf.nn.conv2d() for an explanation of the required filter shape.
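If you also want the loaded values to live in tf.Variables (for example, so you can fine-tune them later), one possible sketch is below. The file names are hypothetical, and it assumes the arrays were saved in Torch's (out_channels, in_channels, height, width) layout, which must be transposed to TensorFlow's (height, width, in_channels, out_channels) filter layout:

import numpy as np
import tensorflow as tf

weights_1_array = np.load('conv1_weights.npy')  # hypothetical file; Torch layout (64, 3, 11, 11)
biases_1_array = np.load('conv1_biases.npy')    # hypothetical file; shape (64,)

# Transpose to TensorFlow's filter layout (height, width, in_channels, out_channels)
weights_1_array = weights_1_array.transpose(2, 3, 1, 0)  # -> (11, 11, 3, 64)

kernel_1 = tf.Variable(weights_1_array.astype(np.float32))
biases_1 = tf.Variable(biases_1_array.astype(np.float32))

conv_kernel_1 = tf.nn.conv2d(input, kernel_1, [1, 4, 4, 1], padding='SAME')
bias_layer_1 = tf.nn.bias_add(conv_kernel_1, biases_1)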