Hello, I have a question about handling different input sizes.
My training and validation sets have an input size of 256, but the unseen test dataset I want to predict on has an input size of 496.
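import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self, shape):
        super().__init__()  # must run before submodules are assigned
        self.conv1 = nn.Conv1d(shape, 1, 1)  # shape is the number of input channels
        self.batch1 = nn.BatchNorm1d(1)
        self.avgpl1 = nn.AvgPool1d(1, stride=1)
        self.fc1 = nn.Linear(1, 3)

    # forward method
    def forward(self, x):
        x = self.conv1(x)
        x = self.batch1(x)
        x = F.relu(x)
        x = self.avgpl1(x)
        x = torch.flatten(x, 1)
        x = F.log_softmax(self.fc1(x), dim=1)  # normalize over the class dimension
        return x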
I saved the model and want to reuse it for prediction.
The error message is:
Input In [244], in predict_data(prediction_data, model_path, data_config, context)
25 new_model = Net(shape_preprocessed_data)
26 # load the previously saved state_dict
---> 27 new_model.load_state_dict(torch.load("NetModel.pth"))
29 # check if predictions of models are equal
31 # generate random input of size (N,C,H,W)
33 # switch to eval mode for both models
34 model = model.eval()
RuntimeError: Error(s) in loading state_dict for Net:
size mismatch for conv1.weight: copying a param with shape
torch.Size([1, 256, 1]) from checkpoint, the shape in current model is torch.Size([1, 494, 1]).
How can I solve this?
You could reshape/downsample the input as the first step of the forward pass in your model. This can be done using the torch.nn.functional.interpolate function.
For example:
class Net(nn.Module):
    def __init__(self, shape):
        super().__init__()  # must run before submodules are assigned
        self.input_shape = shape
        self.conv1 = nn.Conv1d(shape, 1, 1)
        self.batch1 = nn.BatchNorm1d(1)
        self.avgpl1 = nn.AvgPool1d(1, stride=1)
        self.fc1 = nn.Linear(1, 3)

    # forward method
    def forward(self, x):
        # interpolate resizes the trailing (length) dimension to self.input_shape
        x = torch.nn.functional.interpolate(x, size=self.input_shape)
        x = self.conv1(x)
        x = self.batch1(x)
        x = F.relu(x)
        x = self.avgpl1(x)
        x = torch.flatten(x, 1)
        x = F.log_softmax(self.fc1(x), dim=1)
        return x
Your test inputs would then be downsampled to size 256 in order to be compatible with the model.
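One caveat worth checking: F.interpolate resizes the trailing dimensions of a tensor, not the channel dimension, while the size mismatch in the error message is in conv1's input channels. The following is a minimal sketch of resampling along the channel axis instead. It assumes the input has shape (batch, channels, 1), as the Linear(1, 3) layer suggests, and that linear interpolation across channels is meaningful for the data; the helper name resize_channels is hypothetical.

```python
import torch
import torch.nn.functional as F

def resize_channels(x, target_channels):
    """Resample the channel dimension of a (batch, channels, length) tensor."""
    x = x.transpose(1, 2)  # (batch, length, channels): channels are now last
    x = F.interpolate(x, size=target_channels, mode="linear", align_corners=False)
    return x.transpose(1, 2)  # back to (batch, target_channels, length)

# Assumed test batch: 8 samples with 494 channels and length 1
x_test = torch.rand(8, 494, 1)
x_resized = resize_channels(x_test, 256)
print(x_resized.shape)
```

Whether resampling is appropriate at all depends on what the 494 values represent; if they are distinct features rather than samples of a signal, interpolating between them would not be meaningful.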
It seems the saved model was initialized with shape (the number of input channels) equal to 256, while new_model, the model you are loading the weights into, was initialized with 494.
This value is the feature (channel) size, not the sequence length, and I believe you might have mixed up the two. The feature size should stay constant between training and prediction. That said, it is hard to tell what you are trying to do, since you have not described the kind of dataset you are using.
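To illustrate the distinction, here is a small sketch (with arbitrary example sizes) showing that the weight shape of nn.Conv1d depends only on the channel counts and kernel size, while the sequence length is free to vary between calls:

```python
import torch
import torch.nn as nn

conv = nn.Conv1d(in_channels=256, out_channels=1, kernel_size=1)

# Weight shape is (out_channels, in_channels, kernel_size)
print(conv.weight.shape)  # torch.Size([1, 256, 1])

# The same layer accepts different sequence lengths...
out_short = conv(torch.rand(4, 256, 10))
out_long = conv(torch.rand(4, 256, 50))
print(out_short.shape, out_long.shape)

# ...but not a different number of channels
try:
    conv(torch.rand(4, 494, 10))
except RuntimeError as err:
    print("channel mismatch:", type(err).__name__)
```

This matches the checkpoint shape torch.Size([1, 256, 1]) in the error message: the 256 there is in_channels, so a model built with a different shape argument cannot load those weights.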
Try using nn.AdaptiveAvgPool1d(output_size) instead of nn.AvgPool1d, and specify the desired output size. See the PyTorch documentation for a detailed explanation of how adaptive average pooling works.
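As a brief sketch of what adaptive pooling does (the tensor sizes are arbitrary examples): the layer picks its pooling windows so that the output length is always output_size, whatever the input length. Note that it acts on the last (length) dimension and leaves the channel count unchanged.

```python
import torch
import torch.nn as nn

pool = nn.AdaptiveAvgPool1d(output_size=1)

# Two inputs with the same channel count but different lengths
pooled_256 = pool(torch.rand(4, 8, 256))
pooled_494 = pool(torch.rand(4, 8, 494))

print(pooled_256.shape, pooled_494.shape)
```

Both outputs have length 1, so a following nn.Linear sees the same number of features per channel in either case.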
I have read other people's questions for similar issues, but can't figure it out in my case. My code is below, how do I fix this? Thank you.
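import torch
from torchvision import transforms
from torchvision.datasets import ImageFolder

# data_dir is the path to the image folder (defined elsewhere)
data = ImageFolder(data_dir, transform=transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()]))
trainloader = torch.utils.data.DataLoader(data, batch_size=3600,
                                          shuffle=True, num_workers=2)
dataiter = iter(trainloader)
x_train, y_train = next(dataiter)  # use the built-in next(); dataiter.next() fails on recent PyTorch
print(x_train.shape)
# torch.Size([3600, 3, 224, 224])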
import torch.nn.functional as F

class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # here we set up the tensors......
        self.layer1 = torch.nn.Linear(224, 12)
        self.layer2 = torch.nn.Linear(12, 10)

    def forward(self, x):
        # here we define the (forward) computational graph,
        # in terms of the tensors, and elt-wise non-linearities
        x = F.relu(self.layer1(x))
        x = self.layer2(x)
        return x

net = Net()
y = net.forward(x_train)
lossFn = torch.nn.CrossEntropyLoss()
loss = lossFn(y, y_train)
Your input to the network is a batch of 2D images, that is, a tensor with 4 dimensions: batch, channel, height and width.
However, you treat the 2D input as a 1D signal by applying nn.Linear layers to its width dimension only, which results in an output of shape batch x channel x height x output_dim. In contrast, nn.CrossEntropyLoss expects only one output vector per target label.
You need to change your Net so that it processes each image into a single vector of predictions.
You can check out milestone image classification architectures here.
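As a minimal sketch of the simplest fix, the image can be flattened into one long vector before the linear layers, so that each image yields a single prediction vector. The class name FlatNet and the small batch size are illustrative assumptions; a convolutional architecture would normally be a better choice for images.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FlatNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.flatten = nn.Flatten()  # (N, 3, 224, 224) -> (N, 3 * 224 * 224)
        self.layer1 = nn.Linear(3 * 224 * 224, 12)
        self.layer2 = nn.Linear(12, num_classes)

    def forward(self, x):
        x = self.flatten(x)
        x = F.relu(self.layer1(x))
        return self.layer2(x)  # one vector of class scores per image

# Random stand-ins for a batch of images and integer class labels
x_batch = torch.rand(4, 3, 224, 224)
y_batch = torch.randint(0, 10, (4,))

logits = FlatNet()(x_batch)
loss = nn.CrossEntropyLoss()(logits, y_batch)
print(logits.shape, loss.item())
```

The output now has shape (batch, num_classes), which is what nn.CrossEntropyLoss expects alongside a target of shape (batch,).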
I have developed a model with three input types: images, categorical data and numerical data. For the image data I used ResNet50; for the other two I built my own network.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class MulticlassClassification(nn.Module):
    def __init__(self, cat_size, num_col, output_size, layers, p=0.4):
        super(MulticlassClassification, self).__init__()
        # IMAGE: ResNet
        self.cnn = models.resnet50(pretrained = True)
        for param in self.cnn.parameters():
            param.requires_grad = False
        n_inputs = self.cnn.fc.in_features
        self.cnn.fc = nn.Sequential(
            nn.Linear(n_inputs, 250),
            nn.Linear(250, output_size),
        )
        # TABULAR: one embedding per categorical column
        self.all_embeddings = nn.ModuleList(
            [nn.Embedding(categories, size) for categories, size in cat_size]
        )
        self.embedding_dropout = nn.Dropout(p)
        self.batch_norm_num = nn.BatchNorm1d(num_col)
        all_layers = []
        num_cat_col = sum(e.embedding_dim for e in self.all_embeddings)
        input_size = num_cat_col + num_col
        for i in layers:
            all_layers.append(nn.Linear(input_size, i))
            input_size = i
        all_layers.append(nn.Linear(layers[-1], output_size))
        self.layers = nn.Sequential(*all_layers)
        self.combine_fc = nn.Linear(output_size * 2, output_size)

    def forward(self, image, x_categorical, x_numerical):
        embeddings = []
        for i, embedding in enumerate(self.all_embeddings):
            # look up the embedding of the i-th categorical column
            embeddings.append(embedding(x_categorical[:, i]))
        x = torch.cat(embeddings, 1)
        x = self.embedding_dropout(x)
        x_numerical = self.batch_norm_num(x_numerical)
        x = torch.cat([x, x_numerical], 1)
        x = self.layers(x)
        # img
        x2 = self.cnn(image)
        # combine
        x3 = torch.cat([x, x2], 1)
        x3 = F.relu(self.combine_fc(x3))
        return x3
Now after successful training I would like to calculate integrated gradients by using the captum library.
from captum.attr import IntegratedGradients
ig = IntegratedGradients(model)
testiter = iter(testloader)
img, stack_cat, stack_num, target = next(testiter)
attributions_ig = ig.attribute(inputs=(img.cuda(), stack_cat.cuda(), stack_num.cuda()), target=target.cuda())
And here I got an error:
RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.cuda.FloatTensor instead (while checking arguments for embedding)
I figured out (with a print in my forward method) that captum injects a wrongly shaped tensor into my x_categorical input. It seems that captum only looks at the first input tensor and uses its shape for all other inputs. How can I change this behaviour?
I've found a similar issue here (https://github.com/pytorch/captum/issues/439). It was recommended to use Interpretable Embedding for categorical data. When I used it I got this error:
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
I would be very grateful for any tips and advice on how to combine all three inputs and solve my problem.
This is a minimally working/reproducible example:
import torch
import torch.nn as nn
from torchsummary import summary
class Network(nn.Module):
    def __init__(self, channels_img, features_d, num_classes, img_size):
        super(Network, self).__init__()
        self.img_size = img_size
        self.disc = nn.Conv2d(
            in_channels = channels_img + 1,
            out_channels = features_d,
            kernel_size = (4,4)
        )
        # ConditionalGan:
        self.embed = nn.Embedding(
            num_embeddings = num_classes,
            embedding_dim = img_size * img_size
        )

    def forward(self, x, labels):
        embedding = self.embed(labels).view(labels.shape[0], 1, self.img_size, self.img_size)
        x = torch.cat([x, embedding], dim = 1)
        return self.disc(x)

# device:
device = torch.device("cpu")
# hyperparameter:
batch_size = 64
# Initialize model:
model = Network(
    channels_img = 1,
    features_d = 16,
    num_classes = 10,
    img_size = 28).to(device)
# Print model summary:
summary(
    model,
    input_size = [(1, 28, 28), (1, 28, 28)], # MNIST
    batch_size = batch_size
)
The error message I get is (for the line with summary(...)):
Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.cuda.FloatTensor instead (while checking arguments for embedding)
I saw in this post, that .to(torch.int64) is supposed to help, but I honestly don't know where to write it.
Thank you!
The problem lies in what is passed to the embedding layer.
An embedding layer is a kind of mapping between discrete indices and continuous values. That is, its inputs should be integers and it will give you back floats. In your case, for example, you are embedding the MNIST class labels, which range from 0 to 9, into a continuum (for some reason that I don't know, as I'm not familiar with GANs :)). In short, that embedding layer gives you a 10 -> 784 transformation, and PyTorch requires those 10 inputs to be integers.
A fancy name for an integer type is "long", so you need to make sure that whatever goes into self.embed has that data type. There are some ways to do that:
The long datatype is really a 64-bit integer, so all these work.
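For illustration, here is a minimal sketch of three equivalent conversions, applied to a hypothetical float tensor of labels. The tensor values and variable names are made up for the example; the embedding sizes mirror the 10 -> 784 mapping discussed above.

```python
import torch
import torch.nn as nn

embed = nn.Embedding(num_embeddings=10, embedding_dim=28 * 28)

# Hypothetical labels that arrive as floats, as in the error message
labels = torch.tensor([3.0, 7.0, 1.0])

# Three ways to obtain 64-bit integer ("long") indices
as_long = labels.long()
as_int64 = labels.to(torch.int64)
as_typed = labels.type(torch.LongTensor)

vectors = embed(as_long)  # one 784-dimensional vector per label
print(as_long.dtype, as_int64.dtype, as_typed.dtype, vectors.shape)
```

In the model from the question, the conversion would go right where the labels enter the embedding, for example self.embed(labels.long()) inside forward.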
I am trying to implement a hierarchical transformer for document classification in Keras/tensorflow, in which:
(1) a word-level transformer produces a representation of each sentence, and attention weights for each word, and,
(2) a sentence-level transformer uses the outputs from (1) to produce a representation of each document, and attention weights for each sentence, and finally,
(3) the document representations produced by (2) are used to classify documents (in the following example, as belonging or not belonging to a given class).
I am attempting to model the classifier on Yang et al.'s approach here (https://www.cs.cmu.edu/~./hovy/papers/16HLT-hierarchical-attention-networks.pdf), but replacing the GRU and attention layers with transformers.
I am using Apoorv Nandan's transformer implementation from https://keras.io/examples/nlp/text_classification_with_transformer/.
I have two issues for which I would be grateful for the community's help:
(1) I get an error in the upper (sentence) level model that I can't resolve (details and code below)
(2) I don't know how to extract the word- and sentence-level attention weights, and would value advice on how best to do this.
I am new to both Keras and this forum, so apologies for obvious mistakes and thank you in advance for any help.
Here is a reproducible example, indicating where I encounter errors:
First, establish the multi-head attention, transformer, and token/position embedding layers, after Nandan.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import pandas as pd
import numpy as np
class MultiHeadSelfAttention(layers.Layer):
    def __init__(self, embed_dim, num_heads=8):
        super(MultiHeadSelfAttention, self).__init__()
        self.embed_dim = embed_dim
        self.num_heads = num_heads
        if embed_dim % num_heads != 0:
            raise ValueError(
                f"embedding dimension = {embed_dim} should be divisible by number of heads = {num_heads}"
            )
        self.projection_dim = embed_dim // num_heads
        self.query_dense = layers.Dense(embed_dim)
        self.key_dense = layers.Dense(embed_dim)
        self.value_dense = layers.Dense(embed_dim)
        self.combine_heads = layers.Dense(embed_dim)

    def attention(self, query, key, value):
        score = tf.matmul(query, key, transpose_b=True)
        dim_key = tf.cast(tf.shape(key)[-1], tf.float32)
        scaled_score = score / tf.math.sqrt(dim_key)
        weights = tf.nn.softmax(scaled_score, axis=-1)
        output = tf.matmul(weights, value)
        return output, weights

    def separate_heads(self, x, batch_size):
        x = tf.reshape(x, (batch_size, -1, self.num_heads, self.projection_dim))
        return tf.transpose(x, perm=[0, 2, 1, 3])

    def call(self, inputs):
        # inputs.shape = [batch_size, seq_len, embedding_dim]
        batch_size = tf.shape(inputs)[0]
        query = self.query_dense(inputs)  # (batch_size, seq_len, embed_dim)
        key = self.key_dense(inputs)  # (batch_size, seq_len, embed_dim)
        value = self.value_dense(inputs)  # (batch_size, seq_len, embed_dim)
        query = self.separate_heads(
            query, batch_size
        )  # (batch_size, num_heads, seq_len, projection_dim)
        key = self.separate_heads(
            key, batch_size
        )  # (batch_size, num_heads, seq_len, projection_dim)
        value = self.separate_heads(
            value, batch_size
        )  # (batch_size, num_heads, seq_len, projection_dim)
        attention, weights = self.attention(query, key, value)
        attention = tf.transpose(
            attention, perm=[0, 2, 1, 3]
        )  # (batch_size, seq_len, num_heads, projection_dim)
        concat_attention = tf.reshape(
            attention, (batch_size, -1, self.embed_dim)
        )  # (batch_size, seq_len, embed_dim)
        output = self.combine_heads(
            concat_attention
        )  # (batch_size, seq_len, embed_dim)
        return output
class TransformerBlock(layers.Layer):
    def __init__(self, embed_dim, num_heads, ff_dim, dropout_rate, name=None):
        super(TransformerBlock, self).__init__(name=name)
        self.att = MultiHeadSelfAttention(embed_dim, num_heads)
        self.ffn = keras.Sequential(
            [layers.Dense(ff_dim, activation="relu"), layers.Dense(embed_dim),]
        )
        self.layernorm1 = layers.LayerNormalization(epsilon=1e-6)
        self.layernorm2 = layers.LayerNormalization(epsilon=1e-6)
        self.dropout1 = layers.Dropout(dropout_rate)
        self.dropout2 = layers.Dropout(dropout_rate)

    def call(self, inputs, training):
        attn_output = self.att(inputs)
        attn_output = self.dropout1(attn_output, training=training)
        out1 = self.layernorm1(inputs + attn_output)  # residual connection + norm
        ffn_output = self.ffn(out1)
        ffn_output = self.dropout2(ffn_output, training=training)
        return self.layernorm2(out1 + ffn_output)
class TokenAndPositionEmbedding(layers.Layer):
    def __init__(self, maxlen, vocab_size, embed_dim, name=None):
        super(TokenAndPositionEmbedding, self).__init__(name=name)
        self.token_emb = layers.Embedding(input_dim=vocab_size, output_dim=embed_dim)
        self.pos_emb = layers.Embedding(input_dim=maxlen, output_dim=embed_dim)

    def call(self, x):
        maxlen = tf.shape(x)[-1]
        positions = tf.range(start=0, limit=maxlen, delta=1)
        positions = self.pos_emb(positions)
        x = self.token_emb(x)
        return x + positions
For the purpose of this example, the data are 10,000 documents, each truncated to 15 sentences, each sentence with a maximum of 60 words, which are already converted to integer tokens 1-1000.
X is a 3-D tensor (10000, 15, 60) containing these tokens. y is a 1-D tensor containing the classes of the documents (1 or 0). For the purpose of this example there is no relation between X and y.
The following produces the example data:
max_docs = 10000
max_sentences = 15
max_words = 60
X = tf.random.uniform(shape=(max_docs, max_sentences, max_words), minval=1, maxval=1000, dtype=tf.dtypes.int32, seed=1)
y = tf.random.uniform(shape=(max_docs,), minval=0, maxval=2, dtype=tf.dtypes.int32, seed=1)
Here I attempt to construct the word level encoder, after https://keras.io/examples/nlp/text_classification_with_transformer/:
# Lower level (produce a representation of each sentence):
embed_dim = 100  # Embedding size for each token
num_heads = 2  # Number of attention heads
ff_dim = 64  # Hidden layer size in feed forward network inside transformer
L1_dense_units = 100  # Size of the sentence-level representations output by the word-level model
dropout_rate = 0.1
vocab_size = 1001  # Assumed value (not defined in the original): large enough for token ids 1-1000

word_input = layers.Input(shape=(max_words,), name='word_input')
word_embedding = TokenAndPositionEmbedding(maxlen=max_words, vocab_size=vocab_size,
                                           embed_dim=embed_dim, name='word_embedding')(word_input)
word_transformer = TransformerBlock(embed_dim=embed_dim, num_heads=num_heads, ff_dim=ff_dim,
                                    dropout_rate=dropout_rate, name='word_transformer')(word_embedding)
word_pool = layers.GlobalAveragePooling1D(name='word_pooling')(word_transformer)
word_drop = layers.Dropout(dropout_rate, name='word_drop')(word_pool)
word_dense = layers.Dense(L1_dense_units, activation="relu", name='word_dense')(word_drop)
word_encoder = keras.Model(word_input, word_dense)
It looks as though this word encoder works as intended to produce a representation of each sentence. Here, run on the 1st document, it produces a tensor of shape (15, 100), containing the vectors representing each of 15 sentences:
My problem is in connecting this to the higher (sentence) level model, to produce document representations.
I get a NotImplementedError when trying to apply the word encoder to each sentence in a document. I would be grateful for any help in fixing this issue, since the error message does not point to the specific problem.
After applying the word encoder to each sentence, the goal is to apply another transformer to produce attention weights for each sentence, and a document-level representation with which to perform classification. I can't determine whether this part of the model will work because of the error above.
Finally, I would like to extract word- and sentence-level attention weights for each document, and would be grateful for advice on how to do so.
Thank you in advance for any insight.
# Upper level (produce a representation of each document):
L2_dense_units = 100
sentence_input = layers.Input(shape=(max_sentences, max_words), name='sentence_input')
# This is the line producing "NotImplementedError":
sentence_encoder = tf.keras.layers.TimeDistributed(word_encoder, name='sentence_encoder')(sentence_input)
sentence_transformer = TransformerBlock(embed_dim=L1_dense_units, num_heads=num_heads, ff_dim=ff_dim,
                                        dropout_rate=dropout_rate, name='sentence_transformer')(sentence_encoder)
sentence_dense = layers.TimeDistributed(layers.Dense(int(L2_dense_units)), name='sentence_dense')(sentence_transformer)
sentence_out = layers.Dropout(dropout_rate)(sentence_dense)
preds = layers.Dense(1, activation='sigmoid', name='sentence_output')(sentence_out)
model = keras.Model(sentence_input, preds)
I ran into the same NotImplementedError while trying to do the same thing. Keras's TimeDistributed layer needs to know the output shapes of the custom layers it wraps, so you should add a compute_output_shape method to each of your custom layers.
In your case, the MultiHeadSelfAttention, TransformerBlock and TokenAndPositionEmbedding layers should include:
class MultiHeadSelfAttention(layers.Layer):
    def compute_output_shape(self, input_shape):
        # it does not change the shape of its input
        return input_shape

class TransformerBlock(layers.Layer):
    def compute_output_shape(self, input_shape):
        # it does not change the shape of its input
        return input_shape

class TokenAndPositionEmbedding(layers.Layer):
    def compute_output_shape(self, input_shape):
        # it changes the shape from (batch_size, maxlen) to (batch_size, maxlen, embed_dim)
        return input_shape + (self.pos_emb.output_dim,)
After you add these methods you should be able to run your code.
As for your second question, I am not sure, but you could return the weights variable (produced by MultiHeadSelfAttention's attention method) from the call methods of both MultiHeadSelfAttention and TransformerBlock, so that you can access it where you build your model.
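The idea of returning the weights from call can be sketched as follows. This is a simplified single-head layer written for illustration, not the MultiHeadSelfAttention class from the question; the class name SelfAttentionWithWeights and the input sizes are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

class SelfAttentionWithWeights(layers.Layer):
    """Single-head self-attention that also returns its attention weights."""

    def __init__(self, embed_dim):
        super().__init__()
        self.query_dense = layers.Dense(embed_dim)
        self.key_dense = layers.Dense(embed_dim)
        self.value_dense = layers.Dense(embed_dim)

    def call(self, inputs):
        query = self.query_dense(inputs)
        key = self.key_dense(inputs)
        value = self.value_dense(inputs)
        score = tf.matmul(query, key, transpose_b=True)
        dim_key = tf.cast(tf.shape(key)[-1], tf.float32)
        weights = tf.nn.softmax(score / tf.math.sqrt(dim_key), axis=-1)
        # Return the weights next to the output instead of discarding them
        return tf.matmul(weights, value), weights

# Assumed input: 4 sentences of 60 tokens with 100-dimensional embeddings
x = tf.random.uniform((4, 60, 100))
output, weights = SelfAttentionWithWeights(embed_dim=100)(x)
print(output.shape, weights.shape)
```

Each row of weights sums to 1 and says how strongly one position attends to every other position. In the multi-head version there would be one such matrix per head, and TransformerBlock.call would need to pass the weights along in the same way.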
I am creating an RNN with pytorch, it looks like this:
class MyRNN(nn.Module):
    def __init__(self, batch_size, n_inputs, n_neurons, n_outputs):
        super(MyRNN, self).__init__()
        self.n_neurons = n_neurons
        self.batch_size = batch_size
        self.n_inputs = n_inputs
        self.n_outputs = n_outputs
        self.basic_rnn = nn.RNN(self.n_inputs, self.n_neurons)
        self.FC = nn.Linear(self.n_neurons, self.n_outputs)

    def init_hidden(self):
        # (num_layers, batch_size, n_neurons)
        return torch.zeros(1, self.batch_size, self.n_neurons)

    def forward(self, X):
        self.batch_size = X.size(0)
        self.hidden = self.init_hidden()
        lstm_out, self.hidden = self.basic_rnn(X, self.hidden)
        out = self.FC(self.hidden)
        return out.view(-1, self.n_outputs)
My input x looks like this:
tensor([[-1.0173e-04, -1.5003e-04, -1.0218e-04, -7.4541e-05, -2.2869e-05,
-7.7171e-02, -4.4630e-03, -5.0750e-05, -1.7911e-04, -2.8082e-04,
-9.2992e-06, -1.5608e-05, -3.5471e-05, -4.9127e-05, -3.2883e-01],
[-1.1193e-04, -1.6928e-04, -1.0218e-04, -7.4541e-05, -2.2869e-05,
-7.7171e-02, -4.4630e-03, -5.0750e-05, -1.7911e-04, -2.8082e-04,
-9.2992e-06, -1.5608e-05, -3.5471e-05, -4.9127e-05, -3.2883e-01],
[-6.9490e-05, -8.9197e-05, -1.0218e-04, -7.4541e-05, -2.2869e-05,
-7.7171e-02, -4.4630e-03, -5.0750e-05, -1.7911e-04, -2.8082e-04,
-9.2992e-06, -1.5608e-05, -3.5471e-05, -4.9127e-05, -3.2883e-01]],
and is a batch of 64 vectors with size 15.
When trying to test this model by doing:
I get the following error:
File "/home/tt/anaconda3/envs/venv/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 126, in check_forward_args
expected_input_dim, input.dim()))
RuntimeError: input must have 3 dimensions, got 2
How can I fix it?
You are missing one of the required dimensions for the RNN layer.
Per the documentation, your input needs to be of shape (sequence length, batch, input size).
So, with the example above, you are missing one of these. Based on your variable names, it appears you are trying to pass 64 examples of 15 inputs each. If that's true, you are missing the sequence length.
With an RNN, the sequence length is the number of times you want the layer to recur. For example, in NLP your sequence length might be equal to the number of words in a sentence, while batch size would be the number of sentences you are passing, and input size would be the vector size of each word.
You might not need an RNN here if you are just trying to use 64 samples of size 15.
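Both options can be sketched as follows, assuming the data really is 64 independent samples with 15 features each. The hidden size of 150 and the 3 output classes are placeholder values chosen for the example.

```python
import torch
import torch.nn as nn

x = torch.rand(64, 15)  # 64 samples, 15 features, no sequence dimension

# Option 1: add a sequence dimension of length 1 so nn.RNN accepts the input
rnn = nn.RNN(input_size=15, hidden_size=150)
rnn_out, hidden = rnn(x.unsqueeze(0))  # input becomes (1, 64, 15)

# Option 2: skip the RNN and use a plain feed-forward network on the 2-D batch
mlp = nn.Sequential(nn.Linear(15, 150), nn.Tanh(), nn.Linear(150, 3))
mlp_out = mlp(x)

print(rnn_out.shape, hidden.shape, mlp_out.shape)
```

With a sequence length of 1 the RNN performs a single step from a zero hidden state, so it offers little beyond what a feed-forward layer already provides.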
See the documentation: the RNN layer expects
input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence.
In your case, each sample has 15 features and a single timestep, so the sequence length is 1:
import torch
import torch.nn as nn

# 15 features, 150 neurons
rnn = nn.RNN(15, 150)
# sequence of length 1, batch size 64, 15 features
x = torch.rand(1, 64, 15)
res, _ = rnn(x)
print(res.shape)
# => torch.Size([1, 64, 150])
Also note that you don't need to prespecify batch size.
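A short sketch of that last point: when no initial hidden state is passed, nn.RNN starts from zeros sized to match the incoming batch, so the same layer handles different batch sizes. The batch sizes below are arbitrary examples.

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=15, hidden_size=150)

# No hidden state is passed, so it defaults to zeros of the right batch size
out_a, h_a = rnn(torch.rand(1, 64, 15))
out_b, h_b = rnn(torch.rand(1, 7, 15))

print(out_a.shape, out_b.shape)
```

This means the batch_size constructor argument and the init_hidden method in the question's model could be dropped entirely.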