I am trying to make a model to predict insurance cost based on the individual. And this is the code for it.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
import numpy as np
import pandas as pd
from LSR import ListSearchReplace as LSR
csv = pd.read_csv("main.csv")
partialInputs = csv[["age", "bmi", "children"]]
smoker, sex = list(csv["smoker"]), list(csv["sex"])
L1 = LSR(smoker)
L1.replace("yes", 1, True)
L1.replace("no", 0, True)
L2 = LSR(sex)
L2.replace("female", 1, True)
L2.replace("male", 0, True)
pdReadySmoker = pd.DataFrame({"smoker": smoker})
pdReadySex = pd.DataFrame({"sex": sex})
SmokerAndSex = pd.merge(pdReadySmoker, pdReadySex, how="outer", left_index=True, right_index=True)
INPUTS = pd.merge(partialInputs, SmokerAndSex, how="outer", left_index=True, right_index=True)
TARGETS = csv["charges"]
INPUTS = torch.from_numpy(np.array(INPUTS, dtype='float32'))
TARGETS = torch.from_numpy(np.array(TARGETS, dtype='float32'))
print(INPUTS.shape, TARGETS.shape)
loss_fn = F.mse_loss
model = nn.Linear(5, 3) # <-- changing this, changes the error message.
opt = torch.optim.SGD(model.parameters(), lr=1e-5)
trainDataset = TensorDataset(INPUTS, TARGETS)
BATCH_SIZE = 5
trainDataloader = DataLoader(trainDataset, BATCH_SIZE, shuffle=True)
def fit(numEpochs, model, loss_fn, opt, trainDataloader):
    for epoch in range(numEpochs):
        for inputBatch, targetBatch in trainDataloader:
            preds = model(inputBatch)
            loss = loss_fn(preds, targetBatch)
            loss.backward()
            opt.step()
            opt.zero_grad()
        e = epoch + 1
        if e % 10 == 0:
            print(f"Epoch: {e}/{numEpochs}, loss: {loss.item():.4f}")
fit(100, model, loss_fn, opt, trainDataloader)  # <-- error
Error produced:
<ipython-input-7-b7028a3d94fd>:5: UserWarning: Using a target size (torch.Size([5])) that is different to the input size (torch.Size([5, 3])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
loss = loss_fn(preds, targetBatch)
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-20-d8f5bcdc847d> in <module>
----> 1 fit(100, model, loss_fn, opt, trainDataloader)
<ipython-input-7-b7028a3d94fd> in fit(numEpochs, model, loss_fn, opt, trainDataloader)
3 for inputBatch, targetBatch in trainDataloader:
4 preds = model(inputBatch)
----> 5 loss = loss_fn(preds, targetBatch)
6 loss.backward()
7
D:\coding\machine-learning\env-ml\lib\site-packages\torch\nn\functional.py in mse_loss(input, target, size_average, reduce, reduction)
2657 reduction = _Reduction.legacy_get_string(size_average, reduce)
2658
-> 2659 expanded_input, expanded_target = torch.broadcast_tensors(input, target)
2660 return torch._C._nn.mse_loss(expanded_input, expanded_target, _Reduction.get_enum(reduction))
2661
D:\coding\machine-learning\env-ml\lib\site-packages\torch\functional.py in broadcast_tensors(*tensors)
69 if any(type(t) is not Tensor for t in tensors) and has_torch_function(tensors):
70 return handle_torch_function(broadcast_tensors, tensors, *tensors)
---> 71 return _VF.broadcast_tensors(tensors) # type: ignore
72
73
RuntimeError: The size of tensor a (3) must match the size of tensor b (5) at non-singleton dimension 1
I've tried changing the dimensions of the model, and these are a few of the changes I made and the associated errors:
model = nn.Linear(5, 1338)
Error:
RuntimeError: The size of tensor a (1338) must match the size of tensor b (5) at non-singleton dimension 1
model = nn.Linear(1338, 1338)
Error:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (5x5 and 1338x1338)
Sometimes this error leads me to change the matrix to the correct shape, but that brings back the previous error about the non-singleton dimension.
This should be quite straightforward, since you only have a single layer. It is a matter of getting the shapes right.
You are feeding an nn.Linear layer an input of shape (batch, in_features). This type of layer takes two arguments: in_features, the number of features in the input vector, and out_features, the number of features in the resulting vector. Since you are using F.mse_loss, your target tensor needs to have the same shape as your prediction.
Bear in mind the first dimension is the batch dimension. In summary, your input tensor has shape (batch, input_size), your dense layer is defined as nn.Linear(input_size, out_size) and your target tensor has shape (batch, output_size).
Coming back to your case, your TARGETS tensor is of shape (1338,), so you either mean that:
there is a single prediction with 1338 components, which would match an nn.Linear(?, 1338) and would actually correspond to a target of shape (1, 1338) (a single element in the batch). This can be fixed with TARGETS = TARGETS.unsqueeze(0).
or, there are actually 1338 predictions of one element each, which would match an nn.Linear(?, 1), and the appropriate target shape would be (1338, 1). This can be fixed with TARGETS = TARGETS.unsqueeze(-1) (adds an additional axis after the last dimension).
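The second option, which is the one that fits a per-row regression, can be sketched as follows. This is a minimal example on random stand-in data (the `inputs` and `targets` tensors are synthetic assumptions, not the CSV from the question); it only shows that `unsqueeze(-1)` makes the target shape agree with the output of `nn.Linear(5, 1)`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Synthetic stand-ins for INPUTS (1338, 5) and TARGETS (1338,)
inputs = torch.randn(1338, 5)
targets = torch.randn(1338)

# Add a trailing axis so each target is a 1-element vector: (1338,) -> (1338, 1)
targets = targets.unsqueeze(-1)

model = nn.Linear(5, 1)  # 5 input features, 1 predicted value per row

# Take one batch of 5 rows, as a DataLoader with batch_size=5 would
input_batch, target_batch = inputs[:5], targets[:5]
preds = model(input_batch)

print(preds.shape, target_batch.shape)  # both torch.Size([5, 1])
loss = F.mse_loss(preds, target_batch)  # shapes match, so nothing is broadcast
```

With the shapes aligned like this, the broadcasting warning and the RuntimeError from the question should both go away.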
Your input dimension is 5, and you predict a scalar value (target) for each input.
Therefore, your linear model should be of size:
model = nn.Linear(5, 1) # from 5-dim inputs to 1-dim output
I think setting the batch size to 5 (the same as the input dimension) is what confuses you. Try changing the batch size and see that it does not affect the dimensions of the model.
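A quick way to check this is to push batches of different sizes through the same layer. The sketch below uses random inputs as a stand-in for the real data (an assumption for illustration only):

```python
import torch
import torch.nn as nn

model = nn.Linear(5, 1)  # the layer never mentions the batch size

output_shapes = {}
for batch_size in (1, 5, 32):
    batch = torch.randn(batch_size, 5)  # (batch, in_features)
    output_shapes[batch_size] = tuple(model(batch).shape)

print(output_shapes)  # {1: (1, 1), 5: (5, 1), 32: (32, 1)}
```

Only the first dimension of the output follows the batch size; the second is always out_features.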
I am having an issue with my code that I modified from https://keras.io/examples/generative/wgan_gp/ . Instead of the data being images, my data is a (1001,2) array of sequential data. The first column being the time and the second the velocity measurements. I'm getting this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_14704/3651127346.py in <module>
21 # Training the WGAN-GP model
22 tic = time.perf_counter()
---> 23 WGAN.fit(dataset, batch_size=batch_Size, epochs=n_epochs, callbacks=[cbk])
24 toc = time.perf_counter()
25 time_elapsed(toc-tic)
~\Anaconda3\lib\site-packages\keras\utils\traceback_utils.py in error_handler(*args, **kwargs)
65 except Exception as e: # pylint: disable=broad-except
66 filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67 raise e.with_traceback(filtered_tb) from None
68 finally:
69 del filtered_tb
~\Anaconda3\lib\site-packages\tensorflow\python\framework\func_graph.py in autograph_handler(*args, **kwargs)
1145 except Exception as e: # pylint:disable=broad-except
1146 if hasattr(e, "ag_error_metadata"):
-> 1147 raise e.ag_error_metadata.to_exception(e)
1148 else:
1149 raise
ValueError: in user code:
File "C:\Users\sissonn\Anaconda3\lib\site-packages\keras\engine\training.py", line 1021, in train_function *
return step_function(self, iterator)
File "C:\Users\sissonn\Anaconda3\lib\site-packages\keras\engine\training.py", line 1010, in step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "C:\Users\sissonn\Anaconda3\lib\site-packages\keras\engine\training.py", line 1000, in run_step **
outputs = model.train_step(data)
File "C:\Users\sissonn\AppData\Local\Temp/ipykernel_14704/3074469771.py", line 141, in train_step
gp = self.gradient_penalty(batch_size, x_real, x_fake)
File "C:\Users\sissonn\AppData\Local\Temp/ipykernel_14704/3074469771.py", line 106, in gradient_penalty
alpha = tf.random.uniform(batch_size,1,1)
ValueError: Shape must be rank 1 but is rank 0 for '{{node random_uniform/RandomUniform}} = RandomUniform[T=DT_INT32, dtype=DT_FLOAT, seed=0, seed2=0](strided_slice)' with input shapes: [].
And here is my code:
import time
from tqdm.notebook import tqdm
import os
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Model
from tensorflow.keras.layers import Dense, Input
import numpy as np
import matplotlib.pyplot as plt
def define_generator(latent_dim):
    # This function creates the generator model using the functional API.
    # Layers...
    # Input Layer
    inputs = Input(shape=latent_dim, name='INPUT_LAYER')
    # 1st hidden layer
    x = Dense(50, activation='relu', name='HIDDEN_LAYER_1')(inputs)
    # 2nd hidden layer
    x = Dense(150, activation='relu', name='HIDDEN_LAYER_2')(x)
    # 3rd hidden layer
    x = Dense(300, activation='relu', name='HIDDEN_LAYER_3')(x)
    # 4th hidden layer
    x = Dense(150, activation='relu', name='HIDDEN_LAYER_4')(x)
    # 5th hidden layer
    x = Dense(50, activation='relu', name='HIDDEN_LAYER_5')(x)
    # Output layer
    outputs = Dense(2, activation='linear', name='OUPUT_LAYER')(x)
    # Instantiating the generator model
    model = Model(inputs=inputs, outputs=outputs, name='GENERATOR')
    return model

def generator_loss(fake_logits):
    # This function calculates and returns the WGAN-GP generator loss.
    # Expected value of critic output from fake images
    expectation_fake = tf.reduce_mean(fake_logits)
    # Loss to minimize
    loss = -expectation_fake
    return loss

def define_critic():
    # This function creates the critic model using the functional API.
    # Layers...
    # Input Layer
    inputs = Input(shape=2, name='INPUT_LAYER')
    # 1st hidden layer
    x = Dense(50, activation='relu', name='HIDDEN_LAYER_1')(inputs)
    # 2nd hidden layer
    x = Dense(150, activation='relu', name='HIDDEN_LAYER_2')(x)
    # 3rd hidden layer
    x = Dense(300, activation='relu', name='HIDDEN_LAYER_3')(x)
    # 4th hidden layer
    x = Dense(150, activation='relu', name='HIDDEN_LAYER_4')(x)
    # 5th hidden layer
    x = Dense(50, activation='relu', name='HIDDEN_LAYER_5')(x)
    # Output layer
    outputs = Dense(1, activation='linear', name='OUPUT_LAYER')(x)
    # Instantiating the critic model
    model = Model(inputs=inputs, outputs=outputs, name='CRITIC')
    return model

def critic_loss(real_logits, fake_logits):
    # This function calculates and returns the WGAN-GP critic loss.
    # Expected value of critic output from real images
    expectation_real = tf.reduce_mean(real_logits)
    # Expected value of critic output from fake images
    expectation_fake = tf.reduce_mean(fake_logits)
    # Loss to minimize
    loss = expectation_fake - expectation_real
    return loss
class define_wgan(keras.Model):
    # This class creates the WGAN-GP object.
    # Attributes:
    # critic = the critic model.
    # generator = the generator model.
    # latent_dim = defines generator input dimension.
    # critic_steps = defines how many times the critic gets trained for each training cycle.
    # gp_weight = weight of the gradient penalty term in the critic loss.
    # Methods:
    # compile() = defines the optimizer and loss function of both the critic and generator.
    # gradient_penalty() = calculates and returns the gradient penalty term in the WGAN-GP loss function.
    # train_step() = performs the WGAN-GP training by updating the critic and generator weights
    # and returns the loss for both. Called by fit().
    def __init__(self, gen, critic, latent_dim, n_critic_train, gp_weight):
        super().__init__()
        self.critic = critic
        self.generator = gen
        self.latent_dim = latent_dim
        self.critic_steps = n_critic_train
        self.gp_weight = gp_weight

    def compile(self, generator_loss, critic_loss):
        super().compile()
        self.generator_optimizer = keras.optimizers.Adam(learning_rate=0.0002, beta_1=0.5, beta_2=0.9)
        self.critic_optimizer = keras.optimizers.Adam(learning_rate=0.0002, beta_1=0.5, beta_2=0.9)
        self.generator_loss_function = generator_loss
        self.critic_loss_function = critic_loss

    def gradient_penalty(self, batch_size, x_real, x_fake):
        # Random uniform samples of points between distribution.
        # "alpha" must be a tensor so that "x_interp" will also be a tensor.
        alpha = tf.random.uniform(batch_size,1,1)
        # Data interpolated between real and fake distributions
        x_interp = alpha*x_real + (1-alpha)*x_fake
        # Calculating critic output gradient wrt interpolated data
        with tf.GradientTape() as gp_tape:
            gp_tape.watch(x_interp)
            critic_output = self.critic(x_interp, training=True)
        grad = gp_tape.gradient(critic_output, x_interp)[0]
        # Calculating norm of gradient
        grad_norm = tf.sqrt(tf.reduce_sum(tf.square(grad)))
        # Calculating gradient penalty
        gp = tf.reduce_mean((grad_norm - 1.0)**2)
        return gp

    def train_step(self, x_real):
        # Critic training
        # Getting batch size for creating latent vectors
        print(x_real)
        batch_size = tf.shape(x_real)[0]
        print(batch_size)
        # Critic training loop
        for i in range(self.critic_steps):
            # Generating latent vectors
            latent = tf.random.normal(shape=(batch_size, self.latent_dim))
            with tf.GradientTape() as tape:
                # Obtaining fake data from generator
                x_fake = self.generator(latent, training=True)
                # Critic output from fake data
                fake_logits = self.critic(x_fake, training=True)
                # Critic output from real data
                real_logits = self.critic(x_real, training=True)
                # Calculating critic loss
                c_loss = self.critic_loss_function(real_logits, fake_logits)
                # Calculating gradient penalty
                gp = self.gradient_penalty(batch_size, x_real, x_fake)
                # Adjusting critic loss with gradient penalty
                c_loss = c_loss + self.gp_weight*gp
            # Calculating gradient of critic loss wrt critic weights
            critic_grad = tape.gradient(c_loss, self.critic.trainable_variables)
            # Updating critic weights
            self.critic_optimizer.apply_gradients(zip(critic_grad, self.critic.trainable_variables))
        # Generator training
        # Generating latent vectors
        latent = tf.random.normal(shape=(batch_size, self.latent_dim))
        with tf.GradientTape() as tape:
            # Obtaining fake data from generator
            x_fake = self.generator(latent, training=True)
            # Critic output from fake data
            fake_logits = self.critic(x_fake, training=True)
            # Calculating generator loss
            g_loss = self.generator_loss_function(fake_logits)
        # Calculating gradient of generator loss wrt generator weights
        generator_grad = tape.gradient(g_loss, self.generator.trainable_variables)
        # Updating generator weights
        self.generator_optimizer.apply_gradients(zip(generator_grad, self.generator.trainable_variables))
        return g_loss, c_loss
class GAN_monitor(keras.callbacks.Callback):
    def __init__(self, n_samples, latent_dim):
        self.n_samples = n_samples
        self.latent_dim = latent_dim

    def on_epoch_end(self, epoch, logs=None):
        latent = tf.random.normal(shape=(self.n_samples, self.latent_dim))
        generated_data = self.model.generator(latent)
        plt.plot(generated_data)
        plt.savefig('Epoch _'+str(epoch)+'.png', dpi=300)
data = np.genfromtxt('Flight_1.dat', dtype='float', encoding=None, delimiter=',')[0:1001,0]
time_span = np.linspace(0,20,1001)
dataset = np.concatenate((time_span[:,np.newaxis], data[:,np.newaxis]), axis=1)
dataset.shape
# Training Parameters
latent_dim = 100
n_epochs = 10
n_critic_train = 5
gp_weight = 10
batch_Size = 100
# Instantiating the generator and discriminator models
gen = define_generator(latent_dim)
critic = define_critic()
# Instantiating the WGAN-GP object
WGAN = define_wgan(gen, critic, latent_dim, n_critic_train, gp_weight)
# Compiling the WGAN-GP model
WGAN.compile(generator_loss, critic_loss)
# Instantiating custom Keras callback
cbk = GAN_monitor(n_samples=1, latent_dim=latent_dim)
# Training the WGAN-GP model
tic = time.perf_counter()
WGAN.fit(dataset, batch_size=batch_Size, epochs=n_epochs, callbacks=[cbk])
toc = time.perf_counter()
time_elapsed(toc-tic)
The issue is the shape I am providing to tf.random.uniform() for the assignment of alpha. I don't fully understand why the shape input is (batch_size, 1, 1, 1) in the Keras example, so I don't know how to specify the shape for my example. Furthermore, I don't understand this line in the Keras example:
batch_size = tf.shape(real_images)[0]
In this example 'real_images' is a (60000, 28, 28, 1) array and it gets passed to the fit() method which then passes it to the train_step() method. (It gets passed as "train_images", but they are the same variable.) If I add a line that prints out 'real_images' before this tf.shape() this is what it produces:
Tensor("IteratorGetNext:0", shape=(None, 28, 28, 1), dtype=float32)
Why is the 60000 now None? Then, I added a line that printed out "batch_size" after the tf.shape() and this is what it produces:
Tensor("strided_slice:0", shape=(), dtype=int32)
I googled "tf strided_slice", but all I could find is the method tf.strided_slice(). So what exactly is the value of "batch_size", and why is the output of variables so ambiguous when they are tensors? In fact, if I type:
tf.shape(train_images)[0]
in another cell of the Jupyter notebook, I get a completely different output:
<tf.Tensor: shape=(), dtype=int32, numpy=60000>
I really need to understand this Keras example in order to successfully implement this code for my data. Any help is appreciated.
BTW: I am using only one set of data for now, but once I get the GAN running, I will provide multiple sets of these (1001,2) datasets. Also, if you want to test the code yourself, replacing the "dataset" variable with any (1001,2) numpy array should suffice. Thank You.
'Why is the 60000 now None?': In defining TensorFlow models, the first dimension (batch_size) is None. Getting under the hood of what goes on with TensorFlow and how it uses graphs for computation can be quite complex. But for your understanding right now, all you need to know is that batch_size does not need to be specified when defining the model, hence None. This is essential, as it allows a model to be defined once and then trained with, and applied to, datasets of an arbitrary number of examples. For example, when training you may provide the model with a batch of 256 images at a time, but when using the trained model for inference, you may well want the input to be a single image. The actual value of the first dimension of the input therefore only matters once the computation actually runs.
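The point about None can be seen on a toy model. The sketch below is an assumption for illustration (a single Dense layer on 2 features, unrelated to the WGAN in the question): the model is defined without any batch size and then called on batches of two different sizes.

```python
import numpy as np
import tensorflow as tf

# Only the per-example shape (2,) is given; the batch dimension is left open
inputs = tf.keras.Input(shape=(2,))
outputs = tf.keras.layers.Dense(1)(inputs)
model = tf.keras.Model(inputs, outputs)

print(model.input_shape)  # (None, 2)

# The same model accepts any batch size at run time
small = model(np.zeros((1, 2), dtype="float32"))
large = model(np.zeros((256, 2), dtype="float32"))
print(small.shape, large.shape)  # (1, 1) (256, 1)
```

The same mechanism explains the `Tensor("strided_slice:0", shape=(), dtype=int32)` printout: inside train_step, fit() traces the code into a graph, so a Python print shows a symbolic placeholder rather than a number, whereas the same expression evaluated eagerly in a notebook cell shows the concrete value. Using tf.print instead of print inside train_step should show the run-time value.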
'I don't fully understand why the shape input is (batch_size, 1, 1, 1) in the Keras example': The reason for this size is that you want a different random value, alpha, for each image. You have batch_size images, hence batch_size in the first dimension, but each alpha is just a single value in tensor format, so it only needs size 1 in all other dimensions. The reason it has 4 dimensions overall is so that it can be used in calculations with your inputs, which are 4-D image tensors with a shape of something like (batch_size, img_h, img_w, 3) for color images with 3 RGB channels.
As for your error, Shape must be rank 1 but is rank 0: it says that the function you are using, tf.random.uniform, requires a rank 1 tensor for its shape argument, i.e. something with 1 dimension, but is being passed a rank 0 tensor, i.e. a scalar value. In your call tf.random.uniform(batch_size,1,1), the scalar batch_size is taken as the shape and the two 1s as minval and maxval. For 4-D image batches like those in the Keras example, this might work instead:
alpha = tf.random.uniform([batch_size, 1, 1, 1])
The first parameter of this function is its shape and so it is important to have the [] there. Check out the documentation on this function in order to make sure you're using it correctly - https://www.tensorflow.org/api_docs/python/tf/random/uniform.
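For the 2-D data in the question, with batches of shape (batch_size, 2), the same reasoning suggests a shape of [batch_size, 1] rather than four dimensions. The following is a sketch under that assumption, not the Keras example's code: `gradient_penalty` is written here as a free function, and the small `critic` and the random `x_real`/`x_fake` tensors are stand-ins made up for illustration. It also takes the gradient norm per sample, as the Keras example does.

```python
import tensorflow as tf

def gradient_penalty(critic, x_real, x_fake):
    # One alpha per sample; the trailing 1 broadcasts over the 2 feature columns
    batch_size = tf.shape(x_real)[0]
    alpha = tf.random.uniform([batch_size, 1], 0.0, 1.0)
    x_interp = alpha * x_real + (1.0 - alpha) * x_fake

    with tf.GradientTape() as gp_tape:
        gp_tape.watch(x_interp)
        critic_output = critic(x_interp, training=True)

    # Gradient of the critic output with respect to each interpolated sample
    grad = gp_tape.gradient(critic_output, x_interp)
    # Per-sample L2 norm (small constant added for numerical stability)
    grad_norm = tf.sqrt(tf.reduce_sum(tf.square(grad), axis=1) + 1e-12)
    return tf.reduce_mean((grad_norm - 1.0) ** 2)

# Hypothetical stand-ins for the real critic and data
critic = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])
x_real = tf.random.normal([100, 2])
x_fake = tf.random.normal([100, 2])

gp = gradient_penalty(critic, x_real, x_fake)
print(gp.shape)  # () : a scalar penalty
```

With shape [batch_size, 1, 1, 1] the multiplication against a (batch_size, 2) tensor would broadcast to four dimensions, which is why the image-oriented shape does not carry over unchanged.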
I am trying to compile and train an RNN model for regression using Keras Tensorflow. I am using the "Functional API" way for the definition of my model.
I need to have 2 different inputs. The first one (input) is my training data which is an array with the shape: (TOTAL_TRAIN_DATA, SEQUENCE_LENGTH, NUM_OF_FEATURES) = (15000,1564,2). To make it more clear, I have 2 features for every frame of 15000 videos. The videos had initially a different number of frames, so all of them have been padded to have SEQUENCE_LENGTH=1564 frames (by repeating the last row). The second input (lengths) is a vector (15000,) that contains the initial length of each video. It's something like this: lengths = [317 215 576 ... 1245 213 654].
What I am trying to do is concatenate the features in the output of a GRU layer and then multiply them with the appropriate masks to keep only the features corresponding to the initial video lengths. To be more precise, the output of the GRU layer has a shape of (batch_size, SEQUENCE_LENGTH, GRU_UNITS) = (50,1564,256). I have defined a Flatten() layer that reshapes the output of the RNN to (50, 1564*256). So in this step, I want to create a mask array with a shape of (50,1564*256). Each row of the array is going to be the mask for the corresponding sample of the batch.
def mask_creator(lengths,number_of_GRU_features=256,max_pad_len=1564):
    masks = np.zeros((lengths.shape[0],number_of_GRU_features*max_pad_len))
    for i, length in enumerate(lengths):
        masks[i,:] = np.concatenate((np.ones([length * number_of_GRU_features, ]),
                                     np.zeros([(max_pad_len - length) * number_of_GRU_features, ])), axis=0)
    return masks
#tf.compat.v1.enable_eager_execution()
#tf.data.experimental.enable_debug_mode()
#tf.config.run_functions_eagerly(True)
GRU_UNITS = 256
SEQUENCE_LENGTH = 1564
NUM_OF_FEATURES = 2
input = tf.keras.layers.Input(shape=(SEQUENCE_LENGTH,NUM_OF_FEATURES))
lengths = tf.keras.layers.Input(shape=())
masks = tf.keras.layers.Lambda(mask_creator, name="mask_function")(lengths)
gru = tf.keras.layers.GRU(GRU_UNITS , return_sequences=True)(input)
flat = tf.keras.layers.Flatten()(gru)
multiplied = tf.keras.layers.Multiply()([flat, masks])
outputs = tf.keras.layers.Dense(7, name="pred")(multiplied )
# Compile
model = tf.keras.Model([input, lengths], outputs, name="RNN")
# optimizer = tf.keras.optimizers.Adam(learning_rate=1e-2)
#Compile keras model
model.compile(optimizer='adam',
loss='mean_squared_error',
metrics=['MeanSquaredError', 'MeanAbsoluteError']),
#run_eagerly=True)
model.summary()
To create the masks, I have to somehow access the length vector that I am passing as an input argument to my Keras model (lengths = tf.keras.layers.Input(shape=())). For that purpose, I thought about defining a Lambda layer (masks=tf.keras.layers.Lambda(mask_creator, name="mask_function")(lengths)) which calls the mask_creator function to create the masks. The lengths variable is supposed to be a Tensor with a shape of (batch_size,)=(50,), if I am not mistaken. However, I cannot, by any means, access the elements of lengths; I get different types of errors, like this one:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-30-8e31522694ee> in <module>()
9 input = tf.keras.layers.Input(shape=(SEQUENCE_LENGTH,FEATURES))
10 lengths = tf.keras.layers.Input(shape=())
---> 11 masks = tf.keras.layers.Lambda(mask_creator, name="mask_function")(lengths)
12 gru = tf.keras.layers.GRU(GRU_UNITS , return_sequences=True)(input)
13 flat = tf.keras.layers.Flatten()(gru)
1 frames
<ipython-input-19-9490084e8336> in mask_creator(lengths, number_of_GRU_features, max_pad_len)
1 def mask_creator(lengths,number_of_GRU_features=256,max_pad_len=1564):
2
----> 3 masks = np.zeros((lengths.shape[0],number_of_GRU_features*max_pad_len))
4
5 for i, length in enumerate(lengths):
TypeError: Exception encountered when calling layer "mask_function" (type Lambda).
'NoneType' object cannot be interpreted as an integer
Call arguments received:
• inputs=tf.Tensor(shape=(None,), dtype=float32)
• mask=None
• training=None
Why is that and how could I fix this?
Try using tf operations only:
import tensorflow as tf
#tf.function
def mask_creator(lengths, number_of_GRU_features=256, max_pad_len=1564):
    ones = tf.ragged.range(lengths * number_of_GRU_features)* 0 + 1
    zeros = tf.ragged.range((max_pad_len - lengths) * number_of_GRU_features) * 0
    masks = tf.concat([ones, zeros], axis=1)
    return masks.to_tensor()
lengths = tf.constant([5, 10])
tf.print(mask_creator(lengths).shape, summarize=-1)
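As an alternative to the ragged-range approach (a different technique, offered here only as a sketch), tf.sequence_mask can build the same kind of mask directly. The cast is included on the assumption that the lengths input arrives as float32, as the traceback in the question suggests; the tiny sizes in the usage line are chosen just to keep the output readable.

```python
import tensorflow as tf

def mask_creator(lengths, number_of_GRU_features=256, max_pad_len=1564):
    # Keras Input defaults to float32, so cast before using lengths as counts
    lengths = tf.cast(lengths, tf.int32)
    # Row i gets lengths[i] * number_of_GRU_features ones, then zeros
    return tf.sequence_mask(
        lengths * number_of_GRU_features,
        maxlen=max_pad_len * number_of_GRU_features,
        dtype=tf.float32,
    )

# Small example: 2 features per step, sequences padded to 4 steps
masks = mask_creator(tf.constant([1, 3]), number_of_GRU_features=2, max_pad_len=4)
print(masks.numpy())
# [[1. 1. 0. 0. 0. 0. 0. 0.]
#  [1. 1. 1. 1. 1. 1. 0. 0.]]
```

Because this function uses only TensorFlow operations and never reads the batch size from the static shape, it should be usable inside a Lambda layer where the batch dimension is None.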
I am still grappling with PyTorch, having played with Keras for a while (which feels a lot more intuitive).
Anyway - I have the nn.Linear model code below, which works fine for just one input feature, where:
inputDim = 1
I am now trying to expand the same code to include 2 features, and so I have included another column in my feature dataframe and also set:
inputDim = 2
However, when I run the code, I get the dreaded error:
RuntimeError: mat1 dim 1 must match mat2 dim 0
This error references line 63, which is:
outputs = model(inputs)
I have gone through several other posts here relating to this dimensionality error, but I still can't see what is wrong with my code. Any help would be appreciated.
The full code looks like this:
import numpy as np
import pandas as pd
import torch
from torch.autograd import Variable
import matplotlib.pyplot as plt
device = 'cuda' if torch.cuda.is_available() else 'cpu'
df = pd.read_csv('Adjusted Close - BAC-UBS-WFC.csv')
x = df[['BAC', 'UBS']]
y = df['WFC']
# number_of_features = x.shape[1]
# print(number_of_features)
x_train = np.array(x, dtype=np.float32)
x_train = x_train.reshape(-1, 1)
y_train = np.array(y, dtype=np.float32)
y_train = y_train.reshape(-1, 1)
class linearRegression(torch.nn.Module):
    def __init__(self, inputSize, outputSize):
        super(linearRegression, self).__init__()
        self.linear = torch.nn.Linear(inputSize, outputSize)

    def forward(self, x):
        out = self.linear(x)
        return out
inputDim = 2
outputDim = 1
learningRate = 0.01
epochs = 500
# Model instantiation
torch.manual_seed(42)
model = linearRegression(inputDim, outputDim)
if torch.cuda.is_available(): model.cuda()
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learningRate)
# Model training
loss_series = []
for epoch in range(epochs):
    # Converting inputs and labels to Variable
    inputs = Variable(torch.from_numpy(x_train).cuda())
    labels = Variable(torch.from_numpy(y_train).cuda())
    # Clear gradient buffers because we don't want any gradient from the previous epoch to carry forward, i.e. we don't want to accumulate gradients
    optimizer.zero_grad()
    # get output from the model, given the inputs
    outputs = model(inputs)
    # get loss for the predicted output
    loss = criterion(outputs, labels)
    loss_series.append(loss.item())
    print(loss)
    # get gradients w.r.t. the parameters
    loss.backward()
    # update parameters
    optimizer.step()
    print('epoch {}, loss {}'.format(epoch, loss.item()))
# Calculate predictions on training data
with torch.no_grad():  # we don't need gradients in the testing phase
    predicted = model(Variable(torch.from_numpy(x_train).cuda())).cpu().data.numpy()
General advice: For errors with dimensions, it usually helps to print out the dimensions at each step of the computation.
Most likely, in this specific case, you have made a mistake in reshaping the input with x_train = x_train.reshape(-1, 1).
That turns your (N, 2) array into (2N, 1), so your input has 1 feature per row, but the NN expects (N, 2).
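The effect of the reshape can be checked on a small array. In this sketch the random `x` is a stand-in for df[['BAC', 'UBS']] (an assumption for illustration); leaving the array as (N, 2) is what matches a layer with 2 input features.

```python
import numpy as np
import torch

# Stand-in for the two feature columns: 10 rows, 2 features
x = np.random.rand(10, 2).astype(np.float32)

# What the question's code does: every value becomes its own row
flattened = x.reshape(-1, 1)
print(flattened.shape)  # (20, 1)

# Leaving the array as (N, 2) matches a layer with 2 input features
linear = torch.nn.Linear(2, 1)
outputs = linear(torch.from_numpy(x))
print(outputs.shape)  # torch.Size([10, 1])
```

The reshape of y_train to (-1, 1) is still appropriate, since the target has a single column.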
I was creating a program that would take in as input the Fashion MNIST set and I was tweaking around with my model to see how different parameters would change the accuracy.
One of the tweaks I made to my model was to change my model's loss function from cross entropy to MSE.
# The code above is miscellaneous training data import code
trainloader = torch.utils.data.DataLoader(trainset, batch_size = 64, shuffle = True, num_workers=4)
testloader = torch.utils.data.DataLoader(testset, batch_size = 64, shuffle = True, num_workers=4)
dataiter = iter(trainloader)
images, labels = next(dataiter)
from torch import nn, optim
import torch.nn.functional as F
model = nn.Sequential(nn.Linear(784, 128),
nn.ReLU(),
nn.Linear(128, 10),
nn.LogSoftmax(dim = 1)
)
model.to(device)
# Define the loss
criterion = nn.MSELoss()
# Define the optimizer
optimizer = optim.Adam(model.parameters(), lr = 0.001)
# Define the epochs
epochs = 5
train_losses, test_losses = [], []
for e in range(epochs):
    running_loss = 0
    for images, labels in trainloader:
        # Flatten Fashion-MNIST images into a 784 long vector
        images = images.to(device)
        labels = labels.to(device)
        images = images.view(images.shape[0], -1)
        # Training pass
        optimizer.zero_grad()
        output = model.forward(images)
        loss = criterion(output, labels)
        loss.backward()
        optimizer.step()
My model worked without any problems when using cross entropy loss, but when I changed to MSE loss, the interpreter complained and said that my tensors were different sizes and thus could not be computed.
<class 'torch.Tensor'>
torch.Size([64, 1, 28, 28])
torch.Size([64])
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-62-ec6942122f02> in <module>
44 output = model.forward(images)
45
---> 46 loss = criterion(output, labels)
47 loss.backward()
48 optimizer.step()
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
530 result = self._slow_forward(*input, **kwargs)
531 else:
--> 532 result = self.forward(*input, **kwargs)
533 for hook in self._forward_hooks.values():
534 hook_result = hook(self, input, result)
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/loss.py in forward(self, input, target)
429
430 def forward(self, input, target):
--> 431 return F.mse_loss(input, target, reduction=self.reduction)
432
433
/opt/conda/lib/python3.7/site-packages/torch/nn/functional.py in mse_loss(input, target, size_average, reduce, reduction)
2213 ret = torch.mean(ret) if reduction == 'mean' else torch.sum(ret)
2214 else:
-> 2215 expanded_input, expanded_target = torch.broadcast_tensors(input, target)
2216 ret = torch._C._nn.mse_loss(expanded_input, expanded_target, _Reduction.get_enum(reduction))
2217 return ret
/opt/conda/lib/python3.7/site-packages/torch/functional.py in broadcast_tensors(*tensors)
50 [0, 1, 2]])
51 """
---> 52 return torch._C._VariableFunctions.broadcast_tensors(tensors)
53
54
RuntimeError: The size of tensor a (10) must match the size of tensor b (64) at non-singleton dimension 1
I tried reshaping my tensors and creating new arrays as placeholders for my output array, yet seem to be getting nowhere.
Why does cross entropy loss work without any errors, yet MSE does not?
nn.CrossEntropyLoss and nn.MSELoss are completely different loss functions with fundamentally different rationale behind them.
nn.CrossEntropyLoss is a loss function for discrete labeling tasks. Therefore it expects as inputs a prediction of per-class scores (raw logits) and targets as ground-truth discrete labels: x is of shape nxc (where c is the number of labels) and y is of shape n and of integer type, with each target taking values in the range {0,...,c-1}.
In contrast, nn.MSELoss is a loss function for regression tasks. Therefore it expects both predictions and targets to be of the same shape and data type. That is, if your prediction is of shape nxc the target should also be of shape nxc (and not just n as in the cross-entropy case).
If you insist on using MSE loss instead of cross entropy, you will need to convert the integer target labels you currently have (of shape n) into one-hot vectors of shape nxc, and only then compute the MSE loss between your predictions and the generated one-hot targets.
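The conversion can be sketched with torch.nn.functional.one_hot. The tensors below are random stand-ins for one batch of model outputs and labels (assumptions for illustration). Since the model in the question ends in LogSoftmax, this sketch exponentiates the output to get probabilities before comparing with the one-hot targets; that step is a choice made here, not something the answer above prescribes.

```python
import torch
import torch.nn.functional as F

n, c = 64, 10  # batch size and number of classes

# Stand-ins: log-probabilities as produced by LogSoftmax, and integer labels
log_probs = F.log_softmax(torch.randn(n, c), dim=1)
labels = torch.randint(0, c, (n,))

# (n,) integer labels -> (n, c) one-hot float targets
targets = F.one_hot(labels, num_classes=c).float()

# Both arguments now have shape (n, c), so MSE is well defined
loss = F.mse_loss(log_probs.exp(), targets)
print(targets.shape, loss.shape)  # torch.Size([64, 10]) torch.Size([])
```

In the training loop from the question, `labels` would be replaced by the one-hot `targets` in the call to the criterion.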
I have an LSTM model that gets one 88-dimensional vector per step as input. Each element in the vector can be of class {0, 1, 2}. The output is coded as one-hot, which means that at each step I have a matrix of size 3x88 as output. I would like to calculate the cross-entropy loss. This is my model:
x = tf.placeholder(tf.float32, (None, None, INPUT_SIZE))
y = tf.placeholder(tf.float32, (None, None, None, OUTPUT_SIZE))
def LSTM(x_):
    cell = tf.contrib.rnn.LSTMCell(RNN_HIDDEN, state_is_tuple=True)
    cell = tf.contrib.rnn.DropoutWrapper(cell, output_keep_prob=0.5)
    cell = tf.contrib.rnn.MultiRNNCell([cell] * num_layers, state_is_tuple=True)
    batch_size = tf.shape(x_)[0]
    initial_state = cell.zero_state(batch_size, tf.float32)
    rnn_outputs, rnn_states = tf.nn.dynamic_rnn(cell,
                                                x_,
                                                initial_state=initial_state,
                                                time_major=False)
    final_projection = lambda lx: layers.linear(lx, num_outputs=OUTPUT_SIZE,
                                                activation_fn=None)
    predicted_outputs = tf.map_fn(final_projection, rnn_outputs)
    return predicted_outputs
Sample inputs and outputs to my network are here. In this sample, for the inputs, the batch size is 1, there are 3 time steps, and the data dimension is 88. The outputs are the same, except that the data are transformed into one-hot vectors. So the batch size is 1 (1st dimension), there are 3 time steps (2nd dimension), there are 3 classes (3rd dimension) and the data dimension is 88 (4th dimension).
I do not know what to do with rnn_outputs, or how to give predicted_outputs the appropriate shape so that I can call softmax_cross_entropy_with_logits(logits=pred, labels=batch_y_oh).
The code as it is now gives me the following error:
InvalidArgumentError (see above for traceback): logits and labels must be same size: logits_size=[3,88] labels_size=[9,88]
Is it even possible to calculate cross entropy like this, by feeding it directly to TF's function, or do I have to write my own function? Basically, the loss would be the sum of 88 cross entropies (I am thinking of iterating over columns and calling softmax_cross_entropy_with_logits() for every column).