I am a beginner in Keras programming. I want to update the model weights manually in Keras so as to get a deep understanding of gradient descent. However, when I try it, the model either fails to converge or the loss even explodes. My steps are listed below:
First, I use a Keras Sequential model to fit the quadratic function y = 2*x*x - 7*x + 11.
Below is the code using the Sequential model:
model = Sequential()
model.add(Dense(64, input_dim = 1, activation = 'relu'))
model.add(Dense(32, activation = 'relu'))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.summary()
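For context, a minimal sketch of how the training data and the fit call might look (the x range, sample count, and epoch count here are my assumptions, not taken from the original post):
import numpy as np

x_train = np.linspace(-5, 5, 200, dtype=np.float32)   # assumed range and size
y_train = 2 * x_train ** 2 - 7 * x_train + 11          # the target quadratic

model.fit(x_train, y_train, epochs=200, verbose=0)     # assumed epoch count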
[Figure: training loss]
[Figure: fitted curve and the original one]
Then, I use the following code to update the weights manually:
class MyModel(keras.Model):
    def __init__(self):
        super().__init__()
        self.layer1 = Dense(64, input_shape = (1, ))
        self.layer2 = Dense(32)
        self.layer3 = Dense(1)

    def forward(self, x):
        y = keras.activations.relu(self.layer1(x))
        y = keras.activations.relu(self.layer2(y))
        y = self.layer3(y)
        return y

def loss_fun(y_pred, y):
    return keras_backend.mean(keras.losses.mean_squared_error(y, y_pred))

def compute_loss(model, x, y, loss_fun = loss_fun):
    logits = model.forward(x)
    mse = loss_fun(y, logits)
    return mse, logits

def compute_gradients(model, x, y, loss_fun = loss_fun):
    with tf.GradientTape() as tape:
        loss, _ = compute_loss(model, x, y, loss_fun)
    return tape.gradient(loss, model.trainable_variables), loss

def apply_gradients(optimizer, gradients, variables):
    optimizer.apply_gradients(zip(gradients, variables))

def train_batch(x, y, model, optimizer):
    '''
    one step batch training
    '''
    gradients, loss = compute_gradients(model, x, y)
    apply_gradients(optimizer, gradients, model.trainable_variables)
    return loss
model2 = MyModel()
epochs = 200
optimizer = keras.optimizers.Adam(learning_rate = 0.01)  # according to what I found, 0.01 is the Keras default learning rate
loss = []
x_train = np.expand_dims(x_train, axis = 0)
y_train = np.expand_dims(y_train, axis = 0)

for i in range(epochs):
    l = train_batch(x_train, y_train, model2, optimizer)
    loss.append(l)
    if i % 10 == 0:
        print(f'current loss = {l}')
The loss with this approach looks like this: [figure omitted]
I also tried another way to update the weights manually:
epochs = 200
lr = 0.01
optimizer = keras.optimizers.Adam(learning_rate = 0.01)
loss = []
x_train = np.expand_dims(x_train, axis = 0)
y_train = np.expand_dims(y_train, axis = 0)
x_train = tf.cast(x_train, tf.float32)
y_train = tf.cast(y_train, tf.float32)
for i in range(epochs):
    y_pred = model5.forward(x_train)
    l = k.mean(keras.losses.mean_squared_error(y_train, y_pred))
    gradients = k.gradients(l, model5.trainable_weights)
    new_weights = model5.get_weights() - 0.001 * np.array(gradients)
    model5.set_weights(new_weights)
    if i % 10 == 0:
        loss.append(l)
        print(f'{i}th loss is: {l}')
In this case, the loss explodes like this: [figure omitted]
Where is the problem?
I have figured out where the problem is.
When creating the model with:
model = MyModel()
the trainable variables in the model are empty.
When I print them with:
print(model.trainable_variables)
it outputs
[]
I tried to make the weights trainable manually with the following code:
for layer in model.layers:
    layer.trainable = True
But it still doesn't work at all.
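For reference, a minimal sketch of the usual explanation (my addition, not part of the original post): Keras creates layer weights lazily on the first call, and a subclassed keras.Model is expected to implement call() rather than forward(); once the model has been called on (dummy) data, trainable_variables is no longer empty.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense

class MyFixedModel(keras.Model):
    def __init__(self):
        super().__init__()
        self.layer1 = Dense(64)
        self.layer2 = Dense(32)
        self.layer3 = Dense(1)

    def call(self, x):                         # call(), not forward(): this is what Keras invokes
        y = keras.activations.relu(self.layer1(x))
        y = keras.activations.relu(self.layer2(y))
        return self.layer3(y)

model = MyFixedModel()
print(model.trainable_variables)               # [] -- no weights have been built yet
_ = model(tf.zeros((1, 1)))                    # the first call builds the layers
print(len(model.trainable_variables))          # 6: kernel + bias for each Dense layer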
I am completely new to PyTorch. I would like to move my TF code to PyTorch, and I think I am missing something.
I have X as input and Y as output. X is time series data on which I would like to do 1D convolution. Y is just a plain number.
X has a shape of (1050589, 81, 21). I have 1050589 experiments, each experiment has 81 timestamps, and each timestamp has 21 points of data. This is the required format for TF, but as far as I was able to find out, in PyTorch the time dimension should be the last one.
I have my data in a NumPy array, so first I transformed the data to fit PyTorch and also turned it into a list.
a = []
for n, i in enumerate(X):
    a.append([X[n].T, Y[n]])

train_data = DataLoader(a, batch_size=128)
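As a point of comparison, here is a sketch of one way to feed the channels-first layout nn.Conv1d expects, with explicit float32 dtypes and a TensorDataset instead of a Python list (the dummy shapes are mine, not the poster's data):
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

X = np.random.rand(1000, 81, 21).astype(np.float32)   # dummy data in the TF layout (N, time, channels)
Y = np.random.rand(1000).astype(np.float32)

X_t = torch.from_numpy(X).permute(0, 2, 1)             # -> (N, channels, time) for nn.Conv1d
Y_t = torch.from_numpy(Y)
train_data = DataLoader(TensorDataset(X_t, Y_t), batch_size=128)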
My model looks like this:
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.linear_relu_stack = nn.Sequential(
            nn.Conv1d(EMBED_SIZE, 32, 7, padding='same'),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(81*32, 32),
            nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, x):
        logits = self.linear_relu_stack(x)
        return logits.double()
The architecture is simple, as I want to keep it the same as I have in Tensorflow. One convolution with a kernel of 7 and 32 channels, followed by a dense layer and a single output layer.
Same network in Tensorflow:
def conv_1d_model():
    model = Sequential(name="model_conv1D")
    model.add(Conv1D(filters=32, kernel_size=7, activation='relu', input_shape=(81, 21), padding="same"))
    model.add(Flatten())
    model.add(Dense(32, activation='relu'))
    model.add(Dense(1))
    return model
Now when I try to optimize this network in PyTorch my losses are all over the place, not decreasing at all, while in TensorFlow it runs perfectly well.
I am sure I am missing something, can anyone point me in the right direction?
My optimization function in PyTorch:
model = NeuralNetwork()
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    for batch, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = torch.squeeze(model(X))  # I was getting a warning about pred being a different shape than y, so I squeezed it
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch % 10 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")
Optimization in Tensorflow
model = conv_1d_model()
opt = Adam(learning_rate=learning_rate)
model.compile(loss='mse', optimizer=opt, metrics=['mae'])
model_history = model.fit(X, Y, validation_split=0.2, epochs=epochs, batch_size=batch_size, verbose=1)
I'm having trouble understanding why this is throwing an error. This code is pulled directly from the PyTorch documentation for a NN classifier for the Fashion-MNIST dataset. However, when I try to switch it to the MNIST handwritten digits dataset, it comes up with the following error:
RuntimeError: The size of tensor a (10) must match the size of tensor b (64) at non-singleton dimension 1
This occurs when using the loss function during the training loop. Can anyone help me understand why this is happening? Thanks!
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda
import torchvision.models as models
import matplotlib.pyplot as plt

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
#device = "cpu"
print(f"Using {device} device")

training_data = datasets.MNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.MNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

train_dataloader = DataLoader(training_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    for batch, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        X, y = X.to(device), y.to(device)
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

def test_loop(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0

    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

def save_checkpoint(state, filename = "checkpoint.pth.tar"):
    print("=> Saving checkpoint")
    torch.save(state, filename)

model = NeuralNetwork().to(device)

learning_rate = 1e-3
batch_size = 64
epochs = 10

# Initialize the loss function
loss_fn = nn.MSELoss()
optimiser = torch.optim.Adam(model.parameters(), lr=learning_rate)

for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimiser)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")
torch.nn.MSELoss is an implementation of mean squared error. You can't measure the difference between two tensors if they're different sizes (MSELoss does not allow for broadcasting). So if you're using MSELoss, then the predictions and the targets must be the same shape. In your case, preds is a tensor of shape [64, 10], and y is a tensor of shape [64].
The reason y is of shape [64] rather than [64, 10] is that most classification dataset implementations represent targets as integer labels rather than one-hot encoded vectors. In theory, you could convert these integer label targets to one-hot encoded targets.
But in reality, since this is a classification problem, you should probably be using something like nn.CrossEntropyLoss rather than nn.MSELoss. The former is a conventional classification loss function, and it allows the targets to be integer labels rather than one-hot labels (so just swapping out MSELoss for CrossEntropyLoss should solve your problem). MSELoss is better suited for regression tasks and such.
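A minimal sketch of both options (my example, not the answerer's code):
import torch
from torch import nn

logits = torch.randn(64, 10)            # model output, shape [64, 10]
targets = torch.randint(0, 10, (64,))   # integer class labels, shape [64]

# Option 1 (recommended): CrossEntropyLoss takes the integer labels directly.
loss = nn.CrossEntropyLoss()(logits, targets)

# Option 2: if you insist on MSELoss, one-hot encode the targets first so the shapes match.
one_hot = nn.functional.one_hot(targets, num_classes=10).float()
mse = nn.MSELoss()(torch.softmax(logits, dim=1), one_hot)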
I made a custom loss function for cost-sensitive learning, and to find the optimal parameters for the neural network I experimented with different settings (layers, batch size, epochs).
The result was best when the network had only an input and an output layer.
I'm curious about this result.
Is it acceptable? If so, could you please tell me the reason?
Does that result occur because the dataset is already normalized?
Here's the code.
Thank you in advance.
import math
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
import tensorflow as tf
from keras.layers import Dense
from keras.callbacks import History
from keras.models import Sequential
import keras.backend as K

# Define dataset
X, y = make_classification(n_samples=150000, n_features=10, n_informative=4,
                           n_redundant=4, n_repeated=2, n_classes=2, n_clusters_per_class=3,
                           class_sep=0.5, weights=[0.9, 0.1], random_state=27)
X_train, X_true, y_train, y_true = train_test_split(X, y, test_size=0.33, random_state=42)
# Define Function
def custom_loss_wrapper(p):
    def custom_loss(y_true, y_pred):
        y_true = tf.cast(y_true, tf.float32)
        y_pred = tf.cast(y_pred, tf.float32)
        neg_y_true = 1 - y_true
        neg_y_pred = 1 - y_pred
        fp = K.sum(y_pred * (1 - y_true))
        fn = K.sum((1 - y_pred) * y_true)
        cost = tf.cast(fn * p + fp, tf.float32)
        return cost
    return custom_loss
def FindLayerNodesLinear(n_layers, first_layer_nodes, last_layer_nodes):
    layers = []
    nodes_increment = (last_layer_nodes - first_layer_nodes) / (n_layers - 1)
    nodes = first_layer_nodes
    for i in range(1, n_layers + 1):
        layers.append(math.ceil(nodes))
        nodes = nodes + nodes_increment
    return layers

def createmodel_for_grid_search(n_layers, first_layer_nodes, last_layer_nodes):
    p = 5
    model = Sequential()
    n_nodes = FindLayerNodesLinear(n_layers, first_layer_nodes, last_layer_nodes)
    for i in range(1, n_layers):
        if i == 1:
            model.add(Dense(first_layer_nodes, input_dim=X_train.shape[1], activation='relu'))
        else:
            model.add(Dense(n_nodes[i-1], activation='sigmoid'))
    # Finally, the output layer should have a single node in binary classification
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer='adam', loss=custom_loss_wrapper(p))
    return model
# Grid search by hand
batch_sizes = [16, 32, 64, 128]
epochs = [10,20,30,50,100]
b = 0
loss_ = []
for a in range(2, 6):
    for i in range(len(batch_sizes)):
        batch_size_ = batch_sizes[i]
        for j in range(len(epochs)):
            epochs_ = epochs[j]
            loss = np.round(hist_list[b][-1], 3)
            loss_.append(loss)
            print('layers: {}, batch size: {}, epoch: {}, and loss: {}'.format(a, batch_size_, epochs_, loss))
            b += 1
    print('')
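The snippet above reads losses from hist_list, which is not defined in the post. Presumably (this is my assumption, including the 64 and 8 node counts) it was filled by a training loop with the same nesting order, along these lines:
hist_list = []
for a in range(2, 6):
    for batch_size_ in batch_sizes:
        for epochs_ in epochs:
            model = createmodel_for_grid_search(a, first_layer_nodes=64, last_layer_nodes=8)
            hist = model.fit(X_train, y_train, batch_size=batch_size_, epochs=epochs_, verbose=0)
            hist_list.append(hist.history['loss'])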
I'm new to Tensorflow and Keras. To get started, I followed the https://www.tensorflow.org/tutorials/quickstart/advanced tutorial. I'm now adapting it to train on CIFAR10 instead of MNIST dataset. I recreated this model https://keras.io/examples/cifar10_cnn/ and I'm trying to run it in my own codebase.
Logically, if the model, batch size and optimizer are all the same, then the two should perform identically, but they don't. I thought it might be that I'm making a mistake in preparing the data. So I copied the model.fit function from the keras code into my script, and it still performs better. Using .fit gives me around 75% accuracy in 25 epochs, while with the manual method it takes around 60 epochs. With .fit I also achieve slightly better max accuracy.
What I want to know is: Is .fit doing something behind the scenes that's optimizing training? What do I need to add to my code to get the same performance? Am I doing something obviously wrong?
Thanks for your time.
Main code:
import tensorflow as tf
from tensorflow import keras
import msvcrt
from Plotter import Plotter
#########################Configuration Settings#############################
BatchSize = 32
ModelName = "CifarModel"
############################################################################
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
print("x_train",x_train.shape)
print("y_train",y_train.shape)
print("x_test",x_test.shape)
print("y_test",y_test.shape)
x_train, x_test = x_train / 255.0, x_test / 255.0
# Convert class vectors to binary class matrices.
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)
train_ds = tf.data.Dataset.from_tensor_slices(
    (x_train, y_train)).batch(BatchSize)
test_ds = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(BatchSize)
loss_object = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.0001,decay=1e-6)
# Create an instance of the model
model = ModelManager.loadModel(ModelName,10)
train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.CategoricalAccuracy(name='train_accuracy')
test_loss = tf.keras.metrics.Mean(name='test_loss')
test_accuracy = tf.keras.metrics.CategoricalAccuracy(name='test_accuracy')
########### Using this function I achieve better results ##################
model.compile(loss='categorical_crossentropy',
              optimizer=optimizer,
              metrics=['accuracy'])

model.fit(x_train, y_train,
          batch_size=BatchSize,
          epochs=100,
          validation_data=(x_test, y_test),
          shuffle=True,
          verbose=2)
############################################################################
########### Using the below code I achieve worse results ##################
#tf.function
def train_step(images, labels):
    with tf.GradientTape() as tape:
        predictions = model(images, training=True)
        loss = loss_object(labels, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

    train_loss(loss)
    train_accuracy(labels, predictions)

#tf.function
def test_step(images, labels):
    predictions = model(images, training=False)
    t_loss = loss_object(labels, predictions)

    test_loss(t_loss)
    test_accuracy(labels, predictions)
epoch = 0
InterruptLoop = False
while InterruptLoop == False:
    #Shuffle training data
    train_ds.shuffle(1000)
    epoch = epoch + 1

    # Reset the metrics at the start of the next epoch
    train_loss.reset_states()
    train_accuracy.reset_states()
    test_loss.reset_states()
    test_accuracy.reset_states()

    for images, labels in train_ds:
        train_step(images, labels)

    for test_images, test_labels in test_ds:
        test_step(test_images, test_labels)

    test_accuracy = test_accuracy.result() * 100
    train_accuracy = train_accuracy.result() * 100

    #Print update to console
    template = 'Epoch {}, Loss: {}, Accuracy: {}, Test Loss: {}, Test Accuracy: {}'
    print(template.format(epoch,
                          train_loss.result(),
                          train_accuracy,
                          test_loss.result(),
                          test_accuracy))

    # Check if keyboard pressed
    while msvcrt.kbhit():
        char = str(msvcrt.getch())
        if char == "b'q'":
            InterruptLoop = True
            print("Stopping loop")
The model:
from tensorflow.keras.layers import Dense, Flatten, Conv2D, Dropout, MaxPool2D
from tensorflow.keras import Model
class ModelData(Model):
    def __init__(self, NumberOfOutputs):
        super(ModelData, self).__init__()
        self.conv1 = Conv2D(32, 3, activation='relu', padding='same', input_shape=(32,32,3))
        self.conv2 = Conv2D(32, 3, activation='relu')
        self.maxpooling1 = MaxPool2D(pool_size=(2,2))
        self.dropout1 = Dropout(0.25)
        ############################
        self.conv3 = Conv2D(64, 3, activation='relu', padding='same')
        self.conv4 = Conv2D(64, 3, activation='relu')
        self.maxpooling2 = MaxPool2D(pool_size=(2,2))
        self.dropout2 = Dropout(0.25)
        ############################
        self.flatten = Flatten()
        self.d1 = Dense(512, activation='relu')
        self.dropout3 = Dropout(0.5)
        self.d2 = Dense(NumberOfOutputs, activation='softmax')

    def call(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.maxpooling1(x)
        x = self.dropout1(x)
        x = self.conv3(x)
        x = self.conv4(x)
        x = self.maxpooling2(x)
        x = self.dropout2(x)
        x = self.flatten(x)
        x = self.d1(x)
        x = self.dropout3(x)
        x = self.d2(x)
        return x
I know this question already has an answer, but I faced the same problem, and the solution turned out to be something different that is not specified in the documentation.
I copy and paste here the answer (and the related link) I found on GitHub, which solved the issue in my case:
The problem is caused by broadcasting in your loss function in the
custom loop. Make sure that the dimensions of predictions and label is
equal. At the moment (for MAE) they are [128,1] and [128]. Just make
use of tf.squeeze or tf.expand_dims.
Link: https://github.com/tensorflow/tensorflow/issues/28394
Basic translation: when computing the loss, always be sure of the tensors' shapes.
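A minimal illustration of that shape check (my example, using MAE as in the quote): squeeze the [128, 1] predictions so they match the [128] labels before the loss is computed in the custom loop.
import tensorflow as tf

labels = tf.zeros([128])          # shape [128]
predictions = tf.zeros([128, 1])  # shape [128, 1]
loss = tf.keras.losses.MAE(labels, tf.squeeze(predictions, axis=-1))  # shapes now match, no silent broadcasting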
Mentioning the solution here (in the answer section), even though it is present in the comments, for the benefit of the community.
On the same dataset, the accuracy can differ between Keras Model.fit and a model trained with a manual TensorFlow loop, mainly when the data is shuffled: shuffling changes how the data is split between training and testing (or validation), so the two runs (Keras and TensorFlow) end up with different train and test data.
If we want to observe similar results on the same dataset and with a similar architecture in Keras and in TensorFlow, we can turn off shuffling of the data.
Hope this helps. Happy learning!
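A quick sketch of that suggestion (my example, not code from the answer): turn off shuffling on the .fit side and drop the shuffle call in the manual loop, so both runs see the batches in the same order.
model.fit(x_train, y_train,
          batch_size=BatchSize,
          epochs=100,
          validation_data=(x_test, y_test),
          shuffle=False,    # keep the sample order fixed for the comparison
          verbose=2)
# ...and remove the train_ds.shuffle(1000) call in the custom loop.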
I am trying to train a model in PyTorch.
input: 686-array
first layer: 64-array
second layer: 2-array
output: prediction, either 1 or 0
This is what I have so far:
class autoencoder(nn.Module):
    def __init__(self):
        super(autoencoder, self).__init__()
        self.encoder_softmax = nn.Sequential(
            nn.Linear(686, 256),
            nn.ReLU(True),
            nn.Linear(256, 2),
            nn.Softmax()
        )

    def forward(self, x):
        x = self.encoder_softmax(x)
        return x
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)
net = net.to(device)
iterations = 10
learning_rate = 0.98
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(
    net.parameters(), lr=learning_rate, weight_decay=1e-5)

for epoch in range(iterations):
    loss = 0.0
    print("train_dl len: ", len(train_dl))
    # net.train()
    for i, data in enumerate(train_dl, 0):
        inputs, labels, vectorize = data
        labels = labels.long().to(device)
        inputs = inputs.float().to(device)
        optimizer.zero_grad()
        outputs = net(inputs)
        train_loss = criterion(outputs, labels)
        train_loss.backward()
        optimizer.step()
        loss += train_loss.item()
    loss = loss / len(train_dl)
But when I train the model, the loss does not go down. What am I doing wrong?
You're using nn.CrossEntropyLoss as the loss function, which applies log-softmax, but you also apply softmax in the model:
self.encoder_softmax = nn.Sequential(
    nn.Linear(686, 256),
    nn.ReLU(True),
    nn.Linear(256, 2),
    nn.Softmax()  # <- needs to be removed
)
The output of your model should be the raw logits, without the nn.Softmax.
You should also lower the learning rate, because a learning rate of 0.98 is very high, which makes training much less stable, and you'll likely see the loss oscillate. A more appropriate learning rate would be on the order of 0.01 or 0.001.
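A sketch of the corrected setup described above (my version, not the asker's full code): raw logits out of the network, nn.CrossEntropyLoss applying log-softmax internally, and a more conventional learning rate.
import torch
from torch import nn

net = nn.Sequential(
    nn.Linear(686, 256),
    nn.ReLU(True),
    nn.Linear(256, 2),      # raw logits, no Softmax here
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3, weight_decay=1e-5)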