Keras fit() and fit_generator() give different results. I implemented both methods keeping all the other parameters the same. I have attached my data generator and model below; the model is taken from this site: https://machinelearningmastery.com/
In the data generator, I load the data from the hard drive. Each X_train file contains a matrix of size (3, 1). For example, if the batch size is 2, the shape of X_batch will be (2, 3, 1).
def generator(list_xtrain, list_ytrain, batch_size):
    samples_per_epoch = len(list_xtrain)
    number_of_batches = samples_per_epoch / batch_size
    counter = 0
    X_batch = np.empty((batch_size, 3, 1))
    y_batch = np.empty((batch_size))
    while 1:
        temp_listx = list_xtrain[batch_size*counter:batch_size*(counter+1)]
        temp_listy = list_ytrain[batch_size*counter:batch_size*(counter+1)]
        for i, ID in enumerate(temp_listx):
            # Load one (3, 1) sample from disk
            X_batch[i,] = np.load('F:/Air_passenger_data_gen/' + ID)
        for j, ID in enumerate(temp_listy):
            # Store class
            y_batch[j] = np.load('F:/Air_passenger_data_gen/' + ID)
        counter += 1
        yield X_batch, y_batch
        # restart counter to yield data in the next epoch as well
        if counter >= number_of_batches:
            counter = 0
# using fit_generator()
batch_size = 2
model.fit_generator(generator=generator(list_xtrain, list_ytrain, batch_size),
                    epochs=100,
                    steps_per_epoch=len(list_xtrain)/batch_size,
                    verbose=2,
                    use_multiprocessing=False,
                    workers=4)
# using fit()
model.fit(trainX, trainY, epochs=100, batch_size=2)
I expect the output to be the same as that from fit(), but fit_generator() gives a crazy value of loss=41781.00, whereas fit() gives loss=0.0020.
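A quick way to narrow this down is to compare one batch yielded by the generator against the slice of trainX/trainY that fit() sees; if fit() was trained on scaled data but the files on disk hold raw values (or the ordering differs), the two losses will differ by orders of magnitude. This is only a sanity-check sketch, assuming list_xtrain, list_ytrain, trainX and trainY from the question are in scope:

import numpy as np

# Pull the first batch from the generator and compare it with what fit() receives.
gen = generator(list_xtrain, list_ytrain, batch_size=2)
X_batch, y_batch = next(gen)

print(X_batch.shape, y_batch.shape)       # expect (2, 3, 1) and (2,)
print(np.allclose(X_batch, trainX[:2]))   # False points to a scaling or ordering mismatch
print(np.allclose(y_batch, trainY[:2]))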
My goal is to compute a confusion matrix for a huge dataset with 10 classes. So far I have the following code and results:
Note: as far as I know, the model is making correct predictions over all the classes. I computed the loss in a pre-training phase and the accuracy during this transfer-classification phase, and both behave as expected; my problem is obtaining the predicted labels from the outputs.
train_dataset = Subset(eurosat_dataset, train_indices, train_transforms)
val_dataset = Subset(eurosat_dataset, val_indices, val_transforms)
train_loader = DataLoader(train_dataset, batch_size=batchsize, shuffle=False, num_workers=2, pin_memory=False,
drop_last=True)
val_loader = DataLoader(val_dataset, batch_size=batchsize, shuffle=False, num_workers=2, pin_memory=False,
drop_last=True)
print('train_len: %d val_len: %d' % (len(train_dataset), len(val_dataset)))
#for i, data in enumerate(val_loader): # inputs = data[0], labels = data[1]
# inputs, labels = data # inputs [1,13,224,224], labels[0-9] --> classes
# if i > 10:
# break
# print(inputs.shape, labels, inputs[0].max())
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
#inputs = inputs.to(device)
# Get the model, definition of the model to be loaded
import models.models_mae_mod as models_mae_mod
from models.util.pos_embed import interpolate_pos_embed # import pos_embed.py ----> Run OK
def prepare_model(chkpt_dir, arch='mae_vit_small_patch16'):
    # build model
    model = getattr(models_mae_mod, arch)(in_chans=13)
    # load model
    checkpoint = torch.load(chkpt_dir, map_location='cpu')
    state_dict = model.state_dict()
    for k in ['head.weight', 'head.bias']:
        if k in checkpoint and checkpoint[k].shape != state_dict[k].shape:
            print(f"Removing key {k} from pretrained checkpoint")
            del checkpoint[k]
    # interpolate position embedding
    interpolate_pos_embed(model, checkpoint)
    msg = model.load_state_dict(checkpoint['model'], strict=False)
    print(msg)
    return model
# loading the model
chkpt_dir = 'C:/Users/hugo_/PycharmProjects/transfermodel_Eurosat/datasets/B_raw_norm.pth'
model_mae = prepare_model(chkpt_dir, 'mae_vit_small_patch16')
model_mae = model_mae.to(device)
model_mae.eval()
print('Model loaded.')
with torch.no_grad():
    for i, (inputs, labels) in enumerate(val_loader):
        inputs = inputs.to(device)
        labels = labels.to(device)
        outputs = model_mae(inputs)  # 0 is LOSS, 1 is [1, 196, 3328] PRED, 2 is [1, 196] MASK,
                                     # 3 is [1, 13, 224, 224] TARGET
        #_, preds = torch.max(outputs, 1)
        #outputs = outputs[-1:]
        print("set")
I'm not computing the confusion matrix this time, since the outputs are not in the right format for it.
nb_classes = 10
confusion_matrix = torch.zeros(nb_classes, nb_classes)
with torch.no_grad():
    for i, (inputs, classes) in enumerate(val_loader):
        inputs = inputs.to(device)
        classes = classes.to(device)
        outputs = model_mae(inputs)
        outputs = outputs[3]
        _, preds = torch.max(outputs, 1)
        for t, p in zip(classes.view(-1), preds.view(-1)):
            confusion_matrix[t.long(), p.long()] += 1
print(confusion_matrix)
I have identified my problem as the way I'm getting the outputs: it is correct, but not enough to get the information I want. How can I get the predicted labels and use them to compute the confusion matrix?
I attach an image of my debugging process for a better understanding:
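For reference, this is roughly what the computation would look like once there is a model that maps an image batch to per-class logits of shape (batch, 10), for instance the MAE encoder with a linear classification head on top. The classifier model below is purely hypothetical and is not part of the checkpoint loaded above:

import torch

nb_classes = 10
confusion_matrix = torch.zeros(nb_classes, nb_classes)

with torch.no_grad():
    for inputs, classes in val_loader:
        inputs = inputs.to(device)
        classes = classes.to(device)
        logits = classifier(inputs)          # hypothetical model: output shape (batch, 10)
        preds = torch.argmax(logits, dim=1)  # predicted class label per image
        for t, p in zip(classes.view(-1), preds.view(-1)):
            confusion_matrix[t.long(), p.long()] += 1

print(confusion_matrix)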
I'm very new to Keras and machine learning in general, and am training a model like so:
history = model.fit_generator(flight_generator(train_files_train, 4), steps_per_epoch=500, epochs=50)
where flight_generator is a function that prepares and formats the training data, then yields it back to the model for fitting. This works great, so now I want to add some validation, but after much looking online I still don't know how to implement it.
My best guess would be something like:
history = model.fit_generator(flight_generator(train_files_train, 4), steps_per_epoch=500, epochs=50, validation_data=flight_generator(train_files_cv, 4))
But when I run the code it just freezes in the first epoch. What am I missing?
EDIT:
Code for flight_generator:
def flight_generator(files, batch_size):
    while True:
        batch_inputs = numpy.random.choice(a=files, size=batch_size)
        batch_input_X = []
        batch_input_Y = []
        c = 0
        for batch_input in batch_inputs:
            # reshape into X=t and Y=t+1
            trainX, trainY = create_dataset(batch_input, look_back)
            # reshape input to be [samples, time steps, features]
            trainX = numpy.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
            if c == 0:
                batch_input_X = trainX
                batch_input_Y = trainY
            else:
                batch_input_X = numpy.concatenate((batch_input_X, trainX), axis=0)
                batch_input_Y = numpy.concatenate((batch_input_Y, trainY), axis=0)
            c += 1
        # Return a tuple of (inputs, targets) to feed the network
        batch_x = numpy.array(batch_input_X)
        batch_y = numpy.array(batch_input_Y)
        yield (batch_x, batch_y)
Your validation_data should be passed in a form Keras can consume, so try changing the call (note that fit_generator does not accept a batch_size argument; the batch size is determined by the generator itself):
history = model.fit_generator(flight_generator(train_files_train, 4), steps_per_epoch=500, epochs=50, validation_data=flight_generator(train_files_cv, 4))
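One thing worth adding (not stated in the answer above): when validation_data is itself a generator, fit_generator also needs validation_steps so it knows how many batches make up one validation pass; without it, training can appear to freeze while Keras keeps drawing from the never-ending generator. A sketch, where the value 100 is only a placeholder for however many validation batches you want per epoch:

# Assumes flight_generator, train_files_train and train_files_cv from the question.
val_batches = 100  # placeholder value

history = model.fit_generator(flight_generator(train_files_train, 4),
                              steps_per_epoch=500,
                              epochs=50,
                              validation_data=flight_generator(train_files_cv, 4),
                              validation_steps=val_batches)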
I guess you should be using model.fit(...).
Do not use a generator unless you actually need one; in most code I have seen, model.fit() does the job.
Please refer to the Keras documentation for fit():
https://keras.io/api/models/sequential/
And please mention the optimizer and the metrics you are using.
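If the whole dataset fits in memory, the fit() route this answer suggests would look roughly like the sketch below; x_train, y_train, x_val and y_val are placeholder arrays rather than variables from the question:

model.compile(optimizer='adam', loss='mse', metrics=['mae'])

history = model.fit(x_train, y_train,
                    batch_size=4,
                    epochs=50,
                    validation_data=(x_val, y_val))  # validation passed as a plain tuple of arrays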
At the moment I'm trying to follow an example of temperature forecasting in Keras (as given in chapter 6.3 of F. Chollet's "Deep Learning with Python" book). I'm having some issues with prediction using the generator that is specified. My understanding is that I should be using model.predict_generator for prediction, but I'm unsure how to use the steps parameter for this method and how to get back predictions that are the correct "shape" for my original data.
Ideally, I would like to be able to plot the test set (indices 300001 until the end) and also plot my predictions for this test set (i.e. an array of the same length with predicted values).
An example (Dataset available here: https://s3.amazonaws.com/keras-datasets/jena_climate_2009_2016.csv.zip) is as follows:
import numpy as np
# Read in data
fname = 'jena_climate_2009_2016.csv'
f = open(fname)
data = f.read()
f.close()
lines = data.split('\n')
col_names = lines[0].split(',')
col_names = [i.replace('"', "") for i in col_names]
# Parse every row (skipping the date column) into a float array
float_data = np.zeros((len(lines) - 1, len(col_names) - 1))
for i, line in enumerate(lines[1:]):
    float_data[i, :] = [float(x) for x in line.split(',')[1:]]
temp = float_data[:, 1]
# Normalize the data using statistics from the training portion
mean = float_data[:200000].mean(axis=0)
float_data -= mean
std = float_data[:200000].std(axis=0)
float_data /= std
def generator(data, lookback, delay, min_index, max_index, shuffle=False, batch_size=128, step=6):
    if max_index is None:
        max_index = len(data) - delay - 1
    i = min_index + lookback
    while 1:
        if shuffle:
            rows = np.random.randint(
                min_index + lookback, max_index, size=batch_size)
        else:
            if i + batch_size >= max_index:
                i = min_index + lookback
            rows = np.arange(i, min(i + batch_size, max_index))
            i += len(rows)
        samples = np.zeros((len(rows),
                            lookback // step,
                            data.shape[-1]))
        targets = np.zeros((len(rows),))
        for j, row in enumerate(rows):
            indices = range(rows[j] - lookback, rows[j], step)
            samples[j] = data[indices]
            targets[j] = data[rows[j] + delay][1]
        yield (samples, targets)
lookback = 720
step = 6
delay = 144
batch_size = 128  # batch size used by all three generators (as in the book's example)
train_gen = generator(float_data, lookback=lookback, delay=delay,
                      min_index=0, max_index=200000, shuffle=True,
                      step=step, batch_size=batch_size)
val_gen = generator(float_data, lookback=lookback, delay=delay,
                    min_index=200001, max_index=300000, step=step,
                    batch_size=batch_size)
test_gen = generator(float_data, lookback=lookback, delay=delay,
                     min_index=300001, max_index=None, step=step,
                     batch_size=batch_size)
val_steps = (300000 - 200001 - lookback)
test_steps = (len(float_data) - 300001 - lookback)
from keras.models import Sequential
from keras import layers
from keras.optimizers import RMSprop
model = Sequential()
model.add(layers.Flatten(input_shape=(lookback // step, float_data.shape[-1])))
model.add(layers.Dense(32, activation='relu'))
model.add(layers.Dense(1))
model.compile(optimizer=RMSprop(), loss='mae')
model.fit_generator(train_gen, steps_per_epoch=500,
epochs=20, validation_data=val_gen,
validation_steps=val_steps)
After some searching around online, I tried some techniques similar to the following:
pred = model.predict_generator(test_gen, steps=test_steps // batch_size)
However, the prediction array that I got back was far too long and didn't match up to my original data at all. Has anyone got any suggestions?
For anyone looking at this question now: with newer versions of Keras, you are not required to specify the steps parameter when using predict_generator. Ref: https://github.com/keras-team/keras/issues/11902
If a value is provided, predictions are generated for steps * batch_size examples, which may exclude the last len(test) % batch_size rows, as mentioned by the OP.
Also, it seems to me that setting batch_size=1 defeats the purpose of using the generator, as it is equivalent to iterating over the test data one by one.
Similarly, setting steps=1 (when batch_size is not set in test_generator) will read the entire test data at once, which is not ideal for large test data.
For the steps argument of predict_generator, divide the number of images in your test path by the batch size you provided to test_gen.
For example: if I have 50 images and a batch size of 10, then steps would be 5.
# first separate the test images and test labels
test_images, test_labels = next(test_gen)
# get the class indices
test_labels = test_labels[:, 0]  # this should give you an array of labels
# num_test_images: total number of test images (placeholder for your own count)
predictions = model.predict_generator(test_gen, steps=num_test_images // batch_size, verbose=0)
predictions[:, 0]  # these are your actual predictions
Your original code looks correct:
pred = model.predict_generator(test_gen, steps=test_steps // batch_size)
I tried and did not see any problem generating a pred of length around 120k. What size did you get?
Actually both of the steps in the code are incorrect. They should be:
val_steps = (300000 - 200001 - lookback) // batch_size
test_steps = (len(float_data) - 300001 - lookback) // batch_size
(Didn't it take forever for your validation to run for each epoch?)
Of course with this correction you can simply use
pred = model.predict_generator(test_gen, steps=test_steps)
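As a rough sanity check (the row count and batch size below are assumptions based on the full Jena dataset and the book's batch_size of 128, not values stated in this answer), the corrected steps give a prediction array whose length lines up with the test split:

n_rows = len(float_data)                                 # roughly 420,551 rows for the full Jena CSV
test_steps = (n_rows - 300001 - lookback) // batch_size  # 936 with lookback=720 and batch_size=128
pred = model.predict_generator(test_gen, steps=test_steps)

print(test_steps, len(pred))  # len(pred) == test_steps * batch_size == 119808, close to the ~120k above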
As I arrived at a semi-acceptable version of an answer to my own question, I decided to post it for posterity:
test_gen = generator(float_data, lookback=lookback, delay=delay,
min_index=300001, max_index=None, step=step,
batch_size=1) # "reset" the generator
pred = model.predict_generator(test_gen, steps=test_steps)
This now has the shape I need to plot against my original test set. I could also use a more manual approach, inspired somewhat by this answer:
test_gen = generator(float_data, lookback=lookback, delay=delay,
min_index=300001, max_index=None, step=step,
batch_size=1) # "reset" the generator
truth = []
pred = []
for i in range(test_steps):
    x, y = next(test_gen)
    pred.append(model.predict(x))
    truth.append(y)
pred = np.concatenate(pred)
truth = np.concatenate(truth)
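To actually plot the test set against the predictions (the goal stated at the top of the question), something like this would do; it is just a sketch that assumes the pred and truth arrays from the snippet above and that matplotlib is available:

import matplotlib.pyplot as plt

plt.figure(figsize=(12, 4))
plt.plot(truth, label='observed (normalized temperature)')
plt.plot(pred, label='predicted')
plt.xlabel('test-set time step')
plt.legend()
plt.show()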
I would like to train an LSTM or GRU network in TensorFlow/Keras to continuously recognize whether a user is walking or not based on input from motion sensors (accelerometer and gyroscope). I have 50 input sequences with lengths varying from 581 to 5629 time steps and 6 features and 50 corresponding output sequences of boolean values. My problem is that I don't know how to feed the training data to the fit() method.
I know approximately what I need to do: I'd like to train with 5 batches of 10 sequences each, and for each batch I have to pad all but the longest sequence so all 10 sequences have the same lengths and apply masking. I just don't know how to build the data structures. I know that I can make one big 3D tensor of size (50,5629,6) and that works, but it's painfully slow, so I'd really like to make the sequence length of each batch as small as possible.
Here's the problem in code:
import tensorflow as tf
import numpy as np
# Load data from file
x_list, y_list = loadSequences("train.csv")
# x_list is now a list of arrays (n,6) of float64, where n is the timesteps
# and 6 is the number of features, sorted by increasing sequence lengths.
# y_list is a list of arrays (n,1) of Boolean.
x_train = # WHAT DO I WRITE HERE?
y_train = # AND HERE?
model = tf.keras.models.Sequential([
    tf.keras.layers.Masking(),
    tf.keras.layers.LSTM(32, return_sequences=True),
    tf.keras.layers.Dense(2, activation=tf.nn.softmax)
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=10, epochs=100)
You can do something like this: use a generator function. Take a look at the Keras documentation and look for the fit_generator method.
import json
import numpy as np

def data_generater(batch_size):
    print("reading data")
    training_file = open('data_location', 'r')
    # assuming data is in json format, so feel free to change accordingly
    training_set = json.loads(training_file.read())
    training_file.close()

    batch_i = 0   # Counter inside the current batch vector
    batch_x = []  # The current batch's x data
    batch_y = []  # The current batch's y data

    while True:
        for obj in training_set:
            batch_x.append(...)  # placeholder: append your input sequences one by one
            if obj['val'] == True:
                batch_y.append([1])
            elif obj['val'] == False:
                batch_y.append([0])
            batch_i += 1

            if batch_i == batch_size:
                # Ready to yield the batch
                # pad input to max length in the batch
                batch_x = pad_txt_data(batch_x)
                yield batch_x, np.array(batch_y)
                batch_x = []
                batch_y = []
                batch_i = 0

def pad_txt_data(arr):
    # expecting arr to be in the shape of (10, m, 6)
    paded_arr = []
    prefered_len = len(max(arr, key=len))
    # Now pad all your sequences to the preferred length in the batch (arr)
    return np.array(paded_arr)
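A minimal way to fill in that padding stub might look like this (a sketch, assuming each element of arr is a (timesteps, 6) float array and that zero-padding to the longest sequence in the batch is what you want, to match the Masking(mask_value=0.) layer below):

import numpy as np

def pad_txt_data(arr):
    # arr: list of arrays with shape (timesteps_i, 6); pad each to the longest one
    preferred_len = max(len(seq) for seq in arr)
    num_features = arr[0].shape[1]

    padded = np.zeros((len(arr), preferred_len, num_features))
    for i, seq in enumerate(arr):
        padded[i, :len(seq), :] = seq  # copy the sequence, leave the tail as zeros
    return padded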
And in the model:
model = keras.Sequential()
model.add(keras.layers.Masking(mask_value=0., input_shape=(None, 6)))
model.add(keras.layers.LSTM(32))
model.add(keras.layers.Dense(1, activation="sigmoid"))  # sigmoid: a single-unit softmax would always output 1
model.compile(optimizer="Adam", loss='binary_crossentropy', metrics=['accuracy'])
model.fit_generator(data_generater(10), steps_per_epoch=5, epochs=10)
batch_size, steps_per_epoch and epochs can be different.
Generally,
steps_per_epoch = number_of_sequences / batch_size
Note: from reading your description, your task appears to be a binary classification problem, not a sequence-to-sequence problem. A good example of sequence-to-sequence is language translation; just google around and you will see what I mean.
And if you really want to see the difference in training times, I suggest using a GPU if available, together with CuDNNLSTM.
In case it helps someone, here's how I ended up implementing a solution:
import tensorflow as tf
import numpy as np
# Load data from file
x_list, y_list = loadSequences("train.csv")
# x_list is now a list of arrays (m,n) of float64, where m is the timesteps
# and n is the number of features.
# y_list is a list of arrays (m,1) of Boolean.
assert len(x_list) == len(y_list)
num_sequences = len(x_list)
num_features = len(x_list[0][0])
batch_size = 10
batches_per_epoch = 5
assert batch_size * batches_per_epoch == num_sequences
def train_generator():
    # Sort by length so the number of timesteps in each batch is minimized
    x_list.sort(key=len)
    y_list.sort(key=len)
    # Generate batches
    while True:
        for b in range(batches_per_epoch):
            longest_index = (b + 1) * batch_size - 1
            timesteps = len(x_list[longest_index])
            x_train = np.zeros((batch_size, timesteps, num_features))
            y_train = np.zeros((batch_size, timesteps, 1))
            for i in range(batch_size):
                li = b * batch_size + i
                x_train[i, 0:len(x_list[li]), :] = x_list[li]
                y_train[i, 0:len(y_list[li]), 0] = y_list[li]
            yield x_train, y_train

model = tf.keras.models.Sequential([
    tf.keras.layers.Masking(mask_value=0., input_shape=(None, num_features)),
    tf.keras.layers.LSTM(32, return_sequences=True),
    tf.keras.layers.Dense(2, activation=tf.nn.softmax)
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit_generator(train_generator(), steps_per_epoch=batches_per_epoch, epochs=100)
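Once trained, getting per-timestep walking / not-walking predictions for a single new sequence could look like this (a sketch; x_new is a hypothetical array of shape (timesteps, num_features), not a variable from the code above):

import numpy as np

probs = model.predict(x_new[np.newaxis, :, :])  # add a batch dimension -> shape (1, timesteps, 2)
is_walking = np.argmax(probs[0], axis=-1)       # one 0/1 label per time step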
I'm using this PyTorch implementation of SegNet with pretrained weights that I found for object segmentation, and it works fine.
Now I want to resume the training from the values I have, using a new dataset with similar images.
How can I do that?
I guess I have to use the "train.py" file found in the repository, but I don't know what to write in order to replace the "fill the batch" comment.
Here is that portion of the code:
def train(epoch):
    model.train()

    # update learning rate
    lr = args.lr * (0.1 ** (epoch // 30))
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr

    # define a weighted loss (0 weight for 0 label)
    weights_list = [0] + [1 for i in range(17)]
    weights = np.asarray(weights_list)
    weigthtorch = torch.Tensor(weights_list)

    if USE_CUDA:
        loss = nn.CrossEntropyLoss(weight=weigthtorch).cuda()
    else:
        loss = nn.CrossEntropyLoss(weight=weigthtorch)

    total_loss = 0

    # iteration over the batches
    batches = []
    for batch_idx, batch_files in enumerate(tqdm(batches)):
        # containers
        batch = np.zeros((args.batch_size, input_nbr, imsize, imsize), dtype=float)
        batch_labels = np.zeros((args.batch_size, imsize, imsize), dtype=int)

        # fill the batch
        # ...
        # What should I write here?

        batch_th = Variable(torch.Tensor(batch))
        target_th = Variable(torch.LongTensor(batch_labels))

        if USE_CUDA:
            batch_th = batch_th.cuda()
            target_th = target_th.cuda()

        # initialize gradients
        optimizer.zero_grad()

        # predictions
        output = model(batch_th)

        # Loss
        output = output.view(output.size(0), output.size(1), -1)
        output = torch.transpose(output, 1, 2).contiguous()
        output = output.view(-1, output.size(2))
        target = target_th.view(-1)
        l_ = loss(output.cuda(), target)
        total_loss += l_.cpu().data.numpy()
        l_.cuda()
        l_.backward()
        optimizer.step()

    return total_loss / len(files)
If I had to guess, he probably made some DataLoader feeder that extended the PyTorch DataLoader class. See
https://pytorch.org/tutorials/beginner/data_loading_tutorial.html
Near the bottom of the page you can see an example in which they loop over their data loader:
for i_batch, sample_batched in enumerate(dataloader):
What this would look like for images, for example, is:
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=False, transform=transform_train)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batchSize, shuffle=True, num_workers=2)
for batch_idx, (inputs, targets) in enumerate(trainloader):
    # Using the PyTorch data loader, the inputs and targets are given automatically
    inputs, targets = inputs.cuda(), targets.cuda()
    optimizer.zero_grad()
    inputs, targets = Variable(inputs), Variable(targets)
How exactly the author loads his files I don't know. You could follow the procedure from: https://pytorch.org/tutorials/beginner/data_loading_tutorial.html to make your own Dataloader though.
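For completeness, a custom Dataset for image/label pairs might look roughly like this (a sketch with hypothetical directory names and file naming; adapt it to however the SegNet repository actually stores its images and masks):

import os
import numpy as np
from PIL import Image
import torch
from torch.utils.data import Dataset, DataLoader

class SegmentationDataset(Dataset):
    def __init__(self, image_dir, mask_dir, transform=None):
        self.image_dir = image_dir
        self.mask_dir = mask_dir
        self.files = sorted(os.listdir(image_dir))  # assumes masks share file names with images
        self.transform = transform

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        name = self.files[idx]
        image = Image.open(os.path.join(self.image_dir, name)).convert('RGB')
        mask = Image.open(os.path.join(self.mask_dir, name))
        if self.transform:
            image = self.transform(image)
        label = torch.as_tensor(np.array(mask), dtype=torch.long)  # per-pixel class labels
        return image, label

# Example usage (hypothetical paths):
# loader = DataLoader(SegmentationDataset('imgs/', 'masks/'), batch_size=4, shuffle=True, num_workers=2)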