(If possible) How to vectorize this training process? - python

The problem is that I am only seeing a one-second improvement per epoch after switching from a CPU to a GPU. I believe this is because my training process is highly iterative rather than vectorized.
Any recommendations on vectorizing this Keras training?
e_num = 0
sample_count = len(trainX[0])
for e in range(epochs):
    e_num += 1
    print(e_num)
    for i in range(len(trainX)):
        model.fit(trainX[i], trainY[i], epochs=1, batch_size=32,
                  verbose=2, shuffle=False,
                  validation_split=fix_validation_split(0.05, sample_count,
                                                        batch_size))
        model.reset_states()
The problem with Keras is that I'm having difficulty passing a 4-D dataset to the fit function, so I'm iteratively training over each 3-D dataset inside trainX. When I increased the batch_size from 1 to 32, training obviously became much faster. However, there is still only about a one-second time difference per epoch between the CPU and the GPU. Is this because my training process is not properly vectorized? If so, what would you recommend while using Keras?
Thank you!
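If the per-subset state resets are not essential, one way to avoid the inner Python loop is to merge the 3-D subsets along the sample axis and call fit once per run. A sketch, assuming every subset in trainX/trainY shares the same (timesteps, features) shape and that fix_validation_split accepts the merged sample count:

import numpy as np

# Stack the collection of 3-D arrays into one (total_samples, timesteps, features) array.
bigX = np.concatenate(trainX, axis=0)
bigY = np.concatenate(trainY, axis=0)

model.fit(bigX, bigY, epochs=epochs, batch_size=32,
          verbose=2, shuffle=False,
          validation_split=fix_validation_split(0.05, len(bigX), batch_size))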

Related

Neural Network optimization using epoch and batch

I am trying to optimize a given neural network (e.g. a multilayer perceptron with 2 hidden layers) by finding the number of epochs and the batch size that give the highest accuracy.
for epoch from 10 to 200 (in steps of 10):
    for batch from 40 to 200 (in steps of 20):
        modele.fit(X_train, Y_train, epochs=epoch, batch_size=batch)
        save batch, epoch, Accuracy
Afterwards I kept the smallest number of epochs, with the smallest corresponding batch size, that gave the highest accuracy,
e.g. best_params: epoch = 10, batch = 150 => Accuracy = 94%
My problem is that when I re-run my model with best_params, it doesn't give me the same results (loss, accuracy); sometimes the accuracy is even very low (e.g. 10%).
I tried fixing the seed, but it did not help.
Regards
Djam75
df = pd.DataFrame(columns=['Nb_Batch', 'Nb_Epoch', 'Accuracy'])
i = 0
lst_loss = []
lst_accuracy = []
lst_epoch = list(np.arange(10, 200, 10))
lst_batch = list(np.arange(100, 400, 20))
for epoch in lst_epoch:
    print('---------------- Epoch ' + str(epoch) + '------------------')
    for batch in lst_batch:
        modelSimple.fit(X_train, Y_train, nb_epoch=epoch, batch_size=batch, verbose=0)
        score = modelSimple.evaluate(X_test, Y_test)
        df.loc[i, "Nb_Batch"] = batch
        df.loc[i, "Nb_Epoch"] = epoch
        df.loc[i, "Accuracy"] = score[1] * 100
        i = i + 1
This might be happening due to random parameter initialization: if you build an end-to-end model without transferring learned weights, the architecture gets random values for its parameters every time you train it.
In this case, a good practice is to use batch normalization layers after some layers, depending on your architecture.
TensorFlow implementation
PyTorch implementation
Extra idea:
Do not use any 'for' or 'while' loops in the model implementation.
You can follow the templates in TensorFlow or PyTorch.
Or, if you build a complete model from scratch, vectorize operations using a matrix-operation library such as NumPy (see the small example below).
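As a tiny illustration of the "no Python loops" advice, a sketch comparing an explicit loop with the equivalent vectorized NumPy operation:

import numpy as np

a = np.random.rand(10_000)
b = np.random.rand(10_000)

# Element-by-element Python loop: correct, but slow and hard to accelerate.
total = 0.0
for i in range(len(a)):
    total += a[i] * b[i]

# One vectorized operation: same result, and this is what GPUs/DL libraries speed up.
total_vectorized = a @ b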
Thanks for the update.
I resolved my problem by saving the model and loading it afterwards.
Thanks for the idea (batch normalization) and the extra idea: not using any for loops ;-)
Regards
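A minimal sketch of that save-and-reload approach (assuming tf.keras and a compiled model named modelSimple; the filename is a placeholder):

from tensorflow.keras.models import load_model

modelSimple.save('best_model.h5')          # stores architecture, weights and optimizer state
restored = load_model('best_model.h5')     # later, or in another session
score = restored.evaluate(X_test, Y_test, verbose=0)
print('restored accuracy: %.2f%%' % (score[1] * 100))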
I think you might not be updating the weight matrix after completing the training for certain batch sizes and epochs.
Please include the code as well in order to see the problem

PyTorch: is there a definitive training loop similar to Keras' fit()?

I'm coming over from Keras to PyTorch, and one of the surprising things I've found is that I'm supposed to implement my own training loop.
In Keras, there is a de facto fit() function that: (1) runs gradient descent and (2) collects a history of metrics for loss and accuracy over both the training set and validation set.
In PyTorch, it appears that the programmer needs to implement the training loop. Since I'm new to PyTorch, I don't know if my training loop implementation is correct. I just want to compare apples-to-apples loss and accuracy metrics with what I'm seeing in Keras.
I've already read through:
the official PyTorch 60-minute blitz, where they provide a sample training loop.
official PyTorch example code, where I've found the training loop placed in-line with other code.
the O'Reilly book Programming PyTorch for Deep Learning with its own training loop.
Stanford CS230 sample code.
various blog posts (e.g. here and here).
So I'm wondering: is there a definitive, universal training loop implementation that does the same thing and reports the same numbers as the Keras fit() function?
My points of frustration:
Pulling data out of the dataloader is not consistent between image data and NLP data.
Correctly computing loss and accuracy is not consistent in any sample code I've seen.
Some code examples use Variable, while others do not.
Unnecessarily detailed: moving data to/from the GPU; knowing when to call zero_grad().
For what it's worth, here is my current implementation. Are there any obvious bugs?
import time
import torch

def train(model, optimizer, loss_fn, train_dl, val_dl, epochs=20, device='cuda'):
    '''
    Runs training loop for classification problems. Returns Keras-style
    per-epoch history of loss and accuracy over training and validation data.

    Parameters
    ----------
    model : nn.Module
        Neural network model
    optimizer : torch.optim.Optimizer
        Search space optimizer (e.g. Adam)
    loss_fn :
        Loss function (e.g. nn.CrossEntropyLoss())
    train_dl :
        Iterable dataloader for training data.
    val_dl :
        Iterable dataloader for validation data.
    epochs : int
        Number of epochs to run
    device : string
        Specifies 'cuda' or 'cpu'

    Returns
    -------
    Dictionary
        Similar to Keras' fit(), the output dictionary contains per-epoch
        history of training loss, training accuracy, validation loss, and
        validation accuracy.
    '''
    print('train() called: model=%s, opt=%s(lr=%f), epochs=%d, device=%s\n' % \
          (type(model).__name__, type(optimizer).__name__,
           optimizer.param_groups[0]['lr'], epochs, device))

    history = {}  # Collects per-epoch loss and acc like Keras' fit().
    history['loss'] = []
    history['val_loss'] = []
    history['acc'] = []
    history['val_acc'] = []

    start_time_sec = time.time()

    for epoch in range(epochs):

        # --- TRAIN AND EVALUATE ON TRAINING SET -----------------------------
        model.train()
        train_loss = 0.0
        num_train_correct = 0
        num_train_examples = 0

        for batch in train_dl:
            optimizer.zero_grad()
            x = batch[0].to(device)
            y = batch[1].to(device)
            yhat = model(x)
            loss = loss_fn(yhat, y)
            loss.backward()
            optimizer.step()

            train_loss += loss.data.item() * x.size(0)
            num_train_correct += (torch.max(yhat, 1)[1] == y).sum().item()
            num_train_examples += x.shape[0]

        train_acc = num_train_correct / num_train_examples
        train_loss = train_loss / len(train_dl.dataset)

        # --- EVALUATE ON VALIDATION SET -------------------------------------
        model.eval()
        val_loss = 0.0
        num_val_correct = 0
        num_val_examples = 0

        for batch in val_dl:
            x = batch[0].to(device)
            y = batch[1].to(device)
            yhat = model(x)
            loss = loss_fn(yhat, y)

            val_loss += loss.data.item() * x.size(0)
            num_val_correct += (torch.max(yhat, 1)[1] == y).sum().item()
            num_val_examples += y.shape[0]

        val_acc = num_val_correct / num_val_examples
        val_loss = val_loss / len(val_dl.dataset)

        print('Epoch %3d/%3d, train loss: %5.2f, train acc: %5.2f, val loss: %5.2f, val acc: %5.2f' % \
              (epoch+1, epochs, train_loss, train_acc, val_loss, val_acc))

        history['loss'].append(train_loss)
        history['val_loss'].append(val_loss)
        history['acc'].append(train_acc)
        history['val_acc'].append(val_acc)

    # END OF TRAINING LOOP

    end_time_sec = time.time()
    total_time_sec = end_time_sec - start_time_sec
    time_per_epoch_sec = total_time_sec / epochs
    print()
    print('Time total:     %5.2f sec' % (total_time_sec))
    print('Time per epoch: %5.2f sec' % (time_per_epoch_sec))

    return history
Short answer: there is no equivalent training loop for PyTorch and tf.keras, and there probably never will be one.
First of all, the training loop is syntactic sugar that is supposed to make one's life easier. From my point of view, "making life easier" is a motto of the tf.keras framework, and this is the main reason it has one. A training loop cannot be formalized as a single well-defined practice; it can vary a lot depending on the task/dataset/procedure/metric/you_name_it, and it would require a lot of effort to match all the options of the two frameworks. Furthermore, defining an interface for a training loop in PyTorch might be too restrictive for many actual users of the framework.
Matching the outputs of the networks would require matching the behavior of every operation in the two frameworks, which would be impossible. First of all, the frameworks don't necessarily provide the same sets of operations, and operations can be grouped into higher-level abstractions differently. Also, some common functions like sigmoid or BatchNorm might look well-defined mathematically on paper, but in reality have dozens of implementation-specific details. And when improvements are introduced to an operation, it is up to each community to integrate those updates into the main framework distribution or simply ignore them. Needless to say, the developers of the two frameworks make these decisions independently and likely have different motivations behind them.
To sum it up, matching the high-level details of the two frameworks would require enormous effort and would probably be very disruptive for existing users.
Indeed, the Pytorch Module class (source code) doesn't have a fit() method, so you have to implement your own according to your needs.
However there are some implementations which mimic the Keras training API, such as this one:
https://github.com/ncullen93/torchsample
or a simpler one:
https://github.com/henryre/pytorch-fitmodule
A close equivalent of Keras' model.fit in PyTorch is the PyTorch extension called Torchbearer.
From the MNIST example notebook:
trial = Trial(model, optimizer, loss, metrics=['acc', 'loss'], callbacks=callbacks).to(device)
trial.with_generators(train_generator=traingen, val_generator=valgen, test_generator=testgen)
history = trial.run(epochs=5, verbose=1)
The similarity is there, although the usage requires some reading.
Best of luck!
I wanted to point out that there is now a fit(...) equivalent. PyTorch Lightning is a wrapper around PyTorch that allows for a clean, object-oriented approach to creating ML models in PyTorch. It provides a fit(...) loop through its Trainer class. I would check out their official website for a more detailed answer.
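For context, a rough sketch of that pattern (the class, layer sizes and dataloaders here are illustrative assumptions, not from the original post):

import torch
import torch.nn as nn
import pytorch_lightning as pl

class LitClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 10))
        self.loss_fn = nn.CrossEntropyLoss()

    def forward(self, x):
        return self.net(x.view(x.size(0), -1))

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = self.loss_fn(self(x), y)
        self.log('train_loss', loss)   # Lightning collects logged metrics for you
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# trainer = pl.Trainer(max_epochs=5)
# trainer.fit(LitClassifier(), train_dl, val_dl)   # fit() runs the loop, no manual epochs/batches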

relation between batch_size and running time

Previously I thought that a smaller batch_size would lead to faster training, but in practice with Keras I am getting the opposite result: a bigger batch_size makes training faster.
I am running a sample code, and as I increase the batch_size, training becomes faster, which is the opposite of what I previously believed (that a smaller batch_size results in faster training).
here's the sample code:
# fit model
import time

start = time.time()
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=1000,
                    batch_size=500, verbose=0)
end = time.time()
elapsed = end - start
print(elapsed)
I used 500, 250, 50 and 10 as the batch_size respectively. I expected the lower batch_size to give faster training, but batch_size 500 takes 6.3 sec, 250 takes 6.7 sec, 50 takes 28.0 sec, and 10 takes 140.2 sec!
This makes sense. I do not know what model you are using, but Keras is highly optimized, making use of vectorization for fast matrix operations. So if you split your data of 5,000 samples into batches of size 500, with 1,000 epochs, there are (5000/500) x 1000 = 10,000 iterations through the model.
If you do that with a batch size of 10, there are (5000/10) x 1000 = 500,000 iterations: many more passes through the model, both forward and backward.
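As a quick sanity check of that arithmetic, a tiny sketch (numbers taken from the example above):

samples, epochs = 5000, 1000
for batch_size in (500, 10):
    iterations = (samples // batch_size) * epochs
    print(batch_size, iterations)   # 500 -> 10000 iterations, 10 -> 500000 iterations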
Hope that helps.
On the hardware side, GPUs are very good at parallelising calculations, specifically the matrix operations that happen in forward and backward propagation. The same thing happens on the software side: TensorFlow and other DL libraries optimise matrix operations.
Therefore, a larger batch size lets the GPU and the DL libraries optimise more matrix computation at once, which leads to a faster training time.
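If you want to reproduce the comparison yourself, here is a rough sketch that reuses the timing pattern from the question; note that it keeps training the same model object across runs, so rebuild the model per batch size for a strictly fair comparison:

import time

for bs in (500, 250, 50, 10):
    start = time.time()
    model.fit(trainX, trainy, validation_data=(testX, testy),
              epochs=1000, batch_size=bs, verbose=0)
    print('batch_size=%d: %.1f sec' % (bs, time.time() - start))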

Can I use "model.fit()" in "for" loop to change train data in each iteration

I have a large dataset that doesn't fit in memory, so while training the SSD is being used and epochs take too much time.
I saved my dataset as 9 parts in .npz files. I chose the first part (part 0) as the validation part and did not use it in training.
I use the code below, and the acc & val_acc results were fine, but I feel I am making a big mistake somewhere; I haven't seen any example done like this.
for part in range(1, 9):
    X_Train, Y_Train = loadPart(part)
    history = model.fit(X_Train, Y_Train, batch_size=128, epochs=1, verbose=1)
and I also load part 0 as the test data:
val_loss, val_acc = model.evaluate(X_Test, Y_Test)
I tried checking val_acc after training on each part of the dataset, and I observed that val_acc was increasing.
Could you please let me know whether this usage is valid or not, and why?
EDIT:
I tried fit_generator, but it still uses the disk during training and the ETA was about 2,500 hours (with model.fit on the whole dataset it was about 30 minutes per epoch). I use the code below:
model.fit_generator(generate_batches(), steps_per_epoch=196000, epochs=10)

def generate_batches():
    for part in range(1, 9):
        x, y = loadPart(part)
        yield (x, y)

def loadPart(part):
    data = np.load('C:/FOLDER_PATH/' + str(part) + '.npz')
    return [data['x'], data['y']]
and X data shape is (196000,1536,1)
EDIT 2:
I found an answer on GitHub (https://github.com/keras-team/keras/issues/4446). It says it is OK to call model.fit() multiple times in a for loop, but I'm still not sure what happens behind the scenes. What is the difference between calling model.fit() multiple times and calling it once with the whole dataset?
If your data does not fit in RAM, the Keras documentation suggests the following (https://keras.io/getting-started/faq/#how-can-i-use-keras-with-datasets-that-dont-fit-in-memory):
You can do batch training using model.train_on_batch(x, y) and model.test_on_batch(x, y). See the models documentation.
Alternatively, you can write a generator that yields batches of training data and use the method model.fit_generator(data_generator, steps_per_epoch, epochs).
This means you could try to further split your training data into batches of 128 on your SSD and then do something like:
import glob
import numpy as np

def generate_batches(data_folder):
    while True:
        batches_paths = glob.glob("%s/*.npz" % data_folder)
        for batch_path in batches_paths:
            with np.load(batch_path) as batch:
                x, y = preprocess_batch(batch)
                yield (x, y)

model.fit_generator(generate_batches("/your-data-folder"), steps_per_epoch=10000, epochs=10)
The preprocess_batch function would be responsible for extracting your x and y from each .npz file, and the steps_per_epoch argument of fit_generator should be the rounded-up value of your number of data samples divided by your batch size (a small sketch follows).
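A minimal sketch of those two pieces (preprocess_batch, num_samples and batch_size here are placeholders inferred from the description, not code from the original answer):

import math

def preprocess_batch(batch):
    # Pull the arrays out of the loaded .npz archive; adjust the keys to your files.
    return batch['x'], batch['y']

num_samples = 196000          # total number of training samples (placeholder)
batch_size = 128
steps_per_epoch = math.ceil(num_samples / batch_size)   # rounded up, as described above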
More info:
https://keras.io/models/sequential/#fit_generator
https://www.pyimagesearch.com/2018/12/24/how-to-use-keras-fit-and-fit_generator-a-hands-on-tutorial/
You can also use Dask; it chunks the data into smaller sections by default if you have a dataset that doesn't fit into RAM.
If you are training as described in your question and it is all done in one session, then there is no difference. But if you are training across multiple sessions and continuing from previous training, then you should save your model, either after every epoch (i.e. after training through all 9 sets within that epoch) or, in your case, after every part of the dataset (i.e. after each of the 9 parts), and at the start of every session load the weights using model.load_weights("path to model") before you continue training.
You can save the model after every epoch using model.save("path to directory").
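A small sketch of that multi-session pattern (paths are placeholders; this uses the save_weights/load_weights pair, with model.save being the full-model alternative mentioned above):

# Session 1: after finishing a part (or a full pass over all 9 parts), save the weights.
model.save_weights('checkpoints/after_part_3.h5')

# Session 2: rebuild and compile the same architecture, restore the weights, keep training.
model.load_weights('checkpoints/after_part_3.h5')
history = model.fit(X_Train, Y_Train, batch_size=128, epochs=1, verbose=1)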

For Keras LSTM, what is the difference in passing in lag features vs timesteps of features?

I'm getting acquainted with LSTMs and I need clarity on something. I'm modeling a time series using t-300:t-1 to predict t:t+60. My first approach was to set up an LSTM like this:
# fake dataset to put words into code:
X = [[1,2...299,300],[2,3,...300,301],...]
y = [[301,302...359,360],[302,303...360,361],...]
# LSTM requires (num_samples, timesteps, num_features)
X = X.reshape(X.shape[0],1,X.shape[1])
model = Sequential()
model.add(LSTM(n_neurons[0], batch_input_shape=(n_batch, X.shape[1], X.shape[2]), stateful=True))
model.add(Dense(y.shape[1]))
model.compile(loss='mse', optimizer='adam')
model.fit(X, y, epochs=1, batch_size=1, verbose=1, shuffle=False)
With my real dataset, the results have been suboptimal, and on CPU it was able to train 1 epoch of around 400,000 samples in 20 minutes. The network converged quickly after a single epoch, and for any set of points I fed it, the same results would come out.
My latest change has been to reshape X in the following way:
X = X.reshape(X.shape[0],X.shape[1],1)
Training seems to be going noticeably slower (I have not tried it on the full dataset): it takes about 5 minutes to train over a single epoch of 2,800 samples. I toyed around with a smaller subset of my real data and a smaller number of epochs, and it seems promising; I am no longer getting the same output for different inputs.
Can anyone help me understand what is happening here?
In Keras, the timesteps dimension in (num_samples, timesteps, num_features) determines how many steps backpropagation through time (BPTT) will propagate the error back.
This, in turn, takes more time, hence the slowdown you are observing.
X.reshape(X.shape[0], X.shape[1], 1) is the right thing to do in your case, since what you have is a single feature with 300 timesteps.
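For concreteness, a small sketch of the two layouts discussed above (toy numbers; only the shapes matter):

import numpy as np

X = np.arange(5 * 300).reshape(5, 300)                  # 5 samples, 300 lag values each

X_as_features  = X.reshape(X.shape[0], 1, X.shape[1])   # (5, 1, 300): 1 timestep, 300 features
X_as_timesteps = X.reshape(X.shape[0], X.shape[1], 1)   # (5, 300, 1): 300 timesteps, 1 feature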
