Speeding up Tensorflow 2.0 Gradient Tape - python

I have been following the TF 2.0 tutorial for convolutional VAEs, located here.
Since it is eager, the gradients are computed by hand and then applied manually, using tf.GradientTape().
for epoch in range(epochs):
    for x in x_train:
        with tf.GradientTape() as tape:
            loss = compute_loss(model, x)
        apply_gradients(tape.gradient(loss, model.trainable_variables))
The problem with that code is that it is pretty slow, taking around 40-50 seconds per epoch.
If I increase the batch size by a lot (to around 2048), then it ends up taking around 8 seconds per epoch, but the model's performance decreases by quite a lot.
On the other hand, if I use a more traditional model (i.e., one built on the lazy, graph-based approach instead of eager execution), such as the one here, then it takes 8 seconds per epoch even with a small batch size.
model.add_loss(lazy_graph_loss)
model.fit(x_train, epochs=epochs)
Based on this information, my guess would be that the problem with the TF2.0 code is the manual computation of losses and gradients.
Is there any way to speed up the TF2.0 code so that it comes closer to the normal code?

I found the solution: TensorFlow 2.0 introduces the concept of functions, which translate eager code into graph code.
The usage is pretty straightforward: the only change needed is that all relevant functions (such as compute_loss and apply_gradients) are decorated with @tf.function.
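As a rough sketch (assuming model, optimizer, and compute_loss exist as in the tutorial), the decorated training step might look like this:

import tensorflow as tf

@tf.function  # traces the eager Python code into a graph on the first call
def train_step(model, x, optimizer):
    with tf.GradientTape() as tape:
        loss = compute_loss(model, x)  # compute_loss as defined in the tutorial
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss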

The main reason your code takes so long is that a plain Python for loop containing no tensors causes the graph construction itself to take a tremendous amount of time. Intuitively you might expect the same training graph to be reused at every iteration, but the graph that actually gets constructed is a chain-like structure in which each node is a copy of the training subgraph, and the total number of nodes in the chain equals the number of loop iterations. In a nutshell, TensorFlow unrolls the iterations and builds one long graph, which introduces a lot of redundancy in both space and time for the graph alone. It is bad enough that simply adding two tensors repeatedly in a plain Python loop, say a billion times, takes almost half an hour.
To get around this problem, specifically in your case, we can use the .repeat transformation from the tf.data.Dataset API. Instead of writing

for i in range(epochs):
    ...

we can write

for x in x_train.repeat(epochs):
    # train here
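Putting the two ideas together, a minimal sketch might look like the following; it assumes x_train is a NumPy array, reuses the @tf.function-decorated train_step from above, and picks a batch size of 32 arbitrarily:

import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices(x_train).batch(32)

# Iterating over the repeated dataset keeps a single traced graph
# instead of rebuilding it for every Python-level epoch.
for x in dataset.repeat(epochs):
    train_step(model, x, optimizer)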

Related

Retrain your CNN model successively with two different datasets

I implemented a CNN with 3 convolutional layers, each followed by max-pooling and dropout.
I noticed that when I trained the model the first time it gave me 88% test accuracy, but after retraining it a second time in succession, on the same training dataset, it gave me 92% test accuracy.
I cannot explain this behavior. Is it possible that the model overfits during the second training run?
Thank you in advance for any help!
This is quite possible if you have not set a random seed: set.seed() in the R language, or tf.random.set_seed(any_number) in Python.
Well, I am no expert when it comes to machine learning, but I do know the math behind it. When you train a neural network you are essentially finding a local minimum of the loss function, which means the end result depends heavily on the initial guess for all of the internal variables.
Usually the variables are randomly initialized as an initial estimate, so you can reach quite different results from running the training process multiple times.
That being said, when I studied the subject I was told that you usually reach similar results regardless of the initial guess of the parameters. However, it is hard to say whether 0.88 and 0.92 would be considered similar or not.
Hope this gives a plausible answer to your question.
As mentioned in another answer, you could remove the randomization, both in the parameter initialization and in the shuffling of the data used for each training epoch, by introducing a seed. This ensures that when you run it twice, everything gets "randomized" in exactly the same order. In TensorFlow this is done with, for example, tf.random.set_seed(1); the number 1 can be changed to any other number to get a new seed.
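A minimal sketch of pinning down the randomness (assuming TensorFlow, NumPy, and Python's random module are the only sources of randomness in the pipeline):

import random
import numpy as np
import tensorflow as tf

SEED = 1  # any fixed integer; change it to get a different but repeatable run

random.seed(SEED)         # Python's built-in RNG
np.random.seed(SEED)      # NumPy-based preprocessing / shuffling
tf.random.set_seed(SEED)  # TensorFlow weight initialization and ops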

Estimator.train() and .predict() are too slow for small data sets

I'm trying to implement a DQN that makes many calls to Estimator.train() followed by Estimator.predict() on the same model, with a small number of examples each time. Each call takes at least a few hundred milliseconds to over a second, independent of the number of examples for small counts like 1-20.
I think these delays are caused by rebuilding the graph and saving checkpoints on each call. Is there a way to keep the same graph and parameters in memory for fast train/predict iterations, or otherwise speed it up?
Convert to a tf.keras.Model instead of an Estimator, and use tf.keras.Model.fit() instead of Estimator.train(). fit() doesn't have the fixed delay that train() does. The Keras predict() doesn't either.
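As a hedged sketch of that swap for a small Q-network (the layer sizes and the dummy data below are illustrative assumptions, not taken from the original post):

import numpy as np
import tensorflow as tf

# Illustrative Q-network; input/output sizes are assumptions.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(2),  # one output per action
])
model.compile(optimizer="adam", loss="mse")

# The compiled model (graph + weights) stays in memory between calls,
# so repeated small fit()/predict() calls skip the Estimator's per-call
# graph rebuild and checkpointing.
states = np.random.rand(20, 4).astype("float32")
targets = np.random.rand(20, 2).astype("float32")

model.fit(states, targets, epochs=1, verbose=0)
q_values = model.predict(states, verbose=0)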

Python Sklearn Neural Network Classifier Iteration and Loss

I was trying to implement a paper I read. Basically, it uses 3 neural network classifiers with different parameters on the same loan-default data, with 9 different training-to-testing ratios.
To find the best parameters, we use the following criterion: when (1) max_iteration = 25000 and (2) the loss value is less than 0.008, we measure the accuracy and pick the best.
However, when I try to do this with Python's sklearn.neural_network.MLPClassifier, I run into a problem: as the training-to-testing ratio increases, the number of iterations the program runs drops dramatically, while the loss value increases.
Classifier performance table (image not reproduced here).
This is clearly not what I want; the iteration count should keep rising to 25000 before stopping.
This is how I defined the classifiers:
from sklearn.neural_network import MLPClassifier

clf1 = MLPClassifier(activation='relu', solver='sgd', early_stopping=False, alpha=1e-5,
                     max_iter=25000, hidden_layer_sizes=(18,), momentum=0.7,
                     learning_rate_init=0.0081, tol=0, random_state=3)
clf2 = MLPClassifier(activation='relu', solver='sgd', early_stopping=False, alpha=1e-5,
                     max_iter=25000, hidden_layer_sizes=(23,), momentum=0.69,
                     learning_rate_init=0.0095, tol=0, random_state=3)
clf3 = MLPClassifier(activation='relu', solver='sgd', early_stopping=False, alpha=1e-5,
                     max_iter=25000, hidden_layer_sizes=(27,), momentum=0.79,
                     learning_rate_init=0.0075, tol=0, random_state=3)
As you can see, I already set tol=0, so that the solver keeps iterating as long as the loss decreases at all. I have also tried other values, but the iteration count is still far smaller than I expected.
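As a side note, one way to see how many iterations actually ran and how the loss evolved is to inspect the fitted attributes; X_train and y_train here are placeholders for whatever training split is being used:

clf1.fit(X_train, y_train)

print("iterations run:", clf1.n_iter_)             # how many epochs SGD actually did
print("final loss:", clf1.loss_)                   # last training-loss value
print("loss curve length:", len(clf1.loss_curve_))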
Hope someone can help me, thanks!

Training changing input size RNN on Tensorflow

I want to train an RNN on sentences X of varying input size, without padding. The logic I use is: with global variables, for every step I take one example, write the forward propagation (i.e., build the graph), run the optimizer, and then repeat with another example. The program is extremely slow compared to a NumPy implementation of the same thing, in which I implemented forward and backward propagation myself using the same logic. The NumPy implementation takes a few seconds, while TensorFlow is extremely slow. Would running the same thing on a GPU help, or am I making a logical mistake?
As a general guideline, a GPU boosts performance only if your code is computation-intensive and involves little data transfer. In other words, if you train your model one instance at a time (or with small batch sizes), the overhead of data transfer to and from the GPU can even make your code run slower. But if you feed in a good chunk of samples at once, the GPU will definitely speed up your code.
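A rough illustration of that overhead effect (the shapes and the matrix-multiply workload are made up purely for demonstration, not taken from the question):

import time
import numpy as np
import tensorflow as tf

data = np.random.rand(1024, 256).astype("float32")
weights = tf.random.normal((256, 256))

# One transfer and one large kernel launch: GPU-friendly.
start = time.time()
_ = tf.matmul(tf.constant(data), weights)
print("one batch:", time.time() - start)

# 1024 tiny transfers and kernel launches: overhead dominates.
start = time.time()
for row in data:
    _ = tf.matmul(tf.constant(row[None, :]), weights)
print("per sample:", time.time() - start)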

neural network: cost is erratic

I implemented a simple neural network in Python for (single-class) image classification. The layers are simple: (image_matrix, 5, 1), using ReLU and sigmoid for the hidden layers.
I am iterating 5000 times. At first it looks like the cost goes down gradually in a sensible way.
However, no matter how many training examples I use, or what my learning_rate is, the cost starts behaving erratically after around 3000 iterations every time...
(cost-vs-iteration plot omitted)
Can someone help me understand what's going on?
Thanks
When training models, you should remember that the cost function has multiple local minima. Your graph suggests that the cost is bouncing around a local minimum while searching for the global minimum, which is the goal when looking for the best-performing model.
1st - check accuracy, F1-score, or loss per iteration/epoch to see whether the performance is actually improving.
2nd - do cross-validation and check the same metrics on the validation set.
3rd - implement an early-stopping function that checks whether your model is still improving or not (see the sketch after this list).
*note: find the best alpha (learning rate) to help the optimization settle closer to the global minimum.
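A minimal sketch of such an early-stopping check inside a hand-written training loop (init_params, train_step, compute_cost, and the patience value are illustrative assumptions, not from the original code):

# Hypothetical hand-written loop with early stopping on the training cost.
params = init_params()            # assumed: random parameter initialization
best_cost = float("inf")
patience, bad_iters = 200, 0      # stop after 200 iterations without improvement

for i in range(5000):
    params = train_step(params)   # assumed: one gradient-descent update
    cost = compute_cost(params)   # assumed: evaluates the current cost

    if cost < best_cost:
        best_cost, best_params = cost, params
        bad_iters = 0
    else:
        bad_iters += 1
        if bad_iters >= patience:
            print(f"stopping early at iteration {i}, best cost {best_cost:.4f}")
            break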
