Query about tensorlfow keras learning rate? - python

I'm implementing deep architecture using TensorFlow Keras. At first, I used a loss function without defining the learning rate, for example:
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
I'm wondering as to what the default learning rate is and how TensorFlow Keras sets it. Second, which is preferable: the default-specified learning rate or the Custom (user-specified) learning rate?
Then I switched to a custom learning rate. However, I've observed two different approaches of assigning a value to learning rate. For instance, one is lr and the other is learning_rate.
first way to set learing rate
optimizer = Adam(learning_rate=0.001)
model.compile(loss="categorical_crossentropy", optimizer=optimizer, metrics=["accuracy"])
Second way to set learning rate
optimizer = Adam(lr=0.001)
model.compile(loss="categorical_crossentropy", optimizer=optimizer, metrics=["accuracy"])
What is the difference between learning_rate and lr?

The lr implementation is deprecated but they essentially did the same thing. You can check this here. Credit - #Frightera.

Related

convert tf keras model to scikit MLP NN

I am experimenting with training NLTK classifier model with tensorflow and keras, would anyone know if this could be recreated with sklearn neural work MLP classifier? For what I am using ML for I don't think I need tensorflow but something simplier and easier to install/deploy.
Not a lot of wisdom on machine learning wisdom here, any tips greatly appreciated even describing this deep learning tensorflow keras model greatly appreciated.
So my tf keras model architecture looks like this:
training = []
random.shuffle(training)
training = np.array(training)
# create train and test lists. X - patterns, Y - intents
train_x = list(training[:,0])
train_y = list(training[:,1])
# Create model - 3 layers. First layer 128 neurons, second layer 64 neurons and 3rd output layer contains number of neurons
# equal to number of intents to predict output intent with softmax
model = Sequential()
model.add(Dense(128, input_shape=(len(train_x[0]),), activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(len(train_y[0]), activation='softmax'))
# Compile model. Stochastic gradient descent with Nesterov accelerated gradient gives good results for this model
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
# Fit the model
model.fit(np.array(train_x), np.array(train_y), epochs=200, batch_size=5, verbose=1)
SO the sklearn neural network, am I on track at all with this below? Can someone help me I understand what exactly the tensorflow model architecture is and what cannot be duplicated with sklearn. I sort of understand tensorflow is probabaly much more powerful that sklearn that is something more simple.
#Importing MLPClassifier
from sklearn.neural_network import MLPClassifier
model = MLPClassifier(hidden_layer_sizes=(128,64),activation ='relu',solver='sgd',random_state=1)
Just google converting a keras model to pytorch, there is quite a bit of tutorials out there for that... It doesnt look easy but maybe worth the effort for what ever you need it for...
Going down this road just using sklearn MLP neural network I can get good enough results with sklearn... without the hassle of getting tensorflow installed properly.
Also on a cloud linux instance tensorflow requires a LOT more memory and storage than a FREE account can do on pythonanywhere.com but free account seems just fine with sklearn
When experimenting with sklearn MLP NN for whatever reason better results just leaving architecture as default and playing around with the learning rate.
from sklearn.neural_network import MLPClassifier
model = MLPClassifier(learning_rate_init=0.0001,max_iter=9000,shuffle=True).fit(train_x, train_y)

how to reduce learning rate on train correctly

I am training a neural network and I want to reduce the learning rate while training.
I am currently using ReduceLROnPlateau function provided by keras. But then it reaches the patience factor, it simply stops and don't continue training.
I want to reduce the learning rate and keep the net training.
Here is my code.
optimizer=k.optimizers.Adam(learning_rate=1e-5)
model.compile(loss='categorical_crossentropy',
optimizer=optimizer,
metrics=['acc'])
learningRate=callbacks.callbacks.ReduceLROnPlateau(monitor='val_acc', verbose=1, mode='max',factor=0.2, min_lr=1e-8,patience=7)
model.fit_generator(generator=training_generator,
validation_data=validation_generator,
steps_per_epoch=1000,
epochs=30,
validation_steps=1000,
callbacks=[learningRate]
)
You're using EarlyStopping which is stopping your training.
I want to reduce the learning rate and keep the net training but don't know how to do it.
If you want this then remove EarlyStopping.

What is the difference between tf.train.AdamOptimizer and use adam in keras.compile?

i was building a dense neural network for predicting poker hands. First i had a problem with the reproducibility, but then i discovered my real problem: That i can not reproduce my code is because of the adam-optimizer, because with sgd it worked.
This means
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
did NOT work, whereas
opti = tf.train.AdamOptimizer()
model.compile(loss='sparse_categorical_crossentropy', optimizer=opti, metrics=['accuracy'])
worked with reproducibility.
So my question is now:
Is there any difference using
tf.train.AdamOptimizer
and
model.compile(..., optimizer = 'adam')
because i would like to use the first one because of the reproduce-problem.
They both are the same. However, in the tensorflow.train.AdamOptimizer you can change the learning rate
tf.compat.v1.train.AdamOptimizer(
learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08, use_locking=False,
name='Adam')
which will improve the learning performance and the training would take longer. but in the model.compile(optimizer="adam") it will set the learning rate, beta1, beta2...etc to the default settings

Using Tensorflow 2.0 and eager execution without Keras

So this question might stem from a lack of knowledge about tensorflow. But I am trying to build a multilayer perceptron with tensorflow 2.0, but without Keras.
The reason being that it is a requirement for my machine learning course that we do not use keras. Why you might ask? I am not sure.
I already have implemented our model in tensorflow 2.0 with Keras ease, and now I want to do the exact same thing without keras.
model = Sequential()
model.add(Dense(64, activation='relu', input_dim=784))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(5, activation='softmax'))
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
optimizer=Adam(),
metrics=['accuracy'])
X_train = X[:7000]
y_train = tf.keras.utils.to_categorical(y[:7000], num_classes=5)
X_dev = X[7000:]
y_dev = tf.keras.utils.to_categorical(y[7000:], num_classes=5)
model.fit(X_train, y_train,
epochs=100,
batch_size=128)
score = model.evaluate(X_dev, y_dev, batch_size=128)
print(score)
Here is my problem. Whenever I look up the documentation on Tensorflow 2.0, then even the guides on custom training are using Keras.
As placeholders and sessions are a thing of the past in tensorflow 2.0, as I understand it, then I am a bit unsure of how to structure it.
I can make tensor objects. I have the impression that I need to use eager execution and use gradient tape. But I still am unsure of how to put these things together.
Now my question is. Where should I look to get a better understanding? Which direction has the greatest descent?
Please do tell me if I am doing this stack overflow post wrong. It is my first time here.
As #Daniel Möller stated, there are these tutorials for custom training and custom layers on the official TensorFlow page. As stated on the custom training page:
This tutorial used tf.Variable to build and train a simple linear model.
There is also this blog that creates custom layers and training without Keras API. You can check this code on Google Colab, which uses Cifar-10 with custom layers and training in the same manner.

How does exactly fitting with SGD in Keas works?

I'm a newbie in a Neural Networks. I'm doing my university NN project in Keras. I assembled and trained the one-layer sequential model using SGD optimizer:
[...]
nn_model = Sequential()
nn_model.add(Dense(32, input_dim=X_train.shape[1], activation='tanh'))
nn_model.add(Dense(1, activation='tanh'))
sgd = keras.optimizers.SGD(lr=0.001, momentum=0.25)
nn_model.compile(loss='mean_squared_error', optimizer=sgd, metrics=['accuracy'])
history = nn_model.fit(X_train, Y_train, epochs=2000, verbose=2, validation_data=(X_test, Y_test))
[...]
I've tried various learning rate, momentum, neurons and get satisfying accuracy and error results. But, I have to know how keras works. So could you please explain me how exactly fitting in keras works, because I can't find it in Keras documentation?
How does Keras update weights? Does it use a backpropagation algorithm? (I'm 95% sure.)
How SGD algorithm is implemented in Keras? Is it similar to Wikipedia explanation?
How Keras exactly calculate a gradient?
Thank you kindly for any information.
Let's try to break it down and I'll cover only Keras specific bits:
How does Keras update weights? Using an Optimiser which is a base class for different optimisers. Each optimiser calculates the new weights, under a function get_updates which returns a list of functions when run applies the updates.
Back-propagation? Yes, but Keras doesn't implement it directly, it leaves it for the backend tensor libraries to perform automatic differentiation. For example K.gradients calls tf.gradients in the Tensorflow backend.
SGD algorithm? It is implemented as expected on Wikipedia in the SGD class with the basic extensions such as momentum. You can follow the code easily and how it calculates the updates.
How gradient is calculated? Using back-propagation.

Categories