I'm a newbie in a Neural Networks. I'm doing my university NN project in Keras. I assembled and trained the one-layer sequential model using SGD optimizer:
[...]
nn_model = Sequential()
nn_model.add(Dense(32, input_dim=X_train.shape[1], activation='tanh'))
nn_model.add(Dense(1, activation='tanh'))
sgd = keras.optimizers.SGD(lr=0.001, momentum=0.25)
nn_model.compile(loss='mean_squared_error', optimizer=sgd, metrics=['accuracy'])
history = nn_model.fit(X_train, Y_train, epochs=2000, verbose=2, validation_data=(X_test, Y_test))
[...]
I've tried various learning rate, momentum, neurons and get satisfying accuracy and error results. But, I have to know how keras works. So could you please explain me how exactly fitting in keras works, because I can't find it in Keras documentation?
How does Keras update weights? Does it use a backpropagation algorithm? (I'm 95% sure.)
How SGD algorithm is implemented in Keras? Is it similar to Wikipedia explanation?
How Keras exactly calculate a gradient?
Thank you kindly for any information.
Let's try to break it down and I'll cover only Keras specific bits:
How does Keras update weights? Using an Optimiser which is a base class for different optimisers. Each optimiser calculates the new weights, under a function get_updates which returns a list of functions when run applies the updates.
Back-propagation? Yes, but Keras doesn't implement it directly, it leaves it for the backend tensor libraries to perform automatic differentiation. For example K.gradients calls tf.gradients in the Tensorflow backend.
SGD algorithm? It is implemented as expected on Wikipedia in the SGD class with the basic extensions such as momentum. You can follow the code easily and how it calculates the updates.
How gradient is calculated? Using back-propagation.
Related
I am experimenting with training NLTK classifier model with tensorflow and keras, would anyone know if this could be recreated with sklearn neural work MLP classifier? For what I am using ML for I don't think I need tensorflow but something simplier and easier to install/deploy.
Not a lot of wisdom on machine learning wisdom here, any tips greatly appreciated even describing this deep learning tensorflow keras model greatly appreciated.
So my tf keras model architecture looks like this:
training = []
random.shuffle(training)
training = np.array(training)
# create train and test lists. X - patterns, Y - intents
train_x = list(training[:,0])
train_y = list(training[:,1])
# Create model - 3 layers. First layer 128 neurons, second layer 64 neurons and 3rd output layer contains number of neurons
# equal to number of intents to predict output intent with softmax
model = Sequential()
model.add(Dense(128, input_shape=(len(train_x[0]),), activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(len(train_y[0]), activation='softmax'))
# Compile model. Stochastic gradient descent with Nesterov accelerated gradient gives good results for this model
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
# Fit the model
model.fit(np.array(train_x), np.array(train_y), epochs=200, batch_size=5, verbose=1)
SO the sklearn neural network, am I on track at all with this below? Can someone help me I understand what exactly the tensorflow model architecture is and what cannot be duplicated with sklearn. I sort of understand tensorflow is probabaly much more powerful that sklearn that is something more simple.
#Importing MLPClassifier
from sklearn.neural_network import MLPClassifier
model = MLPClassifier(hidden_layer_sizes=(128,64),activation ='relu',solver='sgd',random_state=1)
Just google converting a keras model to pytorch, there is quite a bit of tutorials out there for that... It doesnt look easy but maybe worth the effort for what ever you need it for...
Going down this road just using sklearn MLP neural network I can get good enough results with sklearn... without the hassle of getting tensorflow installed properly.
Also on a cloud linux instance tensorflow requires a LOT more memory and storage than a FREE account can do on pythonanywhere.com but free account seems just fine with sklearn
When experimenting with sklearn MLP NN for whatever reason better results just leaving architecture as default and playing around with the learning rate.
from sklearn.neural_network import MLPClassifier
model = MLPClassifier(learning_rate_init=0.0001,max_iter=9000,shuffle=True).fit(train_x, train_y)
I would like to monitor accuracy for my tensorflow model, however, when compiling my model using metrics=['accuracy'] or metrics=[tf.keras.metrics.Accuracy()] and then train my model the following Warning pops up.
WARNING:tensorflow: Early stopping conditioned on metric accuracy which is not available. Available metrics are: loss, val_loss
model.compile(optimizer='adam', loss='mean_squared_error', metrics=["tried both options i mentioned"])
callbacks = [EarlyStopping(monitor='accuracy', patience=1000)]
model.fit(x_train, y_train, epochs=5000, batch_size=100, validation_split=0.2, callbacks=callbacks)
Based on the link here:
Accuracy is one metric for evaluating classification models. Informally, accuracy is the fraction of predictions our model got right. Formally, accuracy has the following definition:
So, for other problems like regression you should use other metrics rather than accuracy, like metrics=[tf.keras.metrics.MeanSquaredError()])
In addition to Kaveh's answer, there are other metrics for regression problems. One that I think is quite useful is R2 squared (https://en.wikipedia.org/wiki/Coefficient_of_determination) and it isn't included in Keras.
Tensorflow addons library (https://www.tensorflow.org/addons) implements it and can be used in a ANN with the following code:
import tensorflow_addons as tfa
model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.01),
loss="mean_squared_error",
metrics=tfa.metrics.RSquare(y_shape=(1,)))
Using the keras simpleRNN layer, I am hitting this wall. I have two other models, one only with fully connected Dense layers, and one using LSTM which work as expected, so I don't think it's the data processing that is the issue.
For context, I am using the tf.keras reuters dataset, which comes tokenized, and the output data consists of 46 possible tags which I have categorized.
What the data looks like and how it's processed
Below is the model code.
modelRNN = Sequential()
modelRNN.add(Embedding(input_dim=maxFeatures, output_dim=256,input_shape=(maxWords,)))
modelRNN.add(SimpleRNN(1024))
#modelRNN.add(Activation("sigmoid"))
modelRNN.add(Dropout(0.8))
modelRNN.add(Dense(128))
modelRNN.add(Dense(46, activation="softmax"))
modelRNN.compile(
optimizer='adam',
loss='categorical_crossentropy',
metrics=['accuracy'],
)
And I am fitting using the following parameters
historyRNN = modelRNN.fit(x_train, y_train,
epochs=100,
batch_size=512,
shuffle=True,
validation_data = (x_test,y_test)
)
Fitting this model, consistently has a val_accuracy of 0,3762, and a val_loss of ~3,4. This "ceiling" can be clearly seen in the graph:
Things I've tried: changing super parameters, changing the input data shape, trying different optimizers.
Any tip is appreciated, thank you. And thank you to the people that helped edit my posts to be more understandable :)
The graphs for the other two models, working on the same data:
Dense layers only
LSTM
So this question might stem from a lack of knowledge about tensorflow. But I am trying to build a multilayer perceptron with tensorflow 2.0, but without Keras.
The reason being that it is a requirement for my machine learning course that we do not use keras. Why you might ask? I am not sure.
I already have implemented our model in tensorflow 2.0 with Keras ease, and now I want to do the exact same thing without keras.
model = Sequential()
model.add(Dense(64, activation='relu', input_dim=784))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(5, activation='softmax'))
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
optimizer=Adam(),
metrics=['accuracy'])
X_train = X[:7000]
y_train = tf.keras.utils.to_categorical(y[:7000], num_classes=5)
X_dev = X[7000:]
y_dev = tf.keras.utils.to_categorical(y[7000:], num_classes=5)
model.fit(X_train, y_train,
epochs=100,
batch_size=128)
score = model.evaluate(X_dev, y_dev, batch_size=128)
print(score)
Here is my problem. Whenever I look up the documentation on Tensorflow 2.0, then even the guides on custom training are using Keras.
As placeholders and sessions are a thing of the past in tensorflow 2.0, as I understand it, then I am a bit unsure of how to structure it.
I can make tensor objects. I have the impression that I need to use eager execution and use gradient tape. But I still am unsure of how to put these things together.
Now my question is. Where should I look to get a better understanding? Which direction has the greatest descent?
Please do tell me if I am doing this stack overflow post wrong. It is my first time here.
As #Daniel Möller stated, there are these tutorials for custom training and custom layers on the official TensorFlow page. As stated on the custom training page:
This tutorial used tf.Variable to build and train a simple linear model.
There is also this blog that creates custom layers and training without Keras API. You can check this code on Google Colab, which uses Cifar-10 with custom layers and training in the same manner.
I've implemented a basic MLP in Keras with tensorflow and I'm trying to solve a binary classification problem. For binary classification, it seems that sigmoid is the recommended activation function and I'm not quite understanding why, and how Keras deals with this.
I understand the sigmoid function will produce values in a range between 0 and 1. My understanding is that for classification problems using sigmoid, there will be a certain threshold used to determine the class of an input (typically 0.5). In Keras, I'm not seeing any way to specify this threshold, so I assume it's done implicitly in the back-end? If this is the case, how is Keras distinguishing between the use of sigmoid in a binary classification problem, or a regression problem? With binary classification, we want a binary value, but with regression a nominal value is needed. All I can see that could be indicating this is the loss function. Is that informing Keras on how to handle the data?
Additionally, assuming Keras is implicitly applying a threshold, why does it output nominal values when I use my model to predict on new data?
For example:
y_pred = model.predict(x_test)
print(y_pred)
gives:
[7.4706882e-02] [8.3481872e-01] [2.9314638e-04] [5.2297767e-03]
[2.1608515e-01] ... [4.4894204e-03] [5.1120580e-05] [7.0263929e-04]
I can apply a threshold myself when predicting to get a binary output, however surely Keras must be doing that anyway in order to correctly classify? Perhaps Keras is applying a threshold when training the model, but when I use it to predict new values, the threshold isn't used as the loss function isn't used in predicting? Or is not applying a threshold at all, and the nominal values outputted happen to be working well with my model? I've checked this is happening on the Keras example for binary classification, so I don't think I've made any errors with my code, especially as it's predicting accurately.
If anyone could explain how this is working, I would greatly appreciate it.
Here's my model as a point of reference:
model = Sequential()
model.add(Dense(124, activation='relu', input_shape = (2,)))
model.add(Dropout(0.5))
model.add(Dense(124, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(1, activation='sigmoid'))
model.summary()
model.compile(loss='binary_crossentropy',
optimizer=SGD(lr = 0.1, momentum = 0.003),
metrics=['acc'])
history = model.fit(x_train, y_train,
batch_size=batch_size,
epochs=epochs,
verbose=1,
validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
The output of a binary classification is the probability of a sample belonging to a class.
how is Keras distinguishing between the use of sigmoid in a binary classification problem, or a regression problem?
It does not need to. It uses the loss function to calculate the loss, then the derivatives and update the weights.
In other words:
During training the framework minimizes the loss. The user must specify the loss function (provided by the framework) or supply their own. The network only cares about the scalar value this function outputs and its 2 arguments are predicted y^ and actual y.
Each activation function implements the forward propagation and back-propagation functions. The framework is only interested in these 2 functions. It does not care what the function does exactly, as long as it is differentiable for gradient descent to work.
You can assign the threshold explicitly in compile() by using
tf.keras.metrics.BinaryAccuracy(
name="binary_accuracy", dtype=None, threshold=0.5
)
like following:
model.compile(optimizer='sgd',
loss='mse',
metrics=[tf.keras.metrics.BinaryAccuracy()])