convert tf keras model to scikit MLP NN - python

I am experimenting with training NLTK classifier model with tensorflow and keras, would anyone know if this could be recreated with sklearn neural work MLP classifier? For what I am using ML for I don't think I need tensorflow but something simplier and easier to install/deploy.
Not a lot of wisdom on machine learning wisdom here, any tips greatly appreciated even describing this deep learning tensorflow keras model greatly appreciated.
So my tf keras model architecture looks like this:
training = []
random.shuffle(training)
training = np.array(training)
# create train and test lists. X - patterns, Y - intents
train_x = list(training[:,0])
train_y = list(training[:,1])
# Create model - 3 layers. First layer 128 neurons, second layer 64 neurons and 3rd output layer contains number of neurons
# equal to number of intents to predict output intent with softmax
model = Sequential()
model.add(Dense(128, input_shape=(len(train_x[0]),), activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(len(train_y[0]), activation='softmax'))
# Compile model. Stochastic gradient descent with Nesterov accelerated gradient gives good results for this model
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
# Fit the model
model.fit(np.array(train_x), np.array(train_y), epochs=200, batch_size=5, verbose=1)
SO the sklearn neural network, am I on track at all with this below? Can someone help me I understand what exactly the tensorflow model architecture is and what cannot be duplicated with sklearn. I sort of understand tensorflow is probabaly much more powerful that sklearn that is something more simple.
#Importing MLPClassifier
from sklearn.neural_network import MLPClassifier
model = MLPClassifier(hidden_layer_sizes=(128,64),activation ='relu',solver='sgd',random_state=1)

Just google converting a keras model to pytorch, there is quite a bit of tutorials out there for that... It doesnt look easy but maybe worth the effort for what ever you need it for...
Going down this road just using sklearn MLP neural network I can get good enough results with sklearn... without the hassle of getting tensorflow installed properly.
Also on a cloud linux instance tensorflow requires a LOT more memory and storage than a FREE account can do on pythonanywhere.com but free account seems just fine with sklearn
When experimenting with sklearn MLP NN for whatever reason better results just leaving architecture as default and playing around with the learning rate.
from sklearn.neural_network import MLPClassifier
model = MLPClassifier(learning_rate_init=0.0001,max_iter=9000,shuffle=True).fit(train_x, train_y)

Related

Keras classification model with pure numpy classification layer

I have a multiclass(108 classes) classification model which I want to apply transfer learning to the classification layer. I want to deploy this model in a low computing resource device (Raspberry Pi) and I thought to implement the classification layer in pure numpy instead of using Keras or TF.
Below is my original model.
from tensorflow.keras.models import Sequential, Model, LSTM, Embedding
model = Sequential()
model.add(Embedding(108, 50, input_length=10))
model.add((LSTM(32, return_sequences=False)))
model.add(Dense(108, activation="softmax"))
model.compile(loss="categorical_crossentropy", optimizer=Adam(lr=0.001), metrics=['accuracy'])
model.summary()
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=3)
history = model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.5, callbacks=[es]).history
I split this model into two parts, encoder and decoder as follows. decoder is the classification layer which I want to convert into NumPy model and then do the on-device transfer learning later.
encoder = Sequential([
Embedding(108, 50, input_length=10),
GRU(32, return_sequences=False)
])
decoder = Sequential([
Dense(108, activation="softmax")
])
model = Model(inputs=encoder.input, outputs=decoder(encoder.output))
model.compile(loss="categorical_crossentropy", optimizer=Adam(lr=0.001), metrics=['accuracy'])
model.summary()
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=3)
history = model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.5, callbacks=[es]).history
I have a few questions related to this approach.
Only way i know to train this model is, first to train the encoder and decoder. then
train NumPy classification layer using trained encoder outputs.
Is there any way i can train the NumPy model at the same time when i train the encoder (without using the above Keras decoder part and Model)? I can't use Model as I can't use Keras or TF in raspberry Pi during the transfer learning.
If there is no any way to train encoder and Numpy model at the same time,
How to use learned decoder weights as the starting weights of the Numpy Model instead of starting from random weights?
What is the most efficient code (or way) to implement the Numpy classification layer (decoder)? It requires a highly efficient model as i do the transfer learning on Raspberry Pi for incoming streaming data.
Once i trained the model for reasonable data, i plan to convert the encoder into TFLite and do the inference
Highly appreciate any help or guidance to achieve this as I'm new to NumPy-based NN implementations.
Thanks in advance

Neural Network optimization for image classification in keras/tensorflow

I am writing a program for clasifying images into two categories: "Wires" and "non-Wires". I have hand-labeled around 5000 microscope images, examples:
non-wire
wire
The neural network I am using is adapted from "Deep Learning with Python", chapter about convolutional networks (I don't think convolutional networks are neccesary here because there are no obvious hierarchies; Dense networks should be more suitable):
model = models.Sequential()
model.add(layers.Dense(32, activation='relu',input_shape=(200,200,3)))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Dense(32, activation='relu'))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Flatten())
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dense(2, activation='softmax'))
However, test accuracy after 10 epochs of training does not go over 92% when playing around with the paramters of the network. Training images contain about 1/3 wires, 2/3 non-wires. My question: Do you see any obvious mistakes in this neural network design that inhibits accuracy, or do you think I am limited by the image quality? I have about 4000 train and 1000 test images.
You might get some improvement by trying to handle the class imbalance using a weights dictionary. If the label of non wire is 0 and the label for wire is 1 then the weight dictionary would be
weight_dict= { 0:.5, 1:1}
in model.fit set
class_weight=weight_dict .
Without seeing the results of training (training loss and validation loss) can't tell what else to do. If you are over fitting try adding some dropout layers. Also recommend you try using an adjustable learning using the keras callback ReduceLROnPlateau, and early stopping using the keras callback EarlyStopping. Documentation is here. Set each callback to monitor validation loss. My suggested code is shown below:
reduce_lr=tf.keras.callbacks.ReduceLROnPlateau(
monitor="val_loss",factor=0.5, patience=2, verbose=1)
e_stop=tf.keras.callbacks.EarlyStopping( monitor="val_loss", patience=5,
verbose=0, restore_best_weights=True)
callbacks=[reduce_lr, e_stop]
In model.fit include
callbacks=callbacks
If you want to give a convolutional network a try I recommend transfer learning using the Mobilenetmodel. Documentation for that is here.. My recommend code for that is below:
base_model=tf.keras.applications.mobilenet.MobileNet( include_top=False,
input_shape=(200,200,3) pooling='max', weights='imagenet',dropout=.4)
x=base_model.output
x=keras.layers.BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001 )(x)
x = Dense(1024, activation='relu')(x)
x=Dropout(rate=.3, seed=123)(x)
output=Dense(2, activation='softmax')(x)
model=Model(inputs=base_model.input, outputs=output)
model.compile(Adamax(lr=.001),loss='categorical_crossentropy',metrics=
['accuracy'] )
In model.fit include the callbacks as shown above.

Using Tensorflow 2.0 and eager execution without Keras

So this question might stem from a lack of knowledge about tensorflow. But I am trying to build a multilayer perceptron with tensorflow 2.0, but without Keras.
The reason being that it is a requirement for my machine learning course that we do not use keras. Why you might ask? I am not sure.
I already have implemented our model in tensorflow 2.0 with Keras ease, and now I want to do the exact same thing without keras.
model = Sequential()
model.add(Dense(64, activation='relu', input_dim=784))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(5, activation='softmax'))
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
optimizer=Adam(),
metrics=['accuracy'])
X_train = X[:7000]
y_train = tf.keras.utils.to_categorical(y[:7000], num_classes=5)
X_dev = X[7000:]
y_dev = tf.keras.utils.to_categorical(y[7000:], num_classes=5)
model.fit(X_train, y_train,
epochs=100,
batch_size=128)
score = model.evaluate(X_dev, y_dev, batch_size=128)
print(score)
Here is my problem. Whenever I look up the documentation on Tensorflow 2.0, then even the guides on custom training are using Keras.
As placeholders and sessions are a thing of the past in tensorflow 2.0, as I understand it, then I am a bit unsure of how to structure it.
I can make tensor objects. I have the impression that I need to use eager execution and use gradient tape. But I still am unsure of how to put these things together.
Now my question is. Where should I look to get a better understanding? Which direction has the greatest descent?
Please do tell me if I am doing this stack overflow post wrong. It is my first time here.
As #Daniel Möller stated, there are these tutorials for custom training and custom layers on the official TensorFlow page. As stated on the custom training page:
This tutorial used tf.Variable to build and train a simple linear model.
There is also this blog that creates custom layers and training without Keras API. You can check this code on Google Colab, which uses Cifar-10 with custom layers and training in the same manner.

How does exactly fitting with SGD in Keas works?

I'm a newbie in a Neural Networks. I'm doing my university NN project in Keras. I assembled and trained the one-layer sequential model using SGD optimizer:
[...]
nn_model = Sequential()
nn_model.add(Dense(32, input_dim=X_train.shape[1], activation='tanh'))
nn_model.add(Dense(1, activation='tanh'))
sgd = keras.optimizers.SGD(lr=0.001, momentum=0.25)
nn_model.compile(loss='mean_squared_error', optimizer=sgd, metrics=['accuracy'])
history = nn_model.fit(X_train, Y_train, epochs=2000, verbose=2, validation_data=(X_test, Y_test))
[...]
I've tried various learning rate, momentum, neurons and get satisfying accuracy and error results. But, I have to know how keras works. So could you please explain me how exactly fitting in keras works, because I can't find it in Keras documentation?
How does Keras update weights? Does it use a backpropagation algorithm? (I'm 95% sure.)
How SGD algorithm is implemented in Keras? Is it similar to Wikipedia explanation?
How Keras exactly calculate a gradient?
Thank you kindly for any information.
Let's try to break it down and I'll cover only Keras specific bits:
How does Keras update weights? Using an Optimiser which is a base class for different optimisers. Each optimiser calculates the new weights, under a function get_updates which returns a list of functions when run applies the updates.
Back-propagation? Yes, but Keras doesn't implement it directly, it leaves it for the backend tensor libraries to perform automatic differentiation. For example K.gradients calls tf.gradients in the Tensorflow backend.
SGD algorithm? It is implemented as expected on Wikipedia in the SGD class with the basic extensions such as momentum. You can follow the code easily and how it calculates the updates.
How gradient is calculated? Using back-propagation.

Learning Keras model by using Distributed Tensorflow

I have two GPU installed on two different machines. I want to build a cluster that allows me to learn a Keras model by using the two GPUs together.
Keras blog shows two slices of code in Distributed training section and link official Tensorflow documentation.
My problem is that I don't know how to learn my model and put into practice what is reported in Tensorflow documentation.
For example, what should I do if I want to execute the following code on a cluster of multiple GPU?
# For a single-input model with 2 classes (binary classification):
model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop',
loss='binary_crossentropy',
metrics=['accuracy'])
# Generate dummy data
import numpy as np
data = np.random.random((1000, 100))
labels = np.random.randint(2, size=(1000, 1))
# Train the model, iterating on the data in batches of 32 samples
model.fit(data, labels, epochs=10, batch_size=32)
In the first and second part of the blog he explains how to use keras models with tensorflow.
Also I found this example of keras with distributed training.
And here is another with horovod.

Categories