How do I debug a Keras model? - python

I am going through a tutorial on handwritten text recognition, and to do handwritten digit recognition the author has constructed a Keras model as follows:
# Creating the CNN model
import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPool2D, Dropout, Flatten, Dense

input_shape = (28, 28, 1)
number_of_classes = 10

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(number_of_classes, activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])
model.summary()

history = model.fit(X_train, y_train, epochs=5, shuffle=True,
                    batch_size=200, validation_data=(X_test, y_test))
model.save('digit_classifier2.h5')
Source (here)
I am very confused about how the author chose these layers. I know how Conv2D works (by applying filters to an image), and I know what an activation function is. In short, I have a rough understanding of what each term means.
What I am finding difficult is knowing what is happening at each step of this code.
For example, let's take this Python code:
values_List = [11, 34, 43]
for index, num in enumerate(values_List):
    print(index, num)
I know that line 1 initializes a list named values_List,
line 2 iterates through this list, and
line 3 prints the output as (index of a number, number).
This Python code is easy to understand and debug. But what if there is an error inside the Keras layers? How do I proceed to debug this Keras code? How do I see the output of each step inside the Keras code?

In short, you can't easily debug in Keras because it is a high-level API made for faster and easier implementation of neural network architectures using pre-defined layers and functions. Since those layers and functions are well tested, there is little chance of error inside them.
If you want more fine-grained control, you need to implement the model in a low-level API like TensorFlow v1, or use tf.GradientTape with tf.keras in TensorFlow v2 to see the gradients at each step.
You can also try TensorWatch by Microsoft for a deeper understanding of your model:
https://github.com/microsoft/tensorwatch
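For the asker's actual goal of seeing what happens at each step, one approach (a minimal sketch, not from the original answer; sample and sample_label are placeholder tensors) is to build a second Model that exposes every layer's output, and to use tf.GradientTape for the gradients:

import tensorflow as tf

# Assuming `model` is the trained Sequential model from the question
# and `sample` is one input of shape (1, 28, 28, 1).
extractor = tf.keras.Model(inputs=model.inputs,
                           outputs=[layer.output for layer in model.layers])

# One activation tensor per layer, in order.
activations = extractor(sample)
for layer, activation in zip(model.layers, activations):
    print(layer.name, activation.shape)

# Gradients at each step via tf.GradientTape:
with tf.GradientTape() as tape:
    predictions = model(sample, training=True)
    loss = tf.keras.losses.categorical_crossentropy(sample_label, predictions)
grads = tape.gradient(loss, model.trainable_variables)
for var, grad in zip(model.trainable_variables, grads):
    print(var.name, tf.reduce_mean(tf.abs(grad)).numpy())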

Related

Can I perform non-text sequence classification in fastai?

I am trying to figure out if I can use fastai for my problem.
I am trying to classify sequences of floats. Each sequence is a vector of 24 floats. In principle, item 0 in the vector affects item 1, which affects item 2, etc., so an LSTM is of interest. I am open to treating the data as a non-sequence and modeling it with some kind of 1D CNN as well. I would like to be able to predict (binary classification) the label for each vector I pass in to the model.
Can fastai support this kind of model? I have an LSTM trained in Keras on this data that performs well, but I need to use torch or fastai for a variety of reasons. The architecture looks like this:
from keras.models import Sequential
from keras.layers import Bidirectional, LSTM, Dropout, Dense

model = Sequential()
model.add(Bidirectional(LSTM(32, input_shape=(n_timesteps, n_features))))
model.add(Dropout(0.5))
model.add(Dense(12, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(n_outputs, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
I've used fastai a bunch for image and text classification, but I can't figure out how to formulate this problem in fastai. Any ideas?
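For reference, a rough PyTorch equivalent of that Keras architecture (a sketch under the assumption that fastai's Learner can wrap a plain nn.Module; treating each vector as 24 timesteps of 1 feature, with sizes mirroring the Keras code):

import torch
import torch.nn as nn

class SeqClassifier(nn.Module):
    """Rough PyTorch counterpart of the Keras model above (a sketch)."""
    def __init__(self, n_features=1, n_outputs=1, hidden=32):
        super().__init__()
        # bidirectional=True doubles the output feature size (2 * hidden)
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True,
                            bidirectional=True)
        self.head = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(2 * hidden, 12),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(12, n_outputs),
        )

    def forward(self, x):
        # x: (batch, seq_len, n_features); use the last time step's output
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])

Note the model returns raw logits rather than ending in a sigmoid; in PyTorch the usual pairing is nn.BCEWithLogitsLoss, which applies the sigmoid internally.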

Tensorflow: Train Keras model using GPU

I know there are a lot of related questions, but they are outdated; most of them even deal with TensorFlow 1.
I have one GPU (a GeForce 960) which is recognized by TensorFlow, so the installation was successful.
I'm not sure if this is the right way to do it, but this is how I train a Keras model:
import tensorflow as tf
from tensorflow.keras import layers, activations
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten

def create_model():
    model = Sequential()
    model.add(Conv2D(128, (3, 3), padding="valid"))
    model.add(layers.BatchNormalization())
    model.add(layers.Activation(activations.relu))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(layers.Dense(10, activation="softmax"))
    return model

strategy = tf.distribute.MirroredStrategy(devices=["/gpu:0"])
with strategy.scope():
    model = create_model()
    model.compile(loss="categorical_crossentropy",
                  optimizer="adam",
                  metrics=["acc"])

train_dataset, test_dataset = get_dataset()  # defined elsewhere in my code
model.fit(train_dataset,
          epochs=20,
          verbose=1,
          validation_data=test_dataset)
But I am running into several problems:

1. The exact same code is equally fast when I turn off the strategy part.
2. I always get this "warning": BaseCollectiveExecutor::StartAbort Out of range: End of sequence
3. I found out that when I run this code, with the strategy part turned off, in a different Anaconda environment which does not have GPU support (CUDA etc.), it is much slower. So, is the GPU automatically used when you are in a GPU-supporting environment (because, as stated in 1., it is equally fast without the strategy part)?

Is this the right way to use my GPU? If not, what is the right way?
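One way to check whether the GPU is actually doing the work (a quick sketch using standard TF2 utilities, not part of the original post):

import tensorflow as tf

# List the GPUs TensorFlow can see.
print(tf.config.list_physical_devices('GPU'))

# Log which device each op is placed on (very verbose;
# enable this before building the model).
tf.debugging.set_log_device_placement(True)

With a single visible GPU, tf.keras places operations on it automatically, so MirroredStrategy with one device adds nothing; that would explain why the code is equally fast with the strategy turned off.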

How can we generate a CNN model?

I saw a post about dog and cat classification; here is a link to that blog:
https://medium.com/@mrgarg.rajat/kaggle-dogs-vs-cats-challenge-complete-step-by-step-guide-part-2-e9ee4967b9
In the code, the author shows the architecture of the CNN network like this:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (3,3), input_shape=(ROWS, COLS, CHANNELS), activation='relu'))
model.add(MaxPooling2D(pool_size = (2,2)))
model.add(Conv2D(64, (3,3), activation='relu'))
model.add(MaxPooling2D(pool_size = (2,2)))
model.add(Dropout(0.4))
model.add(Conv2D(128, (3,3), activation='relu'))
model.add(MaxPooling2D(pool_size = (2,2)))
model.add(Dropout(0.4))
model.add(Conv2D(256, (3,3), activation='relu'))
model.add(MaxPooling2D(pool_size = (2,2)))
model.add(Dropout(0.4))
model.add(Conv2D(512, (1,1), activation='relu'))
#model.add(MaxPooling2D(pool_size = (2,2)))
model.add(Flatten())
model.add(Dropout(0.4))
model.add(Dense(units=120, activation='relu'))
model.add(Dense(units=2, activation='sigmoid'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
I really don't know how the author came up with this architecture: why this network and not another one, why it needs 5 Conv2D layers and 2 Dense layers, and how he uses dropout. Please help me understand this network, thanks a lot.
In this CNN:
ROWS and COLS are 64
CHANNELS is 3
Your question is about the architecture of the model. The real answer is that there is actually no straightforward way to find the architecture that does the job on the very first try. That is actually the main task in R&D: running numerous cross-validation experiments to estimate a good architecture for a specific problem. And that is one of the reasons why pre-trained models are so precious.
There is a branch of ML called AutoML, which tries to navigate the search space and approximately find a good architecture; feel free to have a look.
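As an illustration of that idea, a hyperparameter-search sketch with KerasTuner (assuming the keras-tuner package is installed; the search ranges and tiny architecture are arbitrary, not from the blog):

import keras_tuner as kt
from tensorflow import keras

def build_model(hp):
    # The tuner varies the number of filters and the dropout rate.
    model = keras.Sequential([
        keras.layers.Conv2D(hp.Int('filters', 32, 128, step=32), (3, 3),
                            activation='relu', input_shape=(64, 64, 3)),
        keras.layers.MaxPooling2D((2, 2)),
        keras.layers.Flatten(),
        keras.layers.Dropout(hp.Float('dropout', 0.2, 0.5, step=0.1)),
        keras.layers.Dense(2, activation='softmax'),
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

tuner = kt.RandomSearch(build_model, objective='val_accuracy', max_trials=5)
# tuner.search(X_train, y_train, epochs=3, validation_data=(X_val, y_val))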
If you want to learn about why people use certain CNN models, you'll need to read the research papers about how those CNN models were developed. A couple of models you might want to read up on are VGG, ResNet, AlexNet, and Inception. Most of the bloggers writing these articles are basing their work on these models, since they've been shown to be highly effective.
These models are developed through a combination of theory (coming up with ideas about what might work) and testing (actually trying out and adding in all of the layers).
Understand that much of this is built on matrix algebra, probability, and calculus, so if you don't know these subjects, you will need to start there if you really want to understand what's going on under the hood.
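Since pre-trained models keep coming up, here is a minimal transfer-learning sketch using one of the architectures named above (VGG16 via tf.keras.applications; the frozen base, head sizes, and input shape are illustrative assumptions, not from the blog):

import tensorflow as tf

# Load VGG16 pre-trained on ImageNet, without its classification head.
base = tf.keras.applications.VGG16(weights='imagenet', include_top=False,
                                   input_shape=(64, 64, 3))
base.trainable = False  # freeze the convolutional features

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(120, activation='relu'),
    tf.keras.layers.Dense(2, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])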

Using Tensorflow 2.0 and eager execution without Keras

This question might stem from a lack of knowledge about TensorFlow, but I am trying to build a multilayer perceptron with TensorFlow 2.0 without Keras.
The reason is that it is a requirement for my machine learning course that we do not use Keras. Why, you might ask? I am not sure.
I have already implemented our model in TensorFlow 2.0 with Keras ease, and now I want to do the exact same thing without Keras.
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import SGD, Adam

model = Sequential()
model.add(Dense(64, activation='relu', input_dim=784))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(5, activation='softmax'))

sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(),
              metrics=['accuracy'])

X_train = X[:7000]
y_train = tf.keras.utils.to_categorical(y[:7000], num_classes=5)
X_dev = X[7000:]
y_dev = tf.keras.utils.to_categorical(y[7000:], num_classes=5)

model.fit(X_train, y_train,
          epochs=100,
          batch_size=128)
score = model.evaluate(X_dev, y_dev, batch_size=128)
print(score)
Here is my problem: whenever I look up the documentation on TensorFlow 2.0, even the guides on custom training are using Keras.
As placeholders and sessions are a thing of the past in TensorFlow 2.0 (as I understand it), I am a bit unsure of how to structure it.
I can make tensor objects. I have the impression that I need to use eager execution and gradient tape, but I am still unsure of how to put these things together.
Now my question is: where should I look to get a better understanding? Which direction has the greatest descent?
Please do tell me if I am doing this Stack Overflow post wrong; it is my first time here.
As @Daniel Möller stated, there are these tutorials for custom training and custom layers on the official TensorFlow page. As stated on the custom training page:
This tutorial used tf.Variable to build and train a simple linear model.
There is also this blog that creates custom layers and training without the Keras API. You can check the code on Google Colab, which uses CIFAR-10 with custom layers and training in the same manner.
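To make the shape of such a training loop concrete, here is a minimal sketch using only tf.Variable and tf.GradientTape (a slimmed-down version of the question's MLP, 784 -> 64 -> 5, dropout omitted; the learning rate is arbitrary):

import tensorflow as tf

# Variables for one hidden layer and an output layer (784 -> 64 -> 5).
W1 = tf.Variable(tf.random.normal([784, 64], stddev=0.1))
b1 = tf.Variable(tf.zeros([64]))
W2 = tf.Variable(tf.random.normal([64, 5], stddev=0.1))
b2 = tf.Variable(tf.zeros([5]))
params = [W1, b1, W2, b2]

def forward(x):
    h = tf.nn.relu(tf.matmul(x, W1) + b1)
    return tf.matmul(h, W2) + b2  # raw logits

def train_step(x, y, lr=0.01):
    with tf.GradientTape() as tape:
        logits = forward(x)
        loss = tf.reduce_mean(
            tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))
    grads = tape.gradient(loss, params)
    for p, g in zip(params, grads):
        p.assign_sub(lr * g)  # plain SGD update
    return loss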

Why aren't all the activation functions identical?

This is what I picked up from somewhere on the Internet.
This is very simple GAN+CNN modeling code, specifically for a discriminator model, written in Keras with Python 3.6.
It works fine, but something is not clear to me.
from keras.models import Sequential
from keras.layers import Conv2D, Dense, Dropout, Flatten, Activation, LeakyReLU

# These are methods of the poster's (unnamed) GAN class.
def __init__(self):
    self.img_rows = 28
    self.img_cols = 28
    self.channels = 1

def build_discriminator(self):
    img_shape = (self.img_rows, self.img_cols, self.channels)
    model = Sequential()
    model.add(Conv2D(64, (5, 5), strides=(2, 2),
                     padding='same', input_shape=img_shape))
    model.add(LeakyReLU(0.2))
    model.add(Conv2D(128, (5, 5), strides=(2, 2)))
    model.add(LeakyReLU(0.2))
    model.add(Flatten())
    model.add(Dense(256))
    model.add(LeakyReLU(0.2))
    model.add(Dropout(0.5))
    model.add(Dense(1))
    model.add(Activation('sigmoid'))
    return model
There are several activation functions appearing here, but why aren't they all identical?
If the very last output is 'sigmoid', shouldn't the rest be the same function too?
Why is LeakyReLU used in the middle? Thanks.
I guess they didn't use sigmoid for the rest of the layers because with sigmoid you have a big problem of vanishing gradients in deep networks.
The reason is that the sigmoid function "flattens out" on both sides away from zero, giving the layers towards the output layer a tendency to produce very small gradients and thus very small weight updates, because, loosely speaking, the gradient of the deeper layers is a product of the gradients of the lower layers, as a result of the chain rule of differentiation. So if you have just a few sigmoid layers, you might have luck with them, but as soon as you chain several of them, they produce instability in the gradients.
It's too complex for me to explain in an article here, but if you want to know it in more detail, you can read about it in a chapter of an online book.
By the way, this book is really great; it's worth reading more of it. To understand the chapter, you probably have to read chapter 1 of the book first, if you don't know how backpropagation works.
The output and hidden layer activation functions do not have to be identical. Hidden layer activations are part of the mechanism that learns features, so it is important that they do not have vanishing-gradient issues (as sigmoid has), while the output layer activation function is more related to the output task, for example, softmax activation for classification.
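To see the vanishing-gradient point numerically, a tiny sketch (not from either answer) comparing the derivative of sigmoid with that of LeakyReLU across a chain of layers:

import numpy as np

def sigmoid_grad(x):
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)          # peaks at only 0.25, at x == 0

def leaky_relu_grad(x, alpha=0.2):
    return np.where(x > 0, 1.0, alpha)

x = 0.5
# By the chain rule, the gradient through n stacked activations is
# roughly the product of the per-layer derivatives.
for n in (1, 5, 10):
    print(n, sigmoid_grad(x) ** n, leaky_relu_grad(x) ** n)
# The sigmoid product shrinks toward 0 quickly (about 5e-7 at n=10),
# while LeakyReLU's stays 1.0 for positive inputs, which is why it is
# preferred in the hidden layers of the discriminator.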
