I came across a blog post about dog vs. cat classification. Here is the link:
https://medium.com/#mrgarg.rajat/kaggle-dogs-vs-cats-challenge-complete-step-by-step-guide-part-2-e9ee4967b9
In the code, the author defines the CNN architecture like this:
model = Sequential()
model.add(Conv2D(32, (3,3), input_shape=(ROWS, COLS, CHANNELS), activation='relu'))
model.add(MaxPooling2D(pool_size = (2,2)))
model.add(Conv2D(64, (3,3), activation='relu'))
model.add(MaxPooling2D(pool_size = (2,2)))
model.add(Dropout(0.4))
model.add(Conv2D(128, (3,3), activation='relu'))
model.add(MaxPooling2D(pool_size = (2,2)))
model.add(Dropout(0.4))
model.add(Conv2D(256, (3,3), activation='relu'))
model.add(MaxPooling2D(pool_size = (2,2)))
model.add(Dropout(0.4))
model.add(Conv2D(512, (1,1), activation='relu'))
#model.add(MaxPooling2D(pool_size = (2,2)))
model.add(Flatten())
model.add(Dropout(0.4))
model.add(Dense(units=120, activation='relu'))
model.add(Dense(units=2, activation='sigmoid'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
I really don't understand how the author arrived at this design. Why this network and not some other one? Why are four Conv2D layers and two Dense layers needed, and how were the Dropout layers chosen? Please help me understand this network. Thanks a lot.
In this CNN:
ROWS and COLS are 64
CHANNELS is 3
Your question is about the architecture of the model. The honest answer is that there is no straightforward way to find a good architecture on the first try. Finding one is the main task in R&D: you run many cross-validation experiments to estimate which architecture works well for a specific problem. That is also one of the reasons why pre-trained models are so valuable.
There is a branch of ML called AutoML, which tries to search the space of possible architectures and find an approximately good one. Feel free to have a look.
If you want to learn why people use certain CNN models, you'll need to read the research papers that describe how those models were developed. A few models you might want to read up on are VGG, ResNet, AlexNet, and Inception. Most bloggers writing these articles base their work on these models, since they have been shown to be highly effective.
These models are developed through a combination of theory (coming up with ideas about what might work) and testing (actually adding the layers and trying them out).
Keep in mind that much of this is built on matrix algebra, probability, and calculus, so if you don't know these subjects, you need to start there if you really want to understand what's going on under the hood.
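To make the layer stack in the question a bit less opaque, it can help to trace how the feature map shrinks from layer to layer. The following is a minimal sketch in plain Python (no Keras needed). It assumes the Keras defaults, i.e. padding='valid' and stride 1 for Conv2D, and a stride equal to the pool size for MaxPooling2D. The helper names conv_out and pool_out are made up for this illustration.

```python
def conv_out(size, kernel):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, pool):
    """Spatial size after max pooling with stride equal to the pool size."""
    return size // pool


size = 64      # ROWS == COLS == 64 in the question
channels = 3   # CHANNELS
trace = []

# (filters, kernel, pool) for each Conv2D block; pool=None means no pooling,
# which matches the commented-out MaxPooling2D after the last Conv2D.
blocks = [(32, 3, 2), (64, 3, 2), (128, 3, 2), (256, 3, 2), (512, 1, None)]
for filters, kernel, pool in blocks:
    size = conv_out(size, kernel)
    if pool is not None:
        size = pool_out(size, pool)
    channels = filters
    trace.append((size, size, channels))

flat = size * size * channels      # number of values Flatten() passes on
dense_params = flat * 120 + 120    # weights + biases of Dense(120)

for shape in trace:
    print(shape)
print("flattened:", flat, "Dense(120) parameters:", dense_params)
```

Under these assumptions the 64x64x3 image is reduced to 2x2x512, i.e. 2048 values, before the dense layers. Each conv/pool block roughly halves the resolution while the number of filters doubles, which is the same pattern used by VGG-style networks.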
I am trying to figure out if I can use fastai for my problem.
I am trying to classify sequences of floats. Each sequence is a vector of 24 floats. In principle, item 0 in the vector affects item 1, which affects item 2, and so on, so an LSTM is of interest. I am also open to treating the data as a non-sequence and modeling it with some kind of 1D CNN. I would like to be able to predict the label (binary classification) for each vector I pass in to the model.
Can fastai support this kind of model? I have an LSTM trained in Keras on this data that performs well, but I need to use torch or fastai for a variety of reasons. The architecture looks like this:
model = Sequential()
model.add(Bidirectional(LSTM(32, input_shape=(n_timesteps,n_features))))
model.add(Dropout(0.5))
model.add(Dense(12, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(n_outputs, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
I've used fastai a bunch for image and text classification, but I can't figure out how to formulate this problem in fastai. Any ideas?
I know there are a lot of related questions, but they are outdated; most of them even deal with TensorFlow 1.
I have 1 GPU (GeForce 960) which is recognized by TensorFlow, so the installation was successful.
I'm not sure if this is the right way to do it, but this is how I train a Keras model:
def create_model():
    model = Sequential()
    model.add(Conv2D(128, (3,3), padding="valid"))
    model.add(layers.BatchNormalization())
    model.add(layers.Activation(activations.relu))
    model.add(MaxPooling2D(pool_size=(2,2)))
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(layers.Dense(10, activation="softmax"))
    return model
strategy = tf.distribute.MirroredStrategy(devices=["/gpu:0"])
with strategy.scope():
    model = create_model()
    model.compile(loss="categorical_crossentropy",
                  optimizer="adam",
                  metrics=["acc"])
train_dataset, test_dataset = get_dataset()
model.fit(train_dataset,
          epochs=20,
          verbose=1,
          validation_data=test_dataset)
But I run into several problems:
1. The exact same code is equally fast when I turn off the strategy part.
2. I always get this "warning": BaseCollectiveExecutor::StartAbort Out of range: End of sequence
3. When I run this code with the strategy part turned off in a different Anaconda environment that has no GPU support (CUDA etc.), it is much slower. So is the GPU used automatically when you are in an environment with GPU support (because, as stated in 1., it is equally fast without the strategy part)?
Is this the right way to use my GPU? If not, what is the right way?
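One way to approach the question of whether the GPU is used automatically is to ask TensorFlow which GPUs it can see. The sketch below assumes TensorFlow 2.1 or later, where tf.config.list_physical_devices is available; the helper name visible_gpus is made up for this example. As far as I know, TensorFlow 2 places operations on a visible GPU by default, so on a single-GPU machine a tf.distribute strategy should not be needed, which would be consistent with the code being equally fast without the strategy part.

```python
def visible_gpus():
    """Return the names of the GPUs TensorFlow can see.

    Returns an empty list when TensorFlow is not installed or when it
    finds no GPU (for example in an environment without CUDA).
    """
    try:
        import tensorflow as tf
    except ImportError:
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
if gpus:
    print("TensorFlow sees these GPUs:", gpus)
else:
    print("No GPU visible; training would run on the CPU.")
```

Running this in both Anaconda environments should show whether the difference in speed comes from the GPU being picked up in one of them and not in the other.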
In an RNN, what does the number 128 passed to LSTM mean?
# RNN Recurrent Neural Network architecture
model = Sequential()
model.add(LSTM(128, input_shape=(X_train.shape[1:]), return_sequences=True))
#model.add(Dropout(0.2))
model.add(BatchNormalization())
I think the following link provides a clear answer to this question.
https://www.tensorflow.org/api_docs/python/tf/compat/v1/keras/layers/LSTM
According to the link, it is the dimensionality of the output space of the LSTM layer.
It is the number of LSTM units (nodes) you want to use in the layer (perpendicular to the flow of information through time).
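To see what the number of units changes in practice, here is a small sketch in plain Python that computes the output shape and the number of trainable parameters of a Keras LSTM layer. The function names are made up for this illustration, and the input size (28 time steps with 28 features) is a hypothetical example, since the shape of X_train is not given in the question.

```python
def lstm_output_shape(batch, timesteps, units, return_sequences):
    """Output shape of a Keras LSTM layer."""
    if return_sequences:
        return (batch, timesteps, units)  # one vector of size `units` per time step
    return (batch, units)                 # only the vector of the last time step


def lstm_param_count(input_dim, units):
    """Trainable parameters of a Keras LSTM layer.

    There are 4 gates, and each gate has input weights, recurrent
    weights, and one bias vector.
    """
    return 4 * (units * (input_dim + units) + units)


# Hypothetical input: sequences of 28 time steps with 28 features each.
print(lstm_output_shape(batch=None, timesteps=28, units=128, return_sequences=True))
print(lstm_param_count(input_dim=28, units=128))
```

So 128 does not depend on the input length; it sets the size of the vector the layer produces at each time step, and it drives the number of weights the layer has to learn.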
I've created a Keras model to recognize human activity based on data from a mobile accelerometer:
model = Sequential()
model.add(Reshape((const.PERIOD, const.N_FEATURES), input_shape=(240,)))
model.add(Conv1D(100, 10, activation='relu', input_shape=(const.PERIOD, const.N_FEATURES)))
model.add(Conv1D(100, 10, activation='relu'))
model.add(MaxPooling1D(const.N_FEATURES))
model.add(Conv1D(160, 10, activation='relu'))
model.add(Conv1D(160, 10, activation='relu'))
model.add(Flatten())
model.add(Dropout(0.5))
model.add(Dense(7, activation='softmax'))
model.summary()
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
I've tested the model, and the accuracy after ten epochs is around 85-90%. However, when I convert the model to TF Lite and run the interpreter in my Android app, the predictions are terrible. What could be the reason for these bad results? Is there some incompatibility along the Keras -> TensorFlow -> TensorFlow Lite path? Should I run it another way, for example with a servlet serving the Keras model?
A few suggestions:
Try to visualize your tflite graph with https://lutzroeder.github.io/netron/. See if there's anything unexpected.
Try to debug with TensorFlow Lite's Python API first. Feed the same input to the Keras model and the tflite model and compare the output tensors.
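The second suggestion could look roughly like the sketch below. It assumes the converted file is called "model.tflite" and that `model` is the trained Keras model; both names, as well as the helpers run_tflite and max_abs_diff, are placeholders for this example. The TensorFlow import is done inside the function so that the comparison helper can be used on its own.

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two flat sequences."""
    return max(abs(x - y) for x, y in zip(a, b))


def run_tflite(model_path, x):
    """Run one batch through a .tflite file and return the first output tensor."""
    # Imported lazily so max_abs_diff works without TensorFlow installed.
    import numpy as np
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]
    # The interpreter is strict about the input dtype, so cast explicitly.
    interpreter.set_tensor(inp["index"], np.asarray(x, dtype=inp["dtype"]))
    interpreter.invoke()
    return interpreter.get_tensor(out["index"])


# Hypothetical usage with the model from the question (input shape (240,)):
#   x = np.random.rand(1, 240).astype("float32")
#   keras_out = model.predict(x)[0]
#   tflite_out = run_tflite("model.tflite", x)[0]
#   print(max_abs_diff(keras_out, tflite_out))  # should be close to 0
```

If the two outputs agree in Python, the conversion itself is probably fine, and the next thing to check would be how the Android app prepares the 240 input values (ordering, scaling, data type).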
I am going through a tutorial on handwritten text recognition. To do handwritten digit recognition, the author has constructed a Keras model as follows:
# Creating CNN model
input_shape = (28,28,1)
number_of_classes = 10
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),activation='relu',input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(number_of_classes, activation='softmax'))
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(), metrics=['accuracy'])
model.summary()
history = model.fit(X_train, y_train, epochs=5, shuffle=True,
                    batch_size=200, validation_data=(X_test, y_test))
model.save('digit_classifier2.h5')
Source (here)
I am very confused about how the author chose these layers. I know how Conv2D works by applying filters to an image, and I know what an activation function is. In short, I have a rough understanding of what each term means.
What I find difficult is knowing what is happening at each step of this code.
For example, let's take this Python code:
values_List=[11,34,43]
for index, num in enumerate(values_List):
    print(index,num)
I know that line 1 initializes a list named values_List.
Line 2 iterates through this list.
Line 3 prints the output as (index of a number, number).
This Python code is easy to understand and debug. But I am unsure what to do if there is an error inside the Keras layers. How do I go about debugging this Keras code? How do I see the output of each step inside the Keras code?
In short, you can't easily debug inside Keras, because it is a high-level API made for faster and easier implementation of neural network architectures using predefined layers and functions. Errors inside these layers and functions are unlikely, because they are well tested.
If you want more fine-grained control, you need to implement the model in a low-level API like TensorFlow v1, or use tf.GradientTape with tf.keras in TensorFlow v2 to see the gradients at each step.
You can also try TensorWatch by Microsoft for a deeper understanding of your model:
https://github.com/microsoft/tensorwatch
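That said, for a plain Sequential model it should be possible to look at the result of each step by calling the layers one at a time. The sketch below assumes TensorFlow 2.x with eager execution, where a Keras layer can be called directly on a batch of inputs; the function name trace_layers is made up for this example. It does not import Keras itself and only relies on the model having a `layers` list.

```python
def trace_layers(model, x):
    """Feed x through model.layers one by one and record each output shape.

    Only meant for a plain Sequential stack, where every layer takes the
    previous layer's output as its only input.
    """
    shapes = []
    for layer in model.layers:
        x = layer(x)  # calling a layer runs its forward pass
        shapes.append((layer.name, tuple(x.shape)))
    return shapes
```

With the digit classifier from the question, `trace_layers(model, X_test[:1])` would return one (layer name, output shape) pair per layer. To look at the actual values instead of the shapes, you could store `x` itself in the list.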