Training an image classifier with over 300k classes - python

Is it possible to train an image classifier network with an enormous number of classes (say 300k), with each class having a minimum of 10 images split between train/test/validation (i.e. >3 million 250x250x3 images)?
I have tried to train on the dataset using the ResNet50 model, decreasing the batch size to as low as 1, but I still run into OOM issues (2080 Ti). I found out that the OOM is caused by having too many parameters, so I have resorted to training an extremely basic 10-layer model with a batch size of 1. It runs, but the speed/accuracy is unsurprisingly abysmal.
Is there any way I can divide the training set into smaller sections of classes, such that:
1st .h5 = classes 1 ~ 20,000
2nd .h5 = classes 20,001 ~ 40,000
3rd .h5 = classes 40,001 ~ 60,000, etc.
and later merge them into a single .h5 file that can be loaded to recognize all 300k different classes?
EDIT PER ASHISH'S SUGGESTION:
I have (I think) successfully merged 2 models into one, but the merged model has somewhat doubled in the number of layers...
Source code:
# Assumed imports (tf.keras); the original snippet omitted them.
from tensorflow.keras.layers import Dense, concatenate
from tensorflow.keras.models import Model, load_model

model1 = load_model('001.h5')
model2 = load_model('002.h5')
for layer in model1.layers:
    layer._name = layer._name + "_1"  # avoid duplicate layer names, which would otherwise throw an error
    layer.trainable = False
for layer in model2.layers:
    layer._name = layer._name + "_2"
    layer.trainable = False
# Add a new dense layer on top of each model's output
x1 = model1.layers[-1].output
classes = x1.shape[1]
x1 = Dense(classes, activation='relu', name='out1')(x1)
x2 = model2.layers[-1].output
x2 = Dense(x2.shape[1], activation='relu', name='out2')(x2)
classes += x2.shape[1]
# Concatenate both branches and classify over the combined class set
x = concatenate([x1, x2])
output_layer = Dense(classes, activation='softmax', name='combined_layer')(x)
new_model = Model(inputs=[model1.inputs, model2.inputs], outputs=output_layer)
new_model.summary()
new_model.save('new_model.h5', overwrite=True)
And the resulting model looks like this:
Model: "model"
_________________________________________________________________________
Layer (type) Output Shape Param # Connected to
=========================================================================
input_1_1 (InputLayer) [(None, 224, 224, 3) 0
_________________________________________________________________________
input_1_2 (InputLayer) [(None, 224, 224, 3) 0
_________________________________________________________________________
conv1_pad_1 (ZeroPadding2D) (None, 230, 230, 3) 0 input_1_1[0][0]
_________________________________________________________________________
conv1_pad_2 (ZeroPadding2D) (None, 230, 230, 3) 0 input_1_2[0][0]
_________________________________________________________________________
conv1_conv_1 (Conv2D) (None, 112, 112, 64) 9472 conv1_pad_1[0][0]
_________________________________________________________________________
conv1_conv_2 (Conv2D) (None, 112, 112, 64) 9472 conv1_pad_2[0][0]
...
...
conv5_block3_out_1 (Activation) (None, 7, 7, 2048) 0 conv5_block3_add_1[0][0]
_________________________________________________________________________
conv5_block3_out_2 (Activation) (None, 7, 7, 2048) 0 conv5_block3_add_2[0][0]
_________________________________________________________________________
avg_pool_1 (GlobalAveragePoolin (None, 2048) 0 conv5_block3_out_1[0][0]
_________________________________________________________________________
avg_pool_2 (GlobalAveragePoolin (None, 2048) 0 conv5_block3_out_2[0][0]
_________________________________________________________________________
probs_1 (Dense) (None, 953) 1952697 avg_pool_1[0][0]
_________________________________________________________________________
probs_2 (Dense) (None, 3891) 7972659 avg_pool_2[0][0]
_________________________________________________________________________
out1 (Dense) (None, 953) 909162 probs_1[0][0]
_________________________________________________________________________
out2 (Dense) (None, 3891) 15143772 probs_2[0][0]
_________________________________________________________________________
concatenate (Concatenate) (None, 4844) 0 out1[0][0]
out2[0][0]
_________________________________________________________________________
combined_layer (Dense) (None, 4844) 23469180 concatenate[0][0]
=========================================================================
Total params: 96,622,894
Trainable params: 39,522,114
Non-trainable params: 57,100,780
As you can see, all the layers have been doubled due to Model(inputs=[input1, input2]). That will cause problems later when I want to use this model to predict images. Is there any way I can do this without doubling all the previous layers, and just add the trailing dense layers? At this rate I'll be overloaded with parameters even faster than before...
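The Dense parameter counts in the summary above can be reproduced with the usual formula, inputs x units + units. The sketch below checks those figures and then estimates a single softmax head over all 300k classes on top of the 2048 pooled features. That single head is a hypothetical configuration, not something from the original post, and the memory figure assumes float32 weights and ignores activations, gradients and optimizer state.

```python
def dense_params(inputs: int, units: int) -> int:
    """Number of weights plus biases in a fully connected layer."""
    return inputs * units + units

# Layers taken from the summary above.
probs_1 = dense_params(2048, 953)
probs_2 = dense_params(2048, 3891)
out1 = dense_params(953, 953)
out2 = dense_params(3891, 3891)
combined = dense_params(953 + 3891, 953 + 3891)

# Hypothetical: one softmax head over 300k classes on 2048 features.
single_head = dense_params(2048, 300_000)
single_head_gib = single_head * 4 / 2**30  # 4 bytes per float32 weight

print(probs_1, probs_2, out1, out2, combined)
print(single_head, round(single_head_gib, 2))
```

Because the head grows linearly with the number of classes while a square class-by-class layer such as combined_layer grows quadratically, the merge layers are likely to dominate memory well before 300k classes are reached.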

Technically it's possible. Since you have three classifiers (1.h5, 2.h5, 3.h5), you can load these models with their weights and then use the functional API in TensorFlow (https://www.tensorflow.org/guide/keras/functional): concatenate() combines the outputs of the three classifiers into a single vector, and a few dense layers with activation functions on top of it make the final prediction.
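The data flow described in this answer can be sketched in plain NumPy. This is only an illustration of the shapes involved, not Keras code: the class counts (3 and 5), the random "classifier outputs" and the untrained head weights are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

batch = 2
n1, n2 = 3, 5  # hypothetical class counts of the two frozen classifiers

# Stand-ins for the outputs of two already-trained classifiers.
p1 = softmax(rng.normal(size=(batch, n1)))
p2 = softmax(rng.normal(size=(batch, n2)))

# Concatenate the per-model outputs into one vector per sample.
merged = np.concatenate([p1, p2], axis=-1)  # shape (batch, n1 + n2)

# Trainable head: one dense layer plus softmax over all classes.
W = rng.normal(scale=0.01, size=(n1 + n2, n1 + n2))
b = np.zeros(n1 + n2)
combined = softmax(merged @ W + b)

print(merged.shape, combined.shape)
```

In Keras the same idea would presumably be expressed by calling each loaded model on one shared Input tensor, so that the merged model takes a single image rather than one input per sub-model.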

Related

Cannot resolve error in keras sequential model

I am working on a gesture recognition problem. My training set consists of multiple folders, each containing a sequence of 30 images, and the model is trained on those images. I also have a CSV file that contains the class label of each folder. The class labels are "Left Swipe", "Right Swipe", "Stop", "Thumbs Down" and "Thumbs Up", and they are stored in an np.array variable, train_class. I have created a CNN model and am feeding it into a Sequential model.
The code is available at the Git location below:
https://github.com/subhrajyoti-ghosh/ML-and-Deep-Learning/blob/main/Gesture_Recognition.ipynb
But when I try to fit the model, I receive an error. Can you please help me understand the error and how to solve it?
You are trying to use a TimeDistributed layer on a 2D input (batch_size, 256), which will not work, because the layer needs at least a 3D tensor. You should try using tf.keras.layers.RepeatVector:
import tensorflow as tf
resnet = tf.keras.applications.ResNet50(include_top=False,weights='imagenet',input_shape=(224,224,3))
cnn = tf.keras.Sequential([resnet])
cnn.add(tf.keras.layers.Conv2D(64,(2,2),strides=(1,1)))
cnn.add(tf.keras.layers.Conv2D(16,(3,3),strides=(1,1)))
cnn.add(tf.keras.layers.Flatten())
inputs = tf.keras.layers.Input(shape=(224,224,3))
x = cnn(inputs)
x = tf.keras.layers.RepeatVector(n=30)(x)
x = tf.keras.layers.GRU(16,return_sequences=True)(x)
x = tf.keras.layers.GRU(8)(x)
outputs = tf.keras.layers.Dense(5,activation='softmax')(x)
model = tf.keras.Model(inputs, outputs)
dummy_x = tf.random.normal((1, 224,224,3))
print(model.summary())
print(model(dummy_x))
Model: "model_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_14 (InputLayer) [(None, 224, 224, 3)] 0
sequential_6 (Sequential) (None, 256) 24121296
repeat_vector_2 (RepeatVect (None, 30, 256) 0
or)
gru_5 (GRU) (None, 30, 16) 13152
gru_6 (GRU) (None, 8) 624
dense_7 (Dense) (None, 5) 45
=================================================================
Total params: 24,135,117
Trainable params: 24,081,997
Non-trainable params: 53,120
_________________________________________________________________
None
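What RepeatVector does to the shapes can be mimicked in NumPy. The sketch below uses small hypothetical sizes (2 samples, 3 features, 4 repeats) instead of the 256 features and 30 repeats in the summary above.

```python
import numpy as np

# Hypothetical 2D feature batch: (batch=2, features=3)
features = np.arange(6, dtype=np.float32).reshape(2, 3)
timesteps = 4

# Insert a time axis and copy the feature vector along it,
# giving (batch, timesteps, features) like RepeatVector(n=timesteps).
repeated = np.repeat(features[:, np.newaxis, :], timesteps, axis=1)

print(features.shape, repeated.shape)
```

Note that every timestep carries the same values, so the recurrent layers that follow see one feature vector repeated rather than 30 distinct frames.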

Keras Sequential model input: How significant are the dimensions?

I am trying to build a multioutput classifier on 3D data structured like [sampleID, timestamp, deviceID, sensorID] with one-hot labels like [sampleID, deviceID] to determine which device "wins".
In a nutshell, it is a massive collection of timeseries readings from five sensors taken at regular intervals from each of four different devices. The objective is to determine which of the devices is most likely to be in a particular state at the end of each sampleID. The labels are a one-hot representation of the devices.
In a case like this where a human would find meaning in the structure of the dataset, does the training process derive similar benefit? Can I simplify my dataset by reducing it to [dataset, deviceID, timestamp X sensor] or even [dataset, deviceID X timestamp X sensor] and still get similar accuracy?
In other words would simplifying the following dataset:
[10000, 1000, 4, 5]
down to
[10000, 4, 5000]
or
[10000, 1000, 20]
or even
[10000, 20000]
significantly diminish the model's ability to classify output?
Edited for detail and formatting.
IIUC, you are asking if using 1000 timesteps for 20 objects (device X sensor) is better than using 1000 timesteps for 4 devices for 5 sensors.
There is no way of knowing in advance which would model your problem better, but we can quickly build some tests to see which model captures the complexity of the problem better.
Case 1: 1000 time steps, 20 objects -> Sequential LSTM based model
If you consider the 20 sensors individually, you can simply use an LSTM-based model and let the model handle the non-linear relationships between them. Since you have a 2D input per sample, simply reshape your data and build a model with the following structure. Feel free to add more layers, activations, etc.
from tensorflow.keras import layers, Model, utils
#Temporal model
inp = layers.Input((1000,20))
x = layers.LSTM(30, return_sequences=True)(inp)
x = layers.LSTM(30)(x)
out = layers.Dense(4, activation='softmax')(x)
model = Model(inp, out)
model.summary()
Model: "model_4"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_6 (InputLayer) [(None, 1000, 20)] 0
_________________________________________________________________
lstm_4 (LSTM) (None, 1000, 30) 6120
_________________________________________________________________
lstm_5 (LSTM) (None, 30) 7320
_________________________________________________________________
dense_20 (Dense) (None, 4) 124
=================================================================
Total params: 13,564
Trainable params: 13,564
Non-trainable params: 0
_________________________________________________________________
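The reshapes discussed in the question can be sketched with NumPy. The sample count (8 instead of 10000) is a hypothetical stand-in to keep the array small; the axis order [sample, timestamp, device, sensor] is taken from the question.

```python
import numpy as np

n_samples = 8  # hypothetical stand-in for the 10000 samples
data = np.arange(n_samples * 1000 * 4 * 5, dtype=np.float32)
data = data.reshape(n_samples, 1000, 4, 5)  # [sample, timestamp, device, sensor]

# [sample, timestamp, device x sensor]: the merged axes are adjacent,
# so a plain reshape is enough (this is the Case 1 input).
per_step = data.reshape(n_samples, 1000, 4 * 5)

# [sample, device, timestamp x sensor]: the device axis has to be
# moved in front of the time axis before reshaping.
per_device = data.transpose(0, 2, 1, 3).reshape(n_samples, 4, 1000 * 5)

# [sample, everything]: fully flattened.
flat = data.reshape(n_samples, -1)

print(per_step.shape, per_device.shape, flat.shape)
```

Reshaping only regroups the same values, so no information is lost; what changes is which axis a layer treats as time, space or features.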
Case 2: 1000 time steps, 4x5 objects -> Conv-LSTM based model
Since you have a 3D input, you want to consider the 4x5 as your spatial axes and the 1000 as your channels/feature maps/temporal features. Since your data is laid out channels-first, specify data_format="channels_first" in the Conv2D as well as the MaxPooling2D layers.
Then, once you have convolved over the spatial axes, you can start working on the feature maps with an LSTM. Sample code below, feel free to modify and build on top of this.
from tensorflow.keras import layers, Model, utils
#Conv-LSTM model
inp = layers.Input((1000,4,5))
x = layers.Conv2D(30,2, data_format="channels_first")(inp)
x = layers.MaxPooling2D(2, data_format="channels_first")(x)
x = layers.Reshape((-1,2))(x)
x = layers.LSTM(20)(x)
out = layers.Dense(4, activation='softmax')(x)
model = Model(inp, out)
model.summary()
Model: "model_21"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_25 (InputLayer) [(None, 1000, 4, 5)] 0
_________________________________________________________________
conv2d_19 (Conv2D) (None, 30, 3, 4) 120030
_________________________________________________________________
max_pooling2d_14 (MaxPooling (None, 30, 1, 2) 0
_________________________________________________________________
reshape_10 (Reshape) (None, 30, 2) 0
_________________________________________________________________
lstm_19 (LSTM) (None, 20) 1840
_________________________________________________________________
dense_30 (Dense) (None, 4) 84
=================================================================
Total params: 121,954
Trainable params: 121,954
Non-trainable params: 0
_________________________________________________________________
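The shapes and parameter counts in the Conv-LSTM summary above can be checked by hand. The helper names below are hypothetical; the formulas are the standard ones for a valid-padded convolution or pooling window and for an LSTM with four gates.

```python
def valid_output(size: int, window: int, stride: int = 1) -> int:
    """Output length of a 'valid' convolution or pooling window."""
    return (size - window) // stride + 1

def lstm_params(inputs: int, units: int) -> int:
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (inputs + units) + units)

channels_in, height, width = 1000, 4, 5
filters, kernel, pool = 30, 2, 2

conv_h = valid_output(height, kernel)             # spatial height after Conv2D
conv_w = valid_output(width, kernel)              # spatial width after Conv2D
pool_h = valid_output(conv_h, pool, stride=pool)  # after MaxPooling2D
pool_w = valid_output(conv_w, pool, stride=pool)

conv_params = kernel * kernel * channels_in * filters + filters
lstm_p = lstm_params(pool_h * pool_w, 20)  # Reshape((-1, 2)) gives 2 features per step
dense_p = 20 * 4 + 4

print((filters, conv_h, conv_w), (filters, pool_h, pool_w))
print(conv_params, lstm_p, dense_p, conv_params + lstm_p + dense_p)
```

One consequence worth keeping in mind: after the Reshape, the LSTM steps over the 30 filter channels, not over the original 1000 time steps, which the convolution has already mixed together.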

Access to the last convolutional layer with transfer learning

I'm trying to get some heatmaps from a computer vision model that is already working to classify images, but I'm running into some difficulties.
This is the model summary:
model.summary()
Model: "model_4"
Layer (type) Output Shape Param #
=================================================================
input_9 (InputLayer) [(None, 512, 512, 1)] 0
_________________________________________________________________
conv2d_4 (Conv2D) (None, 512, 512, 3) 30
_________________________________________________________________
densenet121 (Functional) (None, 1024) 7037504
_________________________________________________________________
dense_4 (Dense) (None, 100) 102500
_________________________________________________________________
dropout_4 (Dropout) (None, 100) 0
_________________________________________________________________
predictions (Dense) (None, 2) 202
=================================================================
Total params: 7,140,236
Trainable params: 7,056,588
Non-trainable params: 83,648
As part of the standard process to create a heatmap, I know I have to access the last convolutional layer in the model, which in this case is a layer inside DenseNet121, but I cannot find a way to access the layers belonging to densenet121.
Right now I've been using the conv2d_4 layer to run some tests, but I feel that is not the right way, because that layer comes before all the transfer learning work from DenseNet.
Also, I looked for Functional layers in the official Keras documentation but couldn't find them, so I guess it's not a layer; it's the whole DenseNet model embedded there, but I cannot find a way to access it.
By the way, here is the model construction, because it may help to answer this:
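from tensorflow.keras.applications.densenet import DenseNet121
# Assumed imports; the original snippet omitted them.
from tensorflow.keras.layers import Conv2D, Dense, Dropout, Input
from tensorflow.keras.models import Model

IMG_SIZE = 512  # matches the (None, 512, 512, 1) input in the summary above
num_classes = 2
input_tensor = Input(shape=(IMG_SIZE,IMG_SIZE,1))
# Map the single grayscale channel to the 3 channels DenseNet expects
x = Conv2D(3,(3,3), padding='same')(input_tensor)
x = DenseNet121(include_top=False, classes=2, pooling="avg", weights="imagenet")(x)
x = Dense(100)(x)
x = Dropout(0.45)(x)
predictions = Dense(num_classes, activation='softmax', name="predictions")(x)
model = Model(inputs=input_tensor, outputs=predictions)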
I found that you can use .get_layer() twice to access layers inside the functional DenseNet model embedded in the "main" model.
In this case I can use model.get_layer('densenet121').summary() to check all the layers inside the embedded model, and then use them with this code: model.get_layer('densenet121').get_layer('xxxxx')

How to feed and build a "Input->Dense->Conv2D->Dense" network in keras?

This is a simple example that reproduces my issue in a network I am trying to deploy.
I have an image input layer (which I need to maintain), then a Dense layer, Conv2D layer and a dense layer.
The idea is that the inputs are 10x10 images and the labels are 10x10 images. Inspired by my code and this example.
import numpy as np
from keras.models import Model
from keras.layers import Input, Conv2D, Dense
#Building model
size=10
a = Input(shape=(size,size,1))
hidden = Dense(size)(a)
hidden = Conv2D(kernel_size = (3,3), filters = size*size, activation='relu', padding='same')(hidden)
outputs = Dense(size, activation='sigmoid')(hidden)
model = Model(inputs=a, outputs=outputs)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
#Create random data and accounting for 1 channel of data
n_images=55
data = np.random.randint(0,2,(n_images,size,size,1))
labels = np.random.randint(0,2,(n_images,size,size,1))
#Fit model
model.fit(data, labels, verbose=1, batch_size=10, epochs=20)
print(model.summary())
I get the following error: ValueError: Error when checking target: expected dense_92 to have shape (10, 10, 10) but got array with shape (10, 10, 1)
I don't get an error if I change:
outputs = Dense(size, activation='sigmoid')(hidden)
with:
outputs = Dense(1, activation='sigmoid')(hidden)
No idea how Dense(1) is even valid and how it allows 10x10 output signal as model.summary() indicates:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_26 (InputLayer) (None, 10, 10, 1) 0
_________________________________________________________________
dense_93 (Dense) (None, 10, 10, 10) 20
_________________________________________________________________
conv2d_9 (Conv2D) (None, 10, 10, 100) 9100
_________________________________________________________________
dense_94 (Dense) (None, 10, 10, 1) 101
=================================================================
Total params: 9,221
Trainable params: 9,221
Non-trainable params: 0
_________________________________________________________________
None
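Dense(1) is valid here because a Keras Dense layer applied to an input with more than two dimensions acts on the last axis only, sharing one weight matrix across all leading positions. A NumPy sketch of that behaviour follows; the random inputs and weights are hypothetical, and the sigmoid activation is left out.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the conv2d_9 output: (batch, 10, 10, 100)
x = rng.normal(size=(2, 10, 10, 100))

# Dense(1): one (100, 1) weight matrix and one bias, shared by all 10x10 pixels.
W1, b1 = rng.normal(size=(100, 1)), np.zeros(1)
y1 = x @ W1 + b1     # (2, 10, 10, 1), matches labels of shape (..., 10, 10, 1)

# Dense(10): the last axis becomes 10, which no longer matches the labels.
W10, b10 = rng.normal(size=(100, 10)), np.zeros(10)
y10 = x @ W10 + b10  # (2, 10, 10, 10)

print(y1.shape, W1.size + b1.size)
print(y10.shape)
```

This is also why dense_94 has only 101 parameters: the layer is a per-pixel linear map over the 100 channels, not a map over the whole image.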
Well, according to your comments:
what I am trying to do isn't standard. I have set of images and for
each image I want to find a binary image of the same size that if the
value of its pixel is 1 it means the feature exists in the input image
the insight whether a pixel has a feature should be taken both from
local information (extracted by a convolution layers) and global
information extracted by Dense layers.
I guess you are looking for creating a two branch model where one branch consists of convolution layers and another one is simply one or more dense layers on top of each other (although, I should mention that in my opinion one convolution network may achieve what you are looking for, because the combination of pooling and convolution layers and then maybe some up-sampling layers at the end somehow preserves both local and global information). To define such a model, you can use Keras functional API like this:
from keras import models
from keras import layers
input_image = layers.Input(shape=(10, 10, 1))
# branch one: dense layers
b1 = layers.Flatten()(input_image)
b1 = layers.Dense(64, activation='relu')(b1)
b1_out = layers.Dense(32, activation='relu')(b1)
# branch two: conv + pooling layers
b2 = layers.Conv2D(32, (3,3), activation='relu')(input_image)
b2 = layers.MaxPooling2D((2,2))(b2)
b2 = layers.Conv2D(64, (3,3), activation='relu')(b2)
b2_out = layers.MaxPooling2D((2,2))(b2)
# merge two branches
flattened_b2 = layers.Flatten()(b2_out)
merged = layers.concatenate([b1_out, flattened_b2])
# add a final dense layer
output = layers.Dense(10*10, activation='sigmoid')(merged)
output = layers.Reshape((10,10))(output)
# create the model
model = models.Model(input_image, output)
model.compile(optimizer='rmsprop', loss='binary_crossentropy')
model.summary()
Model summary:
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, 10, 10, 1) 0
__________________________________________________________________________________________________
conv2d_1 (Conv2D) (None, 8, 8, 32) 320 input_1[0][0]
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D) (None, 4, 4, 32) 0 conv2d_1[0][0]
__________________________________________________________________________________________________
flatten_1 (Flatten) (None, 100) 0 input_1[0][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D) (None, 2, 2, 64) 18496 max_pooling2d_1[0][0]
__________________________________________________________________________________________________
dense_1 (Dense) (None, 64) 6464 flatten_1[0][0]
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D) (None, 1, 1, 64) 0 conv2d_2[0][0]
__________________________________________________________________________________________________
dense_2 (Dense) (None, 32) 2080 dense_1[0][0]
__________________________________________________________________________________________________
flatten_2 (Flatten) (None, 64) 0 max_pooling2d_2[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate) (None, 96) 0 dense_2[0][0]
flatten_2[0][0]
__________________________________________________________________________________________________
dense_3 (Dense) (None, 100) 9700 concatenate_1[0][0]
__________________________________________________________________________________________________
reshape_1 (Reshape) (None, 10, 10) 0 dense_3[0][0]
==================================================================================================
Total params: 37,060
Trainable params: 37,060
Non-trainable params: 0
__________________________________________________________________________________________________
Note that this is one way of achieving what you are looking for, and it may or may not work for the specific problem and data you are working on. You may modify this model (e.g. remove the pooling layers or add more dense layers) or use a completely different architecture with different kinds of layers (e.g. up-sampling, Conv2DTranspose) to reach a better accuracy. In the end, you must experiment to find the solution that works best.
Edit:
For completeness, here is how to generate data and fit the network:
import numpy as np
size = 10
n_images = 10
data = np.random.randint(0, 2, (n_images, size, size, 1))
# Labels have no channel axis so that they match the model's (None, 10, 10) output
labels = np.random.randint(0, 2, (n_images, size, size))
model.fit(data, labels, verbose=1, batch_size=32, epochs=20)

Keras Functional API: LSTM returns a 2 dimensional array

I'm stuck and I need the wisdom of Stack Overflow.
I have a two-input neural network implemented in Keras using the Functional API. The input shapes are:
X.shape, X_size.shape, y.shape
((123, 9), (123, 2), (123, 9, 10))
So, my problem is that I want the output of the LSTMs to have a 3-D shape so that I can use my y tensor. I know I can reshape y to a 2-D shape, but I want to use it as a 3-D array.
from keras.models import Model
from keras import layers
from keras import Input
# first input
list_input = Input(shape=(None,), dtype='int32', name='li')
embedded_list = layers.Embedding(100,90)(list_input)
encoded_list = layers.LSTM(4, name = "lstm1")(embedded_list)
# second input
size_input = Input(shape=(None,), dtype='int32', name='si')
embedded_size = layers.Embedding(100,10)(size_input)
encoded_size = layers.LSTM(4, name = "lstm2")(embedded_size)
# concatenate
concatenated = layers.concatenate([encoded_size, encoded_list], axis=-1)
answer = layers.Dense(90, activation='sigmoid', name = 'outpuy_layer')(concatenated)
model = Model([list_input, size_input], answer)
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=[f1])
Model summary:
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
si (InputLayer) (None, None) 0
____________________________________________________________________________________________________
li (InputLayer) (None, None) 0
____________________________________________________________________________________________________
embedding_16 (Embedding) (None, None, 10) 1000 si[0][0]
____________________________________________________________________________________________________
embedding_15 (Embedding) (None, None, 90) 9000 li[0][0]
____________________________________________________________________________________________________
lstm2 (LSTM) (None, 4) 240 embedding_16[0][0]
____________________________________________________________________________________________________
lstm1 (LSTM) (None, 4) 1520 embedding_15[0][0]
____________________________________________________________________________________________________
concatenate_8 (Concatenate) (None, 8) 0 lstm2[0][0]
lstm1[0][0]
____________________________________________________________________________________________________
outpuy_layer (Dense) (None, 90) 810 concatenate_8[0][0]
====================================================================================================
Total params: 12,570
Trainable params: 12,570
Non-trainable params: 0
One more time, the question is:
How to get output shape from LSTMs like (None, None, None/10)?
Keras ignores every timestep output except the last one by default, which creates a 2D array. To get a 3D array (meaning you get the output of every timestep), instantiate the layer with return_sequences set to True. In your case for example:
encoded_list = layers.LSTM(4, name = "lstm1", return_sequences=True)(embedded_list)
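The effect of return_sequences on the output shape can be illustrated with a tiny recurrent loop in NumPy. This is a plain tanh RNN with random weights, not an LSTM, and simple_rnn is a hypothetical helper; only the shapes are meant to carry over. The input sizes (123 samples, 9 timesteps, 90 embedding dimensions, 4 units) follow the question.

```python
import numpy as np

def simple_rnn(x, units, return_sequences=False, seed=0):
    """Plain tanh RNN used only to illustrate output shapes."""
    rng = np.random.default_rng(seed)
    batch, steps, features = x.shape
    w_in = rng.normal(scale=0.1, size=(features, units))
    w_rec = rng.normal(scale=0.1, size=(units, units))
    h = np.zeros((batch, units))
    outputs = []
    for t in range(steps):
        # New hidden state from the current input and the previous state.
        h = np.tanh(x[:, t] @ w_in + h @ w_rec)
        outputs.append(h)
    if return_sequences:
        return np.stack(outputs, axis=1)  # (batch, steps, units)
    return h                              # (batch, units): last step only

x = np.ones((123, 9, 90))  # (samples, timesteps, embedding dim)
last = simple_rnn(x, 4)
full = simple_rnn(x, 4, return_sequences=True)

print(last.shape, full.shape)
```

The 2D result is just the final slice of the 3D one, which is why setting return_sequences=True is enough to recover the per-timestep outputs.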
