I am trying to build a really large model in Keras with 3 LSTM layers of 4096 hidden units each. Previously I had 1024 hidden units in each layer; with that size the build time was reasonable, and each layer was added in about 1 to 2 seconds. Now that the model has 4096 hidden units per layer, adding each layer takes about 5 minutes. What I find strange is that the slowness happens during the three calls to model.add(LSTM...), not during model.compile(...). I need to use a larger network, but this wait time is a little unbearable. It is not so bad for training, since that takes much longer anyway, but I don't want to sit through it every time I want to generate test output. Why does the add take so much time? Isn't add just supposed to define the layer, with all the time spent in the compile function? And is there anything I can do about it?
print('Building Model')
model = Sequential()
model.add(LSTM(lstm_size, batch_input_shape = (batch_size, 1, len(bytes_set)), stateful = True, return_sequences = True, consume_less = consume_less))
model.add(Dropout(0.5))
print('Added LSTM_1')
model.add(LSTM(lstm_size, stateful = True, return_sequences = True, consume_less = consume_less))
model.add(Dropout(0.5))
print('Added LSTM_2')
model.add(LSTM(lstm_size, stateful = True, return_sequences = False, consume_less = consume_less))
model.add(Dropout(0.5))
print('Added LSTM_3')
model.add(Dense(len(bytes_set), activation = 'softmax'))
print('Compiling Model')
model.compile(optimizer = SGD(lr = 0.3, momentum = 0.9, decay = 1e-5, nesterov = True),
              loss = 'categorical_crossentropy',
              metrics = ['accuracy'])
Here is my .theanorc
[global]
floatX = float32
mode = FAST_RUN
device = gpu
exception_verbosity = high
[nvcc]
fastmath = 1
Here is my model summary, as requested. Unfortunately I have been running this new version for the past few hours, so I don't want to make any new changes. This model has 4 LSTM layers of size 1500 each.
Layer (type) Output Shape Param # Connected to
====================================================================================================
lstm_1 (LSTM) (64, 1, 1500) 9774000 lstm_input_1[0][0]
____________________________________________________________________________________________________
dropout_1 (Dropout) (64, 1, 1500) 0 lstm_1[0][0]
____________________________________________________________________________________________________
lstm_2 (LSTM) (64, 1, 1500) 18006000 dropout_1[0][0]
____________________________________________________________________________________________________
dropout_2 (Dropout) (64, 1, 1500) 0 lstm_2[0][0]
____________________________________________________________________________________________________
lstm_3 (LSTM) (64, 1, 1500) 18006000 dropout_2[0][0]
____________________________________________________________________________________________________
dropout_3 (Dropout) (64, 1, 1500) 0 lstm_3[0][0]
____________________________________________________________________________________________________
lstm_4 (LSTM) (64, 1500) 18006000 dropout_3[0][0]
____________________________________________________________________________________________________
dropout_4 (Dropout) (64, 1500) 0 lstm_4[0][0]
____________________________________________________________________________________________________
dense_1 (Dense) (64, 128) 192128 dropout_4[0][0]
====================================================================================================
Total params: 63984128
____________________________________________________________________________________________________
It's slow because you are trying to allocate matrices which need at least 0.5 GB of memory. A single 4096 x 4097 weight matrix (4096 connections plus a bias per unit) is already a huge number, and an LSTM carries four such sets of inner weights, for the input, output and forget gates plus the cell candidate. As you can see, this sums up to a huge number of parameters.
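For reference, a rough sketch of the arithmetic (the exact count varies slightly between Keras versions):
# an LSTM layer holds 4 weight matrices (input, forget and output gates plus
# the cell candidate), each of shape (input_dim + units + 1, units) counting the bias
def lstm_params(input_dim, units):
    return 4 * (input_dim + units + 1) * units

for units in (1024, 4096):
    p = lstm_params(units, units)  # inner layers of the stack: input_dim == units
    print(units, p, round(p * 4 / 1e9, 2), 'GB at float32')
# 1024 ->   8,392,704 params, ~0.03 GB per layer
# 4096 -> 134,234,112 params, ~0.54 GB per layer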
UPDATE
I wrote my answer on my mobile and typed TB instead of GB. You can easily check the size of your model by calling:
model.summary()
in both cases (1024 and 4096). Please share your results in a comment because I'm interested :)
I have a regression task and time series data. For each observation I need to predict one outcome value. My data is a series of images. I have hand-crafted 32 features from my images. Images have 10 channels. My data has 4D shape: (observations, time steps, channels, features), e.g. (3348, 121, 10, 32). After normalisation one channel for one observation looks like this:
matplotlib.pyplot.matshow(normalized[170,:,0,:].transpose())
The figure shows 121 time steps (x-axis), with each time step's 32 features on the rows. The intensity of each feature value is shown in colour. So there seems to be something happening in time.
Question 1: How should I apply an RNN in such a task?
Could I somehow use a CNN to extract information from the 3rd and 4th axes of my data (as shown in the figure above)?
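One hedged sketch of what I mean (the layer sizes are placeholders I have not tested on this data): let a small CNN digest the (channels, features) slice at every time step via TimeDistributed, before the recurrence:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import TimeDistributed, Conv1D, Flatten, LSTM, Dense

model = Sequential([
    # each time step is a (channels, features) = (10, 32) slice
    TimeDistributed(Conv1D(16, 3, activation='relu'), input_shape=(121, 10, 32)),
    TimeDistributed(Flatten()),    # -> one feature vector per time step
    LSTM(128),                     # consume the 121 steps
    Dense(1, activation='linear')  # one regression value per observation
])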
A proposed solution (and troubles ahead):
I have flattened the data into 3D. I think I have sadly degraded the information available to the learner, but at least this (almost) works:
m,n = xtrain4d.shape[:2]
xtrain3d = xtrain4d.reshape(m,n,-1)
Shape is now: (3348, 121, 320). Target ytrain.shape is (3348,).
Here's my LSTM:
def LSTMrecurrentNN(shape1, shape2):
    model = Sequential()
    model.add(LSTM(128, return_sequences=True, input_shape=(shape1, shape2)))
    model.add(Dropout(0.5))
    model.add(Dense(64, activation = 'relu'))
    model.add(Dropout(0.5))
    model.add(Dense(16, activation = 'relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1, activation='linear'))
    return model
model = LSTMrecurrentNN(xtrain3d.shape[1], xtrain3d.shape[2])
model.summary()
Model: "sequential_12"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_11 (LSTM) (None, 121, 128) 229888
_________________________________________________________________
dropout_18 (Dropout) (None, 121, 128) 0
_________________________________________________________________
dense_21 (Dense) (None, 121, 64) 8256
_________________________________________________________________
dropout_19 (Dropout) (None, 121, 64) 0
_________________________________________________________________
dense_22 (Dense) (None, 121, 16) 1040
_________________________________________________________________
dropout_20 (Dropout) (None, 121, 16) 0
_________________________________________________________________
dense_23 (Dense) (None, 121, 1) 17
=================================================================
Total params: 239,201
Trainable params: 239,201
Non-trainable params: 0
_________________________________________________________________
Running the model:
epochs = 20
batchsize=128
learningrate=0.001
epsilon=0.1
# monitor validation progress:
early = EarlyStopping(monitor = "val_loss", mode = "min", patience = 10)
callbacks_list = [early]
# compile:
model.compile(loss = 'mean_squared_error',
              optimizer = Adam(learning_rate=learningrate, epsilon = epsilon),
              metrics = ['mse'])
# and train the model
history = model.fit(xtrain3d, ytrain,
                    epochs=epochs, batch_size=batchsize, verbose=0,
                    validation_split = 0.20,
                    callbacks = callbacks_list)
# predict:
test_predictions = model.predict(Xtest)
Training and validation performance looks OK.
But on the test set the model predicts one value for all observations! The figure below shows how, for 10 observations in the test set, the model predicts a value of 3267 (close to the mean of the target y) already in the early time steps.
The statistics tell the same story:
scipy.stats.describe(test_predictions[:,-1,0])
DescribeResult(nobs=1544, minmax=(3267.813, 3267.813),
mean=3267.8127, variance=5.964328e-08, skewness=1.0, kurtosis=-2.0)
For target y:
scipy.stats.describe(ytest)
DescribeResult(nobs=1544, minmax=(0.0, 8000.0),
mean=3312.1081606217617,
variance=1381985.8476585718, skewness=0.2847730511366937, kurtosis=0.20894280037919222)
Question 2: Why does the model predict the same value for all observations?
Any hints on how to inspect LSTM behaviour (its states)? I would like to know how far back it "remembers".
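For concreteness, this is the kind of probing I could do, if it makes sense (a sketch assuming the trained model from above, not verified): expose the LSTM's per-timestep output as a sub-model.
from tensorflow.keras.models import Model

# the LSTM is the first layer of LSTMrecurrentNN and has return_sequences=True
probe = Model(inputs=model.input, outputs=model.layers[0].output)
lstm_out = probe.predict(xtrain3d[:8])   # shape (8, 121, 128)
# plotting lstm_out[i].T like the feature matrix above shows how the
# activations evolve over the 121 time steps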
latent_dim = 500
embedding_dim = 256
# Encoder
encoder_inputs = Input(shape=(max_eng_len,))
enc_emb = Embedding(x_voc_size, embedding_dim,trainable=True)(encoder_inputs)
#LSTM 1
encoder_lstm1 = Bidirectional(LSTM(latent_dim,return_sequences=True,return_state=True))
encoder_output1, forw_state_h, forw_state_c, back_state_h, back_state_c = encoder_lstm1(enc_emb)
final_enc_h = Concatenate()([forw_state_h,back_state_h])
final_enc_c = Concatenate()([forw_state_c,back_state_c])
encoder_states =[final_enc_h, final_enc_c]
# Decoder
decoder_inputs = Input(shape=(None,))
dec_emb_layer = Embedding(y_voc_size, embedding_dim,trainable=True)
dec_emb = dec_emb_layer(decoder_inputs)
#LSTM using encoder_states as initial state
decoder_lstm = LSTM(latent_dim*2, return_sequences=True, return_state=True)
decoder_outputs,decoder_fwd_state, decoder_back_state = decoder_lstm(dec_emb,initial_state=encoder_states)
#from tensorflow.keras.layers import Attention
#Attention Layer
attention_layer = AttentionLayer()
attn_res, attn_weight = attention_layer([encoder_output1, decoder_outputs])
# Concat attention output and decoder LSTM output
decoder_concat_input = Concatenate(axis=-1, name='concat_layer')([decoder_outputs, attn_res])
#Dense layer
decoder_dense = TimeDistributed(Dense(y_voc_size, activation='softmax'))
decoder_outputs = decoder_dense(decoder_concat_input)
# model
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.summary()
# Compile
model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
checkpoint = ModelCheckpoint("/content/drive/My Drive/checkpoint.txt", monitor='val_accuracy')
early_stopping = EarlyStopping(monitor='val_accuracy', patience=5)
callbacks_list = [checkpoint, early_stopping]
# Training set
encoder_input_data = X_train
decoder_input_data = Y_train[:,:-1]
decoder_target_data = Y_train[:,1:]
# development set
encoder_input_test = X_test
decoder_input_test = Y_test[:,:-1]
decoder_target_test= Y_test[:,1:]
history = model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
                    epochs=50,
                    batch_size=64,
                    validation_data=([encoder_input_test, decoder_input_test], decoder_target_test),
                    callbacks=callbacks_list)
x_voc_size is 45701 and y_voc_size is 84213. There are approximately 45,000 records. I am getting a memory error while training this model with 35 GB of RAM. Even after reducing the batch size to 25, I get the same error. Please suggest how to get around this error.
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, 5515)] 0
__________________________________________________________________________________________________
embedding (Embedding) (None, 5515, 256) 11699456 input_1[0][0]
__________________________________________________________________________________________________
input_2 (InputLayer) [(None, None)] 0
__________________________________________________________________________________________________
bidirectional (Bidirectional) [(None, 5515, 1000), 3028000 embedding[0][0]
__________________________________________________________________________________________________
embedding_1 (Embedding) (None, None, 256) 21558528 input_2[0][0]
__________________________________________________________________________________________________
concatenate (Concatenate) (None, 1000) 0 bidirectional[0][1]
bidirectional[0][3]
__________________________________________________________________________________________________
concatenate_1 (Concatenate) (None, 1000) 0 bidirectional[0][2]
bidirectional[0][4]
__________________________________________________________________________________________________
lstm_1 (LSTM) [(None, None, 1000), 5028000 embedding_1[0][0]
concatenate[0][0]
concatenate_1[0][0]
__________________________________________________________________________________________________
attention_layer (AttentionLayer ((None, None, 1000), 2001000 bidirectional[0][0]
lstm_1[0][0]
__________________________________________________________________________________________________
concat_layer (Concatenate) (None, None, 2000) 0 lstm_1[0][0]
attention_layer[0][0]
__________________________________________________________________________________________________
time_distributed (TimeDistribut (None, None, 84213) 168510213 concat_layer[0][0]
==================================================================================================
Total params: 211,825,197
Trainable params: 211,825,197
Non-trainable params: 0
__________________________________________________________________________________________________
EDIT - This is the model's summary. I think the parameter count is huge. But how can I efficiently reduce the complexity of the model?
That's quite the model, and I bet if we talk it out a bit we can find something suitable for your use case. To give you an idea of where you stand: for something that big you would really want a cloud TPU cluster. I've been through most of the deeplearning.ai specializations, and they reach for a cloud TPU cluster somewhere between 5,000,000 and 13,000,000 parameters; a model like yours is the kind of thing that would be trained in a bigger corporate data center or national-lab environment. In lieu of that, it would be really worthwhile to look into transfer learning, since a large number of great models have already been trained in exactly those environments and you can piggyback off them for free. If you can bring the number of trainable parameters down to something like 3,000,000, you should find it much, much more amenable to your hardware. Please, let's turn this into a conversation so everyone gets to learn. Let me know your thoughts!
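One concrete place to start, sketched below with assumed names (the tokenizer and text lists are not from your code): cap the target vocabulary. At y_voc_size = 84213, the TimeDistributed(Dense) head alone is ~168.5M of your ~212M parameters, so shrinking the vocabulary to, say, the 20,000 most frequent words cuts that head roughly fourfold.
from tensorflow.keras.preprocessing.text import Tokenizer

# keep only the 20,000 most frequent target words; the rest map to <unk>
y_tokenizer = Tokenizer(num_words=20000, oov_token='<unk>')
y_tokenizer.fit_on_texts(train_target_texts)                # assumed list of target strings
Y_train_seq = y_tokenizer.texts_to_sequences(train_target_texts)
# the softmax head becomes 20000 * 2001 ≈ 40M weights instead of
# 84213 * 2001 ≈ 168.5M (2000 concatenated features + bias per word)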
I am new to RNNs and I am trying to implement an RNN architecture to classify protein sequences; essentially they are one-hot encoded NumPy arrays.
I have an issue that the data is very imbalanced:
Examples:
Total: 34909
Positive: 282 (0.81% of total)
Therefore I am planning to weight the different classes by passing the class_weight=class_weight parameter when the model is fitted.
I am also planning to use F1 on the validation set as a metric instead of accuracy or loss, as I am not interested in the true negatives (a sketch of both plans follows below).
Moreover, I am planning to use transfer learning: I have datasets with more positive data and datasets with only a few data points, so I plan to pretrain a general model and use its weights to further train on the specific problem.
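Here is the sketch (the weights follow the counts above; x_val and y_val are assumed hold-out arrays, and sklearn is assumed to be available):
from sklearn.metrics import f1_score
from tensorflow.keras.callbacks import Callback

# inverse-frequency class weights, as in the TensorFlow imbalanced-data tutorial
total, pos = 34909, 282
neg = total - pos
class_weight = {0: total / (2.0 * neg), 1: total / (2.0 * pos)}

class F1Callback(Callback):
    """Log F1 on a hold-out set after every epoch."""
    def __init__(self, x_val, y_val):
        super().__init__()
        self.x_val, self.y_val = x_val, y_val

    def on_epoch_end(self, epoch, logs=None):
        preds = (self.model.predict(self.x_val) > 0.5).astype(int).ravel()
        print(f' epoch {epoch}: val_f1 = {f1_score(self.y_val, preds):.3f}')

# usage: model.fit(..., class_weight=class_weight, callbacks=[F1Callback(x_val, y_val)])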
I have come up with this model architecture, however I am not sure whether adding 4 bidirectional LSTM layers is a wise choice:
from keras import regularizers
from keras.models import Sequential
from keras.layers import Bidirectional, LSTM, Dropout, Dense
from keras.initializers import Constant
from keras.optimizers import Adam
from keras.losses import BinaryCrossentropy

# output_bias, timesteps, features and metrics are defined elsewhere
if output_bias is not None:
    output_bias = Constant(output_bias)

model = Sequential()
# First LSTM layer
model.add(Bidirectional(LSTM(units=50, return_sequences=True, recurrent_dropout=0.1), input_shape=(timesteps, features)))
model.add(Dropout(0.5))
# Second LSTM layer
model.add(Bidirectional(LSTM(units=50, return_sequences=True)))
model.add(Dropout(0.5))
# Third LSTM layer
model.add(Bidirectional(LSTM(units=50, return_sequences=True)))
model.add(Dropout(0.5))
# Fourth LSTM layer
model.add(Bidirectional(LSTM(units=50, return_sequences=False)))
model.add(Dropout(0.5))
# First Dense layer
model.add(Dense(units=128, kernel_initializer='he_normal', activation='relu'))
model.add(Dropout(0.5))
# Adding the output layer
if output_bias is None:
    model.add(Dense(units=1, activation='sigmoid', kernel_regularizer=regularizers.l2(0.001)))
else:
    model.add(Dense(units=1, activation='sigmoid',
                    bias_initializer=output_bias, kernel_regularizer=regularizers.l2(0.001)))
model.compile(optimizer=Adam(lr=1e-3), loss=BinaryCrossentropy(), metrics=metrics)
model.build()
How do I know how many LSTM layers I should add? Is it just trial and error?
Is there anything else I should include in the layers?
model.summary():
Model: "sequential_4"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
bidirectional_13 (Bidirectio (None, 5, 100) 28400
_________________________________________________________________
dropout_16 (Dropout) (None, 5, 100) 0
_________________________________________________________________
bidirectional_14 (Bidirectio (None, 5, 100) 60400
_________________________________________________________________
dropout_17 (Dropout) (None, 5, 100) 0
_________________________________________________________________
bidirectional_15 (Bidirectio (None, 5, 100) 60400
_________________________________________________________________
dropout_18 (Dropout) (None, 5, 100) 0
_________________________________________________________________
bidirectional_16 (Bidirectio (None, 100) 60400
_________________________________________________________________
dropout_19 (Dropout) (None, 100) 0
_________________________________________________________________
dense_7 (Dense) (None, 128) 12928
_________________________________________________________________
dropout_20 (Dropout) (None, 128) 0
_________________________________________________________________
dense_8 (Dense) (None, 1) 129
=================================================================
Total params: 222,657
Trainable params: 222,657
Non-trainable params: 0
I have built this model by going through multiple tutorials such as https://www.tensorflow.org/tutorials/text/text_classification_rnn
and https://www.tensorflow.org/tutorials/structured_data/imbalanced_data
I would appreciate it if you could point me in the right direction.
Thanks!
Is it possible to train an image classifier network with an enormous number of classes (say 300k classes), with each class having a minimum of 10 images split between train/test/validation (i.e. >3 million 250x250x3 images)?
I have tried to train the dataset using the ResNet50 model with the batch size decreased to as low as 1, but still ran into OOM issues (2080 Ti). I found that the OOM is caused by having too many parameters, so I resorted to training the network on an extremely basic 10-layer model with a batch size of 1. It runs, but the speed/accuracy is unsurprisingly abysmal.
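A quick back-of-envelope check (assuming ResNet50's 2048-dimensional pooled features) suggests the classification head alone explains the OOM:
classes = 300_000
head_params = 2048 * classes + classes     # final Dense: weights + biases
print(head_params)                         # 614,700,000
# at float32, parameters + gradients + one optimizer slot:
print(head_params * 4 * 3 / 1e9, 'GB')     # ~7.4 GB before a single activation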
Is there any way I can divide the training set into smaller sections of classes, such that:
1st .h5 = classes 1 ~ 20,000
2nd .h5 = classes 20,001 ~ 40,000
3rd .h5 = classes 40,001 ~ 60,000, etc.
and later merging into a single h5 file that can be loaded to recognize all 300k different classes?
EDIT PER ASHISH'S SUGGESTION:
I have (I think) successfully merged 2 models into one, but the merged model has roughly double the number of layers...
Source code:
model1 = load_model('001.h5')
model2 = load_model('002.h5')
for layer in model1.layers:
    layer._name = layer._name + "_1"  # avoid duplicate layer names, which would otherwise throw an error
    layer.trainable = False
for layer in model2.layers:
    layer._name = layer._name + "_2"
    layer.trainable = False
x1 = model1.layers[-1].output
classes = x1.shape[1]
x1 = Dense(classes, activation='relu', name='out1')(x1)
x2 = model2.layers[-1].output
x2 = Dense(x2.shape[1], activation='relu', name='out2')(x2)
classes += x2.shape[1]
x = concatenate([x1, x2])
output_layer = Dense(classes, activation='softmax', name='combined_layer')(x)
new_model = Model(inputs=[model1.inputs, model2.inputs], outputs=output_layer)
new_model.summary()
new_model.save('new_model.h5', overwrite=True)
And the resulting model looks like this:
Model: "model"
_________________________________________________________________________
Layer (type) Output Shape Param # Connected to
=========================================================================
input_1_1 (InputLayer) [(None, 224, 224, 3) 0
_________________________________________________________________________
input_1_2 (InputLayer) [(None, 224, 224, 3) 0
_________________________________________________________________________
conv1_pad_1 (ZeroPadding2D) (None, 230, 230, 3) 0 input_1_1[0][0]
_________________________________________________________________________
conv1_pad_2 (ZeroPadding2D) (None, 230, 230, 3) 0 input_1_2[0][0]
_________________________________________________________________________
conv1_conv_1 (Conv2D) (None, 112, 112, 64) 9472 conv1_pad_1[0][0]
_________________________________________________________________________
conv1_conv_2 (Conv2D) (None, 112, 112, 64) 9472 conv1_pad_2[0][0]
...
...
conv5_block3_out_1 (Activation) (None, 7, 7, 2048) 0 conv5_block3_add_1[0][0]
_________________________________________________________________________
conv5_block3_out_2 (Activation) (None, 7, 7, 2048) 0 conv5_block3_add_2[0][0]
_________________________________________________________________________
avg_pool_1 (GlobalAveragePoolin (None, 2048) 0 conv5_block3_out_1[0][0]
_________________________________________________________________________
avg_pool_2 (GlobalAveragePoolin (None, 2048) 0 conv5_block3_out_2[0][0]
_________________________________________________________________________
probs_1 (Dense) (None, 953) 1952697 avg_pool_1[0][0]
_________________________________________________________________________
probs_2 (Dense) (None, 3891) 7972659 avg_pool_2[0][0]
_________________________________________________________________________
out1 (Dense) (None, 953) 909162 probs_1[0][0]
_________________________________________________________________________
out2 (Dense) (None, 3891) 15143772 probs_2[0][0]
_________________________________________________________________________
concatenate (Concatenate) (None, 4844) 0 out1[0][0]
out2[0][0]
_________________________________________________________________________
combined_layer (Dense) (None, 4844) 23469180 concatenate[0][0]
=========================================================================
Total params: 96,622,894
Trainable params: 39,522,114
Non-trainable params: 57,100,780
As you can see, all the layers have been doubled due to Model(inputs=[input1, input2]). That will cause problems for me later when I want to use this model to predict images. Is there any way I can do this without doubling all the previous layers, and just add the trailing dense layers? At this rate I'll be overloaded with the number of parameters even faster than before...
Technically it's possible. Since you have 3 classifiers (1.h5, 2.h5, 3.h5), you can load these models with their weights and then use the functional API in TensorFlow (https://www.tensorflow.org/guide/keras/functional), where the concatenate() API will combine the outputs of the 3 classifiers into a single vector; then use a few dense layers with an activation function to make the final prediction.
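On the doubled-layers issue from your edit, a hedged sketch (file names and shapes taken from your question, the rest assumed): feed one shared input tensor to both loaded models instead of giving the merged model two separate inputs. The two ResNet branches still keep their own weights, but the duplicated inputs and the per-layer renaming go away, since each loaded model is nested as a single layer:
from tensorflow.keras.layers import Input, Dense, concatenate
from tensorflow.keras.models import Model, load_model

model1 = load_model('001.h5')
model2 = load_model('002.h5')
model1._name, model2._name = 'branch_1', 'branch_2'  # only the top-level names must differ
model1.trainable = False
model2.trainable = False

inp = Input(shape=(224, 224, 3))                     # one input feeds both branches
merged = concatenate([model1(inp), model2(inp)])     # (None, 953 + 3891)
output = Dense(merged.shape[1], activation='softmax', name='combined_layer')(merged)
new_model = Model(inp, output)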
This is a simple example that reproduces my issue in a network I am trying to deploy.
I have an image input layer (which I need to maintain), then a Dense layer, a Conv2D layer and another Dense layer.
The idea is that the inputs are 10x10 images and the labels are 10x10 images. This is inspired by my code and this example.
import numpy as np
from keras.models import Model
from keras.layers import Input, Conv2D, Dense
#Building model
size=10
a = Input(shape=(size,size,1))
hidden = Dense(size)(a)
hidden = Conv2D(kernel_size = (3,3), filters = size*size, activation='relu', padding='same')(hidden)
outputs = Dense(size, activation='sigmoid')(hidden)
model = Model(inputs=a, outputs=outputs)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
#Create random data and accounting for 1 channel of data
n_images=55
data = np.random.randint(0,2,(n_images,size,size,1))
labels = np.random.randint(0,2,(n_images,size,size,1))
#Fit model
model.fit(data, labels, verbose=1, batch_size=10, epochs=20)
print(model.summary())
I get the following error: ValueError: Error when checking target: expected dense_92 to have shape (10, 10, 10) but got array with shape (10, 10, 1)
I don't get an error if I change:
outputs = Dense(size, activation='sigmoid')(hidden)
with:
outputs = Dense(1, activation='sigmoid')(hidden)
I have no idea how Dense(1) is even valid, or how it allows a 10x10 output signal, as model.summary() indicates:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_26 (InputLayer) (None, 10, 10, 1) 0
_________________________________________________________________
dense_93 (Dense) (None, 10, 10, 10) 20
_________________________________________________________________
conv2d_9 (Conv2D) (None, 10, 10, 100) 9100
_________________________________________________________________
dense_94 (Dense) (None, 10, 10, 1) 101
=================================================================
Total params: 9,221
Trainable params: 9,221
Non-trainable params: 0
_________________________________________________________________
None
Well, according to your comments:
what I am trying to do isn't standard. I have a set of images and for each image I want to find a binary image of the same size, where a pixel value of 1 means the feature exists in the input image
the insight whether a pixel has a feature should be taken both from local information (extracted by convolution layers) and global information extracted by Dense layers.
I guess you are looking to create a two-branch model where one branch consists of convolution layers and the other is simply one or more dense layers stacked on top of each other (although, I should mention that in my opinion a single convolutional network may achieve what you are looking for, because the combination of pooling and convolution layers, and then maybe some up-sampling layers at the end, somehow preserves both local and global information). To define such a model, you can use the Keras functional API like this:
from keras import models
from keras import layers
input_image = layers.Input(shape=(10, 10, 1))
# branch one: dense layers
b1 = layers.Flatten()(input_image)
b1 = layers.Dense(64, activation='relu')(b1)
b1_out = layers.Dense(32, activation='relu')(b1)
# branch two: conv + pooling layers
b2 = layers.Conv2D(32, (3,3), activation='relu')(input_image)
b2 = layers.MaxPooling2D((2,2))(b2)
b2 = layers.Conv2D(64, (3,3), activation='relu')(b2)
b2_out = layers.MaxPooling2D((2,2))(b2)
# merge two branches
flattened_b2 = layers.Flatten()(b2_out)
merged = layers.concatenate([b1_out, flattened_b2])
# add a final dense layer
output = layers.Dense(10*10, activation='sigmoid')(merged)
output = layers.Reshape((10,10))(output)
# create the model
model = models.Model(input_image, output)
model.compile(optimizer='rmsprop', loss='binary_crossentropy')
model.summary()
Model summary:
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, 10, 10, 1) 0
__________________________________________________________________________________________________
conv2d_1 (Conv2D) (None, 8, 8, 32) 320 input_1[0][0]
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D) (None, 4, 4, 32) 0 conv2d_1[0][0]
__________________________________________________________________________________________________
flatten_1 (Flatten) (None, 100) 0 input_1[0][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D) (None, 2, 2, 64) 18496 max_pooling2d_1[0][0]
__________________________________________________________________________________________________
dense_1 (Dense) (None, 64) 6464 flatten_1[0][0]
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D) (None, 1, 1, 64) 0 conv2d_2[0][0]
__________________________________________________________________________________________________
dense_2 (Dense) (None, 32) 2080 dense_1[0][0]
__________________________________________________________________________________________________
flatten_2 (Flatten) (None, 64) 0 max_pooling2d_2[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate) (None, 96) 0 dense_2[0][0]
flatten_2[0][0]
__________________________________________________________________________________________________
dense_3 (Dense) (None, 100) 9700 concatenate_1[0][0]
__________________________________________________________________________________________________
reshape_1 (Reshape) (None, 10, 10) 0 dense_3[0][0]
==================================================================================================
Total params: 37,060
Trainable params: 37,060
Non-trainable params: 0
__________________________________________________________________________________________________
Note that this is one way of achieving what you are looking for, and it may or may not work for the specific problem and data you have. You may modify this model (e.g. remove the pooling layers or add more dense layers) or use a completely different architecture with different kinds of layers (e.g. up-sampling, Conv2DTranspose) to reach a better accuracy. In the end, you must experiment to find what works.
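As a side note on the Dense(1) puzzle in your question: Dense in Keras applies to the last axis only, so Dense(1) maps (None, 10, 10, 100) to (None, 10, 10, 1) with one length-100 weight vector (plus a bias) shared across all 10x10 positions, which is exactly the 101 parameters in your summary. A minimal check:
from keras import layers, models

x = layers.Input(shape=(10, 10, 100))
y = layers.Dense(1)(x)                    # kernel shape (100, 1) plus one bias = 101 params
print(models.Model(x, y).output_shape)    # (None, 10, 10, 1)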
Edit:
For completeness, here is how to generate the data and fit the network:
n_images=10
data = np.random.randint(0,2,(n_images,size,size,1))
# the model's output shape is (None, 10, 10), so the labels must match it
labels = np.random.randint(0,2,(n_images,size,size))
model.fit(data, labels, verbose=1, batch_size=32, epochs=20)