I am using a Keras model for regression whose inputs are sensor measurements and whose output is the attitude of the sensor. The model consists of a CuDNNLSTM and a CNN. I need to reduce the number or range of outliers in the output.
The mean error is reasonable and low, but there are many outliers in the output. The mean error is around 1, but as you can see in the boxplot, I sometimes get errors of 180 (the maximum possible error).
The training data has no outliers and has been preprocessed beforehand.
How can I reduce the outliers in the output?
Are there any specific network topologies or layers that could handle this?
I tried normalizing the input and adding Gaussian noise, but neither had any impact on the number of outliers in the output. I also tried all the loss functions available to me (more than 38), and this is the best result.
The model is:
Acc = Input((window_size, 3), name='acc')
Gyro = Input((window_size, 3), name='gyro')
AGconcat = concatenate([Acc, Gyro], axis=2, name='AGconcat')

fs = Input((1,), name='fs')

ACNN = Conv1D(filters=133,
              kernel_size=11,
              padding='same',
              activation=tfa.activations.mish,
              name='ACNN')(Acc)
ACNN = Conv1D(filters=109,
              kernel_size=11,
              padding='same',
              activation=tfa.activations.mish,
              name='ACNN1')(ACNN)
ACNN = MaxPooling1D(pool_size=3,
                    name='MaxPooling1D')(ACNN)
ACNN = Flatten(name='ACNNF')(ACNN)

GCNN = Conv1D(filters=142,
              kernel_size=11,
              padding='same',
              activation=tfa.activations.mish,
              name='GCNN')(Gyro)
GCNN = Conv1D(filters=116,
              kernel_size=11,
              padding='same',
              activation=tfa.activations.mish,
              name='GCNN1')(GCNN)
GCNN = MaxPooling1D(pool_size=3,
                    name='GyroMaxPool1D')(GCNN)
GCNN = Flatten(name='GCNNF')(GCNN)

AGconLSTM = Bidirectional(CuDNNGRU(128, return_sequences=True,
                                   # return_state=True,
                                   go_backwards=True,
                                   name='BiLSTM1'))(AGconcat)
FlattenAG = Flatten(name='FlattenAG')(AGconLSTM)

AG = concatenate([ACNN, GCNN, FlattenAG])
AG = Dense(units=256,
           activation=tfa.activations.mish)(AG)
Fdense = Dense(units=256,
               activation=tfa.activations.mish,
               name='Fdense')(fs)
AG = Flatten(name='AGF')(AG)

x = concatenate([AG, Fdense])
x = Dense(units=256,
          activation=tfa.activations.mish)(x)
x = Flatten(name='output')(x)
output = Dense(4, activation='linear', name='quat')(x)
You can try weight decay and regularization. After the last line, you can also L2-normalize the quaternion output (here `output` is the tensor from the final Dense layer and `K` is the Keras backend):
output = Lambda(lambda x: K.l2_normalize(x, axis=1), name='QuatNorm')(output)
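As a concrete illustration of the weight-decay part of this suggestion, here is a minimal sketch. The 1e-4 factor is an illustrative value and `x` stands for the concatenated tensor feeding the dense head in the model above; neither is from the original answer.
import tensorflow_addons as tfa
from tensorflow.keras import regularizers
from tensorflow.keras.layers import Dense

# L2 weight decay on the dense head; tune the coefficient on validation data.
x = Dense(units=256,
          activation=tfa.activations.mish,
          kernel_regularizer=regularizers.l2(1e-4))(x)
output = Dense(4, activation='linear', name='quat',
               kernel_regularizer=regularizers.l2(1e-4))(x)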
I'm working on a project using Keras Model Subclassing to create a model with 2 inputs and 2 outputs. The training data for this model is essentially a dataset of other image classification datasets, with each image paired with its corresponding label; a dataset of datasets. One input of the network receives the label, the other receives the image.
train_img = generate_tensors(train, 0)
train_ans = generate_tensors(train, 1)
val_img = generate_tensors(val, 0)
val_ans = generate_tensors(val, 1)
train_img_b = train_img.batch(batch_size) # b for batched
train_ans_b = train_ans.batch(batch_size)
structuremodel = StructureModel()
hnet_output, anet_output = structuremodel([train_img_b, train_ans_b])
In the above code, I'm trying to perform a single forward propagation on my custom "StructureModel" class. "train_img" and "train_ans" are of shapes (None, 100, 224, 224, 1) and [insert shape] respectively. I have set the batch_size to 1.
The model itself is defined as follows:
class StructureModel(keras.Model):
    num_images = 100  # images per timestep
    resolution = [224, 224]
    hnet_pred_vars = 9
    anet_pred_vars = 25  # the thing on my whiteboard didnt include a stopping node
    alphabet = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!\"#$%&'()*+,-./:;<=>?#[\\]^_`{|}~ "

    def __init__(self):
        super().__init__()
        self.anet_layer = ArchitectureNet(self.anet_pred_vars)

    def call(self, inputs):
        # CNN-RNN/CNN-LSTM for processing images and corresponding answers
        # Copied VGG16 for structure
        # Image processing
        # shape=(timesteps, resolution, resolution, rgb channels)
        images = inputs[0]
        answers = inputs[1]
        x = TimeDistributed(Conv2D(filters=64, kernel_size=(3, 3), padding='same', activation='relu'))(images)
        x = TimeDistributed(Conv2D(filters=64, kernel_size=(3, 3), padding='same', activation='relu'))(x)
        x = TimeDistributed(MaxPooling2D(pool_size=(2, 2), strides=2))(x)
        filters_convs = [(128, 2), (256, 3), (512, 3), (512, 3)]
        for n_filters, n_convs in filters_convs:
            for _ in range(n_convs):
                x = TimeDistributed(Conv2D(filters=n_filters, kernel_size=(3, 3), padding='same', activation='relu'))(x)
            x = TimeDistributed(MaxPooling2D(pool_size=(2, 2), strides=2))(x)
        x = TimeDistributed(Flatten())(x)
        img_embed = TimeDistributed(Dense(units=1000), name='Image_Preprocessing')(x)
        # Answer embedding
        # Number of image-answer pairs, characters in answer, single character
        x = TimeDistributed(LSTM(units=500))(answers)  # All answers, shape (100, None, 95)
        answer_embed = TimeDistributed(Dense(units=1000), name='Answer_Preprocessing/Embed')(x)
        # Combines both models
        merge = Concatenate(axis=2)([img_embed, answer_embed])
        x = LSTM(units=100)(merge)
        dataset_embed = Dense(units=100, activation='relu', name='Dataset_Embed')(x)
        # hnet
        x = Dense(units=50)(dataset_embed)
        hnet_output = Dense(units=self.hnet_pred_vars, name='Hyperparameters')(x)
        # anet
        anet_output = self.anet_layer(dataset_embed)
        return hnet_output, anet_output
There's a lot of extra fluff in it, and I'm sure there are many other errors in the model, but the main one I care about is the TypeError that I keep receiving. Without resolving that, I can't get to debugging anything else. The error is as follows:
File ~\Documents\Programming\Python\HYPAT\NetworksV2.py:83 in call
x = TimeDistributed(Conv2D(filters=64, kernel_size=(3, 3), padding='same', activation='relu'))(images)
TypeError: Exception encountered when calling layer "structure_model_7" (type StructureModel).
'<' not supported between instances of 'NoneType' and 'int'
Call arguments received by layer "structure_model_7" (type StructureModel):
• inputs=['<BatchDataset element_spec=TensorSpec(shape=(None, 100, 224, 224, 1), dtype=tf.float32, name=None)>', '<BatchDataset element_spec=TensorSpec(shape=(None, 100, 2, 95), dtype=tf.float64, name=None)>']
If it would be of any use, here's the entirety of the code.
import keras
from keras.layers import TimeDistributed, Conv2D, Dense, MaxPooling2D, Flatten, LSTM, Concatenate
from tensorflow.keras.utils import plot_model
import pickle
import tqdm
import tensorflow as tf
from varname import nameof

# constants/hyperparameters
batch_size = 1
epochs = 10
train_test_split = 0.25

with open("datasets", "rb") as fp:
    datasets = pickle.load(fp)
class ArchitectureNet(keras.layers.Layer):
    def __init__(self, anet_pred_vars, **kwargs):
        super().__init__()
        self.anet_pred_vars = anet_pred_vars

        self.concat = Concatenate(axis=1)
        self.dense1 = Dense(units=50, activation='relu')
        self.dense2 = Dense(units=50, activation='relu')
        self.anet_output = Dense(units=self.anet_pred_vars, name='Architecture')
        self.stopping_node = Dense(units=1, activation='sigmoid')

    def call(self, prev_output, dataset_embed):
        x = self.concat([prev_output, dataset_embed])
        x = self.dense1(x)
        x = self.dense2(x)
        anet_output = self.anet_output(x)
        stop_node_output = self.stopping_node(x)
        print(tf.make_ndarray(stop_node_output))
        return anet_output
class StructureModel(keras.Model):
    num_images = 100  # images per timestep
    resolution = [224, 224]
    hnet_pred_vars = 9
    anet_pred_vars = 25  # the thing on my whiteboard didnt include a stopping node
    alphabet = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!\"#$%&'()*+,-./:;<=>?#[\\]^_`{|}~ "

    def __init__(self):
        super().__init__()
        self.anet_layer = ArchitectureNet(self.anet_pred_vars)

    def call(self, inputs):
        # CNN-RNN/CNN-LSTM for processing images and corresponding answers
        # Copied VGG16 for structure
        # Image processing
        # shape=(timesteps, resolution, resolution, rgb channels)
        images = inputs[0]
        answers = inputs[1]
        x = TimeDistributed(Conv2D(filters=64, kernel_size=(3, 3), padding='same', activation='relu'))(images)
        x = TimeDistributed(Conv2D(filters=64, kernel_size=(3, 3), padding='same', activation='relu'))(x)
        x = TimeDistributed(MaxPooling2D(pool_size=(2, 2), strides=2))(x)
        filters_convs = [(128, 2), (256, 3), (512, 3), (512, 3)]
        for n_filters, n_convs in filters_convs:
            for _ in range(n_convs):
                x = TimeDistributed(Conv2D(filters=n_filters, kernel_size=(3, 3), padding='same', activation='relu'))(x)
            x = TimeDistributed(MaxPooling2D(pool_size=(2, 2), strides=2))(x)
        x = TimeDistributed(Flatten())(x)
        img_embed = TimeDistributed(Dense(units=1000), name='Image_Preprocessing')(x)
        # Answer embedding
        # Number of image-answer pairs, characters in answer, single character
        x = TimeDistributed(LSTM(units=500))(answers)  # All answers, shape (100, None, 95)
        answer_embed = TimeDistributed(Dense(units=1000), name='Answer_Preprocessing/Embed')(x)
        # Combines both models
        merge = Concatenate(axis=2)([img_embed, answer_embed])
        x = LSTM(units=100)(merge)
        dataset_embed = Dense(units=100, activation='relu', name='Dataset_Embed')(x)
        # hnet
        x = Dense(units=50)(dataset_embed)
        hnet_output = Dense(units=self.hnet_pred_vars, name='Hyperparameters')(x)
        # anet
        anet_output = self.anet_layer(dataset_embed)
        return hnet_output, anet_output
    def compile(self):
        super().compile()


# Reserve 10,000 samples for validation
ratio = int(train_test_split * len(datasets))
val = datasets[:ratio]
train = datasets[ratio:]
if len(val) == 0:  # look at me mom i'm a real programmer
    raise IndexError('List "x_val" is empty; "train_test_split" is set too small')

# Prepare the training and testing datasets
def generate_tensors(data, img_or_ans):  # 0 for image, 1 for ans
    # technically the images aren't ragged arrays but for simplicity sake we'll keep them all as ragged tensors
    column = [i[img_or_ans] for i in data]
    tensor_data = tf.ragged.constant(column)
    tensor_data = tensor_data.to_tensor()
    tensor_dataset = tf.data.Dataset.from_tensor_slices(tensor_data)
    return tensor_dataset

train_img = generate_tensors(train, 0)
train_ans = generate_tensors(train, 1)
val_img = generate_tensors(val, 0)
val_ans = generate_tensors(val, 1)

# TODO: Test if CIFAR 100 dataset (which has variable length answers) will work
# train_dataset = tf.data.Dataset.zip((train_img, train_ans))
# train_dataset = train_dataset.batch(batch_size)
train_img_b = train_img.batch(batch_size)  # b for batched
train_ans_b = train_ans.batch(batch_size)

structuremodel = StructureModel()
hnet_output, anet_output = structuremodel([train_img_b, train_ans_b])

plot_model(StructureModel, to_file='aeu.png', show_shapes=True)
"""
for epoch in tqdm.trange(epochs, desc="Epochs"):
# Iterate over the batches of the dataset.
for step, (x_batch_train, y_batch_train) in tqdm(enumerate(train_dataset), leave=False):
# Open a GradientTape to record the operations run
# during the forward pass, which enables auto-differentiation.
with tf.GradientTape() as tape:
# Run the forward pass of the layer.
# The operations that the layer applies
# to its inputs are going to be recorded
# on the GradientTape.
# Logits for this minibatch
logits = model(x_batch_train, training=True)
# Compute the loss value for this minibatch.
loss_value = los5s_fn(y_batch_train, logits)
# Use the gradient tape to automatically retrieve
# the gradients of the trainable variables with respect to the loss.
grads = tape.gradient(loss_value, model.trainable_weights)
# Run one step of gradient descent by updating
# the value of the variables to minimize the loss.
optimizer.apply_gradients(zip(grads, model.trainable_weights))
# Log every 200 batches.
if step % 200 == 0:
print(
"Training loss (for one batch) at step %d: %.4f"
% (step, float(loss_value))
)
print("Seen so far: %s samples" % ((step + 1) * batch_size))
"""
You cannot feed tf.data.Datasets directly to keras layers. Try this:
dataset1 = tf.data.Dataset.from_tensor_slices(tf.random.uniform((5, 100, 224, 224, 1))).batch(1)
dataset2 = tf.data.Dataset.from_tensor_slices(tf.random.uniform((5, 100, 2, 95))).batch(1)
structuremodel = StructureModel()

for (x1, x2) in zip(dataset1.take(1), dataset2.take(1)):
    hnet_output, anet_output = structuremodel([x1, x2])
Note, however, that StructureModel is buggy, but I'm sure you know that.
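If you want to go beyond a single forward pass, one possible pattern (my own sketch, assuming the remaining bugs in StructureModel are fixed) is to zip the two datasets so each element yields a matched pair of concrete tensor batches:
# Hypothetical training-style iteration: zip the image and answer datasets.
train_dataset = tf.data.Dataset.zip((train_img, train_ans)).batch(batch_size)
for x_img, x_ans in train_dataset:
    # Each element is now a pair of real tensors, which the model can consume.
    hnet_output, anet_output = structuremodel([x_img, x_ans])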
I'm trying to improve the stability of my GAN model by adding a standard deviation variable to my layer's feature map. I'm following the example set in the GANs-in-Action GitHub repository. The math itself makes sense to me, as do the mechanics of my model and the reasons why this addresses mode collapse. However, a shortcoming of the example is that it never actually shows how this code is executed.
def minibatch_std_layer(layer, group_size=4):
    group_size = keras.backend.minimum(group_size, tf.shape(layer)[0])
    shape = list(keras.backend.int_shape(input))
    shape[0] = tf.shape(input)[0]
    minibatch = keras.backend.reshape(layer, (group_size, -1, shape[1], shape[2], shape[3]))
    minibatch -= tf.reduce_mean(minibatch, axis=0, keepdims=True)
    minibatch = tf.reduce_mean(keras.backend.square(minibatch), axis=0)
    minibatch = keras.backend.square(minibatch + 1e8)
    minibatch = tf.reduce_mean(minibatch, axis=[1, 2, 4], keepdims=True)
    minibatch = keras.backend.tile(minibatch, [group_size, 1, shape[2], shape[3]])
    return keras.backend.concatenate([layer, minibatch], axis=1)
def build_discriminator():
    const = ClipConstraint(0.01)
    discriminator_input = Input(shape=(4000, 3), batch_size=BATCH_SIZE, name='discriminator_input')
    x = discriminator_input

    x = Conv1D(64, 3, strides=1, padding="same", kernel_constraint=const)(x)
    x = BatchNormalization()(x)
    x = LeakyReLU(0.3)(x)
    x = Dropout(0.25)(x)

    x = Conv1D(128, 3, strides=2, padding="same", kernel_constraint=const)(x)
    x = LeakyReLU(0.3)(x)
    x = Dropout(0.25)(x)

    x = Conv1D(256, 3, strides=3, padding="same", kernel_constraint=const)(x)
    x = LeakyReLU(0.3)(x)
    x = Dropout(0.25)(x)

    # Trying to add it to the feature map here
    x = minibatch_std_layer(Conv1D(256, 3, strides=3, padding="same", kernel_constraint=const)(x))

    x = Flatten()(x)
    x = Dense(1000)(x)
    discriminator_output = Dense(1, activation='sigmoid')(x)
    return Model(discriminator_input, discriminator_output, name='discriminator_model')
d = build_discriminator()
No matter how I structure it, I can't get the discriminator to build. It keeps returning different kinds of AttributeError, and I've been unable to understand what it wants. Searching the issue, there are lots of Medium posts giving a high-level overview of what this does in a progressive GAN, but nothing I could find shows its application.
Does anyone have any suggestions on how to add the above code to a layer?
For those who want to use Minibatch Standard Deviation as a Keras layer, here is the code:
# mini-batch standard deviation layer
class MinibatchStdev(layers.Layer):
    def __init__(self, **kwargs):
        super(MinibatchStdev, self).__init__(**kwargs)

    # calculate the mean standard deviation across each pixel coord
    def call(self, inputs):
        mean = K.mean(inputs, axis=0, keepdims=True)
        mean_sq_diff = K.mean(K.square(inputs - mean), axis=0, keepdims=True) + 1e-8
        mean_pix = K.mean(K.sqrt(mean_sq_diff), keepdims=True)
        shape = K.shape(inputs)
        output = K.tile(mean_pix, [shape[0], shape[1], shape[2], 1])
        return K.concatenate([inputs, output], axis=-1)

    # define the output shape of the layer
    def compute_output_shape(self, input_shape):
        input_shape = list(input_shape)
        input_shape[-1] += 1
        return tuple(input_shape)
From: How to Train a Progressive Growing GAN in Keras for Synthesizing Faces
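As a usage sketch (my own illustration, assuming 4D image feature maps as in the progressive-GAN setting, with `layers` and `K` imported from Keras as above), the layer is simply dropped into a discriminator wherever you want the extra statistic channel:
from keras.models import Model
from keras.layers import Input, Conv2D, LeakyReLU, Flatten, Dense

inp = Input(shape=(64, 64, 3))
x = Conv2D(64, 3, padding='same')(inp)
x = LeakyReLU(0.2)(x)
x = MinibatchStdev()(x)  # appends the mini-batch stddev as one extra channel
x = Conv2D(64, 3, padding='same')(x)
x = Flatten()(x)
out = Dense(1, activation='sigmoid')(x)
critic = Model(inp, out)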
This is my proposal...
The problem is related to the minibatch_std_layer function. First of all, your network deals with 3D data while the original minibatch_std_layer deals with 4D data, so you need to adapt it. Secondly, the input variable defined in this function is unknown (also in the source code you cited), so I think the most obvious and logical solution is to treat it as the layer variable (the input of minibatch_std_layer). With this in mind, the modified minibatch_std_layer becomes:
def minibatch_std_layer(layer, group_size=4):
    group_size = K.minimum(4, layer.shape[0])
    shape = layer.shape
    minibatch = K.reshape(layer, (group_size, -1, shape[1], shape[2]))
    minibatch -= tf.reduce_mean(minibatch, axis=0, keepdims=True)
    minibatch = tf.reduce_mean(K.square(minibatch), axis=0)
    minibatch = K.square(minibatch + 1e-8)  # epsilon=1e-8
    minibatch = tf.reduce_mean(minibatch, axis=[1, 2], keepdims=True)
    minibatch = K.tile(minibatch, [group_size, 1, shape[2]])
    return K.concatenate([layer, minibatch], axis=1)
We can then put it inside our model like this:
def build_discriminator():
    # const = ClipConstraint(0.01)
    discriminator_input = Input(shape=(4000, 3), batch_size=32, name='discriminator_input')
    x = discriminator_input

    x = Conv1D(64, 3, strides=1, padding="same")(x)
    x = BatchNormalization()(x)
    x = LeakyReLU(0.3)(x)
    x = Dropout(0.25)(x)

    x = Conv1D(128, 3, strides=2, padding="same")(x)
    x = LeakyReLU(0.3)(x)
    x = Dropout(0.25)(x)

    x = Conv1D(256, 3, strides=3, padding="same")(x)
    x = LeakyReLU(0.3)(x)
    x = Dropout(0.25)(x)

    # Trying to add it to the feature map here
    x = Conv1D(256, 3, strides=3, padding="same")(x)
    x = Lambda(minibatch_std_layer)(x)

    x = Flatten()(x)
    x = Dense(1000)(x)
    discriminator_output = Dense(1, activation='sigmoid')(x)
    return Model(discriminator_input, discriminator_output, name='discriminator_model')
I don't know what ClipConstraint is, but it doesn't seem problematic. I ran the code with TF 2.2, but I also think it's quite easy to make it run with TF 1 (if you are using it). Here is the running code: https://colab.research.google.com/drive/1A6UNYkveuHPF7r4-XAe8MuCHZJ-1vcpl?usp=sharing
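For completeness, `ClipConstraint` in WGAN-style code is usually just a weight-clipping constraint along these lines (my assumption about what the asker's class does, not code from this thread):
from keras.constraints import Constraint
from keras import backend as K

class ClipConstraint(Constraint):
    # clip layer weights to [-clip_value, clip_value] after each update
    def __init__(self, clip_value):
        self.clip_value = clip_value

    def __call__(self, weights):
        return K.clip(weights, -self.clip_value, self.clip_value)

    def get_config(self):
        return {'clip_value': self.clip_value}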
I've been working on reproducing a CNN-LSTM model for PV power forecasting from the literature for the past four weeks for my Master's Thesis in Energy Science (http://www.mdpi.com/2076-3417/8/8/1286). However, I've been stuck on a seemingly simple issue: any configuration of the LSTM model that I've tried yields one of two things:
Ridiculous output that makes no sense whatsoever (flat line, complete stochasticity, negative values, you name it)
Exactly the same (very believable) PV power forecast for every input.
I've done my best to reproduce the issue with as little code as possible:
import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.layers import *
from tensorflow.keras.models import Sequential
from tensorflow.python.keras.layers import CuDNNLSTM
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
from time import time
SUN_UP, SUN_DOWN = '03:00:00', '23:00:00'
df = pd.read_csv('../Model_Xander/CNN-LSTM-wang/pv_data/all_data_resample-15T_interpolate-4.csv',
                 index_col=0,
                 parse_dates=True)
df = pd.DataFrame(df['151'])
df = df.between_time(SUN_UP, SUN_DOWN)
TIME_STEPS_PER_DAY = len(df.loc['1-1-2016'])
print('each day consists of ' + str(TIME_STEPS_PER_DAY) + ' time steps of 15 minutes')
df = df.values
scaler = StandardScaler()
df_scaled = scaler.fit_transform(df)
df = np.nan_to_num(df_scaled, nan = -1)
#df = np.float16(df)
def multivariate_data(dataset, target, start_index, end_index, history_size,
                      target_size, step, single_step=False):
    data = []
    labels = []
    start_index = start_index + history_size
    if end_index is None:
        end_index = len(dataset) - target_size
    for i in range(start_index, end_index, step):
        indices = range(i-history_size, i)
        data.append(dataset[indices])
        if single_step:
            labels.append(target[i+target_size])
        else:
            labels.append(target[i:i+target_size])
    return np.array(data), np.array(labels)
TRAIN_TEST_SPLIT = round(((2/3)*len(df)))
TARGET_COL = df[:,0]
HISTORY_SIZE = TIME_STEPS_PER_DAY * 10
TARGET_SIZE = TIME_STEPS_PER_DAY
STEP = TIME_STEPS_PER_DAY
x_train, y_train = multivariate_data(df, TARGET_COL, 0, TRAIN_TEST_SPLIT, HISTORY_SIZE, TARGET_SIZE, STEP)
x_test, y_test = multivariate_data(df, TARGET_COL, TRAIN_TEST_SPLIT, None, HISTORY_SIZE, TARGET_SIZE, STEP)
lstm = Sequential()
lstm.add(Input(shape=(x_train.shape[1], x_train.shape[2])))
lstm.add(Masking(mask_value=-1))
lstm.add(LSTM(units=100,
              kernel_initializer=keras.initializers.Orthogonal(),
              bias_initializer=keras.initializers.Constant(value=0.1),
              return_sequences=True))
lstm.add(LSTM(units=100,
              kernel_initializer=keras.initializers.Orthogonal(),
              bias_initializer=keras.initializers.Constant(value=0.1),
              return_sequences=False))
lstm.add(Dense(units=100, activation='relu',
               kernel_initializer=keras.initializers.TruncatedNormal(mean=0, stddev=0.05),
               bias_initializer=keras.initializers.Constant(value=0.1)))
lstm.add(Dense(units=y_test.shape[1], activation='relu',
               kernel_initializer=keras.initializers.TruncatedNormal(mean=0, stddev=0.05),
               bias_initializer=keras.initializers.Constant(value=0.1)))
lstm.compile(loss='mse', optimizer='adam')
lstm.summary()

begin = time()
history = lstm.fit(x_train, y_train,
                   epochs=5,
                   batch_size=24,
                   validation_data=(x_test, y_test),
                   verbose=1,
                   shuffle=False)
end = time()
print('it took ' + str(round(end-begin)) + ' seconds to train 5 epochs')
print(history.history)
predict = lstm.predict(x_test)
print(predict.shape)
plt.figure()
for i in range(10, 20):
    plt.plot(predict[i, :])

plt.figure()
for i in range(0, x_test.shape[0]):
    plt.plot(predict[i, :])
The problem is clearly seen in the last plot:
Plot of 350 predictions overlaid on top of one another
As you can see, all forecasts are identical. I have run out of ideas on how to combat this issue.
As far as I could deduce, there are a number of possible causes. First, my dataset contains a large number of NaNs. I've done my best to combat that issue with three methods:
Resampling from very high resolution (10 seconds) to standard resolution (15 min)
Interpolating up to 4 consecutive NaNs with linear interpolation (any more seems stupid to me)
The masking layer an observant reader might have noticed in the model definition in the code
Even after these steps, my dataset still contains a large number of NaNs. I'm not really sure what to do about it, or whether the Masking layer is even doing its intended job. I do know for sure that the masking layer cannot play nicely with CuDNNLSTM, and my normal LSTM model runs a lot slower with the masking layer.
The best I've been able to accomplish in terms of obtaining differently shaped predictions for differently shaped inputs is this: Differently shaped output for differently shaped inputs. However, as you can see, this is just the same shape with a slightly different amplitude.
Another thing I've noticed is that when I input data from 9 other sensors as features (each with a similar amount and location of NaNs), the amplitude changes per prediction (yay), but the shape remains the same across all predictions: yay different amplitude! Aww, same shape :(.
I will be uploading my model to my university's cluster (for the 200th time) to train for more than 5 epochs; who knows, maybe today is my lucky day. If anyone knows how to combat these issues, I would be very glad and thankful to hear your thoughts.
EDIT:
In light of the lessons learned from the response below, I made the following changes:
Regularization and dropout to combat overfitting (which will lead to the average being forecast for every input if left unchecked)
Last LSTM layer with return_sequences = True
Added a Flatten layer after the last LSTM layer
Removed NaN values from my dataset, removing the need for the masking layer and enabling the use of the CuDNNLSTM layer (training on the GPU, if I understand it correctly)
However, now that each day has a unique forecast, I noticed that increasing the number of units in the LSTM layers beyond somewhere between 20 and 50 (I tested 20 and 50) brings back the problem of each day having the exact same forecast. I am still stumped as to why this is. (See below for the model I used to produce unique forecasts for each day.)
lstm = Sequential()
lstm.add(Input(shape=(x_train.shape[1], x_train.shape[2])))
lstm.add(CuDNNLSTM(units=50,
                   kernel_initializer=keras.initializers.Orthogonal(),
                   kernel_regularizer=keras.regularizers.l1_l2(l1=1e-5, l2=1e-4),
                   bias_initializer=keras.initializers.Constant(value=0.1),
                   return_sequences=True))
# lstm.add(Dropout(rate=0.2))
lstm.add(CuDNNLSTM(units=50,
                   kernel_initializer=keras.initializers.Orthogonal(),
                   kernel_regularizer=keras.regularizers.l1_l2(l1=1e-5, l2=1e-4),
                   bias_initializer=keras.initializers.Constant(value=0.1),
                   return_sequences=True))
lstm.add(Dropout(rate=0.2))
lstm.add(Flatten())
lstm.add(Dense(units=int(0.5*x_train.shape[1]), activation='relu',
               kernel_initializer=keras.initializers.TruncatedNormal(mean=0, stddev=0.05),
               bias_initializer=keras.initializers.Constant(value=0.1)))
lstm.add(Dropout(rate=0.2))
lstm.add(Dense(units=y_test.shape[1], activation='relu',
               kernel_initializer=keras.initializers.TruncatedNormal(mean=0, stddev=0.05),
               bias_initializer=keras.initializers.Constant(value=0.1)))
lstm.compile(loss='mse', optimizer='adam')
lstm.summary()
I have a CNN with a concatenate layer at the beginning of the dense network. I use layers.concatenate() to merge the features extracted by the CNN with my hand-crafted features. These two arrays have two different scales (the first contains values computed by the CNN, the other contains features computed by me), so I would like a way to normalize these two types of data to a common scale to simplify the job. This is my CNN:
emb = Embedding()(image_input)
conv = Conv1D(64, 2, activation='relu', strides=1, padding='same')(emb)
conv = MaxPooling1D()(conv)
first_part_output = Flatten()(conv)
merged_model = layers.concatenate([first_part_output, other_data_input])
Here I would do the normalization:
normal = *here I would do the normalization*
primoDense = Dense(256, activation='relu')(normal)
drop = Dropout(0.45)(primoDense)
predictions = Dense(1, activation='sigmoid')(drop)
[first_part_output, other_data_input] is my new merged array.
first_part_output contains the CNN features.
other_data_input contains my hand-crafted features.
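One possible way to bring both branches onto a comparable scale (my own sketch, not part of the original question) is to normalize each branch separately before concatenating, for example with BatchNormalization:
from keras.layers import BatchNormalization

# Normalize each branch so CNN features and hand-crafted features
# have comparable statistics before they are merged.
cnn_branch = BatchNormalization()(first_part_output)
handcrafted_branch = BatchNormalization()(other_data_input)
merged_model = layers.concatenate([cnn_branch, handcrafted_branch])

primoDense = Dense(256, activation='relu')(merged_model)
drop = Dropout(0.45)(primoDense)
predictions = Dense(1, activation='sigmoid')(drop)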
I want to program a neural network, and I'm using the Keras library for it. One dataset is divided into a random number of subsets (1-100). Unused subsets are set to zero. One subset consists of 2*4+1 binary input values. The architecture should look like this (the weights of all subset networks should be shared):
. InA1(4) InB1(4) _
. \ / \
. FCNA FCNB |
. \ / |
. Concatinate |
. | \ 100x (InA2, InB2, InC2, InA3, ...)
. FCN /
.InC(1) | |
. \ / |
. \ / _/
. Concatinate
. |
. FCN
. |
. Out(1)
I have looked through a number of tutorials and examples, but I don't find a proper method to implement that network. Here is what I have tried so far:
from keras import *

# define arrays for training set input
InA = []
InB = []
InC = []
for i in range(100):
    InA.append(Input(shape=(4,), dtype='int32'))
    InB.append(Input(shape=(4,), dtype='int32'))
    InC.append(Input(shape=(1,), dtype='int32'))

NetA = Sequential()
NetA.add(Dense(4, input_shape=(4,), activation="relu"))
NetA.add(Dense(3, activation="relu"))

NetB = Sequential()
NetB.add(Dense(4, input_shape=(4,), activation="relu"))
NetB.add(Dense(3, activation="relu"))

NetMergeAB = Sequential()
NetMergeAB.add(Dense(1, input_shape=(3, 2), activation="relu"))

# merging all subsample networks of InA, InB
MergeList = []
for i in range(100):
    NetConcat = Concatenate()([NetA(InA[i]), NetB(InB[i])])
    MergedNode = NetMergeAB(NetConcat)
    MergeList.append(MergedNode)
    MergeList.append(InC[i])

# merging also InC
FullConcat = Concatenate()(MergeList)

# put in fully connected net
ConcatNet = Sequential()
ConcatNet.add(Dense(10, input_shape=(2, 100), activation="relu"))
ConcatNet.add(Dense(6, activation="relu"))
ConcatNet.add(Dense(4, activation="relu"))
ConcatNet.add(Dense(1, activation="relu"))
Output = ConcatNet(FullConcat)
The problem is that either I get a "no Tensor" error, or it doesn't work at all. Does anyone have an idea how to solve this properly?
You can achieve that network architecture easily with the functional API and not use Sequential at all:
InA, InB, InC = [Input(shape=(4,), dtype='int32') for _ in range(3)]
netA = Dense(4, activation="relu")(InA)
netA = Dense(3, activation="relu")(netA)
netB = Dense(4, activation="relu")(InB)
netB = Dense(3, activation="relu")(netB)
netMergeAB = concatenate([netA, netB])
netMergeAB = Dense(1, activation="relu")(netMergeAB)
fullConcat = concatenate([netMergeAB, InC])
out = Dense(10, activation="relu")(fullConcat)
out = Dense(6, activation="relu")(out)
out = Dense(4, activation="relu")(out)
out = Dense(1, activation="relu")(out)
model = Model([InA, InB, InC], out)
You might need to adjust it slightly but the overall idea should be clear.
Using the code from the question author's answer:
ActInA = Input(shape=(4,), dtype='int32')
ActInB = Input(shape=(4,), dtype='int32')
ActInC = Input(shape=(1,), dtype='int32')
NetA = Dense(4, activation="relu")(ActInA)
NetA = Dense(3, activation="relu")(NetA)
NetB = Dense(4, activation="relu")(ActInB)
NetB = Dense(3, activation="relu")(NetB)
NetAB = concatenate([NetA, NetB])
NetAB = Dense(1, activation="relu")(NetAB)
Now we build a model for this subset of the net:
mymodel = Model([ActInA, ActInB], NetAB)
Now the important part from the Keras docs:
All models are callable, just like layers.
This means you can simply do something like this:
for i in range(100):
    NetMergeABC.append(mymodel([ActInA_array[i], ActInB_array[i]]))
Because you reuse the layers, the weights will be shared.
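Putting the pieces together, a full shared-weight construction could look like the sketch below (my own assembly with illustrative layer sizes; the inputs are declared as float so the Dense layers accept them directly):
from keras.layers import Input, Dense, concatenate
from keras.models import Model

# Shared sub-network: built once, then called on every subset,
# so all 100 subsets reuse the same weights.
subA = Input(shape=(4,))
subB = Input(shape=(4,))
a = Dense(4, activation="relu")(subA)
a = Dense(3, activation="relu")(a)
b = Dense(4, activation="relu")(subB)
b = Dense(3, activation="relu")(b)
ab = Dense(1, activation="relu")(concatenate([a, b]))
shared_ab = Model([subA, subB], ab)

# One fresh Input per subset, all routed through the same shared model.
inputs, merged = [], []
for _ in range(100):
    InA = Input(shape=(4,))
    InB = Input(shape=(4,))
    InC = Input(shape=(1,))
    inputs += [InA, InB, InC]
    merged += [shared_ab([InA, InB]), InC]

x = concatenate(merged)
x = Dense(10, activation="relu")(x)
x = Dense(6, activation="relu")(x)
x = Dense(4, activation="relu")(x)
out = Dense(1, activation="relu")(x)
model = Model(inputs, out)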
I have changed my code, and I hope that it is clearer now:
NetMergeABC = []
for i in range(100):
    ActInA = Input(shape=(4,), dtype='int32')
    ActInB = Input(shape=(4,), dtype='int32')
    ActInC = Input(shape=(1,), dtype='int32')
    NetA = Dense(4, activation="relu")(ActInA)
    NetA = Dense(3, activation="relu")(NetA)
    NetB = Dense(4, activation="relu")(ActInB)
    NetB = Dense(3, activation="relu")(NetB)
    NetAB = concatenate([NetA, NetB])
    NetAB = Dense(1, activation="relu")(NetAB)
    NetMergeABC.append(NetAB)
    NetMergeABC.append(ActInC)

NetABC = concatenate(NetMergeABC)
NetABC = Dense(10, activation="relu")(NetABC)
NetABC = Dense(6, activation="relu")(NetABC)
NetABC = Dense(4, activation="relu")(NetABC)
NetABC = Dense(1, activation="relu")(NetABC)
The problem now is that (I guess) the weights of NetA/B/C 1-100 aren't shared.