Say I have a function F that takes a parameter vector P (say, a 5-element vector) and produces a numerical time series Y[t] of length T (e.g. T=100, so t=1,...,100). The function could be complicated (e.g. an enzyme reaction model).
I want to make a neural network that predicts the output (Y[t]) that would result from feeding a new parameter set (P') into the function. How can this be done?
A simple feed-forward network can work, but it requires a very large number of output nodes and doesn't take into account the temporal correlations/relationships between points. Is it possible, or better, to use an RNN or Transformer instead?
Using an RNN might work for you. Here is some example code in Keras to get you started:
import tensorflow as tf

param_length = 5
time_length = 100
hidden_size = 20

model = tf.keras.Sequential([
    # Encode the input parameters into a hidden state.
    tf.keras.layers.Dense(hidden_size, input_shape=[param_length]),
    # Repeat the hidden state to form a sequence of length time_length.
    tf.keras.layers.RepeatVector(time_length),
    # Generate the time series.
    tf.keras.layers.LSTM(32, return_sequences=True),
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(1))
])
model.compile(loss="mse", optimizer="nadam")
model.fit(train_x, train_y, validation_data=(val_x, val_y), epochs=10)
The first Dense layer converts the input parameters to a hidden state, RepeatVector repeats that state to form a sequence, and the LSTM units then generate the time series. You will need to experiment with hyperparameters such as the number of Dense and LSTM layers and the size of the hidden layers; a sketch of a deeper variant is shown below.
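For example, a deeper variant with stacked LSTM layers might look like the following (a rough sketch only; the layer sizes are illustrative, not tuned for your problem):

deeper_model = tf.keras.Sequential([
    tf.keras.layers.Dense(hidden_size, activation="relu", input_shape=[param_length]),
    tf.keras.layers.RepeatVector(time_length),
    # Stacked LSTM layers; each must return sequences so every time step is kept.
    tf.keras.layers.LSTM(64, return_sequences=True),
    tf.keras.layers.LSTM(32, return_sequences=True),
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(1))
])
deeper_model.compile(loss="mse", optimizer="nadam")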
One more thing you can try is a different loss function, e.g. the Huber loss, combined with early stopping:
early_stopping_cb = tf.keras.callbacks.EarlyStopping(
    monitor="val_mae", patience=50, restore_best_weights=True)

model.compile(loss=tf.keras.losses.Huber(), optimizer="nadam", metrics=["mae"])
history = model.fit(train_x, train_y, validation_data=(val_x, val_y), epochs=500,
                    callbacks=[early_stopping_cb])
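To build the training set in the first place, you can sample parameter vectors and run each one through your function F (a minimal sketch; F, the sampling ranges, and n_samples are stand-ins you would replace with your actual simulator and sensible values):

import numpy as np

n_samples = 10_000
# Sample parameter vectors; replace the ranges with ones sensible for your model.
params = np.random.uniform(0.0, 1.0, size=(n_samples, param_length))
# F(p) is assumed to return a length-time_length series for a 5-element parameter vector p.
series = np.array([F(p) for p in params])      # shape (n_samples, time_length)
series = series[..., np.newaxis]               # shape (n_samples, time_length, 1)

# Simple train/validation split.
split = int(0.9 * n_samples)
train_x, val_x = params[:split], params[split:]
train_y, val_y = series[:split], series[split:]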
I have an LSTM architecture ready:
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

input1 = Input(shape=(1500, 3))
lstm = LSTM(units=100, return_sequences=False, activation='relu')(input1)
outputs = Dense(150, activation="sigmoid")(lstm)

model = Model(inputs=input1, outputs=outputs)
model.compile(loss="binary_crossentropy", optimizer="adam",
              metrics=["accuracy"])
The LSTM layer supports a calling argument called mask.
The way I'm reading the data is by using two generators: one iterates through the training files and the other through the validation files (so I pass both generators to the .fit method).
model.fit(
    x=training_generator,
    epochs=10,
    steps_per_epoch=5,        # there are 5 training files
    validation_data=validation_generator,
    validation_steps=5,       # there are 5 validation files
    verbose=1
)
Each file therefore has its own mask (one for the training files, another for the validation files). My question is: how can I specify which mask to use?
The way I found to work was to transform the data during the preprocessing stage. If you replace the masked values in your data with a number you know does not otherwise occur in it, for instance 0 or -999, you can then add a Masking layer to the architecture. This layer has a parameter called mask_value, which must be the same number you used when transforming your data:
import tensorflow as tf
from tensorflow.keras.layers import Input, Masking, LSTM, Dense
from tensorflow.keras.models import Model

input1 = Input(shape=(n_timesteps, n_channels))
masking = Masking(mask_value=-999)(input1)
lstm1 = LSTM(units=100, return_sequences=False,
             activation="tanh")(masking)
outputs = Dense(n_timesteps, activation="sigmoid")(lstm1)

model = Model(inputs=input1, outputs=outputs)
model.compile(loss=tf.keras.losses.BinaryCrossentropy(),
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.01))
This way the mask computed by the Masking layer is propagated to the LSTM (LSTMs support masking; some other layer types do not). A sketch of the corresponding preprocessing step is shown below.
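As a rough sketch of that preprocessing step (assuming each file gives you a data array and a boolean mask of the same shape, where True marks positions to ignore; apply_mask and MASK_VALUE are hypothetical names):

import numpy as np

MASK_VALUE = -999   # must match mask_value in the Masking layer

def apply_mask(data, mask):
    # Replace masked positions with the sentinel value the Masking layer looks for.
    # Note: Keras' Masking skips a timestep only when ALL of its features equal mask_value.
    data = data.copy()
    data[mask] = MASK_VALUE
    return data

# e.g. inside each generator, before yielding a batch:
# yield apply_mask(batch_x, batch_mask), batch_y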
I have a collection of images with open and closed eyes.
The data is loaded from the current directory using Keras in this way:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

batch_size = 64
N_images = 84898  # total number of images
h, w = 90, 90     # image height and width (matches the model's input_shape below)

datagen = ImageDataGenerator(rescale=1./255)

data_iterator = datagen.flow_from_directory(
    './Eyes',
    shuffle=False,
    color_mode='grayscale',
    target_size=(h, w),
    batch_size=batch_size,
    class_mode='binary')
I've got a .csv file with the state of each eye.
I've built this Sequential model:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

num_filters = 8
filter_size = 3
pool_size = 2

model = Sequential([
    Conv2D(num_filters, filter_size, input_shape=(90, 90, 1)),
    MaxPooling2D(pool_size=pool_size),
    Flatten(),
    Dense(16, activation='relu'),
    Dense(2, activation='sigmoid'),  # two classes: one for "open" and one for "closed"
])
Model compilation.
model.compile(
    'adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
Finally I fit all the data with the following:
model.fit(
    train_images,
    to_categorical(train_labels),
    epochs=3,
    validation_data=(test_images, to_categorical(test_labels)),
)
The accuracy fluctuates around 50% and I do not understand why.
Your current model essentially has one convolutional layer. That is, num_filters convolutional filters (which in this case are 3 x 3 arrays) are defined and fit such that when they are convolved with the image, they produce features that are as discriminative as possible between classes. You then perform maxpooling to slightly reduce the dimension of the output CNN features before passing to 2 dense layers.
I'd start by saying that one convolutional layer is almost certainly insufficient, especially with 3x3 filters. Basically, with a single convolutional layer, the most meaningful information you can get are edges or lines. These features are only marginally more useful to a function approximator (i.e. your fully connected layers) than the raw pixel intensity values because they still have an extremely high degree of variability both within a class and between classes. Consider that shifting an image of an eye 2 pixels to the left would result in completely different values output from your 1-layer CNN. You'd like the outputs of your CNN to be invariant to scale, rotation, illumination, etc.
In practice, this means you're going to need more convolutional layers. The relatively simple VGG-16 network has 13 convolutional layers, and modern residual networks often have over 100. Try writing a routine that defines successively more complex networks until you start seeing performance gains, such as the sketch below.
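A rough sketch of such a routine, stacking progressively more Conv2D/MaxPooling2D blocks (build_cnn is a hypothetical helper and the filter counts are illustrative, not tuned):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

def build_cnn(n_blocks):
    # Build a CNN with n_blocks Conv/Pool blocks, doubling the filter count each block.
    model = Sequential()
    filters = 8
    for i in range(n_blocks):
        if i == 0:
            model.add(Conv2D(filters, 3, activation='relu', input_shape=(90, 90, 1)))
        else:
            model.add(Conv2D(filters, 3, activation='relu'))
        model.add(MaxPooling2D(2))
        filters *= 2
    model.add(Flatten())
    model.add(Dense(16, activation='relu'))
    model.add(Dense(2))  # raw scores (logits); see the point about the final activation below
    return model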
As a secondary point, you generally don't want a sigmoid() activation on your final layer outputs during training. It flattens the gradients and makes it much slower to backpropagate your loss. You don't actually care that the output values fall between 0 and 1, only about their relative magnitudes. Common practice is to use a cross-entropy loss that combines a log-softmax (whose gradient is more stable than a plain softmax) with the negative log-likelihood, as you are already doing. Since the log-softmax part already maps the outputs into the desired range, there is no need for the sigmoid activation.
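In Keras terms, that means leaving the final Dense layer without an activation and letting the loss apply the softmax internally via from_logits=True (a minimal sketch, reusing the hypothetical build_cnn helper from above):

import tensorflow as tf

model = build_cnn(n_blocks=3)   # final layer outputs raw logits
model.compile(
    optimizer='adam',
    # from_logits=True applies a numerically stable softmax inside the loss,
    # so no sigmoid/softmax is needed on the final Dense layer.
    loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
    metrics=['accuracy']
)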
Is there a way to determine the number of nodes and hidden layers based on the shape of the data?
Also, is there a way to determine the best activation function based on the problem?
For example, I'm making a model for fake news prediction. My features are the number of words in the text, the number of words in the title, the number of questions, the number of capital letters, etc.
My dataset has 22 features and around 35000 rows. My output should be 0 or 1.
Based on that, how many layers and nodes should I use and what activation functions are the best for this?
This is my net:
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras import optimizers

model = Sequential()
model.add(Dense(100, input_dim=features.shape[1], activation='relu'))  # input layer requires the input_dim param
model.add(Dense(100, activation='relu'))
model.add(Dense(100, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(1, activation='sigmoid'))  # sigmoid instead of relu for a final probability between 0 and 1

sgd = optimizers.SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss="mean_squared_error", optimizer=sgd, metrics=['accuracy'])

# Fit the network to the training data.
model.fit(x_train, y_train, epochs=10, shuffle=True, batch_size=32,
          validation_data=(x_test, y_test), verbose=1)

scores = model.evaluate(features, results)
print(model.metrics_names[1], scores[1]*100)
Selecting those requires prior experience; otherwise we wouldn't need so many ML engineers trying different architectures and writing papers.
But for a start I would recommend you take a look at AutoKeras. It will help with your problem, since it is a well-known type of problem (text classification): you only need to structure your data as inputs (X and Y) and feed it to the TextClassifier, which will try a number of different models (you can specify how many) and choose the one that best fits your case.
You can find more examples in the docs here: https://autokeras.com/tutorial/text_classification/
import autokeras as ak
# Initialize the text classifier.
clf = ak.TextClassifier(max_trials=10) # It tries 10 different models
# Feed the text classifier with training data.
clf.fit(x_train, y_train)
# Predict with the best model.
predicted_y = clf.predict(x_test)
# Evaluate the best model with testing data.
print(clf.evaluate(x_test, y_test))
The answer is no and no.
These are also hyperparameters. You can select a range of values and try them all to get a rough idea of which gives you the best result; the same holds for the activation function.
You can use more layers than you need and then add regularization to keep the model from overfitting, as in the sketch below. Conversely, if the network is too small, you will clearly see underfitting in the loss curve as a high training error.
There is no formula for determining any of this. You have to try different things for the problem at hand, and you will find that some of them work better than others.
For the output, a softmax layer would be good, as it gives you prediction probabilities that you can easily convert to a one-hot encoding.
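As a rough sketch of the "more layers plus regularization" idea applied to a network like yours (layer sizes, dropout rates, and the L2 factor are illustrative, not tuned; binary cross-entropy is used since the target is 0/1):

from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.regularizers import l2

model = Sequential()
model.add(Dense(100, input_dim=22, activation='relu', kernel_regularizer=l2(1e-4)))
model.add(Dropout(0.3))
model.add(Dense(100, activation='relu', kernel_regularizer=l2(1e-4)))
model.add(Dropout(0.3))
model.add(Dense(100, activation='relu', kernel_regularizer=l2(1e-4)))
model.add(Dropout(0.3))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])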
I'm trying to implement an LSTM model for DNA sequence classification, but at the moment it is unusable because of how long it takes to train (25 seconds per epoch over 6.5K sequences, about 4ms per sample, and we need to train several versions of the model over 100s of thousands of sequences).
A DNA sequence can be represented as a string of A, C, G, and T, e.g. "ACGGGTGACAT" could be a single DNA sequence. Each sequence belongs to one of two categories that I am trying to predict, and each sequence contains 1000 characters.
Initially, my model did not include an Embedding layer. Instead, I manually converted each sequence into a one-hot encoded matrix (4 rows by 1000 columns); the model didn't work great, but it was incredibly fast. At that point, though, I had seen online that using an embedding layer has clear advantages, so I added one, and instead of the one-hot encoded matrix I converted the sequences into integers, with each character represented by a different integer.
Indeed the model works much better now, but it is about 30 times slower and impossible to work with. Is there something I can do here to speed up the embedding layer?
Here are the functions for constructing and fitting my model:
from tensorflow.keras.layers import Embedding, Dense, LSTM, Activation
from tensorflow.keras import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical

def build_model():
    # initialize a sequential model
    model = Sequential()
    # add embedding layer
    model.add(Embedding(5, 1, input_length=1000, mask_zero=True))
    # add LSTM layer
    model.add(LSTM(5))
    # add Dense NN layer
    model.add(Dense(units=2))
    model.add(Activation('softmax'))
    optimizer = Adam(clipnorm=1.)
    model.compile(
        loss="categorical_crossentropy", optimizer=optimizer, metrics=['accuracy']
    )
    return model

def train_model(X_train, y_train, epochs, batch_size):
    model = build_model()
    # y_train is initially a list of zeroes and ones, needs to be converted to categorical
    y_train = to_categorical(y_train)
    history = model.fit(
        X_train, y_train, epochs=epochs, batch_size=batch_size
    )
    return model, history
Any help will be greatly appreciated - after much googling and trial-and-error, I can't seem to speed this up.
A possible suggestion is to use a "cheaper" RNN, such as SimpleRNN instead of LSTM, since it has fewer parameters to train. In some simple testing, I got a ~3x speed-up over the LSTM, with the same Embedding processing as you currently have. Not sure whether you can reduce the sequence length from 1000 to a lower number, but that might be a direction to explore as well; a minimal sketch of the swap is shown below. I hope this helps.
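For reference, the swap inside your build_model would look something like this (a minimal sketch; build_model_simple_rnn is a hypothetical name and everything else mirrors your existing code):

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense, Activation

def build_model_simple_rnn():
    model = Sequential()
    model.add(Embedding(5, 1, input_length=1000, mask_zero=True))
    # SimpleRNN has fewer parameters than an LSTM of the same size,
    # so each training step is cheaper.
    model.add(SimpleRNN(5))
    model.add(Dense(units=2))
    model.add(Activation('softmax'))
    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=['accuracy'])
    return model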
I'm trying to get my neural network to work, but unfortunately it looks like I am missing something.
I have input data from different categories, for example the type of a machine ('abc', 'bcd', 'dca'). So one line of my input contains words from several distinct word categories. At the moment I have ~70,000 samples with 12 features.
First I use sklearn's LabelEncoder to transform every word into a number. The vocabulary size goes up to 17,903.
My simple network looks like this:
import numpy as np
import tensorflow as tf

# Start with the NN
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(np.amax(ml_input) + 1, 300, input_length=x_train.shape[1]),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(500, activation=tf.keras.activations.softmax),
    tf.keras.layers.Dense(1, activation=tf.keras.activations.linear)
])
model.compile(optimizer=tf.keras.optimizers.Adam(lr=0.01),
              loss=tf.keras.losses.mean_absolute_error,
              metrics=[R_squared])  # R_squared is a custom metric defined elsewhere
model.summary()

# Train the model
callback = [tf.keras.callbacks.EarlyStopping(monitor='loss', min_delta=5.0, patience=15),
            tf.keras.callbacks.ReduceLROnPlateau(monitor='loss', factor=0.1, patience=5,
                                                 min_delta=5.00, min_lr=0)]
history = model.fit(x_train, y_train, epochs=50, batch_size=64, verbose=2, callbacks=callback)
The loss in the first epoch is about 120 and after two epochs about 70, but then it doesn't change anymore, so after two epochs my net isn't learning anymore.
I have already tried other loss functions, standardizing my labels (they range from 3 to 500 minutes), more neurons, another dense layer, and another activation function, but after two epochs the loss is always around 70. My R_squared is something like -0.02; it changes but always stays negative, close to 0.
It seems like my network isn't learning at all.
Does anyone have an idea of what I am doing wrong?
Thanks for your help!