Teach LSTMs concept of different frequencies - python

Training an LSTM on a sine wave is simple enough. The LSTM easily understands how to predict the wave hundreds of timesteps into the future.
However, the LSTM becomes grossly incompetent once I try to have it predict a sine wave of a different frequency. Even if I try to train it on many waves of different frequencies, it forgets the first wave to learn the next one. An example of how my LSTM fails when I changed the frequency of the test data:
How do I train my LSTM to recognize the concept of frequency and work on any sinusoid?
Edit:
The model that I am using:
inputs = Input(shape=(self.timesteps, self.features))
bd_seq = Bidirectional(LSTM(128, return_sequences=True,
                            kernel_regularizer='l2'),
                       merge_mode='sum')(inputs)
bd_sin = Bidirectional(LSTM(32, return_sequences=True,
                            kernel_regularizer='l2'),
                       merge_mode='sum')(bd_seq)
bd_1 = Bidirectional(LSTM(self.features, activation='linear'),
                     merge_mode='sum')(bd_seq)
bd_2 = Bidirectional(LSTM(self.features, activation='tanh'),
                     merge_mode='sum')(bd_sin)
output = Add()([bd_1, bd_2])
self.model = Model(inputs=inputs, outputs=output)

"...it forgets the first wave to learn the next one..."
This makes me think... are you training one sequence, then another one, then another??
That will fail, naturally, for any kind of problem with any model.
You must train lots of sequences in the same batch, or, if one sequence at a time, never more than once per epoch.
freqs = list_of_frequencies
sinusoids = []
for freq in freqs:
    sinusoids.append(create_a_sinusoid(freq))
# reshape needs the number of frequencies, not the list itself
training_data = np.array(sinusoids).reshape((len(freqs), timesteps, features))
Possible tricks to help the model:
Add the frequency as a feature (for all steps) in the input data (if you know it as input)
Make the model output the frequency (if you know it as output) and train it on frequencies.
You may combine a model that identifies frequencies with a model that reads these frequencies to predict the desired outputs.
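The batching advice and the first trick can be sketched together. This is a minimal sketch using NumPy only; `make_sinusoid`, the frequency list, the sampling step, and the array sizes are illustrative placeholders and not part of the original answer.

```python
import numpy as np

def make_sinusoid(freq, timesteps, dt=0.01):
    """Return one sine wave of the given frequency (Hz), sampled every dt seconds."""
    t = np.arange(timesteps) * dt
    return np.sin(2 * np.pi * freq * t)

freqs = [1.0, 2.0, 5.0, 10.0]  # hypothetical frequencies
timesteps = 200

# One row per frequency: shape (len(freqs), timesteps)
waves = np.stack([make_sinusoid(f, timesteps) for f in freqs])

# Repeat each frequency along the time axis so it can serve as a second feature
freq_feature = np.repeat(np.array(freqs)[:, None], timesteps, axis=1)

# Final shape: (samples, timesteps, features) with features = [value, frequency]
training_data = np.stack([waves, freq_feature], axis=-1)

# Shuffle the samples so the frequencies are not presented in a fixed order
rng = np.random.default_rng(0)
training_data = training_data[rng.permutation(len(freqs))]

print(training_data.shape)  # (4, 200, 2)
```

All frequencies end up in one array, so a single call to fit sees every wave in each epoch instead of learning them one after another.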

Related

Why is my Keras model predicting trend but not scale?

I'm making a model to predict the irradiance value on a solar field. My model, despite being very simple (code below), performs very well. The problem is that, for some reason, it predicts on a different scale, giving almost always lower values but following the same trend. I have appended plots that compare the model outputs with the real data on the train and test sets, and linked the dataset.
Some details: The data has a total of 24 columns, which correspond to 24 pyranometers, the instruments that provide the information about the sun. The model has been trained on just the first one for simplicity, so with more data we can achieve better performance. Also, I'm processing my data to look 15 steps back in time and predict a window of 20 steps forward.
input = Input((LAG,1)) # LAG is the number of steps I take backward
hidden = LSTM(32, return_sequences=True)(input)
output = Dense(1, activation='linear')(hidden)
model = Model(input, output)
Dataset
Model output vs real in train set
Model output vs real in test set

Implementation of Gradient-Reversal-layer into a functioning keras model for multiclassification

My question is about the practical implementation of "Domain Adaptation" into a functional model in keras with tensorflow backend.
Description of the problem:
I have a collection of particle collision samples which consist of n variables. One half of them is simulated data with certain class labels (e.g. "W-Boson"). The other half is real collision data which is not labeled. The key idea now is to set up a keras model which has two outputs: one for classifying the class of a sample and one for classifying the domain, that is, whether it is simulated or real data. The model shall be trained so that the domain classifier performs very poorly. This is achieved by flipping the sign of the incoming gradient from the domain end of the network during training. This technique is called "Domain Adaptation". The model is expected to be trained to find domain-invariant features, or in other words, to perform the same on simulated and real collision data.
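To make the sign flip concrete, here is a framework-free illustration in NumPy. It is only a sketch of the idea, not the flipGradientTF implementation; the class and its forward/backward methods are hypothetical, and the name hp_lambda is borrowed from the prototype in this question.

```python
import numpy as np

class GradientReversal:
    """Identity in the forward pass; scales the gradient by -hp_lambda in the backward pass."""

    def __init__(self, hp_lambda=0.3):
        self.hp_lambda = hp_lambda

    def forward(self, x):
        # Features pass through unchanged
        return x

    def backward(self, grad_output):
        # Gradient coming from the domain head is flipped and scaled
        return -self.hp_lambda * grad_output

layer = GradientReversal(hp_lambda=0.3)
features = np.array([0.5, -1.0, 2.0])
grad_from_domain_head = np.array([1.0, 1.0, -2.0])

out = layer.forward(features)
grad_to_shared_layers = layer.backward(grad_from_domain_head)
print(out)
print(grad_to_shared_layers)
```

The shared layers therefore receive an update that makes the domain classifier's job harder, while the domain head itself is still trained normally.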
The framework I am working with has an existing functional keras model, which I wanted to expand with said domain classifier. This is a prototype I came up with:
# common layers
inputs = keras.Input(shape=(n_variables, ))
X = layers.Dense(units=50, activation="relu")(inputs)
# domain end
flip_layer = flipGradientTF.GradientReversal(hp_lambda=0.3)(X)
X_domain = layers.Dense(units=50, activation="relu")(flip_layer)
domain_out = layers.Dense(units=2, activation="softmax", name="domain_out")(X_domain)
# class end
X_class = layers.Dense(units=50, activation="relu")(X)
class_out = layers.Dense(units=n_classes, activation="softmax", name="class_out")(X_class)
The code for flipGradientTF is taken from https://github.com/michetonu/gradient_reversal_keras_tf
And further on for compiling and training the model:
model = keras.Model(inputs=inputs, outputs=[class_out, domain_out])
model.compile(optimizer="adam", loss=loss_function, metrics="accuracy")
# train model
model.fit(
    x=train_data,
    y=[train_class_labels, train_domain_labels],
    batch_size=200,
    epochs=200,
    sample_weight={"class_out": class_weights, "domain_out": None}
)
For train_data I am passing the dataframe, which consists of the data from both domains. Since I have tried both "categorical_crossentropy" and "sparse_categorical_crossentropy" as the loss_function, train_class_labels and train_domain_labels were either in the one-hot representation or in the integer representation. My biggest issue is figuring out what to use for the class labels of the unlabeled data, and this led to a gut feeling that I am on the wrong track here.
So in a nutshell:
Is this implementation strategy legit and assuming it is, what should I do about the class labels for the unlabeled data? And if it is not legit, what would be a better way of attacking this problem?
Any help would be much appreciated :)

Keras LSTM appears to be fitting the end of time-series input instead of the prediction target

To preface this, I have plenty of experience with python and moderate experience building and using machine learning networks. That being said, this is the first LSTM I have made aside from some of the cookie-cutter examples available, so any help is appreciated. I feel like this is a problem with a simple solution and that I have just been looking at this code for far too long to see it.
This model is made in a python3.5 venv using Keras with a tensorflow backend.
In short, I am trying to make predictions of some temporal data using the data itself as well as a few mathematical permutations of this data, creating four input features. I am building a time-series input from the prior 60 data points and specifying the prediction target to be 60 data points in the future.
Shape of complete training data (input)(target): (2476224, 60, 4) (2476224)
Shape of single data "point" (input)(target): (1, 60, 4) (1)
What appears to be happening is that the trained model has fit the trailing value of my input time-series (the current value) instead of the target I have provided it (60 cycles in the future).
What is interesting is that the loss function seems to be calculating according to the correct prediction target, yet the model is not converging to the proper solution.
I have no idea why the model should be doing this. My first thought was that I was preprocessing my data incorrectly and feeding it the wrong target. I have tested my input formatting of the data extensively and am pretty confident that I am providing the model with the correct target and input information.
In one instance, I had increased the learning rate a tad such that the model converged to a local minimum. The testing loss of this convergence was very similar to the loss with my preferred learning rate (still quite high). But the predictions were still of the "current value". Why is this so?
Here is how I created my model:
def create_model():
    lstm_model = Sequential()
    lstm_model.add(CuDNNLSTM(100, batch_input_shape=(batch_size, time_step, train_input.shape[2]),
                             stateful=True, return_sequences=True,
                             kernel_initializer='random_uniform'))
    lstm_model.add(Dropout(0.4))
    lstm_model.add(CuDNNLSTM(60))
    lstm_model.add(Dropout(0.4))
    lstm_model.add(Dense(20, activation='relu'))
    lstm_model.add(Dense(1, activation='linear'))
    optimizer = optimizers.Adagrad(lr=params["lr"])
    lstm_model.compile(loss='mean_squared_error', optimizer=optimizer)
    return lstm_model
This is how I am pre-processing the data. The first function, build_timeseries, constructs my input-output pairs. I believe this is working correctly (but please correct me if I am wrong). The second function trims the pairs to fit the batch size. I do the exact same for the test input/target.
train_input, train_target = build_timeseries(train_input, time_step, pred_horiz, 0)
train_input = trim_dataset(train_input, batch_size)
train_target = trim_dataset(train_target, batch_size)
def build_timeseries(mat, TIME_STEPS, PRED_HORIZON, y_col_index):
    # y_col_index is the index of column that would act as output column
    dim_0 = mat.shape[0]  # num datasets
    dim_1 = mat.shape[1]  # num features
    dim_2 = mat.shape[2]  # num datapoints
    # Reformatted matrix
    mat = mat.swapaxes(1, 2)
    x = np.zeros((dim_0*(dim_2-PRED_HORIZON), TIME_STEPS, dim_1))
    y = np.zeros((dim_0*(dim_2-PRED_HORIZON),))
    k = 0
    for i in range(dim_0):  # Iterate through datasets
        for j in range(TIME_STEPS, dim_2-PRED_HORIZON):
            x[k] = mat[i, j-TIME_STEPS:j]
            y[k] = mat[i, j+PRED_HORIZON, y_col_index]
            k += 1
    print("length of time-series i/o", x.shape, y.shape)
    return x, y
def trim_dataset(mat, batch_size):
    no_of_rows_drop = mat.shape[0] % batch_size
    if(no_of_rows_drop > 0):
        return mat[no_of_rows_drop:]
    else:
        return mat
Lastly, this is how I call the actual model.
history = model.fit(train_input, train_target, epochs=params["epochs"], verbose=2, batch_size=batch_size,
                    shuffle=True, validation_data=(test_input, test_target), callbacks=[es, mcp])
As the model converges, I expect it to predict values close to the specified targets I had fed it. However instead, its predictions align much more closely with the trailing value of the time-series data (or the current value). Though, on the other hand, the model appears to be evaluating the loss according to the specified target.... Why is it working this way and how can I fix it? Any help is appreciated.
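One detail that may be worth checking, if I read build_timeseries correctly: x and y are allocated with dim_2 - PRED_HORIZON rows per dataset, but the inner loop only fills dim_2 - PRED_HORIZON - TIME_STEPS of them, so the trailing rows would stay all zeros and still be used for training. Below is a minimal sketch of the same windowing with an exact allocation, assuming the same (datasets, features, datapoints) layout. The name build_windows and the toy data are hypothetical.

```python
import numpy as np

def build_windows(mat, time_steps, pred_horizon, y_col_index):
    """Build (input window, future target) pairs.

    mat has shape (datasets, features, datapoints), as in build_timeseries.
    """
    n_sets, n_feat, n_points = mat.shape
    mat = mat.swapaxes(1, 2)  # -> (datasets, datapoints, features)
    per_set = n_points - pred_horizon - time_steps  # windows the loop can fill
    x = np.zeros((n_sets * per_set, time_steps, n_feat))
    y = np.zeros(n_sets * per_set)
    k = 0
    for i in range(n_sets):
        for j in range(time_steps, n_points - pred_horizon):
            x[k] = mat[i, j - time_steps:j]
            y[k] = mat[i, j + pred_horizon, y_col_index]
            k += 1
    assert k == len(x)  # every allocated row was written
    return x, y

# Toy data: 2 datasets, 4 features, 300 points, strictly positive values
rng = np.random.default_rng(0)
toy = rng.uniform(1.0, 2.0, size=(2, 4, 300))
x, y = build_windows(toy, time_steps=60, pred_horizon=60, y_col_index=0)
print(x.shape, y.shape)  # (360, 60, 4) (360,)
```

Whether this explains the "current value" behaviour is not something the sketch can show; it only removes one source of spurious zero-valued training pairs.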

Regressor Neural Network built with Keras only ever predicts one value

I'm trying to build a NN with Keras and Tensorflow to predict the final chart position of a song, given a set of 5 features.
After playing around with it for a few days I realised that although my MAE was getting lower, this was because the model had just learned to predict the mean value of my training set for all input, and this was the optimal solution. (This is illustrated in the scatter plot below)
This is a random sample of 50 data points from my testing set vs what the network thinks they should be
At first I realised this was probably because my network was too complicated. I had one input layer with shape (5,) and a single node in the output layer, but then 3 hidden layers with over 32 nodes each.
I then stripped back the excess layers and moved to just a single hidden layer with a couple nodes, as shown here:
self.model = keras.Sequential([
    keras.layers.Dense(4,
                       activation='relu',
                       input_dim=num_features,
                       kernel_initializer='random_uniform',
                       bias_initializer='random_uniform'
                       ),
    keras.layers.Dense(1)
])
Training this with a gradient descent optimiser still results in exactly the same prediction being made the whole time.
Then it occurred to me that perhaps the actual problem I'm trying to solve isn't hard enough for the network, that maybe it's linearly separable. Since this would respond better to not having a hidden layer at all, essentially just doing regular linear regression, I tried that. I changed my model to:
inp = keras.Input(shape=(num_features,))
out = keras.layers.Dense(1, activation='relu')(inp)
self.model = keras.Model(inp,out)
This also changed nothing. My MAE and the predicted value are both the same as before.
I've tried so many different things, different permutations of optimisation functions, learning rates, network configurations, and nothing can help. I'm pretty sure the data is good, but I've included a sample of it just in case.
chartposition,tagcount,dow,artistscore,timeinchart,finalpos
121,3925,5,35128,7,227
131,4453,3,85545,25,130
69,2583,4,17594,24,523
145,1165,3,292874,151,187
96,1679,5,102593,111,540
134,3494,5,1252058,37,370
6,34895,7,6824048,22,5
A sample of my dataset, finalpos is the value I'm trying to predict. Dataset contains ~40,000 records, split 80/20 - training/testing
def __init__(self, validation_split, num_features, should_log):
    self.should_log = should_log
    self.validation_split = validation_split
    inp = keras.Input(shape=(num_features,))
    out = keras.layers.Dense(1, activation='relu')(inp)
    self.model = keras.Model(inp,out)
    optimizer = tf.train.GradientDescentOptimizer(0.01)
    self.model.compile(loss='mae',
                       optimizer=optimizer,
                       metrics=['mae'])
def train(self, data, labels, plot=False):
    early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=20)
    history = self.model.fit(data,
                             labels,
                             epochs=self.epochs,
                             validation_split=self.validation_split,
                             verbose=0,
                             callbacks = [PrintDot(), early_stop])
    if plot: self.plot_history(history)
All code relevant to constructing and training the network
def normalise_dataset(df, mini, maxi):
    return (df - mini)/(maxi-mini)
Normalisation of the input data. Both my testing and training data are normalised to the max and min of the testing set
Graph of my loss vs validation curves with the one-hidden-layer network and an Adam optimiser, learning rate 0.01
Same graph but with linear regression and a gradient descent optimiser.
So I am pretty sure that your normalization is the issue: you are not normalizing per feature (as is the de facto industry standard), but across all data.
That means that if you have two features with very different orders of magnitude/ranges (in your case, compare timeinchart with artistscore), the feature with the larger values dominates and the smaller one is squashed into a tiny range.
Instead, you might want to normalize using something like scikit-learn's StandardScaler. Not only does this normalize per column (so you can pass all features at once), but it also scales to unit variance (which makes some assumptions about your data, but can potentially help, too).
To transform your data, use something along these lines
from sklearn.preprocessing import StandardScaler
import numpy as np
raw_data = np.array([[1,40], [2, 80]])
scaler = StandardScaler()
processed_data = scaler.fit_transform(raw_data)
# fit() calculates mean etc, transform() puts it to the new range.
print(processed_data) # returns [[-1, -1], [1,1]]
Note that you have two possibilities to normalize/standardize your data:
Either scale the test data together with your training data and split afterwards,
or fit the scaler on the training data only, and then use the same scaler to transform your test data.
Never fit_transform your test set separate from training data!
Since you have potentially different mean/min/max values, you can end up with totally wrong predictions! In a sense, the StandardScaler is your definition of your "data source distribution", which is inherently still the same for your test set, even though they might be a subset not exactly following the same properties (due to small sample size etc.)
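To illustrate the difference on the sample rows from the question, here is a small NumPy sketch. It assumes that mini and maxi in normalise_dataset are single scalars (which is what this answer supposes), and the 5/2 split is purely illustrative.

```python
import numpy as np

# Feature columns from the sample rows in the question:
# chartposition, tagcount, dow, artistscore, timeinchart
X = np.array([
    [121, 3925, 5, 35128, 7],
    [131, 4453, 3, 85545, 25],
    [69, 2583, 4, 17594, 24],
    [145, 1165, 3, 292874, 151],
    [96, 1679, 5, 102593, 111],
    [134, 3494, 5, 1252058, 37],
    [6, 34895, 7, 6824048, 22],
], dtype=float)

X_train, X_test = X[:5], X[5:]  # illustrative split only

# Global min-max: one min and one max for the whole matrix
global_scaled = (X_train - X_train.min()) / (X_train.max() - X_train.min())

# Per-feature min-max: statistics computed per column, on the training rows only
col_min = X_train.min(axis=0)
col_max = X_train.max(axis=0)
per_feature_scaled = (X_train - col_min) / (col_max - col_min)
X_test_scaled = (X_test - col_min) / (col_max - col_min)  # reuse training statistics

print(global_scaled[:, 4].max())       # timeinchart is squashed close to 0
print(per_feature_scaled[:, 4].max())  # 1.0
```

With the global version, every column except artistscore ends up in a sliver near zero, which is the effect described above. Test values outside the training range can fall outside [0, 1] after scaling, which is expected.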
Additionally, you might want to use a more advanced optimizer, like Adam, or specify a momentum value for your SGD (0.9 is a good choice in practice, as a rule of thumb).
Turns out the error was a really stupid and easy to miss bug.
When I import my dataset, I shuffle it. However, when I performed the shuffling, I was accidentally applying it only to the labels, not to the dataset as a whole.
As a result, each label was being assigned to a completely random feature set, so of course the model didn't know what to do with this.
Thanks to @dennlinger for suggesting that I look in the place where I eventually found this bug.
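For anyone hitting the same problem, one way to keep features and labels aligned is to draw a single permutation and index both arrays with it. A minimal sketch with made-up arrays (the names and data are illustrative, not the original import code):

```python
import numpy as np

rng = np.random.default_rng(42)

features = np.arange(10).reshape(5, 2)  # row i is [2i, 2i + 1]
labels = np.arange(5) * 10              # label i is 10 * i

# One permutation, applied to both arrays, keeps each row paired with its label
perm = rng.permutation(len(features))
features_shuffled = features[perm]
labels_shuffled = labels[perm]

# Check the pairing survived: the first column is 2i and the label is 10i
assert np.array_equal(features_shuffled[:, 0] * 5, labels_shuffled)
```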

Multiple Input types in a keras Neural Network

As an example, I'd like to train a neural network to predict the location of a picture(longitude, latitude) with the image, temperature, humidity and time of year as inputs into the model.
My question is, what is the best way to add this additional information to a CNN? Should I just merge the numeric inputs with the CNN in the last dense layer or at the beginning? Should I encode the numeric values (temperature, humidity and time of year)?
Any information, resources, sources would be greatly appreciated, thanks in advance.
You can process numeric inputs separately and merge them afterwards before making the final prediction:
# Your usual CNN whatever it may be
img_in = Input(shape=(width, height, channels))
img_features = SomeCNN(...)(img_in)
# Your usual MLP model
aux_in = Input(shape=(3,))
aux_features = Dense(24, activation='relu')(aux_in)
# Possibly add more hidden layers, then merge
merged = concatenate([img_features, aux_features])
# create last layer.
out = Dense(num_locations, activation='softmax')(merged)
# build model
model = Model([img_in, aux_in], out)
model.compile(loss='categorical_crossentropy', ...)
Essentially, you treat them as separate inputs and learn useful features that combined allow your model to predict. How you encode numeric inputs really depends on their type.
For continuous inputs like temperature you can normalize to the range [-1, 1]; for discrete inputs, one-hot encoding is very common. Here is a quick guide.
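The two encodings mentioned here could look roughly like this. It is a sketch under assumptions: the value ranges, the choice to treat time of year as a month index, and the helper names scale_to_unit_range and one_hot are all illustrative.

```python
import numpy as np

def scale_to_unit_range(values, lo, hi):
    """Linearly map values from [lo, hi] to [-1, 1]."""
    values = np.asarray(values, dtype=float)
    return 2.0 * (values - lo) / (hi - lo) - 1.0

def one_hot(indices, num_classes):
    """Turn integer category indices into one-hot rows."""
    return np.eye(num_classes)[np.asarray(indices)]

# Assumed ranges: temperature in Celsius, humidity in percent
temperature = scale_to_unit_range([-10.0, 15.0, 40.0], lo=-10.0, hi=40.0)
humidity = scale_to_unit_range([0.0, 50.0, 100.0], lo=0.0, hi=100.0)
month = one_hot([0, 5, 11], num_classes=12)  # time of year as a month index

# One auxiliary input row per picture: 2 continuous values + 12 one-hot values
aux_input = np.column_stack([temperature, humidity, month])
print(aux_input.shape)  # (3, 14)
```

With this encoding the auxiliary input has 14 values per sample rather than 3, so the shape passed to the auxiliary Input layer would need to match.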
If you want to predict based on those four features, then I would suggest going with CNN + RNN:
feed the image to the CNN, take the logits, and then build a sequence like
logits=np.array(output).flatten()
[[logits] , [temperature], [humidity] , [time_of_year]] and feed it to
the RNN, which will treat it as a sequence input.
