I have some training data x_train and some corresponding labels for this x_train called y_train. Here is how x_train and y_train are constructed:
train_x = np.array([np.random.rand(1, 1000)[0] for i in range(10000)])
train_y = (np.random.randint(1,150,10000))
train_x has 10000 rows and 1000 columns for each row.
train_y has a label between 1 and 150 for each sample in train_x and represents a code for each train_x sample.
I also have a sample called sample, which is 1 row with 1000 columns, which I want to use for prediction on this LSTM model. This variable is defined as
sample = np.random.rand(1,1000)[0]
I am trying to train and predict an LSTM on this data using Keras. I want to take in this feature vector and use this LSTM to predict one of the codes in range 1 to 150. I know these are random arrays, but I cannot post the data I have. I have tried the following approach which I believe should work, but am facing some issues
model = Sequential()
model.add(LSTM(output_dim = 32, input_length = 10000, input_dim = 1000,return_sequences=True))
model.add(Dense(150, activation='relu'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(train_x, train_y,
batch_size=128, nb_epoch=1,
verbose = 1)
model.predict(sample)
Any help or adjustments to this pipeline would be great. I am not sure if the output_dim is correct. I want to pass train the LSTM on each sample of the 1000 dimension data and then reproduce a specific code that is in range 1 to 150. Thank you.
I see at least three things you need to change:
Change this line:
model.add(Dense(150, activation='relu'))
to:
model.add(Dense(150, activation='softmax'))
as leaving 'relu' as activation makes your output unbounded whereas it needs to have a probabilistic interpretation (as you use categorical_crossentropy).
Change loss or target:
As you are using categorical_crossentropy you need to change your target to be a one-hot encoded vector of length 150. Another way is to leave your target but to change loss to sparse_categorical_crossentropy.
Change your target range:
Keras has a 0-based array indexing (as in Python, C and C++ so your values should be in range [0, 150) instead [1, 150].
Related
Say I have a function F that takes in a parameter vector P (say, a 5-element vector), and produces a (numerical) time series Y[t] of length T (eg T=100, so t=1,...,100). The function could be complicated (eg enzyme reaction models)
I want to make a neural network that predicts the output (Y[t]) that would result from feeding a new parameter set (P') into the function. How can this be done?
A simple feed-forward network can work, but it requires a very large number of output nodes, and doesn't take into account the temporal correlation / relationships between points. Is it possible/better to use a RNN or Transformer instead?
Using RNN might work for you. Here is some example code in Keras to get you started:
param_length = 5
time_length = 100
hidden_size = 20
model = tf.keras.Sequential([
# Encode input parameters.
tf.keras.layers.Dense(hidden_size, input_shape=[param_length]),
# Generate a sequence.
tf.keras.layers.RepeatVector(time_length),
tf.keras.layers.LSTM(32, return_sequences=True),
tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(1))
])
model.compile(loss="mse", optimizer="nadam")
model.fit(train_x, train_y, validation_data=(val_x, val_y), epochs=10)
The first Dense layer converts input parameters to a hidden state. Then LSTM RNN units generate time sequences. You will need to experiment with hyperparameters like the number of dense and LTSM layers, the size of hidden layers etc.
One more thing you can try is to use different loss function like:
early_stopping_cb = tf.keras.callbacks.EarlyStopping(
monitor="val_mae", patience=50, restore_best_weights=True)
model.compile(loss=tf.keras.losses.Huber(), optimizer="nadam", metrics=["mae"])
history = model.fit(train_x, train_y, validation_data=(val_x, val_y), epochs=500,
callbacks=[early_stopping_cb])
I have an LSTM architecture ready:
input1 = Input(shape=(1500, 3))
lstm = LSTM(units=100, return_sequences=False, activation='relu')(input1)
outputs = Dense(150, activation="sigmoid")(lstm)
model = Model(inputs=input1, outputs=outputs)
model.compile(loss="binary_crossentropy", optimizer="adam",
metrics=["accuracy"])
The LSTM layer supports a calling argument called mask.
The way I'm reading the data is by using two generators, one iterates through training files and the other through the validation files (so on the .fit method I pass the training and validation generators).
model.fit(
x=training_generator,
epochs=10,
steps_per_epoch=5, # there are 5 files
validation_data=validation_generator,
validation_steps=5, # there are 5 files
verbose=1
)
Therefore each file will have a given mask (one for the training file, another for the validation file). Therefore my question is, how can I specify which mask to use?
The way I found to work was to transform the data during the preprocessing stage. If you replace the values in your data, according to the mask, with an number you know is not in your data, for instance 0 or -999, you can then add another layer to the architecture called Masking. This layer has a parameter called mask_value which will be the same number you used to transform your data:
input1 = Input(shape=(n_timesteps, n_channels))
masking = Masking(mask_value=-999)(input1)
lstm1 = LSTM(units=100, return_sequences=False,
activation="tanh")(masking)
outputs = Dense(n_timesteps, activation="sigmoid")(lstm1)
model = Model(inputs=input1, outputs=outputs)
model.compile(loss=keras.losses.BinaryCrossentropy(),
optimizer=tf.keras.optimizers.Adam(learning_rate=0.01))
This way you can then pass this as the input to the LSTM (since LSTMs allow this, some other types of layers do not).
I'm trying to construct a network that will predict a Boolean target.
The data provided to the network contains both categorical and numerical entries but has all ready been properly processed. The data I am working with is Time Series data with 84 fields and 310033 rows of data. All the data has been scaled to remain between 0 and 1. Ever row represents a second in the data.
I created a database, data, with a shape (310033, 60, 500) and the target vector is of shape (1000, 1). The Time Step dimension was defined to be 60 because that is the maximum full 60 min hours possible with the amount of data available.
Then I split data (X_train, X_test, y_train, y_test).
Is it okay to give a matrix like this to the lstm model and expect a good prediction (if the relationships are there)? Because I have very poor performance. From what I have seen, people give only a 1D or 2D data and they reshape their data after to give a 3D input to the lstm layer. Which is what I have done here.
Below is the Transformation code from 2D to 3D:
X_train, X_test, y_train, y_test = train_test_split(scaled, target, train_size=.7, shuffle = False)
# Generate Lag time Steps 3D framework for LSTM - Currently in 2D Framework
# As required for LSTM networks, we must reshape the input data into N_samples x TimeSteps x Variables
hours = len(X_train)/3600
hours = math.floor(hours) #Find Most full 60-min-hours available in subset of data
temp =[]
# Pull hours into the three dimensional field
for hr in range(hours, len(X_train) + hours):
temp.append(scaled[hr - hours:hr, 0:scaled.shape[1]])
X_train = np.array(temp) #Export Train Features in (70% x Hours x Variables)
hours = len(X_test)/3600
hours = math.floor(hours) #Find Most full 60-min-hours available in subset of data
temp =[]
# Pull hours into the three dimensional field
for hr in range(hours, len(X_test) + hours):
temp.append(scaled[hr - hours:hr, 0:scaled.shape[1]])
X_test = np.array(temp) #Export Test Features in (30% x Hours x Variables)
Below is the Framework of the Model:
model = Sequential()
#Layer 1 - returns a sequence of vectors
model.add(LSTM(128, return_sequences=True,
input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(Dropout(0.15)) #15% drop out layer
#model.add(BatchNormalization())
#Layer 2
model.add(LSTM(256, return_sequences=False))
model.add(Dropout(0.15)) #15% drop out layer
#model.add(BatchNormalization())
#Layer 3 - return a single vector
model.add(Dense(32))
#Output of 2 because we have 2 classes
model.add(Dense(2, activation= 'sigmoid'))
# Define optimiser
opt = tf.keras.optimizers.Adam(learning_rate=1e-5, decay=1e-6)
# Compile model
model.compile(loss='sparse_categorical_crossentropy', # Mean Square Error Loss = 'mse'; Mean Absolute Error = 'mae'; sparse_categorical_crossentropy
optimizer=opt,
metrics=['accuracy'])
history = model.fit(X_train, y_train, epochs=epoch, batch_size=batch, validation_data=(X_test, y_test), verbose=3, shuffle=False)
I have experimented with many different frameworks for the LSTM. Single layer, multilayer, a double LSTM layer with 2 truncating Dense layers (LSTM -> LSTM -> Dense(32) -> Dense(2)), Batch normalization, etc...
Is there a suggested framework for this type of time series data to improve performance? I was getting better results when the data only had a single TimeStep = 1.
I have a time series problem with 15 minutes as a timestep.The complete data will be from 2016-09-01 00:00:15 to 2016-12-31 23:45:00.
I have 5 variables(v1,v2,v3,v4,v5,v6) in the data frame and I want to predict the sixth variable (v6) for the next timestep.
I prepare the data set and prepare the information as 5-time lags. like if the time is t in the row I create the values for (t-1) to (t-5) as lags for v1 to v6.
So in total, I have 30 features (5 lags for 6 variables).
I also normalize the values by PowerTransformer.
scaler_x = PowerTransformer()
scaler_y = PowerTransformer()
train_X = scaler_x.fit_transform(train_X)
train_y = scaler_y.fit_transform(train_y.reshape(-1,1))
My data input shape of traix_X and train_y is like below at initial:
(11253, 30) , (11253, 1)
11253 rows having 30 variables as input and a single variable as target variable .Then i reshape this to fit my ConvLSTM2D like below:
# define the number of subsequences and the length of subsequences
n_steps, n_length = 5, 6 #I take into account of past 5 steps for the 6 variables
n_features=1
#reshape for ConvLSTM
# reshape into subsequences [samples, time steps, rows, cols, channels]
train_X = train_X.reshape(train_X.shape[0], n_steps, 1, n_length, n_features)
train_y = train_y.reshape((train_y.shape[0], train_y.shape[1], 1))
The ConvLSTM2D architecture looks like below :
model = Sequential()
model.add(ConvLSTM2D(filters=64, kernel_size=(1,3), activation='relu', input_shape=(n_steps, 1, n_length, n_features)))
model.add(Flatten())
model.add(RepeatVector(1))
model.add(LSTM(50, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(20, activation='relu')))
model.add(TimeDistributed(Dense(1)))
model.compile(loss='mse', optimizer='adam')
# fit network
model.fit(train_X, train_y, epochs=epochs, batch_size=batch_size, verbose=0)
But this model gives a very bad result (It is overfitting a lot). I suspect that my inputs are not given correctly to the ConvLSTM2D.
Is my reshaping correct? Any help is appreciated.
EDIT:
I have realized my input is being given correctly to the Network but the issue is it is overfitting a lot.
My hyperparameters are below :
#hyper-parameter
epochs=100
batch_size=64
adam_opt = keras.optimizers.Adam(lr=0.001)
I even tried 50 and 10 epochs its same issue.
In my personal experience there are a few things I've picked up about using ConvLSTM2D.
I would first check to see if the model is training at all. Based on your answer I am unsure how loss is changing as your model trains - if at all. If there is some variation, you need to perform a grid search (playing around with amount of layers and filters)
I also found my models needed to train for a long time to perform well, see the Keras example on ConvLSTM2d where 300 epochs are needed to train a model to perform an arguably simple task : https://keras.io/examples/conv_lstm/. A case I worked on needed a similar amount of epochs to train.
Check different loss functions and optimizers (even though I think mse and adam are good for this type of problem)
Normalize your data differently, you may want to normalize your data statistically as
shown in this keras example : https://www.tensorflow.org/tutorials/keras/regression
From personal experience, you might want more layers for this specific problem. See keras ConvLSTM2d example above for this
* I see how you want to format your data, and though it may work, a more straightforward solution may work better. You might want to try giving (v1,v2,v3,v4,v5) and predicting for v6. You may have the use large batch sizes for this. *
I am trying to use a LSTM network in Keras to make predictions of timeseries data one step into the future. The data I have is of 5 dimensions, and I am trying to use the previous 3 periods of readings to predict the a future value in the next period. I have normalised the data and removed all NaN etc, and this is the code I am trying to use to train the network:
def Network_ii(IN, OUT, TIME_PERIOD, EPOCHS, BATCH_SIZE, LTSM_SHAPE):
length = len(OUT)
train_x = IN[:int(0.9 * length)]
validation_x = IN[int(0.9 * length):]
train_y = OUT[:int(0.9 * length)]
validation_y = OUT[int(0.9 * length):]
# Define Network & callback:
train_x = train_x.reshape(train_x.shape[0],3, 5)
validation_x = validation_x.reshape(validation_x.shape[0],3, 5)
model = Sequential()
model.add(LSTM(units=128, return_sequences= True, input_shape=(train_x.shape[1],3)))
model.add(LSTM(units=128))
model.add(Dense(units=1))
model.compile(optimizer='adam', loss='mean_squared_error')
train_y = np.asarray(train_y)
validation_y = np.asarray(validation_y)
history = model.fit(train_x, train_y, batch_size=BATCH_SIZE, epochs=EPOCHS, validation_data=(validation_x, validation_y))
# Score model
score = model.evaluate(validation_x, validation_y, verbose=0)
print('Test loss:', score)
# Save model
model.save(f"models/new_model")
I am attempting to roughly follow the steps outlined here- https://machinelearningmastery.com/multivariate-time-series-forecasting-lstms-keras/
However, no matter what adjustments I have made in terms of changing the number of dimensions used to train the network or the length of the time period I cannot get the output of the model to give predictions that are not either 1 or 0. This is even though the target data, in the array 'OUT' is made up of data continuous on [0,1].
I think there may be something wrong with how I am setting up the .Sequential() function, but I cannot see what to adjust. I am relatively new to this so any help would be greatly appreciated.
You are probably using a prediction function that is not the standard. Maybe you are using predict_classes?
The one that is well documented and the standard is model.predict.