I am trying to feed stock data to a Conv2D layer but ran into a dimension problem. I have no idea how to solve it and need help. Below are the detailed steps I have implemented.
I have attached data and code in the following link:
https://drive.google.com/drive/folders/1snsQ-96AeRn521oc0aQVlTTd9nHbtyjO?usp=sharing
The code itself should run; it downloads the data automatically. I have taken out some of the features to simplify the run, so the attached code uses 5 features.
To give you a quick glance, here is the problem I had:
-----------------------
1. Got stock data and generated some features from it.
2. Added time steps to it using:
def reshape_data(X, y, period=28):
    n_past = period  # number of days to look back in the past and compile into a time series
    trainX = []
    trainY = np.array(y.iloc[n_past:])
    trainY = trainY[..., np.newaxis]
    for i in range(n_past, len(X)):
        trainX.append(X[i - n_past:i, 0:X.shape[1]])
    trainX = np.array(trainX)
    return trainX, trainY
Note: I have applied PCA to the data (same link as above), but simply converting it to NumPy and applying reshape_data() should work:
trainX, trainY = reshape_data(X_train_pca, y_train, period=30)
3. Shapes:
trainX: (5768, 30, 30)  # 5768 rows, 30 time steps, 30 features
trainY: (5768, 1)
4. Added a channel axis to trainX:
trainX = trainX[...,np.newaxis]
trainX is now (5768, 30, 30, 1)
5. Built the model (model code is in the linked files).
6. Compiled and fit:
model.compile(optimizer=Adam(learning_rate=0.01), metrics=['mse'], loss='binary_crossentropy')
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=10,
                                                 verbose=0, mode='auto', min_delta=0.0002,
                                                 cooldown=0, min_lr=0.0001)
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=80, mode='min',
                                              restore_best_weights=True)
history = model.fit(trainX, trainY, epochs=300,
                    batch_size=512, shuffle=False, verbose=1,
                    # validation_data=(testX, testY),
                    validation_split=0.2,
                    callbacks=[early_stop, reduce_lr])
7. Error.
I thought that, since I converted the stock data into shape (30, 30, 1), it should look like an image dataset, which would let TensorFlow work with it. But somehow it doesn't.
Add two layers after your convolution layer:
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))
And do not mix tensorflow.keras and keras; use tensorflow.keras for everything.
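For reference, a minimal sketch of how the whole model could look, assuming the (30, 30, 1) input shape and binary target from the question (the convolution parameters are illustrative, not the asker's exact layers):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, kernel_size=3, activation='relu', input_shape=(30, 30, 1)),
    tf.keras.layers.MaxPooling2D(pool_size=2),
    tf.keras.layers.Flatten(),                       # collapse the feature maps into a vector
    tf.keras.layers.Dense(1, activation='sigmoid'),  # single probability for binary_crossentropy
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
              loss='binary_crossentropy', metrics=['accuracy'])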
Related
I can't figure out how to predict the next 100 values in the future. I have an array of 1000 values and I need to predict the next 100. Please check my code:
close_arr = close_arr[::-1]
close = np.array(close_arr)
print(close)
print(len(close))

# DataFrame must have columns "ds" and "y" (dates and values); 'ds' is omitted here.
df = pd.DataFrame({'y': close})  # 'ds': timestamp
df = df[['y']]

# prepare data for TensorFlow
scaler = MinMaxScaler(feature_range=(0, 1))
yahoo_stock_prices = scaler.fit_transform(df)

train_size = int(len(yahoo_stock_prices) * 0.80)
print(f"train_size: {train_size}")
test_size = len(yahoo_stock_prices) - train_size
train, test = yahoo_stock_prices[0:train_size, :], yahoo_stock_prices[train_size:, :]
print(len(train), len(test))

# reshape into X=t and Y=t+1
look_back = 1
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)
trainX = np.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = np.reshape(testX, (testX.shape[0], 1, testX.shape[1]))

# Step 2: build the model
model = Sequential()
model.add(LSTM(50, activation='relu', return_sequences=True, input_shape=(1, look_back)))
model.add(LSTM(50, activation='relu'))  # last LSTM should not return sequences
model.add(Dense(25))
model.add(Dense(1, activation='linear'))
model.compile(loss='mse', optimizer='rmsprop')
model.fit(trainX, trainY, batch_size=128, epochs=100, validation_split=0.05)  # fit on trainY, not trainX
model.evaluate(trainX, trainY, verbose=2)

predict_length = 50
# how to predict the next 50 values from testX?
# Step 1 - predict the future values
predicted = model.predict(testX)
print(f"predicted: {scaler.inverse_transform(np.array(predicted).reshape(-1, 1))}")
print(len(predicted))
# Step 2 - plot the predictions
plot_results_multiple(predicted, testX, predict_length)
def create_dataset(dataset, look_back=1):
    # prepare data
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - 1):
        a = dataset[i:(i + look_back), 0]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 0])
    return np.array(dataX), np.array(dataY)
The code runs, but I only receive predictions of the same length as the input, not a forecast of future values.
First of all, you need to prepare your output data (y_train, y_test) to have the shape (number_of_rows, 100). The last layer of your network should output 100 values, not 1:
model.add(Dense(100, activation='linear'))
Also, look_back means how many of the last days to use, which is 1 in this case; it should be more. So first look at your data-preparation function and change it so the outputs have shape (n, 100), then fix the output layer of your network to produce 100 values.
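A minimal sketch of what such a data-preparation function could look like, modeled on the create_dataset above (the function name and window sizes are illustrative):

import numpy as np

def create_dataset_multistep(dataset, look_back=30, horizon=100):
    # X: the last `look_back` values; Y: the next `horizon` values
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - horizon + 1):
        dataX.append(dataset[i:i + look_back, 0])
        dataY.append(dataset[i + look_back:i + look_back + horizon, 0])
    return np.array(dataX), np.array(dataY)  # shapes (n, look_back) and (n, horizon)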
I would assume that look_back refers to the number of lags of the time series you are using, which in this case is 1. Not sure where you import the create_dataset function from, but I would assume it creates your datasets so that X contains the values of the time series at time t-1 and Y contains the values at time t.
That being said you have two options for generating a forecast for 100 time steps.
You could train your model to predict one time step, as you have done in your code. To generate a forecast 100 time steps ahead, you then iteratively feed each new forecast back into the model to produce the forecast for the next time step, as sketched below.
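A sketch of that iterative loop, assuming the one-step model and the (samples, 1, look_back) input layout from the question (forecast_recursive is a hypothetical helper; predictions stay in the scaled space until inverse_transform):

import numpy as np

def forecast_recursive(model, last_window, steps=100):
    # last_window: the most recent model input, shape (1, look_back)
    window = last_window.copy()
    preds = []
    for _ in range(steps):
        # predict one step ahead from the current window
        yhat = float(model.predict(window[np.newaxis, ...], verbose=0).reshape(-1)[-1])
        preds.append(yhat)
        # slide the window forward and append the new forecast
        window = np.roll(window, -1, axis=-1)
        window[..., -1] = yhat
    return np.array(preds)

# e.g. scaler.inverse_transform(forecast_recursive(model, testX[-1], steps=100).reshape(-1, 1))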
The other option is to use the create_dataset function to set up the dataset so that your Y dataset contains 100 time steps. This means the model must be set up to output a sequence of 100 values.
Hope this helps!
Currently I am using TensorFlow to create a neural network with a 1D convolutional layer and a Dense layer to predict a single output value. The input for the network is an array of 1500 samples; each sample is an array of 27x13 values.
I started training in the same manner as I did without the 1D conv layer, but the training stopped during the first epoch without warning.
I found that multiprocessing might be the cause, and that I should turn it off, as discussed here: https://github.com/stellargraph/stellargraph/issues/1006
Basically, that means adding this to my Keras fit call:
use_multiprocessing=False
That did not change anything. I then found that I should probably use a Dataset to bypass the multiprocessing issues, according to https://github.com/stellargraph/stellargraph/issues/1206 ("Replace tf.keras.Sequence objects with tf.data.Dataset #1206").
After struggling with the difference between tf.data.Dataset.from_tensors and tf.data.Dataset.from_tensor_slices (illustrated in the snippet further below), I found the following code to start executing the model.fit block again. As you might have guessed, it still stops running after the first epoch:
main loop started
Epoch 1/5
Press any key to continue . . .
Can someone pinpoint the source of the halting of the program?
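For reference, a minimal snippet (my own, separate from the code below) showing the difference: from_tensors wraps the whole input into a single dataset element, while from_tensor_slices slices along the first axis into one element per sample:

import tensorflow as tf

x = tf.random.normal((1500, 27, 13))
ds_whole = tf.data.Dataset.from_tensors(x)         # 1 element of shape (1500, 27, 13)
ds_sliced = tf.data.Dataset.from_tensor_slices(x)  # 1500 elements of shape (27, 13)
print(ds_whole.element_spec)   # TensorSpec(shape=(1500, 27, 13), dtype=tf.float32, ...)
print(ds_sliced.element_spec)  # TensorSpec(shape=(27, 13), dtype=tf.float32, ...)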
This is my code:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import load_model
from tensorflow.keras.callbacks import CSVLogger

EPOCHS = 5
BATCH_SIZE = 16

def tfdata_generator(x, y, is_training, batch_size=BATCH_SIZE):
    '''Construct a data pipeline using `tf.data.Dataset`.'''
    dataset = tf.data.Dataset.from_tensor_slices((x, y))
    if is_training:
        dataset = dataset.shuffle(1500)  # buffer size depends on sample count
    dataset = dataset.batch(batch_size)
    dataset = dataset.repeat()           # infinite dataset: fit() then needs steps_per_epoch
    dataset = dataset.prefetch(1)
    return dataset

def main():
    print("main loop started")
    X_train = np.random.randn(1500, 27, 13)
    Y_train = np.random.randn(1500, 1)
    training_set = tfdata_generator(X_train, Y_train, is_training=True)
    # Do NOT overwrite training_set with tf.data.Dataset.from_tensors((X_train, Y_train));
    # from_tensors() creates a single element containing the whole arrays.

    logstring = r"C:\Documents\Conv1D"  # raw string avoids backslash escapes in Windows paths
    csv_logger = CSVLogger(logstring + ".csv", append=True, separator=';')
    early_stopper = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=20,
                                                     min_delta=0.00001)

    model = keras.Sequential()
    model.add(layers.Conv1D(filters=10,
                            kernel_size=9,
                            strides=3,
                            padding="valid",
                            input_shape=(27, 13)))
    model.add(layers.Flatten())
    model.add(layers.Dense(70, activation='relu', name="layer2"))
    model.add(layers.Dense(1))

    optimizer = keras.optimizers.Adam(learning_rate=0.0001)
    model.compile(optimizer=optimizer, loss="mean_squared_error")

    # The repeated dataset needs steps_per_epoch; batch_size and use_multiprocessing
    # do not apply when fitting on a tf.data.Dataset.
    model.fit(training_set,
              epochs=EPOCHS,
              steps_per_epoch=len(X_train) // BATCH_SIZE,
              verbose=2)
    model.summary()

    modelstring = r"C:\Documents\Conv1D_finishedmodel"
    model.save(modelstring, overwrite=True)
    model = load_model(modelstring)

main()
I am trying to build a word-level LSTM model using Keras, and for that I need to create one-hot encodings of the words to feed into the model. I have around 32,360 words across roughly 130,000 lines. Each time I attempt to run my model I run into a memory error.
I believe the issue is the size of the dataset. I have been researching this for a couple of days now, and it seems the solution is either to create a generator that does the one-hot encoding and loads the data in batches, or to reduce the number of lines I feed into the model. I cannot quite figure out the generator piece (my rough sketch of the idea appears after my code below).
The error I get is:
MemoryError: Unable to allocate 143. GiB for an array with shape (1184643, 32360) and data type int32
Is the generator the correct way to go? Is there any way to solve this otherwise? My code is below:
vocab_size = len(tokenizer.word_index) + 1

seq = []
for item in corpus:
    seq_list = tokenizer.texts_to_sequences([item])[0]
    for i in range(1, len(seq_list)):
        n_gram = seq_list[:i+1]
        seq.append(n_gram)

max_seq_size = max([len(s) for s in seq])
seq = np.array(pad_sequences(seq, maxlen=max_seq_size, padding='pre'))

input_sequences, labels = seq[:, :-1], seq[:, -1]
one_hot_labels = to_categorical(labels, num_classes=vocab_size, dtype='int32')

n_units = 256
embedding_size = 100

text_in = Input(shape=(None,))
x = Embedding(vocab_size, embedding_size)(text_in)
x = LSTM(n_units)(x)
x = Dropout(0.2)(x)
text_out = Dense(vocab_size, activation='softmax')(x)

model = Model(text_in, text_out)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

filepath = "new_model_weights/weights-{epoch:02d}.hdf5"
checkpoint = ModelCheckpoint(filepath,
                             monitor='accuracy',
                             verbose=1,
                             save_best_only=False,
                             save_weights_only=False,
                             mode='auto',
                             save_freq='epoch')
callbacks_list = [checkpoint]

epochs = 50
batch_size = 10
history = model.fit(input_sequences, one_hot_labels, epochs=epochs,
                    batch_size=batch_size, callbacks=callbacks_list, verbose=1)
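Here is my rough sketch of the generator idea, which may well be wrong: a hypothetical helper built on the variables above that one-hot encodes only one batch at a time, so the full 143 GiB label array is never allocated:

import numpy as np
from tensorflow.keras.utils import to_categorical

def batch_generator(input_sequences, labels, vocab_size, batch_size=128):
    n = len(input_sequences)
    while True:  # Keras expects the generator to loop forever
        for start in range(0, n, batch_size):
            x = input_sequences[start:start + batch_size]
            # encode only this slice of the labels
            y = to_categorical(labels[start:start + batch_size], num_classes=vocab_size)
            yield x, y

# history = model.fit(batch_generator(input_sequences, labels, vocab_size),
#                     steps_per_epoch=len(input_sequences) // 128, epochs=50)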
I'm very new to Keras and machine learning in general, and am training a model like so:
history = model.fit_generator(flight_generator(train_files_train, 4), steps_per_epoch=500, epochs=50)
Where flight_generator is a function that prepares and formats the training data, then yields it back to the model to fit. This works great, so now I want to add some validation, but after much looking online I still don't know how to implement it.
My best guess would be something like:
history = model.fit_generator(flight_generator(train_files_train, 4), steps_per_epoch=500, epochs=50, validation_data=flight_generator(train_files_cv, 4))
But when I run the code it just freezes in the first epoch. What am I missing?
EDIT:
Code for flight_generator:
def flight_generator(files, batch_size):
    while True:
        batch_inputs = numpy.random.choice(a=files, size=batch_size)
        batch_input_X = []
        batch_input_Y = []
        c = 0
        for batch_input in batch_inputs:
            # reshape into X=t and Y=t+1
            trainX, trainY = create_dataset(batch_input, look_back)
            # reshape input to be [samples, time steps, features]
            trainX = numpy.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
            if c == 0:  # compare with ==, not `is`
                batch_input_X = trainX
                batch_input_Y = trainY
            else:
                batch_input_X = numpy.concatenate((batch_input_X, trainX), axis=0)
                batch_input_Y = numpy.concatenate((batch_input_Y, trainY), axis=0)
            c += 1
        # yield a tuple of (inputs, targets) to feed the network
        batch_x = numpy.array(batch_input_X)
        batch_y = numpy.array(batch_input_Y)
        yield (batch_x, batch_y)
Your validation_data should be passed as a generator, the same way as your training data. Note that fit_generator has no batch_size argument, and an infinite validation generator also needs validation_steps, otherwise validation never finishes. So you should try changing it to (the value 100 is illustrative):
history = model.fit_generator(flight_generator(train_files_train, 4), steps_per_epoch=500, epochs=50, validation_data=flight_generator(train_files_cv, 4), validation_steps=100)
I guess you should be using model.fit(...). Do not use a generator unless you actually require it; in the code I have seen, model.fit() does the job. Please refer to the Keras documentation for fit():
https://keras.io/api/models/sequential/
Also, please specify the optimizer and the metrics when compiling.
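For what it's worth, in TF 2.x model.fit accepts Python generators directly (fit_generator is deprecated), so a sketch of the same call with fit would be (the validation_steps value is illustrative):

history = model.fit(flight_generator(train_files_train, 4),
                    steps_per_epoch=500, epochs=50,
                    validation_data=flight_generator(train_files_cv, 4),
                    validation_steps=100)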
I am trying to set up a process for forecasting some value. Currently, I can't understand what the issue is in the code below:
in_neurons = 1
out_neurons = 1
hidden_neurons = 20
nb_features = 9

# retrieve data
y_train = train.pop(target).values
X_train = pd.concat([train[['QTR_HR_START', 'QTR_HR_END', 'HOLIDAY_RANK_', 'SPECIAL_EVENT_RANK_',
                            'IS_AM', 'IS_TOP_RANKED', 'AWARDS_WINS_ANY', 'YEARS_SINCE_RELEASE']],
                     pd.DataFrame({'DATETIME': pd.DatetimeIndex(train['DATETIME']).astype(np.int64)})])
X_train = X_train.values

y_test = test.pop(target).values
X_test = pd.concat([test[['QTR_HR_START', 'QTR_HR_END', 'HOLIDAY_RANK_', 'SPECIAL_EVENT_RANK_',
                          'IS_AM', 'IS_TOP_RANKED', 'AWARDS_WINS_ANY', 'YEARS_SINCE_RELEASE']],
                    pd.DataFrame({'DATETIME': pd.DatetimeIndex(test['DATETIME']).astype(np.int64)})])
X_test = X_test.values

model = Sequential()
model.add(TimeDistributed(Dense(8, input_shape=(X_train.shape[0], 100, nb_features), activation='softmax')))
model.add(LSTM(4, dropout_W=0.2, dropout_U=0.2))
model.add(Dense(1))
model.add(Activation("sigmoid"))
model.compile(loss="mean_squared_error", optimizer="rmsprop", metrics=['accuracy'])
After running the code, I got an exception:
raise Exception('The first layer in a Sequential model must '
Exception: The first layer in a Sequential model must get an input_shape or batch_input_shape argument.
Please advise where I am wrong.
EDIT 1: I configured the model as mentioned in the official documentation (http://keras.io/layers/recurrent/):
model.add(LSTM(32, input_dim=nb_features, input_length=100))
model.compile(loss="mean_squared_error", optimizer="rmsprop", metrics=['accuracy'])
Exception: Error when checking model input: expected lstm_input_1 to have 3 dimensions, but got array with shape (48614, 9)
It's old, but I'm posting for future reference. As the error states, Keras requires 3D input for an LSTM: [samples, time steps, features]. Even though you have (48614, 9), Keras takes it as 2D, i.e. [samples, features]. To fix it, do something like this:
def reshape_dataset(train):
    # add a time-step axis: (samples, features) -> (samples, 1, features)
    trainX = numpy.reshape(train, (train.shape[0], 1, train.shape[1]))
    return numpy.array(trainX)

x = reshape_dataset(your_dataset)  # your_dataset has shape (48614, 9)
Now x has shape (48614, 1, 9), which is [samples, time steps, features], i.e. 3D.