I built a Keras Sequential model on a simple dataset. I am able to train the model, but every time I ask for a prediction on the same input I get different values. Does anyone know why? I read through several Stack Overflow questions (Why the exactly identical keras model predict different results for the same input data in the same env, Keras saved model predicting different values on different session, different prediction after load a model in keras), but couldn't find the answer. I tried setting the TensorFlow seed and I still get different results.
Here is my code
from pandas import concat
from pandas import DataFrame
# create sequence
length = 10
sequence = [i/float(length) for i in range(length)]
# create X/y pairs
df = DataFrame(sequence)
df = concat([df, df.shift(1)], axis=1)
df.dropna(inplace=True)
print(df)
# convert to LSTM friendly format
values = df.values
X, y = values[:, 0], values[:, 1]
X = X.reshape(len(X), 1, 1)
print(X.shape, y.shape)
output is:
0 0
1 0.1 0.0
2 0.2 0.1
3 0.3 0.2
4 0.4 0.3
5 0.5 0.4
6 0.6 0.5
7 0.7 0.6
8 0.8 0.7
9 0.9 0.8
(9, 1, 1) (9,)
Then start building the model
#configure network
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import LSTM
tf.random.set_seed(1337)
n_batch = len(X)
n_neurons = 10
#design network
model = Sequential()
model.add(LSTM(n_neurons, batch_input_shape=(n_batch, X.shape[1], X.shape[2]), stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X,y,epochs=2,batch_size=n_batch,verbose=1,shuffle=False)
Now every time I run the following code to get the prediction I get different results as you can see here
model.predict(X)
********output**************
array([[0.03817442],
[0.07164046],
[0.10493257],
[0.13797525],
[0.17069395],
[0.20301574],
[0.23486984],
[0.26618803],
[0.29690543]], dtype=float32)
model.predict(X)
********output**************
array([[0.04415776],
[0.08242793],
[0.12048437],
[0.15823033],
[0.19556962],
[0.2324073 ],
[0.26865062],
[0.3042098 ],
[0.33899906]], dtype=float32)
The problem is setting stateful=True in your LSTM layer, as this keeps the state between predict calls, so each prediction depends on previous predictions.
So as a solution, set stateful=False.
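To see why carried-over state changes repeated predictions, here is a minimal sketch using a toy scalar recurrent cell in plain Python. It is not Keras's LSTM: the class TinyRecurrentCell, its weights, and its methods are hypothetical names chosen for illustration, and it carries state along one sequence rather than per batch sample. The mechanism is the same, though: with stateful behaviour the second call starts from the state the first call left behind.

```python
import math


class TinyRecurrentCell:
    """Toy scalar recurrent cell (hypothetical, for illustration only)."""

    def __init__(self, w_x=0.5, w_h=0.8, stateful=False):
        self.w_x = w_x
        self.w_h = w_h
        self.stateful = stateful
        self.h = 0.0  # hidden state; kept between calls only when stateful

    def reset_states(self):
        # Forget everything seen so far.
        self.h = 0.0

    def predict(self, xs):
        # A stateless cell starts every call from a zero state.
        if not self.stateful:
            self.h = 0.0
        outputs = []
        for x in xs:
            self.h = math.tanh(self.w_x * x + self.w_h * self.h)
            outputs.append(self.h)
        return outputs


xs = [0.1, 0.2, 0.3]

stateful_cell = TinyRecurrentCell(stateful=True)
first = stateful_cell.predict(xs)
second = stateful_cell.predict(xs)       # starts from the state left by the first call
stateful_cell.reset_states()
after_reset = stateful_cell.predict(xs)  # back to a zero initial state

stateless_cell = TinyRecurrentCell(stateful=False)
same_a = stateless_cell.predict(xs)
same_b = stateless_cell.predict(xs)

print(first != second)       # True
print(first == after_reset)  # True
print(same_a == same_b)      # True
```

If you do need stateful=True, the Keras versions I am aware of let you clear the stored state between independent predictions (model.reset_states() in Keras 2.x); check the documentation for your version.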
I think this library and the documentation attached to it will be interesting for your work.
Based on that library, in a recent project with Keras I started my code as follows:
import os
import numpy as np
from numpy.random import seed
seed(42)
rng = np.random.RandomState(42)
import tensorflow
tensorflow.random.set_seed(42)
os.environ['TF_DETERMINISTIC_OPS'] = '1'
There seemed to be a good deal of determinism in the results, and it was good enough for what I was working on at the time.
As #Dr.Snoopy pointed out, the problem was setting stateful=True. Setting it to False fixed the issue. The documentation describes the argument as: "Boolean (default False). If True, the last state for each sample at index i in a batch will be used as initial state for the sample of index i in the following batch." My misunderstanding was assuming that this only applies to training.
Thanks to #Dr.Snoopy for pointing that out.
Related
I am new to the TensorFlow framework and I am trying to use TensorFlow to predict survival based on this Titanic dataset: https://www.kaggle.com/c/titanic/data.
import tensorflow as tf
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
#%%
titanictrain = pd.read_csv('train.csv')
titanictest = pd.read_csv('test.csv')
df = pd.concat([titanictrain,titanictest],join='outer',keys='PassengerId',sort=False,ignore_index=True).drop(['Name'],axis=1)
#%%
def preprocess(df):
    df['Fare'].fillna(value=df.groupby('Pclass')['Fare'].transform('median'),inplace=True)
    df['Fare'] = df['Fare'].map(lambda x: np.log(x) if x>0 else 0)
    df['Embarked'].fillna(value=df['Embarked'].mode()[0],inplace=True)
    df['CabinAlphabet'] = df['Cabin'].str[0]
    categories_to_one_hot = ['Pclass','Sex','Embarked','CabinAlphabet']
    df = pd.get_dummies(df,columns=categories_to_one_hot,drop_first=True)
    return df
df = preprocess(df)
df = df.drop(['PassengerId','Ticket','Cabin','Survived'],axis=1)
titanic_trainandval = df.iloc[:len(titanictrain)]
titanic_test = df.iloc[len(titanictrain):] #test after preprocessing
titanic_test.head()
# split train into training and validation set
labels = titanictrain['Survived']
y = labels.values
test = titanic_test.copy() # real test sets
print(len(test), 'test examples')
Here I am trying to apply preprocessing on the data:
1. Drop the Name column and one-hot encode both the train and test sets.
2. Drop ['PassengerId','Ticket','Cabin','Survived'] for simplicity.
3. Split train and test following the original order.
Here is a picture showing what the training set looks like.
"""# model training"""
from tensorflow.keras.layers import Input, Dense, Activation,Dropout
from tensorflow.keras.models import Model
X = titanic_trainandval.copy()
input_layer = Input(shape=(X.shape[1],))
dense_layer_1 = Dense(10, activation='relu')(input_layer)
dense_layer_2 = Dense(5, activation='relu')(dense_layer_1)
output = Dense(1, activation='softmax',name = 'predictions')(dense_layer_2)
model = Model(inputs=input_layer, outputs=output)
base_learning_rate = 0.0001
model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True), optimizer=tf.keras.optimizers.Adam(learning_rate=base_learning_rate), metrics=['acc'])
history = model.fit(X, y, batch_size=5, epochs=20, verbose=2, validation_split=0.1,shuffle = False)
submission = pd.DataFrame()
submission['PassengerId'] = titanictest['PassengerId']
Then I put the training set X into the model to get the result. However, history shows the following result:
No matter how I change the learning rate and batch size, the result does not change, and the loss is always 'nan', and the prediction based on the test set is always 'nan' as well.
Could anybody explain where the problem is and give some possible solutions?
At first glance there are 2 major problems in your code:
1. Your output layer must be Dense(2, activation='softmax'). This is a binary classification problem, and if you use softmax to generate probabilities the output dimension must equal the number of classes. (Alternatively, you can use one output unit with a sigmoid activation.)
2. You have to change your loss function. With softmax and integer-encoded targets, use sparse_categorical_crossentropy. (With a single sigmoid unit, you can use binary_crossentropy with from_logits=False, which is the default.)
PS: make sure to remove all NaNs from your original data before calling fit.
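The point about the output layer can be checked numerically. The sketch below uses plain NumPy rather than Keras, with made-up logits: softmax over a single unit is always exactly 1, so a Dense(1, activation='softmax') output carries no information, while a single sigmoid unit or a two-unit softmax does.

```python
import numpy as np


def softmax(logits):
    # Numerically stable softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def sigmoid(logits):
    return 1.0 / (1.0 + np.exp(-logits))


# Three samples, one output unit each (made-up values).
one_unit_logits = np.array([[-3.0], [0.0], [4.2]])
one_unit_softmax = softmax(one_unit_logits)
one_unit_sigmoid = sigmoid(one_unit_logits)

print(one_unit_softmax.ravel())  # [1. 1. 1.] regardless of the logits
print(one_unit_sigmoid.ravel())  # three different probabilities

# Three samples, two output units each: rows are proper class distributions.
two_unit_logits = np.array([[-3.0, 1.0], [0.0, 0.0], [4.2, -1.0]])
two_unit_softmax = softmax(two_unit_logits)
print(two_unit_softmax)
```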
Marco Cerliani is right about points 1 and 2.
The real reason you get NaNs is that you feed NaNs into the model. If you look carefully, even in your third photo, the 888th example in the Age column contains a NaN.
This is why you have NaNs. Fix this, apply Marco Cerliani's suggestions, and you're good to go.
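A small NumPy sketch of how a single NaN in a feature column poisons everything computed from it, and one way to remove it. The ages below are made-up values, and median imputation is only an assumption for illustration; dropping the affected rows is another option.

```python
import numpy as np

# Made-up Age values with two missing entries.
ages = np.array([22.0, 38.0, np.nan, 35.0, np.nan, 54.0])
n_missing = int(np.isnan(ages).sum())
print(n_missing)  # 2

# Any weighted sum that touches a NaN is NaN, which is what a Dense layer computes.
weights = np.full(ages.shape, 0.1)
poisoned = np.dot(ages, weights)
print(np.isnan(poisoned))  # True

# Replace missing values with the median of the observed ones.
median_age = np.nanmedian(ages)
ages_filled = np.where(np.isnan(ages), median_age, ages)
clean = np.dot(ages_filled, weights)
print(np.isnan(ages_filled).any())  # False
print(np.isnan(clean))              # False
```

In pandas, the equivalent check is df.isna().sum(), and a column can be filled with df['Age'].fillna(df['Age'].median()).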
Apart from the above answers, one more thing I would like to add: whenever you use from_logits=True for classification problems, use a linear activation in the last layer, i.e. activation='linear'. For a Dense layer, leaving activation unset (the default, None) has the same effect.
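To illustrate why the activation and from_logits have to agree, here is a NumPy sketch of binary cross-entropy written both ways. The function names and the logits are made up for illustration; this is not the Keras implementation. A linear output paired with the logits form gives the same loss as a sigmoid output paired with the probability form, while feeding already-activated outputs into the logits form gives a different, wrong value.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce_from_probs(y_true, probs):
    # Binary cross-entropy on probabilities (what from_logits=False expects).
    return -np.mean(y_true * np.log(probs) + (1 - y_true) * np.log(1 - probs))


def bce_from_logits(y_true, logits):
    # Binary cross-entropy on raw scores (what from_logits=True expects);
    # algebraically equal to bce_from_probs(y_true, sigmoid(logits)).
    return np.mean(np.maximum(logits, 0) - logits * y_true
                   + np.log1p(np.exp(-np.abs(logits))))


y_true = np.array([0.0, 1.0, 1.0, 0.0])
logits = np.array([-1.2, 0.7, 2.5, 0.3])  # made-up outputs of a linear last layer
probs = sigmoid(logits)

correct = bce_from_logits(y_true, logits)     # linear output + from_logits=True
also_correct = bce_from_probs(y_true, probs)  # sigmoid output + from_logits=False
mismatched = bce_from_logits(y_true, probs)   # activated output + from_logits=True

print(np.isclose(correct, also_correct))  # True
print(np.isclose(correct, mismatched))    # False
```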
I'm trying to use a time-series data set with 30 different features and I want to predict the future values for 3 of those features. Is there any way I can specify which features I want to be used for output, and how many outputs, using TensorFlow and scikit-learn? Or is that just done when I am creating the x_train, y_train, etc. sets? I want to predict the heat index, temperature, and humidity based on various meteorological factors (air pressure, HDD, CDD, pollution, etc.). The 3 factors I wish to predict are part of the 30 total features.
I am using TensorFlow's RNN tutorial: https://www.tensorflow.org/tutorials/structured_data/time_series
univariate_past_history = 30
univariate_future_target = 0
x_train_uni, y_train_uni = univariate_data(uni_data, 0, 1930,
univariate_past_history,
univariate_future_target)
x_val_uni, y_val_uni = univariate_data(uni_data, 1930, None,
univariate_past_history,
univariate_future_target)
My data is given daily so I wanted to predict the next day using the last 30 days for example here.
and this is my implementation of the training of the model:
BATCH_SIZE = 256
BUFFER_SIZE = 10000
train_univariate = tf.data.Dataset.from_tensor_slices((x_train_uni, y_train_uni))
train_univariate = train_univariate.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat()
val_univariate = tf.data.Dataset.from_tensor_slices((x_val_uni, y_val_uni))
val_univariate = val_univariate.batch(BATCH_SIZE).repeat()
simple_lstm_model = tf.keras.models.Sequential([
tf.keras.layers.LSTM(8, input_shape=x_train_uni.shape[-2:]),
tf.keras.layers.Dense(1)
])
simple_lstm_model.compile(optimizer='adam', loss='mae')
for x, y in val_univariate.take(1):
    print(simple_lstm_model.predict(x).shape)
EVALUATION_INTERVAL = 200
EPOCHS = 30
simple_lstm_model.fit(train_univariate, epochs=EPOCHS,
steps_per_epoch=EVALUATION_INTERVAL,
validation_data=val_univariate, validation_steps=50)
EDIT: I understand that to increase the number of outputs I have to increase the Dense(1) value, but I want to understand how to specify which features to output/predict.
You need to give the model.fit call the variables you want to learn from, in a shape compatible with an LSTM layer.
So, for example, without any code, a model like yours might take as input:
[batchsize, n_timestamps, n_features]
and output:
[batchsize, n_timestamps, m_features]
where n is input and m output.
So then you need to give the model the truth data of the same shape as the model output in order for the model to calculate a loss.
So the model.fit call should be:
model.fit(x_train, y_train, ....) where y_train are the truth vectors of the same shape as the model output.
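In practice, choosing which features to predict comes down to which columns you slice into y_train. Below is a minimal NumPy sketch under stated assumptions: the helper make_windows, the random stand-in data, and the column positions in target_cols are all hypothetical, and the windowing only loosely follows the tutorial's univariate_data convention. Every window in X keeps all 30 features, while y keeps only the 3 target columns at the following time step.

```python
import numpy as np


def make_windows(data, target_cols, past_history, future_offset=0):
    """Build (X, y): X holds all features, y only the columns in target_cols."""
    X, y = [], []
    for end in range(past_history, len(data) - future_offset):
        X.append(data[end - past_history:end])            # (past_history, n_features)
        y.append(data[end + future_offset, target_cols])  # (len(target_cols),)
    return np.array(X), np.array(y)


n_days, n_features = 100, 30
data = np.random.default_rng(0).random((n_days, n_features))  # stand-in for real data

# Hypothetical positions of heat index, temperature and humidity in the array.
target_cols = [0, 1, 2]

X, y = make_windows(data, target_cols, past_history=30)
print(X.shape, y.shape)  # (70, 30, 30) (70, 3)
```

With targets shaped like this, the last layer of the model would be Dense(3) so that the model output matches y.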
You have to design a model architecture that fits your needs and matches the outputs you expect. I made a toy example, but I have never really worked with this type of NN so no idea if it makes sense for the problem.
import tensorflow as tf
from tensorflow.keras.layers import LSTM, Dense, InputLayer, Reshape
ni_feats = 10
no_feats = 3
ndays = 30
model = tf.keras.Sequential([
InputLayer((ndays, ni_feats)),
LSTM(10),
Dense(int(no_feats * ndays)),
Reshape((ndays, no_feats))
])
Edit: For anyone interested, I made it slightly better. I used an L2 regularizer of 0.0001 and added two more dense layers with 3 and 5 nodes and no activation functions. I added dropout=0.1 for the 2nd and 3rd GRU layers, reduced the batch size to 1000, and set the loss function to mae.
Important note: I discovered that my TEST dataframe was extremely small compared to the train one, and that is the main reason it gave me very bad results.
I have a GRU model with 12 input features, and I'm trying to predict output power. I really do not understand why, but no matter whether I choose:
1 layer or 5 layers
50 neurons or 512 neurons
10 epochs with a small batch size or 100 epochs with a large batch size
Different optimizers and activation functions
Dropout and L2 regularization
Adding more dense layers
Increasing and decreasing the learning rate
my results are always the same and don't make any sense: my loss and val_loss drop very steeply in the first 2 epochs and then stay constant for the rest, with small fluctuations in val_loss.
Here is my code and a figure of losses, and my dataframes if needed:
Dataframe1: https://drive.google.com/file/d/1I6QAU47S5360IyIdH2hpczQeRo9Q1Gcg/view
Dataframe2: https://drive.google.com/file/d/1EzG4TVck_vlh0zO7XovxmqFhp2uDGmSM/view
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from google.colab import files
from tensorboardcolab import TensorBoardColab, TensorBoardColabCallback
tbc=TensorBoardColab() # Tensorboard
from keras.layers import Dense
from keras.layers import GRU
from keras.models import Sequential
from keras.callbacks import EarlyStopping
from keras import regularizers
from keras.layers import Dropout
df10=pd.read_csv('/content/drive/My Drive/Isolation Forest/IF 10 PERCENT.csv',index_col=None)
df2_10= pd.read_csv('/content/drive/My Drive/2019 Dataframe/2019 10minutes IF 10 PERCENT.csv',index_col=None)
X10_train= df10[['WindSpeed_mps','AmbTemp_DegC','RotorSpeed_rpm','RotorSpeedAve','NacelleOrientation_Deg','MeasuredYawError','Pitch_Deg','WindSpeed1','WindSpeed2','WindSpeed3','GeneratorTemperature_DegC','GearBoxTemperature_DegC']]
X10_train=X10_train.values
y10_train= df10['Power_kW']
y10_train=y10_train.values
X10_test= df2_10[['WindSpeed_mps','AmbTemp_DegC','RotorSpeed_rpm','RotorSpeedAve','NacelleOrientation_Deg','MeasuredYawError','Pitch_Deg','WindSpeed1','WindSpeed2','WindSpeed3','GeneratorTemperature_DegC','GearBoxTemperature_DegC']]
X10_test=X10_test.values
y10_test= df2_10['Power_kW']
y10_test=y10_test.values
# scaling values for model
x_scale = MinMaxScaler()
y_scale = MinMaxScaler()
X10_train= x_scale.fit_transform(X10_train)
y10_train= y_scale.fit_transform(y10_train.reshape(-1,1))
# reuse the scalers fitted on the training data; refitting on the test set would scale it differently
X10_test= x_scale.transform(X10_test)
y10_test= y_scale.transform(y10_test.reshape(-1,1))
X10_train = X10_train.reshape((-1,1,12))
X10_test = X10_test.reshape((-1,1,12))
Early_Stop=EarlyStopping(monitor='val_loss', patience=3 , mode='min',restore_best_weights=True)
# creating model using Keras
model10 = Sequential()
model10.add(GRU(units=200, return_sequences=True, input_shape=(1,12),activity_regularizer=regularizers.l2(0.0001)))
model10.add(GRU(units=100, return_sequences=True))
model10.add(GRU(units=50))
#model10.add(GRU(units=30))
model10.add(Dense(units=1, activation='linear'))
model10.compile(loss=['mse'], optimizer='adam',metrics=['mse'])
model10.summary()
history10=model10.fit(X10_train, y10_train, batch_size=1500,epochs=100,validation_split=0.1, verbose=1, callbacks=[TensorBoardColabCallback(tbc),Early_Stop])
score = model10.evaluate(X10_test, y10_test)
print('Score: {}'.format(score))
y10_predicted = model10.predict(X10_test)
y10_predicted = y_scale.inverse_transform(y10_predicted)
y10_test = y_scale.inverse_transform(y10_test)
plt.scatter( df2_10['WindSpeed_mps'], y10_test, label='Measurements',s=1)
plt.scatter( df2_10['WindSpeed_mps'], y10_predicted, label='Predicted',s=1)
plt.legend()
plt.savefig('/content/drive/My Drive/Figures/we move on curve6 IF10.png')
plt.show()
I think the number of GRU units is very high there. Too many GRU units might cause a vanishing gradient problem. To start, I would choose 30 to 50 GRU units. Also try a somewhat higher learning rate, e.g. 0.001.
If the dataset is publicly available, could you please share the link so that I can experiment with it and report back?
I made it slightly better. I used an L2 regularizer of 0.0001 and added two more dense layers with 3 and 5 nodes and no activation functions. I added dropout=0.1 for the 2nd and 3rd GRU layers, reduced the batch size to 1000, and set the loss function to mae.
Important note: I discovered that my TEST dataframe was extremely small compared to the train one, and that is the main reason it gave me very bad results.
I'm training my Keras model to predict, from the provided data parameters, whether it will make a shot or not, where 0 means no and 1 means yes. However, when I try to predict, I get float values.
I've tried predicting on data that is exactly the same as the training data to get 1, but it does not work.
I used the data below to try the one-hot encoding.
https://github.com/eijaz1/Deep-Learning-in-Keras-Tutorial/blob/master/keras_tutorial.ipynb
import pandas as pd
from keras.utils import to_categorical
from keras.models import load_model
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import EarlyStopping
#read in training data
train_df_2 = pd.read_csv('diabetes_data.csv')
#view data structure
train_df_2.head()
#create a dataframe with all training data except the target column
train_X_2 = train_df_2.drop(columns=['diabetes'])
#check that the target variable has been removed
train_X_2.head()
#one-hot encode target column
train_y_2 = to_categorical(train_df_2.diabetes)
#check that target column has been converted
train_y_2[0:5]
#create model
model_2 = Sequential()
#get number of columns in training data
n_cols_2 = train_X_2.shape[1]
#add layers to model
model_2.add(Dense(250, activation='relu', input_shape=(n_cols_2,)))
model_2.add(Dense(250, activation='relu'))
model_2.add(Dense(250, activation='relu'))
model_2.add(Dense(2, activation='softmax'))
#compile model using accuracy to measure model performance
model_2.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
early_stopping_monitor = EarlyStopping(patience=3)
model_2.fit(train_X_2, train_y_2, epochs=30, validation_split=0.2, callbacks=[early_stopping_monitor])
train_dft = pd.read_csv('diabetes_data - Copy.csv')
train_dft.head()
test_y_predictions = model_2.predict(train_dft)
print(test_y_predictions)
I wanted to get
[[0,1]
[1,0]]
However, I am getting
[[0.8544417 0.14555828]
[0.9312985 0.06870154]]
Additionally, can anyone explain to me what this value 0.8544417 means?
Actually, you may interpret the output of a model with a softmax classifier at the top as the confidence scores or probabilities of classes (because the softmax function normalizes the values such that they would be positive and have a sum of 1). So, when you provide the model with a true label of [1, 0] this means that this sample belongs to class 1 with probability of 1, and it belongs to class 2 with probability of zero. Therefore, during training the optimization process tries to get as close as possible to that label, but it would never exactly reach [1,0] (actually due to softmax it might get as close as [0.999999, 0.000001], but never [1, 0]).
But that is not a problem, because we only need to get close enough to know the class with the highest probability, and we take that class as the model's prediction. You can easily do that by finding the index of the class with maximum probability:
import numpy as np
preds = model.predict(some_data)
class_preds = np.argmax(preds, axis=-1) # e.g. for [max,min] it gives 0, for [min,max] it gives 1
Further, if you want to convert predictions to either [0,1] or [1,0] for any reason, you can just round the values:
import numpy as np
preds = model.predict(some_data)
round_preds = np.around(preds) # this would convert [0.87, 0.13] to [1., 0.]
Note: rounding only works properly with two classes, and not when you have more than two classes (e.g. [0.3, 0.4, 0.3] would become [0, 0, 0] after rounding).
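The caveat about more than two classes can be seen in a short NumPy sketch with made-up predictions. Rounding turns the first row into all zeros, whereas taking the argmax and indexing into an identity matrix always yields a valid one-hot row.

```python
import numpy as np

# Made-up softmax outputs for two samples and three classes.
preds = np.array([[0.3, 0.4, 0.3],
                  [0.7, 0.2, 0.1]])

rounded = np.around(preds)
print(rounded)  # first row is [0. 0. 0.], which is not a valid one-hot vector

# Index of the most probable class, then its one-hot encoding.
class_ids = np.argmax(preds, axis=-1)
one_hot = np.eye(preds.shape[-1])[class_ids]
print(class_ids)  # [1 0]
print(one_hot)
```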
Note 2: Since you are creating the model using the Sequential API of Keras, older Keras versions also offered model.predict_classes(some_data) as an alternative to the argmax approach described above, giving the exact same output. That method was removed in TensorFlow 2.6, so on current versions use the argmax approach.
I can't seem to find a concrete answer to the question of how to feed data into Keras. Most examples seem to work off image / text data and have clearly defined data points.
I'm trying to feed music into an LSTM neural network. I want the network to take ~3 seconds of music and nominate the next 2 seconds. I have my music prepared as .wav files and partitioned into 5-second intervals that I've decomposed into my X (first 3 seconds) and Y (last 2 seconds). I've sampled my music at 44,100 Hz, so my X is 132,300 observations long and my Y is 88,200 observations long.
But I can't figure out exactly how to bridge Keras to my data structure. I'm using a Tensorflow backend.
In the interest of generalizing the problem and answer, I'll use A,B,C to denote dimensions. The only difference between this example data and my real data is that these are random values distributed from 0 to 1, and my data is an array of integers.
import numpy as np
#using variables to make it easy to generalize the answer
#a = the number of observations I have
a = 411
#b = the duration of the sample, 44.1k observations per second of music
b_train = 132300
b_test = 88200
#c = the number of channels in the music, this is 2 channel stereo
c = 2
#now create sample data with the dimensionality given above:
X = np.random.rand(a,b_train,c)
y = np.random.rand(a,b_test ,c)
#split the data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.20, random_state=42)
However, I don't really know how to configure a model to understand that the 'first' (A) dimension contains observations and that I want to more or less break out the music (B) by channel (C).
I know that it'd probably be easier to convert this to mono (and a 2d problem) but I'm very curious to see whether or not this has a 'simple' solution - whether that mostly takes the shape of what I have below or whether I should think of the model in another way.
The primary question is this: how would I construct a model that would allow me to transform my X data into my Y data?
Ideally, an answer would show how to modify the model below to fit the data structure above.
import keras
import math, time
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers import LSTM
from keras.models import load_model
def build_model(layers):
    d = 0.3
    model = Sequential()
    model.add(LSTM(256, input_shape=(layers), return_sequences=True))
    model.add(Dropout(d))
    model.add(LSTM(256, input_shape=(layers), return_sequences=False))
    model.add(Dropout(d))
    model.add(Dense(32,kernel_initializer="uniform",activation='relu'))
    model.add(Dense(1,kernel_initializer="uniform",activation='linear'))
    start = time.time()
    model.compile(loss='mse',optimizer='adam', metrics=['accuracy'])
    print("Compilation Time : ", time.time() - start)
    return model
#build model...
model = build_model([328,132300,2])
model.fit(X_train,y_train,batch_size=512,epochs=30,validation_split=0.1,verbose=1)
However, this yields an error (at the model = ... step):
ValueError: Input 0 is incompatible with layer lstm_2: expected ndim=3, found ndim=4
I can't figure out where Keras gets the expectation to see ndim=4 data. Also, I don't know how to ensure that I feed data into the model such that the model 'understands' observations are distributed across the A-axis and the data itself is distributed on the B- and C-axis.
If anything is unclear, please leave a comment. I'll watch this diligently until Sept '17 or so and I'll be sure to update this question to reflect advice / comments left.
Thanks!
Keras convention is that the batch dimension is typically omitted in the input_shape arguments. From the guide:
Pass an input_shape argument to the first layer. This is a shape tuple (a tuple of integers or None entries, where None indicates that any positive integer may be expected). In input_shape, the batch dimension is not included.
So changing the call to model = build_model([132300,2]) should solve the problem.
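One way to avoid hard-coding the shape is to derive it from the data itself. This is a small NumPy sketch with a deliberately tiny stand-in array (the real X in the question has shape (a, 132300, 2)); the variable names are only illustrative.

```python
import numpy as np

# Tiny stand-in for X: (observations, samples per clip, channels).
X = np.zeros((8, 100, 2))

n_observations = X.shape[0]  # the batch axis, which Keras infers at fit time
input_shape = X.shape[1:]    # what input_shape expects: (timesteps, channels)

print(n_observations)  # 8
print(input_shape)     # (100, 2)
```

With the real data, X_train.shape[1:] would give (132300, 2), which is the value passed to build_model above.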