I want to start trying my hand at neural networks and found Keras to be very simple syntactically. My setup is: X_train is an array of shape (3516, 6) and y_train is of shape (3516,).
X_train looks like this:
[[ 888.     900.5    855.     879.311  877.00266667  893.5008 ]
 [ 875.     878.5    840.     880.026  874.56933333  890.7948 ]
 [ 860.     870.     839.5    880.746  870.54333333  887.6428 ]
 ...]
It is an input of six pieces of financial data used to predict one output. I know it's not going to be accurate, but it is to get me going on something at least before I move on to RNNs.
My problem is that the loss function shows nan at every epoch, accuracy shows 0%, and validation accuracy shows 0%, as if to say that data isn't even being passed through the model. I mean, even if it's a poor model with poor inputs, even that should be represented by a large loss, right? Here is the model (see the EDIT below).
Anyway, I am sure that I am doing something wrong and would really appreciate your input.
Many thanks,
S
EDIT: FULL WORKING CODE:
# imports assumed from context (not shown in the original post; the code uses the Keras 1.x API)
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense, core
from keras import optimizers

def load_data(keyword):
    df = pd.read_csv('%s_x.csv' % keyword)
    df2 = pd.read_csv('%s_y.csv' % keyword)
    df2 = df2['label']
    try:
        df.drop('Unnamed: 0', axis=1, inplace=True)
    except Exception:
        print("wouldn't let drop unnamed column")
    X = df.as_matrix()
    y = df2.as_matrix()
    X_len = len(X)
    test_size = 0.2
    test_split = int(test_size * X_len)
    X_train = X[:-test_split]
    y_train = y[:-test_split]
    X_test = X[-test_split:]
    y_test = y[-test_split:]
    # this return was missing from the paste; training() below unpacks these four
    return X_train, X_test, y_train, y_test
def keras():
    model = Sequential([
        Dense(input_dim=6, output_dim=3),  # 6 input features to match the (3516, 6) data
        Dense(output_dim=60, activation='linear'),
        core.Dropout(p=0.1),
        Dense(60, activation='linear'),
        core.Dropout(p=0.1),
        Dense(1, activation='linear')
    ])
    return model
def training(epoch):
    # start the program off by loading some data into it
    X_train, X_test, y_train, y_test = load_data('admiral')
    y_train = y_train.reshape(len(y_train), 1)
    y_test = y_test.reshape(len(y_test), 1)
    model = keras()
    # optimizer will go into the compile function
    # RMSprop is apparently a pretty decent choice for recurrent neural
    # networks, although we will start it on a simple NN too
    rms = optimizers.RMSprop(lr=0.001, rho=0.9, epsilon=1e-08)
    model.compile(optimizer=rms, loss='mean_squared_error', metrics=['accuracy'])  # note: no trailing space in the loss name
    model.fit(X_train, y_train, nb_epoch=epoch, batch_size=500, validation_split=0.01)
    score = model.evaluate(X_test, y_test, batch_size=50)
    print(score)

training(300)
Accuracy really low because it doesn't make sense to show accuracy for a regression problem; it is more for classification.
Data was being passed through; the loss value was just coming out as NaN. Question answered.
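For later readers, a minimal sketch (my own, not the asker's final code, using the same Keras 1.x-style API as the question) of a regression-appropriate setup: scale the raw price inputs and drop the accuracy metric. Unscaled financial data is a classic recipe for exploding (NaN) losses:

from sklearn.preprocessing import StandardScaler

# scale raw prices to zero mean / unit variance; huge raw inputs
# often make the loss blow up to NaN
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

model = Sequential([
    Dense(60, input_dim=6, activation='relu'),
    Dense(1, activation='linear')
])
rms = optimizers.RMSprop(lr=0.001)
# no 'accuracy' metric: it is meaningless for regression
model.compile(optimizer=rms, loss='mean_squared_error')
model.fit(X_train_s, y_train, nb_epoch=300, batch_size=500, validation_split=0.01)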
Related
I am trying to build a model to predict house prices.
I have some features X (no. of bathrooms, etc.) and target Y (ranging from about $300,000 to $800,000).
I have used sklearn's Standard Scaler to standardize Y before fitting it to the model.
Here is my Keras model:
def build_model():
    model = Sequential()
    model.add(Dense(36, input_dim=36, activation='relu'))
    model.add(Dense(18, input_dim=36, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='mse', optimizer='sgd', metrics=['mae', 'mse'])
    return model
I am having trouble trying to interpret the results -- what does an MSE of 0.617454319755 mean?
Do I have to inverse-transform this number and take the square root, getting an error of 741.55 dollars?
math.sqrt(sc.inverse_transform([mse]))
I apologise for sounding silly as I am starting out!
"I apologise for sounding silly as I am starting out!"
Do not; this is a subtle issue of great importance, which is usually (and regrettably) omitted in tutorials and introductory expositions.
Unfortunately, it is not as simple as taking the square root of the inverse-transformed MSE, but it is not that complicated either; essentially what you have to do is:
1. Transform your predictions back to the initial scale of the original data
2. Get the MSE between these inverse-transformed predictions and the original data
3. Take the square root of the result
in order to get a performance indicator of your model that will be meaningful in the business context of your problem (e.g. US dollars here).
Let's see a quick example with toy data, omitting the model itself (which is irrelevant here, and in fact can be any regression model - not only a Keras one):
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
import numpy as np
# toy data
X = np.array([[1,2], [3,4], [5,6], [7,8], [9,10]])
Y = np.array([3, 4, 5, 6, 7])
# feature scaling
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X)
# outcome scaling:
sc_Y = StandardScaler()
Y_train = sc_Y.fit_transform(Y.reshape(-1, 1))
Y_train
# array([[-1.41421356],
# [-0.70710678],
# [ 0. ],
# [ 0.70710678],
# [ 1.41421356]])
Now, let's say that we fit our Keras model (not shown here) using the scaled sets X_train and Y_train, and get predictions on the training set:
prediction = model.predict(X_train) # scaled inputs here
print(prediction)
# [-1.4687586 -0.6596055 0.14954728 0.95870024 1.001172 ]
The MSE reported by Keras is actually the scaled MSE, i.e.:
MSE_scaled = mean_squared_error(Y_train, prediction)
MSE_scaled
# 0.052299712818541934
while the 3 steps I have described above are simply:
MSE = mean_squared_error(Y, sc_Y.inverse_transform(prediction)) # first 2 steps, combined
MSE
# 0.10459946572909758
np.sqrt(MSE) # 3rd step
# 0.323418406602187
So, in our case, if our initial Y were US dollars, the actual error in the same units (dollars) would be 0.32 (dollars).
Notice how the naive approach of inverse-transforming the scaled MSE would give a very different (and incorrect) result:
np.sqrt(sc_Y.inverse_transform([MSE_scaled]))
# array([2.25254588])
MSE is the mean squared error; the formula is
MSE = (1/n) * Σ (y_i − ŷ_i)²
Basically, it is the mean of the squared differences between the expected outputs and the predictions. Taking the square root of this does not by itself give you the difference between prediction and output in the original units. It is useful for training.
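As a minimal NumPy illustration of that formula (toy numbers of my own):

import numpy as np

y_true = np.array([3.0, 4.0, 5.0])  # expected outputs
y_pred = np.array([2.5, 4.5, 5.0])  # model predictions

mse = np.mean((y_true - y_pred) ** 2)  # (0.25 + 0.25 + 0.0) / 3
rmse = np.sqrt(mse)                    # back in the units of y
print(mse, rmse)  # ≈ 0.1667  ≈ 0.4082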
Currently you have built a model.
If you want to train the model, use this function:
model.fit(x=input_x_array, y=input_y_array, batch_size=None, epochs=1, verbose=1, callbacks=None, validation_split=0.0, validation_data=None, shuffle=True, class_weight=None, sample_weight=None, initial_epoch=0, steps_per_epoch=None, validation_steps=None)
If you want to predict the output, you should use the following code:
prediction = model.predict(np.array(input_x_array))
print(prediction)
You can find more details here.
https://keras.io/models/about-keras-models/
https://keras.io/models/sequential/
# imports assumed from context (only pandas was shown in the original post)
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from tensorflow import keras
from tensorflow.keras import layers

df = pd.read_csv('final sheet for project.csv')
features = ['moisture', 'volatile matter', 'fixed carbon', 'calorific value', 'carbon %', 'oxygen%']
train_data = df[features]
target_data = df.pop('Activation energy')

X_train, X_test, y_train, y_test = train_test_split(train_data, target_data, test_size=0.09375, random_state=1)

standard_X_train = pd.DataFrame(StandardScaler().fit_transform(X_train))
standard_X_test = pd.DataFrame(StandardScaler().fit_transform(X_test))

y_train = y_train.values
y_train = y_train.reshape((-1, 1))
scaler = MinMaxScaler(feature_range=(0, 1))
scaler = scaler.fit(y_train)
normalized_y_train = scaler.transform(y_train)

y_test = y_test.values
y_test = y_test.reshape((-1, 1))
scaler = MinMaxScaler(feature_range=(0, 1))
scaler = scaler.fit(y_test)
normalized_y_test = scaler.transform(y_test)

model = keras.Sequential([
    layers.Dense(units=20, input_shape=[6,]),
    layers.Dense(units=1, activation='tanh')
])
model.compile(
    optimizer='adam',
    loss='mae',
)
history = model.fit(standard_X_train, normalized_y_train,
                    validation_data=(standard_X_test, normalized_y_test),
                    epochs=200)
I wish to create a model to predict activation energy from some features. I am getting training loss: 0.0629 and val_loss: 0.4213.
But when I try to predict the activation energies of some other unseen data, I get bizarre results. I am a beginner in ML.
Can someone please help with what changes can be made to the code? (I want to make a model with one hidden layer of 20 units that has the tanh activation function.)
You should not use fit_transform on the test data. You should use fit_transform on the training data and apply just transform to the test data, so that the parameters learned from the training data are also used on the test data.
So, the transformation part of your code should change like this:
scaler_x = StandardScaler()
standard_X_train = pd.DataFrame(scaler_x.fit_transform(X_train))
standard_X_test = pd.DataFrame(scaler_x.transform(X_test))

y_train = y_train.values.reshape((-1, 1))
y_test = y_test.values.reshape((-1, 1))

scaler_y = MinMaxScaler(feature_range=(0, 1))
normalized_y_train = scaler_y.fit_transform(y_train)
normalized_y_test = scaler_y.transform(y_test)
Furthermore, since you are scaling your data, you should do the same thing for any prediction. So, your prediction line should be something like:
preds = scaler_y.inverse_transform(
    model.predict(scaler_x.transform(pred_input))  # if it is standard_X_test you don't need to transform again, since you already did
)
Additionally, since you are scaling your labels into the range 0 to 1, you may need to change your last-layer activation function to sigmoid instead of tanh. It may even be better to use an activation function like relu in your first layer if you are still getting poor results after the above modifications:
model = keras.Sequential([
    layers.Dense(units=20, input_shape=[6,], activation='relu'),
    layers.Dense(units=1, activation='sigmoid')
])
Dataset:
The PV Yield (kWh) is my output; my model is supposed to predict this.
This is what I have done. I have attached the image of the dataset. From AirTemp to Zenith is my X, and Y is PV Yield (kWh).
df = pd.read_csv("Data1.csv")
X = df.drop(['Date-PrimaryKey', 'output-PV Yield (kWh)'], axis=1)
Y = df['output-PV Yield (kWh)']

# (the train/test split is not shown in the post; X_train, X_test,
# train_y and test_y are assumed to come from it)
pca = PCA(n_components=9)
pca.fit(X_train)
X_train = pca.transform(X_train)
pca.fit(X_test)
X_test = pca.transform(X_test)

# normalizing the input values to fall in -1 to 1
X_train = X_train / 180000000.0
X_test = X_test / 180000000.0

# Creating Model
model = Sequential()
model.add(Dense(15, input_shape=(9,)))
model.add(Activation('tanh'))
model.add(Dense(11))
model.add(Activation('tanh'))
model.add(Dense(1))
model.summary()

sgd = optimizers.SGD(lr=0.1, momentum=0.2)
model.compile(loss='mean_absolute_error', optimizer=sgd, metrics=['accuracy'])

# Training
model.fit(X_train, train_y, epochs=20, batch_size=50, validation_data=(X_test, test_y))
My weights are not getting updated. Accuracy is zero in all epochs.
The model seems OK, but there are two problems I can spot right away:
pca = PCA(n_components=9)
pca.fit(X_train)
X_train = pca.transform(X_train)
pca.fit(X_test)
X_test = pca.transform(X_test)
Anything used for transformation of the data must not be fitted on the test data. You fit it on the train samples and then use it to transform both the train and the test part. You should assume that you know nothing about the data you will be predicting on in production, e.g. you know nothing about tomorrow's weather, the results of sports matches in a month, etc. You won't be able to fit on such data then, so you can't do so during training either. The correct way:
pca = PCA(n_components=9)
pca.fit(X_train)
X_train = pca.transform(X_train)
X_test = pca.transform(X_test)
The second clearly incorrect thing is here:
#normalizing the input values to fall in -1 to 1
X_train = X_train/180000000.0
X_test = X_test/180000000.0
Of course you want to normalize your data, but this way you will end up with incredibly small decimals in columns where the values are low, e.g. the AlbedoDaily column, and quite high values where the values are high, such as SurfacePressure. For such scaling you can use already-defined classes such as StandardScaler. The code is very simple, and each column is treated independently:
from sklearn.preprocessing import StandardScaler
transformer = StandardScaler().fit(X_train)
X_train = transformer.transform(X_train)
X_test = transformer.transform(X_test)
You have not provided or explained what your target variable is and where you get it from; there could be other problems in your code that I cannot see right now.
I am trying to create a binary classifier on a data set of 10,000 records. I have tried multiple activations and optimizers; however, the results are always between 56.8% and 58.9% accuracy. Given the fairly steady results over many dozens of iterations, I assume the problem is either:
My dataset is not classifiable
My model is broken
This is the data set: training-set.csv
I may be able to get 2000 more records but that would be it.
My question is: is there something in the way my model is constructed that is preventing it from learning to a higher degree?
Note that I am happy to have as many layers and nodes as needed, and time is not a factor in generating the model.
# imports assumed from context (only the sklearn imports were shown in the original post)
import numpy as np
import pandas
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import Adam
from keras.wrappers.scikit_learn import KerasClassifier

dataframe = pandas.read_csv(r"training-set.csv", index_col=None)
dataset = dataframe.values
X = dataset[:, 0:48].astype(float)
Y = dataset[:, 48]

# count the input variables
col_count = X.shape[1]

# normalize X
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_scale = sc_X.fit_transform(X)

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_scale, Y, test_size=0.2)

# define baseline model
activator = 'linear'  # 'relu' 'sigmoid' 'softmax' 'exponential' 'linear' 'tanh'
#opt = 'Adadelta'  # adam SGD nadam RMSprop Adadelta
nodes = 1000
max_layers = 2
max_epochs = 100
max_batch = 32
loss_funct = 'binary_crossentropy'  # for binary
last_act = 'sigmoid'  # 'softmax' 'sigmoid' 'relu'

def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(nodes, input_dim=col_count, activation=activator))
    for x in range(0, max_layers):
        model.add(Dropout(0.2))
        model.add(Dense(nodes, input_dim=nodes, activation=activator))
        #model.add(BatchNormalization())
    # output layer
    model.add(Dense(1, activation=last_act))
    # Compile model
    adam = Adam(lr=0.001)
    model.compile(loss=loss_funct, optimizer=adam, metrics=['accuracy'])
    return model

estimator = KerasClassifier(build_fn=baseline_model, epochs=max_epochs, batch_size=max_batch)
estimator.fit(X_train, y_train)
y_pred = estimator.predict(X_test)

# confusion matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
score = np.sum(cm.diagonal()) / float(np.sum(cm))
Two points:
There is absolutely no point in stacking dense layers with linear activations: they collapse to a single linear unit. Change to activator = 'relu' (and just don't bother with the other candidate activation functions in your commented-out list). A quick demonstration follows after these two points.
Do not use dropout by default, especially when your model has difficulties learning (like here); remove the dropout layer(s), and just be ready to put (some of) them back in only if you see overfitting (you are currently still very far from that point, so this is not something to worry about now).
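To make the first point concrete, here is a tiny NumPy sketch (dimensions mirror the question's 48 inputs and 1000-node layers; biases omitted for brevity) showing that two stacked linear layers compute exactly the same function as a single linear layer:

import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(48, 1000))    # first Dense, linear activation
W2 = rng.normal(size=(1000, 1000))  # second Dense, linear activation
x = rng.normal(size=(5, 48))        # a batch of 5 samples

# composing two linear maps is itself one linear map with weights W1 @ W2
print(np.allclose((x @ W1) @ W2, x @ (W1 @ W2)))  # True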
I am going through the Kaggle Digit Recognizer Tutorial and I'm trying to understand how all of this works. I would like to validate a predicted value. Basically, I have a prediction that's wrong, but I want to see what the actual value of that prediction was. I think I am way off:
...
df = pd.read_csv('data/train.csv')
labels = df['label'].values
x_train = df.drop(columns=['label']).values / 255
# trying to produce a crappy dataset for train/test
x_train, x_test, y_train, y_test = train_test_split(x_train, labels, test_size=0.95)
# Purposely trying to get a crappy model so I can learn about validation
model = tf.keras.models.Sequential()
# model.add(tf.keras.layers.Flatten())
# model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(10, activation=tf.nn.softmax))
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=1)
predictions = model.predict([x_test])
index_to_predict = 0
print('Prediction: ', np.argmax(predictions[index_to_predict]))
print('Actual: ', predictions.argmax(axis=-1)[index_to_predict])
print(predictions.shape)
vals = x_test[index_to_predict].reshape(28, 28)
plt.imshow(vals)
This yields the following (output screenshot not reproduced here):
How can I get a true "here's the prediction" and "here's the actual" breakdown? My logic on getting the actual is definitely off.
The true labels (also sometimes called target values, or ground-truth labels) are stored in y_train and y_test for training and test set respectively. Therefore, you can easily just print that to find the true label:
print('Actual:', y_test[index_to_predict])
y_test[index_to_predict]
will have the actual label and
predictions[index_to_predict]
should have the predicted probability values for each of your classes.
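Building on that, a small sketch (variable names follow the question's code) to list every index where the prediction disagrees with the actual label:

import numpy as np

pred_labels = predictions.argmax(axis=-1)     # predicted digit per test sample
wrong = np.nonzero(pred_labels != y_test)[0]  # indices of misclassifications

for i in wrong[:5]:  # inspect the first few mistakes
    print('index %d: predicted %d, actual %d' % (i, pred_labels[i], y_test[i]))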