Solar power prediction using Keras - python

Dataset:
The PV Yield (kWh) is my output. My model is suppose to predict this.
This is what I have done. I have attached the image of the dataset. From AirTemp to Zenith is my X and Y is PV Yield(KW/H).
df=pd.read_csv("Data1.csv")
X=df.drop(['Date-PrimaryKey','output-PV Yield (kWh)'],axis=1)
Y=df['output-PV Yield (kWh)']
pca = PCA(n_components=9)
pca.fit(X_train)
X_train = pca.transform(X_train)
pca.fit(X_test)
X_test = pca.transform(X_test)
#normalizing the input values to fall in -1 to 1
X_train = X_train/180000000.0
X_test = X_test/180000000.0
#Creating Model
model = Sequential()
model.add(Dense(15, input_shape=(9,)))
model.add(Activation('tanh'))
model.add(Dense(11))
model.add(Activation('tanh'))
model.add(Dense(1))
model.summary()
sgd = optimizers.SGD(lr=0.1,momentum=0.2)
model.compile(loss='mean_absolute_error',optimizer=sgd,metrics=['accuracy'])
#Training
model.fit(X_train, train_y, epochs=20, batch_size = 50, validation_data=(X_test, test_y))
My weights are not getting updated. Accuracy is zero in all epochs.

The model seems OK but there are two problems I can spot fast:
pca = PCA(n_components=9)
pca.fit(X_train)
X_train = pca.transform(X_train)
pca.fit(X_test)
X_test = pca.transform(X_test)
Anything used for transformation of the data must not be fit on testing data. You fit it on train samples and then use it to transform both train and test part. You should assume that you know nothing about data you will be predicting on in production, eg. you know nothing about tomorrows weather, results of sport matches in a month, etc. You wont be able to do so then, so you cant do so during training. Correct way:
pca = PCA(n_components=9)
pca.fit(X_train)
X_train = pca.transform(X_train)
X_test = pca.transform(X_test)
The second very incorrect stuff you have there is here:
#normalizing the input values to fall in -1 to 1
X_train = X_train/180000000.0
X_test = X_test/180000000.0
Of course you want to normalize your data, but this way you will end up with incredibly low decimals in cases where values are low, eg. AlbedoDaily column, and quite high values where are values high, such as SurfacePressure. For such scaling you can use already defined classes such as standard scaler. The code is very simple and each column is treated independently:
from sklearn.preprocessing import StandardScaler
transformer = StandardScaler().fit(X_train)
X_train = transformer.transform(X_train)
X_test = transformer.transform(X_test)
You have not provided or explained what your target variable is and where you get is, there could be other problems in your code I can not see right now.

Related

Neural Network prediction OK on testing data but bad on full dataset [duplicate]

I am trying to build a model to predict house prices.
I have some features X (no. of bathrooms , etc.) and target Y (ranging around $300,000 to $800,000)
I have used sklearn's Standard Scaler to standardize Y before fitting it to the model.
Here is my Keras model:
def build_model():
model = Sequential()
model.add(Dense(36, input_dim=36, activation='relu'))
model.add(Dense(18, input_dim=36, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='mse', optimizer='sgd', metrics=['mae','mse'])
return model
I am having trouble trying to interpret the results -- what does a MSE of 0.617454319755 mean?
Do I have to inverse transform this number, and square root the results, getting an error rate of 741.55 in dollars?
math.sqrt(sc.inverse_transform([mse]))
I apologise for sounding silly as I am starting out!
I apologise for sounding silly as I am starting out!
Do not; this is a subtle issue of great importance, which is usually (and regrettably) omitted in tutorials and introductory expositions.
Unfortunately, it is not as simple as taking the square root of the inverse-transformed MSE, but it is not that complicated either; essentially what you have to do is:
Transform back your predictions to the initial scale of the original data
Get the MSE between these invert-transformed predictions and the original data
Take the square root of the result
in order to get a performance indicator of your model that will be meaningful in the business context of your problem (e.g. US dollars here).
Let's see a quick example with toy data, omitting the model itself (which is irrelevant here, and in fact can be any regression model - not only a Keras one):
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
import numpy as np
# toy data
X = np.array([[1,2], [3,4], [5,6], [7,8], [9,10]])
Y = np.array([3, 4, 5, 6, 7])
# feature scaling
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X)
# outcome scaling:
sc_Y = StandardScaler()
Y_train = sc_Y.fit_transform(Y.reshape(-1, 1))
Y_train
# array([[-1.41421356],
# [-0.70710678],
# [ 0. ],
# [ 0.70710678],
# [ 1.41421356]])
Now, let's say that we fit our Keras model (not shown here) using the scaled sets X_train and Y_train, and get predictions on the training set:
prediction = model.predict(X_train) # scaled inputs here
print(prediction)
# [-1.4687586 -0.6596055 0.14954728 0.95870024 1.001172 ]
The MSE reported by Keras is actually the scaled MSE, i.e.:
MSE_scaled = mean_squared_error(Y_train, prediction)
MSE_scaled
# 0.052299712818541934
while the 3 steps I have described above are simply:
MSE = mean_squared_error(Y, sc_Y.inverse_transform(prediction)) # first 2 steps, combined
MSE
# 0.10459946572909758
np.sqrt(MSE) # 3rd step
# 0.323418406602187
So, in our case, if our initial Y were US dollars, the actual error in the same units (dollars) would be 0.32 (dollars).
Notice how the naive approach of inverse-transforming the scaled MSE would give a very different (and incorrect) result:
np.sqrt(sc_Y.inverse_transform([MSE_scaled]))
# array([2.25254588])
MSE is mean square error, here is the formula.
Basically it is a mean of square of different of expected output and prediction. Making square root of this will not give you the difference between error and output. This is useful for training.
Currently you have build a model.
If you want to train the model use these function.
mode.fit(x=input_x_array, y=input_y_array, batch_size=None, epochs=1, verbose=1, callbacks=None, validation_split=0.0, validation_data=None, shuffle=True, class_weight=None, sample_weight=None, initial_epoch=0, steps_per_epoch=None, validation_steps=None)
If you want to do prediction of the output you should use following code.
prediction = model.predict(np.array(input_x_array))
print(prediction)
You can find more details here.
https://keras.io/models/about-keras-models/
https://keras.io/models/sequential/

Not able to get correct results even though mean absolute error is low

import pandas as pd
df=pd.read_csv('final sheet for project.csv')
features=['moisture','volatile matter','fixed carbon','calorific value','carbon %','oxygen%']
train_data=df[features]
target_data=df.pop('Activation energy')
X_train, X_test, y_train, y_test = train_test_split(train_data,target_data, test_size=0.09375, random_state=1)
standard_X_train=pd.DataFrame(StandardScaler().fit_transform(X_train))
standard_X_test=pd.DataFrame(StandardScaler().fit_transform(X_test))
y_train=y_train.values
y_train = y_train.reshape((-1, 1))
scaler = MinMaxScaler(feature_range=(0, 1))
scaler = scaler.fit(y_train)
normalized_y_train = scaler.transform(y_train)
y_test=y_test.values
y_test = y_test.reshape((-1, 1))
scaler = MinMaxScaler(feature_range=(0, 1))
scaler = scaler.fit(y_test)
normalized_y_test = scaler.transform(y_test)
model=keras.Sequential([layers.Dense(units=20,input_shape=[6,]),layers.Dense(units=1,activation='tanh')])
model.compile(
optimizer='adam',
loss='mae',
)
history = model.fit(standard_X_train,normalized_y_train, validation_data=(standard_X_test,normalized_y_test),epochs=200)
I wish to create a model to predict activation energy using some features . I am getting training loss: 0.0629 and val_loss: 0.4213.
But when I try to predict the activation energies of some other unseen data ,I get bizarre results. I am a beginner in ML.
Can someone please help what changes can be made in the code. ( I want to make a model with one hidden layer of 20 units that has activation function tanh.)
You should not use fit_transform for test data. You should use fit_transform for training data and apply just transform to test data, in order to use the same parameters for training data, on the test data.
So, the transformation part of your code should change like this:
scaler_x = StandardScaler()
standard_X_train = pd.DataFrame(scaler_x.fit_transform(X_train))
standard_X_test = pd.DataFrame(scaler_x.transform(X_test))
y_train=y_train.values
y_train = y_train.reshape((-1, 1))
y_test=y_test.values
y_test = y_test.reshape((-1, 1))
scaler_y = MinMaxScaler(feature_range=(0, 1))
normalized_y_train = scaler_y.fit_transform(y_train)
normalized_y_test = scaler_y.transform(y_test)
Furthermore, since you are scaling your data, you should do the same thing for any prediction. So, your prediction line should be something like:
preds = scaler_y.inverse_transform(
model.predict(scaler_x.transform(pred_input)) #if it is standard_X_test you don't need to transform again, since you already did it.
)
Additionally, since you are transforming your labels in range 0 and 1, you may need to change your last layer activation function to sigmoid instead of tanh, or even may better to use an activation function like relu in your first layer if you are still getting poor results after above modifications.
model=keras.Sequential([
layers.Dense(units=20,input_shape=[6,],activation='relu'),
layers.Dense(units=1,activation='sigmoid')
])

Scaling data for neural network

I was using a sequential model in keras for categorical classification.
given data:
x_train = np.random.random((5000, 20))
y_train = keras.utils.to_categorical(np.random.randint(10, size=(5000, 1)), num_classes=10)
x_test = np.random.random((500, 20))
y_test = keras.utils.to_categorical(np.random.randint(10, size=(500, 1)), num_classes=10)
feature scaling is important:
scaler = StandardScaler()
scaler.fit(x_train)
x_train = scaler.transform(x_train)
x_test = scaler.transform(x_test)
model
model = Sequential()
model.add(Dense(64, activation='relu', input_dim=20))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy'])
model.fit(x_train, y_train,
epochs=100,
batch_size=32)
data to be predicted
z = np.random.random((9999000, 20))
Should I scale this data?
How to scale this data?
predictions = model.predict_classes(z)
As you see that, the training and testing samples are only a few as compared to the data to be predicted (z). Using the scaler fitted with x_train to rescale x_test, seems OK.
However, using the same scaler fitted with only 5000 samples to rescale z (9999000 samples), seems not OK. Are there any best practices in the field of deep learning to solve this problem?
For classifiers not sensitive to feature scaling linke Random Forests do not have this issue. However, for deep learning, this issue exists.
The training data shown here is for example purpose only. In the real problem, the training data is not coming from the same (uniform) probability distribution. It is difficult to label data and the training data has human bias towards easy to label. Only the samples easier to label was labelled.
However, using the same scaler fitted with only 5000 samples to rescale z (9999000 samples), seems not OK.
Not clear why you think so. This is exactly the standard practice, i.e. using the scaler fitted with your training data, as you have correctly done with your test data:
z_scaled = scaler.transform(z)
predictions = model.predict_classes(z_scaled)
The number of samples (500 or 10^6) does not make any difference here; the important thing is for all this data (x and z) to come from the same probability distribution. In practice (and for data that may still be in the future) this is only assumed (and one of the things to watch for after model deployment is exactly if this assumption does not hold, or ceases to be correct after some time). But especially here, with your simulated data coming from the exact same (uniform) probability distribution, this is exactly the correct thing to do.

Keras Prediction after test values

I am currently trying to build neuronal network to be able to predict time series, but the question is, is it possible to predict further than just the test dataset. I mean, for my example, I have a dataset of about 3000 values, from which I keep 90% for training and 10% for testing. Then When I compare the prediction with the actual test value, it corresponds, but is it possible for instance to ask the program to predict the next 500 values (i.e. from 3001 to 3500) ?
Here is a snipper of the code I use.
import csv
import numpy as np
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
from keras.layers.core import Dense, Activation, Dropout
from keras.layers.recurrent import LSTM, GRU
from keras.models import Sequential
from keras import optimizers
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import learning_curve
from sklearn.kernel_ridge import KernelRidge
import time
from sklearn.preprocessing import MinMaxScaler
sc = MinMaxScaler(feature_range = (-1, 1))
def load_data(datasetname, column, seq_len, normalise_window):
# A support function to help prepare datasets for an RNN/LSTM/GRU
data = datasetname.loc[:,column]
sequence_length = seq_len + 1
result = []
for index in range(len(data) - sequence_length):
result.append(data[index: index + sequence_length])
result = np.array(result)
result.reshape(-1,1)
training_set_scaled = sc.fit_transform(result)
print (result)
#Last 10% is used for validation test, first 90% for training
row = round(0.9 * training_set_scaled.shape[0])
train = training_set_scaled[:int(row), :]
#np.random.shuffle(train)
x_train = train[:, :-1]
y_train = train[:, -1]
X_test = training_set_scaled[int(row):, :-1]
y_test = training_set_scaled[int(row):, -1]
print ("shape train", x_train)
print ("shape train", X_test)
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
return [x_train, X_test, y_train, y_test]
def build_model():
model = Sequential()
layers = {'input': 100, 'hidden1': 150, 'hidden2': 256, 'hidden3': 100, 'output': 10}
model.add(LSTM(
50,
return_sequences=True,
input_shape=(200,1)
))
model.add(Dropout(0.2))
model.add(LSTM(
layers['hidden2'],
return_sequences=True,
))
model.add(Dropout(0.2))
model.add(LSTM(
layers['hidden3'],
return_sequences=False,
))
model.add(Dropout(0.2))
model.add(Activation("linear"))
model.add(Dense(
output_dim=layers['output']))
start = time.time()
model.compile(loss="mean_squared_error", optimizer="adam")
print ("Compilation Time : ", time.time() - start)
return model
dataset = pd.read_csv(
'data.csv')
X_train, X_test, y_train, y_test = load_data(dataset, 'mean anomaly', 200, False)
model = build_model()
print ("train",X_train)
print ("test",X_test)
model.fit(X_train, y_train, batch_size=256, epochs=1, validation_split=0.05)
predictions = model.predict(X_test)
predictions = np.reshape(predictions, (predictions.size,))
plt.figure(1)
plt.subplot(311)
plt.title("Actual Test Signal w/Anomalies & noise")
plt.plot(y_test)
plt.subplot(312)
plt.title("predicted signal")
plt.plot(predictions, 'g')
plt.subplot(313)
plt.title("training signal")
plt.plot(y_train, 'b')
plt.plot(y_test, 'y')
plt.legend(['train', 'test'])
plt.show()
I have read that I should increase the output dim of the dense layer to get more than 1 predicted value, or increase the size of my window in the load data function ?
Here is the result, the yellow plot is supposed to be after the blue one, it respresents my input test data, the first plot is a zoom on this data and the second one the prediction.
If you want to predict the output value of your serie at t+x based on data at time t, the data you need to feed to the network should already have this format.
Time series data formating :
If you have 3000 data point and want to predict the output value for the next "virtual" 500 point you should offset the output value by this amount. For exemple :
In your dataset, your 500th data point correspond to the 500th output value. If you want to predict "future" values then the 500th data point should have the 1000th output value. You can do this in pandas with the shift function. Be aware that you will loose the last 500 data point by doing so, has they will no longer have an output value.
Then when you predict on data point xi you'll have the output value yi+500. You should find some basic guides for time serie forecasting on sites like machinelearningmastery
Good pratice for model evaluation :
If you want to better evaluate the quality of your model, first find some metrics that suits your problem and try to increase test set percenatage. While graphics are a good way to visualise result, they can be deceiving, try combining them with some metrics ! (be carefull with Mean Squarred Error, it can give you a biased score with value in the range [-1;1] as the square of an error in this range will always be less than the acutal error, try Mean Absolute Error instead)
Data leakage when scalling data :
While scalling data is usually a good thing you need to be carefull doing so. You comited something called a data leak. You used scalling on the whole data set before splitting into training and test set. Further reading about this data leak.
Update
I think i misunderstood your problem.
If you want to "predict further than just the test dataset" you will need some unseen/new data to make more prediction. The test set is only made to evaluate the performance of the learning phase.
Now if you want to predict further than just the next step (this won't allow you to "predict further than just the test dataset" because of the way you change your dataset, see bellow) :
Your model as it's made will only ever predict the next step.
In your example you feed to the algorithm series of lenght 'seq_len' and give them as output the value right after the end of those series. If you want your algorithm to learn to predict in more than one step into the future you y_train must have value at the corresponding time in the future, example :
x = [0,1,2,3,4,5,6,7,8,9,10,...]
seq_len = 5
step_to_predict = 5
So to predict not one step into the future but five, your series will have to look like this :
x_serie_1 = [0,1,2,3,4]
y_serie_1 = [9]
x_serie_2 = [1,2,3,4,5]
y_serie_2 = [10]
This is a way to get your model to learn how to make predictions further into the future than just the next step.

Loss function not showing

I want to start trying my hand at neural networks and found keras to be very simple syntactically. My set up is X_train is an array of shape (3516, 6)
and y_train is of shape (3516,)
X_train looks like this:
[[ 888. 900.5 855. 879.311 877.00266667
893.5008 ]
[ 875. 878.5 840. 880.026 874.56933333
890.7948 ]
[ 860. 870. 839.5 880.746 870.54333333
887.6428 ]....]
it is an input of 6 pieces of financial data to predict one output. I know its not going to be accurate but it is to get me going on something at least before I get on to RNNs
my problem is that the loss function at every epoch shows nan, accuracy shows 0%, validation_accuracy shows zero percent as if to say that data isnt even being passed through the model, I mean even if its a poor model with poor inputs even that should be represented by a large loss right? here is the model:(see below)
anyway guys I am sure that I am doing something wrong and would really appreciate you guys' input
many thanks
S
EDIT: FULL WORKING CODE:
def load_data(keyword):
df = pd.read_csv('%s_x.csv' %keyword)
df2 = pd.read_csv('%s_y.csv' %keyword)
df2 = df2['label']
try:
df.drop('Unnamed: 0', axis = 1, inplace=True)
except:
print('wouldnt let drop unnamed column')
X = df.as_matrix()
y = df2.as_matrix()
X_len = len(X)
test_size = 0.2
test_split = int(test_size * X_len)
X_train = X[:-test_split]
y_train = y[:-test_split]
X_test = X[-test_split:]
y_test = y[-test_split:]
def keras():
model = Sequential( [
Dense(input_dim=3, output_dim=3),
Dense(output_dim=60, activation='linear'),
core.Dropout(p=0.1),
Dense(60, activation='linear'),
core.Dropout(p=0.1),
Dense(1, activation='linear')
])
return model
def training(epoch):
# start the program off by loading some data into it
X_train, X_test, y_train, y_test = load_data('admiral')
y_train = y_train.reshape(len(y_train), 1)
y_test = y_test.reshape(len(y_test), 1)
model = keras()
# optimizer will go into the compile function
# RMSpop is apparently a pretty decent choice for recurrent neural networks although we will start it on a simple nn too.
rms = optimizers.RMSprop(lr=0.001, rho = 0.9, epsilon =1e-08)
model.compile(optimizer= rms, loss='mean_squared_error ', metrics = ['accuracy'])
model.fit(X_train, y_train, nb_epoch=epoch, batch_size =500, validation_split=0.01)
score = model.evaluate(X_test, y_test, batch_size=50)
print(score)
training(300)
accuracy really low because it doesnt make sense to show accuracy, for a regression problem, it is more for classification
data was being passed through it was just so low it was a Nan question answered

Categories