This is my code:
# Importing the essential libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Getting the dataset
data = pd.read_csv("sales_train.csv")
X = data.iloc[:, 1:-1].values
y = data.iloc[:, -1].values
# y = np.array(y).reshape(-1, 1)
# Getting the values for november 2013 and 2014 to predict 2015
list_of_november_values = []
list_of_november_values_y = []
for i in range(0, len(y)):
    if X[i, 0] == 10 or X[i, 0] == 22:
        list_of_november_values.append(X[i, 1:])
        list_of_november_values_y.append(y[i])
# Converting list to array
arr_of_november_values = np.array(list_of_november_values)
y_train = np.array(list_of_november_values_y).reshape(-1, 1)
# Scaling the independent values
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(arr_of_november_values)
# Creating the neural network
from keras.models import Sequential
from keras.layers import Dense
nn = Sequential()
nn.add(Dense(units=120, activation='relu'))
nn.add(Dense(units=60, activation='relu'))
nn.add(Dense(units=30, activation='relu'))
nn.add(Dense(units=15, activation='relu'))
nn.add(Dense(units=1, activation='softmax'))
nn.compile(optimizer='adam', loss='mse')
nn.fit(X_train, y_train, batch_size=100, epochs=25)
# Saving the weights
nn.save_weights('weights.h5')
print("Weights Saved")
For my loss, I am getting the same value for every epoch. Is there a concept I am missing that is causing my loss to be constant?
Here is the dataset for the code.
The predominant reason is your odd choice of final-layer activation, paired with the loss function used. Reconsider this: you are using a softmax activation on a single-unit fully-connected layer. Softmax takes a vector and scales it so that the sum of its values is equal to one, preserving the proportions according to the following function:
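For an input vector x with components x_i:

\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}}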
The result is that your network will only ever output 1, so there are no gradients and no learning.
To resolve this, first change your final-layer activation to either ReLU or linear, depending on your data (I'd rather not work through the provided data myself, but I'm sure you understand its structure).
I expect there may be further issues regarding the structure of your network, but I'll leave that up to you. For now, the big issue is your final-layer activation.
Change this line:
nn.add(Dense(units=1, activation='softmax'))
To this line:
nn.add(Dense(units=1))
For a regression problem, you don't need an activation function on the output layer; a Dense layer without an activation argument is linear by default.
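For reference, a minimal sketch of the corrected network (same hidden layers and variable names as in the question, just a linear output unit):

from keras.models import Sequential
from keras.layers import Dense

nn = Sequential()
nn.add(Dense(units=120, activation='relu'))
nn.add(Dense(units=60, activation='relu'))
nn.add(Dense(units=30, activation='relu'))
nn.add(Dense(units=15, activation='relu'))
nn.add(Dense(units=1))                        # linear output for regression
nn.compile(optimizer='adam', loss='mse')
nn.fit(X_train, y_train, batch_size=100, epochs=25)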
I am new to neural networks. I have a dataset of 3D joint positions (6400*23*3) and orientations as quaternions (6400*23*4), and I want to predict the joint angles for all 22 joints and 3 motion planes (6400*22*3). I have tried to make a model, but it will not run because the input data don't match the output shape, and I can't figure out how to change it.
My code:
import scipy
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling2D, Dense, Flatten
from tensorflow.keras.utils import to_categorical
Jaload = scipy.io.loadmat('JointAnglesXsens11MovementsIforlængelse.mat')
Orload = scipy.io.loadmat('OrientationXsens11MovementsIforlængelse.mat')
Or = np.array((Orload['OR'][:,:]), dtype='float')
Ja = np.array((Jaload['JA'][:,:]), dtype='float')
Jalabel = np.array(Ja)
a = 0.6108652382
Jalabel[Jalabel<a] = 0
Jalabel[Jalabel>a] = 1
Ja3d = np.array(Jalabel.reshape(6814,22,3)) # there are 22 joint angles
Or3d = np.array(Or.reshape(6814,23,4)) # there are 23 segments
X_train = np.array(Or3d)
Y_train = np.array(Ja3d)
model = Sequential([
    Dense(64, activation='relu', input_shape=(23,4)),
    Dense(64, activation='relu'),
    Dense(3, activation='softmax'),
])
model.summary()
model.compile(loss='categorical_crossentropy', optimizer='adam') # works
model.fit(
    X_train,
    to_categorical(Y_train),
    epochs=3,
)
Running the model.fit returns with:
ValueError: A target array with shape (6814, 22, 3, 2) was passed for an output of shape (None, 3) while using as loss categorical_crossentropy. This loss expects targets to have the same shape as the output.
Here are some suggestions that might get you further down the road:
(1) You might want to insert a "Flatten()" layer just before the final Dense. This will basically collapse the output from the previous layers into a single dimension.
(2) You might want to make the final Dense layer have 22*3=66 units as opposed to three. Each output unit will represent a particular joint angle.
(3) You might want to likewise collapse the Y_train to be (num_samples, 22*3) using the numpy reshape.
(4) You might want to make the final Dense layer have "linear" activation instead of "softmax" - softmax will force the outputs to sum to 1 as a probability.
(5) Don't convert the y_train to categorical. It is already in the correct format I believe (after you reshape it to match the revised output of the model).
(6) The loss to use is probably not "categorical_crossentropy" but rather "mse" (mean squared error), since this is a regression of continuous joint angles.
Hopefully, some of the above will move you in the right direction; a minimal sketch putting these suggestions together follows.
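A sketch applying suggestions (1)-(6), with placeholder random arrays standing in for the .mat data, which I don't have; the shapes match the question:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten

# Placeholder arrays with the question's shapes
X_train = np.random.rand(6814, 23, 4).astype('float32')   # orientations
Y_train = np.random.rand(6814, 22, 3).astype('float32')   # joint angles
Y_flat = Y_train.reshape(len(Y_train), 22 * 3)             # (samples, 66), no to_categorical

model = Sequential([
    Dense(64, activation='relu', input_shape=(23, 4)),
    Dense(64, activation='relu'),
    Flatten(),                               # collapse to a single dimension
    Dense(22 * 3, activation='linear'),      # one output unit per joint angle
])
model.compile(loss='mse', optimizer='adam')  # regression loss instead of crossentropy
model.fit(X_train, Y_flat, epochs=3)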
I'm training my Keras model to predict whether, given the provided data parameters, a shot will be made or not, represented such that 0 means no and 1 means yes. However, when I try to predict, I get float values instead.
I've tried predicting on data that is exactly the same as the training data, expecting to get 1, but it does not work.
I used the data from the tutorial below and tried one-hot encoding:
https://github.com/eijaz1/Deep-Learning-in-Keras-Tutorial/blob/master/keras_tutorial.ipynb
import pandas as pd
from keras.utils import to_categorical
from keras.models import load_model
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import EarlyStopping
#read in training data
train_df_2 = pd.read_csv('diabetes_data.csv')
#view data structure
train_df_2.head()
#create a dataframe with all training data except the target column
train_X_2 = train_df_2.drop(columns=['diabetes'])
#check that the target variable has been removed
train_X_2.head()
#one-hot encode target column
train_y_2 = to_categorical(train_df_2.diabetes)
#check that target column has been converted
train_y_2[0:5]
#create model
model_2 = Sequential()
#get number of columns in training data
n_cols_2 = train_X_2.shape[1]
#add layers to model
model_2.add(Dense(250, activation='relu', input_shape=(n_cols_2,)))
model_2.add(Dense(250, activation='relu'))
model_2.add(Dense(250, activation='relu'))
model_2.add(Dense(2, activation='softmax'))
#compile model using accuracy to measure model performance
model_2.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
early_stopping_monitor = EarlyStopping(patience=3)
model_2.fit(train_X_2, train_y_2, epochs=30, validation_split=0.2, callbacks=[early_stopping_monitor])
train_dft = pd.read_csv('diabetes_data - Copy.csv')
train_dft.head()
test_y_predictions = model_2.predict(train_dft)
print(test_y_predictions)
I wanted to get
[[0,1]
[1,0]]
However, I am getting
[[0.8544417 0.14555828]
[0.9312985 0.06870154]]
Additionally, can anyone explain to me what does this value 0.8544417 mean?
Actually, you may interpret the output of a model with a softmax classifier at the top as the confidence scores or probabilities of classes (because the softmax function normalizes the values such that they would be positive and have a sum of 1). So, when you provide the model with a true label of [1, 0] this means that this sample belongs to class 1 with probability of 1, and it belongs to class 2 with probability of zero. Therefore, during training the optimization process tries to get as close as possible to that label, but it would never exactly reach [1,0] (actually due to softmax it might get as close as [0.999999, 0.000001], but never [1, 0]).
But that is not a problem, because we only need to get close enough, find the class with the highest probability, and consider that the model's prediction. You can easily do that by finding the index of the class with the maximum probability:
import numpy as np
preds = model.predict(some_data)
class_preds = np.argmax(preds, axis=-1) # e.g. for [max,min] it gives 0, for [min,max] it gives 1
Further, if you want to convert predictions to either [0,1] or [1,0] for any reason, you can just round the values:
import numpy as np
preds = model.predict(some_data)
round_preds = np.around(preds) # this would convert [0.87, 0.13] to [1., 0.]
Note: rounding only works properly with two classes, and not when you have more than two classes (e.g. [0.3, 0.4, 0.3] would become [0, 0, 0] after rounding).
Note 2: Since you are creating the model using the Sequential API of Keras, as an alternative to the argmax approach described above you can directly use model.predict_classes(some_data), which gives you exactly the same output.
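For example (note that predict_classes exists only on Sequential models and has been removed in recent TensorFlow/Keras releases, where the argmax approach above is the way to go):

class_preds = model.predict_classes(some_data)  # equivalent to np.argmax(model.predict(some_data), axis=-1)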
I have a minor doubt: suppose my data has a continuous (non-stationary) time series and other categorical variables (which are already encoded). In that case, what is the best way to difference the data? Categorical data should not be differenced, but it still has to be used while training the model.
I am trying to build an LSTM model so any help is much appreciated.
The data I am currently using is at a daily level, and for the sake of illustration I have considered only a univariate scenario (ignoring the variable Var1 and applying differencing only to the variable "TS"):
TS Var1
15000 1
14000 1
16000 0
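(A minimal sketch of what differencing only the continuous column could look like, leaving the categorical column untouched; the DataFrame below is a hypothetical stand-in mirroring the sample above:)

import pandas as pd

daily_data = pd.DataFrame({'TS': [15000, 14000, 16000], 'Var1': [1, 1, 0]})
diffed = daily_data.copy()
diffed['TS'] = diffed['TS'].diff()   # difference only the continuous series
diffed = diffed.dropna()             # the first row has no previous value
# 'Var1' stays as-is and can be fed to the model alongside the differenced 'TS'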
raw_values = daily_data_dummies_V2.values
interval = 1
diff = list()
for i in range(interval, len(raw_values)):
    value = raw_values[i] - raw_values[i - interval]
    diff.append(value)
# rescale values to -1, 1
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled_values = scaler.fit_transform(diff)
X = entireData[:,:-1,:]
y = entireData[:,1:,:]
from keras.models import Sequential
from keras.layers import merge, Input, Dense, TimeDistributed, Lambda, LSTM
from keras.callbacks import LambdaCallback
from keras.layers import Dropout
# design network
model = Sequential()
model.add(LSTM(150, stateful=True, batch_input_shape=(1, None, 1),
               return_sequences=True))
model.add(Dropout(0.5))
model.add(TimeDistributed(Dense(1)))
model.compile(loss='mean_squared_error', optimizer='adam')
# fit network
model.fit(X, y, epochs=5000, batch_size=1, verbose=2)
previous_inputs=X
model.reset_states()
predictions = model.predict(previous_inputs)
Then I have predicted on the entire data and would like to use the last prediction to predict ahead. In doing so I will have to revert the scaling and revert the differencing. I am not sure how to adjust the loop to do that:
first, let's set the model's states (it's important for it to know the previous trends)
predictions = model.predict(previous_inputs) #this creates states
#future predictions
future = []
currentStep = predictions[:,-1:,:] #last step from the previous prediction
for i in range(31):
    currentStep = model.predict(currentStep)
    one = currentStep[0]
    two = scaler.inverse_transform(one)
    three = raw_values[-1] + two
    future.append(three)
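One possible adjustment (a sketch under the univariate setup above, where the scaler was fit on a single differenced column; it carries the reconstructed value forward so the differencing is undone cumulatively rather than always adding to raw_values[-1]):

future = []
last_value = raw_values[-1]                    # last known undifferenced value
currentStep = predictions[:, -1:, :]           # last step from the previous prediction
for i in range(31):
    currentStep = model.predict(currentStep)   # next step, still differenced and scaled
    inv_scaled = scaler.inverse_transform(currentStep[0])
    last_value = last_value + inv_scaled       # undo the differencing cumulatively
    future.append(last_value)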
I am very new to Keras, neural networks and machine learning, having just started to learn yesterday. I decided to try predicting the experience a user would earn over each hour (0 to 23), for a game, using my own generated data-set. Currently the predictions seem to be very low and very poor. I have tried a relu activation, which produced predictions that were all zero, and, after a bit of research, LeakyReLU.
This is the code I have for the prediction model so far:
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LeakyReLU
import numpy
numpy.random.seed(7)
dataset = numpy.loadtxt("experience.csv", delimiter=",")
X = dataset[:, 0]
Y = dataset[:, 1]
model = Sequential()
model.add(Dense(12, input_dim=1, activation=LeakyReLU(0.3)))
model.add(Dense(8, activation=LeakyReLU(0.3)))
model.add(Dense(1, activation=LeakyReLU(0.3)))
model.compile(loss='mean_absolute_error', optimizer='adam', metrics=['accuracy'])
model.fit(X, Y, epochs=120, batch_size=10, verbose=0)
predictions = model.predict(X)
rounded = [round(x[0]) for x in predictions]
print(rounded)
I have also tried playing around with the hidden layers of the network, but honestly I have no idea how many there should be or a good way to justify an amount.
If it helps here is the data-set I have been using:
https://raw.githubusercontent.com/NightShadeII/xpPredictor/master/experience.csv
Thank you for any help.
Looking at your data, this does not seem like a classification problem.
You have two options:
-> Look at the second column, bucket the values into ranges, and turn those buckets into classes that can be predicted, for instance 0, 1, 2, etc. Right now the network tries to train but does not have enough examples for the millions of classes it thinks you are trying to predict.
-> If you want real-valued output and not classes, try using linear regression (a sketch follows below).
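For the regression route, a minimal sketch (essentially the question's model with ReLU hidden layers and a linear output unit; experience.csv is read as in the question):

import numpy
from keras.models import Sequential
from keras.layers import Dense

dataset = numpy.loadtxt("experience.csv", delimiter=",")
X = dataset[:, 0].reshape(-1, 1)   # hour of the day as a single feature
Y = dataset[:, 1]                  # experience earned

model = Sequential()
model.add(Dense(12, input_dim=1, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1))                # linear output for a real-valued target
model.compile(loss='mean_absolute_error', optimizer='adam')
model.fit(X, Y, epochs=120, batch_size=10, verbose=0)
print(model.predict(X)[:5])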
Below is a simple example of a multi-class classification task with the IRIS data.
import seaborn as sns
import numpy as np
from sklearn.cross_validation import train_test_split
from keras.models import Sequential
from keras.layers.core import Dense, Activation, Dropout
from keras.regularizers import l2
from keras.utils import np_utils
#np.random.seed(1335)
# Prepare data
iris = sns.load_dataset("iris")
iris.head()
X = iris.values[:, 0:4]
y = iris.values[:, 4]
# Make test and train set
train_X, test_X, train_y, test_y = train_test_split(X, y, train_size=0.5, random_state=0)
################################
# Evaluate Keras Neural Network
################################
# Make ONE-HOT
def one_hot_encode_object_array(arr):
    '''One hot encode a numpy array of objects (e.g. strings)'''
    uniques, ids = np.unique(arr, return_inverse=True)
    return np_utils.to_categorical(ids, len(uniques))
train_y_ohe = one_hot_encode_object_array(train_y)
test_y_ohe = one_hot_encode_object_array(test_y)
model = Sequential()
model.add(Dense(16, input_shape=(4,),
                activation="tanh",
                W_regularizer=l2(0.001)))
model.add(Dropout(0.5))
model.add(Dense(3, activation='sigmoid'))
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')
# Actual modelling
# If you increase the number of epochs, accuracy will increase until it drops at a
# certain point: epoch 50 gives accuracy 0.99, which drops to 0.977 by epoch 70.
hist = model.fit(train_X, train_y_ohe, verbose=0, nb_epoch=100, batch_size=1)
score, accuracy = model.evaluate(test_X, test_y_ohe, batch_size=16, verbose=0)
print("Test fraction correct (NN-Score) = {:.2f}".format(score))
print("Test fraction correct (NN-Accuracy) = {:.2f}".format(accuracy))
My question is: how do people usually decide the size of the layers?
For example, based on the code above we have:
model.add(Dense(16, input_shape=(4,),
                activation="tanh",
                W_regularizer=l2(0.001)))
model.add(Dense(3, activation='sigmoid'))
Here the first parameter of Dense is 16 and the second is 3.
Why do the two layers use different values for Dense?
How do we choose the best value for Dense?
Basically it is just trial and error. Those are called hyperparameters and should be tuned on a validation set (split from your original data into train/validation/test).
Tuning just means trying different combinations of parameters and keeping the one with the lowest loss or the best accuracy on the validation set, depending on the problem.
There are two basic methods:
Grid search: for each parameter, decide a range and the steps within that range, like 8 to 64 neurons in powers of two (8, 16, 32, 64), and try each combination of the parameters. This obviously requires an exponential number of models to be trained and tested and takes a lot of time.
Random search: do the same, but just define a range for each parameter and try random sets of parameters drawn from a uniform distribution over each range. You can try as many parameter sets as you want, for as long as you can afford; it is just an informed random guess (a simple sketch follows below).
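A minimal random-search sketch over the unit counts of the iris model above (X_train, y_train, X_val, y_val are assumed to be an existing one-hot-encoded train/validation split):

import random
from keras.models import Sequential
from keras.layers import Dense

best_loss, best_units = float('inf'), None
for trial in range(10):
    units1 = random.choice([8, 16, 32, 64])   # hidden-layer sizes drawn at random
    units2 = random.choice([8, 16, 32, 64])
    model = Sequential()
    model.add(Dense(units1, activation='tanh', input_shape=(4,)))
    model.add(Dense(units2, activation='tanh'))
    model.add(Dense(3, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam')
    model.fit(X_train, y_train, epochs=20, batch_size=16, verbose=0)
    val_loss = model.evaluate(X_val, y_val, verbose=0)
    if val_loss < best_loss:
        best_loss, best_units = val_loss, (units1, units2)
print("best units:", best_units, "val loss:", best_loss)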
Unfortunately, there is no other way to tune such parameters. As for layers having different numbers of neurons, that could come from the tuning process, or you can also see it as dimensionality reduction, like a compressed version of the previous layer.
There is no known way to determine a good network structure just from the number of inputs or outputs. It depends on the number of training examples, the batch size, the number of epochs, basically on every significant parameter of the network.
Moreover, a high number of units can introduce problems like overfitting and exploding gradients. On the other hand, a lower number of units can cause a model to have high bias and low accuracy. Once again, it depends on the size of the data used for training.
Sadly, it comes down to trying different values and seeing which give the best results. You may choose the combination that gives you the lowest loss and validation loss, as well as the best accuracy for your dataset, as said in the previous post.
You could set the number of units in proportion to the number of classes, something like:
# Build the model
model = Sequential()
model.add(Dense(num_classes * 8, input_shape=(shape_value,), activation = 'relu' ))
model.add(Dropout(0.5))
model.add(Dense(num_classes * 4, activation = 'relu'))
model.add(Dropout(0.2))
model.add(Dense(num_classes * 2, activation = 'relu'))
model.add(Dropout(0.2))
#Output layer
model.add(Dense(num_classes, activation = 'softmax'))
The model above shows an example of a classification system. num_classes is the number of different categories the system has to choose from. For instance, in the iris dataset used above, we have:
Iris Setosa
Iris Versicolour
Iris Virginica
num_classes = 3
However, this could lead to worse results than other values would. We need to adjust the parameters to the training dataset by making several different attempts and then analysing the results, looking for the best combination of parameters.
My suggestion is to use EarlyStopping(); then check the number of epochs and the accuracy against the test loss.
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping
lrd = ReduceLROnPlateau(monitor='val_loss', patience=2, verbose=1, factor=0.8, min_lr=1e-6)
es = EarlyStopping(verbose=1, patience=2)
his = classifier.fit(X_train, y_train, epochs=500, batch_size=128, validation_split=0.1, verbose=1, callbacks=[lrd, es])