I am very new to Keras, neural networks and machine learning having just started to learn yesterday. I decided to try predicting the experience over an hour (0 to 23) (for a game and my own generated data-set) that a user would earn. Currently running what I have the predictions seem to be very low and very poor. I have tried a relu activation, which produced predictions all to be zero and from a bit of research, LeakyReLU.
This is the code I have for the prediction model so far:
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LeakyReLU
import numpy
numpy.random.seed(7)
dataset = numpy.loadtxt("experience.csv", delimiter=",")
X = dataset[: ,0]
Y = dataset[: ,1]
model = Sequential()
model.add(Dense(12, input_dim = 1, activation=LeakyReLU(0.3)))
model.add(Dense(8, activation=LeakyReLU(0.3)))
model.add(Dense(1, activation=LeakyReLU(0.3)))
model.compile(loss = 'mean_absolute_error', optimizer='adam', metrics = ['accuracy'])
model.fit(X, Y, epochs=120, batch_size=10, verbose = 0)
predictions = model.predict(X)
rounded = [round(x[0]) for x in predictions]
print(rounded)
I have also tried playing around with the hidden levels of the network, but honestly have no idea how many there should be or a good way to justify an amount.
If it helps here is the data-set I have been using:
https://raw.githubusercontent.com/NightShadeII/xpPredictor/master/experience.csv
Thankyou for any help
Looking at your data it does not seem like a classification problem.
You have two options:
-> Look at the second column and bucket them depending on the ranges and make classes that can be predicted, for instance: 0, 1, 2 etc. Now it tries to train but does not have enough examples for millions of classes that it thinks you are trying to predict.
-> If you want real valued output and not classes, try using linear regression.
Related
This is my code:-
# Importing the essential libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Getting the dataset
data = pd.read_csv("sales_train.csv")
X = data.iloc[:, 1:-1].values
y = data.iloc[:, -1].values
# y = np.array(y).reshape(-1, 1)
# Getting the values for november 2013 and 2014 to predict 2015
list_of_november_values = []
list_of_november_values_y = []
for i in range(0, len(y)):
if X[i, 0] == 10 or X[i, 0] == 22:
list_of_november_values.append(X[i, 1:])
list_of_november_values_y.append(y[i])
# Converting list to array
arr_of_november_values = np.array(list_of_november_values)
y_train = np.array(list_of_november_values_y).reshape(-1, 1)
# Scaling the independent values
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(arr_of_november_values)
# Creating the neural network
from keras.models import Sequential
from keras.layers import Dense
nn = Sequential()
nn.add(Dense(units=120, activation='relu'))
nn.add(Dense(units=60, activation='relu'))
nn.add(Dense(units=30, activation='relu'))
nn.add(Dense(units=15, activation='relu'))
nn.add(Dense(units=1, activation='softmax'))
nn.compile(optimizer='adam', loss='mse')
nn.fit(X_train, y_train, batch_size=100, epochs=25)
# Saving the weights
nn.save_weights('weights.h5')
print("Weights Saved")
For my loss, I am getting the same value for every epoch. Is it possible if there is a concept I am missing that is causing my loss to be constant??
Here is the dataset for the code.
The predominant reason is your odd choice of final-layer activation, paired with the loss function used. Reconsider this: you are using softmax activation on a single-unit fully-connected layer. Softmax activation takes a vector and scales it such that the sum of the values are equal to one and it retains proportion according to the following function:
The idea is that your network will only ever output 1, thus there are no gradients, and no learning.
To resolve this, first change your final layer activation to either ReLU or Linear, depending upon the structure of your dataset (I'm not willing to use the provided data myself, but I'm sure you understand the structure of your dataset).
I expect there may be further issues regarding the structure of your network, but I'll leave that up to you. For now, the big issue is your final-layer activation.
Change this line:
nn.add(Dense(units=1, activation='softmax'))
To this line:
nn.add(Dense(units=1))
For a regression problem, you don't need an activation function.
I have some text without any labels. Just a bunch of text files. And I want to train an Embedding layer to map the words to embedding vectors. Most of the examples I've seen so far are like this:
from keras.models import Sequential
from keras.layers import Embedding, Flatten, Dense
model = Sequential()
model.add(Embedding(max_words, embedding_dim, input_length=maxlen))
model.add(Flatten())
model.add(Dense(32, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.summary()
model.compile(optimizer='rmsprop',
loss='binary_crossentropy',
metrics=['acc'])
model.fit(x_train, y_train,
epochs=10,
batch_size=32,
validation_data=(x_val, y_val))
They all assume that the Embedding layer is part of a bigger model which tries to predict a label. But in my case, I have no label. I'm not trying to classify anything. I just want to train the mapping from words (more precisely integers) to embedding vectors. But the fit method of the model, asks for x_train and y_train (as the example given above).
How can I train a model only with an Embedding layer and no labels?
[UPDATE]
Based on the answer I've got from #Daniel Möller, Embedding layer in Keras is implementing a supervised algorithm and thus cannot be trained without labels. Initially, I was thinking that it is a variation of Word2Vec and thus does not need labels to be trained. Apparently, that's not the case. Personally, I ended up using the FastText which has nothing to do with Keras or Python.
Does it make sense to do that without a label/target?
How will your model decide which values in the vectors are good for anything if there is no objective?
All embeddings are "trained" for a purpose. If there is no purpose, there is no target, if there is no target, there is no training.
If you really want to transform words in vectors without any purpose/target, you've got two options:
Make one-hot encoded vectors. You may use the Keras to_categorical function for that.
Use a pretrained embedding. There are some available, such as glove, embeddings from Google, etc. (All of they were trained at some point for some purpose).
A very naive approach based on our chat, considering word distance
Warning: I don't really know anything about Word2Vec, but I'll try to show how to add the rules for your embedding using some naive kind of word distance and how to use dummy "labels" just to satisfy Keras' way of training.
from keras.layers import Input, Embedding, Subtract, Lambda
import keras.backend as K
from keras.models import Model
input1 = Input((1,)) #word1
input2 = Input((1,)) #word2
embeddingLayer = Embedding(...params...)
word1 = embeddingLayer(input1)
word2 = embeddingLayer(input2)
#naive distance rule, subtract, expect zero difference
word_distance = Subtract()([word1,word2])
#reduce all dimensions to a single dimension
word_distance = Lambda(lambda x: K.mean(x, axis=-1))(word_distance)
model = Model([input1,input2], word_distance)
Now that our model outputs directly a word distance, our labels will be "zero", they're not really labels for a supervised training, but they're the expected result of the model, something necessary for Keras to work.
We can have as loss function the mae (mean absolute error) or mse (mean squared error), for instance.
model.compile(optimizer='adam', loss='mse')
And training with word2 being the word after word1:
xTrain = entireText
xTrain1 = entireText[:-1]
xTrain2 = entireText[1:]
yTrain = np.zeros((len(xTrain1),))
model.fit([xTrain1,xTrain2], yTrain, .... more params.... )
Although this may be completely wrong regarding what Word2Vec really does, it shows the main points that are:
Embedding layers don't have special properties, they're just trainable lookup tables
Rules for creating an embedding should be defined by the model and expected outputs
A Keras model will need "targets", even if those targets are not "labels" but a mathematical trick for an expected result.
I'm a beginner in Neural Network and trying to predict values which are temperature values(output) with 5 inputs in python. I used keras package in python to work Neural Network.
Also, I used two algorithms which are feedforward Neural Network(Regression) and Recurrent Neural Network(LSTM) to predict values. However, both of algorithms didn't work well for forecasting.
In my case of Feedforward Neural Network(Regression), I used 3 hidden layers(with 100, 200, 300 neurons) like code below,
def baseline_model():
# create model
model = Sequential()
model.add(Dense(100, input_dim=5, kernel_initializer='normal', activation='sigmoid'))
model.add(Dense(200, kernel_initializer = 'normal', activation='sigmoid'))
model.add(Dense(300, kernel_initializer = 'normal', activation='sigmoid'))
model.add(Dense(1, kernel_initializer='normal'))
# Compile model
model.compile(loss='mean_squared_error', optimizer='adam')
return model
df = DataFrame({'Time': TIME_list, 'input1': input1_list, 'input2': input2_list, 'input3': input3_list, 'input4': input4_list, 'input5': input5_list, 'output': output_list})
df.index = pd.to_datetime(df.Time)
df = df.values
#Setting training data and test data
train_size_x = int(len(df)*0.8) #The user can change the range of training data
print(train_size_x)
X_train = df[0:train_size_x, 0:5]
t_train = df[0:train_size_x, 6]
X_test = df[train_size_x:int(len(df)), 0:5]
t_test = df[train_size_x:int(len(df)), 6]
# fix random seed for reproducibility
seed = 7
np.random.seed(seed)
scale = StandardScaler()
X_train = scale.fit_transform(X_train)
X_test = scale.transform(X_test)
#Regression in Keras package
clf = KerasRegressor(build_fn=baseline_model, nb_epoch=50, batch_size=5, verbose=0)
clf.fit(X_train,t_train)
res = clf.predict(X_test)
However, the error was quite big. The maximum absolute error was 78.4834. So I tried to minimize that error by changing number of hidden layer or neurons in hidden layer, but the error stayed around same.
After feedforward NN, secondly, I used Recurrent Neural Network(LSTM) algorithm which can predict by using only one input. In my case, the input is temperature. It gives me much less error than the feedforward NN, but I was lost in deep thought that Recurrent Nueral Network(LSTM) I implemented is little ambiguous in my case because it didn't use 5 inputs that affect the output(temperature value) such as feedforward regression that I implemented above.
And now I got lost what other kinds of algorithm I should use.
Any suggestions or ideas for my case..?
Thanks in advance.
I have to agree with the commenter to your question, you are jumping a little ahead of yourself. Neural networks can seem like black magic at times and its worth taking the time to understand whats actually going on under the hood. A good place to start learning and experimenting is with sklearn. Sklearn is a good place to start because you can try different techniques easily, this will help you learn quickly how to structure your problems. There is also an abundance of info and tutorials.
From there, you will be better equipped to tackling your own NN from scratch. Additionally, sklearn has many useful functions to pre-process/normalize your training data, which is a whole art in itself.
There are tons of good networks already available for common situations. Most of the work is in choosing the right structure for your problem, getting good data to train on, and massaging that data so it can be utilized properly.
Check it out... http://scikit-learn.org/stable/
I'm trying to understand how to implement neural networks. So I made my own dataset. Xtrain is numpy.random floats. Ytrain is sign(sin(1/x^3).
Try to implement neural networks gave me very poor results. 30%accuracy. Random Forest with 100 trees give 97%. But I heard that NN can approximate any function. What is wrong in my understanding?
import numpy as np
import keras
import math
from sklearn.ensemble import RandomForestClassifier as RF
train = np.random.rand(100000)
test = np.random.rand(100000)
def g(x):
if math.sin(2*3.14*x) > 0:
if math.cos(2*3.14*x) > 0:
return 0
else:
return 1
else:
if math.cos(2*3.14*x) > 0:
return 2
else:
return 3
def f(x):
x = (1/x) ** 3
res = [0, 0, 0, 0]
res[g(x)] = 1
return res
ytrain = np.array([f(x) for x in train])
ytest = np.array([f(x) for x in test])
train = np.array([[x] for x in train])
test = np.array([[x] for x in test])
from keras.models import Sequential
from keras.layers import Dense, Activation, Embedding, LSTM
model = Sequential()
model.add(Dense(100, input_dim=1))
model.add(Activation('sigmoid'))
model.add(Dense(100))
model.add(Activation('sigmoid'))
model.add(Dense(100))
model.add(Activation('sigmoid'))
model.add(Dense(4))
model.add(Activation('softmax'))
model.compile(optimizer='sgd',
loss='categorical_crossentropy',
metrics=['accuracy'])
P.S. I tried out many layers, activation functions, loss functions, optimizers, but never got more than 30% accuracy :(
I suspect that the 30% accuracy is a combination of small learning rate setting and a small training-step setting.
I ran your code snippet with model.fit(train, ytrain, nb_epoch=5, batch_size=32), after 5 epoch's training it yields about 28% accuracy. With the same setting but increasing the training steps to nb_epoch=50, the loss drops to ~1.157 ish and the accuracy raises to 40%. Further increase training steps should lead the model to further converging. Other than that, you can also try to configure the model with a larger learning rate setting which could make the converging faster :
model.compile(loss='categorical_crossentropy', optimizer=SGD(lr=0.1, momentum=0.9, nesterov=True), metrics=['accuracy'])
Although be careful don't set the learning rate to be too large otherwise your loss could blow up.
EDIT:
NN is known for having the potential for modeling extremely complex function, however, whether or not the model actually produce a good performance is a matter of how the model is designed, trained, and many other matters related to the specific application.
Zhongyu Kuang's answer is correct in stating that you may need to train it longer or with a different learning rate.
I'll add that the deeper your network, the longer you'll need to train it before it converges. For a relatively simple function like sign(sin(1/x^3)), you may be able to get away with a smaller network than the one you're using.
Additionally, softmax probably isn't the best output layer. You just need to yield -1 or 1. A single tanh unit seems like it would do well. softmax is generally used when you want to learn a probability distribution over a finite set. (You'll probably want to switch your error function from cross entropy to mean square error for similar reasons.)
Try a network with one sigmoidal hidden layer and an output layer with just one tanh unit. Then play around with the layer size and learning rate. Maybe add a second hidden layer if you can't get results with just one, but I wouldn't be surprised if it's unnecessary.
Addendum: In this approach, you'll replace f(x) with a direct calculation of the target function instead of the one-hot vector you're using currently.
Below is the simple example of multi-class classification task with
IRIS data.
import seaborn as sns
import numpy as np
from sklearn.cross_validation import train_test_split
from keras.models import Sequential
from keras.layers.core import Dense, Activation, Dropout
from keras.regularizers import l2
from keras.utils import np_utils
#np.random.seed(1335)
# Prepare data
iris = sns.load_dataset("iris")
iris.head()
X = iris.values[:, 0:4]
y = iris.values[:, 4]
# Make test and train set
train_X, test_X, train_y, test_y = train_test_split(X, y, train_size=0.5, random_state=0)
################################
# Evaluate Keras Neural Network
################################
# Make ONE-HOT
def one_hot_encode_object_array(arr):
'''One hot encode a numpy array of objects (e.g. strings)'''
uniques, ids = np.unique(arr, return_inverse=True)
return np_utils.to_categorical(ids, len(uniques))
train_y_ohe = one_hot_encode_object_array(train_y)
test_y_ohe = one_hot_encode_object_array(test_y)
model = Sequential()
model.add(Dense(16, input_shape=(4,),
activation="tanh",
W_regularizer=l2(0.001)))
model.add(Dropout(0.5))
model.add(Dense(3, activation='sigmoid'))
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')
# Actual modelling
# If you increase the epoch the accuracy will increase until it drop at
# certain point. Epoch 50 accuracy 0.99, and after that drop to 0.977, with
# epoch 70
hist = model.fit(train_X, train_y_ohe, verbose=0, nb_epoch=100, batch_size=1)
score, accuracy = model.evaluate(test_X, test_y_ohe, batch_size=16, verbose=0)
print("Test fraction correct (NN-Score) = {:.2f}".format(score))
print("Test fraction correct (NN-Accuracy) = {:.2f}".format(accuracy))
My question is how do people usually decide the size of layers?
For example based on code above we have:
model.add(Dense(16, input_shape=(4,),
activation="tanh",
W_regularizer=l2(0.001)))
model.add(Dense(3, activation='sigmoid'))
Where first parameter of Dense is 16 and second is 3.
Why two layers uses two different values for Dense?
How do we choose what's the best value for Dense?
Basically it is just trial and error. Those are called hyperparameters and should be tuned on a validation set (split from your original data into train/validation/test).
Tuning just means trying different combinations of parameters and keep the one with the lowest loss value or better accuracy on the validation set, depending on the problem.
There are two basic methods:
Grid search: For each parameter, decide a range and steps into that range, like 8 to 64 neurons, in powers of two (8, 16, 32, 64), and try each combination of the parameters. This is obviously requires an exponential number of models to be trained and tested and takes a lot of time.
Random search: Do the same but just define a range for each parameter and try a random set of parameters, drawn from an uniform distribution over each range. You can try as many parameters sets you want, for as how long you can. This is just a informed random guess.
Unfortunately there is no other way to tune such parameters. About layers having different number of neurons, that could come from the tuning process, or you can also see it as dimensionality reduction, like a compressed version of the previous layer.
There is no known way to determine a good network structure evaluating the number of inputs or outputs. It relies on the number of training examples, batch size, number of epochs, basically, in every significant parameter of the network.
Moreover, a high number of units can introduce problems like overfitting and exploding gradient problems. On the other side, a lower number of units can cause a model to have high bias and low accuracy values. Once again, it depends on the size of data used for training.
Sadly it is trying some different values that give you the best adjustments. You may choose the combination that gives you the lowest loss and validation loss values, as well as the best accuracy for your dataset, as said in the previous post.
You could do some proportion on your number of units value, something like:
# Build the model
model = Sequential()
model.add(Dense(num_classes * 8, input_shape=(shape_value,), activation = 'relu' ))
model.add(Dropout(0.5))
model.add(Dense(num_classes * 4, activation = 'relu'))
model.add(Dropout(0.2))
model.add(Dense(num_classes * 2, activation = 'relu'))
model.add(Dropout(0.2))
#Output layer
model.add(Dense(num_classes, activation = 'softmax'))
The model above shows an example of a categorisation AI system. The num_classes are the number of different categories the system has to choose. For instance, in the iris dataset from Keras, we have:
Iris Setosa
Iris Versicolour
Iris Virginica
num_classes = 3
However, this could lead to worse results than with other random values. We need to adjust the parameters to the training dataset by making some different tries and then analyse the results seeking for the best combination of parameters.
My suggestion is to use EarlyStopping(). Then check the number of epochs and accuracy with test loss.
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping
rlp = lrd = ReduceLROnPlateau(monitor = 'val_loss',patience = 2,verbose = 1,factor = 0.8, min_lr = 1e-6)
es = EarlyStopping(verbose=1, patience=2)
his = classifier.fit(X_train, y_train, epochs=500, batch_size = 128, validation_split=0.1, verbose = 1, callbacks=[lrd,es])