I have a dataset and I want to decide which ML algorithm to apply to my problem.
Customers are to fill out an assessment questionnaire of 50 questions. Examples of the questions are: what is your job, previous job history, how much do you earn, have you been rejected for a loan, etc., and the end goal is to decide whether they should be rejected or not.
I have circa 500 entries for my algorithm to learn from, and I have pre-processed my dataset and converted the inputs into a numpy array. I am wondering what would be the best algorithm to use. Should I use a classification algorithm or a neural network in TensorFlow, and if the latter, what layers should I use?
Thanks
How about beginning with xgboost or random forest? - So plain "old" ML?
The advantage would be that you could visualize the model's decision trees once trained.
If using a NN in TensorFlow (or even easier: Keras with a TensorFlow backend), you could go with an MLP (multi-layer perceptron), since the answers to the questions have fixed positions in the input. You don't need many layers.
It is important that you normalize your input data column-wise, so that the input numbers are not much bigger/smaller than +1/-1. Introductory books often miss this point, though it is important.
Since your target labeling is "accept" or "reject", a binary classifier will do it (in the classical machine-learning approach as well). Use 0 and 1 as the labels.
For a NN, this kind of classification does not need many layers or neurons. Try the smallest network first: say, 10 neurons in the first layer, then 7 neurons in the next layer (probably even fewer), and then 1 output neuron for the binary decision.
With Keras this would be:
from keras.models import Sequential
from keras.layers import Dense

def create_mlp(n_input=500):  # number of columns of the input data, 500 here
    model = Sequential()
    # in older Keras versions, kernel_initializer was called init
    model.add(Dense(10, input_dim=n_input, kernel_initializer='normal', activation='relu'))
    model.add(Dense(7, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer='adam',
                  loss='binary_crossentropy',
                  metrics=['acc'])
    return model

model = create_mlp(500)  # this will generate the correct NN, compiled
Your data frame (or numpy input array) must have the samples as rows, with one column per answer to a question. The answers have to be encoded in numeric form, and the numbers should be small, ideally between -1 and 1: NNs don't like big numbers, so column-wise normalization helps here too.
That's it. I learned all this stuff last year. Good luck with your learning; it will be tons of fun!
I am trying to train the following RNN in tensorflow. It takes an 11-D numeric vector as input and it outputs a sequence of 10 multiclass probability vectors, with 14 exclusive classes.
model = keras.models.Sequential([
    keras.layers.SimpleRNN(30, return_sequences=False, input_shape=[1, 11]),
    keras.layers.RepeatVector(10),
    keras.layers.SimpleRNN(30, return_sequences=True),
    keras.layers.SimpleRNN(14, return_sequences=True, activation="softmax")
])
model.compile(loss="categorical_crossentropy",
              optimizer="adam")
history = model.fit(X_train, y_train, epochs=50, batch_size=32,
                    validation_split=0.2)
However, even for a small dataset of 10 points, it takes hundreds of epochs to fit. As you can see in the figure, the loss barely goes down with the training epochs.
When I try to train the real training set, the loss simply does not move. Any idea of how to successfully train this model?
You can find the first 10 datapoints here
And the first 100 datapoints here
To load the data, just use:
import pickle

with open('train10.pickle', 'rb') as f:
    X_train, y_train = pickle.load(f)
Thank you very much for your help
EDIT:
To provide additional context: what I have in this problem is a continuous numeric embedding in 11-D to start with, and the output is a sequence of one-hot encodings, so you can think of this problem as training a decoder, or doing a decompression to get a sort of "word" back from a point in the numeric space (each one-hot vector in the output can be thought of as a "letter"). I previously tried to train a non-recurrent network outputting the full list of one-hot encodings (the whole "word") at once, but the performance was also very poor. I just do not see where the bottleneck is: the dimensionality of the numeric embedding, the training algorithm, etc. My tinkering so far with types of layers, numbers of layers, and learning rates did not produce substantial improvements. I am open to sharing the whole dataset if you think that can help. Thank you very much!
Each machine learning problem is unique and it is very difficult to say exactly what the issue is without having access to the full data set. Some possibilities are:
The model specification is suboptimal - try varying the number of hidden layers, the number of neurons in each layer, using GRU/LSTM layers instead of SimpleRNN, adding some dropout layers, etc. (see the sketch after this list).
The training algorithm needs to be adjusted - try using a different optimizer, a different batch size, a different train-test split ratio etc.
The input data needs more (or less) preprocessing - try normalizing/standardizing the input features if you haven't already.
You need to do more work on feature engineering - think deeply about all potential relationships between the input data and the target, and try combining columns to create ratios etc. While the NN can theoretically figure this out for itself, it is often effective to try and reduce the work it has to do in this respect.
Your problem may just be difficult or even unsolvable. There may just be no strong relationship between the input and the target.
I saw in the deep learning course by Andrew Ng a way to localize a single object in an image: https://www.youtube.com/watch?v=GSwYGkTfOKk .
As I understand it, you can, for example, bind a point to a specific part of the object, take its coordinates x, y as the label y, and train a CNN.
I wanted to train a CNN to localize my eyes (not classification). I took 200 photos of myself: 60x60 pixels in grayscale. I labeled the left and right eye, and each coordinate of a labeled eye was normalized to 0-1. The y label is: [x of eye1, y of eye1, x of eye2, y of eye2]. I used the SGD optimizer with MSE loss and a sigmoid function in the output layer.
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Conv2D(64, (3,3), input_shape=(60, 60, 1)))  # 60x60 grayscale input
model.add(tf.keras.layers.Activation('relu'))
model.add(tf.keras.layers.MaxPool2D(pool_size=(2,2)))
model.add(tf.keras.layers.Conv2D(32, (3,3)))
model.add(tf.keras.layers.Activation('relu'))
model.add(tf.keras.layers.MaxPool2D(pool_size=(2,2)))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(64, activation='relu'))
model.add(tf.keras.layers.Dense(4, activation='sigmoid'))  # (x1, y1, x2, y2), each in 0-1
sgd = tf.keras.optimizers.SGD(lr=0.01)
model.compile(loss='mean_squared_error', optimizer=sgd, metrics=['accuracy'])
model.fit(x, y, batch_size=3, epochs=15, validation_split=0.2)
It didn't work for this task, so what is the way to solve this problem? I saw somewhere: apply a CNN to the image (I suppose without dense layers), then on the flattened data from the CNN use linear regression for each x/y coordinate (multivariate linear regression). Is this a solution? As I understand it, I would feed each image into the Conv and MaxPool layers, then Flatten, and then feed the data to the linear regression and train it, but I have no idea how to do this in Keras. I am new to this field, so any idea helps me.
First of all, a couple of observations with regard to your code.
Since the last layer contains more than 2 neurons, the activation function you have to use is softmax, not sigmoid (note that this is for the case of classification, not regression).
You should only use sigmoid when you are doing binary classification, not when you have more than two classes (note that you can also use softmax for 2 classes, though it is not necessarily recommended, due to a small computational overhead).
Your problem is both a regression and a classification one!
The first layer of your convolutional neural network contains 64 feature maps, each with a 3x3 kernel. Although the way you feed the images to your neural network is correct, you only feed the grayscale image, not the x1, x2, y1, y2 coordinates.
For an ANN with regression, take a look at this tutorial: https://machinelearningmastery.com/regression-tutorial-keras-deep-learning-library-python/.
Your intuition is correct; object detection neural networks replace fully connected layers with convolutional ones. Yann LeCun even states that fully connected layers should not be a part of CNNs.
Since you are new to this field, I would recommend adopting the following pipeline.
1) Find a model on GitHub written in your preferred deep learning library (Keras/PyTorch/TensorFlow, etc.).
2) Follow the instructions/tutorial to reproduce the results obtained by its author.
3) In doing so, you should also come to understand the code and build a good intuitive grasp of it.
4) Adapt the model to the problem you need to solve.
One place where you could start is here (this is object detection: detecting multiple objects, possibly of different categories): https://github.com/pierluigiferrari/ssd_keras.
If you have further questions, please write them down; I would be glad to be of assistance!
First of all, apologies if I word this wrong; I'm relatively new to TensorFlow.
I am designing a model for simple classification of a dataset, each column is an attribute and the final column is the class. I split these and generate a dataframe in the usual way.
If I generate a model with dense layers, it works great:
def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(30, input_dim=len(dataframe.columns)-2, activation='sigmoid'))
    model.add(Dense(20, activation='sigmoid'))
    model.add(Dense(unique, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.summary()
    return model
If I were to add, say, an LSTM layer to the model:
def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(30, input_dim=len(dataframe.columns)-2, activation='sigmoid'))
    # this bit here >
    model.add(LSTM(20, return_sequences=True))
    model.add(Dense(20, activation='sigmoid'))
    model.add(Dense(unique, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.summary()
    return model
I get the following error when I execute the code:
ValueError: Input 0 is incompatible with layer lstm_1: expected ndim=3, found ndim=2
I'm not sure where these dimensions are coming from; maybe it's the classes? I have three classes of data ('POSITIVE', 'NEGATIVE', 'NEUTRAL') mapped to sets of well over 2,000 attribute values; they're statistical extractions of time-windowed EEG brainwave data from multiple electrodes, classed by emotional state.
Note:
The 'input_dim=len(dataframe.columns)-2' produces the number of attributes (inputs); I do this because I'd like the script to work with CSV datasets of different sizes on the fly.
Also, there are no tabs in my code pasted but it is indented and will compile
The full code is pasted here: https://pastebin.com/1aXp9uDA for presentation purposes. Apologies in advance for the terrible practices! This is just an initial project, I do plan on cleaning it all up later on!
In your original code the input is 2-D, shaped (batch, features). When you add an LSTM, you're telling Keras you want to do the classification given the last N timesteps, so you need input shaped (batch, timesteps, features). It's easy to assume an LSTM will look back across all inputs in the batch, but unfortunately it doesn't; you have to manually organize your data so that all the timestep elements are presented together.
To split up your data, you generally use a sliding window of length N (where N is how many values you wish the LSTM to look back across). You can slide the window N steps each time, meaning there's no overlap of the data, or you can slide it one sample at a time, meaning you get multiple copies of your data. There are numerous blog posts on how to do this; take a look at this one: How to Reshape Input Data for LSTM. A minimal sketch follows.
You also have one other issue: for your LSTM, you probably want return_sequences=False. With it set to True, you would need an output "Y" value for each element of your timesteps. You probably want your "Y" value to represent the next value in your time series; keep this in mind when organizing your data.
The above link provides some nice examples or you can search for more in-depth ones. If you follow those it should be clear how to reorganize things for an LSTM.
I am attempting to use keras to build an activity classifier from accelerometer signals. However, I am experiencing extreme overfitting of the data even with the most simplistic of models.
The input data is of shape (10, 3) and contains roughly 0.1 second of data from the accelerometer in 3 dimensions. The model is simply:
model = Sequential()
model.add(Flatten(input_shape=(10,3)))
model.add(Dense(2, activation='softmax'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
The model should output the label [1,0] for walking activities and [0,1] for non-walking activities. After training I get 99.8% accuracy (if only it was real...). When I attempt to predict on data that wasn't used for training, I get 50% accuracy, verifying that the net isn't really "learning" anything except to predict a single class value.
The data is being prepared from 100 Hz triaxial accelerometer signals. I am not preprocessing the data in any way, except for windowing it into bins of length 10 that overlap the previous bin by 50% (sketched below). What measures can I take to make the network produce actual predictions? I have tried increasing the window size, but the results remain the same. Any advice/general tips are greatly appreciated.
Ian
Try adding some hidden layers and dropout layers to your network. You could create a simple Multi Layer Perceptron (MLP) with a couple of extra lines in between your Flatten layer and Dense layer:
from keras.layers import Dense, Dropout

model.add(Dense(64, activation='relu', input_dim=30))  # 30 = 10 timesteps x 3 axes after Flatten
model.add(Dropout(0.25))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.1))
Or check out this guide, which explains how to create a simple MLP.
Without any hidden layers, your model will not actually 'learn' features from the input data; it will only fit a direct mapping from the input features to the output classes.
The more layers you add, the more intermediate features and patterns the model can extract from the input data, which should lead to better predictions on test data. There will be a lot of trial and error in designing the best model, as too many layers can result in overfitting.
You have not provided information about how you train the model, so that may be the cause of the issue as well. You must ensure that the data is split into training, validation, and test sets. Some possible split ratios for training, validation, and test data are 60%:20%:20% or 70%:15%:15%. This is ultimately something that you must also decide; a sketch of one such split is below.
The problem of overfitting was caused by the input data type. The values passed to the classifier should have been float values with 2 decimal places. Somewhere along the way, some of these values had been augmented and had significantly more than 2 decimal places. That is, the input should have looked like
[9.81, 10.22, 11.3]
but instead looked like
[9.81000000012, 10.220010431, 11.3000000101]
The classifier was making its prediction based on this feature, which is obviously not the desired behavior! Lesson learned: make sure the data preparation is consistent! Thanks to @umutto for the suggestion of random forests; their simple structure was helpful for diagnostic purposes.
I'm training a neural net using Keras in Python for time-series climate data (predicting value X at time t=T), and tried adding a (20%) dropout layer on the inputs, which seemed to limit overfitting and cause a slight increase in performance. However, after I added a new and particularly useful feature (the value of the response variable at time of prediction t=0), I found massively increased performance by removing the dropout layer. This makes sense to me, since I can imagine how the neural net would "learn" the importance of that one feature and base the rest of its training around adjusting that value (i.e, "how do these other features affect how the response at t=0 changes by time t=T").
In addition, there are a few other features that I think should be present for all epochs. That said, I am still hopeful that a dropout layer could improve the model's performance; it just needs to not drop out certain features, like X at t=0. I need a dropout layer that will only drop out certain features.
I have searched for examples of doing this, and read the Keras documentation here, but can't seem to find a way to do it. I may be missing something obvious, as I'm still not familiar with how to manually edit layers. Any help would be appreciated. Thanks!
Edit: sorry for any lack of clarity. Here is the code where I define the model (p is the number of features):
def create_model(p):
    model = Sequential()
    model.add(Dropout(0.2, input_shape=(p,)))  # % of features dropped
    model.add(Dense(1000, input_dim=p, kernel_initializer='normal', activation='sigmoid'))
    model.add(Dense(30, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal', activation='linear'))
    model.compile(loss=cost_fn, optimizer='adam')
    return model
The best way I can think of to apply dropout only to specific features is to simply separate the features into different layers.
For that, I suggest you simply divide your inputs into essential features and droppable features:
from keras.layers import Input, Dense, Dropout, Concatenate
from keras.models import Model

def create_model(essentialP, droppableP):
    essentialInput = Input((essentialP,))
    droppableInput = Input((droppableP,))
    dropped = Dropout(0.2)(droppableInput)  # % of droppable features dropped

    completeInput = Concatenate()([essentialInput, dropped])

    output = Dense(1000, kernel_initializer='normal', activation='sigmoid')(completeInput)
    output = Dense(30, kernel_initializer='normal', activation='relu')(output)
    output = Dense(1, kernel_initializer='normal', activation='linear')(output)

    model = Model([essentialInput, droppableInput], output)
    model.compile(loss=cost_fn, optimizer='adam')
    return model
Train the model using two inputs. You have to manage your inputs before training:
model.fit([essential_train_data,droppable_train_data], predictions, ...)
I don't see any harm in using dropout in the input layer. The usage/effect would be a little different from normal, of course. The effect would be similar to adding synthetic noise to an input signal, except that the feature/pixel/whatever is entirely unknown (zeroed out) instead of noisy. And inserting synthetic noise into the input is one of the oldest ways to improve robustness; it is certainly not bad practice as long as you think about whether it makes sense for your data set.
This question already has an accepted answer, but it seems to me you are using dropout in a bad way.
Dropout is only for the hidden layers, not for the input layer!
Dropout acts as a regularizer and prevents complex co-adaptation in the hidden layers. Quoting Hinton's paper: "Our work extends this idea by showing that dropout can be effectively applied in the hidden layers as well and that it can be interpreted as a form of model averaging" (http://www.jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf).
Dropout can be seen as training several different models on your data and averaging their predictions at test time. If you prevent your models from seeing all the inputs during training, they will perform badly, especially if one input is crucial. What you actually want is to avoid overfitting, meaning you keep the models from becoming too complex during the training phase (so each of your models selects the most important features first) before testing.
It is common practice to drop some of the features in ensemble learning, but that is controlled, not stochastic like dropout. It also works for neural networks because hidden layers (often) have far more neurons than there are inputs, so dropout follows the law of large numbers, whereas with a small number of inputs you can, in a bad case, have almost all of your inputs dropped.
In conclusion: it is bad practice to use dropout in the input layer of a neural network.