I tried to create a stacking regressor to predict multiple outputs, with SVR and a neural network as base estimators and linear regression as the final estimator.
print(X_train.shape) #(73, 39)
print(y_train.shape) #(73, 13)
print(X_test.shape) #(19, 39)
print(y_test.shape) #(19, 13)
def build_nn():
    ann = Sequential()
    ann.add(Dense(40, input_dim=X_train.shape[1], activation='relu', name="Hidden_Layer_1"))
    ann.add(Dense(y_train.shape[1], activation='sigmoid', name='Output_Layer'))
    ann.compile( loss='mse', optimizer= 'adam', metrics = 'mse')
    return ann
keras_reg = KerasRegressor(model = build_nn,optimizer="adam",optimizer__learning_rate=0.001,epochs=100,verbose=0)
stacker = StackingRegressor(estimators=[('svr',SVR()),('ann',keras_reg)], final_estimator= LinearRegression())
reg = MultiOutputRegressor(estimator=stacker)
model = reg.fit(X_train,y_train)
I am able to fit the model. However, I get the error below when I try to predict.
prediction = reg.predict(X_test)
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 19 and the array at index 1 has size 247
IMO the point here is the following. On the one hand, NN models support multi-output regression on their own: you define an output layer like the one you built, with as many nodes as there are outputs (though, compared with your construction, I would specify a linear activation with activation=None rather than a sigmoid activation).
def build_nn():
    ann = Sequential()
    ann.add(Dense(40, input_dim=X_train.shape[1], activation='relu', name="Hidden_Layer_1"))
    ann.add(Dense(y_train.shape[1], name='Output_Layer'))
    ann.compile(loss='mse', optimizer= 'adam', metrics = 'mse')
    return ann
On the other hand, here you're trying to solve your multi-output regression task by wrapping a StackingRegressor instance in MultiOutputRegressor, i.e. by explicitly training one regression model per output, each regression model being a combination of several regression models.
The issue arises from the concatenation of the predictions of the StackingRegressor base estimators and from their different shapes, in particular. Indeed:
the predictions of the MultiOutputRegressor instance are delegated to the StackingRegressor, as you can see in https://github.com/scikit-learn/scikit-learn/blob/7e1e6d09bcc2eaeba98f7e737aac2ac782f0e5f1/sklearn/multioutput.py#L234
in turn, in a StackingRegressor the predictions of each individual estimator are stacked together and used as input to a final_estimator to compute the prediction. .predict() is called on final_estimator in https://github.com/scikit-learn/scikit-learn/blob/7e1e6d09bcc2eaeba98f7e737aac2ac782f0e5f1/sklearn/ensemble/_stacking.py#L267 (and in particular, you can see that it is taking the transformed X as input).
the transformed X is the result of the concatenation of the predictions of the StackingRegressor base estimators, as you can see in https://github.com/scikit-learn/scikit-learn/blob/7e1e6d09bcc2eaeba98f7e737aac2ac782f0e5f1/sklearn/ensemble/_stacking.py#L67.
That said, among the StackingRegressor base estimators you have an SVR() model, which does not natively support multi-output regression, and a KerasRegressor neural network which, defined as you did, is meant to solve a multi-output regression task on its own without delegating to MultiOutputRegressor. Therefore, in _concatenate_predictions the two base estimators yield dimensionally inconsistent predictions: SVR() returns a 1D array of shape (19,)=(n_samples,), which is reshaped into a (19,1) array, whereas the KerasRegressor returns a 2D array of shape (19,13)=(n_samples,n_outputs), which is flattened and reshaped into a (19*13,1)=(247,1) array. In other words, a neural network whose output layer has as many nodes as there are outputs cannot sit in a StackingRegressor next to a base estimator that must be extended via MultiOutputRegressor to handle a multi-output regression task.
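The shape mismatch can be reproduced without any model. The following is only a minimal sketch with NumPy: svr_pred and ann_pred are hypothetical stand-ins filled with zeros, and the reshaping mimics what is described above rather than calling scikit-learn's internal code.

```python
import numpy as np

n_samples, n_outputs = 19, 13

# Hypothetical stand-ins for the base estimators' predictions on X_test.
svr_pred = np.zeros(n_samples)               # SVR: shape (19,)
ann_pred = np.zeros((n_samples, n_outputs))  # multi-output network: shape (19, 13)

# Mimic the reshaping described above: every prediction becomes one column.
svr_col = svr_pred.reshape(-1, 1)  # (19, 1)
ann_col = ann_pred.reshape(-1, 1)  # (247, 1)

# Stacking the columns side by side fails because the row counts differ.
try:
    np.concatenate([svr_col, ann_col], axis=1)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(svr_col.shape, ann_col.shape)
print(error_message)
```

With a single output node the network's column would also have 19 rows, and the concatenation would succeed.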
Therefore, if you want to keep the same "architecture", I would give your neural network an output layer with a single node, so that its predictions can be concatenated with those of the SVR model and passed to the StackingRegressor final_estimator, and then delegate the multi-output part to MultiOutputRegressor.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
import tensorflow as tf
import tensorflow.keras
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
from scikeras.wrappers import KerasRegressor
from sklearn.ensemble import StackingRegressor
from sklearn.multioutput import MultiOutputRegressor
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
X, y = make_regression(n_samples=92, n_features=39, n_informative=39, n_targets=13, random_state=42)
print(X.shape, y.shape)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
def build_nn():
    ann = Sequential()
    ann.add(Dense(40, input_dim=X_train.shape[1], activation='relu', name="Hidden_Layer_1"))
    ann.add(Dense(1, name='Output_Layer'))
    ann.compile(loss='mse', optimizer= 'adam', metrics = 'mse')
    return ann
keras_reg = KerasRegressor(model = build_nn, optimizer="adam",
                           optimizer__learning_rate=0.001, epochs=100, verbose=0)
stacker = StackingRegressor(estimators=[('svr', SVR()), ('ann', keras_reg)], final_estimator = LinearRegression())
reg = MultiOutputRegressor(estimator=stacker)
reg.fit(X_train,y_train)
predictions = reg.predict(X_test)
Related
I have a regression neural network with ten input features and three outputs. But all ten features do not have the same importance in loss function calculation (mean square error). So I want to define specific coefficients for each input feature to increase their role in the loss function.
Consider we define coefficients in an array: coeff=[5,20,2,1,4,5,6,2,9,15]. When mean squared error is measuring the distances of input features, for example, if the distance of the second feature is '60', this distance is multiplied by coefficient '20' from coeff array.
I guess I need to define a custom loss function, but how to pass the defined "coeff" array and multiply its elements with input features?
Updated
I guess my idea is similar to this code and this code, but I am not sure. However, I was unable to run the first one and got errors.
from numpy import mean
from numpy import std
from sklearn.datasets import make_regression
from sklearn.model_selection import RepeatedKFold
from keras.models import Sequential
from keras.layers import Dense

# get the dataset
def get_dataset():
    X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=3, random_state=2)
    return X, y

# get the model
def get_model(n_inputs, n_outputs):
    model = Sequential()
    model.add(Dense(20, input_dim=n_inputs, kernel_initializer='he_uniform', activation='relu'))
    model.add(Dense(n_outputs))
    model.compile(loss='mse', optimizer='adam')
    return model

# evaluate a model using repeated k-fold cross-validation
def evaluate_model(X, y):
    results = list()
    n_inputs, n_outputs = X.shape[1], y.shape[1]
    # define evaluation procedure
    cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
    # enumerate folds
    for train_ix, test_ix in cv.split(X):
        # prepare data
        X_train, X_test = X[train_ix], X[test_ix]
        y_train, y_test = y[train_ix], y[test_ix]
        # define model
        model = get_model(n_inputs, n_outputs)
        # fit model
        model.fit(X_train, y_train, verbose=0, epochs=100)
        # evaluate model on test set
        mse = model.evaluate(X_test, y_test, verbose=0)
        # store result
        print('>%.3f' % mse)
        results.append(mse)
    return results

# load dataset
X, y = get_dataset()
# evaluate model
results = evaluate_model(X, y)
# summarize performance
print('MSE: %.3f (%.3f)' % (mean(results), std(results)))
If you use the functional API, you can add a custom loss function inside the model with model.add_loss. Your loss function can then use the model inputs, the outputs, and anything else in your model.
The problem with this approach is that the model does not have the 'true' y values. So you need to add an additional input to your model and pass the y values through it, just for the loss calculation.
Something like this:
inputs = Input(shape=(n_inputs,))
x = Dense(20, ...)(inputs)
outputs = Dense(n_outputs)(x)
y_true = Input(shape=(n_outputs,))
modelx = Model(inputs=[inputs, y_true], outputs=outputs)
modelx.add_loss(your_loss_function(y_true=y_true, y_pred=outputs, inputs=inputs))
Since you already added the loss to the model, you compile it without any loss:
modelx.compile(loss=None, optimizer='adam')
When you fit the model, you need to pass the y values to the model inputs.
modelx.fit(x=[X_train, y_train], y=y_train, verbose=0, epochs=100)
When you want a model with just the X values as input, for example for prediction, you can create it like so:
model = Model(modelx.input[0], modelx.output)
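The arithmetic that your_loss_function might perform can be sketched with NumPy. This is only one possible reading of the question, not something the question or the answer confirms: it assumes each sample's squared error is scaled by a weight built from the input features and the coeff array. The name input_weighted_mse is hypothetical, and inside add_loss the same steps would have to be written with TensorFlow ops instead of NumPy.

```python
import numpy as np

# Coefficients from the question, one per input feature.
coeff = np.array([5, 20, 2, 1, 4, 5, 6, 2, 9, 15], dtype=float)

def input_weighted_mse(y_true, y_pred, inputs, coeff):
    """Hypothetical loss: per-sample MSE scaled by a weight derived from the inputs."""
    # Assumption: a sample's weight is the coefficient-weighted sum of |feature| values.
    sample_weight = np.abs(inputs) @ coeff                    # shape (n_samples,)
    per_sample_mse = np.mean((y_true - y_pred) ** 2, axis=1)  # shape (n_samples,)
    return float(np.mean(sample_weight * per_sample_mse))

rng = np.random.default_rng(0)
inputs = rng.normal(size=(4, 10))  # 4 samples, 10 input features
y_true = rng.normal(size=(4, 3))   # 3 outputs per sample
y_pred = y_true + 0.1              # every prediction is off by 0.1

loss = input_weighted_mse(y_true, y_pred, inputs, coeff)
print(loss)
```

Because the weights depend on the inputs, the loss needs access to them, which is why the approach above routes both the inputs and y_true through the model.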
I am trying to create a binary classifier on a data set of 10,000. I have tried multiple Activators and Optimizers, however the results are always between 56.8% and 58.9%. Given the fairly steady results over many dozen iterations, I assume the problem is either:
My dataset is not classifiable
My model is broken
This is the data set: training-set.csv
I may be able to get 2000 more records but that would be it.
My question is: is there something in the way my model is constructed that is preventing it from learning to a higher degree?
Note that I am happy to have as many layers and nodes as needed, and time is not a factor in generating the model.
dataframe = pandas.read_csv(r"training-set.csv", index_col=None)
dataset = dataframe.values
X = dataset[:,0:48].astype(float)
Y = dataset[:,48]
#count the input variables
col_count = X.shape[1]
#normalize X
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_scale = sc_X.fit_transform(X)
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_scale, Y, test_size = 0.2)
# define baseline model
activator = 'linear' #'relu' 'sigmoid' 'softmax' 'exponential' 'linear' 'tanh'
#opt = 'Adadelta' #adam SGD nadam RMSprop Adadelta
nodes = 1000
max_layers = 2
max_epochs = 100
max_batch = 32
loss_funct = 'binary_crossentropy' #for binary
last_act = 'sigmoid' # 'softmax' 'sigmoid' 'relu'
def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(nodes, input_dim=col_count, activation=activator))
    for x in range(0, max_layers):
        model.add(Dropout(0.2))
        model.add(Dense(nodes, input_dim=nodes, activation=activator))
        #model.add(BatchNormalization())
    model.add(Dense(1, activation=last_act)) #model.add(Dense(1, activation=last_act))
    # Compile model
    adam = Adam(lr=0.001)
    model.compile(loss=loss_funct, optimizer=adam, metrics=['accuracy'])
    return model
estimator = KerasClassifier(build_fn=baseline_model, epochs=max_epochs, batch_size=max_batch)
estimator.fit(X_train, y_train)
y_pred = estimator.predict(X_test)
#confusion matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
score = np.sum(cm.diagonal())/float(np.sum(cm))
Two points:
There is absolutely no point in stacking dense layers with linear activations: together they are equivalent to a single linear layer. Change to activator = 'relu' (and don't bother with the other candidate activation functions in your commented-out list).
Do not use dropout by default, especially if your model has difficulties in learning (like here); remove the dropout layer(s), and just be ready to put (some of) them back in only in case you see overfitting (you are currently still very far from that point, so this is not something to worry about now).
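The first point can be checked numerically. Here is a small sketch with NumPy, where the layer sizes and random weights are arbitrary assumptions rather than values from the question's model: two dense layers with linear activations compute the same function as one dense layer whose weights are the product of the two.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 48))  # 5 samples, 48 features as in the question

# Two dense layers with linear activation (arbitrary sizes and weights).
W1, b1 = rng.normal(size=(48, 16)), rng.normal(size=16)
W2, b2 = rng.normal(size=(16, 1)), rng.normal(size=1)
two_layers = (x @ W1 + b1) @ W2 + b2

# The equivalent single linear layer.
W, b = W1 @ W2, b1 @ W2 + b2
one_layer = x @ W + b

same = np.allclose(two_layers, one_layer)
print(same)
```

A nonlinearity such as relu between the layers breaks this equivalence, which is what lets depth add expressive power.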
I'm getting a numpy shape error when I use the predict function of a Keras estimator. I build, evaluate, and then retrain the model using the following code:
import pandas as pd
import sqlalchemy as sqla
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from keras.utils import np_utils
from keras.utils.np_utils import to_categorical
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import LabelEncoder
from sklearn.pipeline import Pipeline
# Connect to to the DB and retrieve the iris table
con = sqla.create_engine('postgresql://tristan:sebens#db:5432/tristan')
con.connect()
table_name = "iris"
schema = "public"
iris = pd.read_sql_table(table_name, con, schema=schema)
iris.head()
iris_ds = iris.values # Convert the table to a numpy array
X = iris_ds[:, 0:4].astype(float) # Slice the descriptive features into a numpy array
Y = iris_ds[:, 4] # Slice the labels away as their own numpy array
# The labels are encoded as strings, so we need to encode them
# as numbers that can be output by an ANN
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# convert integers to dummy variables (i.e. one hot encoded)
dummy_y = to_categorical(encoded_Y)
# define baseline model
def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(8, input_dim=4, activation='relu'))
    model.add(Dense(3, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
seed = 7
# Train the model:
# First we define the model as a classifier. This will affect the process used to train it
estimator = KerasClassifier(build_fn=baseline_model, epochs=200, batch_size=5, verbose=0)
# Honestly not totally sure what this is, but it has to do with splitting the training/evaluation data in
# a way that gives us a more realistic metric of the model's accuracy
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
# Now that we have our classifier and our data pipeline defined, we can begin the training process
results = cross_val_score(estimator, X, dummy_y, cv=kfold)
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
# If we like our accuracy, then we can train the model for real
# Evaluating the model actually evaluates a clone of the model, so now we need to train the model again
estimator.fit(X, dummy_y)
And this is where the trouble is. I try to make a test prediction:
# Let's make a test prediction with our model
x = X[0]
estimator.predict(x)
And I get an input shape error:
ValueError: Error when checking input: expected dense_21_input to have shape (4,) but got array with shape (1,)
I'm at a loss. How can the input have the wrong shape if it's literally a member of the training dataset?
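A likely cause, stated here as an assumption since it is not confirmed in the thread, is that X[0] is a 1D array of shape (4,), while predict expects a 2D batch of shape (n_samples, 4). A 1D array of four values is then read as four samples with one feature each. The sketch below shows only the shapes, using a hypothetical stand-in matrix instead of the iris data.

```python
import numpy as np

# Hypothetical stand-in for the 4-feature iris matrix.
X = np.arange(12, dtype=float).reshape(3, 4)

x = X[0]
print(x.shape)        # a single row loses its batch dimension

x_batch = x.reshape(1, -1)
print(x_batch.shape)  # one sample with four features

x_slice = X[0:1]
print(x_slice.shape)  # slicing keeps the batch dimension
```

Under that assumption, estimator.predict(x_batch) or estimator.predict(X[0:1]) would pass an input of the expected shape.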
I have created an ANN with numerical inputs and a single categorical output which is one hot encoded to be 1 of 19 categories. I set my output layer to have 19 units. I don't know how to perform the confusion matrix now nor how to classifier.predict() in light of this rather than a single binary output. I keep getting an error saying classification metrics can't handle a mix of continuous-multioutput and multi-label-indicator targets. Not sure how to proceed.
#Importing Datasets
dataset=pd.read_csv('Data.csv')
x = dataset.iloc[:,1:36].values # lower bound independent variable to upper bound in a matrix (in this case only 1 column 'NC')
y = dataset.iloc[:,36:].values # dependent variable vector
print(x.shape)
print(y.shape)
#One Hot Encoding fuel rail column
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_y= LabelEncoder()
y[:,0]=labelencoder_y.fit_transform(y[:,0])
onehotencoder= OneHotEncoder(categorical_features=[0])
y = onehotencoder.fit_transform(y).toarray()
print(y[:,0:])
print(x.shape)
print (y.shape)
#splitting data into Training and Test Data
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x,y,test_size=0.1,random_state=0)
#Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
#x_train = sc.fit_transform(x_train)
#x_test=sc.transform(x_test)
y_train = sc.fit_transform(y_train)
y_test=sc.transform(y_test)
# PART2 - Making ANN, deep neural network
#Importing the Keras libraries and packages
import keras
from keras.models import Sequential
from keras.layers import Dense
#Initialising ANN
classifier = Sequential()
#Adding the input layer and first hidden layer
classifier.add(Dense(activation= 'relu', input_dim =35, units=2, kernel_initializer="uniform"))#rectifier activation function, include all input with one hot encoding
#Adding second hidden layer
classifier.add(Dense(activation= 'relu', units=2, kernel_initializer="uniform")) #rectifier activation function
#Adding the Output Layer
classifier.add(Dense(activation='softmax', units=19, kernel_initializer="uniform"))
#Compiling ANN - stochastic gradient descent
classifier.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])#stochastic gradient descent
#Fit ANN to training set
#PART 3 - Making predictions and evaluating the model
#Fitting classifier to the training set
classifier.fit(x_train, y_train, batch_size=10, epochs=100)#original batch is 10 and epoch is 100
#Predicting the Test set rules
y_pred = classifier.predict(x_test)
y_pred = (y_pred > 0.5) #greater than 0.50 on scale 0 to 1
print(y_pred)
#Making confusion matrix that checks accuracy of the model
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
y_pred = (y_pred > 0.5)
This outputs a boolean matrix. The problem is that it has the same shape as before, whereas confusion_matrix needs a vector of labels.
Take np.argmax(y_pred, axis=1) instead to get the predicted labels.
To sum up, this code should give you the confusion matrix:
y_pred=model.predict(X_test)
y_pred=np.argmax(y_pred, axis=1)
y_test=np.argmax(y_test, axis=1)
cm = confusion_matrix(y_test, y_pred)
print(cm)
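To see why thresholding at 0.5 is not enough, here is a small sketch with made-up softmax outputs for three classes (the numbers are illustrative, not from the question's data). The confusion matrix is counted by hand with NumPy, using the same rows = true, columns = predicted convention as sklearn's confusion_matrix.

```python
import numpy as np

# Made-up softmax outputs for three samples and three classes.
probs = np.array([
    [0.70, 0.20, 0.10],
    [0.40, 0.35, 0.25],  # no value above 0.5, yet class 0 is the most likely
    [0.10, 0.30, 0.60],
])
y_true_onehot = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])

thresholded = probs > 0.5                  # still a 2D boolean matrix
y_pred = np.argmax(probs, axis=1)          # vector of predicted labels
y_true = np.argmax(y_true_onehot, axis=1)  # vector of true labels

# Count (true, predicted) pairs: rows are true labels, columns are predictions.
n_classes = probs.shape[1]
cm = np.zeros((n_classes, n_classes), dtype=int)
np.add.at(cm, (y_true, y_pred), 1)

print(thresholded[1])
print(cm)
```

The second row of thresholded is all False, so that sample would get no label at all, while argmax still assigns it a class.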
I am trying to understand how LSTM RNNs work and how they can be implemented in Keras to solve a binary classification problem. My code and the dataset I use are shown below. When I run the code I get the error TypeError: __init__() got multiple values for keyword argument 'input_dim'. Can anybody help?
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers.embeddings import Embedding
from keras.layers import Dense
from sklearn.cross_validation import train_test_split
import numpy
from sklearn.preprocessing import StandardScaler # data normalization
seed = 7
numpy.random.seed(seed)
dataset = numpy.loadtxt("sorted output.csv", delimiter=",")
X = dataset[:,0:4]
scaler = StandardScaler(copy=True, with_mean=True, with_std=True ) #data normalization
X = scaler.fit_transform(X) #data normalization
Y = dataset[:4]
# split into 67% for train and 33% for test
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.33, random_state=seed)
# create model
model = Sequential()
model.add(Embedding(12,input_dim=4,init='uniform',activation='relu'))
model.add(Dense(4, init='uniform', activation='relu'))
model.add(LSTM(100))
model.add(Dense(1, init='uniform', activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fit the model
model.fit(X_train, y_train, validation_data=(X_test,y_test), nb_epoch=150, batch_size=10)
Looks like two separate questions here.
Regarding how to use LSTMs / Keras, there are some good tutorials around. Try this one which also describes a binary classification problem. If you have a specific issue or area that you don't understand, let me know.
Regarding the file opening issue, perhaps the whitespace in the filename is causing an issue. Check out this answer to see if it helps.
This is in fact a case where the error message you are getting is perfectly to-the-point. (I wish this would always be the case with Python and Keras...)
Keras' Embedding layer constructor has this signature:
keras.layers.embeddings.Embedding(input_dim, output_dim, ...)
However, you are constructing it using:
Embedding(12,input_dim=4,...)
So figure out which is the input and output dimension, respectively, and fix your parameter order and names. Based on the table you included in the question, I'm guessing 4 is your input dimension and 12 is your output dimension; then it'd be Embedding(input_dim=4, output_dim=12, ...).
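The mechanism behind the error is plain Python and can be reproduced without Keras. In the sketch below, FakeEmbedding is a hypothetical stand-in that only mirrors the (input_dim, output_dim) parameter order of the signature above. It is not the real layer.

```python
class FakeEmbedding:
    """Hypothetical stand-in mirroring only the (input_dim, output_dim) order."""

    def __init__(self, input_dim, output_dim, **kwargs):
        self.input_dim = input_dim
        self.output_dim = output_dim


# The positional 12 is bound to input_dim, then input_dim=4 tries to bind it again.
try:
    FakeEmbedding(12, input_dim=4)
    message = None
except TypeError as exc:
    message = str(exc)

print(message)

# Naming both arguments removes the ambiguity.
layer = FakeEmbedding(input_dim=4, output_dim=12)
print(layer.input_dim, layer.output_dim)
```

The first call fails for the same reason as the call in the question: one parameter receives a value both positionally and by keyword.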