I tried to create a stacking regressor to predict multiple outputs, with an SVR and a neural network as base estimators and a linear regression as the final estimator.
print(X_train.shape) #(73, 39)
print(y_train.shape) #(73, 13)
print(X_test.shape) #(19, 39)
print(y_test.shape) #(19, 13)
def build_nn():
    ann = Sequential()
    ann.add(Dense(40, input_dim=X_train.shape[1], activation='relu', name="Hidden_Layer_1"))
    ann.add(Dense(y_train.shape[1], activation='sigmoid', name='Output_Layer'))
    ann.compile(loss='mse', optimizer='adam', metrics=['mse'])
    return ann
keras_reg = KerasRegressor(model = build_nn,optimizer="adam",optimizer__learning_rate=0.001,epochs=100,verbose=0)
stacker = StackingRegressor(estimators=[('svr',SVR()),('ann',keras_reg)], final_estimator= LinearRegression())
reg = MultiOutputRegressor(estimator=stacker)
model = reg.fit(X_train,y_train)
I am able to fit the model. However, I get the following error when trying to predict.
prediction = reg.predict(X_test)
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 19 and the array at index 1 has size 247
In my opinion, the point is the following. On the one hand, NN models support multi-output regression tasks on their own: it is enough to define an output layer like the one you built, with a number of nodes equal to the number of outputs (though, compared with your construction, I would specify a linear activation with activation=None rather than a sigmoid activation, since a sigmoid restricts the outputs to the interval (0, 1)).
def build_nn():
    ann = Sequential()
    ann.add(Dense(40, input_dim=X_train.shape[1], activation='relu', name="Hidden_Layer_1"))
    ann.add(Dense(y_train.shape[1], name='Output_Layer'))
    ann.compile(loss='mse', optimizer='adam', metrics=['mse'])
    return ann
On the other hand, you are trying to solve your multi-output regression task by calling the MultiOutputRegressor constructor on a StackingRegressor instance, i.e. by explicitly training one regression model per output, where each regression model is itself a combination of multiple regression models.
The issue arises from the concatenation of the predictions of the StackingRegressor base estimators and, in particular, from their different shapes. Indeed:
- the predictions of the MultiOutputRegressor instance are delegated to the StackingRegressor, as you can see in https://github.com/scikit-learn/scikit-learn/blob/7e1e6d09bcc2eaeba98f7e737aac2ac782f0e5f1/sklearn/multioutput.py#L234
- in turn, in a StackingRegressor the predictions of each individual estimator are stacked together and used as input to a final_estimator to compute the prediction. .predict() is called on final_estimator in https://github.com/scikit-learn/scikit-learn/blob/7e1e6d09bcc2eaeba98f7e737aac2ac782f0e5f1/sklearn/ensemble/_stacking.py#L267 (in particular, you can see that it takes the transformed X as input).
- the transformed X is the result of concatenating the predictions of the StackingRegressor base estimators, as you can see in https://github.com/scikit-learn/scikit-learn/blob/7e1e6d09bcc2eaeba98f7e737aac2ac782f0e5f1/sklearn/ensemble/_stacking.py#L67.
That said, among the StackingRegressor base estimators you have an SVR() model, which does not natively support multi-output regression, and a KerasRegressor neural network which, defined as you did, is meant to solve a multi-output regression task without delegating to MultiOutputRegressor. Therefore, in _concatenate_predictions, dimensionally inconsistent predictions arise from SVR() (a 1D array of shape (19,)=(n_samples,), eventually reshaped into a (19,1) array) and from the KerasRegressor (a 2D array of shape (19,13)=(n_samples,n_outputs), eventually flattened and reshaped into a (19*13,1)=(247,1) array). In other words, a neural network whose output layer has as many nodes as there are outputs cannot sit in a StackingRegressor next to a base estimator that must be extended via MultiOutputRegressor to handle a multi-output regression task.
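The shape mismatch can be reproduced with NumPy alone. The sketch below only mimics the array shapes described above (the variable names are illustrative, and it is not scikit-learn's actual code), assuming the per-estimator predictions end up as column vectors that are stacked side by side.

```python
import numpy as np

n_samples, n_outputs = 19, 13

# Stand-ins for the two base-estimator predictions (values are irrelevant here).
svr_pred = np.zeros(n_samples)                # shape (19,)
ann_pred = np.zeros((n_samples, n_outputs))   # shape (19, 13)

# Both are turned into column vectors before being placed side by side.
svr_col = svr_pred.reshape(-1, 1)             # shape (19, 1)
ann_col = ann_pred.reshape(-1, 1)             # shape (247, 1)

try:
    np.hstack([svr_col, ann_col])
    stacked_ok = True
except ValueError as exc:
    # The row counts (19 vs 247) do not match, so stacking fails.
    stacked_ok = False
    print("concatenation failed:", exc)

# With a single-output network, both columns have 19 rows and stack cleanly.
ann_single_col = np.zeros(n_samples).reshape(-1, 1)
meta_features = np.hstack([svr_col, ann_single_col])
print(meta_features.shape)
```

The second half shows why a single output node removes the problem: each base estimator then contributes exactly one column of length n_samples.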
Therefore, in my view, if you want to keep the same "architecture", you should give your neural network an output layer with a single node, so that its predictions can be concatenated with those of the SVR model and passed to the StackingRegressor final_estimator, and then delegate the multi-output handling to MultiOutputRegressor.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
import tensorflow as tf
import tensorflow.keras
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
from scikeras.wrappers import KerasRegressor
from sklearn.ensemble import StackingRegressor
from sklearn.multioutput import MultiOutputRegressor
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
X, y = make_regression(n_samples=92, n_features=39, n_informative=39, n_targets=13, random_state=42)
print(X.shape, y.shape)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
def build_nn():
    ann = Sequential()
    ann.add(Dense(40, input_dim=X_train.shape[1], activation='relu', name="Hidden_Layer_1"))
    ann.add(Dense(1, name='Output_Layer'))
    ann.compile(loss='mse', optimizer='adam', metrics=['mse'])
    return ann
keras_reg = KerasRegressor(model=build_nn, optimizer="adam",
                           optimizer__learning_rate=0.001, epochs=100, verbose=0)
stacker = StackingRegressor(estimators=[('svr', SVR()), ('ann', keras_reg)], final_estimator = LinearRegression())
reg = MultiOutputRegressor(estimator=stacker)
reg.fit(X_train,y_train)
predictions = reg.predict(X_test)
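To check the output shape of this arrangement without training a network, here is a lightweight sketch that swaps the Keras model for a Ridge regressor. The swap is purely an assumption made to keep the example fast and free of TensorFlow; what matters is that every base estimator is single-output, as in the code above.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

# Same synthetic data layout as above: 39 features, 13 targets.
X, y = make_regression(n_samples=92, n_features=39, n_informative=39,
                       n_targets=13, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Ridge stands in for the single-output neural network (illustrative swap).
stacker = StackingRegressor(
    estimators=[('svr', SVR()), ('ridge', Ridge())],
    final_estimator=LinearRegression())

# One stacker is cloned and fitted per target column.
reg = MultiOutputRegressor(estimator=stacker)
reg.fit(X_train, y_train)

predictions = reg.predict(X_test)
print(predictions.shape)       # one column per target
print(len(reg.estimators_))    # one fitted stacker per target
```

Each of the 13 fitted stackers sees a 1D target, so its base estimators all return one value per sample and the concatenation step succeeds.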
I have started learning how to use Keras. I have a raw file that encodes the ASCII values of the characters in a sentence, together with a corresponding product name. For example, abcd toothpaste cream would be classified as Toothpaste. The first two lines (out of ~150,000 lines) of the file are shown below. The file is also available for download here (this link will last two months from today).
12,15,11,31,30,15,0,26,28,15,29,30,19,17,15,0,19,24,30,15,28,24,11,30,0,18,19,17,19,15,24,15,0,35,0,12,15,22,22,15,36,11,0,12,15,22,22,15,36,11,0,16,28,11,17,11,24,13,19,11,29,0,16,15,23,15,24,19,24,11,29,0,11,36,36,15,14,19,24,15,0,11,36,36,15,14,19,24,15,11,22,11,19,11,0,26,15,28,16,31,23,15,0,16,15,23,15,24,19,24,25,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Body Care Other
12,15,19,15,28,29,14,25,28,16,0,30,18,11,19,22,11,24,14,0,13,25,0,22,30,14,0,29,21,19,24,13,11,28,15,0,26,28,15,26,11,28,11,30,19,25,24,29,0,16,11,13,19,11,22,0,13,22,15,11,24,29,15,28,29,0,24,19,32,15,11,0,16,11,13,19,11,22,0,13,22,15,11,24,29,15,28,29,0,26,28,25,14,31,13,30,29,0,24,19,32,15,11,0,23,11,21,15,0,31,26,0,13,22,15,11,28,0,23,19,13,15,22,22,11,28,0,33,11,30,15,28,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Skin Care Other
I am following a blog post that uses a simple deep learning Keras model to do multi-class classification. I changed the configuration of the neural network to 243 inputs --> [100 hidden nodes] --> 67 outputs (because I have 67 classes to classify). The code is below:
import numpy
import pandas
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from keras.utils import np_utils
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import LabelEncoder
def baseline_model():
    model = Sequential()
    # I changed it to 243 inputs --> [100 hidden nodes] --> 67 outputs (because I have 67 classes to classify)
    model.add(Dense(100, input_dim=X_len, activation='relu'))
    model.add(Dense(Y_cnt, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
seed = 7
numpy.random.seed(seed)
# load dataset
dataframe = pandas.read_csv("./input/raw_mappings.csv", header=None)
dataset = dataframe.values
X_len = len(dataset[0,:-1])
X = dataset[:,0:X_len].astype(float)
Y = dataset[:,X_len]
Y_cnt = len(numpy.unique(Y))
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# convert integers to dummy variables (i.e. one hot encoded)
dummy_y = np_utils.to_categorical(encoded_Y)
estimator = KerasClassifier(build_fn=baseline_model, epochs=200, batch_size=5, verbose=0)
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(estimator, X, dummy_y, cv=kfold)
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
But it never seems to finish: I ran it on my desktop computer for more than 12 hours. I'm starting to think that almost nothing is going on. Is there something I'm doing wrong, either in the configuration of the neural network or in the problem I'm trying to solve (meaning, maybe a Sequential model is not the right way to go for classifying >60 classes)?
Any pointer or tip would be greatly appreciated. Thank you.
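For a sense of scale, the amount of work this configuration requests can be counted directly from the settings in the code above. This is only a rough sketch: it assumes exactly 150,000 rows (the post says ~150,000) and counts gradient updates, not wall-clock time, which depends on the hardware.

```python
import math

n_rows = 150_000   # assumption: the post says "~150,000 lines"
n_splits = 10      # KFold(n_splits=10)
epochs = 200       # KerasClassifier(epochs=200)
batch_size = 5     # KerasClassifier(batch_size=5)

# Each fold trains on 9/10 of the data.
train_rows_per_fold = n_rows * (n_splits - 1) // n_splits

# One gradient update per batch.
batches_per_epoch = math.ceil(train_rows_per_fold / batch_size)
updates_per_fold = batches_per_epoch * epochs
total_updates = updates_per_fold * n_splits

print(f"{train_rows_per_fold:,} training rows per fold")
print(f"{batches_per_epoch:,} batches per epoch")
print(f"{total_updates:,} gradient updates in total")
```

Under these assumptions the run amounts to 54,000,000 small updates, and because verbose=0 suppresses progress output, nothing is printed until all ten folds have finished.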
I am trying to build an MLP model that takes a dataset consisting of 9 columns.
This is a sample; the columns are (patient number, time in milliseconds, normalization of X, Y and Z, kurtosis, skewness, pitch, roll and yaw, label), respectively.
1,15,-0.248010047716,0.00378335508419,-0.0152548459993,-86.3738760481,0.872322164158,-3.51314800063,0
1,31,-0.248010047716,0.00378335508419,-0.0152548459993,-86.3738760481,0.872322164158,-3.51314800063,0
1,46,-0.267422664673,0.0051143782875,-0.0191247001961,-85.7662354031,1.0928406847,-4.08015176908,0
1,62,-0.267422664673,0.0051143782875,-0.0191247001961,-85.7662354031,1.0928406847,-4.08015176908,0
And this is my code. It runs without errors, but the results with and without the features are the same, so I am asking whether I am feeding those features into the model the right way.
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt
import pandas as pd
import itertools
import math
np.random.seed(7)
train = np.loadtxt("featwithsignalsTRAIN.txt", delimiter=",")
test = np.loadtxt("featwithsignalsTEST.txt", delimiter=",")
x_train = train[:,[2,3,4,5,6,7]]
x_test = test[:,[2,3,4,5,6,7]]
y_train = train[:,8]
y_test = test[:,8]
model = Sequential()
model.add(Dense(500, input_dim=6, activation='relu'))
model.add(Dense(300, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy' , optimizer='adam', metrics=['accuracy'])
# Fit the model
batch_size = 128
epochs = 10
hist = model.fit(x_train, y_train,
                 batch_size=batch_size,
                 epochs=epochs,
                 verbose=2,
                 )
# hist.history holds per-epoch metrics on the training data, so this is a training average
avg = np.mean(hist.history['acc'])
print('The Average Training Accuracy is', avg)
##Evaluate the model
score=model.evaluate(x_test, y_test, verbose=2)
print(score)
There is nothing wrong with your model, but it is possible that it does not learn anything useful. It could be that your learning rate is too high or too low, that you need more epochs, or simply that your features are not useful.
Here is some advice:
You can pass a validation set directly to the fit method. Keras will then compute the same metrics on this set at the end of each epoch, which lets you see whether your model is learning something useful or just overfitting the training set, without having to wait for training to finish. (Make sure you use verbose = 1 or 2 to see the training progress.)
model.fit( ... , validation_data = (x_test , y_test) , ...)
I see that you used the history callback. A good practice is to look at how the accuracy changes from one epoch to the next instead of taking the mean. This lets you see whether your network is actually learning something. A network rarely converges in the first few epochs.
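As a minimal sketch of that idea, the helper below lists one metric epoch by epoch together with its change from the previous epoch. The function name and the example values are hypothetical; with a real run you would pass hist.history instead. The metric key is 'acc' in older Keras releases and 'accuracy' in newer ones, so the helper checks for both.

```python
def per_epoch_changes(history, key=None):
    """Return (epoch, value, change_from_previous_epoch) tuples for one metric."""
    if key is None:
        # Older Keras logs 'acc', newer versions log 'accuracy'.
        key = "acc" if "acc" in history else "accuracy"
    values = history[key]
    rows = []
    for epoch, value in enumerate(values, start=1):
        change = None if epoch == 1 else value - values[epoch - 2]
        rows.append((epoch, value, change))
    return rows


# Hypothetical values standing in for hist.history: a flat curve like this
# would suggest the network is not learning from the inputs.
example_history = {"acc": [0.52, 0.52, 0.52, 0.52]}

for epoch, value, change in per_epoch_changes(example_history):
    suffix = "" if change is None else f" (change {change:+.4f})"
    print(f"epoch {epoch}: {value:.4f}{suffix}")
```

A curve that stays flat from the first epoch tells a different story than one that climbs and then plateaus, which a single mean would hide.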
Do you have an idea of the 'usefulness' of your features? You can get a sense of it by performing an exploratory analysis before creating your model, or by fitting a more 'conventional' model (linear regression, decision trees, random forest ...). This is highly recommended before fitting a neural network, and it also lets you compare different types of models and see whether you really need a neural network.
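One possible way to run such a comparison with scikit-learn is sketched below. The data here is synthetic (generated with make_classification, with 6 features to match the question) and only stands in for the real x_train and y_train; the choice of models is an illustrative assumption, not a recommendation for this dataset.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real features and binary labels.
X_demo, y_demo = make_classification(n_samples=500, n_features=6,
                                     n_informative=3, n_redundant=0,
                                     random_state=0)

candidates = {
    # Reference point: always predict the most frequent class.
    "majority-class guess": DummyClassifier(strategy="most_frequent"),
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, clf in candidates.items():
    # Mean accuracy over 5 cross-validation folds.
    scores[name] = cross_val_score(clf, X_demo, y_demo, cv=5,
                                   scoring="accuracy").mean()
    print(f"{name}: {scores[name]:.3f}")
```

If none of the conventional models beats the majority-class guess on the real data, the features are probably the issue rather than the network.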
If you are sure that your features would at least perform better than a random guess, try playing with the learning rate. A learning rate that is too high can cause the model to overshoot the minimum, while one that is too small can cause the model to learn very slowly or to get stuck in a local minimum. You could also try tuning the number of epochs.
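The effect of the learning rate can be seen on a toy problem. The sketch below runs plain gradient descent on f(w) = w², which is an assumption chosen for simplicity and not the Adam optimizer or the network from the question. Since this function is convex, it illustrates overshooting and slow progress, but not local minima.

```python
def gradient_descent(learning_rate, w0=1.0, steps=20):
    """Minimise f(w) = w**2 with plain gradient descent and return the final w."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w                # derivative of w**2
        w = w - learning_rate * grad
    return w


too_high = gradient_descent(1.1)      # every step overshoots the minimum at 0
too_small = gradient_descent(0.001)   # barely moves away from the start
reasonable = gradient_descent(0.3)    # shrinks towards 0 quickly

print("lr=1.1   ->", too_high)
print("lr=0.001 ->", too_small)
print("lr=0.3   ->", reasonable)
```

With the largest rate the iterate grows in magnitude at every step, with the smallest it is still close to its starting value after 20 steps, and with the middle one it is essentially at the minimum.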