scikit-learn regression with multiple continuous targets - python

I want to perform regression on a dataset where the input has multiple features and the output has multiple continuous targets.
I've been looking through the sklearn documentation, but the only multi-target examples I've found either 1) have a discrete set of target labels or 2) use a heuristic algorithm like KNN instead of an optimization-based algorithm like regression. Adding regularization would also be great, but I can't find a method even for simple least squares. This is a really simple, smooth optimization problem, so I'd be shocked if it weren't already implemented somewhere. I'd appreciate it if someone could point me in the right direction!

You can find what you are looking for here.
https://machinelearningmastery.com/multi-output-regression-models-with-python/
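For the scikit-learn side of the question: the plain linear models already accept a 2-D target array, so ordinary least squares and ridge regularization work on multiple continuous targets out of the box. A minimal sketch (the random arrays are just placeholders for your data):
import numpy as np
from sklearn.linear_model import Ridge  # LinearRegression works the same way

X = np.random.rand(100, 10)   # 100 samples, 10 input features
Y = np.random.rand(100, 4)    # 4 continuous targets per sample

# A 2-D y is fitted directly, giving regularized multi-target least squares
reg = Ridge(alpha=0.01).fit(X, Y)
predictions = reg.predict(X)  # shape (100, 4)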
If you have enough data, it can also be worth trying Keras, with no activation on the output layer (i.e. a linear output):
from keras.layers import Dense, Input
from keras.models import Model
from keras.regularizers import l2

num_inputs = 10
num_outputs = 4

# One linear layer with L2 (ridge) regularization on the weights;
# no activation on the output, since the targets are continuous.
inp = Input((num_inputs,))
out = Dense(num_outputs, kernel_regularizer=l2(0.01))(inp)
model = Model(inp, out)
model.compile(optimizer='sgd', loss='mse', metrics=['mse'])
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_9 (InputLayer) (None, 10) 0
_________________________________________________________________
dense_7 (Dense) (None, 4) 44
=================================================================
Total params: 44
Trainable params: 44
Non-trainable params: 0
_________________________________________________________________
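Training then follows the usual Keras pattern; the arrays below are placeholders standing in for real data with the shapes assumed above:
import numpy as np

X_train = np.random.rand(500, num_inputs)
y_train = np.random.rand(500, num_outputs)

model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2)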

Related

LSTM using prediction as input feature for the next time step prediction

I started building my own LSTM neural networks a few weeks ago and there is a problem I can't solve.
I built a single hidden layer LSTM of this shape:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm (LSTM) (None, 36) 14976
_________________________________________________________________
dense (Dense) (None, 1) 37
=================================================================
Total params: 15,013
Trainable params: 15,013
Non-trainable params: 0
_________________________________________________________________
At each time step t I try to predict an event (0 or 1) using 34 explanatory variables observed at t and 35 explanatory variables observed at t-1 (including the target at t-1). The problem is that at prediction time I am not supposed to know the true value of the target at t-1, so I would like to use the prediction made at t-1 as an input feature when predicting at t, but I don't know how to translate this into Python.
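One way to picture the feedback loop described here (only a rough sketch, not an answer from the thread; the names and shapes are placeholder assumptions) is to predict step by step and overwrite the "previous target" feature with the previous prediction before each call:
import numpy as np

# Assumptions: `model` is the trained network above, `X` has one row of
# explanatory variables per time step, and the last column of X is the
# "target at t-1" feature that we replace with our own previous prediction.
def rollout(model, X):
    preds = []
    prev = 0.0                                  # seed value for the first step
    for t in range(len(X)):
        step = X[t].copy()
        step[-1] = prev                         # feed back the previous prediction
        p = float(model.predict(step[np.newaxis, np.newaxis, :], verbose=0)[0, 0])
        preds.append(p)
        prev = round(p)                         # binary event: 0 or 1
    return np.array(preds)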

Keras Sequential model input: How significant are the dimensions?

I am trying to build a multioutput classifier on 3D data structured like [sampleID, timestamp, deviceID, sensorID] with one-hot labels like [sampleID, deviceID] to determine which device "wins".
In a nutshell, it is a massive collection of timeseries readings from five sensors taken at regular intervals from each of four different devices. The objective is to determine which of the devices is most likely to be in a particular state at the end of each sampleID. The labels are a one-hot representation of the devices.
In a case like this where a human would find meaning in the structure of the dataset, does the training process derive similar benefit? Can I simplify my dataset by reducing it to [dataset, deviceID, timestamp X sensor] or even [dataset, deviceID X timestamp X sensor] and still get similar accuracy?
In other words would simplifying the following dataset:
[10000, 1000, 4, 5]
down to
[10000, 4, 5000]
or
[10000, 1000, 20]
or even
[10000, 20000]
significantly diminish the model's ability to classify output?
Edited for detail and formatting.
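For concreteness, the reshapes being asked about are plain NumPy operations (the array below is random and smaller than the real dataset, purely for illustration):
import numpy as np

data = np.random.rand(100, 1000, 4, 5)         # [sample, timestep, device, sensor]

flat_sensors = data.reshape(100, 1000, 20)     # merge the device and sensor axes
per_device = data.transpose(0, 2, 1, 3).reshape(100, 4, 5000)   # device first, then timestep x sensor
fully_flat = data.reshape(100, 20000)          # flatten everything except the sample axis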
IIUC, you are asking whether using 1000 timesteps for 20 objects (device x sensor) is better than using 1000 timesteps for 4 devices with 5 sensors each.
There is no way to determine in advance which representation will model your problem better, but we can quickly build some tests to see which model captures the complexity of the problem better.
Case 1: 1000 time steps, 20 objects -> Sequential LSTM based model
If you consider the 20 sensors individually, you can simply use an LSTM-based model and let the model handle the non-linear relationships between them. Since you have a 2D input per sample, simply reshape your data and build a model with the following structure. Feel free to add more layers, activations, etc.
from tensorflow.keras import layers, Model, utils
#Temporal model
inp = layers.Input((1000,20))
x = layers.LSTM(30, return_sequences=True)(inp)
x = layers.LSTM(30)(x)
out = layers.Dense(4, activation='softmax')(x)
model = Model(inp, out)
model.summary()
Model: "model_4"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_6 (InputLayer) [(None, 1000, 20)] 0
_________________________________________________________________
lstm_4 (LSTM) (None, 1000, 30) 6120
_________________________________________________________________
lstm_5 (LSTM) (None, 30) 7320
_________________________________________________________________
dense_20 (Dense) (None, 4) 124
=================================================================
Total params: 13,564
Trainable params: 13,564
Non-trainable params: 0
_________________________________________________________________
Case 2: 1000 time steps, 4x5 objects -> Conv-LSTM based model
Since you have a 3D input per sample, you want to treat the 4x5 as your spatial axes and the 1000 time steps as your channels/feature maps. Because the channel axis comes first in this layout, pass data_format="channels_first" to the Conv2D as well as the MaxPooling2D layers.
Then, once you have convolved over the spatial axes, you can start working on the feature maps with an LSTM. Sample code below, feel free to modify and build on top of this.
from tensorflow.keras import layers, Model, utils

# Conv-LSTM model: convolve over the 4x5 spatial axes, then run an LSTM
# over the resulting feature maps
inp = layers.Input((1000, 4, 5))
x = layers.Conv2D(30, 2, data_format="channels_first")(inp)
x = layers.MaxPooling2D(2, data_format="channels_first")(x)
x = layers.Reshape((-1, 2))(x)   # (30, 1, 2) -> a sequence of 30 steps with 2 features each
x = layers.LSTM(20)(x)
out = layers.Dense(4, activation='softmax')(x)
model = Model(inp, out)
model.summary()
Model: "model_21"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_25 (InputLayer) [(None, 1000, 4, 5)] 0
_________________________________________________________________
conv2d_19 (Conv2D) (None, 30, 3, 4) 120030
_________________________________________________________________
max_pooling2d_14 (MaxPooling (None, 30, 1, 2) 0
_________________________________________________________________
reshape_10 (Reshape) (None, 30, 2) 0
_________________________________________________________________
lstm_19 (LSTM) (None, 20) 1840
_________________________________________________________________
dense_30 (Dense) (None, 4) 84
=================================================================
Total params: 121,954
Trainable params: 121,954
Non-trainable params: 0
_________________________________________________________________
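Either model is then compiled and fit in the usual way, so comparing the two cases is just a matter of training each on the correspondingly reshaped data. A sketch with placeholder arrays (your real data and labels are assumed):
import numpy as np

X_case2 = np.random.rand(256, 1000, 4, 5)    # [sample, timestep, device, sensor]
y = utils.to_categorical(np.random.randint(0, 4, size=256), num_classes=4)

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_case2, y, epochs=5, batch_size=32, validation_split=0.2)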

how does input_shape in keras.applications work?

I have been through the Keras documentation, but I am still unable to figure out how the input_shape parameter works and why it does not change the number of parameters for my DenseNet model when I pass it my custom input shape. An example:
import keras
from keras import applications
from keras.layers import Conv3D, MaxPool3D, Flatten, Dense
from keras.layers import Dropout, Input, BatchNormalization
from keras import Model
# define model 1
INPUT_SHAPE = (224, 224, 1) # used to define the input size to the model
n_output_units = 2
activation_fn = 'sigmoid'
densenet_121_model = applications.densenet.DenseNet121(include_top=False, weights=None, input_shape=INPUT_SHAPE, pooling='avg')
inputs = Input(shape=INPUT_SHAPE, name='input')
model_base = densenet_121_model(inputs)
output = Dense(units=n_output_units, activation=activation_fn)(model_base)
model = Model(inputs=inputs, outputs=output)
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input (InputLayer) (None, 224, 224, 1) 0
_________________________________________________________________
densenet121 (Model) (None, 1024) 7031232
_________________________________________________________________
dense_1 (Dense) (None, 2) 2050
=================================================================
Total params: 7,033,282
Trainable params: 6,949,634
Non-trainable params: 83,648
_________________________________________________________________
# define model 2
INPUT_SHAPE = (512, 512, 1) # used to define the input size to the model
n_output_units = 2
activation_fn = 'sigmoid'
densenet_121_model = applications.densenet.DenseNet121(include_top=False, weights=None, input_shape=INPUT_SHAPE, pooling='avg')
inputs = Input(shape=INPUT_SHAPE, name='input')
model_base = densenet_121_model(inputs)
output = Dense(units=n_output_units, activation=activation_fn)(model_base)
model = Model(inputs=inputs, outputs=output)
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input (InputLayer) (None, 512, 512, 1) 0
_________________________________________________________________
densenet121 (Model) (None, 1024) 7031232
_________________________________________________________________
dense_2 (Dense) (None, 2) 2050
=================================================================
Total params: 7,033,282
Trainable params: 6,949,634
Non-trainable params: 83,648
_________________________________________________________________
Ideally, with an increase in the input shape, the number of parameters should increase; however, as you can see, it stays exactly the same. My questions are thus:
Why does the number of parameters not change with a change in the input_shape?
I have only defined one channel in my input_shape; what would happen to my model training in this scenario? The documentation says the following:
input_shape: optional shape tuple, only to be specified if include_top
is False (otherwise the input shape has to be (224, 224, 3) (with
'channels_last' data format) or (3, 224, 224) (with 'channels_first'
data format). It should have exactly 3 inputs channels, and width and
height should be no smaller than 32. E.g. (200, 200, 3) would be one
valid value.
However, when I run the model with this configuration, it runs without any problems. Is there something I am missing?
Using Keras 2.2.4 with Tensorflow 1.12.0 as backend.
1.
In the convolutional layers the input size does not influence the number of weights, because the number of weights is determined by the kernel matrix dimensions. A larger input size leads to a larger output size, but not to an increasing number of weights.
This means that the output size of the convolutional layers of the second model will be larger than for the first model, which would increase the number of weights in any following dense layer. However, because you passed pooling='avg', DenseNet ends in a global average pooling layer after all the convolutional layers, which averages all the values of each output channel. That's why the output of DenseNet will be of size 1024, whatever the input shape.
2.
Yes, the model will still work. The "exactly 3 input channels" restriction only applies when you load pretrained ImageNet weights; with weights=None, Keras simply builds the first convolution with a single input channel, which is why the densenet121 block has slightly fewer parameters than the standard 3-channel version. That is also why it runs without complaint.
DenseNet is composed of two parts: the convolutional part and the global pooling part.
The number of trainable weights in the convolutional part doesn't depend on the input shape.
Usually a classification network would employ fully connected layers to infer the classification; in DenseNet, however, global pooling is used instead, which doesn't bring any trainable weights.
Therefore, the input shape doesn't affect the number of weights of the entire network.
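To see this in isolation, here is a small self-contained check (not from the original answers, just an illustration) showing that a convolution followed by global average pooling has the same parameter count for two different input sizes:
from keras.layers import Input, Conv2D, GlobalAveragePooling2D
from keras.models import Model

def tiny_model(height, width):
    inp = Input((height, width, 1))
    x = Conv2D(8, (3, 3))(inp)           # weights: 3*3*1*8 + 8 = 80, independent of height/width
    x = GlobalAveragePooling2D()(x)      # collapses the spatial dims, output is always (None, 8)
    return Model(inp, x)

print(tiny_model(224, 224).count_params())   # 80
print(tiny_model(512, 512).count_params())   # 80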

Neural Network regression when the output is imbalanced

I am trying to perform regression using a neural network to predict a single output from 146 input features.
I applied Standard Scaling on all inputs and output.
I monitor the Mean Absolute Error after training and it is unreasonably high on the train, validation and test sets (I am not even overfitting).
I suspect this is due to the fact that the output variable is very imbalanced (see histogram).
From the histogram it is possible to see that most of the samples are grouped around 0 but there is also another small group of samples around -5.
Histogram of the imbalanced output
This is model creation code:
from keras.layers import Input, Dense, Dropout, BatchNormalization
from keras.models import Model
from keras import optimizers

# X is the (n_samples, 146) feature matrix
inp = Input(batch_shape=(None, X.shape[1]))
layer1 = Dense(20, activation='relu')(inp)
layer1 = Dropout(0.3)(layer1)
layer1 = BatchNormalization()(layer1)
layer2 = Dense(5, activation='relu', kernel_regularizer='l2')(layer1)
layer2 = Dropout(0.3)(layer2)
layer2 = BatchNormalization()(layer2)
out_layer = Dense(1, activation='linear')(layer2)
model = Model(inputs=inp, outputs=out_layer)
model.compile(loss='mean_squared_error', optimizer=optimizers.Adam(), metrics=['mae'])
This is the model summary:
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 146) 0
_________________________________________________________________
dense_1 (Dense) (None, 20) 2940
_________________________________________________________________
dropout_1 (Dropout) (None, 20) 0
_________________________________________________________________
batch_normalization_1 (Batch (None, 20) 80
_________________________________________________________________
dense_2 (Dense) (None, 5) 105
_________________________________________________________________
dropout_2 (Dropout) (None, 5) 0
_________________________________________________________________
batch_normalization_2 (Batch (None, 5) 20
_________________________________________________________________
dense_3 (Dense) (None, 1) 6
=================================================================
Total params: 3,151
Trainable params: 3,101
Non-trainable params: 50
_________________________________________________________________
Looking at the actual model predictions, the large error mainly happens for samples with a true output value around -5 (the small group of samples).
I have tried many hyperparameter configurations, but the error is still very high.
I see many suggestions on performing neural network classification on imbalanced data but what could be done with regression?
It seems odd to me that a regression neural network is not learning this correctly. What am I doing wrong?
From your histogram, it looks as though it's rare for there to be a non-zero output. This is similar to a classification problem where we're trying to predict a rare class, in that a strong strategy in terms of the loss function is simply to guess the most common class, which in this case is your modal value of zero.
You should do some research around what people do to predict rare events or to classify inputs when some classes are rare. E.g. this discussion might be helpful: https://www.reddit.com/r/MachineLearning/comments/412wpp/predicting_rare_events_how_to_prevent_machine/
Some strategies you might try include
Removing most of the zero-output training examples so that your training data is more balanced
Creating or acquiring more non-zero training examples
Using a different machine learning algorithm (someone at the link I provided recommends boosting. I wonder if you'd get good results from using a residual neural network structure, which is in some ways similar to boosting)
Re-structuring or rescaling your data, or weighting the training samples, to put more emphasis on the rare values (see the sketch below)
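One concrete way to implement that last point in Keras (a sketch only, not from the original answer; the threshold and weight are arbitrary, and X_train/y_train are assumed) is to pass per-sample weights to fit so the rare low-valued targets contribute more to the loss:
import numpy as np

# Up-weight the rare samples, here taken to be targets below -2 after scaling
sample_weight = np.where(y_train < -2, 10.0, 1.0)

model.fit(X_train, y_train,
          sample_weight=sample_weight,
          epochs=100, batch_size=32,
          validation_split=0.2)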
It appears to me that you have a roughly normal output distribution with a very small standard deviation, in which case this should train just as well as any other distribution.

Replacing the embedding layer in a pretrained Keras model

I'm trying to replace the embedding layer in a Keras NLP model. I've trained the model for one language, but I would like to transfer it to another language for which I have comparable embeddings. I hope to achieve this by replacing the index-to-embedding mapping for the source language by the index-to-embedding mapping for the target language.
I've tried to do it like this:
from keras.layers import Embedding
from keras.models import load_model
filename = "my_model.h5"
model = load_model(filename)
new_embedding_layer = Embedding(1000, 300, weights=[my_new_embedding_matrix], trainable=False)
new_embedding_layer.build((None, None))
model.layers[0] = new_embedding_layer
When I print out the model summary, this seems to have worked: the new embedding layer has the correct number of parameters (1000*300=300,000):
_________________________________________________________________
None
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_85 (Embedding) multiple 300000
_________________________________________________________________
lstm_1 (LSTM) (None, 128) 219648
_________________________________________________________________
dense_1 (Dense) (None, 23) 2967
=================================================================
Total params: 522,615
Trainable params: 222,615
Non-trainable params: 300,000
However, when I use the new model to process new examples, nothing seems to have changed: it still accepts input sequences that have values larger than the new vocabulary size of 1000, and returns the same predictions as before.
seq = np.array([10000])
model.predict([seq])
I also notice that the output shape of the new embedding layer is "multiple" rather than (None, None, 300). Maybe this is related to the problem?
Can anyone tell me what I'm missing?
If your Embedding layers have the same shape, then you can simply load your model as you did:
from keras.models import load_model
filename = "my_model.h5"
model = load_model(filename)
Then, rather than building a new embedding layer, you can simply set the weights of the old one:
model.layers[idx_of_your_embedding_layer].set_weights([my_new_embedding_matrix])
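A quick way to confirm the swap took effect (assuming, as in the summary above, that the embedding is the first layer):
import numpy as np

# The stored weights should now match the new embedding matrix
assert np.allclose(model.layers[0].get_weights()[0], my_new_embedding_matrix)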
