How to stack layers in Keras without using Sequential()? - python

If I have a Keras layer L, and I want to stack N versions of this layer (with different weights) in a Keras model, what's the best way to do that? Please note that here N is large and controlled by a hyperparameter. If N is small then this is not a problem (we can just manually repeat a line N times). So let's assume N > 10, for example.
If the layer has only one input and one output, I can do something like:
m = Sequential()
for i in range(N):
    m.add(L)
But this is not working if my layer actually takes multiple inputs. For example, if my layer has the form z = L(x, y), and I would like my model to do:
x_1 = L(x_0, y)
x_2 = L(x_1, y)
...
x_N = L(x_N-1, y)
Then Sequential wouldn't do the job. I think I can subclass a keras model, but I don't know what's the cleanest way to put N layers into the class. I can use a list, for example:
class MyModel(Model):
    def __init__(self):
        super(MyModel, self).__init__()
        self.layers = []
        for i in range(N):
            self.layers.append(L)

    def call(self, inputs):
        x = inputs[0]
        y = inputs[1]
        for i in range(N):
            x = self.layers[i](x, y)
        return x
But this is not ideal, as Keras won't recognize these layers (it does not seem to treat a list of layers as "checkpointable"). For example, MyModel.variables would be empty, and MyModel.save() won't save anything.
I also tried to define the model using the functional API, but it won't work in my case either. In fact, if we do
def MyModel():
    input = Input(shape=...)
    output = SomeLayer(input)
    return Model(inputs=input, outputs=output)
It won't run if SomeLayer itself is a custom model (it raises NotImplementedError).
Any suggestions?
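A minimal sketch of the list-based subclass described above, assuming a recent TensorFlow 2.x release, where (as far as I know) a list of layers assigned to a model attribute is tracked automatically. PairLayer is a hypothetical stand-in for the two-input layer z = L(x, y), and the attribute is called blocks rather than layers, because layers is already a property of the Keras Model class. Each list entry is a new layer instance, so every step has its own weights.

```python
import tensorflow as tf


class PairLayer(tf.keras.layers.Layer):
    """Hypothetical two-input layer z = L(x, y): Dense applied to concat(x, y)."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.dense = tf.keras.layers.Dense(units, activation="relu")

    def call(self, x, y):
        return self.dense(tf.concat([x, y], axis=-1))


class StackedModel(tf.keras.Model):
    """Applies x_{i+1} = L_i(x_i, y) for n_layers separately weighted layers."""

    def __init__(self, n_layers, units):
        super().__init__()
        # A fresh PairLayer per step, so no weights are shared between steps.
        self.blocks = [PairLayer(units) for _ in range(n_layers)]

    def call(self, inputs):
        x, y = inputs
        for block in self.blocks:
            x = block(x, y)
        return x


model = StackedModel(n_layers=12, units=8)
out = model((tf.zeros((2, 8)), tf.zeros((2, 4))))
print(len(model.trainable_variables))  # one kernel and one bias per block
```

If the variables list comes back empty in an older version, that would point to the list not being tracked there; this sketch does not cover that case.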

Not sure if I've got your question right, but I guess that you could use the functional API and concatenate or add layers as it is shown in Keras applications, like, ResNet50 or InceptionV3 to build "non-sequential" networks.
UPDATE
In one of my projects, I was using something like this. I had a custom layer (it was not implemented in my version of Keras, so I've just manually "backported" the code into my notebook).
class LeakyReLU(Layer):
    """Leaky version of a Rectified Linear Unit backported from newer Keras
    version."""

    def __init__(self, alpha=0.3, **kwargs):
        super(LeakyReLU, self).__init__(**kwargs)
        self.supports_masking = True
        self.alpha = K.cast_to_floatx(alpha)

    def call(self, inputs):
        return tf.maximum(self.alpha * inputs, inputs)

    def get_config(self):
        config = {'alpha': float(self.alpha)}
        base_config = super(LeakyReLU, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))

    def compute_output_shape(self, input_shape):
        return input_shape
Then, the model:
def create_model(input_shape, output_size, alpha=0.05, reg=0.001):
    inputs = Input(shape=input_shape)
    x = Conv2D(16, (3, 3), padding='valid', strides=(1, 1),
               kernel_regularizer=l2(reg), kernel_constraint=maxnorm(3),
               activation=None)(inputs)
    x = BatchNormalization()(x)
    x = LeakyReLU(alpha=alpha)(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)
    x = Conv2D(32, (3, 3), padding='valid', strides=(1, 1),
               kernel_regularizer=l2(reg), kernel_constraint=maxnorm(3),
               activation=None)(x)
    x = BatchNormalization()(x)
    x = LeakyReLU(alpha=alpha)(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)
    x = Conv2D(64, (3, 3), padding='valid', strides=(1, 1),
               kernel_regularizer=l2(reg), kernel_constraint=maxnorm(3),
               activation=None)(x)
    x = BatchNormalization()(x)
    x = LeakyReLU(alpha=alpha)(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)
    x = Conv2D(128, (3, 3), padding='valid', strides=(1, 1),
               kernel_regularizer=l2(reg), kernel_constraint=maxnorm(3),
               activation=None)(x)
    x = BatchNormalization()(x)
    x = LeakyReLU(alpha=alpha)(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)
    x = Conv2D(256, (3, 3), padding='valid', strides=(1, 1),
               kernel_regularizer=l2(reg), kernel_constraint=maxnorm(3),
               activation=None)(x)
    x = BatchNormalization()(x)
    x = LeakyReLU(alpha=alpha)(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)
    x = Flatten()(x)
    x = Dense(500, activation='relu', kernel_regularizer=l2(reg))(x)
    x = Dense(500, activation='relu', kernel_regularizer=l2(reg))(x)
    x = Dense(500, activation='relu', kernel_regularizer=l2(reg))(x)
    x = Dense(500, activation='relu', kernel_regularizer=l2(reg))(x)
    x = Dense(500, activation='relu', kernel_regularizer=l2(reg))(x)
    x = Dense(500, activation='relu', kernel_regularizer=l2(reg))(x)
    x = Dense(output_size, activation='linear', kernel_regularizer=l2(reg))(x)
    model = Model(inputs=inputs, outputs=x)
    return model
Finally, a custom metric:
def root_mean_squared_error(y_true, y_pred):
    return K.sqrt(K.mean(K.square(y_pred - y_true), axis=-1))
I was using the following snippet to create and compile the model:
model = create_model(input_shape=X.shape[1:], output_size=y.shape[1])
model.compile(loss=root_mean_squared_error, optimizer='adamax')
As usual, I was using a checkpoint callback to save the model. To load the model, you need to pass the custom layer classes and metrics to the load_model function as well:
def load_custom_model(path):
    return load_model(path, custom_objects={
        'LeakyReLU': LeakyReLU,
        'root_mean_squared_error': root_mean_squared_error
    })
Does it help?

After having researched this to a great extent: I'm certain that there is no built-in, universal way to do this in Tensorflow/Keras.
There are, however, still ways to achieve the same goals but in a different way. The problem is that there's no universal solution to this in Tensorflow with any Keras layer, and so you'll have to approach it on a case-by-case basis.
So for example, if what you wanted to do is stack a bunch of Dense layers and then have some dimension of your input that would correspond to each one (simple example), what you would instead want to do is construct a custom Dense layer and add extra dimensions to its weights and biases, and then do the appropriate operations given some extra dimension in your input.
So ultimately the same (desired) operations would be performed here in the way that you want them to be, each input along some dimension would be put through a separate Dense layer with separate weights/biases: but it would be done concurrently, without any python looping. In essence, you would be reducing the size and complexity of the graph and performing the same operations in a more concurrent way; this ought to be much more efficient.
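A minimal NumPy sketch of that idea, not a Keras layer: the weights of N Dense layers are stacked along a new leading dimension and applied in one einsum call, then compared against an explicit Python loop. All sizes and array names here are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_stack, batch, d_in, d_out = 12, 4, 5, 3  # hypothetical sizes

# One weight matrix and one bias vector per stacked Dense layer.
weights = rng.normal(size=(n_stack, d_in, d_out))
biases = rng.normal(size=(n_stack, d_out))
# The inputs carry a matching "stacking" dimension.
inputs = rng.normal(size=(batch, n_stack, d_in))


def stacked_dense(x, w, b):
    """Apply all n_stack Dense layers at once (no activation)."""
    return np.einsum("bni,nio->bno", x, w) + b


def looped_dense(x, w, b):
    """Reference version: one matmul per layer in a Python loop."""
    outs = [x[:, n, :] @ w[n] + b[n] for n in range(w.shape[0])]
    return np.stack(outs, axis=1)


batched = stacked_dense(inputs, weights, biases)
looped = looped_dense(inputs, weights, biases)
print(np.allclose(batched, looped))
```

In a custom Keras layer the same contraction could presumably be written with tf.einsum, with the stacked kernel and bias created in build().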
The strategy outlined here generalises to any layer/input type. It's not great news, in that it would be of very high value to us (users) if there were some standardised Keras-friendly way of stacking a bunch of layers and then passing input to them in a more concurrent way that didn't involve python looping but rather concatenating the internal parameters into a new dimension and managing alignment between an extra 'stacking' dimension in both the inputs and parameters.
Like, in the same way we have tf.keras.Sequential we could also benefit from something like tf.keras.Parallel as a universal solution to this common ML need.

If I understand your question correctly, you can solve this problem simply by using a for-loop when building the model. I'm not sure if you need any special layer so I will assume you only use Dense here:
def MyModel():
    input = Input(shape=...)
    x = input
    for i in range(N):
        x = Dense(number_of_nodes, name='dense_%i' % i)(x)
        # Or some custom layers
    output = Dense(number_of_output)(x)
    return Model(inputs=input, outputs=output)

Related

Tensorflow can't append batches together after doing the first epoch

I am running into problems with my code after I removed the loss function from the compile step (setting loss=None) and added one through the add_loss method, with the intention of adding another loss function later. I can call fit and it trains for one epoch, but then I get this error:
ValueError: operands could not be broadcast together with shapes (128,) (117,) (128,)
My batch size is 128. It looks like 117 is somehow dependent on the number of examples that I am using. When I vary the number of examples, I get different numbers from 117. They are all my number of examples mod my batch size. I am at a loss about how to fix this issue. I am using tf.data.TFRecordDataset as input.
I have the following simplified model:
class Autoencoder(Model):
    def __init__(self):
        super(Autoencoder, self).__init__()
        encoder_input = layers.Input(shape=INPUT_SHAPE, name='encoder_input')
        x = encoder_input
        x = layers.Conv2D(64, (3, 3), activation='relu', padding='same', strides=2)(x)
        x = layers.BatchNormalization()(x)
        x = layers.Conv2D(32, (3, 3), activation='relu', padding='same', strides=2)(x)
        x = layers.BatchNormalization()(x)
        x = layers.Flatten()(x)
        encoded = layers.Dense(LATENT_DIM, name='encoded')(x)
        self.encoder = Model(encoder_input, outputs=[encoded])
        self.decoder = tf.keras.Sequential([
            layers.Input(shape=LATENT_DIM),
            layers.Dense(32 * 32 * 32),
            layers.Reshape((32, 32, 32)),
            layers.Conv2DTranspose(32, kernel_size=3, strides=2, activation='relu', padding='same'),
            layers.Conv2DTranspose(64, kernel_size=3, strides=2, activation='relu', padding='same'),
            layers.Conv2D(3, kernel_size=(3, 3), activation='sigmoid', padding='same')])

    def call(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        # Loss function. Has to be here because I intend to add another, more layer-interdependent, loss function.
        r_loss = tf.math.reduce_sum(tf.math.square(x - decoded), axis=[1, 2, 3])
        self.add_loss(r_loss)
        return decoded

def read_tfrecord(example):
    example = tf.io.parse_single_example(example, CELEB_A_FORMAT)
    image = decode_image(example['image'])
    return image, image

def load_dataset(filenames, func):
    dataset = tf.data.TFRecordDataset(
        filenames
    )
    dataset = dataset.map(partial(func), num_parallel_calls=tf.data.AUTOTUNE)
    return dataset

def train_autoencoder():
    filenames_train = glob.glob(TRAIN_PATH)
    train_dataset_x_x = load_dataset(filenames_train[:4], func=read_tfrecord)
    autoencoder = Autoencoder()
    # The loss function used to be defined here and everything worked fine before.
    def r_loss(y_true, y_pred):
        return tf.math.reduce_sum(tf.math.square(y_true - y_pred), axis=[1, 2, 3])
    optimizer = tf.keras.optimizers.Adam(1e-4)
    autoencoder.compile(optimizer=optimizer, loss=None)
    autoencoder.fit(train_dataset_x_x.batch(AUTOENCODER_BATCH_SIZE),
                    epochs=AUTOENCODER_NUM_EPOCHS,
                    shuffle=True)
If you only want to get rid of the error and don't care about the last "remainder" batch of your dataset, you can use the keyword argument drop_remainder=True inside of train_dataset_x_x.batch(), that way all of your batches will be the same size.
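To see where the 117 comes from, here is a small pure-Python stand-in for how batching splits a dataset. It only mimics the batch sizes that .batch() would produce and does not use tf.data. The example count of 1013 is a made-up number, chosen because 1013 mod 128 = 117.

```python
def batch_sizes(num_examples, batch_size, drop_remainder=False):
    """Sizes of the batches obtained when splitting num_examples items."""
    full_batches, remainder = divmod(num_examples, batch_size)
    sizes = [batch_size] * full_batches
    if remainder and not drop_remainder:
        sizes.append(remainder)  # the short final batch
    return sizes


NUM_EXAMPLES = 1013  # hypothetical dataset size
BATCH_SIZE = 128

with_remainder = batch_sizes(NUM_EXAMPLES, BATCH_SIZE)
without_remainder = batch_sizes(NUM_EXAMPLES, BATCH_SIZE, drop_remainder=True)

print(with_remainder[-1])          # size of the last, partial batch
print(set(without_remainder))      # every remaining batch has the same size
```

A per-example loss vector of length 117 cannot be combined with one of length 128, which matches the shapes in the error message.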
FYI, it's usually better practice to batch your dataset outside of the function call for fit:
data = data.batch(32)
model.fit(data)
The loss function cannot be set in the call method.
The call method is intended to make a forward pass, not to compute the loss.
You need to add the loss function in the compile method or after it.

Convolutional NN pretrained on ImageNet dataset (InceptionV3) raises a ValueError in TensorFlow

I tried to build a convolutional NN that is pretrained on the ImageNet dataset. To do so, I used InceptionV3 as the base model for my convolutional NN, but it raised a ValueError as follows:
ValueError Traceback (most recent call last)
<ipython-input-13-b52791a606ee> in <module>()
6 x = MaxPooling2D(pool_size=(2,2))(x)
7 x = Flatten()(x)
----> 8 x = Dense(2048)(x)
9 x = BatchNormalization()(x)
10 x = Activation('relu')(x)
3 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/layers/core.py in build(self, input_shape)
1166 last_dim = tensor_shape.dimension_value(input_shape[-1])
1167 if last_dim is None:
-> 1168 raise ValueError('The last dimension of the inputs to `Dense` '
1169 'should be defined. Found `None`.')
1170 self.input_spec = InputSpec(min_ndim=2, axes={-1: last_dim})
ValueError: The last dimension of the inputs to `Dense` should be defined. Found `None`.
From the error message, I can infer that the last dimension of the inputs to Dense is undefined. I am not sure how to fix this. Can anyone point me to a quick way to debug it? Any thoughts?
My attempt
Here is my current attempt:
from tensorflow.keras import models
from tensorflow.keras.layers import Conv2D, MaxPooling2D, BatchNormalization
from tensorflow.keras.layers import Activation, Dropout, Flatten, Dense
from tensorflow.keras.applications.inception_v3 import InceptionV3
base_model = InceptionV3(weights='imagenet', include_top=False)
x = base_model.input
x = Conv2D(32, (3, 3), input_shape=x.shape[1:])(x)
x = BatchNormalization(axis=-1)(x)
x = Activation('relu')(x)
x = MaxPooling2D(pool_size=(2,2))(x)
x = Flatten()(x)
x = Dense(2048)(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = Dropout(0.2)(x)
x = Dense(10)(x)
x = Activation('softmax')(x)
outputs = x
model = tf.keras.Model(inputs=x, outputs=outputs, name="cifar10_model")
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
I want to use this for CIFAR-10 multi-class image classification. My goal is to have the convolutional NN pretrained on the ImageNet dataset for the sake of weight initialization. I am not sure how to achieve this correctly. Can anyone point out how to make this happen in TensorFlow? Any thoughts?
Extended thoughts
If we can make the above attempt error-free, can we also add a residual connection to it? Is there any way to do this? Thanks
There seem to be a few problems with your code. First, you are connecting layers to the input of the Inception model, whereas you want to connect them to the output of the network. The first thing you need to do is change
x = base_model.input
to
x = base_model.output
Next, by printing out the output of each layer, we see that the shape you are feeding into each subsequent layer after the base model is (None, None, None, 3). This is because you haven't defined an input shape for your model. To fix this, simply add the input_shape argument to your constructor.
base_model = InceptionV3(weights='imagenet', include_top=False, input_shape=shape)
Finally, when you construct your new model, the inputs to your model should be the inputs to the Inception network. So you need to change
model = tf.keras.Model(inputs=x, outputs=outputs, name="cifar10_model")
to
model = tf.keras.Model(inputs=base_model.input, outputs=outputs, name="cifar10_model")
Finally, to add a residual connection, you can either define a custom layer, or keep a reference to the identity tensor so that you can pass it into an Add() layer.
FULLY WORKING
from tensorflow.keras.layers import Add

def block(x, filters, stride=1):
    # Shortcut branch: 1x1 convolution so the shapes match for Add()
    identity = x
    identity = Conv2D(4 * filters, 1, strides=stride, padding='same')(identity)
    identity = BatchNormalization()(identity)
    # Main branch
    x = Conv2D(4 * filters, (3, 3), strides=stride, padding='same')(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Add()([identity, x])
    x = Activation('relu')(x)
    return x
base_model = InceptionV3(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
x = block(base_model.output, 32, 1)
x = BatchNormalization(axis=-1)(x)
x = Activation('relu')(x)
x = MaxPooling2D(pool_size=(2,2))(x)
x = Flatten()(x)
x = Dense(2048)(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = Dropout(0.2)(x)
x = Dense(10)(x)
x = Activation('softmax')(x)
outputs = x
model = models.Model(base_model.input, outputs)
The pretrained model takes the input shape, not the first Conv2D.
Try this:
base_model = InceptionV3(weights='imagenet', include_top=False,input_shape=x.shape[1:])
x = base_model.input
x = Conv2D(32, (3, 3))(x)
...
EDIT: if you want residual connection, maybe you have to implement the layer on your own.

How to convert a tensorflow model to a pytorch model?

I'm new to pytorch. Here's an architecture of a tensorflow model and I'd like to convert it into a pytorch model.
I have written most of the code but am confused about a few places.
1) In TensorFlow, the Conv2D function takes the number of filters as an input. However, in PyTorch, the function takes the number of input channels and output channels as inputs. So how do I find the equivalent number of input channels and output channels, given the number of filters?
2) In TensorFlow, the dense layer has a parameter for the number of nodes. However, in PyTorch, the same layer has two different inputs (the size of the input features and the size of the output features). How do I determine them based on the number of nodes?
Here's the tensorflow code.
from keras.utils import to_categorical
from keras.models import Sequential, load_model
from keras.layers import Conv2D, MaxPool2D, Dense, Flatten, Dropout
model = Sequential()
model.add(Conv2D(filters=32, kernel_size=(5,5), activation='relu', input_shape=X_train.shape[1:]))
model.add(Conv2D(filters=32, kernel_size=(5,5), activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(rate=0.25))
model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))
model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(rate=0.25))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(rate=0.5))
model.add(Dense(43, activation='softmax'))
Here's my code:
import torch
import torch.nn as nn
import torch.nn.functional as F

# The network should inherit from nn.Module
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Define 2D convolution layers
        # 3: input channels, 32: output channels, 5: kernel size, 1: stride
        self.conv1 = nn.Conv2d(3, 32, 5, 1)  # The size of input channel is 3 because all images are coloured
        self.conv2 = nn.Conv2d(32, 64, 5, 1)
        self.conv3 = nn.Conv2d(64, 128, 3, 1)
        self.conv4 = nn.Conv2d(128, 256, 3, 1)
        # It will 'filter' out some of the input by the probability (assign zero)
        self.dropout1 = nn.Dropout2d(0.25)
        self.dropout2 = nn.Dropout2d(0.5)
        # Fully connected layer: input size, output size
        self.fc1 = nn.Linear(36864, 128)
        self.fc2 = nn.Linear(128, 10)

    # forward() links all layers together
    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = self.conv3(x)
        x = F.relu(x)
        x = self.conv4(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output
Thanks in advance!
1) In PyTorch, we take input channels and output channels as inputs. In your first layer, the input channels will be the number of color channels in your image. After that, it is always the same as the output channels of your previous layer (output channels are specified by the filters parameter in TensorFlow).
2) PyTorch is slightly annoying in that, when flattening your conv outputs, you have to calculate the shape yourself. You can either use the formula Out = (W - F + 2P)/S + 1 (where W is the input size, F the kernel size, P the padding and S the stride), or write a shape-calculating function that passes a dummy image through the conv part of the network and reads off the resulting shape. This value will be your input size argument; the output size argument will just be the number of nodes you want in your next fully connected layer.
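As a worked example of the formula, the sketch below walks a square input through the conv and pooling layers of the Keras model in the question (no padding, stride 1 for the convolutions, 2x2 pooling with stride 2). The 30x30 input size is an assumption, since the question does not state X_train.shape; the helper names are made up for illustration.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Out = (W - F + 2P) / S + 1, rounded down as conv/pool layers do."""
    return (size - kernel + 2 * padding) // stride + 1


def flattened_features(image_size):
    """Number of features after Flatten() for the Keras model in the question."""
    size = conv_out(image_size, kernel=5)      # Conv2D(32, 5x5)
    size = conv_out(size, kernel=5)            # Conv2D(32, 5x5)
    size = conv_out(size, kernel=2, stride=2)  # MaxPool2D(2x2)
    size = conv_out(size, kernel=3)            # Conv2D(64, 3x3)
    size = conv_out(size, kernel=3)            # Conv2D(64, 3x3)
    size = conv_out(size, kernel=2, stride=2)  # MaxPool2D(2x2)
    channels = 64                              # filters of the last conv layer
    return channels * size * size


IMAGE_SIZE = 30  # assumed input height and width
in_features = flattened_features(IMAGE_SIZE)
print(in_features)  # 576
```

Under that assumption, the first fully connected layer would be nn.Linear(576, 256), followed by nn.Linear(256, 43) to match the Dense(256) and Dense(43) layers.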

How to remove the FC layer from a fine-tuned Keras model

So I have finetuned a Resnet50 model with the following architecture:
model = models.Sequential()
model.add(resnet)
model.add(Conv2D(512, (3, 3), activation='relu'))
model.add(Conv2D(512, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
model.add(Flatten())
model.add(layers.Dense(2048, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(4096, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(736, activation='softmax')) # Output layer
So now I have a saved model (.h5) which I want to use as input into another model. But I don't want the last layer. I would normally do it like this with a base resnet50 model:
def base_model():
    resnet = resnet50.ResNet50(weights="imagenet", include_top=False)
    x = resnet.output
    x = GlobalAveragePooling2D()(x)
    x = Dense(4096, activation='relu')(x)
    x = Dropout(0.6)(x)
    x = Dense(4096, activation='relu')(x)
    x = Dropout(0.6)(x)
    x = Lambda(lambda x_: K.l2_normalize(x_, axis=1))(x)
    return Model(inputs=resnet.input, outputs=x)
but that does not work for the model as it gives me an error. I am trying it like this right now but still, it does not work.
def base_model():
    resnet = load_model("../Models/fine_tuned_model/fine_tuned_resnet50.h5")
    x = resnet.layers.pop()
    #resnet = resnet50.ResNet50(weights="imagenet", include_top=False)
    #x = resnet.output
    #x = GlobalAveragePooling2D()(x)
    x = Dense(4096, activation='relu')(x)
    x = Dropout(0.6)(x)
    x = Dense(4096, activation='relu')(x)
    x = Dropout(0.6)(x)
    x = Lambda(lambda x_: K.l2_normalize(x_, axis=1))(x)
    return Model(inputs=resnet.input, outputs=x)

enhanced_resent = base_model()
This is the error that it gives me.
Layer dense_3 was called with an input that isn't a symbolic tensor. Received type: <class 'keras.layers.core.Dense'>. Full input: [<keras.layers.core.Dense object at 0x000001C61E68E2E8>]. All inputs to the layer should be tensors.
I don't know if I can do this or not.
I have finally figured it out after quitting for an hour. So this is how you will do it.
def base_model():
    resnet = load_model("../Models/fine_tuned_model/42-0.85.h5")
    # Take the output tensor of the second-to-last layer, not the layer object
    x = resnet.layers[-2].output
    x = Dense(4096, activation='relu', name="FC1")(x)
    x = Dropout(0.6, name="FCDrop1")(x)
    x = Dense(4096, activation='relu', name="FC2")(x)
    x = Dropout(0.6, name="FCDrop2")(x)
    x = Lambda(lambda x_: K.l2_normalize(x_, axis=1))(x)
    return Model(inputs=resnet.input, outputs=x)

enhanced_resent = base_model()
And this works perfectly. I hope this helps out someone else as I have never seen this done in any tutorial before.
x = resnet.layers[-2].output
This will get the layer you want, but you need to know the index of that layer. Here, -2 is the second-to-last layer, which I wanted because I was after the extracted features, not the final classification. You can find the index by calling
model.summary()
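A small self-contained sketch of the same pattern, using a tiny stand-in model instead of the fine-tuned ResNet50 (all layer names and sizes here are hypothetical): list the layers with their indices, take the output tensor of layers[-2], and build a new model on top of it.

```python
from tensorflow.keras import layers, Model

# Tiny stand-in for the trained model; the real one would come from load_model().
inputs = layers.Input(shape=(16,))
x = layers.Dense(32, activation="relu", name="features")(inputs)
x = layers.Dropout(0.5, name="drop")(x)
outputs = layers.Dense(5, activation="softmax", name="classifier")(x)
trained = Model(inputs, outputs)

# Print each layer with its index to decide where to cut.
for index, layer in enumerate(trained.layers):
    print(index, layer.name)

# Cut before the classifier: use the *output tensor* of the second-to-last layer.
head_input = trained.layers[-2].output
new_out = layers.Dense(8, activation="relu", name="new_fc")(head_input)
extended = Model(inputs=trained.input, outputs=new_out)

extended_names = [layer.name for layer in extended.layers]
print(extended_names)
```

The original classifier layer is simply not part of the new graph, so nothing has to be popped from the layer list.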

Keras, Tensorflow: Systematic Offset in Predictions

I am working on a regression CNN using Keras/Tensorflow. I have a multi-output feed-forward model that I have trained up with some success. The model takes in a 201x201 grayscale image and returns two regression targets.
Here is an example of an input/target pair:
[input image omitted] is associated with (z=562.59, a=4.53)
There exists an analytical solution for this problem, so I know it's solvable.
Here is the model architecture:
model_input = keras.Input(shape=input_shape, name='image')
x = model_input
x = Conv2D(32, kernel_size=(3, 3), activation='relu')(x)
x = MaxPooling2D(pool_size = (2,2))(x)
x = Conv2D(32, kernel_size=(3, 3), activation='relu')(x)
x = MaxPooling2D(pool_size = (2,2))(x)
x = Conv2D(32, kernel_size=(3, 3), activation='relu')(x)
x = MaxPooling2D(pool_size = (2,2))(x)
x = Conv2D(16, kernel_size=(3, 3), activation='relu')(x)
x = MaxPooling2D(pool_size = (4,4))(x)
x = Flatten()(x)
model_outputs = list()
out_names = ['z', 'a']
for i in range(2):
    out_name = out_names[i]
    local_output = x
    local_output = Dense(10, activation='relu')(local_output)
    local_output = Dropout(0.2)(local_output)
    local_output = Dense(units=1, activation='linear', name=out_name)(local_output)
    model_outputs.append(local_output)
model = Model(model_input, model_outputs)
model.compile(loss = 'mean_squared_error', optimizer='adam', loss_weights = [1,1])
My targets are on different scales, so I normalized one of them (name 'a') to the range [0,1] for training. Here is how I rescale:
def rescale(min, max, list):
    scalar = 1./(max-min)
    list = (list-min)*scalar
    return list
Where min,max for each parameter are known a priori and are constant.
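For reference, a minimal sketch of the min-max scaling above together with the inverse mapping that would be needed to bring predictions back to the original units. The bounds A_MIN and A_MAX and the sample values are hypothetical, and the helper names are not from the original code.

```python
import math


def rescale_values(lower, upper, values):
    """Map values from [lower, upper] to [0, 1], as in rescale() above."""
    scale = 1.0 / (upper - lower)
    return [(v - lower) * scale for v in values]


def undo_rescale(lower, upper, scaled):
    """Inverse mapping from [0, 1] back to [lower, upper]."""
    return [s * (upper - lower) + lower for s in scaled]


A_MIN, A_MAX = 0.5, 5.0          # hypothetical a-priori bounds for 'a'
a_values = [0.5, 4.53, 5.0]      # hypothetical targets

a_scaled = rescale_values(A_MIN, A_MAX, a_values)
a_restored = undo_rescale(A_MIN, A_MAX, a_scaled)

print(a_scaled)
print(all(math.isclose(r, v) for r, v in zip(a_restored, a_values)))
```

Checking that the round trip reproduces the original values is a cheap way to rule out the scaling code itself as the source of an offset.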
Here is how I trained:
model.fit({'image': x_train},
          {'z': z_train, 'a': a_train},
          batch_size=32,
          epochs=20,
          verbose=1,
          validation_data=({'image': x_test},
                           {'z': z_test, 'a': a_test}))
When I predict for 'a', I get a fairly good accuracy, but with an offset:
This is a fairly easy thing to fix, I just apply a linear fit to the predictions and invert it to rescale:
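A rough sketch of that correction on synthetic data: fit a line pred = slope * truth + intercept by least squares, then invert it to map predictions back onto the truth. The slope 0.8 and offset 0.05 are made-up values used only to generate the example, and fit_line is a hypothetical helper, not the code that was actually used.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit ys ~ slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic stand-ins: ground truth in [0, 1] and predictions with an offset.
truth = [i / 10 for i in range(11)]
predictions = [0.8 * t + 0.05 for t in truth]

slope, intercept = fit_line(truth, predictions)
# Invert the fitted line to remove the systematic offset.
corrected = [(p - intercept) / slope for p in predictions]

print(round(slope, 6), round(intercept, 6))
```

This only removes a linear bias after the fact; it does not explain where the offset comes from.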
But I can't think of a reason why this would be happening in the first place. I've used this same model architecture for other problems, and I get that same offset again. Has anyone seen this sort of thing before?
EDIT: This offset occurs in multiple different models of mine, which each predict different parameters but are rescaled/preprocessed in the same way. It happens regardless of how many epochs I train for, with more training resulting in predictions hugging the green line (in the first graph) more closely.
As a temporary work-around, I trained a single-node model to take the input as the original model's prediction and the output as the ground truth. This trained up nicely, and corrects the offset. What's strange though, is that I can apply this rescale model to ANY of the models with this issue, and it corrects the offset equally well.
Essentially: the offset has the same weight for multiple different models, which predict completely different parameters. This makes me think there is something to do with the activation or regularization.
