Using Python multiprocessing for sklearn NN

I am using the dev version of the sklearn package, which includes a neural network implementation.
My task is to train 4 NNs with different input data and then average the predictions:
X_median = preprocessing.scale(data_median)
X_min = preprocessing.scale(data_min)
X_max = preprocessing.scale(data_max)
X_mean = preprocessing.scale(data_mean)
I create the neural networks like this:
NN1 = MLPClassifier(hidden_layer_sizes = (50), activation = 'logistic', algorithm='adam', alpha= 0 , max_iter = 40, batch_size = 10, learning_rate = 'adaptive', shuffle = True, random_state=1)
NN2 = MLPClassifier(hidden_layer_sizes = (50), activation = 'logistic', algorithm='adam', alpha= 0 , max_iter = 40, batch_size = 10, learning_rate = 'adaptive', shuffle = True, random_state=1)
NN3 = MLPClassifier(hidden_layer_sizes = (50), activation = 'logistic', algorithm='adam', alpha= 0 , max_iter = 40, batch_size = 10, learning_rate = 'adaptive', shuffle = True, random_state=1)
NN4 = MLPClassifier(hidden_layer_sizes = (50), activation = 'logistic', algorithm='adam', alpha= 0 , max_iter = 40, batch_size = 10, learning_rate = 'adaptive', shuffle = True, random_state=1)
(standard sklearn function)
and I want to train them on the datasets described above.
Without using a pool, my code would look like this:
NN1.fit(X_mean,train_y)
NN2.fit(X_median,train_y)
NN3.fit(X_min,train_y)
NN4.fit(X_max,train_y)
Of course, since all 4 training runs are independent, I want to run them in parallel, and I assume I should use a pool for this. However, I do not completely understand how the computation is performed. I would assume to write something like this:
pool = Pool()
pool.apply_async(NN1.fit, args = (X_mean, train_y))
However, this does not produce any results. I can even call it like this (passing only one argument) and the program finishes without any errors!
pool.apply_async(NN1.fit, args = (X_mean,))
What will be the correct way to perform such computations?
Can someone recommend a good resource for understanding the usage of Python multiprocessing?

Finally I made it work :)
I based my solution on this answer. First, create two helper functions:
1)
def Myfunc(MyNN, X, train_y):
    MyNN.fit(X, train_y)
    return MyNN
This one just wraps the desired method in a module-level function that can be passed to the pool methods.
2)
def test_star(a_b):
    return Myfunc(*a_b)
This is the key part: a helper function that takes a single argument and unpacks it into the number of arguments Myfunc needs.
Then just create
mylist = [(NN_mean,X_mean, train_y), (NN_median,X_median, train_y)]
and execute
NN_mean, NN_median = pool.map(test_star, mylist)
From my point of view this solution is super ugly, but it works. I hope someone can post a more elegant one :).
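On Python 3 the unpacking helper is unnecessary, because Pool.starmap does the unpacking itself. This also hints at why the original apply_async calls appeared to do nothing: apply_async returns an AsyncResult, and neither the result nor any worker exception surfaces until .get() is called. A minimal sketch along these lines (hedged, not tested against the exact setup above; it relies on fit returning the estimator, so the fitted copies come back from the worker processes):

from multiprocessing import Pool

if __name__ == '__main__':
    with Pool() as pool:
        jobs = [(NN1, X_mean, train_y), (NN2, X_median, train_y),
                (NN3, X_min, train_y), (NN4, X_max, train_y)]
        # starmap unpacks each tuple into Myfunc's arguments and
        # returns the fitted estimators in the same order
        NN_mean, NN_median, NN_min, NN_max = pool.starmap(Myfunc, jobs)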

Related

ValueError: No gradients provided for any variable: ['tf_deberta_v2_for_sequence_classification_1/deberta/embeddings/word_embeddings

I am trying to fine-tune a transformer model for text classification, but I am having trouble training it. I have tried many things, but none of them seem to work. I have also tried solutions from other questions, but they didn't work either. I am using the 'microsoft/deberta-v3-base' model for fine-tuning. Here's my code:
train_dataset = Dataset.from_pandas(df_tr[['text', 'label']]).class_encode_column("label")
val_dataset = Dataset.from_pandas(df_tes[['text', 'label']]).class_encode_column("label")
train_tok_dataset = train_dataset.map(tokenizer_func, batched=True, remove_columns=('text'))
val_tok_dataset = val_dataset.map(tokenizer_func, batched=True, remove_columns=('text'))
from transformers import TFAutoModelForSequenceClassification
model = TFAutoModelForSequenceClassification.from_pretrained(config.model_name, num_labels=3)
transformer_model = TFAutoModelForSequenceClassification.from_pretrained(config.model_name, output_hidden_states=True)
input_ids = tf.keras.Input(shape=(config.max_len, ),dtype='int32')
attention_mask = tf.keras.Input(shape=(config.max_len, ), dtype='int32')
transformer = transformer_model([input_ids, attention_mask])
hidden_states = transformer[1] # get output_hidden_states
#print(hidden_states)
hidden_states_size = 4 # count of the last states
hiddes_states_ind = list(range(-hidden_states_size, 0, 1))
selected_hiddes_states = tf.keras.layers.concatenate(tuple([hidden_states[i] for i in hiddes_states_ind]))
# Now we can use selected_hiddes_states as we want
output = tf.keras.layers.Dense(128, activation='relu')(selected_hiddes_states)
output=tf.keras.layers.Flatten()(output)
output = tf.keras.layers.Dense(3, activation='softmax')(output)
model = tf.keras.models.Model(inputs = [input_ids, attention_mask], outputs = output)
from transformers import create_optimizer
import tensorflow as tf
batch_size = 8
num_epochs = config.epochs
#batches_per_epoch = len(tokenized_tweets["train"]) // batch_size
total_train_steps = int(num_steps * num_epochs)
optimizer, schedule = create_optimizer(init_lr=2e-5, num_warmup_steps=0, num_train_steps=num_steps/2)
model.compile(optimizer=optimizer)
with tf.device('GPU:0'):
    model.fit(x=[np.array(train_tok_dataset["input_ids"]), np.array(train_tok_dataset["attention_mask"])],
              y=tf.keras.utils.to_categorical(y_train, num_classes=3),
              validation_data=([np.array(val_tok_dataset["input_ids"]), np.array(val_tok_dataset["attention_mask"])],
                               tf.keras.utils.to_categorical(y_test, num_classes=3)),
              epochs=config.epochs, class_weight={0: 0.57, 1: 0.18, 2: 0.39})
It seems like a small issue, but I am new to tensorflow and transformers so I couldn't sort it out myself.
I would say it's probably because you are not passing a loss to compile, so no gradients can be computed with respect to it:
model.compile(optimizer=optimizer)
^^^^^^^^^^^^^^^^^^^^---- no "loss = tf.keras.losses...
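As a hedged sketch of the fix: since the targets are one-hot encoded with to_categorical, categorical cross-entropy would be a natural choice (the metric is optional and only an example):

model.compile(
    optimizer=optimizer,
    loss=tf.keras.losses.CategoricalCrossentropy(),  # gives the optimizer something to differentiate
    metrics=['accuracy'],
)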
Also check that you are not missing the = after validation_data, i.e. that it is passed as a keyword argument:
model.fit(
    x=[np.array(...), np.array(...)],
    y=tf.keras.utils.to_categorical(...),
    validation_data=([np.array(...), np.array(...)], tf.keras.utils.to_categorical(...)),
    ...
)

lightGBM predicts same value

I have a problem with LightGBM. When I call
lgb.train(.......)
it finishes in less than a millisecond (for a dataset of shape (10000, 25)),
and when I call predict, all the output values are the same.
train = pd.read_csv('data/train.csv', dtype = dtypes)
test = pd.read_csv('data/test.csv')
test.head()
X = train.iloc[:10000, 3:-1].values
y = train.iloc[:10000, -1].values
sc = StandardScaler()
X = sc.fit_transform(X)
#pca = PCA(0.95)
#X = pca.fit_transform(X)
d_train = lgb.Dataset(X, label=y)
params = {}
params['learning_rate'] = 0.003
params['boosting_type'] = 'gbdt'
params['objective'] = 'binary'
params['metric'] = 'binary_logloss'
params['sub_feature'] = 0.5
params['num_leaves'] = 10
params['min_data'] = 50
params['max_depth'] = 10
num_round = 10
clf = lgb.train(params, d_train, num_round, verbose_eval=1000)
X_test = sc.transform(test.iloc[:100,3:].values)
pred = clf.predict(X_test, num_iteration = clf.best_iteration)
When I print pred, all the values are 0.49.
It's my first time using the lightgbm module. Do I have an error in the code, or should I look for mismatches in the dataset?
Your num_round is too small; the model just starts to learn and stops there. Other than that, make your verbose_eval smaller so you can watch the results during training. My suggestion is to try the lgb.train call below:
clf = lgb.train(params, d_train, num_boost_round=5000, verbose_eval=10, early_stopping_rounds = 3500)
Always use early_stopping_rounds, so the model stops when there is no evident improvement or it starts to overfit.
Do not hesitate to ask more. Have fun.
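Note that early_stopping_rounds only takes effect when lgb.train is given a validation set to monitor. A minimal sketch, assuming a hypothetical hold-out split of the training data (newer LightGBM versions move verbose_eval and early stopping into callbacks, so adjust to your version):

from sklearn.model_selection import train_test_split

# hold out 20% of the training rows purely for early stopping
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
d_train = lgb.Dataset(X_tr, label=y_tr)
d_val = lgb.Dataset(X_val, label=y_val, reference=d_train)

clf = lgb.train(params, d_train, num_boost_round=5000,
                valid_sets=[d_val], verbose_eval=10,
                early_stopping_rounds=3500)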

Tensorflow, I got a shape mismatch at execution time

Good afternoon everyone,
I am currently having some trouble with TensorFlow, since for some reason I get a shape error after about three and a half hours of running. The files are loaded using the TensorFlow input pipeline, which creates two reinitializable datasets for training and test. I know the data has the correct shape, because I do a hard-coded reshape to the expected shape and have never gotten an error there. The problem is that, at some point during training, a sample does not have the correct number of elements at the flatten operation and the program crashes, with no explanation other than the number of elements in the tensor not being divisible by 10 (my batch size). This honestly makes no sense to me, since the data goes through exactly the same pipeline as the batches that run without problems.
I can provide code if needed, but I think this is more a failure to understand some concept of the framework.
Thanks in advance for all the help.
EDIT: Please find the code below. A bit of nomenclature: t corresponds to a layer that has time data (X), f corresponds to a layer that has frequency data (FREQ), q corresponds to a layer that contains cepstral data (QUEF), and tf corresponds to layers that contain 2-D data, spectrograms of X (SPECG); Y is the label. All data are tf.float32, except for the labels, which are tf.int64.
EDIT 2: The operation that gives problems is the flatten on qsubnet_out.
EDIT 3: Probably most important, it seems that some of the layers converge to NaNs.
Training loop:
for i in range(FLAGS.max_steps):
    start = time.time()
    sess.run([train], feed_dict={handle: train_handle})
    if i % 10 == 0:
        summary_op, entropy, acc, expected, output = sess.run([merged, loss, accuracy, Y, tf.argmax(logit, 1)], feed_dict={handle: train_handle})
        summary_op, _, _ = sess.run([merged, loss, accuracy], feed_dict={handle: test_handle})
Training operations:
W = { 'tc1': [64,3], 'tc2':[128,3], 'tc3':[256,5], 'tc4': [128, 2],
'fc1': [64,3], 'fc2':[128,3], 'fc3':[256,5], 'fc4': [128, 2],
'qc1': [64,3], 'qc2':[128,3], 'qc3':[256,5], 'qc4': [128, 2],
'tfc1': [64,(3,3)], 'tfc2':[128,(3,3)], 'tfc3':[256,(5,5)], 'tfc4': [128, (2,2)],
'dense1': 1000, 'dense2': 100, 'dense3': 200,'dense4': 300, 'dense5': 200,
'out' : NUM_CLASSES
}
iter = tf.data.Iterator.from_string_handle(handle, train_dataset.output_types, train_dataset.output_shapes)
X,FREQ,QUEF,SPECG,Y = iter.get_next()
X.set_shape([FLAGS.batch_size,768,14])
FREQ.set_shape([FLAGS.batch_size,384,14])
QUEF.set_shape([FLAGS.batch_size,384,14])
SPECG.set_shape([FLAGS.batch_size,65,18,14])
logit = net.run(X,FREQ,QUEF,SPECG,W)
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=Y,logits=logit))
And the file net.py:
def run(X,FREQ,QUEF,SPECG,W):
time = tf.layers.batch_normalization(X,axis=-1,training=True,trainable=True)
freq = tf.layers.batch_normalization(FREQ,axis=-1,training=True,trainable=True)
quef = tf.layers.batch_normalization(QUEF,axis=-1,training=True,trainable=True)
time_freq = tf.layers.batch_normalization(SPECG,axis=-1,training=True,trainable=True)
regularizer = tf.contrib.layers.l2_regularizer(0.1);
#########################################################################################################
#### TIME SUBNET
with tf.device('/GPU:1'):
tc1 = tf.layers.conv1d(inputs=time,filters=W['tc1'][0],kernel_size=W['tc1'][1],strides=1,padding='SAME',kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='tc1')
trelu1 = tf.nn.relu(features=tc1,name='trelu1')
tpool1 = tf.layers.max_pooling1d(trelu1,pool_size=2,strides=1)
tc2 = tf.layers.conv1d(inputs=tpool1,filters=W['tc2'][0],kernel_size=W['tc2'][1],strides=1,padding='SAME',kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='tc2')
tc3 = tf.layers.conv1d(inputs=tc2,filters=W['tc3'][0],kernel_size=W['tc3'][1],strides=1,padding='SAME',kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='tc3')
trelu2 = tf.nn.relu(tc3,name='trelu2')
tpool2 = tf.layers.max_pooling1d(trelu2,pool_size=2,strides=1)
tc4 = tf.layers.conv1d(inputs=tpool2,filters=W['tc4'][0],kernel_size=W['tc4'][1],strides=1,padding='SAME',kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='tc4')
tsubnet_out = tf.nn.relu6(tc4,'trelu61')
#########################################################################################################
#### CEPSTRUM SUBNET (QUEFRENCIAL)
qc1 = tf.layers.conv1d(inputs=quef,filters=W['qc1'][0],kernel_size=W['qc1'][1],strides=1,padding='SAME',kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='qc1')
qrelu1 = tf.nn.relu(features=qc1,name='qrelu1')
qpool1 = tf.layers.max_pooling1d(qrelu1,pool_size=2,strides=1)
qc2 = tf.layers.conv1d(inputs=qpool1,filters=W['qc2'][0],kernel_size=W['qc2'][1],padding='SAME',strides=1,kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='qc2')
qc3 = tf.layers.conv1d(inputs=qc2,filters=W['qc3'][0],kernel_size=W['qc3'][1],padding='SAME',strides=1,kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='qc3')
qrelu2 = tf.nn.relu(qc3,name='qrelu2')
qpool2 = tf.layers.max_pooling1d(qrelu2,pool_size=2,strides=1)
qc4 = tf.layers.conv1d(inputs=qpool2,filters=W['qc4'][0],kernel_size=W['qc4'][1],padding='SAME',strides=1,kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='qc4')
qsubnet_out = tf.nn.relu6(qc4,'qrelu61')
#########################################################################################################
#FREQ SUBNET
with tf.device('/GPU:1'):
fc1 = tf.layers.conv1d(inputs=freq,filters=W['fc1'][0],kernel_size=W['fc1'][1],padding='SAME',strides=1,kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='fc1')
frelu1 = tf.nn.relu(features=fc1,name='trelu1')
fpool1 = tf.layers.max_pooling1d(frelu1,pool_size=2,strides=1)
fc2 = tf.layers.conv1d(inputs=fpool1,filters=W['fc2'][0],kernel_size=W['fc2'][1],padding='SAME',strides=1,kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='fc2')
fc3 = tf.layers.conv1d(inputs=fc2,filters=W['fc3'][0],kernel_size=W['fc3'][1],padding='SAME',strides=1,kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='fc3')
frelu2 = tf.nn.relu(fc3,name='frelu2')
fpool2 = tf.layers.max_pooling1d(frelu2,pool_size=2,strides=1)
fc4 = tf.layers.conv1d(inputs=fpool2,filters=W['fc4'][0],kernel_size=W['fc4'][1],padding='SAME',strides=1,kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='fc4')
fsubnet_out = tf.nn.relu6(fc4,'frelu61')
########################################################################################################
## TIME/FREQ SUBNET
with tf.device('/GPU:0'):
tfc1 = tf.layers.conv2d(inputs=time_freq,filters=W['tfc1'][0],kernel_size=W['tfc1'][1],padding='SAME', strides=1,kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='tfc1')
tfrelu1 = tf.nn.relu(tfc1)
tfpool1 = tf.layers.max_pooling2d(tfrelu1,pool_size=[2, 2],strides=[1, 1])
tfc2 = tf.layers.conv2d(inputs=tfpool1,filters=W['tfc2'][0],kernel_size=W['tfc2'][1],padding='SAME', strides=1,kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='tfc2')
tfc3 = tf.layers.conv2d(inputs=tfc2,filters=W['tfc3'][0],kernel_size=W['tfc3'][1],padding='SAME', strides=1,kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='tfc3')
tfrelu2 = tf.nn.relu(tfc3)
tfpool2 = tf.layers.max_pooling2d(tfrelu2,pool_size=[2, 2], strides=[1, 1])
tfc4 = tf.layers.conv2d(inputs=tfpool2,filters=W['tfc4'][0],kernel_size=W['tfc4'][1],padding='SAME', strides=1,kernel_initializer=tf.initializers.random_normal,kernel_regularizer=regularizer,name='tfc4')
tfsubnet_out = tf.nn.relu6(tfc4,'tfrelu61')
########################################################################################################
##Flatten subnet outputs
tsubnet_out = tf.layers.flatten(tsubnet_out)
fsubnet_out = tf.layers.flatten(fsubnet_out)
tfsubnet_out = tf.layers.flatten(tfsubnet_out)
qsubnet_out = tf.layers.flatten(qsubnet_out)
#Final subnet computation
input_final = tf.concat((tsubnet_out,fsubnet_out,qsubnet_out,tfsubnet_out),1)
dense1 = tf.layers.dense(input_final,W['dense1'],tf.nn.relu, kernel_initializer=tf.initializers.random_normal,name='dense1')
dense2 = tf.layers.dense(dense1,W['dense2'],tf.nn.relu, kernel_initializer=tf.initializers.random_normal,name='dense2')
dense3 = tf.layers.dense(dense2,W['dense3'],tf.nn.relu, kernel_initializer=tf.initializers.random_normal,name='dense3')
dense4 = tf.layers.dense(dense3,W['dense4'],tf.nn.relu, kernel_initializer=tf.initializers.random_normal,name='dense4')
dense5 = tf.layers.dense(dense4,W['dense5'],tf.nn.relu, kernel_initializer=tf.initializers.random_normal,name='dense5')
out = tf.layers.dense(dense5,W['out'],tf.nn.relu, name='out')
return out
Finally, after some days, I was able to track down the problem. In the end it was not related to the code I posted, but to the creation of the TensorFlow Dataset: when batching, if the length of the dataset is not divisible by the batch size, the last batch is smaller, unless the flag drop_remainder is set to True.
I will not delete the question, since I believe this is a problem more people may run into in the future, and its source is not easily identifiable.
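For illustration, a minimal sketch of the batching call with the flag set (the train_dataset name matches the snippet above, the test-side name is an assumption; on older TensorFlow versions the same effect required tf.contrib.data.batch_and_drop_remainder):

# every emitted batch now has exactly FLAGS.batch_size elements;
# the smaller remainder batch is dropped instead of breaking set_shape
train_dataset = train_dataset.batch(FLAGS.batch_size, drop_remainder=True)
test_dataset = test_dataset.batch(FLAGS.batch_size, drop_remainder=True)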

Fine-tuning a neural network in tensorflow

I've been working on this neural network with the intent to predict TBA (time based availability) of simulated windmill parks based on certain attributes. The neural network runs just fine, and gives me some predictions, however I'm not quite satisfied with the results. It fails to notice some very obvious correlations that I can clearly see by myself. Here is my current code:
# Import
import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
maxi = 0.96
mini = 0.7
# Make data a np.array
data = pd.read_csv('datafile_ML_no_avg.csv')
data = data.values
# Shuffle the data
shuffle_indices = np.random.permutation(np.arange(len(data)))
data = data[shuffle_indices]
# Training and test data
data_train = data[0:int(len(data)*0.8),:]
data_test = data[int(len(data)*0.8):int(len(data)),:]
# Scale data
scaler = MinMaxScaler(feature_range=(mini, maxi))
scaler.fit(data_train)
data_train = scaler.transform(data_train)
data_test = scaler.transform(data_test)
# Build X and y
X_train = data_train[:, 0:5]
y_train = data_train[:, 6:7]
X_test = data_test[:, 0:5]
y_test = data_test[:, 6:7]
# Number of stocks in training data
n_args = X_train.shape[1]
multi = int(8)
# Neurons
n_neurons_1 = 8*multi
n_neurons_2 = 4*multi
n_neurons_3 = 2*multi
n_neurons_4 = 1*multi
# Session
net = tf.InteractiveSession()
# Placeholder
X = tf.placeholder(dtype=tf.float32, shape=[None, n_args])
Y = tf.placeholder(dtype=tf.float32, shape=[None,1])
# Initialize1s
sigma = 1
weight_initializer = tf.variance_scaling_initializer(mode="fan_avg", distribution="uniform", scale=sigma)
bias_initializer = tf.zeros_initializer()
# Hidden weights
W_hidden_1 = tf.Variable(weight_initializer([n_args, n_neurons_1]))
bias_hidden_1 = tf.Variable(bias_initializer([n_neurons_1]))
W_hidden_2 = tf.Variable(weight_initializer([n_neurons_1, n_neurons_2]))
bias_hidden_2 = tf.Variable(bias_initializer([n_neurons_2]))
W_hidden_3 = tf.Variable(weight_initializer([n_neurons_2, n_neurons_3]))
bias_hidden_3 = tf.Variable(bias_initializer([n_neurons_3]))
W_hidden_4 = tf.Variable(weight_initializer([n_neurons_3, n_neurons_4]))
bias_hidden_4 = tf.Variable(bias_initializer([n_neurons_4]))
# Output weights
W_out = tf.Variable(weight_initializer([n_neurons_4, 1]))
bias_out = tf.Variable(bias_initializer([1]))
# Hidden layer
hidden_1 = tf.nn.relu(tf.add(tf.matmul(X, W_hidden_1), bias_hidden_1))
hidden_2 = tf.nn.relu(tf.add(tf.matmul(hidden_1, W_hidden_2), bias_hidden_2))
hidden_3 = tf.nn.relu(tf.add(tf.matmul(hidden_2, W_hidden_3), bias_hidden_3))
hidden_4 = tf.nn.relu(tf.add(tf.matmul(hidden_3, W_hidden_4), bias_hidden_4))
# Output layer (transpose!)
out = tf.transpose(tf.add(tf.matmul(hidden_4, W_out), bias_out))
# Cost function
mse = tf.reduce_mean(tf.squared_difference(out, Y))
# Optimizer
opt = tf.train.AdamOptimizer().minimize(mse)
# Init
net.run(tf.global_variables_initializer())
# Fit neural net
batch_size = 10
mse_train = []
mse_test = []
# Run
epochs = 10
for e in range(epochs):
    # Shuffle training data
    shuffle_indices = np.random.permutation(np.arange(len(y_train)))
    X_train = X_train[shuffle_indices]
    y_train = y_train[shuffle_indices]
    # Minibatch training
    for i in range(0, len(y_train) // batch_size):
        start = i * batch_size
        batch_x = X_train[start:start + batch_size]
        batch_y = y_train[start:start + batch_size]
        # Run optimizer with batch
        net.run(opt, feed_dict={X: batch_x, Y: batch_y})
        # Show progress
        if np.mod(i, 50) == 0:
            mse_train.append(net.run(mse, feed_dict={X: X_train, Y: y_train}))
            mse_test.append(net.run(mse, feed_dict={X: X_test, Y: y_test}))
pred = net.run(out, feed_dict={X: X_test})
print(pred)
I have tried tweaking the number of hidden layers, the number of nodes per layer, the number of epochs, and different activation functions and optimizers. However, I am quite new to neural networks, so there might be something very obvious that I'm missing.
Thanks in advance to anyone who managed to read through all of that.
It would make things much easier if you shared a small dataset that illustrates the problem. That said, I will describe some common issues with non-standard datasets and how to overcome them.
Possible solutions
Regularization and validation-based optimization - methods that are always good to try when looking for some extra accuracy. See dropout methods here (original paper), and some overview here; a small sketch of adding dropout to your graph follows this list.
Unbalanced data - sometimes time-series categories/events behave like anomalies, or are simply unbalanced. If you read a book, words like the or it appear many more times than warehouse. This becomes a problem if your main task is to detect the word warehouse and you train your network (even LSTMs) in the traditional way. A way to overcome this is to balance the samples (create balanced datasets) or to give more weight to low-frequency categories.
Model structure - sometimes fully connected layers are not enough. See computer vision problems, for instance, where we train using convolutional layers. The convolution and pooling layers enforce structure on the model, which is suitable for images; this is also a form of regularization, since those layers have fewer parameters. In time-series problems convolutions are also possible, and they turn out to work just fine. See the example in Conditional Time Series Forecasting with Convolution Neural Networks.
The above suggestions are presented in the order I would suggest trying them.
Good luck!
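As an illustration of the first suggestion, a minimal sketch of adding dropout to the TF 1.x graph from the question; the keep_prob placeholder and the 0.8 keep probability are assumptions, and the same pattern would be repeated for the other hidden layers:

keep_prob = tf.placeholder(dtype=tf.float32)  # fed as 0.8 for training, 1.0 for evaluation

# hidden layer 1 with dropout applied to its activations
hidden_1 = tf.nn.relu(tf.add(tf.matmul(X, W_hidden_1), bias_hidden_1))
hidden_1 = tf.nn.dropout(hidden_1, keep_prob=keep_prob)

# training step:
# net.run(opt, feed_dict={X: batch_x, Y: batch_y, keep_prob: 0.8})
# evaluation:
# net.run(mse, feed_dict={X: X_test, Y: y_test, keep_prob: 1.0})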

Verify that keras GaussianNoise is enabled at train time when using inference with edward

I would like to check whether noise is truly added and used during training of my neural network. I therefore build my NN with Keras like this:
from keras.layers import Input
from keras.layers.noise import GaussianNoise
inp = Input(tensor=self.X_ph)
noised_x = GaussianNoise(stddev=self.x_noise_std)(inp)
x = Dense(15, activation='elu')(noised_x)
x = Dense(15, activation='elu')(x)
self.estimator = x
...
# kernel weights, as output by the neural network
self.logits = logits = Dense(n_locs * self.n_scales, activation='softplus')(self.estimator)
self.weights = tf.nn.softmax(logits)
# mixture distributions
self.cat = cat = Categorical(logits=logits)
self.components = components = [MultivariateNormalDiag(loc=loc, scale_diag=scale) for loc in locs_array for scale in scales_array]
self.mixtures = mixtures = Mixture(cat=cat, components=components, value=tf.zeros_like(self.y_ph))
Then I use edward to execute inference:
self.inference = ed.MAP(data={self.mixtures: self.y_ph})
self.inference.initialize(var_list=tf.trainable_variables(), n_iter=self.n_training_epochs)
tf.global_variables_initializer().run()
According to the documentation, the closest I get to this is through ed.MAP's run() and update() functions.
Preferably, I would do something like this:
noised_x = self.sess.run(self.X_ph, feed_dict={self.X_ph: X, self.y_ph: Y})
np.allclose(noised_x, X) --> False
How can I properly verify that noise is being used at train time and disabled at test time within ed.MAP?
Update 1
Apparently the way I use GaussianNoise doesn't add noise to my input, since the following unit test fails:
X, Y = self.get_samples(std=1.0)
model_no_noise = KernelMixtureNetwork(n_centers=5, x_noise_std=None, y_noise_std=None)
model_no_noise.fit(X,Y)
var_no_noise = model_no_noise.covariance(x_cond=np.array([[2]]))[0][0][0]
model_noise = KernelMixtureNetwork(n_centers=5, x_noise_std=20.0, y_noise_std=20.0)
model_noise.fit(X, Y)
var_noise = model_noise.covariance(x_cond=np.array([[2]]))[0][0][0]
self.assertGreaterEqual(var_noise - var_no_noise, 0.1)
I also made sure that during the inference.update(...) the assertion
assert tf.keras.backend.learning_phase() == 1
passes.
Where could something have gone wrong here?
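One sanity check worth trying - a rough sketch, assuming the Keras backend learning phase is what gates the GaussianNoise layer, and reusing self.sess, self.X_ph and noised_x from the snippets above - is to evaluate the output of the noise layer directly under both phases:

from keras import backend as K
import numpy as np

# evaluate the GaussianNoise output with the learning phase forced on and off
noised_train = self.sess.run(noised_x, feed_dict={self.X_ph: X, K.learning_phase(): 1})
noised_test = self.sess.run(noised_x, feed_dict={self.X_ph: X, K.learning_phase(): 0})

assert not np.allclose(noised_train, X)  # noise should be present in training mode
assert np.allclose(noised_test, X)       # and absent in inference mode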
