expand tensor in tensorflow - python

In TensorFlow, I want to expand a tensor with the Taylor series of sin(x), keeping a certain number of approximation terms. This works fine for a grayscale image of shape (32,32), but when I apply the same approach to an RGB image of shape (32,32,3) it no longer gives me the correct array. In short, I am trying to expand a tensor with the Taylor series of sin(x). Can anyone show me a possible way of doing this in TensorFlow?
my attempt:
Here is the Taylor expansion of sin(x) at x=0 with three expansion terms: x - x**3/6 + x**5/120.
import numpy as np
from tensorflow.keras.datasets import mnist

(X_train, y_train), (X_test, y_test) = mnist.load_data()
x = X_train[1, :, :]   # one grayscale image
k = 3                  # number of expansion terms
func = 'sin(x)'
new_x = np.zeros((x.shape[0], x.shape[1] * k))
new_x = new_x.astype('float32')
nn = 0
for i in range(x.shape[1]):
    col_d = x[:, i].ravel()
    new_x[:, nn] = col_d
    if k > 0:
        for j in range(1, k):
            if func == 'cos(x)':
                new_x[:, nn+j] = new_x[:, nn+j-1]
I think I could do this more efficiently with TensorFlow, but it is not obvious to me how. Can anyone suggest a workaround to make this work?
update:
For a 2-dim array, col_d = x[:,i].ravel() is a pixel vector obtained by flattening the array. Similarly, we could reshape a 3-dim array to 2-dim with x.transpose(0,1,2).reshape(x.shape[1],-1) inside the loop, i.e. x[:,i].transpose(0,1,2).reshape(x.shape[1],-1), but this is still not correct. I think TensorFlow might have a better way of doing this. How can we expand the tensor with the Taylor series of sin(x) more efficiently? Any thoughts?
goal:
Intuitively, in the Taylor series of sin(x), x is a tensor, and if we want only 2 or 3 approximation terms of the series for each tensor, I want to concatenate them into a new tensor. How should we do this efficiently in TensorFlow? Any thoughts?

new_x = np.zeros((x.shape[0], x.shape[1]*n_terms))
This line doesn't make sense: why allocate space for 96 columns when there are only 3 Taylor expansion terms? Most of the array stays zero:
(new_x[:, 3:] == 0.0).all()  # True
For a pixel-wise Taylor expansion with n terms:
import math
import numpy as np

def sin_exp_step(x, i):
    # i-th term of the Taylor expansion of sin(x) around 0
    c1 = 2 * i + 1
    c2 = (-1) ** i / math.factorial(c1)
    t = c2 * (x ** c1)
    return t

# validate
n_terms = 5  # 5 terms are enough for 1e-8 accuracy at x = pi/4
x = 45.0
x = (np.pi / 180.0) * x
y = np.sin(x)
approx_y = 0
for i in range(n_terms):
    approx_y += sin_exp_step(x, i)
assert abs(approx_y - y) < 1e-8
x = X_train[1, :, :, :]  # one RGB image of shape (32, 32, 3), e.g. from cifar10
n_terms = 3
func = 'sin(x)'
new_x = np.zeros((*x.shape, n_terms))
for i in range(n_terms):
    if func == 'sin(x)':  # sin(x)
        new_x[..., i] += sin_exp_step(x, i)
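If you just need the summed approximation instead of the per-term stack, you can reduce over the last axis. A quick check, assuming the new_x built above (and keeping in mind that a few Taylor terms are only accurate when x is scaled to a small range, e.g. divided by 255):
# sum the n_terms slices to recover the per-pixel sin(x) approximation
approx = new_x.sum(axis=-1)              # same shape as x, here (32, 32, 3)
# only close to np.sin(x) if x has been scaled to a small range
print(np.abs(approx - np.sin(x)).max())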
Numerical approximation methods like this are commonly avoided, as they are computationally expensive (e.g. the factorial) and less stable, so gradient-based optimization is usually preferred; for higher-order derivatives, algorithms such as BFGS and L-BFGS are used to approximate the Hessian matrix (2nd-order derivative). Optimizers such as Adam and SGD are sufficient and much cheaper computationally. Using a neural network, we might be able to find a much better expansion.
Tensorflow solution for n-terms expansion
import numpy as np
import tensorflow as tf
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.layers import Input, LocallyConnected2D
from tensorflow.keras.models import Model
from tensorflow.keras import backend as K
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = tf.constant(x_train, dtype=tf.float32)
x_test = tf.constant(x_test, dtype=tf.float32)
def expansion_approx_of(func):
    def reconstruction_loss(y_true, y_pred):
        loss = (y_pred - func(y_true)) ** 2
        loss = 0.5 * K.mean(loss)
        return loss
    return reconstruction_loss

class Expansion2D(LocallyConnected2D):  # n-terms expansion layer
    def __init__(self, i_shape, n_terms, kernel_size=(1, 1), *args, **kwargs):
        if len(i_shape) != 3:
            raise ValueError('...')
        self.i_shape = i_shape
        self.n_terms = n_terms
        filters = self.n_terms * self.i_shape[-1]
        super(Expansion2D, self).__init__(filters=filters, kernel_size=kernel_size,
                                          use_bias=False, *args, **kwargs)

    def call(self, inputs):
        shape = (-1, self.i_shape[0], self.i_shape[1], self.i_shape[-1], self.n_terms)
        out = super().call(inputs)
        expansion = tf.reshape(out, shape)
        out = tf.math.reduce_sum(expansion, axis=-1)
        return out, expansion
inputs = Input(shape=(32, 32, 3))
# expansion: might be a taylor expansion or something better.
out, expansion = Expansion2D(i_shape=(32, 32, 3), n_terms=3)(inputs)
model = Model(inputs, [out, expansion])
opt = tf.keras.optimizers.Adam(learning_rate=0.0001, beta_1=0.9, beta_2=0.999)
loss = expansion_approx_of(K.sin)
model.compile(optimizer=opt, loss=[loss])
model.summary()
model.fit(x_train, x_train, batch_size=1563, epochs=100)
x_pred, x_exp = model.predict_on_batch(x_test[:32])
print((x_exp[0].sum(axis=-1) == x_pred[0]).all())
err = abs(x_pred - np.sin(x_test[:32])).mean()
print(err)

Put three expansion terms into a tensor at axis=1
x = tf.ones([8, 32, 32, 3], tf.float32) * 0.5 # example batchsize=8, imageshape=[32, 32, 3]
x = tf.stack([x, - (1/6) * tf.math.pow(x, 3), (1/120) * tf.math.pow(x, 5)], axis=1) # expansion of three terms of sin(x), [8, 3, 32, 32, 3]
If you go with the tf.keras Functional API or Sequential API, you can wrap this in a Keras custom layer. Relevant ops:
tf.math.pow
tf.stack
Edit: In the first version of this answer I recommended tf.keras.layers.Lambda, but it might not work with tf.math.pow or tf.stack (I haven't tried). A Keras custom layer is the safer route; a minimal sketch follows below.
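Here is a minimal sketch of such a custom layer, assuming a fixed three-term sin(x) expansion stacked at axis=1 (the layer name SinExpansion and the hard-coded term count are my own choices, not from the original answer):
import tensorflow as tf

class SinExpansion(tf.keras.layers.Layer):
    """Stacks the first three Taylor terms of sin(x) along a new axis=1."""
    def call(self, x):
        return tf.stack(
            [x,
             -(1.0 / 6.0) * tf.math.pow(x, 3),
             (1.0 / 120.0) * tf.math.pow(x, 5)],
            axis=1)

inputs = tf.keras.Input(shape=(32, 32, 3))
terms = SinExpansion()(inputs)   # shape (batch, 3, 32, 32, 3)
model = tf.keras.Model(inputs, terms)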

I think you can do this for a 1D tensor as follows:
from tensorflow.keras import backend as K

def expend_func(x):
    p1 = x                   # one-term partial sum of sin(x)
    p2 = x - ((x**3) / 6)    # two-term partial sum of sin(x)
    new_x = K.concatenate([p1, p2], axis=1)
    return new_x
Note that x is your 1D tensor and new_x contains two terms. If you need new_x with three terms, you can extend expend_func with a third term. For a 2D tensor you should use tf.stack(), which is not the most elegant way, but it might help.

Related

Creating BLEU loss method on tensorflow gives "No gradient provided"

I need to build a custom loss method based on BLEU. I'm passing my LabelEncoder in the constructor so I can inverse-transform the labels and predictions and calculate the BLEU distance.
Here is my Loss class
import numpy as np
import tensorflow as tf
from tensorflow.keras.losses import Loss
from sklearn.preprocessing import LabelEncoder
from nltk.translate.bleu_score import sentence_bleu

class CIMCodeSuccessiveLoss(Loss):
    def __init__(self, labelEncoder: LabelEncoder):
        super().__init__()
        self.le = labelEncoder

    def bleu_score(self, true_label, pred_label):
        cim_true_label = self.le.inverse_transform(true_label.numpy())
        cim_pred_label = self.le.inverse_transform(pred_label.numpy())
        bleu_scores = [sentence_bleu(list(one_true_label),
                                     list(one_pred_label),
                                     weights=(0.5, 0.25, 0.125, 0.125))
                       for one_true_label, one_pred_label in
                       zip(cim_true_label, cim_pred_label)]
        return np.float32(bleu_scores)

    def call(self, y_true, y_pred):
        labeled_y_pred = tf.cast(tf.argmax(y_pred, axis=-1), tf.int32)
        bleu = tf.py_function(self.bleu_score,
                              (tf.reshape(y_true, [-1]), labeled_y_pred),
                              tf.float32)
        return tf.reduce_sum(tf.square(1 - bleu))
The bleu_score method is calculating the correct scores and returns a NumPy array.
When I try to return the squared sum, I get this error:
raise ValueError(f"No gradients provided for any variable: {variable}.
I'm also providing the model:
inputs = tf.keras.Input(shape=(1,), dtype=tf.string)
x = vectorize_layer(inputs)
x = Embedding(vocab_size, embedding_dim, name="embedding")(x)
x = LSTM(units=32, name="lstm")(x)
outputs = Dense(classes_number, name="classification")(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs, name="first_cim_classifier")
model.summary()
# we add early stopping for our model.
early_stopping = EarlyStopping(monitor='loss', patience=2)
model.compile(
    loss=CIMCodeSuccessiveLoss(le),
    optimizer=tf.keras.optimizers.Adam(),
    metrics=["accuracy", "crossentropy"],
    run_eagerly=True)
trained_model = model.fit(np.array(x_train), np.array(y_train), batch_size=64, epochs=10,
                          validation_data=(np.array(x_val), np.array(y_val)),
                          callbacks=[early_stopping])
Any help is appreciated. Thanks in advance.
To calculate the loss you use tf.argmax(y_pred, axis=-1). argmax is not differentiable, so automatic differentiation cannot compute the gradients. You have to remove this call; for example (depending on your data) you can change the output layer to softmax and the labels to one-hot, as sketched below.
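A minimal sketch of that idea (the vocabulary size, embedding size and class count are placeholders, not taken from the original post):
import tensorflow as tf

classes_number = 100              # hypothetical number of classes
vocab_size, embedding_dim = 1000, 16  # placeholders

inputs = tf.keras.Input(shape=(None,), dtype=tf.int32)
x = tf.keras.layers.Embedding(vocab_size, embedding_dim)(inputs)
x = tf.keras.layers.LSTM(32)(x)
# softmax output instead of raw logits followed by argmax
outputs = tf.keras.layers.Dense(classes_number, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)

# one-hot encode the integer labels so a differentiable loss can be used
y_int = tf.constant([3, 7, 42])
y_onehot = tf.one_hot(y_int, depth=classes_number)
model.compile(optimizer="adam", loss="categorical_crossentropy")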
The issue is that the argmax function is not differentiable, which is problematic when including it in a loss function:
labeled_y_pred = tf.cast(tf.argmax(y_pred, axis=-1), tf.int32)
One way to work around this is to use a differentiable approximation of the argmax function, similar to the smooth maximum function:
softargmax(x) = sum_i i * exp(beta * x_i) / sum_j exp(beta * x_j)
As beta approaches infinity, this approaches the true argmax. For your purposes, beta=10 or beta=100 should accomplish your goals.
In Tensorflow, this could be accomplished as follows:
def differentiable_argmax_approx(x, beta=10, axis=None):
    return tf.reduce_sum(tf.cumsum(tf.ones_like(x)) * tf.exp(beta * x)
                         / tf.reduce_sum(tf.exp(beta * x), axis=axis), axis=axis) - 1
Then changing the original line to:
labeled_y_pred = tf.cast(differentiable_argmax_approx(y_pred, axis=-1), tf.int32)
We can verify the functionality with a simple test case:
beta = 10
x = np.array([1, 2, 3, 10, 4, 5], dtype=float)
y = differentiable_argmax_approx(x, beta)
assert x.argmax() == y
One caveat to this approach: if the maximum value is not unique along the axis that we're applying the function to, the result will be the arithmetic mean of the indices. Providing another test case to illustrate:
beta = 10
x = np.array([1, 2, 10, 3, 10], dtype=float)
y = differentiable_argmax_approx(x, beta)
assert y == 3
The result is 3 here, because we have two occurrences of the maximum value (10): one at index 2, and the other at index 4. In contrast, the regular argmax function returns the first index of the maximum argument.
Another improvement would be moving more computation into Tensorflow functions. To start, instead of using sklearn's LabelEncoder to apply a mapping in the loss function, you could use a tf.lookup.StaticHashTable to accomplish the same objective with the Tensorflow API. To convert from a LabelEncoder to a tf.lookup.StaticHashTable, you can use the following function:
def convert_label_encoder_to_static_hash_table(le: LabelEncoder,
                                               default_value: int = -1) -> tf.lookup.StaticHashTable:
    static_hash_table = tf.lookup.StaticHashTable(
        tf.lookup.KeyValueTensorInitializer(
            tf.convert_to_tensor(le.classes_),
            tf.convert_to_tensor(le.transform(le.classes_))),
        default_value=default_value)
    return static_hash_table
Or, for your purposes, since you're applying the inverse mapping (to go from integers -> string), you may want to swap the key and the values:
def convert_label_encoder_to_static_hash_table(le: LabelEncoder,
                                               default_value: str = "") -> tf.lookup.StaticHashTable:
    static_hash_table = tf.lookup.StaticHashTable(
        tf.lookup.KeyValueTensorInitializer(
            tf.convert_to_tensor(le.transform(le.classes_)),
            tf.convert_to_tensor(le.classes_)),
        default_value=default_value)
    return static_hash_table
and, in the initializer:
def __init__(self, labelEncoder: LabelEncoder):
    super().__init__()
    self.table = convert_label_encoder_to_static_hash_table(labelEncoder)
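A quick sanity check of the inverse table, assuming the inverse-mapping version of convert_label_encoder_to_static_hash_table defined just above (the class strings here are made-up placeholders):
import tensorflow as tf
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder().fit(["cat", "dog", "fish"])   # hypothetical classes
table = convert_label_encoder_to_static_hash_table(le)
print(table.lookup(tf.constant([0, 2], dtype=tf.int64)))  # -> [b'cat' b'fish']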
By operating on tf.Tensor objects, you can utilize tf.map_fn instead of using a for-loop and converting to a numpy array/lists - your loss function would become:
def bleu_score(self, true_label, pred_label):
    cim_true_label = self.table[true_label]
    cim_pred_label = self.table[pred_label]
    # each row of elems is a (true, pred) pair of strings
    bleu_scores = tf.map_fn(
        lambda x: sentence_bleu([str(x[0])], [str(x[1])],
                                weights=(0.5, 0.25, 0.125, 0.125)),
        elems=tf.stack([cim_true_label, cim_pred_label], axis=1),
        fn_output_signature=tf.float32)
    return bleu_scores
This should also mitigate the need to call tf.py_func in the loss computation, since the bleu_score function is now entirely Tensorflow operations instead of calling native Python functions.

Tensorflow AutoGraph Polynomial Model With Multiple Outputs

I have a tensorflow model whose outputs correspond to coefficients of multiple polynomials. Note that my model actually has another set of outputs (multi-output), but I've mocked this below just by returning the input in addition to the polynomial coefficients.
I'm having a lot of trouble during the training of the model, related to tensor shapes. I've verified that the model is able to predict on sample inputs, and that the loss function works on sample outputs. But, during training, it immediately throws an error (see below)
For every input, the model takes in a fixed embedding-size input, and outputs coefficients for 2 polynomials of degree 2. For example, the output on a single input can look like:
[array([[[1, 2, 3],
         [4, 5, 6]]]),
 [...]]
corresponding to polynomials [1*x^2+2*x+3, 4*x^2+5*x+6]. Note that I've hidden the second output.
I noticed that tf.math.polyval requires a Python list of coefficients, making it awkward with AutoGraph. So, I implemented my own version of Horner's algorithm with pure tensors.
import numpy as np
import tensorflow as tf
import logging
import tensorflow.keras as K
@tf.function
def tensor_polyval(coeffs, x):
    """
    Calculates polynomial scalars from tensor of polynomial coefficients
    Tensorflow tf.math.polyval requires a list coeff, which isn't compatible with autograd
    # Inputs:
      - coeffs (NxD Tensor): each row of coeffs corresponds to r[0]*x^(D-1)+r[1]*x^(D-2)...+r[D]
      - x: Scalar!
    # Output:
      - r[0]*x^(D-1)+r[1]*x^(D-2)...+r[D] for row in coeffs
    """
    p = coeffs[:, 0]
    for i in range(1, coeffs.shape[1]):
        tf.autograph.experimental.set_loop_options(
            shape_invariants=[(p, tf.TensorShape([None]))])
        c = coeffs[:, i]
        p = tf.add(c, tf.multiply(x, p))
    return p

@tf.function
def coeffs_to_poly(coeffs, n):
    # Converts an NxD array of coefficients to N evaluated polynomials at x=n
    return tensor_polyval(coeffs, tf.convert_to_tensor(n))
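A quick check of tensor_polyval on a small coefficient tensor (the values are chosen here just for illustration, matching the example output above):
coeffs = tf.constant([[1., 2., 3.],
                      [4., 5., 6.]])            # rows: 1*x^2+2*x+3 and 4*x^2+5*x+6
print(coeffs_to_poly(coeffs, tf.constant(5.)))  # -> [ 38. 131.]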
Now here's a super-simplified example of my model, loss function and training routine:
def model_init(embedDim=8, polyDim=2, terms=2):
    input = K.Input(shape=(embedDim,))
    x = K.layers.Reshape((embedDim,))(input)
    aCoeffs = K.layers.Dense((polyDim+1)*terms, activation='tanh')(x)
    aCoeffs = K.layers.Reshape((terms, polyDim+1))(aCoeffs)
    model = K.Model(inputs=input, outputs=[aCoeffs, input])
    return model

def get_random_batch(batch, embedDim, dtype='float64'):
    x = np.random.randn(batch, embedDim).astype(dtype)
    y = np.array([1. for i in range(batch)]).astype(dtype)
    return [x, y]
embedDim = 8
polyDim = 2
terms = 2
dataType = 'float64'
tf.keras.backend.set_floatx(dataType)

# constants above are defined before test_loss so dataType exists when the default argument is evaluated
@tf.function
def test_loss(y_true, y_pred, dtype=dataType):
    an = tf.vectorized_map(lambda y_p: coeffs_to_poly(y_p[0],
                                                      tf.constant(5, dtype=dataType)),
                           y_pred)
    return tf.reduce_mean(tf.reduce_mean(an, axis=-1))
model = model_init(embedDim, polyDim, terms)
XTrain, yTrain = get_random_batch(batch=128,
                                  embedDim=embedDim)
# Init Model
LR = 0.001
loss = test_loss
epochs = 5
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=LR), loss=loss)
hist = model.fit(XTrain,
                 yTrain,
                 batch_size=4,
                 epochs=epochs,
                 max_queue_size=10, workers=2, use_multiprocessing=True)
The error I get is related to the tensor_polyval function:
<ipython-input-15-f96bd099fe08>:3 test_loss *
an = tf.vectorized_map(lambda y_p: coeffs_to_poly(y_p[0],
<ipython-input-5-7205207d12fd>:23 coeffs_to_poly *
return tensor_polyval(coeffs, tf.convert_to_tensor(n))
<ipython-input-5-7205207d12fd>:13 tensor_polyval *
p = coeffs[:, 0]
...
ValueError: Index out of range using input dim 1; input has only 1 dims for '{{node strided_slice}} = StridedSlice[Index=DT_INT32, T=DT_DOUBLE, begin_mask=1, ellipsis_mask=0, end_mask=1, new_axis_mask=0, shrink_axis_mask=2](coeffs, strided_slice/stack, strided_slice/stack_1, strided_slice/stack_2)' with input shapes: [3], [2], [2], [2] and with computed input tensors: input[3] = <1 1>.
What's frustrating is that I'm perfectly able to predict with the model on sample inputs and also calculate a sample loss:
test_loss(yTrain[0:5],
          model.predict(XTrain[0:5]),
          dtype=dataType)
which runs just fine.
In the test_loss function I'm specifically referring to the first output, via y_p[0]. It tries to calculate the value of the polynomials at n=5 and then outputs an average over everything (again this is just mocked code). As I understand it, y_p[1] would refer to the second output (in this case, a copy of the input). I would think tf.vectorized_map should be operating across all outputs of the model batch, but it seems to be slicing one extra dimension??
I noticed that the code does train if I remove the second output (, input) from the model (making it single-output) and change y_p[0] to y_p in test_loss. I have no idea why it breaks when adding the extra output, as my understanding of tf.vectorized_map implies that it acts separately on each element of the list y_pred.
If we need the single loss function to receive multiple outputs altogether, perhaps we can concatenate them together to form one output.
In this case:
Changes to the model structure, here we pack the outputs:
def model_init(embedDim=8, polyDim=2, terms=2):
    input = K.Input(shape=(embedDim, ))
    x = K.layers.Reshape((embedDim, ))(input)
    aCoeffs = K.layers.Dense((polyDim + 1) * terms, activation='tanh')(x)
    # pack the two outputs, add flatten layers if their shapes are not batch*K
    outputs = K.layers.Concatenate()([aCoeffs, input])
    model = K.Model(inputs=input, outputs=outputs)
    model.summary()
    return model
Changes to the loss function, here we unpack the outputs:
# the loss function needs to know these
polyDim = 2
terms = 2

@tf.function
def test_loss(y_true, y_pred, dtype=dataType):
    """Loss function for flattened outputs."""
    # unpack multiple outputs
    offset = (polyDim + 1) * terms
    aCoeffs = tf.reshape(y_pred[:, :offset], [-1, terms, polyDim + 1])
    inputs = y_pred[:, offset:]
    print(aCoeffs, inputs)
    # do something with the two unpacked outputs, like below
    an = tf.vectorized_map(
        lambda y_p: coeffs_to_poly(y_p, tf.constant(5, dtype=dataType)),
        aCoeffs)
    return tf.reduce_mean(tf.reduce_mean(an, axis=-1))
Notice that the loss function relies on knowledge of the original output shapes in order to restore them. Consider sub-classing tf.keras.losses.Loss so those shapes travel with the loss object; a sketch follows below.
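A minimal sketch of that subclass, assuming the same packed-output layout and reusing the coeffs_to_poly, model and LR from the snippets above (the class name and x0 parameter are my own choices, not from the original answer):
class PackedPolyLoss(tf.keras.losses.Loss):
    def __init__(self, polyDim=2, terms=2, x0=5.0, **kwargs):
        super().__init__(**kwargs)
        self.offset = (polyDim + 1) * terms   # where the packed coefficients end
        self.terms = terms
        self.polyDim = polyDim
        self.x0 = x0

    def call(self, y_true, y_pred):
        # unpack the concatenated outputs using the stored shapes
        aCoeffs = tf.reshape(y_pred[:, :self.offset],
                             [-1, self.terms, self.polyDim + 1])
        an = tf.vectorized_map(
            lambda y_p: coeffs_to_poly(y_p, tf.constant(self.x0, dtype=y_pred.dtype)),
            aCoeffs)
        return tf.reduce_mean(tf.reduce_mean(an, axis=-1))

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=LR),
              loss=PackedPolyLoss(polyDim=2, terms=2))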
P.S. For anyone who simply needs different losses for the multiple outputs:
Define loss functions for the two outputs.
@tf.function
def test_loss(y_true, y_pred, dtype=dataType):
    """Loss function for output 1
    (only changed y_p[0] to y_p)"""
    an = tf.vectorized_map(
        lambda y_p: coeffs_to_poly(y_p, tf.constant(5, dtype=dataType)),
        y_pred)
    return tf.reduce_mean(tf.reduce_mean(an, axis=-1))

@tf.function
def dummy_loss(y_true, y_pred, dtype=dataType):
    """Loss function for output 2, i.e. the input, for debugging.
    Better to use 0 instead of 1.2345"""
    return tf.constant(1.2345, dataType)
Change to model.compile:
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=LR), loss=[test_loss, dummy_loss])

How to write a custom loss function in Keras/Tensorflow that uses loops/iterations with reference numpy code

I saw this question: Implementing custom loss function in keras with condition. I need to do the same thing, but with code that seems to need loops.
I have a custom numpy function which calculates the mean Euclid distance from the mean vector. I wrote this based on the paper https://arxiv.org/pdf/1801.05365.pdf:
import numpy as np
def mean_euclid_distance_from_mean_vector(n_vectors):
    dists = []
    for (i, v) in enumerate(n_vectors):
        n_vectors_rest = n_vectors[np.arange(len(n_vectors)) != i]
        print("rest of vectors: ")
        print(n_vectors_rest)
        # calculate mean vector
        mean_rest = n_vectors_rest.mean(axis=0)
        print("mean rest vector")
        print(mean_rest)
        dist = v - mean_rest
        print("dist vector")
        print(dist)
        dists.append(dist)
    # dists is now a matrix of distance vectors (distance from the mean vector)
    dists = np.array(dists)
    print("distance vector matrix")
    print(dists)
    # here we matmul each vector with itself,
    # sum them up
    # and divide by the total number of elements
    result = np.sum([np.matmul(d, d) for d in dists]) / dists.size
    return result

features = np.array([
    [1, 2, 3, 4],
    [4, 3, 2, 1]
])
c = mean_euclid_distance_from_mean_vector(features)
print(c)
However, I need this function to work inside TensorFlow with Keras, so a custom Lambda layer: https://www.tensorflow.org/api_docs/python/tf/keras/layers/Lambda
However, I'm not sure how to implement the above in Keras/Tensorflow since it has loops, and the way the paper talked about calculating the m_i seems to require loops like the way I implemented the above.
For reference, the PyTorch version of this code is here: https://github.com/PramuPerera/DeepOneClass
Given a feature map like:
features = np.array([
    [1, 2, 3, 4],
    [2, 4, 4, 3],
    [3, 2, 1, 4],
], dtype=np.float64)
reflecting a batch_size of
batch_size = features.shape[0]
and
k = features.shape[1]
the formulas above from the paper can be prototyped in Tensorflow as:
dim = (batch_size, features.shape[1])

def zero(i):
    arr = np.ones(dim)
    arr[i] = 0
    return arr

mapper = [zero(i) for i in range(batch_size)]
elems = (features, mapper)
m = (1 / (batch_size - 1)) * tf.map_fn(lambda x: tf.math.reduce_sum(x[0] * x[1], axis=0), elems, dtype=tf.float64)
pairs = tf.map_fn(lambda x: tf.concat(x, axis=0), tf.stack([features, m], 1), dtype=tf.float64)
compactness_loss = (1 / (batch_size * k)) * tf.map_fn(lambda x: tf.math.reduce_euclidean_norm(x), pairs, dtype=tf.float64)
with tf.Session() as sess:
    print("loss value output is: ", compactness_loss.eval())
Which yields:
loss value output is: [0.64549722 0.79056942 0.64549722]
However, a single measure is required for the batch, so it is necessary to reduce it by summing all the values.
The wanted Compactness Loss function à la Tensorflow is:
def compactness_loss(actual, features):
    features = Flatten()(features)
    k = 7 * 7 * 512
    dim = (batch_size, k)

    def zero(i):
        z = tf.zeros((1, dim[1]), dtype=tf.dtypes.float32)
        o = tf.ones((1, dim[1]), dtype=tf.dtypes.float32)
        arr = []
        for k in range(dim[0]):
            arr.append(o if k != i else z)
        res = tf.concat(arr, axis=0)
        return res

    masks = [zero(i) for i in range(batch_size)]
    m = (1 / (batch_size - 1)) * tf.map_fn(
        # row-wise summation
        lambda mask: tf.math.reduce_sum(features * mask, axis=0),
        masks,
        dtype=tf.float32,
    )
    dists = features - m
    sqrd_dists = tf.pow(dists, 2)
    red_dists = tf.math.reduce_sum(sqrd_dists, axis=1)
    compact_loss = (1 / (batch_size * k)) * tf.math.reduce_sum(red_dists)
    return compact_loss
Of course the Flatten() could be moved back into the model for convenience, and k could be derived directly from the feature map; this answers your question. You may just have some trouble figuring out what the expected inputs for the model are - feature maps from VGG16 (or any other architecture) trained on ImageNet, for instance?
The paper says:
In our formulation (shown in Figure 2 (e)), starting from a pre-trained deep model, we freeze initial features (gs) and learn (gl) and (hc). Based on the output of the classification sub-network (hc), two losses compactness loss and descriptiveness loss are evaluated. These two losses, introduced in the subsequent sections, are used to assess the quality of the learned deep feature. We use the provided one-class dataset to calculate the compactness loss. An external multi-class reference dataset is used to evaluate the descriptiveness loss. As shown in Figure 3, weights of gl and hc are learned in the proposed method through back-propagation from the composite loss. Once training is converged, system shown in setup in Figure 2(d) is used to perform classification where the resulting model is used as the pre-trained model.
then looking at the "Framework" backbone here plus:
AlexNet Binary and VGG16 Binary (Baseline). A binary CNN is trained by having ImageNet samples and one-class image samples as the two classes using AlexNet and VGG16 architectures, respectively. Testing is performed using k-nearest neighbor, One-class SVM [43], Isolation Forest [3] and Gaussian Mixture Model [3] classifiers.
This makes me wonder whether it would not be reasonable to add the suggested dense layers to both the Secondary and the Reference Networks, with a single-class output (sigmoid) or even a binary-class output (using softmax), and use mean_squared_error as the so-called Compactness Loss and binary_crossentropy as the Descriptiveness Loss; see the sketch below.
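A minimal sketch of that compile setup, assuming a two-headed model; the backbone, head names (compact_out, descr_out) and layer sizes are hypothetical, not taken from the paper:
from tensorflow.keras import layers, Model, Input

# hypothetical shared backbone with two heads
inp = Input(shape=(7 * 7 * 512,))
h = layers.Dense(128, activation='relu')(inp)
compact_out = layers.Dense(1, activation='sigmoid', name='compact_out')(h)
descr_out = layers.Dense(1, activation='sigmoid', name='descr_out')(h)
model = Model(inp, [compact_out, descr_out])

model.compile(
    optimizer='adam',
    loss={'compact_out': 'mse',                 # so-called Compactness Loss
          'descr_out': 'binary_crossentropy'})  # Descriptiveness Loss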

Why are my weights after training in this strange format?

I am currently trying to learn logistic regression, and am stuck on plotting a line from the weights after training. I am expecting an array of 3 values, but when I print the weights to check them, I get (with different values each time, but the same format):
[array([[ 0.42433906],
[-0.67847246]], dtype=float32)
array([-0.06681705], dtype=float32)]
My question is: why are the weights in this format of 2 arrays, rather than 1 array of length 3? And how do I interpret these weights so that I can plot the separating line?
Here is my code:
from tensorflow import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.regularizers import L1L2
import random
import numpy as np
# return the array data of shape (m, 2) and the array labels of shape (m, 1)
def get_random_data(w, b, mu, sigma, m):  # slope, y-intercept, mean of the data, standard deviation, size of arrays
    data = np.empty((m, 2))
    labels = np.empty((m, 1))
    # fill the arrays with random data
    for i in range(m):
        c = (random.random() > 0.5)  # 0 with probability 1/2 and 1 with probability 1/2
        n = random.normalvariate(mu, sigma)  # noise using normal distribution
        x_1 = random.random()  # uniform distribution on [0, 1)
        x_2 = w * x_1 + b + (-1)**c * n
        labels[i] = c
        data[i][0] = x_1
        data[i][1] = x_2
    # the train set is the first 80% of our data, and the test set is the following 20%
    train_length = int(round(m * 0.8, 1))
    train_data = np.empty((train_length, 2))
    train_labels = np.empty((train_length, 1))
    test_data = np.empty((m - train_length, 2))
    test_labels = np.empty((m - train_length, 1))
    for i in range(train_length):
        train_data[i] = data[i]
        train_labels[i] = labels[i]
    for i in range(train_length, m):
        test_data[i - train_length] = data[i]
        test_labels[i - train_length] = labels[i]
    return (train_data, train_labels), (test_data, test_labels)

(train_data, train_labels), (test_data, test_labels) = get_random_data(2, 3, 100, 100, 200)

model = Sequential()
model.add(Dense(train_labels.shape[1],
                activation='sigmoid',
                kernel_regularizer=L1L2(l1=0.0, l2=0.1),
                input_dim=(train_data.shape[1])))
model.compile(optimizer='sgd',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit(train_data, train_labels, epochs=100, validation_data=(test_data, test_labels))
weights = np.asarray(model.get_weights())
print("the weights are ", weights)
The first array holds the coefficient weights and the second array holds the bias.
So you have an equation like this:
h(x) = 0.42433906*x1 - 0.67847246*x2 - 0.06681705
Logistic regression takes this equation and applies the sigmoid function to squeeze the result between 0 and 1.
So if you want to draw the separating line, you can do it with the returned weights as explained above; a sketch follows below.
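A small sketch of plotting that separating line (the decision boundary is where w1*x1 + w2*x2 + b = 0), assuming the weight values printed above:
import numpy as np
import matplotlib.pyplot as plt

w1, w2 = 0.42433906, -0.67847246   # coefficient weights from the first array
b = -0.06681705                    # bias from the second array

x1 = np.linspace(0, 1, 100)
x2 = -(w1 * x1 + b) / w2           # solve w1*x1 + w2*x2 + b = 0 for x2
plt.plot(x1, x2, label="decision boundary")
plt.legend()
plt.show()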

Tensorflow with output specified as integer or float

I am working on applying DL to a regression problem where some of the outputs need to be integers while others can be floats. So far I have built a NN which returns floats for all outputs, but I want to go to the next step and actually return ints vs floats for the different outputs.
Previously I asked a question where I provided a simple example of regression for y = m * x + b, which I was able to solve on my own. In this example, how would the code be changed to ensure b is an integer while m is a float?
#!/usr/bin/env python3
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
#################
### CONSTANTS ###
#################
ARANGE = (-5.0, 5.0) # Possible values for m in training data
BRANGE = (0.0, 10.0) # Possible values for b in training data
X_MIN = 1.0
X_MAX = 9.0
N = 10 # Number of grid points
M = 2 # Number of {(x,y)} sets to train on
def gen_ab(arange, brange):
    """ arange, brange are tuples of floats """
    a = (arange[1] - arange[0])*np.random.rand() + arange[0]
    b = (brange[1] - brange[0])*np.random.rand() + brange[0]
    return (a, b)
def build_model(x_data, y_data):
    """ Build the model using input / output training data
    Args:
        x_data (np array): Size (m, n*2) grid of input training data.
        y_data (np array): Size (m, 2) grid of output training data.
    Returns:
        model (Sequential model)
    """
    model = keras.Sequential()
    model.add(layers.Dense(64, activation='relu', input_dim=len(x_data[0])))
    model.add(layers.Dense(len(y_data[0])))
    optimizer = tf.keras.optimizers.RMSprop(0.001)
    model.compile(loss='mse', optimizer=optimizer, metrics=['mae', 'mse'])
    return model
def gen_data(xs, arange, brange, m):
    """ Generate training data for lines of y = m*x + b
    Args:
        xs (list): Grid points (size N)
        arange (tuple): Range to use for a (a_min, a_max)
        brange (tuple): Range to use for b (b_min, b_max)
        m (int): Number of y grids to generate
    Returns:
        x_data (np array): Size (m, n*2) grid of input training data.
        y_data (np array): Size (m, 2) grid of output training data.
    """
    n = len(xs)
    x_data = np.zeros((m, 2*n))
    y_data = np.zeros((m, 2))
    for ix in range(m):
        (a, b) = gen_ab(arange, brange)
        ys = a*xs + b*np.ones(xs.size)
        x_data[ix, :] = np.concatenate((xs, ys))
        y_data[ix, :] = [a, b]
    return (x_data, y_data)
def main():
    """ Main routine """
    # Generate the x axis grid to be used for all training sets
    xs = np.linspace(X_MIN, X_MAX, N)
    # Generate the training data
    # x_train has M rows (M is the number of training samples)
    # x_train has 2*N columns (first N columns are x, second N columns are y)
    # y_train has M rows, each of which has two columns (a, b) for y = ax + b
    (x_train, y_train) = gen_data(xs, ARANGE, BRANGE, M)
    model = build_model(x_train, y_train)
    model.fit(x_train, y_train, epochs=10, batch_size=32)
    model.summary()
    ####################
    ### Test example ###
    ####################
    (a, b) = gen_ab(ARANGE, BRANGE)
    ys = a*xs + b*np.ones(xs.size)
    rys = np.concatenate((xs, ys))
    ab1 = model.predict(x_train)
    ab2 = model.predict(np.array([rys]))

if __name__ == "__main__":
    main()
I think this is possible, but not as trivial as it sounds. Unfortunately you can't simply get the NN to output an int and a float and keep the plain MSE loss you are using, because the discrete nature of the int values prevents the loss function from being continuously differentiable, which the optimisers need.
If you really wanted to do it, you could treat the int output variable as a multi-class output (keeping the float output as a regression output). You would need to craft a loss function out of the combination of these two outputs (multi-class + float): one-hot-encode the integer targets and put a softmax on the multi-class output, as sketched below. An interesting complication is that the neural network would not know that the multi-class output is actually ordinal (ordered, since 1<2<3<4 etc.). There have been interesting attempts in the past to help NNs realise this (see Neural Network Ordinal Classification for Age).
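A minimal sketch of that idea for the y = m*x + b example, assuming b is restricted to the integers 0..10; the head names, layer sizes and class range are my own choices, not from the question:
import tensorflow as tf
from tensorflow.keras import layers

N = 10            # grid points per sample, as in the question
B_CLASSES = 11    # b treated as a class in {0, 1, ..., 10}

inputs = tf.keras.Input(shape=(2 * N,))
h = layers.Dense(64, activation='relu')(inputs)
m_out = layers.Dense(1, name='m_out')(h)                                 # float slope (regression)
b_out = layers.Dense(B_CLASSES, activation='softmax', name='b_out')(h)   # int intercept as classes
model = tf.keras.Model(inputs, [m_out, b_out])

model.compile(optimizer='adam',
              loss={'m_out': 'mse',
                    'b_out': 'sparse_categorical_crossentropy'})

# training targets: m stays a float, b becomes an integer class index, e.g.
# model.fit(x_train, {'m_out': m_targets, 'b_out': b_int_targets}, epochs=10)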
