As the title says, I am trying to create a mixture of multivariate normal distributions using the TensorFlow Probability package.
In my original project, I feed the weights of the categorical, the loc, and the variance from the output of a neural network. However, when creating the graph, I get the following error:
components[0] batch shape must be compatible with cat shape and other component batch shapes
I recreated the same problem using placeholders:
import tensorflow as tf
import tensorflow_probability as tfp  # dist = tfp.distributions
tf.compat.v1.disable_eager_execution()
sess = tf.compat.v1.InteractiveSession()

l1 = tf.compat.v1.placeholder(dtype=tf.float32, shape=[None, 2], name='observations_1')
l2 = tf.compat.v1.placeholder(dtype=tf.float32, shape=[None, 2], name='observations_2')
log_std = tf.compat.v1.get_variable('log_std', [1, 2], dtype=tf.float32,
                                    initializer=tf.constant_initializer(1.0),
                                    trainable=True)
mix = tf.compat.v1.placeholder(dtype=tf.float32, shape=[None, 1], name='weights')

cat = tfp.distributions.Categorical(probs=[mix, 1.-mix])
components = [
    tfp.distributions.MultivariateNormalDiag(loc=l1, scale_diag=tf.exp(log_std)),
    tfp.distributions.MultivariateNormalDiag(loc=l2, scale_diag=tf.exp(log_std)),
]
bimix_gauss = tfp.distributions.Mixture(
    cat=cat,
    components=components)
So, my question is: what am I doing wrong? I looked into the error, and it seems tensorshape_util.is_compatible_with is what raises it, but I don't see why.
Thanks!
When the components are the same type, MixtureSameFamily should be more performant.
There you only pass a single Categorical instance (with .batch_shape [b1,b2,...,bn]) and a single MVNDiag instance (with .batch_shape [b1,b2,...,bn,numcats]).
For only two classes, I wonder if Bernoulli would work?
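For concreteness, here is a minimal sketch of that MixtureSameFamily setup, reusing the names from the question; stacking l1/l2 into a components axis and broadcasting log_std are my additions, not code from the question:

tfd = tfp.distributions

# Stack the two locations along a components axis: shape [batch, 2 components, 2 dims].
locs = tf.stack([l1, l2], axis=1)
# Broadcast the shared scale to the same [batch, 2, 2] shape.
scales = tf.exp(log_std) * tf.ones_like(locs)

bimix_gauss = tfd.MixtureSameFamily(
    mixture_distribution=tfd.Categorical(
        probs=tf.concat([mix, 1. - mix], axis=1)),  # batch_shape [batch]
    components_distribution=tfd.MultivariateNormalDiag(
        loc=locs, scale_diag=scales))               # batch_shape [batch, 2]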
It seems you provided a mis-shaped input to tfp.distributions.Categorical. Its probs parameter should have shape [batch_size, cat_size], while the one you provide is rather [cat_size, batch_size, 1]. So maybe try to parametrize probs with tf.concat([mix, 1.-mix], 1).
There may also be a problem with your log_std, which doesn't have the same shape as l1 and l2. In case MultivariateNormalDiag doesn't properly broadcast it, try to specify its shape as (None, 2), or tile it so that its first dimension corresponds to that of your location parameters.
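Putting both suggestions together, a hedged sketch of the fixed graph (names from the question; the tf.tile call is one possible way to do the suggested tiling):

cat = tfp.distributions.Categorical(probs=tf.concat([mix, 1. - mix], axis=1))

# Tile log_std so its batch dimension matches that of l1/l2, in case
# MultivariateNormalDiag does not broadcast it for you.
scale = tf.exp(tf.tile(log_std, [tf.shape(l1)[0], 1]))
components = [
    tfp.distributions.MultivariateNormalDiag(loc=l1, scale_diag=scale),
    tfp.distributions.MultivariateNormalDiag(loc=l2, scale_diag=scale),
]
bimix_gauss = tfp.distributions.Mixture(cat=cat, components=components)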
I'm trying to recreate a transformer that was written in PyTorch and port it to TensorFlow. Everything was going pretty well until each version of MultiHeadAttention started giving extremely different outputs. Both are implementations of multi-headed attention as described in the paper "Attention Is All You Need", so they should be able to achieve the same output.
I'm converting
self_attn = nn.MultiheadAttention(dModel, nheads, dropout=dropout)
to
self_attn = MultiHeadAttention(num_heads=nheads, key_dim=dModel, dropout=dropout)
For my tests, dropout is 0.
I'm calling them with:
self_attn(x, x, x)
where x is a tensor with shape=(10, 128, 50)
As expected from the documentation, the PyTorch version returns a tuple (the target sequence length, embedding dimension), both with dimensions [10, 128, 50].
I'm having trouble getting the TensorFlow version to do the same thing. With TensorFlow I get only one tensor back (of size [10, 128, 50]), and it looks like neither the target sequence length nor the embedding dimension tensor from PyTorch.
Based on the TensorFlow documentation, I should be getting something comparable.
How can I get them to operate the same way? I'm guessing I'm doing something wrong with TensorFlow, but I can't figure out what.
nn.MultiheadAttention by default outputs a tuple of two tensors:
attn_output -- the result of the self-attention operation
attn_output_weights -- the attention weights, averaged(!) over the heads
At the same time, tf.keras.layers.MultiHeadAttention by default outputs only one tensor, attention_output (which corresponds to PyTorch's attn_output). The attention weights of all heads will also be returned if the parameter return_attention_scores is set to True, like:
output, scores = self_attn(x, x, x, return_attention_scores=True)
The scores tensor should also be averaged over the head axis to achieve full correspondence with PyTorch:
scores = tf.math.reduce_mean(scores, 1)
While rewriting, keep in mind that by default (as in the snippet in the question) nn.MultiheadAttention expects input in the form (seq_length, batch_size, embed_dim), whereas tf.keras.layers.MultiHeadAttention expects it in the form (batch_size, seq_length, embed_dim).
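Putting it all together, a minimal sketch of aligning the two APIs; x is assumed to be in PyTorch's (seq, batch, embed) layout, as in the question:

x_tf = tf.transpose(x, [1, 0, 2])  # to TF's (batch_size, seq_length, embed_dim)
output, scores = self_attn(x_tf, x_tf, x_tf, return_attention_scores=True)
scores = tf.math.reduce_mean(scores, 1)   # average over heads, as PyTorch does
output = tf.transpose(output, [1, 0, 2])  # back to (seq, batch, embed) to compare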
I need a way to access the weight matrix in TensorFlow or Keras within each iteration, so that I can convert it into a format I can use in NumPy, carry out certain operations on it, and then send it back to TensorFlow.
For example, I want to change my filter such that some of the neurons are determined by other neurons of the filter. They have to be obtained as solutions of linear systems with the other neurons as coefficients, not by the learning process. As I could not find a way to do this in TensorFlow or Keras, I have to use NumPy.
I have found many questions with the same or similar titles, but none of them helped. I would appreciate any hints.
EDIT
Let me explain the problem more clearly. Consider the following code:
import tensorflow as tf
import numpy as np

x = tf.placeholder(tf.float32, (1, 5, 5, 1))
y = tf.placeholder(tf.float32, (1))

# create variables
weights = {
    "my_filter": tf.Variable(tf.truncated_normal([3, 3, 1, 1]), name="my_filter"),
    "f_c": tf.Variable(tf.truncated_normal([25, 1]), name="f_c")}

conv = tf.nn.conv2d(x, weights["my_filter"], [1, 1, 1, 1], padding='SAME')
flatten = tf.reshape(conv, [1, 25])
logits = tf.matmul(flatten, weights["f_c"])
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))

optimizer = tf.train.AdamOptimizer()
grads_and_vars = optimizer.compute_gradients(cost)
# In this part, before applying the gradients, I have to apply some
# complicated mathematical operation.
train_op = optimizer.apply_gradients(grads_and_vars)

train_epochs = 10
input_x = np.arange(25).reshape([1, 5, 5, 1])
input_y = np.arange(1)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(train_epochs):
        sess.run(train_op, feed_dict={x: input_x, y: input_y})
I have a 3*3 filter named my_filter, and I want all its elements to be trained except for one of them, for example the (1,1) element, and I need that element to be determined by the rest of the elements. This has to be done in each iteration, and that is exactly where my problem is. I know how to access the weight matrix after training is finished, but I do not know how to do this within each iteration.
In my code, I first compute the gradients, then make the changes, and then apply the gradients. But the problem is that the gradients come back as tuples of tensors, which are not easy to work with in NumPy. I need some method of converting these data to more familiar NumPy types.
Keras layers, and tf.keras.layers layers, support get_weights / set_weights methods, which return NumPy arrays for the weights. So you can call get_weights, modify the result in NumPy, and call set_weights to put the new NumPy values back into TensorFlow.
Something like this:
model = tf.keras.Sequential(...)

for batch in data:
    model.fit(batch)
    if ...:
        weights_as_numpy = model.get_weights()
        # modify the weights
        model.set_weights(weights_as_numpy)
For that, you will need access to the weights. Instead of defining a layer using tf.layers, which automatically allocates its variables, you can create the variable yourself first and then call the corresponding tf.nn op.
# input
x = tf.placeholder(tf.float32, (1, 5, 5, 1))
dummy_input = np.arange(25).reshape([1, 5, 5, 1])

# create variable
w = tf.get_variable('weight', [3, 3, 1, 1])

# assign the variable to a layer, e.g. conv
y = tf.nn.conv2d(x, w, [1, 1, 1, 1], padding='SAME')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # read the weight
    random_weight = sess.run(w, feed_dict={x: dummy_input})
    print('random weight', random_weight)
    # create some new values for the weight
    new_weight = np.arange(9).reshape([3, 3, 1, 1])
    # load it into the variable
    w.load(new_weight, sess)
    # read back and print to verify
    new_weight = sess.run(w, feed_dict={x: dummy_input})
    print('new weight', new_weight)
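For the asker's actual loop (modifying gradients in NumPy between compute_gradients and apply_gradients), one common pattern is to fetch the gradients, edit them, and feed the edited arrays back in through placeholders. A hedged sketch, reusing the names from the question's code; modify() is a hypothetical stand-in for the complicated mathematical operation:

grads_and_vars = optimizer.compute_gradients(cost)
# One placeholder per gradient, so edited NumPy arrays can be fed back in.
grad_placeholders = [(tf.placeholder(tf.float32, shape=v.get_shape()), v)
                     for g, v in grads_and_vars]
train_op = optimizer.apply_gradients(grad_placeholders)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(train_epochs):
        # 1. Evaluate the symbolic gradients into plain NumPy arrays.
        np_grads = sess.run([g for g, v in grads_and_vars],
                            feed_dict={x: input_x, y: input_y})
        # 2. Modify them in NumPy (modify() is hypothetical).
        np_grads = [modify(g) for g in np_grads]
        # 3. Apply the modified gradients.
        feed = {p: g for (p, _), g in zip(grad_placeholders, np_grads)}
        sess.run(train_op, feed_dict=feed)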
I am working with TensorFlow 1.0.1.
I am trying to modify an example found at:
https://github.com/nfmcclure/tensorflow_cookbook/tree/master/03_Linear_Regression/07_Implementing_Elasticnet_Regression
My model is a simple linear model
x_data = tf.placeholder(shape=[None, 3], dtype=tf.float32)
y_target = tf.placeholder(shape=[None, 1], dtype=tf.float32)
# Create variables for linear regression
A = tf.Variable(tf.random_normal(shape=[3,1]))
b = tf.Variable(tf.random_normal(shape=[1,1]))
# Declare model operations
model_output = tf.add(tf.matmul(x_data, A), b)
Specifically, I would like to add another L0 penalty term to the model loss, the same way as is done with the L2 norm:
l2_a_loss = tf.reduce_mean(tf.square(A))
elastic_param2 = tf.constant(1.)
e2_term = tf.multiply(elastic_param2, l2_a_loss)
However, I cannot compute a loss using the L0 norm:
elastic_param0= tf.constant(1.)
l0_a_loss= tf.reduce_mean(tf.norm(A,ord=0))
e0_term= tf.multiply(elastic_param0, l0_a_loss)
Plugging the additional term into the model loss
loss = tf.expand_dims(tf.add(tf.add(tf.reduce_mean(tf.square(y_target - model_output)), e0_term), e2_term), 0)
returns
ValueError: 'ord' must be a supported vector norm, got 0.
I was hoping that changing the value of the axis argument would fix it, but with
l0_a_loss= tf.reduce_mean(tf.norm(A,ord=0,axis=(0,1)))
I still get
ValueError: 'ord' must be a supported matrix norm in ['euclidean', 'fro', 1, inf], got 0
How to minimize the L-0 norm of A in this model?
The TensorFlow documentation is wrong (even in the current 1.3 version).
As you can see from this commit:
Fix description of tf.norm as it doesn't actually accept ord=0.
This means that you have to implement the L0 norm yourself; you can't use tf.norm.
I have temporarily solved this with:
l0_a_loss = tf.cast(tf.count_nonzero(A), tf.float32)
Looking forward to an official documentation/code update in TensorFlow.
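One caveat worth adding to this workaround, as a hedged note: tf.count_nonzero is piecewise constant in A, so the resulting term has zero gradient almost everywhere. Plugged into the loss from the question, it is recorded in the objective, but gradient descent receives no signal from it:

e0_term = tf.multiply(elastic_param0, tf.cast(tf.count_nonzero(A), tf.float32))
# The e0_term path contributes nothing to d(loss)/dA; only the squared-error
# and L2 terms drive the updates.
loss = tf.expand_dims(
    tf.add(tf.add(tf.reduce_mean(tf.square(y_target - model_output)), e0_term),
           e2_term), 0)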
I need to create a random variable inside my model_fn(), having shape [batch_size, 20].
I do not want to pass batch_size as an argument, because then I cannot use a different batch size for prediction.
Removing the parts which do not concern this question, my model_fn() is:
def model(inp, out):
    # batch_size is the value I do not want to hardcode
    eps = tf.random_normal([batch_size, 20], 0, 1, name="eps")
    # dummy example
    predictions = tf.add(inp, eps)
    return predictions, 1
If I replace [batch_size, 20] with inp.get_shape(), I get
ValueError: Cannot convert a partially known TensorShape to a Tensor: (?, 20)
when running myclf.setup_training().
If I try
def model(inp, out):
    batch_size = tf.placeholder("float", [])
    eps = tf.random_normal([batch_size.eval(), 20], 0, 1, name="eps")
    # dummy example
    predictions = tf.add(inp, eps)
    return predictions, 1
I get ValueError: Cannot evaluate tensor using eval(): No default session is registered. Use with sess.as_default() or pass an explicit session to eval(session=sess) (understandably, because I have not provided a feed_dict).
How can I access the value of batch_size inside model_fn(), while remaining able to change it during prediction?
I wasn't aware of the difference between Tensor.get_shape() and tf.shape(Tensor). The latter works:
eps = tf.random_normal(tf.shape(inp), 0, 1, name="eps")
As mentioned in the TensorFlow 0.8 FAQ:
How do I build a graph that works with variable batch sizes?

It is often useful to build a graph that works with variable batch sizes, for example so that the same code can be used for (mini-)batch training and single-instance inference. The resulting graph can be saved as a protocol buffer and imported into another program.

When building a variable-size graph, the most important thing to remember is not to encode the batch size as a Python constant, but instead to use a symbolic Tensor to represent it. The following tips may be useful:

Use batch_size = tf.shape(input)[0] to extract the batch dimension from a Tensor called input, and store it in a Tensor called batch_size.

Use tf.reduce_mean() instead of tf.reduce_sum(...) / batch_size.

If you use placeholders for feeding input, you can specify a variable batch dimension by creating the placeholder with tf.placeholder(..., shape=[None, ...]). The None element of the shape corresponds to a variable-sized dimension.
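Applied to the model in the question, the first tip gives this minimal sketch (names taken from the question):

def model(inp, out):
    batch_size = tf.shape(inp)[0]  # symbolic batch size, resolved at run time
    eps = tf.random_normal([batch_size, 20], 0, 1, name="eps")
    # dummy example
    predictions = tf.add(inp, eps)
    return predictions, 1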
In IPython I imported tensorflow as tf and numpy as np and created a TensorFlow InteractiveSession.
When I am running or initializing some normal distribution with numpy input, everything runs fine:
some_test = tf.constant(np.random.normal(loc=0.0, scale=1.0, size=(2, 2)))
session.run(some_test)
Returns:
array([[-0.04152317,  0.19786302],
       [-0.68232622, -0.23439092]])
Just as expected.
...but when I use the TensorFlow normal distribution function:
some_test = tf.constant(tf.random_normal([2, 2], mean=0.0, stddev=1.0, dtype=tf.float32))
session.run(some_test)
...it raises a TypeError saying:
(...)
TypeError: List of Tensors when single Tensor expected
What am I missing here?
The output of:
sess.run(tf.random_normal([2, 2], mean=0.0, stddev=1.0, dtype=tf.float32))
alone returns exactly the same kind of thing that np.random.normal generates -> a matrix of shape (2, 2) with values taken from a normal distribution.
The tf.constant() op takes a numpy array (or something implicitly convertible to a numpy array), and returns a tf.Tensor whose value is the same as that array. It does not accept a tf.Tensor as its argument.
On the other hand, the tf.random_normal() op returns a tf.Tensor whose value is generated randomly according to the given distribution each time it runs. Since it returns a tf.Tensor, it cannot be used as the argument to tf.constant(). This explains the TypeError (which is unrelated to the use of tf.InteractiveSession, since it occurs when you build the graph).
I'm assuming you want your graph to include a tensor that (i) is randomly generated on its first use, and (ii) constant thereafter. There are two ways to do this:
Use NumPy to generate the random value and put it in a tf.constant(), as you did in your question:
some_test = tf.constant(
np.random.normal(loc=0.0, scale=1.0, size=(2, 2)).astype(np.float32))
(Potentially faster, as it can use the GPU to generate the random numbers) Use TensorFlow to generate the random value and put it in a tf.Variable:
some_test = tf.Variable(
    tf.random_normal([2, 2], mean=0.0, stddev=1.0, dtype=tf.float32))
sess.run(some_test.initializer)  # Must run this before using `some_test`
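A quick usage check of the second option: after the initializer has run, every sess.run of some_test returns the same values, confirming the tensor is random on initialization and constant thereafter.

print(sess.run(some_test))  # some random draw
print(sess.run(some_test))  # the same values again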