I want to use a conditional variational autoencoder to generate cocktail recipes. I modified the code from this repo so it can read my own data. The input is an array over all possible ingredients, so most of the entries are 0. If an ingredient is present, its value is the amount normalized by 250 ml. The last index is what is 'left over', to make sure a cocktail always adds up to 1.
Example:
0,0.0,0.0,0.24,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.6,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.088,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0120000000000000
The output with a softmax activation function looks a bit like this:
[5.8228267e-10 6.7397465e-10 1.9761790e-08 2.3713847e-01 3.1315527e-11
4.9592632e-11 4.2637563e-05 7.6098106e-10 2.9357905e-05 1.3291576e-08
2.6885323e-09 4.2986945e-10 3.0274603e-09 8.6994453e-11 3.2391853e-10
3.3694150e-10 4.9642315e-11 2.2861177e-10 2.5966980e-11 3.3872125e-10
4.8175470e-12 1.1207919e-09 7.8108942e-10 1.0438563e-09 4.7190268e-12
2.2692757e-09 3.3177341e-10 4.7493649e-09 1.6603904e-08 2.7854623e-11
1.1586791e-07 2.3917833e-08 1.0172608e-09 2.2049740e-06 4.0200213e-10
4.8334226e-05 1.9393491e-09 4.0731374e-10 4.5671125e-10 8.5878060e-10
1.3625046e-10 1.7755342e-09 2.4927729e-09 3.8919952e-09 1.6791472e-10
1.5160178e-09 9.0631114e-10 1.2043951e-08 2.1420650e-01 1.4531254e-10
3.9913628e-10 4.6368896e-06 6.8399265e-11 2.4654754e-09 6.5392605e-12
5.8443012e-10 2.7861690e-11 4.7215394e-08 5.1503157e-09 5.4484850e-10
1.9266211e-10 7.2835156e-09 6.4243433e-10 4.2432866e-09 4.2630177e-08
1.1281617e-12 1.8015703e-08 3.5657147e-10 3.4241193e-11 4.8394988e-10
9.6064046e-11 2.9857121e-02 3.8048144e-11 1.1893182e-10 5.1867032e-01]
How can I make sure that the values are only distributed among a couple of ingredients and the rest of the ingredients get 0, similar to the input?
Is this a matter of changing the activation functions?
Thanks :)
I'm not sure you want to use probabilities here. It seems you're doing a regression to some specific values. Hence, it would make sense to not use a softmax, and use a simple mean-squared-error loss.
Note that if certain values consistently dominate your loss, you can just put an extra weight on them in the loss, or use some abstraction (e.g. Keras's class_weight).
You could also consider using Keras for this task; there is a variational autoencoder example checked into master: https://github.com/keras-team/keras/blob/master/examples/variational_autoencoder.py
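To make the regression suggestion concrete, here is a minimal sketch (made-up layer sizes; 75 matches the length of the softmax output vector in your example) of what the decoder's output end could look like, with a non-softmax activation and an MSE loss:
from keras.layers import Input, Dense
from keras.models import Model

n_ingredients = 75  # length of the ingredient vector in your example
latent_dim = 8      # made-up latent size

decoder_input = Input(shape=(latent_dim,))
hidden = Dense(32, activation='relu')(decoder_input)
# relu (or linear) outputs amounts directly instead of a probability
# distribution, so most entries can simply be driven to 0
amounts = Dense(n_ingredients, activation='relu')(hidden)

decoder = Model(decoder_input, amounts)
decoder.compile(optimizer='adam', loss='mse')  # regression, not classification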
For this task it might actually make sense to use a GAN: https://github.com/keras-team/keras/blob/master/examples/mnist_acgan.py . You'd let it distinguish between a random cocktail and a 'real' cocktail. While it learns to tell the two apart, it will also train the weights of a generator that can create cocktails for you!
Related
I want to extract features of an optical image and save them into a numpy array. I've seen similar questions, and also this: https://keras.io/getting_started/faq/#how-can-i-obtain-the-output-of-an-intermediate-layer-feature-extraction , but I don't know how to go about it.
The Keras documentation specifies exactly how to do that. If you have defined your model model_full, you can create another one that is just a part of it, from the input layer to the layer you're interested in.
from keras.models import Model

model_part = Model(
    inputs=model_full.input,
    outputs=model_full.get_layer("intermed_layer").output)
Then you should be able to obtain output from intermediate layer using:
intermed_output = model_part(data)
In order to do that, you just need a model_full defined, which I assume you already have.
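For completeness, a minimal sketch with a made-up model_full (the layer name "intermed_layer" and all sizes are arbitrary):
import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

# made-up full model; only the named layer matters for the extraction
inputs = Input(shape=(64,))
x = Dense(32, activation='relu', name="intermed_layer")(inputs)
outputs = Dense(10, activation='softmax')(x)
model_full = Model(inputs, outputs)

model_part = Model(
    inputs=model_full.input,
    outputs=model_full.get_layer("intermed_layer").output)

data = np.random.rand(4, 64)         # 4 fake samples
features = model_part.predict(data)  # numpy array of intermediate features
print(features.shape)                # (4, 32)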
2nd approach
You can also use a built-in Keras function, which I guess you already saw in the documentation as well. It may look kind of complicated at first, but it's just creating a function with bound values, i.e.:
from keras import backend as K

get_3rd_layer_output = K.function(
    [model.layers[0].input],   # the function's argument feeds layers[0].input
    [model.layers[3].output])  # and the function returns the output of layers[3]

# here X is the input and the function returns the output of layers[3]
output = get_3rd_layer_output([X])[0]
Again, a model has to be defined first. I'm not sure if there are any other requirements apart from that.
How can I prune the weights of a CNN (convolutional neural network) model that are below a threshold value (let's say prune all weights which are <= 1)?
How can we achieve that for a weights file saved in .pth format in PyTorch?
PyTorch provides model pruning out of the box since 1.4.0; see the official tutorial.
As there is currently no threshold-based pruning method in PyTorch, you have to implement it yourself, though it's pretty easy once you get the overall idea.
Threshold Pruning method
Below is a code performing pruning:
import torch
from torch.nn.utils import prune

class ThresholdPruning(prune.BasePruningMethod):
    PRUNING_TYPE = "unstructured"

    def __init__(self, threshold):
        self.threshold = threshold

    def compute_mask(self, tensor, default_mask):
        return torch.abs(tensor) > self.threshold
Explanation:
PRUNING_TYPE can be one of global, structured, unstructured. global acts across a whole module (e.g. remove the 20% of weights with the smallest values), structured acts on whole channels/modules. We need unstructured, as we would like to modify each connection in a specific parameter tensor (say weight or bias)
__init__ - pass here whatever you want or need to make it work, normal stuff
compute_mask - computes the mask used to prune the specific tensor. In our case, all parameters below the threshold should be zeroed. I did it with the absolute value, as that makes more sense. default_mask is not needed here, but is kept as a named parameter because that's what the API requires at the moment.
Moreover, inheriting from prune.BasePruningMethod defines methods to apply the mask to each parameter, make pruning permanent etc. See base class docs for more info.
Example module
Nothing too fancy, you can put anything you want here:
class MyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.first = torch.nn.Linear(50, 30)
        self.second = torch.nn.Linear(30, 10)

    def forward(self, inputs):
        return self.second(torch.relu(self.first(inputs)))

module = MyModule()
You can also load your module via module = torch.load('checkpoint.pth') if you need to; it doesn't matter.
Prune module's parameters
We should define which parameter of our module (and whether it's weight or bias) should be pruned, like this:
parameters_to_prune = ((module.first, "weight"), (module.second, "weight"))
Now we can apply our unstructured pruning globally to all defined parameters (threshold is passed as a kwarg to __init__ of ThresholdPruning):
prune.global_unstructured(
    parameters_to_prune, pruning_method=ThresholdPruning, threshold=0.1
)
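Alternatively (a sketch of the per-tensor route), a custom method can be applied to a single parameter via the apply classmethod inherited from prune.BasePruningMethod:
# per-parameter application; `apply` forwards extra kwargs to __init__
ThresholdPruning.apply(module.second, "weight", threshold=0.1)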
Results
weight attribute
To see the effect, simply check the weights of the first submodule:
print(module.first.weight)
It is the weight with our pruning technique applied, but please notice it's not a torch.nn.Parameter anymore! It is now simply an attribute of our model, hence, as things stand, it won't take part in training or evaluation as a parameter.
weight_mask
We can check the created mask via module.first.weight_mask to see that everything was done correctly (it will be binary in this case).
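For example, a quick sketch to confirm the mask is binary and to see how sparse the layer became:
mask = module.first.weight_mask
print(mask.unique())                     # expect a binary mask (only 0/1 values)
print(1.0 - mask.float().mean().item())  # fraction of pruned connections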
weight_orig
Applying pruning creates a new torch.nn.Parameter holding the original weights, named name + '_orig', in this case weight_orig; let's see:
print(module.first.weight_orig)
This parameter is what will actually be used during training and evaluation! Applying pruning via the methods described above adds forward_pre_hooks which "switch" the original weight to weight_orig on every forward pass.
Thanks to this approach, you can define and apply your pruning at any point of training or inference without "destroying" the original weights.
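You can check the relationship yourself (a sketch; the attribute names follow the pruning API described above):
# the effective weight is recomputed on each forward pass as
# weight_orig * weight_mask
m = module.first
assert torch.equal(m.weight, m.weight_orig * m.weight_mask.to(m.weight_orig.dtype))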
Applying pruning permanently
If you wish to apply pruning permanently simply issue:
prune.remove(module.first, "weight")
And now our module.first.weight is once again a parameter, with its entries appropriately pruned; module.first.weight_mask is removed, and so is module.first.weight_orig. This is probably what you are after.
You can iterate over children to make it permanent:
for child in module.children():
    prune.remove(child, "weight")
You could define parameters_to_prune using the same logic:
parameters_to_prune = [(child, "weight") for child in module.children()]
Or if you want only convolution layers to be pruned (or anything else really):
parameters_to_prune = [
    (child, "weight")
    for child in module.children()
    if isinstance(child, torch.nn.Conv2d)
]
Advantages
uses "PyTorch way of pruning" so it's easier to communicate your intent to other programmers
define pruning on a per-tensor basis, single responsibility instead of going through everything
confines you to predefined, standard ways of doing it
pruning is not permanent, hence you can recover from it if needed. The module can be saved with its pruning masks and original weights, so it leaves you some room to revert a possible mistake (e.g. the threshold was too high and now all your weights are zero, rendering the results meaningless)
works with original weights during forward calls unless you want to finally change to pruned version (simple call to remove)
Disadvantages
IMO pruning API could be clearer
it can be done in fewer lines (as shown in Shai's answer)
might be confusing for those who don't know such a thing is "defined" by PyTorch (still, there are tutorials and docs, so I don't think it's a major problem)
You can work directly on the values saved in the state_dict:
sd = torch.load('saved_weights.pth')  # load the state dict
for k in sd.keys():
    if not 'weight' in k:
        continue  # skip biases and other saved parameters
    w = sd[k]
    sd[k] = w * (w.abs() > thr)  # zero out weights whose magnitude is below thr
torch.save(sd, 'pruned_weights.pth')
Context
I'm reading through part II of Hands on ML and am looking for some clarity on when to use "outputs" and when to use "state" in the loss calculation for a RNN.
In the book (p. 396 for those who have it), the author says, "Note that the fully connected layer is connected to the states tensor, which contains only the final states of the RNN," referring to a sequence classifier that is unrolled over 28 steps. Since the states variable will have len(states) == <number_of_hidden_layers>, when building a deep RNN I have been using states[-1] to connect only to the final state of the final layer. For example:
# hidden_layer_architecture = list of ints defining n_neurons in each layer
# example: hidden_layer_architecture = [100 for _ in range(5)]
layers = []
for layer_id, n_neurons in enumerate(hidden_layer_architecture):
    hidden_layer = tf.contrib.rnn.BasicRNNCell(n_neurons,
                                               activation=tf.nn.tanh,
                                               name=f'hidden_layer_{layer_id}')
    layers.append(hidden_layer)

recurrent_hidden_layers = tf.contrib.rnn.MultiRNNCell(layers)

outputs, states = tf.nn.dynamic_rnn(recurrent_hidden_layers,
                                    X_, dtype=tf.float32)

logits = tf.layers.dense(states[-1], n_outputs, name='outputs')
This works as expected given the author's previous statement. However, I don't understand when one would use the outputs variable (the first output of tf.nn.dynamic_rnn()).
I have looked at this question, which does a pretty good job of answering the minutiae, and it mentions that, "If you are only interested in the last output of the cell, you can just slice the time dimension to pick just the last element (e.g. outputs[:, -1, :])." I inferred this to mean something along the lines of states[-1] == outputs[:, -1, :], which, when tested, was false. Why is this not the case? If outputs contains the outputs of the cell at each time step, shouldn't its last time step equal the final state? In general...
Question
When does one use the outputs variable from tf.nn.dynamic_rnn() in the loss function and when would one use the states variable? How does this change the abstracted architecture of the network?
Any clarity would be greatly appreciated.
This basically breaks down as follows:
outputs: Full sequence of outputs of the top level of the RNN. This means that, should you be using MultiRNNCell, this will only be the top cell; nothing from the lower cells is in here.
In general, with custom RNNCell implementations, this could be pretty much anything; however, virtually all the standard cells return the sequence of states here. You could also write a custom cell yourself that does something to the state sequence (e.g. a linear transformation) before returning it as outputs.
state (note that this is what the docs call it, not states) is the full state of the last time step. One important difference is that, in the case of MultiRNNCell, this will contain the final states of all cells in the sequence, not just the top one! Also, the precise format/type of this output varies heavily depending on the RNNCell used (e.g. it could be a tensor, or a tuple of tensors...).
As such, if all you care about is the top-most state of the last time step in a MultiRNNCell, you really have two options that should be identical, coming down to personal preference/"clarity":
outputs[:, -1, :] (assuming batch-major format) extracts only the last time-step from the sequence of top-level states.
state[-1] extracts only the top-level state from the tuple of final states for all layers.
There are other scenarios where you might not have this choice:
If you actually need the full sequence output, you need to use outputs.
If you need the final states from lower layers in a MultiRNNCell, you need to use state.
As for why the equality check fails: if you actually used ==, I believe this checks equality of the tensor objects, which are obviously different. You could instead inspect the values of the two objects for some simple toy scenario (tiny state size/sequence length); they should be the same.
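For example, a minimal sketch (assuming the graph built in the question plus some toy batch X_batch, which the question does not define) that compares values instead of tensor objects:
import numpy as np

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    last_output, top_final_state = sess.run(
        [outputs[:, -1, :], states[-1]],
        feed_dict={X_: X_batch})  # X_batch: any toy input batch
    print(np.allclose(last_output, top_final_state))  # should print True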
I just recently started playing around with Keras and got into making custom layers. However, I am rather confused by the many different types of layers with slightly different names but with the same functionality.
For example, there are 3 different forms of the concatenate function from https://keras.io/layers/merge/ and https://www.tensorflow.org/api_docs/python/tf/keras/backend/concatenate
keras.layers.Concatenate(axis=-1)
keras.layers.concatenate(inputs, axis=-1)
tf.keras.backend.concatenate()
I know the 2nd one is used for the functional API, but what is the difference between the 3? The documentation seems a bit unclear on this.
Also, for the 3rd one, I have seen code that does the following. Why must there be the line ._keras_shape after the concatenation?
# Concatenate the summed atom and bond features
atoms_bonds_features = K.concatenate([atoms, summed_bond_features], axis=-1)
# Compute fingerprint
atoms_bonds_features._keras_shape = (None, max_atoms, num_atom_features + num_bond_features)
Lastly, under keras.layers, there always seem to be 2 duplicates. For example, Add() and add(), and so on.
First, the backend: tf.keras.backend.concatenate()
Backend functions are supposed to be used "inside" layers. You'd only use them in Lambda layers, custom layers, custom loss functions, custom metrics, etc.
It works directly on "tensors".
It's not the right choice if you're not going deep into customization. (And it was a bad choice in your example code; see details at the end.)
If you dive deep into keras code, you will notice that the Concatenate layer uses this function internally:
import keras.backend as K

class Concatenate(_Merge):
    #blablabla
    def _merge_function(self, inputs):
        return K.concatenate(inputs, axis=self.axis)
    #blablabla
Then, the Layer: keras.layers.Concatenate(axis=-1)
As with any other Keras layer, you instantiate it and call it on tensors.
Pretty straightforward:
#in a functional API model:
inputTensor1 = Input(shape) #or some tensor coming out of any other layer
inputTensor2 = Input(shape2) #or some tensor coming out of any other layer
#first parentheses are creating an instance of the layer
#second parentheses are "calling" the layer on the input tensors
outputTensor = keras.layers.Concatenate(axis=someAxis)([inputTensor1, inputTensor2])
This is not suited for sequential models, unless the previous layer outputs a list (this is possible but not common).
Finally, the concatenate function from the layers module: keras.layers.concatenate(inputs, axis=-1)
This is not a layer. This is a function that will return the tensor produced by an internal Concatenate layer.
The code is simple:
def concatenate(inputs, axis=-1, **kwargs):
    #blablabla
    return Concatenate(axis=axis, **kwargs)(inputs)
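So, using the same tensors as in the layer example above, the call is just shorter:
#equivalent to keras.layers.Concatenate(axis=someAxis)([inputTensor1, inputTensor2])
outputTensor = keras.layers.concatenate([inputTensor1, inputTensor2], axis=someAxis)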
Older functions
In Keras 1, people had functions that were meant to receive "layers" as input and return an output "layer". Their names were related to the word merge.
But since Keras 2 doesn't mention or document these anymore, I'd probably avoid using them, and if old code is found, I'd probably update it to proper Keras 2 code.
Why the _keras_shape word?
This backend function was not supposed to be used in high-level code. The coder should have used a Concatenate layer.
atoms_bonds_features = Concatenate(axis=-1)([atoms, summed_bond_features])
#just this line is perfect
Keras layers add the _keras_shape property to all their output tensors, and Keras uses this property for infering the shapes of the entire model.
If you use any backend function "outside" a layer or loss/metric, your output tensor will lack this property, and an error will appear saying that _keras_shape doesn't exist.
The coder created a bad workaround by adding the property manually, when it should have been added by a proper Keras layer. (This may work now, but if Keras is updated, this code will break, while proper code will keep working.)
Keras historically supports 2 different interfaces for its layers: the new functional one, and the old one that requires model.add() calls; hence the 2 different functions.
As for TF: its concatenate() function does not do everything required for Keras to work; hence the additional calls to make the ._keras_shape attribute correct, so as not to upset Keras, which expects that attribute to have a particular value.
I am trying to implement an optimizer in TensorFlow, and have been looking at optimizer code from an old version of TensorFlow. I want to understand what the function _get_variable_for does; it is the first function in the optimizer file.
Any help would be appreciated.
Thank you.
I see that this function checks two conditions.
ResourceVariable and VarHandleOp
Here is what a ResourceVariable is, according to the comments in the code:
"For example, if there is more than one assignment to a ResourceVariable in
a single session.run call there is a well-defined value for each operation
which uses the variable's value if the assignments and the read are connected
by edges in the graph. Consider the following example, in which two writes
can cause tf.Variable and tf.ResourceVariable to behave differently:"
a = tf.Variable(1.0, use_resource=True)
a.initializer.run()

assign = a.assign(2.0)
with tf.control_dependencies([assign]):
    b = a.read_value()
    with tf.control_dependencies([b]):
        other_assign = a.assign(3.0)
        with tf.control_dependencies([other_assign]):
            # Will print 2.0 because the value was read before other_assign ran. If
            # `a` was a tf.Variable instead, 2.0 or 3.0 could be printed.
            tf.Print(b, [b]).eval()
VarHandleOp seems to have deeper semantics, as per this:
"A common approach to managing where variables are placed, is to create a method to determine where each Op is to be placed and use that method in place of a specific device name when calling with tf.device(): Consider a scenario where a model is being trained on 2 GPUs and the variables are to be placed on the CPU. There would be a loop for creating and placing the "towers" on each of the 2 GPUs. A custom device placement method would be created that watches for Ops of type Variable, VariableV2, and VarHandleOp and indicates that they are to be placed on the CPU. All other Ops would be placed on the target GPU."
It explains this scenario further with sample code.
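A minimal sketch of that scenario (TF 1.x graph mode; the function name variables_on_cpu is made up, but passing a device function to tf.device is real API):
def variables_on_cpu(op):
    # pin all variable ops to the CPU, leave everything else on the GPU
    if op.type in ("Variable", "VariableV2", "VarHandleOp"):
        return "/cpu:0"
    return "/gpu:0"

with tf.device(variables_on_cpu):
    v = tf.Variable(1.0)  # placed on the CPU
    y = v * 2.0           # placed on the GPU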