I'm trying to apply a separate convolution to each layer of a 3-dimensional array, which brought me to the Keras TimeDistributed layer. But the documentation notes that:
"Because TimeDistributed applies the same instance of Conv2D to each of the
timestamps, the same set of weights are used at each timestamp."
However, I want to perform a separate convolution (with independently defined weights / filters) for each layer of the array, not using the same set of weights. Is there some built-in way to do this? Any help is appreciated!
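For reference, here is a rough sketch of the behaviour I'm after (functional API, hypothetical shapes): one independently weighted Conv2D per slice of the extra dimension, rather than a single shared instance.

```python
import tensorflow as tf

# Rough sketch of the desired behaviour (hypothetical shapes): one Conv2D with
# its own weights per slice, unlike TimeDistributed(Conv2D(...)) which shares them.
n_slices, h, w, c = 4, 32, 32, 3
inputs = tf.keras.Input(shape=(n_slices, h, w, c))

# One independent Conv2D (independent weights/filters) per slice.
convs = [tf.keras.layers.Conv2D(8, 3, padding="same") for _ in range(n_slices)]

# Apply each conv to its own slice and stack the results back together.
outputs = tf.stack([convs[i](inputs[:, i]) for i in range(n_slices)], axis=1)

model = tf.keras.Model(inputs, outputs)
model.summary()  # output shape: (None, 4, 32, 32, 8)
```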
I use a MultiHeadAttention layer in my transformer model (my model is very similar to named entity recognition models). Because my data comes with different lengths, I use padding and the attention_mask parameter of MultiHeadAttention to mask the padding. If I used the Masking layer before MultiHeadAttention, would it have the same effect as the attention_mask parameter? Or should I use both: attention_mask and the Masking layer?
The TensorFlow documentation on Masking and padding with Keras may be helpful.
The following is an excerpt from the document.
When using the Functional API or the Sequential API, a mask generated
by an Embedding or Masking layer will be propagated through the
network for any layer that is capable of using them (for example, RNN
layers). Keras will automatically fetch the mask corresponding to an
input and pass it to any layer that knows how to use it.
tf.keras.layers.MultiHeadAttention also supports automatic mask propagation as of TF 2.10.0.
Improved masking support for tf.keras.layers.MultiHeadAttention.
Implicit masks for query, key and value inputs will automatically be
used to compute a correct attention mask for the layer. These padding
masks will be combined with any attention_mask passed in directly when
calling the layer. This can be used with tf.keras.layers.Embedding
with mask_zero=True to automatically infer a correct padding mask.
Added a use_causal_mask call time argument to the layer. Passing
use_causal_mask=True will compute a causal attention mask, and
optionally combine it with any attention_mask passed in directly when
calling the layer.
The Masking layer keeps the input tensor as is and creates a mask tensor that is propagated to the following layers if they need one (like RNN layers). You can use it if you implement your own model. If you use models from Hugging Face, you can add a Masking layer, for example, if you want to keep the mask tensor for later use; otherwise the masking operations are already built in, so there is no need to add any masking layer at the beginning.
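For illustration, here is a minimal sketch (hypothetical sizes, assuming TF 2.10+) of the automatic mask propagation described above: with mask_zero=True on the Embedding layer, MultiHeadAttention picks up the padding mask by itself, so no explicit attention_mask is needed for the padding.

```python
import tensorflow as tf

# Minimal sketch (hypothetical sizes, TF >= 2.10): the Embedding layer with
# mask_zero=True produces a padding mask that MultiHeadAttention uses
# automatically, so padding positions are ignored without an explicit
# attention_mask argument.
inputs = tf.keras.Input(shape=(None,), dtype="int32")            # padded token ids
x = tf.keras.layers.Embedding(input_dim=5000, output_dim=64, mask_zero=True)(inputs)
x = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=16)(x, x)  # self-attention
outputs = tf.keras.layers.Dense(9, activation="softmax")(x)      # e.g. per-token NER tags
model = tf.keras.Model(inputs, outputs)
```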
I want to understand the basic operation done in a convolution layer of a quantized model in TensorFlow Lite.
As a baseline, I chose a pretrained TensorFlow model, EfficientNet-lite0-int8, and used a sample image as input for the model's inference. From there, I managed to extract the output tensor of the first fused ReLU6 convolution layer and compared this output with that of my custom Python implementation of it.
The deviation between the two tensors was large, and something I cannot explain is that TensorFlow's output tensor was not in the range [0, 6] as expected (I expected that because of the fused ReLU6 in the Conv layer).
Could you please provide me with a more detailed description of a quantized fused ReLU6 Conv2D layer's operation in TensorFlow Lite?
After carefully studying TensorFlow's GitHub repository, I found the kernel_util.cc file and the CalculateActivationRangeUint8 function. Using this function, I managed to understand why the quantized fused ReLU6 Conv2D layer's output tensor is not clipped between [0, 6] but between [-128, 127]. For the record, I managed to implement a Conv2D layer's operation in Python with a few simple steps.
First, you have to take the layer's parameters (kernel, bias, scales, offsets) using the interpreter.get_tensor_details() command and calculate the output_multiplier using the GetQuantizedConvolutionMultipler and QuantizeMultiplierSmallerThanOne functions.
After that, subtract the input offset from the input tensor before padding it, and implement a simple convolution.
Then, you need to use the MultiplyByQuantizedMultiplierSmallerThanOne function, which uses SaturatingRoundingDoublingHighMul and RoundingDivideByPOT from the gemmlowp/fixedpoint.h library.
Finally, add the output_offset to the result and clip it using the values returned by the CalculateActivationRangeUint8 function.
Link to the issue on the project's GitHub page
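Roughly, those steps look like the following NumPy sketch (parameter names are hypothetical; the real TFLite kernel does the rescaling in fixed-point via SaturatingRoundingDoublingHighMul and RoundingDivideByPOT, which a float output_multiplier only approximates):

```python
import numpy as np

# Sketch of the steps above (hypothetical names; TFLite uses fixed-point
# arithmetic for the rescaling, approximated here with a float multiplier).
def quantized_conv2d_relu6(inp_q, kernel_q, bias_q, input_offset,
                           output_offset, output_multiplier,
                           act_min=-128, act_max=127, pad=1):
    # 1. Subtract the input zero point, then pad.
    x = inp_q.astype(np.int32) - input_offset
    x = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))

    kh, kw, cin, cout = kernel_q.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    acc = np.zeros((oh, ow, cout), dtype=np.int64)

    # 2. Plain integer convolution into a wide accumulator, plus bias.
    for i in range(oh):
        for j in range(ow):
            patch = x[i:i + kh, j:j + kw, :]
            for c in range(cout):
                acc[i, j, c] = np.sum(patch * kernel_q[..., c].astype(np.int32)) + bias_q[c]

    # 3. Rescale with output_multiplier, add the output zero point, and clip to
    #    the range returned by CalculateActivationRangeUint8 for the fused ReLU6.
    out = np.round(acc * output_multiplier).astype(np.int64) + output_offset
    return np.clip(out, act_min, act_max)
```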
I need to implement neuron freezing in a CNN for deep learning research.
I tried to find a suitable function in the TensorFlow docs, but I didn't find anything.
How can I freeze a specific neuron when I have implemented the layers with tf.nn.conv2d?
A neuron in a dense neural network layer simply corresponds to a column in a weight matrix. You could therefore redefine your weight matrix as a concatenation of 2 parts/variables, one trainable and one not. Then you could either:
selectively pass only the trainable part in the var_list argument of the minimize function of your optimizer, or
use tf.stop_gradient on the vector/column corresponding to the neuron you want to freeze.
The same concept could be used for convolutional layers, although in this case the definition of a "neuron" becomes unclear; still, you could freeze any column(s) of a convolutional kernel.
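A minimal TF2 sketch of the first option (hypothetical sizes): split the kernel into a trainable and a non-trainable variable and concatenate them when building the layer, so the frozen columns never receive gradients.

```python
import tensorflow as tf

# Sketch (hypothetical sizes): a dense layer with 4 inputs and 3 neurons,
# where the last neuron (last kernel column) is frozen.
n_in, n_trainable, n_frozen = 4, 2, 1
w_train = tf.Variable(tf.random.normal([n_in, n_trainable]), trainable=True)
w_frozen = tf.Variable(tf.random.normal([n_in, n_frozen]), trainable=False)
b = tf.Variable(tf.zeros([n_trainable + n_frozen]))

def dense(x):
    # The full kernel is the concatenation of both parts; gradients only reach w_train.
    w = tf.concat([w_train, w_frozen], axis=1)
    return tf.matmul(x, w) + b

x = tf.random.normal([8, n_in])
with tf.GradientTape() as tape:
    loss = tf.reduce_mean(dense(x) ** 2)

grads = tape.gradient(loss, [w_train, w_frozen])
print(grads[1])  # None: the frozen part is not watched, so it is never updated
```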
As clarified in the comments, you want to freeze neurons in a tf.nn.conv2d convolution. While there is no direct way of doing this in TensorFlow (as far as my search goes), you could try slicing the tensor and applying tf.stop_gradient() to it. Here is a Stack Overflow answer to give you an intuition of how to use tf.stop_gradient().
I haven't tested it, but according to the docs I think it should work.
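An untested sketch of that idea (hypothetical shapes): slice the kernel, wrap the slice you want frozen in tf.stop_gradient(), and concatenate before calling tf.nn.conv2d.

```python
import tensorflow as tf

# Sketch (hypothetical shapes): freeze the first 2 output channels ("neurons")
# of a 3x3 kernel with 4 input and 8 output channels.
kernel = tf.Variable(tf.random.normal([3, 3, 4, 8]))    # H, W, in_ch, out_ch
n_frozen = 2

def conv(x):
    frozen = tf.stop_gradient(kernel[..., :n_frozen])    # no gradient through this slice
    trainable = kernel[..., n_frozen:]                   # gradients flow normally here
    k = tf.concat([frozen, trainable], axis=-1)
    return tf.nn.conv2d(x, k, strides=1, padding="SAME")

x = tf.random.normal([1, 32, 32, 4])
with tf.GradientTape() as tape:
    loss = tf.reduce_mean(conv(x) ** 2)

grad = tape.gradient(loss, kernel)
# grad[..., :n_frozen] is all zeros, so those channels stay frozen during training.
```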
This might be too stupid to ask ... but ...
When using an LSTM after the initial Embedding layer in Keras (for example, the Keras LSTM-IMDB tutorial code), how does the Embedding layer know that there is a time dimension? In other words, how does the Embedding layer know the length of each sequence in the training data set? How does the Embedding layer know I am training on sentences, not on individual words? Does it simply infer it during the training process?
The Embedding layer is usually either the first or the second layer of your model. If it's the first (usually when you use the Sequential API), you need to specify its input shape, which is either (seq_len,) or (None,). If it's the second layer (usually when you use the Functional API), you need to specify the first layer, which is an Input layer; for this layer you also need to specify a shape. When the shape is (None,), the sequence length is inferred from the shape of each batch of data fed to the model.
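A minimal sketch of the second case (hypothetical sizes): the time dimension comes entirely from the input shape, not from anything the Embedding layer infers about sentences.

```python
import tensorflow as tf

# Sketch (hypothetical sizes): shape=(None,) means "a batch of integer sequences
# of any length"; the Embedding layer just maps each id to a vector, and the
# LSTM is what treats the second axis as time.
inputs = tf.keras.Input(shape=(None,), dtype="int32")
x = tf.keras.layers.Embedding(input_dim=10000, output_dim=64)(inputs)  # (batch, timesteps, 64)
x = tf.keras.layers.LSTM(32)(x)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
model = tf.keras.Model(inputs, outputs)
model.summary()
```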
I'd like to reproduce a recurrent neural network where each time layer is followed by a dropout layer, and these dropout layers share their masks. This structure was described in, among others, A Theoretically Grounded Application of Dropout in Recurrent Neural Networks.
As far as I understand the code, the recurrent network models implemented in MXNet do not have any dropout layers applied between time layers; the dropout parameter of functions such as lstm (R API, Python API) actually defines dropout on the input. Therefore I'd need to reimplement these functions from scratch.
However, the Dropout layer does not seem to take a variable that defines the mask as a parameter.
Is it possible to make multiple dropout layers in different places of the computation graph, yet sharing their masks?
According to the discussion here, it is not possible to specify the mask, and using a random seed does not have an impact on dropout's random number generator.
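As a possible workaround (a sketch only, MXNet 1.x NDArray API with hypothetical sizes), you can sample one inverted-dropout mask yourself and multiply it in at every place it should be shared, instead of relying on the built-in Dropout:

```python
from mxnet import nd

# Sketch (hypothetical sizes, MXNet 1.x NDArray API): sample one Bernoulli mask
# and reuse it wherever the shared (variational) dropout should be applied.
keep_prob, batch_size, hidden_size = 0.8, 32, 128

# Inverted dropout: scale by 1/keep_prob so the expected activation is unchanged.
mask = (nd.random.uniform(0, 1, shape=(batch_size, hidden_size)) < keep_prob) / keep_prob

def shared_dropout(x):
    # Reusing the same mask at every timestep gives the shared-mask dropout
    # described in "A Theoretically Grounded Application of Dropout in RNNs".
    return x * mask

h1 = nd.random.normal(shape=(batch_size, hidden_size))   # e.g. hidden state at t = 1
h2 = nd.random.normal(shape=(batch_size, hidden_size))   # e.g. hidden state at t = 2
out1, out2 = shared_dropout(h1), shared_dropout(h2)
```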