Batch normalization with 3D convolutions in TensorFlow

Batch normalization with 3D convolutions in TensorFlow - python

I'm implementing a model relying on 3D convolutions (for a task that is similar to action recognition) and I want to use batch normalization (see [Ioffe & Szegedy 2015]). I could not find any tutorial focusing on 3D convs, hence I'm making a short one here which I'd like to review with you.
The code below refers to TensorFlow r0.12 and it explicitly instances variables - I mean I'm not using tf.contrib.learn except for the tf.contrib.layers.batch_norm() function. I'm doing this both to better understand how things work under the hood and to have more implementation freedom (e.g., variable summaries).
I will get to the 3D convolution case smoothly by first writing the example for a fully-connected layer, then for a 2D convolution and finally for the 3D case. While going through the code, it would be great if you could check if everything is done correctly - the code runs, but I'm not 100% sure about the way I apply batch normalization. I end this post with a more detailed question.
import tensorflow as tf
# This flag is used to allow/prevent batch normalization params updates
# depending on whether the model is being trained or used for prediction.
training = tf.placeholder_with_default(True, shape=())
Fully-connected (FC) case
# Input.
INPUT_SIZE = 512
u = tf.placeholder(tf.float32, shape=(None, INPUT_SIZE))
# FC params: weights only, no bias as per [Ioffe & Szegedy 2015].
FC_OUTPUT_LAYER_SIZE = 1024
w = tf.Variable(tf.truncated_normal(
[INPUT_SIZE, FC_OUTPUT_LAYER_SIZE], dtype=tf.float32, stddev=1e-1))
# Layer output with no activation function (yet).
fc = tf.matmul(u, w)
# Batch normalization.
fc_bn = tf.contrib.layers.batch_norm(
fc,
center=True,
scale=True,
is_training=training,
scope='fc-batch_norm')
# Activation function.
fc_bn_relu = tf.nn.relu(fc_bn)
print(fc_bn_relu) # Tensor("Relu:0", shape=(?, 1024), dtype=float32)
2D convolutional (CNN) layer case
# Input: 640x480 RGB images (whitened input, hence tf.float32).
INPUT_HEIGHT = 480
INPUT_WIDTH = 640
INPUT_CHANNELS = 3
u = tf.placeholder(tf.float32, shape=(None, INPUT_HEIGHT, INPUT_WIDTH, INPUT_CHANNELS))
# CNN params: wights only, no bias as per [Ioffe & Szegedy 2015].
CNN_FILTER_HEIGHT = 3 # Space dimension.
CNN_FILTER_WIDTH = 3 # Space dimension.
CNN_FILTERS = 128
w = tf.Variable(tf.truncated_normal(
[CNN_FILTER_HEIGHT, CNN_FILTER_WIDTH, INPUT_CHANNELS, CNN_FILTERS],
dtype=tf.float32, stddev=1e-1))
# Layer output with no activation function (yet).
CNN_LAYER_STRIDE_VERTICAL = 1
CNN_LAYER_STRIDE_HORIZONTAL = 1
CNN_LAYER_PADDING = 'SAME'
cnn = tf.nn.conv2d(
input=u, filter=w,
strides=[1, CNN_LAYER_STRIDE_VERTICAL, CNN_LAYER_STRIDE_HORIZONTAL, 1],
padding=CNN_LAYER_PADDING)
# Batch normalization.
cnn_bn = tf.contrib.layers.batch_norm(
cnn,
data_format='NHWC', # Matching the "cnn" tensor which has shape (?, 480, 640, 128).
center=True,
scale=True,
is_training=training,
scope='cnn-batch_norm')
# Activation function.
cnn_bn_relu = tf.nn.relu(cnn_bn)
print(cnn_bn_relu) # Tensor("Relu_1:0", shape=(?, 480, 640, 128), dtype=float32)
3D convolutional (CNN3D) layer case
# Input: sequence of 9 160x120 RGB images (whitened input, hence tf.float32).
INPUT_SEQ_LENGTH = 9
INPUT_HEIGHT = 120
INPUT_WIDTH = 160
INPUT_CHANNELS = 3
u = tf.placeholder(tf.float32, shape=(None, INPUT_SEQ_LENGTH, INPUT_HEIGHT, INPUT_WIDTH, INPUT_CHANNELS))
# CNN params: wights only, no bias as per [Ioffe & Szegedy 2015].
CNN3D_FILTER_LENGHT = 3 # Time dimension.
CNN3D_FILTER_HEIGHT = 3 # Space dimension.
CNN3D_FILTER_WIDTH = 3 # Space dimension.
CNN3D_FILTERS = 96
w = tf.Variable(tf.truncated_normal(
[CNN3D_FILTER_LENGHT, CNN3D_FILTER_HEIGHT, CNN3D_FILTER_WIDTH, INPUT_CHANNELS, CNN3D_FILTERS],
dtype=tf.float32, stddev=1e-1))
# Layer output with no activation function (yet).
CNN3D_LAYER_STRIDE_TEMPORAL = 1
CNN3D_LAYER_STRIDE_VERTICAL = 1
CNN3D_LAYER_STRIDE_HORIZONTAL = 1
CNN3D_LAYER_PADDING = 'SAME'
cnn3d = tf.nn.conv3d(
input=u, filter=w,
strides=[1, CNN3D_LAYER_STRIDE_TEMPORAL, CNN3D_LAYER_STRIDE_VERTICAL, CNN3D_LAYER_STRIDE_HORIZONTAL, 1],
padding=CNN3D_LAYER_PADDING)
# Batch normalization.
cnn3d_bn = tf.contrib.layers.batch_norm(
cnn3d,
data_format='NHWC', # Matching the "cnn" tensor which has shape (?, 9, 120, 160, 96).
center=True,
scale=True,
is_training=training,
scope='cnn3d-batch_norm')
# Activation function.
cnn3d_bn_relu = tf.nn.relu(cnn3d_bn)
print(cnn3d_bn_relu) # Tensor("Relu_2:0", shape=(?, 9, 120, 160, 96), dtype=float32)
What I would like to make sure is whether the code above exactly implements batch normalization as described in [Ioffe & Szegedy 2015] at the end of Sec. 3.2:
For convolutional layers, we additionally want the normalization to obey the convolutional property – so that different elements of the same feature map, at different locations, are normalized in the same way. To achieve this, we jointly normalize all the activations in a minibatch, over all locations. [...] Alg. 2 is modified similarly, so that during inference the BN transform applies the same linear transformation to each activation in a given feature map.
UPDATE
I guess the code above is also correct for the 3D conv case. In fact, when I define my model if I print all the trainable variables, I also see the expected numbers of beta and gamma variables. For instance:
Tensor("conv3a/conv3d_weights/read:0", shape=(3, 3, 3, 128, 256), dtype=float32)
Tensor("BatchNorm_2/beta/read:0", shape=(256,), dtype=float32)
Tensor("BatchNorm_2/gamma/read:0", shape=(256,), dtype=float32)
This looks ok to me since due to BN, one pair of beta and gamma are learned for each feature map (256 in total).
[Ioffe & Szegedy 2015]: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

That is a great post about 3D batchnorm, it's often unnoticed that batchnorm can be applied to any tensor of rank greater than 1. Your code is correct, but I couldn't help but add a few important notes on this:
A "standard" 2D batchnorm (accepts a 4D tensor) can be significantly faster in tensorflow than 3D or higher, because it supports fused_batch_norm implementation, which applies one kernel operation:
Fused batch norm combines the multiple operations needed to do batch
normalization into a single kernel. Batch norm is an expensive process
that for some models makes up a large percentage of the operation
time. Using fused batch norm can result in a 12%-30% speedup.
There is an issue on GitHub to support 3D filters as well, but there hasn't been any recent activity and at this point the issue is closed unresolved.
Although the original paper prescribes using batchnorm before ReLU activation (and that's what you did in the code above), there is evidence that it's probably better to use batchnorm after the activation. Here's a comment on Keras GitHub by Francois Chollet:
... I can guarantee that recent code written by Christian [Szegedy]
applies relu
before BN. It is still occasionally a topic of debate, though.
For anyone interested to apply the idea of normalization in practice, there's been recent research developments of this idea, namely weight normalization and layer normalization, which fix certain disadvantages of original batchnorm, for example they work better for LSTM and recurrent networks.

Related

Why does the global average pooling work in ResNet?

Lately, I start a project about classification, using a very shallow ResNet.
The model just has 10 conv. layer and then connects a Global avg pooling layer before softmax layer.
The performance is good as my expectation --- 93% (yeah, it is ok).
However, for some reasons, I need replace the Global avg pooling layer.
I have tried the following ways:
(Given the input shape of this layer [-1, 128, 1, 32], tensorflow form)
Global max pooling layer. but got 85% ACC
Exponential Moving Average. but got 12% (almost didn't work)
split_list = tf.split(input, 128, axis=1)
avg_pool = split_list[0]
beta = 0.5
for i in range(1, 128):
avg_pool = beta*split_list[i] + (1-beta)*avg_pool
avg_pool = tf.reshape(avg_pool, [-1,32])
Split input into 4 parts, avg_pool each parts, finally concatenate them.
but got 75%
split_shape = [32,32,32,32]
split_list = tf.split(input,
split_shape,
axis=1)
for i in range(len(split_shape)):
split_list[i] = tf.keras.layers.GlobalMaxPooling2D()(split_list[i])
avg_pool = tf.concat(split_list, axis=1)
Average the last channel. [-1, 128, 1, 32] --> [-1, 128], didn't work.
^
Use a conv. layer with 1 kernel. In this way, the output shape is [-1, 128, 1, 1]. but didn't work, 25% or so.
I am pretty confused why global average pooling can work that well?
And is there any other way to replace it?

Global Average Pooling has the following advantages over the fully connected final layers paradigm:
The removal of a large number of trainable parameters from the model. Fully connected or dense layers have lots of parameters. A 7 x 7 x 64 CNN output being flattened and fed into a 500 node dense layer yields 1.56 million weights which need to be trained. Removing these layers speeds up the training of your model.
The elimination of all these trainable parameters also reduces the tendency of over-fitting, which needs to be managed in fully connected layers by the use of dropout.
The authors argue in the original paper that removing the fully connected classification layers forces the feature maps to be more closely related to the classification categories – so that each feature map becomes a kind of “category confidence map”.
Finally, the authors also argue that, due to the averaging operation over the feature maps, this makes the model more robust to spatial translations in the data. In other words, as long as the requisite feature is included / or activated in the feature map somewhere, it will still be “picked up” by the averaging operation.

pytorch : unable to understand model.forward function

I am learning deep learning and am trying to understand the pytorch code given below. I'm struggling to understand how the probability calculation works. Can somehow break it down in lay-man terms. Thanks a ton.
ps = model.forward(images[0,:])
# Hyperparameters for our network
input_size = 784
hidden_sizes = [128, 64]
output_size = 10
# Build a feed-forward network
model = nn.Sequential(nn.Linear(input_size, hidden_sizes[0]),
nn.ReLU(),
nn.Linear(hidden_sizes[0], hidden_sizes[1]),
nn.ReLU(),
nn.Linear(hidden_sizes[1], output_size),
nn.Softmax(dim=1))
print(model)
# Forward pass through the network and display output
images, labels = next(iter(trainloader))
images.resize_(images.shape[0], 1, 784)
print(images.shape)
ps = model.forward(images[0,:])

I'm a layman so I'll help you with the layman's terms :)
input_size = 784
hidden_sizes = [128, 64]
output_size = 10
These are parameters for the layers in your network. Each neural network consists of layers, and each layer has an input and an output shape.
Specifically input_size deals with the input shape of the first layer. This is the input_size of the entire network. Each sample that is input into the network will be a 1 dimension vector that is length 784 (array that is 784 long).
hidden_size deals with the shapes inside the network. We will cover this a little later.
output_size deals with the output shape of the last layer. This means that our network will output a 1 dimensional vector that is length 10 for each sample.
Now to break up model definition line by line:
model = nn.Sequential(nn.Linear(input_size, hidden_sizes[0]),
The nn.Sequential part simply defines a network, each argument that is input defines a new layer in that network in that order.
nn.Linear(input_size, hidden_sizes[0]) is an example of such a layer. It is the first layer of our network takes in an input of size input_size, and outputs a vector of size hidden_sizes[0]. The size of the output is considered "hidden" in that it is not the input or the output of the whole network. It "hidden" because it's located inside of the network far from the input and output ends of the network that you interact with when you actually use it.
This is called Linear because it applies a linear transformation by multiplying the input by its weights matrix and adding its bias matrix to the result. (Y = Ax + b, Y = output, x = input, A = weights, b = bias).
nn.ReLU(),
ReLU is an example of an activation function. What this function does is apply some sort of transformation to the output of the last layer (the layer discussed above), and outputs the result of that transformation. In this case the function being used is the ReLU function, which is defined as ReLU(x) = max(x, 0). Activation functions are used in neural networks because they create non-linearities. This allows your model to model non-linear relationships.
nn.Linear(hidden_sizes[0], hidden_sizes[1]),
From what we discussed above, this is a another example of a layer. It takes an input of hidden_sizes[0] (same shape as the output of the last layer) and outputs a 1D vector of length hidden_sizes[1].
nn.ReLU(),
Apples the ReLU function again.
nn.Linear(hidden_sizes[1], output_size)
Same as the above two layers, but our output shape is the output_size this time.
nn.Softmax(dim=1))
Another activation function. This activation function turns the logits outputted by nn.Linear into an actual probability distribution. This lets the model output the probability for each class. At this point our model is built.
# Forward pass through the network and display output
images, labels = next(iter(trainloader))
images.resize_(images.shape[0], 1, 784)
print(images.shape)
These are simply just preprocessing training data and putting it into the correct format
ps = model.forward(images[0,:])
This passes the images through the model (forward pass) and applies the operations previously discussed in layer. You get the resultant output.

After going through convolution steps, what should be the shape of the tensor in fully connected layer?

So let's assume that I have RGB images of shape [128,128,3], I want to create a CNN with two Conv-ReLu-MaxPool layers as below.
def cnn(input_data):
#conv1
conv1_weight = tf.Variable(tf.truncated_normal([4,4,3,25], stddev=0.1,),tf.float32)
conv1_bias = tf.Variable(tf.zeros([25]), tf.float32)
conv1 = tf.nn.conv2d(input_data, conv1_weight, [1,1,1,1], 'SAME')
relu1 = tf.nn.relu(tf.nn.add(conv1, conv1_bias))
max_pool1 = tf.nn.max_pool(relu1, [1,2,2,1], [1,1,1,1], 'SAME')
#conv2
conv2_weight = tf.Variable(tf.truncated_normal([4,4,25,50]),0.1,tf.float32)
conv2_bias = tf.Variable(tf.zeros([50]), tf.float32)
conv2 = tf.nn.conv2d(max_pool1, conv2_weight, [1,1,1,1], 'SAME')
relu2 = tf.nn.relu(tf.nn.add(conv2, conv2_bias))
max_pool2 = tf.nn.max_pool(relu2, [1,2,2,1], [1,1,1,1], 'SAME')
After this step, I need to transform the output into 1xN layer for the next fully connected layer. However, I am not sure how I should determine what N is in 1xN. Is there a specific formula including the layer size, strides, max pool size, image size etc? I am pretty lost in this phase of the problem even though I think that I get the intuition behind a CNN.

I understand that you want to transform the multiple 2D feature maps that come out of the last convolutional/pooling layer to a vector that can be fed into a fully-connected layer. Or to be precise and include the batch dimension, go from shape [batch, width, height, feature_maps] to [batch, N].
The above already implies that N = batch * width * height since reshaping keeps the overall number of elements the same. width and height depend on the size of your inputs and the strides of your network layers (convolution and/or pooling).
A stride of x simply divides the size by x. You have inputs of size 128 in each dimension, and two pooling layers with stride 2. Thus after the first pooling layer your images are 64x64 and after the second they are 32x32, so width = height = 32. Normally we would have to account for padding as well but the point of SAME padding is precisely that we don't have to worry about that.
Finally, feature_maps is 50 since that is how many filters your last convolutional layer has (pooling doesn't modify this). So N = 32*32*50 = 51200.
Thus, you should be able to do tf.reshape(max_pool2, [-1, 51200]) (or tf.reshape(max_pool2, [-1, 32*32*50]) to keep it more interpretable) and feed the resulting 2D tensor through a fully-connected layer (i.e. tf.matmul).
The simplest way would be to just use tf.layers.flatten(max_pool2). This function does all the above for you and just gives you the [batch, N] result.

First of all since you are starting out, I would recommend Keras instead of pure tensorflow. And to answer your question regarding the shape refer this blog by Andrej karpathy
Quote from the blog:
We can compute the spatial size of the output volume as a function of the input volume size (W), the receptive field size of the Conv Layer neurons (F), the stride with which they are applied (S), and the amount of zero padding used (P) on the border. You can convince yourself that the correct formula for calculating how many neurons “fit” is given by (W−F+2P)/S+1. For example for a 7x7 input and a 3x3 filter with stride 1 and pad 0 we would get a 5x5 output. With stride 2 we would get a 3x3 output.
Now coming to your tensorflow's implementation:
For the conv1 stage you have given a 4*4 filter having a depth of 25. Since you have used padding="SAME" for conv1 and maxpooling1 your output 2D spatial dimensions will be same as input for both the cases. That is after conv1 your output size is: 128*128*25. For the same reason the output of your maxpool1 layer is also the same. Since you have given padding to be "SAME" for the second conv2 also your output shape is 128*128*50(you changed the output channels). Thus after maxpool2 your dimensions are: batch_size, 128*128*50. Thus before adding Dense layer you have 3 major options:
1) flatten the tensor results in a shape : batch_size, 128*128*50
2) global average pooling results in a shape : batch_size, 50
3) global max pooling also results in a shape : batch_size, 50.
Note:
global average pooling layer is similar to average pooling but, we average the entire feature map instead of a window. Hence the name global. For example: in your case you have batch_size, 128,128,50 as your dimensions. This means you have 50 feature maps with spatial dimensions 128*128. What global average pooling does is that, it
Averages the 128*128 feature map to give a single number. Thus you will have 50 values in total. This is very useful in designing fully convolutional architectures like inception, resnet etc. Because, this makes the network's input generic meaning you can send any image size as input to the network. Global max pooling is very similar to above but the slight difference is it finds the max value of the feature map instead of average.
Problems with this architecture:
Generally it is not recommended to use padding = "SAME" in maxpooling layers. If you see the source code of vgg16 you will see that after each block (conv relu and maxpooling) the input size is halved. Thus the general structure is you reduce the spatial dimension while increasing the depth/channels.

Flattening the layer:
var_name = tf.layers.flatten(max_pool2)
Should work, and it's what almost every example of a Tensorflow CNN uses.

How to implement Tensorflow batch normalization in LSTM

My current LSTM network looks like this.
rnn_cell = tf.contrib.rnn.BasicRNNCell(num_units=CELL_SIZE)
init_s = rnn_cell.zero_state(batch_size=1, dtype=tf.float32) # very first hidden state
outputs, final_s = tf.nn.dynamic_rnn(
rnn_cell, # cell you have chosen
tf_x, # input
initial_state=init_s, # the initial hidden state
time_major=False, # False: (batch, time step, input); True: (time step, batch, input)
)
# reshape 3D output to 2D for fully connected layer
outs2D = tf.reshape(outputs, [-1, CELL_SIZE])
net_outs2D = tf.layers.dense(outs2D, INPUT_SIZE)
# reshape back to 3D
outs = tf.reshape(net_outs2D, [-1, TIME_STEP, INPUT_SIZE])
Usually, I apply tf.layers.batch_normalization as batch normalization. But I am not sure if this works in a LSTM network.
b1 = tf.layers.batch_normalization(outputs, momentum=0.4, training=True)
d1 = tf.layers.dropout(b1, rate=0.4, training=True)
# reshape 3D output to 2D for fully connected layer
outs2D = tf.reshape(d1, [-1, CELL_SIZE])
net_outs2D = tf.layers.dense(outs2D, INPUT_SIZE)
# reshape back to 3D
outs = tf.reshape(net_outs2D, [-1, TIME_STEP, INPUT_SIZE])

If you want to use batch norm for RNN (LSTM or GRU), you can check out this implementation , or read the full description from blog post.
However, the layer-normalization has more advantage than batch norm in sequence data. Specifically, "the effect of batch normalization is dependent on the mini-batch size and it is not obvious how to apply it to recurrent networks" (from the paper Ba, et al. Layer normalization).
For layer normalization, it normalizes the summed inputs within each layer. You can check out the implementation of layer-normalization for GRU cell:

Based on this paper: "Layer Normalization" - Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton
Tensorflow now comes with the tf.contrib.rnn.LayerNormBasicLSTMCell a LSTM unit with layer normalization and recurrent dropout.
Find the documentation here.

Why do Keras Conv1D layers' output tensors not have the input dimension?

According to the keras documentation (https://keras.io/layers/convolutional/) the shape of a Conv1D output tensor is (batch_size, new_steps, filters) while the input tensor shape is (batch_size, steps, input_dim). I don't understand how this could be since that implies that if you pass a 1d input of length 8000 where batch_size = 1 and steps = 1 (I've heard steps means the # of channels in your input) then this layer would have an output of shape (1,1,X) where X is the number of filters in the Conv layer. But what happens to the input dimension? Since the X filters in the layer are applied to the entire input dimension shouldn't one of the output dimensions be 8000 (or less depending on padding), something like (1,1,8000,X)? I checked and Conv2D layers behave in a way that makes more sense their output_shape is (samples, filters, new_rows, new_cols) where new_rows and new_cols would be the dimensions of an input image again adjusted based on padding. If Conv2D layers preserve their input dimensions why don't Conv1D layers? Is there something I'm missing here?
Background Info:
I'm trying to visualize 1d convolutional layer activations of my CNN but most tools online I've found seem to just work for 2d convolutional layers so I've decided to write my own code for it. I've got a pretty good understanding of how it works here is the code I've got so far:
# all the model's activation layer output tensors
activation_output_tensors = [layer.output for layer in model.layers if type(layer) is keras.layers.Activation]
# make a function that computes activation layer outputs
activation_comp_function = K.function([model.input, K.learning_phase()], activation_output_tensors)
# 0 means learning phase = False (i.e. the model isn't learning right now)
activation_arrays = activation_comp_function([training_data[0,:-1], 0])
This code is based off of julienr's first comment in this thread, with some modifications for the current version of keras. Sure enough when I use it though all the activation arrays are of shape (1,1,X)... I spent all day yesterday trying to figure out why this is but no luck any help is greatly appreciated.
UPDATE: Turns out I mistook the meaning of the input_dimension with the steps dimension. This is mostly because the architecture I used came from another group that build their model in mathematica and in mathematica an input shape of (X,Y) to a Conv1D layer means X "channels" (or input_dimension of X) and Y steps. A thank you to gionni for helping me realize this and explaining so well how the "input_dimension" becomes the "filter" dimension.

I used to have the same problem with 2D convolutions. The thing is that when you apply a convolutional layer the kernel you are applying is not of size (kernel_size, 1) but actually (kernel_size, input_dim).
If you think of it if it wasn't this way a 1D convolutional layer with kernel_size = 1 would be doing nothing to the inputs it received.
Instead it is computing a weighted average of the input features at each time step, using the same weights for each time step (although every filter uses a different set of weights). I think it helps to visualize input_dim as the number of channels in a 2D convolution of an image, where the same reaoning applies (in that case is the channels that "get lost" and trasformed into the number of filters).
To convince yourself of this, you can reproduce the 1D convolution with a 2D convolution layer using kernel_size=(1D_kernel_size, input_dim) and the same number of filters. Here an example:
from keras.layers import Conv1D, Conv2D
import keras.backend as K
import numpy as np
# create an input with 4 steps and 5 channels/input_dim
channels = 5
steps = 4
filters = 3
val = np.array([list(range(i * channels, (i + 1) * channels)) for i in range(1, steps + 1)])
val = np.expand_dims(val, axis=0)
x = K.variable(value=val)
# 1D convolution. Initialize the kernels to ones so that it's easier to compute the result by hand
conv1d = Conv1D(filters=filters, kernel_size=1, kernel_initializer='ones')(x)
# 2D convolution that replicates the 1D one
# need to add a dimension to your input since conv2d expects 4D inputs. I add it at axis 4 since my keras is setup with `channel_last`
val1 = np.expand_dims(val, axis=3)
x1 = K.variable(value=val1)
conv2d = Conv2D(filters=filters, kernel_size=(1, 5), kernel_initializer='ones')(x1)
# evaluate and print the outputs
print(K.eval(conv1d))
print(K.eval(conv2d))
As I said, it took me a while too to understand this, I think mostly because no tutorial explains it clearly

Thanks, It's very useful.
here the same code adapted using recent version of tensorflow + keras
and stacking on axis 0 to build the 4D
# %%
from tensorflow.keras.layers import Conv1D, Conv2D
from tensorflow.keras.backend import eval
import tensorflow as tf
import numpy as np
# %%
# create an 3D input with format BLC (Batch, Layer, Channel)
batch = 10
layers = 3
channels = 5
kernel = 2
val3D = np.random.randint(0, 100, size=(batch, layers, channels))
x = tf.Variable(val3D.astype('float32'))
# %%
# 1D convolution. Initialize the kernels to ones so that it's easier to compute the result by hand / compare
conv1d = Conv1D(filters=layers, kernel_size=kernel, kernel_initializer='ones')(x)
# %%
# 2D convolution that replicates the 1D one
# need to add a dimension to your input since conv2d expects 4D inputs. I add it at axis 0 since my keras is setup with `channel_last`
# stack 3 time the same
val4D = np.stack([val3D,val3D,val3D], axis=0)
x1 = tf.Variable(val4D.astype('float32'))
# %%
# 2D convolution. Initialize the kernel_size to one for the 1st kernel size so that replicate the conv1D
conv2d = Conv2D(filters=layers, kernel_size=(1, kernel), kernel_initializer='ones')(x1)
# %%
# evaluate and print the outputs
print(eval(conv1d))
print('---------------------------------------------')
# display only one of the stacked
print(eval(conv2d)[0])

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.