I am using a Taylor expansion in an image classification task. Basically, a pixel vector is generated from the RGB image, and each pixel value in that vector is approximated with the Taylor series expansion of sin(x). In my TensorFlow implementation I tried to code this up, but I still have problems when I try to create feature maps by stacking the tensor of expansion terms. Can anyone suggest how to make my current attempt work and be more efficient? Any thoughts?
Here are the expansion terms of the Taylor series of sin(x): sin(x) ≈ x - x^3/3! + x^5/5! - ...
Here is my current attempt:
term = 2
c = tf.constant([1, -1/6])
power = tf.constant([1, 3])
x = tf.keras.Input(shape=(32, 32, 3))
res = []
for x in range(term):
    expansion = c * tf.math.pow(tf.tile(x[..., None], [1, 1, 1, 1, term]), power)
    m_ij = tf.math.cumsum(expansion, axis=-1)
    res.append(m_ij)
But this is not quite working: I want to create input feature maps from the expansion neurons, so delta_1 and delta_2 need to be stacked, which I didn't do correctly in my attempt above, and my code is also not well generalized. How can I refine this into a correct implementation? Can anyone give me ideas or a canonical answer?
If you do the series expansion as described, then with C input channels and T expansion terms the expanded input should have C*T channels and otherwise keep the same shape. So the original input and the approximations of the function up to each term should be concatenated along the channel dimension. It is a bit easier to do this with a transpose and a reshape than with an actual concatenate.
Here is example code for a convolutional network trained on CIFAR10:
import tensorflow as tf
from tensorflow.keras.layers import Conv2D

inputs = tf.keras.Input(shape=(32, 32, 3))
x = inputs
n_terms = 2
c = tf.constant([1, -1/6])
p = tf.constant([1, 3], dtype=tf.float32)

terms = []
for i in range(n_terms):
    # c_i * x ^ p_i
    m = c[i] * tf.math.pow(x, p[i])
    terms.append(m)
# partial sums, stacked along a new leading T dimension
expansion = tf.math.cumsum(terms)
# move T last, then fold C and T into C*T channels
expansion_terms_last = tf.transpose(expansion, perm=[1, 2, 3, 4, 0])
x = tf.reshape(expansion_terms_last, tf.constant([-1, 32, 32, 3*n_terms]))
x = Conv2D(32, (3, 3), input_shape=(32, 32, 3*n_terms))(x)
This assumes the original network (without expansion) would have a first layer that looks like this:
x = Conv2D(32, (3, 3), input_shape=(32,32,3))(inputs)
and the rest of the network is exactly the same as it would be without expansion.
terms contains the list of c_i * x ^ p_i values from the original; expansion contains the partial sums of the terms (the 1st, then the 1st plus 2nd, and so on) in a single tensor where T is the first dimension. expansion_terms_last moves the T dimension to be last, and the reshape changes the shape from (..., C, T) to (..., C*T).
The output of model.summary() then looks like this:
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_4 (InputLayer) [(None, 32, 32, 3)] 0
__________________________________________________________________________________________________
tf_op_layer_Pow_6 (TensorFlowOp [(None, 32, 32, 3)] 0 input_4[0][0]
__________________________________________________________________________________________________
tf_op_layer_Pow_7 (TensorFlowOp [(None, 32, 32, 3)] 0 input_4[0][0]
__________________________________________________________________________________________________
tf_op_layer_Mul_6 (TensorFlowOp [(None, 32, 32, 3)] 0 tf_op_layer_Pow_6[0][0]
__________________________________________________________________________________________________
tf_op_layer_Mul_7 (TensorFlowOp [(None, 32, 32, 3)] 0 tf_op_layer_Pow_7[0][0]
__________________________________________________________________________________________________
tf_op_layer_x_3 (TensorFlowOpLa [(2, None, 32, 32, 3 0 tf_op_layer_Mul_6[0][0]
tf_op_layer_Mul_7[0][0]
__________________________________________________________________________________________________
tf_op_layer_Cumsum_3 (TensorFlo [(2, None, 32, 32, 3 0 tf_op_layer_x_3[0][0]
__________________________________________________________________________________________________
tf_op_layer_Transpose_3 (Tensor [(None, 32, 32, 3, 2 0 tf_op_layer_Cumsum_3[0][0]
__________________________________________________________________________________________________
tf_op_layer_Reshape_3 (TensorFl [(None, 32, 32, 6)] 0 tf_op_layer_Transpose_3[0][0]
__________________________________________________________________________________________________
conv2d_5 (Conv2D) (None, 30, 30, 32) 1760 tf_op_layer_Reshape_3[0][0]
On CIFAR10, this network trains slightly better with expansion - maybe 1% accuracy gain (from 71 to 72%).
Step by step explanation of the code using sample data:
# create a sample input
x = tf.convert_to_tensor([[1,2,3],[4,5,6],[7,8,9]], dtype=tf.float32) # start with H=3, W=3
x = tf.expand_dims(x, axis=0) # add batch dimension N=1
x = tf.expand_dims(x, axis=3) # add channel dimension C=1
# x is now NHWC or (1, 3, 3, 1)
n_terms = 2 # expand to T=2
c = tf.constant([1, -1/6])
p = tf.constant([1, 3], dtype=tf.float32)
terms = []
for i in range(n_terms):
    # this simply calculates m = c_i * x ^ p_i
    m = c[i] * tf.math.pow(x, p[i])
    terms.append(m)
print(terms)
# list of two tensors with shape NHWC or (1, 3, 3, 1)
# calculate each partial sum
expansion = tf.math.cumsum(terms)
print(expansion.shape)
# tensor with shape TNHWC or (2, 1, 3, 3, 1)
# move the T dimension last
expansion_terms_last = tf.transpose(expansion, perm=[1, 2, 3, 4, 0])
print(expansion_terms_last.shape)
# tensor with shape NHWCT or (1, 3, 3, 1, 2)
# stack the last two dimensions together
x = tf.reshape(expansion_terms_last, tf.constant([-1, 3, 3, 1*2]))
print(x.shape)
# tensor with shape NHW and C*T or (1, 3, 3, 2)
# if the input had 3 channels for example, this would be (1, 3, 3, 6)
# now use this as though it was the input
Key assumptions:
(1) The c_i and p_i are not learned parameters, so the "expansion neurons" are not actually neurons; each one is just a multiply and a sum node (although "neurons" sounds cooler :).
(2) The expansion happens for each input channel independently: C input channels expanded to T terms each produce C*T input features, but the T features from each channel are computed completely independently of the other channels (it looks like that in the diagram).
(3) The input contains all the partial sums (i.e. c_1 * x ^ p_1, then c_1 * x ^ p_1 + c_2 * x ^ p_2, and so forth) but does not contain the individual terms (again, that is how it looks in the diagram).
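If you want to generalize this to more terms without hard-coding c and p, here is a minimal sketch of one possible way to do it. The helper name taylor_sin_expand is made up for illustration; the coefficients (-1)^i / (2i+1)! and odd powers 2i+1 are simply the standard sin(x) series:
import math
import tensorflow as tf

def taylor_sin_expand(x, n_terms):
    # standard sin(x) series: coefficients (-1)^i / (2i+1)! and odd powers 2i+1
    coeffs = [(-1) ** i / math.factorial(2 * i + 1) for i in range(n_terms)]
    powers = [float(2 * i + 1) for i in range(n_terms)]
    # each term is c_i * x ^ p_i with the same NHWC shape as x
    terms = [c * tf.math.pow(x, p) for c, p in zip(coeffs, powers)]
    # stack to TNHWC, take partial sums along T, then move T last (NHWCT)
    expansion = tf.math.cumsum(tf.stack(terms, axis=0), axis=0)
    expansion = tf.transpose(expansion, perm=[1, 2, 3, 4, 0])
    # fold C and T together into C*T channels
    h, w, c = x.shape[1], x.shape[2], x.shape[3]
    return tf.reshape(expansion, [-1, h, w, c * n_terms])

inputs = tf.keras.Input(shape=(32, 32, 3))
x = taylor_sin_expand(inputs, n_terms=2)          # (None, 32, 32, 6)
x = tf.keras.layers.Conv2D(32, (3, 3))(x)
The rest of the network stays exactly as before; only n_terms changes when you want a longer expansion.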
Related
I'm attempting to use tf.image.pyramids.downsample from tensorflow_graphics in an auto-encoder model, in every Down (encoding) block, with the result then sent as a skip connection to the Up (decoder) blocks.
class DownConv(Model):
    n = 0

    def __init__(self, kernel_size, filters, initializer, n_lower_levels):
        super(DownConv, self).__init__(name=f"DownConv_{DownConv.n}")
        DownConv.n += 1
        self.pad = tf.constant([[0, 0],
                                [kernel_size // 2, kernel_size // 2],
                                [kernel_size // 2, kernel_size // 2],
                                [0, 0]])
        self.conv = L.Conv2D(filters, kernel_size, strides=2, kernel_initializer=initializer)
        self.pyramid = None
        self.filters = filters
        self.n_lower_levels = n_lower_levels

    def call(self, input_t):
        logger.debug(f"Received {input_t.shape} in {self.name}")
        x = tf.pad(input_t, self.pad, "SYMMETRIC")
        x = self.conv(x)
        p = tf.Variable(x)
        self.pyramid = downsample(p, self.n_lower_levels)
        pyramods = ", ".join([str(p.shape) for p in self.pyramid])
        logger.debug(f"Generated pyramids: {pyramods}")
        return tf.nn.selu(x)
However, the logging shows that this doesn't work: only the very first pyramid level (the first step of the downsample) keeps its channel dimension; the rest have None for channels.
self.pyramid[0].shape yields the correct (None, 256, 256, 64), but self.pyramid[1] yields (None, 256, 256, None) during a training step. (The batch dimension being None on axis 0 is expected; that is normal TensorFlow behavior in error logs.)
Due to this issue, the training step produces an error in my Up blocks when it tries to concatenate the two feature maps:
ValueError: The channel dimension of the inputs should be defined. The input_shape received is (None, 32, 32, None), where axis -1 (0-based) is the channel dimension, which found to be `None`.
Call arguments received:
• input_t=tf.Tensor(shape=(None, 32, 32, 256), dtype=float32)
The keras model is like this:
input_x = Input(shape=input_shape)
x=Conv2D(...)(input_x)
...
y_pred1 = Conv2D(...)(x) # shape of (None, 80, 80, 2)
y_pred2 = Dense(...)(x) # shape of (None, 4)
y_merged = Concatenate(...)([y_pred1, y_pred2])
model = Model(input_x, y_merged)
y_pred1 and y_pred2 are the results I want the model to learn to predict.
But the loss function fcn1 for the y_pred1 branch needs the y_pred2 prediction results, so I have to concatenate the results of the two branches into y_merged so that fcn1 has access to y_pred2.
The problem is that I want to use the Concatenate layer to concatenate the y_pred2 (None, 4) output with the y_pred1 (None, 80, 80, 2) output, but I don't know how to do that.
How can I reshape the (None, 4) output to (None, 80, 80, 1)? For example, by filling the (None, 80, 80, 1) tensor with the 4 elements of y_pred2 and zeros elsewhere.
Is there any better solution than using the Concatenate layer?
Maybe this extracted piece of code could help you:
tf.print(condi_input.shape)
# shape is TensorShape([None, 1])
condi_i_casted = tf.expand_dims(condi_input, 2)
tf.print(condi_i_casted.shape)
# shape is TensorShape([None, 1, 1])
broadcasted_val = tf.broadcast_to(condi_i_casted, shape=tf.shape(decoder_outputs))
tf.print(broadcasted_val.shape)
# shape is TensorShape([None, 23, 256])
When you want to broadcast a value, first think about what exactly you want to broadcast. In this example, condi_input has shape (None, 1) and served as a condition for my encoder-decoder LSTM network. To match the dimensionality of the encoder states of the LSTM, I first had to use tf.expand_dims() to expand the condition value from a shape like [[1]] to [[[1]]].
This is what you need to do first. If your prediction is a softmax from the dense layers, you might want to use tf.argmax() first, so you only have one value, which is much easier to broadcast. However, it is also possible with all 4 values; just keep in mind that broadcasting can only expand axes of size 1 (or add new axes), so you cannot broadcast shape (None, 4) straight into an arbitrary target shape - you first have to insert the extra axes so that the 4 lines up with a matching dimension.
Then you can use tf.broadcast_to() to broadcast your value into the desired shape. Then you have two shapes you can concatenate together.
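Applied to the shapes from the question, a rough sketch could look like this (not tested against the actual model; the Input layers stand in for the question's y_pred1 and y_pred2, and tile_pred2 is just an illustrative name): expand the (None, 4) output to (None, 1, 1, 4), broadcast it over the 80x80 grid, and concatenate on the channel axis.
import tensorflow as tf
from tensorflow.keras.layers import Input, Lambda, Concatenate

# stand-ins for the question's tensors
y_pred1 = Input(shape=(80, 80, 2))      # (None, 80, 80, 2)
y_pred2 = Input(shape=(4,))             # (None, 4)

def tile_pred2(t):
    t = tf.reshape(t, [-1, 1, 1, 4])                        # (None, 1, 1, 4)
    return tf.broadcast_to(t, [tf.shape(t)[0], 80, 80, 4])  # copy the 4 values to every pixel

y_pred2_map = Lambda(tile_pred2)(y_pred2)                   # (None, 80, 80, 4)
y_merged = Concatenate(axis=-1)([y_pred1, y_pred2_map])     # (None, 80, 80, 6)
With this layout every spatial position carries a copy of the 4 values, so the loss can read them back from the last 4 channels.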
hope this helps you out.
Figured it out, the code is like this:
input_x = Input(shape=input_shape)
x=Conv2D(...)(input_x)
...
y_pred1 = Conv2D(...)(x) # shape of (None, 80, 80, 2)
y_pred2 = Dense(4)(x) # (None, 4)
# =========transform to concatenate:===========
y_pred2_matrix = Lambda(lambda x: K.expand_dims(K.expand_dims(x, -1)))(y_pred2) # (None,4, 1,1)
y_pred2_matrix = ZeroPadding2D(padding=((0,76),(0,79)))(y_pred2_matrix) # (None, 80, 80,1)
y_merged = Concatenate(axis=-1)([y_pred1, y_pred2_matrix]) # (None, 80, 80, 3)
The 4 elements of y_pred2 can then be indexed as y_merged[:, :4, 0, 2].
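For completeness, here is a sketch of how the loss fcn1 from the question could pull both branches back out of y_merged under this layout; the loss body itself is only a placeholder to make the sketch runnable, the slicing is the point:
import tensorflow as tf

def fcn1(y_true, y_pred):
    pred1 = y_pred[..., :2]        # the (None, 80, 80, 2) Conv2D branch
    pred2 = y_pred[:, :4, 0, 2]    # the 4 Dense outputs, shape (None, 4)
    # placeholder loss just so the sketch runs; replace with the real fcn1 logic
    return tf.reduce_mean(tf.square(y_true[..., :2] - pred1)) + 0.0 * tf.reduce_sum(pred2)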
I am new to Keras.
My goal is to have a total of 4 max pooling layers. All of them take the same input with shape (N, 256). The first layer does global max pooling and gives 1 output. The second layer, with pooling size N / 2 and stride N / 2, gives 2 outputs. The third gives 4 outputs and the fourth gives 8 outputs. Here is my code.
test_x = np.random.rand(N, 256, 1)
model = Sequential()
input1 = Input(shape=test_x.shape, name='input1')
input2 = Input(shape=test_x.shape, name='input2')
input3 = Input(shape=test_x.shape, name='input3')
input4 = Input(shape=test_x.shape, name='input4')
max1 = MaxPooling2D(pool_size=(N, 256), strides=N)(input1)
max2 = MaxPooling2D(pool_size=(N / 2, 256), strides=N / 2)(input2)
max3 = MaxPooling2D(pool_size=(N / 4, 256), strides=N / 4)(input3)
max4 = MaxPooling2D(pool_size=(N / 8, 256), strides=N / 8)(input4)
mrg = Merge(mode='concat')([max1, max2, max3, max4])
After creating the 4 max pooling layers, I try to merge them together, but Keras gives this error.
ValueError: Dimension 1 in both shapes must be equal, but are 4 and 8 for 'merge_1/concat' (op: 'ConcatV2') with input shapes: [?,1,1,1], [?,2,1,1], [?,4,1,1], [?,8,1,1], [] and with computed input tensors: input[4] = <3>.
How can I solve this issue? Is merging the correct way to achieve my goal in keras?
For concatenation, all dimensions must have the same number of elements, except for the concat dimension itself.
As you can see, your results have shape:
(?, 1, 1, 1)
(?, 2, 1, 1)
(?, 4, 1, 1)
(?, 8, 1, 1)
Naturally, the only possible way to concatenate them is in the second axis (axis=1)
mrg = Concatenate(axis=1)([max1,max2,max3,max4])
But notice that (unless you have specific reasons for that and know exactly what you're doing) this will result in a very weird image, since you're concatenating along a spatial dimension, not along the channel dimension.
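A quick way to see those shapes concretely, as a small sketch with dummy tensors standing in for the four pooling outputs (written against tf.keras; with standalone Keras the import would differ):
import tensorflow as tf
from tensorflow.keras.layers import Concatenate

# dummy tensors with the same shapes as the four pooling outputs
max1 = tf.zeros((1, 1, 1, 1))
max2 = tf.zeros((1, 2, 1, 1))
max3 = tf.zeros((1, 4, 1, 1))
max4 = tf.zeros((1, 8, 1, 1))

mrg = Concatenate(axis=1)([max1, max2, max3, max4])
print(mrg.shape)   # (1, 15, 1, 1)
If you would rather treat the pooled values as plain features, you could also Flatten() each output first and concatenate the resulting (?, 1), (?, 2), (?, 4) and (?, 8) vectors into a single (?, 15) vector.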
I have two tensors, weighted with shape (None, 3) and D with shape (None, 3, 5). I want to multiply weighted with D so that the result weighted * D has shape (None, 3, 5).
I attached my image below: each scalar value of weighted is multiplied with every element of the corresponding row of D.
So I tried multiply([weighted, D]), but I got the error ValueError: Operands could not be broadcast together with shapes (3, 5) (3,). I assume this is caused by the different shapes of the inputs. How do I fix this?
Update
multiply([weighted, Permute((2, 1))(D)]) worked. I am not sure why, but it seems the last elements of the shapes must match.
You can reshape weighted and use broadcasting to accomplish that. Like this:
weighted = weighted.reshape(-1, 3, 1)
result = weighted * D
Update 1: The same concept (broadcasting) can be used, for instance, in TensorFlow with tf.expand_dims(weighted, dim=2). My POC:
import tensorflow as tf
import numpy as np

tf.reset_default_graph()

anp = np.array([[1, 2, 10], [2, 1, 10]])
bnp = np.random.random((2, 3, 5))
with tf.Session() as sess:
    weighted = tf.placeholder(tf.float32, shape=(None, 3))
    D = tf.placeholder(tf.float32, shape=(None, 3, 5))
    rweighted = tf.expand_dims(weighted, dim=2)
    result = rweighted * D
    r = sess.run(result, feed_dict={weighted: anp, D: bnp})
    print(bnp)
    print("--")
    print(r)
For keras use the backend API:
from keras import backend as K
...
K.expand_dims(weighted, 2)
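For example, it could be wrapped in a Lambda layer so it sits inside the model (a sketch only, with Input layers standing in for your actual tensors): expand weighted to (None, 3, 1) and let broadcasting produce (None, 3, 5).
from keras import backend as K
from keras.layers import Input, Lambda

weighted = Input(shape=(3,))     # (None, 3)
D = Input(shape=(3, 5))          # (None, 3, 5)

# expand weighted to (None, 3, 1); the elementwise product then broadcasts to (None, 3, 5)
result = Lambda(lambda t: K.expand_dims(t[0], 2) * t[1])([weighted, D])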
I've been searching for a way to visualize parameters in Caffe after training the network, and I found this link. It visualizes a transpose of the parameters with
filters = net.params['conv1'][0].data
vis_square(filters.transpose(0, 2, 3, 1))
I don't understand why it transposes the data. In vis_square it uses this code:
data = data.reshape((n, n) + data.shape[1:]).transpose((0, 2, 1, 3) + tuple(range(4, data.ndim + 1)))
data = data.reshape((n * data.shape[1], n * data.shape[3]) + data.shape[4:])
which is too compressed for me to understand; any explanation would be appreciated. When I changed the code to get conv2 instead of conv1:
filters = net.params['conv2'][0].data
vis_square(filters.transpose(0, 2, 3, 1))
I get
TypeError: Invalid dimensions for image data
Is there any difference between conv1 and conv2 that causes this error? How can we change the code to fix it so it works for all layers?
Some debugging data :
net.params['conv1'][0].data.shape : (96, 3, 11, 11)
net.params['conv1'][1].data.shape : (96,)
net.params['conv2'][0].data.shape : (256, 48, 5, 5)
net.params['conv2'][1].data.shape : (256,)
net.params['conv3'][0].data.shape : (384, 256, 3, 3)
net.params['conv3'][1].data.shape : (384,)
for conv2:
data.shape[0] : 256
np.sqrt(data.shape[0]) : 16.0
np.ceil(np.sqrt(data.shape[0])) : 16.0
data.shape[0] : 256
data.shape[0:] : (256, 6, 6, 48)
data.shape[1] : 6
data.shape[1:] : (6, 6, 48)
data.ndim : 4
range(4, data.ndim + 1) : [4]
tuple(range(4, data.ndim + 1)) : (4,)
AND after :
data = np.pad(data, padding, mode='constant', constant_values=1)
for conv2:
data.shape : (10, 12, 10, 12, 3)
and after
data = data.reshape((n, n) + data.shape[1:]).transpose((0, 2, 1, 3) + tuple(range(4, data.ndim + 1)))
data became :
data.shape : (120, 120, 3)
The code you inspected is written to visualize (i.e., convert to RGB image) convolutional filters.
The shape of conv1 filters (in your example) is (96, 3, 11, 11) which means
- 96 : you have 96 filters in conv1 of your net (i.e., num_output: 96), therefore you would wish to view 96 different filters.
- 3 : the input dimension of each filter is 3, because the input to conv1 in your net is an RGB image with three channels.
- 11, 11: the spatial size of each kernel/filter in your case is 11x11 (i.e., kernel_size: 11).
Therefore, it is straightforward to visualize the 96 filters as 11x11x3 RGB thumbnails.
However, when trying to visualize conv2 (or any other deeper layer) you have a problem: there is no longer an RGB meaning to the filter dimensions. The filters of conv2 work on the output features of conv1 (which in your case is a 96-dim space). To date, AFAIK, there is no straightforward way to convert such high-dimensional data to a simple 3D RGB representation.
So, you cannot use the same code to visualize conv2 filters. You must use some other method for visualization.
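If you still want some picture of conv2, one common workaround (just a sketch, not the only option) is to drop the RGB interpretation and show each filter as a grayscale tile, for example by averaging over its 48 input channels before calling the vis_square from that link (which also accepts single-channel (n, height, width) arrays):
import numpy as np

filters = net.params['conv2'][0].data        # (256, 48, 5, 5)
gray = filters.mean(axis=1)                  # (256, 5, 5): one grayscale tile per filter
# rescale to [0, 1] so it displays nicely
gray = (gray - gray.min()) / (gray.max() - gray.min())
vis_square(gray)
Alternatively, vis_square(filters[i]) shows the 48 input channels of a single conv2 filter as separate grayscale tiles.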