How to expand 2-dim arrays using the Maclaurin series? - Python

I am trying to feed a pixel vector to a convolutional neural network (CNN), where the pixel vector comes from image data like the CIFAR-10 dataset. Before feeding the pixel vector to the CNN, I need to expand it with the Maclaurin series. The point is, I figured out how to expand a tensor with one dimension, but I am not able to get it right for a tensor with more than one dimension. Can anyone give me ideas on how to apply the Maclaurin series expansion of a 1-dim tensor to a tensor with more dimensions? Is there a heuristic approach to implement this in either TensorFlow or Keras? Any thoughts?
Maclaurin series on CNN:
I figured out a way of expanding a tensor with 1 dim using the Maclaurin series. Here is what the scratch implementation looks like:
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Lambda

def cnn_taylor(input_dim, approx_order=2):
    x = Input((input_dim,))

    # Raise each scalar input to the powers 0..approx_order
    def pwr(x, approx_order):
        x = x[..., None]
        x = tf.tile(x, multiples=[1, 1, approx_order + 1])
        pw = tf.range(0, approx_order + 1, dtype=tf.float32)
        x_p = tf.pow(x, pw)
        x_p = x_p[..., None]
        return x_p

    x_p = Lambda(lambda x: pwr(x, approx_order))(x)
    h = Dense(1, use_bias=False)(x_p)

    # Cumulative sum over the powers gives the partial sums of the expansion
    def cumu_sum(h):
        h = tf.squeeze(h, axis=-1)
        s = tf.cumsum(h, axis=-1)
        s = s[..., None]
        return s

    S = Lambda(cumu_sum)(h)
The above implementation is a sketch of how to expand a CNN with a Taylor expansion using a 1-dim tensor. I am wondering how to do the same thing for a tensor with more dimensions (i.e., dim=3).
If I want to expand a CNN with an approximation order of 2 using the Taylor expansion, where the input is a pixel vector from an RGB image, how can I accomplish this easily in TensorFlow? Any thoughts? Thanks

If I understand correctly, each x in the provided computational graph is just a scalar (one channel of a pixel). In this case, in order to apply the transformation to each pixel, you could:
1. Flatten the 4D (b, h, w, c) input coming from the convolutional layer into a tensor of shape (b, h*w*c).
2. Apply the transformation to the resulting tensor.
3. Undo the reshaping to get a 4D tensor of shape (b, h, w, c) back, for which the "Taylor expansion" has been applied element-wise.
This could be achieved as follows:
shape_cnn = h.shape # Shape=(bs, h, w, c)
flat_dim = h.shape[1] * h.shape[2] * h.shape[3]
h = tf.reshape(h, (-1, flat_dim))
taylor_model = taylor_expansion_network(input_dim=flat_dim, max_pow=approx_order)
h = taylor_model(h)
h = tf.reshape(h, (-1, shape_cnn[1], shape_cnn[2], shape_cnn[3]))
NOTE: I am borrowing the function taylor_expansion_network from this answer.
UPDATE: I still don't clearly understand the end goal, but perhaps this update brings us closer to the desired output. I modified the taylor_expansion_network to apply the first part of the pipeline to RGB images of shape (width, height, nb_channels=3), returning a tensor of shape (width, height, nb_channels=3, max_pow+1):
import tensorflow as tf
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model

def taylor_expansion_network_2(width, height, nb_channels=3, max_pow=2):
    input_dim = width * height * nb_channels
    x = Input((width, height, nb_channels,))
    h = tf.reshape(x, (-1, input_dim))

    # Raise input x_i to power p_i for each i in [0, max_pow].
    def raise_power(x, max_pow):
        x_ = x[..., None]  # Shape=(batch_size, input_dim, 1)
        x_ = tf.tile(x_, multiples=[1, 1, max_pow + 1])  # Shape=(batch_size, input_dim, max_pow+1)
        pows = tf.range(0, max_pow + 1, dtype=tf.float32)  # Shape=(max_pow+1,)
        x_p = tf.pow(x_, pows)  # Shape=(batch_size, input_dim, max_pow+1)
        return x_p

    h = raise_power(h, max_pow)

    # Compute s_i for each i in [0, max_pow]
    h = tf.cumsum(h, axis=-1)  # Shape=(batch_size, input_dim, max_pow+1)

    # Get the input format back
    h = tf.reshape(h, (-1, width, height, nb_channels, max_pow + 1))  # Shape=(batch_size, w, h, nb_channels, max_pow+1)

    # Return Taylor expansion model
    model = Model(inputs=x, outputs=h)
    model.summary()
    return model
In this modified model, the last step of the pipeline, namely the sum of w_i * s_i for each i, is not applied. Now, you can use the resulting tensor of shape (width, height, nb_channels=3, max_pow+1) in any way you want.
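As a quick sanity check of the sketch above (my own usage example, assuming a TF 2.x setup where raw TF ops on Keras tensors work as in the answer's code), you can verify the output shape on a dummy CIFAR-10-sized batch:
import numpy as np
model = taylor_expansion_network_2(width=32, height=32, nb_channels=3, max_pow=2)
imgs = np.random.rand(8, 32, 32, 3).astype('float32')  # dummy batch of 8 RGB images
out = model(imgs)
print(out.shape)  # expected: (8, 32, 32, 3, 3) = (batch, width, height, channels, max_pow+1)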


Trying to use Dice Loss with UNET

I'm trying to implement the U-Net from the Keras website:
Image segmentation with a U-Net-like architecture
with only one change: use Dice loss instead of "sparse_categorical_crossentropy". However, every time I try something, I get a different error. I'm coding on Google Colab using TensorFlow 2.7.
For example, I tried using
from tensorflow.keras import backend as K

def DiceLoss(targets, inputs, smooth=1e-6):
    # flatten label and prediction tensors
    inputs = K.flatten(inputs)
    targets = K.flatten(targets)
    intersection = K.sum(K.dot(targets, inputs))
    dice = (2 * intersection + smooth) / (K.sum(targets) + K.sum(inputs) + smooth)
    return 1 - dice
The error I got:
ValueError: Shape must be rank 2 but is rank 1 for '{{node DiceLoss99/MatMul}} = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false](DiceLoss99/Reshape_1, DiceLoss99/Reshape)' with input shapes: [?], [?].
The problem is on this line:
intersection = K.sum(K.dot(targets, inputs))
I also tried this library:
!pip install git+https://github.com/qubvel/segmentation_models
# define optimizer
n_classes = 3
LR = 0.0001
optim = keras.optimizers.Adam(LR)
dice_loss_sm = sm.losses.DiceLoss(class_weights=K.ones_like(n_classes))
However, I got the following error:
TypeError: Input 'y' of 'Mul' Op has type int32 that does not match type float32 of argument 'x'.
The remaining code is the same as on keras.io, but I list it below for completeness:
from tensorflow import keras
from tensorflow.keras import layers

def get_model(img_size, num_classes):
    inputs = keras.Input(shape=img_size + (3,))

    ### [First half of the network: downsampling inputs] ###

    # Entry block
    x = layers.Conv2D(32, 3, strides=2, padding="same")(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)

    previous_block_activation = x  # Set aside residual

    # Blocks 1, 2, 3 are identical apart from the feature depth.
    for filters in [64, 128, 256]:
        x = layers.Activation("relu")(x)
        x = layers.SeparableConv2D(filters, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)

        x = layers.Activation("relu")(x)
        x = layers.SeparableConv2D(filters, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)

        x = layers.MaxPooling2D(3, strides=2, padding="same")(x)

        # Project residual
        residual = layers.Conv2D(filters, 1, strides=2, padding="same")(
            previous_block_activation
        )
        x = layers.add([x, residual])  # Add back residual
        previous_block_activation = x  # Set aside next residual

    ### [Second half of the network: upsampling inputs] ###

    for filters in [256, 128, 64, 32]:
        x = layers.Activation("relu")(x)
        x = layers.Conv2DTranspose(filters, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)

        x = layers.Activation("relu")(x)
        x = layers.Conv2DTranspose(filters, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)

        x = layers.UpSampling2D(2)(x)

        # Project residual
        residual = layers.UpSampling2D(2)(previous_block_activation)
        residual = layers.Conv2D(filters, 1, padding="same")(residual)
        x = layers.add([x, residual])  # Add back residual
        previous_block_activation = x  # Set aside next residual

    # Add a per-pixel classification layer
    outputs = layers.Conv2D(num_classes, 3, activation="softmax", padding="same")(x)

    # Define the model
    model = keras.Model(inputs, outputs)
    return model

# Free up RAM in case the model definition cells were run multiple times
keras.backend.clear_session()

# Build model
model = get_model(img_size, num_classes)
model.summary()

# Configure the model for training.
# We use the "sparse" version of categorical_crossentropy
# because our target data is integers.
# notice I changed the loss to the dice loss instead of sparse_categorical_crossentropy
model.compile(optimizer="rmsprop", loss="sparse_categorical_crossentropy")

callbacks = [
    keras.callbacks.ModelCheckpoint("oxford_segmentation.h5", save_best_only=True)
]

# Train the model, doing validation at the end of each epoch.
epochs = 15
model.fit(train_gen, epochs=epochs, validation_data=val_gen, callbacks=callbacks)
EDIT
This is the detailed error message from trying the loss from the segmentation_models library:
The issue is in this code:
backend = kwargs['backend']
"""
Args:
    gt: ground truth 4D keras tensor (B, H, W, C) or (B, C, H, W)
    pr: prediction 4D keras tensor (B, H, W, C) or (B, C, H, W)
    class_weights: 1. or list of class weights, len(weights) = C
    class_indexes: Optional integer or list of integers, classes to consider, if ``None`` all classes are used.
    beta: f-score coefficient
    smooth: value to avoid division by zero
    per_image: if ``True``, metric is calculated as mean over images in batch (B),
        else over whole batch
    threshold: value to round predictions (use ``>`` comparison), if ``None`` prediction will not be round

Returns:
    F-score in range [0, 1]
"""
gt, pr = gather_channels(gt, pr, indexes=class_indexes, **kwargs)
pr = round_if_needed(pr, threshold, **kwargs)
axes = get_reduce_axes(per_image, **kwargs)
# calculate score
tp = backend.sum(gt * pr, axis=axes) # the issue here
fp = backend.sum(pr, axis=axes) - tp
fn = backend.sum(gt, axis=axes) - tp
score = ((1 + beta ** 2) * tp + smooth) \
        / ((1 + beta ** 2) * tp + beta ** 2 * fn + fp + smooth)
score = average(score, per_image, class_weights, **kwargs)
return score
The code for gather_channels, round_if_needed and get_reduce_axes is here:
def get_reduce_axes(per_image, **kwargs):
    backend = kwargs['backend']
    axes = [1, 2] if backend.image_data_format() == 'channels_last' else [2, 3]
    if not per_image:
        axes.insert(0, 0)
    return axes

def gather_channels(*xs, indexes=None, **kwargs):
    """Slice tensors along channels axis by given indexes"""
    if indexes is None:
        return xs
    elif isinstance(indexes, (int)):
        indexes = [indexes]
    xs = [_gather_channels(x, indexes=indexes, **kwargs) for x in xs]
    return xs

def round_if_needed(x, threshold, **kwargs):
    backend = kwargs['backend']
    if threshold is not None:
        x = backend.greater(x, threshold)
        x = backend.cast(x, backend.floatx())
    return x
You are passing 1-dimensional vectors to K.dot, while the ValueError says that K.dot requires arrays with 2 dimensions.
You can replace it with an element-wise multiplication, i.e. intersection = K.sum(targets * inputs).
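For completeness, here is a minimal corrected sketch of the DiceLoss above; the cast to float is my addition, which also guards against the kind of int32/float32 mismatch reported with segmentation_models:
from tensorflow.keras import backend as K

def DiceLoss(targets, inputs, smooth=1e-6):
    # flatten label and prediction tensors
    inputs = K.flatten(inputs)
    targets = K.cast(K.flatten(targets), inputs.dtype)  # integer masks -> float
    intersection = K.sum(targets * inputs)  # element-wise product instead of K.dot
    dice = (2 * intersection + smooth) / (K.sum(targets) + K.sum(inputs) + smooth)
    return 1 - dice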

How to resize a batch of images for use with PyTorch Linear Regression?

I am trying to create a simple linear regression neural net for use with batches of images. The input dimensions are [BatchSize, 3, Width, Height] with the second dimension representing the RGB channels of the input image.
Here is my (broken) attempt at that regression model:
import torch
import torch.nn.functional as F

class LinearNet(torch.nn.Module):
    def __init__(self, Chn, W, H, nHidden):
        """
        Input: A [BatchSize x Channels x Width x Height] set of images
        Output: A fitted regression model with weights dimension: [Width x Height]
        """
        super(LinearNet, self).__init__()
        self.Chn = Chn
        self.W = W
        self.H = H
        self.hidden = torch.nn.Linear(Chn * W * H, nHidden)   # hidden layer
        self.predict = torch.nn.Linear(nHidden, Chn * W * H)  # output layer

    def forward(self, x):
        torch.reshape(x, (-1, self.Chn * self.W * self.H))  # FAILS here
        # x = x.resize(-1, self.Chn * self.W * self.H)
        x = F.relu(self.hidden(x))  # activation function for hidden layer
        x = self.predict(x)         # linear output
        x = x.resize(-1, self.Chn, self.W, self.H)
        return x
When sending in a batch of images with dimensions [128 x 3 x 96 x 128] this fails on the indicated line:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (36864x128 and 36864x256)
How should the matrix dimensions be properly manipulated to use these pytorch functions?
Update: Based on a (since deleted) comment, I have updated the code to use torch.reshape.
Solution 1
As a possible solution, you can get the batch size from the input x with x.shape[0] and use it in the reshape later:
import torch
batch = torch.zeros([128, 3, 96, 128], dtype=torch.float32)
# -1 will compute last dimension automatically
batch_upd = torch.reshape(batch, (batch.shape[0], -1))
print(batch_upd.shape)
Output for this code is
torch.Size([128, 36864])
Solution 2
As another possible solution, you can use flatten:
batch_upd = batch.flatten(start_dim=1)
which will result in the same output.
As to your next problem, consider going through the modified forward code:
def forward(self, x):
    x = x.flatten(1)            # shape: [B, C, W, H] -> [B, C*W*H]
    x = F.relu(self.hidden(x))  # activation function for hidden layer
    x = self.predict(x)         # linear output
    x = x.reshape((-1, self.Chn, self.W, self.H))  # shape: [B, C*W*H] -> [B, C, W, H]
    return x
Here is the successful usage example:
ln = LinearNet(3, 96, 128, 256)
batch = torch.zeros((128, 3, 96, 128))
res = ln(batch)
print(res.shape) # torch.Size([128, 3, 96, 128])
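As a design note, here is a sketch of an arguably more idiomatic variant (my own, not from the answer above) that pushes the flattening into the module itself with nn.Flatten, so forward stays symmetric:
import torch
import torch.nn as nn

class LinearNet2(nn.Module):
    def __init__(self, Chn, W, H, nHidden):
        super().__init__()
        self.out_shape = (Chn, W, H)
        self.body = nn.Sequential(
            nn.Flatten(),                     # [B, C, W, H] -> [B, C*W*H]
            nn.Linear(Chn * W * H, nHidden),
            nn.ReLU(),
            nn.Linear(nHidden, Chn * W * H),
        )

    def forward(self, x):
        return self.body(x).reshape(-1, *self.out_shape)  # back to [B, C, W, H]

res = LinearNet2(3, 96, 128, 256)(torch.zeros((128, 3, 96, 128)))
print(res.shape)  # torch.Size([128, 3, 96, 128])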

How to convolve signal with 1D kernel in TensorFlow?

I am trying to filter a TensorFlow tensor of shape (N_batch, N_data), where N_batch is the batch size (e.g. 32), and N_data is the size of the (noisy) timeseries array. I have a Gaussian kernel (taken from here), which is one-dimensional. I then want to use tensorflow.nn.conv1d to convolve this kernel with my signal.
I have been trying for most of the morning to get the dimensions of the input signal and the kernel right, but obviously with no success. From what I gathered from the interwebs, the dimensions of both the input signal and the kernel need to be aligned in some finicky way, and I just can't figure out which way that is. The TensorFlow error messages aren't particularly meaningful either (Shape must be rank 4 but is rank 3 for 'conv1d/Conv2D' (op: 'Conv2D') with input shapes: [?,1,1000], [1,81]). Below I've included a little piece of code to reproduce the situation:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

# Based on: https://stackoverflow.com/a/52012658/1510542
# Credits to @zephyrus

def gaussian_kernel(size, mean, std):
    d = tf.distributions.Normal(tf.cast(mean, tf.float32), tf.cast(std, tf.float32))
    vals = d.prob(tf.range(start=-size, limit=size + 1, dtype=tf.float32))
    kernel = vals  # Some reshaping is required here
    return kernel / tf.reduce_sum(kernel)

def gaussian_filter(input, sigma):
    size = int(4 * sigma + 0.5)
    x = input  # Some reshaping is required here
    kernel = gaussian_kernel(size=size, mean=0.0, std=sigma)
    conv = tf.nn.conv1d(x, kernel, stride=1, padding="SAME")
    return conv

def run_filter():
    tf.reset_default_graph()

    # Define size of data, batch sizes
    N_batch = 32
    N_data = 1000

    noise = 0.2 * (np.random.rand(N_batch, N_data) - 0.5)
    x = np.linspace(0, 2 * np.pi, N_data)
    y = np.tile(np.sin(x), N_batch).reshape(N_batch, N_data)
    y_noisy = y + noise

    input = tf.placeholder(tf.float32, shape=[None, N_data])
    smooth_input = gaussian_filter(input, sigma=10)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        y_smooth = smooth_input.eval(feed_dict={input: y_noisy})

    plt.plot(y_noisy[0])
    plt.plot(y_smooth[0])
    plt.show()

if __name__ == "__main__":
    run_filter()
Any ideas?
You need to add channel dimensions to your input/kernel, since TF convolutions are generally used for multi-channel inputs/outputs. As you are working with simple 1-channel input/output this amounts to just adding some size-1 "dummy" axes.
Since by default convolution expects channels to come last, your placeholder should have shape [None, N_data, 1] and your input should be modified like:
y_noisy = y + noise
y_noisy = y_noisy[:, :, np.newaxis]
Similarly, you need to add input and output channel dimensions to your filter:
kernel = gaussian_kernel(size=size, mean=0.0, std=sigma)
kernel = kernel[:, tf.newaxis, tf.newaxis]
That is, the filter is expected to have shape [width, in_channels, out_channels].
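Putting the two changes together, a hedged sketch of the fixed pieces (same TF 1.x API as in the question, not tested against your exact setup):
def gaussian_kernel(size, mean, std):
    d = tf.distributions.Normal(tf.cast(mean, tf.float32), tf.cast(std, tf.float32))
    vals = d.prob(tf.range(start=-size, limit=size + 1, dtype=tf.float32))
    kernel = vals[:, tf.newaxis, tf.newaxis]  # Shape=(width, in_channels=1, out_channels=1)
    return kernel / tf.reduce_sum(kernel)

def gaussian_filter(input, sigma):
    size = int(4 * sigma + 0.5)
    kernel = gaussian_kernel(size=size, mean=0.0, std=sigma)
    conv = tf.nn.conv1d(input, kernel, stride=1, padding="SAME")  # input: (batch, N_data, 1)
    return conv

# ... and in run_filter, feed a 3-D signal with a trailing channel axis:
input = tf.placeholder(tf.float32, shape=[None, N_data, 1])
y_smooth = smooth_input.eval(feed_dict={input: y_noisy[:, :, np.newaxis]})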

Flattening two last dimensions of a tensor in TensorFlow

I'm trying to reshape a tensor from [A, B, C, D] into [A, B, C * D] and feed it into a dynamic_rnn. Assume that I don't know the B, C, and D in advance (they're a result of a convolutional network).
I think in Theano such reshaping would look like this:
x = x.flatten(ndim=3)
It seems that in TensorFlow there's no easy way to do this and so far here's what I came up with:
x_shape = tf.shape(x)
x = tf.reshape(x, [batch_size, x_shape[1], tf.reduce_prod(x_shape[2:])])
Even when the shape of x is known during graph building (i.e. print(x.get_shape()) prints out absolute values, like [10, 20, 30, 40]), after the reshaping get_shape() becomes [10, None, None]. Again, still assume the initial shape isn't known, so I can't operate with absolute values.
And when I'm passing x to a dynamic_rnn it fails:
ValueError: Input size (depth of inputs) must be accessible via shape inference, but saw value None.
Why is reshape unable to handle this case? What is the right way of replicating Theano's flatten(ndim=n) in TensorFlow with tensors of rank 4 and more?
It is not a flaw in reshape, but a limitation of tf.dynamic_rnn.
Your code to flatten the last two dimensions is correct. And, reshape behaves correctly too: if the last two dimensions are unknown when you define the flattening operation, then so is their product, and None is the only appropriate value that can be returned at this time.
The culprit is tf.dynamic_rnn, which expects a fully-defined feature shape during construction, i.e. all dimensions apart from the first (batch size) and the second (time steps) must be known. It is a bit unfortunate perhaps, but the current implementation does not seem to allow RNNs with a variable number of features, à la FCN.
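As a hedged workaround sketch (my own, not part of the original answer): when the product of the last two dimensions is in fact statically known, you can reattach that information with set_shape so that dynamic_rnn's shape inference succeeds:
def flatten_last_two(x):
    dyn = tf.shape(x)                    # dynamic shape, always available
    static = x.get_shape().as_list()     # static shape, entries may be None
    x = tf.reshape(x, [dyn[0], dyn[1], dyn[2] * dyn[3]])
    if static[2] is not None and static[3] is not None:
        # restore the static feature size so dynamic_rnn can infer the input depth
        x.set_shape([static[0], static[1], static[2] * static[3]])
    return x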
I tried a simple piece of code according to your requirements. Since you are trying to reshape a CNN output, the shape of X is the same as the output of a CNN in TensorFlow.
HEIGHT = 100
WIDTH = 200
N_CHANELS = 3
N_HIDDEN = 64

X = tf.placeholder(tf.float32, shape=[None, HEIGHT, WIDTH, N_CHANELS], name='input')  # output of CNN
shape = X.get_shape().as_list()  # shape[0] = BATCH_SIZE, shape[1] = HEIGHT, shape[2] = WIDTH, shape[3] = N_CHANELS
input = tf.reshape(X, [-1, shape[1], shape[2] * shape[3]])
print(input.shape)  # prints (?, 100, 600)

# Input for tf.nn.dynamic_rnn should be in the shape of [BATCH_SIZE, N_TIMESTEPS, INPUT_SIZE]
# Therefore, according to the reshape, N_TIMESTEPS = 100 and INPUT_SIZE = 600

# create the RNN here
lstm_layers = tf.contrib.rnn.BasicLSTMCell(N_HIDDEN, forget_bias=1.0)
outputs, _ = tf.nn.dynamic_rnn(lstm_layers, input, dtype=tf.float32)
Hope this helps.
I found a solution to this by using .get_shape().
Assuming x is a 4-D tensor, this will only work with the Reshape layer. As you were making changes to the architecture of the model, this should work:
shape = x.get_shape().as_list()
x = tf.keras.layers.Reshape((shape[1], shape[2] * shape[3]))(x)
Hope this works!
If you use the tf.keras.models.Model or tf.keras.layers.Layer wrapper, the build method provides a nice way to do this.
Here's an example:
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Conv1D, Conv2D, Conv2DTranspose, Attention, Layer, Reshape

class VisualAttention(Layer):
    def __init__(self, channels_out, key_is_value=True):
        super(VisualAttention, self).__init__()
        self.channels_out = channels_out
        self.key_is_value = key_is_value
        self.flatten_images = None    # see build method
        self.unflatten_images = None  # see build method
        self.query_conv = Conv1D(filters=channels_out, kernel_size=1, padding='same')
        self.value_conv = Conv1D(filters=channels_out, kernel_size=4, padding='same')
        self.key_conv = self.value_conv if key_is_value else Conv1D(filters=channels_out, kernel_size=4, padding='same')
        self.attention_layer = Attention(use_scale=False, causal=False, dropout=0.)

    def build(self, input_shape):
        b, h, w, c = input_shape
        self.flatten_images = Reshape((h*w, c), input_shape=(h, w, c))
        self.unflatten_images = Reshape((h, w, self.channels_out), input_shape=(h*w, self.channels_out))

    def call(self, x, training=True):
        x = self.flatten_images(x)
        q = self.query_conv(x)
        v = self.value_conv(x)
        inputs = [q, v] if self.key_is_value else [q, v, self.key_conv(x)]
        output = self.attention_layer(inputs=inputs, training=training)
        return self.unflatten_images(output)

# test
import numpy as np
x = np.arange(8*28*32*3).reshape((8, 28, 32, 3)).astype('float32')
model = VisualAttention(8)
y = model(x)
print(y.shape)

Pass in matrix of images of variables sizes into Theano

I'm trying to use Theano to do some recognition. All my images are different sizes, and I don't want to resize them because they're paintings, so they shouldn't all be the same size. I was wondering how to pass a matrix of images with variable sizes into a Theano function.
I'm under the impression that this is not possible with numpy. Is there an alternative?
import numpy as np
import theano
import theano.tensor as T

def floatX(X):
    return np.asarray(X, dtype=theano.config.floatX)

def init_weights(shape):
    return theano.shared(floatX(np.random.randn(*shape) * 0.01))

def model(X, w):
    return T.nnet.softmax(T.dot(X, w))

X = T.fmatrix()
Y = T.fmatrix()

w = init_weights((784, 10))

py_x = model(X, w)
y_pred = T.argmax(py_x, axis=1)

cost = T.mean(T.nnet.categorical_crossentropy(py_x, Y))
gradient = T.grad(cost=cost, wrt=w)
update = [[w, w - gradient * 0.05]]

train = theano.function(inputs=[X, Y], outputs=cost, updates=update, allow_input_downcast=True)
predict = theano.function(inputs=[X], outputs=y_pred, allow_input_downcast=True)
Unless I'm mistaken in my interpretation of your code, I don't think what you're trying to do makes sense.
If I understand correctly, in model() you are computing a weighted sum over your image pixels using dot(X, w), where I assume that X is an (nimages, npixels) array of image data, and w is a weight matrix with fixed dimensions (784, 10).
In order for that dot product to even be computable, X.shape[1] (the number of pixels in each of your input images) must be equal to w.shape[0].
If the sizes of your input images vary, how can you expect to learn a single weight matrix with fixed dimensions?
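To make the shape constraint concrete, here is a tiny numpy sketch (the names are mine) showing why images with different flattened lengths cannot share one fixed weight matrix:
import numpy as np

w = np.random.randn(784, 10) * 0.01  # one weight row per pixel, fixed at 784
img_a = np.random.rand(1, 784)       # a flattened 28x28 image: dot works
img_b = np.random.rand(1, 900)       # a flattened 30x30 image: dot fails

print(img_a.dot(w).shape)  # (1, 10)
try:
    img_b.dot(w)
except ValueError as err:
    print(err)  # shapes (1,900) and (784,10) not aligned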
