Adding additional features in Theano (CNN) - python

I'm using Theano for classification (convolutional neural networks).
Previously, I've been using the pixel values of the (flattened) image as the features of the NN.
Now, I want to add additional features. I've been told that I can concatenate that vector of additional features to the flattened image features and then use that as input to the fully-connected layer, but I'm having trouble with that.
First of all, is that the right approach?
Here are some code snippets and the error I get.
The model is similar to the example provided on their site, with some modifications.
(from the class that builds the model)
# allocate symbolic variables for the data
self.x = T.matrix('x') # the data is presented as rasterized images
self.y = T.ivector('y') # the labels are presented as 1D vector of [int] labels
self.f = T.matrix('f') # additional features
In the code below, the variables v and rng are defined earlier; what's important is layer2_input:
layer2_input = self.layer1.output.flatten(2)
layer2_input = T.concatenate([layer2_input, self.f.flatten(2)])
self.layer2 = HiddenLayer(rng, input=layer2_input, n_in=v, n_out=200, activation=T.tanh)
(from the class that trains)
train_model = theano.function([index], cost, updates=updates,
        givens={
            model.x: train_set_x[index * batch_size: (index + 1) * batch_size],
            model.y: train_set_y[index * batch_size: (index + 1) * batch_size],
            model.f: train_set_f[index * batch_size: (index + 1) * batch_size]
        })
However, I get an error when train_model is called:
ValueError: GpuJoin: Wrong inputs for input 1 related to inputs 0.!
Apply node that caused the error: GpuJoin(TensorConstant{0}, GpuElemwise{tanh,no_inplace}.0, GpuFlatten{2}.0)
Inputs shapes: [(), (5, 11776), (5, 2)]
Inputs strides: [(), (11776, 1), (2, 1)]
Inputs types: [TensorType(int8, scalar), CudaNdarrayType(float32, matrix), CudaNdarrayType(float32, matrix)]
Do the input shapes represent the shapes of x, y and f, respectively?
If so, the third seems correct (batchsize=5, 2 extra features), but why is the first a scalar and the second a matrix?
More details:
train_set_x.shape = (61, 19200) [61 flattened images (160x120), 19200 pixels]
train_set_y.shape = (61,) [61 integer labels]
train_set_f.shape = (61,2) [2 additional features per image]
batch_size = 5
Do I have the right idea or is there a better way of accomplishing this?
Any insights into why I'm getting an error?

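The issue was that I was concatenating along the wrong axis. T.concatenate defaults to axis=0, so it tried to stack the two matrices row-wise, which requires them to have the same number of columns (11776 vs. 2 here). This also explains the error output: the listed input shapes belong to the inputs of the GpuJoin node, not to x, y and f. The scalar () is the axis argument (TensorConstant{0}), followed by the two matrices being joined. The line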
layer2_input = T.concatenate([layer2_input, self.f.flatten(2)])
should have been
layer2_input = T.concatenate([layer2_input, self.f.flatten(2)], axis=1)
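A minimal NumPy sketch of the same situation, since np.concatenate also defaults to axis=0; the zero arrays only have the shapes from the error message. As an assumption about the rest of the model, the n_in passed to HiddenLayer (v in the question) would then have to equal the combined width, 11776 + 2 = 11778.

```python
import numpy as np

batch_size = 5
conv_features = np.zeros((batch_size, 11776), dtype=np.float32)  # flattened layer1 output
extra_features = np.zeros((batch_size, 2), dtype=np.float32)     # the two additional features

# Default axis=0 stacks rows, so the column counts (11776 vs 2) would have to match.
try:
    np.concatenate([conv_features, extra_features])
    axis0_failed = False
except ValueError:
    axis0_failed = True

# axis=1 appends the extra features as new columns of each sample's feature vector.
combined = np.concatenate([conv_features, extra_features], axis=1)
print(axis0_failed, combined.shape)  # True (5, 11778)
```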

Related

Keras input layer shape matches any shape given by generator

I found this strange behaviour: Keras trains without error even though the shapes do not match.
import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

n = Input(shape=(16, 1))
d = Dense(1, activation='sigmoid')(n)
m = Model(n, d)
m.compile(optimizer='adam', loss='binary_crossentropy')  # compile step assumed; not shown in the original

def get_gen():
    for i in range(10):
        x = np.random.rand(32, 28, 1)
        yield x, np.ones(32)

m.fit(get_gen())  # Works, strangely
m.fit(np.random.rand(32, 28, 1), np.ones(32))  # Fails, reasonably
So why is the input shape not respected in the case of a generator? Is there implicit broadcasting? I really cannot make sense of this behaviour, as it should raise an error when the input shapes mismatch.

ValueError: Cannot iterate over a shape with unknown rank

I want to train an autoencoder on mp3 songs. Given the size of the dataset, it would be better if only part of the dataset is in memory at any given time.
What I tried is using tfio and tf.data.Dataset, but that gives me an error when fitting the model:
ValueError: Cannot iterate over a shape with unknown rank.
The code was as follows:
import tensorflow as tf
import tensorflow_io as tfio

segment_length = 1024
filenames = tf.data.Dataset.list_files('data/*')

def decode_mp3(mp3_path):
    mp3_path = mp3_path.numpy().decode("utf-8")
    audio = tfio.audio.AudioIOTensor(mp3_path)
    audio_tensor = tf.cast(audio[:], tf.float32)
    overflow = len(audio_tensor) % segment_length
    # [:-overflow] would return an empty tensor when overflow == 0
    audio_tensor = audio_tensor[:len(audio_tensor) - overflow, 0]
    audio_tensor = tf.reshape(audio_tensor, (len(audio_tensor), 1))
    audio_tensor = audio_tensor[:, 0]
    return audio_tensor

song_dataset = filenames.map(lambda path:
    tf.py_function(func=decode_mp3, inp=[path], Tout=tf.float32))
segment_dataset = song_dataset.flat_map(lambda song:
    tf.data.Dataset.from_tensor_slices(song)).batch(segment_length)
dataset = segment_dataset.map(lambda x: (x, x))  # add labels (identical to inputs here)
With a model like this:
encoder = keras.models.Sequential([
    keras.layers.Input((segment_length, 1)),
    keras.layers.Conv1D(128, 3, strides=2, padding="same"),
    ...
])
But, as I said, calling fit throws the error above, even though the shape is exactly what I would hope for:
for x, y in dataset.take(1):
    print(x.shape, y.shape)
> (1024, 1) (1024, 1)
Any help on this would be appreciated. I might be misunderstanding something with input shapes and datasets.
So I finally found part of the answer. The Input layer seems to be meant for models built with the functional API (?), so I removed it. Now the model looks like this:
encoder = keras.models.Sequential([
    keras.layers.Conv1D(128, 3, strides=2, padding="same", input_shape=(segment_length, 1)),
    ...
where the Input layer is replaced with an input_shape parameter in the first Conv1D layer. I also batched the dataset with
ds = dataset.batch(2)
and that was important too. Any further clarification would still be appreciated. Nonetheless, I hope this helps people with the same problem.
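One plausible explanation, offered as an assumption rather than something the asker confirmed: tf.py_function returns a tensor whose shape, including its rank, is unknown, so code that tries to iterate over that shape fails; and Keras's fit expects every dataset element to be a whole batch, which is why .batch(2) mattered. The sketch below replaces decode_mp3 with a hypothetical decode_stub (the file names are made up too), declares the rank with set_shape, and batches the dataset before it would be passed to fit.

```python
import tensorflow as tf

segment_length = 1024

def decode_stub(path):
    # Hypothetical stand-in for decode_mp3: 4096 silent samples per "song".
    return tf.zeros([4096], dtype=tf.float32)

def load_song(path):
    audio = tf.py_function(func=decode_stub, inp=[path], Tout=tf.float32)
    # The py_function output has unknown rank; declare it as a 1-D tensor of unknown length.
    audio.set_shape([None])
    return audio

filenames = tf.data.Dataset.from_tensor_slices(["song_a.mp3", "song_b.mp3"])
song_dataset = filenames.map(load_song)

# Flatten all samples into one stream, then cut it into fixed-length segments.
segment_dataset = song_dataset.flat_map(
    lambda song: tf.data.Dataset.from_tensor_slices(song)
).batch(segment_length, drop_remainder=True)

# Add a channel axis so each segment is (segment_length, 1); the input doubles as the target.
dataset = segment_dataset.map(lambda x: tf.expand_dims(x, -1)).map(lambda x: (x, x))

# fit() consumes batches of shape (batch, segment_length, 1).
dataset = dataset.batch(2)
print(dataset.element_spec)
```

With 2 stub songs of 4096 samples each, this yields 8 segments and therefore 4 batches of 2.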

Wrong shape Dataset Tensorflow

I'm new to TensorFlow and I'm trying to feed some data with tf.data.Dataset. I'm using the Cityscapes dataset with 8 different classes. Here is my code:
import os
import cv2
import numpy as np
import tensorflow as tf

H = 256
W = 256
id2cat = np.array([0,0,0,0,0,0,0, 1,1,1,1, 2,2,2,2,2,2, 3,3,3,3, 4,4, 5, 6,6, 7,7,7,7,7,7,7,7,7])

def readImage(x):
    x = cv2.imread(x, cv2.IMREAD_COLOR)
    x = cv2.resize(x, (W, H))
    x = x / 255.0
    x = x.astype(np.float32)
    return x

def readMask(path):
    mask = cv2.imread(path, 0)
    mask = cv2.resize(mask, (W, H))
    mask = id2cat[mask]
    return mask.astype(np.int32)

def preprocess(x, y):
    def f(x, y):
        image = readImage(x)
        mask = readMask(y)
        return image, mask

    image, mask = tf.numpy_function(f, [x, y], [tf.float32, tf.int32])
    mask = tf.one_hot(mask, 3, dtype=tf.int32)
    image.set_shape([H, W, 3])
    mask.set_shape([H, W, 3])
    return image, mask

def tf_dataset(x, y, batch=8):
    dataset = tf.data.Dataset.from_tensor_slices((x, y))
    dataset = dataset.shuffle(buffer_size=5000)
    dataset = dataset.map(preprocess)
    dataset = dataset.batch(batch)
    dataset = dataset.repeat()
    dataset = dataset.prefetch(2)
    return dataset

def loadCityscape():
    trainPath = os.path.join(os.path.dirname(os.path.realpath(__file__)), 'datasets\\Cityscape\\train')
    imagesPath = os.path.join(trainPath, 'images')
    maskPath = os.path.join(trainPath, 'masks')
    images = []
    masks = []
    print('Loading images and masks for Cityscape dataset...')
    for image in os.listdir(imagesPath):
        images.append(readImage(os.path.join(imagesPath, image)))
    for mask in os.listdir(maskPath):
        if 'label' in mask:
            masks.append(readMask(os.path.join(maskPath, mask)))
    print('Loaded {} images\n'.format(len(images)))
    return images, masks

images, masks = loadCityscape()
dataset = tf_dataset(images, masks, batch=8)
print(dataset)
print(dataset)
That last print(dataset) shows:
<PrefetchDataset shapes: ((None, 256, 256, 3), (None, 256, 256, 3)), types: (tf.float32, tf.int32)>
Why am I obtaining (None, 256, 256, 3) instead of (8, 256, 256, 3)? I also have some doubts about how to iterate over this dataset.
Thanks a lot.
TensorFlow is a graph-based mathematical framework that abstracts away the complex vector and matrix operations you face, particularly in machine learning.
What the developers thought is that it would be uncomfortable to specify, every single time, how many input vectors you pass to your model during training, so they abstracted that away for you.
You don't care whether your model is fed a single sample or thousands of them, as long as the output matches the input dimensions (and every internal operation matches in dimensions too!).
So the None size is a placeholder for a dimension that can change, usually the batch size of the input.
We need a placeholder because (None, 2) is a different shape from just (2,): the first describes a batch of any number of 2-element vectors (a rank-2 tensor), while the second is a single 2-element vector.
Even though the None dimension is unknown when you "compile" your model, it is only resolved when strictly needed, in other words when you run it. This way your model will happily run on a batch of 64 samples just as well as on one of 128.
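A small sketch of this idea, assuming TensorFlow 2.x: a tf.function whose input_signature declares the shape (None, 2) accepts batches of any size, and the None is filled in at call time. The function name sum_features is only illustrative.

```python
import tensorflow as tf

# Traced once for inputs of shape (None, 2): any number of samples, 2 features each.
@tf.function(input_signature=[tf.TensorSpec(shape=[None, 2], dtype=tf.float32)])
def sum_features(batch):
    return tf.reduce_sum(batch, axis=1)  # one value per sample

small = sum_features(tf.ones([64, 2]))
large = sum_features(tf.ones([128, 2]))
print(small.shape, large.shape)  # (64,) (128,)
```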
For the rest, a (non-scalar) Tensor behaves like a normal NumPy array:
import tensorflow as tf

tensor1 = tf.constant([0, 1, 2, 3])          # shape (4,)
tensor2 = tf.constant([[0], [1], [2], [3]])  # shape (4, 1)
for x in tensor1:
    print(x)  # scalar tensors holding 0, 1, 2, 3
for x in tensor2:
    print(x)  # shape-(1,) tensors holding [0], [1], [2], [3]
The only difference is that it can be allocated in the memory of any supported device (CPU / CUDA GPU).
Iterating through the dataset is like slicing it into chunks of (usually) constant size, where that constant is your batch size, which fills in the empty None dimension.
This line of code is responsible for slicing your dataset into "sub-tensors" ("sub-arrays") made up of its samples:
dataset = dataset.batch(N)
# iterating over it:
for batch in dataset:  # I'm taking N samples here
    ...
Your "runtime" shape will be (N, 256, 256, 3), but the shape the dataset reports (for example in print(dataset)) can still contain None. That's because we can't guarantee, for example, that the size of the dataset is exactly divisible by the batch size, so the last batch may contain fewer samples. You will rarely get rid of that None dimension, although in some custom methods of your model you can.
If you are still uncomfortable with tensors, there is the tensor.numpy() method, which gives you back a NumPy array at the cost of a copy (usually to CPU memory). It is only available in eager mode, not inside graph code such as a function passed to Dataset.map.
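A small sketch with dummy arrays (the 20 samples and the 4x4 spatial size are arbitrary) that shows both points: the static batch dimension is None by default and becomes 8 with batch(8, drop_remainder=True), which discards the incomplete final batch, while eager iteration always yields concrete shapes. In the question's tf_dataset this would presumably be dataset.batch(batch, drop_remainder=True); repeat() and prefetch() keep that static shape. as_numpy_iterator() is shown as one way to get NumPy arrays out of a dataset.

```python
import numpy as np
import tensorflow as tf

# Dummy data: 20 "images" and 20 "masks" with a small 4x4 spatial size.
images = np.zeros((20, 4, 4, 3), dtype=np.float32)
masks = np.zeros((20, 4, 4, 3), dtype=np.int32)

ragged_tail = tf.data.Dataset.from_tensor_slices((images, masks)).batch(8)
fixed = tf.data.Dataset.from_tensor_slices((images, masks)).batch(8, drop_remainder=True)

print(ragged_tail.element_spec[0].shape)  # (None, 4, 4, 3): last batch may be smaller
print(fixed.element_spec[0].shape)        # (8, 4, 4, 3): incomplete last batch is dropped

# Eager iteration gives concrete shapes: 8, 8, then the remaining 4 samples.
for image_batch, mask_batch in ragged_tail:
    print(image_batch.shape, mask_batch.shape)

# as_numpy_iterator() yields NumPy arrays instead of tf.Tensor objects.
first_images, first_masks = next(ragged_tail.as_numpy_iterator())
print(type(first_images).__name__, first_images.shape)
```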
There are many ways to define a dataset in TensorFlow. I suggest reading the official tf.data guide on building input pipelines, because your life will be easier once you understand how TensorFlow expects you to structure your code at those higher levels of abstraction.

Why can the reshape function in keras not change the number of dimensions

I'm attempting to make a chess engine using a neural network built in Keras. I want to output a prediction of the probable policy based on training games, and I am using a 73x8x8 output for that (for each square of the board, 73 possible move types: 8 directions * 7 squares for the "queen moves", 8 knight moves, and 3 promotions (any other promotion is a queen promotion) times 3 directions).
However, the final layer in my network is a Dense layer, which outputs a one-dimensional vector of length 4672. I am trying to reshape this into something easier to use with a Reshape layer.
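As a quick check, the move-type count described above and the length of the flattened Dense output work out as:

$$
\underbrace{8 \times 7}_{\text{queen moves}} + \underbrace{8}_{\text{knight moves}} + \underbrace{3 \times 3}_{\text{promotions}} = 56 + 8 + 9 = 73, \qquad 73 \times 8 \times 8 = 4672
$$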
However, it gives me this error: ValueError: Error when checking target: expected reshape_1 to have 4 dimensions, but got array with shape (2, 1)
I have had a look at this question: "Error when checking target: expected dense_1 to have 3 dimensions, but got array with shape (118, 1)", but the answer doesn't seem to apply to Dense layers, as they do not have a return_sequences argument.
Here is my code:
from keras.models import Model, Input
from keras.layers import Conv2D, Dense, Flatten, Reshape
from keras.optimizers import SGD
import numpy
from copy import deepcopy

class NeuralNetwork:
    def __init__(self):
        self.network = Model()
        self.create_network()

    def create_network(self):
        input = Input((13, 8, 8))
        output = Conv2D(256, (3, 3), padding='same')(input)
        policy_head_output = Conv2D(2, (1, 1))(output)
        policy_head_output = Flatten()(policy_head_output)
        policy_head_output = Dense(4672, name='policy_output')(policy_head_output)
        policy_head_output = Reshape((73, 8, 8), input_shape=(4672,))(policy_head_output)
        value_head_output = Conv2D(1, (1, 1))(output)
        value_head_output = Dense(256)(value_head_output)
        value_head_output = Flatten()(value_head_output)
        value_head_output = Dense(1, name="value_output")(value_head_output)
        self.network = Model(outputs=[value_head_output, policy_head_output], inputs=input)

    def train_network(self, input_training_data, labels):
        sgd = SGD(0.2, 0.9)
        self.network.compile(sgd, 'categorical_crossentropy', metrics=['accuracy'])
        self.network.fit(input_training_data, [labels[0], labels[1]])
        self.network.save("Neural Net 1")

def make_training_data():
    training_data = []
    labels = []
    for i in range(6):
        training_data.append(make_image())
        labels.append(make_label_image())
    return training_data, labels

def make_image():
    data = []
    for i in range(13):
        blank_board = []
        for j in range(8):
            a = []
            for k in range(8):
                a.append(0)
            blank_board.append(a)
        data.append(blank_board)
    return data

def make_label_image():
    policy_logits = []
    blank_board = []
    for i in range(8):
        a = []
        for j in range(8):
            a.append(0)
        blank_board.append(a)
    for i in range(73):
        policy_logits.append(deepcopy(blank_board))
    return [policy_logits, [0]]

def main():
    input_training_data, output_training_data = make_training_data()
    neural_net = NeuralNetwork()
    input_training_data = numpy.array(input_training_data)
    output_training_data = numpy.array(output_training_data)
    neural_net.train_network(input_training_data, output_training_data)

main()
Could someone please explain:
What's happening
What I can do to fix it
There are a few things wrong with your approach.
1. Your output/target data
You're creating a list with two elements, (board, label), where board is 73x8x8 and label is a single 0/1 value. These have inconsistent dimensions, and when you convert such a ragged structure to a numpy array, this happens:
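import numpy as np

a = [[0,1,2,3],[0]]
arr = np.array(a, dtype=object)  # NumPy >= 1.24 raises ValueError for ragged lists unless dtype=object is given
print(arr)
# => [list([0, 1, 2, 3]) list([0])]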
After that, slicing and indexing take a very weird turn, and I will not go there. So the first thing is to separate out your data, so that each element returned by make_training_data has consistent dimensions. Here the input images, the output board images and the output labels are returned separately:
def make_training_data():
    training_data = []
    labels = []
    board_output = []
    for i in range(6):
        training_data.append(make_image())
        board, lbl = make_label_image()
        labels.append(lbl)
        board_output.append(board)
    return training_data, board_output, labels
and in main() it becomes:
input_training_data, output_training_board, output_training_labels = make_training_data()
input_training_data = np.array(input_training_data)
output_training_board = np.array(output_training_board)
output_training_labels = np.array(output_training_labels)
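A quick way to confirm the separation worked is to check the resulting shapes. This sketch uses a simplified, NumPy-based stand-in for make_label_image that returns the same structure as the question's version (a 73x8x8 board plus a [0] label):

```python
import numpy as np

def make_label_image():
    # Simplified stand-in for the question's helper: a 73x8x8 board of zeros plus a [0] label.
    board = np.zeros((73, 8, 8), dtype=int).tolist()
    return [board, [0]]

boards, labels = [], []
for _ in range(6):
    board, lbl = make_label_image()
    boards.append(board)
    labels.append(lbl)

output_training_board = np.array(boards)
output_training_labels = np.array(labels)
print(output_training_board.shape, output_training_labels.shape)  # (6, 73, 8, 8) (6, 1)
```

Both arrays are now regular numeric arrays whose first axis is the number of samples, which is what Keras expects for each target.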
The error
So you're getting the error
ValueError: Error when checking target: expected reshape_1 to have 4 dimensions, but got array with shape (2, 1)
It's simple: you have given the targets in the wrong order to model.fit(). In other words, your model says,
outputs=[value_head_output, policy_head_output]
but your make_label_image() returns
[policy_logits, [0]]
which is the other way around. Your poor model is trying to reshape the labels into that 4-dimensional structure, and that's why it complains. So it should be:
neural_net.train_network(input_training_data, [output_training_labels, output_training_board])
Even if you fix only this (without changing make_training_data()), you probably won't get it working, because of the inconsistencies in your numpy structure described in the first section.
The loss function
This is about your loss function. Your value head is a Dense layer with a single output, and you're using categorical_crossentropy, which is meant for "categorical" (one-hot, multi-class) outputs. You should use binary_crossentropy here, as you only have a single 0/1 value (ideally with a sigmoid activation on that layer, so that its output is a probability).
Also, if you want different losses for your different outputs, do the following:
self.network.compile(sgd, ['binary_crossentropy', 'mean_squared_error'], metrics=['accuracy'])
This is just an example; you can also use the same loss for both outputs.
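A minimal, self-contained sketch of these fixes together, assuming TensorFlow 2.x with tf.keras. It is a toy version of the network (a small Conv2D and one Dense layer per head, not the question's full architecture), with a sigmoid on the value head; the targets are passed in the same order as outputs, and each loss in the list applies to the output at the same position.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy two-headed network with the question's input and output shapes.
inputs = keras.Input((13, 8, 8))
x = layers.Conv2D(4, (3, 3), padding="same")(inputs)
x = layers.Flatten()(x)
value = layers.Dense(1, activation="sigmoid", name="value_output")(x)
policy = layers.Dense(73 * 8 * 8)(x)
policy = layers.Reshape((73, 8, 8), name="policy_output")(policy)
model = keras.Model(inputs=inputs, outputs=[value, policy])

# One loss per output, in the same order as `outputs`.
model.compile("sgd", loss=["binary_crossentropy", "mean_squared_error"])

n = 6
x_train = np.zeros((n, 13, 8, 8), dtype="float32")
y_value = np.zeros((n, 1), dtype="float32")          # one 0/1 label per sample
y_policy = np.zeros((n, 73, 8, 8), dtype="float32")  # one 73x8x8 board per sample

# Targets listed in the same order as `outputs`: value first, then policy.
history = model.fit(x_train, [y_value, y_policy], epochs=1, verbose=0)
preds = model.predict(x_train, verbose=0)
```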

duplicate a column in keras tensor

I am writing a custom loss function for semi-supervised learning on the CIFAR-10 dataset, for which I need to duplicate columns of my tensor to create a sort of mask, which I then multiply with the activation values and later sum over.
My loss function is the sum of the entropy for unlabelled samples and the cross-entropy for labelled samples. I add an extra class and set it to 1 for unlabelled samples.
I then create a mask identifying the row indices of unlabelled samples from the y_true tensor. From that I should get an (n_samples, 1) tensor, which I need to repeat/duplicate/copy into an (n_samples, 11) tensor that I can multiply with the activation values in y_pred.
Loss function code:
a = np.ones((mini_batch_size, 1)) * 10
a_var = K.variable(value=a)
v = K.cast(K.equal(K.cast(K.argmax(y_true, axis=1), 'float32'), a_var), 'float32')
e_loss = K.sum(K.concatenate([v,v,v,v,v,v,v,v,v,v,v], axis=-1) * K.log(y_pred) * y_pred)
m_u = K.sum(K.cast(K.equal(K.cast(K.argmax(y_true, axis=1), 'float32'), a_var), 'float32'))
b = np.ones((mini_batch_size, 1)) * 10
b_var = K.variable(value=b)
v2 = K.cast(K.not_equal(K.cast(K.argmax(y_true, axis=1), 'float32'), b_var), 'float32')
ce_loss = K.sum(K.concatenate([v2, v2, v2, v2, v2, v2, v2, v2, v2, v2, v2], axis=1) * K.log(y_pred))
m_l = K.variable(value=float(mini_batch_size), dtype='float32') #- m_u
return -((e_loss/m_u) + (ce_loss/m_l))
The error I get is:
InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Incompatible shapes: [40,11] vs. [40,440]
[[{{node loss_36/dense_74_loss/mul_2}}]]
[[metrics_28/acc/Mean/_2627]]
(1) Invalid argument: Incompatible shapes: [40,11] vs. [40,440]
[[{{node loss_36/dense_74_loss/mul_2}}]]
0 successful operations.
0 derived errors ignored.
My batch size is 40.
I need my concatenated tensor to be of size [40, 11] not [40, 440]
I don't have real data to test whether the loss works properly, but this got rid of the InvalidArgumentError and did work with model.fit() for a dense model.
A few changes I made:
You don't have to repeat v 11 times to multiply it with y_pred. All you need is to reshape it to (-1, 1), and broadcasting applies it across all 11 columns (this also saves memory).
I got rid of all the K.variables. This is something I want to check with you: you are not trying to optimize a_var and b_var, right (i.e. they are not part of the model)? Apparently, the variables are what's causing the issue, because of their shape: K.argmax(y_true, axis=1) has shape (40,), and comparing it with the (40, 1) a_var broadcasts the result to (40, 40), so concatenating 11 copies gives (40, 440). It seems the whole point of a_var and b_var is to perform the equal and not_equal comparisons, which work just fine with a constant.
I made m_l a K.constant.
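from keras import backend as K  # Keras 2.x backend API; the original snippet omits its imports

mini_batch_size = 40  # the batch size mentioned in the question

def loss_fn(y_true, y_pred):
    # 1.0 for unlabelled samples (extra class index 10), else 0.0; shape (batch,)
    v = K.cast(K.equal(K.cast(K.argmax(y_true, axis=-1), 'float32'), 10), 'float32')
    # reshape to (batch, 1) so it broadcasts across the 11 columns of y_pred
    e_loss = K.sum(K.reshape(v, (-1,1)) * K.log(y_pred) * y_pred)
    m_u = K.sum(K.cast(K.equal(K.cast(K.argmax(y_true, axis=-1), 'float32'), 10), 'float32'))
    # 1.0 for labelled samples
    v2 = K.cast(K.not_equal(K.cast(K.argmax(y_true, axis=-1), 'float32'), 10), 'float32')
    ce_loss = K.sum(K.reshape(v2, (-1,1)) * K.log(y_pred))
    m_l = K.constant(value=float(mini_batch_size), dtype='float32') #- m_u
    return -((e_loss/m_u) + (ce_loss/m_l))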
Note: depending on the batch size inside the loss function is a bad idea. Try to get rid of any batch-size-dependent operations (especially for tensor shapes). I have only kept mini_batch_size to set m_l, but I would suggest setting it to some fixed constant instead of mini_batch_size. If a batch with fewer than 40 samples comes through, you are effectively using a different loss function for that batch, and your results aren't comparable across batch sizes because the loss function changes.
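The shape problem can be reproduced with plain NumPy, which broadcasts the same way as the backend comparison and multiplication ops here. This is a sketch with dummy data (every sample is marked as unlabelled, class 10); the variable names mirror the question's but are only illustrative.

```python
import numpy as np

batch, n_classes = 40, 11
y_pred = np.full((batch, n_classes), 1.0 / n_classes)
y_true = np.zeros((batch, n_classes))
y_true[:, 10] = 1.0  # pretend every sample is unlabelled (extra class 10)

labels = np.argmax(y_true, axis=1).astype(np.float32)  # shape (40,)
a = np.ones((batch, 1)) * 10                           # shape (40, 1), like a_var

# (40,) compared with (40, 1) broadcasts to (40, 40); 11 copies side by side give (40, 440).
v_bad = (labels == a).astype(np.float32)
v_bad_repeated = np.concatenate([v_bad] * 11, axis=-1)
print(v_bad.shape, v_bad_repeated.shape)  # (40, 40) (40, 440)

# Comparing with a scalar keeps shape (40,); reshaping to (40, 1) broadcasts over the 11 columns.
v = (labels == 10).astype(np.float32)
masked = v.reshape(-1, 1) * np.log(y_pred) * y_pred
print(masked.shape)  # (40, 11)
```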
