I want to train an autoencoder on mp3 songs. Given the size of the dataset, it would be better if only part of the dataset is in memory at any given time.
What I tried is using tfio and tf.data.Dataset, but that gives me an error when fitting the model:
ValueError: Cannot iterate over a shape with unknown rank.
The code was as follows:
segment_length = 1024
filenames = tf.data.Dataset.list_files('data/*')

def decode_mp3(mp3_path):
    mp3_path = mp3_path.numpy().decode("utf-8")
    audio = tfio.audio.AudioIOTensor(mp3_path)
    audio_tensor = tf.cast(audio[:], tf.float32)
    overflow = len(audio_tensor) % segment_length
    audio_tensor = audio_tensor[:-overflow, 0]
    audio_tensor = tf.reshape(audio_tensor, (len(audio_tensor), 1))
    audio_tensor = audio_tensor[:, 0]
    return audio_tensor

song_dataset = filenames.map(lambda path:
    tf.py_function(func=decode_mp3, inp=[path], Tout=tf.float32))
segment_dataset = song_dataset.flat_map(lambda song:
    tf.data.Dataset.from_tensor_slices(song)).batch(segment_length)
dataset = segment_dataset.map(lambda x: (x, x))  # add labels (identical to inputs here)
With a model like so:
encoder = keras.models.Sequential([
    keras.layers.Input((segment_length, 1)),
    keras.layers.Conv1D(128, 3, strides=2, padding="same"),
    ...
])
but as I said, calling fit throws the error above, even though the shape is exactly what I would hope for:
for x, y in dataset.take(1):
    print(x.shape, y.shape)
> (1024, 1) (1024, 1)
Any help on this would be appreciated. I might be misunderstanding something with input shapes and datasets.
So I finally found part of the answer. The Input layer seems to be meant for models built with the functional API (?), so I removed it. Now the model looks like this:
encoder = keras.models.Sequential([
    keras.layers.Conv1D(128, 3, strides=2, padding="same", input_shape=(segment_length, 1)),
    ...
where the Input layer is replaced by an input_shape argument on the first Conv1D layer. I also batched the dataset with:
ds = dataset.batch(2)
and that was important too. Any further clarification would still be appreciated. Nonetheless, I hope this can help people with the same problem.
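For anyone who wants to see the pieces together, here is a minimal sketch of the corrected setup. The Conv1DTranspose decoder and the adam/mse choices are placeholders of mine, not from the original post; the key point is that fit consumes batched elements, so the model sees tensors of shape (batch, segment_length, 1):
# Sketch only: everything after the first Conv1D is a hypothetical placeholder.
autoencoder = keras.models.Sequential([
    keras.layers.Conv1D(128, 3, strides=2, padding="same",
                        input_shape=(segment_length, 1)),
    keras.layers.Conv1DTranspose(1, 3, strides=2, padding="same"),
])
autoencoder.compile(optimizer="adam", loss="mse")

ds = dataset.batch(2)      # each element: ((2, 1024, 1), (2, 1024, 1))
autoencoder.fit(ds, epochs=1)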
I'm attempting to make a chess engine using a neural network built in Keras. I want to output a prediction of the probable policy based on training games, and I am using a 73x8x8 output to do that: each position on the board times 73 possible move types (8 directions * 7 squares for the "queen moves", plus 8 knight moves, plus 3 promotions (any other promotion is a queen promotion) times 3 directions, i.e. 56 + 8 + 9 = 73).
However, the final layer in my network is a Dense layer, which produces a one-dimensional output of length 4672. I am trying to reshape this into something easier to use with a Reshape layer.
However, it gives me this error: ValueError: Error when checking target: expected reshape_1 to have 4 dimensions, but got array with shape (2, 1)
I have had a look at this question: Error when checking target: expected dense_1 to have 3 dimensions, but got array with shape (118, 1), but the answer doesn't seem to apply to Dense layers, as they do not have a "return_sequences" argument.
Here is my code:
from keras.models import Model, Input
from keras.layers import Conv2D, Dense, Flatten, Reshape
from keras.optimizers import SGD
import numpy
from copy import deepcopy

class NeuralNetwork:
    def __init__(self):
        self.network = Model()
        self.create_network()

    def create_network(self):
        input = Input((13, 8, 8))
        output = Conv2D(256, (3, 3), padding='same')(input)

        policy_head_output = Conv2D(2, (1, 1))(output)
        policy_head_output = Flatten()(policy_head_output)
        policy_head_output = Dense(4672, name='policy_output')(policy_head_output)
        policy_head_output = Reshape((73, 8, 8), input_shape=(4672,))(policy_head_output)

        value_head_output = Conv2D(1, (1, 1))(output)
        value_head_output = Dense(256)(value_head_output)
        value_head_output = Flatten()(value_head_output)
        value_head_output = Dense(1, name="value_output")(value_head_output)

        self.network = Model(outputs=[value_head_output, policy_head_output], inputs=input)

    def train_network(self, input_training_data, labels):
        sgd = SGD(0.2, 0.9)
        self.network.compile(sgd, 'categorical_crossentropy', metrics=['accuracy'])
        self.network.fit(input_training_data, [labels[0], labels[1]])
        self.network.save("Neural Net 1")

def make_training_data():
    training_data = []
    labels = []
    for i in range(6):
        training_data.append(make_image())
        labels.append(make_label_image())
    return training_data, labels

def make_image():
    data = []
    for i in range(13):
        blank_board = []
        for j in range(8):
            a = []
            for k in range(8):
                a.append(0)
            blank_board.append(a)
        data.append(blank_board)
    return data

def make_label_image():
    policy_logits = []
    blank_board = []
    for i in range(8):
        a = []
        for j in range(8):
            a.append(0)
        blank_board.append(a)
    for i in range(73):
        policy_logits.append(deepcopy(blank_board))
    return [policy_logits, [0]]

def main():
    input_training_data, output_training_data = make_training_data()
    neural_net = NeuralNetwork()
    input_training_data = numpy.array(input_training_data)
    output_training_data = numpy.array(output_training_data)
    neural_net.train_network(input_training_data, output_training_data)

main()
Could someone please explain what's happening and what I can do to fix it?
There are a few things wrong with your approach.
1. Your output/target data
So you're creating a list object with two elements (board, label). The board is 73x8x8 while the label is a single 0/1 value, which gives inconsistent dimensions. And when you convert this ragged structure to a numpy array, this happens:
import numpy as np

a = [[0, 1, 2, 3], [0]]
arr = np.array(a)
print(arr)
# => [list([0, 1, 2, 3]) list([0])]
Then indexing and slicing take a very weird turn, and I will not go there. So the first thing is to separate out your data so that each element returned by make_training_data has consistent dimensions. Here the input image, the output board image and the output labels are returned separately:
def make_training_data():
    training_data = []
    labels = []
    board_output = []
    for i in range(6):
        training_data.append(make_image())
        board, lbl = make_label_image()
        labels.append(lbl)
        board_output.append(board)
    return training_data, board_output, labels
and in main(), it becomes:
input_training_data, output_training_board, output_training_labels = make_training_data()
input_training_data = np.array(input_training_data)
output_training_board = np.array(output_training_board)
output_training_labels = np.array(output_training_labels)
2. The error
So you're getting the error
ValueError: Error when checking target: expected reshape_1 to have 4 dimensions, but got array with shape (2, 1)
Well, it's simple: you have given the outputs in the wrong order when doing the model.fit(). In other words, your model says,
outputs=[value_head_output, policy_head_output]
and your make_label_image() says,
[policy_logits, [0]]
which is the other way around. Your poor model is trying to reshape the labels to that 4-dimensional structure. That's why it complains. So it should be:
neural_net.train_network(input_training_data, [output_training_labels, output_training_board])
Even if you correct just this (without fixing make_training_data()), you probably won't get this working because of all those inconsistencies in your numpy structure (see the first section).
3. The loss function
You have a Dense layer with a single output, and you're using categorical_crossentropy, which is for "categorical" outputs. You should use binary_crossentropy here, as you only have a single output.
Also, if you want different losses for your multiple outputs, do the following:
self.network.compile(sgd, ['binary_crossentropy', 'mean_squared_error'], metrics=['accuracy'])
This is just an example. If you want, you can use the same loss for both outputs too.
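As a side note that is not from the original answer: Keras also accepts a dict mapping output-layer names to losses, which avoids depending on the order of the outputs list. This assumes the two output layers are actually named 'value_output' and 'policy_output'; in the code above the policy name sits on the Dense layer rather than on the final Reshape layer, so the name would have to move onto the Reshape for the dict form to pick it up.
# Hypothetical variant: losses keyed by output-layer name instead of position.
self.network.compile(
    sgd,
    loss={'value_output': 'binary_crossentropy',
          'policy_output': 'mean_squared_error'},
    metrics=['accuracy'])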
My code:
def f(x):
    try:
        import tensorflow as tf
        # x is (None, 10, 2)
        idx = K.cast(x*15.5+15.5, "int32")
        z = tf.sparse_to_dense(idx, 32, 1.0, 0.0, name='sparse_tensor')
        print('z.shape={0}'.format(z.shape))
    except Exception as e:
        print(e)
    return x[:, :, 0:2]

drop_out = Lambda(lambda x: f(x),
                  output_shape=drop_output_shape, name='projection')(reshape_out)
x is a tensor of shape (None, 10, 2), where there are 10 indices/coordinates. I'm trying to generate a (None, 32, 32) tensor z. I got the following error:
Shape must be rank 1 but is rank 0 for 'projection_14/sparse_tensor' (op: 'SparseToDense') with input shapes: [?,10,2], [], [], [].
How can I fix it? Thanks.
The specific error you've seen is trying to say your output_shape should be a 1-D tensor, like (32,), rather than the 0-D tensor you had there, 32. But I worry this simple change will not solve your problem.
One thing I don't understand is why your x is a 3-D tensor when you said you have just 10 indices. Technically speaking, sparse_indices can be a 2-D tensor at most. My understanding of tf.sparse_to_dense is that it's quite similar to building a sparse tensor. So the number 2 in your (10, 2) already decided that the output tensor will be 2-D. The None, i.e. a variable sample size, should be handled differently.
Following this logic, another problem you may find is that output_shape should be (32, 32) rather than the (32,) from the simple fix mentioned above. The length of the tuple should match the last axis of sparse_indices.
With all this in mind, I think a TensorFlow-only MVCE mimicking your example could be:
import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=(10, 2))
idx = tf.cast(x*15.5+15.5, tf.int32)
z = tf.sparse_to_dense(idx, (32, 32), 1.0, 0.0, name='sparse_tensor')

with tf.Session() as sess:
    print(sess.run(
        z, feed_dict={x: np.arange(20, dtype=np.float32).reshape((10, 2))/20}))
Just to point out: The tf.sparse_to_dense "FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Create a tf.sparse.SparseTensor and use tf.sparse.to_dense instead."
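For completeness, here is a rough sketch of the same example with the non-deprecated API, written in eager TF 2.x style rather than with a placeholder and Session (my adaptation, under the same assumptions about the indices; tf.sparse.reorder is there because tf.sparse.to_dense expects indices in canonical row-major order):
import numpy as np
import tensorflow as tf

# Same toy indices as in the MVCE above, mapped into the 0..31 range.
x = np.arange(20, dtype=np.float32).reshape((10, 2)) / 20
idx = tf.cast(x * 15.5 + 15.5, tf.int64)   # SparseTensor wants int64 indices

sparse = tf.sparse.SparseTensor(indices=idx,
                                values=tf.ones([idx.shape[0]]),
                                dense_shape=(32, 32))
z = tf.sparse.to_dense(tf.sparse.reorder(sparse), default_value=0.0)
print(z.shape)  # (32, 32)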
I'm trying to reshape a tensor from [A, B, C, D] into [A, B, C * D] and feed it into a dynamic_rnn. Assume that I don't know the B, C, and D in advance (they're a result of a convolutional network).
I think in Theano such reshaping would look like this:
x = x.flatten(ndim=3)
It seems that in TensorFlow there's no easy way to do this and so far here's what I came up with:
x_shape = tf.shape(x)
x = tf.reshape(x, [batch_size, x_shape[1], tf.reduce_prod(x_shape[2:])])
Even when the shape of x is known during graph building (i.e. print(x.get_shape()) prints absolute values like [10, 20, 30, 40]), after the reshaping get_shape() becomes [10, None, None]. Again, still assume the initial shape isn't known, so I can't operate with absolute values.
And when I'm passing x to a dynamic_rnn it fails:
ValueError: Input size (depth of inputs) must be accessible via shape inference, but saw value None.
Why is reshape unable to handle this case? What is the right way of replicating Theano's flatten(ndim=n) in TensorFlow with tensors of rank 4 and more?
It is not a flaw in reshape, but a limitation of tf.nn.dynamic_rnn.
Your code to flatten the last two dimensions is correct. And, reshape behaves correctly too: if the last two dimensions are unknown when you define the flattening operation, then so is their product, and None is the only appropriate value that can be returned at this time.
The culprit is tf.nn.dynamic_rnn, which expects a fully-defined feature shape during construction, i.e. all dimensions apart from the first (batch size) and the second (time steps) must be known. It is a bit unfortunate perhaps, but the current implementation does not seem to allow RNNs with a variable number of features, à la FCN.
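One possible workaround, as a sketch of my own rather than part of this answer: when the trailing dimensions are in fact statically known at graph-construction time, use those static values in the reshape so shape inference keeps the feature dimension defined, and take the batch and time axes from the dynamic shape:
import tensorflow as tf

def flatten_last_two(x):
    # Mix static and dynamic shape information: known dimensions stay Python
    # ints, so downstream ops such as dynamic_rnn still see a defined feature
    # size; unknown ones fall back to the runtime shape (in which case
    # dynamic_rnn will still complain).
    static = x.get_shape().as_list()   # e.g. [None, None, 30, 40]
    dynamic = tf.shape(x)
    if None in static[2:]:
        feat = dynamic[2] * dynamic[3]
    else:
        feat = static[2] * static[3]
    return tf.reshape(x, [dynamic[0], dynamic[1], feat])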
I tried a simple piece of code according to your requirements. Since you are trying to reshape a CNN output, the shape of X is the same as the output of a CNN in TensorFlow.
HEIGHT = 100
WIDTH = 200
N_CHANELS = 3
N_HIDDEN = 64

X = tf.placeholder(tf.float32, shape=[None, HEIGHT, WIDTH, N_CHANELS], name='input')  # output of CNN
shape = X.get_shape().as_list()  # shape[0] = BATCH_SIZE, shape[1] = HEIGHT, shape[2] = WIDTH, shape[3] = N_CHANELS
input = tf.reshape(X, [-1, shape[1], shape[2] * shape[3]])
print(input.shape)  # prints (?, 100, 600)

# Input for tf.nn.dynamic_rnn should be in the shape of [BATCH_SIZE, N_TIMESTEPS, INPUT_SIZE]
# Therefore, according to the reshape, N_TIMESTEPS = 100 and INPUT_SIZE = 600

# create the RNN here
lstm_layers = tf.contrib.rnn.BasicLSTMCell(N_HIDDEN, forget_bias=1.0)
outputs, _ = tf.nn.dynamic_rnn(lstm_layers, input, dtype=tf.float32)
Hope this helps.
I found a solution to this by using .get_shape().
Assuming 'x' is a 4-D Tensor.
This only works with the Reshape layer, but since you were already making changes to the model's architecture, it should fit right in.
x = tf.keras.layers.Reshape((x.get_shape()[1], x.get_shape()[2] * x.get_shape()[3]))(x)
Hope this works!
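A quick usage sketch of that idea (my example, assuming TF 2.x and a tensor whose shape is fully known; Reshape takes the target shape without the batch dimension and is applied by calling the layer on the tensor):
import tensorflow as tf

x = tf.zeros((10, 20, 30, 40))  # example 4-D tensor with known dimensions
y = tf.keras.layers.Reshape(
    (x.get_shape()[1], x.get_shape()[2] * x.get_shape()[3]))(x)
print(y.shape)  # (10, 20, 1200)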
If you use the tf.keras.models.Model or tf.keras.layers.Layer wrapper, the build method provides a nice way to do this.
Here's an example:
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Conv1D, Conv2D, Conv2DTranspose, Attention, Layer, Reshape

class VisualAttention(Layer):
    def __init__(self, channels_out, key_is_value=True):
        super(VisualAttention, self).__init__()
        self.channels_out = channels_out
        self.key_is_value = key_is_value
        self.flatten_images = None  # see build method
        self.unflatten_images = None  # see build method
        self.query_conv = Conv1D(filters=channels_out, kernel_size=1, padding='same')
        self.value_conv = Conv1D(filters=channels_out, kernel_size=4, padding='same')
        self.key_conv = self.value_conv if key_is_value else Conv1D(filters=channels_out, kernel_size=4, padding='same')
        self.attention_layer = Attention(use_scale=False, causal=False, dropout=0.)

    def build(self, input_shape):
        b, h, w, c = input_shape
        self.flatten_images = Reshape((h*w, c), input_shape=(h, w, c))
        self.unflatten_images = Reshape((h, w, self.channels_out), input_shape=(h*w, self.channels_out))

    def call(self, x, training=True):
        x = self.flatten_images(x)
        q = self.query_conv(x)
        v = self.value_conv(x)
        inputs = [q, v] if self.key_is_value else [q, v, self.key_conv(x)]
        output = self.attention_layer(inputs=inputs, training=training)
        return self.unflatten_images(output)

# test
import numpy as np
x = np.arange(8*28*32*3).reshape((8, 28, 32, 3)).astype('float32')
model = VisualAttention(8)
y = model(x)
print(y.shape)
I'm using Theano for classification (convolutional neural networks).
Previously, I've been using the pixel values of the (flattened) image as the features of the NN.
Now, I want to add additional features. I've been told that I can concatenate that vector of additional features to the flattened image features and then use that as input to the fully-connected layer, but I'm having trouble with that.
First of all, is that the right approach?
Here are some code snippets and my errors:
It is similar to the provided example from their site, with some modifications.
(from the class that builds the model)
# allocate symbolic variables for the data
self.x = T.matrix('x') # the data is presented as rasterized images
self.y = T.ivector('y') # the labels are presented as 1D vector of [int] labels
self.f = T.matrix('f') # additional features
In the code below, the variables v and rng are defined previously. What's important is layer2_input:
layer2_input = self.layer1.output.flatten(2)
layer2_input = T.concatenate([layer2_input, self.f.flatten(2)])
self.layer2 = HiddenLayer(rng, input=layer2_input, n_in=v, n_out=200, activation=T.tanh)
(from the class that trains)
train_model = theano.function([index], cost, updates=updates,
    givens={
        model.x: train_set_x[index * batch_size: (index + 1) * batch_size],
        model.y: train_set_y[index * batch_size: (index + 1) * batch_size],
        model.f: train_set_f[index * batch_size: (index + 1) * batch_size]
    })
However, I get an error when train_model is called:
ValueError: GpuJoin: Wrong inputs for input 1 related to inputs 0.!
Apply node that caused the error: GpuJoin(TensorConstant{0}, GpuElemwise{tanh,no_inplace}.0, GpuFlatten{2}.0)
Inputs shapes: [(), (5, 11776), (5, 2)]
Inputs strides: [(), (11776, 1), (2, 1)]
Inputs types: [TensorType(int8, scalar), CudaNdarrayType(float32, matrix), CudaNdarrayType(float32, matrix)]
Do the input shapes represent the shapes of x, y and f, respectively?
If so, the third seems correct (batch size = 5, 2 extra features), but why is the first a scalar and the second a matrix?
More details:
train_set_x.shape = (61, 19200) [61 flattened images (160x120), 19200 pixels]
train_set_y.shape = (61,) [61 integer labels]
train_set_f.shape = (61,2) [2 additional features per image]
batch_size = 5
Do I have the right idea or is there a better way of accomplishing this?
Any insights into why I'm getting an error?
The issue was that I was concatenating on the wrong axis.
layer2_input = T.concatenate([layer2_input, self.f.flatten(2)])
should have been
layer2_input = T.concatenate([layer2_input, self.f.flatten(2)], axis=1)
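To illustrate why the axis matters, here is a small numpy sketch of mine using the shapes from the traceback (batch of 5, 11776 flattened features, 2 extra features):
import numpy as np

layer_out = np.zeros((5, 11776))   # flattened layer1 output for a batch of 5
extra = np.zeros((5, 2))           # 2 additional features per sample

# axis=0 would try to stack along the batch dimension and fails, because the
# second dimensions (11776 vs 2) do not match.
# np.concatenate([layer_out, extra], axis=0)  # -> ValueError

# axis=1 appends the extra features to each sample, as intended.
combined = np.concatenate([layer_out, extra], axis=1)
print(combined.shape)  # (5, 11778)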