Tensorflow InvalidArgumentError Matrix size incompatible - python

When I run this simple neural net, I get an error. By the way, the code should output the first number of the test array.
There have been other errors along the way (there was one having to do with the data's dtype).
import tensorflow as tf
import numpy as np
from tensorflow import keras

data = np.array([[0, 1, 1], [0, 0, 1], [1, 1, 1]])
labels = np.array([0, 0, 1])
data.dtype = float
print(data.dtype)

model = keras.Sequential([
    keras.layers.Dense(3, activation=tf.nn.relu),
    keras.layers.Dense(2, activation=tf.nn.softmax)])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(data, labels)

prediction = model.predict([0, 1, 0])
print(prediction)
I get this error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Matrix size-incompatible: In[0]: [3,1], In[1]: [3,3]
[[{{node sequential/dense/Relu}}]]

You are getting the above error because of the line below:
prediction = model.predict([0, 1, 0])
You are passing a plain Python list; predict expects a numpy array of shape Nx3, where N is the batch size and can be 1, 2, etc. In this case it will be 1.
In order to make it correct, change it to
prediction = model.predict(np.expand_dims(np.array([0, 1, 0], dtype=np.float32), 0))
or
prediction = model.predict(np.array([[0, 1, 0]], dtype=np.float32))
Also, data.dtype = float does not convert the values; it reinterprets the array's underlying buffer in place. Change it to data = data.astype(np.float32) instead.
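Putting both fixes together, a minimal corrected version of the script might look like this (a sketch; it keeps the names and architecture from the question):

import numpy as np
import tensorflow as tf
from tensorflow import keras

# Convert the values with astype(); reassigning .dtype would only
# reinterpret the raw buffer instead of converting the numbers.
data = np.array([[0, 1, 1], [0, 0, 1], [1, 1, 1]]).astype(np.float32)
labels = np.array([0, 0, 1])

model = keras.Sequential([
    keras.layers.Dense(3, activation=tf.nn.relu),
    keras.layers.Dense(2, activation=tf.nn.softmax)])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(data, labels)

# predict expects a batch of shape (N, 3); here N = 1.
prediction = model.predict(np.array([[0, 1, 0]], dtype=np.float32))
print(prediction)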

Issue with using pooling layers in Spektral package for graph neural networks

I am currently working on implementing graph neural networks using the Spektral package in Python. I am trying to include pooling layers (such as TopKPool) in my model, but I am running into dimension issues.
I have created a sample code to reproduce the error. Here is the code:
import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
from spektral.layers import GCNConv, TopKPool

# Example adjacency matrix and feature matrix
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=np.float32)
X = np.array([[0, 1],
              [2, 3],
              [4, 5],
              [6, 7]], dtype=np.float32)

# Add a batch dimension to X and A
X = np.expand_dims(X, axis=0)
A = np.expand_dims(A, axis=0)

# Build the graph convolutional network model
X_in = Input((X.shape[1], X.shape[2]))
A_in = Input((A.shape[1], A.shape[2]))
gc1 = GCNConv(16, activation='relu')([X_in, A_in])
gc2 = GCNConv(32, activation='relu')([gc1, A_in])
pool1, A_out = TopKPool(ratio=0.5)([gc2, A_in])
flatten = Dense(128, activation='relu')(pool1)
model = Model(inputs=[X_in, A_in], outputs=flatten)
When I run this code, I get the following error message:
ValueError: Dimensions [1,1) of input[shape=[?]] = [] must match
dimensions [1,2) of updates[shape=[?,1]] = 1: Shapes must be equal
rank, but are 0 and 1 for '{{node top_k_pool/TensorScatterUpdate}} =
TensorScatterUpdate[T=DT_FLOAT, Tindices=DT_INT32](top_k_pool/sub_1,
top_k_pool/strided_slice_5, top_k_pool/strided_slice_1)' with input
shapes: [?], [?,1], [?,1].
I have tried using different pooling layers (SAGPool etc.) and checked the dimensions according to the package documentation, but I keep encountering the same error.
Here are the input dimensions from the package documentation:
Node features of shape (batch, n_nodes_in, n_node_features);
Adjacency matrix of shape (batch, n_nodes_in, n_nodes_in).
I would greatly appreciate any assistance in resolving this issue.

tf.keras.metrics.TruePositives() returning wrong value in model.fit() when passed as metric to model.compile()

When I pass one-hot encoded labels as train and validation data into tensorflow keras' model.fit() function, the metric tf.keras.metrics.TruePositives() returns wrong values.
I'm running Tensorflow 2.0.
For example, if this is my code:
model.compile(optimizer, 'binary_crossentropy',
              ['accuracy', tf.keras.metrics.TruePositives()])
history = model.fit(train_data, train_labels_binary, batch_size=32, epochs=30,
                    validation_data=(val_data, val_labels_binary),
                    callbacks=[early_stopping])
train_labels_binary is this: array([[1, 0], [1, 0], [0, 1]])
and the resulting y_pred's are array([[1, 0], [1, 0], [0, 1]])
then tf.keras.metrics.TruePositives() should return 1, but it returns 3.
Any help would be greatly appreciated!!
OK, I did some more experimenting, and it is fixed when the labels are not one-hot encoded and there is only one output neuron. So all metrics run correctly if we change the following two lines:
This: train_labels = np.eye(2)[np.random.randint(0, 2, size=(10, 1)).reshape(-1)]
To: train_labels = np.random.randint(0, 2, size=(10, 1))
and
This: model.add(layers.Dense(units=2, activation='sigmoid'))
To: model.add(layers.Dense(units=1, activation='sigmoid'))
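For reference, here is a minimal self-contained sketch of the fixed setup; the hidden layer and the random data are assumptions, since the question only shows the two changed lines:

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Integer 0/1 labels instead of one-hot vectors
train_data = np.random.rand(10, 4).astype(np.float32)
train_labels = np.random.randint(0, 2, size=(10, 1))

model = models.Sequential()
model.add(layers.Dense(units=8, activation='relu', input_shape=(4,)))  # assumed hidden layer
model.add(layers.Dense(units=1, activation='sigmoid'))  # single output neuron

model.compile('adam', 'binary_crossentropy',
              ['accuracy', tf.keras.metrics.TruePositives()])
model.fit(train_data, train_labels, epochs=5, verbose=0)

With a single sigmoid output and integer labels, TruePositives counts one entry per sample, which matches the intuitive definition.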

How do I input more than one sequence into a CNN in TensorFlow?

I'm brand new to TensorFlow and working through an example similar to the text classification guide here.
I have a working model with a single sequence, but I'm trying to figure out how to have two distinct sequences per observation enter into model training. I could just concatenate the two sequences and feed them in as one, but I'd like to keep each sequence distinct.
I looked at the documentation for the input shape argument, which led me to try passing the input shape as a tuple of the shapes of the two sequences (see below), but that did not seem to do it:
x_train = [x_trainfirst, x_trainlast]
x_val = [x_valfirst, x_vallast]
shape_param = tuple([i.shape[1] for i in x_train])
ValueError: Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 1 array(s), but instead got the following list of 2 arrays: [array([[ 0, 0, 0, ..., 21, 5, 4],
[ 0, 0, 0, ..., 10, 1, 11],
[ 0, 0, 0, ..., 26, 8, 7],
...,
[ 0, 0, 0, ..., 10, 2, 3],
[ 0, 0, 0, ..., 8, 7, ...
Any suggestions or pointers to examples/resources for running multiple sequences into a CNN would be much appreciated!
EDIT: Showing architecture and dimensions as suggested:
def sepcnn_model(blocks,
                 filters,
                 kernel_size,
                 embedding_dim,
                 dropout_rate,
                 pool_size,
                 input_shape,
                 num_classes,
                 num_features,
                 use_pretrained_embedding=False,
                 is_embedding_trainable=False,
                 embedding_matrix=None):
    """Creates an instance of a separable CNN model.

    # Arguments
        blocks: int, number of pairs of sepCNN and pooling blocks in the model.
        filters: int, output dimension of the layers.
        kernel_size: int, length of the convolution window.
        embedding_dim: int, dimension of the embedding vectors.
        dropout_rate: float, percentage of input to drop at Dropout layers.
        pool_size: int, factor by which to downscale input at MaxPooling layer.
        input_shape: tuple, shape of input to the model.
        num_classes: int, number of output classes.
        num_features: int, number of words (embedding input dimension).
        use_pretrained_embedding: bool, true if pre-trained embedding is on.
        is_embedding_trainable: bool, true if embedding layer is trainable.
        embedding_matrix: dict, dictionary with embedding coefficients.

    # Returns
        A sepCNN model instance.
    """
    op_units, op_activation = _get_last_layer_units_and_activation(num_classes)
    model = models.Sequential()

    # Add embedding layer. If pre-trained embedding is used add weights to the
    # embeddings layer and set trainable to input is_embedding_trainable flag.
    if use_pretrained_embedding:
        model.add(Embedding(input_dim=num_features,
                            output_dim=embedding_dim,
                            input_length=input_shape[0],
                            weights=[embedding_matrix],
                            trainable=is_embedding_trainable))
    else:
        model.add(Embedding(input_dim=num_features,
                            output_dim=embedding_dim,
                            input_length=input_shape[0]))

    for _ in range(blocks - 1):
        model.add(Dropout(rate=dropout_rate))
        model.add(SeparableConv1D(filters=filters,
                                  kernel_size=kernel_size,
                                  activation='relu',
                                  bias_initializer='random_uniform',
                                  depthwise_initializer='random_uniform',
                                  padding='same'))
        model.add(SeparableConv1D(filters=filters,
                                  kernel_size=kernel_size,
                                  activation='relu',
                                  bias_initializer='random_uniform',
                                  depthwise_initializer='random_uniform',
                                  padding='same'))
        model.add(MaxPooling1D(pool_size=pool_size))

    model.add(SeparableConv1D(filters=filters * 2,
                              kernel_size=kernel_size,
                              activation='relu',
                              bias_initializer='random_uniform',
                              depthwise_initializer='random_uniform',
                              padding='same'))
    model.add(SeparableConv1D(filters=filters * 2,
                              kernel_size=kernel_size,
                              activation='relu',
                              bias_initializer='random_uniform',
                              depthwise_initializer='random_uniform',
                              padding='same'))
    model.add(GlobalAveragePooling1D())
    model.add(Dropout(rate=dropout_rate))
    model.add(Dense(op_units, activation=op_activation))
    return model
learning_rate = 1e-3
epochs = 1000
batch_size = 128
blocks = 2
filters = 64
dropout_rate = 0.2
embedding_dim = 200
kernel_size = 3
pool_size = 3

# trying to get multiple features
(training_data_fname, training_data_lname, training_labels), \
    (val_data_fname, val_data_lname, val_labels) = tupleData  # data

# Verify that validation labels are in the same range as training labels.
num_classes = get_num_classes(training_labels)
unexpected_labels = [v for v in val_labels if v not in range(num_classes)]
# if len(unexpected_labels):
#     raise ValueError('Unexpected label values found in the validation set:'
#                      ' {unexpected_labels}. Please make sure that the '
#                      'labels in the validation set are in the same range '
#                      'as training labels.'.format(
#                          unexpected_labels=unexpected_labels))

# Vectorize texts.
x_trainfirst, x_valfirst, word_index_first = sequence_vectorize(
    training_data_fname, val_data_fname)
x_trainlast, x_vallast, word_index_last = sequence_vectorize(
    training_data_lname, val_data_lname)

# Number of features will be the embedding input dimension. Add 1 for the
# reserved index 0.
num_features = min(len(word_index_first) + 1, len(word_index_last) + 1, TOP_K)

x_train = [x_trainfirst, x_trainlast]
x_val = [x_valfirst, x_vallast]
shape_param = tuple([i.shape[1] for i in x_train])

# Create model instance.
model = sepcnn_model(blocks=blocks,
                     filters=filters,
                     kernel_size=kernel_size,
                     embedding_dim=embedding_dim,
                     dropout_rate=dropout_rate,
                     pool_size=pool_size,
                     input_shape=shape_param,  # x_train.shape[1:]
                     num_classes=num_classes,
                     num_features=num_features)

# Compile model with learning parameters.
if num_classes == 2:
    loss = 'binary_crossentropy'
else:
    loss = 'sparse_categorical_crossentropy'
optimizer = tf.keras.optimizers.Adam(lr=learning_rate)
model.compile(optimizer=optimizer, loss=loss, metrics=['acc'])

# Create callback for early stopping on validation loss. If the loss does
# not decrease in two consecutive tries, stop training.
callbacks = [tf.keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=2)]

# Train and validate model.
history = model.fit(
    x_train,
    training_labels,
    epochs=epochs,
    callbacks=callbacks,
    validation_data=(x_val, val_labels),
    verbose=2,  # Logs once per epoch.
    batch_size=batch_size)

# Print results.
history = history.history
print('Validation accuracy: {acc}, loss: {loss}'.format(
    acc=history['val_acc'][-1], loss=history['val_loss'][-1]))

# Save model.
model.save('sepcnn_model.h5')
# return history['val_acc'][-1], history['val_loss'][-1], model
tupleData = (training_data_fname, training_data_lname, training_labels), \
    (val_data_fname, val_data_lname, val_labels)

shape_param = tuple([i.shape[1] for i in x_train])
shape_param
(17, 22)

print(x_trainfirst.shape)
(50000, 17)
print(x_trainlast.shape)
(50000, 22)

x_trainfirst[0]
array([ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 18,  2, 21,  5,  4],
      dtype=int32)
x_trainlast[0]
array([ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 10,
        2,  3,  7,  5,  4], dtype=int32)
You can try something like the following.
Create one network whose input is a sequence and whose output has some fixed dimension, say 32. These 32 dimensions represent the features of the sequence.
Pass the other sequence through the same network.
Now you have a 32-dim vector for each sequence. You can concatenate these features and use the resulting 64-dim vector as input to another model (it can be a simple feed-forward network), whose output is your target (for example, whether both sequences are the same or not).
This way you can capture the relation between the two sequences; a sketch follows below.
Note: before feeding the sequences into the network, make sure both have the same dimensions.
You can read more here, where the same idea is used to find the similarity between faces.
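A minimal sketch of this shared-encoder idea with the Keras functional API; the encoder layers, the vocabulary size, and the common sequence length are assumptions (pad both sequences to the same length first, per the note above):

import tensorflow as tf
from tensorflow.keras import layers, models

seq_len = 22        # common length after padding (assumed)
num_features = 30   # vocabulary size (assumed)

# Shared encoder: both sequences pass through the same layers (shared weights).
encoder = models.Sequential([
    layers.Embedding(input_dim=num_features, output_dim=200),
    layers.SeparableConv1D(64, 3, activation='relu', padding='same'),
    layers.GlobalAveragePooling1D(),
    layers.Dense(32, activation='relu'),  # 32-dim feature vector per sequence
])

first_in = layers.Input(shape=(seq_len,), name='first_name')
last_in = layers.Input(shape=(seq_len,), name='last_name')

# Concatenate the two 32-dim feature vectors into one 64-dim vector.
merged = layers.concatenate([encoder(first_in), encoder(last_in)])
out = layers.Dense(1, activation='sigmoid')(merged)

model = models.Model(inputs=[first_in, last_in], outputs=out)
model.compile('adam', 'binary_crossentropy', metrics=['acc'])
# model.fit([x_trainfirst_padded, x_trainlast_padded], train_labels, ...)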

Tensorflow Keras output style

I have encountered a strange thing while building a dummy model in Keras. For reasons that are not important now, I decided to try to train a set of weights to become the identity matrix. My code was the following:
import tensorflow as tf
from tensorflow import keras
import numpy as np

tfe = tf.contrib.eager
tf.enable_eager_execution()

i4 = np.eye(4)
inds = np.random.randint(0, 4, size=2000)
data = i4[inds]

model = keras.Sequential([keras.layers.Dense(4, kernel_regularizer=
                          keras.regularizers.l2(.001), kernel_initializer='zeros')])
model.compile(optimizer=tf.train.AdamOptimizer(.001), loss='mse', metrics=['accuracy'])
model.fit(data, inds, epochs=50)
This did horribly on what should be a very simple task. I changed the last line to
model.fit(data, data, epochs =50)
which I think essentially means I am feeding the labels in as one-hot vectors. With this line, the training did exactly what I wanted on this very simple task. So, my questions are:
Why would this not work with the first line but work with the second?
What do I need to do to feed the output to Keras not as one-hot vectors? Not that I mind converting; it's just that some of the examples I've seen, even MNIST, don't seem to convert their labels to one-hots before feeding them in. What's the issue here? Is Keras trying to convert the numerical labels I've given it in a way I don't expect? If so, how does it convert such labels so I can predict the response correctly?
The model you used is trying to minimize the mean squared error. Thus, it is obvious that the second line is the way to go:
model.fit(data, data, epochs=50)
because to learn the identity matrix we need x = y, so data serves as both the inputs and the outputs.
Why this does not work:
model.fit(data, inds, epochs=50)
Well, in this case your network output is of size 4 (the dense layer), but you give it targets of size 1 (inds). You should get an error...
How to do it without using one hot vectors for the output vectors:
One way is to use the sparse categorical crossentropy loss instead as such:
i4 = np.eye(4)
inds = np.random.randint(0, 4, size=32)
data = i4[inds]

model = keras.Sequential([keras.layers.Dense(4, kernel_initializer='zeros',
                                             activation='softmax')])
model.compile(optimizer=tf.train.AdamOptimizer(.001),
              loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(data, inds, epochs=50)
and then you will see that the model will fit the inds very accurately:
In [4]: np.argmax(model.predict(data), axis=1)
Out[4]:
array([3, 1, 1, 3, 0, 3, 2, 0, 2, 1, 0, 2, 0, 0, 1, 2, 3, 2, 3, 0, 3, 2,
1, 2, 3, 3, 3, 1, 0, 1, 2, 0])
In [5]: inds
Out[5]:
array([3, 1, 1, 3, 0, 3, 2, 0, 2, 1, 0, 2, 0, 0, 1, 2, 3, 2, 3, 0, 3, 2,
1, 2, 3, 3, 3, 1, 0, 1, 2, 0])
and the train accuracy :
In [6]: np.mean(np.argmax(model.predict(data), axis=1) == inds)
Out[6]: 1.0
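For readers on newer versions: tf.contrib, tf.enable_eager_execution(), and tf.train.AdamOptimizer no longer exist in TF 2.x, so a roughly equivalent sketch there would be:

import numpy as np
import tensorflow as tf
from tensorflow import keras

# Eager execution is the default in TF 2.x.
i4 = np.eye(4)
inds = np.random.randint(0, 4, size=32)
data = i4[inds]

model = keras.Sequential([keras.layers.Dense(4, kernel_initializer='zeros',
                                             activation='softmax')])
model.compile(optimizer=keras.optimizers.Adam(0.001),
              loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(data, inds, epochs=50, verbose=0)

print(np.mean(np.argmax(model.predict(data), axis=1) == inds))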

tensorflow: Strange result from convolution compared to theano (not flipping, though)

I am trying to implement some deep neural networks with tensorflow. But I already have a problem at the very first steps.
When I type the following using theano.tensor.nnet.conv2d, I get the expected result:
import theano.tensor as T
import theano
import numpy as np

# Theano expects input of shape (batch_size, channels, height, width)
# and filters of shape (out_channel, in_channel, height, width)
x = T.tensor4()
w = T.tensor4()
c = T.nnet.conv2d(x, w, filter_flip=False)
f = theano.function([x, w], [c], allow_input_downcast=True)

base = np.array([[1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1]]).T
i = base[np.newaxis, np.newaxis, :, :]
print(f(i, i))  # -> results in 3 as expected because np.sum(i*i) = 3
However, when I do the presumingly same thing in tf.nn.conv2d, my result is different:
import tensorflow as tf
import numpy as np

# TF expects input of shape (batch_size, height, width, channels)
# and filters of shape (height, width, in_channel, out_channel)
x = tf.placeholder(tf.float32, shape=(1, 4, 3, 1), name="input")
w = tf.placeholder(tf.float32, shape=(4, 3, 1, 1), name="weights")
c = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='VALID')

with tf.Session() as sess:
    base = np.array([[1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1]]).T
    i = base[np.newaxis, :, :, np.newaxis]
    weights = base[:, :, np.newaxis, np.newaxis]
    res = sess.run(c, feed_dict={x: i, w: weights})
print(res)  # -> results in -5.31794233e+37
The layout of the convolution operation in tensorflow is a little different from theano, which is why the input looks slightly different.
However, since strides in Theano default to (1,1,1,1) and a valid convolution is the default, too, this should be the exact same input.
Furthermore, tensorflow does not flip the kernel (implements cross-correlation).
Do you have any idea why this is not giving the same result?
Thanks in advance,
Roman
Okay, I found a solution, even though it is not really one, because I do not fully understand it myself.
First, it seems that for the task I was trying to solve, Theano and Tensorflow use different convolutions.
The task at hand is a "1.5D convolution", which means sliding a kernel in only one direction over the input (here DNA sequences).
In Theano, I solved this with a conv2d operation whose kernel had the same number of rows as the input, and it worked fine.
However, Tensorflow (probably correctly) wants me to use conv1d for that, interpreting the rows as channels.
So, the following should work, but did not at first:
import tensorflow as tf
import numpy as np

# TF expects input of shape (batch_size, height, width, channels)
# and filters of shape (height, width, in_channel, out_channel)
x = tf.placeholder(tf.float32, shape=(1, 4, 3, 1), name="input")
w = tf.placeholder(tf.float32, shape=(4, 3, 1, 1), name="weights")
x_star = tf.reshape(x, [1, 4, 3])
w_star = tf.reshape(w, [4, 3, 1])
c = tf.nn.conv1d(x_star, w_star, stride=1, padding='VALID')

with tf.Session() as sess:
    base = np.array([[1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1]]).T
    i = base[np.newaxis, :, :, np.newaxis]
    weights = base[:, :, np.newaxis, np.newaxis]
    res = sess.run(c, feed_dict={x: i, w: weights})
print(res)  # -> produces 3 after updating tensorflow
This code produced NaN until I updated Tensorflow to version 1.0.1; since then, it produces the expected output.
To summarize, my problem was partly solved by using a 1D convolution instead of a 2D convolution, but it still required updating the framework. As for the second part, I have no idea what caused the wrong behavior in the first place.
EDIT: The code I posted in my original question is now working fine, too. So the different behavior came only from an old (maybe corrupt) version of TF.
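For what it's worth, a minimal sketch of the same check on TF 2.x, where placeholders and sessions are gone and conv1d runs eagerly (assuming the reshape trick above carries over):

import numpy as np
import tensorflow as tf

base = np.array([[1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1]],
                dtype=np.float32).T          # shape (4, 3)
x = base[np.newaxis, :, :]                   # (batch=1, width=4, channels=3)
w = base[:, :, np.newaxis]                   # (filter_width=4, in=3, out=1)

res = tf.nn.conv1d(x, w, stride=1, padding='VALID')
print(res.numpy())  # [[[3.]]] because np.sum(base * base) == 3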
