I'm building an LSTM RNN to classify texts by the writer's age (binary classification: young / adult).
It seems that the network does not learn, and then it suddenly starts overfitting:
[Plot: training (red) and validation (blue) curves]
One possibility is that the data representation is not good enough: I simply sorted the unique words by frequency and gave each one an index, e.g.:
unknown -> 0
the -> 1
a -> 2
. -> 3
to -> 4
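A minimal sketch of the frequency-sorted indexing described above, assuming whitespace-tokenized text and index 0 reserved for unknown words. The helper names `build_vocab` and `to_indices` and the `min_count` parameter are hypothetical, not part of the original code:

```python
from collections import Counter


def build_vocab(texts, min_count=1):
    """Map each word to an index by descending frequency; 0 is reserved for 'unknown'."""
    counts = Counter(word for text in texts for word in text.split())
    # Sort by frequency (descending), then alphabetically so ties are deterministic
    ordered = sorted(
        (w for w, c in counts.items() if c >= min_count),
        key=lambda w: (-counts[w], w),
    )
    vocab = {"<unk>": 0}
    for i, word in enumerate(ordered, start=1):
        vocab[word] = i
    return vocab


def to_indices(text, vocab):
    """Convert a text to integer indices, mapping unseen words to 0."""
    return [vocab.get(word, 0) for word in text.split()]


texts = ["the cat sat .", "the dog ran to a cat ."]
vocab = build_vocab(texts)
print(vocab)
print(to_indices("the bird sat .", vocab))  # [3, 0, 7, 1]
```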
So I'm trying to replace that with word embeddings.
I've seen a couple of examples, but I'm not able to implement them in my code. Most of the examples look like this:
embedding = tf.Variable(tf.random_uniform([vocab_size, hidden_size], -1, 1))
inputs = tf.nn.embedding_lookup(embedding, input_data)
Does this mean we're building a layer that learns the embedding? I thought one should download pretrained Word2Vec or GloVe vectors and just use those.
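In those examples the embedding matrix is a tf.Variable, so it is trainable by default, and tf.nn.embedding_lookup selects one row per index. The NumPy sketch below mimics that row-gathering with made-up sizes, shows the resulting [batch, time, embed_dim] shape, and shows why the indices have to be integers. Pretrained vectors could be used the same way by filling the matrix with them; `pretrained` here is a hypothetical stand-in, not real GloVe data:

```python
import numpy as np

VOCAB_SIZE, EMBED_DIM = 10, 4  # illustrative sizes
rng = np.random.default_rng(0)

# Randomly initialised matrix, like tf.random_uniform([vocab_size, hidden_size], -1, 1)
embedding = rng.uniform(-1.0, 1.0, size=(VOCAB_SIZE, EMBED_DIM)).astype(np.float32)

# Alternatively, start from pretrained vectors (hypothetical stand-in for GloVe rows)
pretrained = np.ones((VOCAB_SIZE, EMBED_DIM), dtype=np.float32)
embedding_from_glove = pretrained.copy()

# A batch of 2 sequences with 3 tokens each; indices must be integers
input_data = np.array([[1, 2, 3], [4, 0, 0]], dtype=np.int32)

# Equivalent of tf.nn.embedding_lookup(embedding, input_data): gather rows
inputs = embedding[input_data]
print(inputs.shape)  # (2, 3, 4) -> [batch, time, embed_dim]

# Float indices are rejected, much like the dtype TypeError from embedding_lookup
try:
    embedding[input_data.astype(np.float32)]
except IndexError as err:
    print("IndexError:", err)
```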
Anyway, let's say I want to build this embedding layer.
If I use these two lines in my code, I get an error:
TypeError: Value passed to parameter 'indices' has DataType float32 not in list of allowed values: int32, int64
So I guess I have to change the input_data type to int32. I do that (they're all indices, after all), and then I get this:
TypeError: inputs must be a sequence
I tried wrapping inputs (the argument to tf.contrib.rnn.static_rnn) in a list, [inputs], as suggested in this answer, but that produced another error:
ValueError: Input size (dimension 0 of inputs) must be accessible via shape inference, but saw value None.
Update:
I was unstacking the tensor x before passing it to embedding_lookup, so I moved the unstacking to after the embedding lookup.
Updated code:
MIN_TOKENS = 10
MAX_TOKENS = 30
x = tf.placeholder("int32", [None, MAX_TOKENS, 1])
y = tf.placeholder("float", [None, N_CLASSES]) # 0.0 / 1.0
...
seqlen = tf.placeholder(tf.int32, [None]) #list of each sequence length*
embedding = tf.Variable(tf.random_uniform([VOCAB_SIZE, HIDDEN_SIZE], -1, 1))
inputs = tf.nn.embedding_lookup(embedding, x) #x is the text after converting to indices
inputs = tf.unstack(inputs, MAX_POST_LENGTH, 1)
outputs, states = tf.contrib.rnn.static_rnn(lstm_cell, inputs, dtype=tf.float32, sequence_length=seqlen) #---> Produces error
*seqlen: I zero-padded the sequences so that they all have the same length, but since their actual lengths differ, I prepared a list of each sequence's length without the padding.
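A small plain-Python sketch of the zero-padding and length bookkeeping described above. `pad_sequences_with_lengths` is a hypothetical helper, and truncating sequences longer than MAX_TOKENS is an assumption:

```python
MAX_TOKENS = 30


def pad_sequences_with_lengths(sequences, max_len=MAX_TOKENS, pad_value=0):
    """Zero-pad (or truncate) each index sequence to max_len and record its true length."""
    padded, lengths = [], []
    for seq in sequences:
        seq = list(seq)[:max_len]  # truncate anything longer than max_len
        lengths.append(len(seq))   # length before padding, fed to the seqlen placeholder
        padded.append(seq + [pad_value] * (max_len - len(seq)))
    return padded, lengths


batch, seqlen = pad_sequences_with_lengths([[5, 3, 9], [7] * 40], max_len=5)
print(batch)   # [[5, 3, 9, 0, 0], [7, 7, 7, 7, 7]]
print(seqlen)  # [3, 5]
```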
New error:
ValueError: Input 0 of layer basic_lstm_cell_1 is incompatible with the layer: expected ndim=2, found ndim=3. Full shape received: [None, 1, 64]
64 is the size of each hidden layer.
Clearly I have a problem with the dimensions. How can I make the inputs fit the network after the embedding?
From the tf.nn.static_rnn documentation, we can see that the inputs argument should be:
A length T list of inputs, each a Tensor of shape [batch_size, input_size]
So your code should be something like:
x = tf.placeholder("int32", [None, MAX_TOKENS])
...
inputs = tf.unstack(inputs, axis=1)
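To see why the extra trailing dimension in the original placeholder causes the ndim=3 error, here is a NumPy sketch that mimics the shapes. Fancy indexing stands in for tf.nn.embedding_lookup, a list comprehension over axis 1 stands in for tf.unstack, and the sizes are illustrative:

```python
import numpy as np

batch, max_tokens, vocab_size, embed_dim = 8, 30, 100, 64
embedding = np.zeros((vocab_size, embed_dim), dtype=np.float32)


def unstack_time(t):
    """Split along axis 1 into a list of per-time-step arrays, like tf.unstack(t, axis=1)."""
    return [t[:, i] for i in range(t.shape[1])]


# Original placeholder shape [None, MAX_TOKENS, 1]
x_bad = np.zeros((batch, max_tokens, 1), dtype=np.int32)
steps_bad = unstack_time(embedding[x_bad])  # lookup gives (8, 30, 1, 64)
print(len(steps_bad), steps_bad[0].shape)   # 30 (8, 1, 64) -> rank 3, rejected by the cell

# Suggested placeholder shape [None, MAX_TOKENS]
x_good = np.zeros((batch, max_tokens), dtype=np.int32)
steps_good = unstack_time(embedding[x_good])  # lookup gives (8, 30, 64)
print(len(steps_good), steps_good[0].shape)   # 30 (8, 64) -> [batch_size, input_size]
```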
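tf.squeeze removes dimensions of size 1 from a tensor. If the end goal is to have each input of shape [None, 64], then put a line similar to inputs = tf.squeeze(inputs) and that would fix your problem. Passing an explicit axis (e.g. axis=1 for a [None, 1, 64] step) avoids also removing the batch dimension when the batch size happens to be 1.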
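NumPy's np.squeeze follows the same rule as tf.squeeze and makes the batch-size-1 pitfall easy to see. A minimal sketch with an illustrative shape:

```python
import numpy as np

step = np.zeros((1, 1, 64), dtype=np.float32)  # one unstacked step with batch_size == 1

print(np.squeeze(step).shape)          # (64,)   -> batch dimension removed as well
print(np.squeeze(step, axis=1).shape)  # (1, 64) -> only the size-1 token axis removed
```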
Related
My question is very simple:
How can I reduce a list or a tensor to 512 elements using a max-pooling layer?
I'm trying the following code:
input_ids = tokenizer.encode(question, text)
print(input_ids) # input_ids is a list of 700 elements
m = nn.AdaptiveMaxPool1d(512)
input_ids = m(torch.tensor([[input_ids]])) # convert the list to tensor and apply max-pooling layer
But I get the following error:
RuntimeError: "adaptive_max_pool2d_cpu" not implemented for 'Long'
So, could you please help me figure out where the error is?
The problem is with your input_ids: you are passing a tensor of type Long (int64) to AdaptiveMaxPool1d, which is not implemented for Long tensors on the CPU, so just convert it to float:
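import torch
from torch import nn
input_ids = tokenizer.encode(question, text)
print(input_ids)  # input_ids is a list of 700 elements
m = nn.AdaptiveMaxPool1d(512)
# torch.tensor on a list of ints gives int64 (Long); the pooling kernel needs floats
input_ids = m(torch.tensor([[input_ids]]).float())  # output shape (1, 1, 512)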
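For intuition about what AdaptiveMaxPool1d(512) does to a length-700 input, here is a plain-Python sketch of adaptive max pooling. It assumes the bin boundaries used by PyTorch's adaptive pooling (start = floor(i * L_in / L_out), end = ceil((i + 1) * L_in / L_out)), so neighbouring bins may overlap. `adaptive_max_pool1d` is a hypothetical helper, not a library function. Note that pooling token IDs treats them as plain numbers, so each output is simply the largest ID in its window:

```python
def adaptive_max_pool1d(values, out_size):
    """Max-pool a 1-D sequence into exactly out_size bins (PyTorch-style bin edges)."""
    n = len(values)
    pooled = []
    for i in range(out_size):
        start = (i * n) // out_size                        # floor(i * n / out_size)
        end = ((i + 1) * n + out_size - 1) // out_size     # ceil((i + 1) * n / out_size)
        pooled.append(max(values[start:end]))
    return pooled


ids = list(range(700))  # stand-in for the 700 token IDs
out = adaptive_max_pool1d(ids, 512)
print(len(out))                                 # 512
print(adaptive_max_pool1d([3, 1, 4, 1, 5], 2))  # [4, 5]
```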
I have a Keras/TF problem with sub-sampling values from a tensor. My model is given below:
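from keras.layers import Input, Dense, Dropout
from keras.models import Model

x_input = Input((input_size,))
enc1 = Dense(encoder_size[0], activation='relu')(x_input)
# Keras Dropout takes the fraction of units to DROP, so convert the keep probability
drop = Dropout(1 - keep_prob)(enc1)
enc2 = Dense(encoder_size[1], activation='relu')(drop)
drop = Dropout(1 - keep_prob)(enc2)
mu = Dense(latent_dim, activation='linear', name='encoder_mean')(drop)
encoder = Model(x_input, mu)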
I want to randomly sample from the input and then get the encoded values of the sample. The error I get is:
ValueError: When feeding symbolic tensors to a model, we expect the tensors to have a static batch size. Got tensor with shape: (None, 13)
I understand this is because predict does not work on a placeholder, but I am not sure what to pass to get the output for a placeholder.
import tensorflow as tf
from keras import backend as K

# sample input randomly
sample_num = 500
idxs = tf.range(tf.shape(x_input)[0])
ridxs = tf.random_shuffle(idxs)[:sample_num]
sample_input = tf.gather(x_input, ridxs)
# get sample shape
sample_shape = K.shape(sample_input)
# sample from encoded value
sample_encoded = encoder.predict(sample_input)  # <-- raises the ValueError above
If you look at the predict documentation, it does not accept a placeholder or a symbolic tensor as input. In your case, you have to pass the NumPy array directly.
If you wish to perform some special data preprocessing that is not part of your regular model, you have to do it in NumPy rather than with tensor operations.
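A minimal sketch of doing the random sub-sampling in NumPy before calling predict. `X` is a hypothetical stand-in for your input data (13 features, matching the shape in the error), and the encoder.predict call is left as a comment because it needs your trained model:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 13)).astype(np.float32)  # hypothetical input data

sample_num = 500
# Draw row indices without replacement (like shuffling and taking the first sample_num)
ridxs = rng.choice(X.shape[0], size=min(sample_num, X.shape[0]), replace=False)
sample_input = X[ridxs]
print(sample_input.shape)  # (500, 13)

# sample_encoded = encoder.predict(sample_input)  # a concrete array, so predict accepts it
```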
Hi there, I'm trying to build a simple RNN with 11 inputs and 2 outputs:
X=tf.placeholder(tf.float32,[None,n_steps,n_inputs])
y=tf.placeholder(tf.int32,[None,n_steps,n_outputs])
I know the RNN expects an input of shape [batch_size, n_steps, n_inputs], which is why I have shaped my placeholders like this.
However, when I run the code, I get an error:
ValueError: Shape must be rank 2 but is rank 3 for 'in_top_k/InTopKV2' (op: 'InTopKV2') with input shapes: [1,270,2], [1,270,2], [].
The error seems to originate here:
correct = tf.nn.in_top_k(logits, tf.reshape(y, [1, n_steps, n_outputs]), 1)
I have tried reshaping the logits, squeezing the logits, and expanding the y dimensions, but nothing seems to work.
One difference I have noticed is that when I squeeze the logits with
tf.squeeze(logits)
the error now says:
ValueError: Shape must be rank 1 but is rank 3
That is the only 'progress' I have been able to make; any help would be appreciated.
P.S. Go easy on me, this is my first question ever.
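tf.nn.in_top_k expects 2-D predictions of shape [batch_size, classes] and a 1-D vector of integer class ids as targets, so you have to flatten the batch and time dimensions first, then reshape the result back to the desired shape:
# flatten [batch, n_steps, n_outputs] -> [batch * n_steps, n_outputs]
logits_res = tf.reshape(logits, (-1, n_outputs))
# assuming y is one-hot over n_outputs, convert it to class ids of shape [batch * n_steps]
y_res = tf.argmax(tf.reshape(y, (-1, n_outputs)), axis=1)
correct_res = tf.nn.in_top_k(logits_res, y_res, 1)
# restore [batch, n_steps]
correct = tf.reshape(correct_res, (-1, n_steps))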
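The flatten-then-restore pattern can be checked in NumPy. For k=1, in_top_k asks whether the target class has the highest logit, so this sketch uses argmax as a stand-in (ties are ignored) and assumes y is one-hot over n_outputs; the data are made up:

```python
import numpy as np

batch_size, n_steps, n_outputs = 2, 3, 2
logits = np.array([[[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]],
                   [[0.3, 0.7], [0.5, 0.1], [0.1, 0.9]]])
y = np.array([[[1, 0], [1, 0], [1, 0]],
              [[0, 1], [1, 0], [0, 1]]], dtype=np.int32)  # one-hot targets

# Flatten batch and time so every time step becomes one row: [batch*steps, n_outputs]
logits_res = logits.reshape(-1, n_outputs)
targets = y.reshape(-1, n_outputs).argmax(axis=1)  # class ids, shape [batch*steps]

# k=1 check: is the target the top-scoring class?
correct_res = logits_res.argmax(axis=1) == targets
correct = correct_res.reshape(-1, n_steps)  # back to [batch_size, n_steps]
print(correct)
print(correct.mean())  # accuracy over all time steps
```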
I am trying to take the features extracted by two fine-tuned VGG16 models (one per stream), concatenate the NumPy arrays of each pair within every sequence of 9 data pairs, and feed the sequence of 9 concatenated outputs to a bidirectional LSTM in Keras.
The problem is that I run into an error when building the LSTM part. Below is the generator I wrote to read both the RGB and optical flow streams, extract features, and concatenate each pair:
def generate_generator_multiple(generator, dir1, dir2, batch_rgb, batch_flow, img_height, img_width):
    print("Processing inside generate multiple")
    genX1 = generator.flow_from_directory(dir1,
                                          target_size=(img_height, img_width),
                                          class_mode='categorical',
                                          batch_size=batch_rgb,
                                          shuffle=False)
    genX2 = generator.flow_from_directory(dir2,
                                          target_size=(img_height, img_width),
                                          class_mode='categorical',
                                          batch_size=batch_flow,
                                          shuffle=False)
    while True:
        imgs, labels = next(genX1)
        X1i = RGB_model.predict(imgs, verbose=0)
        imgs2, labels2 = next(genX2)
        X2i = FLOW_model.predict(imgs2, verbose=0)
        Xi = []
        for i in range(9):
            Xi.append(np.concatenate([X1i[i+1], X2i[i]]))
        Xi = np.asarray(Xi)
        if np.array_equal(labels[1:], labels2) == False:
            print("ERROR !! problem of labels matching: RGB and FLOW have different labels")
        yield Xi, labels2[2]
I expect the generator to yield a sequence of 9 arrays; when I force the loop to run twice, the shape of Xi is (9, 14, 7, 512).
When I use while True (as in the code above) and call the method to check what it returns, I get this error after 3 iterations:
ValueError: too many values to unpack (expected 2)
Now, assuming there is no problem with the generator, I try to feed the data it returns to the bidirectional LSTM like this:
n_frames = 9
seq = 100
Bi_LSTM = Sequential()
Bi_LSTM.add(Bidirectional(LSTM(seq, return_sequences=True, dropout=0.25, recurrent_dropout=0.1),input_shape=(n_frames,14,7,512)))
Bi_LSTM.add(GlobalMaxPool1D())
Bi_LSTM.add(TimeDistributed(Dense(100, activation="relu")))
Bi_LSTM.add(layers.Dropout(0.25))
Bi_LSTM.add(Dense(4, activation="relu"))
Bi_LSTM.compile(Adam(lr=.00001), loss='categorical_crossentropy', metrics=['accuracy'])
But I keep getting the following error (the full log is a bit long):
InvalidArgumentError: Shape must be rank 4 but is rank 2 for 'bidirectional_2/Tile_1' (op: 'Tile') with input shapes: [?,7,512,1], [2].
It seems to be caused by this line:
Bi_LSTM.add(Bidirectional(LSTM(seq, return_sequences=True, dropout=0.25, recurrent_dropout=0.1),input_shape=(n_frames,14,7,512)))
I am no longer sure whether the problem is the way I build the LSTM, the way I return the data from the generator, or the way I define the LSTM's input.
Thanks a lot for any help you can provide.
It seems this specific error is caused by the following line:
input_shape=(n_frames,14,7,512)
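I was confused about the input for LSTM. A Keras LSTM consumes 3-D input of shape (batch_size, timesteps, features), so input_shape must have exactly two entries, (timesteps, features); passing the 4-D per-sample shape (n_frames, 14, 7, 512) is what triggers the rank error. I still have other problems with my code, but for this specific error, the solution is to turn each frame's features into one vector and change that part to input_shape=(n_frames, n_features), where n_features is the length of that vector, e.g. when simply flattening:
input_shape=(n_frames, 14 * 7 * 512)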
Edit:
When predicting, we need to get the mean of the prediction, since the LSTM expects a 1-D feature vector per time step.
Another issue in my code was the shape of Xi. It needs to be reshaped before yielding it so that it matches the input expected by LSTM.
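A sketch of that reshape, assuming each frame's (14, 7, 512) features are simply flattened so that the LSTM sees (timesteps, features) = (9, 14 * 7 * 512) per sample. Zeros stand in for the real VGG16 features:

```python
import numpy as np

n_frames = 9
Xi = np.zeros((n_frames, 14, 7, 512), dtype=np.float32)  # stand-in for the generator's Xi

# Flatten each frame's (14, 7, 512) features into one vector per time step
Xi_flat = Xi.reshape(n_frames, -1)
print(Xi_flat.shape)  # (9, 50176)

# Model inputs carry a leading batch axis, so a single yielded sequence needs one too
Xi_batch = Xi_flat[np.newaxis, ...]
print(Xi_batch.shape)  # (1, 9, 50176)
```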
My Keras RNN code is as follows:
def RNN():
    inputs = Input(shape=(None, word_vector_size))
    layer = LSTM(64)(inputs)
    layer = Dense(256, name='FC1')(layer)
    layer = Dropout(0.5)(layer)
    layer = Dense(num_classes, name='out_layer')(layer)
    layer = Activation('softmax')(layer)
    model = Model(inputs=inputs, outputs=layer)
    return model
I'm getting the error when I call model.fit()
model.fit(np.array(word_vector_matrix), np.array(Y_binary), batch_size=128, epochs=10, validation_split=0.2, callbacks=[EarlyStopping(monitor='val_loss',min_delta=0.0001)])
word_vector_matrix is a 3-D NumPy array.
I have printed the following :
print(type(word_vector_matrix), type(word_vector_matrix[0]), type(word_vector_matrix[0][0]), type(word_vector_matrix[0][0][0]))
and the output is:
<class 'numpy.ndarray'> <class 'numpy.ndarray'> <class 'numpy.ndarray'> <class 'numpy.float32'>
Its shape is 1745 x sentence length x word vector size.
The sentence length is variable and I'm trying to pass this entire word vector matrix to the RNN, but I get the error above.
Printing the shape:
print(word_vector_matrix.shape)
gives (1745,)
Printing the shape of one of the nested arrays:
print(word_vector_matrix[10].shape)
gives (7, 300)
The first number 7 denotes the sentence length, which is variable and changes for each sentence, and the second number is 300, which is fixed for all words and is the word vector size.
I have converted everything to np.array() as suggested in other posts, but I still get the same error. Can someone please help me? I'm using Python 3, by the way; a similar setup works for me in Python 2, but not in Python 3. Thanks!
word_vector_matrix is not a 3-D ndarray. It's a 1-D ndarray of 2-D arrays. This is due to variable sentence length.
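A NumPy sketch reproducing this with made-up sentence lengths: a container of ragged 2-D arrays is a 1-D object array of shape (n,), not (n, sentence_length, 300), and only equal-length sentences stack into a real 3-D array. The object array is built with np.empty and element assignment, because calling np.array directly on ragged lists behaves differently across NumPy versions:

```python
import numpy as np

lengths = [7, 5, 9]  # variable sentence lengths
sentences = [np.zeros((n, 300), dtype=np.float32) for n in lengths]

# A container of ragged 2-D arrays is a 1-D object array
word_vector_matrix = np.empty(len(sentences), dtype=object)
for i, s in enumerate(sentences):
    word_vector_matrix[i] = s
print(word_vector_matrix.shape, word_vector_matrix.dtype)  # (3,) object
print(word_vector_matrix[0].shape)                         # (7, 300)

# Only equal-length sentences stack into a real 3-D array
same_length = np.stack([s[:5] for s in sentences])
print(same_length.shape)  # (3, 5, 300)
```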
NumPy allows an ndarray to be a list-like structure whose elements are arbitrary objects, including other ndarrays. In Keras, however, the ndarray must be converted into a tensor, which has to be a regular (rectangular) array of some rank; this is required for efficient computation.
Therefore, all sentences within a batch must have the same length (this need not hold across the entire dataset).
Here are a few alternatives:
Use a batch size of 1: the simplest approach, but it impedes your network's convergence. I would suggest using it only as a temporary sanity check.
If sequence length variability is low, pad all your sequences to the same length.
If sequence length variability is high, pad each batch to the maximum length within that batch. This requires a custom data generator.
Note: after padding your data, you need to use masking (e.g. a Masking layer in front of the LSTM) so that the padded time steps are ignored during training.
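A minimal sketch of the per-batch padding option, assuming each sequence is a (length, 300) word-vector array and that 0.0 is the padding value (it should match the mask_value of the Masking layer). `pad_batch` and `batch_generator` are hypothetical helper names; passing a fixed maxlen to `pad_batch` covers the pad-everything-to-one-length option as well:

```python
import numpy as np


def pad_batch(seqs, maxlen=None, pad_value=0.0):
    """Pad (length, dim) arrays to (batch, maxlen, dim); maxlen defaults to the batch max."""
    maxlen = maxlen or max(len(s) for s in seqs)
    dim = seqs[0].shape[1]
    out = np.full((len(seqs), maxlen, dim), pad_value, dtype=np.float32)
    for i, s in enumerate(seqs):
        s = s[:maxlen]  # truncate if a fixed maxlen is shorter than the sequence
        out[i, : len(s)] = s
    return out


def batch_generator(seqs, labels, batch_size=128, shuffle=True, seed=0):
    """Endlessly yield (x, y) batches, each padded only to its own longest sequence."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    while True:
        order = rng.permutation(len(seqs)) if shuffle else np.arange(len(seqs))
        for start in range(0, len(seqs), batch_size):
            idx = order[start:start + batch_size]
            yield pad_batch([seqs[i] for i in idx]), labels[idx]


seqs = [np.ones((n, 300), dtype=np.float32) for n in (7, 3, 12, 5)]
labels = [0, 1, 0, 1]
x, y = next(batch_generator(seqs, labels, batch_size=2, shuffle=False))
print(x.shape, y)  # (2, 7, 300) [0 1]
```

With Keras, a generator like this could be passed to fit_generator (or fit, in newer versions) together with steps_per_epoch, with a Masking(mask_value=0.0) layer placed before the LSTM; the Input(shape=(None, word_vector_size)) in the question already allows a different number of time steps per batch.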