Bi-directional LSTM for entity recognition - python

Following a paper, I'm using word embeddings as a feature vector for entity recognition.
I've attempted to architect the network using Keras but have run into a dimensionality problem I cannot seem to resolve.
Take the following example sentence:
["I went to the shop"]
The sentence has 5 words, and after computing the feature matrix, I am left with a matrix of dimension: (1, 120, 1000) == (#examples, sequence_length, embedding).
Note that sequence_length appends 0. padding when not complete. In this example, the actual sequence_length would be 5.
My network architecture is as follows:
enc = encode()
claims_input = Input(shape=(120, 1000), dtype='float32', name='claims')
x = Masking(mask_value=0., input_shape=(120, 1000))(claims_input)
x = Bidirectional(LSTM(units=512, return_sequences=True, recurrent_dropout=0.2, dropout=0.2))(x)
x = Bidirectional(LSTM(units=512, return_sequences=True, recurrent_dropout=0.2, dropout=0.2))(x)
out = TimeDistributed(Dense(8, activation="softmax"))(x)
model = Model(inputs=claims_input, output=out)
model.compile(loss="sparse_categorical_crossentropy", optimizer='adam', metrics=["accuracy"])
model.fit(enc, y)
The architecture is straight forward, I mask specific time steps, run two bidirectional LSTMs, followed by a softmax output. My y variable in this case, is a (9,8) one-hot-encoded matrix corresponding to the gold label of each word.
When trying to fit() this model, I am running into a dimensionality problem relating to the TimeDistributed() layer and I'm unsure how to resolve, or even begin to debug this.
Error: ValueError: Error when checking target: expected time_distributed_1 to have 3 dimensions, but got array with shape (9, 8)
Any help would be appreciated.

You are doing entity recognition. So each element in your input sequence will be assigned an entity (probably some of them as null). If your model takes an input sample of shape (120, n_features), then the output must also be a sequence of length of 120, i.e. one entity for each element. Therefore, the labels, i.e. y, you provide to the model must have a shape of (n_samples, 120, n_entities) (or (n_samples, 120, 1) if you are using sparse labeling).
Side note: There is no difference between TimeDistributed(Dense(...)) and Dense(...), as the Dense layer is applied on the last axis.

Related

How is the Keras Conv1D input specified? I seem to be lacking a dimension

My input is a array of 64 integers.
model = Sequential()
model.add( Input(shape=(68,), name="input"))
model.add(Conv1D(64, 2, activation="relu", padding="same", name="convLayer"))
I have 10,000 of these arrays in my training set. And I supposed to be specifying this in order for conv1D to work?
I am getting the dreaded
ValueError: Input 0 of layer convLayer is incompatible with the layer: : expected min_ndim=3, found ndim=2. Full shape received: [None, 68]
error and I really don't understand what I need to do.
Don't let the name confuse you. The layer tf.keras.layers.Conv1D needs the following shape: (time_steps, features). If your dataset is made of 10,000 samples with each sample having 64 values, then your data has the shape (10000, 64), which is not directly applicable to the tf.keras.layers.Conv1D layer. You are missing the time_steps dimension. What you can do is use the tf.keras.layers.RepeatVector, which repeats your array input n times, in the example 5. This way your Conv1D layer gets an input of the shape (5, 64). Check out the documentation for more information:
time_steps = 5
model = tf.keras.Sequential()
model.add(tf.keras.layers.Input(shape=(64,), name="input"))
model.add(tf.keras.layers.RepeatVector(time_steps))
model.add(tf.keras.layers.Conv1D(64, 2, activation="relu", padding="same", name="convLayer"))
As a side note, you should ask yourself if using a tf.keras.layers.Conv1D layer is the right option for your use case. This layer is usually used for NLP and other time series tasks. For example, in sentence classification, each word in a sentence is usually mapped to a high-dimensional word vector representation, as seen in the image. This results in data with the shape (time_steps, features).
                                          
If you want to use character one hot encoded embeddings it would look something like this:
                                          
This is a simple example of one single sample with the shape (10, 10) --> 10 characters along the time series dimension and 10 features. It should help you understand the tutorial I mentioned a bit better.
The Conv1D layer does temporal convolution, that is, along the first dimension (not the batch dimension of course), so you should put something like this:
time_steps = 5
model = tf.keras.Sequential()
model.add(tf.keras.layers.Input(shape=(time_steps, 64), name="input"))
model.add(tf.keras.layers.Conv1D(64, 2, activation="relu", padding="same", name="convLayer"))
You will need to slice your data into time_steps temporal slices to feed the network.
However, if your arrays don't have a temporal structure, then conv1D is not the layer you are looking for.

How to correctly shape my CNN-LSTM input layer

I have a data set with the shape (3340, 6). I want to use a CNN-LSTM to read a sequence of 30 rows and predict the next row's (6) elements. From what I have read, this is considered a multi-parallel time series. I have been primarily following this machine learning mastery tutorial and am having trouble implementing the CNN-LSTM architecture for a multi-parallel time series.
I have used this function to split the data into 30 day time step frames
# split a multivariate sequence into samples
def split_sequences(sequences, n_steps):
X, y = list(), list()
for i in range(len(sequences)):
# find the end of this pattern
end_ix = i + n_steps
# check if we are beyond the dataset
if end_ix > len(sequences)-1:
break
# gather input and output parts of the pattern
seq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix, :]
X.append(seq_x)
y.append(seq_y)
return array(X), array(y)
Here is a sample of the data frames produced by the function above.
# 30 Time Step Input Frame X[0], X.shape = (3310, 30, 6)
[4.951e-02, 8.585e-02, 5.941e-02, 8.584e-02, 8.584e-02, 5.000e+00],
[8.584e-02, 9.307e-02, 7.723e-02, 8.080e-02, 8.080e-02, 4.900e+01],
[8.080e-02, 8.181e-02, 7.426e-02, 7.474e-02, 7.474e-02, 2.000e+01],
[7.474e-02, 7.921e-02, 6.634e-02, 7.921e-02, 7.921e-02, 4.200e+01],
...
# 1 Time Step Output Array y[0], y.shape = (3310, 6)
[6.550e-02, 7.690e-02, 6.243e-02, 7.000e-02, 7.000e-02, 9.150e+02]
Here is the following model that I am using:
model = Sequential()
model.add(TimeDistributed(Conv1D(64, 1, activation='relu'), input_shape=(None, 30, 6)))
model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(50, activation='relu', return_sequences=True))
model.add(Dense(6))
model.compile(optimizer='adam', loss='mse')
When I run model.fit, I receive the following error:
ValueError: Error when checking input: expected time_distributed_59_input to have
4 dimensions, but got array with shape (3310, 30, 6)
I am at a loss at how to properly shape my input layer so that I can get this model learning. I have done several Conv2D nets in the past but this is my first time series model so I apologize if there's an obvious answer here that I am missing.
Remove TimeDistributed from Conv1D and MaxPooling1D; 3D inputs are supported
Remove Flatten(), as it destroys timesteps-channels relationships
Add TimeDistributed to the last Dense layer, as Dense does not support 3D inputs (returned by LSTM(return_sequences=True); alternatively, use return_sequences=False)

Error when checking input in first layer of Keras Conv1D

I'm trying to assign one of two classes (positive/nagative) to audio using CNN with Keras. My model should accept varied lengths of input (frames) in which each frame contains 41 features but I struggle with the input size. Bear in mind that I haven't acquired full dataset so I just mocked some meaningless data just to check if network works at all.
According to documentation https://keras.io/layers/convolutional/ and my best understanding Conv1D can tackle varied lengths if first element of input_shape tuple is None. Shape of variable containing input data X_train.shape is (4, 497, 41).
data = pd.read_csv('output_file.csv', sep=';')
featureCount = data.values.shape[1]
#mocks because full data is not available yet
Y_train = np.asarray([1, 0, 1, 0])
X_train = np.asarray(
[np.array(data.values, copy=True), np.array(data.values, copy=True), np.array(data.values, copy=True),
np.array(data.values, copy=True)])
# variable length with 41 features
model = keras.models.Sequential()
model.add(keras.layers.Conv1D(100, 5, activation='relu', input_shape=(None, featureCount)))
model.add(keras.layers.GlobalMaxPooling1D())
model.add(keras.layers.Dense(10, activation='relu'))
model.add(keras.layers.Dense(1, activation='sigmoid'))
model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy'])
model.summary()
model.fit(X_train, Y_train, epochs=10, verbose=False, validation_data=(np.array(data.values, copy=True), [1]))
This code produces error
ValueError: Error when checking input: expected conv1d_input to have 3 dimensions, but got array with shape (497, 41). So it appears like the first dimension was cut out as it contains training samples (it seems correct to me) what bothered me is the required dimensionality, why is it 3?
After searching for the answer I stumbled onto Dimension of shape in conv1D and followed it by adding last dimension (using X_train = np.expand_dims(X_train, axis=3)) that contains only single digit but I ended up with another, similar error:
ValueError: Error when checking input: expected conv1d_input to have 3 dimensions, but got array with shape (4, 497, 41, 1) now it seems that first dimension that previously was treated as sample "list" is now part of actual data.
I also tried fiddling with input_shape parameter but to no avail and using Reshape layer but ended up fighting with size the
What should I do to satisfy required shape? How to prepare data for processing?

How to train a LSTM model with different N-dimensions labels?

I am using keras (ver. 2.0.6 with TensorFlow backend) for a simple neural network:
model = Sequential()
model.add(LSTM(32, return_sequences=True, input_shape=(100, 5)))
model.add(LSTM(32, return_sequences=True))
model.add(TimeDistributed(Dense(5)))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy',
optimizer='rmsprop',
metrics=['accuracy'])
It is only a test for me, I am "training" the model with the following dummy data.
x_train = np.array([
[[0,0,0,0,1], [0,0,0,1,0], [0,0,1,0,0]],
[[1,0,0,0,0], [0,1,0,0,0], [0,0,1,0,0]],
[[0,1,0,0,0], [0,0,1,0,0], [0,0,0,1,0]],
[[0,0,1,0,0], [1,0,0,0,0], [1,0,0,0,0]],
[[0,0,0,1,0], [0,0,0,0,1], [0,1,0,0,0]],
[[0,0,0,0,1], [0,0,0,0,1], [0,0,0,0,1]]
])
y_train = np.array([
[[0,0,0,0,1], [0,0,0,1,0], [0,0,1,0,0]],
[[1,0,0,0,0], [0,1,0,0,0], [0,0,1,0,0]],
[[0,1,0,0,0], [0,0,1,0,0], [0,0,0,1,0]],
[[1,0,0,0,0], [1,0,0,0,0], [1,0,0,0,0]],
[[1,0,0,0,0], [0,0,0,0,1], [0,1,0,0,0]],
[[1,0,0,0,0], [0,0,0,0,1], [0,0,0,0,1]]
])
then i do:
model.fit(x_train, y_train, batch_size=2, epochs=50, shuffle=False)
print(model.predict(x_train))
The result is:
[[[ 0.11855114 0.13603994 0.21069065 0.28492314 0.24979511]
[ 0.03013871 0.04114409 0.16499813 0.41659597 0.34712321]
[ 0.00194826 0.00351031 0.06993906 0.52274817 0.40185428]]
[[ 0.17915446 0.19629011 0.21316603 0.22450975 0.18687972]
[ 0.17935558 0.1994358 0.22070852 0.2309722 0.16952793]
[ 0.18571526 0.20774922 0.22724937 0.23079531 0.14849086]]
[[ 0.11163659 0.13263632 0.20109797 0.28029731 0.27433187]
[ 0.02216373 0.03424517 0.13683401 0.38068131 0.42607573]
[ 0.00105937 0.0023865 0.0521594 0.43946937 0.50492537]]
[[ 0.13276921 0.15531689 0.21852671 0.25823513 0.23515201]
[ 0.05750636 0.08210614 0.22636817 0.3303588 0.30366054]
[ 0.01128351 0.02332032 0.210263 0.3951444 0.35998878]]
[[ 0.15303896 0.18197381 0.21823004 0.23647803 0.21027911]
[ 0.10842207 0.15755147 0.23791778 0.26479205 0.23131666]
[ 0.06472684 0.12843341 0.26680911 0.28923658 0.25079405]]
[[ 0.19560908 0.20663913 0.21954383 0.21920268 0.15900527]
[ 0.22829761 0.22907974 0.22933882 0.20822221 0.10506159]
[ 0.27179539 0.25587022 0.22594844 0.18308094 0.063305 ]]]
Ok, It works, but it is just a test, i really do not care about accuracy etc. I would like to understand how i can work with output of different size.
For example: passing a sequence (numpy.array) like:
[[0,0,0,0,1], [0,0,0,1,0], [0,0,1,0,0]]
I would like to get 4 dimensions output as prediction:
[[..first..], [..second..], [..third..], [..four..]]
Is that possibile somehow? The size could vary I would train the model with different labels that can have different N-dimensions.
Thanks
This answer is for non varying dimensions, but for varying dimensions, the padding idea in Giuseppe's answer seems the way to go, maybe with help of the "Masking" proposed in Keras documentation.
The output shape in Keras is totally dependent on the number of "units/neurons/cells" you put in the last layer, and of course, on the type of layer.
I can see that your data does not match your code in your question, it's impossible, but, suppose your code is right and forget the data for a while.
An input shape of (100,5) in an LSTM layer means a tensor of shape (None, 100, 5), which is
None is the batch size. The first dimension of your data is reserved to the number of examples you have. (X and Y must have the same number of examples).
Each example is a sequence with 100 time steps
each time step is a 5-dimension vector.
And the 32 cells in this same LSTM layer means that the resulting vectors will change from 5 to 32-dimension vectors. With return_sequences=True, all the 100 timesteps will appear in the result. So the result shape of the first layer is (None, 100, 32):
Same number of examples (this will never change along the model)
Still 100 timesteps per example (because return_sequences=True)
each time step is a 32-dimension vector (because of 32 cells)
Now the second LSTM layer does exactly the same thing. Keeps the 100 timesteps, and since it has also 32 cells, keeps the 32-dimension vectors, so the output is also (None, 100, 32)
Finally, the time distributed Dense layer will also keep the 100 timesteps (because of TimeDistributed), and change your vectors to 5-dimensoin vectors again (because of 5 units), resulting in (None, 100, 5).
As you can see, you cannot change the number of timesteps directly with recurrent layers, you need to use other layers to change these dimensions. And the way to do this is completely up to you, there are infinite ways of doing this.
But in all of them, you need to get free of the timesteps and rebuild the data with another shape.
Suggestion
A suggestion from me (which is just one possibility) is to reshape your result, and apply another dense layer just to achieve the final shape expeted.
Suppose you want a result like (None, 4, 5) (never forget, the first dimension of your data is the number of examples, it can be any number, but you must take it into account when you organize your data). We can achieve this by reshaping the data to a shape containing 4 in the second dimension:
#after the Dense layer:
model.add(Reshape((4,125)) #the batch size doesn't appear here,
#just make sure you have 500 elements, which is 100*5 = 4*125
model.add(TimeDistributed(Dense(5))
#this layer could also be model.add(LSTM(5,return_sequences=True)), for instance
#continue to the "Activation" layer
This will give you 4 timesteps (because the dimension after Reshape was: (None, 4, 125), each step being a 5-dimension vector (because of Dense(5)).
Use the model.summary() command to see the shapes outputted by each layer.
I don't know Keras but from a practical and theoretical point of view this is absolutely possible.
The idea is that you have an input sequence and an output sequence. Commonly, the beginning and the end of each sequence are delimited by some special symbol (e.g. the character sequence "cat" is translated into "^cat#" with an start symbol "^" and an end symbol "#"). Then the sequence is padded with another special symbol, up to a maximum sequence length (e.g. "^cat#$$$$$$" with a padding symbol "$").
If the padding symbol correspond to a zero-vector, it will have no impact on your training.
Your output sequence could now assume any length up to the maximum one, because the real length is the one from the start to the end symbol positions.
In other words, you will have always the same input and output sequence length (i.e. the maximum one), but the real length is that between the start and the end symbols.
(Obviously, in the output sequence, anything after the end symbol should not be considered in the loss function)
There seems to be two methods to do a sequence to sequence method, you're describing. The first directly using keras using this example (code below)
from keras.layers import Input, LSTM, RepeatVector
from keras.models import Model
inputs = Input(shape=(timesteps, input_dim))
encoded = LSTM(latent_dim)(inputs)
decoded = RepeatVector(timesteps)(encoded)
decoded = LSTM(input_dim, return_sequences=True)(decoded)
sequence_autoencoder = Model(inputs, decoded)
encoder = Model(inputs, encoded)
Where the repeat vector repeats the initial time series n times to match the output vectors number of timestamps. This will still mean you need a fixed number of time steps in you output vector, however, there may be a method to padding vectors that have less timestamps than you max amount of timesteps.
Or you can you the seq2seq module, which is built ontop of keras.

TensorFlow/TFLearn: ValueError: Cannot feed value of shape (64,) for Tensor u'target/Y:0', which has shape '(?, 10)'

I have been trying to perform regression using tflearn and my own dataset.
Using tflearn I have been trying to implement a convolutional network based off an example using the MNIST dataset. Instead of using the MNIST dataset I have tried replacing the training and test data with my own. My data is read in from a csv file and is a different shape to the MNIST data. I have 255 features which represent a 15*15 grid and a target value. In the example I replaced the lines 24-30 with (and included import numpy as np):
#read in train and test csv's where there are 255 features (15*15) and a target
csvTrain = np.genfromtxt('train.csv', delimiter=",")
X = np.array(csvTrain[:, :225]) #225, 15
Y = csvTrain[:,225]
csvTest = np.genfromtxt('test.csv', delimiter=",")
testX = np.array(csvTest[:, :225])
testY = csvTest[:,225]
#reshape features for each instance in to 15*15, targets are just a single number
X = X.reshape([-1,15,15,1])
testX = testX.reshape([-1,15,15,1])
## Building convolutional network
network = input_data(shape=[None, 15, 15, 1], name='input')
I get the following error:
ValueError: Cannot feed value of shape (64,) for Tensor u'target/Y:0',
which has shape '(?, 10)'
I have tried various combinations and have seen a similar question in stackoverflow but have not had success. The example in this page does not work for me and throws a similar error and I do not understand the answer provided or those provided by similar questions.
How do I use my own data?
Short answer
In the line 41 of the MNIST example, you also have to change the output size 10 to 1 in network = fully_connected(network, 10, activation='softmax') to network = fully_connected(network, 1, activation='linear'). Note that you can remove the final softmax.
Looking at your code, it seems you have a target value Y, which means using the L2 loss with mean_square (you will find here all the losses available):
regression(network, optimizer='adam', learning_rate=0.01,
loss='mean_square', name='target')
Also, reshape Y and Y_test to have shape (batch_size, 1).
Long answer: How to analyse the error and find the bug
Here is how to analyse the error:
The error is Cannot feed value ... for Tensor 'target/Y', which means it comes from the feed_dict argument Y.
Again, according to the error, you try to feed an Y value of shape (64,) whereas the network expect a shape (?, 10).
It expects a shape (batch_size, 10), because originally it's a network for MNIST (10 classes)
We now want to change the expected value of the network for Y.
in the code, we see that the last layer fully_connected(network, 10, activation='softmax') is returning an output of size 10
We change that to an output of size 1 without softmax: fully_connected(network, 1, activation='linear')
In the end, it was not a bug, but a wrong model architecture.

Categories