Fit Vectorization-Output for RNN-Input - python

I have a Keras model that takes text as input and vectorizes it via a TextVectorization layer with multi_hot encoding. The problem is that this layer outputs a 2-D tensor of shape (None, vocabulary_size). The next layer, SimpleRNN, requires a tensor with 3 dimensions. How can I convert this output without using an Embedding layer?
I have already tried different layers for my input and reshaping the NumPy arrays. Here is my current code:
X_Train = np.asarray(X_Train)
X_Test = np.asarray(X_Test)
Y_Train = np.asarray(Y_Train)
Y_Test = np.asarray(Y_Test)
vectorizer = TextVectorization(standardize=None, split=None, output_mode="multi_hot")
vectorizer.adapt(X_Train)
model = Sequential()
model.add(Input(shape=(1,), dtype="string"))
model.add(vectorizer)
print(vectorizer.output_shape)
# model.add(Embedding(datasetSize, 2))
model.add(layers.SimpleRNN(nodes, activation=activationFunc))
model.add(Dense(1, activation=None, use_bias=False))
model.compile(optimizer=optimizerFunc, loss=lossFunc)
model.summary()
It works using the Embedding layer, but I need it to work without it.
My data is basically a CSV file with the format

Word,Weight
word1,1
word2,2
The goal is to let the network predict the weight of any given word based on this data.
Any idea how to make it work with this?
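One possible direction (a minimal sketch, assuming each multi-hot vector can be treated as a sequence of a single timestep; not an answer from the original thread): a Reshape layer can give the 2-D vectorizer output the time axis that SimpleRNN expects.
vocab_size = len(vectorizer.get_vocabulary())

model = Sequential()
model.add(Input(shape=(1,), dtype="string"))
model.add(vectorizer)                        # -> (batch, vocab_size)
model.add(layers.Reshape((1, vocab_size)))   # -> (batch, 1, vocab_size)
model.add(layers.SimpleRNN(nodes, activation=activationFunc))
model.add(Dense(1, activation=None, use_bias=False))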

Related

Tensorflow input pipeline with rank 3 tensor

I just started learning TensorFlow. Most projects I have worked on use a CSV dataset to predict values with a CNN architecture; I basically use the example code from Basic regression: Predict fuel efficiency.
However, the current project I am working on has an input data structure like a 3-D tensor. I tried modifying the input size within my hidden layer, but I did not get the expected outcome.
My code is here (the first part is from Tensorflow input pipeline where multiple rows correspond to a single observation?, where the asker had the same situation as I have):
dataset = tf.data.TextLineDataset('Batch training test/3+1testdata.csv')
# Skip the header line.
dataset = dataset.skip(1)
# Combine 6 lines into a single observation.
dataset = dataset.batch(6)
def parse_observation(line_batch):
    record_defaults = [[0.0], [0.0], [0.0], [0.0]]
    a, b, c, d = tf.io.decode_csv(line_batch, record_defaults=record_defaults)
    features = tf.stack([a, b, c])
    label = d[-1]  # Take the label from the last row.
    return features, label
# Parse each observation into a `row_per_ob X 2` matrix of features and a
# scalar label.
dataset = dataset.map(parse_observation)
# Batch multiple observations.
dataset = dataset.batch(10)
# Optionally add a prefetch for performance.
dataset = dataset.prefetch(1)
def build_model():
    model = keras.Sequential([
        layers.Dense(128, activation='relu', kernel_initializer='he_uniform', input_shape=(None, 3, 6)),
        layers.Dense(128, activation='relu', kernel_initializer='he_uniform'),
        layers.Dense(1, kernel_initializer='he_uniform')
    ])
    optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.001)
    # optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
    model.compile(loss='mae',
                  optimizer=optimizer,
                  metrics=['mae', 'mse', 'msle'])
    return model
model = build_model()
history = model.fit(
    dataset,
    epochs=200)
What I expect is for the machine to use the column data a, b, c to predict d[-1]; instead, the machine seems to predict d[-1] three times, using the input columns a, b, c individually. I know the problem comes from the input shape, but how do I change it? Do I have to add convolutional 2-D or padding layers?
I really have no clue about this and am looking for advice. Thanks in advance!
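For what it's worth, one possible fix (a minimal sketch, under the assumption that each observation from the pipeline above is a (3, 6) feature matrix with a single scalar label): since Dense only acts on the last axis, flattening first lets the network combine all 18 values into one prediction, and input_shape should not include the batch dimension.
def build_model():
    model = keras.Sequential([
        layers.Flatten(input_shape=(3, 6)),  # (batch, 3, 6) -> (batch, 18)
        layers.Dense(128, activation='relu', kernel_initializer='he_uniform'),
        layers.Dense(128, activation='relu', kernel_initializer='he_uniform'),
        layers.Dense(1, kernel_initializer='he_uniform')
    ])
    model.compile(loss='mae',
                  optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001),
                  metrics=['mae', 'mse', 'msle'])
    return model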

LSTM with Attention getting weights?? Classifing documents based on sentence embedding

I'm really stuck building a NN for text classification with Keras, using an LSTM and adding an attention layer on top. I'm sure I'm pretty close, but I'm confused:
Do I have to add a TimeDistributed Dense layer after the LSTM?
And how do I retrieve the attention weights from my network (for visualization purposes), so that I know which sentence was 'responsible' for the document being classified as good or bad?
Say I have 10 documents, each consisting of 100 sentences, and each sentence is represented as a 500-element vector. So my document matrix containing the sentence sequences looks like: X = np.array(Matrix).reshape(10, 100, 500)
The documents should be classified with a corresponding sentiment, 1=good; 0=bad - so
y= [1,0,0,1,1]
yy= np.array(y)
I don't need an embedding layer because each sentence of each document is already a sparse vector.
The attention layer is taken from: https://github.com/richliao/textClassifier/blob/master/textClassifierHATT.py
MAX_SENTS = 100
MAX_SENT_LENGTH = 500
review_input = Input(shape=(MAX_SENTS, MAX_SENT_LENGTH))
l_lstm_sent = LSTM(100, activation='tanh', return_sequences=True)(review_input)
l_att_sent = AttLayer(100)(l_lstm_sent)
preds = Dense(1, activation='softmax')(l_att_sent)
model = Model(review_input, preds)
model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['acc'])
model.fit(X, yy, nb_epoch=10, batch_size=50)
So I think my model should be set up correctly, but I'm not quite sure. How do I get the attention weights from it (e.g. so I know which sentence caused a classification as 1)? Help much appreciated.
1. Time distributed
In this case, you don't have to wrap Dense into TimeDistributed, although it may be a little bit faster if you do, especially if you can provide a mask that masks out a large part of the LSTM output.
However, Dense operates on the last dimension no matter what the shape before the last dimension is.
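A quick illustration of this point (a minimal, self-contained example of my own, with arbitrary shapes):
import tensorflow as tf

x = tf.zeros((10, 100, 200))        # (batch, timesteps, features)
y = tf.keras.layers.Dense(64)(x)    # the kernel acts on the last axis only
print(y.shape)                      # (10, 100, 64)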
2. Attention weights
Yes, it is as you suggest in the comment. You need to modify the AttLayer so that it is capable of returning both its output and the attention weights:
return output, ait
And then create a model that contains both prediction and attention weight tensors and get the predictions for them:
l_att_sent, att_weights = AttLayer(100)(l_lstm_sent)
...
predictions, att_weights = attmodel.predict(X)
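One way to wire this up end to end (a minimal sketch, assuming AttLayer has been modified as above; note it uses sigmoid rather than the question's one-unit softmax, which would be constant):
l_lstm_sent = LSTM(100, activation='tanh', return_sequences=True)(review_input)
l_att_sent, att_weights = AttLayer(100)(l_lstm_sent)
preds = Dense(1, activation='sigmoid')(l_att_sent)

# Expose both the prediction and the attention weights as outputs.
attmodel = Model(review_input, [preds, att_weights])
predictions, weights = attmodel.predict(X)  # weights has shape (batch, MAX_SENTS)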

What do input layers represent in a Hierarchical Attention Network

I'm trying to grasp the idea of a Hierarchical Attention Network (HAN). Most of the code I find online is more or less similar to the one here: https://medium.com/jatana/report-on-text-classification-using-cnn-rnn-han-f0e887214d5f :
embedding_layer = Embedding(len(word_index) + 1, EMBEDDING_DIM,
                            weights=[embedding_matrix],
                            input_length=MAX_SENT_LENGTH, trainable=True)
sentence_input = Input(shape=(MAX_SENT_LENGTH,), dtype='int32', name='input1')
embedded_sequences = embedding_layer(sentence_input)
l_lstm = Bidirectional(LSTM(100))(embedded_sequences)
sentEncoder = Model(sentence_input, l_lstm)
review_input = Input(shape=(MAX_SENTS,MAX_SENT_LENGTH), dtype='int32', name='input2')
review_encoder = TimeDistributed(sentEncoder)(review_input)
l_lstm_sent = Bidirectional(LSTM(100))(review_encoder)
preds = Dense(len(macronum), activation='softmax')(l_lstm_sent)
model = Model(review_input, preds)
My question is: What do the input layers here represent? I'm guessing that input1 represents the sentences wrapped with the embedding layer, but in that case what is input2? Is it the output of the sentEncoder? In that case it should be a float, or if it's another layer of embedded words, then it should be wrapped with an embedding layer as well.
The HAN model processes the text in a hierarchy: it takes a document already split into sentences (that's why the shape of input2 is (MAX_SENTS, MAX_SENT_LENGTH)); then it processes each sentence independently using the sentEncoder model (that's why the shape of input1 is (MAX_SENT_LENGTH,)), and finally it processes all the encoded sentences together.
So in your code the whole model is stored in model and its input layer is input2, which you feed with documents that have been split into sentences and whose words have been integer-encoded (to make them compatible with the embedding layer). The other input layer belongs to the sentEncoder model, which is used inside the model (and not directly by you):
review_encoder = TimeDistributed(sentEncoder)(review_input)
Masoud's answer is correct, but I'll rewrite it here in my own words:
- The data (X_train) is fed to the model as integer indices and is received by input2.
- X_train is then forwarded to the encoder model and is received by input1.
- input1 is wrapped by an embedding layer, so the indices are converted to vectors.
So input2 is effectively a proxy for the model's input.
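To make the shapes concrete, here is a small sketch with hypothetical values showing what model (i.e. input2) is fed; input1 never sees your data directly:
import numpy as np

num_docs, MAX_SENTS, MAX_SENT_LENGTH = 8, 15, 100  # hypothetical values
vocab_size = 20000

# Integer word indices: MAX_SENT_LENGTH ids per sentence, MAX_SENTS
# sentences per document -> this is what input2 receives.
X = np.random.randint(1, vocab_size, size=(num_docs, MAX_SENTS, MAX_SENT_LENGTH))
# input1 is applied internally, one sentence at a time, via
# TimeDistributed(sentEncoder).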

What is the timestep in Keras' LSTM?

I am having some trouble with the LSTM implementation in Keras.
My training set is structured as follow:
number of sequences: 5358
the length of each sequence is 300
each element of the sequence is a vector of 54 features
I'm unsure how to shape the input for a stateful LSTM.
Following this tutorial: http://philipperemy.github.io/keras-stateful-lstm/, I've created the subsequences (in my case there are 1452018 subsequences with a window_size = 30).
What is the best option to reshape the data for a stateful LSTM's input?
What does the timestep of the input mean in this case? And why?
Is the batch_size related to the timestep?
I'm unsure how to shape the input for a stateful LSTM.
LSTM(100, stateful=True)
But before using a stateful LSTM, ask yourself: do I really need a stateful LSTM? See here and here for more details.
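If you do go stateful, here is a rough sketch (my own example, using the question's window_size = 30 and 54 features; the batch size of 32 is an arbitrary assumption). A stateful LSTM needs a fixed batch size via batch_input_shape, and its state must be reset manually:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential([
    # (batch_size, timesteps, features) - the batch size must be fixed.
    LSTM(100, stateful=True, batch_input_shape=(32, 30, 54)),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')
# Call model.reset_states() between independent passes over the sequences.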
What is the best option to reshape the data for a stateful LSTM's input?
It really depends on the problem at hand. However, I think you do not need any reshaping; just feed the data directly into Keras:
input_layer = Input(shape=(300, 54))
What does the timestep of the input mean in this case? And why?
In your example the timestep is 300: each sequence is presented to the LSTM as 300 consecutive steps, each a vector of 54 features. See here for further details on timesteps. (As a smaller illustration, a network fed sequences of 5 elements would see 5 timesteps.)
Is the batch_size related to the timestep?
No, it has nothing to do with batch_size. More details on batch_size can be found here.
Here is some simple code based on the description that you provided. It might give you some intuition:
import numpy as np
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import LSTM, Dense

x_train = np.zeros(shape=(5358, 300, 54))
y_train = np.zeros(shape=(5358, 1))

input_layer = Input(shape=(300, 54))
lstm = LSTM(100)(input_layer)
dense1 = Dense(20, activation='relu')(lstm)
dense2 = Dense(1, activation='sigmoid')(dense1)

model = Model(inputs=input_layer, outputs=dense2)
model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(x_train, y_train, batch_size=512)

LSTM with keras

I have some training data train_x and some corresponding labels for this train_x called train_y. Here is how train_x and train_y are constructed:
train_x = np.array([np.random.rand(1, 1000)[0] for i in range(10000)])
train_y = (np.random.randint(1,150,10000))
train_x has 10000 rows and 1000 columns for each row.
train_y has a label between 1 and 150 for each sample in train_x and represents a code for each train_x sample.
I also have a sample called sample, which is 1 row with 1000 columns, that I want to use for prediction with this LSTM model. This variable is defined as
sample = np.random.rand(1,1000)[0]
I am trying to train an LSTM on this data and predict with it using Keras. I want to take in this feature vector and use the LSTM to predict one of the codes in the range 1 to 150. I know these are random arrays, but I cannot post the data I have. I have tried the following approach, which I believe should work, but I am facing some issues:
model = Sequential()
model.add(LSTM(output_dim=32, input_length=10000, input_dim=1000, return_sequences=True))
model.add(Dense(150, activation='relu'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(train_x, train_y,
                    batch_size=128, nb_epoch=1,
                    verbose=1)
model.predict(sample)
Any help or adjustments to this pipeline would be great. I am not sure if the output_dim is correct. I want to train the LSTM on each sample of the 1000-dimensional data and then produce a specific code in the range 1 to 150. Thank you.
I see at least three things you need to change:
Change this line:
model.add(Dense(150, activation='relu'))
to:
model.add(Dense(150, activation='softmax'))
as leaving 'relu' as the activation makes your output unbounded, whereas it needs to have a probabilistic interpretation (since you use categorical_crossentropy).
Change loss or target:
As you are using categorical_crossentropy, you need to change your target to be a one-hot encoded vector of length 150. Another way is to keep your target as-is but change the loss to sparse_categorical_crossentropy.
Change your target range:
Keras uses 0-based array indexing (as in Python, C, and C++), so your values should be in the range [0, 150) instead of [1, 150].
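Putting all three changes together, a minimal sketch (with random stand-in data; treating each 1000-dimensional sample as a single timestep is my assumption, since an LSTM needs 3-D input):
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

train_x = np.random.rand(10000, 1, 1000)        # (samples, timesteps, features)
train_y = np.random.randint(1, 150, 10000) - 1  # shift labels into [0, 149]

model = Sequential()
model.add(LSTM(32, input_shape=(1, 1000)))
model.add(Dense(150, activation='softmax'))     # softmax instead of relu
model.compile(loss='sparse_categorical_crossentropy',  # integer targets
              optimizer='adam', metrics=['accuracy'])
model.fit(train_x, train_y, batch_size=128, epochs=1)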
