Self-Attention Visualization - Python

I've trained a simple GRU with an attention layer and now I'm trying to visualize the attention weights (I already have them). The input is two one-hot encoded sequences: one is correct, the other is almost the same but has some letters permuted. The task is to determine which of the sequences is correct.
Here's my NN:
from tensorflow import keras
from tensorflow.keras.layers import GRU, Dropout, Flatten, Dense
from keras_self_attention import SeqSelfAttention  # from the keras-self-attention package

optimizer = keras.optimizers.RMSprop()
max_features = 4  # number of words in the dictionary
num_classes = 2

model = keras.Sequential()
model.add(GRU(128, input_shape=(70, max_features), return_sequences=True, activation='tanh'))
model.add(Dropout(0.5))
atn_layer = model.add(SeqSelfAttention())  # note: Sequential.add returns None, so this assignment does not capture the layer
model.add(Flatten())
model.add(Dense(num_classes, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer=optimizer,
              metrics=['accuracy'])
I've tried several things I found on Stack Overflow but didn't succeed. The particular sticking point is that I don't understand how to couple my input with the attention weights. I'd appreciate any help and suggestions.
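Not from the original post, but here is a minimal sketch of one way to couple the input with the attention weights: map each one-hot timestep back to its symbol and plot the attention matrix as a heatmap with those symbols as the axis labels. The 4-symbol alphabet and the random arrays below are hypothetical stand-ins; attention is assumed to be the (70, 70) matrix already extracted for one sample. (If the weights still need extracting, the keras-self-attention layer can reportedly be built with return_attention=True so that it also outputs the attention matrix, but check the package's documentation.)

import numpy as np
import matplotlib.pyplot as plt

seq_len = 70
alphabet = ['A', 'C', 'G', 'T']  # hypothetical 4-symbol dictionary (max_features = 4)

# Dummy stand-ins: replace with one real input sample and its extracted attention matrix.
sequence = np.eye(4)[np.random.randint(0, 4, seq_len)]  # (70, 4) one-hot input
attention = np.random.rand(seq_len, seq_len)            # (70, 70) attention weights

# Map each one-hot timestep back to its symbol so the axes are readable.
labels = [alphabet[i] for i in sequence.argmax(axis=-1)]

fig, ax = plt.subplots(figsize=(12, 10))
im = ax.imshow(attention, cmap='viridis')
ax.set_xticks(range(seq_len))
ax.set_yticks(range(seq_len))
ax.set_xticklabels(labels, fontsize=6)
ax.set_yticklabels(labels, fontsize=6)
ax.set_xlabel('attended position (key)')
ax.set_ylabel('query position')
ax.set_title('Self-attention weights over the input sequence')
fig.colorbar(im, ax=ax)
plt.show()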

Related

Can anyone explain the functionality of this code, please?

embedding_vector_length = 32
model = Sequential()
model.add(Embedding(vocab_size, embedding_vector_length, input_length=200))
model.add(SpatialDropout1D(0.25))
model.add(LSTM(50, dropout=0.5, recurrent_dropout=0.5))
model.add(Dropout(0.2))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
This code defines a recurrent neural network using Keras's Sequential API, set up for binary text classification. It embeds each token of a 200-token input into a 32-dimensional vector, applies spatial dropout, runs the sequence through a 50-unit LSTM (with input and recurrent dropout), applies another dropout layer, and ends with a single sigmoid unit. It is compiled with binary cross-entropy loss, the Adam optimizer, and accuracy as the metric.
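For concreteness, here is a rough walk-through of the tensor shapes (batch dimension omitted), assuming the settings above:

Embedding(vocab_size, 32, input_length=200)   # (200,)    -> (200, 32)
SpatialDropout1D(0.25)                        # (200, 32) -> (200, 32)
LSTM(50, dropout=0.5, recurrent_dropout=0.5)  # (200, 32) -> (50,)
Dropout(0.2)                                  # (50,)     -> (50,)
Dense(1, activation='sigmoid')                # (50,)     -> (1,)  probability of the positive class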

Tensorflow LSTM based RNN - Incorrect and Constant Prediction

I hope someone can point out where I am going wrong with my RNN. The long and short of my problem is that, no matter the structure of my network, the predictions always come out essentially constant (the plot is not reproduced here).
I have tried 1, 2, 3, and 4 layers of LSTMs, each with varying neuron counts and either relu or tanh activation functions. For the run shown, the network was set up as:
model = Sequential()
model.add(LSTM(128, activation='relu', return_sequences=True, input_shape=(length, scaled_train_data.shape[1])))
model.add(LSTM(256, activation='relu', return_sequences=True))
model.add(LSTM(256, activation='relu', return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(128, activation='relu'))
model.add(Dense(scaled_train_data.shape[1]))
model.compile(optimizer='adam', loss="mse")
The actual training of the model runs without incident.
My data is financial data. There are around 70k rows and I use an approximately 70/30 train/test split.
Where am I going wrong? Thanks!
So, from asking around and reading up, it seems RNNs might not be the best solution for financial / random-walk data - at least with the setup I am using. I wonder if using averages might produce better results?
Anyway, moving on to Reinforcement Learning.
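Not from the original post, but a hedged sanity check that follows up on the averaging idea: compare the RNN's test error against a naive last-value baseline and a simple moving-average baseline, since on a true random walk these are hard to beat. The series below is a synthetic stand-in with roughly the question's size and split.

import numpy as np

# Synthetic random-walk "price" series standing in for the financial data.
prices = np.cumsum(np.random.randn(70_000)) + 100.0
train, test = prices[:49_000], prices[49_000:]  # ~70/30 split as in the question

# Naive baseline: predict that the next value equals the current value.
naive_pred = test[:-1]
naive_mse = np.mean((test[1:] - naive_pred) ** 2)

# Moving-average baseline: predict the mean of the last k observations.
k = 10
ma_pred = np.convolve(test, np.ones(k) / k, mode='valid')[:-1]  # mean of test[i:i+k] predicts test[i+k]
ma_mse = np.mean((test[k:] - ma_pred) ** 2)

print(f'naive last-value MSE   : {naive_mse:.4f}')
print(f'{k}-step moving-avg MSE : {ma_mse:.4f}')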

How to include transformation in Keras input model?

I'm pretty new to Keras and Tensorflow in general, so perhaps this is a stupid question...
What I want to achieve is the following:
I have a set of words, let's say: cat, dog, cow,..
Those words should be encoded character by character against a given alphabet: the vector has a 1 at the position of each character that occurs in the word and a 0 everywhere else.
For cat e.g. something like 1,0,1,0,0,0,0,0,....,1,0,0,...0.
I use the Keras Tokenizer for that:
tk = Tokenizer(char_level=True, oov_token='UNK')
alphabet = "abcdefghijklmnopqrstuvwxyzöäü0123456789-,;.!?:'\"/\\|_##$%^&*~`+-=<>()[]{}"
char_dict = {}
for i, char in enumerate(alphabet):
    char_dict[char] = i + 1
# Use char_dict to replace the tk.word_index
tk.word_index = char_dict
# Add 'UNK' to the vocabulary
tk.word_index[tk.oov_token] = max(char_dict.values()) + 1
x_train = tk.texts_to_matrix(x_train)
Those vectors are passed into the Keras model for prediction. Now, I want the transformation to happen in the Keras model. So, the user should provide "cat" to the model and not a numeric vector like above. And the model should also return "cat".
How can I achieve that? I saw that there is a Lambda layer in Keras, is this the correct approach?
Thanks in advance.
Edit for clarification:
The model at the moment looks like this:
model = Sequential()
model.add(Dense(128, input_shape=(len(alphabet)+1,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
But what I want to achieve is to have an input layer that takes strings as inputs and converts them to the format the actual first layer can read. Something like this:
model = Sequential()
model.add(transformation_layer)  # <-- the new transformation layer
model.add(Dense(128, input_shape=(len(alphabet)+1,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
Edit 2
This is what I tried, but I get the following error when running model.fit:
tensorflow.python.framework.errors_impl.OperatorNotAllowedInGraphError: iterating over tf.Tensor is not allowed in Graph execution. Use Eager execution or decorate this function with @tf.function.
def transform_layer(x):
    return tk.texts_to_matrix(x)

print('Building model...')
transform_layer = Lambda(transform_layer)

model = Sequential()
model.add(transform_layer)
model.add(Dense(128, input_shape=(len(alphabet)+1,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
history = model.fit(np.array(['test', 'test2']), np.array(['blub', 'blub2']),
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_split=0.1)
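Not an answer from the original post, but the usual way to push this transformation into the model is a preprocessing layer rather than a Lambda wrapping the Tokenizer: the Tokenizer runs plain Python and cannot iterate over symbolic tensors in graph mode, which is what the error is complaining about. Below is a hedged sketch assuming a recent TF 2.x with tf.keras.layers.TextVectorization; the alphabet and num_classes are placeholders.

import numpy as np
import tensorflow as tf

alphabet = list("abcdefghijklmnopqrstuvwxyz0123456789")  # placeholder alphabet
num_classes = 10                                          # placeholder class count

# Character-level vectorization inside the model: split each string into characters
# and count them, giving a fixed-length bag-of-characters vector similar to
# Tokenizer.texts_to_matrix(mode='count'). Index 0 is reserved for out-of-vocabulary
# characters, so the output width is len(alphabet) + 1.
vectorize = tf.keras.layers.TextVectorization(
    standardize=None,
    split='character',
    vocabulary=alphabet,
    output_mode='count',
)

inputs = tf.keras.Input(shape=(1,), dtype=tf.string)
x = vectorize(inputs)
x = tf.keras.layers.Dense(128, activation='relu')(x)
x = tf.keras.layers.Dropout(0.5)(x)
outputs = tf.keras.layers.Dense(num_classes, activation='softmax')(x)
model = tf.keras.Model(inputs, outputs)

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()

# The model now accepts raw strings directly, e.g.:
# model.fit(np.array(['cat', 'dog']), y_train_one_hot, epochs=epochs)

Mapping a predicted class index back to a string label such as "cat" is usually done outside the model, e.g. by indexing into a list of label names.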

keras fit time/step difference

Building a DQN agent, and trying to understand why calling fit in my code is orders of magnitude slower (over 1 s per step) than in another example I found (about 1 ms per step). The neural nets are almost the same; the example has more connections, but that's the only difference (my alpha is set to the same value as the example's learning rate).
No idea what would cause such a difference in performance time. I thought maybe it was the way the data was formatted before calling fit, but it looks like everything is the same.
My results:
Example results:
My NN:
q = Sequential()
q.add(Dense(24, input_dim=n_states, activation='relu'))
q.add(Dense(24, activation='relu'))
q.add(Dense(n_actions, activation='linear'))
q.compile(loss='mse', optimizer=Adam(lr=alpha))
Example NN:
model = Sequential()
model.add(Dense(32, input_dim=nS, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(nA, activation='linear'))
model.compile(loss='mse', optimizer=Adam(lr=0.01))
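Not from the original post, but a hedged sanity check worth running: the per-call overhead of fit (callbacks, logging, metric handling) often dominates when a DQN calls it once per environment step on a tiny batch, while train_on_batch performs a single gradient step with much less machinery. The sizes below (n_states, n_actions, alpha) are hypothetical placeholders matching the question's names.

import time
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

n_states, n_actions, alpha = 4, 2, 0.01  # hypothetical sizes, mirroring the question's names

q = Sequential([
    Dense(24, input_dim=n_states, activation='relu'),
    Dense(24, activation='relu'),
    Dense(n_actions, activation='linear'),
])
q.compile(loss='mse', optimizer=Adam(learning_rate=alpha))

x = np.random.rand(1, n_states).astype('float32')
y = np.random.rand(1, n_actions).astype('float32')

# Warm up once so one-time graph building is not counted.
q.fit(x, y, epochs=1, verbose=0)
q.train_on_batch(x, y)

t0 = time.time()
for _ in range(100):
    q.fit(x, y, epochs=1, verbose=0)  # full fit machinery on every call
print('fit            : %.2f ms/call' % ((time.time() - t0) * 10))

t0 = time.time()
for _ in range(100):
    q.train_on_batch(x, y)            # single gradient step, far less overhead
print('train_on_batch : %.2f ms/call' % ((time.time() - t0) * 10))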

Neural Network works worse than Random Forest

I have a classification problem with 5 target classes and 15 features (all continuous),
with 1 million samples for training and 0.5 million for validation, e.g.:
shape of X_train = (1000000, 15)
shape of X_validation = (500000, 15)
First, I used a Random Forest, which gets 88% average accuracy.
After that I tried many neural network architectures; the best one got ~80% average accuracy on both training and validation data, which is worse than the Random Forest.
(I don't know much about designing neural network architectures.)
Following is the best of my NN architectures (~80% avg. accuracy):
model = Sequential()
model.add(Dense(1000, input_dim=15, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(900, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(800, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(700, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(600, activation='relu'))
model.add(Dense(5, activation='softmax'))  # output layer
adadelta = Adadelta()
model.compile(loss='categorical_crossentropy', optimizer=adadelta, metrics=['accuracy'])
Batch Size = 128 and epochs = 100
I have read this question. The answer points out that a NN needs a large amount of data and some regularization. I think my data size is large enough, and I have also tried higher dropout rates and L2 regularization, but it still doesn't help.
What could the problem be?
This is biological data that I have no domain knowledge about, so I'm sorry that I can't explain it. I've plotted the feature distributions (plot not reproduced here); all features lie between 0 and 3.
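Not from the original post, but one common culprit when a dense network underperforms a Random Forest on tabular data is unscaled inputs, and the very deep and wide stack above may also be harder to optimize than needed for 15 features. Here is a hedged sketch of standardizing the features and trying a smaller network; the data is a random placeholder with the question's feature count and value range (scaled down in row count for the example).

import numpy as np
from sklearn.preprocessing import StandardScaler
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.utils import to_categorical

# Placeholder data with 15 continuous features in the 0-3 range, as in the question.
X_train = (np.random.rand(100_000, 15) * 3).astype('float32')
y_train = np.random.randint(0, 5, size=100_000)

# Standardize each feature to zero mean / unit variance; fit on the training set only.
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)

# A much smaller network is often sufficient for 15 tabular features.
model = Sequential([
    Dense(128, input_dim=15, activation='relu'),
    Dropout(0.1),
    Dense(64, activation='relu'),
    Dense(5, activation='softmax'),
])
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train_s, to_categorical(y_train, 5),
          batch_size=128, epochs=10, validation_split=0.1, verbose=1)

# At prediction time, remember to apply the same scaler:
# X_validation_s = scaler.transform(X_validation)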
