I am new to machine learning, have experimented a bit with neural networks, and have done some research. I am currently trying to build a mini network for fake news detection.
My data has several features (statement, speaker, date, topic, ...). So far I have simply used the text of the false and true statements as input for my network, with GloVe word embeddings. I tried out the following network:
model = tf.keras.Sequential([
    # part 1: word and sequence processing
    tf.keras.layers.Embedding(embeddings_matrix.shape[0], embeddings_matrix.shape[1], weights=[embeddings_matrix], trainable=True),
    tf.keras.layers.Conv1D(128, 5, activation='relu'),
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dropout(0.2),
    # part 2: classification
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
It gives me over 90% training accuracy and around 65% testing accuracy.
Now I want to try adding 2 more features: the speaker, which is given as [firstname, lastname], and the topic, which can be one or several words (e.g. vaccines or politics-elections-2016) and which I decided to limit/pad to 5 words.
I don't know how to combine those features into one model. I don't think using word embeddings on the other features makes sense. Do I have to make 3 different networks and concatenate them into one? Can I use both the speaker and the topic as inputs for the same network? If I concatenate them, how would I do so (using the already-classified output of each as input)?
I have concatenated different input types like this in the past. You can't use a Sequential model anymore; the docs say:
A Sequential model is not appropriate when:
• Your model has multiple inputs or multiple outputs
• Any of your layers has multiple inputs or multiple outputs
In the Keras functional API docs, there is a section called “Models with multiple inputs and outputs” that covers almost exactly the problem you are asking about.
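For a concrete starting point, here is a minimal sketch of that pattern for your three inputs. The sequence lengths, vocabulary size, and the small speaker/topic embeddings are all assumptions to be adapted; only the text branch mirrors the model you posted:
import tensorflow as tf

# assumed sizes -- replace with your real sequence lengths and vocabulary
MAX_TEXT_LEN, MAX_SPEAKER_LEN, MAX_TOPIC_LEN = 200, 2, 5
VOCAB_SIZE, EMBED_DIM = 20000, 100

# text branch: the same Conv1D pipeline you already have
text_input = tf.keras.Input(shape=(MAX_TEXT_LEN,), name='text')
x = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM)(text_input)
x = tf.keras.layers.Conv1D(128, 5, activation='relu')(x)
x = tf.keras.layers.GlobalMaxPooling1D()(x)

# speaker branch: 2 tokens (first and last name), small embedding, averaged
speaker_input = tf.keras.Input(shape=(MAX_SPEAKER_LEN,), name='speaker')
s = tf.keras.layers.Embedding(VOCAB_SIZE, 16)(speaker_input)
s = tf.keras.layers.GlobalAveragePooling1D()(s)

# topic branch: up to 5 tokens, handled the same way
topic_input = tf.keras.Input(shape=(MAX_TOPIC_LEN,), name='topic')
t = tf.keras.layers.Embedding(VOCAB_SIZE, 16)(topic_input)
t = tf.keras.layers.GlobalAveragePooling1D()(t)

# merge the three branches and classify as before
merged = tf.keras.layers.concatenate([x, s, t])
merged = tf.keras.layers.Dense(128, activation='relu')(merged)
output = tf.keras.layers.Dense(1, activation='sigmoid')(merged)

model = tf.keras.Model(inputs=[text_input, speaker_input, topic_input], outputs=output)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
You then train with model.fit([text, speaker, topic], labels, ...), one array per input. This also answers your last sub-question: you don't concatenate three already-classified outputs, you concatenate the learned features before the final classifier.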
However, don't rule out turning your structured data into text, such as prepending “xxspeaker firstname lastname xxtopic topicname” to the text you currently have. It might not work for your current model, which seems pretty small... but if you were using a larger model, or fine-tuning a large language model for the task (e.g. with fast.ai or huggingface), you would almost certainly have the capacity to learn it straight from the text.
Related
I used Keras to build a Siamese network, following the code format of one of the posted questions (please see the code sample there). To explain briefly: I built a Siamese network using a pretrained EfficientNet, so that each copy of the network produces a dense-layer output; the two outputs are then combined into an L1-similarity output.
However, at prediction time I only want to obtain the dense output of one of the branches (as an embedding). I plan on using a variety of unsupervised learning methods (including KNN) on these outputs.
During prediction, how can I ask Keras to run only one copy of my network graph on a single input? Can I extract only a part of the NN graph? I don't want to always have to generate pairs of images, or pay the cost of running 2 images when I only need one output.
Let me just make sure that I understand your question and context: you are using a Siamese network (EfficientNet) and you want to generate embeddings for your input images. So you only want to save the image encodings from one of the ConvNets?
If that is the case, I don't really see the point of building a Siamese network at all. Just go for a single ConvNet (using EfficientNet), because the Siamese model will always ask you for image pairs.
If you go for a single ConvNet model and identify the layer you want the embeddings from, you can use tf.keras.backend.function like this:
get_layer_output = tf.keras.backend.function([fine_tuned_model.layers[0].input], [fine_tuned_model.layers[-2].output])
Then, at prediction time, you call it like this:
features = get_layer_output([x])[0]
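An alternative with the same effect, and arguably the more idiomatic one in recent TF versions, is to wrap the relevant part of the graph in a new tf.keras.Model. Here fine_tuned_model and the layer indices are the same assumptions as above:
import tensorflow as tf

# sub-model from the original input to the embedding layer's output;
# predict() then runs only this branch, on single images, no pairs needed
embedding_model = tf.keras.Model(inputs=fine_tuned_model.layers[0].input,
                                 outputs=fine_tuned_model.layers[-2].output)
features = embedding_model.predict(x)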
I have been attempting to create a model that, given an image, can read the text from it. I am attempting to do this by implementing a CNN, an RNN, and CTC, using TensorFlow and Keras. There are a couple of things I am confused about. For reading single digits, I understand that the last layer of the model should have 10 nodes, since those are the options (0-9). However, for reading words, aren't there infinitely many options? So how many nodes should I have in my last layer? Also, I am confused as to how I should add CTC to my Keras model. Is it a loss function?
I see two options here:
You can construct your model to recognize the separate letters of those words; then there are as many nodes in the last layer as there are letters and symbols in the alphabet your model will read.
You can make the output of your model a vector and then "decode" this vector using some other tool that can encode/decode words as vectors. One such tool is word2vec. Or you could download a database of possible words and build such a tool yourself.
Your description of the model is very vague. If you want more specific help, you should provide more info, e.g. the model architecture.
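To address the CTC half of the question, since neither option covers it: yes, in Keras CTC usually enters as a loss function, via keras.backend.ctc_batch_cost. A rough sketch, assuming your model ends in a per-timestep softmax over the alphabet plus one extra "blank" class that CTC requires:
import tensorflow as tf

def ctc_loss(y_true, y_pred):
    # y_pred: per-timestep softmax over the alphabet plus one extra
    # "blank" class (shape: batch x timesteps x classes)
    batch_len = tf.shape(y_pred)[0]
    input_length = tf.fill([batch_len, 1], tf.shape(y_pred)[1])
    # assumption: labels are padded to a fixed width; in practice you
    # would pass each label's true (unpadded) length here instead
    label_length = tf.fill([batch_len, 1], tf.shape(y_true)[1])
    return tf.keras.backend.ctc_batch_cost(y_true, y_pred, input_length, label_length)

# last layer of the model: Dense(len(alphabet) + 1, activation='softmax'),
# applied per timestep; CTC is then wired in simply as the loss
model.compile(optimizer='adam', loss=ctc_loss)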
I have training data as two columns:
1. 'Sentences'
2. 'Relevant_text' (the text in this column is a subset of the text in 'Sentences')
I tried training an RNN with LSTM directly, treating 'Sentences' as input and 'Relevant_text' as output, but the results were disappointing.
I want to know how to approach this type of problem? Does this kind of problem have a name? Which models should I explore?
If the target text is a subset of the input text then, I believe, this problem can be solved as a tagging problem: make your neural network predict, for each word, whether it is "relevant" or not.
On the one hand, the problem of taking a text and selecting the subset of it that best reflects its meaning is called extractive summarization, and it has lots of solutions, from the well-known unsupervised TextRank algorithm to complex BERT-based neural models.
On the other hand, technically your problem is just binary token-wise classification: you label each token (word or other symbol) of your input text as "relevant" or not, and train any neural network architecture that is good at tagging on this data. Specifically, I would look into architectures for POS tagging, because they are very well studied. Typically this is a BiLSTM, maybe with a CRF head. More modern models are based on pretrained contextual word embeddings, such as BERT (you may not even need to fine-tune them: just use BERT as a feature extractor and add a BiLSTM on top). If you want a more lightweight model, consider a CNN over pretrained and fixed word embeddings.
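As a concrete illustration of the BiLSTM variant (vocabulary size and layer sizes are placeholders, not recommendations):
import tensorflow as tf

VOCAB_SIZE, EMBED_DIM = 20000, 100  # placeholders

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
    # one sigmoid per token: the probability that the token is "relevant"
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(1, activation='sigmoid')),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# targets: shape (batch, sequence_length, 1), 1 where the token occurs in Relevant_text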
One final parameter you should spend time playing with is the threshold for classifying a word as relevant: maybe the default 0.5 is not the best choice. Instead of keeping all tokens whose probability of being important exceeds 0.5, you might prefer to keep the top k tokens, where k is fixed or is some percentage of the whole text.
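The top-k variant is a few lines of postprocessing on the per-token probabilities; here probs is assumed to be the model output for one text:
import numpy as np

# probs: per-token relevance probabilities for one text (model output)
k = max(1, int(0.2 * len(probs)))       # e.g. keep the top 20% of tokens
keep = np.sort(np.argsort(probs)[-k:])  # indices of kept tokens, in text order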
Of course, more specific recommendations would be dataset-specific, so if you could share your dataset, it would be a great help.
I'm trying to do text classification with scikit-learn.
I have text that does not classify well. I think I can improve the predictions by adding data I can deduce in the form of an array of integers.
For example, sample 1 would come with [3, 1, 5, 2] and sample 2 would come with [2, 1, 4, 2]. This would also be true of the test data.
The idea is that the classifier could use both the text and the numbers to classify the data.
I've read the documentation for scikit-learn and I can't find how to do it. It must be possible, because everything that gets classified is, internally, a vector of numbers, so adding another vector of numbers should not be much of a problem; but I can't figure out how. partial_fit adds more samples; it does not add more information about the existing samples. Is there a way to do what I'm trying to do? I tried to combine GaussianNB with SGDClassifier, but it turns out I don't know how to do that. (Was it a bad idea?)
What should I do?
I think you could add this new feature as another dimension to your training data. You need to modify the training data by adding your new features before calling SGD.
A simple/naive way would be:
For example, if my training data with two samples were
X = [ [1,2,3], [8,9,0] ]
And my new features for each sample was
new_feature_X = [ [11,22,33] , [77,88,0] ]
My new training data would be:
X_new = [[1,2,3,11,22,33] , [8,9,0,77,88,0]]
Then you call SGD.fit(X_new, labels)
As far as my knowledge of SGD goes, I don't think there is any other way to combine the two feature sets.
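That concatenation is indeed the standard approach; in a real scikit-learn pipeline you would do it on the vectorizer output with scipy.sparse.hstack. A minimal sketch, where texts, extra (the per-sample integer lists), and y are assumed placeholders for the asker's data:
import numpy as np
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier

# texts: list of strings; extra: the per-sample integer lists; y: labels
vectorizer = TfidfVectorizer()
X_text = vectorizer.fit_transform(texts)       # sparse (n_samples, n_terms)
X_new = hstack([X_text, np.asarray(extra)])    # append the numeric columns

clf = SGDClassifier()
clf.fit(X_new, y)
# at prediction time: clf.predict(hstack([vectorizer.transform(test_texts), test_extra]))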
"The idea is that the classifier could use both the text and the numbers to classify the data."
I find a neural network to be much more suitable for this. You could use two input layers, one for text vectors, and one for the numbers and feed them together into a network to get the output.
"I tried to combine GaussianNB with SGDClassifier, but it turns out I don't know how to do that. (Was it a bad idea?)"
SGD means stochastic gradient descent. Is it possible to find the gradient of Naive Bayes? What's the corresponding cost function?
"What should I do?"
• Ensemble: train two separate classifiers, one on your text data and another on your new handcrafted feature, then take the average of their prediction probabilities. You could also train multiple classifiers and take their votes. This tutorial is great for that.
• Try out the MLP classifier. I used it a while ago and found it works pretty well with text.
• Neural networks: it's pretty easy with Keras, as in the code below.
• Read the research literature. There is a pretty good chance academia has done some work on your dataset. Try to read some of it; Google Scholar and Semantic Scholar are great places to find published research.
from keras.layers import Input, Dense, Concatenate
from keras.models import Model

# This returns a tensor
text_input_vec = Input(shape=(784,))
new_numeric_feature = Input(shape=(4,))

# feed your text to a dense layer
dense1 = Dense(64, activation='relu')(text_input_vec)
# feed your numeric feature to another dense layer
dense2 = Dense(64, activation='relu')(new_numeric_feature)

# concatenate/combine the output of both
concat = Concatenate(axis=-1)([dense1, dense2])

# use the above to predict the label of your text. The layer below
# assumes you have 2 classes
predictions = Dense(2, activation='softmax')(concat)

model = Model(inputs=[text_input_vec, new_numeric_feature], outputs=predictions)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
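To make the ensemble suggestion concrete as well: train one model per feature set and average the probabilities by hand. A sketch, where X_text, X_numeric, y, and their *_test counterparts are assumed to be the vectorized text, the handcrafted features, and the labels:
from sklearn.linear_model import SGDClassifier
from sklearn.naive_bayes import GaussianNB

# one classifier per feature set
clf_text = SGDClassifier(loss='log_loss').fit(X_text, y)  # 'log' in older sklearn
clf_num = GaussianNB().fit(X_numeric, y)

# soft voting: average the predicted class probabilities of both models
proba = (clf_text.predict_proba(X_text_test)
         + clf_num.predict_proba(X_numeric_test)) / 2
pred = proba.argmax(axis=1)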
I am working on my paper, and one of the tasks is to extract the company name and location from sentences of the following type:
"Google shares resources with Japan based company."
Here, I want the output to be "Google Japan". The sentence structure may also vary, as in "Japan based company can access the resources of Google". I have tried an attention-based NN, but the error rate is around 0.4. Can anyone give me a hint about which model I should use?
And I printed out the validation process like this:
[screenshot: validation output]
And I got the graphs of the loss and accuracy:
[plots: loss and accuracy]
It shows that val_acc is 0.99. Does this mean my model is pretty good at predicting? Then why do I get a 0.4 error rate when I use my own validation function? I am very new to ML. What does val_acc actually mean?
Here is my model:
encoder_input = Input(shape=(INPUT_LENGTH,))
decoder_input = Input(shape=(OUTPUT_LENGTH,))
encoder = Embedding(input_dict_size, 64, input_length=INPUT_LENGTH, mask_zero=True)(encoder_input)
encoder = LSTM(64, return_sequences=True, unroll=True)(encoder)
encoder_last = encoder[:, -1, :]
decoder = Embedding(output_dict_size, 64, input_length=OUTPUT_LENGTH, mask_zero=True)(decoder_input)
decoder = LSTM(64, return_sequences=True, unroll=True)(decoder, initial_state=[encoder_last, encoder_last])
attention = dot([decoder, encoder], axes=[2, 2])
attention = Activation('softmax')(attention)
context = dot([attention, encoder], axes=[2, 1])
decoder_combined_context = concatenate([context, decoder])
output = TimeDistributed(Dense(64, activation="tanh"))(decoder_combined_context) # equation (5) of the paper
output = TimeDistributed(Dense(output_dict_size, activation="softmax"))(output)
model = Model(inputs=[encoder_input, decoder_input], outputs=[output])
model.compile(optimizer='adam', loss="binary_crossentropy", metrics=['accuracy'])
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=200, min_delta=0.0005)
I will preface this by saying that, if you're new to ML, I would advise you to learn more "traditional" algorithms before turning to neural networks. Furthermore, your task is so specific to company names and locations that using Latent Semantic Analysis (or a similar statistical method) to generate your embeddings, plus an SVM to determine which words are relevant, might give you better results than neural networks, with less experimentation and less training time.
Now, with all that said, here's what I can gather. If I understand correctly, you have a separate, second validation set on which you get a 40% error rate. All the numbers in the screenshots are pretty good, which leads me to two possible conclusions: either your second validation set is very different from your first one and you're suffering from a bit of overfitting, or there's a bug somewhere in your code that leads Keras to believe your model is doing great when in fact it isn't. (Bear in mind I'm not very familiar with Keras, so I don't know how likely the latter option is.)
Now, as for the model itself: your task is clearly extractive, meaning that your model doesn't need to paraphrase anything or come up with something that isn't in the source text. Your model should take that into account, and should never make mistakes like confusing India with New Zealand or Tencent with Google. You can probably base your model on recent work in extractive summarization, which is a fairly active field (more so than keyword and keyphrase extraction). Here's a recent article which uses a neural attention model; you can use Google Scholar to easily find more.
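To sketch what the LSA-plus-SVM baseline from the first paragraph could look like (purely illustrative; samples would be each candidate word plus some surrounding context, and labels whether that word belongs in the output, both assumed to come from your preprocessing):
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

# LSA = TF-IDF followed by a truncated SVD over the term space
tfidf = TfidfVectorizer().fit_transform(samples)
X = TruncatedSVD(n_components=100).fit_transform(tfidf)

# the SVM then decides, per candidate, whether the word belongs in the output
clf = SVC().fit(X, labels)  # labels: 1 for relevant company/location words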