I'm currently playing with the TensorFlow seq2seq model, trying to implement sentiment analysis. My idea is to feed the encoder with an IMDB comment, the decoder with [Pad] or [Go], and the target with [neg]/[pos]. Most of my code is quite similar to the seq2seq translation example. But the results I get are quite strange: for each batch, the results are either all [neg] or all [pos].
"encoder input : I was hooked almost immediately.[pad][pad][pad]"
"decoder input : [pad]"
"target : [pos]"
Since this result is so peculiar, I was wondering if anyone knows what could cause this kind of behavior?
I would recommend trying a simpler architecture: an RNN or CNN encoder that feeds into a logistic classifier. These architectures have shown very good results on sentiment analysis (Amazon reviews, Yelp reviews, etc.).
For examples of such models, see here: various encoders (LSTM or convolutional) over words and characters.
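To make that suggestion concrete, here is a minimal Keras sketch of an LSTM encoder feeding a logistic (sigmoid) classifier; the vocabulary size, sequence length, and layer sizes are illustrative placeholders, not values from the linked examples:
import tensorflow as tf

# Minimal sketch: LSTM encoder -> logistic classifier for pos/neg reviews.
vocab_size = 20000  # placeholder vocabulary size
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 128),      # integer-encoded tokens -> vectors
    tf.keras.layers.LSTM(64),                        # encoder: one vector per review
    tf.keras.layers.Dense(1, activation="sigmoid"),  # logistic classifier
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(x_train, y_train, epochs=3, validation_split=0.1)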
I'm totally new to NLP and BERT models.
What I'm trying to do right now is sentiment analysis on trending Twitter hashtags ("neg", "neu", "pos") using a DistilBERT model, but the accuracy was about 50% (I tried with labeled data taken from Kaggle).
So here is my idea:
(1) First, I will fine-tune a DistilBERT model (Model 1) on the IMDB dataset.
(2) After that, since I have some data taken from Twitter posts, I will run sentiment analysis on them with Model 1 and get Result 2.
(3) Then I will fine-tune Model 1 again on Result 2, expecting to get Model 3.
I'm not really sure whether this process actually makes the model more accurate.
Thanks for reading my post.
I'm a little skeptical about your first step. Since the IMDB data is different from your target data, I do not think it will positively affect the outcome of your work. Thus, I would suggest fine-tuning on a dataset of tweets or other social media hashtags; however, if you are only focusing on hashtags and do not care about the text, that might work! My limited experience with fine-tuning transformers like BART and BERT shows that the dataset you are working on should be very similar to your actual data. In general, though, you can fine-tune a model on different datasets, and if the datasets are structured toward one goal, this can improve the model's accuracy.
If you want to fine-tune a sentiment classification head of BERT for classifying tweets, then I'd recommend a different strategy:
The IMDB dataset captures a different kind of sentiment: its ratings do not really correspond to short-post sentiment, unless you want to focus on tweets about movies.
Using classifier output as input for further training of that classifier is not really a good approach: if the classifier makes many mistakes while classifying, these will be reflected in the training, and the errors will deepen. This is basically creating endogenous labels, which will not really improve your real-world classification.
You should consider other ways of obtaining labelled training data. There are a few good sources for Twitter:
Twitter datasets on Kaggle - there are plenty of datasets available containing millions of tweets. Some of them even contain sentiment labels (usually inferred from emoticons, as these have been shown to be more accurate than words at predicting sentiment - for an explanation see e.g. Frasincar 2013). So that's probably where you should look.
Stocktwits (if you're interested in financial sentiment) - posts that authors can label for sentiment themselves, and thus a perfect way of mining labelled data, if stocks/cryptos are what you're looking for.
Another thing is picking a model better suited to your data; I'd recommend the one below. It has been pretrained on 80M tweets, so it should provide strong improvements. I believe it even contains a sentiment classification head that you can use.
Roberta Twitter Base
Check out that model's website for guidance on loading it in your code - it's very easy; just use the following code (this is for sentiment classification):
# Load the pretrained Twitter sentiment model and its tokenizer.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "cardiffnlp/twitter-roberta-base-sentiment"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
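Building on that snippet, here is a hedged usage sketch for scoring a single tweet; the label order (negative/neutral/positive) is how this checkpoint is commonly documented, but confirm it against the model card on the Hugging Face Hub:
import torch

# Score one tweet with the model loaded above.
inputs = tokenizer("I love the new update!", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)[0]
print({label: float(p) for label, p in zip(["negative", "neutral", "positive"], probs)})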
Another benefit of this model is that it was pretrained from scratch with a vocabulary that includes emojis, meaning it has a deep understanding of them, their typical contexts and co-occurrences. This can greatly benefit social media classification, as many researchers would agree that emojis/emoticons are better predictors of sentiment than ordinary words.
I am trying to solve a product matching task by extracting word embeddings from BERT and feeding them to machine learning models.
However, the accuracy is not good.
I tried different truncation strategies and used either a single attribute or two concatenated attributes as input for the word embeddings, but the accuracy did not improve much.
Could you please suggest what I should do to improve the performance of the models?
Here is the project on GitHub:
https://github.com/jajawong/Data-Science/blob/main/Copy_of_FineTuneBert_Feature_Extraction%20(1).ipynb
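For anyone attempting something similar, a common baseline is to take the [CLS] vector from BERT and feed it to an off-the-shelf classifier; this is a minimal sketch of that idea (the model name, texts, and labels are placeholders, not taken from the notebook above):
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    # Encode a batch of texts and keep the [CLS] vector of each.
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = bert(**inputs).last_hidden_state[:, 0]
    return out.numpy()

texts = ["Apple iPhone 12 64GB", "Samsung Galaxy S21 128GB"]  # placeholder data
labels = [0, 1]                                               # placeholder labels
clf = LogisticRegression().fit(embed(texts), labels)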
I am currently working on a system that extracts certain features from 3D objects (voxel grids, to be precise), and I would like to compare those features against automatically learned features in terms of classification performance in a TensorFlow CNN with some other data - but that is not the point here, just background.
My idea was to take a dataset (ModelNet10), train a TensorFlow CNN to classify it, and then use what the network learned on my own dataset - not to classify, but to extract features.
So I want to throw away everything the CNN does except what it extracts from the objects.
Is there any way to get these features, and how do I do that? I honestly have no idea.
Yes, it is possible to use trained models exclusively for feature extraction. This is called transfer learning: you can either train your own model and then extract the features, or you can extract features from a pre-trained model and use them in your task, provided your task is similar in nature to what the pre-trained model was trained for. You can of course find a lot of material online on these topics. The links below give details on how to go about it, and a minimal sketch follows after them:
https://keras.io/api/applications/
https://keras.io/guides/transfer_learning/
https://machinelearningmastery.com/how-to-use-transfer-learning-when-developing-convolutional-neural-network-models/
https://www.pyimagesearch.com/2019/05/27/keras-feature-extraction-on-large-datasets-with-deep-learning/
https://www.kaggle.com/angqx95/feature-extractor-fine-tuning-with-keras
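As a minimal sketch of the idea (using a 2D image model, VGG16, purely for illustration; a 3D/voxel CNN would follow the same pattern of dropping the classification head):
import numpy as np
import tensorflow as tf

# Pretrained CNN with the classifier head removed; pooling="avg" yields
# one fixed-size feature vector per input.
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   pooling="avg")
base.trainable = False  # use it purely as a feature extractor

images = np.random.rand(4, 224, 224, 3).astype("float32")  # placeholder batch
features = base.predict(images)  # real inputs should be preprocessed first
print(features.shape)            # (4, 512): one feature vector per object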
For a text classification task, I have a dataset of 1000 reviews, and I tried different neural networks. With a CNN I got an accuracy of 0.94, but with an LSTM I got a lower accuracy (0.88). Is this normal? As far as I know, the LSTM is specialized for text classification and preserves the order of the word sequence.
Yes, this isn't abnormal, and it has been shown in a lot of research.
The performance of these models depends on many factors, such as the data you have and the task you are dealing with.
For example, a CNN can perform well if your task mostly requires detecting a few substantial features (like sentiment-bearing words).
However, RNN-based models show their superiority when the sequential aspect of the data matters, as in machine translation and text summarization tasks.
I don't believe that "the LSTM is specialized for text classification" is true; it's better to say the LSTM is specialized for learning sequential data. An LSTM can learn texts and the relations between tokens very well, but the task you defined may not depend on these linguistic features. For example, in sentiment classification, a model (like a CNN) can focus on just the presence of certain words and still achieve good results.
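To illustrate the contrast, here is a minimal Keras sketch of the two encoders on the same embedded input; all sizes are placeholders:
import tensorflow as tf

def build(encoder):
    # Same embedding and classifier head; only the encoder differs.
    return tf.keras.Sequential([
        tf.keras.layers.Embedding(20000, 128),
        encoder,
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])

cnn = build(tf.keras.Sequential([
    tf.keras.layers.Conv1D(64, 5, activation="relu"),  # local n-gram detectors
    tf.keras.layers.GlobalMaxPooling1D(),              # "is the feature present anywhere?"
]))
lstm = build(tf.keras.layers.LSTM(64))                 # order-sensitive encoder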
I'm currently working on an NLP project. When I researched how to deal with NLP, I found some articles about spaCy. But because I'm still a newbie at Python, I don't understand how the spaCy TextCategorizer pipeline works.
Is there any detailed explanation of how this pipeline works? Does the TextCategorizer pipeline also use text feature extraction such as bag-of-words, TF-IDF, Word2Vec, or something else? And what model architecture does the spaCy TextCategorizer use? Could someone explain this to me?
There's a lot of info in the docs:
https://spacy.io/usage/examples#textcat shows a code example
https://spacy.io/api/textcategorizer provides details on the architecture:
The model supports classification with multiple, non-mutually exclusive labels. You can change the model architecture rather easily, but by default, the TextCategorizer class uses a convolutional neural network to assign position-sensitive vectors to each word in the document. The TextCategorizer uses its own CNN model, to avoid sharing weights with the other pipeline components. The document tensor is then summarized by concatenating max and mean pooling, and a multilayer perceptron is used to predict an output vector of length nr_class, before a logistic activation is applied elementwise. The value of each output neuron is the probability that some class is present.
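As a quick orientation, here is a minimal sketch of setting up a TextCategorizer with the spaCy v2 API that the quoted docs describe (in spaCy v3 you would write nlp.add_pipe("textcat") instead):
import spacy

nlp = spacy.blank("en")
textcat = nlp.create_pipe("textcat")  # the CNN-based TextCategorizer
nlp.add_pipe(textcat, last=True)
textcat.add_label("POSITIVE")
textcat.add_label("NEGATIVE")

optimizer = nlp.begin_training()  # initialize weights; train with nlp.update(...)
doc = nlp("This was a great movie!")
print(doc.cats)  # one probability per label (meaningless until trained)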