Polarity values do not appear when performing sentiment analysis - python

I wrote code using my dataset to do sentiment analysis, but I can't see the polarity results in the output. What is the reason for this? Can you suggest another sentiment analysis library?
Even after changing the code, I couldn't get any output.

Related

Clustering text data based on sentiment?

I am scraping reviews off Amazon with the intent to perform sentiment analysis to classify them into positive, negative and neutral. Now the data I would get would be text and unlabeled.
My approach to this problem would be as follows:
1.) Label the data using clustering algorithms like DBSCAN, HDBSCAN or KMeans. The number of clusters would obviously be 3.
2.) Train a Classification algorithm on the labelled data.
Now I have never performed clustering on text data but I am familiar with the basics of clustering. So my question is:
Is my approach correct?
Any articles/blogs/tutorials I can follow for text based clustering since I am kinda new to this?
I have never done such an experiment, but as far as I know, the most challenging part of this work is transforming the sentences or documents into fixed-length vectors (mapping them into a semantic space). I highly suggest using a sentiment analysis pipeline from the Hugging Face library for embedding the sentences (this way you might exploit some supervision). There are other options as well:
Using the sentence-transformers library (straightforward and still good).
Using BoW (the simplest way, but it is hard to get what you want).
Using TF-IDF (still simple, but may well do the job).
Once you reach this point (every review ==> fixed-length vector), you can use whatever algorithm you want to cluster them and inspect the results.
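To make the "every review ==> fixed-length vector" step concrete, here is a minimal TF-IDF sketch using only the standard library. It assumes whitespace tokenization and the plain tf * log(N / df) weighting; libraries such as scikit-learn use smoothed variants, so the numbers will differ. The function name `tfidf_vectors` and the sample reviews are made up for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each document to a fixed-length TF-IDF vector over a shared vocabulary."""
    # Naive tokenization (an assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty documents
        vectors.append(
            [(counts[term] / total) * math.log(n_docs / df[term]) for term in vocab]
        )
    return vocab, vectors


reviews = ["great phone great battery", "terrible battery", "great screen"]
vocab, vectors = tfidf_vectors(reviews)
print(vocab)
print(len(vectors), "vectors of length", len(vectors[0]))
```

Every review ends up with one weight per vocabulary term, so the vectors can be passed to any clustering algorithm that accepts numeric features.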

Using an average of VADER and textBlob's sentiment polarity gives me a more accurate result, why?

I have a manually labelled set of ~120K tweets. If I use VADER's compound score, it only matches the manual labelling for ~24% of the records; TextBlob matches ~35% of the manually labelled records. If I take VADER's compound score and TextBlob's polarity score, add them together and divide by 2, the resulting sentiment matches the manual labelling ~70% of the time. Is there any reason why it's more accurate, or is it just coincidence?
I think you're stumbling upon the idea behind ensemble learning. More often than not, putting multiple models together and combining their predictions leads to better results. Your implementation could be thought of as an equally weighted soft-voting ensemble. For more examples and additional implementations, the scikit-learn Voting Classifier docs are great.
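As a rough sketch of that equally weighted averaging: the two inputs are assumed to be scores already computed by VADER and TextBlob (both lie in [-1, 1]), and the ±0.05 cut-off for turning the average into a label is an assumption chosen for illustration, not something taken from the question.

```python
def average_polarity(vader_compound, textblob_polarity):
    """Equally weighted average of two polarity scores in [-1, 1]."""
    return (vader_compound + textblob_polarity) / 2.0


def to_label(score, threshold=0.05):
    """Map a polarity score to a label; the threshold is an assumed cut-off."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


# Hypothetical scores for a single tweet.
combined = average_polarity(0.5, 0.25)
print(combined, to_label(combined))
```

Unequal weights, or a threshold tuned on part of the labelled set, would be natural next experiments.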

How to categorize tweets (supportive vs. unsupportive) to predict election results

The idea
I am collecting tweets talking about the three major candidates for the US presidency in November. After collecting the tweets from different states, I will score these tweets, and then analyze each candidate's followers/supporters on various aspects.
The problem
I am not sure what method I should use to classify the tweets in order to produce reasonable outcomes. More precisely, I don't know how to tell if a tweet is supporting or opposing a specific candidate.
What I tried
I tried to use a library called textblob. Given a tweet, it returns a tuple of the form Sentiment(polarity, subjectivity). Polarity is a float which lies in the range of [-1,1] where 1 means positive statement and -1 means a negative statement. This method does not return reasonable results at all when applied. For example, given a tweet like "Donald Trump is horrible! I still support him tho.", it returns a polarity of -1.0 (negative), which does not make any sense at all.
Further Research
I looked for more examples and found this. In that example, the author uses a mood vocabulary (from the internet) and later assigns a mood to each tweet. I am planning to take a close look at that article and apply the method used there.
My questions
What is a proper way to categorize those tweets? Should I consider the one I mentioned in the further research section?
What if a tweet contains both names, something like "Trump will beat Biden"? How would something like this be scored by a given method?
Sentiment analysis is a topic within NLP (natural language processing), which is one of many interesting branches of machine learning.
Your approach of weighing the polarity of a tweet is correct. Since you have 3 different candidates (i.e. labels), this is a more interesting problem.
I would store a sentiment array of length 3 for each tweet in order to calculate weights for each candidate. For example, a tweet like "A is bad. B will do much better! But also C might too?" might produce [-1, 0.8, 0.4].
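A very rough sketch of such a per-candidate array, assuming each sentence is scored on its own and the score is credited to every candidate named in that sentence. The tiny lexicon `TOY_LEXICON` and the function names are hypothetical stand-ins for a trained model.

```python
import re

# Hypothetical toy lexicon; a real system would use a trained model instead.
TOY_LEXICON = {"bad": -1.0, "horrible": -1.0, "better": 0.8, "good": 0.6}


def score_sentence(sentence):
    """Average the lexicon values of the known words in a sentence (0.0 if none)."""
    words = re.findall(r"[a-z']+", sentence.lower())
    hits = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def per_candidate_sentiment(tweet, candidates):
    """Return one score per candidate, in the same order as `candidates`."""
    scores = [0.0] * len(candidates)
    # Split into sentences after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", tweet.strip()):
        sentence_score = score_sentence(sentence)
        for i, name in enumerate(candidates):
            # Case-sensitive whole-word match on the candidate's name.
            if re.search(rf"\b{re.escape(name)}\b", sentence):
                scores[i] += sentence_score
    return scores


print(per_candidate_sentiment(
    "A is bad. B will do much better! But also C might too?", ["A", "B", "C"]
))
```

With this toy lexicon, C gets 0.0 rather than 0.4, because nothing in the third sentence is in the lexicon. A sentence naming two candidates, such as "Trump will beat Biden", gives both the same score, and pronouns like "him" are not resolved, which is why a model trained on labelled data is needed.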
To produce such an array, you need a corpus. The corpus (i.e. your dataset) should contain tweets and a label for each tweet so that your machine learning model can learn from them.
There are many ways to build a machine learning model and train it with your dataset. That is a topic of data science: data scientists tune a model against chosen performance indicators in order to improve it.
The simplest would be something like this: parse all the words out of each tweet, add the tweet's label to each word's value in a hash map, and normalize. Now you have a hash map containing a sentiment value for each word.
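A minimal sketch of that word-level hash map, assuming labels are +1 (positive) and -1 (negative) and that "normalize" means dividing each word's total by the number of tweets it appeared in. The function name and the sample tweets are made up.

```python
import re
from collections import defaultdict


def build_word_sentiment(labeled_tweets):
    """Return {word: average label of the tweets containing that word}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for text, label in labeled_tweets:
        # Count each word once per tweet so repeated words do not dominate.
        for word in set(re.findall(r"[a-z']+", text.lower())):
            totals[word] += label
            counts[word] += 1
    # Normalize: each value ends up in [-1, 1].
    return {word: totals[word] / counts[word] for word in totals}


corpus = [("great rally", 1), ("horrible speech", -1), ("great speech", 1)]
print(build_word_sentiment(corpus))
```

Here "speech" ends up at 0.0 because it appears in one positive and one negative tweet.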
But in reality, this would not work well, as outliers and a lack of data would affect your result. Therefore, you need to look at your data and your problem and choose the right machine learning model. Look at this article to learn more about building a sentiment classifier.

Python NLTK difference between a sentiment and an incident

Hi, I want to implement a system which can identify whether a given sentence is an incident or a sentiment.
I was going through Python NLTK and found out that there is a way to find the positivity or negativity of a sentence.
I found this reference: ref link
I want to achieve the following:
"My new Phone is not as good as I expected" should be treated as a sentiment,
and "Camera of my phone is not working" should be considered an incident.
I had the idea of making my own clusters for training my system to tell these apart, but I am not getting the desired result. Is there a built-in way to do this, or any idea on how to approach a solution?
Thanks in advance for your time.
If you have, or can construct, a corpus of appropriately categorized sentences, you could use it to train a classifier. There can be as many categories as you need (two, three or more).
You'll have to do some work (reading and experimenting) to find the best features to use for the task. I'd start by POS-tagging the sentence so you can pull out the verb(s), etc. Take a look at the NLTK book's chapter on classifiers.
Use proper training/testing methodology (always test on data that was not seen during training), and make sure you have enough training data: it's easy to "overtrain" your classifier so that it does well on the training data by using characteristics that coincidentally correlate with the category but will not recur in novel data.
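The "test on unseen data" rule can be sketched as a simple hold-out split. The function name `holdout_split`, the 20% test fraction and the fixed seed are all assumptions for illustration.

```python
import random


def holdout_split(examples, test_fraction=0.2, seed=0):
    """Shuffle the labelled examples and split them into (train, test) lists."""
    shuffled = list(examples)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical labelled sentences: (text, category).
data = [(f"sentence {i}", "incident" if i % 2 else "sentiment") for i in range(10)]
train, test = holdout_split(data)
print(len(train), "training examples,", len(test), "held-out examples")
```

Train only on `train` and report accuracy on `test`; a large gap between training and held-out accuracy is the usual sign of overtraining.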

Sentiment Analysis API for Multiple Dimensions i.e. Positivity, Emotionality etc

I have big chunks of text in English (average length 800 words) which I would like to evaluate with a good and reliable sentiment analysis API.
Some threads seem to suggest APIs like Alchemy, but I would like an evaluation of the sentiment along multiple dimensions and not just a single score. Examples of such dimensions could be positivity and emotionality.
Do you know of any APIs that would provide such more detailed results?
The terms used in the natural language processing literature for positivity and emotionality are "valence" (or sometimes "polarity") and "arousal", respectively, so searching for APIs using those terms might be more useful to you. A quick search on those terms + sentiment + API revealed the following:
http://talc2.loria.fr/empathic/ can give positivity (valence) as well as the specific type of emotion (e.g. "sadness" vs. "disgust")
SentiStrength gives a positivity score as well as a negativity score. You can sum the scores to get positivity, or sum the absolute values of the scores to get emotionality. For example, a high-magnitude positivity score (+5) and a high-magnitude negativity score (-5) correspond to high emotionality but neutral positivity.
Mashape's Repustate ( https://www.mashape.com/repustate/repustate-sentiment-and-text-analytics ) can give positivity towards different aspects of a service (e.g. pos/neg sentiment towards price, food, staff, location, atmosphere, events). Some of their other APIs on this list may also be of interest: http://blog.mashape.com/list-of-20-sentiment-analysis-apis/ . Apparently they used to have sentiment detection APIs specific to the dimensions of anger and excitement, but these seem to have been phased out.
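The arithmetic described for SentiStrength above can be written out as a small sketch. It assumes you already have the two signed scores (for example +5 and -5) and does not call SentiStrength itself; the function name is made up.

```python
def positivity_and_emotionality(positive_score, negative_score):
    """Combine a positive score (> 0) and a negative score (< 0) into two dimensions."""
    positivity = positive_score + negative_score              # signed sum
    emotionality = abs(positive_score) + abs(negative_score)  # total magnitude
    return positivity, emotionality


print(positivity_and_emotionality(5, -5))  # strongly emotional, neutral overall
print(positivity_and_emotionality(3, -1))  # mildly positive
```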
We recently compared 15 Sentiment Analysis APIs. Here are some relevant points:
Sentiment score and positivity are essentially the same thing. Some APIs return a sentiment score; others return sentiment polarity labels (negative, positive, etc.) together with a confidence for each label. They can be mapped onto each other (and we do that in our uniform API). The only difference is that the latter approach allows for expressing a mixed sentiment, while with a sentiment score this requires adding sentiment agreement (as Meaning Cloud does).
Aspect-based sentiment is when the subject can be evaluated along different dimensions or aspects. An example is a restaurant review, which may combine sentiment towards service, meals, and prices in one sentence. We have found aspect-based sentiment in Aylien, Meaning Cloud and Repustate, with different domain models available at each of the services.
Entity-based sentiment: another way to get more detail is to perform entity extraction and then analyze sentiment towards each of the entities mentioned in the sentence. This is supported by Google Cloud Natural Language.
Additionally, Aylien and Meaning Cloud provide a sentiment subjectivity score, measuring how subjective the writer's opinion is.
Surprisingly, only Meaning Cloud provides explicit irony detection. It is not clear if it is used in other models implicitly.
Here's the picture:
Take a look at this API: http://sentic.net/
They're doing sentiment analysis for a wide variety of different emotional dimensions at concept level and so much more...
