Extract different emotions for words with WordNet-Affect - Python

I'm trying to use WordNet-Affect to work with the different emotions it defines, such as anger, dislike, fear, negative-fear, joy, sadness, etc.
I found a code sample for that in the library's repository: https://github.com/clemtoy/WNAffect
from wnaffect import WNAffect
import nltk
nltk.download('dict')
wna = WNAffect('wordnet-1.6/', 'wn-domains-3.2/')
emo = wna.get_emotion('angry', 'JJ')
print(emo)
I tested it and I got this error:
Resource dict not found. Please use the NLTK Downloader to obtain
the resource:
import nltk
nltk.download('dict')
I tried to use the NLTK downloader to download it, but there is no such resource there!
A similar question was asked before here: Extract "emotion words" / affect words from English corpus?
but that one was solved by using SentiWordNet to retrieve the negative and positive scores!

Related

What are the best algorithms to determine the language of text and to correct typos in Python?

I am looking for algorithms that can tell me the language of a text (e.g. Hello - English, Bonjour - French, Servicio - Spanish) and also correct typos in English words. I have already explored Google's TextBlob; it is very relevant, but it returns a "Too many requests" error as soon as my code starts executing. I also started exploring Polyglot, but I am facing a lot of issues installing the library on Windows.
Code for TextBlob
import pandas as pd
from tkinter import filedialog
from textblob import TextBlob
import time

colnames = ['Word']
x = filedialog.askopenfilename(title='Select the word list')
print("Data to be checked: " + x)
df = pd.read_excel(x, sheet_name='Sheet1', header=0, names=colnames, na_values='?', dtype=str)
words = df['Word']
i = 0
Language_detector = pd.DataFrame(columns=['Word', 'Language', 'corrected_word', 'translated_word'])
for word in words:
    b = TextBlob(word)
    language_word = b.detect_language()
    time.sleep(0.5)
    if language_word in ['en', 'EN']:
        corrected_word = b.correct()
        time.sleep(0.5)
        Language_detector.loc[i, ['corrected_word']] = corrected_word
    else:
        translated_word = b.translate(to='en')
        time.sleep(0.5)
        Language_detector.loc[i, ['translated_word']] = translated_word
    Language_detector.loc[i, ['Word']] = word
    Language_detector.loc[i, ['Language']] = language_word
    i = i + 1
filename = "Language detector test v 1.xlsx"
Language_detector.to_excel(filename, sheet_name='Sheet1')
print("Languages identified for the word list")
A common way to classify languages is to gather summary statistics on letter or word frequencies and compare them to a known corpus. A naive Bayes classifier would suffice. See https://pypi.org/project/Reverend/ for a way to do this in Python.
Correction of typos can also be done from a corpus, using a statistical model of the most likely words versus the likelihood of a particular typo. See https://norvig.com/spell-correct.html for an example of how to do this in Python.
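To make the letter-frequency idea concrete, here is a minimal sketch of a character-level naive Bayes scorer. The tiny training sentences and the smoothing constant are illustrative placeholders, not part of any library; in practice you would train on a real corpus per language.

import math
from collections import Counter

# Toy training data -- replace with real per-language corpora.
samples = {
    'en': "hello how are you today the weather is nice",
    'fr': "bonjour comment allez vous aujourd'hui il fait beau",
    'es': "hola como estas hoy el servicio es muy bueno",
}

def letter_logprobs(text, alpha=1.0):
    """Log-probability of each letter, with add-alpha smoothing."""
    counts = Counter(c for c in text.lower() if c.isalpha())
    total = sum(counts.values()) + alpha * 26
    return {c: math.log((counts.get(c, 0) + alpha) / total)
            for c in "abcdefghijklmnopqrstuvwxyz"}

models = {lang: letter_logprobs(text) for lang, text in samples.items()}

def guess_language(word):
    """Pick the language whose letter model scores the word highest."""
    def score(lang):
        return sum(models[lang].get(c, math.log(1e-6))
                   for c in word.lower() if c.isalpha())
    return max(models, key=score)

print(guess_language("bonjour"))  # likely 'fr' with these toy samples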
You could use this, but it is hardly reliable:
https://github.com/hb20007/hands-on-nltk-tutorial/blob/master/8-1-The-langdetect-and-langid-Libraries.ipynb
Alternatively, you could give the Compact Language Detector (CLD3) or fastText a chance, or you could use a corpus to check the frequencies of the words occurring in the target text in order to find out whether the target text belongs to the language of the respective corpus. The latter is only possible if you know the set of languages to choose from.
For typo correction, you could use the Levenshtein algorithm, which computes an edit distance. You can compare your words against a dictionary and choose the most likely word. For Python, you could use: https://pypi.org/project/python-Levenshtein/
See the concept of Levenshtein edit distance here: https://en.wikipedia.org/wiki/Levenshtein_distance
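For a concrete picture, here is a minimal sketch of dictionary-based correction with python-Levenshtein; the short word list is only a placeholder for a real dictionary.

import Levenshtein  # pip install python-Levenshtein

# Placeholder dictionary -- use a real word list in practice.
dictionary = ["hello", "help", "world", "service", "bonjour"]

def correct(word, max_distance=2):
    """Return the closest dictionary word by edit distance, if it is close enough."""
    best = min(dictionary, key=lambda w: Levenshtein.distance(word.lower(), w))
    return best if Levenshtein.distance(word.lower(), best) <= max_distance else word

print(correct("helo"))  # -> 'hello'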

How to get Synonyms for ngrams using sentiwordnet in nltk Python

I've been trying to figure out how to get synonyms for the words I pass. That's an easy piece of cake for WordNet to do. However, I'm trying and failing with bigrams, viz. 'i tried', 'horrible food', 'people go'.
I looked around to learn more about SentiWordNet and read the documentation, but there seem to be no examples of how to use it. I went through the source too, yet here I am.
I wrote the code for doing it in the following way; please point out what needs correction:
from nltk.corpus import sentiwordnet as swn
sentisynset = swn.senti_synset('so horrible')
print(sentisynset)
Well, this clearly returns a ValueError, though I'm not sure why.
Also, I tried this:
from nltk.corpus.reader.lin import LinThesaurusCorpusReader as ltcr
synon = ltcr.synonyms(ngram='so horrible')
print(synon)
and this returns a TypeError, asking me to fill in the self parameter.
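For what it's worth, swn.senti_synset expects a single fully qualified synset name (something like 'breakdown.n.03'), not a raw phrase, and LinThesaurusCorpusReader has to be instantiated before its methods can be called, which is why these two attempts fail. SentiWordNet has no built-in synonym lookup for arbitrary bigrams; a minimal sketch that falls back to per-token WordNet synonyms (assuming the wordnet and sentiwordnet corpora are downloaded) could look like this:

from nltk.corpus import wordnet as wn
from nltk.corpus import sentiwordnet as swn

def token_synonyms(phrase):
    """Collect WordNet lemma names for each token of a phrase separately."""
    result = {}
    for token in phrase.split():
        lemmas = {lemma.name() for syn in wn.synsets(token) for lemma in syn.lemmas()}
        result[token] = sorted(lemmas)
    return result

print(token_synonyms('horrible food'))

# SentiWordNet lookups work per word (optionally per POS), not per phrase:
for s in swn.senti_synsets('horrible'):
    print(s)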

targeting one word in the response of an input command python

I have been trying to make an artificial intelligence in Python. What I have been trying to do is make input command responses target one word. So, for example, if the user types in "whats your name", it will give the same response as "name" by targeting the word "name". How can I do this?
What you're looking for is a library for handling parts of speech. Luckily it's pretty well-trodden ground, and there are libraries for lots of different languages, including Python. Have a look at the Natural Language Toolkit (NLTK) and its interface to the Stanford POS tagger. Here's an example from the linked article:
>>> from nltk.tag.stanford import POSTagger
>>> english_postagger = POSTagger('models/english-bidirectional-distsim.tagger', 'stanford-postagger.jar')
>>> english_postagger.tag('this is stanford postagger in nltk for python users'.split())
[(u'this', u'DT'),
 (u'is', u'VBZ'),
 (u'stanford', u'JJ'),
 (u'postagger', u'NN'),
 (u'in', u'IN'),
 (u'nltk', u'NN'),
 (u'for', u'IN'),
 (u'python', u'NN'),
 (u'users', u'NNS')]
The NN, VBZ, etc. you can see are part-of-speech tags. It looks like you're looking for nouns (NN).
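As a concrete illustration (not from the linked article), here is a minimal sketch using NLTK's built-in nltk.pos_tag, so no Stanford jar is needed; the response table is made up for the example.

import nltk
# One-time setup: nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')

responses = {"name": "My name is Bot."}  # illustrative response table

def respond(user_input):
    """Tag the input and answer based on the first noun that has a known response."""
    tokens = nltk.word_tokenize(user_input.lower())
    for word, tag in nltk.pos_tag(tokens):
        if tag.startswith("NN") and word in responses:
            return responses[word]
    return "Sorry, I don't understand."

print(respond("whats your name"))  # -> "My name is Bot."
print(respond("name"))             # -> "My name is Bot."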

Plotting the ConditionalFreqDist of a book

Using nltk (already imported). Playing around with the Gutenberg corpus:
import nltk
from nltk.corpus import gutenberg
Checked out the fileids, to find one I could play with:
gutenberg.fileids()
I wrote a small piece of code to find the most common words (in order to choose a few for the graph):
kjv = gutenberg.words('bible-kjv.txt')
kjv_text = nltk.Text(kjv)

from collections import Counter
c = Counter(kjv_text)
print(c.most_common()[:100])  # top 100

kjv_text.dispersion_plot(["LORD", "God", "Israel", "king", "people"])
Up to here it works perfectly. Then I try to implement the ConditionalFreqDist, but I get a bunch of errors:
cfd2 = nltk.ConditionalFreqDist(
    (target, fileid['bible-kjv.txt'])
    for fileid in gutenberg.fileids()
    for w in gutenberg.words(fileid)
    for target in ['lord']
    if w.lower().startswith(target))
cfd2.plot()
I have tried to change a few things, but I always get some errors. Can any experts tell me what I'm doing wrong?
Thanks
Here is what was wrong:
The fileid in:
cfd2 = nltk.ConditionalFreqDist((target, fileid['bible-kjv.txt'])
should refer to which element it is (in this case the 4th in the list of Gutenberg texts).
So the line should instead say:
cfd2 = nltk.ConditionalFreqDist((target, fileid[3])
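For comparison, a variant in the style of the NLTK book's inaugural-address example, which conditions on the target word and uses the file id itself as the sample, plots without errors. This is only a sketch assuming the goal is to compare counts of 'lord' across the Gutenberg files, not necessarily the original poster's exact intent:

import nltk
from nltk.corpus import gutenberg

# Condition on the target word; count, per file, how often a word
# starting with 'lord' occurs.
cfd2 = nltk.ConditionalFreqDist(
    (target, fileid)
    for fileid in gutenberg.fileids()
    for w in gutenberg.words(fileid)
    for target in ['lord']
    if w.lower().startswith(target))
cfd2.plot()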

CherryPy WebService not returning NLTK collocations to browser window

I have a very simple CherryPy webservice that I hope will be the foundation of a larger project, however, I need to get NLTK to work the way I want.
My python script imports NLTK and uses the collocation (bigram) function of NLTK, to do some analysis on pre-loaded data.
I have a couple of questions:
1) Why is the program not returning the collocations to my browser, but only to my console?
2) Why, if I specify from nltk.book import text4, does the program import the whole set of sample books (text1 to text9)?
Please, keep in mind that I am a newbie, so the answer might be in front of me, but I don't see it.
Main question: How do I pass the collocation results to the browser (webservice), instead of console?
Thanks
import cherrypy
import nltk
from nltk.book import text4

class BiGrams:
    def index(self):
        return text4.collocations(num=20)
    index.exposed = True

cherrypy.quickstart(BiGrams())
I have been doing some work with Moby Dick and I stumbled on the answer to the question of importing just one specific text the other day:
>>> import nltk.corpus
>>> from nltk.text import Text
>>> moby = Text(nltk.corpus.gutenberg.words('melville-moby_dick.txt'))
Thus, all you really need is the fileid in order to assign the text of that file to your new Text object. Be careful, though, because only "literary" sources are in the gutenberg.words directory.
Anyway, for help with finding file ids for gutenberg, after import nltk.corpus above, you can use the following command:
>>> nltk.corpus.gutenberg.fileids()
['austen-emma.txt', 'austen-persuasion.txt', 'austen-sense.txt', 'bible-kjv.txt', 'blake-poems.txt', 'bryant-stories.txt', 'burgess-busterbrown.txt', 'carroll-alice.txt', 'chesterton-ball.txt', 'chesterton-brown.txt', 'chesterton-thursday.txt', 'edgeworth-parents.txt', 'melville-moby_dick.txt', 'milton-paradise.txt', 'shakespeare-caesar.txt', 'shakespeare-hamlet.txt', 'shakespeare-macbeth.txt', 'whitman-leaves.txt']
This still doesn't answer the question for your specific corpus, the inaugural addresses, however. For that answer, I found this MIT paper: http://web.mit.edu/6.863/www/fall2012/nltk/ch2-3.pdf
(I recommend it to anyone beginning to work with nltk texts because it talks about grabbing all kinds of textual data for analysis). The answer to getting the inaugural address fileids comes on page 6 (edited a bit):
>>> nltk.corpus.inaugural.fileids()
['1789-Washington.txt', '1793-Washington.txt', '1797-Adams.txt', '1801-Jefferson.txt', '1805-Jefferson.txt', '1809-Madison.txt', '1813-Madison.txt', '1817-Monroe.txt', '1821-Monroe.txt', '1825-Adams.txt', '1829-Jackson.txt', '1833-Jackson.txt', '1837-VanBuren.txt', '1841-Harrison.txt', '1845-Polk.txt', '1849-Taylor.txt', '1853-Pierce.txt', '1857-Buchanan.txt', '1861-Lincoln.txt', '1865-Lincoln.txt', '1869-Grant.txt', '1873-Grant.txt', '1877-Hayes.txt', '1881-Garfield.txt', '1885-Cleveland.txt', '1889-Harrison.txt', '1893-Cleveland.txt', '1897-McKinley.txt', '1901-McKinley.txt', '1905-Roosevelt.txt', '1909-Taft.txt', '1913-Wilson.txt', '1917-Wilson.txt', '1921-Harding.txt', '1925-Coolidge.txt', '1929-Hoover.txt', '1933-Roosevelt.txt', '1937-Roosevelt.txt', '1941-Roosevelt.txt', '1945-Roosevelt.txt', '1949-Truman.txt', '1953-Eisenhower.txt', '1957-Eisenhower.txt', '1961-Kennedy.txt', '1965-Johnson.txt', '1969-Nixon.txt', '1973-Nixon.txt', '1977-Carter.txt', '1981-Reagan.txt', '1985-Reagan.txt', '1989-Bush.txt', '1993-Clinton.txt', '1997-Clinton.txt', '2001-Bush.txt', '2005-Bush.txt', '2009-Obama.txt']
Thus, you should be able to import specific inaugural addresses as Texts (assuming you did "from nltk.text import Text" above) or you can work with them using the "inaugural" identifier imported above. For example, this works:
>>> address1 = Text(nltk.corpus.inaugural.words('2009-Obama.txt'))
In fact, you can treat all the inaugural addresses as one document by calling inaugural.words without any arguments, for example:
>>> len(nltk.corpus.inaugural.words())
or:
>>> addresses = Text(nltk.corpus.inaugural.words())
I remembered reading this thread a month ago when trying to answer this question myself, so perhaps this information, if coming late, will be helpful to someone somewhere.
(This is my first contribution to Stack Overflow. I've been reading for months and never had anything useful to add until now. Just want to say generally 'thanks to everyone for all the help.')
My guess is that what you get back from the collocations() call is not a string, and that you need to serialize it. Try this instead:
import cherrypy
import nltk
from nltk.book import text4
import simplejson

class BiGrams:
    def index(self):
        c = text4.collocations(num=20)
        return simplejson.dumps(c)
    index.exposed = True

cherrypy.quickstart(BiGrams())
Take a look at the source code (http://code.google.com/p/nltk/source/browse/trunk/nltk/) and you'll learn a lot (I know I did).
1) Collocations is returning to your console because that's what it is supposed to do.
help(text4.collocations)
will give you:
Help on method collocations in module nltk.text:

collocations(self, num=20, window_size=2) method of nltk.text.Text instance
    Print collocations derived from the text, ignoring stopwords.

    @seealso: L{find_collocations}
    @param num: The maximum number of collocations to print.
    @type num: C{int}
    @param window_size: The number of tokens spanned by a collocation (default=2)
    @type window_size: C{int}
Browse the source in text.py and you'll find the method for collocations is pretty straight-forward.
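As an aside, newer NLTK releases (around 3.4.5 onward, if I remember right) also expose Text.collocation_list(), which returns the collocations instead of printing them; a rough sketch of a handler built on it (the exact return type has varied between versions, hence the str() conversion):

import cherrypy
from nltk.book import text4

class BiGrams:
    def index(self):
        # collocation_list() returns the collocations rather than printing them,
        # so join them into a plain string that CherryPy can send to the browser.
        pairs = text4.collocation_list(num=20)
        return "; ".join(str(p) for p in pairs)
    index.exposed = True

cherrypy.quickstart(BiGrams())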
2) Importing nltk.book loads each of the sample texts. You could just grab the bits you need from book.py and write a method that only loads the inaugural addresses, as in the sketch below.
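A minimal sketch of that idea, mirroring how nltk/book.py builds text4 (the name argument follows the book.py source as I recall it, so double-check against your NLTK version):

from nltk.corpus import inaugural
from nltk.text import Text

# Build just the inaugural-address Text instead of importing all of nltk.book.
text4 = Text(inaugural.words(), name="Inaugural Address Corpus")
print(text4)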
