Incomplete list of synset hypernyms in NLTK's WordNet? - python

While trying to recover any given WordNet synset's hypernyms through NLTK's WordNet interface, I am getting what I think are different results from WN's web search interface. For example:
from nltk.corpus import wordnet as wn
bank6ss = wn.synsets("bank")[5] # 'bank' as gambling house funds
bank6ss.hypernyms()
# returns [Synset('funds.n.01')]
That is, only one hypernym found (no others are found with, for instance, instance_hypernyms()). However, when looking at WN's web interface, this sense of 'bank' lists several other hypernyms under "Direct hypernym":
funds, finances, monetary resource, cash in hand, pecuniary resource
What would explain this difference, and how could I get that longer list of hypernyms in NLTK's WordNet?
The WordNet version used in my NLTK installation is 3.0.

I just realized that I was looking at two different types of output: what NLTK's WordNet returns is a hypernym synset (Synset('funds.n.01')), while the list of hypernyms in the web interface is composed of the lemmas belonging to that one synset.
To fully answer the question, this list of lemmas can be recovered in NLTK as follows:
from nltk.corpus import wordnet as wn
bank6ss = wn.synsets("bank")[5] # 'bank' as gambling house funds
hn1ss = bank6ss.hypernyms()[0]
hn1ss.lemmas()
# returns [Lemma('funds.n.01.funds'),
# Lemma('funds.n.01.finances'),
# Lemma('funds.n.01.monetary_resource'),
# Lemma('funds.n.01.cash_in_hand'),
# Lemma('funds.n.01.pecuniary_resource')]
Or, if only lemma names are of interest:
hn1ss.lemma_names()
# returns [u'funds',
# u'finances',
# u'monetary_resource',
# u'cash_in_hand',
# u'pecuniary_resource']

Related

Can we create a simple thesaurus from a field in a dataframe?

I am trying to find synonyms and antonyms for one word, using strings from a field in a dataframe and not a standard wordnet.synsets lexical database. I'm pretty sure this is possible, but I'm not sure how to feed in the appropriate data source (my specific field).
For instance, the code below works fine.
import nltk
from nltk.corpus import wordnet #Import wordnet from the NLTK
syn = list()
ant = list()
for synset in wordnet.synsets("fake"):
    for lemma in synset.lemmas():
        syn.append(lemma.name())  # add the synonyms
        if lemma.antonyms():  # when antonyms are available, add them to the list
            ant.append(lemma.antonyms()[0].name())
print('Synonyms: ' + str(syn))
print('Antonyms: ' + str(ant))
I tried to convert the field to an array, and use that...
import pandas as pd
import nltk.corpus
from nltk.corpus import stopwords, wordnet
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer, WordNetLemmatizer
df = pd.read_csv("C:\\my_path\\dataset.csv")
df['review_text'] = df['review_text'].astype(str)
type(df)
df.dtypes
asarray = pd.array(df['review_text'])
import nltk
from nltk.corpus import wordnet #Import wordnet from the NLTK
syn = list()
ant = list()
for synset in wordnet.asarray('fake'):
    for lemma in df['review_text'].iterrows():
        syn.append(lemma.name())  # add the synonyms
        if lemma.antonyms():  # when antonyms are available, add them to the list
            ant.append(lemma.antonyms()[0].name())
print('Synonyms: ' + str(syn))
print('Antonyms: ' + str(ant))
When I run that, I get this error:
AttributeError: 'WordNetCorpusReader' object has no attribute 'asarray'
The field in the dataframe looks like this:
feels comfortable i wear day!
package came end missing box. since it’s gift i update actual fit.
birkenstock amazing shoe!!!! i wish i ten pairs!
delivered advertised.... shoe looks & fits expected. leather color seems bit lighter one seen store, still satisfactory.
second pair i had. nothing beats them.
These are the first 5 rows. Maybe the issue is related to this token (not sure): it’s. The curly apostrophe looks like a typo or an encoding artifact.
The error happens on:
for synset in wordnet.asarray('fake'):
where wordnet is an object from nltk.corpus, i.e. a WordNetCorpusReader, which has no attribute asarray. Before the for loop you have
asarray = pd.array(df['review_text'])
which reads the column into a pandas array bound to the variable asarray, but that variable is unrelated to the WordNetCorpusReader.

How to analyze nouns in a list

I would like to know if there is a way to analyze nouns in a list. For example, if there is an algorithm that discern different categories, so like if the noun is part of the category "animal", "plants", "nature" and so on.
I thought it was possible to achieve this result with Wordnet, but, if I am not wrong, all the nouns in WordNet are categorized as "entity". Here is a script of my WordNet analysis:
from nltk.corpus import wordnet as wn

lemmas = ['dog', 'cat', 'garden', 'ocean', 'death', 'joy']
hypernyms = []
for i in lemmas:
    dog = wn.synsets(i)[0]
    temp_list = []
    hypernyms_list = [lemma.name() for synset in dog.root_hypernyms() for lemma in synset.lemmas()]
    temp_list.append(hypernyms_list)
    flat = list(set([item for sublist in temp_list for item in sublist]))
    hypernyms.append(flat)
hypernyms
And the result is: [['entity'], ['entity'], ['entity'], ['entity'], ['entity'], ['entity']].
Can anybody suggest some techniques to retrieve the category the nouns belong to, if anything like this is available?
Thanks in advance.
One approach I can suggest is using Google's NLP API, which can identify part of speech as part of its syntax analysis. Please refer to the documentation here:
Google's NLP API - Syntax Analysis
Another option is Stanford's NLP API. Here are the reference docs: Stanford's NLP API

How to get Synonyms for ngrams using sentiwordnet in nltk Python

I've been trying to get synonyms for the words I pass. That's a piece of cake for WordNet. However, I'm trying and failing with bigrams,
viz. 'i tried', 'horrible food', 'people go'
I read the SentiWordNet documentation to learn more, but there seem to be no examples of how to use it, and going through the source didn't help either. I wrote the code in the following way; please point out what needs correction:
from nltk.corpus import sentiwordnet as swn
sentisynset = swn.senti_synset('so horrible')
print(sentisynset)
Well, this clearly returns a ValueError, though I'm not sure why.
Also, I tried this:
from nltk.corpus.reader.lin import LinThesaurusCorpusReader as ltcr
synon = ltcr.synonyms(ngram='so horrible')
print(synon)
and this returns a TypeError, asking for the self parameter to be filled.

How to get all synonyms from NLTK WordNet Interface the way the CLI does

I am trying to obtain synonyms for the word arbitrary using the following code:
from nltk.corpus import wordnet as wn
for i, j in enumerate(wn.synsets('arbitrary')):
    print("Meaning", i, "NLTK ID:", j.name())
    print("Definition:", j.definition())
    print("Synonyms:", ", ".join(j.lemma_names()))
With the following result:
Meaning 0 NLTK ID: arbitrary.a.01
Definition: based on or subject to individual discretion or preference or sometimes impulse or caprice
Synonyms: arbitrary
However, when I use CLI, I get
$ wn arbitrary -synsa
Similarity of adj arbitrary
1 sense of arbitrary
Sense 1
arbitrary (vs. nonarbitrary)
=> absolute
=> capricious, impulsive, whimsical
=> discretionary, discretional
As you can see, CLI shows more synonyms.
I have tried to set the WNSEARCHDIR env variable to the NLTK corpora directory, so that the CLI tool uses the same dictionary NLTK does.

Does the WordNet API have a function ".is_parent_of()"?

I want to check whether 'worda' is a hypernym of 'wordb', according to WordNet's word hierarchy.
Does NLTK's WordNet API have a function like
worda.is_parent_of(wordb)
Thanks
There is the hypernyms() method for synsets. Also, lowest_common_hypernyms() can be useful. Bear in mind that synsets can contain more than one word. Some example code to navigate WordNet can be found below.
from nltk.corpus import wordnet as wn
right_whale = wn.synset('right_whale.n.01')
orca = wn.synset('orca.n.01')
print(orca)
print(right_whale.lowest_common_hypernyms(orca))
baleen_whale = right_whale.hypernyms()[0]
print(baleen_whale)
