Getting adjective from an adverb in nltk or other NLP library - python

Is there a way to get the adjective corresponding to a given adverb in NLTK or another Python library?
For example, for the adverb "terribly", I need to get "terrible".
Thanks.

There is a relation in WordNet, the pertainym, that connects adjectives to adverbs and vice versa.
>>> from itertools import chain
>>> from nltk.corpus import wordnet as wn
>>> from difflib import get_close_matches as gcm
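>>> # Older NLTK API: lemmas and name are attributes (they are methods in NLTK 3+).
>>> possible_adjectives = [k.name for k in chain(*[j.pertainyms() for j in chain(*[i.lemmas for i in wn.synsets('terribly')])])]
>>> possible_adjectives
['terrible', 'atrocious', 'awful', 'rotten']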
>>> gcm('terribly',possible_adjectives)
['terrible']
A more readable way to compute possible_adjectives (same older NLTK API, where lemmas and name are attributes) is as follows:
possible_adj = []
for ss in wn.synsets('terribly'):
    for lemma in ss.lemmas:  # all lemmas of the synset
        for p in lemma.pertainyms():  # all pertainyms of the lemma
            possible_adj.append(p.name)  # the pertainym's lemma name
EDIT: In newer versions of NLTK:
possible_adj = []
for ss in wn.synsets('terribly'):
    for lemma in ss.lemmas():  # all possible lemmas
        for ps in lemma.pertainyms():  # all possible pertainyms
            possible_adj.append(ps.name())
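The two steps from the answer above (collect the pertainyms, then pick the closest string match) can be combined into small helpers. This is a minimal sketch, assuming NLTK 3+ where lemmas(), pertainyms() and name() are methods; the names candidate_adjectives and closest_adjective are illustrative and not part of NLTK.

```python
from difflib import get_close_matches


def candidate_adjectives(adverb):
    """Collect pertainym names for every lemma of every synset of `adverb`."""
    # Imported here so the string-matching helper below works without NLTK.
    from nltk.corpus import wordnet as wn

    names = []
    for synset in wn.synsets(adverb, pos=wn.ADV):
        for lemma in synset.lemmas():
            for pertainym in lemma.pertainyms():
                if pertainym.name() not in names:  # keep order, drop repeats
                    names.append(pertainym.name())
    return names


def closest_adjective(adverb, candidates):
    """Return the candidate most similar to `adverb`, or None if none is close."""
    matches = get_close_matches(adverb, candidates, n=1)
    return matches[0] if matches else None
```

With the WordNet corpus installed, closest_adjective('terribly', candidate_adjectives('terribly')) would chain the two steps. String similarity is only a heuristic, so irregular adverbs such as "well" may not resolve this way.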

As MKoosej mentioned, NLTK's lemmas is no longer an attribute but a method. I also made a small simplification to pick the most likely word. Hope someone else can use it too:
from nltk.corpus import wordnet as wn
wordtoinv = 'unduly'
winner = ""
for ss in wn.synsets(wordtoinv):
    for lemma in ss.lemmas():  # all possible lemmas
        pertainyms = lemma.pertainyms()
        if not pertainyms:  # skip lemmas that have no pertainym
            continue
        posword = pertainyms[0].name()
        # Keep the first candidate that shares its first three letters with the adverb.
        if posword[0:3] == wordtoinv[0:3]:
            winner = posword
            break
    if winner:
        break
print(winner)  # undue

Related

Find NLTK Wordnet Synsets for combined items in a list

I am new to NLTK. I want to use NLTK to extract hyponyms for a given list of words, specifically for some compound words.
My code:
import nltk
from nltk.corpus import wordnet as wn
words = ["real_time", 'Big_data', "Healthcare",
         'Fuzzy_logic', 'Computer_vision']
def get_synset(a_list):
    synset_list = []
    for word in a_list:
        a = wn.synsets(word)[:1]  # the slice ensures each word gets its first synset only
        synset_list.append(a)
    return synset_list
lst_synsets = get_synset(words)
lst_synsets
Here is the output:
[[Synset('real_time.n.01')],
[],
[Synset('healthcare.n.01')],
[Synset('fuzzy_logic.n.01')],
[]]
How can I find NLTK WordNet synsets for compound items? If that is not possible, is there another method I could use for compound terms?
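One possible approach, offered here as an assumption rather than an established NLTK recipe: when a compound has no entry of its own in WordNet, fall back to its last component, which is usually the head noun. The helper name synsets_with_fallback and its lookup parameter are hypothetical; lookup is meant to be wn.synsets, passed in so the logic can be tried without the corpus.

```python
def synsets_with_fallback(term, lookup):
    """Return (matched_form, synsets) for `term`, falling back to its head word.

    `lookup` is any callable mapping a word to a list of synsets,
    for example nltk.corpus.wordnet.synsets.
    """
    # First try the compound as written (WordNet joins words with "_").
    candidates = [term]
    parts = term.split("_")
    if len(parts) > 1:
        # Fallback: the last component, e.g. "data" for "Big_data".
        candidates.append(parts[-1])
    for form in candidates:
        found = lookup(form)
        if found:
            return form, found
    return None, []
```

Called as synsets_with_fallback('Big_data', wn.synsets), this would try the compound first and then "data". Whether the head word is an acceptable stand-in depends on the term, so the matched form is returned alongside the synsets.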

How to extract all words in a noun food category in WordNet?

I am trying to get all the words in the WordNet dictionary that are nouns in the food category.
I have found a way to check whether a word is noun.food, but I need the reverse:
import nltk
nltk.download('wordnet')
from nltk.corpus import wordnet as wn
def if_food(word):
    syns = wn.synsets(word, pos = wn.NOUN)
    for syn in syns:
        print(syn.lexname())
        if 'food' in syn.lexname():
            return 1
    return 0
So I think I have found a solution:
# Using the NLTK WordNet dictionary, check if the word is a noun and a food.
import nltk
nltk.download('wordnet')
from nltk.corpus import wordnet as wn
def if_food(word):
    syns = wn.synsets(str(word), pos = wn.NOUN)
    for syn in syns:
        if 'food' in syn.lexname():
            return 1
    return 0
Then, using the qdapDictionaries::GradyAugmented English word list from R, I checked whether each word is a noun.food:
import pandas as pd
en_dict = pd.read_csv("GradyAugmentedENDict.csv")
en_dict['is_food'] = en_dict.word.apply(if_food)
en_dict[en_dict.is_food == 1].to_csv("en_dict_is_food.csv")
It actually did the job.
Hope it helps others.
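WordNet can also be queried in the reverse direction, without an external word list: iterate over all noun synsets and keep those whose lexicographer file is noun.food. A minimal sketch, assuming NLTK 3+ where lexname() and lemma_names() are methods; the helper name words_in_lexname is illustrative.

```python
def words_in_lexname(synsets, lexname="noun.food"):
    """Collect the lemma names of every synset filed under `lexname`."""
    words = set()
    for synset in synsets:
        if synset.lexname() == lexname:
            words.update(synset.lemma_names())
    return sorted(words)
```

With wn imported from nltk.corpus, words_in_lexname(wn.all_synsets(wn.NOUN)) would walk every noun synset once. Multi-word entries come back joined with underscores.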

How to calculate the deepest node in WordNet using NLTK?

Is there built-in functionality to find the lowest word in a word hierarchy using NLTK? For example, if there were no edge between 'placenta' and 'carnivore' in the first graph at http://www.randomhacks.net/2009/12/29/visualizing-wordnet-relationships-as-graphs/, the lowest words would be 'placenta' and 'carnivore' (both having distance 10 from 'entity').
You can find the synsets with no hyponyms, e.g.
from nltk.corpus import wordnet as wn
lowest_level = set()
for ss in wn.all_synsets():
    if ss.hyponyms() == []:
        lowest_level.add(ss)
len(lowest_level) # 97651
If you would like to also exclude synsets that have instance hyponyms:
from nltk.corpus import wordnet as wn
lowest_level = set()
for ss in wn.all_synsets():
    if ss.hyponyms() == ss.instance_hyponyms() == []:
        lowest_level.add(ss)
len(lowest_level) # 97187
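The leaf synsets above are not all equally deep. To narrow them down to the deepest ones, they can be ranked by Synset.max_depth(), which gives the length of the longest hypernym path to the root. A sketch under that assumption; deepest_synsets is an illustrative name, not an NLTK function.

```python
def deepest_synsets(synsets):
    """Return (depth, [synsets]) for the synsets with the largest max_depth()."""
    best_depth = -1
    deepest = []
    for synset in synsets:
        depth = synset.max_depth()
        if depth > best_depth:
            # Found a strictly deeper synset: restart the list.
            best_depth, deepest = depth, [synset]
        elif depth == best_depth:
            deepest.append(synset)
    return best_depth, deepest
```

It can be applied to lowest_level from the snippets above, or directly to wn.all_synsets('n') to restrict the search to the noun hierarchy.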

How to get the gloss given sense key using Nltk WordNet?

I have a set of sense keys such as "long%3:00:02::" from SemCor+OMSTI. How can I get the glosses? Is there a mapping file, or can I use NLTK WordNet?
TL;DR
import re
from nltk.corpus import wordnet as wn
sense_key_regex = r"(.*)\%(.*):(.*):(.*):(.*):(.*)"
synset_types = {1:'n', 2:'v', 3:'a', 4:'r', 5:'s'}
def synset_from_sense_key(sense_key):
    lemma, ss_type, lex_num, lex_id, head_word, head_id = re.match(sense_key_regex, sense_key).groups()
    ss_idx = '.'.join([lemma, synset_types[int(ss_type)], lex_id])
    return wn.synset(ss_idx)
x = "long%3:00:02::"
synset_from_sense_key(x)
In Long
There's this really obtuse function in NLTK. However, that doesn't read from the sense key but from data_file_map (e.g. "data.adj", "data.noun", etc.): https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L1355
Since we already have a mere-mortal understandable API in NLTK, with some guidance from https://wordnet.princeton.edu/wordnet/man/senseidx.5WN.html,
A sense_key is represented as:
lemma % lex_sense
where lex_sense is encoded as:
ss_type:lex_filenum:lex_id:head_word:head_id
(yada, yada...)
The synset type is encoded as follows:
1 NOUN
2 VERB
3 ADJECTIVE
4 ADVERB
5 ADJECTIVE SATELLITE
we can do this using a regex https://regex101.com/r/9KlVK7/1/:
>>> import re
>>> sense_key_regex = r"(.*)\%(.*):(.*):(.*):(.*):(.*)"
>>> x = "long%3:00:02::"
>>> re.match(sense_key_regex, x)
<_sre.SRE_Match object at 0x10061ad78>
>>> re.match(sense_key_regex, x).groups()
('long', '3', '00', '02', '', '')
>>> lemma, ss_type, lex_num, lex_id, head_word, head_id = re.match(sense_key_regex, x).groups()
>>> synset_types = {1:'n', 2:'v', 3:'a', 4:'r', 5:'s'}
>>> '.'.join([lemma, synset_types[int(ss_type)], lex_id])
'long.a.02'
And voilà, you get the NLTK Synset() object from the sense key =)
>>> from nltk.corpus import wordnet as wn
>>> idx = '.'.join([lemma, synset_types[int(ss_type)], lex_id])
>>> wn.synset(idx)
Synset('long.a.02')
I solved this by downloading the following:
http://wordnet.princeton.edu/glosstag.shtml
I then used the files in WordNet-3.0\glosstag\merged to build my own mapping dictionary.
The first answer gives a wrong result. Also, there are many keys in WordNet
for which no synset exists. For this reason, you can use the following code for WordNet 3.0:
import nltk
from nltk.corpus import wordnet as wn
from nltk.corpus.reader.wordnet import WordNetError
def synset_from_key(sense_key):
    lem = wn.lemma_from_key(sense_key)
    return lem.synset()
key = 'england%1:15:00::'
try:
    ss = synset_from_key(key)
    print(ss)
except WordNetError:  # raised when the key has no synset
    print("No Synset Found.")
You can also find the definition by using:
print(ss.definition())
More details can be found at: https://www.nltk.org/howto/wordnet.html
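The sense-key layout quoted above (lemma % ss_type:lex_filenum:lex_id:head_word:head_id) can also be split into named fields without a regex. A minimal sketch in pure Python; the names SenseKey and parse_sense_key are illustrative and not part of NLTK. It only separates the fields and does not look anything up in WordNet, so for resolving a key to a synset, wn.lemma_from_key as shown above remains the route to use.

```python
from collections import namedtuple

SenseKey = namedtuple(
    "SenseKey", "lemma ss_type pos lex_filenum lex_id head_word head_id"
)

# Synset type codes from the sense-key format: 1 noun, 2 verb,
# 3 adjective, 4 adverb, 5 adjective satellite.
SS_TYPE_TO_POS = {1: "n", 2: "v", 3: "a", 4: "r", 5: "s"}


def parse_sense_key(sense_key):
    """Split 'lemma%ss_type:lex_filenum:lex_id:head_word:head_id' into fields."""
    lemma, lex_sense = sense_key.split("%", 1)
    ss_type, lex_filenum, lex_id, head_word, head_id = lex_sense.split(":")
    return SenseKey(
        lemma=lemma,
        ss_type=int(ss_type),
        pos=SS_TYPE_TO_POS[int(ss_type)],
        lex_filenum=int(lex_filenum),
        lex_id=int(lex_id),
        head_word=head_word,  # empty unless the sense is an adjective satellite
        head_id=head_id,
    )
```

For example, parse_sense_key("long%3:00:02::") yields lemma 'long', pos 'a', lex_filenum 0 and lex_id 2, with empty head_word and head_id.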

How to generate a list of antonyms for adjectives in WordNet using Python

I want to do the following in Python (I have the NLTK library, but I'm not great with Python, so I've written the following in a weird pseudocode):
from nltk.corpus import wordnet as wn #Import the WordNet library
for each adjective as adj in wn #Get all adjectives from the wordnet dictionary
print adj & antonym #List all antonyms for each adjective
once list is complete then export to txt file
This is so I can generate a complete dictionary of antonyms for adjectives. I think it should be doable, but I don't know how to write the Python script. I'd like to do it in Python as that's NLTK's native language.
from nltk.corpus import wordnet as wn
for i in wn.all_synsets():
    if i.pos() in ['a', 's']:  # If synset is adj or satellite-adj.
        for j in i.lemmas():  # Iterating through lemmas for each synset.
            if j.antonyms():  # If adj has antonym.
                # Prints the adj-antonym pair.
                print(j.name(), j.antonyms()[0].name())
Note that there will be reverse duplicates.
[out]:
able unable
unable able
abaxial adaxial
adaxial abaxial
acroscopic basiscopic
basiscopic acroscopic
abducent adducent
adducent abducent
nascent dying
dying nascent
abridged unabridged
unabridged abridged
absolute relative
relative absolute
absorbent nonabsorbent
nonabsorbent absorbent
adsorbent nonadsorbent
nonadsorbent adsorbent
absorbable adsorbable
adsorbable absorbable
abstemious gluttonous
gluttonous abstemious
abstract concrete
...
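If the reverse duplicates in the output above are unwanted, the pairs can be collected first and filtered by treating each pair as unordered. A minimal sketch; unique_pairs is an illustrative helper name, and the input is assumed to be a list of (adjective, antonym) tuples gathered from the loop above instead of printed.

```python
def unique_pairs(pairs):
    """Drop reverse duplicates, keeping the first occurrence of each pair."""
    seen = set()
    result = []
    for word, antonym in pairs:
        # frozenset ignores order, so (a, b) and (b, a) share one key.
        key = frozenset((word, antonym))
        if key not in seen:
            seen.add(key)
            result.append((word, antonym))
    return result
```

The filtered list can then be written to a text file one pair per line, which covers the export step asked for in the question.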
The following function uses WordNet to return a set of adjective-only antonyms for a given word:
from nltk.corpus import wordnet as wn
def antonyms_for(word):
    antonyms = set()
    for ss in wn.synsets(word):
        for lemma in ss.lemmas():
            any_pos_antonyms = [ antonym.name() for antonym in lemma.antonyms() ]
            for antonym in any_pos_antonyms:
                antonym_synsets = wn.synsets(antonym)
                if wn.ADJ not in [ ss.pos() for ss in antonym_synsets ]:
                    continue
                antonyms.add(antonym)
    return antonyms
Usage:
print(antonyms_for("good"))
