I am trying to go through a loop of strings (alias names) in order to apply the unidecode, and to avoid the error of not having aliases words in some cases, I am using an if/else to check if there is a string or not, so I can apply the unidecode.
I tried by checking the length of the word by doing this code, but it keeps saying "list' object has no attribute 'element". I am not sure what should I do.
alias=record.findall("./person/names/aliases/alias")
alias_name=[element.text for element in alias]
if len(alias_name.element.text)!=0:
alias_names_str = ','.join(alias_name)
alias_names_str=unidecode.unidecode(alias_names_str)
else:
alias_name.element.text="NONE"
Not exactly sure what you are trying to accomplish, but it looks like an XML or HTML decoder.
# alias is an iterable of elements
alias=record.findall("./person/names/aliases/alias")
# alias_names is a list of strings, you have already dereferenced the text
alias_names=[element.text for element in alias]
if len(alias_names)>0:
alias_names_str = ','.join(alias_names)
alias_names_str = unidecode.unidecode(alias_names_str)
else:
alias_names_str = "NONE"
thank you! yes, I want to apply the unidecode to remove the diacritics of those strings. Your code helped, then I needed to change some details and it worked!
Thank you so much! :)
I wanna build a spell correction using python and I try to use pyspellchecker, because I have to build my own dictionary and I think pyspellchecker is easy to use with our own model or dictionary. My problem is, how to load and return my word with case_sensitive is On?
I have tried this:
spell = SpellChecker(language=None, case_sensitive=True)
but when I load my file contains many text like 'Hello' with this code:
spell.word_frequency.load_text_file('myfile.txt')
and when I start to spell with spell.correction('Hello') its return 'hello' (lower case).
Do you know how to build our own model or dictionary with our letters not diminished or it stays uppercase?
Or if you have a recommendation for spell-checking with our own model please let me know, Thank you!
Try this:
from spellchecker import SpellChecker
spell = SpellChecker(language=None, case_sensitive=True)
a = spell.word_frequency.load_words(["Hello", "HELLO", "I", "AM", "Alok", "Mishra"])
# find those words that may be misspelled
misspelled = spell.unknown(["helo", "Alk", "Mishr"])
for word in misspelled:
# Get the one `most likely` answer
print(spell.correction(word))
# Get a list of `likely` options
print(spell.candidates(word))
Output:
Alok
{'Alok'}
Hello
{'Hello'}
Mishra
{'Mishra'}
I am trying to name keys in my dictionary in a generic way because the name will be based on the data I get from a file. I am a new beginner to Python and I am not able to solve it, hope to get answer from u guys.
For example:
from collections import defaultdict
dic = defaultdict(dict)
dic = {}
if cycle = fergurson:
dic[cycle] = {}
if loop = mourinho:
a = 2
dic[cycle][loop] = {a}
Sorry if there is syntax error or any other mistake.
The variable fergurson and mourinho will be changing due to different files that I will import later on.
So I am expecting to see my output when i type :
dic[fergurson][mourinho]
the result will be:
>>>dic[fergurson][mourinho]
['2']
It will be done by using Python
Naming things, as they say, is one of the two hardest problems in Computer Science. That and cache invalidation and off-by-one errors.
Instead of focusing on what to call it now, think of how you're going to use the variable in your code a few lines down.
If you were to read code that was
for filename in directory_list:
print filename
It would be easy to presume that it is printing out a list of filenames
On the other hand, if the same code had different names
for a in b:
print a
it would be a lot less expressive as to what it is doing other than printing out a list of who knows what.
I know that this doesn't help what to call your 'dic' variable, but I hope that it gets you on the right track to find the right one for you.
i have found a way, if it is wrong please correct it
import re
dictionary={}
dsw = "I am a Geography teacher"
abc = "I am a clever student"
search = re.search(r'(?<=Geography )(.\w+)',dsw)
dictionary[search]={}
again = re.search(r'(?<=clever )(.\w+)' abc)
dictionary[search][again]={}
number = 56
dictionary[search][again]={number}
and so when you want to find your specific dictionary after running the program:
dictionary["teacher"]["student"]
you will get
>>>'56'
This is what i mean to
I'm fairly new to Python and NLTK. I am busy with an application that can perform spell checks (replaces an incorrectly spelled word with the correct one).
I'm currently using the Enchant library on Python 2.7, PyEnchant and the NLTK library. The code below is a class that handles the correction/replacement.
from nltk.metrics import edit_distance
class SpellingReplacer:
def __init__(self, dict_name='en_GB', max_dist=2):
self.spell_dict = enchant.Dict(dict_name)
self.max_dist = 2
def replace(self, word):
if self.spell_dict.check(word):
return word
suggestions = self.spell_dict.suggest(word)
if suggestions and edit_distance(word, suggestions[0]) <= self.max_dist:
return suggestions[0]
else:
return word
I have written a function that takes in a list of words and executes replace() on each word and then returns a list of those words, but spelled correctly.
def spell_check(word_list):
checked_list = []
for item in word_list:
replacer = SpellingReplacer()
r = replacer.replace(item)
checked_list.append(r)
return checked_list
>>> word_list = ['car', 'colour']
>>> spell_check(words)
['car', 'color']
Now, I don't really like this because it isn't very accurate and I'm looking for a way to achieve spelling checks and replacements on words. I also need something that can pick up spelling mistakes like "caaaar"? Are there better ways to perform spelling checks out there? If so, what are they? How does Google do it? Because their spelling suggester is very good.
Any suggestions?
You can use the autocorrect lib to spell check in python.
Example Usage:
from autocorrect import Speller
spell = Speller(lang='en')
print(spell('caaaar'))
print(spell('mussage'))
print(spell('survice'))
print(spell('hte'))
Result:
caesar
message
service
the
I'd recommend starting by carefully reading this post by Peter Norvig. (I had to something similar and I found it extremely useful.)
The following function, in particular has the ideas that you now need to make your spell checker more sophisticated: splitting, deleting, transposing, and inserting the irregular words to 'correct' them.
def edits1(word):
splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
deletes = [a + b[1:] for a, b in splits if b]
transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b)>1]
replaces = [a + c + b[1:] for a, b in splits for c in alphabet if b]
inserts = [a + c + b for a, b in splits for c in alphabet]
return set(deletes + transposes + replaces + inserts)
Note: The above is one snippet from Norvig's spelling corrector
And the good news is that you can incrementally add to and keep improving your spell-checker.
Hope that helps.
The best way for spell checking in python is by: SymSpell, Bk-Tree or Peter Novig's method.
The fastest one is SymSpell.
This is Method1: Reference link pyspellchecker
This library is based on Peter Norvig's implementation.
pip install pyspellchecker
from spellchecker import SpellChecker
spell = SpellChecker()
# find those words that may be misspelled
misspelled = spell.unknown(['something', 'is', 'hapenning', 'here'])
for word in misspelled:
# Get the one `most likely` answer
print(spell.correction(word))
# Get a list of `likely` options
print(spell.candidates(word))
Method2: SymSpell Python
pip install -U symspellpy
Maybe it is too late, but I am answering for future searches.
TO perform spelling mistake correction, you first need to make sure the word is not absurd or from slang like, caaaar, amazzzing etc. with repeated alphabets. So, we first need to get rid of these alphabets. As we know in English language words usually have a maximum of 2 repeated alphabets, e.g., hello., so we remove the extra repetitions from the words first and then check them for spelling.
For removing the extra alphabets, you can use Regular Expression module in Python.
Once this is done use Pyspellchecker library from Python for correcting spellings.
For implementation visit this link: https://rustyonrampage.github.io/text-mining/2017/11/28/spelling-correction-with-python-and-nltk.html
Try jamspell - it works pretty well for automatic spelling correction:
import jamspell
corrector = jamspell.TSpellCorrector()
corrector.LoadLangModel('en.bin')
corrector.FixFragment('Some sentnec with error')
# u'Some sentence with error'
corrector.GetCandidates(['Some', 'sentnec', 'with', 'error'], 1)
# ('sentence', 'senate', 'scented', 'sentinel')
IN TERMINAL
pip install gingerit
FOR CODE
from gingerit.gingerit import GingerIt
text = input("Enter text to be corrected")
result = GingerIt().parse(text)
corrections = result['corrections']
correctText = result['result']
print("Correct Text:",correctText)
print()
print("CORRECTIONS")
for d in corrections:
print("________________")
print("Previous:",d['text'])
print("Correction:",d['correct'])
print("`Definiton`:",d['definition'])
You can also try:
pip install textblob
from textblob import TextBlob
txt="machne learnig"
b = TextBlob(txt)
print("after spell correction: "+str(b.correct()))
after spell correction: machine learning
spell corrector->
you need to import a corpus on to your desktop if you store elsewhere change the path in the code i have added a few graphics as well using tkinter and this is only to tackle non word errors!!
def min_edit_dist(word1,word2):
len_1=len(word1)
len_2=len(word2)
x = [[0]*(len_2+1) for _ in range(len_1+1)]#the matrix whose last element ->edit distance
for i in range(0,len_1+1):
#initialization of base case values
x[i][0]=i
for j in range(0,len_2+1):
x[0][j]=j
for i in range (1,len_1+1):
for j in range(1,len_2+1):
if word1[i-1]==word2[j-1]:
x[i][j] = x[i-1][j-1]
else :
x[i][j]= min(x[i][j-1],x[i-1][j],x[i-1][j-1])+1
return x[i][j]
from Tkinter import *
def retrieve_text():
global word1
word1=(app_entry.get())
path="C:\Documents and Settings\Owner\Desktop\Dictionary.txt"
ffile=open(path,'r')
lines=ffile.readlines()
distance_list=[]
print "Suggestions coming right up count till 10"
for i in range(0,58109):
dist=min_edit_dist(word1,lines[i])
distance_list.append(dist)
for j in range(0,58109):
if distance_list[j]<=2:
print lines[j]
print" "
ffile.close()
if __name__ == "__main__":
app_win = Tk()
app_win.title("spell")
app_label = Label(app_win, text="Enter the incorrect word")
app_label.pack()
app_entry = Entry(app_win)
app_entry.pack()
app_button = Button(app_win, text="Get Suggestions", command=retrieve_text)
app_button.pack()
# Initialize GUI loop
app_win.mainloop()
pyspellchecker is the one of the best solutions for this problem. pyspellchecker library is based on Peter Norvig’s blog post.
It uses a Levenshtein Distance algorithm to find permutations within an edit distance of 2 from the original word.
There are two ways to install this library. The official document highly recommends using the pipev package.
install using pip
pip install pyspellchecker
install from source
git clone https://github.com/barrust/pyspellchecker.git
cd pyspellchecker
python setup.py install
the following code is the example provided from the documentation
from spellchecker import SpellChecker
spell = SpellChecker()
# find those words that may be misspelled
misspelled = spell.unknown(['something', 'is', 'hapenning', 'here'])
for word in misspelled:
# Get the one `most likely` answer
print(spell.correction(word))
# Get a list of `likely` options
print(spell.candidates(word))
from autocorrect import spell
for this you need to install, prefer anaconda and it only works for words, not sentences so that's a limitation u gonna face.
from autocorrect import spell
print(spell('intrerpreter'))
# output: interpreter
pip install scuse
from scuse import scuse
obj = scuse()
checkedspell = obj.wordf("spelling you want to check")
print(checkedspell)
Spark NLP is another option that I used and it is working excellent. A simple tutorial can be found here. https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/jupyter/annotation/english/spell-check-ml-pipeline/Pretrained-SpellCheckML-Pipeline.ipynb
<parent1>
<span>Text1</span>
</parnet1>
<parent2>
<span>Text2</span>
</parnet2>
<parent3>
<span>Text3</span>
</parnet3>
I'm parsing this with Python & BeautifulSoup. I have a variable soupData which stores pointer for need object. How can I get pointer for the parent2, for example, if I have the text Text2. So the problem is to filter span-tags by content. How can I do this?
After correcting the spelling on the end-tags:
[e for e in soup(recursive=False, text=False) if e.span.string == 'Text2']
I don't think there's a way to do it in a single step. So:
for parenttag in soupData:
if parenttag.span.string == "Text2":
do_stuff(parenttag)
break
It's possible to use a generator expression, but not much shorter.
Using python 2.7.6 and BeautifulSoup 4.3.2 I found Marcelo's answer to give an empty list. This worked for me, however:
[x.parent for x in bSoup.findAll('span') if x.text == 'Text2'][0]
Alternatively, for a ridiculously overengineered solution (to this particular problem at least, but maybe it would be useful if you'll be doing filtering on criteria too long to put in a reasonably easily understandable list expression) you could do:
def hasText(text):
def hasTextFunc(x):
return x.text == text
return hasTextFunc
to create a function factory, then
hasTextText2 = hasText('Text2')
filter(hasTextText2,bSoup.findAll('span'))[0].parent
to get the reference to the parent tag that you were looking for