I need a language detection script. I tried the TextBlob library, which currently gives me the two-letter abbreviation of the language. How can I get the full language name?
This detects the language and returns its two-letter abbreviation:
from textblob import TextBlob
b = TextBlob("cómo estás")
language = b.detect_language()
print(language)
Actual Results : es
Expected Results : Spanish
I have the list of languages and their abbreviations from this link:
https://developers.google.com/admin-sdk/directory/v1/languages
The code you're using gives you a two-letter abbreviation that conforms to the ISO 639-1 standard. You could look up a list of these correspondences (e.g. this page) and rig up a method to just input one and output the other, but given you're programming in Python, someone's already done that for you.
I recommend pycountry, a general-purpose library for this type of task that also covers a number of other standards. Here is an example of using it for this problem:
from textblob import TextBlob
import pycountry
b = TextBlob("நீங்கள் எப்படி இருக்கிறீர்கள்")
iso_code = b.detect_language()
# iso_code = "ta"
language = pycountry.languages.get(alpha_2=iso_code)
# language = Language(alpha_2='ta', alpha_3='tam', name='Tamil', scope='I', type='L')
print(language.name)
and that prints Tamil, as expected. Same works for Spanish:
>>> pycountry.languages.get(alpha_2='es').name
'Spanish'
and probably for most other languages you'll encounter in whatever it is you're doing.
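If you'd rather not add a dependency, the "input one, output the other" lookup described above really is just a dictionary. A minimal sketch follows; the table holds only a few entries for illustration (you would fill it from the list linked in the question), and `language_name` is a hypothetical helper, not part of TextBlob or pycountry:

```python
# Hypothetical hand-rolled ISO 639-1 -> English name table (only a few entries;
# extend it from a full list, or use pycountry instead).
ISO_639_1_NAMES = {
    "en": "English",
    "es": "Spanish",
    "fr": "French",
    "de": "German",
    "ta": "Tamil",
}


def language_name(code: str, default: str = "Unknown") -> str:
    """Return the English name for a two-letter ISO 639-1 code."""
    return ISO_639_1_NAMES.get(code.lower(), default)


print(language_name("es"))  # Spanish
```

Passing `b.detect_language()` straight into `language_name` gives you the full name, with `"Unknown"` for any code missing from the table.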
Related
I work on sentiment analysis. Abbreviations are very common in natural language text. I used a spell checker to correct spelling mistakes, and one problem with this method is that it changes abbreviations into the closest English word, which hurts sentiment detection. Is there any code or method that can expand these abbreviations according to their neighboring words?
Hello, here is an example that might be useful:
import spacy
from scispacy.abbreviation import AbbreviationDetector  # importing registers the "abbreviation_detector" factory
nlp = spacy.load("en_core_web_sm")
# spaCy v3 style; with spaCy v2 use nlp.add_pipe(AbbreviationDetector(nlp)) instead
nlp.add_pipe("abbreviation_detector")
text = "stackoverflow (SO) is a question and answer site for professional and enth_usiast programmers.SO roxks!"
def replace_acronyms(text):
    doc = nlp(text)
    altered_tok = [tok.text for tok in doc]
    print(doc._.abbreviations)
    # Overwrite the first token of each detected short form with its long form
    for abrv in doc._.abbreviations:
        altered_tok[abrv.start] = str(abrv._.long_form)
    return " ".join(altered_tok)
print(replace_acronyms(text))
print(replace_acronyms("Top executives of Microsoft(MS) and General Motors (GM) met today in NewYord"))
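If installing scispacy is not an option, the core idea (an abbreviation defined in parentheses right after its long form, as in "General Motors (GM)") can be sketched in plain Python. This is a much-simplified, hypothetical version, not scispacy's algorithm: it only accepts a long form whose words' initials spell the short form exactly, so "Microsoft(MS)" or "stackoverflow (SO)" would not be caught. `find_abbreviations` and `expand_abbreviations` are illustrative names:

```python
import re


def find_abbreviations(text):
    """Map each parenthesised short form to the words just before it whose
    initials spell it, e.g. "General Motors (GM)" -> {"GM": "General Motors"}."""
    pairs = {}
    for match in re.finditer(r"\(([A-Z]{2,})\)", text):
        short = match.group(1)
        preceding = text[: match.start()].split()
        # Take one preceding word per letter of the short form
        candidate = preceding[-len(short):]
        initials = "".join(word[0].upper() for word in candidate)
        if len(candidate) == len(short) and initials == short:
            pairs[short] = " ".join(candidate)
    return pairs


def expand_abbreviations(text):
    """Drop the "(GM)" definitions, then replace later uses of each short form."""
    for short, long_form in find_abbreviations(text).items():
        text = re.sub(r"\s*\(" + re.escape(short) + r"\)", "", text)
        # A function replacement avoids backslash escapes in long_form being interpreted
        text = re.sub(r"\b" + re.escape(short) + r"\b", lambda _m: long_form, text)
    return text


print(expand_abbreviations(
    "Top executives of General Motors (GM) met today. GM said the talks went well."))
# Top executives of General Motors met today. General Motors said the talks went well.
```

Running the expansion before the spell checker keeps it from "correcting" short forms like GM into unrelated English words.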
I am looking for algorithms that can tell me the language of a text (e.g. Hello - English, Bonjour - French, Servicio - Spanish) and also correct typos in English words. I have already explored TextBlob (whose language detection and translation call Google Translate). It is very relevant, but it fails with a "Too many requests" error as soon as my code starts executing. I also started exploring Polyglot, but I am having a lot of trouble installing the library on Windows.
Code for TextBlob:
import time
from tkinter import filedialog
import pandas as pd
from textblob import TextBlob
colnames = ['Word']
x = filedialog.askopenfilename(title='Select the word list')
print("Data to be checked: " + x)
df = pd.read_excel(x, sheet_name='Sheet1', header=0, names=colnames, na_values='?', dtype=str)
words = df['Word']
Language_detector = pd.DataFrame(columns=['Word', 'Language', 'corrected_word', 'translated_word'])
for i, word in enumerate(words):
    b = TextBlob(word)
    language_word = b.detect_language()
    time.sleep(0.5)  # pause between requests to Google
    # Reset per word so a value from the previous iteration is never reused
    corrected_word = None
    translated_word = None
    if language_word.lower() == 'en':
        corrected_word = str(b.correct())
        time.sleep(0.5)
    else:
        translated_word = str(b.translate(to='en'))
        time.sleep(0.5)
    Language_detector.loc[i, 'Word'] = word
    Language_detector.loc[i, 'Language'] = language_word
    Language_detector.loc[i, 'corrected_word'] = corrected_word
    Language_detector.loc[i, 'translated_word'] = translated_word
filename = "Language detector test v 1.xlsx"
Language_detector.to_excel(filename, sheet_name='Sheet1')
print("Languages identified for the word list")
A common way to classify languages is to gather summary statistics on letter or word frequencies and compare them to a known corpus. A naive Bayes classifier would suffice. See https://pypi.org/project/Reverend/ for a way to do this in Python.
Typos can also be corrected from a corpus, using a statistical model of how likely each word is versus how likely a particular typo is. See https://norvig.com/spell-correct.html for an example of how to do this in Python.
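To show what such a classifier looks like, here is a minimal from-scratch sketch (it does not use Reverend): a multinomial naive Bayes model over character trigrams with add-one smoothing and equal priors for every language. The three training sentences are toy data assumed for illustration; a real model needs a much larger corpus per language:

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Overlapping character n-grams, padded with spaces at the word edges."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesLanguageID:
    """Multinomial naive Bayes over character n-grams, add-one smoothing, uniform priors."""

    def __init__(self, n=3):
        self.n = n
        self.counts = {}  # language -> Counter of n-grams
        self.totals = {}  # language -> total number of n-grams seen
        self.vocab = set()

    def train(self, language, text):
        grams = char_ngrams(text, self.n)
        self.counts.setdefault(language, Counter()).update(grams)
        self.totals[language] = self.totals.get(language, 0) + len(grams)
        self.vocab.update(grams)

    def classify(self, text):
        grams = char_ngrams(text, self.n)
        v = len(self.vocab)

        def log_likelihood(lang):
            counts, total = self.counts[lang], self.totals[lang]
            return sum(math.log((counts[g] + 1) / (total + v)) for g in grams)

        return max(self.counts, key=log_likelihood)


clf = NaiveBayesLanguageID()
clf.train("English", "hello how are you the weather is nice today thank you very much")
clf.train("French", "bonjour comment allez vous le temps est beau aujourd'hui merci beaucoup")
clf.train("Spanish", "hola cómo estás el tiempo está bueno hoy muchas gracias servicio")
print(clf.classify("bonjour"))  # French
```

Character n-grams are used instead of whole words so that single, unseen words (like the ones in the question's word list) still share evidence with the training text.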
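A condensed sketch in the spirit of Norvig's corrector: generate every string one or two edits away, keep the ones found in the corpus, and pick the most frequent. The one-line `CORPUS` is a placeholder assumption; Norvig's article builds the word counts from a large text file (big.txt):

```python
import re
from collections import Counter

# Placeholder corpus; replace with a large body of real English text.
CORPUS = "the quick brown fox jumps over the lazy dog the dog sleeps spelling is hard"
WORDS = Counter(re.findall(r"[a-z]+", CORPUS.lower()))
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one delete, transpose, replace or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def known(words):
    """The subset of words that appear in the corpus."""
    return {w for w in words if w in WORDS}


def correction(word):
    """Prefer the word itself, then known edits at distance 1, then 2; pick the most frequent."""
    candidates = (known([word])
                  or known(edits1(word))
                  or known(e2 for e1 in edits1(word) for e2 in edits1(e1))
                  or {word})
    return max(candidates, key=lambda w: WORDS[w])


print(correction("speling"))  # spelling
```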
You could use the langdetect or langid libraries, but they are hardly reliable:
https://github.com/hb20007/hands-on-nltk-tutorial/blob/master/8-1-The-langdetect-and-langid-Libraries.ipynb
Alternatively, you could give Compact Language Detector v3 (CLD3) or fastText a chance, or you could use a corpus to check how often the words of the target text occur in each language, to find out whether the target text belongs to the language of the respective corpus. The latter is only possible if you know the set of languages to choose from.
For typo correction, you could use the Levenshtein algorithm, which computes an edit distance. You can compare your words against a dictionary and choose the closest word. For Python, you could use: https://pypi.org/project/python-Levenshtein/
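The corpus-based option can be sketched as a simple overlap score: count how many tokens of the text appear in a list of frequent words for each candidate language, and pick the best. The tiny word lists below are placeholders assumed for illustration; in practice you would take the few hundred most frequent words from a corpus of each language:

```python
import re

# Hypothetical mini lists of frequent words per candidate language.
COMMON_WORDS = {
    "English": {"the", "and", "is", "of", "to", "in", "it", "you", "that", "hello"},
    "French": {"le", "la", "et", "est", "de", "les", "un", "une", "vous", "bonjour"},
    "Spanish": {"el", "la", "y", "es", "de", "los", "un", "una", "usted", "hola"},
}


def guess_language(text):
    """Return the language whose word list covers the most tokens, or None if none match."""
    tokens = re.findall(r"\w+", text.lower())
    scores = {lang: sum(tok in words for tok in tokens)
              for lang, words in COMMON_WORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(guess_language("el gato y la casa"))  # Spanish
```

This works well for sentences but poorly for single words, which is where the character-level approaches above do better.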
See the concept of Levenshtein edit distance here: https://en.wikipedia.org/wiki/Levenshtein_distance
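To show what the edit distance computes, here is the standard dynamic-programming version in pure Python, plus a hypothetical `closest_word` helper that picks the dictionary entry with the smallest distance. The python-Levenshtein package provides a much faster C implementation of the same distance:

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def closest_word(word, dictionary):
    """Dictionary entry with the smallest edit distance to word (first one wins ties)."""
    return min(dictionary, key=lambda w: levenshtein(word, w))


print(levenshtein("kitten", "sitting"))                    # 3
print(closest_word("helllo", ["hello", "help", "world"]))  # hello
```

Only two rows of the distance table are kept at a time, so memory stays proportional to the length of the shorter loop rather than the full table.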
I have been trying to make an artificial intelligence in Python. What I have been trying to do is make input command responses target one word. For example, if the user types "whats your name", it should give the same response as "name" by targeting the word "name". How can I do this?
What you're looking for is a library for part-of-speech (POS) tagging. Luckily it's pretty well-trodden ground, and there are libraries for lots of different languages, including Python. Have a look at the Natural Language Toolkit (NLTK), which can also call Stanford's POS tagger. Here's an example from the linked article:
>>> from nltk.tag.stanford import StanfordPOSTagger
>>> english_postagger = StanfordPOSTagger('models/english-bidirectional-distsim.tagger', 'stanford-postagger.jar')
>>> english_postagger.tag('this is stanford postagger in nltk for python users'.split())
[('this', 'DT'),
('is', 'VBZ'),
('stanford', 'JJ'),
('postagger', 'NN'),
('in', 'IN'),
('nltk', 'NN'),
('for', 'IN'),
('python', 'NN'),
('users', 'NNS')]
The NN, VBZ, etc. tags you can see are part-of-speech tags. It looks like you're looking for nouns (NN).
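If POS tagging feels like overkill, a simpler approach (swapped in here, not the one the answer describes) is to split the input into words and look each one up in a table of keyword responses. The `RESPONSES` table and `respond` function are hypothetical; you could combine this with a tagger by only checking the nouns it returns:

```python
import re

# Hypothetical keyword -> response table.
RESPONSES = {
    "name": "My name is PyBot.",
    "age": "I was created recently.",
    "weather": "I can't check the weather yet.",
}


def respond(user_input, default="Sorry, I don't understand."):
    """Answer with the response for the first known keyword in the input."""
    for word in re.findall(r"[a-z']+", user_input.lower()):
        if word in RESPONSES:
            return RESPONSES[word]
    return default


print(respond("whats your name"))  # My name is PyBot.
```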
I am trying to translate a large number of text files from English into several other languages. We use Python in our project, and we plan to translate them with the Google translation service first and then correct the mistakes manually.
I have come up with two ways to translate:
Use a Python wrapper for Google Translate, e.g. goslate 1.1.2 (Python package).
Program against the Google Translate web page: feed in the text we want to translate, simulate the HTTP request and process the response.
Does anyone have a better suggestion?
I made my own Google Translate function for Python ;)
Try it: https://github.com/mouuff/Google-Translate-API
Google does in fact have an official translation API with a REST interface. You can check it out here. Note that it is a paid API with no free quota.
Try using the googletrans module. For example:
from googletrans import Translator
translator = Translator()  # initialize the Translator object
translations = translator.translate(['see if this helps', 'tarun'], dest='hi')  # translate two phrases to Hindi
for translation in translations:  # print every translation
    print(translation.text)
# Output:
# देखें कि इस मदद करता है
# तरुण
The dicts of the supported languages (106) and their ISO 639-1 codes:
import googletrans
print(googletrans.LANGCODES) # {language name: iso639-1 language code}
# or
print(googletrans.LANGUAGES) # {iso639-1 language code: language name}
See the docs for more information.
One of the simplest ways is to use Selenium for getting the translations of the words and phrases.
Here is a piece of code that takes an English word and returns its Persian (Farsi) translation. Everything is explained in the readme file on GitHub:
https://github.com/mnosrati/Google-Translate-Farsi
Use this.
This code uses the googletrans module, which is free to use.
With it you can translate between any pair of supported languages and also get the pronunciation.
from googletrans import Translator, LANGUAGES
lang = list(LANGUAGES.values())
print("Welcome to Py_Guy Translate")
input_text = input("Please enter your text in English:\n")
out_lang = input("Please enter the output language name (e.g. hindi, gujarati, japanese):\n").lower()
if out_lang not in lang:
    print("Sorry, this language is not available for translation")
else:
    translator = Translator()
    translated = translator.translate(text=input_text, src="english", dest=out_lang)
    # Read the Translated object's attributes instead of parsing its string form
    print(translated.text)
    print(translated.pronunciation)
import os
from google.cloud import translate_v2 as translate
def translate_text(target, text):
    """Translates text into the target language.
    Target must be an ISO 639-1 language code.
    See https://g.co/cloud/translate/v2/translate-reference#supported_languages
    """
    # Path to the service-account key file; the client library reads this variable
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "secret.json"
    translate_client = translate.Client()
    if isinstance(text, bytes):
        text = text.decode("utf-8")
    # Text can also be a sequence of strings, in which case this method
    # will return a sequence of results for each text.
    result = translate_client.translate(text, target_language=target)
    return result["translatedText"]
Check out the complete code for translating with the Google API:
https://neculaifantanaru.com/en/example-google-translate-api-key-python-code-beautifulsoup.html
Since this post was originally written, connecting to the Google Translate API has become a whole lot easier. That being said, I would still recommend connecting directly to the Google Translate API, but now through its RapidAPI page here.
You can find out how to obtain an API key here. Just enter the API key on the API's function page on RapidAPI and click TEST Function. For example, you can try a basic English-to-German translation there.
Just note that de is the language code for German. RapidAPI will generate a code snippet for you so you can just copy and paste the API call directly into your project.
What is the best way to approach writing a program in Python to translate English words and/or phrases into other languages?
AJAX Language API
This is an incredibly difficult problem -- language is very very very complicated. Think about all the things you'd have to do -- parse the phrase, work out what the words mean, translate them. That's probably not idiomatic so you'll need special cases for different syntaxes. Many, many special cases. You'll need to work out the syntax of the foreign language if it differs from English -- "the big green ball" goes to "the ball big green" in Spanish, for instance.
Don't reinvent the wheel. Google provide an API to their translation service, which has undoubtedly had many many clever people thinking really quite hard about it.
I think you should look into the Google Translate API. Here is a library implemented specifically for this purpose in python.
The simplest way to do this is to make a dictionary that maps one language's words to another language's words. However, this is extremely naive: it would not take grammar into account at all, and it would take a very long time to build a translator this way, especially if you plan to support multiple languages. If grammar is not important to you (for example, if you are creating your own language for a game or story whose grammar doesn't differ from English), then you could get away with using dictionaries and simply having a function look up a requested word in the dictionary.
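A minimal sketch of that dictionary approach, assuming a tiny hypothetical English-to-Spanish word table. It translates word by word and leaves unknown words unchanged, so it inherits all the grammar problems described above:

```python
# Hypothetical English -> Spanish mini-dictionary; no grammar handling at all.
EN_TO_ES = {"the": "el", "cat": "gato", "eats": "come", "fish": "pescado"}


def translate_words(sentence, dictionary):
    """Translate word by word, leaving words missing from the dictionary unchanged."""
    return " ".join(dictionary.get(word, word) for word in sentence.lower().split())


print(translate_words("The cat eats fish", EN_TO_ES))  # el gato come pescado
```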
# Install first: pip install mtranslate
>>> from mtranslate import translate
>>> translate("Tranalating to kannada language (my mother tongue) ", to_language = "kn")
'ಕನ್ನಡ ಭಾಷೆಗೆ ಅನುವಾದ (ನನ್ನ ಮಾತೃಭಾಷೆ)'
You can use the Goslate package for that.
It's very easy to use.
Example:
import goslate
print(goslate.Goslate().translate('hello world', 'ar'))
The first argument is the text you want to translate, and the second argument is the language you want to translate it into.
I hope you find this answer useful.
# Install the translate package first: pip install translate
# (by default it uses the MyMemory API; Microsoft Translator is one of its optional providers)
from translate import Translator
class clsTranslate():
    def translateText(self, strString, strTolang):
        self.strString = strString
        self.strTolang = strTolang
        translator = Translator(to_lang=self.strTolang)
        translation = translator.translate(self.strString)
        return str(translation)
# Create a class object and call the translate function.
# Pass the target language as a parameter, e.g. de: German, zh: Chinese
objTrans = clsTranslate()
strTranslatedText = objTrans.translateText('How are you', 'de')
print(strTranslatedText)
It's very, very easy if you use deep-translator! Here's the source code (make sure to install the deep-translator module first):
import time
from deep_translator import GoogleTranslator
def translate():
    line_to_translate = input('Which line/phrase/word do you want to translate?\n')
    to_lang = input('Which language do you want to translate it into?\n').lower()
    return GoogleTranslator(source='auto', target=to_lang).translate(text=line_to_translate)
def start():
    while True:
        print(translate())
        time.sleep(1)
        esc = input("Enter 'q' to exit and 'r' to restart.\n")
        # Keep asking until the user enters a valid option
        while esc.lower() not in {'q', 'r'}:
            print('Please enter a valid option!!')
            time.sleep(1)
            esc = input("Enter 'q' to exit and 'r' to restart.\n")
        if esc.lower() == 'q':
            return
        # 'r' falls through and the outer loop asks for a new phrase
start()