What is the best way to approach writing a program in Python to translate English words and/or phrases into other languages?
Take a look at the Google AJAX Language API.
This is an incredibly difficult problem -- language is very, very complicated. Think about all the things you'd have to do: parse the phrase, work out what the words mean, translate them. Translating word by word probably won't be idiomatic, so you'll need special cases for different syntaxes. Many, many special cases. You'll need to work out the syntax of the foreign language where it differs from English -- "the big green ball" becomes "the ball big green" in Spanish, for instance.
Don't reinvent the wheel. Google provides an API to its translation service, which has undoubtedly had many clever people thinking quite hard about it.
I think you should look into the Google Translate API. Here is a library implemented specifically for this purpose in Python.
The simplest way to do this is to build a dictionary that maps one language's words to another language's words. However, this is extremely naive: it takes no account of grammar at all, and it would take a very long time to create a translator this way, especially for multiple languages. If grammar is not important to you (for example, if you were creating your own language for a game or story whose grammar doesn't differ from English), then you could get away with using dictionaries and a function that looks up each requested word, as sketched below.
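For illustration, a minimal sketch of that dictionary approach (the vocabulary here is invented for the example):
en_to_es = {
    "hello": "hola",
    "world": "mundo",
    "cat": "gato",
}

def translate_word(word):
    # Fall back to the original word when there is no match.
    return en_to_es.get(word.lower(), word)

print(" ".join(translate_word(w) for w in "hello world".split()))  # hola mundo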
# command : pip install mtranslate
>>> from mtranslate import translate
>>> translate("Translating to Kannada language (my mother tongue) ", to_language="kn")
'ಕನ್ನಡ ಭಾಷೆಗೆ ಅನುವಾದ (ನನ್ನ ಮಾತೃಭಾಷೆ)'
You can use the Goslate package for that. It's very easy to use. Example:
import goslate
print(goslate.Goslate().translate('hello world', 'ar'))
The first argument is the text you want to translate, and the second is the language code to translate it into. I hope you find the answer useful.
# Please install the translate package first: pip install translate
from translate import Translator

class clsTranslate():
    def translateText(self, strString, strTolang):
        self.strString = strString
        self.strTolang = strTolang
        translator = Translator(to_lang=self.strTolang)
        translation = translator.translate(self.strString)
        return str(translation)

# Create a class object and call the translate function.
# Pass the language as a parameter, e.g. de: German, zh: Chinese.
objTrans = clsTranslate()
strTranslatedText = objTrans.translateText('How are you', 'de')
print(strTranslatedText)
It's very easy if you use deep-translator! Here's the source code (make sure to install the deep-translator module first):
from deep_translator import GoogleTranslator
import time

def translate():
    line_to_translate = input('Which line/phrase/word do you want to translate?\n')
    to_lang = input('Which language do you want to translate it into?\n').lower()
    return GoogleTranslator(source='auto', target=to_lang).translate(text=line_to_translate)

def start():
    while True:
        print(translate())
        time.sleep(1)
        esc = input("Enter 'q' to exit and 'r' to restart.\n")
        while esc.lower() not in {'q', 'r'}:
            print('Please enter a valid option!')
            time.sleep(1)
            esc = input("Enter 'q' to exit and 'r' to restart.\n")
        if esc.lower() == 'q':
            return
        # 'r' simply lets the outer loop run again

start()
I would like to use Python to convert all synonyms and plural forms of words to the base version of the word.
E.g. "babies" would become "baby", and so would "infant" and "infants".
I tried creating a naive plural-to-root converter, but it doesn't always work correctly and can't handle a large number of cases.
contents = ["buying", "stalls", "responsibilities"]
roots = []
for token in contents:
    if token.endswith("ies"):
        token = token.replace('ies', 'y')
    elif token.endswith('s'):
        token = token[:-1]
    elif token.endswith("ed"):
        token = token[:-2]
    elif token.endswith("ing"):
        token = token[:-3]
    roots.append(token)  # reassigning token does not modify contents in place
print(roots)
I have not used this library before, so take this with a grain of salt. However, NodeBox Linguistics seems to be a reasonable set of scripts that will do exactly what you are looking for if you are on MacOS. Check the link here: https://www.nodebox.net/code/index.php/Linguistics
Based on their documentation, it looks like you will be able to use lines like so:
print(en.noun.singular("people"))      # person
print(en.verb.infinitive("swimming"))  # swim
etc.
In addition to the example above, another option to consider is a natural language processing library like NLTK. The reason I recommend using an external library is that English has a lot of exceptions. As mentioned in my comment, consider words like class, fling, red, geese, etc., which would trip up the rules mentioned in the original question.
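As a quick illustration, here is a minimal sketch using NLTK's WordNet lemmatizer (it needs the wordnet corpus downloaded once):
import nltk
from nltk.stem import WordNetLemmatizer

nltk.download('wordnet')  # one-time corpus download
lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("babies"))             # baby
print(lemmatizer.lemmatize("geese"))              # goose
print(lemmatizer.lemmatize("swimming", pos="v"))  # swim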
I built a Python library, Plurals and Countable, which is open source on GitHub. Its main purpose is to get plurals (yes, multiple plurals for some words), but it also solves this particular problem.
import plurals_counterable as pluc
pluc.pluc_lookup_plurals('men', strict_level='dictionary')
will return a dictionary like the following.
{
    'query': 'men',
    'base': 'man',
    'plural': ['men'],
    'countable': 'countable'
}
The base field is what you need.
The library actually looks up the words in dictionaries, so it takes some time to request, parse, and return. Alternatively, you might use the REST API provided by Dictionary.video. You'll need to contact admin#dictionary.video to get an API key. The call will look like:
import requests
import json
import logging

def get_base(word):
    url = 'https://dictionary.video/api/noun/plurals/' + word + '?key=YOUR_API_KEY'
    response = requests.get(url)
    if response.status_code == 200:
        return json.loads(response.text)['base']
    logging.error(url + ' response: status_code[%d]' % response.status_code)
    return None
I am using both NLTK and scikit-learn to do some text processing. However, within my list of documents I have some documents that are not in English. For example, the following could be true:
[ "this is some text written in English",
"this is some more text written in English",
"Ce n'est pas en anglais" ]
For the purposes of my analysis, I want all sentences that are not in English to be removed as part of pre-processing. Is there a good way to do this? I have been Googling, but cannot find anything specific that will let me recognize whether strings are in English or not. Is this functionality not offered in either NLTK or scikit-learn?
EDIT: I've seen questions like this and this, but both are for individual words, not a "document". Would I have to loop through every word in a sentence to check if the whole sentence is in English?
I'm using Python, so libraries that are in Python would be preferable, but I can switch languages if needed, just thought that Python would be the best for this.
There is a library called langdetect. It is ported from Google's language-detection, available here:
https://pypi.python.org/pypi/langdetect
It supports 55 languages out of the box.
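A minimal usage sketch (langdetect is non-deterministic by default; fix DetectorFactory.seed for repeatable results):
from langdetect import detect, DetectorFactory

DetectorFactory.seed = 0  # make results repeatable
print(detect("this is some text written in English"))  # 'en'
print(detect("Ce n'est pas en anglais"))               # 'fr'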
You might be interested in my paper The WiLI benchmark dataset for written language identification. I also benchmarked a couple of tools.
TL;DR:
CLD-2 is pretty good and extremely fast
lang-detect is a tiny bit better, but much slower
langid is good, but CLD-2 and lang-detect are much better
NLTK's Textcat is neither efficient nor effective.
You can install lidtk and classify languages:
$ lidtk cld2 predict --text "this is some text written in English"
eng
$ lidtk cld2 predict --text "this is some more text written in English"
eng
$ lidtk cld2 predict --text "Ce n'est pas en anglais"
fra
Pretrained fastText Model Worked Best for My Similar Needs
I arrived at your question with a very similar need. I appreciated Martin Thoma's answer. However, I found the most help from Rabash's answer part 7 HERE.
After experimenting to find what worked best for my needs, which were making sure 60,000+ text files were in English, I found that fasttext was an excellent tool.
With a little work, I had a tool that worked very fast over many files. Below is the code with comments. I believe that you and others will be able to modify this code for your more specific needs.
import fasttext

class English_Check:
    def __init__(self):
        # Don't need to train a model to detect languages. A model exists
        # that is very good. Let's use it.
        pretrained_model_path = 'location of your lid.176.ftz file from fasttext'
        self.model = fasttext.load_model(pretrained_model_path)

    def predict_languages(self, text_file):
        this_D = {}
        with open(text_file, 'r') as f:
            fla = f.readlines()  # fla = file line array.
            # fasttext doesn't like newline characters, but it can take
            # an array of lines from a file. The two list comprehensions
            # below just clean up the lines in fla.
            fla = [line.rstrip('\n').strip(' ') for line in fla]
            fla = [line for line in fla if len(line) > 0]
            for line in fla:  # Language-predict each line of the file.
                language_tuple = self.model.predict(line)
                # The next two lines simply get at the top language prediction
                # string AND the confidence value for that prediction.
                prediction = language_tuple[0][0].replace('__label__', '')
                value = language_tuple[1][0]
                # Each top language prediction for the lines in the file
                # becomes a unique key for the this_D dictionary.
                # Every time that language is found, add the confidence
                # score to the running tally for that language.
                if prediction not in this_D.keys():
                    this_D[prediction] = 0
                this_D[prediction] += value
        self.this_D = this_D

    def determine_if_file_is_english(self, text_file):
        self.predict_languages(text_file)
        # Find the max tallied confidence and the sum of all confidences.
        max_value = max(self.this_D.values())
        sum_of_values = sum(self.this_D.values())
        # Calculate a relative confidence of the max confidence to all
        # confidence scores. Then find the key with the max confidence.
        confidence = max_value / sum_of_values
        max_key = [key for key in self.this_D.keys()
                   if self.this_D[key] == max_value][0]
        # Only want to know if this is english or not.
        return max_key == 'en'
Below is the application / instantiation and use of the above class for my needs.
file_list = []  # some tool to get my specific list of files to check for English

en_checker = English_Check()
for file in file_list:
    check = en_checker.determine_if_file_is_english(file)
    if not check:
        print(file)
This is what I used some time ago.
It works for texts longer than 3 words and with fewer than 3 non-recognized words.
Of course, you can play with the settings, but for my use case (website scraping) those worked pretty well.
from enchant.checker import SpellChecker

max_error_count = 4
min_text_length = 3

def is_in_english(quote):
    d = SpellChecker("en_US")
    d.set_text(quote)
    errors = [err.word for err in d]
    return len(errors) <= max_error_count and len(quote.split()) >= min_text_length
print(is_in_english('“中文”'))
print(is_in_english('“Two things are infinite: the universe and human stupidity; and I\'m not sure about the universe.”'))
> False
> True
Use the enchant library:
import enchant
dictionary = enchant.Dict("en_US")  # also available are en_GB, fr_FR, etc.
dictionary.check("Hello")  # returns True
dictionary.check("Helo")   # returns False
This example is taken directly from their website
If you want something lightweight, letter trigrams are a popular approach. Every language has a different "profile" of common and uncommon trigrams. You can google around for it, or code your own. Here's a sample implementation I came across, which uses "cosine similarity" as a measure of distance between the sample text and the reference data:
http://code.activestate.com/recipes/326576-language-detection-using-character-trigrams/
If you know the common non-English languages in your corpus, it's pretty easy to turn this into a yes/no test. If you don't, you need to anticipate sentences from languages for which you don't have trigram statistics. I would do some testing to see the normal range of similarity scores for single-sentence texts in your documents, and choose a suitable threshold for the English cosine score.
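For a sense of what that looks like, here is a minimal, self-contained sketch of trigram profiling with cosine similarity (the reference text here is a toy stand-in; a real profile would be built from a large English corpus):
from collections import Counter
import math

def trigrams(text):
    text = ' ' + text.lower() + ' '
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

def cosine_similarity(a, b):
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Toy reference profile; use a large English corpus in practice.
english_profile = trigrams("the quick brown fox jumps over the lazy dog")
print(cosine_similarity(trigrams("this is some text written in English"), english_profile))
print(cosine_similarity(trigrams("Ce n'est pas en anglais"), english_profile))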
import enchant

def check(text):
    dictionary = enchant.Dict("en_US")  # also available are en_GB, fr_FR, etc.
    for word in text.split():
        if not dictionary.check(word):
            return False
    return True
I'm fairly new to Python and NLTK. I am busy with an application that can perform spell checks (replaces an incorrectly spelled word with the correct one).
I'm currently using the Enchant library on Python 2.7, PyEnchant and the NLTK library. The code below is a class that handles the correction/replacement.
import enchant
from nltk.metrics import edit_distance

class SpellingReplacer:
    def __init__(self, dict_name='en_GB', max_dist=2):
        self.spell_dict = enchant.Dict(dict_name)
        self.max_dist = max_dist  # was hard-coded to 2, ignoring the parameter

    def replace(self, word):
        if self.spell_dict.check(word):
            return word
        suggestions = self.spell_dict.suggest(word)
        if suggestions and edit_distance(word, suggestions[0]) <= self.max_dist:
            return suggestions[0]
        else:
            return word
I have written a function that takes in a list of words and executes replace() on each word and then returns a list of those words, but spelled correctly.
def spell_check(word_list):
    replacer = SpellingReplacer()  # create one replacer instead of one per word
    checked_list = []
    for item in word_list:
        checked_list.append(replacer.replace(item))
    return checked_list

>>> word_list = ['car', 'colour']
>>> spell_check(word_list)
['car', 'color']
Now, I don't really like this because it isn't very accurate, and I'm looking for a better way to perform spelling checks and replacements on words. I also need something that can pick up spelling mistakes like "caaaar". Are there better ways to perform spelling checks out there? If so, what are they? How does Google do it? Their spelling suggester is very good.
Any suggestions?
You can use the autocorrect lib to spell check in python.
Example Usage:
from autocorrect import Speller
spell = Speller(lang='en')
print(spell('caaaar'))
print(spell('mussage'))
print(spell('survice'))
print(spell('hte'))
Result:
caesar
message
service
the
I'd recommend starting by carefully reading this post by Peter Norvig. (I had to do something similar and I found it extremely useful.)
The following function in particular contains the ideas you need to make your spell checker more sophisticated: splitting, deleting, transposing, and inserting characters to generate candidate corrections for irregular words.
alphabet = 'abcdefghijklmnopqrstuvwxyz'

def edits1(word):
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits for c in alphabet if b]
    inserts = [a + c + b for a, b in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)
Note: The above is one snippet from Norvig's spelling corrector
And the good news is that you can incrementally add to and keep improving your spell-checker.
Hope that helps.
The best ways to do spell checking in Python are SymSpell, a BK-tree, or Peter Norvig's method.
The fastest one is SymSpell.
Method 1: pyspellchecker (reference link: pyspellchecker).
This library is based on Peter Norvig's implementation.
pip install pyspellchecker
from spellchecker import SpellChecker

spell = SpellChecker()

# find those words that may be misspelled
misspelled = spell.unknown(['something', 'is', 'hapenning', 'here'])

for word in misspelled:
    # Get the one `most likely` answer
    print(spell.correction(word))
    # Get a list of `likely` options
    print(spell.candidates(word))
Method2: SymSpell Python
pip install -U symspellpy
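A minimal usage sketch for symspellpy, following the project's documented usage (it ships with an English frequency dictionary):
import pkg_resources
from symspellpy import SymSpell, Verbosity

sym_spell = SymSpell(max_dictionary_edit_distance=2, prefix_length=7)
dictionary_path = pkg_resources.resource_filename(
    "symspellpy", "frequency_dictionary_en_82_765.txt")
sym_spell.load_dictionary(dictionary_path, term_index=0, count_index=1)

for suggestion in sym_spell.lookup("hapenning", Verbosity.CLOSEST,
                                   max_edit_distance=2):
    print(suggestion.term)  # e.g. happening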
Maybe it is too late, but I am answering for future searches.
To perform spelling-mistake correction, you first need to make sure the word is not absurd or slang like caaaar, amazzzing, etc., with repeated letters. As we know, English words usually have at most 2 repeated letters (e.g., hello), so we remove the extra repetitions first and then check the word for spelling.
To remove the extra letters, you can use the Regular Expression module in Python.
Once this is done, use the pyspellchecker library to correct the spellings.
For implementation visit this link: https://rustyonrampage.github.io/text-mining/2017/11/28/spelling-correction-with-python-and-nltk.html
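A minimal sketch of that two-step idea (the two-repeat rule is a heuristic and the expected outputs are illustrative):
import re
from spellchecker import SpellChecker

def reduce_repeats(word):
    # Collapse runs of 3+ of the same letter down to 2: "caaaar" -> "caar".
    return re.sub(r'(.)\1{2,}', r'\1\1', word)

spell = SpellChecker()
print(spell.correction(reduce_repeats("caaaar")))     # likely "car"
print(spell.correction(reduce_repeats("amazzzing")))  # likely "amazing"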
Try jamspell - it works pretty well for automatic spelling correction:
import jamspell
corrector = jamspell.TSpellCorrector()
corrector.LoadLangModel('en.bin')
corrector.FixFragment('Some sentnec with error')
# u'Some sentence with error'
corrector.GetCandidates(['Some', 'sentnec', 'with', 'error'], 1)
# ('sentence', 'senate', 'scented', 'sentinel')
IN TERMINAL
pip install gingerit
FOR CODE
from gingerit.gingerit import GingerIt
text = input("Enter text to be corrected")
result = GingerIt().parse(text)
corrections = result['corrections']
correctText = result['result']
print("Correct Text:",correctText)
print()
print("CORRECTIONS")
for d in corrections:
    print("________________")
    print("Previous:", d['text'])
    print("Correction:", d['correct'])
    print("Definition:", d['definition'])
You can also try:
pip install textblob
from textblob import TextBlob
txt="machne learnig"
b = TextBlob(txt)
print("after spell correction: "+str(b.correct()))
Output: after spell correction: machine learning
Spell corrector:
You need to import a corpus onto your desktop (if you store it elsewhere, change the path in the code). I have added a few graphics as well using tkinter, and this only tackles non-word errors!
def min_edit_dist(word1, word2):
    len_1 = len(word1)
    len_2 = len(word2)
    # The matrix whose last element is the edit distance.
    x = [[0] * (len_2 + 1) for _ in range(len_1 + 1)]
    for i in range(0, len_1 + 1):
        # Initialization of base case values.
        x[i][0] = i
    for j in range(0, len_2 + 1):
        x[0][j] = j
    for i in range(1, len_1 + 1):
        for j in range(1, len_2 + 1):
            if word1[i - 1] == word2[j - 1]:
                x[i][j] = x[i - 1][j - 1]
            else:
                x[i][j] = min(x[i][j - 1], x[i - 1][j], x[i - 1][j - 1]) + 1
    return x[len_1][len_2]
from Tkinter import *

def retrieve_text():
    global word1
    word1 = app_entry.get()
    path = r"C:\Documents and Settings\Owner\Desktop\Dictionary.txt"
    ffile = open(path, 'r')
    lines = ffile.readlines()
    distance_list = []
    print "Suggestions coming right up, count till 10"
    for i in range(len(lines)):
        # Strip the trailing newline so it doesn't inflate the distance.
        dist = min_edit_dist(word1, lines[i].strip())
        distance_list.append(dist)
    for j in range(len(lines)):
        if distance_list[j] <= 2:
            print lines[j]
            print " "
    ffile.close()
if __name__ == "__main__":
    app_win = Tk()
    app_win.title("spell")
    app_label = Label(app_win, text="Enter the incorrect word")
    app_label.pack()
    app_entry = Entry(app_win)
    app_entry.pack()
    app_button = Button(app_win, text="Get Suggestions", command=retrieve_text)
    app_button.pack()
    # Initialize GUI loop
    app_win.mainloop()
pyspellchecker is one of the best solutions for this problem. The pyspellchecker library is based on Peter Norvig's blog post.
It uses a Levenshtein distance algorithm to find permutations within an edit distance of 2 from the original word.
There are two ways to install this library. The official documentation highly recommends using the pipenv package.
install using pip
pip install pyspellchecker
install from source
git clone https://github.com/barrust/pyspellchecker.git
cd pyspellchecker
python setup.py install
The following code is the example provided in the documentation:
from spellchecker import SpellChecker

spell = SpellChecker()

# find those words that may be misspelled
misspelled = spell.unknown(['something', 'is', 'hapenning', 'here'])

for word in misspelled:
    # Get the one `most likely` answer
    print(spell.correction(word))
    # Get a list of `likely` options
    print(spell.candidates(word))
You can use autocorrect for this; you need to install it first (Anaconda is preferred). It only works for words, not sentences, so that's a limitation you're going to face.
from autocorrect import spell
print(spell('intrerpreter'))
# output: interpreter
pip install scuse
from scuse import scuse
obj = scuse()
checkedspell = obj.wordf("spelling you want to check")
print(checkedspell)
Spark NLP is another option that I used, and it works excellently. A simple tutorial can be found here: https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/jupyter/annotation/english/spell-check-ml-pipeline/Pretrained-SpellCheckML-Pipeline.ipynb
I'm looking to create a search function for my flash game website.
One of the problems with the site is that it is difficult to find a specific game you want, as users must go to the alphabetical list to find one they want.
It's run with Google App Engine written in python, using the webapp framework.
At the very least I need a simple way to search games by their name. It might be easier to do the searching in JavaScript from the looks of it. I would prefer autocomplete functionality. I've tried to figure out how to go about this, and it seems that the only way is to create a huge index with each name broken up into the various stages of being typed ("S", "Sh", "Sho" ... "Shopping Cart Hero").
Is there any way to do this simply and easily? I'm beginning to think I'll have to create a web service on a PHP+MySQL server and search using it.
I have written the code below to handle this. Basically, I save all the possible word "starts" in a list instead of whole sentences. That's how the jquery autocomplete of this site works.
import unicodedata
import re

# Split on whitespace, hyphens, parentheses and slashes. (Inside a
# character class, '|' is matched literally, so it is omitted here.)
splitter = re.compile(r'[\s\-()/]+')

def remove_accents(text):
    nkfd_form = unicodedata.normalize('NFKD', unicode(text))
    return u"".join([c for c in nkfd_form if not unicodedata.combining(c)])

def get_words(text):
    return [s.lower() for s in splitter.split(remove_accents(text)) if s != '']

def get_unique_words(text):
    return set(get_words(text))

def get_starts(text):
    word_set = get_unique_words(text)
    starts = set()
    for word in word_set:
        for i in range(len(word)):
            starts.add(word[:i + 1])
    return sorted(starts)
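For example:
>>> get_starts("Shopping Cart Hero")
['c', 'ca', 'car', 'cart', 'h', 'he', 'her', 'hero', 's', 'sh', 'sho', 'shop', 'shopp', 'shoppi', 'shoppin', 'shopping']
Each prefix goes into the index, so an autocomplete query becomes a simple equality lookup.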
Have you looked at gae-search? I believe the Django + jQuery "autocomplete" feature is not part of the free version (it's just in the for-pay premium version), but maybe it's worth a little money to you.
I am writing a game in python and have decided to create a DSL for the map data files. I know I could write my own parser with regex, but I am wondering if there are existing python tools which can do this more easily, like re2c which is used in the PHP engine.
Some extra info:
Yes, I do need a DSL, and even if I didn't I still want the experience of building and using one in a project.
The DSL contains only data (declarative?), it doesn't get "executed". Most lines look like:
SOMETHING: !abc #123 #xyz/123
I just need to read the tree of data.
I've always been impressed by pyparsing. The author, Paul McGuire, is active on the python list/comp.lang.python and has always been very helpful with any queries concerning it.
Here's an approach that works really well.
abc= ONETHING( ... )
xyz= ANOTHERTHING( ... )
pqr= SOMETHING( this=abc, that=123, more=(xyz,123) )
Declarative. Easy-to-parse.
And...
It's actually Python. A few class declarations and the work is done. The DSL is actually class declarations.
What's important is that a DSL merely creates objects. When you define a DSL, first you have to start with an object model. Later, you put some syntax around that object model. You don't start with syntax, you start with the model.
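A minimal sketch of that idea, reusing the invented names from the example above:
# The "DSL" is plain Python: constructors build the object model directly.
class Thing(object):
    def __init__(self, **attrs):
        self.attrs = attrs

class ONETHING(Thing): pass
class ANOTHERTHING(Thing): pass
class SOMETHING(Thing): pass

abc = ONETHING(color='red')
xyz = ANOTHERTHING(size=3)
pqr = SOMETHING(this=abc, that=123, more=(xyz, 123))
The map file is then an ordinary Python module you import; no parser is needed.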
Yes, there are many -- too many -- parsing tools, but none in the standard library.
From what I saw, PLY and SPARK are popular. PLY is like yacc, but you do everything in Python because you write your grammar in docstrings.
Personally, I like the concept of parser combinators (taken from functional programming), and I quite like pyparsing: you write your grammar and actions directly in python and it is easy to start with. I ended up producing my own tree node types with actions though, instead of using their default ParserElement type.
Otherwise, you can also use existing declarative language like YAML.
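For a taste of pyparsing on a line like the one in the question (the grammar below is guessed from that single example):
from pyparsing import Word, alphas, alphanums, Group, OneOrMore, Suppress, oneOf

key = Word(alphas) + Suppress(':')
value = Group(oneOf('! #') + Word(alphanums + '/'))
line = key + OneOrMore(value)

print(line.parseString('SOMETHING: !abc #123 #xyz/123'))
# roughly: ['SOMETHING', ['!', 'abc'], ['#', '123'], ['#', 'xyz/123']]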
I have written something like this at work to read in SNMP notification definitions and automatically generate Java classes and SNMP MIB files from them. Using this little DSL, I could write 20 lines of my specification and it would generate roughly 80 lines of Java code and a 100-line MIB file.
To implement this, I actually just used straight Python string handling (split(), slicing, etc.) to parse the file. I find Python's string capabilities to be adequate for most of my (simple) parsing needs.
Besides the libraries mentioned by others, if I were writing something more complex and needed proper parsing capabilities, I would probably use ANTLR, which supports Python (and other languages).
For "small languages" as the one you are describing, I use a simple split, shlex (mind that the # defines a comment) or regular expressions.
>>> line = 'SOMETHING: !abc #123 #xyz/123'
>>> line.split()
['SOMETHING:', '!abc', '#123', '#xyz/123']
>>> import shlex
>>> list(shlex.shlex(line))
['SOMETHING', ':', '!', 'abc', '#', '123']
The following is an example, as I do not know exactly what you are looking for.
>>> import re
>>> result = re.match(r'([A-Z]*): !([a-z]*) #([0-9]*) #([a-z0-9/]*)', line)
>>> result.groups()
('SOMETHING', 'abc', '123', 'xyz/123')
DSLs are a good thing, so you don't need to defend yourself :-)
However, have you considered an internal DSL? These have so many pros versus external (parsed) DSLs that they're at least worth consideration. Mixing a DSL with the power of the native language really solves lots of the problems for you, and Python is not really bad at internal DSLs, with the with statement handy.
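To illustrate the with-statement angle, a toy internal DSL that builds a tree (all names here are invented):
import contextlib

class Node:
    def __init__(self, name):
        self.name = name
        self.children = []

_stack = [Node('root')]

@contextlib.contextmanager
def node(name):
    # Each with-block nests a new child under the current node.
    n = Node(name)
    _stack[-1].children.append(n)
    _stack.append(n)
    yield n
    _stack.pop()

with node('map'):
    with node('tile'):
        pass

print([c.name for c in _stack[0].children])  # ['map']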
Along the lines of declarative Python, I wrote a helper module called 'bpyml' which lets you declare data in Python in a more XML-structured way without the verbose tags. It can be converted to/from XML too, but is valid Python.
https://svn.blender.org/svnroot/bf-blender/trunk/blender/release/scripts/modules/bpyml.py
Example Use
http://wiki.blender.org/index.php/User:Ideasman42#Declarative_UI_In_Blender
Here is a simpler approach to solving it.
What if I could extend Python syntax with new operators to introduce new functionality to the language? For example, a new operator <=> for swapping the values of two variables.
How can I implement such behavior? Here comes the ast module.
The ast module is a handy tool for handling abstract syntax trees. What's cool about this module is that it allows me to write Python code that generates a tree and then compiles it to Python code.
Let's say we want to compile a superset language (or Python-like language) to Python:
from:
a <=> b
to:
a, b = b, a
I need to convert my 'Python-like' source code into a list of tokens, so I need a tokenizer: a lexical scanner for Python source code, the tokenize module. I may use the same meta-language to define both the grammar of the new 'Python-like' language and the structure of the abstract syntax tree (AST).
Why use an AST? It is a much safer choice when evaluating untrusted code, and you can manipulate the tree before executing the code.
from tokenize import untokenize, tokenize, NUMBER, STRING, NAME, OP, COMMA
import io
import ast

s = b"a <=> b\n"  # I may read it from a file.
b = io.BytesIO(s)
g = tokenize(b.readline)

result = []
for token_num, token_val, _, _, _ in g:
    # Naive simple approach to compile a <=> b to a, b = b, a.
    if token_num == OP and token_val == '<=' and next(g).string == '>':
        first = result.pop()
        next_token = next(g)
        second = (NAME, next_token.string)
        result.extend([
            first,
            (COMMA, ','),
            second,
            (OP, '='),
            second,
            (COMMA, ','),
            first,
        ])
    else:
        result.append((token_num, token_val))

src = untokenize(result).decode('utf-8')
exp = ast.parse(src)
code = compile(exp, filename='', mode='exec')

def my_swap(a, b):
    global code
    env = {
        "a": a,
        "b": b
    }
    exec(code, env)
    return env['a'], env['b']

print(my_swap(1, 10))
Other modules using AST, whose source code may be a useful reference:
textX-LS: a DSL used to describe a collection of shapes and draw them for us.
Pony ORM: lets you write database queries using Python generators and lambdas, which it translates to SQL query strings; Pony ORM uses AST under the hood.
osso: a Role-Based Access Control framework for handling permissions.