GAE Simple Searching + Autocomplete - python

I'm looking to create a search function for my flash game website.
One of the problems with the site is that it is difficult to find a specific game you want, as users must go to the alphabetical list to find one they want.
It's run with Google App Engine written in python, using the webapp framework.
At the very least I need a simple way to search games by their name. It might be easier to do the searching in JavaScript, from the looks of it. I would prefer autocomplete functionality. I've tried to figure out how to go about this, and it seems that the only way is to create a huge index with each name broken up into the various stages of being typed ("S", "Sh", "Sho" ... "Shopping Cart Hero").
Is there any way to do this simply and easily? I'm beginning to think I'll have to create a web service on a PHP+MySQL server and search using it.

I have written the code below to handle this. Basically, I save all the possible word "starts" in a list instead of whole sentences. That's how the jQuery autocomplete of this site works.
import unicodedata
import re
# Split on whitespace and on the characters | - ) ( /
splitter = re.compile(r'[\s|\-|\)|\(|/]+')
def remove_accents(text):
    # Decompose accented characters, then drop the combining marks.
    # Python 3 shown; on Python 2 use unicode(text) instead of str(text).
    nkfd_form = unicodedata.normalize('NFKD', str(text))
    return u"".join([c for c in nkfd_form if not unicodedata.combining(c)])
def get_words(text):
    return [s.lower() for s in splitter.split(remove_accents(text)) if s != '']
def get_unique_words(text):
    word_set = set(get_words(text))
    return word_set
def get_starts(text):
    # Collect every prefix of every word: "s", "sh", "sho", "shop", ...
    word_set = get_unique_words(text)
    starts = set()
    for word in word_set:
        for i in range(len(word)):
            starts.add(word[:i+1])
    return sorted(starts)
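As a rough illustration of how a list of word "starts" can drive autocomplete, the sketch below builds an in-memory index from each prefix to the game names that contain a word with that prefix. The names `word_starts`, `build_prefix_index` and `suggest` are hypothetical, the word splitting is simplified to whitespace, and the sample game list is made up. On App Engine the prefixes would more likely be stored with each entity and matched by a query, so treat this only as a picture of the lookup idea.

```python
from collections import defaultdict

def word_starts(name):
    """Return every prefix of every lowercase word in name."""
    starts = set()
    for word in name.lower().split():
        for i in range(1, len(word) + 1):
            starts.add(word[:i])
    return starts

def build_prefix_index(names):
    """Map each prefix to the sorted list of names that contain it."""
    index = defaultdict(set)
    for name in names:
        for start in word_starts(name):
            index[start].add(name)
    return {prefix: sorted(matches) for prefix, matches in index.items()}

def suggest(index, typed, limit=10):
    """Look up what the user has typed so far (a single word prefix)."""
    return index.get(typed.strip().lower(), [])[:limit]

# Made-up sample data for illustration.
games = ["Shopping Cart Hero", "Shift", "Cart Racer"]
index = build_prefix_index(games)
print(suggest(index, "sh"))    # names with a word starting with "sh"
print(suggest(index, "cart"))  # names with a word starting with "cart"
```

The index grows with the total length of all words, which is the "huge index" trade-off mentioned in the question: lookups become a single dictionary access at the cost of extra storage.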

Have you looked at gae-search? I believe the Django + jQuery "autocomplete" feature is not part of the free version (it's just in the for-pay premium version), but maybe it's worth a little money to you.

Related

How do I get the base of a synonym/plural of a word in python?

I would like to use python to convert all synonyms and plural forms of words to the base version of the word.
e.g. Babies would become baby and so would infant and infants.
I tried creating a naive version of plural to root code but it has the issue that it doesn't always function correctly and can't detect a large amount of cases.
contents = ["buying", "stalls", "responsibilities"]
for token in contents:
    if token.endswith("ies"):
        token = token.replace('ies','y')
    elif token.endswith('s'):
        token = token[:-1]
    elif token.endswith("ed"):
        token = token[:-2]
    elif token.endswith("ing"):
        token = token[:-3]
print(contents)
I have not used this library before, so take this with a grain of salt. However, NodeBox Linguistics seems to be a reasonable set of scripts that will do exactly what you are looking for if you are on macOS. Check the link here: https://www.nodebox.net/code/index.php/Linguistics
Based on their documentation, it looks like you will be able to use lines like so:
print( en.noun.singular("people") )
>>> person
print( en.verb.infinitive("swimming") )
>>> swim
etc.
In addition to the example above, another option to consider is a natural language processing library like NLTK. The reason I recommend using an external library is that English has a lot of exceptions. As mentioned in my comment, consider words like class, fling, red, geese, etc., which would trip up the rules that were mentioned in the original question.
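To make the point about exceptions concrete, here is a small sketch that applies the suffix rules from the question to those words. The function name `naive_base` is hypothetical, and the rules are written as a function that returns the result rather than as the original loop.

```python
def naive_base(token):
    """Strip common suffixes using the simple rules from the question."""
    if token.endswith("ies"):
        return token[:-3] + "y"
    if token.endswith("s"):
        return token[:-1]
    if token.endswith("ed"):
        return token[:-2]
    if token.endswith("ing"):
        return token[:-3]
    return token

# Words the rules handle, followed by words that trip them up.
for word in ["buying", "stalls", "responsibilities", "class", "fling", "red", "geese"]:
    print(word, "->", naive_base(word))
```

The first three words come out as expected, but "class" loses its final "s", "fling" and "red" are cut down to fragments, and "geese" is left unchanged because no suffix rule can know that its base form is "goose".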
I built a Python library, Plurals and Countable, which is open source on GitHub. Its main purpose is to get plurals (yes, multiple plurals for some words), but it also solves this particular problem.
import plurals_counterable as pluc
pluc.pluc_lookup_plurals('men', strict_level='dictionary')
will return the following dictionary.
{
    'query': 'men',
    'base': 'man',
    'plural': ['men'],
    'countable': 'countable'
}
The base field is what you need.
The library actually looks up the words in dictionaries, so it takes some time to request, parse and return. Alternatively, you might use the REST API provided by Dictionary.video. You'll need to contact admin@dictionary.video to get an API key. The call will look like this:
import requests
import json
import logging
# Wrapper function (the name get_base is illustrative) so that the return statements are valid.
def get_base():
    url = 'https://dictionary.video/api/noun/plurals/men?key=YOUR_API_KEY'
    response = requests.get(url)
    if response.status_code == 200:
        return json.loads(response.text)['base']
    else:
        logging.error(url + ' response: status_code[%d]' % response.status_code)
        return None

How can I get a list of all the strings that Babel knows about?

I have an application in which the main strings are in English and then various translations are made in various .po/.mo files, as usual (using Flask and Flask-Babel). Is it possible to get a list of all the English strings somewhere within my Python code? Specifically, I'd like to have an admin interface on the website which lets someone log in and choose an arbitrary phrase to be used in a certain place without having to poke at actual Python code or .po/.mo files. This phrase might change over time but needs to be translated, so it needs to be something Babel knows about.
I do have access to the actual .pot file, so I could just parse that, but I was hoping for a cleaner method if possible.
You can use polib for this.
This section of the documentation shows examples of how to iterate over the contents of a .po file. Here is one taken from that page:
import polib
po = polib.pofile('path/to/catalog.po')
for entry in po:
    print(entry.msgid, entry.msgstr)
If you already use Babel, you can get all the items from a .po file:
from babel.messages.pofile import read_po
catalog = read_po(open(full_file_name))
for message in catalog:
    print(message.id, message.string)
See http://babel.edgewall.org/browser/trunk/babel/messages/pofile.py.
You can also try to get the items from a .mo file:
from babel.messages.mofile import read_mo
# .mo files are binary, so open the file in binary mode.
catalog = read_mo(open(full_file_name, 'rb'))
for message in catalog:
    print(message.id, message.string)
But the last time I tried to use it, it was not available. See http://babel.edgewall.org/browser/trunk/babel/messages/mofile.py.
You can also use polib, as @Miguel wrote.

Webapp username security

I'm creating a web application in Python (and Flask) where a user can register with their wanted username. I would like to show their profile at /user/ and have a directory on the server for each user.
What is the best way to make sure the username is safe for both a URL and a directory name? I read about people using the urlsafe methods in base64, but I would like to have a string that is related to their username for easy recognition.
The generic term for such URL-safe values is "slug", and the process of generating one is called "slugification", or "to slugify". People generally use a regular expression to do so; here is one (sourced from this article on the subject), using only stdlib modules:
import re
from unicodedata import normalize
_punct_re = re.compile(r'[\t !"#$%&\'()*\-/<=>?@\[\\\]^_`{|},.]+')
def slugify(text, delim=u'-'):
    """Generates a slightly worse ASCII-only slug."""
    result = []
    for word in _punct_re.split(text.lower()):
        # Drop non-ASCII characters, then decode back to str (Python 3).
        word = normalize('NFKD', word).encode('ascii', 'ignore').decode('ascii')
        if word:
            result.append(word)
    return delim.join(result)
The linked article has another 2 alternatives requiring additional modules.
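As a sketch of how a slug could then be used for the per-user directory, the example below reduces a username to lowercase ASCII letters and digits and joins it onto a base directory. The function names `slugify_username` and `user_directory` and the base path `/srv/users` are hypothetical, and the slug rule here is a simpler allow-list variant rather than the regular expression quoted above.

```python
import os
import re
import unicodedata

def slugify_username(username, delim="-"):
    """Reduce a username to lowercase ASCII letters and digits joined by delim."""
    ascii_text = (unicodedata.normalize("NFKD", username)
                  .encode("ascii", "ignore").decode("ascii"))
    words = re.findall(r"[a-z0-9]+", ascii_text.lower())
    return delim.join(words)

def user_directory(base_dir, username):
    """Return the per-user directory path, refusing names that slugify to nothing."""
    slug = slugify_username(username)
    if not slug:
        raise ValueError("username has no usable characters")
    return os.path.join(base_dir, slug)

print(slugify_username("Jörg Müller"))
print(user_directory("/srv/users", "../../etc/passwd"))
```

Because only letters, digits and the delimiter survive, path separators and dots never reach the file system. Different usernames can map to the same slug (for example "Jörg" and "Jorg"), so a uniqueness check at registration is still needed.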

How to transform hyperlink codes into normal URL strings?

I'm trying to build a blog system, so I need to do things like transforming '\n' into <br /> and transforming http://example.com into <a href='http://example.com'>http://example.com</a>.
The former is easy: just use the string replace() method.
The latter is more difficult, but I found a solution here: Find Hyperlinks in Text using Python (twitter related)
But now I need to implement an "Edit Article" function, so I have to do the reverse of this.
So, how can I transform <a href='http://example.com'>http://example.com</a> into http://example.com?
Thanks! And I'm sorry for my poor English.
Sounds like the wrong approach. Making round-trips work correctly is always challenging. Instead, store the source text only, and only format it as HTML when you need to display it. That way, alternate output formats / views (RSS, summaries, etc) are easier to create, too.
Separately, we wonder whether this particular wheel needs to be reinvented again ...
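The "store the source text, format on display" approach can be sketched roughly as follows. The function name `render_article` and the URL pattern are assumptions made for illustration, and the pattern is deliberately simple (it will, for instance, include trailing punctuation in a link).

```python
import html
import re

# Simplified URL pattern: http(s):// followed by anything up to whitespace or "<".
_url_re = re.compile(r"https?://[^\s<]+")

def render_article(source):
    """Format stored plain text as HTML: escape it, link URLs, convert newlines."""
    escaped = html.escape(source)  # escapes &, <, > and both quote characters
    linked = _url_re.sub(
        lambda m: "<a href='%s'>%s</a>" % (m.group(0), m.group(0)), escaped)
    return linked.replace("\n", "<br />")

stored = "Check this out:\nhttp://example.com"
print(render_article(stored))
```

With this arrangement the "Edit Article" form simply shows `stored` again, so no reverse transformation is required.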
Since you are using the answer from that other question, your links will always be in the same format, so it should be pretty easy using a regex. I don't know Python, but going by the answer from the last question:
import re
myString = 'This is my tweet check it out <a href="http://tinyurl.com/blah">http://tinyurl.com/blah</a>'
# Match an anchor whose link text equals its href, and keep only the URL.
r = re.compile(r'<a href=["\'](http://[^ "\']+)["\']>\1</a>')
print(r.sub(r'\1', myString))
Should work.

Python - English translator

What is the best way to approach writing a program in Python to translate English words and/or phrases into other languages?
AJAX Language API
This is an incredibly difficult problem -- language is very very very complicated. Think about all the things you'd have to do -- parse the phrase, work out what the words mean, translate them. That's probably not idiomatic so you'll need special cases for different syntaxes. Many, many special cases. You'll need to work out the syntax of the foreign language if it differs from English -- "the big green ball" goes to "the ball big green" in Spanish, for instance.
Don't reinvent the wheel. Google provide an API to their translation service, which has undoubtedly had many many clever people thinking really quite hard about it.
I think you should look into the Google Translate API. Here is a library implemented specifically for this purpose in python.
The simplest way to do this is to make a dictionary that maps one language's words to another language's words. However, this is a very crude approach: it does not take grammar into account at all, and the translator would take a very long time to build, especially if you plan to support multiple languages. If grammar is not important to you (for example, if you are creating your own language for a game or story whose grammar does not differ from English), then you could get away with using dictionaries and simply having a function look up the requested word in the dictionary.
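A minimal sketch of that word-for-word approach is shown below. The tiny English-to-Spanish word list and the function name `translate_words` are made up for illustration.

```python
import re

# Tiny hypothetical English -> Spanish word list, for illustration only.
EN_TO_ES = {"the": "la", "ball": "pelota", "big": "grande", "green": "verde"}

def translate_words(sentence, dictionary):
    """Replace each word by its dictionary entry; unknown words are kept as-is."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return " ".join(dictionary.get(word, word) for word in words)

print(translate_words("The big green ball", EN_TO_ES))
```

Every word is replaced, but the result keeps the English word order, which is exactly the grammar limitation described above.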
# command : pip install mtranslate
>>> from mtranslate import translate
>>> translate("Tranalating to kannada language (my mother tongue) ", to_language = "kn")
'ಕನ್ನಡ ಭಾಷೆಗೆ ಅನುವಾದ (ನನ್ನ ಮಾತೃಭಾಷೆ)'
You can use the Goslate package for that.
It's very easy to use.
Example:
import goslate
print(goslate.Goslate().translate('hello world', 'ar'))
The first argument is the text you want to translate, and the second argument is the language you want to translate it into.
I hope you find this answer useful.
# Please install the translate package first: pip install translate
from translate import Translator
class clsTranslate():
    def translateText(self, strString, strTolang):
        self.strString = strString
        self.strTolang = strTolang
        translator = Translator(to_lang=self.strTolang)
        translation = translator.translate(self.strString)
        return (str(translation))
# Create a class object and call the translate function
# Pass the language as a parameter to the function, de: German zh: Chinese etc
objTrans = clsTranslate()
strTranslatedText = objTrans.translateText('How are you', 'de')
print(strTranslatedText)
It's very easy if you use deep-translator! Here's the source code (make sure to install the deep-translator module):
from deep_translator import GoogleTranslator
import time
def translate():
    line_to_translate = input('Which line/phrase/word do you want to translate?\n')
    to_lang = input('Into which language do you want to translate it?\n')
    to_lang = to_lang.lower()
    translation = GoogleTranslator(source='auto', target=to_lang).translate(text=line_to_translate)
    return translation
def start():
    while True:
        # Ask for input, translate it and show the result.
        print(translate())
        time.sleep(1)
        esc = input("Enter 'q' to exit and 'r' to restart.\n")
        # Keep asking until the user enters a valid option.
        while esc.lower() not in {'q', 'r'}:
            print('Please enter a valid Option!!')
            time.sleep(1)
            esc = input("Enter 'q' to exit and 'r' to restart.\n")
        if esc.lower() == 'q':
            return
start()
