Arguments in the middle of a string to be localized - python

I'm a beginner in Python. My problem is pretty simple. I have a string to be localized in a Python application, and it contains parameters:
print _('Hello dear user, your name is ') + params['first_name'] + ' ' + params['last_name'] + _(' and blah blah blah')
This actually does the job, but is not really what I would call a nice way to do it. Not to mention that some languages would, for example, require the last name to be displayed before the first name.
Is there a better way to do it? I thought about placing custom tags like {{fn}} or {{ln}} in the translation string and replacing them with the actual values before displaying the string. But that doesn't seem much more pleasant.
Thanks,
Pierre

I'd suggest
print _('Hello dear user, your name is %(first_name)s %(last_name)s') % params

Something like this should do the trick:
print _('Hello dear user, your name is %s %s and blah blah blah') % (params['first_name'], params['last_name'])

I would go with templates if I were you. That would let you have a separate template for each language. For example:
from string import Template
s_en = Template('Hello dear user, your name is $first_name $last_name and blah blah blah')
s_sco = Template('Hello, $first_name of the clan Mac$last_name...')
user = {'last_name': 'Duncan', 'first_name': 'Leod'}
print(s_en.substitute(user))
print(s_sco.substitute(user))
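One thing that may be worth knowing if templates come back from translators: `substitute` raises `KeyError` when a template mentions a placeholder the dict does not have, while `safe_substitute` leaves the unknown placeholder in place. A small sketch, where the misspelled `$lastname` is a made-up translator typo:

```python
from string import Template

user = {'last_name': 'Duncan', 'first_name': 'Leod'}

# A template with a misspelled placeholder ($lastname instead of $last_name).
broken = Template('Hello, $first_name of the clan Mac$lastname...')

try:
    broken.substitute(user)
    failed = False
except KeyError:
    # substitute() refuses to produce a half-filled string.
    failed = True

# safe_substitute() fills in what it can and leaves the rest untouched.
lenient = broken.safe_substitute(user)
print(failed)
print(lenient)
```

Which behaviour is preferable probably depends on whether a visible `$lastname` in the UI is better or worse than an exception.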

I thought about placing custom tags like {{fn}} or {{ln}} in the translation string and replacing them by the actual values before displaying them.
That's what I would do. Placeholders in the right places for each language version should do the job.
One thing to mention: in some languages, people's names have to be modified depending on where in a sentence they appear and how they are used. You need to know each specific language to be able to do it correctly.
A possible solution: keep "in the middle of a sentence" cases to a minimum, and keep each localizable resource separate.
Instead of Hello dear user, your name is {{UserName}}
use User name: {{UserName}}

Related

RenPy: Python .format translate

I'm creating a translation for a visual novel, and I can't translate text that uses string formatting. Does anyone know how to do this?
Code to translate:
$ gil_likes = "Teasing {0}, Basketball".format(Main)
The translation must be something like this:
old "Teasing Walter, Basketball" new "Дразнить Уолтера, Баскетбол"
I tried replacing {0} with a specific name, but then the translation only works for that name. The problem is that the player can enter any name.
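The general idea, independent of Ren'Py, is to translate the template (with `{0}` still in it) and only then call `.format`. The sketch below is plain Python: the `translations` dict and the `translate` function are hypothetical stand-ins, and Ren'Py's own translation mechanism is not shown here.

```python
# -*- coding: utf-8 -*-

# Hypothetical catalog: the key is the untranslated template, placeholder included.
translations = {
    u'Teasing {0}, Basketball': u'Дразнить {0}, Баскетбол',
}


def translate(template):
    """Look up the template itself, not the already formatted sentence."""
    return translations.get(template, template)


Main = u'Walter'  # in the game this would be whatever name the player typed

# Translate first, format second, so any player name works.
gil_likes = translate(u'Teasing {0}, Basketball').format(Main)
print(gil_likes)
```

Note that the inserted name is used as typed, so a declined form such as "Уолтера" cannot be produced this way.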

How to account for string formatting in Python variable assignment?

I am parsing text to check for the presence of a sentence such as:
u'Your new contact email thedude@gmail.com has been confirmed.'
...where the text either side of the email address will be constant, and the email address won't be constant, but will be known before parsing.
Assume the sentence is contained in a variable called response and the email address in address. I could do:
'Your new contact email ' + address + ' has been confirmed' in response
This is a little untidy, and downright inconvenient if the text of the sentence ever changes. Is it possible to take advantage of string formatting in a variable assignment, e.g.
sentence = 'Your new contact email %s has been confirmed'
And somehow pass address into the variable at runtime?
Of course you can! Try this out...
sentence = 'Your new contact email {} has been confirmed'.format(address)
There's also this other (rather hacky) alternative...
sentence = 'Your new contact email %s has been confirmed' % address
This alternative has its limitations too, such as requiring the use of a tuple for passing more than one argument...
sentence = 'Hi, %s! Your new contact email %s has been confirmed' % ('KemyLand', address)
Edit: According to comments from the OP, he's asking how to do this if the format string happens to exist before address does. Actually, this is very simple. May I show you the last three examples with this?...
# At this moment, `address` does not exist yet.
firstFormat = 'Your new contact email address {} has been confirmed'
secondFormat = 'Your new contact email address %s has been confirmed'
thirdFormat = 'Hi, %s! Your new contact email %s has been confirmed'
# Now, somehow, `address` exists.
firstSentence = firstFormat.format(address)
secondSentence = secondFormat % address
thirdSentence = thirdFormat % ('Pyderman', address)
I hope this has shed some light on this for you!
This is what I usually do with my SQL queries, output lines and whatever:
sentence = 'Blah blah {0} blah'
...
if sentence.format(address) in response:
    foo()
    bar()
So basically you get to keep all your I/O-related strings defined in one place instead of hardcoded all over the program. You can still edit them whenever you please, but only in a limited way: 'foo {0}'.format() raises an exception when it gets too few arguments (extra arguments, by contrast, are silently ignored).
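It is worth checking exactly which mismatches `str.format` catches. A short sketch, using a made-up two-field template:

```python
template = 'Blah blah {0} blah {1}'

# Too few arguments: the missing positional field raises IndexError.
try:
    template.format('a')
    too_few_error = None
except IndexError as exc:
    too_few_error = type(exc).__name__

# Too many arguments: the extra one is silently ignored.
too_many = template.format('a', 'b', 'c')

print(too_few_error)
print(too_many)
```

So a template that loses a placeholder during editing will not be noticed by `format` itself; only a template that gains one will.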
This may be a hacky way of doing it, but if I understand you correctly, here's how you can do it.
At the beginning, declare the string, but where the address would go, put in something that will never generally be repeated... Like ||||| (5 pipe characters).
Then when you have the address and want to pop it in, do:
myString = myString.replace('|||||', address)
That will slot your address right where you need it :)
My understanding was you are trying to create a string and then later, add a piece in. Sorry if I misunderstood you :)

Match hex string with list index

I'm building a de-identify tool. It replaces all names by other names.
We got a report that <name>Peter</name> met <name>Jane</name> yesterday. <name>Peter</name> is suspicious.
Output:
We got a report that <name>Billy</name> met <name>Elsa</name> yesterday. <name>Billy</name> is suspicious.
It can be run on multiple documents, and a given name is always replaced by the same counterpart, so you can still understand who the text is talking about. BUT every document has an ID referring to the person the file is about (I'm working with files in a public service), and only documents with the same person ID are de-identified the same way, with the same names (the goal is to follow people's history over time). This is a security measure: when I hand the tool over to a third party, I don't hand over the key to my own documents with it.
So the same input, with a different ID, produces :
We got a report that <name>Henry</name> met <name>Alicia</name> yesterday. <name>Henry</name> is suspicious.
Right now, I'm hashing each name with the document ID as a salt, converting the hash to an integer, then subtracting the length of the name list until I can request a name with that integer as an index. But I feel like there should be a quicker/more straightforward approach?
It's really more of an algorithmic question, but if it's of any relevance, I'm working with Python 2.7. Please request more explanation if needed. Thank you!
I hope it's clearer this way ô_o Sorry when you are neck-deep in your code you forget others need a bigger picture to understand how you got there.
As @LutzHorn pointed out, you could just use a dict to map real names to false ones.
You could also just do something like:
existing_names = []
for nameoccurrence in original_text:
    if nameoccurrence.name not in existing_names:
        # First time this name is seen: give it the next free id.
        nameoccurrence.id = len(existing_names)
        existing_names.append(nameoccurrence.name)
    else:
        nameoccurrence.id = existing_names.index(nameoccurrence.name)
for idx, _ in enumerate(existing_names):
    existing_names[idx] = gimme_random_name()
Try using a dictionary of names.
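import re
s = "We got a report that <name>Peter</name> met <name>Jane</name> yesterday. <name>Peter</name> is suspicious."
names = {"Peter": "Billy", "Jane": "Elsa"}
for name in re.findall("<name>([a-zA-Z]+)</name>", s):
    s = re.sub("<name>" + name + "</name>", "<name>" + names[name] + "</name>", s)
print(s)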
Output:
'We got a report that <name>Billy</name> met <name>Elsa</name> yesterday. <name>Billy</name> is suspicious.'
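On the original hashing approach: subtracting the list length until the number fits is what the modulo operator does in one step. A minimal sketch under assumptions: `pseudonym`, the `doc-42` style IDs, the choice of SHA-256 and the four-name list are all illustrative, not taken from the asker's tool.

```python
import hashlib


def pseudonym(name, document_id, replacement_names):
    """Pick a replacement deterministically from the name and the document ID."""
    # Salt the name with the document ID so different IDs give different mappings.
    salted = (document_id + ':' + name).encode('utf-8')
    digest = hashlib.sha256(salted).hexdigest()
    # int(..., 16) turns the hex string into an integer; % maps it onto the list.
    index = int(digest, 16) % len(replacement_names)
    return replacement_names[index]


replacement_names = ['Billy', 'Elsa', 'Henry', 'Alicia']  # sample list

first = pseudonym('Peter', 'doc-42', replacement_names)
again = pseudonym('Peter', 'doc-42', replacement_names)
print(first == again)  # same name and same ID always give the same pick
```

One caveat with this kind of scheme: two different real names can land on the same index, so a short replacement list makes collisions likely.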

Jython: how to remove characters from a string

This is a WebSphere related question.
I am trying to turn this command into variables
AdminConfig.modify('(cells/taspmociias204Cell01/clusters/cam_group|resources.xml#J2EEResourceProperty_1324400045826)'
I've found that this command:
AdminConfig.list('J2EEResourceProperty', 'URL*cam_group*)').splitlines()
Will return:
['URL(cells/taspmociias204Cell01/clusters/cam_group|resources.xml#J2EEResourceProperty_1324400045826)', 'URL(cells/taspmociias204Cell01/clusters/cam_group|resources.xml#J2EEResourceProperty_1355156316906)']
So I turned that command into a variable:
j2ee = AdminConfig.list('J2EEResourceProperty', 'URL*cam_group*)').splitlines()
I'm able to get the string that I want by typing "j2ee[0]", which gives:
'URL(cells/taspmociias204Cell01/clusters/cam_group|resources.xml#J2EEResourceProperty_1324400045826)'
So that is exactly what I wanted, minus the URL part in the front. How can I get rid of those characters?!
I'm not sure if I understood your requirement, but it seems to me that you want to modify some attributes of J2EEResourceProperty object.
If this is the case, then you don't need to remove that "URL" string, actually you shouldn't do that. The string 'URL(cells/taspmociias204Cell01/clusters/cam_group|resources.xml#J2EEResourceProperty_1324400045826)' fully identifies WebSphere configuration object. Try this:
AdminConfig.modify('URL(cells/taspmociias204Cell01/clusters/cam_group|resources.xml#J2EEResourceProperty_1324400045826)', [['value', 'the new value'], ['description', 'the new description']])
BTW: you can also try using WDR library (https://github.com/WDR/wdr/). Then your script would look as follows:
prop = listConfigObjects('J2EEResourceProperty')[0]
prop.value = 'the new value'
prop.description = 'the new description'
Disclosure: I'm one of WDR contributors.
You could always use a simple replace regular expression to parse out the URL part.
For example:
import re
mystr = 'URL(blahblahblah)'
re.sub(r'^URL', "", mystr)
This is a handy tool to learn and test your regular expressions to make sure they are correct.
http://gskinner.com/RegExr/
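If the prefix really does need to go, plain slicing works as well and avoids a regex altogether. A small sketch using the ID from the question (the variable names are just for illustration); keep in mind the point made above that the full string, `URL` included, is what identifies the configuration object.

```python
config_id = 'URL(cells/taspmociias204Cell01/clusters/cam_group|resources.xml#J2EEResourceProperty_1324400045826)'
prefix = 'URL'

# Only cut when the prefix is actually there, then slice it off by length.
if config_id.startswith(prefix):
    stripped = config_id[len(prefix):]
else:
    stripped = config_id

print(stripped)
```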

Validate email local component

I'm writing a registration form that only needs to accept the local component of a desired email address. The domain component is fixed to the site. I am attempting to validate it by selectively copying from validators.validate_email which Django provides for EmailField:
email_re = re.compile(
    r"(^[-!#$%&'*+/=?^_`{}|~0-9A-Z]+(\.[-!#$%&'*+/=?^_`{}|~0-9A-Z]+)*"  # dot-atom
    # quoted-string, see also http://tools.ietf.org/html/rfc2822#section-3.2.5
    r'|^"([\001-\010\013\014\016-\037!#-\[\]-\177]|\\[\001-\011\013\014\016-\177])*"'
    r')@((?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+[A-Z]{2,6}\.?$)'  # domain
    r'|\[(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\]$', re.IGNORECASE)  # literal form, ipv4 address (SMTP 4.1.3)
validate_email = EmailValidator(email_re, _(u'Enter a valid e-mail address.'), 'invalid')
Following is my code. My main issue is that I'm unable to adapt the regex. At this point I'm only testing it in a regex tester at http://www.pythonregex.com/ however it's failing:
^([-!#$%&'*+/=?^_`{}|~0-9A-Z]+(\.[-!#$%&'*+/=?^_`{}|~0-9A-Z]+)*)$
This seems to be passing undesirable characters such as ?
The entire code for my Field, which is not necessarily relevant at this stage but I wouldn't mind some comment on it would be:
class LocalEmailField(CharField):
    email_local_re = re.compile(r"^([-!#$%&'*+/=?^_`{}|~0-9A-Z]+(\.[-!#$%&'*+/=?^_`{}|~0-9A-Z]+)*)$", re.IGNORECASE)
    # Pass the local-part regex here, not the full email_re.
    validate_email_local = RegexValidator(email_local_re, (u'Enter a valid e-mail username.'), 'invalid')
    default_validators = [validate_email_local]
EDIT: To clarify, the user is only entering the text BEFORE the @, hence why I have no need to validate the @domain.com in the validator.
EDIT 2: So the form field and label will look like this:
Desired Email Address: [---type-able area---] @domain.com
You say "undesirable characters such as ?", but I think you're mistaken about what characters are desirable. The original regex allows question marks.
Note that you can also define your own validator that doesn't use a massive regex, and have some chance of decoding the logic later.
Some people, when confronted with a problem, think, “I know, I’ll use regular expressions.” Now they have two problems. - Jamie Zawinski
Checking via regex is an exercise in wasting your time. The best way is to attempt delivery; this way not only can you verify the email address, but also if the mailbox is actually active and can receive emails.
Otherwise you'll end up with an ever-expanding regular expression that can't possibly hope to match all the rules.
"Haha boo hoo woo woo!"@foo.com is a valid address, and so is qwerterukeriouo@gmail.com
Instead, offer the almost-standard "Please click on the link in the email we sent to blahblah@goo.com to verify your address." approach.
If you want to create email addresses, then you can write your own rules on what can be a part of the email component; and they can be a subset of the official allowed chars in the RFC.
For example, a conservative rule (that doesn't use regular expressions):
import string
# string.letters exists in Python 2 only; Python 3 spells it string.ascii_letters.
allowed_chars = string.digits + string.letters + '-'
if [x for x in user_input if x not in allowed_chars]:
    print 'Sorry, invalid characters'
elif user_input[0] in string.digits + '-':
    print 'Cannot start with a number or `-`'
elif check_if_already_exists(user_input):
    print 'Sorry, already taken'
else:
    print 'Congratulations!'
I'm still new to Django and Python, but why reinvent the wheel and maintain your own regex? If, apart from wanting users to enter only the local portion of their email address, you're happy with Django's built-in EmailField, you can subclass it quite easily and tweak the validation logic a bit:
from django import forms

DOMAIN_NAME = u'foo.com'

# Assumes a form field (django.forms); a model field's clean() also takes a model_instance.
class LocalEmailField(forms.EmailField):
    def clean(self, local_part):
        whole_address = '%s@%s' % (local_part, DOMAIN_NAME)
        clean_address = super(LocalEmailField, self).clean(whole_address)
        # Can do more checking here if necessary
        clean_local, at_sign, clean_domain = clean_address.rpartition('@')
        return clean_local
Have you looked at the documentation for Form and Field Validation and the .clean() method?
If you want to do it 100% correctly with regex, you need to use an engine with some form of extended regex that allows matching nested parentheses.
Python's default engine does not allow this, so you're better off compromising with a very simple (permissive) regex.
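As a rough idea of what such a simple regex could look like, here is a sketch. The policy it encodes (ASCII letters, digits and `. _ + -`, starting and ending with a letter or digit) is an assumption chosen for illustration, not the RFC grammar, and the names `LOCAL_PART_RE` and `is_acceptable_local_part` are made up.

```python
import re

# Assumed site policy, not the RFC: letters, digits and . _ + - are allowed,
# and the local part must start and end with a letter or digit.
# \Z anchors at the very end, so a trailing newline is rejected too.
LOCAL_PART_RE = re.compile(r'^[A-Za-z0-9](?:[A-Za-z0-9._+-]*[A-Za-z0-9])?\Z')


def is_acceptable_local_part(text):
    """Return True if text fits the simple policy above."""
    return LOCAL_PART_RE.match(text) is not None


for candidate in ['pierre.dupont', 'a', '.abc', 'abc?', '']:
    print(candidate, is_acceptable_local_part(candidate))
```

Since the site creates the addresses itself, the rule only has to be a subset of what the RFC allows, which is what keeps the pattern short.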
