I am parsing text to check for the presence of a sentence such as:
u'Your new contact email thedude@gmail.com has been confirmed.'
...where the text on either side of the email address will be constant, and the email address won't be constant but will be known before parsing.
Assume the sentence is contained in a variable called response and the email address in address. I could do:
'Your new contact email ' + address + ' has been confirmed' in response
This is a little untidy, and downright inconvenient if the text of the sentence ever changes. Is it possible to take advantage of string formatting in a variable assignment, e.g.:
sentence = 'Your new contact email %s has been confirmed'
And somehow pass address into the variable at runtime?
Of course you can! Try this out...
sentence = 'Your new contact email {} has been confirmed'.format(address)
There's also the older printf-style alternative...
sentence = 'Your new contact email %s has been confirmed' % address
This alternative has its limitations too, such as requiring the use of a tuple for passing more than one argument...
sentence = 'Hi, %s! Your new contact email %s has been confirmed' % ('KemyLand', address)
Edit: According to comments from the OP, he's asking how to do this if the format string exists before address does. This is actually very simple. Here are the last three examples rewritten that way:
# At this moment, `address` does not exist yet.
firstFormat = 'Your new contact email address {} has been confirmed'
secondFormat = 'Your new contact email address %s has been confirmed'
thirdFormat = 'Hi, %s! Your new contact email %s has been confirmed'
# Now `address` exists.
firstSentence = firstFormat.format(address)
secondSentence = secondFormat % address
thirdSentence = thirdFormat % ('Pyderman', address)
I hope this sheds some light on it!
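Putting the pieces together for the original membership test: the template is defined up front, before any address is known, and filled in only at check time. A minimal sketch; the names SENTENCE and is_confirmed are illustrative, not from the question:

```python
# Defined once, before `address` exists; edit the wording here only.
SENTENCE = 'Your new contact email {} has been confirmed'


def is_confirmed(response, address):
    """Return True if the confirmation sentence for `address` appears in `response`."""
    return SENTENCE.format(address) in response
```

For example, is_confirmed(u'Your new contact email thedude@gmail.com has been confirmed.', 'thedude@gmail.com') is True, while the same response checked against a different address is False.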
This is what I usually do with my SQL queries, output lines and whatever:
sentence = 'Blah blah {0} blah'
...
if sentence.format(address) in response:
    foo()
    bar()
So basically you get to keep all your I/O-related strings defined in one place instead of hardcoded all over the program. At the same time, you can edit them whenever you please, but only in a limited way: str.format() raises an exception (IndexError for a missing positional argument, KeyError for a missing named one) when it gets too few arguments, although extra arguments are silently ignored.
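A small sketch of the "one place for all strings" idea, also showing exactly how str.format() reacts to too few and too many arguments. The MESSAGES dict and message_in_response helper are hypothetical names for illustration:

```python
# All user-facing strings live in one table (keys are illustrative).
MESSAGES = {
    "confirmed": "Your new contact email {0} has been confirmed",
}


def message_in_response(key, response, *args):
    """Fill in the stored template and check whether it occurs in `response`."""
    return MESSAGES[key].format(*args) in response


# Too few arguments: format() raises IndexError for the missing {0}.
try:
    MESSAGES["confirmed"].format()
    too_few_raised = False
except IndexError:
    too_few_raised = True

# Too many arguments: the extra value is silently ignored.
extra_ignored = MESSAGES["confirmed"].format("a@b.com", "extra")
```

The second case is worth remembering: if a template loses a placeholder during editing, format() will not warn you.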
It may be a hacky way of doing it, but if I understand you correctly, here's how you can do it.
At the beginning, declare the string, but where the address would go, put in something that is unlikely to ever appear elsewhere in the text, like ||||| (five pipe characters).
Then when you have the address and want to pop it in, do:
myString = myString.replace('|||||', address)  # replace() returns a new string; it does not modify myString in place
That will slot your address right where you need it :)
My understanding was you are trying to create a string and then later, add a piece in. Sorry if I misunderstood you :)
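A slightly more defensive version of this placeholder idea, as a sketch: the marker value is a hypothetical choice, and the check that it occurs exactly once is an extra guard (not part of the answer above) so a missing or duplicated marker fails loudly instead of silently:

```python
PLACEHOLDER = "|||||"  # hypothetical marker, assumed never to occur in the real text

template = "Your new contact email " + PLACEHOLDER + " has been confirmed"


def fill(template, address):
    """Return a copy of `template` with the single placeholder replaced by `address`."""
    if template.count(PLACEHOLDER) != 1:
        raise ValueError("template must contain the placeholder exactly once")
    return template.replace(PLACEHOLDER, address)
```

Because strings are immutable, fill() leaves `template` untouched, so the same template can be reused for every address.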
I'm getting strings of game chat from a server, and I need to check whether a user is mentioned in each string. If he is, I need to find him on the server and mention him, because sending the string as-is doesn't actually mention him.
Here's a simple example:
socket_str = "Hey this is a ping test for #TheBeast"
I need to check for tags (#) in that string, then extract the name (here TheBeast), then go over the members of the server, find a member object with that name, and build a final f-string that contains the text before the mention along with the mention.
So it will look the same, but the bot will actually mention this user.
This was the simplest example, but there are many edge cases that I can't deal with. For example, what if the person has spaces in his name? How do you know where the name ends? Here's the most complicated example I could come up with:
socket_str = "Hey I'm looking for #The New Beast is he online?, or #Newly Born Beast or #someone that doesnt exists is on?"
I'm looking for a different approach. I could share what I've written so far, which is a lot, but honestly it's such complex code that even I don't understand much of it anymore.
This is actually very non-trivial. You've already said it yourself:
"if the person has spaces in his name, how do you know when the name ends?"
The only option I can think of to reliably check whether a username (containing spaces) exists is to iteratively check each combination of space-separated words for as long as a certain criterion is met.
In Discord, the only restriction on usernames is that they can be at most 32 characters long. AFAIK you can have any symbol, emoji or whatever in your name...
To illustrate, the checks would look something like this:
string = "Hello #This is a username! Whats up?"
# is "This" a username?
# yes -> great! | no -> is "This is" a username?
# yes -> great! | no -> is "This is a" a username?
# yes -> great! | no -> is "This is a username!" a username?
# ...
However, this reveals yet another edge case: "This is a username" is a valid user, but by splitting on spaces the program would look for "This is a username!", which isn't valid. So as far as I can tell, the best way to say for sure whether a username is valid is to check every prefix, one character at a time, up to the maximum length of Discord usernames.
This could be implemented like so:
string = "Hello #This is a username! Whats up?"
potentialUsernames = string.split("#")  # Split string into potential usernames
del potentialUsernames[0]  # The first element comes before any "#", so it is definitely not a username
for potentialUsername in potentialUsernames:  # For every potential username...
    for run, letter in enumerate(potentialUsername):  # ...try every prefix length
        if run >= 32:
            break  # Usernames are at most 32 characters, so there can't be one here anymore
        checkUsername = potentialUsername[:run + 1]
        potentialMember = guild.get_member_named(checkUsername)  # Check whether this prefix is a member's name
        if potentialMember is not None:  # BOOM, we found a member!
            string = string.replace("#" + checkUsername, potentialMember.mention)  # Replace the name with a real mention
            break  # The user was found, move on to the next potential username
The output of print(string) would be:
"Hello <@!1234567891011121314>! Whats up?"
Yes, this is how mentions look in text form, if you didn't know. The long number is the user ID, but Member.mention already constructs this for you!
In this code, guild has to be the Guild object whose members you want to search.
What this code does is take every potential username (the text after each #) and check every possible length, up to the next # or Discord's limit of 32 characters.
I.e.
# is "T" a username?
# yes -> great | no -> is "Th" a username?
# yes -> great | no -> is "Thi" a username?
# ...
As a side note, this method should work with any amount of mentions!
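To try the prefix-matching idea offline, without a bot connection, here is a sketch that swaps guild.get_member_named for a plain dict mapping member names to IDs (a hypothetical stand-in). One deliberate change from the code above: it keeps scanning and picks the longest matching prefix, so a member called "The" doesn't shadow "The New Beast". The replacement text uses Discord's raw user-mention form <@id>:

```python
MAX_NAME_LENGTH = 32  # Discord's username limit mentioned above


def find_mentions(text, members):
    """Return (name, member_id) pairs for each '#' tag in `text` that starts with a member name.

    `members` is a hypothetical {name: id} dict standing in for a real guild lookup.
    """
    found = []
    for chunk in text.split("#")[1:]:  # text after each '#'; index 0 precedes any tag
        best = None
        for length in range(1, min(len(chunk), MAX_NAME_LENGTH) + 1):
            candidate = chunk[:length]
            if candidate in members:
                best = candidate  # keep scanning: a longer name may also match
        if best is not None:
            found.append((best, members[best]))
    return found


def replace_mentions(text, members):
    """Replace each matched '#name' with a raw user mention of the form <@id>."""
    for name, member_id in find_mentions(text, members):
        # Tags are processed left to right, so replacing the leftmost remaining match is safe.
        text = text.replace("#" + name, "<@%d>" % member_id, 1)
    return text
```

With members = {"The New Beast": 1, "Newly Born Beast": 2, "The": 3}, the complicated example from the question becomes "Hey I'm looking for <@1> is he online?, or <@2> or #someone that doesnt exists is on?", leaving the unknown tag untouched. In a real bot you would look names up on the guild instead of in a dict.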
I'm writing a registration form that only needs to accept the local component of a desired email address. The domain component is fixed to the site. I am attempting to validate it by selectively copying from validators.validate_email which Django provides for EmailField:
email_re = re.compile(
r"(^[-!#$%&'*+/=?^_`{}|~0-9A-Z]+(\.[-!#$%&'*+/=?^_`{}|~0-9A-Z]+)*" # dot-atom
# quoted-string, see also http://tools.ietf.org/html/rfc2822#section-3.2.5
r'|^"([\001-\010\013\014\016-\037!#-\[\]-\177]|\\[\001-\011\013\014\016-\177])*"'
r')@((?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+[A-Z]{2,6}\.?$)' # domain
r'|\[(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\]$', re.IGNORECASE) # literal form, ipv4 address (SMTP 4.1.3)
validate_email = EmailValidator(email_re, _(u'Enter a valid e-mail address.'), 'invalid')
Following is my code. My main issue is that I'm unable to adapt the regex. At this point I'm only testing it in the regex tester at http://www.pythonregex.com/, but it's failing:
^([-!#$%&'*+/=?^_`{}|~0-9A-Z]+(\.[-!#$%&'*+/=?^_`{}|~0-9A-Z]+)*)$
This seems to be passing undesirable characters such as ?
The entire code for my Field, which is not necessarily relevant at this stage but on which I wouldn't mind some comments, is:
import re
from django.core.validators import RegexValidator
from django.forms import CharField
class LocalEmailField(CharField):
    email_local_re = re.compile(r"^([-!#$%&'*+/=?^_`{}|~0-9A-Z]+(\.[-!#$%&'*+/=?^_`{}|~0-9A-Z]+)*)$", re.IGNORECASE)
    validate_email_local = RegexValidator(email_local_re, (u'Enter a valid e-mail username.'), 'invalid')  # uses the local-part regex above
    default_validators = [validate_email_local]
EDIT: To clarify, the user is only entering the text BEFORE the @, which is why I have no need to validate the @domain.com part in the validator.
EDIT 2: So the form field and label will look like this:
Desired Email Address: [---type-able area---] @domain.com
You say "undesirable characters such as ?", but I think you're mistaken about what characters are desirable. The original regex allows question marks.
Note that you can also define your own validator that doesn't use a massive regex, and have some chance of decoding the logic later.
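A minimal sketch of such a hand-written check, assuming you only want to accept the unquoted "dot-atom" form of the local part (quoted strings like "john doe" are rejected). The function name and the 64-character cap (RFC 5321's limit on the local part) are my own choices, not anything Django provides:

```python
import string

# Characters allowed in an RFC 5322 dot-atom local part
# (the same set as the dot-atom branch of Django's regex).
ATEXT = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")


def is_valid_local_part(local):
    """Return True if `local` is a valid unquoted (dot-atom) email local part."""
    if not local or len(local) > 64:  # RFC 5321 caps the local part at 64 octets
        return False
    # Splitting on '.' yields an empty atom for a leading, trailing or doubled dot.
    return all(atom and set(atom) <= ATEXT for atom in local.split("."))
```

In Django you could wrap this in a function that raises django.core.exceptions.ValidationError when it returns False and list that function in default_validators. Note that it accepts "what?" just like the original regex does.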
Some people, when confronted with a problem, think, “I know, I’ll use regular expressions.” Now they have two problems. - Jamie Zawinski
Checking via regex is an exercise in wasting your time. The best way is to attempt delivery; this way not only can you verify the email address, but also if the mailbox is actually active and can receive emails.
Otherwise you'll end up with an ever-expanding regular expression that can't possibly hope to match all the rules.
"Haha boo hoo woo woo!"@foo.com is a valid address, and so is qwerterukeriouo@gmail.com
Instead, offer the almost-standard "Please click on the link in the email we sent to blahblah@goo.com to verify your address." approach.
If you want to create email addresses, then you can write your own rules on what can be part of the local component; they can be a subset of the characters officially allowed by the RFC.
For example, a conservative rule (that doesn't use regular expressions):
import string

allowed_chars = set(string.ascii_letters + string.digits + '-')

if not user_input:
    print('Please enter a name')
elif any(x not in allowed_chars for x in user_input):
    print('Sorry, invalid characters')
elif user_input[0] in string.digits + '-':
    print('Cannot start with a number or `-`')
elif check_if_already_exists(user_input):  # your own lookup, e.g. a database query
    print('Sorry, already taken')
else:
    print('Congratulations!')
I'm still new to Django and Python, but why reinvent the wheel and maintain your own regex? If, apart from wanting users to enter only the local portion of their email address, you're happy with Django's built-in EmailField, you can subclass it quite easily and tweak the validation logic a bit:
from django import forms

DOMAIN_NAME = u'foo.com'

class LocalEmailField(forms.EmailField):
    def clean(self, local_part):
        # Build the full address so Django's own EmailField validation can check it
        whole_address = u'%s@%s' % (local_part, DOMAIN_NAME)
        clean_address = super(LocalEmailField, self).clean(whole_address)
        # Can do more checking here if necessary
        clean_local, at_sign, clean_domain = clean_address.rpartition('@')
        return clean_local
Have you looked at the documentation for Form and Field Validation and the .clean() method?
If you want to do it 100% correctly with regex, you need an engine with some form of extended regex that allows matching nested parentheses.
Python's default engine (the re module) does not support this, so you're better off compromising with a very simple (permissive) regex.
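As an illustration of "very simple (permissive)", here is one possible compromise, sketched under the assumption that you only want to reject obvious mistakes: any non-empty run of characters that contains no @ and no whitespace. The pattern and helper name are my own example, not a recommended standard, and it deliberately accepts some local parts the RFC forbids (such as "..john"):

```python
import re

# Permissive: one or more characters, none of which is '@' or whitespace.
PERMISSIVE_LOCAL_RE = re.compile(r"[^@\s]+")


def looks_like_local_part(text):
    """Return True if the whole of `text` matches the permissive pattern."""
    return PERMISSIVE_LOCAL_RE.fullmatch(text) is not None
```

re.fullmatch requires Python 3.4+; on older versions anchor the pattern with ^ and $ and use match() instead. Combined with the "click the link we emailed you" step, this catches typos without rejecting real addresses.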
We have successfully implemented PayPal Encrypted Website Payments (EWP) in our Python + Pyramid program, except for one tiny detail: input sanitization. Namely, we would like to help the user by passing as much data as possible from our user database to PayPal. Now, it occurred to me that a malicious user could change his name to 'Mr Hacker\nprice=0.00' or similar, and thus completely negate the security offered by EWP. I did try URL-encoding the values, but PayPal does not seem to decode the percent escapes in the file.
Our code is based on the django-paypal library; the library completely neglects this issue, happily outputting bare name=value pairs without any checks:
plaintext = 'cert_id=%s\n' % CERT_ID
for name, field in self.fields.iteritems():
    value = None
    if name in self.initial:
        value = self.initial[name]
    elif field.initial is not None:
        value = field.initial
    if value is not None:
        # ### Make this less hackish and put it in the widget.
        if name == "return_url":
            name = "return"
        plaintext += u'%s=%s\n' % (name, value)
plaintext = plaintext.encode('utf-8')
So, how does one properly format the input for dynamically encrypted buttons? Or is there a better way to achieve similar functionality in Website Payments Standard to avoid this problem, yet as secure?
Update
What we craft is a string with contents like:
item_number=BASIC
p3=1
cmd=_xclick-subscriptions
business=business@business.com
src=1
item_name=Percent%20encoding%20and%20UTF-8:%20%C3%B6
charset=UTF-8
t3=M
a3=10.0
sra=1
cert_id=ABCDEFGHIJKLM
currency_code=EUR
and encrypt it for EWP; the user posts the form to https://www.sandbox.paypal.com/cgi-bin/webscr. After the user clicks the button, the PayPal page "Log in to complete your checkout" displays the item name as "Percent%20encoding%20and%20UTF-8:%20%C3%B6". Thus, it seems that percent encoding is not decoded for EWP input.
You could filter out key-value pairs with regular expressions:
>>> import re
>>> text = 'Mr Hacker\nprice=0.00\nsecurity=false'
>>> re.sub(r'[\n][^\s]+=[^\s]*', '', text)
'Mr Hacker'
Or, even simpler, ditch everything after the first newline:
>>> text.splitlines()[0]
'Mr Hacker'
The latter assumes that the first line is correct, which might not be the case.
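Since the EWP plaintext is a newline-separated list of name=value pairs, another option is to neutralize line breaks inside every value while building it, so no value can start a new pair. A minimal sketch, assuming PayPal treats each line as one pair and that turning control characters into spaces is acceptable for your fields; sanitize_ewp_value and build_plaintext are hypothetical helpers, not part of django-paypal:

```python
def sanitize_ewp_value(value):
    """Replace ASCII control characters (including CR and LF) with spaces."""
    return "".join(" " if ord(ch) < 32 or ord(ch) == 127 else ch for ch in str(value))


def build_plaintext(pairs):
    """Build newline-separated name=value lines from an ordered list of (name, value) pairs."""
    lines = ["%s=%s" % (name, sanitize_ewp_value(value)) for name, value in pairs]
    return "\n".join(lines) + "\n"
```

With this, 'Mr Hacker\nprice=0.00' ends up as the harmless single value 'Mr Hacker price=0.00' instead of an extra price line. Unlike the filtering approaches above, nothing the user typed is silently dropped.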
I'm using the imaplib module to log into my gmail account and retrieve emails.
This gives me a lot of information, as well as the to/from/subject/body text. According to type(msg), the object returned is an instance.
My regex won't work when I apply it to the msg object, because re expects a string and this is obviously an instance.
Here is an example of a regex to identify the time, which works fine when I just give it a string:
match = re.search(r"Time:\s(([01]\d|2[0-3]):([0-5]\d))", text) # validates hour and minute on a 24-hour clock
So three questions really:
1.) Am I going about this the right way, or is there a better way to do it?
2.) How can I apply my regex to this 'instance' information so I can identify the date/time, etc.?
3.) How can I retrieve just the email body?
import email
result, data = mail.fetch(latest_email_id, "(RFC822)")
raw_email = data[0][1]
msg = email.message_from_string(raw_email)  # on Python 3, raw_email is bytes: use email.message_from_bytes
msg.get_payload()
Thank you again
I think that this problem might be really close to another question I had answered:
payload of an email in string format, python
The main problem for the other person was that get_payload() can return a list of Message objects (for multipart messages), which you have to check for. It's not always just a string.
Here is the snippet from the other question about how to handle the object you get from get_payload():
if isinstance(payload, list):
    for m in payload:
        print(str(m).split())
else:
    print(str(payload).split())
Also, you can review the actual extended conversation that I had with the OP of that question here: https://chat.stackoverflow.com/rooms/5963/discussion-between-jdi-and-puneet
It turns out that the body of the email can be accessed via payload[0], since payload is a list while the msg variable is an instance. I then just converted it to a string with:
payload = msg.get_payload()
body = payload[0]
str_body = str(body)
Thanks for your help again
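A sketch of answering questions 2 and 3 with the standard email module: walk the message, take the first text/plain part, decode it to a str, and only then run the regex on it. It assumes Python 3 (with imaplib there, data[0][1] is bytes, so you would build the message with email.message_from_bytes instead) and uses a small made-up message in place of a real fetch:

```python
import email
import re

TIME_RE = re.compile(r"Time:\s(([01]\d|2[0-3]):([0-5]\d))")


def get_text_body(msg):
    """Return the decoded text/plain body of an email.message.Message as a str."""
    if msg.is_multipart():
        for part in msg.walk():  # yields the message itself, then every sub-part
            if part.get_content_type() == "text/plain":
                charset = part.get_content_charset() or "utf-8"
                return part.get_payload(decode=True).decode(charset)
        return ""
    charset = msg.get_content_charset() or "utf-8"
    return msg.get_payload(decode=True).decode(charset)


# Made-up multipart message standing in for data[0][1].
raw = (
    "From: a@example.com\n"
    "To: b@example.com\n"
    "Subject: Meeting\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "\n"
    "Time: 14:30\n"
    "--XYZ\n"
    'Content-Type: text/html; charset="utf-8"\n'
    "\n"
    "<p>Time: 14:30</p>\n"
    "--XYZ--\n"
)
msg = email.message_from_string(raw)
body = get_text_body(msg)  # a plain str, so the regex works on it
match = TIME_RE.search(body)
```

Headers such as msg["Subject"] and msg["From"] are available directly on the message object, so only the body needs this treatment.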
I'm a beginner in Python. My problem is pretty simple. I have a string containing parameters that needs to be localized in a Python application:
print _('Hello dear user, your name is ') + params['first_name'] + ' ' + params['last_name'] + _(' and blah blah blah')
This actually does the job, but is not really what I would call a nice way to do it. Not to mention that some languages would, for example, require the last name to be displayed before the first name.
Is there a better way to do it? I thought about placing custom tags like {{fn}} or {{ln}} in the translation string and replacing them with the actual values before displaying the string. But that doesn't seem much nicer either.
Thanks,
Pierre
I'd suggest
print _('Hello dear user, your name is %(first_name)s %(last_name)s') % params
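To see why named placeholders help with localization, here is a sketch in which a translation reorders them without any code change. The catalogs are hypothetical in-memory dicts standing in for gettext .mo files, and the "family_first" locale is made up for illustration:

```python
MESSAGE = "Hello dear user, your name is %(first_name)s %(last_name)s"

# Hypothetical catalogs; in a real app these would come from gettext .mo files.
CATALOGS = {
    "en": {},  # untranslated: fall back to the source string
    "family_first": {
        MESSAGE: "Hello dear user, your name is %(last_name)s %(first_name)s",
    },
}


def greet(params, locale):
    """Look up the translated template, then fill in the named values."""
    template = CATALOGS[locale].get(MESSAGE, MESSAGE)
    return template % params
```

With params = {'first_name': 'Leod', 'last_name': 'Duncan'}, greet(params, 'en') gives "... Leod Duncan" and greet(params, 'family_first') gives "... Duncan Leod". Positional %s placeholders cannot be reordered this way, because the values are consumed strictly left to right.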
Something like this should do the trick:
print _('Hello dear user, your name is %s %s and blah blah blah') % (params['first_name'], params['last_name'])
I would go with templates if I were you. That would let you have a separate template for each language. For example:
from string import Template
s_en = Template('Hello dear user, your name is $first_name $last_name and blah blah blah')
s_sco = Template('Hello, $first_name of the clan Mac$last_name...')
user = {'last_name': 'Duncan', 'first_name': 'Leod'}
print(s_en.substitute(user))
print(s_sco.substitute(user))
I thought about placing custom tags like {{fn}} or {{ln}} in the translation string and replacing them by the actual values before displaying them.
That's what I would do. Placeholders in the right places for each language versions should do the job.
One thing to mention is that in some languages people's names have to be modified depending on where in a sentence they appear and how they are used. You need to know each specific language to be able to do it correctly.
A possible solution: keep "in the middle of a sentence" cases to a minimum, and keep localizable resources separate.
Instead of Hello dear user, your name is {{UserName}}
use User name: {{UserName}}