Applying Regular Expression To An Instance - From Email - python

I'm using the imaplib module to log into my Gmail account and retrieve emails.
This gives me a lot of information as well as the to/from/subject/body text. According to
type(msg), the object returned is an instance.
My regex won't work when I apply it to the msg object, because it expects a string and msg is an instance.
Example of the regex to identify the time, which works fine when I just give it a string:
match = re.search(r"Time:\s(([01]\d|2[0-3]):([0-5]\d))", text) # validates hour (00-23) and minute (00-59) in a 24 hour clock
So three questions really:
1.) am I going about this the right way or is there a better way to do it?
2.) how can I apply my regex to this 'instance' information so I can identify the date/time etc.?
3.) how can I just retrieve the email body?
result, data = mail.fetch(latest_email_id, "(RFC822)")
raw_email = data[0][1]
email_message = email.message_from_string(raw_email)
msg = email.message_from_string(raw_email)
msg.get_payload()
Thank you again

I think that this problem might be really close to another question I had answered:
payload of an email in string format, python
The main problem for the other person was that get_payload() can return a list of multipart objects, which you have to check for. It's not always just a string.
Here is the snippet from the other question about how to handle the object you get from get_payload():
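payload = msg.get_payload()
if isinstance(payload, list):
    # multipart message: one Message object per part
    for m in payload:
        print(str(m).split())
else:
    # single-part message: the payload is already a string
    print(str(payload).split())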
Also, you can review the actual extended conversation that I had with the OP of that question here: https://chat.stackoverflow.com/rooms/5963/discussion-between-jdi-and-puneet

Turns out that the body of the email can be accessed via payload[0], as payload is a list while the msg variable was an instance. I then just converted it to a string with a simple str() call:
payload = msg.get_payload()
body = payload[0]
str_body = str(body)
Thanks for your help again
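For anyone landing here later, a minimal sketch of the whole flow (parse, pull out the plain-text body, then run the regex on a real string). The sample message and the helper name get_text_body are made up for illustration, and the regex is just an example HH:MM pattern. On Python 3, imaplib hands back bytes, so email.message_from_bytes(raw_email) would be the likely entry point instead of message_from_string.

```python
import email
import re

# Hypothetical raw message; in practice this would come from mail.fetch(...)
raw_email = (
    "From: sender@example.com\n"
    "To: me@example.com\n"
    "Subject: Meeting\n"
    "Content-Type: text/plain\n"
    "\n"
    "Time: 14:30\n"
)


def get_text_body(msg):
    """Return the first text/plain part of a parsed message as a str."""
    for part in msg.walk():  # walk() also visits the parts of multipart messages
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)  # bytes, transfer-decoded
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return ""


msg = email.message_from_string(raw_email)
body = get_text_body(msg)

# The regex now receives a plain string rather than a Message instance
match = re.search(r"Time:\s(([01]\d|2[0-3]):([0-5]\d))", body)
if match:
    print(match.group(1))
```

Using walk() avoids having to special-case whether get_payload() returned a list or a string.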

Related

Compare format string to actual string python

I have the following format, as a string variable
format = '$<amount> has been received from <user>'
How would I check if another string fits the format, for example:
message = '$50 has been received from Hugh'
I'd like to check that the message exactly fits the format, and save the data, in this case 50 and Hugh in two separate variables.
I checked RegEx on a few websites such as W3Schools and PyPi, but couldn't find anything that fits what it is I'm trying to do.
message = '$50 has been received from Hugh'
import re
regex = re.compile(r'\$(\d+) has been received from (\w+)')
amount, user = regex.findall(message)[0]
print(amount, user) # prints: 50 Hugh
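If the message has to fit the format exactly, and the format string itself should drive the match, something along these lines might work. It is only a sketch: format_to_regex is a hypothetical helper, and it assumes placeholders are always written as <name> with a word-character name.

```python
import re


def format_to_regex(fmt):
    """Turn 'text <name> text' into a compiled regex with one named group per <name>."""
    # re.split with a capturing group yields: literal, name, literal, name, ...
    parts = re.split(r"<(\w+)>", fmt)
    pattern = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pattern += re.escape(part)        # literal text, escaped ($ etc.)
        else:
            pattern += "(?P<%s>.+?)" % part   # placeholder becomes a named group
    return re.compile(pattern)


fmt = '$<amount> has been received from <user>'
message = '$50 has been received from Hugh'

# fullmatch only succeeds if the whole message fits the format
m = format_to_regex(fmt).fullmatch(message)
if m:
    amount, user = m.group("amount"), m.group("user")
    print(amount, user)
```

fullmatch() returns None for anything with leading or trailing extras, which findall() would not catch.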

Search Gmail using imaplib using double quotes - how to avoid search command error

I need to search someone's Gmail account for a specific phrase, "foo bar". If I search foo bar without double quotes I get >125,000 emails, when I search with double quotes (from the browser), I get the 180 relevant emails I'm looking for. However, imaplib's search method won't let me use double quotes. Is there anything I can do about this?
This is what I've already tried:
import imaplib
mail = imaplib.IMAP4_SSL(SMTP_SERVER)
mail.login(USERNAME,PASSWORD)
mail.select(mail_box)
Type, data = mail.search(None, ('Since 01-Jan-2016'), ('BODY "foo bar"'))
^^ works but returns >125,000 emails, mostly irrelevant - anything with both foo and bar
Type, data = mail.search(None, ('Since 01-Jan-2016'), ('BODY ""foo bar""'))
Type, data = mail.search(None, ('Since 01-Jan-2016'), ('BODY "\"foo bar\""'))
Type, data = mail.search(None, ('Since 01-Jan-2016'), ('BODY """foo bar"""'))
^^^ all of the above throw the following error: "error: SEARCH command error: BAD [b'Could not parse command']"
Any ideas would be much appreciated.
As suggested by Max above, this is what worked:
import imaplib
mail = imaplib.IMAP4_SSL(SMTP_SERVER)
mail.login(USERNAME,PASSWORD)
mail.select(mail_box)
Type, data = mail.uid('search', None, ('Since 01-Jan-2016'), 'X-GM-RAW', r'"\"foo bar\""')
Note if you're using mail.uid() search, you need to update your fetch call as well to...
mail.uid('fetch', ID, '(RFC822)')
This should work:
mail.search(None, r'BODY "\"Security Alert\""')
The r prefix turns it into a raw string, so the backslashes won't be interpreted by Python. The backslashes then get sent to the server and are interpreted properly.
You should be able to adapt this format for your use.
Note: to see what's being sent, temporarily set mail.debug to a high number, like 4. This shows the traffic. Doing this, I saw that the quote was not actually being escaped (because Python was treating the backslash as an escape for it).
mail.debug = 4
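If the phrase is not a fixed literal, the quoting can be built programmatically. This is a rough sketch: gmail_exact_phrase is a made-up helper name, and the commented-out search call simply mirrors the mail.uid(...) form shown above, so it needs a logged-in connection to try out.

```python
def gmail_exact_phrase(phrase):
    """Wrap a phrase in literal double quotes, then quote that for the IMAP command."""
    inner = '"%s"' % phrase                  # what Gmail should see: "foo bar"
    # Escape backslashes first, then the quotes, for the outer quoted string
    escaped = inner.replace('\\', '\\\\').replace('"', '\\"')
    return '"%s"' % escaped


criterion = gmail_exact_phrase('foo bar')
print(criterion)

# Assumed usage, following the working call above (not run here):
# Type, data = mail.uid('search', None, 'X-GM-RAW', criterion)
```

For 'foo bar' this produces the same string as the raw literal used in the answer.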

Extract attendee response from google calendar event with python

This is a follow on question to one answered recently by wpercy and Kieran.
I'm trying to fashion some Python code to improve a Zap in Zapier.
The first stage involved extracting the attendee emails from the supplied (by Google) string variable containing the emails separated by commas.
What I now need to figure out is how to also extract the attendee responses and pair them or somehow get them to follow their corresponding attendee email address as the remaining steps in the Zap are carried out, once for each email/attendee.
Here is the solution code I have successfully tested. It deals with just the emails:
emails = []
attendeeList = input_data['attendeeEmails'].split(',')
for email in attendeeList:
    a = {'Email' : email.strip()}
    emails.append(a)
return emails
Here is the other solution offered by Kieran:
[{'Email': email.strip()} for email in input_data['attendeeEmails'].split(',')]
The Google Calendar data looks like this:
attendees:
    1:
        displayName: Doug Christensen
        email: xxxx#gmail.com
        responseStatus: needsAction
    2:
        displayName: Doug Christensen
        email: yyyyyy#gmail.com
        responseStatus: needsAction
    3:
        self: true
        email: zzzz#xyzmadscience.com
        organizer: true
        responseStatus: accepted
So I want to get "responseStatus" and the only thing I could think to do was the following:
emails = []
position = 0
responseList = input_data['attendeeReponses'].split(',')
attendeeList = input_data['attendeeEmails'].split(',')
for email in attendeeList:
    a = {'Email' : email.strip(), 'responseStatus' : reponseStatus(position).strip()}
    a = {'Email' : email.strip()}
    emails.append(a)
    position += 1
return emails
...but that does not work (says "error" in Zapier).
I'm pretty confused by the fact that the attendee emails are available in 2 Google variables "Attendee Emails" and "Attendees Email". One actually shows up in the variables to pass to the Zap's Python code as 'Attendees[]Email' while the other shows as 'Attendee Emails'. For the attendee responses there is only one option which manifests as 'Attendees[]ResponseStatus'.
I'm clearly no expert but these labels suggest to me a bit of a data structure? when the '[]' is included, making me think that an even more elegant method of extraction of the email and pairing with the attendee response, is possible.
I want the Python code to return the email and its corresponding attendee response in a way such that the following Zap steps will be performed once for each email/response pair.
Again, any guidance would be greatly appreciated.
Doug
The reason for your error is that you're trying to access an element in the list with parentheses (). You should be using brackets [].
Even after fixing that, you can be doing this in a far more pythonic fashion. Instead of keeping track of your position in the list with its own variable, you should use the built-in function enumerate(). This will keep track of the index for you, and you won't have to increment it manually.
You would use it like this
emails = []
responseList = input_data['attendeeReponses'].split(',')
attendeeList = input_data['attendeeEmails'].split(',')
for i, email in enumerate(attendeeList):
    # index into responseList (the list defined above) with brackets
    a = {'Email': email.strip(), 'responseStatus': responseList[i].strip()}
    emails.append(a)
return emails
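Another option, if the two comma-separated strings always line up one-to-one, is zip(). This is a sketch outside Zapier: the function wrapper, the sample dict and the example addresses are assumptions for illustration, and the 'attendeeReponses' key is spelled the way it appears in the question.

```python
def pair_attendees(input_data):
    """Pair each attendee email with the response at the same position."""
    emails = input_data['attendeeEmails'].split(',')
    # Key spelled as in the question's input_data mapping
    responses = input_data['attendeeReponses'].split(',')
    # zip() stops at the shorter list, so no index bookkeeping is needed
    return [
        {'Email': email.strip(), 'responseStatus': response.strip()}
        for email, response in zip(emails, responses)
    ]


# Hypothetical input resembling what the Zap step would supply
sample = {
    'attendeeEmails': 'a@example.com, b@example.com',
    'attendeeReponses': 'needsAction, accepted',
}
pairs = pair_attendees(sample)
print(pairs)
```

Returning a list of dicts is what makes the following Zap steps run once per email/response pair.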

How to account for string formatting in Python variable assignment?

I am parsing text to check for the presence of a sentence such as:
u'Your new contact email thedude#gmail.com has been confirmed.'
...where the text either side of the email address will be constant, and the email address won't be constant, but will be known before parsing.
Assume the sentence is contained in a variable called response and the email address in address. I could do:
'Your new contact email ' + address + ' has been confirmed' in response
This is a little untidy, and downright inconvenient if the text of the sentence ever changes. Is it possible to take advantage of string formatting in a variable assignment, e.g.
sentence = 'Your new contact email %s has been confirmed'
And somehow pass address into the variable at runtime?
Of course you can! Try this out...
sentence = 'Your new contact email {} has been confirmed'.format(address)
There's also this other (rather hacky) alternative...
sentence = 'Your new contact email %s has been confirmed' % address
This alternative has its limitations too, such as requiring the use of a tuple for passing more than one argument...
sentence = 'Hi, %s! Your new contact email %s has been confirmed' % ('KemyLand', address)
Edit: According to comments from the OP, he's asking how to do this if the format string happens to exist before address does. Actually, this is very simple. May I show you the last three examples with this?...
# At this moment, `address` does not exist yet.
firstFormat = 'Your new contact email {} has been confirmed'
secondFormat = 'Your new contact email %s has been confirmed'
thirdFormat = 'Hi, %s! Your new contact email %s has been confirmed'
# Now, somehow, `address` exists.
firstSentence = firstFormat.format(address)
secondSentence = secondFormat % address
thirdSentence = thirdFormat % ('Pyderman', address)
I hope this sheds some light on it!
This is what I usually do with my SQL queries, output lines and whatever:
sentence = 'Blah blah {0} blah'
...
if sentence.format(address) in response:
    foo()
    bar()
So basically you get to keep all your I/O-related strings defined in one place instead of hardcoded all over the program, and you can edit them whenever you please, but only in a limited way: str.format() raises an IndexError when it gets too few positional arguments, while extra arguments are silently ignored.
Maybe a hacky way of doing it, but if I understand you correctly, here's how you can do it.
At the beginning, declare the string, but where the address would go, put in something that will never generally be repeated... Like ||||| (5 pipe characters).
Then when you have the address and want to pop it in, do:
sentence = myString.replace('|||||', address)
That will slot your address right where you need it :)
My understanding was you are trying to create a string and then later, add a piece in. Sorry if I misunderstood you :)
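Putting the pieces together, here is a small sketch of a template defined up front and filled in only at check time. The constant name SENTENCE_TEMPLATE, the helper is_confirmed and the example address are all made up for illustration; a named placeholder is used so the template reads clearly on its own.

```python
# Defined once, before any address is known
SENTENCE_TEMPLATE = 'Your new contact email {address} has been confirmed.'


def is_confirmed(response, address):
    """Fill the template with the address and look for the result in the response."""
    return SENTENCE_TEMPLATE.format(address=address) in response


response = u'Your new contact email thedude@example.com has been confirmed.'
print(is_confirmed(response, 'thedude@example.com'))
print(is_confirmed(response, 'someone.else@example.com'))
```

If the wording of the sentence changes, only the template line needs editing.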

Forwarded Email parsing in Python/Any other language?

I have some mails in txt format, that have been forwarded multiple times.
I want to extract the content/the main body of the mail. This should be at the last position in the hierarchy..right? (Someone point this out if I'm wrong).
The email module doesn't give me a way to extract the content. if I make a message object, the object doesn't have a field for the content of the body.
Any idea on how to do it? Is there any module for this, or any particular approach you can think of, other than the most naive one, of course, of starting from the end of the text file and scanning backwards until you find the header?
If there is an easy or straightforward way/module in any other language (which I doubt), please let me know that as well!
Any help is much appreciated!
The email module doesn't give me a way to extract the content. if I make a message object, the object doesn't have a field for the content of the body.
Of course it does. Have a look at the Python documentation and examples. In particular, look at the walk() and get_payload() methods.
Try get_payload() on the parsed Message object. If the message is not multipart, the return value will be a string; otherwise it will be a list of Message objects.
Something like this:
messages = parsed_message.get_payload()
# keep descending into the last part until the payload is no longer a list
while isinstance(messages, list):
    messages = messages[-1].get_payload()
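To try the idea without a real mailbox, here is a self-contained sketch that builds a nested message with email.mime and then descends to the innermost payload. The nested structure and the helper name innermost_text are assumptions for illustration; real forwarded mails may instead carry the original as a message/rfc822 attachment or as inline quoted text, in which case this loop alone will not be enough.

```python
from email import message_from_string
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def innermost_text(msg):
    """Follow the last part at each level until the payload is not a list."""
    payload = msg.get_payload()
    while isinstance(payload, list):
        payload = payload[-1].get_payload()
    return payload


# Build a stand-in for a forwarded mail: the original sits in the last part
inner = MIMEMultipart()
inner.attach(MIMEText("original body"))

outer = MIMEMultipart()
outer.attach(MIMEText("FYI, see below"))
outer.attach(inner)

# Round-trip through text, as if the mail had been read from a .txt file
parsed = message_from_string(outer.as_string())
print(innermost_text(parsed))
```

Taking the last part at each level matches the assumption in the question that the original content sits at the end of the hierarchy.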
