Get all information about an email - python

I want to get all my emails with all their information, e.g. Body, Subject, Date Received, Folders, Attachments, From, To, etc.
I found this:
How can I download all emails with attachments from Gmail?
But it's only subjects/from/attachments.
I copied pretty much everything and added this:
for part in mail.get_payload():
    # decode=True undoes base64/quoted-printable and returns raw bytes
    print(part.get_payload(decode=True))
But at times I get garbled binary output such as (�� (�� (��, and at other times HTML.
I just want simple access to an email message where everything is decoded and easy to handle.
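One possible way to get decoded fields is the standard library's newer email API. The sketch below assumes the raw message bytes have already been fetched (for example over IMAP); the function name summarize is hypothetical. The binary output above probably comes from attachment parts and the HTML from text/html parts, so this version picks a single text body and lists attachments separately.

```python
from email import message_from_bytes, policy


def summarize(raw_bytes):
    """Return decoded headers, body text, and attachment names for one raw message."""
    # policy.default yields an EmailMessage, which decodes headers and text for us.
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    # Prefer the plain-text body and fall back to HTML if there is no plain part.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    return {
        "subject": msg["Subject"],
        "from": msg["From"],
        "to": msg["To"],
        "date": msg["Date"],
        "body": body_part.get_content() if body_part is not None else "",
        "attachments": [part.get_filename() for part in msg.iter_attachments()],
    }
```

Folder information is not part of the message itself; it would have to come from wherever the message was fetched.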

Related

Python email.mime sometimes sends body text as attachments

I am working with the python email.mime library to generate emails for monitoring system status across a number of locations. I attach data to the body of the email as described in examples and documentation and often I get the email just as expected: lines of text with the data I need (within the body of the email).
In some cases, however, I get the text within an attachment. Here's what I see when I diff two of the emails in raw format:
---===============1782456183745610843756==
-Content-Type: text/plain; charset="us-ascii"; name="ATT00001"
+--===============4561084375674561084375==
+Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
-Content-Description: ATT00001
-Content-Disposition: attachment; filename="ATT00001"
The way I want the email sent is as text within the body. Some locations always attach it as an untyped text file, while others always append it to the body of the email.
I am looking for differences in how the emails are generated, but it is literally the same code generating both, so it must be some site-specific parameter. Has anyone encountered this before?
I have figured it out.
It has to do with how Microsoft servers interpret the components of the email. I would build the emails in pieces: first the main part was constructed, and then additional metadata was computed and attached via msg.attach. This data was supposed to be handled as part of the body of the email (and it is when it is sent through some email servers). However, when the email is routed through MS Exchange servers, it takes the additional parts of the email and treats them as attachments.
I was able to piece it together and found evidence for it here:
https://learn.microsoft.com/en-us/exchange/troubleshoot/mailflow/message-body-shown-as-attachment
https://kb.mit.edu/confluence/pages/viewpage.action?pageId=4981187
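If the extra parts are what get turned into attachments, one workaround that follows from this is to join all the text into a single body part before sending. This is a minimal sketch under that assumption; build_status_email and its parameters are hypothetical names, not the original code.

```python
from email.message import EmailMessage


def build_status_email(sender, recipient, subject, sections):
    """Build a message whose whole body is one text/plain part."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # Join every piece of text up front instead of calling msg.attach() per piece,
    # so there are no additional parts for a server to reinterpret.
    msg.set_content("\n\n".join(sections))
    return msg
```

The resulting message is not multipart, so there is only one part for any server to interpret.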

Extract the forwarded emails from outlook using python and split it

I need to extract emails from Outlook. Most of them are forwarded emails. I tried pywin32, and it worked, but it extracts the entire email as one block.
The problem is that I need it split into title, body and signature separately.
For example:
Title should include From, To, CC, Subject.
Body should include the email body.
Signature should include the email signature if it's available.
The issue is that, because these are threaded emails, the body section contains the entire forwarded email, as shown below.
Thanks and Regards,
Sam
From: Odar <odar#sysp.com>
Sent: 2 October 2007 09:22
To: rav <rav#tsvs.com>
Cc: hare <hare#sysp.com>; fere <fere#sysp.com>
Subject: CHAN BULB - test at Company
CAUTION - EXTERNAL SENDER !
Dear Sir,
Good day!!
Please note that the product is checked and working.
It will get delivered as soon as possible.
Best Regards
Odar
Executive
This is the body of the email as pywin32 extracted it, since it is a threaded email.
But I need the title separately, including From, Sent, To, Cc and Subject.
The body should contain the middle text, and the signature should contain the last part.
Any suggestions on how this can be done?
The code I tried is below:
import win32com.client

outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
# 6 is olFolderInbox, so this is the Inbox, not the Outbox
inbox = outlook.GetDefaultFolder(6)
messages = inbox.Items
print(messages[1].Body)
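One heuristic way to split such a body is sketched below. It assumes the forwarded header block starts at the first "From:" line, that header lines look like the sample above, and that the signature starts at a sign-off phrase such as "Best Regards". The function name split_forwarded and the list of sign-off phrases are hypothetical and would need tuning for real mail.

```python
import re

# Header lines of the forwarded block, as they appear in the sample above.
HEADER_RE = re.compile(r"^(From|Sent|To|Cc|Subject):\s*(.*)$", re.IGNORECASE)
# Assumed sign-off phrases that mark the start of a signature.
SIGNOFF_RE = re.compile(
    r"^(best regards|thanks and regards|kind regards|regards)\b", re.IGNORECASE
)


def split_forwarded(text):
    """Split a forwarded email body into title fields, body, and signature."""
    lines = text.splitlines()
    start = next(
        (i for i, line in enumerate(lines) if line.strip().lower().startswith("from:")),
        None,
    )
    if start is None:
        return {"title": {}, "body": text.strip(), "signature": ""}
    title = {}
    i = start
    # Consume consecutive header lines (From, Sent, To, Cc, Subject).
    while i < len(lines):
        match = HEADER_RE.match(lines[i].strip())
        if not match:
            break
        title[match.group(1).capitalize()] = match.group(2).strip()
        i += 1
    rest = lines[i:]
    # Everything from the first sign-off phrase onward is treated as the signature.
    sig_at = next(
        (j for j, line in enumerate(rest) if SIGNOFF_RE.match(line.strip())),
        len(rest),
    )
    return {
        "title": title,
        "body": "\n".join(rest[:sig_at]).strip(),
        "signature": "\n".join(rest[sig_at:]).strip(),
    }
```

Calling split_forwarded(messages[1].Body) would then give a dictionary with the three pieces. Text before the first "From:" line (the forwarder's own note) is ignored in this sketch.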

Making a Python program handle mailto links

I made a Python program that is capable of sending email. However, I want to make it capable of being a default email client such that it will capture the email address and subject of HTML email links (mailto links) when the user clicks them.
How do I get the email address and subject to my client?
Currently, I can set my program as the default mail client, but I don't know what information, or format of information, it's getting from the web browser; so, I'm not sure how to parse it.
Assuming the complete link is passed in as sys.argv[1], you need to do something like this:
import sys
import urllib.parse

parsed = urllib.parse.urlparse(sys.argv[1])
mail_addr = parsed.path
fields = urllib.parse.parse_qs(parsed.query)
This sets mail_addr to the address to send the email to, while fields will be a dictionary of additional parameters.
The fields that you can expect to be present are specified in RFC 6068.
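As a slightly fuller sketch of the same idea, the function below also percent-decodes the address part and accepts several comma-separated recipients. The name parse_mailto is hypothetical, and the sketch assumes the link arrives as one string.

```python
from urllib.parse import parse_qs, unquote, urlparse


def parse_mailto(link):
    """Return (recipients, fields) for a mailto: link."""
    parsed = urlparse(link)
    if parsed.scheme != "mailto":
        raise ValueError("not a mailto link: %r" % link)
    # The path holds zero or more comma-separated, percent-encoded addresses.
    recipients = [unquote(addr) for addr in parsed.path.split(",") if addr]
    # parse_qs percent-decodes values and returns a list per field name.
    fields = {name.lower(): values for name, values in parse_qs(parsed.query).items()}
    return recipients, fields
```

For example, parse_mailto(sys.argv[1]) could be called at startup. Note that parse_qs also turns "+" into a space, which may not be what a mailto link intends.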

Parsing emails with bad mime types in python

I am trying to use python's imaplib and email.feedparser to grab an attachment out of a gmail inbox. The email is generated by an external party and sent to us, so I have no control over it.
The trouble is that the message I am trying to parse has msg.get_content_maintype() return 'text' instead of 'multipart'. As a result, the uuencoded attachment gets concatenated with the rest of the message, and I don't see an easy way to pull it out of email.message.Message.
Any ideas how I can extract the attachment out of such an email?
If it is any help, the email has 'Produced By Microsoft MimeOLE V6.00.3790.4862' in it. Thunderbird also had trouble rendering this email and wasn't able to figure out that it had an attachment. Otherwise, the message looks ok in Outlook and Gmail web client.
As the email.parser.FeedParser documentation says, you can use the _factory parameter to provide your own Message class. You could supply a class derived from email.message.Message that replaces the message's content type with the correct one when the message contains "Produced By Microsoft MimeOLE ...".
I think it is safe to expect that the messages in this particular case will always be multipart.
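A different approach from the _factory idea, sketched here as an assumption rather than a tested fix for this particular message, is to leave the parser alone and pull the uuencoded block out of the text payload directly. The function name extract_uuencoded is hypothetical; binascii.a2b_uu is the standard library call that decodes one uuencoded line.

```python
import binascii
import re

# A uuencoded block starts with "begin <mode> <filename>" and ends with "end".
BEGIN_RE = re.compile(r"^begin [0-7]{3,4} (.+)$")


def extract_uuencoded(text):
    """Return (filename, data) pairs for uuencoded blocks found in text."""
    found = []
    name = None
    chunks = []
    for line in text.splitlines():
        if name is None:
            match = BEGIN_RE.match(line)
            if match:
                name, chunks = match.group(1).strip(), []
        elif line.strip() == "end":
            found.append((name, b"".join(chunks)))
            name = None
        elif line:
            # Each data line decodes independently; the terminator line decodes to b"".
            chunks.append(binascii.a2b_uu(line))
    return found
```

It could be applied to the string returned by msg.get_payload() for the message that was parsed as plain text.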

extracting individual email from a single thunderbird email data file

I have a Thunderbird email data file from which I need to extract individual emails. I tried using a regex to do a plain extract based on the "From" tag, but this doesn't give me the required result. An email can have another email attached within its body, so a single email can contain more than one "From:" string. How can I extract individual emails out of this data file?
Try the python mailbox module.
The mailbox module can read mbox/maildir message stores.
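A minimal sketch of that suggestion, assuming the Thunderbird data file is in mbox format. The function name list_subjects is hypothetical.

```python
import mailbox


def list_subjects(path):
    """Return (From, Subject) header pairs for every message in an mbox file."""
    # create=False avoids silently creating an empty mailbox if the path is wrong.
    box = mailbox.mbox(path, create=False)
    try:
        # Iterating the mailbox yields one parsed message object per email.
        return [(msg["From"], msg["Subject"]) for msg in box]
    finally:
        box.close()
```

Messages are separated at lines that begin with "From " followed by a space, so a "From:" header inside a message body does not start a new message.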
