How do I get unfolded email headers in Python3?

(Note: this question has nothing to do with encoding, as should be clear by reading it. Ignore the suggestion above.)
I'm learning Python and figured a nice tool to start out with would be something that grabs some emails over IMAP and displays a given header. The following is basically my script:
#!/usr/bin/env python3
from imaplib import IMAP4_SSL
from netrc import netrc
from email import message_from_bytes
conn = IMAP4_SSL('imap.gmail.com')
auth = netrc().hosts['imap.gmail.com']
conn.login(auth[0], auth[2])
conn.select()
typ, data = conn.search(None, 'ALL')
i = 0
for num in reversed(data[0].split()):
    i += 1
    typ, data = conn.fetch(num, '(RFC822)')
    email = message_from_bytes(data[0][1])
    print("%i: %s" % (int(num), email.get('subject')))
    if i == 5:
        break
conn.close()
conn.logout()
The frustrating thing is that the header comes back folded, so the line breaks of the underlying email string show through instead of the actual value of the header.
How can I get the correctly unfolded header value? I'd like to stick with the Python 3 standard library, but I'm open to external dependencies if I must.

Use Policy Objects to enable unfolding in the Python email package. In your script, you would have to add:
from email.policy import SMTPUTF8
to import the policy SMTPUTF8, and later use that when calling message_from_bytes:
email = message_from_bytes(data[0][1], policy=SMTPUTF8)
I tried your script with Python 3.9.5. In fact, every policy except compat32 (the one used when the policy parameter is omitted) enables unfolding.
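To see the difference without an IMAP connection, here is a minimal sketch that parses a small hand-written message once with compat32 and once with email.policy.default. The sample bytes and the address in them are made up for illustration.

```python
from email import message_from_bytes
from email.policy import compat32, default

# Hypothetical sample message whose Subject header is folded across two lines.
RAW = (
    b"From: sender@example.com\r\n"
    b"Subject: This is a long subject\r\n"
    b" that was folded\r\n"
    b"\r\n"
    b"Body text\r\n"
)

# compat32 (used when no policy is given) returns the value as stored,
# with the line break from the folded source still in it.
folded = message_from_bytes(RAW, policy=compat32)["subject"]

# A modern policy such as email.policy.default unfolds the value on access.
unfolded = message_from_bytes(RAW, policy=default)["subject"]

print(repr(folded))
print(repr(unfolded))
```

In the script from the question, the same effect should come from passing a policy argument to message_from_bytes, as described above.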

TL;DR: strip newlines
I'd love it if there were a simple answer to this, so if you have a better one feel free to add it. In the meantime, this somewhat hacky solution works:
#!/usr/bin/env python3
from imaplib import IMAP4_SSL
from netrc import netrc
from email import message_from_bytes
import re
conn = IMAP4_SSL('imap.gmail.com')
auth = netrc().hosts['imap.gmail.com']
conn.login(auth[0], auth[2])
conn.select()
typ, data = conn.search(None, 'ALL')
i = 0
for num in reversed(data[0].split()):
    i += 1
    typ, data = conn.fetch(num, '(RFC822)')
    email = message_from_bytes(data[0][1])
    raw_header = email.get('subject')
    # strip the CR/LF characters left over from header folding
    header = re.sub('[\r\n]', '', raw_header)
    print("%i: %s" % (int(num), header))
    if i == 5:
        break
conn.close()
conn.logout()
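A slightly stricter variant of the same idea, as a sketch: RFC 5322 describes unfolding as removing only the line breaks that are immediately followed by a space or tab. The helper name `unfold` is made up here and is not part of the email package.

```python
import re

def unfold(value: str) -> str:
    """Remove each line break that is directly followed by a space or tab."""
    return re.sub(r"\r?\n(?=[ \t])", "", value)

# The whitespace that followed the line break is kept, so words stay separated.
print(unfold("This is a long subject\r\n that was folded"))
```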

Related

Converting EML (MIME) to MSG with python

I'm trying to convert an EML file to an MSG (Outlook) file using Python. Working from various examples I put together this code, but it doesn't work. It creates an MSG file, but Outlook cannot read it, and it is twice the size of the input EML file. I'm a little bit lost; any ideas?
from win32com.mapi import mapi
from win32com.mapi import mapitags
import win32com.client
import pythoncom
from win32com import storagecon
import ctypes
import platform
import winreg
import uuid
import sys
import os
mapi.MAPIInitialize((mapi.MAPI_INIT_VERSION, mapi.MAPI_MULTITHREAD_NOTIFICATIONS))
IconvOLE = ctypes.OleDLL(r'C:\Program Files (x86)\Microsoft Office\root\Office16\OUTLMIME.DLL')
clsid_class = uuid.UUID(str(mapi.CLSID_IConverterSession)).bytes_le
iclassfactory = uuid.UUID(str(pythoncom.IID_IClassFactory)).bytes_le
com_classfactory = ctypes.c_long(0)
IconvOLE.DllGetClassObject(clsid_class, iclassfactory, ctypes.byref(com_classfactory))
MyFactory = pythoncom.ObjectFromAddress(com_classfactory.value, pythoncom.IID_IClassFactory)
cs = MyFactory.CreateInstance (None, str(mapi.IID_IConverterSession))
eml = mapi.OpenStreamOnFileW(r"C:\test.eml")
stg = pythoncom.StgCreateDocfile(r"C:\test.msg",
    storagecon.STGM_CREATE | storagecon.STGM_READWRITE | storagecon.STGM_TRANSACTED)
msg = mapi.OpenIMsgOnIStg(0, None, stg, None, 0, mapi.MAPI_UNICODE)
cs.MIMEToMAPI(eml, msg, win32com.mapi.mapi.CCSF_SMTP | win32com.mapi.mapi.CCSF_INCLUDE_BCC)
msg.SaveChanges(0)
mapi.MAPIUninitialize()
Firstly, sizes don't matter, especially if you compare different file formats.
Secondly, try to open the MSG file in a utility like SSView (it shows the data on the IStorage level) or OutlookSpy (I am its author - click OpenIMsgOnIStg button) - it will show the MSG file data on the MAPI level.
Perhaps most importantly, as of Outlook 2016, IConverterSession interface only works if your code is running inside the outlook.exe address space (i.e. your code is a COM/VSTO addin or Outlook VBA). Also, your code never checks that IConverterSession::MIMEToMAPI returns a success return code.
If using Redemption is an option (I am also its author), it can convert an EML file to MSG without the Outlook converter, as easily as this (in VBScript):
set Session = CreateObject("Redemption.RDOSession")
Session.MAPIOBJECT = Application.Session.MAPIOBJECT 'not required
set Msg = Session.CreateMessageFromMsgFile("c:\temp\test.msg")
Msg.Sent = true
Msg.Import "c:\temp\test.eml", 1024 '1024 is olRfc822
Msg.Save
It started to work after I moved from Outlook x86 to x64 and added the following registry keys:
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID\{4E3A7680-B77A-11D0-9DA5-00C04FD65685}]
@="CLSID_IConverterSession"
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID\{4E3A7680-B77A-11D0-9DA5-00C04FD65685}\InprocServer32]
@="C:\\Program Files\\Microsoft Office\\root\\Office16\\OUTLMIME.DLL"
"ThreadingModel"="Both"
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID\{9EADBD1A-447B-4240-A9DD-73FE7C53A981}]
@="CLSID_IMimeMessage"
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID\{9EADBD1A-447B-4240-A9DD-73FE7C53A981}\InprocServer32]
@="C:\\Program Files\\Microsoft Office\\root\\Office16\\OUTLMIME.DLL"
"ThreadingModel"="Both"
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID\{9EADBD1A-447B-4240-A9DD-73FE7C53A981}\Typelib]
@="{9EADBD25-447B-4240-A9DD-73FE7C53A981}"
Keys are copies of the keys that you can find in:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\ClickToRun\REGISTRY\MACHINE\Software\Classes\CLSID
Working code:
mapi.MAPIInitialize((mapi.MAPI_INIT_VERSION, mapi.MAPI_MULTITHREAD_NOTIFICATIONS))
inf = mapi.OpenStreamOnFile(r"C:\Users\xxx\raw.eml")
stg = pythoncom.StgCreateDocfile(r"C:\Users\xxx\raw.msg",
    storagecon.STGM_CREATE | storagecon.STGM_READWRITE | storagecon.STGM_SHARE_EXCLUSIVE | storagecon.STGM_TRANSACTED)
msg = mapi.OpenIMsgOnIStg(0, None, stg, None, 0, mapi.MAPI_UNICODE)
cs = pythoncom.CoCreateInstance(mapi.CLSID_IConverterSession, None, pythoncom.CLSCTX_INPROC_SERVER, mapi.IID_IConverterSession)
cs.MIMEToMAPI(inf, msg, 0)
msg.SaveChanges(0)
mapi.MAPIUninitialize()

How to properly read J1939 messages from .asc file with cantools?

I'm trying to create a CAN logs converter from .asc files to .csv files (in human readable form). I'm somewhat successful. My code works fine with almost any database but j1939.dbc.
The thing is, that if I print out the messages read from the dbc file, I can see that the messages from j1939.dbc are read into the database. But it fails to find any of those messages in the processed log file. At the same time I can read the same file using Vector CANalyzer with no issues.
I wonder why this happens and why it only affects j1939.dbc and not the others.
I suspect that the way I convert those messages is wrong, because execution never gets past the if msg_id in database: line (and as mentioned above, those messages are certainly there, because Vector CANalyzer works fine with them).
EDIT: I realized that maybe the problem is not cantools but the python-can package. Maybe can.ASCReader() doesn't cope well with J1939 frames and omits them? I'm going to investigate myself, but I hope someone better at coding can help.
import pandas as pd
import can
import cantools
import time as t
from tqdm import tqdm
import re
import os
from binascii import unhexlify
dbcs = [filename.split('.')[0] for filename in os.listdir('./dbc/') if filename.endswith('.dbc')]
files = [filename.split('.')[0] for filename in os.listdir('./asc/') if filename.endswith('.asc')]
start = t.time()
db = cantools.database.Database()
for dbc in dbcs:
    with open(f'./dbc/{dbc}.dbc', 'r') as f:
        db.add_dbc(f)
f_num = 1
for fname in files:
    print(f'[{f_num}/{len(files)}] Parsing data from file: {fname}')
    log=can.ASCReader(f'./asc/{fname}.asc')
    entries = []
    all_msgs =[]
    message = {'Time [s]': ''}
    database = list(db._frame_id_to_message.keys())
    print(database)
    lines = sum(1 for line in open(f'./asc/{fname}.asc'))
    msgs = iter(log)
    try:
        for msg, i in zip(msgs, tqdm(range(lines))):
            msg = re.split("\\s+", str(msg))
            timestamp = round(float(msg[1]), 0)
            msg_id = int(msg[3], 16)
            try:
                data = unhexlify(''.join(msg[7:15]))
            except:
                continue
            if msg_id in database:
                if timestamp != message['Time [s]']:
                    entries.append(message.copy())
                message.update({'Time [s]': timestamp})
                message.update(db.decode_message(msg_id, data))
    except ValueError:
        print('ValueError')
    df = pd.DataFrame(entries[1:])
    duration = t.time() - start
    df.to_csv(f'./csv/{fname}.csv', index=False)
    print(f'DONE IN {int(round(duration, 2)//60)}min{round(duration % 60, 2)}s!\n{len(df.columns)} signals extracted!')
    f_num += 1
class can.ASCReader(file, base='hex')
Bases: can.io.generic.BaseIOHandler
Iterator of CAN messages from a ASC logging file. Meta data (comments, bus statistics, J1939 Transport Protocol messages) is ignored.
Might answer your question...

Exchangelib item.sender returns bytes

I'm having an issue recently with some values returned from item.sender - attached is a screen dump
As you can see in the lower portion of the graphic, it's not the usual form returned by item.sender which usually is of form:
Mailbox(name='ReCircle Recycling', email_address='aldoushicks@recirclerecycling.com', routing_type='SMTP', mailbox_type='OneOff')
Has anyone else seen this?
How do you deal with it?
Even though I am using a try/except clause, this result still causes my IDE to freeze.
I re-ran the script today using exactly the same date filter and it didn't happen. So I'm forced to ask: what happened, and why is it not occurring again?
It's weird behaviour. It could jam up a script in the future, so I'm wondering how to prevent it.
Code:
from collections import defaultdict
from datetime import datetime
import logging
# Message is imported for the isinstance check further down
from exchangelib import DELEGATE, Account, Credentials, \
    EWSDateTime, EWSTimeZone, Configuration, Message
from exchangelib.util import PrettyXmlHandler
logging.basicConfig(level=logging.DEBUG, handlers=[PrettyXmlHandler()])
gusername="" #deleted :/
gpassword="" #deleted :/
gprimary_smtp_address="bspks@lunet.lboro.ac.uk"
em_dict = defaultdict(list)
def contactServer(pusername, ppassword,pprimary_smtp_address):
    creds = Credentials(pusername, ppassword)
    config = Configuration(server='outlook.office365.com/EWS/Exchange.asmx', \
        credentials=creds)
    return Account(
        primary_smtp_address=pprimary_smtp_address,
        autodiscover=False,
        config = config,
        access_type=DELEGATE
    )
print ("connecting to server\n")
account = contactServer(gusername, gpassword, gprimary_smtp_address)
dt_string = "2018-11-01 12:41:19+00:00" #usually comes out from db, stringified for debugging
dt = datetime.strptime(dt_string, '%Y-%m-%d %H:%M:%S+00:00')
tz = EWSTimeZone.timezone('Europe/London')
last_datetime = tz.localize(dt)
for item in account.inbox.filter(datetime_received__gt=last_datetime):
    if isinstance(item, Message):
        em_dict["username"].append(gusername)
        em_dict["gprimary_smtp_address"].append(gprimary_smtp_address)
        em_dict["datetime_received"].append(item.datetime_received)
        em_dict["subject"].append(item.subject)
        em_dict["body"].append(item.body)
        em_dict["item_id"].append(item.item_id)
        em_dict["sender"].append(item.sender)
        # extract the email address from item.sender
        emailaddr = item.sender.email_address
        print ("debug email addr: ", emailaddr)
print ("downloaded emails\n")

Wrong encoding of email attachment

I have a Python 2.7 script running on Windows. It logs in to Gmail and checks for new e-mails and attachments:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
file_types = ["pdf", "doc", "docx"] # download attachments with these extensions
login = "login"
passw = "password"
imap_server = "imap.gmail.com"
smtp_server = "smtp.gmail.com"
smtp_port = 587
from smtplib import SMTP
from email.parser import HeaderParser
from email.MIMEText import MIMEText
import sys
import imaplib
import getpass
import email
import datetime
import os
import time
if __name__ == "__main__":
    try:
        while True:
            session = imaplib.IMAP4_SSL(imap_server)
            try:
                rv, data = session.login(login, passw)
                print "Logged in: ", rv
            except imaplib.IMAP4.error:
                print "Login failed!"
                sys.exit(1)
            rv, mailboxes = session.list()
            rv, data = session.select(foldr)
            rv, data = session.search(None, "(UNSEEN)")
            for num in data[ 0 ].split():
                rv, data = session.fetch(num, "(RFC822)")
                for rpart in data:
                    if isinstance(rpart, tuple):
                        msg = email.message_from_string(rpart[ 1 ])
                        to = email.utils.parseaddr(msg[ "From" ])[ 1 ]
                text = data[ 0 ][ 1 ]
                msg = email.message_from_string(text)
                got = []
                for part in msg.walk():
                    if part.get_content_maintype() == 'multipart':
                        continue
                    if part.get('Content-Disposition') is None:
                        continue
                    filename = part.get_filename()
                    print "file: ", filename
                    print "Extention: ", filename.split(".")[ -1 ]
                    if filename.split(".")[ -1 ] not in file_types:
                        continue
                    data = part.get_payload(decode = True)
                    if not data:
                        continue
                    date = datetime.datetime.now().strftime("%Y-%m-%d")
                    if not os.path.isdir("CONTENT"):
                        os.mkdir("CONTENT")
                    if not os.path.isdir("CONTENT/" + date):
                        os.mkdir("CONTENT/" + date)
                    ftime = datetime.datetime.now().strftime("%H-%M-%S")
                    new_file = "CONTENT/" + date + "/" + ftime + "_" + filename
                    f = open(new_file, 'wb')
                    print "Got new file %s from %s" % (new_file, to)
                    got.append(filename.encode("utf-8"))
                    f.write(data)
                    f.close()
            session.close()
            session.logout()
            time.sleep(60)
    except:
        print "TARFUN!"
And the problem is that the last print reads garbage:
=?UTF-8?B?0YfQsNGB0YLRjCAxINGC0LXQutGB0YIg0LzQtdGC0L7QtNC40YfQutC4LmRv?=
for example, so later checks don't work. On Linux it works just fine.
So far I have tried to decode/encode the filename to UTF-8, but it did nothing. Thanks in advance.
If you read the spec that defines the filename field, RFC 2183, section 2.3, it says:
Current [RFC 2045] grammar restricts parameter values (and hence
Content-Disposition filenames) to US-ASCII. We recognize the great
desirability of allowing arbitrary character sets in filenames, but
it is beyond the scope of this document to define the necessary
mechanisms. We expect that the basic [RFC 1521] 'value'
specification will someday be amended to allow use of non-US-ASCII
characters, at which time the same mechanism should be used in the
Content-Disposition filename parameter.
There are proposed RFCs to handle this. In particular, it's been suggested that filenames be handled as encoded-words, as defined by RFC 5987, RFC 2047, and RFC 2231. In brief this means either RFC 2047 format:
"=?" charset "?" encoding "?" encoded-text "?="
… or RFC 2231 format:
"=?" charset ["*" language] "?" encoded-text "?="
Some mail agents are already using this functionality, others don't know what to do with it. The email package in Python 2.x is among those that don't know what to do with it. (It's possible that the later version in Python 3.x does, or that it may change in the future, but that won't help you if you want to stick with 2.x.) So, if you want to parse this, you have to do it yourself.
In your example, you've got a filename in RFC 2047 format, with charset UTF-8 (which is usable directly as a Python encoding name), encoding B, which means Base-64, and content 0YfQsNGB0YLRjCAxINGC0LXQutGB0YIg0LzQtdGC0L7QtNC40YfQutC4LmRv. So, you have to base-64 decode that, then UTF-8-decode that, and you get u'часть 1 текст методички.do'.
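The two decoding steps just described can be sketched as follows. This is written for Python 3 for illustration only, and it handles just the Base64 ("B") case from the question.

```python
import base64

# The encoded-word from the question, split on "?" into its RFC 2047 fields.
encoded_word = "=?UTF-8?B?0YfQsNGB0YLRjCAxINGC0LXQutGB0YIg0LzQtdGC0L7QtNC40YfQutC4LmRv?="
_, charset, encoding, payload, _ = encoded_word.split("?")

if encoding.upper() != "B":
    raise ValueError("this sketch only handles Base64 encoded-words")

# Step 1: Base64-decode the payload to bytes. Step 2: decode those bytes with the charset.
filename = base64.b64decode(payload).decode(charset)
print(filename)
```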
If you want to do this more generally, you're going to have to write code which tries to interpret each filename in RFC 2231 format if possible, in RFC 2047 format otherwise, and does the appropriate decoding steps. This code isn't trivial enough to write in a StackOverflow answer, but the basic idea is pretty simple, as demonstrated above, so you should be able to write it yourself. You may also want to search PyPI for existing implementations.
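For readers who can use Python 3, the standard library's email.header module can decode RFC 2047 encoded-words. The following is a minimal sketch under that assumption; it covers RFC 2047 values only and makes no attempt at RFC 2231 parameters.

```python
from email.header import decode_header, make_header

raw_filename = "=?UTF-8?B?0YfQsNGB0YLRjCAxINGC0LXQutGB0YIg0LzQtdGC0L7QtNC40YfQutC4LmRv?="

# decode_header splits the value into (data, charset) pairs;
# make_header reassembles them into a Header whose str() is the decoded text.
filename = str(make_header(decode_header(raw_filename)))
print(filename)

# Values without encoded-words pass through unchanged.
print(str(make_header(decode_header("report.pdf"))))
```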

mimetools.Message() to python 3 email.message.Message

I am trying to port some Python 2.x code to Python 3.
The line I'm struggling with is
from mimetools import Message
...
headers = Message(StringIO(data.split('\r\n', 1)[1]))
I have figured out that mimetools is no longer present in Python 3 and that the replacement is the email package.
I tried to replace it like this:
headers = email.message_from_file(io.StringIO(data.split('\r\n', 1)[1]))
but with that i get this error:
headers = email.message_from_file(io.StringIO(data.split('\r\n', 1)[1]))
TypeError: Type str doesn't support the buffer API
I am looking for a hint on how to port this from mimetools to email correctly.
The original code is not mine. It can be found here:
https://gist.github.com/jkp/3136208
Alex's own solution from his comment:
import email
import io  # needed for io.StringIO below
stream = io.StringIO()
rxString = data.decode("utf-8").split('\r\n', 1)[1]
stream.write(rxString)
headers = email.message_from_string(rxString)
I found a shorter solution:
from email import message_from_string
data = socket.recv(4096)
headers = message_from_string(str(data, 'ASCII').split('\r\n', 1)[1])
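A self-contained sketch of the same approach, using made-up request bytes in place of a real socket.recv() result: the bytes are decoded to str first, the request line is dropped, and the remainder is parsed as headers.

```python
from email import message_from_string

# Hypothetical bytes as they might arrive from a socket: a request line, then headers.
data = (
    b"GET /chat HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Upgrade: websocket\r\n"
    b"\r\n"
)

# Decode before splitting: calling split('\r\n', 1) with a str separator on bytes raises TypeError.
header_text = data.decode("ascii").split("\r\n", 1)[1]
headers = message_from_string(header_text)

print(headers["Host"])
print(headers["upgrade"])  # header lookup is case-insensitive
```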
