Converting EML (MIME) to MSG with Python

I'm trying to convert an EML file to an MSG (Outlook) file using Python. Using various examples I was able to put together this code, but it doesn't work. It creates an MSG file, but Outlook can't read it, and it is twice the size of the input EML file. I'm a little lost; any ideas?
from win32com.mapi import mapi
from win32com.mapi import mapitags
import win32com.client
import pythoncom
from win32com import storagecon
import ctypes
import platform
import winreg
import uuid
import sys
import os
mapi.MAPIInitialize((mapi.MAPI_INIT_VERSION, mapi.MAPI_MULTITHREAD_NOTIFICATIONS))
IconvOLE = ctypes.OleDLL(r'C:\Program Files (x86)\Microsoft Office\root\Office16\OUTLMIME.DLL')
clsid_class = uuid.UUID(str(mapi.CLSID_IConverterSession)).bytes_le
iclassfactory = uuid.UUID(str(pythoncom.IID_IClassFactory)).bytes_le
com_classfactory = ctypes.c_long(0)
IconvOLE.DllGetClassObject(clsid_class, iclassfactory, ctypes.byref(com_classfactory))
MyFactory = pythoncom.ObjectFromAddress(com_classfactory.value, pythoncom.IID_IClassFactory)
cs = MyFactory.CreateInstance (None, str(mapi.IID_IConverterSession))
eml = mapi.OpenStreamOnFileW(r"C:\test.eml")
stg = pythoncom.StgCreateDocfile(r"C:\test.msg",
storagecon.STGM_CREATE | storagecon.STGM_READWRITE | storagecon.STGM_TRANSACTED)
msg = mapi.OpenIMsgOnIStg(0, None, stg, None, 0, mapi.MAPI_UNICODE)
cs.MIMEToMAPI(eml, msg, win32com.mapi.mapi.CCSF_SMTP | win32com.mapi.mapi.CCSF_INCLUDE_BCC)
msg.SaveChanges(0)
mapi.MAPIUninitialize()

First, file sizes don't mean much, especially when you compare different file formats.
Second, try opening the MSG file in a utility like SSView (it shows the data at the IStorage level) or OutlookSpy (I am its author; click the OpenIMsgOnIStg button), which shows the MSG file data at the MAPI level.
Perhaps most importantly, as of Outlook 2016, the IConverterSession interface only works if your code is running inside the outlook.exe address space (i.e. your code is a COM/VSTO add-in or Outlook VBA). Also, your code never checks whether IConverterSession::MIMEToMAPI returns a success code.
If using Redemption is an option (I am also its author), it lets you convert an EML file to MSG without the Outlook converter, as easily as this (in VBScript):
set Session = CreateObject("Redemption.RDOSession")
Session.MAPIOBJECT = Application.Session.MAPIOBJECT 'not required
set Msg = Session.CreateMessageFromMsgFile("c:\temp\test.msg")
Msg.Sent = true
Msg.Import "c:\temp\test.eml", 1024 '1024 is olRfc822
Msg.Save
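Before reaching for SSView or OutlookSpy, a quick stdlib-only check can at least confirm that the output was written as a Compound File Binary (OLE2 structured storage) container, which is what StgCreateDocfile produces and what .msg files are. It says nothing about whether the MAPI properties inside are valid, so it complements the tools mentioned above rather than replacing them. The function name is hypothetical:

```python
# Every Compound File Binary (OLE2 structured storage) file, including .msg, starts with these 8 bytes
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def is_compound_file(path) -> bool:
    """Return True if the file starts with the Compound File Binary signature."""
    with open(path, "rb") as fh:
        return fh.read(len(CFB_SIGNATURE)) == CFB_SIGNATURE


# Hypothetical usage: is_compound_file(r"C:\test.msg")
```

A False result means the file is not a structured storage file at all; a True result only means the container exists.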

So it started to work after I moved from Outlook x86 to x64 and added the following registry keys:
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID\{4E3A7680-B77A-11D0-9DA5-00C04FD65685}]
@="CLSID_IConverterSession"
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID\{4E3A7680-B77A-11D0-9DA5-00C04FD65685}\InprocServer32]
@="C:\\Program Files\\Microsoft Office\\root\\Office16\\OUTLMIME.DLL"
"ThreadingModel"="Both"
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID\{9EADBD1A-447B-4240-A9DD-73FE7C53A981}]
@="CLSID_IMimeMessage"
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID\{9EADBD1A-447B-4240-A9DD-73FE7C53A981}\InprocServer32]
@="C:\\Program Files\\Microsoft Office\\root\\Office16\\OUTLMIME.DLL"
"ThreadingModel"="Both"
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID\{9EADBD1A-447B-4240-A9DD-73FE7C53A981}\Typelib]
@="{9EADBD25-447B-4240-A9DD-73FE7C53A981}"
These keys are copies of the ones found under:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\ClickToRun\REGISTRY\MACHINE\Software\Classes\CLSID
Working code:
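import pythoncom
from win32com import storagecon
from win32com.mapi import mapi
mapi.MAPIInitialize((mapi.MAPI_INIT_VERSION, mapi.MAPI_MULTITHREAD_NOTIFICATIONS))
# Open the source EML file as an IStream
inf = mapi.OpenStreamOnFile(r"C:\Users\xxx\raw.eml")
# Create the target MSG file as a structured storage (compound) file
stg = pythoncom.StgCreateDocfile(r"C:\Users\xxx\raw.msg",
    storagecon.STGM_CREATE | storagecon.STGM_READWRITE | storagecon.STGM_SHARE_EXCLUSIVE | storagecon.STGM_TRANSACTED)
# Wrap the storage in an IMessage so MAPI properties can be written to it
msg = mapi.OpenIMsgOnIStg(0, None, stg, None, 0, mapi.MAPI_UNICODE)
# Create Outlook's MIME converter (in this setup it needed the registry keys above)
cs = pythoncom.CoCreateInstance(mapi.CLSID_IConverterSession, None, pythoncom.CLSCTX_INPROC_SERVER, mapi.IID_IConverterSession)
# Convert MIME to MAPI; a failing HRESULT is raised as pythoncom.com_error
cs.MIMEToMAPI(inf, msg, 0)
msg.SaveChanges(0)
mapi.MAPIUninitialize()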

Related

How to read custom message type using ros2bag

So I have this code, which works great for reading messages from predefined topics and printing them to the screen. The rosbags come with a rosbag_name.db3 (SQLite) database and a metadata.yaml file.
from rosbags.rosbag2 import Reader as ROS2Reader
import sqlite3
from rosbags.serde import deserialize_cdr
import matplotlib.pyplot as plt
import os
import collections
import argparse
parser = argparse.ArgumentParser(description='Extract images from rosbag.')
# input will be the folder containing the .db3 and metadata.yml file
parser.add_argument('--input','-i',type=str, help='rosbag input location')
# run with python filename.py -i rosbag_dir/
args = parser.parse_args()
rosbag_dir = args.input
topic = "/topic/name"
frame_counter = 0
with ROS2Reader(rosbag_dir) as ros2_reader:
    ros2_conns = [x for x in ros2_reader.connections]
    # This prints a list of all topic names for sanity
    print([x.topic for x in ros2_conns])
    ros2_messages = ros2_reader.messages(connections=ros2_conns)
    for m, msg in enumerate(ros2_messages):
        (connection, timestamp, rawdata) = msg
        if (connection.topic == topic):
            print(connection.topic) # shows topic
            print(connection.msgtype) # shows message type
            print(type(connection.msgtype)) # shows it's of type string
            # TODO
            # this is where things crash when it's a custom message type
            data = deserialize_cdr(rawdata, connection.msgtype)
            print(data)
The issue is that I can't figure out how to read custom message types. deserialize_cdr takes a string for the message type, but it's not clear to me how to replace it with a path or otherwise pass in a custom message.
Thanks
One approach is to declare the message definition as a string and register it with the type system:
from rosbags.typesys import get_types_from_msg, register_types
MY_CUSTOM_MSG = """
std_msgs/Header header
string foo
"""
register_types(get_types_from_msg(
MY_CUSTOM_MSG, 'my_custom_msgs/msg/MyCustomMsg'))
from rosbags.typesys.types import my_custom_msgs__msg__MyCustomMsg as MyCustomMsg
Next, get the message type string that you can pass to deserialize_cdr:
msg_type = MyCustomMsg.__msgtype__
Also, see here for a quick example.
Another approach is to load the type directly from the message definition file. Essentially, you first need to read the .msg file:
from pathlib import Path
custom_msg_path = Path('/path/to/my_custom_msgs/msg/MyCustomMsg.msg')
msg_def = custom_msg_path.read_text(encoding='utf-8')
and then follow the same steps as above starting with get_types_from_msg().
A more detailed example of this approach is given here.
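Both approaches need the full type name in the package/msg/Name form, and the import line in the first approach uses the same name with "/" replaced by "__". When the definition comes from a .msg file, both strings can be derived from its path, assuming the usual package/msg/Name.msg layout. A small stdlib sketch; the helper names are hypothetical, and the rosbags calls in the trailing comment are only the ones already shown above:

```python
from pathlib import Path


def msgtype_from_path(msg_path) -> str:
    """Turn '.../my_custom_msgs/msg/MyCustomMsg.msg' into 'my_custom_msgs/msg/MyCustomMsg'."""
    path = Path(msg_path)
    # Assumes the conventional <package>/msg/<Name>.msg directory layout
    if path.suffix != ".msg" or path.parent.name != "msg":
        raise ValueError(f"expected <package>/msg/<Name>.msg, got {path}")
    return f"{path.parent.parent.name}/msg/{path.stem}"


def generated_class_name(msgtype: str) -> str:
    """Turn 'my_custom_msgs/msg/MyCustomMsg' into 'my_custom_msgs__msg__MyCustomMsg'."""
    return msgtype.replace("/", "__")


# Hypothetical usage with the calls shown above:
# msg_path = Path('/path/to/my_custom_msgs/msg/MyCustomMsg.msg')
# msgtype = msgtype_from_path(msg_path)
# register_types(get_types_from_msg(msg_path.read_text(encoding='utf-8'), msgtype))
# data = deserialize_cdr(rawdata, msgtype)
```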

How to properly read J1939 messages from .asc file with cantools?

I'm trying to create a CAN log converter from .asc files to .csv files (in human-readable form), and I'm somewhat successful. My code works fine with almost any database except j1939.dbc.
The thing is, if I print out the messages read from the dbc files, I can see that the messages from j1939.dbc are loaded into the database. But the code fails to find any of those messages in the processed log file. At the same time, I can read the same file using Vector CANalyzer with no issues.
I wonder why this may happen and why it only affects j1939.dbc and not the others.
I suspect that the way I convert those messages is wrong, because the if msg_id in database: condition is never true for them (and, as mentioned above, those messages are certainly there, because Vector CANalyzer handles them fine).
EDIT: I realized that maybe the problem is not cantools but the python-can package: maybe can.ASCReader() doesn't handle J1939 frames well and omits them? I'm going to investigate myself, but I hope someone better at coding will help.
import pandas as pd
import can
import cantools
import time as t
from tqdm import tqdm
import re
import os
from binascii import unhexlify
dbcs = [filename.split('.')[0] for filename in os.listdir('./dbc/') if filename.endswith('.dbc')]
files = [filename.split('.')[0] for filename in os.listdir('./asc/') if filename.endswith('.asc')]
start = t.time()
db = cantools.database.Database()
for dbc in dbcs:
    with open(f'./dbc/{dbc}.dbc', 'r') as f:
        db.add_dbc(f)
f_num = 1
for fname in files:
    print(f'[{f_num}/{len(files)}] Parsing data from file: {fname}')
    log=can.ASCReader(f'./asc/{fname}.asc')
    entries = []
    all_msgs =[]
    message = {'Time [s]': ''}
    database = list(db._frame_id_to_message.keys())
    print(database)
    lines = sum(1 for line in open(f'./asc/{fname}.asc'))
    msgs = iter(log)
    try:
        for msg, i in zip(msgs, tqdm(range(lines))):
            msg = re.split("\\s+", str(msg))
            timestamp = round(float(msg[1]), 0)
            msg_id = int(msg[3], 16)
            try:
                data = unhexlify(''.join(msg[7:15]))
            except:
                continue
            if msg_id in database:
                if timestamp != message['Time [s]']:
                    entries.append(message.copy())
                message.update({'Time [s]': timestamp})
                message.update(db.decode_message(msg_id, data))
    except ValueError:
        print('ValueError')
    df = pd.DataFrame(entries[1:])
    duration = t.time() - start
    df.to_csv(f'./csv/{fname}.csv', index=False)
    print(f'DONE IN {int(round(duration, 2)//60)}min{round(duration % 60, 2)}s!\n{len(df.columns)} signals extracted!')
    f_num += 1
class can.ASCReader(file, base='hex')
Bases: can.io.generic.BaseIOHandler
Iterator of CAN messages from a ASC logging file. Meta data (comments, bus statistics, J1939 Transport Protocol messages) is ignored.
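This might answer your question: according to the python-can documentation quoted above, ASCReader ignores J1939 Transport Protocol messages.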
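A separate thing worth checking, which is my assumption rather than something the documentation above confirms: J1939 DBC files usually list each parameter group under one concrete 29-bit ID that includes a particular priority and source address, so an exact msg_id in database test misses the same parameter group sent by another node or with another priority. The sketch below compares Parameter Group Numbers (PGNs) instead, using the standard J1939 29-bit ID layout; it is only meaningful for 29-bit J1939 IDs, and the helper names are hypothetical:

```python
def j1939_pgn(can_id: int) -> int:
    """Return the Parameter Group Number (PGN) encoded in a 29-bit J1939 CAN ID."""
    pdu_format = (can_id >> 16) & 0xFF
    # Bits 8..25 hold EDP, DP, PDU format and PDU specific; priority and source address are dropped
    pgn = (can_id >> 8) & 0x3FFFF
    if pdu_format < 240:
        # PDU1 format: the PDU-specific byte is a destination address, not part of the PGN
        pgn &= 0x3FF00
    return pgn


def build_pgn_lookup(frame_ids):
    """Map each DBC frame ID's PGN back to that frame ID.

    If the DBC defines the same PGN under several IDs, the last one wins.
    """
    return {j1939_pgn(frame_id): frame_id for frame_id in frame_ids}


# Hypothetical use in the loop above, replacing `if msg_id in database:`
# pgn_lookup = build_pgn_lookup(database)
# dbc_id = pgn_lookup.get(j1939_pgn(msg_id))
# if dbc_id is not None:
#     message.update(db.decode_message(dbc_id, data))
```

This does not help with multi-packet Transport Protocol messages, which ASCReader skips entirely.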

Read PST files from win32 or pypff

I want to read PST files using Python. I've found two libraries: win32 (pywin32) and pypff.
Using win32, we can create an Outlook object with:
import win32com.client
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.GetDefaultFolder(6)
GetDefaultFolder(6) returns the Inbox folder, and then I can work with that folder's methods and attributes.
But what I want is to pass in my own PST files for pywin32 (or any other library) to read. This code only connects to my Outlook application.
With pypff, I can use the code below to work with PST files:
import pypff
pst_file = pypff.file()
pst_file.open('test.pst')
root = pst_file.get_root_folder()
for folder in root.sub_folders:
    for sub in folder.sub_folders:
        for message in sub.sub_messages:
            print(message.get_plain_text_body())
But I want attributes like the size of each message, and I'd also like to access the calendars in the PST files, which as far as I know pypff doesn't provide.
Question
How can I read PST files to get data like the size of the email, the types of attachments it has and the calendars?
Is it possible? Is there a workaround in win32, pypff or any other library?
This is something that I want to do for my own application. I was able to piece together a solution from these sources:
https://gist.github.com/attibalazs/d4c0f9a1d21a0b24ff375690fbb9f9a7
https://github.com/matthewproctor/OutlookAttachmentExtractor
https://learn.microsoft.com/en-us/office/vba/api/outlook.namespace
The third link above gives additional details about the available attributes and the various item types. My solution still needs to connect to your Outlook application, but that should be transparent to the user, since the PST store is removed again in the try/except/finally block. I hope this gets you on the right track!
import win32com.client
def find_pst_folder(OutlookObj, pst_filepath):
    for Store in OutlookObj.Stores:
        if Store.IsDataFileStore and Store.FilePath == pst_filepath:
            return Store.GetRootFolder()
    return None
def enumerate_folders(FolderObj):
    for ChildFolder in FolderObj.Folders:
        enumerate_folders(ChildFolder)
    iterate_messages(FolderObj)
def iterate_messages(FolderObj):
    for item in FolderObj.Items:
        print("***************************************")
        print(item.SenderName)
        print(item.SenderEmailAddress)
        print(item.SentOn)
        print(item.To)
        print(item.CC)
        print(item.BCC)
        print(item.Subject)
        count_attachments = item.Attachments.Count
        if count_attachments > 0:
            for att in range(count_attachments):
                print(item.Attachments.Item(att + 1).Filename)
Outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
pst = r"C:\Users\Joe\Your\PST\Path\example.pst"
Outlook.AddStore(pst)
PSTFolderObj = find_pst_folder(Outlook,pst)
try:
    enumerate_folders(PSTFolderObj)
except Exception as exc:
    print(exc)
finally:
    Outlook.RemoveStore(PSTFolderObj)
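The loop above only prints mail fields, and calendar items in the PST do not have properties such as SenderName. Assuming the same late-bound pywin32 Outlook objects, the Outlook object model also exposes MailItem.Size (in bytes), Attachment.FileName and Attachment.Type (e.g. 1 = olByValue), and AppointmentItem.Start and End for calendar entries; items can be told apart by their Class property (43 = olMail, 26 = olAppointment). A sketch of a per-item helper that could be called from iterate_messages instead of the print statements; the helper names are hypothetical and it has not been checked against every item type a PST can contain:

```python
import os

OL_MAIL = 43          # OlObjectClass.olMail
OL_APPOINTMENT = 26   # OlObjectClass.olAppointment


def iter_attachments(item):
    """Yield the attachments of an Outlook item (the COM collection is 1-based)."""
    attachments = item.Attachments
    for index in range(1, attachments.Count + 1):
        yield attachments.Item(index)


def describe_item(item):
    """Return a dict of a few properties for mail and appointment items, else None."""
    if item.Class == OL_MAIL:
        return {
            "kind": "mail",
            "subject": item.Subject,
            "size_bytes": item.Size,
            "attachments": [
                {
                    "name": att.FileName,
                    # File type guessed from the extension, e.g. ".pdf"
                    "extension": os.path.splitext(att.FileName)[1].lower(),
                    "type": att.Type,
                }
                for att in iter_attachments(item)
            ],
        }
    if item.Class == OL_APPOINTMENT:
        return {"kind": "appointment", "subject": item.Subject,
                "start": item.Start, "end": item.End}
    # Contacts, tasks, meeting requests etc. are skipped in this sketch
    return None


# Hypothetical usage inside iterate_messages:
# for item in FolderObj.Items:
#     info = describe_item(item)
#     if info is not None:
#         print(info)
```

Because enumerate_folders already visits every folder in the store, the PST's calendar folders are covered by the same recursion.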

How to suppress the "Microsoft PDF Reflow has stopped working" error when using python to open pdf file with MS Word?

I was trying to automate the pdf-to-docx process using python.
PDF Reflow opens automatically when I open PDF files with MS Word, so I used it as an OCR tool.
I have suppressed Word's message boxes by using word_app.DisplayAlerts = 0 and try-except.
However, the message box "Microsoft PDF Reflow has stopped working" still pops up sometimes.
Now I have to close those message boxes manually, which is not reasonable for an automated process. Is there any way to suppress the errors from PDF Reflow?
Below is the code I use:
import pythoncom
import win32com
from win32com.client import Dispatch, constants
#import logging
word_app = win32com.client.gencache.EnsureDispatch("Word.Application")
word_app.Visible = 0
word_app.DisplayAlerts = 0
wc = win32com.client.constants
try:
    word_doc = word_app.Documents.Open('file.pdf')
except pythoncom.com_error as e:
    print(e)
    #logger.info(e)
word_doc.SaveAs(FileName = 'file.docx', FileFormat = wc.wdFormatXMLDocument)
word_doc.Close(SaveChanges = wc.wdDoNotSaveChanges)
word_app.Quit()
Thank you for any help!
Found the answer myself:
import os
import win32com.client
shell_app = win32com.client.Dispatch('WScript.Shell')
RegKey = r'HKEY_CURRENT_USER\Software\Microsoft\Windows\Windows Error Reporting'
# Save the current value so it can be restored later (RegRead raises if the value does not exist)
GetKeyValue = shell_app.RegRead(os.path.join(RegKey, "DontShowUI"))
# DontShowUI = 1 hides the Windows Error Reporting dialog
EditKeyValue = shell_app.RegWrite(os.path.join(RegKey, "DontShowUI"), 1, "REG_DWORD")
##use this to restore your registry
RestoreKeyValue = shell_app.RegWrite(os.path.join(RegKey, "DontShowUI"), GetKeyValue, "REG_DWORD")
Be careful when using this code: it edits the registry, and mistakes there can damage your system.
It is also not an ideal solution, because it suppresses not only PDF Reflow's error reporting but all error reporting dialogs.

How do I get unfolded email headers in Python3?

(Note: this question has nothing to do with encoding, as should be clear by reading it. Ignore the suggestion above.)
I'm learning Python and figured a nice tool to start with would be something that grabs some emails over IMAP and displays a given header. The following is basically my script:
#!/usr/bin/env python3
from imaplib import IMAP4_SSL
from netrc import netrc
from email import message_from_bytes
conn = IMAP4_SSL('imap.gmail.com')
auth = netrc().hosts['imap.gmail.com']
conn.login(auth[0], auth[2])
conn.select()
typ, data = conn.search(None, 'ALL')
i = 0
for num in reversed(data[0].split()):
    i += 1
    typ, data = conn.fetch(num, '(RFC822)')
    email = message_from_bytes(data[0][1])
    print("%i: %s" % (int(num), email.get('subject')))
    if i == 5:
        break
conn.close()
conn.logout()
The frustrating thing is that the header comes back folded, exposing the underlying email string instead of the actual value of the header.
How can I get the correctly unfolded header value? I'd like to stick with core Python 3, but I'm open to external dependencies if I must.
Use Policy Objects to enable unfolding in the Python email package. In your script, you would have to add:
from email.policy import SMTPUTF8
to import the policy SMTPUTF8, and later use that when calling message_from_bytes:
email = message_from_bytes(data[0][1], policy=SMTPUTF8)
I tried your script with Python 3.9.5; in fact, all policies except compat32 (which is used when the policy parameter is absent) enabled unfolding.
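A self-contained way to see the difference without an IMAP connection: parse a hard-coded message whose Subject is folded, once with the default compat32 policy and once with SMTPUTF8. The sample message is made up for illustration:

```python
from email import message_from_bytes
from email.policy import SMTPUTF8

# Hypothetical sample message whose Subject header is folded onto two lines
RAW = (
    b"Subject: This is a rather long subject line that\r\n"
    b" has been folded onto a second line\r\n"
    b"\r\n"
    b"Body text\r\n"
)

# No policy argument: compat32 is used and the fold (line break + space) is kept
legacy_subject = message_from_bytes(RAW)["subject"]
# A non-compat32 policy such as SMTPUTF8 returns the unfolded value
modern_subject = message_from_bytes(RAW, policy=SMTPUTF8)["subject"]

print(repr(legacy_subject))
print(repr(modern_subject))
```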
TL;DR: strip newlines
I'd love it if there were a simple answer to this, so if you have a better one, feel free to add it. In the meantime, this somewhat hacky solution works perfectly:
#!/usr/bin/env python3
from imaplib import IMAP4_SSL
from netrc import netrc
from email import message_from_bytes
import re
conn = IMAP4_SSL('imap.gmail.com')
auth = netrc().hosts['imap.gmail.com']
conn.login(auth[0], auth[2])
conn.select()
typ, data = conn.search(None, 'ALL')
i = 0
for num in reversed(data[0].split()):
    i += 1
    typ, data = conn.fetch(num, '(RFC822)')
    email = message_from_bytes(data[0][1])
    raw_header = email.get('subject')
    # Remove the CR/LF characters left over from header folding
    header = re.sub('[\r\n]', '', raw_header)
    print("%i: %s" % (int(num), header))
    if i == 5:
        break
conn.close()
conn.logout()
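Stripping every CR and LF works for typical subjects, but RFC 5322 (section 2.2.3) defines unfolding more narrowly: remove any CRLF that is immediately followed by whitespace. A slightly stricter variant of the re.sub call above, assuming the header value is a plain str; it also accepts a bare LF, and it does not decode RFC 2047 encoded words such as =?UTF-8?B?...?=:

```python
import re

# A line break (CRLF or bare LF) immediately followed by a space or tab marks a fold
_FOLD = re.compile(r"\r?\n(?=[ \t])")


def unfold_header(value: str) -> str:
    """Remove folding line breaks while keeping the whitespace that follows them."""
    return _FOLD.sub("", value)


print(unfold_header("This is a rather long subject line that\r\n has been folded"))
```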
