email.message_from_string can't parse outlook original source message - python

I am trying to parse a multipart email from its original source message, copied from Outlook. The email has 2 parts: plain text and HTML. email.message_from_string(raw_email) doesn't parse the raw email correctly: it doesn't return 2 parts, and _payload includes everything except for the first 2 lines.
Note: I cut off most of the email to keep it short.
Original source message from Outlook:
Received: from SN1NAM04HT187.eop-NAM04.prod.protection.outlook.com
(2603:10b6:300:d4::32) by CO2PR01MB1959.prod.exchangelabs.com with HTTPS via
MWHPR19CA0022.NAMPRD19.PROD.OUTLOOK.COM; Wed, 31 Jul 2019 19:52:30 +0000
Received: from SN1NAM04FT005.eop-NAM04.prod.protection.outlook.com
(10.152.88.55) by SN1NAM04HT187.eop-NAM04.prod.protection.outlook.com
(10.152.89.14) with Microsoft SMTP Server (version=TLS1_2,
cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.2115.10; Wed, 31 Jul
2019 19:52:29 +0000
Authentication-Results: spf=pass (sender IP is 50.31.51.89)
smtp.mailfrom=sendgrid.blabla.com; windowslive.com; dkim=pass (signature was
verified) header.d=blabla.com;windowslive.com; dmarc=pass action=none
header.from=blabla.com;
Received-SPF: Pass (protection.outlook.com: domain of sendgrid.blabla.com
designates 50.31.51.89 as permitted sender) receiver=protection.outlook.com;
client-ip=50.31.51.89; helo=o1.email-sg.blabla.com;
Received: from o1.email-sg.blabla.com (50.31.51.89) by
SN1NAM04FT005.mail.protection.outlook.com (10.152.88.160) with Microsoft SMTP
Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
15.20.2136.14 via Frontend Transport; Wed, 31 Jul 2019 19:52:29 +0000
X-IncomingTopHeaderMarker: OriginalChecksum:0A3835CC8F7E76F92E22A1986408E34F6CB0EE38219063E844D0BB1572B82825;UpperCasedChecksum:3B51CEDA7CBD6FB06905BA9CCFA3417B571F394F0412206B12B87927F8C8FE0B;SizeAsReceived:1804;Count:15
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d=blabla.com;
h=from:sender:to:subject:mime-version:content-type; s=s1;
bh=wEOiatvA5BWHjVFwDPHy3RC5ur4=; b=Y/sBR8/uaU5y+7GN3GanXk7dlsUId
bQjsB7HfZp6fdDuVo9EIKrFn9uffrsqJpXO6DFqX5rWWCvgTMYPnsM8Iy3ekU0sD
psxBZ186ROAoalowdniEsGZ/fTMan4JEXEWhlKKpGHxGR102lz1qylqRazxFlOEY
5yhWp6dJjLegIg=
Received: by filter0246p1iad2.sendgrid.net with SMTP id filter0246p1iad2-24721-5D41F17C-8
2019-07-31 19:52:28.625630772 +0000 UTC m=+518882.143553260
Received: from iad1gmta02.localdomain (unknown [192.88.178.20])
by ismtpd0002p1iad2.sendgrid.net (SG) with ESMTP id GrDTvaa3R6yukBfilmmfMw
for <y#windowslive.com>; Wed, 31 Jul 2019 19:52:28.454 +0000 (UTC)
Received: from iad1gbos.localdomain (unknown [10.3.65.145])
by iad1gmta02.localdomain (Postfix) with ESMTP id 62A8812AEA4D
for <y#windowslive.com>; Wed, 31 Jul 2019 15:52:28 -0400 (EDT)
Received: from iad1gbos.ecom.blabla.com (localhost [127.0.0.1])
by iad1gbos.localdomain (Postfix) with ESMTP id 57CD51013601
for <y#windowslive.com>; Wed, 31 Jul 2019 15:52:28 -0400 (EDT)
From: "blabla.com" <service#blabla.com>
Sender: "blabla.com" <service#blabla.com>
To: yc#windowslive.com
Message-ID: <1943845105.133098.1564602748358#localhost>
Subject: Thanks for your blabla order!
Content-Type: multipart/alternative;
boundary="----=_Part_12654_1135590884.1564602743147"
Date: Wed, 31 Jul 2019 19:52:28 +0000
X-SG-EID: KlhL5+04rpq9b+lNnUQSSXSv/U/Agrwcy5kw6hHCP8rbih+DKKzTjpaizOf9gI4jfUbeoQFtkwaLeA
Q5VJW+s2G92MVJdOKnwbJCcJQrsVc4oiuZgDCBS8dpWhU6KfIM6V5wL2yNP0pKKCugS+b4cgX4K5CX
GndIFYXJXa1LTZLPblTMNhH8QH5+kLY4Wtg9po8FuNUzEJaPXsJgnMHYzKZOIvAvnevqNIcyYVL2Yc
0=
X-SG-ID: DT9Vw4eifUpKg3EkHbNxgoJgjlm7TnFJRHcoaVv1UYo=
X-IncomingHeaderCount: 15
Return-Path: bounces+266386-caec-yc=windowslive.com#sendgrid.blabla.com
X-MS-Exchange-Organization-ExpirationStartTime: 31 Jul 2019 19:52:29.3384
(UTC)
X-MS-Exchange-Organization-ExpirationStartTimeReason: OriginalSubmit
X-MS-Exchange-Organization-ExpirationInterval: 1:00:00:00.0000000
X-MS-Exchange-Organization-ExpirationIntervalReason: OriginalSubmit
X-MS-Exchange-Organization-Network-Message-Id: 0748a39e-bdb3-4241-2271-08d715f09e99
X-EOPAttributedMessage: 0
X-EOPTenantAttributedMessage: 84df9e7f-e9f6-40af-b435-aaaaaaaaaaaa:0
X-MS-Exchange-Organization-MessageDirectionality: Incoming
X-Forefront-Antispam-Report: EFV:NLI;
X-MS-Exchange-Organization-AuthSource:
SN1NAM04FT005.eop-NAM04.prod.protection.outlook.com
X-MS-Exchange-Organization-AuthAs: Anonymous
X-MS-PublicTrafficType: Email
X-MS-UserLastLogonTime: 7/31/2019 7:47:36 PM
X-MS-Office365-Filtering-Correlation-Id: 0748a39e-bdb3-4241-2271-08d715f09e99
X-Microsoft-Antispam:
BCL:6;PCL:0;RULEID:(2390118)(5000188)(711020)(4605104)(610169)(650170)(651021)(8291501072);SRVR:SN1NAM04HT187;
X-MS-TrafficTypeDiagnostic: SN1NAM04HT187:
X-MS-Exchange-PUrlCount: 24
X-MS-Exchange-EOPDirect: true
X-Sender-IP: 50.31.51.89
X-SID-PRA: SERVICE#blabla.COM
X-SID-Result: PASS
X-MS-Exchange-Organization-PCL: 2
X-OriginatorOrg: outlook.com
X-MS-Exchange-CrossTenant-OriginalArrivalTime: 31 Jul 2019 19:52:29.0952
(UTC)
X-MS-Exchange-CrossTenant-Network-Message-Id: 0748a39e-bdb3-4241-2271-08d715f09e99
X-MS-Exchange-CrossTenant-Id: 84df9e7f-e9f6-40af-b435-aaaaaaaaaaaa
X-MS-Exchange-CrossTenant-FromEntityHeader: Internet
X-MS-Exchange-CrossTenant-RMS-PersistedConsumerOrg:
00000000-0000-0000-0000-000000000000
X-MS-Exchange-Transport-CrossTenantHeadersStamped: SN1NAM04HT187
X-MS-Exchange-Transport-EndToEndLatency: 00:00:01.1679704
X-MS-Exchange-Processed-By-BccFoldering: 15.20.2115.005
X-Microsoft-Antispam-Mailbox-Delivery:
abwl:0;wl:0;pcwl:0;kl:0;iwl:0;ijl:0;dwl:0;dkl:0;rwl:0;ucf:0;jmr:0;ex:0;auth:1;dest:I;ENG:(5062000261)(5061607266)(5061608174)(4900115)(4920090)(6515079)(4950130)(4990090);
X-Message-Info:
5vMbyqxGkdee9CWP6GN6k7SiFatHA5tOJthXLYGApF09RV+2VIwDSv7TpFIyyuwbdpElZ/0OfDQ0pBW79cd9agnpjGQw7b7v7zA6S/RBHvx/Foariz/CKmDCPiOrCScSKVW9YaM/CqKL76WFalT2LUf8VJBR8M4LupokoBm/WazuStfNPUu2PvSCEzFxbvn/ptMxVl+4wEXDPQivJ1nuMw==
X-Message-Delivery: Vj0xLjE7dXM9MDtsPTA7YT0xO0Q9MTtHRD0xO1NDTD0z
X-Microsoft-Antispam-Message-Info:
1PVG+UKcd/9R+HkjIXvLA0AQVREwyDmXR6XmnL6oX6xks/yw3ZQRARX4fYngU4vXeDhoJr9kyTA6Bpm3OE5FZn8+JPH3p24pamcQhTiI/RdyRHAOx7q7YHb9PzM3EkY2hOb6qF/QxCZPdshlewXqGe+azoh4Sr9CPUL8x7gtZS0XVBYBQMtHRW0NsS0ULp/4e4+lbGoyQcXdMGoy4Cf6ACU783dlOjDyZNkz2Frk3vm3Za/P3avHn46xf8WzHrDbbfOiVc+HXFAQxBOWbQPD0rkXNssXlOszegvDX7nPq9hdj8UadbqhECjiizH890bNwqKIa2sWd/d1HBfojK2FDsEOPwDSsIS/1ApF038jELtjpzkzSadz319o6VohzYUHm7CtRF9sqJTgLVKePBo/i8FrLeoCq2rXydXj6a9MS7SqLDfny7NlP/qId5z5GXFs63K/QXu3InYLIf5zcl/kgsvg+W0cHDZ4/IdBPvHGeaQn7hdf62IKftys3CspYBbSlt0Eus97CCddOX/EBaJpJ7nEpHxIL3pxKVY0kKWqaUqrvvC3mvCffBe3igaAq2LiHgvT0pIU+j0y41VwEn7X8rL8gyWpbBF64+wf8NAe8JM2N8aWudElAkIeA5GHJodGcXt+jdyhYzh3EZs1BWyrF+k6MPp4kU/9yVxAxBimBx1aje5geHD7NggWqFJAD6fv0XsSjxku4Tap6Zs+NEkoD/MHNyT6TMnu9cqGgEoznr2mTssEZkz0JRPgcb2YbZabybkxBRJjVi/aroSjtOj2V2JHo9m2F8bA4XhLrLwgJzOs1ZelyYFKZ3OMaUAS9yNRlSPHBSKm/WUzIMNkcHLCcg==
MIME-Version: 1.0
------=_Part_12654_1135590884.1564602743147
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Hi c y,
Thanks for your order! Your order information appears below.
Item(s) Subtotal: $1.79
Shipping: $4.95
----
Total Before Tax: $6.74
Estimated Tax: $0.15
----
Order Total: $6.89
Shipping Address:
blabla.com
https://www.blabla.com
1-800-672-4399
------=_Part_12654_1135590884.1564602743147
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: 7bit
<!--[if !mso]><!--><meta http-equiv="Content-Type" content="text/html; charset=utf-8"><link href="http://cms.blabla.com/fonts/roboto/email-font.css" rel="stylesheet" type="text/css">
<style type="text/css">
#font-face {
font-family: 'Roboto';
src: url('http://cms.blabla.com/fonts/roboto/Roboto-Regular-webfont.eot');
src: url('http://cms.blabla.com/fonts/roboto/Roboto-Regular-webfont.eot?#iefix') format('embedded-opentype'),
url('http://cms.blabla.com/fonts/roboto/Roboto-Regular-webfont.woff') format('woff'),
url('http://cms.blabla.com/fonts/roboto/Roboto-Regular-webfont.ttf') format('truetype'),
url('http://cms.blabla.com/fonts/roboto/Roboto-Regular-webfont.svg#robotoregular') format('svg');
font-weight:400;
font-style:normal;
}
</style>
<div width="100%" style="display:none;font-size:0px;color:#eeeeee;line-height:1px;text-align:center;opacity:0;overflow:hidden;max-height:0px;max-width:0px;">
Questions? Call us any time 24/7 at 1-800-672-4399 or simply reply to this email | blabla.com
</div>
------=_Part_12654_1135590884.1564602743147--
Result:
{'_charset': None,
'_default_type': 'text/plain',
'_headers': [('Received',
'from SN1NAM04HT187.eop-NAM04.prod.protection.outlook.com'),
('(2603',
'10b6:300:d4::32) by CO2PR01MB1959.prod.exchangelabs.com with HTTPS via')],
'_payload': 'Received: from SN1NAM04HT187.eop-NAM04.prod.protection.outlook.com
(2603:10b6:300:d4::32) by CO2PR01MB1959.prod.exchangelabs.com with HTTPS via
MWHPR19CA0022.NAMPRD19.PROD.OUTLOOK.COM; Wed, 31 Jul 2019 19:52:30 +0000
Received: from SN1NAM04FT005.eop-NAM04.prod.protection.outlook.com
(10.152.88.55) by SN1NAM04HT187.eop-NAM04.prod.protection.outlook.com
(10.152.89.14) with Microsoft SMTP Server (version=TLS1_2,
cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.2115.10; Wed, 31 Jul
2019 19:52:29 +0000
Authentication-Results: spf=pass (sender IP is 50.31.51.89)
smtp.mailfrom=sendgrid.blabla.com; windowslive.com; dkim=pass (signature was
verified) header.d=blabla.com;windowslive.com; dmarc=pass action=none
header.from=blabla.com;
Received-SPF: Pass (protection.outlook.com: domain of sendgrid.blabla.com
designates 50.31.51.89 as permitted sender) receiver=protection.outlook.com;
client-ip=50.31.51.89; helo=o1.email-sg.blabla.com;
Received: from o1.email-sg.blabla.com (50.31.51.89) by
SN1NAM04FT005.mail.protection.outlook.com (10.152.88.160) with Microsoft SMTP
Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
15.20.2136.14 via Frontend Transport; Wed, 31 Jul 2019 19:52:29 +0000
X-IncomingTopHeaderMarker: OriginalChecksum:0A3835CC8F7E76F92E22A1986408E34F6CB0EE38219063E844D0BB1572B82825;UpperCasedChecksum:3B51CEDA7CBD6FB06905BA9CCFA3417B571F394F0412206B12B87927F8C8FE0B;SizeAsReceived:1804;Count:15
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d=blabla.com;
h=from:sender:to:subject:mime-version:content-type; s=s1;
bh=wEOiatvA5BWHjVFwDPHy3RC5ur4=; b=Y/sBR8/uaU5y+7GN3GanXk7dlsUId
bQjsB7HfZp6fdDuVo9EIKrFn9uffrsqJpXO6DFqX5rWWCvgTMYPnsM8Iy3ekU0sD
psxBZ186ROAoalowdniEsGZ/fTMan4JEXEWhlKKpGHxGR102lz1qylqRazxFlOEY
5yhWp6dJjLegIg=
Received: by filter0246p1iad2.sendgrid.net with SMTP id filter0246p1iad2-24721-5D41F17C-8
2019-07-31 19:52:28.625630772 +0000 UTC m=+518882.143553260
Received: from iad1gmta02.localdomain (unknown [192.88.178.20])
by ismtpd0002p1iad2.sendgrid.net (SG) with ESMTP id GrDTvaa3R6yukBfilmmfMw
for <y#windowslive.com>; Wed, 31 Jul 2019 19:52:28.454 +0000 (UTC)
Received: from iad1gbos.localdomain (unknown [10.3.65.145])
by iad1gmta02.localdomain (Postfix) with ESMTP id 62A8812AEA4D
for <y#windowslive.com>; Wed, 31 Jul 2019 15:52:28 -0400 (EDT)
Received: from iad1gbos.ecom.blabla.com (localhost [127.0.0.1])
by iad1gbos.localdomain (Postfix) with ESMTP id 57CD51013601
for <y#windowslive.com>; Wed, 31 Jul 2019 15:52:28 -0400 (EDT)
From: "blabla.com" <service#blabla.com>
Sender: "blabla.com" <service#blabla.com>
To: yc#windowslive.com
Message-ID: <1943845105.133098.1564602748358#localhost>
Subject: Thanks for your blabla order!
Content-Type: multipart/alternative;
boundary="----=_Part_12654_1135590884.1564602743147"
Date: Wed, 31 Jul 2019 19:52:28 +0000
X-SG-EID: KlhL5+04rpq9b+lNnUQSSXSv/U/Agrwcy5kw6hHCP8rbih+DKKzTjpaizOf9gI4jfUbeoQFtkwaLeA
Q5VJW+s2G92MVJdOKnwbJCcJQrsVc4oiuZgDCBS8dpWhU6KfIM6V5wL2yNP0pKKCugS+b4cgX4K5CX
GndIFYXJXa1LTZLPblTMNhH8QH5+kLY4Wtg9po8FuNUzEJaPXsJgnMHYzKZOIvAvnevqNIcyYVL2Yc
0=
X-SG-ID: DT9Vw4eifUpKg3EkHbNxgoJgjlm7TnFJRHcoaVv1UYo=
X-IncomingHeaderCount: 15
Return-Path: bounces+266386-caec-yc=windowslive.com#sendgrid.blabla.com
X-MS-Exchange-Organization-ExpirationStartTime: 31 Jul 2019 19:52:29.3384
(UTC)
X-MS-Exchange-Organization-ExpirationStartTimeReason: OriginalSubmit
X-MS-Exchange-Organization-ExpirationInterval: 1:00:00:00.0000000
X-MS-Exchange-Organization-ExpirationIntervalReason: OriginalSubmit
X-MS-Exchange-Organization-Network-Message-Id: 0748a39e-bdb3-4241-2271-08d715f09e99
X-EOPAttributedMessage: 0
X-EOPTenantAttributedMessage: 84df9e7f-e9f6-40af-b435-aaaaaaaaaaaa:0
X-MS-Exchange-Organization-MessageDirectionality: Incoming
X-Forefront-Antispam-Report: EFV:NLI;
X-MS-Exchange-Organization-AuthSource:
SN1NAM04FT005.eop-NAM04.prod.protection.outlook.com
X-MS-Exchange-Organization-AuthAs: Anonymous
X-MS-PublicTrafficType: Email
X-MS-UserLastLogonTime: 7/31/2019 7:47:36 PM
X-MS-Office365-Filtering-Correlation-Id: 0748a39e-bdb3-4241-2271-08d715f09e99
X-Microsoft-Antispam:
BCL:6;PCL:0;RULEID:(2390118)(5000188)(711020)(4605104)(610169)(650170)(651021)(8291501072);SRVR:SN1NAM04HT187;
X-MS-TrafficTypeDiagnostic: SN1NAM04HT187:
X-MS-Exchange-PUrlCount: 24
X-MS-Exchange-EOPDirect: true
X-Sender-IP: 50.31.51.89
X-SID-PRA: SERVICE#blabla.COM
X-SID-Result: PASS
X-MS-Exchange-Organization-PCL: 2
X-OriginatorOrg: outlook.com
X-MS-Exchange-CrossTenant-OriginalArrivalTime: 31 Jul 2019 19:52:29.0952
(UTC)
X-MS-Exchange-CrossTenant-Network-Message-Id: 0748a39e-bdb3-4241-2271-08d715f09e99
X-MS-Exchange-CrossTenant-Id: 84df9e7f-e9f6-40af-b435-aaaaaaaaaaaa
X-MS-Exchange-CrossTenant-FromEntityHeader: Internet
X-MS-Exchange-CrossTenant-RMS-PersistedConsumerOrg:
00000000-0000-0000-0000-000000000000
X-MS-Exchange-Transport-CrossTenantHeadersStamped: SN1NAM04HT187
X-MS-Exchange-Transport-EndToEndLatency: 00:00:01.1679704
X-MS-Exchange-Processed-By-BccFoldering: 15.20.2115.005
X-Microsoft-Antispam-Mailbox-Delivery:
abwl:0;wl:0;pcwl:0;kl:0;iwl:0;ijl:0;dwl:0;dkl:0;rwl:0;ucf:0;jmr:0;ex:0;auth:1;dest:I;ENG:(5062000261)(5061607266)(5061608174)(4900115)(4920090)(6515079)(4950130)(4990090);
X-Message-Info:
5vMbyqxGkdee9CWP6GN6k7SiFatHA5tOJthXLYGApF09RV+2VIwDSv7TpFIyyuwbdpElZ/0OfDQ0pBW79cd9agnpjGQw7b7v7zA6S/RBHvx/Foariz/CKmDCPiOrCScSKVW9YaM/CqKL76WFalT2LUf8VJBR8M4LupokoBm/WazuStfNPUu2PvSCEzFxbvn/ptMxVl+4wEXDPQivJ1nuMw==
X-Message-Delivery: Vj0xLjE7dXM9MDtsPTA7YT0xO0Q9MTtHRD0xO1NDTD0z
X-Microsoft-Antispam-Message-Info:
1PVG+UKcd/9R+HkjIXvLA0AQVREwyDmXR6XmnL6oX6xks/yw3ZQRARX4fYngU4vXeDhoJr9kyTA6Bpm3OE5FZn8+JPH3p24pamcQhTiI/RdyRHAOx7q7YHb9PzM3EkY2hOb6qF/QxCZPdshlewXqGe+azoh4Sr9CPUL8x7gtZS0XVBYBQMtHRW0NsS0ULp/4e4+lbGoyQcXdMGoy4Cf6ACU783dlOjDyZNkz2Frk3vm3Za/P3avHn46xf8WzHrDbbfOiVc+HXFAQxBOWbQPD0rkXNssXlOszegvDX7nPq9hdj8UadbqhECjiizH890bNwqKIa2sWd/d1HBfojK2FDsEOPwDSsIS/1ApF038jELtjpzkzSadz319o6VohzYUHm7CtRF9sqJTgLVKePBo/i8FrLeoCq2rXydXj6a9MS7SqLDfny7NlP/qId5z5GXFs63K/QXu3InYLIf5zcl/kgsvg+W0cHDZ4/IdBPvHGeaQn7hdf62IKftys3CspYBbSlt0Eus97CCddOX/EBaJpJ7nEpHxIL3pxKVY0kKWqaUqrvvC3mvCffBe3igaAq2LiHgvT0pIU+j0y41VwEn7X8rL8gyWpbBF64+wf8NAe8JM2N8aWudElAkIeA5GHJodGcXt+jdyhYzh3EZs1BWyrF+k6MPp4kU/9yVxAxBimBx1aje5geHD7NggWqFJAD6fv0XsSjxku4Tap6Zs+NEkoD/MHNyT6TMnu9cqGgEoznr2mTssEZkz0JRPgcb2YbZabybkxBRJjVi/aroSjtOj2V2JHo9m2F8bA4XhLrLwgJzOs1ZelyYFKZ3OMaUAS9yNRlSPHBSKm/WUzIMNkcHLCcg==
MIME-Version: 1.0
------=_Part_12654_1135590884.1564602743147
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Hi c y,
Thanks for your order! Your order information appears below.
Item(s) Subtotal: $1.79
Shipping: $4.95
----
Total Before Tax: $6.74
Estimated Tax: $0.15
----
Order Total: $6.89
Shipping Address:
blabla.com
https://www.blabla.com
1-800-672-4399
------=_Part_12654_1135590884.1564602743147
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: 7bit
<!--[if !mso]><!--><meta http-equiv="Content-Type" content="text/html; charset=utf-8"><link href="http://cms.blabla.com/fonts/roboto/email-font.css" rel="stylesheet" type="text/css">
<style type="text/css">
#font-face {
font-family: 'Roboto';
src: url('http://cms.blabla.com/fonts/roboto/Roboto-Regular-webfont.eot');
src: url('http://cms.blabla.com/fonts/roboto/Roboto-Regular-webfont.eot?#iefix') format('embedded-opentype'),
url('http://cms.blabla.com/fonts/roboto/Roboto-Regular-webfont.woff') format('woff'),
url('http://cms.blabla.com/fonts/roboto/Roboto-Regular-webfont.ttf') format('truetype'),
url('http://cms.blabla.com/fonts/roboto/Roboto-Regular-webfont.svg#robotoregular') format('svg');
font-weight:400;
font-style:normal;
}
</style>
<div width="100%" style="display:none;font-size:0px;color:#eeeeee;line-height:1px;text-align:center;opacity:0;overflow:hidden;max-height:0px;max-width:0px;">
Questions? Call us any time 24/7 at 1-800-672-4399 or simply reply to this email | blabla.com
</div>
------=_Part_12654_1135590884.1564602743147--',
'_unixfrom': None,
'defects': [],
'epilogue': None,
'preamble': None}
As you can see, the result returns the complete original source message as the payload, except for the first 2 lines. The email should have 2 parts, one text/plain and the other text/html. Lines before MIME-Version: 1.0 should not be included in the payload. Thanks!

The problem was the formatting of the email source message. When I copied and pasted it from the Outlook client, the formatting was broken, so I had to fix it manually for it to be parsed correctly. I had to put tabs before the continuation lines of the folded headers, as you can see below:
Received: from SN1NAM04HT187.eop-NAM04.prod.protection.outlook.com
	(2603:10b6:300:d4::32) by CO2PR01MB1959.prod.exchangelabs.com with
	HTTPS via
	MWHPR19CA0022.NAMPRD19.PROD.OUTLOOK.COM; Wed, 31 Jul 2019 19:52:30 +0000
Received: from SN1NAM04FT005.eop-NAM04.prod.protection.outlook.com
	(10.152.88.55) by SN1NAM04HT187.eop-NAM04.prod.protection.outlook.com
	(10.152.89.14) with Microsoft SMTP Server (version=TLS1_2,
	cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.2115.10; Wed, 31
	Jul 2019 19:52:29 +0000
Authentication-Results: spf=pass (sender IP is 50.31.51.89)
	smtp.mailfrom=sendgrid.blabla.com; windowslive.com;
	dkim=pass (signature was
	verified) header.d=blabla.com;windowslive.com; dmarc=pass
	action=none
	header.from=blabla.com;
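To see why the missing indentation matters, here is a minimal sketch in Python 3. It uses a made-up message with hypothetical host names and a hypothetical boundary "XYZ", not the original email. Continuation lines of a folded header must start with a space or tab; when that whitespace is lost, the parser stops reading headers at the first line it cannot interpret as a header and treats everything after it as the body.

```python
import email

# Hypothetical, shortened message: folded header lines start with a tab,
# and a blank line separates the headers from the body.
GOOD = (
    "Received: from a.example.com\n"
    "\t(10.0.0.1) by b.example.com with HTTPS via\n"
    "\tC.EXAMPLE.COM; Wed, 31 Jul 2019 19:52:30 +0000\n"
    "Subject: Thanks for your order!\n"
    "Content-Type: multipart/alternative;\n"
    "\tboundary=\"XYZ\"\n"
    "MIME-Version: 1.0\n"
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain; charset=UTF-8\n"
    "\n"
    "Hi\n"
    "--XYZ\n"
    "Content-Type: text/html; charset=UTF-8\n"
    "\n"
    "<p>Hi</p>\n"
    "--XYZ--\n"
)

# Simulate the copy-paste damage: drop the leading tab of every folded line.
BROKEN = GOOD.replace("\n\t", "\n")

good = email.message_from_string(GOOD)
broken = email.message_from_string(BROKEN)

print(good.is_multipart())                                # both parts found
print([p.get_content_type() for p in good.get_payload()])
print(broken.is_multipart())                              # headers cut short
print(broken["Subject"])                                  # never reached
```

If the raw message can be saved to a file (for example as .eml) instead of being copied from the viewer, the original whitespace is more likely to survive.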

Related

How to get access of value within long text present within list in Python?

I am working on a piece of code to get a value from a Gmail message, but the email itself is HTML, so the code returns HTML inside a list, which I am unable to parse.
My Code:
import imaplib

ORG_EMAIL = "comapnyname.com"
FROM_EMAIL = "automation#companyname.co"
FROM_PWD = "password123!"
SMTP_SERVER = "imap.gmail.com"

def read_email_from_gmail():
    mail = imaplib.IMAP4_SSL(SMTP_SERVER)
    mail.login(FROM_EMAIL, FROM_PWD)
    mail.select("inbox")
    email_type, data = mail.search(None, "ALL")
    mail_ids = data[0]
    id_list = mail_ids.split()
    latest_email_id = int(id_list[-1])
    email_type, data = mail.fetch(str.encode(str(latest_email_id)), "(RFC822)")
    string_data = str(data)
    print('MAIL Data: ')
    print(string_data)

read_email_from_gmail()
Now this code returns a long list which contains HTML:
[(b'1 (RFC822 {54624}', b'Delivered-To: automation+qa1#spekit.co\r\nReceived: by 2002:a4a:6f04:0:0:0:0:0 with SMTP id h4csp1519301ooc;\r\n Thu, 10 Sep 2020 09:18:42 -0700 (PDT)\r\nX-Google-Smtp-Source: ABdhPJy/7yOn17HKdn+QjP0XHEOK2fu8LDL8tz4jDmDKemms2GVyykqDCDUfppmRbV4DUi7ckRRg\r\nX-Received: by 2002:a25:d7cd:: with SMTP id o196mr14075369ybg.91.1599754722247;\r\n Thu, 10 Sep 2020 09:18:42 -0700 (PDT)\r\nARC-Seal: i=1; a=rsa-sha256; t=1599754722; cv=none;\r\n d=google.com; s=arc-20160816;\r\n b=KzNg7bsmLaNcrRMihkN+AwlTp8ybj5D65K+Z21Ddl/lgd2LN90InAWhj+guhrmzHtB\r\n vw83T4AlJ8u2jpAs5qYUbxgd/R5COLhlRDqR/dE4wljRgIq2W6sVCJo/fGuZruFjob4Z\r\n h1acPat0xa3h83lJzzbH576KggTqdScMwCbLsujPr/FclnHNjkqxQuFQlV23nAGgvWX8\r\n raiIW+6wC070tmQaaz3feIVfo7r7cmQBGokOmy8B3of0/kqIyMVuaEkmk2kno8VFvILF\r\n i8YPq7bOHVNpre7KwiG4r69PdaDRXIcd/ETtuyusfNXOrGJ0QhC44j2eLUpxlRltOGgL\r\n NAeA==\r\nARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816;\r\n h=mime-version:date:message-id:to:subject:from:dkim-signature\r\n :dkim-signature;\r\n bh=ZNxh0gTg5kVpAZyTHGJ2jWADa5UGAoPCP3GFX1DUu94=;\r\n b=WjnIWwVX2oWrl3aZoKlzck1GAoy/gT5/cbNP+tnmdypfjvAUTyuZ3OO5xXlZB/CiF9\r\n PkYZFEzJQSxradr3ky5T7tLmV2qKnHfaIp3G3STUs5f9vhSfp6qknV7ouLBGwCWyp2gp\r\n e14Aek7M5ciVC1GIjxlr7AXZne4eHSwCb7u8j91Yt8B2getEQ9lyQlChwjYf38Kau5lL\r\n wPmMtAM0DDOqlNff2gTBEFgAX1s0Wk+g8mKS31tzBMIQvayR+a3PHX+S3zhtC2i1XsLm\r\n NOWSMsI0ZEEk/mjA36DVWhEN0d9llOwiDfFonXxIkcPZLlNR3zGfA61apTeud7i24vYn\r\n bfCw==\r\nARC-Authentication-Results: i=1; mx.google.com;\r\n dkim=pass header.i=#spekit.co header.s=mandrill header.b=RhjFdk+T;\r\n dkim=pass header.i=#mandrillapp.com header.s=mandrill header.b=SusUoY2S;\r\n spf=pass (google.com: domain of bounce-md_31064008.5f5a51e1.v1-8084cafe0c6c4aeca73fef8bdaf5b70b#mandrillapp.com designates 198.2.180.17 as permitted sender) smtp.mailfrom=bounce-md_31064008.5f5a51e1.v1-8084cafe0c6c4aeca73fef8bdaf5b70b#mandrillapp.com;\r\n dmarc=pass (p=NONE sp=NONE dis=NONE) 
header.from=spekit.co\r\nReturn-Path: <bounce-md_31064008.5f5a51e1.v1-8084cafe0c6c4aeca73fef8bdaf5b70b#mandrillapp.com>\r\nReceived: from mail180-17.suw31.mandrillapp.com (mail180-17.suw31.mandrillapp.com. [198.2.180.17])\r\n by mx.google.com with ESMTPS id t10si6240908ybl.463.2020.09.10.09.18.42\r\n for <automation+qa1#spekit.co>\r\n (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);\r\n Thu, 10 Sep 2020 09:18:42 -0700 (PDT)\r\nReceived-SPF: pass (google.com: domain of bounce-md_31064008.5f5a51e1.v1-8084cafe0c6c4aeca73fef8bdaf5b70b#mandrillapp.com designates 198.2.180.17 as permitted sender) client-ip=198.2.180.17;\r\nAuthentication-Results: mx.google.com;\r\n dkim=pass header.i=#spekit.co header.s=mandrill header.b=RhjFdk+T;\r\n dkim=pass header.i=#mandrillapp.com header.s=mandrill header.b=SusUoY2S;\r\n spf=pass (google.com: domain of bounce-md_31064008.5f5a51e1.v1-8084cafe0c6c4aeca73fef8bdaf5b70b#mandrillapp.com designates 198.2.180.17 as permitted sender) smtp.mailfrom=bounce-md_31064008.5f5a51e1.v1-8084cafe0c6c4aeca73fef8bdaf5b70b#mandrillapp.com;\r\n dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=spekit.co\r\nDKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; s=mandrill; d=spekit.co;\r\n h=From:Subject:To:Message-Id:Date:MIME-Version:Content-Type; i=support#spekit.co;\r\n bh=ZNxh0gTg5kVpAZyTHGJ2jWADa5UGAoPCP3GFX1DUu94=;\r\n b=RhjFdk+Tvr3HP43qJoKzVowGAs1SYJFfpq8MK4firz5tcpBYn3UEP/Z5cF+IBA74/PTmCahgTnXi\r\n /EPSbY2b+20ERj4s4VUnwNZw8t4L98gSQiM6o3mF4iVI2JIgABU2Tn2nmB68kGZyxeSOs4bWtE+s\r\n MXleLzg+uTftETJoUhM=\r\nReceived: from pmta03.mandrill.prod.suw01.rsglab.com (127.0.0.1) by mail180-17.suw31.mandrillapp.com id hb98u422sc0h for <automation+qa1#spekit.co>; Thu, 10 Sep 2020 16:18:42 +0000 (envelope-from <bounce-md_31064008.5f5a51e1.v1-8084cafe0c6c4aeca73fef8bdaf5b70b#mandrillapp.com>)\r\nDKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mandrillapp.com; \r\n i=#mandrillapp.com; q=dns/txt; s=mandrill; t=1599754721; h=From : \r\n 
Subject : To : Message-Id : Date : MIME-Version : Content-Type : From : \r\n Subject : Date : X-Mandrill-User : List-Unsubscribe; \r\n bh=ZNxh0gTg5kVpAZyTHGJ2jWADa5UGAoPCP3GFX1DUu94=; \r\n b=SusUoY2SOQosSQrzHafHGf7Pto1Ol3PDGU067dNsjT1ZIOuSP0Dz7DJwqgFn6NpwAV7X7e\r\n pzQQPyDJoAqQCjCdSqG9mp80hAEGwQC89GNu78a8o0NRC+BPRTGaNKV/jX06cXsgp+A4KXfY\r\n 13x1BInjKraTnCYz9TnzDUChIm3pg=\r\nFrom: Support <support#spekit.co>\r\nSubject: Your Spekit Login PIN\r\nReturn-Path: <bounce-md_31064008.5f5a51e1.v1-8084cafe0c6c4aeca73fef8bdaf5b70b#mandrillapp.com>\r\nReceived: from [3.128.246.0] by mandrillapp.com id 8084cafe0c6c4aeca73fef8bdaf5b70b; Thu, 10 Sep 2020 16:18:41 +0000\r\nTo: Automation <automation+qa1#spekit.co>\r\nX-Report-Abuse: Please forward a copy of this message, including all headers, to abuse#mandrill.com\r\nX-Report-Abuse: You can also report abuse here: http://mandrillapp.com/contact/abuse?id=31064008.8084cafe0c6c4aeca73fef8bdaf5b70b\r\nX-Mandrill-User: md_31064008\r\nMessage-Id: <31064008.20200910161841.5f5a51e1e2be13.10518479#mail180-17.suw31.mandrillapp.com>\r\nDate: Thu, 10 Sep 2020 16:18:41 +0000\r\nMIME-Version: 1.0\r\nContent-Type: multipart/alternative; boundary="_av-l5kOy35rlKJaV18wYlOHPA"\r\n\r\n--_av-l5kOy35rlKJaV18wYlOHPA\r\nContent-Type: text/plain; charset=utf-8\r\nContent-Transfer-Encoding: quoted-printable\r\n\r\n Your Spekit Login PIN \r\n Hi Automation, Someone (hopefully you) just\r\nlogged into your Spekit account with the email *automation+qa1#spekit.co*. \r\n \r\n If this was you, please use the code below to log-in, otherwise please\r\ncontact your admin and reset your password ASAP.\r\n =3D *952681* =3D\r\n\r\n Enter PIN <https://app.spekit.co/verifypin>\r\n<http://www.twitter.com/spekitapp>\r\n<https://www.linkedin.com/company/spekit/> <https://medium.com/spekit>\r\n<https://spekit.co/> \r\nQuestions? Contact us. <mailto:support#spekit.co>\r\n Copyright =C2=A9 2018 Spekit, Inc. 
All rights reserved.\r\n\r\n--_av-l5kOy35rlKJaV18wYlOHPA\r\nContent-Type: text/html; charset=utf-8\r\nContent-Transfer-Encoding: quoted-printable\r\n\r\n<!doctype html>\r\n<html xmlns=3D"http://www.w3.org/1999/xhtml" xmlns:v=3D"urn:schemas-microso=\r\nft-com:vml" xmlns:o=3D"urn:schemas-microsoft-com:office:office">\r\n <head>\r\n <!-- NAME: 1 COLUMN - FULL WIDTH -->\r\n <!--[if gte mso 15]>\r\n <xml>\r\n <o:OfficeDocumentSettings>\r\n <o:AllowPNG/>\r\n <o:PixelsPerInch>96</o:PixelsPerInch>\r\n </o:OfficeDocumentSettings>\r\n </xml>\r\n <![endif]-->\r\n <meta charset=3D"UTF-8">\r\n <meta http-equiv=3D"X-UA-Compatible" content=3D"IE=3Dedge">\r\n <meta name=3D"viewport" content=3D"width=3Ddevice-width, initial-sc=\r\nale=3D1">\r\n <title>Your Spekit Login PIN</title>\r\n \r\n <style type=3D"text/css">\r\n=09=09p{\r\n=09=09=09margin:10px 0;\r\n=09=09=09padding:0;\r\n=09=09}\r\n=09=09table{\r\n=09=09=09border-</tbody></table> ')']
I need to get the value '952681', which appears twice. Can someone help me here?
If the format of the email stays the same, you can use a regex to parse the returned HTML string:
import re
pattern = r'\*([\s\S]*?)\*'
res = re.findall(pattern, your_email_text)
The variable res contains your number at the second position:
['automation+qa1#spekit.co', '952681']
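An alternative that depends less on the exact layout is to parse the message with the email module first and search only the decoded text/plain part. The following is a sketch under assumptions: extract_pin is a hypothetical helper, the sample bytes are a made-up, shortened stand-in for the real message, and the PIN is assumed to be a run of digits between asterisks, as in the output above. With imaplib, the raw bytes of the message are normally found in data[0][1] of the fetch response rather than in str(data).

```python
import email
import re

def extract_pin(raw_bytes):
    """Return the first run of digits wrapped in asterisks, or None."""
    msg = email.message_from_bytes(raw_bytes)
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        charset = part.get_content_charset() or "utf-8"
        # decode=True undoes the quoted-printable transfer encoding
        text = part.get_payload(decode=True).decode(charset, errors="replace")
        match = re.search(r"\*(\d+)\*", text)
        if match:
            return match.group(1)
    return None

# Made-up sample with the same structure as the message in the question.
sample = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/alternative; boundary="b1"\r\n'
    b"\r\n"
    b"--b1\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"Someone logged into your account with the email *someone*.\r\n"
    b" =3D *952681* =3D\r\n"
    b"--b1\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"\r\n"
    b"<p>952681</p>\r\n"
    b"--b1--\r\n"
)

print(extract_pin(sample))
```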

Python regex re.match() not returning any results

I'm hoping this is just something simple. I'm trying to determine whether or not an email is already encrypted.
# Read e-mail from stdin
raw = sys.stdin.read()
raw_message = email.message_from_string( raw )
I took the example from http://docs.python.org/2/howto/regex.html on doing a simple test for match.
p = re.compile('-----BEGIN\sPGP\sMESSAGE-----')
m = p.match(raw)
if m:
    log = open(cfg['logging']['file'], 'a')
    log.write("THIS IS ENCRYPTED")
    log.close()
else:
    log = open(cfg['logging']['file'], 'a')
    log.write("NOT ENCRYPTED:")
    log.close()
The email is read. The log file is written to but it always comes back no match. I've written raw to a logfile and that string is present.
Not sure where to go next.
UPDATE:
Here is the output from a raw ( a simple test message )
Sending email to: <bruce#packetaddiction.com>
Received: from localhost (localhost [127.0.0.1])
by mail2.packetaddiction.com (Postfix) with ESMTP id 5FE5D22A65
for <bruce#packetaddiction.com>; Tue, 10 Sep 2013 16:19:12 +0000 (UTC)
X-Virus-Scanned: Debian amavisd-new at mail2.packetaddiction.com
Received: from mail2.packetaddiction.com ([127.0.0.1])
by localhost (mail2.packetaddiction.com [127.0.0.1]) (amavisd-new, port 10024)
with ESMTP id cc3zZ_izEb1j for <bruce#packetaddiction.com>;
Tue, 10 Sep 2013 16:19:06 +0000 (UTC)
Received: from mail.secryption.com (mail.secryption.com [178.18.24.223])
by mail2.packetaddiction.com (Postfix) with ESMTPS id 9CA3C22A5B
for <bruce#packetaddiction.com>; Tue, 10 Sep 2013 16:19:06 +0000 (UTC)
Received: from localhost (localhost.localdomain [127.0.0.1])
by mail.secryption.com (Postfix) with ESMTP id 9994E1421F81
for <bruce#packetaddiction.com>; Tue, 10 Sep 2013 12:19:19 -0400 (EDT)
X-Virus-Scanned: Debian amavisd-new at mail.secryption.com
Received: from mail.secryption.com ([127.0.0.1])
by localhost (mail.secryption.com [127.0.0.1]) (amavisd-new, port 10024)
with ESMTP id WbkVn_cowG6q for <bruce#packetaddiction.com>;
Tue, 10 Sep 2013 12:19:18 -0400 (EDT)
Received: from dennis.cng.int (mail.compassnetworkgroup.com [173.163.129.21])
(using TLSv1 with cipher RC4-MD5 (128/128 bits))
(No client certificate requested)
by mail.secryption.com (Postfix) with ESMTPSA id 5B4191421F80
for <bruce#packetaddiction.com>; Tue, 10 Sep 2013 12:19:18 -0400 (EDT)
User-Agent: K-9 Mail for Android
MIME-Version: 1.0
Content-Type: text/plain;
charset=UTF-8
Content-Transfer-Encoding: 8bit
Subject: Message
From: Bruce Markey <bruce#secryption.com>
Date: Tue, 10 Sep 2013 12:19:00 -0400
To: "bruce#packetaddiction.com" <bruce#packetaddiction.com>
Message-ID: <36615ed6-a1a9-49ac-ac85-31905916d478#email.android.com>
-----BEGIN PGP MESSAGE-----
Version: APG v1.0.8
hQEMAwPNxvNWsisWAQgAuOTLkiitYzhGJydOzN4sBoGjhRm9JeJMfmxKxKTKcV2W
ZBuN0z+nS1KxnXrIlahhwLtpiFvp5apI8wAyAiLC2BhFieFttOl1/xLVJbd1nI1o
KQE1RUXhPURejJ3eH9g/LmkhtFQcnsuHGTGnLi6dugBNhWLqgnLUBX+VLt6moz2C
84lDuQ1y7B/JFOctKRScUqmxDd8b2peZJOnVT/p0tSYNfN9QGH3W02FZShE4KKBl
HpezK8KC6cZdf34Eao+ep+fP5DuKx/4j3ksCbFKyQ3gd+yxK/xnhkijDsYCfFRiF
ElAGDvXu4RXqrKRpBxq1bRhU8YqS7j5593MTUViWitLAGgH1DV0UeA/B5LMUDRyz
4ZfDqd0kDYsPUy2Cg20HdXHaobkzdvHLzfqQq0Owc1nTcvu4nzCbIMhTAlZjn8ZA
aODTlKcvnFBWEtNERPm0x6nkbhMo3GeysejaJSRod3aGqhuhga4iIrrew1W03297
aalwY8RKeNoV15VItsyrbbT+HvDNSaFFCPUAs+KcLHCOez5/woozjlqKdBI6yHCe
gqpYJPP07qFsVviltfDO63xS48f2HCPe4iyXCy6Usp0+jM7zAzH7KH1O854GH46Q
r0A01DLo9REmDr4U
=pBQZ
-----END PGP MESSAGE-----
re.match only finds a match at the beginning of the string. You want to use re.search, which scans the whole string:
raw = """Sending email to: <bruce#packetaddiction.com>...
...
-----BEGIN PGP MESSAGE-----
...
"""
>>> p = re.compile('-----BEGIN\sPGP\sMESSAGE-----')
>>> m = p.search(raw)
>>> m
<_sre.SRE_Match object at 0x0000000002E02510>
>>> m.group()
'-----BEGIN PGP MESSAGE-----'
>>> m = p.match(raw)
>>> print m
None
Although, as noted, a regex is likely overkill for this problem, as the matching text is static.
Regular expressions are used when you want a "fuzzy" match, that is, when you aren't sure the string you are looking for will be identical every time.
In this case, the string you are looking for appears to be exactly -----BEGIN PGP MESSAGE-----, so the str.find method will be simpler to use and faster to boot.
>>> a = "This is a PGP encrypted email. -----BEGIN PGP MESSAGE----- !##$%^..."
>>> b = "This is not encrypted. My hovercraft is full of eels." #example strings
>>> a.find("-----BEGIN PGP MESSAGE-----")
31 # Return value '31' means that the search string was found at index 31 of the source string
>>> b.find("-----BEGIN PGP MESSAGE-----")
-1 # -1 means 'not found in the source string'
>>>
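Putting both suggestions together, here is a small Python 3 sketch. The name looks_encrypted and the sample string are made up for illustration. It shows that match is anchored at index 0, that a plain substring test is enough for a static marker, and that a MULTILINE pattern can additionally require the marker to sit on a line of its own.

```python
import re

PGP_HEADER = "-----BEGIN PGP MESSAGE-----"
# With re.MULTILINE, ^ and $ match at the start and end of every line.
PGP_LINE = re.compile(r"^-----BEGIN PGP MESSAGE-----\s*$", re.MULTILINE)

def looks_encrypted(text):
    """Return True if the armor header appears on a line of its own."""
    return PGP_LINE.search(text) is not None

raw = "Subject: Message\n\n-----BEGIN PGP MESSAGE-----\nVersion: APG v1.0.8\n"

print(re.match(re.escape(PGP_HEADER), raw))  # match() only tries index 0
print(PGP_HEADER in raw)                     # simple substring test
print(looks_encrypted(raw))                  # line-anchored search
```

The line-anchored version avoids a false positive when the marker is merely quoted in the middle of a sentence.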

Python : ( msg = email.message_from_string(aaa) ) values are returning ( None ) when trying to parse stuff from raw e-mail source

Let's execute the script:
python b.wsgi
The result is:
None
None
That is the problem. Here is the full script, b.wsgi:
aaa = """
From root#a1.local.tld Thu Jul 25 19:28:59 2013
Received: from a1.local.tld (localhost [127.0.0.1])
by a1.local.tld (8.14.4/8.14.4) with ESMTP id r6Q2SxeQ003866
for <ooo#a1.local.tld>; Thu, 25 Jul 2013 19:28:59 -0700
Received: (from root#localhost)
by a1.local.tld (8.14.4/8.14.4/Submit) id r6Q2Sxbh003865;
Thu, 25 Jul 2013 19:28:59 -0700
From: root#a1.local.tld
Subject: oooooooooooooooo
To: ooo#a1.local.tld
Cc:
X-Originating-IP: 192.168.15.127
X-Mailer: Webmin 1.420
Message-Id: <1374805739.3861#a1>
Date: Thu, 25 Jul 2013 19:28:59 -0700 (PDT)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="bound1374805739"
This is a multi-part message in MIME format.
--bound1374805739
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
ooooooooooooooooooooooooooooooooooooooooooooooo
--bound1374805739--
"""
import email
msg = email.message_from_string(aaa)
print msg['From']
print msg['To']
i tried changing it to
print msg['from']
print msg['to']
but I get the same problem.
What might be the issue here?
Is it possible Python knows this "raw" string was edited by hand?
Very sneaky stuff going on here.
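The \n at the beginning of the string is causing the problem: the parser treats that leading blank line as the end of the headers, so it finds no headers at all and everything ends up in the body. Stripping the surrounding whitespace fixes it. Try this: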
>>> msg = email.message_from_string(aaa.strip())
>>> msg.keys()
['Received', 'Received', 'From', 'Subject', 'To', 'Cc', 'X-Originating-IP', 'X-Mailer', 'Message-Id', 'Date', 'MIME-Version', 'Content-Type']
>>> msg['From']
'root#a1.local.tld'
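To see the effect in isolation, here is a small sketch written for Python 3. The message in `RAW` is a made-up sample (the addresses are placeholders) and `parse_raw` is a hypothetical helper name. It uses lstrip() rather than strip(), on the assumption that only the leading blank line matters to the header parser.

```python
import email

# Made-up sample; the newline right after the opening quotes mimics the question's string.
RAW = """
From: sender@example.com
To: recipient@example.com
Subject: demo

body text
"""


def parse_raw(raw):
    """Parse a raw message after dropping whitespace before the first header."""
    return email.message_from_string(raw.lstrip())


broken = email.message_from_string(RAW)  # the leading blank line ends the header block
fixed = parse_raw(RAW)

print(broken.keys())   # no headers were recognised
print(fixed["From"])
```

With the leading newline in place, the whole text is treated as the body, which is why every header lookup returns None.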

Parsing email headers with regular expressions in python

I'm a Python beginner trying to extract data from email headers. I have thousands of email messages in a single text file, and from each message I want to extract the sender's address, the recipients' addresses, and the date, and write them to a single, semicolon-delimited line in a new file.
this is ugly, but it's what I've come up with:
import re
emails = open("demo_text.txt","r") #opens the file to analyze
results = open("results.txt","w") #creates new file for search results
resultsList = []
for line in emails:
    if "From - " in line: #recognizes the beginning of an email message and adds a linebreak
        newMessage = re.findall(r'\w\w\w\s\w\w\w.*', line)
        if newMessage:
            resultsList.append("\n")
    if "From: " in line:
        address = re.findall(r'[\w.-]+#[\w.-]+', line)
        if address:
            resultsList.append(address)
            resultsList.append(";")
    if "To: " in line:
        if "Delivered-To:" not in line: #avoids confusion with 'Delivered-To:' tag
            address = re.findall(r'[\w.-]+#[\w.-]+', line)
            if address:
                for person in address:
                    resultsList.append(person)
                    resultsList.append(";")
    if "Date: " in line:
        date = re.findall(r'\w\w\w\,.*', line)
        resultsList.append(date)
        resultsList.append(";")
for result in resultsList:
    results.writelines(result)
emails.close()
results.close()
and here's my 'demo_text.txt':
From - Sun Jan 06 19:08:49 2013
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000
Delivered-To: somebody_1#hotmail.com
Received: by 10.48.48.3 with SMTP id v3cs417003nfv;
    Mon, 15 Jan 2007 10:14:19 -0800 (PST)
Received: by 10.65.211.13 with SMTP id n13mr5741660qbq.1168884841872;
    Mon, 15 Jan 2007 10:14:01 -0800 (PST)
Return-Path: <nobody#hotmail.com>
Received: from bay0-omc3-s21.bay0.hotmail.com (bay0-omc3-s21.bay0.hotmail.com [65.54.246.221])
    by mx.google.com with ESMTP id e13si6347910qbe.2007.01.15.10.13.58;
    Mon, 15 Jan 2007 10:14:01 -0800 (PST)
Received-SPF: pass (google.com: domain of nobody#hotmail.com designates 65.54.246.221 as permitted sender)
Received: from hotmail.com ([65.54.250.22]) by bay0-omc3-s21.bay0.hotmail.com with Microsoft SMTPSVC(6.0.3790.2668);
    Mon, 15 Jan 2007 10:13:48 -0800
Received: from mail pickup service by hotmail.com with Microsoft SMTPSVC;
    Mon, 15 Jan 2007 10:13:47 -0800
Message-ID: <BAY115-F12E4E575FF2272CF577605A1B50#phx.gbl>
Received: from 65.54.250.200 by by115fd.bay115.hotmail.msn.com with HTTP;
    Mon, 15 Jan 2007 18:13:43 GMT
X-Originating-IP: [200.122.47.165]
X-Originating-Email: [nobody#hotmail.com]
X-Sender: nobody#hotmail.com
From: =?iso-8859-1?B?UGF1bGEgTWFy7WEgTGlkaWEgRmxvcmVuemE=?=
    <nobody#hotmail.com>
To: somebody_1#hotmail.com, somebody_2#gmail.com, 3_nobodies#yahoo.com.ar
Bcc:
Subject: fotos
Date: Mon, 15 Jan 2007 18:13:43 +0000
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="----=_NextPart_000_d98_1c4f_3aa9"
X-OriginalArrivalTime: 15 Jan 2007 18:13:47.0572 (UTC) FILETIME=[E68D4740:01C738D0]
Return-Path: nobody#hotmail.com
The output is:
somebody_1#hotmail.com;somebody_2#gmail.com;3_nobodies#yahoo.com.ar;Mon, 15 Jan 2007 18:13:43 +0000;
This output would be fine except there's a line break in the 'From:' field in my demo_text.txt (line 24), and so I miss 'nobody#hotmail.com'.
I'm not sure how to tell my code to skip line break and still find email address in the From: tag.
More generally, I'm sure there are many more sensible ways to go about this task. If anyone could point me in the right direction, I'd sure appreciate it.
Your demo text is practically in the mbox format, which the mbox class in the mailbox module can process directly, folded headers included:
from mailbox import mbox
import re
# matches the addresses as they appear in the demo text
PAT_EMAIL = re.compile(r"[0-9A-Za-z._-]+\#[0-9A-Za-z._-]+")
mymbox = mbox("demo.txt")
for email in mymbox.values():
    # header lookup is case-insensitive and returns the whole value, even if it spans two lines
    from_address = PAT_EMAIL.findall(email["from"])
    to_address = PAT_EMAIL.findall(email["to"])
    date = [ email["date"], ]
    print(";".join(from_address + to_address + date))
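As a variation on the same idea, the standard library can also pull the addresses out for you via `email.utils.getaddresses`, so no regular expression is needed. The sketch below is written for Python 3 and rests on a few assumptions: `SAMPLE` is a shortened, made-up message with placeholder addresses, and `summarize_mbox` and `demo` are hypothetical helper names.

```python
import mailbox
import os
import tempfile
from email.utils import getaddresses

# Made-up sample in mbox format; note the From: header folded onto a second line.
SAMPLE = (
    "From - Sun Jan 06 19:08:49 2013\n"
    "From: Paula Example\n"
    " <nobody@example.com>\n"
    "To: somebody_1@example.com, somebody_2@example.org\n"
    "Subject: fotos\n"
    "Date: Mon, 15 Jan 2007 18:13:43 +0000\n"
    "\n"
    "body text\n"
)


def summarize_mbox(path):
    """Return one 'sender;recipients;date' line per message in the mbox file."""
    rows = []
    box = mailbox.mbox(path)
    try:
        for message in box:
            # getaddresses yields (display name, address) pairs; keep the address only
            senders = [addr for _, addr in getaddresses(message.get_all("From", []))]
            recipients = [addr for _, addr in getaddresses(message.get_all("To", []))]
            rows.append(";".join(senders + recipients + [message.get("Date", "")]))
    finally:
        box.close()
    return rows


def demo():
    """Write SAMPLE to a temporary file and summarize it."""
    fd, path = tempfile.mkstemp(suffix=".mbox")
    try:
        with os.fdopen(fd, "w", newline="\n") as handle:
            handle.write(SAMPLE)
        return summarize_mbox(path)
    finally:
        os.remove(path)


print("\n".join(demo()))
```

Because the parser joins the folded From: header before handing it over, the address on the continuation line is found without any special handling.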
To cope with the line breaks, you can't process the file line by line. Instead, try loading the whole file and using your keywords (From, To, etc.) as boundaries. So when you search for 'From -', use the rest of your keywords as boundaries, so that a value which wraps onto a second line is captured in full and the next keyword is not included.
Also, mentioning this because you said you're a beginner:
The "Pythonic" way of naming non-class variables is lowercase with underscores, so resultsList should be results_list.

python mechanize - retrieve a file from aspnetForm submitControl that triggers a file download

How do I use python mechanize to retrieve a file from an aspnetForm submitControl that triggers an Excel file download when I don't know the file URL or file name?
URL of site with Excel file: http://www.ncysaclassic.com/TTSchedules.aspx?tid=NCFL&year=2012&stid=NCFL&syear=2012&div=U11M01
I'm trying to get the file downloaded by the Print Excel 'button'.
So far I have:
import mechanize  # setup implied by the question
br = mechanize.Browser()
r = br.open('http://www.ncysaclassic.com/TTSchedules.aspx?tid=NCFL&year=2012&stid=NCFL&syear=2012&div=U11M01')
html = r.read()
# Show the html title
print br.title()
# Show the available forms
for f in br.forms():
    print f
br.select_form('aspnetForm')
print '\n\nSubmitting...\n'
br.submit("ctl00$ContentPlaceHolder1$btnExtractSched")
print 'Response...\n'
print br.response().info()
print br.response().read
print 'still alive...\n'
for prop, value in vars(br.response()).iteritems():
    print 'Property:', prop, ', Value: ', value
print 'myfile...\n'
myfile = br.response().read
and I get this output:
Submitting...
Response...
Content-Type: application/vnd.ms-excel
Last-Modified: Thu, 27 Sep 2012 20:19:10 GMT
Accept-Ranges: bytes
ETag: W/"6e27615aed9ccd1:0"
Server: Microsoft-IIS/7.5
X-Powered-By: ASP.NET
Date: Thu, 27 Sep 2012 20:19:09 GMT
Connection: close
Content-Length: 691200
<bound method response_seek_wrapper.read of <response_seek_wrapper at 0x2db5248L whose wrapped object = <closeable_response at 0x2e811c8L whose fp = <socket._fileobject object at 0x0000000002D79930>>>>
still alive...
Property: _headers , Value: Content-Type: application/vnd.ms-excel
Last-Modified: Thu, 27 Sep 2012 20:19:10 GMT
Accept-Ranges: bytes
ETag: W/"6e27615aed9ccd1:0"
Server: Microsoft-IIS/7.5
X-Powered-By: ASP.NET
Date: Thu, 27 Sep 2012 20:19:09 GMT
Connection: close
Content-Length: 691200
Property: _seek_wrapper__read_complete_state , Value: [False]
Property: _seek_wrapper__have_readline , Value: True
Property: _seek_wrapper__is_closed_state , Value: [False]
Property: _seek_wrapper__pos , Value: 0
Property: wrapped , Value: <closeable_response at 0x2e811c8L whose fp = <socket._fileobject object at 0x0000000002D79930>>
Property: _seek_wrapper__cache , Value: <cStringIO.StringO object at 0x0000000002E8B0D8>
It seems I am very close. Note the Content-Type: application/vnd.ms-excel
I just don't know what to do next. Where is my file, and how do I get a pointer to it and save it locally for access later?
Update:
I used dir() to get a list of methods/attributes for the response() and then tried a couple of the methods...
print '\ndir(br.response())\n'
for each in dir(br.response()):
    print each
print '\nresponse info...\n'
print br.response().info()
print '\nresponse geturl\n'
print br.response().geturl()
and I get this output...
dir(br.response())
__copy__
__doc__
__getattr__
__init__
__iter__
__module__
__repr__
__setattr__
_headers
_seek_wrapper__cache
_seek_wrapper__have_readline
_seek_wrapper__is_closed_state
_seek_wrapper__pos
_seek_wrapper__read_complete_state
close
get_data
geturl
info
invariant
next
read
readline
readlines
seek
set_data
tell
wrapped
xreadlines
response info...
Date: Thu, 27 Sep 2012 20:55:02 GMT
ETag: W/"fa759b5df29ccd1:0"
Server: Microsoft-IIS/7.5
Connection: Close
Content-Type: application/vnd.ms-excel
X-Powered-By: ASP.NET
Accept-Ranges: bytes
Last-Modified: Thu, 27 Sep 2012 20:55:03 GMT
Content-Length: 691200
response geturl
http://www.ncysaclassic.com/photos/pdftemp/ScheduleExcel165502.xls
I think I already have this file in my br.response. I just don't know how to extract it! Please help.
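# fill out the form
response = br.submit()
# the response body is the Excel file itself; open in binary mode ('wb')
# so the bytes are written unchanged (text mode can corrupt them on Windows)
with open('filename', 'wb') as fileobj:
    fileobj.write(response.read())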
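If you would rather not hard-code the file name, you can derive it from the final URL that geturl() reports. Below is a rough sketch in Python 3; it assumes only that the response is a file-like object with a read() method, as the dir() listing in the question suggests. The names `filename_from_url`, `save_response` and `demo`, and the `download.bin` fallback, are hypothetical, and `io.BytesIO` stands in for the real response so the example runs without a network connection.

```python
import io
import os
import posixpath
import shutil
import tempfile
from urllib.parse import urlparse


def filename_from_url(url, default="download.bin"):
    """Return the last path component of url, or default if the path has none."""
    name = posixpath.basename(urlparse(url).path)
    return name or default


def save_response(response, path):
    """Copy a file-like response to path in binary mode, chunk by chunk."""
    with open(path, "wb") as out:
        shutil.copyfileobj(response, out)
    return path


def demo():
    """Save fake bytes under the name taken from the URL shown in the question."""
    payload = b"\xd0\xcf\x11\xe0 fake xls bytes\r\n"  # stand-in for the real file
    url = "http://www.ncysaclassic.com/photos/pdftemp/ScheduleExcel165502.xls"
    with tempfile.TemporaryDirectory() as folder:
        target = os.path.join(folder, filename_from_url(url))
        path = save_response(io.BytesIO(payload), target)
        with open(path, "rb") as saved:
            return os.path.basename(path), saved.read() == payload


print(demo())
```

With mechanize this would presumably be called as save_response(response, filename_from_url(response.geturl())) right after br.submit().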
