Extracting some SIP headers - python

I am using this regular expression for SIP (Session Initiation Protocol) URIs to extract the individual components:
_syntax = re.compile('^(?P<scheme>[a-zA-Z][a-zA-Z0-9\+\-\.]*):' # scheme
    + '(?:(?:(?P<user>[a-zA-Z0-9\-\_\.\!\~\*\'\(\)&=\+\$,;\?\/\%]+)' # user
    + '(?::(?P<password>[^:@;\?]+))?)@)?' # password
    + '(?:(?:(?P<host>[^;\?:]*)(?::(?P<port>[\d]+))?))' # host, port
    + '(?:;(?P<params>[^\?]*))?' # parameters
    + '(?:\?(?P<headers>.*))?$') # headers
m = URI._syntax.match(value)
if m:
    self.scheme, self.user, self.password, self.host, self.port, params, headers = m.groups()
I also want to extract specific headers from a SIP message, such as Via (and its branch parameter), Contact, Call-ID, or CSeq.
The general form of a SIP message is:
OPTIONS sip:172.16.18.35:5060 SIP/2.0
Content-Length: 0
Via: SIP/2.0/UDP 172.16.18.90:5060
From: "fake" <sip:fake@172.16.18.90>
Supported: replaces, timer
User-Agent: SIPPing
To: <sip:172.16.18.35:5060>
Contact: <sip:fake@172.16.18.90:5060>
CSeq: 1 OPTIONS
Allow: INVITE, ACK, CANCEL, OPTIONS, BYE, REFER, SUBSCRIBE, NOTIFY, INFO, PUBLISH
Call-ID: fake-id@172.16.18.90
Date: Thu, 25 Apr 2013 003024 +0000
Max-Forwards: 70

I would suggest taking advantage of the intentional similarities between SIP header format and RFC822.
from email.parser import Parser
msg = Parser().parsestr(m.group('headers'))
...thereafter:
>>> msg.keys()
['Content-Length', 'Via', 'From', 'Supported', 'User-Agent', 'To', 'Contact', 'CSeq', 'Allow', 'Call-ID', 'Date', 'Max-Forwards']
>>> msg['To']
'<sip:172.16.18.35:5060>'
>>> msg['Date']
'Thu, 25 Apr 2013 003024 +0000'
...etc. See the documentation for the Python standard-library email module for more details.
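As a rough sketch of the same idea applied to a whole SIP message (Python 3): the request line is not a header, so it is split off first and only the rest goes to the email parser. The names parse_sip and via_branch are made up for this illustration, and the sample message is a shortened copy of the one in the question, written with @ in the URIs.

```python
from email.parser import Parser

# Shortened copy of the SIP message from the question.
RAW_SIP = """\
OPTIONS sip:172.16.18.35:5060 SIP/2.0
Content-Length: 0
Via: SIP/2.0/UDP 172.16.18.90:5060
From: "fake" <sip:fake@172.16.18.90>
To: <sip:172.16.18.35:5060>
Contact: <sip:fake@172.16.18.90:5060>
CSeq: 1 OPTIONS
Call-ID: fake-id@172.16.18.90
Max-Forwards: 70

"""


def parse_sip(raw):
    """Split off the request line, then parse the remaining header block."""
    request_line, _, header_block = raw.partition("\n")
    headers = Parser().parsestr(header_block, headersonly=True)
    return request_line.strip(), headers


def via_branch(via_value):
    """Return the branch parameter of a Via value, or None if it has none."""
    for param in via_value.split(";")[1:]:
        name, _, value = param.strip().partition("=")
        if name.lower() == "branch":
            return value
    return None


request_line, headers = parse_sip(RAW_SIP)
cseq = headers["CSeq"]
call_id = headers["call-id"]  # header lookup is case-insensitive
branch = via_branch(headers["Via"])  # the sample Via carries no branch parameter
```

Header lookup on the parsed message is case-insensitive, which is convenient because SIP header names are case-insensitive too. This sketch does not handle SIP compact header forms (for example v for Via).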

Related

How to get access of value within long text present within list in Python?

I am working on a piece of code that reads a value from a Gmail message. The email itself is HTML, so the code returns the raw HTML source inside a list, and I am unable to parse the value out of it.
My Code:
import imaplib
ORG_EMAIL = "comapnyname.com"
FROM_EMAIL = "automation#companyname.co"
FROM_PWD = "password123!"
SMTP_SERVER = "imap.gmail.com"
def read_email_from_gmail():
    mail = imaplib.IMAP4_SSL(SMTP_SERVER)
    mail.login(FROM_EMAIL, FROM_PWD)
    mail.select("inbox")
    email_type, data = mail.search(None, "ALL")
    mail_ids = data[0]
    id_list = mail_ids.split()
    latest_email_id = int(id_list[-1])
    email_type, data = mail.fetch(str.encode(str(latest_email_id)), "(RFC822)")
    string_data = str(data)
    print('MAIL Data: ')
    print(string_data)
read_email_from_gmail()
This code returns a long list that contains the HTML:
[(b'1 (RFC822 {54624}', b'Delivered-To: automation+qa1#spekit.co\r\nReceived: by 2002:a4a:6f04:0:0:0:0:0 with SMTP id h4csp1519301ooc;\r\n Thu, 10 Sep 2020 09:18:42 -0700 (PDT)\r\nX-Google-Smtp-Source: ABdhPJy/7yOn17HKdn+QjP0XHEOK2fu8LDL8tz4jDmDKemms2GVyykqDCDUfppmRbV4DUi7ckRRg\r\nX-Received: by 2002:a25:d7cd:: with SMTP id o196mr14075369ybg.91.1599754722247;\r\n Thu, 10 Sep 2020 09:18:42 -0700 (PDT)\r\nARC-Seal: i=1; a=rsa-sha256; t=1599754722; cv=none;\r\n d=google.com; s=arc-20160816;\r\n b=KzNg7bsmLaNcrRMihkN+AwlTp8ybj5D65K+Z21Ddl/lgd2LN90InAWhj+guhrmzHtB\r\n vw83T4AlJ8u2jpAs5qYUbxgd/R5COLhlRDqR/dE4wljRgIq2W6sVCJo/fGuZruFjob4Z\r\n h1acPat0xa3h83lJzzbH576KggTqdScMwCbLsujPr/FclnHNjkqxQuFQlV23nAGgvWX8\r\n raiIW+6wC070tmQaaz3feIVfo7r7cmQBGokOmy8B3of0/kqIyMVuaEkmk2kno8VFvILF\r\n i8YPq7bOHVNpre7KwiG4r69PdaDRXIcd/ETtuyusfNXOrGJ0QhC44j2eLUpxlRltOGgL\r\n NAeA==\r\nARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816;\r\n h=mime-version:date:message-id:to:subject:from:dkim-signature\r\n :dkim-signature;\r\n bh=ZNxh0gTg5kVpAZyTHGJ2jWADa5UGAoPCP3GFX1DUu94=;\r\n b=WjnIWwVX2oWrl3aZoKlzck1GAoy/gT5/cbNP+tnmdypfjvAUTyuZ3OO5xXlZB/CiF9\r\n PkYZFEzJQSxradr3ky5T7tLmV2qKnHfaIp3G3STUs5f9vhSfp6qknV7ouLBGwCWyp2gp\r\n e14Aek7M5ciVC1GIjxlr7AXZne4eHSwCb7u8j91Yt8B2getEQ9lyQlChwjYf38Kau5lL\r\n wPmMtAM0DDOqlNff2gTBEFgAX1s0Wk+g8mKS31tzBMIQvayR+a3PHX+S3zhtC2i1XsLm\r\n NOWSMsI0ZEEk/mjA36DVWhEN0d9llOwiDfFonXxIkcPZLlNR3zGfA61apTeud7i24vYn\r\n bfCw==\r\nARC-Authentication-Results: i=1; mx.google.com;\r\n dkim=pass header.i=#spekit.co header.s=mandrill header.b=RhjFdk+T;\r\n dkim=pass header.i=#mandrillapp.com header.s=mandrill header.b=SusUoY2S;\r\n spf=pass (google.com: domain of bounce-md_31064008.5f5a51e1.v1-8084cafe0c6c4aeca73fef8bdaf5b70b#mandrillapp.com designates 198.2.180.17 as permitted sender) smtp.mailfrom=bounce-md_31064008.5f5a51e1.v1-8084cafe0c6c4aeca73fef8bdaf5b70b#mandrillapp.com;\r\n dmarc=pass (p=NONE sp=NONE dis=NONE) 
header.from=spekit.co\r\nReturn-Path: <bounce-md_31064008.5f5a51e1.v1-8084cafe0c6c4aeca73fef8bdaf5b70b#mandrillapp.com>\r\nReceived: from mail180-17.suw31.mandrillapp.com (mail180-17.suw31.mandrillapp.com. [198.2.180.17])\r\n by mx.google.com with ESMTPS id t10si6240908ybl.463.2020.09.10.09.18.42\r\n for <automation+qa1#spekit.co>\r\n (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);\r\n Thu, 10 Sep 2020 09:18:42 -0700 (PDT)\r\nReceived-SPF: pass (google.com: domain of bounce-md_31064008.5f5a51e1.v1-8084cafe0c6c4aeca73fef8bdaf5b70b#mandrillapp.com designates 198.2.180.17 as permitted sender) client-ip=198.2.180.17;\r\nAuthentication-Results: mx.google.com;\r\n dkim=pass header.i=#spekit.co header.s=mandrill header.b=RhjFdk+T;\r\n dkim=pass header.i=#mandrillapp.com header.s=mandrill header.b=SusUoY2S;\r\n spf=pass (google.com: domain of bounce-md_31064008.5f5a51e1.v1-8084cafe0c6c4aeca73fef8bdaf5b70b#mandrillapp.com designates 198.2.180.17 as permitted sender) smtp.mailfrom=bounce-md_31064008.5f5a51e1.v1-8084cafe0c6c4aeca73fef8bdaf5b70b#mandrillapp.com;\r\n dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=spekit.co\r\nDKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; s=mandrill; d=spekit.co;\r\n h=From:Subject:To:Message-Id:Date:MIME-Version:Content-Type; i=support#spekit.co;\r\n bh=ZNxh0gTg5kVpAZyTHGJ2jWADa5UGAoPCP3GFX1DUu94=;\r\n b=RhjFdk+Tvr3HP43qJoKzVowGAs1SYJFfpq8MK4firz5tcpBYn3UEP/Z5cF+IBA74/PTmCahgTnXi\r\n /EPSbY2b+20ERj4s4VUnwNZw8t4L98gSQiM6o3mF4iVI2JIgABU2Tn2nmB68kGZyxeSOs4bWtE+s\r\n MXleLzg+uTftETJoUhM=\r\nReceived: from pmta03.mandrill.prod.suw01.rsglab.com (127.0.0.1) by mail180-17.suw31.mandrillapp.com id hb98u422sc0h for <automation+qa1#spekit.co>; Thu, 10 Sep 2020 16:18:42 +0000 (envelope-from <bounce-md_31064008.5f5a51e1.v1-8084cafe0c6c4aeca73fef8bdaf5b70b#mandrillapp.com>)\r\nDKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mandrillapp.com; \r\n i=#mandrillapp.com; q=dns/txt; s=mandrill; t=1599754721; h=From : \r\n 
Subject : To : Message-Id : Date : MIME-Version : Content-Type : From : \r\n Subject : Date : X-Mandrill-User : List-Unsubscribe; \r\n bh=ZNxh0gTg5kVpAZyTHGJ2jWADa5UGAoPCP3GFX1DUu94=; \r\n b=SusUoY2SOQosSQrzHafHGf7Pto1Ol3PDGU067dNsjT1ZIOuSP0Dz7DJwqgFn6NpwAV7X7e\r\n pzQQPyDJoAqQCjCdSqG9mp80hAEGwQC89GNu78a8o0NRC+BPRTGaNKV/jX06cXsgp+A4KXfY\r\n 13x1BInjKraTnCYz9TnzDUChIm3pg=\r\nFrom: Support <support#spekit.co>\r\nSubject: Your Spekit Login PIN\r\nReturn-Path: <bounce-md_31064008.5f5a51e1.v1-8084cafe0c6c4aeca73fef8bdaf5b70b#mandrillapp.com>\r\nReceived: from [3.128.246.0] by mandrillapp.com id 8084cafe0c6c4aeca73fef8bdaf5b70b; Thu, 10 Sep 2020 16:18:41 +0000\r\nTo: Automation <automation+qa1#spekit.co>\r\nX-Report-Abuse: Please forward a copy of this message, including all headers, to abuse#mandrill.com\r\nX-Report-Abuse: You can also report abuse here: http://mandrillapp.com/contact/abuse?id=31064008.8084cafe0c6c4aeca73fef8bdaf5b70b\r\nX-Mandrill-User: md_31064008\r\nMessage-Id: <31064008.20200910161841.5f5a51e1e2be13.10518479#mail180-17.suw31.mandrillapp.com>\r\nDate: Thu, 10 Sep 2020 16:18:41 +0000\r\nMIME-Version: 1.0\r\nContent-Type: multipart/alternative; boundary="_av-l5kOy35rlKJaV18wYlOHPA"\r\n\r\n--_av-l5kOy35rlKJaV18wYlOHPA\r\nContent-Type: text/plain; charset=utf-8\r\nContent-Transfer-Encoding: quoted-printable\r\n\r\n Your Spekit Login PIN \r\n Hi Automation, Someone (hopefully you) just\r\nlogged into your Spekit account with the email *automation+qa1#spekit.co*. \r\n \r\n If this was you, please use the code below to log-in, otherwise please\r\ncontact your admin and reset your password ASAP.\r\n =3D *952681* =3D\r\n\r\n Enter PIN <https://app.spekit.co/verifypin>\r\n<http://www.twitter.com/spekitapp>\r\n<https://www.linkedin.com/company/spekit/> <https://medium.com/spekit>\r\n<https://spekit.co/> \r\nQuestions? Contact us. <mailto:support#spekit.co>\r\n Copyright =C2=A9 2018 Spekit, Inc. 
All rights reserved.\r\n\r\n--_av-l5kOy35rlKJaV18wYlOHPA\r\nContent-Type: text/html; charset=utf-8\r\nContent-Transfer-Encoding: quoted-printable\r\n\r\n<!doctype html>\r\n<html xmlns=3D"http://www.w3.org/1999/xhtml" xmlns:v=3D"urn:schemas-microso=\r\nft-com:vml" xmlns:o=3D"urn:schemas-microsoft-com:office:office">\r\n <head>\r\n <!-- NAME: 1 COLUMN - FULL WIDTH -->\r\n <!--[if gte mso 15]>\r\n <xml>\r\n <o:OfficeDocumentSettings>\r\n <o:AllowPNG/>\r\n <o:PixelsPerInch>96</o:PixelsPerInch>\r\n </o:OfficeDocumentSettings>\r\n </xml>\r\n <![endif]-->\r\n <meta charset=3D"UTF-8">\r\n <meta http-equiv=3D"X-UA-Compatible" content=3D"IE=3Dedge">\r\n <meta name=3D"viewport" content=3D"width=3Ddevice-width, initial-sc=\r\nale=3D1">\r\n <title>Your Spekit Login PIN</title>\r\n \r\n <style type=3D"text/css">\r\n=09=09p{\r\n=09=09=09margin:10px 0;\r\n=09=09=09padding:0;\r\n=09=09}\r\n=09=09table{\r\n=09=09=09border-</tbody></table> ')']
I need to get the value '952681', which appears twice in the message. Can someone help me with that?
If the format of the email stays the same, you can use a regex to parse the returned string:
import re
pattern = r'\*([\s\S]*?)\*'
res = re.findall(pattern, your_email_text)
The variable res contains your number at the second position:
['automation+qa1#spekit.co', '952681']
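An alternative that does not depend on the stringified list is to parse the fetched bytes with the standard-library email module and search only the decoded text/plain part. This is a minimal sketch: RAW is a hypothetical, heavily shortened stand-in for data[0][1] from mail.fetch(...), and extract_pin is a name invented here.

```python
import email
import re

# Hypothetical, shortened stand-in for data[0][1] as returned by mail.fetch(...).
RAW = (
    b"Subject: Your Spekit Login PIN\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/alternative; boundary="b1"\r\n'
    b"\r\n"
    b"--b1\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"please use the code below to log-in\r\n"
    b" =3D *952681* =3D\r\n"
    b"--b1\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"<html><body><b>952681</b></body></html>\r\n"
    b"--b1--\r\n"
)


def extract_pin(raw_bytes):
    """Return the first run of digits wrapped in asterisks in the plain-text part."""
    msg = email.message_from_bytes(raw_bytes)
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        charset = part.get_content_charset() or "utf-8"
        # decode=True undoes the quoted-printable transfer encoding
        text = part.get_payload(decode=True).decode(charset, errors="replace")
        match = re.search(r"\*(\d+)\*", text)
        if match:
            return match.group(1)
    return None


pin = extract_pin(RAW)
```

Restricting the pattern to digits avoids picking up the email address that is also wrapped in asterisks in the real message.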

Python regex re.match() not returning any results

I'm hoping this is just something simple. I'm trying to determine whether or not an email is already encrypted.
# Read e-mail from stdin
raw = sys.stdin.read()
raw_message = email.message_from_string( raw )
I took the example from http://docs.python.org/2/howto/regex.html on doing a simple test for match.
p = re.compile('-----BEGIN\sPGP\sMESSAGE-----')
m = p.match(raw)
if m:
    log = open(cfg['logging']['file'], 'a')
    log.write("THIS IS ENCRYPTED")
    log.close()
else:
    log = open(cfg['logging']['file'], 'a')
    log.write("NOT ENCRYPTED:")
    log.close()
The email is read. The log file is written to but it always comes back no match. I've written raw to a logfile and that string is present.
Not sure where to go next.
UPDATE:
Here is the output from a raw ( a simple test message )
Sending email to: <bruce#packetaddiction.com>
Received: from localhost (localhost [127.0.0.1])
by mail2.packetaddiction.com (Postfix) with ESMTP id 5FE5D22A65
for <bruce#packetaddiction.com>; Tue, 10 Sep 2013 16:19:12 +0000 (UTC)
X-Virus-Scanned: Debian amavisd-new at mail2.packetaddiction.com
Received: from mail2.packetaddiction.com ([127.0.0.1])
by localhost (mail2.packetaddiction.com [127.0.0.1]) (amavisd-new, port 10024)
with ESMTP id cc3zZ_izEb1j for <bruce#packetaddiction.com>;
Tue, 10 Sep 2013 16:19:06 +0000 (UTC)
Received: from mail.secryption.com (mail.secryption.com [178.18.24.223])
by mail2.packetaddiction.com (Postfix) with ESMTPS id 9CA3C22A5B
for <bruce#packetaddiction.com>; Tue, 10 Sep 2013 16:19:06 +0000 (UTC)
Received: from localhost (localhost.localdomain [127.0.0.1])
by mail.secryption.com (Postfix) with ESMTP id 9994E1421F81
for <bruce#packetaddiction.com>; Tue, 10 Sep 2013 12:19:19 -0400 (EDT)
X-Virus-Scanned: Debian amavisd-new at mail.secryption.com
Received: from mail.secryption.com ([127.0.0.1])
by localhost (mail.secryption.com [127.0.0.1]) (amavisd-new, port 10024)
with ESMTP id WbkVn_cowG6q for <bruce#packetaddiction.com>;
Tue, 10 Sep 2013 12:19:18 -0400 (EDT)
Received: from dennis.cng.int (mail.compassnetworkgroup.com [173.163.129.21])
(using TLSv1 with cipher RC4-MD5 (128/128 bits))
(No client certificate requested)
by mail.secryption.com (Postfix) with ESMTPSA id 5B4191421F80
for <bruce#packetaddiction.com>; Tue, 10 Sep 2013 12:19:18 -0400 (EDT)
User-Agent: K-9 Mail for Android
MIME-Version: 1.0
Content-Type: text/plain;
charset=UTF-8
Content-Transfer-Encoding: 8bit
Subject: Message
From: Bruce Markey <bruce#secryption.com>
Date: Tue, 10 Sep 2013 12:19:00 -0400
To: "bruce#packetaddiction.com" <bruce#packetaddiction.com>
Message-ID: <36615ed6-a1a9-49ac-ac85-31905916d478#email.android.com>
-----BEGIN PGP MESSAGE-----
Version: APG v1.0.8
hQEMAwPNxvNWsisWAQgAuOTLkiitYzhGJydOzN4sBoGjhRm9JeJMfmxKxKTKcV2W
ZBuN0z+nS1KxnXrIlahhwLtpiFvp5apI8wAyAiLC2BhFieFttOl1/xLVJbd1nI1o
KQE1RUXhPURejJ3eH9g/LmkhtFQcnsuHGTGnLi6dugBNhWLqgnLUBX+VLt6moz2C
84lDuQ1y7B/JFOctKRScUqmxDd8b2peZJOnVT/p0tSYNfN9QGH3W02FZShE4KKBl
HpezK8KC6cZdf34Eao+ep+fP5DuKx/4j3ksCbFKyQ3gd+yxK/xnhkijDsYCfFRiF
ElAGDvXu4RXqrKRpBxq1bRhU8YqS7j5593MTUViWitLAGgH1DV0UeA/B5LMUDRyz
4ZfDqd0kDYsPUy2Cg20HdXHaobkzdvHLzfqQq0Owc1nTcvu4nzCbIMhTAlZjn8ZA
aODTlKcvnFBWEtNERPm0x6nkbhMo3GeysejaJSRod3aGqhuhga4iIrrew1W03297
aalwY8RKeNoV15VItsyrbbT+HvDNSaFFCPUAs+KcLHCOez5/woozjlqKdBI6yHCe
gqpYJPP07qFsVviltfDO63xS48f2HCPe4iyXCy6Usp0+jM7zAzH7KH1O854GH46Q
r0A01DLo9REmDr4U
=pBQZ
-----END PGP MESSAGE-----
re.match only finds a match at the beginning of the string, as the Python documentation on search() vs. match() notes. You want to use re.search instead:
raw = """Sending email to: <bruce#packetaddiction.com>...
...
-----BEGIN PGP MESSAGE-----
...
"""
>>> p = re.compile('-----BEGIN\sPGP\sMESSAGE-----')
>>> m = p.search(raw)
>>> m
<_sre.SRE_Match object at 0x0000000002E02510>
>>> m.group()
'-----BEGIN PGP MESSAGE-----'
>>> m = p.match(raw)
>>> print m
None
Although, as noted, a regex is likely overkill for this problem because the text to match is static.
Regular expressions are for "fuzzy" matches, that is, when you aren't sure the string you are looking for will be identical every time.
Here the string you are looking for is always exactly -----BEGIN PGP MESSAGE-----, so the str.find method will be simpler to use and faster to boot.
>>> a = "This is a PGP encrypted email. -----BEGIN PGP MESSAGE----- !##$%^..."
>>> b = "This is not encrypted. My hovercraft is full of eels." #example strings
>>> a.find("-----BEGIN PGP MESSAGE-----")
31 # Return value 31 means that the search string was found at index 31 of the source string
>>> b.find("-----BEGIN PGP MESSAGE-----")
-1 # -1 means 'not found in the source string'
>>>
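Putting the two answers side by side, here is a small Python 3 sketch. The helper name is_pgp_encrypted and the shortened raw string are made up for the example; only the marker text comes from the question.

```python
import re

PGP_HEADER = "-----BEGIN PGP MESSAGE-----"


def is_pgp_encrypted(raw):
    """Plain substring test; no regex needed for a fixed marker."""
    return PGP_HEADER in raw


# Shortened, hypothetical stand-in for the raw message read from stdin.
raw = "Subject: Message\n\n" + PGP_HEADER + "\nVersion: APG v1.0.8\n"

pattern = re.compile(r"-----BEGIN\sPGP\sMESSAGE-----")
anchored = pattern.match(raw)    # only tries position 0, so this is None
anywhere = pattern.search(raw)   # scans the whole string

# If the marker must sit on a line of its own, re.MULTILINE lets ^ and $
# match at line boundaries instead of only at the ends of the string.
own_line = re.search(r"^-----BEGIN PGP MESSAGE-----$", raw, re.MULTILINE)
```

The substring test is usually enough. The MULTILINE variant is only worth it if a quoted marker in the middle of a line should not count.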

Broken attachment encryption with python and GnuPG

I've taken all the forks of gpg-mailgate and put the working parts together to get it almost totally working. The last issue I'm having is that attachments come through encrypted as filename.originalextension.pgp BUT cannot be decrypted.
Here is the full code of the mailgate plugin as I have it working.
#!/usr/bin/python
from ConfigParser import RawConfigParser
from email.mime.base import MIMEBase
import email
import email.message
import re
import GnuPG
import smtplib
import sys
# Read configuration from /etc/gpg-mailgate.conf
_cfg = RawConfigParser()
_cfg.read('/etc/gpg-mailgate.conf')
cfg = dict()
for sect in _cfg.sections():
    cfg[sect] = dict()
    for (name, value) in _cfg.items(sect):
        cfg[sect][name] = value
# Read e-mail from stdin
raw = sys.stdin.read()
raw_message = email.message_from_string( raw )
from_addr = raw_message['From']
to_addrs = sys.argv[1:]
def send_msg( message, recipients = None ):
    if recipients == None:
        recipients = to_addrs
    if cfg.has_key('logging') and cfg['logging'].has_key('file'):
        log = open(cfg['logging']['file'], 'a')
        log.write("Sending email to: <%s>\n" % '> <'.join( recipients ))
        log.close()
    relay = (cfg['relay']['host'], int(cfg['relay']['port']))
    smtp = smtplib.SMTP(relay[0], relay[1])
    smtp.sendmail( from_addr, recipients, message.as_string() )
def encrypt_payload( payload, gpg_to_cmdline ):
    gpg = GnuPG.GPGEncryptor( cfg['gpg']['keyhome'], gpg_to_cmdline )
    raw_payload = payload.get_payload(decode=True)
    gpg.update( raw_payload )
    if "-----BEGIN PGP MESSAGE-----" in raw_payload and "-----END PGP MESSAGE-----" in raw_payload:
        return payload
    payload.set_payload( gpg.encrypt() )
    if payload['Content-Disposition']:
        payload.replace_header( 'Content-Disposition', re.sub(r'filename="([^"]+)"', r'filename="\1.pgp"', payload['Content-Disposition']) )
    if payload['Content-Type']:
        payload.replace_header( 'Content-Type', re.sub(r'name="([^"]+)"', r'name="\1.pgp"', payload['Content-Type']) )
        # if payload.get_content_type() != 'text/plain' and payload.get_content_type != 'text/html':
        if 'name="' in payload['Content-Type']:
            payload.replace_header( 'Content-Type', re.sub(r'^[a-z/]+;', r'application/octet-stream;', payload['Content-Type']) )
            payload.set_payload( "\n".join( filter( lambda x:re.search(r'^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{4})$',x), payload.get_payload().split("\n") ) ) )
    return payload
def encrypt_all_payloads( payloads, gpg_to_cmdline ):
    encrypted_payloads = list()
    if type( payloads ) == str:
        msg = email.message.Message()
        msg.set_payload( payloads )
        return encrypt_payload( msg, gpg_to_cmdline ).as_string()
    for payload in payloads:
        if( type( payload.get_payload() ) == list ):
            encrypted_payloads.append( encrypt_all_payloads( payload.get_payload(), gpg_to_cmdline ) )
        else:
            encrypted_payloads.append( [encrypt_payload( payload, gpg_to_cmdline )] )
    return sum(encrypted_payloads, [])
def get_msg( message ):
    if not message.is_multipart():
        return message.get_payload()
    return '\n\n'.join( [str(m) for m in message.get_payload()] )
keys = GnuPG.public_keys( cfg['gpg']['keyhome'] )
gpg_to = list()
ungpg_to = list()
for to in to_addrs:
    domain = to.split('@')[1]
    if domain in cfg['default']['domains'].split(','):
        if to in keys:
            gpg_to.append( (to, to) )
        elif cfg.has_key('keymap') and cfg['keymap'].has_key(to):
            gpg_to.append( (to, cfg['keymap'][to]) )
    else:
        ungpg_to.append(to)
if gpg_to == list():
    if cfg['default'].has_key('add_header') and cfg['default']['add_header'] == 'yes':
        raw_message['X-GPG-Mailgate'] = 'Not encrypted, public key not found'
    send_msg( raw_message )
    exit()
if ungpg_to != list():
    send_msg( raw_message, ungpg_to )
if cfg.has_key('logging') and cfg['logging'].has_key('file'):
    log = open(cfg['logging']['file'], 'a')
    log.write("Encrypting email to: %s\n" % ' '.join( map(lambda x: x[0], gpg_to) ))
    log.close()
if cfg['default'].has_key('add_header') and cfg['default']['add_header'] == 'yes':
    raw_message['X-GPG-Mailgate'] = 'Encrypted by GPG Mailgate'
gpg_to_cmdline = list()
gpg_to_smtp = list()
for rcpt in gpg_to:
    gpg_to_smtp.append(rcpt[0])
    gpg_to_cmdline.extend(rcpt[1].split(','))
encrypted_payloads = encrypt_all_payloads( raw_message.get_payload(), gpg_to_cmdline )
raw_message.set_payload( encrypted_payloads )
send_msg( raw_message, gpg_to_smtp )
My clients (both Roundcube and K-9) do not know what to do with the file.
From command line if I do a gpg --decrypt filename.txt.pgp I get:
gpg: no valid OpenPGP data found.
gpg: decrypt_message failed: eof
Headers of email are:
User-Agent: K-9 Mail for Android
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----YR49011GO0MWM753ETYWUA7CBOKGAV"
Subject: New with attachments
From: Bruce Markey <bruce#secryption.com>
Date: Fri, 13 Sep 2013 06:18:03 -0400
To: "bruce#packetaddiction.com" <bruce#packetaddiction.com>
Message-ID: <53178821-6d6c-4b7d-b9c7-5a49034da1ef#email.android.com>
X-GPG-Mailgate: Encrypted by GPG Mailgate
I'm not even sure "what" to debug since everything looks ok, and there is a total lack of errors.
If anyone has any direction I'd appreciate it.
Update:
I came across this related question: "Decrypt gpg file attached from email (file.pgp)".
I decided to add a line to write raw to a log file.
MIME-Version: 1.0
X-Received: by 10.68.130.1 with SMTP id oa1mr18868651pbb.35.1379162867744;
Sat, 14 Sep 2013 05:47:47 -0700 (PDT)
Received: by 10.68.46.72 with HTTP; Sat, 14 Sep 2013 05:47:47 -0700 (PDT)
Date: Sat, 14 Sep 2013 08:47:47 -0400
Message-ID: <CACRtyey-L9Z5JGNG4bheYqJ7tVK+6qfigmanH9pTUk0ute5gEw#mail.gmail.com>
Subject: Test with attachment - Saturday
From: Bruce Markey <bmarkey#gmail.com>
To: bruce#packetaddiction.com
Content-Type: multipart/mixed; boundary=047d7b10ca15d1c7b904e65760eb
--047d7b10ca15d1c7b904e65760eb
Content-Type: text/plain; charset=ISO-8859-1
Just a simple test with txt attachment
--047d7b10ca15d1c7b904e65760eb
Content-Type: text/plain; charset=US-ASCII; name="TestAttach.txt"
Content-Disposition: attachment; filename="TestAttach.txt"
Content-Transfer-Encoding: base64
X-Attachment-Id: f_hlkty4930
VGhpcyBpcyBqdXN0IGEgdGVzdCBvZiB0aGUgYXR0YWNobWVudHMuIApUaGlzIGlzIGEgc2ltcGxl
IHRleHQgZmlsZS4gCgo=
--047d7b10ca15d1c7b904e65760eb--
Since this is being written raw it's pre encryption. So should I be decoding the base64 prior to encryption?
After staring at this for awhile I don't understand why this line is here.
if 'name="' in payload['Content-Type']:
    payload.replace_header( 'Content-Type', re.sub(r'^[a-z/]+;', r'application/octet-stream;', payload['Content-Type']) )
    payload.set_payload( "\n".join( filter( lambda x:re.search(r'^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{4})$',x), payload.get_payload().split("\n") ) ) )
Why change to application/octet-stream?
UPDATE:
I think I got it working unless I'm doing something horribly wrong. I changed the following:
def get_msg( message ):
    if not message.is_multipart():
        return message.get_payload()
    return '\n\n'.join( [base64.decodestring(str(m)) for m in message.get_payload()] )
This now allows me to actually run gpg --decrypt filename.txt.
( I assume that most attachments will come through as base64 although I'll probably add a test for all content-transfer-encoding types. )
The attachment needed to be decoded according to its Content-Transfer-Encoding before being encrypted. See the code in the update to the original question.
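For reference, the email module can undo the transfer encoding itself, whatever it is (base64, quoted-printable, or none), via get_payload(decode=True). Below is a minimal Python 3 sketch of just that step, using the test message from the question with the blank lines between headers and bodies restored. The helper name decoded_parts is made up, and the GnuPG step is deliberately left out.

```python
import email

# Test message from the question, with header/body blank lines restored.
RAW = """\
MIME-Version: 1.0
Subject: Test with attachment - Saturday
Content-Type: multipart/mixed; boundary=047d7b10ca15d1c7b904e65760eb

--047d7b10ca15d1c7b904e65760eb
Content-Type: text/plain; charset=ISO-8859-1

Just a simple test with txt attachment

--047d7b10ca15d1c7b904e65760eb
Content-Type: text/plain; charset=US-ASCII; name="TestAttach.txt"
Content-Disposition: attachment; filename="TestAttach.txt"
Content-Transfer-Encoding: base64

VGhpcyBpcyBqdXN0IGEgdGVzdCBvZiB0aGUgYXR0YWNobWVudHMuIApUaGlzIGlzIGEgc2ltcGxl
IHRleHQgZmlsZS4gCgo=
--047d7b10ca15d1c7b904e65760eb--
"""


def decoded_parts(message):
    """Yield (filename, raw_bytes) for each leaf part, transfer encoding undone."""
    for part in message.walk():
        if part.is_multipart():
            continue  # containers carry no data of their own
        yield part.get_filename(), part.get_payload(decode=True)


msg = email.message_from_string(RAW)
parts = list(decoded_parts(msg))
attachment_name, attachment_bytes = parts[1]
# attachment_bytes is what should be handed to the encryptor,
# not the base64 text that appears in the raw message.
```

Encrypting these decoded bytes, rather than the base64 text, is what makes the resulting .pgp file decryptable back to the original attachment.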

Parsing email headers with regular expressions in python

I'm a Python beginner trying to extract data from email headers. I have thousands of email messages in a single text file, and from each message I want to extract the sender's address, the recipients' addresses, and the date, and write them to a single, semicolon-delimited line in a new file.
This is ugly, but it's what I've come up with:
import re
emails = open("demo_text.txt","r") #opens the file to analyze
results = open("results.txt","w") #creates new file for search results
resultsList = []
for line in emails:
    if "From - " in line: #recognizes the beginning of a email message and adds a linebreak
        newMessage = re.findall(r'\w\w\w\s\w\w\w.*', line)
        if newMessage:
            resultsList.append("\n")
    if "From: " in line:
        address = re.findall(r'[\w.-]+#[\w.-]+', line)
        if address:
            resultsList.append(address)
            resultsList.append(";")
    if "To: " in line:
        if "Delivered-To:" not in line: #avoids confusion with 'Delivered-To:' tag
            address = re.findall(r'[\w.-]+#[\w.-]+', line)
            if address:
                for person in address:
                    resultsList.append(person)
                    resultsList.append(";")
    if "Date: " in line:
        date = re.findall(r'\w\w\w\,.*', line)
        resultsList.append(date)
        resultsList.append(";")
for result in resultsList:
    results.writelines(result)
emails.close()
results.close()
and here's my 'demo_text.txt':
From - Sun Jan 06 19:08:49 2013
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000
Delivered-To: somebody_1#hotmail.com
Received: by 10.48.48.3 with SMTP id v3cs417003nfv;
Mon, 15 Jan 2007 10:14:19 -0800 (PST)
Received: by 10.65.211.13 with SMTP id n13mr5741660qbq.1168884841872;
Mon, 15 Jan 2007 10:14:01 -0800 (PST)
Return-Path: <nobody#hotmail.com>
Received: from bay0-omc3-s21.bay0.hotmail.com (bay0-omc3-s21.bay0.hotmail.com [65.54.246.221])
by mx.google.com with ESMTP id e13si6347910qbe.2007.01.15.10.13.58;
Mon, 15 Jan 2007 10:14:01 -0800 (PST)
Received-SPF: pass (google.com: domain of nobody#hotmail.com designates 65.54.246.221 as permitted sender)
Received: from hotmail.com ([65.54.250.22]) by bay0-omc3-s21.bay0.hotmail.com with Microsoft SMTPSVC(6.0.3790.2668);
Mon, 15 Jan 2007 10:13:48 -0800
Received: from mail pickup service by hotmail.com with Microsoft SMTPSVC;
Mon, 15 Jan 2007 10:13:47 -0800
Message-ID: <BAY115-F12E4E575FF2272CF577605A1B50#phx.gbl>
Received: from 65.54.250.200 by by115fd.bay115.hotmail.msn.com with HTTP;
Mon, 15 Jan 2007 18:13:43 GMT
X-Originating-IP: [200.122.47.165]
X-Originating-Email: [nobody#hotmail.com]
X-Sender: nobody#hotmail.com
From: =?iso-8859-1?B?UGF1bGEgTWFy7WEgTGlkaWEgRmxvcmVuemE=?=
<nobody#hotmail.com>
To: somebody_1#hotmail.com, somebody_2#gmail.com, 3_nobodies#yahoo.com.ar
Bcc:
Subject: fotos
Date: Mon, 15 Jan 2007 18:13:43 +0000
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="----=_NextPart_000_d98_1c4f_3aa9"
X-OriginalArrivalTime: 15 Jan 2007 18:13:47.0572 (UTC) FILETIME=[E68D4740:01C738D0]
Return-Path: nobody#hotmail.com
The output is:
somebody_1#hotmail.com;somebody_2#gmail.com;3_nobodies#yahoo.com.ar;Mon, 15 Jan 2007 18:13:43 +0000;
This output would be fine except there's a line break in the 'From:' field in my demo_text.txt (line 24), and so I miss 'nobody#hotmail.com'.
I'm not sure how to tell my code to skip line break and still find email address in the From: tag.
More generally, I'm sure there are many more sensible ways to go about this task. If anyone could point me in the right direction, I'd sure appreciate it.
Your demo text is practically in the mbox format, which can be processed directly with the mbox class from the mailbox module:
from mailbox import mbox
import re
PAT_EMAIL = re.compile(r"[0-9A-Za-z._-]+\#[0-9A-Za-z._-]+")
mymbox = mbox("demo_text.txt")
for email in mymbox.values():
    from_address = PAT_EMAIL.findall(email["from"])
    to_address = PAT_EMAIL.findall(email["to"])
    date = [ email["date"], ]
    print(";".join(from_address + to_address + date))
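The reason a parser copes with the broken From: line is that a header continued on an indented line is one logical header, and the email package treats it that way. A small Python 3 sketch of that point, using email.utils.getaddresses instead of a hand-written address regex; the sample message and its example.com addresses are hypothetical, and summarize is a name invented here.

```python
import email
from email.utils import getaddresses

# Hypothetical single message with a folded From: header, like the one in the question.
RAW = """\
From: Paula Example
 <nobody@example.com>
To: somebody_1@example.com, somebody_2@example.org
Date: Mon, 15 Jan 2007 18:13:43 +0000
Subject: fotos

body text
"""


def summarize(raw):
    """Return 'sender;recipient;...;date' for one message."""
    msg = email.message_from_string(raw)
    senders = [addr for _, addr in getaddresses(msg.get_all("From", []))]
    recipients = [addr for _, addr in getaddresses(msg.get_all("To", []))]
    return ";".join(senders + recipients + [msg["Date"]])


line = summarize(RAW)
fields = line.split(";")
```

The same summarize logic could be applied to each message yielded by mailbox.mbox, since those objects are email messages as well.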
To handle the line break, you can't read the file line by line. You can try loading the whole file and using your keywords (From, To, etc.) as boundaries. When you search for 'From -', the remaining keywords mark where each field ends, so they are not included in the extracted portion.
Also, mentioning this because you said you're a beginner:
The "Pythonic" way of naming your non-class variables is lowercase with underscores. So resultsList should be results_list.

python3 download mail via pop3 and process it

I have printer status reports sent to my email. I would like to download them and process them one by one, then put all the information in a database for further processing.
I would like to use Python 3, as I am just starting to learn it.
I have this code:
import getpass
import poplib
server = poplib.POP3('pop3.mailserver.com' )
server.user('report#mailserver.com')
server.pass_('pswd')
numMessages = 1 #len(server.list()[1])
emails, total_bytes = server.stat()
print("{0} emails in the inbox, {1} bytes total".format(emails, total_bytes))
for i in range(numMessages):
    for msg in server.retr(i+1)[1]:
        print(msg)
What I get is the whole email message (headers and body) in this format:
b'Return-Path: <"tever">'
b'Delivered-To: reportc#mailserver.com'
b'Received: (qmail 13193 invoked by uid 89); 23 May 2012 08:44:51 -0000'
b'Received: by simscan 1.2.0 ppid: 13156, pid: 13164, t: 0.1620s'
b' scanners: clamav: 0.97-exp/m:53 spam: 3.3.1'
b'X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mxavas16.ad.aruba.it'
b'X-Spam-Level: *'
b'X-Spam-Status: No, score=1.4 required=5.0 tests=FH_FROMEML_NOTLD,INVALID_MSGID,'
b'\tT_FILL_THIS_FORM_SHORT autolearn=disabled version=3.3.1'
b'Received: from unknown (HELO smtplq02.aruba.it) (62.149.158.35)'
b' by mxavas16.ad.aruba.it with SMTP; 23 May 2012 08:44:51 -0000'
b'Received: (qmail 30750 invoked by uid 89); 23 May 2012 08:44:51 -0000'
b'Received: from unknown (HELO smtp8.aruba.it) (62.149.158.228)'
b' by smtplq02.aruba.it with SMTP; 23 May 2012 08:44:51 -0000'
b'Received: (qmail 30979 invoked by uid 89); 23 May 2012 08:44:51 -0000'
b'Received: from unknown (HELO NM7ACD31) (email#server.it#83.xxx.xxx.xxx)'
b' by smtp8.ad.aruba.it with SMTP; 23 May 2012 08:44:51 -0000'
b'Date: Wed, 23 May 2012 10:46:34 +0200'
b'From: tever'
b'Subject: QEQ1313212'
b'To: report#mailserver.com'
b'Message-Id: <201205231046340001d806.TEVER>'
b'Mime-Version: 1.0'
b'Content-Type: text/plain; charset="utf-8"'
b'Content-Transfer-Encoding: base64'
b''
b'RXF1aXBtZW50IElEOiAgICAgICAgICAgICANCk1vZGVsIE5hbWU6ICAgICAgICAg'
b'ICAgICAgQ0RDIDE3MjVfRENDIDI3MjUNClNlcmlhbCBOdW1iZXI6ICAgICAgICAg'
b'ICAgUUVRMTMxMzIxMg0KTWV0ZXJEYXRlOiAgICAgICAgICAgICAgICBXZWQgMjMg'
b'TWF5IDIwMTIgMTA6NDY6MzQNCkNvdW50ZXJzIGJ5IEZ1bmN0aW9uDQogUHJpbnRl'
b'ZCBQYWdlcw0KICBDb3BpZXI6ICAgICAgICAgICAgICAgICAyMjE1ICAgIA0KICBQ'
b'cmludGVyOiAgICAgICAgICAgICAgICAxMTEyMDQgIA0KICBGQVg6ICAgICAgICAg'
b'ICAgICAgICAgICA5MzIgICAgIA0KICBUb3RhbDogICAgICAgICAgICAgICAgICAx'
b'MTQzNTEgIA0KIFNjYW5uZWQgUGFnZXMNCiAgQ29waWVyOiAgICAgICAgICAgICAg'
b'ICAgMTkxOSAgICANCiAgRkFYOiAgICAgICAgICAgICAgICAgICAgMjIwNyAgICAN'
b'CiAgT3RoZXI6ICAgICAgICAgICAgICAgICAgMTgyMiAgICANCiAgVG90YWw6ICAg'
b'ICAgICAgICAgICAgICAgNTk0OCAgICANCkNvdW50ZXJzIGJ5IFBhcGVyIFNpemUN'
b'Ck1vbm9jaHJvbWUNCiAgQTM6ICAgICAgICAgICAgICAgICAgICAgNDU0ICAgICAN'
b'CiAgQjQ6ICAgICAgICAgICAgICAgICAgICAgMCAgICAgICANCiAgQTQ6ICAgICAg'
b'ICAgICAgICAgICAgICAgMTA4MDQ4ICANCiAgQjU6ICAgICAgICAgICAgICAgICAg'
b'ICAgNDI3ICAgICANCiAgQTU6ICAgICAgICAgICAgICAgICAgICAgMCAgICAgICAN'
b'CiAgRm9saW86ICAgICAgICAgICAgICAgICAgMSAgICAgICANCiAgTGVkZ2VyOiAg'
b'ICAgICAgICAgICAgICAgMCAgICAgICANCiAgTGVnYWw6ICAgICAgICAgICAgICAg'
b'ICAgMCAgICAgICANCiAgTGV0dGVyOiAgICAgICAgICAgICAgICAgMCAgICAgICAN'
b'CiAgU3RhdGVtZW50OiAgICAgICAgICAgICAgMCAgICAgICANCiAgT3RoZXIxOiAg'
b'ICAgICAgICAgICAgICAgMCAgICAgICANCiAgT3RoZXIyOiAgICAgICAgICAgICAg'
b'ICAgMiAgICAgICANCk1vbm8gQ29sb3INCiAgQTM6ICAgICAgICAgICAgICAgICAg'
b'ICAgMCAgICAgICANCiAgQjQ6ICAgICAgICAgICAgICAgICAgICAgMCAgICAgICAN'
b'CiAgQTQ6ICAgICAgICAgICAgICAgICAgICAgMCAgICAgICANCiAgQjU6ICAgICAg'
b'IE90aGVyIEVycm9ycw0KDQo8V2VkIDIzIE1heSAyMDEyIDEwOjQxOjU0Pg0KICBb'
b'IF0gQWxsIE90aGVyIEVycm9ycw0KDQo8V2VkIDIzIE1heSAyMDEyIDEwOjQ1OjIx'
b'Pg0KICBbKl0gQWRkIFBhcGVyDQoNCi0tLS0tLS0tLS0tLS0tLS0tLS0NCkNEQyAx'
b'NzI1X0RDQyAyNzI1DQpbMDA6YzA6ZWU6N2E6Y2Q6MzFdDQotLS0tLS0tLS0tLS0t'
b'LS0tLS0t'
b'DQo='
b''
What I need is to process the body content line by line and, if it matches, delete the message from the server.
Any tips on how to do it?
many thanks
gerard
Parsing the message first would be a good start:
# ... get your message ...
# msg = [b'Return-Path: <"tever">'
# b'Delivered-To: reportc#mailserver.com', ... ]
import email
# decode simple non-multipart message
message = email.message_from_bytes(b'\n'.join(msg))
payload = message.get_payload(decode=True)
payload = payload.decode(message.get_content_charset() or 'utf-8')  # fall back to UTF-8 if no charset is declared
print(payload)
then you can do with the payload whatever you need...
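To make the "line by line, then delete" part concrete, here is a minimal sketch for simple non-multipart messages. The helper names body_lines and should_delete, the sample message, and the match string are all made up for illustration. The sample body is base64-encoded in the code itself so the example runs without a mail server.

```python
import base64
import email


def body_lines(raw_lines):
    """Parse the list of byte lines from POP3.retr() and return decoded body lines."""
    message = email.message_from_bytes(b"\r\n".join(raw_lines))
    payload = message.get_payload(decode=True)  # undoes base64 here
    charset = message.get_content_charset() or "utf-8"
    return payload.decode(charset, errors="replace").splitlines()


def should_delete(lines, needle):
    """Hypothetical rule: delete when any body line contains the given text."""
    return any(needle in line for line in lines)


# Hypothetical stand-in for server.retr(i + 1)[1].
encoded_body = base64.encodebytes(
    b"Model Name: CDC 1725\r\nSerial Number: QEQ1313212\r\n"
)
sample = [
    b"Subject: QEQ1313212",
    b'Content-Type: text/plain; charset="utf-8"',
    b"Content-Transfer-Encoding: base64",
    b"",
] + encoded_body.splitlines()

lines = body_lines(sample)
delete_it = should_delete(lines, "Serial Number: QEQ1313212")

# With a live connection this would be followed by something like:
#     if delete_it:
#         server.dele(i + 1)
# and server.quit() at the end, since POP3 only removes messages marked
# for deletion when the session is closed with QUIT.
```

For multipart messages, get_payload(decode=True) returns None on the container, so the parts would have to be walked individually.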
