Convert Outlook PST to json using libpst - python

I have an Outlook PST file, and I'd like to get a JSON dump of the emails, e.g. something like
{"emails": [
  {"from": "alice@example.com",
   "to": "bob@example.com",
   "bcc": "eve@example.com",
   "subject": "mitm",
   "content": "be careful!"
  }, ...]}
I've thought of using readpst to convert to MH format and then scanning the result with a ruby/python/bash script. Is there a better way?
Unfortunately the ruby-msg gem doesn't work on my PST files (and it looks like it hasn't been updated since 2014).

I found a way to do it in 2 stages, first convert to mbox and then to json:
# requires installing libpst
pst2json my.pst
# or you can specify a custom output dir and an outlook mail folder,
# e.g. Inbox, Sent, etc.
pst2json -o email/ -f Inbox my.pst
Where pst2json is my script and mbox2json is slightly modified from Mining the Social Web.
pst2json:
#!/usr/bin/env bash
usage(){
    echo "usage: $(basename "$0") [-o <output-dir>] [-f <folder>] <pst-file>"
    echo "default output-dir: email/mbox-all/<pst-file>"
    echo "default folder: Inbox"
    exit 1
}
which readpst || { echo "Error: libpst not installed"; exit 1; }
folder=Inbox
while (( $# > 0 )); do
    [[ -n "$pst_file" ]] && usage
    case "$1" in
        -o)
            if [[ -n "$2" ]]; then
                out_dir="$2"
                shift 2
            else
                usage
            fi
            ;;
        -f)
            if [[ -n "$2" ]]; then
                folder="$2"
                shift 2
            else
                usage
            fi
            ;;
        *)
            pst_file="$1"
            shift
    esac
done
default_out_dir="email/mbox-all/$(basename "$pst_file")"
out_dir=${out_dir:-"$default_out_dir"}
mkdir -p "$out_dir"
readpst -o "$out_dir" "$pst_file"
[[ -f "$out_dir/$folder" ]] || { echo "Error: folder $folder is missing or empty."; exit 1; }
res="$out_dir"/"$folder".json
mbox2json "$out_dir/$folder" "$res" && echo "Success: result saved to $res"
mbox2json (python 2.7):
# -*- coding: utf-8 -*-
import sys
import mailbox
import email
import quopri
import json
from BeautifulSoup import BeautifulSoup

MBOX = sys.argv[1]
OUT_FILE = sys.argv[2]
SKIP_HTML = True

def cleanContent(msg):
    # Decode message from "quoted printable" format
    msg = quopri.decodestring(msg)
    # Strip out HTML tags, if any are present
    soup = BeautifulSoup(msg)
    return ''.join(soup.findAll(text=True))

def jsonifyMessage(msg):
    json_msg = {'parts': []}
    for (k, v) in msg.items():
        json_msg[k] = v.decode('utf-8', 'ignore')
    # The To, CC, and Bcc fields, if present, could have multiple items
    # Note that not all of these fields are necessarily defined
    for k in ['To', 'Cc', 'Bcc']:
        if not json_msg.get(k):
            continue
        json_msg[k] = json_msg[k].replace('\n', '').replace('\t', '').replace(
            '\r', '').replace(' ', '').decode('utf-8', 'ignore').split(',')
    try:
        for part in msg.walk():
            json_part = {}
            if part.get_content_maintype() == 'multipart':
                continue
            type = part.get_content_type()
            if SKIP_HTML and type == 'text/html':
                continue
            json_part['contentType'] = type
            content = part.get_payload(decode=False).decode('utf-8', 'ignore')
            json_part['content'] = cleanContent(content)
            json_msg['parts'].append(json_part)
    except Exception, e:
        sys.stderr.write('Skipping message - error encountered (%s)\n' % (str(e), ))
    finally:
        return json_msg

# There's a lot of data to process, so use a generator to do it. See http://wiki.python.org/moin/Generators
# Using a generator requires a trivial custom encoder be passed to json for serialization of objects
class Encoder(json.JSONEncoder):
    def default(self, o):
        return {'emails': list(o)}

# The generator itself...
def gen_json_msgs(mb):
    while 1:
        msg = mb.next()
        if msg is None:
            break
        yield jsonifyMessage(msg)

mbox = mailbox.UnixMailbox(open(MBOX, 'rb'), email.message_from_file)
json.dump(gen_json_msgs(mbox), open(OUT_FILE, 'wb'), indent=4, cls=Encoder)
Now, it's possible to process the file easily. E.g. to get just the contents of the emails:
jq '.emails[] | .parts[] | .content' < out/Inbox.json
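The mbox2json script above targets Python 2.7 and BeautifulSoup 3. As a rough sketch only, the same mbox-to-JSON step might look like this on Python 3 using just the standard library. The names message_to_dict, decode_payload and mbox_to_json are made up for this illustration, it keeps only text/plain parts, and repeated headers (e.g. Received) overwrite each other.

```python
import json
import mailbox


def decode_payload(part):
    """Return the decoded text of one non-multipart message part."""
    payload = part.get_payload(decode=True) or b""
    charset = part.get_content_charset() or "utf-8"
    try:
        return payload.decode(charset, errors="replace")
    except LookupError:  # the message names a charset Python does not know
        return payload.decode("utf-8", errors="replace")


def message_to_dict(msg):
    """Flatten one message into a JSON-friendly dict (header names lowercased)."""
    record = {key.lower(): str(value) for key, value in msg.items()}
    texts = []
    for part in msg.walk():
        # Skip containers and anything that is not plain text (HTML, attachments).
        if part.get_content_maintype() == "multipart":
            continue
        if part.get_content_type() != "text/plain":
            continue
        texts.append(decode_payload(part))
    record["content"] = "\n".join(texts)
    return record


def mbox_to_json(mbox_path, json_path):
    """Write all messages of an mbox file as {"emails": [...]}; return the count."""
    box = mailbox.mbox(mbox_path)
    emails = [message_to_dict(msg) for msg in box]
    with open(json_path, "w", encoding="utf-8") as out:
        json.dump({"emails": emails}, out, indent=4)
    return len(emails)
```

With this layout the body of each email would be read with jq '.emails[] | .content' rather than through .parts[].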


Jenkins build failing without updating Xray with the failed status

Please forgive me if this is not the place to ask this question. I'm running Python scripts in a Jenkins pipeline from a Jenkinsfile. I am also updating Jira Xray tickets within the Jenkinsfile. Behave is being used to validate the test status. If the check fails, the Jenkins build fails without the Xray ticket being updated with the failure. I've attempted to use "try" to capture the failure but have not succeeded in getting the failure to propagate to the Xray ticket.
Would anyone here know where I might find an answer? I would be in your debt.
Jenkinsfile
node() {
def repoURL = '<GitLab URL>/prod-003.git'
def STC_INSTALL = "/opt/STC_CLIENT/Spirent_TestCenter_5.22/Spirent_TestCenter_Application_Linux/"
try {
stage("Prepare Workspace") {
echo "*** Prepare Workspace ***"
cleanWs()
env.WORKSPACE_LOCAL = sh(returnStdout: true, script: 'pwd').trim()
env.BUILD_TIME = "${BUILD_TIMESTAMP}"
echo "Workspace set to:" + env.WORKSPACE_LOCAL
echo "Build time:" + env.BUILD_TIME
sh """
cd ${env.WORKSPACE_LOCAL}
rm -fr *
"""
}
stage('Checkout Code') {
echo "*** Checking Code Out ***"
git branch: 'master', credentialsId: '', url: repoURL
}
stage('Executing Tests') {
if (env.WanModeCheck == "Yes") {
echo "Executing WAN Mode Change Before FW Upgrade"
sh """
/var/lib/jenkins/.pyenv/shims/python WanMode.py -i $modemIP -m $WanMode
"""
echo "Starting Firmware Upgrade"
sh """
cd ${env.WORKSPACE_LOCAL}
./ModemUpgrade.sh -i $modemIP -f $FW -p2
/var/lib/jenkins/.pyenv/shims/behave -f cucumber -o storetarget-bdd/reporting/cucumber.json --junit --format=json -o target/behave.json --junit ./features/PROD-003.feature
"""
} else {
echo "#######################\n# Skipping WAN Mode Change #\n#######################"
}
if (env.WanModeCheck == "No") {
echo "Starting Firmware Upgrade"
sh """
cd ${env.WORKSPACE_LOCAL}
./ModemUpgrade.sh -i $modemIP -f $FW -p2
/var/lib/jenkins/.pyenv/shims/behave -f cucumber -o storetarget-bdd/reporting/cucumber.json --junit --format=json -o target/behave.json --junit ./features/fwupgrade.feature
"""
}
// Setting variables to use for the Xray Test Execution
res = sh(returnStdout: true, script: 'awk "/##/{f=1;next} /#####/{f=0} f" PROD-003-Out.txt | sed -e "s/#//g" -e "s/^ * //g" | tr "\n" "%" | sed -e "s/^%%%%%%//g" -e "s/%%$//g" -e "s/%/\\\\\\\\Z/g" -e "s/Z/n/g"')
env.STResults = res.strip()
model = sh(returnStdout: true, script: 'grep Model: PROD-003-Out.txt')
env.Model = model.strip()
wanmode = sh(returnStdout: true, script: 'grep CPE PROD-003-Out.txt')
env.WanMode = wanmode.strip()
serialnum = sh(returnStdout: true, script: 'grep Number: PROD-003-Out.txt')
env.SerialNum = serialnum.strip()
echo "End of test phase"
}
stage('Expose report') {
echo "*** Expose Reports ***"
echo "*** Archive Artifacts ***"
archiveArtifacts "**/cucumber.json"
echo "*** cucumber cucumber.json ***"
cucumber '**/cucumber.json'
junit skipPublishingChecks: true, allowEmptyResults: true, keepLongStdio: true, testResults: 'reports/*.xml'
cucumber buildStatus: "UNSTABLE",
fileIncludePattern: "**/cucumber.json",
jsonReportDirectory: 'reports'
}
stage('Import results to Xray') {
echo "*** Import Results to XRAY ***"
def description = "Jenkins Project: ${env.JOB_NAME}\\n\\nCucumber Test Report: [${env.JOB_NAME}-Link|${env.BUILD_URL}/cucumber-html-reports/overview-features.html]\\n\\nJenkins Console Output: [${env.JOB_NAME}-Console-Link|${env.BUILD_URL}/console]\\n\\nCPE IP: ${modemIP}\\n\\nCPE FW File Name: ${FW}\\n\\n${env.STResults}"
def labels = '["regression","automated_regression"]'
def environment = "DEV"
def testExecutionFieldId = 10552
def testEnvironmentFieldName = "customfield_10372"
def projectKey = "AARC"
def projectId = 10608
def xrayConnectorId = "e66d84d8-f978-4af6-9757-93d5804fde1d"
// def xrayConnectorId = "${xrayConnectorId}"
def info = '''{
"fields": {
"project": {
"id": "''' + projectId + '''"
},
"labels":''' + labels + ''',
"description":"''' + description + '''",
"summary": "''' + env.JOB_NAME + ' ' + env.Model + ' ' + env.WanMode + ' ' + env.SerialNum + ''' Test Executed ''' + env.BUILD_TIME + ''' " ,
"issuetype": {
"id": "''' + testExecutionFieldId + '''"
}
}
}'''
echo info
step([$class: 'XrayImportBuilder',
endpointName: '/cucumber/multipart',
importFilePath: 'storetarget-bdd/reporting/cucumber.json',
importInfo: info,
inputInfoSwitcher: 'fileContent',
serverInstance: xrayConnectorId])
}
}
catch(e) {
// If there was an exception thrown, the build failed
currentBuild.result = "FAILED"
throw e
} finally {
// Success or failure, always send notifications
echo "Sending final test status to Slack"
// notifyBuild(currentBuild.result)
}
}
def notifyBuild(String buildStatus = 'STARTED') {
// build status of null means successful
buildStatus = buildStatus ?: 'SUCCESSFUL'
// Default values
def colorName = 'RED'
def colorCode = '#FF0000'
def subject = "${buildStatus}: Job '${env.JOB_NAME} [${env.BUILD_NUMBER}]'"
def summary = "${subject} (${env.BUILD_URL})"
def details = """<p>STARTED: Job '${env.JOB_NAME} [${env.BUILD_NUMBER}]':</p>
<p>Check console output at &QUOT;<a href='${env.BUILD_URL}'>${env.JOB_NAME} [${env.BUILD_NUMBER}]</a>&QUOT;</p>"""
// Override default values based on build status
if (buildStatus == 'STARTED') {
color = 'BLUE'
colorCode = '#0000FF'
msg = "Build: ${env.JOB_NAME} has started: ${BUILD_TIMESTAMP}"
} else if (buildStatus == 'UNSTABLE') {
color = 'YELLOW'
colorCode = '#FFFF00'
msg = "Build: ${env.JOB_NAME} was listed as unstable. Look at ${env.BUILD_URL} and Report: ${env.BUILD_URL}/cucumber-html-reports/overview-features.html"
} else if (buildStatus == 'SUCCESSFUL') {
color = 'GREEN'
colorCode = '#00FF00'
msg = "Build: ${env.JOB_NAME} Completed Successfully ${env.BUILD_URL} Report: ${env.BUILD_URL}/cucumber-html-reports/overview-features.html"
} else {
color = 'RED'
colorCode = '#FF0000'
msg = "Build: ${env.JOB_NAME} had an issue ${env.BUILD_URL}/console"
}
// Send notifications
slackSend (color: colorCode, message: summary)
slackSend baseUrl: 'https://hooks.slack.com/services/',
channel: '#wopr-private',
color: colorCode,
message: msg,
teamDomain: '<Slack URL>',
tokenCredentialId: 'Jenkins-Slack-Token',
username: 'JenkinsAutomation'
}
feature file
Feature: SNMP Firmware Upgrade Test
#demo #AARC-3428
Scenario: SNMP Firmware Upgrade Executed against the DUT
Given ModemUpgrade.sh Script Exists
When SNMP Firmware Upgrade Executed
Then I expect Result Pass
step file
from behave import *
import pathlib
from pathlib import Path

@given('ModemUpgrade.sh Script Exists')
def step_impl(context):
    STCFile = pathlib.Path('ModemUpgrade.sh')
    if STCFile.exists():
        print("SNMP Firmware Upgrade file exists")
        pass
    # else:
    #     print("SNMP Firmware Upgrade file does not exists")
    #     assert context.failed

@when('SNMP Firmware Upgrade Executed')
def step_impl(context):
    path_to_file = 'PROD-003-Out.txt'
    path = Path(path_to_file)
    if path.is_file():
        print(f'Output file {path_to_file} exists')
    else:
        print(f'Output file {path_to_file} does not exists')

@then('I expect Result Pass')
def step_impl(context):
    Result = False
    with open("PROD-003-Out.txt") as FwUpgradeResults:
        for line in FwUpgradeResults:
            if 'Upgrade Status: Passed'.lower() in line.strip().lower():
                Result = True
                break
            else:
                Result = False
                break
    if Result is False:
        print("Error: Upgrade Failed")
        assert context.failed
The suggestion of using || /usr/bin/true appears to have worked for the above mentioned code. Now I have a second instance where my Python test is throwing an exception when the DUT fails DHCP bind
def wait_for_dhcp_bind():
    try:
        stc.perform("Dhcpv4BindWait", objectlist=project)
    except Exception:
        raise Exception("DHCP Bind Failed")
I attempted to add the same after the Python script but the Jenkins build fails without the Xray test getting updated with a failure.
Here is what this looks like in the Jenkinsfile
echo "Starting Speed Test"
// def ModemMac = sh(returnStdout: true, script: './ModemUpgrade.sh -i ${modemIP} -f mac')
sh """
export STC_PRIVATE_INSTALL_DIR=${STC_INSTALL}
cd ${env.WORKSPACE_LOCAL}
/var/lib/jenkins/.pyenv/shims/python SpeedTest.py -d $dsp -u $usp -i $iterations -x $imix -f $frames -m $ModemMac || /usr/bin/true
/var/lib/jenkins/.pyenv/shims/behave -f cucumber -o storetarget-bdd/reporting/cucumber.json --junit --format=json -o target/behave.json --junit ./features/speedtest.feature || /usr/bin/true
"""
Your case should be easy to fix. The behave utility returns exit code 1 if any test fails, and a non-zero exit code from an sh step aborts the pipeline before the Xray import stage runs.
Just add || /usr/bin/true to the end of your behave command (please check the path of the "true" command on your system).
This makes the command always return success, even if behave reports failures.
So your overall command should be something like:
/var/lib/jenkins/.pyenv/shims/behave -f cucumber -o storetarget-bdd/reporting/cucumber.json --junit --format=json -o target/behave.json --junit ./features/PROD-003.feature || /usr/bin/true
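Note that || /usr/bin/true discards the exit code completely. As a hedged alternative sketch, a small Python wrapper could run the command, report the failure, and hand the exit code back to the caller instead of aborting. The function name run_tolerant is invented for this example and is not part of behave or Jenkins.

```python
import subprocess
import sys


def run_tolerant(cmd):
    """Run cmd (a list of arguments); report a failure but never raise on it."""
    completed = subprocess.run(cmd)
    if completed.returncode != 0:
        # Log the failure so it is visible in the console output.
        print("command failed with exit code", completed.returncode, file=sys.stderr)
    return completed.returncode
```

A wrapper script built on this could call run_tolerant on the behave command and then exit with 0, so the sh step succeeds and the later Xray import stage still runs with the cucumber.json that behave wrote.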

Running a powershell on multiple python threads

I have backup software written in Python. It lists all the computers on a network, and I need to back each one up to a given directory. My idea was: when the user clicks Backup for a computer, a thread is launched in which a PowerShell script runs and the percentage ((bytes copied / total bytes to be copied) * 100) is displayed. This works fine and I can back up a single computer without problems.
But here's the problem: whenever I launch a second thread (by clicking Backup on another computer), the first PowerShell process seems to stop. There is no more output from it, and the only running process is the new PowerShell process launched on the second thread.
Thread run method:
def run(self):
    '''
    Initialise the runner function with passed args, kwargs.
    '''
    self.setAlive(True)
    p = subprocess.Popen(["powershell.exe", ".\\script.ps1 . .\\Backups"], stdout=subprocess.PIPE)
    self.signals.progress.emit(self.CurrentComp['RowIndex'], 0, self)
    percentage = 0.0
    a = ""
    displayPercentage = 0.0
    while a != "Done Copying":
        a = p.stdout.readline()
        a = a.decode("utf-8").strip()
        try:
            percentage = float(a)
            print(percentage)
            while displayPercentage != int(percentage):
                displayPercentage += 1
                sleep(0.03)
                self.signals.progress.emit(self.CurrentComp['RowIndex'], displayPercentage, self)
        except ValueError:
            pass
    self.signals.finished.emit()
Param([String[]]$paths,[String]$Destination)
function Copy-WithProgress {
[CmdletBinding()]
param (
[Parameter(Mandatory = $true)]
[string] $Source
, [Parameter(Mandatory = $true)]
[string] $Destination
, [int] $Gap = 200
, [int] $ReportGap = 2000
)
$RegexBytes = '(?<=\s+)\d+(?=\s+)';
$CommonRobocopyParams = '/MIR /NP /NDL /NC /BYTES /NJH /NJS';
$StagingLogPath = '{0}\temp\{1} robocopy staging.log' -f $env:windir, (Get-Date -Format 'yyyy-MM-dd HH-mm-ss');
$StagingArgumentList = '"{0}" "{1}" /LOG:"{2}" /L {3}' -f $Source, $Destination, $StagingLogPath, $CommonRobocopyParams;
Start-Process -Wait -FilePath robocopy.exe -ArgumentList $StagingArgumentList -NoNewWindow;
$StagingContent = Get-Content -Path $StagingLogPath;
$TotalFileCount = $StagingContent.Count - 1;
[RegEx]::Matches(($StagingContent -join "`n"), $RegexBytes) | % { $BytesTotal = 0; } { $BytesTotal += $_.Value; };
$RobocopyLogPath = '{0}\temp\{1} robocopy.log' -f $env:windir, (Get-Date -Format 'yyyy-MM-dd HH-mm-ss');
$ArgumentList = '"{0}" "{1}" /LOG:"{2}" /ipg:{3} {4}' -f $Source, $Destination, $RobocopyLogPath, $Gap, $CommonRobocopyParams;
$Robocopy = Start-Process -FilePath robocopy.exe -ArgumentList $ArgumentList -Verbose -PassThru -NoNewWindow;
Start-Sleep -Milliseconds 100;
#region Progress bar loop
while (!$Robocopy.HasExited) {
Start-Sleep -Milliseconds $ReportGap;
$BytesCopied = 0;
$LogContent = Get-Content -Path $RobocopyLogPath;
$BytesCopied = [Regex]::Matches($LogContent, $RegexBytes) | ForEach-Object -Process { $BytesCopied += $_.Value; } -End { $BytesCopied; };
#Write-Verbose -Message ('Bytes copied: {0}' -f $BytesCopied);
#Write-Verbose -Message ('Files copied: {0}' -f $LogContent.Count);
$Percentage = 0;
if ($BytesCopied -gt 0) {
$Percentage = (($BytesCopied/$BytesTotal)*100)
}
Write-Host $Percentage
}
Write-Host "Done Copying"
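One detail of the run method worth noting: if the child process exits without ever printing "Done Copying", readline() keeps returning an empty string and the while loop never ends. The sketch below is not a diagnosis of the stall described above. It only shows, under that assumption, a reader loop that also stops at end of output, with each call owning its own process and pipe. The name watch_progress and the callback parameter are hypothetical.

```python
import subprocess
import sys
import threading


def watch_progress(cmd, on_progress):
    """Run cmd, pass each numeric output line to on_progress, return the exit code."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    # Iterating over the pipe ends at EOF, so a crashed child cannot hang the loop.
    for line in proc.stdout:
        line = line.strip()
        if line == "Done Copying":
            break
        try:
            on_progress(float(line))
        except ValueError:
            pass  # ignore anything that is not a percentage
    proc.stdout.close()
    return proc.wait()
```

Each thread would call watch_progress with its own command list, for example ["powershell.exe", ".\\script.ps1", ...], and its own callback that emits the progress signal.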

Need a help on fetching value based on a key from a config file

I have a file containing similar data
[xxx]
name = xxx
address = bangalore
[yyy]
name = yyy
address = sjc
Please help me write a regex that fetches the address or name value for a given section (the inputs are xxx or yyy, and address or name).
You can do something like this with awk if your file is just like that (i.e., the name is the same as the section and it is before the address):
$ awk -v nm='yyy' -F ' *= *' '$1=="name" && $2==nm{infi=1; next}
$1=="address" && infi {print $2; infi=0}' file
sjc
Or, better still you can get the section and then fetch the key, value as they occur and print them and then exit:
$ awk -v sec='yyy' -v key='address' '
BEGIN{
FS=" *= *"
pat=sprintf("^\\[%s\\]", sec)}
$0 ~ pat {secin=$1; next}
NF==2 && $1==key && secin ~ pat {print $2; exit}' file
sjc
If you want to gather all sections with their key/value pairs, you can do (with gawk):
$ gawk 'BEGIN{FS=" *= *"}
/^\[[^\]]+\]/ && NF==1 {sec=$1; next}
NF==2 {d[sec][$1]=$2}
END{ for (k in d){
printf "%s: ",k
for (v in d[k])
printf "\t%s = %s\n", v, d[k][v]
}
}' file
[xxx]: address = bangalore
name = xxx
[yyy]: address = sjc
name = yyy
Config or .ini files can have quoting like csv, so it is best to use a full config file parser. You can use Perl or Python that have robust libraries for parsing .ini or config type files.
Python example:
#!/usr/bin/python
import ConfigParser
config = ConfigParser.ConfigParser()
config.read("/tmp/file")
Then you can grab the sections, the items in each section, or a specific item in a specific section:
>>> config.sections()
['xxx', 'yyy']
>>> config.items("yyy")
[('name', 'yyy'), ('address', 'sjc')]
>>> config.get("xxx", "address")
'bangalore'
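The example above uses the Python 2 module name. On Python 3 the module is called configparser; a minimal sketch of the same lookup, reading the sample from a string instead of /tmp/file, could look like this (the helper name lookup is made up for illustration):

```python
import configparser

SAMPLE = """\
[xxx]
name = xxx
address = bangalore
[yyy]
name = yyy
address = sjc
"""


def lookup(text, section, key):
    """Return the value of key inside [section] of an ini-style string."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser.get(section, key)


print(lookup(SAMPLE, "yyy", "address"))  # prints: sjc
```

parser.get raises configparser.NoSectionError or configparser.NoOptionError when the section or key is missing, so callers may want to catch those.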
Regex to the rescue!
This approach splits the entries into single elements and parses the key-value pairs afterwards. In the end, you can simply ask the resulting dictionary for, e.g., values['xxx'].
See a demo on ideone.com.
import re
string = """
[xxx]
name = xxx
address = bangalore
[yyy]
name = yyy
address = sjc
"""
rx_item = re.compile(r'''
^\[(?P<name>[^][]*)\]
.*?
(?=^\[[^][]*\]$|\Z)
''', re.X | re.M | re.DOTALL)
rx_value = re.compile(r'^(?P<key>\w+)\s*=\s*(?P<value>.+)$', re.MULTILINE)
values = {item.group('name'): {
m.group('key'): m.group('value')
for m in rx_value.finditer(item.group(0))}
for item in rx_item.finditer(string)
}
print(values)
# {'xxx': {'name': 'xxx', 'address': 'bangalore'}, 'yyy': {'name': 'yyy', 'address': 'sjc'}}
It's not clear if you're trying to search for the value inside the square brackets or the value of the "name" tag but here's a solution to one possible interpretation of your question:
$ cat tst.awk
BEGIN { FS=" *= *" }
!NF { next }
NF<2 { prt(); k=$0 }
{ map[$1] = $2 }
END { prt() }
function prt() { if (k=="["key"]") print map[tag]; delete map }
$ awk -v key='yyy' -v tag='address' -f tst.awk file
sjc
$ awk -v key='xxx' -v tag='address' -f tst.awk file
bangalore
$ awk -v key='xxx' -v tag='name' -f tst.awk file
xxx

Undoing "marked as read" status of emails fetched with imaplib

I wrote a python script to fetch all of my gmail. I have hundreds of thousands of old emails, of which about 10,000 were unread.
After successfully fetching all of my email, I find that gmail has marked all the fetched emails as "read". This is disastrous for me since I need to check all unread emails only.
How can I recover the information about which emails were unread? I dumped each mail object into files, the core of my code is shown below:
m = imaplib.IMAP4_SSL("imap.gmail.com")
m.login(user, pwd)
m.select("[Gmail]/All Mail")
resp, items = m.uid('search', None, 'ALL')
uids = items[0].split()
for uid in uids:
    resp, data = m.uid('fetch', uid, "(RFC822)")
    email_body = data[0][1]
    mail = email.message_from_string(email_body)
    dumpobj(uid, mail)
I am hoping there is either an option to undo this in gmail, or a member inside the stored mail objects reflecting the seen-state information.
For anyone looking to prevent this headache, consider this answer here. This does not work for me, however, since the damage has already been done.
Edit:
I have written the following function to recursively "grep" all strings in an object, and applied it to a dumped email object using the following keywords:
regex = "(?i)((marked)|(seen)|(unread)|(read)|(flag)|(delivered)|(status)|(sate))"
So far, no results (only an unrelated "Delivered-To"). Which other keywords could I try?
def grep_object (obj, regex , cycle = set(), matched = set()):
    import re
    if id(obj) in cycle:
        return
    cycle.update([id(obj)])
    if isinstance(obj, basestring):
        if re.search(regex, obj):
            matched.update([obj])
    def grep_dict (adict ):
        try:
            [ [ grep_object(a, regex, cycle, matched ) for a in ab ] for ab in adict.iteritems() ]
        except:pass
    grep_dict(obj)
    try:grep_dict(obj.__dict__)
    except:pass
    try:
        [ grep_object(elm, regex, cycle, matched ) for elm in obj ]
    except: pass
    return matched
grep_object(mail_object, regex)
I'm having a similar problem (not with gmail), and the biggest problem for me was to make a reproducible test case; and I finally managed to produce one (see below).
In terms of the Seen flag, I now gather it goes like this:
If a message is new/unseen, IMAP fetch for \Seen flag will return empty (i.e. it will not be present, as related to the email message).
If you do IMAP select on a mailbox (INBOX), you get a "flag" UNSEEN which contains a list of ids (or uids) of emails in that folder that are new (do not have the \Seen flag)
In my test case, if you fetch say headers for a message with BODY.PEEK, then \Seen on a message is not set; if you fetch them with BODY, then \Seen is set
In my test case, also fetching (RFC822) doesn't set \Seen (unlike your case with Gmail)
In the test case, I tried pprint.pprint(inspect.getmembers(mail)) (in lieu of your dumpobj(uid, mail)), but only after I was certain \Seen had been set. The output I get is posted in mail_object_inspect.txt, and as far as I can see, there is no mention of 'new/read/seen' etc. in any of the readable fields; furthermore mail.as_string() prints:
'From: jesse@example.com\nTo: user@example.com\nSubject: This is a test message!\n\nHello. I am executive assistant to the director of\nBear Stearns, a failed investment Bank. I have\naccess to USD6,000,000. ...\n'
Even worse, there is no mention of "fields" anywhere in the imaplib code (below filenames are printed if they do not contain case-insensitive "field" anywhere):
$ grep -L -i field /usr/lib/python{2.7,3.2}/imaplib.py
/usr/lib/python2.7/imaplib.py
/usr/lib/python3.2/imaplib.py
... so I guess that information was not saved with your dumps.
Here is a bit on reconstructing the test case. The hardest part was finding a small IMAP server that can be run quickly with some arbitrary users and emails, without having to install a ton of stuff on your system. Finally I found one: trivial-server.pl, the example file of Perl's Net::IMAP::Server; tested on Ubuntu 11.04.
The test case is pasted in this gist, with two files (with many comments) that I'll try to post abridged:
trivial-serverB.pl - Perl (v5.10.1) Net::IMAP::Server server (has a terminal output paste at end of file with a telnet client session)
testimap.py - Python 2.7/3.2 imaplib client (has a terminal output paste at end of file, of itself operating with the server)
trivial-serverB.pl
First, make sure you have Net::IMAP::Server - note, it has many dependencies, so the below command may take a while to install:
sudo perl -MCPAN -e 'install Net::IMAP::Server'
Then, in the directory where you got trivial-serverB.pl, create a subdirectory with SSL certificates:
mkdir certs
openssl req \
-x509 -nodes -days 365 \
-subj '/C=US/ST=Oregon/L=Portland/CN=localhost' \
-newkey rsa:1024 -keyout certs/server-key.pem -out certs/server-cert.pem
Finally, run the server with administrative privileges:
sudo perl trivial-serverB.pl
Note that trivial-serverB.pl contains a hack which lets a client connect without SSL. Here is trivial-serverB.pl:
#!/usr/bin/perl
use v5.10.1;
use feature qw(say);
use Net::IMAP::Server;
package Demo::IMAP::Hack;
$INC{'Demo/IMAP/Hack.pm'} = 1;
sub capabilityb {
    my $self = shift;
    print STDERR "Capabilitin'\n";
    my $base = $self->server->capability;
    my @words = split " ", $base;
    @words = grep {$_ ne "STARTTLS"} @words
        if $self->is_encrypted;
    unless ($self->auth) {
        my $auth = $self->auth || $self->server->auth_class->new;
        my @auth = $auth->sasl_provides;
        # hack:
        #unless ($self->is_encrypted) {
        #    # Lack of encryption makes us turn off all plaintext auth
        #    push @words, "LOGINDISABLED";
        #    @auth = grep {$_ ne "PLAIN"} @auth;
        #}
        push @words, map {"AUTH=$_"} @auth;
    }
    return join(" ", @words);
}
package Demo::IMAP::Auth;
$INC{'Demo/IMAP/Auth.pm'} = 1;
use base 'Net::IMAP::Server::DefaultAuth';
sub auth_plain {
    my ( $self, $user, $pass ) = @_;
    # XXX DO AUTH CHECK
    $self->user($user);
    return 1;
}
package Demo::IMAP::Model;
$INC{'Demo/IMAP/Model.pm'} = 1;
use base 'Net::IMAP::Server::DefaultModel';
sub init {
my $self = shift;
$self->root( Demo::IMAP::Mailbox->new() );
$self->root->add_child( name => "INBOX" );
}
###########################################
package Demo::IMAP::Mailbox;
use base qw/Net::IMAP::Server::Mailbox/;
use Data::Dumper;
my $data = <<'EOF';
From: jesse@example.com
To: user@example.com
Subject: This is a test message!

Hello. I am executive assistant to the director of
Bear Stearns, a failed investment Bank. I have
access to USD6,000,000. ...
EOF
my $msg = Net::IMAP::Server::Message->new($data);
sub load_data {
my $self = shift;
$self->add_message($msg);
}
my %ports = ( port => 143, ssl_port => 993 );
$ports{$_} *= 10 for grep {$> > 0} keys %ports;
$myserv = Net::IMAP::Server->new(
auth_class => "Demo::IMAP::Auth",
model_class => "Demo::IMAP::Model",
user => 'nobody',
log_level => 3, # at least 3 to output 'CONNECT TCP Peer: ...' message; 4 to output IMAP commands too
%ports,
);
# apparently, this overload MUST be after the new?! here:
{
no strict 'refs';
*Net::IMAP::Server::Connection::capability = \&Demo::IMAP::Hack::capabilityb;
}
# https://stackoverflow.com/questions/27206371/printing-addresses-of-perl-object-methods
say " -", $myserv->can('validate'), " -", $myserv->can('capability'), " -", \&Net::IMAP::Server::Connection::capability, " -", \&Demo::IMAP::Hack::capabilityb;
$myserv->run();
testimap.py
With the server above running in one terminal, in another terminal you can just do:
python testimap.py
The code will simply read fields and content from the one (and only) message the server above presents, and will eventually restore (remove) the \Seen field.
import sys
if sys.version_info[0] < 3:  # python 2.7
    def uttc(x):
        return x
else:  # python 3+
    def uttc(x):
        return x.decode("utf-8")
import imaplib
import email
import pprint, inspect

imap_user = 'nobody'
imap_password = 'whatever'
imap_server = 'localhost'
conn = imaplib.IMAP4(imap_server)
conn.debug = 3
try:
    (retcode, capabilities) = conn.login(imap_user, imap_password)
except:
    print(sys.exc_info()[1])
    sys.exit(1)

# not conn.select(readonly=1), else we cannot modify the \Seen flag later
conn.select()  # Select inbox or default namespace
(retcode, messages) = conn.search(None, '(UNSEEN)')
if retcode == 'OK':
    for num in uttc(messages[0]).split(' '):
        if not(num):
            print("No messages available: num is `{0}`!".format(num))
            break
        print('Processing message: {0}'.format(num))
        typ, data = conn.fetch(num, '(FLAGS)')
        isSeen = ( "Seen" in uttc(data[0]) )
        print('Got flags: {2}: {0} .. {1}'.format(typ, data,  # NEW: OK .. ['1 (FLAGS ())']
            "Seen" if isSeen else "NEW"))
        print('Peeking headers, message: {0} '.format(num))
        typ, data = conn.fetch(num, '(BODY.PEEK[HEADER])')
        pprint.pprint(data)
        typ, data = conn.fetch(num, '(FLAGS)')
        isSeen = ( "Seen" in uttc(data[0]) )
        print('Got flags: {2}: {0} .. {1}'.format(typ, data,  # NEW: OK .. ['1 (FLAGS ())']
            "Seen" if isSeen else "NEW"))
        print('Get RFC822 body, message: {0} '.format(num))
        typ, data = conn.fetch(num, '(RFC822)')
        mail = email.message_from_string(uttc(data[0][1]))
        #pprint.pprint(inspect.getmembers(mail))
        typ, data = conn.fetch(num, '(FLAGS)')
        isSeen = ( "Seen" in uttc(data[0]) )
        print('Got flags: {2}: {0} .. {1}'.format(typ, data,  # NEW: OK .. ['1 (FLAGS ())']
            "Seen" if isSeen else "NEW"))
        print('Get headers, message: {0} '.format(num))
        typ, data = conn.fetch(num, '(BODY[HEADER])')  # note, FLAGS (\\Seen) is now in data, even if not explicitly requested!
        pprint.pprint(data)
        print('Get RFC822 body, message: {0} '.format(num))
        typ, data = conn.fetch(num, '(RFC822)')
        mail = email.message_from_string(uttc(data[0][1]))
        pprint.pprint(inspect.getmembers(mail))  # this is in mail_object_inspect.txt
        pprint.pprint(mail.as_string())
        typ, data = conn.fetch(num, '(FLAGS)')
        isSeen = ( "Seen" in uttc(data[0]) )
        print('Got flags: {2}: {0} .. {1}'.format(typ, data,  # Seen: OK .. ['1 (FLAGS (\\Seen))']
            "Seen" if isSeen else "NEW"))
        conn.select()  # select again, to see flags server side
        # * OK [UNSEEN 0] # no more unseen messages (if there was only one msg in folder)
        print('Restoring flag to unseen/new, message: {0} '.format(num))
        ret, data = conn.store(num, '-FLAGS', '\\Seen')
        if ret == 'OK':
            print("Set back to unseen; Got OK: {0}{1}{2}".format(data, '\n', 30*'-'))
            print(mail)
            typ, data = conn.fetch(num, '(FLAGS)')
            isSeen = ( "Seen" in uttc(data[0]) )
            print('Got flags: {2}: {0} .. {1}'.format(typ, data,  # NEW: OK .. [b'1 (FLAGS ())']
                "Seen" if isSeen else "NEW"))
conn.close()
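The two operations the test script relies on, fetching with BODY.PEEK so that \Seen is not set and clearing \Seen with a STORE, can be wrapped in UID-based helpers like the sketch below. This does not recover which messages were unread before a fetch; it only shows how to avoid or undo the flag change when the UIDs are known. The helper names are invented here, and conn is assumed to be a logged-in imaplib connection with a mailbox selected read-write.

```python
def fetch_without_marking(conn, uid):
    """Fetch a whole message by UID with BODY.PEEK[], which leaves \\Seen untouched."""
    typ, data = conn.uid("fetch", uid, "(BODY.PEEK[])")
    if typ != "OK":
        raise RuntimeError("fetch failed for UID {0!r}".format(uid))
    # data[0] is a (response-header, raw-message-bytes) pair.
    return data[0][1]


def mark_unseen(conn, uids):
    """Remove the \\Seen flag from each UID; return the UIDs the server refused."""
    failed = []
    for uid in uids:
        typ, _ = conn.uid("store", uid, "-FLAGS", "(\\Seen)")
        if typ != "OK":
            failed.append(uid)
    return failed
```

If the mailbox was opened with conn.select(readonly=True), fetches never change flags, but mark_unseen cannot work on such a connection.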
References
How do I mock an IMAP server in Python, despite extreme laziness?
Get only NEW Emails imaplib and python
Undoing "marked as read" status of emails fetched with imaplib
http://www.skytale.net/blog/archives/23-Manual-IMAP.html
IMAP FETCH Subject
https://mail.python.org/pipermail/python-list/2009-March/527020.html
http://www.thecodingforums.com/threads/re-imaplib-fetch-message-flags.673872/

Grep reliably all C #defines

I need to analyse some C files and print out all the #define found.
It's not that hard with a regexp (for example)
def with_regexp(fname):
    print("{0}:".format(fname))
    for line in open(fname):
        match = macro_regexp.match(line)
        if match is not None:
            print(match.groups())
But it doesn't handle multiline defines, for example.
There is a nice way to do it with the C preprocessor:
gcc -E -dM file.c
The problem is that it returns all the #defines, not just the ones from the given file, and I can't find any option to restrict it to the given file.
Any hint?
Thanks
EDIT:
This is a first solution to filter out the unwanted defines, simply checking that the name of the define is actually part of the original file, not perfect but seems to work nicely..
def with_gcc(fname):
    cmd = "gcc -dM -E {0}".format(fname)
    proc = Popen(cmd, shell=True, stdout=PIPE)
    out, err = proc.communicate()
    source = open(fname).read()
    res = set()
    for define in out.splitlines():
        name = define.split(' ')[1]
        if re.search(name, source):
            res.add(define)
    return res
Sounds like a job for a shell one-liner!
What I want to do is remove all the #includes from the C file (so we don't get junk from other files), pass the result to gcc -E -dM, then remove all the built-in #defines: those start with _, and apparently also linux and unix.
If you have #defines that start with an underscore this won't work exactly as promised.
It goes like this:
sed -e '/#include/d' foo.c | gcc -E -dM - | sed -e '/#define \(linux\|unix\|_\)/d'
You could probably do it in a few lines of Python too.
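A possible Python version of that pipeline is sketched below. It assumes gcc is on the PATH, and the function names are made up for this example. Unlike the sed expression, which deletes any line containing #include, this sketch only drops lines that start with the directive (allowing whitespace around the #).

```python
import re
import subprocess

INCLUDE_RE = re.compile(r"^\s*#\s*include\b")
BUILTIN_RE = re.compile(r"^#define (linux|unix|_)")


def strip_includes(source):
    """Return the C source with all #include lines removed."""
    return "\n".join(
        line for line in source.splitlines() if not INCLUDE_RE.match(line)
    )


def drop_builtin_defines(lines):
    """Filter out compiler built-ins: names starting with _, plus linux and unix."""
    return [line for line in lines if not BUILTIN_RE.match(line)]


def own_defines(path):
    """Return the #define lines that gcc reports for the file itself."""
    with open(path) as handle:
        source = strip_includes(handle.read())
    # "-" makes gcc read the preprocessed input from stdin.
    result = subprocess.run(
        ["gcc", "-E", "-dM", "-"],
        input=source, capture_output=True, text=True, check=True,
    )
    return drop_builtin_defines(result.stdout.splitlines())
```

The same caveat applies as for the shell version: any of your own macros that start with an underscore are filtered out as well.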
In PowerShell you could do something like the following:
function Get-Defines {
param([string] $Path)
"$Path`:"
switch -regex -file $Path {
'\\$' {
if ($multiline) { $_ }
}
'^\s*#define(.*)$' {
$multiline = $_.EndsWith('\');
$_
}
default {
if ($multiline) { $_ }
$multiline = $false
}
}
}
Using the following sample file
#define foo "bar"
blah
#define FOO \
do { \
do_stuff_here \
do_more_stuff \
} while (0)
blah
blah
#define X
it prints
\x.c:
#define foo "bar"
#define FOO \
do { \
do_stuff_here \
do_more_stuff \
} while (0)
#define X
Not ideal, at least how idiomatic PowerShell functions should work, but should work well enough for your needs.
Doing this in pure python I'd use a small state machine:
def getdefines(fname):
    """ return a list of all define statements in the file """
    lines = open(fname).read().split("\n")  #read in the file as a list of lines
    result = []  #the result list
    current = []  #a temp list that holds all lines belonging to a define
    lineContinuation = False  #was the last line break escaped with a '\'?
    for line in lines:
        #is the current line the start or continuation of a define statement?
        isdefine = line.startswith("#define") or lineContinuation
        if isdefine:
            current.append(line)  #append to current result
            lineContinuation = line.endswith("\\")  #is the line break escaped?
            if not lineContinuation:
                #we reached the define statements end - append it to result list
                result.append('\n'.join(current))
                current = []  #empty the temp list
    return result
