lxml.objectify error when writing data

lxml.objectify error when writing data - python

here is my code:
from lxml import etree, objectify
def parseXML(xmlFile):
with open(xmlFile) as f:
xml = f.read()
root = objectify.fromstring(xml)
#returns attributes in element node as dict
attrib = root.attrib
#how to extract element data
begin = root.appointment.begin
uid = root.appointment.uid
#loop over elements and print their tags and text
for appt in root.getchildren():
for e in appt.getchildren():
print('%s => %s' % (e.tag, e.text))
print()
#how to change element's text
root.appointment.begin = 'something else'
print(root.appointment.begin)
#how to add a new element
root.appointment.new_element = 'new data'
#remove the py:pytype stuff
objectify.deannotate(root)
etree.cleanup_namespaces(root)
obj_xml = etree.tostring(root, pretty_print=True)
print(obj_xml)
#save your xml
with open('new.xml', 'w') as f:
f.write(obj_xml)
parseXML('example.xml')
Here is parsed xml file:
<?xml version="1.0" ?>
<zAppointments reminder="15">
<appointment>
<begin>1181251600</begin>
<uid>0400000008200E000</uid>
<alarmTime>1181572063</alarmTime>
<state></state>
<location></location>
<duration>1800</duration>
<subject>Bring pizza home</subject>
</appointment>
<appointment>
<begin>1234567890</begin>
<duration>1800</duration>
<subject>Check MS office webstie for updates</subject>
<state>dismissed</state>
<location></location>
<uid>502fq14-12551ss-255sf2</uid>
</appointment>
</zAppointments>
And here is output with error:
/usr/bin/python3.5 "/home/michal/Desktop/nauka programowania/python 101/parsing_with_lxml.py"
begin => 1181251600
uid => 0400000008200E000
Traceback (most recent call last):
alarmTime => 1181572063
File "/home/michal/Desktop/nauka programowania/python 101/parsing_with_lxml.py", line 87, in <module>
state => None
location => None
parseXML('example.xml')
duration => 1800
subject => Bring pizza home
begin => 1234567890
duration => 1800
subject => Check MS office webstie for updates
state => dismissed
location => None
uid => 502fq14-12551ss-255sf2
something else
b'<zAppointments reminder="15">\n <appointment>\n <begin>something else</begin>\n <uid>0400000008200E000</uid>\n <alarmTime>1181572063</alarmTime>\n <state/>\n <location/>\n <duration>1800</duration>\n <subject>Bring pizza home</subject>\n <new_element>new data</new_element>\n </appointment>\n <appointment>\n <begin>1234567890</begin>\n <duration>1800</duration>\n <subject>Check MS office webstie for updates</subject>\n <state>dismissed</state>\n <location/>\n <uid>502fq14-12551ss-255sf2</uid>\n </appointment>\n</zAppointments>\n'
File "/home/michal/Desktop/nauka programowania/python 101/parsing_with_lxml.py", line 85, in parseXML
f.write(obj_xml)
TypeError: write() argument must be str, not bytes
Process finished with exit code 1
What can I do to turn that f object to a string? Is it possible even? I got that error few times earlier and still don't know how to fix it (doing Python 101 exercises).

obj_xml is bytes type, so can't use it with write() without decoding it first. Need to change
f.write(obj_xml)
to:
f.write(obj_xml.decode('utf-8'))
And it works great!

Related

How can I parse a VCARD in a XML file

I'm trying to parse an XML file in which there is some VCARD. I need the info: FN, NOTE (SIREN and A) and print them as a list as FN, SIREN_A. I would also like to add them in a list if the string in the description equals "diviseur" only
I've tried different things (vobject, finditer) but none of them work. For my parser, I'm using the library xml.etree.ElementTree and pandas which usually are causing some incompatibilies.
code python :
import xml.etree.ElementTree as ET
import vobject
newlist=[]
data=[]
data.append(newlist)
diviseur=[]
tree=ET.parse('test_oc.xml')
root=tree.getroot()
newlist=[]
for lifeCycle in root.findall('{http://ltsc.ieee.org/xsd/LOM}lifeCycle'):
for contribute in lifeCycle.findall('{http://ltsc.ieee.org/xsd/LOM}contribute'):
for entity in contribute.findall('{http://ltsc.ieee.org/xsd/LOM}entity'):
vcard = vobject.readOne(entity)
siren = vcard.contents['note'].value,":",vcard.contents['fn'].value
print ('siren',siren.text)
for date in contribute.findall('{http://ltsc.ieee.org/xsd/LOM}date'):
for description in date.findall('{http://ltsc.ieee.org/xsd/LOM}description'):
entite=description.find('{http://ltsc.ieee.org/xsd/LOM}string')
print ('Type entité:', entite.text)
newlist.append(entite)
j=0
for j in range(len(entite)-1):
if entite[j]=="diviseur":
diviseur.append(siren[j])
print('diviseur:', diviseur)
newlist.append(diviseur)
data.append(newlist)
print(data)
xml file to parse:
<?xml version="1.0" encoding="UTF-8"?>
<lom:lom xmlns:lom="http://ltsc.ieee.org/xsd/LOM" xmlns:lomfr="http://www.lom-fr.fr/xsd/LOMFR" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://ltsc.ieee.org/xsd/LOM">
<lom:version uniqueElementName="version">
<lom:string language="http://id.loc.gov/vocabulary/iso639-2/fre">V4.1</lom:string>
</lom:version>
<lom:lifeCycle uniqueElementName="lifeCycle">
<lom:contribute>
<lom:entity><![CDATA[
BEGIN:VCARD
VERSION:4.0
FN:Cailler
N:;Valérie;;Mr;
ORG:Veoli
NOTE:SIREN=203025106
NOTE :ISNI=0000000000000000
END:VCARD
]]></lom:entity>
<lom:date uniqueElementName="date">
<lom:dateTime uniqueElementName="dateTime">2019-07-10</lom:dateTime>
<lom:description uniqueElementName="description">
<lom:string>departure</lom:string>
</lom:description>
</lom:date>
</lom:contribute>
<lom:contribute>
<lom:entity><![CDATA[
BEGIN:VCARD
VERSION:4.0
FN:Besnard
N:;Ugo;;Mr;
ORG:MG
NOTE:SIREN=501 025 205
NOTE :A=0000 0000
END:VCARD
]]></lom:entity>
<lom:date uniqueElementName="date">
<lom:dateTime uniqueElementName="dateTime">2019-07-10</lom:dateTime>
<lom:description uniqueElementName="description">
<lom:string>diviseur</lom:string>
</lom:description>
</lom:date>
</lom:contribute>
</lom:lifeCycle>
</lom:lom>
Traceback (most recent call last):
File "parser_export_csv_V2.py", line 73, in
vcard = vobject.readOne(entity)
File "C:\Users\b\AppData\Local\Programs\Python\Python36-32\lib\site-packages\vobject\base.py", line 1156, in readOne
allowQP))
File "C:\Users\b\AppData\Local\Programs\Python\Python36-32\lib\site-packages\vobject\base.py", line 1089, in readComponents
for line, n in getLogicalLines(stream, allowQP):
File "C:\Users\b\AppData\Local\Programs\Python\Python36-32\lib\site-packages\vobject\base.py", line 869, in getLogicalLines
val = fp.read(-1)
AttributeError: 'xml.etree.ElementTree.Element' object has no attribute 'read'

There are a few problems here.
entity is an Element instance, and vCard is a plain text data format. vobject.readOne() expects text.
There is unwanted whitespace adjacent to the vCard properties in the XML file.
NOTE :ISNI=0000000000000000 is invalid; it should be NOTE:ISNI=0000000000000000 (space removed).
vcard.contents['note'] is a list and does not have a value property.
Here is code that probably doesn't produce exactly what you want, but I hope it helps:
import xml.etree.ElementTree as ET
import vobject
NS = {"lom": "http://ltsc.ieee.org/xsd/LOM"}
tree = ET.parse('test_oc.xml')
for contribute in tree.findall('.//lom:contribute', NS):
desc_string = contribute.find('.//lom:string', NS)
print(desc_string.text)
entity = contribute.find('lom:entity', NS)
txt = entity.text.replace(" ", "") # Text with spaces removed
vcard = vobject.readOne(txt)
for p in vcard.contents["note"]:
print(p.name, p.value)
for p in vcard.contents["fn"]:
print(p.name, p.value)
print()
Output:
departure
NOTE SIREN=203025106
NOTE ISNI=0000000000000000
FN Cailler
diviseur
NOTE SIREN=501025205
NOTE A=00000000
FN Besnard

python question regarding re.search and indexing

Im new to python and coding and im trying to understand how re.search and indexing works.
below is what I have so far and I want physical_state to equal what comes after the Completion: (in this case it is success) but I dont really understand how the match.group(1) and re.search works.
import ops # Import the OPS module.
import sys # Import the sys module.
import re
# Subscription processing function
def ops_condition (o):
status, err_str = o.timer.relative("tag",10)
return status
def ops_execute (o):
handle, err_desp = o.cli.open()
print("OPS opens the process of command:",err_desp)
result, n11, n21 = o.cli.execute(handle,"return")
result, n11, n21 = o.cli.execute(handle,"dis nqa results test-instance sla 1 | i Completion:")
match = re.search(r"Completion:", result)
if not match:
print("Could not determine the state.")
return 0 # Look into what the return values mean.
physical_state = match.group(1) # Gets the first group from the match.
print (physical_state)
result = o.cli.close(handle)
return 0
output of result, n11, n21 = o.cli.execute(handle,"dis nqa results test-instance sla 1 | i Completion:")
Completion:success RTD OverThresholds number: 0
Completion:success RTD OverThresholds number: 0
Completion:success RTD OverThresholds number: 0
Completion:success RTD OverThresholds number: 0
Completion:success RTD OverThresholds number: 0
error when run
<setup>('OPS opens the process of command:', 'success')
Oct 18 2018 06:12:57+00:00 setup %%01OPSA/3/OPS_RESULT_EXCEPTION(l)[410]:Script is test3.py, current event is tag, instance is 1381216156, exception reason is Traceback (most recent call last):
File ".lib/frame.py", line 114, in <module>
ret = m.ops_execute(o)
File "flash:$_user/test3.py", line 21, in ops_execute
physical_state = match.group(1) # Gets the first group from the match.
IndexError: no such group
Thanks in advance

.strip won't remove last quotation mark in Python3 Program

EDIT: UPDATE, so I ran this with #Alex Thornton's suggestion.
This is my output:
'100.00"\r'
Traceback (most recent call last):
File "budget.py", line 48, in <module>
Main()
File "budget.py", line 44, in Main
budget = readBudget("budget.txt")
File "budget.py", line 21, in readBudget
p_value = float(maxamount)
ValueError: invalid literal for float(): 100.00"
Under Windows though, I just get the list of numbers, with the qutations and the \r's stripped off.
Now, I don't know too much about the way Windows and Linux handle text files, but isn't it due to the way Windows and Linux handle the return/enter key?
So I have this code:
def readBudget(budgetFile):
# Read the file into list lines
f = open(budgetFile)
lines = f.readlines()
f.close()
budget = []
# Parse the lines
for i in range(len(lines)):
list = lines[i].split(",")
exptype = list[0].strip('" \n')
if exptype == "Type":
continue
maxamount = list[1].strip('$" \n')
entry = {'exptype':exptype, 'maxamnt':float(maxamount)}
budget.append(entry)
#print(budget)
return budget
def printBudget(budget):
print()
print("================= BUDGET ==================")
print("Type".ljust(12), "Max Amount".ljust(12))
total = 0
for b in budget:
print(b['exptype'].ljust(12), str("$%0.2f" %b['maxamnt']).ljust(50))
total = total + b['maxamnt']
print("Total: ", "$%0.2f" % total)
def Main():
budget = readBudget("budget.txt")
printBudget(budget)
if __name__ == '__main__':
Main()
Which reads from this file:
"Type", "MaxAmount"
"SCHOOL","$100.00"
"UTILITIES","$200.00"
"AUTO", "$100.00"
"RENT", "$600.00"
"MEALS", "$300.00"
"RECREATION", "$100.00"
It is supposed to extract the budget type (school, utilities, etc) and the max amount. The max amount is supposed to be converted to a float. However, when I run the program, I get this error.
Traceback (most recent call last):
File "budget.py", line 47, in <module>
Main()
File "budget.py", line 43, in Main
budget = readBudget("budget.txt")
File "budget.py", line 22, in readBudget
entry = {'exptype':exptype, 'maxamnt':float(maxamount)}
ValueError: invalid literal for float(): 100.00"
Shouldn't the strip function in readBudget remove the last quotation mark?

When I tried this:
>>> attempt = '"$100.00"'
>>> new = attempt.strip('$" \n')
'100.00'
>>> float(new)
100.00
I got exactly what one would expect- so it must be something to do with what we cannot see from the file. From what you've posted, it's not clear whether there is something subtly wrong with the string you're trying to pass to float() (because it looks perfectly reasonable). Try adding a debug print statement:
print(repr(maxamount))
p_value = float(maxamount)
Then you can determine exactly what is being passed to float(). The call to repr() will make even normally invisible characters visible. Add the result to your question and we will be able to comment further.
EDIT:
In which case, replace:
maxamount = list[1].strip('$" \n')
With:
maxamount = list[1].strip('$" \n\r')
That should then work fine.

Adding in this:
maxamount = list[1].strip('$" \n\r')
Or more specifically, the \r, removed the error.

You can use a regex to capture either all or most of the floating point info in a string.
Consider:
import re
valid='''\
123.45"
123.
123"
.123
123e-16
-123e16
123e45
+123.45'''
invalid='''\
12"34
12f45
e123'''
pat=r'(?:^|\s)([-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)'
for e in [valid, invalid]:
print
for line in e.splitlines():
m=re.search(pat, line)
if m:
print '"{}" -> {} -> {}'.format(line, m.group(1), float(m.group(1)))
else:
print '"{}" not valid'.format(line)
Prints:
"123.45"" -> 123.45 -> 123.45
"123." -> 123 -> 123.0
"123"" -> 123 -> 123.0
".123" -> .123 -> 0.123
"123e-16" -> 123e-16 -> 1.23e-14
"-123e16" -> -123e16 -> -1.23e+18
"123e45" -> 123e45 -> 1.23e+47
"+123.45" -> +123.45 -> 123.45
"12"34" -> 12 -> 12.0
"12f45" -> 12 -> 12.0
"e123" not valid
Just modify the regex to capture what you consider a valid floating point data point -- or invalid.

Undoing "marked as read" status of emails fetched with imaplib

I wrote a python script to fetch all of my gmail. I have hundreds of thousands of old emails, of which about 10,000 were unread.
After successfully fetching all of my email, I find that gmail has marked all the fetched emails as "read". This is disastrous for me since I need to check all unread emails only.
How can I recover the information about which emails were unread? I dumped each mail object into files, the core of my code is shown below:
m = imaplib.IMAP4_SSL("imap.gmail.com")
m.login(user,pwd)
m.select("[Gmail]/All Mail")
resp, items = m.uid('search', None, 'ALL')
uids = items[0].split()
for uid in uids:
resp, data = m.uid('fetch', uid, "(RFC822)")
email_body = data[0][1]
mail = email.message_from_string(email_body)
dumbobj(uid, mail)
I am hoping there is either an option to undo this in gmail, or a member inside the stored mail objects reflecting the seen-state information.
For anyone looking to prevent this headache, consider this answer here. This does not work for me, however, since the damage has already been done.
Edit:
I have written the following function to recursively "grep" all strings in an object, and applied it to a dumped email object using the following keywords:
regex = "(?i)((marked)|(seen)|(unread)|(read)|(flag)|(delivered)|(status)|(sate))"
So far, no results (only an unrelated "Delivered-To"). Which other keywords could I try?
def grep_object (obj, regex , cycle = set(), matched = set()):
import re
if id(obj) in cycle:
return
cycle.update([id(obj)])
if isinstance(obj, basestring):
if re.search(regex, obj):
matched.update([obj])
def grep_dict (adict ):
try:
[ [ grep_object(a, regex, cycle, matched ) for a in ab ] for ab in adict.iteritems() ]
except:pass
grep_dict(obj)
try:grep_dict(obj.__dict__)
except:pass
try:
[ grep_object(elm, regex, cycle, matched ) for elm in obj ]
except: pass
return matched
grep_object(mail_object, regex)

I'm having a similar problem (not with gmail), and the biggest problem for me was to make a reproducible test case; and I finally managed to produce one (see below).
In terms of the Seen flag, I now gather it goes like this:
If a message is new/unseen, IMAP fetch for \Seen flag will return empty (i.e. it will not be present, as related to the email message).
If you do IMAP select on a mailbox (INBOX), you get a "flag" UNSEEN which contains a list of ids (or uids) of emails in that folder that are new (do not have the \Seen flag)
In my test case, if you fetch say headers for a message with BODY.PEEK, then \Seen on a message is not set; if you fetch them with BODY, then \Seen is set
In my test case, also fetching (RFC822) doesn't set \Seen (unlike your case with Gmail)
In the test case, I try to do pprint.pprint(inspect.getmembers(mail)) (in lieu of your dumpobj(uid, mail)) - but only after I'm certain \Seen has been set. The output I get is posted in mail_object_inspect.txt - and as far as I can see, there is no mention of 'new/read/seen' etc. in none of the readable fields; furthermore mail.as_string() prints:
'From: jesse#example.com\nTo: user#example.com\nSubject: This is a test message!\n\nHello. I am executive assistant to the director of\nBear Stearns, a failed investment Bank. I have\naccess to USD6,000,000. ...\n'
Even worse, there is no mention of "fields" anywhere in the imaplib code (below filenames are printed if they do not contain case-insensitive "field" anywhere):
$ grep -L -i field /usr/lib/python{2.7,3.2}/imaplib.py
/usr/lib/python2.7/imaplib.py
/usr/lib/python3.2/imaplib.py
... so I guess that information was not saved with your dumps.
Here is a bit on reconstructing the test case. The hardest was to find a small IMAP server, that can be quickly ran with some arbitrary users and emails, but without having to install a ton of stuff on your system. Finally I found one: trivial-server.pl, the example file of Perl's Net::IMAP::Server; tested on Ubuntu 11.04.
The test case is pasted in this gist, with two files (with many comments) that I'll try to post abridged:
trivial-serverB.pl - Perl (v5.10.1) Net::IMAP::Server server (has a terminal output paste at end of file with a telnet client session)
testimap.py - Python 2.7/3.2 imaplib
client (has a terminal output paste at end of file, of itself operating with the server)
trivial-serverB.pl
First, make sure you have Net::IMAP::Server - note, it has many dependencies, so the below command may take a while to install:
sudo perl -MCPAN -e 'install Net::IMAP::Server'
Then, in the directory where you got trivial-serverB.pl, create a subdirectory with SSL certificates:
mkdir certs
openssl req \
-x509 -nodes -days 365 \
-subj '/C=US/ST=Oregon/L=Portland/CN=localhost' \
-newkey rsa:1024 -keyout certs/server-key.pem -out certs/server-cert.pem
Finally run the server with administrative properties:
sudo perl trivial-serverB.pl
Note that the trivial-serverB.pl has a hack which will let a client to connect without SSL. Here is trivial-serverB.pl:
#!/usr/bin/perl
use v5.10.1;
use feature qw(say);
use Net::IMAP::Server;
package Demo::IMAP::Hack;
$INC{'Demo/IMAP/Hack.pm'} = 1;
sub capabilityb {
my $self = shift;
print STDERR "Capabilitin'\n";
my $base = $self->server->capability;
my #words = split " ", $base;
#words = grep {$_ ne "STARTTLS"} #words
if $self->is_encrypted;
unless ($self->auth) {
my $auth = $self->auth || $self->server->auth_class->new;
my #auth = $auth->sasl_provides;
# hack:
#unless ($self->is_encrypted) {
# # Lack of encrpytion makes us turn off all plaintext auth
# push #words, "LOGINDISABLED";
# #auth = grep {$_ ne "PLAIN"} #auth;
#}
push #words, map {"AUTH=$_"} #auth;
}
return join(" ", #words);
}
package Demo::IMAP::Auth;
$INC{'Demo/IMAP/Auth.pm'} = 1;
use base 'Net::IMAP::Server::DefaultAuth';
sub auth_plain {
my ( $self, $user, $pass ) = #_;
# XXX DO AUTH CHECK
$self->user($user);
return 1;
}
package Demo::IMAP::Model;
$INC{'Demo/IMAP/Model.pm'} = 1;
use base 'Net::IMAP::Server::DefaultModel';
sub init {
my $self = shift;
$self->root( Demo::IMAP::Mailbox->new() );
$self->root->add_child( name => "INBOX" );
}
###########################################
package Demo::IMAP::Mailbox;
use base qw/Net::IMAP::Server::Mailbox/;
use Data::Dumper;
my $data = <<'EOF';
From: jesse#example.com
To: user#example.com
Subject: This is a test message!
Hello. I am executive assistant to the director of
Bear Stearns, a failed investment Bank. I have
access to USD6,000,000. ...
EOF
my $msg = Net::IMAP::Server::Message->new($data);
sub load_data {
my $self = shift;
$self->add_message($msg);
}
my %ports = ( port => 143, ssl_port => 993 );
$ports{$_} *= 10 for grep {$> > 0} keys %ports;
$myserv = Net::IMAP::Server->new(
auth_class => "Demo::IMAP::Auth",
model_class => "Demo::IMAP::Model",
user => 'nobody',
log_level => 3, # at least 3 to output 'CONNECT TCP Peer: ...' message; 4 to output IMAP commands too
%ports,
);
# apparently, this overload MUST be after the new?! here:
{
no strict 'refs';
*Net::IMAP::Server::Connection::capability = \&Demo::IMAP::Hack::capabilityb;
}
# https://stackoverflow.com/questions/27206371/printing-addresses-of-perl-object-methods
say " -", $myserv->can('validate'), " -", $myserv->can('capability'), " -", \&Net::IMAP::Server::Connection::capability, " -", \&Demo::IMAP::Hack::capabilityb;
$myserv->run();
testimap.py
With the server above running in one terminal, in another terminal you can just do:
python testimap.py
The code will simply read fields and content from the one (and only) message the server above presents, and will eventually restore (remove) the \Seen field.
import sys
if sys.version_info[0] < 3: # python 2.7
def uttc(x):
return x
else: # python 3+
def uttc(x):
return x.decode("utf-8")
import imaplib
import email
import pprint,inspect
imap_user = 'nobody'
imap_password = 'whatever'
imap_server = 'localhost'
conn = imaplib.IMAP4(imap_server)
conn.debug = 3
try:
(retcode, capabilities) = conn.login(imap_user, imap_password)
except:
print(sys.exc_info()[1])
sys.exit(1)
# not conn.select(readonly=1), else we cannot modify the \Seen flag later
conn.select() # Select inbox or default namespace
(retcode, messages) = conn.search(None, '(UNSEEN)')
if retcode == 'OK':
for num in uttc(messages[0]).split(' '):
if not(num):
print("No messages available: num is `{0}`!".format(num))
break
print('Processing message: {0}'.format(num))
typ, data = conn.fetch(num,'(FLAGS)')
isSeen = ( "Seen" in uttc(data[0]) )
print('Got flags: {2}: {0} .. {1}'.format(typ,data, # NEW: OK .. ['1 (FLAGS ())']
"Seen" if isSeen else "NEW"))
print('Peeking headers, message: {0} '.format(num))
typ, data = conn.fetch(num,'(BODY.PEEK[HEADER])')
pprint.pprint(data)
typ, data = conn.fetch(num,'(FLAGS)')
isSeen = ( "Seen" in uttc(data[0]) )
print('Got flags: {2}: {0} .. {1}'.format(typ,data, # NEW: OK .. ['1 (FLAGS ())']
"Seen" if isSeen else "NEW"))
print('Get RFC822 body, message: {0} '.format(num))
typ, data = conn.fetch(num,'(RFC822)')
mail = email.message_from_string(uttc(data[0][1]))
#pprint.pprint(inspect.getmembers(mail))
typ, data = conn.fetch(num,'(FLAGS)')
isSeen = ( "Seen" in uttc(data[0]) )
print('Got flags: {2}: {0} .. {1}'.format(typ,data, # NEW: OK .. ['1 (FLAGS ())']
"Seen" if isSeen else "NEW"))
print('Get headers, message: {0} '.format(num))
typ, data = conn.fetch(num,'(BODY[HEADER])') # note, FLAGS (\\Seen) is now in data, even if not explicitly requested!
pprint.pprint(data)
print('Get RFC822 body, message: {0} '.format(num))
typ, data = conn.fetch(num,'(RFC822)')
mail = email.message_from_string(uttc(data[0][1]))
pprint.pprint(inspect.getmembers(mail)) # this is in mail_object_inspect.txt
pprint.pprint(mail.as_string())
typ, data = conn.fetch(num,'(FLAGS)')
isSeen = ( "Seen" in uttc(data[0]) )
print('Got flags: {2}: {0} .. {1}'.format(typ,data, # Seen: OK .. ['1 (FLAGS (\\Seen))']
"Seen" if isSeen else "NEW"))
conn.select() # select again, to see flags server side
# * OK [UNSEEN 0] # no more unseen messages (if there was only one msg in folder)
print('Restoring flag to unseen/new, message: {0} '.format(num))
ret, data = conn.store(num,'-FLAGS','\\Seen')
if ret == 'OK':
print("Set back to unseen; Got OK: {0}{1}{2}".format(data,'\n',30*'-'))
print(mail)
typ, data = conn.fetch(num,'(FLAGS)')
isSeen = ( "Seen" in uttc(data[0]) )
print('Got flags: {2}: {0} .. {1}'.format(typ,data, # NEW: OK .. [b'1 (FLAGS ())']
"Seen" if isSeen else "NEW"))
conn.close()
References
How do I mock an IMAP server in Python, despite extreme laziness?
Get only NEW Emails imaplib and python
Undoing "marked as read" status of emails fetched with imaplib
http://www.skytale.net/blog/archives/23-Manual-IMAP.html
IMAP FETCH Subject
https://mail.python.org/pipermail/python-list/2009-March/527020.html
http://www.thecodingforums.com/threads/re-imaplib-fetch-message-flags.673872/

Preserving special characters in text nodes using Python lxml module

I am editing an XML file that is provided by a third party. The XML is used to recreate and entire environment and one is able to edit the XML to propogate the changes. I was able to lookup the element I wanted to change through command line options and save the XML, but special characters are being escaped and I need to retain the special characters. For example it is changing > to $gt; in the file during the .write operation. This is affecting in all occurances of the XML document not just the node element (I think that is what it is called) Below is my code:
import sys
from lxml import etree
from optparse import OptionParser
def parseCommandLine ():
usage = "usage: %prog [options] arg"
parser = OptionParser(usage)
parser.add_option("-f","--file",dest="filename",
help="Context File name including full path", metavar="CONTEXT_FILE")
parser.add_option("-k","--key",dest="key",
help="Key you are looking for in Context File i.e s_isAdmin", metavar="s_someKey")
parser.add_option("-v","--value",dest="value",
help="The replacement value for the key")
if len(sys.argv[1:]) < 3:
print len(sys.argv[1:])
parser.print_help()
sys.exit(2)
(options, args) = parser.parse_args()
return options.filename, options.key, options.value
Filename, Key, Value=parseCommandLine()
parser_options=etree.XMLParser(attribute_defaults=True, dtd_validation=False, strip_cdata=False)
doc = etree.parse(Filename, parser_options ) #Open and parse the file
print doc.findall("//*[#oa_var=%r]" % Key)[0].text
oldval = doc.findall("//*[#oa_var=%r]" % Key)[0].text
val = doc.findall("//*[#oa_var=%r]" % Key)[0]
val.text = Value
print 'old value is %s' % oldval
print 'new value is %s' % val.text
root = doc.getroot()
doc.write(Filename,method='xml',with_tail=True,pretty_print=False)
Original file has this:
tf.fm.FulfillmentServer >> /s_u01/app/applmgr/f
Saved version is being replaced with this:
tf.fm.FulfillmentServer >> /s_u01/app/applmgr/f
I have been trying to mess with pretty_print in the output side DTD validations on the parsing side and I am stumped.
Below is a diff from the changed file and and the original file:
I updated the s_cookie_domain only.
diff finprod_acfpdb10.xml_original finprod_acfpdb10.xml
Warning: missing newline at end of file finprod_acfpdb10.xml
1,3c1
< <?xml version = '1.0'?>
< <!-- $Header: adxmlctx.tmp 115.426 2009/05/08 08:46:29 rdamodar ship $ -->
< <!--
---
> <!-- $Header: adxmlctx.tmp 115.426 2009/05/08 08:46:29 rdamodar ship $ --><!--
13,14c11
< -->
< <oa_context version="$Revision: 115.426 $">
---
> --><oa_context version="$Revision: 115.426 $">
242c239
< <cookiedomain oa_var="s_cookie_domain">.apollogrp.edu</cookiedomain>
---
> <cookiedomain oa_var="s_cookie_domain">.qadoamin.edu</cookiedomain>
526c523
< <FORMS60_BLOCK_URL_CHARACTERS oa_var="s_f60blockurlchar">%0a,%0d,!,%21,",%22,%28,%29,;,[,%5b,],%5d,{,%7b,|,%7c,},%7d,%7f,>,%3c,<,%3e</FORMS60_BLOCK_URL_CHARACTERS>
---
> <FORMS60_BLOCK_URL_CHARACTERS oa_var="s_f60blockurlchar">%0a,%0d,!,%21,",%22,%28,%29,;,[,%5b,],%5d,{,%7b,|,%7c,},%7d,%7f,>,%3c,<,%3e</FORMS60_BLOCK_URL_CHARACTERS>
940c937
< <start_cmd oa_var="s_jtffstart">/s_u01/app/applmgr/jdk1.5.0_11/bin/java -Xmx512M -classpath .:/s_u01/app/applmgr/finprod/comn/java/jdbc111.zip:/s_u01/app/applmgr/finprod/comn/java/xmlparserv2.zip:/s_u01/app/applmgr/finprod/comn/java:/s_u01/app/applmgr/finprod/comn/java/apps.zip:/s_u01/app/applmgr/jdk1.5.0_11/classes:/s_u01/app/applmgr/jdk1.5.0_11/lib:/s_u01/app/applmgr/jdk1.5.0_11/lib/classes.zip:/s_u01/app/applmgr/jdk1.5.0_11/lib/classes.jar:/s_u01/app/applmgr/jdk1.5.0_11/lib/rt.jar:/s_u01/app/applmgr/jdk1.5.0_11/lib/i18n.jar:/s_u01/app/applmgr/finprod/comn/java/3rdparty/RFJavaInt.zip: -Dengine.LogPath=/s_u01/app/applmgr/finprod/comn/admin/log/finprod_acfpdb10 -Dengine.TempDir=/s_u01/app/applmgr/finprod/comn/temp -Dengine.CommandPromptEnabled=false -Dengine.CommandPort=11000 -Dengine.AOLJ.config=/s_u01/app/applmgr/finprod/appl/fnd/11.5.0/secure/acfpdb10_finprod.dbc -Dengine.ServerID=5000 -Ddebug=off -Dengine.LogLevel=1 -Dlog.ShowWarnings=false -Dengine.FaxEnabler=oracle.apps.jtf.fm.engine.rightfax.RfFaxEnablerImpl -Dengine.PrintEnabler=oracle.apps.jtf.fm.engine.rightfax.RfPrintEnablerImpl -Dfax.TempDir=/s_u01/app/applmgr/finprod/comn/admin/log/finprod_acfpdb10 -Dprint.TempDir=/s_u01/app/applmgr/finprod/comn/admin/log/finprod_acfpdb10 oracle.apps.jtf.fm.FulfillmentServer >> /s_u01/app/applmgr/finprod/comn/admin/log/finprod_acfpdb10/jtffmctl.txt</start_cmd>
---
> <start_cmd oa_var="s_jtffstart">/s_u01/app/applmgr/jdk1.5.0_11/bin/java -Xmx512M -classpath .:/s_u01/app/applmgr/finprod/comn/java/jdbc111.zip:/s_u01/app/applmgr/finprod/comn/java/xmlparserv2.zip:/s_u01/app/applmgr/finprod/comn/java:/s_u01/app/applmgr/finprod/comn/java/apps.zip:/s_u01/app/applmgr/jdk1.5.0_11/classes:/s_u01/app/applmgr/jdk1.5.0_11/lib:/s_u01/app/applmgr/jdk1.5.0_11/lib/classes.zip:/s_u01/app/applmgr/jdk1.5.0_11/lib/classes.jar:/s_u01/app/applmgr/jdk1.5.0_11/lib/rt.jar:/s_u01/app/applmgr/jdk1.5.0_11/lib/i18n.jar:/s_u01/app/applmgr/finprod/comn/java/3rdparty/RFJavaInt.zip: -Dengine.LogPath=/s_u01/app/applmgr/finprod/comn/admin/log/finprod_acfpdb10 -Dengine.TempDir=/s_u01/app/applmgr/finprod/comn/temp -Dengine.CommandPromptEnabled=false -Dengine.CommandPort=11000 -Dengine.AOLJ.config=/s_u01/app/applmgr/finprod/appl/fnd/11.5.0/secure/acfpdb10_finprod.dbc -Dengine.ServerID=5000 -Ddebug=off -Dengine.LogLevel=1 -Dlog.ShowWarnings=false -Dengine.FaxEnabler=oracle.apps.jtf.fm.engine.rightfax.RfFaxEnablerImpl -Dengine.PrintEnabler=oracle.apps.jtf.fm.engine.rightfax.RfPrintEnablerImpl -Dfax.TempDir=/s_u01/app/applmgr/finprod/comn/admin/log/finprod_acfpdb10 -Dprint.TempDir=/s_u01/app/applmgr/finprod/comn/admin/log/finprod_acfpdb10 oracle.apps.jtf.fm.FulfillmentServer >> /s_u01/app/applmgr/finprod/comn/admin/log/finprod_acfpdb10/jtffmctl.txt</start_cmd>
983c980
< </oa_context>
---
> </oa_context>

Terminology: Parsers don't write XML; they read XML. Serialisers write XML.
In normal element content, < and & are illegal and must be escaped. > is legal except where it follows ]] and is NOT the end of a CDATA section. Most serialisers take the easy way out and write > because a parser will handle both that and >.
I suggest that you submit both your output and input files to an XML validation service like this or this and also test whether the consumer will actually parse your output file.

The only thing I can think of is forcing the parser to treat the nodes you modify as cdata blocks (as the parser is clearly changing the xml tag closing brackets). Try val.text = etree.CDATA(Value) instead of val.text = Value.
http://lxml.de/api.html#cdata

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

lxml.objectify error when writing data - python

obj_xml is bytes type, so can't use it with write() without decoding it first. Need to change f.write(obj_xml) to: f.write(obj_xml.decode('utf-8')) And it works great!

Related

How can I parse a VCARD in a XML file

python question regarding re.search and indexing

.strip won't remove last quotation mark in Python3 Program

Undoing "marked as read" status of emails fetched with imaplib

Preserving special characters in text nodes using Python lxml module

Categories

Resources