I am currently learning about XML and DTDs, and I came across a DTD that was a bit puzzling:
<!ELEMENT foo (superpowers*)>
<!ELEMENT superpowers ( foo | agility )>
Firstly, is this DTD legal?
Whenever I try to generate a corresponding XML file, I get the following error:
Traceback (most recent call last):
  File "test.py", line 12, in <module>
    foo.append(superpowers)
  File "src/lxml/etree.pyx", line 832, in lxml.etree._Element.append
  File "src/lxml/apihelpers.pxi", line 1283, in lxml.etree._appendChild
ValueError: cannot append parent to itself
The code I am using to generate the XML in Python is essentially this:
from lxml import etree as xml
import pprint
foo = xml.Element("foo")
superpowers = xml.Element("superpowers")
x = True
if x:
    foo = xml.Element("foo")
    superpowers.append(foo)
    foo.append(superpowers)
else:
    agility = xml.Element("agility")
    superpowers.append(agility)
    foo.append(superpowers)
tree = xml.ElementTree(foo)
print(xml.tostring(tree, pretty_print=True))
with open("foo.xml", "wb") as op:
    tree.write(op, pretty_print=True)  # pretty_print used for indentation
Have I overlooked anything?
<!ELEMENT foo (superpowers*)>
<!ELEMENT superpowers ( foo | agility )>
Firstly, is this DTD legal?
Yes, that partial DTD is legal. (I say partial because there isn't an element declaration for agility.)
It's saying that foo can contain zero or more superpowers, and superpowers must contain exactly one foo or agility.
For example, this would be valid according to that DTD...
<foo>
  <superpowers>
    <foo/>
  </superpowers>
  <superpowers>
    <foo>
      <superpowers>
        <foo/>
      </superpowers>
    </foo>
  </superpowers>
</foo>
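To make the content model concrete, here is a small hand-written check of the two declarations. It is not a DTD validator (lxml offers real validation through its etree.DTD class); it is a dependency-free sketch using the standard library's xml.etree.ElementTree. It ignores text content and, because agility has no declaration in the question, it assumes agility must be empty.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for the two declarations (not a real DTD validator):
#   <!ELEMENT foo (superpowers*)>            -> zero or more <superpowers>
#   <!ELEMENT superpowers ( foo | agility )> -> exactly one <foo> or <agility>
def conforms(elem):
    children = list(elem)  # child elements only; whitespace text is ignored
    if elem.tag == "foo":
        ok = all(child.tag == "superpowers" for child in children)
    elif elem.tag == "superpowers":
        ok = len(children) == 1 and children[0].tag in ("foo", "agility")
    else:
        # Assumption: agility is undeclared in the question, so treat it as EMPTY.
        ok = elem.tag == "agility" and not children
    # The rule has to hold at every level of the (recursive) structure.
    return ok and all(conforms(child) for child in children)

EXAMPLE = """<foo>
  <superpowers><foo/></superpowers>
  <superpowers>
    <foo><superpowers><foo/></superpowers></foo>
  </superpowers>
</foo>"""

print(conforms(ET.fromstring(EXAMPLE)))                                        # True
print(conforms(ET.fromstring("<foo><superpowers/></foo>")))                    # False: empty superpowers
print(conforms(ET.fromstring("<superpowers><foo/><agility/></superpowers>")))  # False: two children
```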
The error you're getting makes sense; you can't foo.append(superpowers) because superpowers is already the parent of foo. (Like Harry Dunne says, "You can't triple stamp a double stamp Lloyd!".)
What you would need to do is create a brand new foo and append superpowers to that.
Example...
if x:
    foo = xml.Element("foo")
    superpowers.append(foo)
    foo2 = xml.Element("foo")
    foo2.append(superpowers)
And what you'd end up with is this (comments added for clarity):
<foo><!--foo2-->
  <superpowers>
    <foo><!--original foo--></foo>
  </superpowers>
</foo>
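A way to avoid the "parent to itself" error entirely is to create every child with SubElement, which always makes a brand-new element under the parent you pass in, so an existing ancestor can never be re-attached by accident. This sketch uses the standard library's xml.etree.ElementTree so it runs without lxml (lxml.etree.SubElement works the same way) and rebuilds the nested DTD example from earlier in this answer.

```python
import xml.etree.ElementTree as ET

root = ET.Element("foo")

# First <superpowers>: wraps a fresh, empty <foo/>.
sp = ET.SubElement(root, "superpowers")
ET.SubElement(sp, "foo")

# Second <superpowers>: wraps a fresh <foo> that nests one more level.
sp = ET.SubElement(root, "superpowers")
inner = ET.SubElement(sp, "foo")
ET.SubElement(ET.SubElement(inner, "superpowers"), "foo")

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Because every element is created by SubElement, each <foo> is a distinct object, which is exactly what the foo2 fix above achieves by hand.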
I'm trying to parse an XML file that contains some vCards. I need the FN and NOTE (SIREN and A) fields, printed as a list in the form FN, SIREN_A. I would also like to add them to a list, but only if the string in the description equals "diviseur".
I've tried different things (vobject, finditer), but none of them work. For my parser I'm using xml.etree.ElementTree and pandas, which usually cause some incompatibilities.
Python code:
import xml.etree.ElementTree as ET
import vobject
newlist=[]
data=[]
data.append(newlist)
diviseur=[]
tree=ET.parse('test_oc.xml')
root=tree.getroot()
newlist=[]
for lifeCycle in root.findall('{http://ltsc.ieee.org/xsd/LOM}lifeCycle'):
    for contribute in lifeCycle.findall('{http://ltsc.ieee.org/xsd/LOM}contribute'):
        for entity in contribute.findall('{http://ltsc.ieee.org/xsd/LOM}entity'):
            vcard = vobject.readOne(entity)
            siren = vcard.contents['note'].value,":",vcard.contents['fn'].value
            print ('siren',siren.text)
        for date in contribute.findall('{http://ltsc.ieee.org/xsd/LOM}date'):
            for description in date.findall('{http://ltsc.ieee.org/xsd/LOM}description'):
                entite=description.find('{http://ltsc.ieee.org/xsd/LOM}string')
                print ('Type entité:', entite.text)
                newlist.append(entite)
                j=0
                for j in range(len(entite)-1):
                    if entite[j]=="diviseur":
                        diviseur.append(siren[j])
                        print('diviseur:', diviseur)
                newlist.append(diviseur)
                data.append(newlist)
print(data)
XML file to parse:
<?xml version="1.0" encoding="UTF-8"?>
<lom:lom xmlns:lom="http://ltsc.ieee.org/xsd/LOM" xmlns:lomfr="http://www.lom-fr.fr/xsd/LOMFR" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://ltsc.ieee.org/xsd/LOM">
  <lom:version uniqueElementName="version">
    <lom:string language="http://id.loc.gov/vocabulary/iso639-2/fre">V4.1</lom:string>
  </lom:version>
  <lom:lifeCycle uniqueElementName="lifeCycle">
    <lom:contribute>
      <lom:entity><![CDATA[
BEGIN:VCARD
VERSION:4.0
FN:Cailler
N:;Valérie;;Mr;
ORG:Veoli
NOTE:SIREN=203025106
NOTE :ISNI=0000000000000000
END:VCARD
]]></lom:entity>
      <lom:date uniqueElementName="date">
        <lom:dateTime uniqueElementName="dateTime">2019-07-10</lom:dateTime>
        <lom:description uniqueElementName="description">
          <lom:string>departure</lom:string>
        </lom:description>
      </lom:date>
    </lom:contribute>
    <lom:contribute>
      <lom:entity><![CDATA[
BEGIN:VCARD
VERSION:4.0
FN:Besnard
N:;Ugo;;Mr;
ORG:MG
NOTE:SIREN=501 025 205
NOTE :A=0000 0000
END:VCARD
]]></lom:entity>
      <lom:date uniqueElementName="date">
        <lom:dateTime uniqueElementName="dateTime">2019-07-10</lom:dateTime>
        <lom:description uniqueElementName="description">
          <lom:string>diviseur</lom:string>
        </lom:description>
      </lom:date>
    </lom:contribute>
  </lom:lifeCycle>
</lom:lom>
Traceback (most recent call last):
  File "parser_export_csv_V2.py", line 73, in <module>
    vcard = vobject.readOne(entity)
  File "C:\Users\b\AppData\Local\Programs\Python\Python36-32\lib\site-packages\vobject\base.py", line 1156, in readOne
    allowQP))
  File "C:\Users\b\AppData\Local\Programs\Python\Python36-32\lib\site-packages\vobject\base.py", line 1089, in readComponents
    for line, n in getLogicalLines(stream, allowQP):
  File "C:\Users\b\AppData\Local\Programs\Python\Python36-32\lib\site-packages\vobject\base.py", line 869, in getLogicalLines
    val = fp.read(-1)
AttributeError: 'xml.etree.ElementTree.Element' object has no attribute 'read'
There are a few problems here.
1. entity is an Element instance, but vobject.readOne() expects text (vCard is a plain-text data format), so it needs entity.text.
2. There is unwanted whitespace adjacent to the vCard properties in the XML file.
3. NOTE :ISNI=0000000000000000 is invalid; it should be NOTE:ISNI=0000000000000000 (space removed).
4. vcard.contents['note'] is a list and does not have a value property.
Here is code that probably doesn't produce exactly what you want, but I hope it helps:
import xml.etree.ElementTree as ET
import vobject
NS = {"lom": "http://ltsc.ieee.org/xsd/LOM"}
tree = ET.parse('test_oc.xml')
for contribute in tree.findall('.//lom:contribute', NS):
    desc_string = contribute.find('.//lom:string', NS)
    print(desc_string.text)
    entity = contribute.find('lom:entity', NS)
    txt = entity.text.replace(" ", "")  # Text with spaces removed
    vcard = vobject.readOne(txt)
    for p in vcard.contents["note"]:
        print(p.name, p.value)
    for p in vcard.contents["fn"]:
        print(p.name, p.value)
    print()
Output:
departure
NOTE SIREN=203025106
NOTE ISNI=0000000000000000
FN Cailler

diviseur
NOTE SIREN=501025205
NOTE A=00000000
FN Besnard
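For the original goal (FN plus the SIREN and A notes, only for contributions whose description is "diviseur"), here is a standard-library-only sketch. It swaps vobject for a deliberately tiny hand-rolled line parser, which assumes the vCards are as simple as the ones in the question (no folded lines, parameters or escapes). The (FN, SIREN, A) tuple layout is also an assumption about what "FN, SIREN_A" should look like, and read_vcard and rows_for are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

NS = {"lom": "http://ltsc.ieee.org/xsd/LOM"}

def read_vcard(text):
    """Return (FN, {note_key: note_value}) from raw vCard text.

    Hand-rolled for illustration only: it ignores line folding, parameters
    and escaping, which a real vCard library such as vobject handles.
    """
    fn, notes = None, {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, value = line.split(":", 1)
        name = name.strip().upper()  # tolerates indentation and "NOTE :A=..."
        if name == "FN":
            fn = value.strip()
        elif name == "NOTE" and "=" in value:
            key, val = value.split("=", 1)
            notes[key.strip()] = val.replace(" ", "").strip()  # "501 025 205" -> "501025205"
    return fn, notes

def rows_for(root, wanted="diviseur"):
    """Collect (FN, SIREN, A) for each lom:contribute whose description string equals wanted."""
    rows = []
    for contribute in root.iterfind(".//lom:contribute", NS):
        desc = contribute.find("lom:date/lom:description/lom:string", NS)
        entity = contribute.find("lom:entity", NS)
        if desc is None or entity is None or (desc.text or "").strip() != wanted:
            continue
        fn, notes = read_vcard(entity.text or "")  # CDATA content arrives as plain .text
        rows.append((fn, notes.get("SIREN"), notes.get("A")))
    return rows

# Trimmed copy of test_oc.xml; for the real file use ET.parse("test_oc.xml").getroot().
SAMPLE = """<lom:lom xmlns:lom="http://ltsc.ieee.org/xsd/LOM"><lom:lifeCycle>
<lom:contribute>
<lom:entity><![CDATA[
  BEGIN:VCARD
  FN:Cailler
  NOTE:SIREN=203025106
  END:VCARD
]]></lom:entity>
<lom:date><lom:description><lom:string>departure</lom:string></lom:description></lom:date>
</lom:contribute>
<lom:contribute>
<lom:entity><![CDATA[
  BEGIN:VCARD
  FN:Besnard
  NOTE:SIREN=501 025 205
  NOTE :A=0000 0000
  END:VCARD
]]></lom:entity>
<lom:date><lom:description><lom:string>diviseur</lom:string></lom:description></lom:date>
</lom:contribute>
</lom:lifeCycle></lom:lom>"""

print(rows_for(ET.fromstring(SAMPLE)))  # [('Besnard', '501025205', '00000000')]
```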
Here is my code:
from lxml import etree, objectify
def parseXML(xmlFile):
    with open(xmlFile) as f:
        xml = f.read()
    root = objectify.fromstring(xml)
    # returns attributes in element node as dict
    attrib = root.attrib
    # how to extract element data
    begin = root.appointment.begin
    uid = root.appointment.uid
    # loop over elements and print their tags and text
    for appt in root.getchildren():
        for e in appt.getchildren():
            print('%s => %s' % (e.tag, e.text))
        print()
    # how to change element's text
    root.appointment.begin = 'something else'
    print(root.appointment.begin)
    # how to add a new element
    root.appointment.new_element = 'new data'
    # remove the py:pytype stuff
    objectify.deannotate(root)
    etree.cleanup_namespaces(root)
    obj_xml = etree.tostring(root, pretty_print=True)
    print(obj_xml)
    # save your xml
    with open('new.xml', 'w') as f:
        f.write(obj_xml)
parseXML('example.xml')
Here is the XML file being parsed:
<?xml version="1.0" ?>
<zAppointments reminder="15">
  <appointment>
    <begin>1181251600</begin>
    <uid>0400000008200E000</uid>
    <alarmTime>1181572063</alarmTime>
    <state></state>
    <location></location>
    <duration>1800</duration>
    <subject>Bring pizza home</subject>
  </appointment>
  <appointment>
    <begin>1234567890</begin>
    <duration>1800</duration>
    <subject>Check MS office webstie for updates</subject>
    <state>dismissed</state>
    <location></location>
    <uid>502fq14-12551ss-255sf2</uid>
  </appointment>
</zAppointments>
And here is the output, with the error:
/usr/bin/python3.5 "/home/michal/Desktop/nauka programowania/python 101/parsing_with_lxml.py"
begin => 1181251600
uid => 0400000008200E000
alarmTime => 1181572063
state => None
location => None
duration => 1800
subject => Bring pizza home
begin => 1234567890
duration => 1800
subject => Check MS office webstie for updates
state => dismissed
location => None
uid => 502fq14-12551ss-255sf2
something else
b'<zAppointments reminder="15">\n <appointment>\n <begin>something else</begin>\n <uid>0400000008200E000</uid>\n <alarmTime>1181572063</alarmTime>\n <state/>\n <location/>\n <duration>1800</duration>\n <subject>Bring pizza home</subject>\n <new_element>new data</new_element>\n </appointment>\n <appointment>\n <begin>1234567890</begin>\n <duration>1800</duration>\n <subject>Check MS office webstie for updates</subject>\n <state>dismissed</state>\n <location/>\n <uid>502fq14-12551ss-255sf2</uid>\n </appointment>\n</zAppointments>\n'
Traceback (most recent call last):
  File "/home/michal/Desktop/nauka programowania/python 101/parsing_with_lxml.py", line 87, in <module>
    parseXML('example.xml')
  File "/home/michal/Desktop/nauka programowania/python 101/parsing_with_lxml.py", line 85, in parseXML
    f.write(obj_xml)
TypeError: write() argument must be str, not bytes

Process finished with exit code 1
What can I do to turn that f object into a string? Is it even possible? I've hit this error a few times before and still don't know how to fix it (I'm doing the Python 101 exercises).
obj_xml is a bytes object, so you can't pass it to write() on a file opened in text mode ('w') without decoding it first. You need to change
f.write(obj_xml)
to:
f.write(obj_xml.decode('utf-8'))
And it works great!
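Decoding works, but there are two other common ways to handle it: write the bytes to a file opened in binary mode, or ask the serialiser for a str in the first place. This sketch uses the standard library's xml.etree.ElementTree, whose tostring() also returns bytes by default, so it runs without lxml; lxml's etree.tostring() likewise accepts encoding="unicode" to return a str. The temporary directory is only there so the example does not overwrite anything; in the original script the target would be 'new.xml'.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

root = ET.fromstring('<zAppointments reminder="15"><appointment>'
                     '<begin>something else</begin></appointment></zAppointments>')

as_bytes = ET.tostring(root)                     # default: bytes
as_text = ET.tostring(root, encoding="unicode")  # str, without an XML declaration

out_dir = tempfile.mkdtemp()

# Option 1: keep the bytes and open the file in binary mode ("wb").
bytes_path = os.path.join(out_dir, "new_bytes.xml")
with open(bytes_path, "wb") as f:
    f.write(as_bytes)

# Option 2: serialise straight to str and open the file in text mode with an explicit encoding.
text_path = os.path.join(out_dir, "new_text.xml")
with open(text_path, "w", encoding="utf-8") as f:
    f.write(as_text)

print(type(as_bytes).__name__, type(as_text).__name__)  # bytes str
```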
I have been trying to use minidom but have no real preference. For some reason lxml will not install on my machine.
I would like to parse an XML file like this:
<?xml version="1.
<transfer frmt="1" vtl="0" serial_number="E5XX-0822" date="2016-10-03 16:34:53.000" style="startstop">
  <plateInfo>
    <plate barcode="E0122326" name="384plate" type="source"/>
    <plate barcode="A1234516" name="1536plateD" type="destination"/>
  </plateInfo>
  <printmap total="1387">
    <w reason="" cf="13" aa="1.779" eo="299.798" tof="32.357" sv="1565.311" ct="1.627" ft="1.649" fc="88.226" memt="0.877" fldu="Percent" fld="DMSO" dy="0" dx="0" region="-1" tz="18989.481" gy="72468.649" gx="55070.768" avt="50" vt="50" vl="3.68" cvl="3.63" t="16:30:47.703" dc="0" dr="0" dn="A1" c="0" r="0" n="A1"/>
    <w reason="" cf="13" aa="1.779" eo="299.798" tof="32.357" sv="1565.311" ct="1.627" ft="1.649" fc="88.226" memt="0.877" fldu="Percent" fld="DMSO" dy="0" dx="0" region="-1" tz="18989.481" gy="72468.649" gx="55070.768" avt="50" vt="50" vl="3.68" cvl="3.63" t="16:30:47.703" dc="0" dr="0" dn="A1" c="1" r="0" n="A2"/>
  </printmap>
</transfer>
As you can see, the elements have no text content; all the information is contained in the attributes. In trying to adapt another SO post, I have this, but it seems to be geared more toward elements. I am also failing to find a good way to "browse" the XML information, i.e. I would like to call dir(xml_file) and get a list of all the methods I can call on my tree structure, or see all the attributes. I know this is a lot, in potentially different directions, but thank you in advance!
from xml.dom import minidom
def parse(files):
    for xml_file in files:
        xmldoc = minidom.parse(xml_file)
        transfer = xmldoc.getElementsByTagName('transfer')[0]
        plateInfo = transfer.getElementsByTagName('plateInfo')[0]
With minidom you can access an element's attributes through its attributes property, which can be treated like a dictionary. This example iterates over the attributes of the element transfer[0] and prints them:
from xml.dom.minidom import parse, parseString
xml_file='''<?xml version="1.0" encoding="UTF-8"?>
<transfer frmt="1" vtl="0" serial_number="E5XX-0822" date="2016-10-03 16:34:53.000" style="startstop">
<plateInfo>
<plate barcode="E0122326" name="384plate" type="source"/>
<plate barcode="A1234516" name="1536plateD" type="destination"/>
</plateInfo>
<printmap total="1387">
<w reason="" cf="13" aa="1.779" eo="299.798" tof="32.357" sv="1565.311" ct="1.627" ft="1.649" fc="88.226" memt="0.877" fldu="Percent" fld="DMSO" dy="0" dx="0" region="-1" tz="18989.481" gy="72468.649" gx="55070.768" avt="50" vt="50" vl="3.68" cvl="3.63" t="16:30:47.703" dc="0" dr="0" dn="A1" c="0" r="0" n="A1"/>
<w reason="" cf="13" aa="1.779" eo="299.798" tof="32.357" sv="1565.311" ct="1.627" ft="1.649" fc="88.226" memt="0.877" fldu="Percent" fld="DMSO" dy="0" dx="0" region="-1" tz="18989.481" gy="72468.649" gx="55070.768" avt="50" vt="50" vl="3.68" cvl="3.63" t="16:30:47.703" dc="0" dr="0" dn="A1" c="1" r="0" n="A2"/>
</printmap>
</transfer>'''
xmldoc = parseString(xml_file)
transfer = xmldoc.getElementsByTagName('transfer')
attlist= transfer[0].attributes.keys()
for a in attlist:
    print(transfer[0].attributes[a].name, transfer[0].attributes[a].value)
You can find more information here:
http://www.diveintopython.net/xml_processing/attributes.html
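If something other than minidom is acceptable, the standard library's xml.etree.ElementTree (no lxml needed) exposes each element's attributes as a plain dict via .attrib, which also helps with the "browsing" part: printing .attrib shows an element's attributes, and dir() on an element lists its methods. This sketch runs on a trimmed copy of the question's file (most <w> attributes dropped); the plates and wells structures are just one illustrative way to collect the data.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's file: most <w> attributes are left out.
XML = """<transfer frmt="1" vtl="0" serial_number="E5XX-0822" date="2016-10-03 16:34:53.000" style="startstop">
  <plateInfo>
    <plate barcode="E0122326" name="384plate" type="source"/>
    <plate barcode="A1234516" name="1536plateD" type="destination"/>
  </plateInfo>
  <printmap total="1387">
    <w fld="DMSO" vl="3.68" cvl="3.63" c="0" r="0" n="A1"/>
    <w fld="DMSO" vl="3.68" cvl="3.63" c="1" r="0" n="A2"/>
  </printmap>
</transfer>"""

transfer = ET.fromstring(XML)  # for a file on disk: ET.parse(path).getroot()

# .attrib is an ordinary dict of the element's attributes.
print(transfer.attrib)

# Public methods available on an element, for "browsing".
print([name for name in dir(transfer) if not name.startswith("_")])

# Plate barcodes keyed by their "type" attribute.
plates = {plate.get("type"): plate.get("barcode") for plate in transfer.iter("plate")}

# One (well name, volume) pair per <w>; attribute values are strings, so convert as needed.
wells = [(w.get("n"), float(w.get("vl"))) for w in transfer.iter("w")]

print(plates)
print(wells)
```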
Here's my project: I'm graphing weather data from WeatherBug using RRDTool. I need a simple, efficient way to download the weather data from WeatherBug. I was using a terribly inefficient bash-script scraper but moved on to BeautifulSoup. The performance is just too slow (it's running on a Raspberry Pi), so I need to use lxml.
What I have so far:
from lxml import etree
doc=etree.parse('weather.xml')
print(doc.xpath("//aws:weather/aws:ob/aws:temp"))
But I get an error message. weather.xml looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<aws:weather xmlns:aws="http://www.aws.com/aws">
  <aws:api version="2.0"/>
  <aws:WebURL>http://weather.weatherbug.com/PA/Tunkhannock-weather.html?ZCode=Z5546&amp;Units=0&amp;stat=TNKCN</aws:WebURL>
  <aws:InputLocationURL>http://weather.weatherbug.com/PA/Tunkhannock-weather.html?ZCode=Z5546&amp;Units=0</aws:InputLocationURL>
  <aws:ob>
    <aws:ob-date>
      <aws:year number="2013"/>
      <aws:month number="1" text="January" abbrv="Jan"/>
      <aws:day number="11" text="Friday" abbrv="Fri"/>
      <aws:hour number="10" hour-24="22"/>
      <aws:minute number="26"/>
      <aws:second number="00"/>
      <aws:am-pm abbrv="PM"/>
      <aws:time-zone offset="-5" text="Eastern Standard Time (USA)" abbrv="EST"/>
    </aws:ob-date>
    <aws:requested-station-id/>
    <aws:station-id>TNKCN</aws:station-id>
    <aws:station>Tunkhannock HS</aws:station>
    <aws:city-state zipcode="18657">Tunkhannock, PA</aws:city-state>
    <aws:country>USA</aws:country>
    <aws:latitude>41.5663871765137</aws:latitude>
    <aws:longitude>-75.9794464111328</aws:longitude>
    <aws:site-url>http://www.tasd.net/highschool/index.cfm</aws:site-url>
    <aws:aux-temp units="°F">-100</aws:aux-temp>
    <aws:aux-temp-rate units="°F">0</aws:aux-temp-rate>
    <aws:current-condition icon="http://deskwx.weatherbug.com/images/Forecast/icons/cond013.gif">Cloudy</aws:current-condition>
    <aws:dew-point units="°F">40</aws:dew-point>
    <aws:elevation units="ft">886</aws:elevation>
    <aws:feels-like units="°F">41</aws:feels-like>
    <aws:gust-time>
      <aws:year number="2013"/>
      <aws:month number="1" text="January" abbrv="Jan"/>
      <aws:day number="11" text="Friday" abbrv="Fri"/>
      <aws:hour number="12" hour-24="12"/>
      <aws:minute number="18"/>
      <aws:second number="00"/>
      <aws:am-pm abbrv="PM"/>
      <aws:time-zone offset="-5" text="Eastern Standard Time (USA)" abbrv="EST"/>
    </aws:gust-time>
    <aws:gust-direction>NNW</aws:gust-direction>
    <aws:gust-direction-degrees>323</aws:gust-direction-degrees>
    <aws:gust-speed units="mph">17</aws:gust-speed>
    <aws:humidity units="%">98</aws:humidity>
    <aws:humidity-high units="%">100</aws:humidity-high>
    <aws:humidity-low units="%">61</aws:humidity-low>
    <aws:humidity-rate>3</aws:humidity-rate>
    <aws:indoor-temp units="°F">77</aws:indoor-temp>
    <aws:indoor-temp-rate units="°F">-1.1</aws:indoor-temp-rate>
    <aws:light>0</aws:light>
    <aws:light-rate>0</aws:light-rate>
    <aws:moon-phase moon-phase-img="http://api.wxbug.net/images/moonphase/mphase01.gif">0</aws:moon-phase>
    <aws:pressure units="&quot;">30.09</aws:pressure>
    <aws:pressure-high units="&quot;">30.5</aws:pressure-high>
    <aws:pressure-low units="&quot;">30.08</aws:pressure-low>
    <aws:pressure-rate units="&quot;/h">-0.01</aws:pressure-rate>
    <aws:rain-month units="&quot;">0.11</aws:rain-month>
    <aws:rain-rate units="&quot;/h">0</aws:rain-rate>
    <aws:rain-rate-max units="&quot;/h">0.12</aws:rain-rate-max>
    <aws:rain-today units="&quot;">0.09</aws:rain-today>
    <aws:rain-year units="&quot;">0.11</aws:rain-year>
    <aws:temp units="°F">41</aws:temp>
    <aws:temp-high units="°F">42</aws:temp-high>
    <aws:temp-low units="°F">29</aws:temp-low>
    <aws:temp-rate units="°F/h">-0.9</aws:temp-rate>
    <aws:sunrise>
      <aws:year number="2013"/>
      <aws:month number="1" text="January" abbrv="Jan"/>
      <aws:day number="11" text="Friday" abbrv="Fri"/>
      <aws:hour number="7" hour-24="07"/>
      <aws:minute number="29"/>
      <aws:second number="53"/>
      <aws:am-pm abbrv="AM"/>
      <aws:time-zone offset="-5" text="Eastern Standard Time (USA)" abbrv="EST"/>
    </aws:sunrise>
    <aws:sunset>
      <aws:year number="2013"/>
      <aws:month number="1" text="January" abbrv="Jan"/>
      <aws:day number="11" text="Friday" abbrv="Fri"/>
      <aws:hour number="4" hour-24="16"/>
      <aws:minute number="54"/>
      <aws:second number="19"/>
      <aws:am-pm abbrv="PM"/>
      <aws:time-zone offset="-5" text="Eastern Standard Time (USA)" abbrv="EST"/>
    </aws:sunset>
    <aws:wet-bulb units="°F">40.802</aws:wet-bulb>
    <aws:wind-speed units="mph">3</aws:wind-speed>
    <aws:wind-speed-avg units="mph">1</aws:wind-speed-avg>
    <aws:wind-direction>S</aws:wind-direction>
    <aws:wind-direction-degrees>163</aws:wind-direction-degrees>
    <aws:wind-direction-avg>SE</aws:wind-direction-avg>
  </aws:ob>
</aws:weather>
I used http://www.xpathtester.com/test to test my XPath expression, and it worked there. But in Python I get this error message:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lxml.etree.pyx", line 2043, in lxml.etree._ElementTree.xpath (src/lxml/lxml.etree.c:47570)
  File "xpath.pxi", line 376, in lxml.etree.XPathDocumentEvaluator.__call__ (src/lxml/lxml.etree.c:118247)
  File "xpath.pxi", line 239, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:116911)
  File "xpath.pxi", line 224, in lxml.etree._XPathEvaluatorBase._raise_eval_error (src/lxml/lxml.etree.c:116728)
lxml.etree.XPathEvalError: Undefined namespace prefix
This is all very new to me: Python, XML, and lxml. All I want is the observed time and the temperature.
Do my problems have anything to do with that aws: prefix in front of everything? What does that even mean?
Any help you can offer is greatly appreciated!
The problem has everything "to do with that aws: prefix in front of everything": it is a namespace prefix, and you have to tell the XPath evaluator which namespace URI it stands for. This is easily done:
print(doc.xpath('//aws:weather/aws:ob/aws:temp',
                namespaces={'aws': 'http://www.aws.com/aws'})[0].text)
The need for this mapping from namespace prefix to namespace URI is documented at http://lxml.de/xpathxslt.html.
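The same prefix-to-URI mapping is needed for ElementTree-style find()/findall() calls. As a sketch, here is the standard library's xml.etree.ElementTree (lxml's find/findall accept the same namespaces argument) pulling the two values the question asks for, the observation time and the temperature, from a trimmed copy of weather.xml. Building the timestamp from the hour-24 attribute and zero-padding the parts is my own choice, and observed_time_and_temp is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Prefix -> namespace URI, copied from the xmlns:aws declaration in weather.xml.
NS = {"aws": "http://www.aws.com/aws"}

# Trimmed copy of weather.xml, keeping only what this sketch reads.
SAMPLE = """<aws:weather xmlns:aws="http://www.aws.com/aws">
  <aws:ob>
    <aws:ob-date>
      <aws:year number="2013"/>
      <aws:month number="1" text="January" abbrv="Jan"/>
      <aws:day number="11" text="Friday" abbrv="Fri"/>
      <aws:hour number="10" hour-24="22"/>
      <aws:minute number="26"/>
      <aws:second number="00"/>
    </aws:ob-date>
    <aws:temp units="°F">41</aws:temp>
  </aws:ob>
</aws:weather>"""

def observed_time_and_temp(root):
    """Return ("YYYY-MM-DD HH:MM:SS", temperature, units) from an aws:weather root."""
    ob = root.find("aws:ob", NS)
    ob_date = ob.find("aws:ob-date", NS)

    def part(tag, attr="number"):
        # Each date part is an empty element carrying its value in an attribute.
        return ob_date.find("aws:" + tag, NS).get(attr).zfill(2)

    stamp = "{}-{}-{} {}:{}:{}".format(
        part("year"), part("month"), part("day"),
        part("hour", "hour-24"), part("minute"), part("second"))
    temp = ob.find("aws:temp", NS)
    return stamp, float(temp.text), temp.get("units")

print(observed_time_and_temp(ET.fromstring(SAMPLE)))  # ('2013-01-11 22:26:00', 41.0, '°F')
```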
Try something like this:
from lxml import etree
ns = etree.FunctionNamespace("http://www.aws.com/aws")
ns.prefix = "aws"
doc=etree.parse('weather.xml')
print(doc.xpath("//aws:weather/aws:ob/aws:temp")[0].text)
See this link: http://lxml.de/extensions.html
I am editing an XML file that is provided by a third party. The XML is used to recreate an entire environment, and one can edit the XML to propagate changes. I was able to look up the element I wanted to change through command-line options and save the XML, but special characters are being escaped and I need to retain them. For example, it is changing > to &gt; in the file during the .write operation. This affects all occurrences in the XML document, not just the node (element) I changed. Below is my code:
import sys
from lxml import etree
from optparse import OptionParser
def parseCommandLine():
    usage = "usage: %prog [options] arg"
    parser = OptionParser(usage)
    parser.add_option("-f", "--file", dest="filename",
                      help="Context File name including full path", metavar="CONTEXT_FILE")
    parser.add_option("-k", "--key", dest="key",
                      help="Key you are looking for in Context File i.e s_isAdmin", metavar="s_someKey")
    parser.add_option("-v", "--value", dest="value",
                      help="The replacement value for the key")
    if len(sys.argv[1:]) < 3:
        print(len(sys.argv[1:]))
        parser.print_help()
        sys.exit(2)
    (options, args) = parser.parse_args()
    return options.filename, options.key, options.value
Filename, Key, Value = parseCommandLine()
parser_options = etree.XMLParser(attribute_defaults=True, dtd_validation=False, strip_cdata=False)
doc = etree.parse(Filename, parser_options)  # Open and parse the file
print(doc.findall("//*[@oa_var=%r]" % Key)[0].text)
oldval = doc.findall("//*[@oa_var=%r]" % Key)[0].text
val = doc.findall("//*[@oa_var=%r]" % Key)[0]
val.text = Value
print('old value is %s' % oldval)
print('new value is %s' % val.text)
root = doc.getroot()
doc.write(Filename, method='xml', with_tail=True, pretty_print=False)
Original file has this:
tf.fm.FulfillmentServer >> /s_u01/app/applmgr/f
The saved version has this instead:
tf.fm.FulfillmentServer &gt;&gt; /s_u01/app/applmgr/f
I have been experimenting with pretty_print on the output side and with DTD validation on the parsing side, and I am stumped.
Below is a diff between the original file and the changed file.
I only updated s_cookie_domain.
diff finprod_acfpdb10.xml_original finprod_acfpdb10.xml
Warning: missing newline at end of file finprod_acfpdb10.xml
1,3c1
< <?xml version = '1.0'?>
< <!-- $Header: adxmlctx.tmp 115.426 2009/05/08 08:46:29 rdamodar ship $ -->
< <!--
---
> <!-- $Header: adxmlctx.tmp 115.426 2009/05/08 08:46:29 rdamodar ship $ --><!--
13,14c11
< -->
< <oa_context version="$Revision: 115.426 $">
---
> --><oa_context version="$Revision: 115.426 $">
242c239
< <cookiedomain oa_var="s_cookie_domain">.apollogrp.edu</cookiedomain>
---
> <cookiedomain oa_var="s_cookie_domain">.qadoamin.edu</cookiedomain>
526c523
< <FORMS60_BLOCK_URL_CHARACTERS oa_var="s_f60blockurlchar">%0a,%0d,!,%21,",%22,%28,%29,;,[,%5b,],%5d,{,%7b,|,%7c,},%7d,%7f,>,%3c,<,%3e</FORMS60_BLOCK_URL_CHARACTERS>
---
> <FORMS60_BLOCK_URL_CHARACTERS oa_var="s_f60blockurlchar">%0a,%0d,!,%21,",%22,%28,%29,;,[,%5b,],%5d,{,%7b,|,%7c,},%7d,%7f,>,%3c,<,%3e</FORMS60_BLOCK_URL_CHARACTERS>
940c937
< <start_cmd oa_var="s_jtffstart">/s_u01/app/applmgr/jdk1.5.0_11/bin/java -Xmx512M -classpath .:/s_u01/app/applmgr/finprod/comn/java/jdbc111.zip:/s_u01/app/applmgr/finprod/comn/java/xmlparserv2.zip:/s_u01/app/applmgr/finprod/comn/java:/s_u01/app/applmgr/finprod/comn/java/apps.zip:/s_u01/app/applmgr/jdk1.5.0_11/classes:/s_u01/app/applmgr/jdk1.5.0_11/lib:/s_u01/app/applmgr/jdk1.5.0_11/lib/classes.zip:/s_u01/app/applmgr/jdk1.5.0_11/lib/classes.jar:/s_u01/app/applmgr/jdk1.5.0_11/lib/rt.jar:/s_u01/app/applmgr/jdk1.5.0_11/lib/i18n.jar:/s_u01/app/applmgr/finprod/comn/java/3rdparty/RFJavaInt.zip: -Dengine.LogPath=/s_u01/app/applmgr/finprod/comn/admin/log/finprod_acfpdb10 -Dengine.TempDir=/s_u01/app/applmgr/finprod/comn/temp -Dengine.CommandPromptEnabled=false -Dengine.CommandPort=11000 -Dengine.AOLJ.config=/s_u01/app/applmgr/finprod/appl/fnd/11.5.0/secure/acfpdb10_finprod.dbc -Dengine.ServerID=5000 -Ddebug=off -Dengine.LogLevel=1 -Dlog.ShowWarnings=false -Dengine.FaxEnabler=oracle.apps.jtf.fm.engine.rightfax.RfFaxEnablerImpl -Dengine.PrintEnabler=oracle.apps.jtf.fm.engine.rightfax.RfPrintEnablerImpl -Dfax.TempDir=/s_u01/app/applmgr/finprod/comn/admin/log/finprod_acfpdb10 -Dprint.TempDir=/s_u01/app/applmgr/finprod/comn/admin/log/finprod_acfpdb10 oracle.apps.jtf.fm.FulfillmentServer >> /s_u01/app/applmgr/finprod/comn/admin/log/finprod_acfpdb10/jtffmctl.txt</start_cmd>
---
> <start_cmd oa_var="s_jtffstart">/s_u01/app/applmgr/jdk1.5.0_11/bin/java -Xmx512M -classpath .:/s_u01/app/applmgr/finprod/comn/java/jdbc111.zip:/s_u01/app/applmgr/finprod/comn/java/xmlparserv2.zip:/s_u01/app/applmgr/finprod/comn/java:/s_u01/app/applmgr/finprod/comn/java/apps.zip:/s_u01/app/applmgr/jdk1.5.0_11/classes:/s_u01/app/applmgr/jdk1.5.0_11/lib:/s_u01/app/applmgr/jdk1.5.0_11/lib/classes.zip:/s_u01/app/applmgr/jdk1.5.0_11/lib/classes.jar:/s_u01/app/applmgr/jdk1.5.0_11/lib/rt.jar:/s_u01/app/applmgr/jdk1.5.0_11/lib/i18n.jar:/s_u01/app/applmgr/finprod/comn/java/3rdparty/RFJavaInt.zip: -Dengine.LogPath=/s_u01/app/applmgr/finprod/comn/admin/log/finprod_acfpdb10 -Dengine.TempDir=/s_u01/app/applmgr/finprod/comn/temp -Dengine.CommandPromptEnabled=false -Dengine.CommandPort=11000 -Dengine.AOLJ.config=/s_u01/app/applmgr/finprod/appl/fnd/11.5.0/secure/acfpdb10_finprod.dbc -Dengine.ServerID=5000 -Ddebug=off -Dengine.LogLevel=1 -Dlog.ShowWarnings=false -Dengine.FaxEnabler=oracle.apps.jtf.fm.engine.rightfax.RfFaxEnablerImpl -Dengine.PrintEnabler=oracle.apps.jtf.fm.engine.rightfax.RfPrintEnablerImpl -Dfax.TempDir=/s_u01/app/applmgr/finprod/comn/admin/log/finprod_acfpdb10 -Dprint.TempDir=/s_u01/app/applmgr/finprod/comn/admin/log/finprod_acfpdb10 oracle.apps.jtf.fm.FulfillmentServer &gt;&gt; /s_u01/app/applmgr/finprod/comn/admin/log/finprod_acfpdb10/jtffmctl.txt</start_cmd>
983c980
< </oa_context>
---
> </oa_context>
Terminology: Parsers don't write XML; they read XML. Serialisers write XML.
In normal element content, < and & are illegal and must be escaped. > is legal, except where it follows ]] and is NOT the end of a CDATA section. Most serialisers take the easy way out and write &gt;, because a parser will handle both that and >.
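A quick way to see that the two spellings are equivalent is to parse both and compare the resulting text. This sketch uses the standard library's xml.etree.ElementTree rather than lxml, with a shortened, hypothetical path in place of the real command line; its serialiser, like the lxml output in the question, writes > back out as &gt;.

```python
import xml.etree.ElementTree as ET

# The same (shortened, hypothetical) command line written two ways in XML source.
raw = ET.fromstring("<start_cmd>FulfillmentServer >> /tmp/jtffmctl.txt</start_cmd>")
escaped = ET.fromstring("<start_cmd>FulfillmentServer &gt;&gt; /tmp/jtffmctl.txt</start_cmd>")

# After parsing, both elements carry identical text: escaping is purely a serialisation detail.
print(raw.text == escaped.text)  # True

# On output, the serialiser chooses to write ">" as "&gt;".
print(ET.tostring(raw, encoding="unicode"))  # <start_cmd>FulfillmentServer &gt;&gt; /tmp/jtffmctl.txt</start_cmd>
```

So any consumer that uses a real XML parser sees exactly the same command line in the original and in the rewritten file.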
I suggest that you submit both your output and input files to an XML validation service, and also test whether the consumer will actually parse your output file.
The only thing I can think of is forcing lxml to write the nodes you modify as CDATA sections (the serialiser is clearly escaping the > characters). Try val.text = etree.CDATA(Value) instead of val.text = Value. Note that this only affects the text you wrap; > characters elsewhere in the document will still be written as &gt;.
http://lxml.de/api.html#cdata