Python ElementTree - print out namespace definitions?

Python ElementTree - print out namespace definitions? - python

I'm using Python's elementtree to parse some XML configuration files.
At the top of the file, I have a root element like this:
<?xml version="1.0" encoding="utf-8"?>
<sgx:FooConfig
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:foo="http://ns.au.firm.com/foo.xsd"
xmlns:bar="http://ns.au.firm.com/bar.xsd"
>
The problem is, the bar namespace can be set to one of two different XSDs, depending on the version of the configuration file.
I'm looking for a way to print out the namespace mapping using ElementTree, so I can check which of the two XSDs is being used - then I can get my code to handle the correct case.
Is there a way to print out all the namespace definitions out using Python?
Cheers,
Victor

What you have is not valid xml (undefined prefixes) and I think you can't do this with xml.etree but you should be able to do it using lxml.
import lxml.etree as et
tree = et.XML(yourxml)
print tree.nsmap

Related

extracting result text from xml using Python

I have the following xml :
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns3:result xmlns:ns2="http://ws.def.com/">
<ns3:value>QWESW12323D2412123S</ns3:value>
</ns3:result>
and want to parse it with python and extract this text i tried the following :
from xml.etree import ElementTree as etree
xml = etree.fromstring(data)
item = xml.find('ns3:value')
print item
but i get empty item ,could someone help to achieve this with Python?

Use the syntax '{ns3}value' to apply the namespace - although as ns3 isn't defined, I don't think this is actually valid xml.

How to (push) parse XML files in Python?

I've already seen this question, but it's from the 2009.
What's a simple modern way to handle XML files in Python 3?
I.e., from this TLD (adapted from here):
<?xml version="1.0" encoding="UTF-8" ?>
<taglib>
<tlib-version>1.0</tlib-version>
<short-name>bar-baz</short-name>
<tag>
<name>present</name>
<tag-class>condpkg.IfSimpleTag</tag-class>
<body-content>scriptless</body-content>
<attribute>
<name>test</name>
<required>true</required>
<rtexprvalue>true</rtexprvalue>
</attribute>
</tag>
</taglib>
I want to parse TLD files (Java Server Pages Tag Library Descriptors), to obtain some sort of structure in Python (I have still to decide about that part).
Hence, I need a push parser. But I won't do much more with it, so I'd rather prefer a simple API (I'm new to Python).

xml.etree.ElementTree is still there, in the standard library:
import xml.etree.ElementTree as ET
data = """your xml here"""
tree = ET.fromstring(data)
print(tree.find('tag/name').text) # prints "present"
If you look outside of the standard library, there is a very popular and fast lxml module that follows the ElementTree interface and supports Python3:
from lxml import etree as ET
data = """your xml here"""
tree = ET.fromstring(data)
print(tree.find('tag/name').text) # prints "present"
Besides, there is lxml.objectify that allows you to deal with XML structure like with a Python object.

How can i deal with xml namespace in python?

I'm not going to parse a xml file but i want to move it in python. (i.e etree)
I know about the basic of etree however all i want to know is about xml namespace.
I've got 3 namespaces in code. Here is the code that i want to transfer.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><ns1:g2sMessage xmlns:ns1="http://www.gamingstandards.com/g2s/schemas/v1.0.3"><ns1:g2sBody ns1:dateTimeSent="2014-08-14T15:30:48.692+09:00" ns1:egmId="TestAppEGMID" ns1:hostId="1"><ns1:communications ns1:deviceId="0" ns1:sessionId="1001" ns1:sessionType="G2S_request" ns1:timeToLive="100" ns1:commandId="1001" ns1:dateTime="2014-08-14T06:30:48.696Z"><ns1:keepAlive/></ns1:communications></ns1:g2sBody></ns1:g2sMessage>TestAppEGMID1
Does anyone have an idea?

Use Python to edit XML header

I've written a Python script to create some XML, but I didn't find a way to edit the heading within Python.
Basically, instead of having this as my heading:
<?xml version="1.0" ?>
I need to have this:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
I looked into this as best I could, but found no way to add standalone status from within Python. So I figured that I'd have to go over the file after it had been created and then replace the text. I read in several places that I should stay far away from using readlines() because it could ruin the XML formatting.
Currently the code I have to do this - which I got from another Stackoverflow post about editing XML with Python - is:
doc = parse('file.xml')
elem = doc.find('xml version="1.0"')
elem.text = 'xml version="1.0" encoding="UTF-8" standalone="no"'
That provides me with a KeyError. I've tried to debug it to no avail, which leads me to believe that perhaps the XML heading wasn't meant to be edited in this way. Or my code is just wrong.
If anyone is curious (or miraculously knows how to insert standalone status into Python code), here is my code to write the xml file:
with open('file.xml', 'w') as f:
f.write(doc.toprettyxml(indent=' '))
Some people don't like "toprettyxml", but with my relatively basic level, it seemed like the best bet.
Anyway, if anyone can provide some advice or insight, I would be very grateful.

The xml.etree API does not give you any options to write out a standalone attribute in the XML declaration.
You may want to use the lxml library instead; it uses the same ElementTree API, but offers more control over the output. tostring() supports a standalone flag:
from lxml import etree
etree.tostring(doc, pretty_print=True, standalone=False)
or use .write(), which support the same options:
doc.write(outputfile, pretty_print=True, standalone=False)

Reading XML DOCTYPE info with Python

I need to parse a version of an XML file as follows.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE twReport [
<!ELEMENT twReport (twHead?, (twWarn | twDebug | twInfo)*, twBody, twSum?,
twDebug*, twFoot?, twClientInfo?)>
<!ATTLIST twReport version CDATA "10,4"> <----- VERSION INFO HERE
I use xml.dom.minidom for parsing XML file, and I need to parse the version of the XML file written in embedded DTD.
Can I use xml.dom.minidom for this purpose?
Is there any python XML parser for that purposes?

How about xmlproc's DTD api?
Here's a random snippet of code I wrote years and years ago to do some work with DTDs from Python, which might give you an idea of what it's like to work with this library:
from xml.parsers.xmlproc import dtdparser
attr_separator = '_'
child_separator = '_'
dtd = dtdparser.load_dtd('schedule.dtd')
for name, element in dtd.elems.items():
for attr in element.attrlist:
output = '%s%s%s = ' % (name, attr_separator, attr)
print output
for child in element.get_valid_elements(element.get_start_state()):
output = '%s%s%s = ' % (name, child_separator, child)
print output
(FYI, this was the first result when searching for "python dtd parser")

Because both of the the standard library XML libraries (xml.dom.minidom and xml.etree) use the same parser (xml.parsers.expat) you are limited in the "quality" of XML data you are able to successfully parse.
You're better off using the tried-and-true 3rd party modules out there like lxml or BeautifulSoup that are not only more resilient to errors, but will also give you exactly what you are looking for with little trouble.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python ElementTree - print out namespace definitions? - python

What you have is not valid xml (undefined prefixes) and I think you can't do this with xml.etree but you should be able to do it using lxml. import lxml.etree as et tree = et.XML(yourxml) print tree.nsmap

Related

extracting result text from xml using Python

How to (push) parse XML files in Python?

How can i deal with xml namespace in python?

Use Python to edit XML header

Reading XML DOCTYPE info with Python

Categories

Resources