How to use fromstring for XML parsing with ElementTree in Python?

The XML is:
<foo>
<bar key="value">text</bar>
</foo>
The Python code is:
import xml.etree.ElementTree as ET
xml=ET.fromstring(contents)
xml.find('./bar').attrib['key']
Output: 'value'
What must I put in place of contents in the Python code above to get 'value' as the output?
If I write just contents, I get an error saying that contents is not defined.

contents has to be a variable that holds the XML text, assigned before fromstring is called. It works if the XML is provided as a triple-quoted string, which can span several lines and include unescaped quotes:
import xml.etree.ElementTree as ET
contents = """
<foo>
<bar key="value">text</bar>
</foo>"""
xml = ET.fromstring(contents)
print(xml.find('./bar').attrib['key'])  # prints: value
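The same idea in a slightly fuller sketch, assuming Python 3: contents can be any string variable, and when the XML lives in a file you can let ET.parse read it (it returns an ElementTree, whose getroot() gives the same root element) or read the text yourself and pass it to fromstring. A temporary file stands in for a hypothetical data.xml so the example is self-contained.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Option 1: the XML text is held in a string variable.
contents = """<foo>
<bar key="value">text</bar>
</foo>"""
root = ET.fromstring(contents)           # returns the root Element (<foo>)
print(root.find('./bar').attrib['key'])  # prints: value

# Option 2: the XML lives in a file (a temp file stands in for a
# hypothetical data.xml here).
with tempfile.NamedTemporaryFile('w', suffix='.xml', delete=False) as f:
    f.write(contents)
    path = f.name

tree = ET.parse(path)                          # parse straight from the file
print(tree.getroot().find('bar').get('key'))   # prints: value

# Equivalent: read the text yourself, then hand it to fromstring.
with open(path) as f:
    print(ET.fromstring(f.read()).find('bar').get('key'))  # prints: value

os.remove(path)
```

.get('key') is a shorthand for .attrib['key'] that returns None instead of raising KeyError when the attribute is missing.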

Related

How to build an XML file in Python, with formatting

I'm trying to build an XML file in Python so I can write it out to a file, but I'm running into complications with newlines, tabs, etc.
I can't use a module to do this because I'm using a cut-down version of Python 2; it must all be in pure Python.
For instance, how can I create an XML file with formatting like this, keeping all the newlines and tabs (whitespace)?
<?xml version="1.0" encoding="UTF-8"?>
<myfiledata>
	<mydata>
		blahblah
	</mydata>
</myfiledata>
I've tried enclosing each line in quotes and concatenating them, like this:
' <myfiledata>' +\n
' blahblah' +\n
etc.
However, the output I'm getting from the script looks nothing like the layout in my Python file: there is extra whitespace and the newlines aren't working properly.
Is there a definitive way to do this? I'd rather be editing a file that looks somewhat like what I will end up with, for clarity's sake.
You can use XMLGenerator from xml.sax.saxutils to generate the XML, and xml.dom.minidom to parse it and pretty-print it (both modules are in the standard library in Python 2).
Sample code creating a XML and pretty-printing it:
from __future__ import print_function
from xml.sax.saxutils import XMLGenerator
import io
import xml.dom.minidom
def pprint_xml_string(s):
    """Pretty-print an XML string with minidom"""
    parsed = xml.dom.minidom.parse(io.BytesIO(s))
    return parsed.toprettyxml()
# create a XML file in-memory:
fp = io.BytesIO()
xg = XMLGenerator(fp)
xg.startDocument()
xg.startElement('root', {})
xg.startElement('subitem', {})
xg.characters('text content')
xg.endElement('subitem')
xg.startElement('subitem', {})
xg.characters('text content for another subitem')
xg.endElement('subitem')
xg.endElement('root')
xg.endDocument()
# pretty-print it
xml_string = fp.getvalue()
pretty_xml = pprint_xml_string(xml_string)
print(pretty_xml)
Output is:
<?xml version="1.0" ?>
<root>
	<subitem>text content</subitem>
	<subitem>text content for another subitem</subitem>
</root>
Note that the text inside each <subitem> is not moved onto its own indented line, because that would change the element's content (XML preserves whitespace in text content, unlike HTML, which collapses it).
The answer was to use xml.etree.ElementTree and xml.dom.minidom (from xml.dom import minidom), both of which are available in Python 2.5.
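The ElementTree-plus-minidom approach mentioned above might look like the following minimal sketch, assuming Python 3 and reusing the element names from the question: ElementTree builds the tree, and minidom re-parses the serialized bytes to add indentation and an XML declaration. As with the XMLGenerator example, minidom keeps the text on the same line as its tag rather than inventing whitespace inside it.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

# Build the tree from the question with ElementTree.
root = ET.Element('myfiledata')
mydata = ET.SubElement(root, 'mydata')
mydata.text = 'blahblah'

# ElementTree serializes everything on one line, with no indentation.
raw = ET.tostring(root)

# Re-parse with minidom to get indentation plus an XML declaration.
# With an encoding argument, toprettyxml returns bytes, not str.
pretty = xml.dom.minidom.parseString(raw).toprettyxml(indent='    ', encoding='UTF-8')

print(pretty.decode('utf-8'))
```

Because pretty is bytes, write it to disk with a file opened in binary mode ('wb'). On Python 3.9 and later, ET.indent(root, space='    ') can indent the tree in place without going through minidom.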

Extracting result text from XML using Python

I have the following XML:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns3:result xmlns:ns2="http://ws.def.com/">
<ns3:value>QWESW12323D2412123S</ns3:value>
</ns3:result>
I want to parse it with Python and extract this text. I tried the following:
from xml.etree import ElementTree as etree
xml = etree.fromstring(data)
item = xml.find('ns3:value')
print(item)
but I get an empty item. Could someone help me achieve this with Python?
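ElementTree addresses namespaced tags as '{namespace-URI}localname', not by prefix, so you need the URI that ns3 is bound to inside the braces (or pass a prefix-to-URI mapping as the second argument of find()). However, ns3 is never declared in this document (only ns2 is), so it isn't valid namespaced XML, and ElementTree will refuse to parse it with an "unbound prefix" error.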
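A minimal sketch of both points, assuming Python 3. The first document is the one from the question and fails to parse. The second declares ns3, and its URI is an assumption (it reuses the ns2 URI purely for illustration); substitute whatever URI the service really binds ns3 to.

```python
import xml.etree.ElementTree as ET

# The XML from the question: ns3 is used but never declared.
data = b'''<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns3:result xmlns:ns2="http://ws.def.com/">
<ns3:value>QWESW12323D2412123S</ns3:value>
</ns3:result>'''

try:
    ET.fromstring(data)
    parse_failed = False
except ET.ParseError as exc:
    parse_failed = True
    print('ParseError:', exc)  # expat reports an unbound prefix

# Hypothetical fix: declare ns3 (the URI here is an assumption).
fixed = b'''<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns3:result xmlns:ns3="http://ws.def.com/">
<ns3:value>QWESW12323D2412123S</ns3:value>
</ns3:result>'''
root = ET.fromstring(fixed)

# Tags are stored as '{namespace-URI}localname', so search by URI ...
print(root.find('{http://ws.def.com/}value').text)
# ... or map a prefix of your choice to the URI and use it in the path.
print(root.find('ns3:value', {'ns3': 'http://ws.def.com/'}).text)
```

The prefix in the mapping passed to find() does not have to match the one used in the document; only the URI has to match.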

How to (push) parse XML files in Python?

I've already seen this question, but it's from 2009.
What's a simple modern way to handle XML files in Python 3?
I.e., from this TLD (adapted from here):
<?xml version="1.0" encoding="UTF-8" ?>
<taglib>
  <tlib-version>1.0</tlib-version>
  <short-name>bar-baz</short-name>
  <tag>
    <name>present</name>
    <tag-class>condpkg.IfSimpleTag</tag-class>
    <body-content>scriptless</body-content>
    <attribute>
      <name>test</name>
      <required>true</required>
      <rtexprvalue>true</rtexprvalue>
    </attribute>
  </tag>
</taglib>
I want to parse TLD files (JavaServer Pages Tag Library Descriptors) to obtain some sort of structure in Python (I still have to decide on that part).
Hence, I need a push parser. But I won't do much more with it, so I'd prefer a simple API (I'm new to Python).
xml.etree.ElementTree is still there, in the standard library:
import xml.etree.ElementTree as ET
data = """your xml here"""
tree = ET.fromstring(data)
print(tree.find('tag/name').text) # prints "present"
If you look outside the standard library, there is the very popular and fast lxml module, which follows the ElementTree interface and supports Python 3:
from lxml import etree as ET
data = b"""your xml here"""  # bytes: lxml rejects str input that carries an encoding declaration
tree = ET.fromstring(data)
print(tree.find('tag/name').text) # prints "present"
There is also lxml.objectify, which lets you work with an XML structure as if it were a tree of ordinary Python objects.
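Since the question asks specifically for push parsing, note that the standard library's ElementTree also has a push-style parser, XMLPullParser (Python 3.4+): you feed() it data as it arrives and call read_events() to collect whatever elements have been completed so far. A minimal sketch, assuming the input arrives in arbitrary chunks; the 20-byte chunking and the tag_names helper are illustrative only.

```python
import xml.etree.ElementTree as ET

# The TLD from the question, as bytes.
TLD = b"""<?xml version="1.0" encoding="UTF-8" ?>
<taglib>
  <tlib-version>1.0</tlib-version>
  <short-name>bar-baz</short-name>
  <tag>
    <name>present</name>
    <tag-class>condpkg.IfSimpleTag</tag-class>
    <body-content>scriptless</body-content>
    <attribute>
      <name>test</name>
      <required>true</required>
      <rtexprvalue>true</rtexprvalue>
    </attribute>
  </tag>
</taglib>"""

def tag_names(chunks):
    """Collect the <name> of every <tag> while data is pushed in piece by piece."""
    parser = ET.XMLPullParser(events=('end',))
    names = []
    for chunk in chunks:
        parser.feed(chunk)  # push the next piece of input
        for event, elem in parser.read_events():
            if elem.tag == 'tag':  # the whole <tag> subtree is now complete
                names.append(elem.findtext('name'))  # direct child <name> only
    parser.close()
    return names

# Simulate data arriving in 20-byte chunks (e.g. from a socket).
chunks = [TLD[i:i + 20] for i in range(0, len(TLD), 20)]
print(tag_names(chunks))  # prints: ['present']
```

If you already have the whole document in memory, ET.fromstring as shown above is simpler; the pull parser pays off when input arrives incrementally or is too large to hold at once.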

XML parsing with ElementTree produces wrong output

I want to parse an XML file with ElementTree, but at a certain tag the output is wrong:
<descriptions>
<description descriptionType="Abstract">Some Abstract Text
</description>
</descriptions>
I parse it with the XML function:
import xml.etree.ElementTree as ElementTree
root = ElementTree.XML(my_xml)
root.getchildren()[0].items()
and the outcome is:
Out: [('descriptionType', 'Abstract')]
Is there a problem with the XML, am I using ElementTree the wrong way, or is it a bug?
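I guess you want to get the text. items() returns only the element's attributes, as (name, value) pairs; the character data is stored in the text attribute. So use:
root[0].text
not
root[0].items()
(getchildren() is deprecated and was removed in Python 3.9; index or iterate over the element directly instead.)
It was just that the element's own text content is stored in its text attribute.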
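A short sketch, assuming Python 3, of where ElementTree puts each part of an element: attributes come back from items() and get(), text before the first child goes into .text (including any trailing newline, which is why strip() is often useful), and text that follows a child element goes into that child's .tail. The second document is an illustrative example, not from the question.

```python
import xml.etree.ElementTree as ET

my_xml = """<descriptions>
<description descriptionType="Abstract">Some Abstract Text
</description>
</descriptions>"""

root = ET.XML(my_xml)
desc = root[0]  # first child element

print(desc.items())                 # [('descriptionType', 'Abstract')]: attributes only
print(desc.get('descriptionType'))  # Abstract
print(repr(desc.text))              # 'Some Abstract Text\n': the character data
print(desc.text.strip())            # Some Abstract Text

# Mixed content: text after a child element is that child's .tail.
p = ET.XML('<p>before <b>bold</b> after</p>')
print(repr(p.text), repr(p[0].text), repr(p[0].tail))  # 'before ' 'bold' ' after'
```
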

Python ElementTree - print out namespace definitions?

I'm using Python's elementtree to parse some XML configuration files.
At the top of the file, I have a root element like this:
<?xml version="1.0" encoding="utf-8"?>
<sgx:FooConfig
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:foo="http://ns.au.firm.com/foo.xsd"
xmlns:bar="http://ns.au.firm.com/bar.xsd"
>
The problem is, the bar namespace can be set to one of two different XSDs, depending on the version of the configuration file.
I'm looking for a way to print out the namespace mapping using ElementTree, so I can check which of the two XSDs is being used; then I can get my code to handle the correct case.
Is there a way to print out all the namespace definitions out using Python?
Cheers,
Victor
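What you have is not valid XML, since the sgx prefix is never declared, and any namespace-aware parser (xml.etree included) will reject it. Once the prefixes are declared, lxml exposes the mapping directly through an element's nsmap attribute. xml.etree does not keep prefix declarations on the parsed elements, but you can collect them during parsing from iterparse's "start-ns" events.
import lxml.etree as et
tree = et.XML(yourxml)  # yourxml: the document as bytes; lxml rejects str input that has an encoding declaration
print(tree.nsmap)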
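For a standard-library-only route, here is a minimal sketch, assuming Python 3, that collects every xmlns declaration with ElementTree's iterparse and its "start-ns" events, each of which yields a (prefix, uri) pair. The sgx URI below is a hypothetical placeholder added so the question's header becomes parseable; the namespace_map helper is likewise illustrative.

```python
import io
import xml.etree.ElementTree as ET

# The question's header, with a declaration for sgx added.
# http://example.com/sgx.xsd is a hypothetical placeholder URI.
CONFIG = b'''<?xml version="1.0" encoding="utf-8"?>
<sgx:FooConfig
    xmlns:sgx="http://example.com/sgx.xsd"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:foo="http://ns.au.firm.com/foo.xsd"
    xmlns:bar="http://ns.au.firm.com/bar.xsd"
>
</sgx:FooConfig>'''

def namespace_map(source):
    """Return {prefix: uri} for every xmlns declaration seen while parsing.

    If a prefix is redeclared deeper in the document, the last one wins.
    The default namespace, if any, is stored under the prefix ''.
    """
    nsmap = {}
    for event, (prefix, uri) in ET.iterparse(source, events=('start-ns',)):
        nsmap[prefix] = uri
    return nsmap

nsmap = namespace_map(io.BytesIO(CONFIG))  # a filename also works
print(nsmap['bar'])  # prints: http://ns.au.firm.com/bar.xsd
```

Comparing nsmap['bar'] against the two known XSD URIs is then enough to pick the right code path.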
