XML parsing with ElementTree produces wrong output

I want to parse an XML file with ElementTree, but for one tag the output is not what I expect:
<descriptions>
<description descriptionType="Abstract">Some Abstract Text
</description>
</descriptions>
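I parse it with the XML function:
import xml.etree.ElementTree as ElementTree
root = ElementTree.XML(my_xml)  # my_xml holds the XML text shown above
root[0].items()  # root[0] replaces root.getchildren()[0]; getchildren() was removed in Python 3.9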
and the outcome is:
Out: [('descriptionType', 'Abstract')]
Is there a problem with the XML, am I using ElementTree the wrong way, or is this a bug?

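I guess you want the element's text rather than its attributes. So use:
root[0].text
not
root[0].items()
(On older Python versions root.getchildren()[0] gives the same element, but getchildren() was removed in Python 3.9.)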

It turned out that when an element contains no child tags, its content is stored in the text attribute.
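
To make the difference concrete, here is a minimal sketch that reuses the XML from the question. The variable names (my_xml, description, attributes, text) are chosen only for this illustration.

```python
import xml.etree.ElementTree as ET

# The XML from the question, kept as a string for the example.
my_xml = """<descriptions>
<description descriptionType="Abstract">Some Abstract Text
</description>
</descriptions>"""

root = ET.fromstring(my_xml)
description = root[0]  # first child element of <descriptions>

# items() returns the element's attributes as (name, value) pairs.
attributes = description.items()

# text holds the character data that precedes the first child tag
# (here: all of the element's content, including the trailing newline).
text = description.text

print(attributes)
print(repr(text))
```

So items() was behaving as designed in the question; the text simply lives in a different place on the element.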

Related

How to use fromstring for XML parsing with ElementTree in Python?

The XML is:
<foo>
<bar key="value">text</bar>
</foo>
Python code is:
import xml.etree.ElementTree as ET
xml=ET.fromstring(contents)
xml.find('./bar').attrib['key']
Output: 'value'
What must I assign to contents in the Python code above to get 'value' as the output?
If I write just contents, I get an error saying that contents is not defined.

It works if the XML is provided as a triple-quoted string. This allows you to include unescaped quotes within the string.
import xml.etree.ElementTree as ET
contents = """
<foo>
<bar key="value">text</bar>
</foo>"""
xml = ET.fromstring(contents)
print(xml.find('./bar').attrib['key'])  # prints: value
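
If the XML lives in a file rather than in a string literal, ET.parse is the usual counterpart to ET.fromstring. The sketch below assumes a hypothetical file name (foo.xml) and writes it to a temporary directory only so that the example is self-contained.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

contents = '<foo><bar key="value">text</bar></foo>'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "foo.xml")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(contents)

    # ET.parse reads the file and returns an ElementTree;
    # getroot() gives the same kind of element that fromstring returns.
    root_from_file = ET.parse(path).getroot()

# fromstring parses XML that is already held in a string.
root_from_string = ET.fromstring(contents)

key_from_file = root_from_file.find("./bar").attrib["key"]
key_from_string = root_from_string.find("./bar").attrib["key"]
print(key_from_file, key_from_string)
```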

Remove ns0 from XML

I have an XML file in which I would like to edit certain attributes. I can edit the attributes correctly, but when I write the changes to the file, the tags have a strange "ns0" prefix added to them. How can I get rid of it? This is what I have tried, without success. I am working in Python and using lxml.
import xml.etree.ElementTree as ET
from xml.etree import ElementTree as etree
from lxml import etree, objectify
frag_xml_tree = ET.parse(xml_name)
frag_root = frag_xml_tree.getroot()
for e in frag_root:
    for elem in frag_root.iter(e):
        elem.attrib[frag_param_name] = update_val
etree.register_namespace("", "http://www.w3.org/2001")
frag_xml_tree.write(xml_name)
However, when I do this, I only get the error Invalid tag name u''. I thought this error came up when XML tags start with digits, but that is not the case with my XML. I am really stuck on how to proceed. Thanks

The fix turned out to be a combination of two things:
The import statement must be import xml.etree.ElementTree as ET.
ET.register_namespace("", NAMESPACE) is the correct call, where NAMESPACE is the namespace declared in the input XML, i.e. the URL after xmlns.

Here's the corrected code using only xml.etree.ElementTree instead of lxml:
import xml.etree.ElementTree as ET
frag_xml_tree = ET.parse(xml_name)
frag_root = frag_xml_tree.getroot()
for e in frag_root:
    # iter() expects a tag name, so pass e.tag rather than the element itself
    for elem in frag_root.iter(e.tag):
        elem.attrib[frag_param_name] = update_val
# Register the document's namespace as the default before writing, so no ns0 prefix is emitted
ET.register_namespace("", "http://www.w3.org/2001")
frag_xml_tree.write(xml_name)

The following snippet removes ns0 throughout the XML file. NAMESPACE is the URL after xmlns in the input document:
for i in range(len(root)):
    print(root[i])
ET.register_namespace("", NAMESPACE)
tree.write('TP_updated2.xml', xml_declaration=True, method='xml', encoding="utf-8", default_namespace=None)
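
The effect of register_namespace described in the answers above can be seen in a small, self-contained sketch. The namespace URL and the sample document are made up for the illustration; substitute the URL that follows xmlns in your own file.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace and document, used only for this example.
NAMESPACE = "http://example.com/ns"
source = '<root xmlns="http://example.com/ns"><item name="a"/></root>'

root = ET.fromstring(source)

# Without a registered prefix, ElementTree invents one (ns0) when serializing.
before = ET.tostring(root, encoding="unicode")

# Registering the empty prefix makes this namespace the default on output.
ET.register_namespace("", NAMESPACE)
after = ET.tostring(root, encoding="unicode")

print(before)
print(after)
```

Note that register_namespace changes a module-wide registry, so it affects every document serialized afterwards in the same process.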

Extracting result text from XML using Python

I have the following XML:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns3:result xmlns:ns2="http://ws.def.com/">
<ns3:value>QWESW12323D2412123S</ns3:value>
</ns3:result>
I want to parse it with Python and extract the text of the value element. I tried the following:
from xml.etree import ElementTree as etree
xml = etree.fromstring(data)
item = xml.find('ns3:value')
print(item)
but the item I get is empty. Could someone help me achieve this with Python?

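Put the namespace URI, not the prefix, in braces: for example '{http://ws.def.com/}value', assuming ns3 is meant to be bound to http://ws.def.com/. ElementTree matches tags by URI, so a bare 'ns3:value' only works if you also pass find() a prefix-to-URI mapping. Note, though, that the document as posted declares only xmlns:ns2, so ns3 is undefined and I don't think this is actually valid XML; the parser will probably reject it until the prefix is declared.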
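
A minimal sketch of both lookup styles follows. It assumes the document was meant to declare xmlns:ns3="http://ws.def.com/" (the posted XML declares ns2 instead), so the declaration here is a guess rather than the asker's actual data.

```python
import xml.etree.ElementTree as ET

# Assumption: the ns3 prefix is bound to the URI that the question binds to ns2.
data = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns3:result xmlns:ns3="http://ws.def.com/">
<ns3:value>QWESW12323D2412123S</ns3:value>
</ns3:result>"""

root = ET.fromstring(data)

# Style 1: write the namespace URI in braces in front of the local name.
value_by_uri = root.find("{http://ws.def.com/}value")

# Style 2: keep the prefix in the path and pass a prefix-to-URI mapping.
namespaces = {"ns3": "http://ws.def.com/"}
value_by_prefix = root.find("ns3:value", namespaces)

print(value_by_uri.text)
print(value_by_prefix.text)
```

The prefix used in the mapping does not have to match the one in the document; only the URI matters.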

How to (push) parse XML files in Python?

I've already seen this question, but it's from 2009.
What's a simple modern way to handle XML files in Python 3?
I.e., from this TLD (adapted from here):
<?xml version="1.0" encoding="UTF-8" ?>
<taglib>
<tlib-version>1.0</tlib-version>
<short-name>bar-baz</short-name>
<tag>
<name>present</name>
<tag-class>condpkg.IfSimpleTag</tag-class>
<body-content>scriptless</body-content>
<attribute>
<name>test</name>
<required>true</required>
<rtexprvalue>true</rtexprvalue>
</attribute>
</tag>
</taglib>
I want to parse TLD files (JavaServer Pages Tag Library Descriptors) to obtain some sort of structure in Python (I have yet to decide on that part).
Hence, I need a push parser. But I won't do much more with it, so I'd prefer a simple API (I'm new to Python).

xml.etree.ElementTree is still there, in the standard library:
import xml.etree.ElementTree as ET
data = """your xml here"""
tree = ET.fromstring(data)
print(tree.find('tag/name').text) # prints "present"
If you look outside the standard library, the very popular and fast lxml module follows the ElementTree interface and supports Python 3:
from lxml import etree as ET
data = b"""your xml here"""  # pass bytes: lxml rejects str input that carries an encoding declaration
tree = ET.fromstring(data)
print(tree.find('tag/name').text) # prints "present"
Besides, there is lxml.objectify, which lets you work with the XML structure as if it were a Python object.
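
As a sketch of how the TLD from the question could be turned into a plain Python structure with the standard library alone, the example below builds a nested dict. The dict layout and key names (version, short_name, tags, and so on) are hypothetical choices for this illustration, not part of any TLD standard or of the answer above.

```python
import xml.etree.ElementTree as ET

tld = """<?xml version="1.0" encoding="UTF-8" ?>
<taglib>
  <tlib-version>1.0</tlib-version>
  <short-name>bar-baz</short-name>
  <tag>
    <name>present</name>
    <tag-class>condpkg.IfSimpleTag</tag-class>
    <body-content>scriptless</body-content>
    <attribute>
      <name>test</name>
      <required>true</required>
      <rtexprvalue>true</rtexprvalue>
    </attribute>
  </tag>
</taglib>"""

root = ET.fromstring(tld)

# findtext() returns the text of the first matching child (or None if absent);
# findall() returns every matching direct child.
taglib = {
    "version": root.findtext("tlib-version"),
    "short_name": root.findtext("short-name"),
    "tags": [
        {
            "name": tag.findtext("name"),
            "class": tag.findtext("tag-class"),
            "body_content": tag.findtext("body-content"),
            "attributes": [
                {
                    "name": attr.findtext("name"),
                    "required": attr.findtext("required") == "true",
                }
                for attr in tag.findall("attribute")
            ],
        }
        for tag in root.findall("tag")
    ],
}

print(taglib)
```

This loads the whole document into memory first, which is usually fine for small descriptor files like TLDs.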

Python ElementTree - print out namespace definitions?

I'm using Python's ElementTree to parse some XML configuration files.
At the top of the file, I have a root element like this:
<?xml version="1.0" encoding="utf-8"?>
<sgx:FooConfig
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:foo="http://ns.au.firm.com/foo.xsd"
xmlns:bar="http://ns.au.firm.com/bar.xsd"
>
The problem is, the bar namespace can be set to one of two different XSDs, depending on the version of the configuration file.
I'm looking for a way to print out the namespace mapping using ElementTree, so I can check which of the two XSDs is being used - then I can get my code to handle the correct case.
Is there a way to print out all the namespace definitions using Python?
Cheers,
Victor

What you have is not valid XML (the sgx prefix is undefined). I don't think xml.etree exposes the namespace map on its elements, but you should be able to get it using lxml once the prefix is declared:
import lxml.etree as et
tree = et.XML(yourxml)
print(tree.nsmap)
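
For completeness, the standard library can report namespace declarations while parsing, through the start-ns event of iterparse. The sketch below is hedged accordingly: it assumes the sgx prefix is declared (the URL used for it is invented) and closes the root element so that the sample is well-formed.

```python
import io
import xml.etree.ElementTree as ET

# The root element from the question, with a made-up declaration for sgx
# and a self-closing tag so that the sample parses.
config = b"""<?xml version="1.0" encoding="utf-8"?>
<sgx:FooConfig
    xmlns:sgx="http://ns.au.firm.com/sgx.xsd"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:foo="http://ns.au.firm.com/foo.xsd"
    xmlns:bar="http://ns.au.firm.com/bar.xsd"
/>"""

# Each start-ns event carries a (prefix, uri) pair for one xmlns declaration.
namespaces = {}
for event, (prefix, uri) in ET.iterparse(io.BytesIO(config), events=["start-ns"]):
    namespaces[prefix] = uri

print(namespaces["bar"])
```

Checking namespaces["bar"] against the two known XSD URLs would then tell you which version of the configuration file you are dealing with.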
