etree xml parsing and deletion - python

How can I delete all the entries for server1, including its tags? I tried to use the etree remove() function, but it's not helping.
<hosts>
  <host instances="" name="*" roles="alpha">
    <tags/>
  </host>
  <host instances="" name="server1" id="alpha,beta">
    <tags>
      <tag app-id="1" instance="1" name="alpha"/>
      <tag app-id="2" instance="2" name="beta"/>
    </tags>
  </host>
  <host instances="" name="server2" id="beta,gama">
    <tags>
      <tag app-id="1" instance="1" name="beta"/>
      <tag app-id="2" instance="2" name="gama"/>
    </tags>
  </host>
</hosts>
def main1(file=outfile):
    tree = et.parse(file)
    root = tree.getroot()
    thingy = root.find('hosts')
    for thing in thingy:
        if "server1" in thing.get('name'):
            root.remove(thing)
            #thingy.remove(thing)
    print thingy

You need the parent element in order to remove one of its children from the HTML/XML tree.
Use the getparent() method to get the parent, then the parent's remove() method to remove the child tag.
Demo:
>>> import lxml.etree as PARSER
>>> root = PARSER.fromstring(data)
>>> root.xpath("//hosts/host[@name='server1']")
[<Element host at 0xb6d2ce6c>]
>>> a = root.xpath("//hosts/host[@name='server1']")
>>> for i in a:
...     pp = i.getparent()
...     pp.remove(i)
...
>>> PARSER.tostring(root, method="xml")
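A. find() returns None for the following code, because root is already the <hosts> element and find() only searches its children:
>>> thingy = root.find('hosts')
>>> thingy
Search for 'host' instead: root.find('host') returns the first match, and root.findall('host') returns all of them.
B. Use the xpath() method to get the target tag.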
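For readers limited to the standard library, here is a minimal sketch of the same removal with xml.etree.ElementTree, which has no getparent(). It assumes, as in the question, that every <host> is a direct child of the root; the helper name remove_hosts is made up for illustration.

```python
import xml.etree.ElementTree as ET

DATA = """<hosts>
  <host instances="" name="*" roles="alpha"><tags/></host>
  <host instances="" name="server1" id="alpha,beta">
    <tags><tag app-id="1" instance="1" name="alpha"/></tags>
  </host>
  <host instances="" name="server2" id="beta,gama">
    <tags><tag app-id="1" instance="1" name="beta"/></tags>
  </host>
</hosts>"""


def remove_hosts(root, name):
    """Remove every direct <host> child whose name attribute equals `name`.

    Hypothetical helper: assumes `root` is the <hosts> element itself.
    """
    # findall() returns a list, so removing from root while looping is safe.
    for host in root.findall("host"):
        if host.get("name") == name:
            root.remove(host)  # the parent removes its own child
    return root


root = ET.fromstring(DATA)
remove_hosts(root, "server1")
remaining = [host.get("name") for host in root.findall("host")]
print(remaining)
```

Looping over the list returned by findall() rather than over the element itself matters here: removing children while iterating the live element can skip siblings.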

Related

Parsing xml in python to get all child elements

I have parsed an XML file to get all its elements. I am getting the following output
[<Element '{urn:mitel:params:xml:ns:yang:vld}vld-list' at 0x0000000003059188>, <Element '{urn:mitel:params:xml:ns:yang:vld}vl-id' at 0x00000000030689F8>, <Element '{urn:mitel:params:xml:ns:yang:vld}descriptor-version' at 0x0000000003068A48>]
For each element of the list, I need only the value between } and '.
This is my code so far:
import xml.etree.ElementTree as ET
tree = ET.parse('UMR_VLD01_OAM_V6-Provider_eth0.xml')
root = tree.getroot()
# all items
print('\nAll item data:')
for elem in root:
    all_descendants = list(elem.iter())
    print(all_descendants)
How can I achieve this?
The text in {} is the namespace part of the qualified name (QName) of the XML element. AFAIK there is no method in ElementTree to return only the local name. So, you have to either
extract the local part of the name with string handling, as already proposed in a comment to your question,
use lxml.etree instead of xml.etree.ElementTree and apply xpath('local-name()') on each element,
or provide an XML source without namespace. You can strip the namespace with XSLT.
So, given this XML input:
<?xml version="1.0" encoding="UTF-8"?>
<foo xmlns="urn:mitel:params:xml:ns:yang:vld">
  <bar>
    <baz x="1"/>
    <yet>
      <more>
        <nested/>
      </more>
    </yet>
  </bar>
  <bar/>
</foo>
You can print a list of the local names only with this variation of your program:
import xml.etree.ElementTree as ET
tree = ET.parse('UMR_VLD01_OAM_V6-Provider_eth0.xml')
root = tree.getroot()
# all items
print('\nAll item data:')
for elem in root:
    all_descendants = [e.tag.split('}', 1)[1] for e in elem.iter()]
    print(all_descendants)
Output:
['bar', 'baz', 'yet', 'more', 'nested']
['bar']
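One caveat with the split-based expression: it raises IndexError for any element whose tag has no namespace, because there is no '}' to split on. A minimal sketch of a more tolerant variant follows; local_name is a hypothetical helper, not part of ElementTree, and the sample document and its urn:example:ns namespace are invented for illustration.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Return the part of a tag after '}', or the whole tag if it has no namespace."""
    # rpartition() yields ('', '', tag) when '}' is absent, so index 2 always works.
    return tag.rpartition('}')[2]


# Invented sample: two namespaced elements and two without a namespace.
root = ET.fromstring(
    '<foo xmlns:n="urn:example:ns"><n:bar><n:baz/></n:bar><plain/></foo>'
)
names = [local_name(e.tag) for e in root.iter()]
print(names)
```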
The version with lxml.etree and xpath('local-name()') looks like this:
import lxml.etree as ET
tree = ET.parse('UMR_VLD01_OAM_V6-Provider_eth0.xml')
root = tree.getroot()
# all items
print('\nAll item data:')
for elem in root:
    all_descendants = [e.xpath('local-name()') for e in elem.iter()]
    print(all_descendants)
The output is the same as with the string handling version.
For stripping the namespace completely from your input, you can apply this XSLT:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:template match="*">
    <xsl:element name="{local-name()}">
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
Then your original program outputs:
[<Element 'bar' at 0x04583B40>, <Element 'baz' at 0x04583B70>, <Element 'yet' at 0x04583BD0>, <Element 'more' at 0x04583C30>, <Element 'nested' at 0x04583C90>]
[<Element 'bar' at 0x04583CC0>]
Now the elements themselves do not bear a namespace. So, you don't have to strip it anymore.
You can apply the XSLT with xsltproc; then you don't need to change your program. Alternatively, you can apply the XSLT in Python, but this also requires you to use lxml.etree. So, the last variation of your program looks like this:
import lxml.etree as ET
tree = ET.parse('UMR_VLD01_OAM_V6-Provider_eth0.xml')
xslt = ET.parse('stripns.xslt')
transform = ET.XSLT(xslt)
tree = transform(tree)
root = tree.getroot()
# all items
print('\nAll item data:')
for elem in root:
    all_descendants = list(elem.iter())
    print(all_descendants)

Removing empty xml nodes

I have an XML file that I'm trying to remove empty nodes from with Python. When I test whether the value is, say, 'shark', it works. But when I check for it being None, the empty node isn't removed.
for records in recordList:
    for fieldGroup in records:
        for field in fieldGroup:
            if field.text is None:
                fieldGroup.remove(field)
xpath is your friend here.
from lxml import etree
doc = etree.XML("""<root><a>1</a><b><c></c></b><d></d></root>""")
def remove_empty_elements(doc):
    for element in doc.xpath('//*[not(node())]'):
        element.getparent().remove(element)
Then:
>>> print etree.tostring(doc,pretty_print=True)
<root>
  <a>1</a>
  <b>
    <c/>
  </b>
  <d/>
</root>
>>> remove_empty_elements(doc)
>>> print etree.tostring(doc,pretty_print=True)
<root>
  <a>1</a>
  <b/>
</root>
>>> remove_empty_elements(doc)
>>> print etree.tostring(doc,pretty_print=True)
<root>
  <a>1</a>
</root>
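If lxml is not available, a similar cleanup can be sketched with the standard library's xml.etree.ElementTree. This is a swapped-in technique rather than the XPath approach above: prune_empty is a hypothetical helper, it works bottom-up so a single call is enough, and unlike not(node()) it assumes that whitespace-only text counts as empty.

```python
import xml.etree.ElementTree as ET


def prune_empty(element):
    """Recursively remove descendants that have no children and no real text.

    Hypothetical helper. Assumption: whitespace-only text counts as empty.
    Note that removing an element also discards its tail text.
    """
    # Iterate over a copy, because we remove children from `element` as we go.
    for child in list(element):
        prune_empty(child)  # clean the subtree first (post-order)
        has_text = bool((child.text or "").strip())
        if len(child) == 0 and not has_text:
            element.remove(child)


root = ET.fromstring("<root><a>1</a><b><c></c></b><d></d></root>")
prune_empty(root)
result = ET.tostring(root, encoding="unicode")
print(result)
```

Because children are pruned before their parent is tested, <b> is removed in the same pass that removes <c>, so the function does not need to be called twice.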

etree.strip_tags returning 'None' when trying to strip tag

Script:
print entryDetails
for i in range(len(entryDetails)):
    print etree.tostring(entryDetails[i])
    print etree.strip_tags(entryDetails[i], 'entry-details')
Output:
[<Element entry-details at 0x234e0a8>, <Element entry-details at 0x234e878>]
<entry-details>2014-02-05 11:57:01</entry-details>
None
<entry-details>2014-02-05 12:11:05</entry-details>
None
How is etree.strip_tags failing to strip the entry-details tag? Is the dash in the tag name affecting it?
strip_tags() does not return anything. It strips off the tags in-place.
The documentation says: "Note that this will not delete the element (or ElementTree root element) that you passed even if it matches. It will only treat its descendants."
Demo code:
from lxml import etree
XML = """
<root>
<entry-details>ABC</entry-details>
</root>"""
root = etree.fromstring(XML)
ed = root.xpath("//entry-details")[0]
print ed
print
etree.strip_tags(ed, "entry-details") # Has no effect
print etree.tostring(root)
print
etree.strip_tags(root, "entry-details")
print etree.tostring(root)
Output:
<Element entry-details at 0x2123b98>
<root>
<entry-details>ABC</entry-details>
</root>
<root>
ABC
</root>

lxml: insert tag at a given position

I have an xml file, similar to this:
<tag attrib1='I'>
  <subtag1 subattrib1='1'>
    <subtext>text1</subtext>
  </subtag1>
  <subtag3 subattrib3='3'>
    <subtext>text3</subtext>
  </subtag3>
</tag>
I would like to insert a new subelement, so that the result looks something like this:
<tag attrib1='I'>
  <subtag1 subattrib1='1'>
    <subtext>text1</subtext>
  </subtag1>
  <subtag2 subattrib2='2'>
    <subtext>text2</subtext>
  </subtag2>
  <subtag3 subattrib3='3'>
    <subtext>text3</subtext>
  </subtag3>
</tag>
I can append to my XML file, but then the new elements are inserted at the end. How can I make lxml put the element at a given position?
Thanks for your help!
You can use the addnext() method:
from lxml import etree
XML= """
<tag attrib1='I'>
<subtag1 subattrib1='1'>
<subtext>text1</subtext>
</subtag1>
<subtag3 subattrib3='3'>
<subtext>text3</subtext>
</subtag3>
</tag>"""
parser = etree.XMLParser(remove_blank_text=True)
tag = etree.fromstring(XML, parser)
subtag1 = tag.find("subtag1")
subtag2 = etree.Element("subtag2", subattrib2="2")
subtext = etree.SubElement(subtag2, "subtext")
subtext.text = "text2"
subtag1.addnext(subtag2) # Add subtag2 as a following sibling of subtag1
print etree.tostring(tag, pretty_print=True)
Output:
<tag attrib1="I">
  <subtag1 subattrib1="1">
    <subtext>text1</subtext>
  </subtag1>
  <subtag2 subattrib2="2">
    <subtext>text2</subtext>
  </subtag2>
  <subtag3 subattrib3="3">
    <subtext>text3</subtext>
  </subtag3>
</tag>
Alternative: use insert() on the root element:
subtag2 = etree.Element("subtag2", subattrib2="2")
subtext = etree.SubElement(subtag2, "subtext")
subtext.text = "text2"
tag.insert(1, subtag2) # Add subtag2 as the second child (index 1) of the root element
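The hard-coded index 1 only works when the position is known in advance. As a rough sketch, the index can instead be derived from an existing sibling. The version below uses the standard library's xml.etree.ElementTree, which has insert() but no addnext(); the variable names position and order are chosen here for illustration.

```python
import xml.etree.ElementTree as ET

XML = """<tag attrib1='I'>
<subtag1 subattrib1='1'><subtext>text1</subtext></subtag1>
<subtag3 subattrib3='3'><subtext>text3</subtext></subtag3>
</tag>"""

tag = ET.fromstring(XML)
subtag1 = tag.find("subtag1")

# Build the new element the same way as in the answer above.
subtag2 = ET.Element("subtag2", subattrib2="2")
subtext = ET.SubElement(subtag2, "subtext")
subtext.text = "text2"

# Locate subtag1 among the children, then insert right after it.
position = list(tag).index(subtag1) + 1
tag.insert(position, subtag2)

order = [child.tag for child in tag]
print(order)
```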

Parsing WSDL (retrieve namespaces from the definitions) using ElementTree

I am trying to parse a WSDL file using ElementTree. As part of this, I'd like to retrieve all the namespaces from a given WSDL definitions element.
For instance, in the snippet below, I am trying to retrieve all the namespaces in the definitions tag:
<?xml version="1.0"?>
<definitions name="DateService" targetNamespace="http://dev-b.handel-dev.local:8080/DateService.wsdl" xmlns:tns="http://dev-b.handel-dev.local:8080/DateService.wsdl"
xmlns="http://schemas.xmlsoap.org/wsdl/" xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/" xmlns:myType="DateType_NS" xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/">
My code looks like this
import xml.etree.ElementTree as ET
xml_file='<path_to_my_wsdl>'
tree = ET.parse(xml_file)
rootElement = tree.getroot()
print (rootElement.tag) #{http://schemas.xmlsoap.org/wsdl/}definitions
print(rootElement.attrib) #targetNamespace="http://dev-b..../DateService.wsdl"
As I understand it, in ElementTree the namespace URI is combined with the local name of the element. How can I retrieve all the namespace entries from the definitions element?
Appreciate your help on this
P.S: I am new (very!) to python
>>> import xml.etree.ElementTree as etree
>>> from StringIO import StringIO
>>>
>>> s = """<?xml version="1.0"?>
... <definitions
... name="DateService"
... targetNamespace="http://dev-b.handel-dev.local:8080/DateService.wsdl"
... xmlns:tns="http://dev-b.handel-dev.local:8080/DateService.wsdl"
... xmlns="http://schemas.xmlsoap.org/wsdl/"
... xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
... xmlns:myType="DateType_NS"
... xmlns:xsd="http://www.w3.org/2001/XMLSchema"
... xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/">
... </definitions>"""
>>> file_ = StringIO(s)
>>> namespaces = []
>>> for event, elem in etree.iterparse(file_, events=('start-ns',)):
...     print elem
...
(u'tns', 'http://dev-b.handel-dev.local:8080/DateService.wsdl')
('', 'http://schemas.xmlsoap.org/wsdl/')
(u'soap', 'http://schemas.xmlsoap.org/wsdl/soap/')
(u'myType', 'DateType_NS')
(u'xsd', 'http://www.w3.org/2001/XMLSchema')
(u'wsdl', 'http://schemas.xmlsoap.org/wsdl/')
Inspired by the ElementTree documentation
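The transcript above is Python 2. A minimal Python 3 sketch of the same start-ns approach follows, collecting the pairs into a dict instead of printing them. The helper name collect_namespaces and the shortened WSDL sample are assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Shortened sample based on the question's <definitions> element.
WSDL = b"""<?xml version="1.0"?>
<definitions name="DateService"
    xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
</definitions>"""


def collect_namespaces(source):
    """Return {prefix: uri} for every namespace declaration in `source`.

    Hypothetical helper. The default namespace is reported under the
    empty-string prefix. If a prefix is redeclared, the last one wins.
    """
    namespaces = {}
    # For 'start-ns' events, iterparse yields a (prefix, uri) tuple.
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        namespaces[prefix] = uri
    return namespaces


namespaces = collect_namespaces(io.BytesIO(WSDL))
print(namespaces)
```

iterparse() also accepts a filename, so collect_namespaces can be called with the path to a WSDL file instead of an in-memory buffer.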
You can use lxml.
from lxml import etree
tree = etree.parse(file)
root = tree.getroot()
namespaces = root.nsmap
see https://stackoverflow.com/a/26807636/5375693
