I am trying to parse a WSDL file using ElementTree. As part of this, I'd like to retrieve all the namespaces from a given WSDL definitions element.
For instance, in the snippet below, I am trying to retrieve all the namespaces declared on the definitions tag:
<?xml version="1.0"?>
<definitions name="DateService" targetNamespace="http://dev-b.handel-dev.local:8080/DateService.wsdl" xmlns:tns="http://dev-b.handel-dev.local:8080/DateService.wsdl"
xmlns="http://schemas.xmlsoap.org/wsdl/" xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/" xmlns:myType="DateType_NS" xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/">
My code looks like this
import xml.etree.ElementTree as ET
xml_file='<path_to_my_wsdl>'
tree = ET.parse(xml_file)
rootElement = tree.getroot()
print (rootElement.tag) #{http://schemas.xmlsoap.org/wsdl/}definitions
print(rootElement.attrib) #targetNamespace="http://dev-b..../DateService.wsdl"
As I understand it, ElementTree combines the namespace URI with the local name of the element. How can I retrieve all the namespace entries from the definitions element?
I appreciate your help with this.
P.S.: I am (very!) new to Python.
>>> import xml.etree.ElementTree as etree
>>> from StringIO import StringIO
>>>
>>> s = """<?xml version="1.0"?>
... <definitions
... name="DateService"
... targetNamespace="http://dev-b.handel-dev.local:8080/DateService.wsdl"
... xmlns:tns="http://dev-b.handel-dev.local:8080/DateService.wsdl"
... xmlns="http://schemas.xmlsoap.org/wsdl/"
... xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
... xmlns:myType="DateType_NS"
... xmlns:xsd="http://www.w3.org/2001/XMLSchema"
... xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/">
... </definitions>"""
>>> file_ = StringIO(s)
>>> namespaces = []
>>> for event, elem in etree.iterparse(file_, events=('start-ns',)):
... print elem
...
(u'tns', 'http://dev-b.handel-dev.local:8080/DateService.wsdl')
('', 'http://schemas.xmlsoap.org/wsdl/')
(u'soap', 'http://schemas.xmlsoap.org/wsdl/soap/')
(u'myType', 'DateType_NS')
(u'xsd', 'http://www.w3.org/2001/XMLSchema')
(u'wsdl', 'http://schemas.xmlsoap.org/wsdl/')
Inspired by the ElementTree documentation
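The session above is Python 2. A minimal Python 3 sketch of the same start-ns technique follows, assuming you want the result as a prefix-to-URI dict; the helper name collect_namespaces and the shortened WSDL string are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Shortened stand-in for the WSDL from the question.
WSDL = b"""<?xml version="1.0"?>
<definitions name="DateService"
    targetNamespace="http://dev-b.handel-dev.local:8080/DateService.wsdl"
    xmlns:tns="http://dev-b.handel-dev.local:8080/DateService.wsdl"
    xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
</definitions>"""


def collect_namespaces(source):
    """Return {prefix: uri} for every namespace declaration in the document.

    Hypothetical helper: the default namespace is stored under the key ''.
    If a prefix is declared more than once, the last declaration wins.
    """
    namespaces = {}
    # With events=("start-ns",), iterparse yields (prefix, uri) tuples.
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        namespaces[prefix] = uri
    return namespaces


namespaces = collect_namespaces(io.BytesIO(WSDL))
for prefix, uri in namespaces.items():
    print(repr(prefix), uri)
```

Note that this collects declarations from the whole document, not only from the root element; for a WSDL where everything is declared on definitions, the two are the same.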
You can use lxml.
from lxml import etree
tree = etree.parse(file)
root = tree.getroot()
namespaces = root.nsmap
see https://stackoverflow.com/a/26807636/5375693
I want to search for a specific word (entered by the user) in an .xml file. This is my XML file:
<?xml version="1.0" encoding="UTF-8"?>
<words>
<entry>
<word>John</word>
<pron>()</pron>
<gram>[Noun]</gram>
<poem></poem>
<meanings>
<meaning>name</meaning>
</meanings>
</entry>
</words>
Here is my code:
import nltk
from nltk.tokenize import word_tokenize
import os
import xml.etree.ElementTree as etree
sen = input("Enter Your sentence - ")
print(sen)
print("\n")
print(word_tokenize(sen)[0])
tree = etree.parse('roman.xml')
node=etree.fromstring(tree)
#node=etree.fromstring('<a><word>waya</word><gram>[Noun]</gram><meaning>talking</meaning></a>')
s = node.findtext(word_tokenize(sen)[0])
print(s)
I have tried everything, but it still gives me this error:
a bytes-like object is required, not 'ElementTree'
I really don't know how to solve it.
The error happens because you are passing an ElementTree object to the fromstring() method, which expects a string or bytes. Call getroot() on the parsed tree instead:
>>> import os
>>> import xml.etree.ElementTree as etree
>>> a = etree.parse('a.xml')
>>> a
<xml.etree.ElementTree.ElementTree object at 0x10fcabeb8>
>>> b = a.getroot()
>>> b
<Element 'words' at 0x10fb21f48>
>>> b[0][0].text
'John'
Use the find() and findall() methods to search.
For more information, see the library documentation: https://docs.python.org/3/library/xml.etree.elementtree.html
Simple example:
test.xml
<?xml version="1.0" encoding="UTF-8"?>
<words>
<word value="John"></word>
<word value="Mike"></word>
<word value="Scott"></word>
</words>
example.py
>>> import xml.etree.ElementTree as ET
>>> root = ET.parse("test.xml")
>>> search = root.findall(".//word[@value='John']")
>>> search
[<Element 'word' at 0x10be9c868>]
>>> search[0].attrib
{'value': 'John'}
>>> search[0].tag
'word'
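For the XML layout in the question, where the word is element text rather than an attribute, a lookup could be sketched as follows. This is only an illustration: the lookup helper and the returned dict shape are assumptions, and the XML is inlined (trimmed to the relevant elements) instead of being read from roman.xml.

```python
import xml.etree.ElementTree as ET

# Inline copy of the structure from the question, trimmed to the relevant elements.
XML = """<words>
  <entry>
    <word>John</word>
    <gram>[Noun]</gram>
    <meanings>
      <meaning>name</meaning>
    </meanings>
  </entry>
</words>"""


def lookup(root, term):
    """Return grammar and meanings for the first entry whose <word> equals term.

    Hypothetical helper; returns None when no entry matches.
    """
    for entry in root.iter("entry"):
        # findtext() returns the text of the first matching child, or None.
        if entry.findtext("word") == term:
            return {
                "gram": entry.findtext("gram"),
                "meanings": [m.text for m in entry.iter("meaning")],
            }
    return None


root = ET.fromstring(XML)  # with a file: ET.parse(path).getroot()
result = lookup(root, "John")
print(result)
```

The match is exact and case-sensitive; normalizing both sides (for example with str.lower()) would be a small change if the input comes from a tokenizer.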
I have parsed an XML file to get all its elements. I am getting the following output
[<Element '{urn:mitel:params:xml:ns:yang:vld}vld-list' at 0x0000000003059188>, <Element '{urn:mitel:params:xml:ns:yang:vld}vl-id' at 0x00000000030689F8>, <Element '{urn:mitel:params:xml:ns:yang:vld}descriptor-version' at 0x0000000003068A48>]
I need to select the value between } and ' only for each element of the list.
This is my Code till now :
import xml.etree.ElementTree as ET
tree = ET.parse('UMR_VLD01_OAM_V6-Provider_eth0.xml')
root = tree.getroot()
# all items
print('\nAll item data:')
for elem in root:
    all_descendants = list(elem.iter())
    print(all_descendants)
How can i achieve this ?
The text in {} is the namespace part of the qualified name (QName) of the XML element. AFAIK there is no method in ElementTree to return only the local name. So, you have to either
extract the local part of the name with string handling, as already proposed in a comment to your question,
use lxml.etree instead of xml.etree.ElementTree and apply xpath('local-name()') on each element,
or provide an XML source without namespace. You can strip the namespace with XSLT.
So, given this XML input:
<?xml version="1.0" encoding="UTF-8"?>
<foo xmlns="urn:mitel:params:xml:ns:yang:vld">
<bar>
<baz x="1"/>
<yet>
<more>
<nested/>
</more>
</yet>
</bar>
<bar/>
</foo>
You can print a list of the local names only with this variation of your program:
import xml.etree.ElementTree as ET
tree = ET.parse('UMR_VLD01_OAM_V6-Provider_eth0.xml')
root = tree.getroot()
# all items
print('\nAll item data:')
for elem in root:
    all_descendants = [e.tag.split('}', 1)[1] for e in elem.iter()]
    print(all_descendants)
Output:
['bar', 'baz', 'yet', 'more', 'nested']
['bar']
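One caveat with split('}', 1)[1] is that it raises IndexError for an element without a namespace, because the tag then contains no '}'. A slightly more defensive sketch, assuming mixed documents are possible (the helper name local_name and the sample XML are made up for illustration):

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a leading '{namespace}' from an ElementTree tag, if present.

    Hypothetical helper: rsplit leaves un-namespaced tags unchanged.
    """
    return tag.rsplit("}", 1)[-1]


# Sample document mixing namespaced and un-namespaced elements.
XML = """<root xmlns:v="urn:example:vld">
  <v:bar><v:baz/></v:bar>
  <plain/>
</root>"""

root = ET.fromstring(XML)
names = [local_name(e.tag) for e in root.iter()]
print(names)
```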
The version with lxml.etree and xpath('local-name()') looks like this:
import lxml.etree as ET
tree = ET.parse('UMR_VLD01_OAM_V6-Provider_eth0.xml')
root = tree.getroot()
# all items
print('\nAll item data:')
for elem in root:
    all_descendants = [e.xpath('local-name()') for e in elem.iter()]
    print(all_descendants)
The output is the same as with the string handling version.
For stripping the namespace completely from your input, you can apply this XSLT:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="*">
<xsl:element name="{local-name()}">
<xsl:copy-of select="@*"/>
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
Then your original program outputs:
[<Element 'bar' at 0x04583B40>, <Element 'baz' at 0x04583B70>, <Element 'yet' at 0x04583BD0>, <Element 'more' at 0x04583C30>, <Element 'nested' at 0x04583C90>]
[<Element 'bar' at 0x04583CC0>]
Now the elements themselves do not bear a namespace. So, you don't have to strip it anymore.
You can apply the XSLT with xsltproc; then you don't need to change your program. Alternatively, you can apply the XSLT in Python, but this also requires lxml.etree. The last variation of your program then looks like this:
import lxml.etree as ET
tree = ET.parse('UMR_VLD01_OAM_V6-Provider_eth0.xml')
xslt = ET.parse('stripns.xslt')
transform = ET.XSLT(xslt)
tree = transform(tree)
root = tree.getroot()
# all items
print('\nAll item data:')
for elem in root:
    all_descendants = list(elem.iter())
    print(all_descendants)
I have the following Input XML:
<?xml version="1.0" encoding="utf-8"?>
<Scenario xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="Scenario.xsd">
<TestCase>test_startup_0029</TestCase>
<ShortDescription>Restart of the EVC with missing ODO5 board.</ShortDescription>
<Events>
<Event Num="1">Switch on the EVC</Event>
</Events>
<HW-configuration>
<ELBE5A>true</ELBE5A>
<ELBE5K>false</ELBE5K>
</HW-configuration>
<SystemFailure>true</SystemFailure>
</Scenario>
My program adds three tags to the XML, but they are not formatted correctly.
The Output XML looks like the following:
<Scenario xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="Scenario.xsd">
<TestCase>test_startup_0029</TestCase>
<ShortDescription>Restart of the EVC with missing ODO5 board.</ShortDescription>
<Events>
<Event Num="1">Switch on the EVC</Event>
</Events>
<HW-configuration>
<ELBE5A>true</ELBE5A>
<ELBE5K>false</ELBE5K>
</HW-configuration>
<SystemFailure>true</SystemFailure>
<Duration>12</Duration><EVC-SW-Version>08.02.0001.0027</EVC-SW-Version><STAC-Release>08.02.0001.0027</STAC-Release></Scenario>
That's my source code:
class XmlManager:
    @staticmethod
    def write_xml(xml_path, duration, evc_sw_version):
        xml_path = os.path.abspath(xml_path)
        if os.path.isfile(xml_path) and xml_path.endswith(".xml"):
            # parse XML into etree
            root = etree.parse(xml_path).getroot()
            # add tags
            duration_tag = etree.SubElement(root, "Duration")
            duration_tag.text = duration
            sw_version_tag = etree.SubElement(root, "EVC-SW-Version")
            sw_version_tag.text = evc_sw_version
            stac_release = evc_sw_version
            stac_release_tag = etree.SubElement(root, "STAC-Release")
            stac_release_tag.text = stac_release
            # write changes to the XML-file
            tree = etree.ElementTree(root)
            tree.write(xml_path, pretty_print=False)
        else:
            XmlManager.logger.log("Invalid path to XML-file")

def main():
    xml = r".\Test_Input_Data_Base\blnmerf1_md1czjyc_REL_V_08.01.0001.000x\Test_startup_0029\Test_startup_0029.xml"
    XmlManager.write_xml(xml, "12", "08.02.0001.0027")
My question is how to add the new tags to the XML in the right format. I assume the modified XML can still be parsed again as it is, but it is not nicely formatted. Any ideas? Thanks in advance.
To ensure nice pretty-printed output, you need to do two things:
Parse the input file using an XMLParser object with remove_blank_text=True.
Write the output using pretty_print=True
Example:
from lxml import etree
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse("Test_startup_0029.xml", parser)
root = tree.getroot()
duration_tag = etree.SubElement(root, "Duration")
duration_tag.text = "12"
sw_version_tag = etree.SubElement(root, "EVC-SW-Version")
sw_version_tag.text = "08.02.0001.0027"
stac_release_tag = etree.SubElement(root, "STAC-Release")
stac_release_tag.text = "08.02.0001.0027"
tree.write("output.xml", pretty_print=True)
Contents of output.xml:
<Scenario xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="Scenario.xsd">
<TestCase>test_startup_0029</TestCase>
<ShortDescription>Restart of the EVC with missing ODO5 board.</ShortDescription>
<Events>
<Event Num="1">Switch on the EVC</Event>
</Events>
<HW-configuration>
<ELBE5A>true</ELBE5A>
<ELBE5K>false</ELBE5K>
</HW-configuration>
<SystemFailure>true</SystemFailure>
<Duration>12</Duration>
<EVC-SW-Version>08.02.0001.0027</EVC-SW-Version>
<STAC-Release>08.02.0001.0027</STAC-Release>
</Scenario>
See also http://lxml.de/FAQ.html#why-doesn-t-the-pretty-print-option-reformat-my-xml-output.
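If lxml is not available, the standard library offers a comparable route: xml.etree.ElementTree.indent(), added in Python 3.9, re-indents a tree in place. The sketch below swaps in that stdlib function and is not part of the lxml answer above; the XML is a trimmed, inline version of the question's file.

```python
import xml.etree.ElementTree as ET

# Trimmed, inline version of the input file from the question.
XML = """<Scenario>
  <TestCase>test_startup_0029</TestCase>
  <SystemFailure>true</SystemFailure>
</Scenario>"""

root = ET.fromstring(XML)

# Append the new tags; without re-indenting they would end up on one line.
ET.SubElement(root, "Duration").text = "12"
ET.SubElement(root, "EVC-SW-Version").text = "08.02.0001.0027"

# Requires Python 3.9+: rewrites whitespace-only text/tail in place.
ET.indent(root, space="  ")

output = ET.tostring(root, encoding="unicode")
print(output)
```

To write to a file instead, wrap the root with ET.ElementTree(root) and call its write() method after indenting.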
I need to parse an XML file, let's say called example.xml, that looks like the following:
<?xml version="1.0" encoding="ISO-8859-1"?>
<nf:rpc-reply xmlns:nf="urn:ietf:params:xml:ns:netconf:base:1.0" xmlns="http://www.cisco.com/nxos:1.0:if_manager">
<nf:data>
<show>
<interface>
<__XML__OPT_Cmd_show_interface___readonly__>
<__readonly__>
<TABLE_interface>
<ROW_interface>
<interface>Ethernet1/1</interface>
<state>down</state>
<state_rsn_desc>Link not connected</state_rsn_desc>
<admin_state>up</admin_state>
I need to get "interface", and "state" elements as such: ['Ethernet1/1', 'down']
Here is my solution, which doesn't work:
from lxml import etree
parser = etree.XMLParser()
tree = etree.parse('example.xml', parser)
print tree.xpath('//*/*/*/*/*/*/*/*/interface/text()')
print tree.xpath('//*/*/*/*/*/*/*/*/state/text()')
You need to handle namespaces here:
from lxml import etree
parser = etree.XMLParser()
tree = etree.parse('example.xml', parser)
ns = {'ns': 'http://www.cisco.com/nxos:1.0:if_manager'}
interface = tree.find('//ns:ROW_interface', namespaces=ns)
print [interface.find('.//ns:interface', namespaces=ns).text,
interface.find('.//ns:state', namespaces=ns).text]
Prints:
['Ethernet1/1', 'down']
Using collections.namedtuple():
from collections import namedtuple
interface_node = tree.find('//ns:ROW_interface', ns)
Interface = namedtuple('Interface', ['interface', 'state'])
interface = Interface(interface=interface_node.find('.//ns:interface', ns).text,
                      state=interface_node.find('.//ns:state', ns).text)
print interface
Prints:
Interface(interface='Ethernet1/1', state='down')
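The same prefix-mapping idea also works with the standard library's ElementTree under Python 3. A minimal sketch, assuming a closed-off, shortened copy of the document (the question's XML is truncated, so the closing tags here are filled in for illustration only) and an arbitrary prefix "ns" chosen for the default namespace:

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed stand-in for example.xml.
XML = """<nf:rpc-reply xmlns:nf="urn:ietf:params:xml:ns:netconf:base:1.0"
              xmlns="http://www.cisco.com/nxos:1.0:if_manager">
  <nf:data>
    <show>
      <interface>
        <TABLE_interface>
          <ROW_interface>
            <interface>Ethernet1/1</interface>
            <state>down</state>
          </ROW_interface>
        </TABLE_interface>
      </interface>
    </show>
  </nf:data>
</nf:rpc-reply>"""

# The prefix "ns" is arbitrary; only the URI has to match the document.
NS = {"ns": "http://www.cisco.com/nxos:1.0:if_manager"}

root = ET.fromstring(XML)
rows = [
    [row.findtext("ns:interface", namespaces=NS),
     row.findtext("ns:state", namespaces=NS)]
    for row in root.iterfind(".//ns:ROW_interface", NS)
]
print(rows)
```

Iterating over every ROW_interface, rather than taking only the first match, covers outputs that list more than one interface.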
Here's the code I have:
from cStringIO import StringIO
from lxml import etree
xml = StringIO('''<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
<!ENTITY test "This is a test">
]>
<root>
<sub>&test;</sub>
</root>''')
d1 = etree.parse(xml)
print '%r' % d1.find('/sub').text
parser = etree.XMLParser(resolve_entities=False)
d2 = etree.parse(xml, parser=parser)
print '%r' % d2.find('/sub').text
Here's the output:
'This is a test'
None
How do I get lxml to give me '&test;', i.e., the raw entity reference?
The "unresolved" entity is left as a child node of the sub element:
>>> print d2.find('/sub')[0]
&test;
>>> d2.find('/sub').getchildren()
[&test;]