Parsing SOAP/XML response in Python

I am trying to parse the XML below using Python. I do not know which type of XML this is, as I have never worked with it before. I got it from an API response from Microsoft.
Now my question is how to parse it and get the value of BinarySecurityToken in my Python code.
I referred to this question: Parse XML SOAP response with Python
But it looks like that one also uses an xmlns value to get the text. However, in my XML I can't see any nearby xmlns value through which I can get the value.
Please let me know how to get the value of a specific field from the XML below using Python.
<?xml version="1.0" encoding="utf-8" ?>
<S:Envelope xmlns:S="http://www.w3.org/2003/05/soap-envelope" xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd" xmlns:wsu="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd" xmlns:wsa="http://www.w3.org/2005/08/addressing">
<S:Header>
<wsa:Action xmlns:S="http://www.w3.org/2003/05/soap-envelope" xmlns:wsa="http://www.w3.org/2005/08/addressing" xmlns:wsu="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd" wsu:Id="Action" S:mustUnderstand="1">http://schemas.xmlsoap.org/ws/2005/02/trust/RSTR/Issue</wsa:Action>
<wsa:To xmlns:S="http://www.w3.org/2003/05/soap-envelope" xmlns:wsa="http://www.w3.org/2005/08/addressing" xmlns:wsu="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd" wsu:Id="To" S:mustUnderstand="1">http://schemas.xmlsoap.org/ws/2004/08/addressing/role/anonymous</wsa:To>
<wsse:Security S:mustUnderstand="1">
<wsu:Timestamp xmlns:wsu="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd" wsu:Id="TS">
<wsu:Created>2017-06-12T10:23:01Z</wsu:Created>
<wsu:Expires>2017-06-12T10:28:01Z</wsu:Expires>
</wsu:Timestamp>
</wsse:Security>
</S:Header>
<S:Body>
<wst:RequestSecurityTokenResponse xmlns:S="http://www.w3.org/2003/05/soap-envelope" xmlns:wst="http://schemas.xmlsoap.org/ws/2005/02/trust" xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd" xmlns:wsu="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd" xmlns:saml="urn:oasis:names:tc:SAML:1.0:assertion" xmlns:wsp="http://schemas.xmlsoap.org/ws/2004/09/policy" xmlns:psf="http://schemas.microsoft.com/Passport/SoapServices/SOAPFault">
<wst:TokenType>urn:passport:compact</wst:TokenType>
<wsp:AppliesTo xmlns:wsa="http://www.w3.org/2005/08/addressing">
<wsa:EndpointReference>
<wsa:Address>https://something.something.something.com</wsa:Address>
</wsa:EndpointReference>
</wsp:AppliesTo>
<wst:Lifetime>
<wsu:Created>2017-06-12T10:23:01Z</wsu:Created>
<wsu:Expires>2017-06-13T10:23:01Z</wsu:Expires>
</wst:Lifetime>
<wst:RequestedSecurityToken>
<wsse:BinarySecurityToken Id="Compact0">my token</wsse:BinarySecurityToken>
</wst:RequestedSecurityToken>
<wst:RequestedAttachedReference>
<wsse:SecurityTokenReference>
<wsse:Reference URI="wwwww=">
</wsse:Reference>
</wsse:SecurityTokenReference>
</wst:RequestedAttachedReference>
<wst:RequestedUnattachedReference>
<wsse:SecurityTokenReference>
<wsse:Reference URI="swsw=">
</wsse:Reference>
</wsse:SecurityTokenReference>
</wst:RequestedUnattachedReference>
</wst:RequestSecurityTokenResponse>
</S:Body>
</S:Envelope>

This declaration is part of the start tag of the root element:
xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd"
It means that elements with the wsse prefix (such as BinarySecurityToken) are in the http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd namespace.
The solution is basically the same as in the answer to the linked question. It's just another namespace:
import xml.etree.ElementTree as ET
tree = ET.parse('soap.xml')
print(tree.find('.//{http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd}BinarySecurityToken').text)
Here is another way of doing it:
import xml.etree.ElementTree as ET
ns = {"wsse": "http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd"}
tree = ET.parse('soap.xml')
print(tree.find('.//wsse:BinarySecurityToken', ns).text)
The output in both cases is my token.
See https://docs.python.org/2.7/library/xml.etree.elementtree.html#parsing-xml-with-namespaces.
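If the response arrives as a string from the API rather than as a file on disk, the same lookup can be sketched as follows. This is only an illustration: the helper name get_binary_security_token and the trimmed sample envelope are assumptions made for the example, not part of the original response handling.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map; the prefix name is arbitrary, only the URI has to match.
NS = {
    "wsse": "http://docs.oasis-open.org/wss/2004/01/"
            "oasis-200401-wss-wssecurity-secext-1.0.xsd",
}

# Trimmed, hypothetical stand-in for the real response text.
SAMPLE = """<S:Envelope xmlns:S="http://www.w3.org/2003/05/soap-envelope" xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd">
<S:Body>
<wst:RequestSecurityTokenResponse xmlns:wst="http://schemas.xmlsoap.org/ws/2005/02/trust">
<wst:RequestedSecurityToken>
<wsse:BinarySecurityToken Id="Compact0">my token</wsse:BinarySecurityToken>
</wst:RequestedSecurityToken>
</wst:RequestSecurityTokenResponse>
</S:Body>
</S:Envelope>"""


def get_binary_security_token(xml_text):
    """Return the token text, or None if the element is absent."""
    root = ET.fromstring(xml_text)
    node = root.find(".//wsse:BinarySecurityToken", NS)
    return node.text if node is not None else None


token = get_binary_security_token(SAMPLE)
print(token)
```

Checking for None before reading .text avoids an AttributeError when the service returns a fault instead of a token.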

Creating a namespace dict helped me. Thank you @mzjn for linking that article.
In my SOAP response, I found that I had to use the full path to the element to extract the text.
For example, I am working with the FedEx API, and one element that I needed to find was TrackDetails. My initial .find() looked like .find('{http://fedex.com/ws/track/v16}TrackDetails')
I was able to simplify this to the following:
ns = {'TrackDetails': 'http://fedex.com/ws/track/v16'}
tree.find('TrackDetails:TrackDetails',ns)
You see TrackDetails twice because I named the key TrackDetails in the dict, but you could name it anything you want. It just helped me remember what I was working on in my project; the TrackDetails after the : is the actual element in the SOAP response that I need.
Hope this helps someone!

Related

How can we parse xml data that contains nodes with xml namespace tags in python?

I am getting XML as a response, so I want to parse it. I tried many Python libraries but did not get the desired results. Any help would be really appreciated.
The following code returns an empty list:
xmlResponse = ET.fromstring(context.response_document)
a = xmlResponse.findall('.//Body')
print(a)
Sample XML Data:
<S:Envelope
xmlns:S="http://www.w3.org/2003/05/soap-envelope">
<S:Header>
<wsa:Action s:mustUnderstand="1"
xmlns:s="http://www.w3.org/2003/05/soap-envelope"
xmlns:wsa="http://www.w3.org/2005/08/addressing">urn:ihe:iti:2007:RegistryStoredQueryResponse
</wsa:Action>
</S:Header>
<S:Body>
<query:AdhocQueryResponse status="urn:oasis:names:tc:ebxml-regrep:ResponseStatusType:Success"
xmlns:query="urn:oasis:names:tc:ebxml-regrep:xsd:query:3.0">
<rim:RegistryObjectList
xmlns:rim="urn:oasis:names:tc:ebxml-regrep:xsd:rim:3.0"/>
</query:AdhocQueryResponse>
</S:Body>
</S:Envelope>
I want to get the status from it, which is in Body. If you can suggest some changes or another library, please help me. Thanks
Given the following base code:
import xml.etree.ElementTree as ET
root = ET.fromstring(xml)
Let's build on top of it to get your desired output.
Your initial search for the .//Body path finds nothing because no element with that exact name exists in your XML response: the Body element is namespace-qualified.
Each tag in your XML has a namespace associated with it.
Consider the following line with xmlns value (xml-namespace):
<S:Envelope xmlns:S="http://www.w3.org/2003/05/soap-envelope">
The value of namespace S is set to be http://www.w3.org/2003/05/soap-envelope.
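Replacing the S prefix with that value, wrapped in curly braces, gives the fully qualified tag name that ElementTree uses. Note that root is already the Envelope element, so this is the value of root.tag rather than something to search for:
print(root.tag) # {http://www.w3.org/2003/05/soap-envelope}Envelope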
We would need to do the same for <S:Body>.
To get the <S:Body> element and its child nodes you can do the following:
body_node = root.find('{http://www.w3.org/2003/05/soap-envelope}Body')
for response_child_node in list(body_node):
    print(response_child_node.tag) # tag of the child node
    print(response_child_node.get('status')) # the status you're looking for
Outputs:
{urn:oasis:names:tc:ebxml-regrep:xsd:query:3.0}AdhocQueryResponse
urn:oasis:names:tc:ebxml-regrep:ResponseStatusType:Success
Alternatively
You can also directly find all {query}AdhocQueryResponse in your XML using:
response_nodes = root.findall('.//{urn:oasis:names:tc:ebxml-regrep:xsd:query:3.0}AdhocQueryResponse')
for response in response_nodes:
    print(response.get('status'))
Outputs:
urn:oasis:names:tc:ebxml-regrep:ResponseStatusType:Success
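The same lookup can also be written with a prefix map instead of spelling out the URIs in braces. The following is a minimal sketch; the prefix names soap and query are arbitrary choices for the example, and the sample XML is a trimmed copy of the response from the question.

```python
import xml.etree.ElementTree as ET

# Prefixes here are local to this script; they need not match the document.
NS = {
    "soap": "http://www.w3.org/2003/05/soap-envelope",
    "query": "urn:oasis:names:tc:ebxml-regrep:xsd:query:3.0",
}

xml = """<S:Envelope xmlns:S="http://www.w3.org/2003/05/soap-envelope">
<S:Body>
<query:AdhocQueryResponse status="urn:oasis:names:tc:ebxml-regrep:ResponseStatusType:Success"
 xmlns:query="urn:oasis:names:tc:ebxml-regrep:xsd:query:3.0"/>
</S:Body>
</S:Envelope>"""

root = ET.fromstring(xml)

# root is the Envelope, so the path starts at its Body child.
response = root.find("soap:Body/query:AdhocQueryResponse", NS)
status = response.get("status") if response is not None else None
print(status)
```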

Don't get a usable XML string with .tostring when using xml.etree.ElementTree in python

I've tried a lot, but I haven't found a working solution to my problem, so I hope you can help me:
I am about to write a python module, which sends an XML request to a DNS server, receives an XML as response and should process this response. However, I am already failing in sending the request.
I have an XML base structure which I want to fill with different elements depending on the action to be performed on the DNS.
For this purpose I read in an XML string with .fromstring, edit the xml object and want to send it back to the server with .tostring. The problem is that .tostring does not return a usable xml string. The following example shows what I mean:
import xml.etree.ElementTree as ET
import requests
headers = {'HeaderSOAP': 'SOAPAction:urn:QIPServices#getEntry'}
body = """<soapenv:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:urn="urn:QIPServices">
<soapenv:Header/>
<soapenv:Body>
<urn:getEntry soapenv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
<login xsi:type="xsd:string">USERNAME</login>
<password xsi:type="xsd:string">PASSWD</password>
<sharedsecret xsi:type="xsd:string">SECRET</sharedsecret>
<VPN xsi:type="xsd:string">VPN</VPN>
<IPoderName xsi:type="xsd:string">IPorFQDN</IPoderName>
</urn:getEntry>
</soapenv:Body>
</soapenv:Envelope>
"""
root = ET.fromstring(body)
ThatsTheProblem = ET.tostring(root, encoding='utf-8')
print(ThatsTheProblem)
returns:
b'<ns0:Envelope xmlns:ns0="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:ns1="urn:QIPServices" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">\n <ns0:Header />\n <ns0:Body>\n
<ns1:getEntry
ns0:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">\n
<login xsi:type="xsd:string">USERNAME</login>\n <password
xsi:type="xsd:string">PASSWD</password>\n <sharedsecret
xsi:type="xsd:string">SECRET</sharedsecret>\n <VPN
xsi:type="xsd:string">VPN</VPN>\n <IPoderName
xsi:type="xsd:string">IPorFQDN</IPoderName>\n
</ns1:getEntry>\n </ns0:Body>\n </ns0:Envelope>'
Without having changed anything, the import and output not only changed the complete formatting, there are also whitespaces everywhere in the file. When I send this XML to the server using
response = requests.post(url,data=ThatsTheProblem, headers=headers)
I get the following answer:
Application failed during request deserialization: Unresolved prefix \'xsd\' for attribute value \'xsd:string\'\n
Which I attribute to the problem I described at the beginning.
If anyone has a solution to this problem I would be very grateful.
Thanks and have a nice day.
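The error message points at the xsd prefix: it is used only inside attribute values (xsd:string), which ElementTree does not inspect, so its declaration is dropped on output. One possible workaround, sketched below under assumptions, is to register the wanted prefixes and re-declare xsd by hand. It relies on the CPython serializer writing an attribute name without braces verbatim, which is observed behaviour rather than a documented guarantee; the request body here is a shortened, hypothetical version of the one above.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Shortened stand-in for the real request body.
body = """<soapenv:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xmlns:xsd="http://www.w3.org/2001/XMLSchema"
 xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
 xmlns:urn="urn:QIPServices">
<soapenv:Body>
<urn:getEntry>
<login xsi:type="xsd:string">USERNAME</login>
</urn:getEntry>
</soapenv:Body>
</soapenv:Envelope>"""

# Keep the original prefixes instead of ns0/ns1 (note: this is global state).
ET.register_namespace("soapenv", "http://schemas.xmlsoap.org/soap/envelope/")
ET.register_namespace("urn", "urn:QIPServices")

root = ET.fromstring(body)

# ElementTree cannot see that "xsd" is used inside an attribute value,
# so declare it again explicitly on the root element.
root.set("xmlns:xsd", XSD_NS)

request_xml = ET.tostring(root, encoding="utf-8")
print(request_xml.decode("utf-8"))
```

A library that preserves prefix declarations on round-trip, such as lxml, may be a more robust choice if it is available.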

XPath with LXML Element

I am trying to parse an XML document using lxml etree. The XML doc I am parsing looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<metadata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.openarchives.org/OAI/2.0/">
<codeBook version="2.5" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="ddi:codebook:2_5" xsi:schemaLocation="ddi:codebook:2_5 http://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/codebook.xsd">
<docDscr>
<citation>
<titlStmt>
<titl>Test Title</titl>
</titlStmt>
<prodStmt>
<prodDate/>
</prodStmt>
</citation>
</docDscr>
<stdyDscr>
<citation>
<titlStmt>
<titl>Test Title 2</titl>
<IDNo agency="UKDA">101</IDNo>
</titlStmt>
<rspStmt>
<AuthEnty>TestAuthEntry</AuthEnty>
</rspStmt>
<prodStmt>
<copyright>Yes</copyright>
</prodStmt>
<distStmt/>
<verStmt>
<version date="">1</version>
</verStmt>
</citation>
<stdyInfo>
<subject>
<keyword>2009</keyword>
<keyword>2010</keyword>
<topcClas>CLASS</topcClas>
<topcClas>ffdsf</topcClas>
</subject>
<abstract>This is an abstract piece of text.</abstract>
<sumDscr>
<timePrd event="single">2020</timePrd>
<nation>UK</nation>
<anlyUnit>Test</anlyUnit>
<universe>test</universe>
<universe>hello</universe>
<dataKind>fdsfdsf</dataKind>
</sumDscr>
</stdyInfo>
<method>
<dataColl>
<timeMeth>test timemeth</timeMeth>
<dataCollector>test data collector</dataCollector>
<sampProc>test sampprocess</sampProc>
<deviat>test deviat</deviat>
<collMode>test collMode</collMode>
<sources/>
</dataColl>
</method>
<dataAccs>
<setAvail>
<accsPlac>Test accsPlac</accsPlac>
</setAvail>
<useStmt>
<restrctn>NONE</restrctn>
</useStmt>
</dataAccs>
<othrStdyMat>
<relPubl>122</relPubl>
<relPubl>12332</relPubl>
</othrStdyMat>
</stdyDscr>
</codeBook>
</metadata>
I wrote the following code to try and process it:
from lxml import etree
import pdb
f = open('/vagrant/out2.xml', 'r')
xml_str = f.read()
xml_doc = etree.fromstring(xml_str)
f.close()
From what I understand from the lxml xpath docs, I should be able to get the text from a specific element as follows:
xml_doc.xpath('/metadata/codeBook/docDscr/citation/titlStmt/titl/text()')
However, when I run this it returns an empty array.
The only xpath I can get to return something is using a wildcard:
xml_doc.xpath('*')
Which returns [<Element {ddi:codebook:2_5}codeBook at 0x7f8da8a413f8>].
I've read through the docs and I'm not understanding what is going wrong with this. Any help is appreciated.
You need to take the default namespaces into account, so instead of
xml_doc.xpath('/metadata/codeBook/docDscr/citation/titlStmt/titl/text()')
use
xml_doc.xpath(
    '/oai:metadata/ddi:codeBook/ddi:docDscr/ddi:citation/ddi:titlStmt/ddi:titl/text()',
    namespaces={
        'oai': 'http://www.openarchives.org/OAI/2.0/',
        'ddi': 'ddi:codebook:2_5'
    }
)
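For comparison, here is a rough sketch of the same idea using the standard library's xml.etree.ElementTree instead of lxml. Its path support is more limited (no text() step), so findtext is used; the document is a trimmed copy of the one in the question, and the prefixes oai and ddi are arbitrary labels chosen for the example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the document: two nested default namespaces.
xml_text = """<metadata xmlns="http://www.openarchives.org/OAI/2.0/">
<codeBook version="2.5" xmlns="ddi:codebook:2_5">
<docDscr><citation><titlStmt><titl>Test Title</titl></titlStmt></citation></docDscr>
<stdyDscr><citation><titlStmt><titl>Test Title 2</titl></titlStmt></citation></stdyDscr>
</codeBook>
</metadata>"""

ns = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "ddi": "ddi:codebook:2_5",
}

root = ET.fromstring(xml_text)  # root is the oai:metadata element

# Relative path from the root; every step needs the ddi prefix.
doc_title = root.findtext(
    "ddi:codeBook/ddi:docDscr/ddi:citation/ddi:titlStmt/ddi:titl",
    namespaces=ns,
)

# All titl elements anywhere below the root, in document order.
all_titles = [t.text for t in root.iterfind(".//ddi:titl", ns)]

print(doc_title)
print(all_titles)
```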

Python - parse xml with lxml trouble

I've found a lot of questions on this issue but nothing I saw fits mine. I'm new to lxml so need some help.
my users.xml file:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<user>
<login>elena</login>
<password>elena</password>
<group>1</group>
</user>
<user>
<login>anele</login>
<password>anele</password>
<group>2</group>
</user>
</root>
the trouble function:
def analize_data(login):
    doc = etree.parse("/myapp/users.xml")
    for elem in doc.iter(tag='login'):
        if elem.text == login:
            parent = elem.getparent()
            group = etree.SubElement(parent, 'group')
            return group.text
What I need:
to find a user tag with login passed to function and get the text of group subelement of this user. But this function returns None when testing. What am I doing wrong and how to fix it?
I'm new to all these things, so need help. Thanks in advance!
Try using:
group = next(parent.iterchildren(tag="group"))
etree.SubElement does something completely different:
This function creates an element instance, and appends it to an existing element.
Which is clearly not what you want.
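An alternative that avoids the parent lookup altogether is to iterate over the user elements and read both children with findtext. The sketch below uses the standard library's ElementTree so it runs without lxml (iter and findtext are also available on lxml elements); the function name find_group and the inline XML are assumptions for the example.

```python
import xml.etree.ElementTree as ET

USERS_XML = """<root>
<user><login>elena</login><password>elena</password><group>1</group></user>
<user><login>anele</login><password>anele</password><group>2</group></user>
</root>"""


def find_group(xml_text, login):
    """Return the group text of the user with this login, or None."""
    root = ET.fromstring(xml_text)
    for user in root.iter("user"):
        if user.findtext("login") == login:
            # Reads the existing <group> child; nothing is created.
            return user.findtext("group")
    return None


print(find_group(USERS_XML, "elena"))
```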

Element Tree doesn't load a Google Earth-exported KML

I have a problem related to a Google Earth exported KML, as it doesn't seem to work well with Element Tree. I don't have a clue where the problem might lie, so I will explain how I do everything.
Here is the relevant code:
kmlFile = open( filePath, 'r' ).read( -1 ) # read the whole file as text
kmlFile = kmlFile.replace( 'gx:', 'gx' ) # we need this as otherwise the Element Tree parser
# will give an error
kmlData = ET.fromstring( kmlFile )
document = kmlData.find( 'Document' )
With this code, ET (Element Tree object) creates an Element object accessible via variable kmlData. It points to the root element ('kml' tag). However, when I run a search for the sub-element 'Document', it returns None. Although the 'Document' tag is present in the KML file!
Are there any other discrepancies between KML and XML apart from the 'gx: smth' tags? I have searched through the KML files I am dealing with and found nothing suspicious. Here is a simplified structure of a KML file the program is supposed to deal with:
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.2">
<Document>
<name>UK.kmz</name>
<Style id="sh_blu-blank">
<IconStyle>
<scale>1.3</scale>
<Icon>
<href>http://maps.google.com/mapfiles/kml/paddle/blu-blank.png</href>
</Icon>
<hotSpot x="32" y="1" xunits="pixels" yunits="pixels"/>
</IconStyle>
<ListStyle>
<ItemIcon>
<href>http://maps.google.com/mapfiles/kml/paddle/blu-blank-lv.png</href>
</ItemIcon>
</ListStyle>
</Style>
[other style tags...]
<Folder>
<name>UK</name>
<Placemark>
<name>1262 Crossness Pumping Station</name>
<LookAt>
<longitude>0.1329926667038817</longitude>
<latitude>51.50303535104574</latitude>
<altitude>0</altitude>
<range>4246.539753518848</range>
<tilt>0</tilt>
<heading>-4.295161152207489</heading>
<altitudeMode>relativeToGround</altitudeMode>
<gx:altitudeMode>relativeToSeaFloor</gx:altitudeMode>
</LookAt>
<styleUrl>#msn_blu-blank15000</styleUrl>
<Point>
<coordinates>0.1389579668507301,51.50888923518947,0</coordinates>
</Point>
</Placemark>
[other placemark tags...]
</Folder>
</Document>
</kml>
Do you have an idea why I can't access any sub-elements of 'kml'? By the way, Python version is 2.7.
The KML document is in the http://earth.google.com/kml/2.2 namespace, as indicated by
<kml xmlns="http://earth.google.com/kml/2.2">
This means that the name of the Document element is in fact {http://earth.google.com/kml/2.2}Document.
Instead of this:
document = kmlData.find('Document')
you need this:
document = kmlData.find('{http://earth.google.com/kml/2.2}Document')
However, there is a problem with the XML file. There is an element called gx:altitudeMode. The gx bit is a namespace prefix. Such a prefix needs to be declared, but the declaration is missing.
You have worked around the problem by simply replacing gx: with gx. But the proper way to do this would be to add the namespace declaration. Based on https://developers.google.com/kml/documentation/altitudemode, I take it that gx is associated with the http://www.google.com/kml/ext/2.2 namespace. So for the document to be well-formed, the root element start tag should read
<kml xmlns="http://earth.google.com/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
Now the document can be parsed:
In [1]: from xml.etree import ElementTree as ET
In [2]: kmlData = ET.parse("kml2.xml")
In [3]: document = kmlData.find('{http://earth.google.com/kml/2.2}Document')
In [4]: document
Out[4]: <Element '{http://earth.google.com/kml/2.2}Document' at 0x1895810>
In [5]:
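Once both namespaces are declared, a prefix map keeps the lookups short. The following is a minimal sketch on a cut-down, inline version of the file; the prefix kml in the map is an arbitrary label for the default namespace, and the gx URI is the one assumed in the answer above.

```python
import xml.etree.ElementTree as ET

KML = """<kml xmlns="http://earth.google.com/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
<Document>
<name>UK.kmz</name>
<Folder>
<Placemark>
<name>1262 Crossness Pumping Station</name>
<LookAt><gx:altitudeMode>relativeToSeaFloor</gx:altitudeMode></LookAt>
</Placemark>
</Folder>
</Document>
</kml>"""

ns = {
    "kml": "http://earth.google.com/kml/2.2",
    "gx": "http://www.google.com/kml/ext/2.2",
}

root = ET.fromstring(KML)  # root is the kml element
document = root.find("kml:Document", ns)

# Names of all placemarks anywhere below Document.
placemark_names = [
    p.findtext("kml:name", namespaces=ns)
    for p in document.iterfind(".//kml:Placemark", ns)
]
altitude_mode = document.findtext(".//gx:altitudeMode", namespaces=ns)

print(placemark_names)
print(altitude_mode)
```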
