Parse XML SOAP response with Python - python

I want parse this response from SOAP and extract text between <LoginResult> :
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<LoginResponse xmlns="http://tempuri.org/wsSalesQuotation/Service1">
<LoginResult>45eeadF43423KKmP33</LoginResult>
</LoginResponse>
</soap:Body>
</soap:Envelope>
How I can do it using XML Python Libs?

import xml.etree.ElementTree as ET
tree = ET.parse('soap.xml')
print tree.find('.//{http://tempuri.org/wsSalesQuotation/Service1}LoginResult').text
>>45eeadF43423KKmP33
instead of print, do something useful to it.

Related

Remove unwanted tags from XML file

I working on a XML file that contains soap tags in it. I want to remove those soap tags as part of XML cleanup process.
How can I achieve it in either Python or Scala. Should not use shell script.
Sample Input :
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://sample.com/">
<soap:Body>
<com:RESPONSE xmlns:com="http://sample.com/">
<Student>
<StudentID>100234</StudentID>
<Gender>Male</Gender>
<Surname>Robert</Surname>
<Firstname>Mathews</Firstname>
</Student>
</com:RESPONSE>
</soap:Body>
</soap:Envelope>
Expected Output :
<?xml version="1.0" encoding="UTF-8"?>
<com:RESPONSE xmlns:com="http://sample.com/">
<Student>
<StudentID>100234</StudentID>
<Gender>Male</Gender>
<Surname>Robert</Surname>
<Firstname>Mathews</Firstname>
</Student>
</com:RESPONSE>
This could help you!
from lxml import etree
doc = etree.parse('test.xml')
for ele in doc.xpath('//soap'):
parent = ele.getparent()
parent.remove(ele)
print(etree.tostring(doc))

How to fetch data from multiple namespaces in a soap response using python: xml.etree.ElementTree

How to fetch value of through the correct xpath using python xml.etree.ElementTree
This is the 'soap.xml' file
<?xml version="1.0" encoding="UTF-8"?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soapenv:Header>
<platformMsgs:documentInfo xmlns:platformMsgs="urn:messages_2014_1.platform.webservices.netsuite.com">
<platformMsgs:nsId>WEBSERVICES_3883026_SB1_103120188552030381995151284_f039e9cc</platformMsgs:nsId>
</platformMsgs:documentInfo>
</soapenv:Header>
<soapenv:Body>
<searchResponse xmlns="urn:messages_2014_1.platform.webservices.netsuite.com">
<platformCore:searchResult xmlns:platformCore="urn:core_2014_1.platform.webservices.netsuite.com">
<platformCore:status isSuccess="true"/>
<platformCore:totalRecords>200</platformCore:totalRecords>
<platformCore:pageSize>1000</platformCore:pageSize>
<platformCore:totalPages>1</platformCore:totalPages>
<platformCore:pageIndex>1</platformCore:pageIndex>
<platformCore:searchId>WEBSERVICES_3883026_SB1_103120188552030381995151284_f039e9cc</platformCore:searchId>
</platformCore:searchResult>
</searchResponse>
</soapenv:Body>
</soapenv:Envelope>
Here is my code.
import xml.etree.ElementTree as ET
tree = ET.parse('soap.xml')
print tree.findall('.//{http://schemas.xmlsoap.org/soap/envelope/}platformCore')
But it return [] instead of value = '200'
You need to use the correct namespace (nsmap), then you can search easily using prefixes:
from lxml import etree
tree = etree.parse('soap.xml')
nsmap = {'platformCore': 'urn:core_2014_1.platform.webservices.netsuite.com'}
print (tree.findall('.//platformCore:totalRecords', nsmap)[0].text)
Returns:
200

Paser XML in python

I am getting this xml response, can anybody help me in getting the token from the xml tags?
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"><s:Body><LoginResponse xmlns="http://videoos.net/2/XProtectCSServerCommand"><LoginResult xmlns:i="http://www.w3.org/2001/XMLSchema-instance"><RegistrationTime>2018-09-06T07:30:38.4571763Z</RegistrationTime><TimeToLive><MicroSeconds>3600000000</MicroSeconds></TimeToLive><TimeToLiveLimited>false</TimeToLiveLimited><Token>TOKEN#xxxxx#</Token></LoginResult></LoginResponse></s:Body></s:Envelope>
I have it as a string
Tried lxml and other libs too like ET but wasn't able to extract the token field. HELPPP
Update with a format xml to make you easy to read, FYI.
<?xml version="1.0" encoding="utf-8"?>
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
<s:Body>
<LoginResponse xmlns="http://videoos.net/2/XProtectCSServerCommand">
<LoginResult xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<RegistrationTime>2018-09-06T07:30:38.4571763Z</RegistrationTime>
<TimeToLive>
<MicroSeconds>3600000000</MicroSeconds>
</TimeToLive>
<TimeToLiveLimited>false</TimeToLiveLimited>
<Token>TOKEN#xxxxx#</Token>
</LoginResult>
</LoginResponse>
</s:Body>
</s:Envelope>
text = """
<?xml version="1.0" encoding="utf-8"?>
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
<s:Body>
<LoginResponse xmlns="http://videoos.net/2/XProtectCSServerCommand">
<LoginResult xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<RegistrationTime>2018-09-06T07:30:38.4571763Z</RegistrationTime>
<TimeToLive>
<MicroSeconds>3600000000</MicroSeconds>
</TimeToLive>
<TimeToLiveLimited>false</TimeToLiveLimited>
<Token>TOKEN#xxxxx#</Token>
</LoginResult>
</LoginResponse>
</s:Body>
</s:Envelope>
"""
from bs4 import BeautifulSoup
parser = BeautifulSoup(text,'xml')
for item in parser.find_all('Token'):
print(item.text)
Using lxml
Demo:
x = '''<?xml version="1.0" encoding="utf-8"?>
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
<s:Body>
<LoginResponse xmlns="http://videoos.net/2/XProtectCSServerCommand">
<LoginResult xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<RegistrationTime>2018-09-06T07:30:38.4571763Z</RegistrationTime>
<TimeToLive>
<MicroSeconds>3600000000</MicroSeconds>
</TimeToLive>
<TimeToLiveLimited>false</TimeToLiveLimited>
<Token>TOKEN#xxxxx#</Token>
</LoginResult>
</LoginResponse>
</s:Body>
</s:Envelope>'''
from lxml import etree
xmltree = etree.fromstring(x)
namespaces = {'content': "http://videoos.net/2/XProtectCSServerCommand"}
items = xmltree.xpath('//content:Token/text()', namespaces=namespaces)
print(items)
Output:
['TOKEN#xxxxx#']

Parsing XML with namespaces into a dataframe

I have the following simpplified XML:
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
<soap:Body>
<ReadResponse xmlns="ABCDEFG.com">
<ReadResult>
<Value>
<Alias>x1</Alias>
<Timestamp>2013-11-11T00:00:00</Timestamp>
<Val>113</Val>
<Duration>5000</Duration>
<Quality>128</Quality>
</Value>
<Value>
<Alias>x1</Alias>
<Timestamp>2014-11-11T00:02:00</Timestamp>
<Val>110</Val>
<Duration>5000</Duration>
<Quality>128</Quality>
</Value>
<Value>
<Alias>x2</Alias>
<Timestamp>2013-11-11T00:00:00</Timestamp>
<Val>101</Val>
<Duration>5000</Duration>
<Quality>128</Quality>
</Value>
<Value>
<Alias>x2</Alias>
<Timestamp>2014-11-11T00:02:00</Timestamp>
<Val>122</Val>
<Duration>5000</Duration>
<Quality>128</Quality>
</Value>
</ReadResult>
</ReadResponse>
</soap:Body>
</soap:Envelope>
and would like to parse it into a dataframe with the following structure (keeping some of the tags and discarding the rest):
Timestamp x1 x2
2013-11-11T00:00:00 113 101
2014-11-11T00:02:00 110 122
The problem is since the XML file includes namespaces, I don't know how to proceed. I have gone through several tutorials (e.g., https://docs.python.org/2/library/pyexpat.html) and questions (e.g., How to open this XML file to create dataframe in Python? and Parsing XML with namespace in Python via 'ElementTree') but none of them have helped/worked. I appreciate if anyone can help me sorting this out.
Here is an example on how to parse an xml using lxml and xpaths:
from lxml import etree
namespaces = {'abc': "ABCDEFG.com"}
xmltree = etree.fromstring(xml_string)
items = xmltree.xpath('//abc:Alias/text()', namespaces=namespaces)
print items

python remove element containing namespace

I am trying to remove an element in an xml which contains a namespace.
Here is my code:
templateXml = """<?xml version="1.0" encoding="UTF-8"?>
<Metadata xmlns="http://www.amazon.com/UnboxMetadata/v1">
<Movie>
<CountryOfOrigin>US</CountryOfOrigin>
<TitleInfo>
<Title locale="en-GB">The Title</Title>
<Actor>
<ActorName locale="en-GB">XXX</ActorName>
<Character locale="en-GB">XXX</Character>
</Actor>
</TitleInfo>
</Movie>
</Metadata>"""
from lxml import etree
tree = etree.fromstring(templateXml)
namespaces = {'ns':'http://www.amazon.com/UnboxMetadata/v1'}
for checkActor in tree.xpath('//ns:Actor', namespaces=namespaces):
etree.strip_elements(tree, 'ns:Actor')
In my actual XML I have lots of tags, So I am trying to search for the Actor tags which contain XXX and completely remove that whole tag and its contents. But it's not working.
Use remove() method:
for checkActor in tree.xpath('//ns:Actor', namespaces=namespaces):
checkActor.getparent().remove(checkActor)
print etree.tostring(tree, pretty_print=True, xml_declaration=True)
prints:
<?xml version='1.0' encoding='ASCII'?>
<Metadata xmlns="http://www.amazon.com/UnboxMetadata/v1">
<Movie>
<CountryOfOrigin>US</CountryOfOrigin>
<TitleInfo>
<Title locale="en-GB">The Title</Title>
</TitleInfo>
</Movie>
</Metadata>

Categories