Python and XML Processing


I have used urllib to get the following data:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<videos xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:www="http://www.www.com"">
<video type="cl">
<cd>
<src lang="music">http://www.google.com/ </src>
</cd>
</video>
</videos>
I want to extract http://www.google.com/ from it. Here is my code:
import xml.etree.ElementTree as etree
data='<?xml version="1.0" encoding="UTF-8" standalone="yes"?><videos xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:www="http://www.www.com""><video type="cl"><cd><src lang="music">http://www.google.com/ </src></cd></video></videos>'
tree = etree.fromstring(data)
geturl=tree.findtext('/video/cd/src').strip()
print(geturl)
I get this error:
AttributeError: 'NoneType' object has no attribute 'strip'
Obviously, findtext failed. I also tried findtext('src'), which doesn't work either.
What's wrong?

Remove the leading forward slash from the path, so that it reads video/cd/src:
import xml.etree.ElementTree as etree
data='''<?xml version="1.0" encoding="UTF-8" standalone="yes"?><videos xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:www="http://www.www.com"><video type="cl"><cd><src lang="music">http://www.google.com/ </src></cd></video></videos>'''
tree = etree.fromstring(data)
geturl=tree.findtext('video/cd/src').strip()
print(geturl)
yields
http://www.google.com/
A leading forward slash makes the path absolute, which is not allowed on elements: paths are resolved relative to the element on which findtext is called.
PS. There is also a syntax error in the data you posted: xmlns:www="http://www.www.com"" has two double-quotes at the end...
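As a minimal sketch of the same idea in Python 3, the following compares a relative path with a descendant search. It also shows why findtext('src') returned nothing: a bare tag name only matches direct children of the element. The data string is the question's document with the XML declaration dropped and the stray double quote removed; the tag name nosuchtag is made up for illustration.

```python
import xml.etree.ElementTree as ET

# The question's document, minus the XML declaration and the stray double quote.
data = (
    '<videos xmlns:xs="http://www.w3.org/2001/XMLSchema" '
    'xmlns:www="http://www.www.com">'
    '<video type="cl"><cd>'
    '<src lang="music">http://www.google.com/ </src>'
    '</cd></video></videos>'
)

root = ET.fromstring(data)  # root is the <videos> element

# Relative path from the root element, one child step at a time.
by_path = root.findtext('video/cd/src')

# './/' searches all descendants, so the intermediate tags can be omitted.
by_descendant = root.findtext('.//src')

# A bare tag name only looks at direct children of <videos>, so this is None.
direct_child = root.findtext('src')

# A default value avoids calling .strip() on None when nothing matches.
missing = root.findtext('video/cd/nosuchtag', default='')

print(by_path.strip())
print(by_descendant.strip())
print(direct_child)
print(repr(missing))
```

Passing a default to findtext is one way to avoid the AttributeError from the question when an element may legitimately be absent.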

Related

How to extract text from xml error message python

I want to determine whether the response I parsed with BeautifulSoup looks like this:
Out[32]:
<?xml version="1.0" encoding="utf-8"?>
<boardgames termsofuse="https://boardgamegeek.com/xmlapi/termsofuse">
<boardgame>
<error message="Item not found"/>
</boardgame>
</boardgames>
I can extract the center of the previous output using:
soup.find_all('boardgame')[0], which produces the following:
Out[24]:
<boardgame>
<error message="Item not found"/>
</boardgame>
I feel like this should be easy. I've tried the following, but I still can't determine whether the error message="Item not found" attribute is in there. What am I missing here?
soup.findAll('boardgame')[0].getText()
Out[26]: '\n\n'

The text is stored in the message attribute, not in the element's text. Find the error tag first, then read its message attribute:
from bs4 import BeautifulSoup
data='''<?xml version="1.0" encoding="utf-8"?>
<boardgames termsofuse="https://boardgamegeek.com/xmlapi/termsofuse">
<boardgame>
<error message="Item not found"/>
</boardgame>
</boardgames>'''
soup=BeautifulSoup(data,'html.parser')
message=soup.find('boardgame').find('error')['message']
print(message)
Output:
Item not found
Or you can use a CSS selector:
from bs4 import BeautifulSoup
data='''<?xml version="1.0" encoding="utf-8"?>
<boardgames termsofuse="https://boardgamegeek.com/xmlapi/termsofuse">
<boardgame>
<error message="Item not found"/>
</boardgame>
</boardgames>'''
soup=BeautifulSoup(data,'html.parser')
message=soup.select_one('boardgame error')['message']
print(message)
Output:
Item not found
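getText() returned '\n\n' because the error element has no text content; the message lives in an attribute. As a hedged alternative that needs only the standard library, the sketch below does the same check with xml.etree.ElementTree instead of BeautifulSoup and guards against the case where no error element is present.

```python
import xml.etree.ElementTree as ET

data = '''<boardgames termsofuse="https://boardgamegeek.com/xmlapi/termsofuse">
<boardgame>
<error message="Item not found"/>
</boardgame>
</boardgames>'''

root = ET.fromstring(data)

# find() returns None when there is no matching element, so test before using it.
error = root.find('./boardgame/error')
if error is not None:
    # get() reads an attribute and returns None if the attribute is absent.
    message = error.get('message')
    print('Error:', message)
else:
    message = None
    print('No error element found')
```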

Remove unwanted tags from XML file

I am working on an XML file that contains SOAP tags. I want to remove those SOAP tags as part of an XML cleanup process.
How can I achieve this in either Python or Scala? It should not use a shell script.
Sample input:
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://sample.com/">
<soap:Body>
<com:RESPONSE xmlns:com="http://sample.com/">
<Student>
<StudentID>100234</StudentID>
<Gender>Male</Gender>
<Surname>Robert</Surname>
<Firstname>Mathews</Firstname>
</Student>
</com:RESPONSE>
</soap:Body>
</soap:Envelope>
Expected output:
<?xml version="1.0" encoding="UTF-8"?>
<com:RESPONSE xmlns:com="http://sample.com/">
<Student>
<StudentID>100234</StudentID>
<Gender>Male</Gender>
<Surname>Robert</Surname>
<Firstname>Mathews</Firstname>
</Student>
</com:RESPONSE>

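The XPath //soap matches nothing here, because soap is a namespace prefix rather than an element name, and removing soap:Body would also remove the payload inside it. With lxml, one option is to select the first child of the Body and serialise that instead:
from lxml import etree
doc = etree.parse('test.xml')
# Map the soap prefix to the namespace URI declared in the sample input.
ns = {'soap': 'http://sample.com/'}
# Select the payload: the first element inside soap:Body.
payload = doc.xpath('/soap:Envelope/soap:Body/*', namespaces=ns)[0]
print(etree.tostring(payload, encoding='unicode'))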
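If lxml is not available, a rough standard-library sketch of the cleanup could look like the following. It assumes the payload is the first child element of soap:Body, as in the sample input, and it parses the sample from a string rather than from test.xml so that it is self-contained.

```python
import xml.etree.ElementTree as ET

data = '''<soap:Envelope xmlns:soap="http://sample.com/">
<soap:Body>
<com:RESPONSE xmlns:com="http://sample.com/">
<Student>
<StudentID>100234</StudentID>
<Gender>Male</Gender>
<Surname>Robert</Surname>
<Firstname>Mathews</Firstname>
</Student>
</com:RESPONSE>
</soap:Body>
</soap:Envelope>'''

NS = 'http://sample.com/'
# Ask ElementTree to serialise this URI with the "com" prefix instead of "ns0".
# Note that register_namespace changes a process-wide registry.
ET.register_namespace('com', NS)

root = ET.fromstring(data)          # the soap:Envelope element
body = root.find('{%s}Body' % NS)   # direct child, addressed by namespace URI
payload = body[0]                   # assumed: the first child of Body is the payload

# Serialise only the payload subtree; the Envelope and Body wrappers are left out.
cleaned = ET.tostring(payload, encoding='unicode')
print(cleaned)
```

With encoding='unicode' the result is a str without an XML declaration. When writing to a file, ET.ElementTree(payload).write(path, encoding='UTF-8', xml_declaration=True) adds one.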

How to fetch data from multiple namespaces in a soap response using python: xml.etree.ElementTree

How do I fetch a value with the correct XPath using Python's xml.etree.ElementTree?
This is the 'soap.xml' file:
<?xml version="1.0" encoding="UTF-8"?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soapenv:Header>
<platformMsgs:documentInfo xmlns:platformMsgs="urn:messages_2014_1.platform.webservices.netsuite.com">
<platformMsgs:nsId>WEBSERVICES_3883026_SB1_103120188552030381995151284_f039e9cc</platformMsgs:nsId>
</platformMsgs:documentInfo>
</soapenv:Header>
<soapenv:Body>
<searchResponse xmlns="urn:messages_2014_1.platform.webservices.netsuite.com">
<platformCore:searchResult xmlns:platformCore="urn:core_2014_1.platform.webservices.netsuite.com">
<platformCore:status isSuccess="true"/>
<platformCore:totalRecords>200</platformCore:totalRecords>
<platformCore:pageSize>1000</platformCore:pageSize>
<platformCore:totalPages>1</platformCore:totalPages>
<platformCore:pageIndex>1</platformCore:pageIndex>
<platformCore:searchId>WEBSERVICES_3883026_SB1_103120188552030381995151284_f039e9cc</platformCore:searchId>
</platformCore:searchResult>
</searchResponse>
</soapenv:Body>
</soapenv:Envelope>
Here is my code.
import xml.etree.ElementTree as ET
tree = ET.parse('soap.xml')
print(tree.findall('.//{http://schemas.xmlsoap.org/soap/envelope/}platformCore'))
But it returns [] instead of the value '200'.

You need to map the prefix to the correct namespace URI (nsmap); then you can search using prefixes:
from lxml import etree
tree = etree.parse('soap.xml')
nsmap = {'platformCore': 'urn:core_2014_1.platform.webservices.netsuite.com'}
print(tree.findall('.//platformCore:totalRecords', nsmap)[0].text)
Returns:
200
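The attempt in the question returned [] because it paired the SOAP envelope URI with platformCore, which is a prefix and not an element name. Since the question asked about xml.etree.ElementTree, here is a hedged sketch of the same lookup with the standard library. The response is trimmed to the relevant elements, and the prefixes msgs and core in the mapping are arbitrary local aliases chosen for this example; only the URIs have to match the document.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response from the question.
data = '''<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
<soapenv:Body>
<searchResponse xmlns="urn:messages_2014_1.platform.webservices.netsuite.com">
<platformCore:searchResult xmlns:platformCore="urn:core_2014_1.platform.webservices.netsuite.com">
<platformCore:status isSuccess="true"/>
<platformCore:totalRecords>200</platformCore:totalRecords>
</platformCore:searchResult>
</searchResponse>
</soapenv:Body>
</soapenv:Envelope>'''

# Prefix-to-URI mapping used only for the searches below.
ns = {
    'soapenv': 'http://schemas.xmlsoap.org/soap/envelope/',
    'msgs': 'urn:messages_2014_1.platform.webservices.netsuite.com',
    'core': 'urn:core_2014_1.platform.webservices.netsuite.com',
}

root = ET.fromstring(data)

# Descendant search for an element in the "core" namespace.
total = root.findtext('.//core:totalRecords', namespaces=ns)

# Full path across three namespaces, starting from the Envelope element.
status = root.find(
    'soapenv:Body/msgs:searchResponse/core:searchResult/core:status', ns
)

print(total)
print(status.get('isSuccess'))
```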

Parse XML in Python

I am getting this XML response. Can anybody help me get the token from the XML tags?
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"><s:Body><LoginResponse xmlns="http://videoos.net/2/XProtectCSServerCommand"><LoginResult xmlns:i="http://www.w3.org/2001/XMLSchema-instance"><RegistrationTime>2018-09-06T07:30:38.4571763Z</RegistrationTime><TimeToLive><MicroSeconds>3600000000</MicroSeconds></TimeToLive><TimeToLiveLimited>false</TimeToLiveLimited><Token>TOKEN#xxxxx#</Token></LoginResult></LoginResponse></s:Body></s:Envelope>
I have it as a string.
I tried lxml and other libraries such as ElementTree, but wasn't able to extract the Token field.
Update: here is the same XML, formatted for easier reading.
<?xml version="1.0" encoding="utf-8"?>
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
<s:Body>
<LoginResponse xmlns="http://videoos.net/2/XProtectCSServerCommand">
<LoginResult xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<RegistrationTime>2018-09-06T07:30:38.4571763Z</RegistrationTime>
<TimeToLive>
<MicroSeconds>3600000000</MicroSeconds>
</TimeToLive>
<TimeToLiveLimited>false</TimeToLiveLimited>
<Token>TOKEN#xxxxx#</Token>
</LoginResult>
</LoginResponse>
</s:Body>
</s:Envelope>

text = """
<?xml version="1.0" encoding="utf-8"?>
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
<s:Body>
<LoginResponse xmlns="http://videoos.net/2/XProtectCSServerCommand">
<LoginResult xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<RegistrationTime>2018-09-06T07:30:38.4571763Z</RegistrationTime>
<TimeToLive>
<MicroSeconds>3600000000</MicroSeconds>
</TimeToLive>
<TimeToLiveLimited>false</TimeToLiveLimited>
<Token>TOKEN#xxxxx#</Token>
</LoginResult>
</LoginResponse>
</s:Body>
</s:Envelope>
"""
from bs4 import BeautifulSoup
parser = BeautifulSoup(text,'xml')
for item in parser.find_all('Token'):
    print(item.text)

Using lxml
Demo:
x = '''<?xml version="1.0" encoding="utf-8"?>
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
<s:Body>
<LoginResponse xmlns="http://videoos.net/2/XProtectCSServerCommand">
<LoginResult xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<RegistrationTime>2018-09-06T07:30:38.4571763Z</RegistrationTime>
<TimeToLive>
<MicroSeconds>3600000000</MicroSeconds>
</TimeToLive>
<TimeToLiveLimited>false</TimeToLiveLimited>
<Token>TOKEN#xxxxx#</Token>
</LoginResult>
</LoginResponse>
</s:Body>
</s:Envelope>'''
from lxml import etree
xmltree = etree.fromstring(x)
namespaces = {'content': "http://videoos.net/2/XProtectCSServerCommand"}
items = xmltree.xpath('//content:Token/text()', namespaces=namespaces)
print(items)
Output:
['TOKEN#xxxxx#']
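When the namespace URI is not known in advance, another option is to compare local names and ignore namespaces altogether. The following is a rough standard-library sketch of that approach, using a shortened copy of the response; local_name is a hypothetical helper written for this example, not a library function.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the response from the question.
data = (
    '<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"><s:Body>'
    '<LoginResponse xmlns="http://videoos.net/2/XProtectCSServerCommand">'
    '<LoginResult xmlns:i="http://www.w3.org/2001/XMLSchema-instance">'
    '<TimeToLiveLimited>false</TimeToLiveLimited>'
    '<Token>TOKEN#xxxxx#</Token>'
    '</LoginResult></LoginResponse></s:Body></s:Envelope>'
)


def local_name(tag):
    """Hypothetical helper: drop the '{namespace-uri}' prefix ElementTree adds to tags."""
    return tag.rsplit('}', 1)[-1]


root = ET.fromstring(data)

# iter() walks every element in document order; keep those whose local name is Token.
tokens = [el.text for el in root.iter() if local_name(el.tag) == 'Token']
print(tokens)
```

This trades precision for convenience: it would also match a Token element from an unrelated namespace, so the explicit namespace mapping is the safer choice when the URI is known.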

Parse XML SOAP response with Python

I want to parse this SOAP response and extract the text inside <LoginResult>:
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<LoginResponse xmlns="http://tempuri.org/wsSalesQuotation/Service1">
<LoginResult>45eeadF43423KKmP33</LoginResult>
</LoginResponse>
</soap:Body>
</soap:Envelope>
How can I do this using Python's XML libraries?

import xml.etree.ElementTree as ET
tree = ET.parse('soap.xml')
print(tree.find('.//{http://tempuri.org/wsSalesQuotation/Service1}LoginResult').text)
Output:
45eeadF43423KKmP33
Instead of printing the value, you can assign it to a variable and use it as needed.
