python remove element containing namespace

I am trying to remove an element from an XML document that uses a namespace.
Here is my code:
templateXml = """<?xml version="1.0" encoding="UTF-8"?>
<Metadata xmlns="http://www.amazon.com/UnboxMetadata/v1">
<Movie>
<CountryOfOrigin>US</CountryOfOrigin>
<TitleInfo>
<Title locale="en-GB">The Title</Title>
<Actor>
<ActorName locale="en-GB">XXX</ActorName>
<Character locale="en-GB">XXX</Character>
</Actor>
</TitleInfo>
</Movie>
</Metadata>"""
from lxml import etree
# Encode to bytes: lxml rejects str input that carries an encoding declaration.
tree = etree.fromstring(templateXml.encode("utf-8"))
namespaces = {'ns':'http://www.amazon.com/UnboxMetadata/v1'}
for checkActor in tree.xpath('//ns:Actor', namespaces=namespaces):
    etree.strip_elements(tree, 'ns:Actor')
My actual XML has many more tags, so I am trying to find the Actor elements that contain XXX and remove each of them completely, including its contents. But it's not working.

Use the remove() method:
for checkActor in tree.xpath('//ns:Actor', namespaces=namespaces):
    checkActor.getparent().remove(checkActor)
# tostring() returns bytes, so decode before printing.
print(etree.tostring(tree, pretty_print=True, xml_declaration=True).decode())
prints:
<?xml version='1.0' encoding='ASCII'?>
<Metadata xmlns="http://www.amazon.com/UnboxMetadata/v1">
<Movie>
<CountryOfOrigin>US</CountryOfOrigin>
<TitleInfo>
<Title locale="en-GB">The Title</Title>
</TitleInfo>
</Movie>
</Metadata>
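The question asks for only the Actor elements that contain XXX, and it also wonders why strip_elements() had no effect. The following is a minimal sketch under a few assumptions: the second Actor ("Someone Else") is invented sample data, and "contains XXX" is read as "ActorName is exactly XXX". It filters with an XPath predicate before calling remove(), then shows that strip_elements() takes the tag in Clark notation ({uri}tag) rather than with a prefix.

```python
from lxml import etree

NS = "http://www.amazon.com/UnboxMetadata/v1"

# Bytes input, because lxml rejects str input with an encoding declaration.
# The second Actor is hypothetical sample data added for illustration.
template_xml = b"""<?xml version="1.0" encoding="UTF-8"?>
<Metadata xmlns="http://www.amazon.com/UnboxMetadata/v1">
  <Movie>
    <TitleInfo>
      <Title locale="en-GB">The Title</Title>
      <Actor>
        <ActorName locale="en-GB">XXX</ActorName>
      </Actor>
      <Actor>
        <ActorName locale="en-GB">Someone Else</ActorName>
      </Actor>
    </TitleInfo>
  </Movie>
</Metadata>"""

tree = etree.fromstring(template_xml)
namespaces = {"ns": NS}

# Remove only the Actor elements whose ActorName text is exactly "XXX".
for actor in tree.xpath('//ns:Actor[ns:ActorName="XXX"]', namespaces=namespaces):
    actor.getparent().remove(actor)

remaining = tree.xpath("//ns:ActorName/text()", namespaces=namespaces)
print(remaining)

# strip_elements() matches on Clark notation, not on a namespace prefix.
etree.strip_elements(tree, "{%s}Actor" % NS)
actors_left = tree.xpath("count(//ns:Actor)", namespaces=namespaces)
print(actors_left)
```

The predicate keeps the filtering inside XPath, so no extra if logic is needed in the loop.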

Related

How to remove all occurrences of an element in an XML file?

I'd like to edit a KML file and remove all occurrences of ExtendedData elements, wherever they are located in the file.
Here's the input XML file:
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.2">
<Document>
<Style id="placemark-red">
<IconStyle>
<Icon>
<href>http://maps.me/placemarks/placemark-red.png</href>
</Icon>
</IconStyle>
</Style>
<name>My track</name>
<ExtendedData xmlns:mwm="https://maps.me">
<mwm:name>
<mwm:lang code="default">Blah</mwm:lang>
</mwm:name>
<mwm:lastModified>2020-04-05T14:17:18Z</mwm:lastModified>
</ExtendedData>
<Placemark>
<name></name>
…
<ExtendedData xmlns:mwm="https://maps.me">
<mwm:localId>0</mwm:localId>
<mwm:visibility>1</mwm:visibility>
</ExtendedData>
</Placemark>
</Document>
</kml>
And here's the code, which 1) only removes the outermost occurrence, and 2) requires adding the namespace to find it:
from lxml import etree
from pykml import parser
from pykml.factory import KML_ElementMaker as KML
with open("input.xml") as f:
    doc = parser.parse(f)
root = doc.getroot()
ns = "{http://earth.google.com/kml/2.2}"
for pm in root.Document.getchildren():
    # No way to get rid of namespace, for easier search?
    if pm.tag == f"{ns}ExtendedData":
        root.Document.remove(pm)
        # How to remove innermost occurrence of ExtendedData?
print(etree.tostring(doc, pretty_print=True))
Is there a way to remove all occurrences in one go, or should I parse the whole tree?
Thank you.
Edit: The BeautifulSoup solution below requires adding the option features="lxml", as in BeautifulSoup(my_xml, features="lxml"), to avoid the warning "No parser was explicitly specified".

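Here's a solution using BeautifulSoup:
from bs4 import BeautifulSoup
soup = BeautifulSoup(my_xml, features="lxml")  # my_xml is your XML string
# The lxml HTML parser lowercases tag names, hence "extendeddata".
while True:
    elem = soup.find("extendeddata")
    if not elem:
        break
    elem.decompose()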
Here's the output for your data:
<?xml version="1.0" encoding="UTF-8"?>
<html>
<body>
<kml xmlns="http://earth.google.com/kml/2.2">
<document>
<style id="placemark-red">
<IconStyle>
<Icon>
<href>http://maps.me/placemarks/placemark-red.png</href>
</Icon>
</IconStyle>
</style>
<name>
My track
</name>
<placemark>
<name>
</name>
</placemark>
</document>
</kml>
</body>
</html>

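If you know the XML structure, try this (the KML default namespace has to be mapped to a prefix, and ElementTree does not accept absolute paths such as '/Placemark' on an element):
from xml.etree import ElementTree
ns = {'kml': 'http://earth.google.com/kml/2.2'}
xml_root = ElementTree.parse(filename_path).getroot()
doc_elem = xml_root.find('kml:Document', ns)
elem = doc_elem.find('kml:ExtendedData', ns)
doc_elem.remove(elem)
or, continuing from doc_elem above:
p_elem = doc_elem.find('kml:Placemark', ns)
c_elem = p_elem.find('kml:ExtendedData', ns)
p_elem.remove(c_elem)
Play with these ideas :)
If you don't know the XML structure, I think you need to walk the whole tree.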
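Walking the whole tree can be sketched with the standard library alone. This is one possible approach, assuming a trimmed-down copy of the KML from the question: ElementTree elements carry no parent pointer, so every element is visited as a potential parent and its matching children are removed.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://earth.google.com/kml/2.2"

# Shortened version of the KML from the question.
kml = """<kml xmlns="http://earth.google.com/kml/2.2">
  <Document>
    <name>My track</name>
    <ExtendedData xmlns:mwm="https://maps.me">
      <mwm:lastModified>2020-04-05T14:17:18Z</mwm:lastModified>
    </ExtendedData>
    <Placemark>
      <name></name>
      <ExtendedData xmlns:mwm="https://maps.me">
        <mwm:localId>0</mwm:localId>
      </ExtendedData>
    </Placemark>
  </Document>
</kml>"""

root = ET.fromstring(kml)
target = "{%s}ExtendedData" % KML_NS

# Snapshot the element lists first so removal does not disturb iteration.
for parent in list(root.iter()):
    for child in list(parent):
        if child.tag == target:
            parent.remove(child)

remaining = root.findall(".//" + target)
print(len(remaining))
```

With lxml the same result is available through getparent(), but the loop above avoids that dependency.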

Simply run an XSLT 1.0 identity transform together with an empty template, which Python's lxml can execute. No for/while loops or if logic needed. To handle the default namespace, define a prefix such as doc:
XSLT (save as a .xsl file, which is itself an XML file)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:doc="http://earth.google.com/kml/2.2">
  <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <!-- IDENTITY TRANSFORM -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>
  <!-- REMOVE ALL OCCURRENCES OF NODE -->
  <xsl:template match="doc:ExtendedData"/>
</xsl:stylesheet>
Python
import lxml.etree as et
# LOAD XML AND XSL SOURCES
xml = et.parse('Input.xml')
xsl = et.parse('XSLT_Script.xsl')
# TRANSFORM INPUT
transform = et.XSLT(xsl)
result = transform(xml)
# PRINT TO SCREEN
print(result)
# SAVE TO FILE
with open('Output.kml', 'wb') as f:
    f.write(result)

Remove unwanted tags from XML file

I am working on an XML file that contains SOAP tags. I want to remove those SOAP tags as part of an XML cleanup process.
How can I achieve this in either Python or Scala? A shell script is not an option.
Sample Input :
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://sample.com/">
<soap:Body>
<com:RESPONSE xmlns:com="http://sample.com/">
<Student>
<StudentID>100234</StudentID>
<Gender>Male</Gender>
<Surname>Robert</Surname>
<Firstname>Mathews</Firstname>
</Student>
</com:RESPONSE>
</soap:Body>
</soap:Envelope>
Expected Output :
<?xml version="1.0" encoding="UTF-8"?>
<com:RESPONSE xmlns:com="http://sample.com/">
<Student>
<StudentID>100234</StudentID>
<Gender>Male</Gender>
<Surname>Robert</Surname>
<Firstname>Mathews</Firstname>
</Student>
</com:RESPONSE>

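This could help you! Since soap is a namespace prefix, not a tag name, select the payload inside soap:Body and serialize it on its own:
from lxml import etree
doc = etree.parse('test.xml')
ns = {'soap': 'http://sample.com/'}
# The payload is the first element child of soap:Body.
payload = doc.xpath('/soap:Envelope/soap:Body/*', namespaces=ns)[0]
print(etree.tostring(payload, pretty_print=True).decode())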
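For comparison, here is a rough standard-library sketch of the same unwrapping idea, assuming a shortened copy of the sample input. Rather than deleting the soap elements, it locates the RESPONSE element and serializes only that subtree. The register_namespace() call is there so the serializer writes the com prefix instead of a generated one such as ns0.

```python
import xml.etree.ElementTree as ET

NS = "http://sample.com/"

# Shortened version of the sample input from the question.
soap_xml = """<soap:Envelope xmlns:soap="http://sample.com/">
  <soap:Body>
    <com:RESPONSE xmlns:com="http://sample.com/">
      <Student>
        <StudentID>100234</StudentID>
        <Gender>Male</Gender>
      </Student>
    </com:RESPONSE>
  </soap:Body>
</soap:Envelope>"""

# Ask the serializer to write this namespace URI with the "com" prefix.
ET.register_namespace("com", NS)

root = ET.fromstring(soap_xml)
payload = root.find(".//{%s}RESPONSE" % NS)

# Serialize only the payload; strip the trailing whitespace kept as its tail.
cleaned = ET.tostring(payload, encoding="unicode").strip()
print(cleaned)
```

Note that register_namespace() is a process-wide setting, so it affects any later serialization of that URI as well.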

Parse XML in Python

I am getting this XML response. Can anybody help me get the token from the XML tags?
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"><s:Body><LoginResponse xmlns="http://videoos.net/2/XProtectCSServerCommand"><LoginResult xmlns:i="http://www.w3.org/2001/XMLSchema-instance"><RegistrationTime>2018-09-06T07:30:38.4571763Z</RegistrationTime><TimeToLive><MicroSeconds>3600000000</MicroSeconds></TimeToLive><TimeToLiveLimited>false</TimeToLiveLimited><Token>TOKEN#xxxxx#</Token></LoginResult></LoginResponse></s:Body></s:Envelope>
I have it as a string.
I tried lxml and other libraries such as ET, but wasn't able to extract the Token field. Help!
Update: here is the same XML, formatted for easier reading.
<?xml version="1.0" encoding="utf-8"?>
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
<s:Body>
<LoginResponse xmlns="http://videoos.net/2/XProtectCSServerCommand">
<LoginResult xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<RegistrationTime>2018-09-06T07:30:38.4571763Z</RegistrationTime>
<TimeToLive>
<MicroSeconds>3600000000</MicroSeconds>
</TimeToLive>
<TimeToLiveLimited>false</TimeToLiveLimited>
<Token>TOKEN#xxxxx#</Token>
</LoginResult>
</LoginResponse>
</s:Body>
</s:Envelope>

text = """
<?xml version="1.0" encoding="utf-8"?>
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
<s:Body>
<LoginResponse xmlns="http://videoos.net/2/XProtectCSServerCommand">
<LoginResult xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<RegistrationTime>2018-09-06T07:30:38.4571763Z</RegistrationTime>
<TimeToLive>
<MicroSeconds>3600000000</MicroSeconds>
</TimeToLive>
<TimeToLiveLimited>false</TimeToLiveLimited>
<Token>TOKEN#xxxxx#</Token>
</LoginResult>
</LoginResponse>
</s:Body>
</s:Envelope>
"""
from bs4 import BeautifulSoup
parser = BeautifulSoup(text, 'xml')
for item in parser.find_all('Token'):
    print(item.text)

Using lxml
Demo:
x = '''<?xml version="1.0" encoding="utf-8"?>
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
<s:Body>
<LoginResponse xmlns="http://videoos.net/2/XProtectCSServerCommand">
<LoginResult xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<RegistrationTime>2018-09-06T07:30:38.4571763Z</RegistrationTime>
<TimeToLive>
<MicroSeconds>3600000000</MicroSeconds>
</TimeToLive>
<TimeToLiveLimited>false</TimeToLiveLimited>
<Token>TOKEN#xxxxx#</Token>
</LoginResult>
</LoginResponse>
</s:Body>
</s:Envelope>'''
from lxml import etree
# Encode to bytes: lxml rejects str input that carries an encoding declaration.
xmltree = etree.fromstring(x.encode('utf-8'))
namespaces = {'content': "http://videoos.net/2/XProtectCSServerCommand"}
items = xmltree.xpath('//content:Token/text()', namespaces=namespaces)
print(items)
Output:
['TOKEN#xxxxx#']

iterate through XML?

What is the easiest way to navigate through XML with python?
<html>
<body>
<soapenv:envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soapenv:body>
<getservicebyidresponse xmlns="http://www.something.com/soa/2.0/SRIManagement">
<code xmlns="">
0
</code>
<client xmlns="">
<action xsi:nil="true">
</action>
<actionmode xsi:nil="true">
</actionmode>
<clientid>
405965216
</clientid>
<firstname xsi:nil="true">
</firstname>
<id xsi:nil="true">
</id>
<lastname>
Last Name
</lastname>
<role xsi:nil="true">
</role>
<state xsi:nil="true">
</state>
</client>
</getservicebyidresponse>
</soapenv:body>
</soapenv:envelope>
</body>
</html>
I would go with regex and try to get the values from the lines I need, but is there a Pythonic way? Something like xml[0][1], etc.?

As @deceze already pointed out, you can use xml.etree.ElementTree here.
import xml.etree.ElementTree as ET
tree = ET.parse("path_to_xml_file")
root = tree.getroot()
You can iterate over all descendant nodes of root (root.iter() also yields root itself):
for child in root.iter():
    if child.tag == 'clientid':
        print(child.tag, child.text.strip())
Children are nested, and we can access specific child nodes by index, so root[0][1] should work (as long as the indices are correct).
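To make the indexing concrete, here is a small sketch on a simplified document. The XML below is a cut-down, namespace-free stand-in for the response in the question, so the index positions are assumptions that only hold for this sample.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the response in the question (no namespaces).
xml_text = """<html>
  <body>
    <client>
      <clientid>
        405965216
      </clientid>
      <lastname>
        Last Name
      </lastname>
    </client>
  </body>
</html>"""

root = ET.fromstring(xml_text)

# Index-based access: root[0] is <body>, root[0][0] is <client>,
# and root[0][0][1] is <lastname>.
lastname = root[0][0][1]
print(lastname.tag, lastname.text.strip())

# Tag-based search is more robust if the layout may change.
client_id = root.findtext(".//clientid").strip()
print(client_id)
```

Index chains break as soon as an element is added or reordered, which is why the tag-based lookup is usually the safer choice.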

Parse XML SOAP response with Python

I want to parse this SOAP response and extract the text inside <LoginResult>:
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<LoginResponse xmlns="http://tempuri.org/wsSalesQuotation/Service1">
<LoginResult>45eeadF43423KKmP33</LoginResult>
</LoginResponse>
</soap:Body>
</soap:Envelope>
How can I do it using Python's XML libraries?

import xml.etree.ElementTree as ET
tree = ET.parse('soap.xml')
print(tree.find('.//{http://tempuri.org/wsSalesQuotation/Service1}LoginResult').text)
>>45eeadF43423KKmP33
Instead of printing, do something useful with it.
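If the response is already in memory as a string, a variant along these lines may read more easily. It is a sketch, not the only way: the prefixes soap and svc in the mapping are arbitrary names chosen here, and only the URIs have to match the document.

```python
import xml.etree.ElementTree as ET

# Shortened version of the response from the question.
soap_text = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <LoginResponse xmlns="http://tempuri.org/wsSalesQuotation/Service1">
      <LoginResult>45eeadF43423KKmP33</LoginResult>
    </LoginResponse>
  </soap:Body>
</soap:Envelope>"""

root = ET.fromstring(soap_text)

# Prefix names are local to this mapping; only the URIs must match.
ns = {
    "soap": "http://schemas.xmlsoap.org/soap/envelope/",
    "svc": "http://tempuri.org/wsSalesQuotation/Service1",
}

# findtext() returns the element text, or None when nothing matches.
token = root.findtext("soap:Body/svc:LoginResponse/svc:LoginResult", namespaces=ns)
print(token)
```

The namespace mapping keeps the path short compared with spelling out the full {uri}tag form for each step.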
