Parsing XML in Python with ElementTree - findall() - python

I'm using the documentation here to try to get only the values (address, mask) for certain elements.
This is an example of the structure of my XML:
<?xml version="1.0" ?>
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="urn:uuid:52622325-b136-40cf-bc36-85332e25b6f3" xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0">
<data>
<native xmlns="http://cisco.com/ns/yang/Cisco-IOS-XE-native">
<interface>
<GigabitEthernet>
<name>1</name>
<ip>
<address>
<primary>
<address>192.168.40.30</address>
<mask>255.255.255.0</mask>
</primary>
</address>
</ip>
<logging>
<event>
<link-status/>
</event>
</logging>
<mop>
<enabled>false</enabled>
<sysid>false</sysid>
</mop>
<negotiation xmlns="http://cisco.com/ns/yang/Cisco-IOS-XE-ethernet">
<auto>true</auto>
</negotiation>
</GigabitEthernet>
<GigabitEthernet>
<name>2</name>
<ip>
<address>
<primary>
<address>10.10.10.1</address>
<mask>255.255.255.0</mask>
</primary>
</address>
</ip>
<logging>
<event>
<link-status/>
</event>
</logging>
<mop>
<enabled>false</enabled>
<sysid>false</sysid>
</mop>
<negotiation xmlns="http://cisco.com/ns/yang/Cisco-IOS-XE-ethernet">
<auto>true</auto>
</negotiation>
</GigabitEthernet>
<GigabitEthernet>
<name>3</name>
<ip>
<address>
<primary>
<address>30.30.30.1</address>
<mask>255.255.255.0</mask>
</primary>
</address>
</ip>
<logging>
<event>
<link-status/>
</event>
</logging>
<mop>
<enabled>false</enabled>
<sysid>false</sysid>
</mop>
<negotiation xmlns="http://cisco.com/ns/yang/Cisco-IOS-XE-ethernet">
<auto>true</auto>
</negotiation>
</GigabitEthernet>
<GigabitEthernet>
<name>4</name>
<logging>
<event>
<link-status/>
</event>
</logging>
<mop>
<enabled>false</enabled>
<sysid>false</sysid>
</mop>
<negotiation xmlns="http://cisco.com/ns/yang/Cisco-IOS-XE-ethernet">
<auto>true</auto>
</negotiation>
</GigabitEthernet>
</interface>
</native>
</data>
</rpc-reply>
Working off this example in the documentation, I've tried something like this:
import xml.etree.ElementTree as ET
tree = ET.parse("C:\\Users\\Redha\\Documents\\test_network\\interface123.xml")
root = tree.getroot()
for i in root.findall('native'):
    print(i.tag)
But it returns nothing. I've tried other things without success. Any ideas? All advice appreciated. Thank you!

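The native element belongs to the namespace http://cisco.com/ns/yang/Cisco-IOS-XE-native and is not a direct child of the root, so root.findall('native') matches nothing. Consider using namespaces when referencing XML elements: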
import xml.etree.ElementTree as ET
# declare XML namespaces
namespaces = {'native': 'http://cisco.com/ns/yang/Cisco-IOS-XE-native'}
tree = ET.parse("C:\\Users\\Redha\\Documents\\test_network\\interface123.xml")
root = tree.getroot()
# call findall() using previously created namespaces map
for i in root.findall('.//native:native', namespaces):
    print(i.tag)
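To get the address and mask values the question asks about, the same namespace map can be reused for the nested elements. The following is a minimal sketch, assuming a trimmed-down copy of the question's XML; the prefix "ios", the SAMPLE string, and the helper primary_addresses() are illustrative names, not part of the original answer.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample with the same nesting as the question's XML (illustrative only)
SAMPLE = """<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <data>
    <native xmlns="http://cisco.com/ns/yang/Cisco-IOS-XE-native">
      <interface>
        <GigabitEthernet>
          <name>1</name>
          <ip><address><primary>
            <address>192.168.40.30</address>
            <mask>255.255.255.0</mask>
          </primary></address></ip>
        </GigabitEthernet>
        <GigabitEthernet>
          <name>4</name>
        </GigabitEthernet>
      </interface>
    </native>
  </data>
</rpc-reply>"""

# "ios" is an arbitrary local alias; only the URI has to match the document
NS = {"ios": "http://cisco.com/ns/yang/Cisco-IOS-XE-native"}


def primary_addresses(root):
    """Return (name, address, mask) for every interface that has a primary address."""
    results = []
    for iface in root.findall(".//ios:GigabitEthernet", NS):
        primary = iface.find("ios:ip/ios:address/ios:primary", NS)
        if primary is None:  # e.g. interface 4 in the question has no <ip> block
            continue
        results.append((
            iface.findtext("ios:name", namespaces=NS),
            primary.findtext("ios:address", namespaces=NS),
            primary.findtext("ios:mask", namespaces=NS),
        ))
    return results


for name, address, mask in primary_addresses(ET.fromstring(SAMPLE)):
    print(name, address, mask)
```

Interfaces without an ip block are skipped instead of raising an AttributeError on a None result.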

Related

How to copy xml entire elements, attributes and data to new xml file with specific id using python

Below is my sample xml source file
<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="http://www.sample.com/xml/catalog" catalog-id="sample-catalog">
<product product-id="214146430">
<online-flag>false</online-flag>
<online-flag site-id="sample_ae">false</online-flag>
<available-flag>true</available-flag>
<searchable-flag>true</searchable-flag>
<tax-class-id>standard</tax-class-id>
<page-attributes/>
<custom-attributes>
<custom-attribute attribute-id="adultsize">L</custom-attribute>
</custom-attributes>
</product>
<product product-id="214146123">
<online-flag>false</online-flag>
<online-flag site-id="sample_ae">false</online-flag>
<available-flag>true</available-flag>
<searchable-flag>true</searchable-flag>
<tax-class-id>standard</tax-class-id>
<page-attributes/>
<custom-attributes>
<custom-attribute attribute-id="adultsize">L</custom-attribute>
</custom-attributes>
</product>
</catalog>
I want to copy only the product with id 214146430 to a new XML file, which should look like this:
<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="http://www.sample.com/xml/catalog" catalog-id="sample-catalog">
<product product-id="214146430">
<online-flag>false</online-flag>
<online-flag site-id="sample_ae">false</online-flag>
<available-flag>true</available-flag>
<searchable-flag>true</searchable-flag>
<tax-class-id>standard</tax-class-id>
<page-attributes/>
<custom-attributes>
<custom-attribute attribute-id="adultsize">L</custom-attribute>
</custom-attributes>
</product>
</catalog>
I have tried xml.etree.ElementTree and xml.dom without luck: my code just copies the entire XML, which is not what I expect.
Below is my python code
import xml.etree.ElementTree as ET
tree = ET.parse('Development/product_data_parser/emporio-imoprt-test.xml')
root = tree.getroot()
print(ET.tostring(root, encoding='utf8').decode('utf8'))
Thanks so much in advance for your help.
You could do it like this, although this solution might be specific to your use case:
import xml.etree.ElementTree as ET
import re
# A list of product IDs that you want to keep
keep = ['214146430']
# Figure out the namespace (used in tag matching later)
def getnamespace(root):
    m = re.match(r'\{.*\}', root.tag)
    return m.group(0) if m is not None else ''
tree = ET.parse('Development/product_data_parser/emporio-imoprt-test.xml')
root = tree.getroot()
namespace = getnamespace(root)
for elem in root.findall(f'{namespace}product'):
    if elem.attrib['product-id'] not in keep:
        root.remove(elem)
print(ET.tostring(root, encoding='utf8').decode('utf8'))
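One thing to be aware of: by default ElementTree serializes namespaced tags with generated prefixes such as ns0:, which differs from the desired output in the question. A possible variation, sketched here under the assumption that the catalog namespace is known up front, registers it as the default namespace before serializing. The SAMPLE string and the helper filter_products() are illustrative.

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "http://www.sample.com/xml/catalog"

# Shortened stand-in for the question's catalog (illustrative only)
SAMPLE = f"""<catalog xmlns="{CATALOG_NS}" catalog-id="sample-catalog">
  <product product-id="214146430"><available-flag>true</available-flag></product>
  <product product-id="214146123"><available-flag>true</available-flag></product>
</catalog>"""


def filter_products(xml_text, keep):
    """Parse the catalog and drop every product whose id is not in `keep`."""
    root = ET.fromstring(xml_text)
    # findall() returns a list, so removing children while looping is safe
    for product in root.findall(f"{{{CATALOG_NS}}}product"):
        if product.get("product-id") not in keep:
            root.remove(product)
    return root


# Map the empty prefix to the catalog namespace so the output uses xmlns="..."
# instead of ns0: prefixes. Note that this registration is global to the process.
ET.register_namespace("", CATALOG_NS)

root = filter_products(SAMPLE, {"214146430"})
filtered = ET.tostring(root, encoding="unicode")
print(filtered)
```

To save the result instead of printing it, ET.ElementTree(root).write("filtered.xml", encoding="UTF-8", xml_declaration=True) should work, where filtered.xml is a placeholder file name.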

Python write result of XML ElementTree findall to a file

I want to write Python code to extract some data from a source XML file and write it to a new file. My source file is like this:
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
<soapenv:Header/>
<soapenv:Body>
<SessionID xmlns="http://www.niku.com/xog">12345</SessionID>
<QueryResult xmlns="http://www.niku.com/xog/Query" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Records>
<Record>
<id>1</id>
<date_start>2020-10-04T00:00:00</date_start>
<date_end>2020-10-10T00:00:00</date_end>
<name>Payne, Max</name>
</Record>
<Record>
<id>2</id>
<date_start>2020-10-04T00:00:00</date_start>
<date_end>2020-10-10T00:00:00</date_end>
<name>Reno, Jean</name>
</Record>
</Records>
</QueryResult>
</soapenv:Body>
</soapenv:Envelope>
I want to write the following output to a new xml file.
<Records>
<Record>
<id>1</id>
<date_start>2020-10-04T00:00:00</date_start>
<date_end>2020-10-10T00:00:00</date_end>
<name>Payne, Max</name>
</Record>
<Record>
<id>2</id>
<date_start>2020-10-04T00:00:00</date_start>
<date_end>2020-10-10T00:00:00</date_end>
<name>Reno, Jean</name>
</Record>
</Records>
I was able to get the following results from this code.
import xml.etree.ElementTree as ET
tree = ET.parse('my_file.xml')
root = tree.getroot()
for xtag in root.findall('.//{http://www.niku.com/xog/Query}Record'):
    print(xtag)
Result:
<Element '{http://www.niku.com/xog/Query}Record' at 0x00000216BA69B778>
<Element '{http://www.niku.com/xog/Query}Record' at 0x00000216BA6A3228>
Can anyone help me to complete my requirement?
In your case print(xtag) prints the Element object's repr and not its XML. To get the XML you need to serialize the element to a string with the module-level ET.tostring() function. Also, it seems you are looking to get the whole <Records> block instead of the individual <Record> elements; for this you don't need a loop.
import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
root = tree.getroot()
records = root.find('.//{http://www.niku.com/xog/Query}Records')
print(ET.tostring(records).decode("utf-8"))
Output
<ns0:Records xmlns:ns0="http://www.niku.com/xog/Query">
<ns0:Record>
<ns0:id>1</ns0:id>
<ns0:date_start>2020-10-04T00:00:00</ns0:date_start>
<ns0:date_end>2020-10-10T00:00:00</ns0:date_end>
<ns0:name>Payne, Max</ns0:name>
</ns0:Record>
<ns0:Record>
<ns0:id>2</ns0:id>
<ns0:date_start>2020-10-04T00:00:00</ns0:date_start>
<ns0:date_end>2020-10-10T00:00:00</ns0:date_end>
<ns0:name>Reno, Jean</ns0:name>
</ns0:Record>
</ns0:Records>
You could also use the lxml module, which gives a slightly different output.
from lxml import etree
tree = etree.parse('test.xml')
root = tree.getroot()
records = root.find('.//{http://www.niku.com/xog/Query}Records')
print(etree.tostring(records).decode("utf-8"))
Output
<Records xmlns="http://www.niku.com/xog/Query" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
<Record>
<id>1</id>
<date_start>2020-10-04T00:00:00</date_start>
<date_end>2020-10-10T00:00:00</date_end>
<name>Payne, Max</name>
</Record>
<Record>
<id>2</id>
<date_start>2020-10-04T00:00:00</date_start>
<date_end>2020-10-10T00:00:00</date_end>
<name>Reno, Jean</name>
</Record>
</Records>
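Neither output above matches the question's target exactly, since the target has no namespace at all, and neither snippet writes a file. Here is a rough sketch of one way to do both with the standard library, assuming it is acceptable to strip the namespace from the extracted tags. The SAMPLE string, the helper extract_records(), and the file name records.xml are illustrative.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

QUERY_NS = "http://www.niku.com/xog/Query"

# Shortened stand-in for the question's SOAP response (illustrative only)
SAMPLE = f"""<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
  <soapenv:Body>
    <QueryResult xmlns="{QUERY_NS}">
      <Records>
        <Record><id>1</id><name>Payne, Max</name></Record>
        <Record><id>2</id><name>Reno, Jean</name></Record>
      </Records>
    </QueryResult>
  </soapenv:Body>
</soapenv:Envelope>"""


def extract_records(xml_text):
    """Return the <Records> element with the namespace stripped from every tag."""
    root = ET.fromstring(xml_text)
    records = root.find(f".//{{{QUERY_NS}}}Records")
    for elem in records.iter():  # iter() includes the Records element itself
        # Namespaced tags look like "{uri}local"; keep only the local part
        if elem.tag.startswith("{"):
            elem.tag = elem.tag.split("}", 1)[1]
    return records


records = extract_records(SAMPLE)

# Write to a temporary directory here; in practice pass your own output path
with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "records.xml")
    ET.ElementTree(records).write(out_path, encoding="utf-8", xml_declaration=True)
    with open(out_path, encoding="utf-8") as fh:
        written = fh.read()

print(written)
```

Stripping the namespace changes the meaning of the elements for namespace-aware consumers, so only do this if the receiving side really expects plain tag names.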

parse xml file with game data in python

I have this game data written in xml format.
<?xml version="1.0" encoding="ISO-8859-1"?>
<log xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://mirror.server.eu/descr.xsd">
<version>0.1</version>
<info>
<timestamp>2018-09-23 16:09:23 CEST</timestamp>
<hostname>server.eu</hostname>
</info>
<events>
<event>
<pickup>
<time>1.506636</time>
<item>item_spikes</item>
<player>player1</player>
<value>50</value>
</pickup>
</event>
<event>
<damage>
<time>1.926975</time>
<attacker>player1</attacker>
<target>player2</target>
<type>sg</type>
<quad>0</quad>
<splash>0</splash>
<value>24</value>
<armor>0</armor>
</damage>
</event>
<event>
<death>
<time>4.862534</time>
<attacker>player2</attacker>
<target>player1</target>
<type>lg_beam</type>
<quad>0</quad>
<armorleft>0</armorleft>
<killheight>0</killheight>
<lifetime>4.862534</lifetime>
</death>
</event>
</events>
</log>
I need to parse it and take out all events called 'death'. Then I need to access every element in that 'death' section. Could you please help me with that?
Assuming that death tags only appear inside events, you can easily do this:
import xml.etree.ElementTree as ET
tree = ET.ElementTree(file='your_game_events.xml')
for event in tree.iter(tag='death'):
    for child in event:
        print("%s: %s" % (child.tag, child.text))
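If you want to work with the values afterwards rather than just print them, each death element can be turned into a dictionary. This is a small sketch, assuming a shortened inline copy of the log; the SAMPLE string and the helper death_events() are illustrative names.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the question's game log (illustrative only)
SAMPLE = """<log>
  <events>
    <event>
      <pickup><time>1.506636</time><player>player1</player></pickup>
    </event>
    <event>
      <death>
        <time>4.862534</time>
        <attacker>player2</attacker>
        <target>player1</target>
        <type>lg_beam</type>
      </death>
    </event>
  </events>
</log>"""


def death_events(root):
    """Return one dict per <death> element, mapping each child tag to its text."""
    return [
        {child.tag: child.text for child in death}
        for death in root.iter("death")
    ]


deaths = death_events(ET.fromstring(SAMPLE))
for death in deaths:
    print(death["attacker"], "killed", death["target"], "at", death["time"])
```

All values come back as strings, so convert fields such as time with float() where you need numbers.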

Delete entire node using lxml

I have an XML document like the following:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>company</groupId>
<artifactId>art-id</artifactId>
<version>RELEASE</version>
</parent>
<properties>
<tomcat.username>admin</tomcat.username>
<tomcat.password>admin</tomcat.password>
</properties>
<dependencies>
<dependency>
<groupId>asdf</groupId>
<artifactId>asdf</artifactId>
<version>[3.8,)</version>
</dependency>
<dependency>
<groupId>asdf</groupId>
<artifactId>asdf</artifactId>
<version>[4.1,)</version>
</dependency>
</dependencies>
</project>
How can I delete the entire node "dependencies"?
I have looked at other questions and answers on Stack Overflow. What is different here is the namespace aspect of this XML; also, the other questions ask how to delete a subelement like "dependency", while I want to delete the whole node "dependencies". Is there an easy way using lxml to delete the entire node?
The following gives a 'NoneType' object has no attribute 'remove' error:
from lxml import etree as ET
tree = ET.parse('pom.xml')
namespace = '{http://maven.apache.org/POM/4.0.0}'
root = ET.Element(namespace+'project')
root.find(namespace+'dependencies').remove()
You can create a dict mapping for your namespace(s), find the node, then call root.remove() and pass it the node. You don't call .remove() on the node itself:
x = """<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>company</groupId>
<artifactId>art-id</artifactId>
<version>RELEASE</version>
</parent>
<properties>
<tomcat.username>admin</tomcat.username>
<tomcat.password>admin</tomcat.password>
</properties>
<dependencies>
<dependency>
<groupId>asdf</groupId>
<artifactId>asdf</artifactId>
<version>[3.8,)</version>
</dependency>
<dependency>
<groupId>asdf</groupId>
<artifactId>asdf</artifactId>
<version>[4.1,)</version>
</dependency>
</dependencies>
</project>"""
import lxml.etree as et
from io import BytesIO
# Python 3: feed lxml bytes, since the document carries an encoding declaration
tree = et.parse(BytesIO(x.encode("utf-8")))
root = tree.getroot()
nsmap = {"mav": "http://maven.apache.org/POM/4.0.0"}
root.remove(root.find("mav:dependencies", namespaces=nsmap))
print(et.tostring(tree).decode("utf-8"))
Which would give you:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>company</groupId>
<artifactId>art-id</artifactId>
<version>RELEASE</version>
</parent>
<properties>
<tomcat.username>admin</tomcat.username>
<tomcat.password>admin</tomcat.password>
</properties>
</project>
First, grab the root node. Since it is <project ... > (vs <project .../>) the "parent" element of dependencies is project. Example from the documentation:
import xml.etree.ElementTree as ET
tree = ET.parse('country_data.xml')
root = tree.getroot()
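Once you have the root, check root.tag (an attribute, not a method); because of the default namespace it should be "{http://maven.apache.org/POM/4.0.0}project".
Then do root.remove(root.find('{http://maven.apache.org/POM/4.0.0}dependencies')), where root is the project node. A bare root.find('dependencies') returns None because the element is namespaced.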
If it were <project .../>, the element would be closed immediately and the nodes after it would fall outside the root, which is not well-formed XML. I can see exactly where you are coming from, though.
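Put together with the standard library, the steps look roughly like this. It is a sketch that assumes a shortened inline POM instead of the real pom.xml; SAMPLE and POM_NS are illustrative names.

```python
import xml.etree.ElementTree as ET

POM_NS = "http://maven.apache.org/POM/4.0.0"

# Shortened stand-in for the question's pom.xml (illustrative only)
SAMPLE = f"""<project xmlns="{POM_NS}">
  <modelVersion>4.0.0</modelVersion>
  <dependencies>
    <dependency><groupId>asdf</groupId></dependency>
  </dependencies>
</project>"""

root = ET.fromstring(SAMPLE)
print(root.tag)  # the tag carries the namespace in {uri}local form

deps = root.find(f"{{{POM_NS}}}dependencies")
if deps is not None:   # find() returns None when nothing matches
    root.remove(deps)  # remove() is called on the parent, with the child as argument

remaining = [child.tag for child in root]
print(remaining)
```

The None check is what guards against the 'NoneType' object has no attribute 'remove' error from the question.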

how to change a node value in python

<?xml version="1.0"?>
<info>
</tags>
</tags>
<area>
<media>
<options>
<name>Jaipur</name>
</options>
</media>
</area>
</info>
I am totally new to Python. Here is my XML file; I want to edit an element's value at run time in Python.
That is, I want to change <name>Jaipur</name> to <name>Mumbai</name>.
First, the example is not valid xml. You can use xml.etree that comes included:
from xml.etree import ElementTree as et
xmlstr="""\
<?xml version="1.0"?>
<area>
<media>
<options>
<name>Jaipur</name>
</options>
</media>
</area>"""
doc = et.fromstring(xmlstr)
doc.find('.//name').text = 'Mumbai'
print(et.tostring(doc, encoding='unicode'))
output:
<area>
<media>
<options>
<name>Mumbai</name>
</options>
</media>
</area>
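If the XML lives in a file rather than a string, the same change can be written back to disk. A minimal sketch follows, assuming the file is well-formed; the helper set_name() and the file name info.xml are hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def set_name(path, new_name):
    """Replace the text of the first <name> element in the file at `path`."""
    tree = ET.parse(path)
    name = tree.getroot().find(".//name")
    if name is None:
        raise ValueError("no <name> element found")
    name.text = new_name
    # Overwrite the original file with the modified tree
    tree.write(path, encoding="utf-8", xml_declaration=True)


# Demo on a throwaway file; in practice pass the path to your own XML file
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "info.xml")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("<area><media><options><name>Jaipur</name></options></media></area>")
    set_name(path, "Mumbai")
    updated = ET.parse(path).getroot().findtext(".//name")

print(updated)
```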
