Delete entire node using lxml - python

I have an XML document like the following:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>company</groupId>
<artifactId>art-id</artifactId>
<version>RELEASE</version>
</parent>
<properties>
<tomcat.username>admin</tomcat.username>
<tomcat.password>admin</tomcat.password>
</properties>
<dependencies>
<dependency>
<groupId>asdf</groupId>
<artifactId>asdf</artifactId>
<version>[3.8,)</version>
</dependency>
<dependency>
<groupId>asdf</groupId>
<artifactId>asdf</artifactId>
<version>[4.1,)</version>
</dependency>
</dependencies>
</project>
How can I delete the entire "dependencies" node?
I have looked at other questions and answers on Stack Overflow. What is different here is the namespace in this XML, and the other questions ask how to delete a subelement like "dependency", while I want to delete the whole "dependencies" node. Is there an easy way to do this with lxml?
The following gives a 'NoneType' object has no attribute 'remove' error:
from lxml import etree as ET
tree = ET.parse('pom.xml')
namespace = '{http://maven.apache.org/POM/4.0.0}'
root = ET.Element(namespace+'project')
root.find(namespace+'dependencies').remove()

You can create a dict that maps a prefix to your namespace(s), find the node, and then call root.remove(), passing it the node. You call remove on the parent, not on the node itself:
x = """<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>company</groupId>
<artifactId>art-id</artifactId>
<version>RELEASE</version>
</parent>
<properties>
<tomcat.username>admin</tomcat.username>
<tomcat.password>admin</tomcat.password>
</properties>
<dependencies>
<dependency>
<groupId>asdf</groupId>
<artifactId>asdf</artifactId>
<version>[3.8,)</version>
</dependency>
<dependency>
<groupId>asdf</groupId>
<artifactId>asdf</artifactId>
<version>[4.1,)</version>
</dependency>
</dependencies>
</project>"""
import lxml.etree as et
from io import BytesIO
# parse bytes: the document carries its own encoding declaration
tree = et.parse(BytesIO(x.encode("utf-8")))
root = tree.getroot()
nsmap = {"mav": "http://maven.apache.org/POM/4.0.0"}
root.remove(root.find("mav:dependencies", namespaces=nsmap))
print(et.tostring(tree).decode())
Which would give you:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>company</groupId>
<artifactId>art-id</artifactId>
<version>RELEASE</version>
</parent>
<properties>
<tomcat.username>admin</tomcat.username>
<tomcat.password>admin</tomcat.password>
</properties>
</project>
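The same idea extends to nodes that are not direct children of the root. As a minimal sketch, assuming lxml is installed, each element can reach its own parent through getparent(), so you can remove a node wherever it sits. The trimmed POM and the artifact names keep-me and drop-me are made up for illustration:

```python
from lxml import etree

# Illustrative, trimmed POM; the artifact names are hypothetical.
POM = b"""<project xmlns="http://maven.apache.org/POM/4.0.0">
  <dependencies>
    <dependency><artifactId>keep-me</artifactId></dependency>
    <dependency><artifactId>drop-me</artifactId></dependency>
  </dependencies>
</project>"""

NS = {"mav": "http://maven.apache.org/POM/4.0.0"}  # the prefix "mav" is arbitrary
root = etree.fromstring(POM)

# findall() returns a list, so removing nodes while looping over it is safe.
for dep in root.findall(".//mav:dependency", namespaces=NS):
    if dep.findtext("mav:artifactId", namespaces=NS) == "drop-me":
        dep.getparent().remove(dep)  # remove through the node's own parent

left = [e.text for e in root.iterfind(".//mav:artifactId", namespaces=NS)]
print(left)
```

Note that getparent() is specific to lxml; the standard library's ElementTree elements do not keep a reference to their parent.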

First, grab the root node. Since it is <project ... > (vs <project .../>) the "parent" element of dependencies is project. Example from the documentation:
import xml.etree.ElementTree as ET
tree = ET.parse('country_data.xml')
root = tree.getroot()
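Once you have the root, check root.tag (an attribute, not a method). Because of the default namespace it should be "{http://maven.apache.org/POM/4.0.0}project".
Then do root.remove(root.find('{http://maven.apache.org/POM/4.0.0}dependencies')), where root is the project node. The namespace has to be part of the tag name, otherwise find returns None.
If it were <project .../>, the element would be empty and could not contain dependencies at all. I can see exactly where you are coming from, though.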
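Put together with the standard library only, the removal might look like the sketch below. The trimmed POM string and the helper name remove_child are illustrative and do not come from the question:

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative stand-in for the pom.xml in the question.
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <dependencies>
    <dependency><groupId>asdf</groupId></dependency>
  </dependencies>
</project>"""

NS = {"mav": "http://maven.apache.org/POM/4.0.0"}  # the prefix "mav" is arbitrary


def remove_child(parent, tag, namespaces):
    """Remove the first direct child matching tag; return True if one was removed."""
    node = parent.find(tag, namespaces)
    if node is None:  # find() returns None when nothing matches
        return False
    parent.remove(node)
    return True


root = ET.fromstring(POM)
removed = remove_child(root, "mav:dependencies", NS)
remaining = [child.tag for child in root]
print(removed, remaining)
```

Checking for None before calling remove avoids the kind of error shown in the question when the lookup fails.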

Related

How to create an objectified element with text with lxml

I'd like to create an element tree (not parsing!) with lxml.objectify that might look like this:
<root>
<child>Hello World</child>
</root>
My first attempt was to write code like this:
import lxml.objectify as o
from lxml.etree import tounicode
r = o.Element("root")
c = o.Element("child", text="Hello World")
r.append(c)
print(tounicode(r, pretty_print=True))
But that produces:
<root xmlns:py="http://codespeak.net/lxml/objectify/pytype"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema" py:pytype="TREE">
<child text="Hello World" data="Test" py:pytype="TREE"/>
</root>
As suggested in other answers, the <child> has no method _setText.
Apparently, lxml.objectify does not allow creating an element with text or changing the text content. So, did I miss something?
Following the documentation and the answer you linked, you should use SubElement:
r = o.E.root() # same as o.Element("root")
c = o.SubElement(r, "child")
c._setText("Hello World")
print(tounicode(r, pretty_print=True))
c._setText("Changed it!")
print(tounicode(r, pretty_print=True))
Output:
<root xmlns:py="http://codespeak.net/lxml/objectify/pytype" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<child>Hello World</child>
</root>
<root xmlns:py="http://codespeak.net/lxml/objectify/pytype" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<child>Changed it!</child>
</root>
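Another route, sketched here on the assumption that plain attribute assignment suits your use case, is to let objectify create the child for you and then strip the type annotations with objectify.deannotate. The variable names are illustrative:

```python
from lxml import etree, objectify

root = objectify.Element("root")
root.child = "Hello World"  # attribute assignment creates <child> holding that text

# Remove the py:pytype annotations and the namespace declarations left unused.
objectify.deannotate(root, cleanup_namespaces=True)

xml_text = etree.tostring(root, encoding="unicode")
print(xml_text)
```

Assigning to root.child again replaces the text in the same way, so no call to _setText is needed.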

Issue with python script while parsing pom file in project

I'm having an issue extracting the version number using a Python script. It returns None when I run the script. Can someone help me with this?
Python Script:
import xml.etree.ElementTree as ET
tree = ET.parse('pom.xml')
root = tree.getroot()
releaseVersion = root.find("version")
print(releaseVersion)
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://maven.apache.org/POM/4.0.0"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
<artifactId>watcher</artifactId>
<version>0.0.1-SNAPSHOT</version>
<groupId>com.test</groupId>
<name>file</name>
<packaging>jar</packaging>
<parent>
<artifactId>spring-boot-starter-parent</artifactId>
<groupId>org.springframework.boot</groupId>
<relativePath/>
<version>2.6.1</version>
</parent>
</project>
You're not taking into account that all of your elements are in a default namespace defined by xmlns="http://maven.apache.org/POM/4.0.0" in your <project> element.
So you have to create your query with this namespace.
import xml.etree.ElementTree as ET
tree = ET.parse('pom.xml')
root = tree.getroot()
NS = { 'maven' : 'http://maven.apache.org/POM/4.0.0' }
releaseVersion = root.find("maven:version",NS)
print(releaseVersion.text)
Here NS = { ... } maps the prefix maven to the namespace URI, and that prefix is then used in the path expression passed to find.
Your pom.xml has a namespace xmlns="http://maven.apache.org/POM/4.0.0" in the project tag.
If you want to search with the full name, use the form {namespace}tag:
>>> root.find("{http://maven.apache.org/POM/4.0.0}version")
<Element '{http://maven.apache.org/POM/4.0.0}version' at 0x0000014635EC0A40>
If you don't care which namespace the tag is in, you can search with {*}tag (supported since Python 3.8):
>>> root.find("{*}version")
<Element '{http://maven.apache.org/POM/4.0.0}version' at 0x0000014635EC0A40>
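This POM contains two version elements, one for the project and one inside parent, so it helps to see which path matches which. The sketch below uses a trimmed copy of the question's pom.xml and findtext, which returns the text directly; the variable names are illustrative:

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative version of the pom.xml from the question.
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <artifactId>watcher</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <parent>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.6.1</version>
  </parent>
</project>"""

NS = {"maven": "http://maven.apache.org/POM/4.0.0"}
root = ET.fromstring(POM)

# A bare tag path matches direct children only: the project's own version.
project_version = root.findtext("maven:version", namespaces=NS)
# The parent's version needs an explicit path through <parent>.
parent_version = root.findtext("maven:parent/maven:version", namespaces=NS)
# Without the namespace nothing matches, so findtext() returns the default.
missing = root.findtext("version", default="not found")

print(project_version, parent_version, missing)
```

Using findtext with a default also avoids an AttributeError on .text when the element is absent.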

Parsing XML in Python with ElementTree - findall()

I'm using the documentation here to try to get only the values (address, mask) for certain elements.
This is an example of the structure of my XML:
<?xml version="1.0" ?>
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="urn:uuid:52622325-b136-40cf-bc36-85332e25b6f3" xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0">
<data>
<native xmlns="http://cisco.com/ns/yang/Cisco-IOS-XE-native">
<interface>
<GigabitEthernet>
<name>1</name>
<ip>
<address>
<primary>
<address>192.168.40.30</address>
<mask>255.255.255.0</mask>
</primary>
</address>
</ip>
<logging>
<event>
<link-status/>
</event>
</logging>
<mop>
<enabled>false</enabled>
<sysid>false</sysid>
</mop>
<negotiation xmlns="http://cisco.com/ns/yang/Cisco-IOS-XE-ethernet">
<auto>true</auto>
</negotiation>
</GigabitEthernet>
<GigabitEthernet>
<name>2</name>
<ip>
<address>
<primary>
<address>10.10.10.1</address>
<mask>255.255.255.0</mask>
</primary>
</address>
</ip>
<logging>
<event>
<link-status/>
</event>
</logging>
<mop>
<enabled>false</enabled>
<sysid>false</sysid>
</mop>
<negotiation xmlns="http://cisco.com/ns/yang/Cisco-IOS-XE-ethernet">
<auto>true</auto>
</negotiation>
</GigabitEthernet>
<GigabitEthernet>
<name>3</name>
<ip>
<address>
<primary>
<address>30.30.30.1</address>
<mask>255.255.255.0</mask>
</primary>
</address>
</ip>
<logging>
<event>
<link-status/>
</event>
</logging>
<mop>
<enabled>false</enabled>
<sysid>false</sysid>
</mop>
<negotiation xmlns="http://cisco.com/ns/yang/Cisco-IOS-XE-ethernet">
<auto>true</auto>
</negotiation>
</GigabitEthernet>
<GigabitEthernet>
<name>4</name>
<logging>
<event>
<link-status/>
</event>
</logging>
<mop>
<enabled>false</enabled>
<sysid>false</sysid>
</mop>
<negotiation xmlns="http://cisco.com/ns/yang/Cisco-IOS-XE-ethernet">
<auto>true</auto>
</negotiation>
</GigabitEthernet>
</interface>
</native>
</data>
</rpc-reply>
Working off this example in the documentation, I've tried something like this:
import xml.etree.ElementTree as ET
tree = ET.parse("C:\\Users\\Redha\\Documents\\test_network\\interface123.xml")
root = tree.getroot()
for i in root.findall('native'):
    print(i.tag)
But it returns nothing. I've tried other things without success. Any ideas? All advice appreciated. Thank you!
Consider using namespaces when referencing XML elements:
import xml.etree.ElementTree as ET
# declare XML namespaces
namespaces = {'native': 'http://cisco.com/ns/yang/Cisco-IOS-XE-native'}
tree = ET.parse("C:\\Users\\Redha\\Documents\\test_network\\interface123.xml")
root = tree.getroot()
# call findall() using previously created namespaces map
for i in root.findall('.//native:native', namespaces):
    print(i.tag)
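To get from the native element down to the address and mask values the question asks for, the same namespace map can be reused on every step of the path. This is a minimal sketch against a trimmed sample with the same nesting; the prefix n and the addresses dict are illustrative choices:

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative sample with the same nesting as the reply in the question.
REPLY = """<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <data>
    <native xmlns="http://cisco.com/ns/yang/Cisco-IOS-XE-native">
      <interface>
        <GigabitEthernet>
          <name>1</name>
          <ip><address><primary>
            <address>192.168.40.30</address>
            <mask>255.255.255.0</mask>
          </primary></address></ip>
        </GigabitEthernet>
        <GigabitEthernet>
          <name>4</name>
        </GigabitEthernet>
      </interface>
    </native>
  </data>
</rpc-reply>"""

NS = {"n": "http://cisco.com/ns/yang/Cisco-IOS-XE-native"}  # prefix "n" is arbitrary
root = ET.fromstring(REPLY)

addresses = {}
for iface in root.findall(".//n:GigabitEthernet", NS):
    name = iface.findtext("n:name", namespaces=NS)
    primary = iface.find("n:ip/n:address/n:primary", NS)
    if primary is None:  # interface without an IP block, like number 4
        continue
    addresses[name] = (
        primary.findtext("n:address", namespaces=NS),
        primary.findtext("n:mask", namespaces=NS),
    )

print(addresses)
```

Every step needs the prefix because the child elements inherit the default namespace declared on native.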

Cannot Parse XML file using Python

<?xml version="1.0" encoding="utf-8"?>
<AcResponse
Command="hist"
TaskId="408709">
<element
name="/build.gradle"
id="93527">
<transaction
id="1117194"
type="promote"
time="1529083792"
user="soarfa99">
<comment>Automated promotion to parent stream by module build: jenkins-SC-MODULE-CS-SC-TRUNK-MedRec-DEV-CI-430</comment>
<version
virtual="11007/75"
real="36877/2"
virtualNamedVersion="CS-SC-TRUNK-INTG/75"
realNamedVersion="CS-SC-TRUNK-MedRec-DEV2_ar037601/2"
elem_type="text"
dir="no">
<issueNum>72768</issueNum>
</version>
</transaction>
<transaction
id="1111652"
type="promote"
time="1528100495"
user="dm041068">
<comment>SEDA file add- Debajyoti</comment>
<version
virtual="11007/74"
real="39225/1"
virtualNamedVersion="CS-SC-TRUNK-INTG/74"
realNamedVersion="CS-SC-TRUNK-CM-DEV-Debajyoti_dm041068/1"
elem_type="text"
dir="no">
<issueNum>72629</issueNum>
</version>
</transaction>
</element>
<streams>
<stream
id="11007"
name="CS-SC-TRUNK-INTG"
type="normal"/>
</streams>
</AcResponse>
This is the XML I am trying to parse, and I am trying to extract the attribute 'issueNum' with the following code:
tree=ET.parse(xml)
root=tree.getroot()
for item in root.findall('version'):
    for child in item:
        print(child.attrib['issueNum'])
Can you please help me get the value of "issueNum"?
You can use an xpath expression to find the values of issueNum:
from lxml import etree
xml = '''<?xml version="1.0" encoding="utf-8"?>
<AcResponse
Command="hist"
TaskId="408709">....'''
# lxml rejects str input that carries an encoding declaration, so pass bytes
tree = etree.fromstring(xml.encode("utf-8"))
issues = tree.xpath('//version/issueNum')
for issue in issues:
    print(issue.text)
This prints:
72768
72629
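If you would rather stay with the standard library used in the question, a similar lookup works there too. The sketch below uses a trimmed sample with the same nesting and shows why the original loop found nothing: findall('version') only looks at direct children of the root, and issueNum is an element with text, not an attribute.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative sample with the same nesting as the question's XML.
XML = """<AcResponse Command="hist">
  <element name="/build.gradle">
    <transaction id="1117194">
      <version virtual="11007/75"><issueNum>72768</issueNum></version>
    </transaction>
    <transaction id="1111652">
      <version virtual="11007/74"><issueNum>72629</issueNum></version>
    </transaction>
  </element>
</AcResponse>"""

root = ET.fromstring(XML)

direct = root.findall("version")                # direct children only: no match
nested = root.findall(".//version/issueNum")    # any depth below the root
issue_numbers = [node.text for node in nested]  # element text, not an attribute

print(len(direct), issue_numbers)
```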

how to change a node value in python

<?xml version="1.0"?>
<info>
</tags>
</tags>
<area>
<media>
<options>
<name>Jaipur</name>
</options>
</media>
</area>
</info>
I am totally new to Python. Here is my XML file, and I want to edit an element's value at run time in Python.
That is, I want to change <name>Jaipur</name> to <name>Mumbai</name>.
First, the example is not valid XML. You can use xml.etree, which is included with Python:
from xml.etree import ElementTree as et
xmlstr="""\
<?xml version="1.0"?>
<area>
<media>
<options>
<name>Jaipur</name>
</options>
</media>
</area>"""
doc=et.fromstring(xmlstr)
doc.find('.//name').text='Mumbai'
print(et.tostring(doc, encoding="unicode"))
output:
<area>
<media>
<options>
<name>Mumbai</name>
</options>
</media>
</area>
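Since the question mentions an XML file, here is a minimal sketch of the same change done as a file round trip. The file name info.xml and the temporary directory are assumptions made so the example is self-contained:

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Write a small sample file so the sketch is self-contained (hypothetical path).
path = os.path.join(tempfile.mkdtemp(), "info.xml")
with open(path, "w", encoding="utf-8") as fh:
    fh.write("<area><media><options><name>Jaipur</name></options></media></area>")

tree = ET.parse(path)
name = tree.getroot().find(".//name")
if name is not None:  # find() returns None when the tag is absent
    name.text = "Mumbai"
tree.write(path, encoding="utf-8", xml_declaration=True)  # save the change

new_value = ET.parse(path).getroot().findtext(".//name")
print(new_value)
```

Writing through the tree object rather than the element keeps the XML declaration in the saved file.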
