Node.TEXT_NODE has the value, but I need the Attribute - python

I have an xml file like so:
<host name='ip-10-196-55-2.ec2.internal'>
<hostvalue name='arch_string'>lx24-x86</hostvalue>
<hostvalue name='num_proc'>1</hostvalue>
<hostvalue name='load_avg'>0.01</hostvalue>
</host>
I can get get out the Node.data from a Node.TEXT_NODE, but I also need the Attribute name, like I want to know load_avg = 0.01, without writing load_avg, num_proc, etc, one by one. I want them all.
My code looks like this, but I can't figure out what part of the Node has the attribute name for me.
for stat in h.getElementsByTagName("hostvalue"):
for node3 in stat.childNodes:
attr = "foo"
val = "poo"
if node3.nodeType == Node.ATTRINUTE_NODE:
attr = node3.tagName
if node3.nodeType == Node.TEXT_NODE:
#attr = node3.tagName
val = node3.data
From the above code, I'm able to get val, but not attr (compile error:

here's a short example of what you could achieve:
from xml.dom import minidom
xmldoc = minidom.parse("so.xml")
values = {}
for stat in xmldoc.getElementsByTagName("hostvalue"):
attr = stat.attributes["name"].value
value = "\n".join([x.data for x in stat.childNodes])
values[attr] = value
print repr(values)
This outputs, given your XML file:
$ ./parse.py
{u'num_proc': u'1', u'arch_string': u'lx24-x86', u'load_avg': u'0.01'}
Be warned that this is not failsafe, i.e. if you have nested elements inside <hostvalue>.

Related

XML parsing in python issue using elementTree

I need to parse a soap response and convert to a text file. I am trying to parse the values as detailed below. I am using ElementTree in python
I have the below xml response which I need to parse
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:tmf854="tmf854.v1" xmlns:alu="alu.v1">
<soapenv:Header>
<tmf854:header>
<tmf854:activityName>query</tmf854:activityName>
<tmf854:msgName>queryResponse</tmf854:msgName>
<tmf854:msgType>RESPONSE</tmf854:msgType>
<tmf854:senderURI>https:/destinationhost:8443/tmf854/services</tmf854:senderURI>
<tmf854:destinationURI>https://localhost:8443</tmf854:destinationURI>
<tmf854:activityStatus>SUCCESS</tmf854:activityStatus>
<tmf854:correlationId>1</tmf854:correlationId>
<tmf854:communicationPattern>MultipleBatchResponse</tmf854:communicationPattern>
<tmf854:communicationStyle>RPC</tmf854:communicationStyle>
<tmf854:requestedBatchSize>1500</tmf854:requestedBatchSize>
<tmf854:batchSequenceNumber>1</tmf854:batchSequenceNumber>
<tmf854:batchSequenceEndOfReply>true</tmf854:batchSequenceEndOfReply>
<tmf854:iteratorReferenceURI>http://9195985371165397084</tmf854:iteratorReferenceURI>
<tmf854:timestamp>20220915222121.472+0530</tmf854:timestamp>
</tmf854:header>
</soapenv:Header>
<soapenv:Body>
<queryResponse xmlns="alu.v1">
<queryObjectData>
<queryObject>
<name>
<tmf854:mdNm>AMS</tmf854:mdNm>
<tmf854:meNm>CHEERLAVANCHA_281743</tmf854:meNm>
<tmf854:ptpNm>/type=NE/CHEERLAVANCHA_281743</tmf854:ptpNm>
</name>
<vendorExtensions>
<package>
<NameAndStringValue>
<tmf854:name>hubSubtendedStatus</tmf854:name>
<tmf854:value>NONE</tmf854:value>
</NameAndStringValue>
<NameAndStringValue>
<tmf854:name>productAndRelease</tmf854:name>
<tmf854:value>DF.6.1</tmf854:value>
</NameAndStringValue>
<NameAndStringValue>
<tmf854:name>adminUserName</tmf854:name>
<tmf854:value>isadmin</tmf854:value>
</NameAndStringValue>
<NameAndStringValue>
</package>
</vendorExtensions>
</queryObject>
</queryObjectData>
</queryResponse>
</soapenv:Body>
</soapenv:Envelope>
I need to use the below code snippet.
parser = ElementTree.parse("response.txt")
root = parser.getroot()
inventoryObjectData = root.find(".//{alu.v1}queryObjectData")
for inventoryObject in inventoryObjectData:
for device in inventoryObject:
if (device.tag.split("}")[1]) == "me":
vendorExtensionsNames = []
vendorExtensionsValues = []
if device.find(".//{tmf854.v1}mdNm") is not None:
mdnm = device.find(".//{tmf854.v1}mdNm").text
if device.find(".//{tmf854.v1}meNm") is not None:
menm = device.find(".//{tmf854.v1}meNm").text
if device.find(".//{tmf854.v1}userLabel") is not None:
userlabel = device.find(".//{tmf854.v1}userLabel").text
if device.find(".//{tmf854.v1}resourceState") is not None:
resourcestate = device.find(".//{tmf854.v1}resourceState").text
if device.find(".//{tmf854.v1}location") is not None:
location = device.find(".//{tmf854.v1}location").text
if device.find(".//{tmf854.v1}manufacturer") is not None:
manufacturer = device.find(".//{tmf854.v1}manufacturer").text
if device.find(".//{tmf854.v1}productName") is not None:
productname = device.find(".//{tmf854.v1}productName").text
if device.find(".//{tmf854.v1}version") is not None:
version = device.find(".//{tmf854.v1}version").text
vendorExtensions = device.find("vendorExtensions")
vendorExtensionsNamesElements = vendorExtensions.findall(".//{tmf854.v1}name")
for i in vendorExtensionsNamesElements:
vendorExtensionsNames.append(i.text.strip())
vendorExtensionsValuesElements = vendorExtensions.findall(".//{tmf854.v1}value")
for i in vendorExtensionsValuesElements:
vendorExtensionsValues.append(str(i.text or "").strip())
alu = ""
for i in vendorExtensions:
if i.attrib:
if alu == "":
alu = i.attrib.get("{alu.v1}name")
else:
alu = alu + "|" + i.attrib.get("{alu.v1}name")
The issue is that The below code is not able to find the 'vendorExtensions"'. Please help here.
vendorExtensions = device.find("vendorExtensions")
Have tried the below as well
vendorExtensions = device.find(".//queryObject/vendorExtensions")
Your document declares a default namespace of alu.v1:
<queryResponse xmlns="alu.v1">
...
</queryResponse>
Any attribute without an explicit namespace is in the alu.v1 namespace. You need to qualify your attribute name appropriately:
vendorExtensions = device.find("{alu.v1}vendorExtensions")
While the above is a real problem with your code that needs to be corrected (the Wikipedia entry on XML namespaces may be useful reading if you're unfamiliar with how namespaces work), there are also some logic problems with your code.
Let's drop the big list of conditionals from the code and see if it's actually doing what we think it's doing. If we run this:
from xml.etree import ElementTree
parser = ElementTree.parse("data.xml")
root = parser.getroot()
queryObjectData = root.find(".//{alu.v1}queryObjectData")
for queryObject in queryObjectData:
for device in queryObject:
print(device.tag)
Then using your sample data (once it has been corrected to be syntactically valid), we see as output:
{alu.v1}name
{alu.v1}vendorExtensions
Your search for the {alu.v1}vendorExtensions element will never succeed before the thing on which you're trying to search (the device variable) is the thing you're trying to find.
Additionally, the conditional in your loop...
if (device.tag.split("}")[1]) == "me":
...will never match (there is no element in the entire document for which tag.split("}")[1] == "me" is True).
I'm not entirely clear what you're trying to do, but here's are some thoughts:
Given your example data, you probably don't want that for device in inventoryObject: loop
We can drastically simplify your code by replacing that long block of conditionals with a list of attributes in which we are interested and then a for loop to extract them.
Rather than assigning a bunch of individual variables, we can build up a dictionary with the data from the queryObject
That might look like:
from xml.etree import ElementTree
import json
attributeNames = [
"mdNm",
"meNm",
"userLabel",
"resourceState",
"location",
"manufacturer",
"productName",
"version",
]
parser = ElementTree.parse("data.xml")
root = parser.getroot()
queryObjectData = root.find(".//{alu.v1}queryObjectData")
for queryObject in queryObjectData:
device = {}
for name in attributeNames:
if (value := queryObject.find(f".//{{tmf854.v1}}{name}")) is not None:
device[name] = value.text
vendorExtensions = queryObject.find("{alu.v1}vendorExtensions")
extensionMap = {}
for extension in vendorExtensions.findall(".//{alu.v1}NameAndStringValue"):
extname = extension.find("{tmf854.v1}name").text
extvalue = extension.find("{tmf854.v1}value").text
extensionMap[extname] = extvalue
device["vendorExtensions"] = extensionMap
print(json.dumps(device, indent=2))
Given your example data, this outputs:
{
"mdNm": "AMS",
"meNm": "CHEERLAVANCHA_281743",
"vendorExtensions": {
"hubSubtendedStatus": "NONE",
"productAndRelease": "DF.6.1",
"adminUserName": "isadmin"
}
}
An alternate approach, in which we just transform each queryObject into a dictionary, might look like this:
from xml.etree import ElementTree
import json
def localName(ele):
return ele.tag.split("}")[1]
def etree_to_dict(t):
if list(t):
d = {}
for child in t:
if localName(child) == "NameAndStringValue":
d.update(dict([[x.text.strip() for x in child]]))
else:
d.update({localName(child): etree_to_dict(child) for child in t})
return d
else:
return t.text.strip()
parser = ElementTree.parse("data.xml")
root = parser.getroot()
queryObjectData = root.find(".//{alu.v1}queryObjectData") or []
for queryObject in queryObjectData:
d = etree_to_dict(queryObject)
print(json.dumps(d, indent=2))
This will output:
{
"name": {
"mdNm": "AMS",
"meNm": "CHEERLAVANCHA_281743",
"ptpNm": "/type=NE/CHEERLAVANCHA_281743"
},
"vendorExtensions": {
"package": {
"hubSubtendedStatus": "NONE",
"productAndRelease": "DF.6.1",
"adminUserName": "isadmin"
}
}
}
That may or may not be appropriate depending on the structure of your real data and exactly what you're trying to accomplish.

Convert a dot notation string to object to access lxml objects python

I am trying to update the xml using object notation using lxml objectify.
<xml>
<fruit>
<citrus>
<lemon />
</citrus>
</fruit>
</xml>
I am trying to add another fruit called mango using lxml objectify like
root = lxml.objectify.fromstring(xml_string)
root.fruit.citrus = 'orange'
def update(path, value):
// code
update('fruit.citrus', 'orange')
I would like to pass a string like 'fruit.citrus' because I cannot pass an object fruit.citrus.
How do I achieve this in Python ie how do I execute the code 'root.fruit.citrus = 'orange' inside the update function. How to convert string to object?
Try Below solution:
import lxml.objectify, lxml.etree
xml = '<xml> <fruit> <citrus> <lemon /> </citrus> </fruit> </xml>'
root = lxml.objectify.fromstring(xml)
print("Before:")
print(lxml.etree.tostring(root))
def update(path, value):
parent = None
lst = path.split('.')
while lst:
ele = lst.pop(0)
parent = getattr(root, ele) if parent is None else getattr(parent, ele)
lxml.etree.SubElement(parent, value)
update('fruit.citrus', 'orange')
print("After:")
print(lxml.etree.tostring(root))
Output:
Before:
b'<xml><fruit><citrus><lemon/></citrus></fruit></xml>'
After:
b'<xml><fruit><citrus><lemon/><orange/></citrus></fruit></xml>'
If you insist on using objectify, you may not like this, but I think this is a pretty clean solution using lxml etree:
from lxml import etree
doc = etree.fromstring("""<xml>
<fruit>
<citrus>
<lemon />
</citrus>
</fruit>
</xml>""")
def update(root, path, item):
elems = root.xpath(path)
for elem in elems:
elem.append(etree.Element(item))
update(doc, 'fruit/citrus', 'orange')
print(etree.tostring(doc).decode())
The above answers were partially correct. It did not have the ability to handle indexes. The below code handles all the cases with ObjectPath (https://lxml.de/objectify.html).
import lxml.objectify, lxml.etree
from robot.api.deco import keyword
class ConfigXML(object):
def get_xml(self, filename):
self.root = lxml.objectify.fromstring(open(filename).read())
def add_attribute(self, path, **kwargs):
path_obj = lxml.objectify.ObjectPath(path)
for key in kwargs:
path_obj.find(self.root).set(key, kwargs[key])
def add_value(self, path, value):
path_obj = lxml.objectify.ObjectPath(path)
path_obj.setattr(self.root, value)
def add_tag(self, path, tag):
path_obj = lxml.objectify.ObjectPath(path)
lxml.objectify.SubElement(path_obj.find(self.root), tag)
def generate_xml(self):
lxml.objectify.deannotate(self.root, cleanup_namespaces=True, xsi_nil=True)
return lxml.etree.tostring(self.root).decode('utf-8')

Dictionary key and value flipping themselves unexpectedly

I am running python 3.5, and I've defined a function that creates XML SubElements and adds them under another element. The attributes are in a dictionary, but for some reason the dictionary keys and values will sometimes flip when I execute the script.
Here is a snippet of kind of what I have (the code is broken into many functions so I combined it here)
import xml.etree.ElementTree as ElementTree
def AddSubElement(parent, tag, text='', attributes = None):
XMLelement = ElementTree.SubElement(parent, tag)
XMLelement.text = text
if attributes != None:
for key, value in attributes:
XMLelement.set(key, value)
print("attributes =",attributes)
return XMLelement
descriptionTags = ([('xmlns:g' , 'http://base.google.com/ns/1.0')])
XMLroot = ElementTree.Element('rss')
XMLroot.set('version', '2.0')
XMLchannel = ElementTree.SubElement(XMLroot,'channel')
AddSubElement(XMLchannel,'g:description', 'sporting goods', attributes=descriptionTags )
AddSubElement(XMLchannel,'link', 'http://'+ domain +'/')
XMLitem = AddSubElement(XMLchannel,'item')
AddSubElement(XMLitem, 'g:brand', Product['ProductManufacturer'], attributes=bindingParam)
AddSubElement(XMLitem, 'g:description', Product['ProductDescriptionShort'], attributes=bindingParam)
AddSubElement(XMLitem, 'g:price', Product['ProductPrice'] + ' USD', attributes=bindingParam)
The key and value does get switched! Because I'll see this in the console sometimes:
attributes = [{'xmlns:g', 'http://base.google.com/ns/1.0'}]
attributes = [{'http://base.google.com/ns/1.0', 'xmlns:g'}]
attributes = [{'http://base.google.com/ns/1.0', 'xmlns:g'}]
...
And here is the xml string that sometimes comes out:
<rss version="2.0">
<channel>
<title>example.com</title>
<g:description xmlns:g="http://base.google.com/ns/1.0">sporting goods</g:description>
<link>http://www.example.com/</link>
<item>
<g:id http://base.google.com/ns/1.0="xmlns:g">8987983</g:id>
<title>Some cool product</title>
<g:brand http://base.google.com/ns/1.0="xmlns:g">Cool</g:brand>
<g:description http://base.google.com/ns/1.0="xmlns:g">Why is this so cool?</g:description>
<g:price http://base.google.com/ns/1.0="xmlns:g">69.00 USD</g:price>
...
What is causing this to flip?
attributes = [{'xmlns:g', 'http://base.google.com/ns/1.0'}]
This is a list containing a set, not a dictionary. Neither sets nor dictionaries are ordered.

Xpath select attribute of current node?

I use python with lxml to process the xml. After I query/filter to get to a nodes I want but I have some problem. How to get its attribute's value by xpath ? Here is my input example.
>print(etree.tostring(node, pretty_print=True ))
<rdf:li xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" rdf:resource="urn:miriam:obo.chebi:CHEBI%3A37671"/>
The value I want is in resource=... . Currently I just use the lxml to get the value. I wonder if it is possible to do in pure xpath ? thanks
EDIT: Forgot to said, this is not a root nodes so I can't use // here. I have like 2000-3000 others in xml file. My first attempt was playing around with ".#attrib" and "self::*#" but those does not seems to work.
EDIT2: I will try my best to explain (well, this is my first time to deal with xml problem using xpath. and english is not one of my favorite field....). Here is my input snippet http://pastebin.com/kZmVdbQQ (full one from here http://www.comp-sys-bio.org/yeastnet/ using version 4).
In my code, I try to get speciesTypes node with resource link chebi (<rdf:li rdf:resource="urn:miriam:obo.chebi:...."/>). and then I tried to get value from rdf:resource attribute in rdf:li. The thing is, I am pretty sure it would be easy to get attribute in child node if I start from parent node like speciesTypes, but I wonder how to do if I start from rdf:li. From my understanding, the "//" in xpath will looking for node from everywhere not just only in the current node.
below is my code
import lxml.etree as etree
tree = etree.parse("yeast_4.02.xml")
root = tree.getroot()
ns = {"sbml": "http://www.sbml.org/sbml/level2/version4",
"rdf":"http://www.w3.org/1999/02/22-rdf-syntax-ns#",
"body":"http://www.w3.org/1999/xhtml",
"re": "http://exslt.org/regular-expressions"
}
#good enough for now
maybemeta = root.xpath("//sbml:speciesType[descendant::rdf:li[starts-with(#rdf:resource, 'urn:miriam:obo.chebi') and not(starts-with(#rdf:resource, 'urn:miriam:uniprot'))]]", namespaces = ns)
def extract_name_and_chebi(node):
name = node.attrib['name']
chebies = node.xpath("./sbml:annotation//rdf:li[starts-with(#rdf:resource, 'urn:miriam:obo.chebi') and not(starts-with(#rdf:resource, 'urn:miriam:uniprot'))]", namespaces=ns) #get all rdf:li node with chebi resource
assert len(chebies) == 1
#my current solution to get rdf:resource value from rdf:li node
rdfNS = "{" + ns.get('rdf') + "}"
chebi = chebies[0].attrib[rdfNS + 'resource']
#do protein later
return (name, chebi)
metaWithChebi = map(extract_name_and_chebi, maybemeta)
fo = open("metabolites.txt", "w")
for name, chebi in metaWithChebi:
fo.write("{0}\t{1}\n".format(name, chebi))
Prefix the attribute name with # in the XPath query:
>>> from lxml import etree
>>> xml = """\
... <?xml version="1.0" encoding="utf8"?>
... <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
... <rdf:li rdf:resource="urn:miriam:obo.chebi:CHEBI%3A37671"/>
... </rdf:RDF>
... """
>>> tree = etree.fromstring(xml)
>>> ns = {'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'}
>>> tree.xpath('//rdf:li/#rdf:resource', namespaces=ns)
['urn:miriam:obo.chebi:CHEBI%3A37671']
EDIT
Here's a revised version of the script in the question:
import lxml.etree as etree
ns = {
'sbml': 'http://www.sbml.org/sbml/level2/version4',
'rdf':'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
'body':'http://www.w3.org/1999/xhtml',
're': 'http://exslt.org/regular-expressions',
}
def extract_name_and_chebi(node):
chebies = node.xpath("""
.//rdf:li[
starts-with(#rdf:resource, 'urn:miriam:obo.chebi')
]/#rdf:resource
""", namespaces=ns)
return node.attrib['name'], chebies[0]
with open('yeast_4.02.xml') as xml:
tree = etree.parse(xml)
maybemeta = tree.xpath("""
//sbml:speciesType[descendant::rdf:li[
starts-with(#rdf:resource, 'urn:miriam:obo.chebi')]]
""", namespaces = ns)
with open('metabolites.txt', 'w') as output:
for node in maybemeta:
output.write('%s\t%s\n' % extract_name_and_chebi(node))
To select off the current node its attribute named rdf:resource, use this XPath expression:
#rdf:resource
In order for this to "work correctly" you must register the association of the prefix "rdf:" to the corresponding namespace.
If you don't know how to register the rdf namespace, it is still possible to select the attribute -- with this XPath expression:
#*[name()='rdf:resource']
Well, I got it. The xpath expression I need here is "./#rdf:resource" not ".#rdf:resource". But why ? I thought "./" indicate the child of current node.

Search and remove element with elementTree in Python

I have an XML document in which I want to search for some elements and if they match some criteria
I would like to delete them
However, I cannot seem to be able to access the parent of the element so that I can delete it
file = open('test.xml', "r")
elem = ElementTree.parse(file)
namespace = "{http://somens}"
props = elem.findall('.//{0}prop'.format(namespace))
for prop in props:
type = prop.attrib.get('type', None)
if type == 'json':
value = json.loads(prop.attrib['value'])
if value['name'] == 'Page1.Button1':
#here I need to access the parent of prop
# in order to delete the prop
Is there a way I can do this?
Thanks
You can remove child elements with the according remove method. To remove an element you have to call its parents remove method. Unfortunately Element does not provide a reference to its parents, so it is up to you to keep track of parent/child relations (which speaks against your use of elem.findall())
A proposed solution could look like this:
root = elem.getroot()
for child in root:
if child.name != "prop":
continue
if True:# TODO: do your check here!
root.remove(child)
PS: don't use prop.attrib.get(), use prop.get(), as explained here.
You could use xpath to select an Element's parent.
file = open('test.xml', "r")
elem = ElementTree.parse(file)
namespace = "{http://somens}"
props = elem.findall('.//{0}prop'.format(namespace))
for prop in props:
type = prop.get('type', None)
if type == 'json':
value = json.loads(prop.attrib['value'])
if value['name'] == 'Page1.Button1':
# Get parent and remove this prop
parent = prop.find("..")
parent.remove(prop)
http://docs.python.org/2/library/xml.etree.elementtree.html#supported-xpath-syntax
Except if you try that it doesn't work: http://elmpowered.skawaii.net/?p=74
So instead you have to:
file = open('test.xml', "r")
elem = ElementTree.parse(file)
namespace = "{http://somens}"
search = './/{0}prop'.format(namespace)
# Use xpath to get all parents of props
prop_parents = elem.findall(search + '/..')
for parent in prop_parents:
# Still have to find and iterate through child props
for prop in parent.findall(search):
type = prop.get('type', None)
if type == 'json':
value = json.loads(prop.attrib['value'])
if value['name'] == 'Page1.Button1':
parent.remove(prop)
It is two searches and a nested loop. The inner search is only on Elements known to contain props as first children, but that may not mean much depending on your schema.
I know this is an old thread but this kept popping up while I was trying to figure out a similar task. I did not like the accepted answer for two reasons:
1) It doesn't handle multiple nested levels of tags.
2) It will break if multiple xml tags are deleted in the same level one-after-another. Since each element is an index of Element._children you shouldn't delete while forward iterating.
I think a better more versatile solution is this:
import xml.etree.ElementTree as et
file = 'test.xml'
tree = et.parse(file)
root = tree.getroot()
def iterator(parents, nested=False):
for child in reversed(parents):
if nested:
if len(child) >= 1:
iterator(child)
if True: # Add your entire condition here
parents.remove(child)
iterator(root, nested=True)
For the OP, this should work - but I don't have the data you're working with to test if it's perfect.
import xml.etree.ElementTree as et
file = 'test.xml'
tree = et.parse(file)
namespace = "{http://somens}"
props = tree.findall('.//{0}prop'.format(namespace))
def iterator(parents, nested=False):
for child in reversed(parents):
if nested:
if len(child) >= 1:
iterator(child)
if prop.attrib.get('type') == 'json':
value = json.loads(prop.attrib['value'])
if value['name'] == 'Page1.Button1':
parents.remove(child)
iterator(props, nested=True)
A solution using lxml module
from lxml import etree
root = ET.fromstring(xml_str)
for e in root.findall('.//{http://some.name.space}node'):
parent = e.getparent()
for child in parent.find('./{http://some.name.space}node'):
try:
parent.remove(child)
except ValueError:
pass
Using the fact that every child must have a parent, I'm going to simplify #kitsu.eb's example. f using the findall command to get the children and parents, their indices will be equivalent.
file = open('test.xml', "r")
elem = ElementTree.parse(file)
namespace = "{http://somens}"
search = './/{0}prop'.format(namespace)
# Use xpath to get all parents of props
prop_parents = elem.findall(search + '/..')
props = elem.findall('.//{0}prop'.format(namespace))
for prop in props:
type = prop.attrib.get('type', None)
if type == 'json':
value = json.loads(prop.attrib['value'])
if value['name'] == 'Page1.Button1':
#use the index of the current child to find
#its parent and remove the child
prop_parents[props.index[prop]].remove(prop)
I also used XPath for this issue, but in a different way:
root = elem.getroot()
elementName = "YourElement"
#this will find all the parents of the elements with elementName
for elementParent in root.findall(".//{}/..".format(elementName)):
#this will find all the elements under the parent, and remove them
for element in elementParent.findall("{}".format(elementName)):
elementParent.remove(element)
I like to use an XPath expression for this kind of filtering. Unless I know otherwise, such an expression must be applied at the root level, which means I can't just get a parent and apply the same expression on that parent. However, it seems to me that there is a nice and flexible solution that should work with any supported XPath, as long as none of the sought nodes is the root. It goes something like this:
root = elem.getroot()
# Find all nodes matching the filter string (flt)
nodes = root.findall(flt)
while len(nodes):
# As long as there are nodes, there should be parents
# Get the first of all parents to the found nodes
parent = root.findall(flt+'/..')[0]
# Use this parent to remove the first node
parent.remove(nodes[0])
# Find all remaining nodes
nodes = root.findall(flt)
I would like only to add a comment on the accepted answer, but my lack of reputation doesn't allow me to. I wanted to add that it is important to add .findall("*")to the iterator to avoid issues, as stated in the documentation:
Note that concurrent modification while iterating can lead to problems, just like when iterating and modifying Python lists or dicts. Therefore, the example first collects all matching elements with root.findall(), and only then iterates over the list of matches.
Therefore, in the accepted answer the iteration should be for child in root.findal("*"):instead of for child in root:. Not doing so made my code skip some elements from the list.

Categories