Get a single child node using lxml


Edit: The issue was that I was running an outdated version of lxml. I feel really stupid now, but I'm glad I found out.
I'm having trouble iterating through an XML tree to export single child elements.
I want to isolate child elements and export each one to a separate XML file. My problem is that when I use the 'iter' method, I get not only each child element but also all of its following siblings. How can I get only one child element at a time?
This should explain it better. Here's my sample code:
from lxml import etree
root = etree.XML("<users><user><name>Test</name><id>01</id></user> \
<user><name>Test</name><id>02</id></user> \
<user><name>Test</name><id>03</id></user></users>")
for record in root.iter("user"):
    print(etree.tostring(record))
It produces the following output
b'<user><name>Test</name><id>01</id></user><user><name>Test</name><id>02</id></user><user><name>Test</name><id>03</id></user></users>'
b'<user><name>Test</name><id>02</id></user><user><name>Test</name><id>03</id></user></users>'
b'<user><name>Test</name><id>03</id></user></users>'
But what I need is
b'<user><name>Test</name><id>01</id></user>'
b'<user><name>Test</name><id>02</id></user>'
b'<user><name>Test</name><id>03</id></user>'
What am I doing wrong?

I'm not quite sure why iter is producing that output. Try this instead; it works fine.
from lxml import etree
xn = etree.fromstring("<users><user><name>Test</name><id>01</id></user><user><name>Test</name><id>02</id></user><user><name>Test</name><id>03</id></user></users>")
user_nodes = xn.findall("user")
str_nodes = [etree.tostring(un) for un in user_nodes]
print(str_nodes)
This produces the expected output:
[
b'<user><name>Test</name><id>01</id></user>',
b'<user><name>Test</name><id>02</id></user>',
b'<user><name>Test</name><id>03</id></user>']

Related

Find for multiple tags' values with lxml

I am using lxml to parse an XML like this sample one:
<compounddef xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" id="d2/db7/class_foo" kind="class">
<compoundname>FooClass</compoundname>
<sectiondef kind="public-type">
<memberdef kind="typedef" id="d2/db7/class_bar">
<type><ref refid="d3/d73/struct_foo" kindref="compound">StructFoo</ref></type>
<definition>StructFooDefinition</definition>
</memberdef>
</sectiondef>
</compounddef>
I'm trying to get the element whose refid attribute is "d3/d73/struct_foo" and whose <definition> contains the text "Foo".
There could be many elements with that refid and many definitions containing "Foo", but only one has this combination.
I am able to first find all the elements with that refid and then filter this list by checking which of them contains "Foo" in the <definition>, but since I'm working with a really big XML file (~1 GB) and the application is time-sensitive, I wanted to avoid this.
I tried combining the various etree paths using the keyword 'and' or '//precede:...', but without success.
My last try was:
self.dox_tree_root_.xpath(".//compounddef[@kind = 'class']//memberdef[@kind='typedef'][/type/ref[@refid='%s'] and contains(definition, 'name')]" % (independent_type_refid, name)))
but it is giving me an error.
Is there a way to combine the two filters inside one command?

You can use XPath:
//a[.//ref[@refid="12345"] and contains(c, "Good")]

If I understand you correctly, this should get you close enough:
.//compounddef[@kind = 'class']//memberdef[@kind='typedef'][./type/ref[@refid='d3/d73/struct_foo']][contains(.//definition, 'Foo')]//definition
Output:
StructFooDefinition
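For reference, here is a rough, self-contained sketch of both conditions combined in a single predicate with lxml. It reuses the sample XML from the question and adds one extra <memberdef> (the id "d2/db7/class_baz" is made up) that shares the refid but not the definition text. The values are passed as XPath variables ($refid, $name) through keyword arguments to xpath(), which avoids building the expression with % formatting.

```python
from lxml import etree

# Sample from the question plus one hypothetical non-matching <memberdef>.
XML = """<compounddef xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" id="d2/db7/class_foo" kind="class">
  <compoundname>FooClass</compoundname>
  <sectiondef kind="public-type">
    <memberdef kind="typedef" id="d2/db7/class_bar">
      <type><ref refid="d3/d73/struct_foo" kindref="compound">StructFoo</ref></type>
      <definition>StructFooDefinition</definition>
    </memberdef>
    <memberdef kind="typedef" id="d2/db7/class_baz">
      <type><ref refid="d3/d73/struct_foo" kindref="compound">StructFoo</ref></type>
      <definition>SomethingElse</definition>
    </memberdef>
  </sectiondef>
</compounddef>"""

root = etree.fromstring(XML)

# One predicate, two conditions joined with "and".
matches = root.xpath(
    "//compounddef[@kind='class']//memberdef[@kind='typedef']"
    "[type/ref[@refid=$refid] and contains(definition, $name)]",
    refid="d3/d73/struct_foo",
    name="Foo",
)

ids = [m.get("id") for m in matches]
definitions = [m.findtext("definition") for m in matches]
print(ids, definitions)
```

Only the first <memberdef> satisfies both conditions, so the second one is filtered out inside the XPath expression rather than in Python.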

python xml.etree - how to search on more than one attribute

I have an XML file with this line:
<op type="create" file="C:/Users/mureadr/Desktop/A/HMI_FORGF/bld/armle-v7/release/SimpleNetwork/Makefile" found="0"/>
I want to use xml.etree to search on more than one attribute:
result = tree.search('.//op[@type="create" @file="c:/Users/mureadr/Desktop/A/HMI_FORGF/bld/armle-v7/release/HmiLogging/Makefile"]')
But I get an error
raise SyntaxError("invalid predicate")
I tried this (added and), still got same error
'.//op[@type="create" and @file="c:/Users/mureadr/Desktop/A/HMI_FORGF/bld/armle-v7/release/HmiLogging/Makefile"]'
Tried adding &&, still got same error
'.//op[@type="create" && @file="c:/Users/mureadr/Desktop/A/HMI_FORGF/bld/armle-v7/release/HmiLogging/Makefile"]'
Finally, tried &, still got same error
'.//op[@type="create" & @file="c:/Users/mureadr/Desktop/A/HMI_FORGF/bld/armle-v7/release/HmiLogging/Makefile"]'
I'm guessing that this is a limitation of xml.etree.
Probably I shouldn't use it in the future, but I'm almost done with my project.
For N attributes, how do I use xml.etree to search on all N attributes?

You can use multiple square brackets in succession:
'.//op[@type="create"][@file="/some/path"]'
UPDATE: I see that you are using Python's xml.etree module. I am not sure whether the above answer is valid for that module (it has only limited XPath support). I'd suggest using lxml, the go-to library for XML tasks. With lxml, it would simply be doc.xpath(".//op[..][..]")
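As a quick check with the standard library, the sketch below chains two attribute predicates in xml.etree.ElementTree's findall(), which accepts one predicate after another. The <ops> wrapper and the shortened file paths are invented sample data, and find_by_attrs is a hypothetical helper (not part of ElementTree) showing one way to handle an arbitrary number of attributes without building a path string.

```python
import xml.etree.ElementTree as ET

# Invented sample data modelled on the <op> line from the question.
doc = ET.fromstring(
    '<ops>'
    '<op type="create" file="/some/path" found="0"/>'
    '<op type="create" file="/other/path" found="1"/>'
    '<op type="delete" file="/some/path" found="0"/>'
    '</ops>'
)

# Chained predicates: each [...] narrows the previous result.
chained = doc.findall('.//op[@type="create"][@file="/some/path"]')


def find_by_attrs(root, tag, **attrs):
    """Hypothetical helper: return `tag` elements whose attributes all match."""
    return [
        el for el in root.iter(tag)
        if all(el.get(key) == value for key, value in attrs.items())
    ]


filtered = find_by_attrs(doc, "op", type="create", file="/some/path")
print(len(chained), len(filtered))
```

The helper avoids quoting problems when attribute values themselves contain quote characters, at the cost of filtering in Python.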

Xpath - obtaining 2 nodes with 1 node having default value if missing

I am using xpath in Python 2.7 with lxml:
from lxml import html
...
tree = html.fromstring(source)
results = tree.xpath(...xpath string...)
Now the problem is the XPath string, and I'm getting quite lost in it. I am trying to get all the nodes from one path like this:
//a[@class="hyperlinkClass"]/span/text() (1)
There are no missing entries in this part and this works fine. But I'm also trying to get a part relative to this as well, like so:
//a[@class="hyperlinkClass"]/span/following-sibling::div[@class="divClassName"]/span[@class="spanClassName"]/text() (2)
This works fine by itself, but (2) may or may not have nodes for each node in (1). What I would like is a default value, say "absent", for each (1) where (2) is missing or empty. This sounds straightforward and maybe it is, but I'm hitting a brick wall here.
By doing '(1) | (2)' I get all the values needed, but have no way to match them up. If I do '(1) | concat((2), "absent")', this doesn't work either: concat doesn't seem to work in Python, though I've read that it is valid XPath. I saw here the "Becker method", but that doesn't work either (or I can't get it to).
Hopefully, someone can shine a light on how to get this working or if it's even possible.

Don't make this more complicated than it is:
path1 = '//a[@class="hyperlinkClass"]/span'
path2 = './following-sibling::div[@class="divClassName"]/span[@class="spanClassName"]'
for link in tree.xpath(path1):
    other_node = link.xpath(path2)
    if len(other_node):
        print(link.text, other_node[0].text)
    else:
        print(link.text, 'n/a')
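A variation on the same per-node idea, sketched here under assumptions: wrapping the relative path in XPath's string() function returns an empty string when nothing matches, so Python's "or" can supply the default. The markup below is invented to fit the class names in the question, and it is parsed with etree.XML instead of html.fromstring only to keep the example deterministic.

```python
from lxml import etree

# Invented markup: the first link has the optional div, the second does not.
doc = etree.XML(
    '<body>'
    '<a class="hyperlinkClass"><span>first</span>'
    '<div class="divClassName"><span class="spanClassName">extra</span></div></a>'
    '<a class="hyperlinkClass"><span>second</span></a>'
    '</body>'
)

# string() of an empty node-set is "", which is falsy in Python.
SIBLING_TEXT = (
    'string(following-sibling::div[@class="divClassName"]'
    '/span[@class="spanClassName"])'
)

pairs = []
for span in doc.xpath('//a[@class="hyperlinkClass"]/span'):
    pairs.append((span.text, span.xpath(SIBLING_TEXT) or "absent"))

print(pairs)
```

Note that an element that exists but has empty text would also fall back to "absent" with this approach.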

Parse XML attribute to variable with ElementTree

Hello, I'm writing a bit of code in Maya and running into some issues with ElementTree. I need help reading in this XML, or something similar. The XML is generated based on a selection, so it can change.
<root>
<Locations>
<1 name="CacheLocation">C:\Users\daunish\Desktop</1>
</Locations>
<Objects>
<1 name="Sphere">[u'pSphere1', u'pSphere2']</1>
<2 name="Cube">[u'pCube1']</2>
</Objects>
</root>
I need a way of searching for a particular "name" inside "Locations" and passing its text to a variable.
I also need a way of going through each line inside "Objects" and performing a function on it, as in a for loop.
I'm open to all suggestions; I have been going crazy trying to get this to work. If you think I should format the XML differently, I'm up for that as well. Thanks in advance for the help.

[Note: your XML is not well formed because you can't have tags that start with a number]
Not sure what you've tried but there are many ways to do this, here's one:
Find the first element with name=CacheLocation in Locations:
>>> filename = root.find("./Locations/*[@name='CacheLocation']").text
>>> filename
'C:\\Users\\daunish\\Desktop'
Iterating over all the elements in Objects:
>>> import ast
>>> for target in root.find("./Objects"):
...     for i in ast.literal_eval(target.text):
...         print(target.get('name'), i)
Sphere pSphere1
Sphere pSphere2
Cube pCube1
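Since tag names cannot start with a digit, one possible reformatting is to use a fixed tag name and keep the distinguishing information in the name attribute. The sketch below assumes a hypothetical <item> tag (any valid name would do) and otherwise follows the lookups shown above, using the standard library's ElementTree.

```python
import ast
import xml.etree.ElementTree as ET

# Hypothetical well-formed variant of the question's XML: <1>, <2> become <item>.
XML = r"""<root>
  <Locations>
    <item name="CacheLocation">C:\Users\daunish\Desktop</item>
  </Locations>
  <Objects>
    <item name="Sphere">[u'pSphere1', u'pSphere2']</item>
    <item name="Cube">[u'pCube1']</item>
  </Objects>
</root>"""

root = ET.fromstring(XML)

# Look up one location by its name attribute and keep its text.
cache_location = root.find("./Locations/*[@name='CacheLocation']").text

# Turn each entry under <Objects> into name -> list of object names.
objects = {
    item.get("name"): ast.literal_eval(item.text)
    for item in root.find("./Objects")
}

for name, members in objects.items():
    for member in members:
        print(name, member)
```

ast.literal_eval only evaluates Python literals, so it is a safer way than eval to read the stored list text.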

Creating a document tree before or after adding the subelements

I am using lxml and Python for writing XML files. I was wondering what is the accepted practice: creating a document tree first and then adding the sub elements OR adding the sub elements and creating the tree later? I know this hardly makes any difference as to the output, but I was interested in knowing what is the accepted norm in this from a coding-style point of view.
Sample code:
page = etree.Element('root')
#first create the tree
doc = etree.ElementTree(page)
#add the subelements
headElt = etree.SubElement(page, 'head')
Or this:
page = etree.Element('root')
headElt = etree.SubElement(page, 'head')
#create the tree in the end
doc = etree.ElementTree(page)

Since tree construction is typically a recursive action, I would say that the tree root could get created last, once the subtree is done. However, I don't see any reason why that should be any better than creating the tree first. I honestly don't think there's an accepted norm for this, and rather than trying to find one I would advise you to write your code in such a way that it makes sense for you and anyone else that might need to read and understand it later.
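To illustrate that the order makes no difference to the result, here is a small sketch that builds the same document both ways and compares the serialized output. The function names tree_first and tree_last are only illustrative.

```python
from lxml import etree


def tree_first():
    page = etree.Element("root")
    doc = etree.ElementTree(page)    # wrap the root before it has children
    etree.SubElement(page, "head")   # children added afterwards still belong to doc
    return doc


def tree_last():
    page = etree.Element("root")
    etree.SubElement(page, "head")
    doc = etree.ElementTree(page)    # wrap the finished element at the end
    return doc


first = etree.tostring(tree_first())
last = etree.tostring(tree_last())
print(first == last)
```

The ElementTree object is a thin wrapper around the root element, so it reflects whatever the root contains at the time it is serialized.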
