lxml error: lxml.etree.XPathEvalError: Invalid expression with descendant - python

As I'm new to Python and lxml, I can't make sense of this error. Below is my XML:
<node id="n25::n1">
<data key="d5" xml:space="preserve"><![CDATA[ronin_sanity]]></data>
<data key="d6">
<ShapeNode>
<Geometry height="86.25" width="182.0" x="3164.9136178770227" y="1045.403736953325"/>
<Fill color="#C0C0C0" transparent="false"/>
<BorderStyle color="#000000" raised="false" type="line" width="1.0"/>
<NodeLabel alignment="center" autoSizePolicy="content" fontFamily="Dialog" fontSize="12" fontStyle="plain" hasBackgroundColor="false" hasLineColor="false" height="18.701171875" horizontalTextPosition="center" iconTextGap="4" modelName="internal" modelPosition="c" textColor="#000000" verticalTextPosition="bottom" visible="true" width="83.376953125" x="49.3115234375" xml:space="preserve" y="33.7744140625">Messages App</NodeLabel>
<Shape type="ellipse"/>
</ShapeNode>
</data>
</node>
This is my XPath query. I want to find the Fill element whose color attribute is "#C0C0C0":
etree.xpath(/node/descendant::Fill[@color='#C0C0C0'])

You can simply use a proper XPath expression to find the element, as shown below:
In [1]: import lxml.etree as ET
In [2]: cat myxml.xml
<node id="n25::n1">
<data key="d5" xml:space="preserve"><![CDATA[ronin_sanity]]></data>
<data key="d6">
<ShapeNode>
<Geometry height="86.25" width="182.0" x="3164.9136178770227" y="1045.403736953325"/>
<Fill color="#C0C0C0" transparent="false"/>
<BorderStyle color="#000000" raised="false" type="line" width="1.0"/>
<NodeLabel alignment="center" autoSizePolicy="content" fontFamily="Dialog" fontSize="12" fontStyle="plain" hasBackgroundColor="false" hasLineColor="false" height="18.701171875" horizontalTextPosition="center" iconTextGap="4" modelName="internal" modelPosition="c" textColor="#000000" verticalTextPosition="bottom" visible="true" width="83.376953125" x="49.3115234375" xml:space="preserve" y="33.7744140625">Messages App</NodeLabel>
<Shape type="ellipse"/>
</ShapeNode>
</data>
</node>
In [3]: tree = ET.parse('myxml.xml')
In [4]: root = tree.getroot()
In [5]: elem = root.xpath('//Fill[@color="#C0C0C0"]')
In [6]: elem
Out[6]: [<Element Fill at 0x7efe04280098>]
If no node matches, you get an empty list as output:
In [7]: elem = root.xpath('//Fill[@color="#C0C0C0ABC"]')
In [8]: elem
Out[8]: []
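For a check that doesn't need myxml.xml on disk, here is a minimal sketch that parses a trimmed copy of the question's XML from a string. It assumes lxml is installed. It shows that, once the expression is passed to an element's xpath() method as a quoted string and the attribute is selected with @, the descendant:: form from the question and the // shorthand from the answer select the same Fill element.

```python
from lxml import etree

# Trimmed copy of the question's XML (only the parts the queries touch).
XML = """<node id="n25::n1">
  <data key="d6">
    <ShapeNode>
      <Fill color="#C0C0C0" transparent="false"/>
      <Shape type="ellipse"/>
    </ShapeNode>
  </data>
</node>"""

root = etree.fromstring(XML)

# The XPath expression is a Python string; attributes are addressed with "@".
by_axis = root.xpath("/node/descendant::Fill[@color='#C0C0C0']")
by_shorthand = root.xpath("//Fill[@color='#C0C0C0']")
no_match = root.xpath("//Fill[@color='#C0C0C0ABC']")

print(len(by_axis), len(by_shorthand), len(no_match))  # 1 1 0
```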

Related

Parse xml string by attribute name

I'm using Python and I want to get some values from an XML string.
For example, I have this XML string, which I'm reading from a CSV file:
<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
<System>
<Provider Name='Microsoft-Windows' Guid='{aaaa-ss-www-qqq-qeqweqwe}'/>
<EventID>4771</EventID>
<Version>0</Version>
<Level>0</Level>
<Task>1000</Task>
<Opcode>0</Opcode>
<Keywords>0x9110</Keywords>
<TimeCreated SystemTime='2022-01-01T00:00:00.000000Z'/>
<EventRecordID>123123123</EventRecordID>
<Correlation/>
<Execution ProcessID='2' ThreadID='11'/>
<Channel>Security</Channel>
<Computer>pcname</Computer>
<Security/>
</System>
<EventData>
<Data Name='TargetUserName'>user</Data>
<Data Name='TargetSid'>S-1-5-21-123123-321312-123132-31212</Data>
<Data Name='ServiceName'>service/dom</Data>
<Data Name='TicketOptions'>0x123123</Data>
<Data Name='Status'>0xq</Data>
<Data Name='PreAuthType'>0</Data>
<Data Name='IpAddress'>::ffff:8.8.8.8</Data>
<Data Name='IpPort'>123321</Data>
<Data Name='CertIssuerName'></Data>
<Data Name='CertSerialNumber'></Data>
<Data Name='CertThumbprint'></Data>
</EventData>
</Event>
And here is some code with which I can get some of the values by path:
import os, csv
import xml.etree.ElementTree as ET
def cls():
    os.system('cls' if os.name=='nt' else 'clear')
cls()
raw = open('C:/tmp2/data.csv', 'r')
reader = csv.reader(raw)
line_number = 1
for i, row in enumerate(reader):
    if i == line_number:
        break
tree = ET.fromstring(''.join(row))
EventID = [literal.text for literal in tree.findall('.//{http://schemas.microsoft.com/win/2004/08/events/event}System/{http://schemas.microsoft.com/win/2004/08/events/event}EventID')]
TimeCreated = [literal.text for literal in tree.findall('.//{http://schemas.microsoft.com/win/2004/08/events/event}System/{http://schemas.microsoft.com/win/2004/08/events/event}TimeCreated[@Name="SystemTime"]')]
TargetUserName = [literal.text for literal in tree.findall('.//{http://schemas.microsoft.com/win/2004/08/events/event}EventData/{http://schemas.microsoft.com/win/2004/08/events/event}Data[@Name="TargetUserName"]')]
ServiceName = [literal.text for literal in tree.findall('.//{http://schemas.microsoft.com/win/2004/08/events/event}EventData/{http://schemas.microsoft.com/win/2004/08/events/event}Data[@Name="ServiceName"]')]
print('EVENT:', ''.join(EventID))
print('TimeCreated:', ''.join(TimeCreated))
print('TargetUserName:', ''.join(TargetUserName))
print('ServiceName:', ''.join(ServiceName))
How can I get a value, like EventID, by attribute name?
You're close, though you should handle the namespaces a bit differently and, if I understand you correctly, change your TimeCreated query to read the SystemTime attribute:
ns = {'': 'http://schemas.microsoft.com/win/2004/08/events/event'}
TimeCreated = [tc.attrib['SystemTime'] for tc in tree.findall('.//System//TimeCreated[@SystemTime]', namespaces=ns)]
EventID = [eid.text for eid in tree.findall('.//System//EventID', namespaces=ns)]
TargetUserName = [tun.text for tun in tree.findall('.//EventData//Data[@Name="TargetUserName"]', namespaces=ns)]
ServiceName = [sn.text for sn in tree.findall('.//EventData//Data[@Name="ServiceName"]', namespaces=ns)]
Given your sample XML, your print statements should then output:
EVENT: 4771
TimeCreated: 2022-01-01T00:00:00.000000Z
TargetUserName: user
ServiceName: service/dom
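To collect all of these fields in one pass, here is a minimal sketch; event_fields is a hypothetical helper name and the sample is a trimmed copy of the question's event. It relies on mapping the empty prefix '' to the default namespace, which ElementTree's find methods accept from Python 3.8 on. Because TimeCreated carries its value in the SystemTime attribute, it is read with .get() rather than .text.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample event from the question.
EVENT_XML = """<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
  <System>
    <EventID>4771</EventID>
    <TimeCreated SystemTime='2022-01-01T00:00:00.000000Z'/>
  </System>
  <EventData>
    <Data Name='TargetUserName'>user</Data>
    <Data Name='ServiceName'>service/dom</Data>
    <Data Name='CertIssuerName'></Data>
  </EventData>
</Event>"""

# Empty prefix = default namespace for unprefixed tags (Python 3.8+).
NS = {"": "http://schemas.microsoft.com/win/2004/08/events/event"}


def event_fields(xml_text):
    """Hypothetical helper: return a dict of the interesting event fields."""
    tree = ET.fromstring(xml_text)
    time_created = tree.find("System/TimeCreated", NS)
    fields = {
        # findtext's third parameter is namespaces, so pass it by keyword
        "EventID": tree.findtext("System/EventID", namespaces=NS),
        # SystemTime is an attribute, not element text
        "TimeCreated": time_created.get("SystemTime") if time_created is not None else None,
    }
    # Each <Data Name="..."> becomes a key; empty elements map to None
    for data in tree.findall("EventData/Data", NS):
        fields[data.get("Name")] = data.text
    return fields


print(event_fields(EVENT_XML))
```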

Python XML get immediate child elements only

I have an xml file as below:
<?xml version="1.0" encoding="utf-8"?>
<EDoc CID="1000101" Cname="somename" IName="iname" CSource="e1" Version="1.0">
<RIGLIST>
<RIG RIGID="100001" RIGName="RgName1">
<ListID>
<nodeA nodeAID="1000011" nodeAName="node1A" nodeAExtID="9000011" />
<nodeA nodeAID="1000012" nodeAName="node2A" nodeAExtID="9000012" />
<nodeA nodeAID="1000013" nodeAName="node3A" nodeAExtID="9000013" />
<nodeA nodeAID="1000014" nodeAName="node4A" nodeAExtID="9000014" />
<nodeA nodeAID="1000015" nodeAName="node5A" nodeAExtID="9000015" />
<nodeA nodeAID="1000016" nodeAName="node6A" nodeAExtID="9000016" />
<nodeA nodeAID="1000017" nodeAName="node7A" nodeAExtID="9000017" />
</ListID>
</RIG>
<RIG RIGID="100002" RIGName="RgName2">
<ListID>
<nodeA nodeAID="1000021" nodeAName="node1B" nodeAExtID="9000021" />
<nodeA nodeAID="1000022" nodeAName="node2B" nodeAExtID="9000022" />
<nodeA nodeAID="1000023" nodeAName="node3B" nodeAExtID="9000023" />
</ListID>
</RIG>
</RIGLIST>
</EDoc>
I need to search for a RIGName attribute value and, if a match is found, print all the nodeAName values under it.
Example:
Searching for RIGName = "RgName2" should print node1B, node2B, node3B
So far I am only able to get the first part:
import xml.etree.ElementTree as eT
import re
xmlfilePath = "Path of xml file"
tree = eT.parse(xmlfilePath)
root = tree.getroot()
for elem in root.iter("RIGName"):
    # print(elem.tag, elem.attrib)
    if re.findall(searchtxt, elem.attrib['RIGName'], re.IGNORECASE):
        print(elem.attrib)
        count += 1
How can I get only the immediate child node values?
Switching from xml.etree to lxml lets you do it in a single query, thanks to its much better XPath support:
In [1]: from lxml import etree as ET
In [2]: tree = ET.parse('input.xml')
In [3]: root = tree.getroot()
In [4]: root.xpath('//RIG[@RIGName = "RgName2"]/ListID/nodeA/@nodeAName')
Out[4]: ['node1B', 'node2B', 'node3B']
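If you'd rather stay with the standard library, as in the question, here is a minimal sketch. It walks each RIG element, compares RIGName case-insensitively (an exact match rather than the question's re.findall substring search, a deliberate simplification), and reads nodeAName only from the nodeA elements directly under that RIG's ListID. The sample is a trimmed copy of the question's file, and node_names is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's file.
RIG_XML = """<EDoc CID="1000101" Cname="somename" Version="1.0">
  <RIGLIST>
    <RIG RIGID="100001" RIGName="RgName1">
      <ListID>
        <nodeA nodeAID="1000011" nodeAName="node1A" nodeAExtID="9000011" />
      </ListID>
    </RIG>
    <RIG RIGID="100002" RIGName="RgName2">
      <ListID>
        <nodeA nodeAID="1000021" nodeAName="node1B" nodeAExtID="9000021" />
        <nodeA nodeAID="1000022" nodeAName="node2B" nodeAExtID="9000022" />
        <nodeA nodeAID="1000023" nodeAName="node3B" nodeAExtID="9000023" />
      </ListID>
    </RIG>
  </RIGLIST>
</EDoc>"""


def node_names(root, search_text):
    """Return nodeAName values under every RIG whose RIGName equals search_text (case-insensitive)."""
    names = []
    for rig in root.iter("RIG"):
        if rig.get("RIGName", "").lower() == search_text.lower():
            # "ListID/nodeA" follows direct children only: RIG -> ListID -> nodeA
            names.extend(node.get("nodeAName") for node in rig.findall("ListID/nodeA"))
    return names


root = ET.fromstring(RIG_XML)
print(node_names(root, "rgname2"))  # ['node1B', 'node2B', 'node3B']
```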

Python xpath with xml.etree.ElementTree: multiple conditions

I am trying to count from an XML file all the XML nodes of the form:
....
<node id="0">
<data key="d0">Attribute</data>
....
</node>
....
For example a file like this:
<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<graph edgedefault="directed">
<node id="0">
<data key="d0">Attribute</data>
<data key="d1">Foo</data>
</node>
What I have tried is:
x = graphml_root.findall(".//"+nsfy("node")+"/["+nsfy("data")+"='Attribute']")
But this only checks that the text is "Attribute"; I want to make sure that "Attribute" is the text of the node with key="d0", so I tried this:
x = graphml_root.findall(".//"+nsfy("node")+"/"+nsfy("data")+"[@key='d0']"+"[""'Attribute']")
But it returns an empty list, so I am missing something.
NOTE:
I had to write a little lambda to avoid typing out the XML namespace all the time:
nsfy = lambda x : '{http://graphml.graphdrawing.org/xmlns}'+x #to be able to read namespace tags
Thanks.
Try doing something like:
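nodes = []
# the graphml file uses a default namespace, so qualify the tags with the nsfy helper from the question
containers = graphml_root.findall('.//' + nsfy('node') + '/' + nsfy('data') + '[@key="d0"]')
for container in containers:
    if container.text == "Attribute":
        nodes.append(container)
count = len(nodes)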
from lxml import etree
f= '''
<node id="0">
<data key="d0" t="32">Attribute</data>
<data key="d1">Foo</data>
</node>'''
root = etree.XML(f)
data = root.xpath('.//*[@key="d0" and text()="Attribute"]')
print(data)
lxml provides the xpath() method, so that's all it takes.
UPDATE
According to the xml.etree documentation, this syntax (and, text()) is not supported: xml.etree implements only a limited subset of XPath.
So with xml.etree, you can find .//*[@key="d0"] and then test whether its text equals "Attribute".
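On Python 3.7 and later, ElementTree's XPath subset does include a [.='text'] predicate (it compares the element's complete text content), and predicates can be chained, so both conditions fit into one findall call. The sketch below uses a hypothetical sample modeled on the question's (node "1" is invented for contrast) and maps the namespace to an arbitrary prefix g instead of using the nsfy helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modeled on the question; node "1" is added for contrast.
GRAPHML = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph edgedefault="directed">
    <node id="0">
      <data key="d0">Attribute</data>
      <data key="d1">Foo</data>
    </node>
    <node id="1">
      <data key="d0">Other</data>
      <data key="d1">Attribute</data>
    </node>
  </graph>
</graphml>"""

# "g" is an arbitrary local prefix for the GraphML namespace.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

root = ET.fromstring(GRAPHML)

# Both predicates must hold for the same <data>: key="d0" AND text "Attribute".
matches = root.findall(".//g:node/g:data[@key='d0'][.='Attribute']", NS)
count = len(matches)

# ".." steps back up to the owning <node> elements.
parents = root.findall(".//g:node/g:data[@key='d0'][.='Attribute']/..", NS)
node_ids = [n.get("id") for n in parents]

print(count, node_ids)  # 1 ['0']
```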

Iterate through XML child of a child tags in Python

I have an XML file. When a command matches, I want to print all of the child elements associated with it. Below is some of the code I've tried; I'm using the built-in ElementTree module.
XML
<commands>
<command name="this" type="out" major="0x1" minor="0x0">
<data bytes="1-0" descrip=" ID"></data>
<data bytes="3-2" descrip=" ID"></data>
<data bytes="5-4" descrip=" ID"></data>
<data bytes="7-6" descrip=" Code"></data>
<data bytes="12-8" descrip=" Revision"></data>
<data bytes="13" descrip=" Version"></data>
<data bytes="14" descrip=" Mask"></data>
<data bytes="15" descrip="Reserved"></data>
<data bytes="17-16" descrip=" Windows"></data>
<data bytes="19-18" descrip=" of Write Flush Addresses"></data>
</command>
</commands>
Sample Code to Parse Out Names
tree = ET.parse('command_details.xml')
root = tree.getroot()
for child in root:
    if child.attrib['major'] == str(hex(int(major_bits[::-1], 2))) and child.attrib['minor'] == str(hex(int(minor_bits[::-1], 2))):
        command_name = str(child.attrib['name'])
Basically, I want to go one level deeper and print the child tags of the matching command.
You have to get the children of the child and iterate through all of the grandchildren:
tree = ET.parse('command_details.xml')
root = tree.getroot()
for child in root:
    if child.attrib['major'] == str(hex(int(major_bits[::-1], 2))) and child.attrib['minor'] == str(hex(int(minor_bits[::-1], 2))):
        command_name = str(child.attrib['name'])
        # iterating an element yields its direct children (getchildren() was removed in Python 3.9)
        for grandchild in child:
            print(grandchild.attrib['bytes'])
            print(grandchild.attrib['descrip'])
Or, if you want to print the full XML line, you can do:
print(ET.tostring(grandchild, encoding='unicode').strip())
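Here is a self-contained Python 3 sketch of the same idea, using a trimmed copy of the question's XML. The hypothetical helper command_fields takes major and minor as strings already formatted with hex(), standing in for the question's major_bits/minor_bits conversion, and returns the command name together with each child's bytes and descrip values.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of command_details.xml from the question.
XML = """<commands>
  <command name="this" type="out" major="0x1" minor="0x0">
    <data bytes="1-0" descrip=" ID"></data>
    <data bytes="13" descrip=" Version"></data>
    <data bytes="15" descrip="Reserved"></data>
  </command>
</commands>"""


def command_fields(root, major, minor):
    """Return (name, [(bytes, descrip), ...]) for the first matching command, or None."""
    for command in root.findall("command"):
        if command.get("major") == major and command.get("minor") == minor:
            # Iterating the element visits its direct <data> children.
            fields = [(d.get("bytes"), d.get("descrip").strip()) for d in command]
            return command.get("name"), fields
    return None


root = ET.fromstring(XML)
# hex() yields strings such as '0x1', the same format the file uses.
name, fields = command_fields(root, hex(1), hex(0))
print(name)
for byte_range, descrip in fields:
    print(byte_range, descrip)
```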

Parsing XML with Python and etree

I want to extract all the way elements that contain a tag with the key 'highway' and a specific value from the following example OpenStreetMap XML file:
<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6" generator="CGImap 0.0.2">
<bounds minlat="54.0889580" minlon="12.2487570" maxlat="54.0913900" maxlon="12.2524800"/>
<node id="298884272" lat="54.0901447" lon="12.2516513" user="SvenHRO" uid="46882" visible="true" version="1" changeset="676636" timestamp="2008-09-21T21:37:45Z"/>
<way id="26659127" user="Masch" uid="55988" visible="true" version="5" changeset="4142606" timestamp="2010-03-16T11:47:08Z">
<nd ref="292403538"/>
<nd ref="298884289"/>
<nd ref="261728686"/>
<tag k="highway" v="unclassified"/>
<tag k="name" v="Pastower Straße"/>
</way>
<relation id="56688" user="kmvar" uid="56190" visible="true" version="28" changeset="6947637" timestamp="2011-01-12T14:23:49Z">
<member type="node" ref="294942404" role=""/>
...
<member type="node" ref="364933006" role=""/>
<member type="way" ref="4579143" role=""/>
...
<member type="node" ref="249673494" role=""/>
<tag k="name" v="Küstenbus Linie 123"/>
<tag k="network" v="VVW"/>
<tag k="operator" v="Regionalverkehr Küste"/>
<tag k="ref" v="123"/>
<tag k="route" v="bus"/>
<tag k="type" v="route"/>
</relation>
</osm>
To do this, I wrote the following Python code using the ElementTree library. It parses the XML document and uses the findall function (with XPath syntax):
import xml.etree.ElementTree as ET
supported_highways = ('motorway', 'trunk', 'primary', 'secondary', 'tertiary', 'unclassified', 'residential', 'highway_link', 'trunk_link', 'primary_link', 'secondary_link', 'tertiary_link')
class OSMParser:
    def __init__(self, inputData):
        self.root = ET.fromstring(inputData)
    def getRoads(self):
        ways = dict()
        for road in self.root.findall('./way/'):
            highway_tags = road.findall("./tag[@k='highway']")
            if not highway_tags:
                continue
            if all(highway.attrib['v'] not in supported_highways for highway in highway_tags):
                continue
However, when I run the code, it does not find the tag elements of the way element (the second findall produces an empty list). Any idea what's wrong? Thank you.
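Drop the trailing slash: use ./way instead of ./way/. ElementTree treats a path ending in / as if it ended in /*, so ./way/ returns the children of each way (its nd and tag elements) rather than the way elements themselves, and ./tag[@k='highway'] then finds nothing under them.
With the full path, it works: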
>>> root.findall("./way/tag[@k='highway']")
[<Element 'tag' at 0xb74568ac>]
I think that in your actual input, the way element may not be a direct child of the root element.
Or use lxml.etree:
>>> import lxml.etree as ET1
>>> root = ET1.fromstring(content)
>>> root.xpath("//way/tag[@k='highway']")
[<Element tag at 0xb745642c>]
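Putting the pieces together, here is a minimal standard-library sketch of the road extraction. The sample is trimmed from the question with one hypothetical footway (id "1") added for contrast; get_roads is a hypothetical helper that returns way ids mapped to their highway value, and SUPPORTED_HIGHWAYS is a subset of the question's tuple.

```python
import xml.etree.ElementTree as ET

# Subset of the question's supported_highways tuple.
SUPPORTED_HIGHWAYS = {"motorway", "trunk", "primary", "secondary", "tertiary",
                      "unclassified", "residential"}

# Trimmed sample from the question plus a hypothetical footway (id "1").
OSM = """<osm version="0.6" generator="CGImap 0.0.2">
  <node id="298884272" lat="54.0901447" lon="12.2516513"/>
  <way id="26659127">
    <nd ref="292403538"/>
    <tag k="highway" v="unclassified"/>
    <tag k="name" v="Pastower Straße"/>
  </way>
  <way id="1">
    <tag k="highway" v="footway"/>
  </way>
  <relation id="56688">
    <tag k="route" v="bus"/>
  </relation>
</osm>"""


def get_roads(root, supported=SUPPORTED_HIGHWAYS):
    """Return {way id: highway value} for <way> elements with a supported highway tag."""
    roads = {}
    # "way" without a trailing slash selects the <way> children of <osm> themselves.
    for way in root.findall("way"):
        for tag in way.findall("tag[@k='highway']"):
            if tag.get("v") in supported:
                roads[way.get("id")] = tag.get("v")
    return roads


root = ET.fromstring(OSM)
print(get_roads(root))  # {'26659127': 'unclassified'}
```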
