I am trying to count from an XML file all the XML nodes of the form:
....
<node id="0">
<data key="d0">Attribute</data>
....
</node>
....
For example a file like this:
<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<graph edgedefault="directed">
<node id="0">
<data key="d0">Attribute</data>
<data key="d1">Foo</data>
</node>
What I have tried is:
x = graphml_root.findall(".//"+nsfy("node")+"/["+nsfy("data")+"='Attribute']")
But this only says that the text of some <data> child has to be "Attribute". I want to make sure that "Attribute" is the text of the <data> child with key="d0", so I tried this:
x = graphml_root.findall(".//"+nsfy("node")+"/"+nsfy("data")+"[@key='d0']"+"[""'Attribute']")
But it returns an empty list, so I am missing something.
NOTE:
I had to write a little lambda to avoid copying the XML namespace all the time:
nsfy = lambda x : '{http://graphml.graphdrawing.org/xmlns}'+x #to be able to read namespace tags
Thanks.
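Try doing something like:
nodes = []
# nsfy() is the namespace helper from the question; without it the path matches nothing
containers = graphml_root.findall('.//' + nsfy('node') + '/' + nsfy('data') + '[@key="d0"]')
for container in containers:
    # ElementTree can filter on the attribute; the text is compared in Python
    if container.text == "Attribute":
        nodes.append(container)
count = len(nodes)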
from lxml import etree
f = '''
<node id="0">
<data key="d0" t="32">Attribute</data>
<data key="d1">Foo</data>
</node>'''
root = etree.XML(f)
data = root.xpath('.//*[@key="d0" and text()="Attribute"]')
print(data)
lxml provides the xpath() method, which supports this predicate directly.
UPDATE
According to the xml.etree documentation, ElementTree does not support this syntax; it implements only a limited subset of XPath.
So all you can do is find .//*[@key="d0"] and then test whether its text equals "Attribute".
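The find-then-test approach described above can be sketched with the standard library alone. This is a minimal sketch, not the original poster's code: the sample document, the `g` prefix, and the function name `count_attribute_nodes` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"

# Hypothetical sample modelled on the GraphML file in the question
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph edgedefault="directed">
    <node id="0">
      <data key="d0">Attribute</data>
      <data key="d1">Foo</data>
    </node>
    <node id="1">
      <data key="d0">Other</data>
      <data key="d1">Attribute</data>
    </node>
  </graph>
</graphml>"""


def count_attribute_nodes(root, key="d0", text="Attribute"):
    """Count <node> elements that have a <data key=...> child with the given text."""
    ns = {"g": GRAPHML_NS}  # the prefix "g" is arbitrary
    count = 0
    for node in root.iterfind(".//g:node", ns):
        # ElementTree handles the attribute predicate; the text test runs in Python
        candidates = node.iterfind(f"g:data[@key='{key}']", ns)
        if any(d.text == text for d in candidates):
            count += 1
    return count


root = ET.fromstring(SAMPLE)
print(count_attribute_nodes(root))
```

The second node is deliberately not counted: its "Attribute" text sits under key="d1", which is exactly the case the question wants to exclude.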
Related
I'm using python and I want to get some value from XML string.
For example if I have this XML string, which I'm getting from CSV:
<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
<System>
<Provider Name='Microsoft-Windows' Guid='{aaaa-ss-www-qqq-qeqweqwe}'/>
<EventID>4771</EventID>
<Version>0</Version>
<Level>0</Level>
<Task>1000</Task>
<Opcode>0</Opcode>
<Keywords>0x9110</Keywords>
<TimeCreated SystemTime='2022-01-01T00:00:00.000000Z'/>
<EventRecordID>123123123</EventRecordID>
<Correlation/>
<Execution ProcessID='2' ThreadID='11'/>
<Channel>Security</Channel>
<Computer>pcname</Computer>
<Security/>
</System>
<EventData>
<Data Name='TargetUserName'>user</Data>
<Data Name='TargetSid'>S-1-5-21-123123-321312-123132-31212</Data>
<Data Name='ServiceName'>service/dom</Data>
<Data Name='TicketOptions'>0x123123</Data>
<Data Name='Status'>0xq</Data>
<Data Name='PreAuthType'>0</Data>
<Data Name='IpAddress'>::ffff:8.8.8.8</Data>
<Data Name='IpPort'>123321</Data>
<Data Name='CertIssuerName'></Data>
<Data Name='CertSerialNumber'></Data>
<Data Name='CertThumbprint'></Data>
</EventData>
</Event>
And I've got some code with which I can get some values by attribute path:
import os, csv
import xml.etree.ElementTree as ET
def cls():
    os.system('cls' if os.name=='nt' else 'clear')
cls()
raw = open('C:/tmp2/data.csv', 'r')
reader = csv.reader(raw)
line_number = 1
for i, row in enumerate(reader):
    if i == line_number:
        break
tree = ET.fromstring(''.join(row))
EventID = [literal.text for literal in tree.findall('.//{http://schemas.microsoft.com/win/2004/08/events/event}System/{http://schemas.microsoft.com/win/2004/08/events/event}EventID')]
TimeCreated = [literal.text for literal in tree.findall('.//{http://schemas.microsoft.com/win/2004/08/events/event}System/{http://schemas.microsoft.com/win/2004/08/events/event}TimeCreated[@Name="SystemTime"]')]
TargetUserName = [literal.text for literal in tree.findall('.//{http://schemas.microsoft.com/win/2004/08/events/event}EventData/{http://schemas.microsoft.com/win/2004/08/events/event}Data[@Name="TargetUserName"]')]
ServiceName = [literal.text for literal in tree.findall('.//{http://schemas.microsoft.com/win/2004/08/events/event}EventData/{http://schemas.microsoft.com/win/2004/08/events/event}Data[@Name="ServiceName"]')]
print ('EVENT:',''.join(EventID))
print ('TimeCreated:',''.join(TimeCreated))
print ('TargetUserName:',''.join(TargetUserName))
print ('ServiceName:', ''.join(ServiceName))
How can I get a value, such as EventID, by attribute name?
You're close, though you should approach the namespaces a bit differently and, if I understand you correctly, modify your TimeCreated, because SystemTime is an attribute of that element rather than its text:
ns = {'': 'http://schemas.microsoft.com/win/2004/08/events/event'}
TimeCreated = [tc.attrib['SystemTime'] for tc in tree.findall('.//System//TimeCreated[@SystemTime]', namespaces=ns)]
EventID = [eid.text for eid in tree.findall('.//System//EventID', namespaces=ns)]
TargetUserName = [tun.text for tun in tree.findall('.//EventData//Data[@Name="TargetUserName"]', namespaces=ns)]
ServiceName = [sn.text for sn in tree.findall('.//EventData//Data[@Name="ServiceName"]', namespaces=ns)]
Output of your print statements, given your sample xml, should be:
EVENT: 4771
TimeCreated: 2022-01-01T00:00:00.000000Z
TargetUserName: user
ServiceName: service/dom
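As far as I know, mapping the empty prefix `''` to a namespace in `findall` needs Python 3.8 or later. A variant with an explicit prefix should also work on older versions. The sketch below is an illustration under that assumption; the prefix `e`, the trimmed sample event, and the function name `event_fields` are not from the original posts.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the event from the question
EVENT_XML = """<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
  <System>
    <EventID>4771</EventID>
    <TimeCreated SystemTime='2022-01-01T00:00:00.000000Z'/>
  </System>
  <EventData>
    <Data Name='TargetUserName'>user</Data>
    <Data Name='ServiceName'>service/dom</Data>
  </EventData>
</Event>"""

# The prefix "e" is arbitrary; it only has to match the paths below
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}


def event_fields(xml_text):
    """Collect EventID, the SystemTime attribute, and every named <Data> value."""
    root = ET.fromstring(xml_text)
    fields = {
        # Element text is read with findtext()
        "EventID": root.findtext("e:System/e:EventID", namespaces=NS),
        # SystemTime is an attribute, so it is read with get()
        "TimeCreated": root.find("e:System/e:TimeCreated", NS).get("SystemTime"),
    }
    # Key each <Data> element by its Name attribute
    for data in root.iterfind("e:EventData/e:Data", NS):
        fields[data.get("Name")] = data.text
    return fields


fields = event_fields(EVENT_XML)
print(fields["EventID"], fields["TimeCreated"], fields["TargetUserName"])
```

Building one dictionary keyed by the Name attribute avoids writing a separate findall call for every field.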
I'm having a hard time following the xml.etree.ElementTree documentation with regard to parsing an XML document with a namespace and nested tags.
To begin, the xml tree I am trying to parse looks like:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ROOT-MAIN xmlns="http://fakeurl.com/page">
<Alarm> <--- I dont care about these types of objects
<Node>
<location>Texas></location>
<name>John</name>
</Node>
</Alarm>
<Alarm> <--- I care about these types of objects
<CreateTime>01/01/2011</CreateTime>
<Story>
<Node>
<Name>Ethan</Name>
<Address category="residential">
<address>1421 Morning SE</address>
</Address>
</Node>
</Story>
<Build>
<Action category="build_value_1">Build was successful</Action>
</Build>
<OtherData type="string" meaning="favoriteTVShow">Purple</OtherData>
<OtherData type="string" meaning="favoriteColor">Seinfeld</OtherData>
</Alarm>
</ROOT-MAIN>
I am trying to build an array of dictionaries that have a similar structure to the second <Alarm> object. When parsing this XML file, I do the following:
import xml.etree.ElementTree as ET
tree = ET.parse('data/'+filename)
root = tree.getroot()
namespace= '{http://fakeurl.com/page}'
for alarm in tree.findall(namespace+'Alarm'):
    for elem in alarm.iter():
        try:
            creation_time = elem.find(namespace+'CreateTime')
            for story in elem.findall(namespace+'Story'):
                for node in story.findall(namespace+'Node'):
                    for Address in node.findall(namespace+'Address'):
                        address = Address.find(namespace+'address').text
            for build in elem.findall(namespace+'Build'):
                category= build.find(namespace+'Action').attrib
                action = build.find(namespace+'Action').text
            for otherdata in elem.findall(namespace+'OtherData'):
                #not sure how to get the 'meaning' attribute value as well as the text value for these <OtherData> tags
                pass
        except:
            pass
Right now I'm just trying to get values for:
<address>
<Action> (attribute value and text value)
<OtherData> (attribute value and text value)
I'm sort of able to do this with for loops within for-loops but I was hoping for a cleaner, xpath solution which I haven't figured out how to do with a namespace.
Any suggestions would be much appreciated.
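Here is an example that collects a subset of the elements you mentioned (add more code to collect the rest). It strips the default namespace from the string before parsing, so the paths need no namespace prefix: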
import xml.etree.ElementTree as ET
import re
xmlstring = '''<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<root xmlns="http://fakeurl.com/page">
<Alarm>
<Node>
<location>Texas></location>
<name>John</name>
</Node>
</Alarm>
<Alarm>
<CreateTime>01/01/2011</CreateTime>
<Story>
<Node>
<Name>Ethan</Name>
<Address category="residential">
<address>1421 Morning SE</address>
</Address>
</Node>
</Story>
<Build>
<Action category="build_value_1">Build was successful</Action>
</Build>
<OtherData type="string" meaning="favoriteTVShow">Purple</OtherData>
<OtherData type="string" meaning="favoriteColor">Seinfeld</OtherData>
</Alarm>
</root>'''
xmlstring = re.sub(' xmlns="[^"]+"', '', xmlstring, count=1)
root = ET.fromstring(xmlstring)
alarms = root.findall('Alarm')
alarms_list = []
for alarm in alarms:
    create_time = alarm.find('CreateTime')
    if create_time is not None:
        entry = {'create_time': create_time.text}
        alarms_list.append(entry)
        actions = alarm.findall('Build/Action')
        if actions:
            entry['builds'] = []
            for action in actions:
                entry['builds'].append({'category': action.attrib['category'], 'status': action.text})
print(alarms_list)
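If you would rather keep the namespace than strip it with a regular expression, a prefix mapping lets ElementTree resolve the paths, and the same pattern covers the address and the OtherData attributes the question asks about. This is a sketch under assumptions: the prefix `p`, the function name `parse_alarms`, and the dictionary layout are illustrative choices, and the sample is a corrected copy of the question's XML.

```python
import xml.etree.ElementTree as ET

# The prefix "p" is arbitrary; it stands for the document's default namespace
NS = {"p": "http://fakeurl.com/page"}

XML = """<root xmlns="http://fakeurl.com/page">
  <Alarm>
    <Node><location>Texas</location><name>John</name></Node>
  </Alarm>
  <Alarm>
    <CreateTime>01/01/2011</CreateTime>
    <Story>
      <Node>
        <Name>Ethan</Name>
        <Address category="residential">
          <address>1421 Morning SE</address>
        </Address>
      </Node>
    </Story>
    <Build>
      <Action category="build_value_1">Build was successful</Action>
    </Build>
    <OtherData type="string" meaning="favoriteTVShow">Purple</OtherData>
    <OtherData type="string" meaning="favoriteColor">Seinfeld</OtherData>
  </Alarm>
</root>"""


def parse_alarms(xml_text):
    """Return one dict per <Alarm> that has a <CreateTime> child."""
    root = ET.fromstring(xml_text)
    alarms = []
    for alarm in root.findall("p:Alarm", NS):
        created = alarm.findtext("p:CreateTime", namespaces=NS)
        if created is None:
            continue  # skip the alarms the question does not care about
        alarms.append({
            "create_time": created,
            "addresses": [a.text for a in
                          alarm.findall("p:Story/p:Node/p:Address/p:address", NS)],
            # attribute value and text value of each <Action>
            "actions": [(a.get("category"), a.text)
                        for a in alarm.findall("p:Build/p:Action", NS)],
            # 'meaning' attribute mapped to the element text
            "other_data": {o.get("meaning"): o.text
                           for o in alarm.findall("p:OtherData", NS)},
        })
    return alarms


alarms = parse_alarms(XML)
print(alarms)
```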
I'm new to Python and lxml, and I can't understand this error. Below is my XML text.
<node id="n25::n1">
<data key="d5" xml:space="preserve"><![CDATA[ronin_sanity]]></data>
<data key="d6">
<ShapeNode>
<Geometry height="86.25" width="182.0" x="3164.9136178770227" y="1045.403736953325"/>
<Fill color="#C0C0C0" transparent="false"/>
<BorderStyle color="#000000" raised="false" type="line" width="1.0"/>
<NodeLabel alignment="center" autoSizePolicy="content" fontFamily="Dialog" fontSize="12" fontStyle="plain" hasBackgroundColor="false" hasLineColor="false" height="18.701171875" horizontalTextPosition="center" iconTextGap="4" modelName="internal" modelPosition="c" textColor="#000000" verticalTextPosition="bottom" visible="true" width="83.376953125" x="49.3115234375" xml:space="preserve" y="33.7744140625">Messages App</NodeLabel>
<Shape type="ellipse"/>
</ShapeNode>
</data>
</node>
This is my XPath query. I want to find the Fill element whose color attribute is "#C0C0C0".
etree.xpath(/node/descendant::Fill[@color='#C0C0C0'])
You can simply pass a proper XPath expression, as a quoted string, to find the element, as shown below:
In [1]: import lxml.etree as ET
In [2]: cat myxml.xml
<node id="n25::n1">
<data key="d5" xml:space="preserve"><![CDATA[ronin_sanity]]></data>
<data key="d6">
<ShapeNode>
<Geometry height="86.25" width="182.0" x="3164.9136178770227" y="1045.403736953325"/>
<Fill color="#C0C0C0" transparent="false"/>
<BorderStyle color="#000000" raised="false" type="line" width="1.0"/>
<NodeLabel alignment="center" autoSizePolicy="content" fontFamily="Dialog" fontSize="12" fontStyle="plain" hasBackgroundColor="false" hasLineColor="false" height="18.701171875" horizontalTextPosition="center" iconTextGap="4" modelName="internal" modelPosition="c" textColor="#000000" verticalTextPosition="bottom" visible="true" width="83.376953125" x="49.3115234375" xml:space="preserve" y="33.7744140625">Messages App</NodeLabel>
<Shape type="ellipse"/>
</ShapeNode>
</data>
</node>
In [3]: tree = ET.parse('myxml.xml')
In [4]: root = tree.getroot()
In [5]: elem = root.xpath('//Fill[@color="#C0C0C0"]')
In [6]: elem
Out[6]: [<Element Fill at 0x7efe04280098>]
If no node matches, you will get an empty list as output:
In [7]: elem = root.xpath('//Fill[@color="#C0C0C0ABC"]')
In [8]: elem
Out[8]: []
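This particular attribute test is simple enough that the standard library's ElementTree should handle it too, in case lxml is not available. A minimal sketch, assuming a trimmed copy of the XML above; the variable names `matches` and `misses` are illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the node from the question
XML = """<node id="n25::n1">
  <data key="d6">
    <ShapeNode>
      <Fill color="#C0C0C0" transparent="false"/>
      <Shape type="ellipse"/>
    </ShapeNode>
  </data>
</node>"""

root = ET.fromstring(XML)

# The whole XPath expression is a Python string; the attribute value is quoted inside it
matches = root.findall(".//Fill[@color='#C0C0C0']")
misses = root.findall(".//Fill[@color='#C0C0C0ABC']")

print(len(matches), len(misses))
```

Note that ElementTree paths are relative to the element they are called on, so the expression starts with `.//` rather than `//`.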
Currently, I have an XML file. I want to say: if the command matches a given string, print all the child elements associated with it. I've documented some of the code that I've tried. I'm using the built-in ElementTree module.
XML
<commands>
<command name="this" type="out" major="0x1" minor="0x0">
<data bytes="1-0" descrip=" ID"></data>
<data bytes="3-2" descrip=" ID"></data>
<data bytes="5-4" descrip=" ID"></data>
<data bytes="7-6" descrip=" Code"></data>
<data bytes="12-8" descrip=" Revision"></data>
<data bytes="13" descrip=" Version"></data>
<data bytes="14" descrip=" Mask"></data>
<data bytes="15" descrip="Reserved"></data>
<data bytes="17-16" descrip=" Windows"></data>
<data bytes="19-18" descrip=" of Write Flush Addresses"></data>
</command>
</commands>
Sample Code to Parse Out Names
tree = ET.parse('command_details.xml')
root = tree.getroot()
for child in root:
    if child.attrib['major'] == str(hex(int(major_bits[::-1], 2))) and child.attrib['minor'] == str(hex(int(minor_bits[::-1], 2))):
        command_name = str(child.attrib['name'])
I basically want to dive deeper and print the sub tags of the command name.
You have to get the children of the child and iterate over them (the grandchildren of the root):
tree = ET.parse('command_details.xml')
root = tree.getroot()
for child in root:
    if child.attrib['major'] == str(hex(int(major_bits[::-1], 2))) and child.attrib['minor'] == str(hex(int(minor_bits[::-1], 2))):
        command_name = str(child.attrib['name'])
        # Iterating an element yields its direct children (getchildren() was removed in Python 3.9)
        for grandchild in child:
            print(grandchild.attrib['bytes'])
            print(grandchild.attrib['descrip'])
Or if you want to print the full XML line, you can do:
print(ET.tostring(grandchild, encoding='unicode').strip())
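A self-contained variant of the same idea, selecting the command by its name attribute instead of the major/minor bits, might look like the sketch below. The second command ("that") and the function name `command_fields` are hypothetical additions for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened sample; the "that" command is made up to show a non-matching sibling
XML = """<commands>
  <command name="this" type="out" major="0x1" minor="0x0">
    <data bytes="1-0" descrip=" ID"></data>
    <data bytes="13" descrip=" Version"></data>
  </command>
  <command name="that" type="in" major="0x2" minor="0x0">
    <data bytes="3-0" descrip=" Length"></data>
  </command>
</commands>"""


def command_fields(root, name):
    """Return (bytes, descrip) pairs for the <data> children of the named command."""
    command = root.find(f"command[@name='{name}']")
    if command is None:
        return []  # no command with that name
    return [(d.get("bytes"), d.get("descrip", "").strip())
            for d in command.findall("data")]


root = ET.fromstring(XML)
for byte_range, description in command_fields(root, "this"):
    print(byte_range, description)
```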
This is the sample data.
input.xml
<root>
<entry id="1">
<headword>go</headword>
<example>I <hw>go</hw> to school.</example>
</entry>
</root>
I'd like to put the <example> node and its descendants into an <examplegrp> element. That is,
output.xml
<root>
<entry id="1">
<headword>go</headword>
<examplegrp>
<example>I <hw>go</hw> to school.</example>
</examplegrp>
</entry>
</root>
My poor and incomplete script is:
import codecs
import xml.etree.ElementTree as ET
fin = codecs.open(r'input.xml', 'rb', encoding='utf-8')
data = ET.parse(fin)
root = data.getroot()
example = root.find('.//example')
for elem in example.iter():
---and then I don't know what to do---
Here's an example of how it can be done:
text = """
<root>
<entry id="1">
<headword>go</headword>
<example>I <hw>go</hw> to school.</example>
</entry>
</root>
"""
import io
import lxml.etree
data = lxml.etree.parse(io.StringIO(text))
root = data.getroot()
# Visit every <entry> that contains at least one <example>
for entry in root.xpath('//example/ancestor::entry[1]'):
    examplegrp = lxml.etree.SubElement(entry, "examplegrp")
    nodes = [node for node in entry.xpath('./example')]
    for node in nodes:
        # Move each <example> from the entry into the new group
        entry.remove(node)
        examplegrp.append(node)
print(lxml.etree.tostring(root, pretty_print=True, encoding='unicode'))
which will output:
<root>
<entry id="1">
<headword>go</headword>
<examplegrp><example>I <hw>go</hw> to school.</example>
</examplegrp></entry>
</root>
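Since the question itself uses xml.etree.ElementTree rather than lxml, the same regrouping can be sketched with the standard library. This is an illustrative sketch, not the answerer's code; the function name `wrap_examples` is an assumption, and the wrapper is inserted at the position of the first <example> instead of being appended at the end.

```python
import xml.etree.ElementTree as ET

XML = """<root>
<entry id="1">
<headword>go</headword>
<example>I <hw>go</hw> to school.</example>
</entry>
</root>"""


def wrap_examples(root):
    """Move the <example> children of every <entry> into a new <examplegrp>."""
    # Snapshot the entries first, because the tree is modified inside the loop
    for entry in list(root.iter("entry")):
        examples = entry.findall("example")
        if not examples:
            continue
        # Insert the wrapper where the first <example> currently sits
        index = list(entry).index(examples[0])
        group = ET.Element("examplegrp")
        entry.insert(index, group)
        for example in examples:
            entry.remove(example)   # detach from the entry
            group.append(example)   # re-attach under the wrapper
    return root


root = wrap_examples(ET.fromstring(XML))
print(ET.tostring(root, encoding="unicode"))
```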
http://docs.python.org/3/library/xml.dom.html?highlight=xml#node-objects
http://docs.python.org/3/library/xml.dom.html?highlight=xml#document-objects
You probably want to follow some paradigm of creating a Document element and appending each result to it.
group = document.createElement(tagName)  # document is an xml.dom Document instance
for found in founds:
    group.appendChild(found)  # the xml.dom method is appendChild, not appendNode
Or something like this.
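A runnable version of that DOM paradigm, using xml.dom.minidom from the standard library, could look like the following sketch. The function name `group_examples` and the compact sample string are illustrative, and the sketch assumes all matching elements belong under one parent, since a single wrapper is created for all of them.

```python
from xml.dom import minidom

# Compact sample without whitespace, so the only child nodes are elements
XML = ('<root><entry id="1"><headword>go</headword>'
       '<example>I <hw>go</hw> to school.</example></entry></root>')


def group_examples(document, tag_name="example", group_tag="examplegrp"):
    """Wrap every <tag_name> element in one new <group_tag> element."""
    # Snapshot the matches first, because the tree is modified below
    founds = list(document.getElementsByTagName(tag_name))
    if not founds:
        return None
    group = document.createElement(group_tag)
    # Place the wrapper where the first match currently sits
    founds[0].parentNode.insertBefore(group, founds[0])
    for found in founds:
        # appendChild moves the node out of its old parent into the wrapper
        group.appendChild(found)
    return group


doc = minidom.parseString(XML)
group = group_examples(doc)
print(doc.documentElement.toxml())
```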