Parse xml string by attribute name - python

I'm using Python and I want to get some values from an XML string.
For example, I have this XML string, which I'm reading from a CSV file:
<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
<System>
<Provider Name='Microsoft-Windows' Guid='{aaaa-ss-www-qqq-qeqweqwe}'/>
<EventID>4771</EventID>
<Version>0</Version>
<Level>0</Level>
<Task>1000</Task>
<Opcode>0</Opcode>
<Keywords>0x9110</Keywords>
<TimeCreated SystemTime='2022-01-01T00:00:00.000000Z'/>
<EventRecordID>123123123</EventRecordID>
<Correlation/>
<Execution ProcessID='2' ThreadID='11'/>
<Channel>Security</Channel>
<Computer>pcname</Computer>
<Security/>
</System>
<EventData>
<Data Name='TargetUserName'>user</Data>
<Data Name='TargetSid'>S-1-5-21-123123-321312-123132-31212</Data>
<Data Name='ServiceName'>service/dom</Data>
<Data Name='TicketOptions'>0x123123</Data>
<Data Name='Status'>0xq</Data>
<Data Name='PreAuthType'>0</Data>
<Data Name='IpAddress'>::ffff:8.8.8.8</Data>
<Data Name='IpPort'>123321</Data>
<Data Name='CertIssuerName'></Data>
<Data Name='CertSerialNumber'></Data>
<Data Name='CertThumbprint'></Data>
</EventData>
</Event>
And I've got some code with which I can get some values by element path:
import os, csv
import xml.etree.ElementTree as ET
def cls():
    os.system('cls' if os.name=='nt' else 'clear')
cls()
raw = open('C:/tmp2/data.csv', 'r')
reader = csv.reader(raw)
line_number = 1
for i, row in enumerate(reader):
    if i == line_number:
        break
tree = ET.fromstring(''.join(row))
EventID = [literal.text for literal in tree.findall('.//{http://schemas.microsoft.com/win/2004/08/events/event}System/{http://schemas.microsoft.com/win/2004/08/events/event}EventID')]
TimeCreated = [literal.text for literal in tree.findall('.//{http://schemas.microsoft.com/win/2004/08/events/event}System/{http://schemas.microsoft.com/win/2004/08/events/event}TimeCreated[@Name="SystemTime"]')]
TargetUserName = [literal.text for literal in tree.findall('.//{http://schemas.microsoft.com/win/2004/08/events/event}EventData/{http://schemas.microsoft.com/win/2004/08/events/event}Data[@Name="TargetUserName"]')]
ServiceName = [literal.text for literal in tree.findall('.//{http://schemas.microsoft.com/win/2004/08/events/event}EventData/{http://schemas.microsoft.com/win/2004/08/events/event}Data[@Name="ServiceName"]')]
print ('EVENT:',''.join(EventID))
print ('TimeCreated:',''.join(TimeCreated))
print ('TargetUserName:',''.join(TargetUserName))
print ('ServiceName:', ''.join(ServiceName))
How can I get a value, like EventID, by attribute name?

You're close, though you should approach the namespaces a bit differently and, if I understand you correctly, modify your TimeCreated lookup, since SystemTime is an attribute of that element rather than its text:
ns = {'': 'http://schemas.microsoft.com/win/2004/08/events/event'}
TimeCreated = [tc.attrib['SystemTime'] for tc in tree.findall('.//System//TimeCreated[@SystemTime]', namespaces=ns)]
EventID = [eid.text for eid in tree.findall('.//System//EventID', namespaces=ns)]
TargetUserName = [tun.text for tun in tree.findall('.//EventData//Data[@Name="TargetUserName"]', namespaces=ns)]
ServiceName = [sn.text for sn in tree.findall('.//EventData//Data[@Name="ServiceName"]', namespaces=ns)]
Output of your print statements, given your sample xml, should be:
EVENT: 4771
TimeCreated: 2022-01-01T00:00:00.000000Z
TargetUserName: user
ServiceName: service/dom
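If you would rather collect every Data value keyed by its Name attribute instead of writing one query per field, a dictionary comprehension is one option. The following is only a sketch under a few assumptions: it works on a trimmed copy of the sample event, it uses an explicit namespace prefix (the prefix `e` is arbitrary) rather than the empty-prefix mapping, and the helper name `event_fields` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample event from the question.
XML = """<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
<System>
<EventID>4771</EventID>
<TimeCreated SystemTime='2022-01-01T00:00:00.000000Z'/>
</System>
<EventData>
<Data Name='TargetUserName'>user</Data>
<Data Name='ServiceName'>service/dom</Data>
<Data Name='CertIssuerName'></Data>
</EventData>
</Event>"""

# The prefix 'e' is an arbitrary choice; only the URI has to match the document.
NS = {'e': 'http://schemas.microsoft.com/win/2004/08/events/event'}


def event_fields(xml_text):
    """Hypothetical helper: return EventID, TimeCreated and all Data values in one dict."""
    root = ET.fromstring(xml_text)
    # Element text comes from .text; empty elements have text None, mapped to '' here.
    fields = {d.get('Name'): d.text or ''
              for d in root.findall('e:EventData/e:Data', NS)}
    fields['EventID'] = root.findtext('e:System/e:EventID', namespaces=NS)
    # SystemTime is an attribute, so it is read with .get(), not .text.
    time_created = root.find('e:System/e:TimeCreated', NS)
    fields['TimeCreated'] = (time_created.get('SystemTime')
                             if time_created is not None else None)
    return fields


fields = event_fields(XML)
print(fields['EventID'], fields['TargetUserName'], fields['ServiceName'])
```

The point of the sketch is the split between element text (`.text`, `findtext`) and attribute values (`.get`, `.attrib`), which is what the TimeCreated lookup in the question was missing.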

Related

Add tag with content to existing XML (resx) using python

I have an XML with a number of strings:
<?xml version="1.0" encoding="UTF-8"?>
<Strings>
<String id="TEST_STRING_FROM_XML">
<en>Test string from XML</en>
<de>Testzeichenfolge aus XML</de>
<es>Cadena de prueba de XML</es>
<fr>Tester la chaîne à partir de XML</fr>
<it>Stringa di test da XML</it>
<ja>XMLからのテスト文字列</ja>
<ko>XML에서 테스트 문자열</ko>
<nl>Testreeks van XML</nl>
<pl>Łańcuch testowy z XML</pl>
<pt>Cadeia de teste de XML</pt>
<ru>Тестовая строка из XML</ru>
<sv>Teststräng från XML</sv>
<zh-CHS>从XML测试字符串</zh-CHS>
<zh-CHT>從XML測試字符串</zh-CHT>
<Comment>A test string that comes from a shared XML file.</Comment>
</String>
<String id="TEST_STRING_FROM_XML_2">
<en>Another test string from XML.</en>
<de></de>
<es></es>
<fr></fr>
<it></it>
<ja></ja>
<ko></ko>
<nl></nl>
<pl></pl>
<pt></pt>
<ru></ru>
<sv></sv>
<zh-CHS></zh-CHS>
<zh-CHT></zh-CHT>
<Comment>Another test string that comes from a shared XML file.</Comment>
</String>
</Strings>
And I would like to append these strings to a resx file with a long list of strings in the following format:
<?xml version="1.0" encoding="utf-8"?>
<root>
<!--
Microsoft ResX Schema
Version 2.0
**a bunch of schema and header stuff...**
-->
<data name="STRING_NAME_1" xml:space="preserve">
<value>This is a value 1</value>
<comment>This is a comment 1</comment>
</data>
<data name="STRING_NAME_2" xml:space="preserve">
<value>This is a value 2</value>
<comment>This is a comment 2</comment>
</data>
</root>
But using the following snippet of python code:
import sys, os, os.path, re
import xml.etree.ElementTree as ET
from xml.dom import minidom
existingStrings = []
newStrings = {}
languages = []
resx = '*path to resx file*'
def LoadAllNewStrings():
    src_root = ET.parse('Strings.xml').getroot()
    for src_string in src_root.findall('String'):
        src_id = src_string.get('id')
        src_value = src_string.findtext("en")
        src_comment = src_string.findtext("Comment")
        content = [src_value, src_comment]
        newStrings[src_id] = content
def ExscludeExistingStrings():
    dest_root = ET.parse(resx)
    for stringName in dest_root.findall('Name'):
        for stringId in newStrings:
            if stringId == stringName:
                newStrings.remove(stringId)
def PrettifyXML(element):
    roughString = ET.tostring(element, 'utf-8')
    reparsed = minidom.parseString(roughString)
    return reparsed.toprettyxml(indent=" ")
def AddMissingStringsToLocalResource():
    ExscludeExistingStrings()
    with open(resx, "a") as output:
        root = ET.parse(resx).getroot()
        for newString in newStrings:
            data = ET.Element("data", name=newString)
            newStringContent = newStrings[newString]
            newStringValue = newStringContent[0]
            newStringComment = newStringContent[1]
            ET.SubElement(data, "value").text = newStringValue
            ET.SubElement(data, "comment").text = newStringComment
            output.write(PrettifyXML(data))
if __name__ == "__main__":
    LoadAllNewStrings()
    AddMissingStringsToLocalResource()
I get the following XML appended to the end of the resx file:
<data name="STRING_NAME_2" xml:space="preserve">
<value>This is a value 1</value>
<comment>This is a comment 1</comment>
</data>
</root><?xml version="1.0" ?>
<data name="TEST_STRING_FROM_XML">
<value>Test string from XML</value>
<comment>A test string that comes from a shared XML file.</comment>
</data>
<?xml version="1.0" ?>
<data name="TEST_STRING_FROM_XML_2">
<value>Another test string from XML.</value>
<comment>Another test string that comes from a shared XML file.</comment>
</data>
I.e. the root ends and then my new strings are added after. Any ideas on how to add the data tags to the existing root properly?
with open(resx, "a") as output:
No. Don't open XML files as text files. Not for reading, not for writing, not for appending. Never.
The typical life cycle of an XML file is:
parsing (with an XML parser)
reading or modification (with a DOM API)
if there were changes: serialization (also with a DOM API)
At no point should you ever call open() on an XML file. XML files are not supposed to be treated as if they were plain text. They are not.
# parsing
resx = ET.parse(resx_path)
root = resx.getroot()
# modification
for newString in newStrings:
    newStringContent = newStrings[newString]
    # create node
    data = ET.Element("data", name=newString)
    ET.SubElement(data, "value").text = newStringContent[0]
    ET.SubElement(data, "comment").text = newStringContent[1]
    # append node, e.g. to the top level element
    root.append(data)
# serialization
resx.write(resx_path, encoding='utf8')
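The parse, modify, serialize cycle above can be tried end to end with a throwaway file. This is a minimal sketch, not a drop-in replacement for the original script: the sample resx content, the `NEW_STRINGS` dictionary and the helper name `add_missing_strings` are all assumptions made for the demonstration, and it also skips names that already exist in the file.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Small stand-in for the real resx file (assumed content, header comment omitted).
RESX_SAMPLE = """<root>
  <data name="STRING_NAME_1" xml:space="preserve">
    <value>This is a value 1</value>
    <comment>This is a comment 1</comment>
  </data>
</root>"""

# name -> (value, comment); the first entry already exists in the sample.
NEW_STRINGS = {
    "STRING_NAME_1": ("duplicate value", "duplicate comment"),
    "TEST_STRING_FROM_XML": ("Test string from XML",
                             "A test string that comes from a shared XML file."),
}


def add_missing_strings(resx_path, new_strings):
    """Append <data> nodes for names not yet present; return the names added."""
    tree = ET.parse(resx_path)                      # parsing
    root = tree.getroot()
    existing = {data.get("name") for data in root.findall("data")}
    added = []
    for name, (value, comment) in new_strings.items():
        if name in existing:
            continue
        data = ET.SubElement(root, "data", name=name)   # modification
        ET.SubElement(data, "value").text = value
        ET.SubElement(data, "comment").text = comment
        added.append(name)
    tree.write(resx_path, encoding="utf-8", xml_declaration=True)  # serialization
    return added


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Sample.resx")
    # Create the demo file through the XML API as well.
    ET.ElementTree(ET.fromstring(RESX_SAMPLE)).write(
        path, encoding="utf-8", xml_declaration=True)
    added = add_missing_strings(path, NEW_STRINGS)
    names_after = [d.get("name") for d in ET.parse(path).getroot().findall("data")]

print(added)
print(names_after)
```

One caveat: ElementTree's default parser discards XML comments, so a header comment like the one in the resx file would not survive this round trip unless the parser is configured to keep comments.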

Parsing XML with namespace into dictionary

I'm having a hard time following the xml.etree.ElementTree documentation with regard to parsing an XML document with a namespace and nested tags.
To begin, the xml tree I am trying to parse looks like:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ROOT-MAIN xmlns="http://fakeurl.com/page">
<Alarm> <--- I dont care about these types of objects
<Node>
<location>Texas></location>
<name>John</name>
</Node>
</Alarm>
<Alarm> <--- I care about these types of objects
<CreateTime>01/01/2011</CreateTime>
<Story>
<Node>
<Name>Ethan</name
<Address category="residential>
<address>1421 Morning SE</address>
</address>
</Node>
</Story>
<Build>
<Action category="build_value_1">Build was successful</Action>
</Build>
<OtherData type="string" meaning="favoriteTVShow">Purple</OtherData>
<OtherData type="string" meaning="favoriteColor">Seinfeld</OtherData>
</Alarm>
</ROOT-MAIN>
I am trying to build an array of dictionaries that have a similar structure to the second <Alarm> object. When parsing this XML file, I do the following:
import xml.etree.ElementTree as ET
tree = ET.parse('data/'+filename)
root = tree.getroot()
namespace= '{http://fakeurl.com/page}'
for alarm in tree.findall(namespace+'Alarm'):
    for elem in alarm.iter():
        try:
            creation_time = elem.find(namespace+'CreateTime')
            for story in elem.findall(namespace+'Story'):
                for node in story.findall(namespace+'Node'):
                    for Address in node.findall(namespace+'Address'):
                        address = Address.find(namespace+'address').text
            for build in elem.findall(namespace+'Build'):
                category= build.find(namespace+'Action').attrib
                action = build.find(namespace+'Action').text
            for otherdata in elem.findall(namespace+'OtherData'):
                #not sure how to get the 'meaning' attribute value as well as the text value for these <OtherData> tags
                pass
        except:
            pass
Right now I'm just trying to get values for:
<address>
<Action> (attribute value and text value)
<OtherData> (attribute value and text value)
I'm sort of able to do this with for loops within for-loops but I was hoping for a cleaner, xpath solution which I haven't figured out how to do with a namespace.
Any suggestions would be much appreciated.
Here is one way (it collects a subset of the elements you mentioned; add more code to collect the rest):
import xml.etree.ElementTree as ET
import re
xmlstring = '''<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<root xmlns="http://fakeurl.com/page">
<Alarm>
<Node>
<location>Texas></location>
<name>John</name>
</Node>
</Alarm>
<Alarm>
<CreateTime>01/01/2011</CreateTime>
<Story>
<Node>
<Name>Ethan</Name>
<Address category="residential">
<address>1421 Morning SE</address>
</Address>
</Node>
</Story>
<Build>
<Action category="build_value_1">Build was successful</Action>
</Build>
<OtherData type="string" meaning="favoriteTVShow">Purple</OtherData>
<OtherData type="string" meaning="favoriteColor">Seinfeld</OtherData>
</Alarm>
</root>'''
# strip the default namespace so plain tag names can be used in the paths below
xmlstring = re.sub(' xmlns="[^"]+"', '', xmlstring, count=1)
root = ET.fromstring(xmlstring)
alarms = root.findall('Alarm')
alarms_list = []
for alarm in alarms:
    create_time = alarm.find('CreateTime')
    if create_time is not None:
        entry = {'create_time': create_time.text}
        alarms_list.append(entry)
        actions = alarm.findall('Build/Action')
        if actions:
            entry['builds'] = []
            for action in actions:
                entry['builds'].append({'category': action.attrib['category'], 'status': action.text})
print(alarms_list)
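If you prefer to keep the namespace instead of stripping it with a regular expression, passing a prefix map to `find`/`findall` is an alternative. The sketch below is one possible shape, assuming a cleaned-up copy of the sample document; the prefix `p` and the dictionary keys are arbitrary names chosen for illustration. It also shows how to read both the `meaning` attribute and the text of each `<OtherData>` element.

```python
import xml.etree.ElementTree as ET

# Well-formed copy of the sample document from the question.
XML = """<ROOT-MAIN xmlns="http://fakeurl.com/page">
  <Alarm>
    <Node><location>Texas</location><name>John</name></Node>
  </Alarm>
  <Alarm>
    <CreateTime>01/01/2011</CreateTime>
    <Story>
      <Node>
        <Name>Ethan</Name>
        <Address category="residential">
          <address>1421 Morning SE</address>
        </Address>
      </Node>
    </Story>
    <Build>
      <Action category="build_value_1">Build was successful</Action>
    </Build>
    <OtherData type="string" meaning="favoriteTVShow">Purple</OtherData>
    <OtherData type="string" meaning="favoriteColor">Seinfeld</OtherData>
  </Alarm>
</ROOT-MAIN>"""

# 'p' is an arbitrary prefix; only the URI has to match the document.
NS = {"p": "http://fakeurl.com/page"}

root = ET.fromstring(XML)
alarms = []
for alarm in root.findall("p:Alarm", NS):
    create_time = alarm.findtext("p:CreateTime", namespaces=NS)
    if create_time is None:
        continue  # skip alarms of the first kind, which have no CreateTime
    alarms.append({
        "create_time": create_time,
        "address": alarm.findtext("p:Story/p:Node/p:Address/p:address",
                                  namespaces=NS),
        # (attribute value, text value) for every Action
        "actions": [(a.get("category"), a.text)
                    for a in alarm.findall("p:Build/p:Action", NS)],
        # 'meaning' attribute -> text value for every OtherData
        "other": {o.get("meaning"): o.text
                  for o in alarm.findall("p:OtherData", NS)},
    })

print(alarms)
```

Each nested `for` loop from the question collapses into a single path expression, because every step in the path carries the prefix.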

lxml error : lxml.etree.XPathEvalError: Invalid expression with descendant

I'm new to both Python and lxml, and I can't make sense of this error. Below is my XML.
<node id="n25::n1">
<data key="d5" xml:space="preserve"><![CDATA[ronin_sanity]]></data>
<data key="d6">
<ShapeNode>
<Geometry height="86.25" width="182.0" x="3164.9136178770227" y="1045.403736953325"/>
<Fill color="#C0C0C0" transparent="false"/>
<BorderStyle color="#000000" raised="false" type="line" width="1.0"/>
<NodeLabel alignment="center" autoSizePolicy="content" fontFamily="Dialog" fontSize="12" fontStyle="plain" hasBackgroundColor="false" hasLineColor="false" height="18.701171875" horizontalTextPosition="center" iconTextGap="4" modelName="internal" modelPosition="c" textColor="#000000" verticalTextPosition="bottom" visible="true" width="83.376953125" x="49.3115234375" xml:space="preserve" y="33.7744140625">Messages App</NodeLabel>
<Shape type="ellipse"/>
</ShapeNode>
</data>
</node>
This is my XPath query. I want to find the Fill element whose color attribute is "#C0C0C0".
etree.xpath(/node/descendant::Fill[@color='#C0C0C0'])
You can simply pass a proper XPath expression, as a quoted string, to find the element, as shown below:
In [1]: import lxml.etree as ET
In [2]: cat myxml.xml
<node id="n25::n1">
<data key="d5" xml:space="preserve"><![CDATA[ronin_sanity]]></data>
<data key="d6">
<ShapeNode>
<Geometry height="86.25" width="182.0" x="3164.9136178770227" y="1045.403736953325"/>
<Fill color="#C0C0C0" transparent="false"/>
<BorderStyle color="#000000" raised="false" type="line" width="1.0"/>
<NodeLabel alignment="center" autoSizePolicy="content" fontFamily="Dialog" fontSize="12" fontStyle="plain" hasBackgroundColor="false" hasLineColor="false" height="18.701171875" horizontalTextPosition="center" iconTextGap="4" modelName="internal" modelPosition="c" textColor="#000000" verticalTextPosition="bottom" visible="true" width="83.376953125" x="49.3115234375" xml:space="preserve" y="33.7744140625">Messages App</NodeLabel>
<Shape type="ellipse"/>
</ShapeNode>
</data>
</node>
In [3]: tree = ET.parse('myxml.xml')
In [4]: root = tree.getroot()
In [5]: elem = root.xpath('//Fill[@color="#C0C0C0"]')
In [6]: elem
Out[6]: [<Element Fill at 0x7efe04280098>]
If no node matches, you will get an empty list as output:
In [7]: elem = root.xpath('//Fill[@color="#C0C0C0ABC"]')
In [8]: elem
Out[8]: []
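For a simple attribute test like this one, the standard library's ElementTree can do the same lookup, which may be handy where lxml is not installed. This is a sketch with a shortened copy of the XML; note that it uses `findall` with a relative `.//` path, since ElementTree has no `xpath` method and supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the XML from the question.
XML = """<node id="n25::n1">
  <data key="d6">
    <ShapeNode>
      <Fill color="#C0C0C0" transparent="false"/>
      <BorderStyle color="#000000" raised="false" type="line" width="1.0"/>
    </ShapeNode>
  </data>
</node>"""

root = ET.fromstring(XML)

# The expression is an ordinary Python string; the predicate value is quoted inside it.
matches = root.findall(".//Fill[@color='#C0C0C0']")
misses = root.findall(".//Fill[@color='#C0C0C0ABC']")

print([m.attrib for m in matches])
print(misses)
```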

Python xpath with xml.etree.ElementTree: multiple conditions

I am trying to count from an XML file all the XML nodes of the form:
....
<node id="0">
<data key="d0">Attribute</data>
....
</node>
....
For example a file like this:
<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<graph edgedefault="directed">
<node id="0">
<data key="d0">Attribute</data>
<data key="d1">Foo</data>
</node>
What I have tried is:
x = graphml_root.findall(".//"+nsfy("node")+"/["+nsfy("data")+"='Attribute']")
But this only says that the text of the data element has to be "Attribute". I want to make sure that "Attribute" is the text of the data element with key="d0", so I tried this:
x = graphml_root.findall(".//"+nsfy("node")+"/"+nsfy("data")+"[@key='d0']"+"[""'Attribute']")
But it returns an empty list, so I am missing something.
NOTE:
I had to write a little lambda to avoid copying the XML namespace all the time:
nsfy = lambda x : '{http://graphml.graphdrawing.org/xmlns}'+x #to be able to read namespace tags
Thanks.
Try doing something like:
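nodes = []
# the document uses a default namespace, so the tags must be qualified (here with the nsfy helper from the question)
containers = graphml_root.findall('.//' + nsfy('node') + '/' + nsfy('data') + '[@key="d0"]')
for container in containers:
    if container.text == "Attribute":
        nodes.append(container)
count = len(nodes)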
from lxml import etree
f= '''
<node id="0">
<data key="d0" t="32">Attribute</data>
<data key="d1">Foo</data>
</node>'''
root = etree.XML(f)
data = root.xpath('.//*[@key="d0" and text()="Attribute"]')
print(data)
lxml provides the xpath method, which handles this directly.
UPDATE
According to the xml.etree documentation, ElementTree does not support this syntax; it implements only a limited subset of XPath.
So with xml.etree, what you can do is find .//*[@key="d0"] and then test whether its text equals "Attribute".
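On recent Python versions there is a further option with the standard library alone: as far as I know, ElementTree has accepted a `[.='text']` predicate since Python 3.7, and predicates can be chained, so both conditions fit in one path. The sketch below assumes a small two-node GraphML document (the second node is made up so that the key condition actually matters) and an arbitrary prefix `g`; it also keeps the filter-in-Python fallback for older versions.

```python
import xml.etree.ElementTree as ET

# Reduced GraphML sample; the second <node> is invented to exercise the key test.
XML = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph edgedefault="directed">
    <node id="0">
      <data key="d0">Attribute</data>
      <data key="d1">Foo</data>
    </node>
    <node id="1">
      <data key="d0">Entity</data>
      <data key="d1">Attribute</data>
    </node>
  </graph>
</graphml>"""

NS = {"g": "http://graphml.graphdrawing.org/xmlns"}
root = ET.fromstring(XML)

# Chained predicates: key must be "d0" AND the element text must be "Attribute".
# Assumes Python 3.7+ for the [.='...'] predicate.
hits = root.findall(".//g:node/g:data[@key='d0'][.='Attribute']", NS)
count = len(hits)

# Fallback that does not rely on the text predicate: filter on the attribute
# in the path, then compare the text in Python.
fallback = [d for d in root.findall(".//g:node/g:data[@key='d0']", NS)
            if d.text == "Attribute"]

print(count, len(fallback))
```

The prefix map also replaces the `nsfy` lambda from the question, since the namespace URI is written only once.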

Iterate through XML child of a child tags in Python

I have an XML file. If a command's name matches a given string (for example "this"), I want to print all the child elements associated with that command. Below is some of the code I've tried. I'm using the built-in ElementTree module.
XML
<commands>
<command name="this" type="out" major="0x1" minor="0x0">
<data bytes="1-0" descrip=" ID"></data>
<data bytes="3-2" descrip=" ID"></data>
<data bytes="5-4" descrip=" ID"></data>
<data bytes="7-6" descrip=" Code"></data>
<data bytes="12-8" descrip=" Revision"></data>
<data bytes="13" descrip=" Version"></data>
<data bytes="14" descrip=" Mask"></data>
<data bytes="15" descrip="Reserved"></data>
<data bytes="17-16" descrip=" Windows"></data>
<data bytes="19-18" descrip=" of Write Flush Addresses"></data>
</command>
</commands>
Sample Code to Parse Out Names
tree = ET.parse('command_details.xml')
root = tree.getroot()
for child in root:
    if child.attrib['major'] == str(hex(int(major_bits[::-1], 2))) and child.attrib['minor'] == str(hex(int(minor_bits[::-1], 2))):
        command_name = str(child.attrib['name'])
I basically want to dive deeper and print the sub tags of the command name.
You have to get the children of the child and iterate through all of the grandchildren:
tree = ET.parse('command_details.xml')
root = tree.getroot()
for child in root:
    if child.attrib['major'] == str(hex(int(major_bits[::-1], 2))) and child.attrib['minor'] == str(hex(int(minor_bits[::-1], 2))):
        command_name = str(child.attrib['name'])
        # iterating over an element yields its direct children (getchildren() was removed in Python 3.9)
        for grandchild in child:
            print(grandchild.attrib['bytes'])
            print(grandchild.attrib['descrip'])
Or if you want to print the full XML line, you can do:
print(ET.tostring(grandchild, encoding='unicode').strip())
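The same lookup can also be written with attribute predicates, so that ElementTree selects the matching command and the loop only handles its children. This is a sketch under a few assumptions: the XML is a shortened copy of the sample, the second command ("that") is invented to show that non-matching commands are skipped, and `command_fields` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Shortened sample; the "that" command is made up for the demonstration.
XML = """<commands>
  <command name="this" type="out" major="0x1" minor="0x0">
    <data bytes="1-0" descrip=" ID"></data>
    <data bytes="7-6" descrip=" Code"></data>
    <data bytes="13" descrip=" Version"></data>
  </command>
  <command name="that" type="in" major="0x2" minor="0x0">
    <data bytes="1-0" descrip=" Status"></data>
  </command>
</commands>"""


def command_fields(root, major, minor):
    """Hypothetical helper: return (name, [(bytes, descrip), ...]) for one command."""
    # Two chained attribute predicates select the command by major and minor.
    path = "command[@major='{}'][@minor='{}']".format(major, minor)
    command = root.find(path)
    if command is None:
        return None, []
    fields = [(d.get("bytes"), d.get("descrip").strip())
              for d in command.findall("data")]
    return command.get("name"), fields


root = ET.fromstring(XML)
name, fields = command_fields(root, "0x1", "0x0")
print(name)
for byte_range, description in fields:
    print(byte_range, description)
```

The `major` and `minor` arguments are plain strings here; in the original code they would come from the `hex(int(..., 2))` conversions.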
