Parsing an .xml file using Python: search and copy related data - python

I want to copy some data from an .xml file based on a search value.
In the XML file below, I want to search for 0xCCB7B836 and copy the data inside that block:
4e564d2d52656648
6173685374617274
1782af065966579e
899885d440d3ad67
d04b41b15e2b13c2
One more example:
search for 0xECFBBA1A and return 0000
or
search for 0xA54E2B5A and return 30d4
<MEM_DATA>
<MEM_SECTOR>
<MEM_SECTOR_NUMBER>0</MEM_SECTOR_NUMBER>
<MEM_SECTOR_STATUS>ACTIVE</MEM_SECTOR_STATUS>
<MEM_SECTOR_STARTADR>0x800000</MEM_SECTOR_STARTADR>
<MEM_SECTOR_ENDADR>0x0</MEM_SECTOR_ENDADR>
<MEM_SECTOR_COUNTER>0x1</MEM_SECTOR_COUNTER>
<MEM_ERASED_MARKER>SET</MEM_ERASED_MARKER>
<MEM_USED_MARKER>SET</MEM_USED_MARKER>
<MEM_FULL_MARKER>NOT_SET</MEM_FULL_MARKER>
<MEM_ERASE_MARKER>NOT_SET</MEM_ERASE_MARKER>
<MEM_START_MARKER>SET</MEM_START_MARKER>
<MEM_START_OFFSET>0x1</MEM_START_OFFSET>
<MEM_CLONE_MARKER>NOT_SET</MEM_CLONE_MARKER>
<MEM_BLOCK>
<MEM_BLOCK_ID>0x101</MEM_BLOCK_ID>
<MEM_BLOCK_NAME>UNKNOWN</MEM_BLOCK_NAME>
<MEM_BLOCK_STATUS>VALID</MEM_BLOCK_STATUS>
<MEM_BLOCK_FLAGS>0x0</MEM_BLOCK_FLAGS>
<MEM_BLOCK_STORAGE>Emulation</MEM_BLOCK_STORAGE>
<MEM_BLOCK_LEN>0x28</MEM_BLOCK_LEN>
<MEM_BLOCK_VERSION>0x0</MEM_BLOCK_VERSION>
<MEM_BLOCK_HEADER_CRC>0xE527</MEM_BLOCK_HEADER_CRC>
<MEM_BLOCK_CRC>0xCCB7B836</MEM_BLOCK_CRC>
<MEM_BLOCK_CRC2>None</MEM_BLOCK_CRC2>
<MEM_BLOCK_DATA>
<MEM_PAGE_DATA>4e564d2d52656648</MEM_PAGE_DATA>
<MEM_PAGE_DATA>6173685374617274</MEM_PAGE_DATA>
<MEM_PAGE_DATA>1782af065966579e</MEM_PAGE_DATA>
<MEM_PAGE_DATA>899885d440d3ad67</MEM_PAGE_DATA>
<MEM_PAGE_DATA>d04b41b15e2b13c2</MEM_PAGE_DATA>
</MEM_BLOCK_DATA>
</MEM_BLOCK>
<MEM_BLOCK>
<MEM_BLOCK_ID>0x20F</MEM_BLOCK_ID>
<MEM_BLOCK_NAME>UNKNOWN</MEM_BLOCK_NAME>
<MEM_BLOCK_STATUS>VALID</MEM_BLOCK_STATUS>
<MEM_BLOCK_FLAGS>0x0</MEM_BLOCK_FLAGS>
<MEM_BLOCK_STORAGE>Emulation</MEM_BLOCK_STORAGE>
<MEM_BLOCK_LEN>0x2</MEM_BLOCK_LEN>
<MEM_BLOCK_VERSION>0x0</MEM_BLOCK_VERSION>
<MEM_BLOCK_HEADER_CRC>0xE0D2</MEM_BLOCK_HEADER_CRC>
<MEM_BLOCK_CRC>0xECFBBA1A</MEM_BLOCK_CRC>
<MEM_BLOCK_CRC2>None</MEM_BLOCK_CRC2>
<MEM_BLOCK_DATA>
<MEM_PAGE_DATA>0000</MEM_PAGE_DATA>
</MEM_BLOCK_DATA>
</MEM_BLOCK>
<MEM_BLOCK>
<MEM_BLOCK_ID>0x1F8</MEM_BLOCK_ID>
<MEM_BLOCK_NAME>UNKNOWN</MEM_BLOCK_NAME>
<MEM_BLOCK_STATUS>VALID</MEM_BLOCK_STATUS>
<MEM_BLOCK_FLAGS>0x0</MEM_BLOCK_FLAGS>
<MEM_BLOCK_STORAGE>Emulation</MEM_BLOCK_STORAGE>
<MEM_BLOCK_LEN>0x2</MEM_BLOCK_LEN>
<MEM_BLOCK_VERSION>0x0</MEM_BLOCK_VERSION>
<MEM_BLOCK_HEADER_CRC>0x1DCC</MEM_BLOCK_HEADER_CRC>
<MEM_BLOCK_CRC>0xA54E2B5A</MEM_BLOCK_CRC>
<MEM_BLOCK_CRC2>None</MEM_BLOCK_CRC2>
<MEM_BLOCK_DATA>
<MEM_PAGE_DATA>30d4</MEM_PAGE_DATA>
</MEM_BLOCK_DATA>
</MEM_BLOCK>
</MEM_SECTOR>
</MEM_DATA>

Assuming this XML data is stored in a file named test.xml, you can do something like this:
import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
root = tree.getroot()
def search_and_copy(query):
    # walk every MEM_BLOCK and return the page data of the first one whose CRC matches
    for child in root.findall("MEM_SECTOR/MEM_BLOCK"):
        if child.find("MEM_BLOCK_CRC").text == query:
            return [item.text for item in child.findall("MEM_BLOCK_DATA/*")]
Let's try this search_and_copy() function out:
>>> search_and_copy("0xCCB7B836")
['4e564d2d52656648', '6173685374617274', '1782af065966579e', '899885d440d3ad67', 'd04b41b15e2b13c2']
>>> search_and_copy("0xA54E2B5A")
['30d4']
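Note that search_and_copy() falls through and returns None when no block matches. If many lookups are needed, one option is to build a dictionary once and query it afterwards. The following is only a sketch of that idea: build_crc_index and the trimmed SAMPLE string are names made up for this illustration, and it assumes each CRC value appears in at most one block (a later duplicate would overwrite an earlier one).

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample with the same layout as the XML in the question.
SAMPLE = """
<MEM_DATA>
  <MEM_SECTOR>
    <MEM_BLOCK>
      <MEM_BLOCK_CRC>0xECFBBA1A</MEM_BLOCK_CRC>
      <MEM_BLOCK_DATA>
        <MEM_PAGE_DATA>0000</MEM_PAGE_DATA>
      </MEM_BLOCK_DATA>
    </MEM_BLOCK>
    <MEM_BLOCK>
      <MEM_BLOCK_CRC>0xA54E2B5A</MEM_BLOCK_CRC>
      <MEM_BLOCK_DATA>
        <MEM_PAGE_DATA>30d4</MEM_PAGE_DATA>
      </MEM_BLOCK_DATA>
    </MEM_BLOCK>
  </MEM_SECTOR>
</MEM_DATA>
"""


def build_crc_index(root):
    """Map each MEM_BLOCK_CRC value to the list of its MEM_PAGE_DATA strings."""
    index = {}
    for block in root.iter("MEM_BLOCK"):
        crc = block.findtext("MEM_BLOCK_CRC")
        if crc is None:
            continue  # skip blocks that have no CRC element
        index[crc] = [page.text for page in block.iter("MEM_PAGE_DATA")]
    return index


root = ET.fromstring(SAMPLE)
crc_index = build_crc_index(root)
print(crc_index.get("0xA54E2B5A"))      # ['30d4']
print(crc_index.get("0xDEADBEEF", []))  # [] for an unknown CRC
```

With a file on disk, ET.parse('test.xml').getroot() would replace the ET.fromstring call.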

We can use XPath, with Python's xml.etree and the third-party elementpath package, to write a function that retrieves the data.
Breakdown of the code below (within the elementpath.Selector):
1. The first line looks for elements whose text equals our search string.
2. The second line, .., goes back one step to get the parent element.
3. Proceeding from the parent element, the third line searches for MEM_PAGE_DATA within it. This element holds the data we are actually interested in.
4. The rest of the code simply pulls the text from the matches.
import xml.etree.ElementTree as ET
import elementpath
# wrapped the shared data into a test.xml file
root = ET.parse('test.xml').getroot()
def find_data(search_string):
    selector = elementpath.Selector(f""".//*[text()='{search_string}']
                                        //..
                                        //MEM_PAGE_DATA""")
    # pull text from the match
    result = [entry.text for entry in selector.select(root)]
    return result
Test on the strings provided:
find_data("0xCCB7B836")
['4e564d2d52656648',
'6173685374617274',
'1782af065966579e',
'899885d440d3ad67',
'd04b41b15e2b13c2']
find_data("0xECFBBA1A")
['0000']
find_data("0xA54E2B5A")
['30d4']
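If installing elementpath is not an option, the limited XPath subset built into xml.etree.ElementTree may be enough for this particular search, since it supports a [child='text'] predicate. This is a sketch rather than a drop-in replacement: find_data_stdlib and SAMPLE are illustrative names, the first block is trimmed to two of its pages, and the search value is assumed to contain no single quotes.

```python
import xml.etree.ElementTree as ET

# Two blocks from the question's XML, reduced to the relevant tags
# (the first block is trimmed to two pages to keep the sample short).
SAMPLE = """
<MEM_DATA>
  <MEM_SECTOR>
    <MEM_BLOCK>
      <MEM_BLOCK_CRC>0xCCB7B836</MEM_BLOCK_CRC>
      <MEM_BLOCK_DATA>
        <MEM_PAGE_DATA>4e564d2d52656648</MEM_PAGE_DATA>
        <MEM_PAGE_DATA>6173685374617274</MEM_PAGE_DATA>
      </MEM_BLOCK_DATA>
    </MEM_BLOCK>
    <MEM_BLOCK>
      <MEM_BLOCK_CRC>0xECFBBA1A</MEM_BLOCK_CRC>
      <MEM_BLOCK_DATA>
        <MEM_PAGE_DATA>0000</MEM_PAGE_DATA>
      </MEM_BLOCK_DATA>
    </MEM_BLOCK>
  </MEM_SECTOR>
</MEM_DATA>
"""


def find_data_stdlib(root, crc):
    """Return the MEM_PAGE_DATA texts of every block whose CRC equals crc."""
    # [MEM_BLOCK_CRC='...'] keeps only blocks with a CRC child holding that text.
    path = f".//MEM_BLOCK[MEM_BLOCK_CRC='{crc}']/MEM_BLOCK_DATA/MEM_PAGE_DATA"
    return [page.text for page in root.findall(path)]


root = ET.fromstring(SAMPLE)
print(find_data_stdlib(root, "0xCCB7B836"))
print(find_data_stdlib(root, "0xECFBBA1A"))
```

Unlike the text()-based search, this version only looks at MEM_BLOCK_CRC, so a matching value in some other tag would not be picked up.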

Related

Having a challenge with an XML file

I want to print this XML file in a form that I can loop through. My aim is to combine it with a CSV file that has the same column names, before creating a database from the combined file. I'm not allowed to use non-standard libraries.
Code:
import xml.etree.ElementTree as ET
xml_file = ET.parse("E:/Research work/My connect/Sam/CETM50 - 2022_3 - Assignment Data/user_data.xml")
# get the parent tag
root = xml_file.getroot()
# print the attributes of the first tag
e = ET.tostring(xml_file.getroot(), encoding='unicode', method='xml')
print(e)
Output:
<user firstName="Jayne" lastName="Wilson" age="69" sex="Female" retired="False" dependants="1" marital_status="divorced" salary="36872" pension="0" company="Wall, Reed and Whitehouse" commute_distance="10.47" address_postcode="TD95 7FL"
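Since every value in the output above is stored as an attribute of a user element, one standard-library way to make the data loopable is to turn each element's attributes into a dict. The sketch below assumes a root element called users (the real root name is not shown in the question) and uses a shortened copy of the record above; rows and SAMPLE are illustrative names.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element "users"; the user record is a shortened
# copy of the one shown in the question's output.
SAMPLE = """
<users>
  <user firstName="Jayne" lastName="Wilson" age="69" sex="Female"/>
</users>
"""

root = ET.fromstring(SAMPLE)

# iter() finds <user> elements at any depth, so the root's name does not matter.
rows = [dict(user.attrib) for user in root.iter("user")]

for row in rows:
    print(row["firstName"], row["lastName"], row["age"])
```

Each row is then a plain dict keyed by column name, the same shape that csv.DictReader yields for a CSV file, which should make combining the two sources easier.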

"not well-formed (invalid token): " error for trying to parse an XML file

I am getting this error while trying to access an XML file called "people-kb.xml".
The problem occurs on this line: xmldoc = minidom.parse(xmlfile) #Accesses file.
xmlfile is "people-kb.xml", which is passed into a method like this:
parseXML('people-kb.xml')
The problem turned out to come from the save file I had created, as I was trying to store multiple trials, each containing information on two people. For now I have only one trial included, not several, as I am starting by creating the file and will later edit it if it already exists.
The code for making the file is:
import xml.etree.cElementTree as ET
def saveXML(xmlfile):
    root = ET.Element("Simulation")
    ET.SubElement(root, "chaserStartingCoords").text = "1,1"
    ET.SubElement(root, "runnerStartingCoords").text = "9,9"
    doc = ET.SubElement(root, "trail")
    ET.SubElement(doc, "number").text = "1"
    doc1 = ET.SubElement(doc, "number", name="number").text = "1" #Trying to make multiple trials
    ET.SubElement(doc1, "chaserEndCoords").text = "10,10"
    ET.SubElement(doc1, "runnerInitialCoords").text = "10,10"
    tree = ET.ElementTree(root)
    tree.write(xmlfile)
if __name__ == '__main__':
    saveXML('output.xml')
Where it says "number", I am trying to put the trial count. The output I am aiming for looks like this:
<simulation>
    <chaserStartingCoords>1,1</chaserStartingCoords>
    <runnerStartingCoords>9,9</runnerStartingCoords>
    <trial>
        <number>1</number>
        <move number="1">
            <chaserEndCoords>10,10</chaserEndCoords>
            <runnerInitialCoords>10,10</runnerInitialCoords>
        </move>
    </trial>
</simulation>
I've been having a problem getting the <move number="1"> part, as later I expect to go into the file and iterate through each node called "move" to check positions.
You say "when trying to name a node of the file, it shows a red highlight on "1" "
That suggests you're trying to use "1", or something beginning with "1", as an element or attribute name, which would be invalid.
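For the <move number="1"> part specifically: in the question's code, the chained assignment doc1 = ET.SubElement(...).text = "1" binds doc1 to the string "1" rather than to the new element, so the following SubElement(doc1, ...) calls have no element to attach to. A minimal sketch of one way to get the intended structure, assuming the element names from the desired output (build_simulation is a made-up helper name):

```python
import xml.etree.ElementTree as ET


def build_simulation():
    """Build the tree from the desired output in the question."""
    root = ET.Element("simulation")
    ET.SubElement(root, "chaserStartingCoords").text = "1,1"
    ET.SubElement(root, "runnerStartingCoords").text = "9,9"

    trial = ET.SubElement(root, "trial")
    ET.SubElement(trial, "number").text = "1"

    # Keep a reference to the element itself; "number" is passed as an
    # attribute via a keyword argument, not as a tag name.
    move = ET.SubElement(trial, "move", number="1")
    ET.SubElement(move, "chaserEndCoords").text = "10,10"
    ET.SubElement(move, "runnerInitialCoords").text = "10,10"
    return root


root = build_simulation()
print(ET.tostring(root, encoding="unicode"))

# Later, every <move> can be visited like this:
for move in root.iter("move"):
    print(move.get("number"), move.findtext("chaserEndCoords"))
```

To save the result, ET.ElementTree(root).write(xmlfile) works as in the original saveXML function.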

Is there something wrong with my script or the XML file? I am using ElementTree in an attempt to get child attributes

This is a shortened version of the XML file that I am trying to parse:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<TipsContents xmlns="http://www.avendasys.com/tipsapiDefs/1.0">
<TipsHeader exportTime="Mon May 04 20:05:47 SAST 2020" version="6.8"/>
<Endpoints>
<Endpoint macVendor="SHENZHEN RF-LINK TECHNOLOGY CO.,LTD." macAddress="c46e7b2939cb" status="Known">
<EndpointProfile updatedAt="May 04, 2020 10:02:21 SAST" profiledBy="Policy Manager" addedAt="Mar 04, 2020 17:31:53 SAST" fingerprint="{}" conflict="false" name="Windows" family="Windows" category="Computer" staticIP="false" ipAddress="xxx.xxx.xxx.xxx"/>
<EndpointTags tagName="Username" tagValue="xxxxxxxx"/>
<EndpointTags tagName="Disabled Reason" tagValue="IS_ACTIVE"/>
</Endpoint>
</Endpoints>
<TagDictionaries>
<TagDictionary allowMultiple="false" mandatory="true" defaultValue="false" dataType="Boolean" attributeName="DOMAIN-MACHINES" entityName="Endpoint"/>
<TagDictionary allowMultiple="false" mandatory="true" defaultValue="true" dataType="Boolean" attributeName="IS_ACTIVE" entityName="Endpoint"/>
<TagDictionary allowMultiple="true" mandatory="false" dataType="String" attributeName="Disabled Reason" entityName="Endpoint"/>
<TagDictionary allowMultiple="false" mandatory="false" dataType="String" attributeName="Username" entityName="Endpoint"/>
</TagDictionaries>
</TipsContents>
I run the following script:
import xml.etree.ElementTree as ET
f = open("Endpoint-5.xml", 'r')
tree = ET.parse(f)
root = tree.getroot()
This is what my outputs look like:
In [8]: root = tree.getroot()
In [9]: root.findall('.')
Out[9]: [<Element '{http://www.avendasys.com/tipsapiDefs/1.0}TipsContents' at 0x10874b410>]
In [10]: root.findall('./TipsHeader')
Out[10]: []
In [11]: root.findall('./TipsContents')
Out[11]: []
In [15]: root.findall('{http://www.avendasys.com/tipsapiDefs/1.0}TipsContents//Endpoints/Endpoint/EndpointProfile')
Out[15]: []
I have been following this: https://docs.python.org/3/library/xml.etree.elementtree.html#example
among other tutorials but I don't seem to get an output.
I have tried from lxml import html
My script is as follows:
tree = html.fromstring(html=f)
updatedAt = tree.xpath("//TipsContents/Endpoints/Endpoint/EndpointProfile/@updatedAt")
name = tree.xpath("//TipsContents/Endpoints/Endpoint/EndpointProfile/@name")
category = tree.xpath("//TipsContents/Endpoints/Endpoint/EndpointProfile/@category")
tagValue = tree.xpath("//TipsContents/Endpoints/Endpoint/EndpointTags[@tagName = 'Username']/@tagValue")
active = tree.xpath("//TipsContents/Endpoints/Endpoint/EndpointTags[@tagName = 'Disabled Reason']/@tagValue")
print("Name:",name)
The above attempt also returns nothing.
I am able to parse an XML document from an API and use the second attempt successfully but when I am doing this from a local file I do not get the results.
Any assistance will be appreciated.
Note that your input XML contains a default namespace, so to refer to
any element you have to specify the namespace.
One way to do it is to define a dictionary of namespaces
(shortcut: full_name), in your case:
ns = {'tips': 'http://www.avendasys.com/tipsapiDefs/1.0'}
Then, using findall:
- use the appropriate shortcut (followed by ':') before each element name,
- pass the namespace dictionary as the second argument.
The code to do it is:
for elem in root.findall('./tips:TipsHeader', ns):
    print(elem.attrib)
The result, for your input sample, is:
{'exportTime': 'Mon May 04 20:05:47 SAST 2020', 'version': '6.8'}
As far as root.findall('./TipsContents') is concerned, it will return
an empty list, even if you specify the namespace as above.
The reason is that TipsContents is the name of the root node,
whereas you attempt to find an element with the same name below in
the XML tree, but it contains no such element.
If you want to access attributes of the root element, you can run:
print(root.attrib)
but to get something more than an empty dictionary, you have to add
some attributes to the root element (namespace is not an attribute).
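The same namespace dictionary can be used to reach the nested attributes the question asks about. Here is a small sketch on a trimmed copy of the sample XML (only a few attributes kept); summaries and SAMPLE are names chosen for this example. Every step of the path needs the prefix, while the attributes themselves are not namespaced and are read with plain get().

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML: same namespace, fewer attributes.
SAMPLE = """<TipsContents xmlns="http://www.avendasys.com/tipsapiDefs/1.0">
  <Endpoints>
    <Endpoint macAddress="c46e7b2939cb" status="Known">
      <EndpointProfile name="Windows" category="Computer"/>
      <EndpointTags tagName="Username" tagValue="xxxxxxxx"/>
      <EndpointTags tagName="Disabled Reason" tagValue="IS_ACTIVE"/>
    </Endpoint>
  </Endpoints>
</TipsContents>
"""

ns = {"tips": "http://www.avendasys.com/tipsapiDefs/1.0"}
root = ET.fromstring(SAMPLE)

summaries = []
# Each path step carries the "tips:" prefix because of the default namespace.
for endpoint in root.findall("tips:Endpoints/tips:Endpoint", ns):
    profile = endpoint.find("tips:EndpointProfile", ns)
    tags = {
        tag.get("tagName"): tag.get("tagValue")
        for tag in endpoint.findall("tips:EndpointTags", ns)
    }
    summaries.append(
        {
            "mac": endpoint.get("macAddress"),
            "name": profile.get("name"),
            "category": profile.get("category"),
            "username": tags.get("Username"),
            "disabled_reason": tags.get("Disabled Reason"),
        }
    )

for summary in summaries:
    print(summary)
```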

Is there a way to parse a XML according to its attributes?

I'm trying to parse my XML using minidom.parse, but the program crashes when the debugger reaches the line xmldoc = minidom.parse(self)
Here is what I have tried:
attribValList = list()
xmldoc = minidom.parse(path)
equipments = xmldoc.getElementsByTagName(xmldoc , elementName)
equipNames = equipments.getElementsByTagName(xmldoc , attributeKey)
for item in equipNames:
    attribValList.append(item.value)
return attribValList
Maybe my XML is too specific for minidom. Here is what it looks like:
<TestSystem id="...">
<Port>58</Port>
<TestSystemEquipment>
<Equipment type="BCAFC">
<Name>System1</Name>
<DU-Junctions>
...
</DU-Junctions>
</Equipment>
Basically I need to get for each Equipment its name and write the names into a list.
Can anybody tell what I'm doing wrong?
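A few things in the attempt above look suspect: in minidom, getElementsByTagName takes just the tag name, it is a method of a document or element (not of the list it returns), and an element's text sits in a child text node rather than in a .value attribute. The following is a rough sketch of collecting the names under those assumptions. The sample XML is a completed guess at the truncated file: the id value "ts-1" and the second Equipment entry are invented, and equipment_names is a made-up helper name.

```python
from xml.dom import minidom

# Hypothetical, completed version of the truncated XML from the question.
SAMPLE = """<TestSystem id="ts-1">
  <Port>58</Port>
  <TestSystemEquipment>
    <Equipment type="BCAFC">
      <Name>System1</Name>
    </Equipment>
    <Equipment type="BCAFC">
      <Name>System2</Name>
    </Equipment>
  </TestSystemEquipment>
</TestSystem>"""


def equipment_names(xmldoc):
    """Collect the text of each Equipment element's direct Name child."""
    names = []
    for equipment in xmldoc.getElementsByTagName("Equipment"):
        # Only direct children are checked, so Name tags nested deeper
        # (for example inside DU-Junctions) are not picked up.
        for child in equipment.childNodes:
            if child.nodeType == child.ELEMENT_NODE and child.tagName == "Name":
                text = child.firstChild.data if child.firstChild else ""
                names.append(text.strip())
    return names


xmldoc = minidom.parseString(SAMPLE)
print(equipment_names(xmldoc))
```

For a file on disk, minidom.parse(path) would replace the parseString call.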

XML reading the last entry from the file in python

I want to read the last entry of the xml file and get its value. Here is my xml file
<TestSuite>
<TestCase>
<name>tcname1</name>
<total>1</total>
<totalpass>0</totalpass>
<totalfail>0</totalfail>
<totalerror>1</totalerror>
</TestCase>
<TestCase>
<name>tcname2</name>
<total>1</total>
<totalpass>0</totalpass>
<totalfail>0</totalfail>
<totalerror>1</totalerror>
</TestCase>
</TestSuite>
I want to get the <total>, <totalpass>, <totalfail> and <totalerror> values from the last <TestCase> element in the file. I have tried this code:
import xmltodict
with open(filename) as fd:
    doc = xmltodict.parse(fd.read())
length=len(doc['TestSuite']['TestCase'])
tp=doc['TestSuite']['TestCase'][length-1]['totalpass']
tf=doc['TestSuite']['TestCase'][length-1]['totalfail']
te=doc['TestSuite']['TestCase'][length-1]['totalerror']
total=doc['TestSuite']['TestCase'][length-1]['total']
This works for XML files with two or more TestCase tags, but fails with this error for a file with only one TestCase tag:
Traceback (most recent call last):
  File "HTMLReportGenerationFromXML.py", line 52, in <module>
    tp=doc['TestSuite']['TestCase'][length-1]['totalpass']
KeyError: 4
Instead of the number of TestCase elements, it is taking the number of subtags (<name> etc.) as the length. Please help me resolve this issue.
Since you only want the last one, you can use a negative index to retrieve it:
import xml.etree.ElementTree as et
tree = et.parse('test.xml')
# collect all the test cases
test_cases = tree.findall('TestCase')
# pull data from the last one
last = test_cases[-1]
total = last.find('total').text
totalpass = last.find('totalpass').text
totalfail = last.find('totalfail').text
totalerror = last.find('totalerror').text
print(total, totalpass, totalfail, totalerror)
Why didn't I do this in the first place! Use XPath.
The first example processes an XML file with two TestCase elements, the second a file with just one. The key point is to use the XPath last() function.
>>> from lxml import etree
>>> tree = etree.parse('temp.xml')
>>> last_TestCase = tree.xpath('.//TestCase[last()]')[0]
>>> for child in last_TestCase.iterchildren():
...     child.tag, child.text
...
('name', 'tcname2')
('total', '1')
('totalpass', '0')
('totalfail', '0')
('totalerror', '1')
>>>
>>> tree = etree.parse('temp_2.xml')
>>> last_TestCase = tree.xpath('.//TestCase[last()]')[0]
>>> for child in last_TestCase.iterchildren():
...     child.tag, child.text
...
('name', 'tcname1')
('reason', 'reason')
('total', '2')
('totalpass', '0')
('totalfail', '0')
('totalerror', '2')
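If lxml is not available, the standard library's ElementTree also accepts a [last()] predicate in its limited XPath subset, so a similar lookup can be sketched without extra dependencies. The two sample strings below are shortened stand-ins for a one-entry and a two-entry file, and last_test_case is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Stand-ins for a file with one TestCase and a file with two.
ONE_CASE = """
<TestSuite>
  <TestCase>
    <name>tcname1</name>
    <total>1</total>
    <totalerror>1</totalerror>
  </TestCase>
</TestSuite>
"""

TWO_CASES = """
<TestSuite>
  <TestCase>
    <name>tcname1</name>
    <total>1</total>
    <totalerror>1</totalerror>
  </TestCase>
  <TestCase>
    <name>tcname2</name>
    <total>1</total>
    <totalerror>1</totalerror>
  </TestCase>
</TestSuite>
"""


def last_test_case(root):
    """Return {tag: text} for the last TestCase child of root, or None."""
    last = root.find("TestCase[last()]")
    if last is None:
        return None  # no TestCase elements at all
    return {child.tag: child.text for child in last}


print(last_test_case(ET.fromstring(ONE_CASE)))
print(last_test_case(ET.fromstring(TWO_CASES)))
```

The same call handles both the single-entry and the multi-entry case, so no special branching is needed.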
I have tried this, and it works for me:
import xml.etree.ElementTree as ET
tree = ET.parse('temp.xml')
root = tree.getroot()
print(root)
total=[]
totalpass=[]
totalfail=[]
totalerror=[]
for test in root.findall('TestCase'):
    total.append(test.find('total').text)
    totalpass.append(test.find('totalpass').text)
    totalfail.append(test.find('totalfail').text)
    totalerror.append(test.find('totalerror').text)
# the last item of each list belongs to the last TestCase
print(total[-1], totalpass[-1], totalfail[-1], totalerror[-1])
The reason for your error is that with xmltodict, doc['TestSuite']['TestCase'] is a list only when the XML contains more than one TestCase entry:
>>> type(doc2['TestSuite']['TestCase']) # here doc2 comes from an XML file with more than one entry
list
but it is a kind of dictionary for a file with a single entry:
>>> type(doc['TestSuite']['TestCase']) # doc comes from a one-entry file
collections.OrderedDict
That's the reason. You could handle the issue in the following way:
import xmltodict
with open(filename) as fd:
    doc = xmltodict.parse(fd.read())
cases = doc['TestSuite']['TestCase']
if isinstance(cases, list):
    last = cases[-1]  # several TestCase entries: take the last one
else:
    last = cases  # a single TestCase entry is parsed as one dict
tp = last['totalpass']
tf = last['totalfail']
te = last['totalerror']
total = last['total']
Otherwise, you can use another library for the XML parsing.
...let me know if it helps!
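The dict-or-list difference can also be hidden behind a tiny helper, so the rest of the code always works with a list. This is only a sketch: as_list is a made-up name, and the two dictionaries are hand-written to mimic the two shapes described above rather than produced by xmltodict. The xmltodict documentation also describes a force_list argument to parse() that may serve the same purpose; it is worth checking for the installed version.

```python
def as_list(value):
    """Return value unchanged if it is already a list, else wrap it in one."""
    return value if isinstance(value, list) else [value]


# Hand-written stand-ins for the parsed result of a one-entry file
# and of a two-entry file.
doc_one = {"TestSuite": {"TestCase": {"name": "tcname1", "total": "1"}}}
doc_two = {
    "TestSuite": {
        "TestCase": [
            {"name": "tcname1", "total": "1"},
            {"name": "tcname2", "total": "1"},
        ]
    }
}

for doc in (doc_one, doc_two):
    # After normalising, [-1] is always the last TestCase entry.
    last = as_list(doc["TestSuite"]["TestCase"])[-1]
    print(last["name"], last["total"])
```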
