python read complex xml with ElementTree

I am trying to parse this XML file with Python's ElementTree:
<?xml version="1.0" encoding="Windows-1250"?>
<rsp:responsePack version="2.0" id="001" state="ok" note="" programVersion="9801.8 (19.5.2011)" xmlns:rsp="http://www.stormware.cz/schema/version_2/response.xsd" xmlns:rdc="http://www.stormware.cz/schema/version_2/documentresponse.xsd" xmlns:typ="http://www.stormware.cz/schema/version_2/type.xsd" xmlns:lst="http://www.stormware.cz/schema/version_2/list.xsd" xmlns:lStk="http://www.stormware.cz/schema/version_2/list_stock.xsd" xmlns:lAdb="http://www.stormware.cz/schema/version_2/list_addBook.xsd" xmlns:acu="http://www.stormware.cz/schema/version_2/accountingunit.xsd" xmlns:inv="http://www.stormware.cz/schema/version_2/invoice.xsd" xmlns:vch="http://www.stormware.cz/schema/version_2/voucher.xsd" xmlns:int="http://www.stormware.cz/schema/version_2/intDoc.xsd" xmlns:stk="http://www.stormware.cz/schema/version_2/stock.xsd" xmlns:ord="http://www.stormware.cz/schema/version_2/order.xsd" xmlns:ofr="http://www.stormware.cz/schema/version_2/offer.xsd" xmlns:enq="http://www.stormware.cz/schema/version_2/enquiry.xsd" xmlns:vyd="http://www.stormware.cz/schema/version_2/vydejka.xsd" xmlns:pri="http://www.stormware.cz/schema/version_2/prijemka.xsd" xmlns:bal="http://www.stormware.cz/schema/version_2/balance.xsd" xmlns:pre="http://www.stormware.cz/schema/version_2/prevodka.xsd" xmlns:vyr="http://www.stormware.cz/schema/version_2/vyroba.xsd" xmlns:pro="http://www.stormware.cz/schema/version_2/prodejka.xsd" xmlns:con="http://www.stormware.cz/schema/version_2/contract.xsd" xmlns:adb="http://www.stormware.cz/schema/version_2/addressbook.xsd" xmlns:prm="http://www.stormware.cz/schema/version_2/parameter.xsd" xmlns:lCon="http://www.stormware.cz/schema/version_2/list_contract.xsd" xmlns:ctg="http://www.stormware.cz/schema/version_2/category.xsd" xmlns:ipm="http://www.stormware.cz/schema/version_2/intParam.xsd">
<rsp:responsePackItem version="2.0" id="li1" state="ok">
<lst:listInvoice version="2.0" dateTimeStamp="2011-05-27T10:47:23Z" dateValidFrom="2011-05-27" state="ok">
<lst:invoice version="2.0">
<inv:invoiceHeader>
<inv:id>20</inv:id>
<inv:invoiceType>issuedInvoice</inv:invoiceType>
<inv:number>
<typ:id>26</typ:id>
<typ:ids>1101</typ:ids>
<typ:numberRequested>110100001</typ:numberRequested>
</inv:number>
<inv:symVar>110100001</inv:symVar>
<inv:date>2011-01-30</inv:date>
<inv:dateTax>2011-01-30</inv:dateTax>
<inv:dateAccounting>2011-01-30</inv:dateAccounting>
<inv:dateDue>2011-02-13</inv:dateDue>
<inv:accounting>
<typ:id>17</typ:id>
<typ:ids>3Fv</typ:ids>
</inv:accounting>
<inv:classificationVAT>
<typ:id>251</typ:id>
<typ:ids>UD</typ:ids>
<typ:classificationVATType/>
</inv:classificationVAT>
<inv:text>Fakturujeme Vám zboží dle Vaší objednávky: </inv:text>
<inv:partnerIdentity>
<typ:id>15</typ:id>
<typ:address>
<typ:company>INTEAK spol. s r. o.</typ:company>
<typ:division>prodejna</typ:division>
<typ:name>David Jánský</typ:name>
<typ:city>Benešovice</typ:city>
<typ:street>Jiřího z Poděbrad 35</typ:street>
<typ:zip>463 48</typ:zip>
<typ:ico>85236972</typ:ico>
<typ:dic>CZ85236972</typ:dic>
</typ:address>
<typ:shipToAddress>
<typ:company/>
<typ:division/>
<typ:name/>
<typ:city/>
<typ:street/>
</typ:shipToAddress>
</inv:partnerIdentity>
<inv:myIdentity>
<typ:address>
<typ:company>Novák </typ:company>
<typ:surname>Novák</typ:surname>
<typ:name>Jan</typ:name>
<typ:city>Jihlava 1</typ:city>
<typ:street>Horní</typ:street>
<typ:number>15</typ:number>
<typ:zip>586 01</typ:zip>
<typ:ico>12345678</typ:ico>
<typ:dic>CZ12345678</typ:dic>
<typ:phone>569 876 542</typ:phone>
<typ:mobilPhone>602 852 369</typ:mobilPhone>
<typ:fax>564 563 216</typ:fax>
<typ:email>info#novak.cz</typ:email>
<typ:www>www.novak.cz</typ:www>
</typ:address>
</inv:myIdentity>
<inv:dateOrder>2011-01-22</inv:dateOrder>
<inv:paymentType>
<typ:id>1</typ:id>
<typ:ids>příkazem</typ:ids>
<typ:paymentType>draft</typ:paymentType>
</inv:paymentType>
<inv:account>
<typ:id>2</typ:id>
<typ:ids>KB</typ:ids>
</inv:account>
<inv:symConst>0308</inv:symConst>
<inv:centre>
<typ:id>1</typ:id>
<typ:ids>BRNO</typ:ids>
</inv:centre>
<inv:activity>
<typ:id>2</typ:id>
<typ:ids>NÁBYTEK</typ:ids>
</inv:activity>
<inv:liquidation>
<typ:date>2011-02-12</typ:date>
<typ:amountHome>356</typ:amountHome>
</inv:liquidation>
</inv:invoiceHeader>
<inv:invoiceDetail>
<inv:invoiceItem>
<inv:id>19</inv:id>
<inv:text>Židle Z220</inv:text>
<inv:quantity>2.0</inv:quantity>
<inv:unit>ks</inv:unit>
<inv:coefficient>1.0</inv:coefficient>
<inv:rateVAT>high</inv:rateVAT>
<inv:discountPercentage>0.0</inv:discountPercentage>
<inv:homeCurrency>
<typ:unitPrice>1968</typ:unitPrice>
<typ:price>3936</typ:price>
<typ:priceVAT>787.2</typ:priceVAT>
<typ:priceSum>4723.2</typ:priceSum>
</inv:homeCurrency>
<inv:code>Z220</inv:code>
<inv:guarantee>0</inv:guarantee>
<inv:guaranteeType>none</inv:guaranteeType>
<inv:stockItem>
<typ:store>
<typ:id>1</typ:id>
<typ:ids>ZBOŽÍ</typ:ids>
</typ:store>
<typ:stockItem>
<typ:id>27</typ:id>
<typ:ids>Z220</typ:ids>
<typ:PLU>650</typ:PLU>
</typ:stockItem>
</inv:stockItem>
</inv:invoiceItem>
<inv:invoiceItem>
<inv:id>20</inv:id>
<inv:text>Konferenční stolek chrom</inv:text>
<inv:quantity>1.0</inv:quantity>
<inv:unit>ks</inv:unit>
<inv:coefficient>1.0</inv:coefficient>
<inv:rateVAT>high</inv:rateVAT>
<inv:discountPercentage>0.0</inv:discountPercentage>
<inv:homeCurrency>
<typ:unitPrice>7680</typ:unitPrice>
<typ:price>7680</typ:price>
<typ:priceVAT>1536</typ:priceVAT>
<typ:priceSum>9216</typ:priceSum>
</inv:homeCurrency>
<inv:note>Rozměr: 120 x 60</inv:note>
<inv:code>Konf11</inv:code>
<inv:guarantee>0</inv:guarantee>
<inv:guaranteeType>none</inv:guaranteeType>
<inv:stockItem>
<typ:store>
<typ:id>1</typ:id>
<typ:ids>ZBOŽÍ</typ:ids>
</typ:store>
<typ:stockItem>
<typ:id>10</typ:id>
<typ:ids>Konf11</typ:ids>
<typ:PLU>625</typ:PLU>
</typ:stockItem>
</inv:stockItem>
</inv:invoiceItem>
<inv:invoiceItem>
<inv:id>21</inv:id>
<inv:text>Křeslo čalouněné 1320</inv:text>
<inv:quantity>4.0</inv:quantity>
<inv:unit>ks</inv:unit>
<inv:coefficient>1.0</inv:coefficient>
<inv:rateVAT>high</inv:rateVAT>
<inv:discountPercentage>0.0</inv:discountPercentage>
<inv:homeCurrency>
<typ:unitPrice>5988</typ:unitPrice>
<typ:price>23952</typ:price>
<typ:priceVAT>4790.4</typ:priceVAT>
<typ:priceSum>28742.4</typ:priceSum>
</inv:homeCurrency>
<inv:code>Kř1320</inv:code>
<inv:guarantee>0</inv:guarantee>
<inv:guaranteeType>none</inv:guaranteeType>
<inv:stockItem>
<typ:store>
<typ:id>1</typ:id>
<typ:ids>ZBOŽÍ</typ:ids>
</typ:store>
<typ:stockItem>
<typ:id>13</typ:id>
<typ:ids>Kř1320</typ:ids>
<typ:PLU>627</typ:PLU>
</typ:stockItem>
</inv:stockItem>
</inv:invoiceItem>
</inv:invoiceDetail>
<inv:invoiceSummary>
<inv:roundingDocument>up2one</inv:roundingDocument>
<inv:roundingVAT>none</inv:roundingVAT>
<inv:homeCurrency>
<typ:priceNone>0</typ:priceNone>
<typ:priceLow>0</typ:priceLow>
<typ:priceLowVAT>0</typ:priceLowVAT>
<typ:priceLowSum>0</typ:priceLowSum>
<typ:priceHigh>35568</typ:priceHigh>
<typ:priceHighVAT>7113.6</typ:priceHighVAT>
<typ:priceHighSum>42681.6</typ:priceHighSum>
<typ:round>
<typ:priceRound>0.4</typ:priceRound>
</typ:round>
</inv:homeCurrency>
</inv:invoiceSummary>
</lst:invoice>
</lst:listInvoice>
</rsp:responsePackItem>
</rsp:responsePack>
How can I get data such as the following?
inv:invoiceSummary - typ:priceHighSum
inv:partnerIdentity - typ:name, typ:ico
inv:myIdentity - typ:company
inv:liquidation - typ:date
I tried this, but I can't get it working:
import xml.etree.ElementTree as ET
tree = ET.parse('temp_xml2.xml')
root = tree.getroot()
for listInvoice in root.findall('listInvoice'):
    invoiceHeader = listInvoice.find('invoiceHeader').text
    print invoiceHeader

Try to use jsoup. A related example is parsing XML.

This works, because every tag in the search path is qualified with its namespace URI:
for listInvoice in root.findall('.//{http://www.stormware.cz/schema/version_2/invoice.xsd}invoiceHeader'):
    invoiceHeader = listInvoice.find('.//{http://www.stormware.cz/schema/version_2/invoice.xsd}id').text
    print(invoiceHeader)
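The specific fields asked about in the question can probably be reached the same way, with a prefix-to-URI mapping instead of the long `{uri}tag` form. The following is a minimal sketch, not a definitive solution: `SAMPLE` is a trimmed-down stand-in for the question's file, and `NS` and `extract_invoice_fields` are names made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the question's file (same namespaces and nesting).
SAMPLE = """<rsp:responsePack
    xmlns:rsp="http://www.stormware.cz/schema/version_2/response.xsd"
    xmlns:lst="http://www.stormware.cz/schema/version_2/list.xsd"
    xmlns:inv="http://www.stormware.cz/schema/version_2/invoice.xsd"
    xmlns:typ="http://www.stormware.cz/schema/version_2/type.xsd">
  <rsp:responsePackItem>
    <lst:listInvoice>
      <lst:invoice>
        <inv:invoiceHeader>
          <inv:partnerIdentity>
            <typ:address>
              <typ:name>David Jánský</typ:name>
              <typ:ico>85236972</typ:ico>
            </typ:address>
          </inv:partnerIdentity>
          <inv:myIdentity>
            <typ:address>
              <typ:company>Novák </typ:company>
            </typ:address>
          </inv:myIdentity>
          <inv:liquidation>
            <typ:date>2011-02-12</typ:date>
          </inv:liquidation>
        </inv:invoiceHeader>
        <inv:invoiceSummary>
          <inv:homeCurrency>
            <typ:priceHighSum>42681.6</typ:priceHighSum>
          </inv:homeCurrency>
        </inv:invoiceSummary>
      </lst:invoice>
    </lst:listInvoice>
  </rsp:responsePackItem>
</rsp:responsePack>"""

# Prefix -> namespace URI mapping; the prefixes only need to match the paths below.
NS = {
    'lst': 'http://www.stormware.cz/schema/version_2/list.xsd',
    'inv': 'http://www.stormware.cz/schema/version_2/invoice.xsd',
    'typ': 'http://www.stormware.cz/schema/version_2/type.xsd',
}

def extract_invoice_fields(root):
    """Return one dict per lst:invoice with the fields asked about in the question."""
    header = 'inv:invoiceHeader/'
    paths = {
        'priceHighSum': 'inv:invoiceSummary/inv:homeCurrency/typ:priceHighSum',
        'partner_name': header + 'inv:partnerIdentity/typ:address/typ:name',
        'partner_ico': header + 'inv:partnerIdentity/typ:address/typ:ico',
        'my_company': header + 'inv:myIdentity/typ:address/typ:company',
        'liquidation_date': header + 'inv:liquidation/typ:date',
    }
    results = []
    for invoice in root.findall('.//lst:invoice', NS):
        # findtext returns None when a path does not match, instead of raising.
        results.append({key: invoice.findtext(path, namespaces=NS)
                        for key, path in paths.items()})
    return results

invoices = extract_invoice_fields(ET.fromstring(SAMPLE))
print(invoices)
```

For the real file, `ET.parse('temp_xml2.xml').getroot()` would take the place of `ET.fromstring(SAMPLE)`.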

Related

Create dataframe of certain XML element's text python pandas

I am trying to create a dataframe out the XML code as shown below
<Structure>
<Field>
<Field_Name>GAMEREF</Field_Name>
<Field_Type>Numeric</Field_Type>
<Field_Len>4</Field_Len>
<Field_Dec>0</Field_Dec>
</Field>
...
<Field>
<Field_Name>WINLOSS</Field_Name>
<Field_Type>Character</Field_Type>
<Field_Len>1</Field_Len>
<Field_Dec>0</Field_Dec>
</Field>
</Structure>
<Records>
<Record>
<GAMEREF>1217</GAMEREF>
<YEAR>2021</YEAR>
(MORE ELEMENTS I DO NOT CARE ABOUT)
<GAMENO>1</GAMENO>
<WINLOSS>W</WINLOSS>
</Record>
...
<Record>
<GAMEREF>1220</GAMEREF>
<YEAR>2021</YEAR>
(MORE ELEMENTS I DO NOT CARE ABOUT)
<GAMENO>4</GAMENO>
<WINLOSS>L</WINLOSS>
</Record>
</Records>
The Structure section of the XML is irrelevant to the dataframe I am trying to create.
I only want to use the GAMEREF, YEAR, GAMENO, and WINLOSS elements; the Record elements contain more than these.
I have tried the code shown below, but when I run it I get the error "AttributeError: 'NoneType' object has no attribute 'text'".
The code is below.
import pandas as pd
import xml.etree.ElementTree as et
xtree = et.parse("gamedata.xml")
xroot = xtree.getroot()
df_cols = ["GAME REF","YEAR", "GAME NO", "WIN LOSS"]
rows = []
for child in xroot.iter():
    s_gameref = child.find('GAMEREF').text,
    s_year = child.find('YEAR').text,
    s_game_no = child.find('GAMENO').text,
    s_winloss = child.find('WINLOSS').text
    rows.append({"GAME REF": s_gameref,"YEAR": s_year,
                 "GAME NO": s_game_no, "WIN LOSS": s_winloss})
df = pd.DataFrame(rows, columns = df_cols)
The code is based on examples I have seen on Stack Overflow and other sites, but nothing is working yet.
Ideal dataframe output is below
GAME REF  YEAR  GAME NO  WIN LOSS
1217      2021  1        W
1218      2021  2        W
1219      2021  3        L
1220      2021  4        L
Thanks

I think the code below is what you are looking for: it loops over only the "interesting" sub-elements of each Record. The logic is in the line that starts with data = [..., which contains both loops.
import pandas as pd
import xml.etree.ElementTree as ET
xml = '''<r><Structure>
<Field>
<Field_Name>GAMEREF</Field_Name>
<Field_Type>Numeric</Field_Type>
<Field_Len>4</Field_Len>
<Field_Dec>0</Field_Dec>
</Field>
...
<Field>
<Field_Name>WINLOSS</Field_Name>
<Field_Type>Character</Field_Type>
<Field_Len>1</Field_Len>
<Field_Dec>0</Field_Dec>
</Field>
</Structure>
<Records>
<Record>
<GAMEREF>1217</GAMEREF>
<YEAR>2021</YEAR>
<GAMENO>1</GAMENO>
<WINLOSS>W</WINLOSS>
</Record>
<Record>
<GAMEREF>1220</GAMEREF>
<YEAR>2021</YEAR>
<GAMENO>4</GAMENO>
<WINLOSS>L</WINLOSS>
</Record>
</Records></r>'''
fields = {'GAMEREF':'GAME REF', 'YEAR':'YEAR', 'GAMENO':'GAME NO','WINLOSS':'WIN LOSS'}
root = ET.fromstring(xml)
data = [{display_name: rec.find(element_name).text for element_name,display_name in fields.items()} for rec in root.findall('.//Record')]
df = pd.DataFrame(data)
print(df)
Output:
  GAME REF  YEAR GAME NO WIN LOSS
0     1217  2021       1        W
1     1220  2021       4        L

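import pandas as pd
import xml.etree.ElementTree as et
xtree = et.parse("gamedata.xml")
xroot = xtree.getroot()
df_cols = ["GAME REF","YEAR", "GAME NO", "WIN LOSS"]
rows = []
# Visit only the Record elements, wherever they sit below the root.
for record in xroot.iter('Record'):
    s_gameref = record.find('GAMEREF').text
    s_year = record.find('YEAR').text
    s_game_no = record.find('GAMENO').text
    s_winloss = record.find('WINLOSS').text
    rows.append({"GAME REF": s_gameref,"YEAR": s_year,
                 "GAME NO": s_game_no, "WIN LOSS": s_winloss})
df = pd.DataFrame(rows, columns = df_cols)
Remove the bare .iter() call. Without a tag it visits every element in the tree, including ones that have no GAMEREF child, so find() returns None and .text raises the AttributeError. Also remove the trailing commas after .text, which turn each value into a one-element tuple.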
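If some Record elements might lack one of the four fields, `findtext` with a default avoids the `NoneType` error altogether. This is a small sketch using only the standard library; the `<r>` wrapper element, the `SAMPLE` data, and the `records_to_rows` helper are assumptions made for the illustration, not part of the original file.

```python
import xml.etree.ElementTree as ET

# Assumed sample: the second Record deliberately lacks a WINLOSS element.
SAMPLE = """<r>
<Structure><Field><Field_Name>GAMEREF</Field_Name></Field></Structure>
<Records>
<Record><GAMEREF>1217</GAMEREF><YEAR>2021</YEAR><GAMENO>1</GAMENO><WINLOSS>W</WINLOSS></Record>
<Record><GAMEREF>1220</GAMEREF><YEAR>2021</YEAR><GAMENO>4</GAMENO></Record>
</Records>
</r>"""

# XML tag -> dataframe column name.
FIELDS = {'GAMEREF': 'GAME REF', 'YEAR': 'YEAR',
          'GAMENO': 'GAME NO', 'WINLOSS': 'WIN LOSS'}

def records_to_rows(root):
    """Build one dict per Record; missing fields become empty strings."""
    rows = []
    for record in root.iter('Record'):  # only Record elements, at any depth
        rows.append({column: record.findtext(tag, default='')
                     for tag, column in FIELDS.items()})
    return rows

rows = records_to_rows(ET.fromstring(SAMPLE))
print(rows)
```

The resulting list of dicts can be passed to `pd.DataFrame(rows)` as in the answers above.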

Python re.findall organize list

I have a text file with entries like this:
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<Applications_GetResponse xmlns="http://www.country.com">
<Applications>
<CS_Application>
<Name>Spain</Name>
<Key>2345364564</Key>
<Status>NORMAL</Status>
<Modules>
<CS_Module>
<Name>zaragoza</Name>
<Key>8743249725</Key>
<DevelopmentEffort>0</DevelopmentEffort>
<LogicalDBConnections/>
</CS_Module>
<CS_Module>
<Name>malaga</Name>
<Key>8743249725</Key>
<DevelopmentEffort>0</DevelopmentEffort>
<LogicalDBConnections/>
</CS_Module>
</Modules>
<CreatedBy>7</CreatedBy>
</CS_Application>
<CS_Application>
<Name>UK</Name>
<Key>2345364564</Key>
<Status>NORMAL</Status>
<Modules>
<CS_Module>
<Name>london</Name>
<Key>8743249725</Key>
<DevelopmentEffort>0</DevelopmentEffort>
<LogicalDBConnections/>
</CS_Module>
<CS_Module>
<Name>liverpool</Name>
<Key>8743249725</Key>
<DevelopmentEffort>0</DevelopmentEffort>
<LogicalDBConnections/>
</CS_Module>
</Modules>
<CreatedBy>7</CreatedBy>
</CS_Application>
</Applications>
</Applications_GetResponse>
</soap:Body>
</soap:Envelope>
I would like to parse it and get each country name followed by its cities.
I tried a few things with Python's re.findall, but I could not get that result:
print("HERE APPLICATIONS")
applications = re.findall('<CS_Application><Name>(.*?)</Name>', response_apply.text)
print(applications)
print("HERE MODULES")
modules = re.findall('<CS_Module><Name>(.*?)</Name>', response_apply.text)
print(modules)
Output:
host-10$ sudo python3 capture.py
HERE APPLICATIONS
['Spain', 'UK']
HERE MODULES
['zaragoza', 'malaga', 'london', 'liverpool']
I would like the result to look like this:
HERE
The Country: Spain - Cities: zaragoza,malaga
The Country: UK - Cities: london,liverpool

Regex is not a good tool for parsing XML; it is better to use an XML parser.
If you still want a regex solution, I hope the code below helps.
import re
s = """\n<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">\n <soap:Body>\n <Applications_GetResponse xmlns="http://www.country.com">\n <Applications>\n <CS_Application>\n <Name>Spain</Name>\n <Key>2345364564</Key>\n <Status>NORMAL</Status>\n <Modules>\n <CS_Module>\n <Name>zaragoza</Name>\n <Key>8743249725</Key>\n <DevelopmentEffort>0</DevelopmentEffort>\n <LogicalDBConnections/>\n </CS_Module>\n <CS_Module>\n <Name>malaga</Name>\n <Key>8743249725</Key>\n <DevelopmentEffort>0</DevelopmentEffort>\n <LogicalDBConnections/>\n </CS_Module>\n </Modules>\n <CreatedBy>7</CreatedBy>\n </CS_Application>\n <CS_Application>\n <Name>UK</Name>\n <Key>2345364564</Key>\n <Status>NORMAL</Status>\n <Modules>\n <CS_Module>\n <Name>london</Name>\n <Key>8743249725</Key>\n <DevelopmentEffort>0</DevelopmentEffort>\n <LogicalDBConnections/>\n </CS_Module>\n <CS_Module>\n <Name>liverpool</Name>\n <Key>8743249725</Key>\n <DevelopmentEffort>0</DevelopmentEffort>\n <LogicalDBConnections/>\n </CS_Module>\n </Modules>\n <CreatedBy>7</CreatedBy>\n </CS_Application>\n </Applications>\n </Applications_GetResponse>\n </soap:Body>\n</soap:Envelope>\n"""
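# Capture everything between one pair of CS_Application tags (across newlines).
pattern1 = re.compile(r'<CS_Application>([\s\S]*?)</CS_Application>')
# Capture the text of each Name element (non-greedy, single line).
pattern2 = re.compile(r'<Name>(.*?)</Name>')
for m in pattern1.finditer(s):
    ss = m.group(1)
    res = []
    for mm in pattern2.finditer(ss):
        res.append(mm.group(1))
    # The first Name is the country; the remaining ones are its cities.
    print("The Country: " + res[0] + " - Cities: " + ",".join(res[1:]))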
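For comparison, here is a rough sketch of the parser-based approach that the answer recommends, using the standard library's ElementTree. `SAMPLE` is a shortened copy of the response from the question, and the prefix `c` and the helper `countries_with_cities` are names chosen for this illustration only.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the SOAP response from the question.
SAMPLE = """<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
  <soap:Body>
    <Applications_GetResponse xmlns="http://www.country.com">
      <Applications>
        <CS_Application>
          <Name>Spain</Name>
          <Modules>
            <CS_Module><Name>zaragoza</Name></CS_Module>
            <CS_Module><Name>malaga</Name></CS_Module>
          </Modules>
        </CS_Application>
        <CS_Application>
          <Name>UK</Name>
          <Modules>
            <CS_Module><Name>london</Name></CS_Module>
            <CS_Module><Name>liverpool</Name></CS_Module>
          </Modules>
        </CS_Application>
      </Applications>
    </Applications_GetResponse>
  </soap:Body>
</soap:Envelope>"""

# The inner elements live in the default namespace declared on Applications_GetResponse.
NS = {'c': 'http://www.country.com'}

def countries_with_cities(xml_text):
    """Return a list of (country, [cities]) pairs in document order."""
    root = ET.fromstring(xml_text)
    result = []
    for app in root.iterfind('.//c:CS_Application', NS):
        country = app.findtext('c:Name', namespaces=NS)  # direct child only
        cities = [module.findtext('c:Name', namespaces=NS)
                  for module in app.iterfind('c:Modules/c:CS_Module', NS)]
        result.append((country, cities))
    return result

for country, cities in countries_with_cities(SAMPLE):
    print("The Country: {} - Cities: {}".format(country, ",".join(cities)))
```

Because the paths are relative to each CS_Application, the country name and the module names cannot be mixed up, which is the weak point of the regex version.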

Building XML from excel data with Python

I am trying to build an XML file from an Excel spreadsheet using Python, but I am having trouble building the structure. The XML schema is unique to a piece of software, so the opening few tags and the closing few would be easier to write to the XML file as variables, as shown below. They are constant so are pulled from the "
I believe the script needs to loop through another sheet, the ".XML Framework" sheet, to build the XML structure, as these are the values that will ultimately change. The structure of this sheet is provided below.
Here is the XML structure. The Python script outputs it correctly up to the unique values, and the changing values are shown in bold. This shows just one row of the data from the workbook. When the workbook has a second row, the .xml structure repeats again where it starts with .
The data structure in the excel sheet ".XML Framework" is:
col 1 = **equals**
col 2 = **74**
col 3 = **Data**
col 4 = col 3
col 5 = **Name 07**
col 6 = col 5
col 7 = **wstring**
col 8 = /**SM15-HVAC-SUPP-TM-37250-ST**
THIS IS THE DESIRED XML STRUCTURE
<?xml version="1.0" encoding="UTF-8" ?>
<exchange xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://download.autodesk.com/us/navisworks/schemas/nw-exchange-12.0.xsd" units="m" filename="" filepath="">
<selectionsets>
<selectionset name="Dev_1">
<findspec mode="all" disjoint="0">
<conditions>
<condition test="**equals**" flags="**74**">
<category>
<name internal="**Data**">**Data**</name>
</category>
<property>
<name internal="**Name 07**">**Name 07**</name>
</property>
<value>
<data type="**wstring**">/**SM15-HVAC-SUPP-TM-37250-ST**</data>
</value>
</condition>
</conditions>
<locator>/</locator>
</findspec>
</selectionset>
</selectionsets>
</exchange>
Here is my attempt in Python:
path = (r"C:\\Users\\ciara\\desktop\\")
book = os.path.join(path + "Search_Set.xlsm")
wb = openpyxl.load_workbook(book)
sh = wb.get_sheet_by_name('.XML Framework')
df1 = pd.read_excel(book, "<CLEAN>", header=None)
#opening 5 lines of .xml search
print(df1)
cV1 = df1.iloc[0,0] #xml header
print (cV1)
cV2 = df1.iloc[1,0] #<exchange>
print (cV2)
cV3 = df1.iloc[2,0] #<selectionsets>
print (cV3)
cV4 = df1.iloc[3,0] #<selection set name>
print (cV4)
cV5 = df1.iloc[4,0] #<findspec mode>
print (cV5)
cV6 = df1.iloc[5,0] #<findspec mode>
print (cV6)
E = lxml.builder.ElementMaker()
root = ET.Element(cV1)
doc0 = ET.SubElement(root, cV2)
doc1 = ET.SubElement(doc0, cV3)
doc2 = ET.SubElement(doc1, cV4)
doc3 = ET.SubElement(doc2, cV5)
doc4 = ET.SubElement(doc3, cV6)
the_doc = root(
    doc0(
        doc1(
            doc2(
                doc3(
                    FIELD1('condition test=', name='blah'),
                    FIELD2('some value2', name='asdfasd'),
                )
            )
        )
    )
)
print (lxml.etree.tostring(the_doc, pretty_print=True))
tree = ET.ElementTree(root)
tree.write("filename.xml")
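One possible way to get the desired structure is to build the elements directly rather than passing the header strings from the sheet as tag names. The sketch below uses only the standard library's ElementTree (not lxml.builder) and makes several assumptions: the rows are hard-coded in the hypothetical `ROWS` list instead of being read from the ".XML Framework" sheet, each row is assumed to become one `<condition>` (the question does not say which element repeats), and `build_exchange` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
SCHEMA = "http://download.autodesk.com/us/navisworks/schemas/nw-exchange-12.0.xsd"

# Hypothetical rows standing in for the ".XML Framework" sheet:
# (test, flags, category, property, data type, value)
ROWS = [
    ("equals", "74", "Data", "Name 07", "wstring", "/SM15-HVAC-SUPP-TM-37250-ST"),
]

def build_exchange(rows, set_name="Dev_1"):
    """Build the <exchange> tree with one <condition> per row (an assumption)."""
    ET.register_namespace("xsi", XSI)  # serialize the attribute with the xsi: prefix
    exchange = ET.Element("exchange", {
        "{%s}noNamespaceSchemaLocation" % XSI: SCHEMA,
        "units": "m", "filename": "", "filepath": "",
    })
    selection_sets = ET.SubElement(exchange, "selectionsets")
    selection_set = ET.SubElement(selection_sets, "selectionset", name=set_name)
    findspec = ET.SubElement(selection_set, "findspec", mode="all", disjoint="0")
    conditions = ET.SubElement(findspec, "conditions")
    for test, flags, category, prop, data_type, value in rows:
        condition = ET.SubElement(conditions, "condition", test=test, flags=flags)
        category_el = ET.SubElement(condition, "category")
        ET.SubElement(category_el, "name", internal=category).text = category
        property_el = ET.SubElement(condition, "property")
        ET.SubElement(property_el, "name", internal=prop).text = prop
        value_el = ET.SubElement(condition, "value")
        ET.SubElement(value_el, "data", type=data_type).text = value
    ET.SubElement(findspec, "locator").text = "/"
    return exchange

root = build_exchange(ROWS)
print(ET.tostring(root, encoding="unicode"))
```

To write a file with the XML declaration, something like `ET.ElementTree(root).write("filename.xml", encoding="UTF-8", xml_declaration=True)` should work.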

getting value from xml in a dict

I have an XML file that I want to parse. In the file I have 3 unique moid tags -
3
2
1
Each of these has one unique value for metricX. I want to extract these values in the form of a dict in Python.
Something like
Desired Output
{ 3 : {"metricX":100}, 2 : {"metricX":11}, 1 : {"metricX":44}}
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="MeasDataCollection.xsl"?>
<!DOCTYPE mdc SYSTEM "MeasDataCollection.dtd">
<mdc xmlns:HTML="http://www.w3.org/TR/REC-xml">
<md>
<neid>
<neun>936001_STURGEON_BAY_MEYER</neun>
</neid>
<mi>
<mi>
<mts>20170924161500Z</mts>
<gp>900</gp>
<mt>metricX</mt>
<mv>
<moid>3</moid>
<r>100</r>
</mv>
<mv>
<moid>2</moid>
<r>11</r>
</mv>
<mv>
<moid>1</moid>
<r>44</r>
</mv>
</mi>
</mi>
</md>
</mdc>
So far I have tried using Element Tree.
import os
import xml.etree.ElementTree as ET
fullpath = os.getcwd()
os.chdir(r"C:\Users\sss\Documents\Zabbix_work\xml_parsing")
tree = ET.ElementTree(file='smaple.xml')
for elem in tree.iter():
    print(elem.tag, elem.text)
Output so far is -
mdc
md
neid
neun 936001_STURGEON_BAY_MEYER
mi
mi
mts 20170924161500Z
gp 900
mt metricX
mv
moid 3
r 100
mv
moid 2
r 11
mv
moid 1
r 44
I am not sure how to organize it further into a dict.

This should do the trick:
import xml.etree.ElementTree as ET
import os
file_path = os.path.expanduser('~/Desktop/input123.xml') # filepath here
tree = ET.ElementTree(file=file_path)
my_dict = {}
# Walk md -> mi -> mi and collect every mv entry.
for node in tree.getroot().find('md').find('mi').find('mi').findall('mv'):
    my_dict[int(node.find('moid').text)] = { 'metricX': int(node.find('r').text) }
print(my_dict)
...output:
{3: {'metricX': 100}, 2: {'metricX': 11}, 1: {'metricX': 44}}
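The metric name does not have to be hard-coded, since it is present in the `<mt>` element next to the `<mv>` entries. As a sketch under that assumption, with a shortened `SAMPLE` and a hypothetical helper `metrics_by_moid`:

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's file (declaration, DTD and neid omitted).
SAMPLE = """<mdc>
<md>
<mi>
<mi>
<mt>metricX</mt>
<mv><moid>3</moid><r>100</r></mv>
<mv><moid>2</moid><r>11</r></mv>
<mv><moid>1</moid><r>44</r></mv>
</mi>
</mi>
</md>
</mdc>"""

def metrics_by_moid(root):
    """Map each moid to {metric name: value}, reading the name from <mt>."""
    result = {}
    for mi in root.iter('mi'):
        metric = mi.findtext('mt')
        if metric is None:
            continue  # the outer <mi> wrapper has no <mt> child
        for mv in mi.findall('mv'):
            result[int(mv.findtext('moid'))] = {metric: int(mv.findtext('r'))}
    return result

my_dict = metrics_by_moid(ET.fromstring(SAMPLE))
print(my_dict)
```

This assumes each `<mi>` block carries a single `<mt>`; a file with several metric names per block would need the `<mt>` and `<r>` elements paired by position.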

How to parse this XML response in Python?

This is my XML file:
<?xml version="1.0" ?>
<Items>
<Item>
<ASIN>3570102769</ASIN>
<DetailPageURL>http://www.amazon.de/Inside-IS-Tage-Islamischen-Staat/dp/3570102769%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3D3570102769</DetailPageURL>
<ItemLinks>
<ItemLink>
<Description>Add To Wishlist</Description>
<URL>http://www.amazon.de/gp/registry/wishlist/add-item.html%3Fasin.0%3D3570102769%26SubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3570102769</URL>
</ItemLink>
<ItemLink>
<Description>Tell A Friend</Description>
<URL>http://www.amazon.de/gp/pdp/taf/3570102769%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3570102769</URL>
</ItemLink>
<ItemLink>
<Description>All Customer Reviews</Description>
<URL>http://www.amazon.de/review/product/3570102769%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3570102769</URL>
</ItemLink>
<ItemLink>
<Description>All Offers</Description>
<URL>http://www.amazon.de/gp/offer-listing/3570102769%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3570102769</URL>
</ItemLink>
</ItemLinks>
<ItemAttributes>
<Author>Jürgen Todenhöfer</Author>
<Binding>Gebundene Ausgabe</Binding>
<EAN>9783570102763</EAN>
<EANList>
<EANListElement>9783570102763</EANListElement>
</EANList>
<ISBN>3570102769</ISBN>
<IsEligibleForTradeIn>1</IsEligibleForTradeIn>
<ItemDimensions>
<Height Units="hundredths-inches">874</Height>
<Length Units="hundredths-inches">575</Length>
<Width Units="hundredths-inches">126</Width>
</ItemDimensions>
<Label>C. Bertelsmann Verlag</Label>
<Languages>
<Language>
<Name>Deutsch</Name>
<Type>Published</Type>
</Language>
<Language>
<Name>Deutsch</Name>
<Type>Original</Type>
</Language>
<Language>
<Name>Deutsch</Name>
<Type>Unbekannt</Type>
</Language>
</Languages>
<ListPrice>
<Amount>1799</Amount>
<CurrencyCode>EUR</CurrencyCode>
<FormattedPrice>EUR 17,99</FormattedPrice>
</ListPrice>
<Manufacturer>C. Bertelsmann Verlag</Manufacturer>
<ManufacturerMinimumAge Units="months">192</ManufacturerMinimumAge>
<NumberOfPages>288</NumberOfPages>
<PackageDimensions>
<Height Units="hundredths-inches">118</Height>
<Length Units="hundredths-inches">567</Length>
<Weight Units="hundredths-pounds">93</Weight>
<Width Units="hundredths-inches">252</Width>
</PackageDimensions>
<PackageQuantity>1</PackageQuantity>
<ProductGroup>Book</ProductGroup>
<ProductTypeName>ABIS_BOOK</ProductTypeName>
<PublicationDate>2015-04-27</PublicationDate>
<Publisher>C. Bertelsmann Verlag</Publisher>
<Studio>C. Bertelsmann Verlag</Studio>
<Title>Inside IS - 10 Tage im 'Islamischen Staat'</Title>
<TradeInValue>
<Amount>930</Amount>
<CurrencyCode>EUR</CurrencyCode>
<FormattedPrice>EUR 9,30</FormattedPrice>
</TradeInValue>
</ItemAttributes>
<OfferSummary>
<LowestNewPrice>
<Amount>1799</Amount>
<CurrencyCode>EUR</CurrencyCode>
<FormattedPrice>EUR 17,99</FormattedPrice>
</LowestNewPrice>
<LowestUsedPrice>
<Amount>1390</Amount>
<CurrencyCode>EUR</CurrencyCode>
<FormattedPrice>EUR 13,90</FormattedPrice>
</LowestUsedPrice>
<LowestCollectiblePrice>
<Amount>4999</Amount>
<CurrencyCode>EUR</CurrencyCode>
<FormattedPrice>EUR 49,99</FormattedPrice>
</LowestCollectiblePrice>
<TotalNew>56</TotalNew>
<TotalUsed>8</TotalUsed>
<TotalCollectible>1</TotalCollectible>
<TotalRefurbished>0</TotalRefurbished>
</OfferSummary>
<Offers>
<TotalOffers>1</TotalOffers>
<TotalOfferPages>1</TotalOfferPages>
<MoreOffersUrl>http://www.amazon.de/gp/offer-listing/3570102769%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3570102769</MoreOffersUrl>
<Offer>
<OfferAttributes>
<Condition>New</Condition>
</OfferAttributes>
<OfferListing>
<OfferListingId>9KHCZj9qtL6ucVBPASfXaryQjU8tWbc0n%2F3F4F7GraOKW6Csji2OxpD93%2FkoHwgIGQctlnrtx4RWIeJULAcvvsFhiopFi08JdsZ%2FeO3u6g0%3D</OfferListingId>
<Price>
<Amount>1799</Amount>
<CurrencyCode>EUR</CurrencyCode>
<FormattedPrice>EUR 17,99</FormattedPrice>
</Price>
<Availability>Gewöhnlich versandfertig in 24 Stunden</Availability>
<AvailabilityAttributes>
<AvailabilityType>now</AvailabilityType>
<MinimumHours>0</MinimumHours>
<MaximumHours>0</MaximumHours>
</AvailabilityAttributes>
<IsEligibleForSuperSaverShipping>1</IsEligibleForSuperSaverShipping>
</OfferListing>
</Offer>
</Offers>
</Item>
<Item>
<ASIN>3813506479</ASIN>
<DetailPageURL>http://www.amazon.de/Altes-Land-Roman-D%C3%B6rte-Hansen/dp/3813506479%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3D3813506479</DetailPageURL>
<ItemLinks>
<ItemLink>
<Description>Add To Wishlist</Description>
<URL>http://www.amazon.de/gp/registry/wishlist/add-item.html%3Fasin.0%3D3813506479%26SubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3813506479</URL>
</ItemLink>
<ItemLink>
<Description>Tell A Friend</Description>
<URL>http://www.amazon.de/gp/pdp/taf/3813506479%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3813506479</URL>
</ItemLink>
<ItemLink>
<Description>All Customer Reviews</Description>
<URL>http://www.amazon.de/review/product/3813506479%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3813506479</URL>
</ItemLink>
<ItemLink>
<Description>All Offers</Description>
<URL>http://www.amazon.de/gp/offer-listing/3813506479%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3813506479</URL>
</ItemLink>
</ItemLinks>
<ItemAttributes>
<Author>Dörte Hansen</Author>
<Binding>Gebundene Ausgabe</Binding>
<EAN>9783813506471</EAN>
<EANList>
<EANListElement>9783813506471</EANListElement>
</EANList>
<ISBN>3813506479</ISBN>
<IsEligibleForTradeIn>1</IsEligibleForTradeIn>
<ItemDimensions>
<Height Units="hundredths-inches">870</Height>
<Length Units="hundredths-inches">567</Length>
<Width Units="hundredths-inches">114</Width>
</ItemDimensions>
<Label>Albrecht Knaus Verlag</Label>
<Languages>
<Language>
<Name>Deutsch</Name>
<Type>Published</Type>
</Language>
<Language>
<Name>Deutsch</Name>
<Type>Original</Type>
</Language>
</Languages>
<ListPrice>
<Amount>1999</Amount>
<CurrencyCode>EUR</CurrencyCode>
<FormattedPrice>EUR 19,99</FormattedPrice>
</ListPrice>
<Manufacturer>Albrecht Knaus Verlag</Manufacturer>
<NumberOfPages>288</NumberOfPages>
<PackageDimensions>
<Height Units="hundredths-inches">118</Height>
<Length Units="hundredths-inches">858</Length>
<Weight Units="hundredths-pounds">101</Weight>
<Width Units="hundredths-inches">559</Width>
</PackageDimensions>
<ProductGroup>Book</ProductGroup>
<ProductTypeName>ABIS_BOOK</ProductTypeName>
<PublicationDate>2015-02-16</PublicationDate>
<Publisher>Albrecht Knaus Verlag</Publisher>
<Studio>Albrecht Knaus Verlag</Studio>
<Title>Altes Land: Roman</Title>
<TradeInValue>
<Amount>965</Amount>
<CurrencyCode>EUR</CurrencyCode>
<FormattedPrice>EUR 9,65</FormattedPrice>
</TradeInValue>
</ItemAttributes>
<OfferSummary>
<LowestNewPrice>
<Amount>1999</Amount>
<CurrencyCode>EUR</CurrencyCode>
<FormattedPrice>EUR 19,99</FormattedPrice>
</LowestNewPrice>
<LowestUsedPrice>
<Amount>1599</Amount>
<CurrencyCode>EUR</CurrencyCode>
<FormattedPrice>EUR 15,99</FormattedPrice>
</LowestUsedPrice>
<TotalNew>72</TotalNew>
<TotalUsed>8</TotalUsed>
<TotalCollectible>0</TotalCollectible>
<TotalRefurbished>0</TotalRefurbished>
</OfferSummary>
<Offers>
<TotalOffers>1</TotalOffers>
<TotalOfferPages>1</TotalOfferPages>
<MoreOffersUrl>http://www.amazon.de/gp/offer-listing/3813506479%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3813506479</MoreOffersUrl>
<Offer>
<OfferAttributes>
<Condition>New</Condition>
</OfferAttributes>
<OfferListing>
<OfferListingId>aeRv5KPt26T8S0hLrgV8Bv9UPYABYOMijGRxffbNJXUZSN4XfeeOZZpCZ28EURzmgMLlcYEBSRlMXS%2F8Z0pN1JbYerndME%2B2VK3RosfdQJA%3D</OfferListingId>
<Price>
<Amount>1999</Amount>
<CurrencyCode>EUR</CurrencyCode>
<FormattedPrice>EUR 19,99</FormattedPrice>
</Price>
<Availability>Gewöhnlich versandfertig in 24 Stunden</Availability>
<AvailabilityAttributes>
<AvailabilityType>now</AvailabilityType>
<MinimumHours>0</MinimumHours>
<MaximumHours>0</MaximumHours>
</AvailabilityAttributes>
<IsEligibleForSuperSaverShipping>1</IsEligibleForSuperSaverShipping>
</OfferListing>
</Offer>
</Offers>
</Item>
</Items>
I want to get the ASIN element of each Item. So I tried this:
from lxml import etree
doc = etree.fromstring(xmlstring)
items = doc.xpath('//Items/Item')
for a in items:
    asin = a.xpath('//ASIN/text()')
    print asin
What I get is this:
['3570102769', '3813506479']
['3570102769', '3813506479']
But I want this:
['3570102769']
['3813506479']
I don't understand what the problem is here. I thought I was iterating over every Item element, and each one contains a single ASIN. Why does it return both ASINs twice?

When you're searching for a.xpath('//ASIN/text()') you're searching the complete document tree again. Quoting from the XML Path language specification:
//para selects all the para descendants of the document root and thus selects all para elements in the same document as the context node
So what you're doing is iterating over the matched Item nodes and saying "Give me all ASIN nodes in this document please". The context for this (the Item node) is ignored.
What you should do instead is select the ASIN child node directly. Keeping to your original implementation, this could look like this:
doc = etree.fromstring(xmlstring)
items = doc.xpath('//Items/Item')
for a in items:
    asin = a.xpath('ASIN/text()')
    print asin
which gives the output you desire:
['3570102769']
['3813506479']
Alternatively, if you're not certain where in the Item node your ASIN appears, you could use .//ASIN/text(), which searches all descendants of the current node rather than the whole document.
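The same relative-path idea can be sketched with the standard library's ElementTree in place of lxml (a swap made here only to keep the example dependency-free). `SAMPLE` is a two-item reduction of the question's file and `asins_per_item` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Two-item reduction of the question's XML.
SAMPLE = """<Items>
  <Item><ASIN>3570102769</ASIN></Item>
  <Item><ASIN>3813506479</ASIN></Item>
</Items>"""

def asins_per_item(xml_text):
    """Return one list of ASIN strings per Item, using paths relative to each Item."""
    root = ET.fromstring(xml_text)  # root is the <Items> element
    per_item = []
    for item in root.findall('Item'):
        # 'ASIN' matches direct children of this Item only;
        # './/ASIN' would match at any depth below this Item.
        per_item.append([asin.text for asin in item.findall('ASIN')])
    return per_item

result = asins_per_item(SAMPLE)
for asins in result:
    print(asins)
```

Each inner list holds only the ASINs found under its own Item, because neither path starts from the document root.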
