I have a text file with entries like this:
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<Applications_GetResponse xmlns="http://www.country.com">
<Applications>
<CS_Application>
<Name>Spain</Name>
<Key>2345364564</Key>
<Status>NORMAL</Status>
<Modules>
<CS_Module>
<Name>zaragoza</Name>
<Key>8743249725</Key>
<DevelopmentEffort>0</DevelopmentEffort>
<LogicalDBConnections/>
</CS_Module>
<CS_Module>
<Name>malaga</Name>
<Key>8743249725</Key>
<DevelopmentEffort>0</DevelopmentEffort>
<LogicalDBConnections/>
</CS_Module>
</Modules>
<CreatedBy>7</CreatedBy>
</CS_Application>
<CS_Application>
<Name>UK</Name>
<Key>2345364564</Key>
<Status>NORMAL</Status>
<Modules>
<CS_Module>
<Name>london</Name>
<Key>8743249725</Key>
<DevelopmentEffort>0</DevelopmentEffort>
<LogicalDBConnections/>
</CS_Module>
<CS_Module>
<Name>liverpool</Name>
<Key>8743249725</Key>
<DevelopmentEffort>0</DevelopmentEffort>
<LogicalDBConnections/>
</CS_Module>
</Modules>
<CreatedBy>7</CreatedBy>
</CS_Application>
</Applications>
</Applications_GetResponse>
</soap:Body>
</soap:Envelope>
I would like to parse it and get each country's name followed by its cities.
I tried a few things with Python's re.findall, but I couldn't get that result:
print("HERE APPLICATIONS")
applications = re.findall('<CS_Application><Name>(.*?)</Name>', response_apply.text)
print(applications)
print("HERE MODULES")
modules = re.findall('<CS_Module><Name>(.*?)</Name>', response_apply.text)
print(modules)
Output:
host-10$ sudo python3 capture.py
HERE APPLICATIONS
['Spain', 'UK']
HERE MODULES
['zaragoza', 'malaga', 'london', 'liverpool']
I would like the result to look like this:
HERE
The Country: Spain - Cities: zaragoza,malaga
The Country: UK - Cities: london,liverpool
Regex is not a good tool for parsing XML; it is better to use an XML parser.
If you still want a regex solution, I hope the code below helps.
import re
s = """\n<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">\n <soap:Body>\n <Applications_GetResponse xmlns="http://www.country.com">\n <Applications>\n <CS_Application>\n <Name>Spain</Name>\n <Key>2345364564</Key>\n <Status>NORMAL</Status>\n <Modules>\n <CS_Module>\n <Name>zaragoza</Name>\n <Key>8743249725</Key>\n <DevelopmentEffort>0</DevelopmentEffort>\n <LogicalDBConnections/>\n </CS_Module>\n <CS_Module>\n <Name>malaga</Name>\n <Key>8743249725</Key>\n <DevelopmentEffort>0</DevelopmentEffort>\n <LogicalDBConnections/>\n </CS_Module>\n </Modules>\n <CreatedBy>7</CreatedBy>\n </CS_Application>\n <CS_Application>\n <Name>UK</Name>\n <Key>2345364564</Key>\n <Status>NORMAL</Status>\n <Modules>\n <CS_Module>\n <Name>london</Name>\n <Key>8743249725</Key>\n <DevelopmentEffort>0</DevelopmentEffort>\n <LogicalDBConnections/>\n </CS_Module>\n <CS_Module>\n <Name>liverpool</Name>\n <Key>8743249725</Key>\n <DevelopmentEffort>0</DevelopmentEffort>\n <LogicalDBConnections/>\n </CS_Module>\n </Modules>\n <CreatedBy>7</CreatedBy>\n </CS_Application>\n </Applications>\n </Applications_GetResponse>\n </soap:Body>\n</soap:Envelope>\n"""
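# Capture the body of each <CS_Application> block ([\s\S] also matches newlines)
pattern1 = re.compile(r'<CS_Application>([\s\S]*?)</CS_Application>')
# Non-greedy, so several <Name> elements on one line are matched separately
pattern2 = re.compile(r'<Name>(.*?)</Name>')
for m in pattern1.finditer(s):
    ss = m.group(1)
    # The first <Name> is the country; the remaining ones are its cities
    res = [mm.group(1) for mm in pattern2.finditer(ss)]
    print("The Country: " + res[0] + " - Cities: " + ",".join(res[1:]))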
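For comparison, here is a minimal sketch of the parser-based route recommended above, using the standard library's xml.etree.ElementTree on Python 3. It assumes the response looks like the sample in the question (trimmed here). The prefix "c" and the helper name countries_with_cities are made up for illustration; the namespace URI is the one declared on Applications_GetResponse.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample response from the question.
SAMPLE = """<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
<soap:Body>
<Applications_GetResponse xmlns="http://www.country.com">
<Applications>
<CS_Application>
<Name>Spain</Name>
<Modules>
<CS_Module><Name>zaragoza</Name></CS_Module>
<CS_Module><Name>malaga</Name></CS_Module>
</Modules>
</CS_Application>
<CS_Application>
<Name>UK</Name>
<Modules>
<CS_Module><Name>london</Name></CS_Module>
<CS_Module><Name>liverpool</Name></CS_Module>
</Modules>
</CS_Application>
</Applications>
</Applications_GetResponse>
</soap:Body>
</soap:Envelope>"""

# The payload elements live in the default namespace declared on
# Applications_GetResponse; "c" is an arbitrary prefix chosen for this sketch.
NS = {"c": "http://www.country.com"}


def countries_with_cities(xml_text):
    """Return a list of (country, [cities]) pairs (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    result = []
    for app in root.iterfind(".//c:CS_Application", NS):
        country = app.findtext("c:Name", namespaces=NS)
        cities = [module.findtext("c:Name", namespaces=NS)
                  for module in app.iterfind("c:Modules/c:CS_Module", NS)]
        result.append((country, cities))
    return result


for country, cities in countries_with_cities(SAMPLE):
    print("The Country: %s - Cities: %s" % (country, ",".join(cities)))
```

Because each city is looked up relative to its own CS_Application element, the grouping comes for free and does not depend on how the text is laid out across lines.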
I am trying to build an XML file from an Excel spreadsheet using Python, but I am having trouble building the structure. The XML schema is specific to one piece of software, so it would be easier to write the first few and last few tags to the file as variables, as shown below. They are constant, so they are pulled from the "
I believe the script needs to loop through another sheet, the ".XML Framework" sheet, to build the XML structure, because these are the values that will ultimately change. The structure of this sheet is given below.
Here is the XML structure. The Python script outputs it correctly up to the unique values; the changing values are shown in bold. This shows just one row of data from the workbook. When the workbook has a second row, the XML structure repeats again where it starts with .
The data structure in the excel sheet ".XML Framework" is:
col 1 = **equals**
col 2 = **74**
col 3 = **Data**
col 4 = col 3
col 5 = **Name 07**
col 6 = col 5
col 7 = **wstring**
col 8 = /**SM15-HVAC-SUPP-TM-37250-ST**
THIS IS THE DESIRED XML STRUCTURE
<?xml version="1.0" encoding="UTF-8" ?>
<exchange xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://download.autodesk.com/us/navisworks/schemas/nw-exchange-12.0.xsd" units="m" filename="" filepath="">
<selectionsets>
<selectionset name="Dev_1">
<findspec mode="all" disjoint="0">
<conditions>
<condition test="**equals**" flags="**74**">
<category>
<name internal="**Data**">**Data**</name>
</category>
<property>
<name internal="**Name 07**">**Name 07**</name>
</property>
<value>
<data type="**wstring**">/**SM15-HVAC-SUPP-TM-37250-ST**</data>
</value>
</condition>
</conditions>
<locator>/</locator>
</findspec>
</selectionset>
</selectionsets>
</exchange>
Here is my attempt from the python:
path = (r"C:\\Users\\ciara\\desktop\\")
book = os.path.join(path + "Search_Set.xlsm")
wb = openpyxl.load_workbook(book)
sh = wb.get_sheet_by_name('.XML Framework')
df1 = pd.read_excel(book, "<CLEAN>", header=None)
#opening 5 lines of .xml search
print(df1)
cV1 = df1.iloc[0,0] #xml header
print (cV1)
cV2 = df1.iloc[1,0] #<exchange>
print (cV2)
cV3 = df1.iloc[2,0] #<selectionsets>
print (cV3)
cV4 = df1.iloc[3,0] #<selection set name>
print (cV4)
cV5 = df1.iloc[4,0] #<findspec mode>
print (cV5)
cV6 = df1.iloc[5,0] #<findspec mode>
print (cV6)
E = lxml.builder.ElementMaker()
root = ET.Element(cV1)
doc0 = ET.SubElement(root, cV2)
doc1 = ET.SubElement(doc0, cV3)
doc2 = ET.SubElement(doc1, cV4)
doc3 = ET.SubElement(doc2, cV5)
doc4 = ET.SubElement(doc3, cV6)
the_doc = root(
    doc0(
        doc1(
            doc2(
                doc3(
                    FIELD1('condition test=', name='blah'),
                    FIELD2('some value2', name='asdfasd'),
                )
            )
        )
    )
)
print (lxml.etree.tostring(the_doc, pretty_print=True))
tree = ET.ElementTree(root)
tree.write("filename.xml")
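One possible way to get the desired structure is to build it element by element with xml.etree.ElementTree rather than reading the opening tags in as strings. The following is only a sketch under assumptions: the rows list is a hypothetical stand-in for the values read from the ".XML Framework" sheet, build_exchange is a made-up helper name, and each row is assumed to become one condition element inside a single findspec. If the structure should repeat from a different element, the loop boundary would move accordingly.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
SCHEMA = ("http://download.autodesk.com/us/navisworks/schemas/"
          "nw-exchange-12.0.xsd")
# Serialise the namespace with the conventional "xsi" prefix.
ET.register_namespace("xsi", XSI)

# Hypothetical stand-in for the sheet rows:
# (test, flags, category, property, data type, value)
rows = [
    ("equals", "74", "Data", "Name 07", "wstring",
     "/SM15-HVAC-SUPP-TM-37250-ST"),
]


def build_exchange(rows, set_name="Dev_1"):
    """Build the <exchange> tree with one <condition> per row (assumption)."""
    exchange = ET.Element("exchange", {
        "{%s}noNamespaceSchemaLocation" % XSI: SCHEMA,
        "units": "m", "filename": "", "filepath": "",
    })
    sets = ET.SubElement(exchange, "selectionsets")
    selection = ET.SubElement(sets, "selectionset", name=set_name)
    findspec = ET.SubElement(selection, "findspec", mode="all", disjoint="0")
    conditions = ET.SubElement(findspec, "conditions")
    for test, flags, category, prop, dtype, value in rows:
        cond = ET.SubElement(conditions, "condition", test=test, flags=flags)
        name = ET.SubElement(ET.SubElement(cond, "category"), "name",
                             internal=category)
        name.text = category
        name = ET.SubElement(ET.SubElement(cond, "property"), "name",
                             internal=prop)
        name.text = prop
        data = ET.SubElement(ET.SubElement(cond, "value"), "data", type=dtype)
        data.text = value
    ET.SubElement(findspec, "locator").text = "/"
    return exchange


tree = ET.ElementTree(build_exchange(rows))
# To save it: tree.write("filename.xml", encoding="UTF-8", xml_declaration=True)
print(ET.tostring(tree.getroot(), encoding="unicode"))
```

Building the tree this way means the closing tags are produced by the serialiser, so the constant opening and ending lines never have to be stored in the workbook.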
I am trying to iterate over all nodes and child nodes in a tree using ElementTree. I would like to turn each parent's tags and its child tags into columns, with their values as rows, so that the child nodes are appended to their parent in CSV format. I am using Python 2.7. The header should be printed only once, with the respective values below it.
XML File :
<Customers>
<Customer CustomerID="GREAL">
<CompanyName>Great Lakes Food Market</CompanyName>
<ContactName>Howard Snyder</ContactName>
<ContactTitle>Marketing Manager</ContactTitle>
<Phone>(503) 555-7555</Phone>
<FullAddress>
<Address>2732 Baker Blvd.</Address>
<City>Eugene</City>
<Region>OR</Region>
<PostalCode>97403</PostalCode>
<Country>USA</Country>
</FullAddress>
</Customer>
<Customer CustomerID="HUNGC">
<CompanyName>Hungry Coyote Import Store</CompanyName>
<ContactName>Yoshi Latimer</ContactName>
<ContactTitle>Sales Representative</ContactTitle>
<Phone>(503) 555-6874</Phone>
<Fax>(503) 555-2376</Fax>
<FullAddress>
<Address>City Center Plaza 516 Main St.</Address>
<City>Elgin</City>
<Region>OR</Region>
<PostalCode>97827</PostalCode>
<Country>USA</Country>
</FullAddress>
</Customer>
<Customer CustomerID="LAZYK">
<CompanyName>Lazy K Kountry Store</CompanyName>
<ContactName>John Steel</ContactName>
<ContactTitle>Marketing Manager</ContactTitle>
<Phone>(509) 555-7969</Phone>
<Fax>(509) 555-6221</Fax>
<FullAddress>
<Address>12 Orchestra Terrace</Address>
<City>Walla Walla</City>
<Region>WA</Region>
<PostalCode>99362</PostalCode>
<Country>USA</Country>
</FullAddress>
</Customer>
<Customer CustomerID="LETSS">
<CompanyName>Let's Stop N Shop</CompanyName>
<ContactName>Jaime Yorres</ContactName>
<ContactTitle>Owner</ContactTitle>
<Phone>(415) 555-5938</Phone>
<FullAddress>
<Address>87 Polk St. Suite 5</Address>
<City>San Francisco</City>
<Region>CA</Region>
<PostalCode>94117</PostalCode>
<Country>USA</Country>
</FullAddress>
</Customer>
</Customers>
My Code:
#Import Libraries
import csv
import xmlschema
import xml.etree.ElementTree as ET
#Define the variable to store the XML Document
xml_file = 'C:/Users/391648/Desktop/BOSS_20190618_20190516_18062019141928_CUMA/source_Files_XML/CustomersOrders.xml'
#using XML Schema Library validate the XML against XSD
my_schema = xmlschema.XMLSchema('C:/Users/391648/Desktop/BOSS_20190618_20190516_18062019141928_CUMA/source_Files_XML/CustomersOrders.xsd')
SchemaCheck = my_schema.is_valid(xml_file)
print(SchemaCheck) #Prints as True if the document is validated with XSD
#Parse XML & get root
tree = ET.parse(xml_file)
root = tree.getroot()
#Create & Open CSV file
xml_data_to_csv = open('C:/Users/391648/Desktop/BOSS_20190618_20190516_18062019141928_CUMA/source_Files_XML/PythonXMl.csv','w')
#create variable to write to csv
csvWriter = csv.writer(xml_data_to_csv)
#Create list contains header
count =0
#Loop for each node
for element in root.findall('Customers/Customer'):
    List_nodes = []
    #Get head by Tag
    if count ==0:
        list_header =[]
        Full_Address = []
        CompanyName = element.find('CompanyName').tag
        list_header.append(CompanyName)
        ContactName = element.find('ContactName').tag
        list_header.append(ContactName)
        ContactTitle = element.find('ContactTitle').tag
        list_header.append(ContactTitle)
        Phone = element.find('Phone').tag
        list_header.append(Phone)
        print(list_header)
        csvWriter.writerow(list_header)
        count = count + 1
    #Get the data of the Node
    CompanyName = element.find('CompanyName').text
    List_nodes.append(CompanyName)
    ContactName = element.find('ContactName').text
    List_nodes.append(ContactName)
    ContactTitle = element.find('ContactTitle').text
    List_nodes.append(ContactTitle)
    Phone = element.find('Phone').text
    List_nodes.append(Phone)
    print(List_nodes)
    #Write List_Nodes to CSV
    csvWriter.writerow(List_nodes)
xml_data_to_csv.close()
Expected CSV output:
CompanyName,ContactName,ContactTitle,Phone, Address, City, Region, PostalCode, Country
Great Lakes Food Market,Howard Snyder,Marketing Manager,(503) 555-7555, 2732 Baker Blvd., Eugene, OR, 97403, USA
Hungry Coyote Import Store,Yoshi Latimer,Sales Representative,(503) 555-6874, City Center Plaza 516 Main St., Elgin, OR, 97827, USA
You might be better off using lxml, which has most of the functionality you need for finding elements built in.
from lxml import etree
import csv
with open('file.xml') as fp:
    xml = etree.fromstring(fp.read())
# Map each CSV column name to the element path relative to <Customer>
field_dict = {
    'CompanyName': 'CompanyName',
    'ContactName': 'ContactName',
    'ContactTitle': 'ContactTitle',
    'Phone': 'Phone',
    'Address': 'FullAddress/Address',
    'City': 'FullAddress/City',
    'Region': 'FullAddress/Region',
    'PostalCode': 'FullAddress/PostalCode',
    'Country': 'FullAddress/Country'
}
customers = []
for customer in xml:
    line = {k: customer.find(v).text for k, v in field_dict.items()}
    customers.append(line)
with open('customers.csv', 'w') as fp:
    writer = csv.DictWriter(fp, fieldnames=list(field_dict))
    writer.writeheader()  # DictWriter does not write the header row by itself
    writer.writerows(customers)
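One thing to watch with the mapping approach: if an optional field such as Fax is added to the mapping, customer.find(v).text raises AttributeError for customers that lack the element. Below is a minimal sketch of one way around that, using findtext with a default. It uses the standard library's ElementTree on Python 3 rather than lxml, a trimmed two-customer sample, and an in-memory buffer instead of a file; FIELDS and customers_to_rows are hypothetical names.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Trimmed sample: only the second customer has a <Fax> element.
SAMPLE = """<Customers>
<Customer CustomerID="GREAL">
<CompanyName>Great Lakes Food Market</CompanyName>
<Phone>(503) 555-7555</Phone>
<FullAddress><City>Eugene</City></FullAddress>
</Customer>
<Customer CustomerID="HUNGC">
<CompanyName>Hungry Coyote Import Store</CompanyName>
<Phone>(503) 555-6874</Phone>
<Fax>(503) 555-2376</Fax>
<FullAddress><City>Elgin</City></FullAddress>
</Customer>
</Customers>"""

# (column name, path relative to <Customer>); a list keeps the column order.
FIELDS = [
    ("CompanyName", "CompanyName"),
    ("Phone", "Phone"),
    ("Fax", "Fax"),
    ("City", "FullAddress/City"),
]


def customers_to_rows(xml_text):
    """Return one dict per customer; missing elements become ''."""
    root = ET.fromstring(xml_text)
    return [
        {name: customer.findtext(path, default="") for name, path in FIELDS}
        for customer in root.iterfind("Customer")
    ]


rows = customers_to_rows(SAMPLE)
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=[name for name, _ in FIELDS],
                        lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

The first customer gets an empty Fax cell, so every row has the same number of fields as the header.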
You can use xmltodict to convert the XML into nested dictionaries instead of walking the tree yourself:
import xmltodict
import pandas as pd
with open('data.xml', 'r') as f:
    data = xmltodict.parse(f.read())['Customers']['Customer']
data_pd = {'CompanyName': [i['CompanyName'] for i in data],
'ContactName': [i['ContactName'] for i in data],
'ContactTitle': [i['ContactTitle'] for i in data],
'Phone': [i['Phone'] for i in data],
'Address': [i['FullAddress']['Address'] for i in data],
'City': [i['FullAddress']['City'] for i in data],
'Region': [i['FullAddress']['Region'] for i in data],
'PostalCode': [i['FullAddress']['PostalCode'] for i in data],
'Country': [i['FullAddress']['Country'] for i in data]}
df = pd.DataFrame(data_pd)
df.to_csv('result.csv', index=False)
Output CSV file:
CompanyName,ContactName,ContactTitle,Phone,Address,City,Region,PostalCode,Country
Great Lakes Food Market,Howard Snyder,Marketing Manager,(503) 555-7555,2732 Baker Blvd.,Eugene,OR,97403,USA
Hungry Coyote Import Store,Yoshi Latimer,Sales Representative,(503) 555-6874,City Center Plaza 516 Main St.,Elgin,OR,97827,USA
Lazy K Kountry Store,John Steel,Marketing Manager,(509) 555-7969,12 Orchestra Terrace,Walla Walla,WA,99362,USA
Let's Stop N Shop,Jaime Yorres,Owner,(415) 555-5938,87 Polk St. Suite 5,San Francisco,CA,94117,USA
A couple of things I have changed:
Removed the schema validation, since I do not have the XSD. You can add it back.
Made the child-node traversal dynamic instead of referring to each child node statically.
Changed the main loop from root.findall('Customers/Customer') to root.findall('Customer'). The root element is already Customers, so the path has to be relative to it; the original path matched nothing.
Otherwise I tried to keep your program structure and library usage intact. Here is the modified program:
import xml.etree.ElementTree as et
import csv
tree = et.parse("../data/customers.xml")
root = tree.getroot()
headers = []
count = 0
xml_data_to_csv = open('../data/customers.csv', 'w')
csvWriter = csv.writer(xml_data_to_csv)
for customer in root.findall('Customer'):
    data = []
    for detail in customer:
        if(detail.tag == 'FullAddress'):
            # Flatten the nested address into the same row
            for addresspart in detail:
                # Strip trailing line breaks from the text
                data.append(addresspart.text.rstrip('\n\r'))
                if(count == 0):
                    headers.append(addresspart.tag)
        else:
            data.append(detail.text.rstrip('\n\r'))
            if(count == 0):
                headers.append(detail.tag)
    # The header is collected from the first customer only
    if(count == 0):
        csvWriter.writerow(headers)
    csvWriter.writerow(data)
    count = count + 1
xml_data_to_csv.close()
With the given input XML content it produces the following. Rows for customers that have a Fax element carry one extra field, because the header is taken from the first customer, which has none:
CompanyName,ContactName,ContactTitle,Phone,Address,City,Region,PostalCode,Country
Great Lakes Food Market,Howard Snyder,Marketing Manager,(503) 555-7555,2732 Baker Blvd.,Eugene,OR,97403,USA
Hungry Coyote Import Store,Yoshi Latimer,Sales Representative,(503) 555-6874,(503) 555-2376,City Center Plaza 516 Main St.,Elgin,OR,97827,USA
Lazy K Kountry Store,John Steel,Marketing Manager,(509) 555-7969,(509) 555-6221,12 Orchestra Terrace,Walla Walla,WA,99362,USA
Let's Stop N Shop,Jaime Yorres,Owner,(415) 555-5938,87 Polk St. Suite 5,San Francisco,CA,94117,USA
Note: Instead of writing to the CSV inside the loop, you may append the rows to a list and write them in one go. Which is better depends on your content size and performance needs.
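As a rough illustration of the collect-then-write idea in the note, here is a sketch that first gathers every customer's (tag, text) pairs, then builds the header from all tags seen, and only then writes the CSV. It is written for Python 3, uses a trimmed sample and an in-memory buffer instead of a file, and the variable names are my own. A side effect of collecting first is that optional elements such as Fax get their own column (appended in first-seen order) instead of shifting the row.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Trimmed sample: only the second customer has a <Fax> element.
SAMPLE = """<Customers>
<Customer CustomerID="GREAL">
<CompanyName>Great Lakes Food Market</CompanyName>
<Phone>(503) 555-7555</Phone>
<FullAddress><City>Eugene</City></FullAddress>
</Customer>
<Customer CustomerID="HUNGC">
<CompanyName>Hungry Coyote Import Store</CompanyName>
<Phone>(503) 555-6874</Phone>
<Fax>(503) 555-2376</Fax>
<FullAddress><City>Elgin</City></FullAddress>
</Customer>
</Customers>"""

root = ET.fromstring(SAMPLE)

# Pass 1: collect the (tag, text) pairs of every customer in memory.
records = []
for customer in root.findall("Customer"):
    pairs = []
    for detail in customer:
        # FullAddress is flattened; every other child is used directly.
        nodes = detail if detail.tag == "FullAddress" else [detail]
        for node in nodes:
            pairs.append((node.tag, (node.text or "").strip()))
    records.append(pairs)

# The header is every tag seen in any record, in first-seen order.
headers = []
for pairs in records:
    for tag, _ in pairs:
        if tag not in headers:
            headers.append(tag)

# Pass 2: write everything in one go; restval fills the missing cells.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=headers, restval="",
                        lineterminator="\n")
writer.writeheader()
writer.writerows(dict(pairs) for pairs in records)
print(out.getvalue())
```

The cost is that all rows are held in memory before anything is written, which is the trade-off the note mentions.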
Update: When you have customers and their orders in the XML
The XML processing and CSV writing code structure remains the same. In addition, the Orders element is handled while each customer is processed. The Order elements under Orders can be processed exactly like Customer. As you mentioned, each Order has a ShipInfo as well.
The input XML is assumed to be (based on the comment below):
<Customers>
<Customer CustomerID="GREAL">
<CompanyName>Great Lakes Food Market</CompanyName>
<ContactName>Howard Snyder</ContactName>
<ContactTitle>Marketing Manager</ContactTitle>
<Phone>(503) 555-7555</Phone>
<FullAddress>
<Address>2732 Baker Blvd.</Address>
<City>Eugene</City>
<Region>OR</Region>
<PostalCode>97403</PostalCode>
<Country>USA</Country>
</FullAddress>
<Orders>
<Order>
<Param1>Value1</Param1>
<Param2>Value2</Param2>
<ShipInfo>
<ShipInfoParam1>Value3</ShipInfoParam1>
<ShipInfoParam2>Value4</ShipInfoParam2>
</ShipInfo>
</Order>
<Order>
<Param1>Value5</Param1>
<Param2>Value6</Param2>
<ShipInfo>
<ShipInfoParam1>Value7</ShipInfoParam1>
<ShipInfoParam2>Value8</ShipInfoParam2>
</ShipInfo>
</Order>
</Orders>
</Customer>
<Customer CustomerID="HUNGC">
<CompanyName>Hungry Coyote Import Store</CompanyName>
<ContactName>Yoshi Latimer</ContactName>
<ContactTitle>Sales Representative</ContactTitle>
<Phone>(503) 555-6874</Phone>
<Fax>(503) 555-2376</Fax>
<FullAddress>
<Address>City Center Plaza 516 Main St.</Address>
<City>Elgin</City>
<Region>OR</Region>
<PostalCode>97827</PostalCode>
<Country>USA</Country>
</FullAddress>
<Orders>
<Order>
<Param1>Value7</Param1>
<Param2>Value8</Param2>
<ShipInfo>
<ShipInfoParam1>Value9</ShipInfoParam1>
<ShipInfoParam2>Value10</ShipInfoParam2>
</ShipInfo>
</Order>
</Orders>
</Customer>
<Customer CustomerID="LAZYK">
<CompanyName>Lazy K Kountry Store</CompanyName>
<ContactName>John Steel</ContactName>
<ContactTitle>Marketing Manager</ContactTitle>
<Phone>(509) 555-7969</Phone>
<Fax>(509) 555-6221</Fax>
<FullAddress>
<Address>12 Orchestra Terrace</Address>
<City>Walla Walla</City>
<Region>WA</Region>
<PostalCode>99362</PostalCode>
<Country>USA</Country>
</FullAddress>
</Customer>
<Customer CustomerID="LETSS">
<CompanyName>Let's Stop N Shop</CompanyName>
<ContactName>Jaime Yorres</ContactName>
<ContactTitle>Owner</ContactTitle>
<Phone>(415) 555-5938</Phone>
<FullAddress>
<Address>87 Polk St. Suite 5</Address>
<City>San Francisco</City>
<Region>CA</Region>
<PostalCode>94117</PostalCode>
<Country>USA</Country>
</FullAddress>
</Customer>
</Customers>
Here is the modified code that processes both customers and orders:
import xml.etree.ElementTree as et
import csv
tree = et.parse("../data/customers-with-orders.xml")
root = tree.getroot()
customer_csv = open('../data/customers-part.csv', 'w')
order_csv = open('../data/orders-part.csv', 'w')
customerCsvWriter = csv.writer(customer_csv)
orderCsvWriter = csv.writer(order_csv)
customerHeaders = []
orderHeaders = ['CustomerID']
isFirstCustomer = True
isFirstOrder = True
def processOrders(customerId, orders):
    # Write one row per Order, prefixed with the owning customer's ID
    global isFirstOrder
    for order in orders.findall('Order'):
        orderData = [customerId]
        for orderdetail in order:
            if(orderdetail.tag == 'ShipInfo'):
                for shipinfopart in orderdetail:
                    orderData.append(shipinfopart.text.rstrip('\n\r'))
                    if(isFirstOrder):
                        orderHeaders.append(shipinfopart.tag)
            else:
                orderData.append(orderdetail.text.rstrip('\n\r'))
                if(isFirstOrder):
                    orderHeaders.append(orderdetail.tag)
        if(isFirstOrder):
            orderCsvWriter.writerow(orderHeaders)
        orderCsvWriter.writerow(orderData)
        isFirstOrder = False
for customer in root.findall('Customer'):
    customerData = []
    customerId = customer.get('CustomerID')
    for detail in customer:
        if(detail.tag == 'FullAddress'):
            for addresspart in detail:
                customerData.append(addresspart.text.rstrip('\n\r'))
                if(isFirstCustomer):
                    customerHeaders.append(addresspart.tag)
        elif(detail.tag == 'Orders'):
            # Orders go to their own CSV file, not into the customer row
            processOrders(customerId, detail)
        else:
            customerData.append(detail.text.rstrip('\n\r'))
            if(isFirstCustomer):
                customerHeaders.append(detail.tag)
    if(isFirstCustomer):
        customerCsvWriter.writerow(customerHeaders)
    customerCsvWriter.writerow(customerData)
    isFirstCustomer = False
customer_csv.close()
order_csv.close()
Output produced in customers-part.csv:
CompanyName,ContactName,ContactTitle,Phone,Address,City,Region,PostalCode,Country
Great Lakes Food Market,Howard Snyder,Marketing Manager,(503) 555-7555,2732 Baker Blvd.,Eugene,OR,97403,USA
Hungry Coyote Import Store,Yoshi Latimer,Sales Representative,(503) 555-6874,(503) 555-2376,City Center Plaza 516 Main St.,Elgin,OR,97827,USA
Lazy K Kountry Store,John Steel,Marketing Manager,(509) 555-7969,(509) 555-6221,12 Orchestra Terrace,Walla Walla,WA,99362,USA
Let's Stop N Shop,Jaime Yorres,Owner,(415) 555-5938,87 Polk St. Suite 5,San Francisco,CA,94117,USA
Output produced in orders-part.csv:
CustomerID,Param1,Param2,ShipInfoParam1,ShipInfoParam2
GREAL,Value1,Value2,Value3,Value4
GREAL,Value5,Value6,Value7,Value8
HUNGC,Value7,Value8,Value9,Value10
Note: The code can be tidied further by factoring the repeated logic into a shared function; I am leaving that part to you. Also notice that the customer ID is added to each order row so that the orders of different customers can be told apart.
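As one possible shape for that refactoring, the sketch below uses a single recursive generator for both customers and orders. It is an illustration under assumptions rather than a drop-in replacement: leaf_pairs is a made-up name, the sample is trimmed to one customer, the rows are collected as lists of (tag, text) pairs rather than written to CSV, and it targets Python 3.

```python
import xml.etree.ElementTree as ET

# Trimmed sample: one customer with two orders.
SAMPLE = """<Customers>
<Customer CustomerID="GREAL">
<CompanyName>Great Lakes Food Market</CompanyName>
<FullAddress><City>Eugene</City></FullAddress>
<Orders>
<Order>
<Param1>Value1</Param1>
<ShipInfo><ShipInfoParam1>Value3</ShipInfoParam1></ShipInfo>
</Order>
<Order>
<Param1>Value5</Param1>
<ShipInfo><ShipInfoParam1>Value7</ShipInfoParam1></ShipInfo>
</Order>
</Orders>
</Customer>
</Customers>"""


def leaf_pairs(element, skip=()):
    """Yield (tag, text) for every leaf below element, depth first.

    Children whose tag is in `skip` are ignored together with their subtree.
    """
    for child in element:
        if child.tag in skip:
            continue
        if len(child):  # the child has children of its own: recurse
            for pair in leaf_pairs(child, skip):
                yield pair
        else:
            yield child.tag, (child.text or "").strip()


root = ET.fromstring(SAMPLE)
customer_rows, order_rows = [], []
for customer in root.findall("Customer"):
    # Customer row: everything except the Orders subtree.
    customer_rows.append(list(leaf_pairs(customer, skip=("Orders",))))
    # Order rows: same helper, prefixed with the owning customer's ID.
    for order in customer.findall("Orders/Order"):
        order_rows.append([("CustomerID", customer.get("CustomerID"))]
                          + list(leaf_pairs(order)))

print(customer_rows)
print(order_rows)
```

Because the helper recurses, FullAddress and ShipInfo no longer need their own branches; any nested container is flattened the same way.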
My goal is to convert an .XML file into a .CSV file.
This part of the code is already functional.
However, I also want to extract the sub-sub-nodes of one of the "father" nodes.
Maybe an example would be more self explanatory;
Here is the structure of my XML:
<nedisCatalogue>
<headerInfo>
<feedVersion>1-0</feedVersion>
<dateCreated>2018-01-22T23:37:01+0100</dateCreated>
<supplier>Nedis_BENED</supplier>
<locale>nl_BE</locale>
</headerInfo>
<productList>
<product>
<nedisPartnr><![CDATA[VS-150/63BA]]></nedisPartnr>
<nedisArtlid>17005</nedisArtlid>
<vendorPartnr><![CDATA[TONFREQ-ELKOS / BIPOL 150, 5390]]></vendorPartnr>
<brand><![CDATA[Visaton]]></brand>
<EAN>4007540053905</EAN>
<intrastatCode>8532220000</intrastatCode>
<UNSPSC>52161514</UNSPSC>
<headerText><![CDATA[Crossover Foil capacitor]]></headerText>
<internetText><![CDATA[Bipolaire elco met een ruwe folie en een zeer goede prijs/kwaliteits-verhouding voor de bouw van cross-overs. 63 Vdc, 10% tolerantie.]]></internetText>
<generalText><![CDATA[Dimensions 16 x 35 mm
]]></generalText>
<images>
<image type="2" order="15">767736.JPG</image>
</images>
<attachments>
</attachments>
<categories>
<tree name="Internet_Tree_ISHP">
<entry depth="001" id="1067858"><![CDATA[Audio]]></entry>
<entry depth="002" id="1067945"><![CDATA[Speakers]]></entry>
<entry depth="003" id="1068470"><![CDATA[Accessoires]]></entry>
</tree>
</categories>
<properties>
<property id="360" multiplierID="" unitID="" valueID=""><![CDATA[...]]></property>
</properties>
<status>
<code status="NORMAL"></code>
</status>
<packaging quantity="1" weight="8"></packaging>
<introductionDate>2015-10-26</introductionDate>
<serialnumberKeeping>N</serialnumberKeeping>
<priceLevels>
<normalPricing from="2017-02-13" to="2018-01-23">
<price level="1" moq="1" currency="EUR">2.48</price>
</normalPricing>
<specialOfferPricing></specialOfferPricing>
<goingPriceInclVAT currency="EUR" quantity="1">3.99</goingPriceInclVAT>
</priceLevels>
<tax>
</tax>
<stock>
<inStockLocal>25</inStockLocal>
<inStockCentral>25</inStockCentral>
<ATP>
<nextExpectedStockDateLocal></nextExpectedStockDateLocal>
<nextExpectedStockDateCentral></nextExpectedStockDateCentral>
</ATP>
</stock>
</product>
....
</nedisCatalogue>
And here is the code that I have now:
import xml.etree.ElementTree as ET
import csv
tree = ET.parse("/Users/BE07861/Documents/nedis_catalog_2018-01-23_nl_BE_32191_v1-0_xml")
root = tree.getroot()
f = open('/Users/BE07861/Documents/test2.csv', 'w')
csvwriter = csv.writer(f, delimiter='ç')
count = 0
head = ['Nedis Part Number', 'Nedis Article ID', 'Vendor Part Number', 'Brand', 'EAN', 'Header text', 'Internet Text', 'General Text', 'categories']
prdlist = root[1]
prdct = prdlist[5]
cat = prdct[12]
tree1=cat[0]
csvwriter.writerow(head)
for time in prdlist.findall('product'):
    row = []
    nedis_number = time.find('nedisPartnr').text
    row.append(nedis_number)
    nedis_art_id = time.find('nedisArtlid').text
    row.append(nedis_art_id)
    vendor_part_nbr = time.find('vendorPartnr').text
    row.append(vendor_part_nbr)
    Brand = time.find('brand').text
    row.append(Brand)
    ean = time.find('EAN').text
    row.append(ean)
    header_text = time.find('headerText').text
    row.append(header_text)
    internet_text = time.find('internetText').text
    row.append(internet_text)
    general_text = time.find('generalText').text
    row.append(general_text)
    categ = time.find('categories').find('tree').find('entry').text
    row.append(categ)
    csvwriter.writerow(row)
f.close()
If you run the code, you'll see that I only retrieve the first "entry" of the categories/tree; which is normal. However, I don't know how to create a loop that, for every node "categories", creates new columns such as categories1, categories2 & categories3 with the values: "entry".
My result should look like this
Nedis Part Number Nedis Article ID Vendor Part Number
VS-150/63BA 17005 TONFREQ-ELKOS / BIPOL 150, 5390
Brand EAN Header text Internet Text
Visaton 4,00754E+12 Crossover Foil capacitor Bipolaire elco …
General Text Category1 Category2 Category3
Dimensions 16 x 35 mm Audio Speakers Accessoires
I've really tried my best but didn't manage to find the solution.
Any help would be very much appreciated!!! :)
Thanks a lot,
Allan
I think this is what you're looking for:
for child in time.find('categories').find('tree'):
    categ = child.text
    row.append(categ)
Here's a solution that loops through the xml once to figure out how many headers to add, adds the headers, and then loops through each product's category list:
Updated to iterate through images in addition to categories. This is the biggest difference:
for child in time.find('categories').find('tree'):
    categ = child.text
    row.append(categ)
    curcat += 1
while curcat < maxcat:
    row.append('')
    curcat += 1
It figures out the maximum number of categories on a single record and then adds that many columns. If a particular record has fewer categories, this code puts blank values in as placeholders so the column headers always line up with the data.
For instance:
Cat1 Cat2 Cat3 Img1 Img2 Img3
A B C 1 2 3
D E <blank> 4 <blank> <blank>
Here's the full solution:
import xml.etree.ElementTree as ET
import csv
tree = ET.parse("c:\\python\\xml.xml")
root = tree.getroot()
f = open('c:\\python\\xml.csv', 'w')
csvwriter = csv.writer(f, delimiter=',')
count = 0
head = ['Nedis Part Number', 'Nedis Article ID', 'Vendor Part Number', 'Brand', 'EAN', 'Header text', 'Internet Text', 'General Text']
prdlist = root[1]
maxcat = 0
for time in prdlist.findall('product'):
    cur = 0
    for child in time.find('categories').find('tree'):
        cur += 1
    if cur > maxcat:
        maxcat = cur
for cnt in range (0, maxcat):
    head.append('Category ' + str(cnt + 1))
maximg = 0
for time in prdlist.findall('product'):
    cur = 0
    for child in time.find('images'):
        cur += 1
    if cur > maximg:
        maximg = cur
for cnt in range(0, maximg):
    head.append('Image ' + str(cnt + 1))
csvwriter.writerow(head)
for time in prdlist.findall('product'):
    row = []
    nedis_number = time.find('nedisPartnr').text
    row.append(nedis_number)
    nedis_art_id = time.find('nedisArtlid').text
    row.append(nedis_art_id)
    vendor_part_nbr = time.find('vendorPartnr').text
    row.append(vendor_part_nbr)
    Brand = time.find('brand').text
    row.append(Brand)
    ean = time.find('EAN').text
    row.append(ean)
    header_text = time.find('headerText').text
    row.append(header_text)
    internet_text = time.find('internetText').text
    row.append(internet_text)
    general_text = time.find('generalText').text
    row.append(general_text)
    curcat = 0
    for child in time.find('categories').find('tree'):
        categ = child.text
        row.append(categ)
        curcat += 1
    while curcat < maxcat:
        row.append('')
        curcat += 1
    curimg = 0
    for img in time.find('images'):
        image = img.text
        row.append(image)
        curimg += 1
    while curimg < maximg:
        row.append('')
        curimg += 1
    csvwriter.writerow(row)
f.close()
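The same two-pass idea (find the widest record, then pad the shorter ones) can be written more compactly. The sketch below is only an illustration on a made-up two-product sample with categories only; padded is a hypothetical helper, and the rows are built in memory rather than written to a file.

```python
import xml.etree.ElementTree as ET

# Made-up sample: the second product has fewer category entries.
SAMPLE = """<productList>
<product>
<nedisPartnr>A</nedisPartnr>
<categories><tree>
<entry>Audio</entry><entry>Speakers</entry><entry>Accessoires</entry>
</tree></categories>
</product>
<product>
<nedisPartnr>B</nedisPartnr>
<categories><tree>
<entry>Audio</entry>
</tree></categories>
</product>
</productList>"""


def padded(values, width, fill=""):
    """Return values as a list, extended with `fill` up to `width` items."""
    values = list(values)
    return values + [fill] * (width - len(values))


products = ET.fromstring(SAMPLE).findall("product")

# Pass 1: collect each product's category texts and find the widest list.
cats = [[entry.text for entry in product.iterfind("categories/tree/entry")]
        for product in products]
maxcat = max(len(c) for c in cats) if cats else 0

# Pass 2: one header column per category slot, shorter rows padded with ''.
head = ["Nedis Part Number"] + ["Category %d" % (i + 1) for i in range(maxcat)]
rows = [[product.findtext("nedisPartnr")] + padded(c, maxcat)
        for product, c in zip(products, cats)]

print(head)
for row in rows:
    print(row)
```

Images could be handled by repeating the same two steps with the images path, as the full solution above does with its second pair of loops.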