I have been trying to use minidom but have no real preference. For some reason lxml will not install on my machine.
I would like to parse an xml file:
<?xml version="1.
<transfer frmt="1" vtl="0" serial_number="E5XX-0822" date="2016-10-03 16:34:53.000" style="startstop">
<plateInfo>
<plate barcode="E0122326" name="384plate" type="source"/>
<plate barcode="A1234516" name="1536plateD" type="destination"/>
</plateInfo>
<printmap total="1387">
<w reason="" cf="13" aa="1.779" eo="299.798" tof="32.357" sv="1565.311" ct="1.627" ft="1.649" fc="88.226" memt="0.877" fldu="Percent" fld="DMSO" dy="0" dx="0" region="-1" tz="18989.481" gy="72468.649" gx="55070.768" avt="50" vt="50" vl="3.68" cvl="3.63" t="16:30:47.703" dc="0" dr="0" dn="A1" c="0" r="0" n="A1"/>
<w reason="" cf="13" aa="1.779" eo="299.798" tof="32.357" sv="1565.311" ct="1.627" ft="1.649" fc="88.226" memt="0.877" fldu="Percent" fld="DMSO" dy="0" dx="0" region="-1" tz="18989.481" gy="72468.649" gx="55070.768" avt="50" vt="50" vl="3.68" cvl="3.63" t="16:30:47.703" dc="0" dr="0" dn="A1" c="1" r="0" n="A2"/>
</printmap>
</transfer>
The files have no element text, as you can see; all the information is contained in the attributes. I adapted the code below from another SO post, but it seems geared more toward elements. I also can't find a good way to "browse" the XML information, i.e. I would like to say "dir(xml_file)" and get a list of all the methods I can call on my tree structure, or see all the attributes. I know this is a lot, and potentially in different directions, but thank you in advance!
from xml.dom import minidom

def parse(files):
    for xml_file in files:
        xmldoc = minidom.parse(xml_file)
        transfer = xmldoc.getElementsByTagName('transfer')[0]
        plateInfo = transfer.getElementsByTagName('plateInfo')[0]
With minidom you can access the attributes of a particular element through its attributes property, which can be treated like a dictionary. This example iterates over the attributes of the element transfer[0] and prints them:
from xml.dom.minidom import parse, parseString
xml_file='''<?xml version="1.0" encoding="UTF-8"?>
<transfer frmt="1" vtl="0" serial_number="E5XX-0822" date="2016-10-03 16:34:53.000" style="startstop">
<plateInfo>
<plate barcode="E0122326" name="384plate" type="source"/>
<plate barcode="A1234516" name="1536plateD" type="destination"/>
</plateInfo>
<printmap total="1387">
<w reason="" cf="13" aa="1.779" eo="299.798" tof="32.357" sv="1565.311" ct="1.627" ft="1.649" fc="88.226" memt="0.877" fldu="Percent" fld="DMSO" dy="0" dx="0" region="-1" tz="18989.481" gy="72468.649" gx="55070.768" avt="50" vt="50" vl="3.68" cvl="3.63" t="16:30:47.703" dc="0" dr="0" dn="A1" c="0" r="0" n="A1"/>
<w reason="" cf="13" aa="1.779" eo="299.798" tof="32.357" sv="1565.311" ct="1.627" ft="1.649" fc="88.226" memt="0.877" fldu="Percent" fld="DMSO" dy="0" dx="0" region="-1" tz="18989.481" gy="72468.649" gx="55070.768" avt="50" vt="50" vl="3.68" cvl="3.63" t="16:30:47.703" dc="0" dr="0" dn="A1" c="1" r="0" n="A2"/>
</printmap>
</transfer>'''
xmldoc = parseString(xml_file)
transfer = xmldoc.getElementsByTagName('transfer')
attlist = transfer[0].attributes.keys()
for a in attlist:
    # Each entry is an Attr node exposing .name and .value
    print(transfer[0].attributes[a].name, transfer[0].attributes[a].value)
you can find more information here:
http://www.diveintopython.net/xml_processing/attributes.html
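The same idea extends to the plate and w elements, and dir() covers the "browsing" part of the question. The following is only a sketch on a trimmed-down sample: SAMPLE and the helper attrs_as_dict are names made up for this illustration, not part of minidom.

```python
from xml.dom.minidom import parseString

# Trimmed-down sample with the same shape as the file in the question.
SAMPLE = """<transfer frmt="1" serial_number="E5XX-0822">
  <plateInfo>
    <plate barcode="E0122326" name="384plate" type="source"/>
    <plate barcode="A1234516" name="1536plateD" type="destination"/>
  </plateInfo>
  <printmap total="1387">
    <w n="A1" r="0" c="0" vl="3.68"/>
    <w n="A2" r="0" c="1" vl="3.68"/>
  </printmap>
</transfer>"""


def attrs_as_dict(element):
    """Hypothetical helper: return an element's attributes as a plain dict."""
    return dict(element.attributes.items())


doc = parseString(SAMPLE)
plates = [attrs_as_dict(p) for p in doc.getElementsByTagName("plate")]
wells = [attrs_as_dict(w) for w in doc.getElementsByTagName("w")]

for row in plates + wells:
    print(row)

# dir() lists the methods and properties available on a node.
public_names = [name for name in dir(doc.documentElement)
                if not name.startswith("_")]
print(public_names)
```

Once the attributes are plain dictionaries, they can be filtered or passed to other tools without touching the DOM API again.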
I am new to Python and am in the process of converting XML to CSV using Python 3.6.1.
Input file is file1.xml file:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Package>
<name>AllFeatureRules</name>
<pkgId>13569656</pkgId>
<pkgMetadata>
<creator>rsikhapa</creator>
<createdDate>13-05-2018 10:07:16</createdDate>
<pkgVersion>3.0.29</pkgVersion>
<application>All</application>
<icType>Feature</icType>
<businessService>Common</businessService>
<technology>All,NA</technology>
<runTimeFormat>RBML</runTimeFormat>
<inputForTranslation></inputForTranslation>
<pkgDescription></pkgDescription>
</pkgMetadata>
<rules>
<rule>
<name>ip_slas_scheduling</name>
<ruleId>46288</ruleId>
<ruleVersion>1.3.0</ruleVersion>
<ruleVersionId>1698132</ruleVersionId>
<nuggetId>619577</nuggetId>
<nuggetVersionId>225380</nuggetVersionId>
<icType>Feature</icType>
<creator>paws</creator>
<customer></customer>
</rule>
</rules>
<versionChanges>
<rulesAdded/>
<rulesModified/>
<rulesDeleted/>
</versionChanges>
</Package>
python code:
import xml.etree.ElementTree as ET
import pandas as pd
tree = ET.parse("file1.xml")
root = tree.getroot()
get_range = lambda col: range(len(col))
l = [{r[i].tag:r[i].text for i in get_range(r)} for r in root]
df = pd.DataFrame.from_dict(l)
df.to_csv('ABC.csv')
With the code above, only the parent element (pkgMetadata) is converted to CSV; the child elements (rules) are not, so the XML file is not fully converted. Please let me know a solution.
To iterate over every element in the tree, you can use ElementTree's Element.iter() method, which walks an element and all of its descendants.
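try:
    # cElementTree was removed in Python 3.9; the fallback covers that case.
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET
import pandas as pd

tree = ET.parse("file1.xml")
root = tree.getroot()
iter_root = root.iter()
l = {}
# Tags are used as keys, so a repeated tag (e.g. name, creator) keeps only its last value.
for elem in iter_root:
    l[str(elem.tag)] = str(elem.text)
df = pd.DataFrame.from_dict(l, orient="index")
df.to_csv('ABC.csv')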
producing a csv:
;0
Package;"
"
name;ip_slas_scheduling
pkgId;13569656
pkgMetadata;"
"
creator;paws
createdDate;13-05-2018 10:07:16
pkgVersion;3.0.29
application;All
icType;Feature
businessService;Common
technology;All,NA
runTimeFormat;RBML
inputForTranslation;None
pkgDescription;None
rules;"
"
rule;"
"
ruleId;46288
ruleVersion;1.3.0
ruleVersionId;1698132
nuggetId;619577
nuggetVersionId;225380
customer;None
versionChanges;"
"
rulesAdded;None
rulesModified;None
rulesDeleted;None
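Because repeated tags overwrite each other in a flat dictionary, one alternative is to emit one CSV row per rule and repeat the package-level fields in each row. The sketch below assumes that layout is what is wanted; it uses the standard csv module instead of pandas, a shortened SAMPLE document, and a hypothetical helper rule_rows with made-up pkg_ and rule_ column prefixes. It also assumes every rule has the same child tags, since the header is taken from the first row.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Shortened version of file1.xml from the question.
SAMPLE = """<Package>
  <name>AllFeatureRules</name>
  <pkgId>13569656</pkgId>
  <pkgMetadata>
    <creator>rsikhapa</creator>
    <pkgVersion>3.0.29</pkgVersion>
  </pkgMetadata>
  <rules>
    <rule>
      <name>ip_slas_scheduling</name>
      <ruleId>46288</ruleId>
      <creator>paws</creator>
    </rule>
  </rules>
</Package>"""


def rule_rows(root):
    """Build one dict per <rule>, repeating the package-level fields."""
    package = {"pkg_name": root.findtext("name"),
               "pkg_id": root.findtext("pkgId")}
    for child in root.find("pkgMetadata"):
        package["pkg_" + child.tag] = child.text or ""
    rows = []
    for rule in root.iter("rule"):
        row = dict(package)  # copy so each rule gets its own row
        for child in rule:
            row["rule_" + child.tag] = child.text or ""
        rows.append(row)
    return rows


rows = rule_rows(ET.fromstring(SAMPLE))
buffer = io.StringIO()  # swap for open("ABC.csv", "w", newline="") to write a file
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

The prefixes keep the package's name and creator apart from the rule's name and creator, which is exactly where the flat dictionary loses data.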
I am a programming novice and have just started learning Python.
Below is my XML file:
<Build_details>
<Release number="1902">
<Build number="260">
<OMS>
<Build_path>ST_OMS_V1810_B340</Build_path>
<Pc_version>8041.30.01</Pc_version>
</OMS>
<OMNI>
<Build_path>ST_OMNI_V1810_B340</Build_path>
</OMNI>
</Build>
</Release>
<Release number="1810">
<Build number="230">
<OMS>
<Build_path>ST_OMS_909908</Build_path>
<Pc_version>8031.25.65</Pc_version>
</OMS>
<OMNI>
<Build_path>ST_OMNI_798798789789</Build_path>
</OMNI>
</Build>
</Release>
<Release number="1806">
<Build number="300">
<OMS>
<Build_path>ST_OMS_V18102_B300</Build_path>
<Pc_version>8041.30.01</Pc_version>
</OMS>
<OMNI>
<Build_path>ST_OMNI_V18102_B300</Build_path>
</OMNI>
</Build>
</Release>
</Build_details>
How can I ask the user for a release number and then insert the chunk of data below under that release?
<Build number="230">
<OMS>
<Build_path>ST_OMS_909908</Build_path>
<Pc_version>8031.25.65</Pc_version>
</OMS>
<OMNI>
<Build_path>ST_OMNI_798798789789</Build_path>
</OMNI>
</Build>
I need to search for a particular release and then add details to it. Please help.
I am unable to traverse the XML to find a particular release.
I can't post this as a comment because I don't have enough reputation.
Take a look at this question: Reading XML file and fetching its attributes value in Python
Here is a solution using Python's built-in xml library.
You have to find the Release element first, then create a new Build element and append it to the Release element.
import xml.etree.ElementTree as ET

if __name__ == "__main__":
    release_number = input("Enter the release number\n").strip()
    tree = ET.ElementTree(file="Build.xml")  # Original XML File
    root = tree.getroot()
    for elem in root.iterfind('.//Release'):
        # Find the release element
        if elem.attrib['number'] == release_number:
            # Create new Build Element
            build_elem = ET.Element("Build", {"number": "123"})
            # OMS element
            oms_elem = ET.Element("OMS")
            build_path_elem = ET.Element("Build_path")
            build_path_elem.text = "ST_OMS_909908"
            pc_version_elem = ET.Element("Pc_version")
            pc_version_elem.text = "8031.25.65"
            oms_elem.append(build_path_elem)
            oms_elem.append(pc_version_elem)
            # OMNI element
            omni_elem = ET.Element("OMNI")
            build_path_omni_elem = ET.Element("Build_path")
            build_path_omni_elem.text = "ST_OMNI_798798789789"
            omni_elem.append(build_path_omni_elem)
            build_elem.append(oms_elem)
            build_elem.append(omni_elem)
            elem.append(build_elem)
    # Write to file
    tree.write("Build_new.xml")  # After adding the new element
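A more compact variant of the same approach is possible with ET.SubElement and an attribute predicate in the path. This is a sketch rather than a drop-in replacement: add_build is a hypothetical helper, the build number "231" is invented for the example, SAMPLE is a shortened document, and the release number is assumed not to contain quote characters because it is interpolated into the path.

```python
import xml.etree.ElementTree as ET

# Shortened version of the Build_details file from the question.
SAMPLE = """<Build_details>
  <Release number="1902"><Build number="260"/></Release>
  <Release number="1810"><Build number="230"/></Release>
</Build_details>"""


def add_build(root, release_number, build_number, oms_path, pc_version, omni_path):
    """Append a new <Build> under the matching <Release>; return it, or None."""
    release = root.find(f"./Release[@number='{release_number}']")
    if release is None:
        return None  # no such release, nothing is changed
    build = ET.SubElement(release, "Build", {"number": build_number})
    oms = ET.SubElement(build, "OMS")
    ET.SubElement(oms, "Build_path").text = oms_path
    ET.SubElement(oms, "Pc_version").text = pc_version
    omni = ET.SubElement(build, "OMNI")
    ET.SubElement(omni, "Build_path").text = omni_path
    return build


root = ET.fromstring(SAMPLE)
new_build = add_build(root, "1810", "231", "ST_OMS_909908", "8031.25.65",
                      "ST_OMNI_798798789789")
missing = add_build(root, "9999", "1", "x", "y", "z")
print(ET.tostring(root, encoding="unicode"))
```

Returning None for an unknown release lets the caller tell the user that the number they typed was not found, instead of silently writing an unchanged file.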
I have an XML file that looks like:
<?xml version="1.0"?>
<apple view_filter="simple" version="1" format="1">
<apples fruit_id="3" type="red" name="american">
<basket version="1" type="6" pieces="12" expiration="12">
<fruit_type colour="000" fruit_type="0x" weight="32">
</fruit_type>
</basket>
</apples>
</apple>
For the element with fruit_type="0x", I want to use Python code to navigate to that element and change the value (0x) of its attribute. I also want to do the same for 'colour' and 'weight'.
How can I do this? When I try to navigate to fruit_type, I end up changing the first element I reach rather than the one that has fruit_type="0x".
The code that does exactly what I want is:
import xml.etree.ElementTree as ET
parent = ET.parse("d:\\untitled\\note.xml")
root = parent.getroot()
# getchildren() was removed in Python 3.9; iterate over the elements directly.
for nodes in root:
    for subNodes in nodes:
        # subNodes is a basket element; its fruit_type children are the targets.
        for mynode in subNodes.iterfind('fruit_type'):
            print("##### Before Change of attributes ########### \n")
            print(ET.tostring(mynode))
            print("\n ##### After Change of attributes ###########\n")
            mynode.set('fruit_type', '0234')
            mynode.set('colour', '999')
            mynode.set('weight', '45')
            print(ET.tostring(mynode))
Here is sample code showing how you can change the attributes of fruit_type:
Sample Code
import xml.etree.ElementTree as ET
parent = ET.parse("d:\\untitled\\note.xml")
root = parent.getroot()
# getchildren() was removed in Python 3.9; iterate over the elements directly.
for nodes in root:              # apples
    for subNodes in nodes:      # basket
        for mynode in subNodes: # fruit_type
            print("##### Before Change of attributes ########### \n")
            print(ET.tostring(mynode))
            print("\n ##### After Change of attributes ###########\n")
            mynode.set('fruit_type', '0234')
            mynode.set('colour', '999')
            mynode.set('weight', '45')
            print(ET.tostring(mynode))
Output
##### Before Change of attributes ###########
b'<fruit_type colour="000" fruit_type="0x" weight="32">\n </fruit_type>\n '
##### After Change of attributes ###########
b'<fruit_type colour="999" fruit_type="0234" weight="45">\n </fruit_type>\n '
hope this helps
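If the nesting depth is not fixed, the nested loops can be replaced by a path with an attribute predicate, which selects only the fruit_type elements whose fruit_type attribute is "0x". This is a minimal sketch that parses an inline SAMPLE string instead of note.xml; the new attribute values are the ones used above.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<apple view_filter="simple" version="1" format="1">
  <apples fruit_id="3" type="red" name="american">
    <basket version="1" type="6" pieces="12" expiration="12">
      <fruit_type colour="000" fruit_type="0x" weight="32">
      </fruit_type>
    </basket>
  </apples>
</apple>"""

root = ET.fromstring(SAMPLE)

# ".//" searches all descendants; the predicate filters on the attribute value.
for node in root.findall(".//fruit_type[@fruit_type='0x']"):
    node.set("fruit_type", "0234")
    node.set("colour", "999")
    node.set("weight", "45")

changed = root.find(".//fruit_type")
print(changed.attrib)
```

Elements that merely have an attribute named type or fruit_id, such as basket and apples, are left alone because the path matches on the tag name fruit_type first.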
I am trying to use findall to select some XML elements, but I can't get any results.
import xml.etree.ElementTree as ET
import sys
storefront = sys.argv[1]
xmlFileName = 'promotions{0}.xml'
xmlFile = xmlFileName.format(storefront)
csvFileName = 'hrz{0}.csv'
csvFile = csvFileName.format(storefront)
ET.register_namespace('', "http://www.demandware.com/xml/impex/promotion/2008-01-31")
tree = ET.parse(xmlFile)
root = tree.getroot()
print('------------------Generate test-------------\n')
csv = open(csvFile,'w')
n = 0
for child in root.findall('campaign'):
    print(child.attrib['campaign-id'])
    print(n)
    n+=1
The XML looks something like this:
<?xml version="1.0" encoding="UTF-8"?>
<promotions xmlns="http://www.demandware.com/xml/impex/promotion/2008-01-31">
<campaign campaign-id="10off-310781">
<enabled-flag>true</enabled-flag>
<campaign-scope>
<applicable-online/>
</campaign-scope>
<customer-groups match-mode="any">
<customer-group group-id="Everyone"/>
</customer-groups>
</campaign>
<campaign campaign-id="MNT-deals">
<enabled-flag>true</enabled-flag>
<campaign-scope>
<applicable-online/>
</campaign-scope>
<start-date>2017-07-03T22:00:00.000Z</start-date>
<end-date>2017-07-31T22:00:00.000Z</end-date>
<customer-groups match-mode="any">
<customer-group group-id="Everyone"/>
</customer-groups>
</campaign>
<campaign campaign-id="black-friday">
<enabled-flag>true</enabled-flag>
<campaign-scope>
<applicable-online/>
</campaign-scope>
<start-date>2017-11-23T23:00:00.000Z</start-date>
<end-date>2017-11-24T23:00:00.000Z</end-date>
<customer-groups match-mode="any">
<customer-group group-id="Everyone"/>
</customer-groups>
<custom-attributes>
<custom-attribute attribute-id="expires_date">2017-11-29</custom-attribute>
</custom-attributes>
</campaign>
<promotion-campaign-assignment promotion-id="winter17-new-bubble" campaign-id="winter17-new-bubble">
<qualifiers match-mode="any">
<customer-groups/>
<source-codes/>
<coupons/>
</qualifiers>
<rank>100</rank>
</promotion-campaign-assignment>
<promotion-campaign-assignment promotion-id="xmas" campaign-id="xmas">
<qualifiers match-mode="any">
<customer-groups/>
<source-codes/>
<coupons/>
</qualifiers>
</promotion-campaign-assignment>
</promotions>
Any ideas what I am doing wrong?
I have tried different solutions that I found on Stack Overflow, but nothing I have tried seems to work.
The list is empty.
Sorry if it is something very obvious; I am new to Python.
As @MartijnPieters mentions here, etree's .findall() resolves prefixes through its namespaces argument, whereas .register_namespace() only affects the XML output of the tree. Therefore, consider mapping the default namespace to an explicit prefix. The code below uses doc, but any prefix works, even cosmin.
Additionally, consider using with and enumerate(), and even the csv module, as better handlers for your print and CSV output.
import csv
...
root = tree.getroot()
print('------------------Generate test-------------\n')
with open(csvFile, 'w') as f:
    c = csv.writer(f, lineterminator='\n')
    for n, child in enumerate(root.findall('doc:campaign', namespaces={'doc':'http://www.demandware.com/xml/impex/promotion/2008-01-31'})):
        print(child.attrib['campaign-id'])
        print(n)
        c.writerow([child.attrib['campaign-id']])
# ------------------Generate test-------------
# 10off-310781
# 0
# MNT-deals
# 1
# black-friday
# 2
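To see why the unqualified search returns nothing, it may help to print a tag: after parsing, every element name carries the namespace in {uri}localname form. The sketch below runs on a shortened inline document (SAMPLE and NS_URI are names chosen for this example) and compares the unqualified search with the prefix mapping and with the fully qualified name.

```python
import xml.etree.ElementTree as ET

NS_URI = "http://www.demandware.com/xml/impex/promotion/2008-01-31"
SAMPLE = f"""<promotions xmlns="{NS_URI}">
  <campaign campaign-id="10off-310781"/>
  <campaign campaign-id="MNT-deals"/>
  <promotion-campaign-assignment promotion-id="xmas" campaign-id="xmas"/>
</promotions>"""

root = ET.fromstring(SAMPLE)

# Parsed tags include the namespace: {uri}localname.
print(root.tag)

unqualified = root.findall("campaign")                      # matches nothing
by_prefix = root.findall("doc:campaign", {"doc": NS_URI})   # prefix mapping
by_qualified = root.findall(f"{{{NS_URI}}}campaign")        # {uri}campaign

ids = [c.get("campaign-id") for c in by_prefix]
print(len(unqualified), ids)
```

The prefix in the mapping is local to the search call, so it does not have to match anything in the XML file itself.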
I am currently learning about XML and DTDs, and I came across a DTD which was a bit puzzling.
<!ELEMENT foo (superpowers*)>
<!ELEMENT superpowers ( foo | agility )>
Firstly is this DTD legal ?
Whenever I try to generate a corresponding XML file, I get the following error:
Traceback (most recent call last):
  File "test.py", line 12, in <module>
    foo.append(superpowers)
  File "src/lxml/etree.pyx", line 832, in lxml.etree._Element.append
  File "src/lxml/apihelpers.pxi", line 1283, in lxml.etree._appendChild
ValueError: cannot append parent to itself
The code I am using for xml generation in python is represented by this pseudo code.
from lxml import etree as xml
import pprint
foo = xml.Element("foo")
superpowers = xml.Element("superpowers")
x = True
if x:
    foo = xml.Element("foo")
    superpowers.append(foo)
    foo.append(superpowers)
else:
    agility = xml.Element("agility")
    superpowers.append(agility)
    foo.append(superpowers)
tree = xml.ElementTree(foo)
print (xml.tostring(tree, pretty_print=True))
with open("foo.xml", "wb") as op:
tree.write(op, pretty_print=True) #pretty_print used for indentation
Have I overlooked anything?
<!ELEMENT foo (superpowers*)>
<!ELEMENT superpowers ( foo | agility )>
Firstly is this DTD legal ?
Yes, that partial DTD is legal. (I say partial because there isn't an element declaration for agility.)
It's saying that foo can contain zero or more superpowers and superpowers must contain exactly one foo or agility.
For example, this would be valid according to that DTD...
<foo>
<superpowers>
<foo/>
</superpowers>
<superpowers>
<foo>
<superpowers>
<foo/>
</superpowers>
</foo>
</superpowers>
</foo>
The error you're getting makes sense; you can't foo.append(superpowers) because superpowers is already the parent of foo. (Like Harry Dunne says, "You can't triple stamp a double stamp Lloyd!".)
What you would need to do is create a brand new foo and append superpowers to that.
Example...
if x:
    foo = xml.Element("foo")
    superpowers.append(foo)
    foo2 = xml.Element("foo")
    foo2.append(superpowers)
and what you'd end up with is (comments added to try to help clarify)...
<foo><!--foo2-->
<superpowers>
<foo><!--original foo--></foo>
</superpowers>
</foo>
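For a version that runs without lxml, the same structure can be built with the standard library's xml.etree.ElementTree. This is a sketch under that substitution: wrap_in_foo is a hypothetical helper, and the standard library does not perform lxml's parent check, so the point here is only that the outer and inner foo are two separate elements.

```python
import xml.etree.ElementTree as ET


def wrap_in_foo(inner_tag):
    """Return <foo><superpowers><inner_tag/></superpowers></foo>."""
    outer = ET.Element("foo")
    superpowers = ET.SubElement(outer, "superpowers")
    ET.SubElement(superpowers, inner_tag)  # a separate element, never `outer`
    return outer


with_foo = ET.tostring(wrap_in_foo("foo"), encoding="unicode")
with_agility = ET.tostring(wrap_in_foo("agility"), encoding="unicode")
print(with_foo)
print(with_agility)
```

Both results follow the content model in the DTD: each superpowers holds exactly one child, either a foo or an agility.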