Python XML findall does not work

I am trying to use findall to select some XML elements, but I can't get any results.
import xml.etree.ElementTree as ET
import sys
storefront = sys.argv[1]
xmlFileName = 'promotions{0}.xml'
xmlFile = xmlFileName.format(storefront)
csvFileName = 'hrz{0}.csv'
csvFile = csvFileName.format(storefront)
ET.register_namespace('', "http://www.demandware.com/xml/impex/promotion/2008-01-31")
tree = ET.parse(xmlFile)
root = tree.getroot()
print('------------------Generate test-------------\n')
csv = open(csvFile,'w')
n = 0
for child in root.findall('campaign'):
    print(child.attrib['campaign-id'])
    print(n)
    n += 1
The XML looks something like this:
<?xml version="1.0" encoding="UTF-8"?>
<promotions xmlns="http://www.demandware.com/xml/impex/promotion/2008-01-31">
<campaign campaign-id="10off-310781">
<enabled-flag>true</enabled-flag>
<campaign-scope>
<applicable-online/>
</campaign-scope>
<customer-groups match-mode="any">
<customer-group group-id="Everyone"/>
</customer-groups>
</campaign>
<campaign campaign-id="MNT-deals">
<enabled-flag>true</enabled-flag>
<campaign-scope>
<applicable-online/>
</campaign-scope>
<start-date>2017-07-03T22:00:00.000Z</start-date>
<end-date>2017-07-31T22:00:00.000Z</end-date>
<customer-groups match-mode="any">
<customer-group group-id="Everyone"/>
</customer-groups>
</campaign>
<campaign campaign-id="black-friday">
<enabled-flag>true</enabled-flag>
<campaign-scope>
<applicable-online/>
</campaign-scope>
<start-date>2017-11-23T23:00:00.000Z</start-date>
<end-date>2017-11-24T23:00:00.000Z</end-date>
<customer-groups match-mode="any">
<customer-group group-id="Everyone"/>
</customer-groups>
<custom-attributes>
<custom-attribute attribute-id="expires_date">2017-11-29</custom-attribute>
</custom-attributes>
</campaign>
<promotion-campaign-assignment promotion-id="winter17-new-bubble" campaign-id="winter17-new-bubble">
<qualifiers match-mode="any">
<customer-groups/>
<source-codes/>
<coupons/>
</qualifiers>
<rank>100</rank>
</promotion-campaign-assignment>
<promotion-campaign-assignment promotion-id="xmas" campaign-id="xmas">
<qualifiers match-mode="any">
<customer-groups/>
<source-codes/>
<coupons/>
</qualifiers>
</promotion-campaign-assignment>
</promotions>
Any ideas what I am doing wrong?
I have tried different solutions that I found on Stack Overflow, but nothing I tried seems to work for me.
The list is empty.
Sorry if it is something very obvious; I am new to Python.

As mentioned by @MartijnPieters, ElementTree's .findall() uses the namespaces argument, while register_namespace() only affects the XML output of the tree. Therefore, consider mapping the default namespace to an explicit prefix. The code below uses doc, but the prefix can be any name, even cosmin.
Additionally, consider using with, enumerate(), and the csv module as better handlers for your print and CSV output.
import csv
...
root = tree.getroot()
print('------------------Generate test-------------\n')
with open(csvFile, 'w') as f:
    c = csv.writer(f, lineterminator='\n')
    for n, child in enumerate(root.findall('doc:campaign', namespaces={'doc':'http://www.demandware.com/xml/impex/promotion/2008-01-31'})):
        print(child.attrib['campaign-id'])
        print(n)
        c.writerow([child.attrib['campaign-id']])
# ------------------Generate test-------------
# 10off-310781
# 0
# MNT-deals
# 1
# black-friday
# 2
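For comparison, here is a minimal self-contained sketch of the two ways ElementTree lets you qualify a tag name: a prefix mapping passed through namespaces, or the namespace URI written inline in braces. It uses a trimmed copy of the question's XML embedded as a string; the names XML, NS and campaign_ids are only illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML, embedded so the example runs on its own.
XML = """<promotions xmlns="http://www.demandware.com/xml/impex/promotion/2008-01-31">
    <campaign campaign-id="10off-310781"/>
    <campaign campaign-id="MNT-deals"/>
    <campaign campaign-id="black-friday"/>
</promotions>"""

NS = "http://www.demandware.com/xml/impex/promotion/2008-01-31"
root = ET.fromstring(XML)

# An unqualified name matches nothing: every element lives in the default namespace.
unqualified = root.findall("campaign")

# Option 1: map an arbitrary prefix to the namespace URI.
by_prefix = root.findall("doc:campaign", namespaces={"doc": NS})

# Option 2: write the URI inline in braces ("Clark notation").
by_uri = root.findall("{%s}campaign" % NS)

campaign_ids = [c.attrib["campaign-id"] for c in by_uri]
print(len(unqualified), campaign_ids)
```

Both options select the same elements; the prefix mapping tends to read better when a script contains many queries.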

Related

convert xml to csv python

I am new to Python and am in the process of converting an XML file to CSV using Python 3.6.1.
The input file is file1.xml:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Package>
<name>AllFeatureRules</name>
<pkgId>13569656</pkgId>
<pkgMetadata>
<creator>rsikhapa</creator>
<createdDate>13-05-2018 10:07:16</createdDate>
<pkgVersion>3.0.29</pkgVersion>
<application>All</application>
<icType>Feature</icType>
<businessService>Common</businessService>
<technology>All,NA</technology>
<runTimeFormat>RBML</runTimeFormat>
<inputForTranslation></inputForTranslation>
<pkgDescription></pkgDescription>
</pkgMetadata>
<rules>
<rule>
<name>ip_slas_scheduling</name>
<ruleId>46288</ruleId>
<ruleVersion>1.3.0</ruleVersion>
<ruleVersionId>1698132</ruleVersionId>
<nuggetId>619577</nuggetId>
<nuggetVersionId>225380</nuggetVersionId>
<icType>Feature</icType>
<creator>paws</creator>
<customer></customer>
</rule>
</rules>
<versionChanges>
<rulesAdded/>
<rulesModified/>
<rulesDeleted/>
</versionChanges>
</Package>
python code:
import xml.etree.ElementTree as ET
import pandas as pd
tree = ET.parse("file1.xml")
root = tree.getroot()
get_range = lambda col: range(len(col))
l = [{r[i].tag:r[i].text for i in get_range(r)} for r in root]
df = pd.DataFrame.from_dict(l)
df.to_csv('ABC.csv')
The problem with the code above is that the CSV conversion only covers the parent element (pkgMetadata), not the child elements (rules), so the XML file is not converted to CSV in full. Please let me know a solution.

To iterate over every element, you can use ElementTree's Element.iter() method (called on the root below).
try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET
import pandas as pd
tree = ET.parse("file1.xml")
root = tree.getroot()
iter_root = root.iter()
l = {}
for elem in iter_root:
    l[str(elem.tag)] = str(elem.text)
df = pd.DataFrame.from_dict(l,orient="index")
df.to_csv('ABC.csv')
producing a csv:
;0
Package;"
"
name;ip_slas_scheduling
pkgId;13569656
pkgMetadata;"
"
creator;paws
createdDate;13-05-2018 10:07:16
pkgVersion;3.0.29
application;All
icType;Feature
businessService;Common
technology;All,NA
runTimeFormat;RBML
inputForTranslation;None
pkgDescription;None
rules;"
"
rule;"
"
ruleId;46288
ruleVersion;1.3.0
ruleVersionId;1698132
nuggetId;619577
nuggetVersionId;225380
customer;None
versionChanges;"
"
rulesAdded;None
rulesModified;None
rulesDeleted;None
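Note that a flat dictionary keyed by tag keeps only the last value for tags that occur more than once, which is why name and creator above show the rule's values rather than the package's. As a rough alternative, the sketch below writes one CSV row per rule with the standard csv module. It uses a shortened copy of the XML, and the pkgName, pkgId and rule_ column names are made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Shortened copy of file1.xml, embedded so the example runs on its own.
XML = """<Package>
  <name>AllFeatureRules</name>
  <pkgId>13569656</pkgId>
  <rules>
    <rule>
      <name>ip_slas_scheduling</name>
      <ruleId>46288</ruleId>
      <creator>paws</creator>
      <customer></customer>
    </rule>
  </rules>
</Package>"""

root = ET.fromstring(XML)

rows = []
for rule in root.iter("rule"):
    # Package-level values repeat on every row; findtext() only looks at direct children.
    row = {"pkgName": root.findtext("name"), "pkgId": root.findtext("pkgId")}
    # Prefix rule fields so they cannot collide with the package-level columns.
    for field in rule:
        row["rule_" + field.tag] = field.text or ""
    rows.append(row)

# Write to an in-memory buffer; swap in open('ABC.csv', 'w', newline='') for a real file.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(rows[0]), lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = out.getvalue()
print(csv_text)
```

This assumes the file contains at least one rule and that all rules share the same fields as the first one.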

How to insert values using python in xml file

I am a programming novice and have just started learning Python.
Below is my XML file:
<Build_details>
<Release number="1902">
<Build number="260">
<OMS>
<Build_path>ST_OMS_V1810_B340</Build_path>
<Pc_version>8041.30.01</Pc_version>
</OMS>
<OMNI>
<Build_path>ST_OMNI_V1810_B340</Build_path>
</OMNI>
</Build>
</Release>
<Release number="1810">
<Build number="230">
<OMS>
<Build_path>ST_OMS_909908</Build_path>
<Pc_version>8031.25.65</Pc_version>
</OMS>
<OMNI>
<Build_path>ST_OMNI_798798789789</Build_path>
</OMNI>
</Build>
</Release>
<Release number="1806">
<Build number="300">
<OMS>
<Build_path>ST_OMS_V18102_B300</Build_path>
<Pc_version>8041.30.01</Pc_version>
</OMS>
<OMNI>
<Build_path>ST_OMNI_V18102_B300</Build_path>
</OMNI>
</Build>
</Release>
</Build_details>
How can I insert the chunk of data below under a release, after asking the user for the release number?
<Build number="230">
<OMS>
<Build_path>ST_OMS_909908</Build_path>
<Pc_version>8031.25.65</Pc_version>
</OMS>
<OMNI>
<Build_path>ST_OMNI_798798789789</Build_path>
</OMNI>
</Build>
I need to search for a particular release and then add details to it. Please help.
I am unable to traverse the XML to find a particular release.

I can't add a comment because I don't have enough reputation.
Have a look at this question: Reading XML file and fetching its attributes value in Python

Here is a solution using Python's built-in xml library.
You first have to find the Release element, then create a new Build element and append it to the Release element.
import xml.etree.ElementTree as ET
if __name__ == "__main__":
    release_number = input("Enter the release number\n").strip()
    tree = ET.ElementTree(file="Build.xml")  # Original XML file
    root = tree.getroot()
    for elem in root.iterfind('.//Release'):
        # Find the release element
        if elem.attrib['number'] == release_number:
            # Create new Build element
            build_elem = ET.Element("Build", {"number": "123"})
            # OMS element
            oms_elem = ET.Element("OMS")
            build_path_elem = ET.Element("Build_path")
            build_path_elem.text = "ST_OMS_909908"
            pc_version_elem = ET.Element("Pc_version")
            pc_version_elem.text = "8031.25.65"
            oms_elem.append(build_path_elem)
            oms_elem.append(pc_version_elem)
            omni_elem = ET.Element("OMNI")
            build_path_omni_elem = ET.Element("Build_path")
            build_path_omni_elem.text = "ST_OMNI_798798789789"
            omni_elem.append(build_path_omni_elem)
            build_elem.append(oms_elem)
            build_elem.append(omni_elem)
            elem.append(build_elem)
    # Write to file
    tree.write("Build_new.xml")  # After adding the new element
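The same idea can be sketched more compactly by letting ElementTree locate the release with an attribute predicate and building the children with SubElement. The helper name add_build, its parameters, and the build number "231" are hypothetical, and the XML is a shortened copy of the question's file. Treat it as a sketch rather than a drop-in replacement.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's XML, embedded so the example runs on its own.
XML = """<Build_details>
  <Release number="1902"><Build number="260"/></Release>
  <Release number="1810"><Build number="230"/></Release>
</Build_details>"""


def add_build(root, release_number, build_number, oms_path, pc_version, omni_path):
    """Append a new <Build> to the matching <Release>; return it, or None if absent."""
    # The [@number='...'] predicate selects the Release by attribute value.
    # Assumes release_number contains no quote characters.
    release = root.find(".//Release[@number='%s']" % release_number)
    if release is None:
        return None
    # SubElement creates the child and attaches it to its parent in one step.
    build = ET.SubElement(release, "Build", {"number": build_number})
    oms = ET.SubElement(build, "OMS")
    ET.SubElement(oms, "Build_path").text = oms_path
    ET.SubElement(oms, "Pc_version").text = pc_version
    omni = ET.SubElement(build, "OMNI")
    ET.SubElement(omni, "Build_path").text = omni_path
    return build


root = ET.fromstring(XML)
new_build = add_build(root, "1810", "231", "ST_OMS_909908", "8031.25.65",
                      "ST_OMNI_798798789789")
missing = add_build(root, "9999", "1", "x", "y", "z")  # no such release
print(ET.tostring(root, encoding="unicode"))
```

Returning None for an unknown release lets the caller decide whether to report an error or create the release first.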

how to get file names and paths based on a given attribute in parent tag

I want to change the code below to get file_names and file_paths only when the fastboot="true" attribute is present in the parent tag. I have provided the current output and the expected output. Can anyone provide guidance on how to do it?
import sys
import os
import string
from xml.dom import minidom
if __name__ == '__main__':
    meta_contents = minidom.parse("fast.xml")
    builds_flat = meta_contents.getElementsByTagName("builds_flat")[0]
    build_nodes = builds_flat.getElementsByTagName("build")
    for build in build_nodes:
        bid_name = build.getElementsByTagName("name")[0]
        print("Checking if this is cnss related image... : \n" + bid_name.firstChild.data)
        if (bid_name.firstChild.data == 'apps'):
            file_names = build.getElementsByTagName("file_name")
            file_paths = build.getElementsByTagName("file_path")
            print("now files paths...\n")
            for fn, fp in zip(file_names, file_paths):
                if (not fp.firstChild.nodeValue.endswith('/')):
                    fp.firstChild.nodeValue = fp.firstChild.nodeValue + '/'
                full_path = fp.firstChild.nodeValue + fn.firstChild.nodeValue
                print("file-to-copy: " + full_path)
            break
INPUT XML:-
<builds_flat>
<build>
<name>apps</name>
<file_ref ignore="true" minimized="true">
<file_name>adb.exe</file_name>
<file_path>LINUX/android/vendor/qcom/proprietary/usb/host/windows/prebuilt/</file_path>
</file_ref>
<file_ref ignore="true" minimized="true">
<file_name>system.img</file_name>
<file_path>LINUX/android/out/target/product/msmcobalt/secondary-boot/</file_path>
</file_ref>
<download_file cmm_file_var="APPS_BINARY" fastboot_rumi="boot" fastboot="true" minimized="true">
<file_name>boot.img</file_name>
<file_path>LINUX/android/out/target/product/msmcobalt/</file_path>
</download_file>
<download_file sparse_image_path="true" fastboot_rumi="abl" fastboot="true" minimized="true">
<file_name>abl.elf</file_name>
<file_path>LINUX/android/out/target/product/msmcobalt/</file_path>
</download_file>
</build>
</builds_flat>
OUTPUT:-
...............
now files paths...
file-to-copy: LINUX/android/vendor/qcom/proprietary/usb/host/windows/prebuilt/adb.exe
file-to-copy: LINUX/android/out/target/product/msmcobalt/secondary-boot/system.img
file-to-copy: LINUX/android/out/target/product/msmcobalt/boot.img
file-to-copy: LINUX/android/out/target/product/msmcobalt/abl.elf
EXPECTED OUT:-
now files paths...
........
file-to-copy: LINUX/android/out/target/product/msmcobalt/boot.img
file-to-copy: LINUX/android/out/target/product/msmcobalt/abl.elf

Something rather quick and dirty that comes to mind is using the fact that only the download_file elements have the fastboot attribute, right? If that's the case, you could get all the elements of type download_file and filter out the ones whose fastboot attribute is not "true":
import os
from xml.dom import minidom
if __name__ == '__main__':
    meta_contents = minidom.parse("fast.xml")
    for elem in meta_contents.getElementsByTagName('download_file'):
        if elem.getAttribute('fastboot') == "true":
            path = elem.getElementsByTagName('file_path')[0].firstChild.nodeValue
            file_name = elem.getElementsByTagName('file_name')[0].firstChild.nodeValue
            print(os.path.join(path, file_name))
With the sample you provided that outputs:
$ python ./stack_034.py
LINUX/android/out/target/product/msmcobalt/boot.img
LINUX/android/out/target/product/msmcobalt/abl.elf
Needless to say, since there's no .xsd file (not that it would matter with minidom), you only get strings (no type safety), and this only applies to the structure shown in the example. You would probably want to add some extra checks there.
EDIT:
As per the comment on this answer:
To get the elements within the <build> that contains a <name> child element with the value apps, you can find that <name> tag (the one whose text is the string apps), move to its parent node (which puts you at the build element), and then proceed as described above:
import os
from xml.dom import minidom

if __name__ == '__main__':
    meta_contents = minidom.parse("fast.xml")
    for name_elem in meta_contents.getElementsByTagName('name'):
        if name_elem.firstChild.nodeValue == "apps":
            # The parent of the matching <name> is the <build> element
            apps_build = name_elem.parentNode
            for elem in apps_build.getElementsByTagName('download_file'):
                if elem.getAttribute('fastboot') == "true":
                    path = elem.getElementsByTagName('file_path')[0].firstChild.nodeValue
                    file_name = elem.getElementsByTagName('file_name')[0].firstChild.nodeValue
                    print(os.path.join(path, file_name))
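If switching away from minidom is an option, the same filter can be expressed as a single path query with xml.etree.ElementTree. This is a different technique from the one above, shown only as a sketch: it uses a shortened copy of the question's XML, the name files_to_copy is illustrative, and posixpath.join is used so the result always has forward slashes.

```python
import posixpath
import xml.etree.ElementTree as ET

# Shortened copy of the question's XML, embedded so the example runs on its own.
XML = """<builds_flat>
  <build>
    <name>apps</name>
    <file_ref ignore="true" minimized="true">
      <file_name>adb.exe</file_name>
      <file_path>LINUX/android/vendor/qcom/proprietary/usb/host/windows/prebuilt/</file_path>
    </file_ref>
    <download_file fastboot="true" minimized="true">
      <file_name>boot.img</file_name>
      <file_path>LINUX/android/out/target/product/msmcobalt/</file_path>
    </download_file>
    <download_file fastboot="true" minimized="true">
      <file_name>abl.elf</file_name>
      <file_path>LINUX/android/out/target/product/msmcobalt/</file_path>
    </download_file>
  </build>
</builds_flat>"""

root = ET.fromstring(XML)  # root is the <builds_flat> element

# build[name='apps']       -> <build> elements with a <name> child whose text is "apps"
# [@fastboot='true']       -> keep only download_file elements carrying that attribute value
query = "build[name='apps']/download_file[@fastboot='true']"

files_to_copy = [
    posixpath.join(node.findtext("file_path"), node.findtext("file_name"))
    for node in root.findall(query)
]

for full_path in files_to_copy:
    print("file-to-copy: " + full_path)
```

The file_ref entries are skipped because the query only walks download_file children of the matching build.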

Parsing xml tree attributes (file has no elements)

I have been trying to use minidom but have no real preference. For some reason lxml will not install on my machine.
I would like to parse an xml file:
<?xml version="1.
<transfer frmt="1" vtl="0" serial_number="E5XX-0822" date="2016-10-03 16:34:53.000" style="startstop">
<plateInfo>
<plate barcode="E0122326" name="384plate" type="source"/>
<plate barcode="A1234516" name="1536plateD" type="destination"/>
</plateInfo>
<printmap total="1387">
<w reason="" cf="13" aa="1.779" eo="299.798" tof="32.357" sv="1565.311" ct="1.627" ft="1.649" fc="88.226" memt="0.877" fldu="Percent" fld="DMSO" dy="0" dx="0" region="-1" tz="18989.481" gy="72468.649" gx="55070.768" avt="50" vt="50" vl="3.68" cvl="3.63" t="16:30:47.703" dc="0" dr="0" dn="A1" c="0" r="0" n="A1"/>
<w reason="" cf="13" aa="1.779" eo="299.798" tof="32.357" sv="1565.311" ct="1.627" ft="1.649" fc="88.226" memt="0.877" fldu="Percent" fld="DMSO" dy="0" dx="0" region="-1" tz="18989.481" gy="72468.649" gx="55070.768" avt="50" vt="50" vl="3.68" cvl="3.63" t="16:30:47.703" dc="0" dr="0" dn="A1" c="1" r="0" n="A2"/>
</printmap>
</transfer>
The files do not have any element details, as you can see. All the information is contained in the attributes. In trying to adapt another SO post, I have this - but it seems to be geared more toward elements. I am also failing at a good way to "browse" the xml information, i.e. I would like to say "dir(xml_file)" and have a list of all the methods I can carry out on my tree structure, or see all the attributes. I know this was a lot and potentially different directions, but thank you in advance!
def parse(files):
    for xml_file in files:
        xmldoc = minidom.parse(xml_file)
        transfer = xmldoc.getElementsByTagName('transfer')[0]
        plateInfo = transfer.getElementsByTagName('plateInfo')[0]

With minidom you can access the attributes of a particular element through its attributes property, which can then be treated like a dictionary. This example iterates over and prints the attributes of the first transfer element (transfer[0]):
from xml.dom.minidom import parse, parseString
xml_file='''<?xml version="1.0" encoding="UTF-8"?>
<transfer frmt="1" vtl="0" serial_number="E5XX-0822" date="2016-10-03 16:34:53.000" style="startstop">
<plateInfo>
<plate barcode="E0122326" name="384plate" type="source"/>
<plate barcode="A1234516" name="1536plateD" type="destination"/>
</plateInfo>
<printmap total="1387">
<w reason="" cf="13" aa="1.779" eo="299.798" tof="32.357" sv="1565.311" ct="1.627" ft="1.649" fc="88.226" memt="0.877" fldu="Percent" fld="DMSO" dy="0" dx="0" region="-1" tz="18989.481" gy="72468.649" gx="55070.768" avt="50" vt="50" vl="3.68" cvl="3.63" t="16:30:47.703" dc="0" dr="0" dn="A1" c="0" r="0" n="A1"/>
<w reason="" cf="13" aa="1.779" eo="299.798" tof="32.357" sv="1565.311" ct="1.627" ft="1.649" fc="88.226" memt="0.877" fldu="Percent" fld="DMSO" dy="0" dx="0" region="-1" tz="18989.481" gy="72468.649" gx="55070.768" avt="50" vt="50" vl="3.68" cvl="3.63" t="16:30:47.703" dc="0" dr="0" dn="A1" c="1" r="0" n="A2"/>
</printmap>
</transfer>'''
xmldoc = parseString(xml_file)
transfer = xmldoc.getElementsByTagName('transfer')
attlist = transfer[0].attributes.keys()
for a in attlist:
    print("%s %s" % (transfer[0].attributes[a].name, transfer[0].attributes[a].value))
you can find more information here:
http://www.diveintopython.net/xml_processing/attributes.html
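To cover the "browsing" part of the question, here is a small sketch that turns every element's attributes into a plain dict, and uses dir() to list what a node offers. The helper name attrs is made up, and the XML is a shortened version of the file from the question with most attributes dropped.

```python
from xml.dom.minidom import parseString

# Shortened version of the question's file; most attributes are dropped for brevity.
XML = """<transfer frmt="1" serial_number="E5XX-0822">
  <plateInfo>
    <plate barcode="E0122326" name="384plate" type="source"/>
    <plate barcode="A1234516" name="1536plateD" type="destination"/>
  </plateInfo>
  <printmap total="1387">
    <w fld="DMSO" vt="50" dn="A1" n="A1"/>
    <w fld="DMSO" vt="50" dn="A1" n="A2"/>
  </printmap>
</transfer>"""

doc = parseString(XML)


def attrs(node):
    """Return an element's attributes as an ordinary dict of strings."""
    # NamedNodeMap.items() yields (name, value) pairs.
    return dict(node.attributes.items())


plates = [attrs(p) for p in doc.getElementsByTagName("plate")]
wells = [attrs(w) for w in doc.getElementsByTagName("w")]

# dir() works on any Python object, so it can be used to explore a node interactively.
public = [name for name in dir(doc.documentElement) if not name.startswith("_")]

print(plates[0])
print([w["n"] for w in wells])
print("getAttribute" in public)
```

All values come back as strings, so numeric fields such as vt need an explicit float() or int() conversion.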

lxml use namespace instead of ns0, ns1,

I have just started with lxml basics and I am stuck with namespaces: I need to generate an xml like this:
<CityModel
xmlns:bldg="http://www.opengis.net/citygml/building/2.0">
<cityObjectMember>
<bldg:Building>
<bldg:function>1000</bldg:function>
</bldg:Building>
</cityObjectMember>
</CityModel>
By using the following code:
from lxml import etree
cityModel = etree.Element("cityModel")
cityObject = etree.SubElement(cityModel, "cityObjectMember")
bldg = etree.SubElement(cityObject, "{http://schemas.opengis.net/citygml/building/2.0/building.xsd}bldg")
function = etree.SubElement(bldg, "{bldg:}function")
function.text = "1000"
print etree.tostring(cityModel, pretty_print=True)
I get this:
<cityModel>
<cityObjectMember>
<ns0:bldg xmlns:ns0="http://schemas.opengis.net/citygml/building/2.0/building.xsd">
<ns1:function xmlns:ns1="bldg:">1000</ns1:function>
</ns0:bldg>
</cityObjectMember>
</cityModel>
which is quite different from what I want, and my software doesn't parse it.
How to get the correct xml?

from lxml import etree
ns_bldg = "http://www.opengis.net/citygml/building/2.0"
nsmap = {
'bldg': ns_bldg,
}
cityModel = etree.Element("cityModel", nsmap=nsmap)
cityObject = etree.SubElement(cityModel, "cityObjectMember")
bldg = etree.SubElement(cityObject, "{%s}Building" % ns_bldg)
function = etree.SubElement(bldg, "{%s}function" % ns_bldg)
function.text = "1000"
print(etree.tostring(cityModel, pretty_print=True).decode())
prints
<cityModel xmlns:bldg="http://www.opengis.net/citygml/building/2.0">
<cityObjectMember>
<bldg:Building>
<bldg:function>1000</bldg:function>
</bldg:Building>
</cityObjectMember>
</cityModel>
See lxml.etree Tutorial - Namespaces.
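If lxml is not available, the standard library can give similar prefixes. This is a swapped-in technique, not the lxml nsmap approach: xml.etree.ElementTree has no nsmap argument, so the sketch below registers the prefix with register_namespace() instead. Keep in mind that this registry is global to the process, and that the snake_case variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

NS_BLDG = "http://www.opengis.net/citygml/building/2.0"

# Register the preferred prefix for serialization; without this the
# serializer falls back to generated prefixes such as ns0.
ET.register_namespace("bldg", NS_BLDG)

city_model = ET.Element("cityModel")
city_object = ET.SubElement(city_model, "cityObjectMember")
# Namespaced tags are written as {uri}localname.
building = ET.SubElement(city_object, "{%s}Building" % NS_BLDG)
function = ET.SubElement(building, "{%s}function" % NS_BLDG)
function.text = "1000"

xml_text = ET.tostring(city_model, encoding="unicode")
print(xml_text)
```

Unlike lxml's pretty_print=True, this output is not indented.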
