How to get a node's value from an XML file in Python?

Suppose I have an XML file:
<?xml version="1.0" encoding="utf-8" ?>
<configuration>
<quarkSettings>
<UpdatePath></UpdatePath>
<Version>Development</Version>
<Project>ABC</Project>
</quarkSettings>
</configuration>
Now I want to get the value of Project. I have written the following code:
import xml.etree.ElementTree as ET
doc1 = ET.parse("Configuration.xml")
for e in doc1.find("Project"):
    project = e.text
but it doesn't give the value.

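I found the answer. find() only searches the direct children of the element it is called on, so Project has to be looked up from its parent, quarkSettings:
import xml.etree.ElementTree as ET
doc1 = ET.parse("Configuration.xml")
root = doc1.getroot()
for element in root.findall("quarkSettings"):
    project = element.find("Project").text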
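A minimal sketch of two shorter alternatives, assuming the same document is inlined as a string (the name CONFIG_XML is illustrative; with the real file you would use ET.parse("Configuration.xml").getroot() instead). A ".//" path searches all descendants, and findtext() returns the element's text directly.

```python
import xml.etree.ElementTree as ET

# Same content as Configuration.xml in the question, inlined so the example runs on its own.
CONFIG_XML = """<configuration>
  <quarkSettings>
    <UpdatePath></UpdatePath>
    <Version>Development</Version>
    <Project>ABC</Project>
  </quarkSettings>
</configuration>"""

root = ET.fromstring(CONFIG_XML)

# A bare tag name only matches direct children of root, so this finds nothing.
print(root.find("Project"))  # None

# ".//Project" searches every descendant; findtext() returns the text itself.
project = root.findtext(".//Project")
print(project)  # ABC

# An explicit path through the intermediate element works as well.
print(root.find("quarkSettings/Project").text)  # ABC
```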

Related

How to load a specific paragraph from an XML file in Python?

I have an XML file with a structure like this:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<book>
<toc> <tocdiv pagenum="564">
<title>9thmemo</title>
<tocdiv pagenum="588">
<title>b</title>
</tocdiv>
</tocdiv></toc>
<chapter><title>9thmemo</title>
<para>...</para>
<para>...</para></chapter>
<chapter>...</chapter>
<chapter>...</chapter>
</book>
There are several chapters in <book>...</book>, and each chapter has a title. I only want to read the content of the chapter titled "9thmemo" (not the others).
I tried to read it with the following code:
from xml.dom import minidom
filename = "result.xml"
file = minidom.parse(filename)
chapters = file.getElementsByTagName('chapter')
for i in range(10):
    print(chapters[i])
I only get the address (object representation) of each chapter...
If I access a sub-element such as chapters[i].title, I get an error saying the attribute cannot be found.
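The problem with the code is that it never locates the specific chapter. Printing a minidom element only shows its representation, and child elements such as <title> are not available as Python attributes. The code below uses an XPath expression to find the chapter by its title instead.
Try the following: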
import xml.etree.ElementTree as ET
xml = '''<?xml version="1.0" encoding="UTF-8"?>
<book>
<toc>
<tocdiv pagenum="564">
<title>9thmemo</title>
<tocdiv pagenum="588">
<title>b</title>
</tocdiv>
</tocdiv>
</toc>
<chapter>
<title>9thmemo</title>
<para>A</para>
<para>B</para>
</chapter>
<chapter>...</chapter>
<chapter>...</chapter>
</book>'''
root = ET.fromstring(xml)
chapter = root.find('.//chapter[title="9thmemo"]')
para_data = ','.join(p.text for p in chapter.findall('para'))
print(para_data)
output
A,B
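Since the question started from minidom, here is a minimal sketch of the same lookup done with minidom, assuming a shortened version of the book document (BOOK_XML, element_text and chapter_paras are hypothetical names). It reads each chapter's <title> with getElementsByTagName() instead of attribute access, and iterating over the returned list avoids the IndexError that range(10) can cause.

```python
from xml.dom import minidom

# Shortened, hypothetical version of the question's result.xml.
BOOK_XML = """<book>
  <toc><tocdiv pagenum="564"><title>9thmemo</title></tocdiv></toc>
  <chapter><title>9thmemo</title><para>A</para><para>B</para></chapter>
  <chapter><title>other</title><para>C</para></chapter>
</book>"""

def element_text(node):
    # Concatenate the data of the node's direct text children.
    return "".join(c.data for c in node.childNodes if c.nodeType == c.TEXT_NODE)

def chapter_paras(doc, wanted_title):
    # Return the text of every <para> in the first chapter whose title matches.
    for chapter in doc.getElementsByTagName("chapter"):
        titles = chapter.getElementsByTagName("title")
        if titles and element_text(titles[0]) == wanted_title:
            return [element_text(p) for p in chapter.getElementsByTagName("para")]
    return []

doc = minidom.parseString(BOOK_XML)
print(chapter_paras(doc, "9thmemo"))  # ['A', 'B']
```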

Parsing XML with Python minidom

I'm trying to parse elements from an .xml file with the Python minidom library, but it doesn't seem to work: it raises "IndexError: list index out of range". Perhaps I'm using the wrong method or library for the job. Please suggest how to do this. Thanks
from xml.dom import minidom
doc = minidom.parse('/path/to/file/runParameters.xml')
docs = doc.getElementsByTagName('RunParameters')
for el in docs:
    cloud = el.getElementsByTagName("EnableCloud")
    print(cloud[0].firstChild.nodeValue)
Here is what the structure of the file looks like:
<?xml version="1.0"?>
<RunParameters xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<EnableCloud>false</EnableCloud>
<RunParametersVersion>MiSeq</RunParametersVersion>
<CopyManifests>true</CopyManifests>
<FlowcellRFIDTag>
<SerialNumber>000000000-AG01C</SerialNumber>
<PartNumber>17772</PartNumber>
<ExpirationDate>2016-04-10T00:00:00</ExpirationDate>
</FlowcellRFIDTag>
</RunParameters>
Using ElementTree
import xml.etree.ElementTree as ET
xml = '''<?xml version="1.0"?>
<RunParameters xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<EnableCloud>false</EnableCloud>
<RunParametersVersion>MiSeq</RunParametersVersion>
<CopyManifests>true</CopyManifests>
<FlowcellRFIDTag>
<SerialNumber>000000000-AG01C</SerialNumber>
<PartNumber>17772</PartNumber>
<ExpirationDate>2016-04-10T00:00:00</ExpirationDate>
</FlowcellRFIDTag>
</RunParameters>'''
root = ET.fromstring(xml)
print(root.find('.//EnableCloud').text)
output
false
This code works for me. Please try it on your system:
xx = '''
<?xml version="1.0"?>
<RunParameters xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<EnableCloud>false</EnableCloud>
<RunParametersVersion>MiSeq</RunParametersVersion>
<CopyManifests>true</CopyManifests>
<FlowcellRFIDTag>
<SerialNumber>000000000-AG01C</SerialNumber>
<PartNumber>17772</PartNumber>
<ExpirationDate>2016-04-10T00:00:00</ExpirationDate>
</FlowcellRFIDTag>
</RunParameters>
'''.strip()
with open('test2.xml','w') as f:
    f.write(xx)
from xml.dom import minidom
doc = minidom.parse('test2.xml')
docs = doc.getElementsByTagName('RunParameters')
for el in docs:
    cloud = el.getElementsByTagName("EnableCloud")
    print(cloud[0].firstChild.nodeValue)
Output
false
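The only indexing in the question's code is cloud[0], so the IndexError means getElementsByTagName() returned an empty list: some RunParameters element in the real file has no EnableCloud descendant. A minimal defensive sketch, assuming the XML is available as a string (enable_cloud_values and RUN_XML are hypothetical names):

```python
from xml.dom import minidom

RUN_XML = """<?xml version="1.0"?>
<RunParameters xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <EnableCloud>false</EnableCloud>
  <RunParametersVersion>MiSeq</RunParametersVersion>
</RunParameters>"""

def enable_cloud_values(xml_text):
    """Return the text of each EnableCloud element, or None for an empty one."""
    doc = minidom.parseString(xml_text)
    values = []
    for el in doc.getElementsByTagName("RunParameters"):
        cloud = el.getElementsByTagName("EnableCloud")
        if not cloud:
            # No <EnableCloud> here: cloud[0] would raise "list index out of range".
            continue
        first = cloud[0].firstChild
        # An empty <EnableCloud/> has no child node at all.
        values.append(first.nodeValue if first is not None else None)
    return values

print(enable_cloud_values(RUN_XML))  # ['false']
```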

XML parsing in Python (xml.etree.ElementTree)

I am using xml.etree.ElementTree (imported as ET) to parse an XML file in Python.
I tried:
import xml.etree.ElementTree as ET
tree = ET.parse('pyxml.xml')
root = tree.getroot()
name=root[0][0].text
username=root[0][1].text
password=root[0][2].text
host=root[0][3].text
port=root[0][4].text
pyxml.xml:
<data>
<database>
<name>qwe</name>
<username>postgres</username>
<password>1234</password>
<host>localhost</host>
<port>5432</port>
</database>
</data>
But I want the XML file to look like this:
<data>
<database name="abc" username="xyz" password="dummy" host="localhost" port="5432"/>
</data>
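If I do it this way, root[0][0].text no longer works. Can anyone tell me how to access the values?
In the new format the values are attributes of <database> rather than child elements, so root[0] has no children to index. Read them from the element's attrib dictionary instead. Try the code below: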
import xml.etree.ElementTree as ET
tree = ET.parse('/Users/a-8525/Documents/tmp/pyxml.xml')
root = tree.getroot()
database = root.find('database')
attribute = database.attrib
name = attribute['name']
username = attribute['username']
password = attribute['password']
host = attribute['host']
port = attribute['port']
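A variant of the same idea, sketched with the attribute-style XML inlined as a string (PYXML is a hypothetical name, and the default values passed to get() are illustrative assumptions). Element.get() returns None or a supplied default for a missing attribute instead of raising KeyError, and attribute values are always strings, so the port needs converting.

```python
import xml.etree.ElementTree as ET

PYXML = """<data>
  <database name="abc" username="xyz" password="dummy" host="localhost" port="5432"/>
</data>"""

root = ET.fromstring(PYXML)
database = root.find("database")

# get() falls back to the default (here an assumed one) when the attribute is missing.
host = database.get("host", "localhost")
# Attribute values are strings, so convert numeric settings explicitly.
port = int(database.get("port", "5432"))
# attrib is a plain dict; copy it to get all settings at once.
settings = dict(database.attrib)

print(host, port, settings["name"])  # localhost 5432 abc
```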

How to get the content of a nested child element (child->child->child->child) in an XML file using Python

<?xml version="1.0" encoding="UTF-8"?>
<Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.056.001.01" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<FIToFIPmtCxlReq>
<Assgnmt>
<Id>TEST-ISO-81</Id>
<Assgnr>
<Agt>
<FinInstnId>
<BIC>CCCCGB2L</BIC>
</FinInstnId>
</Agt>
</Assgnr>
<Assgne>
<Agt>
<FinInstnId>
<BIC>MMMMGB2L</BIC>
</FinInstnId>
</Agt>
</Assgne>
<CreDtTm>2009-03-24T11:22:59</CreDtTm>
</Assgnmt>
<Undrlyg>
<TxInf>
<CxlId>103012345</CxlId>
<Case>
<Id>ISO_TEST_CASE</Id>
<Cretr>
<Agt>
<FinInstnId>
<BIC>MMMMGB2L</BIC>
</FinInstnId>
</Agt>
</Cretr>
</Case>
</TxInf>
</Undrlyg>
</FIToFIPmtCxlReq>
</Document>
Here I want to get the content of "TxInf": all of its children, their children, and the data.
What I have tried:
import xml.etree.ElementTree as ET
from xml.etree import ElementTree
tree = ET.parse('R3-CAMT.056.001.07-ISO-V.XML')
root = tree.getroot()
for element in root.iter():
    if element.tag == "{urn:iso:std:iso:20022:tech:xsd:camt.056.001.01}TxInf":
        tree._setroot(element.tag)
print(root.tag)
print(root.attrib)
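Can I change the root with _setroot(), or is there another way to do this?
You don't need to change the root: find the TxInf element and work with it directly. (In your code, _setroot() receives element.tag, which is a string rather than an element, and root still refers to the original Document element, so printing root.tag cannot show TxInf.) Try something along these lines in your code: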
for r in root.findall(".//*"):
    if 'TxInf' in r.tag:
        print(ET.tostring(r))
By the way, it may be easier to do it with lxml, if you can use it.
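With plain ElementTree, a namespace mapping lets you find TxInf directly, and iter() on that element walks all of its descendants. A minimal sketch, assuming a trimmed copy of the document inlined as a string (CAMT_XML, NS, local_name and leaf_values are hypothetical names). Wrapping the found element in a new ElementTree gives a tree rooted at TxInf without replacing the original tree's root.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's document.
CAMT_XML = """<Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.056.001.01">
  <FIToFIPmtCxlReq>
    <Undrlyg>
      <TxInf>
        <CxlId>103012345</CxlId>
        <Case>
          <Id>ISO_TEST_CASE</Id>
          <Cretr><Agt><FinInstnId><BIC>MMMMGB2L</BIC></FinInstnId></Agt></Cretr>
        </Case>
      </TxInf>
    </Undrlyg>
  </FIToFIPmtCxlReq>
</Document>"""

# Map an arbitrary prefix to the default namespace used in the document.
NS = {"camt": "urn:iso:std:iso:20022:tech:xsd:camt.056.001.01"}

def local_name(tag):
    # "{namespace}TxInf" -> "TxInf"
    return tag.rsplit("}", 1)[-1]

root = ET.fromstring(CAMT_XML)
txinf = root.find(".//camt:TxInf", NS)

# iter() yields the element itself and every descendant in document order.
leaf_values = {}
for el in txinf.iter():
    text = (el.text or "").strip()
    if text:
        leaf_values[local_name(el.tag)] = text

# A new tree rooted at TxInf; the original tree is left untouched.
sub_tree = ET.ElementTree(txinf)
print(leaf_values)
```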

XML file generating unwanted data

I read data from one XML file and write part of it to a different XML file. Everything works, but a few unwanted tags show up in the XML file I generate as output.
Here is what I have tried:
from xml.etree import ElementTree as ET
from xml.dom.minidom import getDOMImplementation
from xml.dom.minidom import parseString
tree = ET.parse('C:\\Users\\ca33.xml')
root = tree.getroot()
impl = getDOMImplementation()
#print(root)
header = [root.find('header')]
for h in header:
    h1=(parseString(ET.tostring(h)).toprettyxml(''))
#print(h1)
commands = root.findall(".//records//")
recs=[c for c in commands if c.find('soc_id')!=None and c.find('soc_id').text[:9]=='000001051']
bb=""
for rec in recs:
    aa=(parseString(ET.tostring(rec)).toprettyxml(''))
    bb=bb+aa
#print(bb)
newdoc = impl.createDocument(None, "file"+h1+bb, None)
newdoc.writexml(open('data.xml', 'w'),'\n'.join([line for line in newdoc.toprettyxml(indent=' '*2).split('\n') if line.strip()]))
The output data.xml file looks like this:
<?xml version="1.0" ?><?xml version="1.0" ?>
<file<?xml version="1.0" ?>
<header>
<number_of_records>41</number_of_records>
</header>
<?xml version="1.0" ?>
<record>
<soc_id>00000105139E3B82</soc_id>
</record>
<?xml version="1.0" ?>
<soc_id>00000105139E3640</soc_id>
</record>
<?xml version="1.0" ?>
<header>
<number_of_records>41</number_of_records>
As you can see, <?xml version="1.0" ?> declarations are generated all over the file, and at the end it starts writing the data again from the beginning, after two blank lines.
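As I understand it, you are reading one XML file and then writing some of the same data into a different file.
That is where the problems come from: every toprettyxml() call returns a complete document with its own <?xml ...?> declaration, those strings are concatenated into the root tag name passed to createDocument(), and the pretty-printed text is then passed to writexml() as its indent argument, which is why the content is written a second time. A simpler approach is to serialize only the elements you need and write a single declaration yourself: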
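from xml.etree import ElementTree as ET
tree = ET.parse('C:\\Users\\ca33.xml')
root = tree.getroot()
# Serialize the children of <header> as text (encoding='unicode' returns str, not bytes)
str_header = ""
for header_ex in root.findall('header'):
    str_header += "".join(ET.tostring(c, encoding='unicode') for c in header_ex)
# Keep only the records whose soc_id starts with 000001051
str_rec = ""
for record_ex in root.findall('records'):
    for c in record_ex:
        soc_id = c.find('soc_id')
        if soc_id is not None and soc_id.text[:9] == '000001051':
            str_rec += ET.tostring(c, encoding='unicode')
# Write one XML declaration, then a single <file> root holding header and records
with open("output.xml", "w", encoding="utf-8") as f:
    f.write("<?xml version='1.0' encoding='UTF-8' standalone='yes'?>")
    f.write("<file><header>" + str_header + "</header>" + str_rec + "</file>")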
Since you have not posted sample input data, I have assumed it is structured as shown in your question. I also assume that <records> contains <record> tags, each with its own child tags, which is why the code loops twice.
Also, remove the imports your code does not need.
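Alternatively, you can skip string concatenation entirely: copy the wanted elements into a new tree and serialize it once, which writes exactly one declaration. A minimal sketch, assuming the input has a <header> element and a <records> element containing <record> children (SOURCE_XML, its <data> root and filter_records are hypothetical; ET.indent() needs Python 3.9 or later, so it is guarded). To write a file, pass a filename such as 'output.xml' to tree.write() instead of the in-memory buffer.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical input shaped like the question's ca33.xml.
SOURCE_XML = """<data>
  <header><number_of_records>41</number_of_records></header>
  <records>
    <record><soc_id>00000105139E3B82</soc_id></record>
    <record><soc_id>00000105139E3640</soc_id></record>
    <record><soc_id>00000999999E0000</soc_id></record>
  </records>
</data>"""

def filter_records(source_root, prefix):
    """Build a new <file> tree holding the header and the records that match prefix."""
    out = ET.Element("file")
    header = source_root.find("header")
    if header is not None:
        out.append(header)
    for rec in source_root.iterfind("records/record"):
        # findtext() returns the default when <soc_id> is missing.
        if rec.findtext("soc_id", default="").startswith(prefix):
            out.append(rec)
    return ET.ElementTree(out)

tree = filter_records(ET.fromstring(SOURCE_XML), "000001051")
if hasattr(ET, "indent"):  # pretty-printing helper added in Python 3.9
    ET.indent(tree, space="  ")

# Serialize once: a single XML declaration followed by the new tree.
buf = io.BytesIO()
tree.write(buf, encoding="utf-8", xml_declaration=True)
output = buf.getvalue().decode("utf-8")
print(output)
```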
