Cannot Parse XML file using Python

<?xml version="1.0" encoding="utf-8"?>
<AcResponse
Command="hist"
TaskId="408709">
<element
name="/build.gradle"
id="93527">
<transaction
id="1117194"
type="promote"
time="1529083792"
user="soarfa99">
<comment>Automated promotion to parent stream by module build: jenkins-SC-MODULE-CS-SC-TRUNK-MedRec-DEV-CI-430</comment>
<version
virtual="11007/75"
real="36877/2"
virtualNamedVersion="CS-SC-TRUNK-INTG/75"
realNamedVersion="CS-SC-TRUNK-MedRec-DEV2_ar037601/2"
elem_type="text"
dir="no">
<issueNum>72768</issueNum>
</version>
</transaction>
<transaction
id="1111652"
type="promote"
time="1528100495"
user="dm041068">
<comment>SEDA file add- Debajyoti</comment>
<version
virtual="11007/74"
real="39225/1"
virtualNamedVersion="CS-SC-TRUNK-INTG/74"
realNamedVersion="CS-SC-TRUNK-CM-DEV-Debajyoti_dm041068/1"
elem_type="text"
dir="no">
<issueNum>72629</issueNum>
</version>
</transaction>
</element>
<streams>
<stream
id="11007"
name="CS-SC-TRUNK-INTG"
type="normal"/>
</streams>
</AcResponse>
This is the XML I am trying to parse. I want to extract 'issueNum' with the following code:
tree=ET.parse(xml)
root=tree.getroot()
for item in root.findall('version'):
    for child in item:
        print(child.attrib['issueNum'])
Can you please help me get the value of "issueNum"?

You can use an xpath expression to find the values of issueNum:
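from lxml import etree
# Use a bytes literal: lxml rejects str input that carries an encoding declaration.
xml = b'''<?xml version="1.0" encoding="utf-8"?>
<AcResponse
Command="hist"
TaskId="408709">....'''
tree = etree.fromstring(xml)
# issueNum is a child element of version, so read its text rather than an attribute.
issues = tree.xpath('//version/issueNum')
for issue in issues:
    print(issue.text)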
This prints:
72768
72629
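If lxml is not available, the same result can be obtained with the standard library. The sketch below is only an illustration: the XML is a trimmed copy of the one in the question, and the variable name issue_numbers is my own. It also shows the two problems in the original code: root.findall('version') looks only at direct children of the root, and issueNum is a child element (read via its text), not an attribute.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML (assumption: only the parts needed here).
XML = """<AcResponse Command="hist" TaskId="408709">
  <element name="/build.gradle" id="93527">
    <transaction id="1117194" type="promote">
      <version virtual="11007/75" real="36877/2">
        <issueNum>72768</issueNum>
      </version>
    </transaction>
    <transaction id="1111652" type="promote">
      <version virtual="11007/74" real="39225/1">
        <issueNum>72629</issueNum>
      </version>
    </transaction>
  </element>
</AcResponse>"""

root = ET.fromstring(XML)

# root.iter() walks every level of the tree, unlike findall('version'),
# which only inspects the root's direct children.
issue_numbers = [version.findtext("issueNum") for version in root.iter("version")]
print(issue_numbers)
```

root.findall('.//version') would work equally well in place of root.iter('version').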

Related

How to rewrite this XML file?

I am trying to rewrite an XML file that contains the following:
<?xml version="1.0" encoding="UTF-8"?>
<BrowserAutomationStudioProject>
<ModelList>
<Model>
<Name>token</Name>
<Description ru="token" en="token"/>
<Value>5660191076:AAEY8RI3hXcI3dEvjWAj7p2e7DdxOMNjPfk8</Value>
</Model>
<Defaults/>
<Model>
<Name>chat_id</Name>
<Value>5578940124</Value>
</Model>
<Defaults/>
</ModelList>
</BrowserAutomationStudioProject>
My Python code:
import xml.etree.ElementTree as ET
tree = ET.parse('Actual.xml')
root = tree.getroot()
for model in root.findall('Model'):
    name = model.find('Name').text
    if name == 'token':
        model.find('Value').text = '123456789:ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    if name == 'chat_id':
        model.find('Value').text = '1234567890'
tree.write('xml_file.xml')
It runs without errors, but the output file has the same content as before:
<?xml version="1.0" encoding="UTF-8"?>
<BrowserAutomationStudioProject>
<ModelList>
<Model>
<Name>token</Name>
<Description ru="token" en="token"/>
<Value>5660191076:AAEY8RI3hXcI3dEvjWAj7p2e7DdxOMNjPfk8</Value>
</Model>
<Defaults/>
<Model>
<Name>chat_id</Name>
<Value>5578940124</Value>
</Model>
<Defaults/>
</ModelList>
</BrowserAutomationStudioProject>
What's wrong with my code?
Even ChatGPT can't help me, haha.
I even tried to print it, but that doesn't work either.
What should I do?
Please help me.

As described in the documentation, Element.findall() with a plain tag name finds only elements that are direct children of the current element. To make ET select matching subelements at all levels beneath the current element, prefix the path with .// instead.
Since <Model> is not a direct child of root (it is a grandchild, nested inside <ModelList>), root.findall('Model') finds nothing, so the loop body never runs and the file is written back unchanged. To get ET to find it, change that call to
root.findall('.//Model')
and it should work.

You could also use for model in root.findall('ModelList/Model').
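To make the difference between these paths concrete, here is a small self-contained sketch. It parses a shortened version of the question's XML from a string instead of reading Actual.xml, and the NEW_VALUES dictionary and placeholder old values are my own additions for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened version of the question's file (assumption: placeholder old values).
XML = """<BrowserAutomationStudioProject>
  <ModelList>
    <Model><Name>token</Name><Value>old-token</Value></Model>
    <Defaults/>
    <Model><Name>chat_id</Name><Value>old-chat-id</Value></Model>
    <Defaults/>
  </ModelList>
</BrowserAutomationStudioProject>"""

# Hypothetical lookup table: replacement values keyed by each Model's <Name>.
NEW_VALUES = {
    "token": "123456789:ABCDEFGHIJKLMNOPQRSTUVWXYZ",
    "chat_id": "1234567890",
}

root = ET.fromstring(XML)

direct_children = root.findall("Model")   # only looks one level down: empty
descendants = root.findall(".//Model")    # searches every level: two matches

for model in descendants:
    name = model.findtext("Name")
    if name in NEW_VALUES:
        model.find("Value").text = NEW_VALUES[name]

# The explicit path 'ModelList/Model' reaches the same elements.
updated = {m.findtext("Name"): m.findtext("Value")
           for m in root.findall("ModelList/Model")}
print(len(direct_children), len(descendants), updated)
```

With a file on disk, the same loop applies to ET.parse('Actual.xml').getroot(), followed by tree.write('xml_file.xml') as in the question.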

If you know the order of the <Value> tags, you can also iterate through the tree and pop() the new values from a list. Note that pop() takes items from the end of the list, so the values are listed in reverse document order:
import xml.etree.ElementTree as ET
tree = ET.parse('Actual.xml')
root = tree.getroot()
input_value = ['1234567890','123456789:ABCDEFGHIJKLMNOPQRSTUVWXYZ']
for elem in root.iter():
    if elem.tag == "Value":
        elem.text = input_value.pop()
        print(elem.tag, elem.text)
tree.write('xml_file.xml')
Output:
<?xml version="1.0"?>
<BrowserAutomationStudioProject>
<ModelList>
<Model>
<Name>token</Name>
<Description ru="token" en="token" />
<Value>123456789:ABCDEFGHIJKLMNOPQRSTUVWXYZ</Value>
</Model>
<Defaults />
<Model>
<Name>chat_id</Name>
<Value>1234567890</Value>
</Model>
<Defaults />
</ModelList>
</BrowserAutomationStudioProject>

How to load a specific paragraph from an XML file in Python?

I have an XML file whose structure looks like this:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<book>
<toc> <tocdiv pagenum="564">
<title>9thmemo</title>
<tocdiv pagenum="588">
<title>b</title>
</tocdiv>
</tocdiv></toc>
<chapter><title>9thmemo</title>
<para>...</para>
<para>...</para></chapter>
<chapter>...</chapter>
<chapter>...</chapter>
</book>
There are several chapters in <book>...</book>, and each chapter has a title. I only want to read the full content of the chapter titled "9thmemo" (not the others).
I tried to read it with the following code:
from xml.dom import minidom
filename = "result.xml"
file = minidom.parse(filename)
chapters = file.getElementsByTagName('chapter')
for i in range(10):
    print(chapters[i])
This only prints the memory address of each chapter object.
If I try to access a sub-element like chapters[i].title, I get an error saying the attribute cannot be found.

You wrote: "I only want to read all content of this chapter, "9thmemo" (not others)".
The problem with your code is that it never tries to locate that specific chapter. The code below uses an XPath predicate to find it.
Try the following:
import xml.etree.ElementTree as ET
xml = '''<?xml version="1.0" encoding="UTF-8"?>
<book>
<toc>
<tocdiv pagenum="564">
<title>9thmemo</title>
<tocdiv pagenum="588">
<title>b</title>
</tocdiv>
</tocdiv>
</toc>
<chapter>
<title>9thmemo</title>
<para>A</para>
<para>B</para>
</chapter>
<chapter>...</chapter>
<chapter>...</chapter>
</book>'''
root = ET.fromstring(xml)
chapter = root.find('.//chapter/[title="9thmemo"]')
para_data = ','.join(p.text for p in chapter.findall('para'))
print(para_data)
Output:
A,B
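If you would rather stay with minidom as in the question, note that printing a node only shows the object's repr, and child elements are not exposed as Python attributes, which is why chapters[i].title fails. The following is a rough sketch of the same lookup in minidom; the helper name chapter_paragraphs and the shortened XML are my own, not part of the original code.

```python
from xml.dom import minidom

# Shortened stand-in for result.xml (assumption: simplified structure).
XML = """<book>
<chapter><title>9thmemo</title><para>A</para><para>B</para></chapter>
<chapter><title>other</title><para>C</para></chapter>
</book>"""


def chapter_paragraphs(doc, wanted_title):
    """Return the text of every <para> in the chapter with the given title."""
    for chapter in doc.getElementsByTagName("chapter"):
        titles = chapter.getElementsByTagName("title")
        # The title text lives in a Text node below <title>, not on the element.
        if titles and titles[0].firstChild is not None \
                and titles[0].firstChild.data == wanted_title:
            return [p.firstChild.data
                    for p in chapter.getElementsByTagName("para")
                    if p.firstChild is not None]
    return []


doc = minidom.parseString(XML)
paragraphs = chapter_paragraphs(doc, "9thmemo")
print(paragraphs)
```

This iterates over however many chapters exist, so it also avoids the hard-coded range(10), which would raise an IndexError for a book with fewer than ten chapters.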

How to get the content of child->child->child->child in XML file using Python

<?xml version="1.0" encoding="UTF-8"?>
<Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.056.001.01" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<FIToFIPmtCxlReq>
<Assgnmt>
<Id>TEST-ISO-81</Id>
<Assgnr>
<Agt>
<FinInstnId>
<BIC>CCCCGB2L</BIC>
</FinInstnId>
</Agt>
</Assgnr>
<Assgne>
<Agt>
<FinInstnId>
<BIC>MMMMGB2L</BIC>
</FinInstnId>
</Agt>
</Assgne>
<CreDtTm>2009-03-24T11:22:59</CreDtTm>
</Assgnmt>
<Undrlyg>
<TxInf>
<CxlId>103012345</CxlId>
<Case>
<Id>ISO_TEST_CASE</Id>
<Cretr>
<Agt>
<FinInstnId>
<BIC>MMMMGB2L</BIC>
</FinInstnId>
</Agt>
</Cretr>
</Case>
</TxInf>
</Undrlyg>
</FIToFIPmtCxlReq>
</Document>
Here I want to get the content of "TxInf": all of its children, their children, and the data they contain.
What I have tried is:
import xml.etree.ElementTree as ET
from xml.etree import ElementTree
tree = ET.parse('R3-CAMT.056.001.07-ISO-V.XML')
root = tree.getroot()
for element in root.iter():
    if element.tag == "{urn:iso:std:iso:20022:tech:xsd:camt.056.001.01}TxInf":
        tree._setroot(element.tag)
print(root.tag)
print(root.attrib)
Please suggest whether I can change the root with _setroot, or whether there is another way to do this.

Try something along these lines to see if it works:
for r in root.findall(".//*"):
    if 'TxInf' in r.tag:
        # encoding='unicode' makes tostring() return a str instead of bytes
        print(ET.tostring(r, encoding='unicode'))
By the way, it may be easier to do this with lxml, if you can use it.
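Another option, sketched below with the standard library only, is to pass a prefix-to-URI mapping to find() so that the namespaced tag can be matched exactly instead of by substring. The prefix "c", the variable names, and the shortened XML are my own choices for illustration. There is no need to call the private _setroot method: wrapping the found element in a new ElementTree gives a tree rooted at TxInf.

```python
import xml.etree.ElementTree as ET

# Shortened version of the question's document (assumption: simplified content).
XML = """<Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.056.001.01">
  <FIToFIPmtCxlReq>
    <Undrlyg>
      <TxInf>
        <CxlId>103012345</CxlId>
        <Case><Id>ISO_TEST_CASE</Id></Case>
      </TxInf>
    </Undrlyg>
  </FIToFIPmtCxlReq>
</Document>"""

# "c" is an arbitrary prefix chosen here; only the URI has to match the file.
NS = {"c": "urn:iso:std:iso:20022:tech:xsd:camt.056.001.01"}

root = ET.fromstring(XML)
tx_inf = root.find(".//c:TxInf", NS)

# Collect (local tag name, text) for every descendant that carries real text.
leaves = [(el.tag.split("}", 1)[-1], el.text.strip())
          for el in tx_inf.iter()
          if el.text and el.text.strip()]
print(leaves)

# A new tree whose root is TxInf, built through the public constructor.
sub_tree = ET.ElementTree(tx_inf)
print(sub_tree.getroot().tag)
```

sub_tree.write(...) can then be used to save just that fragment if needed.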

python lxml element attrib issue

I have to build an XML file that looks like the following:
<?xml version='1.0' encoding='ISO-8859-1'?>
<Document protocol="OCI" xmlns="C">
<sessionId>xmlns=874587878</sessionId>
<command xmlns="http://www.w3.org/2001/XMLSchema-instance" xsi:type="UserGetRegistrationListRequest">
<userId>data</userId>
</command>
</Document>
I got everything working except for the command attribute xsi:type="UserGetRegistrationListRequest".
I can't get the : into the attribute name of the command element.
Can someone please help me with this issue?
I am using Python 3.5.
My current code is:
from lxml import etree
root = etree.Element("Document", protocol="OCI", xmlns="C")
print(root.tag)
root.append(etree.Element("sessionId") )
sessionId=root.find("sessionId")
sessionId.text = "xmlns=78546587854"
root.append(etree.Element("command", xmlns="http://www.w3.org/2001/XMLSchema-instance",xsitype = "UserGetRegistrationListRequest" ) )
command=root.find("command")
userID = etree.SubElement(command, "userId")
userID.text = "data"
print(etree.tostring(root, pretty_print=True))
tree = etree.ElementTree(root)
tree.write('output.xml', pretty_print=True, xml_declaration=True, encoding="ISO-8859-1")
and then I get this back:
<?xml version='1.0' encoding='ISO-8859-1'?>
<Document protocol="OCI" xmlns="C">
<sessionId>xmlns=78546587854</sessionId>
<command xmlns="http://www.w3.org/2001/XMLSchema-instance" xsitype="UserGetRegistrationListRequest">
<userId>data</userId>
</command>
</Document>

QName can be used to create the xsi:type attribute.
from lxml import etree
root = etree.Element("Document", protocol="OCI", xmlns="C")
# Create sessionId element
sessionId = etree.SubElement(root, "sessionId")
sessionId.text = "xmlns=78546587854"
# Create xsi:type attribute using QName
xsi_type = etree.QName("http://www.w3.org/2001/XMLSchema-instance", "type")
# Create command element, with xsi:type attribute
command = etree.SubElement(root, "command", {xsi_type: "UserGetRegistrationListRequest"})
# Create userId element
userID = etree.SubElement(command, "userId")
userID.text = "data"
Resulting XML (with the proper xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" declaration):
<?xml version='1.0' encoding='ISO-8859-1'?>
<Document protocol="OCI" xmlns="C">
<sessionId>xmlns=78546587854</sessionId>
<command xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="UserGetRegistrationListRequest">
<userId>data</userId>
</command>
</Document>
Note that the xsi prefix does not need to be explicitly defined in the Python code. lxml defines default prefixes for some well-known namespace URIs, including xsi for http://www.w3.org/2001/XMLSchema-instance.
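For comparison, the same idea can be sketched with the standard library's xml.etree.ElementTree instead of lxml (a swapped-in library, not what the answer above uses). Here the namespaced attribute name is written in the "{uri}local" form, and register_namespace() asks the serializer to use the xsi prefix. The xmlns="C" attribute from the question is left out to keep the sketch short.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Ask the serializer to use the "xsi" prefix for this namespace URI.
ET.register_namespace("xsi", XSI)

root = ET.Element("Document", {"protocol": "OCI"})

# "{uri}type" is the namespace-qualified attribute name; it is written
# out as xsi:type when the tree is serialized.
command = ET.SubElement(
    root, "command", {"{%s}type" % XSI: "UserGetRegistrationListRequest"}
)
user_id = ET.SubElement(command, "userId")
user_id.text = "data"

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

One visible difference from the lxml output: the standard library writes the xmlns:xsi declaration on the root element rather than on command, which is equivalent for XML consumers.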

Combine XML files similar to ConfigParser's multiple file support

I'm writing an application configuration module that uses XML in its files. Consider the following example:
<?xml version="1.0" encoding="UTF-8"?>
<Settings>
<PathA>/Some/path/to/directory</PathA>
<PathB>/Another/path</PathB>
</Settings>
Now, I'd like to override certain elements in a different file that gets loaded afterwards. Example of the override file:
<?xml version="1.0" encoding="UTF-8"?>
<Settings>
<PathB>/Change/this/path</PathB>
</Settings>
When querying the document (with overrides) with XPath, I'd like to get this as the element tree:
<?xml version="1.0" encoding="UTF-8"?>
<Settings>
<PathA>/Some/path/to/directory</PathA>
<PathB>/Change/this/path</PathB>
</Settings>
This is similar to what Python's ConfigParser does with its read() method, but done with XML. How can I implement this?

You could convert the XML into an instance of a Python class:
import lxml.etree as ET
class Settings(object):
    def __init__(self, text):
        # Encode to bytes: lxml rejects str input that carries an encoding declaration.
        root = ET.fromstring(text.encode('utf-8'))
        # Map each child tag of <Settings> to its text.
        self.settings = dict((elt.tag, elt.text) for elt in root.xpath('/Settings/*'))
    def update(self, other):
        # Values from the other Settings object override the existing ones.
        self.settings.update(other.settings)
text='''\
<?xml version="1.0" encoding="UTF-8"?>
<Settings>
<PathA>/Some/path/to/directory</PathA>
<PathB>/Another/path</PathB>
</Settings>'''
text2='''\
<?xml version="1.0" encoding="UTF-8"?>
<Settings>
<PathB>/Change/this/path</PathB>
</Settings>'''
s=Settings(text)
s2=Settings(text2)
s.update(s2)
print(s.settings)
which yields (on Python 3.7+, where dicts keep insertion order):
{'PathA': '/Some/path/to/directory', 'PathB': '/Change/this/path'}
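If the result has to remain an element tree that can still be queried with paths, as the question asks, one possible approach is to merge the override tree into the base tree directly. This is only a sketch under a simplifying assumption: overrides are matched by tag name among the direct children of <Settings>, with no nesting or repeated tags. The function name merge_settings is hypothetical, and the standard library is used here in place of lxml.

```python
import xml.etree.ElementTree as ET

BASE = """<Settings>
<PathA>/Some/path/to/directory</PathA>
<PathB>/Another/path</PathB>
</Settings>"""

OVERRIDE = """<Settings>
<PathB>/Change/this/path</PathB>
</Settings>"""


def merge_settings(base_root, override_root):
    """Apply override_root's direct children onto base_root, matched by tag."""
    for override in override_root:
        existing = base_root.find(override.tag)
        if existing is None:
            # Setting only present in the override file: add it.
            base_root.append(override)
        else:
            # Setting present in both: the later file wins.
            existing.text = override.text
            existing.attrib.update(override.attrib)
    return base_root


merged = merge_settings(ET.fromstring(BASE), ET.fromstring(OVERRIDE))
print(merged.findtext("PathA"), merged.findtext("PathB"))
```

Calling merge_settings() once per file, in load order, gives behaviour similar to passing several filenames to ConfigParser.read(): later files override earlier ones.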

Must you use XML? The same could be achieved much more simply with JSON.
Suppose this is the text from the first config file:
text='''
{
"PathB": "/Another/path",
"PathA": "/Some/path/to/directory"
}
'''
and this is the text from the second:
text2='''{
"PathB": "/Change/this/path"
}'''
Then, to merge the two, you simply load each into a dict and call update:
import json
config=json.loads(text)
config2=json.loads(text2)
config.update(config2)
print(config)
which yields the following Python dict (shown as printed by Python 3):
{'PathB': '/Change/this/path', 'PathA': '/Some/path/to/directory'}
