python lxml element attrib issue - python

I have to build an XML file that looks like the following:
<?xml version='1.0' encoding='ISO-8859-1'?>
<Document protocol="OCI" xmlns="C">
<sessionId>xmlns=874587878</sessionId>
<command xmlns="http://www.w3.org/2001/XMLSchema-instance" xsi:type="UserGetRegistrationListRequest">
<userId>data</userId>
</command>
</Document>
I got everything working except for the command attribute xsi:type="UserGetRegistrationListRequest".
I can't get the colon (:) into the attribute name of the command element.
Can someone please help me with this issue?
I am using Python 3.5.
My current code is:
from lxml import etree
root = etree.Element("Document", protocol="OCI", xmlns="C")
print(root.tag)
root.append(etree.Element("sessionId") )
sessionId=root.find("sessionId")
sessionId.text = "xmlns=78546587854"
root.append(etree.Element("command", xmlns="http://www.w3.org/2001/XMLSchema-instance",xsitype = "UserGetRegistrationListRequest" ) )
command=root.find("command")
userID = etree.SubElement(command, "userId")
userID.text = "data"
print(etree.tostring(root, pretty_print=True))
tree = etree.ElementTree(root)
tree.write('output.xml', pretty_print=True, xml_declaration=True, encoding="ISO-8859-1")
and then I get this back:
<?xml version='1.0' encoding='ISO-8859-1'?>
<Document protocol="OCI" xmlns="C">
<sessionId>xmlns=78546587854</sessionId>
<command xmlns="http://www.w3.org/2001/XMLSchema-instance" xsitype="UserGetRegistrationListRequest">
<userId>data</userId>
</command>
</Document>

QName can be used to create the xsi:type attribute.
from lxml import etree
root = etree.Element("Document", protocol="OCI", xmlns="C")
# Create sessionId element
sessionId = etree.SubElement(root, "sessionId")
sessionId.text = "xmlns=78546587854"
# Create xsi:type attribute using QName
xsi_type = etree.QName("http://www.w3.org/2001/XMLSchema-instance", "type")
# Create command element, with xsi:type attribute
command = etree.SubElement(root, "command", {xsi_type: "UserGetRegistrationListRequest"})
# Create userId element
userID = etree.SubElement(command, "userId")
userID.text = "data"
Resulting XML (with the proper xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" declaration):
<?xml version='1.0' encoding='ISO-8859-1'?>
<Document protocol="OCI" xmlns="C">
<sessionId>xmlns=78546587854</sessionId>
<command xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="UserGetRegistrationListRequest">
<userId>data</userId>
</command>
</Document>
Note that the xsi prefix does not need to be explicitly defined in the Python code. lxml defines default prefixes for some well-known namespace URIs, including xsi for http://www.w3.org/2001/XMLSchema-instance.
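As a complement to the QName approach, here is a minimal sketch of the same idea written with Clark notation ("{namespace-uri}local-name") as the attribute key. It is an assumption of this sketch that the standard library's xml.etree.ElementTree is used instead of lxml, so that it runs without extra packages; lxml accepts the same notation for attribute names. Only the command element is serialised here.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Clark notation: "{namespace-uri}local-name" names a namespaced attribute.
type_attr = "{" + XSI + "}type"

command = ET.Element("command", {type_attr: "UserGetRegistrationListRequest"})
user_id = ET.SubElement(command, "userId")
user_id.text = "data"

# The serialiser turns the namespace URI back into a prefix declaration.
xml_text = ET.tostring(command, encoding="unicode")
print(xml_text)
```

The key point is that the attribute is identified by its namespace URI plus local name; the xsi: prefix only appears when the tree is serialised.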

Related

How to rewrite this XML file?

I am trying to rewrite an XML file containing this XML code:
<?xml version="1.0" encoding="UTF-8"?>
<BrowserAutomationStudioProject>
<ModelList>
<Model>
<Name>token</Name>
<Description ru="token" en="token"/>
<Value>5660191076:AAEY8RI3hXcI3dEvjWAj7p2e7DdxOMNjPfk8</Value>
</Model>
<Defaults/>
<Model>
<Name>chat_id</Name>
<Value>5578940124</Value>
</Model>
<Defaults/>
</ModelList>
</BrowserAutomationStudioProject>
My python code:
import xml.etree.ElementTree as ET
tree = ET.parse('Actual.xml')
root = tree.getroot()
for model in root.findall('Model'):
    name = model.find('Name').text
    if name == 'token':
        model.find('Value').text = '123456789:ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    if name == 'chat_id':
        model.find('Value').text = '1234567890'
tree.write('xml_file.xml')
It works but I get the same file:
<?xml version="1.0" encoding="UTF-8"?>
<BrowserAutomationStudioProject>
<ModelList>
<Model>
<Name>token</Name>
<Description ru="token" en="token"/>
<Value>5660191076:AAEY8RI3hXcI3dEvjWAj7p2e7DdxOMNjPfk8</Value>
</Model>
<Defaults/>
<Model>
<Name>chat_id</Name>
<Value>5578940124</Value>
</Model>
<Defaults/>
</ModelList>
</BrowserAutomationStudioProject>
What's wrong with my code?
Even ChatGPT can't help me haha
I even tried printing it, but that doesn't work either.
What should I do?
Please help me.
As described in the documentation, Element.findall() finds only elements with a matching tag that are direct children of the current element. To make ET select all subelements, on all levels beneath the current element, prefix the path with .//.
Since <Model> is not a direct child of root (it's a grandchild, or something to that effect :)), root.findall('Model') finds nothing. So to get ET to find it, you need to modify that to
root.findall('.//Model')
and it should work.
You could also use for model in root.findall('ModelList/Model').
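To make the difference between the three paths concrete, here is a minimal sketch. The XML is a trimmed-down, inline copy of the file from the question with placeholder values (an assumption made so the snippet runs without Actual.xml), and the new_values dictionary is a hypothetical helper introduced for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the XML from the question (values are placeholders).
XML = """<BrowserAutomationStudioProject>
<ModelList>
<Model><Name>token</Name><Value>old-token</Value></Model>
<Defaults/>
<Model><Name>chat_id</Name><Value>old-chat-id</Value></Model>
<Defaults/>
</ModelList>
</BrowserAutomationStudioProject>"""

root = ET.fromstring(XML)

# 'Model' only matches direct children of the root, so nothing is found.
direct = root.findall("Model")

# './/Model' searches all levels; 'ModelList/Model' spells out the path.
anywhere = root.findall(".//Model")
by_path = root.findall("ModelList/Model")

# Hypothetical lookup table: element name -> replacement value.
new_values = {
    "token": "123456789:ABCDEFGHIJKLMNOPQRSTUVWXYZ",
    "chat_id": "1234567890",
}
for model in anywhere:
    name = model.find("Name").text
    if name in new_values:
        model.find("Value").text = new_values[name]

print(len(direct), len(anywhere), len(by_path))
print([value.text for value in root.iter("Value")])
```

Because the original loop iterated over an empty list, no assignment ever ran, which is why the written file was identical to the input.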
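If you know the order of the Value tags, you can instead pop() the new values from a list while iterating through the tree. Note that pop() removes items from the end of the list, so the list holds the values in reverse document order: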
import xml.etree.ElementTree as ET
tree = ET.parse('Actual.xml')
root = tree.getroot()
input_value = ['1234567890','123456789:ABCDEFGHIJKLMNOPQRSTUVWXYZ']
for elem in root.iter():
    if elem.tag == "Value":
        elem.text = input_value.pop()
    print(elem.tag, elem.text)
tree.write('xml_file.xml')
Output:
<?xml version="1.0"?>
<BrowserAutomationStudioProject>
<ModelList>
<Model>
<Name>token</Name>
<Description ru="token" en="token" />
<Value>123456789:ABCDEFGHIJKLMNOPQRSTUVWXYZ</Value>
</Model>
<Defaults />
<Model>
<Name>chat_id</Name>
<Value>1234567890</Value>
</Model>
<Defaults />
</ModelList>
</BrowserAutomationStudioProject>

Issue with python script while parsing pom file in project

I'm having an issue extracting the version number using a Python script. It returns None when I run the script. Can someone help me with this?
Python Script:
import xml.etree.ElementTree as ET
tree = ET.parse('pom.xml')
root = tree.getroot()
releaseVersion = root.find("version")
print(releaseVersion)
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://maven.apache.org/POM/4.0.0"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
<artifactId>watcher</artifactId>
<version>0.0.1-SNAPSHOT</version>
<groupId>com.test</groupId>
<name>file</name>
<packaging>jar</packaging>
<parent>
<artifactId>spring-boot-starter-parent</artifactId>
<groupId>org.springframework.boot</groupId>
<relativePath/>
<version>2.6.1</version>
</parent>
</project>
You're not taking into account that all of your elements are in a default namespace defined by xmlns="http://maven.apache.org/POM/4.0.0" in your <project> element.
So you have to create your query with this namespace.
import xml.etree.ElementTree as ET
tree = ET.parse('pom.xml')
root = tree.getroot()
NS = {'maven': 'http://maven.apache.org/POM/4.0.0'}
releaseVersion = root.find("maven:version", NS)
print(releaseVersion.text)
Here NS = { ... } maps the prefix maven to the namespace URI; that prefix is then used in the path expression passed to find().
Your pom.xml has a namespace xmlns="http://maven.apache.org/POM/4.0.0" in the project tag.
If you want to search by the fully qualified name, use the form {namespace}tag:
>>> root.find("{http://maven.apache.org/POM/4.0.0}version")
<Element '{http://maven.apache.org/POM/4.0.0}version' at 0x0000014635EC0A40>
If you don't care about the namespace, you can search with the {*}tag wildcard (supported since Python 3.8):
>>> root.find("{*}version")
<Element '{http://maven.apache.org/POM/4.0.0}version' at 0x0000014635EC0A40>
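The lookups discussed in these answers can be compared side by side in a small sketch. The POM below is a shortened inline copy of the one in the question (an assumption made so that the snippet runs without a pom.xml on disk), and the names POM and MAVEN are introduced here only for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened inline copy of the pom.xml from the question.
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <artifactId>watcher</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <parent>
    <version>2.6.1</version>
  </parent>
</project>"""

MAVEN = "http://maven.apache.org/POM/4.0.0"
root = ET.fromstring(POM)

# Without a namespace, the path does not match the namespaced element.
no_namespace = root.find("version")

# Prefix mapping: the prefix name is arbitrary, only the URI matters.
with_prefix = root.find("maven:version", {"maven": MAVEN})

# Clark notation: the URI is written directly into the path.
clark = root.find("{" + MAVEN + "}version")

print(no_namespace)
print(with_prefix.text, clark.text)
```

Since find() only looks at direct children for a plain tag path, both namespaced lookups return the project's own version rather than the one inside parent.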

How to get the content of child->child->child->child in XML file using Python

<?xml version="1.0" encoding="UTF-8"?>
<Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.056.001.01" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<FIToFIPmtCxlReq>
<Assgnmt>
<Id>TEST-ISO-81</Id>
<Assgnr>
<Agt>
<FinInstnId>
<BIC>CCCCGB2L</BIC>
</FinInstnId>
</Agt>
</Assgnr>
<Assgne>
<Agt>
<FinInstnId>
<BIC>MMMMGB2L</BIC>
</FinInstnId>
</Agt>
</Assgne>
<CreDtTm>2009-03-24T11:22:59</CreDtTm>
</Assgnmt>
<Undrlyg>
<TxInf>
<CxlId>103012345</CxlId>
<Case>
<Id>ISO_TEST_CASE</Id>
<Cretr>
<Agt>
<FinInstnId>
<BIC>MMMMGB2L</BIC>
</FinInstnId>
</Agt>
</Cretr>
</Case>
</TxInf>
</Undrlyg>
</FIToFIPmtCxlReq>
</Document>
Here I want to get the content of "TxInf": all of its children, their children, and the data.
What I have tried is:
import xml.etree.ElementTree as ET
from xml.etree import ElementTree
tree = ET.parse('R3-CAMT.056.001.07-ISO-V.XML')
root = tree.getroot()
for element in root.iter():
    if element.tag == "{urn:iso:std:iso:20022:tech:xsd:camt.056.001.01}TxInf":
        tree._setroot(element.tag)
print(root.tag)
print(root.attrib)
Please suggest whether I can change the root with _setroot, or any other possible method.
Try something along these lines on your code to see if it works:
for r in root.findall(".//*"):
    if 'TxInf' in r.tag:
        print(ET.tostring(r))
By the way, it may be easier to do it with lxml, if you can use it.
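If you stay with the standard library, a namespace-aware variant might look like the following sketch. The XML is a shortened inline version of the document in the question (an assumption, including the Undrlyg wrapper around TxInf), and the names NS, leaves and sub_tree are introduced here for illustration. Instead of the private _setroot(), it wraps the found element in a new ElementTree.

```python
import xml.etree.ElementTree as ET

NS = {"c": "urn:iso:std:iso:20022:tech:xsd:camt.056.001.01"}

# Shortened inline version of the document from the question.
XML = """<Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.056.001.01">
<FIToFIPmtCxlReq>
<Undrlyg>
<TxInf>
<CxlId>103012345</CxlId>
<Case><Id>ISO_TEST_CASE</Id></Case>
</TxInf>
</Undrlyg>
</FIToFIPmtCxlReq>
</Document>"""

root = ET.fromstring(XML)

# Find TxInf at any depth, using a prefix mapped to the default namespace.
tx_inf = root.find(".//c:TxInf", NS)

# Walk every descendant; strip the "{uri}" part of the tag for readability.
leaves = {}
for elem in tx_inf.iter():
    text = (elem.text or "").strip()
    if text:
        leaves[elem.tag.split("}", 1)[-1]] = text

# A new tree rooted at TxInf, instead of calling the private _setroot().
sub_tree = ET.ElementTree(tx_inf)

print(leaves)
print(sub_tree.getroot().tag)
```

Note that the dictionary keeps only one value per local tag name, so repeated tags (such as several Id elements) would overwrite each other; a list of (tag, text) pairs avoids that.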

How to force ElementTree to keep xmlns attribute within its original element?

I have an input XML file:
<?xml version='1.0' encoding='utf-8'?>
<configuration>
<runtime name="test" version="1.2" xmlns:ns0="urn:schemas-microsoft-com:asm.v1">
<ns0:assemblyBinding>
<ns0:dependentAssembly />
</ns0:assemblyBinding>
</runtime>
</configuration>
...and Python script:
import xml.etree.ElementTree as ET
file_xml = 'test.xml'
tree = ET.parse(file_xml)
root = tree.getroot()
print (root.tag)
print (root.attrib)
element_runtime = root.find('.//runtime')
print (element_runtime.tag)
print (element_runtime.attrib)
tree.write(file_xml, xml_declaration=True, encoding='utf-8', method="xml")
...which gives the following output:
>test.py
configuration
{}
runtime
{'name': 'test', 'version': '1.2'}
...and has an undesirable side-effect of modifying XML into:
<?xml version='1.0' encoding='utf-8'?>
<configuration xmlns:ns0="urn:schemas-microsoft-com:asm.v1">
<runtime name="test" version="1.2">
<ns0:assemblyBinding>
<ns0:dependentAssembly />
</ns0:assemblyBinding>
</runtime>
</configuration>
My original script modifies the XML, so I do have to call tree.write and save the edited file. The problem is that the ElementTree parser moves the xmlns attribute from the runtime element up to the root element configuration, which is not desirable in my case.
I can't remove the xmlns attribute from the root element (by deleting it from its attribute dictionary) because it is not listed among its attributes (unlike the attributes listed for the runtime element).
Why does the xmlns attribute never get listed among the attributes of any element?
How can I force ElementTree to keep the xmlns attribute within its original element?
I am using Python 3.5.1 on Windows.
xml.etree.ElementTree pulls all namespaces into the first element as it internally doesn't track on which element the namespace was declared originally.
If you don't want that, you'll have to write your own serialisation logic.
The better alternative would be to use lxml instead of xml.etree, because it preserves the location where a namespace prefix is declared.
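The behaviour described above can be reproduced with a short sketch. It parses the XML from the question as an inline string (an assumption made so the snippet runs without test.xml) and shows that the declaration is not kept as an attribute and reappears on the root when serialised.

```python
import xml.etree.ElementTree as ET

# Inline copy of the input file from the question.
XML = """<configuration>
<runtime name="test" version="1.2" xmlns:ns0="urn:schemas-microsoft-com:asm.v1">
<ns0:assemblyBinding><ns0:dependentAssembly /></ns0:assemblyBinding>
</runtime>
</configuration>"""

root = ET.fromstring(XML)
runtime = root.find("runtime")

# The parser consumes xmlns declarations; they never appear in .attrib.
print(runtime.attrib)

# The namespace survives only inside the qualified tag names.
print(runtime[0].tag)

# On output, the declarations are regenerated on the serialised root element.
serialised = ET.tostring(root, encoding="unicode")
print(serialised.splitlines()[0])
```

This is why the declaration cannot be deleted from any attribute dictionary: after parsing, it exists only as the URI part of the tag names.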
Following @mata's advice, here is an answer with example code and the XML file attached.
The XML input is as shown in the picture (original and modified).
The Python code checks the NtnlCcy value and, if it is "EUR", converts the Price to USD (by multiplying by EURUSD = 1.2) and changes the NtnlCcy value to "USD".
The Python code is as follows:
from lxml import etree
pathToXMLfile = r"C:\Xiang\codes\Python\afmreports\test_original.xml"
tree = etree.parse(pathToXMLfile)
root = tree.getroot()
EURUSD = 1.2
for Rchild in root:
    print("Root child: ", Rchild.tag, ". \n")
    if Rchild.tag.endswith("Pyld"):
        for PyldChild in Rchild:
            print("Pyld Child: ", PyldChild.tag, ". \n")
        Doc = Rchild.find('{001.003}Document')
        FinInstrNodes = Doc.findall('{001.003}FinInstr')
        for FinInstrNode in FinInstrNodes:
            FinCcyNode = FinInstrNode.find('{001.003}NtnlCcy')
            FinPriceNode = FinInstrNode.find('{001.003}Price')
            FinCcyNodeText = ""
            if FinCcyNode is not None:
                CcyNodeText = FinCcyNode.text
                if CcyNodeText == "EUR":
                    PriceText = FinPriceNode.text
                    Price = float(PriceText)
                    FinPriceNode.text = str(Price * EURUSD)
                    FinCcyNode.text = "USD"
tree.write(r"C:\Xiang\codes\Python\afmreports\test_modified.xml", encoding="utf-8", xml_declaration=True)
print("\n the program runs to the end! \n")
When we compare the original and modified XML files, the namespaces and the overall structure are unchanged; only some NtnlCcy and Price nodes have changed, as desired.
The only minor unwanted difference is the first line. In the original XML file, it is <?xml version="1.0" encoding="UTF-8"?>, while in the modified XML file, it is <?xml version='1.0' encoding='UTF-8'?>. The quotation marks change from double to single, but this minor difference should not matter.
The original file content is attached for easy testing:
<?xml version="1.0" encoding="UTF-8"?>
<BizData xmlns="001.001">
<Hdr>
<AppHdr xmlns="001.002">
<Fr>
<Id>XXX01</Id>
</Fr>
<To>
<Id>XXX02</Id>
</To>
<CreDt>2019-10-25T15:38:30</CreDt>
</AppHdr>
</Hdr>
<Pyld>
<Document xmlns="001.003">
<FinInstr>
<Id>NLENX240</Id>
<FullNm>AO.AAI</FullNm>
<NtnlCcy>EUR</NtnlCcy>
<Price>9</Price>
</FinInstr>
<FinInstr>
<Id>NLENX681</Id>
<FullNm>AO.ABN</FullNm>
<NtnlCcy>USD</NtnlCcy>
<Price>10</Price>
</FinInstr>
<FinInstr>
<Id>NLENX320</Id>
<FullNm>AO.ING</FullNm>
<NtnlCcy>EUR</NtnlCcy>
<Price>11</Price>
</FinInstr>
</Document>
</Pyld>
</BizData>

Get all contents between the result tags of a SOAP response in Python

I have this SOAP response :
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<GetCurrencyCodeByCurrencyNameResponse xmlns="http://www.webserviceX.NET">
<GetCurrencyCodeByCurrencyNameResult>
<NewDataSet />
</GetCurrencyCodeByCurrencyNameResult>
</GetCurrencyCodeByCurrencyNameResponse>
</soap:Body></soap:Envelope>
And I use this code to get the contents of the result tag:
import xml.etree.ElementTree as ET
root = ET.fromstring(SoapResponse)
child=root[0][0][0]
contenu= child.text
But when I have a response which contains other tags inside the results tag (other children) like this SOAP response :
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<GetUserInfoResponse xmlns="http://tempuri.org/">
<GetUserInfoResult>
<ErrorOccured>true</ErrorOccured>
<ErrorStr>System.Data.OleDb.OleDbException: Conversion failed when converting the varchar value '4CuTrO8O6Tn' to data type int.
at System.Data.OleDb.OleDbDataReader.ProcessResults(OleDbHResult hr)
at System.Data.OleDb.OleDbDataReader.NextResult()
at System.Data.OleDb.OleDbCommand.ExecuteReaderInternal(CommandBehavior behavior, String method)
at System.Data.OleDb.OleDbCommand.ExecuteReader(CommandBehavior behavior)
at Service.GetUserInfo(String username, String password)
</ErrorStr>
<SqlQuery>SELECT * FROM users WHERE username=''+(select convert(int,CHAR(52)+CHAR(67)+CHAR(117)+CHAR(84)+CHAR(114)+CHAR(79)+CHAR(56)+CHAR(79)+CHAR(54)+CHAR(84)+CHAR(110)) FROM syscolumns)+'' AND password='32cc5886dc1fa8c106a02056292c4654'
</SqlQuery><id>-1</id><joindate>0001-01-01T00:00:00</joindate>
</GetUserInfoResult>
</GetUserInfoResponse>
</soap:Body></soap:Envelope>
I cannot get the contents between the result tags with the previous code.
So, how can I get the whole contents between the result tags of a SOAP response?
I'm not quite clear on exactly what you want, but this might do it:
# This gets all of the text data in the indicated region
import xml.etree.ElementTree as ET
root = ET.fromstring(SoapResponse)
child=root[0][0][0]
contenu = ET.tostring(child, encoding='UTF-8', method='text').decode('UTF-8')
Or
# This gets the indicated XML fragment as a string
import xml.etree.ElementTree as ET
root = ET.fromstring(SoapResponse)
child = root[0][0][0]
# tostring() returns bytes when an encoding is given, so decode to get a str
contenu = ET.tostring(child, encoding='UTF-8', method='xml').decode('UTF-8')
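If what you need is the individual values rather than one string, a sketch along these lines may help. The SOAP text is a shortened inline version of the second response in the question (an assumption), and the result element is located by its namespace instead of by position; NS, fields and all_text are names introduced here for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened inline version of the second SOAP response from the question.
SOAP = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<GetUserInfoResponse xmlns="http://tempuri.org/">
<GetUserInfoResult>
<ErrorOccured>true</ErrorOccured>
<id>-1</id>
</GetUserInfoResult>
</GetUserInfoResponse>
</soap:Body>
</soap:Envelope>"""

NS = {"t": "http://tempuri.org/"}
root = ET.fromstring(SOAP)

# Locate the result element by name rather than by index.
result = root.find(".//t:GetUserInfoResult", NS)

# Map each child's local tag name to its text.
fields = {child.tag.split("}", 1)[-1]: child.text for child in result}

# All text below the result element, joined into one string.
all_text = "".join(result.itertext()).strip()

print(fields)
print(all_text)
```

The reason child.text alone is not enough is that text only holds the characters before the first child element; itertext() walks the whole subtree.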
