Editing existing XML file and sending post via Jboss - python

I have the following Python method, which runs through an XML file, parses it, and TRIES to edit a field:
import requests
import xml.etree.ElementTree as ET
import random
from lxml import etree

def runThrougheTree():
    #open xml file
    with open("testxml.xml") as xml:
        #parse
        parser = etree.XMLParser(strip_cdata=True, recover=True)
        tree = etree.parse("testxml.xml", parser)
        root = tree.getroot()
        #ATTEMPT to edit field - will not work as of now
        for ci in root.iter("CurrentlyInjured"):
            ci.text = randomCurrentlyInjured(['sffdgdg', 'sdfsdfdsfsfsfsd', 'sfdsdfsdfds'])
        #Overwrite initial xml file with new fields - will not work as of now
        etree.ElementTree(root).write("testxml.xml", pretty_print=True, encoding='utf-8', xml_declaration=True)
        #send post (Jboss)
        requests.post('http://localhost:9000/something/RuleServiceImpl', data="testxml.xml")

def randomCurrentlyInjured(ran):
    random.shuffle(ran)
    return ran[0]

#-----------------------------------------------
if __name__ == "__main__":
    runThrougheTree()
Edited XML file:
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:rule="http://somewebsite.com/" xmlns:ws="http://somewebsite.com/" xmlns:bus="http://somewebsite.com/">
  <soapenv:Header/>
  <soapenv:Body>
    <ws:Respond>
      <ws:busMessage>
        <bus:SomeRef>insertnumericvaluehere</bus:SomeRef>
        <bus:Content><![CDATA[<SomeDef>
          <SomeType>ABCD</SomeType>
          <A_Message>
            <body>
              <AnonymousField>
                <RefIndicator>1111111111111</RefIndicator>
                <OneMoreType>HIJK</OneMoreType>
                <CurrentlyInjured>ABCDE</CurrentlyInjured>
              </AnonymousField>
            </body>
          </A_Message>
        </SomeDef>]]></bus:Content>
        <bus:MessageTypeId>somenumericvalue</bus:MessageTypeId>
      </ws:busMessage>
    </ws:Respond>
  </soapenv:Body>
</soapenv:Envelope>
Issues:
The field is not being edited.
Jboss error: Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
Message: Content is not allowed in prolog.
Note: I have ensured that there are no characters before the first XML tag.

In the end, I was unable to use lxml or ElementTree to edit the fields and post to JBoss because:
I had CDATA in the XML, as mzjn pointed out in the comments.
JBoss did not like the request after it had been parsed, even when the CDATA tags were removed.
Workaround/eventual solution: I was able to (somewhat tediously) use .replace() in my script to edit the plain text successfully, and then send the POST to JBoss. I hope this helps someone else someday!
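For anyone who would rather not fall back on .replace(): the element is never found because everything inside the CDATA section is plain text to the parser, not child elements. One possible approach is to parse that text as a second document, edit it, and write it back. The sketch below uses the standard library's xml.etree.ElementTree rather than lxml, and the trimmed sample envelope and the function name set_currently_injured are made up for illustration. Note that ElementTree writes the inner markup back escaped rather than wrapped in CDATA; a conforming XML parser should treat the two the same, but whether the JBoss service accepts it is an assumption.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
BUS_NS = "http://somewebsite.com/"  # placeholder URI taken from the question

# Trimmed-down, well-formed stand-in for testxml.xml (hypothetical sample).
ENVELOPE = """<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:bus="http://somewebsite.com/">
  <soapenv:Body>
    <bus:Content><![CDATA[<SomeDef>
      <CurrentlyInjured>ABCDE</CurrentlyInjured>
    </SomeDef>]]></bus:Content>
  </soapenv:Body>
</soapenv:Envelope>"""


def set_currently_injured(envelope_xml, new_value):
    """Return the envelope as UTF-8 bytes with CurrentlyInjured replaced."""
    # Keep the original prefixes instead of generated ns0/ns1 ones.
    ET.register_namespace("soapenv", SOAP_NS)
    ET.register_namespace("bus", BUS_NS)
    root = ET.fromstring(envelope_xml)
    content = root.find(".//{%s}Content" % BUS_NS)
    # The CDATA section arrives as plain text, so parse it as its own document.
    inner = ET.fromstring(content.text)
    for node in inner.iter("CurrentlyInjured"):
        node.text = new_value
    # ElementTree writes the markup back escaped (&lt;...&gt;), not as CDATA.
    content.text = ET.tostring(inner, encoding="unicode")
    return ET.tostring(root, encoding="utf-8")


payload = set_currently_injured(ENVELOPE, "XYZ")
```

The resulting bytes could then be sent with something like requests.post(url, data=payload, headers={'Content-Type': 'text/xml; charset=utf-8'}). Passing data="testxml.xml" sends the literal file name as the request body, which is one likely cause of the "Content is not allowed in prolog" error.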

Related

Don't get a usable XML string with .tostring when using xml.etree.ElementTree in python

I've tried a lot, but I haven't found a working solution to my problem, so I hope you can help me:
I am about to write a python module, which sends an XML request to a DNS server, receives an XML as response and should process this response. However, I am already failing in sending the request.
I have an XML base structure which I want to fill with different elements depending on the action to be performed on the DNS.
For this purpose I read in an XML string with .fromstring, edit the xml object and want to send it back to the server with .tostring. The problem is that .tostring does not return a usable xml string. The following example shows what I mean:
import xml.etree.ElementTree as ET
import requests
headers = {'HeaderSOAP': 'SOAPAction:urn:QIPServices#getEntry'}
body = """<soapenv:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:urn="urn:QIPServices">
  <soapenv:Header/>
  <soapenv:Body>
    <urn:getEntry soapenv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <login xsi:type="xsd:string">USERNAME</login>
      <password xsi:type="xsd:string">PASSWD</password>
      <sharedsecret xsi:type="xsd:string">SECRET</sharedsecret>
      <VPN xsi:type="xsd:string">VPN</VPN>
      <IPoderName xsi:type="xsd:string">IPorFQDN</IPoderName>
    </urn:getEntry>
  </soapenv:Body>
</soapenv:Envelope>
"""
root = ET.fromstring(body)
ThatsTheProblem = ET.tostring(root, encoding='utf-8')
print(ThatsTheProblem)
returns:
b'<ns0:Envelope xmlns:ns0="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ns1="urn:QIPServices" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">\n <ns0:Header />\n <ns0:Body>\n <ns1:getEntry ns0:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">\n <login xsi:type="xsd:string">USERNAME</login>\n <password xsi:type="xsd:string">PASSWD</password>\n <sharedsecret xsi:type="xsd:string">SECRET</sharedsecret>\n <VPN xsi:type="xsd:string">VPN</VPN>\n <IPoderName xsi:type="xsd:string">IPorFQDN</IPoderName>\n </ns1:getEntry>\n </ns0:Body>\n </ns0:Envelope>'
Although I changed nothing, the round trip through .fromstring and .tostring not only changed the complete formatting (the prefixes became ns0 and ns1), it also left whitespace everywhere in the output. When I send this XML to the server using
response = requests.post(url,data=ThatsTheProblem, headers=headers)
I get the following answer:
Application failed during request deserialization: Unresolved prefix \'xsd\' for attribute value \'xsd:string\'\n
Which I attribute to the problem I described at the beginning.
If anyone has a solution to this problem I would be very grateful.
Thanks and have a nice day.
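A note on the likely cause: the xsd prefix is used only inside attribute values (xsi:type="xsd:string"), never in an element or attribute name, so ElementTree does not know it is needed and drops the xmlns:xsd declaration on output. The renamed ns0/ns1 prefixes are harmless by themselves. A minimal sketch of one workaround, assuming it is acceptable to register the prefixes and re-add the xsd declaration by hand (the shortened sample body and the function name serialize_with_prefixes are illustrative only):

```python
import xml.etree.ElementTree as ET

# Prefixes to keep on output, copied from the request in the question.
NAMESPACES = {
    "soapenv": "http://schemas.xmlsoap.org/soap/envelope/",
    "urn": "urn:QIPServices",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
}
XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Shortened version of the request body (hypothetical sample).
BODY = """<soapenv:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:urn="urn:QIPServices">
  <soapenv:Body>
    <urn:getEntry>
      <login xsi:type="xsd:string">USERNAME</login>
    </urn:getEntry>
  </soapenv:Body>
</soapenv:Envelope>"""


def serialize_with_prefixes(xml_text):
    """Round-trip the document while keeping prefixes and the xsd declaration."""
    for prefix, uri in NAMESPACES.items():
        # Tell the serializer which prefix to use for each namespace URI.
        ET.register_namespace(prefix, uri)
    root = ET.fromstring(xml_text)
    # xsd appears only in attribute values, so declare it explicitly.
    root.set("xmlns:xsd", XSD_NS)
    return ET.tostring(root, encoding="utf-8")


payload = serialize_with_prefixes(BODY)
```

The alternative is lxml, which keeps the namespace declarations of the parsed document when it serializes.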

How to remove all " \n" in xml payload by using lxml library

I'm trying to change a text value in an XML file and return the updated XML content using the lxml library. I can update the value successfully, but the updated XML contains "\n" (newline) characters, as shown below.
Output:
<?xml version='1.0' encoding='ASCII'?>\n<Order>\n <content>\n <sID>123</sID>\n <spNumber>UserTemp</spNumber>\n <client>ARRCHANA</client>\n <orderType>Dashboard</orderType>\n </content>\n
<content>\n <sID>111</sID>\n <spNumber>UserTemp</spNumber>\n <client>ARRCHANA</client>\n <orderType>Dashboard</orderType>\n </content>\n
</Order>
Note: I didn't format the above XML output; I posted it exactly as I got it from the output console.
Input:
<Order>
<content>
<sID>123</sID>
<spNumber>UserTemp</spNumber>
<client>WALLMART</client>
<orderType>Dashboard</orderType>
</content>
<content>
<sID>111</sID>
<spNumber>UserTemp</spNumber>
<client>D&amp;B</client>
<orderType>Dashboard</orderType>
</content>
</Order>
Also, I tried to remove the \n character in output xml file by using
getValue = getValue.replace('\n','')
but, no luck.
The code below is what I used to update the XML (the <client> tag) and to try to return the updated XML content.
Python Code:
from lxml import etree
from io import StringIO
import six
import numpy

def getListOfNodes(location):
    f = open(location)
    xml = f.read()
    f.close()
    #print(xml)
    getXml = etree.parse(location)
    for elm in getXml.xpath('.//Order//content/client'):
        index='ARRCHANA'
        elm.text=index
    #with open('C:\\New folder\\temp.xml','w',newline='\r\n') as writeFile:
    #    writeFile.write(str(etree.tostring(getXml,pretty_print=True, xml_declaration=True)))
    getValue=str((etree.tostring(getXml,pretty_print=True, xml_declaration=True)))
    #getValue = getValue.replace('\n','')
    #getValue=getValue.replace("\n","<br/>")
    print(getValue)
    return getValue
When I try to open the response payload in the Firefox browser, it shows the following error message:
XML Parsing Error: no element found Location:
file:///C:/New%20folder/Confidential.xml
Line Number 1, Column 1:
It reports "no element found" at line 1, column 1 of the XML file when the file contains the "\n" characters.
Can somebody suggest a better way to update the text value and return it without any additional characters?
I fixed it myself by using the script below:
code = root.xpath('.//Order//content/client')
if code:
    code[0].text = 'ARRCHANA'
    etree.ElementTree(root).write(r'D:\test.xml', pretty_print=True)
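The literal \n sequences in the question's output most likely come from calling str() on the bytes object that tostring returns: in Python 3 that produces the repr (b'...') with escaped newlines, not decoded text, which is also why .replace('\n', '') finds nothing to replace. A minimal sketch of the difference, using the standard library's xml.etree.ElementTree as a stand-in for lxml (whose tostring also returns bytes by default); the sample document and the function name update_clients are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Small stand-in for the input file (hypothetical sample).
SOURCE = """<Order>
  <content>
    <client>WALLMART</client>
  </content>
  <content>
    <client>D&amp;B</client>
  </content>
</Order>"""


def update_clients(xml_text, new_client):
    """Set every <client> and return (str() of the bytes, decoded text)."""
    root = ET.fromstring(xml_text)
    for elm in root.iter("client"):
        elm.text = new_client
    raw = ET.tostring(root, encoding="utf-8")  # bytes, not str
    wrong = str(raw)            # repr of the bytes: "b'<Order>\\n ...'"
    right = raw.decode("utf-8")  # real text with real newlines
    return wrong, right


wrong, right = update_clients(SOURCE, "ARRCHANA")
```

Returning the decoded string (or the raw bytes) instead of str(bytes) should give a payload that a browser can parse.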

XPath with LXML Element

I am trying to parse an XML document using lxml etree. The XML doc I am parsing looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<metadata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.openarchives.org/OAI/2.0/">
<codeBook version="2.5" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="ddi:codebook:2_5" xsi:schemaLocation="ddi:codebook:2_5 http://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/codebook.xsd">
<docDscr>
<citation>
<titlStmt>
<titl>Test Title</titl>
</titlStmt>
<prodStmt>
<prodDate/>
</prodStmt>
</citation>
</docDscr>
<stdyDscr>
<citation>
<titlStmt>
<titl>Test Title 2</titl>
<IDNo agency="UKDA">101</IDNo>
</titlStmt>
<rspStmt>
<AuthEnty>TestAuthEntry</AuthEnty>
</rspStmt>
<prodStmt>
<copyright>Yes</copyright>
</prodStmt>
<distStmt/>
<verStmt>
<version date="">1</version>
</verStmt>
</citation>
<stdyInfo>
<subject>
<keyword>2009</keyword>
<keyword>2010</keyword>
<topcClas>CLASS</topcClas>
<topcClas>ffdsf</topcClas>
</subject>
<abstract>This is an abstract piece of text.</abstract>
<sumDscr>
<timePrd event="single">2020</timePrd>
<nation>UK</nation>
<anlyUnit>Test</anlyUnit>
<universe>test</universe>
<universe>hello</universe>
<dataKind>fdsfdsf</dataKind>
</sumDscr>
</stdyInfo>
<method>
<dataColl>
<timeMeth>test timemeth</timeMeth>
<dataCollector>test data collector</dataCollector>
<sampProc>test sampprocess</sampProc>
<deviat>test deviat</deviat>
<collMode>test collMode</collMode>
<sources/>
</dataColl>
</method>
<dataAccs>
<setAvail>
<accsPlac>Test accsPlac</accsPlac>
</setAvail>
<useStmt>
<restrctn>NONE</restrctn>
</useStmt>
</dataAccs>
<othrStdyMat>
<relPubl>122</relPubl>
<relPubl>12332</relPubl>
</othrStdyMat>
</stdyDscr>
</codeBook>
</metadata>
I wrote the following code to try and process it:
from lxml import etree
import pdb
f = open('/vagrant/out2.xml', 'r')
xml_str = f.read()
xml_doc = etree.fromstring(xml_str)
f.close()
From what I understand from the lxml xpath docs, I should be able to get the text from a specific element as follows:
xml_doc.xpath('/metadata/codeBook/docDscr/citation/titlStmt/titl/text()')
However, when I run this it returns an empty array.
The only xpath I can get to return something is using a wildcard:
xml_doc.xpath('*')
Which returns [<Element {ddi:codebook:2_5}codeBook at 0x7f8da8a413f8>].
I've read through the docs and I'm not understanding what is going wrong with this. Any help is appreciated.
You need to take the default namespaces into account, so instead of
xml_doc.xpath('/metadata/codeBook/docDscr/citation/titlStmt/titl/text()')
use
xml_doc.xpath(
    '/oai:metadata/ddi:codeBook/ddi:docDscr/ddi:citation/ddi:titlStmt/ddi:titl/text()',
    namespaces={
        'oai': 'http://www.openarchives.org/OAI/2.0/',
        'ddi': 'ddi:codebook:2_5'
    }
)
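To see the effect in isolation, here is a small self-contained sketch. It uses the standard library's xml.etree.ElementTree instead of lxml (its find methods take the same kind of prefix-to-URI mapping, but paths are relative to the root element), and the cut-down document is only a stand-in for the real file. The prefixes oai and ddi are arbitrary labels; only the URIs have to match the document.

```python
import xml.etree.ElementTree as ET

# Cut-down version of the document from the question (hypothetical sample).
DOC = """<metadata xmlns="http://www.openarchives.org/OAI/2.0/">
  <codeBook xmlns="ddi:codebook:2_5">
    <docDscr>
      <citation>
        <titlStmt>
          <titl>Test Title</titl>
        </titlStmt>
      </citation>
    </docDscr>
  </codeBook>
</metadata>"""

# Any prefix names work here as long as the URIs match the document.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "ddi": "ddi:codebook:2_5",
}

root = ET.fromstring(DOC)  # root is the oai:metadata element

# Without prefixes the path refers to elements in no namespace: no match.
unqualified = root.findall("codeBook/docDscr/citation/titlStmt/titl")

# With prefixes mapped to the default namespaces the element is found.
qualified = root.findtext(
    "ddi:codeBook/ddi:docDscr/ddi:citation/ddi:titlStmt/ddi:titl",
    namespaces=NS,
)
```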

Extracting nested namespace from a xml using lxml

I'm new to Python and currently learning to parse XML. All seems to be going well until I hit a wall with nested namespaces.
Below is a snippet of my XML, with the root element and the child element that I'm trying to parse:
<?xml version="1.0" encoding="UTF-8"?>
-<CompositionPlaylist xmlns="http://www.digicine.com/PROTO-ASDCP-CPL-20040511#">
<!-- Generated by orca_wrapping version 3.8.3-0 -->
<Id>urn:uuid:e0e43007-ca9b-4ed8-97b9-3ac9b272be7a</Id>
-------------
-------------
-------------
-<cc-cpl:MainClosedCaption xmlns:cc-cpl="http://www.digicine.com/PROTO- ASDCP-CC-CPL-20070926#"><Id>urn:uuid:0607e57f-edcc-46ec- 997a-d2fbc0c1ea3a</Id><EditRate>24 1</EditRate><IntrinsicDuration>2698</IntrinsicDuration></cc-cpl:MainClosedCaption>
------------
------------
------------
</CompositionPlaylist>
What I need is a way to extract the URI of the local name 'MainClosedCaption'. In this case, I'm trying to extract the string "http://www.digicine.com/PROTO- ASDCP-CC-CPL-20070926#". I looked through a lot of tutorials but cannot seem to find a solution.
If anyone out there can lend their expertise, it would be much appreciated.
Here is what I have done so far, with help from the two contributors:
#!/usr/bin/env python
from xml.etree import ElementTree as ET #import ElementTree module as an alias ET
from lxml import objectify, etree

def parse():
    import os
    import sys
    cpl_file = sys.argv[1]
    xml_file = os.path.abspath(__file__)
    xml_file = os.path.dirname(xml_file)
    xml_file = os.path.join(xml_file, cpl_file)
    with open(xml_file) as f:
        xml = f.read()
    tree = etree.XML(xml)
    caption_namespace = etree.QName(tree.find('.//{*}MainClosedCaption')).namespace
    print(caption_namespace)
    print(tree.nsmap)
    nsmap = {}
    for ns in tree.xpath('//namespace::*'):
        if ns[0]:
            nsmap[ns[0]] = ns[1]
    tree.xpath('//cc-cpl:MainClosedCaption', namespace=nsmap)
    return nsmap

if __name__=="__main__":
    parse()
But it's not working so far. I get the result 'None' when I use QName to locate the tag and its namespace. And when I try to collect all namespaces in the XML using a for loop, as suggested in another post, I get the error 'Unknown return type: dict'.
Any suggestions, please?
This program prints the namespace of the indicated tag:
from lxml import etree
xml = etree.XML('''<?xml version="1.0" encoding="UTF-8"?>
<CompositionPlaylist xmlns="http://www.digicine.com/PROTO-ASDCP-CPL-20040511#">
<!-- Generated by orca_wrapping version 3.8.3-0 -->
<Id>urn:uuid:e0e43007-ca9b-4ed8-97b9-3ac9b272be7a</Id>
<cc-cpl:MainClosedCaption xmlns:cc-cpl="http://www.digicine.com/PROTO-ASDCP-CC-CPL-20070926#">
<Id>urn:uuid:0607e57f-edcc-46ec- 997a-d2fbc0c1ea3a</Id>
<EditRate>24 1</EditRate>
<IntrinsicDuration>2698</IntrinsicDuration>
</cc-cpl:MainClosedCaption>
</CompositionPlaylist>
''')
print(etree.QName(xml.find('.//{*}MainClosedCaption')).namespace)
Result:
http://www.digicine.com/PROTO-ASDCP-CC-CPL-20070926#
Reference: http://lxml.de/tutorial.html#namespaces
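If lxml is not available, roughly the same lookup can be done with the standard library, because ElementTree stores each tag as "{namespace-uri}localname". The following is only a sketch under that assumption; the shortened sample document and the helper name namespace_of are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened version of the playlist from the question (hypothetical sample).
CPL = """<CompositionPlaylist xmlns="http://www.digicine.com/PROTO-ASDCP-CPL-20040511#">
  <Id>urn:uuid:e0e43007-ca9b-4ed8-97b9-3ac9b272be7a</Id>
  <cc-cpl:MainClosedCaption xmlns:cc-cpl="http://www.digicine.com/PROTO-ASDCP-CC-CPL-20070926#">
    <EditRate>24 1</EditRate>
  </cc-cpl:MainClosedCaption>
</CompositionPlaylist>"""


def namespace_of(root, local_name):
    """Return the namespace URI of the first element with this local name."""
    for elem in root.iter():
        # Tags look like "{uri}local"; split at the closing brace.
        uri, _, local = elem.tag.rpartition("}")
        if local == local_name:
            return uri.lstrip("{") or None
    return None


root = ET.fromstring(CPL)
caption_ns = namespace_of(root, "MainClosedCaption")
```

Note that a child without a prefix, such as EditRate here, inherits the default namespace of the root element rather than the cc-cpl namespace of its parent.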

Python xml etree DTD from a StringIO source?

I'm adapting the following code (created via advice in this question), which took an XML file and its DTD and converted them to a different format. For this problem only the loading section is important:
xmldoc = open(filename)
parser = etree.XMLParser(dtd_validation=True, load_dtd=True)
tree = etree.parse(xmldoc, parser)
This worked fine, whilst using the file system, but I'm converting it to run via a web framework, where the two files are loaded via a form.
Loading the xml file works fine:
tree = etree.parse(StringIO(data['xml_file']))
But as the DTD is linked to in the top of the xml file, the following statement fails:
parser = etree.XMLParser(dtd_validation=True, load_dtd=True)
tree = etree.parse(StringIO(data['xml_file']), parser)
Via this question, I tried:
etree.DTD(StringIO(data['dtd_file']))
tree = etree.parse(StringIO(data['xml_file']))
Whilst the first line doesn't cause an error, the second falls over on unicode entities the DTD is meant to pick up (and does so in the file system version):
XMLSyntaxError: Entity 'eacute' not defined, line 4495, column 46
How do I go about correctly loading this DTD?
Here's a short but complete example, using the custom resolver technique @Steven mentioned.
from io import StringIO  # on Python 2: from StringIO import StringIO
from lxml import etree

data = dict(
    xml_file='''<?xml version="1.0"?>
<!DOCTYPE x SYSTEM "a.dtd">
<x><y>&eacute;zz</y></x>
''',
    dtd_file='''<!ENTITY eacute "&#233;">
<!ELEMENT x (y)>
<!ELEMENT y (#PCDATA)>
''')

class DTDResolver(etree.Resolver):
    def resolve(self, url, id, context):
        # Serve the uploaded DTD text instead of loading "a.dtd" from disk
        return self.resolve_string(data['dtd_file'], context)

xmldoc = StringIO(data['xml_file'])
parser = etree.XMLParser(dtd_validation=True, load_dtd=True)
parser.resolvers.add(DTDResolver())
try:
    tree = etree.parse(xmldoc, parser)
except etree.XMLSyntaxError as e:
    # handle xml and validation errors
    print(e)
You could probably use a custom resolver. The docs actually give an example of doing this to provide a dtd.
