I am in the process of implementing some pre-defined WSDL which uses multiple namespaces. For simplicity, I have requests that look something like:
<?xml version='1.0' encoding='UTF-8'?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
<soapenv:Header>
</soapenv:Header>
<soapenv:Body>
<a:Foo xmlns:a="www.example.com/schema/a" AttrA="a1" AttrB="b2">
<b:Baz xmlns:b="www.example.com/schema/b" AttrC="c3"/>
<a:Bar>blah</a:Bar>
</a:Foo>
</soapenv:Body>
</soapenv:Envelope>
I've been using code like:
from spyne.model.primitive import Unicode
from spyne.model.complex import Iterable, XmlAttribute, ComplexModel, ComplexModelMeta, ComplexModelBase
from spyne.service import ServiceBase
from spyne.protocol.soap import Soap11
from spyne.application import Application
from spyne.decorator import srpc, rpc
class BazBase(ComplexModelBase):
    __namespace__ = "www.example.com/schema/b"
    __metaclass__ = ComplexModelMeta
class Baz(BazBase):
    Thing = Unicode
    AttrC = XmlAttribute(Unicode)
class FooService(ServiceBase):
    __namespace__ = "www.example.com/schema/a"
    @rpc(XmlAttribute(Unicode), XmlAttribute(Unicode), Baz, Unicode, _returns=Iterable(Unicode))
    def Foo(ctx, AttrA, AttrB, Baz, Bar):
        yield 'Hello, %s' % Bar
app = Application([FooService],
    "www.example.com/schema/a",
    in_protocol=Soap11(validator='lxml'),
    out_protocol=Soap11(),
)
to parse things, but I get:
<?xml version='1.0' encoding='utf-8'?>
<senv:Envelope xmlns:senv="http://schemas.xmlsoap.org/soap/envelope/">
<senv:Body>
<senv:Fault>
<faultcode>senv:Client.SchemaValidationError</faultcode>
<faultstring>
<string>:1:0:ERROR:SCHEMASV:SCHEMAV_ELEMENT_CONTENT:
Element '{www.example.com/schema/b}Baz': This element
is not expected. Expected is one of (
{www.example.com/schema/a}Baz,
{www.example.com/schema/a}Bar ).</faultstring>
<faultactor></faultactor>
</senv:Fault>
</senv:Body>
</senv:Envelope>
as the response.
I've tried using the schema_tag parameter, but nothing I put in there seems to work, with errors like 'ValueError: Unhandled schema_tag / type combination.' or 'ValueError: InvalidTagName'
What do I need to do to properly handle multiple namespaces in the same request document?
To my knowledge, Spyne does the right thing there and the request is incorrect. In the schema Spyne generates, child elements are always in the parent's namespace; the children of those elements can be in their own namespace.
<a:Foo xmlns:a="www.example.com/schema/a" AttrA="a1" AttrB="b2">
<a:Baz xmlns:b="www.example.com/schema/b" AttrC="c3"/>
<a:Bar>blah</a:Bar>
</a:Foo>
That said, you can just use the soft validator (Soap11(validator='soft')), which doesn't care about namespaces.
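To see the difference the answer describes, the two versions of the Foo element can be parsed with the standard library's xml.etree.ElementTree (used here only as a neutral XML parser, not Spyne). This sketch prints the expanded {namespace}name of each child of Foo: in the original request Baz lands in namespace b, while in the corrected one it shares Foo's namespace a, which is what the validator's "Expected is one of" list asks for.

```python
import xml.etree.ElementTree as ET

# Original request body: Baz is placed in namespace b.
original = """<a:Foo xmlns:a="www.example.com/schema/a" AttrA="a1" AttrB="b2">
<b:Baz xmlns:b="www.example.com/schema/b" AttrC="c3"/>
<a:Bar>blah</a:Bar>
</a:Foo>"""

# Corrected body: Baz uses Foo's namespace a (the b declaration is unused).
corrected = """<a:Foo xmlns:a="www.example.com/schema/a" AttrA="a1" AttrB="b2">
<a:Baz xmlns:b="www.example.com/schema/b" AttrC="c3"/>
<a:Bar>blah</a:Bar>
</a:Foo>"""

def child_tags(xml_text):
    """Return the expanded {namespace}local names of Foo's direct children."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root]

print(child_tags(original))   # ['{www.example.com/schema/b}Baz', '{www.example.com/schema/a}Bar']
print(child_tags(corrected))  # ['{www.example.com/schema/a}Baz', '{www.example.com/schema/a}Bar']
```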
The output of my XML parsing is not as expected.
The XML file:
<?xml version="1.0"?>
<stationaer xsi:schemaLocation="http:/foo.bar" xmlns="http://foo.bar" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<einrichtung>
<name>Name</name>
</einrichtung>
<einrichtung>
<name>Name</name>
</einrichtung>
</stationaer>
I would expect to get something like root.tag == 'stationaer' and child.tag == 'einrichtung'.
See the output at the end.
This is the MWE:
#!/usr/bin/env python3
from lxml import etree
xml_src = '''<?xml version="1.0"?>
<stationaer xsi:schemaLocation="http:/foo.bar" xmlns="http://foo.bar" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<einrichtung>
<name>Name</name>
</einrichtung>
<einrichtung>
<name>Name</name>
</einrichtung>
</stationaer>
'''
# tree = etree.parse(file_path)
# root = tree.getroot()
root = etree.fromstring(xml_src)
print(repr(root.tag))
print(repr(root.text))
child = root[0]  # first child element (getchildren() is deprecated)
print(repr(child.tag))
print(repr(child.text))
The output for root is
'{http://foo.bar}stationaer'
'\n '
and for child
'{http://foo.bar}einrichtung'
'\n '
I don't understand what's going on here and why that URL is in the output.
This is actually not unexpected. The elements in the XML document are bound to the http://foo.bar default namespace. The namespace is declared by xmlns="http://foo.bar" on the root element and the declaration is inherited by all descendants.
The special notation with the namespace URI enclosed in curly braces ({http://foo.bar}stationaer) is never used in XML documents, but it is used by lxml and ElementTree when printing element (tag) names. It can also be used when searching or creating elements that belong to a namespace.
More information:
https://www.w3.org/TR/xml-names/
https://lxml.de/tutorial.html#namespaces
https://docs.python.org/3/library/xml.etree.elementtree.html#parsing-xml-with-namespaces
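A short sketch with the standard library's xml.etree.ElementTree, which uses the same {uri}name notation as lxml: it strips the namespace for display and shows the two ways of searching for namespaced elements. The document is a trimmed copy of the question's (xsi:schemaLocation omitted), and the prefix f in the mapping is arbitrary; only the URI has to match.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the question's document (xsi:schemaLocation omitted).
xml_src = """<?xml version="1.0"?>
<stationaer xmlns="http://foo.bar">
  <einrichtung><name>Name</name></einrichtung>
  <einrichtung><name>Name</name></einrichtung>
</stationaer>"""

# The prefix "f" is arbitrary; only the namespace URI has to match the document.
NS = {"f": "http://foo.bar"}

root = ET.fromstring(xml_src)

def local_name(tag):
    """Drop the '{uri}' part of a Clark-notation tag, if there is one."""
    return tag.rsplit("}", 1)[-1]

print(root.tag)              # {http://foo.bar}stationaer
print(local_name(root.tag))  # stationaer

# Namespaced elements must be searched for with their namespace:
by_prefix = root.findall("f:einrichtung", NS)            # via a prefix map
by_clark = root.findall("{http://foo.bar}einrichtung")  # via Clark notation
print(len(by_prefix), len(by_clark))                     # 2 2
print([n.text for n in root.findall("f:einrichtung/f:name", NS)])  # ['Name', 'Name']

# A bare name matches nothing, because no element is in the empty namespace.
print(root.findall("einrichtung"))  # []
```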
I've tried a lot, but I haven't found a working solution to my problem, so I hope you can help me:
I am about to write a python module, which sends an XML request to a DNS server, receives an XML as response and should process this response. However, I am already failing in sending the request.
I have an XML base structure which I want to fill with different elements depending on the action to be performed on the DNS.
For this purpose I read in an XML string with .fromstring, edit the xml object and want to send it back to the server with .tostring. The problem is that .tostring does not return a usable xml string. The following example shows what I mean:
import xml.etree.ElementTree as ET
import requests
headers = {'HeaderSOAP': 'SOAPAction:urn:QIPServices#getEntry'}
body = """<soapenv:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:urn="urn:QIPServices">
<soapenv:Header/>
<soapenv:Body>
<urn:getEntry soapenv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
<login xsi:type="xsd:string">USERNAME</login>
<password xsi:type="xsd:string">PASSWD</password>
<sharedsecret xsi:type="xsd:string">SECRET</sharedsecret>
<VPN xsi:type="xsd:string">VPN</VPN>
<IPoderName xsi:type="xsd:string">IPorFQDN</IPoderName>
</urn:getEntry>
</soapenv:Body>
</soapenv:Envelope>
"""
root = ET.fromstring(body)
ThatsTheProblem = ET.tostring(root, encoding='utf-8')
print(ThatsTheProblem)
returns:
b'<ns0:Envelope xmlns:ns0="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ns1="urn:QIPServices" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">\n <ns0:Header />\n <ns0:Body>\n <ns1:getEntry ns0:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">\n <login xsi:type="xsd:string">USERNAME</login>\n <password xsi:type="xsd:string">PASSWD</password>\n <sharedsecret xsi:type="xsd:string">SECRET</sharedsecret>\n <VPN xsi:type="xsd:string">VPN</VPN>\n <IPoderName xsi:type="xsd:string">IPorFQDN</IPoderName>\n </ns1:getEntry>\n </ns0:Body>\n </ns0:Envelope>'
Although I didn't change anything, the round trip through fromstring and tostring not only changed the complete formatting, there is also whitespace everywhere in the output. When I send this XML to the server using
response = requests.post(url,data=ThatsTheProblem, headers=headers)
I get the following answer:
Application failed during request deserialization: Unresolved prefix \'xsd\' for attribute value \'xsd:string\'\n
I attribute this to the problem I described at the beginning.
If anyone has a solution to this problem I would be very grateful.
Thanks and have a nice day.
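ElementTree does not keep the prefixes or namespace declarations of the input: on output it declares only namespaces that appear in element or attribute names, and it invents ns0, ns1, ... for any URI without a registered prefix. Because xsd appears only inside the attribute value "xsd:string", its declaration is dropped, which matches the server's "Unresolved prefix 'xsd'" error. The \n sequences are not added by ElementTree; they are the original indentation (kept as element text and tail), shown escaped because print() displays the repr of a bytes object. A minimal standard-library sketch of one workaround (not from the original thread): register the prefixes and re-add the xsd declaration by hand. The envelope is a trimmed copy of the one in the question.

```python
import io
import xml.etree.ElementTree as ET

NAMESPACES = {
    "soapenv": "http://schemas.xmlsoap.org/soap/envelope/",
    "urn": "urn:QIPServices",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
    "xsd": "http://www.w3.org/2001/XMLSchema",
}
# Register the prefixes (process-wide) so tostring() writes soapenv:/urn:
# instead of inventing ns0:/ns1:.
for prefix, uri in NAMESPACES.items():
    ET.register_namespace(prefix, uri)

# Trimmed copy of the envelope from the question.
body = """<soapenv:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xmlns:xsd="http://www.w3.org/2001/XMLSchema"
 xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
 xmlns:urn="urn:QIPServices">
<soapenv:Body>
<urn:getEntry>
<login xsi:type="xsd:string">USERNAME</login>
</urn:getEntry>
</soapenv:Body>
</soapenv:Envelope>"""

root = ET.fromstring(body)
# "xsd" only occurs inside the attribute value "xsd:string", so ElementTree
# never declares it on output. Re-add the declaration as a literal attribute.
root.set("xmlns:xsd", NAMESPACES["xsd"])

out = ET.tostring(root, encoding="utf-8")
print(out.decode("utf-8"))

# Re-parse the output and list the prefixes it actually declares.
declared = dict(ns for _, ns in ET.iterparse(io.BytesIO(out), events=("start-ns",)))
print(sorted(declared))  # ['soapenv', 'urn', 'xsd', 'xsi']
```

Setting a literal xmlns:xsd attribute is a workaround rather than a documented ElementTree feature. lxml avoids the problem altogether, because it keeps the namespace declarations it parsed.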
I'm coming from Java, C++ and Delphi and am now working on a small project in Python. So far I have been able to solve most problems, but the following one remains unsolved:
I want to override the getAttribute() method of an XML node by "type casting" the node to a subclass, ext_xml_node, so that every function in the aforementioned project uses the new getAttribute. As far as I have read, Python has no real type cast (as in C++ etc.), so my idea was to cast the argument to the new class ext_xml_node in certain functions (those that, like a dispatcher, call other sub-functions with an XML node argument):
class ext_xml_node(xml_node):
    ...
    def getAttribute(self, name):
        unhandled_value = super().getAttribute(name)
        handled_value = dosomethingwith(unhandled_value)
        return handled_value
def dispatcher(self, xml_node):
    for child_node in xml_node.childNodes:
        if child_node.nodeName == 'setvariable':
            bla = ext_xml_node(child_node)
            self.handle_setvariable_statement(bla)
def handle_setvariable_statement(self, xml_node):
    varname = xml_node.getAttribute("varname")
    # Now it should call the ext_xml_node.getAttribute method
I don't want to replace every getAttribute call in this project. Is there another way (duck typing surely won't work here), or do I really have to write a function that yields over each attribute, and if so, how?
lxml provides custom element classes which should suit your needs.
from lxml import etree
# Bytes literal: lxml rejects str input that contains an encoding declaration
xml = b'''\
<?xml version="1.0" encoding="utf-8"?>
<root xmlns="http://example.com">
<element1>
<element2/>
</element1>
</root>'''
class MyXmlClass1(etree.ElementBase):
    def getAttribute(self):
        print('1')
class MyXmlClass2(etree.ElementBase):
    def getAttribute(self):
        print('2')
# Elements with no specific registration fall back to MyXmlClass1
fallback = etree.ElementDefaultClassLookup(element=MyXmlClass1)
lookup = etree.ElementNamespaceClassLookup(fallback)
parser = etree.XMLParser()
parser.set_element_class_lookup(lookup)
# Use MyXmlClass2 for {http://example.com}element2
namespace = lookup.get_namespace('http://example.com')
namespace['element2'] = MyXmlClass2
root = etree.XML(xml, parser)
root.getAttribute()      # prints 1
element1 = root[0]
element1.getAttribute()  # prints 1
element2 = element1[0]
element2.getAttribute()  # prints 2
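If switching to lxml is not an option, here is a minimal sketch of a different technique with the standard library's xml.dom.minidom (not part of the original answer): Python does allow reassigning an instance's __class__ to a layout-compatible subclass, which behaves much like the cast the question asks for. ExtElement, as_ext_node and the strip().upper() post-processing are hypothetical names standing in for ext_xml_node and dosomethingwith(). Note that the node is changed in place for every reference to it.

```python
from xml.dom import minidom

class ExtElement(minidom.Element):
    # An empty __slots__ guarantees no new instance fields, so the layout matches
    # minidom.Element and existing nodes can be re-classed via __class__.
    __slots__ = ()

    def getAttribute(self, name):
        raw = super().getAttribute(name)
        return raw.strip().upper()  # hypothetical stand-in for dosomethingwith()

def as_ext_node(node):
    """Re-class an existing minidom Element in place (no copy is made)."""
    node.__class__ = ExtElement
    return node

doc = minidom.parseString('<script><setvariable varname="  counter "/></script>')
node = doc.documentElement.firstChild
print(repr(node.getAttribute("varname")))  # '  counter '
as_ext_node(node)
print(repr(node.getAttribute("varname")))  # 'COUNTER'
print(isinstance(node, minidom.Element))   # True
```

In the question's dispatcher, bla = as_ext_node(child_node) would take the place of ext_xml_node(child_node).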
I am relatively new to SOAP frameworks and have been reading through Spyne's docs, trying to figure out how to build a service that accepts the following request:
<?xml version='1.0' encoding='UTF-8'?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://..." xmlns:xsi="http://..." xmlns:xsd="http://...">
<SOAP-ENV:Body>
<modifyRequest returnData="everything" xmlns="urn:...">
<attr ID="..."/>
<data>
</data>
</modifyRequest>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
I have read through the docs but have not seen enough complex examples to figure out how to put together something that handles this. The <attr ID="..."/> tag must be processed for its ID attribute, and the <data> tag contains a varying collection of additional XML. I understand it's better to formally define the service, but for now I was hoping to use AnyXml (?) to accept whatever is inside the tags. I need to accept and process the ID attribute along with the XML payload contained within the data tags.
I'd be grateful for any guidance,
Thanks.
Here's how you'd do it:
from spyne.decorator import rpc
from spyne.model.complex import ComplexModel, XmlAttribute
from spyne.model.primitive import AnyXml, Unicode, UnsignedInteger32
from spyne.service import ServiceBase
NS = 'urn:...'
class Attr(ComplexModel):
    __namespace__ = NS
    _type_info = [
        ('ID', XmlAttribute(UnsignedInteger32)),
    ]
class ModifyRequest(ComplexModel):
    __namespace__ = NS
    _type_info = [
        ('returnData', XmlAttribute(Unicode(values=['everything', 'something', 'anything', 'etc']))),
        ('attr', Attr),
        ('data', AnyXml),
    ]
class SomeService(ServiceBase):
    @rpc(ModifyRequest, _body_style='bare')
    def modifyRequest(ctx, request):
        pass
This requires Spyne 2.11, though; _body_style='bare' is problematic in 2.10 and older.
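For comparison, and outside Spyne, the extraction this model describes (the returnData and ID attributes plus whatever elements sit inside data) can be sketched with the standard library's ElementTree. The namespace urn:example is a hypothetical placeholder for the elided urn:..., and the sample payload is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the elided "urn:..." namespace.
NS = "urn:example"

# Invented sample request body; <data> holds arbitrary XML.
body = """<modifyRequest returnData="everything" xmlns="urn:example">
  <attr ID="42"/>
  <data><item>anything</item><other kind="x"/></data>
</modifyRequest>"""

def parse_modify_request(xml_text):
    """Return (returnData, ID, payload elements) from a modifyRequest element."""
    req = ET.fromstring(xml_text)
    attr = req.find(f"{{{NS}}}attr")
    data = req.find(f"{{{NS}}}data")
    # Unprefixed attributes are not namespaced, so plain names work here.
    return req.get("returnData"), int(attr.get("ID")), list(data)

return_data, attr_id, payload = parse_modify_request(body)
print(return_data, attr_id)      # everything 42
print([e.tag for e in payload])  # ['{urn:example}item', '{urn:example}other']
```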
Trying to parse XML with ElementTree when it contains an undefined entity (e.g. &nbsp;) raises:
ParseError: undefined entity
In Python 2.x, the XML entity dict can be updated by creating a parser (documentation):
parser = ET.XMLParser()
parser.entity["nbsp"] = unichr(160)
but how to do the same with Python 3.x?
Update: There was a misunderstanding on my side: I overlooked that I was calling parser.parser.UseForeignDTD(1) before trying to update the XML entity dict, which was causing the error with the parser. Luckily, @m.brindley was patient and pointed out that the XML entity dict still exists in Python 3.x and can be updated the same way as in Python 2.x.
The issue here is that the only valid mnemonic entities in XML are quot, amp, apos, lt and gt. This means that almost all (X)HTML named entities must be defined in the DTD using the entity declaration markup defined in the XML 1.1 spec. If the document is to be standalone, this should be done with an inline DTD like so:
<?xml version="1.1" ?>
<!DOCTYPE naughtyxml [
<!ENTITY nbsp "&#160;">
<!ENTITY copy "&#169;">
]>
<data>
<country name="Liechtenstein">
<rank>1&nbsp;&gt;</rank>
<year>2008&copy;</year>
<gdppc>141100</gdppc>
<neighbor name="Austria" direction="E"/>
<neighbor name="Switzerland" direction="W"/>
</country>
</data>
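As a quick check of the inline-DTD approach, the standard library's xml.etree.ElementTree expands entities declared in the internal subset without any changes to the parser. This sketch is a trimmed version of the document above; it uses version 1.0 in the XML declaration (an assumption, since expat is an XML 1.0 parser) and character references for the entity values.

```python
import xml.etree.ElementTree as ET

# Internal DTD subset declaring the two HTML entities used below.
xml = """<?xml version="1.0"?>
<!DOCTYPE data [
  <!ENTITY nbsp "&#160;">
  <!ENTITY copy "&#169;">
]>
<data><rank>1&nbsp;&gt;</rank><year>2008&copy;</year></data>"""

root = ET.fromstring(xml)  # no parser.entity tweaks needed
print(repr(root.find("rank").text))  # '1\xa0>'
print(repr(root.find("year").text))  # '2008©'
```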
The XMLParser in xml.etree.ElementTree uses an xml.parsers.expat to do the actual parsing. In the init arguments for XMLParser, there is a space for 'predefined HTML entities' but that argument is not implemented yet. An empty dict named entity is created in the init method and this is what is used to look up undefined entities.
I don't think expat (and, by extension, the ET XMLParser) is able to handle switching namespaces to something like XHTML to get around this, possibly because it will not fetch external namespace definitions (I tried making xmlns="http://www.w3.org/1999/xhtml" the default namespace for the data element, but it did not play nicely), but I can't confirm that. By default, expat will raise an error on non-XML entities, but you can get around that by defining an external DOCTYPE: this causes the expat parser to pass undefined entity references back to the ET.XMLParser's _default() method.
The _default() method does a look up of the entity dict in the XMLParser instance and if it finds a matching key, it will replace the entity with the associated value. This maintains the Python-2.x syntax mentioned in the question.
Solutions:
If the data does not have an external DOCTYPE and has (X)HTML mnemonic entities, you are out of luck. It is not valid XML and expat is right to throw an error. You should add an external DOCTYPE.
If the data has an external DOCTYPE, you can just use your old syntax to map mnemonic names to characters. Note: you should use chr() in Python 3; unichr() no longer exists.
Alternatively, you could update XMLParser.entity with html.entities.html5 to map all valid HTML5 mnemonic entities to their characters. Most html5 keys include the trailing semicolon (e.g. 'copy;'), while the parser looks entities up by bare name, so strip the semicolon first.
If the data is XHTML, you could subclass HTMLParser to handle mnemonic entities but this won't return an ElementTree as desired.
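A sketch of the html.entities.html5 alternative with the standard library, assuming the document has an external DOCTYPE (the same one the answer's snippet uses); the <gap> element and its content are made up for illustration.

```python
import xml.etree.ElementTree as ET
from html.entities import html5

# External DOCTYPE: without it, expat rejects undefined entities itself,
# before ElementTree gets a chance to look them up in parser.entity.
xml = """<?xml version="1.0"?>
<!DOCTYPE data PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<data><year>2008&copy;</year><gap>a&nbsp;b</gap></data>"""

parser = ET.XMLParser()
# html5 keys mostly carry a trailing ";" (e.g. "copy;"), but the parser looks
# entities up by bare name ("copy"), so strip it before updating the table.
parser.entity.update({name.rstrip(";"): text for name, text in html5.items()})

root = ET.fromstring(xml, parser)
print(repr(root.find("year").text))  # '2008©'
print(repr(root.find("gap").text))   # 'a\xa0b'
```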
Here is the snippet I used. It parses XML with an external DOCTYPE through HTMLParser (to demonstrate how to add entity handling by subclassing), through ET.XMLParser with entity mappings, and through expat (which will just silently ignore undefined entities due to the external DOCTYPE). There is a valid XML entity (&gt;) and an undefined entity (&copy;), which I map to chr(0x24B8) with the ET.XMLParser.
from html.parser import HTMLParser
from html.entities import name2codepoint
import xml.etree.ElementTree as ET
import xml.parsers.expat as expat
xml = '''<?xml version="1.0"?>
<!DOCTYPE data PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<data>
<country name="Liechtenstein">
<rank>1&gt;</rank>
<year>2008&copy;</year>
<gdppc>141100</gdppc>
<neighbor name="Austria" direction="E"/>
<neighbor name="Switzerland" direction="W"/>
</country>
</data>'''
# HTMLParser subclass which handles entities
print('=== HTMLParser')
class MyHTMLParser(HTMLParser):
    def handle_starttag(self, name, attrs):
        print('Start element:', name, attrs)
    def handle_endtag(self, name):
        print('End element:', name)
    def handle_data(self, data):
        print('Character data:', repr(data))
    def handle_entityref(self, name):
        self.handle_data(chr(name2codepoint[name]))
# convert_charrefs defaults to True since Python 3.5, which would bypass handle_entityref
htmlparser = MyHTMLParser(convert_charrefs=False)
htmlparser.feed(xml)
# ET.XMLParser parse
print('=== XMLParser')
parser = ET.XMLParser()
parser.entity['copy'] = chr(0x24B8)
root = ET.fromstring(xml, parser)
print(ET.tostring(root))
for elem in root:
    print(elem.tag, ' - ', elem.attrib)
    for subelem in elem:
        print(subelem.tag, ' - ', subelem.attrib, ' - ', subelem.text)
# Expat parse
def start_element(name, attrs):
    print('Start element:', name, attrs)
def end_element(name):
    print('End element:', name)
def char_data(data):
    print('Character data:', repr(data))
print('=== Expat')
expatparser = expat.ParserCreate()
expatparser.StartElementHandler = start_element
expatparser.EndElementHandler = end_element
expatparser.CharacterDataHandler = char_data
expatparser.Parse(xml)
I was having a similar issue and got around it by using lxml. Its etree.XMLParser has a recover keyword argument which forces it to try to ignore broken XML.
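A minimal sketch of the lxml recover option described here. It needs the third-party lxml package, and the broken sample document is made up. Recovery means libxml2 may silently drop or alter the broken parts, so inspect the resulting tree and the parser's error log.

```python
from lxml import etree

# Undefined entity plus an unclosed <note> element: not well-formed XML.
broken = b"<data><year>2008&copy;</year><note>unclosed</data>"

try:
    etree.fromstring(broken)  # default parser is strict
except etree.XMLSyntaxError as exc:
    print("strict parser:", exc)

# recover=True asks libxml2 to keep going and build whatever tree it can.
parser = etree.XMLParser(recover=True)
root = etree.fromstring(broken, parser)
print(root.tag, [child.tag for child in root])
print(parser.error_log)  # the problems libxml2 skipped over
```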
import requests
from xml.etree import ElementTree
from html.entities import name2codepoint
from io import StringIO
url = "https://docs.python.org/3/library/html.parser.html"
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'
}
html_response = requests.get(url=url, headers=headers)
# Save the page, then read it back
with open("sample.html", "w", encoding="utf-8") as html_file:
    html_file.write(html_response.text)
# Map every HTML 4 named entity (e.g. "nbsp", "copy") to its character
xp = ElementTree.XMLParser()
for k, v in name2codepoint.items():
    xp.entity[k] = chr(v)
with open("sample.html", "r", encoding="utf-8") as html_file:
    html = html_file.read()
# Note: this only succeeds if the page is well-formed XML (e.g. XHTML)
t = ElementTree.parse(StringIO(html), xp)
print(t)