I am trying to extend the ISOSTS XSD schema to support SVG image tags.
I found an XSD schema for SVG and put it next to ISOSTS.xsd.
Now I am trying to extend ISOSTS.xsd:
<?xml version="1.0" encoding="utf-8"?>
<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:mml="http://www.w3.org/1998/Math/MathML"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:tbx="urn:iso:std:iso:30042:ed-1"
xmlns:xlink="http://www.w3.org/1999/xlink"
<!-- my line -->
xmlns:svg="http://www.w3.org/2000/svg"
elementFormDefault="qualified">
<xs:import namespace="http://www.w3.org/1998/Math/MathML"
schemaLocation="ncbi-mathml2/mathml2.xsd"/>
<xs:import namespace="http://www.w3.org/1999/xlink"
schemaLocation="xlink.xsd"/>
<!-- XSD import of namespace http://www.w3.org/2001/XMLSchema-instance suppressed (not necessary) -->
<xs:import namespace="http://www.w3.org/XML/1998/namespace"
schemaLocation="xml.xsd"/>
<xs:import namespace="urn:iso:std:iso:30042:ed-1"
schemaLocation="tbx.xsd"/>
<!-- my line -->
<xs:import namespace="http://www.w3.org/2000/svg"
schemaLocation="SVG.xsd"/>
....
<xs:element name="p">
<xs:complexType mixed="true">
<xs:choice minOccurs="0" maxOccurs="unbounded">
<!-- my line --> <xs:element ref="svg:svg"/>
<xs:element ref="email"/>
....
But I get an error when I try to load the schema:
from lxml.etree import parse, XMLSchema
schema_file = open(self._schema_filename)
schema_doc = parse(schema_file)
schema_file.close()
self._xmlschema = XMLSchema(schema_doc) # Error
Error message:
File "src/lxml/xmlschema.pxi", line 87, in lxml.etree.XMLSchema.init (src/lxml/lxml.etree.c:197819)
lxml.etree.XMLSchemaParseError: Element '{http://www.w3.org/2001/XMLSchema}element', attribute 'ref': References from this schema to components in the namespace 'http://www.w3.org/2000/svg' are not allowed, since not indicated by an import statement., line 4664
What is wrong?
The message seems clear enough to me, I'm not sure which part of it you don't understand. Your schema document imports schema components for various namespaces (mathml, xlink, xml, etc) but it makes no attempt to import the schema for SVG, and the error message is telling you so.
I replicated your three modifications (declaring a namespace binding for the SVG namespace, importing the SVG namespace, and referring to the svg:svg element), but got no error from Xerces or Saxon EE.
So it seems to me that you've done everything right.
The error message suggests that your XSD validator is not picking up the import.
If I had to guess (and I suppose I have to, since while you've given a very concise statement of the problem, we don't have a reproducible error), your validator is looking at an interim version of the schema document in which the reference to svg:svg has been added to the content model of p, but the xs:import statement has not yet been added to the beginning of the schema document.
Possibly your Python bytecode is out of date and your Python needs to be recompiled? (Pure conjecture; I don't know how much schema information lxml generates at compile time and how much it generates at run time.)
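One way to test whether the validator picks up the import is to reproduce the situation on a small scale. The sketch below is only an illustration under stated assumptions: the SVG schema is a one-element stand-in (not the real SVG.xsd), and main.xsd and try_load are hypothetical names. It loads the same kind of p content model once without and once with the xs:import, from files written to a temporary directory.

```python
import os
import tempfile
from lxml import etree

# Stand-in for SVG.xsd: a single svg element in the SVG namespace (assumption).
SVG_STUB = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.w3.org/2000/svg"
           elementFormDefault="qualified">
  <xs:element name="svg" type="xs:string"/>
</xs:schema>"""

# Reduced main schema; {import_line} is filled in below.
MAIN = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:svg="http://www.w3.org/2000/svg"
           elementFormDefault="qualified">
  {import_line}
  <xs:element name="p">
    <xs:complexType mixed="true">
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element ref="svg:svg"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

IMPORT = '<xs:import namespace="http://www.w3.org/2000/svg" schemaLocation="SVG.xsd"/>'


def try_load(main_text, directory):
    """Write main.xsd into directory and return the parse error text, or None."""
    path = os.path.join(directory, "main.xsd")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(main_text)
    try:
        # Parsing from a path lets the relative schemaLocation be resolved.
        etree.XMLSchema(etree.parse(path))
        return None
    except etree.XMLSchemaParseError as error:
        return str(error)


with tempfile.TemporaryDirectory() as directory:
    with open(os.path.join(directory, "SVG.xsd"), "w", encoding="utf-8") as handle:
        handle.write(SVG_STUB)
    error_without_import = try_load(MAIN.format(import_line=""), directory)
    error_with_import = try_load(MAIN.format(import_line=IMPORT), directory)

print("without import:", error_without_import)
print("with import:", error_with_import)
```

If the real schema still fails although the import is present, the imported SVG.xsd itself (its targetNamespace, or whether it can be loaded at all) is a reasonable next thing to check.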
The problem was solved by using the following XSD schema for SVG: https://github.com/dumistoklus/svg-xsd-schema
I'm trying to read XML with ElementTree and write the result back to disk. My long-term goal is to prettify the XML this way. However, in my naive approach, ElementTree eats all the namespace declarations in the document and I don't understand why. Here is an example
test.xsd
<?xml version='1.0' encoding='UTF-8'?>
<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'
xmlns='sdformat/pose' targetNamespace='sdformat/pose'
xmlns:pose='sdformat/pose'
xmlns:types='http://sdformat.org/schemas/types.xsd'>
<xs:import namespace='sdformat/pose' schemaLocation='./pose.xsd'/>
<xs:element name='pose' type='poseType' />
<xs:simpleType name='string'><xs:restriction base='xs:string' /></xs:simpleType>
<xs:simpleType name='pose'><xs:restriction base='types:pose' /></xs:simpleType>
<xs:complexType name='poseType'>
<xs:simpleContent>
<xs:extension base="pose">
<xs:attribute name='relative_to' type='string' use='optional' default=''>
</xs:attribute>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:schema>
test.py
from xml.etree import ElementTree
ElementTree.register_namespace("types", "http://sdformat.org/schemas/types.xsd")
ElementTree.register_namespace("pose", "sdformat/pose")
ElementTree.register_namespace("xs", "http://www.w3.org/2001/XMLSchema")
tree = ElementTree.parse("test.xsd")
tree.write("test_out.xsd")
Produces test_out.xsd
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="sdformat/pose">
<xs:import namespace="sdformat/pose" schemaLocation="./pose.xsd" />
<xs:element name="pose" type="poseType" />
<xs:simpleType name="string"><xs:restriction base="xs:string" /></xs:simpleType>
<xs:simpleType name="pose"><xs:restriction base="types:pose" /></xs:simpleType>
<xs:complexType name="poseType">
<xs:simpleContent>
<xs:extension base="pose">
<xs:attribute name="relative_to" type="string" use="optional" default="">
</xs:attribute>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:schema>
Notice how test_out.xsd is missing the namespace declarations from test.xsd (only xmlns:xs survives). I would expect the two files to be identical. I verified that the latter is valid XML by validating it. It validates with the exception of my choice of namespace URI, which I think shouldn't matter.
Update:
Based on mzji's comment I realized that this only happens for values of attributes. With this in mind, I can manually add the namespaces like so:
from xml.etree import ElementTree

namespaces = {
    "types": "http://sdformat.org/schemas/types.xsd",
    "pose": "sdformat/pose",
    "xs": "http://www.w3.org/2001/XMLSchema"
}
for prefix, ns in namespaces.items():
    ElementTree.register_namespace(prefix, ns)

tree = ElementTree.parse("test.xsd")
root = tree.getroot()
queue = [root]
while queue:
    element: ElementTree.Element = queue.pop()
    # Iterate over a copy, because root.attrib may be modified inside the loop.
    for value in list(element.attrib.values()):
        try:
            prefix, _local_name = value.split(":")
        except ValueError:
            # no namespace prefix, nothing to do
            continue
        if prefix == "xs":
            continue  # ignore XMLSchema namespace, ElementTree declares it itself
        # Re-declare the prefix on the root element as a plain attribute.
        root.attrib[f"xmlns:{prefix}"] = namespaces[prefix]
    queue.extend(element)

tree.write("test_out.xsd")
While this solves the problem, it is quite an ugly solution. I also still don't understand why this happens in the first place, so it doesn't answer the question.
There is a valid reason for this behaviour, but it requires a good understanding of XML Schema concepts.
First, some important facts:
Your XML document is not just any old XML document. It is an XSD.
An XSD is itself described by a schema (see the 'schema for schema').
The attribute xs:restriction/@base is not an xs:string. Its type is xs:QName.
Based on the above facts, we can assert the following:
If test.xsd is parsed as an XML document, but without knowledge of the 'schema for schema', then the value of the base attribute will be treated as a plain string (technically, as CDATA).
If test.xsd is parsed using a validating XML parser, with the 'schema for schema' as the XSD, then the value of the base attribute will be parsed as an xs:QName.
When ElementTree writes the output XML, its behaviour should depend on the data type of base. If base is a QName then ElementTree should detect that it is using the namespace prefix 'types' and it should emit the corresponding namespace declaration.
If you are not supplying the 'schema for schema' when parsing test.xsd then ElementTree is off the hook, because it cannot possibly know that base is supposed to be interpreted as a QName.
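The difference can be seen directly in how ElementTree stores the parsed data. The following is a minimal sketch, using a cut-down inline version of test.xsd as an assumption: the element name is resolved to its namespace URI, while the value of base stays an ordinary string, so the serializer has no reason to emit a declaration for the types prefix.

```python
from xml.etree import ElementTree

# Cut-down stand-in for test.xsd (assumption: only the parts needed here).
XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:types="http://sdformat.org/schemas/types.xsd">
  <xs:simpleType name="pose">
    <xs:restriction base="types:pose"/>
  </xs:simpleType>
</xs:schema>"""

ElementTree.register_namespace("xs", "http://www.w3.org/2001/XMLSchema")
root = ElementTree.fromstring(XSD)
restriction = root.find(".//{http://www.w3.org/2001/XMLSchema}restriction")

# Element names are namespace-resolved (Clark notation) ...
tag = restriction.tag
# ... but attribute values are kept as opaque text, prefix included.
base = restriction.get("base")

# Only namespaces used in element or attribute names are declared on output.
serialized = ElementTree.tostring(root, encoding="unicode")
has_xs_declaration = "xmlns:xs" in serialized
has_types_declaration = "xmlns:types" in serialized

print(tag, base, has_xs_declaration, has_types_declaration)
```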
I am trying to call the "Get_Purchase_Orders" operation in Python, and it throws the error below when the response is received:
TypeError error in Get_Purchase_Orders : {urn:com.workday/bsvc}Bill_To_Address_ReferenceType() got an unexpected keyword argument 'Address_Reference'. Signature: `({Bill_To_Address_Reference: {urn:com.workday/bsvc}Unique_IdentifierObjectType} | {Address_Reference: {urn:com.workday/bsvc}Address_ReferenceType[]}) Unexpected error: <class 'TypeError'>
The WSDL file is accessible here.
My Findings:
Bill_To_Address_Data has two mutually exclusive elements, Bill_To_Address_Reference and Address_Reference: the schema declares them in a choice, so only one of the two is expected, but both tags appear in the response. Sample XML can be seen here.
The XML chunk can be seen below as well:
<bsvc:Bill_To_Address_Data>
<!-- You have a CHOICE of the next 2 items at this level -->
<!-- Optional: -->
<bsvc:Bill_To_Address_Reference bsvc:Descriptor="string">
<!-- Zero or more repetitions: -->
<bsvc:ID bsvc:type="string">string</bsvc:ID>
</bsvc:Bill_To_Address_Reference>
<!-- Zero or more repetitions: -->
<bsvc:Address_Reference>
<!-- Optional: -->
<bsvc:ID>string</bsvc:ID>
</bsvc:Address_Reference>
</bsvc:Bill_To_Address_Data>
Below is the XSD chunk for the above XML:
<xsd:complexType name="Bill_To_Address_ReferenceType">
<xsd:annotation>
<xsd:documentation>Contains a reference instance or a Address Reference ID for an existing address</xsd:documentation>
<xsd:appinfo>
<wd:Validation>
<wd:Validation_Message>The Provided Bill To Address is Invalid for this Purchase Order</wd:Validation_Message>
</wd:Validation>
<wd:Validation>
<wd:Validation_Message>The Provided Bill To Address is Invalid for this Purchase Order</wd:Validation_Message>
</wd:Validation>
</xsd:appinfo>
</xsd:annotation>
<xsd:sequence>
<xsd:choice>
<xsd:element name="Bill_To_Address_Reference" type="wd:Unique_IdentifierObjectType" minOccurs="0">
<xsd:annotation>
<xsd:documentation>Reference to an existing Ship-To address.</xsd:documentation>
</xsd:annotation>
</xsd:element>
<xsd:element name="Address_Reference" type="wd:Address_ReferenceType" minOccurs="0" maxOccurs="unbounded">
<xsd:annotation>
<xsd:documentation>Address Reference ID</xsd:documentation>
</xsd:annotation>
</xsd:element>
</xsd:choice>
</xsd:sequence>
</xsd:complexType>
I confirmed this in Oxygen by validating the XML against the XSD from the WSDL (which can be accessed here).
Now what I want is to ignore this error and parse the response in Python using zeep.
Any help will be highly appreciated.
Your choices are:
Modify the WSDL (the XML schema part) so that both tags are allowed in the same request
Find a setting in Zeep that allows you to switch off XSD validation
Stop using Zeep, and find another tool that allows you to parse a request without validating against the WSDL
Option 1 is best because WSDL is supposed to be a contract between the service and its callers. If you don't validate then the value of using WSDL is greatly reduced.
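As an illustration of option 1, the effect of relaxing the choice can be tried outside of Zeep with lxml. This is a sketch under stated assumptions: the schema and document below are reduced stand-ins (no namespace, xsd:string content, no minOccurs/maxOccurs on the elements), not the Workday WSDL, and whether changing the contract this way is acceptable is a separate decision.

```python
from lxml import etree

XS = "http://www.w3.org/2001/XMLSchema"

# Reduced stand-in for the Bill_To_Address_ReferenceType content model.
XSD = b"""<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Bill_To_Address_Data">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:choice>
          <xsd:element name="Bill_To_Address_Reference" type="xsd:string"/>
          <xsd:element name="Address_Reference" type="xsd:string"/>
        </xsd:choice>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""

# A response fragment that carries both elements of the choice.
XML = b"""<Bill_To_Address_Data>
  <Bill_To_Address_Reference>a</Bill_To_Address_Reference>
  <Address_Reference>b</Address_Reference>
</Bill_To_Address_Data>"""

schema_doc = etree.fromstring(XSD)
doc = etree.fromstring(XML)

# With the original choice, only one of the two elements is allowed.
strict_ok = etree.XMLSchema(schema_doc).validate(doc)

# Let the choice repeat, so both elements may appear together.
for choice in schema_doc.iter("{%s}choice" % XS):
    choice.set("maxOccurs", "unbounded")
relaxed_ok = etree.XMLSchema(schema_doc).validate(doc)

print(strict_ok, relaxed_ok)
```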
Using Python's lxml library, I'm trying to load a .xsd as schema. The Python script is in one directory and the schemas are in another:
/root
my_script.py
/data
/xsd
schema_1.xsd
schema_2.xsd
The problem is that schema_1.xsd includes schema_2.xsd like this:
<xsd:include schemaLocation="schema_2.xsd"/>
Because schema_2.xsd is given as a relative path (the two schemas are in the same directory), lxml doesn't find it and raises an error:
schema_root = etree.fromstring(open('data/xsd/schema_1.xsd').read().encode('utf-8'))
schema = etree.XMLSchema(schema_root)
--> xml.etree.XMLSchemaParseError: Element '{http://www.w3.org/2001/XMLSchema}include': Failed to load the document './schema_2.xsd' for inclusion
How to solve this problem without changing the schema files?
One option is to use an XML Catalog. You could also probably use a custom URI Resolver, but I've always used a catalog. It's easier for non-developers to make configuration changes. This is especially helpful if you're delivering an executable instead of plain Python.
Using a catalog is different between Windows and Linux; see here for more info.
Here's a Windows example using Python 3.#.
XSD #1 (schema_1.xsd)
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
<xs:include schemaLocation="schema_2.xsd"/>
<xs:element name="doc">
<xs:complexType>
<xs:sequence>
<xs:element ref="test"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="test" type="test"/>
</xs:schema>
XSD #2 (schema_2.xsd)
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
<xs:simpleType name="test">
<xs:restriction base="xs:string">
<xs:enumeration value="Hello World"/>
</xs:restriction>
</xs:simpleType>
</xs:schema>
XML Catalog (catalog.xml)
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE catalog PUBLIC "-//OASIS//DTD XML Catalogs V1.1//EN" "http://www.oasis-open.org/committees/entity/release/1.1/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
<!-- The path in @uri is relative to this file (catalog.xml). -->
<system systemId="schema_2.xsd" uri="./xsd_test/schema_2.xsd"/>
</catalog>
Python
import os
from urllib.request import pathname2url
from lxml import etree
# The XML_CATALOG_FILES environment variable is used by libxml2 (which is used by lxml).
# See http://xmlsoft.org/catalog.html.
if "XML_CATALOG_FILES" not in os.environ:
    # Path to catalog must be a url.
    catalog_path = f"file:{pathname2url(os.path.join(os.getcwd(), 'catalog.xml'))}"
    # Temporarily set the environment variable.
    os.environ['XML_CATALOG_FILES'] = catalog_path
schema_root = etree.fromstring(open('xsd_test/schema_1.xsd').read().encode('utf-8'))
schema = etree.XMLSchema(schema_root)
print(schema)
Print Output
<lxml.etree.XMLSchema object at 0x02B4B3F0>
There may also be a simpler solution in your case. I ran into this today and resolved it by temporarily changing the current working directory on importing the xml schema:
import os
from lxml import etree
xml_schema_path = 'data/xsd/schema_1.xsd'
# Get the working directory the script was run from
run_dir = os.getcwd()
# Set the working directory to the schema dir so relative imports resolve from there
os.chdir(os.path.dirname(xml_schema_path))
# Load the schema. Note that you can use the `file=` option to point to a file path
xml_schema = etree.XMLSchema(file=os.path.basename(xml_schema_path))
# Re-set the working directory
os.chdir(run_dir)
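Another possibility, which avoids changing the working directory, is to let lxml parse the schema from its path instead of from a string: a document parsed from a file knows its own location, so a relative schemaLocation is resolved against the schema's directory. The sketch below is self-contained for illustration only; it writes two small schemas (shortened versions of the ones above) into a temporary directory, which stands in for data/xsd.

```python
import os
import tempfile
from lxml import etree

SCHEMA_1 = """<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:include schemaLocation="schema_2.xsd"/>
  <xs:element name="test" type="test"/>
</xs:schema>"""

SCHEMA_2 = """<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="test">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Hello World"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>"""

with tempfile.TemporaryDirectory() as xsd_dir:
    for name, text in (("schema_1.xsd", SCHEMA_1), ("schema_2.xsd", SCHEMA_2)):
        with open(os.path.join(xsd_dir, name), "w", encoding="utf-8") as handle:
            handle.write(text)

    # Parse from the path (not from a string), so the include is looked up
    # next to schema_1.xsd rather than in the current working directory.
    schema_doc = etree.parse(os.path.join(xsd_dir, "schema_1.xsd"))
    schema = etree.XMLSchema(schema_doc)

is_valid = schema.validate(etree.fromstring("<test>Hello World</test>"))
is_invalid = schema.validate(etree.fromstring("<test>Goodbye</test>"))
print(is_valid, is_invalid)
```

With the layout from the question, the equivalent call would be etree.XMLSchema(file='data/xsd/schema_1.xsd').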
I am trying to parse an XML file using Python. Example XML snippet:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<raml xmlns="raml21.xsd" version="2.1">
<series xmlns="" scope="USA" name="Arizona">
<header>
<log action="created"/>
</header>
<x_ns color="Blue">
<p name="timeZone">(GMT-10)</p>
</x_ns>
<x_ns color="Red">
<p name="AvgHeight">175</p>
</x_ns>
<x_ns color="black">
<p name="AvgWeight">235</p>
</x_ns>
The problem is that the namespace keeps changing, so as an alternative I tried to read the xmlns string first and then build a namespace dictionary with the code below:
root = raw_xml.getroot()
namespace_temp1=root.tag.split("}")
namespace_temp2=namespace_temp1[0].strip('{')
namespaces_auto={}
tag_name =["x","y","z","w","v"]
ns_name=[namespace_temp2,namespace_temp2,namespace_temp2,namespace_temp2,namespace_temp2]
namespace_temp3=zip(tag_name,ns_name)
for tag,ns in namespace_temp3:
namespaces_auto[tag]=ns
namespaces=namespaces_auto
To access a particular tag with a namespace, I use code like this:
for data in raw_xml.findall('x:x_ns', namespaces):
    ...  # process each matching element
This pretty much solves the problem, but it gets stuck when a child node has a blank xmlns, as seen in the series tag (xmlns=""). I am not sure how to incorporate a check for this condition in the code.
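One possible way to handle both cases is sketched below, under the assumption that Python 3.8 or later is available: there, ElementTree accepts the {*} wildcard, which matches a tag in any namespace or in none. Elements below xmlns="" are in no namespace at all, so they can also be found by their plain name. The inline document is a shortened copy of the snippet above.

```python
from xml.etree import ElementTree

RAML = """<raml xmlns="raml21.xsd" version="2.1">
  <series xmlns="" scope="USA" name="Arizona">
    <x_ns color="Blue"><p name="timeZone">(GMT-10)</p></x_ns>
    <x_ns color="Red"><p name="AvgHeight">175</p></x_ns>
  </series>
</raml>"""

root = ElementTree.fromstring(RAML)

# The root tag looks like "{raml21.xsd}raml"; pull out the changing URI.
namespace_uri = root.tag[1:].split("}")[0] if root.tag.startswith("{") else ""

# xmlns="" resets the default namespace, so these elements have plain names.
plain_matches = root.findall(".//x_ns")

# {*} matches the tag regardless of namespace (Python 3.8+).
values = {p.get("name"): p.text for p in root.iterfind(".//{*}x_ns/{*}p")}

print(namespace_uri, len(plain_matches), values)
```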
I'm using Python's lxml to validate XML documents against a schema. I have a schema with an element:
<xs:element name="link-url" type="xs:anyURL"/>
and I test, for example, this (part of an) xml:
<a link-url="server/path"/>
I would like this test to FAIL because the link-url doesn't start with http://. I tried switching anyURI to anyURL, but this results in an exception because it is not a valid type.
Is this possible with lxml? Is it possible at all with schema validation?
(I'm pretty sure xs:anyURL is not valid. The XML Schema standard calls it anyURI. And since link-url is an attribute, shouldn't you be using xs:attribute instead of xs:element?)
You could restrict the URIs by creating a new simpleType based on it, and put a restriction on the pattern. For example,
#!/usr/bin/env python3
from io import StringIO
from lxml import etree
schema_doc = etree.parse(StringIO('''
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:simpleType name="httpURL">
<xs:restriction base="xs:anyURI">
<xs:pattern value='https?://.+'/>
<!-- accepts only http:// or https:// URIs. -->
</xs:restriction>
</xs:simpleType>
<xs:element name="a">
<xs:complexType>
<xs:attribute name="link-url" type="httpURL"/>
</xs:complexType>
</xs:element>
</xs:schema>
'''))
schema = etree.XMLSchema(schema_doc)
schema.assertValid(etree.parse(StringIO('<a link-url="http://sd" />')))
assert not schema(etree.parse(StringIO('<a link-url="server/path" />')))