etree schematron validation extends tag - python

I am trying to validate an XML document using a group of Schematron schemas, where one main schema includes the others.
Main schematron :
<?xml version="1.0" encoding="UTF-8"?>
<sch:schema xmlns="http://purl.oclc.org/dsdl/schematron"
xmlns:sch="http://purl.oclc.org/dsdl/schematron"
xmlns:sh="http://www.unece.org/cefact/namespaces/StandardBusinessDocumentHeader"
xmlns:ef="http://www.efatura.gov.tr/envelope-namespace">
<sch:include href="UBL-TR_Codelist.sch#codes"/>
<sch:include href="UBL-TR_Common_Schematron.sch#abstracts"/>
<sch:ns prefix="sh" uri="http://www.unece.org/cefact/namespaces/StandardBusinessDocumentHeader" />
<sch:ns prefix="ef" uri="http://www.efatura.gov.tr/package-namespace" />
<!-- .... -->
<sch:pattern id="document">
<sch:rule context="sh:StandardBusinessDocument">
<sch:extends rule="DocumentCheck"/>
</sch:rule>
</sch:pattern>
</sch:schema>
Common Schematron:
<sch:schema xmlns="http://purl.oclc.org/dsdl/schematron"
xmlns:sch="http://purl.oclc.org/dsdl/schematron">
<sch:pattern name="AbstractRules" id="abstracts">
<sch:p>Pattern for storing abstract rules</sch:p>
<!-- Rule to validate StandardBusinessDocument -->
<sch:rule abstract="true" id="DocumentCheck">
<sch:assert test="sh:StandardBusinessDocumentHeader">sh:StandardBusinessDocumentHeader zorunlu bir elemandır.</sch:assert>
<sch:assert test="ef:Package">ef:Package zorunlu bir elemandır.</sch:assert>
</sch:rule>
</sch:pattern>
</sch:schema>
The problem is that, in the main schema, if I put an assertion directly inside the rule, for example:
<assert test="sum(//Percent)=100">Sum is not 100%.</assert>
between the "rule" tags, like this:
<sch:pattern id="document">
<sch:rule context="sh:StandardBusinessDocument">
<assert test="sum(//Percent)=100">Sum is not 100%.</assert>
</sch:rule>
</sch:pattern>
Then lxml's isoschematron.Schematron class accepts my main Schematron schema. Otherwise, it throws an error like this:
Traceback (most recent call last):
File "C:\SUNUCU\validate\v.py", line 102, in <module>
schematron = etree.Schematron(s)
File "schematron.pxi", line 116, in lxml.etree.Schematron.__init__ (src\lxml\lxml.etree.c:156251)
SchematronParseError: Document is not a valid Schematron schema
I've also tried the etree.Schematron class, and it throws "SchematronParseError: invalid schematron schema:" as well.
I think the problem is related to Schematron's
<sch:extends />
tag. That is, the errors appear when the schema uses the assertions of a rule defined in an external file.
What is the correct way to work with related, combined Schematron schemas in Python?
Thanks in advance.

I think the problem was much simpler:
it seems like you forgot the "sch:" XML-namespace prefix in the <assert> element.
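One way to check whether a missing prefix matters is to look at the namespace each element actually resolves to. Below is a minimal sketch using the standard library's ElementTree; the two tiny schemas and the helper name expanded_tags are made up for illustration. An unprefixed <assert> lands in the Schematron namespace only when the schema element declares that namespace as the default xmlns.

```python
import xml.etree.ElementTree as ET

SCH_NS = "http://purl.oclc.org/dsdl/schematron"


def expanded_tags(xml_text):
    """Return the expanded ({namespace}local) tag of every element, in document order."""
    root = ET.fromstring(xml_text)
    return [el.tag for el in root.iter()]


# Hypothetical schema that declares Schematron as the default namespace too.
with_default = (
    '<sch:schema xmlns="%s" xmlns:sch="%s">'
    '<sch:pattern><sch:rule context="a">'
    '<assert test="b">msg</assert>'
    '</sch:rule></sch:pattern></sch:schema>' % (SCH_NS, SCH_NS)
)

# Same schema without the default namespace declaration.
without_default = (
    '<sch:schema xmlns:sch="%s">'
    '<sch:pattern><sch:rule context="a">'
    '<assert test="b">msg</assert>'
    '</sch:rule></sch:pattern></sch:schema>' % SCH_NS
)

tags_with = expanded_tags(with_default)
tags_without = expanded_tags(without_default)
print(tags_with[-1])     # the assert element, namespaced
print(tags_without[-1])  # the assert element, in no namespace
```

If the second form is what your file looks like, the validator will not see the element as a Schematron assertion at all.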

Error with Python and XML

I'm getting an error when trying to grab a value from my XML. I get "Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration."
Here is my code:
import requests
import lxml.etree
from requests.auth import HTTPBasicAuth
r= requests.get("https://somelinkhere/folder/?parameter=abc", auth=HTTPBasicAuth('username', 'password'))
print r.text
root = lxml.etree.fromstring(r.text)
textelem = root.find("opensearch:totalResults")
print textelem.text
I'm getting this error:
Traceback (most recent call last):
File "tickets2.py", line 8, in <module>
root = lxml.etree.fromstring(r.text)
File "src/lxml/lxml.etree.pyx", line 3213, in lxml.etree.fromstring (src/lxml/lxml.etree.c:82934)
File "src/lxml/parser.pxi", line 1814, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:124471)
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
Here is what the XML looks like; I'm trying to grab the value in the last line.
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:apple-wallpapers="http://www.apple.com/ilife/wallpapers" xmlns:g-custom="http://base.google.com/cns/1.0" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:media="http://search.yahoo.com/mrss/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss/" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule" xmlns:cc="http://web.resource.org/cc/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:g-core="http://base.google.com/ns/1.0">
<title>Feed from some link here</title>
<link rel="self" href="https://somelinkhere/folder/?parameter=abc" />
<link rel="first" href="https://somelinkhere/folder/?parameter=abc" />
<id>https://somelinkhere/folder/?parameter=abc</id>
<updated>2018-03-06T17:48:09Z</updated>
<dc:creator>company.com</dc:creator>
<dc:date>2018-03-06T17:48:09Z</dc:date>
<opensearch:totalResults>4</opensearch:totalResults>
I have tried various changes from links like https://twigstechtips.blogspot.com/2013/06/python-lxml-strings-with-encoding.html and http://makble.com/how-to-parse-xml-with-python-and-lxml but I keep running into the same error.
Instead of r.text, which guesses at the text encoding and decodes it, try using r.content which accesses the response body as bytes. (See http://docs.python-requests.org/en/latest/user/quickstart/#response-content.)
You could also use r.raw. See "parsing XML file gets UnicodeEncodeError (ElementTree) / ValueError (lxml)" for more info.
Once that issue is fixed, you'll have the issue of the namespace. The element you're trying to find (opensearch:totalResults) has the prefix opensearch which is bound to the uri http://a9.com/-/spec/opensearch/1.1/.
You can find the element by combining the namespace uri and the local name (Clark notation):
{http://a9.com/-/spec/opensearch/1.1/}totalResults
See http://lxml.de/tutorial.html#namespaces for more info.
Here's an example with both changes implemented:
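# Namespace URI bound to the "opensearch" prefix, written in Clark notation.
opensearch_ns = "{http://a9.com/-/spec/opensearch/1.1/}"
# r.content is the undecoded response body (bytes), which lxml accepts.
root = lxml.etree.fromstring(r.content)
textelem = root.find(opensearch_ns + "totalResults")
print(textelem.text)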
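As an alternative to Clark notation, find() also accepts a prefix-to-URI mapping, so the path can keep the opensearch: prefix. Here is a small sketch that runs with the standard library's ElementTree (lxml's find() takes the same namespaces argument); the content bytes are a trimmed, made-up stand-in for r.content.

```python
import xml.etree.ElementTree as ET

# Hypothetical bytes standing in for r.content (a trimmed version of the feed).
content = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<feed xmlns="http://www.w3.org/2005/Atom" '
    b'xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">'
    b'<title>Feed from some link here</title>'
    b'<opensearch:totalResults>4</opensearch:totalResults>'
    b'</feed>'
)

# Map the prefix used in the path to its namespace URI.
NS = {"opensearch": "http://a9.com/-/spec/opensearch/1.1/"}

root = ET.fromstring(content)
total = root.find("opensearch:totalResults", NS)
# find() returns None when nothing matches, so guard before reading .text.
total_text = total.text if total is not None else None
print(total_text)
```

The prefix in the mapping only has to match the prefix in the path; it does not need to be the one used in the document.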

How to extend XSD scheme for supporting SVG?

I am trying to extend the ISOSTS XSD schema to support SVG image tags.
I found an XSD schema for SVG and put it next to ISOSTS.xsd.
Now I am trying to extend ISOSTS.xsd:
<?xml version="1.0" encoding="utf-8"?>
<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:mml="http://www.w3.org/1998/Math/MathML"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:tbx="urn:iso:std:iso:30042:ed-1"
xmlns:xlink="http://www.w3.org/1999/xlink"
<!-- my line -->
xmlns:svg="http://www.w3.org/2000/svg"
elementFormDefault="qualified">
<xs:import namespace="http://www.w3.org/1998/Math/MathML"
schemaLocation="ncbi-mathml2/mathml2.xsd"/>
<xs:import namespace="http://www.w3.org/1999/xlink"
schemaLocation="xlink.xsd"/>
<!-- XSD import of namespace http://www.w3.org/2001/XMLSchema-instance suppressed (not necessary) -->
<xs:import namespace="http://www.w3.org/XML/1998/namespace"
schemaLocation="xml.xsd"/>
<xs:import namespace="urn:iso:std:iso:30042:ed-1"
schemaLocation="tbx.xsd"/>
<!-- my line -->
<xs:import namespace="http://www.w3.org/2000/svg"
schemaLocation="SVG.xsd"/>
....
<xs:element name="p">
<xs:complexType mixed="true">
<xs:choice minOccurs="0" maxOccurs="unbounded">
<!-- my line --> <xs:element ref="svg:svg"/>
<xs:element ref="email"/>
....
But I get an error when I try to load the schema:
from lxml.etree import parse, XMLSchema
schema_file = open(self._schema_filename)
schema_doc = parse(schema_file)
schema_file.close()
self._xmlschema = XMLSchema(schema_doc) # Error
Error message:
File "src/lxml/xmlschema.pxi", line 87, in lxml.etree.XMLSchema.__init__ (src/lxml/lxml.etree.c:197819)
lxml.etree.XMLSchemaParseError: Element '{http://www.w3.org/2001/XMLSchema}element', attribute 'ref': References from this schema to components in the namespace 'http://www.w3.org/2000/svg' are not allowed, since not indicated by an import statement., line 4664
What is wrong?
The message seems clear enough to me, I'm not sure which part of it you don't understand. Your schema document imports schema components for various namespaces (mathml, xlink, xml, etc) but it makes no attempt to import the schema for SVG, and the error message is telling you so.
I replicated your three modifications (declaring a namespace binding for the SVG namespace, importing the SVG namespace, and referring to the svg:svg element), but got no error from Xerces or Saxon EE.
So it seems to me that you've done everything right.
The error message suggests that your XSD validator is not picking up the import.
If I had to guess (and I suppose I have to, since while you've given a very concise statement of the problem, we don't have a reproducible error), your validator is looking at an interim version of the schema document in which the reference to svg:svg has been added to the content model of p, but the xs:import statement has not yet been added to the beginning of the schema document.
Possibly your Python bytecode is out of date and your Python needs to be recompiled? (Pure conjecture; I don't know how much schema information lxml generates at compile time and how much it generates at run time.)
Problem solved by using the following XSD schema for SVG: https://github.com/dumistoklus/svg-xsd-schema
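For anyone hitting the same message, a quick sanity check is to list the namespaces that ref attributes point into but that no xs:import in the same schema document mentions. This is only a rough sketch using the standard library; the function name and the two sample schemas are made up for illustration, prefix scoping is ignored, and it will not catch the case where the import is present but the imported file fails to load or declares a different targetNamespace.

```python
import io
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"


def unimported_ref_namespaces(xsd_bytes):
    """Return namespaces used in ref="prefix:name" that have no matching xs:import."""
    prefixes = {}       # prefix -> namespace URI, from xmlns declarations
    imported = set()    # namespaces named by xs:import
    referenced = set()  # namespaces reached through prefixed ref attributes
    events = ET.iterparse(io.BytesIO(xsd_bytes), events=("start-ns", "start"))
    for event, data in events:
        if event == "start-ns":
            prefix, uri = data
            prefixes[prefix] = uri
        elif data.tag == "{%s}import" % XS:
            imported.add(data.get("namespace"))
        else:
            ref = data.get("ref", "")
            if ":" in ref:
                referenced.add(prefixes.get(ref.split(":", 1)[0]))
    return referenced - imported


# Hypothetical, heavily trimmed schema that references svg:svg without importing it.
SCHEMA_WITHOUT_IMPORT = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:svg="http://www.w3.org/2000/svg" elementFormDefault="qualified">
  <xs:element name="p">
    <xs:complexType mixed="true">
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element ref="svg:svg"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

# The same schema with the xs:import added before the element declaration.
SCHEMA_WITH_IMPORT = SCHEMA_WITHOUT_IMPORT.replace(
    b'<xs:element name="p">',
    b'<xs:import namespace="http://www.w3.org/2000/svg" schemaLocation="SVG.xsd"/>'
    b'<xs:element name="p">',
)

missing = unimported_ref_namespaces(SCHEMA_WITHOUT_IMPORT)
ok = unimported_ref_namespaces(SCHEMA_WITH_IMPORT)
print(missing, ok)
```

An empty result on the file you actually load suggests the import statement itself is fine and the problem lies in the imported SVG schema.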

python : appending new data in xml is overriding existing data

I want to add an entire tag to my XML. Below is my XML format.
<?xml version="1.0" encoding="UTF-8"?>
<ca st="true" name="XMLConfig">
<app>
<!-- I want to add this entire commented-out tag to the XML:
<ar ty="co" name="st">
<ly ty="pt">
<pt>value</pt>
</Layout>
</ar> -->
<roll name="roll" fN="file.log" fP="logs.gz">
<ly type="ptl">
<pt>value</pt>
</ly>
<po>
<!-- Comment /> -->
<si size="100 MB" />
<!-- Comment /> -->
</po>
<de fI="max" max="10"/>
</roll>
</app>
As shown in the file above, I want to add this tag to the file:
<ar ty="co" name="st">
<ly ty="pt">
<pt>value</pt>
</Layout>
</ar>
This is as far as I have got:
for appenders in tree.xpath('//Appenders'):
    if appenders.getchildren():
        appenders.remove(appenders.getchildren()[0])
    appenders.insert(0, appenders.getparent().append(etree.fromstring('<ar ty="co" name="st"> <ly ty="pt"><pt>value</pt></Layout></ar>')))
This removes all the other content that comes after the new content.
Any help will be appreciated!
In my opinion the first way you did it is way better. You just made some mistakes in your insert line, it should be this:
appenders.insert(0, etree.fromstring('<ar ty="co" name="st"> <ly ty="pt"><pt>value</pt></ly></ar>'))
I'm surprised it didn't throw an error for you because your insert line is basically this:
appenders.insert(0,None)
Also, I noticed something you do in all of your questions:
You leave out some line(s) of your XML file. (Why?)
You shorten the tag names in your XML but keep their long versions in the code, which is annoying because anyone who wants to answer has to change the code again to check whether it works.
I got it working!
for app in tree.xpath('//app'):
    # SubElement appends the new child; insert(0, ...) then moves it to the front.
    app.insert(0, etree.SubElement(app, 'ar', ty="co", name="st"))
for appender in tree.xpath('//ar'):
    appender.insert(0, etree.SubElement(appender, 'ly', ty="pt"))
# Restrict the paths to the new <ar> subtree so the existing <ly> under <roll> is left alone.
for layout in tree.xpath('//ar/ly'):
    layout.insert(0, etree.SubElement(layout, 'pt'))
for pattern in tree.xpath('//ar/ly/pt'):
    pattern.text = 'value'
tree.write(r'C:\value.xml', xml_declaration=True, encoding='UTF-8')
If anyone has a better way to do this, please let me know so I can improve on it.
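One possible simplification is to build the new subtree once and insert it in a single step, instead of running a separate XPath loop per nesting level. A minimal sketch with the standard library's ElementTree (Element, SubElement and insert behave the same way in lxml for this purpose); the config string is a trimmed version of the file from the question and build_ar is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical version of the config file from the question.
config = """<ca st="true" name="XMLConfig">
<app>
<roll name="roll" fN="file.log" fP="logs.gz">
<ly type="ptl"><pt>value</pt></ly>
</roll>
</app>
</ca>"""

root = ET.fromstring(config)


def build_ar():
    """Build <ar ty="co" name="st"><ly ty="pt"><pt>value</pt></ly></ar>."""
    ar = ET.Element("ar", {"ty": "co", "name": "st"})
    ly = ET.SubElement(ar, "ly", {"ty": "pt"})
    ET.SubElement(ly, "pt").text = "value"
    return ar


# Insert a fresh subtree as the first child of every <app>; existing children stay.
for app in root.iter("app"):
    app.insert(0, build_ar())

child_tags = [child.tag for child in root.find("app")]
print(child_tags)
print(ET.tostring(root, encoding="unicode"))
```

Because the subtree is complete before it is attached, nothing outside the new <ar> element is touched.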

Python ElementTree does not like colon in name of processing instruction

The following code:
import xml.etree.ElementTree as ET
xml = '''\
<?xml version="1.0" encoding="UTF-8"?>
<testCaseConfig>
<?LazyComment Blah de blah/?>
<testCase runLimit="420" name="d1/n1"/>
<testCase runLimit="420" name="d1/n2"/>
</testCaseConfig>'''
root = ET.fromstring(xml)
xml2 = xml.replace('LazyComment ', 'LazyComment:')
print(xml2)
try:
    root2 = ET.fromstring(xml2)
except ET.ParseError:
    print("\nERROR in xml2!!!\n")
xml3 = xml2.replace('testCaseConfig', 'testCaseConfig xmlns:Blah="http://www.w3.org/TR/html4/"', 1)
print(xml3)
try:
    root3 = ET.fromstring(xml3)
except ET.ParseError:
    print("\nERROR in xml3!!!\n")
    raise
Gives this output:
<?xml version="1.0" encoding="UTF-8"?>
<testCaseConfig>
<?LazyComment:Blah de blah/?>
<testCase runLimit="420" name="d1/n1"/>
<testCase runLimit="420" name="d1/n2"/>
</testCaseConfig>
ERROR in xml2!!!
<?xml version="1.0" encoding="UTF-8"?>
<testCaseConfig xmlns:Blah="http://www.w3.org/TR/html4/">
<?LazyComment:Blah de blah/?>
<testCase runLimit="420" name="d1/n1"/>
<testCase runLimit="420" name="d1/n2"/>
</testCaseConfig>
ERROR in xml3!!!
Traceback (most recent call last):
File "C:\Users\Paddy3118\Google Drive\Code\elementtree_error.py", line 30, in <module>
root3 = ET.fromstring(xml3)
File "C:\Anaconda3\envs\Py3.5\lib\xml\etree\ElementTree.py", line 1333, in XML
parser.feed(text)
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 3, column 17
I searched and found a question that pointed to other resources, which I read.
It seems that the '?' makes it a processing instruction, whose target name can include colons. Without the '?', a colon in a name indicates a namespace, and one of the answers stated that defining the namespace should make things work.
Combining '?' and ':', though, causes issues with ElementTree.
I am given XML files of this type, which other tools parse without problems, and I want to process the files myself using Python. Any ideas?
Thanks.
According to the W3C Extensible Markup Language 1.0 Specifications under Common Syntactic Constructs:
The Namespaces in XML Recommendation [XML Names] assigns a meaning to
names containing colon characters. Therefore, authors should not use
the colon in XML names except for namespace purposes, but XML
processors must accept the colon as a name character.
And further in the W3C XPath 1.0 note on Processing Instruction nodes:
A processing instruction has an expanded-name: the local part is the
processing instruction's target; the namespace URI is null.
Altogether, <?LazyComment:Blah de blah/?> is an invalid processing instruction: colons are reserved for namespace prefixes, and for processing instructions the namespace part is null or empty. (The Namespaces in XML Recommendation also states that processing instruction targets must not contain colons.) Therefore, Python's namespace-aware XML parser complains that a document containing such an instruction is not well-formed.
Also, reconsider the tools that generate such invalid processing instructions, as they are not producing valid XML documents. Possibly, these tools treat XML files as plain text (similar to the way you were able to edit the string representation of the XML, but would not have been able to append such an instruction using etree).
<?xml version="1.0" encoding="UTF-8"?>
<testCaseConfig xmlns:Blah="http://www.w3.org/TR/html4/">
<?LazyComment:Blah de blah/?>
<testCase runLimit="420" name="d1/n1"/>
<testCase runLimit="420" name="d1/n2"/>
</testCaseConfig xmlns:Blah="http://www.w3.org/TR/html4/">
Is invalid XML. You can't have attributes in the closing tag. The last line should be just </testCaseConfig>
Also comments are written like this
<!-- this is a comment -->
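If the files cannot be changed at the source, one workaround is to rewrite the offending targets in the text before handing it to ElementTree. This is a rough sketch, not a general solution: the helper name sanitize_pi_targets and the choice of an underscore as the replacement are assumptions, and the regular expression would also touch anything that looks like a processing instruction inside comments or CDATA sections.

```python
import re
import xml.etree.ElementTree as ET

# Matches "<?" followed by the target name (everything up to whitespace, "?" or ">").
PI_TARGET = re.compile(r"<\?([^\s?>]+)")


def sanitize_pi_targets(xml_text):
    """Replace colons in processing-instruction targets with underscores."""
    return PI_TARGET.sub(
        lambda m: "<?" + m.group(1).replace(":", "_"), xml_text
    )


xml2 = '''<?xml version="1.0" encoding="UTF-8"?>
<testCaseConfig>
<?LazyComment:Blah de blah/?>
<testCase runLimit="420" name="d1/n1"/>
<testCase runLimit="420" name="d1/n2"/>
</testCaseConfig>'''

cleaned = sanitize_pi_targets(xml2)
root = ET.fromstring(cleaned)
names = [tc.get("name") for tc in root.findall("testCase")]
print(names)
```

The XML declaration is left alone because its target contains no colon. Note that ElementTree's default parser does not keep processing instructions in the tree anyway, so the renamed instruction only matters if you write the cleaned text back out yourself.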

WordPress XML ParseError: unbound prefix?

I am trying to use Python's xml.etree.ElementTree.parse() function to parse an XML file I created by exporting all of the content from a WordPress blog. However, when I try like so:
import xml.etree.ElementTree as xml
tree = xml.parse('/path/to/file.xml')
I get the following error:
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1183, in parse
tree.parse(source, parser)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 656, in parse
parser.feed(data)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1643, in feed
self._raiseerror(v)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1507, in _raiseerror
raise err
ParseError: unbound prefix: line 189, column 1
Here's what's on line 189 of my XML file:
<atom:link rel="search" type="application/opensearchdescription+xml" href="http://blogname.wordpress.com/osd.xml" title="blog name" />
I've seen many questions about this error coming up with Android development, but I can't tell if and how that applies to my situation. Can anyone help with this?
Apologies to everyone for whom this was stupidly obvious, but it turns out I simply didn't have a namespace definition for "atom" in the document. I'm guessing that "unbound prefix" means that the prefix "atom" wasn't "bound" to a namespace definition?
Anyway, adding said definition has solved the problem. Although it makes me wonder why WordPress exports XML files without proper definitions for all of the namespaces they use...
If you remove all the namespace prefixes, it works absolutely fine.
Change
<s:home>USA</s:home>
to
<home>USA</home>
Just in case it helps someone some day, I was also working with a WordPress XML export (WordPress eXtended RSS) file in Python and was getting the same error. In my case, WordPress had included most of the correct namespace definitions. However, the XML had iTunes podcast information as well, and the iTunes namespace declaration was not present.
I fixed it by adding xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" into the RSS declaration block. So this:
<!-- generator="WordPress/4.9.8" created="2018-08-06 03:12" -->
<rss version="2.0"
xmlns:excerpt="http://wordpress.org/export/1.2/excerpt/"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:wp="http://wordpress.org/export/1.2/"
>
became this:
<!-- generator="WordPress/4.9.8" created="2018-08-06 03:12" -->
<rss version="2.0"
xmlns:excerpt="http://wordpress.org/export/1.2/excerpt/"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:wp="http://wordpress.org/export/1.2/"
xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd"
>
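Since the parser stops at the first unbound prefix, it can be handy to list every missing declaration up front. Here is a rough text-based sketch; the function name and sample document are made up, it only looks at element prefixes (not attribute prefixes), and it ignores scoping, comments and CDATA, so treat the result as a hint rather than a verdict.

```python
import re


def undeclared_prefixes(xml_text):
    """Return element prefixes that are used but never declared via xmlns:prefix."""
    declared = set(re.findall(r'xmlns:([A-Za-z_][\w.\-]*)\s*=', xml_text))
    used = set(re.findall(r'</?([A-Za-z_][\w.\-]*):[A-Za-z_]', xml_text))
    # The "xml" prefix is predefined and never needs a declaration.
    return used - declared - {"xml"}


# Hypothetical export fragment: "atom" is used but not declared.
sample = '''<rss version="2.0"
 xmlns:dc="http://purl.org/dc/elements/1.1/"
 xmlns:wp="http://wordpress.org/export/1.2/">
<channel>
<atom:link rel="search" href="http://blogname.wordpress.com/osd.xml" />
<dc:creator>someone</dc:creator>
<wp:wxr_version>1.2</wp:wxr_version>
</channel>
</rss>'''

missing = undeclared_prefixes(sample)
print(missing)
```

Each prefix it reports needs a matching xmlns:prefix="..." attribute on the root element (or an ancestor of where it is used) before the file will parse.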
