Using owl:Class prefix with rdflib and xml serialization - python

I would like to use the owl: prefix in the XML serialization of my RDF ontology (using rdflib version 4.1.1); unfortunately, I'm still getting the serialization as rdf:Description tags. I have looked at the answer about binding the namespace to the graph at RDFLib: Namespace prefixes in XML serialization, but this seems to work only when serializing using the n3 format rather than the xml format.
Let's be more concrete. I'm attempting to get the following ontology (as taken from Introducing RDFS and OWL) in XML as follows:
<!-- OWL Class Definition - Plant Type -->
<owl:Class rdf:about="http://www.linkeddatatools.com/plants#planttype">
<rdfs:label>The plant type</rdfs:label>
<rdfs:comment>The class of all plant types.</rdfs:comment>
</owl:Class>
Here is the python code for constructing such a thing, using rdflib:
from rdflib.namespace import OWL, RDF, RDFS
from rdflib import Graph, Literal, Namespace, URIRef
# Construct the linked data tools namespace
LDT = Namespace("http://www.linkeddatatools.com/plants#")
# Create the graph
graph = Graph()
# Create the node to add to the Graph
Plant = URIRef(LDT["planttype"])
# Add the OWL data to the graph
graph.add((Plant, RDF.type, OWL.Class))
graph.add((Plant, RDFS.subClassOf, OWL.Thing))
graph.add((Plant, RDFS.label, Literal("The plant type")))
graph.add((Plant, RDFS.comment, Literal("The class of all plant types")))
# Bind the OWL and LDT name spaces
graph.bind("owl", OWL)
graph.bind("ldt", LDT)
print graph.serialize(format='xml')
Sadly, even with those bind statements, the following XML is printed:
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
>
<rdf:Description rdf:about="http://www.linkeddatatools.com/plants#planttype">
<rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
<rdfs:label>The plant type</rdfs:label>
<rdfs:comment>The class of all plant types</rdfs:comment>
<rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Class"/>
</rdf:Description>
</rdf:RDF>
Granted, this is still an Ontology, and usable - but since we have various editors, the much more compact and readable first version using the owl prefix would be far preferable. Is it possible to do this in rdflib without overriding the serialization method?
Update
In response to the comments, I'll rephrase my "bonus question" as a simple clarification of my question at large.
Not a Bonus Question: The topic here is the construction of an ontology in the OWL-flavored notation, which is a shorthand for the more verbose RDF/XML form. The issue is larger, though, than declaring a namespace prefix as shorthand for Classes or Properties only; there are many shorthand notations that have to be dealt with in code. For example, owl:Ontology descriptions should be added to this notation as good form. I am hoping that rdflib supports the complete notation, so that I don't have to roll my own serialization.

Instead of using the xml format, you need to use the pretty-xml format. It's listed in the documentation, Plugin serializers. That will give you the type of output that you're looking for. That is, you'd use a line like the following in order to use the PrettyXMLSerializer:
print graph.serialize(format='pretty-xml')
To address the "bonus question", you can add a line like the following to create the ontology header, and then serializing with pretty-xml will give you the following output.
graph.add((URIRef('https://stackoverflow.com/q/24017320/1281433/ontology.owl'), RDF.type, OWL.Ontology ))
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
>
<owl:Ontology rdf:about="https://stackoverflow.com/q/24017320/1281433/ontology.owl"/>
<owl:Class rdf:about="http://www.linkeddatatools.com/plants#planttype">
<rdfs:comment>The class of all plant types</rdfs:comment>
<rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
<rdfs:label>The plant type</rdfs:label>
</owl:Class>
</rdf:RDF>
Adding the x rdf:type owl:Ontology triple isn't a very OWL-centric way of declaring the ontology though. It sounds like you're looking for something more like Jena's OntModel interface (which is just a convenience layer over Jena's RDF-centric Model), or the OWLAPI, but for RDFLib. I don't know whether such a thing exists (I'm not an RDFlib user), but you might have a look at:
RDFLib/OWL-RL: It looks like a reasoner, but it might have some of the methods that you need.
Inspecting an ontology with RDFLib: a blog article with links to source that might do some of what you want.
Is there a Python library to handle OWL?: A Stack Overflow question (now off-topic, because library/tool requests are off-topic, but it's an old question) where the accepted answer points out that rdflib is RDF-centric, not OWL-centric. Some of the other answers might be useful, particularly this one, although most of those were outdated, even in 2011.

Related

XMLSigner - sign multiple references

I'm developing a digital signature system and I have a question about the XMLSigner library. I could not find my answer by looking at documentation, so I'm gonna ask here.
I have an XML file that I need to sign, and it has more than one reference, but there's a problem: the elements don't have an ID that a reference could point to. The XML that I need to sign is something like:
<AppHdr>
<Fr>
<FIId>
<FinInstnId>
<Othr>
<Id>00038166</Id>
</Othr>
</FinInstnId>
</FIId>
</Fr>
(more content...)
</AppHdr>
<Document>
(more content...)
</Document>
I've extracted both AppHdr and Document, and made the signature using each one, and my idea was to put them together later in another xml file, already canonicalized and encrypted, using:
signed_app_hdr = XMLSigner(
    method=methods.enveloped,
    signature_algorithm='rsa-sha256',
    digest_algorithm='sha256',
    c14n_algorithm="http://www.w3.org/2001/10/xml-exc-c14n#",
).sign(et_app_hdr, key=rsa_key)
signed_document_info = XMLSigner(
    signature_algorithm='rsa-sha256',
    digest_algorithm='sha256',
    c14n_algorithm="http://www.w3.org/2001/10/xml-exc-c14n#",
).sign(et_document, key=rsa_key)
The digital signature output (standardized by the institution that I'm sending the message to) requires
<Reference URI="">
to reference the <AppHdr>
And
<Reference>
to reference the <Document>
There's also a <KeyInfo>, that is referenced by an id (no issues in this case). I'm just using sign(et_key_info, key=rsa_key, reference_uri='key-info-id') to reference it.
So, my question is: how do I reference the AppHdr and the Document through reference_uri? Is it possible? When I leave reference_uri = None (the default), it creates <Reference URI="">, which would be no problem for the AppHdr. But what about the Document? Could I create artificial IDs for them and remove them later? I don't know whether that would have implications for the cryptography.
Thanks in advance!

How can I register multiple default namespaces when modifying an xml file using python?

I have looked through many namespace documents on here and am only able to slightly relate to a few. In my document, I have 3 defaults and only one colon-style xmlns, for example:
xmlns="someurlNo1"
xmlns:spatial="someurlNo2"
xmlns="someurlNo3"
xmlns="someurlNo4"
From what I have read, it seems that I have 3 defaults (please correct me if I am interpreting this wrong). When I modify my base XML and then write my new XML, the only way I can keep the first two prefixes, ns0 and ns1, from showing up is to comment out the last two registrations. Everything that belongs to the last two defaults is then labeled with "ns2" and "ns3", even if I register all of them like this:
ET.register_namespace('',"someurlNo0") #ns0
ET.register_namespace('spatial',"someurlNo1") #ns1
#ET.register_namespace('',"someurlNo2") #ns2
#ET.register_namespace('',"someurlNo3") #ns3
Does anyone know how to register the last two default namespaces correctly? When I leave the last two not commented out, ns0 and ns2 appear where they should, and while all the ns3s disappear, the default is no longer equal to "someurlNo3".
This answer: https://stackoverflow.com/a/43530940 has so far been the most helpful explanation to me in illustrating that there may be multiple defaults that travel down (which I believe is true for my document), but I am still unsure how to properly register them. Any ideas would be much appreciated!
Here is what the top part of my xml looks like that includes all 4 namespaces. I'd rather spare you from seeing all 3k lines but if needed I can share more:
<?xml version='1.0' encoding='UTF-8' standalone='no'?>
<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" spatial:required="true" version="1" xmlns:spatial="http://www.sbml.org/sbml/level3/version1/spatial/version1">
<notes>
<body xmlns="http://www.w3.org/1999/xhtml">
<p>Exported by VCell 7.3</p>
</body>
</notes>
<model areaUnits="um2" extentUnits="molecules" id="_zero_6_29_21_Phase1_cellularConcAgain_Spatial" lengthUnits="um" name="06_29_21_Phase1_cellularConcAgain_Spatial" substanceUnits="molecules" timeUnits="s" volumeUnits="um3">
<spatial:geometry xmlns:spatial="http://www.sbml.org/sbml/level3/version1/spatial/version1" id="vcell" spatial:coordinateSystem="cartesian" spatial:id="vcell">
<spatial:listOfCoordinateComponents>
<spatial:coordinateComponent id="x" spatial:id="x" spatial:type="cartesianX" spatial:unit="um">
<spatial:boundaryMin id="Xmin" spatial:id="Xmin" spatial:value="0.0"/>
<spatial:boundaryMax id="Xmax" spatial:id="Xmax" spatial:value="1.6"/>
</spatial:coordinateComponent>
<spatial:coordinateComponent id="y" spatial:id="y" spatial:type="cartesianY" spatial:unit="um">
<spatial:boundaryMin id="Ymin" spatial:id="Ymin" spatial:value="0.0"/>
<spatial:boundaryMax id="Ymax" spatial:id="Ymax" spatial:value="3.5"/>
</spatial:coordinateComponent>
</spatial:listOfCoordinateComponents>
<spatial:listOfDomains>
<spatial:domain id="chr0" spatial:domainType="domainType_chr" spatial:id="chr0">
<spatial:listOfInteriorPoints>
<spatial:interiorPoint spatial:coord1="0.0" spatial:coord2="0.0" spatial:coord3="5.0"/>
</spatial:listOfInteriorPoints>
</spatial:domain>
</spatial:listOfDomains>
<spatial:listOfDomainTypes>
<spatial:domainType id="domainType_chr" spatial:id="domainType_chr" spatial:spatialDimensions="3"/>
</spatial:listOfDomainTypes>
<spatial:listOfGeometryDefinitions>
<spatial:analyticGeometry id="Analytic_Geometry1640227629" spatial:id="Analytic_Geometry1640227629" spatial:isActive="true">
<spatial:listOfAnalyticVolumes>
<spatial:analyticVolume spatial:domainType="domainType_chr" spatial:functionType="layered" spatial:id="chr" spatial:ordinal="0">
<math xmlns="http://www.w3.org/1998/Math/MathML">
<apply>
<neq/>
<cn> 0 </cn>
<cn> 1 </cn>
</apply>
</math>
You're misunderstanding namespaces. That a namespace is "default" is not a property of that namespace, it's a property of the XML in that location.
Don't get distracted. Give all namespace URIs that you're going to use a prefix, done.
import xml.etree.ElementTree as ET

# Prefix-to-URI map used only for lookups in this code
ns = {
    'a': 'someurlNo0',
    'b': 'someurlNo1',
    'c': 'someurlNo2',
    'd': 'someurlNo3',
    'e': 'someurlNo3',  # same URI as above, perfectly legal
}
tree = ET.parse('path/to.xml')
tree.findall("./a:node/b:node/c:node/d:node/e:node", namespaces=ns)
It does not even need to be the same prefix as in your XML. In fact, avoid that. Give namespace URIs prefixes that make reading your code easy. All that matters in the end is the namespace URI, the prefix is ephemeral.
As long as the prefixes in your code resolve to the actual namespace URI of the targeted nodes, you're good. It does not matter if the namespaces are default at the specific location in the XML.
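To make the point above concrete, here is a minimal sketch using a cut-down stand-in for the SBML file from the question. The prefixes core, h and m in the lookup map, and xhtml and mathml in the register_namespace calls, are arbitrary choices for illustration, not anything the document requires.

```python
import xml.etree.ElementTree as ET

SBML = "http://www.sbml.org/sbml/level3/version1/core"
XHTML = "http://www.w3.org/1999/xhtml"
MATHML = "http://www.w3.org/1998/Math/MathML"

# Cut-down stand-in for the question's file: three different default namespaces
DOC = f"""<sbml xmlns="{SBML}" level="3" version="1">
  <notes>
    <body xmlns="{XHTML}"><p>Exported by VCell 7.3</p></body>
  </notes>
  <model id="m">
    <math xmlns="{MATHML}"><apply><neq/></apply></math>
  </model>
</sbml>"""

# Lookup prefixes are local to this code; only the URIs have to match
ns = {"core": SBML, "h": XHTML, "m": MATHML}
root = ET.fromstring(DOC)
paragraph = root.find("core:notes/h:body/h:p", ns)
neq = root.find("core:model/m:math/m:apply/m:neq", ns)
print(paragraph.text)

# For writing, ElementTree keeps one prefix per URI, so only one URI can be
# the unprefixed one; the others get explicit (hypothetical) prefixes here
ET.register_namespace("", SBML)
ET.register_namespace("xhtml", XHTML)
ET.register_namespace("mathml", MATHML)
output = ET.tostring(root, encoding="unicode")
print(output)
```

The written output uses explicit prefixes where the original relied on nested default namespaces; to a namespace-aware parser the two forms describe the same elements.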

How to check equivalence of two XML documents?

From my program I call a command line XSLT processor (such as Saxon or xsltproc).
Then (for testing purposes) I want to compare the output of the processor with a predefined string.
The trouble is that XML can be formatted differently. The following three are different strings:
<?xml version="1.0" encoding="utf-8"?>
<x/>
<?xml version="1.0"?>
<x/>
<?xml version="1.0"?>
<x
/>
How can I check that the output from different XSLT processors matches a given XML string?
Maybe there is a way (not necessarily standardized) to make different XSLT processors output exactly the same string?
I use Python 3.
Have you looked at using a testing framework like XSpec that already addresses this issue?
Typically the two classic ways of solving this are to compare the serialized XML lexically after putting it through a canonicalizer, or to compare the tree representations using a function such as XPath 2.0 deep-equal().
Neither of these is a perfect answer. Firstly, the things which XML canonicalization considers to be significant or insignificant may not be the same as the things you consider significant or insignificant; and the same goes for XPath deep-equal(). Secondly, you really want to know not just whether the files are the same, but where the differences are.
Saxon has an enhanced version of deep-equal() called saxon:deep-equal() designed to address these issues: it takes a set of flags that can be used to customize the comparison, and it tries to tell you where the differences are in terms of warning messages. But it's not a perfect solution either.
In the W3C test suites for XSLT 3.0 and XQuery we've moved away from comparing XML outputs of tests to writing assertions against the expected results in terms of XPath expressions. The tests use assertions like this:
<result>
<all-of>
<assert>every $a in /out/* except /out/a4
satisfies $a/@actual = $a/@expected</assert>
<assert>/out/a4/@actual = 'false'</assert>
</all-of>
</result>
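The first of the two classic approaches, comparing canonicalized serializations, can be sketched in Python 3.8+ with the standard library's xml.etree.ElementTree.canonicalize. The helper name xml_equal and the choice of strip_text=True (which ignores leading and trailing whitespace in text nodes) are assumptions for illustration; as noted above, what canonicalization treats as significant may not match what you consider significant.

```python
from xml.etree.ElementTree import canonicalize


def xml_equal(got: str, want: str) -> bool:
    """Return True if both XML strings have the same C14N 2.0 canonical form."""
    # strip_text=True is a choice made for this sketch, not a requirement
    return canonicalize(got, strip_text=True) == canonicalize(want, strip_text=True)


# The three variants from the question
a = '<?xml version="1.0" encoding="utf-8"?>\n<x/>'
b = '<?xml version="1.0"?>\n<x/>'
c = '<?xml version="1.0"?>\n<x\n/>'

print(xml_equal(a, b), xml_equal(b, c))
print(xml_equal(a, "<y/>"))
```

This only answers whether the documents match, not where they differ, so a failing test still needs a separate diff step.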
Do you care about the order? If not:
Convert both documents to dictionaries, then run deepdiff on them!
It can be easily done with minidom:
from unittest import TestCase
from defusedxml.minidom import parseString

class XmlTest(TestCase):
    def assertXmlEqual(self, got, want):
        # Re-serialize both documents so that declaration and tag-whitespace differences disappear
        return self.assertEqual(parseString(got).toxml(), parseString(want).toxml())

How to access attribute value in xml containing namespace using ElementTree in python

XML file:
<?xml version="1.0" encoding="iso-8859-1"?>
<rdf:RDF xmlns:cim="http://iec.ch/TC57/2008/CIM-schema-cim13#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<cim:Terminal rdf:ID="A_T1">
<cim:Terminal.ConductingEquipment rdf:resource="#A_EF2"/>
<cim:Terminal.ConnectivityNode rdf:resource="#A_CN1"/>
</cim:Terminal>
</rdf:RDF>
I want to get the attribute value of the Terminal.ConnectivityNode element, and also the attribute value of the Terminal element, as output from the above XML. I have tried the following:
Python code:
from elementtree import ElementTree as etree
tree= etree.parse(r'N:\myinternwork\files xml of bus systems\cimxmleg.xml')
cim= "{http://iec.ch/TC57/2008/CIM-schema-cim13#}"
rdf= "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
Appending the line below to the code
print tree.find('{0}Terminal'.format(cim)).attrib
output1: as expected
{'{http://www.w3.org/1999/02/22-rdf-syntax-ns#}ID': 'A_T1'}
If we instead append the line below to the code
print tree.find('{0}Terminal'.format(cim)).attrib['rdf:ID']
output2: KeyError on rdf:ID
If we append the line below to the code
print tree.find('{0}Terminal/{0}Terminal.ConductivityEquipment'.format(cim))
output3: None
How do I get output2 as A_T1 and output3 as #A_CN1?
What is the significance of {0} in the above code? I found on the net that it must be used, but I didn't understand what it does.
First off, the {0} you're wondering about is part of the syntax for Python's built-in string formatting facility. The Python documentation has a fairly comprehensive guide to the syntax. In your case, it simply gets substituted by cim, which results in the string {http://iec.ch/TC57/2008/CIM-schema-cim13#}Terminal.
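As a small illustration of that substitution, the sketch below builds the two lookup strings used in this answer; the variable names tag and path are just for this example.

```python
cim = "{http://iec.ch/TC57/2008/CIM-schema-cim13#}"

# {0} refers to the first positional argument of format(); it may appear
# several times in the template and is replaced each time
tag = "{0}Terminal".format(cim)
path = "{0}Terminal/{0}Terminal.ConnectivityNode".format(cim)

print(tag)
print(path)
```

The curly braces inside cim itself are left alone, because only the template string is scanned for replacement fields.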
The problem here is that ElementTree is a bit silly about namespaces. Instead of being able to simply supply the namespace prefix (like cim: or rdf:), you have to supply the full namespace URI in curly braces. This means that rdf:ID becomes {http://www.w3.org/1999/02/22-rdf-syntax-ns#}ID, which is very clunky.
ElementTree does support a way to use the namespace prefix for finding tags, but not for attributes. This means you'll have to expand rdf: to {http://www.w3.org/1999/02/22-rdf-syntax-ns#} yourself.
In your case, it could look as follows (note also that ID is case-sensitive):
tree.find('{0}Terminal'.format(cim)).attrib['{0}ID'.format(rdf)]
Those substitutions expand to:
tree.find('{http://iec.ch/TC57/2008/CIM-schema-cim13#}Terminal').attrib['{http://www.w3.org/1999/02/22-rdf-syntax-ns#}ID']
With those hoops jumped through, it works (note that the ID is A_T1 and not #A_T1, however). Of course, this is all really annoying to have to deal with, so you could also switch to lxml and have it mostly handled for you.
Your third case doesn't work simply because 1) the element is named Terminal.ConductingEquipment and not Terminal.ConductivityEquipment, and 2) if you really want #A_CN1 and not #A_EF2, that's the ConnectivityNode and not the ConductingEquipment. You can get #A_CN1 with tree.find('{0}Terminal/{0}Terminal.ConnectivityNode'.format(cim)).attrib['{0}resource'.format(rdf)].
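Putting the pieces together, here is a self-contained Python 3 sketch that uses the standard library's xml.etree.ElementTree and embeds the XML from the question as a bytes literal instead of reading it from disk. The NS dictionary and the variable names are choices made for this example; the prefix map works for element lookups, while the attribute keys still need the expanded {URI}name form.

```python
import xml.etree.ElementTree as ET

XML = b"""<?xml version="1.0" encoding="iso-8859-1"?>
<rdf:RDF xmlns:cim="http://iec.ch/TC57/2008/CIM-schema-cim13#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<cim:Terminal rdf:ID="A_T1">
<cim:Terminal.ConductingEquipment rdf:resource="#A_EF2"/>
<cim:Terminal.ConnectivityNode rdf:resource="#A_CN1"/>
</cim:Terminal>
</rdf:RDF>"""

NS = {
    "cim": "http://iec.ch/TC57/2008/CIM-schema-cim13#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}

root = ET.fromstring(XML)

# Element lookups can use prefixes through the namespaces mapping
terminal = root.find("cim:Terminal", NS)
node = terminal.find("cim:Terminal.ConnectivityNode", NS)

# Attribute keys must be spelled out as {namespace-URI}local-name
terminal_id = terminal.attrib[f"{{{NS['rdf']}}}ID"]
node_ref = node.get(f"{{{NS['rdf']}}}resource")

print(terminal_id)  # A_T1
print(node_ref)     # #A_CN1
```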

RDFLib: Namespace prefixes in XML serialization

In a Python script using RDFLib 3.0, I get the following XML-Output when serializing my triples:
<rdf:RDF
xmlns:_3="http://www.my-example.intra/ontologies/ci.owl#"
>
How can I define specific namespace prefixes for those anonymous _x-prefixes automatically assigned by RDFLib (or its XML serializer, respectively)?
<rdf:RDF
xmlns:ex="http://www.my-example.intra/ontologies/ci.owl#"
>
Many thanks in advance for your responses!
I eventually found a solution to this by looking at some (quite messily distributed) rdflib doc files. For the (Conjunctive)Graph storing the triples, call
mygraph.bind(prefix, URIRef(url))
i.e.
mygraph.bind('ex', URIRef('http://www.my-example.intra/ontologies/ci.owl#'))
The optional third argument (override, True by default) controls whether an existing prefix binding for the namespace is replaced; passing False keeps the existing binding.
