XMLSigner - sign multiple references - python

I'm developing a digital signature system and I have a question about the XMLSigner library. I could not find an answer in the documentation, so I'm asking here.
I have an XML file that I need to sign. It needs more than one reference, but there's a problem: the elements to be referenced don't have an ID that a reference could point to. The XML that I need to sign is something like:
<AppHdr>
<Fr>
<FIId>
<FinInstnId>
<Othr>
<Id>00038166</Id>
</Othr>
</FinInstnId>
</FIId>
</Fr>
(more content...)
</AppHdr>
<Document>
(more content...)
</Document>
I've extracted both AppHdr and Document and signed each one separately. My idea was to put them together later in another XML file, already canonicalized and encrypted, using:
from signxml import XMLSigner, methods

signed_app_hdr = XMLSigner(
    method=methods.enveloped,
    signature_algorithm='rsa-sha256',
    digest_algorithm='sha256',
    c14n_algorithm="http://www.w3.org/2001/10/xml-exc-c14n#",
).sign(et_app_hdr, key=rsa_key)
signed_document_info = XMLSigner(
    signature_algorithm='rsa-sha256',
    digest_algorithm='sha256',
    c14n_algorithm="http://www.w3.org/2001/10/xml-exc-c14n#",
).sign(et_document, key=rsa_key)
The digital signature output (standardized by the institution I'm sending the message to) requires
<Reference URI="">
to reference the <AppHdr>, and
<Reference>
to reference the <Document>.
There's also a <KeyInfo> that is referenced by an ID (no issues in this case). I'm just using sign(et_key_info, key=rsa_key, reference_uri='key-info-id') to reference it.
So, my question is: how do I reference the AppHdr and the Document through reference_uri? Is it possible? When I leave reference_uri = None (the default), it creates <Reference URI="">, which would be no problem for the AppHdr. But what about the Document? What should I do? Could I create artificial IDs for them and remove them later? I don't know whether that would have cryptographic implications.
Thanks in advance!
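On the "artificial ID" idea: each Reference carries a digest of the canonical form of the element it points to, so any attribute added to or removed from that element after signing changes the digest. The sketch below illustrates this with the standard library only. It assumes SHA-256 and uses `ET.canonicalize` (C14N 2.0, Python 3.8+) as a stand-in for the exclusive canonicalization named in the question; `digest_value` is a hypothetical helper, not part of XMLSigner.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET


def digest_value(element):
    """Base64 SHA-256 digest of the element's canonical form.

    Assumption: C14N 2.0 is used here only as a stand-in for exc-c14n.
    """
    canonical = ET.canonicalize(ET.tostring(element, encoding="unicode"))
    raw = hashlib.sha256(canonical.encode("utf-8")).digest()
    return base64.b64encode(raw).decode("ascii")


# Hypothetical, shortened Document element.
document = ET.fromstring("<Document><Amt>10</Amt></Document>")

without_id = digest_value(document)

document.set("Id", "doc-1")          # artificial ID added before signing
with_id = digest_value(document)

del document.attrib["Id"]            # ID removed again afterwards
after_removal = digest_value(document)

print(without_id == with_id)         # digests differ once the ID is present
print(without_id == after_removal)   # and match again once it is removed
```

In other words, a digest computed while the ID was present would no longer match the element once the ID is stripped, so a verifier recomputing the digest would reject the signature.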


How can I register multiple default namespaces when modifying an xml file using python?

I have looked through many namespace documents on here and am only able to slightly relate to a few. In my document, I have 3 defaults and only one colon-style xmlns, for example:
xmlns="someurlNo1"
xmlns:spatial="someurlNo2"
xmlns="someurlNo3"
xmlns="someurlNo4"
From what I have read, it seems that I have 3 defaults (please correct me if I am interpreting this wrong). When I modify my base xml and then write my new xml, the only way I can keep the first two prefixes, ns0 and ns1, from showing up is to comment out the last two registrations, which leaves everything that belongs to the last two defaults labeled with "ns2" and "ns3". This is how I register them:
ET.register_namespace('',"someurlNo0") #ns0
ET.register_namespace('spatial',"someurlNo1") #ns1
#ET.register_namespace('',"someurlNo2") #ns2
#ET.register_namespace('',"someurlNo3") #ns3
Does anyone know how to register the last two default namespaces correctly? When I leave the last two not commented out, ns0 and ns2 appear where they should, and while all the ns3s disappear, the default is no longer equal to "someurlNo3".
This answer: https://stackoverflow.com/a/43530940 has so far been the most helpful explanation to me in illustrating that there may be multiple defaults that travel down (which I believe is true for my document), but I am still unsure how to properly register them. Any ideas would be much appreciated!
Here is what the top part of my xml looks like that includes all 4 namespaces. I'd rather spare you from seeing all 3k lines but if needed I can share more:
<?xml version='1.0' encoding='UTF-8' standalone='no'?>
<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" spatial:required="true" version="1" xmlns:spatial="http://www.sbml.org/sbml/level3/version1/spatial/version1">
<notes>
<body xmlns="http://www.w3.org/1999/xhtml">
<p>Exported by VCell 7.3</p>
</body>
</notes>
<model areaUnits="um2" extentUnits="molecules" id="_zero_6_29_21_Phase1_cellularConcAgain_Spatial" lengthUnits="um" name="06_29_21_Phase1_cellularConcAgain_Spatial" substanceUnits="molecules" timeUnits="s" volumeUnits="um3">
<spatial:geometry xmlns:spatial="http://www.sbml.org/sbml/level3/version1/spatial/version1" id="vcell" spatial:coordinateSystem="cartesian" spatial:id="vcell">
<spatial:listOfCoordinateComponents>
<spatial:coordinateComponent id="x" spatial:id="x" spatial:type="cartesianX" spatial:unit="um">
<spatial:boundaryMin id="Xmin" spatial:id="Xmin" spatial:value="0.0"/>
<spatial:boundaryMax id="Xmax" spatial:id="Xmax" spatial:value="1.6"/>
</spatial:coordinateComponent>
<spatial:coordinateComponent id="y" spatial:id="y" spatial:type="cartesianY" spatial:unit="um">
<spatial:boundaryMin id="Ymin" spatial:id="Ymin" spatial:value="0.0"/>
<spatial:boundaryMax id="Ymax" spatial:id="Ymax" spatial:value="3.5"/>
</spatial:coordinateComponent>
</spatial:listOfCoordinateComponents>
<spatial:listOfDomains>
<spatial:domain id="chr0" spatial:domainType="domainType_chr" spatial:id="chr0">
<spatial:listOfInteriorPoints>
<spatial:interiorPoint spatial:coord1="0.0" spatial:coord2="0.0" spatial:coord3="5.0"/>
</spatial:listOfInteriorPoints>
</spatial:domain>
</spatial:listOfDomains>
<spatial:listOfDomainTypes>
<spatial:domainType id="domainType_chr" spatial:id="domainType_chr" spatial:spatialDimensions="3"/>
</spatial:listOfDomainTypes>
<spatial:listOfGeometryDefinitions>
<spatial:analyticGeometry id="Analytic_Geometry1640227629" spatial:id="Analytic_Geometry1640227629" spatial:isActive="true">
<spatial:listOfAnalyticVolumes>
<spatial:analyticVolume spatial:domainType="domainType_chr" spatial:functionType="layered" spatial:id="chr" spatial:ordinal="0">
<math xmlns="http://www.w3.org/1998/Math/MathML">
<apply>
<neq/>
<cn> 0 </cn>
<cn> 1 </cn>
</apply>
</math>
You're misunderstanding namespaces. That a namespace is "default" is not a property of that namespace, it's a property of the XML in that location.
Don't get distracted. Give all namespace URIs that you're going to use a prefix, done.
import xml.etree.ElementTree as ET

ns = {
    'a': 'someurlNo0',
    'b': 'someurlNo1',
    'c': 'someurlNo2',
    'd': 'someurlNo3',
    'e': 'someurlNo3',  # same URI as above, perfectly legal
}
tree = ET.parse('path/to.xml')
tree.findall("./a:node/b:node/c:node/d:node/e:node", namespaces=ns)
It does not even need to be the same prefix as in your XML. In fact, avoid that. Give namespace URIs prefixes that make reading your code easy. All that matters in the end is the namespace URI, the prefix is ephemeral.
As long as the prefixes in your code resolve to the actual namespace URI of the targeted nodes, you're good. It does not matter if the namespaces are default at the specific location in the XML.
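To make this concrete, here is a minimal sketch against a shortened version of the SBML header from the question. The prefixes `core`, `sp` and `html` are arbitrary names chosen for this example, and the `SAMPLE` string is a trimmed, hypothetical stand-in for the real 3k-line file.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the real file: two "default" namespaces and one prefixed.
SAMPLE = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core"
      xmlns:spatial="http://www.sbml.org/sbml/level3/version1/spatial/version1">
  <notes>
    <body xmlns="http://www.w3.org/1999/xhtml"><p>Exported by VCell 7.3</p></body>
  </notes>
  <model id="m1">
    <spatial:geometry spatial:id="vcell"/>
  </model>
</sbml>"""

# Prefixes used in code only; they need not match the prefixes in the file.
ns = {
    "core": "http://www.sbml.org/sbml/level3/version1/core",
    "sp": "http://www.sbml.org/sbml/level3/version1/spatial/version1",
    "html": "http://www.w3.org/1999/xhtml",
}

root = ET.fromstring(SAMPLE)

# Elements under a default namespace are addressed through a prefix as well.
paragraph = root.find("./core:notes/html:body/html:p", ns)
geometry = root.find("./core:model/sp:geometry", ns)
geometry_id = geometry.get("{%s}id" % ns["sp"])

# Serializing may rename prefixes (ns0, ns1, ...), but the expanded
# {uri}name of every element survives a round trip unchanged.
reparsed = ET.fromstring(ET.tostring(root))
same_names = [el.tag for el in root.iter()] == [el.tag for el in reparsed.iter()]

print(paragraph.text, geometry_id, same_names)
```

The round trip at the end is why generated prefixes in the output are harmless to namespace-aware consumers: they compare namespace URIs, not prefixes.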

How can I parse an XML document into a Python object?

I'm trying to consume an XML API. I'd like to have some Python objects that represent the XML data. I have several XSD and some example API responses from the documentation.
http://www.isan.org/schema/v1.11/common/common.xsd
http://www.isan.org/schema/v1.21/common/serial.xsd
http://www.isan.org/schema/v1.11/common/version.xsd
http://www.isan.org/ISAN/isan.xsd
http://www.isan.org/schema/v1.11/common/title.xsd
http://www.isan.org/schema/v1.11/common/externalid.xsd
http://www.isan.org/schema/v1.11/common/participant.xsd
http://www.isan.org/schema/v1.11/common/language.xsd
http://www.isan.org/schema/v1.11/common/country.xsd
Here's one example XML response:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<serial:serialHeaderType xmlns:isan="http://www.isan.org/ISAN/isan"
xmlns:title="http://www.isan.org/schema/v1.11/common/title"
xmlns:serial="http://www.isan.org/schema/v1.21/common/serial"
xmlns:externalid="http://www.isan.org/schema/v1.11/common/externalid"
xmlns:common="http://www.isan.org/schema/v1.11/common/common"
xmlns:participant="http://www.isan.org/schema/v1.11/common/participant"
xmlns:language="http://www.isan.org/schema/v1.11/common/language"
xmlns:country="http://www.isan.org/schema/v1.11/common/country">
<common:status>
<common:DataType>SERIAL_HEADER_TYPE</common:DataType>
<common:ISAN root="0000-0002-3B9F"/>
<common:WorkStatus>ACTIVE</common:WorkStatus>
</common:status>
<serial:SerialHeaderId root="0000-0002-3B9F"/>
<serial:MainTitles>
<title:TitleDetail>
<title:Title>Braquo</title:Title>
<title:Language>
<language:LanguageLabel>French</language:LanguageLabel>
<language:LanguageCode>
<language:CodingSystem>ISO639_2</language:CodingSystem>
<language:ISO639_2Code>FRE</language:ISO639_2Code>
</language:LanguageCode>
</title:Language>
<title:TitleKind>ORIGINAL</title:TitleKind>
</title:TitleDetail>
</serial:MainTitles>
<serial:TotalEpisodes>11</serial:TotalEpisodes>
<serial:TotalSeasons>0</serial:TotalSeasons>
<serial:MinDuration>
<common:TimeUnit>MIN</common:TimeUnit>
<common:TimeValue>45</common:TimeValue>
</serial:MinDuration>
<serial:MaxDuration>
<common:TimeUnit>MIN</common:TimeUnit>
<common:TimeValue>144</common:TimeValue>
</serial:MaxDuration>
<serial:MinYear>2009</serial:MinYear>
<serial:MaxYear>2009</serial:MaxYear>
<serial:MainParticipantList>
<participant:Participant>
<participant:FirstName>Frédéric</participant:FirstName>
<participant:LastName>Schoendoerffer</participant:LastName>
<participant:RoleCode>DIR</participant:RoleCode>
</participant:Participant>
<participant:Participant>
<participant:FirstName>Karole</participant:FirstName>
<participant:LastName>Rocher</participant:LastName>
<participant:RoleCode>ACT</participant:RoleCode>
</participant:Participant>
</serial:MainParticipantList>
<serial:CompanyList>
<common:Company>
<common:CompanyKind>PRO</common:CompanyKind>
<common:CompanyName>R.T.B.F.</common:CompanyName>
</common:Company>
<common:Company>
<common:CompanyKind>PRO</common:CompanyKind>
<common:CompanyName>Capa Drama</common:CompanyName>
</common:Company>
<common:Company>
<common:CompanyKind>PRO</common:CompanyKind>
<common:CompanyName>Marathon</common:CompanyName>
</common:Company>
</serial:CompanyList>
</serial:serialHeaderType>
I tried simply ignoring the XSD and using lxml.objectify on the XML I'd get from the API. I had a problem with namespaces. Having to refer to every child node with its explicit namespace was a real pain and doesn't make for readable code.
from lxml import objectify
obj = objectify.fromstring(response)
print(obj.MainTitles.TitleDetail)
# This will fail to find the element because you need to specify the namespace
print(obj.MainTitles['{http://www.isan.org/schema/v1.11/common/title}TitleDetail'])
# Or something like that, I couldn't get it to work, and I'd much rather use attributes and not specify the namespace
So then I tried generateDS to create some Python class definitions for me. I've lost the error messages that this attempt gave me but I couldn't get it to work. It would generate a module for each XSD that I gave it but it wouldn't parse the example XML.
I'm now trying pyxb and this seems much nicer so far. It's generating nicer definitions than generateDS (splitting them into multiple, reusable modules) but it won't parse the XML:
from models import serial
obj = serial.CreateFromDocument(response)
Traceback (most recent call last):
...
File "/vagrant/isan/isan.py", line 58, in lookup
return serial.CreateFromDocument(resp.content)
File "/vagrant/isan/models/serial.py", line 69, in CreateFromDocument
instance = handler.rootObject()
File "/home/vagrant/venv/lib/python2.7/site-packages/pyxb/binding/saxer.py", line 285, in rootObject
raise pyxb.UnrecognizedDOMRootNodeError(self.__rootObject)
UnrecognizedDOMRootNodeError: <pyxb.utils.saxdom.Element object at 0x2b53664dc850>
The unrecognised node is the <serial:serialHeaderType> node from the example. Looking at the pyxb source it seems that this error comes about "if the top-level element got processed as a DOM instance" but I don't know what this means or how to prevent it.
I've run out of steam for trying to explore this, I don't know what to do next.
I have had a lot of luck parsing XML into Python using Beautiful Soup. It is extremely straightforward, and they provide pretty strong documentation. Check it out here:
http://www.crummy.com/software/BeautifulSoup/
http://www.crummy.com/software/BeautifulSoup/bs4/doc/
UnrecognizedDOMRootNodeError indicates that PyXB could not locate the element in a namespace for which it has bindings registered. In your case it fails on the first element, which is {http://www.isan.org/schema/v1.21/common/serial}serialHeaderType.
The schema for that namespace defines a complexType named SerialHeaderType but does not define an element with the name serialHeaderType. In fact it defines no top-level elements. So PyXB can't recognize it, and the XML does not validate.
Either there's an additional schema for the namespace that you'll need to locate which provides elements, or the message you're sending really doesn't validate. That may be because somebody's expecting an implicit mapping from a complex type to an element with that type, or because it's a fragment that would normally be found within some other element where that QName is a member element name.
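One way to check this for any schema is to list its top-level declarations. The sketch below uses only the standard library; `SAMPLE_XSD` is a hypothetical, heavily shortened schema that mimics the situation described (a complex type but no global element), and `top_level_names` is a helper name invented for this example.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical, shortened schema: it declares a type but no top-level element.
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.isan.org/schema/v1.21/common/serial">
  <xs:complexType name="SerialHeaderType">
    <xs:sequence>
      <xs:element name="TotalEpisodes" type="xs:int"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""


def top_level_names(xsd_text, kind):
    """Names of the schema's direct children of the given kind, e.g. 'element'."""
    schema = ET.fromstring(xsd_text)
    # A bare '{uri}tag' path matches direct children only, so nested
    # xs:element declarations inside types are not counted.
    return [child.get("name") for child in schema.findall("{%s}%s" % (XSD_NS, kind))]


elements = top_level_names(SAMPLE_XSD, "element")
complex_types = top_level_names(SAMPLE_XSD, "complexType")

print("top-level elements:", elements)
print("top-level complex types:", complex_types)
```

An empty element list for a namespace means no document root in that namespace can be recognized from that schema alone.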
UPDATE: You can hand-craft an element in that namespace by adding the following to the generated bindings in serial.py:
serialHeaderType = pyxb.binding.basis.element(pyxb.namespace.ExpandedName(Namespace, 'serialHeaderType'), SerialHeaderType)
Namespace.addCategoryObject('elementBinding', serialHeaderType.name().localName(), serialHeaderType)
If you do that, you won't get the UnrecognizedDOMRootNodeError but you will get an IncompleteElementContentError at:
<common:status>
<common:DataType>SERIAL_HEADER_TYPE</common:DataType>
<common:ISAN root="0000-0002-3B9F"/>
<common:WorkStatus>ACTIVE</common:WorkStatus>
</common:status>
which provides the following details:
The containing element {http://www.isan.org/schema/v1.11/common/common}status is defined at common.xsd[243:3].
The containing element type {http://www.isan.org/schema/v1.11/common/common}StatusType is defined at common.xsd[289:1]
The {http://www.isan.org/schema/v1.11/common/common}StatusType automaton is not in an accepting state.
Any accepted content has been stored in instance
The following element and wildcard content would be accepted:
An element {http://www.isan.org/schema/v1.11/common/common}ActiveISAN per common.xsd[316:3]
An element {http://www.isan.org/schema/v1.11/common/common}MatchingISANs per common.xsd[317:3]
An element {http://www.isan.org/schema/v1.11/common/common}Description per common.xsd[318:3]
No content remains unconsumed
Reviewing the schema confirms that, at a minimum, a {http://www.isan.org/schema/v1.11/common/common}Description element is missing but required.
So it seems these documents are not meant to be validated, and PyXB is probably the wrong technology to use.
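If validation is off the table, a schema-free conversion may be enough. The following is a minimal sketch of a different technique from anything above: it uses the standard library's ElementTree, drops the namespace part of every tag, and builds nested dicts and lists. `local_name` and `to_plain` are hypothetical helper names, the sample is a shortened version of the response in the question, and the sketch assumes that local names do not collide across namespaces and that leaf elements carry either attributes or text, not both.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<serial:serialHeaderType
    xmlns:serial="http://www.isan.org/schema/v1.21/common/serial"
    xmlns:title="http://www.isan.org/schema/v1.11/common/title"
    xmlns:participant="http://www.isan.org/schema/v1.11/common/participant">
  <serial:SerialHeaderId root="0000-0002-3B9F"/>
  <serial:MainTitles>
    <title:TitleDetail>
      <title:Title>Braquo</title:Title>
      <title:TitleKind>ORIGINAL</title:TitleKind>
    </title:TitleDetail>
  </serial:MainTitles>
  <serial:MainParticipantList>
    <participant:Participant>
      <participant:LastName>Schoendoerffer</participant:LastName>
    </participant:Participant>
    <participant:Participant>
      <participant:LastName>Rocher</participant:LastName>
    </participant:Participant>
  </serial:MainParticipantList>
</serial:serialHeaderType>"""


def local_name(tag):
    """Drop the '{namespace-uri}' part that ElementTree puts in front of a tag."""
    return tag.rsplit("}", 1)[-1]


def to_plain(element):
    """Convert an element into nested dicts, lists and strings keyed by local name."""
    children = list(element)
    if not children:
        # Leaf: prefer attributes if present, otherwise the stripped text.
        return dict(element.attrib) if element.attrib else (element.text or "").strip()
    result = {}
    for child in children:
        key = local_name(child.tag)
        value = to_plain(child)
        if key in result:
            # Repeated child name: collect the values in a list.
            if not isinstance(result[key], list):
                result[key] = [result[key]]
            result[key].append(value)
        else:
            result[key] = value
    return result


data = to_plain(ET.fromstring(SAMPLE))
print(data["MainTitles"]["TitleDetail"]["Title"])
print([p["LastName"] for p in data["MainParticipantList"]["Participant"]])
```

This trades schema awareness for readability: nothing is validated or typed, so numeric fields such as TotalEpisodes would come back as strings.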

Using owl:Class prefix with rdflib and xml serialization

I would like to use the owl: prefix in the XML serialization of my RDF ontology (using rdflib version 4.1.1); unfortunately I'm still getting the serialization as rdf:Description tags. I have looked at the answer about binding the namespace to the graph at RDFLib: Namespace prefixes in XML serialization but this seems to only work when serializing using the n3 format rather than xml format.
Let's be more concrete. I'm attempting to get the following ontology (as taken from Introducing RDFS and OWL) in XML as follows:
<!-- OWL Class Definition - Plant Type -->
<owl:Class rdf:about="http://www.linkeddatatools.com/plants#planttype">
<rdfs:label>The plant type</rdfs:label>
<rdfs:comment>The class of all plant types.</rdfs:comment>
</owl:Class>
Here is the python code for constructing such a thing, using rdflib:
from rdflib.namespace import OWL, RDF, RDFS
from rdflib import Graph, Literal, Namespace, URIRef
# Construct the linked data tools namespace
LDT = Namespace("http://www.linkeddatatools.com/plants#")
# Create the graph
graph = Graph()
# Create the node to add to the Graph
Plant = URIRef(LDT["planttype"])
# Add the OWL data to the graph
graph.add((Plant, RDF.type, OWL.Class))
graph.add((Plant, RDFS.subClassOf, OWL.Thing))
graph.add((Plant, RDFS.label, Literal("The plant type")))
graph.add((Plant, RDFS.comment, Literal("The class of all plant types")))
# Bind the OWL and LDT name spaces
graph.bind("owl", OWL)
graph.bind("ldt", LDT)
print(graph.serialize(format='xml'))
Sadly, even with those bind statements, the following XML is printed:
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
>
<rdf:Description rdf:about="http://www.linkeddatatools.com/plants#planttype">
<rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
<rdfs:label>The plant type</rdfs:label>
<rdfs:comment>The class of all plant types</rdfs:comment>
<rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Class"/>
</rdf:Description>
</rdf:RDF>
Granted, this is still an Ontology, and usable - but since we have various editors, the much more compact and readable first version using the owl prefix would be far preferable. Is it possible to do this in rdflib without overriding the serialization method?
Update
In response to the comments, I'll rephrase my "bonus question" as simple clarification to my question at large.
Not a Bonus Question
The topic here involves the construction of the OWL namespace formatted ontology, which is a shorthand for the more verbose RDF/XML specification. The issue here is larger, though, than the simple declaration of a namespace prefix as shorthand for only Classes or Properties: there are many shorthand notations that have to be dealt with in code. For example, owl:Ontology descriptions should be added as good form to this notation. I am hoping that rdflib has support for the complete specification of the notation, rather than having to roll my own serialization.
Instead of using the xml format, you need to use the pretty-xml format. It's listed in the documentation, Plugin serializers. That will give you the type of output that you're looking for. That is, you'd use a line like the following in order to use the PrettyXMLSerializer:
print(graph.serialize(format='pretty-xml'))
To address the "bonus question", you can add a line like the following to create the ontology header, and then serializing with pretty-xml will give you the following output.
graph.add((URIRef('https://stackoverflow.com/q/24017320/1281433/ontology.owl'), RDF.type, OWL.Ontology ))
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
>
<owl:Ontology rdf:about="https://stackoverflow.com/q/24017320/1281433/ontology.owl"/>
<owl:Class rdf:about="http://www.linkeddatatools.com/plants#planttype">
<rdfs:comment>The class of all plant types</rdfs:comment>
<rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
<rdfs:label>The plant type</rdfs:label>
</owl:Class>
</rdf:RDF>
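Both serializations carry the same information: in RDF/XML, a typed node element such as owl:Class is shorthand for an rdf:Description with an rdf:type property. The sketch below checks this with the standard library only, not with rdflib. `node_types` is a hypothetical helper that handles just the two shapes shown in this thread, and the two sample strings are shortened versions of the outputs above.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

VERBOSE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="http://www.linkeddatatools.com/plants#planttype">
    <rdfs:label>The plant type</rdfs:label>
    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Class"/>
  </rdf:Description>
</rdf:RDF>"""

COMPACT = """<rdf:RDF xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <owl:Class rdf:about="http://www.linkeddatatools.com/plants#planttype">
    <rdfs:label>The plant type</rdfs:label>
  </owl:Class>
</rdf:RDF>"""


def node_types(rdf_xml):
    """Return {(subject, type_uri)} for the top-level nodes of an RDF/XML string."""
    found = set()
    for node in ET.fromstring(rdf_xml):
        subject = node.get("{%s}about" % RDF)
        if node.tag != "{%s}Description" % RDF:
            # Typed node element: '{namespace}Local' spells out the class URI.
            found.add((subject, node.tag[1:].replace("}", "")))
        for prop in node.findall("{%s}type" % RDF):
            # Explicit rdf:type property on the node.
            found.add((subject, prop.get("{%s}resource" % RDF)))
    return found


print(node_types(VERBOSE) == node_types(COMPACT))
```

So the choice between `xml` and `pretty-xml` affects readability for human editors, not the triples a consumer reads back.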
Adding the x rdf:type owl:Ontology triple isn't a very OWL-centric way of declaring the ontology though. It sounds like you're looking for something more like Jena's OntModel interface (which is just a convenience layer over Jena's RDF-centric Model), or the OWLAPI, but for RDFLib. I don't know whether such a thing exists (I'm not an RDFlib user), but you might have a look at:
RDFLib/OWL-RL: It looks like a reasoner, but it might have some of the methods that you need.
Inspecting an ontology with RDFLib: a blog article with links to source that might do some of what you want.
Is there a Python library to handle OWL?: A Stack Overflow question (now off-topic, because library/tool requests are off-topic, but it's an old question) where the accepted answer points out that rdflib is RDF-centric, not OWL-centric. Some of the other answers might be useful, particularly this one, although most of those were outdated, even in 2011.

XPath Namespace issues

When connecting to an XMPP server I get one of these two responses:
<stream:features xmlns:stream="http://etherx.jabber.org/streams">
<mechanisms xmlns="urn:ietf:params:xml:ns:xmpp-sasl">
<mechanism>PLAIN</mechanism>
<mechanism>DIGEST MD5</mechanism>
</mechanisms>
<auth xmlns="http://jabber.org/features/iq-auth" />
<register xmlns="http://jabber.org/features/iq-register" />
</stream:features>
OR
<stream:features>
<mechanisms xmlns="urn:ietf:params:xml:ns:xmpp-sasl">
<mechanism>DIGEST-MD5</mechanism>
<mechanism>PLAIN</mechanism>
<mechanism>ANONYMOUS</mechanism>
<mechanism>CRAM-MD5</mechanism>
</mechanisms>
<compression xmlns="http://jabber.org/features/compress">
<method>zlib</method>
</compression>
<auth xmlns="http://jabber.org/features/iq-auth" />
<register xmlns="http://jabber.org/features/iq-register" />
</stream:features>
When trying to parse the second one with my code, I get this error:
namespace error : Namespace prefix stream on features is not defined
<stream:features><mechanisms xmlns="urn:ietf:params:xml:ns:xmpp-sasl"><mechanism
^
Here is my code:
mechanisms = []
xmlParsed = libxml2.parseDoc(xmlResponse)
xpathContext = xmlParsed.xpathNewContext()
xpathContext.xpathRegisterNs('urn','http://etherx.jabber.org/streams')
xpathContext.xpathRegisterNs('sasl', 'urn:ietf:params:xml:ns:xmpp-sasl')
nodes = xpathContext.xpathEval("//urn:stream/features/sasl:mechanisms/sasl:mechanism/text()|//urn:features/sasl:mechanisms/sasl:mechanism/text()")
for node in nodes:
    mechanisms.append(str(node))
What am I doing wrong, and how can I fix it? Please don't say to use the XMPP libraries or such; I'm not trying to write an entire XMPP client. I just want enough code to register as a user first.
Please don't write your own XMPP library from scratch. There are already many available from a list on xmpp.org. In particular, for Python, try SleekXMPP.
For example, using parseDoc isn't going to work; you'll need to parse XML incrementally. The missing prefix definition for "stream" in "stream:features" is a symptom of this sort of problem.
I think the error is reported for the <stream:features> tag saying that the prefix stream is not defined.
<stream:features> indicates that the features tag is under a namespace represented by prefix stream and in your xml fragment there is no such namespace declared.
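If the goal is only to pull the mechanism names out of such a fragment, one workaround is to wrap it in a synthetic root element that declares the missing prefix before parsing. This is a minimal sketch with the standard library's ElementTree rather than libxml2; the `wrapper` element and the `sasl_mechanisms` helper are made up for this example, and it assumes the fragment has no XML declaration of its own. It does not replace incremental parsing of a live stream.

```python
import xml.etree.ElementTree as ET

STREAM_NS = "http://etherx.jabber.org/streams"
SASL_NS = "urn:ietf:params:xml:ns:xmpp-sasl"


def sasl_mechanisms(fragment):
    """Return the SASL mechanism names found in a <stream:features> fragment."""
    # Declare the 'stream' prefix on a synthetic root so the fragment parses
    # even when the declaration was sent earlier on the stream.
    wrapped = '<wrapper xmlns:stream="%s">%s</wrapper>' % (STREAM_NS, fragment)
    root = ET.fromstring(wrapped)
    ns = {"stream": STREAM_NS, "sasl": SASL_NS}
    path = "./stream:features/sasl:mechanisms/sasl:mechanism"
    return [node.text for node in root.findall(path, ns)]


# Shortened version of the second response from the question.
SECOND = (
    '<stream:features>'
    '<mechanisms xmlns="urn:ietf:params:xml:ns:xmpp-sasl">'
    '<mechanism>DIGEST-MD5</mechanism><mechanism>PLAIN</mechanism>'
    '</mechanisms>'
    '<auth xmlns="http://jabber.org/features/iq-auth" />'
    '</stream:features>'
)

found = sasl_mechanisms(SECOND)
print(found)
```

Note that every step of the path carries a prefix bound to the element's actual namespace: `features` is in the streams namespace, while `mechanisms` and `mechanism` are both in the SASL namespace.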

Why does the Django Atom1Feed use atom:updated instead of atom:published?

I made an Atom feed in Django using a class that looks something like this:
class AtomFeed(Feed):
    feed_type = feedgenerator.Atom1Feed
    # ...
    def item_pubdate(self, post):
        return datetime.datetime(post.date.year, post.date.month, post.date.day)
The resulting XML for an item:
<entry>
<title>..</title>
<link href="..." rel="alternate"></link>
<updated>2010-10-18T00:00:00+02:00</updated>
<author><name>...</name></author>
<id>...</id>
<summary type="html">...</summary>
</entry>
The thing to note here is that the date goes in the atom:updated element, not the atom:published element.
The RFC clearly suggests to me that this is not the intended usage:
The "atom:updated" element is a Date construct indicating the most recent instant in time when an entry or feed was modified in a way the publisher considers significant. Therefore, not all modifications necessarily result in a changed atom:updated value.
Whereas:
The "atom:published" element is a Date construct indicating an instant in time associated with an event early in the life cycle of the entry.
This is more than just a theoretical problem. Google Reader, for example, does not seem to use the updated element, and uses the date that it first saw the item appear. As a result, it does not order the items properly upon first import of the feed.
The code in Django responsible for this:
django/utils/feedgenerator.py:331
if item['pubdate'] is not None:
    handler.addQuickElement(u"updated", rfc3339_date(item['pubdate']).decode('utf-8'))
There appears to be no mention of the published element.
Is this a bug in Django? Am I misunderstanding the Atom RFC? Am I missing something else?
You are not missing anything. The Atom RFC is correct, and this is a known bug in Django; see this Django bug.
It looks like a relatively simple fix, so feel free to get in there and patch it! ^_^
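For reference, here is what an entry carrying both dates looks like when built by hand. This is a minimal standard-library sketch, not Django's feed generator and not the patch itself; `atom_entry` and the sample values are hypothetical.

```python
import datetime
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def atom_entry(title, entry_id, published, updated):
    """Build an Atom <entry> that carries both a published and an updated date."""
    entry = ET.Element("{%s}entry" % ATOM_NS)
    ET.SubElement(entry, "{%s}title" % ATOM_NS).text = title
    ET.SubElement(entry, "{%s}id" % ATOM_NS).text = entry_id
    # isoformat() on a timezone-aware datetime yields an RFC 3339 timestamp.
    ET.SubElement(entry, "{%s}published" % ATOM_NS).text = published.isoformat()
    ET.SubElement(entry, "{%s}updated" % ATOM_NS).text = updated.isoformat()
    return entry


tz = datetime.timezone(datetime.timedelta(hours=2))
first_posted = datetime.datetime(2010, 10, 18, tzinfo=tz)
last_edited = datetime.datetime(2010, 10, 20, 9, 30, tzinfo=tz)

entry = atom_entry("Example post", "urn:example:post-1", first_posted, last_edited)
print(ET.tostring(entry, encoding="unicode"))
```

Keeping the two values separate lets a reader order items by when they first appeared while still noticing later edits.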
