How to use python functions and variables inside XSLT? - python

In PHP, you can use registerPHPFunctions to use a PHP function inside an XSLT file like this:
<?php
$xml = <<<EOB
<allusers>
<user>
<uid>bob</uid>
<id>1</id>
</user>
<user>
<uid>joe</uid>
<id>2</id>
</user>
</allusers>
EOB;
$xsl = <<<EOB
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:php="http://php.net/xsl">
<xsl:output method="html" encoding="utf-8" indent="yes"/>
<xsl:template match="allusers">
<html><body>
<h2>Users</h2>
<table>
<xsl:for-each select="user">
<tr><td>
<xsl:value-of
select="php:function('ucfirst',concat(string(uid), string(id)))"/>
</td></tr>
</xsl:for-each>
</table>
</body></html>
</xsl:template>
</xsl:stylesheet>
EOB;
$xmldoc = DOMDocument::loadXML($xml);
$xsldoc = DOMDocument::loadXML($xsl);
$proc = new XSLTProcessor();
$proc->registerPHPFunctions('ucfirst');
$proc->importStyleSheet($xsldoc);
echo $proc->transformToXML($xmldoc);
?>
What is the Python equivalent? This is what I've tried
from lxml import etree
xml = etree.XML('''
<allusers>
<user>
<uid>bob</uid>
<id>1</id>
</user>
<user>
<uid>joe</uid>
<id>2</id>
</user>
</allusers>''')
xsl = etree.XML('''
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:f="mynamespace"
extension-element-prefixes="f">
<xsl:output method="html" encoding="utf-8" indent="yes"/>
<xsl:template match="allusers">
<html><body>
<h2>Users</h2>
<table>
<xsl:for-each select="user">
<tr><td>
<f:ucfirst>
<xsl:value-of select="concat(string(uid), string(id))"/>
</f:ucfirst>
</td></tr>
</xsl:for-each>
</table>
</body></html>
</xsl:template>
</xsl:stylesheet>
''')
extension = Ucase()
extensions = { ('mynamespace', 'ucfirst') : extension }
proc = etree.XSLT(xsl, extensions=extensions)
str(proc(xml))
class Ucase(etree.XSLTExtension):
    def execute(self, context, self_node, input_node, output_parent):
        title = self_node[0].text.capitalize()
        output_parent.text(title)
This is a simplified version of my XSLT.

Here is how an extension function (not an element) can give the result that I think you want:
from lxml import etree
def ucfirst(context, s):
    return s.capitalize()
ns = etree.FunctionNamespace("mynamespace")
ns['ucfirst'] = ucfirst
xml = etree.XML('''
<allusers>
<user>
<uid>bob</uid>
<id>1</id>
</user>
<user>
<uid>joe</uid>
<id>2</id>
</user>
</allusers>''')
xsl = etree.XML(b'''\
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:f="mynamespace" exclude-result-prefixes="f">
<xsl:output method="html" encoding="utf-8" indent="yes"/>
<xsl:template match="allusers">
<html><body>
<h2>Users</h2>
<table>
<xsl:for-each select="user">
<tr><td>
<xsl:value-of select="f:ucfirst(concat(string(uid), string(id)))"/>
</td></tr>
</xsl:for-each>
</table>
</body></html>
</xsl:template>
</xsl:stylesheet>
''')
transform = etree.XSLT(xsl)
result = transform(xml)
print(result)
Output:
<html><body>
<h2>Users</h2>
<table>
<tr><td>Bob1</td></tr>
<tr><td>Joe2</td></tr>
</table>
</body></html>
See http://lxml.de/extensions.html#xpath-extension-functions.

There are separate answers for variables and functions. I'm only really familiar with the variable half.
For variables, you can pass them as an xsl:param by passing them as keyword arguments to the call. For example:
transform = etree.XSLT(xslt_tree)
result = transform(doc_root, a="5")
Note that the argument is evaluated as an XPath expression, so string values need to be quoted. etree.XSLT.strparam() does the quoting for you:
result = transform(doc_root, a=etree.XSLT.strparam(""" It's "Monty Python" """))
If you want to pass an XML fragment you could use the exslt:node-set() function.
For functions, you can expose them either as an xpath function or as an element. There is a bunch of variety and I haven't done this myself so read the docs below and/or edit this answer.
Docs for basic use and variables.
Docs for adding functions.
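As a minimal sketch of the parameter passing described above (the stylesheet, the empty <root/> input and the variable names are made up for illustration, and lxml is assumed to be installed), the following shows the difference between a raw XPath expression and a value wrapped with strparam():

```python
from lxml import etree

# Hypothetical stylesheet that simply echoes the parameter "a" as text.
xslt_tree = etree.XML('''
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:param name="a"/>
  <xsl:template match="/">
    <xsl:value-of select="$a"/>
  </xsl:template>
</xsl:stylesheet>''')

doc_root = etree.XML('<root/>')
transform = etree.XSLT(xslt_tree)

# A plain keyword argument is evaluated as an XPath expression.
numeric = str(transform(doc_root, a="5 + 1"))

# strparam() wraps an arbitrary Python string, quotes included,
# so it reaches the stylesheet as a string value.
quoted = str(transform(doc_root, a=etree.XSLT.strparam('It\'s "Monty Python"')))

print(numeric)
print(quoted)
```

Passing the string without strparam() would require quoting it by hand inside the XPath expression, which gets awkward when the text itself contains both kinds of quotes.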

Related

Python XML/Pandas: How to merge nested XML?

How can I join two different pieces of information together from this XML file?
# data
xml1 = ('''<?xml version="1.0" encoding="utf-8"?>
<TopologyDefinition xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RSkus>
<RSku ID="V1" Deprecated="true" Owner="Unknown" Generation="1">
<Devices>
<Device ID="1" SkuID="Switch" Role="xD" />
</Devices>
<Blades>
<Blade ID="{1-20}" SkuID="SBlade" />
</Blades>
<Interfaces>
<Interface ID="COM" HardwareID="NS1" SlotID="COM1" Type="serial" />
<Interface ID="LINK" HardwareID="TS1" SlotID="UPLINK_1" Type="serial" />
</Interfaces>
<Wires>
<WireGroup Type="network">
<Wire LocationA="NS1" SlotA="{1-20}" LocationB="{1-20}" SlotB="NIC1" />
</WireGroup>
<WireGroup Type="serial">
<Wire LocationA="TS1" SlotA="{7001-7020}" LocationB="{1-20}" SlotB="COM1" />
</WireGroup>
</Wires>
</RSku>
</RSkus>
</TopologyDefinition>
''')
This single case is trivial, but when I run the commands below on the full file, I get shapes that do not match and therefore cannot be joined so easily.
How can I extract the XML information so that every row has all the RSku information plus its Blade information? Neither xpath result contains anything that would let me join it to the other.
# how to have them joined?
pd.read_xml(xml1, xpath = ".//RSku")
pd.read_xml(xml1, xpath = ".//Blade")
# expected
pd.concat([pd.read_xml(xml1, xpath = ".//RSku"), pd.read_xml(xml1, xpath = ".//Blade")], axis=1)
Consider flattening the document with XSLT so that it contains only the information you need. Specifically, select each Blade element through the descendant:: axis and pick up the attributes of its enclosing RSku through the ancestor:: axis. Python's lxml (the default parser of pandas.read_xml) can run XSLT 1.0 scripts.
The XSLT below uses <xsl:for-each> to prefix attribute names with RSku_ and Blade_, since both elements share attribute names such as ID. Without that, the template would be much less wordy.
import pandas as pd
xml1 = ...
xsl = ('''<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/TopologyDefinition">
<root>
<xsl:apply-templates select="descendant::Blade"/>
</root>
</xsl:template>
<xsl:template match="Blade">
<data>
<xsl:for-each select="ancestor::RSku/@*">
<xsl:attribute name="{concat('RSku_', name())}">
<xsl:value-of select="."/>
</xsl:attribute>
</xsl:for-each>
<xsl:for-each select="@*">
<xsl:attribute name="{concat('Blade_', name())}">
<xsl:value-of select="."/>
</xsl:attribute>
</xsl:for-each>
</data>
</xsl:template>
</xsl:stylesheet>''')
blades_df = pd.read_xml(xml1, stylesheet=xsl)
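For comparison, the same flattening can be sketched without XSLT, using only the standard library. This is an illustration under assumptions, not part of the answer above: the sample is a trimmed copy of the question's XML, and the RSku_/Blade_ prefixes mirror the ones the stylesheet produces.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML (only the parts used here).
sample = '''<TopologyDefinition>
  <RSkus>
    <RSku ID="V1" Deprecated="true" Owner="Unknown" Generation="1">
      <Blades>
        <Blade ID="{1-20}" SkuID="SBlade" />
      </Blades>
    </RSku>
  </RSkus>
</TopologyDefinition>'''

rows = []
for rsku in ET.fromstring(sample).iter("RSku"):
    # Walk only the Blade elements nested under this RSku, so each row
    # carries the attributes of both the blade and its enclosing RSku.
    for blade in rsku.iter("Blade"):
        row = {f"RSku_{key}": value for key, value in rsku.attrib.items()}
        row.update({f"Blade_{key}": value for key, value in blade.attrib.items()})
        rows.append(row)

print(rows)
```

The resulting list of dicts can be handed to pd.DataFrame(rows) to get one row per Blade.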

How do I insert a tag that holds the text of an older tag in xml using python?

I want to insert s tags inside a tag that already exists and move the text of the older tag inside the s tag. For example, if my XML file looks like this:
<root>
<name>Light and dark</name>
<address>
<sector>142</sector>
<location>Noida</location>
</address>
</root>
I want it to be like this (check the name tag):
<root>
<name>
<s>Light and dark</s>
</name>
<address>
<sector>142</sector>
<location>Noida</location>
</address>
</root>
I tried using ET.SubElement but it doesn't give me the same result.
It is much better to use XSLT for such tasks.
XSLT has the so-called Identity Transform pattern.
Input XML
<root>
<name>Light and dark</name>
<address>
<sector>142</sector>
<location>Noida</location>
</address>
</root>
XSLT
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="utf-8" indent="yes" omit-xml-declaration="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="name">
<xsl:copy>
<s>
<xsl:value-of select="."/>
</s>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Output XML
<root>
<name>
<s>Light and dark</s>
</name>
<address>
<sector>142</sector>
<location>Noida</location>
</address>
</root>
To insert a sub-element in the XML using ElementTree XML API, append the new element then set it with the text value of the parent element.
import xml.etree.ElementTree as ET
xml = """
<root>
<name>Light and dark</name>
<address>
<sector>142</sector>
<location>Noida</location>
</address>
</root>"""
root = ET.fromstring(xml)
# 1. find name element in document
name = root.find('name')
# 2. get text value and reset the element
text = name.text
name.clear()
# 3. create new element s and set text
elt = ET.SubElement(name, "s")
elt.text = text
print(ET.tostring(root, encoding='unicode'))
To process multiple elements, add a loop around steps 1-3:
for child in root.findall('name'):
    text = child.text
    child.clear()
    elt = ET.SubElement(child, "s")
    elt.text = text
Output:
<root>
<name><s>Light and dark</s></name>
<address>
<sector>142</sector>
<location>Noida</location>
</address>
</root>

How to remove all occurrences of element in XML file?

I'd like to edit a KML file and remove all occurrences of ExtendedData elements, wherever they are located in the file.
Here's the input XML file:
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.2">
<Document>
<Style id="placemark-red">
<IconStyle>
<Icon>
<href>http://maps.me/placemarks/placemark-red.png</href>
</Icon>
</IconStyle>
</Style>
<name>My track</name>
<ExtendedData xmlns:mwm="https://maps.me">
<mwm:name>
<mwm:lang code="default">Blah</mwm:lang>
</mwm:name>
<mwm:lastModified>2020-04-05T14:17:18Z</mwm:lastModified>
</ExtendedData>
<Placemark>
<name></name>
…
<ExtendedData xmlns:mwm="https://maps.me">
<mwm:localId>0</mwm:localId>
<mwm:visibility>1</mwm:visibility>
</ExtendedData>
</Placemark>
</Document>
</kml>
And here's the code that 1) only removes the outermost occurrence, and 2) requires adding the namespace to find it:
from lxml import etree
from pykml import parser
from pykml.factory import KML_ElementMaker as KML
with open("input.xml") as f:
    doc = parser.parse(f)
root = doc.getroot()
ns = "{http://earth.google.com/kml/2.2}"
for pm in root.Document.getchildren():
    #No way to get rid of namespace, for easier search?
    if pm.tag==f"{ns}ExtendedData":
        root.Document.remove(pm)
#How to remove innermost occurrence of ExtendedData?
print(etree.tostring(doc, pretty_print=True))
Is there a way to remove all occurrences in one go, or should I parse the whole tree?
Thank you.
Edit: The BeautifulSoup solution below requires adding an option "BeautifulSoup(my_xml,features="lxml")" to avoid the warning "No parser was explicitly specified".
Here's a solution using BeautifulSoup:
from bs4 import BeautifulSoup
soup = BeautifulSoup(my_xml) # this is your xml
while True:
    elem = soup.find("extendeddata")
    if not elem:
        break
    elem.decompose()
Here's the output for your data:
<?xml version="1.0" encoding="UTF-8"?>
<html>
<body>
<kml xmlns="http://earth.google.com/kml/2.2">
<document>
<style id="placemark-red">
<IconStyle>
<Icon>
<href>http://maps.me/placemarks/placemark-red.png</href>
</Icon>
</IconStyle>
</style>
<name>
My track
</name>
<placemark>
<name>
</name>
</placemark>
</document>
</kml>
</body>
</html>
If you know the XML structure, try:
from xml.etree import ElementTree
ns = {'kml': 'http://earth.google.com/kml/2.2'}
xml_root = ElementTree.parse(filename_path).getroot()
p_elem = xml_root.find('kml:Document', ns)
elem = p_elem.find('kml:ExtendedData', ns)
p_elem.remove(elem)
or
xml_root = ElementTree.parse(filename_path).getroot()
p_elem = xml_root.find('.//kml:Placemark', ns)
c_elem = p_elem.find('kml:ExtendedData', ns)
p_elem.remove(c_elem)
Play with these ideas :)
If you don't know the XML structure, I think you need to parse the whole tree.
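A minimal sketch of that "parse the whole tree" approach, using only the standard library. The sample is a trimmed copy of the question's KML; since ElementTree elements have no parent pointer, the loop visits every element as a potential parent and removes matching children.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://earth.google.com/kml/2.2"

# Trimmed copy of the question's KML.
sample = '''<kml xmlns="http://earth.google.com/kml/2.2">
  <Document>
    <name>My track</name>
    <ExtendedData xmlns:mwm="https://maps.me">
      <mwm:lastModified>2020-04-05T14:17:18Z</mwm:lastModified>
    </ExtendedData>
    <Placemark>
      <name></name>
      <ExtendedData xmlns:mwm="https://maps.me">
        <mwm:localId>0</mwm:localId>
      </ExtendedData>
    </Placemark>
  </Document>
</kml>'''

root = ET.fromstring(sample)
target = f"{{{KML_NS}}}ExtendedData"  # namespace-qualified tag name

# Snapshot both iterations with list() so removal does not disturb them.
for parent in list(root.iter()):
    for child in list(parent):
        if child.tag == target:
            parent.remove(child)

remaining = root.findall(f".//{target}")
print(ET.tostring(root, encoding="unicode"))
```

The namespace still has to be spelled out once, but the loop itself does not depend on where in the document the elements occur.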
Simply run the empty template with Identity Transform using XSLT 1.0 which Python's lxml can run. No for/while loops or if logic needed. To handle the default namespace, define a prefix like doc:
XSLT (save as a .xsl file, which is a special .xml file)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:doc="http://earth.google.com/kml/2.2">
<xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<!-- IDENTITY TRANSFORM -->
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<!-- REMOVE ALL OCCURRENCES OF NODE -->
<xsl:template match="doc:ExtendedData"/>
</xsl:stylesheet>
Python
import lxml.etree as et
# LOAD XML AND XSL SOURCES
xml = et.parse('Input.xml')
xsl = et.parse('XSLT_Script.xsl')
# TRANSFORM INPUT
transform = et.XSLT(xsl)
result = transform(xml)
# PRINT TO SCREEN
print(result)
# SAVE TO FILE
with open('Output.kml', 'wb') as f:
    f.write(result)

lxml throwing xslt parse error - not able to template match any other than root

I'm using python/lxml to translate source xml to a target xml format. I keep getting an XSLTParseError when I try to match a template to any element other than root ('/'), but cannot figure out what is wrong; I'm pretty sure it's namespace related, though. The content I am trying to access from the source xml is contained in the <BOLIG_XML> elements. Any idea how to fix this, or how to get lxml to output a more detailed error message?
Source xml has declaration:
<?xml version="1.0" encoding="UTF-8"?>
<dataroot generated="2016-10-24T09:16:37" xsi:noNamespaceSchemaLocation="BOLIG_XML.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:od="urn:schemas-microsoft-com:officedata">
<BOLIG_XML>...</BOLIG_XML>
<BOLIG_XML>...</BOLIG_XML>
...
Target xml has declaration:
<?xml version="1.0" encoding="utf-8"?>
<BoligListe xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:oio:lbf:1.0.0">
<BoligStruktur>...</BoligStruktur>
<BoligStruktur>...</BoligStruktur>
...
XSLT currently looks like this:
xslt_tree = etree.XML('''\
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<BoligListe xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:oio:lbf:1.0.0">
<xsl:template match="BOLIG_XML">
<BoligStruktur>hello world</BoligStruktur>
</xsl:template>
</BoligListe>
</xsl:template>
</xsl:stylesheet>'''
)
Each xsl:template element needs to be a top-level child of the stylesheet root element; you can't nest templates as your XSLT code tries to do. You then use xsl:apply-templates to process child elements with a matching template, so I guess you want:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<BoligListe xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:oio:lbf:1.0.0">
<xsl:apply-templates/>
</BoligListe>
</xsl:template>
<xsl:template match="BOLIG_XML">
<BoligStruktur>hello world</BoligStruktur>
</xsl:template>
</xsl:stylesheet>
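On the question of more detailed error messages: the exceptions lxml raises carry an error_log that can be inspected. The sketch below is an illustration under assumptions (lxml installed, a shortened stand-in for the nested stylesheet, a two-element stand-in for the source file). It also declares the target namespace on xsl:stylesheet, so that elements created in either template end up in that namespace.

```python
from lxml import etree

# Shortened stand-in for the question's stylesheet: a template nested in a template.
NESTED = '''<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <out><xsl:template match="BOLIG_XML"><item/></xsl:template></out>
  </xsl:template>
</xsl:stylesheet>'''

# Un-nested version; the default namespace is declared on the root element
# so it is in scope for the literal result elements of both templates.
FLAT = '''<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="urn:oio:lbf:1.0.0">
  <xsl:template match="/">
    <BoligListe><xsl:apply-templates/></BoligListe>
  </xsl:template>
  <xsl:template match="BOLIG_XML">
    <BoligStruktur>hello world</BoligStruktur>
  </xsl:template>
</xsl:stylesheet>'''

raised = False
messages = []
try:
    etree.XSLT(etree.XML(NESTED))
except etree.XSLTParseError as exc:
    raised = True
    # Each log entry carries the underlying libxslt message.
    messages = [entry.message for entry in exc.error_log]

source = etree.XML("<dataroot><BOLIG_XML/><BOLIG_XML/></dataroot>")
result = etree.XSLT(etree.XML(FLAT))(source)
tags = [child.tag for child in result.getroot()]

print(messages)
print(tags)
```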

(Not so) advanced xsl transformation of child nodes into list

Input:
<root>
<aa><aaa/><bbb/><ccc/><ddd/><eee/></aa>
<bb><ggg/></bb>
</root>
Desirable output:
<root>
<aa>aaa</aa>
<aa>bbb</aa>
<aa>ccc</aa>
<aa>ddd</aa>
<aa>eee</aa>
<bb>ggg</bb>
</root>
I've come up with a simple XSLT, but it handles only <bb> properly and doesn't create a list of <aa> tags.
XSLT:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<!-- select all elements that don't have any child nodes (elements or text etc) -->
<xsl:template match="//*[not(node())]">
<xsl:value-of select="name()"/>
</xsl:template>
</xsl:stylesheet>
Output:
<root>
<aa>aaabbbcccdddeee</aa>
<bb>ggg</bb>
</root>
P.S. This is part of a Python script. Does it make sense to do such conversions with XSLT in a Python script, or would a Python solution using simple XPath and Python logic work better?
An example is not a substitute for explaining the logic behind the required transformation. I can think of several different ways to process your example input and arrive at the same output.
Here's a guess at what you want to accomplish (read the comments):
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<root>
<!-- select all elements that don't have any child nodes -->
<xsl:for-each select="//*[not(node())]">
<!-- create an element with the name of the parent element -->
<xsl:element name="{name(..)}">
<xsl:value-of select="name()"/>
</xsl:element>
</xsl:for-each>
</root>
</xsl:template>
</xsl:stylesheet>
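Regarding the P.S., a pure-Python version of the same guess can be sketched with the standard library. It assumes, like the stylesheet above, that every empty element should become an element named after its parent with the child's name as text; it only looks one level below the root, which is enough for the sample input.

```python
import xml.etree.ElementTree as ET

src = ET.fromstring(
    "<root><aa><aaa/><bbb/><ccc/><ddd/><eee/></aa><bb><ggg/></bb></root>"
)

out = ET.Element("root")
for parent in src:
    for leaf in parent:
        # Only elements without child elements or text are converted.
        if len(leaf) == 0 and not leaf.text:
            ET.SubElement(out, parent.tag).text = leaf.tag

result = ET.tostring(out, encoding="unicode")
print(result)
```

For a transformation this small either approach works; XSLT tends to pay off when the same stylesheet must be shared with non-Python tooling.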
