I want to insert s tags inside a tag that already exists and move the text of the older tag inside the s tag. For example, if my XML file looks like this:
<root>
<name>Light and dark</name>
<address>
<sector>142</sector>
<location>Noida</location>
</address>
</root>
I want it to be like this (check the name tag):
<root>
<name>
<s>Light and dark</s>
</name>
<address>
<sector>142</sector>
<location>Noida</location>
</address>
</root>
I tried using ET.SubElement but it doesn't give me the same result.
It is much better to use XSLT for such tasks.
XSLT has the so-called identity transform pattern, which copies every node unchanged unless a more specific template overrides it.
Input XML
<root>
<name>Light and dark</name>
<address>
<sector>142</sector>
<location>Noida</location>
</address>
</root>
XSLT
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="utf-8" indent="yes" omit-xml-declaration="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="name">
<xsl:copy>
<s>
<xsl:value-of select="."/>
</s>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Output XML
<root>
<name>
<s>Light and dark</s>
</name>
<address>
<sector>142</sector>
<location>Noida</location>
</address>
</root>
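Since the question is about Python, here is a minimal sketch of how a stylesheet like this could be run with lxml. It assumes lxml is installed; the names XSLT_SOURCE and XML_SOURCE are only illustrative.

```python
from lxml import etree

# Stylesheet and input are passed as bytes so that the XML declaration
# in the stylesheet is accepted by the parser.
XSLT_SOURCE = b"""<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="utf-8" indent="yes" omit-xml-declaration="yes"/>
  <xsl:strip-space elements="*"/>
  <!-- identity transform: copy every attribute and node unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- override for name: wrap its text in an s element -->
  <xsl:template match="name">
    <xsl:copy>
      <s><xsl:value-of select="."/></s>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>"""

XML_SOURCE = b"""<root>
  <name>Light and dark</name>
  <address>
    <sector>142</sector>
    <location>Noida</location>
  </address>
</root>"""

# Compile the stylesheet once, then call it like a function on a parsed document
transform = etree.XSLT(etree.fromstring(XSLT_SOURCE))
result = transform(etree.fromstring(XML_SOURCE))
print(str(result))
```

The compiled transform can be reused for any number of input documents.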
To insert a sub-element using the ElementTree XML API, save the parent element's text, clear the parent, then append the new element and assign the saved text to it.
import xml.etree.ElementTree as ET
xml = """
<root>
<name>Light and dark</name>
<address>
<sector>142</sector>
<location>Noida</location>
</address>
</root>"""
root = ET.fromstring(xml)
# 1. find name element in document
name = root.find('name')
# 2. get text value and reset the element
text = name.text
name.clear()
# 3. create new element s and set text
elt = ET.SubElement(name, "s")
elt.text = text
print(ET.tostring(root, encoding='unicode'))
To process multiple elements, add a loop around steps 1-3:
for child in root.findall('name'):
    text = child.text
    child.clear()
    elt = ET.SubElement(child, "s")
    elt.text = text
Output:
<root>
<name><s>Light and dark</s></name>
<address>
<sector>142</sector>
<location>Noida</location>
</address>
</root>
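One thing to watch: Element.clear() removes the element's attributes, tail text and children as well as its text. If the name elements can carry attributes, a variant that only moves the text might be safer. A rough sketch, where wrap_text is a made-up helper name and the lang attribute is added just for the demonstration:

```python
import xml.etree.ElementTree as ET


def wrap_text(element, tag):
    """Move element's leading text into a new first child named tag.

    Unlike Element.clear(), this leaves the attributes, tail text and
    any existing children of element untouched.
    """
    wrapper = ET.Element(tag)
    wrapper.text = element.text
    element.text = None
    element.insert(0, wrapper)
    return wrapper


root = ET.fromstring(
    '<root><name lang="en">Light and dark</name><name>Other</name></root>'
)
# iter() finds matching elements at any depth; list() takes a snapshot
# so the tree can be modified safely inside the loop.
for name in list(root.iter("name")):
    wrap_text(name, "s")

print(ET.tostring(root, encoding="unicode"))
```

Here the lang attribute on the first name element survives, which it would not after a call to clear().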
Using Python or XSLT, how can I convert a highly complex, hierarchically nested XML file to CSV, including all the sub-elements, while hard coding as few element names as is rational and effective?
Please find attached simplified XML example and the output CSV to get a better understanding of what I’m trying to achieve.
The actual XML file has many more elements, but the data hierarchy and the nesting are like in the example. The <InvoiceRow> element and its sub-elements are the only repeating elements in the XML file. All the other elements are static and are repeated in the output CSV as many times as there are <InvoiceRow> elements in the XML file.
It’s the repeating <InvoiceRow> element that is causing trouble for me. Elements that don’t repeat are easy to convert to CSV without hard coding any elements.
In short, this is a complex XML scenario: hierarchical data structures with a one-to-many relationship, all stored in a single XML file that has to be flattened into a structured text file.
Example XML input:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Invoice>
<SellerDetails>
<Identifier>1234-1</Identifier>
<SellerAddress>
<SellerStreet>Street1</SellerStreet>
<SellerTown>Town1</SellerTown>
</SellerAddress>
</SellerDetails>
<BuyerDetails>
<BuyerIdentifier>1234-2</BuyerIdentifier>
<BuyerAddress>
<BuyerStreet>Street2</BuyerStreet>
<BuyerTown>Town2</BuyerTown>
</BuyerAddress>
</BuyerDetails>
<BuyerNumber>001234</BuyerNumber>
<InvoiceDetails>
<InvoiceNumber>0001</InvoiceNumber>
</InvoiceDetails>
<InvoiceRow>
<ArticleName>Article1</ArticleName>
<RowText>Product Text1</RowText>
<RowText>Product Text2</RowText>
<RowAmount AmountCurrencyIdentifier="EUR">10.00</RowAmount>
</InvoiceRow>
<InvoiceRow>
<ArticleName>Article2</ArticleName>
<RowText>Product Text11</RowText>
<RowText>Product Text22</RowText>
<RowAmount AmountCurrencyIdentifier="EUR">20.00</RowAmount>
</InvoiceRow>
<InvoiceRow>
<ArticleName>Article3</ArticleName>
<RowText>Product Text111</RowText>
<RowText>Product Text222</RowText>
<RowAmount AmountCurrencyIdentifier="EUR">30.00</RowAmount>
</InvoiceRow>
<EpiDetails>
<EpiPartyDetails>
<EpiBfiPartyDetails>
<EpiBfiIdentifier IdentificationSchemeName="BIC">XXXXX</EpiBfiIdentifier>
</EpiBfiPartyDetails>
</EpiPartyDetails>
</EpiDetails>
<InvoiceUrlText>Some text</InvoiceUrlText>
</Invoice>
Example CSV output:
Identifier,SellerStreet,SellerTown,BuyerIdentifier,BuyerStreet,BuyerTown,BuyerNumber,InvoiceNumber,ArticleName,RowText,RowText,RowAmount,EpiBfiIdentifier,InvoiceUrlText
1234-1,Street1,Town1,1234-2,Street2,Town2,1234,1,Article1,Product Text1,Product Text2,10,XXXXX,Some text
1234-1,Street1,Town1,1234-2,Street2,Town2,1234,1,Article2,Product Text11,Product Text22,20,XXXXX,Some text
1234-1,Street1,Town1,1234-2,Street2,Town2,1234,1,Article3,Product Text111,Product Text222,30,XXXXX,Some text
Consider the following example:
XML
<Invoice>
<SellerDetails>
<Identifier>1234-1</Identifier>
<SellerAddress>
<SellerStreet>Street1</SellerStreet>
<SellerTown>Town1</SellerTown>
</SellerAddress>
</SellerDetails>
<BuyerDetails>
<BuyerIdentifier>1234-2</BuyerIdentifier>
<BuyerAddress>
<BuyerStreet>Street2</BuyerStreet>
<BuyerTown>Town2</BuyerTown>
</BuyerAddress>
</BuyerDetails>
<BuyerNumber>001234</BuyerNumber>
<InvoiceDetails>
<InvoiceNumber>0001</InvoiceNumber>
</InvoiceDetails>
<InvoiceRow>
<ArticleName>Article1</ArticleName>
<RowText>Product Text1</RowText>
<RowText>Product Text2</RowText>
<RowAmount AmountCurrencyIdentifier="EUR">10.00</RowAmount>
</InvoiceRow>
<InvoiceRow>
<ArticleName>Article2</ArticleName>
<RowText>Product Text11</RowText>
<RowText>Product Text22</RowText>
<RowAmount AmountCurrencyIdentifier="EUR">20.00</RowAmount>
</InvoiceRow>
<InvoiceRow>
<ArticleName>Article3</ArticleName>
<RowText>Product Text111</RowText>
<RowText>Product Text222</RowText>
<RowAmount AmountCurrencyIdentifier="EUR">30.00</RowAmount>
</InvoiceRow>
<EpiDetails>
<EpiPartyDetails>
<EpiBfiPartyDetails>
<EpiBfiIdentifier IdentificationSchemeName="BIC">XXXXX</EpiBfiIdentifier>
</EpiBfiPartyDetails>
</EpiPartyDetails>
</EpiDetails>
<InvoiceUrlText>Some text</InvoiceUrlText>
</Invoice>
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="Invoice">
<xsl:variable name="common-head">
<xsl:value-of select="SellerDetails/Identifier"/>
<xsl:text>,</xsl:text>
<xsl:value-of select="BuyerDetails/BuyerIdentifier"/>
<xsl:text>,</xsl:text>
<xsl:value-of select="InvoiceDetails/InvoiceNumber"/>
<xsl:text>,</xsl:text>
<!-- add more here -->
</xsl:variable>
<xsl:variable name="common-tail">
<xsl:value-of select="EpiDetails/EpiPartyDetails/EpiBfiPartyDetails/EpiBfiIdentifier"/>
<xsl:text>,</xsl:text>
<!-- add more here -->
<xsl:value-of select="InvoiceUrlText"/>
</xsl:variable>
<!-- header -->
<xsl:text>SellerIdentifier,BuyerIdentifier,InvoiceNumber,ArticleName,RowText,RowText,RowAmount,EpiBfiIdentifier,InvoiceUrlText
</xsl:text>
<!-- data -->
<xsl:for-each select="InvoiceRow">
<xsl:copy-of select="$common-head"/>
<xsl:value-of select="ArticleName"/>
<xsl:text>,</xsl:text>
<xsl:value-of select="RowAmount"/>
<xsl:text>,</xsl:text>
<!-- add more here -->
<xsl:copy-of select="$common-tail"/>
<xsl:text>
</xsl:text>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Result
SellerIdentifier,BuyerIdentifier,InvoiceNumber,ArticleName,RowText,RowText,RowAmount,EpiBfiIdentifier,InvoiceUrlText
1234-1,1234-2,0001,Article1,10.00,XXXXX,Some text
1234-1,1234-2,0001,Article2,20.00,XXXXX,Some text
1234-1,1234-2,0001,Article3,30.00,XXXXX,Some text
Added in response to:
Is there a way in XSLT to get the same results using loop? For example loop through and output all the elements and the sub-elements except the InvoiceRow elements and then vice versa?
If you prefer, you could try something like:
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="Invoice">
<xsl:variable name="invoice-fields" select="//*[not(*) and not(ancestor::InvoiceRow)]" />
<xsl:variable name="common-data">
<xsl:for-each select="$invoice-fields">
<xsl:value-of select="."/>
<xsl:text>,</xsl:text>
</xsl:for-each>
</xsl:variable>
<!-- header -->
<xsl:for-each select="$invoice-fields">
<xsl:value-of select="name()"/>
<xsl:text>,</xsl:text>
</xsl:for-each>
<xsl:for-each select="InvoiceRow[1]/*">
<xsl:value-of select="name()"/>
<xsl:if test="position()!=last()">,</xsl:if>
</xsl:for-each>
<xsl:text>
</xsl:text>
<!-- data -->
<xsl:for-each select="InvoiceRow">
<xsl:copy-of select="$common-data"/>
<xsl:for-each select="*">
<xsl:value-of select="."/>
<xsl:if test="position()!=last()">,</xsl:if>
</xsl:for-each>
<xsl:text>
</xsl:text>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
The result here would be:
Identifier,SellerStreet,SellerTown,BuyerIdentifier,BuyerStreet,BuyerTown,BuyerNumber,InvoiceNumber,EpiBfiIdentifier,InvoiceUrlText,ArticleName,RowText,RowText,RowAmount
1234-1,Street1,Town1,1234-2,Street2,Town2,001234,0001,XXXXX,Some text,Article1,Product Text1,Product Text2,10.00
1234-1,Street1,Town1,1234-2,Street2,Town2,001234,0001,XXXXX,Some text,Article2,Product Text11,Product Text22,20.00
1234-1,Street1,Town1,1234-2,Street2,Town2,001234,0001,XXXXX,Some text,Article3,Product Text111,Product Text222,30.00
i.e. listing all invoice fields before the row fields.
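The question also mentions Python, so here is a rough sketch of the same generic idea using only the standard library. It assumes, like the stylesheet above, that InvoiceRow is the only repeating element and that all rows have the same children as the first one. The function name invoice_to_rows and the shortened sample document are made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET


def invoice_to_rows(root, row_tag="InvoiceRow"):
    """Return (header, rows); the shared fields are repeated for every row element."""
    # Every element that sits inside a repeating row element (row included)
    in_row = {el for row in root.iter(row_tag) for el in row.iter()}
    # Leaf elements outside the rows, in document order
    common = [el for el in root.iter() if len(el) == 0 and el not in in_row]
    header = [el.tag for el in common]
    common_values = [el.text or "" for el in common]
    rows = []
    for row in root.findall(row_tag):
        rows.append(common_values + [child.text or "" for child in row])
    # Column names for the row part are taken from the first row only
    first = root.find(row_tag)
    if first is not None:
        header += [child.tag for child in first]
    return header, rows


SAMPLE = """<Invoice>
  <SellerDetails><Identifier>1234-1</Identifier></SellerDetails>
  <InvoiceDetails><InvoiceNumber>0001</InvoiceNumber></InvoiceDetails>
  <InvoiceRow>
    <ArticleName>Article1</ArticleName>
    <RowAmount AmountCurrencyIdentifier="EUR">10.00</RowAmount>
  </InvoiceRow>
  <InvoiceRow>
    <ArticleName>Article2</ArticleName>
    <RowAmount AmountCurrencyIdentifier="EUR">20.00</RowAmount>
  </InvoiceRow>
  <InvoiceUrlText>Some text</InvoiceUrlText>
</Invoice>"""

header, rows = invoice_to_rows(ET.fromstring(SAMPLE))
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)
print(buffer.getvalue())
```

Using csv.writer rather than joining with commas means values that themselves contain commas or quotes are escaped properly, which the plain XSLT text output does not do.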
I have handled a case similar to yours. I created a package based on untangle, a package that parses XML into plain Python objects, for example:
<?xml version="1.0"?>
<root>
<child name="child1"/>
</root>
to
obj.root.child['name'] # u'child1'
Then you can easily write some code to traverse the object and get what you want.
For example, you could write a helper along the lines of get_items_by_tag('InvoiceRow').
Hope it helps!
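To give an idea of what such a traversal helper might look like, here is a small sketch. Note that it uses the standard library's ElementTree rather than untangle or the package mentioned above, and that get_items_by_tag is only a hypothetical name, not an existing API.

```python
import xml.etree.ElementTree as ET


def get_items_by_tag(root, tag):
    """Return one dict per element named tag, mapping child tag -> list of texts."""
    items = []
    for element in root.iter(tag):
        item = {}
        for child in element:
            # setdefault keeps repeated children such as RowText together
            item.setdefault(child.tag, []).append(child.text)
        items.append(item)
    return items


doc = ET.fromstring(
    "<Invoice>"
    "<InvoiceRow><ArticleName>Article1</ArticleName>"
    "<RowText>Product Text1</RowText><RowText>Product Text2</RowText></InvoiceRow>"
    "<InvoiceRow><ArticleName>Article2</ArticleName></InvoiceRow>"
    "</Invoice>"
)
for item in get_items_by_tag(doc, "InvoiceRow"):
    print(item)
```

Collecting values in lists is one way to cope with repeated children like RowText, which a plain tag-to-text dict would silently overwrite.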
In PHP, you can use registerPHPFunctions to use a PHP function inside an XSLT file like this:
<?php
$xml = <<<EOB
<allusers>
<user>
<uid>bob</uid>
<id>1</id>
</user>
<user>
<uid>joe</uid>
<id>2</id>
</user>
</allusers>
EOB;
$xsl = <<<EOB
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:php="http://php.net/xsl">
<xsl:output method="html" encoding="utf-8" indent="yes"/>
<xsl:template match="allusers">
<html><body>
<h2>Users</h2>
<table>
<xsl:for-each select="user">
<tr><td>
<xsl:value-of
select="php:function('ucfirst',concat(string(uid), string(id)))"/>
</td></tr>
</xsl:for-each>
</table>
</body></html>
</xsl:template>
</xsl:stylesheet>
EOB;
$xmldoc = new DOMDocument();
$xmldoc->loadXML($xml);
$xsldoc = new DOMDocument();
$xsldoc->loadXML($xsl);
$proc = new XSLTProcessor();
$proc->registerPHPFunctions('ucfirst');
$proc->importStyleSheet($xsldoc);
echo $proc->transformToXML($xmldoc);
?>
What is the Python equivalent? This is what I've tried:
from lxml import etree
xml = etree.XML('''
<allusers>
<user>
<uid>bob</uid>
<id>1</id>
</user>
<user>
<uid>joe</uid>
<id>2</id>
</user>
</allusers>''')
xsl = etree.XML('''
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:f="mynamespace"
extension-element-prefixes="f">
<xsl:output method="html" encoding="utf-8" indent="yes"/>
<xsl:template match="allusers">
<html><body>
<h2>Users</h2>
<table>
<xsl:for-each select="user">
<tr><td>
<f:ucfirst>
<xsl:value-of select="concat(string(uid), string(id))"/>
</f:ucfirst>
</td></tr>
</xsl:for-each>
</table>
</body></html>
</xsl:template>
</xsl:stylesheet>
''')
extension = Ucase()
extensions = { ('mynamespace', 'ucfirst') : extension }
proc = etree.XSLT(xsl, extensions=extensions)
str(proc(xml))
class Ucase(etree.XSLTExtension):
    def execute(self, context, self_node, input_node, output_parent):
        title = self_node[0].text.capitalize()
        output_parent.text(title)
This is a simplified version of my XSLT.
Here is how an extension function (not an element) can give the result that I think you want:
from lxml import etree
def ucfirst(context, s):
    return s.capitalize()
ns = etree.FunctionNamespace("mynamespace")
ns['ucfirst'] = ucfirst
xml = etree.XML('''
<allusers>
<user>
<uid>bob</uid>
<id>1</id>
</user>
<user>
<uid>joe</uid>
<id>2</id>
</user>
</allusers>''')
xsl = etree.XML(b'''\
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:f="mynamespace" exclude-result-prefixes="f">
<xsl:output method="html" encoding="utf-8" indent="yes"/>
<xsl:template match="allusers">
<html><body>
<h2>Users</h2>
<table>
<xsl:for-each select="user">
<tr><td>
<xsl:value-of select="f:ucfirst(concat(string(uid), string(id)))"/>
</td></tr>
</xsl:for-each>
</table>
</body></html>
</xsl:template>
</xsl:stylesheet>
''')
transform = etree.XSLT(xsl)
result = transform(xml)
print(result)
Output:
<html><body>
<h2>Users</h2>
<table>
<tr><td>Bob1</td></tr>
<tr><td>Joe2</td></tr>
</table>
</body></html>
See http://lxml.de/extensions.html#xpath-extension-functions.
There are separate answers for variables and functions. I'm only really familiar with the variable half.
For variables, you can pass them as an xsl:param by passing them as keyword arguments to the call. For example:
transform = etree.XSLT(xslt_tree)
result = transform(doc_root, a="5")
Note that the argument is an XPath expression, so strings need to be quoted. There is a function that does this opaquely:
result = transform(doc_root, a=etree.XSLT.strparam(""" It's "Monty Python" """))
If you want to pass an XML fragment you could use the exslt:node-set() function.
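To make the quoting rule concrete, here is a small self-contained sketch, assuming lxml is installed. The stylesheet and its parameter a are made up for the demonstration; it just prints the value it receives.

```python
from lxml import etree

xslt_tree = etree.XML(b"""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- default used when the caller passes no value -->
  <xsl:param name="a" select="'default'"/>
  <xsl:template match="/">
    <xsl:value-of select="$a"/>
  </xsl:template>
</xsl:stylesheet>""")
doc_root = etree.XML("<doc/>")
transform = etree.XSLT(xslt_tree)

# The keyword value is evaluated as an XPath expression, so this is arithmetic
as_number = str(transform(doc_root, a="2 + 3"))
# A string literal needs its own XPath quotes inside the Python string
as_literal = str(transform(doc_root, a="'five'"))
# strparam() takes care of quoting for arbitrary text, mixed quotes included
as_strparam = str(
    transform(doc_root, a=etree.XSLT.strparam("""It's "Monty Python" """))
)

for value in (as_number, as_literal, as_strparam):
    print(value)
```

Passing a bare word without the inner quotes, e.g. a="five", would be read as an XPath path selecting child elements named five, which is usually not what is wanted.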
For functions, you can expose them either as an xpath function or as an element. There is a bunch of variety and I haven't done this myself so read the docs below and/or edit this answer.
Docs for basic use and variables.
Docs for adding functions.