How to convert nodes in XML to CDATA with XSLT?

How to convert nodes in XML to CDATA with XSLT? - python

I have a source.xml file with structure like:
<products>
<product>
<id>1</id>
<description>
<style>
table{
some css here
}
</style>
<descr>
<div>name of producer like ABC&DEF</div>
<table>
<th>parameters</th>
<tr><td>name of param 1 e.g POWER CONSUMPTION</td>
<td>value of param 1 with e.g < 100 W</td></tr>
</table>
</descr>
</description>
</product>
.....................
</products>
I would like to have:
<products>
<product>
<id>1</id>
<description>
<![CDATA[
<style>
table{
some css here
}
</style>
<descr>
<div>name of producer like ABC&DEF</div>
<table>
<th>parameters</th>
<tr><td>name of param 1 e.g POWER CONSUMPTION</td>
<td>value of param 1 with e.g < 100 VA</td></tr>
</table>
]]>
</descr>
</description>
</product>
.....................
</products>
I tried .xsl stylesheets based on:
How to use in XSLT?
and
Add CDATA to an xml file
and
how to add cdata to an xml file using xsl such as:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes" encoding="UTF-8" />
<xsl:template match="/products">
<products>
<xsl:for-each select="product">
<product>
<description>
<xsl:text disable-output-escaping="yes"><![CDATA[</xsl:text>
<xsl:copy-of select="description/node()" />
<xsl:text disable-output-escaping="yes">]]></xsl:text>
</xsl:for-each>
</description>
</product>
</xsl:for-each>
</products>
</xsl:template>
</xsl:stylesheet>
and
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
<xsl:output method="xml" indent="yes" cdata-section-elements="description"/>
<xsl:template match="description">
<xsl:copy>
<xsl:apply-templates select="#*"/>
<xsl:variable name="subElementsText">
<xsl:apply-templates select="node()" mode="asText"/>
</xsl:variable>
</xsl:copy>
</xsl:template>
<xsl:template match="text()" mode="asText">
<xsl:copy/>
</xsl:template>
<xsl:template match="*" mode="asText">
<xsl:value-of select="concat('<',name())"/>
<xsl:for-each select="#*">
<xsl:value-of select="concat(' ',name(),'="',.,'"')"/>
</xsl:for-each>
<xsl:value-of select="'>'"/>
<xsl:apply-templates select="node()" mode="asText"/>
<xsl:value-of select="concat('</',name(),'>')"/>
</xsl:template>
<xsl:template match="#* | node()">
<xsl:copy>
<xsl:apply-templates select="#* | node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
but running my python script
import lxml.etree as ET
doc = ET.parse('source.xml')
xslt = ET.parse('modyfi.xsl')
transform = ET.XSLT(xslt)
newdoc = transform(doc)
with open(f'output.xml', 'wb') as f:
f.write(newdoc)
on SublimeText3 I allways get the same error:
lxml.etree.XMLSyntaxError: StartTag: invalid element name, {number of line and column with first appearance of illegal character}
I am sure, that solution is straight in front of me in links above, but I can't see it.
Or maybe I can't find it because I can't ask the right question. Please help, I'm new to coding.

The input XML is not well-formed. I had to fix it first. That seems to be the reason why it is failing on your end.
XML
<products>
<product>
<id>1</id>
<description>
<style>table{
some css here
}</style>
<descr>
<div>name of producer like ABC&DEF</div>
<table>
<th>parameters</th>
<tr>
<td>name of param 1 e.g POWER CONSUMPTION</td>
<td>value of param 1 with e.g < 100 W</td>
</tr>
</table>
</descr>
</description>
</product>
</products>
XSLT
<?xml version="1.0"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="description">
<xsl:copy>
<xsl:text disable-output-escaping="yes"><![CDATA[</xsl:text>
<xsl:copy-of select="*"/>
<xsl:text disable-output-escaping="yes">]]></xsl:text>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Output
<products>
<product>
<id>1</id>
<description><![CDATA[
<style>table{
some css here
}
</style>
<descr>
<div>name of producer like ABC&DEF</div>
<table>
<th>parameters</th>
<tr>
<td>name of param 1 e.g POWER CONSUMPTION</td>
<td>value of param 1 with e.g < 100 W</td>
</tr>
</table>
</descr>]]>
</description>
</product>
</products>

In my view a clean way is to make use of a serialize function to serialize all elements you want as plain text, to then designate the parent container in the xsl:output declaration in the cdata-section-elements and to finally make sure the XSLT processor is in charge of the serialization.
Now XSLT 3 has a built-in XPath 3.1 serialize function, in Python you could use that with Saxon-C and its Python API.
For libxslt based XSLT 1 with lxml you can write an extension function in Python exposed to XSLT:
from lxml import etree as ET
def serialize(context, nodes):
return b''.join(ET.tostring(node) for node in nodes)
ns = ET.FunctionNamespace('http://example.com/mf')
ns['serialize'] = serialize
xml = ET.fromstring('<root><div><p>foo</p><p>bar</p></div></root>')
xsl = ET.fromstring('''<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:mf="http://example.com/mf" version="1.0">
<xsl:output method="xml" cdata-section-elements="div" encoding="UTF-8"/>
<xsl:template match="#* | node()">
<xsl:copy>
<xsl:apply-templates select="#* | node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="div">
<xsl:copy>
<xsl:value-of select="mf:serialize(node())"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>''')
transform = ET.XSLT(xsl)
result = transform(xml)
result.write_output("transformed.xml")
Output then is
<?xml version="1.0" encoding="UTF-8"?>
<root><div><![CDATA[<p>foo</p><p>bar</p>]]></div></root>

Related

xslt template to transform xml file to xml

Im using the python code to parse multiple .xml files
import os
import lxml.etree as ET
import sys
inputpath =
xsltfile =
outpath =
dir = []
if sys.version_info[0] >= 3:
unicode = str
for dirpath, dirnames, filenames in os.walk(inputpath):
structure = os.path.join(outpath, dirpath[len(inputpath):])
if not os.path.isdir(structure):
os.mkdir(structure)
for filename in filenames:
if filename.endswith(('.xml')):
dir = os.path.join(dirpath, filename)
print(dir)
dom = ET.parse(dir)
xslt = ET.parse(xsltfile)
transform = ET.XSLT(xslt)
newdom = transform(dom)
infile = unicode((ET.tostring(newdom, pretty_print=True,xml_declaration=True,standalone='yes')))
outfile = open(structure + "\\" + filename, 'a')
outfile.write(infile)
I do have an .xslt template which is used to sort the uuids in the same file.
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" standalone="yes"/>
<xsl:strip-space elements="*"/>
<!-- identity transform -->
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="uuids">
<xsl:copy>
<xsl:apply-templates select="uuid">
<xsl:sort select="."/>
</xsl:apply-templates>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Desired Output should be same as source unicode char's but with sortig uuid's in the same file. I see that uuids are sorting fine, but this unicode is changing to numbers which i dont want to. I

While asking a question it is a good idea to provide a minimal reproducible example, i.e. XML/XSLT pair.
Please try the following conceptual example.
I am using SAXON 9.7.0.15
It is very possible that the last Python line is causing the issue:
outfile.write(ET.tostring(newdom,pretty_print=True,xml_declaration=True,standalone='yes').decode())
Please try Python last lines as follows:
import sys
if sys.version_info[0] >= 3:
unicode = str
...
newdom = transform(dom)
infile = unicode((ET.tostring(newdom, pretty_print=True)))
outfile = open(structure + "\\" + filename, 'a')
outfile.write(infile, encoding='utf-8', xml_declaration=True, pretty_print=True)
https://lxml.de/api/lxml.etree._ElementTree-class.html#write
Reference link: How to transform an XML file using XSLT in Python
Input XML
<?xml version="1.0" encoding="UTF-8"?>
<a:ruleInputTestConfigs xmlns:a="URI">
<a:value xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:type="xsd:string">あいうえお#domain.com</a:value>
<a:nameRef>email</a:nameRef>
<a:id>1</a:id>
</a:ruleInputTestConfigs>
XSLT
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" standalone="yes"/>
<xsl:strip-space elements="*"/>
<!-- identity transform -->
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Output XML
<?xml version="1.0" encoding="UTF-8"?>
<a:ruleInputTestConfigs xmlns:a="URI">
<a:value xmlns:xsd="http://www.w3.org/2001/XMLSchema" xsi:type="xsd:string"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">あいうえお#domain.com</a:value>
<a:nameRef>email</a:nameRef>
<a:id>1</a:id>
</a:ruleInputTestConfigs>

xslt transformation works in Altova but not in python

I am trying to transform the one xml to another xml using xslt. Below is the xslt I am using
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns="Apartments.AP.Mits20PropertyFinal"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:var="http://schemas.microsoft.com/BizTalk/2003/var"
exclude-result-prefixes="msxsl var userCSharp"
xmlns:userCSharp="http://schemas.microsoft.com/BizTalk/2003/userCSharp"
>
<xsl:output method="xml" encoding="UTF-8" indent="yes"/>
<xsl:key name="myKey" match="ILS_Unit/Units/Unit" use="./Identification[#IDType='FloorplanID']/#IDValue"/>
<xsl:template match="/">
<PhysicalProperty>
<!-- <xsl:apply-templates select="PhysicalProperty/Management"/>
--> <xsl:apply-templates select="PhysicalProperty/Property"/>
</PhysicalProperty>
</xsl:template>
<xsl:template match="Property">
<Property>
<PropertyInfo>
<xsl:element name="MITSID">
<xsl:value-of select="./PropertyID/Identification[#IDRank='secondary']/#IDValue"/>
</xsl:element>
<xsl:element name="MarketingName">
<xsl:value-of select="./PropertyID/MarketingName"/>
</xsl:element>
<xsl:element name="TotalUnits">
</xsl:element>
<xsl:element name="AddressLine1">
<xsl:value-of select="./PropertyID/Address/AddressLine1"/>
</xsl:element>
<xsl:element name="City">
<xsl:value-of select="./PropertyID/Address/City"/>
</xsl:element>
<xsl:element name="State">
<xsl:value-of select="./PropertyID/Address/State"/>
</xsl:element>
<xsl:element name="Zip">
<xsl:value-of select="./PropertyID/Address/PostalCode"/>
</xsl:element>
</PropertyInfo>
<xsl:for-each select="./Floorplan">
<Floorplan>
<FloorplanID><xsl:value-of select="./#IDValue"/></FloorplanID>
<FloorplanName><xsl:value-of select="./Name"/></FloorplanName>
<!--This Unit count is Totals no of Units per floorplan-->
<UnitCount><xsl:value-of select="./UnitCount"/></UnitCount>
<Units>
<xsl:variable name="floorplanid" select="./#IDValue"/>
<xsl:for-each select="key('myKey',$floorplanid)">
<Unit>
<UnitNum>
<xsl:value-of select="./MarketingName"/>
</UnitNum>
<xsl:if test="./UnitLeasedStatus='Not_Available'">
<UnitLeasedStatus>Occupied</UnitLeasedStatus>
</xsl:if>
<xsl:if test="./UnitLeasedStatus='On_Notice'">
<UnitLeasedStatus>On Notice</UnitLeasedStatus>
</xsl:if>
<xsl:if test="./UnitLeasedStatus='Available'">
<UnitLeasedStatus>Available</UnitLeasedStatus>
</xsl:if>
</Unit>
</xsl:for-each>
</Units>
</Floorplan>
</xsl:for-each>
</Property>
</xsl:template>
</xsl:stylesheet>
And below is the sample xml file
<?xml version="1.0" encoding="utf-8"?>
<PhysicalProperty>
<Property>
<PropertyID>
<Identification IDValue="183dbed4-0101-4a85-954c-a4e8042d2819" IDRank="primary"/>
<Identification IDValue="6458174" IDRank="secondary"/>
<MarketingName>Westmount at London Park</MarketingName>
<Website>http://www.westmountatlondonpark.com</Website>
<Address AddressType="property">
<AddressLine1>14545 Bammel North Houston Road</AddressLine1>
<AddressLine2/>
<City>Houston</City>
<State>TX</State>
<PostalCode>77014</PostalCode>
</Address>
</PropertyID>
<ILS_Identification ILS_IdentificationType="Apartment" RentalType="Unspecified"/>
<Floorplan IDValue="171ad0f2-da57-45c0-9cf5-c61cfa26dd63" IDType="FloorplanID" IDRank="primary">
<FloorplanType>Internal</FloorplanType>
<Name>A1  </Name>
<Comment/>
<UnitCount>22</UnitCount>
<UnitsAvailable>3</UnitsAvailable>
<DisplayedUnitsAvailable>5</DisplayedUnitsAvailable>
<Room RoomType="Bedroom">
<Count>1</Count>
</Room>
<Room RoomType="Bathroom">
<Count>1</Count>
</Room>
<SquareFeet Min="602" Max="602"/>
<EffectiveRent Min="810" Max="830"/>
<Deposit DepositType="Security Deposit">
<Amount>
<ValueRange Min="150" Max="150"/>
</Amount>
</Deposit>
<File FileID="e114a438-ee82-497d-8e04-6a5a8b2381ef" active="true">
<FileType>Floorplan</FileType>
<Caption/>
<Src>https://apollostore.blob.core.windows.net/londonpark/uploads/images/floorplans/a1.7a75909c-3362-409c-82c7-aa6c959f9c99.jpg</Src>
<Rank>999</Rank>
</File>
</Floorplan>
<ILS_Unit>
<ILS_Unit IDValue="be827564-6460-4af1-9644-3f6ffa225557" IDType="UnitID" IDRank="primary">
<Units>
<Unit>
<Identification IDValue="be827564-6460-4af1-9644-3f6ffa225557" IDType="UnitID" IDRank="primary"/>
<Identification IDValue="171ad0f2-da57-45c0-9cf5-c61cfa26dd63" IDType="FloorplanID" IDRank="primary"/>
<MarketingName>1901</MarketingName>
<UnitBedrooms>1</UnitBedrooms>
<UnitBathrooms>1</UnitBathrooms>
<UnitRent>765</UnitRent>
<UnitLeasedStatus>Not_Available</UnitLeasedStatus>
<FloorplanName>A1  </FloorplanName>
</Unit>
</Units>
<Comment/>
<EffectiveRent Min="765" Max="765"/>
<Deposit DepositType="Security Deposit">
<Amount>
<ValueRange Min="150" Max="150"/>
</Amount>
</Deposit>
</ILS_Unit>
</ILS_Unit>
</Property>
</PhysicalProperty>
The transformation works fine using altova tool but does not work in python.
Below is the python script I am using.
from lxml import etree
import os
import glob
xslt = etree.parse("fl.xslt")
dom = etree.parse("f.xml",)
transform = etree.XSLT(xslt)
try:
newdom = transform(dom)
root = etree.parse(newdom)
properties = root.findall("./Property")
print(properties)
except Exception as e:
print (e)
for error in transform.error_log:
print(error.message, error.line)
print(etree.tostring(newdom, pretty_print=True))
I am trying to print the property nodes and I also tried to write the result to .xml file but it returns nothing. Could some one tell what the issue could be.
Surprisingly it works fine in Altova.
Below are the errors that are thrown.
line 53, in <module>
root = etree.parse(newdom)
File "src\lxml\etree.pyx", line 3519, in lxml.etree.parse
File "src\lxml\parser.pxi", line 1862, in lxml.etree._parseDocument
TypeError: cannot parse from 'lxml.etree._XSLTResultTree'

Specific error has nothing to do with XSLT but your attempted parse of the result. Like Python's built-in xml.etree, the parse function of lxml.etree requires a file-like object. However, the result from an XSLT transformation in lxml is a ElementTree object that you can then directly run any XML DOM calls like findall, iterfind, etc.
Therefore, simply remove the parse line. Additionally, because you have a default namespace, consider assigning a temporary prefix to access nodes. Also, consider xpath in lxml.
newdom = transform(dom)
properties = newdom.findall("./doc:Property", namespaces={'doc': 'Apartments.AP.Mits20PropertyFinal'})
print(properties)
# [<Element {Apartments.AP.Mits20PropertyFinal}Property at 0x136b2f94b08>]
properties = newdom.xpath("./doc:Property", namespaces={'doc': 'Apartments.AP.Mits20PropertyFinal'})
print(properties)
# [<Element {Apartments.AP.Mits20PropertyFinal}Property at 0x136b2f94b08>]
Output
print(etree.tostring(newdom, pretty_print=True).decode("utf-8"))
# <PhysicalProperty xmlns="Apartments.AP.Mits20PropertyFinal">
# <Property>
# <PropertyInfo>
# <MITSID>6458174</MITSID>
# <MarketingName>Westmount at London Park</MarketingName>
# <TotalUnits/>
# <AddressLine1>14545 Bammel North Houston Road</AddressLine1>
# <City>Houston</City>
# <State>TX</State>
# <Zip>77014</Zip>
# </PropertyInfo>
# <Floorplan>
# <FloorplanID>171ad0f2-da57-45c0-9cf5-c61cfa26dd63</FloorplanID>
# <FloorplanName>A1 </FloorplanName>
# <UnitCount>22</UnitCount>
# <Units>
# <Unit>
# <UnitNum>1901</UnitNum>
# <UnitLeasedStatus>Occupied</UnitLeasedStatus>
# </Unit>
# </Units>
# </Floorplan>
# </Property>
# </PhysicalProperty>

Parsing PDML / XML and selecting specific fields for Pandas

I am using the ElementTreeXML API and trying to parse a large PDML (XML) file in Python. I am trying to get a tabular Pandas dataframe output with specific fields of information. The following is a subset of the actual file.
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="pdml2html.xsl"?>
<!-- You can find pdml2html.xsl in C:\Program Files\Wireshark or at https://code.wireshark.org/review/gitweb?p=wireshark.git;a=blob_plain;f=pdml2html.xsl. -->
<pdml version="0" creator="wireshark/3.2.2" time="Sun Mar 22 23:53:43 2020" capture_file="C:\Users\anyoung\AppData\Local\Temp\wireshark_Wi-Fi 2_20200322234518_a20824.pcapng">
<packet>
<proto name="geninfo" pos="0" showname="General information" size="66">
<field name="frame.cap_len" showname="Capture Length: 66 bytes (528 bits)" size="0" pos="0" show="66"/>
<field name="frame.marked" showname="Frame is marked: False" size="0" pos="0" show="0"/>
<field name="frame.cap_len" showname="Capture Length: 66 bytes (528 bits)" size="0" pos="0" show="66"/>
<field name="frame.marked" showname="Frame is marked: False" size="0" pos="0" show="0"/>
<field name="caplen" pos="0" show="66" showname="Captured Length" value="42" size="66"/>
<field name="timestamp" pos="0" show="Mar 22, 2020 23:45:34.045301000 Pacific Daylight Time" showname="Captured Time" value="1584945934.045301000" size="66"/>
</proto>
I want to get a table like:
field size value
frame.cap_len 0 null
frame.marked 0 null
timestamp 66 1584945934.045301000
I am really struggling with the syntax to do the above. I haven't been able to get anything that even comes close.

Here's another XSLT example (this is more for #Kristian).
XML Input (input.xml)
<pdml version="0" creator="wireshark/3.2.2" time="Sun Mar 22 23:53:43 2020" capture_file="C:\Users\anyoung\AppData\Local\Temp\wireshark_Wi-Fi 2_20200322234518_a20824.pcapng">
<packet>
<proto name="geninfo" pos="0" showname="General information" size="66">
<field name="frame.cap_len" showname="Capture Length: 66 bytes (528 bits)" size="0" pos="0" show="66"/>
<field name="frame.marked" showname="Frame is marked: False" size="0" pos="0" show="0"/>
<field name="frame.cap_len" showname="Capture Length: 66 bytes (528 bits)" size="0" pos="0" show="66"/>
<field name="frame.marked" showname="Frame is marked: False" size="0" pos="0" show="0"/>
<field name="caplen" pos="0" show="66" showname="Captured Length" value="42" size="66"/>
<field name="timestamp" pos="0" show="Mar 22, 2020 23:45:34.045301000 Pacific Daylight Time" showname="Captured Time" value="1584945934.045301000" size="66"/>
</proto>
</packet>
</pdml>
XSLT 1.0 (test.xsl)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>
<xsl:variable name="tab" select="' '"/>
<xsl:variable name="nl" select="'
'"/>
<xsl:template match="/">
<xsl:value-of select="concat('field',$tab,'size',$tab,'value',$nl)"/>
<xsl:apply-templates select=".//field"/>
</xsl:template>
<xsl:template match="field">
<xsl:value-of select="concat(#name,$tab,#size,$tab,#value,$nl)"/>
</xsl:template>
</xsl:stylesheet>
Python 3
from lxml import etree
tree = etree.parse("input.xml")
xslt = etree.parse("test.xsl")
new_tree = tree.xslt(xslt)
print(new_tree)
Printed Output
field size value
frame.cap_len 0
frame.marked 0
frame.cap_len 0
frame.marked 0
caplen 66 42
timestamp 66 1584945934.045301000

lxml
https://lxml.de/xpathxslt.html
lxml will allow you to transform XML documents using XSLT. For some reason, XSLT is overlooked and brute-force programming objects are used instead. Nevertheless, I prefer to use XLST when handling and transforming XML data.
I highly recommend learning XSLT and utilizing it regularly if you must handle XML data on a daily basis.
XLST to transform your XML document: packet.xsl
Variables are used for delimiter and end-of-line (EOL) to allow for easy modification.
Templates for header-row and field are used to allow the re-arrangement or addition of new fields when necessary.
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" media-type="string" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:variable name="delimiter" select="' '"/>
<xsl:variable name="EOL" select="'
'"/>
<xsl:template match="/pdml/packet/proto">
<xsl:call-template name="header-row"/>
<xsl:apply-templates match="field"/>
</xsl:template>
<xsl:template match="field">
<xsl:value-of select="#name"/>
<xsl:value-of select="$delimiter"/>
<xsl:value-of select="#size"/>
<xsl:value-of select="$delimiter"/>
<xsl:value-of select="#value"/>
<xsl:value-of select="$EOL"/>
<xsl:apply-templates select="*"/>
</xsl:template>
<xsl:template name="header-row">
<xsl:element name="row">
<xsl:text>field</xsl:text>
<xsl:value-of select="$delimiter"/>
<xsl:text>size</xsl:text>
<xsl:value-of select="$delimiter"/>
<xsl:text>value</xsl:text>
<xsl:value-of select="$EOL"/>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
Example Python XML/XSLT transformation code
Use the python script provided by #Daniel Haley. I just named my file test.py
Execute XSLT with XML input
XSLT = packet.xsl
XML = packet.xml
Assuming your packet.xml is a well formed XML document and not an incomplete fragement.
./test.py
Tab delimited output.
field size value
frame.cap_len 0
frame.marked 0
frame.cap_len 0
frame.marked 0
caplen 66 42
timestamp 66 1584945934.045301000

How to get content of an XML element as string?

Considering this XML example:
<data>
<items>
<item name="item1">item1pre <bold>ok!</bold> item1post</item>
<item name="item2">item2</item>
</items>
</data>
I am looking for a way to get the following result:
"item1pre **ok! ** item1post"
I thought of getting all the content of item1 as a string "item1pre <'bold> ok!<'/bold> item1post" and then replace "<'bold>" and "<'/bold>" by "**", but I don't know how to get that.

xml="""
<data>
<items>
<item name="item1">item1pre<bold>ok!</bold>item1post</item>
<item name="item2">item2</item>
</items>
</data>
"""
import xml.etree.ElementTree as ET
# python included module
def cleaned_strings_from_xml(xml_str, tag='item'):
"""
finds all items of type tag from xml-string
:param xml_str: valid xml structure as string
:param tag: tag to search inside the xml
:returns: list of all texts of 'tag'-items
"""
strings = []
root = ET.fromstring(xml)
for item in root.iter(tag):
item_str = ET.tostring(item).decode('utf-8')
item_str = item_str.replace('<bold>', ' **').replace('</bold>', ' **')
strings.append(ET.fromstring(item_str).text)
return strings
print(cleaned_strings_from_xml(xml))

You could offload all xml processing to libxml by using an xslt transformation. Libxml is written in C which should be quicker:
from lxml import etree
transform = etree.XSLT(etree.XML('''
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" />
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:apply-templates />
</xsl:template>
<xsl:template match="data/items/item[#name = 'item1']">
<xsl:text>"</xsl:text>
<xsl:value-of select="text()"/>
<xsl:text>**</xsl:text>
<xsl:value-of select="bold/."/>
<xsl:text>**</xsl:text>
<xsl:value-of select="bold/following-sibling::text()[1]"/>
<xsl:text>"</xsl:text>
</xsl:template>
<xsl:template match="data/items/item[#name != 'item1']" />
</xsl:stylesheet>
'''))
with open("source.xml") as f:
print(transform(etree.parse(f)))
In a nutshell: Match the item element with name attribute 'item1' then use relative xpath expressions to extract the strings.

Python XML and XPath to sort things out

Let's say I have an XML as follows.
<a>
<b>
<c>A</c>
</b>
<bb>
<c>B</c>
</bb>
<c>
X
</c>
</a>
I need to parse this XML into dictionary X for a/b/c and a/b'/c, but dictionary Y for a/c.
dictionary X
X[a_b_c] = A
X[a_bb_c] = B
dictionary T
T[a_c] = X
Q : I'd like to make a mapping file for this in XML file using XPath. How can I do this?
I think of having mapping.xml as follows.
<mapping>
<from>a/c</from><to>dictionary T<to>
....
</mapping>
And using 'a/c' to get X, and put it in dictionary T. Is there any better ways to go?

Maybe you could do this with XSLT. This stylesheet:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:key name="dict" match="item" use="#dict"/>
<xsl:key name="path" match="*[not(*)]" use="concat(name(../..),'/',
name(..),'/',
name())"/>
<xsl:variable name="map">
<item path="a/b/c" dict="X"/>
<item path="a/bb/c" dict="X"/>
<item path="/a/c" dict="T"/>
</xsl:variable>
<xsl:template match="/">
<xsl:variable name="input" select="."/>
<xsl:for-each select="document('')/*/xsl:variable[#name='map']/*[count(.|key('dict',#dict)[1])=1]">
<xsl:variable name="dict" select="#dict"/>
<xsl:variable name="path" select="../item[#dict=$dict]/#path"/>
<xsl:value-of select="concat('dictionary ',$dict,'
')"/>
<xsl:for-each select="$input">
<xsl:apply-templates select="key('path',$path)">
<xsl:with-param name="dict" select="$dict"/>
</xsl:apply-templates>
</xsl:for-each>
</xsl:for-each>
</xsl:template>
<xsl:template match="*">
<xsl:param name="dict"/>
<xsl:variable name="path" select="concat(name(../..),'_',
name(..),'_',
name())"/>
<xsl:value-of select="concat($dict,'[',
translate(substring($path,
1,
1),
'_',
''),
substring($path,2),'] = ',
normalize-space(.),'
')"/>
</xsl:template>
</xsl:stylesheet>
Output:
dictionary X
X[a_b_c] = A
X[a_bb_c] = B
dictionary T
T[a_c] = X
EDIT: Pretty things a bit.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to convert nodes in XML to CDATA with XSLT? - python

Related

xslt template to transform xml file to xml

xslt transformation works in Altova but not in python

Parsing PDML / XML and selecting specific fields for Pandas

How to get content of an XML element as string?

Python XML and XPath to sort things out

Categories

Resources