Editing EPG XML with Python

I am new to Python and want to modify an XML file to rearrange some things. Below is an example, followed by the output I would like.
Original:
<programme channel="I9.11363.zap2it.com" start="20220729080000 -0500" stop="20220729090000 -0500">
<title lang="en">Live with Kelly and Ryan</title>
<sub-title lang="en">Live's Ready or Not Week; Live's Foodfluencer Friday Faceoff</sub-title>
<desc lang="en">Making an emergency evacuation kit; a chef provides a summertime recipe.</desc>
<date>20220729</date>
<category lang="en">Talk</category>
<category lang="en">Series</category>
<length units="minutes">60</length>
<icon src="https://zap2it.tmsimg.com/assets/p14101643_b_v13_ah.jpg" />
<url>https://tvlistings.zap2it.com//overview.html?programSeriesId=SH02684484&amp;tmsId=EP026844841372</url>
<episode-num system="common">S06E232</episode-num>
<episode-num system="dd_progid">EP02684484.1372</episode-num>
<episode-num system="xmltv_ns">5.231.</episode-num>
<audio>
<stereo>stereo</stereo>
</audio>
<new />
<subtitles type="teletext" />
<rating>
<value>TV-PG</value>
</rating>
</programme>
Desired output: move the <new/> tag into the title as the word "New", remove <episode-num system="common">S06E232</episode-num> and put its value at the start of the description (with the rating and date appended at the end), and drop the remaining elements:
<programme channel="I9.11363.zap2it.com" start="20220729080000 -0500" stop="20220729090000 -0500">
<title lang="en">Live with Kelly and Ryan New</title>
<sub-title lang="en">Live's Ready or Not Week; Live's Foodfluencer Friday Faceoff</sub-title>
<desc lang="en">S06E232 (return)Making an emergency evacuation kit; a chef provides a summertime recipe. TV-PG 20220729 </desc>
<icon src="https://zap2it.tmsimg.com/assets/p14101643_b_v13_ah.jpg" />
<url>https://tvlistings.zap2it.com//overview.html?programSeriesId=SH02684484&amp;tmsId=EP026844841372</url>
</programme>

Here is an XSLT-based solution.
Input XML
<?xml version="1.0"?>
<programme channel="I9.11363.zap2it.com" start="20220729080000 -0500" stop="20220729090000 -0500">
<title lang="en">Live with Kelly and Ryan</title>
<sub-title lang="en">Live's Ready or Not Week; Live's Foodfluencer Friday Faceoff</sub-title>
<desc lang="en">Making an emergency evacuation kit; a chef provides a summertime recipe.</desc>
<date>20220729</date>
<category lang="en">Talk</category>
<category lang="en">Series</category>
<length units="minutes">60</length>
<icon src="https://zap2it.tmsimg.com/assets/p14101643_b_v13_ah.jpg"/>
<url>https://tvlistings.zap2it.com//overview.html?programSeriesId=SH02684484&amp;tmsId=EP026844841372</url>
<episode-num system="common">S06E232</episode-num>
<episode-num system="dd_progid">EP02684484.1372</episode-num>
<episode-num system="xmltv_ns">5.231.</episode-num>
<audio>
<stereo>stereo</stereo>
</audio>
<new/>
<subtitles type="teletext"/>
<rating>
<value>TV-PG</value>
</rating>
</programme>
XSLT
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="utf-8" indent="yes" omit-xml-declaration="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="title">
<xsl:copy>
<xsl:attribute name="lang">en</xsl:attribute>
<xsl:value-of select="concat(., ' new')"/>
</xsl:copy>
</xsl:template>
<xsl:template match="desc">
<xsl:copy>
<xsl:attribute name="lang">en</xsl:attribute>
<xsl:value-of select="concat(/programme/episode-num[@system='common'], ' ', .)"/>
</xsl:copy>
</xsl:template>
<xsl:template match="date | category | length | episode-num | audio | new | subtitles | rating"/>
</xsl:stylesheet>
Output XML
<programme stop="20220729090000 -0500" channel="I9.11363.zap2it.com" start="20220729080000 -0500">
<title lang="en">Live with Kelly and Ryan new</title>
<sub-title lang="en">Live's Ready or Not Week; Live's Foodfluencer Friday Faceoff</sub-title>
<desc lang="en">S06E232 Making an emergency evacuation kit; a chef provides a summertime recipe.</desc>
<icon src="https://zap2it.tmsimg.com/assets/p14101643_b_v13_ah.jpg"/>
<url>https://tvlistings.zap2it.com//overview.html?programSeriesId=SH02684484&amp;tmsId=EP026844841372</url>
</programme>
Python
import lxml.etree as ET
inputfile = "D:\\temp\\input.xml"
xsltfile = "D:\\temp\\process.xslt"
outfile = "D:\\output\\output.xml"
dom = ET.parse(inputfile)
xslt = ET.parse(xsltfile)
transform = ET.XSLT(xslt)
# The stylesheet declares no <xsl:param>, so no parameters are passed.
newdom = transform(dom)
# ET.tostring() returns bytes, so open the output file in binary mode.
with open(outfile, "wb") as f:
    f.write(ET.tostring(newdom, pretty_print=True))
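The XSLT above stops short of two parts of the desired output: it appends a lowercase "new", and it does not append the rating and date to the description. Below is a minimal sketch of the full rework using only Python's standard-library xml.etree.ElementTree. It assumes the programmes sit under an XMLTV-style <tv> root. It also reads the "(return)" in the question as a line break, which is an interpretation.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's programme, wrapped in an assumed <tv> root.
SAMPLE = """<tv>
<programme channel="I9.11363.zap2it.com" start="20220729080000 -0500" stop="20220729090000 -0500">
<title lang="en">Live with Kelly and Ryan</title>
<sub-title lang="en">Live's Ready or Not Week; Live's Foodfluencer Friday Faceoff</sub-title>
<desc lang="en">Making an emergency evacuation kit; a chef provides a summertime recipe.</desc>
<date>20220729</date>
<category lang="en">Talk</category>
<icon src="https://zap2it.tmsimg.com/assets/p14101643_b_v13_ah.jpg" />
<url>https://tvlistings.zap2it.com//overview.html?programSeriesId=SH02684484&amp;tmsId=EP026844841372</url>
<episode-num system="common">S06E232</episode-num>
<episode-num system="xmltv_ns">5.231.</episode-num>
<new />
<rating><value>TV-PG</value></rating>
</programme>
</tv>"""

# Children kept in the output; everything else is dropped, as in the desired output.
KEEP = {"title", "sub-title", "desc", "icon", "url"}


def rework_programme(prog):
    title = prog.find("title")
    desc = prog.find("desc")
    episode = prog.find("episode-num[@system='common']")
    rating = prog.findtext("rating/value")
    date = prog.findtext("date")

    # <new/> becomes the word "New" at the end of the title.
    if prog.find("new") is not None and title is not None:
        title.text = (title.text or "") + " New"

    # Episode number, a line break ("(return)"), then text, rating and date.
    if desc is not None:
        body = " ".join(t for t in (desc.text, rating, date) if t)
        desc.text = f"{episode.text}\n{body}" if episode is not None else body

    # Remove unwanted children; iterate over a copy because we mutate prog.
    for child in list(prog):
        if child.tag not in KEEP:
            prog.remove(child)


root = ET.fromstring(SAMPLE)
for prog in root.findall("programme"):
    rework_programme(prog)
print(ET.tostring(root, encoding="unicode"))
```

To save the result, call ET.ElementTree(root).write("output.xml", encoding="utf-8", xml_declaration=True).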

Related

Create a series tag for multiple episodes - XML to XSLT

I have an XML feed with episodes from multiple TV shows.
I need to convert it into an RDF/XML file using an XSLT template.
Each RDF resource should represent one TV show, with its episodes nested as properties under it, along with the extracted information about the show.
With my current transformation, every episode (each item element) becomes its own RDF resource.
How do I extract a show's metadata from its episodes (i.e. the items) and then nest the show's episodes under the show's RDF resource?
CURRENT BEHAVIOUR
rdf:Group rdf:id=New Item in Series
ebucore:groupName: Football
ebucore:Season: 1
 ebucore:Episode: 1
 ebucore:groupId: Better Call Saul
 ebucore:groupDescription: Three young up
#----------------------------------------------------------------------
rdf:Group rdf:id=New Item in Series
ebucore:groupName: Basketball
ebucore:Season: 1
 ebucore:Episode: 2
 ebucore:groupId: Better Call Saul
 ebucore:groupDescription: Three young up
#----------------------------------------------------------------------
EXPECTED BEHAVIOUR
Each show as one RDF resource, with its episodes nested as properties under it.
rdf
show title better call saul
show description: Three young up
ep
ep.id 1
ep
ep.id 2
Below are the raw XML, the XSLT, and the code.
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
xmlns:media="http://search.yahoo.com/mrss/" xmlns:ott="http://rss.ott.com/1.0" xmlns:name="http://www.name.com/rss/extensions/" xmlns:dcterms="http://purl.org/dc/terms/">
<channel>
<title>ABC</title>
<description>Feed of items</description>
<pubDate>Wed, 06 May 2022</pubDate>
<item>
<guid isPermaLink="false">some_alphaNumeric</guid>
<title>Better Call Saul</title>
<name:episodic type="episode">
<name:seriesId>Better Call Saul</name:seriesId>
<name:seasonNum>1</name:seasonNum>
<name:episodeNum>1</name:episodeNum>
</name:episodic>
<description>This is a victorious story of a Lawyer </description>
<media:keywords>urban, African American, destiny</media:keywords>
<media:subTitle lang="en" type="application/octet-stream"/>
<media:category scheme="http://www.name.com">Drama - Drama</media:category>
<media:group>
<media:content url="video_url.mp4?Jd" type="application/mp4" duration="4693" medium="video" expression="full" bitrate="7627"/>
</media:group>
<media:group>
<media:thumbnail url="some_url" height="450" width="800"/>
<media:thumbnail url="https://mp4video.ott.com/04" height="1920" width="1080"/>
</media:group>
<media:rating scheme="simple">nonadult</media:rating>
<name:cuePoints>10</name:cuePoints>
<pubDate>Tue, 28 Jul 2020 13:29:16 -0400</pubDate>
</item>
<item>
<guid isPermaLink="false">some_alphaNumeric</guid>
<title>Better Call Saul</title>
<name:episodic type="episode">
<name:seriesId>Better Call Saul</name:seriesId>
<name:seasonNum>1</name:seasonNum>
<name:episodeNum>2</name:episodeNum>
</name:episodic>
<description>This is a victorious story of a Lawyer </description>
<media:keywords>urban, African American, destiny</media:keywords>
<media:subTitle lang="en" type="application/octet-stream"/>
<media:category scheme="http://www.name.com">Drama - Drama</media:category>
<media:group>
<media:content url="video_url.mp4?acs3" type="application/mp4" duration="4693" medium="video" expression="full" bitrate="7627"/>
</media:group>
<media:group>
<media:thumbnail url="some_url" height="450" width="800"/>
<media:thumbnail url="https://mp4video.ott.com/04" height="1920" width="1080"/>
</media:group>
<media:rating scheme="simple">nonadult</media:rating>
<name:cuePoints>10</name:cuePoints>
<pubDate>Tue, 28 Aug 2020 13:29:16 -0400</pubDate>
</item>
</channel>
</rss>
And the XSLT:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:html="http://www.w3.org/1999/xhtml"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:foaf="http://xmlns.com/foaf/spec/"
xmlns:ebucore="http://www.ebu.ch/metadata/ontologies/ebucore/ebucore#"
xmlns:ebulang="http://resolution.org/res#"
xmlns:foo="http://example.com/foo#"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:name="http://www.name.com/rss/extensions/">
<xsl:template match="/">
<rdf:RDF>
Number of Items in Rss Feed: <xsl:value-of select="count(//item)" />
<xsl:apply-templates/>
</rdf:RDF>
</xsl:template>
<xsl:template match="item">
<xsl:text>#----------------------------------------------------------------------</xsl:text><xsl:text>
</xsl:text>
<xsl:text>rdf:Group rdf:id=New Item in Series</xsl:text><xsl:text>
</xsl:text>
<xsl:text>ebucore:groupName: </xsl:text>
<xsl:value-of select="normalize-space(title)"/>
<xsl:text>
</xsl:text>
<xsl:for-each select="name:episodic">
<xsl:text>ebucore:Season: </xsl:text>
<xsl:value-of select="normalize-space(name:seasonNum)"/>
<xsl:text>
</xsl:text>
<xsl:text> </xsl:text>
<xsl:text>ebucore:Episode: </xsl:text>
<xsl:value-of select="normalize-space(name:episodeNum)"/>
<xsl:text>
</xsl:text>
<xsl:text> </xsl:text>
<xsl:text>ebucore:groupId: </xsl:text>
<xsl:value-of select="name:seriesId"/>
<xsl:text>
</xsl:text>
<xsl:text> </xsl:text>
</xsl:for-each>
<xsl:text>ebucore:groupDescription: </xsl:text>
<xsl:if test="count(description)=1">
<xsl:value-of select="normalize-space(description)"/>
</xsl:if>
<xsl:if test="count(description)=0">
<xsl:text>Null</xsl:text>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
Code:
import lxml.etree as ET
import xml.dom.minidom
from pprint import pprint
import os
dom_rssFeed = ET.parse('raw.xml')
xslt_rssFeed = ET.parse('mapper.xslt')
transform_rssFeed = ET.XSLT(xslt_rssFeed)
finaldom = transform_rssFeed(dom_rssFeed)
new = ET.tostring(finaldom)
dom_transformed = xml.dom.minidom.parseString(new)
pretty_xml_as_string = dom_transformed.toprettyxml()
print(pretty_xml_as_string)
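A common XSLT 1.0 technique for this kind of regrouping is Muenchian grouping. It works in three steps:

1. Index every item by its series id with xsl:key.
2. Process only the first item of each group.
3. Pull the whole group back through the same key to nest its episodes.

Below is a minimal sketch run through lxml. The feed is trimmed, and its third item is hypothetical, added only to show a second group. The sketch writes plain text rather than real RDF/XML so it does not guess at ebucore property names. Swap the xsl:text labels for your RDF elements.

```python
from lxml import etree as ET

# Trimmed feed: two episodes from the question plus a hypothetical third series.
FEED = b"""<rss version="2.0" xmlns:name="http://www.name.com/rss/extensions/">
<channel>
<item>
<title>Better Call Saul</title>
<name:episodic type="episode"><name:seriesId>Better Call Saul</name:seriesId><name:seasonNum>1</name:seasonNum><name:episodeNum>1</name:episodeNum></name:episodic>
<description>This is a victorious story of a Lawyer </description>
</item>
<item>
<title>Better Call Saul</title>
<name:episodic type="episode"><name:seriesId>Better Call Saul</name:seriesId><name:seasonNum>1</name:seasonNum><name:episodeNum>2</name:episodeNum></name:episodic>
<description>This is a victorious story of a Lawyer </description>
</item>
<item>
<title>Hypothetical Show</title>
<name:episodic type="episode"><name:seriesId>Hypothetical Show</name:seriesId><name:seasonNum>2</name:seasonNum><name:episodeNum>5</name:episodeNum></name:episodic>
<description>Placeholder description</description>
</item>
</channel>
</rss>"""

XSL = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:name="http://www.name.com/rss/extensions/">
  <xsl:output method="text"/>
  <!-- Index every item by its series id. -->
  <xsl:key name="bySeries" match="item" use="name:episodic/name:seriesId"/>
  <xsl:template match="/">
    <!-- Muenchian grouping: only the first item of each series passes. -->
    <xsl:for-each select="//item[generate-id() = generate-id(key('bySeries', name:episodic/name:seriesId)[1])]">
      <xsl:variable name="series" select="name:episodic/name:seriesId"/>
      <xsl:text>show: </xsl:text>
      <xsl:value-of select="$series"/>
      <xsl:text>&#10;description: </xsl:text>
      <xsl:value-of select="normalize-space(description)"/>
      <xsl:text>&#10;</xsl:text>
      <!-- Every episode of this series, nested under the show. -->
      <xsl:for-each select="key('bySeries', $series)">
        <xsl:text>  episode: S</xsl:text>
        <xsl:value-of select="name:episodic/name:seasonNum"/>
        <xsl:text>E</xsl:text>
        <xsl:value-of select="name:episodic/name:episodeNum"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>"""

transform = ET.XSLT(ET.fromstring(XSL))
# For method="text", str() on the result gives the text output.
result = str(transform(ET.fromstring(FEED)))
print(result)
```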

How to convert nodes in XML to CDATA with XSLT?

I have a source.xml file with structure like:
<products>
<product>
<id>1</id>
<description>
<style>
table{
some css here
}
</style>
<descr>
<div>name of producer like ABC&DEF</div>
<table>
<th>parameters</th>
<tr><td>name of param 1 e.g POWER CONSUMPTION</td>
<td>value of param 1 with e.g < 100 W</td></tr>
</table>
</descr>
</description>
</product>
.....................
</products>
I would like to have:
<products>
<product>
<id>1</id>
<description>
<![CDATA[
<style>
table{
some css here
}
</style>
<descr>
<div>name of producer like ABC&DEF</div>
<table>
<th>parameters</th>
<tr><td>name of param 1 e.g POWER CONSUMPTION</td>
<td>value of param 1 with e.g < 100 W</td></tr>
</table>
</descr>
]]>
</description>
</product>
.....................
</products>
I tried .xsl stylesheets based on "How to use in XSLT?", "Add CDATA to an xml file" and "how to add cdata to an xml file using xsl", such as:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes" encoding="UTF-8" />
<xsl:template match="/products">
<products>
<xsl:for-each select="product">
<product>
<description>
<xsl:text disable-output-escaping="yes">&lt;![CDATA[</xsl:text>
<xsl:copy-of select="description/node()" />
<xsl:text disable-output-escaping="yes">]]&gt;</xsl:text>
</description>
</product>
</xsl:for-each>
</products>
</xsl:template>
</xsl:stylesheet>
and
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
<xsl:output method="xml" indent="yes" cdata-section-elements="description"/>
<xsl:template match="description">
<xsl:copy>
<xsl:apply-templates select="@*"/>
<xsl:variable name="subElementsText">
<xsl:apply-templates select="node()" mode="asText"/>
</xsl:variable>
<xsl:value-of select="$subElementsText"/>
</xsl:copy>
</xsl:template>
<xsl:template match="text()" mode="asText">
<xsl:copy/>
</xsl:template>
<xsl:template match="*" mode="asText">
<xsl:value-of select="concat('&lt;',name())"/>
<xsl:for-each select="@*">
<xsl:value-of select="concat(' ',name(),'=&quot;',.,'&quot;')"/>
</xsl:for-each>
<xsl:value-of select="'&gt;'"/>
<xsl:apply-templates select="node()" mode="asText"/>
<xsl:value-of select="concat('&lt;/',name(),'&gt;')"/>
</xsl:template>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
but running my python script
import lxml.etree as ET
doc = ET.parse('source.xml')
xslt = ET.parse('modyfi.xsl')
transform = ET.XSLT(xslt)
newdoc = transform(doc)
with open('output.xml', 'wb') as f:
    f.write(bytes(newdoc))  # bytes() serializes the XSLT result per <xsl:output>
When I run it from Sublime Text 3, I always get the same error:
lxml.etree.XMLSyntaxError: StartTag: invalid element name, {number of line and column with first appearance of illegal character}
I am sure the solution is right in front of me in the links above, but I can't see it.
Or maybe I can't find it because I'm not asking the right question. Please help, I'm new to coding.
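The input XML is not well-formed: the bare & in "ABC&DEF" and the < in "< 100 W" must be escaped as &amp; and &lt; (the "StartTag: invalid element name" error points at that <). I had to fix it first, and that seems to be the reason why it is failing on your end.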
XML
<products>
<product>
<id>1</id>
<description>
<style>table{
some css here
}</style>
<descr>
<div>name of producer like ABC&amp;DEF</div>
<table>
<th>parameters</th>
<tr>
<td>name of param 1 e.g POWER CONSUMPTION</td>
<td>value of param 1 with e.g &lt; 100 W</td>
</tr>
</table>
</descr>
</description>
</product>
</products>
XSLT
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="description">
<xsl:copy>
<xsl:text disable-output-escaping="yes">&lt;![CDATA[</xsl:text>
<xsl:copy-of select="*"/>
<xsl:text disable-output-escaping="yes">]]&gt;</xsl:text>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Output
<products>
<product>
<id>1</id>
<description><![CDATA[
<style>table{
some css here
}
</style>
<descr>
<div>name of producer like ABC&amp;DEF</div>
<table>
<th>parameters</th>
<tr>
<td>name of param 1 e.g POWER CONSUMPTION</td>
<td>value of param 1 with e.g &lt; 100 W</td>
</tr>
</table>
</descr>]]>
</description>
</product>
</products>
In my view, a clean approach is to serialize the elements you want as plain text with a serialize function, list their parent container in the cdata-section-elements attribute of xsl:output, and finally let the XSLT processor handle the serialization.
XSLT 3.0 has the built-in XPath 3.1 serialize function; in Python you could use it with Saxon-C and its Python API.
For libxslt-based XSLT 1.0 with lxml, you can write an extension function in Python and expose it to XSLT:
from lxml import etree as ET
def serialize(context, nodes):
    # Serialize each selected element node (including its tail text) back to markup.
    # Text nodes would arrive as plain strings and are not handled here.
    return b''.join(ET.tostring(node) for node in nodes)
ns = ET.FunctionNamespace('http://example.com/mf')
ns['serialize'] = serialize
xml = ET.fromstring('<root><div><p>foo</p><p>bar</p></div></root>')
xsl = ET.fromstring('''<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:mf="http://example.com/mf" version="1.0">
<xsl:output method="xml" cdata-section-elements="div" encoding="UTF-8"/>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="div">
<xsl:copy>
<xsl:value-of select="mf:serialize(node())"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>''')
transform = ET.XSLT(xsl)
result = transform(xml)
result.write_output("transformed.xml")
The output is then:
<?xml version="1.0" encoding="UTF-8"?>
<root><div><![CDATA[<p>foo</p><p>bar</p>]]></div></root>

XSLT transformation works in Altova but not in Python

I am trying to transform one XML document into another using XSLT. Below is the XSLT I am using:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns="Apartments.AP.Mits20PropertyFinal"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:var="http://schemas.microsoft.com/BizTalk/2003/var"
exclude-result-prefixes="msxsl var userCSharp"
xmlns:userCSharp="http://schemas.microsoft.com/BizTalk/2003/userCSharp"
>
<xsl:output method="xml" encoding="UTF-8" indent="yes"/>
<xsl:key name="myKey" match="ILS_Unit/Units/Unit" use="./Identification[@IDType='FloorplanID']/@IDValue"/>
<xsl:template match="/">
<PhysicalProperty>
<!-- <xsl:apply-templates select="PhysicalProperty/Management"/>
--> <xsl:apply-templates select="PhysicalProperty/Property"/>
</PhysicalProperty>
</xsl:template>
<xsl:template match="Property">
<Property>
<PropertyInfo>
<xsl:element name="MITSID">
<xsl:value-of select="./PropertyID/Identification[@IDRank='secondary']/@IDValue"/>
</xsl:element>
<xsl:element name="MarketingName">
<xsl:value-of select="./PropertyID/MarketingName"/>
</xsl:element>
<xsl:element name="TotalUnits">
</xsl:element>
<xsl:element name="AddressLine1">
<xsl:value-of select="./PropertyID/Address/AddressLine1"/>
</xsl:element>
<xsl:element name="City">
<xsl:value-of select="./PropertyID/Address/City"/>
</xsl:element>
<xsl:element name="State">
<xsl:value-of select="./PropertyID/Address/State"/>
</xsl:element>
<xsl:element name="Zip">
<xsl:value-of select="./PropertyID/Address/PostalCode"/>
</xsl:element>
</PropertyInfo>
<xsl:for-each select="./Floorplan">
<Floorplan>
<FloorplanID><xsl:value-of select="./@IDValue"/></FloorplanID>
<FloorplanName><xsl:value-of select="./Name"/></FloorplanName>
<!--This Unit count is Totals no of Units per floorplan-->
<UnitCount><xsl:value-of select="./UnitCount"/></UnitCount>
<Units>
<xsl:variable name="floorplanid" select="./@IDValue"/>
<xsl:for-each select="key('myKey',$floorplanid)">
<Unit>
<UnitNum>
<xsl:value-of select="./MarketingName"/>
</UnitNum>
<xsl:if test="./UnitLeasedStatus='Not_Available'">
<UnitLeasedStatus>Occupied</UnitLeasedStatus>
</xsl:if>
<xsl:if test="./UnitLeasedStatus='On_Notice'">
<UnitLeasedStatus>On Notice</UnitLeasedStatus>
</xsl:if>
<xsl:if test="./UnitLeasedStatus='Available'">
<UnitLeasedStatus>Available</UnitLeasedStatus>
</xsl:if>
</Unit>
</xsl:for-each>
</Units>
</Floorplan>
</xsl:for-each>
</Property>
</xsl:template>
</xsl:stylesheet>
And below is a sample XML file:
<?xml version="1.0" encoding="utf-8"?>
<PhysicalProperty>
<Property>
<PropertyID>
<Identification IDValue="183dbed4-0101-4a85-954c-a4e8042d2819" IDRank="primary"/>
<Identification IDValue="6458174" IDRank="secondary"/>
<MarketingName>Westmount at London Park</MarketingName>
<Website>http://www.westmountatlondonpark.com</Website>
<Address AddressType="property">
<AddressLine1>14545 Bammel North Houston Road</AddressLine1>
<AddressLine2/>
<City>Houston</City>
<State>TX</State>
<PostalCode>77014</PostalCode>
</Address>
</PropertyID>
<ILS_Identification ILS_IdentificationType="Apartment" RentalType="Unspecified"/>
<Floorplan IDValue="171ad0f2-da57-45c0-9cf5-c61cfa26dd63" IDType="FloorplanID" IDRank="primary">
<FloorplanType>Internal</FloorplanType>
<Name>A1  </Name>
<Comment/>
<UnitCount>22</UnitCount>
<UnitsAvailable>3</UnitsAvailable>
<DisplayedUnitsAvailable>5</DisplayedUnitsAvailable>
<Room RoomType="Bedroom">
<Count>1</Count>
</Room>
<Room RoomType="Bathroom">
<Count>1</Count>
</Room>
<SquareFeet Min="602" Max="602"/>
<EffectiveRent Min="810" Max="830"/>
<Deposit DepositType="Security Deposit">
<Amount>
<ValueRange Min="150" Max="150"/>
</Amount>
</Deposit>
<File FileID="e114a438-ee82-497d-8e04-6a5a8b2381ef" active="true">
<FileType>Floorplan</FileType>
<Caption/>
<Src>https://apollostore.blob.core.windows.net/londonpark/uploads/images/floorplans/a1.7a75909c-3362-409c-82c7-aa6c959f9c99.jpg</Src>
<Rank>999</Rank>
</File>
</Floorplan>
<ILS_Unit>
<ILS_Unit IDValue="be827564-6460-4af1-9644-3f6ffa225557" IDType="UnitID" IDRank="primary">
<Units>
<Unit>
<Identification IDValue="be827564-6460-4af1-9644-3f6ffa225557" IDType="UnitID" IDRank="primary"/>
<Identification IDValue="171ad0f2-da57-45c0-9cf5-c61cfa26dd63" IDType="FloorplanID" IDRank="primary"/>
<MarketingName>1901</MarketingName>
<UnitBedrooms>1</UnitBedrooms>
<UnitBathrooms>1</UnitBathrooms>
<UnitRent>765</UnitRent>
<UnitLeasedStatus>Not_Available</UnitLeasedStatus>
<FloorplanName>A1  </FloorplanName>
</Unit>
</Units>
<Comment/>
<EffectiveRent Min="765" Max="765"/>
<Deposit DepositType="Security Deposit">
<Amount>
<ValueRange Min="150" Max="150"/>
</Amount>
</Deposit>
</ILS_Unit>
</ILS_Unit>
</Property>
</PhysicalProperty>
The transformation works fine in the Altova tool but not in Python.
Below is the Python script I am using:
from lxml import etree
import os
import glob
xslt = etree.parse("fl.xslt")
dom = etree.parse("f.xml")
transform = etree.XSLT(xslt)
try:
    newdom = transform(dom)
    root = etree.parse(newdom)
    properties = root.findall("./Property")
    print(properties)
except Exception as e:
    print(e)
    for error in transform.error_log:
        print(error.message, error.line)
print(etree.tostring(newdom, pretty_print=True))
I am trying to print the Property nodes, and I also tried writing the result to an .xml file, but I get nothing back. Could someone tell me what the issue might be?
Surprisingly, it works fine in Altova.
Below is the error that is thrown:
line 53, in <module>
root = etree.parse(newdom)
File "src\lxml\etree.pyx", line 3519, in lxml.etree.parse
File "src\lxml\parser.pxi", line 1862, in lxml.etree._parseDocument
TypeError: cannot parse from 'lxml.etree._XSLTResultTree'
The specific error has nothing to do with XSLT; it comes from your attempt to parse the result. Like the parse function of Python's built-in xml.etree, lxml.etree.parse() expects a filename or file-like object. The result of an XSLT transformation in lxml, however, is already an ElementTree object, so you can call findall, iterfind, xpath and similar methods on it directly.
Therefore, simply remove the parse line. Additionally, your stylesheet declares a default namespace (xmlns="Apartments.AP.Mits20PropertyFinal"), so every output element lives in that namespace. As a result, an unprefixed path such as "./Property" will not match anything. Map the namespace to a temporary prefix to reach those nodes; lxml's xpath method accepts the same namespaces mapping.
newdom = transform(dom)
properties = newdom.findall("./doc:Property", namespaces={'doc': 'Apartments.AP.Mits20PropertyFinal'})
print(properties)
# [<Element {Apartments.AP.Mits20PropertyFinal}Property at 0x136b2f94b08>]
properties = newdom.xpath("./doc:Property", namespaces={'doc': 'Apartments.AP.Mits20PropertyFinal'})
print(properties)
# [<Element {Apartments.AP.Mits20PropertyFinal}Property at 0x136b2f94b08>]
Output
print(etree.tostring(newdom, pretty_print=True).decode("utf-8"))
# <PhysicalProperty xmlns="Apartments.AP.Mits20PropertyFinal">
# <Property>
# <PropertyInfo>
# <MITSID>6458174</MITSID>
# <MarketingName>Westmount at London Park</MarketingName>
# <TotalUnits/>
# <AddressLine1>14545 Bammel North Houston Road</AddressLine1>
# <City>Houston</City>
# <State>TX</State>
# <Zip>77014</Zip>
# </PropertyInfo>
# <Floorplan>
# <FloorplanID>171ad0f2-da57-45c0-9cf5-c61cfa26dd63</FloorplanID>
# <FloorplanName>A1 </FloorplanName>
# <UnitCount>22</UnitCount>
# <Units>
# <Unit>
# <UnitNum>1901</UnitNum>
# <UnitLeasedStatus>Occupied</UnitLeasedStatus>
# </Unit>
# </Units>
# </Floorplan>
# </Property>
# </PhysicalProperty>

Parsing PDML / XML and selecting specific fields for Pandas

I am using the ElementTree XML API to parse a large PDML (XML) file in Python. I want a tabular pandas DataFrame containing specific fields. The following is a subset of the actual file.
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="pdml2html.xsl"?>
<!-- You can find pdml2html.xsl in C:\Program Files\Wireshark or at https://code.wireshark.org/review/gitweb?p=wireshark.git;a=blob_plain;f=pdml2html.xsl. -->
<pdml version="0" creator="wireshark/3.2.2" time="Sun Mar 22 23:53:43 2020" capture_file="C:\Users\anyoung\AppData\Local\Temp\wireshark_Wi-Fi 2_20200322234518_a20824.pcapng">
<packet>
<proto name="geninfo" pos="0" showname="General information" size="66">
<field name="frame.cap_len" showname="Capture Length: 66 bytes (528 bits)" size="0" pos="0" show="66"/>
<field name="frame.marked" showname="Frame is marked: False" size="0" pos="0" show="0"/>
<field name="frame.cap_len" showname="Capture Length: 66 bytes (528 bits)" size="0" pos="0" show="66"/>
<field name="frame.marked" showname="Frame is marked: False" size="0" pos="0" show="0"/>
<field name="caplen" pos="0" show="66" showname="Captured Length" value="42" size="66"/>
<field name="timestamp" pos="0" show="Mar 22, 2020 23:45:34.045301000 Pacific Daylight Time" showname="Captured Time" value="1584945934.045301000" size="66"/>
</proto>
I want to get a table like:
field size value
frame.cap_len 0 null
frame.marked 0 null
timestamp 66 1584945934.045301000
I am really struggling with the syntax to do the above. I haven't been able to get anything that even comes close.
Here's another XSLT example (this is more for @Kristian).
XML Input (input.xml)
<pdml version="0" creator="wireshark/3.2.2" time="Sun Mar 22 23:53:43 2020" capture_file="C:\Users\anyoung\AppData\Local\Temp\wireshark_Wi-Fi 2_20200322234518_a20824.pcapng">
<packet>
<proto name="geninfo" pos="0" showname="General information" size="66">
<field name="frame.cap_len" showname="Capture Length: 66 bytes (528 bits)" size="0" pos="0" show="66"/>
<field name="frame.marked" showname="Frame is marked: False" size="0" pos="0" show="0"/>
<field name="frame.cap_len" showname="Capture Length: 66 bytes (528 bits)" size="0" pos="0" show="66"/>
<field name="frame.marked" showname="Frame is marked: False" size="0" pos="0" show="0"/>
<field name="caplen" pos="0" show="66" showname="Captured Length" value="42" size="66"/>
<field name="timestamp" pos="0" show="Mar 22, 2020 23:45:34.045301000 Pacific Daylight Time" showname="Captured Time" value="1584945934.045301000" size="66"/>
</proto>
</packet>
</pdml>
XSLT 1.0 (test.xsl)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>
<xsl:variable name="tab" select="'&#9;'"/>
<xsl:variable name="nl" select="'&#xA;'"/>
<xsl:template match="/">
<xsl:value-of select="concat('field',$tab,'size',$tab,'value',$nl)"/>
<xsl:apply-templates select=".//field"/>
</xsl:template>
<xsl:template match="field">
<xsl:value-of select="concat(@name,$tab,@size,$tab,@value,$nl)"/>
</xsl:template>
</xsl:stylesheet>
Python 3
from lxml import etree
tree = etree.parse("input.xml")
xslt = etree.parse("test.xsl")
new_tree = tree.xslt(xslt)
print(new_tree)
Printed Output
field size value
frame.cap_len 0
frame.marked 0
frame.cap_len 0
frame.marked 0
caplen 66 42
timestamp 66 1584945934.045301000
lxml
https://lxml.de/xpathxslt.html
lxml lets you transform XML documents using XSLT. XSLT is often overlooked in favor of brute-force traversal of Python objects, but I prefer XSLT for handling and transforming XML data.
I highly recommend learning XSLT and using it regularly if you handle XML data on a daily basis.
XSLT to transform your XML document (packet.xsl):
Variables are used for delimiter and end-of-line (EOL) to allow for easy modification.
Templates for header-row and field are used to allow the re-arrangement or addition of new fields when necessary.
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>
<xsl:variable name="delimiter" select="'&#9;'"/>
<xsl:variable name="EOL" select="'&#xA;'"/>
<xsl:template match="/pdml/packet/proto">
<xsl:call-template name="header-row"/>
<xsl:apply-templates select="field"/>
</xsl:template>
<xsl:template match="field">
<xsl:value-of select="@name"/>
<xsl:value-of select="$delimiter"/>
<xsl:value-of select="@size"/>
<xsl:value-of select="$delimiter"/>
<xsl:value-of select="@value"/>
<xsl:value-of select="$EOL"/>
<xsl:apply-templates select="*"/>
</xsl:template>
<xsl:template name="header-row">
<xsl:element name="row">
<xsl:text>field</xsl:text>
<xsl:value-of select="$delimiter"/>
<xsl:text>size</xsl:text>
<xsl:value-of select="$delimiter"/>
<xsl:text>value</xsl:text>
<xsl:value-of select="$EOL"/>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
Example Python XML/XSLT transformation code
Use the Python script provided by @Daniel Haley, with the file names changed to packet.xml and packet.xsl. I named my file test.py.
Execute XSLT with XML input
XSLT = packet.xsl
XML = packet.xml
This assumes your packet.xml is a well-formed XML document and not an incomplete fragment.
./test.py
Tab-delimited output:
field size value
frame.cap_len 0
frame.marked 0
frame.cap_len 0
frame.marked 0
caplen 66 42
timestamp 66 1584945934.045301000
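The question asks for a pandas table from a large file. An alternative that skips XSLT is to stream the file with the standard library's ElementTree.iterparse and collect the three attributes of each field. Below is a minimal sketch, assuming the sample structure shown in the question.

```python
import io
import xml.etree.ElementTree as ET

# Small well-formed PDML sample based on the question.
PDML = b"""<pdml version="0" creator="wireshark/3.2.2">
<packet>
<proto name="geninfo" pos="0" showname="General information" size="66">
<field name="frame.cap_len" showname="Capture Length: 66 bytes (528 bits)" size="0" pos="0" show="66"/>
<field name="frame.marked" showname="Frame is marked: False" size="0" pos="0" show="0"/>
<field name="caplen" pos="0" show="66" showname="Captured Length" value="42" size="66"/>
<field name="timestamp" pos="0" show="Mar 22, 2020 23:45:34.045301000 Pacific Daylight Time" showname="Captured Time" value="1584945934.045301000" size="66"/>
</proto>
</packet>
</pdml>"""


def iter_field_rows(source):
    """Yield one dict per <field>, streaming so a large file need not fit in memory."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "field":
            # .get() returns None when an attribute (e.g. value) is missing.
            yield {"field": elem.get("name"),
                   "size": elem.get("size"),
                   "value": elem.get("value")}
        elif elem.tag == "packet":
            elem.clear()  # free the finished packet's subtree


rows = list(iter_field_rows(io.BytesIO(PDML)))
for row in rows:
    print(row)
```

pandas.DataFrame(rows) then gives the field/size/value table, with None wherever an attribute is missing. pandas 1.3 and later also provide pandas.read_xml(path, xpath="//field"), which reads each matching element's attributes into columns.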

Python XML and XPath to sort things out

Let's say I have the following XML:
<a>
<b>
<c>A</c>
</b>
<bb>
<c>B</c>
</bb>
<c>
X
</c>
</a>
I need to parse this XML into dictionary X for a/b/c and a/bb/c, but into dictionary T for a/c:
dictionary X
X[a_b_c] = A
X[a_bb_c] = B
dictionary T
T[a_c] = X
Q: I'd like to describe this mapping in an XML file using XPath. How can I do this?
I'm thinking of a mapping.xml like this:
<mapping>
<from>a/c</from><to>dictionary T</to>
....
</mapping>
I would then use 'a/c' to get X and put it in dictionary T. Is there a better way to go?
Maybe you could do this with XSLT. This stylesheet:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:key name="dict" match="item" use="@dict"/>
<xsl:key name="path" match="*[not(*)]" use="concat(name(../..),'/',
name(..),'/',
name())"/>
<xsl:variable name="map">
<item path="a/b/c" dict="X"/>
<item path="a/bb/c" dict="X"/>
<item path="/a/c" dict="T"/>
</xsl:variable>
<xsl:template match="/">
<xsl:variable name="input" select="."/>
<xsl:for-each select="document('')/*/xsl:variable[@name='map']/*[count(.|key('dict',@dict)[1])=1]">
<xsl:variable name="dict" select="@dict"/>
<xsl:variable name="path" select="../item[@dict=$dict]/@path"/>
<xsl:value-of select="concat('dictionary ',$dict,'&#xA;')"/>
<xsl:for-each select="$input">
<xsl:apply-templates select="key('path',$path)">
<xsl:with-param name="dict" select="$dict"/>
</xsl:apply-templates>
</xsl:for-each>
</xsl:for-each>
</xsl:template>
<xsl:template match="*">
<xsl:param name="dict"/>
<xsl:variable name="path" select="concat(name(../..),'_',
name(..),'_',
name())"/>
<xsl:value-of select="concat($dict,'[',
translate(substring($path,
1,
1),
'_',
''),
substring($path,2),'] = ',
normalize-space(.),'&#xA;')"/>
</xsl:template>
</xsl:stylesheet>
Output:
dictionary X
X[a_b_c] = A
X[a_bb_c] = B
dictionary T
T[a_c] = X
EDIT: Pretty things a bit.
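If you would rather keep the mapping file and do the lookup in Python, here is a minimal sketch using the standard library. It makes three assumptions:

- The mapping uses the <from>/<to> layout from the question.
- The last word of <to> is the dictionary name.
- The paths stay within ElementTree's limited XPath subset (ElementPath); full XPath expressions would need lxml's xpath().

```python
import xml.etree.ElementTree as ET

DATA = """<a>
<b>
<c>A</c>
</b>
<bb>
<c>B</c>
</bb>
<c>
X
</c>
</a>"""

# Mapping in the question's layout: each <from> path followed by its <to>.
MAPPING = """<mapping>
<from>a/b/c</from><to>dictionary X</to>
<from>a/bb/c</from><to>dictionary X</to>
<from>a/c</from><to>dictionary T</to>
</mapping>"""


def apply_mapping(data_root, mapping_root):
    """Return {dict_name: {key: text}} for every <from>/<to> pair in the mapping."""
    result = {}
    pairs = zip(mapping_root.findall("from"), mapping_root.findall("to"))
    for frm, to in pairs:
        path = frm.text.strip()            # e.g. "a/b/c"
        dict_name = to.text.split()[-1]    # "dictionary X" -> "X" (assumed convention)
        first, _, rest = path.partition("/")
        if first != data_root.tag:         # the first step must name the root element
            continue
        key = path.replace("/", "_")       # "a/b/c" -> "a_b_c"
        for elem in data_root.findall(rest):
            result.setdefault(dict_name, {})[key] = (elem.text or "").strip()
    return result


dicts = apply_mapping(ET.fromstring(DATA), ET.fromstring(MAPPING))
print(dicts)
```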
