Let's say I have an XML as follows.
<a>
<b>
<c>A</c>
</b>
<bb>
<c>B</c>
</bb>
<c>
X
</c>
</a>
I need to parse this XML into dictionary X for a/b/c and a/bb/c, and into dictionary T for a/c.
dictionary X
X[a_b_c] = A
X[a_bb_c] = B
dictionary T
T[a_c] = X
Q: I'd like to describe this mapping in an XML file, using XPath. How can I do this?
I am thinking of a mapping.xml like the following.
<mapping>
<from>a/c</from><to>dictionary T</to>
....
</mapping>
The idea is to use 'a/c' to get X and put it in dictionary T. Is there a better way to go?
Maybe you could do this with XSLT. This stylesheet:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:key name="dict" match="item" use="@dict"/>
<xsl:key name="path" match="*[not(*)]" use="concat(name(../..),'/',
name(..),'/',
name())"/>
<xsl:variable name="map">
<item path="a/b/c" dict="X"/>
<item path="a/bb/c" dict="X"/>
<item path="/a/c" dict="T"/>
</xsl:variable>
<xsl:template match="/">
<xsl:variable name="input" select="."/>
<xsl:for-each select="document('')/*/xsl:variable[@name='map']/*[count(.|key('dict',@dict)[1])=1]">
<xsl:variable name="dict" select="@dict"/>
<xsl:variable name="path" select="../item[@dict=$dict]/@path"/>
<xsl:value-of select="concat('dictionary ',$dict,'&#xA;')"/>
<xsl:for-each select="$input">
<xsl:apply-templates select="key('path',$path)">
<xsl:with-param name="dict" select="$dict"/>
</xsl:apply-templates>
</xsl:for-each>
</xsl:for-each>
</xsl:template>
<xsl:template match="*">
<xsl:param name="dict"/>
<xsl:variable name="path" select="concat(name(../..),'_',
name(..),'_',
name())"/>
<xsl:value-of select="concat($dict,'[',
translate(substring($path,
1,
1),
'_',
''),
substring($path,2),'] = ',
normalize-space(.),'&#xA;')"/>
</xsl:template>
</xsl:stylesheet>
Output:
dictionary X
X[a_b_c] = A
X[a_bb_c] = B
dictionary T
T[a_c] = X
EDIT: Prettied things up a bit.
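For comparison, the same mapping idea can be sketched in plain Python instead of XSLT. This is only an illustration under assumptions: the `MAPPING` table and the `build_dicts` helper are hypothetical names, the paths are written relative to the root element `<a>`, and the standard library's `xml.etree.ElementTree` is used rather than any particular XSLT processor.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SOURCE = "<a><b><c>A</c></b><bb><c>B</c></bb><c>\nX\n</c></a>"

# Hypothetical mapping table: path relative to the root element -> dictionary name.
MAPPING = {"b/c": "X", "bb/c": "X", "c": "T"}


def build_dicts(xml_text, mapping):
    """Collect the text of every mapped path into the dictionary it is mapped to."""
    root = ET.fromstring(xml_text)
    dicts = defaultdict(dict)
    for path, dict_name in mapping.items():
        for node in root.findall(path):
            # Build keys such as a_b_c from the root tag plus the path steps.
            key = "_".join([root.tag] + path.split("/"))
            dicts[dict_name][key] = (node.text or "").strip()
    return dict(dicts)


result = build_dicts(SOURCE, MAPPING)
print(result)
```

The mapping could just as well be loaded from a mapping.xml file; it is kept as a Python dict here to keep the sketch short.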
I have an XML feed with episodes of multiple TV shows.
I need to convert it into an RDF/XML file using an XSLT template.
Each RDF should be one TV show, with its episodes nested as properties under it along with the extracted info about the show.
The way I'm transforming it, every episode (item tag) becomes its own RDF.
How do I extract the TV show's metadata from the episodes (i.e. the items) and then nest the show's episodes under the show's RDF?
CURRENT BEHAVIOUR
rdf:Group rdf:id=New Item in Series
ebucore:groupName: Football
ebucore:Season: 1
ebucore:Episode: 1
ebucore:groupId: Better Call Saul
ebucore:groupDescription: Three young up
#----------------------------------------------------------------------
rdf:Group rdf:id=New Item in Series
ebucore:groupName: Basketball
ebucore:Season: 1
ebucore:Episode: 2
ebucore:groupId: Better Call Saul
ebucore:groupDescription: Three young up
#----------------------------------------------------------------------
EXPECTED BEHAVIOUR IS BELOW
Each show as an rdf with its episodes nested properties under it.
rdf
show title better call saul
show description: Three young up
ep
ep.id 1
ep
ep.id 2
The raw XML, the XSLT and the code are attached below.
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
xmlns:media="http://search.yahoo.com/mrss/" xmlns:ott="http://rss.ott.com/1.0" xmlns:name="http://www.name.com/rss/extensions/" xmlns:dcterms="http://purl.org/dc/terms/">
<channel>
<title>ABC</title>
<description>Feed of items</description>
<pubDate>Wed, 06 May 2022</pubDate>
<item>
<guid isPermaLink="false">some_alphaNumeric</guid>
<title>Better Call Saul</title>
<name:episodic type="episode">
<name:seriesId>Better Call Saul</name:seriesId>
<name:seasonNum>1</name:seasonNum>
<name:episodeNum>1</name:episodeNum>
</name:episodic>
<description>This is a victorious story of a Lawyer </description>
<media:keywords>urban, African American, destiny</media:keywords>
<media:subTitle lang="en" type="application/octet-stream"/>
<media:category scheme="http://www.name.com">Drama - Drama</media:category>
<media:group>
<media:content url="video_url.mp4?Jd" type="application/mp4" duration="4693" medium="video" expression="full" bitrate="7627"/>
</media:group>
<media:group>
<media:thumbnail url="some_url" height="450" width="800"/>
<media:thumbnail url="https://mp4video.ott.com/04" height="1920" width="1080"/>
</media:group>
<media:rating scheme="simple">nonadult</media:rating>
<name:cuePoints>10</name:cuePoints>
<pubDate>Tue, 28 Jul 2020 13:29:16 -0400</pubDate>
</item>
<item>
<guid isPermaLink="false">some_alphaNumeric</guid>
<title>Better Call Saul</title>
<name:episodic type="episode">
<name:seriesId>Better Call Saul</name:seriesId>
<name:seasonNum>1</name:seasonNum>
<name:episodeNum>2</name:episodeNum>
</name:episodic>
<description>This is a victorious story of a Lawyer </description>
<media:keywords>urban, African American, destiny</media:keywords>
<media:subTitle lang="en" type="application/octet-stream"/>
<media:category scheme="http://www.name.com">Drama - Drama</media:category>
<media:group>
<media:content url="video_url.mp4?acs3" type="application/mp4" duration="4693" medium="video" expression="full" bitrate="7627"/>
</media:group>
<media:group>
<media:thumbnail url="some_url" height="450" width="800"/>
<media:thumbnail url="https://mp4video.ott.com/04" height="1920" width="1080"/>
</media:group>
<media:rating scheme="simple">nonadult</media:rating>
<name:cuePoints>10</name:cuePoints>
<pubDate>Tue, 28 Aug 2020 13:29:16 -0400</pubDate>
</item>
</channel>
</rss>
and the XSLT
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:html="http://www.w3.org/1999/xhtml"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:foaf="http://xmlns.com/foaf/spec/"
xmlns:ebucore="http://www.ebu.ch/metadata/ontologies/ebucore/ebucore#"
xmlns:ebulang="http://resolution.org/res#"
xmlns:foo="http://example.com/foo#"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:name="http://www.name.com/rss/extensions/">
<xsl:template match="/">
<rdf:RDF>
Number of Items in Rss Feed: <xsl:value-of select="count(//item)" />
<xsl:apply-templates/>
</rdf:RDF>
</xsl:template>
<xsl:template match="item">
<xsl:text>#----------------------------------------------------------------------</xsl:text><xsl:text>
</xsl:text>
<xsl:text>rdf:Group rdf:id=New Item in Series</xsl:text><xsl:text>
</xsl:text>
<xsl:text>ebucore:groupName: </xsl:text>
<xsl:value-of select="normalize-space(title)"/>
<xsl:text>
</xsl:text>
<xsl:for-each select="name:episodic">
<xsl:text>ebucore:Season: </xsl:text>
<xsl:value-of select="normalize-space(name:seasonNum)"/>
<xsl:text>
</xsl:text>
<xsl:text> </xsl:text>
<xsl:text>ebucore:Episode: </xsl:text>
<xsl:value-of select="normalize-space(name:episodeNum)"/>
<xsl:text>
</xsl:text>
<xsl:text> </xsl:text>
<xsl:text>ebucore:groupId: </xsl:text>
<xsl:value-of select="name:seriesId"/>
<xsl:text>
</xsl:text>
<xsl:text> </xsl:text>
</xsl:for-each>
<xsl:text>ebucore:groupDescription: </xsl:text>
<xsl:if test="count(description)=1">
<xsl:value-of select="normalize-space(description)"/>
</xsl:if>
<xsl:if test="count(description)=0">
<xsl:text>Null</xsl:text>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
code
import lxml.etree as ET
import xml.dom.minidom
from pprint import pprint
import os
dom_rssFeed = ET.parse('raw.xml')
xslt_rssFeed = ET.parse('mapper.xslt')
transform_rssFeed = ET.XSLT(xslt_rssFeed)
finaldom = transform_rssFeed(dom_rssFeed)
new = ET.tostring(finaldom)
dom_transformed = xml.dom.minidom.parseString(new)
pretty_xml_as_string = dom_transformed.toprettyxml()
print(pretty_xml_as_string)
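One way to read the expected behaviour is as a group-by on `name:seriesId`: one record per show, with the episode numbers collected underneath. The sketch below illustrates that grouping in plain Python with the standard library, not in XSLT and not as RDF output. The `group_by_series` helper and the trimmed-down `FEED` sample are assumptions made for illustration, and the show's title and description are taken from the first item seen for each series.

```python
import xml.etree.ElementTree as ET

NS = {"name": "http://www.name.com/rss/extensions/"}

# Trimmed-down version of the feed above: two episodes of one show.
FEED = """<rss version="2.0" xmlns:name="http://www.name.com/rss/extensions/">
<channel>
<item>
<title>Better Call Saul</title>
<name:episodic type="episode">
<name:seriesId>Better Call Saul</name:seriesId>
<name:seasonNum>1</name:seasonNum>
<name:episodeNum>1</name:episodeNum>
</name:episodic>
<description>This is a victorious story of a Lawyer </description>
</item>
<item>
<title>Better Call Saul</title>
<name:episodic type="episode">
<name:seriesId>Better Call Saul</name:seriesId>
<name:seasonNum>1</name:seasonNum>
<name:episodeNum>2</name:episodeNum>
</name:episodic>
<description>This is a victorious story of a Lawyer </description>
</item>
</channel>
</rss>"""


def group_by_series(feed_text):
    """Return one entry per series id, with its episode numbers nested under it."""
    shows = {}
    for item in ET.fromstring(feed_text).iter("item"):
        series_id = item.findtext("name:episodic/name:seriesId", namespaces=NS)
        # Show-level metadata comes from the first item of each series.
        show = shows.setdefault(series_id, {
            "title": item.findtext("title"),
            "description": (item.findtext("description") or "").strip(),
            "episodes": [],
        })
        show["episodes"].append(
            item.findtext("name:episodic/name:episodeNum", namespaces=NS))
    return shows


shows = group_by_series(FEED)
print(shows)
```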
I'm using the following Python code to transform multiple .xml files.
import os
import lxml.etree as ET
import sys
inputpath =
xsltfile =
outpath =
dir = []
if sys.version_info[0] >= 3:
    unicode = str
for dirpath, dirnames, filenames in os.walk(inputpath):
    structure = os.path.join(outpath, dirpath[len(inputpath):])
    if not os.path.isdir(structure):
        os.mkdir(structure)
    for filename in filenames:
        if filename.endswith(('.xml')):
            dir = os.path.join(dirpath, filename)
            print(dir)
            dom = ET.parse(dir)
            xslt = ET.parse(xsltfile)
            transform = ET.XSLT(xslt)
            newdom = transform(dom)
            infile = unicode((ET.tostring(newdom, pretty_print=True,xml_declaration=True,standalone='yes')))
            outfile = open(structure + "\\" + filename, 'a')
            outfile.write(infile)
I do have an .xslt template which is used to sort the uuids in the same file.
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" standalone="yes"/>
<xsl:strip-space elements="*"/>
<!-- identity transform -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="uuids">
<xsl:copy>
<xsl:apply-templates select="uuid">
<xsl:sort select="."/>
</xsl:apply-templates>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
The desired output should keep the Unicode characters exactly as they are in the source, with only the uuids sorted within the same file. The uuids are sorted fine, but the Unicode characters are changed to numbers, which I don't want.
When asking a question, it is a good idea to provide a minimal reproducible example, i.e. an XML/XSLT pair.
Please try the following conceptual example.
I am using SAXON 9.7.0.15
It is very possible that the last Python line is causing the issue:
outfile.write(ET.tostring(newdom,pretty_print=True,xml_declaration=True,standalone='yes').decode())
Please try Python last lines as follows:
import os
...
newdom = transform(dom)
# tostring() defaults to ASCII output, which turns non-ASCII characters into
# numeric character references; write the tree to the file as UTF-8 instead.
newdom.write(os.path.join(structure, filename), encoding='utf-8', xml_declaration=True, pretty_print=True)
https://lxml.de/api/lxml.etree._ElementTree-class.html#write
Reference link: How to transform an XML file using XSLT in Python
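The effect of the output encoding can be reproduced in isolation. The sketch below uses the standard library's `xml.etree.ElementTree` rather than lxml, on the assumption that what matters here is the shared default: `tostring()` serializes to ASCII unless told otherwise, so non-ASCII characters come out as numeric character references.

```python
import xml.etree.ElementTree as ET

elem = ET.fromstring("<value>あいうえお@domain.com</value>")

# Default encoding is us-ascii: non-ASCII characters become &#NNNN; references.
ascii_bytes = ET.tostring(elem)

# Asking for UTF-8 bytes, or for a str, keeps the characters as they are.
utf8_bytes = ET.tostring(elem, encoding="utf-8")
text = ET.tostring(elem, encoding="unicode")

print(ascii_bytes)
print(utf8_bytes.decode("utf-8"))
print(text)
```

Both serializations describe the same document; only the byte representation differs.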
Input XML
<?xml version="1.0" encoding="UTF-8"?>
<a:ruleInputTestConfigs xmlns:a="URI">
<a:value xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:type="xsd:string">あいうえお@domain.com</a:value>
<a:nameRef>email</a:nameRef>
<a:id>1</a:id>
</a:ruleInputTestConfigs>
XSLT
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" standalone="yes"/>
<xsl:strip-space elements="*"/>
<!-- identity transform -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Output XML
<?xml version="1.0" encoding="UTF-8"?>
<a:ruleInputTestConfigs xmlns:a="URI">
<a:value xmlns:xsd="http://www.w3.org/2001/XMLSchema" xsi:type="xsd:string"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">あいうえお@domain.com</a:value>
<a:nameRef>email</a:nameRef>
<a:id>1</a:id>
</a:ruleInputTestConfigs>
I have a source.xml file with structure like:
<products>
<product>
<id>1</id>
<description>
<style>
table{
some css here
}
</style>
<descr>
<div>name of producer like ABC&DEF</div>
<table>
<th>parameters</th>
<tr><td>name of param 1 e.g POWER CONSUMPTION</td>
<td>value of param 1 with e.g < 100 W</td></tr>
</table>
</descr>
</description>
</product>
.....................
</products>
I would like to have:
<products>
<product>
<id>1</id>
<description>
<![CDATA[
<style>
table{
some css here
}
</style>
<descr>
<div>name of producer like ABC&DEF</div>
<table>
<th>parameters</th>
<tr><td>name of param 1 e.g POWER CONSUMPTION</td>
<td>value of param 1 with e.g < 100 VA</td></tr>
</table>
]]>
</descr>
</description>
</product>
.....................
</products>
I tried .xsl stylesheets based on:
How to use in XSLT?
and
Add CDATA to an xml file
and
how to add cdata to an xml file using xsl such as:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes" encoding="UTF-8" />
<xsl:template match="/products">
<products>
<xsl:for-each select="product">
<product>
<description>
<xsl:text disable-output-escaping="yes">&lt;![CDATA[</xsl:text>
<xsl:copy-of select="description/node()" />
<xsl:text disable-output-escaping="yes">]]&gt;</xsl:text>
</description>
</product>
</xsl:for-each>
</products>
</xsl:template>
</xsl:stylesheet>
and
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
<xsl:output method="xml" indent="yes" cdata-section-elements="description"/>
<xsl:template match="description">
<xsl:copy>
<xsl:apply-templates select="@*"/>
<xsl:variable name="subElementsText">
<xsl:apply-templates select="node()" mode="asText"/>
</xsl:variable>
</xsl:copy>
</xsl:template>
<xsl:template match="text()" mode="asText">
<xsl:copy/>
</xsl:template>
<xsl:template match="*" mode="asText">
<xsl:value-of select="concat('&lt;',name())"/>
<xsl:for-each select="@*">
<xsl:value-of select="concat(' ',name(),'=&quot;',.,'&quot;')"/>
</xsl:for-each>
<xsl:value-of select="'&gt;'"/>
<xsl:apply-templates select="node()" mode="asText"/>
<xsl:value-of select="concat('&lt;/',name(),'&gt;')"/>
</xsl:template>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
but running my python script
import lxml.etree as ET
doc = ET.parse('source.xml')
xslt = ET.parse('modyfi.xsl')
transform = ET.XSLT(xslt)
newdoc = transform(doc)
with open(f'output.xml', 'wb') as f:
    f.write(newdoc)
in SublimeText3 I always get the same error:
lxml.etree.XMLSyntaxError: StartTag: invalid element name, {number of line and column with first appearance of illegal character}
I am sure, that solution is straight in front of me in links above, but I can't see it.
Or maybe I can't find it because I can't ask the right question. Please help, I'm new to coding.
The input XML is not well-formed. I had to fix it first. That seems to be the reason why it is failing on your end.
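A quick way to confirm this kind of problem is to try parsing the text and to escape the literal `<` and `&` characters. The sketch below uses only the standard library; the `is_well_formed` helper is a hypothetical name introduced for illustration, and the sample strings are taken from the question's table cell and producer name.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# A bare '<' inside text content is what breaks the parser.
raw = "<td>value of param 1 with e.g < 100 W</td>"

# escape() turns '&', '<' and '>' in text into '&amp;', '&lt;' and '&gt;'.
fixed = "<td>" + escape("value of param 1 with e.g < 100 W") + "</td>"
producer = "<div>" + escape("name of producer like ABC&DEF") + "</div>"

print(is_well_formed(raw))
print(is_well_formed(fixed), fixed)
print(is_well_formed(producer), producer)
```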
XML
<products>
<product>
<id>1</id>
<description>
<style>table{
some css here
}</style>
<descr>
<div>name of producer like ABC&amp;DEF</div>
<table>
<th>parameters</th>
<tr>
<td>name of param 1 e.g POWER CONSUMPTION</td>
<td>value of param 1 with e.g &lt; 100 W</td>
</tr>
</table>
</descr>
</description>
</product>
</products>
XSLT
<?xml version="1.0"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="description">
<xsl:copy>
<xsl:text disable-output-escaping="yes">&lt;![CDATA[</xsl:text>
<xsl:copy-of select="*"/>
<xsl:text disable-output-escaping="yes">]]&gt;</xsl:text>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Output
<products>
<product>
<id>1</id>
<description><![CDATA[
<style>table{
some css here
}
</style>
<descr>
<div>name of producer like ABC&amp;DEF</div>
<table>
<th>parameters</th>
<tr>
<td>name of param 1 e.g POWER CONSUMPTION</td>
<td>value of param 1 with e.g &lt; 100 W</td>
</tr>
</table>
</descr>]]>
</description>
</product>
</products>
In my view, a clean way is to use a serialize function to turn all the elements you want into plain text, then name the parent container in the cdata-section-elements attribute of the xsl:output declaration, and finally make sure the XSLT processor is in charge of the serialization.
XSLT 3 has the built-in XPath 3.1 serialize function; in Python you could use that with Saxon-C and its Python API.
For libxslt-based XSLT 1 with lxml, you can write an extension function in Python and expose it to XSLT:
from lxml import etree as ET
def serialize(context, nodes):
    return b''.join(ET.tostring(node) for node in nodes)
ns = ET.FunctionNamespace('http://example.com/mf')
ns['serialize'] = serialize
xml = ET.fromstring('<root><div><p>foo</p><p>bar</p></div></root>')
xsl = ET.fromstring('''<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:mf="http://example.com/mf" version="1.0">
<xsl:output method="xml" cdata-section-elements="div" encoding="UTF-8"/>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="div">
<xsl:copy>
<xsl:value-of select="mf:serialize(node())"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>''')
transform = ET.XSLT(xsl)
result = transform(xml)
result.write_output("transformed.xml")
Output then is
<?xml version="1.0" encoding="UTF-8"?>
<root><div><![CDATA[<p>foo</p><p>bar</p>]]></div></root>
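It may help to see what a consumer of this output gets back. The sketch below parses the output shown above with the standard library, not lxml, and shows that the CDATA section is read as ordinary text, so `div` has no child elements anymore. Re-parsing the markup afterwards is an extra step, done here with a hypothetical `wrapper` element.

```python
import xml.etree.ElementTree as ET

OUTPUT = '<root><div><![CDATA[<p>foo</p><p>bar</p>]]></div></root>'

div = ET.fromstring(OUTPUT).find("div")

# The CDATA section arrives as plain text; the <p> elements are not parsed.
inner_markup = div.text
child_count = len(div)

# To get elements back, the text has to be parsed again inside a wrapper.
fragment = ET.fromstring("<wrapper>" + inner_markup + "</wrapper>")
paragraphs = [p.text for p in fragment]

print(repr(inner_markup), child_count, paragraphs)
```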
I am trying to transform one XML document into another using XSLT. Below is the XSLT I am using.
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns="Apartments.AP.Mits20PropertyFinal"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:var="http://schemas.microsoft.com/BizTalk/2003/var"
exclude-result-prefixes="msxsl var userCSharp"
xmlns:userCSharp="http://schemas.microsoft.com/BizTalk/2003/userCSharp"
>
<xsl:output method="xml" encoding="UTF-8" indent="yes"/>
<xsl:key name="myKey" match="ILS_Unit/Units/Unit" use="./Identification[@IDType='FloorplanID']/@IDValue"/>
<xsl:template match="/">
<PhysicalProperty>
<!-- <xsl:apply-templates select="PhysicalProperty/Management"/>
--> <xsl:apply-templates select="PhysicalProperty/Property"/>
</PhysicalProperty>
</xsl:template>
<xsl:template match="Property">
<Property>
<PropertyInfo>
<xsl:element name="MITSID">
<xsl:value-of select="./PropertyID/Identification[@IDRank='secondary']/@IDValue"/>
</xsl:element>
<xsl:element name="MarketingName">
<xsl:value-of select="./PropertyID/MarketingName"/>
</xsl:element>
<xsl:element name="TotalUnits">
</xsl:element>
<xsl:element name="AddressLine1">
<xsl:value-of select="./PropertyID/Address/AddressLine1"/>
</xsl:element>
<xsl:element name="City">
<xsl:value-of select="./PropertyID/Address/City"/>
</xsl:element>
<xsl:element name="State">
<xsl:value-of select="./PropertyID/Address/State"/>
</xsl:element>
<xsl:element name="Zip">
<xsl:value-of select="./PropertyID/Address/PostalCode"/>
</xsl:element>
</PropertyInfo>
<xsl:for-each select="./Floorplan">
<Floorplan>
<FloorplanID><xsl:value-of select="./@IDValue"/></FloorplanID>
<FloorplanName><xsl:value-of select="./Name"/></FloorplanName>
<!--This Unit count is Totals no of Units per floorplan-->
<UnitCount><xsl:value-of select="./UnitCount"/></UnitCount>
<Units>
<xsl:variable name="floorplanid" select="./@IDValue"/>
<xsl:for-each select="key('myKey',$floorplanid)">
<Unit>
<UnitNum>
<xsl:value-of select="./MarketingName"/>
</UnitNum>
<xsl:if test="./UnitLeasedStatus='Not_Available'">
<UnitLeasedStatus>Occupied</UnitLeasedStatus>
</xsl:if>
<xsl:if test="./UnitLeasedStatus='On_Notice'">
<UnitLeasedStatus>On Notice</UnitLeasedStatus>
</xsl:if>
<xsl:if test="./UnitLeasedStatus='Available'">
<UnitLeasedStatus>Available</UnitLeasedStatus>
</xsl:if>
</Unit>
</xsl:for-each>
</Units>
</Floorplan>
</xsl:for-each>
</Property>
</xsl:template>
</xsl:stylesheet>
And below is the sample xml file
<?xml version="1.0" encoding="utf-8"?>
<PhysicalProperty>
<Property>
<PropertyID>
<Identification IDValue="183dbed4-0101-4a85-954c-a4e8042d2819" IDRank="primary"/>
<Identification IDValue="6458174" IDRank="secondary"/>
<MarketingName>Westmount at London Park</MarketingName>
<Website>http://www.westmountatlondonpark.com</Website>
<Address AddressType="property">
<AddressLine1>14545 Bammel North Houston Road</AddressLine1>
<AddressLine2/>
<City>Houston</City>
<State>TX</State>
<PostalCode>77014</PostalCode>
</Address>
</PropertyID>
<ILS_Identification ILS_IdentificationType="Apartment" RentalType="Unspecified"/>
<Floorplan IDValue="171ad0f2-da57-45c0-9cf5-c61cfa26dd63" IDType="FloorplanID" IDRank="primary">
<FloorplanType>Internal</FloorplanType>
<Name>A1 </Name>
<Comment/>
<UnitCount>22</UnitCount>
<UnitsAvailable>3</UnitsAvailable>
<DisplayedUnitsAvailable>5</DisplayedUnitsAvailable>
<Room RoomType="Bedroom">
<Count>1</Count>
</Room>
<Room RoomType="Bathroom">
<Count>1</Count>
</Room>
<SquareFeet Min="602" Max="602"/>
<EffectiveRent Min="810" Max="830"/>
<Deposit DepositType="Security Deposit">
<Amount>
<ValueRange Min="150" Max="150"/>
</Amount>
</Deposit>
<File FileID="e114a438-ee82-497d-8e04-6a5a8b2381ef" active="true">
<FileType>Floorplan</FileType>
<Caption/>
<Src>https://apollostore.blob.core.windows.net/londonpark/uploads/images/floorplans/a1.7a75909c-3362-409c-82c7-aa6c959f9c99.jpg</Src>
<Rank>999</Rank>
</File>
</Floorplan>
<ILS_Unit>
<ILS_Unit IDValue="be827564-6460-4af1-9644-3f6ffa225557" IDType="UnitID" IDRank="primary">
<Units>
<Unit>
<Identification IDValue="be827564-6460-4af1-9644-3f6ffa225557" IDType="UnitID" IDRank="primary"/>
<Identification IDValue="171ad0f2-da57-45c0-9cf5-c61cfa26dd63" IDType="FloorplanID" IDRank="primary"/>
<MarketingName>1901</MarketingName>
<UnitBedrooms>1</UnitBedrooms>
<UnitBathrooms>1</UnitBathrooms>
<UnitRent>765</UnitRent>
<UnitLeasedStatus>Not_Available</UnitLeasedStatus>
<FloorplanName>A1 </FloorplanName>
</Unit>
</Units>
<Comment/>
<EffectiveRent Min="765" Max="765"/>
<Deposit DepositType="Security Deposit">
<Amount>
<ValueRange Min="150" Max="150"/>
</Amount>
</Deposit>
</ILS_Unit>
</ILS_Unit>
</Property>
</PhysicalProperty>
The transformation works fine in Altova but does not work in Python.
Below is the python script I am using.
from lxml import etree
import os
import glob
xslt = etree.parse("fl.xslt")
dom = etree.parse("f.xml",)
transform = etree.XSLT(xslt)
try:
    newdom = transform(dom)
    root = etree.parse(newdom)
    properties = root.findall("./Property")
    print(properties)
except Exception as e:
    print(e)
    for error in transform.error_log:
        print(error.message, error.line)
print(etree.tostring(newdom, pretty_print=True))
I am trying to print the Property nodes. I also tried to write the result to an .xml file, but it returns nothing. Could someone tell me what the issue could be?
Surprisingly it works fine in Altova.
Below are the errors that are thrown.
line 53, in <module>
root = etree.parse(newdom)
File "src\lxml\etree.pyx", line 3519, in lxml.etree.parse
File "src\lxml\parser.pxi", line 1862, in lxml.etree._parseDocument
TypeError: cannot parse from 'lxml.etree._XSLTResultTree'
The specific error has nothing to do with XSLT; it comes from your attempt to parse the result. Like Python's built-in xml.etree, the parse function of lxml.etree expects a file name or a file-like object. The result of an XSLT transformation in lxml, however, is already an ElementTree object, so you can run calls such as findall, iterfind, etc. on it directly.
Therefore, simply remove the parse line. Additionally, because the result has a default namespace, consider assigning a temporary prefix to access its nodes. Also consider using xpath in lxml.
newdom = transform(dom)
properties = newdom.findall("./doc:Property", namespaces={'doc': 'Apartments.AP.Mits20PropertyFinal'})
print(properties)
# [<Element {Apartments.AP.Mits20PropertyFinal}Property at 0x136b2f94b08>]
properties = newdom.xpath("./doc:Property", namespaces={'doc': 'Apartments.AP.Mits20PropertyFinal'})
print(properties)
# [<Element {Apartments.AP.Mits20PropertyFinal}Property at 0x136b2f94b08>]
Output
print(etree.tostring(newdom, pretty_print=True).decode("utf-8"))
# <PhysicalProperty xmlns="Apartments.AP.Mits20PropertyFinal">
# <Property>
# <PropertyInfo>
# <MITSID>6458174</MITSID>
# <MarketingName>Westmount at London Park</MarketingName>
# <TotalUnits/>
# <AddressLine1>14545 Bammel North Houston Road</AddressLine1>
# <City>Houston</City>
# <State>TX</State>
# <Zip>77014</Zip>
# </PropertyInfo>
# <Floorplan>
# <FloorplanID>171ad0f2-da57-45c0-9cf5-c61cfa26dd63</FloorplanID>
# <FloorplanName>A1 </FloorplanName>
# <UnitCount>22</UnitCount>
# <Units>
# <Unit>
# <UnitNum>1901</UnitNum>
# <UnitLeasedStatus>Occupied</UnitLeasedStatus>
# </Unit>
# </Units>
# </Floorplan>
# </Property>
# </PhysicalProperty>
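The namespace point can be checked on its own. The sketch below uses the standard library's `xml.etree.ElementTree` on a shortened copy of the output above. The `doc` prefix is an arbitrary temporary name, as in the answer, and lxml's `findall` accepts a `namespaces` mapping in the same way.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the transformation result, default namespace included.
RESULT = (
    '<PhysicalProperty xmlns="Apartments.AP.Mits20PropertyFinal">'
    "<Property><PropertyInfo><MITSID>6458174</MITSID></PropertyInfo></Property>"
    "</PhysicalProperty>"
)

root = ET.fromstring(RESULT)
ns = {"doc": "Apartments.AP.Mits20PropertyFinal"}

# Without a prefix the path looks for elements in no namespace and finds nothing.
unprefixed = root.findall("./Property")

# With a temporary prefix bound to the default namespace the nodes are found.
prefixed = root.findall("./doc:Property", ns)
mitsid = root.findtext("./doc:Property/doc:PropertyInfo/doc:MITSID", namespaces=ns)

print(unprefixed, len(prefixed), mitsid)
```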
Considering this XML example:
<data>
<items>
<item name="item1">item1pre <bold>ok!</bold> item1post</item>
<item name="item2">item2</item>
</items>
</data>
I am looking for a way to get the following result:
"item1pre **ok! ** item1post"
I thought of getting all the content of item1 as a string "item1pre <'bold> ok!<'/bold> item1post" and then replace "<'bold>" and "<'/bold>" by "**", but I don't know how to get that.
xml="""
<data>
<items>
<item name="item1">item1pre<bold>ok!</bold>item1post</item>
<item name="item2">item2</item>
</items>
</data>
"""
import xml.etree.ElementTree as ET
# python included module
def cleaned_strings_from_xml(xml_str, tag='item'):
    """
    finds all items of type tag from xml-string
    :param xml_str: valid xml structure as string
    :param tag: tag to search inside the xml
    :returns: list of all texts of 'tag'-items
    """
    strings = []
    # parse the argument, not the module-level variable
    root = ET.fromstring(xml_str)
    for item in root.iter(tag):
        # serialize the item, swap the bold tags for markers, then re-parse
        item_str = ET.tostring(item).decode('utf-8')
        item_str = item_str.replace('<bold>', ' **').replace('</bold>', ' **')
        strings.append(ET.fromstring(item_str).text)
    return strings
print(cleaned_strings_from_xml(xml))
You could offload all the XML processing to libxml by using an XSLT transformation. libxml is written in C, so it should be quicker:
from lxml import etree
transform = etree.XSLT(etree.XML('''
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" />
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:apply-templates />
</xsl:template>
<xsl:template match="data/items/item[@name = 'item1']">
<xsl:text>"</xsl:text>
<xsl:value-of select="text()"/>
<xsl:text>**</xsl:text>
<xsl:value-of select="bold/."/>
<xsl:text>**</xsl:text>
<xsl:value-of select="bold/following-sibling::text()[1]"/>
<xsl:text>"</xsl:text>
</xsl:template>
<xsl:template match="data/items/item[@name != 'item1']" />
</xsl:stylesheet>
'''))
with open("source.xml") as f:
    print(transform(etree.parse(f)))
In a nutshell: Match the item element with name attribute 'item1' then use relative xpath expressions to extract the strings.
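The same text/child/tail walk can also be sketched without XSLT. This is an alternative illustration using the standard library, not the stylesheet above. The `to_markdown` helper is a hypothetical name; it assumes `bold` is the only inline tag that needs markers and it handles any number of `bold` children.

```python
import xml.etree.ElementTree as ET

XML = """
<data>
<items>
<item name="item1">item1pre <bold>ok!</bold> item1post</item>
<item name="item2">item2</item>
</items>
</data>
"""


def to_markdown(elem):
    """Join an element's text, its children and their tails, marking <bold> with **."""
    parts = [elem.text or ""]
    for child in elem:
        inner = to_markdown(child)
        parts.append("**" + inner + "**" if child.tag == "bold" else inner)
        # The tail is the text that follows the child's closing tag.
        parts.append(child.tail or "")
    return "".join(parts)


root = ET.fromstring(XML)
item1 = root.find("./items/item[@name='item1']")
result = to_markdown(item1)
print(result)
```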