xslt template to transform xml file to xml - python

I'm using the following Python code to parse multiple .xml files:
import os
import lxml.etree as ET
import sys
inputpath =
xsltfile =
outpath =
dir = []
if sys.version_info[0] >= 3:
    unicode = str
for dirpath, dirnames, filenames in os.walk(inputpath):
    structure = os.path.join(outpath, dirpath[len(inputpath):])
    if not os.path.isdir(structure):
        os.mkdir(structure)
    for filename in filenames:
        if filename.endswith(('.xml')):
            dir = os.path.join(dirpath, filename)
            print(dir)
            dom = ET.parse(dir)
            xslt = ET.parse(xsltfile)
            transform = ET.XSLT(xslt)
            newdom = transform(dom)
            infile = unicode((ET.tostring(newdom, pretty_print=True,xml_declaration=True,standalone='yes')))
            outfile = open(structure + "\\" + filename, 'a')
            outfile.write(infile)
I do have an .xslt template which is used to sort the uuids in the same file.
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" standalone="yes"/>
<xsl:strip-space elements="*"/>
<!-- identity transform -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="uuids">
<xsl:copy>
<xsl:apply-templates select="uuid">
<xsl:sort select="."/>
</xsl:apply-templates>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
The desired output should keep the same Unicode characters as the source, with only the uuids sorted within the same file. The uuids are sorted correctly, but the Unicode characters are changed to numbers, which I don't want.

While asking a question it is a good idea to provide a minimal reproducible example, i.e. XML/XSLT pair.
Please try the following conceptual example.
I am using SAXON 9.7.0.15
It is very possible that the last Python line is causing the issue:
outfile.write(ET.tostring(newdom,pretty_print=True,xml_declaration=True,standalone='yes').decode())
Please try replacing the last Python lines with a call to the result tree's own write() method, which lets you set the encoding explicitly:
...
newdom = transform(dom)
# write() serializes straight to the file as UTF-8, so non-ASCII characters are kept as they are
newdom.write(os.path.join(structure, filename), encoding='utf-8', xml_declaration=True, pretty_print=True)
https://lxml.de/api/lxml.etree._ElementTree-class.html#write
Reference link: How to transform an XML file using XSLT in Python
Input XML
<?xml version="1.0" encoding="UTF-8"?>
<a:ruleInputTestConfigs xmlns:a="URI">
<a:value xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:type="xsd:string">あいうえお@domain.com</a:value>
<a:nameRef>email</a:nameRef>
<a:id>1</a:id>
</a:ruleInputTestConfigs>
XSLT
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" standalone="yes"/>
<xsl:strip-space elements="*"/>
<!-- identity transform -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Output XML
<?xml version="1.0" encoding="UTF-8"?>
<a:ruleInputTestConfigs xmlns:a="URI">
<a:value xmlns:xsd="http://www.w3.org/2001/XMLSchema" xsi:type="xsd:string"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">あいうえお@domain.com</a:value>
<a:nameRef>email</a:nameRef>
<a:id>1</a:id>
</a:ruleInputTestConfigs>
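To see the effect in isolation, here is a minimal sketch (the one-element sample document is made up for illustration). As far as I know, lxml's ET.tostring() serializes as ASCII unless an encoding is given, so non-ASCII characters come out as numeric character references, while passing encoding='UTF-8' or encoding='unicode' keeps them as they are.

```python
import lxml.etree as ET

# Hypothetical sample element containing non-ASCII text.
root = ET.fromstring(u'<value>\u3042\u3044\u3046@domain.com</value>')

# Default serialization: ASCII bytes, non-ASCII characters become &#...; references.
ascii_bytes = ET.tostring(root)

# Explicit UTF-8: bytes that still contain the original characters.
utf8_bytes = ET.tostring(root, encoding='UTF-8')

# encoding='unicode' returns a Python str instead of bytes.
text = ET.tostring(root, encoding='unicode')

print(ascii_bytes)
print(utf8_bytes.decode('utf-8'))
print(text)
```

If the file is opened in text mode, writing the str from encoding='unicode' together with open(..., encoding='utf-8') should give the same result as write() with encoding='utf-8'.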

Related

How to convert nodes in XML to CDATA with XSLT?

I have a source.xml file with structure like:
<products>
<product>
<id>1</id>
<description>
<style>
table{
some css here
}
</style>
<descr>
<div>name of producer like ABC&DEF</div>
<table>
<th>parameters</th>
<tr><td>name of param 1 e.g POWER CONSUMPTION</td>
<td>value of param 1 with e.g < 100 W</td></tr>
</table>
</descr>
</description>
</product>
.....................
</products>
I would like to have:
<products>
<product>
<id>1</id>
<description>
<![CDATA[
<style>
table{
some css here
}
</style>
<descr>
<div>name of producer like ABC&DEF</div>
<table>
<th>parameters</th>
<tr><td>name of param 1 e.g POWER CONSUMPTION</td>
<td>value of param 1 with e.g < 100 VA</td></tr>
</table>
]]>
</descr>
</description>
</product>
.....................
</products>
I tried .xsl stylesheets based on:
How to use in XSLT?
and
Add CDATA to an xml file
and
how to add cdata to an xml file using xsl such as:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes" encoding="UTF-8" />
<xsl:template match="/products">
<products>
<xsl:for-each select="product">
<product>
<description>
<xsl:text disable-output-escaping="yes">&lt;![CDATA[</xsl:text>
<xsl:copy-of select="description/node()" />
<xsl:text disable-output-escaping="yes">]]&gt;</xsl:text>
</xsl:for-each>
</description>
</product>
</xsl:for-each>
</products>
</xsl:template>
</xsl:stylesheet>
and
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
<xsl:output method="xml" indent="yes" cdata-section-elements="description"/>
<xsl:template match="description">
<xsl:copy>
<xsl:apply-templates select="@*"/>
<xsl:variable name="subElementsText">
<xsl:apply-templates select="node()" mode="asText"/>
</xsl:variable>
</xsl:copy>
</xsl:template>
<xsl:template match="text()" mode="asText">
<xsl:copy/>
</xsl:template>
<xsl:template match="*" mode="asText">
<xsl:value-of select="concat('&lt;',name())"/>
<xsl:for-each select="@*">
<xsl:value-of select="concat(' ',name(),'=&quot;',.,'&quot;')"/>
</xsl:for-each>
<xsl:value-of select="'&gt;'"/>
<xsl:apply-templates select="node()" mode="asText"/>
<xsl:value-of select="concat('&lt;/',name(),'&gt;')"/>
</xsl:template>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
but running my python script
import lxml.etree as ET
doc = ET.parse('source.xml')
xslt = ET.parse('modyfi.xsl')
transform = ET.XSLT(xslt)
newdoc = transform(doc)
with open(f'output.xml', 'wb') as f:
    f.write(newdoc)
in Sublime Text 3, I always get the same error:
lxml.etree.XMLSyntaxError: StartTag: invalid element name, {number of line and column with first appearance of illegal character}
I am sure that the solution is right in front of me in the links above, but I can't see it.
Or maybe I can't find it because I can't ask the right question. Please help, I'm new to coding.
The input XML is not well-formed. I had to fix it first. That seems to be the reason why it is failing on your end.
XML
<products>
<product>
<id>1</id>
<description>
<style>table{
some css here
}</style>
<descr>
<div>name of producer like ABC&amp;DEF</div>
<table>
<th>parameters</th>
<tr>
<td>name of param 1 e.g POWER CONSUMPTION</td>
<td>value of param 1 with e.g &lt; 100 W</td>
</tr>
</table>
</descr>
</description>
</product>
</products>
XSLT
<?xml version="1.0"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="description">
<xsl:copy>
<xsl:text disable-output-escaping="yes">&lt;![CDATA[</xsl:text>
<xsl:copy-of select="*"/>
<xsl:text disable-output-escaping="yes">]]&gt;</xsl:text>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Output
<products>
<product>
<id>1</id>
<description><![CDATA[
<style>table{
some css here
}
</style>
<descr>
<div>name of producer like ABC&DEF</div>
<table>
<th>parameters</th>
<tr>
<td>name of param 1 e.g POWER CONSUMPTION</td>
<td>value of param 1 with e.g < 100 W</td>
</tr>
</table>
</descr>]]>
</description>
</product>
</products>
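The well-formedness problem can be reproduced on a single element. This is only a sketch with a made-up one-line sample: a raw < in text makes lxml raise XMLSyntaxError (the same family of error as in the question), while the escaped form &lt; parses and reads back as a plain < character.

```python
from lxml import etree

# Hypothetical one-line samples: the first has a raw "<" in text, the second escapes it.
bad = '<td>value of param 1 with e.g < 100 W</td>'
good = '<td>value of param 1 with e.g &lt; 100 W</td>'

try:
    etree.fromstring(bad)
    bad_error = None
except etree.XMLSyntaxError as exc:
    # The parser treats "<" as the start of a tag and rejects what follows.
    bad_error = str(exc)

# The escaped version parses; the entity is decoded in the text content.
parsed_text = etree.fromstring(good).text

print(bad_error)
print(parsed_text)
```

The same applies to a bare & such as in ABC&DEF, which has to be written as &amp; in the source file.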
In my view, a clean approach has three parts: use a serialize function to turn the elements you want into plain text, name their parent container in cdata-section-elements on the xsl:output declaration, and make sure the XSLT processor itself performs the serialization.
XSLT 3 has the built-in XPath 3.1 serialize function; in Python you could use it through Saxon-C and its Python API.
For libxslt-based XSLT 1 with lxml, you can write an extension function in Python and expose it to XSLT:
from lxml import etree as ET
def serialize(context, nodes):
    return b''.join(ET.tostring(node) for node in nodes)
ns = ET.FunctionNamespace('http://example.com/mf')
ns['serialize'] = serialize
xml = ET.fromstring('<root><div><p>foo</p><p>bar</p></div></root>')
xsl = ET.fromstring('''<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:mf="http://example.com/mf" version="1.0">
<xsl:output method="xml" cdata-section-elements="div" encoding="UTF-8"/>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="div">
<xsl:copy>
<xsl:value-of select="mf:serialize(node())"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>''')
transform = ET.XSLT(xsl)
result = transform(xml)
result.write_output("transformed.xml")
Output then is
<?xml version="1.0" encoding="UTF-8"?>
<root><div><![CDATA[<p>foo</p><p>bar</p>]]></div></root>

xslt transformation works in Altova but not in python

I am trying to transform the one xml to another xml using xslt. Below is the xslt I am using
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns="Apartments.AP.Mits20PropertyFinal"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:var="http://schemas.microsoft.com/BizTalk/2003/var"
exclude-result-prefixes="msxsl var userCSharp"
xmlns:userCSharp="http://schemas.microsoft.com/BizTalk/2003/userCSharp"
>
<xsl:output method="xml" encoding="UTF-8" indent="yes"/>
<xsl:key name="myKey" match="ILS_Unit/Units/Unit" use="./Identification[@IDType='FloorplanID']/@IDValue"/>
<xsl:template match="/">
<PhysicalProperty>
<!-- <xsl:apply-templates select="PhysicalProperty/Management"/>
--> <xsl:apply-templates select="PhysicalProperty/Property"/>
</PhysicalProperty>
</xsl:template>
<xsl:template match="Property">
<Property>
<PropertyInfo>
<xsl:element name="MITSID">
<xsl:value-of select="./PropertyID/Identification[@IDRank='secondary']/@IDValue"/>
</xsl:element>
<xsl:element name="MarketingName">
<xsl:value-of select="./PropertyID/MarketingName"/>
</xsl:element>
<xsl:element name="TotalUnits">
</xsl:element>
<xsl:element name="AddressLine1">
<xsl:value-of select="./PropertyID/Address/AddressLine1"/>
</xsl:element>
<xsl:element name="City">
<xsl:value-of select="./PropertyID/Address/City"/>
</xsl:element>
<xsl:element name="State">
<xsl:value-of select="./PropertyID/Address/State"/>
</xsl:element>
<xsl:element name="Zip">
<xsl:value-of select="./PropertyID/Address/PostalCode"/>
</xsl:element>
</PropertyInfo>
<xsl:for-each select="./Floorplan">
<Floorplan>
<FloorplanID><xsl:value-of select="./@IDValue"/></FloorplanID>
<FloorplanName><xsl:value-of select="./Name"/></FloorplanName>
<!--This Unit count is Totals no of Units per floorplan-->
<UnitCount><xsl:value-of select="./UnitCount"/></UnitCount>
<Units>
<xsl:variable name="floorplanid" select="./@IDValue"/>
<xsl:for-each select="key('myKey',$floorplanid)">
<Unit>
<UnitNum>
<xsl:value-of select="./MarketingName"/>
</UnitNum>
<xsl:if test="./UnitLeasedStatus='Not_Available'">
<UnitLeasedStatus>Occupied</UnitLeasedStatus>
</xsl:if>
<xsl:if test="./UnitLeasedStatus='On_Notice'">
<UnitLeasedStatus>On Notice</UnitLeasedStatus>
</xsl:if>
<xsl:if test="./UnitLeasedStatus='Available'">
<UnitLeasedStatus>Available</UnitLeasedStatus>
</xsl:if>
</Unit>
</xsl:for-each>
</Units>
</Floorplan>
</xsl:for-each>
</Property>
</xsl:template>
</xsl:stylesheet>
And below is the sample xml file
<?xml version="1.0" encoding="utf-8"?>
<PhysicalProperty>
<Property>
<PropertyID>
<Identification IDValue="183dbed4-0101-4a85-954c-a4e8042d2819" IDRank="primary"/>
<Identification IDValue="6458174" IDRank="secondary"/>
<MarketingName>Westmount at London Park</MarketingName>
<Website>http://www.westmountatlondonpark.com</Website>
<Address AddressType="property">
<AddressLine1>14545 Bammel North Houston Road</AddressLine1>
<AddressLine2/>
<City>Houston</City>
<State>TX</State>
<PostalCode>77014</PostalCode>
</Address>
</PropertyID>
<ILS_Identification ILS_IdentificationType="Apartment" RentalType="Unspecified"/>
<Floorplan IDValue="171ad0f2-da57-45c0-9cf5-c61cfa26dd63" IDType="FloorplanID" IDRank="primary">
<FloorplanType>Internal</FloorplanType>
<Name>A1  </Name>
<Comment/>
<UnitCount>22</UnitCount>
<UnitsAvailable>3</UnitsAvailable>
<DisplayedUnitsAvailable>5</DisplayedUnitsAvailable>
<Room RoomType="Bedroom">
<Count>1</Count>
</Room>
<Room RoomType="Bathroom">
<Count>1</Count>
</Room>
<SquareFeet Min="602" Max="602"/>
<EffectiveRent Min="810" Max="830"/>
<Deposit DepositType="Security Deposit">
<Amount>
<ValueRange Min="150" Max="150"/>
</Amount>
</Deposit>
<File FileID="e114a438-ee82-497d-8e04-6a5a8b2381ef" active="true">
<FileType>Floorplan</FileType>
<Caption/>
<Src>https://apollostore.blob.core.windows.net/londonpark/uploads/images/floorplans/a1.7a75909c-3362-409c-82c7-aa6c959f9c99.jpg</Src>
<Rank>999</Rank>
</File>
</Floorplan>
<ILS_Unit>
<ILS_Unit IDValue="be827564-6460-4af1-9644-3f6ffa225557" IDType="UnitID" IDRank="primary">
<Units>
<Unit>
<Identification IDValue="be827564-6460-4af1-9644-3f6ffa225557" IDType="UnitID" IDRank="primary"/>
<Identification IDValue="171ad0f2-da57-45c0-9cf5-c61cfa26dd63" IDType="FloorplanID" IDRank="primary"/>
<MarketingName>1901</MarketingName>
<UnitBedrooms>1</UnitBedrooms>
<UnitBathrooms>1</UnitBathrooms>
<UnitRent>765</UnitRent>
<UnitLeasedStatus>Not_Available</UnitLeasedStatus>
<FloorplanName>A1  </FloorplanName>
</Unit>
</Units>
<Comment/>
<EffectiveRent Min="765" Max="765"/>
<Deposit DepositType="Security Deposit">
<Amount>
<ValueRange Min="150" Max="150"/>
</Amount>
</Deposit>
</ILS_Unit>
</ILS_Unit>
</Property>
</PhysicalProperty>
The transformation works fine using altova tool but does not work in python.
Below is the python script I am using.
from lxml import etree
import os
import glob
xslt = etree.parse("fl.xslt")
dom = etree.parse("f.xml",)
transform = etree.XSLT(xslt)
try:
    newdom = transform(dom)
    root = etree.parse(newdom)
    properties = root.findall("./Property")
    print(properties)
except Exception as e:
    print (e)
    for error in transform.error_log:
        print(error.message, error.line)
print(etree.tostring(newdom, pretty_print=True))
I am trying to print the Property nodes, and I also tried to write the result to an .xml file, but it returns nothing. Could someone tell me what the issue could be?
Surprisingly, it works fine in Altova.
Below are the errors that are thrown.
line 53, in <module>
root = etree.parse(newdom)
File "src\lxml\etree.pyx", line 3519, in lxml.etree.parse
File "src\lxml\parser.pxi", line 1862, in lxml.etree._parseDocument
TypeError: cannot parse from 'lxml.etree._XSLTResultTree'
The specific error has nothing to do with XSLT; it comes from your attempt to parse the result. Like Python's built-in xml.etree, the parse function of lxml.etree expects a file name or file-like object. The result of an XSLT transformation in lxml, however, is already an ElementTree object, so you can call findall, iterfind, etc. on it directly.
Therefore, simply remove the parse line. Additionally, because the stylesheet puts the result elements in a default namespace, assign a temporary prefix to access the nodes. Also consider xpath in lxml.
newdom = transform(dom)
properties = newdom.findall("./doc:Property", namespaces={'doc': 'Apartments.AP.Mits20PropertyFinal'})
print(properties)
# [<Element {Apartments.AP.Mits20PropertyFinal}Property at 0x136b2f94b08>]
properties = newdom.xpath("./doc:Property", namespaces={'doc': 'Apartments.AP.Mits20PropertyFinal'})
print(properties)
# [<Element {Apartments.AP.Mits20PropertyFinal}Property at 0x136b2f94b08>]
Output
print(etree.tostring(newdom, pretty_print=True).decode("utf-8"))
# <PhysicalProperty xmlns="Apartments.AP.Mits20PropertyFinal">
# <Property>
# <PropertyInfo>
# <MITSID>6458174</MITSID>
# <MarketingName>Westmount at London Park</MarketingName>
# <TotalUnits/>
# <AddressLine1>14545 Bammel North Houston Road</AddressLine1>
# <City>Houston</City>
# <State>TX</State>
# <Zip>77014</Zip>
# </PropertyInfo>
# <Floorplan>
# <FloorplanID>171ad0f2-da57-45c0-9cf5-c61cfa26dd63</FloorplanID>
# <FloorplanName>A1 </FloorplanName>
# <UnitCount>22</UnitCount>
# <Units>
# <Unit>
# <UnitNum>1901</UnitNum>
# <UnitLeasedStatus>Occupied</UnitLeasedStatus>
# </Unit>
# </Units>
# </Floorplan>
# </Property>
# </PhysicalProperty>
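The namespace point can be checked on a small scale. This is a minimal sketch with a made-up stylesheet and the hypothetical namespace URI urn:example:out, not the stylesheet from the question: the unprefixed search finds nothing, while the prefixed search with a namespaces mapping finds the element.

```python
from lxml import etree

# Hypothetical stylesheet whose literal result elements are in a default namespace.
xslt = etree.XML('''<xsl:stylesheet version="1.0" xmlns="urn:example:out"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <Root><Item><xsl:value-of select="/in/@v"/></Item></Root>
  </xsl:template>
</xsl:stylesheet>''')

transform = etree.XSLT(xslt)
result = transform(etree.XML('<in v="1"/>'))  # result is already a tree; no parse() needed

ns = {'d': 'urn:example:out'}
no_prefix = result.findall('./Item')                      # looks for Item in no namespace
with_prefix = result.findall('./d:Item', namespaces=ns)   # looks in the output namespace

print(no_prefix)
print([item.text for item in with_prefix])
```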

How do I send an XML request by substituting values in the XML over SOAP?

I have an XML file in my template directory and this XML file has a few placeholders. I need to fill these placeholders with values from the database and calculations of other values (based on user input) and then send it over to a URL that will process this XML and return an XML response as well.
This is how my XML file looks:
my_file.xml
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:web="http://some_url.api.com" xmlns:web1="http://another_url.api.com">
<soapenv:Header>
<web:AuthenticationToken>
<web:licenseKey> string </web:licenseKey>
<web:password> string </web:password>
<web:username> string </web:username>
</web:AuthenticationToken>
</soapenv:Header>
<soapenv:Body>
<web:Shipping>
<web:ShipRequest>
<web1:dtnCtry> string </web1:dtnCtry>
<web1:dtnZC> string </web1:dtnZC>
<web1:details>
<!--Zero or more repetitions:-->
<web1:ShipRequestDetail>
<web1:class> string </web1:class>
<web1:wt> string </web1:wt>
</web1:ShipRequestDetail>
</web1:details>
<web1:orgCtry> string </web1:orgCtry>
<web1:orgZC> string </web1:orgZC>
<web1:shipDateCCYYMMDD> string </web1:shipDateCCYYMMDD>
<web1:shipID> string </web1:shipID>
<web1:tarName> string </web1:tarName>
</web:ShipRequest>
</web:Shipping>
</soapenv:Body>
</soapenv:Envelope>
Each of those string values will have to be substituted with an actual database value. How do I go about this process?
I have checked multiple forums, sites and multiple questions inside of SO and each one recommends a different approach. As a newbie, it is difficult for me to select one and it is extremely overwhelming.
Could someone here please explain a process from scratch? Thanks!
Consider passing parameters into an XSLT script using Python's lxml:
XSLT (save as .xsl file)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:web="http://some_url.api.com"
xmlns:web1="http://another_url.api.com">
<xsl:output method="xml" omit-xml-declaration="no" indent="yes" encoding="utf-8"/>
<xsl:strip-space elements="*"/>
<!-- XSLT PARMS (TO RECEIVE PYTHON VALUES) -->
<xsl:param name="licenseKey_param"/>
<xsl:param name="password_param"/>
<xsl:param name="username_param"/>
<xsl:param name="dtnCtry_param"/>
<xsl:param name="dtnZC_param"/>
<xsl:param name="class_param"/>
<xsl:param name="wt_param"/>
<xsl:param name="orgCtry_param"/>
<xsl:param name="orgZC_param"/>
<xsl:param name="shipDateCCYYMMDD_param"/>
<xsl:param name="shipID_param"/>
<xsl:param name="tarName_param"/>
<xsl:template match="/*">
<xsl:copy>
<soapenv:Header>
<web:AuthenticationToken>
<web:licenseKey><xsl:value-of select="$licenseKey_param"/></web:licenseKey>
<web:password><xsl:value-of select="$password_param"/></web:password>
<web:username><xsl:value-of select="$username_param"/></web:username>
</web:AuthenticationToken>
</soapenv:Header>
<soapenv:Body>
<web:Shipping>
<web:ShipRequest>
<web1:dtnCtry><xsl:value-of select="$dtnCtry_param"/></web1:dtnCtry>
<web1:dtnZC><xsl:value-of select="$dtnZC_param"/></web1:dtnZC>
<web1:details>
<xsl:comment> Zero or more repetitions: </xsl:comment>
<web1:ShipRequestDetail>
<web1:class><xsl:value-of select="$class_param"/></web1:class>
<web1:wt><xsl:value-of select="$wt_param"/></web1:wt>
</web1:ShipRequestDetail>
</web1:details>
<web1:orgCtry><xsl:value-of select="$orgCtry_param"/></web1:orgCtry>
<web1:orgZC><xsl:value-of select="$orgZC_param"/></web1:orgZC>
<web1:shipDateCCYYMMDD><xsl:value-of select="$shipDateCCYYMMDD_param"/></web1:shipDateCCYYMMDD>
<web1:shipID><xsl:value-of select="$shipID_param"/></web1:shipID>
<web1:tarName><xsl:value-of select="$tarName_param"/></web1:tarName>
</web:ShipRequest>
</web:Shipping>
</soapenv:Body>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Python (replace the string literals in strparam() with your database values and calculations)
import lxml.etree as et
# LOAD XML AND XSL
doc = et.parse('my_file.xml')
xsl = et.parse('my_script.xsl')
# CONFIGURE TRANSFORMER
transform = et.XSLT(xsl)
# RUN TRANSFORMATION WITH PARAMS
result = transform(doc,
licenseKey_param = et.XSLT.strparam('123123'),
password_param = et.XSLT.strparam('password'),
username_param = et.XSLT.strparam('Di437'),
dtnCtry_param = et.XSLT.strparam('Python Country'),
dtnZC_param = et.XSLT.strparam('Django ZC'),
class_param = et.XSLT.strparam('my class'),
wt_param = et.XSLT.strparam('my wt'),
orgCtry_param = et.XSLT.strparam('XML Country'),
orgZC_param = et.XSLT.strparam('Soap ZC'),
shipDateCCYYMMDD_param = et.XSLT.strparam('2018-06-17 12:00'),
shipID_param = et.XSLT.strparam('888999'),
tarName_param = et.XSLT.strparam('stackoverflow'))
# PRINT RESULT
print(result)
# SAVE TO FILE
with open('output.xml', 'wb') as f:
    f.write(result)
Output
<?xml version="1.0" encoding="utf-8"?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:web="http://some_url.api.com" xmlns:web1="http://another_url.api.com">
<soapenv:Header>
<web:AuthenticationToken>
<web:licenseKey>123123</web:licenseKey>
<web:password>password</web:password>
<web:username>Di437</web:username>
</web:AuthenticationToken>
</soapenv:Header>
<soapenv:Body>
<web:Shipping>
<web:ShipRequest>
<web1:dtnCtry>Python Country</web1:dtnCtry>
<web1:dtnZC>Django ZC</web1:dtnZC>
<web1:details>
<!-- Zero or more repetitions: -->
<web1:ShipRequestDetail>
<web1:class>my class</web1:class>
<web1:wt>my wt</web1:wt>
</web1:ShipRequestDetail>
</web1:details>
<web1:orgCtry>XML Country</web1:orgCtry>
<web1:orgZC>Soap ZC</web1:orgZC>
<web1:shipDateCCYYMMDD>2018-06-17 12:00</web1:shipDateCCYYMMDD>
<web1:shipID>888999</web1:shipID>
<web1:tarName>stackoverflow</web1:tarName>
</web:ShipRequest>
</web:Shipping>
</soapenv:Body>
</soapenv:Envelope>
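When the values come from a database row or a form, it may be more convenient to build the parameters from a dictionary than to spell out each keyword. The sketch below assumes a hypothetical values dict and a cut-down two-parameter stylesheet; only et.XSLT.strparam and keyword parameters are taken from the answer above.

```python
import lxml.etree as et

# Cut-down, hypothetical stylesheet with two of the parameters.
xsl = et.XML('''<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:param name="username_param"/>
  <xsl:param name="shipID_param"/>
  <xsl:template match="/">
    <request>
      <user><xsl:value-of select="$username_param"/></user>
      <id><xsl:value-of select="$shipID_param"/></id>
    </request>
  </xsl:template>
</xsl:stylesheet>''')
transform = et.XSLT(xsl)

# Hypothetical values, e.g. fetched from the database or computed from user input.
values = {'username_param': "O'Brien", 'shipID_param': '888999'}

# strparam() quotes each value so that it is passed as a string, not as XPath.
params = {name: et.XSLT.strparam(value) for name, value in values.items()}
result = transform(et.XML('<dummy/>'), **params)

print(str(result))
```

The dictionary keys have to match the xsl:param names exactly, so a mapping from column names to parameter names may be needed in practice.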

Extracting data from large xml file with the aid of xslt using node path in python and

I am new to XSLT. I want to extract some information from a large XML file using Python. I learnt that to convert my heavily nested XML file into a flat file, I need to transform it with an XSLT (.xsl) stylesheet, but I have little understanding of XSLT syntax. My XML file (DUMMY XML FILE). My goal is to extract into an SQLite database a table with the columns UltimateHoldingCompanyName, OrganisationName, CompanyID.
From the dummy file, paths to this info are:
OrganisationName = N8:EntityList/N8:Entity/N2:OrganisationName/N2:NameElement
CompanyID = N8:EntityList/N8:Entity/N5:Identifiers/N5:Identifier/N5:IdentifierElement
UltimateHoldingCompanyName = N8:EntityList/N8:Entity/N9:UltimateHoldingCompany/N2:OrganisationName/N2:NameElement
My python code is this:
import lxml.etree as ET
import sqlite3
def insert_program(db_conn, companyID, companyNAME, ultimateHOLDINGcompanyName):
    curs = db_conn.cursor()
    curs.execute("insert into program values (?,?,?,?,?,?)", (companyID, companyNAME, ultimateHOLDINGcompanyName))
    db_conn.commit()
def program_data_from_element(element):
    companyID = element.find("companyID").text
    companyNAME = element.find("companyNAME").text
    ultimateHOLDINGcompanyName = element.find("ultimateHOLDINGcompanyName").text
    return companyID, companyNAME, ultimateHOLDINGcompanyName
if __name__ == "__main__":
    conn = sqlite3.connect("program.sqlite3")
    xml = ET.parse("ompanies_xml_extract_20170703NEW.xml")
    # PROGRAM PARSE
    xslt = ET.parse("program.xsl")
    transform = ET.XSLT(xslt)
    newdom = transform(xml)
    program = newdom.xpath("//program")
    for index, element in enumerate(program):
        companyID, companyNAME, ultimateHOLDINGcompanyName = program_data_from_element(element)
        insert_program(conn, companyID, companyNAME, ultimateHOLDINGcompanyName)
My xslt code is this:
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>
<xsl:template match="/N8:EntityList">
<xsl:copy>
<xsl:for-each select="N8:Entity/N2:OrganisationName">
<Report_Entry>
<xsl:copy-of select="../N2:NameElement"/>
<xsl:for-each select="../N2:NameElement/*">
<xsl:element name="N2:NameElement">
<xsl:value-of select="."/>
</xsl:element>
</xsl:for-each>
<xsl:copy-of select="../N9:UltimateHoldingCompany"/>
<xsl:for-each select="../N2:NameElement/*">
<xsl:element name="N2:OrganisationName">
<xsl:value-of select="."/>
</xsl:element>
</xsl:for-each>
</Report_Entry>
</xsl:for-each>
</xsl:copy>
</xsl:template>
</xsl:transform>
Running the code gave this result:
"C:\Program Files (x86)\Python36-32\python.exe" D:/my_pythoncode.py
Traceback (most recent call last):
File "D:/my_pythoncode.py", line 1, in <module>
import lxml.etree as ET
ModuleNotFoundError: No module named 'lxml'
Thanks.
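The traceback itself only says that lxml is not installed for that Python interpreter, so the script stops before any XML is read. Once lxml is available, the three paths listed above can also be read directly with namespace-aware lookups, without a stylesheet. The following is a sketch under stated assumptions: the namespace URIs (urn:example:...), the sample document and the table layout are all hypothetical and must be replaced with the real ones.

```python
import sqlite3
import lxml.etree as ET

# Hypothetical namespace URIs; use the ones declared in the real file.
NS = {'N8': 'urn:example:n8', 'N2': 'urn:example:n2',
      'N5': 'urn:example:n5', 'N9': 'urn:example:n9'}

# Made-up sample that follows the three paths given in the question.
SAMPLE = '''<N8:EntityList xmlns:N8="urn:example:n8" xmlns:N2="urn:example:n2"
    xmlns:N5="urn:example:n5" xmlns:N9="urn:example:n9">
  <N8:Entity>
    <N2:OrganisationName><N2:NameElement>Acme Ltd</N2:NameElement></N2:OrganisationName>
    <N5:Identifiers><N5:Identifier>
      <N5:IdentifierElement>123</N5:IdentifierElement>
    </N5:Identifier></N5:Identifiers>
    <N9:UltimateHoldingCompany>
      <N2:OrganisationName><N2:NameElement>Acme Holdings</N2:NameElement></N2:OrganisationName>
    </N9:UltimateHoldingCompany>
  </N8:Entity>
</N8:EntityList>'''

def extract_rows(root):
    """Return one (companyID, organisationName, holdingName) tuple per Entity."""
    rows = []
    for entity in root.xpath('/N8:EntityList/N8:Entity', namespaces=NS):
        rows.append((
            entity.findtext('N5:Identifiers/N5:Identifier/N5:IdentifierElement', namespaces=NS),
            entity.findtext('N2:OrganisationName/N2:NameElement', namespaces=NS),
            entity.findtext('N9:UltimateHoldingCompany/N2:OrganisationName/N2:NameElement',
                            namespaces=NS),
        ))
    return rows

conn = sqlite3.connect(':memory:')  # in-memory database for the sketch
conn.execute('create table program '
             '(companyID text, companyNAME text, ultimateHOLDINGcompanyName text)')
rows = extract_rows(ET.fromstring(SAMPLE))
conn.executemany('insert into program values (?,?,?)', rows)  # one placeholder per column
conn.commit()
print(rows)
```

For a very large file, parsing the whole document at once may use a lot of memory; an incremental parser would be worth considering in that case.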

Filtering xml: XSLT with lxml in python

I am trying to filter the XML input with XSLT, but I have a problem running the following code. I think there is a problem with the XSLT I defined. I would like to define a rule in XSLT that discards the 'Foo' element in the input XML. This is what my code looks like:
from lxml import etree
from io import StringIO
def testFilter():
    xslt_root = etree.XML('''\
    <xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="Foo"/>
    </xsl:stylesheet>
    ''')
    transform = etree.XSLT(xslt_root)
    f = StringIO(unicode('<?xml version="1.0"?><ComponentData><DataSet name="one"> <Foo fooValue="2014"/></DataSet><DataSet name="two"><Foo fooValue="2015"/></DataSet></ComponentData>'))
    doc = etree.parse(f)
    result_tree = transform(doc)
    print(str(result_tree))
if __name__=='__main__':
    testFilter()
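What you are missing is an identity template that copies everything else. Without it, the built-in template rules apply to all unmatched nodes and output only their text, so no elements reach the result.
Modified code (this example uses a TimeStamp element in place of Foo):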
from lxml import etree
from io import StringIO
def testFilter():
    xslt_root = etree.XML('''\
    <xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="node() | @*">
    <xsl:copy>
    <xsl:apply-templates select="node() | @*"/>
    </xsl:copy>
    </xsl:template>
    <xsl:template match="TimeStamp"/>
    </xsl:stylesheet>
    ''')
    transform = etree.XSLT(xslt_root)
    # a u'' literal works on Python 2 and 3, so no unicode() call is needed
    f = StringIO(u'<?xml version="1.0"?><ComponentData><DataSet name="one"> <TimeStamp timeStampValue="2014"/></DataSet><DataSet name="two"><TimeStamp timeStampValue="2015"/></DataSet></ComponentData>')
    doc = etree.parse(f)
    result_tree = transform(doc)
    print(str(result_tree))
if __name__=='__main__':
    testFilter()
This outputs:
<?xml version="1.0"?>
<ComponentData><DataSet name="one"> </DataSet><DataSet name="two"/></ComponentData>
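The difference the identity template makes can be seen by running the same input through both stylesheets. This is a small sketch; the run() helper and the sample document are made up for illustration.

```python
from lxml import etree

# Hypothetical sample input with one element to discard and some text to keep.
DOC = etree.XML('<ComponentData><DataSet name="one">'
                '<Foo fooValue="2014"/>kept</DataSet></ComponentData>')

XSL_OPEN = ('<xsl:stylesheet version="1.0" '
            'xmlns:xsl="http://www.w3.org/1999/XSL/Transform">')
XSL_CLOSE = '</xsl:stylesheet>'

def run(templates):
    """Wrap the given template rules in a stylesheet and apply it to DOC."""
    sheet = etree.XML(XSL_OPEN + templates + XSL_CLOSE)
    return str(etree.XSLT(sheet)(DOC))

# Only the discard rule: built-in rules handle everything else and emit text only.
only_rule = run('<xsl:template match="Foo"/>')

# Identity template plus the discard rule: everything except Foo is copied.
with_identity = run(
    '<xsl:template match="node() | @*">'
    '<xsl:copy><xsl:apply-templates select="node() | @*"/></xsl:copy>'
    '</xsl:template>'
    '<xsl:template match="Foo"/>')

print(only_rule)
print(with_identity)
```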
