Python xml.etree issue - python

I'm trying to build a script in python 3 using xml.etree which accepts version as a parameter , parses xml and replace the version in the xml tags + values from the tree to the root and his children.
I am at the point where i can change the default value in the root but i am struggling with changing the version to childs and grandchilds - CurrentVersion, Template and Base.
Here is my code and XML:
code-
import sys
from xml.etree import ElementTree as et
version = sys.argv[1]
parse = et.parse("WebApp2.config")
root = parse.getroot()
def changeVersion(version):
ourVersion = root.find('OurVersion')
root.set("default", version)
print(et.tostring(root))
parse.write("WebApp2.config", xml_declaration=True)
if __name__ == "__main__":
changeVersion(version)
XML-
<?xml version="1.0"?>
<OurVersion default="1.0.0.3">
<CurrentVersion bitSupport="true" deviceDetectionSupport="true"
version="1.0.0.3">
<Template>D:\Some\Path\Software\1.0.0.3\webApp\index.webapp</Template>
<BasePath>resources/1.0.0.3/webApp/</BasePath>
</CurrentVersion>
</OurVersion>
I've tried to add something like the below, but im getting issue that "no set attribue to currentVersion" -
ourVersion = root.find('OurVersion')
ourVersion.set('default`, version)
currentVersion = ourVersion.find('CurrentVersion')
currentVersion.set('version', version)
Appreciate your help on this matter ;)

Your first script works because with root.set("default", version) you are using root to refer to the attribute default that you want to modify.
In fact ourVersion = root.find('OurVersion') returns nothing (None) because
OurVersion is your root and then ourVersion.find('CurrentVersion') cannot return what you expect.
Try this instead :
currentVersion = root.find('CurrentVersion')
currentVersion.set('version', version)

Related

Suppress automatically added namespace in etree Python

<rootTag xmlns="model">
<tag>
I have an xml file with a namespace specified as above. I can use etree in Python to parse it, but after making changes and writing it back to the file, etree changes it to this
<rootTag xmlns:ns0="model">
<ns0:tag>
and prepended "ns0" to all the tags. I don't want that to happen.
A sample program is as follows:
et = xml.etree.ElementTree.parse(xml_name)
root = (et.getroot())
root.find('.//*'+pattern).text = new_text
et.write(xml_name)
Is there someway to suppress this automatic change? Thanks
This can be done using register_namespace() by using an empty string for the prefix...
ET.register_namespace("", "model")
Full working example...
import xml.etree.ElementTree as ET
xml = """
<rootTag xmlns="model">
<tag>foo</tag>
</rootTag>
"""
ET.register_namespace("", "model")
root = ET.fromstring(xml)
root.find("{model}tag").text = "bar"
print(ET.tostring(root).decode())
printed output...
<rootTag xmlns="model">
<tag>bar</tag>
</rootTag>
Also see this answer for another example.

Create array of values from specific element in XML using Python

I have an XML file which has many elements. I would like to create a list/array of all the values which have a specific element name, in my case "pair:ApplicationNumber".
I've gone over a lot of the other questions however I am not able to find an answer. I know that I can do this by loading the text file and going over it using pandas however, I'm sure there's a much better way.
I was unsuccessful trying ElementTree as well as XML.Dom using minidom
My code currently looks as follows:
import os
from xml.dom import minidom
WindowsUser = os.getenv('username')
XMLPath = os.path.join('C:\\Users', WindowsUser, 'Downloads', 'ApplicationsByCustomerNumber.xml')
xmldoc = minidom.parse(XMLPath)
itemlist = xmldoc.getElementsByTagName('pair:ApplicationNumber')
for s in itemlist:
print(s.attributes['pair:ApplicationNumber'].value)
an example XML file looks as follows:
<?xml version="1.0" encoding="UTF-8"?>
<pair:PatentApplicationList xsi:schemaLocation="urn:us:gov:uspto:pair PatentApplicationList.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:pair="urn:us:gov:uspto:pair">
<pair:FileHeader>
<pair:FileCreationTimeStamp>2017-07-10T10:52:12.12</pair:FileCreationTimeStamp>
</pair:FileHeader>
<pair:ApplicationStatusData>
<pair:ApplicationNumber>62383607</pair:ApplicationNumber>
<pair:ApplicationStatusCode>20</pair:ApplicationStatusCode>
<pair:ApplicationStatusText>Application Dispatched from Preexam, Not Yet Docketed</pair:ApplicationStatusText>
<pair:ApplicationStatusDate>2016-09-16</pair:ApplicationStatusDate>
<pair:AttorneyDocketNumber>1354-T-02-US</pair:AttorneyDocketNumber>
<pair:FilingDate>2016-09-06</pair:FilingDate>
<pair:LastModifiedTimestamp>2017-05-30T21:40:37.37</pair:LastModifiedTimestamp>
<pair:CustomerNumber>122761</pair:CustomerNumber><pair:LastFileHistoryTransaction>
<pair:LastTransactionDate>2017-05-30</pair:LastTransactionDate>
<pair:LastTransactionDescription>Email Notification</pair:LastTransactionDescription> </pair:LastFileHistoryTransaction>
<pair:ImageAvailabilityIndicator>true</pair:ImageAvailabilityIndicator>
</pair:ApplicationStatusData>
<pair:ApplicationStatusData>
<pair:ApplicationNumber>62292372</pair:ApplicationNumber>
<pair:ApplicationStatusCode>160</pair:ApplicationStatusCode>
<pair:ApplicationStatusText>Abandoned -- Incomplete Application (Pre-examination)</pair:ApplicationStatusText>
<pair:ApplicationStatusDate>2016-11-01</pair:ApplicationStatusDate>
<pair:AttorneyDocketNumber>681-S-23-US</pair:AttorneyDocketNumber>
<pair:FilingDate>2016-02-08</pair:FilingDate>
<pair:LastModifiedTimestamp>2017-06-20T21:59:26.26</pair:LastModifiedTimestamp>
<pair:CustomerNumber>122761</pair:CustomerNumber><pair:LastFileHistoryTransaction>
<pair:LastTransactionDate>2017-06-20</pair:LastTransactionDate>
<pair:LastTransactionDescription>Petition Entered</pair:LastTransactionDescription> </pair:LastFileHistoryTransaction>
<pair:ImageAvailabilityIndicator>true</pair:ImageAvailabilityIndicator>
</pair:ApplicationStatusData>
<pair:ApplicationStatusData>
<pair:ApplicationNumber>62289245</pair:ApplicationNumber>
<pair:ApplicationStatusCode>160</pair:ApplicationStatusCode>
<pair:ApplicationStatusText>Abandoned -- Incomplete Application (Pre-examination)</pair:ApplicationStatusText>
<pair:ApplicationStatusDate>2016-10-26</pair:ApplicationStatusDate>
<pair:AttorneyDocketNumber>1526-P-01-US</pair:AttorneyDocketNumber>
<pair:FilingDate>2016-01-31</pair:FilingDate>
<pair:LastModifiedTimestamp>2017-06-15T21:24:13.13</pair:LastModifiedTimestamp>
<pair:CustomerNumber>122761</pair:CustomerNumber><pair:LastFileHistoryTransaction>
<pair:LastTransactionDate>2017-06-15</pair:LastTransactionDate>
<pair:LastTransactionDescription>Petition Entered</pair:LastTransactionDescription> </pair:LastFileHistoryTransaction>
<pair:ImageAvailabilityIndicator>true</pair:ImageAvailabilityIndicator>
</pair:ApplicationStatusData>
</pair:PatentApplicationList>
The XML in your example is expanding the "pair:" part of the tags according to the schema you've used, so it doesn't match 'pair:ApplicationNumber', even though it looks like it should.
I've used element tree to extract the application numbers as follows (I've just used a local XML file in my examples, rather than the full path in your code)
Example 1:
from xml.etree import ElementTree
tree = ElementTree.parse('ApplicationsByCustomerNumber.xml')
root = tree.getroot()
for item in root:
if 'ApplicationStatusData' in item.tag:
for child in item:
if 'ApplicationNumber' in child.tag:
print child.text
Example 2:
from xml.etree import ElementTree
tree = ElementTree.parse('ApplicationsByCustomerNumber.xml')
root = tree.getroot()
for item in root.iter('{urn:us:gov:uspto:pair}ApplicationStatusData'):
for child in item.iter('{urn:us:gov:uspto:pair}ApplicationNumber'):
print child.text
Hope this may be useful.

Extracting nested namespace from a xml using lxml

I'm new to Python and currently learning to parse XML. All seems to be going well until I hit a wall with nested namespaces.
Below is an snippet of my xml ( with a beginning and child element that I'm trying to parse:
<?xml version="1.0" encoding="UTF-8"?>
-<CompositionPlaylist xmlns="http://www.digicine.com/PROTO-ASDCP-CPL-20040511#">
<!-- Generated by orca_wrapping version 3.8.3-0 -->
<Id>urn:uuid:e0e43007-ca9b-4ed8-97b9-3ac9b272be7a</Id>
-------------
-------------
-------------
-<cc-cpl:MainClosedCaption xmlns:cc-cpl="http://www.digicine.com/PROTO- ASDCP-CC-CPL-20070926#"><Id>urn:uuid:0607e57f-edcc-46ec- 997a-d2fbc0c1ea3a</Id><EditRate>24 1</EditRate><IntrinsicDuration>2698</IntrinsicDuration></cc-cpl:MainClosedCaption>
------------
------------
------------
</CompositionPlaylist>
What I'm need is a solution to extract the URI of the local name 'MainClosedCaption'. In this case, I'm trying to extract the string "http://www.digicine.com/PROTO- ASDCP-CC-CPL-20070926#". I looked through a lot of tutorials but cannot seems to find a solution.
If there's anyone out there can lend your expertise, it would be much appreciated.
Here what I did so far with the help from the two contributors:
#!/usr/bin/env python
from xml.etree import ElementTree as ET #import ElementTree module as an alias ET
from lxml import objectify, etree
def parse():
import os
import sys
cpl_file = sys.argv[1]
xml_file = os.path.abspath(__file__)
xml_file = os.path.dirname(xml_file)
xml_file = os.path.join(xml_file,cpl_file)
with open(xml_file)as f:
xml = f.read()
tree = etree.XML(xml)
caption_namespace = etree.QName(tree.find('.//{*}MainClosedCaption')).namespace
print caption_namespace
print tree.nsmap
nsmap = {}
for ns in tree.xpath('//namespace::*'):
if ns[0]:
nsmap[ns[0]] = ns[1]
tree.xpath('//cc-cpl:MainClosedCaption', namespace=nsmap)
return nsmap
if __name__=="__main__":
parse()
But it's not working so far. I got the result 'None' when I used QName to locate the tag and its namespace. And when I try to locate all namespace in the XML using for loop as suggested in another post, I got the error 'Unknown return type: dict'
Any suggestions pls?
This program prints the namespace of the indicated tag:
from lxml import etree
xml = etree.XML('''<?xml version="1.0" encoding="UTF-8"?>
<CompositionPlaylist xmlns="http://www.digicine.com/PROTO-ASDCP-CPL-20040511#">
<!-- Generated by orca_wrapping version 3.8.3-0 -->
<Id>urn:uuid:e0e43007-ca9b-4ed8-97b9-3ac9b272be7a</Id>
<cc-cpl:MainClosedCaption xmlns:cc-cpl="http://www.digicine.com/PROTO-ASDCP-CC-CPL-20070926#">
<Id>urn:uuid:0607e57f-edcc-46ec- 997a-d2fbc0c1ea3a</Id>
<EditRate>24 1</EditRate>
<IntrinsicDuration>2698</IntrinsicDuration>
</cc-cpl:MainClosedCaption>
</CompositionPlaylist>
''')
print etree.QName(xml.find('.//{*}MainClosedCaption')).namespace
Result:
http://www.digicine.com/PROTO-ASDCP-CC-CPL-20070926#
Reference: http://lxml.de/tutorial.html#namespaces

python lxml etree parsing an fvdl file

The file contains the following lines.
<?xml version="1.0" encoding="UTF-8"?>
<FVDL xmlns="xmlns://www.fortifysoftware.com/schema/fvdl" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.9" xsi:type="FVDL">`
<CreatedTS date="2013-08-06" time="11:8:48" />`
I am trying to read the version tag in FVDL.I am using lxml etree and my code snippet is
from lxml import etree
with open(os.path.join(analysis,"merged-results.fvdl") ,"r") as file_handle:
context = etree.parse(file_handle)
ver = context.xpath('//FVDL')
print ver
This had worked before in parsing a standard xml file. However it is failing for the above mentioned file .(ver is an empty list at the end of execution)
Alternative to #falsetru's answer
(By "trying to read the version tag", I understand "the version attribute" (which may not be what you want))
Explicitly register fvdl namespace, under the "fvdl" prefix:
ver = context.xpath('//fvdl:FVDL/#version',
namespaces={"fvdl": "xmlns://www.fortifysoftware.com/schema/fvdl"})
Or, riskier, if somehow you know you want the version attribute from the root node
ver = context.xpath('/*/#version')
Both give ['1.9']
context = etree.parse(file_handle)
ver = context.getroot()
print ver.attrib['version']
output:'1.9'
Use [local-name()=...]:
ver = context.xpath('//*[local-name()="FVDL"]')

python lxml with py2exe

I have Generated an XML with dom and i want to use lxml to pretty print the xml.
this is my code for pretty print the xml
def prettify_xml(xml_str):
import lxml.etree as etree
root = etree.fromstring(xml_str)
xml_str = etree.tostring(root, pretty_print=True)
return xml_str
my output should be an xml formatted string.
I got this code from some post in stactoverflow. This works flawlessly when i am compiling wit python itself. But when i convert my project to a binary created from py2exe (my binary is windows service with a namedpipe).I had two problems:
My service was not starting , i solved this by adding lxml.etree in includes option in py2exe function. then on my service started properly.
when xml generation in called here, is the error which I am seeing in my log
'module' object has no attribute 'fromstring'
where do i rectify this error ? And Is my first problem's solution correct ?
my xml generation Code :
from xml.etree import ElementTree
from xml.dom import minidom
from xml.etree.ElementTree import Element, SubElement, tostring, XML
import lxml.etree
def prettify_xml(xml_str):
root = lxml.etree.fromstring(xml_str)
xml_str = lxml.etree.tostring(root, pretty_print=True)
return xml_str
def dll_xml(status):
try:
xml_declaration = '<?xml version="1.0" standalone="no" ?>'
rootTagName='response'
root = Element(rootTagName)
root.set('id' , 'rp001')
parent = SubElement(root, 'command', opcode ='-ac')
# Create children
chdtag1Name = 'mode'
chdtag1Value = 'repreport'
chdtag2Name='status'
chdtag2Value = status
fullchildtag1 = ''+chdtag1Name+' value = "'+chdtag1Value+'"'
fullchildtag2=''+chdtag2Name+' value="'+chdtag2Value+'"'
children = XML('''<root><'''+fullchildtag1+''' /><'''+fullchildtag2+'''/></root> ''')
# Add parent
parent.extend(children)
dll_xml_doc = xml_declaration + tostring(root)
dll_xml_doc = prettify_xml(dll_xml_doc)
return dll_xml_doc
except Exception , error:
log.error("xml_generation_failed : %s" % error)
Try to use PyInstaller instead py2exe. I converted your program to binary .exe with no problem just by running python pyinstaller.py YourPath\xml_a.py.

Categories