I've searched extensively for the past few days and can't seem to find what I'm looking for. I've written a script using Python 2.7.3 and ElementTree to parse an XML file and edit an attribute buried deep within the XML file. The script works fine. I had a meeting late last week with the customer who informed me the target platform will be CentOS. I thought, no problem. To test on the anticipated platform I created a CentOS VMWare client and much to my surprise my script crapped the bed, giving me the error message, "SyntaxError: expected path separator ([)" In the course of my researching the nature of this error message I learned that CentOS 6.4 supports Python 2.6.6, which contains an older version of ElementTree that does not have support for searching for attributes [#attribute] syntax.
This customer won't upgrade Python on the platform, nor will they install additional libraries, so lxml is not an option for me. My question is, can I somehow still access the buried attribute and edit it without the ElementTree support for the [#attribute] facilities?
Here's an example of the kind of XML I'm dealing with:
`
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<my-gui>
<vehicles>
<car vendor="Ford"/>
</vehicles>
<options>
<line transmission='manual'/>
</options>
<title>Dealership</title>
<choice id='manual' title="Dealership">
<pkg-deal id='manual' auth='manager'>.</pkg-deal>
</choice>
<choice id='manual' title='Dealership'/>
<choice id='manual' DealerLocation='Dealer_Loc'/>
<choices-outline color='color_choice'>
<line choice='blue'/>
</choices-outline>
<choice id='cars' GroupID='convertables'>
<pkg-deal id='model.Taurus' version="SEL" arguments='LeatherInterior' enabled='XMRadio'>Taurus</pkg-deal>
<pkg-deal id='model.Mustang' version="GT" enabled='SIRIUSRadio'>Mustang</pkg-deal>
<pkg-deal id='model.Focus' version="SE" enabled='XMRadio'>Focus</pkg-deal>
<pkg-deal id='model.Fairlane'>Fairlane</pkg-deal>
<pkg-deal id='model.Fusion' version="SE" arguments='ClothInerior'>Fusion</pkg-deal>
<pkg-deal id='model.Fiesta' version="S Hatch" enabled="SIRIUSRadio">Fiesta</pkg-deal>
</choice>
</my-gui>
`
Here's a snippet of the successful Python 2.7.3 code that breaks under Python 2.6.6:
if self.root.iterfind('pkg-deal'):
self.deal = self.root.find('.//pkg-deal[#id="model.fusion"]')
self.arg = str(self.deal.get('arguments'))
if self.arg.find('with Scotchguard=') > 0:
QtGui.QMessageBox.information(self, 'DealerAssist', 'The selected car is already updated. Nothing to do.')
self.leave()
self.deal.set('arguments', self.arg + ' with Scotchguard')
...
...
Is there a way I can modify the first line of this 'if' statement block that will allow me to edit the 'arguments' attribute of the Fusion element? Or am I relegated to implementing libxml2, which promises to be a real pain?...
Thanks.
This may be side-stepping the question, but you could just try copying-and-pasting the version of ElementTree from Python 2.7, renaming it to avoid conflicting with the standard library, and importing and using that.
However, since ElementTree isn't meant to be used as a standalone file, what you need to do is navigate to C:\Python27\Lib\xml and copy the entire etree folder and import ElementTree by doing import etree.ElementTree inside your script.
To avoid accidentally importing or using the version of ElementTree from Python 2.6, you should probably rename the etree folder, its contents, delete the .pyc files, and fix the imports inside the file to reference the Python 2.7 version.
This same problem was solved by another user here.
This user filtered the attribute manually in Python 2.6. I'm posting their code example here even though the example pertains specifically to the asker's code:
def final_xml(self,username):
users = self.root.findall("user")
for user in users:
if user.attrib.get('username') == 'user1':
break
else:
raise ValueError('No such user')
# `user` is now set to the correct element
self.root.remove(user)
print user
tree = ET.ElementTree(self.root)
tree.write("msl.xml")
Related
I am trying to learn Python (day 2) and am hoping to practice with Excel books first as this is where I am comfortable/fluent.
Right off the bat I am having an error that I don't quit understand when running the below code:
import openpyxl
wb = openpyxl.load_workbook("/Users/Scott/Desktop/Workbook1.xlsx")
print(wb.sheetnames)
This does print my sheet names as requested, but it is followed by:
/Users/Scott/PycharmProjects/Excel/venv/lib/python3.7/site-packages/openpyxl/worksheet/_reader.py:293: UserWarning: Unknown extension is not supported and will be removed
warn(msg)
I have found other questions that point to slicers/conditional formatting etc, but that does not apply here. This is a book I just made and only added 3 sheets before saving. It has no data, no formatting, and the extension is valid. I have no add-ons installed on my excel either.
Any idea why why I am getting this error? How do I resolve?
Python: 3.7
openpyxl: 2.6
I had a similar issue. I developed an application which read and write Excel files. It woked well on my Windows computer, but then I tried to run it on a friends mac. It showed the same error. I could "fix" it by changing the configuration of the workbook, like this:
import openpyxl as op
wb = op.load_workbook(file, read_only=True, data_only=True)
But, as you can see, you can only read Excel files with this configuration. At the end, I realized that my friend didn't have Microsoft Office installed on his computer. Install it truly solved my problem.
This question was from a couple years ago but I'm encountering it now with openpyxl and require a fix, as the warning is confounding and misleading to my end users.
The warning from openpyxl comes via the stdlib warnings library, which can be suppressed.
import warnings
warnings.simplefilter("ignore")
That's the "hit it with a hammer" approach. More granular levels of warnings suppression can be found here: https://docs.python.org/3/library/warnings.html
This is exactly the problem I encountered just now..
And to my situation (not to everyone) I discovered that you just need to close your excel and rerun the code, very simple.
If this doesn't work, you can refer to other answers.
Thanks
Python - Openpyxl - "UserWarning: Unknown extension" issue
To understand the error, you need to know what's inside an XLSX file. The best way to take a look is to change the extension to zip and open that. Inside you will see a file called [Content_Types].xml and directories for the other content. If you check out the XML in Content_Types you will see a <Types ...> tag containing other tags like this:
<Default Extension="png" ContentType="image/png"/>
<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
<Default Extension="xml" ContentType="application/xml"/>
Note the "Extension" property. This is what the warning refers to. In the example above, my file included Extension="png" - the unknown extension.
For me, it was enough to specify read_only=True and the error went away eg:
wb = openpyxl.load_workbook(file, read_only=True)
I could also fix the issue by copying everything except the images to a new workbook and saving that. After checking, the xml in the new workbook no longer contained the png property.
Note, reading into pandas with pd.read_excel uses openpyxl and generates the same "Unknown extension" error but there is no way to pass through the read_only parameter. You can suppress the specific warning with:
import warnings
warnings.filterwarnings('ignore', category=UserWarning, module='openpyxl')
I've been helping some students with a Python class this fall, and I've noticed that several students using Mac OS to complete a particular assignment involving validating XML against XSD's are running into OS Errors ("Error reading file: failed to load eternal entity [the xml file]"). Their code works perfectly fine on my end (Windows) but on Mac OS it refuses to.
Here is the code that is causing the problem:
from lxml import etree
xmlschema_doc = etree.parse("the_xsd.xsd")
xmlschema = etree.XMLSchema(xmlschema_doc)
doc = etree.parse("the_xml.xml")
print(xmlschema.validate(doc))
In particular, the line doc = etree.parse("the_xml.xml") is where the error occurs.
I've made sure the students 1) have all their files (XML, Python, XSD) in the same folder, 2) I've suggested they use the full filepaths and 3) I found this bit of code and suggested they try it (to no avail):
prog_dir = os.path.abspath(os.path.dirname(__file__))
os.chdir(prog_dir)
Again: the XML validates against the XSD on Windows just fine, but on their Macs they get the error.
Any insights would be much appreciated.
I have some problem with lxml and python.
I have this code:
import lxml.etree as ET
xml_dom = ET.parse(xml_path)
xslt_dom = ET.parse(xslt_path)
print('transforming...')
transform = ET.XSLT(xslt_dom)
print('transformed: ', transform)
parsed_xml = transform(xml_dom)
print('all good!')
On my local environment, all works good (python 3.6.5 on a virtualenv with lxml 3.6.0).
The problem is, i have this code on a Centos 7 server, with the exact same specs (Python 3.6.5 and lxml 3.6.0), if i execute it from command line, all is good, when i put this code inside a Django (2.0) project, it "freeze" on this part:
transform = ET.XSLT(xslt_dom)
No exceptions, no errors, nothing. The print below that line never executes.
I changed permissions of the files, to apache group, set read permissions, and nothig works.
The weird thing is, from console works nice, from "apache + Django", don't.
Any suggestion?
Thanks.
Here are two different files that my python (2.6) script encounters. One will parse, the other will not. I'm just curious as to why this happens.
This xml file will not parse and the script will fail:
<Landfire_Feedback_Point_xlsform id="fbfm40v10" instanceID="uuid:9e062da6-b97b-4d40-b354-6eadf18a98ab" submissionDate="2013-04-30T23:03:32.881Z" isComplete="true" markedAsCompleteDate="2013-04-30T23:03:32.881Z" xmlns="http://opendatakit.org/submissions">
<date_test>2013-04-17</date_test>
<plot_number>10</plot_number>
<select_multiple_names>BillyBob</select_multiple_names>
<geopoint_plot>43.2452830500 -118.2149402900 210.3000030518 3.0000000000</geopoint_plot><fbfm40_new>GS2</fbfm40_new>
<select_grazing>NONE</select_grazing>
<image_close>1366230030355.jpg</image_close>
<plot_note>No road present.</plot_note>
<n0:meta xmlns:n0="http://openrosa.org/xforms">
<n0:instanceID>uuid:9e062da6-b97b-4d40-b354-6eadf18a98ab</n0:instanceID>
</n0:meta>
</Landfire_Feedback_Point_xlsform>
This xml file will parse correctly and the script succeeds:
<Landfire_Feedback_Point_xlsform id="fbfm40v10">
<date_test>2013-05-14</date_test>
<plot_number>010</plot_number>
<select_multiple_names>BillyBob</select_multiple_names>
<geopoint_plot>43.26630563 -118.39881809 351.70001220703125 5.0</geopoint_plot>
<fbfm40_new>GR1</fbfm40_new>
<select_grazing>HIGH</select_grazing>
<image_close>fbfm40v10_PLOT_010_ID_6.jpg</image_close>
<plot_note>Heavy grazing</plot_note>
<meta><instanceID>uuid:90e7d603-86c0-46fc-808f-ea0baabdc082</instanceID></meta>
</Landfire_Feedback_Point_xlsform>
Here is a little python script that demonstrates that one will work, while the other will not. I'm just looking for an explanation as to why one is seen by ElementTree as an xml file while the other isn't. Specifically, the one that doesn't seem to parse fails with a "'NONE' type doesn't have a 'text' attribute" or something similar. But, it's because it doesn't seem to consider the file as xml or it can't see any elements beyond the opening line. Any explanation or direction with regard to this error would be appreciated. Thanks in advance.
Python script:
import os
from xml.etree import ElementTree
def replace_xml_attribute_in_file(original_file,element_name,attribute_value):
#THIS FUNCTION ONLY WORKS ON XML FILES WITH UNIQUE ELEMENT NAMES
# -DUPLICATE ELEMENT NAMES WILL ONLY GET THE FIRST ELEMENT WITH A GIVEN NAME
#split original filename and add tempfile name
tempfilename="temp.xml"
rootsplit = original_file.rsplit('\\') #split the root directory on the backslash
rootjoin = '\\'.join(rootsplit[:-1]) #rejoin the root diretory parts with a backslash -minus the last
temp_file = os.path.join(rootjoin,tempfilename)
et = ElementTree.parse(original_file)
author=et.find(element_name)
author.text = attribute_value
et.write(temp_file)
if os.path.exists(temp_file) and os.path.exists(original_file): #if both the original and the temp files exist
os.remove(original_file) #erase the original
os.rename(temp_file,original_file) #rename the new file
else:
print "Something went wrong."
replace_xml_attribute_in_file("testfile1.xml","image_close","whoopdeedoo.jpg");
Here is a little python script that demonstrates that one will work, while the other will not. I'm just looking for an explanation as to why one is seen by ElementTree as an xml file while the other isn't.
Your code doesn't demonstrate that at all. It demonstrates that they're both seen by ElementTree as valid XML files chock full of nodes. They both parse just fine, they both read past the first line, etc.
The only problem is that the first one doesn't have a node named 'image_close', so your code doesn't work.
You can see that pretty easily:
for node in et.getroot().getchildren():
print node.tag
You get 9 children of the root, with either version.
And the output to that should show you the problem. The node you want is actually named {http://opendatakit.org/submissions}image_close in the first example, rather than image_close as in the second.
And, as you can probably guess, this is because of the namespace=http://opendatakit.org/submissions in the root node. ElementTree uses the "James Clark notation" for mapping unknown-namespaced names to universal names.
Anyway, because none of the nodes are named image_close, the et.find(element_name) returns None, so your code stores author=None, then tries to assign to author.text, and gets an error.
As for how to fix this problem… well, you could learn how namespaces work by default in ElementTree, or you could upgrade to Python 2.7 or install a newer ElementTree for 2.6 that lets you customize things more easily. But if you want to do custom namespace handling and also stick with your old version… I'd start with this article (and its two predecessors) and this one.
I've written a Python script to create some XML, but I didn't find a way to edit the heading within Python.
Basically, instead of having this as my heading:
<?xml version="1.0" ?>
I need to have this:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
I looked into this as best I could, but found no way to add standalone status from within Python. So I figured that I'd have to go over the file after it had been created and then replace the text. I read in several places that I should stay far away from using readlines() because it could ruin the XML formatting.
Currently the code I have to do this - which I got from another Stackoverflow post about editing XML with Python - is:
doc = parse('file.xml')
elem = doc.find('xml version="1.0"')
elem.text = 'xml version="1.0" encoding="UTF-8" standalone="no"'
That provides me with a KeyError. I've tried to debug it to no avail, which leads me to believe that perhaps the XML heading wasn't meant to be edited in this way. Or my code is just wrong.
If anyone is curious (or miraculously knows how to insert standalone status into Python code), here is my code to write the xml file:
with open('file.xml', 'w') as f:
f.write(doc.toprettyxml(indent=' '))
Some people don't like "toprettyxml", but with my relatively basic level, it seemed like the best bet.
Anyway, if anyone can provide some advice or insight, I would be very grateful.
The xml.etree API does not give you any options to write out a standalone attribute in the XML declaration.
You may want to use the lxml library instead; it uses the same ElementTree API, but offers more control over the output. tostring() supports a standalone flag:
from lxml import etree
etree.tostring(doc, pretty_print=True, standalone=False)
or use .write(), which support the same options:
doc.write(outputfile, pretty_print=True, standalone=False)