I am building a GUI with appJar (a Python library built on Tkinter), and I have an XML file that I parse with the ElementTree XML library.
I want to display the XML file in a tree view.
To do that, I parse the file with ElementTree, pick out the tags I need to show, build a new XML object from them, and pass that object to the appJar function .addTree().
But I get this error:
..lib\site-packages\appJar\appjar.py", line 8764, in addTree
xmlDoc = parseString(data)
...lib\xml\dom\expatbuilder.py", line 223, in parseString
parser.Parse(string, True)
TypeError: a bytes-like object is required, not 'ElementTree'
xml = et.Element(root)
print(xml)
for ele in valList:
    reg = et.SubElement(xml, ele.find('Name').text)
    bitroot = ele.findall('Bit')
    for bit in bitroot:
        et.SubElement(reg, bit.find('Name').text)
xmltree = et.ElementTree(xml)
app.startFrame('bottomleft', 1, 0, 2)
app.setBg('orange')
app.setSticky('news')
app.setStretch('none')
app.addTree('REGISTER', xmltree)
As far as I can understand, the error occurs because the .addTree() API is unable to read the format of the xmltree variable.
According to appJar documentation, you need to pass an XML string to .addTree(), not an ElementTree. According to ElementTree documentation, you can use xml.etree.ElementTree.tostring() to build an XML string from your Element:
xml_string = et.tostring(xml)
app.addTree('REGISTER', xml_string)
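A minimal sketch of the serialization step on its own, without the GUI. The `registers` dictionary and the `REGISTERS` root tag are hypothetical stand-ins for the question's `valList` and `root`. By default `tostring()` returns bytes, which matches the "bytes-like object" wording in the traceback; `encoding="unicode"` returns a `str` instead, in case that is what is needed.

```python
import xml.etree.ElementTree as et

# Hypothetical register layout standing in for the question's valList.
registers = {"CTRL": ["EN", "MODE"], "STATUS": ["BUSY"]}

# Build the tree the same way the question does: one element per
# register, one sub-element per bit.
root = et.Element("REGISTERS")
for reg_name, bit_names in registers.items():
    reg = et.SubElement(root, reg_name)
    for bit_name in bit_names:
        et.SubElement(reg, bit_name)

# Serialize the Element (not an ElementTree wrapper) to XML text.
xml_bytes = et.tostring(root)                     # bytes
xml_text = et.tostring(root, encoding="unicode")  # str
```

Either value is plain XML text rather than a tree object, which is what a function that calls `parseString()` internally can work with.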
Related
I'm trying to parse an XML document I retrieve from the web, but it crashes after parsing with this error:
': failed to load external entity "<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="GreenButtonDataStyleSheet.xslt"?>
That is the second line in the XML that is downloaded. Is there a way to prevent the parser from trying to load the external entity, or another way to solve this? This is the code I have so far:
import urllib2
import lxml.etree as etree
file = urllib2.urlopen("http://www.greenbuttondata.org/data/15MinLP_15Days.xml")
data = file.read()
file.close()
tree = etree.parse(data)
In concert with what mzjn said, if you do want to pass a string to etree.parse(), just wrap it in a StringIO object.
Example:
from lxml import etree
from StringIO import StringIO
myString = "<html><p>blah blah blah</p></html>"
tree = etree.parse(StringIO(myString))
This method is used in the lxml documentation.
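The example above is Python 2 (`StringIO` module). A rough Python 3 equivalent is sketched below, assuming the downloaded data is held as bytes. To keep it runnable without extra packages it uses the standard library's `xml.etree.ElementTree` as a stand-in; `lxml.etree` provides `parse()` and `fromstring()` with the same basic shape.

```python
import io
import xml.etree.ElementTree as etree  # stand-in for lxml.etree in this sketch

# XML content already held in memory, e.g. the result of a download.
data = b"<html><p>blah blah blah</p></html>"

# Option 1: wrap the bytes in a file-like object so parse() accepts them.
tree = etree.parse(io.BytesIO(data))
root_from_tree = tree.getroot()

# Option 2: skip the wrapper and parse the content directly.
# Note that fromstring() returns the root Element, not an ElementTree.
root_direct = etree.fromstring(data)
```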
etree.parse(source) expects source to be one of:
- a file name/path
- a file object
- a file-like object
- a URL using the HTTP or FTP protocol
The problem is that you are supplying the XML content as a string.
You can also do without urllib2.urlopen(). Just use
tree = etree.parse("http://www.greenbuttondata.org/data/15MinLP_15Days.xml")
Demonstration (using lxml 2.3.4):
>>> from lxml import etree
>>> tree = etree.parse("http://www.greenbuttondata.org/data/15MinLP_15Days.xml")
>>> tree.getroot()
<Element {http://www.w3.org/2005/Atom}feed at 0xedaa08>
>>>
In a competing answer, it is suggested that lxml fails because of the stylesheet referenced by the processing instruction in the document. But that is not the problem here. lxml does not try to load the stylesheet, and the XML document is parsed just fine if you do as described above.
If you want to actually load the stylesheet, you have to be explicit about it. Something like this is needed:
from lxml import etree
tree = etree.parse("http://www.greenbuttondata.org/data/15MinLP_15Days.xml")
# Create an _XSLTProcessingInstruction object
pi = tree.xpath("//processing-instruction()")[0]
# Parse the stylesheet and return an ElementTree
xsl = pi.parseXSL()
The lxml documentation for parse() says: "To parse from a string, use the fromstring() function instead."
parse(...)
parse(source, parser=None, base_url=None)
Return an ElementTree object loaded with source elements. If no parser
is provided as second argument, the default parser is used.
The ``source`` can be any of the following:
- a file name/path
- a file object
- a file-like object
- a URL using the HTTP or FTP protocol
To parse from a string, use the ``fromstring()`` function instead.
Note that it is generally faster to parse from a file path or URL
than from an open file object or file-like object. Transparent
decompression from gzip compressed sources is supported (unless
explicitly disabled in libxml2).
You're getting that error because the XML you're loading references an external resource:
<?xml-stylesheet type="text/xsl" href="GreenButtonDataStyleSheet.xslt"?>
LXML doesn't know how to resolve GreenButtonDataStyleSheet.xslt. You and I probably realize that it's going to be available relative to your original URL, http://www.greenbuttondata.org/data/15MinLP_15Days.xml...the trick is to tell lxml how to go about loading it.
The lxml documentation includes a section titled "Document loading and URL resolving", which has just about all the information you need.
I have an XML file with this header:
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type='text/xsl' href='\\segotn12805\ppr\PPRData3\StyleSheet\PPRData3.xslt'?>
When I modify the file, I use .write(), for example:
mytree.write('output.xml')
but the output file does not contain the header info.
The first two lines of the output file look like this:
<ns0:pprdata xmlns:ns0="http://ManHub.PPRData">
<ns0:Group name="Models">
Any ideas on how I can add the header info to the output file?
The first line is the XML declaration. It is optional, and a parser will assume UTF-8 if not specified.
The second line is a processing instruction.
It would be helpful if you provided more code to show what you are doing, but I suspect that you are using ElementTree. The documentation has this note indicating that by default these are skipped:
Note Not all elements of the XML input will end up as elements of the parsed tree. Currently, this module skips over any XML comments, processing instructions, and document type declarations in the input. Nevertheless, trees built using this module’s API rather than parsing from XML text can have comments and processing instructions in them; they will be included when generating XML output. A document type declaration may be accessed by passing a custom TreeBuilder instance to the XMLParser constructor.
As suggested in this answer, you might want to try using lxml.
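If staying with the standard library is preferable, one possible workaround is to write the two header lines yourself before serializing the tree. This is only a sketch and not the approach the answer recommends; the element names and the stylesheet `href` are hypothetical, and an in-memory buffer stands in for the output file.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document standing in for the parsed and modified tree.
root = ET.fromstring('<pprdata><Group name="Models" /></pprdata>')
tree = ET.ElementTree(root)

# Processing instruction built through the API; the href is a placeholder.
pi = ET.ProcessingInstruction(
    "xml-stylesheet", 'type="text/xsl" href="PPRData3.xslt"'
)

buf = io.BytesIO()  # stand-in for open('output.xml', 'wb')
buf.write(b'<?xml version="1.0" encoding="utf-8"?>\n')  # XML declaration
buf.write(ET.tostring(pi) + b"\n")                      # stylesheet PI
# The declaration is already written, so ask write() not to add another.
tree.write(buf, encoding="utf-8", xml_declaration=False)

output = buf.getvalue().decode("utf-8")
```

Writing to a real file works the same way as long as it is opened in binary mode.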
I am writing Python code that not only reads an XML file but also sends the parsing results as an email. Right now I am having trouble just reading the XML file. I wrote a simple Python script that I thought would at least read the file, which I could then try to email from Python, but I am getting a syntax error on line 4.
root.tag 'log'
Anyway, here is the code I have written so far:
import xml.etree.cElementTree as etree
tree = etree.parse('C:/opidea.xml')
response = tree.getroot()
log = response.find('log').text
logentry = response.find('logentry').text
author = response.find('author').text
date = response.find('date').text
msg = [i.text for i in response.find('msg')]
The XML file is formatted like this:
<log>
<logentry
revision="12345">
<author>glv</author>
<date>2012-08-09T13:16:24.488462Z</date>
<paths>
<path
action="M"
kind="file">/trunk/build.xml</path>
</paths>
<msg>BUG_NUMBER:N/A
FEATURE_AFFECTED:N/A
OVERVIEW:Example</msg>
</logentry>
</log>
I want to be able to send an email of this xml file. For now though I am just trying to get the python code to read the xml file.
response.find('log') won't find anything, because:
find(self, path, namespaces=None)
Finds the first matching subelement, by tag name or path.
In your case log is not a subelement, but rather the root element itself. You can get its text directly, though: response.text. But in your example the log element doesn't have any text in it, anyway.
EDIT: Sorry, that quote from the docs actually applies to lxml.etree documentation, rather than xml.etree.
I'm not sure about the reason, but all other calls to find also return None (you can find it out by printing response.find('date') and so on). With lxml you can use xpath instead:
author = response.xpath('//author')[0].text
msg = [i.text for i in response.xpath('//msg')]
In any case, your use of find is not correct for msg, because find always returns a single element, not a list of them.
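For the standard-library parser the question actually uses, the same lookups can be sketched without xpath(). A bare tag name passed to `find()` matches direct children only, so elements nested under `logentry` need either an explicit path or the `.//` prefix. The XML here is a shortened inline copy of the question's file.

```python
import xml.etree.ElementTree as etree

# Shortened copy of the log file from the question, inlined for the sketch.
LOG_XML = """<log>
<logentry revision="12345">
<author>glv</author>
<date>2012-08-09T13:16:24.488462Z</date>
<msg>BUG_NUMBER:N/A
FEATURE_AFFECTED:N/A
OVERVIEW:Example</msg>
</logentry>
</log>"""

root = etree.fromstring(LOG_XML)  # root is the <log> element itself

# A bare tag name only matches direct children, so this finds nothing.
direct = root.find('author')

# './/' searches all descendants; 'logentry/date' spells out the path.
author = root.find('.//author').text
date = root.findtext('logentry/date')

# iter() walks every matching descendant, which suits repeated entries.
messages = [msg.text for msg in root.iter('msg')]

# Attributes are read with get() on the element that carries them.
revision = root.find('logentry').get('revision')
```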
I have an etree object called projetxml:
projetxml type <type 'lxml.etree._Element'>
I need to save it on disk, so I convert it to element tree:
savedxml=et.ElementTree(projetxml)
savedxml.write('/home/simon/Vysis.xml')
Another script has to load Vysis.xml and two other files of the same kind:
vysis=et.parse('/home/simon/Vysis.xml')
asi=et.parse('/home/simon/ASI.xml')
psi=et.parse('/home/simon/PSI.xml')
Now the asi, psi and vysis lxml objects are of this type, for example:
<lxml.etree._ElementTree object at 0xa7eaf8c>
My problem is that I can no longer do:
R=et.Element('DataBase')
R.append(asi)
R.append(psi)
R.append(vysis)
because of the error:
R.append(asi)
File "lxml.etree.pyx", line 697, in lxml.etree._Element.append (src/lxml/lxml.etree.c:35471)
TypeError: Argument 'element' has incorrect type (expected lxml.etree._Element, got lxml.etree._ElementTree)
I suppose I have two solutions. The first would be to avoid converting the etree.Element to an etree.ElementTree and save it "directly", but I don't know how. The second would be to convert the etree.ElementTree back to an etree.Element. Is there a clean way to save and load an XML object?
The parse function returns an ElementTree, not an Element. If you want to use the results of parse as elements, you need to call getroot.
vysis=et.parse('/home/simon/Vysis.xml').getroot()
asi=et.parse('/home/simon/ASI.xml').getroot()
psi=et.parse('/home/simon/PSI.xml').getroot()
R=et.Element('DataBase')
R.append(asi)
R.append(psi)
R.append(vysis)
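The full save and load round trip can be sketched as follows. It uses the standard library's `xml.etree.ElementTree` so that it runs without lxml installed; the `ElementTree(...).write()`, `parse()` and `getroot()` calls used here exist in lxml under the same names. The temporary directory and the empty placeholder elements are assumptions made for the sketch.

```python
import os
import tempfile
import xml.etree.ElementTree as et  # stand-in for lxml.etree in this sketch

with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name in ("Vysis", "ASI", "PSI"):
        path = os.path.join(tmp, name + ".xml")
        # Saving: wrap the Element in an ElementTree only for writing.
        et.ElementTree(et.Element(name)).write(path)
        paths.append(path)

    # Loading: parse() returns an ElementTree, so unwrap it with
    # getroot() before appending it to another Element.
    database = et.Element("DataBase")
    for path in paths:
        database.append(et.parse(path).getroot())

child_tags = [child.tag for child in database]
```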