Python xml ElementTree from a string source?

ElementTree.parse reads from a file; how can I use it if I already have the XML data in a string?
Maybe I am missing something here, but there must be a way to use the ElementTree without writing out the string to a file and reading it again.

You can parse the text as a string, which creates an Element, and create an ElementTree using that Element.
import xml.etree.ElementTree as ET
tree = ET.ElementTree(ET.fromstring(xmlstring))
I just came across this issue and the documentation, while complete, is not very straightforward on the difference in usage between the parse() and fromstring() methods.
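A minimal sketch of the difference, using a made-up XML string (the contents of xmlstring here are an assumption). parse() expects a file name or file object and returns an ElementTree, while fromstring() takes the XML text itself and returns the root Element:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical XML text standing in for your own data.
xmlstring = "<config><item name='a'>1</item><item name='b'>2</item></config>"

# fromstring() parses text and returns the root Element.
root = ET.fromstring(xmlstring)

# parse() reads from a file name or file object and returns an ElementTree;
# wrapping the text in StringIO lets it read the same data without a real file.
tree_from_file_api = ET.parse(io.StringIO(xmlstring))

# Wrapping the Element yourself gives an equivalent ElementTree.
tree = ET.ElementTree(root)

print(type(root).__name__, type(tree).__name__)
print(tree.getroot().tag, tree_from_file_api.getroot().tag)
```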

If you would use xml.etree.ElementTree.parse to parse a file, you can use xml.etree.ElementTree.fromstring to parse a string instead; it returns the root Element of the document rather than an ElementTree. Often you don't actually need an ElementTree.
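A short sketch of working on the root Element directly; the XML string is made up for illustration. find(), findall(), findtext() and iter() are all available on an Element, so an ElementTree wrapper is only needed for tree-level operations such as ElementTree.write():

```python
import xml.etree.ElementTree as ET

# Hypothetical document used only for this example.
xmlstring = """<library>
  <book year="1999"><title>First</title></book>
  <book year="2005"><title>Second</title></book>
</library>"""

root = ET.fromstring(xmlstring)  # an Element, not an ElementTree

# Query the Element directly; no ElementTree is involved.
titles = [book.findtext("title") for book in root.findall("book")]
years = [int(book.get("year")) for book in root.iter("book")]

print(titles, years)
```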
See xml.etree.ElementTree

You need xml.etree.ElementTree.fromstring(text):
from xml.etree.ElementTree import fromstring
myxml = fromstring(text)  # myxml is the root Element of the document

io.StringIO is another option for getting XML into xml.etree.ElementTree:
import io
import xml.etree.ElementTree as ET
f = io.StringIO(xmlstring)
tree = ET.parse(f)
root = tree.getroot()
However, the XML declaration in the source string is not preserved in tree; if you need one in the output, you have to request it from ElementTree.write() (for example with xml_declaration=True). See How to write XML declaration using xml.etree.ElementTree.
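A sketch of both points, assuming a small made-up document: the declaration in the source text is not kept on the parsed tree, and ElementTree.write() emits one when asked with xml_declaration=True. Writing to an io.BytesIO keeps everything in memory:

```python
import io
import xml.etree.ElementTree as ET

# Made-up input that starts with an XML declaration.
xmlstring = '<?xml version="1.0" encoding="UTF-8"?>\n<root><child>text</child></root>'

tree = ET.parse(io.StringIO(xmlstring))
root = tree.getroot()

# tostring() with its default encoding does not reproduce the original declaration.
without_decl = ET.tostring(root)

# Ask write() for a declaration explicitly; BytesIO avoids creating a file.
buf = io.BytesIO()
tree.write(buf, encoding="utf-8", xml_declaration=True)
with_decl = buf.getvalue()

print(without_decl)
print(with_decl)
```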

Related

Using xml.etree.ElementTree to parse a buffer, not a file [duplicate]

Modifying XML file from string source - how to do it?

I have a problem: I want to change some lines in my XML, but the XML is not in a file, it is in a string. I am using Python 3.x and the xml.etree.ElementTree library for this purpose.
I have this piece of code, which I know works for files in my project, but as I said, I want no files, only operations on string sources.
source_tree = ET.ElementTree(ET.fromstring(source_config))
source_tree_root = ET.fromstring(source_config)
for item in source_tree_root.iter('generation'):
    item.text = item.text.replace(self.firstarg, self.secondarg)
This works, but I don't know how to save it. I tried
source_tree.write(source_config, encoding='latin-1'), but this doesn't work: write() treats the whole XML string as a file name.
I don't think you need both source_tree and source_tree_root. By having both, you're creating two separate things. When you write using source_tree, you don't get the changes made to source_tree_root.
Try creating an ElementTree from source_tree_root (which is just an Element), like this (untested since you didn't supply an mcve)...
source_tree_root = ET.fromstring(source_config)
for item in source_tree_root.iter('generation'):
    item.text = item.text.replace(self.firstarg, self.secondarg)
ET.ElementTree(source_tree_root).write("output.xml")
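Since the goal is to avoid files altogether, a possible variant serializes the modified root back to a string with ET.tostring(); encoding="unicode" makes it return str instead of bytes. The sample XML and the firstarg/secondarg values below are hypothetical stand-ins for the question's source_config and self.firstarg/self.secondarg:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the question's source_config, self.firstarg and self.secondarg.
source_config = "<config><generation>old-value</generation><other>old</other></config>"
firstarg, secondarg = "old", "new"

source_tree_root = ET.fromstring(source_config)
for item in source_tree_root.iter("generation"):
    # Guard against empty elements, whose .text is None.
    if item.text:
        item.text = item.text.replace(firstarg, secondarg)

# Serialize back to a str; no file is involved.
modified_config = ET.tostring(source_tree_root, encoding="unicode")
print(modified_config)
```

Only the generation element is modified; other elements such as other keep their original text.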
OK, I thought that you were using lxml. My bad. Well, here is how you would do it with lxml; I think lxml is superior.
Here is the basic way of parsing a string into an XML document.
from lxml import etree
doc = etree.fromstring(yourstring)
for e in doc.xpath('//sometagname'):
    e.set('foo', 'bar')
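For comparison, a rough standard-library equivalent of the lxml snippet. ElementTree has no full xpath() method, but its limited XPath support covers a './/sometagname' search (root.iter('sometagname') would work too). The tag name and attribute values are the answer's placeholders; the XML text is made up:

```python
import xml.etree.ElementTree as ET

# Made-up input; "sometagname", "foo" and "bar" are the placeholders from the lxml example.
yourstring = "<root><sometagname/><group><sometagname/></group></root>"

doc = ET.fromstring(yourstring)
# './/sometagname' matches the tag at any depth below doc, similar to '//sometagname'
# in lxml, except that doc itself is never included in the matches.
for e in doc.findall(".//sometagname"):
    e.set("foo", "bar")

print(ET.tostring(doc, encoding="unicode"))
```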

How can I see the content of an XML file in Python?

I'm struggling to find a way of seeing the content of an XML file. I have done a lot of searching, but so far all I manage to do is run my code without getting any output.
Have a look at the BeautifulSoup package and use the lxml parser.
Based on this url:
https://pymotw.com/2/xml/etree/ElementTree/parse.html
the relevant code:
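from xml.etree import ElementTree
with open('example.xml', 'rt') as f:
    tree = ElementTree.parse(f)
# print(tree) would only show the ElementTree object's repr;
# ElementTree.dump() writes the XML itself to standard output.
ElementTree.dump(tree)
This prints the XML content of the file.
ElementTree is also good for parsing the file and searching for elements.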
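To see more than the raw XML, a small sketch that walks every element and lists its tag, attributes and text. The sample document is made up; for a file, ElementTree.parse('example.xml').getroot() gives the same kind of root Element:

```python
import xml.etree.ElementTree as ET


def describe(root):
    """Return one line per element: indentation by depth, tag, attributes, stripped text."""
    lines = []

    def walk(elem, depth):
        text = (elem.text or "").strip()
        lines.append(f"{'  ' * depth}{elem.tag} {elem.attrib} {text!r}")
        for child in elem:
            walk(child, depth + 1)

    walk(root, 0)
    return lines


# Made-up document; replace with ET.parse("example.xml").getroot() for a real file.
root = ET.fromstring("<notes><note id='1'>Buy milk</note><note id='2'/></notes>")
for line in describe(root):
    print(line)
```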

Python ElementTree doesn't seem to recognize text nodes

I am trying to parse a simple XML document located at http://www.webservicex.net/airport.asmx/getAirportInformationByAirportCode?airportCode=jfk using the ElementTree module. The code (so far):
import urllib2
from xml.etree import ElementTree
from xml.etree.ElementTree import Element
from xml.etree.ElementTree import SubElement
url = "http://www.webservicex.net/airport.asmx/getAirportInformationByAirportCode?airportCode=jfk"
s = urllib2.urlopen(url)
print s
document = ElementTree.parse(s)
root = document.getroot()
print root
dataset = SubElement(root, 'NewDataSet')
print dataset
table = SubElement(dataset, 'Table')
print table
airportName = SubElement(table, 'CityOrAirportName')
print airportName.text
The final line prints None, not the name of the airport in the XML. Can anyone assist? This should be relatively simple, but I am missing something.
Look at the documentation for that module. It says, among other things:
The SubElement() function also provides a convenient way to create new sub-elements for a given element
In particular, note the word "create". You are creating a new element, not reading the elements that are already there.
If you want to locate certain elements within the parsed XML, read the rest of the documentation on that page to understand how to use the library to do that.
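A minimal sketch of the difference between locating and creating elements. The document below is hypothetical, built only from the element names in the question; the real service response may differ, and if it uses a namespace the path needs the '{namespace}tag' form or the namespaces argument of find()/findtext():

```python
import xml.etree.ElementTree as ET

# Hypothetical document shape based on the element names in the question.
xml_text = """<NewDataSet>
  <Table>
    <CityOrAirportName>Example Intl</CityOrAirportName>
  </Table>
</NewDataSet>"""

root = ET.fromstring(xml_text)

# find()/findtext() locate elements that already exist in the parsed document.
name = root.findtext("Table/CityOrAirportName")
print(name)

# SubElement() appends a brand-new, empty child, which is why its .text is None.
created = ET.SubElement(root, "Table")
print(created.text)
```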

How to read and parse XML without schema in Python?

Is there a way to read an XML document in Python without the schema? In my use case there is a file similar to the following.
<people>
  <human>
    <weight>75</weight>
    <height>174</height>
  </human>
  <human>
    <weight>89</weight>
    <height>187</height>
  </human>
</people>
I need to extract an array of the weight values. It could easily be done with string manipulation, but surely there is a cleaner way to do it with an XML parser?
You could use ElementTree (included in the Python standard library) and do the following:
import xml.etree.ElementTree
tree = xml.etree.ElementTree.parse("foo.xml")
myArray = [int(x.text) for x in tree.getroot().findall("human/weight")]
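If the data arrives as a string rather than a file, a sketch of the same extraction with ET.fromstring(), using the document from the question. No schema is required; ElementTree only needs well-formed XML:

```python
import xml.etree.ElementTree as ET

people_xml = """<people>
  <human>
    <weight>75</weight>
    <height>174</height>
  </human>
  <human>
    <weight>89</weight>
    <height>187</height>
  </human>
</people>"""

root = ET.fromstring(people_xml)  # the <people> element
# "human/weight" is a relative path: <weight> children of each <human> child of the root.
weights = [int(w.text) for w in root.findall("human/weight")]
print(weights)
```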
