How can I see the content of an XML file in Python - python

I'm struggling to find a way to see the content of an XML file. I have done a lot of searching, and the only progress I am making is running my code over and over without any results.

Have a look at the BeautifulSoup package and use the lxml parser.

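Based on this URL:
https://pymotw.com/2/xml/etree/ElementTree/parse.html
the relevant code:
from xml.etree import ElementTree
with open('example.xml', 'rt') as f:
    tree = ElementTree.parse(f)
# print(tree) would only show the object's repr, so serialize the root element instead
print(ElementTree.tostring(tree.getroot(), encoding='unicode'))
This prints the XML content of the file.
ElementTree is also good for parsing the file and searching for elements.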
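To look at the content element by element rather than as one serialized string, something like the following sketch may help. The sample document, the temporary file, and the helper name describe are assumptions made up for this illustration; they are not part of the original answer.

```python
import os
import tempfile
from xml.etree import ElementTree

# Hypothetical sample document, written to a temporary file for the demo.
SAMPLE = (
    "<catalog>"
    "<book id='1'><title>First</title></book>"
    "<book id='2'><title>Second</title></book>"
    "</catalog>"
)


def describe(path):
    """Return one (tag, attributes, text) tuple per element, in document order."""
    tree = ElementTree.parse(path)
    return [
        (el.tag, dict(el.attrib), (el.text or "").strip())
        for el in tree.iter()
    ]


# Write the sample to disk so that parse() has a real file to read.
with tempfile.NamedTemporaryFile(
    "w", suffix=".xml", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(SAMPLE)
    path = tmp.name

try:
    rows = describe(path)
    for tag, attrib, text in rows:
        print(tag, attrib, text)
finally:
    os.remove(path)  # clean up the temporary file
```

With a real file, only describe('example.xml') would be needed; the temporary file exists purely to keep the sketch self-contained.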

Related

Using xml.etree.ElementTree to parse a buffer, not a file [duplicate]

The ElementTree.parse reads from a file, how can I use this if I already have the XML data in a string?
Maybe I am missing something here, but there must be a way to use the ElementTree without writing out the string to a file and reading it again.
xml.etree.elementtree
You can parse the text as a string, which creates an Element, and create an ElementTree using that Element.
import xml.etree.ElementTree as ET
tree = ET.ElementTree(ET.fromstring(xmlstring))
I just came across this issue and the documentation, while complete, is not very straightforward on the difference in usage between the parse() and fromstring() methods.
If you're using xml.etree.ElementTree.parse to parse from a file, then you can use xml.etree.ElementTree.fromstring to parse from a string instead; it returns the root Element of the document. Often you don't actually need an ElementTree.
See xml.etree.ElementTree
You need xml.etree.ElementTree.fromstring(text):
from xml.etree.ElementTree import fromstring
myxml = fromstring(text)
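The difference between the two return types can be checked directly. This is a minimal sketch; the sample string is a made-up assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, not taken from the question.
xmlstring = "<root><child name='a'>text</child></root>"

root = ET.fromstring(xmlstring)   # returns the root Element directly
tree = ET.ElementTree(root)       # wraps that Element in an ElementTree

print(type(root).__name__)
print(type(tree).__name__)
print(tree.getroot() is root)     # the wrapper hands back the same Element

# Searching works on the Element itself, so the wrapper is often unnecessary.
print(root.find("child").get("name"))
```

The ElementTree wrapper mainly matters when you want its write() method; for reading and searching, the Element returned by fromstring() is usually enough.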
io.StringIO is another option for getting XML into xml.etree.ElementTree:
import io
f = io.StringIO(xmlstring)
tree = ET.parse(f)
root = tree.getroot()
However, this does not preserve the XML declaration one would assume to be in the tree (although it is needed when writing with ElementTree.write()). See How to write XML declaration using xml.etree.ElementTree.
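As a rough illustration of the declaration point: the parsed tree does not carry the declaration, but write() can emit one when asked through its xml_declaration parameter. The sample string and the in-memory buffer are assumptions for this sketch.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical input that starts with an XML declaration.
xmlstring = '<?xml version="1.0" encoding="UTF-8"?><root><item>1</item></root>'

tree = ET.parse(io.StringIO(xmlstring))
root = tree.getroot()

# Serializing the root to text does not bring the declaration back.
plain = ET.tostring(root, encoding="unicode")

# write() emits a declaration when xml_declaration=True is passed.
buf = io.BytesIO()
tree.write(buf, encoding="utf-8", xml_declaration=True)
with_declaration = buf.getvalue().decode("utf-8")

print(plain)
print(with_declaration)
```

Writing to an io.BytesIO buffer keeps everything in memory; passing a file name to write() instead would produce the same bytes on disk.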

Modifying XML file from string source - how to do it?

I have a problem: I want to change some lines in my XML, but this XML is not in a file, it is in a string. I am using Python 3.x and the xml.etree.ElementTree library for this purpose.
I have this piece of code, which I know works for files in the project, but as I said, I want no files, only operations on string sources.
source_tree = ET.ElementTree(ET.fromstring(source_config))
source_tree_root = ET.fromstring(source_config)
for item in source_tree_root.iter('generation'):
    item.text = item.text.replace(self.firstarg, self.secondarg)
This works, but I don't know how to save it. I tried
source_tree.write(source_config, encoding='latin-1'), but this doesn't work (it treats the whole XML string as a file name).
I don't think you need both source_tree and source_tree_root. By having both, you're creating two separate things. When you write using source_tree, you don't get the changes made to source_tree_root.
Try creating an ElementTree from source_tree_root (which is just an Element), like this (untested since you didn't supply an mcve)...
source_tree_root = ET.fromstring(source_config)
for item in source_tree_root.iter('generation'):
    item.text = item.text.replace(self.firstarg, self.secondarg)
ET.ElementTree(source_tree_root).write("output.xml")
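If no file should be involved at all, the modified Element can be serialized back to a string with ET.tostring. The sketch below assumes a made-up source_config and a hypothetical helper replace_generation_text that stands in for the asker's method, with plain arguments in place of self.firstarg and self.secondarg.

```python
import xml.etree.ElementTree as ET


def replace_generation_text(source_config, old, new):
    """Replace `old` with `new` inside every <generation> element; return a string."""
    root = ET.fromstring(source_config)
    for item in root.iter("generation"):
        if item.text:  # empty elements have text None, so skip them
            item.text = item.text.replace(old, new)
    # encoding="unicode" makes tostring() return str instead of bytes
    return ET.tostring(root, encoding="unicode")


# Hypothetical input; only the <generation> element should change.
source_config = (
    "<config><generation>alpha-1</generation><other>alpha</other></config>"
)
updated = replace_generation_text(source_config, "alpha", "beta")
print(updated)
# → <config><generation>beta-1</generation><other>alpha</other></config>
```

Only one parse is performed here, so the changes and the serialized output refer to the same Element.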
OK, I thought that you were using lxml. My bad. Well here is how you would do it with lxml. I think that lxml is superior.
Here is the basic way of parsing a string into an XML document.
from lxml import etree
doc = etree.fromstring(yourstring)
for e in doc.xpath('//sometagname'):
    e.set('foo', 'bar')

Find all titles in an XML with ElementTree from a bz2 file

I'm new to parsing XML and am stuck with my code for finding all titles (title tags) in an XML document. This is what I came up with, but it returns just an empty list, while there should be titles in there.
import bz2
from xml.etree import ElementTree as etree
def parse_xml(filename):
    with bz2.BZ2File(filename) as f:
        doc = etree.parse(f)
        titles = doc.findall('.//{http://www.mediawiki.org/xml/export-0.7/}title')
        print(titles[:10])
Can someone tell me why this is not working properly? Just to be clear; I need to find all text inside title tags stored in a list, taken from an XML wrapped in a bz2 file (as far as I read the best way is without unzipping).
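One possible cause, offered here only as a guess, is that the namespace hard-coded in the search path does not match the one the dump actually declares. The sketch below reads the namespace from the root tag instead of hard-coding it and collects the text of each title. The miniature dump, its export-0.8 namespace, and the helper name find_titles are assumptions for illustration.

```python
import bz2
import io
import xml.etree.ElementTree as ET

# Hypothetical miniature dump; the namespace version is an assumption.
SAMPLE = (
    '<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">'
    "<page><title>Alpha</title></page>"
    "<page><title>Beta</title></page>"
    "</mediawiki>"
)


def find_titles(fileobj):
    """Return the text of every <title> element, whatever namespace the file uses."""
    root = ET.parse(fileobj).getroot()
    # A namespaced tag looks like '{namespace}mediawiki'; reuse that prefix.
    if root.tag.startswith("{"):
        namespace = root.tag[: root.tag.index("}") + 1]
    else:
        namespace = ""
    return [el.text for el in root.iter(namespace + "title")]


# Compress the sample in memory so the demo needs no file on disk.
compressed = bz2.compress(SAMPLE.encode("utf-8"))
with bz2.open(io.BytesIO(compressed)) as f:
    titles = find_titles(f)

print(titles[:10])
# → ['Alpha', 'Beta']
```

For a real dump, bz2.open(filename) can be passed to find_titles in the same way. Printing root.tag once is a quick way to see which namespace the file declares.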

Writing modified Beautiful Soup tree to file, while maintaining original XML formatting

We have an XML document that has a tag we wish to alter:
...<version>1.0</version>...
It's buried deep in the XML file, but we're successfully able to use Beautiful Soup to replace its contents with a command-line parameter.
The problem is that after modifying the tree, we need to write back to the file we read it from. But, we want to maintain the original formatting of the document. When I use:
fileForWriting = open(myXmlFile, 'w')
fileForWriting.write(soup.prettify())
The prettify() call breaks the formatting, and I end up with:
<version>
1.0
</version>
Is there any way to maintain the original formatting of the XML document, while replacing that single tag text?
Note: Using simply:
fileForWriting.write(str(soup))
Keeps the text and tags on the same line, but eliminates the indents and extra newlines that had been human-added for readability. Close, but no cigar.
By request, the entire script:
from BeautifulSoup import BeautifulSoup as bs
import sys
xmlFile = sys.argv[1:][0]
version = sys.argv[1:][1]
fileForReading = open(xmlFile, 'r')
xmlString = fileForReading.read()
fileForReading.close()
soup = bs(xmlString)
soup.findAll('version')[1].contents[0].replaceWith(version)
fileForWriting = open(xmlFile, 'w')
fileForWriting.write(str(soup))
fileForWriting.close()
The script is then run using:
python myscript.py someFile.xml 1.2
And if you use xml.etree.ElementTree, the tree.write(file) method replaces CRLF with LF only, which also creates issues when trying to import the XML file into, e.g., PyXB.
The solution I found is to use ElementTree just to find what I have to replace. Then I do source_XML = 'new value'.join(source_XML.split('what you need to replace')). Finally, a file.write(source_XML).
It's not nice, but it solves the issue. However, I do not care about the indentation, so I can't really comment on that. I would only use pprint.pprint() whenever I need to print it.
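A minimal sketch of that find-then-replace idea, assuming the value sits in a plain version element with no attributes or escaped characters. The helper name replace_version and the sample text are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def replace_version(source_xml, new_version):
    """Swap the text of the first <version> element, leaving all other text untouched."""
    # Use the parser only to discover the current value.
    current = ET.fromstring(source_xml).findtext(".//version")
    if current is None:
        return source_xml  # nothing to replace
    old_tag = "<version>" + current + "</version>"
    new_tag = "<version>" + new_version + "</version>"
    # Plain string replacement keeps indentation, comments, and line endings.
    return source_xml.replace(old_tag, new_tag, 1)


# Hypothetical input with CRLF line endings and hand-made indentation.
source_xml = (
    "<project>\r\n"
    "    <!-- keep me -->\r\n"
    "    <version>1.0</version>\r\n"
    "</project>\r\n"
)
result = replace_version(source_xml, "1.2")
print(repr(result))
```

When reading and writing the real file in Python 3, opening it with newline='' turns off newline translation, so CRLF line endings survive the round trip.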
