I'm not going to parse an XML file; I want to reproduce it in Python (i.e., build it with etree).
I know the basics of etree; what I want to know about is XML namespaces.
I've got 3 namespaces in the code. Here is the XML I want to reproduce:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><ns1:g2sMessage xmlns:ns1="http://www.gamingstandards.com/g2s/schemas/v1.0.3"><ns1:g2sBody ns1:dateTimeSent="2014-08-14T15:30:48.692+09:00" ns1:egmId="TestAppEGMID" ns1:hostId="1"><ns1:communications ns1:deviceId="0" ns1:sessionId="1001" ns1:sessionType="G2S_request" ns1:timeToLive="100" ns1:commandId="1001" ns1:dateTime="2014-08-14T06:30:48.696Z"><ns1:keepAlive/></ns1:communications></ns1:g2sBody></ns1:g2sMessage>
Does anyone have an idea?
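A minimal sketch with the standard-library ElementTree, assuming Python 3.8+ (for the xml_declaration argument of tostring() and insertion-ordered attributes). Every tag and attribute name is given in Clark notation, {uri}name, and register_namespace() only controls which prefix is written. ElementTree refuses to register prefixes of the form ns<digits> because it reserves them for auto-generated names, so "g2s" below is an arbitrary stand-in for ns1; the document means the same thing as long as the prefix maps to the same URI. The sample only shows one namespace; each additional one would get its own register_namespace() call. The q() helper is hypothetical, just for brevity, and ElementTree cannot write the standalone="yes" part of the declaration.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the target document.
G2S = "http://www.gamingstandards.com/g2s/schemas/v1.0.3"

# "ns1" cannot be registered (ElementTree reserves ns<digits> prefixes),
# so "g2s" is an arbitrary stand-in; the URI is what identifies the namespace.
ET.register_namespace("g2s", G2S)


def q(name):
    """Return a name in Clark notation, e.g. {uri}keepAlive."""
    return f"{{{G2S}}}{name}"


msg = ET.Element(q("g2sMessage"))
body = ET.SubElement(msg, q("g2sBody"), {
    q("dateTimeSent"): "2014-08-14T15:30:48.692+09:00",
    q("egmId"): "TestAppEGMID",
    q("hostId"): "1",
})
comms = ET.SubElement(body, q("communications"), {
    q("deviceId"): "0",
    q("sessionId"): "1001",
    q("sessionType"): "G2S_request",
    q("timeToLive"): "100",
    q("commandId"): "1001",
    q("dateTime"): "2014-08-14T06:30:48.696Z",
})
ET.SubElement(comms, q("keepAlive"))

# xml_declaration needs Python 3.8+; ElementTree has no way to write standalone="yes".
xml_bytes = ET.tostring(msg, encoding="UTF-8", xml_declaration=True)
print(xml_bytes.decode("utf-8"))
```

The namespace declaration ends up on the root element, which matches the target document.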
I use the xml library in Python 3.5 to read and write an XML file. I don't modify the content; I just open it and write it back. But the library changes the file anyway.
Why is it modified?
How can I prevent this? For example, I just want to replace a specific tag or its value in a fairly complex XML file without losing any other information.
This is the example file
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<movie>
<title>Der Eisbär</title>
<ids>
<entry>
<key>tmdb</key>
<value xsi:type="xs:int" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">9321</value>
</entry>
<entry>
<key>imdb</key>
<value xsi:type="xs:string" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">tt0167132</value>
</entry>
</ids>
</movie>
This is the code
import xml.etree.ElementTree as ET
tree = ET.parse('x.nfo')
tree.write('y.nfo', encoding='utf-8')
And the XML file becomes this:
<movie xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<title>Der Eisbär</title>
<ids>
<entry>
<key>tmdb</key>
<value xsi:type="xs:int">9321</value>
</entry>
<entry>
<key>imdb</key>
<value xsi:type="xs:string">tt0167132</value>
</entry>
</ids>
</movie>
Line 1 is gone.
The <movie> tag on line 2 now has attributes.
The <value> tags on lines 7 and 11 now have fewer attributes.
Note that "xml package" and "the xml library" are ambiguous. There are several XML-related modules in the standard library: https://docs.python.org/3/library/xml.html.
Why is it modified?
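ElementTree moves namespace declarations to the root element, and drops declarations for namespaces it doesn't see being used in element or attribute names. Here that includes xmlns:xs, which is only referenced inside the xsi:type attribute values, so the output still says xs:int but no longer declares the xs prefix.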
Why does ElementTree do this? I don't know, but perhaps it is a way to make the implementation simpler.
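A short reproduction of the behaviour, using an in-memory copy of one <value> entry from the example file instead of x.nfo. After parsing, the xmlns declarations are not kept anywhere; they are only used to resolve names into {uri}name keys, so on output ElementTree has to invent the declarations again from the names it finds.

```python
import xml.etree.ElementTree as ET

# One <value> entry from the example file, with its two local namespace declarations.
source = (
    '<movie><ids><entry><key>tmdb</key>'
    '<value xsi:type="xs:int"'
    ' xmlns:xs="http://www.w3.org/2001/XMLSchema"'
    ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">9321</value>'
    '</entry></ids></movie>'
)

root = ET.fromstring(source)
value = root.find("ids/entry/value")

# After parsing, the declarations are gone; only the resolved attribute name remains.
print(value.attrib)

# On output, xsi is re-declared on the root; xs is not declared anywhere,
# because "xs:int" is just an attribute value to ElementTree.
print(ET.tostring(root, encoding="unicode"))
```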
How can I prevent this? For example, I just want to replace a specific tag or its value in a fairly complex XML file without losing any other information.
I don't think there is a way to prevent this. The issue has been brought up before. Here are two very similar questions with no answers:
How do I parse and write XML using Python's ElementTree without moving namespaces around?
Keep Existing Namespaces when overwriting XML file with ElementTree and Python
My suggestion is to use lxml instead of ElementTree. With lxml, the namespace declarations will remain where they occur in the original file.
Line 1 is gone.
That line is the XML declaration. It is recommended but not mandatory to have one.
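If you always want an XML declaration, pass xml_declaration=True to the write() method. By default, ElementTree omits the declaration when the output encoding is UTF-8 or US-ASCII, which is why it disappeared here.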
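A small sketch comparing the two write() calls, using an in-memory stand-in for x.nfo (an assumption, to keep it self-contained). Note that the forced declaration is ElementTree's own: single-quoted, and without the standalone="yes" of the original file.

```python
import io
import xml.etree.ElementTree as ET

# In-memory stand-in for x.nfo (the original file starts with a declaration).
source = ('<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
          '<movie><title>Der Eisbär</title></movie>').encode("utf-8")
tree = ET.ElementTree(ET.fromstring(source))

# Default: for UTF-8 output ElementTree writes no declaration at all.
plain = io.BytesIO()
tree.write(plain, encoding="utf-8")

# Forced: a declaration is written, but standalone="yes" is not reproduced.
declared = io.BytesIO()
tree.write(declared, encoding="utf-8", xml_declaration=True)

print(plain.getvalue().decode("utf-8"))
print(declared.getvalue().decode("utf-8"))
```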
This is my code:
from xml.dom import minidom as md
doc = md.parse('file.props')
# operations with doc
# open with an explicit encoding so the bytes match the declared utf-8
with open('file.props', "w", encoding="utf-8") as xml_file:
    doc.writexml(xml_file, encoding="utf-8")
I parse an XML file, do some operations, then open the file and write it back. But, for example, if my file contains:
<MY_TAG />
it is rewritten as:
<MY_TAG/>
(the space before "/>" is dropped).
I know this may seem irrelevant, but the file is tracked by Git, which reports that line as changed on every write.
The same happens with the header:
<?xml version="1.0" encoding="utf-8"?>
<Project ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
it becomes:
<?xml version="1.0" encoding="utf-8"?><Project ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
Which is pretty annoying. Any clues?
Retaining quirks of formatting in XML through parsing and serialization is pretty well impossible. If you need to do text-level comparisons, the only way to do it is to canonicalize the formats that you are comparing (there are various XML canonicalization tools out there).
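Python 3.8+ ships one such canonicalization tool: xml.etree.ElementTree.canonicalize(), which implements C14N 2.0. A minimal sketch, using trimmed in-memory versions of the two snippets from the question rather than file.props, showing that the formatting differences vanish after canonicalization: the XML declaration is dropped and empty elements are always written as a start/end tag pair. This only helps when comparing files; it does not make minidom preserve the original formatting.

```python
from xml.etree.ElementTree import canonicalize  # Python 3.8+

MSBUILD_NS = "http://schemas.microsoft.com/developer/msbuild/2003"

# Trimmed stand-ins for the file before and after minidom rewrites it.
before = ('<?xml version="1.0" encoding="utf-8"?>\n'
          f'<Project ToolsVersion="4.0" xmlns="{MSBUILD_NS}"><MY_TAG /></Project>')
after = ('<?xml version="1.0" encoding="utf-8"?>'
         f'<Project ToolsVersion="4.0" xmlns="{MSBUILD_NS}"><MY_TAG/></Project>')

# C14N drops the XML declaration and writes empty elements as <tag></tag>,
# so both variants canonicalize to the same text.
print(canonicalize(before) == canonicalize(after))
print(canonicalize(after))
```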
In principle you can configure git to use an XML-aware diff tool for comparisons, but please don't ask me for details; it's not something I've ever done myself. I've always just lived with the fact that it works badly.
I am creating an XML file and I would like to declare namespaces on the root element. I want to get this:
<ns2:prenosPodatkovRazporedaOdgovorSporocilo xmlns="http://someurl" xmlns:ns2="http://someotherurl" xmlns:ns3="http://another_someotherurl">
<ns2:statusPrenosa>
<status>09</status>
<nazivStatusa>Napaka prenosa</nazivStatusa>
but I get this:
<?xml version='1.0' encoding='UTF-8'?>
<prenosPodatkovRazporedaOdgovorSporocilo>
<ns0:podatkiRazporeda xmlns:ns0="ns2">
<ns2:podatkiRazporeda xmlns:ns2="ns3">
.....
register_namespace does not do the trick. How do I include the namespace declarations when using the tostring() function?
etree.register_namespace('n2',"http://someurl")
etree.register_namespace('n3',"http://someother_url")
etree.register_namespace('n4',"http://another_someotherurl")
hope I was clear enough
thank you
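A sketch of how close the standard library can get, assuming Python 3.8+ for the default_namespace argument of tostring(). Tags have to be written in Clark notation, {uri}localname; the declaration xmlns:ns0="ns2" in the current output suggests that "ns2" was passed where the namespace URI belongs. Two limits remain: ElementTree reserves prefixes of the form ns<digits>, so register_namespace('ns2', ...) raises ValueError and "p2"/"p3" below are arbitrary stand-ins, and it only declares namespaces that are actually used, so an unused ns3 declaration never appears. If those exact prefixes or unused declarations are required, lxml's nsmap argument is the usual route. The tag() helper is hypothetical.

```python
import xml.etree.ElementTree as ET

DEFAULT_NS = "http://someurl"
NS2 = "http://someotherurl"
NS3 = "http://another_someotherurl"

# "ns2"/"ns3" are reserved by ElementTree, so "p2"/"p3" are stand-in prefixes.
ET.register_namespace("p2", NS2)
ET.register_namespace("p3", NS3)  # registered, but unused, so never declared


def tag(uri, local):
    """Build a Clark-notation name: {uri}local."""
    return f"{{{uri}}}{local}"


root = ET.Element(tag(NS2, "prenosPodatkovRazporedaOdgovorSporocilo"))
status = ET.SubElement(root, tag(NS2, "statusPrenosa"))
ET.SubElement(status, tag(DEFAULT_NS, "status")).text = "09"
ET.SubElement(status, tag(DEFAULT_NS, "nazivStatusa")).text = "Napaka prenosa"

# default_namespace (Python 3.8+) writes xmlns="..." and leaves those tags unprefixed.
out = ET.tostring(root, encoding="unicode", default_namespace=DEFAULT_NS)
print(out)
```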
I'm using Python's ElementTree to parse some XML configuration files.
At the top of the file, I have a root element like this:
<?xml version="1.0" encoding="utf-8"?>
<sgx:FooConfig
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:foo="http://ns.au.firm.com/foo.xsd"
xmlns:bar="http://ns.au.firm.com/bar.xsd"
>
The problem is, the bar namespace can be set to one of two different XSDs, depending on the version of the configuration file.
I'm looking for a way to print out the namespace mapping using ElementTree, so I can check which of the two XSDs is being used - then I can get my code to handle the correct case.
Is there a way to print out all the namespace definitions out using Python?
Cheers,
Victor
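What you have is not namespace-well-formed XML (the sgx prefix is never declared). I think xml.etree doesn't keep this mapping on its elements, but lxml does: every element has an nsmap property.
import lxml.etree as et
# yourxml must be bytes: lxml rejects str input that carries an encoding declaration
root = et.XML(yourxml)
print(root.nsmap)  # dict of prefix -> namespace URI in scope on the root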
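The standard library can also report declarations while parsing: iterparse() with the "start-ns" event yields a (prefix, uri) pair for each xmlns attribute it meets. A minimal sketch; the namespace_declarations() helper is hypothetical, and the sample replaces the undeclared sgx prefix with foo, because expat rejects undeclared prefixes.

```python
import io
import xml.etree.ElementTree as ET


def namespace_declarations(source):
    """Collect xmlns declarations as {prefix: uri}; a prefix declared
    more than once keeps the last URI seen."""
    return {prefix: uri
            for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",))}


# Stand-in for the config file; the root uses the declared "foo" prefix
# instead of the undeclared "sgx" so that the document parses.
config = b'''<?xml version="1.0" encoding="utf-8"?>
<foo:FooConfig
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:foo="http://ns.au.firm.com/foo.xsd"
    xmlns:bar="http://ns.au.firm.com/bar.xsd"
>
</foo:FooConfig>'''

decls = namespace_declarations(io.BytesIO(config))
print(decls)
print(decls.get("bar"))  # the XSD the bar prefix points to in this file
```

Checking decls.get("bar") against the two known XSD URIs is then enough to pick the right code path.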
I've written a Python script to create some XML, but I couldn't find a way to edit the XML declaration (the heading) from within Python.
Basically, instead of having this as my heading:
<?xml version="1.0" ?>
I need to have this:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
I looked into this as best I could, but found no way to add standalone status from within Python. So I figured that I'd have to go over the file after it had been created and then replace the text. I read in several places that I should stay far away from using readlines() because it could ruin the XML formatting.
Currently the code I have to do this - which I got from another Stackoverflow post about editing XML with Python - is:
doc = parse('file.xml')
elem = doc.find('xml version="1.0"')
elem.text = 'xml version="1.0" encoding="UTF-8" standalone="no"'
That provides me with a KeyError. I've tried to debug it to no avail, which leads me to believe that perhaps the XML heading wasn't meant to be edited in this way. Or my code is just wrong.
If anyone is curious (or miraculously knows how to insert standalone status into Python code), here is my code to write the xml file:
with open('file.xml', 'w') as f:
f.write(doc.toprettyxml(indent=' '))
Some people don't like "toprettyxml", but with my relatively basic level, it seemed like the best bet.
Anyway, if anyone can provide some advice or insight, I would be very grateful.
The xml.etree API does not give you any options to write out a standalone attribute in the XML declaration.
You may want to use the lxml library instead; it uses the same ElementTree API, but offers more control over the output. tostring() supports a standalone flag:
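from lxml import etree
doc = etree.parse('file.xml')
# encoding="UTF-8" makes the declaration say UTF-8; standalone=False writes standalone='no'
xml_bytes = etree.tostring(doc, pretty_print=True, xml_declaration=True, encoding="UTF-8", standalone=False)
or use .write(), which supports the same options:
doc.write(outputfile, pretty_print=True, xml_declaration=True, encoding="UTF-8", standalone=False)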
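The question's own code uses xml.dom.minidom, and on Python 3.9 and later toprettyxml() and writexml() accept a standalone argument too, so no third-party library is needed there. A minimal sketch with a hypothetical two-element document standing in for the real one:

```python
from xml.dom import minidom

# Hypothetical document standing in for the one the question builds.
doc = minidom.Document()
root = doc.createElement("root")
root.appendChild(doc.createElement("child"))
doc.appendChild(root)

# encoding= and standalone= are both needed for the requested declaration;
# standalone requires Python 3.9+. With encoding set, the result is bytes.
xml_bytes = doc.toprettyxml(indent="  ", encoding="UTF-8", standalone=False)
print(xml_bytes.decode("utf-8"))
```

Because encoding is given, toprettyxml() returns bytes, so the result should be written to a file opened in binary mode ('wb').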