I am trying to take an XML document parsed with lxml objectify in Python and add subelements to it.
The problem is that I can't work out how to do this. The only real option I've found is a complete reconstruction of the data with objectify.Element and objectify.SubElement, but that doesn't make any sense.
With JSON, for instance, I can just import the data as an object, navigate to any section, and read, add and edit data very easily.
How do I parse an XML document with objectify so that I can add subelement data to it?
data.xml
<?xml version='1.0' encoding='UTF-8'?>
<data>
<items>
<item> text_1 </item>
<item> text_2 </item>
</items>
</data>
I'm sure there is an answer to this online, but either my search terminology is bad or I'm missing something, because I can't find a solution. I'm sorry if this is a duplicate question.
I guess it has been quite difficult to explain the question, but the problem essentially comes down to an ambiguity in how .Element and .SubElement can be applied.
This reference contains actionable and replicable ways in which one can append or add data to a parsed XML file.
Solving the key problem of:
How do I reference content in a nested tag without reconstructing the entire tree?
The author uses the find feature, which is called with root.findall("./tag")
This allows one to find the nested tag that they wanted without having to reconstruct the tree.
Here is one of the examples that they have used:
cost = ["$2000", "$3000", "$4000"]
traveltime = ["3hrs", "8hrs", "15hrs"]
i = 0
for elm in root.findall("./country"):
    ET.SubElement(elm, "Holiday", attrib={"fight_cost": cost[i],
                                          "trip_duration": traveltime[i]})
    i += 1
This example also answers the question "How do you dynamically add data to XML?"
They accomplish this by defining lists outside the loop and iterating through them with the index i.
In essence, this example shows how to reference nested content without rebuilding the entire tree, as well as how to dynamically add data to XML trees.
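Applied to the data.xml from the question, the same idea can be sketched as follows. This is a minimal sketch that uses the standard library's ElementTree (as the quoted example does) rather than lxml objectify; the values text_3 and text_4 are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The document from the question, inlined so the sketch is self-contained.
DATA = """<data>
<items>
<item> text_1 </item>
<item> text_2 </item>
</items>
</data>"""

root = ET.fromstring(DATA)

# Navigate to the nested <items> element instead of rebuilding the tree.
items = root.find("./items")

# Append new <item> children; the texts are hypothetical example values.
for text in ["text_3", "text_4"]:
    new_item = ET.SubElement(items, "item")
    new_item.text = " %s " % text
    new_item.tail = "\n"  # keep one element per line in the output

item_texts = [item.text.strip() for item in items.findall("item")]
print(ET.tostring(root, encoding="unicode"))
```

The existing items are left untouched; only the new children are created.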
Related
XML file
<?xml version="1.0" encoding="utf-8"?>
<Info xmlns="BuildTest">
<RequestDate>5/4/2020 12:27:46 AM</RequestDate>
</Info>
I want to add a new element inside the Info tag.
Here is what I did.
import xml.etree.ElementTree as ET
tree = ET.parse('example.xml')
root = tree.getroot()
ele = ET.Element('element1')
ele.text = 'ele1'
root.append(ele)
tree.write("output.xhtml")
Output
<ns0:Info xmlns:ns0="BuildTest">
<ns0:RequestDate>5/4/2020 12:27:46 AM</ns0:RequestDate>
<element1>ele1</element1></ns0:Info>
Three problems:
The <?xml version="1.0" encoding="utf-8"?> declaration is missing.
The namespace is wrong.
The whitespace around the new element is gone.
I saw many questions related to this topic, and most of them suggest other packages.
Is there any way to handle this properly with ElementTree?
Processing instructions (and the XML declaration, which looks like one) are not XML elements. Search for "are processing instructions part of an XML" and the first result states:
Processing instructions are markup, but they're not elements.
Since the package you are using is literally called ElementTree, you can reasonably expect its objects to be trees of elements. If I remember correctly, DOM-compliant XML packages can support non-element markup in XML.
For the namespace issue, the answer is on Stack Overflow, at "Remove ns0 from XML": you just have to register the namespace you specified in the top element of your document. The URI must match exactly, including case. The following worked for me:
ET.register_namespace("", "BuildTest")
As for the whitespace, the new element simply does not have any. You can assign to its tail attribute to add a linefeed after the element.
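Putting the three points together, a minimal sketch might look like this. It inlines the question's XML instead of reading example.xml, writes to an in-memory buffer instead of a file, and places the new element in the document's namespace explicitly; those choices are assumptions for illustration, not part of the original code. Note that register_namespace changes a module-wide registry.

```python
import io
import xml.etree.ElementTree as ET

# The XML from the question, inlined as bytes.
SOURCE = b"""<?xml version="1.0" encoding="utf-8"?>
<Info xmlns="BuildTest">
<RequestDate>5/4/2020 12:27:46 AM</RequestDate>
</Info>"""

NS = "BuildTest"

# Serialize this namespace as the default one (no ns0: prefix).
ET.register_namespace("", NS)

root = ET.fromstring(SOURCE)

# Create the new element inside the same namespace as its parent.
ele = ET.SubElement(root, "{%s}element1" % NS)
ele.text = "ele1"
ele.tail = "\n"  # linefeed after the new element

# xml_declaration=True asks write() to emit the <?xml ...?> line.
buffer = io.BytesIO()
ET.ElementTree(root).write(buffer, encoding="utf-8", xml_declaration=True)
output = buffer.getvalue().decode("utf-8")
print(output)
```

To write to disk instead, pass a filename to write() with the same keyword arguments.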
I have an XML response from one of my systems, and I am trying to get a value out of it using Python. I would appreciate help spotting my mistake.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><ns3:loginResponse xmlns:ns2="http://ws.core.product.xxxxx.com/groupService/" xmlns:ns3="http://ws.core.product.xxxxx.com/loginService/" xmlns:ns4="http://ws.core.product.xxxxx.com/userService/"><ns3:return>YWeVDwuZwHdxxxxxxxxxxx_GqLtkNTE.</ns3:return></ns3:loginResponse>
I am using the code below and have had no luck getting the value YWeVDwuZwHdxxxxxxxxxxx_GqLtkNTE. I haven't parsed XML with namespaces before. response.text contains the XML response above.
responsetree = ET.ElementTree(ET.fromstring(response.text))
responseroot = responsetree.getroot()
for a in root.iter('return'):
print(a.attrib)
YWeVDwuZwHdxxxxxxxxxxx_GqLtkNTE is not in attrib. It is the element's text.
The attrib in this case is an empty dict.
See https://www.cmi.ac.in/~madhavan/courses/prog2-2012/docs/diveintopython3/xml.html about parsing XML docs that use namespaces.
The reference from the other answer helped me understand the concepts.
Once I understood the XML structure, it was plain and simple. I'm adding the result here in case it helps someone in the future as a quick reference.
responsetree = ET.ElementTree(ET.fromstring(response.text))
responseroot = responsetree.getroot()
responseroot[0].text
I'm keeping it simple for the sake of understanding. You might need to check len(responseroot) and/or loop over the children with a condition to get the right value. You can also use find or findall, along with the namespace, to get the item you are interested in.
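For reference, looking the element up by its namespaced name rather than by position can be sketched like this. The response is inlined as bytes instead of coming from response.text, and the prefix "login" is an arbitrary label chosen for this sketch; only the namespace URI has to match the document.

```python
import xml.etree.ElementTree as ET

# The response from the question, inlined so the sketch runs on its own.
RESPONSE = (
    b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    b'<ns3:loginResponse'
    b' xmlns:ns2="http://ws.core.product.xxxxx.com/groupService/"'
    b' xmlns:ns3="http://ws.core.product.xxxxx.com/loginService/"'
    b' xmlns:ns4="http://ws.core.product.xxxxx.com/userService/">'
    b'<ns3:return>YWeVDwuZwHdxxxxxxxxxxx_GqLtkNTE.</ns3:return>'
    b'</ns3:loginResponse>'
)

# Map a local prefix of our choosing to the namespace URI used by <ns3:return>.
NAMESPACES = {"login": "http://ws.core.product.xxxxx.com/loginService/"}

root = ET.fromstring(RESPONSE)

# find() resolves "login:return" through the mapping; the value is the text.
token = root.find("login:return", NAMESPACES).text
print(token)
```

This keeps working if the service later adds sibling elements before the return element, which root[0] would not.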
Python, currently using 2.7 but can easily change to latest and greatest.
Needing to parse this XML and return the INT value contained within the item. This isn't my XML. This is coming from a piece of enterprise level software.
<counters>
<item name="stats/counters/session/responsetime" type="int">1047</item>
<item name="stats/counters/session/responsecount" type="int">7423</item>
<item name="stats/counters/init/inittime" type="int">36339</item>
<item name="stats/counters/init/fetchtime" type="int">8097</item>
<item name="stats/connectionsetups" type="int">579</item>
<item name="stats/activesessions" type="int">4294967289</item>
<item name="stats/activeconnections" type="int">0</item>
</counters>
Code:
import xml.etree.ElementTree as ET
import xml
def _getstats():
    resp = requests.get(urlStats)
    #Writing XML to disk. This makes parsing it MUCH easier.
    with open('stats_10.xml', 'wb') as f:
        f.write(resp.content)
        f.close()
    tree = ET.parse('stats_10.xml')
    root = tree.getroot()
    active = root.find('stats/activesessions')
    print active
The return value is always None. I'm using ElementTree. I have read through the documentation (https://docs.python.org/3.0/library/xml.etree.elementtree.html) and many Stack Overflow pages.
I think the problem is that the parser doesn't understand the slash.
I attempted to pull the value by name using "active = int(root['stats/activesessions'])" in place of root.find, which returns this error:
TypeError: list indices must be integers, not str
I also tried xmltodict, but that was even worse than using ElementTree. The error was always 'list indices must be integers'.
Lastly, this is a dynamic XML document. Indexing by row is not an option: at idle the software returns, for example, 10 rows, and under load it returns 15, with the additional rows mixed in among the others. I have to pull by child name.
Thank you in advance for any assistance!
ADDITION:
I can run an iteration through the XML and pull the value. However, as stated above, the XML will change and the number of rows will increase, thus throwing my indices off.
active = root[5].text
print active
I believe the find method is looking for a tag name, not an attribute value. You need to find the item tag, check if it has a name attribute, and then check if the attribute equals "stats/activesessions". If this condition is met, you can read in the value of the item tag.
This is obviously me not understanding XML and how it's structured. Added this in my code and I get the return value I'm looking for.
for item in root.findall("./item[@name='system/starttime']"):
    starttime = int(item.text)
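For the activesessions counter from the question, the same attribute predicate can be sketched like this. The XML is a shortened copy of the question's sample, inlined so the sketch runs without the HTTP request.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the counters document from the question.
COUNTERS = """<counters>
<item name="stats/counters/session/responsetime" type="int">1047</item>
<item name="stats/connectionsetups" type="int">579</item>
<item name="stats/activesessions" type="int">4294967289</item>
<item name="stats/activeconnections" type="int">0</item>
</counters>"""

root = ET.fromstring(COUNTERS)

# [@name='...'] filters on the attribute value; the slashes sit inside the
# quoted string, so they are not treated as path separators.
item = root.find("./item[@name='stats/activesessions']")

# find() returns None when nothing matches, so guard before converting.
active = int(item.text) if item is not None else None
print(active)
```

Because the lookup is by attribute value, it does not depend on how many rows the document contains or in what order they appear.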
I have many graphml files starting with:
<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns/graphml"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns/graphml">
I need to change the xmlns and xsi attributes to reflect proper values for this XML file format specification:
<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
I tried to change these values with BeautifulSoup like:
soup = BeautifulSoup(myfile, 'html.parser')
soup.graphml['xmlns'] = 'http://graphml.graphdrawing.org/xmlns'
soup.graphml['xsi:schemalocation'] = "http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd"
It works fine but it is definitely too slow on some of my larger files, so I am trying to do the same with lxml, but I don't understand how to achieve the same result. I sort of managed to reach the attributes, but don't know how to change them:
doc = etree.parse(myfile)
root = doc.getroot()
root.attrib
> {'{http://www.w3.org/2001/XMLSchema-instance}schemaLocation': 'http://graphml.graphdrawing.org/xmlns/graphml'}
What is the right way to accomplish this task?
When you say that you have many files "starting with" those 4 lines, if you really mean they're exactly like that, the fastest way is probably to ignore the fact that it's XML entirely and just replace those lines.
In Python, just read the first four lines, compare them to what you expect (so you can issue a warning if they don't match), then discard them. Write out the new four lines you want, then copy the rest of the file out. Repeat for each file.
On the other hand, if you have namespace attributes anywhere else in the file this method wouldn't catch them, and you should probably do a real XML-based solution. With a regular SAX parser, you get a callback for each element start, element end, text node, etc. as it comes along. So you'd just copy them out until you hit the one(s) you want (in this case, a graphml element), then instead of copying out that start-tag, write out the new one you want. Then back to copying. XSLT is also a fine way to do this, which would let you write a tiny generic copier, plus one rule to handle the graphml element.
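The line-replacement approach from the first two paragraphs can be sketched roughly as follows. The function name replace_header is hypothetical, the header lines are copied from the question, and this sketch raises an error on a mismatch instead of issuing a warning. It works on any pair of text file objects; io.StringIO stands in for real files here.

```python
import io
import shutil

# The four header lines expected in every input file (from the question).
OLD_HEADER = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<graphml xmlns="http://graphml.graphdrawing.org/xmlns/graphml"',
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"',
    'xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns/graphml">',
]

# The corrected header to write instead (also from the question).
NEW_HEADER = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<graphml xmlns="http://graphml.graphdrawing.org/xmlns"\n'
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\n'
    'xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns\n'
    'http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">\n'
)


def replace_header(src, dst):
    """Copy src to dst, swapping the old four-line header for the new one."""
    head = [src.readline().strip() for _ in OLD_HEADER]
    if head != OLD_HEADER:
        raise ValueError("unexpected header: %r" % head)
    dst.write(NEW_HEADER)
    # Stream the rest of the file unchanged, without parsing it.
    shutil.copyfileobj(src, dst)


# Small in-memory demonstration with a made-up document body.
source = io.StringIO("\n".join(OLD_HEADER) + "\n<graph/>\n</graphml>\n")
target = io.StringIO()
replace_header(source, target)
result = target.getvalue()
print(result)
```

For real files, open the input and output with open(path, encoding="utf-8") and pass the two file objects in the same way.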
I am looking at a piece of XML that I want to add a node in.
<profile>
<dog>1</dog>
<halfdog>0</halfdog>
<cat>545</cat>
<lions>0</lions>
<bird>23</bird>
<dino>0</dino>
<pineapples>2</pineapples>
<people>0</people>
</profile>
With the above XML, I'm able to insert XML nodes into it. However, I'm not able to insert it at exact locations.
Is there a way to find out whether I am next to a certain node, either before or after it? Say I wanted to add <snail>2</snail> between the <dino>0</dino> and <pineapples>2</pineapples> nodes.
Using ElementTree how can I find what node is next to me? I'm asking about ElementTree or any standard Python library. Unfortunately, lxml is out of the question for me.
I don't know of a direct way to do it with ElementTree, but you can do it using the standard Python minidom:
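from xml.dom import minidom
# parse the document ('profile.xml' is an assumed file name)
dom = minidom.parse('profile.xml')
# create snail element
snail = dom.createElement('snail')
snail_text = dom.createTextNode('2')
snail.appendChild(snail_text)
# add it in the right place
profile = dom.getElementsByTagName('profile')[0]
pineapples = dom.getElementsByTagName('pineapples')[0]
profile.insertBefore(snail, pineapples)
# serialize the modified document
print(dom.toxml())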
output:
<?xml version="1.0" ?><profile>
<dog>1</dog>
<halfdog>0</halfdog>
<cat>545</cat>
<lions>0</lions>
<bird>23</bird>
<dino>0</dino>
<snail>2</snail><pineapples>2</pineapples>
<people>0</people>
</profile>
If you know the parent element and the element to insert before, you can use the following method with ElementTree (list(parentElem) replaces getchildren(), which was removed in Python 3.9):
index = list(parentElem).index(elemToInsertBefore)
parentElem.insert(index, newElement)
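Applied to the profile document from the question, that approach can be sketched as follows. The XML is inlined, and copying the tail from the preceding sibling is just one way to keep the indentation consistent.

```python
import xml.etree.ElementTree as ET

# The profile document from the question, inlined for a runnable sketch.
PROFILE = """<profile>
  <dog>1</dog>
  <halfdog>0</halfdog>
  <cat>545</cat>
  <lions>0</lions>
  <bird>23</bird>
  <dino>0</dino>
  <pineapples>2</pineapples>
  <people>0</people>
</profile>"""

root = ET.fromstring(PROFILE)

# Locate the element we want to insert before, then its position.
pineapples = root.find("pineapples")
index = list(root).index(pineapples)

snail = ET.Element("snail")
snail.text = "2"
# Reuse the whitespace that follows <dino> so the new line is indented too.
snail.tail = root[index - 1].tail

root.insert(index, snail)

tags = [child.tag for child in root]
print(ET.tostring(root, encoding="unicode"))
```

The neighbours of any child are then simply root[index - 1] and root[index + 1], which also answers the "what node is next to me" part of the question.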