How to insert a child subelement in Python ElementTree - python

My xml :-
<users>
</users>
i need to just append a child element :-
<users>
<user name="blabla" age="blabla" ><group>blabla</group>
</users>
My code giving some error :(
import xml.etree.ElementTree as ET
doc = ET.parse("users.xml")
root_node = doc.find("users")
child = ET.SubElement(root_node, "user")
child.set("username","srquery")
group = ET.SubElement(child,"group")
group.text = "fresher"
tree = ET.ElementTree(root_node)
tree.write("users.xml")
I missed out "append" , but i don't have idea where to add that. Thanks in advance.

Change this line
root_node = doc.find("users")
...to this line
root_node = doc.getroot()
The key takeaway here is that doc is already a reference to the root node, and is accessed with getroot(). doc.find('users') will not return anything, since users is not a child of the root, it's the root itself.

A slightlly modified version to explain what happens:
root = ET.fromstring('<users></users>') # same as your doc=ET.parse(...).find(...), btw. doc=root
el = ET.Element('group') # creating a new element/xml-node
root.append(el) # and adding it to the root
ET.tostring(root)
>>> '<users><group /></users>'
el.text = "fresher" # adding your text
ET.tostring(root)
>>> '<users><group>fresher</group></users>'

Related

How to add Subelements and its in xml

I have a xml file which has subelements:-
.......
.......
<EnabledFeatureListForUsers>
<FeatureEntitlementDetail>
<UserName>xyz#xyz.com</UserName>
<FeatureList>
<FeatureDetail>
<FeatureId>X</FeatureId>
</FeatureDetail>
</FeatureList>
</FeatureEntitlementDetail>
</EnabledFeatureListForUsers>
.....
.....
I want to add a new sub element FeatureEntitlementDetail with all its subelements/children like username, Feature List, Feature Detail, Feature Id. I tried using SubElement function, but it only adds FeatureEntitlementDetail />. The code which I used was :-
import xml.etree.ElementTree as ET
filename = "XYZ.xml"
xmlTree = ET.parse(filename)
root = xmlTree.getroot()
for element in root.iter('EnabledFeatureListForUsers'):
ET.SubElement(element,"FeatureEntitlementDetail")
Any help is appreciated.
See below
import xml.etree.ElementTree as ET
xml = """
<EnabledFeatureListForUsers>
<FeatureEntitlementDetail>
<UserName>xyz#xyz.com</UserName>
<FeatureList>
<FeatureDetail>
<FeatureId>X</FeatureId>
</FeatureDetail>
</FeatureList>
</FeatureEntitlementDetail>
</EnabledFeatureListForUsers>
"""
root = ET.fromstring(xml)
fed = ET.SubElement(root,'FeatureEntitlementDetail')
un = ET.SubElement(fed,'UserName')
un.text = 'abc.zz.net'
fl = ET.SubElement(fed,'FeatureList')
df = ET.SubElement(fl,'FeatureDetail')
fi = ET.SubElement(df,'FeatureId')
fi.text = 'Z'
ET.dump(root)
output
<?xml version="1.0" encoding="UTF-8"?>
<EnabledFeatureListForUsers>
<FeatureEntitlementDetail>
<UserName>xyz#xyz.com</UserName>
<FeatureList>
<FeatureDetail>
<FeatureId>X</FeatureId>
</FeatureDetail>
</FeatureList>
</FeatureEntitlementDetail>
<FeatureEntitlementDetail>
<UserName>abc.zz.net</UserName>
<FeatureList>
<FeatureDetail>
<FeatureId>Z</FeatureId>
</FeatureDetail>
</FeatureList>
</FeatureEntitlementDetail>
</EnabledFeatureListForUsers>

Python ElementTree adding a child

I have an xml file which looks like this:
<keyboard>
</keyboard>
I want to make it look like this:
<keyboard>
<keybind key="W-c-a"><action name="Execute"><command>sudo shutdown now</command></action></keybind>
</keyboard>
I have a function to add this which has parameters that will change the key and the command. Is this possible to do? If yes, how can I do this?
(The function):
def add_keybinding(self, keys, whatToExec):
keybinding = "<keybind key=\"%s\"><action name=\"Execute\"><command>%s</command><action></keybind>" % (keys, whatToExec)
f = open("/etc/xdg/openbox/rc.xml", "a")
try:
# I want to append the keybinding variable to the <keyboard>
except IOError as e:
print(e)
From the doc, you can try the following:
def add_keybinding(keys, whatToExec, filename):
keybind = ET.Element('keybind')
keybind.set("key", keys)
action = ET.SubElement(keybind, 'action')
action.set("name", "Execute")
command = ET.SubElement(action, 'command')
command.text = whatToExec
tree = ET.parse(filename)
tree.getroot().append(keybind)
tree.write(filename)
Explanation:
Create the keybind tag using xml.etree.ElementTree.Element : keybind = ET.Element('keybind')
Set a property using set: keybind.set("key", keys)
Create the action tag as a sub element of keybind using
xml.etree.ElementTree.SubElement: action = ET.SubElement(keybind, 'action')
Set the property as at step 2: action.set("name", "Execute")
Create command tag: action.set("name", "Execute")
Set command tag content using .text: command.text = whatToExec
Read file using xml.etree.ElementTree.parse: tree = ET.parse(filename)
Append keybind tag to the doc root element using append*
Export new xml to file using write
Full example:
import xml.etree.ElementTree as ET
from xml.dom import minidom
def add_keybinding(keys, whatToExec, filename):
keybind = ET.Element('keybind')
keybind.set("key", keys)
action = ET.SubElement(keybind, 'action')
action.set("name", "Execute")
command = ET.SubElement(action, 'command')
command.text = whatToExec
tree = ET.parse(filename)
tree.getroot().append(keybind)
tree.write(filename)
return tree
def prettify(elem):
rough_string = ET.tostring(elem, 'utf-8')
return minidom.parseString(rough_string).toprettyxml(indent=" ")
filename = "test.xml"
for i in range(3):
tree = add_keybinding(str(i), "whatToExec " + str(i), filename)
print(prettify(tree.getroot()))
Output:
<?xml version="1.0" ?>
<keyboard>
<keybind key="0">
<action name="Execute">
<command>whatToExec 0</command>
</action>
</keybind>
<keybind key="1">
<action name="Execute">
<command>whatToExec 1</command>
</action>
</keybind>
<keybind key="2">
<action name="Execute">
<command>whatToExec 2</command>
</action>
</keybind>
</keyboard>

Using "info.get" for a child element in Python / lxml

I'm trying to get the attribute of a child element in Python, using lxml.
This is the structure of the xml:
<GroupInformation groupId="crid://thing.com/654321" ordered="true">
<GroupType value="show" xsi:type="ProgramGroupTypeType"/>
<BasicDescription>
<Title type="main" xml:lang="EN">A programme</Title>
<RelatedMaterial>
<HowRelated href="urn:eventis:metadata:cs:HowRelatedCS:2010:boxCover">
<Name>Box cover</Name>
</HowRelated>
<MediaLocator>
<mpeg7:MediaUri>file://ftp.something.com/Images/123456.jpg</mpeg7:MediaUri>
</MediaLocator>
</RelatedMaterial>
</BasicDescription>
The code I've got is below. The bit I want to return is the 'value' attribute ("Show" in the example) under 'grouptype' (third line from the bottom):
file_name = input('Enter the file name, including .xml extension: ')
print('Parsing ' + file_name)
from lxml import etree
parser = etree.XMLParser()
tree = etree.parse(file_name, parser)
root = tree.getroot()
nsmap = {'xmlns': 'urn:tva:metadata:2010','mpeg7':'urn:tva:mpeg7:2008'}
with open(file_name+'.log', 'w', encoding='utf-8') as f:
for info in root.xpath('//xmlns:GroupInformation', namespaces=nsmap):
crid = info.get('groupId'))
grouptype = info.find('.//xmlns:GroupType', namespaces=nsmap)
gtype = grouptype.get('value')
titlex = info.find('.//xmlns:BasicDescription/xmlns:Title', namespaces=nsmap)
title = titlex.text if titlex != None else 'Missing'
Can anyone explain to me how to implement it? I had a quick look at the xsi namespace, but was unable to get it to work (and didn't know if it was the right thing to do).
Is this what you are looking for?
grouptype.attrib['value']
PS: why the parenthesis around assignment values? Those look unnecessary.

How should I parse this xml string in python?

My XML string is -
xmlData = """<SMSResponse xmlns="http://example.com" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<Cancelled>false</Cancelled>
<MessageID>00000000-0000-0000-0000-000000000000</MessageID>
<Queued>false</Queued>
<SMSError>NoError</SMSError>
<SMSIncomingMessages i:nil="true"/>
<Sent>false</Sent>
<SentDateTime>0001-01-01T00:00:00</SentDateTime>
</SMSResponse>"""
I am trying to parse and get the values of tags - Cancelled, MessageId, SMSError, etc. I am using python's Elementtree library. So far, I have tried things like -
root = ET.fromstring(xmlData)
print root.find('Sent') // gives None
for child in root:
print chil.find('MessageId') // also gives None
Although, I am able to print the tags with -
for child in root:
print child.tag
//child.tag for the tag Cancelled is - {http://example.com}Cancelled
and their respective values with -
for child in root:
print child.text
How do I get something like -
print child.Queued // will print false
Like in PHP we can access them with the root -
$xml = simplexml_load_string($data);
$status = $xml->SMSError;
Your document has a namespace on it, you need to include the namespace when searching:
root = ET.fromstring(xmlData)
print root.find('{http://example.com}Sent',)
print root.find('{http://example.com}MessageID')
output:
<Element '{http://example.com}Sent' at 0x1043e0690>
<Element '{http://example.com}MessageID' at 0x1043e0350>
The find() and findall() methods also take a namespace map; you can search for a arbitrary prefix, and the prefix will be looked up in that map, to save typing:
nsmap = {'n': 'http://example.com'}
print root.find('n:Sent', namespaces=nsmap)
print root.find('n:MessageID', namespaces=nsmap)
If you're set on Python standard XML libraries, you could use something like this:
root = ET.fromstring(xmlData)
namespace = 'http://example.com'
def query(tree, nodename):
return tree.find('{{{ex}}}{nodename}'.format(ex=namespace, nodename=nodename))
queued = query(root, 'Queued')
print queued.text
You can create a dictionary and directly get values out of it...
tree = ET.fromstring(xmlData)
root = {}
for child in tree:
root[child.tag.split("}")[1]] = child.text
print root["Queued"]
With lxml.etree:
In [8]: import lxml.etree as et
In [9]: doc=et.fromstring(xmlData)
In [10]: ns={'n':'http://example.com'}
In [11]: doc.xpath('n:Queued/text()',namespaces=ns)
Out[11]: ['false']
With elementtree you can do:
import xml.etree.ElementTree as ET
root=ET.fromstring(xmlData)
ns={'n':'http://example.com'}
root.find('n:Queued',namespaces=ns).text
Out[13]: 'false'

Xpath select attribute of current node?

I use python with lxml to process the xml. After I query/filter to get to a nodes I want but I have some problem. How to get its attribute's value by xpath ? Here is my input example.
>print(etree.tostring(node, pretty_print=True ))
<rdf:li xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" rdf:resource="urn:miriam:obo.chebi:CHEBI%3A37671"/>
The value I want is in resource=... . Currently I just use the lxml to get the value. I wonder if it is possible to do in pure xpath ? thanks
EDIT: Forgot to said, this is not a root nodes so I can't use // here. I have like 2000-3000 others in xml file. My first attempt was playing around with ".#attrib" and "self::*#" but those does not seems to work.
EDIT2: I will try my best to explain (well, this is my first time to deal with xml problem using xpath. and english is not one of my favorite field....). Here is my input snippet http://pastebin.com/kZmVdbQQ (full one from here http://www.comp-sys-bio.org/yeastnet/ using version 4).
In my code, I try to get speciesTypes node with resource link chebi (<rdf:li rdf:resource="urn:miriam:obo.chebi:...."/>). and then I tried to get value from rdf:resource attribute in rdf:li. The thing is, I am pretty sure it would be easy to get attribute in child node if I start from parent node like speciesTypes, but I wonder how to do if I start from rdf:li. From my understanding, the "//" in xpath will looking for node from everywhere not just only in the current node.
below is my code
import lxml.etree as etree
tree = etree.parse("yeast_4.02.xml")
root = tree.getroot()
ns = {"sbml": "http://www.sbml.org/sbml/level2/version4",
"rdf":"http://www.w3.org/1999/02/22-rdf-syntax-ns#",
"body":"http://www.w3.org/1999/xhtml",
"re": "http://exslt.org/regular-expressions"
}
#good enough for now
maybemeta = root.xpath("//sbml:speciesType[descendant::rdf:li[starts-with(#rdf:resource, 'urn:miriam:obo.chebi') and not(starts-with(#rdf:resource, 'urn:miriam:uniprot'))]]", namespaces = ns)
def extract_name_and_chebi(node):
name = node.attrib['name']
chebies = node.xpath("./sbml:annotation//rdf:li[starts-with(#rdf:resource, 'urn:miriam:obo.chebi') and not(starts-with(#rdf:resource, 'urn:miriam:uniprot'))]", namespaces=ns) #get all rdf:li node with chebi resource
assert len(chebies) == 1
#my current solution to get rdf:resource value from rdf:li node
rdfNS = "{" + ns.get('rdf') + "}"
chebi = chebies[0].attrib[rdfNS + 'resource']
#do protein later
return (name, chebi)
metaWithChebi = map(extract_name_and_chebi, maybemeta)
fo = open("metabolites.txt", "w")
for name, chebi in metaWithChebi:
fo.write("{0}\t{1}\n".format(name, chebi))
Prefix the attribute name with # in the XPath query:
>>> from lxml import etree
>>> xml = """\
... <?xml version="1.0" encoding="utf8"?>
... <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
... <rdf:li rdf:resource="urn:miriam:obo.chebi:CHEBI%3A37671"/>
... </rdf:RDF>
... """
>>> tree = etree.fromstring(xml)
>>> ns = {'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'}
>>> tree.xpath('//rdf:li/#rdf:resource', namespaces=ns)
['urn:miriam:obo.chebi:CHEBI%3A37671']
EDIT
Here's a revised version of the script in the question:
import lxml.etree as etree
ns = {
'sbml': 'http://www.sbml.org/sbml/level2/version4',
'rdf':'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
'body':'http://www.w3.org/1999/xhtml',
're': 'http://exslt.org/regular-expressions',
}
def extract_name_and_chebi(node):
chebies = node.xpath("""
.//rdf:li[
starts-with(#rdf:resource, 'urn:miriam:obo.chebi')
]/#rdf:resource
""", namespaces=ns)
return node.attrib['name'], chebies[0]
with open('yeast_4.02.xml') as xml:
tree = etree.parse(xml)
maybemeta = tree.xpath("""
//sbml:speciesType[descendant::rdf:li[
starts-with(#rdf:resource, 'urn:miriam:obo.chebi')]]
""", namespaces = ns)
with open('metabolites.txt', 'w') as output:
for node in maybemeta:
output.write('%s\t%s\n' % extract_name_and_chebi(node))
To select off the current node its attribute named rdf:resource, use this XPath expression:
#rdf:resource
In order for this to "work correctly" you must register the association of the prefix "rdf:" to the corresponding namespace.
If you don't know how to register the rdf namespace, it is still possible to select the attribute -- with this XPath expression:
#*[name()='rdf:resource']
Well, I got it. The xpath expression I need here is "./#rdf:resource" not ".#rdf:resource". But why ? I thought "./" indicate the child of current node.

Categories