How to add namespace in existing xml file - python

Sample xml:
<abcd>
... Many contents here ...
</abcd>
I want to change sample file like below:
<abcd xmlns="urn:myname" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:myname myname.xsd">
.... many contents here
</abcd>
My code is below, but when I print the document, the result is identical to the input file.
attr_qname = etree.QName("http://www.w3.org/2001/XMLSchema-instance", "schemaLocation")
nsmap = {None: "urn:myname", 'xsi':"http://www.w3.org/2001/XMLSchema-instance"}
document = etree.parse(self.fname)
root = document.getroot()
root = etree.Element('abcd', {attr_qname: 'urn:myname myname.xsd'}, nsmap=nsmap)
print("Parsed : ", etree.tostring(document, pretty_print=True).decode())
How can I add the namespace and print the result?

In your code you create a new abcd element but never attach it to the parsed document, so the document you print is unchanged.
Try this:
from lxml import etree
attr_qname = etree.QName("http://www.w3.org/2001/XMLSchema-instance", "schemaLocation")
document = etree.parse(self.fname)
# The root element is already <abcd>, so root.find('abcd') would return None.
root = document.getroot()
# This writes xmlns as a plain attribute; the in-memory tags stay unqualified
# until the serialized output is parsed again.
root.set('xmlns', "urn:myname")
root.set(attr_qname, "urn:myname myname.xsd")
print("Parsed : ", etree.tostring(root, pretty_print=True).decode())
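If the in-memory tree itself should carry the namespace, every element tag has to be qualified, not just the root. Below is a minimal sketch of that idea using the standard library's xml.etree.ElementTree instead of lxml (a swapped-in library, chosen only to keep the example dependency-free). The sample document and the helper name add_default_namespace are illustrative assumptions, not part of the original question.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
NS = "urn:myname"

def add_default_namespace(root, uri):
    """Qualify every un-namespaced element tag under root (hypothetical helper)."""
    for elem in root.iter():
        # Leave non-string tags and already-qualified tags untouched.
        if isinstance(elem.tag, str) and not elem.tag.startswith("{"):
            elem.tag = f"{{{uri}}}{elem.tag}"

# Stand-in for the real file contents (assumption).
root = ET.fromstring("<abcd><item>value</item></abcd>")

add_default_namespace(root, NS)
root.set(f"{{{XSI}}}schemaLocation", "urn:myname myname.xsd")

# register_namespace edits a process-wide prefix table; registering the
# empty prefix makes the serializer write the URI as the default namespace.
ET.register_namespace("", NS)
ET.register_namespace("xsi", XSI)

result = ET.tostring(root, encoding="unicode")
print(result)
```

Because register_namespace is global, this fits a one-off conversion script better than a library function that other code shares.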

Related

How to add a subelement and its children in XML

I have an XML file which has subelements:
.......
.......
<EnabledFeatureListForUsers>
<FeatureEntitlementDetail>
<UserName>xyz#xyz.com</UserName>
<FeatureList>
<FeatureDetail>
<FeatureId>X</FeatureId>
</FeatureDetail>
</FeatureList>
</FeatureEntitlementDetail>
</EnabledFeatureListForUsers>
.....
.....
I want to add a new sub element FeatureEntitlementDetail with all its subelements/children, such as UserName, FeatureList, FeatureDetail and FeatureId. I tried using the SubElement function, but it only adds <FeatureEntitlementDetail />. The code I used was:
import xml.etree.ElementTree as ET
filename = "XYZ.xml"
xmlTree = ET.parse(filename)
root = xmlTree.getroot()
for element in root.iter('EnabledFeatureListForUsers'):
    ET.SubElement(element,"FeatureEntitlementDetail")
Any help is appreciated.
See below. SubElement creates a single empty element, so each child has to be created in turn:
import xml.etree.ElementTree as ET
xml = """
<EnabledFeatureListForUsers>
<FeatureEntitlementDetail>
<UserName>xyz#xyz.com</UserName>
<FeatureList>
<FeatureDetail>
<FeatureId>X</FeatureId>
</FeatureDetail>
</FeatureList>
</FeatureEntitlementDetail>
</EnabledFeatureListForUsers>
"""
root = ET.fromstring(xml)
fed = ET.SubElement(root,'FeatureEntitlementDetail')
un = ET.SubElement(fed,'UserName')
un.text = 'abc.zz.net'
fl = ET.SubElement(fed,'FeatureList')
df = ET.SubElement(fl,'FeatureDetail')
fi = ET.SubElement(df,'FeatureId')
fi.text = 'Z'
ET.dump(root)
Output (line breaks added for readability):
<EnabledFeatureListForUsers>
<FeatureEntitlementDetail>
<UserName>xyz#xyz.com</UserName>
<FeatureList>
<FeatureDetail>
<FeatureId>X</FeatureId>
</FeatureDetail>
</FeatureList>
</FeatureEntitlementDetail>
<FeatureEntitlementDetail>
<UserName>abc.zz.net</UserName>
<FeatureList>
<FeatureDetail>
<FeatureId>Z</FeatureId>
</FeatureDetail>
</FeatureList>
</FeatureEntitlementDetail>
</EnabledFeatureListForUsers>
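The same steps can be wrapped in a function and applied to every EnabledFeatureListForUsers element in a larger document, which is closer to the loop in the question. This is only a sketch: the function name add_feature_entitlement and the <Root> wrapper element are hypothetical, since the outer structure of the real file is elided.

```python
import xml.etree.ElementTree as ET

def add_feature_entitlement(parent, user_name, feature_ids):
    """Append a FeatureEntitlementDetail subtree to parent and return it."""
    detail = ET.SubElement(parent, "FeatureEntitlementDetail")
    ET.SubElement(detail, "UserName").text = user_name
    feature_list = ET.SubElement(detail, "FeatureList")
    for feature_id in feature_ids:
        feature = ET.SubElement(feature_list, "FeatureDetail")
        ET.SubElement(feature, "FeatureId").text = feature_id
    return detail

# <Root> stands in for the elided outer structure of the real file (assumption).
root = ET.fromstring("<Root><EnabledFeatureListForUsers/></Root>")

# list(...) takes a snapshot so the tree is not modified while being iterated.
for users in list(root.iter("EnabledFeatureListForUsers")):
    add_feature_entitlement(users, "abc.zz.net", ["Z"])

print(ET.tostring(root, encoding="unicode"))
```

When working on a file, the parsed tree still has to be written back (for example with xmlTree.write(filename)) for the change to persist.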

Python ElementTree adding a child

I have an xml file which looks like this:
<keyboard>
</keyboard>
I want to make it look like this:
<keyboard>
<keybind key="W-c-a"><action name="Execute"><command>sudo shutdown now</command></action></keybind>
</keyboard>
I have a function to add this which has parameters that will change the key and the command. Is this possible to do? If yes, how can I do this?
(The function):
def add_keybinding(self, keys, whatToExec):
    keybinding = "<keybind key=\"%s\"><action name=\"Execute\"><command>%s</command></action></keybind>" % (keys, whatToExec)
    f = open("/etc/xdg/openbox/rc.xml", "a")
    try:
        # I want to append the keybinding variable to the <keyboard>
        pass
    except IOError as e:
        print(e)
Based on the ElementTree documentation, you can try the following:
def add_keybinding(keys, whatToExec, filename):
    keybind = ET.Element('keybind')
    keybind.set("key", keys)
    action = ET.SubElement(keybind, 'action')
    action.set("name", "Execute")
    command = ET.SubElement(action, 'command')
    command.text = whatToExec
    tree = ET.parse(filename)
    tree.getroot().append(keybind)
    tree.write(filename)
Explanation:
1. Create the keybind tag using xml.etree.ElementTree.Element: keybind = ET.Element('keybind')
2. Set an attribute using set: keybind.set("key", keys)
3. Create the action tag as a sub element of keybind using xml.etree.ElementTree.SubElement: action = ET.SubElement(keybind, 'action')
4. Set the attribute as in step 2: action.set("name", "Execute")
5. Create the command tag: command = ET.SubElement(action, 'command')
6. Set the command tag content using .text: command.text = whatToExec
7. Read the file using xml.etree.ElementTree.parse: tree = ET.parse(filename)
8. Append the keybind tag to the document root element using append: tree.getroot().append(keybind)
9. Write the new XML to the file using write: tree.write(filename)
Full example:
import xml.etree.ElementTree as ET
from xml.dom import minidom
def add_keybinding(keys, whatToExec, filename):
    keybind = ET.Element('keybind')
    keybind.set("key", keys)
    action = ET.SubElement(keybind, 'action')
    action.set("name", "Execute")
    command = ET.SubElement(action, 'command')
    command.text = whatToExec
    tree = ET.parse(filename)
    tree.getroot().append(keybind)
    tree.write(filename)
    return tree
def prettify(elem):
    rough_string = ET.tostring(elem, 'utf-8')
    return minidom.parseString(rough_string).toprettyxml(indent=" ")
filename = "test.xml"
for i in range(3):
    tree = add_keybinding(str(i), "whatToExec " + str(i), filename)
print(prettify(tree.getroot()))
Output:
<?xml version="1.0" ?>
<keyboard>
<keybind key="0">
<action name="Execute">
<command>whatToExec 0</command>
</action>
</keybind>
<keybind key="1">
<action name="Execute">
<command>whatToExec 1</command>
</action>
</keybind>
<keybind key="2">
<action name="Execute">
<command>whatToExec 2</command>
</action>
</keybind>
</keyboard>
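The answer above assumes that <keyboard> is the document root. If it is nested inside another element in the real file (an assumption about the file layout), the element can be located first. The sketch below works on an in-memory string; the <config> wrapper is hypothetical, and ET.indent (available from Python 3.9) is used in place of the minidom round trip when present.

```python
import xml.etree.ElementTree as ET

def add_keybinding(root, keys, command_text):
    """Append a keybind subtree to the first <keyboard> at or below root."""
    # root.iter() also yields root itself, so this works whether or not
    # <keyboard> is the document root.
    keyboard = next(root.iter("keyboard"), None)
    if keyboard is None:
        raise ValueError("no <keyboard> element found")
    keybind = ET.SubElement(keyboard, "keybind", {"key": keys})
    action = ET.SubElement(keybind, "action", {"name": "Execute"})
    ET.SubElement(action, "command").text = command_text
    return keybind

# Hypothetical wrapper element around <keyboard> (assumption, for illustration).
root = ET.fromstring("<config><keyboard></keyboard></config>")
add_keybinding(root, "W-c-a", "sudo shutdown now")

# ET.indent only adjusts whitespace; skip it on older Python versions.
if hasattr(ET, "indent"):
    ET.indent(root)
print(ET.tostring(root, encoding="unicode"))
```

If the real file declares a default namespace, the tag passed to iter would need the qualified form, as in the namespace examples elsewhere on this page.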

How to get the content of specific grandchild from xml file through python

Hi I am very new to python programming. I have an xml file of structure:
<?xml version="1.0" encoding="UTF-8"?>
<LidcReadMessage xsi:schemaLocation="http://www.nih.gov http://troll.rad.med.umich.edu/lidc/LidcReadMessage.xsd"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://www.nih.gov" uid="1.3.6.1.4.1.14519.5.2.1.6279.6001.1307390687803.0">
<ResponseHeader>
<Version>1.8.1</Version>
<MessageId>-421198203</MessageId>
<DateRequest>2007-11-01</DateRequest>
<TimeRequest>12:30:44</TimeRequest>
<RequestingSite>removed</RequestingSite>
<ServicingSite>removed</ServicingSite>
<TaskDescription>Second unblinded read</TaskDescription>
<CtImageFile>removed</CtImageFile>
<SeriesInstanceUid>1.3.6.1.4.1.14519.5.2.1.6279.6001.179049373636438705059720603192</SeriesInstanceUid>
<DateService>2008-08-18</DateService>
<TimeService>02:05:51</TimeService>
<ResponseDescription>1 - Reading complete</ResponseDescription>
<StudyInstanceUID>1.3.6.1.4.1.14519.5.2.1.6279.6001.298806137288633453246975630178</StudyInstanceUID>
</ResponseHeader>
<readingSession>
<annotationVersion>3.12</annotationVersion>
<servicingRadiologistID>540461523</servicingRadiologistID>
<unblindedReadNodule>
<noduleID>Nodule 001</noduleID>
<characteristics>
<subtlety>5</subtlety>
<internalStructure>1</internalStructure>
<calcification>6</calcification>
<sphericity>3</sphericity>
<margin>3</margin>
<lobulation>3</lobulation>
<spiculation>4</spiculation>
<texture>5</texture>
<malignancy>5</malignancy>
</characteristics>
<roi>
<imageZposition>-125.000000 </imageZposition>
<imageSOP_UID>1.3.6.1.4.1.14519.5.2.1.6279.6001.110383487652933113465768208719</imageSOP_UID>
......
There are four readingSession elements, each containing multiple roi elements. Each roi contains an imageSOP_UID. I need to extract the information in imageSOP_UID from all of them.
Right now I am doing this:
import xml.etree.ElementTree as ET
tree = ET.parse('069.xml')
root = tree.getroot()
#lst = []
for readingsession in root.iter('readingSession'):
    for roi in readingsession.findall('roi'):
        id = roi.findtext('imageSOP_UID')
        print(id)
but it outputs only this:
Process finished with exit code 0.
If anyone can help.
The real problem was the namespace. I tried with and without it, but my original code did not work. The following does:
import pydicom
import xml.etree.ElementTree as ET
ds = pydicom.dcmread("000071.dcm")
uid = ds.SOPInstanceUID
tree = ET.parse("069.xml")
root = tree.getroot()
for child in root:
    print(child.tag)
    # Tags in this file are qualified with the default namespace URI.
    if child.tag == '{http://www.nih.gov}readingSession':
        read = child.find('{http://www.nih.gov}unblindedReadNodule')
        if read is not None:
            nodule_id = read.find('{http://www.nih.gov}noduleID').text
            xml_uid = read.find('{http://www.nih.gov}roi').find('{http://www.nih.gov}imageSOP_UID').text
            if xml_uid == uid:
                print(xml_uid, "=", uid)
                roi = read.find('{http://www.nih.gov}roi')
                print(roi)
This works fine: it gets a UID from a DICOM image in the LIDC/IDRI dataset and then finds the same UID in the XML file to locate its region of interest.
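The original loop found nothing for two reasons: the tags are in the http://www.nih.gov namespace, and findall('roi') only looks at direct children, while roi sits inside unblindedReadNodule. A minimal sketch that collects every imageSOP_UID with a prefix mapping is shown below; the prefix name lidc is arbitrary, and the trimmed document with shortened UID values is a made-up stand-in for the real file.

```python
import xml.etree.ElementTree as ET

NS = {"lidc": "http://www.nih.gov"}  # the prefix name "lidc" is arbitrary

# Trimmed stand-in for the real file (assumption): same namespace and nesting.
xml_text = """<LidcReadMessage xmlns="http://www.nih.gov">
  <readingSession>
    <unblindedReadNodule>
      <noduleID>Nodule 001</noduleID>
      <roi><imageSOP_UID>1.2.3</imageSOP_UID></roi>
      <roi><imageSOP_UID>1.2.4</imageSOP_UID></roi>
    </unblindedReadNodule>
  </readingSession>
</LidcReadMessage>"""
root = ET.fromstring(xml_text)

# ".//" searches all descendants, so roi elements nested inside
# unblindedReadNodule are found as well.
uids = [
    roi.findtext("lidc:imageSOP_UID", namespaces=NS)
    for session in root.findall("lidc:readingSession", NS)
    for roi in session.findall(".//lidc:roi", NS)
]
print(uids)
```

The mapping keeps the paths shorter than repeating the {http://www.nih.gov} prefix on every tag; both forms select the same elements.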

Generating xml with LXML - attribute namespace

I need to generate xml which looks like this:
<definitions xmlns:ex="http://www.example1.org" xmlns="http://www.example2.org">
<typeRef xmlns:ns2="xyz">text</typeRef>
</definitions>
My code looks as follows:
class XMLNamespaces:
    ex = 'http://www.example1.org'
    xmlns = 'http://www.example2.org'
root = Element('definitions', xmlns='http://www.example2.org', nsmap = {'ex':XMLNamespaces.ex})
type_ref = SubElement(root, 'typeRef')
type_ref.attrib[QName(XMLNamespaces.xmlns, 'ns2')] = 'xyz'
type_ref.text = 'text'
tree = ElementTree(root)
tree.write('filename.xml', pretty_print=True)
The result looks like:
<definitions xmlns:ex="http://www.example1.org" xmlns="http://www.example2.org">
<typeRef xmlns:ns0="http://www.example2.org" ns0:ns2="xyz">text</typeRef>
</definitions>
So here is my question:
How to make attribute look like xmlns:ns2="xyz" instead of xmlns:ns0="http://www.example2.org" ns0:ns2="xyz"?
Pass an nsmap argument to SubElement, the same way you defined the namespace dictionary for the opening element. Note the added variable in the class:
from lxml.etree import Element, SubElement, ElementTree
class XMLNamespaces:
    ex = 'http://www.example1.org'
    xmlns = 'http://www.example2.org'
    xyz = 'xyz'
root = Element('definitions', xmlns='http://www.example2.org', nsmap={'ex':XMLNamespaces.ex})
type_ref = SubElement(root, 'typeRef', nsmap={'ns2':XMLNamespaces.xyz})
type_ref.text = 'text'
tree = ElementTree(root)
tree.write('filename.xml', pretty_print=True)
# <definitions xmlns:ex="http://www.example1.org" xmlns="http://www.example2.org">
# <typeRef xmlns:ns2="xyz">text</typeRef>
# </definitions>

How to retrieve certain child elements using python and lxml

With lots of help from stack overflow, I managed to get some python code working to process xml files (using lxml). I've been able to adapt it for lots of different purposes, but there is one thing I can't work out.
Example XML:
<?xml version="1.0" encoding="UTF-8" ?>
<TVAMain xml:lang="PL" publisher="Someone" publicationTime="2014-01-03T06:24:24+00:00" version="217" xmlns="urn:tva:metadata:2010" xmlns:mpeg7="urn:tva:mpeg7:2008" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:tva:metadata:2010 http://Something.xsd">
<ProgramDescription>
<ProgramInformationTable>
<ProgramInformation programId="crid://bds.tv/88032"><BasicDescription>
<Title xml:lang="PL" type="episodeTitle">Some Title</Title>
<Synopsis xml:lang="PL" length="short">Some Synopsis</Synopsis>
<Genre href="urn:tva:metadata:cs:EventGenreCS:2009:96">
<Name xml:lang="EN">Some Genre</Name>
</Genre>
<Language>PL</Language>
<RelatedMaterial>
<HowRelated href="urn:eventis:metadata:cs:HowRelatedCS:2010:boxCover">
<Name>Box cover</Name>
</HowRelated>
<MediaLocator>
<mpeg7:MediaUri>file://Images/98528834.p.jpg</mpeg7:MediaUri>
</MediaLocator>
</RelatedMaterial>
The python code will return the Title, Genre and Synopsis, but it will not return the image reference (3rd line from the bottom). I presume this is because of the name format 'mpeg7:MediaUri' (which I cannot change). The code will return the 'No Image' string instead.
This is the relevant Python code:
file_name = input('Enter the file name, including .xml extension: ')
print('Parsing ' + file_name)
from lxml import etree
parser = etree.XMLParser()
tree = etree.parse(file_name, parser)
root = tree.getroot()
nsmap = {'xmlns': 'urn:tva:metadata:2010'}
with open(file_name+'.log', 'w', encoding='utf-8') as f:
    for info in root.xpath('//xmlns:ProgramInformation', namespaces=nsmap):
        crid = (info.get('programId'))
        titlex = (info.find('.//xmlns:Title', namespaces=nsmap))
        title = (titlex.text if titlex != None else 'No title')
        genrex = (info.find('.//xmlns:Genre/xmlns:Name', namespaces=nsmap))
        genre = (genrex.text if genrex != None else 'No Genre')
        imagex = (info.find('.//xmlns:RelatedMaterial/xmlns:MediaLocator/xmlns:"mpeg7:MediaUri"', namespaces=nsmap))
        image = (imagex.text if imagex != None else 'No Image')
        f.write('{}|{}|{}|{}\n'.format(crid, title, genre, image))
Can someone explain how I can adapt the 'imagex' line, so that it returns 'file://Images/98528834.p.jpg' from the example? I had a look at using square brackets, but it caused an error.
The node you are interested in is in the mpeg7 namespace, not the default namespace. You can use the syntax *[local-name() = "elementName"] to match an element by its local name (ignoring the namespace):
imagex = info.xpath(
    './/xmlns:RelatedMaterial/xmlns:MediaLocator/*[local-name() = "MediaUri"]',
    namespaces=nsmap)[0]
Or add the mpeg7 prefix to the namespaces mapping:
nsmap = {'xmlns': 'urn:tva:metadata:2010', 'mpeg7':'urn:tva:mpeg7:2008'}
Then you can use the mpeg7 prefix in the query:
imagex = (info.find('.//xmlns:RelatedMaterial/xmlns:MediaLocator/mpeg7:MediaUri', namespaces=nsmap))
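The second approach can be tried end to end on a trimmed copy of the sample. The sketch below uses the standard library's xml.etree.ElementTree rather than lxml (a swapped-in library; find accepts a namespaces mapping in a similar way). The prefix tva and the helper text_or are hypothetical names: prefixes in the mapping are arbitrary, only the URIs have to match the document.

```python
import xml.etree.ElementTree as ET

NSMAP = {"tva": "urn:tva:metadata:2010", "mpeg7": "urn:tva:mpeg7:2008"}

# Trimmed stand-in for the sample document (assumption).
xml_text = """<TVAMain xmlns="urn:tva:metadata:2010" xmlns:mpeg7="urn:tva:mpeg7:2008">
  <ProgramInformation programId="crid://bds.tv/88032">
    <BasicDescription>
      <Title>Some Title</Title>
      <RelatedMaterial>
        <MediaLocator>
          <mpeg7:MediaUri>file://Images/98528834.p.jpg</mpeg7:MediaUri>
        </MediaLocator>
      </RelatedMaterial>
    </BasicDescription>
  </ProgramInformation>
</TVAMain>"""
root = ET.fromstring(xml_text)

def text_or(elem, default):
    """Return elem.text, or default when the lookup found nothing (hypothetical helper)."""
    return elem.text if elem is not None else default

info = root.find(".//tva:ProgramInformation", NSMAP)
image = text_or(
    info.find(".//tva:RelatedMaterial/tva:MediaLocator/mpeg7:MediaUri", NSMAP),
    "No Image",
)
# The trimmed sample has no Genre element, so the default is used here.
genre = text_or(info.find(".//tva:Genre/tva:Name", NSMAP), "No Genre")
print(image, genre)
```

Unlike the xpath(...)[0] form, this returns the fallback string instead of raising an IndexError when the element is missing.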
