Using Python lxml to remove XML element values but leave one placeholder

I have an XML file in which I would like to clear the text of the 'value' child elements, but leave one empty value element as a placeholder for adding text at a later date. I am using Python's lxml module.
Here's an example of the XML section:
<spec class="Spec" name="New Test">
<mainreport>
<item name="New Item">First Item</item>
</mainreport>
<case class="CaseItem" name="Some Name">
<extraelement>
<item name="ID">Some Id</item>
</extraelement>
<pool class="String" name="Originator">
<value>A</value>
<value>B</value>
<value>C</value>
</pool>
<pool class="String" name="Target">
<value>D</value>
<value>E</value>
<value>F</value>
</pool>
And here's what I am hoping to output:
<spec class="Spec" name="New Test">
<mainreport>
<item name="New Item">First Item</item>
</mainreport>
<case class="CaseItem" name="Some Name">
<extraelement>
<item name="ID">Some Id</item>
</extraelement>
<pool class="String" name="Originator">
<value></value>
</pool>
<pool class="String" name="Target">
<value></value>
</pool>
I have written the following code, but it only adds the "value" tag to the last pool element:
import lxml.etree as et
import os
xml_match = os.path.join("input.xml")
doc = et.parse(xml_match)
for elem in doc.xpath('//case/pool/value'):
    elem.getparent().remove(elem)
blankval = et.Element("value")
blankval.text = ""
for elem in doc.xpath('//case/pool'):
    elem.insert(1, blankval)
outFile = "output.xml"
doc.write(outFile)

I would remove all value elements and append an empty one in a single loop:
for elem in doc.xpath('//case/pool'):
    for value in elem.findall("value"):
        elem.remove(value)
    blankval = et.Element("value")
    blankval.text = ""
    elem.append(blankval)
There is also a handy .clear() method, but it would also remove the pool element's attributes (class and name).
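To see why .clear() is less convenient here, a small sketch using the standard-library xml.etree.ElementTree (lxml's Element.clear() is documented to reset attributes in the same way). The one-pool XML string is a hypothetical sample:

```python
import xml.etree.ElementTree as ET

# Hypothetical single <pool> taken from the question's structure
XML = '<pool class="String" name="Target"><value>D</value><value>E</value></pool>'

# clear() removes children, text, tail AND attributes
pool = ET.fromstring(XML)
pool.clear()
cleared_attrib = dict(pool.attrib)
print(cleared_attrib)  # → {}

# To keep the attributes, save them before clear() and restore them afterwards
pool = ET.fromstring(XML)
saved = dict(pool.attrib)
pool.clear()
pool.attrib.update(saved)
ET.SubElement(pool, "value").text = ""  # fresh empty placeholder
# short_empty_elements=False writes <value></value> instead of <value />
restored = ET.tostring(pool, encoding="unicode", short_empty_elements=False)
print(restored)
```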
Your current approach does not work because you reuse the same blankval element. In lxml an element can have only one parent, so each insert() moves it out of the previous pool and it ends up only in the last one. Create a new element inside the loop before each insert:
for elem in doc.xpath('//case/pool'):
    blankval = et.Element("value")
    blankval.text = ""
    elem.insert(1, blankval)
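Putting the removal and the fresh-element fix together, here is a minimal self-contained sketch. It is written against the standard-library xml.etree.ElementTree so it runs without lxml; lxml.etree supports the same findall(), remove() and SubElement() calls. The trimmed, well-formed SAMPLE and the reset_pool_values() helper are hypothetical:

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed excerpt of the question's XML
SAMPLE = """<spec class="Spec" name="New Test">
  <case class="CaseItem" name="Some Name">
    <pool class="String" name="Originator">
      <value>A</value><value>B</value><value>C</value>
    </pool>
    <pool class="String" name="Target">
      <value>D</value><value>E</value><value>F</value>
    </pool>
  </case>
</spec>"""


def reset_pool_values(root):
    """Replace all <value> children of each case/pool with one empty placeholder."""
    for pool in root.findall(".//case/pool"):
        # findall() returns a new list, so removing while iterating over it is safe
        for value in pool.findall("value"):
            pool.remove(value)
        # SubElement creates a NEW element for every pool (no shared instance)
        placeholder = ET.SubElement(pool, "value")
        placeholder.text = ""
    return root


root = reset_pool_values(ET.fromstring(SAMPLE))
# short_empty_elements=False is an ElementTree option that writes <value></value>
print(ET.tostring(root, encoding="unicode", short_empty_elements=False))
```

For a file, parse it with ET.parse("input.xml"), pass tree.getroot() to the helper, and save with tree.write("output.xml", short_empty_elements=False).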

Related

How to remove a sub child of an xml file using python?

I have an XML file containing a particular set of child elements that should be deleted when the Python code runs.
Here is my XML:
<?xml version="1.0" encoding="utf-8" ?>
<visualization protocolVersion="10.4.0.0">
<globalSection/>
<coreObjectDefinition type="displayDefinition">
<version type="version" value="10.4.0.0"/>
<width>1920</width>
<height>810</height>
<referenceCheck>2</referenceCheck>
<defaultBgColor type="colorSet" r="255" g="255" b="255"/>
<defaultFgColor type="colorSet" r="0" g="0" b="0"/>
<defaultFont type="font" name="Tahoma" size="16" underline="false" strikethrough="false"/>
<defaultStroke type="stroke" width="1.0"/>
<grid type="grid" gridVisible="true" snappingActive="true" verticalSnapInterval="8" horizontalSnapInterval="8" onTop="false">
<color type="colorSet" r="0" g="0" b="0"/>
</grid>
<revisionHistory type="revisionHistory">
<revision type="revision" who="ADMIN" when="2020.05.03 09:46:15.566 CEST" what="Created" where="CPC-A0668-4138"/>
</revisionHistory>
<blinkDelay>500</blinkDelay>
<mousePassThrough>false</mousePassThrough>
<visibilityGroup type="componentData">
<htmlId>2</htmlId>
<name>Overview</name>
<description>Always shown</description>
<minimumZoomEnabled>true</minimumZoomEnabled>
<minimumZoomFactor>10.0</minimumZoomFactor>
</visibilityGroup>
<visibilityGroup type="componentData">
<htmlId>3</htmlId>
<name>Rough</name>
<description>Shown when viewing viewing a large area</description>
<minimumZoomEnabled>true</minimumZoomEnabled>
<minimumZoomFactor>25.0</minimumZoomFactor>
</visibilityGroup>
<visibilityGroup type="componentData">
<htmlId>4</htmlId>
<name>Standard</name>
<description>Shown when using the default view setting</description>
<minimumZoomEnabled>true</minimumZoomEnabled>
<minimumZoomFactor>100.0</minimumZoomFactor>
</visibilityGroup>
<visibilityGroup type="componentData">
<htmlId>5</htmlId>
<name>Detail</name>
<description>Shown only when viewing a small area</description>
<minimumZoomEnabled>true</minimumZoomEnabled>
<minimumZoomFactor>400.0</minimumZoomFactor>
</visibilityGroup>
<visibilityGroup type="componentData">
<htmlId>6</htmlId>
<name>Intricacies</name>
<description>Shown only when viewing a very small area</description>
<minimumZoomEnabled>true</minimumZoomEnabled>
<minimumZoomFactor>1000.0</minimumZoomFactor>
</visibilityGroup>
<visualizationLayer type="componentData">
<htmlId>1</htmlId>
<name>Layer1</name>
</visualizationLayer>
<componentCountHint>1</componentCountHint>
<ellipse type="componentData" x="851.99896" y="300.00006" top="92.000046" bottom="91.99985" left="99.99896" right="100.001526">
<htmlId>7</htmlId>
<stroke type="stroke" width="1.0"/>
<fillPaint type="paint">
<paint type="colorSet" r="255" g="255" b="255"/>
</fillPaint>
<data type="data">
<action type="actionConnectTo">
<property type="property" name="ellipse.visible"/>
<filter type="filter">
<value>0.0</value>
</filter>
<connection type="connection">
<direction>1</direction>
<itemName>AOG.Templates.Alarm</itemName>
<itemId>2.1.3.0.0.2.1.8</itemId>
</connection>
</action>
</data>
</ellipse>
</coreObjectDefinition>
</visualization>
I want to delete only the following part from the XML file:
<data type="data">
<action type="actionConnectTo">
<property type="property" name="ellipse.visible"/>
<filter type="filter">
<value>0.0</value>
</filter>
<connection type="connection">
<direction>1</direction>
<itemName>AOG.Templates.Alarm</itemName>
<itemId>2.1.3.0.0.2.1.8</itemId>
</connection>
</action>
</data>
The Python code below only removes a direct child of the root, not the nested sub-child. Kindly help me out with this.
from xml.etree import ElementTree
root = ElementTree.parse("test1.xml").getroot()
b = root[0]  # first direct child; getchildren() was removed in Python 3.9
root.remove(b)
ElementTree.dump(root)
Try this.
from simplified_scrapy import SimplifiedDoc,utils,req
html = '''Your xml'''
doc = SimplifiedDoc(html)
data = doc.select('data#type=data') # Get the element
data.repleaceSelf("") # Remove it
print(doc.html) # This is what you want
ElementTree can find nested elements, but Element.remove() only removes direct children, and elements keep no reference to their parent. So to remove the <data/> node, you have to call remove() on its direct parent, <ellipse>.
I'd do it this way:
for d in root.findall('coreObjectDefinition'):
    for e in d.findall('ellipse'):
        for f in e.findall('data'):
            e.remove(f)
ElementTree's limited XPath syntax also lets you search the tree recursively (for example root.findall('.//data')), so a shorter version of the above code is:
for d in root.findall('.//ellipse'):
    for e in d.findall('data'):
        d.remove(e)
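If the parent tag is not known in advance, a common ElementTree idiom is to build a child-to-parent map first and then remove each match through its parent. A minimal sketch, assuming a trimmed copy of the XML above; remove_nested() is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down version of the question's XML
SAMPLE = """<visualization>
<coreObjectDefinition type="displayDefinition">
<ellipse type="componentData">
<htmlId>7</htmlId>
<data type="data">
<action type="actionConnectTo"><filter type="filter"><value>0.0</value></filter></action>
</data>
</ellipse>
</coreObjectDefinition>
</visualization>"""


def remove_nested(root, path):
    """Remove every element matching `path`, however deeply it is nested."""
    # Elements have no parent pointer, so map each child to its parent first
    parent_of = {child: parent for parent in root.iter() for child in parent}
    # Collect the matches before mutating the tree
    for elem in list(root.iterfind(path)):
        parent_of[elem].remove(elem)


root = ET.fromstring(SAMPLE)
remove_nested(root, ".//data[@type='data']")
print(ET.tostring(root, encoding="unicode"))
```

With a real file, call remove_nested(tree.getroot(), ...) after tree = ET.parse("test1.xml") and save the result with tree.write(...).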

How do I write a function that takes an xml file and an integer value X as parameters and updates the attributes of the xml based on the given integer

I am trying to write a function that will take as parameters my xml file file.xml and an integer I want to input from the keyboard.
My XML file looks like this:
<root>
<item name="A" days="10"/>
<item name="B" days="20"/>
</root>
I have the number X:
X = float(input("X value is: "))  # float rather than int, since X can be 1.1
I want to add X to the days attribute in my XML.
For X = 1.1, I want the output:
A, 11.1 days
B, 21.1 days
I don't know how to write the function, because when I tried calling it, the name of the file I wanted to open was not recognized:
read_xml(file.xml)
NameError : name 'file' is not defined.
But more importantly, I don't know how to add an integer value to the attribute of an xml file.
What I have so far, using the ElementTree library:
import os
import xml.etree.ElementTree as et
tree = et.ElementTree(file = 'file.xml')
root = tree.getroot()
for item in root.findall('item'):
    names = item.get('name')
    ages = item.get('age')
    genders = item.get('sex')
    print(f'''\n{names}, {ages} years old''')
At this moment I get the desired output format but without the integer X added to the days attribute.
Please let me know if you have any idea how to solve this in Python3.
Thanks!!!
import xml.etree.ElementTree as ET
xml = '''<root>
<item name="A" days="10"/>
<item name="B" days="20"/>
</root>'''
def change_days_value(x):
    root = ET.fromstring(xml)
    for item in root.findall('.//item'):
        # attribute values are strings: convert, add x, and store the result as a string
        item.set('days', str(int(item.get('days')) + x))
    ET.dump(root)
# read this value from the user, e.g. x = float(input("X value is: "))
x = 1.1
change_days_value(x)
output (Python 3.8+ keeps the original attribute order)
<root>
<item name="A" days="11.1" />
<item name="B" days="21.1" />
</root>
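The question also asks for a function that takes the file name and X as parameters. A minimal sketch, assuming the XML lives in a file; add_to_days() and the temporary file are hypothetical and only make the example self-contained. The file name is passed as a string ('file.xml'), which is what avoids the NameError raised by read_xml(file.xml):

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def add_to_days(path, x, out_path=None):
    """Add x to the days attribute of every <item> in the file at `path`.

    Returns a list of (name, new_days) pairs; writes the updated XML to
    out_path when one is given.
    """
    tree = ET.parse(path)  # path is a string such as 'file.xml'
    results = []
    for item in tree.getroot().iter("item"):
        new_days = float(item.get("days")) + x
        item.set("days", str(new_days))  # attributes must be stored as strings
        results.append((item.get("name"), new_days))
    if out_path is not None:
        tree.write(out_path)
    return results


# Self-contained demo: write the question's XML to a temporary file first
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "file.xml")
    with open(src, "w") as f:
        f.write('<root>\n<item name="A" days="10"/>\n<item name="B" days="20"/>\n</root>')
    results = add_to_days(src, 1.1)  # in practice: float(input("X value is: "))

for name, days in results:
    print(f"{name}, {days} days")
```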

python ElementTree remove issue

I have an XML file as follows:
<plugin-config>
<properties>
<property name="AZSRVC_CONNECTION" value="diamond_plugins#AZSRVC" />
<property name="DIAMOND_HOST" value="10.0.230.1" />
<property name="DIAMOND_PORT" value="3333" />
</properties>
<pack-list>
<vsme-pack id="monthly_50MB">
<campaign-list>
<campaign id="2759" type="SOB" />
<campaign id="2723" type="SUBSCRIBE" />
</campaign-list>
</vsme-pack>
<vsme-pack id="monthly_500MB">
<campaign-list>
<campaign id="3879" type="SOB" />
<campaign id="3885" type="SOB" />
<campaign id="2724" type="SUBSCRIBE" />
<campaign id="1111" type="COB" /></campaign-list>
</vsme-pack>
</pack-list>
</plugin-config>
I am trying to run this Python script to remove the campaign with a specific id:
import xml.etree.ElementTree as ET
tree = ET.parse('pack-assign-config.xml')
root = tree.getroot()
pack_id = 'monthly_50MB'  # example value
pack_list = root.find('pack-list')
camp_list = pack_list.find(".//vsme-pack[@id='{pack_id}']".format(pack_id=pack_id)).find('campaign-list').findall('campaign')
for camp in camp_list:
    if camp.get('id') == '2759':
        camp_list.remove(camp)
tree.write('out.xml')
When I run the script, out.xml is the same as the input file; the element is not removed.
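Issue:
The lookup itself works once the attribute predicate is written as [@id=...]; the problem is the removal. findall() returns an ordinary Python list, so camp_list.remove(camp) only removes the campaign from that list, never from the XML tree:
camp_list = pack_list.find(".//vsme-pack[@id='{pack_id}']".format(pack_id=pack_id)).find('campaign-list').findall('campaign')
To change the tree, call remove() on the parent <campaign-list> element instead.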
Fixed Code Example
Here is working code that removes the node from the XML:
import xml.etree.ElementTree as ET
tree = ET.parse('pack-assign-config.xml')
ids_to_remove = {'2759', '3879'}
# Each <vsme-pack> under <pack-list> holds a <campaign-list> of <campaign> elements
for pack in tree.findall('.//pack-list/vsme-pack'):
    for camp_list in pack.findall('campaign-list'):
        # findall() returns a separate list, so removing from the tree while looping is safe
        for camp in camp_list.findall('campaign'):
            if camp.get('id') in ids_to_remove:
                # remove() must be called on the campaign's direct parent element
                camp_list.remove(camp)
tree.write("out.xml")
Hope this helps.
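If you only want to touch one pack, as the question's own lookup does, a minimal sketch is to find that pack's <campaign-list> element and call remove() on it. remove_campaign() and the trimmed sample are hypothetical; the ids are interpolated directly into the path, so this assumes they contain no quote characters:

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed copy of pack-assign-config.xml
SAMPLE = """<plugin-config><pack-list>
<vsme-pack id="monthly_50MB"><campaign-list>
<campaign id="2759" type="SOB" /><campaign id="2723" type="SUBSCRIBE" />
</campaign-list></vsme-pack>
</pack-list></plugin-config>"""


def remove_campaign(root, pack_id, campaign_id):
    """Remove <campaign id=campaign_id> from the given pack; return True if removed."""
    camp_list = root.find(f".//vsme-pack[@id='{pack_id}']/campaign-list")
    if camp_list is None:
        return False
    matches = camp_list.findall(f"campaign[@id='{campaign_id}']")
    for camp in matches:
        camp_list.remove(camp)  # the parent element, not a Python list, does the removal
    return bool(matches)


root = ET.fromstring(SAMPLE)
removed = remove_campaign(root, "monthly_50MB", "2759")
print(removed, [c.get("id") for c in root.iter("campaign")])
```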

Using xmltodict for Python, how can I reference a non-specific XML property and change the value?

I have an XML file like this:
<root processName="someName" >
<Property name="FirstProp" value="one" />
<Property name="SecondProp" value="two" />
<Property name="ThirdProp" value="three" />
</root>
Is it possible, using xmltodict, to find the Property "SecondProp" without knowing its index, and change its value from "two" to "seventeen"?
Code:
import os
import xmltodict
text_file = open("testxml.xml", "r")
tst = text_file.read()
obj = xmltodict.parse(tst)
print(obj['root']['@processName'])
print(obj['root']['Property'][0])
print(obj['root']['Property'][1])
print(obj['root']['Property'][2])
Output:
someName
OrderedDict([('@name', 'FirstProp'), ('@value', 'one')])
OrderedDict([('@name', 'SecondProp'), ('@value', 'two')])
OrderedDict([('@name', 'ThirdProp'), ('@value', 'three')])
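Because root contains several Property elements, xmltodict parses obj['root']['Property'] as a list, not a dict, so you can iterate over it and find the one you're looking for. (With a single Property element it would be a plain dict; xmltodict.parse() accepts force_list=('Property',) to always get a list.)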
Example:
for x in obj['root']['Property']:
    if x['@name'] == 'SecondProp':
        x['@value'] = 'seventeen'  # or whatever change you need
        break
print(xmltodict.unparse(obj, pretty=True))
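Because a lone Property element comes back as a dict rather than a list, a small helper can handle both shapes. A minimal sketch, assuming the structure xmltodict.parse() produces for the file above ('@'-prefixed attribute keys); set_property() is a hypothetical name, and the literal dict stands in for the parsed object so the example runs without xmltodict:

```python
def set_property(obj, name, new_value):
    """Set '@value' on the Property whose '@name' matches; return True if found."""
    props = obj["root"]["Property"]
    # A single <Property> element is parsed as a dict, several as a list
    if isinstance(props, dict):
        props = [props]
    for prop in props:
        if prop.get("@name") == name:
            prop["@value"] = new_value  # mutates the parsed object in place
            return True
    return False


# Hypothetical stand-in for xmltodict.parse(tst) on the question's file
obj = {
    "root": {
        "@processName": "someName",
        "Property": [
            {"@name": "FirstProp", "@value": "one"},
            {"@name": "SecondProp", "@value": "two"},
            {"@name": "ThirdProp", "@value": "three"},
        ],
    }
}

set_property(obj, "SecondProp", "seventeen")
print(obj["root"]["Property"][1])
```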

Parsing XML using Python on SelectSingleNode

Using the following XML data, I want to get the value for the keys I request in my Python code, and I want to do this without any third-party libraries.
<Userinfo>
<UserData>
<item key="DateOfBirth" value="19851103" />
<item key="FirstName" value="John" />
<item key="LastName" value="Dicaprio" />
<item key="Gender" value="M" />
<item key="Email" value="john@abc.com" />
<item key="ContactNo" value="235625341" />
</UserData>
</Userinfo>
From the XML above, I want to extract the value for the key I request in the Python code below.
def ExtractXml(args):
    url = '....'
    wc = System.Net.WebClient()
    xml = wc.DownloadString(url)
    doc = System.Xml.XmlDocument()
    doc.LoadXml(xml)
    root = doc.DocumentElement
    nsmgr = System.Xml.XmlNamespaceManager(doc.NameTable)
    #nsmgr.AddNamespace('ns','http://schemas.microsoft.com/developer/msbuild/2003')
    node = root.SelectNodes('/Userinfo/UserData',nsmgr)
    tcount=root.SelectNodes('/Userinfo/UserData').Count
    if not node:
        ServiceDesk.Log.PrintError('No condition node')
        return
    r=[]
    t={}
    counts=0
    for itemNode in node:
        counts += 1
        fullname = xstr(itemNode.SelectSingleNode("/item[@key='FirstName']/@value",nsmgr))
        empname = xstr(itemNode.SelectSingleNode("/item[@key='LastName']/@value",nsmgr))
        cardcountry = xstr(itemNode.SelectSingleNode("/item[@key='Email']/@value",nsmgr))
        #birthdate = ServiceDesk.Common.ParseDateTime(itemNode.SelectSingleNode("item[@key='DateOfBirth']"))
        t = {'counter':counts,'FirstName':fullname,'LastName':empname,'Email':cardcountry,'__rowid__':counts,'__totalcount__':tcount}
        r.append(t)
    return r
With this code, the SelectSingleNode calls don't retrieve the value for the requested key. Thanks in advance.
I'm not sure about IronPython, but Python's standard library includes platform-independent XML parsers. Here's an example that uses xml.etree.ElementTree:
import datetime
import xml.etree.ElementTree as ET
xml = '''<Userinfo>
<UserData>
<item key="DateOfBirth" value="19851103" />
<item key="FirstName" value="John" />
<item key="LastName" value="Dicaprio" />
<item key="Gender" value="M" />
<item key="Email" value="john@abc.com" />
<item key="ContactNo" value="235625341" />
</UserData>
</Userinfo>'''
root = ET.fromstring(xml)
fname = root.findall(".//item[@key='FirstName']")[0].get('value')
dob = datetime.datetime.strptime(
    root.findall(".//item[@key='DateOfBirth']")[0].get('value'),
    '%Y%m%d')
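To look up several keys at once, the item elements can be collected into a dictionary. A minimal sketch using the same standard-library parser; user_fields() is a hypothetical helper and the sample is a trimmed copy of the XML above:

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed copy of the question's XML
SAMPLE = """<Userinfo>
<UserData>
<item key="FirstName" value="John" />
<item key="LastName" value="Dicaprio" />
<item key="Email" value="john@abc.com" />
</UserData>
</Userinfo>"""


def user_fields(xml_text):
    """Return {key: value} for every <item> directly under <UserData>."""
    root = ET.fromstring(xml_text)  # root is the <Userinfo> element
    return {item.get("key"): item.get("value") for item in root.iterfind("./UserData/item")}


fields = user_fields(SAMPLE)
print(fields["FirstName"], fields["LastName"], fields["Email"])
```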
Currently the XPath expressions in the for loop are all evaluated relative to the document node, because they start with /. To make them relative to the context element (here itemNode), either prefix them with . or drop the leading / completely:
for itemNode in node:
    counts += 1
    # here are some ways to make your XPath heed the context element `itemNode`
    fullname = xstr(itemNode.SelectSingleNode("./item[@key='FirstName']/@value",nsmgr))
    empname = xstr(itemNode.SelectSingleNode("item[@key='LastName']/@value",nsmgr))
    cardcountry = xstr(itemNode.SelectSingleNode("self::*/item[@key='Email']/@value",nsmgr))
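ElementTree's limited XPath follows the same relative-path rule and is stricter about it: a path that starts with / is rejected on an element. A small sketch of the ./ and bare forms (the self:: axis is not supported by ElementTree, so it is omitted), with user_data playing the role of itemNode; the inline XML is a hypothetical sample:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<Userinfo><UserData>'
    '<item key="FirstName" value="John" />'
    '<item key="Email" value="john@abc.com" />'
    '</UserData></Userinfo>'
)

found = []
for user_data in root.findall("UserData"):  # plays the role of itemNode
    # Both paths are evaluated relative to the context element user_data
    first = user_data.find("./item[@key='FirstName']").get("value")
    email = user_data.find("item[@key='Email']").get("value")
    found.append((first, email))
print(found)

# ElementTree refuses an absolute path on an Element instead of jumping to the document
try:
    root.find("/Userinfo/UserData")
    rejected = False
except SyntaxError as exc:
    rejected = True
    print("rejected:", exc)
```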
