How to remove a sub child of an xml file using python? - python

I have an XML file that contains a particular block of child elements which should be deleted when my Python code runs.
My XML file is shown below.
<?xml version="1.0" encoding="utf-8" ?>
<visualization protocolVersion="10.4.0.0">
<globalSection/>
<coreObjectDefinition type="displayDefinition">
<version type="version" value="10.4.0.0"/>
<width>1920</width>
<height>810</height>
<referenceCheck>2</referenceCheck>
<defaultBgColor type="colorSet" r="255" g="255" b="255"/>
<defaultFgColor type="colorSet" r="0" g="0" b="0"/>
<defaultFont type="font" name="Tahoma" size="16" underline="false" strikethrough="false"/>
<defaultStroke type="stroke" width="1.0"/>
<grid type="grid" gridVisible="true" snappingActive="true" verticalSnapInterval="8" horizontalSnapInterval="8" onTop="false">
<color type="colorSet" r="0" g="0" b="0"/>
</grid>
<revisionHistory type="revisionHistory">
<revision type="revision" who="ADMIN" when="2020.05.03 09:46:15.566 CEST" what="Created" where="CPC-A0668-4138"/>
</revisionHistory>
<blinkDelay>500</blinkDelay>
<mousePassThrough>false</mousePassThrough>
<visibilityGroup type="componentData">
<htmlId>2</htmlId>
<name>Overview</name>
<description>Always shown</description>
<minimumZoomEnabled>true</minimumZoomEnabled>
<minimumZoomFactor>10.0</minimumZoomFactor>
</visibilityGroup>
<visibilityGroup type="componentData">
<htmlId>3</htmlId>
<name>Rough</name>
<description>Shown when viewing viewing a large area</description>
<minimumZoomEnabled>true</minimumZoomEnabled>
<minimumZoomFactor>25.0</minimumZoomFactor>
</visibilityGroup>
<visibilityGroup type="componentData">
<htmlId>4</htmlId>
<name>Standard</name>
<description>Shown when using the default view setting</description>
<minimumZoomEnabled>true</minimumZoomEnabled>
<minimumZoomFactor>100.0</minimumZoomFactor>
</visibilityGroup>
<visibilityGroup type="componentData">
<htmlId>5</htmlId>
<name>Detail</name>
<description>Shown only when viewing a small area</description>
<minimumZoomEnabled>true</minimumZoomEnabled>
<minimumZoomFactor>400.0</minimumZoomFactor>
</visibilityGroup>
<visibilityGroup type="componentData">
<htmlId>6</htmlId>
<name>Intricacies</name>
<description>Shown only when viewing a very small area</description>
<minimumZoomEnabled>true</minimumZoomEnabled>
<minimumZoomFactor>1000.0</minimumZoomFactor>
</visibilityGroup>
<visualizationLayer type="componentData">
<htmlId>1</htmlId>
<name>Layer1</name>
</visualizationLayer>
<componentCountHint>1</componentCountHint>
<ellipse type="componentData" x="851.99896" y="300.00006" top="92.000046" bottom="91.99985" left="99.99896" right="100.001526">
<htmlId>7</htmlId>
<stroke type="stroke" width="1.0"/>
<fillPaint type="paint">
<paint type="colorSet" r="255" g="255" b="255"/>
</fillPaint>
<data type="data">
<action type="actionConnectTo">
<property type="property" name="ellipse.visible"/>
<filter type="filter">
<value>0.0</value>
</filter>
<connection type="connection">
<direction>1</direction>
<itemName>AOG.Templates.Alarm</itemName>
<itemId>2.1.3.0.0.2.1.8</itemId>
</connection>
</action>
</data>
</ellipse>
</coreObjectDefinition>
</visualization>
I want only the below part to be deleted from the entire xml file.
<data type="data">
<action type="actionConnectTo">
<property type="property" name="ellipse.visible"/>
<filter type="filter">
<value>0.0</value>
</filter>
<connection type="connection">
<direction>1</direction>
<itemName>AOG.Templates.Alarm</itemName>
<itemId>2.1.3.0.0.2.1.8</itemId>
</connection>
</action>
</data>
The Python code below only removes a direct child of the root, not the nested sub-child. Kindly help me out on this.
from xml.etree import ElementTree
root = ElementTree.parse("test1.xml").getroot()
b = root.getchildren()[0]
root.remove(b)
ElementTree.dump(root)

Try this.
from simplified_scrapy import SimplifiedDoc,utils,req
html = '''Your xml'''
doc = SimplifiedDoc(html)
data = doc.select('data#type=data') # Get the element
data.repleaceSelf("") # Remove it
print(doc.html) # This is what you want

Unfortunately, ElementTree cannot remove a sub-child through the root: remove() only works on direct children. Each node only has "pointers" to its direct children, and none to its parent. So, in order to remove the <data/> node, you have to call remove() on its direct parent node.
I'd do it in this way:
for d in root.findall('coreObjectDefinition'):
    for e in d.findall('ellipse'):
        for f in e.findall('data'):
            e.remove(f)
ElementTree's path syntax also lets you search the tree recursively, so you can find the element at any depth with root.findall('.//data'). You still need its parent to remove it, so a shorter version of the above code would be:
for d in root.findall('.//ellipse'):
    for e in d.findall('data'):
        d.remove(e)
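For illustration, here is a minimal, self-contained sketch of the same idea that does not depend on knowing the parent's tag name. The XML is a trimmed-down stand-in for the question's file, and remove_all is a hypothetical helper name, not part of ElementTree.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the question's file (assumption: same nesting).
XML = """<visualization>
  <coreObjectDefinition>
    <ellipse>
      <htmlId>7</htmlId>
      <data type="data"><action type="actionConnectTo"/></data>
    </ellipse>
  </coreObjectDefinition>
</visualization>"""


def remove_all(root, tag):
    """Remove every element named `tag` below `root`; return how many were removed."""
    removed = 0
    # list(...) takes a snapshot so the tree can be modified while looping.
    for parent in list(root.iter()):
        for child in parent.findall(tag):  # direct children of `parent` only
            parent.remove(child)
            removed += 1
    return removed


root = ET.fromstring(XML)
count = remove_all(root, "data")
print(count)
print(ET.tostring(root, encoding="unicode"))
```

The loop visits every element as a potential parent, which is what lets remove() be called on the right node at any depth.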

Related

Parsing XML with namespace into dictionary

I'm having a hard time following the xml.etree.ElementTree documentation with regard to parsing an XML document with a namespace and nested tags.
To begin, the xml tree I am trying to parse looks like:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ROOT-MAIN xmlns="http://fakeurl.com/page">
<Alarm> <--- I dont care about these types of objects
<Node>
<location>Texas></location>
<name>John</name>
</Node>
</Alarm>
<Alarm> <--- I care about these types of objects
<CreateTime>01/01/2011</CreateTime>
<Story>
<Node>
<Name>Ethan</Name>
<Address category="residential">
<address>1421 Morning SE</address>
</Address>
</Node>
</Story>
<Build>
<Action category="build_value_1">Build was successful</Action>
</Build>
<OtherData type="string" meaning="favoriteTVShow">Purple</OtherData>
<OtherData type="string" meaning="favoriteColor">Seinfeld</OtherData>
</Alarm>
</ROOT-MAIN>
I am trying to build an array of dictionaries that have a similar structure to the second < Alarm > object. When parsing this XML file, I do the following:
import xml.etree.ElementTree as ET
tree = ET.parse('data/'+filename)
root = tree.getroot()
namespace= '{http://fakeurl.com/page}'
for alarm in tree.findall(namespace+'Alarm'):
    for elem in alarm.iter():
        try:
            creation_time = elem.find(namespace+'CreateTime')
            for story in elem.findall(namespace+'Story'):
                for node in story.findall(namespace+'Node'):
                    for Address in node.findall(namespace+'Address'):
                        address = Address.find(namespace+'address').text
            for build in elem.findall(namespace+'Build'):
                category= build.find(namespace+'Action').attrib
                action = build.find(namespace+'Action').text
            for otherdata in elem.findall(namespace+'OtherData'):
                #not sure how to get the 'meaning' attribute value as well as the text value for these <OtherData> tags
                pass
        except:
            pass
Right now I'm just trying to get values for:
< address >
< Action > (attribute value and text value)
< OtherData > (attribute value and text value)
I'm sort of able to do this with for loops within for-loops but I was hoping for a cleaner, xpath solution which I haven't figured out how to do with a namespace.
Any suggestions would be much appreciated.
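Here is one way (it collects a subset of the elements you mentioned; add more code to collect the rest). It strips the default namespace from the XML string before parsing, so plain tag names can be used: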
import xml.etree.ElementTree as ET
import re
xmlstring = '''<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<root xmlns="http://fakeurl.com/page">
<Alarm>
<Node>
<location>Texas></location>
<name>John</name>
</Node>
</Alarm>
<Alarm>
<CreateTime>01/01/2011</CreateTime>
<Story>
<Node>
<Name>Ethan</Name>
<Address category="residential">
<address>1421 Morning SE</address>
</Address>
</Node>
</Story>
<Build>
<Action category="build_value_1">Build was successful</Action>
</Build>
<OtherData type="string" meaning="favoriteTVShow">Purple</OtherData>
<OtherData type="string" meaning="favoriteColor">Seinfeld</OtherData>
</Alarm>
</root>'''
xmlstring = re.sub(' xmlns="[^"]+"', '', xmlstring, count=1)
root = ET.fromstring(xmlstring)
alarms = root.findall('Alarm')
alarms_list = []
for alarm in alarms:
    create_time = alarm.find('CreateTime')
    if create_time is not None:
        entry = {'create_time': create_time.text}
        alarms_list.append(entry)
        actions = alarm.findall('Build/Action')
        if actions:
            entry['builds'] = []
            for action in actions:
                entry['builds'].append({'category': action.attrib['category'], 'status': action.text})
print(alarms_list)
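If you would rather keep the namespace than strip it with a regex, a possible alternative is to pass a prefix map to find/findall. The sketch below assumes a cleaned-up version of the question's XML; the prefix "p" and the dictionary keys are arbitrary choices made for this example. It also shows one way to read both the 'meaning' attribute and the text of each OtherData tag.

```python
import xml.etree.ElementTree as ET

# Cleaned-up copy of the question's document (tags closed properly).
XML = """<ROOT-MAIN xmlns="http://fakeurl.com/page">
  <Alarm>
    <Node><location>Texas</location><name>John</name></Node>
  </Alarm>
  <Alarm>
    <CreateTime>01/01/2011</CreateTime>
    <Story><Node><Name>Ethan</Name>
      <Address category="residential"><address>1421 Morning SE</address></Address>
    </Node></Story>
    <Build><Action category="build_value_1">Build was successful</Action></Build>
    <OtherData type="string" meaning="favoriteTVShow">Purple</OtherData>
    <OtherData type="string" meaning="favoriteColor">Seinfeld</OtherData>
  </Alarm>
</ROOT-MAIN>"""

# "p" is an arbitrary prefix chosen for this sketch; only the URI must match.
NS = {"p": "http://fakeurl.com/page"}

root = ET.fromstring(XML)
alarms = []
for alarm in root.findall("p:Alarm", NS):
    created = alarm.find("p:CreateTime", NS)
    if created is None:
        continue  # skip the alarms that have no CreateTime
    address = alarm.find("p:Story/p:Node/p:Address/p:address", NS)
    action = alarm.find("p:Build/p:Action", NS)
    alarms.append({
        "create_time": created.text,
        "address": address.text if address is not None else None,
        "action": ({"category": action.get("category"), "text": action.text}
                   if action is not None else None),
        # Map each 'meaning' attribute to the element's text.
        "other_data": {o.get("meaning"): o.text
                       for o in alarm.findall("p:OtherData", NS)},
    })
print(alarms)
```

Attributes without a prefix are not in the default namespace, so get("meaning") and get("category") work without any namespace handling.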

python ElementTree remove issue

I have xml file as following:
<plugin-config>
<properties>
<property name="AZSRVC_CONNECTION" value="diamond_plugins#AZSRVC" />
<property name="DIAMOND_HOST" value="10.0.230.1" />
<property name="DIAMOND_PORT" value="3333" />
</properties>
<pack-list>
<vsme-pack id="monthly_50MB">
<campaign-list>
<campaign id="2759" type="SOB" />
<campaign id="2723" type="SUBSCRIBE" />
</campaign-list>
</vsme-pack>
<vsme-pack id="monthly_500MB">
<campaign-list>
<campaign id="3879" type="SOB" />
<campaign id="3885" type="SOB" />
<campaign id="2724" type="SUBSCRIBE" />
<campaign id="1111" type="COB" /></campaign-list>
</vsme-pack>
</pack-list>
</plugin-config>
I am trying to run this Python script to remove the 'campaign' with a specific id.
import xml.etree.ElementTree as ET
tree = ET.parse('pack-assign-config.xml')
root = tree.getroot()
pack_list = root.find('pack-list')
camp_list = pack_list.find(".//vsme-pack[@id='{pack_id}']".format(pack_id=pack_id)).find('campaign-list').findall('campaign')
for camp in camp_list:
    if camp.get('id') == '2759':
        camp_list.remove(camp)
tree.write('out.xml')
I run the script, but the output is the same as the input file, so the element is not removed.
Issue:
The lookup below ends with .findall('campaign'), which returns a plain Python list. Calling remove() on that list only changes the list, not the XML tree, so the output file is unchanged. The element has to be removed from its parent campaign-list element instead.
camp_list = pack_list.find(".//vsme-pack[@id='{pack_id}']".format(pack_id=pack_id)).find('campaign-list').findall('campaign')
Fixed Code Example
Here is working code that removes the node from the XML:
import xml.etree.ElementTree as ET
tree = ET.parse('pack-assign-config.xml')
# Visit every <vsme-pack> element under <pack-list>
for pack in tree.findall('.//pack-list/*'):
    # <campaign-list> is the direct parent of the <campaign> elements
    for campaign_list in pack.findall('campaign-list'):
        for campaign in campaign_list.findall('campaign'):
            if campaign.get('id') in ('2759', '3879'):
                # remove() must be called on the direct parent element
                campaign_list.remove(campaign)
tree.write("out.xml")
Hope this helps.
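To target one campaign inside one specific pack, as the question intended, a possible sketch is shown below. remove_campaign is a hypothetical helper written for this example, and the XML is a shortened copy of the question's file. The key point is that the lookup stops at campaign-list, so remove() is called on the element that actually owns the campaign.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's configuration file.
XML = """<plugin-config>
  <pack-list>
    <vsme-pack id="monthly_50MB">
      <campaign-list>
        <campaign id="2759" type="SOB" />
        <campaign id="2723" type="SUBSCRIBE" />
      </campaign-list>
    </vsme-pack>
    <vsme-pack id="monthly_500MB">
      <campaign-list>
        <campaign id="3879" type="SOB" />
      </campaign-list>
    </vsme-pack>
  </pack-list>
</plugin-config>"""


def remove_campaign(root, pack_id, campaign_id):
    """Remove one campaign from one pack; return True if something was removed."""
    # Stop at <campaign-list>: it is the direct parent we must call remove() on.
    path = ".//vsme-pack[@id='{}']/campaign-list".format(pack_id)
    campaign_list = root.find(path)
    if campaign_list is None:
        return False
    target = campaign_list.find("campaign[@id='{}']".format(campaign_id))
    if target is None:
        return False
    campaign_list.remove(target)
    return True


root = ET.fromstring(XML)
removed = remove_campaign(root, "monthly_50MB", "2759")
remaining = [c.get("id") for c in root.iter("campaign")]
print(removed, remaining)
```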

Using xmltodict for Python, how can I reference a non-specific XML property and change the value?

I have an xml file as such:
<root processName="someName" >
<Property name="FirstProp" value="one" />
<Property name="SecondProp" value="two" />
<Property name="ThirdProp" value="three" />
</root>
Is it possible, using xmltodict, to find the Property "SecondProp" without knowing the specific index and change value from "two" to "seventeen"? (below)
Code:
import os
import xmltodict
text_file = open("testxml.xml", "r")
tst = text_file.read()
obj = xmltodict.parse(tst)
print(obj['root']['@processName'])
print(obj['root']['Property'][0])
print(obj['root']['Property'][1])
print(obj['root']['Property'][2])
Output:
someName
OrderedDict([('@name', 'FirstProp'), ('@value', 'one')])
OrderedDict([('@name', 'SecondProp'), ('@value', 'two')])
OrderedDict([('@name', 'ThirdProp'), ('@value', 'three')])
You can iterate through obj['root']['Property'] and find the one you're looking for. Because the XML contains several Property elements, xmltodict parses obj['root']['Property'] into a list (one entry per element), not a dict.
Example:
for x in obj['root']['Property']:
    if x['@name'] == 'SecondProp':
        x['@value'] = 'seventeen'  # change the value in place

Using Python LXML to remove XML element values but leave one placeholder

I have an XML file which I would like to clear the text in the 'value' child elements, but leave one empty value element as a placeholder for adding text at a later date. I am using Python's LXML module.
Here's an example of the XML section:
<spec class="Spec" name="New Test">
<mainreport>
<item name="New Item">First Item</item>
</mainreport>
<case class="CaseItem" name="Some Name">
<extraelement>
<item name="ID">Some Id</item>
</extraelement>
<pool class="String" name="Originator">
<value>A</value>
<value>B</value>
<value>C</value>
</pool>
<pool class="String" name="Target">
<value>D</value>
<value>E</value>
<value>F</value>
</pool>
And here's what I am hoping to output:
<spec class="Spec" name="New Test">
<mainreport>
<item name="New Item">First Item</item>
</mainreport>
<case class="CaseItem" name="Some Name">
<extraelement>
<item name="ID">Some Id</item>
</extraelement>
<pool class="String" name="Originator">
<value></value>
</pool>
<pool class="String" name="Target">
<value></value>
</pool>
I have written the following code, but it only adds the "value" tag to the last element:
import lxml.etree as et
import os
xml_match = os.path.join("input.xml")
doc = et.parse(xml_match)
for elem in doc.xpath('//case/pool/value'):
    elem.getparent().remove(elem)
blankval = et.Element("value")
blankval.text = ""
for elem in doc.xpath('//case/pool'):
    elem.insert(1, blankval)
outFile = "output.xml"
doc.write(outFile)
I would remove all value elements and append an empty one in a single loop:
for elem in doc.xpath('//case/pool'):
    for value in elem.findall("value"):
        elem.remove(value)
    blankval = et.Element("value")
    blankval.text = ""
    elem.append(blankval)
There is also a handy .clear() method, but it would also clear the attributes.
The reason your current approach is not working is that you are reusing the very same blankval element. In lxml an element can have only one parent, so each insert moves it to the next pool and only the last pool keeps it. Instead, create a new element inside the loop before each insert:
for elem in doc.xpath('//case/pool'):
    blankval = et.Element("value")
    blankval.text = ""
    elem.insert(1, blankval)
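As a runnable illustration of the single-loop approach, here is a sketch that swaps in the standard library's xml.etree.ElementTree instead of lxml, so it needs no third-party install. That swap is an assumption of this example: ElementTree has no xpath() method, so the relative path './/case/pool' is used with findall, and the XML is a shortened, closed-off copy of the question's excerpt.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's excerpt, with the closing tags added.
XML = """<spec class="Spec" name="New Test">
  <case class="CaseItem" name="Some Name">
    <pool class="String" name="Originator">
      <value>A</value><value>B</value><value>C</value>
    </pool>
    <pool class="String" name="Target">
      <value>D</value><value>E</value><value>F</value>
    </pool>
  </case>
</spec>"""

root = ET.fromstring(XML)
for pool in root.findall(".//case/pool"):
    # findall returns a list, so removing while looping is safe here.
    for value in pool.findall("value"):
        pool.remove(value)
    # A fresh element per pool; the attributes of <pool> are left untouched.
    ET.SubElement(pool, "value").text = ""

counts = [len(pool.findall("value")) for pool in root.iter("pool")]
names = [pool.get("name") for pool in root.iter("pool")]
print(counts, names)
```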

Python, XML parsing, and Elementtree

What am I screwing up here?
I can't get this to return any results. I'm sure I'm doing something stupid. I'm not a programmer and this is driving me crazy. Trying to learn but after about 8 hours I'm frazzled.
Here is a sample of my XML:
<?xml version="1.0"?>
-<MyObjectBuilder_Sector xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<!-- Saved '2014-08-23T15:28:07.8585220-05:00' with SEToolbox version '1.44.14.2' -->
-<Position>
<X>0</X>
<Y>0</Y>
<Z>0</Z>
</Position>
-<SectorEvents>
-<Events>
-<MyObjectBuilder_GlobalEventBase>
-<DefinitionId>
<TypeId>MyObjectBuilder_GlobalEventDefinition</TypeId>
<SubtypeId>SpawnCargoShip</SubtypeId>
</DefinitionId>
<Enabled>false</Enabled>
<ActivationTimeMs>401522</ActivationTimeMs>
</MyObjectBuilder_GlobalEventBase>
</Events>
</SectorEvents>
<AppVersion>1044014</AppVersion>
-<SectorObjects>
-<MyObjectBuilder_EntityBase xsi:type="MyObjectBuilder_VoxelMap">
<EntityId>72248529206701361</EntityId>
<PersistentFlags>CastShadows InScene</PersistentFlags>
-<PositionAndOrientation>
<Position z="-466" y="-8987" x="-95"/>
<Forward z="-1" y="0" x="0"/>
<Up z="0" y="1" x="0"/>
</PositionAndOrientation>
<Filename>BaseAsteroid.vox</Filename>
</MyObjectBuilder_EntityBase>
-<MyObjectBuilder_EntityBase xsi:type="MyObjectBuilder_VoxelMap">
<EntityId>72151252176979970</EntityId>
<PersistentFlags>CastShadows InScene</PersistentFlags>
-<PositionAndOrientation>
<Position z="-11301.9033" y="-1183.70569" x="-2126.84"/>
<Forward z="-1" y="0" x="0"/>
<Up z="0" y="1" x="0"/>
</PositionAndOrientation>
<Filename>asteroid0.vox</Filename>
</MyObjectBuilder_EntityBase>
-<MyObjectBuilder_EntityBase xsi:type="MyObjectBuilder_VoxelMap">
<EntityId>72108197145016458</EntityId>
<PersistentFlags>CastShadows InScene</PersistentFlags>
-<PositionAndOrientation>
<Position z="355.7873" y="18738.05" x="1064.912"/>
<Forward z="-1" y="0" x="0"/>
<Up z="0" y="1" x="0"/>
</PositionAndOrientation>
<Filename>asteroid1.vox</Filename>
</MyObjectBuilder_EntityBase>
Here is my code, it just never finds anything...:(
from xml.etree import cElementTree as ElementTree
ElementTree.register_namespace('xsi', 'http://www.w3.org/2001/XMLScheme-instance')
namespace = {'xsi': 'http://www.w3.org/2001/XMLScheme-instance'}
xmlPath = 'e:\\test.xml'
xmlRoot = ElementTree.parse(xmlPath).getroot()
#why this no return anything
results = xmlRoot.findall(".//SectorObjects/MyObjectBuilder_EntityBase[@xsi:type='MyObjectBuilder_VoxelMap']", namespaces=namespace)
print(results)
Your question is 'What am I screwing up here?' First of all, the XML itself has issues; it seems it did not paste here correctly. I did a few things to make it workable.
1) Added lines below since they were not there in the XML:
</SectorObjects>
</MyObjectBuilder_Sector>
2) In the Python version I was running, the findall function did not accept a named argument 'namespaces', and the xsi part also gave an error (SyntaxError: prefix 'xsi' not found in prefix map). So I dropped the attribute filter and changed the call to:
results = xmlRoot.findall(".//SectorObjects/MyObjectBuilder_EntityBase")
When I ran the code with above changes, I got this output below:
[<Element 'MyObjectBuilder_EntityBase' at 0x025028A8>, <Element 'MyObjectBuilder_EntityBase' at 0x02502CC8>, <Element 'MyObjectBuilder_EntityBase' at 0x02502E18>]
If you want to do more with these like getting the value of EntityId, you can do this:
results = xmlRoot.findall(".//SectorObjects/MyObjectBuilder_EntityBase")
try:
    for result in results:
        print(result.find('EntityId').text)
except AttributeError as aE:
    print(str(aE))
Output:
72248529206701361
72151252176979970
72108197145016458
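For readers on Python 3, where findall does accept a namespaces mapping, the attribute filter from the question can be kept. The sketch below is an assumption-laden illustration: the XML is a shortened copy of the question's file, and the second entity with type 'MyObjectBuilder_CubeGrid' is a made-up entry added only to show that the filter excludes it. Note that the question's code spells the URI 'XMLScheme-instance', while the document declares 'XMLSchema-instance'; the two must match exactly for the predicate to find anything.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's file; the CubeGrid entry is hypothetical.
XML = """<MyObjectBuilder_Sector xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <SectorObjects>
    <MyObjectBuilder_EntityBase xsi:type="MyObjectBuilder_VoxelMap">
      <EntityId>72248529206701361</EntityId>
    </MyObjectBuilder_EntityBase>
    <MyObjectBuilder_EntityBase xsi:type="MyObjectBuilder_CubeGrid">
      <EntityId>1</EntityId>
    </MyObjectBuilder_EntityBase>
  </SectorObjects>
</MyObjectBuilder_Sector>"""

# The URI must match the one declared in the document character for character.
NS = {"xsi": "http://www.w3.org/2001/XMLSchema-instance"}

root = ET.fromstring(XML)
path = ".//SectorObjects/MyObjectBuilder_EntityBase[@xsi:type='MyObjectBuilder_VoxelMap']"
voxel_maps = root.findall(path, NS)
ids = [entity.findtext("EntityId") for entity in voxel_maps]
print(ids)
```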
