Replace contents of lxml StringElement without starting a namespace war?


I can't figure out how to replace contents of lxml StringElement (styleUrl in this case) which already has a namespace (other than pytype). I end up getting an element level namespace injected. This is a much distilled and simplified version that only tries to rename one StyleMap to illustrate the issue:
#!/usr/bin/env python
from __future__ import print_function
import sys
from pykml import parser as kmlparser
from lxml import objectify
frm = "lineStyle30218901714341461519022"
to = "s1"
b4_et = kmlparser.parse('b4.kml')
b4_root = b4_et
el = b4_root.xpath('//*[@id="%s"]' % frm)[0]
el.attrib['id'] = to
el = b4_root.xpath('//*[text()="#%s"]' % frm)[0]
el.xpath('./..')[0].styleUrl = '#'+to
objectify.deannotate(b4_root, xsi_nil=True)
b4_et.write(sys.stdout, pretty_print=True)
test data:
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2" xmlns:kml="http://www.opengis.net/kml/2.2" xmlns:atom="http://www.w3.org/2005/Atom">
<Document>
<name>Wasatch Trails</name>
<open>1</open>
<Style id="lineStyle30218901714341461519049">
<LineStyle><color>ff0080ff</color><width>4</width></LineStyle>
</Style>
<Style id="lineStyle30218901714341461519027">
<LineStyle><color>ff0080ff</color><width>4</width></LineStyle>
</Style>
<StyleMap id="lineStyle30218901714341461519022">
<Pair><key>normal</key><styleUrl>#lineStyle30218901714341461519049</styleUrl></Pair>
<Pair><key>highlight</key><styleUrl>#lineStyle30218901714341461519027</styleUrl></Pair>
</StyleMap>
<Placemark>
<name>Trail</name>
<styleUrl>#lineStyle30218901714341461519022</styleUrl>
<LineString>
<tessellate>1</tessellate>
<coordinates>
-111.6472637672589,40.4810633294269,0 -111.650415221546,40.48116138407261,0 -111.6504410181637,40.48118694372887,0
</coordinates>
</LineString>
</Placemark>
</Document>
</kml>
The only issue I have not been able to resolve is lxml putting a xmlns:py="http://codespeak.net/lxml/objectify/pytype" attribute into the newly created styleUrl element. I'm guessing this is caused by the document having a default namespace for kml/2.2. I don't know how to tell it the new styleUrl should be kml instead of pytype.
...
<styleUrl xmlns:py="http://codespeak.net/lxml/objectify/pytype">#s1</styleUrl>
...

Replacing following:
el.xpath('./..')[0].styleUrl = '#'+to
with:
el.xpath('./..')[0].styleUrl = objectify.StringElement('#' + to)
will give you what you want. But I'm not sure whether this is the best way.
BTW, you can use set(key, value) method to set attribute value:
el.set('id', to) # instead of el.attrib['id'] = to
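Another option, offered as a sketch rather than a verified fix for the exact file above: the leftover xmlns:py declaration can also be dropped by passing cleanup_namespaces=True to objectify.deannotate(). The example below uses plain lxml.objectify instead of pykml, and the KML snippet is a cut-down stand-in for the test data.

```python
from lxml import etree, objectify

# Cut-down stand-in for the KML in the question (illustrative sample data).
KML = b"""<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
<Placemark>
<name>Trail</name>
<styleUrl>#lineStyle30218901714341461519022</styleUrl>
</Placemark>
</Document>
</kml>"""

root = objectify.fromstring(KML)

# Assigning a plain string makes objectify annotate the new element with
# py:pytype, which is what brings in the xmlns:py declaration.
root.Document.Placemark.styleUrl = '#s1'

# Strip the annotations and any namespace declarations left unused.
objectify.deannotate(root, xsi_nil=True, cleanup_namespaces=True)

result = etree.tostring(root, pretty_print=True).decode()
print(result)
```

Be aware that cleanup_namespaces removes every unused declaration in the tree, so prefixes such as gx or atom that are declared on the root but never used would disappear as well.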

Related

How to rewrite this XML file?

I am trying to rewrite an XML file that contains the following:
<?xml version="1.0" encoding="UTF-8"?>
<BrowserAutomationStudioProject>
<ModelList>
<Model>
<Name>token</Name>
<Description ru="token" en="token"/>
<Value>5660191076:AAEY8RI3hXcI3dEvjWAj7p2e7DdxOMNjPfk8</Value>
</Model>
<Defaults/>
<Model>
<Name>chat_id</Name>
<Value>5578940124</Value>
</Model>
<Defaults/>
</ModelList>
</BrowserAutomationStudioProject>
My python code:
import xml.etree.ElementTree as ET
tree = ET.parse('Actual.xml')
root = tree.getroot()
for model in root.findall('Model'):
    name = model.find('Name').text
    if name == 'token':
        model.find('Value').text = '123456789:ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    if name == 'chat_id':
        model.find('Value').text = '1234567890'
tree.write('xml_file.xml')
It works but I get the same file:
<?xml version="1.0" encoding="UTF-8"?>
<BrowserAutomationStudioProject>
<ModelList>
<Model>
<Name>token</Name>
<Description ru="token" en="token"/>
<Value>5660191076:AAEY8RI3hXcI3dEvjWAj7p2e7DdxOMNjPfk8</Value>
</Model>
<Defaults/>
<Model>
<Name>chat_id</Name>
<Value>5578940124</Value>
</Model>
<Defaults/>
</ModelList>
</BrowserAutomationStudioProject>
What's wrong with my code?
Even ChatGPT can't help me haha
I even tried to print it but it doesn't work
What should I do?
Please help me.

As described in the documentation, Element.findall() finds only elements with a matching tag that are direct children of the current element. To make ET select all subelements, on all levels beneath the current element, prefix the path with .//.
Since <Model> is not a direct child of root (it's a grandchild, or something to that effect :)), root.findall('Model') finds nothing. So to get ET to find it, you need to modify that to
root.findall('.//Model')
and it should work.

You could also use for model in root.findall('ModelList/Model').
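The difference between the two searches can be checked with a small self-contained sketch. The XML below is a shortened stand-in for Actual.xml, and the new_values dictionary is a hypothetical helper introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for Actual.xml (illustrative sample data).
XML = """<BrowserAutomationStudioProject>
<ModelList>
<Model><Name>token</Name><Value>old-token</Value></Model>
<Model><Name>chat_id</Name><Value>old-id</Value></Model>
</ModelList>
</BrowserAutomationStudioProject>"""

# Hypothetical replacement values, keyed by the text of each <Name> element.
new_values = {
    'token': '123456789:ABCDEFGHIJKLMNOPQRSTUVWXYZ',
    'chat_id': '1234567890',
}

root = ET.fromstring(XML)

direct = root.findall('Model')     # direct children of the root only
nested = root.findall('.//Model')  # descendants at any depth

for model in nested:
    name = model.find('Name').text
    if name in new_values:
        model.find('Value').text = new_values[name]

values = [value.text for value in root.iter('Value')]
print(len(direct), len(nested), values)
```

Because the loop in the question iterates over an empty list, it never runs, which is why the file is written back unchanged.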

If you know the order of the XML tags, you can also iterate through the tree and pop() the new values from a list:
import xml.etree.ElementTree as ET
tree = ET.parse('Actual.xml')
root = tree.getroot()
input_value = ['1234567890','123456789:ABCDEFGHIJKLMNOPQRSTUVWXYZ']
for elem in root.iter():
    if elem.tag == "Value":
        elem.text = input_value.pop()
        print(elem.tag, elem.text)
tree.write('xml_file.xml')
Output:
<?xml version="1.0"?>
<BrowserAutomationStudioProject>
<ModelList>
<Model>
<Name>token</Name>
<Description ru="token" en="token" />
<Value>123456789:ABCDEFGHIJKLMNOPQRSTUVWXYZ</Value>
</Model>
<Defaults />
<Model>
<Name>chat_id</Name>
<Value>1234567890</Value>
</Model>
<Defaults />
</ModelList>
</BrowserAutomationStudioProject>

Removing Elements from a KML (Python)

I generated a KML file using Python's SimpleKML library and the following script, the output of which is also shown below:
import simplekml
kml = simplekml.Kml()
ground = kml.newgroundoverlay(name='Aerial Extent')
ground.icon.href = 'C:\\Users\\mdl518\\Desktop\\aerial_image.png'
ground.latlonbox.north = 46.55537
ground.latlonbox.south = 46.53134
ground.latlonbox.east = 48.60005
ground.latlonbox.west = 48.57678
ground.latlonbox.rotation = 0.090320
kml.save(".//aerial_extent.kml")
The output KML:
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
<Document id="1">
<GroundOverlay id="2">
<name>Aerial Extent</name>
<Icon id="3">
<href>C:\\Users\\mdl518\\Desktop\\aerial_image.png</href>
</Icon>
<LatLonBox>
<north>46.55537</north>
<south>46.53134</south>
<east>48.60005</east>
<west>48.57678</west>
<rotation>0.090320</rotation>
</LatLonBox>
</GroundOverlay>
</Document>
</kml>
However, I am trying to remove the "Document" tag from this KML since it is a default element generated with SimpleKML, while keeping the child elements (e.g. GroundOverlay). Additionally, is there a way to remove the "id" attributes associated with specific elements (i.e. for the GroundOverlay, Icon elements)? I am exploring the usage of ElementTree/lxml to enable this, but these seem to be more specific to XML files as opposed to KMLs. Here's what I'm trying to use to modify the KML, but it is unable to remove the Document element:
from lxml import etree
tree = etree.fromstring(open("C:\\Users\\mdl518\\Desktop\\aerial_extent.kml").read())
for item in tree.xpath("//Document[@id='1']"):
    item.getparent().remove(item)
print(etree.tostring(tree, pretty_print=True))
Here is the final desired output XML:
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
<GroundOverlay>
<name>Aerial Extent</name>
<Icon>
<href>C:\\Users\\mdl518\\Desktop\\aerial_image.png</href>
</Icon>
<LatLonBox>
<north>46.55537</north>
<south>46.53134</south>
<east>48.60005</east>
<west>48.57678</west>
<rotation>0.090320</rotation>
</LatLonBox>
</GroundOverlay>
</kml>
Any insights are most appreciated!

You are getting tripped up on the dreaded namespaces...
Try using something like this:
ns = {'kml': 'http://www.opengis.net/kml/2.2'}
for item in tree.xpath("//kml:Document[@id='1']",namespaces=ns):
    item.getparent().remove(item)
Edit:
To remove just the parent and retain all its descendants, try the following:
retain = tree.xpath("//kml:Document[@id='1']/kml:GroundOverlay",namespaces=ns)[0]
for item in tree.xpath("//kml:Document[@id='1']",namespaces=ns):
    anchor = item.getparent()
    anchor.remove(item)
    anchor.insert(1,retain)
print(etree.tostring(tree, pretty_print=True).decode())
This should get you the desired output.
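The question also asks about dropping the id attributes, which the snippet above leaves in place. One possible way to do both steps is sketched here, under the assumption that every child of Document should be kept and every id attribute removed; the KML is a shortened version of the file in the question.

```python
from lxml import etree

# Shortened version of the KML from the question (illustrative sample data).
KML = b"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
<Document id="1">
<GroundOverlay id="2">
<name>Aerial Extent</name>
<Icon id="3"><href>aerial_image.png</href></Icon>
</GroundOverlay>
</Document>
</kml>"""

ns = {'kml': 'http://www.opengis.net/kml/2.2'}
root = etree.fromstring(KML)

for document in root.xpath('//kml:Document', namespaces=ns):
    # Move each child out so it becomes a sibling placed just before Document.
    for child in list(document):
        document.addprevious(child)
    # Document is now empty and can be removed without losing anything.
    document.getparent().remove(document)

# Drop every id attribute, whichever element carries it.
for element in root.xpath('//*[@id]'):
    del element.attrib['id']

result = etree.tostring(root, pretty_print=True).decode()
print(result)
```

The //*[@id] expression needs no namespace mapping because it matches on the attribute and not on an element name.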

Replacing a xml element with lxml

I have this complex xml file and we need to dynamically update some elements in it. I have successfully been able to update value strings (attributes) using lxml, but I'm completely unsure how to go about replacing an entire element. Here's some pseudo-code to show what I'm trying to do.
import os
from lxml import etree
directory_name = "C:\\apps"
file_name = "web.config"
xpath_identifier = '/configuration/applicationSettings/Things/setting[@name="CorsTrustedOrigins"]'
#contents of the xml file for reference:
<configuration>
<configSections>
<section name="log4net" type="log4net.Config.Log4NetConfigurationSectionHandler, log4net"/>
<sectionGroup name="applicationSettings" type="System.Configuration.ApplicationSettingsGroup, System, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089">
<section name="Things" type="System.Configuration.ClientSettingsSection, System, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" requirePermission="false"/>
</sectionGroup>
</configSections>
<appSettings/>
<applicationSettings>
<Things>
<setting name="CorsTrustedOrigins" serializeAs="Xml">
<value>
<ArrayOfString>
<string>http://localhost:51363</string>
<string>http://localhost:3333</string>
</ArrayOfString>
</value>
</setting>
</Things>
</applicationSettings>
</configuration>
file_full_path = os.path.join(directory_name, file_name)
tree = etree.parse(file_full_path)
root = tree.getroot()
etree.tostring(root)
xpath_identifier = str(xpath_identifier)
value = root.xpath(xpath_identifier)
#This successfully prints the element I'm after, so I'm sure my xpath is good:
etree.tostring(value[0])
#This is the new xml element I want to replace the current xpath'ed element with:
newxml = '''
<setting name="CorsTrustedOrigins" serializeAs="Xml">
<value>
<ArrayOfString>
<string>http://maddafakka</string>
</ArrayOfString>
</value>
</setting>
'''
newtree = etree.fromstring(newxml)
#I've tried this:
value[0].getparent().replace(value[0], newtree)
#and this
value[0] = newtree
#The value of value[0] gets updated, but the "root document" does not.
What I'm trying to do is to update the "ArrayOfString" element to reflect the values in the "newxml" var.
I'm kind of struggling to navigate the lxml documentation on the web, and I can't seem to find an example similar to what I'm trying to do.
Any pointers appreciated!

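If value is a single element rather than a list (for example, the result of find() instead of xpath()), remove the indexed access on the node:
value[0].getparent().replace(value[0], newtree)
.... to:
value.getparent().replace(value, newtree)
With xpath(), which returns a list, keep the index.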
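For reference, here is a minimal self-contained sketch of replace(). It uses find(), which returns a single element, so no indexing is involved; with xpath(), which returns a list, the first match would be value[0]. The configuration is a trimmed copy of the one in the question, and http://example.invalid is a placeholder origin.

```python
from lxml import etree

# Trimmed copy of the web.config from the question (illustrative sample data).
CONFIG = b"""<configuration>
<applicationSettings>
<Things>
<setting name="CorsTrustedOrigins" serializeAs="Xml">
<value><ArrayOfString>
<string>http://localhost:51363</string>
<string>http://localhost:3333</string>
</ArrayOfString></value>
</setting>
</Things>
</applicationSettings>
</configuration>"""

# Replacement element; the URL is a placeholder.
NEW_SETTING = b"""<setting name="CorsTrustedOrigins" serializeAs="Xml">
<value><ArrayOfString>
<string>http://example.invalid</string>
</ArrayOfString></value>
</setting>"""

root = etree.fromstring(CONFIG)

# find() returns the first matching element (or None), not a list.
old = root.find('.//setting[@name="CorsTrustedOrigins"]')
new = etree.fromstring(NEW_SETTING)

# replace() is called on the parent and swaps the child in place.
old.getparent().replace(old, new)

origins = root.xpath('//ArrayOfString/string/text()')
print(origins)
```

Since root belongs to the parsed tree, serializing root (or the tree it came from) after this call reflects the new element.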

Error in escaping XML for a KML file

Some time ago I asked a question trying to figure out why modifying a KML file increased the file size.
After poking around, I've found that the issue had to do with escaping XML.
Essentially, the "<", ">", and "&" characters were being replaced with:
"&lt;", "&gt;", and "&amp;"
It's not a big deal for smaller files, but the extra characters make a big difference in larger files.
I copied some code from this site to help solve the problem:
import lxml
from lxml import etree
import pykml
from pykml.factory import KML_ElementMaker as KML
from pykml import parser
def unescape(s):
    s = s.replace("&lt;", "<")
    s = s.replace("&gt;", ">")
    ## Ampersands must be last to avoid errors in text replacement
    s = s.replace("&amp;", "&")
    return s
with open("myplaces.kml", "rb") as f:
    doc = parser.parse(f).getroot()
a = doc.Document.Folder[0].Folder[1]
for q in GEList:
    x = KML.Folder(KML.name(q))
    a.append(x)
finished = (etree.tostring(doc, pretty_print = True))
finished = unescape(finished)
with open("myplaces.kml", "wb") as f:
    f.write(finished)
Now however, I'm running into another error. I compared the file before and after I replaced the <, >, and & characters.
Before: <description><![CDATA[<img src="fedland_leg_pop_2.jpg" alt="headerimg" width="550" height="77"><br>
After: <description><img src="fedland_leg_pop_2.jpg" alt="headerimg" width="550" height="77"><br>
Now it seems to be throwing out "<![CDATA[", and I can't figure out why.
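A small standard-library sketch may help show why unescaping the whole serialized document is risky. It uses xml.sax.saxutils in place of the hand-written unescape() above, and a shortened version of the description text.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

# Shortened HTML fragment similar to the description in the question.
html = '<img src="fedland_leg_pop_2.jpg" alt="headerimg"><br>'

# Escaped form: the markup is stored as text inside <description>.
escaped_doc = '<description>%s</description>' % escape(html)
round_trip = ET.fromstring(escaped_doc).text

# Unescaping the serialized document turns that text into real tags.
unescaped_doc = unescape(escaped_doc)
try:
    ET.fromstring(unescaped_doc)
    still_well_formed = True
except ET.ParseError:
    still_well_formed = False

print(round_trip == html, still_well_formed)
```

The escaped document parses back to the original HTML string, while the unescaped one no longer parses because tags such as br are never closed.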

I had the same issue but then I found this (https://developers.google.com/kml/documentation/kml_tut#descriptive_html):
Using the CDATA Element
If you want to write standard HTML inside a <description> tag, you can put it inside a CDATA tag. If you don't, the angle brackets need to be written as entity references to prevent Google Earth from parsing the HTML incorrectly (for example, the symbol > is written as &gt; and the symbol < is written as &lt;). This is a standard feature of XML and is not unique to Google Earth.
Consider the difference between HTML markup with CDATA tags and without CDATA. First, here's the <description> with CDATA tags:
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
<Placemark>
<name>CDATA example</name>
<description>
<![CDATA[
<h1>CDATA Tags are useful!</h1>
<p><font color="red">Text is <i>more readable</i> and
<b>easier to write</b> when you can avoid using entity
references.</font></p>
]]>
</description>
<Point>
<coordinates>102.595626,14.996729</coordinates>
</Point>
</Placemark>
</Document>
</kml>
And here's the <description> without CDATA tags, so that special characters must use entity references:
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
<Placemark>
<name>Entity references example</name>
<description>
&lt;h1&gt;Entity references are hard to type!&lt;/h1&gt;
&lt;p&gt;&lt;font color="green"&gt;Text is
&lt;i&gt;more readable&lt;/i&gt;
and &lt;b&gt;easier to write&lt;/b&gt;
when you can avoid using entity references.&lt;/font&gt;&lt;/p&gt;
</description>
<Point>
<coordinates>102.594411,14.998518</coordinates>
</Point>
</Placemark>
</Document>
</kml>
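If the goal is to keep existing CDATA sections when a file is read and written back, lxml's XMLParser accepts strip_cdata=False. The sketch below uses plain lxml.etree with a tiny sample document; whether the pykml parser used in the question can be configured the same way is not shown here.

```python
from lxml import etree

# Tiny sample document with one CDATA section (illustrative data).
KML = b"""<kml xmlns="http://www.opengis.net/kml/2.2">
<Placemark>
<description><![CDATA[<b>bold</b>]]></description>
</Placemark>
</kml>"""

# Default parser: CDATA is turned into ordinary text and escaped on output.
default_out = etree.tostring(etree.fromstring(KML))

# strip_cdata=False keeps the CDATA section in the tree and in the output.
keep_parser = etree.XMLParser(strip_cdata=False)
kept_out = etree.tostring(etree.fromstring(KML, parser=keep_parser))

print(default_out.decode())
print(kept_out.decode())
```

New text can be wrapped the same way by assigning etree.CDATA(...) to an element's text.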

Element Tree doesn't load a Google Earth-exported KML

I have a problem related to a Google Earth exported KML, as it doesn't seem to work well with Element Tree. I don't have a clue where the problem might lie, so I will explain how I do everything.
Here is the relevant code:
kmlFile = open( filePath, 'r' ).read( -1 ) # read the whole file as text
kmlFile = kmlFile.replace( 'gx:', 'gx' ) # we need this as otherwise the Element Tree parser
# will give an error
kmlData = ET.fromstring( kmlFile )
document = kmlData.find( 'Document' )
With this code, ET (Element Tree object) creates an Element object accessible via variable kmlData. It points to the root element ('kml' tag). However, when I run a search for the sub-element 'Document', it returns None. Although the 'Document' tag is present in the KML file!
Are there any other discrepancies between KMLs and XMLs apart from the 'gx: smth' tags? I have searched through the KML files I am dealing with and found nothing suspicious. Here is a simplified structure of a KML file the program is supposed to deal with:
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.2">
<Document>
<name>UK.kmz</name>
<Style id="sh_blu-blank">
<IconStyle>
<scale>1.3</scale>
<Icon>
<href>http://maps.google.com/mapfiles/kml/paddle/blu-blank.png</href>
</Icon>
<hotSpot x="32" y="1" xunits="pixels" yunits="pixels"/>
</IconStyle>
<ListStyle>
<ItemIcon>
<href>http://maps.google.com/mapfiles/kml/paddle/blu-blank-lv.png</href>
</ItemIcon>
</ListStyle>
</Style>
[other style tags...]
<Folder>
<name>UK</name>
<Placemark>
<name>1262 Crossness Pumping Station</name>
<LookAt>
<longitude>0.1329926667038817</longitude>
<latitude>51.50303535104574</latitude>
<altitude>0</altitude>
<range>4246.539753518848</range>
<tilt>0</tilt>
<heading>-4.295161152207489</heading>
<altitudeMode>relativeToGround</altitudeMode>
<gx:altitudeMode>relativeToSeaFloor</gx:altitudeMode>
</LookAt>
<styleUrl>#msn_blu-blank15000</styleUrl>
<Point>
<coordinates>0.1389579668507301,51.50888923518947,0</coordinates>
</Point>
</Placemark>
[other placemark tags...]
</Folder>
</Document>
</kml>
Do you have an idea why I can't access any sub-elements of 'kml'? By the way, Python version is 2.7.

The KML document is in the http://earth.google.com/kml/2.2 namespace, as indicated by
<kml xmlns="http://earth.google.com/kml/2.2">
This means that the name of the Document element is in fact {http://earth.google.com/kml/2.2}Document.
Instead of this:
document = kmlData.find('Document')
you need this:
document = kmlData.find('{http://earth.google.com/kml/2.2}Document')
However, there is a problem with the XML file. There is an element called gx:altitudeMode. The gx bit is a namespace prefix. Such a prefix needs to be declared, but the declaration is missing.
You have worked around the problem by simply replacing gx: with gx. But the proper way to do this would be to add the namespace declaration. Based on https://developers.google.com/kml/documentation/altitudemode, I take it that gx is associated with the http://www.google.com/kml/ext/2.2 namespace. So for the document to be well-formed, the root element start tag should read
<kml xmlns="http://earth.google.com/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
Now the document can be parsed:
In [1]: from xml.etree import ElementTree as ET
In [2]: kmlData = ET.parse("kml2.xml")
In [3]: document = kmlData.find('{http://earth.google.com/kml/2.2}Document')
In [4]: document
Out[4]: <Element '{http://earth.google.com/kml/2.2}Document' at 0x1895810>
In [5]:
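Spelling out the full {namespace}tag form gets tedious, so ElementTree's find() and findall() also accept a prefix-to-URI mapping. The following is a minimal sketch with a shortened document; the prefixes kml and gx in the ns dictionary are arbitrary local names chosen for the lookup, not something read from the file.

```python
import xml.etree.ElementTree as ET

# Shortened version of the KML from the question, with the gx prefix declared.
KML = """<kml xmlns="http://earth.google.com/kml/2.2"
     xmlns:gx="http://www.google.com/kml/ext/2.2">
<Document>
<name>UK.kmz</name>
<Folder><Placemark><LookAt>
<gx:altitudeMode>relativeToSeaFloor</gx:altitudeMode>
</LookAt></Placemark></Folder>
</Document>
</kml>"""

# Local prefix names mapped to the namespace URIs used in the document.
ns = {
    'kml': 'http://earth.google.com/kml/2.2',
    'gx': 'http://www.google.com/kml/ext/2.2',
}

root = ET.fromstring(KML)

unqualified = root.find('Document')       # no namespace given, so no match
document = root.find('kml:Document', ns)  # matches the namespaced element
mode = root.find('.//gx:altitudeMode', ns)

print(unqualified, document.tag, mode.text)
```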
