Python/JS: How to parse multiple KML nodes in one XML file?

<xml>
<mapshape title="Bar" extras="">
<kml></kml>
</mapshape>
<mapshape title="Foo" extras="">
<kml></kml>
</mapshape>
</xml>
I've got an XML document like this: multiple mapshape nodes in one file, each containing one valid KML document. I need to plot them all on Google Maps.
I have tried libraries like geoxml(3). They can parse a single KML file, but my document contains many. How can I deal with this?

You can use lxml to extract the kml sections and then pass them on to the other library:
doc = """<xml>
<mapshape title="Bar" extras="">
<kml></kml>
</mapshape>
<mapshape title="Foo" extras="">
<kml></kml>
</mapshape>
</xml>"""
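import lxml.etree as etree
xml = etree.fromstring(doc)
for mapshape in xml:
    # each mapshape holds exactly one kml child; serialize it back to a string
    kml = etree.tostring(mapshape.find('kml'))
    # parseKML stands for whatever KML parser you hand the string to
    parseKML(kml)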

Related

Extract xml data with in cdata using Python

I need to extract XML that sits inside a CDATA section within another XML document.
I am able to extract the regular XML tags, but not the tags inside the CDATA.
I need to extract:
EventId = 122157660 (I am able to do this already).
_Type="Phone" _Value="5152083348" within PAYLOAD/REQUEST_GROUP/REQUESTING_PARTY/CONTACT_DETAIL/CONTACT_POINT (this is where I need help).
Below is the XML sample I am working with.
<B2B_DATA>
<B2B_METADATA>
<EventId>122157660</EventId>
<MessageType>Request</MessageType>
</B2B_METADATA>
<PAYLOAD>
<![CDATA[<?xml version="1.0"?>
<REQUEST_GROUP MISMOVersionID="1.1.1">
<REQUESTING_PARTY _Name="CityBank" _StreetAddress="801 Main St" _City="rockwall" _State="MD" _PostalCode="11311" _Identifier="416">
<CONTACT_DETAIL _Name="XX Davis">
<CONTACT_POINT _Type="Phone" _Value="1236573348"/>
<CONTACT_POINT _Type="Email" _Value="jXX#city.com"/>
</CONTACT_DETAIL>
</REQUESTING_PARTY>
</REQUEST_GROUP>]]>
</PAYLOAD>
</B2B_DATA>
I have tried this:
from xml.etree import ElementTree
tree = ElementTree.parse('file.xml')
root = tree.getroot()
for child in root:
    print(child.tag)
Output:
B2B_METADATA
PAYLOAD
I am not able to parse the content inside PAYLOAD.
Any help is greatly appreciated.
What you need to do, in this case, is parse the outer xml, extract the xml in the CDATA, parse that inner xml and extract the target data from that.
I personally would use lxml and xpath, not ElementTree:
from lxml import etree
root = etree.parse('file.xml')
#step one: extract the cdata as a string
cd = root.xpath('//PAYLOAD//text()')[0].strip()
#step 2 - parse the cdata string as xml
doc = etree.XML(cd)
#finally, extract the target data
doc.xpath('//REQUESTING_PARTY//CONTACT_POINT[@_Type="Phone"]/@_Value')[0]
Output, based on your sample xml above:
'1236573348'
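Since the question started from ElementTree, the same three steps can also be sketched with the standard library alone. This is a minimal sketch, not the answer's lxml code: the function name phone_from_payload and the trimmed OUTER sample are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample from the question (illustrative only)
OUTER = """<B2B_DATA>
<B2B_METADATA>
<EventId>122157660</EventId>
</B2B_METADATA>
<PAYLOAD>
<![CDATA[<?xml version="1.0"?>
<REQUEST_GROUP MISMOVersionID="1.1.1">
<REQUESTING_PARTY _Name="CityBank">
<CONTACT_DETAIL _Name="XX Davis">
<CONTACT_POINT _Type="Phone" _Value="1236573348"/>
</CONTACT_DETAIL>
</REQUESTING_PARTY>
</REQUEST_GROUP>]]>
</PAYLOAD>
</B2B_DATA>"""

def phone_from_payload(outer_xml):
    # step 1: parse the outer document; the CDATA arrives as plain text of PAYLOAD
    root = ET.fromstring(outer_xml)
    inner_text = root.findtext("PAYLOAD").strip()
    # step 2: parse that text as a second XML document
    inner = ET.fromstring(inner_text)
    # step 3: pick the CONTACT_POINT whose _Type is "Phone"
    for contact in inner.iter("CONTACT_POINT"):
        if contact.get("_Type") == "Phone":
            return contact.get("_Value")
    return None

print(phone_from_payload(OUTER))
```

The strip() matters here: the inner text starts with a newline, and an XML declaration is only accepted at the very beginning of the string.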

Write Office Open XML (e.g. docx) with XML that matches the OOXML namespace

I have a python program that edits the XML in a .docx file. I'd like to edit the XML with ETree.
When I read the XML from the .docx file, it begins like this:
b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\r\n<w:document xmlns:wpc="http://schemas.micro'...
This is in a variable called data. I create the element tree with:
import xml.etree.ElementTree as ElementTree
tree = ElementTree.XML(data)
I convert it back with:
data = ElementTree.tostring(tree)
However, there have been subtle changes to the XML. It now looks like this:
b'<ns0:document xmlns:ns0="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:ns1="ht...
Word won't read this, even though it is standard XML.
EDIT: I tried adding the string to my XML, just to get it to round-trip:
XML_HEADER=b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\r\n'
tree = ElementTree.XML(data)
data = XML_HEADER + ElementTree.tostring(tree)
But I still get the error:
We're sorry. We can't open <filename>.docx because we found a problem with its contents.
Details:
The XML data is invalid according to the schema.
Location: Part: /word/document.xml, Line: 0, Column:0
I can't fix Word. I have to generate XML that looks exactly like the XML I started with. How do I get ETree to generate that?
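One thing that may help, offered as a sketch rather than a confirmed fix: ElementTree invents ns0, ns1, ... prefixes unless the original prefixes are registered first. The helper below (the name roundtrip_with_prefixes is made up here) collects every prefix declared in the input and registers it before serializing, then reuses the XML_HEADER from the question. Whether Word accepts the result has not been verified; ElementTree only writes namespace declarations that are actually used by elements or attributes, so a document that refers to a prefix only inside an attribute value may still be rejected.

```python
import io
import xml.etree.ElementTree as ElementTree

XML_HEADER = b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\r\n'

def roundtrip_with_prefixes(data):
    # Collect each (prefix, uri) pair declared in the input and register it,
    # so the serializer reuses w:, wpc:, ... instead of ns0:, ns1:, ...
    # Note: register_namespace changes global state for the whole process.
    for _, (prefix, uri) in ElementTree.iterparse(io.BytesIO(data), events=("start-ns",)):
        ElementTree.register_namespace(prefix, uri)
    tree = ElementTree.XML(data)
    # ... edit the tree here ...
    return XML_HEADER + ElementTree.tostring(tree, encoding="utf-8")
```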

Put a XML file inside a Python script?

I'm trying to create a face-detection script using Python's OpenCV using the haar cascade XML file.
My goal is to upload a python file to a website but due to some weird policies, I can only upload the Python file, without the XML...
The question is, is it possible to somehow put the XML file inside the Python script, say, convert it to a String or something and then generate an XML from that String?
xml = """<?xml version="1.0" encoding="UTF-8"?>
<a>
<b>Yes, you can embed XML in a string literal in Python.</b>
</a>"""
This does not answer the title, but it does address the problem in your description.
cv2.CascadeClassifier() expects a path to an XML file; it will not load a cascade from an XML string. Passing it a URL to an XML file hosted on a website also gives an error.
But you can use the requests module in Python to achieve what you want.
It downloads the XML from the website and saves it to a local file:
import cv2 as cv
import requests

def function(self, image):
    # download XML from server (LINK_TO_XML is a placeholder for your own URL)
    link = LINK_TO_XML
    r = requests.get(link, allow_redirects=True)
    with open('haarcascade_frontalface_default.xml', 'wb') as f:
        f.write(r.content)
    # end of download
    haar_cascade = cv.CascadeClassifier('haarcascade_frontalface_default.xml')
First, copy the contents of the XML file into the Python file and assign the whole thing to a string. Then use the xml.etree.ElementTree library to build a tree structure named root that holds the contents of the XML. This tree is traversable and you can do what you like with it in your program:
import xml.etree.ElementTree as ET
root = ET.fromstring(XML_file_example_as_string)
To generate an XML file from the string you can use ElementTree.write() like this:
tree = ET.ElementTree(root)
tree.write('example.xml')
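If the classifier really does need a file on disk, as the answer above says, the embedded string can also be written out unchanged at startup without going through a parser. A minimal sketch under stated assumptions: CASCADE_XML is a stand-in for the pasted cascade file, write_embedded_xml is a made-up helper name, and the OpenCV call is shown only as a comment because it is not exercised here.

```python
import os
import tempfile

# Stand-in for the real cascade file pasted into the script (assumption)
CASCADE_XML = """<?xml version="1.0"?>
<opencv_storage>
</opencv_storage>"""

def write_embedded_xml(xml_text, suffix=".xml"):
    # Create a temporary file, write the embedded text into it, return its path
    fd, path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(xml_text)
    return path

cascade_path = write_embedded_xml(CASCADE_XML)
# The path could then be handed to the classifier, e.g.:
# haar_cascade = cv.CascadeClassifier(cascade_path)
```

The caller is responsible for deleting the temporary file (os.remove(cascade_path)) once the classifier has been loaded.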

Find all titles in an XML with ElementTree from a bz2 file

I'm new to parsing in XML and am stuck with my code regarding finding all titles (title tags) in an XML. This is what I came up with, but it is returning just an empty list, while there should be titles in there.
import bz2
from xml.etree import ElementTree as etree
def parse_xml(filename):
    with bz2.BZ2File(filename) as f:
        doc = etree.parse(f)
        titles = doc.findall('.//{http://www.mediawiki.org/xml/export-0.7/}title')
        print(titles[:10])
Can someone tell me why this is not working properly? Just to be clear; I need to find all text inside title tags stored in a list, taken from an XML wrapped in a bz2 file (as far as I read the best way is without unzipping).
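One possible cause, which is only a guess because the dump itself is not shown: the namespace in the file may differ from the hard-coded export-0.7 one, in which case findall() matches nothing. Note also that findall() returns elements, not their text. A minimal sketch that sidesteps both points by comparing only the local tag name and collecting elem.text (the function name find_titles is made up here):

```python
import bz2
from xml.etree import ElementTree as etree

def find_titles(filename):
    titles = []
    # bz2.open decompresses on the fly, so the file is never unpacked on disk
    with bz2.open(filename, "rb") as f:
        for _, elem in etree.iterparse(f, events=("end",)):
            # elem.tag looks like "{namespace}title"; keep only the part after "}"
            if elem.tag.rsplit("}", 1)[-1] == "title":
                titles.append(elem.text)
            # drop the element's content once it has been handled
            elem.clear()
    return titles
```

Calling find_titles(filename)[:10] would then give the first ten title strings, whatever export version the dump declares.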

Python parse XML files with HTML content

I use an API to get some XML files but some of them contain HTML tags without escaping them. For example, <br> or <b></b>
I use this code to read them, but the files with the HTML raise an error. I don't have access to change manually all the files. Is there any way to parse the file without losing the HTML tags?
from xml.dom.minidom import parse, parseString
xml = ...#here is the api to receive the xml file
dom = parse(xml)
strings = dom.getElementsByTagName("string")
Read the XML file as a string, and fix the malformed tags before you parse it:
import xml.etree.ElementTree as ET
with open(xml) as xml_file:  # open the xml file for reading
    text = xml_file.read()  # read its contents
text = text.replace('<br>', '<br />')  # self-close the unclosed <br> tags
document = ET.fromstring(text)  # parse the string
strings = document.findall('.//string')  # find all string elements at any depth
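If more than one kind of unclosed tag shows up, the same idea can be generalized with a regular expression. This is a rough sketch under stated assumptions: VOID_TAGS is a made-up list you would adjust to whatever your API actually returns, and the pattern is not a full HTML parser (for example, it ignores ">" inside attribute values).

```python
import re
import xml.etree.ElementTree as ET

# Tags that appear without a closing tag in the input (assumption: adjust as needed)
VOID_TAGS = ("br", "hr", "img")

_VOID_RE = re.compile(r"<(%s)\b([^<>]*?)/?>" % "|".join(VOID_TAGS), re.IGNORECASE)

def close_void_tags(text):
    # Rewrite <br>, <br />, <img src="x"> ... as self-closing tags; paired tags
    # such as <b></b> are already well-formed and are left alone
    return _VOID_RE.sub(lambda m: "<%s%s/>" % (m.group(1), m.group(2).rstrip()), text)

fixed = close_void_tags('<root><string>a<br>b<br />c <b>bold</b></string></root>')
document = ET.fromstring(fixed)
```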
If you can use third-party libraries, I suggest Beautiful Soup. It handles XML as well as HTML, parses broken markup, and provides an easy-to-use API.
