Put a XML file inside a Python script?

Put a XML file inside a Python script? - python

I'm trying to create a face-detection script using Python's OpenCV using the haar cascade XML file.
My goal is to upload a python file to a website but due to some weird policies, I can only upload the Python file, without the XML...
The question is, is it possible to somehow put the XML file inside the Python script, say, convert it to a String or something and then generate an XML from that String?

xml = """<?xml version="1.0" encoding="UTF-8"?>
<a>
<b>Yes, you can embed XML in a string literal in Python.</b>
</a>"""

Not answer to title but answer of your description question.
Haar cascade doesn't support non-file XML strings. Also, if you try to put an XML file to a website and give a link to an XML file with cv2.CascadeClassifier(), it will give an error.
But you can use the request module on python to achieve what you want.
It gets XML from the website, then puts it into a file
def function(self, image):
# download XML from server
link = LINK_TO_XML
r = requests.get(link, allow_redirects=True)
open('haarcascade_frontalface_default.xml', 'wb').write(r.content)
# end of download
haar_cascade = cv.CascadeClassifier('haarcascade_frontalface_default.xml')

First, copy the contents of the XML file into the python file and assign the whole thing to a string. Then use XML library to create a tree type data structure named root which contains the contents of the XML file. This tree is traversable and you can do what you like with it in your program:
import xml.etree.ElementTree as ET
root = ET.fromstring(XML_file_example_as_string).
To generate XML from the string you can use ElementTree.write() like this:
tree = ET.ElementTree(root)
tree.write('example.xml')

Related

Python FileStorage: Read XML file

Below is a simple example to write an XML file and read it back. The writing works OK, but I am not sure how to read this file back? Below is some sample code. How do I get thse values from the XML file?
file1 = 'result1.xml'
fs = cv2.FileStorage(file1, cv2.FILE_STORAGE_WRITE)
fs.write('var1', 1)
fs.write('var2', 2)
fs = cv2.FileStorage(file1,cv2.FILE_STORAGE_READ)
fn = fs.real

Python in different versions has its own library to parse XML data.
Here is where you can find the documentation : XML Library
You have to be careful when using it, as said in title of the webpage, this library is not safe if XML files aren't built properly.
Here is another useful website : How to parse XML files using Python ?

Write Open Office XML (e.g. docx) with XML that matches the OOXML namespace

I have a python program that edits the XML in a .docx file. I'd like to edit the XML with ETree.
When I read the XML from the .docx file, it begins like this:
b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\r\n<w:document xmlns:wpc="http://schemas.micro'...
This is in a variable called data. I create the element tree with:
import xml.etree.ElementTree as ElementTree
tree = ElementTree.XML(data)
I convert it back with:
data = ElementTree.tostring(tree)
However, there have been subtle changes to the XML. It now looks like this:
b'<ns0:document xmlns:ns0="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:ns1="ht...
Word won't read this, even though it is standard XML.
EDIT: I tried adding the string to my XML, just to get it to round-trip:
XML_HEADER=b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\r\n'
tree = ElementTree.XML(data)
data = XML_HEADER + ElementTree.tostring(tree)
But I still get the error:
We're sorry. We can't open <filename>.docx because we found a problem with its contents.
Details:
The XML data is invalid according to the schema.
Location: Part: /word/document.xml, Line: 0, Column:0
I can't fix word. I've got to generate XML that looks exactly like the XML that I started with. How do I get ETree to generate that?

Understanding XML and XSD parsing in Python 3

So I will ask the question somewhat vaugley because I'm first not sure if the question can be asked.. Here goes,
I want to read an XML tree in using Python3 which I am new to. I have accomplished this with relative ease using:
xml.etree.ElementTree.parse(urllib.request.urlopen(url))
The XML stream are different data sets and there is a XSD which is available which I have also parsed int he same way. Now, my question is can I create a parser using the XSD schema? I am new to XML in this way, but I have found examples where the parser object was generated using the XSD then the XML was read in accordingly. However, I cannot find the equivalent in Python3.
Here is vaugely what I want in Python2.X:
schema = etree.XMLSchema(schema_root)
xmlparser = etree.XMLParser(schema=schema)
I'm not sure if I'm even conceptualizing this correctly. Maybe this is an XML problem not a python problem, i.e., maybe you can only validate the XML against the schema and not actually use it to parse with the specifics from the XSD. Anyone help clear this up?

xmlschema is a Python module to manage XML Schemas or validate XML instances to the XSD.
Example to validate XML documents from URL against the XML Schema using Python3.
import requests
import xmlschema
import xml.etree.ElementTree as ET
# to make it simple use external XML Schema and create a local file from it to validate XML examples
xsd_url = "https://raw.githubusercontent.com/sissaschool/xmlschema/master/tests/test_cases/examples/collection/collection.xsd"
with open("test.xsd", "w", newline="") as out:
out.write(requests.get(xsd_url).text)
xsd = xmlschema.XMLSchema("test.xsd")
# XML #1 validates to the Schema
url1 = "https://raw.githubusercontent.com/sissaschool/xmlschema/master/tests/test_cases/examples/collection/collection.xml"
xt = ET.fromstring(requests.get(url1).text)
print("xml1 valid=", xsd.is_valid(xt), sep="")
# XML #2 with invalid structure
url2 = "https://raw.githubusercontent.com/sissaschool/xmlschema/master/tests/test_cases/examples/collection/collection-1_error.xml"
xt = ET.fromstring(requests.get(url2).text)
print("xml2 valid=", xsd.is_valid(xt), sep="")
Output:
xml1 valid=True
xml2 valid=False

Following xs:include when parsing XSD as XML with lxml in Python

So, my problem is I'm trying to do something a little un-orthodox. I have a complicated set of XSD files. However I don't want to use these XSD files to verify an XML file; I want to parse these XSDs as XML and interrogate them just as I would a normal XML file. This is possible because XSDs are valid XML. I am using lxml with Python3.
The problem I'm having is with the statement:
<xs:include schemaLocation="sdm-extension.xsd"/>
If I instruct lxml to create an XSD for verifying like this:
schema = etree.XMLSchema(schema_root)
this dependency will be resolved (the file exists in the same directory as the one I've just loaded). HOWEVER, I am treating these as XML so, correctly, lxml just treats this as a normal element with an attribute and does not follow it.
Is there an easy or correct way to extend lxml so that I may have the same or similar behaviour as, say
<xi:include href="metadata.xml" parse="xml" xpointer="title"/>
I could, of course, create a separate xml file manually that includes all the dependencies in the XSD schema. That is perhaps a solution?

So it seems like one option is to use the xi:xinclude method and create a separate xml file that includes all the XSDs I want to parse. Something along the lines of:
<fullxsd>
<xi:include href="./xsd-cdisc-sdm-1.0.0/sdm1-0-0.xsd" parse="xml"/>
<xi:include href="./xsd-cdisc-sdm-1.0.0/sdm-ns-structure.xsd" parse="xml"/>
</fullxsd>
Then use some lxml along the lines of
def combine(xsd_file):
with open(xsd_file, 'rb') as f_xsd:
parser = etree.XMLParser(recover=True, encoding='utf-8',remove_comments=True, remove_blank_text=True)
xsd_source = f_xsd.read()
root = etree.fromstring(xsd_source, parser)
incl = etree.XInclude()
incl(root)
print(etree.tostring(root, pretty_print=True))
Its not ideal but it seems the proper way. I've looked at custom URI parsers in the lxml but that would mean actually altering the XSDs which seems messier.

Try this:
def validate_xml(schema_file, xml_file):
xsd_doc = etree.parse(schema_file)
xsd = etree.XMLSchema(xsd_doc)
xml = etree.parse(xml_file)
return xsd.validate(xml)

Python parse XML files with HTML content

I use an API to get some XML files but some of them contain HTML tags without escaping them. For example, <br> or <b></b>
I use this code to read them, but the files with the HTML raise an error. I don't have access to change manually all the files. Is there any way to parse the file without losing the HTML tags?
from xml.dom.minidom import parse, parseString
xml = ...#here is the api to receive the xml file
dom = parse(xml)
strings = dom.getElementsByTagName("string")

Read the xml file as a string, and fix the malformed tags before you parse it:
import xml.etree.ElementTree as ET
with open(xml) as xml_file: # open the xml file for reading
text= xml_file.read() # read its contents
text= text.replace('<br>', '<br />') # fix malformed tags
document= ET.fromstring(text) # parse the string
strings= document.findall('string') # find all string elements

If you can use third-party libs I suggest you to use Beautiful Soup it can handle xml as well as html and also it parses broken markup, also providing easy to use api.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Put a XML file inside a Python script? - python

xml = """<?xml version="1.0" encoding="UTF-8"?> <a> <b>Yes, you can embed XML in a string literal in Python.</b> </a>"""

Related

Python FileStorage: Read XML file

Write Open Office XML (e.g. docx) with XML that matches the OOXML namespace

Understanding XML and XSD parsing in Python 3

Following xs:include when parsing XSD as XML with lxml in Python

Python parse XML files with HTML content

Categories

Resources