Why does XML data parse as a string and not an int? - Python

I have been trying to get my XML code to work in Python. After a few hours it did work, but the next day when I reopened the code, it didn't. There was always a "not defined" error, and now when I try to redo the code, the data turns into a string?
This is the code that I have now (just a summary):
<?xml version='1.0' encoding='UTF-8'?>
<settings>
<standard name="max_days">90</standard>
<standard name="min_days">7</standard>
<standard name="warn_days">7</standard>
</settings>
I parse my XML file like this:
file_name = ET.parse('imports.xml')
myroot = file_name.getroot()
for i in myroot[0]:
    print i.text
The problem is that when I run the above, there is no output at all, not even an error. But if I do
print myroot[0].text
it works. The data that I put in is clearly an int, so why is it a string when it's parsed, and what is the error?
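A minimal sketch of what is probably going on, assuming the file looks like the summary above: ElementTree always returns element text as a string (XML itself has no integer type), so the conversion with int() has to be done by hand. Also, myroot[0] is the first <standard> element, which has no children, so looping over it prints nothing; the loop should run over the root instead. The inline XML and the name SETTINGS_XML are stand-ins for imports.xml.

```python
import xml.etree.ElementTree as ET

# Inline copy of the settings file from the question (stand-in for imports.xml).
SETTINGS_XML = b"""<?xml version='1.0' encoding='UTF-8'?>
<settings>
<standard name="max_days">90</standard>
<standard name="min_days">7</standard>
<standard name="warn_days">7</standard>
</settings>"""

root = ET.fromstring(SETTINGS_XML)

# Iterate over the children of the root, not over root[0] (which has no children).
settings = {}
for element in root.findall('standard'):
    # .text is always a str; convert explicitly to get an int.
    settings[element.get('name')] = int(element.text)

print(settings)
```

The "not defined" error mentioned in the question is most likely a missing import (import xml.etree.ElementTree as ET), but that is a guess, since the full code isn't shown.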

Related

Importing xml files with an encoder element at the start(using lxml)

I have a file trace.xml. The very first line in this file is <?xml version="1.0" encoding="UTF-8"?>. I tried reading the data using the following command:
with open('trace.xml') as fobj:
    xml = fobj.read()
root = etree.fromstring(xml)
This, however, yields the following error: ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
I am assuming it's because of the very first line I mentioned above. Is there a way around this? Any help would be appreciated.
Try changing
root = etree.fromstring(xml)
to
root = etree.fromstring(xml.encode())
and see if it works.
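An alternative sketch, in case re-encoding feels roundabout: open the file in binary mode so the parser receives bytes and can honour the encoding declaration itself. The file contents below are made up for illustration (the real trace.xml isn't shown), and the fallback to the standard library is only there so the snippet runs without lxml installed.

```python
import os
import tempfile

try:
    from lxml import etree  # the library used in the question
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback, same fromstring() call

# Hypothetical file contents; only the first line is taken from the question.
xml_bytes = b'<?xml version="1.0" encoding="UTF-8"?>\n<trace><event id="1"/></trace>'

# Write a temporary stand-in for trace.xml.
with tempfile.NamedTemporaryFile(suffix='.xml', delete=False) as tmp:
    tmp.write(xml_bytes)
    path = tmp.name

try:
    # 'rb' gives bytes, so no Unicode string with a declaration reaches the parser.
    with open(path, 'rb') as fobj:
        root = etree.fromstring(fobj.read())
finally:
    os.remove(path)

print(root.tag)
```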

how to build xml file in python, with formatting

I'm trying to build an XML file in Python so I can write it out to a file, but I'm running into complications with new lines, tabs, etc.
I cannot use a module to do this, because I'm using a cut-down version of Python 2. It must all be in pure Python.
For instance, how is it possible to create an XML file with this type of formatting, which keeps all the new lines and tabs (whitespace)?
e.g.
<?xml version="1.0" encoding="UTF-8"?>
<myfiledata>
<mydata>
blahblah
</mydata>
</myfiledata>
I've tried enclosing each line
' <myfiledata>' +\n
' blahblah' +\n
etc.
However, the output I'm getting from the script is nothing close to how it looks in my Python file: there is extra whitespace and the new lines aren't working properly.
Is there any definitive way to do this? I would rather be editing a file that looks somewhat like what I will end up with, for clarity's sake.
You can use XMLGenerator from xml.sax.saxutils to generate the XML and xml.dom.minidom to parse it and pretty-print it (both modules are in the Python 2 standard library).
Sample code creating an XML document and pretty-printing it:
from __future__ import print_function
from xml.sax.saxutils import XMLGenerator
import io
import xml.dom.minidom
def pprint_xml_string(s):
    """Pretty-print an XML string with minidom"""
    parsed = xml.dom.minidom.parse(io.BytesIO(s))
    return parsed.toprettyxml()
# create an XML file in-memory:
fp = io.BytesIO()
xg = XMLGenerator(fp)
xg.startDocument()
xg.startElement('root', {})
xg.startElement('subitem', {})
xg.characters('text content')
xg.endElement('subitem')
xg.startElement('subitem', {})
xg.characters('text content for another subitem')
xg.endElement('subitem')
xg.endElement('root')
xg.endDocument()
# pretty-print it
xml_string = fp.getvalue()
pretty_xml = pprint_xml_string(xml_string)
print(pretty_xml)
Output is:
<?xml version="1.0" ?>
<root>
	<subitem>text content</subitem>
	<subitem>text content for another subitem</subitem>
</root>
Note that the text content elements (wrapped in <subitem> tags) aren't indented because doing so would change their content (XML doesn't ignore whitespace like HTML does).
The answer was to use xml.etree.ElementTree and from xml.dom import minidom,
which are both available in Python 2.5.
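A minimal sketch of that combination, using the element names from the question. The four-space indent is an arbitrary choice, and the function name build_pretty_xml is just for illustration.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom


def build_pretty_xml():
    """Build a small tree with ElementTree and pretty-print it with minidom."""
    root = ET.Element('myfiledata')
    child = ET.SubElement(root, 'mydata')
    child.text = 'blahblah'
    # Serialize without formatting, then let minidom add newlines and indentation.
    rough = ET.tostring(root)
    return minidom.parseString(rough).toprettyxml(indent='    ')


pretty = build_pretty_xml()
print(pretty)
```

The result can then be written to a file with an ordinary open(...).write(...) call. Note that minidom writes its own XML declaration, without an encoding attribute unless one is passed to toprettyxml().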

extracting result text from xml using Python

I have the following XML:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns3:result xmlns:ns2="http://ws.def.com/">
<ns3:value>QWESW12323D2412123S</ns3:value>
</ns3:result>
and want to parse it with Python and extract this text. I tried the following:
from xml.etree import ElementTree as etree
xml = etree.fromstring(data)
item = xml.find('ns3:value')
print item
but I get an empty item. Could someone help me achieve this with Python?
Use the syntax '{namespace-uri}value' (the full namespace URI in braces, not the prefix) to apply the namespace. Note, though, that the prefix ns3 is never declared (only ns2 is), so I don't think this is actually valid XML.
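A minimal sketch of the namespace lookup, assuming the declaration was meant to be xmlns:ns3 rather than xmlns:ns2 (with the XML exactly as posted, the parser would reject the unbound ns3 prefix). Both the brace syntax and a prefix-to-URI mapping passed to find() are shown.

```python
import xml.etree.ElementTree as ET

# Same document as in the question, but with the prefix declared as ns3 (an assumption).
data = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns3:result xmlns:ns3="http://ws.def.com/">
<ns3:value>QWESW12323D2412123S</ns3:value>
</ns3:result>"""

root = ET.fromstring(data)

# Option 1: put the full namespace URI in braces in front of the tag name.
item = root.find('{http://ws.def.com/}value')

# Option 2: pass a prefix-to-URI mapping so the prefixed name can be used.
namespaces = {'ns3': 'http://ws.def.com/'}
same_item = root.find('ns3:value', namespaces)

print(item.text)
```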

How to auto-close xml tags in truncated file?

I receive an email when a system in my company generates an error. This email contains XML all crammed onto a single line.
I wrote a notepad++ Python script that parses out everything except XML and pretty prints it. Unfortunately some of the emails contain too much XML data and it gets truncated. In general, the truncated data isn't that important to me. I would like to be able to just auto-close any open tags so that my Python script works. It doesn't need to be smart or correct, it just needs to make the xml well-enough formed that the script runs. Is there a way to do this?
I am open to Python scripts, online apps, downloadable apps, etc.
I realize that the right solution is to get the non-truncated xml, but pulling the right lever to get things done will be far more work than just dealing with it.
Use Beautiful Soup
>>> import bs4
>>> s= bs4.BeautifulSoup("<asd><xyz>asd</xyz>")
>>> s
<html><head></head><body><asd><xyz>asd</xyz></asd></body></html>
>>>
>>> s.body.contents[0]
<asd><xyz>asd</xyz></asd>
Notice that it closed the "asd" tag automagically.
To create a Notepad++ script to handle this:
Download the tarball and extract the files.
Copy the bs4 directory to your PythonScript/scripts folder.
In Notepad++, add the following code to your Python script:
#import Beautiful Soup
import bs4
#get text in document
text = editor.getText()
#soupify it to fix XML
soup = bs4.BeautifulSoup(text)
#convert soup object to string again
text = str(soup)
#clear editor and replace bad xml with fixed xml
editor.clearAll()
editor.addText(text)
#change language to xml
notepad.menuCommand( MENUCOMMAND.LANG_XML )
#soup has its own prettify, but I like the XML tools version better
notepad.runMenuCommand('XML Tools', 'Pretty print (XML only - with line breaks)', 1)
If you have BeautifulSoup and lxml installed, it's straightforward:
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup("""
... <?xml version="1.0" encoding="utf-8"?>
... <a>
... <b>foo</b>
... <c>bar</""", "xml")
>>> soup
<?xml version="1.0" encoding="utf-8"?>
<a>
<b>foo</b>
<c>bar</c></a>
Note the second "xml" argument to the constructor to avoid the XML being interpreted as HTML.
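If installing BeautifulSoup isn't an option, a rough standard-library sketch of the same idea is below. This is not what the answers above do: it is a deliberately naive regex-based tag stack, and the name close_open_tags is made up for illustration. It ignores CDATA sections and '<' or '>' characters inside attribute values, so treat it as "well-formed enough", not correct.

```python
import re
import xml.etree.ElementTree as ET

# Matches opening, closing and self-closing element tags.
# Declarations (<?...?>) and comments (<!--...-->) do not match and are skipped.
TAG_RE = re.compile(r'<(/?)([A-Za-z_][\w:.\-]*)[^<>]*?(/?)>')


def close_open_tags(text):
    """Drop a trailing partial tag, then append closers for still-open elements."""
    last_lt = text.rfind('<')
    if last_lt > text.rfind('>'):
        # The text ends in the middle of a tag; cut the fragment off.
        text = text[:last_lt]
    open_tags = []
    for closing, name, self_closing in TAG_RE.findall(text):
        if self_closing:
            continue
        if closing:
            if open_tags and open_tags[-1] == name:
                open_tags.pop()
        else:
            open_tags.append(name)
    # Close the innermost element first.
    return text + ''.join('</%s>' % name for name in reversed(open_tags))


truncated = '<a><b>foo</b><c>bar</'
repaired = close_open_tags(truncated)
root = ET.fromstring(repaired)  # parses once the tags are balanced
print(repaired)
```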

Parsing an XML file in python for emailing purposes

I am writing code in Python that not only reads an XML file but also sends the results of that parsing as an email. Right now I am having trouble just reading the XML file. I made a simple Python script that I thought would at least read the file, which I could then try to email from Python, but I am getting a syntax error on line 4.
root.tag 'log'
Anyway, here is the code I have written so far:
import xml.etree.cElementTree as etree
tree = etree.parse('C:/opidea.xml')
response = tree.getroot()
log = response.find('log').text
logentry = response.find('logentry').text
author = response.find('author').text
date = response.find('date').text
msg = [i.text for i in response.find('msg')]
The XML file has this type of formatting:
<log>
<logentry
revision="12345">
<author>glv</author>
<date>2012-08-09T13:16:24.488462Z</date>
<paths>
<path
action="M"
kind="file">/trunk/build.xml</path>
</paths>
<msg>BUG_NUMBER:N/A
FEATURE_AFFECTED:N/A
OVERVIEW:Example</msg>
</logentry>
</log>
I want to be able to send an email of this XML file. For now, though, I am just trying to get the Python code to read the XML file.
response.find('log') won't find anything, because:
find(self, path, namespaces=None)
Finds the first matching subelement, by tag name or path.
In your case log is not a subelement, but rather the root element itself. You can get its text directly, though: response.text. But in your example the log element doesn't have any text in it, anyway.
EDIT: Sorry, that quote from the docs actually applies to lxml.etree documentation, rather than xml.etree.
I'm not sure about the reason, but all other calls to find also return None (you can find it out by printing response.find('date') and so on). With lxml you can use xpath instead:
author = response.xpath('//author')[0].text
msg = [i.text for i in response.xpath('//msg')]
In any case, your use of find is not correct for msg, because find always returns a single element, not a list of them.
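For what it's worth, the likely reason is that in ElementTree a bare tag name passed to find only matches direct children of the element it is called on. Here author, date and msg are children of logentry, not of the root log element, so find('author') on the root returns None (find('logentry') itself should succeed). A minimal standard-library sketch using the sample from the question, inlined as a string in place of C:/opidea.xml:

```python
import xml.etree.ElementTree as ET

# Inline copy of the sample log from the question (stand-in for C:/opidea.xml).
SAMPLE = """<log>
<logentry revision="12345">
<author>glv</author>
<date>2012-08-09T13:16:24.488462Z</date>
<paths>
<path action="M" kind="file">/trunk/build.xml</path>
</paths>
<msg>BUG_NUMBER:N/A
FEATURE_AFFECTED:N/A
OVERVIEW:Example</msg>
</logentry>
</log>"""

root = ET.fromstring(SAMPLE)

# Walk each logentry and read its direct children with findtext().
entries = []
for entry in root.findall('logentry'):
    entries.append({
        'revision': entry.get('revision'),
        'author': entry.findtext('author'),
        'date': entry.findtext('date'),
        'msg': entry.findtext('msg'),
    })

# './/author' searches all descendants, which also works from the root.
first_author = root.find('.//author').text

print(entries[0]['author'], entries[0]['revision'])
```

The entries list can then be formatted into the body of the email.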
