extracting result text from xml using Python

I have the following XML:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns3:result xmlns:ns2="http://ws.def.com/">
<ns3:value>QWESW12323D2412123S</ns3:value>
</ns3:result>
I want to parse it with Python and extract the text of the value element. I tried the following:
from xml.etree import ElementTree as etree
xml = etree.fromstring(data)
item = xml.find('ns3:value')
print(item)
But item comes back empty. Could someone help me achieve this with Python?

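Use the syntax '{namespace-uri}value', with the full namespace URI in braces rather than the prefix, to apply the namespace. Note that the document only declares ns2, so the ns3 prefix is undefined and I don't think this is actually valid XML.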
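
To make the lookup concrete, here is a minimal sketch. It assumes the document was meant to bind the ns3 prefix to http://ws.def.com/ (the URI the question declares for ns2); with the prefix left undeclared, the parser is expected to reject the input before find() is ever reached.

```python
from xml.etree import ElementTree as etree

# Assumption: ns3 is bound to the URI that the original document
# declares for ns2. This corrected document is illustrative only.
data = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns3:result xmlns:ns3="http://ws.def.com/">
<ns3:value>QWESW12323D2412123S</ns3:value>
</ns3:result>"""

root = etree.fromstring(data)

# Option 1: spell out the namespace URI in braces.
item = root.find('{http://ws.def.com/}value')
print(item.text)

# Option 2: keep the prefix and pass a prefix-to-URI mapping.
namespaces = {'ns3': 'http://ws.def.com/'}
same_item = root.find('ns3:value', namespaces)
print(same_item.text)
```

The prefix itself carries no meaning for the lookup; only the URI it maps to has to match the one in the document.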

Related

why does xml data parse as a string and not int

I have been trying to get my XML code to work in Python. After a few hours it worked, but when I reopened the code the next day it no longer did: there was always a "not defined" error, and now that I have redone the code, the data comes back as a string.
This is the code I have now (just a summary):
<?xml version='1.0' encoding='UTF-8'?>
<settings>
<standard name="max_days">90</standard>
<standard name="min_days">7</standard>
<standard name="warn_days">7</standard>
</settings>
and I parse my XML file like this:
import xml.etree.ElementTree as ET
file_name = ET.parse('imports.xml')
myroot = file_name.getroot()
for i in myroot[0]:
    print(i.text)
The problem is that the code above produces no output at all, not even an error, but if I do
print(myroot[0].text)
it works. The data I put in is clearly an int, so why is it a string once parsed, and what is the error?
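
A minimal sketch of what seems to be happening, assuming the file looks like the snippet above. myroot[0] is the first standard element, which has no child elements, so looping over it prints nothing. Element text is always returned as a string, so the conversion to int has to be explicit. The XML is inlined here instead of being read from imports.xml so that the example is self-contained.

```python
import xml.etree.ElementTree as ET

# Inlined copy of the settings file from the question.
xml_text = """<settings>
<standard name="max_days">90</standard>
<standard name="min_days">7</standard>
<standard name="warn_days">7</standard>
</settings>"""

root = ET.fromstring(xml_text)

# Iterate over the root (not root[0]) to visit every <standard> element,
# and convert the text explicitly because XML carries no type information.
settings = {}
for standard in root:
    settings[standard.get("name")] = int(standard.text)

print(settings)
```

If a value might not be numeric, wrapping the int() call in a try/except ValueError block is one way to handle it.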

python lxml insert namespaces into header

I am creating an XML file and would like to declare namespaces on the root element. I want to get this:
<ns2:prenosPodatkovRazporedaOdgovorSporocilo xmlns="http://someurl" xmlns:ns2="http://someotherurl" xmlns:ns3="http://another_someotherurl">
<ns2:statusPrenosa>
<status>09</status>
<nazivStatusa>Napaka prenosa</nazivStatusa>
but instead I get this:
<?xml version='1.0' encoding='UTF-8'?>
<prenosPodatkovRazporedaOdgovorSporocilo>
<ns0:podatkiRazporeda xmlns:ns0="ns2">
<ns2:podatkiRazporeda xmlns:ns2="ns3">
.....
register_namespace does not do the trick. How do I include the namespaces when using the tostring() function?
etree.register_namespace('n2',"http://someurl")
etree.register_namespace('n3',"http://someother_url")
etree.register_namespace('n4',"http://another_someotherurl")
hope I was clear enough
thank you
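
One possible approach, sketched under the assumption that lxml is installed: pass an nsmap to the root element and build every tag from the full namespace URI. The output in the question (xmlns:ns0="ns2") suggests the tags were created with the prefix in the braces where the URI is expected, though that is a guess since the element-building code is not shown. The element names and URIs below are taken from the desired output; how the real document assigns elements to namespaces is assumed.

```python
from lxml import etree

NS_DEFAULT = "http://someurl"
NS2 = "http://someotherurl"
NS3 = "http://another_someotherurl"

# None maps to the default (unprefixed) namespace.
NSMAP = {None: NS_DEFAULT, "ns2": NS2, "ns3": NS3}

# Tags use the "{uri}localname" form; lxml picks the prefix from nsmap.
root = etree.Element(
    "{%s}prenosPodatkovRazporedaOdgovorSporocilo" % NS2, nsmap=NSMAP
)
status_prenosa = etree.SubElement(root, "{%s}statusPrenosa" % NS2)

status = etree.SubElement(status_prenosa, "{%s}status" % NS_DEFAULT)
status.text = "09"
naziv = etree.SubElement(status_prenosa, "{%s}nazivStatusa" % NS_DEFAULT)
naziv.text = "Napaka prenosa"

xml_bytes = etree.tostring(
    root, pretty_print=True, xml_declaration=True, encoding="UTF-8"
)
print(xml_bytes.decode("utf-8"))
```

With nsmap set on the root, the declarations appear once on the root element instead of being repeated on each child.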

XML parsing with ElementTree produces wrong output

I want to parse an XML file with ElementTree, but at a certain tag the output is wrong:
<descriptions>
<description descriptionType="Abstract">Some Abstract Text
</description>
</descriptions>
So I parse it with the XML function
import xml.etree.ElementTree as ElementTree
root = ElementTree.XML(my_xml)
root.getchildren()[0].items()
and the outcome is:
Out: [('descriptionType', 'Abstract')]
Is there a problem with the XML, am I using ElementTree the wrong way, or is it a bug?

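I guess you want the element's text; items() returns the attributes as (name, value) pairs. So use:
root.getchildren()[0].text
not
root.getchildren()[0].items()
Note that getchildren() was removed in Python 3.9; root[0] is the equivalent there.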

It turned out that when an element has no child tags, its content is stored in the text attribute.
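
For comparison, a small sketch that shows both accessors side by side on the XML from the question. It indexes the root directly rather than calling getchildren(), so that it also runs on current Python versions.

```python
import xml.etree.ElementTree as ElementTree

my_xml = """<descriptions>
<description descriptionType="Abstract">Some Abstract Text
</description>
</descriptions>"""

root = ElementTree.XML(my_xml)
description = root[0]

# items() lists the element's attributes as (name, value) pairs.
attributes = description.items()
print(attributes)

# text holds the character data before the first child element;
# strip() drops the trailing newline present in the source.
abstract = description.text.strip()
print(abstract)
```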

Python ElementTree - print out namespace definitions?

I'm using Python's elementtree to parse some XML configuration files.
At the top of the file, I have a root element like this:
<?xml version="1.0" encoding="utf-8"?>
<sgx:FooConfig
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:foo="http://ns.au.firm.com/foo.xsd"
xmlns:bar="http://ns.au.firm.com/bar.xsd"
>
The problem is, the bar namespace can be set to one of two different XSDs, depending on the version of the configuration file.
I'm looking for a way to print out the namespace mapping using ElementTree, so I can check which of the two XSDs is being used - then I can get my code to handle the correct case.
Is there a way to print out all the namespace definitions using Python?
Cheers,
Victor

What you have is not valid XML (the sgx prefix is undeclared). I don't think xml.etree keeps the prefix mapping on the parsed tree, but you should be able to get it using lxml:
import lxml.etree as et
tree = et.XML(yourxml)
print(tree.nsmap)
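
If installing lxml is not an option, the standard library can report namespace declarations while it parses, through the "start-ns" event of iterparse. The sketch below is only an illustration: the sgx declaration and its URI are hypothetical, added so that the snippet from the question becomes parseable, and the root element is closed by hand.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical: the xmlns:sgx line is not in the original file excerpt.
xml_bytes = b"""<?xml version="1.0" encoding="utf-8"?>
<sgx:FooConfig
    xmlns:sgx="http://ns.au.firm.com/sgx.xsd"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:foo="http://ns.au.firm.com/foo.xsd"
    xmlns:bar="http://ns.au.firm.com/bar.xsd"
>
</sgx:FooConfig>"""

# Each "start-ns" event carries a (prefix, uri) tuple.
namespaces = {}
for event, (prefix, uri) in ET.iterparse(
    io.BytesIO(xml_bytes), events=("start-ns",)
):
    namespaces[prefix] = uri

print(namespaces)

# Branch on whichever XSD the bar prefix points at.
uses_bar_xsd = namespaces.get("bar") == "http://ns.au.firm.com/bar.xsd"
print(uses_bar_xsd)
```

Note that the same prefix can be redeclared deeper in a document, in which case this simple dictionary keeps only the last declaration seen.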

Reading XML DOCTYPE info with Python

I need to parse a version of an XML file as follows.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE twReport [
<!ELEMENT twReport (twHead?, (twWarn | twDebug | twInfo)*, twBody, twSum?,
twDebug*, twFoot?, twClientInfo?)>
<!ATTLIST twReport version CDATA "10,4"> <----- VERSION INFO HERE
I use xml.dom.minidom for parsing the XML file, and I need to read the version that is written in the embedded DTD.
Can I use xml.dom.minidom for this purpose?
Is there any Python XML parser that can do this?

How about xmlproc's DTD api?
Here's a random snippet of code I wrote years and years ago to do some work with DTDs from Python, which might give you an idea of what it's like to work with this library:
from xml.parsers.xmlproc import dtdparser
attr_separator = '_'
child_separator = '_'
dtd = dtdparser.load_dtd('schedule.dtd')
for name, element in dtd.elems.items():
    for attr in element.attrlist:
        output = '%s%s%s = ' % (name, attr_separator, attr)
        print output
    for child in element.get_valid_elements(element.get_start_state()):
        output = '%s%s%s = ' % (name, child_separator, child)
        print output
(FYI, this was the first result when searching for "python dtd parser")

Because both of the standard library XML libraries (xml.dom.minidom and xml.etree) use the same parser (xml.parsers.expat), you are limited in the "quality" of XML data you are able to successfully parse.
You're better off using the tried-and-true 3rd party modules out there like lxml or BeautifulSoup that are not only more resilient to errors, but will also give you exactly what you are looking for with little trouble.
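
Since the standard library parsers sit on top of xml.parsers.expat, another option worth trying is to ask expat directly: it exposes an AttlistDeclHandler callback that fires for each attribute declared in the DTD. The sketch below assumes the internal subset is closed with ]> and followed by a twReport root element, which the excerpt in the question does not show.

```python
import xml.parsers.expat

# Assumption: the document continues with "]>" and a twReport element.
xml_bytes = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE twReport [
<!ELEMENT twReport (twHead?, (twWarn | twDebug | twInfo)*, twBody, twSum?,
                    twDebug*, twFoot?, twClientInfo?)>
<!ATTLIST twReport version CDATA "10,4">
]>
<twReport/>"""

# Maps (element name, attribute name) to the declared default value.
defaults = {}


def on_attlist(elname, attname, type_, default, required):
    """Record the default value of every attribute declared in the DTD."""
    defaults[(elname, attname)] = default


parser = xml.parsers.expat.ParserCreate()
parser.AttlistDeclHandler = on_attlist
parser.Parse(xml_bytes, True)

version = defaults[("twReport", "version")]
print(version)
```

expat does not validate, so the undeclared child elements in the content model are not a problem for this purpose.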
