i am trying to get the text of an element using mini dom, in the following code , i have also tried getText() Method as suggested here , but i am unable to get the desired output, following is my code. I dont get the Text value from the element i am trying to work on.
import xml.dom.minidom
doc = xml.dom.minidom.parse("DL_INVOICE_DETAIL_TCB.xml")
results = doc.getElementsByTagName("G_TRANSACTIONS")
def getText(nodelist):
rc = []
for node in nodelist:
if node.nodeType == node.TEXT_NODE:
rc.append(node.data)
return ''.join(rc)
for result in results:
for element in result.getElementsByTagName("INVOICE_NUMBER"):
print(element.nodeType)
print(element.nodeValue)
Following is my XML sample
<LIST_G_TRANSACTIONS>
<G_TRANSACTIONS>
<INVOICE_NUMBER>31002</INVOICE_NUMBER>
<TRANSACTION_CLASS>Invoice</TRANSACTION_CLASS>
</G_TRANSACTIONS>
</LIST_G_TRANSACTIONS>
I am using the following
A minidom based answer
from xml.dom import minidom
xml = """\
<LIST_G_TRANSACTIONS>
<G_TRANSACTIONS>
<INVOICE_NUMBER>31002</INVOICE_NUMBER>
<TRANSACTION_CLASS>Invoice1</TRANSACTION_CLASS>
</G_TRANSACTIONS>
<G_TRANSACTIONS>
<INVOICE_NUMBER>31006</INVOICE_NUMBER>
<TRANSACTION_CLASS>Invoice2</TRANSACTION_CLASS>
</G_TRANSACTIONS>
</LIST_G_TRANSACTIONS>"""
dom = minidom.parseString(xml)
invoice_numbers = [int(x.firstChild.data) for x in dom.getElementsByTagName("INVOICE_NUMBER")]
print(invoice_numbers)
output
[31002, 31006]
If using ElementTree is fine with you, here is the code:
import xml.etree.ElementTree as ET
xml = '''<LIST_G_TRANSACTIONS>
<G_TRANSACTIONS>
<INVOICE_NUMBER>31002</INVOICE_NUMBER>
<TRANSACTION_CLASS>Invoice1</TRANSACTION_CLASS>
</G_TRANSACTIONS>
<G_TRANSACTIONS>
<INVOICE_NUMBER>31006</INVOICE_NUMBER>
<TRANSACTION_CLASS>Invoice2</TRANSACTION_CLASS>
</G_TRANSACTIONS>
</LIST_G_TRANSACTIONS>'''
root = ET.fromstring(xml)
invoice_numbers = [entry.text for entry in list(root.findall('.//INVOICE_NUMBER'))]
print(invoice_numbers)
output
['31002', '31006']
Related
I need to access the tags in UBL 2.1 and modify them depend on the on the user input on python.
So, I used the ElementTree library to access the tags and modify them.
Here is a sample of the xml code:
<ns0:Invoice xmlns:ns0="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:ns1="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:ns2="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2">
<ns1:ProfileID>reporting:1.0</ns1:ProfileID>
<ns1:ID>0</ns1:ID>
<ns1:UUID>dbdf65eb-5d66-47e6-bb0c-a84bbf7baa30</ns1:UUID>
<ns1:IssueDate>2022-11-05</ns1:IssueDate>
The issue :
I want to access the tags but it is doesn't modifed and enter the loop
I tried both ways:
mytree = ET.parse('test.xml')
myroot = mytree.getroot()
for x in myroot.find({xmlns:ns1=urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2}IssueDate}"):
x.text = '1999'
mytree.write('test.xml')
mytree = ET.parse('test.xml')
myroot = mytree.getroot()
for x in myroot.iter('./Invoice/AllowanceCharge/ChargeIndicator'):
x.text = str('true')
mytree.write('test.xml')
None of them worked and modify the tag.
So the questions is : How can I reach the specific tag and modify it?
If you correct the namespace and the brakets in your for loop it works for a valid XML like (root tag must be closed!):
Input:
<?xml version="1.0" encoding="utf-8"?>
<ns0:Invoice xmlns:ns0="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:ns1="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:ns2="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2">
<ns1:ProfileID>reporting:1.0</ns1:ProfileID>
<ns1:ID>0</ns1:ID>
<ns1:UUID>dbdf65eb-5d66-47e6-bb0c-a84bbf7baa30</ns1:UUID>
<ns1:IssueDate>2022-11-05</ns1:IssueDate>
</ns0:Invoice>
Your repaired code:
import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
root = tree.getroot()
for elem in root.findall("{urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2}IssueDate"):
elem.text = '1999'
tree.write('test_changed.xml', encoding='utf-8', xml_declaration=True)
ET.dump(root)
Output:
<ns0:Invoice xmlns:ns0="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:ns1="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2">
<ns1:ProfileID>reporting:1.0</ns1:ProfileID>
<ns1:ID>0</ns1:ID>
<ns1:UUID>dbdf65eb-5d66-47e6-bb0c-a84bbf7baa30</ns1:UUID>
<ns1:IssueDate>1999</ns1:IssueDate>
</ns0:Invoice>
I am using Python 3.7 with Anaconda, and I am trying to get the text value from the XML element following the element with text = CX.PAIR1.BORROWER.FICO. So in this case I would like to return '779'.
As can be seen with the following XML segment, these tags are not unique and have no names or attributes, so only the contained text can be used to locate the data.
<CustomField>
<id>CustomField/63</id>
<fieldName>CX.WLS.RATETYPE</fieldName>
<stringValue>FIX30</stringValue>
</CustomField>
<CustomField>
<id>CustomField/64</id>
<fieldName>CX.PAIR1.BORROWER.FICO</fieldName>
<stringValue>779</stringValue>
<numericValue>779.0</numericValue>
</CustomField>
<CustomField>
<id>CustomField/65</id>
<fieldName>CX.PAIRS16</fieldName>
<stringValue>779</stringValue>
<numericValue>779.0</numericValue>
</CustomField>
I have tried various forms of this:
Borrower_FICO = root.find('.//*[fieldName = "CX.PAIR1.BORROWER.FICO"]/following-sibling::node()')
and..
Borrower_FICO = root.find('.//*[text() = "CX.PAIR1.BORROWER.FICO"]/following-sibling::node()')
but can't see to pull the data into my variable Borrower_FICO
What am I getting wrong here?
following-sibling is not supported by ElementTree. It does work with lxm though.
I converted to lxml and got it to work using a loop:
for elem in root.iter():
if elem.text == "CX.PAIR1.BORROWER.FICO" :
Borrower_FICO = elem.getnext().text
Below: (no need for lxml)
import xml.etree.ElementTree as ET
xml = '''<r><CustomField>
<id>CustomField/63</id>
<fieldName>CX.WLS.RATETYPE</fieldName>
<stringValue>FIX30</stringValue>
</CustomField>
<CustomField>
<id>CustomField/64</id>
<fieldName>CX.PAIR1.BORROWER.FICO</fieldName>
<stringValue>779</stringValue>
<numericValue>779.0</numericValue>
</CustomField>
<CustomField>
<id>CustomField/65</id>
<fieldName>CX.PAIRS16</fieldName>
<stringValue>779</stringValue>
<numericValue>779.0</numericValue>
</CustomField></r>'''
root = ET.fromstring(xml)
entry = root.find(".//CustomField/[fieldName='CX.PAIR1.BORROWER.FICO']")
print(entry.find('stringValue').text)
output:
779
I'm trying to parse the following XML using Python and lxml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/bind9.xsl"?>
<isc version="1.0">
<bind>
<statistics version="2.2">
<memory>
<summary>
<TotalUse>1232952256
</TotalUse>
<InUse>835252452
</InUse>
<BlockSize>598212608
</BlockSize>
<ContextSize>52670016
</ContextSize>
<Lost>0
</Lost>
</summary>
</memory>
</statistics>
</bind>
</isc>
The goal is to extract the tag name and text of every element under bind/statistics/memory/summary in order to produce the following mapping:
TotalUse: 1232952256
InUse: 835252452
BlockSize: 598212608
ContextSize: 52670016
Lost: 0
I've managed to extract the element values, but I can't figure out the xpath expression to get the element tag names.
A sample script:
from lxml import etree as et
def main():
xmlfile = "bind982.xml"
location = "bind/statistics/memory/summary/*"
label_selector = "??????" ## what to put here...?
value_selector = "text()"
with open(xmlfile, "r") as data:
xmldata = et.parse(data)
etree = xmldata.getroot()
statlist = etree.xpath(location)
for stat in statlist:
label = stat.xpath(label_selector)[0]
value = stat.xpath(value_selector)[0]
print "{0}: {1}".format(label, value)
if __name__ == '__main__':
main()
I know I could use value = stat.tag instead of stat.xpath(), but the script must be sufficiently generic to also process other pieces of XML where the label selector is different.
What xpath selector would return an element's tag name?
Simply use XPath's name(), and remove the zero index since this returns a string and not list.
from lxml import etree as et
def main():
xmlfile = "ExtractXPathTagName.xml"
location = "bind/statistics/memory/summary/*"
label_selector = "name()" ## what to put here...?
value_selector = "text()"
with open(xmlfile, "r") as data:
xmldata = et.parse(data)
etree = xmldata.getroot()
statlist = etree.xpath(location)
for stat in statlist:
label = stat.xpath(label_selector)
value = stat.xpath(value_selector)[0]
print("{0}: {1}".format(label, value).strip())
if __name__ == '__main__':
main()
Output
TotalUse: 1232952256
InUse: 835252452
BlockSize: 598212608
ContextSize: 52670016
Lost: 0
I think you don't need XPath for the two values, the element nodes have properties tag and text so use for instance a list comprehension:
[(element.tag, element.text) for element in etree.xpath(location)]
Or if you really want to use XPath
result = [(element.xpath('name()'), element.xpath('string()')) for element in etree.xpath(location)]
You could of course also construct a list of dictionaries:
result = [{ element.tag : element.text } for element in root.xpath(location)]
or
result = [{ element.xpath('name()') : element.xpath('string()') } for element in etree.xpath(location)]
I am trying to easily access values from an xml file.
<artikelen>
<artikel nummer="121">
<code>ABC123</code>
<naam>Highlight pen</naam>
<voorraad>231</voorraad>
<prijs>0.56</prijs>
</artikel>
<artikel nummer="123">
<code>PQR678</code>
<naam>Nietmachine</naam>
<voorraad>587</voorraad>
<prijs>9.99</prijs>
</artikel>
..... etc
If i want to acces the value ABC123, how do I get it?
import xmltodict
with open('8_1.html') as fd:
doc = xmltodict.parse(fd.read())
print(doc[fd]['code'])
Using your example:
import xmltodict
with open('artikelen.xml') as fd:
doc = xmltodict.parse(fd.read())
If you examine doc, you'll see it's an OrderedDict, ordered by tag:
>>> doc
OrderedDict([('artikelen',
OrderedDict([('artikel',
[OrderedDict([('#nummer', '121'),
('code', 'ABC123'),
('naam', 'Highlight pen'),
('voorraad', '231'),
('prijs', '0.56')]),
OrderedDict([('#nummer', '123'),
('code', 'PQR678'),
('naam', 'Nietmachine'),
('voorraad', '587'),
('prijs', '9.99')])])]))])
The root node is called artikelen, and there a subnode artikel which is a list of OrderedDict objects, so if you want the code for every article, you would do:
codes = []
for artikel in doc['artikelen']['artikel']:
codes.append(artikel['code'])
# >>> codes
# ['ABC123', 'PQR678']
If you specifically want the code only when nummer is 121, you could do this:
code = None
for artikel in doc['artikelen']['artikel']:
if artikel['#nummer'] == '121':
code = artikel['code']
break
That said, if you're parsing XML documents and want to search for a specific value like that, I would consider using XPath expressions, which are supported by ElementTree.
This is using xml.etree
You can try this:
for artikelobj in root.findall('artikel'):
print artikelobj.find('code')
if you want to extract a specific code based on the attribute 'nummer' of artikel, then you can try this:
for artikelobj in root.findall('artikel'):
if artikel.get('nummer') == 121:
print artikelobj.find('code')
this will print only the code you want.
You can use lxml package using XPath Expression.
from lxml import etree
f = open("8_1.html", "r")
tree = etree.parse(f)
expression = "/artikelen/artikel[1]/code"
l = tree.xpath(expression)
code = next(i.text for i in l)
print code
# ABC123
The thing to notice here is the expression. /artikelen is the root element. /artikel[1] chooses the first artikel element under root(Notice first element is not at index 0). /code is the child element under artikel[1]. You can read more about at lxml and xpath syntax.
To read .xml files :
import lxml.etree as ET
root = ET.parse(filename).getroot()
value = root.node1.node2.variable_name.text
xml file :
<global>
<rtmp>
<fcsapp>
<password>
<key>hello123</key>
<key>check123</key>
</password>
</fcsapp>
</rtmp>
</global>
python code : To obtain all the key tag values.
hello123
check123
using xml.etree.ElementTree
for streams in xmlRoot.iter('global'):
xpath = "/rtmp/fcsapp/password"
tag = "key"
for child in streams.findall(xpath):
resultlist.append(child.find(tag).text)
print resultlist
The output obtained is [hello123], but I want it to display both ([hello123, check123])
How do I obtain this?
Using lxml and cssselect I would do it like this:
>>> from lxml.html import fromstring
>>> doc = fromstring(open("foo.xml", "r").read())
>>> doc.cssselect("password key")
[<Element key at 0x7f77a6786cb0>, <Element key at 0x7f77a6786d70>]
>>> [e.text for e in doc.cssselect("password key")]
['hello123 \n ', 'check123 \n ']
With lxml and xpath You can do it in the following way:
from lxml import etree
xml = """
<global>
<rtmp>
<fcsapp>
<password>
<key>hello123</key>
<key>check123</key>
</password>
</fcsapp>
</rtmp>
</global>
"""
tree = etree.fromstring(xml)
result = tree.xpath('//password/key/text()')
print result # ['hello123', 'check123']
try beautifulsoup package "https://pypi.python.org/pypi/BeautifulSoup"
using xml.etree.ElementTree
for streams in xmlRoot.iter('global'):
xpath = "/rtmp/fcsapp/password"
tag = "key"
for child in streams.iter(tag):
resultlist.append(child.text)
print resultlist
have to iter over the "key" tag in for loop to obtain the desired result. The above code solves the problem.