I'm reading an XML file into Python 3.6 on Windows 10. Here is a sample of the XML:
<?xml version="1.0"?>
<rss version="2.0" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
<item>
<BurnLocation># 32 40 52.99 # 80 57 33.00</BurnLocation>
<geo:lat>32.681389</geo:lat>
<geo:long>-80.959167</geo:long>
<County>Jasper</County>
<BurnType>PD</BurnType>
<BurnTypeDescription>PILED DEBRIS</BurnTypeDescription>
<Acres>2</Acres>
</item>
<item>
<BurnLocation># 33 29 34.26 # 81 15 52.89</BurnLocation>
<geo:lat>33.492851</geo:lat>
<geo:long>-81.264694</geo:long>
<County>Orangebrg</County>
<BurnType>PD</BurnType>
<BurnTypeDescription>PILED DEBRIS</BurnTypeDescription>
<Acres>1</Acres>
</item>
</channel>
</rss>
Here is a version of my code:
import os
import xml.etree.ElementTree as ET
local_filename = os.path.join('C:\\Temp\\test\\', filename)
tree = ET.parse(local_filename)
root = tree.getroot()
for child in root:
    for next1 in child:
        for next2 in next1:
            print(next2.tag,next2.attrib)
The issue I'm having is that I cannot seem to isolate the attributes of the child tags; they come up as empty dictionaries. Here is an example of the result:
BurnLocation {}
{http://www.w3.org/2003/01/geo/wgs84_pos#}lat {}
{http://www.w3.org/2003/01/geo/wgs84_pos#}long {}
County {}
BurnType {}
BurnTypeDescription {}
Acres {}
BurnLocation {}
{http://www.w3.org/2003/01/geo/wgs84_pos#}lat {}
{http://www.w3.org/2003/01/geo/wgs84_pos#}long {}
County {}
BurnType {}
BurnTypeDescription {}
Acres {}
I am trying to print out the values within the tags (e.g. Jasper). What am I doing wrong?
What you want here is the text contents of each element, and not their attributes.
This ought to do it (slightly simplified for a fixed filename):
import xml.etree.ElementTree as ET
tree = ET.parse('sample.xml')
root = tree.getroot()
for child in root:
    for next1 in child:
        for next2 in next1:
            print ('{} = "{}"'.format(next2.tag,next2.text))
        print ()
However, I'd simplify it a bit by:
locating all <item> elements at once, and
then looping over each item's child elements.
Thus:
import xml.etree.ElementTree as ET
tree = ET.parse('sample.xml')
for item in tree.findall('*/item'):
    for elem in list(item):
        print ('{} = "{}"'.format(elem.tag,elem.text))
    print ()
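As a follow-up to the loop above, here is a minimal sketch that collects each <item> into a dict instead of printing it. The helper names local_name and items_as_dicts are made up for this illustration, and the sample is a trimmed copy of the feed with one item only.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the feed from the question (one item only).
SAMPLE = """<?xml version="1.0"?>
<rss version="2.0" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
<channel>
<item>
<geo:lat>32.681389</geo:lat>
<County>Jasper</County>
<Acres>2</Acres>
</item>
</channel>
</rss>"""


def local_name(tag):
    """Drop the '{namespace-uri}' prefix that ElementTree puts on tags."""
    return tag.rsplit('}', 1)[-1]


def items_as_dicts(root):
    """Return one {tag: text} dict per <item> found under root."""
    return [
        {local_name(elem.tag): elem.text for elem in item}
        for item in root.iter('item')
    ]


records = items_as_dicts(ET.fromstring(SAMPLE))
print(records[0]['County'])  # Jasper
```

Stripping the namespace this way is only safe if no two children of an item share a local name; otherwise the later one overwrites the earlier one in the dict.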
I've got an XML file which looks like this:
<?xml version="1.0"?>
<Object>
<ID>Object_01</ID>
<Location>Manchester</Location>
<Date>01-01-2020</Date>
<Time>15u59m05s</Time>
<Max_25Hz>
<25Hz>0.916631065043311</25Hz>
<25Hz>0.797958008447961</25Hz>
</Max_25Hz>
<Max_75Hz>
<75Hz>1.96599232706463</75Hz>
<75Hz>1.48317837078523</75Hz>
</Max_75Hz>
</Object>
I still don't really understand the difference between attributes and text. With the code below I tried to retrieve all the values using text.
import xml.etree.ElementTree as ET
root = r'c:\data\FF\Desktop\My_files\XML-files\Object_01.xml'
tree = ET.parse(root)
root = tree.getroot()
for elem in root:
    for subelem in elem:
        print(subelem.text)
Expected output:
Object_01
Manchester
01-01-2020
15u59m05s
0.916631065043311
0.797958008447961
1.96599232706463
1.48317837078523
Received output:
0.916631065043311
0.797958008447961
1.96599232706463
1.48317837078523
I tried to do the same with .attrib in the hope of getting all the 'column' names, but then I received:
{}
{}
{}
{}
The missing values are the text of the root's direct children, so print elem.text in the outer loop, before descending into the sub-elements.
Ex:
import xml.etree.ElementTree as ET
# X is a string holding the XML document from the question
tree = ET.ElementTree(ET.fromstring(X))
root = tree.getroot()
for elem in root:
    print(elem.text) #! Access them Here
    for subelem in elem:
        print(subelem.text)
Output:
Object_01
Manchester
01-01-2020
15u59m05s
0.916631065043311
0.797958008447961
1.96599232706463
1.48317837078523
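To make the difference between text and attributes concrete, here is a small sketch that walks every element with iter() and reports both. It is an illustration under assumptions: the sample is a hypothetical, shortened variant of the file in which <25Hz> is renamed to <Hz25> (an XML name cannot start with a digit), and the kind and unit attributes are invented so that .attrib has something to show.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened variant of the file from the question.
# The 'kind' and 'unit' attributes are invented for this example.
SAMPLE = """<Object kind="sensor">
<ID>Object_01</ID>
<Location>Manchester</Location>
<Max_25Hz>
<Hz25 unit="g">0.916631065043311</Hz25>
<Hz25 unit="g">0.797958008447961</Hz25>
</Max_25Hz>
</Object>"""

root = ET.fromstring(SAMPLE)

values = []
for elem in root.iter():              # root itself, then all descendants
    text = (elem.text or '').strip()  # container elements hold only whitespace
    if text:
        values.append((elem.tag, text, elem.attrib))

for tag, text, attrib in values:
    print(tag, text, attrib)
```

Text is what sits between the opening and closing tag; attributes are the name="value" pairs inside the opening tag. Elements without attributes, such as <ID>, give an empty dict, which is why the original attempt printed only {}.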
You could give xmltodict a try: https://github.com/martinblech/xmltodict.
Its interface resembles that of the json module: it reads an XML document into a Python dict, which greatly simplifies access to the XML content.
Something like:
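import xmltodict  # third-party package: pip install xmltodict
root = r'c:\data\FF\Desktop\My_files\XML-files\Object_01.xml'
with open(root) as file:
    xmlStr = file.read()
xmldict = xmltodict.parse(xmlStr)
# keys are case-sensitive and match the element names
print (xmldict['Object']['ID'])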
I have an XML file like the one below and need to insert its data into a PostgreSQL database. Below are the sample XML and the code I use, but I'm not getting any output. Can someone please guide me on how to fetch these XML values effectively?
<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0" encoding="utf-8">
<config>
<g:system>Magento</g:system>
<g:extension>Magmodules_Googleshopping</g:extension>
<g:extension_version>1.6.8</g:extension_version>
<g:store>emb</g:store>
<g:url>https://www.xxxxx.com/</g:url>
<g:products>1320</g:products>
<g:generated>2020-06-11 11:18:32</g:generated>
<g:processing_time>17.5007</g:processing_time>
</config>
<channel>
<item>
<g:id>20</g:id>
<g:title>product 1</g:title>
<g:description>description about product 1</g:description>
<g:gtin>42662</g:gtin>
<g:brand>company</g:brand>
<g:mpn>0014</g:mpn>
<g:link>link.html</g:link>
<g:image_link>link/c/a/cat_21_16.jpg</g:image_link>
<g:availability>in stock</g:availability>
<g:condition>new</g:condition>
<g:price>9</g:price>
<g:shipping>
<g:country>UAE</g:country>
<g:service>DHL</g:service>
<g:price>2.90</g:price>
</g:shipping>
</item>
<item>
.
.
.
</item>
Below is the script I use.
Python: 3.5, PostgreSQL version: 11
# import modules
import sys
import psycopg2
import datetime
now = datetime.datetime.now()
# current data and time
dt = now.strftime("%Y%m%dT%H%M%S")
# xml tree access
#from xml.etree import ElementTree
import xml.etree.ElementTree as ET
# incremental variable
x = 0
with open('/Users/admin/documents/shopping.xml', 'rt',encoding="utf8") as f:
    #tree = ElementTree.parse(f)
    tree = ET.parse(f)
# connection to postgreSQL database
try:
    conn=psycopg2.connect(host='localhost', database='postgres',
                          user='postgres', password='postgres',port='5432')
except:
    print ("Hey I am unable to connect to the database.")
cur = conn.cursor()
# access the xml tree element nodes
try:
    for node in tree.findall('.//item'):
        src = node.find('id')
        tgt = node.find('mpn')
        print(node)
except:
    print ("Oops I can't insert record into database table!")
conn.commit()
conn.close()
The current output I'm getting is:
None
None
None
Expected output:
id title description gtin ......
20 product 1 g:description xxxx .....
It is strange that you can't find item. It looks as if you are reading the wrong file, one that has no item elements.
Using your XML data as a string with ET.fromstring(), I have no problem getting item.
Maybe check print( f.read() ) to see what you really read from the file.
The only problem is with id and mpn, which are in the g: namespace: find() needs a namespace mapping in addition to the prefixed names g:id and g:mpn.
tree = ET.fromstring(xml)
ns = {'g': "http://base.google.com/ns/1.0"}
for node in tree.findall('.//item'):
    src = node.find('g:id', ns)
    tgt = node.find('g:mpn', ns)
    print('Node:', node)
    print('src:', src.text)
    print('tgt:', tgt.text)
Or use the fully qualified names directly, as '{http://base.google.com/ns/1.0}id' and '{http://base.google.com/ns/1.0}mpn':
tree = ET.fromstring(xml)
for node in tree.findall('.//item'):
    src = node.find('{http://base.google.com/ns/1.0}id')
    tgt = node.find('{http://base.google.com/ns/1.0}mpn')
    print('Node:', node)
    print('src:', src.text)
    print('tgt:', tgt.text)
Minimal working code:
import xml.etree.ElementTree as ET
xml = '''<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0" encoding="utf-8">
<config>
<g:system>Magento</g:system>
<g:extension>Magmodules_Googleshopping</g:extension>
<g:extension_version>1.6.8</g:extension_version>
<g:store>emb</g:store>
<g:url>https://www.xxxxx.com/</g:url>
<g:products>1320</g:products>
<g:generated>2020-06-11 11:18:32</g:generated>
<g:processing_time>17.5007</g:processing_time>
</config>
<channel>
<item>
<g:id>20</g:id>
<g:title>product 1</g:title>
<g:description>description about product 1</g:description>
<g:gtin>42662</g:gtin>
<g:brand>company</g:brand>
<g:mpn>0014</g:mpn>
<g:link>link.html</g:link>
<g:image_link>link/c/a/cat_21_16.jpg</g:image_link>
<g:availability>in stock</g:availability>
<g:condition>new</g:condition>
<g:price>9</g:price>
<g:shipping>
<g:country>UAE</g:country>
<g:service>DHL</g:service>
<g:price>2.90</g:price>
</g:shipping>
</item>
</channel>
</rss>
'''
tree = ET.fromstring(xml)
ns = {'g': "http://base.google.com/ns/1.0"}
for node in tree.findall('.//item'):
    src = node.find('g:id', ns)
    tgt = node.find('g:mpn', ns)
    print('Node:', node)
    print('src:', src.text)
    print('tgt:', tgt.text)
Result:
Node: <Element 'item' at 0x7f74ba45b710>
src: 20
tgt: 0014
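A third option, sketched here on the assumption that a newer interpreter is available: from Python 3.8 onward, ElementTree paths accept the '{*}' wildcard, which matches a tag in any namespace or none. It does not exist in the Python 3.5 mentioned in the question. The sample is trimmed to the two fields being read.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the feed: one item, two namespaced fields.
SAMPLE = """<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">
<channel>
<item>
<g:id>20</g:id>
<g:mpn>0014</g:mpn>
</item>
</channel>
</rss>"""

root = ET.fromstring(SAMPLE)

pairs = []
for node in root.findall('.//item'):
    # '{*}id' matches <id> in any namespace (or none); Python 3.8+ only.
    src = node.find('{*}id')
    tgt = node.find('{*}mpn')
    pairs.append((src.text, tgt.text))

print(pairs)
```

The wildcard is convenient for quick scripts, but it also matches same-named elements from other namespaces, so the explicit mapping is the safer choice when a feed mixes vocabularies.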
BTW: It works even when I use io.StringIO to simulate a file:
f = io.StringIO(xml)
tree = ET.parse(f)
Minimal working code:
import xml.etree.ElementTree as ET
import io
xml = '''<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0" encoding="utf-8">
<config>
<g:system>Magento</g:system>
<g:extension>Magmodules_Googleshopping</g:extension>
<g:extension_version>1.6.8</g:extension_version>
<g:store>emb</g:store>
<g:url>https://www.xxxxx.com/</g:url>
<g:products>1320</g:products>
<g:generated>2020-06-11 11:18:32</g:generated>
<g:processing_time>17.5007</g:processing_time>
</config>
<channel>
<item>
<g:id>20</g:id>
<g:title>product 1</g:title>
<g:description>description about product 1</g:description>
<g:gtin>42662</g:gtin>
<g:brand>company</g:brand>
<g:mpn>0014</g:mpn>
<g:link>link.html</g:link>
<g:image_link>link/c/a/cat_21_16.jpg</g:image_link>
<g:availability>in stock</g:availability>
<g:condition>new</g:condition>
<g:price>9</g:price>
<g:shipping>
<g:country>UAE</g:country>
<g:service>DHL</g:service>
<g:price>2.90</g:price>
</g:shipping>
</item>
</channel>
</rss>
'''
f = io.StringIO(xml)
tree = ET.parse(f)
ns = {'g': "http://base.google.com/ns/1.0"}
for node in tree.findall('.//item'):
    src = node.find('{http://base.google.com/ns/1.0}id')
    tgt = node.find('{http://base.google.com/ns/1.0}mpn')
    print('Node:', node)
    print('src:', src.text)
    print('mpn:', tgt.text)
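Since the goal is to load the values into a database, here is a hedged sketch of the step in between: turning each item into a tuple of column values. The FIELDS list and the item_rows helper are assumptions made for this example (the real table layout is not given in the question), and the second item is invented to show how a missing field is handled.

```python
import xml.etree.ElementTree as ET

NS = {'g': 'http://base.google.com/ns/1.0'}

# Trimmed feed; the second <item> is invented and deliberately lacks <g:mpn>.
SAMPLE = """<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">
<channel>
<item>
<g:id>20</g:id>
<g:title>product 1</g:title>
<g:mpn>0014</g:mpn>
</item>
<item>
<g:id>21</g:id>
<g:title>product 2</g:title>
</item>
</channel>
</rss>"""

# Hypothetical column list; adjust it to the actual table.
FIELDS = ('id', 'title', 'mpn')


def item_rows(root):
    """Yield one tuple per <item>, with None for any missing field."""
    for node in root.findall('.//item'):
        yield tuple(
            node.findtext('g:' + name, default=None, namespaces=NS)
            for name in FIELDS
        )


rows = list(item_rows(ET.fromstring(SAMPLE)))
print(rows)
```

Using findtext() instead of find(...).text avoids an AttributeError when an element is absent. Rows in this shape could then be handed to the cursor's executemany() together with a parameterized INSERT statement, rather than being formatted into the SQL string.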
I am trying to walk through a large XML file and collect some data. Since the data can be located by its path, I used XPath, but I get no result.
Could someone suggest what I am doing wrong?
Example of the xml:
<?xml version="1.0" encoding="UTF-8"?>
<rootnode>
<subnode1>
</subnode1>
<subnode2>
</subnode2>
<subnode3>
<listnode>
<item id="1"><name>test name1</name></item>
<item id="2"><name>test name2</name></item>
<item id="3"><name>test name3</name></item>
</listnode>
</subnode3>
</rootnode>
The code:
import lxml.etree as ET
tree = ET.parse('temp/temp.xml')
subtree = tree.xpath('./rootnode/subnode3/listnode')
for next_item in subtree:
    Id = next_item.attrib.get('id')
    name = next_item.find('name').text
    print('{:>20} - {:>20}'.format(name,Id))
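You are pretty close. Two things need to change: the path './rootnode/...' is relative to the root element, which is already rootnode, so it matches nothing; and the id attribute and the name child belong to each item, not to listnode.
Ex: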
import lxml.etree as ET
tree = ET.parse('temp/temp.xml')
subtree = tree.xpath('/rootnode/subnode3/listnode')
for next_item in subtree:
    for item in next_item.findall('item'):
        Id = item.attrib.get('id')
        name = item.find('name').text
        print('{:>20} - {:>20}'.format(name,Id))
OR
subtree = tree.xpath('/rootnode/subnode3/listnode/item')
for item in subtree:
    Id = item.attrib.get('id')
    name = item.find('name').text
    print('{:>20} - {:>20}'.format(name,Id))
Output:
test name1 - 1
test name2 - 2
test name3 - 3
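The same path pitfall exists in the standard library's xml.etree.ElementTree, where paths passed to findall() are always relative to the element they are called on. The following sketch uses a shortened copy of the sample to show a path that includes the root's own name finding nothing, while the path without it finds the items. It uses the standard library rather than lxml.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample document (two items).
SAMPLE = """<rootnode>
<subnode3>
<listnode>
<item id="1"><name>test name1</name></item>
<item id="2"><name>test name2</name></item>
</listnode>
</subnode3>
</rootnode>"""

root = ET.fromstring(SAMPLE)

# The root element is already <rootnode>, so it must not appear in the path.
wrong = root.findall('./rootnode/subnode3/listnode/item')
right = root.findall('./subnode3/listnode/item')

pairs = [(item.get('id'), item.findtext('name')) for item in right]
print(len(wrong), pairs)
```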
I'm trying to retrieve the value of a particular xml tag in an XML file. The problem is that it returns a memory address instead of the actual value.
Already tried multiple approaches using other libraries as well. Nothing really yielded the result.
from xml.etree import ElementTree
tree = ElementTree.parse('C:\\Users\\Sid\\Desktop\\Test.xml')
root = tree.getroot()
items = root.find("items")
item= items.find("item")
print(item)
Expected: 1 2 3 4. Actual: a memory address.
XML File is :
<data>
<items>
<item>1</item>
</items>
<items>
<item>2</item>
</items>
<items>
<item>3</item>
</items>
<items>
<item>4</item>
</items>
</data>
Using BeautifulSoup:
from bs4 import BeautifulSoup
import urllib
test = '''<data>
<items>
<item>1</item>
</items>
<items>
<item>2</item>
</items>
<items>
<item>3</item>
</items>
<items>
<item>4</item>
</items>
</data>'''
soup = BeautifulSoup(test, 'html.parser')
data = soup.find_all("item")
for d in data:
    print(d.text)
OUTPUT:
1
2
3
4
Using XML Element Tree:
from xml.etree import ElementTree
tree = ElementTree.parse('list.txt')
root = tree.getroot()
items = root.findall("items")
for elem in items:
    desired_tag = elem.find("item")
    print(desired_tag.text)
OUTPUT:
1
2
3
4
EDIT:
If you want them printed on one line, separated by whitespace (a tab in this case), change the print call to:
print(desired_tag.text, "\t", end = "")
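For exactly the "1 2 3 4" shape from the question, another possibility is to collect the texts first and join them with a single space. This is a small sketch using the same data inline instead of a file.

```python
from xml.etree import ElementTree

# Same structure as the file in the question, written on one line.
SAMPLE = ("<data><items><item>1</item></items><items><item>2</item></items>"
          "<items><item>3</item></items><items><item>4</item></items></data>")

root = ElementTree.fromstring(SAMPLE)

# iter("item") visits every <item> at any depth, in document order.
line = " ".join(elem.text for elem in root.iter("item"))
print(line)  # 1 2 3 4
```

Printing the element itself shows its repr (the "memory address"); the value always comes from the .text attribute.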
Here is the API XML I am working with:
<response>
<request>polaris</request>
<status>0</status>
<verbiage>OK</verbiage>
<object id="S251">
<type id="1">Star</type>
<name>α UMi</name>
<catId>α UMi</catId>
<constellation id="84">Ursa Minor</constellation>
<ra unit="hour">2.5301944</ra>
<de unit="degree">89.264167</de>
<mag>2.02</mag>
</object>
<object id="S251">
<type id="1">Star</type>
<name>α UMi</name>
<catId>α UMi</catId>
<constellation id="84">Ursa Minor</constellation>
<ra unit="hour">2.5301944</ra>
<de unit="degree">89.264167</de>
<mag>2.02</mag>
</object>
</response>
Here is my current code:
#!/usr/bin/env python
import xml.etree.ElementTree as ET
tree = ET.parse('StarGaze.xml')
root = tree.getroot()
callevent=root.find('polaris')
Moc1=callevent.find('polaris')
for node in Moc1.getiterator():
    if node.tag=='constellation id':
        print node.tag, node.attrib, node.text
I want to be able to print defined children. For example:
constellation id=
ra unit=
Any help would be very much appreciated
Iterate over the object nodes and locate the constellation and ra nodes using findall() and find() methods and .attrib attribute:
import xml.etree.ElementTree as ET
tree = ET.parse('StarGaze.xml')
root = tree.getroot()
for obj in root.findall("object"):
    constellation = obj.find("constellation")
    ra = obj.find("ra")
    print(constellation.attrib["id"], constellation.text, ra.attrib["unit"], ra.text)
Would print:
84 Ursa Minor hour 2.5301944
84 Ursa Minor hour 2.5301944
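If the set of children is not known in advance, a variation is to loop over every child of each object and report whichever attributes it carries. The sketch below assumes a trimmed copy of the response with one object and a reduced set of children; the output format is just one possible choice.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the API response (one object, fewer children).
SAMPLE = """<response>
<object id="S251">
<type id="1">Star</type>
<constellation id="84">Ursa Minor</constellation>
<ra unit="hour">2.5301944</ra>
<mag>2.02</mag>
</object>
</response>"""

root = ET.fromstring(SAMPLE)

lines = []
for obj in root.findall("object"):
    for child in obj:
        # Children without attributes (here <mag>) contribute nothing.
        for key, value in child.attrib.items():
            lines.append("{} {}={} -> {}".format(child.tag, key, value, child.text))

print("\n".join(lines))
```

Note that the tag of <constellation id="84"> is just 'constellation'; 'id' is a key in .attrib, which is why comparing node.tag with 'constellation id' never matches.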