Parsing Weather XML with Python - python

I'm a beginner but with a lot of effort I'm trying to parse some data about the weather from an .xml file called "weather.xml" which looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<Weather>
<locality name="Rome" alt="21">
<situation temperature="18°C" temperatureF="64,4°F" humidity="77%" pression="1016 mb" wind="5 SSW km/h" windKN="2,9 SSW kn">
<description>clear sky</description>
<lastUpdate>17:45</lastUpdate>
/>
</situation>
<sun sunrise="6:57" sunset="18:36" />
</locality>
I parsed some data from this XML and this is how my Python code looks now:
#!/usr/bin/python
from xml.dom import minidom
xmldoc = minidom.parse('weather.xml')
entry_situation = xmldoc.getElementsByTagName('situation')
entry_locality = xmldoc.getElementsByTagName('locality')
print entry_locality[0].attributes['name'].value
print "Temperature: "+entry_situation[0].attributes['temperature'].value
print "Humidity: "+entry_situation[0].attributes['humidity'].value
print "Pression: "+entry_situation[0].attributes['pression'].value
It's working fine but if I try to parse data from "description" or "lastUpdate" node with the same method, I get an error, so this way must be wrong for those nodes which actually I can see they are differents.
I'm also trying to write output into a log file with no success, the most I get is an empty file.
Thank you for your time reading this.

It is because "description" and "lastUpdate" are not attributes but child nodes of the "situation" node.
Try:
d = entry_situation[0].getElementsByTagName("description")[0]
print "Description: %s" % d.firstChild.nodeValue
You should use the same method to access the "situation" node from its parent "locality".
By the way you should take a look at the lxml module, especially the objectify API as yegorich said. It is easier to use.

Related

How can we parse xml data that contains nodes with xml namespace tags in python?

I am getting XML as a response so I want to parse it. I tried many python libraries but not get my desired results. So if you can help, it will be really appreciative.
The following code returns None:
xmlResponse = ET.fromstring(context.response_document)
a = xmlResponse.findall('.//Body')
print(a)
Sample XML Data:
<S:Envelope
xmlns:S="http://www.w3.org/2003/05/soap-envelope">
<S:Header>
<wsa:Action s:mustUnderstand="1"
xmlns:s="http://www.w3.org/2003/05/soap-envelope"
xmlns:wsa="http://www.w3.org/2005/08/addressing">urn:ihe:iti:2007:RegistryStoredQueryResponse
</wsa:Action>
</S:Header>
<S:Body>
<query:AdhocQueryResponse status="urn:oasis:names:tc:ebxml-regrep:ResponseStatusType:Success"
xmlns:query="urn:oasis:names:tc:ebxml-regrep:xsd:query:3.0">
<rim:RegistryObjectList
xmlns:rim="u`enter code here`rn:oasis:names:tc:ebxml-regrep:xsd:rim:3.0"/>
</query:AdhocQueryResponse>
</S:Body>
</S:Envelope>
I want to get status from it which is in Body. If you can suggest some changes of some library then please help me. Thanks
Given the following base code:
import xml.etree.ElementTree as ET
root = ET.fromstring(xml)
Let's build on top of it to get your desired output.
Your initial find for .//Body x-path returns NONE because it doesn't exist in your XML response.
Each tag in your XML has a namespace associated with it. More info on xml namespaces can be found here.
Consider the following line with xmlns value (xml-namespace):
<S:Envelope xmlns:S="http://www.w3.org/2003/05/soap-envelope">
The value of namespace S is set to be http://www.w3.org/2003/05/soap-envelope.
Replacing S in {S}Envelope with value set above will give you the resulting tag to find in your XML:
root.find('{http://www.w3.org/2003/05/soap-envelope}Envelope') #top most node
We would need to do the same for <S:Body>.
To get<S:Body> elements and it's child nodes you can do the following:
body_node = root.find('{http://www.w3.org/2003/05/soap-envelope}Body')
for response_child_node in list(body_node):
print(response_child_node.tag) #tag of the child node
print(response_child_node.get('status')) #the status you're looking for
Outputs:
{urn:oasis:names:tc:ebxml-regrep:xsd:query:3.0}AdhocQueryResponse
urn:oasis:names:tc:ebxml-regrep:ResponseStatusType:Success
Alternatively
You can also directly find all {query}AdhocQueryResponse in your XML using:
response_nodes = root.findall('.//{urn:oasis:names:tc:ebxml-regrep:xsd:query:3.0}AdhocQueryResponse')
for response in response_nodes:
print(response.get('status'))
Outputs:
urn:oasis:names:tc:ebxml-regrep:ResponseStatusType:Success

Parse XML from String using Python

I'm pulling XML from a URL, which is behind a login screen. I can enter my credentials and grab the XML just fine, but I'm having trouble parsing this string. I think it's a list of XML tags. Anyway, the XML looks like this:
<Report Type="SLA Report" SiteName="Get Dataset Metadata" SLA_Name="Get Dataset Metadata" SLA_Description="Get Dataset Metadata" From="2018-11-27 00:00" Thru="2018-11-27 23:59" obj_device="4500" locations="69,31,"><Objective Type="Availability"><Goal>99.93</Goal><Actual>100.00</Actual><Compliant>Yes</Compliant><Errors>0</Errors><Checks>2878</Checks></Objective><Objective Type="Uptime"><Goal/><Actual/><Compliant/><Errors>0</Errors><Checks>0</Checks></Objective><Objective Type="Response Time"><Goal>300.00</Goal><Actual>1.7482</Actual><Compliant>Yes</Compliant><Errors>0</Errors><Checks>2878</Checks></Objective><MonitoringPeriods><Monitor><Exclude>No</Exclude><DayFrom>Sunday</DayFrom><TimeFrom>00:00</TimeFrom><DayThru>Sunday</DayThru><TimeThru>23:59</TimeThru></Monitor><Monitor><Exclude>No</Exclude><DayFrom>Monday</DayFrom><TimeFrom>00:00</TimeFrom><DayThru>Monday</DayThru><TimeThru>23:59</TimeThru></Monitor>
I want to get everything between the tags and everything between the tags. I want to load this into a Data Frame. That's it. How can I do that?
I thought I could use tree and root, like this:
REQUEST_URL = 'https://URL'
response = requests.get(REQUEST_URL, auth=(login, password))
xml_data = response.text.encode('utf-8', 'ignore')
tree = ET.parse(xml_data)
root = tree.getroot()
But that just gives me NameError: name 'tree' is not defined & NameError: name 'root' is not defined. I'm hoping there is a one-liner or two-liner, to get everything tidied up. So far, I haven't come across anything useful.
print(response.text) gives me:
<?xml version='1.0' standalone='yes'?><Report Type='SLA Report'
SiteName='Execute Query'
SLA_Name='Execute Query'
SLA_Description='Execute Query'
From='2018-11-27 00:00'
Thru='2018-11-27 23:59'
obj_device='4500'
locations='69,31,'
>
<Objective Type='Availability'>
<Goal>99.93</Goal>
<Actual>99.93</Actual>
<Compliant>Yes</Compliant>
<Errors>2</Errors>
<Checks>2878</Checks>
</Objective>
<Objective Type='Uptime'>
<Goal></Goal>
<Actual></Actual>
<Compliant></Compliant>
<Errors>0</Errors>
<Checks>0</Checks>
</Objective>
<Objective Type='Response Time'>
<Goal>300.00</Goal>
<Actual>3.1164</Actual>
<Compliant>Yes</Compliant>
<Errors>0</Errors>
<Checks>2878</Checks>
</Objective>
<MonitoringPeriods>
<Monitor>
<Exclude>No</Exclude><DayFrom>Sunday</DayFrom><TimeFrom>00:00</TimeFrom><DayThru>Sunday</DayThru><TimeThru>23:59</TimeThru>
</Monitor>

Create array of values from specific element in XML using Python

I have an XML file which has many elements. I would like to create a list/array of all the values which have a specific element name, in my case "pair:ApplicationNumber".
I've gone over a lot of the other questions however I am not able to find an answer. I know that I can do this by loading the text file and going over it using pandas however, I'm sure there's a much better way.
I was unsuccessful trying ElementTree as well as XML.Dom using minidom
My code currently looks as follows:
import os
from xml.dom import minidom
WindowsUser = os.getenv('username')
XMLPath = os.path.join('C:\\Users', WindowsUser, 'Downloads', 'ApplicationsByCustomerNumber.xml')
xmldoc = minidom.parse(XMLPath)
itemlist = xmldoc.getElementsByTagName('pair:ApplicationNumber')
for s in itemlist:
print(s.attributes['pair:ApplicationNumber'].value)
an example XML file looks as follows:
<?xml version="1.0" encoding="UTF-8"?>
<pair:PatentApplicationList xsi:schemaLocation="urn:us:gov:uspto:pair PatentApplicationList.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:pair="urn:us:gov:uspto:pair">
<pair:FileHeader>
<pair:FileCreationTimeStamp>2017-07-10T10:52:12.12</pair:FileCreationTimeStamp>
</pair:FileHeader>
<pair:ApplicationStatusData>
<pair:ApplicationNumber>62383607</pair:ApplicationNumber>
<pair:ApplicationStatusCode>20</pair:ApplicationStatusCode>
<pair:ApplicationStatusText>Application Dispatched from Preexam, Not Yet Docketed</pair:ApplicationStatusText>
<pair:ApplicationStatusDate>2016-09-16</pair:ApplicationStatusDate>
<pair:AttorneyDocketNumber>1354-T-02-US</pair:AttorneyDocketNumber>
<pair:FilingDate>2016-09-06</pair:FilingDate>
<pair:LastModifiedTimestamp>2017-05-30T21:40:37.37</pair:LastModifiedTimestamp>
<pair:CustomerNumber>122761</pair:CustomerNumber><pair:LastFileHistoryTransaction>
<pair:LastTransactionDate>2017-05-30</pair:LastTransactionDate>
<pair:LastTransactionDescription>Email Notification</pair:LastTransactionDescription> </pair:LastFileHistoryTransaction>
<pair:ImageAvailabilityIndicator>true</pair:ImageAvailabilityIndicator>
</pair:ApplicationStatusData>
<pair:ApplicationStatusData>
<pair:ApplicationNumber>62292372</pair:ApplicationNumber>
<pair:ApplicationStatusCode>160</pair:ApplicationStatusCode>
<pair:ApplicationStatusText>Abandoned -- Incomplete Application (Pre-examination)</pair:ApplicationStatusText>
<pair:ApplicationStatusDate>2016-11-01</pair:ApplicationStatusDate>
<pair:AttorneyDocketNumber>681-S-23-US</pair:AttorneyDocketNumber>
<pair:FilingDate>2016-02-08</pair:FilingDate>
<pair:LastModifiedTimestamp>2017-06-20T21:59:26.26</pair:LastModifiedTimestamp>
<pair:CustomerNumber>122761</pair:CustomerNumber><pair:LastFileHistoryTransaction>
<pair:LastTransactionDate>2017-06-20</pair:LastTransactionDate>
<pair:LastTransactionDescription>Petition Entered</pair:LastTransactionDescription> </pair:LastFileHistoryTransaction>
<pair:ImageAvailabilityIndicator>true</pair:ImageAvailabilityIndicator>
</pair:ApplicationStatusData>
<pair:ApplicationStatusData>
<pair:ApplicationNumber>62289245</pair:ApplicationNumber>
<pair:ApplicationStatusCode>160</pair:ApplicationStatusCode>
<pair:ApplicationStatusText>Abandoned -- Incomplete Application (Pre-examination)</pair:ApplicationStatusText>
<pair:ApplicationStatusDate>2016-10-26</pair:ApplicationStatusDate>
<pair:AttorneyDocketNumber>1526-P-01-US</pair:AttorneyDocketNumber>
<pair:FilingDate>2016-01-31</pair:FilingDate>
<pair:LastModifiedTimestamp>2017-06-15T21:24:13.13</pair:LastModifiedTimestamp>
<pair:CustomerNumber>122761</pair:CustomerNumber><pair:LastFileHistoryTransaction>
<pair:LastTransactionDate>2017-06-15</pair:LastTransactionDate>
<pair:LastTransactionDescription>Petition Entered</pair:LastTransactionDescription> </pair:LastFileHistoryTransaction>
<pair:ImageAvailabilityIndicator>true</pair:ImageAvailabilityIndicator>
</pair:ApplicationStatusData>
</pair:PatentApplicationList>
The XML in your example is expanding the "pair:" part of the tags according to the schema you've used, so it doesn't match 'pair:ApplicationNumber', even though it looks like it should.
I've used element tree to extract the application numbers as follows (I've just used a local XML file in my examples, rather than the full path in your code)
Example 1:
from xml.etree import ElementTree
tree = ElementTree.parse('ApplicationsByCustomerNumber.xml')
root = tree.getroot()
for item in root:
if 'ApplicationStatusData' in item.tag:
for child in item:
if 'ApplicationNumber' in child.tag:
print child.text
Example 2:
from xml.etree import ElementTree
tree = ElementTree.parse('ApplicationsByCustomerNumber.xml')
root = tree.getroot()
for item in root.iter('{urn:us:gov:uspto:pair}ApplicationStatusData'):
for child in item.iter('{urn:us:gov:uspto:pair}ApplicationNumber'):
print child.text
Hope this may be useful.

python elementree blank output

I am parsing an XML output from VCloud, however I am not able to reach to the values
<?xml version="1.0" encoding="UTF-8"?>
<SupportedVersions xmlns="http://www.vmware.com/vcloud/versions" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.vmware.com/vcloud/versions http://10.10.6.12/api/versions/schema/versions.xsd">
<VersionInfo>
<Version>1.5</Version>
<LoginUrl>https://api.vcd.portal.skyscapecloud.com/api/sessions</LoginUrl>
<MediaTypeMapping>
<MediaType>application/vnd.vmware.vcloud.instantiateVAppTemplateParams+xml</MediaType>
<ComplexTypeName>InstantiateVAppTemplateParamsType</ComplexTypeName>
<SchemaLocation>http://api.vcd.portal.skyscapecloud.com/api/v1.5/schema/master.xsd</SchemaLocation>
</MediaTypeMapping>
<MediaTypeMapping>
<MediaType>application/vnd.vmware.admin.vmwProviderVdcReferences+xml</MediaType>
<ComplexTypeName>VMWProviderVdcReferencesType</ComplexTypeName>
<SchemaLocation>http://api.vcd.portal.skyscapecloud.com/api/v1.5/schema/vmwextensions.xsd</SchemaLocation>
</MediaTypeMapping>
<MediaTypeMapping>
<MediaType>application/vnd.vmware.vcloud.customizationSection+xml</MediaType>
<ComplexTypeName>CustomizationSectionType</ComplexTypeName>
<SchemaLocation>http://api.vcd.portal.skyscapecloud.com/api/v1.5/schema/master.xsd</SchemaLocation>
</MediaTypeMapping>
this is what I have been using
import xml.etree.ElementTree as ET
data = ET.fromstring(content)
versioninfo = data.findall("VersionInfo/Version")
print len(versioninfo)
print versioninfo.text
however this gives a blank output...any suggestions?
Try this:
import xml.etree.ElementTree as ET
data = ET.fromstring(content)
versioninfo = data.find(
"ns:VersionInfo/ns:Version",
namespaces={'ns':'http://www.vmware.com/vcloud/versions'})
print versioninfo.text
Use .find(), not .findall() to return a single element
Your XML uses namespaces. The full path to your desired object is: '{http://www.vmware.com/vcloud/versions}VersionInfo/{http://www.vmware.com/vcloud/versions}Version' By passing in the namespaces parameter, you are able to use the shortcut syntax: ns:VersionInfo/ns:Version.

Convert XML to python objects using lxml

I'm trying to use the lxml library to parse an XML file...what I want is to use XML as the datasource, but still maintain the normal Django-way of interactive with the resulting objects...from the docs, I can see that lxml.objectify is what I'm suppossed to use, but I don't know how to proceed after: list = objectify.parse('myfile.xml')
Any help will be very much appreciated. Thanks.
A sample of the file (has about 100+ records) is this:
<store>
<book>
<publisher>Hodder &...</publisher>
<isbn>345123890</isbn>
<author>King</author>
<comments>
<comment rank='1'>Interesting</comment>
<comments>
<pages>200</pages>
</book>
<book>
<publisher>Penguin Books</publisher>
<isbn>9011238XX</isbn>
<author>Armstrong</author>
<comments />
<pages>150</pages>
</book>
</store>
From this, I want to do the following (something just as easy to write as Books.objects.all() and Books.object.get_object_or_404(isbn=selected) is most preferred ):
Display a list of all books with their respective attributes
Enable viewing of further details of a book by selecting it from the list
Firstly, "list" isn't a very good variable because it "shadows" the built-in type "list."
Now, say you have this xml:
<root>
<node1 val="foo">derp</node1>
<node2 val="bar" />
</root>
Now, you could do this:
root = objectify.parse("myfile.xml")
print root.node1.get("val") # prints "foo"
print root.node1.text # prints "derp"
print root.node2.get("val") # prints "bar"
Another tip: when you have lots of nodes with the same name, you can loop over them.
>>> xml = """<root>
<node val="foo">derp</node>
<node val="bar" />
</root>"""
>>> root = objectify.fromstring(xml)
>>> for node in root.node:
print node.get("val")
foo
bar
Edit
You should be able to simply set your django context to the books object, and use that from your templates.
context = dict(books = root.book,
# other stuff
)
And then you'll be able to iterate through the books in the template, and access each book object's attributes.

Categories