I'm a beginner but with a lot of effort I'm trying to parse some data about the weather from an .xml file called "weather.xml" which looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<Weather>
<locality name="Rome" alt="21">
<situation temperature="18°C" temperatureF="64,4°F" humidity="77%" pression="1016 mb" wind="5 SSW km/h" windKN="2,9 SSW kn">
<description>clear sky</description>
<lastUpdate>17:45</lastUpdate>
/>
</situation>
<sun sunrise="6:57" sunset="18:36" />
</locality>
I parsed some data from this XML and this is how my Python code looks now:
#!/usr/bin/python
from xml.dom import minidom
xmldoc = minidom.parse('weather.xml')
entry_situation = xmldoc.getElementsByTagName('situation')
entry_locality = xmldoc.getElementsByTagName('locality')
print entry_locality[0].attributes['name'].value
print "Temperature: "+entry_situation[0].attributes['temperature'].value
print "Humidity: "+entry_situation[0].attributes['humidity'].value
print "Pression: "+entry_situation[0].attributes['pression'].value
It's working fine but if I try to parse data from "description" or "lastUpdate" node with the same method, I get an error, so this way must be wrong for those nodes which actually I can see they are differents.
I'm also trying to write output into a log file with no success, the most I get is an empty file.
Thank you for your time reading this.
It is because "description" and "lastUpdate" are not attributes but child nodes of the "situation" node.
Try:
d = entry_situation[0].getElementsByTagName("description")[0]
print "Description: %s" % d.firstChild.nodeValue
You should use the same method to access the "situation" node from its parent "locality".
By the way you should take a look at the lxml module, especially the objectify API as yegorich said. It is easier to use.
Related
I am getting XML as a response so I want to parse it. I tried many python libraries but not get my desired results. So if you can help, it will be really appreciative.
The following code returns None:
xmlResponse = ET.fromstring(context.response_document)
a = xmlResponse.findall('.//Body')
print(a)
Sample XML Data:
<S:Envelope
xmlns:S="http://www.w3.org/2003/05/soap-envelope">
<S:Header>
<wsa:Action s:mustUnderstand="1"
xmlns:s="http://www.w3.org/2003/05/soap-envelope"
xmlns:wsa="http://www.w3.org/2005/08/addressing">urn:ihe:iti:2007:RegistryStoredQueryResponse
</wsa:Action>
</S:Header>
<S:Body>
<query:AdhocQueryResponse status="urn:oasis:names:tc:ebxml-regrep:ResponseStatusType:Success"
xmlns:query="urn:oasis:names:tc:ebxml-regrep:xsd:query:3.0">
<rim:RegistryObjectList
xmlns:rim="u`enter code here`rn:oasis:names:tc:ebxml-regrep:xsd:rim:3.0"/>
</query:AdhocQueryResponse>
</S:Body>
</S:Envelope>
I want to get status from it which is in Body. If you can suggest some changes of some library then please help me. Thanks
Given the following base code:
import xml.etree.ElementTree as ET
root = ET.fromstring(xml)
Let's build on top of it to get your desired output.
Your initial find for .//Body x-path returns NONE because it doesn't exist in your XML response.
Each tag in your XML has a namespace associated with it. More info on xml namespaces can be found here.
Consider the following line with xmlns value (xml-namespace):
<S:Envelope xmlns:S="http://www.w3.org/2003/05/soap-envelope">
The value of namespace S is set to be http://www.w3.org/2003/05/soap-envelope.
Replacing S in {S}Envelope with value set above will give you the resulting tag to find in your XML:
root.find('{http://www.w3.org/2003/05/soap-envelope}Envelope') #top most node
We would need to do the same for <S:Body>.
To get<S:Body> elements and it's child nodes you can do the following:
body_node = root.find('{http://www.w3.org/2003/05/soap-envelope}Body')
for response_child_node in list(body_node):
print(response_child_node.tag) #tag of the child node
print(response_child_node.get('status')) #the status you're looking for
Outputs:
{urn:oasis:names:tc:ebxml-regrep:xsd:query:3.0}AdhocQueryResponse
urn:oasis:names:tc:ebxml-regrep:ResponseStatusType:Success
Alternatively
You can also directly find all {query}AdhocQueryResponse in your XML using:
response_nodes = root.findall('.//{urn:oasis:names:tc:ebxml-regrep:xsd:query:3.0}AdhocQueryResponse')
for response in response_nodes:
print(response.get('status'))
Outputs:
urn:oasis:names:tc:ebxml-regrep:ResponseStatusType:Success
I'm pulling XML from a URL, which is behind a login screen. I can enter my credentials and grab the XML just fine, but I'm having trouble parsing this string. I think it's a list of XML tags. Anyway, the XML looks like this:
<Report Type="SLA Report" SiteName="Get Dataset Metadata" SLA_Name="Get Dataset Metadata" SLA_Description="Get Dataset Metadata" From="2018-11-27 00:00" Thru="2018-11-27 23:59" obj_device="4500" locations="69,31,"><Objective Type="Availability"><Goal>99.93</Goal><Actual>100.00</Actual><Compliant>Yes</Compliant><Errors>0</Errors><Checks>2878</Checks></Objective><Objective Type="Uptime"><Goal/><Actual/><Compliant/><Errors>0</Errors><Checks>0</Checks></Objective><Objective Type="Response Time"><Goal>300.00</Goal><Actual>1.7482</Actual><Compliant>Yes</Compliant><Errors>0</Errors><Checks>2878</Checks></Objective><MonitoringPeriods><Monitor><Exclude>No</Exclude><DayFrom>Sunday</DayFrom><TimeFrom>00:00</TimeFrom><DayThru>Sunday</DayThru><TimeThru>23:59</TimeThru></Monitor><Monitor><Exclude>No</Exclude><DayFrom>Monday</DayFrom><TimeFrom>00:00</TimeFrom><DayThru>Monday</DayThru><TimeThru>23:59</TimeThru></Monitor>
I want to get everything between the tags and everything between the tags. I want to load this into a Data Frame. That's it. How can I do that?
I thought I could use tree and root, like this:
REQUEST_URL = 'https://URL'
response = requests.get(REQUEST_URL, auth=(login, password))
xml_data = response.text.encode('utf-8', 'ignore')
tree = ET.parse(xml_data)
root = tree.getroot()
But that just gives me NameError: name 'tree' is not defined & NameError: name 'root' is not defined. I'm hoping there is a one-liner or two-liner, to get everything tidied up. So far, I haven't come across anything useful.
print(response.text) gives me:
<?xml version='1.0' standalone='yes'?><Report Type='SLA Report'
SiteName='Execute Query'
SLA_Name='Execute Query'
SLA_Description='Execute Query'
From='2018-11-27 00:00'
Thru='2018-11-27 23:59'
obj_device='4500'
locations='69,31,'
>
<Objective Type='Availability'>
<Goal>99.93</Goal>
<Actual>99.93</Actual>
<Compliant>Yes</Compliant>
<Errors>2</Errors>
<Checks>2878</Checks>
</Objective>
<Objective Type='Uptime'>
<Goal></Goal>
<Actual></Actual>
<Compliant></Compliant>
<Errors>0</Errors>
<Checks>0</Checks>
</Objective>
<Objective Type='Response Time'>
<Goal>300.00</Goal>
<Actual>3.1164</Actual>
<Compliant>Yes</Compliant>
<Errors>0</Errors>
<Checks>2878</Checks>
</Objective>
<MonitoringPeriods>
<Monitor>
<Exclude>No</Exclude><DayFrom>Sunday</DayFrom><TimeFrom>00:00</TimeFrom><DayThru>Sunday</DayThru><TimeThru>23:59</TimeThru>
</Monitor>
I have an XML file which has many elements. I would like to create a list/array of all the values which have a specific element name, in my case "pair:ApplicationNumber".
I've gone over a lot of the other questions however I am not able to find an answer. I know that I can do this by loading the text file and going over it using pandas however, I'm sure there's a much better way.
I was unsuccessful trying ElementTree as well as XML.Dom using minidom
My code currently looks as follows:
import os
from xml.dom import minidom
WindowsUser = os.getenv('username')
XMLPath = os.path.join('C:\\Users', WindowsUser, 'Downloads', 'ApplicationsByCustomerNumber.xml')
xmldoc = minidom.parse(XMLPath)
itemlist = xmldoc.getElementsByTagName('pair:ApplicationNumber')
for s in itemlist:
print(s.attributes['pair:ApplicationNumber'].value)
an example XML file looks as follows:
<?xml version="1.0" encoding="UTF-8"?>
<pair:PatentApplicationList xsi:schemaLocation="urn:us:gov:uspto:pair PatentApplicationList.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:pair="urn:us:gov:uspto:pair">
<pair:FileHeader>
<pair:FileCreationTimeStamp>2017-07-10T10:52:12.12</pair:FileCreationTimeStamp>
</pair:FileHeader>
<pair:ApplicationStatusData>
<pair:ApplicationNumber>62383607</pair:ApplicationNumber>
<pair:ApplicationStatusCode>20</pair:ApplicationStatusCode>
<pair:ApplicationStatusText>Application Dispatched from Preexam, Not Yet Docketed</pair:ApplicationStatusText>
<pair:ApplicationStatusDate>2016-09-16</pair:ApplicationStatusDate>
<pair:AttorneyDocketNumber>1354-T-02-US</pair:AttorneyDocketNumber>
<pair:FilingDate>2016-09-06</pair:FilingDate>
<pair:LastModifiedTimestamp>2017-05-30T21:40:37.37</pair:LastModifiedTimestamp>
<pair:CustomerNumber>122761</pair:CustomerNumber><pair:LastFileHistoryTransaction>
<pair:LastTransactionDate>2017-05-30</pair:LastTransactionDate>
<pair:LastTransactionDescription>Email Notification</pair:LastTransactionDescription> </pair:LastFileHistoryTransaction>
<pair:ImageAvailabilityIndicator>true</pair:ImageAvailabilityIndicator>
</pair:ApplicationStatusData>
<pair:ApplicationStatusData>
<pair:ApplicationNumber>62292372</pair:ApplicationNumber>
<pair:ApplicationStatusCode>160</pair:ApplicationStatusCode>
<pair:ApplicationStatusText>Abandoned -- Incomplete Application (Pre-examination)</pair:ApplicationStatusText>
<pair:ApplicationStatusDate>2016-11-01</pair:ApplicationStatusDate>
<pair:AttorneyDocketNumber>681-S-23-US</pair:AttorneyDocketNumber>
<pair:FilingDate>2016-02-08</pair:FilingDate>
<pair:LastModifiedTimestamp>2017-06-20T21:59:26.26</pair:LastModifiedTimestamp>
<pair:CustomerNumber>122761</pair:CustomerNumber><pair:LastFileHistoryTransaction>
<pair:LastTransactionDate>2017-06-20</pair:LastTransactionDate>
<pair:LastTransactionDescription>Petition Entered</pair:LastTransactionDescription> </pair:LastFileHistoryTransaction>
<pair:ImageAvailabilityIndicator>true</pair:ImageAvailabilityIndicator>
</pair:ApplicationStatusData>
<pair:ApplicationStatusData>
<pair:ApplicationNumber>62289245</pair:ApplicationNumber>
<pair:ApplicationStatusCode>160</pair:ApplicationStatusCode>
<pair:ApplicationStatusText>Abandoned -- Incomplete Application (Pre-examination)</pair:ApplicationStatusText>
<pair:ApplicationStatusDate>2016-10-26</pair:ApplicationStatusDate>
<pair:AttorneyDocketNumber>1526-P-01-US</pair:AttorneyDocketNumber>
<pair:FilingDate>2016-01-31</pair:FilingDate>
<pair:LastModifiedTimestamp>2017-06-15T21:24:13.13</pair:LastModifiedTimestamp>
<pair:CustomerNumber>122761</pair:CustomerNumber><pair:LastFileHistoryTransaction>
<pair:LastTransactionDate>2017-06-15</pair:LastTransactionDate>
<pair:LastTransactionDescription>Petition Entered</pair:LastTransactionDescription> </pair:LastFileHistoryTransaction>
<pair:ImageAvailabilityIndicator>true</pair:ImageAvailabilityIndicator>
</pair:ApplicationStatusData>
</pair:PatentApplicationList>
The XML in your example is expanding the "pair:" part of the tags according to the schema you've used, so it doesn't match 'pair:ApplicationNumber', even though it looks like it should.
I've used element tree to extract the application numbers as follows (I've just used a local XML file in my examples, rather than the full path in your code)
Example 1:
from xml.etree import ElementTree
tree = ElementTree.parse('ApplicationsByCustomerNumber.xml')
root = tree.getroot()
for item in root:
if 'ApplicationStatusData' in item.tag:
for child in item:
if 'ApplicationNumber' in child.tag:
print child.text
Example 2:
from xml.etree import ElementTree
tree = ElementTree.parse('ApplicationsByCustomerNumber.xml')
root = tree.getroot()
for item in root.iter('{urn:us:gov:uspto:pair}ApplicationStatusData'):
for child in item.iter('{urn:us:gov:uspto:pair}ApplicationNumber'):
print child.text
Hope this may be useful.
I am parsing an XML output from VCloud, however I am not able to reach to the values
<?xml version="1.0" encoding="UTF-8"?>
<SupportedVersions xmlns="http://www.vmware.com/vcloud/versions" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.vmware.com/vcloud/versions http://10.10.6.12/api/versions/schema/versions.xsd">
<VersionInfo>
<Version>1.5</Version>
<LoginUrl>https://api.vcd.portal.skyscapecloud.com/api/sessions</LoginUrl>
<MediaTypeMapping>
<MediaType>application/vnd.vmware.vcloud.instantiateVAppTemplateParams+xml</MediaType>
<ComplexTypeName>InstantiateVAppTemplateParamsType</ComplexTypeName>
<SchemaLocation>http://api.vcd.portal.skyscapecloud.com/api/v1.5/schema/master.xsd</SchemaLocation>
</MediaTypeMapping>
<MediaTypeMapping>
<MediaType>application/vnd.vmware.admin.vmwProviderVdcReferences+xml</MediaType>
<ComplexTypeName>VMWProviderVdcReferencesType</ComplexTypeName>
<SchemaLocation>http://api.vcd.portal.skyscapecloud.com/api/v1.5/schema/vmwextensions.xsd</SchemaLocation>
</MediaTypeMapping>
<MediaTypeMapping>
<MediaType>application/vnd.vmware.vcloud.customizationSection+xml</MediaType>
<ComplexTypeName>CustomizationSectionType</ComplexTypeName>
<SchemaLocation>http://api.vcd.portal.skyscapecloud.com/api/v1.5/schema/master.xsd</SchemaLocation>
</MediaTypeMapping>
this is what I have been using
import xml.etree.ElementTree as ET
data = ET.fromstring(content)
versioninfo = data.findall("VersionInfo/Version")
print len(versioninfo)
print versioninfo.text
however this gives a blank output...any suggestions?
Try this:
import xml.etree.ElementTree as ET
data = ET.fromstring(content)
versioninfo = data.find(
"ns:VersionInfo/ns:Version",
namespaces={'ns':'http://www.vmware.com/vcloud/versions'})
print versioninfo.text
Use .find(), not .findall() to return a single element
Your XML uses namespaces. The full path to your desired object is: '{http://www.vmware.com/vcloud/versions}VersionInfo/{http://www.vmware.com/vcloud/versions}Version' By passing in the namespaces parameter, you are able to use the shortcut syntax: ns:VersionInfo/ns:Version.
I'm trying to use the lxml library to parse an XML file...what I want is to use XML as the datasource, but still maintain the normal Django-way of interactive with the resulting objects...from the docs, I can see that lxml.objectify is what I'm suppossed to use, but I don't know how to proceed after: list = objectify.parse('myfile.xml')
Any help will be very much appreciated. Thanks.
A sample of the file (has about 100+ records) is this:
<store>
<book>
<publisher>Hodder &...</publisher>
<isbn>345123890</isbn>
<author>King</author>
<comments>
<comment rank='1'>Interesting</comment>
<comments>
<pages>200</pages>
</book>
<book>
<publisher>Penguin Books</publisher>
<isbn>9011238XX</isbn>
<author>Armstrong</author>
<comments />
<pages>150</pages>
</book>
</store>
From this, I want to do the following (something just as easy to write as Books.objects.all() and Books.object.get_object_or_404(isbn=selected) is most preferred ):
Display a list of all books with their respective attributes
Enable viewing of further details of a book by selecting it from the list
Firstly, "list" isn't a very good variable because it "shadows" the built-in type "list."
Now, say you have this xml:
<root>
<node1 val="foo">derp</node1>
<node2 val="bar" />
</root>
Now, you could do this:
root = objectify.parse("myfile.xml")
print root.node1.get("val") # prints "foo"
print root.node1.text # prints "derp"
print root.node2.get("val") # prints "bar"
Another tip: when you have lots of nodes with the same name, you can loop over them.
>>> xml = """<root>
<node val="foo">derp</node>
<node val="bar" />
</root>"""
>>> root = objectify.fromstring(xml)
>>> for node in root.node:
print node.get("val")
foo
bar
Edit
You should be able to simply set your django context to the books object, and use that from your templates.
context = dict(books = root.book,
# other stuff
)
And then you'll be able to iterate through the books in the template, and access each book object's attributes.