XML in Python and lxml - python

I am using the Pinnacle (betting) API, which returns XML. At the moment, I save it to a .xml file as below:
req = urllib2.Request(url, headers=headers)
responseData = urllib2.urlopen(req).read()
ofn = 'pinnacle_feed_basketball.xml'
with open(ofn, 'w') as ofile:
    ofile.write(responseData)
parse_xml()
and then open it in the parse_xml function
tree = etree.parse("pinnacle_feed_basketball.xml")
fdtime = tree.xpath('//rsp/fd/fdTime/text()')
I presume that saving it to an XML file and then reading the file back in is unnecessary, but I cannot get it to work without doing so.
I tried passing responseData to the parse_xml() function:
parse_xml(responseData)
and then in the function
tree = etree.parse(responseData)
fdtime = tree.xpath('//rsp/fd/fdTime/text()')
But it doesn't work.

If you want to parse an in-memory object (in your case, a string), use etree.fromstring(<obj>); etree.parse() expects a file-like object or a filename (see the lxml docs).
For example:
import urllib2, lxml.etree as etree
url = 'http://www.xmlfiles.com/examples/note.xml'
headers = {}
req = urllib2.Request(url, headers=headers)
responseData = urllib2.urlopen(req).read()
element = etree.fromstring(responseData)
print(element)
print(etree.tostring(element, pretty_print=True))
Output:
<Element note at 0x2c29dc8>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
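As a quick offline check of the difference between fromstring() and parse(), here is a minimal sketch. It assumes the standard library's xml.etree.ElementTree as a stand-in (lxml.etree.fromstring() and lxml.etree.parse() accept the same arguments used here) and a made-up payload shaped like the question's //rsp/fd/fdTime path. Wrapping the bytes in io.BytesIO gives parse() the file-like object it expects, so nothing has to be written to disk.

```python
import io
import xml.etree.ElementTree as etree  # stdlib stand-in; lxml.etree provides the same two calls

# Made-up payload in place of the real Pinnacle response (bytes, as .read() returns)
response_data = b"<rsp><fd><fdTime>1420000000000</fdTime></fd></rsp>"

# fromstring(): parse the in-memory bytes and get the root Element back
root = etree.fromstring(response_data)
print(root.tag)  # rsp

# parse(): needs a filename or file-like object, so wrap the bytes in BytesIO
tree = etree.parse(io.BytesIO(response_data))
print(tree.getroot().tag)  # rsp

# The stdlib supports only a subset of XPath; with lxml you would call tree.xpath(...) instead
print(tree.findtext("fd/fdTime"))  # 1420000000000
```

With lxml installed, tree.xpath('//rsp/fd/fdTime/text()') works on a BytesIO-parsed tree just as it does on a tree parsed from the saved file.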

parse() is designed to read from a filename or a file-like object.
But in the second case you are passing responseData, which is a string holding the XML itself, so parse() treats it as a filename.
In the first case the filename works, and so does an open file object:
with open("pinnacle_feed_basketball.xml", "rb") as f:
    tree = etree.parse(f)
In the second case:
root = etree.fromstring(responseData) # note that you are not getting an "ElementTree" object here
FYI, the object returned by urllib2.urlopen(req) is also file-like, so you can pass it to parse() directly:
tree = etree.parse(urllib2.urlopen(req))
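To see the two cases side by side, the sketch below writes a made-up feed to a temporary file, parses it through an open file object, and compares the result with fromstring(). It again assumes the standard library's xml.etree.ElementTree in place of lxml.etree; the type names printed are the stdlib ones (lxml's equivalents are _ElementTree and _Element).

```python
import os
import tempfile
import xml.etree.ElementTree as etree  # stdlib stand-in for lxml.etree

xml_bytes = b"<rsp><fd><fdTime>1420000000000</fdTime></fd></rsp>"  # made-up sample

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pinnacle_feed_basketball.xml")
    with open(path, "wb") as out:
        out.write(xml_bytes)
    # First case: parse() accepts an open file object and returns an ElementTree
    with open(path, "rb") as f:
        tree = etree.parse(f)

# Second case: fromstring() on the bytes returns the root Element instead
root = etree.fromstring(xml_bytes)

print(type(tree).__name__)  # ElementTree
print(type(root).__name__)  # Element
# getroot() unwraps the ElementTree to the same kind of object fromstring() gives
print(tree.getroot().tag, root.tag)  # rsp rsp
```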

Related

How to write ElementTree to a file object in Python?

I have an XML file that I need to send to some URL. I do it this way:
data = { 'file' : open('test.xml', 'rb') }
req = requests.post(URL, files=data)
This works, but the problem is that I first need to generate the XML, and then do this:
et = etree.ElementTree(root)
et.write('test.xml', encoding='utf8')
and then after that I do this:
data = { 'file' : open('test.xml', 'rb') }
req = requests.post(URL, files=data)
But I don't like this: I already have the XML in memory, yet I write it to disk just to read it back from disk.
Is there a way to write that XML directly to a file object (the equivalent of open('test.xml', 'rb')) without writing it to a file first?
Try using tostring(), which serializes the tree to bytes in memory.
For example:
et = etree.ElementTree(root)
req = requests.post(URL, data=etree.tostring(et.getroot()))
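Note that data= sends the XML as the raw request body. If the server expects a multipart file upload named file, as in the question's files=data call, a minimal sketch is to write the tree into an io.BytesIO buffer, which behaves like the object returned by open('test.xml', 'rb'). The sketch uses the standard library's ElementTree with placeholder content (lxml's ElementTree.write() also accepts file-like objects), and the requests call is left as a comment with the question's URL placeholder.

```python
import io
import xml.etree.ElementTree as etree  # stdlib; lxml's ElementTree.write() also accepts file objects

# Build a small tree (placeholder content standing in for the question's generated XML)
root = etree.Element("root")
etree.SubElement(root, "item").text = "hello"
et = etree.ElementTree(root)

# Write into an in-memory binary buffer instead of test.xml on disk
buf = io.BytesIO()
et.write(buf, encoding="utf-8", xml_declaration=True)
buf.seek(0)  # rewind so the uploader reads from the start

# requests accepts (filename, fileobj, content_type) tuples in files=
files = {"file": ("test.xml", buf, "application/xml")}
# requests.post(URL, files=files)  # URL is the question's placeholder; not executed here

payload = buf.getvalue()
print(payload)
```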

Parsing an xml file and creating another from the parsed object

I am trying to parse an XML file (containing bad characters) using the lxml module in recover=True mode.
Below is the code snippet:
from lxml import etree
f=open('test.xml')
data=f.read()
f.close()
parser = etree.XMLParser(recover=True)
x = etree.fromstring(data, parser=parser)
Now I want to create another XML file (test1.xml) from the above object (x).
Could anyone please help with this?
Thanks
I think this is what you are looking for:
from lxml import etree
# open the source file in binary mode so lxml can honour any encoding declaration
with open('test.xml', 'rb') as f:
    # reading the raw content
    data = f.read()
parser = etree.XMLParser(recover=True)
# fromstring() parses XML from a string directly into an Element
x = etree.fromstring(data, parser=parser)
# serializing the recovered tree back to text
y = etree.tostring(x, pretty_print=True).decode("utf-8")
# writing the content to the output file
with open('test1.xml', 'w') as f:
    f.write(y)
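lxml can also write the recovered tree straight to disk without the tostring()/decode() round trip: getroottree() wraps the element in an ElementTree, whose write() method accepts a filename plus pretty_print, xml_declaration, and encoding options. This sketch assumes lxml is installed and uses a made-up malformed input (an unclosed <b> tag) and a temporary directory in place of the question's test.xml/test1.xml.

```python
import os
import tempfile
from lxml import etree

# Made-up malformed input: <b> is never closed
data = b"<root><a>1</a><b>2</root>"

parser = etree.XMLParser(recover=True)
x = etree.fromstring(data, parser=parser)

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "test1.xml")
    # getroottree() wraps the Element in an ElementTree, whose write() takes a path
    x.getroottree().write(out_path, pretty_print=True,
                          xml_declaration=True, encoding="utf-8")
    # Re-parse the written file with the default (strict) parser to check it is well-formed
    reread = etree.parse(out_path).getroot()
    print(reread.tag, reread.findtext("a"))
```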

Can anyone tell me what error msg "line 1182 in parse" means when I'm trying to parse XML in Python?

This is the code that results in an error message:
import urllib
import xml.etree.ElementTree as ET
url = raw_input('Enter URL:')
urlhandle = urllib.urlopen(url)
data = urlhandle.read()
tree = ET.parse(data)
The error:
I'm new to python. I did read documentation and a couple of tutorials, but clearly I still have done something wrong. I don't believe it is the xml file itself because it does this to two different xml files.
Consider using ElementTree's fromstring():
import urllib
import xml.etree.ElementTree as ET
url = raw_input('Enter URL:')
# http://feeds.bbci.co.uk/news/rss.xml?edition=int
urlhandle = urllib.urlopen(url)
data = urlhandle.read()
tree = ET.fromstring(data)
print ET.tostring(tree, encoding='utf8', method='xml')
data is a reference to the XML content as a string, but the parse() function expects a filename or file object as its argument. That's why there is an error.
urlhandle is a file object, so tree = ET.parse(urlhandle) should work for you.
The error message indicates that ElementTree is trying to open a file whose name is stored in the variable source (the argument you passed to parse()).
It fails to open that file (IOError) because source contains XML content, not a filename.
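The failure is easy to reproduce. This sketch (Python 3, standard library only, with a made-up feed in place of the BBC URL) passes the XML text to parse(), which tries to open it as a filename and raises OSError, the Python 3 name for IOError; it then shows the two working alternatives.

```python
import io
import xml.etree.ElementTree as ET

data = b"<rss><channel><title>Example</title></channel></rss>"  # made-up feed

# parse() treats a str argument as a filename and tries to open it
failed = False
try:
    ET.parse(data.decode("utf-8"))
except OSError as exc:  # IOError is an alias of OSError in Python 3
    failed = True
    print("parse(str) failed:", type(exc).__name__)

# Either hand parse() a file-like object...
tree = ET.parse(io.BytesIO(data))
# ...or use fromstring() on the content itself
root = ET.fromstring(data)
print(tree.getroot().tag, root.findtext("channel/title"))  # rss Example
```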

lxml.objectify.parse fails while fromstring works

I'm struggling to understand why lxml.objectify.parse fails when I pass it a file object but objectify.fromstring works when I pass it a string.
The following code works:
with open(logPath,'r', encoding='utf-8') as f:
    xml = f.read()
    root = objectify.fromstring(xml)
    print(root.tag)
The following code fails with error:
AttributeError: 'lxml.etree._ElementTree' object has no attribute 'tag'
with open(pelogPath,'r', encoding='utf-8') as f:
    #xml = f.read()
    root = objectify.parse(f)
    print(root.tag)
That's because fromstring() returns the root element directly:
Parses an XML document or fragment from a string. Returns the root node (or the result returned by a parser target).
whereas parse() returns an ElementTree object:
Return an ElementTree object loaded with source elements.
Use getroot() to get the root element in this case:
tree = objectify.parse(f)
root = tree.getroot()
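A minimal sketch of the difference, assuming lxml is installed and using made-up log content fed through io.BytesIO rather than the question's logPath file:

```python
import io
from lxml import objectify

xml_bytes = b"<log><entry level='info'>started</entry></log>"  # made-up log content

# fromstring() hands back the root element itself
root_direct = objectify.fromstring(xml_bytes)

# parse() hands back an ElementTree wrapper around the root
tree = objectify.parse(io.BytesIO(xml_bytes))
print(hasattr(tree, "tag"))  # False: this is the question's AttributeError
root = tree.getroot()

print(root_direct.tag, root.tag)  # log log
# objectify exposes child elements as attributes
print(root.entry.get("level"), root.entry.text)  # info started
```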

Parsing XML File with Python, while extracting Attributes and Children

I'm trying to read an XML file in Python whose general format is as follows:
<item id="1149" num="1" type="topic">
<title>Afghanistan</title>
<additionalInfo>Afghanistan</additionalInfo>
</item>
(This snippet repeats many times.)
I'm trying to get the id value and the title value to be printed into a file.
Currently, I'm having trouble getting the XML file into Python. This is what I'm doing to get the XML file:
import xml.etree.ElementTree as ET
from urllib2 import urlopen
url = 'http://api.npr.org/list?id=3002' #1007 is science
response = urlopen(url)
f = open('out.xml', 'w')
f.write(response)
However, whenever I run this code, I get the following error, which makes me think that I'm not using something that can handle XML:
Traceback (most recent call last):
  File "python", line 9, in <module>
TypeError: expected a character buffer object
Is there any way that I can save the XML file to a file, then extract the title of each section, as well as the id attribute associated with that title?
Thanks for the help.
You can read the content of the response with this code:
import urllib2
opener = urllib2.build_opener(urllib2.HTTPRedirectHandler(),urllib2.HTTPCookieProcessor())
response= opener.open("http://api.npr.org/list?id=3002").read()
opener.close()
and then write it to file :
f = open('out.xml', 'w')
f.write(response)
f.close()
What you want is response.read(), not response. The response variable is a file-like response object, not the XML string; calling response.read() reads the XML from it.
You can then write it directly to a file like so:
url = 'http://api.npr.org/list?id=3002' #1007 is science
response = urlopen(url)
with open('out.xml', 'w') as f:
    f.write(response.read())
Alternatively you could also parse it directly into the ElementTree like so:
url = 'http://api.npr.org/list?id=3002' #1007 is science
response = urlopen(url)
tree = ET.fromstring(response.read())
To extract all of the id/title pairs you could do the following as well:
url = 'http://api.npr.org/list?id=3002' #1007 is science
response = urlopen(url)
tree = ET.fromstring(response.read())
for item in tree.findall("item"):
    print item.get("id")
    print item.find("title").text
From there you can decide where to store/output the values
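Putting those pieces together, here is a sketch that extracts the id/title pairs and writes them out as CSV. It parses a made-up document shaped like the snippet in the question (the second item is invented) with the standard library in Python 3, and uses io.StringIO in place of a real output file such as a hypothetical out.csv. It uses iter("item") instead of findall("item") because findall() only looks at direct children of the root, and the nesting of the real NPR response is not shown here.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up document in the shape the question shows; the second item is invented
xml_bytes = b"""<list>
  <item id="1149" num="1" type="topic">
    <title>Afghanistan</title>
    <additionalInfo>Afghanistan</additionalInfo>
  </item>
  <item id="9999" num="2" type="topic">
    <title>Example topic</title>
  </item>
</list>"""

root = ET.fromstring(xml_bytes)

# iter() finds <item> at any depth; findall("item") only checks direct children
pairs = [(item.get("id"), item.findtext("title")) for item in root.iter("item")]

# Write the pairs as CSV; StringIO stands in for open("out.csv", "w", newline="")
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["id", "title"])
writer.writerows(pairs)
print(out.getvalue())
```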
