I have had great success parsing RSS feeds from the National Hurricane Center using the feedparser module:
import feedparser
feedparser.parse('https://www.nhc.noaa.gov/gis-at.xml') #Works Fine
feedparser.parse('https://www.nhc.noaa.gov/gis-ep.xml') #Works Fine
However, when I try to read the superficially similar feed from the Central Pacific Hurricane Center, I generate a KeyError:
feedparser.parse('http://www.prh.noaa.gov/cphc/gis-cp.xml') #Doesn't work
Is this a bug with feedparser? Is the CPHC's feed malformed? Is there an option that I've forgotten to specify? It seems the trouble is that there isn't a key named 'where', but I don't know why this isn't a problem for the NHC feeds. The stack is reproduced below:
>>> import feedparser
>>> feedparser.parse('http://www.prh.noaa.gov/cphc/gis-cp.xml')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File ".../anaconda3/lib/python3.6/site-packages/feedparser.py", line 3956, in parse
saxparser.parse(source)
File ".../anaconda3/lib/python3.6/xml/sax/expatreader.py", line 111, in parse
xmlreader.IncrementalParser.parse(self, source)
File ".../anaconda3/lib/python3.6/xml/sax/xmlreader.py", line 125, in parse
self.feed(buffer)
File ".../anaconda3/lib/python3.6/xml/sax/expatreader.py", line 217, in feed
self._parser.Parse(data, isFinal)
File "/tmp/build/80754af9/python_1516124163501/work/Modules/pyexpat.c", line 414, in StartElement
File ".../anaconda3/lib/python3.6/xml/sax/expatreader.py", line 370, in start_element_ns
AttributesNSImpl(newattrs, qnames))
File ".../anaconda3/lib/python3.6/site-packages/feedparser.py", line 2031, in startElementNS
self.unknown_starttag(localname, list(attrsD.items()))
File ".../anaconda3/lib/python3.6/site-packages/feedparser.py", line 666, in unknown_starttag
return method(attrsD)
File ".../anaconda3/lib/python3.6/site-packages/feedparser.py", line 1500, in _start_gml_point
self._parse_srs_attrs(attrsD)
File ".../anaconda3/lib/python3.6/site-packages/feedparser.py", line 1496, in _parse_srs_attrs
context['where']['srsName'] = srsName
File ".../anaconda3/lib/python3.6/site-packages/feedparser.py", line 356, in __getitem__
return dict.__getitem__(self, key)
KeyError: 'where'
I know this is an old question but I faced myself this issue and became my first opensource contribution :)
Is this a bug with feedparser?
Yes, it was.
Is the CPHC's feed malformed?
Also yes, or at least it doesn't follow the GeoRSSS GML model to the letter. If you check the GMLPoint description you will see the following structure:
<georss:where>
<gml:Point>
<gml:pos>45.256 -71.92</gml:pos>
</gml:Point>
</georss:where>
but the feed data is structured this way:
<gml:Point>
<gml:pos>45.256 -71.92</gml:pos>
</gml:Point>
So that's why the KeyError: 'where' occurs, due to the absent of where tag.
This was fixed on feedparser's 6.0.9 hotfix (see https://github.com/kurtmckee/feedparser/pull/306)
Related
Hi friends I am trying to implement Mindbody API using python. But I am getting error that caused my work down. Down the page error is shown.
Traceback (most recent call last):
File "Appointment.py", line 6, in <module>
class AppointmentService():
File "Appointment.py", line 10, in AppointmentService
service = Client(urlService)
File "C:\Python27\lib\site-packages\suds\client.py", line 112, in __init__
self.wsdl = reader.open(url)
File "C:\Python27\lib\site-packages\suds\reader.py", line 152, in open
d = self.fn(url, self.options)
File "C:\Python27\lib\site-packages\suds\wsdl.py", line 136, in __init__
d = reader.open(url)
File "C:\Python27\lib\site-packages\suds\reader.py", line 79, in open
d = self.download(url)
File "C:\Python27\lib\site-packages\suds\reader.py", line 101, in download
return sax.parse(string=content)
File "C:\Python27\lib\site-packages\suds\sax\parser.py", line 136, in parse
sax.parse(source)
File "C:\Python27\lib\xml\sax\expatreader.py", line 110, in parse
xmlreader.IncrementalParser.parse(self, source)
File "C:\Python27\lib\xml\sax\xmlreader.py", line 123, in parse
self.feed(buffer)
File "C:\Python27\lib\xml\sax\expatreader.py", line 217, in feed
self._err_handler.fatalError(exc)
File "C:\Python27\lib\xml\sax\handler.py", line 38, in fatalError
raise exception
xml.sax._exceptions.SAXParseException: <unknown>:9:673: not well-formed (invalid token)
enter image description here
actually I am unable to resolve this problem and don't know the reason behind the error
It seems you are implementing third party API so you should take care of input which are sent to the server. Mainly this type of error occurs due to improper formatting of input and server was not able to understand your request. So please check you are hitting right service call with required parameter as in the format. Do refer to MindBody Api documentation
Good example and explanation can be found here for MindBody pyton integration http://www.mindbodydeveloper.com/mindbody-integration-python-example/
i tried to open rdf file (dmoz rdf dump), but a get this error message
Traceback (most recent call last):
File "/media/_dev_/ODP_RDF_get_links.py", line 4, in <module>
result = g.parse("data/content.rdf")
File "/usr/local/lib/python2.7/dist-packages/rdflib/graph.py", line 1033, in parse
parser.parse(source, self, **args)
File "/usr/local/lib/python2.7/dist-packages/rdflib/plugins/parsers/rdfxml.py", line 577, in parse
self._parser.parse(source)
File "/usr/lib/python2.7/xml/sax/expatreader.py", line 107, in parse
xmlreader.IncrementalParser.parse(self, source)
File "/usr/lib/python2.7/xml/sax/xmlreader.py", line 123, in parse
self.feed(buffer)
File "/usr/lib/python2.7/xml/sax/expatreader.py", line 210, in feed
self._parser.Parse(data, isFinal)
File "/usr/lib/python2.7/xml/sax/expatreader.py", line 352, in end_element_ns
self._cont_handler.endElementNS(pair, None)
File "/usr/local/lib/python2.7/dist-packages/rdflib/plugins/parsers/rdfxml.py", line 160, in endElementNS
self.current.end(name, qname)
File "/usr/local/lib/python2.7/dist-packages/rdflib/plugins/parsers/rdfxml.py", line 331, in node_element_end
self.error("Repeat node-elements inside property elements: %s"%"".join(name))
File "/usr/local/lib/python2.7/dist-packages/rdflib/plugins/parsers/rdfxml.py", line 185, in error
raise ParserError(info + message)
file:///media/_dev_/data/content.rdf:5:12: Repeat node-elements inside property elements: http://dmoz.org/rdf/catid
my simple code is as follow:
import rdflib
g = rdflib.Graph()
result = g.parse("data/content.rdf")
print("graph has %s statements." % len(g))
i need to be able to read the file.
extract all links in the world category.
thanks for any possible help
EDIT:
PS: found this wikipedia rdf_dumps, so developing custom scripts is necessary to use this dump
I'm new to python, and seriously need help! I have a number of errors I can't figure out. I'm using python 2.7 on a mac. Here is the list of errors:
Traceback (most recent call last):
File "minihiveosc.py", line 378, in <module>
swhive = SWMiniHiveOSC( options.host, options.hport, options.ip, options.port, options.minibees, options.serial, options.baudrate, options.config, [1,options.minibees], options.verbose, options.apimode )
File "minihiveosc.py", line 280, in __init__
self.hive.load_from_file( config )
File "/Users/Puffin/Documents/python/pydon/pydon/pydonhive.py", line 396, in load_from_file
hiveconf = cfgfile.read_file( filename )
File "/Users/Puffin/Documents/python/pydon/pydon/minibeexml.py", line 116, in read_file
tree = ET.parse( filename )
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1183, in parse
tree.parse(source, parser)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 656, in parse
parser.feed(data)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1643, in feed
self._raiseerror(v)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1507, in _raiseerror
raise err
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 164, column 8
Any chance someone can help me?
Thanks!
What you posted in your question is called a "Traceback", and it shows only one error:
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 164, column 8
All the lines before it show how python got there; in the file minihiveosc.py, on line 378 some code was executed (shown in the traceback), which then led to line 280 of the same file, where something else was called, etc.
Every time Python calls a function the current state is pushed onto the stack to make room for the next context, and when an exception occurs python can show you this stack to help you diagnose your problem
In this case, you are trying to feed an XML document to the XML parser that has an error in it; by the time the parser gets to line 164, column 8, it found something it didn't expect. You'll need to inspect that document to see what the problem is, it'll be around that area.
It just because that your XML file is not wellformed at line 8. When the parser tries to read that line it raises that error. Have a look at your document to see what it is.
This is one error with stack trace.
Creation of SWMiniHiveOSC object caused error when executing load_from_file(config) method. File name or file content is in 'options.config'. Your XML config file is not well-formed, there is invalid token at line 164, column 8 in this file. The problem is with XML file, not python code.
i am trying to import the content of my blog using BeautifulSoup,using the the syntax as given below
import urllib2
from BeautifulSoup import BeautifulSoup
response=urllib2.urlopen('http://www.bugsandbrains.blogspot.com')
html=response.read()
soup=BeautifulSoup(html)
Every thing worked fine two or three time after that it started throwing HtmlParseError
i see it highly unlikely that the structure of the page might have changed within a few minutes what else can might be causing this problem ?
i am enclosing the trace as well.
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1499, in __init__
BeautifulStoneSoup.__init__(self, *args, **kwargs)
File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1230, in __init__
self._feed(isHTML=isHTML)
File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1263, in _feed
self.builder.feed(markup)
File "/usr/lib/python2.6/HTMLParser.py", line 108, in feed
self.goahead(0)
File "/usr/lib/python2.6/HTMLParser.py", line 150, in goahead
k = self.parse_endtag(i)
File "/usr/lib/python2.6/HTMLParser.py", line 317, in parse_endtag
self.error("bad end tag: %r" % (rawdata[i:j],))
File "/usr/lib/python2.6/HTMLParser.py", line 115, in error
raise HTMLParseError(message, self.getpos())
HTMLParseError: bad end tag: u"</scr' + 'ipt>", at line 1152, column 16
I just tried your code on Windows with:
Python: 2.6 (same as yours)
BeautiSoup: 3.0.8.1 (latest)
I can't reproduce this. Are you using the latest code 3.0 series which is meant for Python 2.6, not 3.1 series which is for Python 3 [0]. Sorry, but can't think of any other clues right now.
[0] http://www.crummy.com/software/BeautifulSoup/#Download
I have tried your code, and it works. My env: ActivePython 2.6.6.15, BeautifulSoup 3.0.8.1. I printed out soup variable and it contains content of "Boredom Induced Post". When I tested http://www.bugsandbrains.blogspot.com with browsers they shows Wave Sandbox login page. No clue about what is wrong :(
I'm creating a code that gets image's urls from any web pages, the code are in python and use BeutifulSoup and httplib2.
When I run the code, I get the next error:
Look me http://movies.nytimes.com (this line is printed by the code)
Traceback (most recent call last):
File "main.py", line 103, in <module>
visit(initialList,profundidad)
File "main.py", line 98, in visit
visit(dodo[indice], bottom -1)
File "main.py", line 94, in visit
getImages(w)
File "main.py", line 34, in getImages
iSoupList = BeautifulSoup(response, parseOnlyThese=SoupStrainer('img'))
File "/usr/local/lib/python2.6/dist-packages/BeautifulSoup.py", line 1499, in __init__
BeautifulStoneSoup.__init__(self, *args, **kwargs)
File "/usr/local/lib/python2.6/dist-packages/BeautifulSoup.py", line 1230, in __init__
self._feed(isHTML=isHTML)
File "/usr/local/lib/python2.6/dist-packages/BeautifulSoup.py", line 1263, in _feed
self.builder.feed(markup)
File "/usr/lib/python2.6/HTMLParser.py", line 108, in feed
self.goahead(0)
File "/usr/lib/python2.6/HTMLParser.py", line 148, in goahead
k = self.parse_starttag(i)
File "/usr/lib/python2.6/HTMLParser.py", line 226, in parse_starttag
endpos = self.check_for_whole_start_tag(i)
File "/usr/lib/python2.6/HTMLParser.py", line 301, in check_for_whole_start_tag
self.error("malformed start tag")
File "/usr/lib/python2.6/HTMLParser.py", line 115, in error
raise HTMLParseError(message, self.getpos())
HTMLParser.HTMLParseError: malformed start tag, at line 942, column 118
Someone can explain me how to fix or make an exeption for the error
Are you using latest version of BeautifulSoup?
This seems a known issue of version 3.1.x, because it started using a new parser (HTMLParser, instead of SGMLParser) that is much worse at processing malformed HTML. You can find more information about this on BeautifulSoup website.
As a quick solution, you can simply use an older version (3.0.7a).
To catch that error specifically, change your code to look like this:
try:
iSoupList = BeautifulSoup(response, parseOnlyThese=SoupStrainer('img'))
except HTMLParseError:
#Do something intelligent here
Here's some more reading on Python's try except blocks:
http://docs.python.org/tutorial/errors.html
I got that error when I had the string =& in my HTML document. When I replaced that string (in my case with =and) then I no longer received that parsing error.