Parsing XML exception

Parsing XML exception - python

I'm new to python, and seriously need help! I have a number of errors I can't figure out. I'm using python 2.7 on a mac. Here is the list of errors:
Traceback (most recent call last):
File "minihiveosc.py", line 378, in <module>
swhive = SWMiniHiveOSC( options.host, options.hport, options.ip, options.port, options.minibees, options.serial, options.baudrate, options.config, [1,options.minibees], options.verbose, options.apimode )
File "minihiveosc.py", line 280, in __init__
self.hive.load_from_file( config )
File "/Users/Puffin/Documents/python/pydon/pydon/pydonhive.py", line 396, in load_from_file
hiveconf = cfgfile.read_file( filename )
File "/Users/Puffin/Documents/python/pydon/pydon/minibeexml.py", line 116, in read_file
tree = ET.parse( filename )
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1183, in parse
tree.parse(source, parser)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 656, in parse
parser.feed(data)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1643, in feed
self._raiseerror(v)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1507, in _raiseerror
raise err
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 164, column 8
Any chance someone can help me?
Thanks!

What you posted in your question is called a "Traceback", and it shows only one error:
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 164, column 8
All the lines before it show how python got there; in the file minihiveosc.py, on line 378 some code was executed (shown in the traceback), which then led to line 280 of the same file, where something else was called, etc.
Every time Python calls a function the current state is pushed onto the stack to make room for the next context, and when an exception occurs python can show you this stack to help you diagnose your problem
In this case, you are trying to feed an XML document to the XML parser that has an error in it; by the time the parser gets to line 164, column 8, it found something it didn't expect. You'll need to inspect that document to see what the problem is, it'll be around that area.

It just because that your XML file is not wellformed at line 8. When the parser tries to read that line it raises that error. Have a look at your document to see what it is.

This is one error with stack trace.
Creation of SWMiniHiveOSC object caused error when executing load_from_file(config) method. File name or file content is in 'options.config'. Your XML config file is not well-formed, there is invalid token at line 164, column 8 in this file. The problem is with XML file, not python code.

Related

Parse postgresql -pycparser.plyparser.ParseError before: pgwin32_signal_event

I need to parse an open-source project Postgresql using pycparser.
While parsing its source-code the following error arises:
Traceback (most recent call last):
File "examples\using_cpp_libc.py", line 48, in <module>
getAllFiles(projectName)
File "examples\using_cpp_libc.py", line 29, in getAllFiles
ast = parse_file(dirName+'\\'+fname, use_cpp = True, cpp_path = 'cpp',
cpp_args = [r'-nostdinc',r'-Iutils/fake_libc_include',r'-
Iprojects/postgresql/src/include'])
File "G:\python\pycparser-master\pycparser\__init__.py", line 92, in
parse_file
return parser.parse(text, filename)
File "G:\python\pycparser-master\pycparser\c_parser.py", line 152, in parse
debug=debuglevel)
File "G:\python\pycparser-master\pycparser\ply\yacc.py", line 334, in parse
return self.parseopt_notrack(input, lexer, debug, tracking, tokenfunc)
File "G:\python\pycparser-master\pycparser\ply\yacc.py", line 1204, in
parseopt_notrack
tok = call_errorfunc(self.errorfunc, errtoken, self)
File "G:\python\pycparser-master\pycparser\ply\yacc.py", line 193, in
call_errorfunc
r = errorfunc(token)
File "G:\python\pycparser-master\pycparser\c_parser.py", line 1838, in
p_error
column=self.clex.find_tok_column(p)))
File "G:\python\pycparser-master\pycparser\plyparser.py", line 67, in
_parse_error
raise ParseError("%s: %s" % (coord, msg))
pycparser.plyparser.ParseError:
projects/postgresql/src/include/pg_config_os.h:366:15: before:
pgwin32_signal_event
I am using postgresql-9.6.9, build it using visual studio express 2017 on windows 10 (64-bit)

The blog post you quoted in the comment is the canonical resource. Parsing large C projects is not easy - they have their own quirks - so it takes work. I doubt it's resolvable within the confines of a Stack Overflow question.
You need to start tackling the issues one by one - for example look at the pgwin32_signal_event token in pg_config_os.h - why can't it be parsed? Perhaps its type is unparsable? Was it defined? Could it be added to a "fake" header, etc. Unfortunately, there's no easy way to do this except working through the issues one by one.
Be sure to preprocess the file you're parsing first, dumping the full preprocessed version into a single .c file - this gets all the types into a single file you can work with.

Unexpected error when using feedparser.py

I have had great success parsing RSS feeds from the National Hurricane Center using the feedparser module:
import feedparser
feedparser.parse('https://www.nhc.noaa.gov/gis-at.xml') #Works Fine
feedparser.parse('https://www.nhc.noaa.gov/gis-ep.xml') #Works Fine
However, when I try to read the superficially similar feed from the Central Pacific Hurricane Center, I generate a KeyError:
feedparser.parse('http://www.prh.noaa.gov/cphc/gis-cp.xml') #Doesn't work
Is this a bug with feedparser? Is the CPHC's feed malformed? Is there an option that I've forgotten to specify? It seems the trouble is that there isn't a key named 'where', but I don't know why this isn't a problem for the NHC feeds. The stack is reproduced below:
>>> import feedparser
>>> feedparser.parse('http://www.prh.noaa.gov/cphc/gis-cp.xml')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File ".../anaconda3/lib/python3.6/site-packages/feedparser.py", line 3956, in parse
saxparser.parse(source)
File ".../anaconda3/lib/python3.6/xml/sax/expatreader.py", line 111, in parse
xmlreader.IncrementalParser.parse(self, source)
File ".../anaconda3/lib/python3.6/xml/sax/xmlreader.py", line 125, in parse
self.feed(buffer)
File ".../anaconda3/lib/python3.6/xml/sax/expatreader.py", line 217, in feed
self._parser.Parse(data, isFinal)
File "/tmp/build/80754af9/python_1516124163501/work/Modules/pyexpat.c", line 414, in StartElement
File ".../anaconda3/lib/python3.6/xml/sax/expatreader.py", line 370, in start_element_ns
AttributesNSImpl(newattrs, qnames))
File ".../anaconda3/lib/python3.6/site-packages/feedparser.py", line 2031, in startElementNS
self.unknown_starttag(localname, list(attrsD.items()))
File ".../anaconda3/lib/python3.6/site-packages/feedparser.py", line 666, in unknown_starttag
return method(attrsD)
File ".../anaconda3/lib/python3.6/site-packages/feedparser.py", line 1500, in _start_gml_point
self._parse_srs_attrs(attrsD)
File ".../anaconda3/lib/python3.6/site-packages/feedparser.py", line 1496, in _parse_srs_attrs
context['where']['srsName'] = srsName
File ".../anaconda3/lib/python3.6/site-packages/feedparser.py", line 356, in __getitem__
return dict.__getitem__(self, key)
KeyError: 'where'

I know this is an old question but I faced myself this issue and became my first opensource contribution :)
Is this a bug with feedparser?
Yes, it was.
Is the CPHC's feed malformed?
Also yes, or at least it doesn't follow the GeoRSSS GML model to the letter. If you check the GMLPoint description you will see the following structure:
<georss:where>
<gml:Point>
<gml:pos>45.256 -71.92</gml:pos>
</gml:Point>
</georss:where>
but the feed data is structured this way:
<gml:Point>
<gml:pos>45.256 -71.92</gml:pos>
</gml:Point>
So that's why the KeyError: 'where' occurs, due to the absent of where tag.
This was fixed on feedparser's 6.0.9 hotfix (see https://github.com/kurtmckee/feedparser/pull/306)

SyntaxError using gdata-python to get worksheets feed

I'm getting an occasional error when trying to fetch a list of worksheets from gdata. This does not happen for all spreadsheets, but will consistently happen to the same spreadsheet for a period of several days to weeks. I suspected permissions, but was unable to find any special permissions for the spreadsheets that cause the error. I'm using OAuth2, gdata 2.0.18, and Python 2.6.8.
Traceback (most recent call last):
File "/mnt/shared_from_host/snake/base/fetchers/google_spreadsheet/common.py", line 176, in get_worksheet_list
feed = client.get_worksheets(spreadsheet_id)
File "/home/ubuntu/.virtualenvs/snakeenv/lib/python2.6/site-packages/gdata/spreadsheets/client.py", line 108, in get_worksheets
**kwargs)
File "/home/ubuntu/.virtualenvs/snakeenv/lib/python2.6/site-packages/gdata/client.py", line 640, in get_feed
**kwargs)
File "/home/ubuntu/.virtualenvs/snakeenv/lib/python2.6/site-packages/gdata/client.py", line 278, in request
version=get_xml_version(self.api_version))
File "/home/ubuntu/.virtualenvs/snakeenv/lib/python2.6/site-packages/atom/core.py", line 520, in parse
tree = ElementTree.fromstring(xml_string)
File "<string>", line 86, in XML
SyntaxError: no element found: line 1, column 0
This seems to be from the request getting an empty string as the response.
Does anybody have any idea on why this might not work, or troubleshooting ideas? Thanks.

BeautifulSoup not working

i am trying to import the content of my blog using BeautifulSoup,using the the syntax as given below
import urllib2
from BeautifulSoup import BeautifulSoup
response=urllib2.urlopen('http://www.bugsandbrains.blogspot.com')
html=response.read()
soup=BeautifulSoup(html)
Every thing worked fine two or three time after that it started throwing HtmlParseError
i see it highly unlikely that the structure of the page might have changed within a few minutes what else can might be causing this problem ?
i am enclosing the trace as well.
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1499, in __init__
BeautifulStoneSoup.__init__(self, *args, **kwargs)
File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1230, in __init__
self._feed(isHTML=isHTML)
File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1263, in _feed
self.builder.feed(markup)
File "/usr/lib/python2.6/HTMLParser.py", line 108, in feed
self.goahead(0)
File "/usr/lib/python2.6/HTMLParser.py", line 150, in goahead
k = self.parse_endtag(i)
File "/usr/lib/python2.6/HTMLParser.py", line 317, in parse_endtag
self.error("bad end tag: %r" % (rawdata[i:j],))
File "/usr/lib/python2.6/HTMLParser.py", line 115, in error
raise HTMLParseError(message, self.getpos())
HTMLParseError: bad end tag: u"</scr' + 'ipt>", at line 1152, column 16

I just tried your code on Windows with:
Python: 2.6 (same as yours)
BeautiSoup: 3.0.8.1 (latest)
I can't reproduce this. Are you using the latest code 3.0 series which is meant for Python 2.6, not 3.1 series which is for Python 3 [0]. Sorry, but can't think of any other clues right now.
[0] http://www.crummy.com/software/BeautifulSoup/#Download

I have tried your code, and it works. My env: ActivePython 2.6.6.15, BeautifulSoup 3.0.8.1. I printed out soup variable and it contains content of "Boredom Induced Post". When I tested http://www.bugsandbrains.blogspot.com with browsers they shows Wave Sandbox login page. No clue about what is wrong :(

how to fix or make an exception for this error

I'm creating a code that gets image's urls from any web pages, the code are in python and use BeutifulSoup and httplib2.
When I run the code, I get the next error:
Look me http://movies.nytimes.com (this line is printed by the code)
Traceback (most recent call last):
File "main.py", line 103, in <module>
visit(initialList,profundidad)
File "main.py", line 98, in visit
visit(dodo[indice], bottom -1)
File "main.py", line 94, in visit
getImages(w)
File "main.py", line 34, in getImages
iSoupList = BeautifulSoup(response, parseOnlyThese=SoupStrainer('img'))
File "/usr/local/lib/python2.6/dist-packages/BeautifulSoup.py", line 1499, in __init__
BeautifulStoneSoup.__init__(self, *args, **kwargs)
File "/usr/local/lib/python2.6/dist-packages/BeautifulSoup.py", line 1230, in __init__
self._feed(isHTML=isHTML)
File "/usr/local/lib/python2.6/dist-packages/BeautifulSoup.py", line 1263, in _feed
self.builder.feed(markup)
File "/usr/lib/python2.6/HTMLParser.py", line 108, in feed
self.goahead(0)
File "/usr/lib/python2.6/HTMLParser.py", line 148, in goahead
k = self.parse_starttag(i)
File "/usr/lib/python2.6/HTMLParser.py", line 226, in parse_starttag
endpos = self.check_for_whole_start_tag(i)
File "/usr/lib/python2.6/HTMLParser.py", line 301, in check_for_whole_start_tag
self.error("malformed start tag")
File "/usr/lib/python2.6/HTMLParser.py", line 115, in error
raise HTMLParseError(message, self.getpos())
HTMLParser.HTMLParseError: malformed start tag, at line 942, column 118
Someone can explain me how to fix or make an exeption for the error

Are you using latest version of BeautifulSoup?
This seems a known issue of version 3.1.x, because it started using a new parser (HTMLParser, instead of SGMLParser) that is much worse at processing malformed HTML. You can find more information about this on BeautifulSoup website.
As a quick solution, you can simply use an older version (3.0.7a).

To catch that error specifically, change your code to look like this:
try:
iSoupList = BeautifulSoup(response, parseOnlyThese=SoupStrainer('img'))
except HTMLParseError:
#Do something intelligent here
Here's some more reading on Python's try except blocks:
http://docs.python.org/tutorial/errors.html

I got that error when I had the string =& in my HTML document. When I replaced that string (in my case with =and) then I no longer received that parsing error.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Parsing XML exception - python

It just because that your XML file is not wellformed at line 8. When the parser tries to read that line it raises that error. Have a look at your document to see what it is.

Related

Parse postgresql -pycparser.plyparser.ParseError before: pgwin32_signal_event

Unexpected error when using feedparser.py

SyntaxError using gdata-python to get worksheets feed

BeautifulSoup not working

how to fix or make an exception for this error

Categories

Resources