pycparser ParseError - python

I am trying to create an AST using pycparser.
The following error is printed:
Traceback (most recent call last):
  File "C:\Work\RE\Tools\VarsExporter\BuildExportedDb.py", line 1076, in <module>
    main()
  File "C:\Work\RE\Tools\VarsExporter\BuildExportedDb.py", line 1032, in main
    ast = parse_file(i_file)
  File "C:\Python27\lib\site-packages\pycparser\__init__.py", line 93, in parse_file
    return parser.parse(text, filename)
  File "C:\Python27\lib\site-packages\pycparser\c_parser.py", line 152, in parse
    debug=debuglevel)
  File "C:\Python27\lib\site-packages\pycparser\ply\yacc.py", line 331, in parse
    return self.parseopt_notrack(input, lexer, debug, tracking, tokenfunc)
  File "C:\Python27\lib\site-packages\pycparser\ply\yacc.py", line 1199, in parseopt_notrack
    tok = call_errorfunc(self.errorfunc, errtoken, self)
  File "C:\Python27\lib\site-packages\pycparser\ply\yacc.py", line 193, in call_errorfunc
    r = errorfunc(token)
  File "C:\Python27\lib\site-packages\pycparser\c_parser.py", line 1761, in p_error
    column=self.clex.find_tok_column(p)))
  File "C:\Python27\lib\site-packages\pycparser\plyparser.py", line 67, in _parse_error
    raise ParseError("%s: %s" % (coord, msg))
ParseError: Objectffly\SerDb.i:43:18: before: __loff_t
What causes this error, and how can I handle it?
Any suggestions on how I can debug it and find out what is going on?

From the pycparser FAQ on GitHub:
C code almost always #includes various header files from the standard C library, like stdio.h. While (with some effort) pycparser can be made to parse the standard headers from any C compiler, it's much simpler to use the provided "fake" standard includes in utils/fake_libc_include. These are standard C header files that contain only the bare necessities to allow valid parsing of the files that use them.
To solve the issue I have successfully used the method described here.
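The FAQ's advice can be sketched roughly as follows. This is a minimal sketch, not the method linked above: it assumes a C preprocessor is available as cpp and that the utils/fake_libc_include directory from the pycparser source tree is reachable. The helper names build_cpp_args and parse_with_fake_libc are hypothetical and not part of pycparser; only parse_file and its use_cpp, cpp_path and cpp_args parameters come from the library.

```python
def build_cpp_args(fake_libc_dir, include_dirs=()):
    """Return preprocessor flags that hide the real system headers."""
    # -nostdinc stops cpp from searching the compiler's own headers,
    # so only the fake headers (and the project's own) are found.
    args = ["-nostdinc", "-I" + fake_libc_dir]
    args.extend("-I" + d for d in include_dirs)
    return args


def parse_with_fake_libc(c_file, fake_libc_dir, include_dirs=()):
    """Preprocess c_file against the fake headers and return the AST."""
    # Imported lazily so build_cpp_args works without pycparser installed.
    from pycparser import parse_file

    return parse_file(
        c_file,
        use_cpp=True,
        cpp_path="cpp",  # assumption: cpp is on PATH
        cpp_args=build_cpp_args(fake_libc_dir, include_dirs),
    )
```

A file that was already preprocessed against the real system headers (such as a .i file) still contains compiler-specific declarations, so it would probably need to be regenerated from the original .c source for this to help.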

Related

Parse PostgreSQL - pycparser.plyparser.ParseError before: pgwin32_signal_event

I need to parse the open-source project PostgreSQL using pycparser.
While parsing its source code, the following error arises:
Traceback (most recent call last):
  File "examples\using_cpp_libc.py", line 48, in <module>
    getAllFiles(projectName)
  File "examples\using_cpp_libc.py", line 29, in getAllFiles
    ast = parse_file(dirName+'\\'+fname, use_cpp = True, cpp_path = 'cpp', cpp_args = [r'-nostdinc',r'-Iutils/fake_libc_include',r'-Iprojects/postgresql/src/include'])
  File "G:\python\pycparser-master\pycparser\__init__.py", line 92, in parse_file
    return parser.parse(text, filename)
  File "G:\python\pycparser-master\pycparser\c_parser.py", line 152, in parse
    debug=debuglevel)
  File "G:\python\pycparser-master\pycparser\ply\yacc.py", line 334, in parse
    return self.parseopt_notrack(input, lexer, debug, tracking, tokenfunc)
  File "G:\python\pycparser-master\pycparser\ply\yacc.py", line 1204, in parseopt_notrack
    tok = call_errorfunc(self.errorfunc, errtoken, self)
  File "G:\python\pycparser-master\pycparser\ply\yacc.py", line 193, in call_errorfunc
    r = errorfunc(token)
  File "G:\python\pycparser-master\pycparser\c_parser.py", line 1838, in p_error
    column=self.clex.find_tok_column(p)))
  File "G:\python\pycparser-master\pycparser\plyparser.py", line 67, in _parse_error
    raise ParseError("%s: %s" % (coord, msg))
pycparser.plyparser.ParseError: projects/postgresql/src/include/pg_config_os.h:366:15: before: pgwin32_signal_event
I am using postgresql-9.6.9, built with Visual Studio Express 2017 on Windows 10 (64-bit).
The blog post you quoted in the comment is the canonical resource. Parsing large C projects is not easy - they have their own quirks - so it takes work. I doubt it's resolvable within the confines of a Stack Overflow question.
You need to tackle the issues one at a time. For example, look at the pgwin32_signal_event token in pg_config_os.h: why can't it be parsed? Perhaps its type is unparsable? Was it defined? Could it be added to a "fake" header? Unfortunately, there's no easy way to do this other than working through each issue in turn.
Be sure to preprocess the file you're parsing first, dumping the full preprocessed version into a single .c file - this gets all the types into a single file you can work with.
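One way to work through the errors is to pull the location out of each ParseError message and look at the surrounding source. The sketch below assumes the message has the "file:line:column: before: token" shape seen in the tracebacks here and that the column is 1-based; locate_parse_error and show_context are hypothetical helper names, not part of pycparser.

```python
import re

# Greedy file part so that Windows drive letters ("C:\...") are kept whole.
_PARSE_ERROR_RE = re.compile(
    r"^(?P<file>.*):(?P<line>\d+):(?P<col>\d+): before: (?P<token>.*)$"
)


def locate_parse_error(message):
    """Split a ParseError message into (file, line, column, token)."""
    m = _PARSE_ERROR_RE.match(message.strip())
    if m is None:
        return None
    return m.group("file"), int(m.group("line")), int(m.group("col")), m.group("token")


def show_context(source_text, line, col, radius=2):
    """Return the lines around `line`, with a caret under column `col`."""
    lines = source_text.splitlines()
    first = max(1, line - radius)
    last = min(len(lines), line + radius)
    out = []
    for n in range(first, last + 1):
        marker = ">>" if n == line else "  "
        out.append("%s %5d: %s" % (marker, n, lines[n - 1]))
        if n == line:
            # The prefix before the source text is 10 characters wide.
            out.append(" " * (10 + col - 1) + "^")
    return "\n".join(out)
```

Calling locate_parse_error(str(exc)) on a caught ParseError and then show_context on the text of the file it names shows the declaration the parser stopped at, which is usually a type that was never defined or a compiler extension.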

pycparser.plyparser.ParseError on complex struct

I'm trying to use pycparser to parse this C code:
https://github.com/nbeaver/mx-trunk/blob/0b80678773582babcd56fe959d5cfbb776cc0004/libMx/d_adsc_two_theta.c
A repo with a minimal example and Makefile is here:
https://github.com/nbeaver/pycparser-problem
Using pycparser v2.14 (from pip) and gcc 4.9.2 on Debian Jessie.
Things I have tried:
Passing the -nostdinc flag to gcc and including the fake_libc_include folder.
Using -D'__attribute__(x)=' to strip out GCC extensions.
Using fake headers for e.g. <sys/param.h>.
Using -std=c99 in case the code is not C99 compatible.
Reproducing the redis example in case there is something weird with my machine.
This is what the traceback looks like:
Traceback (most recent call last):
File "just_parse.py", line 21, in <module>
parse(path)
File "just_parse.py", line 9, in parse
ast = pycparser.parse_file(filename)
File "/home/nathaniel/.local/lib/python2.7/site-packages/pycparser/__init__.py", line 93, in parse_file
return parser.parse(text, filename)
File "/home/nathaniel/.local/lib/python2.7/site-packages/pycparser/c_parser.py", line 146, in parse
debug=debuglevel)
File "/home/nathaniel/.local/lib/python2.7/site-packages/pycparser/ply/yacc.py", line 265, in parse
return self.parseopt_notrack(input,lexer,debug,tracking,tokenfunc)
File "/home/nathaniel/.local/lib/python2.7/site-packages/pycparser/ply/yacc.py", line 1047, in parseopt_notrack
tok = self.errorfunc(errtoken)
File "/home/nathaniel/.local/lib/python2.7/site-packages/pycparser/c_parser.py", line 1680, in p_error
column=self.clex.find_tok_column(p)))
File "/home/nathaniel/.local/lib/python2.7/site-packages/pycparser/plyparser.py", line 55, in _parse_error
raise ParseError("%s: %s" % (coord, msg))
pycparser.plyparser.ParseError: in/d_adsc_two_theta.c:63:82: before: .
The traceback points to this line:
https://github.com/nbeaver/mx-trunk/blob/0b80678773582babcd56fe959d5cfbb776cc0004/libMx/d_adsc_two_theta.c#L63
Which in turn points to this #define macro:
https://github.com/nbeaver/mx-trunk/blob/0b80678773582babcd56fe959d5cfbb776cc0004/libMx/mx_motor.h#L484
The cause appears to be the use of offsetof(). However, recent pycparser commits fix the minimal working examples:
https://github.com/eliben/pycparser/issues/87

Error when using python textblob library tagger

I had the textblob library working fine for a while, but decided to install (using easy_install) an additional library (page here) claiming faster and more accurate tagging.
I couldn't get it working so I uninstalled it, but it seems to have messed with the tagging function in TextBlob. I've uninstalled and reinstalled both nltk and TextBlob numerous times with both pip and easy_install, and made sure they're up to date.
Here is an example of a simple script which generates the error:
from textblob import TextBlob
blob = TextBlob("This is a sentence")
print repr(blob.tags)
and the error printed:
Traceback (most recent call last):
File "tesst.py", line 5, in <module>
print repr(blob.tags)
File "C:\Users\Emmet\Anaconda\lib\site-packages\textblob\decorators.py", line 24, in __get__
value = obj.__dict__[self.func.__name__] = self.func(obj)
File "C:\Users\Emmet\Anaconda\lib\site-packages\textblob\blob.py", line 445, in pos_tags
for word, t in self.pos_tagger.tag(self.raw)
File "C:\Users\Emmet\Anaconda\lib\site-packages\textblob\decorators.py", line 35, in decorated
return func(*args, **kwargs)
File "C:\Users\Emmet\Anaconda\lib\site-packages\textblob\en\taggers.py", line 34, in tag
tagged = nltk.tag.pos_tag(text)
File "C:\Users\Emmet\Anaconda\lib\site-packages\nltk\tag\__init__.py", line 110, in pos_tag
tagger = PerceptronTagger()
File "C:\Users\Emmet\Anaconda\lib\site-packages\nltk\tag\perceptron.py", line 141, in __init__
self.load(AP_MODEL_LOC)
File "C:\Users\Emmet\Anaconda\lib\site-packages\nltk\tag\perceptron.py", line 209, in load
self.model.weights, self.tagdict, self.classes = load(loc)
File "C:\Users\Emmet\Anaconda\lib\site-packages\nltk\data.py", line 801, in load
opened_resource = _open(resource_url)
File "C:\Users\Emmet\Anaconda\lib\site-packages\nltk\data.py", line 924, in _open
return urlopen(resource_url)
File "C:\Users\Emmet\Anaconda\lib\urllib2.py", line 154, in urlopen
return opener.open(url, data, timeout)
File "C:\Users\Emmet\Anaconda\lib\urllib2.py", line 431, in open
response = self._open(req, data)
File "C:\Users\Emmet\Anaconda\lib\urllib2.py", line 454, in _open
'unknown_open', req)
File "C:\Users\Emmet\Anaconda\lib\urllib2.py", line 409, in _call_chain
result = func(*args)
File "C:\Users\Emmet\Anaconda\lib\urllib2.py", line 1265, in unknown_open
raise URLError('unknown url type: %s' % type)
urllib2.URLError: <urlopen error unknown url type: c>
You can see that the error actually mentions the perceptron tagger. Is there any way to more thoroughly remove any references there may be to the alternate tagger?
Also note that only the "tags" function has been affected.
This seems to be a problem with nltk version 3.2. Until it's fixed in the release, you can use this hack:
NLTK v3.2: Unable to nltk.pos_tag()
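The last line of the traceback, "unknown url type: c", is consistent with a Windows path being handed to a URL opener, so that the drive letter is read as the URL scheme. The sketch below only illustrates that parsing behaviour with the Python 3 standard library (the traceback above is from Python 2's urllib2); the model path is hypothetical and this is not the fix from the linked question.

```python
from urllib.parse import quote, urlparse

# Hypothetical location of a tagger model on a Windows machine.
windows_path = r"C:\nltk_data\taggers\model.pickle"

# Parsed as a URL, everything before the first colon is taken as the scheme.
scheme_from_path = urlparse(windows_path).scheme

# Turning the path into a file: URI gives the opener a scheme it knows.
file_uri = "file:///" + quote(windows_path.replace("\\", "/"), safe="/:")
scheme_from_uri = urlparse(file_uri).scheme

print(scheme_from_path)
print(file_uri)
print(scheme_from_uri)
```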
I found out why I was having trouble with the ap tagger. My issue is solved here. More specifically, by the comment "Another option is to install nltk and then change "from textblob.packages import nltk" to "import nltk" [in the taggers.py] file."
(Note that this doesn't correspond to the error message above: that error was coming up without aptagger installed. I was getting another error with it installed, and this is a solution for that.)

Merging PDF files with Python3

I am writing a small script that needs to merge many one-page pdf files. I want the script to run with Python3 and to have as few dependencies as possible.
For the PDF merging part, I tried using PyPdf. However, the Python 3 support seems to be buggy: it can't handle Inkscape-generated PDF files (which I need). I have the current git version of PyPdf installed, and the following test script doesn't work:
import PyPDF2
output_pdf = PyPDF2.PdfFileWriter()
with open("testI.pdf", "rb") as input:
    input_pdf = PyPDF2.PdfFileReader(input)
    output_pdf.addPage(input_pdf.getPage(0))
with open("test.pdf", "wb") as output:
    output_pdf.write(output)
It throws the following stack trace:
Traceback (most recent call last):
File "test.py", line 7, in <module>
output.addPage(input.getPage(0))
File "/usr/lib/python3.3/site-packages/pyPdf/pdf.py", line 420, in getPage
self._flatten()
File "/usr/lib/python3.3/site-packages/pyPdf/pdf.py", line 574, in _flatten
self._flatten(page.getObject(), inherit)
File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 165, in getObject
return self.pdf.getObject(self).getObject()
File "/usr/lib/python3.3/site-packages/pyPdf/pdf.py", line 616, in getObject
retval = readObject(self.stream, self)
File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 66, in readObject
return DictionaryObject.readFromStream(stream, pdf)
File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 526, in readFromStream
value = readObject(stream, pdf)
File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 57, in readObject
return ArrayObject.readFromStream(stream, pdf)
File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 152, in readFromStream
obj = readObject(stream, pdf)
File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 86, in readObject
return NumberObject.readFromStream(stream)
File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 231, in readFromStream
return FloatObject(name.decode("ascii"))
File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 207, in __new__
return decimal.Decimal.__new__(cls, str(value), context)
TypeError: optional argument must be a context
The same script, however, works flawlessly with Python 2.7.
What am I doing wrong here? Is it a bug in the library? Can I work around it without touching the PyPDF library?
So I found the answer. The decimal.Decimal class in Python 3.3 shows some weird behaviour. This is the corresponding Stack Overflow question: Instantiate Decimal class. I added a workaround to the PyPDF2 library and submitted a pull request.
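A workaround of this kind might look like the sketch below. It is an assumption, not the actual pull request: it supposes the failing call in the traceback passed context=None through to decimal.Decimal.__new__, which Python 3.3 rejected, and it simply omits the argument in that case. The class name mirrors the FloatObject seen in the traceback.

```python
import decimal


class FloatObject(decimal.Decimal):
    """Decimal subclass that tolerates context=None."""

    def __new__(cls, value="0", context=None):
        if context is None:
            # Assumption: passing None explicitly is what triggered
            # "optional argument must be a context" on Python 3.3.
            return decimal.Decimal.__new__(cls, str(value))
        return decimal.Decimal.__new__(cls, str(value), context)


number = FloatObject("595.28")
print(number)
```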
Just to make sure you are aware of already existing tools that do exactly this:
PDFtk
PDFjam (my favourite, requires LaTeX though)
Directly with GhostScript:
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=finished.pdf file1.pdf file2.pdf
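If the merge still has to be driven from a Python 3 script, the GhostScript command above can be called through subprocess. This is a minimal sketch, assuming gs is installed and on PATH; build_gs_merge_command and merge_pdfs are hypothetical helper names.

```python
import subprocess


def build_gs_merge_command(output_pdf, input_pdfs):
    """Build the GhostScript argument list for merging PDFs in order."""
    input_pdfs = list(input_pdfs)
    if not input_pdfs:
        raise ValueError("at least one input PDF is required")
    return [
        "gs", "-dBATCH", "-dNOPAUSE", "-q",
        "-sDEVICE=pdfwrite",
        "-sOutputFile=" + output_pdf,
    ] + input_pdfs


def merge_pdfs(output_pdf, input_pdfs):
    """Run GhostScript; raises CalledProcessError if gs fails."""
    # Passing a list (no shell) avoids quoting problems with file names.
    subprocess.run(build_gs_merge_command(output_pdf, input_pdfs), check=True)
```

This keeps the script free of Python package dependencies, at the cost of requiring GhostScript on the machine.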

What does this Python message mean?

ho-fe3fdd00-12:~ Sam$ easy_install BeautifulSoup
Traceback (most recent call last):
File "/usr/bin/easy_install", line 8, in <module>
load_entry_point('setuptools==0.6c7', 'console_scripts', 'easy_install')()
File "/System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python/setuptools/command/easy_install.py", line 1670, in main
with_ei_usage(lambda:
File "/System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python/setuptools/command/easy_install.py", line 1659, in with_ei_usage
return f()
File "/System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python/setuptools/command/easy_install.py", line 1674, in <lambda>
distclass=DistributionWithoutHelpCommands, **kw
File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/distutils/core.py", line 125, in setup
dist.parse_config_files()
File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/distutils/dist.py", line 373, in parse_config_files
parser.read(filename)
File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/ConfigParser.py", line 267, in read
self._read(fp, filename)
File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/ConfigParser.py", line 462, in _read
raise MissingSectionHeaderError(fpname, lineno, line)
ConfigParser.MissingSectionHeaderError: File contains no section headers.
file: /Users/Sam/.pydistutils.cfg, line: 1
'install_lib = ~/Library/Python/$py_version_short/site-packages\n'
I am trying to install beautifulsoup.
The first two lines in ~/.pydistutils.cfg:
install_lib = ~/Library/Python/$py_version_short/site-packages
install_scripts = ~/bin
BeautifulSoup is a pure Python module, which you can install by grabbing the BeautifulSoup.py file (e.g. from inside the standard .tar.gz distribution) and putting it somewhere on your PYTHONPATH, e.g. inside /Users/Sam/Library/Python/2.5/site-packages, if the paths mentioned in the error message are accurate.
No need for fussy and error-prone installers which just overcomplicate the issue.
The configuration file ~/.pydistutils.cfg has a syntax error: its settings are not preceded by a section header.
Try adding this line at the top of ~/.pydistutils.cfg:
[easy_install]
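The effect of the missing header can be reproduced with the standard library. The sketch below uses Python 3's configparser as a stand-in for the Python 2.5 ConfigParser in the traceback, with the two lines quoted from the question; try_parse is a hypothetical helper name.

```python
import configparser

# The two lines from ~/.pydistutils.cfg as quoted in the question.
BROKEN = (
    "install_lib = ~/Library/Python/$py_version_short/site-packages\n"
    "install_scripts = ~/bin\n"
)
# The same content with the suggested section header on top.
FIXED = "[easy_install]\n" + BROKEN


def try_parse(text):
    """Return {section: {option: value}}, or None if a header is missing."""
    parser = configparser.RawConfigParser()
    try:
        parser.read_string(text)
    except configparser.MissingSectionHeaderError:
        return None
    return {section: dict(parser.items(section)) for section in parser.sections()}


print(try_parse(BROKEN))
print(try_parse(FIXED))
```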
