Python package to parse identifiers in a program (C, Scala, Lisp)?

Python package to parse identifiers in a program (C, Scala, Lisp)? - python

In the title I mention 3 different languages in which I would like to find out if a python package exists which can give me a list of identifiers for a program in any of those; so doesn't have to be all three of them as I doubt it there would be one like that. So my question is does a function or class exist in python that allows me too get a list of identifiers for a specific program in a language, preferably one in the 3 I listed in the title. Any help appreciated.

In general, this is not possible without having a nearly complete language implementation.
There is a rudimentary preprocessor in C, which could allow to mask function declarations from an ad hoc scanning. There is a powerful metaprogramming in Lisp, which means you can only extract the definitions using a full-featured Lisp compiler, simple parsing won't help at all.
Scala is the simplest of these three, but still its syntax is over-bloated and you'll need at least a complete parser. Python is not nearly a right tool for doing this sort of things any way.

There's pycparser, which you can use to generate a C AST from code and then traverse it to get whatever you want.
There's this simple lisp interpreter in Python from which you should be able to scrap the parser.
And I doubt there's anything similar and readily available for Scala, but you can use something like ply to make a parser. It won't be as easy, but will do.

Related

Best python library to use for a BNF based autocompleter

I want to create a SQL autocompleter for use with rlwrap: https://github.com/hanslub42/rlwrap
This could then be used with sqlite3 & osqueri for example (I know they already have some autocompletion facility, but it's not good enough, especially under rlwrap).
In fact, more generally I would like to know the best approach for building autocompleters based on BNF grammar descriptions; I may want to produce autocompleters for other rlwrapped REPLs at some point in the future.
I have no experience with parsers, but I have read some stuff online about the different types of parsers and how they work, and this Pyleri tutorial: https://tomassetti.me/pyleri-tutorial/
Pyleri looks fairly straightforward, and has the expecting property which makes it easy to create a auto-completer, but AFAIK it would involve translating the sqlite BNF (and any other BNF's that I might want to use in the future) into python code, which is a drag.
ANTLR has lots of predefined grammar files for many different languages, and the ability to output python code, but I'm not sure how easy it is to produce an autocompleter, and I don't want to read through all the documentation only to find out I've wasted my time.
So can anyone advise me? What's the best approach?

How to select a subset from Python for parsing purpose

I am working on assignment and need to develop a python to openmodelica translator. For which I am using flex and bison in initial stages. Initially I need to define a subset of python language on which I could perform a whole demo. I am new to Python language, Can anybody suggest how can I define a subset of python language? Thanks.

Well as you are probably not interested in writing it in Python itself, I guess the language reference is the best starting point. It defines the whole grammar of the language. So this is likely a good starting point to find some features of the language you want to implement on your own; and then you need to write your own grammar and a parser for it in your language of choice.
Otherwise, you could use the built-in Python language services to actually parse real Python code and extract it into abstract syntax trees for example.
But if you are meant to only have a subset, I don’t think having the full language capabilities will do you any good. So you better start off with a real subset of the grammar. A good way to get to know which features you want to take over is probably by using the language yourself for a bit. Do some tutorials etc. and see how the basic syntax works.

What should I know about Python to identify comments in different source files?

I have a need to identify comments in different kinds of source files in a given directory. ( For example java,XML, JavaScript, bash). I have decided to do this using Python (as an attempt to learn Python). The questions I have are
1) What should I know about python to get this done? ( I have an idea that Regular Expressions will be useful but are there alternatives/other modules that will be useful? Libraries that I can use to get this done?)
2) Is Python a good choice for such a task? Will some other language make this easier to accomplish?

Your problem seems to be more related to programming language parsing. I believe with regular expressions you will be able to find comments in most of the languages. The good thing is that you have regular expressions almost everywhere: Perl, Python, Ruby, AWK, Sed, etc.
But, as the other answer said, you'd better use some parsing machinery. And, if not a full blown parser, a lexer. For Python, check out the Pygments library, which has lexers for many languages already implemented.

1) What you need to know about is parsing, not regex. Additionally you will need the os module and some knowledge about pythons file handling. DiveIntoPython (http://www.diveintopython.net/) is a good start here. I'd recommend chapter 6. (And maybe 1-5 as well :) )
2) Python is a good start. Another language is not going to make it easier, but different. Python allready is pretty simple to start with.
I would recommend not to use regex for your task, as it is as simple as searching for comment signs and linefeeds.

The pyparsing module directly supports several styles of comments. E.g.,
from pyparsing import javaStyleComment
for match in javaStyleComment.scanString(text):
<do stuff>
So if your goal is just getting the job done, look into this since the comment parsers are likely to be more robust than anything you'd throw together. If you're more interested in learning to do it yourself, this might be too much processed food for your taste.

How can I go about securely executing a subset of python?

I need to store source code for a basic function in a database and allow it to be modified through an admin interface. This code will take several numbers and strings as parameters, and return a number or None. I know that eval is evil, so I need to implement a safe way to execute a very basic subset of python, or something syntactically similar at least, from within a python based web-app.
The obvious answer is to implement a DSL (Domain Specific Language), however, I have no experience with that, nor do I have any idea where to begin, and a lot of the resources available seem to go a little over my head. I'm hoping that maybe there is something already out there which will allow me to essentially generate a secure python-callable function from a string in a database. the language really only needs to support assignment, basic math, if/else, and case insensitive string comparisons. any other features are a bonus, but I think most things can be done with just that, no need for complex data structures, classes, functions, etc.
If no such thing currently exists, I'm willing to look into the possibility of creating one, but as I said, I have no idea how to go about that, and any advice in that regard would be appreciated as well.

Restricted Python environments are hard to make really safe.
Maybe something like lua is a better fit for you

PySandbox might help. I haven't tested it, just found it linked elsewhere.

You could use Pyparsing to implement your DSL, provided the expressions involved won't be too complex (you don't give full details on that but you imply the requirements are pretty simple). See the examples page including specifically fourFn.py or simpleCalc.py.

You could implement a subset of Python by using the ast module to parse Python code into an abstract syntax tree then walk the tree checking that it only uses the subset of Python that you allow. This will only work in Python 2.x since Python 3 has removed the ast module.
However even using this method it will be hard to create something that is 100% secure, since even the most innocuous code could allow the user to write something that could blow up your application, e.g. by allocating more memory than you have available or putting the program into an infinite loop using all the CPU.

Python implementation of Parsec?

I recently wrote a parser in Python using Ply (it's a python reimplementation of yacc). When I was almost done with the parser I discovered that the grammar I need to parse requires me to do some look up during parsing to inform the lexer. Without doing a look up to inform the lexer I cannot correctly parse the strings in the language.
Given than I can control the state of the lexer from the grammar rules I think I'll be solving my use case using a look up table in the parser module, but it may become too difficult to maintain/test. So I want to know about some of the other options.
In Haskell I would use Parsec, a library of parsing functions (known as combinators). Is there a Python implementation of Parsec? Or perhaps some other production quality library full of parsing functionality so I can build a context sensitive parser in Python?
EDIT: All my attempts at context free parsing have failed. For this reason, I don't expect ANTLR to be useful here.

I believe that pyparsing is based on the same principles as parsec.

PySec is another monadic parser, I don't know much about it, but it's worth looking at here

An option you may consider, if an LL parser is ok to you, is to give ANTLR a try, it can generate python too (actually it is LL(*) as they name it, * stands for the quantity of lookahead it can cope with).

Nothing prevents you for diverting your parser from the "context free" path using PLY. You can pass information to the lexer during parsing, and in this way achieve full flexibility. I'm pretty sure that you can parse anything you want with PLY this way.
For a hands-on example, consider - it is a parser for ANSI C written in Python with PLY. It solves the classic C typedef - identifier problem (that makes C's grammar non context-sensitive) by populating a symbol table in the parser that is being used in the lexer to resolve symbol names as either types or not.

There's ANTLR, which is LL(*), there's PyParsing, which is more object friendly and is sort of like a DSL, and then there's Parsing which is like OCaml's Menhir.

ANTLR is great and has the added benefit of working across multiple languages.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python package to parse identifiers in a program (C, Scala, Lisp)? - python

Related

Best python library to use for a BNF based autocompleter

How to select a subset from Python for parsing purpose

What should I know about Python to identify comments in different source files?

How can I go about securely executing a subset of python?

Python implementation of Parsec?

Categories

Resources