How to select a subset of Python for parsing purposes - python

I am working on an assignment and need to develop a Python-to-OpenModelica translator, for which I am using flex and bison in the initial stages. To begin with, I need to define a subset of the Python language on which I can run a complete demo. I am new to Python; can anybody suggest how I can define such a subset? Thanks.

Well, as you are probably not interested in writing it in Python itself, I guess the language reference is the best starting point. It defines the whole grammar of the language, so it is a good place to find the features you want to implement on your own; then you need to write your own grammar and a parser for it in your language of choice.
Otherwise, you could use the built-in Python language services to actually parse real Python code and turn it into an abstract syntax tree, for example.
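For instance, a minimal sketch using only the standard ast module (the snippet being parsed is just an illustration):

import ast

source = "x = 1 + 2\nprint(x)"
tree = ast.parse(source)      # parse real Python into an abstract syntax tree
print(ast.dump(tree))         # inspect the resulting node structure

# walking the tree shows which node types a piece of code actually uses,
# which is handy for deciding what belongs in your subset
print({type(node).__name__ for node in ast.walk(tree)})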
But if you are meant to implement only a subset, I don't think having the full language capabilities will do you any good, so you are better off starting with a real subset of the grammar. A good way to get to know which features you want to take over is to use the language yourself for a bit: do some tutorials etc. and see how the basic syntax works.

Related

Best python library to use for a BNF based autocompleter

I want to create a SQL autocompleter for use with rlwrap: https://github.com/hanslub42/rlwrap
This could then be used with sqlite3 & osqueri for example (I know they already have some autocompletion facility, but it's not good enough, especially under rlwrap).
In fact, more generally I would like to know the best approach for building autocompleters based on BNF grammar descriptions; I may want to produce autocompleters for other rlwrapped REPLs at some point in the future.
I have no experience with parsers, but I have read some stuff online about the different types of parsers and how they work, and this Pyleri tutorial: https://tomassetti.me/pyleri-tutorial/
Pyleri looks fairly straightforward, and has the expecting property, which makes it easy to create an auto-completer, but AFAIK it would involve translating the SQLite BNF (and any other BNFs that I might want to use in the future) into Python code, which is a drag.
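For reference, here is roughly what that looks like with Pyleri, using a toy grammar I made up rather than the real SQLite BNF:

from pyleri import Grammar, Keyword, Regex, Sequence

# toy grammar, not real SQLite -- just enough to show the expecting property
class MiniSql(Grammar):
    k_select = Keyword('SELECT')
    k_from = Keyword('FROM')
    r_name = Regex(r'[a-zA-Z_]\w*')
    START = Sequence(k_select, r_name, k_from, r_name)

result = MiniSql().parse('SELECT col ')   # incomplete input
print(result.is_valid)                    # False
print(result.expecting)                   # elements that could come next, e.g. FROM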
ANTLR has lots of predefined grammar files for many different languages, and the ability to output python code, but I'm not sure how easy it is to produce an autocompleter, and I don't want to read through all the documentation only to find out I've wasted my time.
So can anyone advise me? What's the best approach?

Python package to parse identifiers in a program (C, Scala, Lisp)?

In the title I mention 3 different languages for which I would like to find out if a Python package exists that can give me a list of identifiers for a program in any of them; it doesn't have to be all three, as I doubt there would be one like that. So my question is: does a function or class exist in Python that allows me to get a list of identifiers for a specific program in a language, preferably one of the 3 I listed in the title? Any help appreciated.
In general, this is not possible without having a nearly complete language implementation.
C has a rudimentary preprocessor, which can mask function declarations from ad hoc scanning. Lisp has powerful metaprogramming, which means you can only extract the definitions using a full-featured Lisp compiler; simple parsing won't help at all.
Scala is the simplest of the three, but its syntax is still bloated enough that you'll need at least a complete parser. Python is hardly the right tool for this sort of thing anyway.
There's pycparser, which you can use to generate a C AST from code and then traverse it to get whatever you want.
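Something along these lines (a small sketch; the C snippet and visitor are illustrative only):

from pycparser import c_parser, c_ast

# note: pycparser expects preprocessed C, so no #includes here
code = """
int add(int a, int b) { int result = a + b; return result; }
"""

class DeclCollector(c_ast.NodeVisitor):
    # every declaration node (functions, parameters, locals) carries a name
    def __init__(self):
        self.names = []
    def visit_Decl(self, node):
        if node.name:
            self.names.append(node.name)
        self.generic_visit(node)

tree = c_parser.CParser().parse(code)
collector = DeclCollector()
collector.visit(tree)
print(collector.names)   # ['add', 'a', 'b', 'result']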
There's this simple Lisp interpreter in Python from which you should be able to lift the parser.
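The parser part of such an interpreter is tiny; here is a sketch in the same spirit (source text in, nested lists out), from which identifiers can then be picked out:

def tokenize(src):
    return src.replace('(', ' ( ').replace(')', ' ) ').split()

def read(tokens):
    token = tokens.pop(0)
    if token == '(':
        expr = []
        while tokens[0] != ')':
            expr.append(read(tokens))
        tokens.pop(0)           # drop the closing ')'
        return expr
    return token                # atoms kept as plain strings

print(read(tokenize("(define (square x) (* x x))")))
# ['define', ['square', 'x'], ['*', 'x', 'x']]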
And I doubt there's anything similar and readily available for Scala, but you can use something like ply to make a parser. It won't be as easy, but will do.

Generating a parser in Python from JavaCC source?

I do mean the ??? in the title because I'm not exactly sure. Let me explain the situation.
I'm not a computer science student & I never did any compilers course. Till now I used to think that compiler writers, or students who took a compilers course, were outstanding, because they had to write the parser component of the compiler in whatever language they were writing the compiler in. It's not an easy job, right?
I'm dealing with an Information Retrieval problem. My desired programming language is Python.
Parser Nature:
http://ir.iit.edu/~dagr/frDocs/fr940104.0.txt is the sample corpus. This file contains around 50 documents with some XML-style markup (you can see it at the above link). I need to note down some other values like <DOCNO> FR940104-2-00001 </DOCNO> & <PARENT> FR940104-2-00001 </PARENT>, and I only need to index the <TEXT> </TEXT> portion of each document, which contains some varying tags that I need to strip out, a lot of <!-- --> comments that are to be ignored, and some &hyph; &space; character entities. I don't know why the corpus has things like this when it's known that it's neither meant to be rendered by a browser nor to be a proper XML document.
I thought of using a Python XML parser to extract the desired text. But after a little searching I found JavaCC parser source code (Parser.jj) for the same corpus I'm using here. A quick look-up on JavaCC, followed by compiler-compiler, revealed that compiler writers aren't as great as I thought after all. They use a compiler-compiler to generate parser code in the desired language. Wiki says the input to a compiler-compiler is a grammar (usually in BNF). This is where I'm lost.
Is Parser.jj the grammar (the input to the compiler-compiler called JavaCC)? It's definitely not BNF. What is this grammar called? Why does this grammar contain Java code? Isn't there any universal grammar language?
I want a Python parser for parsing the corpus. Is there any way I can translate Parser.jj to get a Python equivalent? If yes, what is it? If no, what are my other options?
By any chance, does anyone know what this corpus is? Where is its original source? I would like to see some description of it. It is distributed on the internet under the name frDocs.tar.gz.
Why do you call this "XML-style" markup? It looks like pretty standard/basic XML to me.
Try ElementTree or lxml. Instead of writing a parser, use one of the stable, well-hardened libraries that are already out there.
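For example, assuming each document can be isolated and the stray entities/comments cleaned up first, the extraction itself is only a few lines with the standard library:

import xml.etree.ElementTree as ET

sample = """<DOC>
<DOCNO> FR940104-2-00001 </DOCNO>
<PARENT> FR940104-2-00001 </PARENT>
<TEXT> The body text you actually want to index. </TEXT>
</DOC>"""

doc = ET.fromstring(sample)
print(doc.findtext('DOCNO').strip())   # FR940104-2-00001
print(doc.findtext('TEXT').strip())    # the indexable text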
An (E)BNF grammar alone won't get you a usable parser - let alone a whole compiler. It's just the grammar, i.e. syntax (and some syntax, like Python's indentation-based block rules, can't be modeled in it at all), not the semantics. Either you use separate tools for these aspects, or you use a more advanced framework (like Boost::Spirit in C++ or Parsec in Haskell) that unifies both.
JavaCC (like yacc) is responsible for generating a parser, i.e. the subprogram that makes sense of the tokens read from the source code. For this, they mix an (E)BNF-like notation with code written in the language the resulting parser will be in (e.g. for building a parse tree) - in this case, Java. Of course it would be possible to make up another language - but since the existing languages handle those tasks relatively well, it would be rather pointless. And since other parts of the compiler might be written by hand in the same language, it makes sense to leave the "I got ze tokens, what do I do wit them?" part to the person who will write those other parts ;)
I never heard of "PythonCC", and Google hasn't either (well, there's a "pythoncc" project on Google Code, but its description just says "pythoncc is a program that tries to generate optimized machine Code for Python scripts." and there have been no commits since March). Do you mean any of these Python parsing libraries/tools? I don't think there's a way to automatically convert the JavaCC code to a Python equivalent - but the whole thing looks rather simple, so if you dive in and learn a bit about parsing via JavaCC and [python library/tool of your choice], you might be able to translate it...

Scripting language for trading strategy development

I'm currently working on a component of a trading product that will allow a quant or strategy developer to write their own custom strategies. I obviously can't have them write these strategies in natively compiled languages (or even a language that compiles to bytecode to run on a VM), since their dev/test cycles have to be on the order of minutes.
I've looked at Lua, Python and Ruby so far and really enjoyed all of them, but still found them a little "low level" for my target users. Would I need to somehow write my own parser + interpreter to support a language with a minimum of support for looping, simple arithmetic and logical expression evaluation, or is there another recommendation any of you may have? Thanks in advance.
Mark-Jason Dominus, the author of Perl's Text::Template module, has some insights that might be relevant:
When people make a template module like this one, they almost always start by inventing a special syntax for substitutions. For example, they build it so that a string like %%VAR%% is replaced with the value of $VAR. Then they realize they need extra formatting, so they put in some special syntax for formatting. Then they need a loop, so they invent a loop syntax. Pretty soon they have a new little template language.
This approach has two problems: First, their little language is crippled. If you need to do something the author hasn't thought of, you lose. Second: Who wants to learn another language?
If you write your own mini-language, you could end up in the same predicament -- maintaining a grammar and a parser for a tool that's crippled by design.
If a real programming language seems a bit too low-level, the solution may not be to abandon the language but instead to provide your end users with higher-level utility functions, so that they can operate with familiar concepts without getting bogged down in the weeds of the underlying language.
That allows beginning users to operate at a high level; however, you and any end users with a knack for it -- your super-users -- can still leverage the full power of Ruby or Python or whatever.
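As a sketch of what such higher-level helpers might look like (the function names and the crossover rule are illustrations I made up, not an existing API):

def moving_average(prices, window):
    return sum(prices[-window:]) / window

def crossover_signal(prices, fast=10, slow=30):
    """Return 'buy', 'sell', or 'hold' from a simple moving-average crossover."""
    if len(prices) < slow:
        return 'hold'
    fast_ma = moving_average(prices, fast)
    slow_ma = moving_average(prices, slow)
    if fast_ma > slow_ma:
        return 'buy'
    if fast_ma < slow_ma:
        return 'sell'
    return 'hold'

print(crossover_signal([100 + 0.5 * i for i in range(40)]))   # 'buy'

The strategy writer only ever calls functions at this level; the full language underneath stays out of sight until they actually need it.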
It sounds like you might need to create some sort of Domain Specific Language (DSL) for your users that could be built loosely on top of the target language. Ruby, Python and Lua all have their various quirks regarding syntax, and to a degree some of these can be massaged with clever function definitions.
An example of a fairly robust DSL is Cucumber, which implements an interesting strategy of converting user-specified verbiage into actual executable code through a series of regular expressions applied to the input data.
Another candidate might be JavaScript, or some kind of DSL to JavaScript bridge, as that would allow the strategy to run either client-side or server-side. That might help scale your application since client machines often have surplus computing power compared to a heavily loaded server.
No matter what you choose, you are going to need custom-made modules that define your firm's high-level constructs.
Here are some of the needs I envision -- you may have some of these covered already: a way to get current positions, current and historical quotes, previous performance data, etc. into the application; defining/backtesting/sending various kinds of orders (limit/market/stop, which exchange, triggers) or option parameters, etc. You are probably going to need multiple sandboxes for testing as well as the real thing.
Quants want to be able to do matrix operations, stochastic calculus, PDEs.
If you wanted to do it in python, loading NumPy would be a start.
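For example, a quick sketch of the kind of vectorised work NumPy makes cheap (all parameters here are made up for illustration):

import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps, dt = 1000, 252, 1 / 252
mu, sigma, s0 = 0.05, 0.2, 100.0

# simulate geometric Brownian motion price paths in one vectorised step
shocks = rng.normal((mu - 0.5 * sigma ** 2) * dt, sigma * np.sqrt(dt),
                    size=(n_paths, n_steps))
paths = s0 * np.exp(np.cumsum(shocks, axis=1))
print(paths[:, -1].mean())   # average terminal price across all paths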
You could also start with a proprietary system designed to do mathematical financial research such as something built on top of Mathematica or Matlab.
I've been working on a Python Algorithmic Trading Library (actually for backtesting, not for real trading). You may want to take a look at it: http://gbeced.github.com/pyalgotrade/
Check out http://www.tadeveloper.com for a backtesting framework using MATLAB as a scripting language. MATLAB has the advantage that it is very powerful but you do not need to be a programmer to use it.
This might be a bit simplistic, but a lot of quant users are used to working with Excel & VBA macros. Would something like VBScript be usable, as they may have some experience in this area?
Existing languages are "a little 'low level' for my target users."
Yet, all you need is "a minimum of support for looping, simple arithmetic, logical expression evaluation".
I don't get the problem. You only want a few features. What's wrong with the list of languages you provided? They actually offer those features.
What's the disconnect? Feel free to update your question to expand on what the problem is.
I would use Common Lisp, which supports rapid development (you have a running image and can compile/recompile individual functions) and tailoring the language to your domain. You would provide functions and macros as building blocks to express strategies, and the whole language would be available to the user for combining these.
Is something along the lines of Processing the complexity level that you're shooting for? Processing is a good example of taking a full-blown language (Java) and reducing/simplifying the available syntax into only a subset applicable to the problem domain (problem domain = visualization in the case of Processing).
Here's a little side-by-side comparison from the Processing docs.
Java:
g.setColor(Color.black)
fillRect(0, 0, size.width, size.height);
Processing:
background(0);
As others have suggested, you may be able to simply write enough high-level functions such that most of the complexity is hidden from the user but you still retain the ability to do more low-level things when necessary. The Wiring language for Arduino follows this strategy of using a thin layer of high-level functions on top of C in order to make it more accessible to non-programmers and hobbyists.
Define the language first -- if possible, use the notation called EBNF (Extended Backus-Naur Form); it's very simple (see the Wikipedia entry).
Then once you have that, pick the language. Almost certainly you will want to use a DSL. Ruby and Lua are both really good at that, IMO.
Once you start working on it, you may find that you go back to your definition and tweak it. But that's the right order to do things, I think.
I have been in the same boat, building and trading with my own software. Java is not great because you want something higher level, like you say. I have had a lot of success using the Eclipse project Xtext. http://www.eclipse.org/Xtext It does all the plumbing of building parsers etc. for you, and using Eclipse you can quickly generate code with functional editors. I suggest looking into this as you consider other options as well. Combined with the Eclipse Modeling Framework, it is very powerful for quickly building DSLs, which sounds like what you need. - Duncan

Mini-languages in Python

I'm after creating a simple mini-language parser in Python, programming close to the problem domain and all that.
Anyway, I was wondering how the people on here would go about doing that - what are the preferred ways of doing this kind of thing in Python?
I'm not going to give specific details of what I'm after because at the moment I'm just investigating how easy this whole field is in Python.
Pyparsing is handy for writing "little languages". I gave a presentation at PyCon'06 on writing a simple adventure game engine, in which the language being parsed and interpreted was the game command set ("inventory", "take sword", "drop book", etc.). (Source code here.)
You can also find links to other pyparsing articles at the pyparsing wiki.
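A tiny example in the same spirit as that command set (the grammar below is made up, not the one from the talk):

from pyparsing import Optional, Word, alphas, oneOf

verb = oneOf("take drop look inventory")("verb")
obj = Optional(Word(alphas), default="")("item")
command = verb + obj

result = command.parseString("take sword")
print(result.verb, result.item)               # take sword
print(command.parseString("inventory").verb)  # inventory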
I have limited but positive experience with PLY (Python Lex-Yacc). It combines Lex and Yacc functionality in a pure-Python package. You may want to check it out.
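For a feel of the style, here is a minimal PLY sketch that lexes and parses sums of integers (yacc writes its table files to the working directory when run):

import ply.lex as lex
import ply.yacc as yacc

tokens = ('NUMBER', 'PLUS')
t_PLUS = r'\+'
t_ignore = ' '

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    t.lexer.skip(1)

precedence = (('left', 'PLUS'),)

def p_expr_plus(p):
    'expr : expr PLUS expr'
    p[0] = p[1] + p[3]

def p_expr_number(p):
    'expr : NUMBER'
    p[0] = p[1]

def p_error(p):
    print('syntax error')

lexer = lex.lex()
parser = yacc.yacc()
print(parser.parse('1 + 2 + 3'))   # 6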
Fellow Stackoverflow'er Ned Batchelder has a nice overview of available tools on his website. There's also an overview on the Python website itself.
I would recommend funcparserlib. It was written especially for parsing little languages and DSLs and it is faster and smaller than pyparsing (see stats on its homepage). Minimalists and functional programmers should like funcparserlib.
Edit: By the way, I'm the author of this library, so my opinion may be biased.
Python is such a wonderfully simple and extensible language that I'd suggest merely creating a comprehensive python module, and coding against that.
I see that while I typed up the above, PLY has already been mentioned.
If you asked me this now, I would try the textX library for Python. You can very easily create a DSL with it in Python! The advantages are that it creates an AST for you, and lexing and parsing are combined.
http://igordejanovic.net/textX/
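A rough sketch of the textX style, patterned on its hello-world example (the grammar below is invented for illustration):

from textx import metamodel_from_str

grammar = """
Model: commands+=Command;
Command: verb=Verb item=ID;
Verb: 'take' | 'drop';
"""

mm = metamodel_from_str(grammar)
model = mm.model_from_str("take sword drop book")
print([(c.verb, c.item) for c in model.commands])
# [('take', 'sword'), ('drop', 'book')]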
In order to be productive, I'd always use a parser generator like CocoPy (Tutorial) to have your grammar transformed into a (correct) parser (unless you want to implement the parser manually for the sake of learning).
The rest is writing the actual interpreter/compiler (create stack-based bytecode or an in-memory AST, then evaluate it).
