Best python library to use for a BNF based autocompleter - python

I want to create a SQL autocompleter for use with rlwrap: https://github.com/hanslub42/rlwrap
This could then be used with sqlite3 & osqueri for example (I know they already have some autocompletion facility, but it's not good enough, especially under rlwrap).
In fact, more generally I would like to know the best approach for building autocompleters based on BNF grammar descriptions; I may want to produce autocompleters for other rlwrapped REPLs at some point in the future.
I have no experience with parsers, but I have read some stuff online about the different types of parsers and how they work, and this Pyleri tutorial: https://tomassetti.me/pyleri-tutorial/
Pyleri looks fairly straightforward, and has the expecting property which makes it easy to create a auto-completer, but AFAIK it would involve translating the sqlite BNF (and any other BNF's that I might want to use in the future) into python code, which is a drag.
ANTLR has lots of predefined grammar files for many different languages, and the ability to output python code, but I'm not sure how easy it is to produce an autocompleter, and I don't want to read through all the documentation only to find out I've wasted my time.
So can anyone advise me? What's the best approach?

Related

Is it possible to use a with grako generated parser without grako?

see the title. For a small tool I am writing I wanted to introduce a simple boolean filter language and decided to do that "properly" and use a parser-generator. After playing around with grako a bit I found I like it and got the filter-language done fairly quick (which is also nice :))
The problem is now, if I want to use the tool on other computers or give it to other people I first have to somehow make grako available there, which is a bit bothersome, because everything else is standard python3 stuff.
I guess it is possible by co-packaging the necessary grako-classes, but that seems a bit messy (licensing would be mentioned in any way). Maybe I have overlooked some built-in method.
The short answer is No.
Grako-generated parsers do require the grako library.
For example:
with self._group():
with self._choice():
with self._option():
self._token('nameguard')
with self._option():
self._token('ignorecase')
with self._option():
self._token('left_recursion')
self._error('expecting one of: ignorecase left_recursion nameguard')
All the self._xyz() come from either grako.contexts.ParseContext or grako.parsing.Parser. The backtracking, caching, and the book-keeping required are all hidden behind context managers and decorators.
Having generated parsers depend on grako was a design choice aimed at making the parsers smaller and easier to understand, which was one of the primary objectives of the project (as there are many otherwise-great parser generators that produce obfuscated code).
The other option was to copy the code that the generated parsers could depend on onto each parser, but that seemed a bit unpythonic.

Python package to parse identifiers in a program (C, Scala, Lisp)?

In the title I mention 3 different languages in which I would like to find out if a python package exists which can give me a list of identifiers for a program in any of those; so doesn't have to be all three of them as I doubt it there would be one like that. So my question is does a function or class exist in python that allows me too get a list of identifiers for a specific program in a language, preferably one in the 3 I listed in the title. Any help appreciated.
In general, this is not possible without having a nearly complete language implementation.
There is a rudimentary preprocessor in C, which could allow to mask function declarations from an ad hoc scanning. There is a powerful metaprogramming in Lisp, which means you can only extract the definitions using a full-featured Lisp compiler, simple parsing won't help at all.
Scala is the simplest of these three, but still its syntax is over-bloated and you'll need at least a complete parser. Python is not nearly a right tool for doing this sort of things any way.
There's pycparser, which you can use to generate a C AST from code and then traverse it to get whatever you want.
There's this simple lisp interpreter in Python from which you should be able to scrap the parser.
And I doubt there's anything similar and readily available for Scala, but you can use something like ply to make a parser. It won't be as easy, but will do.

What should I know about Python to identify comments in different source files?

I have a need to identify comments in different kinds of source files in a given directory. ( For example java,XML, JavaScript, bash). I have decided to do this using Python (as an attempt to learn Python). The questions I have are
1) What should I know about python to get this done? ( I have an idea that Regular Expressions will be useful but are there alternatives/other modules that will be useful? Libraries that I can use to get this done?)
2) Is Python a good choice for such a task? Will some other language make this easier to accomplish?
Your problem seems to be more related to programming language parsing. I believe with regular expressions you will be able to find comments in most of the languages. The good thing is that you have regular expressions almost everywhere: Perl, Python, Ruby, AWK, Sed, etc.
But, as the other answer said, you'd better use some parsing machinery. And, if not a full blown parser, a lexer. For Python, check out the Pygments library, which has lexers for many languages already implemented.
1) What you need to know about is parsing, not regex. Additionally you will need the os module and some knowledge about pythons file handling. DiveIntoPython (http://www.diveintopython.net/) is a good start here. I'd recommend chapter 6. (And maybe 1-5 as well :) )
2) Python is a good start. Another language is not going to make it easier, but different. Python allready is pretty simple to start with.
I would recommend not to use regex for your task, as it is as simple as searching for comment signs and linefeeds.
The pyparsing module directly supports several styles of comments. E.g.,
from pyparsing import javaStyleComment
for match in javaStyleComment.scanString(text):
<do stuff>
So if your goal is just getting the job done, look into this since the comment parsers are likely to be more robust than anything you'd throw together. If you're more interested in learning to do it yourself, this might be too much processed food for your taste.

How can I go about securely executing a subset of python?

I need to store source code for a basic function in a database and allow it to be modified through an admin interface. This code will take several numbers and strings as parameters, and return a number or None. I know that eval is evil, so I need to implement a safe way to execute a very basic subset of python, or something syntactically similar at least, from within a python based web-app.
The obvious answer is to implement a DSL (Domain Specific Language), however, I have no experience with that, nor do I have any idea where to begin, and a lot of the resources available seem to go a little over my head. I'm hoping that maybe there is something already out there which will allow me to essentially generate a secure python-callable function from a string in a database. the language really only needs to support assignment, basic math, if/else, and case insensitive string comparisons. any other features are a bonus, but I think most things can be done with just that, no need for complex data structures, classes, functions, etc.
If no such thing currently exists, I'm willing to look into the possibility of creating one, but as I said, I have no idea how to go about that, and any advice in that regard would be appreciated as well.
Restricted Python environments are hard to make really safe.
Maybe something like lua is a better fit for you
PySandbox might help. I haven't tested it, just found it linked elsewhere.
You could use Pyparsing to implement your DSL, provided the expressions involved won't be too complex (you don't give full details on that but you imply the requirements are pretty simple). See the examples page including specifically fourFn.py or simpleCalc.py.
You could implement a subset of Python by using the ast module to parse Python code into an abstract syntax tree then walk the tree checking that it only uses the subset of Python that you allow. This will only work in Python 2.x since Python 3 has removed the ast module.
However even using this method it will be hard to create something that is 100% secure, since even the most innocuous code could allow the user to write something that could blow up your application, e.g. by allocating more memory than you have available or putting the program into an infinite loop using all the CPU.

Scripting language for trading strategy development

I'm currently working on a component of a trading product that will allow a quant or strategy developer to write their own custom strategies. I obviously can't have them write these strategies in natively compiled languages (or even a language that compiles to a bytecode to run on a vm) since their dev/test cycles have to be on the order of minutes.
I've looked at lua, python, ruby so far and really enjoyed all of them so far, but still found them a little "low level" for my target users. Would I need to somehow write my own parser + interpreter to support a language with a minimum of support for looping, simple arithmatic, logical expression evaluation, or is there another recommendation any of you may have? Thanks in advance.
Mark-Jason Dominus, the author of Perl's Text::Template module, has some insights that might be relevant:
When people make a template module
like this one, they almost always
start by inventing a special syntax
for substitutions. For example, they
build it so that a string like %%VAR%%
is replaced with the value of $VAR.
Then they realize the need extra
formatting, so they put in some
special syntax for formatting. Then
they need a loop, so they invent a
loop syntax. Pretty soon they have a
new little template language.
This approach has two problems: First,
their little language is crippled. If
you need to do something the author
hasn't thought of, you lose. Second:
Who wants to learn another language?
If you write your own mini-language, you could end up in the same predicament -- maintaining a grammar and a parser for a tool that's crippled by design.
If a real programming language seems a bit too low-level, the solution may not be to abandon the language but instead to provide your end users with higher-level utility functions, so that they can operate with familiar concepts without getting bogged down in the weeds of the underlying language.
That allows beginning users to operate at a high level; however, you and any end users with a knack for it -- your super-users -- can still leverage the full power of Ruby or Python or whatever.
It sounds like you might need to create some sort of Domain Specific Language (DSL) for your users that could be built loosely on top of the target language. Ruby, Python and Lua all have their various quirks regarding syntax, and to a degree some of these can be massaged with clever function definitions.
An example of a fairly robust DSL is Cucumber which implements a an interesting strategy of converting user-specified verbiage to actual executable code through a series of regular expressions applied to the input data.
Another candidate might be JavaScript, or some kind of DSL to JavaScript bridge, as that would allow the strategy to run either client-side or server-side. That might help scale your application since client machines often have surplus computing power compared to a heavily loaded server.
Custom-made modules are going to be needed, no matter what you choose, that define your firm's high level constructs.
Here are some of the needs I envision -- you may have some of these covered already: a way to get current positions, current and historical quotes, previous performance data, etc... into the application. Define/backtest/send various kinds of orders (limit/market/stop, what exchange, triggers) or parameters of options, etc... You probably are going to need multiple sandboxes for testing as well as the real thing.
Quants want to be able to do matrix operations, stochastic calculus, PDEs.
If you wanted to do it in python, loading NumPy would be a start.
You could also start with a proprietary system designed to do mathematical financial research such as something built on top of Mathematica or Matlab.
I've been working on a Python Algorithmic Trading Library (actually for backtesting, not for real trading). You may want to take a look at it: http://gbeced.github.com/pyalgotrade/
Check out http://www.tadeveloper.com for a backtesting framework using MATLAB as a scripting language. MATLAB has the advantage that it is very powerful but you do not need to be a programmer to use it.
This might be a bit simplistic, but a lot of quant users are used to working with Excel & VBA macros. Would something like VBSCript be usable, as they may have some experience in this area.
Existing languages are "a little "low level" for my target users."
Yet, all you need is "a minimum of support for looping, simple arithmatic, logical expression evaluation"
I don't get the problem. You only want a few features. What's wrong with the list of languages you provided? They actually offer those features?
What's the disconnect? Feel free to update your question to expand on what the problem is.
I would use Common Lisp, which supports rapid development (you have a running image and can compile/recompile individual functions) and tailoring the language to your domain. You would provide functions and macros as building blocks to express strategies, and the whole language would be available to the user for combining these.
Is something along the lines of Processing the complexity level that you're shooting for? Processing is a good example of taking a full-blown language (Java) and reducing/simplifying the available syntax into only a subset applicable to the problem domain (problem domain = visualization in the case of Processing).
Here's a little side-by-side comparison from the Processing docs.
Java:
g.setColor(Color.black)
fillRect(0, 0, size.width, size.height);
Processing:
background(0);
As others have suggested, you may be able to simply write enough high-level functions such that most of the complexity is hidden from the user but you still retain the ability to do more low-level things when necessary. The Wiring language for Arduino follows this strategy of using a thin layer of high-level functions on top of C in order to make it more accessible to non-programmers and hobbyists.
Define the language first -- if possible, use the pseudo-language called EBN, it's very simple (see the Wikipedia entry).
Then once you have that, pick the language. Almost certainly you will want to use a DSL. Ruby and Lua are both really good at that, IMO.
Once you start working on it, you may find that you go back to your definition and tweak it. But that's the right order to do things, I think.
I have been in the same boat building and trading with my own software. Java is not great because you want something higher level like you say. I have had a lot of success using the eclipse project xtext. http://www.eclipse.org/Xtext It does all the plumbing of building parsers etc. for you and using eclipse you can quickly generate code with functional editors. I suggest looking into this as you consider other options as well. This combined with the eclipse modeling framework is very powerful for quickly building DSL's which sounds like you need. - Duncan

Categories