Alternative language for pattern-matching - python

I'm writing a program in Mathematica that relies on pattern-matching to perform payroll and warrant of payment verification. The crux of the problem is to compare different data files (both CSV and XLS) to make sure they contain the exact same information, since pay is handled by two different third-parties.
My use of Mathematica makes development of the program quite streamlined and fun, but is prohibitive on a distribution level. CDF format is not an option, since the program requires the user to import data files, something which WRI does not permit in CDF.
An ideal programming language for this task would enable me to pack it up as standalone, for OS X, Linux or Windows, as well as being able to do the pattern-matching. Support for GUI (primitive or extensive) is also needed.
I thought of Python to translate my program in, but I'm not sure if it's a good bet.
What suggestions do you have?
My only understanding of pattern-matching is that which the Mathematica documentation has taught me.
An example of a task that Mathematica handles perfectly is the following:
Import XLS file, sort data by dates and names, extract certain dates and names. Import CSV file, sort data by dates and names, extract certain dates and names.
Compare both, produce a nice formatted output containing desired (missing) information.
Navigating through the data in Mathematica is also pretty easy and intuitive.

Consider Haskell which seems to have all the features you want and is cross-platform.

If you want to use a more standard language that has some capabilities for working with spreadsheets, unless I'm misunderstanding the question, I would suggest using just simple Java with the Apache POI library, specifically made for horrible spreadsheet formats. Also it's considerably faster to pick up than Haskell is, though I suppose if you already know Mathematica it wouldn't be that bad to move over to another mathematically inclined language.

Prolog is a logical programming language in that it actually does a proof based on the facts that you give it. Thus is you provide it with the approiate facts for warrenty or payroll information it will be able to prove that it is either of them by trying to get to a base case in which both sides of an equation cancel. There is more to this but I'm on my phone at the moment.
For your situation you would be able to read data in a easier to program language and verify you parameters in Prolog and as long as your Prolog facts are correct it will be able to quickly verify that your data is valid. It can be thought of as regular expressions on steroids with a lot more functionality.
http://www.amzi.com/articles/lsapi_design.htm

Related

How to automate specman files changes with python?

I am working on a specman environment (hardware verification language), and I want to automate my tasks.
In order to do so, I learned Python programming with the target to use the file manipulation abilities. The problem is that I know only how to manipulate .txt files, Is there a way to change different kind of files?
Your question is way too generic. It's possible to change *.e files using string matching, maybe in some cases this makes sense as a one-time task, but there couldn't be any rules for that. Writing e parser in python doesn't sound like a feasible task.
The only reasonable way to analyze e code is to load it and use reflection. But not always you can feed the results to python to let it make any meaningful modifications.
It's totally possible to use python to generate e code based on some formally defined specs, specifically mentioned coverage, generation constraints, etc. It can be efficient and maintainable approach. However, there are different facilities for that, including tables.
Python certainly can be used for all kinds of smart scriptology: define environment, track installations and versions, choose flows, generate stubs, etc.

parsing PDB header information with python (protein structure files)?

Is there a parser for PDB-files (Protein Data Bank) that can extract (most) information from the header/REMARK-section, like refinement statistics, etc.?
It might be worthwhile to note that I am mainly interested in accessing data from files right after they have been produced, not from structures that have already been deposited in the Protein Data Bank. This means that there is quite a variety of different "propriety" formats to deal with, depending on the refinement software used.
I've had a look at Biopython, but they explicitly state in the FAQ that "If you are interested in data mining the PDB header, you might want to look elsewhere because there is only limited support for this."
I am well aware that it would be a lot easier to extract this information from mmCIF-files, but unfortunately these are still not output routinely from many macromolecular crystallography programs.
The best way I have found so far is converting the PDB-file into mmcif-format using pdb_extract (http://pdb-extract.wwpdb.org/ , either online or as a standalone).
The mmcif-file can be parsed using Biopythons Bio.PDB-module.
Writing to the mmcif-file is a bit trickier, Python PDBx seems to work reasonably well.
This, and other useful PDB-/mmcif-tools can be found at http://mmcif.wwpdb.org/docs/software-resources.html
Maybe you should try that library?
https://pypi.python.org/pypi/bioservices

Best way to build an application based on R?

I'm looking for suggestions on how to go about building an application that uses R for analytics, table generation, and plotting. What I have in mind is an application that:
displays various data tables in different tabs, somewhat like in Excel, and the columns should be sortable by clicking.
takes user input parameters in some dialog windows.
displays plots dynamically (i.e. user-input-dependent) either in a tab or in a new pop-up window/frame
Note that I am not talking about a general-purpose fron-end/GUI for exploring data with R (like say Rattle), but a specific application.
Some questions I'd like to see addressed are:
Is an entirely R-based approach even possible ( on Windows ) ? The following passage from the Rattle article in R-Journal intrigues me:
It is interesting to note that the
first implementation of Rattle
actually used Python for implementing
the callbacks and R for the
statistics, using rpy. The release
of RGtk2 allowed the interface el-
ements of Rattle to be written
directly in R so that Rattle is a
fully R-based application
If it's better to use another language for the GUI part, which language is best suited for this? I'm looking for a language where it's relatively "painless" to build the GUI, and that also integrates very well with R. From this StackOverflow question How should I do rapid GUI development for R and Octave methods (possibly with Python)? I see that Python + PyQt4 + QtDesigner + RPy2 seems to be the best combo. Is that the consensus ?
Anyone have pointers to specific (open source) applications of the type I describe, as examples that I can learn from?
There are lots of ways to do this, including the python approach you mention. If you want to do it solely within R and if your aims are modest enough, the gWidgets package can be used. This exposes some of the features of either RGtk2, tcltk or qtbase (see the qtinterfaces project on r-forge) in a manner that is about as painless as can be. If you want more, look at using those packages directly. I'd recommend RGtk2 if you are going to share with others and if not, qtbase or tcltk.
Python + Qt4 + RPy = Much Win.
For example, see what Carson Farmer has done with Qgis and the ManageR plugin - its a full R interface to geographic data in the Qgis mapping package.
Depending on how much statistical functionality you need you might even get away without needing it at all, doing all the stats in Python, leveraging such goodies as the Numpy numeric package and the Qwt plotting canvas.
How about traditional LAMP + a R backend? Optionally s/MySQL/Postgres and optionally s/PHP/Perl Rapache looks pretty cool too: rapache.net
If you go for C++, take a look at rcpp and Rinside
Java can be combined with R using JRI
RServe gives you a TCP/IP protocol to interact with R. There's a Java client and a C++ client, so either of them can be used.
On a sidenote: Another thing you should be aware of, is that R contains quite some libraries written in Fortran and C, that can be called directly. Same goes for more advanced packages like VGAM, they also contain quite some C routines. Depending on what exactly you want to do, you might try to work with those ones, just to avoid the overhead of the R functions itself.
I've been looking for an overview of those myself, but AFAIK you'll have to do some effort to get everything. Some things you certainly should look at is The R language definition and R Internals.

How can I go about securely executing a subset of python?

I need to store source code for a basic function in a database and allow it to be modified through an admin interface. This code will take several numbers and strings as parameters, and return a number or None. I know that eval is evil, so I need to implement a safe way to execute a very basic subset of python, or something syntactically similar at least, from within a python based web-app.
The obvious answer is to implement a DSL (Domain Specific Language), however, I have no experience with that, nor do I have any idea where to begin, and a lot of the resources available seem to go a little over my head. I'm hoping that maybe there is something already out there which will allow me to essentially generate a secure python-callable function from a string in a database. the language really only needs to support assignment, basic math, if/else, and case insensitive string comparisons. any other features are a bonus, but I think most things can be done with just that, no need for complex data structures, classes, functions, etc.
If no such thing currently exists, I'm willing to look into the possibility of creating one, but as I said, I have no idea how to go about that, and any advice in that regard would be appreciated as well.
Restricted Python environments are hard to make really safe.
Maybe something like lua is a better fit for you
PySandbox might help. I haven't tested it, just found it linked elsewhere.
You could use Pyparsing to implement your DSL, provided the expressions involved won't be too complex (you don't give full details on that but you imply the requirements are pretty simple). See the examples page including specifically fourFn.py or simpleCalc.py.
You could implement a subset of Python by using the ast module to parse Python code into an abstract syntax tree then walk the tree checking that it only uses the subset of Python that you allow. This will only work in Python 2.x since Python 3 has removed the ast module.
However even using this method it will be hard to create something that is 100% secure, since even the most innocuous code could allow the user to write something that could blow up your application, e.g. by allocating more memory than you have available or putting the program into an infinite loop using all the CPU.

Scripting language for trading strategy development

I'm currently working on a component of a trading product that will allow a quant or strategy developer to write their own custom strategies. I obviously can't have them write these strategies in natively compiled languages (or even a language that compiles to a bytecode to run on a vm) since their dev/test cycles have to be on the order of minutes.
I've looked at lua, python, ruby so far and really enjoyed all of them so far, but still found them a little "low level" for my target users. Would I need to somehow write my own parser + interpreter to support a language with a minimum of support for looping, simple arithmatic, logical expression evaluation, or is there another recommendation any of you may have? Thanks in advance.
Mark-Jason Dominus, the author of Perl's Text::Template module, has some insights that might be relevant:
When people make a template module
like this one, they almost always
start by inventing a special syntax
for substitutions. For example, they
build it so that a string like %%VAR%%
is replaced with the value of $VAR.
Then they realize the need extra
formatting, so they put in some
special syntax for formatting. Then
they need a loop, so they invent a
loop syntax. Pretty soon they have a
new little template language.
This approach has two problems: First,
their little language is crippled. If
you need to do something the author
hasn't thought of, you lose. Second:
Who wants to learn another language?
If you write your own mini-language, you could end up in the same predicament -- maintaining a grammar and a parser for a tool that's crippled by design.
If a real programming language seems a bit too low-level, the solution may not be to abandon the language but instead to provide your end users with higher-level utility functions, so that they can operate with familiar concepts without getting bogged down in the weeds of the underlying language.
That allows beginning users to operate at a high level; however, you and any end users with a knack for it -- your super-users -- can still leverage the full power of Ruby or Python or whatever.
It sounds like you might need to create some sort of Domain Specific Language (DSL) for your users that could be built loosely on top of the target language. Ruby, Python and Lua all have their various quirks regarding syntax, and to a degree some of these can be massaged with clever function definitions.
An example of a fairly robust DSL is Cucumber which implements a an interesting strategy of converting user-specified verbiage to actual executable code through a series of regular expressions applied to the input data.
Another candidate might be JavaScript, or some kind of DSL to JavaScript bridge, as that would allow the strategy to run either client-side or server-side. That might help scale your application since client machines often have surplus computing power compared to a heavily loaded server.
Custom-made modules are going to be needed, no matter what you choose, that define your firm's high level constructs.
Here are some of the needs I envision -- you may have some of these covered already: a way to get current positions, current and historical quotes, previous performance data, etc... into the application. Define/backtest/send various kinds of orders (limit/market/stop, what exchange, triggers) or parameters of options, etc... You probably are going to need multiple sandboxes for testing as well as the real thing.
Quants want to be able to do matrix operations, stochastic calculus, PDEs.
If you wanted to do it in python, loading NumPy would be a start.
You could also start with a proprietary system designed to do mathematical financial research such as something built on top of Mathematica or Matlab.
I've been working on a Python Algorithmic Trading Library (actually for backtesting, not for real trading). You may want to take a look at it: http://gbeced.github.com/pyalgotrade/
Check out http://www.tadeveloper.com for a backtesting framework using MATLAB as a scripting language. MATLAB has the advantage that it is very powerful but you do not need to be a programmer to use it.
This might be a bit simplistic, but a lot of quant users are used to working with Excel & VBA macros. Would something like VBSCript be usable, as they may have some experience in this area.
Existing languages are "a little "low level" for my target users."
Yet, all you need is "a minimum of support for looping, simple arithmatic, logical expression evaluation"
I don't get the problem. You only want a few features. What's wrong with the list of languages you provided? They actually offer those features?
What's the disconnect? Feel free to update your question to expand on what the problem is.
I would use Common Lisp, which supports rapid development (you have a running image and can compile/recompile individual functions) and tailoring the language to your domain. You would provide functions and macros as building blocks to express strategies, and the whole language would be available to the user for combining these.
Is something along the lines of Processing the complexity level that you're shooting for? Processing is a good example of taking a full-blown language (Java) and reducing/simplifying the available syntax into only a subset applicable to the problem domain (problem domain = visualization in the case of Processing).
Here's a little side-by-side comparison from the Processing docs.
Java:
g.setColor(Color.black)
fillRect(0, 0, size.width, size.height);
Processing:
background(0);
As others have suggested, you may be able to simply write enough high-level functions such that most of the complexity is hidden from the user but you still retain the ability to do more low-level things when necessary. The Wiring language for Arduino follows this strategy of using a thin layer of high-level functions on top of C in order to make it more accessible to non-programmers and hobbyists.
Define the language first -- if possible, use the pseudo-language called EBN, it's very simple (see the Wikipedia entry).
Then once you have that, pick the language. Almost certainly you will want to use a DSL. Ruby and Lua are both really good at that, IMO.
Once you start working on it, you may find that you go back to your definition and tweak it. But that's the right order to do things, I think.
I have been in the same boat building and trading with my own software. Java is not great because you want something higher level like you say. I have had a lot of success using the eclipse project xtext. http://www.eclipse.org/Xtext It does all the plumbing of building parsers etc. for you and using eclipse you can quickly generate code with functional editors. I suggest looking into this as you consider other options as well. This combined with the eclipse modeling framework is very powerful for quickly building DSL's which sounds like you need. - Duncan

Categories