Lexing and Parsing Utilities

Lexing and Parsing Utilities - python

I'm looking for lexical analysis and parser-generating utilities that are not Flex or Bison. Requirements:
Parser is specified using a context-free LL(*) or GLR grammar. I would also consider PEGs.
Integrates tightly with a programming language that could be used for both scripting and application development. Language should also have facilities for easily interfacing with C. Good examples are Python, Ruby, and Guile. No C, Java, or Perl please. I want the language to be homogeneous; I want the parser generator to output code in the same language.
Well-documented and production-quality.
Open source. Free is also desirable (although not required).
Compatible with Linux distributions or one of the open source BSDs. I would consider OpenSolaris.
Rapid development is a considerably greater concern than efficiency.
Suited to parsing natural language as well as formal languages. Natural language parsing is limited to short, simple sentences with very little ambiguity.
I have my eye on ANTLR, although I have never used it. Comments to that effect are appreciated. Let me know what your favorite utilities are that meet these requirements, and why you would recommend them.

There is a list of modern Packrat parsers here.

NL text tends have to lots of ambiguity. If you want to parse natural langauge, I don't think any of the classic compiler-type parser generators (LALR, LL [including ANTLR]) will help you much, and compiler type parser generators typically don't handle this at all.
A GLR parser, which does handle ambiguity, may be of some use; bison offers this as an option.

Guile 2.0 (to be released in about a few days) has an LALR(1) parsing library.

Related

Ruby/Python - generating and parsing C/C++ code

I need to generate C structs and arrays from data stored in a db table, and alternately parse similar info. I use both ruby and python for this task, and was wondering if anyone heard of a module/lib that handles this for either/both languages? I could do this on my own with some string processing, but wanted to check if there's a known and tested parser out there that people know of. thanks.

Check out open-source software tool SWIG (Simplified Wrapper and Interface Generator). First sentence of the intro on the webpage:
SWIG is a software development tool
that connects programs written in C
and C++ with a variety of high-level
programming languages. SWIG is used
with different types of languages
including common scripting languages
such as Perl, PHP, Python, Tcl and
Ruby.
Very mature (initial release - February 1996 according to Wikipedia) and there are lots of tutorials, documentation, and help.

There is a basic C struct parser here on the pyparsing wiki. Pyparsing is a Python module for creating parsers by assembling separate parsing building blocks together. (No help on the Ruby part of the question, though.)

Haven't used it myself, but CAST might be worth a look:
http://cast.rubyforge.org/

Scripting language for trading strategy development

I'm currently working on a component of a trading product that will allow a quant or strategy developer to write their own custom strategies. I obviously can't have them write these strategies in natively compiled languages (or even a language that compiles to a bytecode to run on a vm) since their dev/test cycles have to be on the order of minutes.
I've looked at lua, python, ruby so far and really enjoyed all of them so far, but still found them a little "low level" for my target users. Would I need to somehow write my own parser + interpreter to support a language with a minimum of support for looping, simple arithmatic, logical expression evaluation, or is there another recommendation any of you may have? Thanks in advance.

Mark-Jason Dominus, the author of Perl's Text::Template module, has some insights that might be relevant:
When people make a template module
like this one, they almost always
start by inventing a special syntax
for substitutions. For example, they
build it so that a string like %%VAR%%
is replaced with the value of $VAR.
Then they realize the need extra
formatting, so they put in some
special syntax for formatting. Then
they need a loop, so they invent a
loop syntax. Pretty soon they have a
new little template language.
This approach has two problems: First,
their little language is crippled. If
you need to do something the author
hasn't thought of, you lose. Second:
Who wants to learn another language?
If you write your own mini-language, you could end up in the same predicament -- maintaining a grammar and a parser for a tool that's crippled by design.
If a real programming language seems a bit too low-level, the solution may not be to abandon the language but instead to provide your end users with higher-level utility functions, so that they can operate with familiar concepts without getting bogged down in the weeds of the underlying language.
That allows beginning users to operate at a high level; however, you and any end users with a knack for it -- your super-users -- can still leverage the full power of Ruby or Python or whatever.

It sounds like you might need to create some sort of Domain Specific Language (DSL) for your users that could be built loosely on top of the target language. Ruby, Python and Lua all have their various quirks regarding syntax, and to a degree some of these can be massaged with clever function definitions.
An example of a fairly robust DSL is Cucumber which implements a an interesting strategy of converting user-specified verbiage to actual executable code through a series of regular expressions applied to the input data.
Another candidate might be JavaScript, or some kind of DSL to JavaScript bridge, as that would allow the strategy to run either client-side or server-side. That might help scale your application since client machines often have surplus computing power compared to a heavily loaded server.

Custom-made modules are going to be needed, no matter what you choose, that define your firm's high level constructs.
Here are some of the needs I envision -- you may have some of these covered already: a way to get current positions, current and historical quotes, previous performance data, etc... into the application. Define/backtest/send various kinds of orders (limit/market/stop, what exchange, triggers) or parameters of options, etc... You probably are going to need multiple sandboxes for testing as well as the real thing.
Quants want to be able to do matrix operations, stochastic calculus, PDEs.
If you wanted to do it in python, loading NumPy would be a start.
You could also start with a proprietary system designed to do mathematical financial research such as something built on top of Mathematica or Matlab.

I've been working on a Python Algorithmic Trading Library (actually for backtesting, not for real trading). You may want to take a look at it: http://gbeced.github.com/pyalgotrade/

Check out http://www.tadeveloper.com for a backtesting framework using MATLAB as a scripting language. MATLAB has the advantage that it is very powerful but you do not need to be a programmer to use it.

This might be a bit simplistic, but a lot of quant users are used to working with Excel & VBA macros. Would something like VBSCript be usable, as they may have some experience in this area.

Existing languages are "a little "low level" for my target users."
Yet, all you need is "a minimum of support for looping, simple arithmatic, logical expression evaluation"
I don't get the problem. You only want a few features. What's wrong with the list of languages you provided? They actually offer those features?
What's the disconnect? Feel free to update your question to expand on what the problem is.

I would use Common Lisp, which supports rapid development (you have a running image and can compile/recompile individual functions) and tailoring the language to your domain. You would provide functions and macros as building blocks to express strategies, and the whole language would be available to the user for combining these.

Is something along the lines of Processing the complexity level that you're shooting for? Processing is a good example of taking a full-blown language (Java) and reducing/simplifying the available syntax into only a subset applicable to the problem domain (problem domain = visualization in the case of Processing).
Here's a little side-by-side comparison from the Processing docs.
Java:
g.setColor(Color.black)
fillRect(0, 0, size.width, size.height);
Processing:
background(0);
As others have suggested, you may be able to simply write enough high-level functions such that most of the complexity is hidden from the user but you still retain the ability to do more low-level things when necessary. The Wiring language for Arduino follows this strategy of using a thin layer of high-level functions on top of C in order to make it more accessible to non-programmers and hobbyists.

Define the language first -- if possible, use the pseudo-language called EBN, it's very simple (see the Wikipedia entry).
Then once you have that, pick the language. Almost certainly you will want to use a DSL. Ruby and Lua are both really good at that, IMO.
Once you start working on it, you may find that you go back to your definition and tweak it. But that's the right order to do things, I think.

I have been in the same boat building and trading with my own software. Java is not great because you want something higher level like you say. I have had a lot of success using the eclipse project xtext. http://www.eclipse.org/Xtext It does all the plumbing of building parsers etc. for you and using eclipse you can quickly generate code with functional editors. I suggest looking into this as you consider other options as well. This combined with the eclipse modeling framework is very powerful for quickly building DSL's which sounds like you need. - Duncan

Parsing XML - right scripting languages / packages for the job?

I know that any language is capable of parsing XML; I'm really just looking for advantages or drawbacks that you may have come across in your own experiences. Perl would be my standard go to here, but I'm open to suggestions.
Thanks!
UPDATE: I ended up going with XML::Simple which did a nice job, but I have one piece of advice if you plan to use it--research the forcearray option first. I had to rewrite a bunch of statements after learning that it is usually best practice to set forcearray. This page had the clearest explanation that I could find. Frankly, I'm surprised this isn't the default behavior.

If you are using Perl then I would recommend XML::Simple:
As more and more Web sites begin using
XML for their content, it's
increasingly important for Web
developers to know how to parse XML
data and convert it into different
formats. That's where the Perl module
called XML::Simple comes in. It takes
away the drudgery of parsing XML data,
making the process easier than you
ever thought possible.

XML::Twig is very nice, especially because it’s not as awfully verbose as some of the other options.

For pure XML parsing, I wouldn't use Java, C#, C++, C, etc. They tend to overcomplicate things, as in you want a banana and get the gorilla with it as well.
Higher-level and interpreted languages such as Perl, PHP, Python, Groovy are more suitable. Perl is included in virtually every Linux distro, as is PHP for the most part.
I've used Groovy recently for especially this and found it very easy. Mind you though that a C parser will be orders of magnitude faster than Groovy for instance.

It's all going to be in the libraries.
Python has great libraries for XML. My preference is lxml. It uses libxml/libxslt so it's fast, but the Python binding make it really easy to use. Perl may very well have equally awesome OO libraries.

I saw that people recommend XML::Simple if you decide on Perl.
While XML::Simple is, indeed, very simple to use and great, is a DOM parser. As such, it is, sadly, completely unsuitable to processing large XML files as your process would run out of memory (it's a common problem for any DOM parser, not limited to XML::Simple or Perl).
So, for large files, you must pick a SAX parser in whichever language you choose (there are many XML SAX parsers in Perl, or use another stream parser like XML::Twig that is even better than standard SAX parser. Can't speak for other languages).

Not exactly a scripting language, but you could also consider Scala. You can start from here.

Scala's XML support is rather good, especially as XML can just be typed directly into Scala programs.
Microsoft also did some cool integrated stuff with their LINQ for XML
But I really like Elementtree and just that package alone is a good reason to use Python instead of Perl ;)
Here's an example:
import elementtree.ElementTree as ET
# build a tree structure
root = ET.Element("html")
head = ET.SubElement(root, "head")
title = ET.SubElement(head, "title")
title.text = "Page Title"
body = ET.SubElement(root, "body")
body.set("bgcolor", "#ffffff")
body.text = "Hello, World!"
# wrap it in an ElementTree instance, and save as XML
tree = ET.ElementTree(root)
tree.write("page.xhtml")

It's not a scripting language, but Scala is great for working with XML natively. Also, see this book (draft) by Burak.

Python has some pretty good support for XML. From the standard library DOM packages to much more 'pythonic' libraries that parse XML directly into more usable object structures.
There isn't really a 'right' language though... there are good XML packages for most languages nowadays.

If you're going to use Ruby to do it then you're going to want to take a look at Nokogiri or Hpricot. Both have their strengths and weaknesses. The language and package selection really comes down to what you want to do with the data after you've parsed it.

Reading Data out of XML files is dead easy with C# and LINQ to XML!
Somehow, although I really love python, I found it hard to parse XML with the standard libraries.

I would say it depends like everything else. VB.NET 2008 uses XML literals, has IntelliSense for LINQ to XML, and a few power toys that help turn XML into XSD. So personally, if you are working in a .NET environment I think this is the best choice.

What's a good resource for starting to write a programming language, that's not context free? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
I'm looking to write a programming language for fun, however most of the resource I have seen are for writing a context free language, however I wish to write a language that, like python, uses indentation, which to my understanding means it can't be context free.

A context-free grammar is, simply, one that doesn't require a symbol table in order to correctly parse the code. A context-sensitive grammar does.
The D programming language is an example of a context free grammar. C++ is a context sensitive one. (For example, is T*x declaring x to be pointer to T, or is it multiplying T by x ? We can only tell by looking up T in the symbol table to see if it is a type or a variable.)
Whitespace has nothing to do with it.
D uses a context free grammar in order to greatly simplify parsing it, and so that simple tools can parse it (such as syntax highlighting editors).

You might want to read this rather well written essay on parsing Python, Python: Myths about Indentation.
While I haven't tried to write a context free parser using something like yacc, I think it may be possible using a conditional lexer to return the indentation change tokens as described in the url.
By the way, here is the official python grammar from python.org: http://www.python.org/doc/current/ref/grammar.txt

I would familiarize myself with the problem first by reading up on some of the literature that's available on the subject. The classic Compilers book by Aho et. al. may be heavy on the math and comp sci, but a much more aproachable text is the Let's Build a Compiler articles by Jack Crenshaw. This is a series of articles that Mr. Crenshaw wrote back in the late 80's and it's the most under-appreciated text on compilers ever written. The approach is simple and to the point: Mr. Crenshaw shows "A" approach that works. You can easily go through the content in the span of a few evenings and have a much better understanding of what a compiler is all about. A couple of caveats are that the examples in the text are written in Turbo Pascal and the compilers emit 68K assembler. The examples are easy enough to port to a more current programming language and I recomment Python for that. But if you want to follow along as the examples are presented you will at least need Turbo Pascal 5.5 and a 68K assembler and emulator. The text is still relevant today and using these old technologies is really fun. I highly recommend it as anyone's first text on compilers. The great news is that languages like Python and Ruby are open sourced and you can download and study the C source code in order to better understand how it's done.

"Context-free" is a relative term. Most context-free parsers actually parse a superset of the language which is context-free and then check the resulting parse tree to see if it is valid. For example, the following two C programs are valid according to the context-free grammar of C, but one quickly fails during context-checking:
int main()
{
int i;
i = 1;
return 0;
}
int main()
{
int i;
i = "Hello, world";
return 0;
}
Free of context, i = "Hello, world"; is a perfectly valid assignment, but in context you can see that the types are all wrong. If the context were char* i; it would be okay. So the context-free parser will see nothing wrong with that assignment. It's not until the compiler starts checking types (which are context dependent) that it will catch the error.
Anything that can be produced with a keyboard can be parsed as context-free; at the very least you can check that all the characters used are valid (the set of all strings containing only displayable Unicode Characters is a context-free grammar). The only limitation is how useful your grammar is and how much context-sensitive checking you have to do on your resulting parse tree.
Whitespace-dependent languages like Python make your context-free grammar less useful and therefore require more context-sensitive checking later on (much of this is done at runtime in Python through dynamic typing). But there is still plenty that a context-free parser can do before context-sensitive checking is needed.

I don't know of any tutorials/guides, but you could try looking at the source for tinypy, it's a very small implementation of a python like language.

Using indentation in a language doesn't necessarily mean that the language's grammar can not be context free. I.e. the indentation will determine in which scope a statement exists. A statement will still be a statement no matter which scope it is defined within (scope can often be handled by a different part of the compiler/interpreter, generally during a semantic parse).
That said a good resource is the antlr tool (http://www.antlr.org). The author of the tool has also produced a book on creating parsers for languages using antlr (http://www.pragprog.com/titles/tpantlr/the-definitive-antlr-reference). There is pretty good documentation and lots of example grammars.

If you're really going to take a whack at language design and implementation, you might want to add the following to your bookshelf:
Programming Language Pragmatics, Scott et al.
Design Concepts in Programming Languages, Turbak et al.
Modern Compiler Design, Grune et al. (I sacrilegiously prefer this to "The Dragon Book" by Aho et al.)
Gentler introductions such as:
Crenshaw's tutorial (as suggested by #'Jonas Gorauskas' here)
The Definitive ANTLR Reference by Parr
Martin Fowler's recent work on DSLs
You should also consider your implementation language. This is one of those areas where different languages vastly differ in what they facilitate. You should consider languages such as LISP, F# / OCaml, and Gilad Bracha's new language Newspeak.

I would recommend that you write your parser by hand, in which case having significant whitespace should not present any real problems.
The main problem with using a parser generator is that it is difficult to get good error recovery in the parser. If you plan on implementing an IDE for your language, then having good error recovery is important for getting things like Intellisence to work. Intellisence always works on incomplete syntactic constructs, and the better the parser is at figuring out what construct the user is trying to type, the better an intellisence experience you can deliver.
If you write a hand-written top-down parser, you can pretty much implement what ever rules you want, where ever you want to. This is what makes it easy to provide error recovery. It will also make it trivial for you to implement significant whitespace. You can simply store what the current indentation level is in a variable inside your parser class, and can stop parsing blocks when you encounter a token on a new line that has a column position that is less than the current indentation level. Also, chances are that you are going to run into ambiguities in your grammar. Most “production” languages in wide use have syntactic ambiguities. A good example is generics in C# (there are ambiguities around "<" in an expression context, it can be either a "less-than" operator, or the start of a "generic argument list"). In a hand-written parser solving ambiguities like that are trivial. You can just add a little bit of non-determinism where you need it with relatively little impact on the rest of the parser,
Furthermore, because you are designing the language yourself, you should assume it's design is going to evolve rapidly (for some languages with standards committees, like C++ this is not the case). Making changes to automatically generated parsers to either handle ambiguities, or evolve the language, may require you to do significant refactoring of the grammar, which can be both irritating and time consuming. Changes to hand written parsers, particularly for top-down parsers, are usually pretty localized.
I would say that parser generators are only a good choice if:
You never plan on writing an IDE ever,
The language has really simple syntax, or
You need a parser extremely quickly, and are ok with a bad user experience

Have you read Aho, Sethi, Ullman: "Compilers: Principles, Techniques, and Tools"? It is a classical language reference book.
/Allan

If you've never written a parser before, start with something simple. Parsers are surprisingly subtle, and you can get into all sorts of trouble writing them if you've never studied the structure of programming languages.
Reading Aho, Sethi, and Ullman (it's known as "The Dragon Book") is a good plan. Contrary to other contributors, I say you should play with simpler parser generators like Yacc and Bison first, and only when you get burned because you can't do something with that tool should you go on to try to build something with an LL(*) parser like Antlr.

Just because a language uses significant indentation doesn't mean that it is inherently context-sensitive. As an example, Haskell makes use of significant indentation, and (to my knowledge) its grammar is context-free.
An example of source requiring a context-sensitive grammar could be this snippet from Ruby:
my_essay = << END_STR
This is within the string
END_STR
<< self
def other_method
...
end
end
Another example would be Scala's XML mode:
def doSomething() = {
val xml = <code>def val <tag/> class</code>
xml
}
As a general rule, context-sensitive languages are slightly harder to imagine in any precise sense and thus far less common. Even Ruby and Scala don't really count since their context sensitive features encompass only a minor sub-set of the language. If I were you, I would formulate my grammar as inspiration dictates and then worry about parsing methodologies at a later date. I think you'll find that whatever you come up with will be naturally context-free, or very close to it.
As a final note, if you really need context-sensitive parsing tools, you might try some of the less rigidly formal techniques. Parser combinators are used in Scala's parsing. They have some annoying limitations (no lexing), but they aren't a bad tool. LL(*) tools like ANTLR also seem to be more adept at expressing such "ad hoc" parsing escapes. Don't try to use Yacc or Bison with a context-sensitive grammar, they are far to strict to express such concepts easily.

A context-sensitive language? This one's non-indented: Protium (http://www.protiumble.com)

Python vs. Ruby for metaprogramming [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
I'm currently primarily a D programmer and am looking to add another language to my toolbox, preferably one that supports the metaprogramming hacks that just can't be done in a statically compiled language like D.
I've read up on Lisp a little and I would love to find a language that allows some of the cool stuff that Lisp does, but without the strange syntax, etc. of Lisp. I don't want to start a language flame war, and I'm sure both Ruby and Python have their tradeoffs, so I'll list what's important to me personally. Please tell me whether Ruby, Python, or some other language would be best for me.
Important:
Good metaprogramming. Ability to create classes, methods, functions, etc. at runtime. Preferably, minimal distinction between code and data, Lisp style.
Nice, clean, sane syntax and consistent, intuitive semantics. Basically a well thought-out, fun to use, modern language.
Multiple paradigms. No one paradigm is right for every project, or even every small subproblem within a project.
An interesting language that actually affects the way one thinks about programming.
Somewhat important:
Performance. It would be nice if performance was decent, but when performance is a real priority, I'll use D instead.
Well-documented.
Not important:
Community size, library availability, etc. None of these are characteristics of the language itself, and all can change very quickly.
Job availability. I am not a full-time, professional programmer. I am a grad student and programming is tangentially relevant to my research.
Any features that are primarily designed with very large projects worked on by a million code monkeys in mind.

I've read up on Lisp a little and I would love to find a language that allows some of the cool stuff that Lisp does, but without the strange syntax, etc. of Lisp.
Wouldn't we all.
minimal distinction between code and data, Lisp style
Sadly, the minimal distinction between code and data and "strange" syntax are consequences of each other.
If you want easy-to-read syntax, you have Python. However, the code is not represented in any of the commonly-used built-in data structures. It fails—as most languages do—in item #1 of your 'important' list. That makes it difficult to provide useful help.
You can't have it all. Remember, you aren't the first to have this thought. If something like your ideal language existed, we'd all be using it. Since the real world falls short of your ideals, you'll have to re-prioritize your wish list. The "important" section has to be rearranged to identify what's really important to you.

Honestly, as far as metaprogramming facilities go, Ruby and Python are a lot more similar than some of their adherent like to admit. This review of both language offers a pretty good comparison/review:
http://regebro.wordpress.com/2009/07/12/python-vs-ruby/
So, just pick one based on some criteria. Maybe you like Rails and want to study that code. Maybe SciPy is your thing. Look at the ecosystem of libraries, community, etc, and pick one. You certainly won't lose out on some metaprogramming nirvana based on your choice of either.

Disclaimer: I only dabble in either language, but I have at least written small working programs (not just quick scripts, for which I use Perl, bash or GNU make) in both.
Ruby can be really nice for the "multiple paradigms" point 3, because it works hard to make it easy to create domain-specific languages. For example, browse online and look at a couple of bits of Ruby on Rails code, and a couple of bits of Rake code. They're both Ruby, and you can see the similarities, but they don't look like what you'd normally think of as the same language.
Python seems to me to be a bit more predictable (possibly correlated to 'clean' and 'sane' point 2), but I don't really know whether that's because of the language itself or just that it's typically used by people with different values. I have never attempted deep magic in Python. I would certainly say that both languages are well thought out.
Both score well in 1 and 4. [Edit: actually 1 is pretty arguable - there is "eval" in both, as common in interpreted languages, but they're hardly conceptually pure. You can define closures, assign methods to objects, and whatnot. Not sure whether this goes as far as you want.]
Personally I find Ruby more fun, but in part that's because it's easier to get distracted thinking of cool ways to do things. I've actually used Python more. Sometimes you don't want cool, you want to get on with it so it's done before bedtime...
Neither of them is difficult to get into, so you could just decide to do your next minor task in one, and the one after that in the other. Or pick up an introductory book on each from the library, skim-read them both and see what grabs you.

There's not really a huge difference between python and ruby at least at an ideological level. For the most part, they're just different flavors of the same thing. Thus, I would recommend seeing which one matches your programming style more.

Have you considered Smalltalk? It offers a very simple, clear and extensible syntax with reflectivity and introspection capabilities and a fully integrated development environment that takes advantage of those capabilities. Have a look at some of the work being done in Squeak Smalltalk for instance. A lot of researchers using Squeak hang out on the Squeak mailing list and #squeak on freenode, so you can get help on complex issues very easily.
Other indicators of its current relevance: it runs on any platform you'd care to name (including the iPhone); Gilad Bracha is basing his Newspeak work on Squeak; the V8 team cut their teeth on Smalltalk VMs; and Dan Ingalls and Randal Schwartz have recently returned to Smalltalk work after years in the wilderness.
Best of luck with your search - let us know what you decide in the end.

Lisp satisfies all your criteria, including performance, and it is the only language that doesn't have (strange) syntax. If you eschew it on such an astoundingly ill-informed/wrong-headed basis and consequently miss out on the experience of using e.g. Emacs+SLIME+CL, you'll be doing yourself a great disservice.

Your 4 "important" points lead to Ruby exactly, while the 2 "somewhat important" points ruled by Python. So be it.

You are describing Ruby.
Good metaprogramming. Ability to create classes, methods, functions,
etc. at runtime. Preferably, minimal
distinction between code and data,
Lisp style.
It's very easy to extend and modify existing primitives at runtime. In ruby everything is an object, strings, integers, even functions.
You can also construct shortcuts for syntactic sugar, for example with class_eval.
Nice, clean, sane syntax and consistent, intuitive semantics.
Basically a well thought-out, fun to
use, modern language.
Ruby follows the principle of less surprise, and when comparing Ruby code vs the equivalent in other language many people consider it more "beautiful".
Multiple paradigms. No one paradigm is right for every project,
or even every small subproblem within
a project.
You can follow imperative, object oriented, functional and reflective.
An interesting language that actually affects the way one thinks
about programming.
That's very subjective, but from my point of view the ability to use many paradigms at the same time allows for very interesting ideas.
I've tried Python and it doesn't fit your important points.

Compare code examples that do the same thing (join with a newline non-empty descriptions of items from a myList list) in different languages (languages are arranged in reverse-alphabetic order):
Ruby:
myList.collect { |f| f.description }.select { |d| d != "" }.join("\n")
Or
myList.map(&:description).reject(&:empty?).join("\n")
Python:
descriptions = (f.description() for f in mylist)
"\n".join(filter(len, descriptions))
Or
"\n".join(f.description() for f in mylist if f.description())
Perl:
join "\n", grep { $_ } map { $_->description } #myList;
Or
join "\n", grep /./, map { $_->description } #myList;
Javascript:
myList.map(function(e) e.description())
.filter(function(e) e).join("\n")
Io:
myList collect(description) select(!="") join("\n")
Here's an Io guide.

Ruby would be better than Lisp in terms of being "mainstream" (whatever that really means, but one realistic concern is how easy it would be to find answers to your questions on Lisp programming if you were to go with that.) In any case, I found Ruby very easy to pick up. In the same amount of time that I had spent first learning Python (or other languages for that matter), I was soon writing better code much more efficiently than I ever had before. That's just one person's opinion, though; take it with a grain of salt, I guess. I know much more about Ruby at this point than I do Python or Lisp, but you should know that I was a Python person for quite a while before I switched.
Lisp is definitely quite cool and worth looking into; as you said, the size of community, etc. can change quite quickly. That being said, the size itself isn't as important as the quality of the community. For example, the #ruby-lang channel is still filled with some incredibly smart people. Lisp seems to attract some really smart people too. I can't speak much about the Python community as I don't have a lot of firsthand experience, but it seems to be "too big" sometimes. (I remember people being quite rude on their IRC channel, and from what I've heard from friends that are really into Python, that seems to be the rule rather than the exception.)
Anyway, some resources that you might find useful are:
1) The Pragmatic Programmers Ruby Metaprogramming series (http://www.pragprog.com/screencasts/v-dtrubyom/the-ruby-object-model-and-metaprogramming) -- not free, but the later episodes are quite intriguing. (The code is free, if you want to download it and see what you'd be learning about.)
2) On Lisp by Paul Graham (http://www.paulgraham.com/onlisp.html). It's a little old, but it's a classic (and downloadable for free).

#Jason I respectively disagree. There are differences that make Ruby superior to Python for metaprogramming - both philosophical and pragmatic. For starters, Ruby gets inheritance right with Single Inheritance and Mixins. And when it comes to metaprogramming you simply need to understand that it's all about the self. The canonical difference here is that in Ruby you have access to the self object at runtime - in Python you do not!
Unlike Python, in Ruby there is no separate compile or runtime phase. In Ruby, every line of code is executed against a particular self object. In Ruby every class inherits from both object and a hidden metaclass. This makes for some interesting dynamics:
class Ninja
def rank
puts "Orange Clan"
end
self.name #=> "Ninja"
end
Using self.name accesses the Ninja classes' metaclass name method to return the class name of Ninja. Does metaprogramming flower so beautiful in Python? I sincerely doubt it!

I am using Python for many projects and I think Python does provide all the features you asked for.
important:
Metaprogramming: Python supports metaclasses and runtime class/method generation etc
Syntax: Well thats somehow subjective. I like Pythons syntax for its simplicity, but some People complain that Python is whitespace-sensitive.
Paradigms: Python supports procedural, object-oriented and basic functional programming.
I think Python has a very practical oriented style, it was very inspiring for me.
Somewhat important:
Performance: Well its a scripting language. But writing C extensions for Python is a common optimization practice.
Documentation: I cannot complain. Its not that detailed as someone may know from Java, but its good enough.
As you are grad student you may want to read this paper claiming that Python is all a scientist needs.
Unfortunately I cannot compare Python to Ruby, since I never used that language.
Regards,
Dennis

Well, if you don't like the lisp syntax perhaps assembler is the way to go. :-)
It certainly has minimal distinction between code and data, is multi-paradigm (or maybe that is no-paradigm) and it's a mind expanding (if tedious) experience both in terms of the learning and the tricks you can do.

Io satisfies all of your "Important" points. I don't think there's a better language out there for doing crazy meta hackery.

one that supports the metaprogramming hacks that just can't be done in a statically compiled language
I would love to find a language that allows some of the cool stuff that Lisp does
Lisp can be compiled.

Did you try Rebol?

My answer would be neither. I know both languages, took a class on Ruby and been programming in python for several years. Lisp is good at metaprogramming due to the fact that its sole purpose is to transform lists, its own source code is just a list of tokens so metaprogramming is natural. The three languages I like best for this type of thing is Rebol, Forth and Factor. Rebol is a very strong dialecting language which takes code from its input stream, runs an expression against it and transforms it using rules written in the language. Very expressive and extremely good at dialecting. Factor and Forth are more or less completely divorced from syntax and you program them by defining and calling words. They are generally mostly written in their own language. You don't write applications in traditional sense, you extend the language by writing your own words to define your particular application. Factor can be especially nice as it has many features I have only seen in smalltalk for evaluating and working with source code. A really nice workspace, interactive documents, etc.

There isn't really a lot to separate Python and Ruby. I'd say the Python community is larger and more mature than the Ruby community, and that's really important for me. Ruby is a more flexible language, which has positive and negative repercussions. However, I'm sure there will be plenty of people to go into detail on both these languages, so I'll throw a third option into the ring. How about JavaScript?
JavaScript was originally designed to be Scheme for the web, and it's prototype-based, which is an advantage over Python and Ruby as far as multi-paradigm and metaprogramming is concerned. The syntax isn't as nice as the other two, but it is probably the most widely deployed language in existence, and performance is getting better every day.

If you like the lisp-style code-is-data concept, but don't like the Lispy syntax, maybe Prolog would be a good choice.
Whether that qualifies as a "fun to use, modern language", I'll leave to others to judge. ;-)

Ruby is my choice after exploring Python, Smalltalk, and Ruby.

What about OCaml ?
OCaml features: a static type system, type inference, parametric polymorphism, tail recursion, pattern matching, first class lexical closures, functors (parametric modules), exception handling, and incremental generational automatic garbage collection.
I think that it satisfies the following:
Important:
Nice, clean, sane syntax and consistent, intuitive semantics. Basically a well thought-out, fun to use, modern language.
Multiple paradigms. No one paradigm is right for every project, or even every small subproblem within a project.
An interesting language that actually affects the way one thinks about programming.
Somewhat important:
Performance. It would be nice if performance was decent, but when performance is a real priority, I'll use D instead.
Well-documented.

I've use Python a very bit, but much more Ruby. However I'd argue they both provide what you asked for.
If I see all your four points then you may at least check:
http://www.iolanguage.com/
And Mozart/Oz may be interesting for you also:
http://mozart.github.io/
Regards
Friedrich

For python-style syntax and lisp-like macros (macros that are real code) and good DSL see converge.

I'm not sure that Python would fulfill all things you desire (especially the point about the minimal distinction between code and data), but there is one argument in favour of python. There is a project out there which makes it easy for you to program extensions for python in D, so you can have the best of both worlds. http://pyd.dsource.org/celerid.html

if you love the rose, you have to learn to live with the thorns :)

I would recommend you go with Ruby.
When I first started to learn it, I found it really easy to pick up.

Do not to mix Ruby Programming Language with Ruby Implementations, thinking that POSIX threads are not possible in ruby.
You can simply compile with pthread support, and this was already possible at the time this thread was created, if you pardon the pun.
The answer to this question is simple. If you like lisp, you will probably prefer ruby. Or, whatever you like.

I suggest that you try out both languages and pick the one that appeals to you. Both Python and Ruby can do what you want.
Also read this thread.

Go with JS just check out AJS (Alternative JavaScript Syntax) at my github http://github.com/visionmedia it will give you some cleaner looking closures etc :D

Concerning your main-point (meta-programming):
Version 1.6 of Groovy has AST (Abstract Syntax Tree) programming built-in as a standard and integrated feature.
Ruby has RubyParser, but it's an add-on.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.