Can you add new statements to Python's syntax? - python
Can you add new statements (like print, raise, with) to Python's syntax?
Say, to allow..
mystatement "Something"
Or,
new_if True:
    print "example"
Not so much whether you should, but whether it's possible (short of modifying the Python interpreter's code).
You may find this useful - Python internals: adding a new statement to Python, quoted here:
This article is an attempt to better understand how the front-end of Python works. Just reading documentation and source code may be a bit boring, so I'm taking a hands-on approach here: I'm going to add an until statement to Python.
All the coding for this article was done against the cutting-edge Py3k branch in the Python Mercurial repository mirror.
The until statement
Some languages, like Ruby, have an until statement, which is the complement to while (until num == 0 is equivalent to while num != 0). In Ruby, I can write:
num = 3
until num == 0 do
  puts num
  num -= 1
end
And it will print:
3
2
1
So, I want to add a similar capability to Python. That is, being able to write:
num = 3
until num == 0:
    print(num)
    num -= 1
A language-advocacy digression
This article doesn't attempt to suggest the addition of an until statement to Python. Although I think such a statement would make some code clearer, and this article displays how easy it is to add, I completely respect Python's philosophy of minimalism. All I'm trying to do here, really, is gain some insight into the inner workings of Python.
Modifying the grammar
Python uses a custom parser generator named pgen. It produces an LL(1) parser that converts Python source code into a parse tree. The input to the parser generator is the file Grammar/Grammar[1]. This is a simple text file that specifies the grammar of Python.
[1]: From here on, references to files in the Python source are given relative to the root of the source tree, which is the directory where you run configure and make to build Python.
Two modifications have to be made to the grammar file. The first is to add a definition for the until statement. I found where the while statement was defined (while_stmt), and added until_stmt below [2]:
compound_stmt: if_stmt | while_stmt | until_stmt | for_stmt | try_stmt | with_stmt | funcdef | classdef | decorated
if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite]
while_stmt: 'while' test ':' suite ['else' ':' suite]
until_stmt: 'until' test ':' suite
[2]: This demonstrates a common technique I use when modifying source code I’m not familiar with: work by similarity. This principle won’t solve all your problems, but it can definitely ease the process. Since everything that has to be done for while also has to be done for until, it serves as a pretty good guideline.
Note that I've decided to exclude the else clause from my definition of until, just to make it a little bit different (and because frankly I dislike the else clause of loops and don't think it fits well with the Zen of Python).
The second change is to modify the rule for compound_stmt to include until_stmt, as you can see in the snippet above. It's right after while_stmt, again.
When you run make after modifying Grammar/Grammar, notice that the pgen program is run to re-generate Include/graminit.h and Python/graminit.c, and then several files get re-compiled.
Modifying the AST generation code
After the Python parser has created a parse tree, this tree is converted into an AST, since ASTs are much simpler to work with in subsequent stages of the compilation process.
So, we're going to visit Parser/Python.asdl which defines the structure of Python's ASTs and add an AST node for our new until statement, again right below the while:
| While(expr test, stmt* body, stmt* orelse)
| Until(expr test, stmt* body)
If you now run make, notice that before compiling a bunch of files, Parser/asdl_c.py is run to generate C code from the AST definition file. This (like Grammar/Grammar) is another example of the Python source-code using a mini-language (in other words, a DSL) to simplify programming. Also note that since Parser/asdl_c.py is a Python script, this is a kind of bootstrapping - to build Python from scratch, Python already has to be available.
While Parser/asdl_c.py generated the code to manage our newly defined AST node (into the files Include/Python-ast.h and Python/Python-ast.c), we still have to write the code that converts a relevant parse-tree node into it by hand. This is done in the file Python/ast.c. There, a function named ast_for_stmt converts parse tree nodes for statements into AST nodes. Again, guided by our old friend while, we jump right into the big switch for handling compound statements and add a clause for until_stmt:
case while_stmt:
    return ast_for_while_stmt(c, ch);
case until_stmt:
    return ast_for_until_stmt(c, ch);
Now we should implement ast_for_until_stmt. Here it is:
static stmt_ty
ast_for_until_stmt(struct compiling *c, const node *n)
{
    /* until_stmt: 'until' test ':' suite */
    REQ(n, until_stmt);
    if (NCH(n) == 4) {
        expr_ty expression;
        asdl_seq *suite_seq;
        expression = ast_for_expr(c, CHILD(n, 1));
        if (!expression)
            return NULL;
        suite_seq = ast_for_suite(c, CHILD(n, 3));
        if (!suite_seq)
            return NULL;
        return Until(expression, suite_seq, LINENO(n), n->n_col_offset, c->c_arena);
    }
    PyErr_Format(PyExc_SystemError,
                 "wrong number of tokens for 'until' statement: %d",
                 NCH(n));
    return NULL;
}
Again, this was coded while closely looking at the equivalent ast_for_while_stmt, with the difference that for until I've decided not to support the else clause. As expected, the AST is created recursively, using other AST creating functions like ast_for_expr for the condition expression and ast_for_suite for the body of the until statement. Finally, a new node named Until is returned.
Note that we access the parse-tree node n using some macros like NCH and CHILD. These are worth understanding - their code is in Include/node.h.
Digression: AST composition
I chose to create a new type of AST for the until statement, but actually this isn't necessary. I could've saved some work and implemented the new functionality using composition of existing AST nodes, since:
until condition:
    # do stuff
Is functionally equivalent to:
while not condition:
    # do stuff
Instead of creating the Until node in ast_for_until_stmt, I could have created a While node whose test is a Not node wrapping the condition. Since the AST compiler already knows how to handle these nodes, the next steps of the process could be skipped.
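The composition idea can be tried from Python itself with the standard ast module. The sketch below is not the article's C code: it assumes Python 3.8 or later, and until_loop is a hypothetical helper name. It builds a While node whose test is a Not applied to the condition, then compiles and runs it.

```python
import ast

def until_loop(cond_src, body_src):
    """Build `while not (cond): body` as an AST (hypothetical helper)."""
    cond = ast.parse(cond_src, mode="eval").body   # the condition expression
    body = ast.parse(body_src).body                # list of body statements
    loop = ast.While(
        test=ast.UnaryOp(op=ast.Not(), operand=cond),
        body=body,
        orelse=[],
    )
    module = ast.Module(body=[loop], type_ignores=[])
    # The hand-built nodes have no line/column info; fill it in.
    return ast.fix_missing_locations(module)

namespace = {"num": 3, "seen": []}
tree = until_loop("num == 0", "seen.append(num)\nnum -= 1")
exec(compile(tree, "<until>", "exec"), namespace)
print(namespace["seen"])
```

No new node type is involved, so the existing compiler stages handle the loop without changes, which is the point of the digression.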
Compiling ASTs into bytecode
The next step is compiling the AST into Python bytecode. The compilation has an intermediate result which is a CFG (Control Flow Graph), but since the same code handles it I will ignore this detail for now and leave it for another article.
The code we will look at next is Python/compile.c. Following the lead of while, we find the function compiler_visit_stmt, which is responsible for compiling statements into bytecode. We add a clause for Until:
case While_kind:
    return compiler_while(c, s);
case Until_kind:
    return compiler_until(c, s);
If you wonder what Until_kind is, it's a constant (actually a value of the _stmt_kind enumeration) automatically generated from the AST definition file into Include/Python-ast.h. Anyway, we call compiler_until which, of course, still doesn't exist. I'll get to it in a moment.
If you're curious like me, you'll notice that compiler_visit_stmt is peculiar. No amount of grep-ping the source tree reveals where it is called. When this is the case, only one option remains - C macro-fu. Indeed, a short investigation leads us to the VISIT macro defined in Python/compile.c:
#define VISIT(C, TYPE, V) {\
    if (!compiler_visit_ ## TYPE((C), (V))) \
        return 0; \
}
It's used to invoke compiler_visit_stmt in compiler_body. Back to our business, however...
As promised, here's compiler_until:
static int
compiler_until(struct compiler *c, stmt_ty s)
{
    basicblock *loop, *end, *anchor = NULL;
    int constant = expr_constant(s->v.Until.test);
    if (constant == 1) {
        return 1;
    }
    loop = compiler_new_block(c);
    end = compiler_new_block(c);
    if (constant == -1) {
        anchor = compiler_new_block(c);
        if (anchor == NULL)
            return 0;
    }
    if (loop == NULL || end == NULL)
        return 0;
    ADDOP_JREL(c, SETUP_LOOP, end);
    compiler_use_next_block(c, loop);
    if (!compiler_push_fblock(c, LOOP, loop))
        return 0;
    if (constant == -1) {
        VISIT(c, expr, s->v.Until.test);
        ADDOP_JABS(c, POP_JUMP_IF_TRUE, anchor);
    }
    VISIT_SEQ(c, stmt, s->v.Until.body);
    ADDOP_JABS(c, JUMP_ABSOLUTE, loop);
    if (constant == -1) {
        compiler_use_next_block(c, anchor);
        ADDOP(c, POP_BLOCK);
    }
    compiler_pop_fblock(c, LOOP, loop);
    compiler_use_next_block(c, end);
    return 1;
}
I have a confession to make: this code wasn't written based on a deep understanding of Python bytecode. Like the rest of the article, it was done in imitation of the kin compiler_while function. By reading it carefully, however, keeping in mind that the Python VM is stack-based, and glancing into the documentation of the dis module, which has a list of Python bytecodes with descriptions, it's possible to understand what's going on.
That's it, we're done... Aren't we?
After making all the changes and running make, we can run the newly compiled Python and try our new until statement:
>>> num = 3
>>> until num == 0:
...   print(num)
...   num -= 1
...
3
2
1
Voila, it works! Let's see the bytecode created for the new statement by using the dis module as follows:
import dis
def myfoo(num):
    until num == 0:
        print(num)
        num -= 1
dis.dis(myfoo)
Here's the result:
4 0 SETUP_LOOP 36 (to 39)
>> 3 LOAD_FAST 0 (num)
6 LOAD_CONST 1 (0)
9 COMPARE_OP 2 (==)
12 POP_JUMP_IF_TRUE 38
5 15 LOAD_NAME 0 (print)
18 LOAD_FAST 0 (num)
21 CALL_FUNCTION 1
24 POP_TOP
6 25 LOAD_FAST 0 (num)
28 LOAD_CONST 2 (1)
31 INPLACE_SUBTRACT
32 STORE_FAST 0 (num)
35 JUMP_ABSOLUTE 3
>> 38 POP_BLOCK
>> 39 LOAD_CONST 0 (None)
42 RETURN_VALUE
The most interesting operation is number 12: if the condition is true, we jump to after the loop. This is correct semantics for until. If the jump isn't executed, the loop body keeps running until it jumps back to the condition at operation 35.
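A stock interpreter cannot compile until, but the same inspection can be done on the equivalent while not loop. This is a sketch for an unmodified Python 3 and countdown is an illustrative name. Exact opcode names and offsets vary between CPython versions, so the listing will not match the one above, but a conditional jump out of the loop and a jump back to the test should both show up.

```python
import dis

def countdown(num):
    seen = []
    while not num == 0:   # same semantics as `until num == 0:`
        seen.append(num)
        num -= 1
    return seen

# Collect the opcode names and keep only the jump instructions.
opnames = [ins.opname for ins in dis.get_instructions(countdown)]
jumps = [name for name in opnames if "JUMP" in name]
print(jumps)
```

Calling dis.dis(countdown) instead prints the full annotated listing in the same style as the output shown in the article.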
Feeling good about my change, I then tried running the function (executing myfoo(3)) instead of showing its bytecode. The result was less than encouraging:
Traceback (most recent call last):
  File "zy.py", line 9, in <module>
    myfoo(3)
  File "zy.py", line 5, in myfoo
    print(num)
SystemError: no locals when loading 'print'
Whoa... this can't be good. So what went wrong?
The case of the missing symbol table
One of the steps the Python compiler performs when compiling the AST is to create a symbol table for the code it compiles. The call to PySymtable_Build in PyAST_Compile calls into the symbol table module (Python/symtable.c), which walks the AST in a manner similar to the code generation functions. Having a symbol table for each scope helps the compiler figure out some key information, such as which variables are global and which are local to a scope.
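The information the symbol table holds can be inspected from Python with the standard symtable module. The sketch below runs on an unmodified Python 3, so it uses the equivalent while not loop in place of until. It shows the classification the compiler needs here: num is local to the function, while print is resolved as a global.

```python
import symtable

source = """
def myfoo(num):
    while not num == 0:
        print(num)
        num -= 1
"""

module_table = symtable.symtable(source, "<example>", "exec")
func_table = module_table.get_children()[0]   # the scope of myfoo

num_is_local = func_table.lookup("num").is_local()
print_is_global = func_table.lookup("print").is_global()
print(func_table.get_name(), num_is_local, print_is_global)
```

When the symbol-table pass never visits the loop body, as with the unhandled Until node, the compiler has no such classification for the names inside it, which is consistent with the error shown above.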
To fix the problem, we have to modify the symtable_visit_stmt function in Python/symtable.c, adding code for handling until statements, after the similar code for while statements [3]:
case While_kind:
    VISIT(st, expr, s->v.While.test);
    VISIT_SEQ(st, stmt, s->v.While.body);
    if (s->v.While.orelse)
        VISIT_SEQ(st, stmt, s->v.While.orelse);
    break;
case Until_kind:
    VISIT(st, expr, s->v.Until.test);
    VISIT_SEQ(st, stmt, s->v.Until.body);
    break;
[3]: By the way, without this code there’s a compiler warning for Python/symtable.c. The compiler notices that the Until_kind enumeration value isn’t handled in the switch statement of symtable_visit_stmt and complains. It’s always important to check for compiler warnings!
And now we really are done. Compiling the source after this change makes the execution of myfoo(3) work as expected.
Conclusion
In this article I've demonstrated how to add a new statement to Python. Albeit requiring quite a bit of tinkering in the code of the Python compiler, the change wasn't difficult to implement, because I used a similar and existing statement as a guideline.
The Python compiler is a sophisticated chunk of software, and I don't claim to be an expert in it. However, I am really interested in the internals of Python, and particularly its front-end. Therefore, I found this exercise a very useful companion to theoretical study of the compiler's principles and source code. It will serve as a base for future articles that will get deeper into the compiler.
References
I used a few excellent references for the construction of this article. Here they are, in no particular order:
PEP 339: Design of the CPython compiler - probably the most important and comprehensive piece of official documentation for the Python compiler. Being very short, it painfully displays the scarcity of good documentation of the internals of Python.
"Python Compiler Internals" - an article by Thomas Lee
"Python: Design and Implementation" - a presentation by Guido van Rossum
Python (2.5) Virtual Machine, A guided tour - a presentation by Peter Tröger
original source
One way to do things like this is to preprocess the source and modify it, translating your added statement to Python. This approach brings various problems, and I wouldn't recommend it for general usage, but for experimenting with the language, or for specific-purpose metaprogramming, it can occasionally be useful.
For instance, let's say we want to introduce a "myprint" statement that, instead of printing to the screen, logs to a specific file, i.e.:
myprint "This gets logged to file"
would be equivalent to
print >>open('/tmp/logfile.txt','a'), "This gets logged to file"
There are various options as to how to do the replacing, from regex substitution to generating an AST, to writing your own parser depending on how close your syntax matches existing python. A good intermediate approach is to use the tokenizer module. This should allow you to add new keywords, control structures etc while interpreting the source similarly to the python interpreter, thus avoiding the breakage crude regex solutions would cause. For the above "myprint", you could write the following transformation code:
import tokenize
LOGFILE = '/tmp/logfile.txt'
def translate(readline):
    # Rewrite each `myprint` NAME token as: print >>open(LOGFILE, 'a'),
    for type, name, _, _, _ in tokenize.generate_tokens(readline):
        if type == tokenize.NAME and name == 'myprint':
            yield tokenize.NAME, 'print'
            yield tokenize.OP, '>>'
            yield tokenize.NAME, "open"
            yield tokenize.OP, "("
            yield tokenize.STRING, repr(LOGFILE)
            yield tokenize.OP, ","
            yield tokenize.STRING, "'a'"
            yield tokenize.OP, ")"
            yield tokenize.OP, ","
        else:
            # Every other token passes through unchanged
            yield type, name
(This does make myprint effectively a keyword, so using it as a variable name elsewhere will likely cause problems.)
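The transformation above targets Python 2, where print is a statement. As a hedged Python 3 sketch of the same token-rewriting technique, the block below turns a hypothetical until keyword into while not (...). The function name translate_until is an assumption made for illustration, and the colon detection is deliberately simple: it would mis-handle a bare lambda in the loop header.

```python
import io
import tokenize

def translate_until(source):
    """Rewrite `until <cond>:` as `while not (<cond>):` token by token."""
    out = []
    in_header = False   # True between `until` and the colon that ends it
    depth = 0           # bracket nesting inside the header
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        kind, text = tok.type, tok.string
        if kind == tokenize.NAME and text == "until" and not in_header:
            out += [(tokenize.NAME, "while"), (tokenize.NAME, "not"),
                    (tokenize.OP, "(")]
            in_header, depth = True, 0
            continue
        if in_header and kind == tokenize.OP:
            if text in ("(", "[", "{"):
                depth += 1
            elif text in (")", "]", "}"):
                depth -= 1
            elif text == ":" and depth == 0:
                out.append((tokenize.OP, ")"))   # close the wrapped condition
                in_header = False
        out.append((kind, text))
    # With 2-tuples, untokenize emits valid code with non-original spacing.
    return tokenize.untokenize(out)

program = """num = 3
seen = []
until num == 0:
    seen.append(num)
    num -= 1
"""
namespace = {}
exec(translate_until(program), namespace)
print(namespace["seen"])
```

As with myprint, until becomes a de facto keyword: any identifier with that name is rewritten.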
The problem then is how to use it so that your code is usable from python. One way would just be to write your own import function, and use it to load code written in your custom language. ie:
import new
import tokenize
def myimport(filename):
    mod = new.module(filename)
    f = open(filename)
    try:
        # translate() is the token-rewriting generator defined above
        data = tokenize.untokenize(translate(f.readline))
    finally:
        f.close()
    exec data in mod.__dict__
    return mod
However, this requires you to handle your customised code differently from normal Python modules, i.e. "some_mod = myimport("some_mod.py")" rather than "import some_mod".
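The new module and the exec ... in statement exist only in Python 2. A rough Python 3 counterpart of such a loader, sketched under stated assumptions: load_translated and toy_translate are hypothetical names, and toy_translate is only a crude textual stand-in for a real token-based translator.

```python
import os
import tempfile
import types

def load_translated(path, translate):
    """Load `path` as a module after passing its text through `translate`."""
    with open(path, encoding="utf-8") as handle:
        source = translate(handle.read())
    name = os.path.splitext(os.path.basename(path))[0]
    module = types.ModuleType(name)
    module.__file__ = path
    # Compiling with the real path keeps tracebacks pointing at the file.
    exec(compile(source, path, "exec"), module.__dict__)
    return module

def toy_translate(text):
    # Placeholder only: a plain string replacement, not a robust translator.
    return text.replace("until ", "while not ")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "some_mod.mypy")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("num = 3\nseen = []\n"
                     "until num == 0:\n    seen.append(num)\n    num -= 1\n")
    some_mod = load_translated(path, toy_translate)

print(some_mod.seen)
```

The caller still has to use load_translated(...) rather than a plain import statement, which is the limitation noted above.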
Another fairly neat (albeit hacky) solution is to create a custom encoding (See PEP 263) as this recipe demonstrates. You could implement this as:
import codecs, cStringIO, encodings
import tokenize
from encodings import utf_8
class StreamReader(utf_8.StreamReader):
    def __init__(self, *args, **kwargs):
        codecs.StreamReader.__init__(self, *args, **kwargs)
        # translate() is the token-rewriting generator defined earlier
        data = tokenize.untokenize(translate(self.stream.readline))
        self.stream = cStringIO.StringIO(data)
def search_function(s):
    if s!='mylang': return None
    utf8=encodings.search_function('utf8') # Assume utf8 encoding
    return codecs.CodecInfo(
        name='mylang',
        encode = utf8.encode,
        decode = utf8.decode,
        incrementalencoder=utf8.incrementalencoder,
        incrementaldecoder=utf8.incrementaldecoder,
        streamreader=StreamReader,
        streamwriter=utf8.streamwriter)
codecs.register(search_function)
Now after this code gets run (eg. you could place it in your .pythonrc or site.py) any code starting with the comment "# coding: mylang" will automatically be translated through the above preprocessing step. eg.
# coding: mylang
myprint "this gets logged to file"
for i in range(10):
    myprint "so does this : ", i, "times"
myprint ("works fine" "with arbitrary" + " syntax"
    "and line continuations")
Caveats:
There are problems with the preprocessor approach, as you'll probably know if you've worked with the C preprocessor. The main one is debugging. All Python sees is the preprocessed file, which means that text printed in stack traces etc. will refer to that. If you've performed significant translation, this may be very different from your source text. The example above doesn't change line numbers etc., so it won't be too different, but the more you change, the harder it will be to figure out.
Yes, to some extent it is possible. There is a module out there that uses sys.settrace() to implement goto and comefrom "keywords":
from goto import goto, label
for i in range(1, 10):
    for j in range(1, 20):
        print i, j
        if j == 3:
            goto .end # breaking out from nested loop
label .end
print "Finished"
Short of changing and recompiling the source code (which is possible with open source), changing the base language is not really possible.
Even if you do recompile the source, it wouldn't be python, just your hacked-up changed version which you need to be very careful not to introduce bugs into.
However, I'm not sure why you'd want to. Python's object-oriented features make it quite simple to achieve similar results with the language as it stands.
General answer: you need to preprocess your source files.
More specific answer: install EasyExtend, and go through the following steps.
i) Create a new langlet ( extension language )
import EasyExtend
EasyExtend.new_langlet("mystmts", prompt = "my> ", source_ext = "mypy")
Without additional specification a bunch of files shall be created under EasyExtend/langlets/mystmts/ .
ii) Open mystmts/parsedef/Grammar.ext and add following lines
small_stmt: (expr_stmt | print_stmt | del_stmt | pass_stmt | flow_stmt |
import_stmt | global_stmt | exec_stmt | assert_stmt | my_stmt )
my_stmt: 'mystatement' expr
This is sufficient to define the syntax of your new statement. The small_stmt non-terminal is part of the Python grammar and it's the place where the new statement is hooked in. The parser will now recognize the new statement i.e. a source file containing it will be parsed. The compiler will reject it though because it still has to be transformed into valid Python.
iii) Now one has to add the semantics of the statement. For this one has to edit
mystmts/langlet.py and add a my_stmt node visitor.
def call_my_stmt(expression):
    "defines behaviour for my_stmt"
    print "my stmt called with", expression
class LangletTransformer(Transformer):
    @transform
    def my_stmt(self, node):
        _expr = find_node(node, symbol.expr)
        return any_stmt(CST_CallFunc("call_my_stmt", [_expr]))
__publish__ = ["call_my_stmt"]
iv) cd to langlets/mystmts and type
python run_mystmts.py
Now a session shall be started and the newly defined statement can be used:
__________________________________________________________________________________
mystmts
On Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)]
__________________________________________________________________________________
my> mystatement 40+2
my stmt called with 42
Quite a few steps to come to a trivial statement, right? There isn't an API yet that lets one define simple things without having to care about grammars. But EE is very reliable modulo some bugs. So it's just a matter of time that an API emerges that lets programmers define convenient stuff like infix operators or small statements using just convenient OO programming. For more complex things like embedding whole languages in Python by means of building a langlet there is no way of going around a full grammar approach.
Here's a very simple but crappy way to add new statements, in interactive mode only. I'm using it for little 1-letter commands for editing gene annotations using only sys.displayhook, but just so I could answer this question I added sys.excepthook for the syntax errors as well. The latter is really ugly, fetching the raw code from the readline buffer. The benefit is, it's trivially easy to add new statements this way.
jcomeau@intrepid:~/$ cat demo.py; ./demo.py
#!/usr/bin/python -i
'load everything needed under "package", such as package.common.normalize()'
import os, sys, readline, traceback
if __name__ == '__main__':
    class t:
        @staticmethod
        def localfunction(*args):
            print 'this is a test'
            if args:
                print 'ignoring %s' % repr(args)
    def displayhook(whatever):
        if hasattr(whatever, 'localfunction'):
            return whatever.localfunction()
        else:
            print whatever
    def excepthook(exctype, value, tb):
        if exctype is SyntaxError:
            index = readline.get_current_history_length()
            item = readline.get_history_item(index)
            command = item.split()
            print 'command:', command
            if len(command[0]) == 1:
                try:
                    eval(command[0]).localfunction(*command[1:])
                except:
                    traceback.print_exception(exctype, value, tb)
        else:
            traceback.print_exception(exctype, value, tb)
    sys.displayhook = displayhook
    sys.excepthook = excepthook
>>> t
this is a test
>>> t t
command: ['t', 't']
this is a test
ignoring ('t',)
>>> ^D
I've found a guide on adding new statements:
https://troeger.eu/files/teaching/pythonvm08lab.pdf
Basically, to add new statements, you must edit Python/ast.c (among other things) and recompile the python binary.
While it's possible, don't. You can achieve almost everything via functions and classes (which won't require people to recompile Python just to run your script).
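As one illustration of the functions-instead-of-syntax route, an until loop can be approximated by an ordinary higher-order function. This is a minimal sketch; until, step and state are names chosen here, not part of any library.

```python
def until(condition, body):
    """Call body() repeatedly until condition() becomes true."""
    while not condition():
        body()

state = {"num": 3, "seen": []}

def step():
    # One loop iteration: record the current value, then count down.
    state["seen"].append(state["num"])
    state["num"] -= 1

until(lambda: state["num"] == 0, step)
print(state["seen"])
```

The cost is that the condition and the body have to be wrapped in callables, which is noisier than real syntax but runs on any stock interpreter.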
It's possible to do this using EasyExtend:
EasyExtend (EE) is a preprocessor generator and metaprogramming framework written in pure Python and integrated with CPython. The main purpose of EasyExtend is the creation of extension languages i.e. adding custom syntax and semantics to Python.
It's not exactly adding new statements to the language syntax, but macros are a powerful tool: https://github.com/lihaoyi/macropy
Some things can be done with decorators. Let's assume, for example, that Python had no with statement. We could then implement similar behaviour like this:
# ====== Implementation of "mywith" decorator ======
def mywith(stream):
    def decorator(function):
        try: function(stream)
        finally: stream.close()
    return decorator
# ====== Using the decorator ======
@mywith(open("test.py","r"))
def _(infile):
    for l in infile.readlines():
        print(">>", l.rstrip())
It is a pretty unclean solution however as done here. Especially the behaviour where the decorator calls the function and sets _ to None is unexpected. For clarification: This decorator is equivalent to writing
def _(infile): ...
_ = mywith(open(...))(_) # mywith returns None.
and decorators are normally expected to modify, not to execute, functions.
I used such a method before in a script where I had to temporarily set the working directory for several functions.
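The working-directory script itself is not shown, so the following is only a guess at its shape. Unlike mywith, this sketch uses the conventional pattern in which the decorator wraps the function instead of executing it. in_directory and where_am_i are hypothetical names.

```python
import functools
import os
import tempfile

def in_directory(path):
    """Decorator factory: run the wrapped function with `path` as the cwd."""
    def decorator(function):
        @functools.wraps(function)
        def wrapper(*args, **kwargs):
            previous = os.getcwd()
            os.chdir(path)
            try:
                return function(*args, **kwargs)
            finally:
                os.chdir(previous)   # restored even if the function raises
        return wrapper
    return decorator

target = os.path.realpath(tempfile.mkdtemp())
start = os.getcwd()

@in_directory(target)
def where_am_i():
    return os.path.realpath(os.getcwd())

inside = where_am_i()    # runs inside the temporary directory
after = os.getcwd()      # back where we started
os.rmdir(target)
print(inside == target, after == start)
```

Because the decorated name stays bound to a callable, the function can be reused for several calls, which the execute-on-definition variant above does not allow.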
OUTDATED:
The Logix project is now deprecated and no longer developed, per the Logix website.
There is a language based on python called Logix with which you CAN do such things. It hasn't been under development for a while, but the features that you asked for do work with the latest version.
Not without modifying the interpreter. I know a lot of languages in the past several years have been described as "extensible", but not in the way you're describing. You extend Python by adding functions and classes.
Ten years ago you couldn't, and I doubt that's changed. However, it wasn't that hard to modify the syntax back then if you were prepared to recompile python, and I doubt that's changed, either.
Related
Executing an import statement string and using the import [duplicate]
How do I execute a string containing Python code in Python?
Do not ever use eval (or exec) on data that could possibly come from outside the program in any form. It is a critical security risk. You allow the author of the data to run arbitrary code on your computer.
If you are here because you want to create multiple variables in your Python program following a pattern, you almost certainly have an XY problem. Do not create those variables at all - instead, use a list or dict appropriately.
For statements, use exec(string) (Python 2/3) or exec string (Python 2):
>>> my_code = 'print("hello world")'
>>> exec(my_code)
hello world
When you need the value of an expression, use eval(string):
>>> x = eval("2+2")
>>> x
4
However, the first step should be to ask yourself if you really need to. Executing code should generally be the position of last resort: It's slow, ugly and dangerous if it can contain user-entered code. You should always look at alternatives first, such as higher order functions, to see if these can better meet your needs.
In the example a string is executed as code using the exec function.
import sys
import StringIO
# create file-like string to capture output
codeOut = StringIO.StringIO()
codeErr = StringIO.StringIO()
code = """
def f(x):
    x = x + 1
    return x
print 'This is my output.'
"""
# capture output and errors
sys.stdout = codeOut
sys.stderr = codeErr
exec code
# restore stdout and stderr
sys.stdout = sys.__stdout__
sys.stderr = sys.__stderr__
print f(4)
s = codeErr.getvalue()
print "error:\n%s\n" % s
s = codeOut.getvalue()
print "output:\n%s" % s
codeOut.close()
codeErr.close()
eval and exec are the correct solution, and they can be used in a safer manner.
As discussed in Python's reference manual and clearly explained in this tutorial, the eval and exec functions take two extra parameters that allow a user to specify what global and local functions and variables are available.
For example:
public_variable = 10
private_variable = 2
def public_function():
    return "public information"
def private_function():
    return "super sensitive information"
# make a list of safe functions
safe_list = ['public_variable', 'public_function']
safe_dict = dict([ (k, locals().get(k, None)) for k in safe_list ])
# add any needed builtins back in
safe_dict['len'] = len
>>> eval("public_variable+2", {"__builtins__" : None }, safe_dict)
12
>>> eval("private_variable+2", {"__builtins__" : None }, safe_dict)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1, in <module>
NameError: name 'private_variable' is not defined
>>> exec("print \"'%s' has %i characters\" % (public_function(), len(public_function()))", {"__builtins__" : None}, safe_dict)
'public information' has 18 characters
>>> exec("print \"'%s' has %i characters\" % (private_function(), len(private_function()))", {"__builtins__" : None}, safe_dict)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1, in <module>
NameError: name 'private_function' is not defined
In essence you are defining the namespace in which the code will be executed.
Remember that from version 3 exec is a function! so always use exec(mystring) instead of exec mystring.
Avoid exec and eval
Using exec and eval in Python is highly frowned upon.
There are better alternatives
From the top answer (emphasis mine):
For statements, use exec. When you need the value of an expression, use eval. However, the first step should be to ask yourself if you really need to. Executing code should generally be the position of last resort: It's slow, ugly and dangerous if it can contain user-entered code. You should always look at alternatives first, such as higher order functions, to see if these can better meet your needs.
From Alternatives to exec/eval?
set and get values of variables with the names in strings
[while eval] would work, it is generally not advised to use variable names bearing a meaning to the program itself. Instead, better use a dict.
It is not idiomatic
From http://lucumr.pocoo.org/2011/2/1/exec-in-python/ (emphasis mine)
Python is not PHP
Don't try to circumvent Python idioms because some other language does it differently. Namespaces are in Python for a reason and just because it gives you the tool exec it does not mean you should use that tool.
It is dangerous
From http://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html (emphasis mine)
So eval is not safe, even if you remove all the globals and the builtins! The problem with all of these attempts to protect eval() is that they are blacklists. They explicitly remove things that could be dangerous. That is a losing battle because if there's just one item left off the list, you can attack the system. So, can eval be made safe? Hard to say. At this point, my best guess is that you can't do any harm if you can't use any double underscores, so maybe if you exclude any string with double underscores you are safe. Maybe...
It is hard to read and understand
From http://stupidpythonideas.blogspot.it/2013/05/why-evalexec-is-bad.html (emphasis mine):
First, exec makes it harder for human beings to read your code. In order to figure out what's happening, I don't just have to read your code, I have to read your code, figure out what string it's going to generate, then read that virtual code. So, if you're working on a team, or publishing open source software, or asking for help somewhere like StackOverflow, you're making it harder for other people to help you. And if there's any chance that you're going to be debugging or expanding on this code 6 months from now, you're making it harder for yourself directly.
eval() is just for expressions, while eval('x+1') works, eval('x=1') won't work for example. In that case, it's better to use exec, or even better: try to find a better solution :)
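The expression/statement split is easy to check directly. This is a small Python 3 sketch that assumes the strings are trusted literals, given the warnings above. It passes an explicit dict as the namespace so the effect of exec is visible.

```python
namespace = {"x": 1}

value = eval("x + 1", namespace)      # an expression yields a value

try:
    eval("x = 5", namespace)          # an assignment is a statement
    eval_error = None
except SyntaxError as exc:
    eval_error = exc                  # eval rejects it at compile time

exec("x = 5", namespace)              # exec runs statements, returns None
print(value, type(eval_error).__name__, namespace["x"])
```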
It's worth mentioning that exec's brother exists as well, called execfile (Python 2 only), if you want to call a Python file. That is sometimes good if you are working in a third-party package which has a terrible IDE included and you want to code outside of that package.
Example:
execfile('/path/to/source.py')
or:
exec(open("/path/to/source.py").read())
You accomplish executing code using exec, as with the following IDLE session:
>>> kw = {}
>>> exec( "ret = 4" ) in kw
>>> kw['ret']
4
As the others mentioned, it's "exec" .. but, in case your code contains variables, you can use "global" to access them, which also prevents the following error:
NameError: name 'p_variable' is not defined
exec('p_variable = [1,2,3,4]')
global p_variable
print(p_variable)
I tried quite a few things, but the only thing that worked was the following:
temp_dict = {}
exec("temp_dict['val'] = 10")
print(temp_dict['val'])
output:
10
Use eval.
Check out eval:
x = 1
print eval('x+1')
->2
The most logical solution would be to use the built-in eval() function. Another solution is to write that string to a temporary Python file and execute it.
Ok .. I know this isn't exactly an answer, but possibly a note for people looking at this as I was. I wanted to execute specific code for different users/customers but also wanted to avoid the exec/eval. I initially looked to storing the code in a database for each user and doing the above.
I ended up creating the files on the file system within a 'customer_filters' folder and using the 'imp' module; if no filter applied for that customer, it just carried on.
import imp
def get_customer_module(customerName='default', name='filter'):
    lm = None
    try:
        module_name = customerName+"_"+name;
        m = imp.find_module(module_name, ['customer_filters'])
        lm = imp.load_module(module_name, m[0], m[1], m[2])
    except:
        ''
        #ignore, if no module is found,
    return lm
m = get_customer_module(customerName, "filter")
if m is not None:
    m.apply_address_filter(myobj)
so customerName = "jj" would execute apply_address_filter from the customer_filters\jj_filter.py file
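The imp module is deprecated and was removed in Python 3.12. A hedged sketch of the same per-customer lookup with importlib follows. The folder layout and the apply_address_filter name are taken from the answer above. The temporary directory and the sample filter body are made up for the demonstration.

```python
import importlib.util
import os
import tempfile

def get_customer_module(customer_name, folder, name="filter"):
    """Return the module <folder>/<customer_name>_<name>.py, or None."""
    module_name = f"{customer_name}_{name}"
    path = os.path.join(folder, module_name + ".py")
    if not os.path.isfile(path):
        return None   # no filter defined for this customer
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

with tempfile.TemporaryDirectory() as folder:
    # Stand-in for customer_filters/jj_filter.py
    with open(os.path.join(folder, "jj_filter.py"), "w") as handle:
        handle.write("def apply_address_filter(obj):\n"
                     "    return obj.upper()\n")
    jj = get_customer_module("jj", folder)
    missing = get_customer_module("nobody", folder)
    filtered = jj.apply_address_filter("10 main st")

print(filtered, missing)
```

Checking for the file explicitly replaces the bare except in the original, so genuine errors inside a customer's filter module are not silently swallowed.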
Also note that since Parser/asdl_c.py is a Python script, this is a kind of bootstrapping - to build Python from scratch, Python already has to be available. While Parser/asdl_c.py generated the code to manage our newly defined AST node (into the files Include/Python-ast.h and Python/Python-ast.c), we still have to write the code that converts a relevant parse-tree node into it by hand. This is done in the file Python/ast.c. There, a function named ast_for_stmt converts parse tree nodes for statements into AST nodes. Again, guided by our old friend while, we jump right into the big switch for handling compound statements and add a clause for until_stmt: case while_stmt: return ast_for_while_stmt(c, ch); case until_stmt: return ast_for_until_stmt(c, ch); Now we should implement ast_for_until_stmt. Here it is: static stmt_ty ast_for_until_stmt(struct compiling *c, const node *n) { /* until_stmt: 'until' test ':' suite */ REQ(n, until_stmt); if (NCH(n) == 4) { expr_ty expression; asdl_seq *suite_seq; expression = ast_for_expr(c, CHILD(n, 1)); if (!expression) return NULL; suite_seq = ast_for_suite(c, CHILD(n, 3)); if (!suite_seq) return NULL; return Until(expression, suite_seq, LINENO(n), n->n_col_offset, c->c_arena); } PyErr_Format(PyExc_SystemError, "wrong number of tokens for 'until' statement: %d", NCH(n)); return NULL; } Again, this was coded while closely looking at the equivalent ast_for_while_stmt, with the difference that for until I've decided not to support the else clause. As expected, the AST is created recursively, using other AST creating functions like ast_for_expr for the condition expression and ast_for_suite for the body of the until statement. Finally, a new node named Until is returned. Note that we access the parse-tree node n using some macros like NCH and CHILD. These are worth understanding - their code is in Include/node.h. 
Digression: AST composition

I chose to create a new type of AST for the until statement, but actually this isn't necessary. I could've saved some work and implemented the new functionality using composition of existing AST nodes, since:

    until condition:
        # do stuff

Is functionally equivalent to:

    while not condition:
        # do stuff

Instead of creating the Until node in ast_for_until_stmt, I could have created a While node with a Not node as its test. Since the AST compiler already knows how to handle these nodes, the next steps of the process could be skipped.

Compiling ASTs into bytecode

The next step is compiling the AST into Python bytecode. The compilation has an intermediate result which is a CFG (Control Flow Graph), but since the same code handles it I will ignore this detail for now and leave it for another article. The code we will look at next is Python/compile.c. Following the lead of while, we find the function compiler_visit_stmt, which is responsible for compiling statements into bytecode. We add a clause for Until:

    case While_kind:
        return compiler_while(c, s);
    case Until_kind:
        return compiler_until(c, s);

If you wonder what Until_kind is, it's a constant (actually a value of the _stmt_kind enumeration) automatically generated from the AST definition file into Include/Python-ast.h. Anyway, we call compiler_until which, of course, still doesn't exist. I'll get to it in a moment. If you're curious like me, you'll notice that compiler_visit_stmt is peculiar. No amount of grep-ping the source tree reveals where it is called. When this is the case, only one option remains - C macro-fu. Indeed, a short investigation leads us to the VISIT macro defined in Python/compile.c:

    #define VISIT(C, TYPE, V) {\
        if (!compiler_visit_ ## TYPE((C), (V))) \
            return 0; \
    }

It's used to invoke compiler_visit_stmt in compiler_body. Back to our business, however...
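As an aside on the composition digression, the same idea can be tried from Python itself with the standard ast module, without touching the C sources. This is a sketch of the equivalence only (an until loop expressed as a while loop over a negated test); until_as_while is a hypothetical helper, and a stock interpreter still cannot parse the until keyword:

```python
import ast

def until_as_while(test, body):
    """Build the AST for `until test: body` out of existing node types."""
    return ast.While(
        test=ast.UnaryOp(op=ast.Not(), operand=test),
        body=body,
        orelse=[],
    )

# Parse the pieces from ordinary Python source, then compose them.
test = ast.parse("num == 0", mode="eval").body
body = ast.parse("seen.append(num)\nnum -= 1").body
module = ast.Module(body=[until_as_while(test, body)], type_ignores=[])
ast.fix_missing_locations(module)  # the hand-built nodes lack line numbers

namespace = {"num": 3, "seen": []}
exec(compile(module, "<until>", "exec"), namespace)
print(namespace["seen"])
```

Because only While, UnaryOp and Not nodes are involved, the existing compiler handles the tree with no further changes.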
As promised, here's compiler_until: static int compiler_until(struct compiler *c, stmt_ty s) { basicblock *loop, *end, *anchor = NULL; int constant = expr_constant(s->v.Until.test); if (constant == 1) { return 1; } loop = compiler_new_block(c); end = compiler_new_block(c); if (constant == -1) { anchor = compiler_new_block(c); if (anchor == NULL) return 0; } if (loop == NULL || end == NULL) return 0; ADDOP_JREL(c, SETUP_LOOP, end); compiler_use_next_block(c, loop); if (!compiler_push_fblock(c, LOOP, loop)) return 0; if (constant == -1) { VISIT(c, expr, s->v.Until.test); ADDOP_JABS(c, POP_JUMP_IF_TRUE, anchor); } VISIT_SEQ(c, stmt, s->v.Until.body); ADDOP_JABS(c, JUMP_ABSOLUTE, loop); if (constant == -1) { compiler_use_next_block(c, anchor); ADDOP(c, POP_BLOCK); } compiler_pop_fblock(c, LOOP, loop); compiler_use_next_block(c, end); return 1; } I have a confession to make: this code wasn't written based on a deep understanding of Python bytecode. Like the rest of the article, it was done in imitation of the kin compiler_while function. By reading it carefully, however, keeping in mind that the Python VM is stack-based, and glancing into the documentation of the dis module, which has a list of Python bytecodes with descriptions, it's possible to understand what's going on. That's it, we're done... Aren't we? After making all the changes and running make, we can run the newly compiled Python and try our new until statement: >>> until num == 0: ... print(num) ... num -= 1 ... 3 2 1 Voila, it works! 
Let's see the bytecode created for the new statement by using the dis module as follows: import dis def myfoo(num): until num == 0: print(num) num -= 1 dis.dis(myfoo) Here's the result: 4 0 SETUP_LOOP 36 (to 39) >> 3 LOAD_FAST 0 (num) 6 LOAD_CONST 1 (0) 9 COMPARE_OP 2 (==) 12 POP_JUMP_IF_TRUE 38 5 15 LOAD_NAME 0 (print) 18 LOAD_FAST 0 (num) 21 CALL_FUNCTION 1 24 POP_TOP 6 25 LOAD_FAST 0 (num) 28 LOAD_CONST 2 (1) 31 INPLACE_SUBTRACT 32 STORE_FAST 0 (num) 35 JUMP_ABSOLUTE 3 >> 38 POP_BLOCK >> 39 LOAD_CONST 0 (None) 42 RETURN_VALUE The most interesting operation is number 12: if the condition is true, we jump to after the loop. This is correct semantics for until. If the jump isn't executed, the loop body keeps running until it jumps back to the condition at operation 35. Feeling good about my change, I then tried running the function (executing myfoo(3)) instead of showing its bytecode. The result was less than encouraging: Traceback (most recent call last): File "zy.py", line 9, in myfoo(3) File "zy.py", line 5, in myfoo print(num) SystemError: no locals when loading 'print' Whoa... this can't be good. So what went wrong? The case of the missing symbol table One of the steps the Python compiler performs when compiling the AST is create a symbol table for the code it compiles. The call to PySymtable_Build in PyAST_Compile calls into the symbol table module (Python/symtable.c), which walks the AST in a manner similar to the code generation functions. Having a symbol table for each scope helps the compiler figure out some key information, such as which variables are global and which are local to a scope. 
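The kind of information the symbol table holds can be inspected from Python with the standard symtable module. The sketch below uses an ordinary while loop, since until does not exist in a stock interpreter, and simply asks how num and print are classified inside the function:

```python
import symtable

SOURCE = """
def myfoo(num):
    while num != 0:
        print(num)
        num -= 1
"""

top = symtable.symtable(SOURCE, "<example>", "exec")
function_table = top.get_children()[0]  # the table for myfoo's scope

num = function_table.lookup("num")
printer = function_table.lookup("print")

print(function_table.get_name())
print("num is a local parameter:", num.is_parameter() and num.is_local())
print("print is global:", printer.is_global())
```

When the loop body is never visited by the symbol table pass, names such as print are never classified, which is consistent with the LOAD_NAME instruction in the listing above and the "no locals" error that followed.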
To fix the problem, we have to modify the symtable_visit_stmt function in Python/symtable.c, adding code for handling until statements, after the similar code for while statements [3]: case While_kind: VISIT(st, expr, s->v.While.test); VISIT_SEQ(st, stmt, s->v.While.body); if (s->v.While.orelse) VISIT_SEQ(st, stmt, s->v.While.orelse); break; case Until_kind: VISIT(st, expr, s->v.Until.test); VISIT_SEQ(st, stmt, s->v.Until.body); break; [3]: By the way, without this code there’s a compiler warning for Python/symtable.c. The compiler notices that the Until_kind enumeration value isn’t handled in the switch statement of symtable_visit_stmt and complains. It’s always important to check for compiler warnings! And now we really are done. Compiling the source after this change makes the execution of myfoo(3) work as expected. Conclusion In this article I've demonstrated how to add a new statement to Python. Albeit requiring quite a bit of tinkering in the code of the Python compiler, the change wasn't difficult to implement, because I used a similar and existing statement as a guideline. The Python compiler is a sophisticated chunk of software, and I don't claim being an expert in it. However, I am really interested in the internals of Python, and particularly its front-end. Therefore, I found this exercise a very useful companion to theoretical study of the compiler's principles and source code. It will serve as a base for future articles that will get deeper into the compiler. References I used a few excellent references for the construction of this article. Here they are, in no particular order: PEP 339: Design of the CPython compiler - probably the most important and comprehensive piece of official documentation for the Python compiler. Being very short, it painfully displays the scarcity of good documentation of the internals of Python. 
"Python Compiler Internals" - an article by Thomas Lee "Python: Design and Implementation" - a presentation by Guido van Rossum Python (2.5) Virtual Machine, A guided tour - a presentation by Peter Tröger original source
One way to do things like this is to preprocess the source and modify it, translating your added statement to Python. There are various problems this approach will bring, and I wouldn't recommend it for general usage, but for experimentation with language, or specific-purpose metaprogramming, it can occasionally be useful.

For instance, let's say we want to introduce a "myprint" statement that, instead of printing to the screen, logs to a specific file. ie:

    myprint "This gets logged to file"

would be equivalent to

    print >>open('/tmp/logfile.txt','a'), "This gets logged to file"

There are various options as to how to do the replacing, from regex substitution to generating an AST, to writing your own parser, depending on how closely your syntax matches existing Python. A good intermediate approach is to use the tokenize module. This should allow you to add new keywords, control structures etc while interpreting the source similarly to the Python interpreter, thus avoiding the breakage crude regex solutions would cause. For the above "myprint", you could write the following transformation code:

    import tokenize

    LOGFILE = '/tmp/logfile.txt'
    def translate(readline):
        for type, name,_,_,_ in tokenize.generate_tokens(readline):
            if type ==tokenize.NAME and name =='myprint':
                yield tokenize.NAME, 'print'
                yield tokenize.OP, '>>'
                yield tokenize.NAME, "open"
                yield tokenize.OP, "("
                yield tokenize.STRING, repr(LOGFILE)
                yield tokenize.OP, ","
                yield tokenize.STRING, "'a'"
                yield tokenize.OP, ")"
                yield tokenize.OP, ","
            else:
                yield type,name

(This does make myprint effectively a keyword, so use as a variable elsewhere will likely cause problems)

The problem then is how to use it so that your code is usable from Python. One way would just be to write your own import function, and use it to load code written in your custom language.
ie: import new def myimport(filename): mod = new.module(filename) f=open(filename) data = tokenize.untokenize(translate(f.readline)) exec data in mod.__dict__ return mod This requires you handle your customised code differently from normal python modules however. ie "some_mod = myimport("some_mod.py")" rather than "import some_mod" Another fairly neat (albeit hacky) solution is to create a custom encoding (See PEP 263) as this recipe demonstrates. You could implement this as: import codecs, cStringIO, encodings from encodings import utf_8 class StreamReader(utf_8.StreamReader): def __init__(self, *args, **kwargs): codecs.StreamReader.__init__(self, *args, **kwargs) data = tokenize.untokenize(translate(self.stream.readline)) self.stream = cStringIO.StringIO(data) def search_function(s): if s!='mylang': return None utf8=encodings.search_function('utf8') # Assume utf8 encoding return codecs.CodecInfo( name='mylang', encode = utf8.encode, decode = utf8.decode, incrementalencoder=utf8.incrementalencoder, incrementaldecoder=utf8.incrementaldecoder, streamreader=StreamReader, streamwriter=utf8.streamwriter) codecs.register(search_function) Now after this code gets run (eg. you could place it in your .pythonrc or site.py) any code starting with the comment "# coding: mylang" will automatically be translated through the above preprocessing step. eg. # coding: mylang myprint "this gets logged to file" for i in range(10): myprint "so does this : ", i, "times" myprint ("works fine" "with arbitrary" + " syntax" "and line continuations") Caveats: There are problems to the preprocessor approach, as you'll probably be familiar with if you've worked with the C preprocessor. The main one is debugging. All python sees is the preprocessed file which means that text printed in the stack trace etc will refer to that. If you've performed significant translation, this may be very different from your source text. 
The example above doesn't change line numbers etc, so won't be too different, but the more you change it, the harder it will be to figure out.
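The recipe above targets Python 2, where print is a statement. As a sketch of the same token-level technique on Python 3, the example below rewrites a hypothetical until keyword into while not (...). It is an illustration under simplifying assumptions: any NAME token spelled until is treated as the keyword, and the condition is assumed to end at the first colon outside brackets.

```python
import io
import tokenize

def translate(readline):
    """Rewrite `until cond:` as `while not (cond):` at the token level."""
    pending = False  # True while we are inside an `until` condition
    depth = 0        # bracket nesting inside that condition
    for tok in tokenize.generate_tokens(readline):
        kind, text = tok.type, tok.string
        if kind == tokenize.NAME and text == "until" and not pending:
            pending, depth = True, 0
            yield tokenize.NAME, "while"
            yield tokenize.NAME, "not"
            yield tokenize.OP, "("
            continue
        if pending and kind == tokenize.OP:
            if text in ("(", "[", "{"):
                depth += 1
            elif text in (")", "]", "}"):
                depth -= 1
            elif text == ":" and depth == 0:
                pending = False
                yield tokenize.OP, ")"  # close the wrapped condition
        yield kind, text

source = "num = 3\nseen = []\nuntil num == 0:\n    seen.append(num)\n    num -= 1\n"
translated = tokenize.untokenize(translate(io.StringIO(source).readline))

namespace = {}
exec(translated, namespace)
print(namespace["seen"])
```

Yielding two-element tuples puts untokenize into its compatibility mode, so the spacing of the regenerated source differs from the input while line breaks and indentation are kept.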
Yes, to some extent it is possible. There is a module out there that uses sys.settrace() to implement goto and comefrom "keywords": from goto import goto, label for i in range(1, 10): for j in range(1, 20): print i, j if j == 3: goto .end # breaking out from nested loop label .end print "Finished"
Short of changing and recompiling the source code (which is possible with open source), changing the base language is not really possible. Even if you do recompile the source, it wouldn't be Python, just your hacked-up changed version, which you need to be very careful not to introduce bugs into. However, I'm not sure why you'd want to. Python's object-oriented features make it quite simple to achieve similar results with the language as it stands.
General answer: you need to preprocess your source files. More specific answer: install EasyExtend, and go through following steps i) Create a new langlet ( extension language ) import EasyExtend EasyExtend.new_langlet("mystmts", prompt = "my> ", source_ext = "mypy") Without additional specification a bunch of files shall be created under EasyExtend/langlets/mystmts/ . ii) Open mystmts/parsedef/Grammar.ext and add following lines small_stmt: (expr_stmt | print_stmt | del_stmt | pass_stmt | flow_stmt | import_stmt | global_stmt | exec_stmt | assert_stmt | my_stmt ) my_stmt: 'mystatement' expr This is sufficient to define the syntax of your new statement. The small_stmt non-terminal is part of the Python grammar and it's the place where the new statement is hooked in. The parser will now recognize the new statement i.e. a source file containing it will be parsed. The compiler will reject it though because it still has to be transformed into valid Python. iii) Now one has to add semantics of the statement. For this one has to edit msytmts/langlet.py and add a my_stmt node visitor. def call_my_stmt(expression): "defines behaviour for my_stmt" print "my stmt called with", expression class LangletTransformer(Transformer): #transform def my_stmt(self, node): _expr = find_node(node, symbol.expr) return any_stmt(CST_CallFunc("call_my_stmt", [_expr])) __publish__ = ["call_my_stmt"] iv) cd to langlets/mystmts and type python run_mystmts.py Now a session shall be started and the newly defined statement can be used: __________________________________________________________________________________ mystmts On Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] __________________________________________________________________________________ my> mystatement 40+2 my stmt called with 42 Quite a few steps to come to a trivial statement, right? There isn't an API yet that lets one define simple things without having to care about grammars. 
But EE is very reliable modulo some bugs. So it's just a matter of time that an API emerges that lets programmers define convenient stuff like infix operators or small statements using just convenient OO programming. For more complex things like embedding whole languages in Python by means of building a langlet there is no way of going around a full grammar approach.
Here's a very simple but crappy way to add new statements, in interactive mode only. I'm using it for little 1-letter commands for editing gene annotations using only sys.displayhook, but just so I could answer this question I added sys.excepthook for the syntax errors as well. The latter is really ugly, fetching the raw code from the readline buffer. The benefit is, it's trivially easy to add new statements this way.

    jcomeau@intrepid:~/$ cat demo.py; ./demo.py
    #!/usr/bin/python -i
    'load everything needed under "package", such as package.common.normalize()'
    import os, sys, readline, traceback
    if __name__ == '__main__':
        class t:
            @staticmethod
            def localfunction(*args):
                print 'this is a test'
                if args:
                    print 'ignoring %s' % repr(args)

        def displayhook(whatever):
            if hasattr(whatever, 'localfunction'):
                return whatever.localfunction()
            else:
                print whatever

        def excepthook(exctype, value, tb):
            if exctype is SyntaxError:
                index = readline.get_current_history_length()
                item = readline.get_history_item(index)
                command = item.split()
                print 'command:', command
                if len(command[0]) == 1:
                    try:
                        eval(command[0]).localfunction(*command[1:])
                    except:
                        traceback.print_exception(exctype, value, tb)
            else:
                traceback.print_exception(exctype, value, tb)

        sys.displayhook = displayhook
        sys.excepthook = excepthook
    >>> t
    this is a test
    >>> t t
    command: ['t', 't']
    this is a test
    ignoring ('t',)
    >>> ^D
I've found a guide on adding new statements: https://troeger.eu/files/teaching/pythonvm08lab.pdf Basically, to add new statements, you must edit Python/ast.c (among other things) and recompile the python binary. While it's possible, don't. You can achieve almost everything via functions and classes (which won't require people to recompile Python just to run your script).
It's possible to do this using EasyExtend: EasyExtend (EE) is a preprocessor generator and metaprogramming framework written in pure Python and integrated with CPython. The main purpose of EasyExtend is the creation of extension languages i.e. adding custom syntax and semantics to Python.
It's not exactly adding new statements to the language syntax, but macros are a powerful tool: https://github.com/lihaoyi/macropy
Some things can be done with decorators. Let's assume, for example, that Python had no with statement. We could then implement similar behaviour like this:

    # ====== Implementation of "mywith" decorator ======
    def mywith(stream):
        def decorator(function):
            try:
                function(stream)
            finally:
                stream.close()
        return decorator

    # ====== Using the decorator ======
    @mywith(open("test.py","r"))
    def _(infile):
        for l in infile.readlines():
            print(">>", l.rstrip())

It is a pretty unclean solution, however, as done here. In particular, the behaviour where the decorator calls the function and sets _ to None is unexpected. For clarification, this decorator is equivalent to writing

    def _(infile): ...
    _ = mywith(open(...))(_) # the inner decorator returns None.

and decorators are normally expected to modify, not to execute, functions. I used such a method before in a script where I had to temporarily set the working directory for several functions.
OUTDATED: The Logix project is now deprecated and no longer developed, per the Logix website. There is a language based on python called Logix with which you CAN do such things. It hasn't been under development for a while, but the features that you asked for do work with the latest version.
Not without modifying the interpreter. I know a lot of languages in the past several years have been described as "extensible", but not in the way you're describing. You extend Python by adding functions and classes.
Ten years ago you couldn't, and I doubt that's changed. However, it wasn't that hard to modify the syntax back then if you were prepared to recompile python, and I doubt that's changed, either.
Generating Python code with Hy macros
I am trying to generate some python code from Hy. How is that done better? I have tried several approaches. One is with a macro: (defmacro make-vars [data] (setv res '()) (for [element data] (setv varname (HySymbol (+ "var" (str element)))) (setv res (cons `(setv ~varname 0) res))) `(do ~#res)) Then after capturing macroexpansion, I print python disassembly of a code. However, it seems that with macros I am unable to pass variables, so that: (setv vnames [1 2 3]) (make-vars vnames) defines varv, varn, vara and so on, in stead of var1, var2, var3. It seems that correct invocation could be made with: (macroexpand `(make-vars ~vnames)) but that seems to be excessively complex. Other issue I have encountered is the necessity of HySymbol, which came as a big surprise. But I have really been hurt by that, when I tried second approach, where I made a function that returns quoted forms: (defn make-faction-detaches [faction metadata unit-types] (let [meta-base (get metadata "Base") meta-pattern (get metadata "Sections") class-cand [] class-def '() class-grouping (dict)] (for [(, sec-name sec-flag) (.iteritems meta-pattern)] ;; if section flag is set but no unit types with the section are found, break and return nothing (print "checking" sec-name) (if-not (or (not sec-flag) (any (genexpr (in sec-name (. ut roles)) [ut unit-types]))) (break) ;; save unit types for section (do (print "match for section" sec-name) (setv sec-grouping (list-comp ut [ut unit-types] (in sec-name (. ut roles)))) (print (len sec-grouping) "types found for section" sec-name) (when sec-grouping (assoc class-grouping sec-name sec-grouping)))) ;; in case we finished the cycle (else (do (def class-name (.format "{}_{}" (. meta-base __name__) (fix-faction-string faction)) army-id (.format "{}_{}" (. meta-base army_id) (fix-faction-string faction)) army-name (.format "{} ({})" (fix-faction-name faction) (. 
meta-base army_name))) (print "Class name is" class-name) (print "Army id is" army-id) (print "Army name is" army-name) (setv class-cand [(HySymbol class-name)]) (setv class-def [`(defclass ~(HySymbol class-name) [~(HySymbol (. meta-base __name__))] [army_name ~(HyString army-name) faction ~(HyString faction) army_id ~(HyString army-id)] (defn --init-- [self] (.--init-- (super) ~(HyDict (interleave (genexpr (HyString k) [k class-grouping]) (cycle [(HyInteger 1)])))) ~#(map (fn [key] `(.add-classes (. self ~(HySymbol key)) ~(HyList (genexpr (HySymbol (. ut __name__)) [ut (get class-grouping key)])))) class-grouping)))])))) (, class-def class-cand))) That function takes metadata that looks like this in python: metadata = [ {'Base': DetachPatrol, 'Sections': {'hq': True, 'elite': False, 'troops': True, 'fast': False, 'heavy': False, 'fliers': False, 'transports': False}}] And takes a list of classes that have form of: class SomeSection(object): roles = ['hq'] It required extensive usage of internal classes of hy, and I failed to properly represent True and False, resorting to HyInteger(1) and HyInteger(0) instead. To get python code from this function, I run its result through disassemble. To summarise: What would be the best way to generate python code from Hy? What is internal representation for True and False? Can one call a function that processes its parameters and returns a quoted Hy form from a macro and how?
In Hy you generally don't need to generate Python code, since Hy is much better at generating Hy code, and it is just as executable. This is done all the time in Hy macros. In the unusual case that you need to generate real Python and not just Hy, the best way is with strings, the same way you'd do it in Python. Hy compiles to Python's AST, not to Python itself. The disassembler is really just for debugging purposes. It doesn't always generate valid Python:

    => (setv +!#$ 42)
    => +!#$
    42
    => (disassemble '(setv +!#$ 42) True)
    '+!#$ = 42'
    => (exec (disassemble '(setv +!#$ 42) True))
    Traceback (most recent call last):
      File "/home/gilch/repos/hy/hy/importer.py", line 193, in hy_eval
        return eval(ast_compile(expr, "<eval>", "eval"), namespace)
      File "<eval>", line 1, in <module>
      File "<string>", line 1
        +!#$ = 42
         ^
    SyntaxError: invalid syntax
    => (exec "spam = 42; print(spam)")
    42

The variable name +!#$ is just as legal as spam is in the AST, but Python's exec chokes on it because it is not a valid Python identifier. If you understand and are okay with this limitation, you can use disassemble, but without macros. Ordinary runtime functions are allowed to take and generate (as you demonstrated) Hy expressions. Macros are really just functions like this that run at compile time. It's not unusual in Hy for a macro to delegate part of its work to an ordinary function that takes a Hy expression as one of its arguments and returns a Hy expression. The easiest way to create a Hy expression as data is to quote it with '. The backtick syntax for interpolating values is also valid even outside the body of a macro. You can use this in normal runtime functions too. But understand, you must insert quoted forms into the interpolation if you want to disassemble it, because that's what a macro would receive as arguments: the code itself, not its evaluated values. That's why you're using HySymbol and friends. => (setv class-name 'Foo) ; N.B. 
'Foo is quoted => (print (disassemble `(defclass ~class-name) True)) class Foo: pass You can ask the REPL what types it uses for quoted forms. => (type 1) <class 'int'> => (type '1) <class 'hy.models.HyInteger'> => (type "foo!") <class 'str'> => (type '"foo!") <class 'hy.models.HyString'> => (type True) <class 'bool'> => (type 'True) <class 'hy.models.HySymbol'> As you can see, True is just a symbol internally. Note that I was able to generate a HySymbol with just ', without using the HySymbol call. If your metadata file was written in Hy and made with quoted Hy forms in the first place, you wouldn't have to convert them. But there's no reason it has to be done at the last minute inside the backtick form. That could be done in advance by a helper function if that's what you'd prefer. Followup Can one call a function that processes its parameters and returns a quoted Hy form from a macro and how? My original point was that a macro is the wrong tool for what you're trying to do. But to clarify, you can call a macro at runtime, by using macroexpand, as you already demonstrated. You can, of course, put the macroexpand call inside another function, but macroexpand must have a quoted form as its argument. Also, the same question about dynamically generated dictionaries. Construction I have used looks horrible. The dictionary part could be simplified to something more like {~#(interleave (map HyString class-grouping) (repeat '1))} While Python's dict is backed by a hash table, Hy's HyDict model is really just a list. This is because it doesn't represent the hash table itself, but the code that produces the dict. That's why you can splice into it just like a list. However if possible, could you add an example of properly passing dynamically generated strings into the final quoted expression? As far as I understand, it can be done with adding one more assignment (that would add quotation), but is there a more elegant way? 
Hy's models are considered part of the public API, they're just not used much outside of macros. It's fine to use them when you need to. Other Lisps don't make the same kind of distinction between code model objects and the data they produce. Hy does it this way for better Python interop. One could argue that the ~ syntax should do this conversion automatically for certain datatypes, but at present, it doesn't. [Update: On the current master branch, Hy's compiler will auto-wrap compatible values in a Hy model when it can, so you usually don't have to do this yourself anymore.] HySymbol is appropriate for dynamically generating symbols from strings like you're trying to do. It's not the only way, but it's what you want in this case. The other way, gensym, is used more often in macros, but they can't be as pretty. You can call gensym with a string to give it a more meaningful name for debugging purposes, but it still has a numeric suffix to make it unique. You could, of course, assign HySymbol a shorter alias, or delegate that part to a helper function. You can also convert it in advance, for example, the fragment (def class-name (.format "{}_{}" (. meta-base __name__) ... Could instead be (def class-name (HySymbol (.format "{}_{}" (. meta-base __name__) ... Then you don't have to do it twice. (setv class-cand [class-name]) (setv class-def [`(defclass ~class-name ... That probably makes the template easier to read. Update Hy master now mangles symbols to valid Python identifiers on compilation, so the hy2py tool and the astor disassembly should more reliably generate valid Python code, even if there are special characters in the symbols.
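The point about names that are legal in the AST but not in Python source can be reproduced from plain Python with the ast module. This is a sketch that relies on CPython not checking identifier spelling when it compiles an AST; assign_via_ast is a hypothetical helper name:

```python
import ast

def assign_via_ast(name, value):
    """Compile and run `name = value` from an AST, bypassing the parser."""
    module = ast.Module(
        body=[ast.Assign(targets=[ast.Name(id=name, ctx=ast.Store())],
                         value=ast.Constant(value=value))],
        type_ignores=[],
    )
    ast.fix_missing_locations(module)
    namespace = {}
    exec(compile(module, "<ast>", "exec"), namespace)
    return namespace

# Works: the name is stored as-is in the namespace dictionary.
namespace = assign_via_ast("+!#$", 42)
print(namespace["+!#$"])

# The same assignment written as source text is rejected by the parser.
try:
    compile("+!#$ = 42", "<string>", "exec")
    parsed = True
except SyntaxError:
    parsed = False
print(parsed)
```

This is the gap that makes disassembled output unreliable as Python source unless symbols are mangled to valid identifiers first.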
Scripts that Rely on Docstrings?
From the 2.7.2 docs, Section 6, Modules:
Passing two -O flags to the Python interpreter (-OO) will cause the bytecode compiler to perform optimizations that could in some rare cases result in malfunctioning programs. Currently only __doc__ strings are removed from the bytecode, resulting in more compact .pyo files.
This got my attention:
Since some programs may rely on having these available, you should only use this option if you know what you’re doing.
Are there any cases where removing a script's docstrings might logically break some dependency or other aspect of a code's functionality, disregarding any syntactic errors?
EDIT
Why would removing comments break a help statement? It doesn't seem to do so in the interpreter.
>>> help('import_pi')
Help on module import_pi:
NAME
    import_pi
FILE
    /home/droogans/py/import_pi.py
FUNCTIONS
    print_pi()
DATA
    pi = 3.1415926535897931
>>> import import_pi
>>> import_pi.__doc__
>>>
>>> print import_pi.print_pi.__doc__
Convert a string or number to a floating point number, if possible.
For example, ply is a lexing and parsing module that uses docstrings to describe the grammar. Stripping the docstrings would break any code built on it.
The -OO option only affects whether a docstring is stored; it does not affect parsing. For example, the following code works with and without optimizations enabled:
def f():
    'Empty docstring'
assert f() is None
The programs that will break with the docstring optimization enabled are the ones that rely on the contents of the docstring. Paul Hankin mentioned Ply, which is a tool that uses docstrings for its dispatch logic. Another example is the doctest module, which uses the contents of docstrings for tests.
Here's a simple example of code that won't work with the -OO optimization enabled:
def f():
    '30 + 40'
    return eval(f.__doc__)
print f()
Note that help() will still work with the -OO optimization enabled, but it will only find the function name, arguments, and module, not the docstring:
>>> help(f)
Help on function f in module __main__:
f()
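The eval example can be reproduced without restarting the interpreter. The following is a minimal sketch in Python 3 syntax, assuming that the built-in compile() with optimize=2 strips docstrings the same way the -OO flag does; the names SOURCE and load_f are made up for the illustration.

```python
# Source of a function whose behaviour depends on its own docstring.
SOURCE = '''
def f():
    "30 + 40"
    return eval(f.__doc__)
'''

def load_f(optimize):
    """Compile SOURCE at the given optimization level and return f."""
    namespace = {}
    code = compile(SOURCE, "<demo>", "exec", optimize=optimize)
    exec(code, namespace)
    return namespace["f"]

normal = load_f(0)      # docstrings kept, like plain `python`
stripped = load_f(2)    # docstrings removed, like `python -OO`

print(normal.__doc__)    # the docstring text
print(normal())          # evaluates the docstring
print(stripped.__doc__)  # None: nothing was stored

try:
    stripped()
except TypeError as exc:
    # eval(None) is rejected, so the function breaks only when optimized
    print("broken under optimize=2:", exc)
```

Both functions come from the same source; only the stored docstring differs, which is why the failure shows up at run time rather than at parse time.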
Conditional compilation in Python
How do I do conditional compilation in Python? Is it done using DEF?
Python isn't compiled in the same sense as C or C++ or even Java. Python files are compiled "on the fly"; you can think of it as being similar to an interpreted language like Basic or Perl.[1]
You can do something equivalent to conditional compilation by just using an if statement. For example:
if FLAG:
    def f():
        print "Flag is set"
else:
    def f():
        print "Flag is not set"
You can do the same for the creation of classes, the setting of variables, and pretty much everything else.
The closest way to mimic IFDEF would be to use the hasattr function. E.g.:
if hasattr(aModule, 'FLAG'):
    # do stuff if FLAG is defined in aModule
    pass
You could also use a try/except clause to catch name errors, but the idiomatic way would be to set a variable to None at the top of your script.
[1]: Python code is byte-compiled into an intermediate form, like Java; however, there generally isn't a separate compilation step. The "raw" source files that end in .py are executable.
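As a rough illustration of the hasattr check and the None-default idiom, here is a small sketch in Python 3 syntax. The settings module, its FLAG attribute, and the LOG_FILE name are all hypothetical; a module object is built by hand so the snippet runs on its own.

```python
import types

# Hypothetical stand-in for an imported module; a real script would
# write `import settings` (an assumed name) instead.
settings = types.ModuleType("settings")
settings.FLAG = True

# IFDEF-style check: pick a definition based on whether FLAG exists.
if hasattr(settings, "FLAG"):
    def f():
        return "FLAG is defined"
else:
    def f():
        return "FLAG is not defined"

# None-default idiom: an unset option reads as None instead of
# raising AttributeError or NameError later on.
log_file = getattr(settings, "LOG_FILE", None)
log_target = "console" if log_file is None else "file"

print(f())
print(log_target)
```

The if statement runs once, when the script or module is executed, so only one of the two definitions of f ever exists.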
There is actually a way to get conditional compilation, but it's very limited.
if __debug__:
    doSomething()
The __debug__ flag is a special case. When Python is called with the -O or -OO option, __debug__ will be False, and the compiler will ignore that statement. This is used primarily with asserts, which is why assertions go away if you 'really compile' your scripts with optimization.
So if your goal is to add debugging code, but prevent it from slowing down or otherwise affecting a 'release' build, this does what you want. But you cannot assign a value to __debug__, so that's about all you can use it for.
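The effect of the flag can be observed from a single interpreter session. This sketch, in Python 3 syntax, assumes that compile() with optimize=1 behaves like the -O command-line option; SOURCE, load, mode, and checked are names invented for the example.

```python
SOURCE = '''
def mode():
    if __debug__:
        return "debug"
    return "release"

def checked(x):
    assert x > 0, "x must be positive"
    return x
'''

def load(optimize):
    """Compile SOURCE at the given level and return its namespace."""
    namespace = {}
    exec(compile(SOURCE, "<demo>", "exec", optimize=optimize), namespace)
    return namespace

normal = load(0)      # like plain `python`
optimized = load(1)   # like `python -O`

print(normal["mode"]())         # the __debug__ branch is taken
print(optimized["mode"]())      # the __debug__ branch is skipped
print(optimized["checked"](-5)) # the assert is gone, so -5 passes through

try:
    normal["checked"](-5)
except AssertionError as exc:
    print("assert still active without -O:", exc)
```

Because __debug__ is fixed when the code is compiled, the decision is made once per compilation, not on every call.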
Use pypreprocessor, which can also be found on PyPI (the Python Package Index) and can be installed using pip.
The basic example of usage is:
from pypreprocessor import pypreprocessor
pypreprocessor.parse()
#define debug
#ifdef debug
print('The source is in debug mode')
#else
print('The source is not in debug mode')
#endif
You can also output the postprocessed code to a file by specifying
pypreprocessor.output = 'output_file_name.py'
anywhere between the pypreprocessor import and the call to parse().
The module is essentially a Python implementation of C preprocessor conditional compilation.
SideNote: This is compatible with both Python 2.x and Python 3k.
Disclaimer: I'm the author of pypreprocessor.
Update: I forgot to mention this before. Unlike the if/else or if __debug__: approaches described in other answers, this is a true preprocessor. The bytecode produced will not contain the code that is conditionally excluded.
Python compiles a module automatically when you import it, so the only way to avoid compiling it is to not import it. You can write something like:
if some_condition:
    import some_module
But that would only work for complete modules.
In C and C++ you typically use a preprocessor for conditional compilation. There is nothing stopping you from using a preprocessor on your Python code, so you could write something like:
#ifdef SOME_CONDITION
def some_function():
    pass
#endif
Run that through a C preprocessor and you'd have real conditional compilation, and some_function will only be defined if SOME_CONDITION is defined.
BUT (and this is important): Conditional compilation is probably not what you want. Remember that when you import a module, Python simply executes the code in it. The def and class statements in the module are actually executed when you import the module. So the typical way of implementing what other languages would use conditional compilation for is just a normal if statement, like:
if some_condition:
    def some_function():
        pass
This will only define some_function if some_condition is true. It's stuff like this that makes dynamic languages so powerful while remaining conceptually simple.
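A conditional import might look like the sketch below. The condition used here, a version check guarding the standard-library tomllib module (available from Python 3.11), is only an example chosen so the snippet runs anywhere; HAVE_TOML and config are illustrative names.

```python
import sys

# The import statement is executed only when the condition holds, so on
# older interpreters tomllib is never looked up, compiled, or loaded.
if sys.version_info >= (3, 11):
    import tomllib
    HAVE_TOML = True
else:
    tomllib = None
    HAVE_TOML = False

# Later code branches on the flag instead of repeating the version test.
if HAVE_TOML:
    config = tomllib.loads("debug = true")
else:
    config = {}

print(HAVE_TOML, config)
```

The same pattern is often written with try/except ImportError when the condition is simply whether an optional package is installed.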
Doesn't make much sense in a dynamic environment. If you are looking for conditional definition of functions, you can use if:
if happy:
    def makemehappy():
        return "I'm good"
You could use the method discussed in "Determine if variable is defined in Python" as a substitute for #ifdef.
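One common way to test whether a name is defined is to reference it and catch NameError. The sketch below shows that approach under the assumption that it is what the linked question describes; FEATURE_X and DEBUG are hypothetical names.

```python
# FEATURE_X is deliberately never assigned, so referencing it raises
# NameError, which plays the role of "#ifndef FEATURE_X".
try:
    FEATURE_X
except NameError:
    feature_x_defined = False
else:
    feature_x_defined = True

# DEBUG is assigned first, so the same check now succeeds.
DEBUG = True
try:
    DEBUG
except NameError:
    debug_defined = False
else:
    debug_defined = True

print(feature_x_defined, debug_defined)
```

Unlike #ifdef, this check happens at run time, so the name can also be defined or deleted between two checks.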