For one of my projects I have a Python program built around the cmd.Cmd class from the standard library. This allowed me to craft a mini language around the SQL statements that I was sending to a database. Besides making it far easier to connect with Python, I could do things that SQL can't do. This was very important for several projects. However, I now need to add if blocks for greater control flow.
My current thinking is that I will just add two new commands to the language, IF and END. These set a variable which determines whether or not to skip a line. I would like to know if anyone else has done this with the cmd module, and if so, is there a standard method I'm missing? Google doesn't seem to reveal anything, and the cmd docs don't reveal anything either.
For an idea that's similar to what I'm doing, go here. Questions and comments welcome. :)
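The IF/END idea described above can be sketched roughly as follows. This is only a minimal illustration, not a standard cmd recipe: the class name MiniLang, the SET and SAY commands, and the rule that a condition is "true" when a flag of that name has been set are all made up for the example.

```python
import cmd


class MiniLang(cmd.Cmd):
    """Hypothetical mini language with IF/END blocks that skip lines."""

    def __init__(self):
        super().__init__()
        self.flags = set()      # names that count as "true" in an IF
        self.skip_depth = 0     # > 0 while inside a block whose IF was false
        self.executed = []      # what SAY actually ran, for inspection

    def onecmd(self, line):
        words = line.split(None, 1)
        word = words[0] if words else ""
        if self.skip_depth:
            # While skipping, only track nesting; run nothing.
            if word == "IF":
                self.skip_depth += 1
            elif word == "END":
                self.skip_depth -= 1
            return False
        return super().onecmd(line)

    def do_SET(self, arg):
        self.flags.add(arg.strip())

    def do_SAY(self, arg):
        self.executed.append(arg)

    def do_IF(self, arg):
        if arg.strip() not in self.flags:
            self.skip_depth = 1

    def do_END(self, arg):
        pass  # closes an IF whose condition was true


script = ["SET debug", "IF debug", "SAY yes", "END",
          "IF missing", "SAY no", "IF debug", "SAY nested", "END", "END",
          "SAY after"]
lang = MiniLang()
for script_line in script:
    lang.onecmd(script_line)
```

Counting depth instead of using a single boolean lets nested IF blocks inside a skipped block close correctly.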
Hmm, a little more complicated than what I was thinking, though having python syntax would be nice. I debated building a mini language for quite some time before I finally did it. The problem primarily comes in from the external limitations. I have a bunch of "data", which is being generous, to turn into sql. This is based on other "data" that won't pass through. It's also unique to each specific "version" of the problem. Doing straight data to sql would have been my first inclination, but was not practical.
For the curious, I spent a great deal of time going over the mini languages chapter in the art of unix programming, found here.
If I had built the thing in pure python, I wouldn't have had the flexibility I absolutely needed for the problem set.
The limitations of making a "mini language" have become apparent.
Proper languages have a tree-like structure and more complex syntax than cmd can handle easily.
Sometimes it's actually easier to use Python directly than it is to invent your own DSL.
Currently, your DSL probably reads a script-like file of commands.
Because of the way cmd works, your little commands get a string argument, which must be parsed. Then the command gets executed. And, further, each command is a method of the cmd.Cmd subclass.
Here's what you can do.
Each do_foo( self, args ) method becomes a stand-alone callable object. It will follow the Command design pattern. It will do exactly what the method function does now. No less. Exactly the same.
class Foo( object ):
    def __init__( self, global_context ):
        self.context= global_context
    def __call__( self, args ):
        ... The rest of do_foo ...
Additionally, your existing cmd.Cmd subclass probably maintains some internal state.
All of the self.this and self.that instance variables must be changed to reference an explicit context object.
class Context( object ): pass
Change self.this or self.that to self.context.this or self.context.that
Now, you can create your context and your various commands.
ctx = Context()
foo= Foo(ctx)
Your script changes syntax slightly. From:
foo argstring
bar argstring
to:
from mylanguage import foo, bar
foo( "argstring" )
bar( "argstring" )
This does Exactly what the CLI does now. No more. No less. Exactly the same. Slightly different syntax.
Now your script is no longer in a DSL that's hard to expand. It's in Python.
Having done that, you can now use Python syntax if statements.
You have the Exact functionality currently implemented in cmd with better syntax.
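Put together, the steps above might look like the sketch below. The body of Foo.__call__ and the log attribute on Context are invented for illustration; in a real conversion they would be whatever do_foo did and whatever state the cmd.Cmd subclass held.

```python
class Context(object):
    """Shared state that used to live on the cmd.Cmd subclass."""

    def __init__(self):
        self.log = []


class Foo(object):
    """Stand-alone replacement for a hypothetical do_foo method."""

    def __init__(self, global_context):
        self.context = global_context

    def __call__(self, args):
        # Stand-in for "the rest of do_foo": record the call, return a value.
        self.context.log.append("foo " + args)
        return len(args)


ctx = Context()
foo = Foo(ctx)

# The "script" is now ordinary Python, so control flow comes for free.
if foo("first") > 3:
    foo("second")
```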
After examining the problem set some more, I've come to the conclusion that I can leave the minilanguage alone. It has all the features I need, and I don't have the time to rebuild the project from the ground up. This has been an interesting problem and I'm no longer sure I would build another minilanguage if I encountered the same situation. OTOH, it works very well here, and I am loath to give up the advantages it has conferred.
I've seen this multiple times in multiple places, but never have found a satisfying explanation as to why this should be the case.
So, hopefully, one will be presented here. Why should we (at least, generally) not use exec() and eval()?
EDIT: I see that people are assuming that this question pertains to web servers – it doesn't. I can see why an unsanitized string being passed to exec could be bad. Is it bad in non-web-applications?
There are often clearer, more direct ways to get the same effect. If you build a complex string and pass it to exec, the code is difficult to follow, and difficult to test.
Example: I wrote code that read in string keys and values and set corresponding fields in an object. It looked like this:
for key, val in values:
    fieldName = valueToFieldName[key]
    fieldType = fieldNameToType[fieldName]
    if fieldType is int:
        s = 'object.%s = int(%s)' % (fieldName, val)
    #Many clauses like this...
    exec(s)
That code isn't too terrible for simple cases, but as new types cropped up it got more and more complex. When there were bugs they always triggered on the call to exec, so stack traces didn't help me find them. Eventually I switched to a slightly longer, less clever version that set each field explicitly.
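For comparison, here is one way the same job can be done without exec. This is not the explicit version mentioned above, just a sketch using setattr and a table of converter callables; the Record class and the contents of the two lookup tables are invented for the example.

```python
class Record(object):
    """Hypothetical target object whose fields get filled in."""


# Assumed lookup tables, standing in for valueToFieldName / fieldNameToType.
value_to_field_name = {"n": "count", "t": "title"}
field_name_to_type = {"count": int, "title": str}


def set_fields(obj, values):
    """Convert each value with its field's type and assign it directly."""
    for key, val in values:
        field_name = value_to_field_name[key]
        convert = field_name_to_type[field_name]
        setattr(obj, field_name, convert(val))
    return obj


record = set_fields(Record(), [("n", "42"), ("t", "hello")])
```

If a conversion fails here, the traceback points at the convert(val) call with real values in scope, rather than at a generated string.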
The first rule of code clarity is that each line of your code should be easy to understand by looking only at the lines near it. This is why goto and global variables are discouraged. exec and eval make it easy to break this rule badly.
When you need exec and eval, yeah, you really do need them.
But, the majority of the in-the-wild usage of these functions (and the similar constructs in other scripting languages) is totally inappropriate and could be replaced with other simpler constructs that are faster, more secure and have fewer bugs.
You can, with proper escaping and filtering, use exec and eval safely. But the kind of coder who goes straight for exec/eval to solve a problem (because they don't understand the other facilities the language makes available) isn't the kind of coder that's going to be able to get that processing right; it's going to be someone who doesn't understand string processing and just blindly concatenates substrings, resulting in fragile insecure code.
It's the Lure Of Strings. Throwing string segments around looks easy and fools naïve coders into thinking they understand what they're doing. But experience shows the results are almost always wrong in some corner (or not-so-corner) case, often with potential security implications. This is why we say eval is evil. This is why we say regex-for-HTML is evil. This is why we push SQL parameterisation. Yes, you can get all these things right with manual string processing... but unless you already understand why we say those things, chances are you won't.
eval() and exec() can promote lazy programming. More importantly, their use indicates that the code being executed may not have been written at design time and therefore was never tested. In other words, how do you test dynamically generated code? Especially across browsers.
Security aside, eval and exec are often marked as undesirable because of the complexity they induce. When you see an eval call you often don't know what's really going on behind it, because it acts on data that's usually in a variable. This makes code harder to read.
Invoking the full power of the interpreter is a heavy weapon that should be only reserved for very tricky cases. In most cases, however, it's best avoided and simpler tools should be employed.
That said, like all generalizations, be wary of this one. In some cases, exec and eval can be valuable. But you must have a very good reason to use them. See this post for one acceptable use.
In contrast to what most answers are saying here, exec is actually part of the recipe for building super-complete decorators in Python, as you can duplicate everything about the decorated function exactly, producing the same signature for the purposes of documentation and such. It's key to the functionality of the widely used decorator module (http://pypi.python.org/pypi/decorator/). Another case where exec/eval are essential is when constructing any kind of "interpreted Python" type of application, such as a Python-parsed template language (like Mako or Jinja).
So it's not like the presence of these functions is an immediate sign of an "insecure" application or library. Using them in the naive javascripty way to evaluate incoming JSON or something, yes, that's very insecure. But as always, it's all in the way you use them, and these are very essential functions.
I have used eval() in the past (and still do from time-to-time) for massaging data during quick and dirty operations. It is part of the toolkit that can be used for getting a job done, but should NEVER be used for anything you plan to use in production such as any command-line tools or scripts, because of all the reasons mentioned in the other answers.
You cannot trust your users--ever--to do the right thing. In most cases they will, but you have to expect them to do all of the things you never thought of and find all of the bugs you never expected. This is precisely where eval() goes from being a tool to a liability.
A perfect example of this would be using Django, when constructing a QuerySet. A query accepts keyword arguments that look something like this:
results = Foo.objects.filter(whatever__contains='pizza')
If you're programmatically assigning arguments, you might think to do something like this:
results = eval("Foo.objects.filter(%s__%s=%s)" % (field, matcher, value))
But there is always a better way that doesn't use eval(), which is building a dictionary and unpacking it into keyword arguments with **:
results = Foo.objects.filter( **{'%s__%s' % (field, matcher): value} )
By doing it this way, it's not only faster performance-wise, but also safer and more Pythonic.
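The unpacking trick does not depend on Django. The sketch below uses a made-up stand-in function, fake_filter, that simply returns the keyword arguments it receives, to show what the ** call actually passes.

```python
def fake_filter(**lookups):
    """Stand-in for a QuerySet-style filter(); returns its keyword arguments."""
    return lookups


field, matcher, value = "whatever", "contains", "pizza"

# Build the keyword name at runtime, then unpack the dict into the call.
received = fake_filter(**{"%s__%s" % (field, matcher): value})
```

Because value is passed as an object and never spliced into source code, nothing a user types into it can be executed.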
Moral of the story?
Use of eval() is ok for small tasks, tests, and truly temporary things, but bad for permanent usage because there is almost certainly always a better way to do it!
Allowing these functions in a context where they might run user input is a security issue, and sanitizers that actually work are hard to write.
Same reason you shouldn't login as root: it's too easy to shoot yourself in the foot.
Don't try to do the following on your computer:
s = "import shutil; shutil.rmtree('/nonexisting')"
exec(s)  # exec, not eval: eval() only accepts a single expression
Now assume somebody can control s from a web application, for example.
Reason #1: One security flaw (ie. programming errors... and we can't claim those can be avoided) and you've just given the user access to the shell of the server.
Try this in the interactive interpreter and see what happens:
>>> import sys
>>> eval('{"name" : %s}' % ("sys.exit(1)"))
Of course, this is a corner case, but it can be tricky to prevent things like this.
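When the input is only supposed to be data (a dict, list, number or string literal), the standard library's ast.literal_eval is one way to avoid this. A minimal sketch, with a made-up helper name parse_literal:

```python
import ast


def parse_literal(text):
    """Parse a Python literal; return None if text is anything else."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None


safe = parse_literal('{"name": "pizza"}')
# The function call inside the dict is rejected instead of being executed.
unsafe = parse_literal('{"name": sys.exit(1)}')
```

literal_eval only accepts literals, so it is not a general replacement for eval, just a safer option for this kind of input.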
I'm learning Python and ran into a situation where I need to change the behaviour of a function. I'm originally a Java programmer, and in the Java world a change to a function would make Eclipse show errors in every source file that uses it. That way I know which files need to be modified. But how would one do such a thing in Python, considering there are no types?! I'm using TextMate2 for Python coding.
Currently I'm doing it the brute-force way: opening every Python script file, checking where I'm using that function, and then modifying it. But I'm sure this is not the way to deal with large projects!!!
Edit: as an example, I define a class called Graph in a Python script file. Graph has two instance variables. I created many objects (each with a different name!!!) of this class in many script files and then decided that I want to change the names of the instance variables! Now I'm going through each file and reading my code again in order to change the names :(. PLEASE help!
Example: File A has objects x,y,z of class C. File B has objects xx,yy,zz of class C. Class C has two instance variable names that should be changed: Foo to Poo and Foo1 to Poo1. Also consider many files like A and B. What would you do to solve this? Are you seriously going to open each file and search for x,y,z,xx,yy,zz and then change the names individually?!!!
Sounds like you can only code inside an IDE!
Two steps to free yourself from your IDE and become a better programmer.
Write unit tests for your code.
Learn how to use grep
Unit tests will exercise your code and provide reassurance that it is always doing what you wanted it to do. They make refactoring MUCH easier.
grep, what a wonderful tool: grep -R 'my_function_name' src will find every reference to your function in files under the directory src.
Also, see this rather wonderful blog post: Unix as an IDE.
Whoa, slow down. The coding process you described is not scalable.
How exactly did you change the behavior of the function? Give specifics, please.
UPDATE: This all sounds like you're trying to implement a class and its methods by cobbling together a motley patchwork of functions and local variables - like I wrongly did when I first learned OO coding in Python. The code smell is that when the type/class of some class internal changes, it should generally not affect the class methods. If you're refactoring all your code every 10 mins, you're doing something seriously wrong. Step back and think about clean decomposition into objects, methods and data members.
(Please give more specifics if you want a more useful answer.)
If you were only changing input types, there might be no need to change the calling code.
(Unless the new fn does something very different to the old one, in which case what was the argument against calling it a different name?)
If you changed the return type, and you can't find a common ancestor type or container (tuple, sequence etc.) to put the return values in, then yes you need to change its caller code. However...
...however if the function should really be a method of a class, declare that class and the method already. The previous paragraph was a code smell that your function really should have been a method, specifically a polymorphic method.
Read about code smells, anti-patterns and When do you know you're dealing with an anti-pattern?. There e.g. you will find a recommendation for the video "Recovery from Addiction - A taste of the Python programming language's concision and elegance from someone who once suffered an addiction to the Java programming language." - Sean Kelly
Also, sounds like you want to use Test-Driven Design and add some unittests.
If you give us the specifics we can critique it better.
You won't get this functionality in a text editor. I use Sublime Text 3, and I love it, but it doesn't have this functionality. It does however jump to files and functions via its 'Goto Anything' (Ctrl+P) functionality, and its Multiple Selections / Multi Edit is great for small refactoring tasks.
However, when it comes to IDEs, JetBrains PyCharm has some of the amazing refactoring tools that you might be looking for.
The also free Python Tools for Visual Studio (see free install options here which can use the free VS shell) has some excellent refactoring capabilities and a superb REPL to boot.
I use all three. I spend most of my time in Sublime Text, I like PyCharm for refactoring, and I find PT4VS excellent for very involved prototyping.
Despite python being a dynamically typed language, IDEs can still introspect to a reasonable degree. But, of course, it won't approach the level of Java or C# IDEs. Incidentally, if you are coming over from Java, you may have come across JetBrains IntelliJ, which PyCharm will feel almost identical to.
One's programming style is certainly different between a statically typed language like C# and a dynamic language like python. I find myself doing things in smaller, testable modules. The iteration speed is faster. And in a dynamic language one relies less on IDE tools and more on unit tests that cover the key functionality. If you don't have these you will break things when you refactor.
One answer only specific to your edit:
If your old code was working and does not need to be modified, you could just keep the old names as aliases of the new ones, so that your old code does not break. Example:
import time

class MyClass(object):
    def __init__(self):
        self.t = time.time()

    # creating new names
    def new_foo(self, arg):
        return 'new_foo', arg

    def new_bar(self, arg):
        return 'new_bar', arg

    # now creating function aliases
    foo = new_foo
    bar = new_bar
If your code needs rework, rewrite your common code, execute everything, and correct any failures. You could also look for any import/instantiation of your class.
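The aliases above cover renamed methods. Since the edit in the question is about renamed instance variables, a similar effect can be had with a property that forwards the old name to the new one. A sketch, assuming the rename is from foo to poo on a class called Graph; the deprecation warning is optional:

```python
import warnings


class Graph(object):
    def __init__(self):
        self.poo = 0  # the new attribute name

    @property
    def foo(self):
        """Old name, kept as an alias so existing callers keep working."""
        warnings.warn("foo is renamed to poo", DeprecationWarning, stacklevel=2)
        return self.poo

    @foo.setter
    def foo(self, value):
        warnings.warn("foo is renamed to poo", DeprecationWarning, stacklevel=2)
        self.poo = value
```

Running the test suite with warnings turned into errors (python -W error) then lists every place that still uses the old name.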
One of the tradeoffs between statically and dynamically typed languages is that the latter require less scaffolding in the form of type declarations, but also provide less help with refactoring tools and compile-time error detection. Some Python IDEs do offer a certain level of type inference and help with refactoring, but even the best of them will not be able to match the tools developed for statically typed languages.
Dynamic language programmers typically ensure correctness while refactoring in one or more of the following ways:
Use grep to look for function invocation sites, and fix them. (You would have to do that in languages like Java as well if you wanted to handle reflection.)
Start the application and see what goes wrong.
Write unit tests, if you don't already have them, use a coverage tool to make sure that they cover your whole program, and run the test suite after each change to check that everything still works.
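As a slightly more precise alternative to grep for the first option, the standard ast module can list call sites by name. This is only a sketch: find_call_sites is a made-up helper, it matches on the bare name (so build(...) and graph.build(...) both count), and like grep it cannot see calls made through reflection.

```python
import ast


def find_call_sites(source, function_name):
    """Return sorted line numbers where function_name is called in source."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            # Plain calls have .id, attribute calls (obj.name(...)) have .attr.
            name = getattr(func, "id", None) or getattr(func, "attr", None)
            if name == function_name:
                lines.append(node.lineno)
    return sorted(lines)


sample = "import graph\nx = graph.build(1)\ny = build(2)\nz = other(3)\n"
call_lines = find_call_sites(sample, "build")
```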
I want to automatically add sphinx comment under head functions and classes.
When I press Enter after head function or class, comment could be implemented like this:
def func(a): #<Enter>
    """
    Args:
        a (type): The name to use.
    Returns:
        type. The return
    """
Is it possible to configure this in .vimrc (.vimrc.local)? Do you know a command for this? Or maybe a plugin?
Though you can do this with the built-in (insert-mode) mappings, you'll soon want to do more advanced insertions.
snippets are like the built-in :abbreviate on steroids, usually with parameter insertions, mirroring, and multiple stops inside them. One of the first, very famous (and still widely used) Vim plugins is snipMate (inspired by the TextMate editor); unfortunately, it's not maintained any more; though there is a fork. A modern alternative (that requires Python though) is UltiSnips. There are more, see this list on the Vim Tips Wiki.
There are two things to evaluate: First, the features of the snippet engine itself, and second, the quality and breadth of snippets provided by the author or others.
I'm developing a system that operates on (arbitrary) data from databases. The data may need some preprocessing before the system can work with it. To allow the user to specify possibly complex rules, I thought of giving the user the possibility to input Python code which is used to do this task. The system is pure Python.
My plan is to introduce the tables and columns as variables and let the user do anything Python can do (including access to the standard libs). Now to my problem:
How do I take a string (that the user entered), compile it to Python (after adding code to provide the input data) and get the output? I think the easiest way would be to use the user-entered data as the body of a method and take the return value of that function as my new data.
Is this possible? If yes, how? It's unimportant that the user may enter malicious code since the worst thing that could happen is, that he screws up his own system, which is thankfully not my problem ;)
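The "body of a method" idea from the question can be sketched like this. The helper name compile_user_function and the generated function name user_function are invented for the example, and no attempt is made to sandbox the code, in line with the assumption stated above that malicious input is not a concern.

```python
import textwrap


def compile_user_function(body, arg_names):
    """Wrap user-entered statements in a def and return the new function."""
    source = "def user_function(%s):\n%s" % (
        ", ".join(arg_names),
        textwrap.indent(body, "    "),
    )
    namespace = {}
    exec(source, namespace)  # defines user_function inside namespace
    return namespace["user_function"]


user_body = "cleaned = [v.strip() for v in names]\nreturn cleaned"
preprocess = compile_user_function(user_body, ["names"])
result = preprocess([" a ", "b "])
```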
Python provides exec() (a statement in Python 2, a built-in function in Python 3), which should do what you want. You will want to pass in the variables that you want available as the second and/or third arguments (globals and locals respectively), as those control the environment that the code is run in.
For example:
env = {'somevar': 'somevalue'}
exec(code, env)
Alternatively, execfile() can be used in a similar way if the code that you want executed is stored in its own file. (execfile() exists only in Python 2; in Python 3, read the file and pass its contents to exec().)
If you only have a single expression that you want to execute, you can also use eval.
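A small illustration of both calls. The variable names column and result are just a convention chosen for this sketch: the user code reads column and is expected to bind result.

```python
user_code = "result = [row * 2 for row in column]"

# exec runs statements; anything the code assigns ends up in env.
env = {"column": [1, 2, 3]}
exec(user_code, env)
processed = env["result"]

# eval handles a single expression and returns its value directly.
total = eval("sum(column)", {"column": [1, 2, 3]})
```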
Is this possible?
If it doesn't involve time travel, anti-gravity or perpetual motion the answer to this question is always "YES". You don't need to ask that.
The right way to proceed is as follows.
You build a framework with some handy libraries and packages.
You build a few sample applications that implement this requirement: "The data may need some preprocessing before the system can work with it."
You write documentation about how that application imports and uses modules from your framework.
You turn the framework, the sample applications and the documentation over to users to let them build these applications.
Don't waste time on "take a string (the user entered), compile it to Python (after adding code to provide the input data) and get the output".
The user should write applications like this.
from your_framework import the_file_loop

def their_function( one_line_as_dict ):
    one_line_as_dict['field']= some stuff

the_file_loop( their_function )
That can actually be the entire program.
You'll have to write the_file_loop, which will look something like this.
def the_file_loop( some_function ):
    with open('input') as source:
        with open('output', 'w') as target:  # opened for writing
            for some_line in source:
                the_data = make_a_dictionary( some_line )
                some_function( the_data )
                target.write( make_a_line( the_data ) )
By creating a framework, and allowing users to write their own programs, you'll be a lot happier with the results. Less magic.
2 choices:
You take his input and put it in a file, then you execute it.
You use exec()
If you just want to set some local values and then provide a python shell, check out the code module.
You can start an instance of a shell that is similar to the python shell, as well as initialize it with whatever local variables you want. This would assume that whatever functionality you want to use the resulting values is built into the classes you are passing in as locals.
Example:
import code
shell = code.InteractiveConsole({'foo': myVar1, 'bar': myVar2})
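A runnable variant of the same idea, with concrete values in place of myVar1 and myVar2. Here push() feeds the console one line of source, as if the user had typed it; calling interact() instead would start the actual prompt.

```python
import code

# The dict becomes the namespace the console's code runs in.
shell = code.InteractiveConsole({"foo": 10, "bar": 20})

# Feed one line of source, as if the user had typed it at the prompt.
shell.push("total = foo + bar")

# shell.interact() would hand control to the user at this point.
total = shell.locals["total"]
```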
What you actually want is exec, since eval is limited to taking an expression and returning a value. With exec, you can have code blocks (statements) and work on arbitrarily complex data, passed in as the globals and locals of the code.
The result is then returned by the code via some convention (like binding it to result).
well, you're describing compile()
But... I think I'd still implement this using regular python source files. Add a special location to the path, say '~/.myapp/plugins', and just __import__ everything there. Probably you'll want to provide some convenient base classes that expose the interface you're trying to offer, so that your users can inherit from them.
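A sketch of the plugin-directory approach. It uses importlib.util rather than the __import__ plus path manipulation suggested above, which is a substitution on my part; load_plugins and the preprocess function in the sample plugin are made-up names, and a temporary directory stands in for '~/.myapp/plugins'.

```python
import importlib.util
import pathlib
import tempfile


def load_plugins(plugin_dir):
    """Import every .py file in plugin_dir and return {name: module}."""
    plugins = {}
    for path in sorted(pathlib.Path(plugin_dir).expanduser().glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # runs the plugin's top-level code
        plugins[path.stem] = module
    return plugins


# Demo: write one plugin file into a temporary directory and load it.
with tempfile.TemporaryDirectory() as plugin_dir:
    plugin_source = "def preprocess(value):\n    return value * 2\n"
    pathlib.Path(plugin_dir, "double.py").write_text(plugin_source)
    plugins = load_plugins(plugin_dir)
    doubled = plugins["double"].preprocess(21)
```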
Python is so dynamic that it's not always clear what's going on in a large program, and looking at a tiny bit of source code does not always help. To make matters worse, editors tend to have poor support for navigating to the definitions of tokens or import statements in a Python file.
One way to compensate might be to write a special profiler that, instead of timing the program, would record the runtime types and paths of objects of the program and expose this data to the editor.
This might be implemented with sys.settrace(), which sets a callback for each line of code and is how pdb is implemented, or by using the ast module and an import hook to instrument the code, or is there a better strategy? How would you write something like this without making it impossibly slow, and without running afoul of extreme dynamism, e.g. side effects on property access?
I don't think you can help making it slow, but it should be possible to detect the address of each variable when you encounter a STORE_FAST STORE_NAME STORE_* opcode.
Whether or not this has been done before, I do not know.
If you need debugging, look at PDB, this will allow you to step through your code and access any variables.
import pdb

def test():
    print(1)
    pdb.set_trace() # you will enter an interpreter here
    print(2)
What if you monkey-patched object's class or another prototypical object?
This might not be the easiest if you're not using new-style classes.
You might want to check out PyChecker's code - it does (I think) what you are looking to do.
Pythoscope does something very similar to what you describe, and it uses a combination of static information in the form of an AST and dynamic information gathered through sys.settrace.
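To give a feel for the sys.settrace side, here is a minimal sketch of a recorder that notes the runtime types of function arguments. It is not how Pythoscope does it; record_types, observed and the sample area function are invented for the example. It only looks at "call" events and returns None from the trace function, so no per-line tracing happens, which keeps the overhead lower than a line-level tracer.

```python
import sys
from collections import defaultdict

# Maps (function name, argument name) to the set of type names seen.
observed = defaultdict(set)


def record_types(frame, event, arg):
    """Global trace function: note argument types on each function call."""
    if event == "call":
        code_obj = frame.f_code
        for name in code_obj.co_varnames[:code_obj.co_argcount]:
            type_name = type(frame.f_locals[name]).__name__
            observed[(code_obj.co_name, name)].add(type_name)
    return None  # no local trace function, so lines are not traced


def area(width, height):
    return width * height


sys.settrace(record_types)
try:
    area(2, 3)
    area(2.5, 4)
finally:
    sys.settrace(None)  # always switch tracing off again
```

Reading f_locals does not trigger property access on the objects themselves, which sidesteps the side-effect concern for this limited case.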
BTW, if you have problems refactoring your project, give Pythoscope a try.