Is it ever polite to put code in a python configuration file? - python

One of my favorite features about python is that you can write configuration files in python that are very simple to read and understand. If you put a few boundaries on yourself, you can be pretty confident that non-pythonistas will know exactly what you mean and will be perfectly capable of reconfiguring your program.
My question is, what exactly are those boundaries? My own personal heuristic was
Avoid flow control. No functions, loops, or conditionals. Those wouldn't be in a text config file and people aren't expecting to have understand them. In general, it probably shouldn't matter the order in which your statements execute.
Stick to literal assignments. Methods and functions called on objects are harder to think through. Anything implicit is going to be a mess. If there's something complicated that has to happen with your parameters, change how they're interpreted.
Language keywords and error handling are right out.
I guess I ask this because I came across a situation with my Django config file where it seems to be useful to break these rules. I happen to like it, but I feel a little guilty. Basically, my project is deployed through svn checkouts to a couple different servers that won't all be configured the same (some will share a database, some won't, for example). So, I throw a hook at the end:
try:
from settings_overrides import *
LOCALIZED = True
except ImportError:
LOCALIZED = False
where settings_overrides is on the python path but outside the working copy. What do you think, either about this example, or about python config boundaries in general?

There is a Django wiki page, which addresses exactly the thing you're asking.
http://code.djangoproject.com/wiki/SplitSettings
Do not reinvent the wheel. Use configparser and INI files. Python files are to easy to break by someone, who doesn't know Python.

Your heuristics are good. Rules are made so that boundaries are set and only broken when it's obviously a vastly better solution than the alternate.
Still, I can't help but wonder that the site checking code should be in the parser, and an additional configuration item added that selects which option should be taken.
I don't think that in this case the alternative is so bad that breaking the rules makes sense...
-Adam

I think it's a pain vs pleasure argument.
It's not wrong to put code in a Python config file because it's all valid Python, but it does mean you could confuse a user who comes in to reconfigure an app. If you're that worried about it, rope it off with comments explaining roughly what it does and that the user shouldn't edit it, rather edit the settings_overrides.py file.
As for your example, that's nigh on essential for developers to test then deploy their apps. Definitely more pleasure than pain. But you should really do this instead:
LOCALIZED = False
try:
from settings_overrides import *
except ImportError:
pass
And in your settings_overrides.py file:
LOCALIZED = True
... If nothing but to make it clear what that file does.. What you're doing there splits overrides into two places.

As a general practice, see the other answers on the page; it all depends. Specifically for Django, however, I see nothing fundamentally wrong with writing code in the settings.py file... after all, the settings file IS code :-)
The Django docs on settings themselves say:
A settings file is just a Python module with module-level variables.
And give the example:
assign settings dynamically using normal Python syntax. For example:
MY_SETTING = [str(i) for i in range(30)]

Settings as code is also a security risk. You import your "config", but in reality you are executing whatever code is in that file. Put config in files that you parse first and you can reject nonsensical or malicious values, even if it is more work for you. I blogged about this in December 2008.

Related

Write custom translation allocation in Pyramid

Pyramid uses gettext *.po files for translations, a very good and stable way to internationalize an application. It's one disadvantage is it cannot be changed from the app itself. I need some way to give a normal user the ability to change the translations on his own. Django allows changes in the file directly and after the change it restarts the whole app. I do not have that freedom, because the changes will be quite frequent.
Since I could not find any package that will help me with the task, I decided to override the Localizer. My idea is based on using a Translation Domain like Zope projects use and make Localizer search for registered domain, and if not found, back off to default translation strategy.
The problem is that I could not find a good way to place a custom translation solution into the Localizer itself. All I could think of is to reimplement the get_localizer method and rewrite the whole Localizer. But there are several things, that need to be copypasted here, such as interpolation of mappings and other tweeks related to translation strings.
I don't know how much things you have in there but I did something alike a while ago.. Will have to do it once again. The implementation is pretty simple...
If you can be sure that all calls will be handled trough _() or something alike. You can provide your own function. It will look something like it.
def _(val):
val = db.translations.find({key: id, locale: request.locale_name})
if val:
return val['value']
else:
return real_gettext(val)
this is pretty simple... then you need to have something that will dump the database into the file...
But I guess overriding the localizer makes more sense.. I did it a long time ago and overiding the function was easier than searching in the code.
The plus side of Localiser is that it will work everywhere. Monkey patch is pretty cool but it's also pretty ugly. If I had to do it again, I'd provide my own localizer that will load from a database first and then display it's own value. The reason behind database is that if someone closes the server and the file hasn't been modified you won't see the results.
If the DB is more than needed, then Localize is good enough and you can update the file on every update. If the server will get restarted it will load the new files... You'll have to compile the catalog first.

When should a Python script be split into multiple files/modules?

In Java, this question is easy (if a little tedious) - every class requires its own file. So the number of .java files in a project is the number of classes (not counting anonymous/nested classes).
In Python, though, I can define multiple classes in the same file, and I'm not quite sure how to find the point at which I split things up. It seems wrong to make a file for every class, but it also feels wrong just to leave everything in the same file by default. How do I know where to break a program up?
Remember that in Python, a file is a module that you will most likely import in order to use the classes contained therein. Also remember one of the basic principles of software development "the unit of packaging is the unit of reuse", which basically means:
If classes are most likely used together, or if using one class leads to using another, they belong in a common package.
As I see it, this is really a question about reuse and abstraction. If you have a problem that you can solve in a very general way, so that the resulting code would be useful in many other programs, put it in its own module.
For example: a while ago I wrote a (bad) mpd client. I wanted to make configuration file and option parsing easy, so I created a class that combined ConfigParser and optparse functionality in a way I thought was sensible. It needed a couple of support classes, so I put them all together in a module. I never use the client, but I've reused the configuration module in other projects.
EDIT: Also, a more cynical answer just occurred to me: if you can only solve a problem in a really ugly way, hide the ugliness in a module. :)
In Java ... every class requires its own file.
On the flipside, sometimes a Java file, also, will include enums or subclasses or interfaces, within the main class because they are "closely related."
not counting anonymous/nested classes
Anonymous classes shouldn't be counted, but I think tasteful use of nested classes is a choice much like the one you're asking about Python.
(Occasionally a Java file will have two classes, not nested, which is allowed, but yuck don't do it.)
Python actually gives you the choice to package your code in the way you see fit.
The analogy between Python and Java is that a file i.e., the .py file in Python is
equivalent to a package in Java as in it can contain many related classes and functions.
For good examples, have a look in the Python built-in modules.
Just download the source and check them out, the rule of thumb I follow is
when you have very tightly coupled classes or functions you keep them in a single file
else you break them up.

How can I sandbox Python in pure Python?

I'm developing a web game in pure Python, and want some simple scripting available to allow for more dynamic game content. Game content can be added live by privileged users.
It would be nice if the scripting language could be Python. However, it can't run with access to the environment the game runs on since a malicious user could wreak havoc which would be bad. Is it possible to run sandboxed Python in pure Python?
Update: In fact, since true Python support would be way overkill, a simple scripting language with Pythonic syntax would be perfect.
If there aren't any Pythonic script interpreters, are there any other open source script interpreters written in pure Python that I could use? The requirements are support for variables, basic conditionals and function calls (not definitions).
This is really non-trivial.
There are two ways to sandbox Python. One is to create a restricted environment (i.e., very few globals etc.) and exec your code inside this environment. This is what Messa is suggesting. It's nice but there are lots of ways to break out of the sandbox and create trouble. There was a thread about this on Python-dev a year ago or so in which people did things from catching exceptions and poking at internal state to break out to byte code manipulation. This is the way to go if you want a complete language.
The other way is to parse the code and then use the ast module to kick out constructs you don't want (e.g. import statements, function calls etc.) and then to compile the rest. This is the way to go if you want to use Python as a config language etc.
Another way (which might not work for you since you're using GAE), is the PyPy sandbox. While I haven't used it myself, word on the intertubes is that it's the only real sandboxed Python out there.
Based on your description of the requirements (The requirements are support for variables, basic conditionals and function calls (not definitions)) , you might want to evaluate approach 2 and kick out everything else from the code. It's a little tricky but doable.
Roughly ten years after the original question, Python 3.8.0 comes with auditing. Can it help? Let's limit the discussion to hard-drive writing for simplicity - and see:
from sys import addaudithook
def block_mischief(event,arg):
if 'WRITE_LOCK' in globals() and ((event=='open' and arg[1]!='r')
or event.split('.')[0] in ['subprocess', 'os', 'shutil', 'winreg']): raise IOError('file write forbidden')
addaudithook(block_mischief)
So far exec could easily write to disk:
exec("open('/tmp/FILE','w').write('pwned by l33t h4xx0rz')", dict(locals()))
But we can forbid it at will, so that no wicked user can access the disk from the code supplied to exec(). Pythonic modules like numpy or pickle eventually use the Python's file access, so they are banned from disk write, too. External program calls have been explicitly disabled, too.
WRITE_LOCK = True
exec("open('/tmp/FILE','w').write('pwned by l33t h4xx0rz')", dict(locals()))
exec("open('/tmp/FILE','a').write('pwned by l33t h4xx0rz')", dict(locals()))
exec("numpy.savetxt('/tmp/FILE', numpy.eye(3))", dict(locals()))
exec("import subprocess; subprocess.call('echo PWNED >> /tmp/FILE', shell=True)", dict(locals()))
An attempt of removing the lock from within exec() seems to be futile, since the auditing hook uses a different copy of locals that is not accessible for the code ran by exec. Please prove me wrong.
exec("print('muhehehe'); del WRITE_LOCK; open('/tmp/FILE','w')", dict(locals()))
...
OSError: file write forbidden
Of course, the top-level code can enable file I/O again.
del WRITE_LOCK
exec("open('/tmp/FILE','w')", dict(locals()))
Sandboxing within Cpython has proven extremely hard and many previous attempts have failed. This approach is also not entirely secure e.g. for public web access:
perhaps hypothetical compiled modules that use direct OS calls cannot be audited by Cpython - whitelisting the safe pure pythonic modules is recommended.
Definitely there is still the possibility of crashing or overloading the Cpython interpreter.
Maybe there remain even some loopholes to write the files on the harddrive, too. But I could not use any of the usual sandbox-evasion tricks to write a single byte. We can say the "attack surface" of Python ecosystem reduces to rather a narrow list of events to be (dis)allowed: https://docs.python.org/3/library/audit_events.html
I would be thankful to anybody pointing me to the flaws of this approach.
EDIT: So this is not safe either! I am very thankful to #Emu for his clever hack using exception catching and introspection:
#!/usr/bin/python3.8
from sys import addaudithook
def block_mischief(event,arg):
if 'WRITE_LOCK' in globals() and ((event=='open' and arg[1]!='r') or event.split('.')[0] in ['subprocess', 'os', 'shutil', 'winreg']):
raise IOError('file write forbidden')
addaudithook(block_mischief)
WRITE_LOCK = True
exec("""
import sys
def r(a, b):
try:
raise Exception()
except:
del sys.exc_info()[2].tb_frame.f_back.f_globals['WRITE_LOCK']
import sys
w = type('evil',(object,),{'__ne__':r})()
sys.audit('open', None, w)
open('/tmp/FILE','w').write('pwned by l33t h4xx0rz')""", dict(locals()))
I guess that auditing+subprocessing is the way to go, but do not use it on production machines:
https://bitbucket.org/fdominec/experimental_sandbox_in_cpython38/src/master/sandbox_experiment.py
AFAIK it is possible to run a code in a completely isolated environment:
exec somePythonCode in {'__builtins__': {}}, {}
But in such environment you can do almost nothing :) (you can not even import a module; but still a malicious user can run an infinite recursion or cause running out of memory.) Probably you would want to add some modules that will be the interface to you game engine.
I'm not sure why nobody mentions this, but Zope 2 has a thing called Python Script, which is exactly that - restricted Python executed in a sandbox, without any access to filesystem, with access to other Zope objects controlled by Zope security machinery, with imports limited to a safe subset.
Zope in general is pretty safe, so I would imagine there are no known or obvious ways to break out of the sandbox.
I'm not sure how exactly Python Scripts are implemented, but the feature was around since like year 2000.
And here's the magic behind PythonScripts, with detailed documentation: http://pypi.python.org/pypi/RestrictedPython - it even looks like it doesn't have any dependencies on Zope, so can be used standalone.
Note that this is not for safely running arbitrary python code (most of the random scripts will fail on first import or file access), but rather for using Python for limited scripting within a Python application.
This answer is from my comment to a question closed as a duplicate of this one: Python from Python: restricting functionality?
I would look into a two server approach. The first server is the privileged web server where your code lives. The second server is a very tightly controlled server that only provides a web service or RPC service and runs the untrusted code. You provide your content creator with your custom interface. For example you if you allowed the end user to create items, you would have a look up that called the server with the code to execute and the set of parameters.
Here's and abstract example for a healing potion.
{function_id='healing potion', action='use', target='self', inventory_id='1234'}
The response might be something like
{hp='+5' action={destroy_inventory_item, inventory_id='1234'}}
Hmm. This is a thought experiment, I don't know of it being done:
You could use the compiler package to parse the script. You can then walk this tree, prefixing all identifiers - variables, method names e.t.c. (also has|get|setattr invocations and so on) - with a unique preamble so that they cannot possibly refer to your variables. You could also ensure that the compiler package itself was not invoked, and perhaps other blacklisted things such as opening files. You then emit the python code for this, and compiler.compile it.
The docs note that the compiler package is not in Python 3.0, but does not mention what the 3.0 alternative is.
In general, this is parallel to how forum software and such try to whitelist 'safe' Javascript or HTML e.t.c. And they historically have a bad record of stomping all the escapes. But you might have more luck with Python :)
I think your best bet is going to be a combination of the replies thus far.
You'll want to parse and sanitise the input - removing any import statements for example.
You can then use Messa's exec sample (or something similar) to allow the code execution against only the builtin variables of your choosing - most likely some sort of API defined by yourself that provides the programmer access to the functionality you deem relevant.

What's the best way to record the type of every variable assignment in a Python program?

Python is so dynamic that it's not always clear what's going on in a large program, and looking at a tiny bit of source code does not always help. To make matters worse, editors tend to have poor support for navigating to the definitions of tokens or import statements in a Python file.
One way to compensate might be to write a special profiler that, instead of timing the program, would record the runtime types and paths of objects of the program and expose this data to the editor.
This might be implemented with sys.settrace() which sets a callback for each line of code and is how pdb is implemented, or by using the ast module and an import hook to instrument the code, or is there a better strategy? How would you write something like this without making it impossibly slow, and without runnning afoul of extreme dynamism e.g side affects on property access?
I don't think you can help making it slow, but it should be possible to detect the address of each variable when you encounter a STORE_FAST STORE_NAME STORE_* opcode.
Whether or not this has been done before, I do not know.
If you need debugging, look at PDB, this will allow you to step through your code and access any variables.
import pdb
def test():
print 1
pdb.set_trace() # you will enter an interpreter here
print 2
What if you monkey-patched object's class or another prototypical object?
This might not be the easiest if you're not using new-style classes.
You might want to check out PyChecker's code - it does (i think) what you are looking to do.
Pythoscope does something very similar to what you describe and it uses a combination of static information in a form of AST and dynamic information through sys.settrace.
BTW, if you have problems refactoring your project, give Pythoscope a try.

python coding speed and cleanest

Python is pretty clean, and I can code neat apps quickly.
But I notice I have some minor error someplace and I dont find the error at compile but at run time. Then I need to change and run the script again. Is there a way to have it break and let me modify and run?
Also, I dislike how python has no enums. If I were to write code that needs a lot of enums and types, should I be doing it in C++? It feels like I can do it quicker in C++.
"I don't find the error at compile but at run time"
Correct. True for all non-compiled interpreted languages.
"I need to change and run the script again"
Also correct. True for all non-compiled interpreted languages.
"Is there a way to have it break and let me modify and run?"
What?
If it's a run-time error, the script breaks, you fix it and run again.
If it's not a proper error, but a logic problem of some kind, then the program finishes, but doesn't work correctly. No language can anticipate what you hoped for and break for you.
Or perhaps you mean something else.
"...code that needs a lot of enums"
You'll need to provide examples of code that needs a lot of enums. I've been writing Python for years, and have no use for enums. Indeed, I've been writing C++ with no use for enums either.
You'll have to provide code that needs a lot of enums as a specific example. Perhaps in another question along the lines of "What's a Pythonic replacement for all these enums."
It's usually polymorphic class definitions, but without an example, it's hard to be sure.
With interpreted languages you have a lot of freedom. Freedom isn't free here either. While the interpreter won't torture you into dotting every i and crossing every T before it deems your code worthy of a run, it also won't try to statically analyze your code for all those problems. So you have a few choices.
1) {Pyflakes, pychecker, pylint} will do static analysis on your code. That settles the syntax issue mostly.
2) Test-driven development with nosetests or the like will help you. If you make a code change that breaks your existing code, the tests will fail and you will know about it. This is actually better than static analysis and can be as fast. If you test-first, then you will have all your code checked at test runtime instead of program runtime.
Note that with 1 & 2 in place you are a bit better off than if you had just a static-typing compiler on your side. Even so, it will not create a proof of correctness.
It is possible that your tests may miss some plumbing you need for the app to actually run. If that happens, you fix it by writing more tests usually. But you still need to fire up the app and bang on it to see what tests you should have written and didn't.
You might want to look into something like nosey, which runs your unit tests periodically when you've saved changes to a file. You could also set up a save-event trigger to run your unit tests in the background whenever you save a file (possible e.g. with Komodo Edit).
That said, what I do is bind the F7 key to run unit tests in the current directory and subdirectories, and the F6 key to run pylint on the current file. Frequent use of these allows me to spot errors pretty quickly.
Python is an interpreted language, there is no compile stage, at least not that is visible to the user. If you get an error, go back, modify the script, and try again. If your script has long execution time, and you don't want to stop-restart, you can try a debugger like pdb, using which you can fix some of your errors during runtime.
There are a large number of ways in which you can implement enums, a quick google search for "python enums" gives everything you're likely to need. However, you should look into whether or not you really need them, and if there's a better, more 'pythonic' way of doing the same thing.

Categories