Development vs Release Python Code

I am currently developing a Python application that I continually performance test, simply by recording the runtime of various parts.
A lot of the code relates only to the testing environment and would not exist in the real-world application. I have it separated into functions, and at the moment I comment out those calls when testing. This requires me to remember which calls refer to test-only components (they are quite interleaved, so I cannot group the functionality).
I was wondering if there is a better solution to this. The only idea I have had so far is creating a 'mode' boolean and inserting if statements, though this feels needlessly messy. I was hoping there might be some more standardised testing method that I am unaware of.
I am new to Python, so I may have overlooked some simple solutions.
Thank you in advance.

There are libraries for testing, such as those in the Development Tools section of the standard library. If you have not used such tools yet, you should start to do so - they help a lot with testing (especially unittest).
Normally Python runs programs in debug mode, with __debug__ set to True (see the docs on assert) - you can switch debug mode off with the command-line switch -O or -OO for optimization (see the docs).
There is also a page specifically about using assertions on the Python Wiki.
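As a minimal sketch of that idea (record_runtime and do_work are hypothetical names, not from the question): guard test-only calls with if __debug__ so they run during development but disappear entirely under python -O.

import time

def record_runtime(label, start):      # hypothetical test-only helper
    print(f"{label}: {time.perf_counter() - start:.6f}s")

def do_work():                         # hypothetical production code
    return sum(range(1000))

start = time.perf_counter()
result = do_work()
if __debug__:                          # whole block compiled away under python -O
    record_runtime("do_work", start)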

I'd say that if you're commenting out several parts of your code when switching between debug and release mode, you're doing it wrong. Take a look, for example, at the logging library: with it you can select the logging level you want just by changing a single parameter.
Try to avoid commenting out specific parts of your debug code; instead, have one or more variables that control the mode (debug, release, ...) your script runs in. You could also use some of the builtins Python already provides (such as __debug__).
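A minimal sketch of the single-parameter idea with the standard logging module (switching the level from DEBUG to WARNING silences all the debug output without touching any call site):

import logging

# Change DEBUG to WARNING for a release run; no call site needs editing.
logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger(__name__)

log.debug("timing: setup took 0.8s")   # emitted only at DEBUG level
log.warning("low disk space")          # emitted at both levels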

Related

Application logs for support and analysis purpose

I am analyzing an existing Python codebase that runs to hundreds of lines. Adding a log statement per line to capture the flow and understand the runtime processing is painful - and the current application logging is very poor anyway, just using print statements.
For support purposes these are not enough, as it is difficult to understand them without looking at the code.
What is the best way to convert these nonstandard logs into at least something like:
Class Name - Method Name - Error Details - additional details
With small modifications I also run the risk of breaking the flow if I am not careful.
Please let me know which application logging mechanism would be best.
I would advise you to run python -h in a command prompt and see which possibilities you have (e.g. python -v gives verbose output on the import statements in your code). That way you might find a way of getting more information without modifying your source code. Obviously I don't know whether the information you get from python -v is what you're looking for.
I think decorators are probably your best option, so that you touch the code as little as possible.
The first link redirects standard stdout to the standard Python logging module, so the output will have the format you want if you specify it in the logger's properties.
https://wiki.python.org/moin/PythonDecoratorLibrary#Redirects_stdout_printing_to_python_standard_logging.
https://wiki.python.org/moin/PythonDecoratorLibrary#Logging_decorator_with_specified_logger_.28or_default.29
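As a minimal sketch of the decorator idea (log_errors and Account are hypothetical names, not taken from the linked recipes): wrap a method so that failures are logged as "Class Name - Method Name - Error Details" before being re-raised.

import functools
import logging

logging.basicConfig(level=logging.ERROR)
log = logging.getLogger(__name__)

def log_errors(method):                          # hypothetical decorator
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        try:
            return method(self, *args, **kwargs)
        except Exception as exc:
            log.error("%s - %s - %s", type(self).__name__, method.__name__, exc)
            raise
    return wrapper

class Account:                                   # hypothetical example class
    @log_errors
    def withdraw(self, amount):
        raise ValueError("insufficient funds")

# Account().withdraw(10) logs "Account - withdraw - insufficient funds" and re-raises.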

Is monkeypatching stdlib methods a good practice in Python? [closed]

Over time I have found the need to override several stdlib methods from Python in order to overcome limitations or to add missing functionality.
In all cases I added a wrapper function and replaced the original method in the module with my wrapper (the wrapper calling the original method).
Why did I do this? Just to be sure that all calls to the method use my new version, even those made from third-party modules.
I know that monkeypatching can be a bad thing, but my question is whether it is useful if used with care, meaning that:
you still call the original methods, ensuring that you do not miss anything when the original module is updated
you do not change the original "meaning" of the methods
Examples:
add coloring support to the Python logging module
make open() able to recognize Unicode BOM marks when using text mode
add logging support to os.system() or subprocess.Popen() - letting you output to the console and/or redirect to another file
implement methods that are missing on your platform, like os.chown() or os.lchown(), which are missing on Windows
These look to me like decent overrides, but I would like to see how others view them, and especially what should be considered an acceptable monkeypatch and what not.
None of these things seem to require monkeypatching. All of them seem to have better, more robust and reliable solutions.
Adding a logging handler is easy. No monkeypatch.
Fixing open is done this way:
from io import open
That was easy. No patch.
Logging os.system()? I'd think a simple wrapper function would be far better than a complex patch. Further, I'd use subprocess.Popen, since that's the recommended replacement.
Adding missing methods to mask OS differences (like os.chown()) seems like a better job for try/except, as sketched below. But that's just me. I like explicit rather than implicit.
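A minimal sketch of that try/except approach (the silent no-op fallback is an assumption; a real one might log or raise instead):

import os

try:
    chown = os.chown
except AttributeError:
    # hypothetical no-op fallback for platforms (e.g. Windows) without os.chown
    def chown(path, uid, gid):
        pass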
On balance, I still can't see a good reason for monkeypatching.
I'd hate to be locked in to legacy code (like os.system) because I was too dependent on my monkeypatches.
The concept of "subclass" applies to modules as well as classes. You can easily write your own modules which (a) import and (b) extend existing modules. You then use your new modules because they provided extra features. You don't need to monkeypatch.
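A minimal sketch of such an extension module (the module name myos is hypothetical): it re-exports the stdlib module and fills in what is missing, so application code imports myos instead of os and nothing gets patched.

# myos.py - a hypothetical module that extends os instead of patching it.
# (The star import is a sketch; a real module would be more selective.)
from os import *
import os as _os

if not hasattr(_os, "lchown"):
    def lchown(path, uid, gid):
        # no-op fallback for platforms where os.lchown is missing
        pass

Application code then does import myos as os and uses it exactly like the original.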
even if these are called from other third-party modules
Dreadful idea. You can easily break another module by altering built-in features. If you have read the other module and are sure the monkeypatches won't break it, then what you've found is this:
The "other" module should have had room for customization - a place for "dependency injection" or a Strategy design pattern. Good thinking.
Once you've found this, the "other" module can be fixed to allow this customization. It may be as simple as a documentation change explaining how to modify an object. It may be an additional constructor parameter for inserting your customization.
You can then provide the revised module to the authors to see if they'll support your small update. Many classes could use extra help supporting a "dependency injection" or Strategy design for extensions.
If you have not read the other module and are not sure your monkeypatches work... well... we can still hope that they don't break anything.
Monkeypatching can be "the least of evils", sometimes -- mostly, when you need to test code which uses a subsystem that is not well designed for testability (doesn't support dependency injection &c). In those cases you will be monkeypatching (very temporarily, fortunately) in your test harness, and almost invariably monkeypatching with mocks or fakes for the purpose of isolating tests (i.e., making them unit tests, rather than integration tests).
This "bad but could be worse" use case does not appear to apply to your examples -- they can all be better architected by editing the application level code to call your appropriate wrapper functions (say myos.chown rather than the bare os.chown, for example) and putting your wrapper functions in your own intermediate modules (such as myown) that stand between the application level code and the standard library (or third-party extensions that you are thus wrapping -- there's nothing special about the standard library in this respect).
One problematic situation might arise when the "application level code" isn't really under your control -- it's a third party subsystem that you'd rather not modify. Nevertheless, I have found that in such situations modifying the third party subsystem to call wrappers (rather than the standard library functions directly) is way more productive in the long run -- then of course you submit the change to the maintainers of the third party subsystem in question, they roll your change into their subsystem's next release, and life gets better for everybody (you included, since once your changes are accepted they'll get routinely maintained and tested by others!-).
(As a side note, such wrappers may also be worth submitting as diffs to the standard library, but that is a different case since the standard library evolves very very slowly and cautiously, and in particular on the Python 2 line will never evolve any longer, since 2.7 is the last of that line and it's feature-frozen).
Of course, all of this presupposes an open-source culture. If for some mysterious reasons you're using a closed-source third party subsystem, therefore one which you cannot possibly maintain, then you are in another situation where monkey patching may be the lesser evil (but that's just because the evil of losing strategic control of your development by trusting in code you can't possibly maintain is such a bigger evil in itself;-). I've never found myself in this situation with a third-party package that was both closed-source and itself written in Python (if the latter condition doesn't hold your monkeypatches would do you no good;-).
Note that here the working definition of "closed-source" is really very strict: for example, even Microsoft 12+ years ago distributed sources of libraries such as MFC with Visual C++ (as their product was then called) -- closed-source because you couldn't redistribute their sources, but still, you DID have sources at hand, so when you met some terrible limitation or bug you COULD fix it (and submit the change to them for a future release, as well as publishing your change as a diff as long as it included absolutely none of their copyrighted code -- not trivial, but feasible).
Monkeypatching well beyond the strict confines within which such an approach is "the least of evil" is a frequent mistake of users of dynamic languages -- be careful not to fall into that trap yourself!

How can I sandbox Python in pure Python?

I'm developing a web game in pure Python, and want some simple scripting available to allow for more dynamic game content. Game content can be added live by privileged users.
It would be nice if the scripting language could be Python. However, it can't run with access to the environment the game runs in, since a malicious user could wreak havoc, which would be bad. Is it possible to run sandboxed Python in pure Python?
Update: In fact, since true Python support would be way overkill, a simple scripting language with Pythonic syntax would be perfect.
If there aren't any Pythonic script interpreters, are there any other open source script interpreters written in pure Python that I could use? The requirements are support for variables, basic conditionals and function calls (not definitions).
This is really non-trivial.
There are two ways to sandbox Python. One is to create a restricted environment (i.e., very few globals, etc.) and exec your code inside this environment. This is what Messa is suggesting. It's nice, but there are lots of ways to break out of the sandbox and create trouble. There was a thread about this on python-dev a year or so ago in which people broke out using everything from catching exceptions and poking at internal state to bytecode manipulation. This is the way to go if you want a complete language.
The other way is to parse the code and then use the ast module to kick out constructs you don't want (e.g. import statements, function calls, etc.), and then compile the rest. This is the way to go if you want to use Python as a config language, etc.
Another way (which might not work for you since you're using GAE) is the PyPy sandbox. While I haven't used it myself, word on the intertubes is that it's the only real sandboxed Python out there.
Based on your description of the requirements (support for variables, basic conditionals, and function calls - not definitions), you might want to evaluate approach 2 and kick everything else out of the code. It's a little tricky but doable.
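A minimal sketch of that ast-based filtering, assuming a toy whitelist (a production whitelist would need far more care and auditing): parse the source, reject any node type outside the allowed set, then compile and run what survives.

import ast

ALLOWED = (ast.Module, ast.Expr, ast.Assign, ast.Name, ast.Load, ast.Store,
           ast.Constant, ast.BinOp, ast.Add, ast.If, ast.Compare, ast.Lt,
           ast.Call)                             # hypothetical whitelist

def check(source):
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED):
            raise ValueError(f"disallowed construct: {type(node).__name__}")
    return compile(tree, "<sandbox>", "exec")

code = check("x = 1\nif x < 2:\n    x = x + 1")
exec(code, {"__builtins__": {}})                 # runs; an import would be rejected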
Roughly ten years after the original question, Python 3.8.0 arrived with auditing. Can it help? Let's limit the discussion to hard-drive writes for simplicity - and see:
from sys import addaudithook

def block_mischief(event, arg):
    if 'WRITE_LOCK' in globals() and ((event == 'open' and arg[1] != 'r')
            or event.split('.')[0] in ['subprocess', 'os', 'shutil', 'winreg']):
        raise IOError('file write forbidden')

addaudithook(block_mischief)
So far exec could easily write to disk:
exec("open('/tmp/FILE','w').write('pwned by l33t h4xx0rz')", dict(locals()))
But we can forbid it at will, so that no wicked user can access the disk from the code supplied to exec(). Pythonic modules like numpy or pickle eventually use Python's file access, so they are banned from disk writes, too. External program calls have been explicitly disabled as well.
WRITE_LOCK = True
exec("open('/tmp/FILE','w').write('pwned by l33t h4xx0rz')", dict(locals()))
exec("open('/tmp/FILE','a').write('pwned by l33t h4xx0rz')", dict(locals()))
exec("numpy.savetxt('/tmp/FILE', numpy.eye(3))", dict(locals()))
exec("import subprocess; subprocess.call('echo PWNED >> /tmp/FILE', shell=True)", dict(locals()))
An attempt to remove the lock from within exec() seems to be futile, since the auditing hook uses a different copy of locals that is not accessible to the code run by exec. Please prove me wrong.
exec("print('muhehehe'); del WRITE_LOCK; open('/tmp/FILE','w')", dict(locals()))
...
OSError: file write forbidden
Of course, the top-level code can enable file I/O again.
del WRITE_LOCK
exec("open('/tmp/FILE','w')", dict(locals()))
Sandboxing within CPython has proven extremely hard, and many previous attempts have failed. This approach is also not entirely secure, e.g. for public web access:
compiled modules that make direct OS calls can perhaps not be audited by CPython - whitelisting safe, pure-Python modules is recommended;
there is definitely still the possibility of crashing or overloading the CPython interpreter;
and maybe there remain some loopholes for writing files to the hard drive, too. But I could not use any of the usual sandbox-evasion tricks to write a single byte. We can say the "attack surface" of the Python ecosystem reduces to a rather narrow list of events to be (dis)allowed: https://docs.python.org/3/library/audit_events.html
I would be thankful to anybody pointing me to the flaws of this approach.
EDIT: So this is not safe either! I am very thankful to @Emu for his clever hack using exception catching and introspection:
#!/usr/bin/python3.8
from sys import addaudithook

def block_mischief(event, arg):
    if 'WRITE_LOCK' in globals() and ((event == 'open' and arg[1] != 'r')
            or event.split('.')[0] in ['subprocess', 'os', 'shutil', 'winreg']):
        raise IOError('file write forbidden')

addaudithook(block_mischief)
WRITE_LOCK = True

exec("""
import sys
def r(a, b):
    try:
        raise Exception()
    except:
        del sys.exc_info()[2].tb_frame.f_back.f_globals['WRITE_LOCK']

w = type('evil', (object,), {'__ne__': r})()
sys.audit('open', None, w)
open('/tmp/FILE','w').write('pwned by l33t h4xx0rz')""", dict(locals()))
I guess that auditing+subprocessing is the way to go, but do not use it on production machines:
https://bitbucket.org/fdominec/experimental_sandbox_in_cpython38/src/master/sandbox_experiment.py
AFAIK it is possible to run code in a completely isolated environment:
exec(somePythonCode, {'__builtins__': {}}, {})
But in such an environment you can do almost nothing :) (you cannot even import a module, and a malicious user can still run an infinite recursion or exhaust the memory). You would probably want to add some modules as the interface to your game engine.
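A minimal sketch of that last suggestion (heal is a hypothetical game-engine API function): expose only the callables you choose while stripping the builtins.

def heal(target, amount):                # hypothetical game-engine API
    print(f"healing {target} by {amount} hp")

# Untrusted code sees no builtins, only the API we explicitly pass in.
exec("heal('player', 5)", {'__builtins__': {}, 'heal': heal}, {})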
I'm not sure why nobody has mentioned this, but Zope 2 has a thing called Python Script, which is exactly that - restricted Python executed in a sandbox, without any access to the filesystem, with access to other Zope objects controlled by the Zope security machinery, and with imports limited to a safe subset.
Zope in general is pretty safe, so I would imagine there are no known or obvious ways to break out of the sandbox.
I'm not sure exactly how Python Scripts are implemented, but the feature has been around since about the year 2000.
And here's the magic behind Python Scripts, with detailed documentation: http://pypi.python.org/pypi/RestrictedPython - it even looks like it has no dependencies on Zope, so it can be used standalone.
Note that this is not for safely running arbitrary Python code (most random scripts will fail on their first import or file access), but rather for using Python for limited scripting within a Python application.
This answer is from my comment to a question closed as a duplicate of this one: Python from Python: restricting functionality?
I would look into a two-server approach. The first server is the privileged web server where your code lives. The second server is a very tightly controlled server that only provides a web service or RPC service and runs the untrusted code. You provide your content creator with your custom interface. For example, if you allowed the end user to create items, you would have a lookup that called the server with the code to execute and the set of parameters.
Here's an abstract example for a healing potion.
{function_id='healing potion', action='use', target='self', inventory_id='1234'}
The response might be something like
{hp='+5' action={destroy_inventory_item, inventory_id='1234'}}
Hmm. This is a thought experiment; I don't know of it being done:
You could use the compiler package to parse the script. You can then walk this tree, prefixing all identifiers - variables, method names, etc. (also hasattr/getattr/setattr invocations and so on) - with a unique preamble so that they cannot possibly refer to your variables. You could also ensure that the compiler package itself is not invoked, and perhaps blacklist other things such as opening files. You then emit the Python code for this and compiler.compile it.
The docs note that the compiler package is not in Python 3.0, but do not mention what the 3.0 alternative is (the ast module is its modern counterpart).
In general, this parallels how forum software and the like try to whitelist 'safe' JavaScript or HTML, etc. And they historically have a bad record of blocking all the escapes. But you might have more luck with Python :)
I think your best bet is going to be a combination of the replies thus far.
You'll want to parse and sanitise the input - removing any import statements, for example.
You can then use Messa's exec sample (or something similar) to allow code execution against only the builtins of your choosing - most likely some sort of API defined by yourself that gives the programmer access to the functionality you deem relevant.

What is the use of the "-O" flag for running Python?

Python can run scripts in optimized mode (python -O), which turns off debug mode, removes assert statements and, IIRC, also removes docstrings.
However, I have not seen it used. Is python -O actually used? If so, what for?
python -O does the following currently:
completely ignores asserts
sets the special builtin name __debug__ to False (which by default is True)
and when called as python -OO
removes docstrings from the code
I don't know why everyone forgets to mention the __debug__ issue; perhaps it is because I'm the only one using it :) An if __debug__ construct creates no bytecode at all when running under -O, and I find that very useful.
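A quick way to see this for yourself (a sketch: save it to a file, run it once normally and once with python -O, and compare the disassembly):

import dis

def f():
    if __debug__:
        print("debug-only bookkeeping")
    return 42

# Under -O the guarded block emits no bytecode at all.
dis.dis(f)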
It saves a small amount of memory, and a small amount of disk space if you distribute an archive containing only the .pyo files. (If you use assert a lot, perhaps with complicated conditions, the savings can be nontrivial and can extend to running time, too.)
So it's definitely not useless - and of course it's being used (if you deploy a Python-coded server program to a huge number N of server machines, why would you ever want to waste N * X bytes keeping docstrings that nobody would ever be able to access anyway?!). Of course it would be better if it saved even more, but, hey - waste not, want not!-)
So keeping this functionality (which is in any case trivially simple to provide, you know;-) in Python 3 was pretty much a no-brainer - why add even "epsilon" to the latter's adoption difficulties?-)
Prepackaged software in different Linux distributions often comes byte-compiled with -O. For example, this is from the Fedora packaging guidelines for Python applications:
In the past it was common practice to %ghost .pyo files in order to save a small amount of space on the user's filesystem. However, this has two issues: 1. With SELinux, if a user is running python -O [APP] it will try to write the .pyos when they don't exist. This leads to AVC denial records in the logs. 2. If the system administrator runs python -OO [APP] the .pyos will get created with no docstrings. Some programs require docstrings in order to function. On subsequent runs with python -O [APP], Python will use the cached .pyos even though a different optimization level has been requested. The only way to fix this is to find out where the .pyos are and delete them.
The current method of dealing with pyo files is to include them as is, no %ghosting.
Removing assertions yields a small performance benefit, so you could use this mode for "release" code. Hardly anybody uses it, though, because many Python libraries are open source, and so the help() function should keep working.
So, as long as there isn't any real optimization in this mode, you may as well ignore it.

Python coding speed and cleanliness

Python is pretty clean, and I can code neat apps quickly.
But I notice that when I have some minor error someplace, I don't find it at compile time but at run time. Then I need to change the script and run it again. Is there a way to have it break, let me modify the code, and resume the run?
Also, I dislike that Python has no enums. If I were to write code that needs a lot of enums and types, should I be doing it in C++? It feels like I could do it more quickly in C++.
"I don't find the error at compile but at run time"
Correct. True for all non-compiled interpreted languages.
"I need to change and run the script again"
Also correct. True for all non-compiled interpreted languages.
"Is there a way to have it break and let me modify and run?"
What?
If it's a run-time error, the script breaks; you fix it and run it again.
If it's not a proper error, but a logic problem of some kind, then the program finishes, but doesn't work correctly. No language can anticipate what you hoped for and break for you.
Or perhaps you mean something else.
"...code that needs a lot of enums"
You'll need to provide examples of code that needs a lot of enums. I've been writing Python for years and have had no use for enums. Indeed, I've been writing C++ with no use for enums either.
You'll have to provide code that needs a lot of enums as a specific example. Perhaps in another question along the lines of "What's a Pythonic replacement for all these enums."
It's usually polymorphic class definitions, but without an example, it's hard to be sure.
With interpreted languages you have a lot of freedom, but freedom isn't free here either. While the interpreter won't torture you into dotting every i and crossing every t before it deems your code worthy of a run, it also won't statically analyze your code for all those problems. So you have a few choices.
1) Pyflakes, pychecker, or pylint will do static analysis on your code. That mostly settles the syntax issue.
2) Test-driven development with nosetests or the like will help you. If you make a code change that breaks existing code, the tests will fail and you will know about it. This is actually better than static analysis and can be just as fast. If you test first, then all your code is checked at test runtime instead of program runtime.
Note that with 1 & 2 in place you are a bit better off than if you had just a static-typing compiler on your side. Even so, it will not create a proof of correctness.
It is possible that your tests may miss some plumbing you need for the app to actually run. If that happens, you usually fix it by writing more tests. But you still need to fire up the app and bang on it to see which tests you should have written and didn't.
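A minimal sketch of the test-first workflow with the standard unittest module (apply_discount is a hypothetical function under test):

import unittest

def apply_discount(price, pct):          # hypothetical function under test
    return price * (1 - pct / 100)

class TestDiscount(unittest.TestCase):
    def test_ten_percent(self):
        self.assertAlmostEqual(apply_discount(100, 10), 90.0)

if __name__ == '__main__':
    unittest.main()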
You might want to look into something like nosey, which runs your unit tests periodically when you've saved changes to a file. You could also set up a save-event trigger to run your unit tests in the background whenever you save a file (possible e.g. with Komodo Edit).
That said, what I do is bind the F7 key to run unit tests in the current directory and subdirectories, and the F6 key to run pylint on the current file. Frequent use of these allows me to spot errors pretty quickly.
Python is an interpreted language; there is no compile stage, at least none visible to the user. If you get an error, go back, modify the script, and try again. If your script has a long execution time and you don't want to stop and restart, you can try a debugger like pdb, with which you can fix some of your errors at runtime.
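A minimal sketch of that pdb workflow (compute is a hypothetical function): execution pauses at the breakpoint, where you can inspect and reassign variables before continuing.

import pdb

def compute(x):                          # hypothetical function
    pdb.set_trace()   # pauses here: inspect x, reassign it, or 'c' to continue
    return x + 1

compute(41)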
There are a large number of ways you can implement enums; a quick Google search for "python enums" gives everything you're likely to need. However, you should look into whether you really need them, and whether there's a better, more 'pythonic' way of doing the same thing.
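For what it's worth, the standard library has since gained an enum module (Python 3.4+); a minimal example:

from enum import Enum

class Color(Enum):
    RED = 1
    GREEN = 2
    BLUE = 3

print(Color.RED)        # Color.RED
print(Color.RED.value)  # 1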
