how do i edit a running python program? - python

scenario: a modular app that loads .py modules on the fly as it works. programmer (me) wishes to edit the code of a module and then re-load it into the program without halting execution.
can this be done?
i have tried running import a second time on an updated module.py, but the changes are not picked up

While reload does reload a module, as the other answer mentions, you need quite a few precautions to make it work smoothly -- and for some things you might believe would work easily, you're in for quite a shock in terms of amount of work actually needed.
If you ever use the form from module import afunction, then you've almost ensured reload won't work: you must exclusively import modules, never functions, classes, etc, from inside modules, if you want to have any hope of reload doing something useful all all (otherwise you'd have to somehow chases all the bits and pieces imported here and there from the module, and rebind each and every one of them -- eep;-). Note that I prefer following this rule anyway, whether I plan to do any reloading or not, but, with reload, it's crucial.
The difficult problem is: if you have, alive anywhere, instances of classes that existed in the previous version of the module, reload per se will do absolutely nothing to upgrade those instances. That problem is a truly hard one; one of the longest, hardest recipes in the Python Cookbook (2nd edition) is all about how to code your modules to support such "reload that actually upgrades existing instances". This only matters if you program in OOP style, of course, but... any Python program complex enough to need "reload this plugin" functionality is very likely to have lots of OOP in it, so it's hardly a minor issue.
The docs for reload are pretty complete and do mention this issue, but give no hint how to solve it. This recipe by Michael Hudson, from the Python Cookbook online, is better, but it's only the start of what we evolved into the printed (2nd edition) -- recipe 20.15, the online version of that is here (incomplete unless you sign up for a free time-limited preview of O'Reilly's commercial online books service).

reload() will do what you need. You just have to be careful about what code executes in your module upon re-compile; it is usually easiest if your module just contains definitions.

use the builtin command:
reload(module)
How do I unload (reload) a Python module?

You may get some use from the livecoding module.

Alex's answer and the others cover the general case.
However, since these modules you're working on are your own and you'll edit only them, it's possible to implement some kind of a local reloading mechanism rather than rely on the builtin reload. You can architect your module loader as something that loads up the files, compiles it and evals them into a specific namespace and tells the calling code that it's ready. You can take care of reloading by making sure that there's a decent init function which reinitialises the module properly. I assume you'll be editing only these.
It's a bit of work but might be interesting and worth your time.

Related

checking/verifying python code

Python is a relatively new language for me and I already see some of the trouble areas of maintaining a scripting language based project. I am just wondering how the larger community , with a scenario when one has to maintain a fairly large code base written by people who are not around anymore, deals with the following situations:
Return type of a function/method. Assuming past developers didn't document the code very well, this is turning out to be really annoying as I am basically reading code line by line to figure out what a method/function is suppose to return.
Code refactoring: I figured a lot of code need to be moved around, edited/deleted and etc. But lot of times simple errors, which would otherwise be compile time error in other compiled languages e.g. - wrong number of arguments, wrong type of arguments, method not present and etc, only show up when you run the code and the code reaches the problematic area. Therefore, whether a re-factored code will work at all or not can only be known once you run the code thoroughly. I am using PyLint with PyDev but still I find it very lacking in this respect.
You are right, that's an issue with dynamically typed interpreted languages.
There are to important things that can help:
Good documentation
Extensive unit-testing.
They apply to other languages as well of course, but here they are especially important.
As far as I know If code is not documented at all and the author isn't around anymore it's up to you to find out what the ode actually does.
That's why people should always stick to certain guidelindes that can be enforced by stylecheckers like pep8. https://pypi.python.org/pypi/pep8
Comments and docstrings should be included in every method to avoid such situation you're describing. http://www.python.org/dev/peps/pep-0257/#what-is-a-docstring
Also unittests are very helpfull for refactoring since you can check if you broke something with the click of a button. http://docs.python.org/2/library/unittest.html
hope this helps
Others have already mentioned documentation and unit-testing as being the main tools here. I want to add a third: the Python shell. One of the huge advantages of a non-compiled language like Python is that you can easily fire up the shell, import your module, and run the code there to see what it does and what it returns.
Linked to this is the Python debugger: just put import pdb;pdb.set_trace() at any point in your code, and when you run it you will be dropped into the interactive debugger where you can inspect the current values of the variables. In fact, the pdb shell is an actual Python shell as well, so you can even change things there.

bpython-like autocomplete and parameter description in Emacs Python Mode?

I've been using bpython for a while now for all of my Python interpreting needs. It's delightful, particularly when you're using unfamiliar new libraries, or libraries with a multitude of functions. In any case, it's nice to have a bpython interpreter running alongside what I'm doing, but it'd be even better if I had both the autocomplete-like feature, and the parameter description in the manner that bpython does while I'm editing code in Emacs. Am I completely crazy? Does anyone have an idea on how to do this?
Thanks,
Bradley Powers
You're not completely crazy.
python-mode can integrate with eldoc-mode to display the arg spec of the function you're calling at point. Just do M-x eldoc-mode while you're in a python file to turn it on and it should start working. It talks to an inferior python buffer to inspect the functions directly, so it should always be decently accurate. You can turn it on automatically for all new python-mode buffers with (add-hook 'python-mode-hook '(lambda () (eldoc-mode 1)) t) in your emacs startup file. Now, at this point I have to say that I don't do any regular python programming, and that when I tried it just now it didn't work. I spent a few minutes poking around in the source code and everything seems to be in place, but the code that it runs in the inferior process is just returning an empty string. Perhaps it's just my setup, or perhaps I'm reading the wrong source files; it's hard to say.
Emacs provides several different types of expansion/autocompletion. By default you have access to dabbrev-expand by hitting M-/. This is a fairly simple form of completion; it's just meant to work on any old file you happen to edit. More sophisticated is hippie-expand, but even that doesn't do anything python-specific. The documentation says that it can integrate with hippie-expand for exact completions, but this might be a lie; I couldn't figure out how it works. A little poking around shows several related solutions for this, all of which seem to rely on pymacs. If I were going to do a lot of python programming and didn't already have a fairly complicated emacs set up, I'd probably start by installing emacs-for-python. It looks to be a pretty complete setup, and even claims to have live warning/error detection.
In the spirit of helping others to help themselves, I'd like to point out how I came upon all of this information. My first step was to open a file in python-mode. I didn't actually have any python code available, so I just went to my scratch buffer and made it a python buffer (M-x python-mode). Then I asked for help about this strange new mode (C-h m) to see what it could do. It's author has kindly put a brief summary of what the mode can do which mentions eldoc-mode, Imenu, outline-mode, hippie-expand, rlcompleter, abbrev tables, and a bunch of other things. From there I started looking at the source code. For instance, to integrate with eldoc-mode, it defines a function called python-eldoc-function and gives that to the eldoc module for use in python buffers. Reading that code shows me how it interacts with the inferior buffer, etc.
I hope some of this helps.

How to prevent decompilation or inspecting python code?

let us assume that there is a big, commercial project (a.k.a Project), which uses Python under the hood to manage plugins for configuring new control surfaces which can be attached and used by Project.
There was a small information leak, some part of the Project's Python API leaked to the public information and people were able to write Python scripts which were called by the underlying Python implementation as a part of Project's plugin loading mechanism.
Further on, using inspect module and raw __dict__ readings, people were able to find out a major part of Project's underlying Python implementation.
Is there a way to keep the Python secret codes secret?
Quick look at Python's documentation revealed a way to suppres a import of inspect module this way:
import sys
sys.modules['inspect'] = None
Does it solve the problem completely?
No, this does not solve the problem. Someone could just rename the inspect module to something else and import it.
What you're trying to do is not possible. The python interpreter must be able to take your bytecode and execute it. Someone will always be able to decompile the bytecode. They will always be able to produce an AST and view the flow of the code with variable and class names.
Note that this process can also be done with compiled language code; the difference there is that you will get assembly. Some tools can infer C structure from the assembly, but I don't have enough experience with that to comment on the details.
What specific piece of information are you trying to hide? Could you keep the algorithm server side and make your software into a client that touches your web service? Keeping the code on a machine you control is the only way to really keep control over the code. You can't hand someone a locked box, the keys to the box, and prevent them from opening the box when they have to open it in order to run it. This is the same reason DRM does not work.
All that being said, it's still possible to make it hard to reverse engineer, but it will never be impossible when the client has the executable.
There is no way to keep your application code an absolute secret.
Frankly, if a group of dedicated and determined hackers (in the good sense, not in the pejorative sense) can crack the PlayStation's code signing security model, then your app doesn't stand a chance. Once you put your app into the hands of someone outside your company, it can be reverse-engineered.
Now, if you want to put some effort into making it harder, you can compile your own embedded python executable, strip out unnecessary modules, obfuscate the compiled python bytecode and wrap it up in some malware rootkit that refuses to start your app if a debugger is running.
But you should really think about your business model. If you see the people who are passionate about your product as a threat, if you see those who are willing to put time and effort into customizing your product to personalize their experience as a danger, perhaps you need to re-think your approach to security. Assuming you're not in the DRM business, or have a similar model that involves squeezing money from reluctant consumers, consider developing an approach that involves sharing information with your users, and allowing them to collaboratively improve your product.
Is there a way to keep the Python secret codes secret?
No there is not.
Python is particularly easy to reverse engineer, but other languages, even compiled ones, are easy enough to reverse.
You cannot fully prevent reverse engineering of software - if it comes down to it, one can always analyze the assembler instructions your program consists of.
You can, however, significantly complicate the process, for example by messing with Python internals. However, before jumping to how to do it, I'd suggest you evaluate whether to do it. It's usually harder to "steal" your code (one needs to fully understand them to be able to extend them, after all) than code it oneself. A pure, unobfuscated Python plugin interface, however, can be vital in creating a whole ecosystem around your program, far outweighing the possible downsides to having someone peek in your maybe not perfectly designed coding internals.

What builtin functions shouldn't be run by untrusted users?

I'm creating a corewars type application that runs on django and allows a user to upload some python code that will control their character. Now, I know the real answer to this is that as long as I'm taking code input from untrusted users I'll have security vulnerabilities. I'm just trying to minimize the risk as much as possible. Here are some that spring to mind:
__import__ (I'll probably also do some ast scanning to make sure there aren't any import statements)
open
file
input
raw_input
Are there any others I'm missing?
There are lots of answers on what to do in general about restricting Python at http://wiki.python.org/moin/SandboxedPython. When I looked at it some time ago, the Zope RestrictedPython looked the best solution, working with a whitelist system. You'll still need to take care in your own code so that you don't expose any security vulnerabilities, but that seems to be the best system out there.
Since you sound determined to do this, I'll link you to the standard rexec module, not because I think you should use it (don't - it has known vulnerabilities), but because it might be a good starting point for getting your webserver compromised your own restricted-execution framework.
In particular, under the heading "Defining restricted environments" several modules and functions are listed that were considered reasonably safe by the rexec designer; these might be usable as an initial whitelist of sorts. I'd also suggest examining its code for other gotchas you might not have thought of.
You will really need to avoid eval.
Imagine code such as:
eval("__impor" + "t__('whatever').destroy_your_server")
This is probably the most important one.
Yeah, you have to whitelist. There are so many ways to hide the bad commands.
This is NOT the worst case scenario:
the worst case scenario is that someone gets into the database
The worst case scenario is getting the entire machine rooted and you not noticing as it probes your other machines and keylogs your passwords. Isolate this machine and consider it hostile (DMZ, block it from being able to launch attacks internally and externally, etc). Run tripwire or AIDE on non-writeable media and log everything to a second host.
Finally, as plash shows, there are a lot of dangerous system calls that need to be protected against.
If you're not committed to using Python as the language inside the game, one possibility would be to embed Lua using LunaticPython (I suggest the bugfixes branch at https://code.launchpad.net/~dne/lunatic-python/bugfixes).
It's much easier to sandbox Lua than Python, and it's much easier to embed Lua than to create your own programming language.
You should use a whitelist, rather than a blacklist. If you use a blacklist, you will always miss something. Even if you don't, Python will add a function to the standard library, and you won't update your blacklist in time.
Things you're currently allowing but probably should not include:
compile
eval
reload (if they do access the filesystem somehow, this is basically import)
I agree that this would be very tricky to do correctly. One complication (among many) could be a user accessing one of these functions through a field in another class.
I would consider using another isolation mechanism, such as a virtual machine, instead or in addition to this. You might look at how codepad does it.

Python package name conventions

Is there a package naming convention for Python like Java's com.company.actualpackage? Most of the time I see simple, potentially colliding package names like "web".
If there is no such convention, is there a reason for it? What do you think of using the Java naming convention in the Python world?
Python has two "mantras" that cover this topic:
Explicit is better than implicit.
and
Namespaces are one honking great idea -- let's do more of those!
There is a convention for naming of and importing of modules that can be found in The Python Style Guide (PEP 8).
The biggest reason that there is no such convention to consistently prefix your modules names in a Java style, is because over time you end up with a lot of repetition in your code that doesn't really need to be there.
One of the problems with Java is it forces you to repeat yourself, constantly. There's a lot of boilerplate that goes into Java code that just isn't necessary in Python. (Getters/setters being a prime example of that.)
Namespaces aren't so much of a problem in Python because you are able to give modules an alias upon import. Such as:
import com.company.actualpackage as shortername
So you're not only able to create or manipulate the namespace within your programs, but are able to create your own keystroke-saving aliases as well.
The Java's conventions also has its own drawbacks. Not every opensource package has a stable website behind it. What should a maintainer do if his website changes? Also, using this scheme package names become long and hard to remember. Finally, the name of the package should represent the purpose of the package, not its owner
An update for anyone else who comes looking for this:
As of 2012, PEP 423 addresses this. PEP 8 touches on the topic briefly, but only to say: all lowercase or underscores.
The gist of it: pick memorable, meaningful names that aren't already used on PyPI.
There is no Java-like naming convention for Python packages. You can of course adopt one for any package you develop yourself, but you might have to invasively edit any package you may adopt from third parties, and the "culturally alien" naming convention will probably sap the changes of your own packages to be widely adopted outside of your organization.
Technically, there would be nothing wrong with Java's convention in Python (it would just make some from statements a tad longer, no big deal), but in practice the cultural aspects make it pretty much unfeasible.
The reason there's normally no package hierarchy is because Python packages aren't easily extended that way. Packages are actual directories, and though you can make packages look in multiple directories for sub-modules (by adding directories to the __path__ list of the package) it's not convenient, and easily done wrong. As for why Python packages aren't easily extended that way, well, that's a design choice. Guido didn't like deep hierarchies (and still doesn't) and doesn't think they're necessary.
The convention is to pick a toplevel package name that's obvious but unique to your project -- for example, the name of the project itself. You can structure everything inside it however you want (because you are in control of it.) Splitting the package into separate bits with separate owners is a little more work, but with a few guidelines it's possible. It's rarely needed.
There's nothing stopping you using that convention if you want to, but it's not at all standard in the Python world and you'd probably get funny looks. It's not much fun to take care of admin on packages when they're deeply nested in com.
It may sound sloppy to someone coming from Java, but in reality it doesn't really seem to have caused any big difficulties, even with packages as poorly-named as web.py.
The place where you often do get namespace conflicts in practice is relative imports: where code in package.module1 tries to import module2 and there's both a package.module2 and a module2 in the standard library (which there commonly is as the stdlib is large and growing). Luckily, ambiguous relative imports are going away.
I've been using python for years and 99.9% of the collisions I have seen comer from new developers trying to name a file "xml.py". I can see some advantages to the Java scheme, but most developers are smart enough to pick reasonable package names, so it really is't that big of a problem.

Categories