What tools or techniques can help avoid bugs, especially silly mistakes such as typos, when coding in Python and Django?
I know unit-testing every line of code is the "proper" way, but are there any shortcuts?
I know of pylint, but unfortunately it doesn't check Django ORM named parameters, where a typo can go unnoticed. Is there any tool that can handle this kind of bug?
A colleague had the idea of gathering statistics on tokens (for example, named parameters passed to functions) and flagging any token that appears only once in the code base as a possible typo.
Do you know of any tool that does something similar?
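The colleague's idea can be roughed out with the standard `ast` module. This is only a sketch under assumptions: the function name `rare_keyword_names`, the `max_count` threshold, and the sample sources are made up for illustration, and a real code base would produce many false positives for keyword names that are legitimately used once.

```python
import ast
from collections import Counter


def rare_keyword_names(sources, max_count=1):
    """Return keyword-argument names used at most `max_count` times.

    `sources` is an iterable of Python source strings (hypothetical input;
    in practice these would be read from the project's .py files).
    """
    counts = Counter()
    for source in sources:
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.Call):
                for kw in node.keywords:
                    if kw.arg is not None:  # kw.arg is None for **kwargs
                        counts[kw.arg] += 1
    return sorted(name for name, n in counts.items() if n <= max_count)


# Made-up sample: the third call misspells "headline".
sample = [
    "Entry.objects.filter(headline='a', pub_date=None)",
    "Entry.objects.filter(headline='b', pub_date=None)",
    "Entry.objects.filter(hedline='c')",
]
print(rare_keyword_names(sample))  # ['hedline']
```

The sources are only parsed, never executed, so names such as `Entry` do not need to exist.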
Sorry, I'm not sure I understand you correctly,
but I think a good IDE has automatic code validation, and some of them also work with Django. For example, there is a good Python plugin for Eclipse called PyDev. There is also a good IDE based on Eclipse/PyDev called Aptana Studio that you can try (it also has good support for editing HTML/JS/CSS).
This question is also a very good comparison of all the Python IDEs.
pyflakes is a static analyser that will find undeclared variables (e.g. typos) and the like. Plenty of editors have plugins that run pyflakes on the fly or on save. This is not a substitute for unit tests, but it can save a few unnecessary save-reload-run cycles.
Thank you for your answers, I'll check these tools.
I wanted to share some other ideas with you (none Python/Django specific):
Assert conditions in code, but remove the asserts from production code.
Run periodic checks on the data (e.g. sending an email to the developers when an unexpected state is found). If a bug slips by, it may be detected faster, before more data is corrupted (but alas, after some of it is already corrupt).
Make a single bottom-line test (perhaps simulating user input) that covers most of the program. It may catch exceptions and failed asserts, and it may be easier to maintain than many tests.
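As a small illustration of the first idea, here is a sketch with a made-up function (`apply_discount` is hypothetical). In CPython, running with the `-O` flag strips `assert` statements, which is one way to "remove them from production code" without editing the source.

```python
def apply_discount(price, percent):
    # Hypothetical example function. The assert documents and checks an
    # assumption during development; `python -O` removes it at run time.
    assert 0 <= percent <= 100, "percent out of range: %r" % (percent,)
    return price * (100 - percent) / 100.0


print(apply_discount(200, 25))  # 150.0
```

Because asserts can be stripped, they should only guard against programmer errors, not validate user input.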
My current makeshift approach is logging to a text file, but that isn't very interactive. I've tried using pdb, but that doesn't seem to get along with urwid: pdb doesn't take any input once it hits a breakpoint.
A couple of practices down the line... Debugging urwid is strange and not really possible in the classical sense: most of the time, once the canvas has been rendered, you can't really inspect things anymore.
What helped me:
Routing errors into a file. If you get exceptions and want to understand what, where and how, a nice implementation is given here: https://stackoverflow.com/a/12877023/5058041
Really try to understand what your modules are and how you want to achieve things. Reading the documentation for the (n+1)th time is a good idea.
Look at the implementation of the widgets you use. Often they have some more information.
I know that doesn't really count as debugging, but it helped me a lot in finding errors or strange behavior.
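One way to route errors into a file, as suggested in the first point, is to combine the standard `logging` module with `sys.excepthook`. This is a minimal sketch and not the implementation from the linked answer; the function name `log_exceptions_to` and the logger name are assumptions made for illustration.

```python
import logging
import sys


def log_exceptions_to(path):
    """Write uncaught exceptions (with tracebacks) to the file at `path`."""
    logger = logging.getLogger("tui_errors")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    handler = logging.FileHandler(path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)

    def hook(exc_type, exc_value, exc_tb):
        # exc_info makes logging append the full traceback to the message.
        logger.error("uncaught exception",
                     exc_info=(exc_type, exc_value, exc_tb))

    sys.excepthook = hook
    return logger
```

Calling something like `log_exceptions_to("debug.log")` before starting the main loop, and then watching the file with `tail -f` in a second terminal, keeps the output out of the screen the application is drawing on. The returned logger can also be used for ordinary debug messages.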
One thing I've found myself doing is to add a text widget just to display debugging messages.
I haven't built many complicated apps (a solitaire game was the biggest app I wrote with it), so this approach was good enough.
In some specific cases, you might still be able to get away using PUDB -- but since it's also using Urwid, it will steal the output from the app. In practice, after you go from your app to pudb (maybe from a pudb.set_trace() breakpoint added to your code), then you won't be able to get back to your app.
For more complex applications it might be interesting to build a "debug mode", or maybe you could try using remote pudb? Haven't tried that yet, but it looks useful. =)
Just in case anyone's searching for a better answer, I can report that VSCode's Python debugger, debugpy, is excellent for debugging urwid applications (and for debugging Python generally). The debugger is entirely separate from the console and doesn't interfere with drawing.
I'm creating a program in Python (2.7) and I want to protect it from reverse engineering.
I compiled it using cx_freeze (which supplies basic security: obfuscation and anti-debugging).
How can I add more protections, such as obfuscation, packing, anti-debugging, encrypting the code, and recognizing a VM?
I thought maybe I could encrypt the payload and decrypt it at run time, but I have no clue how to do it.
Generally speaking, it's almost impossible to make your program unbreakable as long as the hackers have enough motive.
Still, you can make it harder to reverse engineer: try using Cython to compile your core code into .pyd or .so files.
There's no way to make anything digital safe nowadays.
What you CAN do is make it hard to the point where it's frustrating to do, but I admit I don't know Python-specific ways to achieve that. The amount of security your program has is not actually a function of program security, but of psychology.
Yes, psychology.
Given that it's an arms race between crackers and anti-crackers, where both continuously attempt to top each other, the only thing one can do is try to make it as frustrating as possible. How do we achieve that?
By being a pain in the rear!
Every additional step you take to make sure your code is hard to decipher is a good one.
For example, you could turn your program into a single compiled block of bytecode, which you call from inside your program. Use an external library to encrypt it beforehand and decrypt it at run time. Do the same, with extra steps, for the code blocks of individual functions. Or have functions ready in precompiled blocks, but broken. At runtime, using byteplay, repair the bytecode with bytes that depend on the bytes of other functions, so that the program stops working when it is modified.
There are lots of ways of messing with people's heads, and while I can't tell you any Python-specific ways, if you think in terms of "how to be difficult", you'll find the weirdest ways of making your code a mess to deal with.
Funnily enough, this is much easier in assembly than in Python, so maybe you should look into executing foreign code via ctypes or whatever.
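To make the "encrypt a compiled block and decrypt it at run time" idea concrete, here is a toy sketch in Python 3 syntax (the question targets 2.7). Everything in it is an assumption made for illustration: the XOR "cipher" is a placeholder and not real encryption, and the key and the `greet` function are made up. Since the key has to ship with the program, this only raises the effort needed and does not prevent recovery of the code.

```python
import marshal


def xor_bytes(data, key):
    # Placeholder obfuscation, NOT real encryption: XOR with a repeating key.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


KEY = b"not-a-real-key"  # made-up key; it ships with the program

# Build step: compile the source and store only the scrambled bytecode.
source = "def greet(name):\n    return 'hello ' + name\n"
code = compile(source, "<protected>", "exec")
blob = xor_bytes(marshal.dumps(code), KEY)

# Run time: unscramble, load the code object and execute it.
namespace = {}
exec(marshal.loads(xor_bytes(blob, KEY)), namespace)
print(namespace["greet"]("world"))  # hello world
```

Note that `marshal` data is specific to the Python version that produced it, so the blob must be built with the same interpreter version that loads it.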
Summon your inner Troll!
Story time: I was a Python programmer for a long time. Recently I joined a company as a Python programmer. My manager had been a Java programmer for a decade, I guess. He gave me a project, and at the initial review he asked me whether we were obfuscating the code. I said we don't do that kind of thing in Python. He said they do that kind of thing in Java and wanted the same thing implemented in Python. Eventually I managed to obfuscate the code (just removing comments and spaces and renaming local variables), but the entire Python debugging process got messed up.
Then he asked me, "Can we use ProGuard?" I didn't know what the hell it was. After some googling I said it is for Java and cannot be used with Python. I also said that whatever we build we deploy on our own servers, so we don't actually need to protect the code. But he was reluctant to accept that and said, "We have a set of procedures and they must be followed before deploying."
Eventually I quit my job after a year, tired of fighting to convince them that Python is not Java. I also had no interest in making them think differently at that point.
TL;DR: Because of the open source nature of Python, there are no viable tools available to obfuscate or encrypt your code. I also don't think it is a problem as long as you deploy the code on your own server (providing software as a service). If you actually ship the product to the customer, there are some tools available to wrap up your code or bytecode and deliver it as an executable file, but it is always possible for them to view your code if they want to. Or you can choose some other language that provides better protection, if it is absolutely necessary to protect your code. Again, keep in mind that it is always possible to reverse engineer the code.
Python is a relatively new language for me, and I already see some of the trouble areas of maintaining a project based on a scripting language. I am wondering how the larger community deals with the following situations when one has to maintain a fairly large code base written by people who are no longer around:
Return type of a function/method. Assuming past developers didn't document the code very well, this is turning out to be really annoying, as I am basically reading code line by line to figure out what a method/function is supposed to return.
Code refactoring: I figured a lot of code needs to be moved around, edited, deleted, etc. But a lot of the time, simple errors that would be compile-time errors in compiled languages (wrong number of arguments, wrong type of arguments, method not present, etc.) only show up when you run the code and execution reaches the problematic area. Therefore, whether refactored code works at all can only be known once you run the code thoroughly. I am using PyLint with PyDev, but I still find it very lacking in this respect.
You are right, that's an issue with dynamically typed interpreted languages.
There are two important things that can help:
Good documentation
Extensive unit-testing.
They apply to other languages as well of course, but here they are especially important.
As far as I know, if code is not documented at all and the author isn't around anymore, it's up to you to find out what the code actually does.
That's why people should always stick to certain guidelines that can be enforced by style checkers like pep8. https://pypi.python.org/pypi/pep8
Comments and docstrings should be included in every method to avoid the situation you're describing. http://www.python.org/dev/peps/pep-0257/#what-is-a-docstring
Unit tests are also very helpful for refactoring, since you can check whether you broke something with the click of a button. http://docs.python.org/2/library/unittest.html
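As a small sketch of how a unit test can pin down an undocumented return type before refactoring, consider the following. The function `parse_version` is a made-up stand-in for some inherited, undocumented code.

```python
import unittest


def parse_version(text):
    # Hypothetical legacy function whose return type was never documented.
    return tuple(int(part) for part in text.split("."))


class ParseVersionTest(unittest.TestCase):
    def test_returns_tuple_of_ints(self):
        result = parse_version("2.7.18")
        # Record the behaviour discovered by reading the code, so a later
        # refactoring that changes the return type fails loudly.
        self.assertIsInstance(result, tuple)
        self.assertEqual(result, (2, 7, 18))
```

Saved in a test module, this can be run with `python -m unittest`. Tests like this double as documentation of what the function returns.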
Hope this helps.
Others have already mentioned documentation and unit-testing as being the main tools here. I want to add a third: the Python shell. One of the huge advantages of a non-compiled language like Python is that you can easily fire up the shell, import your module, and run the code there to see what it does and what it returns.
Linked to this is the Python debugger: just put import pdb;pdb.set_trace() at any point in your code, and when you run it you will be dropped into the interactive debugger where you can inspect the current values of the variables. In fact, the pdb shell is an actual Python shell as well, so you can even change things there.
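The same kind of exploration can be written down as a short script. The sketch below assumes a made-up undocumented function, `find_user`, and uses the standard `inspect` module to show its parameters before calling it to see what comes back.

```python
import inspect


def find_user(users, name, default=None):
    # Hypothetical undocumented legacy function.
    for user in users:
        if user["name"] == name:
            return user
    return default


# What parameters does it take?
print(inspect.signature(find_user))  # (users, name, default=None)

# What does it actually return for a sample input?
result = find_user([{"name": "ann", "id": 1}], "ann")
print(type(result).__name__)  # dict
```

In the interactive shell, `help(find_user)` gives similar information along with any docstring.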
I've looked at most of the IDEs out there. I've set up vim to use autocompletion and I'm using it right now. However, I can't seem to get it to work like Visual Studio with .NET. Autocompletion seems to work only in certain cases, and it only shows methods, not the parameters they take. It's pretty much unusable to me.
What I'm after is a pop-up that will show me all methods available and the parameters they take. Pretty much the feel of VS2010 when you're programming .NET.
You won't get the kind of autocompletion in a dynamic language like Python that you get in more explicitly typed languages. Consider:
def MyFunction(MyArg):
    MyArg.
When you type the "." in MyArg., you expect the editor to provide a list of methods with arguments. That can't happen in Python because the editor has absolutely no way of knowing what type (or types) MyArg could possibly be. Even the Python compiler doesn't have that information when it's compiling the code. That's why, if you put MyArg.SomeNonExistentFunction() you won't get any kind of error message until runtime.
If you wrote something like:
def MyFunction():
    MyObject = MyClass(SomeArg)
    MyObject.
then a smart enough editor can supply a list of methods available after that final ".".
You'll find that those editors that are supplying autocomplete "sometimes" are doing so in cases similar to my second example, and not doing so in cases similar to the first. With Python, that's as good as you can get.
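In Python 3, one way to give an editor the type information missing from the first example is a function annotation. The sketch below uses made-up names (`MyClass`, `describe`, `my_function`). Annotations are not enforced at run time, and how much an editor does with them depends on the tool.

```python
class MyClass:
    def __init__(self, label):
        self.label = label

    def describe(self):
        return "MyClass(%s)" % self.label


def my_function(my_arg: MyClass) -> str:
    # The annotation tells tools that my_arg is expected to be a MyClass,
    # so they can offer its methods after "my_arg.".
    return my_arg.describe()


print(my_function(MyClass("a")))  # MyClass(a)
```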
I've been using Eclipse with the PyDev extension for some time now. The auto-completion there is really quite impressive, I highly recommend it.
Gedit has a developer plugin which tries to do some syntax completion. For reasons already mentioned, it doesn't work very well. I found it more annoying than helpful and disabled it after a few weeks trial.
IPython's new Qt console has tab completion, and you can get tooltip-style popups with syntax help and docstrings.
But as most people have already pointed out, this kind of thing you are asking for is really more appropriate for less dynamic languages.
I'm creating a corewars-type application that runs on Django and allows a user to upload some Python code that will control their character. Now, I know the real answer to this is that as long as I'm taking code input from untrusted users I'll have security vulnerabilities. I'm just trying to minimize the risk as much as possible. Here are some builtins to block that spring to mind:
__import__ (I'll probably also do some ast scanning to make sure there aren't any import statements)
open
file
input
raw_input
Are there any others I'm missing?
There are lots of answers on what to do in general about restricting Python at http://wiki.python.org/moin/SandboxedPython. When I looked at it some time ago, Zope's RestrictedPython looked like the best solution, working with a whitelist system. You'll still need to take care in your own code so that you don't expose any security vulnerabilities, but that seems to be the best system out there.
Since you sound determined to do this, I'll link you to the standard rexec module, not because I think you should use it (don't: it has known vulnerabilities), but because it might be a good starting point for your own restricted-execution framework (or, put less kindly, for getting your webserver compromised).
In particular, under the heading "Defining restricted environments" several modules and functions are listed that were considered reasonably safe by the rexec designer; these might be usable as an initial whitelist of sorts. I'd also suggest examining its code for other gotchas you might not have thought of.
You will really need to avoid eval.
Imagine code such as:
eval("__impor" + "t__('whatever').destroy_your_server")
This is probably the most important one.
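To see why string concatenation defeats a name-based filter, here is a sketch of a naive substring blacklist (the function `naive_blacklist_ok` and its banned list are made up for illustration). The payload is only inspected as text and is never evaluated.

```python
def naive_blacklist_ok(source, banned=("__import__", "open", "file")):
    # Hypothetical filter: reject source text containing a banned name.
    return not any(word in source for word in banned)


# The banned name never appears literally; eval would assemble it at run time.
payload = '''eval("__impor" + "t__('os')")'''

print(naive_blacklist_ok("__import__('os')"))  # False: caught
print(naive_blacklist_ok(payload))             # True: slips through
```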
Yeah, you have to whitelist. There are so many ways to hide the bad commands.
This is NOT the worst case scenario:
"the worst case scenario is that someone gets into the database"
The worst case scenario is getting the entire machine rooted and you not noticing as it probes your other machines and keylogs your passwords. Isolate this machine and consider it hostile (DMZ, block it from being able to launch attacks internally and externally, etc). Run tripwire or AIDE on non-writeable media and log everything to a second host.
Finally, as plash shows, there are a lot of dangerous system calls that need to be protected against.
If you're not committed to using Python as the language inside the game, one possibility would be to embed Lua using LunaticPython (I suggest the bugfixes branch at https://code.launchpad.net/~dne/lunatic-python/bugfixes).
It's much easier to sandbox Lua than Python, and it's much easier to embed Lua than to create your own programming language.
You should use a whitelist, rather than a blacklist. If you use a blacklist, you will always miss something. Even if you don't, Python will add a function to the standard library, and you won't update your blacklist in time.
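A whitelist can be sketched at the syntax level with the standard `ast` module: parse the uploaded source and reject it if it contains any node type that is not explicitly allowed. The allowed set below is a tiny, made-up example (assuming Python 3.8+, where literals are `ast.Constant`), and a check like this is not a sandbox on its own. It only illustrates the whitelist principle.

```python
import ast

# Made-up, deliberately tiny whitelist: assignments and simple arithmetic.
ALLOWED_NODES = (
    ast.Module, ast.Expr, ast.Assign, ast.Name, ast.Load, ast.Store,
    ast.Constant, ast.BinOp, ast.Add, ast.Sub, ast.Mult,
)


def uses_only_allowed_nodes(source):
    """Return True if every AST node in `source` is on the whitelist."""
    tree = ast.parse(source)  # parses only; nothing is executed
    return all(isinstance(node, ALLOWED_NODES) for node in ast.walk(tree))


print(uses_only_allowed_nodes("x = 1 + 2"))         # True
print(uses_only_allowed_nodes("import os"))         # False
print(uses_only_allowed_nodes("__import__('os')"))  # False: calls not allowed
```

Anything new that Python adds to the language is rejected by default, which is the advantage over a blacklist described above.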
Things you're currently allowing but probably should not include:
compile
eval
reload (if they do access the filesystem somehow, this is basically import)
I agree that this would be very tricky to do correctly. One complication (among many) could be a user accessing one of these functions through a field in another class.
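A well-known illustration of reaching things through attributes of other objects: starting from an empty tuple literal, plain attribute access leads to `object` and to every class derived from it, without naming a single builtin function. The variable names below are made up.

```python
# No builtin name is used here, only attribute access on a literal.
base = ().__class__.__base__        # tuple -> object
reachable = base.__subclasses__()   # every class currently derived from object

print(base is object)       # True
print(len(reachable) > 0)   # True
```

This is why removing names such as `open` or `__import__` from the namespace is not enough by itself.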
I would consider using another isolation mechanism, such as a virtual machine, instead of or in addition to this. You might look at how codepad does it.