When is semicolon use in Python considered "good" or "acceptable"? - python

Python is a "whitespace delimited" language. However, the use of semicolons are allowed. For example, the following works, but it is frowned upon:
print("Hello!");
print("This is valid");
I've been using Python for several years now, and the only time I have ever used semicolons is in generating one-time command-line scripts with Python:
python -c "import inspect, mymodule; print(inspect.getfile(mymodule))"
Or adding code in comments on Stack Overflow (i.e., "you should try import os; print os.path.join(a,b)")
I also noticed in this answer to a similar question that the semicolon can also be used to make one line if blocks, as in
if x < y < z: print(x); print(y); print(z)
which is convenient for the two usage examples I gave (command-line scripts and comments).
The above examples are for communicating code in paragraph form or making short snippets, but not something I would expect in a production codebase.
Here is my question: in Python, is there ever a reason to use the semicolon in a production code? I imagine that they were added to the language solely for the reasons I have cited, but it’s always possible that Guido had a grander scheme in mind.
No opinions please; I'm looking either for examples from existing code where the semicolon was useful, or some kind of statement from the python docs or from Guido about the use of the semicolon.

PEP 8 is the official style guide and says:
Compound statements (multiple statements on the same line) are generally discouraged.
(See also the examples just after this in the PEP.)
While I don't agree with everything PEP 8 says, if you're looking for an authoritative source, that's it. You should use multi-statement lines only as a last resort. (python -c is a good example of such a last resort, because you have no way to use actual linebreaks in that case.)

I use semicolons in code all of the time. Our code folds across lines quite frequently, and a semicolon is a definitive assertion that a statement is ending.
output, errors, status = generate_output_with_errors_and_status(
first_monstrous_functional_argument(argument_one_to_argument
, argument_two_to_argument)
, second_argument);
See? They're quite helpful to readability.

Related

Why is exec() dangerous? [duplicate]

I've seen this multiple times in multiple places, but never have found a satisfying explanation as to why this should be the case.
So, hopefully, one will be presented here. Why should we (at least, generally) not use exec() and eval()?
EDIT: I see that people are assuming that this question pertains to web servers – it doesn't. I can see why an unsanitized string being passed to exec could be bad. Is it bad in non-web-applications?
There are often clearer, more direct ways to get the same effect. If you build a complex string and pass it to exec, the code is difficult to follow, and difficult to test.
Example: I wrote code that read in string keys and values and set corresponding fields in an object. It looked like this:
for key, val in values:
fieldName = valueToFieldName[key]
fieldType = fieldNameToType[fieldName]
if fieldType is int:
s = 'object.%s = int(%s)' % (fieldName, fieldType)
#Many clauses like this...
exec(s)
That code isn't too terrible for simple cases, but as new types cropped up it got more and more complex. When there were bugs they always triggered on the call to exec, so stack traces didn't help me find them. Eventually I switched to a slightly longer, less clever version that set each field explicitly.
The first rule of code clarity is that each line of your code should be easy to understand by looking only at the lines near it. This is why goto and global variables are discouraged. exec and eval make it easy to break this rule badly.
When you need exec and eval, yeah, you really do need them.
But, the majority of the in-the-wild usage of these functions (and the similar constructs in other scripting languages) is totally inappropriate and could be replaced with other simpler constructs that are faster, more secure and have fewer bugs.
You can, with proper escaping and filtering, use exec and eval safely. But the kind of coder who goes straight for exec/eval to solve a problem (because they don't understand the other facilities the language makes available) isn't the kind of coder that's going to be able to get that processing right; it's going to be someone who doesn't understand string processing and just blindly concatenates substrings, resulting in fragile insecure code.
It's the Lure Of Strings. Throwing string segments around looks easy and fools naïve coders into thinking they understand what they're doing. But experience shows the results are almost always wrong in some corner (or not-so-corner) case, often with potential security implications. This is why we say eval is evil. This is why we say regex-for-HTML is evil. This is why we push SQL parameterisation. Yes, you can get all these things right with manual string processing... but unless you already understand why we say those things, chances are you won't.
eval() and exec() can promote lazy programming. More importantly it indicates the code being executed may not have been written at design time therefore not tested. In other words, how do you test dynamically generated code? Especially across browsers.
Security aside, eval and exec are often marked as undesirable because of the complexity they induce. When you see a eval call you often don't know what's really going on behind it, because it acts on data that's usually in a variable. This makes code harder to read.
Invoking the full power of the interpreter is a heavy weapon that should be only reserved for very tricky cases. In most cases, however, it's best avoided and simpler tools should be employed.
That said, like all generalizations, be wary of this one. In some cases, exec and eval can be valuable. But you must have a very good reason to use them. See this post for one acceptable use.
In contrast to what most answers are saying here, exec is actually part of the recipe for building super-complete decorators in Python, as you can duplicate everything about the decorated function exactly, producing the same signature for the purposes of documentation and such. It's key to the functionality of the widely used decorator module (http://pypi.python.org/pypi/decorator/). Other cases where exec/eval are essential is when constructing any kind of "interpreted Python" type of application, such as a Python-parsed template language (like Mako or Jinja).
So it's not like the presence of these functions are an immediate sign of an "insecure" application or library. Using them in the naive javascripty way to evaluate incoming JSON or something, yes that's very insecure. But as always, its all in the way you use it and these are very essential functions.
I have used eval() in the past (and still do from time-to-time) for massaging data during quick and dirty operations. It is part of the toolkit that can be used for getting a job done, but should NEVER be used for anything you plan to use in production such as any command-line tools or scripts, because of all the reasons mentioned in the other answers.
You cannot trust your users--ever--to do the right thing. In most cases they will, but you have to expect them to do all of the things you never thought of and find all of the bugs you never expected. This is precisely where eval() goes from being a tool to a liability.
A perfect example of this would be using Django, when constructing a QuerySet. The parameters passed to a query accepts keyword arguments, that look something like this:
results = Foo.objects.filter(whatever__contains='pizza')
If you're programmatically assigning arguments, you might think to do something like this:
results = eval("Foo.objects.filter(%s__%s=%s)" % (field, matcher, value))
But there is always a better way that doesn't use eval(), which is passing a dictionary by reference:
results = Foo.objects.filter( **{'%s__%s' % (field, matcher): value} )
By doing it this way, it's not only faster performance-wise, but also safer and more Pythonic.
Moral of the story?
Use of eval() is ok for small tasks, tests, and truly temporary things, but bad for permanent usage because there is almost certainly always a better way to do it!
Allowing these function in a context where they might run user input is a security issue, and sanitizers that actually work are hard to write.
Same reason you shouldn't login as root: it's too easy to shoot yourself in the foot.
Don't try to do the following on your computer:
s = "import shutil; shutil.rmtree('/nonexisting')"
eval(s)
Now assume somebody can control s from a web application, for example.
Reason #1: One security flaw (ie. programming errors... and we can't claim those can be avoided) and you've just given the user access to the shell of the server.
Try this in the interactive interpreter and see what happens:
>>> import sys
>>> eval('{"name" : %s}' % ("sys.exit(1)"))
Of course, this is a corner case, but it can be tricky to prevent things like this.

Is there a way to put PyDev's #UndefinedVariable in multiline statements?

I am using PyDev with Eclipse and I have some attributes that are only set during runtime. Normally I can fix PyDev's errors like this:
obj.runtime_attr # #UndefinedVariable
However, since my statement is long and thus, with respect to PEP8, multiline, it looks like this:
some.long.statement.\
with.multiline(obj.runtime_attr).\
more()
Now I cannot add #UndefinedVariable because it breaks line continuation (PEP8 demands there are two spaces before a line-ending comment). However, I cannot put it in the end of the line (it just doesn't work):
some.long.statement.\
with.multiline(obj.runtime_attr).\
more() # #UndefinedVariable
Is there any way this could work that I am overlooking? Is this just a missing feature where you cannot get it right?
First, remember that the most important rule of PEP 8 is:
But most importantly: know when to be inconsistent -- sometimes the style guide just doesn't apply. When in doubt, use your best judgment. Look at other examples and decide what looks best.
And it specifically says to avoid a rule:
When applying the rule would make the code less readable, even for someone who is used to reading code that follows the rules.
That being said, you're already violating the letter and the spirit of PEP 8 just by having these lines of code, unless you can't avoid it without making things worse. As Maximum Line Length says, using backslash continuations is the least preferred way to deal with long lines. On top of that, it specifically says to "Make sure to indent the continued line appropriately", which you aren't doing.
The obvious way to break this up is to use some intermediates variables. This isn't C++; there's no "copy constructor" cost to worry about. In a real-life example (unlike this toy example), there are probably good names that you can come up with that will be much more meaningful than the long expression they replace.
intermediate = some.long.statement
multiline = intermediate.with.multiline(obj.runtime_attr)
more = multiline.more()
If that isn't appropriate, as PEP 8 explicitly says, it's better to rely on parenthetical continuations than backslash continuations. Is that doable here? Sure:
some.long.statement.with.multiline(
obj.runtime_attr).more()
Or, if worst comes to worst:
(some.long.statement.
with.multiline(obj.runtime_attr).more())
This sometimes makes things less readable rather than more, in which case you shouldn't do it. But it's always an option. And if you have to go to extraordinary lengths to make backslash continuation work for you, it's probably going to be worse than even the worst excesses of over-parenthetizing.
At any rate, doing things either of these ways means you can put a comment on the end of each line, so your problem never comes up in the first place.

Why is print not a function in python?

Why is print a keyword in python and not a function?
Because Guido has decided that he made a mistake. :)
It has since been corrected: try Python 3, which dedicates a section of its release notes to describing the change to a function.
For the whole background, see PEP 3105 and the several links provided in its References section!
print was a statement in Python because it was a statement in ABC, the main inspiration for Python (although it was called WRITE there). That in turn probably had a statement instead of a function as it was a teaching language and as such inspired by basic. Python on the other hand, turned out to be more than a teaching language (although it's good for that too).
However, nowadays print is a function. Yes, in Python 2 as well, you can do
from __future__ import print_function
and you are all set. Works since Python 2.6.
It is now a function in Python 3.
The print statement in Python 2.x has some special syntax which would not be available for an ordinary function. For example you can use a trailing , to suppress the output of a final newline or you can use >> to redirect the output to a file. But all this wasn't convincing enough even to Guido van Rossum himself to keep it a statement -- he turned print into a function in Python 3.x.
An answer that draws from what I appreciate about the print statement, but not necessarily from the official Python history...
Python is, to some extent, a scripting language. Now, there are lots of definitions of "scripting language", but the one I'll use here is: a language designed for efficient use of short or interactive programs. Such languages tend to allow one-line programs without excessive boilerplate; make keyboard input easier (for instance, by avoiding excessive punctuation); and provide built-in syntax for common tasks (convenience at the possible expense of purity). In Python's case, printing a value is a very common thing to do, especially in interactive mode. Requiring print to be a function seems unnecessarily inconvenient here. There's a significantly lower risk of error with the special syntax that does the right thing 99% of the time.
I will throw in my thoughts on this:
In Python 2.x print is not a statement by mistake, or because printing to stdout is such a basic thing to do. Everything else is so thought-through or has at least understandable reasons that a mistake of that order would seem odd. If communicating with stdout would have been cosidered so basic, communicating with stdin would have to be just as important, yet input() is a function.
If you look at the list of reserved keywords and the list of statements which are not expressions, print clearly stands out which is another hint that there must be very specific reasons.
I think print had to be a statement and not an expression, to avoid a security breach in input(). Remember that input() in Python2 evaluates whatever the user types into stdin. If the user typed print a and a holds a list of all passwords, that would be quiet catastrophic.
Apparently, the ability of input() to evaluate expressions was considered more important than print being a normal built-in function.

The origin of using # as a comment in Python?

So I just had like this mental explosion dude! I was looking at my Python source code and was reading some comments and then I looked a the comments again. When I came across this:
#!/usr/bin/env python
# A regular comment
Which made me wonder, was # chosen as the symbol to start a comment because it would allow the python program to be invoked in a shell, like so:
./test.py
and then be ignored once the python interpreter was running?
Yes.
Using # to start a comment is a convention followed by every major interpreted language designed to work on POSIX systems (i.e. not Windows).
It also dovetails nicely with the fact that the sequence "#!" at the beginning of a file is recognized by the OS to mean "run the command on this line" when you try to run the script file itself.
But mostly, it's the commonly accepted convention. If python didn't use # to start a comment, it would confuse a lot of people.
EDIT
Using "#" as a comment marker apparently pre-dates the "#!" hash-bang notation. "#!" was introduced by Dennis Ritchie betwen Unix 7 and 8, while languages that support # as a comment marker existed earlier. For example, the Bourne shell was already the default when Version 7 Unix was introduced.
Therefore, the convention of using "#" as a comment marker probably influenced the choice of "#!" as the command line marker, and not the other way around.
Using # for comments was happening before Python came around. The shebang (#!/usr/bin/env python) convention is almost as old as UNIX itself. The two are intertwined for many interpreted (aka shell) languages.
Might as well study up on the history of the shebang!
"All the rest of the line after character X" is clearly the handiest way to do comments if you have an X available (C++ had to use two characters, //, for the purpose, to offer an alternative to C's clunky PL/I-inspired '/' ... '/' "brackets").
Almost all printable Ascii characters can be used for other purposes in Python -- if the choice is between # and ?, with the first already being familiar from its use in sh, bash, tcl, perl, awk, ... -- it's not a very hard choice, is it? The handiness of hashbang is just a vig.
Very good, you are correct.
This is known as the shebang or hash-bang. I suppose a scripting language could use any comment character it liked and allow #! on the first line as a comment, but it does seem easier to just make # be the comment character...

Is there a way around coding in Python without the tab, indent & whitespace criteria?

I want to start using Python for small projects but the fact that a misplaced tab or indent can throw a compile error is really getting on my nerves. Is there some type of setting to turn this off?
I'm currently using NotePad++. Is there maybe an IDE that would take care of the tabs and indenting?
The answer is no.
At least, not until something like the following is implemented:
from __future__ import braces
No. Indentation-as-grammar is an integral part of the Python language, for better and worse.
Emacs! Seriously, its use of "tab is a command, not a character", is absolutely perfect for python development.
All of the whitespace issues I had when I was starting Python were the result mixing tabs and spaces. Once I configured everything to just use one or the other, I stopped having problems.
In my case I configured UltraEdit & vim to use spaces in place of tabs.
It's possible to write a pre-processor which takes randomly-indented code with pseudo-python keywords like "endif" and "endwhile" and properly indents things. I had to do this when using python as an "ASP-like" language, because the whole notion of "indentation" gets a bit fuzzy in such an environment.
Of course, even with such a thing you really ought to indent sanely, at which point the conveter becomes superfluous.
I find it hard to understand when people flag this as a problem with Python. I took to it immediately and actually find it's one of my favourite 'features' of the language :)
In other languages I have two jobs:
1. Fix the braces so the computer can parse my code
2. Fix the indentation so I can parse my code.
So in Python I have half as much to worry about ;-)
(nb the only time I ever have problem with indendation is when Python code is in a blog and a forum that messes with the white-space but this is happening less and less as the apps get smarter)
I'm currently using NotePad++. Is
there maybe an IDE that would take
care of the tabs and indenting?
I liked pydev extensions of eclipse for that.
I do not believe so, as Python is a whitespace-delimited language. Perhaps a text editor or IDE with auto-indentation would be of help. What are you currently using?
No, there isn't. Indentation is syntax for Python. You can:
Use tabnanny.py to check your code
Use a syntax-aware editor that highlights such mistakes (vi does that, emacs I bet it does, and then, most IDEs do too)
(far-fetched) write a preprocessor of your own to convert braces (or whatever block delimiters you love) into indentation
You should disable tab characters in your editor when you're working with Python (always, actually, IMHO, but especially when you're working with Python). Look for an option like "Use spaces for tabs": any decent editor should have one.
I agree with justin and others -- pick a good editor and use spaces rather than tabs for indentation and the whitespace thing becomes a non-issue. I only recently started using Python, and while I thought the whitespace issue would be a real annoyance it turns out to not be the case. For the record I'm using emacs though I'm sure there are other editors out there that do an equally fine job.
If you're really dead-set against it, you can always pass your scripts through a pre-processor but that's a bad idea on many levels. If you're going to learn a language, embrace the features of that language rather than try to work around them. Otherwise, what's the point of learning a new language?
Not really. There are a few ways to modify whitespace rules for a given line of code, but you will still need indent levels to determine scope.
You can terminate statements with ; and then begin a new statement on the same line. (Which people often do when golfing.)
If you want to break up a single line into multiple lines you can finish a line with the \ character which means the current line effectively continues from the first non-whitespace character of the next line. This visually appears violate the usual whitespace rules but is legal.
My advice: don't use tabs if you are having tab/space confusion. Use spaces, and choose either 2 or 3 spaces as your indent level.
A good editor will make it so you don't have to worry about this. (python-mode for emacs, for example, you can just use the tab key and it will keep you honest).
Tabs and spaces confusion can be fixed by setting your editor to use spaces instead of tabs.
To make whitespace completely intuitive, you can use a stronger code editor or an IDE (though you don't need a full-blown IDE if all you need is proper automatic code indenting).
A list of editors can be found in the Python wiki, though that one is a bit too exhausting:
- http://wiki.python.org/moin/PythonEditors
There's already a question in here which tries to slim that down a bit:
https://stackoverflow.com/questions/60784/poll-which-python-ideeditor-is-the-best
Maybe you should add a more specific question on that: "Which Python editor or IDE do you prefer on Windows - and why?"
Getting your indentation to work correctly is going to be important in any language you use.
Even though it won't affect the execution of the program in most other languages, incorrect indentation can be very confusing for anyone trying to read your program, so you need to invest the time in figuring out how to configure your editor to align things correctly.
Python is pretty liberal in how it lets you indent. You can pick between tabs and spaces (but you really should use spaces) and can pick how many spaces. The only thing it requires is that you are consistent which ultimately is important no matter what language you use.
I was a bit reluctant to learn Python because of tabbing. However, I almost didn't notice it when I used Vim.
If you don't want to use an IDE/text editor with automatic indenting, you can use the pindent.py script that comes in the Tools\Scripts directory. It's a preprocessor that can convert code like:
def foobar(a, b):
if a == b:
a = a+1
elif a < b:
b = b-1
if b > a: a = a-1
end if
else:
print 'oops!'
end if
end def foobar
into:
def foobar(a, b):
if a == b:
a = a+1
elif a < b:
b = b-1
if b > a: a = a-1
# end if
else:
print 'oops!'
# end if
# end def foobar
Which is valid python.
Nope, there's no way around it, and it's by design:
>>> from __future__ import braces
File "<stdin>", line 1
SyntaxError: not a chance
Most Python programmers simply don't use tabs, but use spaces to indent instead, that way there's no editor-to-editor inconsistency.
I'm surprised no one has mentioned IDLE as a good default python editor. Nice syntax colors, handles indents, has intellisense, easy to adjust fonts, and it comes with the default download of python. Heck, I write mostly IronPython, but it's so nice & easy to edit in IDLE and run ipy from a command prompt.
Oh, and what is the big deal about whitespace? Most easy to read C or C# is well indented, too, python just enforces a really simple formatting rule.
Many Python IDEs and generally-capable text/source editors can handle the whitespace for you.
However, it is best to just "let go" and enjoy the whitespace rules of Python. With some practice, they won't get into your way at all, and you will find they have many merits, the most important of which are:
Because of the forced whitespace, Python code is simpler to understand. You will find that as you read code written by others, it is easier to grok than code in, say, Perl or PHP.
Whitespace saves you quite a few keystrokes of control characters like { and }, which litter code written in C-like languages. Less {s and }s means, among other things, less RSI and wrist pain. This is not a matter to take lightly.
In Python, indentation is a semantic element as well as providing visual grouping for readability.
Both space and tab can indicate indentation. This is unfortunate, because:
The interpretation(s) of a tab varies
among editors and IDEs and is often
configurable (and often configured).
OTOH, some editors are not
configurable but apply their own
rules for indentation.
Different sequences of
spaces and tabs may be visually
indistinguishable.
Cut and pastes can alter whitespace.
So, unless you know that a given piece of code will only be modified by yourself with a single tool and an unvarying config, you must avoid tabs for indentation (configure your IDE) and make sure that you are warned if they are introduced (search for tabs in leading whitespace).
And you can still expect to be bitten now and then, as long as arbitrary semantics are applied to control characters.
Check the options of your editor or find an editor/IDE that allows you to convert TABs to spaces. I usually set the options of my editor to substitute the TAB character with 4 spaces, and I never run into any problems.
Yes, there is a way. I hate these "no way" answers, there is no way until you discover one.
And in that case, whatever it is worth, there is one.
I read once about a guy who designed a way to code so that a simple script could re-indent the code properly. I didn't managed to find any links today, though, but I swear I read it.
The main tricks are to always use return at the end of a function, always use pass at the end of an if or at the end of a class definition, and always use continue at the end of a while. Of course, any other no-effect instruction would fit the purpose.
Then, a simple awk script can take your code and detect the end of block by reading pass/continue/return instructions, and the start of code with if/def/while/... instructions.
Of course, because you'll develop your indenting script, you'll see that you don't have to use continue after a return inside the if, because the return will trigger the indent-back mechanism. The same applies for other situations. Just get use to it.
If you are diligent, you'll be able to cut/paste and add/remove if and correct the indentations automagically. And incidentally, pasting code from the web will require you to understand a bit of it so that you can adapt it to that "non-classical" setting.

Categories