Python design mistakes [closed]


Closed 11 years ago.
A while ago, when I was learning JavaScript, I studied JavaScript: The Good Parts, and I particularly enjoyed the chapters on the bad and the ugly parts. Of course, I did not agree with everything, as summing up the design defects of a programming language is to a certain extent subjective, although, for instance, I guess everyone would agree that the with keyword was a mistake in JavaScript. Nevertheless, I find it useful to read such reviews: even if one does not agree, there is a lot to learn.
Is there a blog entry or some book describing design mistakes for Python? For instance, I guess some people would count the lack of tail call optimization a mistake; there may be other issues (or non-issues) which are worth learning about.

You asked for a link or other source, but there really isn't one. The information is spread over many different places. What really constitutes a design mistake, and do you count just syntactic and semantic issues in the language definition, or do you include pragmatic things like platform and standard library issues and specific implementation issues? You could say that Python's dynamism is a design mistake from a performance perspective, because it makes it hard to make a straightforward efficient implementation, and it makes it hard (I didn't say completely impossible) to make an IDE with code completion, refactoring, and other nice things. At the same time, you could argue for the pros of dynamic languages.
Maybe one approach to start thinking about this is to look at the language changes from Python 2.x to 3.x. Some people would of course argue that print being a function is inconvenient, while others think it's an improvement. Overall, there are not that many changes, and most of them are quite small and subtle. For example, map() and filter() return iterators instead of lists, range() behaves like xrange() used to, and dict methods like dict.keys() return views instead of lists. Then there are some changes related to integers, and one of the big changes is the handling of binary and string data: the split is now between text (str) and data (bytes), and text is always Unicode. There are several syntactic changes, but they are more about consistency than about revamping the whole language.
From this perspective, it appears that Python has been pretty well designed on the language (syntax and semantics) level since at least 2.x. You can always argue about indentation-based block syntax, but we all know that doesn't lead anywhere... ;-)
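To make a few of the 2.x-to-3.x differences mentioned above concrete, here is a small sketch, assuming a Python 3 interpreter. The variable names are arbitrary; it shows map() returning a one-shot iterator, dict.keys() returning a live view, range() being lazy, and the str/bytes split.

```python
# Python 3 only: a quick tour of some 2.x -> 3.x behaviour changes.

# map() (like filter()) now returns a lazy, one-shot iterator instead of a list.
squares = map(lambda n: n * n, [1, 2, 3])
first_pass = list(squares)   # [1, 4, 9]
second_pass = list(squares)  # [] because the iterator is already exhausted

# dict.keys() returns a view that reflects later changes to the dict.
prices = {"apple": 1}
keys = prices.keys()
prices["pear"] = 2
keys_now = sorted(keys)      # ['apple', 'pear']

# range() is lazy like the old xrange(): no trillion-element list is built.
huge = range(10**12)
fifth = huge[5]              # 5

# Text (str) is always Unicode; binary data is a separate bytes type.
text = "\u00e9"              # the character 'é'
data = text.encode("utf-8")  # b'\xc3\xa9'

print(first_pass, second_pass, keys_now, fifth, type(data).__name__)
```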
Another approach is to look at what alternative Python implementations are trying to address. Most of them address performance in some way, some address platform issues, and some add to or change the language itself to solve certain kinds of tasks more efficiently. Unladen Swallow wants to make Python significantly faster by optimizing the runtime byte-compilation and execution stages. Stackless adds functionality for efficient, heavily threaded applications by adding constructs like microthreads and tasklets, channels to allow bidirectional tasklet communication, scheduling to run tasklets cooperatively or preemptively, and serialisation to suspend and resume tasklet execution. Jython allows using Python on the Java platform, and IronPython on the .NET platform. Cython is a Python dialect which allows calling C functions and declaring C types, allowing the compiler to generate efficient C code from Cython code. Shed Skin brings implicit static typing into Python and generates C++ for standalone programs or extension modules. PyPy implements Python in a subset of Python, and changes some implementation details, such as using a garbage collector instead of reference counting. The purpose is to make development of both the Python language and its implementation more efficient thanks to the higher-level implementation language. PyV8 bridges Python and JavaScript through the V8 JavaScript engine; you could say it's solving a platform issue. Psyco is a special kind of JIT that dynamically generates specialised versions of the running code for the data currently being handled, which can give speedups for your Python code without having to write optimised C modules.
Of these, something can be said about the current state of Python by looking at PEP 3146, which outlines how Unladen Swallow would be merged into CPython. This PEP was accepted, and thus reflected the Python developers' judgement at the time of the most feasible direction to take (Unladen Swallow development later stalled, and the PEP was eventually withdrawn). Note that it addresses performance, not the language per se.
So really, I would say that Python's main design problems are in the performance domain, but these are basically the same challenges that any dynamic language has to face, and the Python family of languages and implementations is trying to address them. As for outright design mistakes like the ones listed in JavaScript: The Good Parts, I think the meaning of "mistake" needs to be defined more explicitly, but you may want to check out the following for thoughts and opinions:
FLOSS Weekly 11: Guido van Rossum (podcast August 4th, 2006)
The History of Python blog

Is there a blog entry or some book describing design mistakes for Python?
Yes.
It's called the Py3K list of backwards-incompatible changes.
Start here: http://docs.python.org/release/3.0.1/whatsnew/3.0.html
Read all the Python 3.x release notes for additional details on the mistakes in Python 2.

My biggest peeve with Python - and one which was not really addressed in the move to 3.x - is the lack of proper naming conventions in the standard library.
Why, for example, does the datetime module contain a class itself called datetime? (To say nothing of why we have separate datetime and time modules, but also a datetime.time class!) Why is datetime.datetime in lower case, but decimal.Decimal is upper case? And please, tell me why we have that terrible mess under the xml namespace: xml.sax, but xml.etree.ElementTree - what is going on there?

Try these links:
http://c2.com/cgi/wiki?PythonLanguage
http://c2.com/cgi/wiki?PythonProblems

Things that frequently surprise inexperienced developers are candidate mistakes. Here is one, default arguments:
http://www.deadlybloodyserious.com/2008/05/default-argument-blunders/
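A minimal sketch of the default-argument surprise, assuming Python 3; the function names are made up for illustration. The default value is evaluated once, when the def statement runs, so a mutable default is shared between calls. The usual fix is a None sentinel.

```python
def append_to(item, target=[]):
    # The default list is created once, when the def statement runs,
    # and then shared by every call that omits `target`.
    target.append(item)
    return target


def append_to_fixed(item, target=None):
    # Use None as a sentinel and build a fresh list on each call instead.
    if target is None:
        target = []
    target.append(item)
    return target


a = append_to(1)
b = append_to(2)
print(a, b, a is b)  # [1, 2] [1, 2] True

c = append_to_fixed(1)
d = append_to_fixed(2)
print(c, d, c is d)  # [1] [2] False
```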

A personal language peeve of mine is name binding for lambdas / local functions:
fns = []
for i in range(10):
    fns.append(lambda: i)
for fn in fns:
    print(fn())  # !!! always 9 - not what I'd naively expect
IMO, I'd much prefer looking up the names referenced in a lambda at declaration time. I understand the reasons for why it works the way it does, but still...
You currently have to work around it by binding i to a new name whose value doesn't change, using a function closure.
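Two common ways to do that binding, sketched under the assumption of Python 3 (make_returner is a hypothetical helper name): a small factory function whose parameter gives each lambda its own scope, or functools.partial, which stores the argument value when the partial object is created.

```python
import functools


def make_returner(value):
    # Each call to make_returner creates a new scope, so each lambda
    # closes over its own `value` instead of the shared loop variable.
    return lambda: value


fns = [make_returner(i) for i in range(10)]
print([fn() for fn in fns])  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# functools.partial binds the current value of i eagerly as well.
partial_fns = [functools.partial(lambda n: n, i) for i in range(10)]
print([fn() for fn in partial_fns])  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```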

This is more of a minor problem with the language, rather than a fundamental mistake, but: Property overriding. If you override a property (using getters and setters), there is no easy way of getting the parent class' property.
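One workaround, sketched here with hypothetical Base and Child classes and an arbitrary clamping rule: reading the parent's property through super() works, but assigning through super() raises AttributeError, so the child setter calls the parent property's fset function explicitly. This assumes Python 3 for the zero-argument super().

```python
class Base:
    def __init__(self):
        self._size = 0

    @property
    def size(self):
        return self._size

    @size.setter
    def size(self, value):
        self._size = value


class Child(Base):
    @property
    def size(self):
        # Reading the parent's property through super() works fine.
        return super().size

    @size.setter
    def size(self, value):
        # `super().size = value` raises AttributeError, so fetch the parent's
        # property object and call its setter function (fset) directly.
        Base.size.fset(self, max(0, value))


c = Child()
c.size = -5
print(c.size)  # 0
```

Hard-coding Base ties the child to its direct parent; super(Child, type(self)).size.fset(self, value) follows the MRO instead, at the cost of being even less readable.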

Yeah, it's strange but I guess that's what you get for having mutable variables.
I think the reason is that the "i" refers to a box which has a mutable value and the "for" loop will change that value over time, so reading the box value later gets you the only value there is left.
I don't know how one would fix that short of making it a functional programming language without mutable variables (at least without unchecked mutable variables).
The workaround I use is creating a new variable with a default value (default values being evaluated at DEFINITION time in Python, which is annoying at other times) which causes copying of the value to the new box:
fns = []
for i in range(10):
    fns.append(lambda j=i: j)
for fn in fns:
    print(fn())  # works

I find it surprising that nobody mentioned the global interpreter lock.

One of the things I find most annoying in Python is using writelines() and readlines() on a file. readlines() not only returns a list of lines, but it also still has the \n characters at the end of each line, so you have to always end up doing something like this to strip them:
lines = [l.replace("\n", "").replace("\r", "") for l in f.readlines()]
And when you want to use writelines() to write lines to a file, you have to add \n at the end of every line in the list before you write them, sort of like this:
f.writelines([l + "\n" for l in lines])
writelines() and readlines() should take care of endline characters in an OS independent way, so you don't have to deal with it yourself.
You should just be able to go:
lines = f.readlines()
and it should return a list of lines, without \n or \r characters at the end of the lines.
Likewise, you should just be able to go:
f.writelines(lines)
To write a list of lines to a file, and it should use the operating system's preferred line-ending characters when writing the file; you shouldn't need to do this to the list yourself first.
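A sketch of what that looks like in current Python 3, assuming a throwaway temporary file: text mode already translates line endings (on write, each "\n" becomes os.linesep; on read, universal newlines turn "\r\n" into "\n"), so the remaining work is joining and splitting on "\n". The file name example.txt is arbitrary.

```python
import os
import tempfile

lines = ["first", "second", "third"]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")

    # Writing: in text mode each "\n" is translated to os.linesep,
    # so the code only ever deals with "\n".
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")

    # Reading: str.splitlines() drops the line terminators for you.
    with open(path) as f:
        read_back = f.read().splitlines()

    # Streaming alternative for large files: strip each line as it is read.
    with open(path) as f:
        streamed = [line.rstrip("\n") for line in f]

print(read_back == lines, streamed == lines)  # True True
```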

My biggest dislike is range(), because it doesn't do what you'd expect, e.g.:
>>> for i in range(1,10): print i,
1 2 3 4 5 6 7 8 9
A naive user coming from another language would expect 10 to be printed as well.
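The usual argument for this design is that range(start, stop) is a half-open interval, matching slicing. A small sketch (Python 3) of what that buys you:

```python
# range(start, stop) is half-open: start is included, stop is not.
one_to_nine = list(range(1, 10))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
one_to_ten = list(range(1, 11))   # add 1 to stop to include 10

# The length is simply stop - start ...
assert len(range(3, 10)) == 10 - 3

# ... and adjacent ranges join without overlaps or gaps.
joined = list(range(0, 5)) + list(range(5, 10))
print(joined == list(range(10)))  # True
```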

You asked for links; I wrote a document on that topic some time ago: http://segfaulthunter.github.com/articles/biggestsurprise/

I think there's a lot of weird stuff in Python in the way it handles builtins/constants. Like the following:
True = "hello"
False = "hello"
print True == False
That prints True... (this is Python 2; in Python 3, True and False are keywords, so assigning to them is a SyntaxError).
def sorted(x):
    print("Haha, pwned")
sorted([4, 3, 2, 1])
Lolwut? sorted is a built-in global function. The worst example in practice is list, which people tend to use as a convenient name for a local variable, ending up clobbering the global built-in.
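A minimal sketch (Python 3) of clobbering list and getting it back: the builtins module always holds the originals, and deleting the module-level name lets lookups fall back to it.

```python
import builtins

list = [3, 1, 2]  # shadows the built-in list() for the rest of this module

try:
    list("abc")  # now tries to call the list object [3, 1, 2]
except TypeError as exc:
    print("shadowed:", exc)  # 'list' object is not callable

# The original is still reachable through the builtins module...
letters = builtins.list("abc")

# ...and deleting the module-level name makes lookups fall back to it.
del list
restored = list("xyz")
print(letters, restored)  # ['a', 'b', 'c'] ['x', 'y', 'z']
```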

Related

Pass by object reference good practices

I come from C++, and I am struggling to get a sense of safety when programming in Python (for instance, a misspelled name can create extremely hard-to-find bugs, but that is not the point here).
Here I would like to understand how I can avoid doing horrible things by adhering to good practices.
The simple function below is perfectly fine in C++ but creates what I can only call a monstrosity in Python.
def fun(x):
    x += 1
    x = x + 1
    return x
When I call it
import numpy as np
var1 = 1
print(fun(var1), var1)
var2 = np.array([1])
print(fun(var2), var2)
I get
3 1
[3] [2]
Apart from the lack of homogeneous behaviour (which is already terrible), the second case is particularly hideous. The external variable is modified only by some of the instructions!
I know in detail why it happens, so that is not my question. The point is that when constructing a complex program, I do not want to have to be extra careful with all these context-dependent and highly implicit technicalities.
There must be some good practice I can strictly adhere to that will prevent me from inadvertently producing the code above. I can think of ways, but they seem to overcomplicate the code, making C++ look like a more high level language.
What good practice should I follow to avoid that monstrosity?
Thanks!
[EDIT] Some clarification: What I struggle with is the fact that Python makes a type-dependent and context-dependent choice of whether to create a temporary. Again, I know the rules. However, in C++ the choice is made by the programmer and is clear throughout the whole function, while that is not the case in Python. Python requires the programmer to know quite a few technicalities of the operations done on the argument in order to figure out whether, at that point, Python is working on a temporary or on the original.
Notice that I constructed a function which both returns a value and has a side effect just to show my point.
The point is that a programmer might want to write that function to simply have side effects (no return statement), and midway through the function Python decides to build a temporary, so some side effects are not applied.
On the other hand, the programmer might not want side effects, and instead get some (hard-to-predict ones).
In C++ the above is simply and clearly handled. In Python it is rather technical and requires knowing what triggers the generation of temporaries and what not. As I need to explain this to my students, I would like to give them a simple rule that will prevent them from falling into those traps.

Good practices to avoid such pitfalls:
Functions which modify inputs should not return anything (e.g. list.sort)
Functions which do not modify the input should return the modified value (e.g. sorted)
Your fun does both, which goes against the conventions followed by most standard library code and popular third-party Python libraries. Breaking this "unwritten rule" is the cause of the particularly hideous result there.
Generally speaking, it's best if functions are kept "pure" when possible. It's easier to reason about a pure and stateless function, and they're easier to test.
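A minimal sketch of that convention applied to the question's increment example, using plain lists so it does not depend on NumPy; incremented and increment_in_place are hypothetical names, with sorted and list.sort as the models.

```python
def incremented(values):
    """Query: return a new list and leave the argument untouched (like sorted)."""
    return [v + 1 for v in values]


def increment_in_place(values):
    """Command: mutate the argument and return None (like list.sort)."""
    values[:] = [v + 1 for v in values]


data = [1, 2, 3]
new_data = incremented(data)
print(data, new_data)  # [1, 2, 3] [2, 3, 4]

result = increment_in_place(data)
print(data, result)  # [2, 3, 4] None
```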
A "sense of safety" when programming in Python comes from having a good test suite. As an interpreted and dynamic programming language, almost everything in Python happens at runtime. There is very little to protect you at compile time - pretty much only the syntax errors will be found. This is great for flexibility, e.g. virtually anything can be monkeypatched at runtime. With great power comes great responsibility. It is not unusual for a Python project to have twice as much test code as there is library code.

The one good practice that jumps to mind is command-query separation:
A function or method should only ever either compute and return something, or do something, at least when it comes to outside-observable behavior.
There are very few acceptable exceptions (think, e.g., of the pop method of a Stack data structure: it returns something, and does something), but those tend to be in places where it's so idiomatic that you wouldn't expect it any other way.
And when a function does something to its input values, that should be that function's sole purpose. That way, there's no nasty surprises.
Now for the inconsistent behavior between a "primitive" type and a more complex type, it's easiest to code defensively and assume that it's a reference anyway.
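Coding defensively can be as simple as copying the argument before touching it. A sketch with a hypothetical with_total function, assuming a shallow copy is enough (use copy.deepcopy for nested structures); copy.copy also works for NumPy arrays, and for immutable values such as int it simply returns the same object, which is harmless.

```python
import copy


def with_total(items):
    # Defensive shallow copy: every change below happens to the copy,
    # so the caller's object is never modified, whatever its type.
    items = copy.copy(items)
    items += [sum(items)]  # in-place extend, but only of the copy
    return items


original = [1, 2, 3]
result = with_total(original)
print(original, result)  # [1, 2, 3] [1, 2, 3, 6]
```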

Is my understanding of how Python is written/implemented correct?

I want to understand how Python works at a base level, and this will hopefully help me understand a bit more about the inner workings of other compiled/interpreted languages. Unfortunately, the compilers class is a bit away for now. From what I read on this site and elsewhere, people answering "What base language is Python written in" seem to convey that there's a difference between talking about the "rules" of a language versus how the language rules are implemented for usage. So, is it correct to say that Python (and other high-level languages) are all essentially just sets of rules "written" in any natural language? And then the matter of how they're actually used (where used means compiled/interpreted to actually create things) can vary, with various languages being used to implement compilers? So in this case, CPython, IronPython, and Jython would be syntactically equal languages which all follow the same set of rules, just that those rules are implemented themselves in their respective languages.
Please let me know if my understanding of this is correct, if you have anything to add that might further solidify my understanding, or if I'm blatantly wrong.

Code written in Python should be able to run on any Python interpreter. Python is essentially a specification for a programming language with a reference implementation (CPython). Whenever the Python specifications and PEPs are ambiguous, the other interpreters usually choose to implement the same behavior, unless they have reason not to.
That being said, it's entirely possible that a program written in Python will behave differently on different implementations. This is because many programmers venture into "undefined behavior". For example, CPython has a "Global Interpreter Lock" that means only one thread is actually executing at a time (modulo some conditions), but other interpreters do not have that behavior. So, for example, atomicity guarantees differ between interpreters: in CPython each bytecode instruction is effectively atomic, while other interpreters need not behave that way.
You can consider it like C. C is a language specification, but there are many compilers implementing it: GCC, Clang (LLVM), Borland, MSVC++, ICC, etc. There are programming languages, and there are implementations of those programming languages.
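If you need to know which implementation is actually running your code (for example, before relying on CPython-specific behaviour such as the GIL), the standard library can tell you. A small sketch:

```python
import platform
import sys

# Which implementation of the Python language is running this code?
impl = platform.python_implementation()  # e.g. 'CPython', 'PyPy', 'Jython', 'IronPython'
impl_id = sys.implementation.name         # lower-case id, e.g. 'cpython' or 'pypy'
version = platform.python_version()       # version of the language being implemented

print(impl, impl_id, version)
```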

You are correct when you make the distinction between what a language means and how it does what it means.
What it means
The first step in compiling a language is to parse its code to generate an Abstract Syntax Tree. That is a tree that defines what the code you wrote means, what it is supposed to do. For example, if you have the following code
a = 1
if a:
    print('not zero')
It would generate a tree that looks more or less like this.
              code
         _______|_______
        |               |
   declaration          if
     ___|___         ___|___
     |     |         |     |
     a     1         a   print
                           |
                      'not zero'
This represents what the code means, but tells us nothing about how it executes it.
Edit: of course, the above is far from what Python's parser would actually generate; I made plenty of oversimplifications for the sake of readability. Luckily, if you are curious about what is actually generated, you can use the ast module, which exposes Python's parser.
import ast
code = """
a = 1
if a:
    print('not zero')
"""
my_ast = ast.parse(code)
print(ast.dump(my_ast, indent=4))  # indent= requires Python 3.9+
Enjoy inspecting my_ast.
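As a rough sketch of how the real tree differs from the hand-drawn one above (assuming Python 3), this lists the top-level statement nodes and walks the whole tree to find function calls:

```python
import ast

source = "a = 1\nif a:\n    print('not zero')\n"
tree = ast.parse(source)

# The module body holds one node per top-level statement.
statement_types = [type(node).__name__ for node in tree.body]
print(statement_types)  # ['Assign', 'If']

# ast.walk visits every node in the tree (in no specified order).
call_names = [
    node.func.id
    for node in ast.walk(tree)
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
]
print(call_names)  # ['print']
```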
What it does
Once you have an AST, you can convert it into whatever you want. It can be C, it can be machine code, you can even convert it back to Python if you wish. The most used implementation of Python is CPython, which is written in C.
What is going on under the hood is thus pretty close to your understanding. First, a language is a set of rules that defines a behaviour, and only then is there an implementation of that language that defines how it achieves it. And yes, of course, you can have different implementations of the same language with slight differences in behaviour.
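To see the "what it does" side in CPython specifically, you can compile an AST into a code object and disassemble its bytecode. This is a sketch; bytecode is a CPython implementation detail, and the exact instructions printed by dis vary between Python versions.

```python
import ast
import dis

source = "a = 1\nif a:\n    print('not zero')\n"

# compile() accepts an AST as well as source text and produces a code object,
# i.e. the bytecode that CPython's interpreter loop actually executes.
tree = ast.parse(source)
code_obj = compile(tree, "<example>", "exec")

dis.dis(code_obj)   # exact instructions differ between Python versions
exec(code_obj, {})  # prints: not zero
```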

Basically it's a bunch of dictionary data structures implementing functions, modules, etc. The global variables and their values live in a per-module dictionary. Variables within a class are another dictionary. Those within an object are yet another dictionary, and conceptually so are those within a function. Even a function call has its own namespace, so that different calls have different copies of the local variables (CPython actually stores function locals in an array rather than a dict, for speed).
Its scoping rules are simple (names are looked up in the local, enclosing, global and built-in namespaces), and, in my opinion, the language was designed to be implemented as simply as possible by one coder using dictionaries.
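A quick way to see some of these dictionaries in CPython: a module, a class and an instance (Point is a hypothetical example class).

```python
import math
import types

# A module's global namespace is an ordinary dict.
module_ns = math.__dict__
print(type(module_ns) is dict, module_ns["pi"] == math.pi)  # True True


class Point:
    dims = 2  # class attribute, stored in the class namespace

    def __init__(self, x, y):
        self.x = x  # instance attributes live in the instance's own dict
        self.y = y


p = Point(1, 2)
print(p.__dict__)              # {'x': 1, 'y': 2}
print(Point.__dict__["dims"])  # 2; the class dict is exposed as a read-only mappingproxy
```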

Is it a good idea to dynamically create variables?

I recently found out how to dynamically create variables in python through this method:
vars()['my_variable'] = 'Some Value'
Thus creating the variable my_variable.
My question is, is this a good idea? Or should I always declare the variables ahead of time?

I think it's preferable to use a dictionary if possible:
vars_dict = {}
vars_dict["my_variable"] = 'Some Value'
vars_dict["my_variable2"] = 'Some Value'
I think it's more pythonic.

This is a bad idea, since it gets much harder to analyze the code, both for a human being looking at the source, and for tools like pylint or pychecker. You'll be a lot more likely to introduce bugs if you use tricks like that. If you think you need that feature at some time, think really hard if you can solve your problem in a simpler and more conventional way. I've used Python for almost 20 years, and never felt a need to do that.
If you have more dynamic needs, just use a normal dictionary, or possibly something like json.
One of the great things about Python, with its dynamic nature and good standard collection types, is that you can avoid putting logic in text strings. The Python interpreter, syntax highlighting in your IDE, IntelliSense and code analysis tools all look at your source code, provide helpful suggestions and find bugs and weaknesses. This doesn't work if your data structure or logic has been hidden in text strings.
More stupid and rigid languages, such as C++ and Java, often make developers resort to string-based data structures such as XML or JSON, since they don't have convenient collections like Python lists or dicts. This means that you hide business logic from the compiler and other safety checks built into the language or tools, and have to do a lot of checks that your development tools would otherwise do for you. In Python you don't have to do that ... so don't!

There is no guarantee that vars()['my_variable'] = 'Some value' and my_variable = 'Some value' have the same effect. From the documentation:
Without an argument, vars() acts like locals(). Note, the locals
dictionary is only useful for reads since updates to the locals
dictionary are ignored.
This code is simply wrong.
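A minimal sketch of why this fails inside a function (the function and variable names are hypothetical): the compiler decides at compile time that dynamic_name is not a local, so the value written into vars() is never found. Keeping dynamic names in an explicit dict, as another answer suggests, behaves predictably.

```python
def try_dynamic_local():
    # Inside a function, vars() behaves like locals(): writing to the
    # returned dict does not create a real local variable.
    vars()["dynamic_name"] = "Some Value"
    try:
        return dynamic_name  # compiled as a global lookup; nothing defines it
    except NameError:
        return None


def with_dict():
    # The explicit alternative: keep dynamic names in your own dict.
    values = {}
    values["dynamic_name"] = "Some Value"
    return values["dynamic_name"]


print(try_dynamic_local())  # None
print(with_dict())          # Some Value
```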

Pros:
adds another level of indirection, makes the environment more dynamic
in particular, allows you to avoid more code duplication
Cons:
not applicable for function namespaces (due to optimization)
adds another level of indirection, makes the environment more dynamic
"lexical references" are much harder to track and maintain
if created names are arbitrary, conflicts are waiting to happen
it's hard to find the ins and outs in the code base and predict its behaviour
that's why these tricks may upset code checking tools like pylint
if variables are processed in a similar way, they probably belong together separately from others (in a dedicated dict) rather than reusing a namespace dict, making it a mess in the process
In brief, at the abstraction level Python's language and runtime features are designed for, it's only good in small, well-defined amounts.

I don't see what the advantage would be, and it would also make your code harder to understand.
So no, I don't think it is a good idea.

When coding in Python, how do I achieve guarantees of correctness similar to those I get with Haskell's type system?

Using Haskell's type system, I know that at some point in the program, a variable must contain, say, an Int or a list of strings. For code that compiles, the type checker offers certain guarantees that, for instance, I'm not trying to add an Int and a String.
Are there any tools to provide similar guarantees for Python code?
I know about and practice TDD.

The quick answer is "not really". While tools like PyLint (which is very good BTW) will give you a lot of help and good advice on what constitutes good Python style, that isn't exactly what you're looking for and it certainly isn't a real substitute for things like HM type inference.
There are some interesting research projects in this area, notably Gradual Typing by Jeremy Siek and colleagues and some really interesting ideas like the blame calculus of Wadler and Findler.
Practically speaking, I think the best you can achieve is by using some sensibly chosen runtime methods. Use isinstance() and the inspect module to test the type of an object (but remember to be true to Python's duck typing and so on). Use assert statements liberally. Or (possibly 'and') use something like Design by Contract using decorators. There are lots of ways to implement these idioms, but this is typically done on a per-project basis. If performance and resource usage are critical for you, think about whether and how such methods affect them. There have, however, been some efforts to standardise techniques like DbC for Python, but these haven't (yet) been pushed into the CPython trunk. Here's hoping though :)
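A minimal sketch of the decorator-plus-assert idea, assuming Python 3 annotations are used as the contract; check_types is a hypothetical name, not a standard library API. It only checks plain classes (no generics such as list[str]), does not handle *args/**kwargs, and, like all assert statements, its checks disappear under python -O.

```python
import functools
import inspect


def check_types(func):
    """Hypothetical decorator: check arguments and return value against annotations."""
    sig = inspect.signature(func)

    def _check(label, value, expected):
        # Only plain classes are checked; missing annotations are skipped.
        if expected is not inspect.Parameter.empty and isinstance(expected, type):
            assert isinstance(value, expected), (
                f"{label}={value!r} is not a {expected.__name__}"
            )

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            _check(name, value, sig.parameters[name].annotation)
        result = func(*args, **kwargs)
        _check("return", result, sig.return_annotation)
        return result

    return wrapper


@check_types
def add(a: int, b: int) -> int:
    return a + b


print(add(2, 3))  # 5
try:
    add(2, "3")
except AssertionError as exc:
    print("contract violated:", exc)
```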

Python is a dynamically and strongly typed programming language. What that means is that you can bind a name to a value without explicitly stating its type, but the object it refers to always has a definite type, and Python will not silently convert it to another type.
For example,
after x = 5, x refers to an int, so you cannot concatenate it with a string: x + "hello" raises a TypeError.
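A short sketch of both halves of that statement:

```python
x = 5
try:
    x + "hello"  # strong typing: int and str are not implicitly combined
except TypeError as exc:
    error = str(exc)

combined = str(x) + "hello"  # explicit conversion is required: '5hello'

# Dynamic typing: the name itself has no fixed type and can be rebound.
x = "now a string"
print(error, combined, type(x).__name__)
```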

What are the most frustrating Python hacks to unwind, rewrite, etc.? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 1 year ago.
My impression of Python from the short time I've been developing with it is that it's incredibly powerful and flexible, but I can't help but feel like "with great power comes great responsibility." So while I've read numerous blog posts about simple and elegant Python snippets that solve problems, I wonder if there are design patterns or abuses of Python language features that, once built into an application or library, cause the code to be incredibly brittle and nearly impossible to refactor.
So the question is basically what are the most frustrating, but somewhat common, Python "hacks" or language feature abuses that someone can introduce that will cause nightmares for future maintainers of that code?

Excessive usage of from module import *.
With a lot of such imports in a module, you don't know where each name came from and have to look through all the imported modules. Searching doesn't help much in this case.
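A small sketch of the problem using two standard-library modules that export overlapping names: after both star imports, which sqrt you get depends silently on import order.

```python
from math import *   # brings in sqrt, pi, ... from math
from cmath import *  # silently rebinds sqrt (and others) to the cmath versions

# Which sqrt is this? Only the import order tells you.
root = sqrt(4)
print(root, type(root).__name__)  # (2+0j) complex

# Explicit imports keep the origin of every name visible.
import math
import cmath

print(math.sqrt(4), cmath.sqrt(-1))  # 2.0 1j
```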

Magic that works but not always. For example, when metaclasses are abused to create a DSL. Such DSL could be suitable for most tasks but breaks horribly on a complex (unexpected by author) one.

Using eval or exec on user input may be the most common abuse of Python features.
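If the input only needs to be a Python literal, ast.literal_eval is a safer standard-library alternative to eval. A sketch, with user_input standing in for untrusted text:

```python
import ast

user_input = "[1, 2, {'a': 3}]"

# eval(user_input) would run arbitrary code, e.g. "__import__('os').system(...)".
# ast.literal_eval only accepts Python literals (strings, numbers, tuples,
# lists, dicts, sets, booleans, None) and rejects everything else.
value = ast.literal_eval(user_input)
print(value)  # [1, 2, {'a': 3}]

try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError as exc:
    print("rejected:", exc)
```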

It's not a hack, but there's been a somewhat large issue with Python 2.X's print keyword.
People would rely on print to be called for output throughout an entire project, and then when it finally came time to, say, change output to a file and to stdout, they'd have to go in and refactor all those print keywords to another custom output function.
Python 3 solved this by making print an actual function rather than a keyword (therefore automatically making output loosely coupled to the rest of the system), so if need be you can replace the original print with a new print that does more than just write to stdout.
See PEP 3105 for the specific reasoning from Guido and more details.
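Since Python 3, the simplest fix is often the file= argument of print(). When every call must be redirected, a module-level replacement is possible; here is a sketch with a hypothetical log_buffer that tees output to stdout and an in-memory buffer:

```python
import builtins
import io

log_buffer = io.StringIO()


def print(*args, **kwargs):
    """Module-level replacement for print() that also copies output to log_buffer.

    Hypothetical example: every print() call in this module now goes through here.
    """
    builtins.print(*args, **kwargs)  # the original behaviour
    kwargs.pop("file", None)  # always send the copy to log_buffer
    builtins.print(*args, file=log_buffer, **kwargs)


print("hello", "world")  # shown on stdout and recorded in log_buffer
```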

..what are the most frustrating, but somewhat common, Python "hacks" or language feature abuses that someone can introduce that will cause nightmares for future maintainers of that code?
Hard to refactor:
nested list comprehensions (as in: multiple levels deep).
Most people (when learning Python) are fascinated by the power and utility of list comprehensions. This can cause a tendency to over-use them and build deeply nested, complicated ones. Most of the time the same code should have been written with simple loops for readability and maintainability. I consider three levels already too deeply nested.
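A sketch of the trade-off with an arbitrary example matrix: both versions compute the same list, but the loop version is easier to extend with logging or error handling later.

```python
matrix = [[1, -2, 3], [4, 5, -6], [-7, 8, 9]]

# Dense version: nested loops and a filter squeezed into one expression.
positives_squared = [x * x for row in matrix for x in row if x > 0]

# Same logic as plain loops: easier to read, debug and extend.
result = []
for row in matrix:
    for x in row:
        if x > 0:
            result.append(x * x)

print(positives_squared == result, result)  # True [1, 9, 16, 25, 64, 81]
```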
--
And also (not so hard to refactor but mostly irritating):
trying to use Python as if it were another language (without its own specific constructs); e.g.:
for i in range(len(mylist)):
    item = mylist[i]
    print(item)  # do stuff with item
instead of
for i, item in enumerate(mylist):
    print(item)  # do stuff with item
or even (why do you need the index anyway):
for item in mylist:
    print(item)  # do stuff with item
This includes: reinventing the wheel (badly) when functionality is already (aptly named) in the rich standard library.
And type-checking, making stuff impossible to subclass, etc...

The single biggest issue I've come across is use of double-leading-underscore attributes. The perpetrators are practically always new Python programmers or programmers who prefer another language (in particular Java, for some reason.) Double leading underscores causes the attributes to be name-mangled (using the current class name), avoiding collisions in subclasses. It's too frequently seen as 'private', even though it isn't. (See this answer I once wrote.) The same classes are usually littered with accessors -- not properties, but regular methods called directly -- to get at these name-mangled attributes. The end result is always a horribly convoluted class that's impossible to subclass to specialize or bugfix or monkeypatch or test.
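A minimal sketch (hypothetical Base and Child classes) of what the mangling actually does:

```python
class Base:
    def __init__(self):
        self.__secret = "base"  # stored as _Base__secret (name mangling)

    def reveal(self):
        return self.__secret  # compiled as self._Base__secret


class Child(Base):
    def __init__(self):
        super().__init__()
        self.__secret = "child"  # a *different* attribute: _Child__secret


c = Child()
print(c.reveal())       # base: Base's method never sees the subclass value
print(sorted(vars(c)))  # ['_Base__secret', '_Child__secret']
print(c._Base__secret)  # still reachable from outside, so not truly private
```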
