Python understanding list comprehension from Java background [duplicate] - python

This question already has answers here:
Generator expressions vs. list comprehensions
(13 answers)
Closed 7 years ago.
I come from a Java background and have just started working with Python. Most things are fairly easy to pick up, but I am having a hard time understanding one thing in the language, which I just found out is called list comprehension. What is this list comprehension in Python? How does this compare with language constructs found in Java? The problem is it's everywhere, nearly all the examples I found here and there use it.
For the following example, please help me understand how this works.
[x**2 for x in range(10)]
And then there is this.
[j + k for j in 'abc' for k in 'def']
Beyond that, I have also seen things like this somewhere on Stack Overflow.
(x for x in (0,1,2,3,4))
Also things like this.
total = sum(x+y for x in (0,1,2,3) for y in (0,1,2,3) if x < y)
This is starting to get messy; could you please help me?

What is this list comprehension in Python?
First let’s start with the basic definition taken from the official Python documentation.
A list comprehension consists of brackets containing an expression followed by a for clause, then zero or more for or if clauses. The result will be a new list resulting from evaluating the expression in the context of the for and if clauses which follow it.
The problem is it's everywhere, nearly all the examples I found here and there use it.
List comprehension is a very flexible concept. It lets us define lists the way we define sets in mathematics. Say we have a set S in which each element is the square of a number x, and x can only take integer values from 0 to 9.
Look at that definition: it took a whole sentence to describe. But there is a more compact way to write it.
S = {x² : x in {0 ... 9}}
That’s why I love math, it is always to the point. Now remember your first example?
S = [x**2 for x in range(10)]
That’s the set we just defined, written as a Python list. Neat, right? That’s why it is used so much. (Don’t get confused by the x**2 syntax: ** is Python's exponentiation operator. You might expect x^2 from mathematical notation, but in Python, as in Java, ^ is bitwise XOR.)
In Python you can iterate over pretty much anything, which means you can easily iterate over each character of a string. Let’s look at the second example. It iterates over the characters of the strings 'abc' and 'def' and builds a list containing every pairwise concatenation, from 'ad' to 'cf'.
lst = [j + k for j in 'abc' for k in 'def']
Notice that we assigned this to a list named lst. It was no coincidence. Python is the most humane programming language I have ever laid eyes on. So she will help you when you get stuck. Like this.
help(lst)
You can now see what you can do with lst. Ever confused about what lst is? You can check its type with type.
print(type(lst))
Before we move forward let’s talk a little bit about iterators and generators.
Iterators are objects that you can call next() on. Like this.
iterator = iter([1, 2, 3, 4])
Now we can print the first element of our iterator, like this.
print(next(iterator))
Now we can talk about generators. They are functions that produce iterators. There is, however, one other concept called generator expressions.
(x for x in (0,1,2,3,4))
This is a generator expression. A generator expression is a shortcut for building a generator from an expression, using a syntax similar to that of list comprehensions.
total = sum(x+y for x in (0,1,2,3) for y in (0,1,2,3) if x < y)
The line above first creates a generator using a generator expression; sum then iterates over the elements it yields and adds them up.
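Coming from Java, it may help to see these expressions written out as ordinary loops. The following is only a sketch of roughly equivalent code, not what the interpreter literally executes; the names pairs_loop and total_loop are made up for the comparison.

```python
# Nested comprehension: the for clauses nest from left to right.
pairs = [j + k for j in 'abc' for k in 'def']

pairs_loop = []
for j in 'abc':          # first for clause becomes the outer loop
    for k in 'def':      # second for clause becomes the inner loop
        pairs_loop.append(j + k)

# Generator expression passed to sum(): values are produced one at a time,
# so no intermediate list is built.
total = sum(x + y for x in (0, 1, 2, 3) for y in (0, 1, 2, 3) if x < y)

total_loop = 0
for x in (0, 1, 2, 3):
    for y in (0, 1, 2, 3):
        if x < y:        # the trailing if clause filters the pairs
            total_loop += x + y

print(pairs == pairs_loop)   # True
print(total == total_loop)   # True
```

The closest Java analogue is probably a stream pipeline with map and filter, although the two are not identical.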

Related

How a for in loop in python ends when there is no update statement in it? [duplicate]

This question already has an answer here:
How does the Python for loop actually work?
(1 answer)
Closed 3 months ago.
For example:
#1
val = 5
for i in range(val):
    print(i)
When the range is exhausted, i.e. the last value has been reached, how does Python know that the for...in loop should end? In other languages you would write:
#2
for(i=0;i<=5;i++){
    print(i)
}
In this example, when the value of i becomes larger than 5, the false condition terminates the loop.
I tried reading the Python docs and searching on Google but found no satisfying answer, so I am unable to get a picture of this.
So this is actually a complicated question, but the very rough version of the answer is "the compiler/interpreter can do what it wants".
It isn't actually running the human-readable text you write at all - instead it goes through a whole pipeline of transformations. At minimum, a lexer converts the text to a sequence of symbols, and then a parser turns that into a tree of language constructs; that may then be compiled into machine code or interpreted by a virtual machine.
So, the Python interpreter creates a structure that handles the underlying logic. Depending on the optimizations performed (those are really a black box, it's hard to say what they do), this may produce structures logically equivalent to what a Java-like for loop would make, or it could actually create an object representing the sequence of numbers (that's what the range() function does on its own) and then iterate over it.
Editing to give some more foundation for what this construct even means:
Python iteration-style loops are different in how they're defined from C-style i++ sorts of loops. Python loops are intended to iterate on each element of a list or other sequence data structure - you can say, for instance, for name in listOfNames, and then use name in the following block.
When you say for i in range(x), this is the pythonic way of doing something like the C-style loop. Think of it as the reverse of
for(int i = 0; i < arr.length(); i++){
    foo(arr[i]);
}
In that code block you're accessing each element of an indexable sequence arr by going through each valid index. You don't actually care about i - it's just a means to an end, a way to make sure you visit each element.
Python assumes that's what you're trying to do: the python variant is
for elem in arr:
    foo(elem)
Which most people would agree is simpler, clearer and more elegant.
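As for how such a loop knows when to stop: Python's for statement relies on the iterator protocol. It asks the object for an iterator with iter(), repeatedly calls next() on it, and ends the loop when next() raises StopIteration. The snippet below is a rough hand-written approximation of that behaviour, not the interpreter's actual implementation; the variable names are only illustrative.

```python
val = 5

# What you normally write.
seen_by_for = []
for i in range(val):
    seen_by_for.append(i)

# A hand-written approximation of the same loop.
seen_by_while = []
iterator = iter(range(val))      # ask the iterable for an iterator
while True:
    try:
        i = next(iterator)       # request the next value
    except StopIteration:        # raised when no values are left
        break                    # this is how the loop "knows" to end
    seen_by_while.append(i)

print(seen_by_for)    # [0, 1, 2, 3, 4]
print(seen_by_while)  # [0, 1, 2, 3, 4]
```

So there is no update statement or termination condition in the loop header because both live inside the iterator.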
However, there are times when you actually do want to explicitly go number by number. To do that in the Python style, you create a sequence of all the numbers you'll want to visit - that's what the range function does. You'll mostly see it as part of a loop statement, but it can exist independently - you can say x = range(10), and x will hold the numbers 0-9 inclusive (as a list in Python 2, or as a lazy range object in Python 3 that you can turn into a list with list(x)).
So, where before you were incrementing a number to visit each item of a list, now you're taking a sequence of numbers to get incrementing values.
"How it does this" is still the explanation I gave above - the parser and interpreter know how to create the nitty-gritty logic that actually creates this sequence and steps through it, or possibly transforms it into some logically equivalent steps.

Condensing an if-statement

I have the following lists:
languages = ["java", "haskell", "Go", "Python"]
animals = ["pigeon", "python", "shark"]
names = ["Johan", "Frank", "Sarah"]
I want to find out whether or not "Python" exists in all three of these lists. The following if-statement is what I came up with, using just the "in" and "and" operators.
if ("Python" in languages and "Python" in animals and "Python" in names):
Is there a way to condense this statement into something shorter? I.e.
if("Python" in languages and in animals and in names)
You can avoid repeating "Python":
if all("Python" in L for L in [languages, animals, names]):
But this is not much shorter.
If this is a test you're expecting to do repeatedly, it would be more efficient to pre-calculate the intersection of your lists:
lanimes = set(languages) & set(animals) & set(names)
if "Python" in lanimes:
(The in operator is O(n) for a list, O(1) for a set.)
Consider:
if all("Python" in x for x in (languages, animals, names)):
I don't think Python has any syntax sugar specifically like that, but depending on how many lists you have, you could do something like
if all("Python" in x for x in [languages, animals, names]):
On its own, it's probably a bit more verbose than your ands, but if you have a large number of lists, or you already have a list of lists, then it should save some space, and IMHO it is more immediately clear what the goal of the if statement is.
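If you are using Python 3.5 or later, you can unpack all three lists into a single tuple and test membership once:
if 'Python' in (*languages, *animals, *names):
Note that this is true when 'Python' appears in at least one of the lists, not necessarily in all three, so it is not equivalent to the chain of ands.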
Short answer: No, the language syntax does not allow this.
If you really want to cut down on duplicating 'Python', you could use something like this in your if condition:
all('Python' in p for p in (languages, animals, names))
I also suggest that maybe you could reevaluate the design to make your code more flexible. Comprehensions and generator expressions are a good start.
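To make the difference between the suggestions concrete, here is a small sketch using the lists from the question. The variable names in_all, in_any, in_combined and common are made up for this illustration.

```python
languages = ["java", "haskell", "Go", "Python"]
animals = ["pigeon", "python", "shark"]
names = ["Johan", "Frank", "Sarah"]
lists = (languages, animals, names)

in_all = all("Python" in lst for lst in lists)   # present in every list?
in_any = any("Python" in lst for lst in lists)   # present in at least one?

# Membership in the unpacked tuple behaves like any(), not all().
in_combined = "Python" in (*languages, *animals, *names)

# Precomputed intersection: elements common to all three lists.
common = set(languages) & set(animals) & set(names)

print(in_all, in_any, in_combined, "Python" in common)
# False True True False
```

With this data only languages contains "Python" (animals has the lowercase "python"), so the all() test and the set intersection agree with the original chain of ands, while the unpacking version does not.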

Short-circuiting list comprehensions [duplicate]

This question already has answers here:
Using while in list comprehension or generator expressions
(2 answers)
Closed 7 years ago.
On several occasions I've wanted python syntax for short-circuiting list comprehensions or generator expressions.
Here is a simple list comprehension, and the equivalent for loop in python:
my_list = [1, 2, 3, 'potato', 4, 5]
[x for x in my_list if x != 'potato']
result = []
for element in my_list:
    if element != 'potato':
        result.append(element)
There isn't support in the language for a comprehension which short-circuits. Proposed syntax, and equivalent for loop in python:
[x for x in my_list while x != 'potato']
# --> [1, 2, 3]
result = []
for element in my_list:
    if element != 'potato':
        result.append(element)
    else:
        break
It should work with arbitrary iterables, including infinite sequences, and be extensible to generator expression syntax. I am aware of list(itertools.takewhile(lambda x: x != 'potato', my_list)) as an option, but:
it's not particularly pythonic - not as readable as a while comprehension
it probably can't be as efficient or fast as a CPython comprehension
it requires an additional step to transform the output, whereas that can be put into a comprehension directly, e.g. [x.lower() for x in mylist]
even the original author doesn't seem to like it much.
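For reference, the takewhile approach mentioned above can be sketched as follows, including the extra transformation step and an infinite input. This is only an illustration; the names prefix, doubled and small_squares are invented for it.

```python
from itertools import count, takewhile

my_list = [1, 2, 3, 'potato', 4, 5]

# Stop at the first element that fails the predicate.
prefix = list(takewhile(lambda x: x != 'potato', my_list))
print(prefix)  # [1, 2, 3]

# The transformation step wraps takewhile inside a comprehension.
doubled = [x * 2 for x in takewhile(lambda x: x != 'potato', my_list)]
print(doubled)  # [2, 4, 6]

# takewhile is lazy, so it also terminates on an infinite iterable.
small_squares = list(takewhile(lambda s: s < 50, (n * n for n in count())))
print(small_squares)  # [0, 1, 4, 9, 16, 25, 36, 49]
```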
My question is, is there any theoretical wrinkle that makes it a bad idea to extend the grammar to this use case, or is it just not available because the Python developers think it would rarely be useful? It seems like a simple addition to the language, and a useful feature, but I'm probably overlooking some hidden subtleties or complications.
Related: this and this
Turns out, as @Paul McGuire noted, that it had been proposed in PEP 3142 and was rejected by Guido:
I didn't know there was a PEP for that. I hereby reject it. No point
wasting more time on it.
He doesn't give explanations, though. On the mailing list, some of the points against it are:
"[Comprehensions are] a carefully designed 1 to 1 transformation between multiple nested statements and a single expression. But this proposal ignores and breaks that." That is, the while keyword does not correspond to a while in the explicit loop - it is only a break there.
"It makes Python harder to learn because it adds one more thing to learn." (cited here)
It adds another difference between generator expressions and list-comprehension. Seems like adding this syntax to list comprehension too is absolutely out of the question.
I think one basic difference from the usual list comprehension is that while is inherently imperative, not declarative. It depends on and dictates an order of execution, which is not guaranteed by the language (AFAIK). I guess this is the reason it is not included in Haskell's comprehensions, from which Python stole the idea.
Of course, generator expressions do have direction, but their elements may be precomputed - again, AFAIK. The PEP mentioned did propose it only for generator expressions - which makes some sense.
Of course, Python is an imperative language anyway, but it will raise problems.
What about choosing out of a non-ordered collection?
[x for x in {1,2,3} while x!=2]
You don't have a defined order in plain for loops either, but that's something the language can't enforce. takewhile answers this question, but it is an arbitrary answer.
One last point: note that for consistency you would want support for dropwhile as well. Something like
[x for x in my_list from x != 'potato']
Which is even less related to the equivalent for construct, and this time it cannot possibly short-circuit if my_list is just an iterable.
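For comparison, this is roughly what the existing itertools.dropwhile does with the same data. It is a minimal sketch of the library function, not of the hypothetical from syntax.

```python
from itertools import dropwhile

my_list = [1, 2, 3, 'potato', 4, 5]

# Skip elements while the predicate holds, then yield everything that remains.
tail = list(dropwhile(lambda x: x != 'potato', my_list))
print(tail)  # ['potato', 4, 5]
```

Unlike takewhile, it has to keep consuming the input after the predicate first fails, which is why no short-circuit is possible.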

For...in questions (Python)

I was trying some different ways to run some for...in loops. Consider a list of lists:
list_of_lists = []
list = [1, 2, 3, 4, 5]
for i in range(len(list)):
    list_of_lists.append(list) # each entry in l_o_l will now be list
Now let's say I want to have the first "column" of l_o_l be included in a separate list, a.
There are several ways I can go about this. For example:
a = [list[0] for list in list_of_lists] # this works (a = [1, 1, 1, 1, 1])
OR
a = []
for list in list_of_lists:
    a.append(list[0]) # this also works
For the second example, however, I would imagine the "full" expansion to be equivalent to
a=[]
a.append(list[0] for list in list_of_lists) #but this, obviously, produces a generator error
The working "translation" is, in fact,
a=[]
a.append([list[0] for list in list_of_lists]) #this works
My question is on interpretation and punctuation, then. How come Python "knows" to append/does append the list brackets around the "list[0] for list in list_of_lists" expansion (and thus requires it in any rewrite)?
The issue here is that list comprehensions and generator expressions are not just loops, they are more than that.
List comprehensions are designed to be an easy way to build up a list from an iterable, as you have shown.
Neither of your last two examples works - in both cases you are appending the wrong thing to the list. The first appends a generator; the second appends a list inside your existing list. Neither of these is what you want.
You are trying to do something in two different ways at the same time, and it doesn't work. Just use the list comprehension - it does what you want to do in the most efficient and readable way.
Your main problem is you seem to have taken list comprehensions and generator expressions and not understood what they are and what they are trying to do. I suggest you try to understand them further before using them.
My question is on interpretation and punctuation, then. How come
Python "knows" to append/does append the list brackets around the
"list[0] for list in list_of_lists" expansion (and thus requires it in
any rewrite)?
Not sure what that is supposed to mean. I think you might be unaware that in addition to list comprehensions [i*2 for i in range(0,3)] there are also generator expressions (i*2 for i in range(0,3)).
Generator expressions are a syntax for creating generators that perform a mapping, just like a list comprehension (but as a generator). Any list comprehension [c] can be rewritten list(c). The reason why there is a naked c inside list() is that where a generator expression appears as the sole argument to a call, it is permitted to drop its own parentheses. This is what you are seeing in a.append(list[0] for list in list_of_lists).
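The variants discussed here can be compared side by side. This is a small sketch; it uses the names row and r instead of list so that the built-in list type is not shadowed, and it adds list.extend as one more option that the question did not mention.

```python
row = [1, 2, 3, 4, 5]
list_of_lists = [row for _ in range(len(row))]

# 1. Plain comprehension: builds the first column directly.
a = [r[0] for r in list_of_lists]

# 2. append() with a bare generator expression: appends ONE generator object.
b = []
b.append(r[0] for r in list_of_lists)

# 3. append() with a list comprehension: appends ONE nested list.
c = []
c.append([r[0] for r in list_of_lists])

# 4. extend() consumes the generator and adds each item individually.
d = []
d.extend(r[0] for r in list_of_lists)

print(a)       # [1, 1, 1, 1, 1]
print(len(b))  # 1
print(c)       # [[1, 1, 1, 1, 1]]
print(d)       # [1, 1, 1, 1, 1]
```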

Why is python list comprehension sometimes frowned upon?

Many developers I have met suggest it's best practice to go with simple loops and if conditions instead of one line list comprehension statements.
I have always found them very powerful as I can fit a lot of code in a single line and it saves a lot of variables from being created. Why is it still considered a bad practice?
(Is it slow?)
List comprehensions are used for creating lists, for example:
squares = [item ** 2 for item in some_list]
For loops are better for doing something with the elements of a list (or other objects):
for item in some_list:
    print(item)
Using a comprehension for its side effects, or a for-loop for creating a list, is generally frowned upon.
Some of the other answers here advocate turning a comprehension into a loop once it becomes too long. I don't think that's good style: the append calls required for creating a list are still ugly. Instead, refactor into a function:
def polynomial(x):
    return x ** 4 + 7 * x ** 3 - 2 * x ** 2 + 3 * x - 4
result = [polynomial(x) for x in some_list]
Only if you're concerned about speed – and you've done your profiling! – should you keep the long, unreadable list comprehension.
It's not considered bad.
However, list comprehensions that either
Have side effects, or
Are more readable as a multi-line for loop
are generally frowned upon.
In other words, readability is important.
As an overly contrived example of the first case:
x = list(range(10))
[x.append(5) for _ in range(5)]
Basically, if you're not storing the result, and/or it modifies some other object, then a list comprehension is probably a bad idea.
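To see why, here is a short sketch of what the side-effect comprehension actually produces, next to two plainer ways of getting the same result. The names discarded, y and z are only for illustration.

```python
x = list(range(10))

# Comprehension used only for its side effect: it works, but it also
# builds a throwaway list of None values (the return value of append).
discarded = [x.append(5) for _ in range(5)]
print(discarded)  # [None, None, None, None, None]

# The same effect, stated directly with a loop.
y = list(range(10))
for _ in range(5):
    y.append(5)

# Or without any explicit loop at all.
z = list(range(10))
z.extend([5] * 5)

print(x == y == z)  # True
```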
As an example of the second, have a look at pretty much any code golf entry written in python. Once your list comprehension starts to become something that isn't readable at a glance, consider using a for loop.
There is no performance hit (even if there were, I would not worry about it; if performance were so important, you've got the wrong language). Some people frown upon list comprehensions because they find them a bit too terse; that's a matter of personal preference. Generally, I find simple list comprehensions to be much more readable than their loop/condition equivalents, but when they start to get long (e.g. you start to go past 80 characters), they might be better replaced with a loop.
