Can someone please explain the meaning of the syntax in the following line of code:
temp3 = [x for x in temp1 if x not in s]
I understand it's for finding the differences between 2 lists, but what does the 'x' represent here? Each individual element in the list that is being compared? I understand that temp1 and s are lists. Also, does x for x have to have the same variable or could it be x for y?
[x for x in temp1 if x not in s]
It may help to re-order it slightly, so you can read the whole thing left to right. Let's move the first x to the end.
[for x in temp1 if x not in s yield x]
I've added a fake yield keyword so it reads naturally as English. If we then add some colons it becomes even clearer.
[for x in temp1: if x not in s: yield x]
Really, this is the order that things get evaluated in. The x variable comes from the for loop, which is why you can refer to it in the if and yield clauses. But the way list comprehensions are written is to put the value being yielded at the front. So you end up using a variable name before the point where it is defined.
In fact, this final rewrite is exactly how you'd write an explicit generator function.
def func(temp1, s):
    for x in temp1:
        if x not in s:
            yield x
If you call func(temp1, s) you get a generator equivalent to the list. You could turn it into that list with list(func(temp1, s)).
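As a quick check of that equivalence, here is a minimal sketch. The sample values for temp1 and s are made up for illustration; they are not from the question.

```python
def func(temp1, s):
    # Yield every element of temp1 that does not appear in s.
    for x in temp1:
        if x not in s:
            yield x

# Sample data, chosen only for illustration.
temp1 = ["one", "two", "three", "four"]
s = ["one", "three"]

from_comprehension = [x for x in temp1 if x not in s]
from_generator = list(func(temp1, s))

print(from_comprehension)  # ['two', 'four']
print(from_generator)      # ['two', 'four']
```

Both forms walk temp1 once and keep the same elements in the same order.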
It iterates through each element in temp1 and checks to see if it is not in s before including it in temp3.
It is a shorter and more pythonic way of writing
temp3 = []
for item in temp1:
    if item not in s:
        temp3.append(item)
Where temp1 and s are the two lists you are comparing.
As for your second question, x for y will work, but probably not in the way you intend to, and certainly not in a very useful way. It will assign each item in temp1 to the variable name y, and then search for x in the scope outside of the list comprehension. Assuming x is defined previously (otherwise you will get NameError or something similar), the condition if x not in s will evaluate to the same thing for every item in temp1, which is why it’s not terribly useful. And if that condition is true, your resulting temp3 will be populated with xs; the y values are unused.
Do not take this as saying that using different variables in a list comprehension is never useful. In fact list comprehensions like [a if condition(x) else b for x in original_sequence] are often very useful. A list comprehension like [a for x in original_sequence if condition(x)] can also be useful for constructing a list containing exactly as many instances of a as the number of items in original_sequence that satisfy condition().
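To make these cases concrete, here is a small sketch. The names a, b, condition and original_sequence are the placeholders from the paragraphs above; the values assigned to them are arbitrary assumptions.

```python
original_sequence = [1, 2, 3, 4, 5]
a, b = "even", "odd"

def condition(x):
    # Hypothetical predicate: true for even numbers.
    return x % 2 == 0

# One output item per input item: a where the condition holds, b otherwise.
labels = [a if condition(x) else b for x in original_sequence]
print(labels)  # ['odd', 'even', 'odd', 'even', 'odd']

# One copy of a for each input item that satisfies the condition.
copies = [a for x in original_sequence if condition(x)]
print(copies)  # ['even', 'even']

# "x for y": x is looked up outside the comprehension and y is never used.
x = 99
s = [1, 2]
odd_one = [x for y in original_sequence if x not in s]
print(odd_one)  # [99, 99, 99, 99, 99]
```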
Try yourself:
arr = [1,2,3]
[x+5 for x in arr]
This should give you [6, 7, 8], which are the values in the list [1, 2, 3] plus 5. This syntax is known as a list comprehension (a form of mapping). It applies the same expression to every element of a list. It is the same as doing this:
result = []
for x in arr:
    result.append(x + 5)
The x is the same variable in both places; it is not y. It works the same as the code below:
newList = []
for x in temp1:
    if x not in s:
        newList.append(x)
So in x for x, the first x is the one inside append in the code above, and the x after for is the same as in for x in temp1.
I came across a solution on Stack Overflow to generate prime numbers using a list comprehension, but I was unable to understand what the inner for loop does.
I have tried something like
[x for x in range(5,20) for y in range(2,int(x/2)+1) if any(x%y == 0)]
It throws an error: 'bool' object is not iterable
I know that my syntax is wrong, but logically, for prime numbers we have a for loop followed by another for loop and then an if condition to calculate the remainder (x%y).
But the answer on Stack Overflow is
[x for x in range(2, 20) if all(x % y != 0 for y in range(2, x))]
I understand why all is used, but I am unable to see how the condition inside all() works. Ideally the for should come first and the if should follow it, so that range(2,x) is iterated and y gets values which are in turn used for computing x%y. How can y be used even before it has been assigned a value?
That is just the wonderful thing about list comprehensions: if they worked exactly like a for loop, nobody would have created them, because the for loop is more readable and understandable.
Note that the result of a list comprehension is always a list, whereas a for loop produces single values one at a time, each of which is one element of the iterable:
[x +1 for x in range(1,5)]
[2, 3, 4, 5]
for x in range(1,5): print(x+1)
2
3
4
5
You can think of it this way: the list comprehension already has the sequence of values and simply feeds them, in order, to the output expression, one value at a time. Like this:
[1+1 , 2+1 , 3+1 , 4+1]
Your code is wrong because it borrows too much from the ordinary for loop. Written as a for loop, your code would look like this:
for x in range(5,20):
    for y in range(2,int(x/2)+1):
        if any(x%y == 0):
            print(x)
And the result would obviously be:
TypeError: 'bool' object is not iterable
because any requires an iterable, such as a generator expression or a list, as mentioned above by @meowgoesthedog. A list is exactly what a list comprehension produces, but you need to understand how the comprehension is built in order to use it well. This trips me up sometimes too. In your case, the for y in range(2,int(x/2)+1) works as a normal for loop.
This is the syntax of list comprehension.
Inside the if condition, which is the optional predicate, we can create another comprehension by following the same rules, with x%y==0 as the output expression and the variable y representing the members of the input sequence range(2,int(x/2)+1).
all() and any() work on iterable objects. For example, all([True, True, False, True]) returns False. You can't use any(True) (as in your example: any(x%y == 0)).
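A short demonstration of that difference, as a sketch (the values x = 9 and y = 3 are arbitrary):

```python
print(all([True, True, False, True]))  # False
print(any([True, True, False, True]))  # True

x, y = 9, 3
error_message = None
try:
    any(x % y == 0)  # a single bool, not an iterable
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Wrapping the test in a generator expression gives any() something to iterate.
has_divisor = any(x % d == 0 for d in range(2, x))
print(has_divisor)  # True, because 9 is divisible by 3
```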
This statement [x for x in range(2, 20) if all(x % y != 0 for y in range(2, x))] can be translated to this code:
res = []
for x in range(2, 20):
    temporary_list = (x%y != 0 for y in range(2,x))
    if all(temporary_list):
        res.append(x)
P.S. I saw in the comments that you are not sure how y is declared. Python has more useful constructs than list comprehensions. One of them is the generator expression, which I believe is what is used in this case.
The functions all and any work on iterable objects (lists, sets, etc.). Therefore you get an error when you apply one of them to a boolean such as x%y==0.
You can use any in the following manner -
[x for x in range(5,20) if not any([x % y == 0 for y in range(2, int(x/2)+1)])]
or -
[x for x in range(2, 20) if not any(x % y == 0 for y in range(2, int(x/2)+1))]
As any and all complement each other.
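To see that the two forms really are complementary, here is a small sketch comparing them with an explicit nested loop. The variable names are mine; the range matches the answer quoted in the question.

```python
# Primes below 20 with all(): no y in the range divides x.
primes_all = [x for x in range(2, 20) if all(x % y != 0 for y in range(2, x))]

# Same result with any(): reject x as soon as some y divides it.
primes_any = [x for x in range(2, 20) if not any(x % y == 0 for y in range(2, x))]

# Explicit loop version of the any() form.
primes_loop = []
for x in range(2, 20):
    has_divisor = False
    for y in range(2, x):
        if x % y == 0:
            has_divisor = True
            break
    if not has_divisor:
        primes_loop.append(x)

print(primes_all)  # [2, 3, 5, 7, 11, 13, 17, 19]
```

In the comprehension forms, y is assigned by the for inside the parentheses, which is why it can appear to its left.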
I'm trying to convert this working nested for loop into a single-line list comprehension and I cannot seem to get it to work. The pseudo-code is as follows:
result = []
for x in something:
    for y in x.address:
        m = re.search("some pattern to match",y)
        if m:
            result += [m.group(1)]
Any pointers on how I should go about this?
You'll need a generator expression..
matches = ( re.search(r'some pattern to match', y) for x in something
            for y in x.address )
result = [ m.group(1) for m in matches if m ]
Nested loops are not really a problem for list comprehensions, as you can nest those there too:
lst = []
for y in z:
    for x in y:
        lst.append(f(x))
This translates into the following list comprehension:
[f(x) for y in z for x in y]
And you can easily continue that for multiple levels.
Conditions that decide on whether you want to add something to the list or not also work just fine:
lst = []
for x in y:
    if t(x):
        lst.append(f(x))
This translates into the following list comprehension with a filter:
[f(x) for x in y if t(x)]
Of course you can also combine that with multiple levels.
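For instance, here is a sketch combining two levels with a filter. The functions f and t and the data in z are hypothetical stand-ins for the placeholders used above.

```python
z = [[1, 2, 3], [4, 5], [6]]

def f(x):
    # Hypothetical transformation.
    return x * 10

def t(x):
    # Hypothetical filter: keep even numbers.
    return x % 2 == 0

# Explicit nested loops with a condition.
lst = []
for y in z:
    for x in y:
        if t(x):
            lst.append(f(x))

# Same thing as one comprehension: the clauses keep the loop order.
combined = [f(x) for y in z for x in y if t(x)]
print(combined)  # [20, 40, 60]
```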
What does become a problem, though, is when you want to execute something first, then filter on its result, and also append something that depends on that result. The naive solution would be to move the function call inside and do it twice:
rexpr = re.compile('some pattern to match')
[rexpr.search(y).group(1) for x in something for y in x.address if rexpr.search(y)]
But this obviously runs the search twice which you generally want to avoid. At this point, you could use some hackish solutions which I generally wouldn’t recommend (as they harm readability). Since your result only depends on the result of the regular expression search, you could also solve this in two steps: First, you search on every element and map them to a match object, and then you filter on those matches and just return the valid ones:
[m.group(1) for m in (rexpr.search(y) for x in something for y in x.address) if m]
Note that I'm using generator expressions here: those are essentially the same as list comprehensions, but they don't create the full result as a list and only yield one element at a time. So it's more efficient if you only want to consume the items one by one (which is the case here). After all, you're only interested in the result of the list comprehension, so the comprehension will consume the generator expression.
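Here is a runnable sketch of that two-step approach. The objects in something, their address strings and the pattern are all made up for illustration, since the question only gives pseudo-code.

```python
import re
from types import SimpleNamespace

# Hypothetical stand-ins for the question's objects: each has an .address list.
something = [
    SimpleNamespace(address=["id=12 main st", "no match here"]),
    SimpleNamespace(address=["id=7 oak ave"]),
]

# Hypothetical pattern; the question only says "some pattern to match".
rexpr = re.compile(r"id=(\d+)")

# The inner generator expression runs the search once per string; the outer
# list comprehension keeps only the successful matches.
matches = (rexpr.search(y) for x in something for y in x.address)
result = [m.group(1) for m in matches if m]
print(result)  # ['12', '7']
```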
I would do something like this:
# match function
def match(x):
    m = re.search("some pattern to match",x)
    if m:
        return m.group(1)
    else:
        return None

# list comprehension
results = [match(y) for x in something for y in x.address if match(y)]
I am writing a list comprehension in Python:
[2 * x if x > 2 else add_nothing_to_list for x in some_list]
I need the "add_nothing_to_list" part (the else part of the logic) to literally be nothing.
Does Python have a way to do this? In particular, is there a way to say a.append(nothing) which would leave a unchanged. This can be a useful feature to write generalized code.
Just move the condition to the end:
[2 * x for x in some_list if x > 2]
Quoting the List Comprehension documentation,
A list comprehension consists of brackets containing an expression followed by a for clause, then zero or more for or if clauses. The result will be a new list resulting from evaluating the expression in the context of the for and if clauses which follow it.
In this case, the expression is 2 * x and then a for statement, for x in some_list, followed by an if statement, if x > 2.
This comprehension can be understood, like this
result = []
for x in some_list:
    if x > 2:
        result.append(2 * x)
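The difference between a trailing if and a leading if ... else is easy to see side by side. A minimal sketch, with an arbitrary some_list:

```python
some_list = [1, 2, 3, 4, 5]

# A trailing "if" is a filter: items that fail the test are skipped entirely.
filtered = [2 * x for x in some_list if x > 2]
print(filtered)  # [6, 8, 10]

# A leading "if ... else ..." is a conditional expression: every item still
# produces a value, so the list keeps its original length.
mapped = [2 * x if x > 2 else None for x in some_list]
print(mapped)  # [None, None, 6, 8, 10]
```

So there is no "nothing" value to put in the else branch; skipping items is the job of the filter clause.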
How can I update the upper limit of a loop in each iteration? In the following code, List is shortened in each iteration. However, the lenList in the for ... in loop is not updated, even though I defined lenList as global. Any ideas how to solve this? (I'm using Python 2.sthg)
Thanks!
def similarity(List):
    import difflib
    lenList = len(List)
    for i in range(1,lenList):
        import numpy as np
        global lenList
        a = List[i]
        idx = [difflib.SequenceMatcher(None, a, x).ratio() for x in List]
        z = idx > .9
        del List[z]
        lenList = len(List)

X = ['jim','jimmy','luke','john','jake','matt','steve','tj','pat','chad','don']
similarity(X)
Looping over indices is bad practice in python. You may be able to accomplish what you want like this though (edited for comments):
def similarity(alist):
    position = 0
    while position < len(alist):
        item = alist[position]
        position += 1
        # code here that modifies alist
A list will evaluate True if it has any entries, or False when it is empty. In this way you can consume a list that may grow during the manipulation of its items.
Additionally, if you absolutely have to have indices, you can get those as well:
for idx, item in enumerate(alist):
    # code here, where items are actual list entries, and
    # idx is the 0-based index of the item in the list.
Since Python 2.6 (and in 3.x), you can even pass an optional start parameter to enumerate to control the starting value of idx.
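A small sketch of that optional parameter, which is named start; the sample list is made up:

```python
alist = ["jim", "luke", "john"]

# Default: indices start at 0.
zero_based = list(enumerate(alist))
print(zero_based)  # [(0, 'jim'), (1, 'luke'), (2, 'john')]

# With start=1 the counter begins at 1; the items themselves are unchanged.
one_based = list(enumerate(alist, start=1))
print(one_based)  # [(1, 'jim'), (2, 'luke'), (3, 'john')]
```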
The issue here is that range() is only evaluated once at the start of the loop and produces a range object (or a list in 2.x) at that time. You can't then change the range. Not to mention that numbers are immutable, so you are assigning a new value to lenList, but that doesn't affect any earlier uses of it.
The best solution is to change the way your algorithm works not to rely on this behaviour.
The range is an object which is constructed before the first iteration of your loop, so you are iterating over the values in that object. You would instead need to use a while loop, although as Lattyware and g.d.d.c point out, it would not be very Pythonic.
What you are effectively looping over in the above code is a list that was generated before the first iteration.
You could have as well written the above as
li = range(1,lenList)
for i in li:
    ... your code ...
Changing lenList after li has been created has no effect on li.
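The following sketch shows both behaviours next to each other. The list contents and variable names are arbitrary, chosen only to make the effect visible.

```python
items = ["a", "b", "c", "d"]

# range(n) is built once, so rebinding n later does not shorten the loop.
n = len(items)
visited = []
for i in range(n):
    n = 2  # rebinding n has no effect on the range already created
    visited.append(i)
print(visited)  # [0, 1, 2, 3]

# A while loop re-evaluates its condition on every pass.
position = 0
seen = []
while position < len(items):
    seen.append(items[position])
    if items[position] == "b":
        del items[-1]  # shrink the list mid-loop
    position += 1
print(seen)  # ['a', 'b', 'c']
```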
This problem will become quite a lot easier with one small modification to how your function works: instead of removing similar items from the existing list, create and return a new one with those items omitted.
For the specific case of just removing similarities to the first item, this simplifies down quite a bit, and removes the need to involve Numpy's fancy indexing (which you weren't actually using anyway, because of a missing call to np.array):
import difflib

def similarity(lst):
    a = lst[0]
    # keep the first item plus every later item that is not too similar to it
    return [a] + \
        [x for x in lst[1:] if difflib.SequenceMatcher(None, a, x).ratio() <= .9]
From this basis, repeating it for every item in the list can be done recursively - you need to pass the list comprehension at the end back into similarity, and deal with receiving an empty list:
def similarity(lst):
    if not lst:
        return []
    a = lst[0]
    # drop items too similar to a, then repeat on what is left
    return [a] + similarity(
        [x for x in lst[1:] if difflib.SequenceMatcher(None, a, x).ratio() <= .9])
Also note that importing inside a function, and naming a variable list (shadowing the built-in list) are both practices worth avoiding, since they can make your code harder to follow.
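If recursion depth is a concern for long lists, the same idea can be written iteratively. This is only a sketch: the function name drop_similar and the threshold parameter are my own additions, and it keeps an item only when its ratio against every already-kept item is at or below the threshold.

```python
import difflib

def drop_similar(lst, threshold=0.9):
    # Keep an item only if it is not too similar to any item already kept.
    kept = []
    for candidate in lst:
        if all(difflib.SequenceMatcher(None, k, candidate).ratio() <= threshold
               for k in kept):
            kept.append(candidate)
    return kept

names = ["jim", "jimmy", "luke"]
# 'jim' vs 'jimmy' has ratio 2*3/(3+5) = 0.75, so a 0.7 threshold drops 'jimmy'.
print(drop_similar(names, threshold=0.7))  # ['jim', 'luke']
```

With the default threshold of 0.9 all three names are kept, since 0.75 does not exceed it.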
I have two lists, e.g. x = [1,2,3,4,4,5,6,7,7] and y = [3,4,5,6,7,8,9,10]. I want to iterate over the two lists while comparing items. For those that match, I would like to call some function and remove them from the lists; in this example I should end up with x = [1,2] and y = [8,9,10]. Sets will not work for this problem because of my type of data and the comparison operator.
for i in x:
    for j in y:
        if i == j:
            callsomefunction(i,j)
            # remove i, j from x and y respectively
Edit: After discovering the person asking the question simply didn't know about __hash__ I provided this information in a comment:
To use sets, implement __hash__. So if obj1 == obj2 when obj1.a == obj2.a and obj1.b == obj2.b, __hash__ should return hash((self.a, self.b)) and your sets will work as expected.
That solved their problem, and they switched to using sets.
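A minimal sketch of what that looks like in practice. The Point class is hypothetical; only the attribute names a and b come from the comment above.

```python
class Point:
    # Hypothetical class with the two attributes a and b named above.
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __eq__(self, other):
        return isinstance(other, Point) and (self.a, self.b) == (other.a, other.b)

    def __hash__(self):
        # Must agree with __eq__: equal objects need equal hashes.
        return hash((self.a, self.b))

x = {Point(1, 2), Point(3, 4)}
y = {Point(3, 4), Point(5, 6)}

common = x & y                  # items present in both sets
only_x, only_y = x - y, y - x   # items unique to each set
print(len(common), len(only_x), len(only_y))  # 1 1 1
```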
The rest of this answer is now obsolete, but it's still correct (but horribly inefficient) so I'll leave it here.
This code does what you want. At the end, newx and newy are the non-overlapping items of x and y specifically.
x = [1,2,3,4,4,5,6,7,7]
y = [3,4,5,6,7,8,9,10]
# you can leave out bad and just compare against
# x at the end if memory is more important than speed
newx, bad, newy = [], [], []
for i in x:
    if i in y:
        callsomefunction(i)
        bad.append(i)
    else:
        newx.append(i)
for i in y:
    if i not in bad:
        newy.append(i)
print(newx)
print(newy)
However, I know without even seeing your code that this is the wrong way to do this. You can certainly do it with sets, but if you don't want to, that's up to you.
OK, discard my post, I hadn't seen the point where you mentioned that sets wouldn't work.
Nevertheless, if you're OK with a little work, you might want to use classes so that operators do work as they are expected to.
I think the most "pythonistic" way of doing this is to use sets.
You could then do :
x = set([1,2,3,4,4,5,6,7,7])
y = set([3,4,5,6,7,8,9,10])
for item in x.intersection(y): #y.intersection(x) is fine too.
    my_function(item) #You could use my_function(item, item) if that's what your function requires
    x.remove(item)
    y.remove(item)
I think that sets are also more efficient than lists for this kind of work when it comes down to performance (though this might not be your top priority).
On a sidenote, you could also use:
x,y = x.difference(y), y.difference(x)
This effectively removes the items that are in both x and y from each of them.
Try this:
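for i in x[:]:  # iterate over a copy so removing from x does not skip items
    if i in y:
        callsomefunction(i)
        x.remove(i)
        y.remove(i)
# note: a duplicate in x stays if its match in y was already removed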
EDIT: updated answer
how about this:
import itertools
x = [1,2,3,4,4,5,6,7,7]
y = [3,4,5,6,7,8,9,10]
output = map(somefunction, itertools.product(x,y))