In Python you can have multiple iterators in a list comprehension, like
[(x,y) for x in a for y in b]
for some suitable sequences a and b. I'm aware of the nested loop semantics of Python's list comprehensions.
My question is: Can one iterator in the comprehension refer to the other? In other words: Could I have something like this:
[x for x in a for a in b]
where the current value of the outer loop is the iterator of the inner?
As an example, if I have a nested list:
a=[[1,2],[3,4]]
what would the list comprehension expression be to achieve this result:
[1,2,3,4]
?? (Please only list comprehension answers, since this is what I want to find out).
Suppose you have a text full of sentences and you want an array of words.
# Without list comprehension
list_of_words = []
for sentence in text:
    for word in sentence:
        list_of_words.append(word)
# list_of_words now holds every word of every sentence
I like to think of list comprehension as stretching code horizontally.
Try breaking it up into:
# List Comprehension
[word for sentence in text for word in sentence]
Example:
>>> text = (("Hi", "Steve!"), ("What's", "up?"))
>>> [word for sentence in text for word in sentence]
['Hi', 'Steve!', "What's", 'up?']
This also works for generators
>>> text = (("Hi", "Steve!"), ("What's", "up?"))
>>> gen = (word for sentence in text for word in sentence)
>>> for word in gen: print(word)
Hi
Steve!
What's
up?
To answer your question with your own suggestion:
>>> [x for b in a for x in b] # Works fine
While you asked for list comprehension answers, let me also point out the excellent itertools.chain():
>>> from itertools import chain
>>> list(chain.from_iterable(a))
>>> list(chain(*a)) # If you're using python < 2.6
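For the nested list from the question, both approaches should give the same flat list. A minimal sketch for Python 3 (the names flat_comprehension and flat_chain are only illustrative):

```python
from itertools import chain

a = [[1, 2], [3, 4]]

# Outer loop first (for b in a), inner loop second (for x in b).
flat_comprehension = [x for b in a for x in b]

# chain.from_iterable walks each inner list in turn, lazily.
flat_chain = list(chain.from_iterable(a))

print(flat_comprehension)  # [1, 2, 3, 4]
print(flat_chain)          # [1, 2, 3, 4]
```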
Gee, I guess I found the answer: I was not paying enough attention to which loop is inner and which is outer. The list comprehension should be:
[x for b in a for x in b]
to get the desired result, and yes, one current value can be the iterator for the next loop.
Order of iterators may seem counter-intuitive.
Take for example: [str(x) for i in range(3) for x in foo(i)]
Let's decompose it:
def foo(i):
    return i, i + 0.5

[str(x)
    for i in range(3)
        for x in foo(i)
]

# yields the same values as this generator function
def gen():
    for i in range(3):
        for x in foo(i):
            yield str(x)
ThomasH has already added a good answer, but I want to show what happens:
>>> a = [[1, 2], [3, 4]]
>>> [x for x in b for b in a]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'b' is not defined
>>> [x for b in a for x in b]
[1, 2, 3, 4]
>>> [x for x in b for b in a]
[3, 3, 4, 4]
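Python evaluates the for clauses of a list comprehension from left to right, so the first for clause is the outermost loop. That is why the first attempt fails: b is used before it has been bound.
The second "problem" is that, in Python 2, b gets "leaked" out of the list comprehension. After the first successful list comprehension, b == [3, 4], which is why the last attempt runs at all and returns [3, 3, 4, 4]. In Python 3 a comprehension has its own scope, so b does not leak and that last line raises NameError again.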
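The transcript above looks like it came from Python 2. As far as I know, Python 3 gives a list comprehension its own scope, so its loop variables are not visible afterwards. A small sketch to check this on your own interpreter (the name leaked is only for illustration):

```python
a = [[1, 2], [3, 4]]
flat = [x for b in a for x in b]

try:
    b  # bound only inside the comprehension
    leaked = True
except NameError:
    leaked = False

print(flat)    # [1, 2, 3, 4]
print(leaked)  # False on Python 3
```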
This mnemonic helps me a lot:
[ <RETURNED_VALUE> <OUTER_LOOP1> <INNER_LOOP2> <INNER_LOOP3> ... <OPTIONAL_IF> ]
Think of "returned value first, then the loops from outermost to innermost" as the only correct order.
With that in mind, the order in a list comprehension is easy even for three loops:
c=[111, 222, 333]
b=[11, 22, 33]
a=[1, 2, 3]
print(
    [
        (i, j, k)                            # <RETURNED_VALUE>
        for i in a for j in b for k in c     # in order: loop1, loop2, loop3
        if i < 2 and j < 20 and k < 200      # <OPTIONAL_IF>
    ]
)
[(1, 11, 111)]
because the above is just:
for i in a:                                  # outer loop1 GOES SECOND
    for j in b:                              # inner loop2 GOES THIRD
        for k in c:                          # inner loop3 GOES FOURTH
            if i < 2 and j < 20 and k < 200:
                print((i, j, k))             # returned value GOES FIRST
For iterating over one nested list/structure, the technique is the same.
For a from the question:
a = [[1,2],[3,4]]
[i2 for i1 in a for i2 in i1]
which returns [1, 2, 3, 4]
For one more level of nesting:
a = [[[1, 2], [3, 4]], [[5, 6], [7, 8, 9]], [[10]]]
[i3 for i1 in a for i2 in i1 for i3 in i2]
which returns [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
and so on.
I could never write a double list comprehension correctly on my first attempt. Reading PEP 202, it turns out the reason is that it is implemented the opposite way from how you would read it in English. The good news is that the implementation is logically sound, so once you understand the structure, it's very easy to get right.
Let a, b, c, d be successively nested objects. For me, the intuitive way to extend list comprehension would mimic English:
# works
[f(b) for b in a]
# does not work
[f(c) for c in b for b in a]
[f(c) for c in g(b) for b in a]
[f(d) for d in c for c in b for b in a]
In other words, you'd be reading from the bottom up, i.e.
# wrong logic
(((d for d in c) for c in b) for b in a)
However, this is not how Python implements nested list comprehensions. Instead, the implementation treats the first chunk as completely separate, and then chains the fors and ins in a single block from the top down (instead of bottom up), i.e.
# right logic
d: (for b in a, for c in b, for d in c)
Note that the deepest nested level (for d in c) is farthest from the final object in the list (d). The reason for this comes from Guido himself:
The form [... for x... for y...] nests, with the last index varying fastest, just like nested for loops.
Using Skam's text example, this becomes even more clear:
# word: for sentence in text, for word in sentence
[word for sentence in text for word in sentence]
# letter: for sentence in text, for word in sentence, for letter in word
[letter for sentence in text for word in sentence for letter in word]
# letter:
#     for sentence in text if len(sentence) > 2,
#     for word in sentence[0],
#     for letter in word if letter is a vowel
# (str has no isvowel() method, so test membership in "aeiou" instead)
[letter for sentence in text if len(sentence) > 2 for word in sentence[0] for letter in word if letter.lower() in "aeiou"]
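Python's str type has no built-in vowel test, so a runnable version of these examples needs its own check. A minimal sketch, assuming a hypothetical helper is_vowel and Skam's text tuple from the earlier answer:

```python
def is_vowel(letter):
    """Hypothetical helper: True for the ASCII vowels a, e, i, o, u."""
    return letter.lower() in "aeiou"

text = (("Hi", "Steve!"), ("What's", "up?"))

# Two levels: sentences, then words.
words = [word for sentence in text for word in sentence]

# Three levels plus a filter on the innermost loop variable.
vowels = [letter
          for sentence in text
          for word in sentence
          for letter in word
          if is_vowel(letter)]

print(words)   # ['Hi', 'Steve!', "What's", 'up?']
print(vowels)  # ['i', 'e', 'e', 'a', 'u']
```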
If you want to keep the multi-dimensional structure, nest the comprehension brackets instead. In the example below, one is added to every element.
>>> a = [[1, 2], [3, 4]]
>>> [[col +1 for col in row] for row in a]
[[2, 3], [4, 5]]
>>> [col +1 for row in a for col in row]
[2, 3, 4, 5]
I feel this is easier to understand:
[row[i] for row in a for i in range(len(row))]
result: [1, 2, 3, 4]
Additionally, you could use just the same variable for the member of the input list which is currently accessed and for the element inside this member. However, this might even make it more (list) incomprehensible.
input = [[1, 2], [3, 4]]
[x for x in input for x in x]
First, for x in input is evaluated, which binds x to one member list of the input. Python then walks through the second part, for x in x, during which x is rebound to each element of that member list in turn. The leading x defines what we want to return.
This flatten_nlevel function calls itself recursively on the nested list1 to convert it to a single level. Try this out:
def flatten_nlevel(list1, flat_list):
    for sublist in list1:
        if isinstance(sublist, list):  # recurse into nested lists
            flatten_nlevel(sublist, flat_list)
        else:
            flat_list.append(sublist)
list1 = [1,[1,[2,3,[4,6]],4],5]
items = []
flatten_nlevel(list1,items)
print(items)
output:
[1, 1, 2, 3, 4, 6, 4, 5]
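A variant that returns the values instead of filling a list passed in by the caller can be written as a recursive generator. This is only a sketch: the name flatten is mine, and it assumes that only list instances should be descended into (tuples, strings and other iterables are treated as leaf values).

```python
def flatten(nested):
    """Yield the leaf values of arbitrarily nested lists, depth first."""
    for item in nested:
        if isinstance(item, list):
            yield from flatten(item)  # descend into the sub-list
        else:
            yield item

list1 = [1, [1, [2, 3, [4, 6]], 4], 5]
print(list(flatten(list1)))  # [1, 1, 2, 3, 4, 6, 4, 5]
```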
I'm new to Python, so any help or recommendation is appreciated.
What I'm trying to do is compare two lists (not necessarily one the reverse of the other).
For instance:
l1 = [1,2,3,4,5]
l2 = [5,4,3,2,1]
I want to compare them and return a common value, but not the way it is normally done: here that would return every element of the list, because the two lists hold the same values, just reversed.
Instead, I want to compare them in stages, over growing leading portions of each list. At each stage, if the two portions have an element in common, return that element; if not, move on to the next stage.
For instance:
The first iteration (with the lists defined above) would check:
l1 = [1]
l2 = [5]
#is there any coincidence until there? -> false (keep looking)
2nd iteration:
l1 = [1, 2]
l2 = [5, 4]
#is there any coincidence until there? -> false (keep looking)
3rd iteration:
l1 = [1, 2, 3]
l2 = [5, 4, 3]
#is there any coincidence until there? -> true (returns 3,
#which is the element where the coincidence was found, not necessarily
#the same index in both lists)
Keep in mind that each stage compares the last element of the first list's portion with every element of the second list's portion (in the first stage, that is just the first element of the second list). If there is no match, it tries the element immediately preceding the last one in the first list against the same portion of the second list, and so on, returning the first item that matches.
Another example to clarify:
l1 = [1,2,3,4,5]
l2 = [3,4,5,6,7]
And the output will be 3
A tricky one:
l1 = [1,2,3,4]
l2 = [2,1,4,5]
1st iteration
l1 = [1]
l2 = [2]
# No output
2nd iteration
l1 = [1,2]
l2 = [2,1]
# Output will be 2
Since that element was found in the second list too: the item I check first is the last one of the first list's portion [1,2], and I look for it in the second list's portion up to that point, [2,1].
I need all of this to implement a bidirectional search, but I'm currently stuck on this step, as I'm not yet used to for loops and list handling.
You can compare the elements of the two lists in the same loop:
l1 = [1,2,3,4,5]
l2 = [5,4,3,2,1]
for i, j in zip(l1, l2):
    if i == j:
        print('true')
    else:
        print('false')
It looks like you're really asking: What is (the index of) the first element that l1 and l2 have in common at the same index?
The solution:
next((i, a) for i, (a, b) in enumerate(zip(l1, l2)) if a == b)
How this works:
zip(l1, l2) pairs up elements from l1 and l2, generating tuples
enumerate() takes those tuples and keeps track of the index, i.e. (0, (1, 5)), (1, (2, 4)), etc.
for i, (a, b) in .. generates those pairs of indices and value tuples
The if a == b ensures that only those indices and values where the values match are yielded
next() gets the next element from an iterable, you're interested in the first element that matches the condition, so that's what next() gets you here.
The working example:
l1 = [1, 2, 3, 4, 5]
l2 = [5, 4, 3, 2, 1]
i, v = next((i, a) for i, (a, b) in enumerate(zip(l1, l2)) if a == b)
print(f'index: {i}, value: {v}') # prints "index: 2, value: 3"
If you're not interested in the index, but just in the first value they have in common:
l1 = [1, 2, 3, 4, 5]
l2 = [5, 4, 3, 2, 1]
v = next(a for a, b in zip(l1, l2) if a == b)
print(v) # prints "3"
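If no position matches, the generator is exhausted and next() raises StopIteration. next() accepts a default as its second argument, which avoids that. A minimal sketch (the function name first_same_index_match is only illustrative):

```python
def first_same_index_match(l1, l2, default=None):
    """Return (index, value) of the first position where l1 and l2 agree."""
    pairs = ((i, a) for i, (a, b) in enumerate(zip(l1, l2)) if a == b)
    # The second argument to next() is returned when the generator is empty.
    return next(pairs, default)

print(first_same_index_match([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # (2, 3)
print(first_same_index_match([1, 2], [3, 4]))                    # None
```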
Edit: you commented and updated the question, and it's clear you don't want the first match at the same index between the lists, but rather the first common element in the heads of the lists.
(or, possibly the first element from the second list that is in the first list, which user #AndrejKesely provided an answer for - which you accepted, although it doesn't appear to answer the problem as described)
Here's a solution that gets the first match from the first part of each list, which seems to match what you describe as the problem:
l1 = [1, 2, 3, 4, 5]
l2 = [5, 2, 6, 7, 8]
v = next(next(iter(x)) for n in range(max(len(l1), len(l2))) if (x := set(l1[:n+1]) & set(l2[:n+1])))
print(v) # prints "2"
Note: the solution fails if there is no match at all, with a StopIteration. Using short-circuiting with any() that can be avoided:
x = None if not any((x := set(l1[:n+1]) & set(l2[:n+1])) for n in range(max(len(l1), len(l2)))) else next(iter(x))
print(x)
This solution has x == None if there is no match, and otherwise x will be the first match in the shortest heads of both lists, so:
l1 = [1, 2, 3, 4, 5]
l2 = [5, 2, 6, 7, 8] # result 2
l1 = [1, 2, 3, 4, 5]
l2 = [5, 6, 7, 8] # result 5
l1 = [1, 2, 3, 4, 5]
l2 = [6, 7, 8] # result None
Note that also:
l1 = [1, 2, 3]
l2 = [4, 3, 2] # result 2, not 3
Both 2 and 3 seem to be valid answers here, it's not clear from your description why 3 should be favoured over 2?
If you do need that element of the two possible answers that comes first in l2, the solution would be a bit more complicated still, since the sets are unordered by definition, so changing the order of l1 and l2 in the answer won't matter.
If you care about that order, this works:
x = None if not any(x := ((set(l1[:n//2+1+n%2]) & set(l2[:n//2+1]))) for n in range(max(len(l1), len(l2)) * 2)) else next(iter(x))
This also works for lists with different lengths, unlike the more readable answer by user #BenGrossmann. Note that they have some efficiency in reusing the constructed sets and adding one element at a time, which also allows them to remember the last element added to the set corresponding with the first list, which is why they also correctly favor 3 over 2 in [[1, 2, 3], [4, 3, 2]].
If the last answer is what you need, you should consider amending their answer (for example using zip_longest) to deal correctly with lists of different lengths, since it will be more efficient for longer lists, and is certainly more readable.
Taking the solution from #BenGrossman, but generalising it for any number of lists, with any number of elements, and favouring the ordering you specified:
from itertools import zip_longest
lists = [[1, 2, 3, 4, 5],
         [6, 7, 8, 5, 4]]
sets = [set() for _ in range(len(lists))]
for xs in zip_longest(*lists):
    for x, s in zip(xs, sets):
        s.add(x)
    if i := set.intersection(*sets):
        v = sorted([(lists[0].index(x), x) for x in i])[-1][1]
        break
else:
    v = None
print(v)
This works as described for all the examples, as well as for lists of unequal length, and will favour the elements that are farthest back in the first list (and thus earlier in the others).
The following can be made more efficient, but does work.
lists = [[1,2,3,4,5], # input to the script
         [5,4,3,2,1]]
sets = [set(), set()]
for a,b in zip(*lists):
    sets[0].add(a)
    sets[1].add(b)
    if sets[0]&sets[1]:
        print("first element in first overlap:")
        print(a)
        break
else:
    print("no overlap")
This results in the output
first element in first overlap:
3
Using lists = [[5,7,6],[7,5,4]] instead results in
first element in first overlap:
7
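The staged comparison described in the question can also be written out directly, without sets. This is a sketch of my reading of that description, not a definitive answer (the function name first_staged_match is hypothetical): for each head length n, walk the head of l1 from its last element backwards and return the first element that also occurs in the head of l2.

```python
def first_staged_match(l1, l2):
    """Return the first common element found while growing both heads, or None."""
    for n in range(1, max(len(l1), len(l2)) + 1):
        head2 = l2[:n]
        # Check the newest element of l1's head first, then the earlier ones.
        for item in reversed(l1[:n]):
            if item in head2:
                return item
    return None

print(first_staged_match([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # 3
print(first_staged_match([1, 2, 3, 4], [2, 1, 4, 5]))        # 2
print(first_staged_match([1, 2], [3, 4]))                    # None
```

Slicing and the in test make this quadratic or worse for long lists, so it is meant to pin down the intended behaviour rather than to be fast.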
I am trying to use list comprehension to remove a number of items from a list by just keeping those not specified.
For example if I have 2 lists a = [1,3,5,7,10] and b = [2,4] I want to keep all items from a that are not at an index corresponding to a number in b.
Now, I tried to use y = [a[x] for x not in b] but this produces a SyntaxError.
y = [a[x] for x in b] works fine, but it keeps exactly the elements that I want removed.
So how do I achieve this? And on a side note, is this a good way to do it or should I use del?
You can use enumerate() and look up indexes in b:
>>> a = [1, 3, 5, 7, 10]
>>> b = [2, 4]
>>> [item for index, item in enumerate(a) if index not in b]
[1, 3, 7]
Note that to improve lookup time, it is better to make b a set instead of a list. Lookups in a set are O(1) on average, while lookups in a list are O(n), where n is the length of the list.
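A minimal sketch of the set-based variant (the function name drop_indexes is only illustrative):

```python
def drop_indexes(a, b):
    """Return the items of a whose index is not listed in b."""
    skip = set(b)  # built once, so each membership test is O(1) on average
    return [item for index, item in enumerate(a) if index not in skip]

print(drop_indexes([1, 3, 5, 7, 10], [2, 4]))  # [1, 3, 7]
```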
I guess you're looking for something like:
[x for x in a if a.index(x) not in b]
Or, using filter (wrap it in list() on Python 3, where filter returns an iterator):
list(filter(lambda x: a.index(x) not in b, a))
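One caveat with looking positions up through a.index(x): index() returns the position of the first occurrence of a value, so the two approaches can disagree when a contains duplicates. A small sketch with made-up data to show the difference:

```python
a = [1, 3, 1, 7]  # the value 1 appears at index 0 and index 2
b = [2]           # we want to drop whatever sits at index 2

# a.index(1) is 0 for both occurrences, so nothing is dropped.
by_value = [x for x in a if a.index(x) not in b]

# enumerate() supplies the real position of every item.
by_position = [x for i, x in enumerate(a) if i not in b]

print(by_value)     # [1, 3, 1, 7]
print(by_position)  # [1, 3, 7]
```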
Try this; it will work:
[j for i,j in enumerate(a) if i not in b ]
After this:
y = [a[x] for x in b]
just add:
for x in y:
    a.remove(x)
and you end up with a stripped-down list in a.
I am trying to remove non-repeating elements from a list in Python, e.g. list = [1,1,2,3,3,3,5,6] should return [1,1,3,3,3].
My initial attempt was:
def tester(data):
    for x in data:
        if data.count(x) == 1:
            data.remove(x)
    return data
This will work for some inputs, but for [1,2,3,4,5], for example, it returns [2,4]. Could someone please explain why this occurs?
l=[1,1,2,3,3,3,5,6]
[x for x in l if l.count(x) > 1]
[1, 1, 3, 3, 3]
Adds elements that appear at least twice in your list.
In your own code you need to change the line for x in data to for x in data[:]:
Using data[:] you are iterating over a copy of original list.
There is a linear time solution for that:
def tester(data):
    cnt = {}
    for e in data:
        cnt[e] = cnt.get(e, 0) + 1
    return [x for x in data if cnt[x] > 1]
This is occurring because you are removing from a list as you're iterating through it. Instead, consider appending to a new list.
You could also use collections.Counter, if you're using 2.7 or greater:
[a for a, b in collections.Counter(your_list).items() if b > 1]
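Note that iterating over Counter(...).items() yields each distinct value once, so the result differs from the list asked for in the question. A sketch showing both forms side by side (the function names are only illustrative):

```python
from collections import Counter

def repeated_values(data):
    """Distinct values that occur more than once."""
    return [value for value, n in Counter(data).items() if n > 1]

def keep_repeated(data):
    """All items of data whose value occurs more than once, in original order."""
    counts = Counter(data)  # one pass to count, one pass to filter
    return [x for x in data if counts[x] > 1]

data = [1, 1, 2, 3, 3, 3, 5, 6]
print(repeated_values(data))  # [1, 3]
print(keep_repeated(data))    # [1, 1, 3, 3, 3]
```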
Another linear solution.
>>> data = [1, 1, 2, 3, 3, 3, 5, 6]
>>> D = dict.fromkeys(data, 0)
>>> for item in data:
...     D[item] += 1
...
>>> [item for item in data if D[item] > 1]
[1, 1, 3, 3, 3]
You shouldn't remove items from a list while iterating over that same list. The loop tracks its position by index, so when an item is removed the remaining items shift left by one and the item that slides into the current position is skipped.
See this question for another example of the same problem, with many suggested alternative approaches.
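To see which elements the loop actually visits, the buggy version can be instrumented. A sketch (the function name trace_buggy_removal is hypothetical; it works on a copy so the caller's list is left alone):

```python
def trace_buggy_removal(data):
    """Run the remove-while-iterating loop and record which items it visits."""
    data = list(data)  # copy, so the argument is not modified
    visited = []
    for x in data:
        visited.append(x)
        if data.count(x) == 1:
            data.remove(x)  # shifts later items left; the next one is skipped
    return visited, data

visited, remaining = trace_buggy_removal([1, 2, 3, 4, 5])
print(visited)    # [1, 3, 5]
print(remaining)  # [2, 4]
```

Only every other element is ever examined, which matches the [2, 4] result reported in the question.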
You can use a list comprehension, like this:
def tester(data):
    return [x for x in data if data.count(x) != 1]
Removing items from a list while iterating over it is not recommended.
This question closely relates to How do I run two python loops concurrently?
I'll put it in a clearer manner:
I get what the questioner asks in the above link, something like
for i in [1,2,3], j in [3,2,1]:
    print i,j
    cmp(i,j) #do_something(i,j)
But
L1: for i in [1,2,3] and j in [3,2,1]:
doesn't work.
Q1. What happened here was amusing, though:
for i in [1,2,3], j in [3,2,1]:
    print i,j
[1, 2, 3] 0
False 0
Q2. How do I make something like L1 work?
I don't really mean multithreading or parallelism: it's two tasks advancing side by side, not a loop inside a loop, followed by comparing the results of the two.
Here the lists were numbers. My case is not numbers:
for i in f_iterate1() and j in f_iterate2():
UPDATE: abarnert below was right, I had j defined somewhere. So now it is:
>>> for i in [1,2,3], j in [3,2,1]:
    print i,j
Traceback (most recent call last):
  File "<pyshell#142>", line 1, in <module>
    for i in [1,2,3], j in [3,2,1]:
NameError: name 'j' is not defined
And I am not looking to zip two iteration functions! I want to process them simultaneously in a for-loop-like construct, and the question still remains how this can be achieved in Python.
UPDATE #2: Solved for same length lists
>>> def a(num):
    for x in num:
        yield x
>>> n1=[1,2,3,4]
>>> n2=[3,4,5,6]
>>> x1=a(n1)
>>> x2=a(n2)
>>> for i,j in zip(x1,x2):
    print i,j
1 3
2 4
3 5
4 6
>>>
[Solved]
Q3. What if n3=[3,4,5,6,7,8,78,34], which is longer than both n1 and n2?
zip won't work here. Something like izip_longest?
izip_longest works well enough.
It's hard to understand what you're asking, but I think you just want zip:
for i, j in zip([1,2,3], [3,2,1]):
    print i, j
for i, j in zip(f_iterate1(), f_iterate2()):
    print i, j
And so on…
This doesn't do anything concurrently as the term is normally used, it just does one thing at a time, but that one thing is "iterate over two sequences in lock-step".
Note that this extends in the obvious way to three or more lists:
for i, j, k in zip([1,2,3], [3,2,1], [13, 22, 31]):
    print i, j, k
(If you don't even know how many lists you have, see the comments.)
In case you're wondering what's going on with this:
for i in [1,2,3], j in [3,2,1]:
    print i,j
Try this:
print [1,2,3], j in [3,2,1]
If you've already defined j somewhere, it will print either [1, 2, 3] False or [1, 2, 3] True. Otherwise, you'll get a NameError. That's because you're just creating a tuple of two values, the first being the list [1,2,3], and the second being the result of the expression j in [3,2,1].
So:
j=0
for i in [1,2,3], j in [3,2,1]:
    print i, j
… is equivalent to:
j=0
for i in ([1,2,3], False):
    print i, 0
… which will print:
[1, 2, 3] 0
False 0
You want to use the zip() function:
for i, j in zip([1, 2, 3], [3, 2, 1]):
    pass  # process i and j here
for i, j in zip(f_iterate1(), f_iterate2()):
    pass  # process i and j here
zip() pairs up the elements of the input lists, letting you process them together.
If your inputs are large or are iterators, use future_builtins.zip(), or, if you don't care about forward compatibility with Python 3, use itertools.izip() instead; these yield pairs on demand instead of creating a whole output list in one go:
from future_builtins import zip
for i, j in zip(f_iterate1(), f_iterate2()):
    pass  # process i and j here
Your generators fall in this scenario.
Last but not least, if your input lists have different lengths, zip() stops when the shortest list is exhausted. If you want to continue with the longest list instead, use itertools.izip_longest(); it'll use a fill value when the shorter input sequence(s) are exhausted:
>>> from itertools import izip_longest
>>> for i, j, k in izip_longest(range(3), range(3, 5), range(5, 10), fillvalue=42):
...     print i, j, k
...
0 3 5
1 4 6
2 42 7
42 42 8
42 42 9
The default for fillvalue is None.
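The examples in this answer are Python 2. On Python 3, the built-in zip() is already lazy and the padded version is itertools.zip_longest. A minimal Python 3 sketch of the same comparison (the variable names are only illustrative):

```python
from itertools import zip_longest

# zip() stops as soon as the shortest input is exhausted.
short = list(zip(range(3), range(3, 5), range(5, 10)))

# zip_longest() keeps going and pads the exhausted inputs with fillvalue.
padded = list(zip_longest(range(3), range(3, 5), range(5, 10), fillvalue=42))

print(short)   # [(0, 3, 5), (1, 4, 6)]
print(padded)  # [(0, 3, 5), (1, 4, 6), (2, 42, 7), (42, 42, 8), (42, 42, 9)]
```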
Your attempt:
for i in [1,2,3], j in [3,2,1]:
is really interpreted as:
for i in ([1,2,3], j in [3,2,1]):
where the part after in is a tuple with two values: the first is a list, and the second is a boolean, the result of testing j in [3,2,1], which is either True or False. You had j defined as 0 from a previous loop experiment, and 0 in [3, 2, 1] is False.
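The same parse can be checked by building the tuple explicitly. A small sketch in Python 3 syntax (the names loop_target and printed are only illustrative):

```python
j = 0
loop_target = ([1, 2, 3], j in [3, 2, 1])  # a 2-tuple: (list, bool)

printed = []
for i in loop_target:
    printed.append((i, j))  # what "print i, j" would show on each pass

print(loop_target)  # ([1, 2, 3], False)
print(printed)      # [([1, 2, 3], 0), (False, 0)]
```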
For same-length arrays, you can use the index to refer to corresponding locations in respective lists, like so:
a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8, 10]
for i in range(len(a)):
    print(a[i])
    print(b[i])
This accesses same indices of both lists at the same time.
I have two lists of the same length which contain a variety of different elements. I'm trying to compare them to find the number of elements which exist in both lists, but have different indexes.
Here are some example inputs/outputs to demonstrate what I mean:
>>> compare([1, 2, 3, 4], [4, 3, 2, 1])
4
>>> compare([1, 2, 3], [1, 2, 3])
0
# Each item in the first list has the same index in the other
>>> compare([1, 2, 4, 4], [1, 4, 4, 2])
2
# The '4' in 3rd position of both lists doesn't count, since it has the same index in both
>>> compare([1, 2, 3, 3], [5, 3, 5, 5])
1
# Duplicates don't count
The lists are always the same size.
This is the algorithm I have so far:
def compare(list1, list2):
    # Eliminate any direct matches; pair the lists up once, before either is reassigned
    pairs = [(a, b) for (a, b) in zip(list1, list2) if a != b]
    list1 = [a for (a, b) in pairs]
    list2 = [b for (a, b) in pairs]
    out = 0
    for possible in list1:
        if possible in list2:
            index = list2.index(possible)
            del list2[index]
            out += 1
    return out
Is there a more concise and elegant way to do the same thing?
This python function does hold for the examples you provided:
def compare(list1, list2):
    D = {e:i for i, e in enumerate(list1)}
    return len(set(e for i, e in enumerate(list2) if D.get(e) not in (None, i)))
Since duplicates don't count, you can use sets to find the unique elements in each list (a set only holds unique elements). Then keep only the elements shared by both, and use list.index to count those whose index differs:
def compare(l1, l2):
    s1, s2 = set(l1), set(l2)
    shared = s1 & s2 # intersection, only the elements in both
    return len([e for e in shared if l1.index(e) != l2.index(e)])
You can actually bring this down to a one-liner if you want:
def compare(l1, l2):
    return len([e for e in set(l1) & set(l2) if l1.index(e) != l2.index(e)])
Alternative:
Functionally you can use the reduce builtin (in python3, you have to do from functools import reduce first). This avoids construction of the list which saves excess memory usage. It uses a lambda function to do the work.
def compare(l1, l2):
    return reduce(lambda acc, e: acc + int(l1.index(e) != l2.index(e)),
                  set(l1) & set(l2), 0)
A brief explanation:
reduce is a functional programming construct that traditionally reduces an iterable to a single item. Here we use reduce to reduce the set intersection to a single value.
lambda functions are anonymous functions. Saying lambda x, y: x + y is like saying def func(x, y): return x + y except that the function has no name. reduce takes a function as its first argument. The first argument the lambda receives when used with reduce (acc here) is the result of the previous call, the accumulator.
set(l1) & set(l2) is a set consisting of unique elements that are in both l1 and l2. It is iterated over, and each element is taken out one at a time and used as the second argument to the lambda function.
0 is the initial value for the accumulator. We use this since we assume there are 0 shared elements with different indices to start.
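Put together for Python 3, the pieces above can be sketched as follows. The name compare_reduce is mine, and compare_sum is an alternative I am adding for comparison, not part of the answer above: it gets the same count by summing booleans over a generator, which also avoids building a list.

```python
from functools import reduce

def compare_reduce(l1, l2):
    """Count shared values whose first index differs between l1 and l2."""
    shared = set(l1) & set(l2)
    # acc is the running count; e is the next shared element.
    return reduce(lambda acc, e: acc + int(l1.index(e) != l2.index(e)), shared, 0)

def compare_sum(l1, l2):
    """Same count, using the fact that True and False sum as 1 and 0."""
    return sum(l1.index(e) != l2.index(e) for e in set(l1) & set(l2))

print(compare_reduce([1, 2, 3, 4], [4, 3, 2, 1]))  # 4
print(compare_sum([1, 2, 4, 4], [1, 4, 4, 2]))     # 2
```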
I don't claim it is the simplest answer, but it is a one-liner.
import numpy as np
import itertools
l1 = [1, 2, 3, 4]
l2 = [1, 3, 2, 4]
print(len(np.unique(list(itertools.chain.from_iterable([[a,b] for a,b in zip(l1,l2) if a!= b])))))
Explanation:
[[a,b] for a,b in zip(l1,l2) if a!= b]
is the list of pairs from zip(l1,l2) whose items differ. The number of elements in this list is the number of positions at which the two lists differ.
Then, list(itertools.chain.from_iterable(...)) merges the component lists of a list. For instance:
>>> list(itertools.chain.from_iterable([[3,2,5],[5,6],[7,5,3,1]]))
[3, 2, 5, 5, 6, 7, 5, 3, 1]
Then, discard duplicates with np.unique(), and take len().