Python: Loops for simultaneous operation, Two or possibly more? - python

This question closely relates to How do I run two python loops concurrently?
I'll put it in a clearer manner:
I get what the questioner asks in the above link, something like
for i in [1,2,3], j in [3,2,1]:
print i,j
cmp(i,j) #do_something(i,j)
But
L1: for i in [1,2,3] and j in [3,2,1]:
doesn't work
Q1.
but this was amusing what happened here:
for i in [1,2,3], j in [3,2,1]:
print i,j
[1, 2, 3] 0
False 0
Q2. How do I make something like L1 work?
Not Multithreading or parallelism really. (It's two concurrent tasks not a loop inside a loop) and then compare the result of the two.
Here the lists were numbers. My case is not numbers:
for i in f_iterate1() and j in f_iterate2():
UPDATE: abarnert below was right, I had j defined somewhere. So now it is:
>>> for i in [1,2,3], j in [3,2,1]:
print i,j
Traceback (most recent call last):
File "<pyshell#142>", line 1, in <module>
for i in [1,2,3], j in [3,2,1]:
NameError: name 'j' is not defined
And I am not looking to zip two iteration functions! But process them simultaneously in a for loop like situation. and the question still remains how can it be achieved in python.
UPDATE #2: Solved for same length lists
>>> def a(num):
        for x in num:
            yield x
>>> n1=[1,2,3,4]
>>> n2=[3,4,5,6]
>>> x1=a(n1)
>>> x2=a(n2)
>>> for i,j in zip(x1,x2):
        print i,j
1 3
2 4
3 5
4 6
>>>
[Solved]
Q3. What if n3=[3,4,5,6,7,8,78,34], which is longer than both n1 and n2?
zip won't work here. Something like izip_longest?
izip_longest works well enough.

It's hard to understand what you're asking, but I think you just want zip:
for i, j in zip([1,2,3], [3,2,1]):
    print i, j
for i, j in zip(f_iterate1(), f_iterate2()):
    print i, j
And so on…
This doesn't do anything concurrently as the term is normally used, it just does one thing at a time, but that one thing is "iterate over two sequences in lock-step".
Note that this extends in the obvious way to three or more lists:
for i, j, k in zip([1,2,3], [3,2,1], [13, 22, 31]):
    print i, j, k
(If you don't even know how many lists you have, see the comments.)
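For the case where the number of lists is only known at run time, one common approach is to collect them in a list and unpack that list into zip with the * operator. A minimal sketch in Python 3 syntax; the name `lists` and the sample values are made up for illustration:

```python
# Hypothetical input: any number of equal-length lists collected in one list.
lists = [[1, 2, 3], [3, 2, 1], [13, 22, 31]]

# zip(*lists) is the same call as zip(lists[0], lists[1], lists[2]).
rows = list(zip(*lists))

for values in rows:
    # each `values` is a tuple holding one item from every list
    print(values)
```

Each tuple has as many entries as there are lists, so the loop body can index into it or unpack it.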
In case you're wondering what's going on with this:
for i in [1,2,3], j in [3,2,1]:
print i,j
Try this:
print [1,2,3], j in [3,2,1]
If you've already defined j somewhere, it will print either [1, 2, 3] False or [1, 2, 3] True. Otherwise, you'll get a NameError. That's because you're just creating a tuple of two values, the first being the list [1,2,3], and the second being the result of the expression j in [3,2,1].
So:
j=0
for i in [1,2,3], j in [3,2,1]:
    print i, j
… is equivalent to:
j=0
for i in ([1,2,3], False):
    print i, 0
… which will print:
[1, 2, 3] 0
False 0

You want to use the zip() function:
for i, j in zip([1, 2, 3], [3, 2, 1]):
#
for i, j in zip(f_iterate1(), f_iterate2()):
#
zip() pairs up the elements of the input lists, letting you process them together.
If your inputs are large or are iterators, use future_builtins.zip(), or, if you don't care about forward compatibility with Python 3, use itertools.izip() instead; these yield pairs on demand instead of creating a whole output list in one go:
from future_builtins import zip
for i, j in zip(f_iterate1(), f_iterate2()):
Your generators fall in this scenario.
Last but not least, if your input lists have different lengths, zip() stops when the shortest list is exhausted. If you want to continue with the longest list instead, use itertools.izip_longest(); it'll use a fill value when the shorter input sequence(s) are exhausted:
>>> from itertools import izip_longest
>>> for i, j, k in izip_longest(range(3), range(3, 5), range(5, 10), fillvalue=42):
...     print i, j, k
...
0 3 5
1 4 6
2 42 7
42 42 8
42 42 9
The default for fillvalue is None.
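For readers on Python 3: izip_longest was renamed to itertools.zip_longest there, and the built-in zip is already lazy. A small sketch using the n1 and n3 lists from the question (the variable `pairs` is only for illustration):

```python
from itertools import zip_longest  # Python 3 name for izip_longest

n1 = [1, 2, 3, 4]
n3 = [3, 4, 5, 6, 7, 8, 78, 34]

# Once n1 is exhausted, its slot is filled with fillvalue (None by default).
pairs = list(zip_longest(n1, n3))

for i, j in pairs:
    print(i, j)
```

The loop runs eight times, once per element of the longer list.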
Your attempt:
for i in [1,2,3], j in [3,2,1]:
is really interpreted as:
for i in ([1,2,3], j in [3,2,1]):
where the part in parentheses is a tuple with two values: the list [1,2,3] and the boolean result of the test j in [3,2,1], which is either True or False. You had j defined as 0 from a previous loop experiment, and thus 0 in [3, 2, 1] is False.

For same-length lists, you can use the index to refer to corresponding positions in the respective lists, like so:
a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8, 10]
for i in range(len(a)):
    print(a[i])
    print(b[i])
This accesses the same index of both lists at the same time.

Related

Merging two sorted arrays in python

I am trying to merge two sorted arrays recursively, and I can merge the first few numbers until one pointer exits the array. There seems to be some problem with the base case not getting executed. I have tried to print the new_arr with the pointers for each recursive call to debug but cannot seem to find a solution. Here is my code:
new_arr= []
i= 0
j=0
def merge(arr1, arr2, i, j):
    #base case
    ##when arr1 pointer exits
    print(i,j, new_arr)
    if(i>len(arr1)-1):
        new_arr.append(arr2[j:])
        return new_arr
    ##when arr2 pointer exits
    if (j > len(arr2)-1):
        new_arr.append(arr1[i:])
        return new_arr
    if(arr1[i]<arr2[j]):
        new_arr.append(arr1[i])
        i+=1
        merge(arr1, arr2, i, j)
    elif(arr1[i]>=arr2[j]):
        new_arr.append(arr2[j])
        j+=1
        merge(arr1, arr2, i, j)
sortedarr = merge([1,9], [3,7,11,14,18,99], i, j)
print(sortedarr)
and here goes my output:
0 0 []
1 0 [1]
1 1 [1, 3]
1 2 [1, 3, 7]
2 2 [1, 3, 7, 9]
None
These are the issues:
new_arr.append(arr2[j:]) should be new_arr.extend(arr2[j:]). append is for appending one item to the list, while extend concatenates a second list to the first. The same change needs to happen in the second case.
As you count on getting the mutated list as a returned value, you should not discard the list that is returned by the recursive call. You should return it back to the caller, until the first caller gets it.
It is a bad idea to have new_arr a global value. If the main program would call the function a second time for some other input, new_arr will still have its previous values, polluting the result of the next call.
Although the first two fixes will make your function work (for a single test), the last issue would best be fixed by using a different pattern:
Let the recursive call return the list that merges the values that still need to be analysed, i.e. from i and j onwards. The caller is then responsible for prepending its own value to that returned (partial) list. This way there is no more need for a global variable:
def merge(arr1, arr2, i, j):
    # base cases: one pointer has run off the end of its list
    if i >= len(arr1):
        return arr2[j:]
    if j >= len(arr2):
        return arr1[i:]
    # recursive cases: take the smaller head, then merge the rest
    if arr1[i] < arr2[j]:
        return [arr1[i]] + merge(arr1, arr2, i + 1, j)
    else:
        return [arr2[j]] + merge(arr1, arr2, i, j + 1)
sortedarr = merge([1,9], [3,7,11,14,18,99], 0, 0)
print(sortedarr)
Note that Python's standard library already has a function that knows how to merge sorted sequences, heapq.merge:
>>> import heapq
>>> list(heapq.merge((1, 3, 5, 7), (2, 4, 6, 8)))
[1, 2, 3, 4, 5, 6, 7, 8]
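If the recursion is not a requirement, the same two-pointer idea can also be written as a loop, which avoids hitting Python's recursion limit on long inputs. This is only a sketch of that alternative, not part of the answers above; the name `merge_iterative` is made up here:

```python
def merge_iterative(arr1, arr2):
    """Merge two sorted lists into a new sorted list without recursion."""
    merged = []
    i = j = 0
    # Advance whichever pointer currently sees the smaller value.
    while i < len(arr1) and j < len(arr2):
        if arr1[i] < arr2[j]:
            merged.append(arr1[i])
            i += 1
        else:
            merged.append(arr2[j])
            j += 1
    # At most one of these slices is non-empty.
    merged.extend(arr1[i:])
    merged.extend(arr2[j:])
    return merged

print(merge_iterative([1, 9], [3, 7, 11, 14, 18, 99]))
# [1, 3, 7, 9, 11, 14, 18, 99]
```

Note the use of extend rather than append for the leftovers, for the same reason given in the first issue above.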

Permutations with repetition without two consecutive equal elements

I need a function that generates all the permutation with repetition of an iterable with the clause that two consecutive elements must be different; for example
sorted(f([0,1],3))==[(0,1,0),(1,0,1)]
#or
sorted(f([0,1],3))==[[0,1,0],[1,0,1]]
#I don't need the elements in the list to be sorted.
#the elements of the return can be tuples or lists, it doesn't change anything
Unfortunately itertools.permutations doesn't work for what I need (each element of the iterable appears at most once in each result).
I've tried a bunch of definitions; first, filtering the output of itertools.product(iterable,repeat=r), but that is too slow for what I need.
from itertools import product
def crp0(iterable,r):
    l=[]
    for f in product(iterable,repeat=r):
        #print(f)
        b=True
        last=None #supposing no element of the iterable is None, which is fine for me
        for element in f:
            if element==last:
                b=False
                break
            last=element
        if b: l.append(f)
    return l
Second, I tried to build r nested for loops, one inside the other (where r is the length of each result, usually written k in math).
def crp2(iterable,r):
    a=list(range(0,r))
    s="\n"
    tab="    " #4 spaces
    l=[]
    for i in a:
        s+=(2*i*tab+"for a["+str(i)+"] in iterable:\n"+
            (2*i+1)*tab+"if "+str(i)+"==0 or a["+str(i)+"]!=a["+str(i-1)+"]:\n")
    s+=(2*i+2)*tab+"l.append(a.copy())"
    exec(s)
    return l
I know, there's no need to remind me: exec is ugly, exec can be dangerous, exec isn't easy to read... I know.
To understand the function better, I suggest replacing exec(s) with print(s).
Here is an example of the string passed to exec for crp2([0,1],2):
for a[0] in iterable:
    if 0==0 or a[0]!=a[-1]:
        for a[1] in iterable:
            if 1==0 or a[1]!=a[0]:
                l.append(a.copy())
But, apart from using exec, I need a better function because crp2 is still too slow (even if faster than crp0). Is there any way to recreate the code with r nested for loops without using exec? Is there any other way to do what I need?
You could prepare the sequences in two halves, then preprocess the second halves to find the compatible choices.
def crp2(I,r):
    r0=r//2
    r1=r-r0
    A=crp0(I,r0) # Prepare first half sequences
    B=crp0(I,r1) # Prepare second half sequences
    D = {} # Dictionary showing compatible second half sequences for each token
    for i in I:
        D[i] = [b for b in B if b[0]!=i]
    return [a+b for a in A for b in D[a[-1]]]
In a test with iterable=[0,1,2] and r=15, I found this method to be over a hundred times faster than just using crp0.
You could try to return a generator instead of a list. With large values of r, your method will take a very long time to process product(iterable,repeat=r) and will return a huge list.
With this variant, you should get the first element very fast:
from itertools import product
def crp0(iterable, r):
    for f in product(iterable, repeat=r):
        last = f[0]
        b = True
        for element in f[1:]:
            if element == last:
                b = False
                break
            last = element
        if b:
            yield f
for no_repetition in crp0([0, 1], 12):
    print(no_repetition)
# (0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1)
# (1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0)
Instead of filtering the elements, you could generate a list directly with only the correct elements. This method uses recursion to create the cartesian product:
def product_no_repetition(iterable, r, last_element=None):
    if r == 0:
        return [[]]
    else:
        return [p + [x] for x in iterable
                for p in product_no_repetition(iterable, r - 1, x)
                if x != last_element]
for no_repetition in product_no_repetition([0, 1], 12):
    print(no_repetition)
I agree with @EricDuminil's comment that you do not want "Permutations with repetition." You want a significant subset of the product of the iterable with itself multiple times. I don't know what name is best: I'll just call them products.
Here is an approach that builds each product line without building all the products then filtering out the ones you want. My approach is to work primarily with the indices of the iterable rather than the iterable itself--and not all the indices, but ignoring the last one. So instead of working directly with [2, 3, 5, 7] I work with [0, 1, 2]. Then I work with the products of those indices. I can transform a product such as [1, 2, 2] where r=3 by comparing each index with the previous one. If an index is greater than or equal to the previous one I increment the current index by one. This prevents two indices from being equal, and this also gets me back to using all the indices. So [1, 2, 2] is transformed to [1, 2, 3] where the final 2 was changed to a 3. I now use those indices to select the appropriate items from the iterable, so the iterable [2, 3, 5, 7] with r=3 gets the line [3, 5, 7]. The first index is treated differently, since it has no previous index. My code is:
from itertools import product
def crp3(iterable, r):
    L = []
    for k in range(len(iterable)):
        for f in product(range(len(iterable)-1), repeat=r-1):
            ndx = k
            a = [iterable[ndx]]
            for j in range(r-1):
                ndx = f[j] if f[j] < ndx else f[j] + 1
                a.append(iterable[ndx])
            L.append(a)
    return L
Using %timeit in my Spyder/IPython configuration on crp3([0,1], 3) shows 8.54 µs per loop while your crp2([0,1], 3) shows 133 µs per loop. That shows a sizeable speed improvement! My routine works best where iterable is short and r is large--your routine finds len ** r lines (where len is the length of the iterable) and filters them while mine finds len * (len-1) ** (r-1) lines without filtering.
By the way, your crp2() does do filtering, as shown by the if lines in your code that is execed. The sole if in my code does not filter a line, it modifies an item in the line. My code does return surprising results if the items in the iterable are not unique: if that is a problem, just change the iterable to a set to remove the duplicates. Note that I replaced your l name with L: I think l is too easy to confuse with 1 or I and should be avoided. My code could easily be changed to a generator: replace L.append(a) with yield a and remove the lines L = [] and return L.
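The generator variant described in the last sentence might look like the sketch below. The name `crp3_gen` is made up here, and the code assumes r >= 1 and unique items, as the original does. The final line checks the count against the len * (len-1) ** (r-1) formula mentioned above.

```python
from itertools import product

def crp3_gen(iterable, r):
    """Yield lists of length r in which no two neighbours are equal.

    Assumes r >= 1 and that the items of `iterable` are unique.
    """
    items = list(iterable)
    n = len(items)
    for k in range(n):                       # index of the first item
        for f in product(range(n - 1), repeat=r - 1):
            ndx = k
            a = [items[ndx]]
            for step in f:
                # Skip over the previous index so neighbours always differ.
                ndx = step if step < ndx else step + 1
                a.append(items[ndx])
            yield a

result = list(crp3_gen([0, 1, 2], 4))
print(len(result))  # 24, i.e. 3 * 2**3
```

Because it yields one line at a time, the caller can stop early or stream the results instead of holding the whole list in memory.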
How about:
from itertools import product
result = [ x for x in product(iterable,repeat=r) if all(x[i-1] != x[i] for i in range(1,len(x))) ]
Elaborating on @peter-de-rivaz's idea (divide and conquer): when you split the sequence to create into two subsequences, those subsequences have the same length or very close. If r = 2*k is even, store the result of crp(k) in a list and merge it with itself. If r=2*k+1, store the result of crp(k) in a list and merge it with itself and with L.
def large(L, r):
    if r <= 4: # do not end the divide: too slow
        # small() is a generator; build a list because M is iterated more than once below
        return list(small(L, r))
    M = large(L, r//2)
    if r%2 == 0:
        return [x + y for x in M for y in M if x[-1] != y[0]]
    else:
        return [x + y + (e,) for x in M for y in M for e in L if x[-1] != y[0] and y[-1] != e]
small is an adaptation of @eric-duminil's answer using the famous for...else loop of Python:
from itertools import product
def small(iterable, r):
    for seq in product(iterable, repeat=r):
        prev, *tail = seq
        for e in tail:
            if e == prev:
                break
            prev = e
        else:
            yield seq
A small benchmark:
print(timeit.timeit(lambda: crp2( [0, 1, 2], 10), number=1000))
#0.16290732200013736
print(timeit.timeit(lambda: crp2( [0, 1, 2, 3], 15), number=10))
#24.798989593000442
print(timeit.timeit(lambda: large( [0, 1, 2], 10), number=1000))
#0.0071403849997295765
print(timeit.timeit(lambda: large( [0, 1, 2, 3], 15), number=10))
#0.03471425700081454

printing items in a list represented by bit list

I have this problem on writing a python function which takes a bit list as input and prints the items represented by this bit list.
so the question is on Knapsack and it is a relatively simple and straightforward one as I'm new to the python language too.
so technically the items can be named in a list [1,2,3,4] which corresponds to Type 1, Type 2, Type 3 and etc but we won't be needing the "type". the problem is, i represented the solution in a bit list [0,1,1,1] where 0 means not taken and 1 means taken. in another words, item of type 1 is not taken but the rest are taken, as represented in the bit list i wrote.
now we are required to write a python function which takes the bit list as input and prints the item corresponding to it in which in this case i need the function to print out [2,3,4] leaving out the 1 since it is 0 by bit list. any help on this? it is a 2 mark question but i still couldn't figure it out.
def printItems(l):
    for x in range(len(l)):
        if x == 0:
            return False
        elif x == 1:
            return l
i tried something like that but it is wrong. much appreciated for any help.
You can do this with the zip function, which takes two iterables and returns their items in pairs:
for bit_item, item in zip(bit_list, item_list):
    if bit_item:
        print item
Or if you need a list rather than printing them, you can use a list comprehension:
[item for bit_item, item in zip(bit_list, item_list) if bit_item]
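For completeness, here is the same idea as a runnable Python 3 snippet using the data from the question. bit_list and item_list are not defined in the answer above, so the values below are filled in from the question's example:

```python
bit_list = [0, 1, 1, 1]    # 1 means "taken", 0 means "not taken"
item_list = [1, 2, 3, 4]   # item types, in the same order as the bits

# Keep only the items whose matching bit is truthy.
taken = [item for bit, item in zip(bit_list, item_list) if bit]
print(taken)  # [2, 3, 4]
```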
You can use itertools.compress for a quick solution:
>>> import itertools
>>> list(itertools.compress(itertools.count(1), [0, 1, 1, 1]))
[2, 3, 4]
The reason your solution doesn't work is because you are using return in your function, where you need to use print, and make sure you are iterating over your list correctly. In this case, enumerate simplifies things, but there are many similar approaches that would work:
>>> def print_items(l):
...     for i,b in enumerate(l,1):
...         if b:
...             print(i)
...
>>> print_items([0,1,1,1])
2
3
4
>>>
You may do it using list comprehension with enumerate() as:
>>> my_list = [0, 1, 1, 1]
>>> # the second argument makes enumerate start at 1 (by default it starts at 0)
>>> taken_list = [i for i, item in enumerate(my_list, 1) if item]
>>> taken_list
[2, 3, 4]
Alternatively, in case you do not need any in-built function and want to create your own function, you may modify your code as:
def printItems(l):
    new_list = []
    for x in range(len(l)):
        if l[x] == 1:
            new_list.append(x+1) # "x+1" because index starts with `0` and you need position
    return new_list
Sample run:
>>> printItems([0, 1, 1, 1])
[2, 3, 4]

Is there a need for range(len(a))?

One frequently finds expressions of this type in python questions on SO. Either for just accessing all items of the iterable
for i in range(len(a)):
    print(a[i])
Which is just a cumbersome way of writing:
for e in a:
    print(e)
Or for assigning to elements of the iterable:
for i in range(len(a)):
    a[i] = a[i] * 2
Which should be the same as:
for i, e in enumerate(a):
    a[i] = e * 2
# Or if it isn't too expensive to create a new iterable
a = [e * 2 for e in a]
Or for filtering over the indices:
for i in range(len(a)):
    if i % 2 == 1: continue
    print(a[i])
Which could be expressed like this:
for e in a[::2]:
    print(e)
Or when you just need the length of the list, and not its content:
for _ in range(len(a)):
    doSomethingUnrelatedToA()
Which could be:
for _ in a:
    doSomethingUnrelatedToA()
In python we have enumerate, slicing, filter, sorted, etc... As python for constructs are intended to iterate over iterables and not only ranges of integers, are there real-world use-cases where you need in range(len(a))?
If you need to work with indices of a sequence, then yes - you use it... eg for the equivalent of numpy.argsort...:
>>> a = [6, 3, 1, 2, 5, 4]
>>> sorted(range(len(a)), key=a.__getitem__)
[2, 3, 1, 5, 4, 0]
Short answer: mathematically speaking, no, in practical terms, yes, for example for Intentional Programming.
Technically, the answer would be "no, it's not needed" because it's expressible using other constructs. But in practice, I use for i in range(len(a)) (or for _ in range(len(a)) if I don't need the index) to make it explicit that I want to iterate as many times as there are items in a sequence without needing to use the items in the sequence for anything.
So: "Is there a need?"? — yes, I need it to express the meaning/intent of the code for readability purposes.
See also: https://en.wikipedia.org/wiki/Intentional_programming
And obviously, if there is no collection associated with the iteration at all and N is just a number, for ... in range(N) is the only option, so as not to resort to i = 0; while i < N; i += 1 ...
What if you need to access two elements of the list simultaneously?
for i in range(len(a[0:-1])):
    something_new[i] = a[i] * a[i+1]
You can use this, but it's probably less clear:
for i, _ in enumerate(a[0:-1]):
    something_new[i] = a[i] * a[i+1]
Personally I'm not 100% happy with either!
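One more option for the neighbouring-elements case, offered as a sketch rather than a definitive answer: zip the list with a copy of itself shifted by one. The sample values for `a` are made up for illustration.

```python
a = [1, 2, 3, 4]

# zip(a, a[1:]) pairs each element with its successor and stops
# when the shorter input (a[1:]) runs out.
something_new = [x * y for x, y in zip(a, a[1:])]
print(something_new)  # [2, 6, 12]
```

This needs no index at all, at the cost of one sliced copy of the list.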
Going by the comments as well as personal experience, I say no, there is no need for range(len(a)). Everything you can do with range(len(a)) can be done in another (usually far more efficient) way.
You gave many examples in your post, so I won't repeat them here. Instead, I will give an example for those who say "What if I want just the length of a, not the items?". This is one of the only times you might consider using range(len(a)). However, even this can be done like so:
>>> a = [1, 2, 3, 4]
>>> for _ in a:
... print True
...
True
True
True
True
>>>
Clements answer (as shown by Allik) can also be reworked to remove range(len(a)):
>>> a = [6, 3, 1, 2, 5, 4]
>>> sorted(range(len(a)), key=a.__getitem__)
[2, 3, 1, 5, 4, 0]
>>> # Note however that, in this case, range(len(a)) is more efficient.
>>> [x for x, _ in sorted(enumerate(a), key=lambda i: i[1])]
[2, 3, 1, 5, 4, 0]
>>>
So, in conclusion, range(len(a)) is not needed. Its only upside is readability (its intention is clear). But that is just preference and code style.
Sometimes matplotlib requires range(len(y)), e.g., while y=array([1,2,5,6]), plot(y) works fine, scatter(y) does not. One has to write scatter(range(len(y)),y). (Personally, I think this is a bug in scatter; plot and its friends scatter and stem should use the same calling sequences as much as possible.)
It's nice to have when you need to use the index for some kind of manipulation and having the current element doesn't suffice. Take for instance a binary tree that's stored in an array. If you have a method that asks you to return a list of tuples that contains each node's direct children, then you need the index.
#0 -> 1,2 : 1 -> 3,4 : 2 -> 5,6 : 3 -> 7,8 ...
nodes = [0,1,2,3,4,5,6,7,8,9,10]
# wrapped in a function so that return is valid; get_children is an illustrative name
def get_children(nodes):
    children = []
    for i in range(len(nodes)):
        leftNode = None
        rightNode = None
        if i*2 + 1 < len(nodes):
            leftNode = nodes[i*2 + 1]
        if i*2 + 2 < len(nodes):
            rightNode = nodes[i*2 + 2]
        children.append((leftNode,rightNode))
    return children
Of course if the element you're working on is an object, you can just call a get children method. But yea, you only really need the index if you're doing some sort of manipulation.
Sometimes, you really don't care about the collection itself. For instance, creating a simple model fit line to compare an "approximation" with the raw data:
from math import sqrt
fib_raw = [1, 1, 2, 3, 5, 8, 13, 21] # Fibonacci numbers
phi = (1 + sqrt(5)) / 2
phi2 = (1 - sqrt(5)) / 2
def fib_approx(n): return (phi**n - phi2**n) / sqrt(5)
x = range(len(fib_raw))
y = [fib_approx(n) for n in x]
# Now plot to compare fib_raw and y
# Compare error, etc
In this case, the values of the Fibonacci sequence itself were irrelevant. All we needed here was the size of the input sequence we were comparing with.
If you have to iterate over the first len(a) items of an object b (that is larger than a), you should probably use range(len(a)):
for i in range(len(a)):
do_something_with(b[i])
I have a use case I don't believe any of your examples cover.
boxes = [b1, b2, b3]
items = [i1, i2, i3, i4, i5]
for j in range(len(boxes)):
    boxes[j].putitemin(items[j])
I'm relatively new to python though so happy to learn a more elegant approach.
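A possible alternative for this case is zip, which stops at the shorter of its inputs, so the extra items are simply left over. A minimal sketch; the Box class below is a hypothetical stand-in, since the real box objects are not shown in the post:

```python
class Box:
    """Hypothetical stand-in for the box objects in the post."""
    def __init__(self):
        self.contents = []

    def putitemin(self, item):
        self.contents.append(item)

boxes = [Box(), Box(), Box()]
items = ["i1", "i2", "i3", "i4", "i5"]

# zip stops after the third pair because boxes is the shorter sequence.
for box, item in zip(boxes, items):
    box.putitemin(item)

print([box.contents for box in boxes])
```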
Very simple example:
def loadById(self, id):
    if id in range(len(self.itemList)):
        self.load(self.itemList[id])
I can't quickly think of a solution that does not use the range-len composition.
But this should probably be done with try .. except instead, to stay pythonic, I guess.
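For integer ids, a chained comparison expresses the same bounds check without building a range. A sketch under the assumption that ids are plain ints; the Loader class and its attribute names are invented here only to make the example runnable:

```python
class Loader:
    """Hypothetical container, only here to make the sketch runnable."""
    def __init__(self, item_list):
        self.item_list = item_list
        self.loaded = None

    def load(self, item):
        self.loaded = item

    def load_by_id(self, item_id):
        # Same test as `item_id in range(len(self.item_list))` for ints.
        if 0 <= item_id < len(self.item_list):
            self.load(self.item_list[item_id])

loader = Loader(["a", "b", "c"])
loader.load_by_id(1)
loader.load_by_id(7)   # out of range, so nothing is loaded
print(loader.loaded)   # b
```

One thing to keep in mind with the try .. except IndexError route: negative ids such as -1 would not raise, because Python treats them as indexing from the end, so the two approaches are not quite equivalent.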
One problem with for i, num in enumerate(a) is that num does not change when you change a[i]. For example, this loop:
for i, num in enumerate(a):
    while num > 0:
        a[i] -= 1
will never end.
Of course, you could still use enumerate while swapping each use of num for a[i], but that kind of defeats the whole purpose of enumerate, so using for i in range(len(a)) just becomes more logical and readable.
Having a range of indices is useful for some more sophisticated problems in combinatorics. For example, to get all possible partitions of a list into three non-empty sections, the most straightforward approach is to find all possible combinations of distinct endpoints between the first and second section and between the second and third section. This is equivalent to ordered pairs of integers chosen from the valid indices into the list (except zero, since that would make the first partition empty). Thus:
>>> from itertools import combinations
>>> def three_parts(sequence):
...     for i, j in combinations(range(1, len(sequence)), 2):
...         yield (sequence[:i], sequence[i:j], sequence[j:])
...
>>> list(three_parts('example'))
[('e', 'x', 'ample'), ('e', 'xa', 'mple'), ('e', 'xam', 'ple'), ('e', 'xamp', 'le'), ('e', 'xampl', 'e'), ('ex', 'a', 'mple'), ('ex', 'am', 'ple'), ('ex', 'amp', 'le'), ('ex', 'ampl', 'e'), ('exa', 'm', 'ple'), ('exa', 'mp', 'le'), ('exa', 'mpl', 'e'), ('exam', 'p', 'le'), ('exam', 'pl', 'e'), ('examp', 'l', 'e')]
My code is:
s=["9"]*int(input())
for i in range(len(s)):
    while not set(s[i])<=set('01'):s[i]=input(i)
print(bin(sum([int(x,2)for x in s]))[2:])
It is a binary adder but I don't think the range len or the inside can be replaced to make it smaller/better.
I think it's useful for tqdm if you have a large loop and you want to track progress. This will output a progress bar:
import numpy as np
from tqdm import tqdm
empty_list = np.full(len(items), np.nan)
for i in tqdm(range(len(items))):
    empty_list[i] = do_something(items[i])
This will not show progress, at least in the case I was using it for:
empty_list = np.full(len(items), np.nan)
for i, _ in tqdm(enumerate(items)):
    empty_list[i] = do_something(items[i])
Just showed number of iterations. Not as helpful.

Double Iteration in List Comprehension [duplicate]

This question already has answers here:
How can I use list comprehensions to process a nested list?
(13 answers)
Closed last month.
In Python you can have multiple iterators in a list comprehension, like
[(x,y) for x in a for y in b]
for some suitable sequences a and b. I'm aware of the nested loop semantics of Python's list comprehensions.
My question is: Can one iterator in the comprehension refer to the other? In other words: Could I have something like this:
[x for x in a for a in b]
where the current value of the outer loop is the iterator of the inner?
As an example, if I have a nested list:
a=[[1,2],[3,4]]
what would the list comprehension expression be to achieve this result:
[1,2,3,4]
?? (Please only list comprehension answers, since this is what I want to find out).
Suppose you have a text full of sentences and you want an array of words.
# Without list comprehension
list_of_words = []
for sentence in text:
    for word in sentence:
        list_of_words.append(word)
return list_of_words
I like to think of list comprehension as stretching code horizontally.
Try breaking it up into:
# List Comprehension
[word for sentence in text for word in sentence]
Example:
>>> text = (("Hi", "Steve!"), ("What's", "up?"))
>>> [word for sentence in text for word in sentence]
['Hi', 'Steve!', "What's", 'up?']
This also works for generators
>>> text = (("Hi", "Steve!"), ("What's", "up?"))
>>> gen = (word for sentence in text for word in sentence)
>>> for word in gen: print(word)
Hi
Steve!
What's
up?
To answer your question with your own suggestion:
>>> [x for b in a for x in b] # Works fine
While you asked for list comprehension answers, let me also point out the excellent itertools.chain():
>>> from itertools import chain
>>> list(chain.from_iterable(a))
>>> list(chain(*a)) # If you're using python < 2.6
Gee, I guess I found the answer: I was not taking enough care about which loop is inner and which is outer. The list comprehension should be like:
[x for b in a for x in b]
to get the desired result, and yes, one current value can be the iterator for the next loop.
Order of iterators may seem counter-intuitive.
Take for example: [str(x) for i in range(3) for x in foo(i)]
Let's decompose it:
def foo(i):
    return i, i + 0.5
[str(x)
    for i in range(3)
        for x in foo(i)
]
# is same as
for i in range(3):
    for x in foo(i):
        yield str(x)
ThomasH has already added a good answer, but I want to show what happens:
>>> a = [[1, 2], [3, 4]]
>>> [x for x in b for b in a]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'b' is not defined
>>> [x for b in a for x in b]
[1, 2, 3, 4]
>>> [x for x in b for b in a]
[3, 3, 4, 4]
I guess Python parses the list comprehension from left to right. This means the first for loop that occurs is executed first.
The second "problem" is that b gets "leaked" out of the list comprehension (this is Python 2 behaviour; in Python 3 the loop variable stays local to the comprehension). After the first successful list comprehension, b == [3, 4].
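The session above relies on Python 2 semantics. As a quick check of how Python 3 differs, the sketch below runs the working comprehension and then tests whether b is visible afterwards; the try/except and the `leaked` flag are only there for illustration.

```python
a = [[1, 2], [3, 4]]

flat = [x for b in a for x in b]
print(flat)  # [1, 2, 3, 4]

# In Python 3 the loop variables live in the comprehension's own scope,
# so looking up b afterwards raises NameError.
try:
    b
    leaked = True
except NameError:
    leaked = False
print(leaked)  # False
```

It follows that in Python 3 the wrong-order version `[x for x in b for b in a]` keeps raising NameError instead of returning [3, 3, 4, 4] on a second attempt.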
This memory technique helps me a lot:
[ <RETURNED_VALUE> <OUTER_LOOP1> <INNER_LOOP2> <INNER_LOOP3> ... <OPTIONAL_IF> ]
And now you can think of Return + Outer-loop as the only right order.
Knowing the above, the order in a list comprehension, even for 3 loops, seems easy:
c=[111, 222, 333]
b=[11, 22, 33]
a=[1, 2, 3]
print(
    [
        (i, j, k) # <RETURNED_VALUE>
        for i in a for j in b for k in c # in order: loop1, loop2, loop3
        if i < 2 and j < 20 and k < 200 # <OPTIONAL_IF>
    ]
)
[(1, 11, 111)]
because the above is just a:
for i in a: # outer loop1 GOES SECOND
    for j in b: # inner loop2 GOES THIRD
        for k in c: # inner loop3 GOES FOURTH
            if i < 2 and j < 20 and k < 200:
                print((i, j, k)) # returned value GOES FIRST
For iterating over one nested list/structure, the technique is the same.
For a from the question:
a = [[1,2],[3,4]]
[i2 for i1 in a for i2 in i1]
which returns [1, 2, 3, 4].
For one more nested level:
a = [[[1, 2], [3, 4]], [[5, 6], [7, 8, 9]], [[10]]]
[i3 for i1 in a for i2 in i1 for i3 in i2]
which returns [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
and so on.
I could never write double list comprehension on my first attempt. Reading into PEP202, it turns out the reason is that it was implemented in the opposite way you would read it in English. The good news is that it is a logically sound implementation, so once you understand the structure, it's very easy to get right.
Let a, b, c, d be successively nested objects. For me, the intuitive way to extend list comprehension would mimic English:
# works
[f(b) for b in a]
# does not work
[f(c) for c in b for b in a]
[f(c) for c in g(b) for b in a]
[f(d) for d in c for c in b for b in a]
In other words, you'd be reading from the bottom up, i.e.
# wrong logic
(((d for d in c) for c in b) for b in a)
However this is not how Python implements nested lists. Instead, the implementation treats the first chunk as completely separate, and then chains the fors and ins in a single block from the top down (instead of bottom up), i.e.
# right logic
d: (for b in a, for c in b, for d in c)
Note that the deepest nested level (for d in c) is farthest from the final object in the list (d). The reason for this comes from Guido himself:
The form [... for x... for y...] nests, with the last index varying fastest, just like nested for loops.
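To see the quoted rule in action, the sketch below builds the same flat list twice, once with a three-level comprehension and once with explicit nested loops written in the same top-down order. The sample data and the names `flat_comp` and `flat_loop` are made up for illustration.

```python
a = [[[1, 2], [3, 4]], [[5, 6], [7, 8, 9]], [[10]]]

# Comprehension: the for clauses read top-down, outermost first.
flat_comp = [d for b in a for c in b for d in c]

# The same thing as explicit nested loops, in the same order.
flat_loop = []
for b in a:
    for c in b:
        for d in c:          # last clause, varies fastest
            flat_loop.append(d)

print(flat_comp == flat_loop)  # True
```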
Using Skam's text example, this becomes even more clear:
# word: for sentence in text, for word in sentence
[word for sentence in text for word in sentence]
# letter: for sentence in text, for word in sentence, for letter in word
[letter for sentence in text for word in sentence for letter in word]
# letter:
# for sentence in text if len(sentence) > 2,
# for word in sentence[0],
# for letter in word if letter is a vowel
[letter for sentence in text if len(sentence) > 2 for word in sentence[0] for letter in word if letter.lower() in "aeiou"]
If you want to keep the multi-dimensional structure, nest the list brackets. See the example below, where one is added to every element.
>>> a = [[1, 2], [3, 4]]
>>> [[col +1 for col in row] for row in a]
[[2, 3], [4, 5]]
>>> [col +1 for row in a for col in row]
[2, 3, 4, 5]
I feel this is easier to understand:
[row[i] for row in a for i in range(len(row))]
result: [1, 2, 3, 4]
Additionally, you could use just the same variable for the member of the input list which is currently accessed and for the element inside this member. However, this might even make it more (list) incomprehensible.
input = [[1, 2], [3, 4]]
[x for x in input for x in x]
First for x in input is evaluated, leading to one member list of the input, then, Python walks through the second part for x in x during which the x-value is overwritten by the current element it is accessing, then the first x defines what we want to return.
This flatten_nlevel function recursively walks the nested list1 to convert it to one level. Try this out:
def flatten_nlevel(list1, flat_list):
    for sublist in list1:
        if isinstance(sublist, list):
            flatten_nlevel(sublist, flat_list)
        else:
            flat_list.append(sublist)
list1 = [1,[1,[2,3,[4,6]],4],5]
items = []
flatten_nlevel(list1,items)
print(items)
output:
[1, 1, 2, 3, 4, 6, 4, 5]
