One frequently finds expressions of this type in Python questions on SO, either for just accessing all items of the iterable:
for i in range(len(a)):
    print(a[i])
Which is just a cumbersome way of writing:
for e in a:
    print(e)
Or for assigning to elements of the iterable:
for i in range(len(a)):
    a[i] = a[i] * 2
Which should be the same as:
for i, e in enumerate(a):
    a[i] = e * 2
# Or if it isn't too expensive to create a new iterable
a = [e * 2 for e in a]
Or for filtering over the indices:
for i in range(len(a)):
    if i % 2 == 1: continue
    print(a[i])
Which could be expressed like this:
for e in a[::2]:
    print(e)
Or when you just need the length of the list, and not its content:
for _ in range(len(a)):
    doSomethingUnrelatedToA()
Which could be:
for _ in a:
    doSomethingUnrelatedToA()
In Python we have enumerate, slicing, filter, sorted, etc. As Python's for construct is intended to iterate over iterables and not only over ranges of integers, are there real-world use cases where you need for i in range(len(a))?
If you need to work with the indices of a sequence, then yes, you use it, e.g. for the equivalent of numpy.argsort:
>>> a = [6, 3, 1, 2, 5, 4]
>>> sorted(range(len(a)), key=a.__getitem__)
[2, 3, 1, 5, 4, 0]
Short answer: mathematically speaking, no; in practical terms, yes, for example for Intentional Programming.
Technically, the answer would be "no, it's not needed" because it's expressible using other constructs. But in practice, I use for i in range(len(a)) (or for _ in range(len(a)) if I don't need the index) to make it explicit that I want to iterate as many times as there are items in a sequence, without needing to use the items in the sequence for anything.
So, "Is there a need?" Yes: I need it to express the meaning/intent of the code for readability purposes.
See also: https://en.wikipedia.org/wiki/Intentional_programming
And obviously, if there is no collection associated with the iteration at all, for ... in range(N) is the only option, so as not to resort to i = 0; while i < N: ...; i += 1.
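A minimal sketch of what that intent-revealing form looks like (the data here is illustrative, not from the question):

queue = ["job-a", "job-b", "job-c"]

# Iterate exactly as many times as there are queued jobs, without
# touching the jobs themselves inside the loop body.
for _ in range(len(queue)):
    print("tick")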
What if you need to access two elements of the list simultaneously?
for i in range(len(a[0:-1])):
    something_new[i] = a[i] * a[i+1]
You can use this, but it's probably less clear:
for i, _ in enumerate(a[0:-1]):
    something_new[i] = a[i] * a[i+1]
Personally I'm not 100% happy with either!
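A third option, assuming something_new is preallocated to hold len(a) - 1 results, is to pair each element with its successor via zip; a sketch:

a = [1, 2, 3, 4]
something_new = [None] * (len(a) - 1)

# zip(a, a[1:]) yields the neighbouring pairs (a[0], a[1]), (a[1], a[2]), ...
for i, (x, y) in enumerate(zip(a, a[1:])):
    something_new[i] = x * y

print(something_new)  # [2, 6, 12]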
Going by the comments as well as personal experience, I say no, there is no need for range(len(a)). Everything you can do with range(len(a)) can be done in another (usually far more efficient) way.
You gave many examples in your post, so I won't repeat them here. Instead, I will give an example for those who say "What if I want just the length of a, not the items?". This is one of the only times you might consider using range(len(a)). However, even this can be done like so:
>>> a = [1, 2, 3, 4]
>>> for _ in a:
...     print(True)
...
True
True
True
True
>>>
Clements' answer (as shown by Allik) can also be reworked to remove range(len(a)):
>>> a = [6, 3, 1, 2, 5, 4]
>>> sorted(range(len(a)), key=a.__getitem__)
[2, 3, 1, 5, 4, 0]
>>> # Note however that, in this case, range(len(a)) is more efficient.
>>> [x for x, _ in sorted(enumerate(a), key=lambda i: i[1])]
[2, 3, 1, 5, 4, 0]
>>>
So, in conclusion, range(len(a)) is not needed. Its only upside is readability (its intention is clear). But that is just preference and code style.
Sometimes matplotlib requires range(len(y)): e.g. with y = array([1, 2, 5, 6]), plot(y) works fine but scatter(y) does not; one has to write scatter(range(len(y)), y). (Personally, I think this is a bug in scatter; plot and its friends scatter and stem should use the same calling sequences as much as possible.)
It's nice to have when you need to use the index for some kind of manipulation and having the current element doesn't suffice. Take, for instance, a binary tree that's stored in an array. If you have a method that asks you to return a list of tuples containing each node's direct children, then you need the index.
# 0 -> 1,2 : 1 -> 3,4 : 2 -> 5,6 : 3 -> 7,8 ...
nodes = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

def direct_children(nodes):  # the method described above; the name is illustrative
    children = []
    for i in range(len(nodes)):
        leftNode = None
        rightNode = None
        if i*2 + 1 < len(nodes):
            leftNode = nodes[i*2 + 1]
        if i*2 + 2 < len(nodes):
            rightNode = nodes[i*2 + 2]
        children.append((leftNode, rightNode))
    return children
Of course, if the element you're working on is an object, you can just call a get-children method. But yeah, you only really need the index if you're doing some sort of manipulation.
Sometimes, you really don't care about the collection itself. For instance, creating a simple model fit line to compare an "approximation" with the raw data:
from math import sqrt

fib_raw = [1, 1, 2, 3, 5, 8, 13, 21]  # Fibonacci numbers
phi = (1 + sqrt(5)) / 2
phi2 = (1 - sqrt(5)) / 2
def fib_approx(n): return (phi**n - phi2**n) / sqrt(5)

x = range(len(fib_raw))
y = [fib_approx(n) for n in x]
# Now plot to compare fib_raw and y
# Compare error, etc.
In this case, the values of the Fibonacci sequence itself were irrelevant. All we needed here was the size of the input sequence we were comparing with.
If you have to iterate over the first len(a) items of an object b (that is larger than a), you should probably use range(len(a)):
for i in range(len(a)):
    do_something_with(b[i])
I have a use case I don't believe any of your examples cover.
boxes = [b1, b2, b3]
items = [i1, i2, i3, i4, i5]

for j in range(len(boxes)):
    boxes[j].putitemin(items[j])
I'm relatively new to Python though, so I'm happy to learn a more elegant approach.
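A more elegant approach, assuming each box should receive the item at the same position, is to pair the two lists with zip (which simply stops at the shorter list); a sketch using the names from the snippet above:

for box, item in zip(boxes, items):
    box.putitemin(item)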
Very simple example:
def loadById(self, id):
    if id in range(len(self.itemList)):
        self.load(self.itemList[id])
I can't quickly think of a solution that does not use the range-len composition.
But this should probably be done with try..except instead, to stay Pythonic, I guess.
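A sketch of that try/except variant, using the same attributes as above:

def loadById(self, id):
    try:
        self.load(self.itemList[id])
    except IndexError:
        # out-of-range id: nothing to load
        # (note: unlike the range check, a negative id would still index from the end)
        pass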
One problem with for i, num in enumerate(a) is that num does not change when you change a[i]. For example, this loop:
for i, num in enumerate(a):
    while num > 0:
        a[i] -= 1
will never end.
Of course, you could still use enumerate while swapping each use of num for a[i], but that kind of defeats the whole purpose of enumerate, so using for i in range(len(a)) just becomes more logical and readable.
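For contrast, the index-based version terminates because a[i] is re-read on every pass; a minimal sketch:

a = [3, 1, 2]
for i in range(len(a)):
    while a[i] > 0:
        a[i] -= 1
print(a)  # [0, 0, 0]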
Having a range of indices is useful for some more sophisticated problems in combinatorics. For example, to get all possible partitions of a list into three non-empty sections, the most straightforward approach is to find all possible combinations of distinct endpoints between the first and second section and between the second and third section. This is equivalent to ordered pairs of integers chosen from the valid indices into the list (except zero, since that would make the first partition empty). Thus:
>>> from itertools import combinations
>>> def three_parts(sequence):
...     for i, j in combinations(range(1, len(sequence)), 2):
...         yield (sequence[:i], sequence[i:j], sequence[j:])
...
>>> list(three_parts('example'))
[('e', 'x', 'ample'), ('e', 'xa', 'mple'), ('e', 'xam', 'ple'), ('e', 'xamp', 'le'), ('e', 'xampl', 'e'), ('ex', 'a', 'mple'), ('ex', 'am', 'ple'), ('ex', 'amp', 'le'), ('ex', 'ampl', 'e'), ('exa', 'm', 'ple'), ('exa', 'mp', 'le'), ('exa', 'mpl', 'e'), ('exam', 'p', 'le'), ('exam', 'pl', 'e'), ('examp', 'l', 'e')]
My code is:
s=["9"]*int(input())
for I in range(len(s)):
while not set(s[I])<=set('01'):s[i]=input(i)
print(bin(sum([int(x,2)for x in s]))[2:])
It is a binary adder but I don't think the range len or the inside can be replaced to make it smaller/better.
I think it's useful for tqdm if you have a large loop and you want to track progress. This will output a progress bar:
import numpy as np
from tqdm import tqdm

empty_list = np.full(len(items), np.nan)
for i in tqdm(range(len(items))):
    empty_list[i] = do_something(items[i])
This will not show progress, at least in the case I was using it for:
empty_list = np.full(len(items), np.nan)
for i, _ in tqdm(enumerate(items)):
    empty_list[i] = do_something(items[i])
It just showed the number of iterations, which is not as helpful.
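For completeness, tqdm can still show a full progress bar over enumerate if you pass the length explicitly via its total parameter; a sketch of that variant:

for i, item in tqdm(enumerate(items), total=len(items)):
    empty_list[i] = do_something(item)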
Related
I am looping over a list twice and want to catch all unique pairs, rather than all ordered combinations, i.e. order in the pair doesn't matter.
listy = [0, 1, 2]
out = []
for i in listy:
    for j in listy:
        out.append([i, j])
The output I'm getting is [[0,0],[0,1],[0,2],[1,0],[1,1],[1,2],[2,0],[2,1],[2,2]]
What I am looking for is [[0,0],[0,1],[0,2],[1,1],[1,2],[2,2]]
One possible solution is:
listy = [0, 1, 2]
out = []
for i in listy:
    for j in listy:
        pair = set([i, j])
        if pair not in out:
            out.append(pair)
This produces [{0},{0,1},{0,2},{1},{1,2},{2}]
However, this creates inefficiency in what is a heavy script (the lists are long), and I also do not want a list of sets; I want a list of lists.
Is there a better way to achieve this without using itertools? (I want an implementation I can also apply to JavaScript without too much rethinking.)
I don't know JavaScript at all, but if there is something analogous to a list comprehension, I would try this:
listy = [0,1,2]
pairs = [[i, j] for i in listy for j in listy if i <= j]
The content of pairs is exactly as you wanted:
[[0, 0], [0, 1], [0, 2], [1, 1], [1, 2], [2, 2]]
Option 1 - "minor fix"
A trivial "fix" would be:
listy = [0, 1, 2]
out = set()
for i in listy:
    for j in listy:
        if (i, j) not in out and (j, i) not in out:
            out.add((i, j))
The result is:
{(0, 1), (1, 2), (0, 0), (1, 1), (0, 2), (2, 2)}
However, this is not an efficient implementation, because we have to check twice whether an element is already in the set.
Option 2 - More efficient implementation
You could achieve your goal using a trivial scan of the array:
listy=[0,1,2]
out = [(i, j) for i in range(len(listy)) for j in range(i, len(listy))]
NOTE: I use tuples for the pairs; you could easily change that into a list of lists using:
out = [[i, j] for i in range(len(listy)) for j in range(i, len(listy))]
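Note that both comprehensions pair indices rather than the list's values; for listy = [0, 1, 2] the two coincide, but for a general list you would index back into listy, for example:

out = [[listy[i], listy[j]] for i in range(len(listy)) for j in range(i, len(listy))]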
The most straightforward translation of itertools.combinations_with_replacement(listy, 2) (which matches the behavior you want) into code that generates the results directly from an input list is:
listy = [0, 1, 2]
out = []
for idx, i in enumerate(listy):
    for j in listy[idx:]:
        out.append([i, j])
The only changes are the use of enumerate (to get the current index as you iterate) and slicing the listy used in the inner loop (so it starts at the same index as the current pass of the outer loop).
This gets the exact result requested with minimal overhead (it does make shallow copies of the list, of decreasing size, once for each inner loop, but this is fairly quick in Python; unless the list is huge, it should be a pretty minimal cost). If you need to avoid slicing, you can make the inner loop an index-based loop with indexing (but in practice, the overhead of indexing is high enough that it'll often lose to the slicing):
listy = [0, 1, 2]
out = []
for idx, i in enumerate(listy):
    for idxj in range(idx, len(listy)):
        out.append([i, listy[idxj]])
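For reference, the itertools call mentioned above produces the same pairs as tuples; converting them to lists is a one-liner:

from itertools import combinations_with_replacement

out = [list(pair) for pair in combinations_with_replacement(listy, 2)]
# [[0, 0], [0, 1], [0, 2], [1, 1], [1, 2], [2, 2]]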
In a homework set I'm working on, I've come across the following question, which I am having trouble answering in a Python-3 function:
"Write a function alternate : int list -> int that takes a list of
numbers and adds them with alternating sign. For example alternate
[1,2,3,4] = 1 - 2 + 3 - 4 = -2."
Full disclosure, the question was written with Standard ML in mind but I have been attempting to learn Python and came across the question. I'm imagining it involves some combination of:
splitting the list,
if [i] % 2 == 0:
and then concatenating the alternate plus and minus signs.
def alternate(l):
    return sum(l[::2]) - sum(l[1::2])
Take the sum of all the even-indexed elements and subtract the sum of all the odd-indexed elements. Empty lists sum to 0, so it coincidentally handles lists of length 0 or 1 without code specifically for those cases.
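A quick check against the example from the question:

>>> alternate([1, 2, 3, 4])
-2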
References:
list slice examples
sum()
Not using fancy modules or operators since you are learning Python.
>>> mylist = list(range(2, 20, 3))
>>> mylist
[2, 5, 8, 11, 14, 17]
>>> sum(item if i%2 else -1*item for i,item in enumerate(mylist, 1))
-9
>>>
How does it work?
>>> mylist = list(range(2, 20, 3))
>>> mylist
[2, 5, 8, 11, 14, 17]
enumerate(mylist, 1) returns each item in the list along with its index, starting the count at 1.
If the index is odd, add the item; if the index is even, add the negative of the item:
if i % 2:
    return item
else:
    return -1 * item
Add everything up using the sum builtin.
>>> sum(item if i%2 else -1*item for i,item in enumerate(mylist, 1))
-9
>>>
Although this already has an accepted answer I felt it would be better to also provide a solution that isn't a one-liner.
def alt_sum(lst):
    total = 0
    for i, value in enumerate(lst):
        # check whether the current index is odd or even:
        # if even then add, if odd then subtract
        if i % 2 == 0:
            total += value
        else:
            total -= value
    return total
>>> alt_sum([1, 2, 3, 4])
-2
my_list = range(3, 20, 2)
sum(item * ((-1)**index) for index, item in enumerate(my_list))
The result is 11 (3 - 5 + 7 - 9 + 11 - 13 + 15 - 17 + 19).
You could try this list comprehension:
sum([-e if c%2 else e for c,e in enumerate(yourlistylist)])
Here is one way using operator module:
In [21]: from operator import pos, neg
In [23]: ops = (pos, neg)
In [24]: sum(ops[ind%2](value) for ind, value in enumerate(lst))
Out[24]: -2
You can use cycle from itertools to alternate +/-
>>> from itertools import cycle
>>> data = [*range(1, 5)]
>>> sum(i * s for i, s in zip(data, cycle([1, -1])))
-2
I would like to loop through a list checking each item against the one following it.
Is there a way I can loop through all but the last item using for x in y? I would prefer to do it without using indexes if I can.
Note
freespace answered my actual question, which is why I accepted the answer, but SilentGhost answered the question I should have asked.
Apologies for the confusion.
for x in y[:-1]
If y is a generator, then the above will not work.
The easiest way to compare each sequence item with the following one:
for i, j in zip(a, a[1:]):
    # compare i (the current) to j (the following)
If you want to get all the elements in the sequence pairwise, use this approach (the pairwise function is from the examples in the itertools module docs).
from itertools import tee, chain

def pairwise(seq):
    a, b = tee(seq)
    next(b)
    return zip(a, b)
for current_item, next_item in pairwise(y):
    if compare(current_item, next_item):
        # do what you have to do
If you need to compare the last value to some special value, chain that value to the end:
for current, next_item in pairwise(chain(y, [None])):
If you meant comparing the nth item with the (n+1)th item in the list, you could also do it with:
>>> for i in range(len(list[:-1])):
...     print(list[i] > list[i+1])
Note there is no hard-coding going on there. This should be OK unless you feel otherwise.
To compare each item with the next one in an iterator without instantiating a list:
import itertools

it = (x for x in range(10))
data1, data2 = itertools.tee(it)
next(data2)
for a, b in zip(data1, data2):
    print(a, b)
This answers what the OP should have asked, i.e. traversing a list comparing consecutive elements (excellent SilentGhost answer), yet generalized to any group size (n-gram): 2, 3, ..., n:
zip(*(l[start:] for start in range(0, n)))
Examples:
l = range(0, 4) # [0, 1, 2, 3]
list(zip(*(l[start:] for start in range(0, 2)))) # == [(0, 1), (1, 2), (2, 3)]
list(zip(*(l[start:] for start in range(0, 3)))) # == [(0, 1, 2), (1, 2, 3)]
list(zip(*(l[start:] for start in range(0, 4)))) # == [(0, 1, 2, 3)]
list(zip(*(l[start:] for start in range(0, 5)))) # == []
Explanations:
l[start:] generates a list/generator starting from index start
*list or *generator: passes all elements to the enclosing function zip as if it was written zip(elem1, elem2, ...)
Note:
AFAIK, this code is as lazy as it can be. Not tested.
Is it possible to shuffle only a (continuous) part of a given list (or array in numpy)?
If this is not generally possible, how about the special case where the first element is fixed while the rest of the list/array need to be shuffled? For example, I have a list/array:
to_be_shuffled = [None, 'a', 'b', 'c', 'd', ...]
where the first element should always stay, while the rest are going to be shuffled repeatedly.
One possible way is to shuffle the whole list first and then check the first element: if it is not the special fixed element (e.g. None), swap its position with that of the special element (which would then require a lookup).
Is there any better way for doing this?
Why not just
import random
rest = to_be_shuffled[1:]
random.shuffle(rest)
shuffled_lst = [to_be_shuffled[0]] + rest
NumPy arrays don't copy data on slicing (a slice is a view), so you can shuffle part of the array in place:
numpy.random.shuffle(a[1:])
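A quick sketch of what that looks like end to end:

import numpy as np

a = np.array([0, 1, 2, 3, 4, 5])
np.random.shuffle(a[1:])  # a[1:] is a view, so the shuffle happens in place; a[0] stays put
print(a)                  # e.g. [0 4 2 5 1 3]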
I thought it would be interesting and educational to try to implement a slightly more general approach than what you're asking for. Here I shuffle the indices to the original list (rather than the list itself), excluding the locked indices, and use that index-list to cherry pick elements from the original list. This is not an in-place solution, but implemented as a generator so you can lazily pick elements.
Feel free to edit if you can improve it.
import random

def partial_shuf(input_list, fixed_indices):
    """Given an input_list, yield elements from that list in random order
    except where elements' indices are in fixed_indices."""
    fixed_indices = sorted(set(i for i in fixed_indices if i < len(input_list)))
    i = 0
    for fixed in fixed_indices:
        aslice = list(range(i, fixed))
        i = 1 + fixed
        random.shuffle(aslice)
        for j in aslice:
            yield input_list[j]
        yield input_list[fixed]
    aslice = list(range(i, len(input_list)))
    random.shuffle(aslice)
    for j in aslice:
        yield input_list[j]

print('\n'.join(' '.join((str(i), str(n)))
                for i, n in enumerate(partial_shuf(range(4, 36), [0, 4, 9, 17, 25, 40]))))

assert sorted(partial_shuf(range(4, 36), [0, 4, 9, 17, 25, 40])) == list(range(4, 36))
I took the shuffle function from the standard library random module (found in Lib\random.py) and modified it slightly so that it shuffles only a portion of the list specified by start and stop. It does this in place. Enjoy!
from random import randint

def shuffle(x, start=0, stop=None):
    if stop is None:
        stop = len(x)
    for i in reversed(range(start + 1, stop)):
        # pick an element in x[start: i+1] with which to exchange x[i]
        j = randint(start, i)
        x[i], x[j] = x[j], x[i]
For your purposes, calling this function with 1 as the start parameter should do the trick.
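For example, with a list shaped like the one in the question (the shuffled order will of course vary):

x = [None, 'a', 'b', 'c', 'd']
shuffle(x, start=1)
print(x)  # e.g. [None, 'c', 'a', 'd', 'b']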
I am having a hard time formulating my question so I'll just show by example.
x = ['abc', 'c', 'w', 't', '3']
a, b = random_split(x, 3) # first list should be length 3
# e.g. a => ['abc', 'w', 't']
# e.g. b => ['c', '3']
Is there an easy way of splitting a list into two random samples while maintaining the original ordering?
Edit: I know that I could use random.sample and then reorder, but I was hoping for an easy, simple, one line method.
Edit 2: Here's another solution, see if you can improve it:
import random

def random_split(l, a_size):
    a, b = [], []
    m = len(l)
    which = ([a] * a_size) + ([b] * (m - a_size))
    random.shuffle(which)
    for array, sample in zip(which, l):
        array.append(sample)
    return a, b
Edit 3: My concern in avoiding sorting was that in the best-case scenario it is O(N*log(N)). It should be possible to get a function that scales O(N). Unfortunately, none of the solutions posted so far actually achieves O(N). After a little thought I found one that works and is comparable to @PedroWerneck's answer in performance, though I'm not 100% sure that it is truly random.
def random_split(items, size):
    n = len(items)
    a, b = [], []
    for item in items:
        if size > 0 and random.random() < float(size) / n:
            b.append(item)
            size -= 1
        else:
            a.append(item)
        n -= 1
    return a, b
I believe it's impossible to do the limiting without sorting after splitting, while keeping the randomness, in a simpler way than just sampling and reordering.
If there were no limit, it would be as random as the RNG can be: just iterate over the list and choose randomly which destination list to send each value to:
>>> import random
>>> x = range(20)
>>> a = []
>>> b = []
>>> for v in x:
...     random.choice((a, b)).append(v)
...
>>> a
[0, 2, 3, 4, 6, 7, 10, 12, 15, 17]
>>> b
[1, 5, 8, 9, 11, 13, 14, 16, 18, 19]
If you can deal with some bias, you can stop appending to the first list when it reaches the limit and still use the solution above. If you'll deal with small lists like in your example, it shouldn't be a big deal to retry it until you get the first list length right.
If you want it to be really random and be able to limit the first list's size, then you'll have to give up and reorder at least one of the lists. The closest to a one-liner implementation I can think of is something like:
>>> x = list(range(20))
>>> b = x[:]
>>> a = sorted([b.pop(b.index(random.choice(b))) for n in range(limit)])
>>> a
[0, 1, 5, 10, 15, 16, 17]
>>> b
[2, 3, 4, 6, 7, 8, 9, 11, 12, 13, 14, 18, 19]
You have to sort a, but b keeps its order.
edit
Now, do you really have to avoid reordering at all costs? Many neat solutions were posted, and your second solution is very nice, but none of them is simpler, easier and shorter than:
def random_split(items, size):
    sample = set(random.sample(items, size))
    return sorted(sample), sorted(set(items) - sample)
Even considering both sorting operations, I think it's hard to beat that one for simplicity and efficiency. Consider how optimized Python's Timsort is and how most other methods have to iterate over the n items at least once for each list.
If you really must avoid reordering, I guess this one also works and is very easy and simple, but iterates twice:
def random_split(items, size):
    sample = set(random.sample(items, size))
    a = [x for x in items if x in sample]
    b = [x for x in items if x not in sample]
    return a, b
This is essentially the same as Hexparrot's solution with the set(sample) suggested by senderle to make comparisons O(1), and removing the redundant index sample and enumerate calls. You don't need that if you deal only with hashable objects.
How about this approach: sample randomly from the indices and build two lists with two list comprehensions, one for indices in the sample and one for those not in it:
import random

def random_split(lst, size):
    samp = set(random.sample(range(len(lst)), size))
    return ([v for i, v in enumerate(lst) if i in samp],
            [v for i, v in enumerate(lst) if i not in samp])

x = ['abc', 'c', 'w', 't', '3']
print(random_split(x, 3))
returns
(['abc', 't', '3'], ['c', 'w'])  # random and retains order
Ok, there have been lots of interesting suggestions, one of which I inadvertently duplicated in a previous version of this post. But here are two solutions that have not been presented in this exact form:
def random_split(seq, n):
    indices = set(random.sample(range(len(seq)), n))
    left_right = ([], [])
    for n, x in enumerate(seq):
        left_right[n not in indices].append(x)
    return left_right
This does just one pass through the list and produces a uniformly random partition of the list, maintaining order. It's a refinement of hexparrot's suggestion, which was the one I inadvertently duplicated. You could use the ternary operator and two separate lists, but this seems a tad cleaner to me. Using enumerate allows this to handle non-hashable items, as well as sequences with duplicates.
def random_split(seq, n):
    rnd_bools = random.sample((0,) * n + (1,) * (len(seq) - n), len(seq))
    left_right = ([], [])
    for b, x in zip(rnd_bools, seq):
        left_right[b].append(x)
    return left_right
This one feels right to me. It's a refinement of Jacob Eggers' second edit to the question. It's not very different, but instead of shuffling a list of lists, it shuffles a list of bools. I think it's a tad more comprehensible at first glance. It avoids the 2-line shuffle by using random.sample, which generates a copy; some may prefer the 2-line shuffle, and it's easily replaced.
Note that both of these work on the same basic principle: generate a sequence of bools and use them to index a left_right tuple; the first could easily be made almost identical to the second by pre-generating the boolean list.
Finally, the second solution can be converted into a profoundly ugly "one-liner" that I do not recommend -- obviously -- but that I display here for your amusement and ridicule:
random_split = lambda seq, n: reduce(lambda a, b: (a[0] + ([b[1]] if not b[0] else []), a[1] + ([b[1]] if b[0] else [])), zip(random.sample((0,) * n + (1,) * (len(seq) - n), len(seq)), seq), ([], []))
Here is a transcript you can turn into a function:
>>> import random
>>> a = [10, 20, 30, 40, 50, 60]
>>> keep = sorted(random.sample(range(len(a)), 3))
>>> keep
[0, 3, 4]
>>> ([a[i] for i in keep], [a[i] for i in range(len(a)) if i not in keep])
([10, 40, 50], [20, 30, 60])
A variation on the shuffle-sort theme...
def random_split(L, size):
    index = list(range(len(L)))
    random.shuffle(index)
    return ([L[i] for i in sorted(index[:size])],
            [L[i] for i in sorted(index[size:])])
I'm taking a guess that your random_split isn't supposed to give repeat elements.
If you don't have any duplicates in the original list, this will work as a one-liner as you have it in the original post, but it uses sorting. It's a very simple, if inefficient method of doing it:
import random

x = ['abc', 'c', 'w', 't', '3']

def random_split(x, n):
    k = x[:]
    random.shuffle(k)
    yield sorted(k[:n], key=x.index)
    yield sorted(k[n:], key=x.index)

a, b = random_split(x, 3)
Example of results:
>>> a
['c', 'w', 't']
>>> b
['abc', '3']
Here's something in a few lines:
from random import sample

x = ['abc', 'c', 'w', 't', '3']
sample_size = len(x) // 2
sample_set = set(sample(range(len(x)), sample_size))
split_list = [[x[i] for i in subset]
              for subset in (sorted(sample_set), sorted(set(range(len(x))) - sample_set))]