One-liner for nearly redundant list comprehensions - python

Consider two list comprehensions, gamma and delta, with nearly identical code. The only difference is the list being sliced, alpha or beta:
gamma = [alpha[i:i+30] for i in range(0,49980,30)]
delta = [beta[i:i+30] for i in range(0,49980,30)]
Is there a pythonic way to write this as a one liner (say gamma,delta = ... )?
I have a few other pieces of code that are similar in nature, and I'd like to simplify the code's seeming redundancy.

Although one-line list comprehensions are really useful, they aren't always the best choice. Here you apply the same chunking to both lists, so if you wanted to change the chunking, you would have to modify both lines.
Instead, we could use a function that chunks any given list and then use a one-line assignment to chunk alpha and beta.
def chunk(l):
    return [l[i:i+30] for i in range(0, len(l), 30)]
gamma, delta = chunk(alpha), chunk(beta)
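A slightly more general sketch of the same idea takes the chunk size as a parameter instead of hard-coding 30. The name chunked and the size argument are illustrative, not part of the answer above. Unlike the fixed range in the question, this covers the whole list, so the last chunk may be shorter than the others.

```python
def chunked(seq, size=30):
    """Split a sequence into consecutive slices of at most `size` items."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]

# Small stand-ins for the question's alpha and beta lists.
alpha = list(range(100))
beta = list(range(100, 200))

gamma, delta = chunked(alpha), chunked(beta)
print(len(gamma), gamma[-1])
```

Changing the chunk size then means changing one argument rather than two comprehensions.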

As for combining both list comprehensions into one expression, you can get gamma and delta by using zip with a single list comprehension:
gamma, delta = zip(*[(alpha[i:i+30], beta[i:i+30]) for i in range(0,50000,30)])
A small example to show how zip works (on Python 3, zip returns an iterator, hence the list() call):
>>> list(zip(*[(i, i+1) for i in range(0, 10, 2)]))
[(0, 2, 4, 6, 8), (1, 3, 5, 7, 9)]
Here our list comprehension returns a list of tuples:
>>> [(i, i+1) for i in range(0, 10, 2)]
[(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]
Then we unpack this list with *, and zip aggregates the elements at the same position from each of the tuples:
>>> list(zip(*[(i, i+1) for i in range(0, 10, 2)]))
[(0, 2, 4, 6, 8), (1, 3, 5, 7, 9)]
As an alternative, for dividing the list into evenly sized chunks, please take a look at "How do you split a list into evenly sized chunks?"

Just another way...
gamma, delta = ([src[i:i+30] for i in range(0,49980,30)] for src in (alpha, beta))
It's a bit faster than the accepted zip solution:
genny 3.439506340350704
zippy 4.3039169818228515
Code:
from timeit import timeit
alpha = list(range(60000))
beta = list(range(60000))
def genny():
    gamma, delta = ([src[i:i+30] for i in range(0,49980,30)] for src in (alpha, beta))
def zippy():
    gamma, delta = zip(*[(alpha[i:i+30], beta[i:i+30]) for i in range(0,50000,30)])
n = 1000
print('genny', timeit(genny, number=n))
print('zippy', timeit(zippy, number=n))

You can use a lambda expression:
g = lambda l: [l[i:i+30] for i in range(0,50000, 30)]
gamma, delta = g(alpha), g(beta)

Related

How can zip be used to chunk data into equal sized groups?

>>> n = 3
>>> x = range(n ** 2)
>>> xn = list(zip(*[iter(x)] * n))
In PEP 618, the author gives this example of how zip can be used to chunk data into equal sized groups.
How does it work?
I think it relies on an implementation detail of zip: the list [iter(x)] * n holds n references to the same iterator, so when zip takes one element from each of them in turn, it actually gets the next n elements of x, because every call advances the shared iterator.
This is because the following code replicates the above behavior:
n = 3
x = range(n ** 2)
xn = [iter(x)] * n
res = []
while True:
    try:
        col = []
        for element in xn:
            col.append(next(element))
        res.append(col)
    except StopIteration:
        break
However, I would like to make sure that this is indeed the case and that this is a reliable behavior that can be used to chunk elements of an iterable.
It's not really specific to zip, but you basically have that right. In effect, it's zipping 3 references to the same iterator, causing it to round-robin between them. During each iteration, one more element is consumed from the iterator.
Effectively, it's the same as doing this:
>>> n = 3
>>> x = range(n ** 2)
>>> a = b = c = iter(x)
>>> list(zip(a, b, c))
[(0, 1, 2), (3, 4, 5), (6, 7, 8)]
Note that it only produces equal sized groups and may drop elements (that part is a characteristic of zip, because it's limited by the smallest iterable, though you could use itertools.zip_longest if you want):
>>> n = 4
>>> x = range(n ** 2)
>>> a = b = c = iter(x)
>>> list(zip(a, b, c))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 14)]
It's not an implementation detail of zip. It's how iterators work in Python: they always "consume" and move forward.
For example:
whatever = iter([1, 2, 3])
next(whatever)
# 1
next(whatever)
# 2
What zip does is "advance" each object it's given in turn, and with your example [iter(x)] * n this basically becomes zip(whatever, whatever, whatever).
Since zip works in sequence, it takes the first next from whatever, then the next from whatever, which has already moved past the first value, so it yields 2. The one after that yields 3, and so on.
It's behaviour by design and the language guarantees it.
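As a rough sketch of how the shared-iterator idiom can be wrapped up, the helper below follows the well-known grouper pattern with itertools.zip_longest, so a short final group is padded instead of dropped. The name grouper and the fillvalue default are choices made for this illustration.

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    """Group items into n-tuples, padding the last group with fillvalue."""
    args = [iter(iterable)] * n  # n references to ONE shared iterator
    return zip_longest(*args, fillvalue=fillvalue)

# Plain zip stops at the first exhausted argument, so the trailing 9 is lost.
dropped = list(zip(*[iter(range(10))] * 3))
# zip_longest keeps it and pads the incomplete group.
padded = list(grouper(range(10), 3))

print(dropped)
print(padded)
```

Both calls rely on the same thing: every argument passed to zip is the same iterator object, so each next() call advances all of them.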

Functional programming vs list comprehension

Mark Lutz in his book "Learning Python" gives an example:
>>> [(x,y) for x in range(5) if x%2==0 for y in range(5) if y%2==1]
[(0, 1), (0, 3), (2, 1), (2, 3), (4, 1), (4, 3)]
>>>
a bit later he remarks that 'a map and filter equivalent' of this is possible though complex and nested.
The closest one I ended up with is the following:
>>> list(map(lambda x:list(map(lambda y:(y,x),filter(lambda x:x%2==0,range(5)))), filter(lambda x:x%2==1,range(5))))
[[(0, 1), (2, 1), (4, 1)], [(0, 3), (2, 3), (4, 3)]]
>>>
The order of tuples is different and nested list had to be introduced. I'm curious what would be the equivalent.
A note to append to @Kasramvd's explanation.
Readability is important in Python. It's one of the features of the language. Many will consider the list comprehension the only readable way.
Sometimes, however, especially when you are working with multiple iterations of conditions, it is clearer to separate your criteria from logic. In this case, using the functional method may be preferable.
from itertools import product
def even_and_odd(vals):
    return (vals[0] % 2 == 0) and (vals[1] % 2 == 1)
n = range(5)
res = list(filter(even_and_odd, product(n, n)))
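A quick way to convince yourself that the functional version matches the comprehension from the question is to build both and compare them. This is only a sanity-check sketch; the variable names functional and comprehension are made up for the comparison.

```python
from itertools import product

def even_and_odd(pair):
    """True when the first item is even and the second is odd."""
    x, y = pair
    return x % 2 == 0 and y % 2 == 1

n = range(5)

# product(n, n) iterates x in the outer loop and y in the inner loop,
# which is the same order the nested comprehension uses.
functional = list(filter(even_and_odd, product(n, n)))
comprehension = [(x, y) for x in n if x % 2 == 0 for y in n if y % 2 == 1]

print(functional == comprehension)
```

One difference worth keeping in mind: the comprehension skips odd x values before looping over y, while product always generates all 25 pairs and filters afterwards.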
One important point to notice is that your nested list comprehension is O(n^2): it loops over the product of two ranges. If you want to use map and filter, you have to create all the combinations. You can do that before or after filtering, but whatever you do, you can't get all those combinations with those two functions alone, unless you change the ranges and/or modify something else.
One completely functional approach is to use itertools.product() and filter as follows:
In [16]: from itertools import product
In [17]: list(filter(lambda x: x[0]%2==0 and x[1]%2==1, product(range(5), range(5))))
Out[17]: [(0, 1), (0, 3), (2, 1), (2, 3), (4, 1), (4, 3)]
Also note that a nested list comprehension with two iterations is generally more readable than multiple map/filter calls. As for performance, using built-in functions is faster than a list comprehension only when the functions you pass are themselves built-ins, so that everything runs at C level. When you break the chain with something like a lambda, which is a Python-level operation, your code won't be faster than a list comprehension.
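If you want to check the performance claim on your own machine, a rough timeit sketch along these lines should do. The function names and the choice of str as the built-in are assumptions made for the illustration, and timings vary with the machine and Python version, so no numbers are shown.

```python
from timeit import timeit

data = list(range(1000))

def with_builtin():
    # map driving a C-level built-in directly
    return list(map(str, data))

def with_lambda():
    # map calling back into a Python-level lambda for every item
    return list(map(lambda v: str(v), data))

def with_comprehension():
    return [str(v) for v in data]

for fn in (with_builtin, with_lambda, with_comprehension):
    print(fn.__name__, timeit(fn, number=200))
```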
I think the only confusing part of the expression [(x, y) for x in range(5) if x % 2 == 0 for y in range(5) if y % 2 == 1] is that it hides an implicit flatten operation.
Let's consider the simplified version of the expression first:
def even(x):
    return x % 2 == 0
def odd(x):
    return not even(x)
# list() is needed on Python 3, where map returns a lazy iterator
c = list(map(lambda x: list(map(lambda y: [x, y],
                                filter(odd, range(5)))),
             filter(even, range(5))))
print(c)
# i.e. for each even X we have a list of odd Ys:
# [
#   [[0, 1], [0, 3]],
#   [[2, 1], [2, 3]],
#   [[4, 1], [4, 3]]
# ]
However, we need pretty much the same thing, but as a flattened list [(0, 1), (0, 3), (2, 1), (2, 3), (4, 1), (4, 3)].
From the official Python docs we can borrow the flatten recipe:
from itertools import chain
flattened = list(chain.from_iterable(c)) # we need list() here to unroll an iterator
print(flattened)
Which is basically an equivalent for the following list comprehension expression:
flattened = [x for sublist in c for x in sublist]
print(flattened)
# ... which is basically equivalent to:
# result = []
# for sublist in c:
#     for x in sublist:
#         result.append(x)
range supports a step argument, so I came up with this solution, using itertools.chain.from_iterable to flatten the inner lists:
from itertools import chain
list(chain.from_iterable(
    map(
        lambda x:
            list(map(lambda y: (x, y), range(1, 5, 2))),
        range(0, 5, 2)
    )
))
Output:
Out[415]: [(0, 1), (0, 3), (2, 1), (2, 3), (4, 1), (4, 3)]

List comprehension and function returning multiple values

I wanted to use list comprehension to avoid writing a for loop appending to some lists. But can it work with a function that returns multiple values? I expected this (simplified example) code to work...
def calc(i):
    a = i * 2
    b = i ** 2
    return a, b
steps = [1,2,3,4,5]
ay, be = [calc(s) for s in steps]
... but it doesn't :(
The for-loop appending to each list works:
def calc(i):
    a = i * 2
    b = i ** 2
    return a, b
steps = [1,2,3,4,5]
ay, be = [],[]
for s in steps:
    a, b = calc(s)
    ay.append(a)
    be.append(b)
Is there a better way or do I just stick with this?
Use zip with *:
>>> ay, by = zip(*(calc(x) for x in steps))
>>> ay
(2, 4, 6, 8, 10)
>>> by
(1, 4, 9, 16, 25)
The horrendous "space efficient" version that returns iterators (Python 3):
from itertools import tee
from operator import itemgetter
ay, by = (map(itemgetter(i), results) for i, results in enumerate(tee(map(calc, steps), 2)))
(A nested generator expression such as (r[i] for r in results) would not work here, because it looks up i only when it is consumed, by which point i is 1 for both iterators.)
But basically just use zip because most of the time it's not worth the ugly.
Explanation:
zip(*(calc(x) for x in steps))
will do (calc(x) for x in steps) to get an iterator of [(2, 1), (4, 4), (6, 9), (8, 16), (10, 25)].
When you unpack, you do the equivalent of
zip((2, 1), (4, 4), (6, 9), (8, 16), (10, 25))
so all of the items are stored in memory at once. Proof:
def return_args(*args):
    return args
return_args(*(calc(x) for x in steps))
#>>> ((2, 1), (4, 4), (6, 9), (8, 16), (10, 25))
Hence all items are in memory at once.
So how does mine work?
map(calc, steps) is the same as (calc(x) for x in steps) (Python 3). This is an iterator. On Python 2, use imap or (calc(x) for x in steps).
tee(..., 2) gets two iterators that store the difference in iteration. If you iterate in lockstep the tee will take O(1) memory. If you do not, the tee can take up to O(n). So now we have a usage that lets us have O(1) memory up to this point.
enumerate obviously will keep this in constant memory.
map(itemgetter(i), results) returns an iterator that takes the ith item from each of the results. This means it receives, in this case, a pair ((2, 1), then (4, 4), etc. in turn) and yields the ith element of each.
Hence if you iterate ay and by in lockstep, constant memory will be used. The memory usage is proportional to the distance between the iterators. This is useful in many cases (imagine diffing a file or suchlike) but as I said most of the time it's not worth the ugly. There's an extra constant-factor overhead, too.
You should have shown us what
[calc(s) for s in xrange(5)]
does give you, i.e.
[(0, 0), (2, 1), (4, 4), (6, 9), (8, 16)]
While it isn't the 2 lists that you want, it is still a list of tuples. Furthermore, doesn't that look just like the result of the following?
zip((0, 2, 4, 6, 8), (0, 1, 4, 9, 16))
zip repackages a set of lists. Usually it is illustrated with 2 longer lists, but it works just as well with many short lists.
The third step is to remember that fn(*[arg1,arg2, ...]) = fn(arg1,arg2, ...), that is, the * unpacks a list.
Put it all together to get hcwhsa's answer.
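Putting those three steps together might look like the sketch below. It reuses calc and steps from the question; the intermediate name pairs is introduced only to make the transposition visible.

```python
def calc(i):
    return i * 2, i ** 2

steps = [1, 2, 3, 4, 5]

# Step 1: the comprehension gives one (a, b) tuple per step.
pairs = [calc(s) for s in steps]

# Steps 2 and 3: * unpacks the tuples as separate arguments,
# and zip regroups them by position (a transposition).
ay, be = zip(*pairs)

# zip yields tuples; convert if real lists are needed.
ay, be = list(ay), list(be)
print(ay, be)
```

One caveat: if steps is empty, zip(*pairs) produces nothing, so unpacking into two names raises ValueError.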

Return a sequence of a variable length whose summation is equal to a given integer

In the form f(x,y,z) where x is a given integer sum, y is the minimum length of the sequence, and z is the maximum length of the sequence. But for now let's pretend we're dealing with a sequence of a fixed length, because it will take me a long time to write the question otherwise.
So our function is f(x,r) where x is a given integer sum and r is the length of a sequence in the list of possible sequences.
For x = 10, and r = 2, these are the possible combinations:
1 + 9
2 + 8
3 + 7
4 + 6
5 + 5
Let's store that in Python as a list of pairs:
[(1,9), (2,8), (3,7), (4,6), (5,5)]
So usage looks like:
>>> f(10,2)
[(1,9), (2,8), (3,7), (4,6), (5,5)]
Back to the original question, where sequences are returned for each length in the range (y, z). In the form f(x,y,z) defined earlier, and leaving out sequences of length 1 (where y-z == 0), this would look like:
>>> f(10,1,3)
[{1: [(1,9), (2,8), (3,7), (4,6), (5,5)],
2: [(1,1,8), (1,2,7), (1,3,6) ... (2,4,4) ...],
3: [(1,1,1,7) ...]}]
So the output is a list of dictionaries where the value is a list of pairs. Not exactly optimal.
So my questions are:
Is there a library that handles this already?
If not, can someone help me write both of the functions I mentioned? (fixed sequence length first)?
Because of the huge gaps in my knowledge of fairly trivial math, could you ignore my approach to integer storage and use whatever structure makes the most sense?
Sorry about all of these arithmetic questions today. Thanks!
The itertools module will definitely be helpful as we're dealing with permutations - however, this looks suspiciously like a homework task...
Edit: Looks like fun though, so I'll do an attempt.
Edit 2: This what you want?
from itertools import combinations_with_replacement
from pprint import pprint
f = lambda target_sum, length: [sequence for sequence in combinations_with_replacement(range(1, target_sum+1), length) if sum(sequence) == target_sum]
def f2(target_sum, min_length, max_length):
    sequences = {}
    for length in range(min_length, max_length + 1):
        sequence = f(target_sum, length)
        if len(sequence):
            sequences[length] = sequence
    return sequences
if __name__ == "__main__":
    print("f(10,2):")
    print(f(10,2))
    print()
    print("f(10,1,3)")
    pprint(f2(10,1,3))
Output:
f(10,2):
[(1, 9), (2, 8), (3, 7), (4, 6), (5, 5)]
f(10,1,3)
{1: [(10,)],
2: [(1, 9), (2, 8), (3, 7), (4, 6), (5, 5)],
3: [(1, 1, 8),
(1, 2, 7),
(1, 3, 6),
(1, 4, 5),
(2, 2, 6),
(2, 3, 5),
(2, 4, 4),
(3, 3, 4)]}
The problem is known as Integer Partitions, and has been widely studied.
Here you can find a paper comparing the performance of several algorithms (and proposing a particular one), but there are a lot of references all over the Net.
I just wrote a recursive generator function, you should figure out how to get a list out of it yourself...
def f(x,y):
    if y == 1:
        yield (x, )
    elif y > 1:
        for head in range(1, x-y+2):
            for tail in f(x-head, y-1):
                yield tuple([head] + list(tail))
def f2(x,y,z):
    for u in range(y, z+1):
        for v in f(x, u):
            yield v
EDIT: I just saw that this is not exactly what you wanted: my version also generates duplicates where only the ordering differs. But you can simply filter them out by sorting each result and checking for duplicate tuples.
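An alternative to filtering afterwards is to never generate the reordered duplicates in the first place, by forcing each sequence to be non-decreasing. The sketch below is one way to do that; the function name partitions and the smallest parameter are inventions for this example, and it assumes length is at least 1.

```python
def partitions(total, length, smallest=1):
    """Yield non-decreasing tuples of `length` positive ints summing to `total`.

    Assumes length >= 1.
    """
    if length == 1:
        if total >= smallest:
            yield (total,)
        return
    # The head cannot exceed total // length, otherwise the remaining
    # parts (each at least as large as the head) would overshoot the total.
    for head in range(smallest, total // length + 1):
        for tail in partitions(total - head, length - 1, head):
            yield (head,) + tail

print(list(partitions(10, 2)))
print(list(partitions(10, 3)))
```

Because each head is at least as large as the one before it, every combination appears exactly once, in the same order as the combinations_with_replacement output shown earlier.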

Summing Consecutive Ranges Pythonically

I have a sumranges() function, which sums all the ranges of consecutive numbers found in a tuple of tuples. To illustrate:
def sumranges(nums):
    return sum([sum([1 for j in range(len(nums[i])) if
                     nums[i][j] == 0 or
                     nums[i][j - 1] + 1 != nums[i][j]]) for
                i in range(len(nums))])
>>> nums = ((1, 2, 3, 4), (1, 5, 6), (19, 20, 24, 29, 400))
>>> print sumranges(nums)
7
As you can see, it returns the number of ranges of consecutive numbers within the tuples, that is: len(((1, 2, 3, 4), (1,), (5, 6), (19, 20), (24,), (29,), (400,))) = 7. The tuples are always ordered.
My problem is that my sumranges() is terrible. I hate looking at it. I'm currently just iterating through the tuple and each subtuple, assigning a 1 if the number is not (1 + previous number), and summing the total. I feel like I am missing a much easier way to accomplish my stated objective. Does anyone know a more pythonic way to do this?
Edit: I have benchmarked all the answers given thus far. Thanks to all of you for your answers.
The benchmarking code is as follows, using a sample size of 100K:
from time import time
from random import randrange
nums = [sorted(list(set(randrange(1, 10) for i in range(10)))) for
        j in range(100000)]
for func in sumranges, alex, matt, redglyph, ephemient, ferdinand:
    start = time()
    result = func(nums)
    end = time()
    print ', '.join([func.__name__, str(result), str(end - start) + ' s'])
Results are as follows. Actual answer shown to verify that all functions return the correct answer:
sumranges, 250281, 0.54171204567 s
alex, 250281, 0.531121015549 s
matt, 250281, 0.843333005905 s
redglyph, 250281, 0.366822004318 s
ephemient, 250281, 0.805964946747 s
ferdinand, 250281, 0.405596971512 s
RedGlyph does edge out in terms of speed, but the simplest answer is probably Ferdinand's, and probably wins for most pythonic.
My 2 cents:
>>> sum(len(set(x - i for i, x in enumerate(t))) for t in nums)
7
It's basically the same idea as described in Alex's post, but using a set instead of itertools.groupby, resulting in a shorter expression. Since sets are implemented in C and len() of a set runs in constant time, this should also be pretty fast.
Consider:
>>> nums = ((1, 2, 3, 4), (1, 5, 6), (19, 20, 24, 29, 400))
>>> flat = [[(x - i) for i, x in enumerate(tu)] for tu in nums]
>>> print flat
[[1, 1, 1, 1], [1, 4, 4], [19, 19, 22, 26, 396]]
>>> import itertools
>>> print sum(1 for tu in flat for _ in itertools.groupby(tu))
7
>>>
We "flatten" the "increasing ramps" of interest by subtracting the index from the value, turning them into consecutive "runs" of identical values; then we identify and count the "runs" with the precious itertools.groupby. This seems to be a pretty elegant (and speedy) solution to your problem.
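For reference, here is a sketch of the index-subtraction trick as Python 3 functions, with the groupby and set variants side by side. The function names are made up for this example, and both assume each tuple holds distinct integers in increasing order, as the question states.

```python
from itertools import groupby

def count_runs_groupby(nums):
    """Count runs of consecutive integers using groupby on value - index."""
    return sum(
        1
        for t in nums
        for _ in groupby(x - i for i, x in enumerate(t))
    )

def count_runs_set(nums):
    """Same idea, counting distinct value - index offsets per tuple."""
    return sum(len({x - i for i, x in enumerate(t)}) for t in nums)

nums = ((1, 2, 3, 4), (1, 5, 6), (19, 20, 24, 29, 400))
print(count_runs_groupby(nums), count_runs_set(nums))
```

Within one run the offset x - i stays constant, and it grows whenever a gap appears, so each run maps to exactly one distinct offset.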
Just to show something closer to your original code:
def sumranges(nums):
    return sum( (1 for i in nums
                 for j, v in enumerate(i)
                 if j == 0 or v != i[j-1] + 1) )
The idea here was to:
avoid building intermediate lists but use a generator instead, it will save some resources
avoid using indices when you already have selected a subelement (i and v above).
The remaining sum() is still necessary with my example though.
Here's my attempt:
def ranges(ls):
    for l in ls:
        consec = False
        for (a,b) in zip(l, l[1:]+(None,)):
            if b == a+1:
                consec = True
            if b is not None and b != a+1:
                consec = False
            if consec:
                yield 1
'''
>>> nums = ((1, 2, 3, 4), (1, 5, 6), (19, 20, 24, 29, 400))
>>> print sum(ranges(nums))
7
'''
It looks at the numbers pairwise, checking if they are a consecutive pair (unless it's at the last element of the list). Each time there's a consecutive pair of numbers it yields 1.
This could probably be put together in a more compact form, but I think clarity would suffer:
def pairs(seq):
    for i in range(1,len(seq)):
        yield (seq[i-1], seq[i])
def isadjacent(pair):
    return pair[0]+1 == pair[1]
def sumrange(seq):
    return 1 + sum([1 for pair in pairs(seq) if not isadjacent(pair)])
def sumranges(nums):
    return sum([sumrange(seq) for seq in nums])
nums = ((1, 2, 3, 4), (1, 5, 6), (19, 20, 24, 29, 400))
print sumranges(nums) # prints 7
You could probably do this better if you had an IntervalSet class because then you would scan through your ranges to build your IntervalSet, then just use the count of set members.
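To make the IntervalSet idea concrete, here is a minimal sketch. There is no such class in the standard library; this IntervalSet and the count_ranges helper are hypothetical and handle only the case from the question, where values arrive in increasing order.

```python
class IntervalSet:
    """Hypothetical minimal interval set storing maximal runs as [start, end]."""

    def __init__(self, sorted_values=()):
        self.intervals = []
        for value in sorted_values:
            self.add(value)

    def add(self, value):
        # Assumes values are added in increasing order.
        if self.intervals and self.intervals[-1][1] + 1 == value:
            self.intervals[-1][1] = value          # extend the current run
        else:
            self.intervals.append([value, value])  # start a new run

    def __len__(self):
        return len(self.intervals)

def count_ranges(nums):
    """Total number of consecutive runs over all tuples."""
    return sum(len(IntervalSet(t)) for t in nums)

nums = ((1, 2, 3, 4), (1, 5, 6), (19, 20, 24, 29, 400))
print(count_ranges(nums))
```

A real interval set would also need to merge neighbours on out-of-order inserts, which this sketch deliberately leaves out.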
Some tasks don't always lend themselves to neat code, particularly if you need to write the code for performance.
There is a formula for this: the sum of the first n numbers is 1 + 2 + ... + n = n(n+1)/2. If you want the sum of the integers from i to j inclusive, it is j(j+1)/2 - (i-1)i/2, which simplifies to (i+j)(j-i+1)/2. It might not be pythonic but it is what I would use.
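As a small check of the arithmetic, the closed form for the sum of the integers from i through j inclusive can be compared against the built-in sum. The helper name sum_range is made up for this sketch, and it assumes i <= j. Note that it adds up the values in a range, which is a different quantity from the number of ranges the question asks for.

```python
def sum_range(i, j):
    """Sum of the integers i, i+1, ..., j inclusive, assuming i <= j."""
    # (first + last) * count / 2; the product is always even, so // is exact.
    return (i + j) * (j - i + 1) // 2

for i, j in [(1, 10), (5, 6), (19, 20), (7, 7)]:
    print(i, j, sum_range(i, j), sum(range(i, j + 1)))
```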
