Functional programming vs list comprehension - python

Mark Lutz in his book "Learning Python" gives an example:
>>> [(x,y) for x in range(5) if x%2==0 for y in range(5) if y%2==1]
[(0, 1), (0, 3), (2, 1), (2, 3), (4, 1), (4, 3)]
>>>
a bit later he remarks that 'a map and filter equivalent' of this is possible though complex and nested.
The closest one I ended up with is the following:
>>> list(map(lambda x:list(map(lambda y:(y,x),filter(lambda x:x%2==0,range(5)))), filter(lambda x:x%2==1,range(5))))
[[(0, 1), (2, 1), (4, 1)], [(0, 3), (2, 3), (4, 3)]]
>>>
The order of tuples is different and nested list had to be introduced. I'm curious what would be the equivalent.

A note to append to #Kasramvd's explanation.
Readability is important in Python. It's one of the features of the language. Many will consider the list comprehension the only readable way.
Sometimes, however, especially when you are working with multiple iterations of conditions, it is clearer to separate your criteria from logic. In this case, using the functional method may be preferable.
from itertools import product
def even_and_odd(vals):
return (vals[0] % 2 == 0) and (vals[1] %2 == 1)
n = range(5)
res = list(filter(even_and_odd, product(n, n)))

One important point that you have to notice is that your nested list comprehension is of O(n2) order. Meaning that it's looping over a product of two ranges. If you want to use map and filter you have to create all the combinations. You can do that after or before filtering but what ever you do you can't have all those combinations with those two functions, unless you change the ranges and/or modify something else.
One completely functional approach is to use itertools.product() and filter as following:
In [16]: from itertools import product
In [17]: list(filter(lambda x: x[0]%2==0 and x[1]%2==1, product(range(5), range(5))))
Out[17]: [(0, 1), (0, 3), (2, 1), (2, 3), (4, 1), (4, 3)]
Also note that using a nested list comprehension with two iterations is basically more readable than multiple map/filter functions. And regarding the performance using built-in funcitons is faster than list comprehension when your function are merely built-in so that you can assure all of them are performing at C level. When you break teh chain with something like a lambda function which is Python/higher lever operation your code won't be faster than a list comprehension.

I think the only confusing part in the expression [(x, y) for x in range(5) if x % 2 == 0 for y in range(5) if y % 2 == 1] is that there an implicit flatten operation is hidden.
Let's consider the simplified version of the expression first:
def even(x):
return x % 2 == 0
def odd(x):
return not even(x)
c = map(lambda x: map(lambda y: [x, y],
filter(odd, range(5))),
filter(even, range(5)))
print(c)
# i.e. for each even X we have a list of odd Ys:
# [
# [[0, 1], [0, 3]],
# [[2, 1], [2, 3]],
# [[4, 1], [4, 3]]
# ]
However, we need pretty the same but flattened list [(0, 1), (0, 3), (2, 1), (2, 3), (4, 1), (4, 3)].
From the official python docs we can grab the example of flatten function:
from itertools import chain
flattened = list(chain.from_iterable(c)) # we need list() here to unroll an iterator
print(flattened)
Which is basically an equivalent for the following list comprehension expression:
flattened = [x for sublist in c for x in sublist]
print(flattened)
# ... which is basically an equivalent to:
# result = []
# for sublist in c:
# for x in sublist:
# result.append(x)

Range support step argument, so I come up with this solution using itertools.chain.from_iterable to flatten inner list:
from itertools import chain
list(chain.from_iterable(
map(
lambda x:
list(map(lambda y: (x, y), range(1, 5, 2))),
range(0, 5, 2)
)
))
Output:
Out[415]: [(0, 1), (0, 3), (2, 1), (2, 3), (4, 1), (4, 3)]

Related

Sort tuple list with another list

I have a tuple list to_order such as:
to_order = [(0, 1), (1, 3), (2, 2), (3,2)]
And a list which gives the order to apply to the second element of each tuple of to_order:
order = [2, 1, 3]
So I am looking for a way to get this output:
ordered_list = [(2, 2), (3,2), (0, 1), (1, 3)]
Any ideas?
You can provide a key that will check the index (of the second element) in order and sort based on it:
to_order = [(0, 1), (1, 3), (2, 2), (3,2)]
order = [2, 1, 3]
print(sorted(to_order, key=lambda item: order.index(item[1]))) # [(2, 2), (3, 2), (0, 1), (1, 3)]
EDIT
Since, a discussion on time complexities was start... here ya go, the following algorithm runs in O(n+m), using Eric's input example:
N = 5
to_order = [(randrange(N), randrange(N)) for _ in range(10*N)]
order = list(set(pair[1] for pair in to_order))
shuffle(order)
def eric_sort(to_order, order):
bins = {}
for pair in to_order:
bins.setdefault(pair[1], []).append(pair)
return [pair for i in order for pair in bins[i]]
def alfasin_new_sort(to_order, order):
arr = [[] for i in range(len(order))]
d = {k:v for v, k in enumerate(order)}
for item in to_order:
arr[d[item[1]]].append(item)
return [item for sublist in arr for item in sublist]
from timeit import timeit
print("eric_sort", timeit("eric_sort(to_order, order)", setup=setup, number=1000))
print("alfasin_new_sort", timeit("alfasin_new_sort(to_order, order)", setup=setup, number=1000))
OUTPUT:
eric_sort 59.282021682999584
alfasin_new_sort 44.28244407700004
Algorithm
You can distribute the tuples in a dict of lists according to the second element and iterate over order indices to get the sorted list:
from collections import defaultdict
to_order = [(0, 1), (1, 3), (2, 2), (3, 2)]
order = [2, 1, 3]
bins = defaultdict(list)
for pair in to_order:
bins[pair[1]].append(pair)
print(bins)
# defaultdict(<class 'list'>, {1: [(0, 1)], 3: [(1, 3)], 2: [(2, 2), (3, 2)]})
print([pair for i in order for pair in bins[i]])
# [(2, 2), (3, 2), (0, 1), (1, 3)]
sort or index aren't needed and the output is stable.
This algorithm is similar to the mapping mentioned in the supposed duplicate. This linked answer only works if to_order and order have the same lengths, which isn't the case in OP's question.
Performance
This algorithm iterates twice over each element of to_order. The complexity is O(n). #alfasin's first algorithm is much slower (O(n * m * log n)), but his second one is also O(n).
Here's a list with 10000 random pairs between 0 and 1000. We extract the unique second elements and shuffle them in order to define order:
from random import randrange, shuffle
from collections import defaultdict
from timeit import timeit
from itertools import chain
N = 1000
to_order = [(randrange(N), randrange(N)) for _ in range(10*N)]
order = list(set(pair[1] for pair in to_order))
shuffle(order)
def eric(to_order, order):
bins = defaultdict(list)
for pair in to_order:
bins[pair[1]].append(pair)
return list(chain.from_iterable(bins[i] for i in order))
def alfasin1(to_order, order):
arr = [[] for i in range(len(order))]
d = {k:v for v, k in enumerate(order)}
for item in to_order:
arr[d[item[1]]].append(item)
return [item for sublist in arr for item in sublist]
def alfasin2(to_order, order):
return sorted(to_order, key=lambda item: order.index(item[1]))
print(eric(to_order, order) == alfasin1(to_order, order))
# True
print(eric(to_order, order) == alfasin2(to_order, order))
# True
print("eric", timeit("eric(to_order, order)", globals=globals(), number=100))
# eric 0.3117517130003762
print("alfasin1", timeit("alfasin1(to_order, order)", globals=globals(), number=100))
# alfasin1 0.36100843100030033
print("alfasin2", timeit("alfasin2(to_order, order)", globals=globals(), number=100))
# alfasin2 15.031453827000405
Another solution:
[item for key in order for item in filter(lambda x: x[1] == key, to_order)]
This solution works off of order first, filtering to_order for each key in order.
Equivalent:
ordered = []
for key in order:
for item in filter(lambda x: x[1] == key, to_order):
ordered.append(item)
Shorter, but I'm not aware of a way to do this with list comprehension:
ordered = []
for key in order:
ordered.extend(filter(lambda x: x[1] == key, to_order))
Note: This will not throw a ValueError if to_order contains a tuple x where x[1] is not in order.
I personally prefer the list objects sort function rather than the built-in sort which generates a new list rather than changing the list in place.
to_order = [(0, 1), (1, 3), (2, 2), (3,2)]
order = [2, 1, 3]
to_order.sort(key=lambda x: order.index(x[1]))
print(to_order)
>[(2, 2), (3, 2), (0, 1), (1, 3)]
A little explanation on the way: The key parameter of the sort method basically preprocesses the list and ranks all the values based on a measure. In our case order.index() looks at the first occurrence of the currently processed item and returns its position.
x = [1,2,3,4,5,3,3,5]
print x.index(5)
>4

how to find combinations of elements with minimum length of N using itertools-Python

from itertools import combinations
a = [1,2,3]
combinations(a,2) #will give me ((1,2),(1,3),(2,3))
combinations(a,3) #will give me ((1,2,3),)
but what if I want results of different length which is in a array
e.g.
I want to find all combinations of given array[1,2,3] of length more than or equal to 2
so result should be ((1,2),(1,3),(2,3),(1,2,3))
something like c = combinations(a,>=2)
I tried to use lambda but its not working
c = combinations(a,lambda x: x for x in [2,3])
as well list comprehensive c = combinations(a,[x for x in [2,3]])
I know I can use a simple loop and then find out the combinations of diff length.
for l in [2,3]:
combinations(a,l)
But Is there any pythonic way to do this?
You could combine combinations and chain.from_iterable:
>>> from itertools import chain, combinations
>>> a = [1,2,3]
>>> n = 2
>>> cc = chain.from_iterable(combinations(a, i) for i in range(n, len(a)+1))
>>> list(cc)
[(1, 2), (1, 3), (2, 3), (1, 2, 3)]
chain.from_iterable here is flattening what the generator expression (combinations(a, i) for i in range(n, len(a)+1)) is producing. Otherwise you'd wind up with something like
>>> [list(combinations(a,i)) for i in range(n, len(a)+1)]
[[(1, 2), (1, 3), (2, 3)], [(1, 2, 3)]]
which is fine, but not quite in the format you were looking for.

How to implement this using map and filter?

How to write a statement using map and filter to get same result as this list comprehension expression:
[(x,y) for x in range(10) if x%5==0
for y in range(10) if y%5==1]
result:
[(0, 1), (0, 6), (5, 1), (5, 6)]
I know it seems to be pointless, but I'm just really curious
This is how I did it without comprehesions:
sum(map(lambda x: map(lambda y: (x,y), filter(lambda y: y%5==1,range(10))), filter(lambda x: x%5==0,range(10))),[])
Executing:
>>> sum(map(lambda x: map(lambda y: (x,y), filter(lambda y: y%5==1,range(10))), filter(lambda x: x%5==0,range(10))),[])
[(0, 1), (0, 6), (5, 1), (5, 6)]
The last, and (maybe)nasty trick is using sum to flatten the list. I was getting [[(0, 1), (0, 6)], [(5, 1), (5, 6)]].
Judging by the comments to one of the other answers, you want the cartesian product part of this - that is, this bit:
[(x,y) for x in range(10) for y in range(10)]
also done using just map and filter. This is impossible. The output of map has exactly as many elements as the iterable you input into it. The output of filter has length of no more than the length of the input. The length of a cartesian product is the product of the lengths of the inputs, which no combination of map and filter can give you.
To do the cartesian product part, you need some kind of nested loops. You can write them out yourself, as you have in the list comprehension, or you can use itertools.product:
product(range(10), range(10))
After that, you just need two applications of filter - to directly translate from a listcomp to map/filter, you need one application of filter for each if-part of the listcomp. The ifs are attached to each variable iteration rather than to the result, so you put them around each argument to product - which also avoids having to nest them creatively. Finally, we need to pass it through list since product gives an iterator.
It looks like this:
list(product(filter(lambda x: (x%5 == 0), range(10)), filter(lambda y: (y % 5 == 1), range(10))))

Sort list by nested tuple values

Is there a better way to sort a list by a nested tuple values than writing an itemgetter alternative that extracts the nested tuple value:
def deep_get(*idx):
def g(t):
for i in idx: t = t[i]
return t
return g
>>> l = [((2,1), 1),((1,3), 1),((3,6), 1),((4,5), 2)]
>>> sorted(l, key=deep_get(0,0))
[((1, 3), 1), ((2, 1), 1), ((3, 6), 1), ((4, 5), 2)]
>>> sorted(l, key=deep_get(0,1))
[((2, 1), 1), ((1, 3), 1), ((4, 5), 2), ((3, 6), 1)]
I thought about using compose, but that's not in the standard library:
sorted(l, key=compose(itemgetter(1), itemgetter(0))
Is there something I missed in the libs that would make this code nicer?
The implementation should work reasonably with 100k items.
Context: I would like to sort a dictionary of items that are a histogram. The keys are a tuples (a,b) and the value is the count. In the end the items should be sorted by count descending, a and b. An alternative is to flatten the tuple and use the itemgetter directly but this way a lot of tuples will be generated.
Yes, you could just use a key=lambda x: x[0][1]
Your approach is quite good, given the data structure that you have.
Another approach would be to use another structure.
If you want speed, the de-factor standard NumPy is the way to go. Its job is to efficiently handle large arrays. It even has some nice sorting routines for arrays like yours. Here is how you would write your sort over the counts, and then over (a, b):
>>> arr = numpy.array([((2,1), 1),((1,3), 1),((3,6), 1),((4,5), 2)],
dtype=[('pos', [('a', int), ('b', int)]), ('count', int)])
>>> print numpy.sort(arr, order=['count', 'pos'])
[((1, 3), 1) ((2, 1), 1) ((3, 6), 1) ((4, 5), 2)]
This is very fast (it's implemented in C).
If you want to stick with standard Python, a list containing (count, a, b) tuples would automatically get sorted in the way you want by Python (which uses lexicographic order on tuples).
I compared two similar solutions. The first one uses a simple lambda:
def sort_one(d):
result = d.items()
result.sort(key=lambda x: (-x[1], x[0]))
return result
Note the minus on x[1], because you want the sort to be descending on count.
The second one takes advantage of the fact that sort in Python is stable. First, we sort by (a, b) (ascending). Then we sort by count, descending:
def sort_two(d):
result = d.items()
result.sort()
result.sort(key=itemgetter(1), reverse=True)
return result
The first one is 10-20% faster (both on small and large datasets), and both complete under 0.5sec on my Q6600 (one core used) for 100k items. So avoiding the creation of tuples doesn't seem to help much.
This might be a little faster version of your approach:
l = [((2,1), 1), ((1,3), 1), ((3,6), 1), ((4,5), 2)]
def deep_get(*idx):
def g(t):
return reduce(lambda t, i: t[i], idx, t)
return g
>>> sorted(l, key=deep_get(0,1))
[((2, 1), 1), ((1, 3), 1), ((4, 5), 2), ((3, 6), 1)]
Which could be shortened to:
def deep_get(*idx):
return lambda t: reduce(lambda t, i: t[i], idx, t)
or even just simply written-out:
sorted(l, key=lambda t: reduce(lambda t, i: t[i], (0,1), t))

Using Python's list index() method on a list of tuples or objects?

Python's list type has an index() method that takes one parameter and returns the index of the first item in the list matching the parameter. For instance:
>>> some_list = ["apple", "pear", "banana", "grape"]
>>> some_list.index("pear")
1
>>> some_list.index("grape")
3
Is there a graceful (idiomatic) way to extend this to lists of complex objects, like tuples? Ideally, I'd like to be able to do something like this:
>>> tuple_list = [("pineapple", 5), ("cherry", 7), ("kumquat", 3), ("plum", 11)]
>>> some_list.getIndexOfTuple(1, 7)
1
>>> some_list.getIndexOfTuple(0, "kumquat")
2
getIndexOfTuple() is just a hypothetical method that accepts a sub-index and a value, and then returns the index of the list item with the given value at that sub-index. I hope
Is there some way to achieve that general result, using list comprehensions or lambas or something "in-line" like that? I think I could write my own class and method, but I don't want to reinvent the wheel if Python already has a way to do it.
How about this?
>>> tuple_list = [("pineapple", 5), ("cherry", 7), ("kumquat", 3), ("plum", 11)]
>>> [x for x, y in enumerate(tuple_list) if y[1] == 7]
[1]
>>> [x for x, y in enumerate(tuple_list) if y[0] == 'kumquat']
[2]
As pointed out in the comments, this would get all matches. To just get the first one, you can do:
>>> [y[0] for y in tuple_list].index('kumquat')
2
There is a good discussion in the comments as to the speed difference between all the solutions posted. I may be a little biased but I would personally stick to a one-liner as the speed we're talking about is pretty insignificant versus creating functions and importing modules for this problem, but if you are planning on doing this to a very large amount of elements you might want to look at the other answers provided, as they are faster than what I provided.
Those list comprehensions are messy after a while.
I like this Pythonic approach:
from operator import itemgetter
tuple_list = [("pineapple", 5), ("cherry", 7), ("kumquat", 3), ("plum", 11)]
def collect(l, index):
return map(itemgetter(index), l)
# And now you can write this:
collect(tuple_list,0).index("cherry") # = 1
collect(tuple_list,1).index("3") # = 2
If you need your code to be all super performant:
# Stops iterating through the list as soon as it finds the value
def getIndexOfTuple(l, index, value):
for pos,t in enumerate(l):
if t[index] == value:
return pos
# Matches behavior of list.index
raise ValueError("list.index(x): x not in list")
getIndexOfTuple(tuple_list, 0, "cherry") # = 1
One possibility is to use the itemgetter function from the operator module:
import operator
f = operator.itemgetter(0)
print map(f, tuple_list).index("cherry") # yields 1
The call to itemgetter returns a function that will do the equivalent of foo[0] for anything passed to it. Using map, you then apply that function to each tuple, extracting the info into a new list, on which you then call index as normal.
map(f, tuple_list)
is equivalent to:
[f(tuple_list[0]), f(tuple_list[1]), ...etc]
which in turn is equivalent to:
[tuple_list[0][0], tuple_list[1][0], tuple_list[2][0]]
which gives:
["pineapple", "cherry", ...etc]
You can do this with a list comprehension and index()
tuple_list = [("pineapple", 5), ("cherry", 7), ("kumquat", 3), ("plum", 11)]
[x[0] for x in tuple_list].index("kumquat")
2
[x[1] for x in tuple_list].index(7)
1
Inspired by this question, I found this quite elegant:
>>> tuple_list = [("pineapple", 5), ("cherry", 7), ("kumquat", 3), ("plum", 11)]
>>> next(i for i, t in enumerate(tuple_list) if t[1] == 7)
1
>>> next(i for i, t in enumerate(tuple_list) if t[0] == "kumquat")
2
I would place this as a comment to Triptych, but I can't comment yet due to lack of rating:
Using the enumerator method to match on sub-indices in a list of tuples.
e.g.
li = [(1,2,3,4), (11,22,33,44), (111,222,333,444), ('a','b','c','d'),
('aa','bb','cc','dd'), ('aaa','bbb','ccc','ddd')]
# want pos of item having [22,44] in positions 1 and 3:
def getIndexOfTupleWithIndices(li, indices, vals):
# if index is a tuple of subindices to match against:
for pos,k in enumerate(li):
match = True
for i in indices:
if k[i] != vals[i]:
match = False
break;
if (match):
return pos
# Matches behavior of list.index
raise ValueError("list.index(x): x not in list")
idx = [1,3]
vals = [22,44]
print getIndexOfTupleWithIndices(li,idx,vals) # = 1
idx = [0,1]
vals = ['a','b']
print getIndexOfTupleWithIndices(li,idx,vals) # = 3
idx = [2,1]
vals = ['cc','bb']
print getIndexOfTupleWithIndices(li,idx,vals) # = 4
ok, it might be a mistake in vals(j), the correction is:
def getIndex(li,indices,vals):
for pos,k in enumerate(lista):
match = True
for i in indices:
if k[i] != vals[indices.index(i)]:
match = False
break
if(match):
return pos
z = list(zip(*tuple_list))
z[1][z[0].index('persimon')]
tuple_list = [("pineapple", 5), ("cherry", 7), ("kumquat", 3), ("plum", 11)]
def eachtuple(tupple, pos1, val):
for e in tupple:
if e == val:
return True
for e in tuple_list:
if eachtuple(e, 1, 7) is True:
print tuple_list.index(e)
for e in tuple_list:
if eachtuple(e, 0, "kumquat") is True:
print tuple_list.index(e)
Python's list.index(x) returns index of the first occurrence of x in the list. So we can pass objects returned by list compression to get their index.
>>> tuple_list = [("pineapple", 5), ("cherry", 7), ("kumquat", 3), ("plum", 11)]
>>> [tuple_list.index(t) for t in tuple_list if t[1] == 7]
[1]
>>> [tuple_list.index(t) for t in tuple_list if t[0] == 'kumquat']
[2]
With the same line, we can also get the list of index in case there are multiple matched elements.
>>> tuple_list = [("pineapple", 5), ("cherry", 7), ("kumquat", 3), ("plum", 11), ("banana", 7)]
>>> [tuple_list.index(t) for t in tuple_list if t[1] == 7]
[1, 4]
I guess the following is not the best way to do it (speed and elegance concerns) but well, it could help :
from collections import OrderedDict as od
t = [('pineapple', 5), ('cherry', 7), ('kumquat', 3), ('plum', 11)]
list(od(t).keys()).index('kumquat')
2
list(od(t).values()).index(7)
7
# bonus :
od(t)['kumquat']
3
list of tuples with 2 members can be converted to ordered dict directly, data structures are actually the same, so we can use dict method on the fly.
This is also possible using Lambda expressions:
l = [('rana', 1, 1), ('pato', 1, 1), ('perro', 1, 1)]
map(lambda x:x[0], l).index("pato") # returns 1
Edit to add examples:
l=[['rana', 1, 1], ['pato', 2, 1], ['perro', 1, 1], ['pato', 2, 2], ['pato', 2, 2]]
extract all items by condition:
filter(lambda x:x[0]=="pato", l) #[['pato', 2, 1], ['pato', 2, 2], ['pato', 2, 2]]
extract all items by condition with index:
>>> filter(lambda x:x[1][0]=="pato", enumerate(l))
[(1, ['pato', 2, 1]), (3, ['pato', 2, 2]), (4, ['pato', 2, 2])]
>>> map(lambda x:x[1],_)
[['pato', 2, 1], ['pato', 2, 2], ['pato', 2, 2]]
Note: The _ variable only works in the interactive interpreter. More generally, one must explicitly assign _, i.e. _=filter(lambda x:x[1][0]=="pato", enumerate(l)).
I came up with a quick and dirty approach using max and lambda.
>>> tuple_list = [("pineapple", 5), ("cherry", 7), ("kumquat", 3), ("plum", 11)]
>>> target = 7
>>> max(range(len(tuple_list)), key=lambda i: tuple_list[i][1] == target)
1
There is a caveat though that if the list does not contain the target, the returned index will be 0, which could be misleading.
>>> target = -1
>>> max(range(len(tuple_list)), key=lambda i: tuple_list[i][1] == target)
0

Categories