I have a tuple list to_order such as:
to_order = [(0, 1), (1, 3), (2, 2), (3,2)]
And a list which gives the order to apply to the second element of each tuple of to_order:
order = [2, 1, 3]
So I am looking for a way to get this output:
ordered_list = [(2, 2), (3,2), (0, 1), (1, 3)]
Any ideas?
You can provide a key that will check the index (of the second element) in order and sort based on it:
to_order = [(0, 1), (1, 3), (2, 2), (3,2)]
order = [2, 1, 3]
print(sorted(to_order, key=lambda item: order.index(item[1]))) # [(2, 2), (3, 2), (0, 1), (1, 3)]
EDIT
Since, a discussion on time complexities was start... here ya go, the following algorithm runs in O(n+m), using Eric's input example:
N = 5
to_order = [(randrange(N), randrange(N)) for _ in range(10*N)]
order = list(set(pair[1] for pair in to_order))
shuffle(order)
def eric_sort(to_order, order):
bins = {}
for pair in to_order:
bins.setdefault(pair[1], []).append(pair)
return [pair for i in order for pair in bins[i]]
def alfasin_new_sort(to_order, order):
arr = [[] for i in range(len(order))]
d = {k:v for v, k in enumerate(order)}
for item in to_order:
arr[d[item[1]]].append(item)
return [item for sublist in arr for item in sublist]
from timeit import timeit
print("eric_sort", timeit("eric_sort(to_order, order)", setup=setup, number=1000))
print("alfasin_new_sort", timeit("alfasin_new_sort(to_order, order)", setup=setup, number=1000))
OUTPUT:
eric_sort 59.282021682999584
alfasin_new_sort 44.28244407700004
Algorithm
You can distribute the tuples in a dict of lists according to the second element and iterate over order indices to get the sorted list:
from collections import defaultdict
to_order = [(0, 1), (1, 3), (2, 2), (3, 2)]
order = [2, 1, 3]
bins = defaultdict(list)
for pair in to_order:
bins[pair[1]].append(pair)
print(bins)
# defaultdict(<class 'list'>, {1: [(0, 1)], 3: [(1, 3)], 2: [(2, 2), (3, 2)]})
print([pair for i in order for pair in bins[i]])
# [(2, 2), (3, 2), (0, 1), (1, 3)]
sort or index aren't needed and the output is stable.
This algorithm is similar to the mapping mentioned in the supposed duplicate. This linked answer only works if to_order and order have the same lengths, which isn't the case in OP's question.
Performance
This algorithm iterates twice over each element of to_order. The complexity is O(n). #alfasin's first algorithm is much slower (O(n * m * log n)), but his second one is also O(n).
Here's a list with 10000 random pairs between 0 and 1000. We extract the unique second elements and shuffle them in order to define order:
from random import randrange, shuffle
from collections import defaultdict
from timeit import timeit
from itertools import chain
N = 1000
to_order = [(randrange(N), randrange(N)) for _ in range(10*N)]
order = list(set(pair[1] for pair in to_order))
shuffle(order)
def eric(to_order, order):
bins = defaultdict(list)
for pair in to_order:
bins[pair[1]].append(pair)
return list(chain.from_iterable(bins[i] for i in order))
def alfasin1(to_order, order):
arr = [[] for i in range(len(order))]
d = {k:v for v, k in enumerate(order)}
for item in to_order:
arr[d[item[1]]].append(item)
return [item for sublist in arr for item in sublist]
def alfasin2(to_order, order):
return sorted(to_order, key=lambda item: order.index(item[1]))
print(eric(to_order, order) == alfasin1(to_order, order))
# True
print(eric(to_order, order) == alfasin2(to_order, order))
# True
print("eric", timeit("eric(to_order, order)", globals=globals(), number=100))
# eric 0.3117517130003762
print("alfasin1", timeit("alfasin1(to_order, order)", globals=globals(), number=100))
# alfasin1 0.36100843100030033
print("alfasin2", timeit("alfasin2(to_order, order)", globals=globals(), number=100))
# alfasin2 15.031453827000405
Another solution:
[item for key in order for item in filter(lambda x: x[1] == key, to_order)]
This solution works off of order first, filtering to_order for each key in order.
Equivalent:
ordered = []
for key in order:
for item in filter(lambda x: x[1] == key, to_order):
ordered.append(item)
Shorter, but I'm not aware of a way to do this with list comprehension:
ordered = []
for key in order:
ordered.extend(filter(lambda x: x[1] == key, to_order))
Note: This will not throw a ValueError if to_order contains a tuple x where x[1] is not in order.
I personally prefer the list objects sort function rather than the built-in sort which generates a new list rather than changing the list in place.
to_order = [(0, 1), (1, 3), (2, 2), (3,2)]
order = [2, 1, 3]
to_order.sort(key=lambda x: order.index(x[1]))
print(to_order)
>[(2, 2), (3, 2), (0, 1), (1, 3)]
A little explanation on the way: The key parameter of the sort method basically preprocesses the list and ranks all the values based on a measure. In our case order.index() looks at the first occurrence of the currently processed item and returns its position.
x = [1,2,3,4,5,3,3,5]
print x.index(5)
>4
Related
Let's say I have the following array:
a = [4,2,3,1,4]
Then I sort it:
b = sorted(A) = [1,2,3,4,4]
How could I have a list that map where each number was, ex:
position(b,a) = [3,1,2,0,4]
to clarify this list contains the positions not values)
(ps' also taking in account that first 4 was in position 0)
b = sorted(enumerate(a), key=lambda i: i[1])
This results is a list of tuples, the first item of which is the original index and second of which is the value:
[(3, 1), (1, 2), (2, 3), (0, 4), (4, 4)]
def position(a):
return sorted(range(len(a)), key=lambda k: a[k])
This question already has answers here:
How do I sort a dictionary by value?
(34 answers)
Closed 2 years ago.
Im trying to get the output from my dictionary to be ordered from their values in stead of keys
Question:
ValueCount that accepts a list as a parameter. Your function will return a list of tuples. Each tuple will contain a value and the number of times that value appears in the list
Desired outcome
>>> data = [1,2,3,1,2,3,5,5,4]
>>> ValueCount(data)
[(1, 2), (2, 2), (5, 1), (4, 1)]
My code and outcome
def CountValues(data):
dict1 = {}
for number in data:
if number not in dict1:
dict1[number] = 1
else:
dict1[number] += 1
tuple_data = dict1.items()
lst = sorted(tuple_data)
return(lst)
>>>[(1, 2), (2, 2), (3, 2), (4, 1), (5, 2)]
How would I sort it ascendingly by using the values instead of keys.
If you want to sort by the values(second item in each tuple), specify key:
sorted(tuple_data, key=lambda x: x[1])
Or with operator.itemgetter:
sorted(tuple_data, key=operator.itemgetter(1))
Also as a side note, your counting code:
dict1 = {}
for number in data:
if number not in dict1:
dict1[number] = 1
else:
dict1[number] += 1
Can be simplified with collections.Counter:
dict1 = collections.Counter(data)
With all the above in mind, your code could look like this:
from operator import itemgetter
from collections import Counter
def CountValues(data):
counts = Counter(data)
return sorted(counts.items(), key=itemgetter(1))
print(CountValues([1,2,3,1,2,3,5,5,4]))
# [(4, 1), (1, 2), (2, 2), (3, 2), (5, 2)]
You can use the sorted with the help of key parameter. it is not a in-place sorting . Thus it never modifies the original array.
for more
In [18]: data = [1,2,3,1,2,3,5,5,4]
In [19]: from collections import Counter
In [20]: x=Counter(data).items()
#Sorted OUTPUT
In [21]: sorted(list(x), key= lambda i:i[1] )
Out[21]: [(4, 1), (1, 2), (2, 2), (3, 2), (5, 2)]
In [22]: x
Out[22]: dict_items([(1, 2), (2, 2), (3, 2), (5, 2), (4, 1)])
"Sort" function uses first element of data.
To sort dictionary by its values you can use for-loop for values:
d={1:1,2:2,5:2,4:3,3:2}
x=[]
for i in set(sorted(d.values())):
for j in sorted(d.items()):
if j[1]==i:
x.append(j)
print(x)
if you don't convert sorted(d.values()) to set{} , it will check every value, even there are same numbers. For example if your values list is [1,2,2,3] , it will check items for value "2" two times and as a result your sorted list will contain repeated data which both have value "2" . But set{} keeps only one of each element and in this case, for-loop will check every different value of d.values() . And if there are items with a same value, code will sort them by keys because of sorted(d.items()) .
(to understand better you can use this code without that set{} and use d.items() instead of sorted(d.items()))
Mark Lutz in his book "Learning Python" gives an example:
>>> [(x,y) for x in range(5) if x%2==0 for y in range(5) if y%2==1]
[(0, 1), (0, 3), (2, 1), (2, 3), (4, 1), (4, 3)]
>>>
a bit later he remarks that 'a map and filter equivalent' of this is possible though complex and nested.
The closest one I ended up with is the following:
>>> list(map(lambda x:list(map(lambda y:(y,x),filter(lambda x:x%2==0,range(5)))), filter(lambda x:x%2==1,range(5))))
[[(0, 1), (2, 1), (4, 1)], [(0, 3), (2, 3), (4, 3)]]
>>>
The order of tuples is different and nested list had to be introduced. I'm curious what would be the equivalent.
A note to append to #Kasramvd's explanation.
Readability is important in Python. It's one of the features of the language. Many will consider the list comprehension the only readable way.
Sometimes, however, especially when you are working with multiple iterations of conditions, it is clearer to separate your criteria from logic. In this case, using the functional method may be preferable.
from itertools import product
def even_and_odd(vals):
return (vals[0] % 2 == 0) and (vals[1] %2 == 1)
n = range(5)
res = list(filter(even_and_odd, product(n, n)))
One important point that you have to notice is that your nested list comprehension is of O(n2) order. Meaning that it's looping over a product of two ranges. If you want to use map and filter you have to create all the combinations. You can do that after or before filtering but what ever you do you can't have all those combinations with those two functions, unless you change the ranges and/or modify something else.
One completely functional approach is to use itertools.product() and filter as following:
In [16]: from itertools import product
In [17]: list(filter(lambda x: x[0]%2==0 and x[1]%2==1, product(range(5), range(5))))
Out[17]: [(0, 1), (0, 3), (2, 1), (2, 3), (4, 1), (4, 3)]
Also note that using a nested list comprehension with two iterations is basically more readable than multiple map/filter functions. And regarding the performance using built-in funcitons is faster than list comprehension when your function are merely built-in so that you can assure all of them are performing at C level. When you break teh chain with something like a lambda function which is Python/higher lever operation your code won't be faster than a list comprehension.
I think the only confusing part in the expression [(x, y) for x in range(5) if x % 2 == 0 for y in range(5) if y % 2 == 1] is that there an implicit flatten operation is hidden.
Let's consider the simplified version of the expression first:
def even(x):
return x % 2 == 0
def odd(x):
return not even(x)
c = map(lambda x: map(lambda y: [x, y],
filter(odd, range(5))),
filter(even, range(5)))
print(c)
# i.e. for each even X we have a list of odd Ys:
# [
# [[0, 1], [0, 3]],
# [[2, 1], [2, 3]],
# [[4, 1], [4, 3]]
# ]
However, we need pretty the same but flattened list [(0, 1), (0, 3), (2, 1), (2, 3), (4, 1), (4, 3)].
From the official python docs we can grab the example of flatten function:
from itertools import chain
flattened = list(chain.from_iterable(c)) # we need list() here to unroll an iterator
print(flattened)
Which is basically an equivalent for the following list comprehension expression:
flattened = [x for sublist in c for x in sublist]
print(flattened)
# ... which is basically an equivalent to:
# result = []
# for sublist in c:
# for x in sublist:
# result.append(x)
Range support step argument, so I come up with this solution using itertools.chain.from_iterable to flatten inner list:
from itertools import chain
list(chain.from_iterable(
map(
lambda x:
list(map(lambda y: (x, y), range(1, 5, 2))),
range(0, 5, 2)
)
))
Output:
Out[415]: [(0, 1), (0, 3), (2, 1), (2, 3), (4, 1), (4, 3)]
I have a list of tuples: [(2, Operation.SUBSTITUTED), (1, Operation.DELETED), (2, Operation.INSERTED)]
I would like to sort this list in 2 ways:
First by its 1st value by ascending value, i.e. 1, 2, 3... etc
Second by its 2nd value by reverse alphabetical order, i.e. Operation.SUBSTITITUTED, Operation.INSERTED, Operation, DELETED
So the above list should be sorted as:
[(1, Operation.DELETED), (2, Operation.SUBSTITUTED), (2, Operation.INSERTED)]
How do I go about sort this list?
Since sorting is guaranteed to be stable, you can do this in 2 steps:
lst = [(2, 'Operation.SUBSTITUTED'), (1, 'Operation.DELETED'), (2, 'Operation.INSERTED')]
res_int = sorted(lst, key=lambda x: x[1], reverse=True)
res = sorted(res_int, key=lambda x: x[0])
print(res)
# [(1, 'Operation.DELETED'), (2, 'Operation.SUBSTITUTED'), (2, 'Operation.INSERTED')]
In this particular case, because the order of comparison can be easily inverted for integers, you can sort in one time using negative value for integer key & reverse:
lst = [(2, 'Operation.SUBSTITUTED'), (1, 'Operation.DELETED'), (2, 'Operation.INSERTED')]
res = sorted(lst, key=lambda x: (-x[0],x[1]), reverse=True)
result:
[(1, 'Operation.DELETED'), (2, 'Operation.SUBSTITUTED'), (2, 'Operation.INSERTED')]
negating the integer key cancels the "reverse" aspect, only kept for the second string criterion.
You can use this:
from operator import itemgetter
d = [(1, 'DELETED'), (2, 'INSERTED'), (2, 'SUBSTITUTED')]
d.sort(key=itemgetter(1),reverse=True)
d.sort(key=itemgetter(0))
print(d)
Another way using itemgetter from operator module:
from operator import itemgetter
lst = [(2, 'Operation.SUBSTITUTED'), (1, 'Operation.DELETED'), (2, 'Operation.INSERTED')]
inter = sorted(lst, key=itemgetter(1), reverse=True)
sorted_lst = sorted(inter, key=itemgetter(0))
print(sorted_lst)
# [(1, 'Operation.DELETED'), (2, 'Operation.SUBSTITUTED'), (2, 'Operation.INSERTED')]
If i had to reduce over a pair of values, how do i write the lambda expression for the same.
testr = [('r1', (1, 1)), ('r1', (1, 5)),('r2', (1, 1)),('r3', (1, 1))]
Desired output is
('r1', (2, 6)),('r2', (1, 1)),('r3', (1, 1))
Reduce it by Key:
.reduceByKey(lambda a, b: (a[0]+b[0], a[1]+b[1]))
You can make it more general purpose for arbitrary length tuples with zip:
.reduceByKey(lambda a, b: tuple(x+y for x,y in zip(a,b)))
it is not clear for me how reduce can use to reduce with lambda to reduce list tuples with different keys. My solution is can reduce list of tuples, but it uses function, which is perhaps too troublesome to do in pure lambda, if not impossible.
def reduce_tuple_list(tl):
import operator as op
import functools as fun
import itertools as it
# sort the list for groupby
tl = sorted(tl,key=op.itemgetter(0))
# this function with reduce lists with the same key
def reduce_with_same_key(tl):
def add_tuple(t1,t2):
k1, tl1 = t1
k2, tl2 = t2
if k1 == k2:
l1,r1 = tl1
l2,r2 = tl2
l = l1+l2
r = r1+r2
return k1,(l,r)
else:
return t1,t2
return tuple(fun.reduce(add_tuple, tl))
# group by keys
groups = []
for k, g in it.groupby(tl, key=op.itemgetter(0)):
groups.append(list(g))
new_list = []
# we need to add only lists whose length is greater than one
for el in groups:
if len(el) > 1: # reduce
new_list.append(reduce_with_same_key(el))
else: # single tuple without another one with the same key
new_list.append(el[0])
return new_list
testr = [('r1', (1, 1)), ('r3', (11, 71)), ('r1', (1, 5)),('r2', (1, 1)),('r3', (1, 1))]
>>> reduce_tuple_list(testr)
[('r1', (2, 6)), ('r2', (1, 1)), ('r3', (12, 72))]
you can use combineByKey method
testr = sc.parallelize((('r1', (1, 1)), ('r1', (1, 5)),('r2', (1, 1)),('r3', (1, 1))))
testr.combineByKey(lambda x:x,lambda x,y:(x[0]+y[0],x[1]+y[1]),lambda x,y:(x[0]+x[1],y[0]+y[1])).collect()