I am sorting a tuple of tuples by second item. The tuple I need to sort is this: tuple1 = (('a', 23),('b', 37),('c', 11), ('d',29)). The solution to this program given on the internet is as follows:
tuple1 = (('a', 23),('b', 37),('c', 11), ('d',29))
print(tuple(sorted(list(tuple1), key=lambda x: x[1])))
What I can't understand is the function of key=lambda x: x[1] expression in the code. What does the keyword key denote here? I know lambda is an anonymous function. But how is it working in this code to give the desired output?
The key argument is meant to specify how the elements are compared during the sort. You can refer to the following link:
https://www.w3schools.com/python/ref_func_sorted.asp
For a more in-depth explanation of sorted and its arguments, have a look at the following link:
https://developers.google.com/edu/python/sorting
In your case, you sort the list of tuples based on the second element from each tuple.
The keyword key is an argument to sorted. It is a function that is called on each element of list(tuple1), and its return value is what gets compared when sorting.
The lambda function simply selects the second element of each tuple in the list, so we're comparing the ints, not the characters.
For a list of T, key takes a function T -> int (or anything that's sortable, but ints behave in the most expected way), and sorts by those. Here T = (str, int) and the lambda returns the int.
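To make the role of key concrete, here is a minimal sketch that replaces the lambda with a named function (second_item is just an illustrative name, not something from the question) and prints the values sorted actually compares:

```python
tuple1 = (('a', 23), ('b', 37), ('c', 11), ('d', 29))


def second_item(pair):
    """Return the value that sorted() should compare for this element."""
    return pair[1]


# sorted() calls the key function once per element and compares the results.
keys = [second_item(pair) for pair in tuple1]
print(keys)  # [23, 37, 11, 29]

# sorted() accepts any iterable, so the extra list() call is not required.
result = tuple(sorted(tuple1, key=second_item))
print(result)  # (('c', 11), ('a', 23), ('d', 29), ('b', 37))
```

The named function and the lambda are interchangeable here; the lambda is only a shorter way to write the same one-argument function.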
Hi I am new to python and wanted to sort these values based on the numeric values present in each tuple
b={('shoe',0.98),('bag',0.67),('leather',0.77)}
I have tried changing it into a list but then the tuple elements cannot be changed
Thanks in advance
Python sets are unordered, so you can’t sort them. But you can sort the elements in a set using the sorted function and passing a lambda that selects the second item of a tuple (since the set elements are tuples and you want to sort by the second elements of the tuples) to the key parameter. This returns a list:
out = sorted(b, key=lambda x:x[1])
Output:
[('bag', 0.67), ('leather', 0.77), ('shoe', 0.98)]
Or you can use operator.itemgetter (there is also attrgetter):
from operator import itemgetter
out = sorted(b, key=itemgetter(1))
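As a small sketch of the two helpers mentioned above: itemgetter works on indices and attrgetter on attribute names. The Item namedtuple below is a hypothetical record type added only for illustration.

```python
from collections import namedtuple
from operator import attrgetter, itemgetter

b = {('shoe', 0.98), ('bag', 0.67), ('leather', 0.77)}

# Highest score first: reverse=True flips the order produced by the key.
descending = sorted(b, key=itemgetter(1), reverse=True)
print(descending)  # [('shoe', 0.98), ('leather', 0.77), ('bag', 0.67)]

# attrgetter does the same job for attributes instead of indices.
# Item is a hypothetical record type used only in this example.
Item = namedtuple('Item', ['name', 'score'])
items = [Item(name, score) for name, score in b]
by_score = sorted(items, key=attrgetter('score'))
print([item.name for item in by_score])  # ['bag', 'leather', 'shoe']
```

Either way the result is a new list; the original set is left unchanged.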
I am getting the following error on console
newList = sorted(l,key=sort)
NameError: name 'sort' is not defined
after executing the following piece of code:
if __name__ == '__main__':
    l = [[input(), float(input())] for _ in range(int(input()))]
    print(l)
    new = sorted(l, key=sort)
    print(new)
From the given article I learned that key parameter in sorted() can use user/inbuilt Python methods. I am trying to sort my list alphabetically so I passed key=sort with an understanding that it will sort my list alphabetically.
Please help me out where am I going wrong here.
The sorted() function has an optional parameter called key, which takes a function as its value. This key function is applied to each element before sorting: it takes the element and returns one value, which is then used for comparisons during the sort instead of the original element.
Example:
If there is a list of tuples,
li = [('p1', 20), ('p2', 10), ('p3', 30)]
and you want to sort the elements of the list in such a way that the resulting order is as follows:
Output: li = [('p2', 10), ('p1', 20), ('p3', 30)]
Then all you need to do is sort the tuples based on the 2nd element in each tuple.
To do so, we need a custom function that is applied to each element; the elements are then sorted based on the representative of each element (i.e. the KEY).
Hence, the syntax for this is as follows:
sorted(li, key=lambda x: x[1])
Edit:
Sort is in itself a function, which is used to arrange elements in a specific order. But, it cannot be used to extract a representative (i.e a KEY) for each element in the list.
Well, sort is not a built-in Python function. It is a method on lists, not a globally accessible name, which is why you get the NameError. Even if you did use it (through key=list.sort), it would not give you your desired result, because the key function is applied to each element to produce the value that is compared.
By default (without specifying the key parameter), sorted should already sort strings alphabetically (sort of), same with lists containing lists with a string as the first item. If you want to sort "A" and "a" together (as well as "Z" and "z"), you could use the key function str.lower, but you would need to apply that only to the first element of your inner lists:
new = sorted(l, key=lambda x: x[0].lower())
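Here is a short sketch of the difference, using a hard-coded list as a hypothetical stand-in for the values read from input():

```python
# Hypothetical stand-in for the [name, score] pairs read from input().
l = [['bob', 41.0], ['Carol', 52.0], ['alice', 37.5]]

# Default ordering compares the inner lists element by element, and
# uppercase letters sort before lowercase ones.
default_order = sorted(l)
print([name for name, _ in default_order])  # ['Carol', 'alice', 'bob']

# Lowercasing the name in the key function makes the sort case-insensitive.
folded = sorted(l, key=lambda x: x[0].lower())
print([name for name, _ in folded])  # ['alice', 'bob', 'Carol']
```

Note that the key only affects the comparison; the original strings are returned unchanged.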
I am trying to understand how this works:
my_dict = {'a':2,'b':1}
min(my_dict, key=my_dict.get)
produces
b
Which is a really cool feature and one I want to understand better.
Based on the documentation
min(iterable[, key])
Return the smallest item in an iterable or the smallest of two or more arguments...
The optional key argument specifies a one-argument ordering function like that used for list.sort(). The key argument, if supplied, must be in keyword form (for example, min(a,b,c,key=func)).
Where can I find out more about available functions? In the case of a dictionary, is it all the dictionary methods?
Edit: I came across this today:
max(enumerate(array_x), key=operator.itemgetter(1))
Still looking for information on available keyword functions for min/max
The code you have written is
my_dict = {'a':2,'b':1}
min(my_dict, key=my_dict.get)
This works because of how the min function handles its key argument.
So, what does min do?
min(a, b, c, ...[, key=func]) -> value
With a single iterable argument, return its lowest item. With two or more arguments, return the lowest argument.
The key here is used to pass a one-argument function that is applied to each item; the items are then compared by its return values.
Example: output the max by length of list, where both arguments are lists.
>>> max([1,2,3,4], [3,4,5], key=len)
[1, 2, 3, 4]
But what if I want the max from a list l of tuples, comparing by the second element of each tuple? Here we can pass a function as the key, as described in the official documentation. A def statement is a compound statement and cannot be used where an expression is required, which is why a lambda is sometimes used instead.
Note that the body of a lambda is equivalent to what you would put in the return statement of a def. Thus you cannot use statements inside a lambda; only expressions are allowed.
>>> max(l, key = lambda i : i[1])
(1, 9)
# Or
>>> import operator
>>> max(l, key = operator.itemgetter(1))
(1, 9)
So the function to use depends on the elements of the iterable and on the criterion you want to compare them by.
In your example, min iterates over your dictionary, which yields its keys, and the key function is the dictionary's get method.
The method get() returns the value for the given key. If the key is not available, it returns the default value None.
Here min calls my_dict.get(k) for each key k and compares the returned values, so min gives you the key that has the minimum value.
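To see what min is doing with my_dict.get, here is a rough hand-written equivalent. It is only a sketch of the behaviour, not how min is actually implemented:

```python
my_dict = {'a': 2, 'b': 1}

# Iterating over a dict yields its keys, so the loop sees 'a' and 'b'.
# For each key, my_dict.get(key) supplies the value that is compared.
best_key = None
best_value = None
for k in my_dict:
    value = my_dict.get(k)
    if best_key is None or value < best_value:
        best_key, best_value = k, value

print(best_key)                       # b
print(min(my_dict, key=my_dict.get))  # b
```

The item returned is always an element of the iterable (a dictionary key here), never the value produced by the key function.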
For max(enumerate(array_x), key=operator.itemgetter(1))
we want to compare the values of array instead of their indices. So we have enumerated the array.
enumerate(thing), where thing is either an iterator or a sequence, returns an iterator that yields (0, thing[0]), (1, thing[1]), (2, thing[2]), and so on.
Now we have used the itemgetter function of the operator module. operator.itemgetter(n) constructs a callable that takes an object supporting indexing (e.g. a list or tuple) as input and fetches the element at index n from it.
You can also use a lambda function here:
max(enumerate(array_x), key=lambda i: i[1])
So the range of functions you can pass as key is limited only by your use case. Many functions will work; the only requirement is that the function returns the value to be used as the criterion for the comparison.
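As an illustration of the enumerate pattern, the following sketch finds both the position and the value of the largest element. The contents of array_x are made up for the example:

```python
from operator import itemgetter

array_x = [4, 9, 2, 7]  # hypothetical data

# enumerate pairs every value with its index.
print(list(enumerate(array_x)))  # [(0, 4), (1, 9), (2, 2), (3, 7)]

# Compare the pairs by their second item (the value), keep the whole pair.
index, value = max(enumerate(array_x), key=itemgetter(1))
print(index, value)  # 1 9
```

Without the key, max would compare the (index, value) pairs starting with the index and simply return the last pair.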
Imagine you have objects with some attribute you want to use to get the minimum value:
min(my_objects, key=lambda x: x.something)
This will give you the object with the smallest something attribute.
The same thing exists for example in sorted() so you can easily sort by a value derived from the object. Imagine you have a list of people and want to sort by first name, then last name:
people.sort(key=lambda x: (x.first_name, x.last_name))
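A self-contained sketch of that last example could look like this. The Person class and the sample names are assumptions made for illustration:

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical class; only the two attributes used by the key matter.
    first_name: str
    last_name: str


people = [
    Person('Ada', 'Young'),
    Person('Bob', 'Adams'),
    Person('Ada', 'Byron'),
]

# Tuples compare item by item, so ties on first_name fall back to last_name.
people.sort(key=lambda x: (x.first_name, x.last_name))
print([(p.first_name, p.last_name) for p in people])
# [('Ada', 'Byron'), ('Ada', 'Young'), ('Bob', 'Adams')]
```

Returning a tuple from the key function is the usual way to sort by several criteria at once.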
I have an RDD of the form:
(2, [hello, hi, how, are, you])
I need to map these tuple like:
((2, hello), (2, hi), (2, how), (2, are), (2, you))
I am trying this in python:
PairRDD = rdd.flatMap(lambda (k,v): v.split(',')).map(lambda x: (k,x)).reduceByKey())
This will not work as I do not have k in map transformation. I am not sure how to do it ? Any comments ?
Thanking you in advance.
I think your core issue is a misplaced right parenthesis: the per-element mapping has to happen inside the function passed to flatMap, where the key is still in scope. I've tested the equivalent in Scala. In PySpark, note that a Python list has no map method and that lambda (k,v): only works in Python 2, so the version below uses a list comprehension instead:
PairRDD = rdd.flatMap(lambda kv: [(kv[0], x) for x in kv[1].split(',')])
The value is split into a list of strings, then that list is turned into a list of (key, string) tuples, and then that list is returned to flatMap, which splits it out into multiple rows in the RDD. With the additional right parenthesis after v.split(','), you were throwing away the key (since you only returned a list of strings).
Are the key values unique in the original dataset? If so and you want a list of tuples, then instead of flatMap use map and you'll get what you want without a shuffle. If you do want to combine multiple rows from the original dataset, then a groupByKey is called for, not reduceByKey.
I'm also curious if the split is necessary--is your tuple (Int, String) or (Int, List(String))?
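To illustrate the idea without a Spark cluster, here is a plain-Python sketch of what a flatMap-style operation does. flat_map is a hypothetical helper written for this example, not a PySpark API, and the value is assumed to be a comma-separated string:

```python
from itertools import chain


def flat_map(func, rows):
    """Apply func to every row and flatten the resulting lists by one level."""
    return list(chain.from_iterable(func(row) for row in rows))


# Assumption: each row is (key, comma-separated string).
rows = [(2, 'hello,hi,how,are,you')]

# The key stays in scope because the pairs are built inside the function.
pairs = flat_map(lambda kv: [(kv[0], word) for word in kv[1].split(',')], rows)
print(pairs)
# [(2, 'hello'), (2, 'hi'), (2, 'how'), (2, 'are'), (2, 'you')]
```

If the value is already a list of strings, the split call would be dropped and the comprehension would iterate over kv[1] directly.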
I have a list of tuples
student_tuples = [
('john', 'A', 15),
('jane', 'B', 12),
('dave', 'B', 10),
]
I've been trying different ways to sort this, using itemgetter and lambda functions. Sorting by two indices of the tuples can be done with itemgetter or a lambda function, but the lambda must return a tuple. I can't seem to find anywhere in the documentation that the key function can return a tuple.
Anyway, I wanted to know what itemgetter() actually returns, so this works (copied from the itemgetter documentation):
f = itemgetter(1)
print f(student_tuples[0])
----->A
Is there any way to do this WITHOUT having to reassign itemgetter to a variable? It looks like two arguments are being passed, but something like
print itemgetter(1, student_tuples[0])
-----><operator.itemgetter object at 0xf7309c8c>
doesn't give me anything useful.
I'm just fiddling around trying to learn Python and this is confusing me. I don't know where in itemgetter student_tuples[0] is being added as an argument.
The return value of itemgetter(1) is a function (actually, a callable object, but it's used like a function).
The function it returns is roughly equivalent to the function that results from the expression:
lambda x: x[1]
student_tuples[0] isn't added as an argument anywhere in itemgetter. It is passed as an argument to the function-that-was-returned when you call f(student_tuples[0]).
Since f is the result of itemgetter(1), it follows that you can do this in one line as:
itemgetter(1)(student_tuples[0])
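One way to picture this is a simplified, single-index stand-in for itemgetter. The name my_itemgetter is made up for this sketch, and the real operator.itemgetter is more general:

```python
from operator import itemgetter


def my_itemgetter(index):
    """Simplified single-index stand-in for operator.itemgetter."""
    def getter(obj):
        # The object to index is only supplied here, on the second call.
        return obj[index]
    return getter


student = ('john', 'A', 15)

f = my_itemgetter(1)              # first call: build the function
print(f(student))                 # A
print(my_itemgetter(1)(student))  # A (both calls on one line)
print(itemgetter(1)(student))     # A (the real thing behaves the same)
```

The first call fixes the index and the second call supplies the object, which is why passing both to itemgetter at once does not work.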
You need to call the return value of itemgetter() again, not pass in the list as the second argument. Example -
itemgetter(1)(student_tuples[0])
As can be seen from the documentation -
operator.itemgetter(*items)
Return a callable object that fetches item from its operand using the operand’s __getitem__() method.
itemgetter() returns a function, which you can again call passing in the actual iterable, to get the value from that particular index in that iterable.
When you pass multiple values to itemgetter(), it still returns a function; calling that function fetches the elements at each of the indices you originally passed to itemgetter() and returns them as a tuple. Example -
>>> import operator
>>> l = [1,2,3,4,5]
>>> operator.itemgetter(1,2,4)(l)
(2, 3, 5)
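Because a multi-index itemgetter returns a tuple, it can be used directly as a key for sorting on two fields. A short sketch using the student_tuples list from the question:

```python
from operator import itemgetter

student_tuples = [
    ('john', 'A', 15),
    ('jane', 'B', 12),
    ('dave', 'B', 10),
]

# With two indices the returned callable produces a tuple of both items.
key = itemgetter(1, 2)
print(key(student_tuples[0]))  # ('A', 15)

# Sort by the second field, then by the third field to break ties.
ordered = sorted(student_tuples, key=key)
print(ordered)
# [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]
```

This gives the same ordering as key=lambda s: (s[1], s[2]).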