Python sorting dictionary

I want to sort the following dictionary by its score key, whose value is a list of lists.
student = {"name" : [["Peter"], ["May"], ["Sharon"]],
"score" : [[1,5,3], [3,2,6], [5,9,2]]}
For a better representation:
Peter ---- [1,5,3]
May ---- [3,2,6]
Sharon ---- [5,9,2]
I want to use the second element of each student's scores to sort the names into a list.
The expected result is:
name_list = ["May", "Peter", "Sharon"]
I have tried
sorted_x = sorted(student.items(), key=operator.itemgetter(1))
and
for x in sorted(student, key=lambda k: k['score'][1]):
    name_list.append(x['name'])
but neither works.

First, pair each student's name with their score using zip:
zip(student["name"], student["score"])
This gives a more convenient representation (in Python 3, wrap the call in list() to display it):
[(['Peter'], [1, 5, 3]),
(['May'], [3, 2, 6]),
(['Sharon'], [5, 9, 2])]
Then sort this list by the second score and extract each student's name:
In [10]: [ i[0][0] for i in sorted(zip(student["name"], student["score"]), key=lambda x: x[1][1])]
Out[10]: ['May', 'Peter', 'Sharon']
For background on the built-in sorted function, see the Sorting HOW TO: https://docs.python.org/2/howto/sorting.html#sortinghowto
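The same zip-and-sort idea can be wrapped in a small function so that the score column is a parameter instead of being hard-coded. A minimal sketch; names_by_score and its index parameter are hypothetical names introduced here, and it assumes the same nested-list layout as student above.

```python
def names_by_score(data, index):
    """Return the names ordered by the score at position `index` (ascending).

    Hypothetical helper: assumes data["name"] holds 1-item lists and
    data["score"] holds equally long lists of numbers.
    """
    pairs = zip(data["name"], data["score"])
    # Sort each (name, scores) pair by the chosen score, then unwrap the name
    return [name[0] for name, scores in sorted(pairs, key=lambda p: p[1][index])]


student = {"name": [["Peter"], ["May"], ["Sharon"]],
           "score": [[1, 5, 3], [3, 2, 6], [5, 9, 2]]}
print(names_by_score(student, 1))  # ['May', 'Peter', 'Sharon']
```

Passing reverse=True to sorted inside the helper would put the highest score first instead.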

This would probably work for you:
student = {
    "name": [["Peter"], ["May"], ["Sharon"]],
    "score": [[1,5,3], [3,2,6], [5,9,2]]
}
# pair up the names and scores
# this gives one tuple for each student:
# the first item is their name as a 1-item list
# the second item is their list of scores
pairs = zip(student['name'], student['score'])
# sort the tuples by the second score:
pairs_sorted = sorted(
    pairs,
    key=lambda t: t[1][1]
)
# get the names, in order
names = [n[0] for n, s in pairs_sorted]
print(names)
# ['May', 'Peter', 'Sharon']

If you want to use dictionaries, I suggest using OrderedDict:
from collections import OrderedDict
name = ["Peter", "May", "Sharon"]
score = [[1,5,3], [3,2,6], [5,9,2]]
d = {n: s for (n, s) in zip(name, score)}
ordered = OrderedDict(sorted(d.items(), key=lambda t: t[1][1]))
list(ordered)  # to retrieve the names
If you don't need a dictionary, a simpler approach is:
name = ["Peter", "May", "Sharon"]
score = [[1,5,3], [3,2,6], [5,9,2]]
d = list(zip(name, score))
ordered = sorted(d, key=lambda t: t[1][1])
names_ordered = [item[0] for item in ordered]
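Since Python 3.7, plain dicts are guaranteed to preserve insertion order, so OrderedDict is no longer required just to keep the sorted order. A sketch assuming Python 3.7 or later, using the same name and score lists:

```python
name = ["Peter", "May", "Sharon"]
score = [[1, 5, 3], [3, 2, 6], [5, 9, 2]]

# Sort (name, scores) pairs by the second score; dict() keeps that order (3.7+)
ordered = dict(sorted(zip(name, score), key=lambda t: t[1][1]))
names_ordered = list(ordered)
print(names_ordered)  # ['May', 'Peter', 'Sharon']
```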

Related

Find max value of a column based on another in python

I have a 2D list as follows. It shows the number of times each student topped an exam:
main_record = [
    ['student1', 1],
    ['student2', 1],
    ['student2', 2],
    ['student1', 5],
    ['student3', 3],
]
I have another list of unique students:
students_enrolled = ['student1', 'student2', 'student3']
I want to display a ranking of the students based on their distinctions, like this:
student_ranking = ['student1', 'student3', 'student2']
Which built-in functions could be useful? I couldn't work out the right search terms. In other words, I need the Python equivalent of the following queries:
select max(main_record[1]) where name = student1 >>> result = 5
select max(main_record[1]) where name = student2 >>> result = 2
select max(main_record[1]) where name = student3 >>> result = 3

Define a dict keyed by student name that stores the maximum value seen for each student, then sort students_enrolled by that maximum.
from collections import defaultdict
main_record = [['student1',1], ['student2',1], ['student2',2], ['student1',5], ['student3',3]]
students_enrolled = ['student1','student2','student3']
# define a dict whose missing keys default to negative infinity, and update it with the max on each iteration
tmp_dct = defaultdict(lambda: float('-inf'))
for lst in main_record:
    k, v = lst
    tmp_dct[k] = max(tmp_dct[k], v)
print(tmp_dct)
students_enrolled.sort(key=lambda x: tmp_dct[x], reverse=True)
print(students_enrolled)
Output:
# tmp_dct =>
defaultdict(<function <lambda> at 0x7fd81044b1f0>,
{'student1': 5, 'student2': 2, 'student3': 3})
# students_enrolled after sorting
['student1', 'student3', 'student2']
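The SQL-style queries in the question map fairly directly onto sorting followed by itertools.groupby, which, like GROUP BY, collects rows that share a key; unlike SQL, it only groups adjacent rows, so the input has to be sorted by that key first. A sketch of that alternative using the question's data; treating a student with no records as 0 is an assumption made here:

```python
from itertools import groupby
from operator import itemgetter

main_record = [['student1', 1], ['student2', 1], ['student2', 2],
               ['student1', 5], ['student3', 3]]
students_enrolled = ['student1', 'student2', 'student3']

# Like "SELECT name, MAX(value) ... GROUP BY name": sort by name, then group
rows = sorted(main_record, key=itemgetter(0))
best = {name: max(value for _, value in group)
        for name, group in groupby(rows, key=itemgetter(0))}
print(best)  # {'student1': 5, 'student2': 2, 'student3': 3}

# Students without any record default to 0 here (an assumption)
ranking = sorted(students_enrolled, key=lambda s: best.get(s, 0), reverse=True)
print(ranking)  # ['student1', 'student3', 'student2']
```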

If it is a 2D list, it should look like this: l = [["student1", 2], ["student2", 3], ["student3", 4]]. To print the rows from the highest to the lowest value in the 2nd column, you can use a loop like this:
numbers = []
for student in l:
    numbers.append(student[1])
# Row indices ordered from the largest 2nd-column value to the smallest
# (sorting indices avoids removing items from a list while looping over it)
order = sorted(range(len(numbers)), key=lambda i: numbers[i], reverse=True)
for student_index in order:
    print(l[student_index], numbers[student_index])
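If only the single row with the highest value is needed, the built-in max accepts the same key argument as sorted and finds it in one pass without sorting. A minimal sketch using the example list l:

```python
l = [["student1", 2], ["student2", 3], ["student3", 4]]

# max() scans the list once and returns the row whose 2nd column is largest
top = max(l, key=lambda row: row[1])
print(top)  # ['student3', 4]
```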

Sum lists with different lengths in python

I have 3 lists with different lengths. They are made like this:
final_list = [[1230, 0], [1231,0],[1232,0], [1233, 0], [1234, 0]]
list2 = [[1232, 20], [1233, 30]]
list3 = [[1230, 10], [1231,20],[1232,40]]
I want to obtain final_list like this:
final_list = [[1230, 10], [1231,20],[1232,60], [1233, 30], [1234, 0]]
(For each element of list2 and list3 whose first value matches the first value of an element of final_list, that element's second value should become the sum of all the matching second values.)

Not a clean solution, but easy to grasp, and it might save your day.
f = {}
dcts = map(lambda l: dict([l]), final_list + list2 + list3)
for dct in dcts:
    for k in dct:
        f[k] = f.get(k, 0) + dct[k]
final_list = list(map(list, sorted(f.items())))
However, if you are familiar with itertools (note that groupby only groups adjacent items, so the input must be sorted by key first):
from itertools import groupby
merged = sorted(final_list + list2 + list3)
final_list = []
for key, group in groupby(merged, key=lambda e: e[0]):
    final_list.append([key, sum(j for i, j in group)])
or as a one-liner:
[[k, sum(j for i, j in g)] for k, g in groupby(sorted(final_list + list2 + list3), key=lambda e: e[0])]

I created temp_list and appended all three lists to it. Then I created a dictionary dic and looped through temp_list, summing the second value of each pair by its key. Finally, I turned dic back into a list and sorted it.
I admit this is not the most efficient way to do it, but it is a solution.
temp_list = []
temp_list.append(final_list)
temp_list.append(list2)
temp_list.append(list3)
dic = {}
for lst in temp_list:
    for tp in lst:
        if tp[0] in dic:
            dic[tp[0]] = dic[tp[0]] + tp[1]
        else:
            dic[tp[0]] = tp[1]
result = []
for key, value in dic.items():
    temp = [key, value]
    result.append(temp)
result.sort()
result:
[[1230, 10], [1231, 20], [1232, 60], [1233, 30], [1234, 0]]
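The answers above merge every key they find. If, as the question states, only keys that already appear in final_list should be updated, a sketch like the following keeps final_list's keys and order (dicts preserve insertion order in Python 3.7+) and ignores any other keys; totals is a name introduced here.

```python
final_list = [[1230, 0], [1231, 0], [1232, 0], [1233, 0], [1234, 0]]
list2 = [[1232, 20], [1233, 30]]
list3 = [[1230, 10], [1231, 20], [1232, 40]]

# Start from final_list so every original key (and its order) is kept
totals = {key: value for key, value in final_list}
for key, value in list2 + list3:
    if key in totals:  # only add to keys already present in final_list
        totals[key] += value

final_list = [[key, total] for key, total in totals.items()]
print(final_list)  # [[1230, 10], [1231, 20], [1232, 60], [1233, 30], [1234, 0]]
```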

How to sort a dictionary to output from only highest value?

The .txt file contains something like this:
Matt Scored: 10
Jimmy Scored: 3
James Scored: 9
Jimmy Scored: 8
....
My code so far:
from collections import OrderedDict
#opens the class file in order to create a dictionary
dictionary = {}
#splits the data so the name is the key while the score is the value
f = open('ClassA.txt', 'r')
d = {}
for line in f:
    firstpart, secondpart = line.strip().split(':')
    dictionary[firstpart.strip()] = secondpart.strip()
    columns = line.split(": ")
    letters = columns[0]
    numbers = columns[1].strip()
    if d.get(letters):
        d[letters].append(numbers)
    else:
        d[letters] = list(numbers)
#sorts the dictionary so it is in alphabetical order
sorted_dict = OrderedDict(
    sorted((key, list(sorted(vals, reverse=True)))
           for key, vals in d.items()))
print (sorted_dict)
This code already prints the names in alphabetical order, each with their scores from highest to lowest. However, I now need to output the names sorted so that the highest score comes first and the lowest score last. I tried using the max function, but it outputs only the name and not the score itself. I also want the output to show only each person's highest score, not all of their previous scores as my current code does.

I do not think you need a dictionary in this case. Just keep the scores as a list of tuples.
For example, to sort by name:
>>> sorted([('c', 10), ('b', 16), ('a', 5)],
key = lambda row: row[0])
[('a', 5), ('b', 16), ('c', 10)]
Or by score:
>>> sorted([('c', 10), ('b', 16), ('a', 5)],
key = lambda row: row[1])
[('a', 5), ('c', 10), ('b', 16)]
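If two people can have the same score, a tuple key breaks the tie: negating the score puts the highest first, while equal scores stay in alphabetical order. A small sketch with made-up sample rows (the ('d', 10) row is added here only to show a tie):

```python
rows = [('c', 10), ('b', 16), ('a', 5), ('d', 10)]

# Highest score first; equal scores ordered by name
by_score = sorted(rows, key=lambda row: (-row[1], row[0]))
print(by_score)  # [('b', 16), ('c', 10), ('d', 10), ('a', 5)]
```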

You can use itertools.groupby to separate out each key on its own. That big long dict comp is ugly, but it works essentially by sorting your input, grouping it by the part before the colon, then taking the biggest result and saving it with the group name.
import itertools, operator
text = """Matt Scored: 10
Jimmy Scored: 3
James Scored: 9
Jimmy Scored: 8"""
result_dict = {group:max(map(lambda s: int(s.split(":")[1]), vals)) for
group,vals in itertools.groupby(sorted(text.splitlines()),
lambda s: s.split(":")[0])}
sorted_dict = sorted(result_dict.items(), key=operator.itemgetter(1), reverse=True)
# result:
[('Matt Scored', 10), ('James Scored', 9), ('Jimmy Scored', 8)]
Unrolling the dict comprehension gives something like:
sorted_txt = sorted(text.splitlines())
groups = itertools.groupby(sorted_txt, lambda s: s.split(":")[0])
result_dict = {}
for group, values in groups:
    # group is the first half of the line
    result_dict[group] = -1  # some arbitrary small number
    for value in values:
        # value is the whole line, so....
        value = value.split(":")[1]
        value = int(value)
        result_dict[group] = max(result_dict[group], value)

I would use bisect.insort from the very beginning so that each list stays sorted whenever you insert a new score; then it's only a matter of reversing or slicing the list to get the desired output:
from bisect import insort
from io import StringIO
d = {}
f = '''Matt Scored: 10
Jimmy Scored: 3
James Scored: 9
Jimmy Scored: 8'''
for line in StringIO(f):
    line = line.strip().split(' Scored: ')
    name, score = line[0], int(line[1])
    if d.get(name):
        # whenever a new score is inserted, the list stays sorted low > high
        insort(d[name], score)
    else:
        d[name] = [score]
d
{'Matt': [10], 'Jimmy': [3, 8], 'James': [9]}
Then to get the desired output:
for k in sorted(d.keys()):
    # score from largest to smallest, sorted by names
    print('sorted name, high>low score', k, d[k][::-1])
    # highest score, sorted by name
    print('sorted name, highest score', k, d[k][-1])
Results:
sorted name, high>low score James [9]
sorted name, highest score James 9
sorted name, high>low score Jimmy [8, 3]
sorted name, highest score Jimmy 8
sorted name, high>low score Matt [10]
sorted name, highest score Matt 10
As a side note: list[::-1] gives a reversed copy of the list, and list[-1] is its last element.

Your code can be simplified a bit by using a defaultdict:
from collections import defaultdict
d = defaultdict(list)
Next, it's good practice to use open as a context manager when working with files:
with open('ClassA.txt') as f:
Finally, when looping through the lines of f, you should use a single dictionary, not two. To make sorting by score easier, you'll want to store the score as an int.
    for line in f:
        name, score = line.split(':')
        d[name.strip()].append(int(score.strip()))
One side effect of this approach is that scores with multiple digits (e.g., Jimmy Scored: 10) keep their value (10) when added to the list. In the original version, list('10') results in ['1', '0'].
You can then use sorted's key argument to sort by the values in d rather than its keys:
sorted(d, key=lambda x: max(d[x]))
Putting it all together we get
from collections import defaultdict
d = defaultdict(list)
with open('ClassA.txt') as f:
    for line in f:
        name, score = line.split(':')
        d[name.strip()].append(int(score.strip()))
# Original
print(sorted(d.items()))
# By score ascending
print(sorted(d.items(), key=lambda x: max(x[1])))
# By score descending
print(sorted(d.items(), key=lambda x: max(x[1]), reverse=True))
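The question also asks for only each person's highest score in the output. Building on the defaultdict approach, a sketch that keeps just the maximum per name and sorts highest first. It assumes every non-blank line has the form "Name Scored: <integer>", and uses io.StringIO in place of the real ClassA.txt so it runs on its own; best_scores is a hypothetical helper name.

```python
import io
from collections import defaultdict


def best_scores(lines):
    """Return (name, highest score) pairs, highest score first.

    Hypothetical helper: assumes each non-blank line looks like 'Name Scored: 10'.
    """
    scores = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        label, _, score = line.partition(':')
        name = label.replace(' Scored', '').strip()
        scores[name].append(int(score))  # int() ignores surrounding whitespace
    # Keep only each person's best score, then sort by it, highest first
    best = {name: max(values) for name, values in scores.items()}
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


sample = io.StringIO("Matt Scored: 10\nJimmy Scored: 3\nJames Scored: 9\nJimmy Scored: 8\n")
for name, score in best_scores(sample):
    print(name, score)
```

With the real file, passing the handle from `with open('ClassA.txt') as f:` to best_scores(f) should behave the same way, since iterating a file also yields lines.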

Finding the dictionary keys whose values are numerically highest

Given a Python dict of the form:
dict = {'Alice': 2341, 'Beth': 9102, 'Cecil': 3258, ......}
Is there an easy way to print the first x keys with the highest numeric values? That is, say:
Beth 9102
Cecil 3258
Currently this is my attempt:
max = 0
max_word = ""
for key, value in w.word_counts.iteritems():
    if value > max:
        if key not in stop_words:
            max = value
            max_word = key
print max_word

I'd simply sort the items by their value (the second element of each pair) and then pick the first K elements:
d_items = sorted(d.items(), key=lambda x: -x[1])
print(d_items[:2])
[('Beth', 9102), ('Cecil', 3258)]
The complexity of this approach is O(N log N + K), not that different from optimal O(N + K log K) (using QuickSelect and sorting just the first K elements).
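The attempt in the question also skips stop words. A sketch that combines that filter with heapq.nlargest, which keeps only k items in a heap (roughly O(N log K)) instead of sorting everything; word_counts and stop_words here are made-up stand-ins for the question's w.word_counts and stop_words:

```python
import heapq

# Hypothetical sample data standing in for w.word_counts and stop_words
word_counts = {'the': 9800, 'Alice': 2341, 'Beth': 9102, 'Cecil': 3258}
stop_words = {'the', 'a', 'and'}

# Filter lazily with a generator, then keep only the k largest counts
candidates = ((word, count) for word, count in word_counts.items()
              if word not in stop_words)
top = heapq.nlargest(2, candidates, key=lambda item: item[1])
print(top)  # [('Beth', 9102), ('Cecil', 3258)]
```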

Using collections.Counter.most_common:
>>> from collections import Counter
>>> d = {'Alice': 2341, 'Beth': 9102, 'Cecil': 3258}
>>> c = Counter(d)
>>> c.most_common(2)
[('Beth', 9102), ('Cecil', 3258)]
Internally it uses sorted (O(n log n)) when called without an argument, and heapq.nlargest(k) when given k; nlargest can be faster than sorting when k << n, and falls back to max() when k == 1.

>>> sorted(dict.items(), key=lambda x: x[1], reverse=True)[:2]
[('Beth', 9102), ('Cecil', 3258)]

items = sorted(w.word_counts.items(), key=lambda x: x[1], reverse=True)
items[:5]
Replace 5 with the number of elements you want to get.
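Passing a comparison function positionally, as in lambda x, y: cmp(x[1], y[1]), only works in Python 2: Python 3's sorted takes only key and reverse (as keyword arguments), and the cmp builtin is gone. If you already have cmp-style logic, functools.cmp_to_key can wrap it into a key function. A sketch; by_count is a hypothetical comparison function and the counts are sample data:

```python
from functools import cmp_to_key

word_counts = {'Alice': 2341, 'Beth': 9102, 'Cecil': 3258}


def by_count(x, y):
    """Old-style comparison: negative, zero, or positive, like Python 2's cmp()."""
    return (x[1] > y[1]) - (x[1] < y[1])


# cmp_to_key turns the comparison into a key; reverse=True puts the largest first
items = sorted(word_counts.items(), key=cmp_to_key(by_count), reverse=True)
print(items[:2])  # [('Beth', 9102), ('Cecil', 3258)]
```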

d = {'Alice': 2341, 'Beth': 9102, 'Cecil': 3258}
vs = sorted(d, key=d.get,reverse=True)
l = [(x,d.get(x)) for x in vs[0:2]]
In [4]: l
Out[4]: [('Beth', 9102), ('Cecil', 3258)]

Convert the dict to a list of (value, key) tuples such as [(2341, 'Alice'), ...], then sort it without key=lambda ...: tuples compare by their first element first.
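A sketch of that approach with the example dict; since the value comes first in each tuple, reverse=True puts the highest values first, and equal values would fall back to comparing the keys:

```python
d = {'Alice': 2341, 'Beth': 9102, 'Cecil': 3258}

# Put the value first so tuples compare by value, then by key on ties
pairs = sorted(((value, key) for key, value in d.items()), reverse=True)
print(pairs[:2])  # [(9102, 'Beth'), (3258, 'Cecil')]
```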

Sort the top ten results

I am getting a list in which I save the results in the following way:
City Percentage
Mumbai 98.30
London 23.23
Agra 12.22
.....
List structure is [["Mumbai",98.30],["London",23.23]..]
I am saving these records in a list. I need to sort the list and get the top ten records. Even getting just the cities would be fine.
I am trying to use the following logic, but it fails to provide accurate data:
if (condition):
if b not in top_ten:
top_ten.append(b)
top_ten.remove(tmp)
Any other solution or approach is also welcome.
EDIT 1
for a in sc_percentage:
    print a
The list I am getting:
(<ServiceCenter: DELHI-DLC>, 100.0)
(<ServiceCenter: DELHI-DLE>, 75.0)
(<ServiceCenter: DELHI-DLN>, 90.909090909090907)
(<ServiceCenter: DELHI-DLS>, 83.333333333333343)
(<ServiceCenter: DELHI-DLW>, 92.307692307692307)

Sort the list first and then slice it:
>>> lis = [['Mumbai', 98.3], ['London', 23.23], ['Agra', 12.22]]
>>> print(sorted(lis, key=lambda x: x[1], reverse=True)[:10])  # [:10] returns the first ten items
[['Mumbai', 98.3], ['London', 23.23], ['Agra', 12.22]]
To get the data in list form from that file, use this:
with open('abc') as f:
    next(f)  # skip header
    lis = [[city, float(val)] for city, val in (line.split() for line in f)]
print(lis)
#[['Mumbai', 98.3], ['London', 23.23], ['Agra', 12.22]]
Update:
new_lis = sorted(sc_percentage, key=lambda x: x[1], reverse=True)[:10]
for item in new_lis:
    print(item)
sorted returns a new sorted list. Since we need to sort based on the second item of each element, we use the key parameter.
key=lambda x: x[1] means: compare items by the value at index 1 of each item (i.e. 100.0, 75.0, etc.).
reverse=True sorts in descending order.
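operator.itemgetter(1) builds the same kind of key function as lambda x: x[1] and is a common alternative. A sketch with the sample list:

```python
from operator import itemgetter

lis = [['Mumbai', 98.3], ['London', 23.23], ['Agra', 12.22]]

# itemgetter(1) returns a callable equivalent to lambda x: x[1]
top_ten = sorted(lis, key=itemgetter(1), reverse=True)[:10]
print(top_ten)  # [['Mumbai', 98.3], ['London', 23.23], ['Agra', 12.22]]
```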

If the list is fairly short then as others have suggested you can sort it and slice it. If the list is very large then you may be better using heapq.nlargest():
>>> import heapq
>>> lis = [['Mumbai', 98.3], ['London', 23.23], ['Agra', 12.22]]
>>> heapq.nlargest(2, lis, key=lambda x:x[1])
[['Mumbai', 98.3], ['London', 23.23]]
The difference is that nlargest makes only a single pass through the list, and if you are reading from a file or another generated source, the data need not all be in memory at the same time.
You might also be interested to look at the source for nlargest() as it works in much the same way that you were trying to solve the problem: it keeps only the desired number of elements in a data structure known as a heap and each new value is pushed into the heap then the smallest value is popped from the heap.
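To make that heap idea concrete, here is a simplified sketch of the same strategy: keep a min-heap of at most n entries, and whenever a new value beats the smallest one kept, swap it in with heapq.heappushpop. It is an illustration only, not the actual CPython source of nlargest; top_n is a hypothetical name.

```python
import heapq


def top_n(records, n, key):
    """Return the n records with the largest key, largest first.

    Hypothetical helper illustrating the bounded-heap idea; not nlargest's source.
    """
    if n <= 0:
        return []
    heap = []  # min-heap of (key, -position, record); smallest kept key at heap[0]
    for position, record in enumerate(records):
        # -position orders ties by arrival (earlier first in the result)
        # and means the records themselves are never compared
        entry = (key(record), -position, record)
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            # Push the new entry and pop the current smallest in one step
            heapq.heappushpop(heap, entry)
    return [record for _, _, record in sorted(heap, reverse=True)]


lis = [['Mumbai', 98.3], ['London', 23.23], ['Agra', 12.22]]
print(top_n(lis, 2, key=lambda x: x[1]))  # [['Mumbai', 98.3], ['London', 23.23]]
```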
Edit to show comparative timing:
>>> import random
>>> records = []
>>> for i in range(100000):
...     value = random.random() * 100
...     records.append(('city {:2.4f}'.format(value), value))
...
>>> import heapq
>>> heapq.nlargest(10, records, key=lambda x:x[1])
[('city 99.9995', 99.99948904248298), ('city 99.9974', 99.99738898315216), ('city 99.9964', 99.99642759230214), ('city 99.9935', 99.99345173704319), ('city 99.9916', 99.99162694442714), ('city 99.9908', 99.99075084123544), ('city 99.9887', 99.98865134685201), ('city 99.9879', 99.98792632193258), ('city 99.9872', 99.98724339718686), ('city 99.9854', 99.98540548350132)]
>>> import timeit
>>> timeit.timeit('sorted(records, key=lambda x:x[1])[:10]', setup='from __main__ import records', number=10)
1.388942152229788
>>> timeit.timeit('heapq.nlargest(10, records, key=lambda x:x[1])', setup='import heapq;from __main__ import records', number=10)
0.5476185073315492
On my system getting the top 10 from 100 records is fastest by sorting and slicing, but with 1,000 or more records it is faster to use nlargest.

You have to convert your input into something Python can handle easily:
with open('input.txt') as inputFile:
    lines = inputFile.readlines()[1:]  # skip the "City Percentage" header line
records = [line.split() for line in lines]
records = [[float(percentage), city] for city, percentage in records]
Now the records contain a list of the entries like this:
[[98.3, 'Mumbai'], [23.23, 'London'], [12.22, 'Agra']]
You can sort that list in place, highest percentage first:
records.sort(reverse=True)
You can print the top ten by slicing:
print(records[:10])
If you have a huge list (e.g. millions of entries) and just want the top ten in sorted order, there are better ways than sorting the whole list, which would be a waste of time.
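One such way is heapq.nlargest over a generator, so lines are parsed one at a time and only the current top entries are kept. A sketch under the assumption that the input has a "City Percentage" header followed by whitespace-separated "city value" lines with no spaces in city names; io.StringIO stands in for the real input.txt, and top_cities is a hypothetical name:

```python
import heapq
import io


def top_cities(lines, n=10):
    """Return the n (percentage, city) pairs with the highest percentage."""
    lines = iter(lines)
    next(lines, None)  # skip the header line
    # Generator: each line is parsed only when nlargest asks for it
    parsed = ((float(value), city)
              for city, value in (line.split() for line in lines if line.strip()))
    return heapq.nlargest(n, parsed)


sample = io.StringIO("City Percentage\nMumbai 98.30\nLondon 23.23\nAgra 12.22\n")
print(top_cities(sample, n=2))  # [(98.3, 'Mumbai'), (23.23, 'London')]
```

With a real file, an open file handle can be passed directly, since iterating it yields one line at a time.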

To print the top 10 cities, sort the list first and then slice it:
>>> lis = [['Mumbai', 98.3], ['London', 23.23], ['Agra', 12.22]]
>>> [k[0] for k in sorted(lis, key = lambda x : x[1], reverse = True)[:10]]
['Mumbai', 'London', 'Agra']
For the given list:
>>> lis = [("<ServiceCenter: DELHI-DLC>", 100.0), ("<ServiceCenter: DELHI-DLW>", 92.307692307692307), ("<ServiceCenter: DELHI-DLE>", 75.0), ("<ServiceCenter: DELHI-DLN>", 90.909090909090907), ("<ServiceCenter: DELHI-DLS>", 83.333333333333343)]
>>> t = [k[0] for k in sorted(lis, key=lambda x: x[1], reverse=True)[:10]]
>>> print(t)
['<ServiceCenter: DELHI-DLC>',
'<ServiceCenter: DELHI-DLW>',
'<ServiceCenter: DELHI-DLN>',
'<ServiceCenter: DELHI-DLS>',
'<ServiceCenter: DELHI-DLE>']
The sorted function returns a new sorted list, ordering the items by the value that the key function extracts from each one.
