def salary_sort(thing):
    def importantparts(thing):
        for i in range(1, len(thing)):
            a=thing[i].split(':')
            output = (a[1],a[0],a[8])
            sortedlist = sorted(output, key = lambda item: item[2], reverse=True)
            print(sortedlist)
    return importantparts(thing)
salary_sort(employee_data)
This function is supposed to sort a list of names by salary.
I managed to isolate the first names, last names and salaries, but I can't get it to sort by salary.
'thing', a.k.a. employee_data:
employee_data = ["FName LName Tel Address City State Zip Birthdate Salary",
"Arthur:Putie:923-835-8745:23 Wimp Lane:Kensington:DL:38758:8/31/1969:126000",
"Barbara:Kertz:385-573-8326:832 Ponce Drive:Gary:IN:83756:12/1/1946:268500",
"Betty:Boop:245-836-8357:635 Cutesy Lane:Hollywood:CA:91464:6/23/1923:14500",.... etc.]
Output
['Putie', 'Arthur', '126000']
['Kertz', 'Barbara', '268500']
['Betty', 'Boop', '14500']
['Hardy', 'Ephram', '56700']
['Fardbarkle', 'Fred', '780900']
['Igor', 'Chevsky', '23400']
['James', 'Ikeda', '45000']
['Cowan', 'Jennifer', '58900']
['Jesse', 'Neal', '500']
['Jon', 'DeLoach', '85100']
['Jose', 'Santiago', '95600']
['Karen', 'Evich', '58200']
['Lesley', 'Kirstin', '52600']
['Gortz', 'Lori', '35200']
['Corder', 'Norma', '245700']
There are a number of issues with your code, but the key one is that you are sorting each row as you create it, rather than the list of lists.
Also:
importantparts() doesn't return anything (so salary_sort() returns None).
You need to cast the Salary field to an int so that it sorts properly by value (the salaries don't all have the same field width, so an alphanumeric sort will be incorrect).
Finally, you don't need to use for i in range(1, len(thing)):, you can iterate directly over thing, taking a slice to remove the first element [1].
[1] Note that this last point is not wrong per se, but iterating directly over an iterable is considered more 'Pythonic'.
def salary_sort(thing):
    def importantparts(thing):
        unsortedlist = []
        for item in thing[1:]:
            a=item.split(':')
            unsortedlist.append([a[1],a[0],int(a[8])])
        print unsortedlist
        sortedlist = sorted(unsortedlist, key = lambda item: item[2], reverse=True)
        return (sortedlist)
    return importantparts(thing)
employee_data = ["FName LName Tel Address City State Zip Birthdate Salary",
"Arthur:Putie:923-835-8745:23 Wimp Lane:Kensington:DL:38758:8/31/1969:126000",
"Barbara:Kertz:385-573-8326:832 Ponce Drive:Gary:IN:83756:12/1/1946:268500",
"Betty:Boop:245-836-8357:635 Cutesy Lane:Hollywood:CA:91464:6/23/1923:14500"]
print salary_sort(employee_data)
Output:
[['Kertz', 'Barbara', 268500], ['Putie', 'Arthur', 126000], ['Boop', 'Betty', 14500]]
Your main problem is that you reset the output sequence with each new line instead of first accumulating the data and then sorting. Another problem is that your outer function declares an inner one and calls it, but the inner one does not return anything. Finally, if you sort strings without converting them to integers, you will get an alphanumeric sort: ('9', '81', '711', '6'), which is probably not what you expect.
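As a minimal sketch of that last point, using the four values mentioned above (the variable names are only for illustration):

```python
# Salaries as they come out of split(':'): still strings.
salaries = ['9', '81', '711', '6']

# Strings are compared character by character, so '9' > '81' > '711' > '6'.
as_text = sorted(salaries, reverse=True)

# With key=int each item is compared by its numeric value instead.
as_numbers = sorted(salaries, key=int, reverse=True)

print(as_text)     # ['9', '81', '711', '6']
print(as_numbers)  # ['711', '81', '9', '6']
```

Note that key=int only changes how items are compared; the sorted list still contains the original strings.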
By the way, the outer-inner functions pattern is of no use here, and you can use a simple direct function.
def salary_sort(thing):
    output = []
    for i in range(1, len(thing)):
        a=thing[i].split(':')
        output.append([a[1],a[0],a[8]])
    sortedlist = sorted(output, key = lambda item: int(item[2]), reverse=True)
    return sortedlist
the result is as expected:
[['Kertz', 'Barbara', '268500'], ['Putie', 'Arthur', '126000'], ['Boop', 'Betty', '14500']]
If you prefer numbers for the salaries, you do the conversion one step higher:
def salary_sort(thing):
    output = []
    for i in range(1, len(thing)):
        a=thing[i].split(':')
        output.append([a[1],a[0],int(a[8])])
    sortedlist = sorted(output, key = lambda item: item[2], reverse=True)
    return sortedlist
and the result is again correct:
[['Kertz', 'Barbara', 268500], ['Putie', 'Arthur', 126000], ['Boop', 'Betty', 14500]]
The problem is that you sort individual elements (meaning ['Putie', 'Arthur', '126000']), based on the salary value, and not the whole array.
Also, since you want to sort the salaries, you have to cast them to int, otherwise alphabetical sort is going to be used.
You can take a look at the following :
def salary_sort(thing):
    def importantparts(thing):
        data = []
        for i in range(1, len(thing)):
            a=thing[i].split(':')
            output = (a[1],a[0],int(a[8]))
            data.append(output)
        data.sort(key=lambda item: item[2], reverse=True)
        return data
    return importantparts(thing)
employee_data = ["FName LName Tel Address City State Zip Birthdate Salary", \
"Arthur:Putie:923-835-8745:23 Wimp Lane:Kensington:DL:38758:8/31/1969:126000", \
"Barbara:Kertz:385-573-8326:832 Ponce Drive:Gary:IN:83756:12/1/1946:268500", \
"Betty:Boop:245-836-8357:635 Cutesy Lane:Hollywood:CA:91464:6/23/1923:14500"]
print(salary_sort(employee_data))
Which gives, as expected :
[('Kertz', 'Barbara', 268500), ('Putie', 'Arthur', 126000), ('Boop', 'Betty', 14500)]
What I did there is push all the relevant employee data into a new list (named data), and then sort that list using the lambda function as the key.
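For comparison, here is a more compact sketch of the same idea. It is only one possible way to write it: the parameter name rows and the use of operator.itemgetter are my own choices, not something from the question.

```python
from operator import itemgetter

def salary_sort(rows):
    # rows[0] is the header line, so skip it and split the rest on ':'.
    fields = [row.split(':') for row in rows[1:]]
    # Keep (last name, first name, salary as int) for every employee.
    people = [(f[1], f[0], int(f[8])) for f in fields]
    # Sort the whole list once, highest salary first.
    return sorted(people, key=itemgetter(2), reverse=True)

employee_data = [
    "FName LName Tel Address City State Zip Birthdate Salary",
    "Arthur:Putie:923-835-8745:23 Wimp Lane:Kensington:DL:38758:8/31/1969:126000",
    "Barbara:Kertz:385-573-8326:832 Ponce Drive:Gary:IN:83756:12/1/1946:268500",
    "Betty:Boop:245-836-8357:635 Cutesy Lane:Hollywood:CA:91464:6/23/1923:14500",
]

print(salary_sort(employee_data))
```

itemgetter(2) does the same job as lambda item: item[2].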
Related
Let's say I have a list of tuples like this:
l = [('music','300','url'),('movie','400','url'),
('clothing','250','url'),('music','350','url'),
('music','400','url'),('movie','1000','url')]
and that I want to sort these tuples into multiple lists, each grouped by the first element in the tuples. Further, once grouped into those lists, I want the new lists reverse sorted by the second element (the int). So, the result would be:
music = [('music','400','url'),('music','350','url'),('music','300','url')]
movie = [('movie','1000','url'),('movie','400','url')]
clothing = [('clothing','250','url')]
Perhaps I could forego the multiple lists and make a list of lists of tuples? So, I would get:
sortedlist = [[('music','400','url'),('music','350','url'),('music','300','url')],
[('movie','1000','url'),('movie','400','url')],
[('clothing','250','url')]]
But even in this case, how would I get the internal lists reverse sorted by the second element?
If I'm going about this the wrong way, please mention it. I'm still new at Python. Thx!
Well, you can get your lists easily with a list comprehension:
music = [x for x in l if x[0] == 'music']
movie = [x for x in l if x[0] == 'movie']
clothing = [x for x in l if x[0] == 'clothing']
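You can even sort them in place (converting the second element to int so that the comparison is numeric rather than alphabetical):
>>> music.sort(key=lambda x: int(x[1]), reverse=True)
>>> music
[('music', '400', 'url'), ('music', '350', 'url'), ('music', '300', 'url')]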
I'd just use a dict, personally. Simple data structures are best.
from collections import defaultdict
d = defaultdict(list)
for x in l:
    d[x[0]].append(x[1:])
Which would give you something like:
>>> for k, v in d.iteritems():
...     print k, v
...
movie [('400', 'url'), ('1000', 'url')]
clothing [('250', 'url')]
music [('300', 'url'), ('350', 'url'), ('400', 'url')]
But then that's my solution for everything so maybe I need to branch out a little.
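The dict approach above groups the items but leaves each group in input order. A possible sketch that also reverse-sorts every group by the numeric value of the second element could look like this (the name groups is just illustrative, and here the whole tuple is kept rather than x[1:]):

```python
from collections import defaultdict

l = [('music', '300', 'url'), ('movie', '400', 'url'),
     ('clothing', '250', 'url'), ('music', '350', 'url'),
     ('music', '400', 'url'), ('movie', '1000', 'url')]

groups = defaultdict(list)
for item in l:
    groups[item[0]].append(item)   # group by the first element

for items in groups.values():
    # int() matters: as strings, '400' would sort above '1000'.
    items.sort(key=lambda t: int(t[1]), reverse=True)

print(groups['music'])
print(groups['movie'])
```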
You can do something like this:
import itertools
import operator
sorted_l = sorted(l, key=lambda x: (x[0], int(x[1])), reverse=True)
print [list(g[1]) for g in itertools.groupby(sorted_l, key=operator.itemgetter(0))]
Output :
[[('music', '400', 'url'), ('music', '350', 'url'), ('music', '300', 'url')],
[('movie', '1000', 'url'), ('movie', '400', 'url')],
[('clothing', '250', 'url')]]
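The sort before groupby is not optional: itertools.groupby only merges adjacent items with the same key. A small sketch with a shortened, made-up list shows the difference:

```python
from itertools import groupby
from operator import itemgetter

l = [('music', '300'), ('movie', '400'), ('music', '350')]

# Without sorting, 'music' shows up as two separate groups.
unsorted_keys = [key for key, _ in groupby(l, key=itemgetter(0))]

# After sorting by the grouping key, each key appears exactly once.
by_key = sorted(l, key=itemgetter(0))
sorted_keys = [key for key, _ in groupby(by_key, key=itemgetter(0))]

print(unsorted_keys)  # ['music', 'movie', 'music']
print(sorted_keys)    # ['movie', 'music']
```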
What I would do in a case like this is a dictionary of lists.
things = {}
for item in all_tuples:  # "item" rather than "tuple", to avoid shadowing the built-in
    key = item[0]
    if key not in things:
        things[key] = [] # Initialize empty list
    things[key].append(item)
Then you can iterate through "things" using things.keys() or things.values()
E.g.
things["music"] = [('music','400','url'),('music','350','url'),('music','300','url')]
Ok, so I am working on an application that goes through a number of different database objects, compares a string and returns the associated id, first name and last name. Currently I build a list of tuples and then populate a dictionary with the keys and values (using a list). What I want to do next is find the max percentage and then return the associated first and last name from the dictionary. I know the description is a little confusing, so please look at the examples and code below:
# My Dictionary:
{'percent': [51.9, 52.3, 81.8, 21.0], 'first_name': ['Bob', 'Bill', 'Matt', 'John'], 'last_name': ['Smith', 'Allen', 'Naran', 'Jacobs']}
# I would want this to be returned:
percent = 81.8 (Max percentage match)
first_name = 'Matt' (First name associated with the max percentage match)
last_name = 'Naran' (Last name associated with the max percentage match)
# Code so Far:
compare_list = []
compare_dict = {}
# Builds my list of Tuples
compare_list.append(tuple(("percent", percentage)))
compare_list.append(tuple(("first_name", first_name)))
compare_list.append(tuple(("last_name", last_name)))
# Builds my Dictionary
for x, y in compare_list:
    compare_dict.setdefault(x, []).append(y)
Not sure where to go to return the first and last name associated with the Max percentage.
I really appreciate any and all help that you provide!
I hope this will help you:
data = {'percent': [51.9, 52.3, 81.8, 21.0], 'first_name': ['Bob', 'Bill', 'Matt', 'John'], 'last_name': ['Smith', 'Allen', 'Naran', 'Jacobs']}
percentage_list = data['percent']
percentage = max(percentage_list)
max_index = percentage_list.index(percentage)
first_name = data['first_name'][max_index]
last_name = data['last_name'][max_index]
# Code so Far:
compare_list = []
compare_dict = {}
# Builds my list of Tuples
compare_list.append(tuple(("percent", percentage)))
compare_list.append(tuple(("first_name", first_name)))
compare_list.append(tuple(("last_name", last_name)))
# Builds my Dictionary
for x, y in compare_list:
    compare_dict.setdefault(x, []).append(y)
print compare_dict
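An alternative to max() followed by index() is to zip the three lists together and take the maximum row directly. This is only a sketch; the names rows, best_percent, best_first and best_last are made up for the example.

```python
data = {'percent': [51.9, 52.3, 81.8, 21.0],
        'first_name': ['Bob', 'Bill', 'Matt', 'John'],
        'last_name': ['Smith', 'Allen', 'Naran', 'Jacobs']}

# Pair up the i-th percentage, first name and last name.
rows = zip(data['percent'], data['first_name'], data['last_name'])

# Pick the row whose percentage (element 0) is the largest.
best_percent, best_first, best_last = max(rows, key=lambda row: row[0])

print(best_percent, best_first, best_last)
```

If several rows share the maximum percentage, max() returns the first one it encounters, which matches what list.index() would give.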
I am using a tuple to store the output of a find -exec stat command and need to condense it in order to run du on it. The output is a tuple with each item being (username,/path/to/file)
I want to condense it to combine like usernames so the end result is (username,/path/to/file1,/path/to/file2,etc)
Is there any way to do this?
Here is the current code that returns my tuple
cmd = ['find',dir_loc,'-type','f','-exec','stat','-c','%U %n','{}','+']
process = Popen(cmd,stdout=PIPE)
find_out = process.communicate()
exit_code = process.wait()
find_out = find_out[0].split('\n')
out_tuple = []
for item in find_out:
    out_tuple.append(item.split(' '))
Assuming you have a list of tuples or a list of lists of the form:
out_tuple = [('user_one', 'path_one'),
('user_three', 'path_seven'),
('user_two', 'path_five'),
('user_one', 'path_two'),
('user_one', 'path_three'),
('user_two', 'path_four')]
You can do:
from itertools import groupby
out_tuple.sort()
total_grouped = []
for key, group in groupby(out_tuple, lambda x: x[0]):
    grouped_list = [key] + [x[1] for x in group]
    total_grouped.append(tuple(grouped_list))
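This will give you the list of tuples (the paths for each user come out in sorted order, because out_tuple.sort() sorts on the whole tuple):
print total_grouped
# Prints:
# [('user_one', 'path_one', 'path_three', 'path_two'),
#  ('user_three', 'path_seven'),
#  ('user_two', 'path_five', 'path_four')]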
If you started with a list of lists, then instead of:
total_grouped.append(tuple(grouped_list))
You can get rid of the tuple construction:
total_grouped.append(grouped_list)
I'll say one thing though, you might be better off using something like a dict as @BradBeattie suggests. If you're going to perform some operation later on that treats the first item in your tuple (or list) in a special way, then a dict is better.
It not only has a notion of uniqueness in the keys, it's also less cumbersome because the nesting has two distinct levels. First you have the dict, then you have the inner item which is a tuple (or a list). This is much clearer than having two similar collections nested one inside the other.
Just use a dict of lists:
out_tuple = [('user1', 'path1'),
('user1', 'path2'),
('user2', 'path3'),
('user1', 'path4'),
('user2', 'path5'),
('user1', 'path6')]
d={}
for user_name, path in out_tuple:
    d.setdefault(user_name, []).append(path)
print d
Prints:
{'user2': ['path3', 'path5'], 'user1': ['path1', 'path2', 'path4', 'path6']}
Then if you want the output for each user name as a tuple:
for user_name in d:
    print tuple([user_name]+d[user_name])
Prints:
('user2', 'path3', 'path5')
('user1', 'path1', 'path2', 'path4', 'path6')
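One thing to watch when building the pairs from the stat output: splitting each line on every space breaks paths that contain spaces. A hedged sketch that splits only once and groups in the same pass is shown below. The sample text in find_out is invented for the example; in the real script it would come from process.communicate().

```python
from collections import defaultdict

# Made-up stand-in for the output of `stat -c '%U %n'`.
find_out = ("alice /srv/data/a.txt\n"
            "bob /srv/data/b b.txt\n"
            "alice /srv/data/c.txt\n")

files_by_user = defaultdict(list)
for line in find_out.splitlines():
    if not line:
        continue                           # ignore blank lines
    user_name, path = line.split(' ', 1)   # split once: spaces in the path survive
    files_by_user[user_name].append(path)

for user_name, paths in files_by_user.items():
    print(tuple([user_name] + paths))
```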
I have a list of single-entry dictionaries. Each dictionary has only 1 key and 1 value. I'd like to sort the list of dictionaries by these values REGARDLESS of the key name! Some dictionaries share a key name and others do not.
All of the online examples I have seen assume the same key name across dictionaries. These types of examples have not worked for me because they assume the same key name:
newlist = sorted(list_to_be_sorted, key=lambda k: k['name'])
In my example, I need to compare the values regardless of whether the key is bob or sarah; and order the list of dictionaries. Here's an example list of dictionaries:
Times = [{"Bob":14.05}, {"Tim":15.09}, {"Tim":17.01}, {"Bob":16.81}, {"Sarah":15.08}]
desired output:
[{"Bob":14.05}, {"Sarah":15.08}, {"Tim":15.09}, {"Bob":16.81}, {"Tim":17.01}]
times = [{"Bob":14.05},{"Tim":15.09},{"Tim":17.01},{"Bob":16.81},{"Sarah":15.08}]
print sorted(times, key=lambda k: k.values())
Output
[{'Bob': 14.05},{'Sarah': 15.08}, {'Tim': 15.09}, {'Bob': 16.81}, {'Tim': 17.01}]
If there are multiple values in the values list and if you want to consider only the elements at particular index, then you can do
print sorted(times, key=lambda k: k.values()[0])
What about:
newlist = sorted(Times, key=lambda k: k.values()[0])
It keys off the first (only) of the dictionary's .values()
@thefourtheye - your answer is quite nice.
Want to highlight a subtle and IMO interesting thing for folks new to python. Consider this tweak to thefourtheye's answer:
times = [{"Bob":14.05},{"Tim":15.09},{"Tim":17.01},{"Bob":16.81},{"Sarah":15.08}]
print sorted(times, key=lambda k: k.itervalues().next())
Which yields the same result:
[{'Bob': 14.05}, {'Sarah': 15.08}, {'Tim': 15.09}, {'Bob': 16.81}, {'Tim': 17.01}]
The tweak avoids the creation of an intermediate and unnecessary array. By using the iterator "itervalues()" and then getting just the first value (via .next()) the sort method just compares the raw value, without the array.
Let's look at performance:
import random
import timeit

test_cases = [
    [],
    [{"Bob":14.05}],
    [{"Bob":14.05},{"Tim":15.09},{"Tim":17.01},{"Bob":16.81},{"Sarah":15.08}],
    [dict(zip((str(x) for x in xrange(50)), random.sample(xrange(1000), 50)))] # a list holding one dict with 50 entries
]
print "perf test"
for test_case in test_cases:
    print test_case
    print "k.values() :", timeit.repeat(
        "sorted(test_case, key=lambda k: k.values())",
        "from __main__ import test_case",
    )
    print "k.itervalues().next():", timeit.repeat(
        "sorted(test_case, key=lambda k: k.itervalues().next())",
        "from __main__ import test_case",
    )
    print
results:
[]
k.values() : [0.7124178409576416, 0.7222259044647217, 0.7217190265655518]
k.itervalues().next(): [0.7274281978607178, 0.7140758037567139, 0.7135159969329834]
[{'Bob': 14.05}]
k.values() : [1.3001079559326172, 1.395097017288208, 1.314589023590088]
k.itervalues().next(): [1.2579071521759033, 1.2594029903411865, 1.2587871551513672]
[{'Bob': 14.05}, {'Tim': 15.09}, {'Tim': 17.01}, {'Bob': 16.81}, {'Sarah': 15.08}]
k.values() : [3.1186227798461914, 3.107577085494995, 3.1108040809631348]
k.itervalues().next(): [2.8267030715942383, 2.9143049716949463, 2.8211638927459717]
[{'42': 771, '48': 129, '43': 619, '49': 450, --- SNIP --- , '33': 162, '32': 764}]
k.values() : [1.5659689903259277, 1.6058270931243896, 1.5724899768829346]
k.itervalues().next(): [1.29836106300354, 1.2615361213684082, 1.267350196838379]
Mind you, perf will often not matter, but given that the 2 solutions are similar in terms of readability and expressiveness, I think it's good to understand the latter solution and build habits in those terms.
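The answers above are written for Python 2. On Python 3, dict.values() returns a view that cannot be ordered or indexed, and itervalues() no longer exists, so the iterator idea has to be spelled differently. A minimal sketch of the Python 3 equivalent (the name ordered is only for illustration):

```python
times = [{"Bob": 14.05}, {"Tim": 15.09}, {"Tim": 17.01},
         {"Bob": 16.81}, {"Sarah": 15.08}]

# next(iter(...)) pulls the first (here: only) value without building a list.
ordered = sorted(times, key=lambda d: next(iter(d.values())))

print(ordered)
```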
I am getting a list in which I am saving the results in the following way
City Percentage
Mumbai 98.30
London 23.23
Agra 12.22
.....
List structure is [["Mumbai",98.30],["London",23.23]..]
I am saving these records in the form of a list. I need the top ten records from the list, sorted by percentage. Getting just the cities would also be fine.
I am trying to use the following logic, but it fails to provide accurate data:
if (condition):
    if b not in top_ten:
        top_ten.append(b)
        top_ten.remove(tmp)
Any other solution,approach is also welcome.
EDIT 1
for a in sc_percentage:
    print a
List I am getting
(<ServiceCenter: DELHI-DLC>, 100.0)
(<ServiceCenter: DELHI-DLE>, 75.0)
(<ServiceCenter: DELHI-DLN>, 90.909090909090907)
(<ServiceCenter: DELHI-DLS>, 83.333333333333343)
(<ServiceCenter: DELHI-DLW>, 92.307692307692307)
Sort the list first and then slice it:
>>> lis = [['Mumbai', 98.3], ['London', 23.23], ['Agra', 12.22]]
>>> print sorted(lis, key = lambda x : x[1], reverse = True)[:10] #[:10] returns first ten items
[['Mumbai', 98.3], ['London', 23.23], ['Agra', 12.22]]
To get data in list form from that file use this:
with open('abc') as f:
    next(f) #skip header
    lis = [[city,float(val)] for city, val in( line.split() for line in f)]
    print lis
#[['Mumbai', 98.3], ['London', 23.23], ['Agra', 12.22]]
Update:
new_lis = sorted(sc_percentage, key = lambda x : x[1], reverse = True)[:10]
for item in new_lis:
    print item
sorted returns a new sorted list. Since we need to sort on the second item of each element, we pass the key parameter.
key = lambda x : x[1] means: use the value at index 1 (i.e. 100.0, 75.0 etc.) of each item for the comparison.
reverse = True sorts in descending order.
If the list is fairly short then as others have suggested you can sort it and slice it. If the list is very large then you may be better using heapq.nlargest():
>>> import heapq
>>> lis = [['Mumbai', 98.3], ['London', 23.23], ['Agra', 12.22]]
>>> heapq.nlargest(2, lis, key=lambda x:x[1])
[['Mumbai', 98.3], ['London', 23.23]]
The difference is that nlargest only makes a single pass through the data. In fact, if you are reading from a file or another generated source, the records need not all be in memory at the same time.
You might also be interested to look at the source for nlargest() as it works in much the same way that you were trying to solve the problem: it keeps only the desired number of elements in a data structure known as a heap and each new value is pushed into the heap then the smallest value is popped from the heap.
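To make that description concrete, here is a simplified sketch of the idea. It is not the actual heapq.nlargest source, and the function name top_n is invented for the example.

```python
import heapq

def top_n(records, n):
    """Return the n records with the largest value, highest first."""
    heap = []  # min-heap that never holds more than n entries
    for city, value in records:
        entry = (value, city)  # value first, so the heap orders by value
        if len(heap) < n:
            heapq.heappush(heap, entry)
        else:
            # Push the new entry, then drop the smallest of the n + 1.
            heapq.heappushpop(heap, entry)
    return [[city, value] for value, city in sorted(heap, reverse=True)]

lis = [['Mumbai', 98.3], ['London', 23.23], ['Agra', 12.22]]
print(top_n(lis, 2))  # [['Mumbai', 98.3], ['London', 23.23]]
```

Because the heap never grows beyond n entries, records can be any iterable, including a generator that reads from a file.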
Edit to show comparative timing:
>>> import random
>>> import timeit
>>> records = []
>>> for i in range(100000):
...     value = random.random() * 100
...     records.append(('city {:2.4f}'.format(value), value))
...
>>> import heapq
>>> heapq.nlargest(10, records, key=lambda x:x[1])
[('city 99.9995', 99.99948904248298), ('city 99.9974', 99.99738898315216), ('city 99.9964', 99.99642759230214), ('city 99.9935', 99.99345173704319), ('city 99.9916', 99.99162694442714), ('city 99.9908', 99.99075084123544), ('city 99.9887', 99.98865134685201), ('city 99.9879', 99.98792632193258), ('city 99.9872', 99.98724339718686), ('city 99.9854', 99.98540548350132)]
>>> timeit.timeit('sorted(records, key=lambda x:x[1])[:10]', setup='from __main__ import records', number=10)
1.388942152229788
>>> timeit.timeit('heapq.nlargest(10, records, key=lambda x:x[1])', setup='import heapq;from __main__ import records', number=10)
0.5476185073315492
On my system getting the top 10 from 100 records is fastest by sorting and slicing, but with 1,000 or more records it is faster to use nlargest.
You have to convert your input into something Python can handle easily:
with open('input.txt') as inputFile:
    lines = inputFile.readlines()
# lines[1:] skips the "City Percentage" header line
records = [ line.split() for line in lines[1:] ]
records = [ [float(percentage), city] for city, percentage in records ]
Now the records contain a list of the entries like this:
[ [ 98.3, 'Mumbai' ], [ 23.23, 'London' ], [ 12.22, 'Agra' ] ]
You can sort that list in-place, highest percentage first:
records.sort(reverse=True)
You can print the top ten by slicing:
print records[0:10]
If you have a huge list (e. g. millions of entries) and just want the top ten of these in a sorted way, there are better ways than sorting the whole list (which would be a waste of time then).
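One such way, sketched here under the assumption that the file looks like the sample in the question, is to feed the lines to heapq.nlargest through a generator so that the full list is never built. An io.StringIO object stands in for the real file so the example is self-contained.

```python
import heapq
import io

# Stand-in for open('input.txt'); the contents mirror the question's sample.
input_file = io.StringIO(
    "City Percentage\n"
    "Mumbai 98.30\n"
    "London 23.23\n"
    "Agra 12.22\n"
)

next(input_file)  # skip the header line

# Generator: one (percentage, city) pair at a time, nothing stored up front.
records = ((float(percentage), city)
           for city, percentage in (line.split() for line in input_file))

top = heapq.nlargest(2, records)
print(top)  # [(98.3, 'Mumbai'), (23.23, 'London')]
```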
For printing the top 10 cities, sort the list first and then slice it:
>>> lis = [['Mumbai', 98.3], ['London', 23.23], ['Agra', 12.22]]
>>> [k[0] for k in sorted(lis, key = lambda x : x[1], reverse = True)[:10]]
['Mumbai', 'London', 'Agra']
For the given list
>>> lis=[("<ServiceCenter: DELHI-DLC>", 100.0),("<ServiceCenter: DELHI-DLW>", 92.307692307692307),("<ServiceCenter: DELHI-DLE>", 75.0),("<ServiceCenter: DELHI-DLN>", 90.909090909090907),("<ServiceCenter: DELHI-DLS>", 83.333333333333343)]
>>> t=[k[0] for k in sorted(lis, key = lambda x : x[1], reverse = True)[:10]]
>>> print t
['<ServiceCenter: DELHI-DLC>',
'<ServiceCenter: DELHI-DLW>',
'<ServiceCenter: DELHI-DLN>',
'<ServiceCenter: DELHI-DLS>',
'<ServiceCenter: DELHI-DLE>']
The sorted function returns a new sorted list; the key function extracts the value from each element that is used for the comparison.