Sort the top ten results - python

I am getting a list in which I save the results in the following way:
City Percentage
Mumbai 98.30
London 23.23
Agra 12.22
.....
List structure is [["Mumbai",98.30],["London",23.23]..]
I am saving these records in a list. I need to sort the list and keep only the top ten records. Even if I only get the city names, that would be fine.
I am trying to use the following logic, but it fails to provide accurate data:
if (condition):
    if b not in top_ten:
        top_ten.append(b)
        top_ten.remove(tmp)
Any other solution or approach is also welcome.
EDIT 1
for a in sc_percentage:
    print a
The list I am getting:
(<ServiceCenter: DELHI-DLC>, 100.0)
(<ServiceCenter: DELHI-DLE>, 75.0)
(<ServiceCenter: DELHI-DLN>, 90.909090909090907)
(<ServiceCenter: DELHI-DLS>, 83.333333333333343)
(<ServiceCenter: DELHI-DLW>, 92.307692307692307)

Sort the list first and then slice it:
>>> lis = [['Mumbai', 98.3], ['London', 23.23], ['Agra', 12.22]]
>>> print sorted(lis, key = lambda x : x[1], reverse = True)[:10] #[:10] returns first ten items
[['Mumbai', 98.3], ['London', 23.23], ['Agra', 12.22]]
To get data in list form from that file use this:
with open('abc') as f:
    next(f)  # skip header
    lis = [[city, float(val)] for city, val in (line.split() for line in f)]
    print lis
    #[['Mumbai', 98.3], ['London', 23.23], ['Agra', 12.22]]
Update:
new_lis = sorted(sc_percentage, key=lambda x: x[1], reverse=True)[:10]
for item in new_lis:
    print item
sorted returns a new sorted list; since we need to sort based on the second item of each element, we use the key parameter.
key=lambda x: x[1] means use the value at index 1 (i.e. 100.0, 75.0, etc.) of each item for comparison.
reverse=True is used for reverse (descending) sorting.
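If you'd rather not write a lambda, operator.itemgetter can serve as the key instead; a small equivalent sketch:
>>> from operator import itemgetter
>>> lis = [['Mumbai', 98.3], ['London', 23.23], ['Agra', 12.22]]
>>> sorted(lis, key=itemgetter(1), reverse=True)[:10]
[['Mumbai', 98.3], ['London', 23.23], ['Agra', 12.22]]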

If the list is fairly short then, as others have suggested, you can sort it and slice it. If the list is very large then you may be better off using heapq.nlargest():
>>> import heapq
>>> lis = [['Mumbai', 98.3], ['London', 23.23], ['Agra', 12.22]]
>>> heapq.nlargest(2, lis, key=lambda x:x[1])
[['Mumbai', 98.3], ['London', 23.23]]
The difference is that nlargest makes only a single pass through the data, and in fact if you are reading from a file or another generated source, the values need not all be in memory at the same time.
You might also be interested to look at the source for nlargest(), as it works in much the same way that you were trying to solve the problem: it keeps only the desired number of elements in a data structure known as a heap; each new value is pushed onto the heap, and then the smallest value is popped from the heap.
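As a rough illustration of that idea (a sketch with a hypothetical top_n helper, not the actual heapq source), a manual top-N pass might look like this:
import heapq

def top_n(iterable, n, key=lambda item: item):
    heap = []                              # the smallest kept item sits at heap[0]
    for item in iterable:
        pair = (key(item), item)
        if len(heap) < n:
            heapq.heappush(heap, pair)
        elif pair > heap[0]:
            heapq.heappushpop(heap, pair)  # push the new item, drop the current smallest
    return [item for _, item in sorted(heap, reverse=True)]

lis = [['Mumbai', 98.3], ['London', 23.23], ['Agra', 12.22]]
print top_n(lis, 2, key=lambda x: x[1])
# [['Mumbai', 98.3], ['London', 23.23]]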
Edit to show comparative timing:
>>> import random
>>> records = []
>>> for i in range(100000):
...     value = random.random() * 100
...     records.append(('city {:2.4f}'.format(value), value))
...
>>> import heapq
>>> import timeit
>>> heapq.nlargest(10, records, key=lambda x:x[1])
[('city 99.9995', 99.99948904248298), ('city 99.9974', 99.99738898315216), ('city 99.9964', 99.99642759230214), ('city 99.9935', 99.99345173704319), ('city 99.9916', 99.99162694442714), ('city 99.9908', 99.99075084123544), ('city 99.9887', 99.98865134685201), ('city 99.9879', 99.98792632193258), ('city 99.9872', 99.98724339718686), ('city 99.9854', 99.98540548350132)]
>>> timeit.timeit('sorted(records, key=lambda x:x[1])[:10]', setup='from __main__ import records', number=10)
1.388942152229788
>>> timeit.timeit('heapq.nlargest(10, records, key=lambda x:x[1])', setup='import heapq;from __main__ import records', number=10)
0.5476185073315492
On my system getting the top 10 from 100 records is fastest by sorting and slicing, but with 1,000 or more records it is faster to use nlargest.

You have to convert your input into something Python can handle easily:
with open('input.txt') as inputFile:
    lines = inputFile.readlines()
records = [ line.split() for line in lines ]
records = [ [float(percentage), city] for city, percentage in records ]
Now the records contain a list of the entries like this:
[ [ 98.3, 'Mumbai' ], [ 23.23, 'London' ], [ 12.22, 'Agra' ] ]
You can sort that list in place, in descending order so that the largest percentages come first:
records.sort(reverse=True)
You can print the top ten by slicing:
print records[0:10]
If you have a huge list (e.g. millions of entries) and just want the top ten in sorted order, there are better ways than sorting the whole list (which would be a waste of time).
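One such way, as another answer here also shows, is heapq.nlargest, which keeps only the best ten records seen so far; a small sketch reusing the records list built above:
import heapq

top_ten = heapq.nlargest(10, records)   # records are [percentage, city] pairs, so this compares by percentage first
print top_ten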

For printing the top 10 cities you can use the following.
Sort the list first and then slice it:
>>> lis = [['Mumbai', 98.3], ['London', 23.23], ['Agra', 12.22]]
>>> [k[0] for k in sorted(lis, key = lambda x : x[1], reverse = True)[:10]]
['Mumbai', 'London', 'Agra']
For the given list
>>> lis = [("<ServiceCenter: DELHI-DLC>", 100.0), ("<ServiceCenter: DELHI-DLW>", 92.307692307692307), ("<ServiceCenter: DELHI-DLE>", 75.0), ("<ServiceCenter: DELHI-DLN>", 90.909090909090907), ("<ServiceCenter: DELHI-DLS>", 83.333333333333343)]
>>> t = [k[0] for k in sorted(lis, key=lambda x: x[1], reverse=True)[:10]]
>>> print t
['<ServiceCenter: DELHI-DLC>',
'<ServiceCenter: DELHI-DLW>',
'<ServiceCenter: DELHI-DLN>',
'<ServiceCenter: DELHI-DLS>',
'<ServiceCenter: DELHI-DLE>']
The sorted function returns a new sorted list; key is not a compare function but a function that extracts the value to sort by (here, the percentage).

Related

how to find element inside list of lists? [duplicate]

Let's say I have a list of tuples like this:
l = [('music','300','url'),('movie','400','url'),
('clothing','250','url'),('music','350','url'),
('music','400','url'),('movie','1000','url')]
and that I want to sort these tuples into multiple lists, each grouped by the first element in the tuples. Further, once grouped into those lists, I want the new lists reverse sorted by the second element (the int). So, the result would be:
music = [('music','400','url'),('music','350','url'),('music','300','url')]
movie = [('movie','1000','url'),('movie','400','url')]
clothing = [('clothing','250','url')]
Perhaps I could forego the multiple lists and make a list of lists of tuples? So, I would get:
sortedlist = [[('music','400','url'),('music','350','url'),('music','300','url')],
[('movie','1000','url'),('movie','400','url')],
[('clothing','250','url')]]
But even in this case, how would I get the internal lists reverse sorted by the second element?
If I'm going about this the wrong way, please mention it. I'm still new at Python. Thx!
Well, you can get your lists easily with a list comprehension:
music = [x for x in l if x[0] == 'music']
movie = [x for x in l if x[0] == 'movie']
clothing = [x for x in l if x[0] == 'clothing']
You can even sort them in place:
>>> music.sort(key=lambda x: x[1], reverse=True)
>>> music
[('music', '400', 'url'), ('music', '350', 'url'), ('music', '300', 'url')]
I'd just use a dict, personally. Simple data structures are best.
from collections import defaultdict
d = defaultdict(list)
for x in l:
    d[x[0]].append(x[1:])
Which would give you something like:
>>> for k, v in d.iteritems():
...     print k, v
...
movie [('400', 'url'), ('1000', 'url')]
clothing [('250', 'url')]
music [('300', 'url'), ('350', 'url'), ('400', 'url')]
But then that's my solution for everything so maybe I need to branch out a little.
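If you also want each grouped list reverse-sorted by the numeric field, as the question asks, a small follow-up sketch on the d built above (the counts are still strings, hence the int()):
for k in d:
    d[k].sort(key=lambda t: int(t[0]), reverse=True)
# d['music'] is now [('400', 'url'), ('350', 'url'), ('300', 'url')]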
You can do something like this:
import itertools
import operator
sorted_l = sorted(l, key=lambda x: (x[0], int(x[1])), reverse=True)
print [list(g[1]) for g in itertools.groupby(sorted_l, key=operator.itemgetter(0))]
Output:
[[('music', '400', 'url'), ('music', '350', 'url'), ('music', '300', 'url')],
[('movie', '1000', 'url'), ('movie', '400', 'url')],
[('clothing', '250', 'url')]]
What I would do in a case like this is a dictionary of lists.
things = {}
for tup in all_tuples:        # 'tup' rather than 'tuple', to avoid shadowing the built-in
    key = tup[0]
    if key not in things:
        things[key] = []      # Initialize empty list
    things[key].append(tup)
Then you can iterate through "things" using things.keys() or things.values()
E.g. things["music"] would then hold all the 'music' tuples, in their original order (sort each list afterwards if you want them ranked):
things["music"] == [('music', '300', 'url'), ('music', '350', 'url'), ('music', '400', 'url')]

Turning numpy array into list of lists without zip

I want to turn my array, which consists of 2 lists, into a ranked list.
Currently my code produces:
[['txt1.txt' 'txt2.txt' 'txt3.txt' 'txt4.txt' 'txt5.txt' 'txt6.txt'
'txt7.txt' 'txt8.txt']
['0.13794219565502694' '0.024652340886571225' '0.09806335128916213'
'0.07663118536707426' '0.09118273488073968' '0.06278926571143634'
'0.05114729750522118' '0.02961812647701087']]
I want to make it so that txt1.txt goes with the first value, txt2 goes with the second value etc.
So something like this
[['txt1.txt', '0.13794219565502694'], ['txt2.txt', '0.024652340886571225']... etc ]]
I do not want it to become tuples by using zip.
My current code:
def rankedmatrix():
    matrix = numpy.array([names, x])
    ranked_matrix = sorted(matrix.tolist(), key=lambda score: score[1], reverse=True)
    print(ranked_matrix)
Names being :
names = ['txt1.txt', 'txt2.txt', 'txt3.txt', 'txt4.txt', 'txt5.txt', 'txt6.txt', 'txt7.txt', 'txt8.txt']
x being:
x = [0.1379422 0.01540234 0.09806335 0.07663119 0.09118273 0.06278927
0.0511473 0.02961813]
Any help is appreciated.
You can get the list of lists with zip as well:
x = [['txt1.txt', 'txt2.txt', 'txt3.txt', 'txt4.txt', 'txt5.txt', 'txt6.txt',
'txt7.txt', 'txt8.txt'], ['0.13794219565502694', '0.024652340886571225', '0.09806335128916213',
'0.07663118536707426', '0.09118273488073968', '0.06278926571143634',
'0.05114729750522118', '0.02961812647701087']]
res = [[e1, e2] for e1, e2 in zip(x[0], x[1])]
print(res)
Output:
[['txt1.txt', '0.13794219565502694'], ['txt2.txt', '0.024652340886571225'], ['txt3.txt', '0.09806335128916213'], ['txt4.txt', '0.07663118536707426'], ['txt5.txt', '0.09118273488073968'], ['txt6.txt', '0.06278926571143634'], ['txt7.txt', '0.05114729750522118'], ['txt8.txt', '0.02961812647701087']]
You can use map to convert each tuple to a list.
list(map(list, zip(names, x)))
[['txt1.txt', 0.1379422],
['txt2.txt', 0.01540234],
['txt3.txt', 0.09806335],
['txt4.txt', 0.07663119],
['txt5.txt', 0.09118273],
['txt6.txt', 0.06278927],
['txt7.txt', 0.0511473],
['txt8.txt', 0.02961813]]
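Since the data already sits in a numpy array, you could also skip zip entirely and transpose the array; a sketch (mixing strings and floats in one array turns everything into strings, so the score is converted back to float for sorting):
import numpy as np

names = ['txt1.txt', 'txt2.txt', 'txt3.txt', 'txt4.txt',
         'txt5.txt', 'txt6.txt', 'txt7.txt', 'txt8.txt']
x = [0.1379422, 0.01540234, 0.09806335, 0.07663119,
     0.09118273, 0.06278927, 0.0511473, 0.02961813]

matrix = np.array([names, x])     # 2 x 8 array, everything stored as strings
pairs = matrix.T.tolist()         # transpose: one ['name', 'score'] pair per row
ranked = sorted(pairs, key=lambda p: float(p[1]), reverse=True)
print(ranked)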

Sort list of dictionaries by value, regardless of keyname in python

I have a list of single entry dictionaries. Each dictionary has only 1 key and 1 value. I'd like to sort the list of dictionaries by these values REGARDLESS of the keyname! The key names are both the same and different from dictionary to dictionary.
All of the online examples I have seen assume the same key name across dictionaries. These types of examples have not worked for me because they assume the same key name:
newlist = sorted(list_to_be_sorted, key=lambda k: k['name'])
In my example, I need to compare the values regardless of whether the key is bob or sarah; and order the list of dictionaries. Here's an example list of dictionaries:
Times = [{"Bob":14.05}, {"Tim":15.09}, {"Tim":17.01}, {"Bob":16.81}, {"Sarah":15.08}]
desired output:
[{"Bob":14.05}, {"Sarah":15.08}, {"Tim":15.09}, {"Bob":16.81}, {"Tim":1701}]
times = [{"Bob":14.05},{"Tim":15.09},{"Tim":17.01},{"Bob":16.81},{"Sarah":15.08}]
print sorted(times, key=lambda k: k.values())
Output
[{'Bob': 14.05},{'Sarah': 15.08}, {'Tim': 15.09}, {'Bob': 16.81}, {'Tim': 17.01}]
If there are multiple values in the values list and if you want to consider only the elements at particular index, then you can do
print sorted(times, key=lambda k: k.values()[0])
What about:
newlist = sorted(Times, key=lambda k: k.values()[0])
It keys off the first (and only) element of the dictionary's .values().
@thefourtheye - your answer is quite nice.
Want to highlight a subtle and IMO interesting thing for folks new to python. Consider this tweak to thefourtheye's answer:
times = [{"Bob":14.05},{"Tim":15.09},{"Tim":17.01},{"Bob":16.81},{"Sarah":15.08}]
print sorted(times, key=lambda k: k.itervalues().next())
Which yields the same result:
[{'Bob': 14.05}, {'Sarah': 15.08}, {'Tim': 15.09}, {'Bob': 16.81}, {'Tim': 17.01}]
The tweak avoids the creation of an intermediate, unnecessary list. By using the iterator itervalues() and then taking just the first value (via .next()), the sort compares the raw value without building that list.
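For what it's worth, itervalues() and .next() are Python 2 idioms; under Python 3 the same trick would look something like this (a sketch):
times = [{"Bob": 14.05}, {"Tim": 15.09}, {"Tim": 17.01}, {"Bob": 16.81}, {"Sarah": 15.08}]
print(sorted(times, key=lambda k: next(iter(k.values()))))
# [{'Bob': 14.05}, {'Sarah': 15.08}, {'Tim': 15.09}, {'Bob': 16.81}, {'Tim': 17.01}]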
Let's look at performance:
import random
import timeit

test_cases = [
    [],
    [{"Bob":14.05}],
    [{"Bob":14.05},{"Tim":15.09},{"Tim":17.01},{"Bob":16.81},{"Sarah":15.08}],
    [dict(zip((str(x) for x in xrange(50)), random.sample(xrange(1000), 50)))]  # a single 50-key dict in a list
]

print "perf test"
for test_case in test_cases:
    print test_case
    print "k.values() :", timeit.repeat(
        "sorted(test_case, key=lambda k: k.values())",
        "from __main__ import test_case",
    )
    print "k.itervalues().next():", timeit.repeat(
        "sorted(test_case, key=lambda k: k.itervalues().next())",
        "from __main__ import test_case",
    )
    print
results:
[]
k.values() : [0.7124178409576416, 0.7222259044647217, 0.7217190265655518]
k.itervalues().next(): [0.7274281978607178, 0.7140758037567139, 0.7135159969329834]
[{'Bob': 14.05}]
k.values() : [1.3001079559326172, 1.395097017288208, 1.314589023590088]
k.itervalues().next(): [1.2579071521759033, 1.2594029903411865, 1.2587871551513672]
[{'Bob': 14.05}, {'Tim': 15.09}, {'Tim': 17.01}, {'Bob': 16.81}, {'Sarah': 15.08}]
k.values() : [3.1186227798461914, 3.107577085494995, 3.1108040809631348]
k.itervalues().next(): [2.8267030715942383, 2.9143049716949463, 2.8211638927459717]
[{'42': 771, '48': 129, '43': 619, '49': 450, --- SNIP --- , '33': 162, '32': 764}]
k.values() : [1.5659689903259277, 1.6058270931243896, 1.5724899768829346]
k.itervalues().next(): [1.29836106300354, 1.2615361213684082, 1.267350196838379]
Mind you, perf will often not matter, but given that the two solutions are similar in terms of readability and expressiveness, I think it's good to understand the latter solution and build habits in those terms.

Finding index of values in a list dynamically

I have two lists as follows:
list_1
['A-1','A-1','A-1','A-2','A-2','A-3']
list_2
['iPad','iPod','iPhone','Windows','X-box','Kindle']
I would like to split the list_2 based on the index values in list_1. For instance,
list_a1
['iPad','iPod','iPhone']
list_a2
['Windows','X-box']
list_a3
['Kindle']
I know the index method, but it needs the value to be matched to be passed in. In this case, I would like to dynamically find the indexes of the items in list_1 that share the same value. Is this possible? Any tips/hints would be deeply appreciated.
Thanks.
There are a few ways to do this.
I'd do it by using zip and groupby.
First:
>>> list(zip(list_1, list_2))
[('A-1', 'iPad'),
('A-1', 'iPod'),
('A-1', 'iPhone'),
('A-2', 'Windows'),
('A-2', 'X-box'),
('A-3', 'Kindle')]
Now:
>>> import itertools, operator
>>> [(key, list(group)) for key, group in
... itertools.groupby(zip(list_1, list_2), operator.itemgetter(0))]
[('A-1', [('A-1', 'iPad'), ('A-1', 'iPod'), ('A-1', 'iPhone')]),
('A-2', [('A-2', 'Windows'), ('A-2', 'X-box')]),
('A-3', [('A-3', 'Kindle')])]
So, you just want each group, ignoring the key, and you only want the second element of each element in the group. You can get the second element of each group with another comprehension, or just by unzipping:
>>> [list(zip(*group))[1] for key, group in
... itertools.groupby(zip(list_1, list_2), operator.itemgetter(0))]
[('iPad', 'iPod', 'iPhone'), ('Windows', 'X-box'), ('Kindle',)]
I would personally find this more readable as a sequence of separate iterator transformations than as one long expression. Taken to the extreme:
>>> ziplists = zip(list_1, list_2)
>>> pairs = itertools.groupby(ziplists, operator.itemgetter(0))
>>> groups = (group for key, group in pairs)
>>> values = (zip(*group)[1] for group in groups)
>>> [list(value) for value in values]
… but a happy medium of maybe 2 or 3 lines is usually better than either extreme.
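For example, one possible middle ground, reusing list_1, list_2 and the imports from above (a sketch):
pairs = itertools.groupby(zip(list_1, list_2), operator.itemgetter(0))
result = [[item for _, item in group] for _, group in pairs]
# [['iPad', 'iPod', 'iPhone'], ['Windows', 'X-box'], ['Kindle']]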
Usually I'm the one rushing to a groupby solution ;^) but here I'll go the other way and manually insert into an OrderedDict:
list_1 = ['A-1','A-1','A-1','A-2','A-2','A-3']
list_2 = ['iPad','iPod','iPhone','Windows','X-box','Kindle']
from collections import OrderedDict
d = OrderedDict()
for code, product in zip(list_1, list_2):
    d.setdefault(code, []).append(product)
produces a d looking like
>>> d
OrderedDict([('A-1', ['iPad', 'iPod', 'iPhone']),
('A-2', ['Windows', 'X-box']), ('A-3', ['Kindle'])])
with easy access:
>>> d["A-2"]
['Windows', 'X-box']
and we can get the list-of-lists in list_1 order using .values():
>>> d.values()
[['iPad', 'iPod', 'iPhone'], ['Windows', 'X-box'], ['Kindle']]
If you've noticed that no one is telling you how to make a bunch of independent lists with names like list_a1 and so on-- that's because that's a bad idea. You want to keep the data together in something which you can (at a minimum) iterate over easily, and both dictionaries and list of lists qualify.
Maybe something like this?
#!/usr/local/cpython-3.3/bin/python
import pprint
import collections
def main():
    list_1 = ['A-1','A-1','A-1','A-2','A-2','A-3']
    list_2 = ['iPad','iPod','iPhone','Windows','X-box','Kindle']
    result = collections.defaultdict(list)
    for list_1_element, list_2_element in zip(list_1, list_2):
        result[list_1_element].append(list_2_element)
    pprint.pprint(result)

main()
Using itertools.izip_longest and itertools.groupby:
>>> from itertools import groupby, izip_longest
>>> inds = [next(g)[0] for k, g in groupby(enumerate(list_1), key=lambda x:x[1])]
First group items of list_1 and find the starting index of each group:
>>> inds
[0, 3, 5]
Now use slicing and izip_longest as we need pairs list_2[0:3], list_2[3:5], list_2[5:]:
>>> [list_2[x:y] for x, y in izip_longest(inds, inds[1:])]
[['iPad', 'iPod', 'iPhone'], ['Windows', 'X-box'], ['Kindle']]
To get a dict of lists instead, you can do something like:
>>> inds = [next(g) for k, g in groupby(enumerate(list_1), key=lambda x:x[1])]
>>> {k: list_2[ind1: ind2[0]] for (ind1, k), ind2 in
...     izip_longest(inds, inds[1:], fillvalue=[None])}
{'A-1': ['iPad', 'iPod', 'iPhone'], 'A-3': ['Kindle'], 'A-2': ['Windows', 'X-box']}
You could do this if you want simple code; it's not pretty, but it gets the job done.
list_1 = ['A-1','A-1','A-1','A-2','A-2','A-3']
list_2 = ['iPad','iPod','iPhone','Windows','X-box','Kindle']
list_1a = []
list_1b = []
list_1c = []
place = 0
for i in list_1:
    if list_1[place] == 'A-1':
        list_1a.append(list_2[place])
    elif list_1[place] == 'A-2':
        list_1b.append(list_2[place])
    else:
        list_1c.append(list_2[place])
    place += 1
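For what it's worth, the same idea reads a little more cleanly with zip instead of a manual counter; a sketch with the same behaviour assumed:
list_1a, list_1b, list_1c = [], [], []
for code, item in zip(list_1, list_2):
    if code == 'A-1':
        list_1a.append(item)
    elif code == 'A-2':
        list_1b.append(item)
    else:
        list_1c.append(item)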

How can I calculate the average of a list of tuples in python?

I have a list of tuples in the format:
[(security, price paid, number of shares purchased)....]
[('MSFT', '$39.458', '1,000'), ('AAPL', '$638.416', '200'), ('FOSL', '$52.033', '1,000'), ('OCZ', '$5.26', '34,480'), ('OCZ', '$5.1571', '5,300')]
I want to consolidate the data, such that each security is only listed once.
[(Name of Security, Average Price Paid, Number of shares owned), ...]
I used a dictionary for the output.
lis = [('MSFT', '$39.458', '1,000'), ('AAPL', '$638.416', '200'), ('FOSL', '$52.033', '1,000'), ('OCZ', '$5.26', '34,480'), ('OCZ', '$5.1571', '5,300')]
dic = {}
for x in lis:
    if x[0] not in dic:
        price = float(x[1].strip('$'))
        nos = int("".join(x[2].split(',')))
        #print(nos)
        dic[x[0]] = [price, nos]
    else:
        price = float(x[1].strip('$'))
        nos = int("".join(x[2].split(',')))
        dic[x[0]][1] += nos
        dic[x[0]][0] = (dic[x[0]][0] + price) / 2  # pairwise average of the prices seen so far
print(dic)
output:
{'AAPL': [638.416, 200], 'OCZ': [5.20855, 39780], 'FOSL': [52.033, 1000], 'MSFT': [39.458, 1000]}
It's not very clear what you're trying to do. Some example code would help, along with some information about what you've tried. Even if your approach is dead wrong, it'll give us a vague idea of what you're aiming for.
In the meantime, perhaps numpy's numpy.mean function is appropriate for your problem? I would suggest transforming your list of tuples into a numpy array and then applying the mean function on a slice of said array.
That said, it does work on any list-like data structure, and you can specify along which axis you would like to perform the average.
http://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html
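For illustration only (made-up numbers, not the asker's data), np.mean with an axis argument averages column-wise like this:
import numpy as np

prices = np.array([[39.458, 1000.0],
                   [52.033, 1000.0]])
print(np.mean(prices, axis=0))   # column means: 45.7455 and 1000.0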
EDIT:
From what I've gathered, your list of tuples organizes data in the following manner:
(name, dollar amount, weight)
I'd start by using numpy to transform your list of tuples into an array. From there, find the unique values in the first column (the names):
import numpy as np
a = np.array([('tag', 23.00, 5), ('tag2', 25.00, 10)])
unique_tags = np.unique(a[:, 0])  # column 0 holds the names; note the slicing of the array
Now calculate the mean for each tag
meandic = {}
for element in unique_tags:
    tags = np.nonzero(a[:, 0] == element)  # identify which rows are tagged with element
    meandic[element] = np.mean([float(t[1]) * float(t[2]) for t in a[tags]])
Please note that this code is untested. I may have gotten small details wrong. If you can't figure something out, just leave a comment and I'll gladly correct my mistake. You'll have to remove '$' and convert strings to floats where necessary.
>>> lis
[('MSFT', '$39.458', '1,000'), ('AAPL', '$638.416', '200'), ('FOSL', '$52.033', '1,000'), ('OCZ', '$5.26', '34,480'), ('OCZ', '$5.1571', '5,300')]
>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> for i in lis:
...     amt = float(i[1].strip('$'))
...     num = int(i[2].replace(",", ""))
...     d[i[0]].append((amt, num))
...
>>> for i in d.iteritems():
...     average_price = sum([s[0] for s in i[1]]) / len([s[0] for s in i[1]])
...     total_shares = sum([s[1] for s in i[1]])
...     print (i[0], average_price, total_shares)
...
('AAPL', 638.416, 200)
('OCZ', 5.20855, 39780)
('FOSL', 52.033, 1000)
('MSFT', 39.458, 1000)
Here you go:
the_list = [('msft', '$31', 5), ('msft','$32', 10), ('aapl', '$100', 1)]
clean_list = map(lambda x: (x[0], float(x[1][1:]), int(x[2])), the_list)
out = {}
for name, price, shares in clean_list:
    if name not in out:
        out[name] = [price * shares, shares]   # track total cost paid and total shares
    else:
        out[name][0] += price * shares
        out[name][1] += shares
# put the output in the requested format
# not forgetting to calculate avg price paid
# out contains total # shares and total price paid
nice_out = [ (name, "$%0.2f" % (out[name][0] / out[name][1]), out[name][1])
             for name in out.keys()]
print nice_out
[('aapl', '$100.00', 1), ('msft', '$31.67', 15)]
