Let's say I have a list of tuples like this:
l = [('music','300','url'),('movie','400','url'),
('clothing','250','url'),('music','350','url'),
('music','400','url'),('movie','1000','url')]
and that I want to sort these tuples into multiple lists, each grouped by the first element in the tuples. Further, once grouped into those lists, I want the new lists reverse sorted by the second element (the int). So, the result would be:
music = [('music','400','url'),('music','350','url'),('music','300','url')]
movie = [('movie','1000','url'),('movie','400','url')]
clothing = [('clothing','250','url')]
Perhaps I could forego the multiple lists and make a list of lists of tuples? So, I would get:
sortedlist = [[('music','400','url'),('music','350','url'),('music','300','url')],
[('movie','1000','url'),('movie','400','url')],
[('clothing','250','url')]]
But even in this case, how would I get the internal lists reverse sorted by the second element?
If I'm going about this the wrong way, please mention it. I'm still new at Python. Thx!
Well, you can get your lists easily with a list comprehension:
music = [x for x in l if x[0] == 'music']
movie = [x for x in l if x[0] == 'movie']
clothing = [x for x in l if x[0] == 'clothing']
You can even sort them in place
>>> music.sort(key=lambda x: x[1], reverse=True)
<<< [('music', '400', 'url'), ('music', '350', 'url'), ('music', '300', 'url')]
I'd just use a dict, personally. Simple data structures are best.
from collections import defaultdict
d = defaultdict(list)
for x in l:
d[x[0]].append(x[1:])
Which would give you something like:
>>> for k,v in d.iteritems():
...: print k, v
...:
...:
movie [('400', 'url'), ('1000', 'url')]
clothing [('250', 'url')]
music [('300', 'url'), ('350', 'url'), ('400', 'url')]
But then that's my solution for everything so maybe I need to branch out a little.
You can do something like this:
import itertools
import operator
sorted_l = sorted(l, key=lambda x: (x[0], int(x[1])), reverse=True)
print [list(g[1]) for g in itertools.groupby(sorted_l, key=operator.itemgetter(0))]
Output :
[[('music', '400', 'url'), ('music', '350', 'url'), ('music', '300', 'url')],
[('movie', '1000', 'url'), ('movie', '400', 'url')],
[('clothing', '250', 'url')]]
What I would do in a case like this is a dictionary of lists.
things = {}
for tuple in all_tuples:
key = tuple[0]
if not key in things:
things[key] = [] # Initialize empty list
things[key].append(tuple)
Then you can iterate through "things" using things.keys() or things.values()
E.g.
things["music"] = [('music','400','url'),('music','350','url'),('music','300','url')]
Related
I have two lists and they are lists of tuples.
For example
List1 = [('zaidan', 0.0013568521031207597),('zimmerman', 0.0013568521031207597), ('ypa', 0.004070556309362279)]
List2 = [('zimmerman', 0.0013568521031207597), ('ypa', 0.004070556309362279), ('zaidan', 0.0013568521031207597)]
If the items were in the same order I could use the following code to multiply the two values:
val = [(t1, v1*v2) for (t1, v1), (t2, v2) in zip(tf,idf)]
But my issue is the order of one the lists outputs randomly so the code doesn't work. So essentially I need to see if the word in one list matches the word in the other and then multiply to get an output in a similar way as the list of tuples.
This question excellently demonstrates the advantages of the dictionary data structure and how your problem could benefit from it. So first, we convert your list of tuples to dictionaries (dict-calls) and then you "combine" the two dicts as per your requirement to get the desired result.
lst1 = [('zaidan', 0.0013568521031207597),('zimmerman', 0.0013568521031207597), ('ypa', 0.004070556309362279)]
lst2 = [('zimmerman', 0.0013568521031207597), ('ypa', 0.004070556309362279), ('zaidan', 0.0013568521031207597)]
dct1 = dict(lst1)
dct2 = dict(lst2)
res = {k: v * dct2.get(k, 1) for k, v in dct1.items()}.items()
which produces:
dict_items([('zaidan', 1.8410476297432288e-06), ('zimmerman', 1.8410476297432288e-06), ('ypa', 1.656942866768906e-05)])
And if the dict_item data type is confusing, you can always cast it to a vanilla-list.
res = list(res)
print(res)
# [('zaidan', 1.8410476297432288e-06), ('zimmerman', 1.8410476297432288e-06), ('ypa', 1.656942866768906e-05)]
i would tell you the easiest solution if your data are the same.
just sort it :
ls1 = sorted(ls1, key=lambda tup: tup[0])
ls2 = sorted(ls2, key=lambda tup: tup[0])
val = [(t1, v1*v2) for (t1, v1), (t2, v2) in zip(ls1,ls2)]
If, for any reason, you do not want to use dictionary (although it is a superior solution) but want to do this with lists and tuples, what you are looking for is looping through the lists and checking for equality:
x = [('zaidan', 0.0013568521031207597),('zimmerman', 0.0013568521031207597), ('ypa', 0.004070556309362279)]
y = [('zimmerman', 0.0013568521031207597), ('ypa', 0.004070556309362279), ('zaidan', 0.0013568521031207597)]
z = []
for item in x:
for _item in y:
if item[0] == _item[0]
z.append((item[0], item[1]*_item[1]))
At the end, z will be a list of tuples with the original string at the 0 index and the result of multiplication at the 1 index.
I am using a tuple to store the output of a find -exec stat command and need to condense it in order to run du on it. The output is a tuple with each item being (username,/path/to/file)
I want to condense it to combine like usernames so the end result is (username,/path/to/file1,/path/to/file2,etc)
Is there any way to do this?
Here is the current code that returns my tuple
cmd = ['find',dir_loc,'-type','f','-exec','stat','-c','%U %n','{}','+']
process = Popen(cmd,stdout=PIPE)
find_out = process.communicate()
exit_code = process.wait()
find_out = find_out[0].split('\n')
out_tuple = []
for item in find_out:
out_tuple.append(item.split(' '))
Assuming you have a list of tuples or a list of lists of the form:
out_tuple = [('user_one', 'path_one'),
('user_three', 'path_seven'),
('user_two', 'path_five'),
('user_one', 'path_two'),
('user_one', 'path_three'),
('user_two', 'path_four')]
You can do:
from itertools import groupby
out_tuple.sort()
total_grouped = []
for key, group in groupby(out_tuple, lambda x: x[0]):
grouped_list = [key] + [x[1] for x in group]
total_grouped.append(tuple(grouped_list))
This will give you the list of tuples:
print total_grouped
# Prints:
# [('user_one', 'path_one', 'path_two', 'path_three'),
# ('user_three', 'path_seven'),
# ('user_two', 'path_five', 'path_four')]
If you started with a list of lists, then instead of:
total_grouped.append(tuple(grouped_list))
You can get rid of the tuple construction:
total_grouped.append(grouped_list)
I'll say one thing though, you might be better off using something like a dict as #BradBeattie suggests. If you're going to perform some operation later on that treats the first item in your tuple (or list) in a special way, then a dict is better.
It not only has a notion of uniqueness in the keys, it's also less cumbersome because the nesting has two distinct levels. First you have the dict, then you have the inner item which is a tuple (or a list). This is much clearer than having two similar collections nested one inside the other.
Just use a dict of lists:
out_tuple = [('user1', 'path1'),
('user1', 'path2'),
('user2', 'path3'),
('user1', 'path4'),
('user2', 'path5'),
('user1', 'path6')]
d={}
for user_name, path in out_tuple:
d.setdefault(user_name, []).append(path)
print d
Prints:
{'user2': ['path3', 'path5'], 'user1': ['path1', 'path2', 'path4', 'path6']}
Then if you want the output for each user name as a tuple:
for user_name in d:
print tuple([user_name]+d[user_name])
Prints:
('user2', 'path3', 'path5')
('user1', 'path1', 'path2', 'path4', 'path6')
I am having two lists as follows:
list_1
['A-1','A-1','A-1','A-2','A-2','A-3']
list_2
['iPad','iPod','iPhone','Windows','X-box','Kindle']
I would like to split the list_2 based on the index values in list_1. For instance,
list_a1
['iPad','iPod','iPhone']
list_a2
['Windows','X-box']
list_a3
['Kindle']
I know index method, but it needs the value to be matched to be passed along with. In this case, I would like to dynamically find the indexes of the values in list_1 with the same value. Is this possible? Any tips/hints would be deeply appreciated.
Thanks.
There are a few ways to do this.
I'd do it by using zip and groupby.
First:
>>> list(zip(list_1, list_2))
[('A-1', 'iPad'),
('A-1', 'iPod'),
('A-1', 'iPhone'),
('A-2', 'Windows'),
('A-2', 'X-box'),
('A-3', 'Kindle')]
Now:
>>> import itertools, operator
>>> [(key, list(group)) for key, group in
... itertools.groupby(zip(list_1, list_2), operator.itemgetter(0))]
[('A-1', [('A-1', 'iPad'), ('A-1', 'iPod'), ('A-1', 'iPhone')]),
('A-2', [('A-2', 'Windows'), ('A-2', 'X-box')]),
('A-3', [('A-3', 'Kindle')])]
So, you just want each group, ignoring the key, and you only want the second element of each element in the group. You can get the second element of each group with another comprehension, or just by unzipping:
>>> [list(zip(*group))[1] for key, group in
... itertools.groupby(zip(list_1, list_2), operator.itemgetter(0))]
[('iPad', 'iPod', 'iPhone'), ('Windows', 'X-box'), ('Kindle',)]
I would personally find this more readable as a sequence of separate iterator transformations than as one long expression. Taken to the extreme:
>>> ziplists = zip(list_1, list_2)
>>> pairs = itertools.groupby(ziplists, operator.itemgetter(0))
>>> groups = (group for key, group in pairs)
>>> values = (zip(*group)[1] for group in groups)
>>> [list(value) for value in values]
… but a happy medium of maybe 2 or 3 lines is usually better than either extreme.
Usually I'm the one rushing to a groupby solution ;^) but here I'll go the other way and manually insert into an OrderedDict:
list_1 = ['A-1','A-1','A-1','A-2','A-2','A-3']
list_2 = ['iPad','iPod','iPhone','Windows','X-box','Kindle']
from collections import OrderedDict
d = OrderedDict()
for code, product in zip(list_1, list_2):
d.setdefault(code, []).append(product)
produces a d looking like
>>> d
OrderedDict([('A-1', ['iPad', 'iPod', 'iPhone']),
('A-2', ['Windows', 'X-box']), ('A-3', ['Kindle'])])
with easy access:
>>> d["A-2"]
['Windows', 'X-box']
and we can get the list-of-lists in list_1 order using .values():
>>> d.values()
[['iPad', 'iPod', 'iPhone'], ['Windows', 'X-box'], ['Kindle']]
If you've noticed that no one is telling you how to make a bunch of independent lists with names like list_a1 and so on-- that's because that's a bad idea. You want to keep the data together in something which you can (at a minimum) iterate over easily, and both dictionaries and list of lists qualify.
Maybe something like this?
#!/usr/local/cpython-3.3/bin/python
import pprint
import collections
def main():
list_1 = ['A-1','A-1','A-1','A-2','A-2','A-3']
list_2 = ['iPad','iPod','iPhone','Windows','X-box','Kindle']
result = collections.defaultdict(list)
for list_1_element, list_2_element in zip(list_1, list_2):
result[list_1_element].append(list_2_element)
pprint.pprint(result)
main()
Using itertools.izip_longest and itertools.groupby:
>>> from itertools import groupby, izip_longest
>>> inds = [next(g)[0] for k, g in groupby(enumerate(list_1), key=lambda x:x[1])]
First group items of list_1 and find the starting index of each group:
>>> inds
[0, 3, 5]
Now use slicing and izip_longest as we need pairs list_2[0:3], list_2[3:5], list_2[5:]:
>>> [list_2[x:y] for x, y in izip_longest(inds, inds[1:])]
[['iPad', 'iPod', 'iPhone'], ['Windows', 'X-box'], ['Kindle']]
To get a list of dicts you can something like:
>>> inds = [next(g) for k, g in groupby(enumerate(list_1), key=lambda x:x[1])]
>>> {k: list_2[ind1: ind2[0]] for (ind1, k), ind2 in
zip_longest(inds, inds[1:], fillvalue=[None])}
{'A-1': ['iPad', 'iPod', 'iPhone'], 'A-3': ['Kindle'], 'A-2': ['Windows', 'X-box']}
You could do this if you want simple code, it's not pretty, but gets the job done.
list_1 = ['A-1','A-1','A-1','A-2','A-2','A-3']
list_2 = ['iPad','iPod','iPhone','Windows','X-box','Kindle']
list_1a = []
list_1b = []
list_1c = []
place = 0
for i in list_1[::1]:
if list_1[place] == 'A-1':
list_1a.append(list_2[place])
elif list_1[place] == 'A-2':
list_1b.append(list_2[place])
else:
list_1c.append(list_2[place])
place += 1
Although poorly written, this code:
marker_array = [['hard','2','soft'],['heavy','2','light'],['rock','2','feather'],['fast','3'], ['turtle','4','wet']]
marker_array_DS = []
for i in range(len(marker_array)):
if marker_array[i-1][1] != marker_array[i][1]:
marker_array_DS.append(marker_array[i])
print marker_array_DS
Returns:
[['hard', '2', 'soft'], ['fast', '3'], ['turtle', '4', 'wet']]
It accomplishes part of the task which is to create a new list containing all nested lists except those that have duplicate values in index [1]. But what I really need is to concatenate the matching index values from the removed lists creating a list like this:
[['hard heavy rock', '2', 'soft light feather'], ['fast', '3'], ['turtle', '4', 'wet']]
The values in index [1] must not be concatenated. I kind of managed to do the concatenation part using a tip from another post:
newlist = [i + n for i, n in zip(list_a, list_b]
But I am struggling with figuring out the way to produce the desired result. The "marker_array" list will be already sorted in ascending order before being passed to this code. All like-values in index [1] position will be contiguous. Some nested lists may not have any values beyond [0] and [1] as illustrated above.
Quick stab at it... use itertools.groupby to do the grouping for you, but do it over a generator that converts the 2 element list into a 3 element.
from itertools import groupby
from operator import itemgetter
marker_array = [['hard','2','soft'],['heavy','2','light'],['rock','2','feather'],['fast','3'], ['turtle','4','wet']]
def my_group(iterable):
temp = ((el + [''])[:3] for el in marker_array)
for k, g in groupby(temp, key=itemgetter(1)):
fst, snd = map(' '.join, zip(*map(itemgetter(0, 2), g)))
yield filter(None, [fst, k, snd])
print list(my_group(marker_array))
from collections import defaultdict
d1 = defaultdict(list)
d2 = defaultdict(list)
for pxa in marker_array:
d1[pxa[1]].extend(pxa[:1])
d2[pxa[1]].extend(pxa[2:])
res = [[' '.join(d1[x]), x, ' '.join(d2[x])] for x in sorted(d1)]
If you really need 2-tuples (which I think is unlikely):
for p in res:
if not p[-1]:
p.pop()
marker_array = [['hard','2','soft'],['heavy','2','light'],['rock','2','feather'],['fast','3'], ['turtle','4','wet']]
marker_array_DS = []
marker_array_hit = []
for i in range(len(marker_array)):
if marker_array[i][1] not in marker_array_hit:
marker_array_hit.append(marker_array[i][1])
for i in marker_array_hit:
lists = [item for item in marker_array if item[1] == i]
temp = []
first_part = ' '.join([str(item[0]) for item in lists])
temp.append(first_part)
temp.append(i)
second_part = ' '.join([str(item[2]) for item in lists if len(item) > 2])
if second_part != '':
temp.append(second_part);
marker_array_DS.append(temp)
print marker_array_DS
I learned python for this because I'm a shameless rep whore
marker_array = [
['hard','2','soft'],
['heavy','2','light'],
['rock','2','feather'],
['fast','3'],
['turtle','4','wet'],
]
data = {}
for arr in marker_array:
if len(arr) == 2:
arr.append('')
(first, index, last) = arr
firsts, lasts = data.setdefault(index, [[],[]])
firsts.append(first)
lasts.append(last)
results = []
for key in sorted(data.keys()):
current = [
" ".join(data[key][0]),
key,
" ".join(data[key][1])
]
if current[-1] == '':
current = current[:-1]
results.append(current)
print results
--output:--
[['hard heavy rock', '2', 'soft light feather'], ['fast', '3'], ['turtle', '4', 'wet']]
A different solution based on itertools.groupby:
from itertools import groupby
# normalizes the list of markers so all markers have 3 elements
def normalized(markers):
for marker in markers:
yield marker + [""] * (3 - len(marker))
def concatenated(markers):
# use groupby to iterator over lists of markers sharing the same key
for key, markers_in_category in groupby(normalized(markers), lambda m: m[1]):
# get separate lists of left and right words
lefts, rights = zip(*[(m[0],m[2]) for m in markers_in_category])
# remove empty strings from both lists
lefts, rights = filter(bool, lefts), filter(bool, rights)
# yield the concatenated entry for this key (also removing the empty string at the end, if necessary)
yield filter(bool, [" ".join(lefts), key, " ".join(rights)])
The generator concatenated(markers) will yield the results. This code correctly handles the ['fast', '3'] case and doesn't return an additional third element in such cases.
I am getting a list in which I am saving the results in the following way
City Percentage
Mumbai 98.30
London 23.23
Agra 12.22
.....
List structure is [["Mumbai",98.30],["London",23.23]..]
I am saving this records in form of a list.I need the list to be sort top_ten records.Even if I get cities also, it would be fine.
I am trying to use the following logic, but it fails for to provide accurate data
if (condition):
if b not in top_ten:
top_ten.append(b)
top_ten.remove(tmp)
Any other solution,approach is also welcome.
EDIT 1
for a in sc_percentage:
print a
List I am getting
(<ServiceCenter: DELHI-DLC>, 100.0)
(<ServiceCenter: DELHI-DLE>, 75.0)
(<ServiceCenter: DELHI-DLN>, 90.909090909090907)
(<ServiceCenter: DELHI-DLS>, 83.333333333333343)
(<ServiceCenter: DELHI-DLW>, 92.307692307692307)
Sort the list first and then slice it:
>>> lis = [['Mumbai', 98.3], ['London', 23.23], ['Agra', 12.22]]
>>> print sorted(lis, key = lambda x : x[1], reverse = True)[:10] #[:10] returns first ten items
[['Mumbai', 98.3], ['London', 23.23], ['Agra', 12.22]]
To get data in list form from that file use this:
with open('abc') as f:
next(f) #skip header
lis = [[city,float(val)] for city, val in( line.split() for line in f)]
print lis
#[['Mumbai', 98.3], ['London', 23.23], ['Agra', 12.22]]
Update:
new_lis = sorted(sc_percentage, key = lambda x : x[1], reverse = True)[:10]
for item in new_lis:
print item
sorted returns a new sorted list, as we need to sort the list based on the second item of each element so we used the key parameter.
key = lambda x : x[1] means use the value on the index 1(i.e 100.0, 75.0 etc) of each item for comparison.
reverse= True is used for reverse sorting.
If the list is fairly short then as others have suggested you can sort it and slice it. If the list is very large then you may be better using heapq.nlargest():
>>> import heapq
>>> lis = [['Mumbai', 98.3], ['London', 23.23], ['Agra', 12.22]]
>>> heapq.nlargest(2, lis, key=lambda x:x[1])
[['Mumbai', 98.3], ['London', 23.23]]
The difference is that nlargest only makes a single pass through the list and in fact if you are reading from a file or other generated source need not all be in memory at the same time.
You might also be interested to look at the source for nlargest() as it works in much the same way that you were trying to solve the problem: it keeps only the desired number of elements in a data structure known as a heap and each new value is pushed into the heap then the smallest value is popped from the heap.
Edit to show comparative timing:
>>> import random
>>> records = []
>>> for i in range(100000):
value = random.random() * 100
records.append(('city {:2.4f}'.format(value), value))
>>> import heapq
>>> heapq.nlargest(10, records, key=lambda x:x[1])
[('city 99.9995', 99.99948904248298), ('city 99.9974', 99.99738898315216), ('city 99.9964', 99.99642759230214), ('city 99.9935', 99.99345173704319), ('city 99.9916', 99.99162694442714), ('city 99.9908', 99.99075084123544), ('city 99.9887', 99.98865134685201), ('city 99.9879', 99.98792632193258), ('city 99.9872', 99.98724339718686), ('city 99.9854', 99.98540548350132)]
>>> timeit.timeit('sorted(records, key=lambda x:x[1])[:10]', setup='from __main__ import records', number=10)
1.388942152229788
>>> timeit.timeit('heapq.nlargest(10, records, key=lambda x:x[1])', setup='import heapq;from __main__ import records', number=10)
0.5476185073315492
On my system getting the top 10 from 100 records is fastest by sorting and slicing, but with 1,000 or more records it is faster to use nlargest.
You have to convert your input into something Python can handle easily:
with open('input.txt') as inputFile:
lines = inputFile.readLines()
records = [ line.split() for line in lines ]
records = [ float(percentage), city for city, percentage in records ]
Now the records contain a list of the entries like this:
[ [ 98.3, 'Mumbai' ], [ 23.23, 'London' ], [ 12.22, Agra ] ]
You can sort that list in-place:
records.sort()
You can print the top ten by slicing:
print records[0:10]
If you have a huge list (e. g. millions of entries) and just want the top ten of these in a sorted way, there are better ways than sorting the whole list (which would be a waste of time then).
For printing the top 10 cities you can use :
Sort the list first and then slice it:
>>> lis = [['Mumbai', 98.3], ['London', 23.23], ['Agra', 12.22]]
>>> [k[0] for k in sorted(lis, key = lambda x : x[1], reverse = True)[:10]]
['Mumbai', 'London', 'Agra']
For the given list
>>>: lis=[("<ServiceCenter: DELHI-DLC>", 100.0),("<ServiceCenter: DELHI-DLW>", 92.307692307692307),("<ServiceCenter: DELHI-DLE>", 75.0),("<ServiceCenter: DELHI-DLN>", 90.909090909090907),("<ServiceCenter: DELHI-DLS>", 83.333333333333343)]
>>>:t=[k[0] for k in sorted(lis, key = lambda x : x[1], reverse = True)[:10]]
>>>:print t
['<ServiceCenter: DELHI-DLC>',
'<ServiceCenter: DELHI-DLW>',
'<ServiceCenter: DELHI-DLN>',
'<ServiceCenter: DELHI-DLS>',
'<ServiceCenter: DELHI-DLE>']
Sorted function returns the sorted list with key as the compare function .