Get the n top elements from a list of lists - python

I'm using the following example for demonstration purposes.
[["apple",10],
["oranges",5],
["strawberies",2],
["pineapples",12],
["bananas",9],
["tomattoes",8],
["watermelon",1],
["mangos",7],
["grapes",11],
["potattoes",3]]
I want to get, say, the top 3 fruits by quantity (top 3 elements returned); however, I don't want the order to change.
So the end result will be
[["apple",10],
["pineapples",12],
["grapes",11]]
Any help will be appreciated.

arr = [["apple",10],
["oranges",5],
["strawberies",2],
["pineapples",12],
["bananas",9],
["tomattoes",8],
["watermelon",1],
["mangos",7],
["grapes",11],
["potattoes",3]]
sorted_arr = sorted(arr,key=lambda x: x[1],reverse=True)[:3]
output = [elem for elem in arr if elem in sorted_arr]
print(sorted_arr)
print(output)
First, we sort the list by quantity in descending order and keep the first 3 elements. Then a list comprehension loops through the original list and keeps only the elements that are in that top 3. This preserves the original order.
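A variant of the same idea, as a minimal sketch: it selects the positions of the top rows rather than the rows themselves, so a duplicated row is not counted twice. The helper name top_n_in_order and the shortened sample list are assumptions made for illustration, not part of the original answer.

```python
import heapq

def top_n_in_order(rows, n, key=lambda row: row[1]):
    """Return the n rows with the largest key, in their original order."""
    # Select the indices of the n largest rows ...
    top_idx = heapq.nlargest(n, range(len(rows)), key=lambda i: key(rows[i]))
    # ... then sort the indices to restore the original ordering.
    return [rows[i] for i in sorted(top_idx)]

arr = [["apple", 10], ["oranges", 5], ["pineapples", 12],
       ["bananas", 9], ["grapes", 11]]
print(top_n_in_order(arr, 3))
# [['apple', 10], ['pineapples', 12], ['grapes', 11]]
```

heapq.nlargest avoids sorting the whole list, which may matter when n is small compared to the list length.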

As you have probably heard before, there is a sort method that you can apply to lists. You can pass in a function as the key to indicate how you want the list to be sorted.
l = [["apple",10],
["oranges",5],
["strawberies",2],
["pineapples",12],
["bananas",9],
["tomattoes",8],
["watermelon",1],
["mangos",7],
["grapes",11],
["potattoes",3]]
l.sort(key=lambda x: x[1])  # sort in place by the second element (ascending)
result = l[-3:]  # the top three elements, in ascending order of quantity (the original order is not kept)

Related

Grouping a grouped list of str without duplicates

I have a grouped list of strings that sort of looks like this, the lists inside of these groups will always contain 5 elements:
text_list = [['aaa','bbb','ccc','ddd','eee'],
['fff','ggg','hhh','iii','jjj'],
['xxx','mmm','ccc','bbb','aaa'],
['fff','xxx','aaa','bbb','ddd'],
['aaa','bbb','ccc','ddd','eee'],
['fff','xxx','aaa','ddd','eee'],
['iii','xxx','ggg','jjj','aaa']]
The objective is simple: group all of the lists that are similar, where the first 3 elements of one list are compared against all of the elements of the other lists.
So from the above example the output might look like this (output is the index of the list):
[[0,2,4],[3,5]]
Notice that another list that contains the same elements but in a different order is removed.
I've written the following code to extract the groups, but it returns duplicates and I am unsure how to proceed. I also think this might not be the most efficient way to do the extraction, as the real list can contain upwards of millions of groups:
grouped_list = []
for i in range(0,len(text_list)):
    int_temp = []
    for m in range(0,len(text_list)):
        if i == m:
            continue
        bool_check = all( x in text_list[m] for x in text_list[i][0:3])
        if bool_check:
            if len(int_temp) == 0:
                int_temp.append(i)
                int_temp.append(m)
                continue
            int_temp.append(m)
    grouped_list.append(int_temp)
## remove index with no groups
grouped_list = [x for x in grouped_list if x != []]
Is there a better way to go about this? How do I remove the duplicate group afterwards? Thank you.
Edit:
To be clearer, I would like to retrieve the lists that are similar to each other, but only using the first 3 elements of each list. For example, using the first 3 elements from list A, check if lists B, C, D... contain all 3 of the elements from list A. Repeat for the entire list, then remove any group that contains duplicate elements.
You can build a set of frozensets to keep track of indices of groups with the first 3 items being a subset of the rest of the members:
groups = set()
sets = list(map(set, text_list))
for i, lst in enumerate(text_list):
    groups.add(frozenset((i, *(j for j, s in enumerate(sets) if set(lst[:3]) <= s))))
print([sorted(group) for group in groups if len(group) > 1])
If the input list is long, it would be faster to create a set of frozensets of the first 3 items of all sub-lists and use the set to filter all combinations of 3 items from each sub-list, so that the time complexity is essentially linear to the input list rather than quadratic despite the overhead in generating combinations:
from itertools import combinations
sets = {frozenset(lst[:3]) for lst in text_list}
groups = {}
for i, lst in enumerate(text_list):
    for c in map(frozenset, combinations(lst, 3)):
        if c in sets:
            groups.setdefault(c, []).append(i)
print([sorted(group) for group in groups.values() if len(group) > 1])

List comprehension using index values

How do I turn the below for loop into a list comprehension?
I have a list of lists. Each sublist has a number in index 4 position, and I would like to find a way to add them up. The for loop works, but I'd like a one-line solution with list comprehension.
frequencies = []
for row in table:
    frequencies.append(int(row[4]))
sum(frequencies)
Here's my attempt:
frequencies = [sum(i) for i, x in enumerate(table) if x == 4]
However, the result is an empty object.
In[52]: total
Out[52]: []
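Do it like this:
frequencies = sum([int(row[4]) for row in table])
Your attempt compares each whole row to 4 (x == 4), which is never true, so the comprehension produces an empty list. What you want instead is to take the element at index 4 of every row, convert it to int, and sum those values.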
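A small sketch of the difference, using a made-up table (the sample rows are an assumption; the real data is not shown in the question). The failing comprehension is simplified to collect the index i, since sum(i) is never reached when the condition is always false.

```python
# Hypothetical sample data: the number to add up sits at index 4 of each row.
table = [
    ["a", "b", "c", "d", "3"],
    ["e", "f", "g", "h", "4"],
    ["i", "j", "k", "l", "5"],
]

# Comparing a whole row (a list) to 4 is never true, so nothing is collected.
attempt = [i for i, x in enumerate(table) if x == 4]
print(attempt)  # []

# Summing the converted values directly; a generator expression avoids
# building an intermediate list.
total = sum(int(row[4]) for row in table)
print(total)  # 12
```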

Dictionary comprehension to create dictionary of even numbers

I want to create a dictionary of even numbers with the keys being consecutive integers using dictionary comprehension
The output should be:
{1:2,2:4,3:6,4:8}
I used 2 lines of code, i.e. one line being the list comprehension to get the even numbers and the second being the dictionary comprehension to get the desired output.
The code I used is as follows:
evens=[number for number in range(1,10) if number%2==0]
even_dict={k:evens[k-1] for k in range(1,len(evens)+1)}
My question is: instead of using 2 lines of code, can we use a single line of code which involves only a dictionary comprehension to get the desired output?
Given your desired output, you can simply do:
d = {x: 2*x for x in range(1, 5)}
The way you have it now, you have to define evens beforehand since you are using it in two places in the dict comprehension: to iterate the indices, and to get the actual element. Generally, whenever you need both the index and the element, you can use enumerate instead, possibly with the start parameter if you want to offset the index:
even_dict = {i: x for i, x in enumerate(evens, start=1)}
Now you only need evens once, and thus you could "inline" it into the dict comprehension:
even_dict = {i: x for i, x in enumerate([number for number in range(1,10) if number%2==0], start=1)}
But you do not need that inner list comprehension at all; to get the even numbers, you could just use range with step parameter:
even_dict = {i: x for i, x in enumerate(range(2, 10, 2), start=1)}
And, finally, in this particular case, you would not even need that, as you can just multiply the key by two to get the value, as shown in #olinox's answer.
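As a quick check that these variants agree, here is a minimal sketch. It also shows that dict can consume the (index, value) pairs from enumerate directly, which is an alternative spelling rather than something the answers above proposed.

```python
# Pair each even number with its 1-based position.
even_dict = dict(enumerate(range(2, 10, 2), start=1))
print(even_dict)  # {1: 2, 2: 4, 3: 6, 4: 8}

# The comprehension forms build the same mapping.
assert even_dict == {x: 2 * x for x in range(1, 5)}
assert even_dict == {i: x for i, x in enumerate(range(2, 10, 2), start=1)}
```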

How to get common elements together in a python list?

This might sound like a stupid question but I have the following list:
list = ['a','b','c','d','a','b','c','d']
And I want to get common elements together to rearrange it as:
sorted_list = ['a','a','b','b','c','c','d','d']
Is there any built in function in python to do that?
Well, to get a sorted list you could just use:
sorted_list = sorted(list)
which gives the output ['a','a','b','b','c','c','d','d']
To sort and group elements by values:
list = sorted(list)
sorted_list = [[y for y in list if y==x] for x in sorted(set(list))]
which gives the output [['a','a'],['b','b'],['c','c'],['d','d']]
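Another way to get the grouped form is itertools.groupby from the standard library. A minimal sketch follows; the variable name letters is an assumption, chosen to avoid shadowing the built-in list.

```python
from itertools import groupby

letters = ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd']

# groupby only merges adjacent equal items, so the input is sorted first.
grouped = [list(group) for _, group in groupby(sorted(letters))]
print(grouped)  # [['a', 'a'], ['b', 'b'], ['c', 'c'], ['d', 'd']]
```

This makes a single pass over the sorted data instead of rescanning the list once per distinct value.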

Max Value within a List of Lists of Tuple

I have a problem getting the highest value in a dynamic list of lists of tuples.
The list can look like this:
adymlist = [[('name1',1)],[('name2',2),('name3',1), ...('name10', 20)], ...,[('name m',int),..]]
Now I loop through the list to get the highest value (integer):
total = {}
y=0
while y < len(adymlist):
    if len(adymlist) == 1:
        # the list has only 1 element -> save it in total
        total[adymlist[y][0][0]] = adymlist[y][0][1]
        y += 1
    else:
        # here is the problem
        # iterate through each list to get the highest value
        # and you don't know how long this list can be
        # save the highest value in total, e.g. total = {'name1':1,'name10':20, ..}
I tried a lot to get the maximum value but found no solution to my problem. I know I must loop through each tuple in the list and compare it with the next one, but I don't know how to code it correctly.
I could also use the function max(), but it doesn't work with mixed strings and integers, e.g.
a = [('a',5),('z',1)] -> max(a) returns ('z',1). Obviously 5 > 1, but 'z' > 'a', so I tried to extend the call with max(a, key=int), but I get a TypeError.
I hope you can understand what I want ;-)
UPDATE
Thanks so far.
If I use itertools.chain(*adymlist) and max(flatlist, key=lambda x: x[1])
I will get an exception like : max_word = max(flatlist, key=lambda x: x[1])
TypeError: 'int' object is unsubscriptable
BUT if I use itertools.chain(adymlist) it works fine. However, I don't know how to sum all the integers from each tuple of the list. I need your help to figure it out.
Otherwise, I wrote a workaround for itertools.chain(*adymlist) to get the sum of all integers and the highest integer in that list.
chain = itertools.chain(*adymlist)
flatlist = list(chain)
# flatlist = string, integer, string, integer, ...
max_count = max(flatlist[1:len(flatlist):2])
total_count = sum(flatlist[1:len(flatlist):2])
# index of highest integer
idx = flatlist.index(next((n for n in flatlist if n == max_count)))
max_keyword = flatlist[idx-1]
It still does what I want, but isn't it too dirty?
To clarify, it looks like you've got a list of lists of tuples. It doesn't look like we care about which list they are in, so we can simplify this to two steps:
Flatten the list of lists to a list of tuples
Find the max value
The first part can be accomplished via itertools.chain (see e.g., Flattening a shallow list in Python)
The second can be solved through max. You have the right idea, but you should be passing in a function rather than the type you want. This function needs to return the value you are keying on, in this case the second part of the tuple:
max(flatlist, key=lambda x: x[1])
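Putting the two steps together, here is a minimal sketch. The sample adymlist is made up to match the shape described in the question, and it assumes every inner item really is a (name, count) tuple. The sum of all counts, which the update asks about, falls out of the same flattened list.

```python
from itertools import chain

# Hypothetical sample data in the shape described in the question.
adymlist = [[("name1", 1)], [("name2", 2), ("name3", 1), ("name10", 20)]]

# chain.from_iterable flattens one level: list of lists of tuples -> tuples.
flatlist = list(chain.from_iterable(adymlist))

# The key function picks the integer, so names are ignored when comparing.
max_keyword, max_count = max(flatlist, key=lambda pair: pair[1])
total_count = sum(count for _, count in flatlist)

print(max_keyword, max_count)  # name10 20
print(total_count)             # 24
```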
Correction
I re-read your question. Are you looking for the max value in each sub-list? If this is the case, then only the second part is applicable. Simply iterate over the outer list and take the max of each sub-list.
Something a bit more pythonic than what you currently have would look like
output = []
for lst in lists:
    output.append( max(lst, key=lambda x: x[1]) )
or
map(lambda x: max(x, key=lambda y: y[1]) , lists)
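To end up with the total dictionary sketched in the question, the per-list maxima can be fed straight into dict. This is a minimal sketch with made-up sample data; it skips empty sub-lists, which is an assumption about how they should be handled.

```python
# Hypothetical sample data in the shape described in the question.
adymlist = [[("name1", 1)], [("name2", 2), ("name3", 1), ("name10", 20)]]

# Best (name, count) pair of each non-empty sub-list, collected into a dict.
total = dict(max(sub, key=lambda pair: pair[1]) for sub in adymlist if sub)
print(total)  # {'name1': 1, 'name10': 20}
```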
As spintheblack says, you have a list of lists of tuples. I presume you are looking for the highest integer value of all tuples.
You can iterate over the outer list, then over each list of tuples, like this:
max_so_far = 0  # assumes the values are non-negative
for sublist in adymlist:
    for t in sublist:
        if t[1] > max_so_far:
            max_so_far = t[1]
print(max_so_far)
This is a little bit more verbose but might be easier to understand.
