comparing two lists and finding most preferred pairs - python

I have two lists like this:
people: ["rob", "candice", "candice", "rob", "arnold", "ben", "ben", "ben", "arnold"]
fruit: ["orange", "orange", "mango", "mango", "orange", "orange", "banana", "mango", "banana"]
The two lists are of people and the respective fruits they prefer. They are lists of equal length.
I am asked to find (using only lists and no dictionaries) the pair of fruits from the fruit list that most people prefer.
Expected result here will be ("mango","orange"). I would really appreciate help on the logic for this question. I want to code this myself.
Should I use nested for loops, or zip to run through both lists in parallel? This is some code I am working on. I am trying to make a list of all the fruits, and I'm not sure if I am heading in the right direction:
def findpair(fruit, people):
    eachfruit = []
    seen = set()
    for i in fruit:
        for j in people:
            if i not in seen:
                seen.add(i)

If you wanted to know what fruit is preferred most, all you have to do is count the fruits; there is no need to pair up fruits with the people here.
The teach-yourself-to-code approach is to keep a counter per fruit in a dictionary, then sort the dictionary keys by count to get the most popular fruit, then the next most popular, and so on:
counts = {}
for fruit in fruits:
    # start at 0 the first time a fruit is seen, then add 1
    counts[fruit] = counts.get(fruit, 0) + 1
# sort the fruit names by their count, highest first
top_fruits = sorted(counts, key=lambda fruit: counts[fruit], reverse=True)
print(top_fruits[:2])
The Pythonic method is to use the collections.Counter() object and have it do the counting for you:
top_two = [fruit for fruit, count in Counter(fruits).most_common(2)]
which produces:
>>> from collections import Counter
>>> fruits = ["orange", "orange", "mango", "mango", "orange", "orange", "banana", "mango", "banana"]
>>> [fruit for fruit, count in Counter(fruits).most_common(2)]
['orange', 'mango']
If you needed to count pairs of fruit, you'll need to do more work; collections.defaultdict() can help collect the per-person preferences, using frozenset and set to only look at unique pairs, and itertools.combinations() to generate pairs of fruit per person:
from collections import defaultdict, Counter
from itertools import combinations
likes = defaultdict(set)
for person, fruit in zip(persons, fruits):
    likes[person].add(fruit)
counts = Counter()
for person in likes:
    for combo in combinations(likes[person], 2):
        counts[frozenset(combo)] += 1
result = counts.most_common(1)[0][0]
Demo:
>>> from collections import defaultdict, Counter
>>> from itertools import combinations
>>> persons = ["rob", "candice", "candice", "rob", "arnold", "ben", "ben", "ben", "arnold"]
>>> fruits = ["orange", "orange", "mango", "mango", "orange", "orange", "banana", "mango", "banana"]
>>> likes = defaultdict(set)
>>> for person, fruit in zip(persons, fruits):
...     likes[person].add(fruit)
...
>>> likes
defaultdict(<type 'set'>, {'ben': set(['orange', 'mango', 'banana']), 'rob': set(['orange', 'mango']), 'candice': set(['orange', 'mango']), 'arnold': set(['orange', 'banana'])})
>>> counts = Counter()
>>> for person in likes:
...     for combo in combinations(likes[person], 2):
...         counts[frozenset(combo)] += 1
...
>>> counts.most_common(1)[0][0]
frozenset(['orange', 'mango'])
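If a plain tuple such as ("mango", "orange") is wanted rather than a frozenset, one option is to sort each person's fruits before generating the combinations, so that every pair has a single canonical order. This is only a sketch of that variation, reusing the persons and fruits lists from the demo; the function name most_common_pair is made up for illustration:

```python
from collections import Counter, defaultdict
from itertools import combinations


def most_common_pair(persons, fruits):
    """Return the pair of fruits liked together by the most people (hypothetical helper)."""
    # Collect the unique fruits each person likes.
    likes = defaultdict(set)
    for person, fruit in zip(persons, fruits):
        likes[person].add(fruit)

    # Sorting first means each pair always appears in the same order,
    # so plain tuples can be used as Counter keys.
    counts = Counter()
    for liked in likes.values():
        for combo in combinations(sorted(liked), 2):
            counts[combo] += 1

    if not counts:
        return None  # nobody likes two or more fruits
    return counts.most_common(1)[0][0]


persons = ["rob", "candice", "candice", "rob", "arnold", "ben", "ben", "ben", "arnold"]
fruits = ["orange", "orange", "mango", "mango", "orange", "orange", "banana", "mango", "banana"]
pair = most_common_pair(persons, fruits)
print(pair)
```

With the sample data this yields the tuple the question expected, because three people (rob, candice and ben) like both mango and orange.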

Actually the expected result should be ('orange', 'mango'). Here's the code that does what you want:
people = ["rob", "candice", "candice", "rob", "arnold", "ben", "ben", "ben", "arnold"]
fruit = ["orange", "orange", "mango", "mango", "orange", "orange", "banana", "mango", "banana"]
class People(object):
    def __init__(self, people, fruits):
        self.people = people
        self.fruits = fruits
        self.fruits_list = []
        for i in range(len(fruits)):
            if not hasattr(self, fruits[i]):
                setattr(self, fruits[i], 0)
            if fruits[i] not in self.fruits_list:
                self.fruits_list.append(fruits[i])
    def getFruits(self):
        for i in range(len(self.fruits)):
            setattr(self, self.fruits[i], getattr(self, self.fruits[i])+1)
        fruits_number = []
        for i in range(len(self.fruits_list)):
            fruits_number.append(getattr(self, self.fruits_list[i]))
        max_list = sorted(fruits_number, reverse=True)[0:2]
        f = ()
        for i in max_list:
            f += (self.fruits_list[fruits_number.index(i)],)
        return f
obj = People(people, fruit)
obj.getFruits()
This class takes two lists, people and fruit, as parameters. The constructor loops through the fruits list and checks whether the instance already has an attribute for each fruit; if not, it creates the attribute with a value of 0. It also appends each fruit to fruits_list, making sure there are no repetitions. The method getFruits() loops through the fruits list and adds 1 to the matching fruit attribute each time it encounters a fruit. It then loops through fruits_list, collects the count of each fruit in a list called fruits_number, and stores the two biggest counts in max_list. Finally, for each count in max_list it looks up that count's index in fruits_number, takes the fruit at the same index in fruits_list, adds it to the tuple f, and returns f.

class Counter(dict):
    '''Dict subclass for counting hashable items.  Sometimes called a bag
    or multiset.  Elements are stored as dictionary keys and their counts
    are stored as dictionary values.
So using Counter means using a dict.
Using only lists, you can do it this way:
1. Sort the list.
2. Go through the list; if the fruit is the same as the previous one, add 1 to its counter, otherwise start a new [counter, fruit] entry in the result list.
3. Sort the result list by counter and output the result.
=============================================================
def play():
    fruit = ["orange", "orange", "mango", "mango", "orange", "orange", "banana", "mango", "banana"]
    fruit.sort()
    result = []
    last = None
    for f in fruit:
        if last != f:
            # first occurrence of a new fruit: start its count at 1
            result.append([1, f])
        else:
            result[-1][0] += 1
        last = f
    # sort by count, highest first
    result.sort(reverse=True)
    print(result[:2])
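The list-only idea can be stretched to count pairs of fruits rather than single fruits, still without dictionaries or sets. What follows is only a sketch under that constraint: the function name most_preferred_pair and the parallel-list layout (names/likes and pairs/counts) are illustrative choices, not something given in the question.

```python
def most_preferred_pair(people, fruit):
    """List-only sketch: return the pair of fruits liked together by the most people."""
    # names[i] is a person, likes[i] is the list of unique fruits that person likes.
    names = []
    likes = []
    for person, f in zip(people, fruit):
        if person not in names:
            names.append(person)
            likes.append([])
        liked = likes[names.index(person)]
        if f not in liked:
            liked.append(f)

    # pairs[i] is a sorted two-item list, counts[i] is how many people like both.
    pairs = []
    counts = []
    for liked in likes:
        liked = sorted(liked)
        for i in range(len(liked)):
            for j in range(i + 1, len(liked)):
                pair = [liked[i], liked[j]]
                if pair in pairs:
                    counts[pairs.index(pair)] += 1
                else:
                    pairs.append(pair)
                    counts.append(1)

    if not pairs:
        return None  # nobody likes two or more fruits
    best = counts.index(max(counts))
    return tuple(pairs[best])


people = ["rob", "candice", "candice", "rob", "arnold", "ben", "ben", "ben", "arnold"]
fruit = ["orange", "orange", "mango", "mango", "orange", "orange", "banana", "mango", "banana"]
answer = most_preferred_pair(people, fruit)
print(answer)
```

The repeated `in` and `index` calls make this slower than a dictionary-based version on large inputs, which is the price of the lists-only restriction.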

Related

python How to Reorder elements in a list

Now I have a list like ["apple", "banana", "lemon", "strawberry", "orange"]
Suppose I want to reorder the list, for example:
Move "strawberry" to a new position, so that the elements after that position shift one step to the right. What should I do?
["apple", "strawberry", "banana", "lemon", "orange"]
You can use insert and pop (in this example I also use index for clarity).
x = ["apple", "banana", "lemon", "strawberry", "orange"]
x.insert(1, x.pop(x.index("strawberry")))
print(x)
Prints:
['apple', 'strawberry', 'banana', 'lemon', 'orange']
The 1 passed to insert is the index in the list where you want the value to end up.
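The pop-then-insert idiom can be wrapped in a small helper when the original list should stay untouched. This is a sketch only; the name move_item and the choice to return a copy are assumptions made for illustration:

```python
def move_item(items, value, new_index):
    """Return a copy of items with the first occurrence of value moved to new_index."""
    result = list(items)                # work on a copy, leave the input alone
    old_index = result.index(value)     # raises ValueError if value is absent
    result.insert(new_index, result.pop(old_index))
    return result


x = ["apple", "banana", "lemon", "strawberry", "orange"]
moved = move_item(x, "strawberry", 1)
print(moved)
```

Because the element is popped before it is inserted, new_index refers to a position in the list as it looks after the removal.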

Sorting a list by number of appearances and removing duplicates

For example, I have the following list, in which the number of appearances per element is:
apple - 3
banana - 4
orange - 2
the list:
["apple", "apple", "banana", "orange", "orange", "banana", "banana", "apple", "banana"]
I need to sort the list by prevalence without duplicates, so the expected result will be:
["banana", "apple", "orange"]
I thought about creating a dictionary with each element as key, iterating over the list and then adding +1 for each time the key is found, so I will end up with an example dictionary:
dic = {"apple": 3, "banana": 4, "orange":2}
But I'm kind of stuck on how to sort the list itself without the dupes.
Thanks in advance.
EDIT: Thank you everyone, I did not have knowledge of Counter. Happy holidays!
Use Counter:
from collections import Counter
data = ["apple", "apple", "banana", "orange", "orange", "banana", "banana", "apple", "banana"]
counts = Counter(data)
result = sorted(counts, key=counts.get, reverse=True)
print(result)
Output
['banana', 'apple', 'orange']
You can use a collections.Counter and its most_common method:
from collections import Counter
lst = ["apple", "apple", "banana", "orange", "orange", "banana", "banana", "apple", "banana"]
res = [k for k, _ in Counter(lst).most_common()]
# ['banana', 'apple', 'orange']
Sort a set based on the original list's counts. EDIT: As pointed out in the comments, you may want to use one of the other solutions if you have a lot of candidates, since calling the list's count method once per unique element is not optimal.
a = ["apple", "apple", "banana", "orange", "orange", "banana", "banana", "apple", "banana"]
sorted(set(a), key = lambda x: a.count(x), reverse = True) #reverse for descending
Result:
['banana', 'apple', 'orange']
from itertools import groupby
L = ["apple", "apple", "banana", "orange", "orange", "banana", "banana", "apple", "banana"] # Input list
counts = [(i, len(list(c))) for i,c in groupby(sorted(L))] # Create value-count pairs as list of tuples
counts = sorted(counts, key = lambda i: i[1] , reverse=True) #sort value-count list
out = [key for key, value in counts] #extract key
print (out)
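Using list and set (note that this removes the duplicates but sorts the result alphabetically, not by number of appearances):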
a = ["apple", "apple", "banana", "orange", "orange", "banana", "banana", "apple", "banana"]
result =sorted(list(set(a)))
output :
['apple', 'banana', 'orange']
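The dictionary idea described in the question also works without Counter. A minimal sketch, assuming a plain dict is acceptable; the function name unique_by_frequency is made up for illustration:

```python
def unique_by_frequency(items):
    """Return the distinct items, most frequent first (hypothetical helper)."""
    counts = {}
    for item in items:
        # dict.get supplies 0 the first time an item is seen
        counts[item] = counts.get(item, 0) + 1
    # Iterating a dict yields its keys, so duplicates are already gone;
    # sort those keys by their stored count, highest first.
    return sorted(counts, key=counts.get, reverse=True)


data = ["apple", "apple", "banana", "orange", "orange", "banana", "banana", "apple", "banana"]
ordered = unique_by_frequency(data)
print(ordered)
```

Each element is visited once while counting, which avoids the repeated list.count calls mentioned earlier.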

How to get all keys from a value for a dict of lists?

I have a dictionary in a format like this
d = {
"Fruit_1" : ["mango", "apple"],
"Fruit_2" : ["apple"],
"Fruit_3" : ["mango", "banana", "apple", "kiwi", "orange"]
}
I'm passing a value such as "mango" and I want to get all the keys whose lists contain mango. So far I have not been able to get the keys for every place the value occurs.
Iterate over d.items() and check whether mango is in each value.
In [21]: [key for key,value in d.items() if 'mango' in value]
Out[21]: ['Fruit_1', 'Fruit_3']
You can do this perhaps:
d = {
"Fruit_1" : ["mango", "apple"],
"Fruit_2" : ["apple"],
"Fruit_3" : ["mango", "banana", "apple", "kiwi", "orange"]
}
# list comprehension
mango_keys = [fruit for fruit in d.keys() if "mango" in d[fruit]]
print(mango_keys)
# ['Fruit_1', 'Fruit_3']
# or more traditional for-loop (but non pythonic)
for fruit in d.keys():
    if "mango" in d[fruit]:
        print(fruit)
The naive approaches (looping through all items and looking for the fruit) work, but they have a high complexity, especially if you have to perform a lot of queries. You could improve them slightly by replacing the list values with sets for a faster in lookup (roughly O(n**2) => O(n)), but that still leaves room for improvement.
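As a rough illustration of the set-based variant just described, the list values can be converted once and then scanned per query. This is a sketch only; d_sets and keys_containing are hypothetical names:

```python
d = {
    "Fruit_1": ["mango", "apple"],
    "Fruit_2": ["apple"],
    "Fruit_3": ["mango", "banana", "apple", "kiwi", "orange"],
}

# One-off conversion: membership tests on a set are O(1) on average.
d_sets = {key: set(values) for key, values in d.items()}


def keys_containing(mapping, wanted):
    """Return every key whose collection contains wanted; still scans all keys."""
    return [key for key, values in mapping.items() if wanted in values]


mango_keys = keys_containing(d_sets, "mango")
print(mango_keys)
```

Every query still walks over all the keys, which is why an inverted index is the better choice when many lookups are needed.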
If you want to be able to perform those queries many times, it would be better to rebuild the dictionary so that lookup is very fast once it is built, using collections.defaultdict:
d = {
"Fruit_1" : ["mango", "apple"],
"Fruit_2" : ["apple"],
"Fruit_3" : ["mango", "banana", "apple", "kiwi", "orange"]
}
import collections
newd = collections.defaultdict(list)
for k, vl in d.items():
    for v in vl:
        newd[v].append(k)
print(newd)
print(newd["mango"])
this is the rebuilt dict:
defaultdict(<class 'list'>, {'apple': ['Fruit_2', 'Fruit_3', 'Fruit_1'], 'orange': ['Fruit_3'], 'banana': ['Fruit_3'], 'kiwi': ['Fruit_3'], 'mango': ['Fruit_3', 'Fruit_1']})
this is the query for "mango":
['Fruit_3', 'Fruit_1']
Like this?
>>> d = {
... "Fruit_1" : ["mango", "apple"],
... "Fruit_2" : ["apple"],
... "Fruit_3" : ["mango", "banana", "apple", "kiwi", "orange"]
... }
>>>
>>> [key for key, value in d.items() if 'mango' in value]
['Fruit_1', 'Fruit_3']
The idea is to iterate over the (key, value) item pairs and check each value for the existence of 'mango'. If it is there, keep the key.
Since you are new to Python, here's the traditional for-loop logic:
>>> result = []
>>> for key, value in d.items():
...     if 'mango' in value:
...         result.append(key)
...
>>> result
['Fruit_1', 'Fruit_3']
For a single query, you can use a list comprehension. This has O(n) time complexity each time you search for a value:
res = [k for k, v in d.items() if 'mango' in v]
For multiple queries, you can use a defaultdict of set objects via a one-off O(n) cost:
from collections import defaultdict
dd = defaultdict(set)
for k, v in d.items():
    for fruit in v:
        dd[fruit].add(k)
print(dd)
defaultdict(<class 'set'>, {'mango': {'Fruit_1', 'Fruit_3'},
'apple': {'Fruit_1', 'Fruit_2', 'Fruit_3'},
'banana': {'Fruit_3'},
'kiwi': {'Fruit_3'},
'orange': {'Fruit_3'}})
You can then use dd['mango'] to extract relevant keys.
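One detail worth keeping in mind when querying a defaultdict: indexing with a key that is not present creates an empty entry for it, whereas get does not. A short sketch of both kinds of lookup, reusing the d and dd names from above:

```python
from collections import defaultdict

d = {
    "Fruit_1": ["mango", "apple"],
    "Fruit_2": ["apple"],
    "Fruit_3": ["mango", "banana", "apple", "kiwi", "orange"],
}

# Build the inverted index once: fruit -> set of keys that contain it.
dd = defaultdict(set)
for key, fruits in d.items():
    for fruit in fruits:
        dd[fruit].add(key)

# Sets are unordered, so sort for a stable display order.
mango_keys = sorted(dd["mango"])
print(mango_keys)

# get() returns the default without adding "grape" to the index.
missing = dd.get("grape", set())
print(missing, "grape" in dd)
```

Using get for fruits that may be absent keeps the index from filling up with empty sets over time.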

Extracting elements from lists based on percentage

Say I have a list of n sub lists,
a = [["cat", "dog", "cow", "apple"], ["apple", "dog"], ["cat", "apple"], ["cat", "apple", "deer"]]
Assuming that the threshold percentage is 70%, the elements which appear at least 70% of the time in the sub lists should be in the output.
In this example, "apple" appears in all the sub lists and "cat" appears in 3 of the 4 sub lists (75%).
Hence, the output should be ["apple", "cat"]
How can I achieve this?
I was using intersection, but then only the common elements in all sub lists would come in the output.
output= list(set(a[0]).intersection(*a))
You can use Counter:
>>> from collections import Counter
>>> c = Counter()
>>> for l in a:
...     c.update(set(l))
...
>>> c
Counter({'apple': 4, 'cat': 3, 'dog': 2, 'deer': 1, 'cow': 1})
>>> [key for key, value in c.items() if value >= 0.7 * len(a)]
['cat', 'apple']
This will do the job:
import collections
import itertools
a =[["cat", "dog", "cow", "apple"], ["apple", "dog"], ["cat", "apple"],["cat","apple", "deer"]]
n=len(a)
counter = collections.Counter(itertools.chain(*a))
res=[i for i in counter if counter[i]>=0.7*n]
print(res)
This prints
['cat', 'apple']
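Both answers can be folded into a reusable function. The sketch below counts each element at most once per sub list (as the first answer does with set), so a sub list that repeats an element does not inflate its share; the function name frequent_items and its threshold parameter are assumptions for illustration:

```python
from collections import Counter


def frequent_items(sublists, threshold=0.7):
    """Return the items that appear in at least `threshold` of the sub lists."""
    counts = Counter()
    for sub in sublists:
        # set() ensures an item is counted once per sub list
        counts.update(set(sub))
    needed = threshold * len(sublists)
    return [item for item, count in counts.items() if count >= needed]


a = [["cat", "dog", "cow", "apple"], ["apple", "dog"], ["cat", "apple"], ["cat", "apple", "deer"]]
result = frequent_items(a)
print(sorted(result))
```

The result is sorted before printing because the order in which set elements are counted is not guaranteed.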

How to organize list by frequency of occurrence and alphabetically (in case of a tie) while eliminating duplicates?

Basically if given a list:
data = ["apple", "pear", "cherry", "apple", "pear", "apple", "banana"]
I'm trying to make a function that returns a list like this:
["apple", "pear", "banana", "cherry"]
I'm trying to make the return list ordered by most frequently occurring word first while breaking ties by ordering them alphabetically. I also am trying to eliminate duplicates.
I've made lists already of the counts of each element and the indices of each element in data.
x = [data.count(n) for n in data]
z = [data.index(n) for n in data]
I don't know where to go from this point.
You could do something like this:
from collections import Counter
data = ["apple", "pear", "cherry", "apple", "pear", "apple", "banana"]
counts = Counter(data)
words = sorted(counts, key=lambda word: (-counts[word], word))
print(words)
For ordering elements by frequency you can use collections.Counter.most_common, so for example:
from collections import Counter
data = ["apple", "pear", "cherry", "apple", "pear", "apple", "banana"]
print(Counter(data).most_common())
#[('apple', 3), ('pear', 2), ('cherry', 1), ('banana', 1)]
Thanks to @Yuushi:
from collections import Counter
data = ["apple", "pear", "cherry", "apple", "pear", "apple", "banana"]
x = [a for (a, b) in Counter(data).most_common()]
print(x)
#['apple', 'pear', 'cherry', 'banana']
Here is a simple approach, but it should work.
data = ["apple", "pear", "cherry", "apple", "pear", "apple", "banana"]
from collections import Counter
from collections import defaultdict
my_counter = Counter(data)
# creates a dictionary with keys
# being numbers of occurrences and
# values being lists with strings
# that occurred that many times
my_dict = defaultdict(list)
for k, v in my_counter.items():
    my_dict[v].append(k)
my_list = []
for k in sorted(my_dict, reverse=True):
    # This is the second tie-break, if both
    # strings showed up the same number of times
    # and correspond to the same key, we sort them
    # by the alphabetical order
    my_list.extend(sorted(my_dict.get(k)))
Result:
>>> my_list
['apple', 'pear', 'banana', 'cherry']
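Since the question asks for a function, the composite sort key from the first answer can be wrapped up as one. A minimal sketch; the function name by_frequency_then_alpha is made up for illustration:

```python
from collections import Counter


def by_frequency_then_alpha(words):
    """Return the distinct words, most frequent first, ties broken alphabetically."""
    counts = Counter(words)
    # Negating the count sorts high counts first, while the word itself
    # as the second tuple element sorts ties in ascending alphabetical order.
    return sorted(counts, key=lambda word: (-counts[word], word))


data = ["apple", "pear", "cherry", "apple", "pear", "apple", "banana"]
ordered = by_frequency_then_alpha(data)
print(ordered)
```

This differs from relying on most_common() alone, which leaves tied words in the order they were first encountered ("cherry" before "banana" here).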
