Minimum number of items to cover all cases


I am looking for a way to find the minimum number of items needed to cover all the cases in a key-value pair setting.
import pandas as pd
df = pd.DataFrame({'key': ['AAA', 'BBB', 'BBB','BBB', 'CCC', 'CCC'],
                   'value': ['1', '1', '2','4', '1','3']})
I have 4 values (1, 2, 3, 4), and to cover them all I need at least the following keys:
BBB is the only one to give me 2 and 4
CCC is the only one to give me 3
and both BBB and CCC give me 1
So in that case the minimum number of keys to include all the values is 2 (BBB and CCC)
Is there a model/library to help with this type of calculation?

The problem you are describing is the set cover problem (its dual formulation is known as the hitting set problem). It is NP-hard, so no polynomial-time exact algorithm is known.
Here is a brute-force implementation that tries every subset of keys:
keys = pd.unique(df['key'])
values = set(pd.unique(df['value']))
x = len(keys)
count = x
result = list(keys)
for i in range(1 << x):
    # bit j of i decides whether keys[j] is in the subset
    subset_keys = [keys[j] for j in range(x) if i & (1 << j)]
    subset_values = set()
    for key in subset_keys:
        subset_values.update(df.loc[df['key'] == key, 'value'])
    if subset_values == values and len(subset_keys) < count:
        count = len(subset_keys)  # remember the best size found so far
        result = subset_keys
print(result)
This checks all 2^n subsets, where n is the number of unique keys, so it is only practical when n is small.
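A sketch of a slightly faster exact variant: instead of scanning all 2^n subsets, try subsets of size 1, 2, and so on with itertools.combinations, and stop at the first size that covers every value. It is still exponential in the worst case. The function name min_key_cover is hypothetical, and it assumes the key and value column names from the question.

```python
from itertools import combinations

import pandas as pd


def min_key_cover(df: pd.DataFrame) -> list:
    """Return one smallest list of keys whose values cover every value in df.

    Hypothetical helper: exact search over subsets of increasing size,
    so the first cover found is a minimum one (still exponential worst case).
    """
    # Map each key to the set of values it provides
    key_values = {}
    for key, value in zip(df['key'], df['value']):
        key_values.setdefault(key, set()).add(value)
    universe = set(df['value'])
    keys = sorted(key_values)
    for size in range(1, len(keys) + 1):
        for subset in combinations(keys, size):
            covered = set().union(*(key_values[k] for k in subset))
            if covered == universe:
                return list(subset)
    return []  # only reached when df has no rows


df = pd.DataFrame({'key': ['AAA', 'BBB', 'BBB', 'BBB', 'CCC', 'CCC'],
                   'value': ['1', '1', '2', '4', '1', '3']})
print(min_key_cover(df))  # ['BBB', 'CCC']
```

Because sizes are tried in increasing order, the search can stop long before 2^n subsets when a small cover exists.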

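You could also use a greedy heuristic based on .mode(): repeatedly pick the key that appears in the most remaining rows, then drop every row whose value that key covers. This is fast, but greedy choices are not guaranteed to give the minimum number of keys: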
import pandas as pd
df = pd.DataFrame({'key': ['AAA', 'BBB', 'BBB','BBB', 'CCC', 'CCC'],
'value': ['1', '1', '2','4', '1','3']})
lst = list()
while not df.empty:
    x = df['key'].mode().iloc[0]
    df = df[~df['value'].isin(df.loc[df['key'].eq(x), 'value'])]
    lst.append(x)
print(lst)
# ['BBB', 'CCC']
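For comparison, here is a sketch of the classic greedy set-cover heuristic without pandas: at each step it picks the key that covers the most values not yet covered (rather than the key with the most rows, which can differ when a key repeats a value). Like the .mode() loop, it is fast but not guaranteed to be minimal. The name greedy_key_cover is hypothetical.

```python
def greedy_key_cover(pairs):
    """Greedy set-cover heuristic over (key, value) pairs.

    Repeatedly picks the key that covers the most still-uncovered values.
    Fast, but the result is not guaranteed to be minimal.
    """
    key_values = {}
    for key, value in pairs:
        key_values.setdefault(key, set()).add(value)
    uncovered = set().union(*key_values.values())
    chosen = []
    while uncovered:
        # max() keeps the first key among ties, in dict insertion order
        best = max(key_values, key=lambda k: len(key_values[k] & uncovered))
        chosen.append(best)
        uncovered -= key_values[best]
    return chosen


keys = ['AAA', 'BBB', 'BBB', 'BBB', 'CCC', 'CCC']
values = ['1', '1', '2', '4', '1', '3']
print(greedy_key_cover(zip(keys, values)))  # ['BBB', 'CCC']
```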

Related

generating list of every combination without duplicates

I would like to generate a list of combinations. I will try to simplify my problem to make it understandable.
We have 3 variables:
x : number of letters
k : number of groups
n : number of letters per group
Using Python, I would like to generate a list of every possible combination without any duplicates, given that I don't care about the order of the groups or the order of the letters within a group.
As an example, with x = 4, k = 2, n = 2:
# we start with 4 letters, we want to make 2 groups of 2 letters
letters = ['A','B','C','D']
# here would be a code that generate the list
# Here is the result that is very simple, only 3 combinations exist.
combos = [ ['AB', 'CD'], ['AC', 'BD'], ['AD', 'BC'] ]
Since I don't care about the order of the groups or of the letters within a group, ['AB', 'CD'] and ['DC', 'BA'] are duplicates.
This is a simplification of my real problem, which has these values: x = 12, k = 4, n = 3. I tried some functions from itertools, but with that many letters my computer freezes because there are too many combinations.
Another way of seeing the problem: you have 12 players and want to make 4 teams of 3 players. What are all the possibilities?
Could anyone help me to find an optimized solution to generate this list?
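Before generating anything, it can help to count the partitions. Splitting x distinct letters into k unordered groups of n letters gives x! / ((n!)^k * k!) possibilities: divide out the n! orders inside each group and the k! orders of the groups. A small sketch (count_equipartitions is a hypothetical helper name):

```python
from math import factorial


def count_equipartitions(x: int, k: int, n: int) -> int:
    """Number of ways to split x distinct letters into k unordered groups of n."""
    if k * n != x:
        raise ValueError("k * n must equal x")
    # x! orderings, divided by n! orderings inside each group and k! orderings of the groups
    return factorial(x) // (factorial(n) ** k * factorial(k))


print(count_equipartitions(4, 2, 2))   # 3
print(count_equipartitions(12, 4, 3))  # 15400
```

For x = 12, k = 4, n = 3 the final list is small; the cost comes from enumerating the far larger number of raw itertools combinations.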

There will certainly be more sophisticated/efficient ways of doing this, but here's an approach that works in a reasonable amount of time for your example and should be easy enough to adapt for other cases.
It generates unique teams and unique combinations thereof, as per your specifications.
from itertools import combinations
# this assumes that team_size * team_num == len(players) is a given
team_size = 3
team_num = 4
players = list('ABCDEFGHIJKL')
unique_teams = [set(c) for c in combinations(players, team_size)]
def duplicate_player(combo):
    """Returns True if a player occurs in more than one team"""
    return len(set.union(*combo)) < len(players)
result = (combo for combo in combinations(unique_teams, team_num) if not duplicate_player(combo))
result is a generator that can be iterated over or turned into a list with list(result). On kaggle.com, it takes a minute or so to generate the whole list of all possible combinations (15400 in total, in line with the computations by @beaker and @John Coleman in the comments). Each combination is a tuple of sets, like this:
[({'A', 'B', 'C'}, {'D', 'E', 'F'}, {'G', 'H', 'I'}, {'J', 'K', 'L'}),
({'A', 'B', 'C'}, {'D', 'E', 'F'}, {'G', 'H', 'J'}, {'I', 'K', 'L'}),
({'A', 'B', 'C'}, {'D', 'E', 'F'}, {'G', 'H', 'K'}, {'I', 'J', 'L'}),
...
]
If you want, you can turn each team into a string with ''.join(sorted(team)); sorting first gives a predictable letter order, since sets are unordered.

Another solution (players are numbered 0, 1, ...):
import itertools
def equipartitions(base_count: int, group_size: int):
    if base_count % group_size != 0:
        raise ValueError("group_size must divide base_count")
    # the recursion produces each partition many times; the set removes duplicates
    return set(_equipartitions(frozenset(range(base_count)), group_size))
def _equipartitions(base_set: frozenset, group_size: int):
    if not base_set:
        yield frozenset()
        return
    for combo in itertools.combinations(base_set, group_size):
        for rest in _equipartitions(base_set.difference(frozenset(combo)), group_size):
            yield frozenset({frozenset(combo), *rest})
all_combinations = [
    [tuple(team) for team in combo]
    for combo in equipartitions(12, 3)
]
print(all_combinations)
print(len(all_combinations))
And another, which avoids generating duplicates by always putting the first remaining player into the next team:
import itertools
from typing import Sequence
def equipartitions(players: Sequence, team_size: int):
    if len(players) % team_size != 0:
        raise ValueError("team_size must divide len(players)")
    return _equipartitions(set(players), team_size)
def _equipartitions(players: set, team_size: int):
    if not players:
        yield []
        return
    first_player, *other_players = players
    # the first player's teammates are chosen from the others, so each partition appears once
    for other_team_members in itertools.combinations(other_players, team_size - 1):
        first_team = {first_player, *other_team_members}
        for other_teams in _equipartitions(set(other_players) - first_team, team_size):
            yield [first_team, *other_teams]
all_combinations = [
    {''.join(sorted(team)) for team in combo} for combo in equipartitions(players='ABCDEFGHIJKL', team_size=3)
]
print(all_combinations)
print(len(all_combinations))

Firstly, you can use a list comprehension to give you all of the possible combinations (regardless of the duplicates):
comb = [(a,b) for a in letters for b in letters if a != b]
Afterwards, you can use sorted to sort each tuple, and then remove the duplicates by converting the list to a set and back to a list:
var = [tuple(sorted(sub)) for sub in comb]
var = list(set(var))

You could use the list comprehension approach, which performs n(n-1) iterations, or a more verbose approach that performs only (n^2-n)/2 iterations (both are O(n^2)):
comb = []
for first_letter_idx, _ in enumerate(letters):
    for sec_letter_idx in range(first_letter_idx + 1, len(letters)):
        comb.append(letters[first_letter_idx] + letters[sec_letter_idx])
print(comb)
comb2 = []
for first_letter_idx, _ in enumerate(comb):
    for sec_letter_idx in range(first_letter_idx + 1, len(comb)):
        if (comb[first_letter_idx][0] not in comb[sec_letter_idx]
                and comb[first_letter_idx][1] not in comb[sec_letter_idx]):
            comb2.append([comb[first_letter_idx], comb[sec_letter_idx]])
print(comb2)
This only pairs up 2-letter groups, so it needs more work to handle other values of x, k and n, perhaps with recursion.

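Use combinations from itertools. With two groups, every split can be built from a team containing the first letter plus the remaining letters:
from itertools import combinations
letters = ['A', 'B', 'C', 'D']
g = []
for team in combinations(letters, 2):
    if team[0] != letters[0]:
        break  # combinations come in order, so all teams containing 'A' are done
    rest = [c for c in letters if c not in team]
    g.append([''.join(team), ''.join(rest)])  # concatenate the strings
print(g)
# [['AB', 'CD'], ['AC', 'BD'], ['AD', 'BC']]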

Replace duplicates in a list column

I have a list whose last column is a string of comma-separated items:
temp = ['AAA', 'BBB', 'CCC-DDD', 'EE,FFF,FFF,EE']
Now I want to remove the duplicates in that column.
I tried to make a list out of every column:
e = [s.split(',') for s in temp]
print(e)
Which gave me:
[['AAA'], ['BBB'], ['CCC-DDD'], ['EE', 'FFF', 'FFF', 'EE']]
Now I tried to remove the duplicates with:
y = list(set(e))
print(y)
which ended in an error:
TypeError: unhashable type: 'list'
I'd appreciate any help.
Edit:
I didn't say exactly what the end result should be. The list should look like this:
temp = ['AAA', 'BBB', 'CCC-DDD', 'EE', 'FFF']
Only the duplicates in the last column should be removed.

Apply set on the elements of the list not on the list of lists. You want your set to contain the strings of each list, not the lists.
e = [list(set(x)) for x in e]
You can do it directly as well:
e = [list(set(s.split(','))) for s in temp]
>>> e
[['AAA'], ['BBB'], ['CCC-DDD'], ['EE', 'FFF']]
You may want sorted(set(s.split(','))) instead, to get lexicographic order (sets aren't ordered, even in Python 3.7).
For a flat, ordered list, build a flat set comprehension and sort it:
e = sorted({x for s in temp for x in s.split(',')})
result:
['AAA', 'BBB', 'CCC-DDD', 'EE', 'FFF']
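If the original order matters, as in the desired output in the question, a set can shuffle the items. A sketch that relies on dict keys keeping insertion order (guaranteed since Python 3.7) instead of a set:

```python
temp = ['AAA', 'BBB', 'CCC-DDD', 'EE,FFF,FFF,EE']

# Flatten every element, then keep only the first occurrence of each item
flat = list(dict.fromkeys(item for s in temp for item in s.split(',')))
print(flat)  # ['AAA', 'BBB', 'CCC-DDD', 'EE', 'FFF']

# Or deduplicate only the last element and leave the others untouched
last_only = temp[:-1] + list(dict.fromkeys(temp[-1].split(',')))
print(last_only)  # ['AAA', 'BBB', 'CCC-DDD', 'EE', 'FFF']
```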

Here is a solution that uses itertools.chain:
import itertools
temp = ['AAA', 'BBB', 'CCC-DDD', 'EE,FFF,FFF,EE']
y = list(set(itertools.chain(*[s.split(',') for s in temp])))
# ['EE', 'FFF', 'AAA', 'BBB', 'CCC-DDD']

a = ['AAA', 'BBB', 'CCC-DDD', 'EE,FFF,FFF,EE']
b = [s.split(',') for s in a]
c = []
for i in b:
    c = c + i
c = list(set(c))
# ['EE', 'FFF', 'AAA', 'BBB', 'CCC-DDD']

Here is a pure functional way to do it in Python:
from functools import partial
split = partial(str.split, sep=',')
list(map(list, map(set, (map(split, temp)))))
[['AAA'], ['BBB'], ['CCC-DDD'], ['EE', 'FFF']]
Or, since the desired result is a flat list rather than a list of lists:
from itertools import chain
list(chain(*map(set, (map(split, temp)))))
['AAA', 'BBB', 'CCC-DDD', 'EE', 'FFF']

convert lists of uniform dicts into pandas Dataframe with nested dicts as multi-index

I'm at a bit of a loss despite much searching and experimentation.
Given this:
dictA = {'order': '1',
'char': {'glyph': 'A',
'case': 'upper',
'vowel': True}
}
dictB = {'order': '2',
'char': {'glyph': 'B',
'case': 'upper',
'vowel': False}
}
dictC = {'order': '3',
'char': {'glyph': 'C',
'case': 'upper',
'vowel': False}
}
dictD = {'order': '4',
'char': {'glyph': 'd',
'case': 'lower',
'vowel': False}
}
dictE = {'order': '5',
'char': {'glyph': 'e',
'case': 'lower',
'vowel': True}
}
letters = [dictA, dictB, dictC, dictD, dictE]
How can I turn letters into this? (The first column is the index.)
   order  char
          glyph   case  vowel
0      1      A  upper   True
1      2      B  upper  False
2      3      C  upper  False
3      4      d  lower  False
4      5      e  lower   True
...and, as a bonus, be able to operate on this frame to tally/plot the number of uppercase entries, the number of vowels, etc.
Any ideas?
EDIT: My initial example was maybe too simple, but I'll leave it for posterity.
Given:
import re
class Glyph(dict):
    def __init__(self, glyph):
        super(Glyph, self).__init__()
        order = ord(glyph)
        self['glyph'] = glyph
        self['order'] = order
        kind = {'type': None}
        if re.search(r'\s+', glyph):
            kind = {'type': 'whitespace'}
        elif order in range(ord('a'), ord('z') + 1) or order in range(ord('A'), ord('Z') + 1):
            lowercase = glyph.lower()
            kind = {
                'type': lowercase,
                'vowel': lowercase in ['a', 'e', 'i', 'o', 'u'],
                'case': ['upper', 'lower'][lowercase == glyph],
                'number': (ord(lowercase) - ord('a') + 1)
            }
        self['kind'] = kind
chars = [Glyph(x) for x in 'Hello World']
I can do this:
import pandas as pd
df = pd.DataFrame(chars) # dataframe where 'order' & 'glyph' are OK...
# unpack 'kind' Series into list of dicts and use those to make a table
kindDf = pd.DataFrame(data=[x for x in df['kind']])
My intuition would lead me to think I could then do this:
df['kind'] = kindDf
...But that only adds the first column of my kindDf and puts it under 'kind' in df. Next attempt:
df.pop('kind') # get rid of this column of dicts
joined = df.join(kindDf) # flattens 'kind'...
joined is so close! The trouble is I want those columns from kind to be under a 'kind' hierarchy, rather than flat (as the joined result is). I've tried stack/unstack magic, but I can't grasp it. Do I need a MultiIndex?
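One way to get the nested dicts under a column hierarchy, sketched here with pandas.json_normalize (a top-level function since pandas 1.0): flatten the dicts into dotted column names such as char.glyph, then split those names into a two-level MultiIndex. Top-level keys like order get an empty second level. This uses the letters list from the question; the tallies come from value_counts and sum.

```python
import pandas as pd

letters = [
    {'order': '1', 'char': {'glyph': 'A', 'case': 'upper', 'vowel': True}},
    {'order': '2', 'char': {'glyph': 'B', 'case': 'upper', 'vowel': False}},
    {'order': '3', 'char': {'glyph': 'C', 'case': 'upper', 'vowel': False}},
    {'order': '4', 'char': {'glyph': 'd', 'case': 'lower', 'vowel': False}},
    {'order': '5', 'char': {'glyph': 'e', 'case': 'lower', 'vowel': True}},
]

# Nested dicts become dotted column names: 'order', 'char.glyph', 'char.case', ...
flat = pd.json_normalize(letters)

# Split 'char.glyph' into ('char', 'glyph'); top-level columns get an empty sub-level
flat.columns = pd.MultiIndex.from_tuples(
    [tuple(col.split('.', 1)) if '.' in col else (col, '') for col in flat.columns]
)

print(flat)
print(flat[('char', 'case')].value_counts())  # tally of upper/lower
print(flat[('char', 'vowel')].sum())          # number of vowels
```

The same idea should also work for the Glyph objects from the edit, since they are dict subclasses; keys missing from some kind dicts come out as NaN.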

This gets you close on the first part:
## a list for storing properly formatted dataframes
container = []
for l in letters:
    ## loop through list of dicts, turn each into a dataframe
    ## then add `order` to the index. Then make the dataframe wide using unstack
    temp = pd.DataFrame(data=l).set_index('order', append=True).unstack(level=[0])
    container.append(temp)
## throw all the dataframes together into one
result = pd.concat(container).reset_index()
result
   order   char
           case glyph  vowel
0      1  upper     A   True
1      2  upper     B  False
2      3  upper     C  False
3      4  lower     d  False
4      5  lower     e   True
For the second part, you can rely on groupby and then the built-in plotting functions for quick visuals. Omit the plot call after size() if you just want to see the tally.
import matplotlib.pyplot as plt
result.groupby(result.char.vowel).size().plot(kind='bar', figsize=[8, 6])
plt.title('Glyphs are awesome')

Making a dictionary of arrays with two lists with Python [duplicate]

This question already has answers here: Python creating a dictionary of lists (7 answers). Closed 7 years ago.
I have a particular problem that has me stumped. Suppose I have the following two lists:
x = ["A","B","C","D","E"]
y = [1,2,3,2,1]
x and y are related by index: "A" relates to 1, "B" relates to 2, "C" relates to 3, and so on.
What I am trying to do is create a key value relation where the unique items in y are keys and each key has a list that contains the letters related to the key as mentioned previously. I attempted to do the following:
mapping = dict(zip(y,x))
{1: 'E', 2: 'D', 3: 'C'}
This overwrites the previous letter. I would love to be able to return the following:
{1:['A','E'], 2:['B','D'], 3:['C']}
Anyone have a clever solution to this? Preferably without itertools.

You can use setdefault
x = ["A","B","C","D","E"]
y = [1,2,3,2,1]
d = {}
for i, j in zip(y, x):
    d.setdefault(i, []).append(j)
print(d)
Output:
{1: ['A', 'E'], 2: ['B', 'D'], 3: ['C']}

A defaultdict is my preference for situations like this.
from collections import defaultdict
x = ["A","B","C","D","E"]
y = [1,2,3,2,1]
D = defaultdict(list)
for i, j in zip(x, y):
    D[j].append(i)
print(dict(D))
Output is:
{1: ['A', 'E'], 2: ['B', 'D'], 3: ['C']}

Here is a clever, but O(n^2) and therefore not advised solution that I came up with using a combination of Python's dictionary & list comprehension.
>>> x = ["A","B","C","D","E"]
>>> y = [1,2,3,2,1]
>>> {y[i] : [x[j] for j in range(len(y)) if y[j] == y[i]] for i in range(len(y))}
{1: ['A', 'E'], 2: ['B', 'D'], 3: ['C']}
For what it's worth, @Joe R's and @mattingly890's solutions are the way to go, since they are O(n).

Here is a simple (not clever) solution. I would argue that a simple solution is more in keeping with Python's philosophy than a clever one. Perl is a language designed to maximise cleverness, IMHO, and I find it nigh on unreadable (admittedly, I avoid it when I can, so I am an inexperienced Perl programmer).
x = ["A","B","C","D","E"]
y = [1,2,3,2,1]
assert len(x) == len(y)
d = {}
for i in range(len(x)):
    key = y[i]
    val = x[i]
    if key in d:
        d[key].append(val)
    else:
        d[key] = [val]
print(d)

How to change the items in a list of sublists based on certain rules and conditions of those sublists?

I have a list of sublists, each made up of three items. Only the first and last items matter: I want to replace the last item of each sublist with the most frequent last item among the sublists that share the same first item.
This is the list I have:
lst = [['A','abc','id1'],['A','def','id2'],['A','ghi','id1'],['A','ijk','id1'],['A','lmn','id2'],['B','abc','id3'],['B','def','id3'],['B','ghi','id3'],['B','ijk','id3'],['B','lmn','id'],['C','xyz','id6'],['C','lmn','id6'],['C','aaa','id5']]
For example, A appears most often with id1 rather than id2, so I'd like to replace every id2 that appears with A with id1. For B, id3 is the most common, so I'd like to replace anything else with id3, which means replacing 'id' with 'id3' only for B. For C, I'd like to replace the instance of 'id5' with 'id6', because 'id6' appears most often with C.
desired_list = [['A','abc','id1'],['A','def','id1'],['A','ghi','id1'],['A','ijk','id1'],['A','lmn','id1'],['B','abc','id3'],['B','def','id3'],['B','ghi','id3'],['B','ijk','id3'],['B','lmn','id3'],['C','xyz','id6'],['C','lmn','id6'],['C','aaa','id6']]
I should also mention that this is going to be done on a very large list, so speed and efficiency are needed.

Processing the data directly according to your requirements, I came up with the following algorithm.
First sweep: collect frequency information for every key (i.e. 'A', 'B', 'C'):
def generate_frequency_table(lst):
    assoc = {}  # e.g. 'A': {'id1': 3, 'id2': 2}
    for key, unused, val in lst:
        freqs = assoc.get(key, None)
        if freqs is None:
            freqs = {}
            assoc[key] = freqs
        valfreq = freqs.get(val, None)
        if valfreq is None:
            freqs[val] = 1
        else:
            freqs[val] = valfreq + 1
    return assoc
>>> generate_frequency_table(lst)
{'A': {'id2': 2, 'id1': 3}, 'C': {'id6': 2, 'id5': 1}, 'B': {'id3': 4, 'id': 1}}
Then, see what 'value' is associated with each key (i.e. {'A': 'id1'}):
def generate_max_assoc(assoc):
    max_assoc = {}  # e.g. {'A': 'id1'}
    for key, freqs in assoc.items():
        curmax = ('', 0)
        for val, freq in freqs.items():
            if freq > curmax[1]:
                curmax = (val, freq)
        max_assoc[key] = curmax[0]
    return max_assoc
>>> maxtable = generate_max_assoc(generate_frequency_table(lst))
>>> print(maxtable)
{'A': 'id1', 'C': 'id6', 'B': 'id3'}
Finally, iterate through the original list and replace values using the table above:
>>> newlst = [[key, unused, maxtable[key]] for key, unused, val in lst]
>>> print(newlst)
[['A', 'abc', 'id1'], ['A', 'def', 'id1'], ['A', 'ghi', 'id1'], ['A', 'ijk', 'id1'], ['A', 'lmn', 'id1'], ['B', 'abc', 'id3'], ['B', 'def', 'id3'], ['B', 'ghi', 'id3'], ['B', 'ijk', 'id3'], ['B', 'lmn', 'id3'], ['C', 'xyz', 'id6'], ['C', 'lmn', 'id6'], ['C', 'aaa', 'id6']]

This is pretty much the same solution as supplied by Santa, but I've combined a few steps into one, as we can scan for the maximum value while we are collecting the frequencies:
def fix_by_frequency(triple_list):
    freq = {}
    for key, _, value in triple_list:
        # Get existing data
        data = freq[key] = \
            freq.get(key, {'max_value': value, 'max_count': 1, 'counts': {}})
        # Increment the count
        count = data['counts'][value] = data['counts'].get(value, 0) + 1
        # Update the most frequently seen
        if count > data['max_count']:
            data['max_value'], data['max_count'] = value, count
    # Use the maximums to map the list
    return [[key, mid, freq[key]['max_value']] for key, mid, _ in triple_list]
This has been optimised a bit for readability (I think, be nice!) rather than raw speed. For example you might not want to write back to the dict when you don't need to, or maintain a separate max dict to prevent two key lookups in the list comprehension at the end.
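A shorter variant of the same two-pass idea, sketched with collections.Counter from the standard library: count values per key in one pass, pick each key's most common value once, then rebuild the list. When two values tie, Counter.most_common returns the one seen first. fix_by_frequency_counter is a hypothetical name.

```python
from collections import Counter, defaultdict


def fix_by_frequency_counter(triple_list):
    """Replace each sublist's last item with the most common last item for its first item."""
    counts = defaultdict(Counter)
    for key, _, value in triple_list:
        counts[key][value] += 1
    # One most_common(1) call per distinct key, not per row
    best = {key: counter.most_common(1)[0][0] for key, counter in counts.items()}
    return [[key, mid, best[key]] for key, mid, _ in triple_list]


lst = [['A','abc','id1'],['A','def','id2'],['A','ghi','id1'],['A','ijk','id1'],['A','lmn','id2'],
       ['B','abc','id3'],['B','def','id3'],['B','ghi','id3'],['B','ijk','id3'],['B','lmn','id'],
       ['C','xyz','id6'],['C','lmn','id6'],['C','aaa','id5']]
print(fix_by_frequency_counter(lst))
```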
