Sequential nested list with string concatenation - python

Given the nested list:
l = [['a','b','c'], ['d'], ['e','f']]
I would like to join them sequentially with '/'.join().
With the expected list result:
['a/d/e', 'a/d/f', 'b/d/e', 'b/d/f', 'c/d/e', 'c/d/f']
The solution needs to be able to scale (2D list of various sizes).
What is the best way to achieve this?

This is what's known as a Cartesian product. Here's an approach using itertools.product:
import itertools as it
list("/".join(p) for p in it.product(*l))
Output:
['a/d/e', 'a/d/f', 'b/d/e', 'b/d/f', 'c/d/e', 'c/d/f']
The itertools.product function takes an arbitrary number of iterables as arguments (and an optional repeat parameter). What I'm doing with *l is unpacking your sublists as separate arguments to the itertools.product function. This is essentially what it sees:
it.product(["a", "b", "c"], ["d"], ["e", "f"])
PS - you could actually use strings as well, since strings are iterable:
In [6]: list(it.product("abc", "d", "ef"))
Out[6]:
[('a', 'd', 'e'),
('a', 'd', 'f'),
('b', 'd', 'e'),
('b', 'd', 'f'),
('c', 'd', 'e'),
('c', 'd', 'f')]
Beware that the size of the Cartesian product of collections A, B, etc is the product of the sizes of each collection. For example, the Cartesian product of (0, 1), ("a", "b", "c") would be 2x3=6. Adding a third collection, (5, 6, 7, 8) bumps the size up to 24.
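To make the size claim concrete, here is a minimal sketch that compares the predicted size with the number of tuples itertools.product actually yields. The variable names (collections, predicted, actual) are illustrative only, and math.prod assumes Python 3.8 or later.

```python
import itertools as it
import math

# The three example collections from the paragraph above.
collections = [(0, 1), ("a", "b", "c"), (5, 6, 7, 8)]

# Predicted size: the product of the individual lengths (2 * 3 * 4).
predicted = math.prod(len(c) for c in collections)

# Actual size: count the tuples lazily instead of building the full list.
actual = sum(1 for _ in it.product(*collections))

print(predicted, actual)  # 24 24
```

Counting lazily matters once the inputs grow, because the full result list can become very large very quickly.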

You need to unpack the sublists and use itertools.product:
from itertools import product
out = ['/'.join(tpl) for tpl in product(*l)]
Output:
['a/d/e', 'a/d/f', 'b/d/e', 'b/d/f', 'c/d/e', 'c/d/f']

Related

Combinations of a list of items in efficient way

I am trying to find if there is a more efficient way of finding these combinations using some Python scientific library.
I am trying to avoid native for loops and list append preferring to use some NumPy or similar functionality that in theory should be more efficient given it's using C code under the hood. I am struggling to find one, but to me this is quite a common problem to make these operations in an efficient way rather than using slow Python native structures.
I am wondering if I am looking in the wrong places? E.g. this does not seem to help here: https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.binomial.html
See here I am taking the binomial coefficients of a list of length 5 starting from a lower bound of 2 and finding out all the possible combinations. Meanwhile I append to a global list so I then have a nice list of "taken items" from the original input list.
import itertools
input_list = ['a', 'b', 'c', 'd', 'e']
minimum_amount = 2
comb_list = []
for i in range(minimum_amount, len(input_list)):
    curr_list = input_list[:i+1]
    print(f"the current index is: {i}, the lists are: {curr_list}")
    curr_comb_list = list(itertools.combinations(curr_list, i))
    comb_list = comb_list + curr_comb_list
print(f"found {len(comb_list)} combinations (check on set length: {len(set(comb_list))})")
print(comb_list)
Gives:
found 12 combinations (check on set length: 12)
[('a', 'b'), ('a', 'c'), ('b', 'c'), ('a', 'b', 'c'), ('a', 'b', 'd'),
('a', 'c', 'd'), ('b', 'c', 'd'), ('a', 'b', 'c', 'd'), ('a', 'b', 'c', 'e'),
('a', 'b', 'd', 'e'), ('a', 'c', 'd', 'e'), ('b', 'c', 'd', 'e')]
Is it possible to do this avoiding the for loop and using some scientific libraries to do this quicker?
How can I do this in a quicker way?
The final list contains all combinations of any length from 1 to len(input_list), which is actually the Power Set.
Look at How to get all possible combinations of a list’s elements?.
You want all combinations from input_list of length 2 or more.
To get them, you can run:
comb_lst = list(itertools.chain.from_iterable(
    [ itertools.combinations(input_list, i)
      for i in range(2, len(input_list)) ]))
This is similar to the powerset recipe in the itertools documentation,
but not exactly the same (the length starts from 2, not from 1).
Note also that curr_list in your code is actually used only for printing.
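As a rough sketch of the "length 2 or more" idea, the helper below wraps the chain.from_iterable call in a function. The name combinations_min_length and its min_len parameter are made up for illustration. Note that it also includes the single full-length combination, whereas range(2, len(input_list)) stops one short of that, and that it draws from the whole list rather than from growing prefixes as the loop in the question does.

```python
from itertools import chain, combinations

def combinations_min_length(items, min_len=2):
    """Yield every combination of `items` with at least `min_len` elements."""
    sizes = range(min_len, len(items) + 1)  # +1 so the full length is included
    return chain.from_iterable(combinations(items, r) for r in sizes)

result = list(combinations_min_length(['a', 'b', 'c', 'd', 'e'], 2))
print(len(result))  # 26 = C(5,2) + C(5,3) + C(5,4) + C(5,5)
print(result[:3])   # [('a', 'b'), ('a', 'c'), ('a', 'd')]
```

The loop still exists, but it runs inside itertools (implemented in C in CPython), and nothing is materialised until list() is called.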

Python convert dictionary where keys have multiple values into list of tuples

I am trying to convert a dictionary into a list of tuples. I see that there are lots of similar posts on here, but I didn't see any with the same format that I am looking for. For example, I may be given the following as a dict:
{"A": ["B", "C", "D"], "B":["D"], "C":[], "D":["A"]}
And I want the output to look like a list with elements such as...
[('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'D'), ('D', 'A')]
where the key is set with the corresponding values as their own tuples.
So for the following, I am writing the function find_par, which takes in a dict and 2 strings such as "A" and "D". Ultimately, what this function does is try to find the parent node of the vert parameter. I figured that if I can get a list, I can loop over it, checking the 2nd element of each tuple, and once it matches vert, I can return the first element because that would be the parent. Otherwise I would return nothing.
def find_par(tree, root, vert):
    if root == vert:
        return []
    else:
        for key, value in tree.items():
            temp = [key,value]
            print (temp)
This code produces the following output with the dict {"A": ["B", "C"], "B":["D"], "C":[], "D":[]} provided:
['A', ['B', 'C']]
['B', ['D']]
['C', []]
['D', []]
Again, it should look like [('A', 'B'), ('A', 'C'), ('B', 'D'),...]
You can use a list comprehension to iterate over the items in the sub-list in each dict value (given your dict stored as variable d):
[(k, i) for k, l in d.items() for i in l]
This returns:
[('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'D'), ('D', 'A')]
Check this.
dict1 = {"A": ["B", "C"], "B":["D"], "C":[], "D":[]}
for key, values in dict1.items():
    for i in values:
        print([key,i])
#prints ['A', 'B'],['A', 'C'],['B', 'D']
I have stored your dict as variable z.
[(a, b) for a in z for b in z[a]]
Below I'll explain how this nesting of list comprehension works:
You're taking every key in the dictionary with
for a in z and matching it with every value in the list associated with z[a] using for b in z[a]. You can see at the beginning of the statement I create a tuple.
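Putting the comprehension together with the parent lookup described in the question, a minimal sketch could look like the following. The function names edges and find_parent are hypothetical (the question calls its function find_par and also passes a root argument, which is left out here), and the sketch assumes Python 3.7+ so that dict insertion order is preserved.

```python
def edges(tree):
    """Flatten {parent: [children]} into a list of (parent, child) tuples."""
    return [(parent, child)
            for parent, children in tree.items()
            for child in children]

def find_parent(tree, vert):
    """Return the first key whose child list contains `vert`, else None."""
    for parent, child in edges(tree):
        if child == vert:
            return parent
    return None

tree = {"A": ["B", "C"], "B": ["D"], "C": [], "D": []}
print(edges(tree))              # [('A', 'B'), ('A', 'C'), ('B', 'D')]
print(find_parent(tree, "D"))   # B
print(find_parent(tree, "A"))   # None
```

Keys with an empty list, such as "C" here, simply contribute no tuples.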

All combinations of set of dictionaries into K N-sized groups

I thought this would be straightforward; unfortunately, it is not.
I am trying to build a function to take an iterable of dictionaries (i.e., a list of unique dictionaries) and return a list of lists of unique groupings of the dictionaries.
If I have x players I would like to form k teams of n size.
This question and set of answers from CMSDK is the closest thing to a solution I can find. In adapting it from processing strings of letters to dictionaries I am finding my Python skills inadequate.
The original function that I am adapting comes from the second answer:
import itertools as it
def unique_group(iterable, k, n):
    """Return an iterator, comprising groups of size `k` with combinations of size `n`."""
    # Build separate combinations of `n` characters
    groups = ("".join(i) for i in it.combinations(iterable, n))  # 'AB', 'AC', 'AD', ...
    # Build unique groups of `k` by keeping the longest sets of characters
    return (i for i in it.product(groups, repeat=k)
            if len(set("".join(i))) == sum((map(len, i))))  # ('AB', 'CD'), ('AB', 'CE'), ...
My current adaptation (that utterly fails with an error of TypeError: object of type 'generator' has no len() because of the call to map(len, i)):
def unique_group(iterable, k, n):
    groups = []
    groups.append((i for i in it.combinations(iterable, n)))
    return ( i for i in it.product(groups, repeat=k) if len(set(i)) == sum((map(len, i))) )
For a bit of context: I am trying to programmatically divide a group of players into teams for Christmas Trivia based on their skills. The list of dictionaries is formed from a yaml file that looks like
- name: Patricia
  skill: 4
- name: Christopher
  skill: 6
- name: Nicholas
  skill: 7
- name: Bianca
  skill: 4
Which, after yaml.load produces a list of dictionaries:
players = [{'name':'Patricia', 'skill':4},{'name':'Christopher','skill':6},
{'name':'Nicholas','skill':7},{'name':'Bianca','skill':4}]
So I expect output that would look like a list of these (where k = 2 and n = 2) :
(
    # Team assignment grouping 1
    (
        # Team 1
        ( {'name': 'Patricia', 'skill': 4}, {'name': 'Christopher', 'skill': 6} ),
        # Team 2
        ( {'name': 'Nicholas', 'skill': 7}, {'name': 'Bianca', 'skill': 4} )
    ),
    # Team assignment grouping 2
    (
        # Team 1
        ( {'name': 'Patricia', 'skill': 4}, {'name': 'Bianca', 'skill': 4} ),
        # Team 2
        ( {'name': 'Nicholas', 'skill': 7}, {'name': 'Christopher', 'skill': 6} )
    ),
    ...,
    # More unique lists
)
Each team assignment grouping needs to have unique players across teams (i.e., there cannot be the same player on multiple teams in a team assignment grouping), and each team assignment grouping needs to be unique.
Once I have the list of team assignment combinations I will sum up the skills in every group, take the difference between the highest skill and lowest skill, and choose the grouping (with variance) with the lowest difference between highest and lowest skills.
I will admit I do not understand this code fully. I understand the first assignment to create a list of all the combinations of the letters in a string, and the return statement to find the product under the condition that the product does not contain the same letter in different groups.
My initial attempt was to simply take the it.product(it.combinations(iterable, n), repeat=k) but this does not achieve uniqueness across groups (i.e., I get the same player on different teams in one grouping).
Thanks in advance, and Merry Christmas!
Update:
After a considerable amount of fiddling I have gotten the adaptation to this:
This does not work
def unique_group(iterable, k, n):
    groups = []
    groups.append((i for i in it.combinations(iterable, n)))
    return (i for i in it.product(groups, repeat=k)\
        if len(list({v['name']:v for v in it.chain.from_iterable(i)}.values())) ==\
        len(list([x for x in it.chain.from_iterable(i)])))
I get a bug
Traceback (most recent call last):
File "./optimize.py", line 65, in <module>
for grouping in unique_group(players, team_size, number_of_teams):
File "./optimize.py", line 32, in <genexpr>
v in it.chain.from_iterable(i)})) == len(list([x for x in
File "./optimize.py", line 32, in <dictcomp>
v in it.chain.from_iterable(i)})) == len(list([x for x in
TypeError: tuple indices must be integers or slices, not str
Which is confusing the crap out of me and makes clear I don't know what my code is doing. In ipython I took this sample output:
assignment = (
({'name': 'Patricia', 'skill': 4}, {'name': 'Bianca', 'skill': 4}),
({'name': 'Patricia', 'skill': 4}, {'name': 'Bianca', 'skill': 4})
)
Which is clearly undesirable and formulated the following test:
len(list({v['name']:v for v in it.chain.from_iterable(assignment)})) == len([v for v in it.chain.from_iterable(assignment)])
Which correctly responds False. But it doesn't work in my method. That is probably because I am cargo cult coding at this point.
I understand what it.chain.from_iterable(i) does (it flattens the tuple of tuples of dictionaries to just a tuple of dictionaries). But it seems that the syntax {v['name']:v for v in ...} does not do what I think it does; either that or I'm unpacking the wrong values! I am trying to test the unique dictionaries against the total dictionaries based on Flatten list of lists and Python - List of unique dictionaries but the answer giving me
>>> L=[
... {'id':1,'name':'john', 'age':34},
... {'id':1,'name':'john', 'age':34},
... {'id':2,'name':'hanna', 'age':30},
... ]
>>> list({v['id']:v for v in L}.values())
Isn't as easy to adapt in this circumstance as I thought, and I'm realizing I don't really know what is getting returned in the it.product(groups, repeat=k). I'll have to investigate more.
This is where I'd leverage the new dataclasses with sets. You can make a dataclass hashable by setting frozen=True in the decorator. First, add your players to a set to get unique players. Then get all the combinations of players for teams of size n and build a set of unique teams. Next, create the valid groupings, in which no player appears on more than one team. Finally, calculate the maximum disparity in total team skill across each grouping (leveraging combinations yet again) and use that to sort the valid groupings. Something like this:
from dataclasses import dataclass
from itertools import combinations
from typing import FrozenSet
import yaml
@dataclass(order=True, frozen=True)
class Player:
    name: str
    skill: int
@dataclass(order=True, frozen=True)
class Team:
    members: FrozenSet[Player]
    def total_skill(self):
        return sum(p.skill for p in self.members)
def is_valid(grouping):
    # A grouping is valid only if no player appears on more than one team
    players = set()
    for team in grouping:
        for player in team.members:
            if player in players:
                return False
            players.add(player)
    return True
def max_team_disparity(grouping):
    # Largest gap in total skill between any two teams in the grouping
    return max(
        abs(t1.total_skill() - t2.total_skill())
        for t1, t2 in combinations(grouping, 2)
    )
def best_team_matchups(player_file, k, n):
    with open(player_file) as f:
        # safe_load, because recent PyYAML versions require an explicit Loader for yaml.load
        players = set(Player(p['name'], p['skill']) for p in yaml.safe_load(f))
    player_combs = combinations(players, n)
    unique_teams = set(Team(frozenset(team)) for team in player_combs)
    valid_groupings = set(g for g in combinations(unique_teams, k) if is_valid(g))
    for g in sorted(valid_groupings, key=max_team_disparity):
        print(g)
best_team_matchups('test.yaml', k=2, n=4)
Example output:
(
    Team(members=frozenset({
        Player(name='Chr', skill=6),
        Player(name='Christopher', skill=6),
        Player(name='Nicholas', skill=7),
        Player(name='Patricia', skill=4)
    })),
    Team(members=frozenset({
        Player(name='Bia', skill=4),
        Player(name='Bianca', skill=4),
        Player(name='Danny', skill=8),
        Player(name='Nicho', skill=7)
    }))
)
A list of dicts is not a good data structure for mapping what you actually want to rearrange, the player names, to their respective attributes, the skill ratings. You should transform the list of dicts to a name-to-skill mapping dict first:
player_skills = {player['name']: player['skill'] for player in players}
# player_skills becomes {'Patricia': 4, 'Christopher': 6, 'Nicholas': 7, 'Bianca': 4}
so that you can recursively deduct a combination of n players from the pool of players iterable, until the number of groups reaches k:
from itertools import combinations
def unique_group(iterable, k, n, groups=0):
    if groups == k:
        yield []
        return  # k groups are already formed, so stop recursing
    pool = set(iterable)
    for combination in combinations(pool, n):
        for rest in unique_group(pool.difference(combination), k, n, groups + 1):
            yield [combination, *rest]
With your sample input, list(unique_group(player_skills, 2, 2)) returns:
[[('Bianca', 'Christopher'), ('Nicholas', 'Patricia')],
[('Bianca', 'Nicholas'), ('Christopher', 'Patricia')],
[('Bianca', 'Patricia'), ('Christopher', 'Nicholas')],
[('Christopher', 'Nicholas'), ('Bianca', 'Patricia')],
[('Christopher', 'Patricia'), ('Bianca', 'Nicholas')],
[('Nicholas', 'Patricia'), ('Bianca', 'Christopher')]]
You can get the combination with the lowest variance in total skill ratings by using the min function with a key function that returns the skill difference between the team with the highest total skill ratings and the one with the lowest, which takes only O(n) in time complexity:
def variance(groups):
    total_skills = [sum(player_skills[player] for player in group) for group in groups]
    return max(total_skills) - min(total_skills)
so that min(unique_group(player_skills, 2, 2), key=variance) returns:
[('Bianca', 'Nicholas'), ('Christopher', 'Patricia')]
Instead of trying to create every possible grouping of k sets of n elements (possibly including repeats!), and then filtering down to the ones that don't have any overlap, let's directly build groupings that meet the criterion. This also avoids generating redundant groupings in different orders (the original code could also do this by using combinations rather than product in the last step).
The approach is:
Iterate over possibilities (combinations of n elements in the input) for the first set - by which I mean, the one that contains the first of the elements that will be chosen.
For each, recursively find possibilities for the remaining sets. They cannot use elements from the first set, and they also cannot use elements from before the first set (or else the first set wouldn't be first).
In order to combine the results elegantly, we use a recursive generator: rather than trying to build lists that contain results from the recursive calls, we just yield everything we need to. We represent each collection of group_count many elements with a tuple of tuples (the inner tuples are the groups). At the base case, there is exactly one way to make no groups of elements - by just... doing that... yeah... - so we need to yield one value which is a tuple of no tuples of an irrelevant number of elements each - i.e., an empty tuple. In the other cases, we prepend the tuple for the current group to each result from the recursive call, yielding all those results.
from itertools import combinations
def non_overlapping_groups(group_count, group_size, population):
    if group_count == 0:
        yield ()
        return
    for indices in combinations(range(len(population)), group_size):
        current = (tuple(population[i] for i in indices),)
        remaining = [
            x for i, x in enumerate(population)
            if i not in indices and i > indices[0]
        ] if indices else population
        for recursive in non_overlapping_groups(group_count - 1, group_size, remaining):
            yield current + recursive
Let's try it:
>>> list(non_overlapping_groups(2, 3, 'abcdef'))
[(('a', 'b', 'c'), ('d', 'e', 'f')), (('a', 'b', 'd'), ('c', 'e', 'f')), (('a', 'b', 'e'), ('c', 'd', 'f')), (('a', 'b', 'f'), ('c', 'd', 'e')), (('a', 'c', 'd'), ('b', 'e', 'f')), (('a', 'c', 'e'), ('b', 'd', 'f')), (('a', 'c', 'f'), ('b', 'd', 'e')), (('a', 'd', 'e'), ('b', 'c', 'f')), (('a', 'd', 'f'), ('b', 'c', 'e')), (('a', 'e', 'f'), ('b', 'c', 'd'))]
>>> list(non_overlapping_groups(3, 2, 'abcdef'))
[(('a', 'b'), ('c', 'd'), ('e', 'f')), (('a', 'b'), ('c', 'e'), ('d', 'f')), (('a', 'b'), ('c', 'f'), ('d', 'e')), (('a', 'c'), ('b', 'd'), ('e', 'f')), (('a', 'c'), ('b', 'e'), ('d', 'f')), (('a', 'c'), ('b', 'f'), ('d', 'e')), (('a', 'd'), ('b', 'c'), ('e', 'f')), (('a', 'd'), ('b', 'e'), ('c', 'f')), (('a', 'd'), ('b', 'f'), ('c', 'e')), (('a', 'e'), ('b', 'c'), ('d', 'f')), (('a', 'e'), ('b', 'd'), ('c', 'f')), (('a', 'e'), ('b', 'f'), ('c', 'd')), (('a', 'f'), ('b', 'c'), ('d', 'e')), (('a', 'f'), ('b', 'd'), ('c', 'e')), (('a', 'f'), ('b', 'e'), ('c', 'd'))]
>>> # Some quick sanity checks
>>> len(list(non_overlapping_groups(2, 3, 'abcdef')))
10
>>> # With fewer input elements, obviously we can't do it.
>>> len(list(non_overlapping_groups(2, 3, 'abcde')))
0
>>> # Adding a 7th element, any element could be the odd one out,
>>> # and in each case we get another 10 possibilities, making 10 * 7 = 70.
>>> len(list(non_overlapping_groups(2, 3, 'abcdefg')))
70
I performance tested this against a modified version of the original (which also shows how to make it work properly with non-strings, and optimizes the sum calculation):
def unique_group(group_count, group_size, population):
    # Uses the same `from itertools import combinations` import as above
    groups = list(combinations(population, group_size))
    return (
        i for i in combinations(groups, group_count)
        if len({e for g in i for e in g}) == group_count * group_size
    )
Quickly verifying the equivalence:
>>> len(list(unique_group(3, 2, 'abcdef')))
15
>>> len(list(non_overlapping_groups(3, 2, 'abcdef')))
15
>>> set(unique_group(3, 2, 'abcdef')) == set(non_overlapping_groups(3, 2, 'abcdef'))
True
We see that even for fairly small examples (here, the output has 280 groupings), the brute-force approach has to filter through a lot:
>>> import timeit
>>> timeit.timeit("list(g(3, 3, 'abcdefghi'))", globals={'g': unique_group}, number=100)
5.895461600041017
>>> timeit.timeit("list(g(3, 3, 'abcdefghi'))", globals={'g': non_overlapping_groups}, number=100)
0.2303082060534507
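To tie the recursive approach back to the original player data, here is a hedged sketch that works on indices into the list of dicts (dicts are not hashable, so indices stand in for players) and then picks a grouping with the smallest gap between the strongest and weakest team. The names groupings and spread are illustrative, not taken from any of the answers, and the sketch assumes a group size of at least 1.

```python
from itertools import combinations

players = [
    {'name': 'Patricia', 'skill': 4},
    {'name': 'Christopher', 'skill': 6},
    {'name': 'Nicholas', 'skill': 7},
    {'name': 'Bianca', 'skill': 4},
]

def groupings(indices, k, n):
    """Yield tuples of k disjoint index tuples of size n, each grouping once."""
    if k == 0:
        yield ()
        return
    for first in combinations(indices, n):
        # Later groups may only use indices after this group's smallest one,
        # which prevents the same grouping appearing in a different order.
        rest = [i for i in indices if i > first[0] and i not in first]
        for tail in groupings(rest, k - 1, n):
            yield (first,) + tail

def spread(grouping):
    """Difference between the highest and lowest total team skill."""
    totals = [sum(players[i]['skill'] for i in team) for team in grouping]
    return max(totals) - min(totals)

best = min(groupings(range(len(players)), 2, 2), key=spread)
teams = [[players[i]['name'] for i in team] for team in best]
print(teams, spread(best))
# [['Patricia', 'Christopher'], ['Nicholas', 'Bianca']] 1
```

Two of the three possible groupings tie with a spread of 1 here; min simply returns the first one it encounters.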

"Transpose" (rotate?) nested list

I have a list of lists of lists like this:
[
    [
        [a,b],
        [c,d]
    ],
    [
        [e,f],
        [g,h]
    ]
]
Basically, this is a cube of values.
What I want is a different order of items in the same cube, like this:
[
    [
        [a,e],
        [b,f]
    ],
    [
        [c,g],
        [d,h]
    ]
]
And, preferably, in a one-liner (yes, I do know that's not the best practice).
I know of the map(list, zip(*a)) trick, but I couldn't figure out how to apply it here. Something with lambdas and maps, probably?
UPD: As for what I need it for --- I've done some tests for speeds of different sorting algorithms; each deepest list has values -- the times that the sorting algorithms that I tested took. These lists are in lists, which represent different types of tests, and the outer list has the same thing repeated for different test sizes. After such rotation, I will have list (test size) of lists (test type) of lists (sort type) of items (time), which is so much more convenient to plot.
If I understand you correctly, you want to first transpose all the sublists then transpose the newly transposed groups:
print([list(zip(*sub)) for sub in zip(*l)])
Output:
In [69]: [list(zip(*sub)) for sub in zip(*l)]
Out[69]: [[('a', 'e'), ('b', 'f')], [('c', 'g'), ('d', 'h')]]
If you want some map foo with a lambda:
In [70]: list(map(list, map(lambda x: zip(*x), zip(*l))))
Out[70]: [[('a', 'e'), ('b', 'f')], [('c', 'g'), ('d', 'h')]]
For python2 you don't need the extra map call, but I would use itertools.izip to do the initial transpose:
In [9]: from itertools import izip
In [10]: map(lambda x: zip(*x), izip(*l))
Out[10]: [[('a', 'e'), ('b', 'f')], [('c', 'g'), ('d', 'h')]]
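For readers who want to check what the double zip actually does to the indices, the sketch below (Python 3) compares it with an explicit index-based comprehension. The names rotated and by_index are illustrative, and the inner tuples are converted to lists so the result matches the shape asked for in the question.

```python
l = [
    [['a', 'b'], ['c', 'd']],
    [['e', 'f'], ['g', 'h']],
]

# Double transpose: zip(*l) pairs up the i-th rows of each outer item,
# then zip(*sub) pairs up the j-th elements within those rows.
rotated = [[list(t) for t in zip(*sub)] for sub in zip(*l)]

# The same rotation written with explicit indices: out[i][j][k] == l[k][i][j].
by_index = [
    [[l[k][i][j] for k in range(len(l))] for j in range(len(l[0][0]))]
    for i in range(len(l[0]))
]

print(rotated)              # [[['a', 'e'], ['b', 'f']], [['c', 'g'], ['d', 'h']]]
print(rotated == by_index)  # True
```

In other words, the outermost axis of the input becomes the innermost axis of the output, which is the "list of test sizes last" layout described in the question.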

Programmatically generate list of combinations other lists

I want to create a list of possible combinations of a list of lists (example will explain better)
list=[[a,b],[c,d],[f]]
The result should be
acf
adf
bcf
bdf
The length of the list can vary, and the length of the lists within the variable list can also vary. How would I make this/these loop(s) programmatically? (preferably explained in Python or pseudo-language)
That's what itertools.product is for:
>>> lst = ['ab','cd','f']
>>> from itertools import product
>>> list(product(*lst))
[('a', 'c', 'f'), ('a', 'd', 'f'), ('b', 'c', 'f'), ('b', 'd', 'f')]
import itertools
lst = [['a','b'],['c','d'],['f']]  # renamed so the built-in list is not shadowed
for comb in itertools.product(*lst):
    print(''.join(comb))
You can do it recursively:
def printCombos(arrays, combo):
    if len(arrays) == 0:
        # No sublists left: combo holds one complete combination
        print(combo)
    else:
        for i in arrays[0]:
            combo.append(i)
            printCombos(arrays[1:], combo)
            combo.pop()  # backtrack before trying the next item
l=[['a','b'],['c','d'],['f']]
printCombos(l, [])
# Works only when there are exactly three sublists
listoflists = [['a','b'],['c','d'],['f']]
for firstobj in listoflists[0]:
    for secondobj in listoflists[1]:
        for lastobj in listoflists[2]:
            # Start a fresh list for every combination
            curlist = [firstobj, secondobj, lastobj]
            print(','.join(curlist))
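Hard-coded nested loops only cover a fixed number of sublists. As a rough illustration of how the same idea generalises without itertools, the sketch below grows the partial combinations one sublist at a time. The function name join_combinations and its sep parameter are made up for this example.

```python
def join_combinations(list_of_lists, sep=''):
    """Return every way of picking one item per sublist, joined with `sep`."""
    partial = [[]]  # start with a single empty combination
    for sublist in list_of_lists:
        # Extend every partial combination with every item of this sublist.
        partial = [combo + [item] for combo in partial for item in sublist]
    return [sep.join(combo) for combo in partial]

print(join_combinations([['a', 'b'], ['c', 'd'], ['f']]))
# ['acf', 'adf', 'bcf', 'bdf']
print(join_combinations([['a', 'b', 'c'], ['d'], ['e', 'f']], sep='/'))
# ['a/d/e', 'a/d/f', 'b/d/e', 'b/d/f', 'c/d/e', 'c/d/f']
```

Unlike itertools.product, this version holds every partial combination in memory, so it is meant to show the mechanics rather than to replace the library call.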
