Summing similar elements within a tuple-of-tuples - python

Following on from this question, I now need to sum similar entries (tuples) within an overall tuple.
So given a tuple-of-tuples such as:
T = (('a', 'b', 2),
('a', 'c', 4),
('b', 'c', 1),
('a', 'b', 8),)
For all tuples where the first and second element are identical, I want to sum the third element, otherwise, leave the tuple in place. So I will end up with the following tuple-of-tuples:
(('a', 'b', 10),
('a', 'c', 4),
('b', 'c', 1),)
The order of the tuples within the enclosing tuple (and the summing) doesn't matter.
We are dealing with tuples so we can't take advantage of something like dict.get(). If we go the defaultdict route :
In [1218]: d = defaultdict(lambda: defaultdict(int))
In [1220]: for t in T:
   ......:     d[t[0]][t[1]] += t[2]
   ......:
In [1225]: d
Out[1225]:
defaultdict(<function __main__.<lambda>>,
{'a': defaultdict(int, {'b': 10, 'c': 4}),
'b': defaultdict(int, {'c': 1})})
I'm not quite sure how to reconstruct that into a tuple-of-tuples. And anyway, although the order of the three elements within each tuple will be consistent, I'm not comfortable with my indexing of the tuples. Can this be done without any conversion to other data types?

Code -
from collections import defaultdict
T1 = (('a', 'b', 2),
('a', 'c', 4),
('b', 'c', 1),
('a', 'b', 8),)
d = defaultdict(int)
for x, y, z in T1:
    d[(x, y)] += z
T2 = tuple([(*k, v) for k, v in d.items()])
print(T2)
Output -
(('a', 'c', 4), ('b', 'c', 1), ('a', 'b', 10))
If you're interested in maintaining the original order, then -
from collections import OrderedDict
T1 = (('a', 'b', 2), ('a', 'c', 4), ('b', 'c', 1), ('a', 'b', 8),)
d = OrderedDict()
for x, y, z in T1:
    d[(x, y)] = d[(x, y)] + z if (x, y) in d else z
T2 = tuple((*k, v) for k, v in d.items())
print(T2)
Output -
(('a', 'b', 10), ('a', 'c', 4), ('b', 'c', 1))
In Python 2, you should use this -
T2 = tuple([(x, y, z) for (x, y), z in d.items()])

You just need a defaultdict(int):
>>> from collections import defaultdict
>>>
>>> d = defaultdict(int)
>>> T = (('a', 'b', 2),
... ('a', 'c', 4),
... ('b', 'c', 1),
... ('a', 'b', 8),)
>>>
>>> for key1, key2, value in T:
...     d[(key1, key2)] += value
...
>>> [(key1, key2, value) for (key1, key2), value in d.items()]
[
('b', 'c', 1),
('a', 'b', 10),
('a', 'c', 4)
]
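The nested defaultdict from the question can also be flattened back into a tuple-of-tuples with a two-level generator expression. This is only a sketch of that idea; the names nested and result are illustrative and not taken from the original posts.

```python
from collections import defaultdict

T = (('a', 'b', 2),
     ('a', 'c', 4),
     ('b', 'c', 1),
     ('a', 'b', 8),)

# Same nested structure as in the question: first key -> second key -> running sum.
nested = defaultdict(lambda: defaultdict(int))
for first, second, value in T:
    nested[first][second] += value

# Walk both levels and rebuild one flat 3-tuple per (first, second) pair.
result = tuple(
    (first, second, total)
    for first, inner in nested.items()
    for second, total in inner.items()
)
print(result)
```

The order of the resulting tuples follows dictionary iteration order, which the question says does not matter.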

Related

Use the second value in a list to carry out operation during initial run of a for loop

How do I call a certain value from a dictionary based on the values on a list_a?
list_a = ['S', 'D', 'E']
dict_list = {'S': [('A', 8), ('B', 4), ('D', 6)], 'D': [('C', 5,), ('E', 3)], }
for i in list_a:
    print(i)
I would like to add the value from 'S' to 'D' and then add the value from 'D' to 'E'.
For this example, the answer I am looking for is 9. Basically S to D = 6 and D to E = 3.
Let's use
list_a = ['S', 'D', 'E']
dict_list = {'S': [('A', 8), ('B', 4), ('D', 6)], 'D': [('C', 5), ('E', 3)]}
sum(dict(dict_list[x])[y] for x, y in zip(list_a, list_a[1:]))
# 9
This pairs each entry of list_a with its successor, looks up the value for each pair, then sums the results.
If a key might not exist, you can change dict(dict_list[x])[y] to dict(dict_list.get(x, [])).get(y, 0).
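As a rough sketch of the missing-key variant, the lookup can be wrapped in a small helper. The function name path_cost and the choice to count a missing hop as 0 are assumptions made for illustration, not part of the original answer.

```python
def path_cost(path, graph):
    """Sum the values along consecutive pairs of `path`; a missing hop counts as 0."""
    total = 0
    for src, dst in zip(path, path[1:]):
        # graph.get(src, []) tolerates a missing source key,
        # and .get(dst, 0) tolerates a missing destination.
        neighbours = dict(graph.get(src, []))
        total += neighbours.get(dst, 0)
    return total


list_a = ['S', 'D', 'E']
dict_list = {'S': [('A', 8), ('B', 4), ('D', 6)], 'D': [('C', 5), ('E', 3)]}

print(path_cost(list_a, dict_list))            # 9
print(path_cost(['S', 'X', 'E'], dict_list))   # 0, neither hop exists
```

Whether a missing hop should silently contribute 0 or raise an error depends on the use case; raising may be safer when a gap indicates bad input.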

How do I convert a list of pairs into a dictionary with each element as a key to a list of paired values?

I'm doing coursework which involves graphs. I have edge lists E=[('a','b'),('a','c'),('a','d'), ('b','c') etc. ] and I want a function to convert them into adjacency matrices in the form of dictionaries {'a':['b','c','d'], 'b':['a', etc. } so that I can use a function that only accepts these dictionaries as input.
My main issue is I can't figure out how to use a loop to add key:values without just overwriting the lists. A previous version of my function would output [] as all values because 'f' has no connections.
I've tried this:
V = ['a','b','c','d','e','f']
E=[('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
def EdgeListtoAdjMat(V,E):
    GA={}
    conneclist=[]
    for v in V:
        for i in range(len(V)):
            conneclist.append([])
            if (v,V[i]) in E:
                conneclist[i].append(V[i])
    for i in range(len(V)):
        GA[V[i]]=conneclist[i]
    return(GA)
EdgeListtoAdjMat(V,E) outputs:
{'a': [], 'b': ['b'], 'c': ['c', 'c'], 'd': ['d', 'd', 'd'], 'e': [], 'f': []}
whereas it should output:
{'a':['b','c','d'],
'b':['a','c','d'],
'c':['a','b','d'],
'd':['a','b','c'],
'e':[],
'f':[]
}
The logic of what you're trying to achieve is actually quite simple:
V = ['a','b','c','d','e','f']
E=[('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
result = {}
for elem in V:
    tempList = []
    for item in E:
        if elem in item:
            if elem == item[0]:
                tempList.append(item[1])
            else:
                tempList.append(item[0])
    result[elem] = tempList
    tempList = []
print(result)
Result:
{'a': ['b', 'c', 'd'], 'b': ['a', 'c', 'd'], 'c': ['a', 'b', 'd'], 'd': ['a', 'b', 'c'], 'e': [], 'f': []}
For every element in V, check whether it appears in any tuple in E. If it does, take the other element of that tuple and append it to a temporary list. After checking every tuple in E, store the list in the result dictionary and move on to the next element of V until you're done.
To get back to your code, you need to modify it as follows:
def EdgeListtoAdjMat(V,E):
    GA={}
    conneclist=[]
    for i in range(len(V)):
        for j in range(len(V)):
            # Checking if a pair of two different elements exists in either format inside E.
            if not i==j and ((V[i],V[j]) in E or (V[j],V[i]) in E):
                conneclist.append(V[j])
        GA[V[i]]=conneclist
        conneclist = []
    return(GA)
A more efficient approach is to iterate through the edges and append to the output dict of lists the vertices in both directions. Use dict.setdefault to initialize each new key with a list. And when the iterations over the edges finish, iterate over the rest of the vertices that are not yet in the output dict to assign to them empty lists:
def EdgeListtoAdjMat(V,E):
    GA = {}
    for a, b in E:
        GA.setdefault(a, []).append(b)
        GA.setdefault(b, []).append(a)
    for v in V:
        if v not in GA:
            GA[v] = []
    return GA
so that given:
V = ['a', 'b', 'c', 'd', 'e', 'f']
E = [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
EdgeListtoAdjMat(V, E) would return:
{'a': ['b', 'c', 'd'], 'b': ['a', 'c', 'd'], 'c': ['a', 'b', 'd'], 'd': ['a', 'b', 'c'], 'e': [], 'f': []}
Since you already have your list of vertices in V, it is easy to prepare a dictionary with an empty list of connections for each vertex. Then, simply go through the edge list and add each endpoint to the other endpoint's list:
V = ['a','b','c','d','e','f']
E = [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
GA = {v:[] for v in V}
for v1,v2 in E:
    GA[v1].append(v2)
    GA[v2].append(v1)
I think your code is not very pythonic. You could write more readable code that is simpler to debug and also faster by using a dictionary for the index lookup and numpy's indexing. Note that this returns an actual adjacency matrix rather than a dictionary of lists.
import numpy as np
def EdgeListToAdjMat(V, E):
    AdjMat = np.zeros((len(V), len(V))) # the shape of the adjacency matrix
    connectlist = {
        # Mapping each character to its index
        x: idx for idx, x in enumerate(V)
    }
    for e in E:
        v1, v2 = e
        idx_1, idx_2 = connectlist[v1], connectlist[v2]
        AdjMat[idx_1, idx_2] = 1
        AdjMat[idx_2, idx_1] = 1
    return AdjMat
If you'd consider using a library, networkx is designed for this type of network problem:
import networkx as nx
V = ['a','b','c','d','e','f']
E = [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
G=nx.Graph(E)
G.add_nodes_from(V)
GA = nx.to_dict_of_lists(G)
print(GA)
# {'a': ['c', 'b', 'd'], 'c': ['a', 'b', 'd'], 'b': ['a', 'c', 'd'], 'e': [], 'd': ['a', 'c', 'b'], 'f': []}
You can convert the edge list to the map using itertools.groupby
from itertools import groupby
from operator import itemgetter
V = ['a','b','c','d','e','f']
E = [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
# add edge in the other direction. E.g., for a -> b, add b -> a
nondirected_edges = E + [tuple(reversed(pair)) for pair in E]
# extract start and end vertices from an edge
v_start = itemgetter(0)
v_end = itemgetter(1)
# group edges by their starting vertex
groups = groupby(sorted(nondirected_edges), key=v_start)
# make a map from each vertex -> adjacent vertices
mapping = {vertex: list(map(v_end, edges)) for vertex, edges in groups}
# if you don't need all the vertices to be present
# and just want to be able to lookup the connected
# list of vertices to a given vertex at some point
# you can use a defaultdict:
from collections import defaultdict
adj_matrix = defaultdict(list, mapping)
# if you need all vertices present immediately:
adj_matrix = dict(mapping)
adj_matrix.update({vertex: [] for vertex in V if vertex not in mapping})
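Another possible variant, sketched here under the assumption that duplicate edges should be collapsed and neighbours returned in sorted order, collects neighbours in sets via a defaultdict. The function name edge_list_to_adjacency is made up for this example.

```python
from collections import defaultdict


def edge_list_to_adjacency(vertices, edges):
    """Build {vertex: sorted neighbour list} for an undirected edge list."""
    neighbours = defaultdict(set)
    for a, b in edges:
        # Record the edge in both directions; sets drop repeated edges.
        neighbours[a].add(b)
        neighbours[b].add(a)
    # Looking up an isolated vertex creates an empty set, so it maps to [].
    return {v: sorted(neighbours[v]) for v in vertices}


V = ['a', 'b', 'c', 'd', 'e', 'f']
E = [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
print(edge_list_to_adjacency(V, E))
```

Only vertices listed in V appear as keys, so an edge endpoint that is missing from V is not given an entry of its own.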

How to do a full outer join / merge of iterators by key?

I have multiple sorted iterators that yield keyed data, representable by lists:
a = iter([(1, 'a'), (2, 't'), (4, 'c')])
b = iter([(1, 'a'), (3, 'g'), (4, 'g')])
I want to merge them, using the key and keeping track of which iterator had a value for a key. This should be equivalent to a full outer join in SQL:
>>> list(full_outer_join(a, b, key=lambda x: x[0]))
[(1, 'a', 'a'), (2, 't', None), (3, None, 'g'), (4, 'c', 'g')]
I tried using heapq.merge and itertools.groupby, but with merge I already lose information about the iterators:
>>> list(heapq.merge(a, b, key=lambda x: x[0]))
[(1, 'a'), (1, 'a'), (2, 't'), (3, 'g'), (4, 'c'), (4, 'g')]
So what I could use is a tag generator
def tagged(it, tag):
    for item in it:
        yield (tag, *item)
and merge the tagged iterators, group by the key and create a dict using the tag:
merged = merge(tagged(a, 'a'), tagged(b, 'b'), key=lambda x: x[1])
grouped = groupby(merged, key=lambda x: x[1])
[(key, {g[0]: g[2] for g in group}) for key, group in grouped]
Which gives me this usable output:
[(1, {'a': 'a', 'b': 'a'}),
(2, {'a': 't'}),
(3, {'b': 'g'}),
(4, {'a': 'c', 'b': 'g'})]
However, I think creating dicts for every group is quite costly performance wise, so maybe there is a more elegant way?
Edit:
To clarify, the dataset is too big to fit into memory, so I definitely need to use generators/iterators.
Edit 2:
To further clarify, a and b should only be iterated over once, because they represent huge files that are slow to read.
You can alter your groupby solution by using reduce and a generator in a function:
from itertools import groupby
from functools import reduce
def group_data(a, b):
    sorted_data = sorted(a+b, key=lambda x:x[0])
    data = [reduce(lambda x, y:(*x, y[-1]), list(b)) for _, b in groupby(sorted_data, key=lambda x:x[0])]
    current = iter(range(len(list(filter(lambda x:len(x) == 2, data)))))
    yield from [i if len(i) == 3 else (*i, None) if next(current)%2 == 0 else (i[0], None, i[-1]) for i in data]
print(list(group_data([(1, 'a'), (2, 't'), (4, 'c')], [(1, 'a'), (3, 'g'), (4, 'g')])))
Output:
[(1, 'a', 'a'), (2, 't', None), (3, None, 'g'), (4, 'c', 'g')]
Here is one solution via dictionaries. I provide it here as it's not clear to me that dictionaries are inefficient in this case.
I believe dict_of_lists can be replaced by an iterator, but I use it in the below solution for demonstration purposes.
a = [(1, 'a'), (2, 't'), (4, 'c')]
b = [(1, 'a'), (3, 'g'), (4, 'g')]
dict_of_lists = {'a': a, 'b': b}
def gen_results(dict_of_lists):
    keys = {num for k, v in dict_of_lists.items() \
            for num, val in v}
    for key in keys:
        d = {k: val for k, v in dict_of_lists.items() \
             for num, val in v if num == key}
        yield (key, d)
Result
list(gen_results(dict_of_lists))
[(1, {'a': 'a', 'b': 'a'}),
(2, {'a': 't'}),
(3, {'b': 'g'}),
(4, {'a': 'c', 'b': 'g'})]
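Since the question requires that each iterator is consumed only once and never held in memory, a classic sorted merge join may be a better fit. The following is a minimal sketch, assuming both inputs are sorted by key, keys are unique within each input, and each record is a (key, value) pair; the sentinel name _DONE is illustrative.

```python
def full_outer_join(a, b, key):
    """Lazily join two key-sorted iterators, yielding (key, value_a, value_b)."""
    _DONE = object()  # sentinel marking an exhausted iterator
    x, y = next(a, _DONE), next(b, _DONE)
    while x is not _DONE or y is not _DONE:
        if y is _DONE or (x is not _DONE and key(x) < key(y)):
            # Key present only in a.
            yield (key(x), x[1], None)
            x = next(a, _DONE)
        elif x is _DONE or key(y) < key(x):
            # Key present only in b.
            yield (key(y), None, y[1])
            y = next(b, _DONE)
        else:
            # Same key on both sides: emit the pair and advance both.
            yield (key(x), x[1], y[1])
            x, y = next(a, _DONE), next(b, _DONE)


a = iter([(1, 'a'), (2, 't'), (4, 'c')])
b = iter([(1, 'a'), (3, 'g'), (4, 'g')])
print(list(full_outer_join(a, b, key=lambda rec: rec[0])))
# [(1, 'a', 'a'), (2, 't', None), (3, None, 'g'), (4, 'c', 'g')]
```

Only one record per input is held at a time and no per-group dictionary is built. Repeated keys within a single input would need extra handling that this sketch does not attempt.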

Combinations with max repetitions per element

I want to get a list of k-sized tuples with the combinations of a list of elements (let's call it elements) similar to what itertools.combinations_with_replacement(elements, k) would do. The difference is that I want to add a maximum to the number of replacements per element.
So for example if I run the following:
elements = ['a', 'b']
print(list(itertools.combinations_with_replacement(elements, 3)))
I get:
[('a', 'a', 'a'), ('a', 'a', 'b'), ('a', 'b', 'b'), ('b', 'b', 'b')]
I would like to have something like the following:
elements = {'a': 2, 'b': 3}
print(list(combinations_with_max_replacement(elements, 3)))
Which would print
[('a', 'a', 'b'), ('a', 'b', 'b'), ('b', 'b', 'b')]
Notice that the max number of 'a' in each tuple is 2 so ('a', 'a', 'a') is not part of the result.
I'd prefer to avoid looping through the results of itertools.combinations_with_replacement(elements, k) counting the elements in each tuple and filtering them out.
Let me know if I can give any further info.
Thanks for the help!
UPDATE
I tried:
elements = ['a'] * 2 + ['b'] * 3
print(set(itertools.combinations(elements, 3)))
and get:
{('a', 'b', 'b'), ('b', 'b', 'b'), ('a', 'a', 'b')}
I get the elements I need, but I lose the order and it seems kind of hacky.
I know you don't want to loop through the results but maybe it's easier to filter the output in this way.
def custom_combinations(elements, max_count):
    L = list(itertools.combinations_with_replacement(elements, max_count))
    for element in elements.keys():
        L = list(filter(lambda x: x.count(element) <= elements[element], L))
    return L
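If filtering is acceptable, it can also be done lazily so that the full list of candidates is never materialised. This is only a sketch of that idea; the function name follows the one used in the question, and the use of collections.Counter for counting is an assumption of this example.

```python
from collections import Counter
from itertools import combinations_with_replacement


def combinations_with_max_replacement(limits, k):
    """Yield k-sized combinations in which each element respects its limit."""
    for combo in combinations_with_replacement(limits, k):
        counts = Counter(combo)
        # Keep the combination only if no element exceeds its allowance.
        if all(counts[elem] <= limits[elem] for elem in counts):
            yield combo


elements = {'a': 2, 'b': 3}
print(list(combinations_with_max_replacement(elements, 3)))
# [('a', 'a', 'b'), ('a', 'b', 'b'), ('b', 'b', 'b')]
```

This still visits every unrestricted combination, so it does not improve the running time; it only keeps memory use flat and preserves the order in which itertools generates the tuples.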
Pure Python Solution (i.e. without itertools)
You can use recursion:
def combos(els, l):
    if l == 1:
        return [(k,) for k, v in els.items() if v]
    cs = []
    for e, n in els.items():
        if not n:
            # skip elements whose allowance is already used up
            continue
        nd = {k: v if k != e else v - 1 for k, v in els.items() if v}
        cs += [(e,)+c for c in combos(nd, l-1)]
    return cs
and a test shows it works:
>>> combos({'a': 2, 'b': 3}, 3)
[('b', 'b', 'b'), ('b', 'b', 'a'), ('b', 'a', 'b'), ('b', 'a', 'a'), ('a', 'b', 'b'), ('a', 'b', 'a'), ('a', 'a', 'b')]
Note that we do lose the order, but this is unavoidable if we are passing els as a dictionary as you requested.
I believe this recursive solution has the time complexity you desire.
Rather than passing down a dict, we pass down a list of the item pairs. We also pass down start_idx, which tells the 'lower' recursive function calls to ignore earlier elements. This fixes the out-of-order problem of the other recursive answer.
def _combos(elements, start_idx, length):
    # ignore elements before start_idx
    for i in range(start_idx, len(elements)):
        elem, count = elements[i]
        if count == 0:
            continue
        # base case: only one element needed
        if length == 1:
            yield (elem,)
        else:
            # need more than one elem: mutate the list and recurse
            elements[i] = (elem, count - 1)
            # when we recurse, we ignore elements before this one
            # this ensures we find combinations, not permutations
            for combo in _combos(elements, i, length - 1):
                yield (elem,) + combo
            # fix the list
            elements[i] = (elem, count)
def combos(elements, length):
    elements = list(elements.items())
    return _combos(elements, 0, length)
print(list(combos({'a': 2, 'b': 3}, 3)))
# [('a', 'a', 'b'), ('a', 'b', 'b'), ('b', 'b', 'b')]
As a bonus, timing shows that although it is slower than the set(itertools.combinations(_)) solution on the smallest input, it becomes faster as the input size grows.
print(timeit.Timer("list(combos({'a': 2, 'b': 2, 'c': 2}, 3))",
setup="from __main__ import combos").timeit())
# 9.647649317979813
print(timeit.Timer("set(itertools.combinations(['a'] * 2 + ['b'] * 2 + ['c'] * 2, 3))").timeit())
# 1.7750148189952597
print(timeit.Timer("list(combos({'a': 4, 'b': 4, 'c': 4}, 4))",
setup="from __main__ import combos").timeit())
# 20.669851204031147
print(timeit.Timer("set(itertools.combinations(['a'] * 4 + ['b'] * 4 + ['c'] * 4, 4))").timeit())
# 28.194088937016204
print(timeit.Timer("list(combos({'a': 5, 'b': 5, 'c': 5}, 5))",
setup="from __main__ import combos").timeit())
# 36.4631432640017
print(timeit.Timer("set(itertools.combinations(['a'] * 5 + ['b'] * 5 + ['c'] * 5, 5))").timeit())
# 177.29063899395987

Python - Sorting a Tuple-Keyed Dictionary by Tuple Value

I have a dictionary constituted of tuple keys and integer counts and I want to sort it by the third value of the tuple (key[2]) as so
data = {(a, b, c, d): 1, (b, c, b, a): 4, (a, f, l, s): 3, (c, d, j, a): 7}
print sorted(data.iteritems(), key = lambda x: data.keys()[2])
with this desired output
>>> {(b, c, b, a): 4, (a, b, c, d): 1, (c, d, j, a): 7, (a, f, l, s): 3}
but my current code seems to do nothing. How should this be done?
Edit: The appropriate code is
sorted(data.iteritems(), key = lambda x: x[0][2])
but in the context
from collections import OrderedDict
data = {('a', 'b', 'c', 'd'): 1, ('b', 'c', 'b', 'a'): 4, ('a', 'f', 'l', 's'): 3, ('c', 'd', 'j', 'a'): 7}
xxx = []
yyy = []
zzz = OrderedDict()
for key, value in sorted(data.iteritems(), key = lambda x: x[0][2]):
    x = key[2]
    y = key[3]
    xxx.append(x)
    yyy.append(y)
    zzz[x + y] = 1
print xxx
print yyy
print zzz
zzz is unordered. I know that this is because dictionaries are by default unordered and that I need to use OrderedDict to sort it but I don't know where to use it. If I use it as the checked answer suggests I get a 'tuple index out of range' error.
Solution:
from collections import OrderedDict
data = {('a', 'b', 'c', 'd'): 1, ('b', 'c', 'b', 'a'): 4, ('a', 'f', 'l', 's'): 3, ('c', 'd', 'j', 'a'): 7}
xxx = []
yyy = []
zzz = OrderedDict()
for key, value in sorted(data.iteritems(), key = lambda x: x[0][2]):
    x = key[2]
    y = key[3]
    xxx.append(x)
    yyy.append(y)
    zzz[x + y] = 1
print xxx
print yyy
print zzz
Dictionaries are unordered in Python 2 (plain dicts only guarantee insertion order from Python 3.7 onwards). You can, however, use an OrderedDict.
You then have to sort like:
from collections import OrderedDict
result = OrderedDict(sorted(data.iteritems(),key=lambda x:x[0][2]))
You need to use key=lambda x:x[0][2] because the elements are tuples (key,val) and so to obtain the key, you use x[0].
This gives:
>>> data = {('a', 'b', 'c', 'd'): 1, ('b', 'c', 'b', 'a'): 4, ('a', 'f', 'l', 's'): 3, ('c', 'd', 'j', 'a'): 7}
>>> from collections import OrderedDict
>>> result = OrderedDict(sorted(data.iteritems(),key=lambda x:x[0][2]))
>>> result
OrderedDict([(('b', 'c', 'b', 'a'), 4), (('a', 'b', 'c', 'd'), 1), (('c', 'd', 'j', 'a'), 7), (('a', 'f', 'l', 's'), 3)])
EDIT:
In order to make zzz ordered as well, you can update your code to:
data = {('a', 'b', 'c', 'd'): 1, ('b', 'c', 'b', 'a'): 4, ('a', 'f', 'l', 's'): 3, ('c', 'd', 'j', 'a'): 7}
xxx = []
yyy = []
zzz = OrderedDict()
for key, value in sorted(data.iteritems(), key = lambda x: x[0][2]):
    x = key[2]
    y = key[3]
    xxx.append(x)
    yyy.append(y)
    zzz[x + y] = 1
print xxx
print yyy
print zzz
Your key function is completely broken. It's passed the current (key, value) item as x, but you ignore that and instead always get the third item from the list of keys.
Use key=lambda x: x[0][2] instead.
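The answers above target Python 2 (iteritems, print statements). A rough Python 3 equivalent, assuming Python 3.7 or later where plain dicts keep insertion order, might look like this; the variable name result is illustrative.

```python
data = {('a', 'b', 'c', 'd'): 1, ('b', 'c', 'b', 'a'): 4,
        ('a', 'f', 'l', 's'): 3, ('c', 'd', 'j', 'a'): 7}

# Each item is a (key, value) pair, so item[0] is the tuple key
# and item[0][2] is its third element.
result = dict(sorted(data.items(), key=lambda item: item[0][2]))
print(result)
# {('b', 'c', 'b', 'a'): 4, ('a', 'b', 'c', 'd'): 1, ('c', 'd', 'j', 'a'): 7, ('a', 'f', 'l', 's'): 3}
```

On older Python 3 versions, wrapping the sorted items in collections.OrderedDict instead of dict gives the same ordering guarantee.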
