Efficient list mapping in Python

I have the following input:
input = [('dog', 'dog', 'cat', 'mouse'), ('cat', 'ruby', 'python', 'mouse')]
and I am trying to get the following output:
outputlist = [[0, 0, 1, 2], [1, 3, 4, 2]]
outputmapping = {0: 'dog', 1: 'cat', 2: 'mouse', 3: 'ruby', 4: 'python'}
Any tips on how to handle this with scalability in mind? (The input can get really large.)

You probably want something like:
import collections
import itertools
def build_catalog(L):
    # Each unseen name gets the next integer from the counter.
    counter = itertools.count().__next__
    names = collections.defaultdict(counter)
    result = []
    for t in L:
        new_t = [names[item] for item in t]
        result.append(new_t)
    # Invert the name -> index mapping into index -> name.
    catalog = dict((idx, name) for name, idx in names.items())
    return result, catalog
Using it:
>>> input = [('dog', 'dog', 'cat', 'mouse'), ('cat', 'ruby', 'python', 'mouse')]
>>> outputlist, outputmapping = build_catalog(input)
>>> outputlist
[[0, 0, 1, 2], [1, 3, 4, 2]]
>>> outputmapping
{0: 'dog', 1: 'cat', 2: 'mouse', 3: 'ruby', 4: 'python'}

This class will automatically map objects to increasing integer values:
class AutoMapping(object):
    def __init__(self):
        self.map = {}
        self.objects = []
    def __getitem__(self, val):
        if val not in self.map:
            self.map[val] = len(self.objects)
            self.objects.append(val)
        return self.map[val]
Example usage, for your input:
>>> input = [('dog', 'dog', 'cat', 'mouse'), ('cat', 'ruby', 'python', 'mouse')]
>>> map = AutoMapping()
>>> [[map[x] for x in y] for y in input]
[[0, 0, 1, 2], [1, 3, 4, 2]]
>>> map.objects
['dog', 'cat', 'mouse', 'ruby', 'python']
>>> dict(enumerate(map.objects))
{0: 'dog', 1: 'cat', 2: 'mouse', 3: 'ruby', 4: 'python'}

Here is one possible solution, although it isn't the greatest. It could be made slightly more efficient if you know how many elements each entry in the list will have before-hand, by pre-allocating them.
labels = []
label2index = {}
outputlist = []
for group in input:
    current = []
    for label in group:
        if label not in label2index:
            label2index[label] = len(labels)
            labels.append(label)
        current.append(label2index[label])
    outputlist.append(current)
outputmapping = {}
for idx, val in enumerate(labels):
    outputmapping[idx] = val
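For comparison, the same bookkeeping can be folded into dict.setdefault. This is only a sketch, assuming the labels are hashable; encode and label_to_index are illustrative names that do not come from the answers here.

```python
def encode(groups):
    """Map each distinct label to an integer in order of first appearance."""
    label_to_index = {}
    encoded = [
        # setdefault returns the stored index, or stores the current size as a new one.
        [label_to_index.setdefault(label, len(label_to_index)) for label in group]
        for group in groups
    ]
    index_to_label = {index: label for label, index in label_to_index.items()}
    return encoded, index_to_label

data = [('dog', 'dog', 'cat', 'mouse'), ('cat', 'ruby', 'python', 'mouse')]
outputlist, outputmapping = encode(data)
print(outputlist)     # [[0, 0, 1, 2], [1, 3, 4, 2]]
print(outputmapping)  # {0: 'dog', 1: 'cat', 2: 'mouse', 3: 'ruby', 4: 'python'}
```

Each label costs one dictionary lookup, so the whole pass stays linear in the number of items.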

I had the same problem quite often in my projects, so I wrapped up a class some time ago that does exactly this:
class UniqueIdGenerator(object):
    """A dictionary-like class that can be used to assign unique integer IDs to
    names.
    Usage:
    >>> gen = UniqueIdGenerator()
    >>> gen["A"]
    0
    >>> gen["B"]
    1
    >>> gen["C"]
    2
    >>> gen["A"] # Retrieving already existing ID
    0
    >>> len(gen) # Number of already used IDs
    3
    """
    def __init__(self, id_generator=None):
        """Creates a new unique ID generator. `id_generator` specifies how we
        assign new IDs to elements that do not have an ID yet. If it is `None`,
        elements will be assigned integer identifiers starting from 0. If it is
        an integer, elements will be assigned identifiers starting from the given
        integer. If it is an iterator or generator, `next()` will be called on it
        every time a new ID is needed."""
        if id_generator is None:
            id_generator = 0
        if isinstance(id_generator, int):
            import itertools
            self._generator = itertools.count(id_generator)
        else:
            self._generator = id_generator
        self._ids = {}
    def __getitem__(self, item):
        """Retrieves the ID corresponding to `item`. Generates a new ID for `item`
        if it is the first time we request an ID for it."""
        try:
            return self._ids[item]
        except KeyError:
            self._ids[item] = next(self._generator)
            return self._ids[item]
    def __len__(self):
        """Retrieves the number of added elements in this UniqueIdGenerator"""
        return len(self._ids)
    def reverse_dict(self):
        """Returns the reversed mapping, i.e., the one that maps generated IDs to their
        corresponding items"""
        return dict((v, k) for k, v in self._ids.items())
    def values(self):
        """Returns the list of items added so far. Items are ordered according to
        the standard sorting order of their IDs, so the values will be exactly
        in the same order they were added if the ID generator generates IDs in
        ascending order. This holds, for instance, for numeric ID generators that
        assign integers starting from a given number."""
        return sorted(self._ids.keys(), key=self._ids.__getitem__)
Usage example:
>>> input = [('dog', 'dog', 'cat', 'mouse'), ('cat', 'ruby', 'python', 'mouse')]
>>> gen = UniqueIdGenerator()
>>> outputlist = [[gen[x] for x in y] for y in input]
>>> print(outputlist)
[[0, 0, 1, 2], [1, 3, 4, 2]]
>>> outputmapping = gen.reverse_dict()
>>> print(outputmapping)
{0: 'dog', 1: 'cat', 2: 'mouse', 3: 'ruby', 4: 'python'}

Related

How to create a dict from list where values are elements of the list and keys are function of those elements in python?

I have a list of caller_address elements. For each of these addresses I can get a caller_function, a function containing that caller_address. In a single function there may be more than 1 address.
So if I have a list of caller_address elements:
caller_addresses = [1, 2, 3, 4, 5, 6, 7, 8]
For each of them I can get a function:
caller_functions = [getFunctionContaining(addr) for addr in caller_addresses]
print(caller_functions)
# prints(example): ['func1', 'func1', 'func2', 'func2', 'func2', 'func2', 'func3', 'func3']
As a result, I need a dict where the keys are the functions and the values are lists of the addresses those functions contain. In my example it must be:
{'func1': [1, 2], 'func2': [3, 4, 5, 6], 'func3': [7, 8]}
# Means 'func1' contains addresses 1 and 2, 'func2' contains 3, 4, 5 and 6, ...
It would be great if there was a function like:
result = to_dict(lambda addr: getFunctionContaining(addr), caller_addresses)
to get the same result.
Here the first argument is the function for keys and the second argument is the list of values. Is there such a function in the Python standard library?
I could implement it with a for loop and dict[getFunctionContaining(addr)].append(addr), but I'm looking for a more Pythonic way to do this.
Thanks!
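Found a solution using itertools.groupby. Note that groupby only groups consecutive elements with equal keys, so it relies on the input already being ordered by key, as the addresses in my example are.
In my timing, this solution was also slightly faster than a solution using a loop.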
import itertools
import time
def f(v):
    if v < 5:
        return 1
    if v < 7:
        return 2
    return 3
def to_dict(key, list_):
    out = {}
    for el in list_:
        out.setdefault(key(el), []).append(el)
    return out
def to_dict2(key, list_):
    return {k: list(v) for k, v in itertools.groupby(list_, key)}
lst = [1, 2, 3, 4, 5, 6, 7, 8] * 10**4
COUNT = 1000
def timeit(to_dict_f):
    elapsed_sum = 0
    for _ in range(COUNT):
        elapsed_sum -= time.time()
        to_dict_f(f, lst)
        elapsed_sum += time.time()
    return elapsed_sum / COUNT
print('Average time: ', timeit(to_dict), timeit(to_dict2))
Results:
Average time: 0.014930561065673828 0.01346096110343933
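On average, to_dict2 (itertools.groupby) takes slightly less time than to_dict (loop). Note, however, that the benchmark list is not ordered by key, so the two functions do not return the same dictionary here: groupby starts a new group at every key change, and later groups overwrite earlier ones in the dict comprehension.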
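To make the ordering requirement concrete, here is a small sketch comparing a loop-based grouping with a groupby-based one that sorts first. The bucket function is a hypothetical stand-in for getFunctionContaining, and the function names are illustrative only.

```python
from collections import defaultdict
from itertools import groupby

def group_with_loop(key, items):
    """Group items by key(item); works for any input order."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return dict(groups)

def group_with_groupby(key, items):
    """Same grouping via groupby, which needs the items sorted by key first."""
    return {k: list(g) for k, g in groupby(sorted(items, key=key), key)}

def bucket(addr):
    # Hypothetical stand-in for getFunctionContaining().
    return 'func1' if addr < 3 else 'func2' if addr < 7 else 'func3'

addresses = [7, 1, 3, 8, 2, 4, 5, 6]  # deliberately not ordered by key
print(group_with_loop(bucket, addresses))
print(group_with_groupby(bucket, addresses))
```

Both calls produce the same key-to-list mapping; the sort adds O(n log n) work, so the plain loop is usually the simpler choice when the input order is not guaranteed.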

How key in python sorted method can use a list

I am kind of new to Python. I saw the following code and do not understand how a "list" can be used for sorting a string.
lookup = defaultdict(list)
## Filling the lookup
# .....
# .....
inputs = ['abc', 'acb', 'acb'] # a list of strings
result = ''.join(sorted(inputs[0], key=lookup.get))
What I don't understand is the key part of the last line. I know it does a lexicographical sort based on the values in the list. I would appreciate it if someone could explain it or break this step down into a more readable solution.
For example, if the lookup table looks like this:
{'a' : [-3, 0, 0], 'b': [0, -1, -2], 'c': [0, -2, -1]}
then the result will be 'acb'.
The key argument to sorted means "Pretend the value is the result of this function instead of the actual value." So when you sort 'abc' with the lookup table you gave, it does this:
# [1st, 2nd, 3rd] sort order
lookup.get('a') # [ -3, 0, 0]
lookup.get('b') # [ 0, -1, -2]
lookup.get('c') # [ 0, -2, -1]
Then it will figure out the sorted order of the above values. Lists are sorted lexicographically meaning the first element is compared first, just like in a dictionary ("aardvark" comes before "beaver" and also before "ant").
After looking at the first elements (-3, 0, 0) we know 'a' has the smallest value, but we don't know which of 'b' and 'c' is smaller. But as soon as we see the second elements (0, -1, -2), we know that 'c' is smaller, so the final order is 'acb' without ever consulting the third elements (0, -2, -1).
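The comparison steps described above can be checked directly. This is a minimal sketch that uses a plain dict in place of the defaultdict from the question, which does not change what lookup.get returns for these keys.

```python
lookup = {'a': [-3, 0, 0], 'b': [0, -1, -2], 'c': [0, -2, -1]}

# Lists compare element by element, left to right.
print(lookup['a'] < lookup['c'])  # True: -3 < 0 decides at the first element
print(lookup['c'] < lookup['b'])  # True: first elements tie, then -2 < -1

# sorted() calls lookup.get on each character and orders by the returned lists.
result = ''.join(sorted('abc', key=lookup.get))
print(result)  # acb
```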
So, from your example, imagine you have the following:
from collections import defaultdict
lookup = defaultdict(list)
lookup['a'] = [-3, 0, 0]
lookup['b'] = [0, -1, -2]
lookup['c'] = [0, -2, -1]
inputs = ['abc', 'acb', 'acb'] # a list of strings
# note that the key parameter of sorted takes a function
result = ''.join(sorted(
    inputs[0],  # this is the first value, 'abc', of the input list
    key=lookup.get  # passing in the method lookup.get, without calling it
))
The sort function then calls it on each character of the string 'abc':
lookup.get('a') # first
lookup.get('b') # next
lookup.get('c') # next
As for the comparison logic itself: it is built in for most data structures, and you can implement your own for a custom class with __lt__ (less than) and __gt__ (greater than):
class my_int(int):
    def __lt__(a, b):
        return (a % b) % 2 != 0
    def __gt__(a, b):
        return (a % b) % 2 == 0
Suppose you have a list of animals:
>>> animals=['aarvark','zebra','giraffe','bear','dog','cat','badger','ant']
Sorted lexicographically, or in alphabetical order, aardvark is sorted before ant and both before zebra:
>>> sorted(animals)
['aarvark', 'ant', 'badger', 'bear', 'cat', 'dog', 'giraffe', 'zebra']
Now suppose your 10-year-old tells you: "I want all animals that start with 'b' sorted first, then 'z', then alphabetically."
With a key function, this is trivial to accomplish:
>>> lookup=['b','z']
>>> key_func=lambda s: (lookup.index(s[0]),s) if s[0] in lookup else (len(lookup),s)
>>> sorted(animals, key=key_func)
['badger', 'bear', 'zebra', 'aarvark', 'ant', 'cat', 'dog', 'giraffe']
Before the key function was added to Python sorting routines, the common approach to a problem like this was called Decorate, Sort, Undecorate and can be seen here:
>>> ts=sorted([(lookup.index(s[0]),s) if s[0] in lookup else (len(lookup), s) for s in animals])
>>> ts
[(0, 'badger'), (0, 'bear'), (1, 'zebra'), (2, 'aarvark'), (2, 'ant'), (2, 'cat'), (2, 'dog'), (2, 'giraffe')]
>>> [t[1] for t in ts]
['badger', 'bear', 'zebra', 'aarvark', 'ant', 'cat', 'dog', 'giraffe']
(BTW: This example is way easier and faster if you use a dict instead of a list:
>>> lookup={'b':0, 'z':1}
>>> sorted(animals, key=lambda s: (lookup.get(s[0], len(lookup)),s))
['badger', 'bear', 'zebra', 'aarvark', 'ant', 'cat', 'dog', 'giraffe']
That is the right way but your question involved list lookup...)
Key functions allow you to modify how the sort order is interpreted. For another example, consider if you wanted to sort by integers found in the sort strings and then alphabetically.
Here is the list:
>>> nl=['zebra65','ant101','bear5','no num', '2 num first', 's with 1 and 2']
If you just use the default, it comes out ASCIIbetically:
>>> sorted(nl)
['2 num first', 'ant101', 'bear5', 'no num', 's with 1 and 2', 'zebra65']
With a simple regex and key function, you can find all the numbers and form a tuple for sorting by number then the string:
import re
def find_n(s):
    ml = re.findall(r'(\d+)', s)
    if ml:
        return tuple(map(int, ml)) + (s,)
    return (0, s)
>>> sorted(nl, key=find_n)
['no num', 's with 1 and 2', '2 num first', 'bear5', 'zebra65', 'ant101']

Take the mean of values in a list if a duplicate is found

I have 2 lists which are associated with each other. E.g., here, 'John' is associated with '1', 'Bob' is associated with 4, and so on:
l1 = ['John', 'Bob', 'Stew', 'John']
l2 = [1, 4, 7, 3]
My problem is with the duplicate John. Instead of adding the duplicate John, I want to take the mean of the values associated with the Johns, i.e., 1 and 3, which is (3 + 1)/2 = 2. Therefore, I would like the lists to actually be:
l1 = ['John', 'Bob', 'Stew']
l2 = [2, 4, 7]
I have experimented with some solutions including for-loops and the "contains" function, but can't seem to piece it together. I'm not very experienced with Python, but linked lists sound like they could be used for this.
Thank you
I believe you should use a dict. :)
import numpy as np
def mean_duplicate(l1, l2):
    ret = {}
    # Iterating through both lists...
    for name, value in zip(l1, l2):
        if name not in ret:
            # If the key doesn't exist, create it.
            ret[name] = value
        else:
            # If it already does exist, update it.
            ret[name] += value
    # Then for the average you're looking for...
    for key, value in ret.items():
        ret[key] = value / l1.count(key)
    return ret
def median_between_listsElements(l1, l2):
    ret = {}
    for name, value in zip(l1, l2):
        # Creating key + list if doesn't exist.
        if name not in ret:
            ret[name] = []
        ret[name].append(value)
    for key, value in ret.items():
        # float() turns the NumPy scalar into a plain Python float.
        ret[key] = float(np.median(value))
    return ret
l1 = ['John', 'Bob', 'Stew', 'John']
l2 = [1, 4, 7, 3]
print(mean_duplicate(l1, l2))
print(median_between_listsElements(l1, l2))
# {'John': 2.0, 'Bob': 4.0, 'Stew': 7.0}
# {'John': 2.0, 'Bob': 4.0, 'Stew': 7.0}
The following might give you an idea. It uses an OrderedDict assuming that you want the items in the order of appearance from the original list:
from collections import OrderedDict
d = OrderedDict()
for x, y in zip(l1, l2):
    d.setdefault(x, []).append(y)
# OrderedDict([('John', [1, 3]), ('Bob', [4]), ('Stew', [7])])
names, values = zip(*((k, sum(v)/len(v)) for k, v in d.items()))
# ('John', 'Bob', 'Stew')
# (2.0, 4.0, 7.0)
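Here is a shorter version using a dict. Note that it averages the running value with each new one and truncates to int, so it equals the true mean only when a name occurs at most twice and that mean is a whole number:
final_dict = {}
l1 = ['John', 'Bob', 'Stew', 'John']
l2 = [1, 4, 7, 3]
for i in range(len(l1)):
    if final_dict.get(l1[i]) is None:
        final_dict[l1[i]] = l2[i]
    else:
        final_dict[l1[i]] = int((final_dict[l1[i]] + l2[i])/2)
print(final_dict)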
Something like this:
#!/usr/bin/python
l1 = ['John', 'Bob', 'Stew', 'John']
l2 = [1, 4, 7, 3]
d = {}
for i in range(0, len(l1)):
    key = l1[i]
    if key in d:
        d[key].append(l2[i])
    else:
        d[key] = [l2[i]]
r = []
# Iterate over items so each mean is paired with its own key.
for key, values in d.items():
    r.append((key, sum(values)/len(values)))
print(r)
Hope the following code helps:
l1 = ['John', 'Bob', 'Stew', 'John']
l2 = [1, 4, 7, 3]
def remove_repeating_names(names_list, numbers_list):
    new_names_list = []
    new_numbers_list = []
    for first_index, first_name in enumerate(names_list):
        amount_of_occurencies = 1
        number = numbers_list[first_index]
        for second_index, second_name in enumerate(names_list):
            # Check if names match and
            # if this name wasn't read in earlier cycles or is not same element.
            if (second_name == first_name):
                if (first_index < second_index):
                    number += numbers_list[second_index]
                    amount_of_occurencies += 1
                # Break the loop if this name was read earlier.
                elif (first_index > second_index):
                    amount_of_occurencies = -1
                    break
        if amount_of_occurencies != -1:
            new_names_list.append(first_name)
            new_numbers_list.append(number/amount_of_occurencies)
    return [new_names_list, new_numbers_list]
# Unmodified arrays
print(l1)
print(l2)
l1, l2 = remove_repeating_names(l1, l2)
# If you want numbers list to be integer, not float, uncomment following line:
# l2 = [int(number) for number in l2]
# Modified arrays
print(l1)
print(l2)
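A more compact variant of the same idea, offered only as a sketch: collect the values per name in a defaultdict, then average each list with statistics.mean. The function name average_duplicates is illustrative, and relying on first-appearance order assumes Python 3.7+ dict ordering.

```python
from collections import defaultdict
from statistics import mean

def average_duplicates(names, values):
    """Return (unique_names, means), keeping first-appearance order."""
    grouped = defaultdict(list)
    for name, value in zip(names, values):
        grouped[name].append(value)
    unique_names = list(grouped)
    means = [mean(grouped[name]) for name in unique_names]
    return unique_names, means

l1 = ['John', 'Bob', 'Stew', 'John']
l2 = [1, 4, 7, 3]
names, means = average_duplicates(l1, l2)
print(names)  # ['John', 'Bob', 'Stew']
print(means)
```

This is a single pass over the data plus one mean per distinct name, instead of the quadratic scan of a nested loop.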

Adjacency matrix without nested loop

I'm trying to create an adjacency matrix for a set of team players. I've stored the match details like this:
x={'match1':[1,2,3],'match2':[2,3,4],'match3':[3,4,5]}
Here each key has a list of values containing the team players who played in that match.
I'm trying to create an adjacency matrix that shows how many matches each team player played with every other team member.
The output should look like this:
  1 2 3 4 5
1 1 1 1 0 0
2 1 2 2 1 0
3 1 2 3 2 1
4 0 1 2 2 1
5 0 0 1 1 1
(i,i) element in the matrix is the total number of matches played by that team member. I've been able to calculate the values correctly using Counter.
from collections import defaultdict, Counter
if __name__=='__main__':
    x = {'match1': [1, 2, 3], 'match2': [2, 3, 4], 'match3': [3, 4, 5]}
    d = defaultdict(list)
    col_count = dict()
    for key, value in x.items():
        for i in value:
            d[i] += value
    for key, value in d.items():
        col_count[key] = Counter(d[key])
    print(col_count)
The output is:
{1: Counter({1: 1, 2: 1, 3: 1}), 2: Counter({2: 2, 3: 2, 1: 1, 4: 1}), 3: Counter({3: 3, 2: 2, 4: 2, 1: 1, 5: 1}), 4: Counter({3: 2, 4: 2, 2: 1, 5: 1}), 5: Counter({3: 1, 4: 1, 5: 1})}
Given that x dictionary will contain a large number of keys and each key will have a list with many elements, I wish to avoid the use of nested for loops.
Is it possible to store the match details as any other data type so that subsequent computation will not require nested loops?
If a dictionary is the best way, is it possible to calculate the matrix some other way?
Without modifying the input and output formats I do not see how to avoid nested loops as you have information grouped by match and you want to extract it by player. What you could actually do is avoid the last loop by creating the Counter inside the nested loop:
from collections import defaultdict, Counter
if __name__=='__main__':
    x = {'match1': [1, 2, 3], 'match2': [2, 3, 4], 'match3': [3, 4, 5]}
    d = defaultdict(Counter)
    for key, value in x.items():
        for i in value:
            d[i].update(value)
    print(d)
If the input can be modified you could go for something like:
from collections import Counter
if __name__=='__main__':
    x = {1: [{1, 2, 3}], 2: [{1, 2, 3}, {2, 3, 4}], 3: [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}], 4: [{2, 3, 4}, {3, 4, 5}], 5: [{3, 4, 5}]}
    d = {player: Counter([p for match in matches for p in match]) for player, matches in x.items()}
    print(d)
Here you swap the nested loops for a dict comprehension and a list comprehension, which should be more efficient. In practice, players and matches are probably not plain ints and lists of ints, so this could be made a bit more readable. For example:
from collections import defaultdict, Counter
def printMatrix(matrix):
    print(' '.join(['   |'] + list(map(str, matrix.keys()))))
    print('---+-' + '-'*len(matrix)*2)
    for row, values in matrix.items():
        fmt = ' {row} |'
        for col in matrix.keys():
            fmt += ' {values[' + str(col) + ']}'
        print(fmt.format(row=row, values=values))
class Entity:
    def __init__(self):
        self._id = None
    @classmethod
    def register(cls, value):
        if value in cls.ids:
            raise ValueError("The provided ID is already in use")
        cls.ids.add(value)
    @classmethod
    def unregister(cls, value):
        if value is not None:
            cls.ids.remove(value)
    @property
    def id(self):
        return self._id
    @id.setter
    def id(self, value):
        if value == self.id:
            return
        self.register(value)
        self.unregister(self.id)
        self._id = value
class Player(Entity):
    ids = set()
    def __init__(self, pid):
        super().__init__()
        self.id = pid
        self.__matches = set()
    def __repr__(self):
        return 'Player<{}>'.format(self.id)
    @property
    def matches(self):
        return set(self.__matches)
    def inscribe(self, match):
        if match not in self.__matches:
            self.__matches.add(match)
    def delist(self, match):
        self.__matches.remove(match)
class Match(Entity):
    ids = set()
    def __init__(self, mid, players):
        super().__init__()
        self.id = mid
        self.__players = set()
        self.players = players
        for player in players:
            player.inscribe(self)
    def __repr__(self):
        return 'Match<{}>'.format(self.id)
    @property
    def players(self):
        return set(self.__players)
    @players.setter
    def players(self, value):
        for player in self.__players:
            player.delist(self)
        self.__players = set(value)
        for player in self.__players:
            player.inscribe(self)
if __name__=='__main__':
    players = [Player(i) for i in range(1, 6)]
    matches = [Match(i, {players[i-1], players[i], players[i+1]}) for i in range(1, 4)]
    for player in players:
        print(player, player.matches)
    for match in matches:
        print(match, match.players)
    d = {player.id: Counter([p.id for match in player.matches for p in match.players]) for player in players}
    printMatrix(d)
The printMatrix() function is just a helper I made to pretty-print the output into the screen.
The Entity class avoids duplicating code that both the Player and Match classes would otherwise need, as they both have unique IDs. The constructor creates an empty _id attribute. The register() and unregister() methods handle adding and removing IDs from the class attribute ids. The class also declares the id property with its getter and setter. Child classes only need to call super().__init__() in the constructor and create the ids class attribute at the level where ID uniqueness should be enforced, as both Player and Match do.
The Player class additionally has the matches instance read-only property that is populated and depopulated with inscribe() and delist() methods respectively. The Match class has the players property with its getter and setter methods.
First the players and matches are created with two list comprehensions (remember that lists start at position 0, so the Player with ID 1 is at players[0]) and printed with their corresponding relations (matches that they play for players and players that participate for the matches).
As both are keeping a reference to the other type, we can get all the info needed to build the dict of Counters you requested just from players.
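If the original dictionary format is kept, another option is to count co-occurring pairs with a single flat Counter. This is only a sketch, not a way around the underlying work: itertools.product still visits every pair within each match, it just hides the explicit nesting. The names pair_counts and matrix are illustrative.

```python
from collections import Counter
from itertools import product

x = {'match1': [1, 2, 3], 'match2': [2, 3, 4], 'match3': [3, 4, 5]}

# Count every ordered pair (a, b) of players that appear in the same match;
# the (a, a) pairs count the matches that player a took part in.
pair_counts = Counter(
    pair for members in x.values() for pair in product(members, repeat=2)
)

players = sorted({p for members in x.values() for p in members})
matrix = [[pair_counts[(a, b)] for b in players] for a in players]
for row in matrix:
    print(row)
```

Pairs that never share a match are simply missing from the Counter and read back as 0.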

combining sets within a list

Hi so I'm trying to do the following but have gotten a bit stuck. Say I have a list of sets:
A = [set([1,2]), set([3,4]), set([1,6]), set([1,5])]
I want to create a new list which looks like the following:
B = [ set([1,2,5,6]), set([3,4]) ]
i.e create a list of sets with the sets joined if they overlap. This is probably simple but I can't quite get it right this morning.
This also works and is quite short:
import itertools
groups = [{'1', '2'}, {'3', '2'}, {'2', '4'}, {'5', '6'}, {'7', '8'}, {'7','9'}]
while True:
    for s1, s2 in itertools.combinations(groups, 2):
        if s1.intersection(s2):
            break
    else:
        break
    groups.remove(s1)
    groups.remove(s2)
    groups.append(s1.union(s2))
groups
This gives the following output:
[{'5', '6'}, {'1', '2', '3', '4'}, {'7', '8', '9'}]
The while True does seem a bit dangerous to me. Any thoughts, anyone?
How about:
from collections import defaultdict
def sortOverlap(listOfTuples):
    # The locations of the values
    locations = defaultdict(lambda: [])
    # 'Sorted' list to return
    sortedList = []
    # For each tuple in the original list
    for i, a in enumerate(listOfTuples):
        for k, element in enumerate(a):
            locations[element].append(i)
    # Now construct the sorted list
    coveredElements = set()
    for element, tupleIndices in locations.items():
        # If we've seen this element already then skip it
        if element in coveredElements:
            continue
        # Combine the lists
        temp = []
        for index in tupleIndices:
            temp += listOfTuples[index]
        # Add to the list of sorted tuples
        sortedList.append(list(set(temp)))
        # Record that we've covered this element
        for element in sortedList[-1]:
            coveredElements.add(element)
    return sortedList
# Run the example (with tuples)
print(sortOverlap([(1,2), (3,4), (1,5), (1,6)]))
# Run the example (with sets)
print(sortOverlap([set([1,2]), set([3,4]), set([1,5]), set([1,6])]))
You could use intersection() and union() in for loops:
A = [set([1,2]), set([3,4]), set([1,6]), set([1,5])]
intersecting = []
for someSet in A:
for anotherSet in A:
if someSet.intersection(anotherSet) and someSet != anotherSet:
intersecting.append(someSet.union(anotherSet))
A.pop(A.index(anotherSet))
A.pop(A.index(someSet))
finalSet = set([])
for someSet in intersecting:
finalSet = finalSet.union(someSet)
A.append(finalSet)
print A
Output: [set([3, 4]), set([1, 2, 5, 6])]
A slightly more straightforward solution,
def overlaps(sets):
    overlapping = []
    for a in sets:
        match = False
        for b in overlapping:
            if a.intersection(b):
                b.update(a)
                match = True
                break
        if not match:
            overlapping.append(a)
    return overlapping
Examples:
>>> overlaps([set([1,2]), set([1,3]), set([1,6]), set([3,5])])
[{1, 2, 3, 5, 6}]
>>> overlaps([set([1,2]), set([3,4]), set([1,6]), set([1,5])])
[{1, 2, 5, 6}, {3, 4}]
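One case a single pass like this does not cover is a later set that bridges two groups that were already kept separate, e.g. [{1, 2}, {3, 4}, {2, 3}]. A different technique, union-find (disjoint sets), handles such chains. The following is a sketch under the assumption that the elements are hashable; merge_overlapping is an illustrative name, not part of any answer here.

```python
def merge_overlapping(sets):
    """Merge sets that share elements, including chains of overlaps."""
    parent = {}

    def find(item):
        # Follow parent links to the representative, halving the path as we go.
        parent.setdefault(item, item)
        while parent[item] != item:
            parent[item] = parent[parent[item]]
            item = parent[item]
        return item

    # Link every element of a set to that set's first element.
    for group in sets:
        members = list(group)
        if members:
            find(members[0])  # registers single-element sets too
        for other in members[1:]:
            parent[find(other)] = find(members[0])

    # Collect elements under their final representative.
    merged = {}
    for item in parent:
        merged.setdefault(find(item), set()).add(item)
    return list(merged.values())

print(merge_overlapping([{1, 2}, {3, 4}, {2, 3}]))  # [{1, 2, 3, 4}]
```

Unlike the in-place update above, this builds new sets and leaves the input sets untouched.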
B = []
for set_ in A:
    new_set = set(set_)
    for other_set in A:
        if other_set == new_set:
            continue
        for item in other_set:
            if item in set_:
                new_set = new_set.union(other_set)
                break
    if new_set not in B:
        B.append(new_set)
Input/Output:
A = [set([1,2]), set([3,4]), set([2,3]) ]
B = [set([1, 2, 3]), set([2, 3, 4]), set([1, 2, 3, 4])]
A = [set([1,2]), set([3,4]), set([1,6]), set([1,5])]
B = [set([1, 2, 5, 6]), set([3, 4])]
A = [set([1,2]), set([1,3]), set([1,6]), set([3,5])]
B = [set([1, 2, 3, 6]), set([1, 2, 3, 5, 6]), set([1, 3, 5])]
This function will do the job, without touching the input:
from copy import deepcopy
def remove_overlapped(input_list):
    input = deepcopy(input_list)
    output = []
    index = 1
    while input:
        head = input[0]
        try:
            next_item = input[index]
        except IndexError:
            output.append(head)
            input.remove(head)
            index = 1
            continue
        if head & next_item:
            head.update(next_item)
            input.remove(next_item)
            index = 1
        else:
            index += 1
    return output
Here is a function that does what you want. Probably not the most pythonic one but does the job, most likely can be improved a lot.
A = [set([1,2]), set([3,4]), set([2,3])]
merges = any(a & b for a in A for b in A if a != b)
while merges:
    B = [A[0]]
    for a in A[1:]:
        merged = False
        for i, b in enumerate(B):
            if a & b:
                B[i] = b | a
                merged = True
                break
        if not merged:
            B.append(a)
    A = B
    merges = any(a & b for a in A for b in A if a != b)
print(A)  # A is the last B; unlike B, it is also defined if nothing needed merging
What is happening there is the following: we loop over all the sets in A (except the first, since we already added it to B). We check the intersection with all the sets in B; if the intersection is anything but falsy (i.e., not the empty set), we perform a union on the sets and start the next iteration. For set operations, see this page:
https://docs.python.org/2/library/sets.html
& is the intersection operator
| is the union operator
You can probably make this more Pythonic using any() etc., but that would have required more processing, so I avoided it.
