So I currently have a nested list.
org_network=[[1, 2, 3], [1, 4, 5], [1, 3, 6], [7, 9, 10]]
I need to figure out how to manipulate it to create lists of possible combinations of the nested lists. These combinations cannot have lists that share numbers. Here is an example of what the result should be:
network_1=[[1,2,3],[7,9,10]]
network_2=[[1,4,5],[7,9,10]]
network_3=[[1,3,6],[7,9,10]]
Note:
1. This code is going to be linked to a constantly updated csv file, so the org_network list will have a varying number of elements (which also means that there will be numerous resulting networks).
I have been working on this for about four hours and have yet to figure it out. Any help would be very appreciated. I have primarily been trying to use for loops and any() functions to no avail. Thanks for any help.
You can use itertools.combinations() with set intersection:
>>> from itertools import combinations
>>> org_network=[[1, 2, 3], [1, 4, 5], [1, 3, 6], [7, 9, 10]]
>>> [[x, y] for x, y in combinations(org_network, r=2) if not set(x).intersection(y)]
[[[1, 2, 3], [7, 9, 10]], [[1, 4, 5], [7, 9, 10]], [[1, 3, 6], [7, 9, 10]]]
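Since org_network is rebuilt from a changing csv file, it may be convenient to wrap the comprehension in a function. This is only a sketch: the name disjoint_pairs is made up for illustration, and it uses set.isdisjoint() instead of intersection() so that no intermediate set is built.

```python
from itertools import combinations

def disjoint_pairs(network):
    """Return every pair of sublists that share no element (hypothetical helper)."""
    return [[x, y] for x, y in combinations(network, 2)
            if set(x).isdisjoint(y)]

org_network = [[1, 2, 3], [1, 4, 5], [1, 3, 6], [7, 9, 10]]
pairs = disjoint_pairs(org_network)
print(pairs)
```

Calling the function again after each reload of the csv data would give the pairs for the current contents.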
Here is an approach that will be efficient if the number of unique elements is small relative to the number of sets.
Steps:
For each unique element, store indices of all sets in which the element does not occur.
For each set s in the network, find all other sets that contain no element of s, using the data from the first step.
Iterate over pairs, discarding duplicates based on ID order.
from functools import reduce

org_network = [[1, 2, 3], [1, 4, 5], [1, 3, 6], [7, 9, 10]]

# convert to sets
sets = [set(lst) for lst in org_network]

# all unique numbers
uniqs = set().union(*sets)

# map each unique number to sets that do not contain it:
other = {x: {i for i, s in enumerate(sets) if x not in s} for x in uniqs}

# iterate over sets:
for i, s in enumerate(sets):
    # find all sets not overlapping with i
    no_overlap = reduce(lambda l, r: l.intersection(r), (other[x] for x in s))
    # iterate over non-overlapping sets
    for j in no_overlap:
        # discard duplicates
        if j <= i:
            continue
        print([org_network[i], org_network[j]])

# result
# [[1, 2, 3], [7, 9, 10]]
# [[1, 4, 5], [7, 9, 10]]
# [[1, 3, 6], [7, 9, 10]]
Edit: If combinations of size greater than two are required, it is possible to modify the above approach. Here is an extension that uses depth-first search to traverse all pairwise disjoint combinations.
def not_overlapping(set_ids):
    candidates = reduce(
        lambda l, r: l.intersection(r), (other[x] for sid in set_ids for x in sets[sid])
    )
    mid = max(set_ids)
    return {c for c in candidates if c > mid}

# this will produce "combinations" consisting of a single element
def iter_combinations():
    combs = [[i] for i in range(len(sets))]
    while combs:
        comb = combs.pop()
        extension = not_overlapping(comb)
        combs.extend(comb + [e] for e in extension)
        yield [org_network[i] for i in comb]

def iter_combinations_long():
    for comb in iter_combinations():
        if len(comb) > 1:
            yield comb

all_combs = list(iter_combinations_long())
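The functions above rely on the module-level names sets and other. For a list that is rebuilt regularly, a self-contained variant may be easier to reuse. The following is a sketch under two assumptions: the wrapper name disjoint_combinations and its min_size parameter are invented here, and every sublist is assumed to be non-empty (an empty sublist would leave reduce() with nothing to combine).

```python
from functools import reduce

def disjoint_combinations(network, min_size=2):
    """Yield every combination of pairwise disjoint sublists (hypothetical wrapper)."""
    sets = [set(lst) for lst in network]
    uniqs = set().union(*sets)
    # element -> indices of the sets that do NOT contain it
    other = {x: {i for i, s in enumerate(sets) if x not in s} for x in uniqs}

    def extensions(comb):
        # indices of sets that are disjoint from every set already in comb
        candidates = reduce(set.intersection,
                            (other[x] for i in comb for x in sets[i]))
        # comb is kept in increasing index order, so comb[-1] is its maximum;
        # extending only with larger indices yields each combination once
        return sorted(c for c in candidates if c > comb[-1])

    stack = [[i] for i in range(len(sets))]
    while stack:
        comb = stack.pop()
        stack.extend(comb + [e] for e in extensions(comb))
        if len(comb) >= min_size:
            yield [network[i] for i in comb]

for comb in disjoint_combinations([[1, 2, 3], [1, 4, 5], [1, 3, 6], [7, 9, 10]]):
    print(comb)
```

Because it is a generator, the combinations are produced lazily, which matters when the number of results grows quickly.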
There are two lists, which respectively represent two results of the clustering algorithm, such as com1 = [[1,2,3,4], [5, 6, 7, 8], [9]], where com1 represents a clustering result, [1,2,3,4] represents nodes 1,2,3,4 belong to the same class. [5,6,7,8] indicates that nodes 5,6,7,8 belong to the same class, and 9 belongs to a separate class. com2 = [[1, 2, 4], [3], [5, 6, 7, 8], [9]]. Now, I need to find out the common parts between com1 and com2, such as [1,2,4],[5,6,7,8],[9].
Is there an efficient way to solve this problem?
Assuming that a given value can only occur in one sublist in com1 and in one sublist in com2, we can observe the following:
Two values will belong to the same sublist in the result when they belong to the same sublist in com1 and also in the same sublist in com2.
So we could collect for each value the two indices of the sublists they belong to: one index that identifies the sublist in com1, and another that identifies the sublist in com2.
We can use those pairs as keys that uniquely identify a target sublist, and populate those sublists accordingly:
from collections import defaultdict

def combine(com1, com2):
    d = defaultdict(list)
    for com in com1, com2:
        for i, lst in enumerate(com):
            for val in lst:
                d[val].append(i)
    res = defaultdict(list)
    for val, key in d.items():
        res[tuple(key)].append(val)
    return list(res.values())

# Example 1
com1 = [[1,2,3,4], [5, 6, 7, 8], [9]]
com2 = [[1, 2, 4], [3], [5, 6, 7, 8], [9]]
print(combine(com1, com2)) # [[1, 2, 4], [3], [5, 6, 7, 8], [9]]

# Example 2
com1 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
com2 = [[1, 9], [2, 3, 4], [5, 6, 7, 8], [9]]
print(combine(com1, com2)) # [[1], [2, 3], [4], [5, 6], [7, 8], [9]]
If we assume an amortised constant time complexity for dictionary get/set actions, then this brings the total time complexity to O(n), where n represents the number of values in the list that gets partitioned.
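For comparison, a brute-force version that intersects every sublist of com1 with every sublist of com2 is easy to write, and it can serve as a cross-check on small inputs. It is only a sketch (the name combine_bruteforce is made up) and it does work proportional to the number of sublist pairs, so it is not a replacement for the dictionary approach.

```python
def combine_bruteforce(com1, com2):
    """Intersect every cluster of com1 with every cluster of com2 (illustrative only)."""
    sets2 = [set(b) for b in com2]
    result = []
    for a in com1:
        for sb in sets2:
            # keep the order in which the values appear in com1
            common = [v for v in a if v in sb]
            if common:
                result.append(common)
    return result

com1 = [[1, 2, 3, 4], [5, 6, 7, 8], [9]]
com2 = [[1, 2, 4], [3], [5, 6, 7, 8], [9]]
print(combine_bruteforce(com1, com2))
```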
How can I compare a list of lists with itself in python in order to:
identify identical sublists with the same items (not necessarily in the same item order)
delete these duplicate sublists
Example:
list = [ [1, 3, 5, 6], [7, 8], [10, 12], [9], [3, 1, 5, 6], [12, 10] ]
clean_list = [ [1, 3, 5, 6], [7, 8], [10, 12], [9] ]
Any help is greatly appreciated.
I can't seem to figure this out.
I would rebuild the "clean_list" in a list comprehension, checking that the sorted version of the sublist isn't already in the previous elements
the_list = [ [1, 3, 5, 6], [7, 8], [10, 12], [9], [3, 1, 5, 6], [12, 10] ]
clean_list = [l for i,l in enumerate(the_list) if all(sorted(l)!=sorted(the_list[j]) for j in range(0,i))]
print(clean_list)
Of course, sorting the items on every iteration is time-consuming, so you could prepare a sorted list of sublists:
the_sorted_list = [sorted(l) for l in the_list]
and use it:
clean_list = [the_list[i] for i,l in enumerate(the_sorted_list) if all(l!=the_sorted_list[j] for j in range(0,i))]
result (in both cases):
[[1, 3, 5, 6], [7, 8], [10, 12], [9]]
As many suggested, a simple for loop (no list comprehension there) that stores the already seen items in a set would probably be faster for the duplicate lookup. That alternative could be necessary if the input list is really big, to avoid the O(n) scan that all() performs for every sublist.
An example of implementation could be:
test_set = set()
clean_list = []
for l in the_list:
    sl = sorted(l)
    tsl = tuple(sl)  # lists are not hashable, tuples are
    if tsl not in test_set:
        test_set.add(tsl) # note it down to avoid inserting it next time
        clean_list.append(sl)
Create a set. Then for each list in the list, sort it, transform into tuple, then insert into set.
setOfLists = set()
for lst in listOfLists:
    lst.sort()
    setOfLists.add(tuple(lst))
print(setOfLists)
You can retransform the tuples in the set into lists again.
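A minimal sketch of that last conversion step, using the example data from the question (the variable names are illustrative). Note that a set has no order, so the original order of the sublists is lost; sorted() is used here only to make the output reproducible.

```python
list_of_lists = [[1, 3, 5, 6], [7, 8], [10, 12], [9], [3, 1, 5, 6], [12, 10]]

# sorted tuples are hashable, so duplicates collapse inside the set
unique = {tuple(sorted(sub)) for sub in list_of_lists}

# turn the tuples back into lists
clean_list = sorted(list(t) for t in unique)
print(clean_list)
```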
Simple for loops will work, but if your dataset is small (e.g. 1k sublists or fewer), you can use this, where a is the input list:
b = []
[b.append(i) for i in a if len([j for j in b if set(j) == set(i)])==0 ]
print(b)
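The comprehension above compares each sublist against everything kept so far, which is quadratic. If the same set-equality notion of "duplicate" is acceptable, a frozenset can be used as a hashable key instead. This is a sketch with a made-up function name; like set(j) == set(i), it ignores both order and repeated items, so [1, 1, 2] and [1, 2] count as duplicates.

```python
def dedupe_unordered(lists):
    """Keep the first occurrence of each sublist, comparing sublists as sets."""
    seen = set()
    result = []
    for sub in lists:
        key = frozenset(sub)  # hashable, order-insensitive key
        if key not in seen:
            seen.add(key)
            result.append(sub)
    return result

a = [[1, 3, 5, 6], [7, 8], [10, 12], [9], [3, 1, 5, 6], [12, 10]]
print(dedupe_unordered(a))
```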
So here's my take on this.
I define a function that sorts each sublist and appends it to a temporary list. Then I check whether each sublist in temp_my_list is in temp_clean_list, and if it is not, I append it to a new list. This should work for any two lists of lists. I added some extra sublists to show a result other than an empty list.
my_list = [[1, 3, 5, 6], [7, 8], [10, 12], [9], [3, 1, 5, 6], [12, 10],[16]]
clean_list = [ [1, 3, 5, 6], [7, 8], [10, 12], [9],[18]]
new_list = []

def getNewList():
    temp_my_list = []
    temp_clean_list = []
    for sublist in my_list:
        sublist.sort()
        temp_my_list.append(sublist)
    for sublist in clean_list:
        sublist.sort()
        temp_clean_list.append(sublist)
    for sublist in temp_my_list:
        if sublist not in temp_clean_list:
            new_list.append(sublist)

getNewList()
print (new_list)
Result:
[[16]]
Suppose I have this list:
newlis = [[3, 6, 4, 10], [1, 9, 2, 5], [0, 7, 8]]
I want to sort it in a way that each list is sorted. For instance:
newlis = [[3, 4, 6, 10], [1, 2, 5, 9], [0, 7, 8]]
I tried to write this code:
for i in range(len(newlis)):
    if j in newlis[i] < newlis[i+1]:
        newlis[i],newlis[i+1]=newlis[i+1],newlis[i]
print newlis
It's not working though. Can someone please help me out? Built-in functions are not allowed.
There are many things wrong here (among which is that this sounds like a homework question and we aren't supposed to respond to those) but I will give you some helpful advice:
You are comparing element J in list I to list I + 1.
You would want to compare element J in list I to element J + 1 in list I.
Also, you appear to be attempting to sort backwards. You will end up with large left and small right.
Also this is not a sorting algorithm. What happens when you have an array like
[3,6,4,10] => [6,4,10,3]
which is still not ordered, at all. Sorting algorithms are simple, but not that simple. I recommend looking them up.
In if j in newlis[i] < newlis[i+1]:, you are comparing sublists and not the elements of the sublists themselves. You need two loops, one for iterating over newlis, and one for sorting the elements of each sublist of newlis.
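A sample using Bubble Sort (the pass over each sublist is repeated, because a single pass only guarantees that the largest value reaches the end):
>>> newlis = [[3, 6, 4, 10], [1, 9, 2, 5], [0, 7, 8]]
>>> for sublist in newlis:
...     # after each pass the largest remaining value is in place at index `end`
...     for end in range(len(sublist) - 1, 0, -1):
...         for i in range(end):
...             if sublist[i] > sublist[i + 1]:
...                 sublist[i], sublist[i + 1] = sublist[i + 1], sublist[i]
...
>>> print(newlis)
[[3, 4, 6, 10], [1, 2, 5, 9], [0, 7, 8]]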
Links about Bubble Sort:
http://www-ee.eng.hawaii.edu/~tep/EE160/Book/chap10/subsection2.1.2.2.html
http://www.go4expert.com/articles/bubble-sort-algorithm-absolute-beginners-t27883/
I want to merge two arrays in python based on the first element in each column of each array.
For example,
A = ([[1, 2, 3],
      [4, 5, 6],
      [4, 6, 7],
      [5, 7, 8],
      [5, 9, 1]])
B = ([[1, .002],
      [4, .005],
      [5, .006]])
So that I get an array
C = ([[1, 2, 3, .002],
      [4, 5, 6, .005],
      [4, 6, 7, .005],
      [5, 7, 8, .006],
      [5, 9, 1, .006]])
For more clarity:
First column in A is 1, 4, 4, 5, 5 and
First column of B is 1, 4, 5
So that 1 in A matches up with 1 in B and gets .002
How would I do this in python? Any suggestions would be great.
Is it OK to modify A in place?
d = dict((x[0],x[1:]) for x in B)
Now d is a dictionary where the first column are keys and the subsequent columns are values.
for lst in A:
    if lst[0] in d: #Is the first value something that we can extend?
        lst.extend(d[lst[0]])
print(A)
To do it out of place (inspired by the answer by Ashwini):
d = dict((x[0],x[1:]) for x in B)
C = [lst + d.get(lst[0],[]) for lst in A]
However, with this approach, you need to have lists in both A and B. If you have some lists and some tuples it'll fail (although it could be worked around if you needed to), but it will complicate the code slightly.
With either of these answers, B can have an arbitrary number of columns.
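One possible workaround for mixed lists and tuples is to convert every row to a list before concatenating. This is a sketch only; the mixed sample data below is invented to illustrate the failure case and is not from the question.

```python
# hypothetical input where some rows are tuples and some are lists
A = [[1, 2, 3], (4, 5, 6), [4, 6, 7], (5, 7, 8), [5, 9, 1]]
B = [(1, .002), [4, .005], (5, .006)]

# store the extra columns as lists so that list + list always works
d = {row[0]: list(row[1:]) for row in B}

# list(row) copies the row, so A itself is left unchanged
C = [list(row) + d.get(row[0], []) for row in A]
print(C)
```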
As a side note on style: I would write the lists as:
A = [[1, 2, 3],
     [4, 5, 6],
     [4, 6, 7],
     [5, 7, 8],
     [5, 9, 1]]
Where I've dropped the parentheses ... They make it look too much like you're putting a list in a tuple. Python's automatic line continuation happens with parentheses (), square brackets [] or braces {}.
(This answer assumes these are just regular lists. If they’re NumPy arrays, you have more options.)
It looks like you want to use B as a lookup table to find values to add to each row of A.
I would start by making a dictionary out of the data in B. As it happens, B is already in just the right form to be passed to the dict() builtin:
B_dict = dict(B)
Then you just need to build C row by row.
For each row in A, row[0] is the first element, so B_dict[row[0]] is the value you want to add to the end of the row. Therefore row + [B_dict[row[0]]] is the row you want to add to C.
Here is a list comprehension that builds C from A and B_dict.
C = [row + [B_dict[row[0]]] for row in A]
You can convert B to a dictionary first, with the first element of each sublist as key and second one as value.
Then simply iterate over A and append the related value fetched from the dict.
In [114]: A = ([1, 2, 3],
               [4, 5, 6],
               [4, 6, 7],
               [5, 7, 8],
               [6, 9, 1])
In [115]: B = ([1, .002],
               [4, .005],
               [5, .006])
In [116]: dic = dict(B)
In [117]: [x + [dic[x[0]]] if x[0] in dic else x for x in A]
Out[117]:
[[1, 2, 3, 0.002],
 [4, 5, 6, 0.005],
 [4, 6, 7, 0.005],
 [5, 7, 8, 0.006],
 [6, 9, 1]]
Here is a solution using itertools.product() that prevents having to create a dictionary for B:
In [1]: from itertools import product
In [2]: [lst_a + lst_b[1:] for (lst_a, lst_b) in product(A, B) if lst_a[0] == lst_b[0]]
Out[2]:
[[1, 2, 3, 0.002],
[4, 5, 6, 0.005],
[4, 6, 7, 0.005],
[5, 7, 8, 0.006],
[5, 9, 1, 0.006]]
The naive, simple way:
for alist in A:
    for blist in B:
        if blist[0] == alist[0]:
            alist.extend(blist[1:])
            # alist.append(blist[1]) if B will only ever contain 2-tuples.
            break # Remove this if you want to append more than one.
The downside here is that it's O(N^2) complexity. For most small data sets, that should be OK. If you're looking for something more comprehensive, you'll probably want to look at @mgilson's answer. Some comparison:
His response converts everything in B to a dict and performs list slicing on each element. If you have a lot of values in B, that could be expensive. This uses the existing lists (you're only looking at the first value, anyway).
Because he's using dicts, he gets O(1) lookup times (his answer also assumes that you're never going to append multiple values to the end of the values in A). That means overall, his algorithm will achieve O(N). You'll need to weigh whether the overhead of creating a dict is going to outweigh the iteration over the values in B.
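If several rows of B can share a key and all of their values should be appended, the dictionary approach can still be used by collecting the values per key first. The following is a sketch; the function name merge_all_matches and the sample data are made up for illustration.

```python
from collections import defaultdict

def merge_all_matches(A, B):
    """Append the extra columns of every matching row of B to each row of A."""
    extras = defaultdict(list)
    for row in B:
        # rows of B with the same key accumulate their extra columns
        extras[row[0]].extend(row[1:])
    # .get() avoids creating empty entries for keys that only occur in A
    return [list(row) + extras.get(row[0], []) for row in A]

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[1, .002], [4, .005], [4, .007]]
print(merge_all_matches(A, B))
```

This makes one pass over B and one pass over A, so it keeps the linear behaviour of the dict-based answer while covering the "append more than one" case of the nested loops.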
In a project I am currently working on I have implemented about 80% of what I want my program to do and I am very happy with the results.
In the remaining 20% I am faced with a problem which puzzles me a bit on how to solve.
Here it is:
I have come up with a list of lists which contain several numbers (arbitrary length)
For example:
listElement[0] = [1, 2, 3]
listElement[1] = [3, 6, 8]
listElement[2] = [4, 9]
listElement[4] = [6, 11]
listElement[n] = [x, y, z...]
where n could reach up to 40,000 or so.
Assuming each list element is a set of numbers (in the mathematical sense), what I would like to do is to derive all the combinations of mutually exclusive sets; that is, like the powerset of the above list elements, but with all non-disjoint-set elements excluded.
So, to continue the example with n=4, I would like to come up with a list that has the following combinations:
newlistElement[0] = [1, 2, 3]
newlistElement[1] = [3, 6, 8]
newlistElement[2] = [4, 9]
newlistElement[4] = [6, 11]
newlistElement[5] = [[1, 2, 3], [4, 9]]
newlistElement[6] = [[1, 2, 3], [6, 11]]
newlistElement[7] = [[1, 2, 3], [4, 9], [6, 11]]
newlistElement[8] = [[3, 6, 8], [4, 9]]
newlistElement[9] = [[4, 9], [6, 11]]
An invalid case, for example would be combination [[1, 2, 3], [3, 6, 8]] because 3 is common in two elements.
Is there any elegant way to do this? I would be extremely grateful for any feedback.
I must also specify that I would not like to do the powerset function, because the initial list could have quite a large number of elements (as I said n could go up to 40000), and taking the powerset with so many elements would never finish.
I'd use a generator:
import itertools

def comb(seq):
    # len(seq) + 1 so that the combination of all elements is also tried
    for n in range(1, len(seq) + 1):
        for c in itertools.combinations(seq, n): # all combinations of length n
            if len(set.union(*map(set, c))) == sum(len(s) for s in c): # pairwise disjoint?
                yield list(c)

for c in comb([[1, 2, 3], [3, 6, 8], [4, 9], [6, 11]]):
    print(c)
This produces:
[[1, 2, 3]]
[[3, 6, 8]]
[[4, 9]]
[[6, 11]]
[[1, 2, 3], [4, 9]]
[[1, 2, 3], [6, 11]]
[[3, 6, 8], [4, 9]]
[[4, 9], [6, 11]]
[[1, 2, 3], [4, 9], [6, 11]]
If you need to store the results in a single list:
print(list(comb([[1, 2, 3], [3, 6, 8], [4, 9], [6, 11]])))
The following is a recursive generator:
def comb(input, lst = [], lset = set()):
    if lst:
        yield lst
    for i, el in enumerate(input):
        if lset.isdisjoint(el):
            for out in comb(input[i+1:], lst + [el], lset | set(el)):
                yield out

for c in comb([[1, 2, 3], [3, 6, 8], [4, 9], [6, 11]]):
    print(c)
This is likely to be a lot more efficient than the other solutions in situations where a lot of sets have common elements (of course in the worst case it still has to iterate over the 2**n elements of the powerset).
The method used in the program below is similar to a couple of previous answers in excluding not-disjoint sets and therefore usually not testing all combinations. It differs from previous answers by greedily excluding all the sets it can, as early as it can. This allows it to run several times faster than NPE's solution. Here is a time comparison of the two methods, using input data with 200, 400, ... 1000 size-6 sets having elements in the range 0 to 20:
Set size = 6, Number max = 20 NPE method
0.042s Sizes: [200, 1534, 67]
0.281s Sizes: [400, 6257, 618]
0.890s Sizes: [600, 13908, 2043]
2.097s Sizes: [800, 24589, 4620]
4.387s Sizes: [1000, 39035, 9689]
Set size = 6, Number max = 20 jwpat7 method
0.041s Sizes: [200, 1534, 67]
0.077s Sizes: [400, 6257, 618]
0.167s Sizes: [600, 13908, 2043]
0.330s Sizes: [800, 24589, 4620]
0.590s Sizes: [1000, 39035, 9689]
In the above data, the left column shows execution time in seconds. The lists of numbers show how many single, double, or triple unions occurred. Constants in the program specify data set sizes and characteristics.
#!/usr/bin/python
from random import sample, seed
import time
nsets, ndelta, ncount, setsize = 200, 200, 5, 6
topnum, ranSeed, shoSets, shoUnion = 20, 1234, 0, 0
seed(ranSeed)
print 'Set size = {:3d}, Number max = {:3d}'.format(setsize, topnum)

for casenumber in range(ncount):
    t0 = time.time()
    sets, sizes, ssum = [], [0]*nsets, [0]*(nsets+1)
    for i in range(nsets):
        sets.append(set(sample(xrange(topnum), setsize)))

    if shoSets:
        print 'sets = {}, setSize = {}, top# = {}, seed = {}'.format(
            nsets, setsize, topnum, ranSeed)
        print 'Sets:'
        for s in sets: print s

    # Method by jwpat7
    def accrue(u, bset, csets):
        for i, c in enumerate(csets):
            y = u + [c]
            yield y
            boc = bset|c
            ts = [s for s in csets[i+1:] if boc.isdisjoint(s)]
            for v in accrue (y, boc, ts):
                yield v

    # Method by NPE
    def comb(input, lst = [], lset = set()):
        if lst:
            yield lst
        for i, el in enumerate(input):
            if lset.isdisjoint(el):
                for out in comb(input[i+1:], lst + [el], lset | set(el)):
                    yield out

    # Uncomment one of the following 2 lines to select method
    #for u in comb (sets):
    for u in accrue ([], set(), sets):
        sizes[len(u)-1] += 1
        if shoUnion: print u
    t1 = time.time()
    for t in range(nsets-1, -1, -1):
        ssum[t] = sizes[t] + ssum[t+1]
    print '{:7.3f}s Sizes:'.format(t1-t0), [s for (s,t) in zip(sizes, ssum) if t>0]
    nsets += ndelta
Edit: In function accrue, arguments (u, bset, csets) are used as follows:
• u = list of sets in current union of sets
• bset = "big set" = flat value of u = elements already used
• csets = candidate sets = list of sets eligible to be included
Note that if the first line of accrue is replaced by
def accrue(csets, u=[], bset=set()):
and the seventh line by
for v in accrue (ts, y, boc):
(i.e., if the parameters are reordered and defaults are given for u and bset) then accrue can be invoked via list(accrue(listofsets)) to produce its list of compatible unions.
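Put together, the reordered version might look like the sketch below. It assumes that every element of listofsets is already a set (bset|c requires that), and the sample data is taken from the question. The default arguments are never mutated, so sharing them between calls is harmless here.

```python
def accrue(csets, u=[], bset=set()):
    # u     = list of sets in the current union
    # bset  = elements already used by the sets in u
    # csets = candidate sets that are still disjoint from bset
    for i, c in enumerate(csets):
        y = u + [c]
        yield y
        boc = bset | c
        ts = [s for s in csets[i+1:] if boc.isdisjoint(s)]
        for v in accrue(ts, y, boc):
            yield v

listofsets = [set(s) for s in [[1, 2, 3], [3, 6, 8], [4, 9], [6, 11]]]
unions = list(accrue(listofsets))
print(len(unions))
```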
Regarding the ValueError: zero length field name in format error mentioned in a comment as occurring when using Python 2.6, try the following.
# change:
print "Set size = {:3d}, Number max = {:3d}".format(setsize, topnum)
# to:
print "Set size = {0:3d}, Number max = {1:3d}".format(setsize, topnum)
Similar changes (adding appropriate field numbers) may be needed in other formats in the program. Note, the what's new in 2.6 page says “Support for the str.format() method has been backported to Python 2.6”. While it does not say whether field names or numbers are required, it does not show examples without them. By contrast, either way works in 2.7.3.
Using itertools.combinations, set intersection and a for-else loop:
from itertools import combinations

lis=[[1, 2, 3], [3, 6, 8], [4, 9], [6, 11]]

def func(lis):
    for i in range(1,len(lis)+1):
        for x in combinations(lis,i):
            s=set(x[0])
            for y in x[1:]:
                if len(s & set(y)) != 0:
                    break
                else:
                    s.update(y)
            else:
                # reached only if the inner loop never hit break
                yield x

for item in func(lis):
    print(item)
Output:
([1, 2, 3],)
([3, 6, 8],)
([4, 9],)
([6, 11],)
([1, 2, 3], [4, 9])
([1, 2, 3], [6, 11])
([3, 6, 8], [4, 9])
([4, 9], [6, 11])
([1, 2, 3], [4, 9], [6, 11])
Similar to NPE's solution, but it's without recursion and it returns a list:
def disjoint_combinations(seqs):
    disjoint = []
    for seq in seqs:
        disjoint.extend([(each + [seq], items.union(seq))
                         for each, items in disjoint
                         if items.isdisjoint(seq)])
        disjoint.append(([seq], set(seq)))
    return [each for each, _ in disjoint]

for each in disjoint_combinations([[1, 2, 3], [3, 6, 8], [4, 9], [6, 11]]):
    print(each)
Result:
[[1, 2, 3]]
[[3, 6, 8]]
[[1, 2, 3], [4, 9]]
[[3, 6, 8], [4, 9]]
[[4, 9]]
[[1, 2, 3], [6, 11]]
[[1, 2, 3], [4, 9], [6, 11]]
[[4, 9], [6, 11]]
[[6, 11]]
One-liner without employing the itertools package.
Here's your data:
lE={}
lE[0]=[1, 2, 3]
lE[1] = [3, 6, 8]
lE[2] = [4, 9]
lE[4] = [6, 11]
Here's the one-liner:
results=[(lE[v1],lE[v2]) for v1 in lE for v2 in lE if (set(lE[v1]).isdisjoint(set(lE[v2])) and v1>v2)]