Identify duplicate lists in list of lists? - python

How can I compare a list of lists with itself in Python in order to:
1. identify identical sublists with the same items (not necessarily in the same item order)
2. delete these duplicate sublists
Example:
list = [ [1, 3, 5, 6], [7, 8], [10, 12], [9], [3, 1, 5, 6], [12, 10] ]
clean_list = [ [1, 3, 5, 6], [7, 8], [10, 12], [9] ]
Any help is greatly appreciated.
I can't seem to figure this out.

I would rebuild "clean_list" in a list comprehension, checking that the sorted version of each sublist doesn't match the sorted version of any earlier sublist:
the_list = [ [1, 3, 5, 6], [7, 8], [10, 12], [9], [3, 1, 5, 6], [12, 10] ]
clean_list = [l for i,l in enumerate(the_list) if all(sorted(l)!=sorted(the_list[j]) for j in range(0,i))]
print(clean_list)
Of course, sorting the items on each iteration is time-consuming, so you could prepare a sorted list of sublists:
the_sorted_list = [sorted(l) for l in the_list]
and use it:
clean_list = [the_list[i] for i,l in enumerate(the_sorted_list) if all(l!=the_sorted_list[j] for j in range(0,i))]
result (in both cases):
[[1, 3, 5, 6], [7, 8], [10, 12], [9]]
As many suggested, a simple for loop (no list comprehension) that stores the already-seen items in a set may perform better when looking up duplicates. This alternative could be necessary if the input list is really big, because it avoids the O(n) scan that all() performs for each sublist.
An example implementation could be:
test_set = set()
clean_list = []
for l in the_list:
    sl = sorted(l)
    tsl = tuple(sl)  # tuples are hashable, lists are not
    if tsl not in test_set:
        test_set.add(tsl)  # note it down to avoid inserting it next time
        clean_list.append(l)  # keep the sublist in its original item order

Create a set. Then, for each sublist in the list, sort it, convert it to a tuple, and insert the tuple into the set.
setOfLists = set()
for sub in listOfLists:
    sub.sort()  # note: sorts the sublist in place
    setOfLists.add(tuple(sub))
print(setOfLists)
You can then convert the tuples in the set back into lists.
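A minimal sketch of that last conversion step, assuming the input is stored in a variable named list_of_lists (a hypothetical name). The outer sorted() call is only there to make the output order predictable, since sets are unordered.

```python
list_of_lists = [[1, 3, 5, 6], [7, 8], [10, 12], [9], [3, 1, 5, 6], [12, 10]]

# Sorted tuples are hashable, so reordered duplicates collapse to one set entry.
unique_tuples = {tuple(sorted(sub)) for sub in list_of_lists}

# Convert each tuple back to a list; sorted() makes the outer order deterministic.
clean_list = sorted(list(t) for t in unique_tuples)
print(clean_list)  # [[1, 3, 5, 6], [7, 8], [9], [10, 12]]
```

Note that this keeps the sorted form of each sublist and does not preserve the original order of the outer list.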

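Simple for loops will work, but if your dataset is small, e.g. 1k sublists or fewer, you can use this quadratic one-liner:
b = []
# a is the input list of lists; comparing as sets ignores item order (and repeated items)
[b.append(i) for i in a if not any(set(j) == set(i) for j in b)]
print(b)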

So here's my take on this.
I define a function that sorts each sublist and appends it to a temporary list. Then I check whether each sublist in temp_my_list is missing from temp_clean_list and, if so, append it to a new list. This should work for any two lists of lists. I added some extra sublists so the result is something other than an empty list.
my_list = [[1, 3, 5, 6], [7, 8], [10, 12], [9], [3, 1, 5, 6], [12, 10], [16]]
clean_list = [[1, 3, 5, 6], [7, 8], [10, 12], [9], [18]]
new_list = []

def getNewList():
    temp_my_list = []
    temp_clean_list = []
    for sublist in my_list:
        sublist.sort()
        temp_my_list.append(sublist)
    for sublist in clean_list:
        sublist.sort()
        temp_clean_list.append(sublist)
    for sublist in temp_my_list:
        if sublist not in temp_clean_list:
            new_list.append(sublist)

getNewList()
print(new_list)
Result:
[[16]]

Related

Finding an element in nested python list and then replacing it

I have a nested list and I am trying to replace a certain element of the list with something else.
NL = [[1,2,3],
      [4,5,6],
      [7,8,9]]
Now, I need to update the list. Let's say the user wants to change the element at NL[1][1] (i.e. 5) to 'X'.
NL will be updated as
NL = [[1,2,3],
      [4,'X',6],
      [7,8,9]]
I am having trouble trying to find the position of the element and then changing it. Any help is much appreciated.
Thanks
Using numpy:
import numpy as np
NL = np.array(NL)
mask = np.where(NL == 5)  # indices of every element equal to 5
NL[mask] = 10
array([[ 1,  2,  3],
       [ 4, 10,  6],
       [ 7,  8,  9]])
Solution 2:
def get_index(num, List):
    for row, i in enumerate(List):
        if num in i:
            return row, i.index(num)
    return None  # not found

idx = get_index(5, NL)
if idx is not None:
    NL[idx[0]][idx[1]] = 7
[[1, 2, 3], [4, 7, 6], [7, 8, 9]]
Use two indexes: one for the nested list you want and one for the element within that nested list.
So in this case you want the 2nd list's 2nd element:
NL[1][1]='X'
Output:
[[1, 2, 3], [4, 'X', 6], [7, 8, 9]]
Let's say you need to find the element 5 and want to replace it with 10.
We iterate through the outer list and then each inner-list's elements. Once we find the element we look for, we can replace it by the indexes. We use enumerate to have the indexes once we find a matching element.
The following code replaces ALL matching elements (all occurrences of 5).
NL = [[1,2,3], [4,5,6], [7,8,9]]
print(NL) # prints: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
for i, sublist in enumerate(NL):
    for y, element in enumerate(sublist):
        if element == 5:
            NL[i][y] = 10
print(NL) # prints: [[1, 2, 3], [4, 10, 6], [7, 8, 9]]
This will replace only the first occurrence of item_to_replace. If you want to replace the first occurrence in every sublist, remove the break statement from the try block.
item_to_replace = 5
replacement = 'X'  # example value; use your own replacement for item_to_replace
for lst in NL:
    try:
        index = lst.index(item_to_replace)
        lst[index] = replacement
        break
    except ValueError:
        continue
You should access the element by its indexes. You have a 2D list (array), so you need two indexes: NL[1][1] = "X".
Complete code:
NL = [[1, 2, 3],
      [4, 5, 6],
      [7, 8, 9]]
print("Original: {}".format(NL))
NL[1][1] = "X"
print("New: {}".format(NL))
Output:
$ python3 test.py
Original: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
New: [[1, 2, 3], [4, 'X', 6], [7, 8, 9]]
Just use NL[1][1] = 'X'
and then print(NL).
"I am having trouble trying to find the position of the element and then changing it."
Most of the answers here seem to have missed that part of the question and assumed you already had the position.
You can use a nested list comprehension:
NL = [[1,2,3],
[4,5,6],
[7,8,9]]
NL = [['X' if i==5 else i for i in j] for j in NL]
print(NL)
Output:
[[1, 2, 3], [4, 'X', 6], [7, 8, 9]]
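If the positions themselves are needed, and not only the replaced list, they can be collected first. This is a small sketch rather than anything from the answers above; find_positions is a hypothetical helper name.

```python
def find_positions(nested, target):
    """Return (row, column) pairs for every occurrence of target."""
    return [
        (r, c)
        for r, row in enumerate(nested)
        for c, value in enumerate(row)
        if value == target
    ]

NL = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
positions = find_positions(NL, 5)
print(positions)  # [(1, 1)]

# Replace in place at each position that was found.
for r, c in positions:
    NL[r][c] = 'X'
print(NL)  # [[1, 2, 3], [4, 'X', 6], [7, 8, 9]]
```

An empty list means the value was not found, so no sentinel such as -1 is needed.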

Eliminate duplicate from list of lists [duplicate]

This question already has answers here:
set of list of lists in python
(4 answers)
Closed 3 years ago.
I need to eliminate duplicates from a list of list like this one:
list = [[10, 5, 3], [10, 5, 3], [10, 10, 3], [10, 10], [3, 3, 3], [10, 5, 3]]
Expected result:
result_list = [[10, 5, 3], [10, 3], [10], [3]]
Is it possible to eliminate duplicates both inside the sublists and in the main list?
I tried with:
result_list = [list(result) for result in set(set(item) for item in list)]
but it throws a TypeError saying that set is an unhashable type.
I don't think this is a duplicate question: I need to remove the duplicates within the sublists, not just in the main list.
Thanks to everyone who helped me, problem solved.
Sets aren't hashable, but frozensets are:
lst = [[10, 5, 3], [10, 5, 3], [10, 10, 3], [10, 10], [3, 3, 3], [10, 5, 3]]
result_list = [list(result) for result in set(frozenset(item) for item in lst)]
Also don't shadow the builtin name list, especially if you want to use its usual meaning immediately after.
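Because a set has no defined order, the one-liner above may return the sublists (and the items inside them) in any order. If the first-seen order matters, a loop along these lines is one option; it is only a sketch, and the names seen and key are illustrative.

```python
lst = [[10, 5, 3], [10, 5, 3], [10, 10, 3], [10, 10], [3, 3, 3], [10, 5, 3]]

seen = set()
result_list = []
for item in lst:
    key = frozenset(item)  # hashable, ignores order and repeated values
    if key not in seen:
        seen.add(key)
        # dict.fromkeys drops repeated values but keeps first-seen order
        result_list.append(list(dict.fromkeys(item)))

print(result_list)  # [[10, 5, 3], [10, 3], [10], [3]]
```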
You can use map to convert each sublist to a tuple, which is hashable:
result = [list(set(i)) for i in set(map(tuple, mylist))]
Output:
[[3], [10, 3, 5], [10, 3], [10]]
Items must be hashable to go into a set, so you need to use tuples. In a nested list comprehension you can then turn these tuples back into lists:
list_example = [[10, 5, 3], [10, 5, 3], [10, 10, 3], [10, 10], [3, 3, 3], [10, 5, 3]]
output = [list(x) for x in (list(set([tuple(set(result)) for result in list(set(item) for item in list_example)])))]
print(output)
Output:
[[10, 3, 5], [10], [10, 3], [3]]
You are iterating over a list:
for item in list
putting the results into a set:
set(item)
then putting that set into another set:
set(set(item))
For anything to go into a set it has to be hashable, meaning it has a defined hash value that never changes, which in practice means it is immutable. Sets aren't immutable, and so don't have a hash. See Why aren't Python sets hashable?.
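A short demonstration of the difference, as a sketch: putting a set inside a set raises TypeError, while a frozenset (the immutable counterpart) is accepted. The exact error text varies between Python versions, so it is printed rather than compared.

```python
item = [10, 10, 3]

try:
    {set(item)}  # a mutable set cannot be an element of another set
except TypeError as exc:
    print(exc)  # the message mentions "unhashable type"

# frozenset is immutable and hashable, so it can be a set element.
outer = {frozenset(item)}
print(frozenset([3, 10]) in outer)  # True
```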

Recursive method to zip list?

I have a nested list that looks like the following:
list1 = [[1,2,3],[4,5,6],[7,8,9],[10,11,12]]
However, I would like to find a way to group the first element of each sublist together, then the second elements, and so on:
list1 = [[1,4,7,10],[2,5,8,11],[3,6,9,12]]
I have tried doing list comprehension by using the following code
list1 = [[list1[j][i] for j in range(len(list1)) ] for i in range(len(list1[0])) ]
# gives me
# list1 = [[1,4,7,10],[2,5,8,11],[3,6,9,12]]
However, I was hoping for alternative methods that achieve the same result, ideally something simpler and more elegant.
Thanks in advance.
zip is a built-in function and does not require outside packages:
>>> list1 = [[1,2,3],[4,5,6],[7,8,9],[10,11,12]]
>>> print([list(x) for x in zip(*list1)])
[[1, 4, 7, 10], [2, 5, 8, 11], [3, 6, 9, 12]]
Notice the *list1! It is needed because list1 is a nested list: the * unpacks its sublists as separate arguments to zip, which then zips them together. Since zip yields tuples (in Python 3 it returns an iterator rather than a list), we simply convert each one to a list, as per your request.
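To make the unpacking concrete, here is a small sketch comparing zip(*list1) with the equivalent explicit call. The ragged example and the use of itertools.zip_longest are additions for illustration, not part of the original question.

```python
from itertools import zip_longest

list1 = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]

# zip(*list1) is the same call as passing each sublist as its own argument.
unpacked = [list(col) for col in zip(*list1)]
explicit = [list(col) for col in zip(list1[0], list1[1], list1[2], list1[3])]
print(unpacked == explicit)  # True

# With rows of unequal length, zip stops at the shortest row ...
ragged = [[1, 2, 3], [4, 5]]
print([list(col) for col in zip(*ragged)])  # [[1, 4], [2, 5]]
# ... while zip_longest pads the missing slots (with None by default).
print([list(col) for col in zip_longest(*ragged)])  # [[1, 4], [2, 5], [3, None]]
```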
A possible recursive solution can use a generator:
def r_zip(d):
    yield [i[0] for i in d]
    if d[0][1:]:
        yield from r_zip([i[1:] for i in d])

print(list(r_zip(list1)))
Output:
[[1, 4, 7, 10], [2, 5, 8, 11], [3, 6, 9, 12]]
x = min(len(sub) for sub in list1)  # length of the shortest sublist
[[i[j] for i in list1] for j in range(x)]
Or try using:
>>> list1 = [[1,2,3],[4,5,6],[7,8,9],[10,11,12]]
>>> list(map(list, zip(*list1)))
[[1, 4, 7, 10], [2, 5, 8, 11], [3, 6, 9, 12]]

how to manipulate nested lists

So I currently have a nested list.
org_network=[[1, 2, 3], [1, 4, 5], [1, 3, 6], [7, 9, 10]]
I need to figure out how to manipulate it to create lists of possible combinations of the nested lists. These combinations cannot have lists that share numbers. Here is an example of what the result should be:
network_1=[[1,2,3],[7,9,10]]
network_2=[[1,4,5],[7,9,10]]
network_3=[[1,3,6],[7,9,10]]
Note:
1. This code is going to be linked to a constantly updated CSV file, so the org_network list will contain a varying number of elements (which also means that there will be numerous resulting networks).
I have been working on this for about four hours and have yet to figure it out. Any help would be very appreciated. I have primarily been trying to use for loops and any(), to no avail. Thanks for any help.
You can use itertools.combinations() with set intersection:
>>> from itertools import combinations
>>> org_network=[[1, 2, 3], [1, 4, 5], [1, 3, 6], [7, 9, 10]]
>>> [[x, y] for x, y in combinations(org_network, r=2) if not set(x).intersection(y)]
[[[1, 2, 3], [7, 9, 10]], [[1, 4, 5], [7, 9, 10]], [[1, 3, 6], [7, 9, 10]]]
Here is an approach that will be efficient if the number of unique elements is small relative to the number of sets.
Steps:
1. For each unique element, store the indices of all sets in which the element does not occur.
2. For each set s in the network, use the data from the first step to find all other sets that contain no element of s.
3. Iterate over pairs, discarding duplicates based on ID order.
from functools import reduce

org_network = [[1, 2, 3], [1, 4, 5], [1, 3, 6], [7, 9, 10]]
# convert to sets
sets = [set(lst) for lst in org_network]
# all unique numbers
uniqs = set().union(*sets)
# map each unique number to sets that do not contain it:
other = {x: {i for i, s in enumerate(sets) if x not in s} for x in uniqs}
# iterate over sets:
for i, s in enumerate(sets):
    # find all sets not overlapping with i
    no_overlap = reduce(lambda l, r: l.intersection(r), (other[x] for x in s))
    # iterate over non-overlapping sets
    for j in no_overlap:
        # discard duplicates
        if j <= i:
            continue
        print([org_network[i], org_network[j]])
# result
# [[1, 2, 3], [7, 9, 10]]
# [[1, 4, 5], [7, 9, 10]]
# [[1, 3, 6], [7, 9, 10]]
Edit: If combinations of size greater than two are required, it is possible to modify the above approach. Here is an extension that uses depth-first search to traverse all pairwise disjoint combinations.
def not_overlapping(set_ids):
    candidates = reduce(
        lambda l, r: l.intersection(r), (other[x] for sid in set_ids for x in sets[sid])
    )
    mid = max(set_ids)
    return {c for c in candidates if c > mid}

# this will also produce "combinations" consisting of a single element
def iter_combinations():
    combs = [[i] for i in range(len(sets))]
    while combs:
        comb = combs.pop()
        extension = not_overlapping(comb)
        combs.extend(comb + [e] for e in extension)
        yield [org_network[i] for i in comb]

def iter_combinations_long():
    for comb in iter_combinations():
        if len(comb) > 1:
            yield comb

all_combs = list(iter_combinations_long())
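For small inputs, a brute-force check can serve as a reference to compare against. The sketch below is not the depth-first search above: it simply tests every r-sized combination from itertools.combinations. The function name disjoint_combinations and the second example network are hypothetical.

```python
from itertools import combinations

def disjoint_combinations(network, r):
    """Yield every r-sized combination whose sublists share no element."""
    for combo in combinations(network, r):
        as_sets = [set(sub) for sub in combo]
        # If the union is as large as the summed set sizes, nothing overlaps.
        if len(set().union(*as_sets)) == sum(len(s) for s in as_sets):
            yield list(combo)

org_network = [[1, 2, 3], [1, 4, 5], [1, 3, 6], [7, 9, 10]]
pairs = list(disjoint_combinations(org_network, 2))
print(pairs)
# [[[1, 2, 3], [7, 9, 10]], [[1, 4, 5], [7, 9, 10]], [[1, 3, 6], [7, 9, 10]]]

# Hypothetical input with a valid combination of size three.
example = [[1, 2, 3], [4, 5], [1, 6], [7, 9, 10]]
triples = list(disjoint_combinations(example, 3))
print(triples)
# [[[1, 2, 3], [4, 5], [7, 9, 10]], [[4, 5], [1, 6], [7, 9, 10]]]
```

This examines every combination, so it grows quickly with the size of the network; it is meant for checking results, not for large inputs.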

python how to link sublist together in a list which describe a tree

For example, list = [[1,0],[2,1],[6,2],[7,6],[7,8],[15,13],[8,15]]
describes the tree
_7_
_6 8_
_2 _15
How can I get a new list containing all of these numbers?
For the example list =
[[1,0],[2,1],[6,2],[7,6],[7,8],[15,13],[8,15]]
the output would be new_list = [0,1,2,6,7,8,15,13] (order not important).
My biggest problem is linking [6,2], [7,6], [7,8] and [8,15] together.
It's simple if order doesn't matter:
l = [[1,0],[2,1],[6,2],[7,6],[7,8],[15,13],[8,15]]
# flatten list
t = sum(l, [])
# convert to a set to remove duplicate values;
# if you want to keep the order, use an OrderedDict instead
list(set(t)) # [0, 1, 2, 6, 7, 8, 13, 15]
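For the order-preserving variant mentioned in the comment, a plain dict is enough on Python 3.7+, where dicts keep insertion order. This is a sketch under that assumption; itertools.chain is used here in place of sum(l, []) for the flattening step.

```python
from itertools import chain

l = [[1, 0], [2, 1], [6, 2], [7, 6], [7, 8], [15, 13], [8, 15]]

# chain.from_iterable flattens one level without building intermediate lists.
flat = chain.from_iterable(l)

# dict keys are unique and keep insertion order (Python 3.7+).
new_list = list(dict.fromkeys(flat))
print(new_list)  # [1, 0, 2, 6, 7, 8, 15, 13]
```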
If the order is not important, you can loop through the list and each sublist and check whether the element is already in new_list:
lst = [[1, 0], [2, 1], [6, 2], [7, 6], [7, 8], [15, 13], [8, 15]]
new_list = []
for sub in lst:
    for elem in sub:
        if elem not in new_list:
            new_list.append(elem)
print(new_list)
Output:
[1, 0, 2, 6, 7, 8, 15, 13]
Leaving tree and graph theory aside:
lst = [[1,0],[2,1],[6,2],[7,6],[7,8],[15,13],[8,15]]
uniq = {}
for i in lst:
    uniq.update({i[0]: True, i[1]: True})
print(list(uniq))  # dict keys keep insertion order on Python 3.7+
[1, 0, 2, 6, 7, 8, 15, 13]
Using a Python set:
lst = [[1,0],[2,1],[6,2],[7,6],[7,8],[15,13],[8,15]]
uniq = set()
for i in lst:
    uniq.add(i[0])
    uniq.add(i[1])
print(uniq)
