Removing dupes in list of lists in Python


Basically, I'm trying to remove any lists that begin with the same value. For example, two of the lists below begin with the number 1:
a = [[1,2],[1,0],[2,4],[3,5]]
Because the value 1 exists at the start of two of the lists -- I need to remove both so that the new list becomes:
b = [[2,4],[3,5]]
How can I do this?
I've tried the below, but the output is: [[1, 2], [2, 4], [3, 5]]
def unique_by_first_n(n, coll):
    seen = set()
    for item in coll:
        compare = tuple(item[:n])  # Keep only the first `n` elements in the set
        print(compare)
        if compare not in seen:
            seen.add(compare)
            yield item
a = [[1,2],[1,0],[2,4],[3,5]]
filtered_list = list(unique_by_first_n(1, a))

An efficient solution would be to create a Counter object to hold the occurrences of the first elements, and then filter the sub-lists in the main list:
from collections import Counter
counts = Counter(l[0] for l in a)
filtered = [l for l in a if counts[l[0]] == 1]
#[[2, 4], [3, 5]]
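The attempt in the question keeps the first sublist seen for each prefix, which is why [1, 2] survives. A minimal sketch of a two-pass variant that keeps the question's "first n elements" idea but drops every sublist whose prefix occurs more than once. The name `drop_shared_prefix` is made up for illustration:

```python
from collections import Counter

def drop_shared_prefix(n, coll):
    """Return the items of coll whose first n elements are unique within coll."""
    coll = list(coll)  # materialize so we can iterate twice
    # First pass: count how often each n-element prefix occurs.
    counts = Counter(tuple(item[:n]) for item in coll)
    # Second pass: keep only items whose prefix was seen exactly once.
    return [item for item in coll if counts[tuple(item[:n])] == 1]

a = [[1, 2], [1, 0], [2, 4], [3, 5]]
print(drop_shared_prefix(1, a))  # [[2, 4], [3, 5]]
```

With n=2 every sublist in the example has a distinct prefix, so nothing is removed.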

If you are happy to use a 3rd party library, you can use Pandas:
import pandas as pd
a = [[1,2],[1,0],[2,4],[3,5]]
df = pd.DataFrame(a)
b = df.drop_duplicates(subset=[0], keep=False).values.tolist()
print(b)
[[2, 4], [3, 5]]
The trick is the keep=False argument, described in the docs for pd.DataFrame.drop_duplicates.

You can use collections.Counter with list comprehension to get sublists whose first item appears only once:
from collections import Counter
c = Counter(n for n, _ in a)
b = [[x, y] for x, y in a if c[x] == 1]

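Solution 1
a = [[1,2],[1,0],[2,4],[3,5]]
b = []
for item in a:
    # count how many sublists start with the same value as this one
    matches = 0
    for other in a:
        if item[0] == other[0]:
            matches += 1
    # keep the sublist only if its first value is unique
    if matches == 1:
        b.append(item)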
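Solution 2
a = [[1,2],[1,0],[2,4],[3,5]]
b = []
for item in a:
    for i in range(0, len(a)):
        # a different sublist starts with the same value: skip this item
        if a[i] is not item and item[0] == a[i][0]:
            break
    else:
        # the inner loop finished without a break, so the first value is unique
        b.append(item)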
Output
(xenial)vash#localhost:~/pcc/10$ python3 remove_help.py
[[1, 2], [1, 0], [2, 4], [3, 5]]
[[2, 4], [3, 5]]
Achieved your goal with no complex methods involved!
Enjoy!

Related

how to combine same matching items in a 1d list and make it as 2d list [duplicate]

From this list:
N = [1,2,2,3,3,3,4,4,4,4,5,5,5,5,5]
I'm trying to create:
L = [[1],[2,2],[3,3,3],[4,4,4,4],[5,5,5,5,5]]
Any value which is found to be the same is grouped into its own sublist.
Here is my attempt so far, I'm thinking I should use a while loop?
global n
n = [1,2,2,3,3,3,4,4,4,4,5,5,5,5,5] #Sorted list
l = [] #Empty list to append values to
def compare(val):
    """ This function receives index values
    from the n list (n[0] etc) """
    global valin
    valin = val
    global count
    count = 0
    for i in xrange(len(n)):
        if valin == n[count]: # If the input value i.e. n[x] == n[iteration]
            temp = valin, n[count]
            l.append(temp) #append the values to a new list
            count +=1
        else:
            count +=1
for x in xrange (len(n)):
    compare(n[x]) #pass the n[x] to compare function

Use itertools.groupby:
from itertools import groupby
N = [1,2,2,3,3,3,4,4,4,4,5,5,5,5,5]
print([list(j) for i, j in groupby(N)])
Output:
[[1], [2, 2], [3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5]]
Side note: Avoid using global variables when you don't need them.

Someone pointed out that for N=[1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 1], groupby gives [[1], [2, 2], [3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5], [1]]
In other words, when the numbers in the list aren't in order, groupby doesn't collect equal values into a single group.
So here is an answer that handles that case:
from collections import Counter
N = [1,2,2,3,3,3,4,4,4,4,5,5,5,5,5]
C = Counter(N)
print([[k]*v for k,v in C.items()])
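Another way to handle unsorted input is to sort before grouping, since groupby only merges adjacent equal values. A small sketch contrasting the two behaviours (the variable names `runs` and `groups` are illustrative):

```python
from itertools import groupby

N = [1, 2, 2, 3, 3, 3, 1]

# groupby only merges runs of adjacent equal values.
runs = [list(g) for _, g in groupby(N)]

# Sorting first puts equal values next to each other,
# which gives one group per distinct value.
groups = [list(g) for _, g in groupby(sorted(N))]

print(runs)    # [[1], [2, 2], [3, 3, 3], [1]]
print(groups)  # [[1, 1], [2, 2], [3, 3, 3]]
```

Sorting costs O(n log n), whereas the Counter version makes a single pass and keeps values in first-seen order.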

You can use itertools.groupby along with a list comprehension
>>> import itertools
>>> l = [1,2,2,3,3,3,4,4,4,4,5,5,5,5,5]
>>> [list(v) for k,v in itertools.groupby(l)]
[[1], [2, 2], [3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5]]
This can be assigned to the variable L as in
L = [list(v) for k,v in itertools.groupby(l)]

You're overcomplicating this.
What you want to do is: for each value, if it's the same as the last value, just append it to the list of last values; otherwise, create a new list. You can translate that English directly to Python:
new_list = []
for value in old_list:
    if new_list and new_list[-1][0] == value:
        new_list[-1].append(value)
    else:
        new_list.append([value])
There are even simpler ways to do this if you're willing to get a bit more abstract, e.g., by using the grouping functions in itertools. But this should be easy to understand.
If you really need to do this with a while loop, you can translate any for loop into a while loop like this:
# this for loop:
for value in iterable:
    do_stuff(value)
# is equivalent to this while loop:
iterator = iter(iterable)
while True:
    try:
        value = next(iterator)
    except StopIteration:
        break
    do_stuff(value)
Or, if you know the iterable is a sequence, you can use a slightly simpler while loop:
index = 0
while index < len(sequence):
    value = sequence[index]
    do_stuff(value)
    index += 1
But both of these make your code less readable, less Pythonic, more complicated, less efficient, easier to get wrong, etc.

You can do that using numpy too:
import numpy as np
N = np.array([1,2,2,3,3,3,4,4,4,4,5,5,5,5,5])
counter = np.arange(1, len(N))
L = np.split(N, counter[N[1:]!=N[:-1]])
The advantage of this method is when you have another list which is related to N and you want to split it in the same way.

Another slightly different solution that doesn't rely on itertools:
#!/usr/bin/env python
def group(items):
    """
    groups a sorted list of integers into sublists based on the integer key
    """
    if len(items) == 0:
        return []
    grouped_items = []
    prev_item, rest_items = items[0], items[1:]
    subgroup = [prev_item]
    for item in rest_items:
        if item != prev_item:
            grouped_items.append(subgroup)
            subgroup = []
        subgroup.append(item)
        prev_item = item
    grouped_items.append(subgroup)
    return grouped_items
print(group([1,2,2,3,3,3,4,4,4,4,5,5,5,5,5]))
# [[1], [2, 2], [3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5]]

Remove list if it's contained in another list within the same nested list Python

I have a nested list:
regions = [[1,2,3],[3,4],[1,3,4],[1,2,3,5]]
I want to remove every list in this nested list which is contained in another one, i.e., [3,4] contained in [1,3,4] and [1,2,3] contained in [1,2,3,5], so the result is:
result = [[1,3,4],[1,2,3,5]]
So far I'm doing:
regions_remove = []
for i,reg_i in enumerate(regions):
    for j,reg_j in enumerate(regions):
        if j != i and list(set(reg_i)-set(reg_j)) == []:
            regions_remove.append(reg_i)
regions = [list(item) for item in set(tuple(row) for row in regions) -
           set(tuple(row) for row in regions_remove)]
And I've got: regions = [[1, 2, 3, 5], [1, 3, 4]], which is a solution, but I'd like to know what the most Pythonic solution is.
(Sorry for not posting my entire code before, I'm new to this...)

Here is a solution with a list comprehension and the all() function:
nested_list = [[1,2,3],[3,4],[1,3,4],[1,2,3,5],[2,5]]
result = list(nested_list) #makes a copy of the initial list
for l1 in nested_list: #list in nested_list
    rest = list(result) #makes a copy of the current result list
    rest.remove(l1) #the list l1 will be compared to every other list (so except itself)
    for l2 in rest: #list to compare
        #if all the elements of l1 are in l2 (then all() gives True), l1 is removed
        if all([elt in l2 for elt in l1]):
            result.remove(l1)
            break #l1 is already removed, so stop comparing it
This gives:
[[1, 3, 4], [1, 2, 3, 5]]
Further help
all() built-in function: https://docs.python.org/2/library/functions.html#all
copy a list: https://docs.python.org/2/library/functions.html#func-list
list comprehension: https://www.pythonforbeginners.com/basics/list-comprehensions-in-python

I'm definitely overlooking a simpler route, but this approach works
list comprehension
from itertools import product
l = [[1,2,3],[3,4],[1,3,4],[1,2,3,5]]
bad = [i for i in l for j in l if i != j if tuple(i) in product(j, repeat = len(i))]
final = [i for i in l if i not in bad]
Expanded explanation
from itertools import product
l = [[1,2,3],[3,4],[1,3,4],[1,2,3,5]]
bad = []
for i in l:
    for j in l:
        if i != j:
            if tuple(i) in product(j, repeat = len(i)):
                bad.append(i)
final = [i for i in l if i not in bad]
print(final)
[[1, 3, 4], [1, 2, 3, 5]]
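The containment test can also be written with set comparison operators. A minimal sketch, assuming that order and repeats inside each sublist do not matter and that two sublists with exactly the same elements should both be kept; the function name `drop_contained` is made up for illustration:

```python
def drop_contained(lists):
    """Keep only the lists that are not a proper subset of another list."""
    sets = [set(l) for l in lists]
    return [
        l for l, s in zip(lists, sets)
        # s < other is True only when s is a proper subset of other,
        # so a list is never compared away by itself.
        if not any(s < other for other in sets)
    ]

regions = [[1, 2, 3], [3, 4], [1, 3, 4], [1, 2, 3, 5]]
print(drop_contained(regions))  # [[1, 3, 4], [1, 2, 3, 5]]
```

This is still quadratic in the number of sublists, but each comparison is a single set operation and the original order is preserved.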

Group repeated elements of a list

I am trying to create a function that receives a list and returns another list with the repeated elements.
For example, for the input A = [2,2,1,1,3,2] (the list is not sorted), the function would return result = [[1,1], [2,2,2]]. The result doesn't need to be sorted.
I already did it in Wolfram Mathematica, but now I have to translate it to Python 3. Mathematica has functions like Select, Map and Split that make it very simple, without long loops with a lot of instructions.

result = [[x] * A.count(x) for x in set(A) if A.count(x) > 1]

Simple approach:
def grpBySameConsecutiveItem(l):
    rv = []
    last = None
    for elem in l:
        if last is None:
            last = [elem]
            continue
        if elem == last[0]:
            last.append(elem)
            continue
        if len(last) > 1:
            rv.append(last)
        last = [elem]
    # don't lose a repeated run at the very end of the list
    if last is not None and len(last) > 1:
        rv.append(last)
    return rv
print(grpBySameConsecutiveItem([1,2,1,1,1,2,2,3,4,4,4,4,5,4]))
Output:
[[1, 1, 1], [2, 2], [4, 4, 4, 4]]
You can sort your output afterwards if you want it sorted. You could also sort your input list first, but then the grouping would no longer reflect which identical numbers were consecutive in the original.
See https://stackoverflow.com/a/4174955/7505395 for how to sort lists of lists by an index (just use 0, as all elements of each inner list are identical).
You could also use itertools; it has things like takewhile that would look much smarter if used.
This will ignore whether items are consecutive and just collect them all:
def grpByValue(lis):
    d = {}
    for key in lis:
        if key in d:
            d[key] += 1
        else:
            d[key] = 1
    rv = []
    for k in d:
        if (d[k]<2):
            continue
        rv.append([])
        for n in range(0,d[k]):
            rv[-1].append(k)
    return rv
data = [1,2,1,1,1,2,2,3,4,4,4,4,5,4]
print(grpByValue(data))
Output:
[[1, 1, 1, 1], [2, 2, 2], [4, 4, 4, 4, 4]]
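For the consecutive-run version, the itertools suggestion can be sketched with groupby, which yields one group per run of adjacent equal items. The function name `repeated_runs` is hypothetical:

```python
from itertools import groupby

def repeated_runs(items):
    """Return the runs of consecutive equal items that are longer than one."""
    # Each group from groupby is one run of adjacent equal values.
    runs = (list(g) for _, g in groupby(items))
    return [run for run in runs if len(run) > 1]

data = [1, 2, 1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 5, 4]
print(repeated_runs(data))  # [[1, 1, 1], [2, 2], [4, 4, 4, 4]]
```

Because groupby emits the final run like any other, a repeated run at the end of the input is included without special handling.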

You could do this with a list comprehension:
A = [1,1,1,2,2,3,3,3]
B = []
[B.append([n]*A.count(n)) for n in A if B.count([n]*A.count(n)) == 0]
outputs [[1,1,1],[2,2],[3,3,3]]
Or more pythonically:
A = [1,2,2,3,4,1,1,2,2,2,3,3,4,4,4]
B = []
for n in A:
    if B.count([n]*A.count(n)) == 0:
        B.append([n]*A.count(n))
outputs [[1,1,1],[2,2,2,2,2],[3,3,3],[4,4,4,4]]
Works with a sorted or unsorted list. If you need the groups in sorted order, you can use for n in sorted(A).

This is a job for Counter(). Iterating over each element, x, and checking A.count(x) has a O(N^2) complexity. Counter() will count how many times each element exists in your iterable in one pass and then you can generate your result by iterating over that dictionary.
>>> from collections import Counter
>>> A = [2,2,1,1,3,2]
>>> counts = Counter(A)
>>> result = [[key] * value for key, value in counts.items() if value > 1]
>>> result
[[2, 2, 2], [1, 1]]

Python: count the occurrences of each item from a lists of lists

I'm a newbie to coding. I need to count the number of occurrences of each item in a list of lists. Here is an example of the list of lists I deal with:
GC = [[5,4,3,2,1],[9,8,7,6,5,4,3,2],[4,3,2,1],[10,9,8,7,6,5,4]]
and print the results in two columns. column 1 = range of list of list elements, column 2 = total occurrences of each element.

You can accomplish that easily using some built-in libraries/modules:
from itertools import chain
from collections import Counter
l = [[5,4,3,2,1],[9,8,7,6,5,4,3,2],[4,3,2,1],[10,9,8,7,6,5,4]]
l = chain.from_iterable(l)
print(Counter(l))
chain.from_iterable(l) flattens the list into 1 dimension, and then the Counter constructor creates a Counter object, which is basically a dictionary mapping each unique item to its count in the list.
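To get the two-column printout the question asks for, the Counter can be iterated in sorted order. A minimal sketch; the column widths and the name `rows` are arbitrary choices, not part of the original answer:

```python
from collections import Counter
from itertools import chain

GC = [[5, 4, 3, 2, 1], [9, 8, 7, 6, 5, 4, 3, 2], [4, 3, 2, 1], [10, 9, 8, 7, 6, 5, 4]]

# Flatten the nested list and count every element in one pass.
counts = Counter(chain.from_iterable(GC))

# One (element, count) row per distinct element, sorted by element value.
rows = sorted(counts.items())
for element, count in rows:
    print(f"{element:>7}  {count:>5}")
```

Here `rows` starts with (1, 2) and ends with (10, 1).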

If you want each number and the count in columns:
l = [[5,4,3,2,1],[9,8,7,6,5,4,3,2],[4,3,2,1],[10,9,8,7,6,5,4]]
flattened = ([x for y in l for x in y])
counts = [[ele, flattened.count(ele)] for ele in set(flattened)]
print(counts)
[[1, 2], [2, 3], [3, 3], [4, 4], [5, 3], [6, 2], [7, 2], [8, 2], [9, 2], [10, 1]]

If you're new to programming, then maybe you are doing this to get an idea of how loops work; in that case the example below may help. But it's important to note that it's not the most efficient method; see sshashank124's or Padraic Cunningham's answers above.
GC = [[5,4,3,2,1],[9,8,7,6,5,4,3,2],[4,3,2,1],[10,9,8,7,6,5,4]]
nums = list(set(x for sub in GC for x in sub)) # distinct values across all sublists
occurrences = []
for i in nums:
    p = 0
    for i2 in GC:
        for k2 in i2:
            if i == k2:
                p += 1
    occurrences.append(p) # save total count in list
print(list(zip(nums, occurrences)))

Union find implementation using Python

So here's what I want to do: I have a list that contains several equivalence relations:
l = [[1, 2], [2, 3], [4, 5], [6, 7], [1, 7]]
And I want to union the sets that share one element. Here is a sample implementation:
def union(lis):
    lis = [set(e) for e in lis]
    res = []
    while True:
        for i in range(len(lis)):
            a = lis[i]
            if res == []:
                res.append(a)
            else:
                pointer = 0
                while pointer < len(res):
                    if a & res[pointer] != set([]) :
                        res[pointer] = res[pointer].union(a)
                        break
                    pointer +=1
                if pointer == len(res):
                    res.append(a)
        if res == lis:
            break
        lis,res = res,[]
    return res
And it prints
[set([1, 2, 3, 6, 7]), set([4, 5])]
This does the right thing but is way too slow when the set of equivalence relations is large. I looked up descriptions of the union-find algorithm: http://en.wikipedia.org/wiki/Disjoint-set_data_structure
but I'm still having trouble coding a Python implementation.

Solution that runs in O(n) time
from collections import defaultdict
def indices_dict(lis):
    d = defaultdict(list)
    for i,(a,b) in enumerate(lis):
        d[a].append(i)
        d[b].append(i)
    return d
def disjoint_indices(lis):
    d = indices_dict(lis)
    sets = []
    while len(d):
        que = set(d.popitem()[1])
        ind = set()
        while len(que):
            ind |= que
            que = set([y for i in que
                         for x in lis[i]
                         for y in d.pop(x, [])]) - ind
        sets += [ind]
    return sets
def disjoint_sets(lis):
    return [set([x for i in s for x in lis[i]]) for s in disjoint_indices(lis)]
How it works:
>>> lis = [(1,2),(2,3),(4,5),(6,7),(1,7)]
>>> indices_dict(lis)
defaultdict(<type 'list'>, {1: [0, 4], 2: [0, 1], 3: [1], 4: [2], 5: [2], 6: [3], 7: [3, 4]})
indices_dict gives a map from an equivalence # to the indices in lis where it appears. E.g. 1 is mapped to indices 0 and 4 in lis.
>>> disjoint_indices(lis)
[set([0, 1, 3, 4]), set([2])]
disjoint_indices gives a list of disjoint sets of indices. Each set corresponds to the indices in one equivalence. E.g. lis[0] and lis[3] are in the same equivalence, but lis[2] is not.
>>> disjoint_sets(lis)
[set([1, 2, 3, 6, 7]), set([4, 5])]
disjoint_sets converts disjoint indices into their proper equivalences.
Time complexity
The O(n) time complexity is difficult to see, but I'll try to explain. Here I will use n = len(lis).
indices_dict certainly runs in O(n) time because it has only one for-loop.
disjoint_indices is the hardest to see. It certainly runs in O(len(d)) time, since the outer loop stops when d is empty and the inner loop removes an element of d each iteration. Now, len(d) <= 2n, since d is a map from equivalence #'s to indices and there are at most 2n different equivalence #'s in lis. Therefore, the function runs in O(n).
disjoint_sets is difficult to see because of the 3 combined for-loops. However, you'll notice that i runs over at most all n indices in lis and x runs over a 2-tuple, so the total complexity is 2n = O(n).

I think this is an elegant solution, using the built in set functions:
#!/usr/bin/python3
def union_find(lis):
    lis = map(set, lis)
    unions = []
    for item in lis:
        temp = []
        for s in unions:
            if not s.isdisjoint(item):
                item = s.union(item)
            else:
                temp.append(s)
        temp.append(item)
        unions = temp
    return unions
if __name__ == '__main__':
    l = [[1, 2], [2, 3], [4, 5], [6, 7], [1, 7]]
    print(union_find(l))
It returns a list of sets.

Perhaps something like this?
#!/usr/local/cpython-3.3/bin/python
import copy
import pprint
import collections
def union(list_):
    dict_ = collections.defaultdict(set)
    for sublist in list_:
        dict_[sublist[0]].add(sublist[1])
        dict_[sublist[1]].add(sublist[0])
    change_made = True
    while change_made:
        change_made = False
        for key, values in dict_.items():
            for value in copy.copy(values):
                for element in dict_[value]:
                    if element not in dict_[key]:
                        dict_[key].add(element)
                        change_made = True
    return dict_
list_ = [ [1, 2], [2, 3], [4, 5], [6, 7], [1, 7] ]
pprint.pprint(union(list_))

This works by completely exhausting one equivalence at a time. When an element finds its equivalence, it is removed from the original set and no longer searched.
def equiv_sets(lis):
    s = set(lis)
    sets = []
    #loop while there are still items in original set
    while len(s):
        s1 = set(s.pop())
        length = 0
        #loop while there are still equivalences to s1
        while( len(s1) != length):
            length = len(s1)
            for v in list(s):
                if v[0] in s1 or v[1] in s1:
                    s1 |= set(v)
                    s -= set([v])
        sets += [s1]
    return sets
print(equiv_sets([(1,2),(2,3),(4,5),(6,7),(1,7)]))
OUTPUT: [set([1, 2, 3, 6, 7]), set([4, 5])]
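None of the answers above uses the disjoint-set structure the question links to. A minimal sketch of that approach, with path compression and union by size; the function name `union_find_groups` and its helpers are made up for illustration, and it assumes every relation is a pair of hashable values:

```python
def union_find_groups(pairs):
    """Merge pairs that share an element, using a disjoint-set forest."""
    parent = {}  # element -> parent element (roots point to themselves)
    size = {}    # root -> number of elements in its tree

    def find(x):
        # Register unseen elements as their own root.
        if x not in parent:
            parent[x] = x
            size[x] = 1
        # Walk up to the root.
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        # Union by size: attach the smaller tree under the larger one.
        if size[ra] < size[rb]:
            ra, rb = rb, ra
        parent[rb] = ra
        size[ra] += size[rb]

    for a, b in pairs:
        union(a, b)

    # Collect the elements under their root.
    groups = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())

l = [[1, 2], [2, 3], [4, 5], [6, 7], [1, 7]]
print(sorted(sorted(g) for g in union_find_groups(l)))  # [[1, 2, 3, 6, 7], [4, 5]]
```

With both optimizations, a sequence of n unions runs in near-linear time, which avoids the repeated rescanning in the question's version.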
