Get the duplicate elements from a list of lists [duplicate] - python

This question already has answers here:
How do I find the duplicates in a list and create another list with them?
(42 answers)
Closed 8 months ago.
Suppose that I have a list of lists, e.g.
example_list = [[0, 0], [0, 1], [0, 1], [5, 4]]
I want a reasonably fast method of obtaining a list formed exclusively of elements that appear at least twice in the original list. In this example, the new list would be
new_list = [[0, 1]]
since [0, 1] is the only duplicate entry. I have spent lots of time on Stack Overflow looking for a solution, but none of them seem to work for me (details below). How should I proceed in this instance?
Unsuccessful attempts. One solution which does work is to write something like
new_list = [x for x in example_list if example_list.count(x) > 1]
However, this is too slow for my purposes.
Another solution (suggested here) is to write
totals = {}
for k, v in example_list:
    totals[k] = totals.get(k, 0) + v
totals.items()
[list(t) for t in totals.items()]
print(totals)
I may have misunderstood what the author is suggesting, but this doesn't work for me at all: it prints {0: 2, 5: 4} in the terminal.
A final solution (also suggested on this page) is to import Counter from collections and write
new_list = Counter(x for x, new_list in example_list for _ in xrange(new_list))
map(list, new_list.iteritems())
This flags an error on xrange and iteritems (I think it's a Python 3 thing?), so I tried
new_list = Counter(x for x, new_list in example_list for _ in range(new_list))
map(list, new_list.items())
which yielded Counter({5: 4, 0: 2}) (again!!), which is of course not what I am after...

You can use Counter to create a dictionary of counts of elements in example_list. But each element should be converted to a tuple to make it hashable. Then, you can filter the elements that meet your criteria.
from collections import Counter
d = Counter([tuple(x) for x in example_list])
[list(k) for k, v in d.items() if v >= 2]
# [[0, 1]]

You could count the values inside the inner lists. The first step is to iterate, but you really want to iterate over the values held in the inner lists rather than the outer list itself. itertools.chain.from_iterable does that for you. Feed the result into collections.Counter and you get a count of all of the values. A list comprehension can then select the values you want, and you can place that in an outer list.
>>> example_list = [[0, 0], [0, 1], [0, 1], [5, 4]]
>>> import collections
>>> import itertools
>>> counts = collections.Counter(itertools.chain.from_iterable(example_list))
>>> counts
Counter({0: 4, 1: 2, 5: 1, 4: 1})
>>> selected = [k for k,v in counts.items() if v >= 2]
>>> result = [selected]
>>> result
[[0, 1]]

Related

itertools returning which list it found a match from

I'm new, so bear with me... I could create two separate for loops, but I like this method since I'll probably iterate through multiple lists and it's less code. Is this possible with itertools? As I understand it, it creates one list out of the two, so I might be out of luck.
import itertools

a = [0, 1, 2]
b = [3, 4, 5]

def finditem(n):
    for i in itertools.chain(a, b):
        if i == n:
            print(n)  # here I want (n) and (a or b)

n = 3
finditem(n)
I think what you want is:
given a list of lists and an item to find if it exists in that list
tell me what list that item was found in.
You should probably create a dictionary whose keys are the names of the lists and whose values are the collections themselves:
my_lists = {'a': [0, 1, 2], 'b': [3, 4, 5]}

def find_item(my_lists, item_to_find):
    for key in my_lists:
        if item_to_find in my_lists[key]:
            print(key)

n = 3
find_item(my_lists, n)
There isn't much need for itertools here, but you should store your data in a dictionary if you want to be able to give them labels that you can refer to in the program. To list all of the list names containing n:
lists = {'fruits': [0, 1, 2], 'vegetables': [3, 4, 5]}

def find_item(n, lists):
    for name, list_ in lists.items():
        if n in list_:
            print(name)
or similarly:
print([name for name, list_ in lists.items() if n in list_])
or to print only the first list found that contains n:
print(next(name for name, list_ in lists.items() if n in list_))
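One caveat with the next version: if no list contains n, next raises StopIteration. Passing a default as its second argument avoids that. A small sketch (the function name here is just for illustration):

```python
lists = {'fruits': [0, 1, 2], 'vegetables': [3, 4, 5]}

# next() raises StopIteration when the generator is exhausted,
# so supply a default to get None when no list contains n
def first_list_containing(n, lists):
    return next((name for name, list_ in lists.items() if n in list_), None)

print(first_list_containing(3, lists))   # vegetables
print(first_list_containing(99, lists))  # None
```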
If you need this for an if condition, and the number of lists is not very large, then probably don't use a loop in the first place, but a simple condition like this:
fruits = [0, 1, 2]
vegetables = [3, 4, 5]

if n in fruits:
    # do this
elif n in vegetables:
    # do that

Remove list from list of lists if condition is met

I have a list of lists containing an index and two coordinates, [i,x,y] eg:
L = [[1, 0, 0], [2, 0, 1], [3, 1, 2]]
I want to check if L[i][1] is repeated (as is the case in the example for i=0 and i=1) and keep only the sublist with the smallest i. In the example, [2, 0, 1] would be removed and L would be:
L = [[1, 0, 0], [3, 1, 2]]
Is there a simple way to do such a thing?
Keep a set of the x coordinates we've already seen, traverse the input list sorted by ascending i, and build an output list, adding only the sublists whose x we haven't seen yet:
L = [[1, 0, 0], [2, 0, 1], [3, 1, 2]]
ans = []
seen = set()
for sl in sorted(L):
    if sl[1] not in seen:
        ans.append(sl)
        seen.add(sl[1])
L = ans
It works as required:
L
=> [[1, 0, 0], [3, 1, 2]]
There are probably better solutions, but you can do it with:
i1_list = []
result_list = []
for i in L:
    if i[1] not in i1_list:
        result_list.append(i)
        i1_list.append(i[1])
print(result_list)
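As a small tweak to the loop above: tracking the seen x values in a set instead of a list makes each membership test O(1) rather than O(n), which matters for long lists. A sketch of that variant:

```python
L = [[1, 0, 0], [2, 0, 1], [3, 1, 2]]

result_list = []
seen = set()  # a set gives O(1) membership tests, unlike a list
for sl in L:
    if sl[1] not in seen:
        result_list.append(sl)
        seen.add(sl[1])
print(result_list)  # [[1, 0, 0], [3, 1, 2]]
```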

Take unique values out of a list with unhashable elements [duplicate]

This question already has an answer here:
Python, TypeError: unhashable type: 'list'
(1 answer)
Closed 3 years ago.
So I have the following list:
test_list = ['Hallo', 42, [1, 2], 42, 3 + 2j, 'Hallo', 'Hello', [1, 2], [2, 3], 3 + 2j, 42]
Now I want to take the unique values from the list and print them on the screen. I've tried using the set function, but that doesn't work (TypeError: unhashable type: 'list') because of the [1, 2] and [2, 3] values in the list. I tried using the append and extend functions, but haven't come up with a solution yet.
expectation:
['Hallo', 42, [1,2], (3+2j), 'Hello', [2,3]]
def unique_list(a_list):
    a = set(a_list)
    print(a)

a_list = ['Hallo', 42, [1, 2], 42, 3 + 2j, 'Hallo', 'Hello', [1, 2], [2, 3], 3 + 2j, 42]
print(unique_list(a_list))
If the list contains unhashable elements, create a hashable key using repr that can be used with a set:
def unique_list(a_list):
    seen = set()
    for x in a_list:
        key = repr(x)
        if key not in seen:
            seen.add(key)
            print(x)
You can use a simple for loop that appends only new elements:
test_list = ['Hallo', 42, [1, 2], 42, 3 + 2j, 'Hallo', 'Hello', [1, 2], [2, 3], 3 + 2j, 42]
new_list = []
for item in test_list:
    if item not in new_list:
        new_list.append(item)
print(new_list)
# ['Hallo', 42, [1, 2], (3+2j), 'Hello', [2, 3]]
To get the unique items from a list of non-hashables, you can partition the list by equivalence. This is a quadratic method: compare each item to the first item of each existing partition, and if it equals none of them, create a new partition just for that item; then take the first item of each partition.
If some of the items are hashable, you can restrict the equivalence partitioning to just the non-hashables and feed the rest of the items through a set.
def partition(L):
    parts = []
    for item in L:
        for part in parts:
            if item == part[0]:
                part.append(item)
                break
        else:
            parts.append([item])
    return parts

def unique(L):
    return [p[0] for p in partition(L)]
Untested.
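For what it's worth, running the sketch above against the question's sample input does produce the expected result:

```python
def partition(L):
    parts = []
    for item in L:
        for part in parts:
            if item == part[0]:  # compare against the first item of each part
                part.append(item)
                break
        else:
            parts.append([item])  # no part matched: start a new one
    return parts

def unique(L):
    return [p[0] for p in partition(L)]

test_list = ['Hallo', 42, [1, 2], 42, 3 + 2j, 'Hallo', 'Hello',
             [1, 2], [2, 3], 3 + 2j, 42]
print(unique(test_list))
# ['Hallo', 42, [1, 2], (3+2j), 'Hello', [2, 3]]
```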
One approach that solves this in linear time is to serialize items with serializers such as pickle so that unhashable objects such as lists can be added to a set for de-duplication, but since sets are unordered in Python and you apparently want the output to be in the original insertion order, you can use dict.fromkeys instead:
import pickle
list(map(pickle.loads, dict.fromkeys(map(pickle.dumps, test_list))))
so that given your sample input, this returns:
['Hallo', 42, [1, 2], (3+2j), 'Hello', [2, 3]]
Note that if you're using Python 3.6 or earlier versions where key orders of dicts are not guaranteed, you can use collections.OrderedDict in place of dict.
You could do it with a regular loop that runs in O(n^2).
def unique_list(a_list):
    orig = a_list[:]  # shallow-copy the original list to avoid modifying it
    uniq = []         # start with an empty list as our result
    while len(orig) > 0:         # iterate through the original list
        uniq.append(orig[0])     # append each first remaining element to the result
        while uniq[-1] in orig:  # then remove all occurrences of that element
            orig.remove(uniq[-1])
    return uniq  # the unique elements, in order of first occurrence
There's also probably a way to finagle this into a list comprehension, which would be more elegant, but I can't figure it out at the moment. If every element were hashable, you could use the set method, which would be easier.

Removing duplicates from a list but returning the same list

I need to remove the duplicates from a list but return the same list.
So options like:
return list(set(list))
will not work for me, as it creates a new list instead.
def remove_extras(lst):
    for i in lst:
        if lst.count(i) > 1:
            lst.remove(i)
    return lst
Here is my code. It works for some cases, but I don't get why it does not work for remove_extras([1,1,1,1]): it returns [1,1] when the count for 1 should be > 1.
You can use slice assignment to replace the contents of the list after you have created a new list. In case order of the result doesn't matter you can use set:
def remove_duplicates(l):
    l[:] = set(l)
l = [1, 2, 1, 3, 2, 1]
remove_duplicates(l)
print(l)
Output:
[1, 2, 3]
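As for why the question's remove_extras returns [1, 1] for [1, 1, 1, 1]: lst.remove mutates the list while the for loop is iterating over it, so the loop's internal index skips past elements that slide left after each removal. A quick trace (reproducing the question's function):

```python
def remove_extras(lst):
    for i in lst:
        if lst.count(i) > 1:
            lst.remove(i)
    return lst

# [1, 1, 1, 1] -> remove() deletes index 0 -> [1, 1, 1], loop index advances to 1
# [1, 1, 1]    -> remove() deletes index 0 -> [1, 1],    loop index advances to 2
# index 2 is past the end of [1, 1], so the loop stops early
print(remove_extras([1, 1, 1, 1]))  # [1, 1]
```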
You can achieve this using OrderedDict, which removes the duplicates while maintaining the order of the list.
>>> from collections import OrderedDict
>>> itemList = [1, 2, 0, 1, 3, 2]
>>> itemList[:] = OrderedDict.fromkeys(itemList)
>>> itemList
[1, 2, 0, 3]
This runs in O(n) time.

Remove duplicates from list, including original matching item

I tried searching and couldn't find this exact situation, so apologies if it exists already.
I'm trying to remove duplicates from a list as well as the original item I'm searching for. If I have this:
ls = [1, 2, 3, 3]
I want to end up with this:
ls = [1, 2]
I know that using set will remove duplicates like this:
print set(ls) # set([1, 2, 3])
But it still retains that 3 element which I want removed. I'm wondering if there's a way to remove the duplicates and original matching items too.
Use a list comprehension and list.count:
>>> ls = [1, 2, 3, 3]
>>> [x for x in ls if ls.count(x) == 1]
[1, 2]
>>>
Here is a reference on both of those.
Edit:
@Anonymous made a good point below. The above solution is perfect for small lists but may become slow with larger ones.
For large lists, you can do this instead:
>>> from collections import Counter
>>> ls = [1, 2, 3, 3]
>>> c = Counter(ls)
>>> [x for x in ls if c[x] == 1]
[1, 2]
>>>
Here is a reference on collections.Counter.
If equal items are contiguous, then you can use groupby, which saves building an auxiliary data structure in memory:
from itertools import groupby, islice
data = [1, 2, 3, 3]
# could also use `sorted(data)` if need be...
new = [k for k, g in groupby(data) if len(list(islice(g, 2))) == 1]
# [1, 2]
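If equal items aren't adjacent, sort first, as the comment suggests. A sketch (the unsorted input below is made up for illustration):

```python
from itertools import groupby, islice

data = [3, 1, 2, 3]  # hypothetical unsorted input
# groupby only merges adjacent equal items, so sort before grouping
new = [k for k, g in groupby(sorted(data)) if len(list(islice(g, 2))) == 1]
print(new)  # [1, 2]
```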
