I want to write a program that checks for duplicate values and removes them.
So for example I want from this:
list = [2,6,8,2,9,8,8,5,2,2]
To get only this:
uniques = [6,9,5]
I can't seem to find a good way to compare every item with every other item to see whether they are equal and then remove them. Any help would be much appreciated!
Use collections.Counter:
>>> from collections import Counter
>>> lst = [2,6,8,2,9,8,8,5,2,2]
>>> c = Counter(lst)
>>> [x for x in c if c[x] == 1]
[6, 9, 5]
Unlike calling count or using in for each element, which is O(n²), this should run in O(n).
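To make the difference concrete, here is a minimal sketch that puts both approaches side by side. The function names are made up for illustration, and the Counter version reads the result from the original list so that the order of first appearance is explicit.

```python
from collections import Counter

def once_quadratic(items):
    # list.count() rescans the whole list for every element: O(n²) overall
    return [x for x in items if items.count(x) == 1]

def once_linear(items):
    # Counter tallies every value in a single pass: O(n) overall
    counts = Counter(items)
    return [x for x in items if counts[x] == 1]

lst = [2, 6, 8, 2, 9, 8, 8, 5, 2, 2]
print(once_quadratic(lst))  # [6, 9, 5]
print(once_linear(lst))     # [6, 9, 5]
```

Both return the same list; only the amount of work differs as the input grows.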
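If you want to drop the repeats but keep one copy of each value, in order of first appearance, use dict.fromkeys. Note that this keeps 2 and 8 as well, so it does not produce [6, 9, 5]:
lst = [2,6,8,2,9,8,8,5,2,2]
uniques = list(dict.fromkeys(lst))
print(uniques)  # [2, 6, 8, 9, 5]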
Given that you want a list of values that appear only once in your list, this should work. I used a hash map (a dict) to store the count of each value in the list and then added the keys that have a count of 1 to unique.
l = [2,6,8,2,9,8,8,5,2,2]
unique = []
count = {}
for i in l:
    if count.get(i) is None:
        count[i] = 1
    else:
        count[i] += 1
for i in count.keys():
    if count[i] == 1:
        unique.append(i)
print(unique)
Output
[6, 9, 5]
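def unique(x: list):
    a = set()
    for i in range(len(x)):
        # an element is unique if its first and last positions coincide
        if x.index(x[i]) == (len(x) - 1) - x[::-1].index(x[i]):
            a.add(x[i])
    return list(a)
print(unique([1, 2, 1, 11, 4, 4, 6, 3, 2]))
# e.g. [3, 11, 6] (a set is unordered, so the order may differ)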
list = [2,6,8,2,9,8,8,5,2,2]
uniques = []
for i in range(len(list)):
    if list.count(list[i])==1:
        uniques.append(list[i])
list = [2,6,8,2,9,8,8,5,2,2]
uniques = []
for i in list:
    if i not in uniques:
        uniques.append(i)
Related
I'm trying to create a new list of unique values and remove those values from the original list so that what's left are the duplicates. It appears my for loop is skipping over values.
array = [1,3,4,2,2,3,4]
def duplicates(array):
    mylist = []
    for item in array:
        if item not in mylist:
            mylist.append(item)
            array.remove(item)
    return mylist
results:
duplicates(array)
[1, 4, 2]
I think that using collections.Counter is more appropriate for this task:
array = [1, 3, 4, 2, 2, 3, 4]
from collections import Counter
def duplicates(array):
    return [n for n, c in Counter(array).items() if c > 1]
print(duplicates(array))
Output:
[3, 4, 2]
The issue is array.remove(item): it deletes an element from the list while the loop is still iterating over it. Every later element shifts one position to the left, but the loop's internal index keeps advancing, so the value that slid into the current position is skipped.
[1, 3, 4, 2, 2, 3, 4] -> before the 1st iteration, index 0 -> value 1
[3, 4, 2, 2, 3, 4] -> after the 1st iteration 1 has been removed, so index 0 -> value 3 (the loop never reads it because it has already visited index 0; it moves on to index 1 -> value 4)
Correct code to display values without duplicates:
array = [1,3,4,2,2,3,4]
def duplicates(array):
    mylist = []
    for item in array:
        if item not in mylist:
            mylist.append(item)
            #array.remove(item)
    return mylist
res=duplicates(array)
print (res)
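If the goal is what the question describes (first occurrences in one list, the leftover repeats in the original), a minimal sketch that avoids mutating the list during the loop could look like this. The name split_first_occurrences is hypothetical, and the slice assignment at the end is just one way to update the caller's list in place.

```python
def split_first_occurrences(array):
    firsts = []    # first occurrence of each value, in order
    leftover = []  # every later repeat, in order
    for item in array:
        if item not in firsts:
            firsts.append(item)
        else:
            leftover.append(item)
    array[:] = leftover  # update the caller's list only after the loop
    return firsts

array = [1, 3, 4, 2, 2, 3, 4]
print(split_first_occurrences(array))  # [1, 3, 4, 2]
print(array)                           # [2, 3, 4]
```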
You are removing values from the list you are iterating over, so your loop skips values. Try this:
array = [1,3,4,2,2,3,4]
def duplicates(array):
    mylist = []
    for i, item in enumerate(array):
        if item not in mylist:
            mylist.append(item)
            array[i] = None  # mark for removal instead of removing now
    array[:] = list(filter(
        lambda x: x is not None,
        array
    ))
    return mylist
You should clarify what you want to do with the array variable, though, as it is currently unclear.
array = [1,3,4,2,2,3,4]
def duplicates(array):
    mylist = []
    for item in array:
        if item not in mylist:
            mylist.append(item)
            array.remove(item)
        else:
            array.remove(item)
    return mylist
just remove the item that you don't append
You do not need a loop; a list comprehension is much clearer:
dups = list(set([l for l in array if array.count(l) > 1]))
However, the answer provided by kuco 23 does this appropriately with a loop.
It's a bit unclear what result you expect. If you want to get all unique values while maintaining order of occurrence, the canonical way to achieve this would be to use a collections.OrderedDict:
from collections import OrderedDict
def duplicates(array):
    return list(OrderedDict.fromkeys(array))
>>> duplicates(array)
[1, 3, 4, 2]
If you want to get a list of only the duplicates, i.e. values that occur more than once, you could use a collections.Counter:
from collections import Counter
def duplicates(array):
    return [k for k, v in Counter(array).items() if v > 1]
>>> duplicates(array)
[3, 4, 2]
I have two lists, and for each one I want to get the number of times each value occurs.
A = [1,1,2,2]
B = [3,3,3,3,7,7,7]
In the first list, the numbers 1 and 2 each appear twice. I want to find out how many times each number repeats in a list: for the first list that would be 2 for the number 1 and 2 for the number 2.
This is a job for collections.Counter
>>> from collections import Counter
>>> Counter([1,1,2,2])
Counter({1: 2, 2: 2})
>>> Counter([3,3,3,3,7,7,7])
Counter({3: 4, 7: 3})
A quick one-line solution that doesn't use collections.Counter:
A=[3,4,4,4,3,5,6,8,4,3]
duplicates=dict(set((x,A.count(x)) for x in filter(lambda rec : A.count(rec)>1,A)))
output:
{3: 3, 4: 4}
This solution doesn't account for "stretches" (runs of consecutive equal values), however.
You can simply iterate over your numbers and count identical ones - or use itertools.groupby:
def count_em(l):
    """Returns a list of lengths of consecutive equal numbers.
    Example: [1,2,3,4,4,4,3,3] ==> [1,1,1,3,2]"""
    if not isinstance(l,list):
        return None
    def count():
        """Counts equal elements, yields each count"""
        # set the first elem as current
        curr = [l[0]]
        # for the rest of elements
        for elem in l[1:]:
            if elem == curr[-1]:
                # append as long as the element is same as last one in curr
                curr.append(elem)
            else:
                # yield the number
                yield len(curr)
                # reset curr to count the new ones
                curr = [elem]
        # yield last group
        yield len(curr)
    # get all yields and return them as list
    return list(count())
def using_groupby(l):
    """Uses itertools.groupby and a list comp to get the lengths."""
    from itertools import groupby
    grp = groupby(l) # this groups by the elems themselves
    # count the grouped items and return as list
    return [ sum(1 for _ in items) for g,items in grp]
Test:
A = [1,1,2,2]
B = [3,3,3,3,7,7,7]
C = [1,1,2,2,2,1,1,1,1,1,6,6]
for e in [A,B,C]:
    print(count_em(e), using_groupby(e))
Output:
# count_em using_groupby Input
[2, 2] [2, 2] # [1,1,2,2]
[4, 3] [4, 3] # [3,3,3,3,7,7,7]
[2, 3, 5, 2] [2, 3, 5, 2] # [1,1,2,2,2,1,1,1,1,1,6,6]
I am trying to create a function that receives a list and return another list with the repeated elements.
For example, for the input A = [2,2,1,1,3,2] (the list is not sorted), the function would return result = [[1,1], [2,2,2]]. The result doesn't need to be sorted.
I already did it in Wolfram Mathematica, but now I have to translate it to Python 3. Mathematica has functions like Select, Map and Split that make it very simple, without long loops full of instructions.
result = [[x] * A.count(x) for x in set(A) if A.count(x) > 1]
Simple approach:
def grpBySameConsecutiveItem(l):
    rv = []
    last = None
    for elem in l:
        if last is None:
            last = [elem]
            continue
        if elem == last[0]:
            last.append(elem)
            continue
        if len(last) > 1:
            rv.append(last)
        last = [elem]
    # also keep a run that ends the list
    if last is not None and len(last) > 1:
        rv.append(last)
    return rv
print(grpBySameConsecutiveItem([1,2,1,1,1,2,2,3,4,4,4,4,5,4]))
Output:
[[1, 1, 1], [2, 2], [4, 4, 4, 4]]
You can sort your output afterwards if you want it sorted. You could also sort your input list first, but then equal values would all become adjacent and you would no longer be grouping only the originally consecutive ones.
See this https://stackoverflow.com/a/4174955/7505395 for how to sort lists of lists by an index (just use 0), as all elements of each inner list are identical.
You could also use itertools. It has things like takewhile, which look much smarter if used.
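As one possible illustration of the itertools route, here is a minimal sketch that uses itertools.groupby (rather than takewhile) to collect runs of consecutive equal values. The function name is made up for this example.

```python
from itertools import groupby

def runs_longer_than_one(items):
    # groupby yields one group per run of consecutive equal values
    runs = (list(group) for _, group in groupby(items))
    return [run for run in runs if len(run) > 1]

print(runs_longer_than_one([1, 2, 1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 5, 4]))
# [[1, 1, 1], [2, 2], [4, 4, 4, 4]]
```

Because groupby only looks at neighbours, the trailing 4 stays out of the result, just as in the loop version.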
This will ignore consecutive ones, and just collect them all:
def grpByValue(lis):
    d = {}
    for key in lis:
        if key in d:
            d[key] += 1
        else:
            d[key] = 1
    rv = []
    for k in d:
        if (d[k]<2):
            continue
        rv.append([])
        for n in range(0,d[k]):
            rv[-1].append(k)
    return rv
data = [1,2,1,1,1,2,2,3,4,4,4,4,5,4]
print(grpByValue(data))
Output:
[[1, 1, 1, 1], [2, 2, 2], [4, 4, 4, 4, 4]]
You could do this with a list comprehension:
A = [1,1,1,2,2,3,3,3]
B = []
[B.append([n]*A.count(n)) for n in A if B.count([n]*A.count(n)) == 0]
outputs [[1,1,1],[2,2],[3,3,3]]
Or more pythonically:
A = [1,2,2,3,4,1,1,2,2,2,3,3,4,4,4]
B = []
for n in A:
    if B.count([n]*A.count(n)) == 0:
        B.append([n]*A.count(n))
outputs [[1,1,1],[2,2,2,2,2],[3,3,3],[4,4,4,4]]
This works with a sorted or an unsorted list. If you need the list sorted beforehand, you can use for n in sorted(A).
This is a job for Counter(). Iterating over each element x and checking A.count(x) has O(N^2) complexity. Counter() counts how many times each element occurs in your iterable in one pass, and you can then generate your result by iterating over that dictionary.
>>> from collections import Counter
>>> A = [2,2,1,1,3,2]
>>> counts = Counter(A)
>>> result = [[key] * value for key, value in counts.items() if value > 1]
>>> result
[[2, 2, 2], [1, 1]]
This function takes an integer list (which your function must not modify) of unsorted values and returns a sorted list of all the duplicates in that first list. For example, duplicates([1, 3, 5, 7, 9, 5, 3, 5, 3]) would return [3, 5]. If there are no duplicates, return an empty list.
Following is my current code, which doesn't work. How can I solve this problem?
def FindDuplicates(in_list):
    unique = set(in_list)
    for each in unique:
        count = in_list.count(each)
        if count > 1:
            print(count)
            return True
    print([])
    return False
For doing this without a Counter/dict, you can modify your current code to save the duplicate values in a new list, like below:
def FindDuplicates(in_list):
    duplicates = []
    unique = set(in_list)
    for each in unique:
        count = in_list.count(each)
        if count > 1:
            duplicates.append(each)
    print(duplicates)
which will output:
>>> FindDuplicates(lst)
[3, 5]
If you need sorted results, use a sorted(duplicates) call at the end to get results sorted by their values.
You can also solve this (i.e. finding duplicates in a list) using collections.Counter and a list comprehension, as below:
>>> from collections import Counter
>>> lst = [1, 3, 5, 7, 9, 5, 3, 5, 3]
>>> def duplicates(list_of_numbers):
...     counter = Counter(list_of_numbers)
...     return [y for y in counter if counter[y] > 1]
...
>>> duplicates(lst)
[3, 5]
The above solution assumes the elements of the list are hashable.
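If the elements are not hashable (lists of lists, for example), Counter and set are not available. A minimal sketch of a fallback that relies only on equality comparisons is shown below. It is O(n²), and the function name is hypothetical.

```python
def duplicates_by_equality(items):
    seen = []  # values met so far
    dups = []  # values met more than once, in order of their second appearance
    for item in items:
        if item in seen:
            if item not in dups:
                dups.append(item)
        else:
            seen.append(item)
    return dups

print(duplicates_by_equality([[1], [2], [1], [3], [2], [1]]))  # [[1], [2]]
```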
Slightly faster, at O(nlogn) worst-case
def FindDuplicates(in_list):
    unique = set()
    duplicates = set()
    for i in in_list:
        if i in unique: #hey, I've seen you before
            duplicates.add(i)
        else:
            unique.add(i)
    return sorted(duplicates)
    #It's this call to sorted that makes it O(nlogn)
    #without it, it'd be O(n)
Also the "Man, you can't stop me from using a counter!" variant. Also O(nlogn).
def FindDuplicates(in_list):
    d = {}
    for i in in_list:
        if i in d:
            d[i] += 1
        else:
            d[i] = 1
    return sorted(i for i,j in d.items() if j > 1)
#python2 use d.iteritems() not d.items()
(I suppose
if i not in d:
    d[i] = 1
else:
    d[i] += 1
makes more sense to most people, and it'd work too.)
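A third spelling of the same counting step, sketched here with dict.get and a default of 0, avoids the if/else altogether. The snake_case names are only for this illustration.

```python
def find_duplicates(in_list):
    counts = {}
    for value in in_list:
        # get() returns 0 for a value that has not been seen yet
        counts[value] = counts.get(value, 0) + 1
    return sorted(value for value, n in counts.items() if n > 1)

print(find_duplicates([1, 3, 5, 7, 9, 5, 3, 5, 3]))  # [3, 5]
```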
I want to get the number of times x appears in the nested list.
if the list is:
list = [1, 2, 1, 1, 4]
list.count(1)
>>3
This is OK. But if the list is:
list = [[1, 2, 3],[1, 1, 1]]
How can I get the number of times 1 appears? In this case, 4.
>>> L = [[1, 2, 3], [1, 1, 1]]
>>> sum(x.count(1) for x in L)
4
The itertools and collections modules have just what you need: flatten the nested lists with itertools.chain and count with collections.Counter.
import itertools, collections
data = [[1,2,3],[1,1,1]]
counter = collections.Counter(itertools.chain(*data))
print(counter[1])
Use a recursive flatten function instead of itertools.chain to flatten nested lists of arbitrary depth:
import operator
from collections.abc import Sequence
from functools import reduce
def flatten(lst):
    # strings are Sequences too, so this assumes the data contains no str elements
    return reduce(operator.iadd, (flatten(i) if isinstance(i, Sequence) else [i] for i in lst), [])
reduce with operator.iadd is used instead of sum so that the flattened list is built only once and updated in place.
Here is yet another approach to flattening a nested sequence. Once the sequence is flattened, counting items is easy.
def flatten(seq, container=None):
    if container is None:
        container = []
    for s in seq:
        if isinstance(s, str):
            # strings are iterable too; treat them as single items
            container.append(s)
            continue
        try:
            iter(s)  # check if it's iterable
        except TypeError:
            container.append(s)
        else:
            flatten(s, container)
    return container
c = flatten([(1,2),(3,4),(5,[6,7,['a','b']]),['c','d',('e',['f','g','h'])]])
print(c)
print(c.count('g'))
d = flatten([[[1,(1,),((1,(1,))), [1,[1,[1,[1]]]], 1, [1, [1, (1,)]]]]])
print(d)
print(d.count(1))
The above code prints:
[1, 2, 3, 4, 5, 6, 7, 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
1
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
12
Try this:
from functools import reduce  # needed in Python 3
reduce(lambda x,y: x+y,list,[]).count(1)
Basically, you start with an empty list [] and add each element of the list list to it. In this case the elements are lists themselves and you get a flattened list.
PS: Just got downvoted for a similar answer in another question!
PPS: Just got downvoted for this solution as well!
If there is only one level of nesting, flattening can be done with this list comprehension:
>>> L = [[1,2,3],[1,1,1]]
>>> [ item for sublist in L for item in sublist ].count(1)
4
>>>
For the heck of it: count to any arbitrary nesting depth, handling tuples, lists and arguments:
hits = lambda num, *n: ((1 if e == num else 0)
    for a in n
        for e in (hits(num, *a) if isinstance(a, (tuple, list)) else (a,)))
lst = [[[1,(1,),((1,(1,))), [1,[1,[1,[1]]]], 1, [1, [1, (1,)]]]]]
print(sum(hits(1, lst, 1, 1, 1)))
15
def nested_count(lst, x):
    return lst.count(x) + sum(
        nested_count(l,x) for l in lst if isinstance(l,list))
This function returns the number of occurrences, plus the recursive nested count in all contained sub-lists.
>>> data = [[1,2,3],[1,1,[1,1]]]
>>> print(nested_count(data, 1))
5
The following function will flatten lists of lists of any depth(a) by adding non-lists to the resultant output list, and recursively processing lists:
def flatten(listOrItem, result = None):
    if result is None: result = []      # Ensure initial result empty.
    if type(listOrItem) != type([]):    # Handle non-list by appending.
        result.append(listOrItem)
    else:
        for item in listOrItem:         # Recursively handle each item in a list.
            flatten(item, result)
    return result                       # Return flattened container.
mylist = flatten([[1,2],[3,'a'],[5,[6,7,[8,9]]],[10,'a',[11,[12,13,14]]]])
print(f'Flat list is {mylist}, count of "a" is {mylist.count("a")}')
print(flatten(7))
Once you have a flattened list, it's a simple matter to use count on it.
The output of that code is:
Flat list is [1, 2, 3, 'a', 5, 6, 7, 8, 9, 10, 'a', 11, 12, 13, 14], count of "a" is 2
[7]
Note the behaviour if you don't pass an actual list: it assumes you want a list regardless, one containing just the single item.
If you don't want to construct a flattened list, you can just use a similar method to get the count of any item in the list of lists, with something like:
def deepCount(listOrItem, searchFor):
    if type(listOrItem) != type([]):    # Non-list, one only if equal.
        return 1 if listOrItem == searchFor else 0
    subCount = 0                        # List, recursively collect each count.
    for item in listOrItem:
        subCount += deepCount(item, searchFor)
    return subCount
deepList = [[1,2],[3,'a'],[5,[6,7,[8,9]]],[10,'a',[11,[12,13,14]]]]
print(f'Count of "a" is {deepCount(deepList, "a")}')
print(f'Count of 13 is {deepCount(deepList, 13)}')
print(f'Count of 99 is {deepCount(deepList, 99)}')
As expected, the output of this is:
Count of "a" is 2
Count of 13 is 1
Count of 99 is 0
(a) Up to the limits imposed by Python itself of course, limits you can increase by just adding this to the top of your code:
import sys
sys.setrecursionlimit(1001) # I believe default is 1000.
I mention that just in case you have some spectacularly deeply nested structures but you shouldn't really need it. If you're nesting that deeply then you're probably doing something wrong :-)
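If raising the recursion limit feels uncomfortable, the same count can be done without recursion at all. The following is a minimal sketch that uses an explicit stack; deep_count_iterative is a hypothetical name, and like deepCount it only descends into real lists.

```python
def deep_count_iterative(list_or_item, search_for):
    count = 0
    stack = [list_or_item]          # items still waiting to be examined
    while stack:
        current = stack.pop()
        if isinstance(current, list):
            stack.extend(current)   # postpone the children instead of recursing
        elif current == search_for:
            count += 1
    return count

deep_list = [[1, 2], [3, 'a'], [5, [6, 7, [8, 9]]], [10, 'a', [11, [12, 13, 14]]]]
print(deep_count_iterative(deep_list, 'a'))  # 2

# Nest far deeper than the default recursion limit would allow
very_deep = [1]
for _ in range(5000):
    very_deep = [very_deep, 1]
print(deep_count_iterative(very_deep, 1))  # 5001
```

The stack grows on the heap rather than on the call stack, so nesting depth is limited only by available memory.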