Group repeated elements of a list - python

I am trying to create a function that receives a list and returns another list with the repeated elements.
For example, for the input A = [2,2,1,1,3,2] (the list is not sorted), the function would return result = [[1,1], [2,2,2]]. The result doesn't need to be sorted.
I already did it in Wolfram Mathematica, but now I have to translate it to Python 3. Mathematica has functions like Select, Map and Split that make it very simple without long loops with a lot of instructions.

result = [[x] * A.count(x) for x in set(A) if A.count(x) > 1]

Simple approach:
def grpBySameConsecutiveItem(l):
    rv = []
    last = None
    for elem in l:
        if last is None:
            last = [elem]
            continue
        if elem == last[0]:
            last.append(elem)
            continue
        if len(last) > 1:
            rv.append(last)
        last = [elem]
    # also keep the final run if it contains repeats
    if last is not None and len(last) > 1:
        rv.append(last)
    return rv
print(grpBySameConsecutiveItem([1,2,1,1,1,2,2,3,4,4,4,4,5,4]))
Output:
[[1, 1, 1], [2, 2], [4, 4, 4, 4]]
You can sort your output afterwards if you want it sorted. You can also sort your input list first, but then all equal numbers become adjacent and you no longer get only the runs that were consecutive in the original list.
See this https://stackoverflow.com/a/4174955/7505395 for how to sort lists of lists depending on an index (just use 0), as all elements within each of your inner lists are identical.
You could also use itertools, which has things like takewhile and groupby; that looks much smarter if used.
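As a rough illustration of the itertools route mentioned above, here is a minimal sketch that uses itertools.groupby to collect runs of equal adjacent items. The function name group_consecutive_repeats is made up for this example and is not part of the original answer.

```python
from itertools import groupby

def group_consecutive_repeats(items):
    """Return the runs of equal adjacent items that are longer than one element."""
    # groupby yields (key, group) pairs for each stretch of equal neighbours
    runs = (list(group) for _, group in groupby(items))
    return [run for run in runs if len(run) > 1]

print(group_consecutive_repeats([1, 2, 1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 5, 4]))
# [[1, 1, 1], [2, 2], [4, 4, 4, 4]]
```

Like the loop version, this only groups neighbours, so the same value can show up in more than one run.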
This will ignore consecutive ones, and just collect them all:
def grpByValue(lis):
    # count how often each value occurs
    d = {}
    for key in lis:
        if key in d:
            d[key] += 1
        else:
            d[key] = 1
    rv = []
    for k in d:
        if d[k] < 2:
            continue
        rv.append([])
        for n in range(0, d[k]):
            rv[-1].append(k)
    return rv
data = [1,2,1,1,1,2,2,3,4,4,4,4,5,4]
print(grpByValue(data))
Output:
[[1, 1, 1, 1], [2, 2, 2], [4, 4, 4, 4, 4]]

You could do this with a list comprehension:
A = [1,1,1,2,2,3,3,3]
B = []
[B.append([n]*A.count(n)) for n in A if B.count([n]*A.count(n)) == 0]
outputs [[1,1,1],[2,2],[3,3,3]]
Or more pythonically:
A = [1,2,2,3,4,1,1,2,2,2,3,3,4,4,4]
B = []
for n in A:
    if B.count([n]*A.count(n)) == 0:
        B.append([n]*A.count(n))
outputs [[1,1,1],[2,2,2,2,2],[3,3,3],[4,4,4,4]]
Works with a sorted or unsorted list. If you need to sort the list beforehand, you can use for n in sorted(A).

This is a job for Counter(). Iterating over each element, x, and checking A.count(x) has a O(N^2) complexity. Counter() will count how many times each element exists in your iterable in one pass and then you can generate your result by iterating over that dictionary.
>>> from collections import Counter
>>> A = [2,2,1,1,3,2]
>>> counts = Counter(A)
>>> result = [[key] * value for key, value in counts.items() if value > 1]
>>> result
[[2, 2, 2], [1, 1]]

Related

How to get a list with only unique numbers? (Python)

I want to write a program that checks for duplicate values and removes them.
So for example I want from this:
list = [2,6,8,2,9,8,8,5,2,2]
To get only this:
uniques = [6,9,5]
I can't seem to find a good way to compare every item with each other to see if they are equal and then remove them. Any help will be very appreciated!
Use collections.Counter:
>>> from collections import Counter
>>> lst = [2,6,8,2,9,8,8,5,2,2]
>>> c = Counter(lst)
>>> [x for x in c if c[x] == 1]
[6, 9, 5]
Unlike using count or in for each element, this should be O(n) instead of O(n²).
If you want to preserve the order of the elements too, you can use dict.fromkeys. Note that this keeps one copy of every value rather than dropping the repeated ones:
uniques = list(dict.fromkeys(lst))
print(uniques)
Given that you want a list of values that appear only once in your list, this should work. I used a hashmap to store the count of each value in the list and then added the keys that have a count of 1 to unique.
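If the goal is the one stated in the question (keep only the values that occur exactly once, in their original order), a minimal sketch could combine Counter with a pass over the original list. The helper name appearing_once is an assumption made for this example.

```python
from collections import Counter

def appearing_once(values):
    """Return the values that occur exactly once, in their original order."""
    counts = Counter(values)  # one pass to count every value
    return [v for v in values if counts[v] == 1]

print(appearing_once([2, 6, 8, 2, 9, 8, 8, 5, 2, 2]))
# [6, 9, 5]
```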
l = [2,6,8,2,9,8,8,5,2,2]
unique = []
count = {}
for i in l:
    if count.get(i) is None:
        count[i] = 1
    else:
        count[i] += 1
for i in count.keys():
    if count[i] == 1:
        unique.append(i)
print(unique)
Output
[6, 9, 5]
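def unique(x: list):
    a = set()
    for i in range(len(x)):
        # a value is unique if its first and last positions coincide
        if x.index(x[i]) == (len(x) - 1) - x[::-1].index(x[i]):
            a.add(x[i])
    return list(a)
print(unique([1, 2, 1, 11, 4, 4, 6, 3, 2]))
# [3, 11, 6] (a set is unordered, so the order may differ)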
list = [2,6,8,2,9,8,8,5,2,2]
uniques = []
for i in range(len(list)):
    if list.count(list[i]) == 1:
        uniques.append(list[i])
list = [2,6,8,2,9,8,8,5,2,2]
uniques = []
for i in list:
    if i not in uniques:
        uniques.append(i)

Python: Duplicates in list

I'm trying to create a new list of unique values and remove said values from the original list so that what's left is the duplicates. It appears my for loop is skipping over values.
array = [1,3,4,2,2,3,4]
def duplicates(array):
    mylist = []
    for item in array:
        if item not in mylist:
            mylist.append(item)
            array.remove(item)
    return mylist
results:
duplicates(array)
[1, 4, 2]
I think that using collections.Counter is more appropriate for this task:
array = [1, 3, 4, 2, 2, 3, 4]
from collections import Counter
def duplicates(array):
    return [n for n, c in Counter(array).items() if c > 1]
print(duplicates(array))
Output:
[3, 4, 2]
The issue is with array.remove(item): it deletes an element while the loop is iterating over the same list. Every later element shifts one position to the left, so the loop's next index skips the value that just moved into the current position.
[1, 3, 4, 2, 2, 3, 4] -> before the 1st iteration, index 0 -> value = 1
[3, 4, 2, 2, 3, 4] -> after the 1st iteration 1 is removed, so index 0 -> value = 3 (the loop does not read it because it has already visited index 0; it reads index 1 -> value 4)
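To make the skipping visible, here is a small sketch that records which items the loop actually visits while removing, and then shows one way to split a list into first occurrences and leftover duplicates without mutating it during iteration. Both function names are invented for this illustration and do not come from the answers.

```python
def visited_while_removing(values):
    """Record which items a for loop sees when each visited item is removed."""
    items = list(values)  # work on a copy so the caller's list is untouched
    seen = []
    for item in items:
        seen.append(item)
        items.remove(item)  # shifts later items left, so the next one is skipped
    return seen

def split_duplicates(values):
    """Return (first occurrences, leftover duplicates) without mutating values."""
    seen = set()
    firsts, rest = [], []
    for item in values:
        if item in seen:
            rest.append(item)
        else:
            seen.add(item)
            firsts.append(item)
    return firsts, rest

print(visited_while_removing([1, 3, 4, 2, 2, 3, 4]))  # [1, 4, 2, 4]
print(split_duplicates([1, 3, 4, 2, 2, 3, 4]))        # ([1, 3, 4, 2], [2, 3, 4])
```

The first result explains the [1, 4, 2] from the question: only 1, 4, 2 and 4 are ever visited, and the second 4 is already in mylist.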
Correct code to display values without duplicates:
array = [1,3,4,2,2,3,4]
def duplicates(array):
    mylist = []
    for item in array:
        if item not in mylist:
            mylist.append(item)
            #array.remove(item)
    return mylist
res = duplicates(array)
print(res)
You are removing values from the list you are iterating through, so your loop is skipping values. Try this:
array = [1,3,4,2,2,3,4]
def duplicates(array):
    mylist = []
    for i, item in enumerate(array):
        if item not in mylist:
            mylist.append(item)
            array[i] = None
    array[:] = list(filter(
        lambda x: x is not None,
        array
    ))
    return mylist
Though you should clarify what you want to do with the array variable, as it is currently unclear.
array = [1,3,4,2,2,3,4]
def duplicates(array):
    mylist = []
    for item in array:
        if item not in mylist:
            mylist.append(item)
            array.remove(item)
        else:
            array.remove(item)
    return mylist
Just remove the item that you don't append.
You do not need to use a loop; it is much clearer to use a list comprehension:
dups = list(set([l for l in array if array.count(l) > 1]))
However, the answer provided by kuco 23 does this appropriately with a loop.
It is a bit unclear what result you expect. If you want to get all unique values while maintaining order of occurrence, the canonical way to achieve this would be to use a collections.OrderedDict:
from collections import OrderedDict
def duplicates(array):
    return list(OrderedDict.fromkeys(array))
>>> duplicates(array)
[1, 3, 4, 2]
If you want to get a list of only duplicates, i.e. values that occur more than once, you could use a collections.Counter:
from collections import Counter
def duplicates(array):
    return [k for k, v in Counter(array).items() if v > 1]
>>> duplicates(array)
[3, 4, 2]

Return len of similar values in list

I have two lists, and I want to return the len() of similar values in each list.
A = [1,1,2,2]
B = [3,3,3,3,7,7,7]
In the first list, the numbers 1 and 2 each appear twice. I want to get how many times each number repeats in the list; in that case it will be 2 for number 1 and 2 for number 2.
This is a job for collections.Counter
>>> from collections import Counter
>>> Counter([1,1,2,2])
Counter({1: 2, 2: 2})
>>> Counter([3,3,3,3,7,7,7])
Counter({3: 4, 7: 3})
Quick one single line solution that doesn't use collections counter.
A=[3,4,4,4,3,5,6,8,4,3]
duplicates=dict(set((x,A.count(x)) for x in filter(lambda rec : A.count(rec)>1,A)))
output:
{3: 3, 4: 4}
This solution doesn't account for "stretches" however
You can simply iterate over your numbers and count identical ones - or use itertools.groupby:
def count_em(l):
    """Returns a list of lengths of consecutive equal numbers.
    Example: [1,2,3,4,4,4,3,3] ==> [1,1,1,3,2]"""
    if not isinstance(l, list):
        return None
    def count():
        """Counts equal elements, yields each count"""
        if not l:
            return  # nothing to count in an empty list
        # set the first elem as current
        curr = [l[0]]
        # for the rest of elements
        for elem in l[1:]:
            if elem == curr[-1]:
                # append as long as the element is same as last one in curr
                curr.append(elem)
            else:
                # yield the number
                yield len(curr)
                # reset curr to count the new ones
                curr = [elem]
        # yield last group
        yield len(curr)
    # get all yields and return them as list
    return list(count())
def using_groupby(l):
    """Uses itertools.groupby and a list comp to get the lengths."""
    from itertools import groupby
    grp = groupby(l)  # this groups by the elems themselves
    # count the grouped items and return as list
    return [sum(1 for _ in items) for g, items in grp]
Test:
A = [1,1,2,2]
B = [3,3,3,3,7,7,7]
C = [1,1,2,2,2,1,1,1,1,1,6,6]
for e in [A,B,C]:
    print(count_em(e), using_groupby(e))
Output:
# count_em using_groupby Input
[2, 2] [2, 2] # [1,1,2,2]
[4, 3] [4, 3] # [3,3,3,3,7,7,7]
[2, 3, 5, 2] [2, 3, 5, 2] # [1,1,2,2,2,1,1,1,1,1,6,6]
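If the value belonging to each stretch is needed alongside its length, a minimal variation on the groupby approach could return pairs instead of bare counts. The name run_lengths is an assumption for this sketch.

```python
from itertools import groupby

def run_lengths(items):
    """Return (value, run_length) pairs for each stretch of equal adjacent items."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(items)]

print(run_lengths([1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 6, 6]))
# [(1, 2), (2, 3), (1, 5), (6, 2)]
```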

Picking the most common element from a bunch of lists

I have a list l of lists [l1, ..., ln] of equal length.
I want to compare l1[k], l2[k], ..., ln[k] for all k in range(len(l1)) and make another list l0 by picking the element that appears most frequently.
So, if l1 = [1, 2, 3], l2 = [1, 4, 4] and l3 = [0, 2, 4], then l0 = [1, 2, 4]. If there is a tie, I will look at the lists that make up the tie and choose the value from the list with higher priority. Priority is given a priori; each list is given a priority.
E.g. if you have value 1 in lists l1 and l3, value 2 in lists l2 and l4, and 3 in l5, and the lists are ordered according to priority, say l5>l2>l3>l1>l4, then I will pick 2, because 2 is among the values with the highest occurrence and l2, which contains it, has higher priority than l1 and l3.
How do I do this in python without creating a for loop with lots of if/else conditions?
You can use the Counter class from the collections module. Using the map function will reduce your list looping. You will need an if/else statement for the case where there is no single most frequent value, but only for that:
import collections
list0 = []
list_length = len(your_lists[0])
for k in range(list_length):
    k_vals = list(map(lambda x: x[k], your_lists))  # collect all values at position k
    counts = collections.Counter(k_vals).most_common()  # (value, count) tuples sorted by count
    if len(counts) == 1 or counts[0][1] > counts[1][1]:  # is there a single most common value?
        list0.append(counts[0][0])  # take the value with the highest count
    else:
        list0.append(k_vals[0])  # tie: take the element from the first list
list0 is the answer you are looking for. I just hate using l because it's easy to confuse with the number 1
Edit (based on comments):
Incorporating your comments, instead of the if/else statement, use a while loop:
i = len(k_vals) - 1
while len(counts) > 1 and counts[0][1] == counts[1][1]:
    counts = collections.Counter(k_vals[:i]).most_common()  # ignore the lowest-priority element
    i -= 1  # go back farther if there's still a tie
list0.append(counts[0][0])  # take the value with the highest count once there's no tie
So the whole thing is now:
import collections
list0 = []
list_length = len(your_lists[0])
for k in range(list_length):
    k_vals = list(map(lambda x: x[k], your_lists))  # collect all values at position k
    counts = collections.Counter(k_vals).most_common()  # (value, count) tuples sorted by count
    i = len(k_vals) - 1
    while len(counts) > 1 and counts[0][1] == counts[1][1]:  # in case of a tie
        counts = collections.Counter(k_vals[:i]).most_common()  # ignore the lowest-priority element
        i -= 1  # go back farther if there's still a tie
    list0.append(counts[0][0])  # take the value with the highest count
You throw in one more tiny loop but on the bright side there's no if/else statements at all!
Just transpose the sublists and get the Counter.most_common element key from each group:
from collections import Counter
lists = [[1, 2, 3],[1, 4, 4],[0, 2, 4]]
print([Counter(sub).most_common(1)[0][0] for sub in zip(*lists)])
If they are individual lists just zip those:
l1, l2, l3 = [1, 2, 3], [1, 4, 4], [0, 2, 4]
print([Counter(sub).most_common(1)[0][0] for sub in zip(l1,l2,l3)])
Not sure how taking the first element from the grouping if there is a tie makes sense as it may not be the one that tied but that is trivial to implement, just get the two most_common and check if their counts are equal:
def most_cm(lists):
    for sub in zip(*lists):
        # get two most frequent
        comm = Counter(sub).most_common(2)
        # if their counts are equal just return the ele from l1
        yield comm[0][0] if len(comm) == 1 or comm[0][1] != comm[1][1] else sub[0]
We also need if len(comm) == 1 in case all the elements are the same or we will get an IndexError.
If you are talking about taking the element that comes from the earlier list in the event of a tie i.e l2 comes before l5 then that is just the same as taking any of the elements that tie.
For a decent number of sublists:
In [61]: lis = [[randint(1,10000) for _ in range(10)] for _ in range(100000)]
In [62]: list(most_cm(lis))
Out[62]: [5856, 9104, 1245, 4304, 829, 8214, 9496, 9182, 8233, 7482]
In [63]: timeit list(most_cm(lis))
1 loops, best of 3: 249 ms per loop
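The priority rule from the question (on a tie, prefer the value that appears in the highest-priority list) can be sketched as follows. This is only one reading of the requirement: it assumes the lists are passed in ordered from highest to lowest priority, and the name most_common_with_priority is invented for this example.

```python
from collections import Counter

def most_common_with_priority(lists):
    """Pick the most frequent value per position; break ties by list priority.

    Assumption: `lists` is ordered from highest to lowest priority.
    """
    result = []
    for column in zip(*lists):
        counts = Counter(column)
        best = max(counts.values())
        # column follows priority order, so the first value that reaches
        # the top count comes from the highest-priority list
        result.append(next(v for v in column if counts[v] == best))
    return result

print(most_common_with_priority([[1, 2, 3], [1, 4, 4], [0, 2, 4]]))
# [1, 2, 4]
```

For the tie example in the question, with priority l5>l2>l3>l1>l4 the values at one position are 3, 2, 1, 1, 2 in priority order, and the sketch picks 2.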
Solution is:
a = [1, 2, 3]
b = [1, 4, 4]
c = [0, 2, 4]
print([max(set(element), key=element.count) for element in zip(a, b, c)])
That's what you're looking for:
from collections import Counter
from operator import itemgetter
l0 = [max(Counter(li).items(), key=itemgetter(1))[0] for li in zip(*l)]
If you are OK taking any one of a set of elements that are tied as most common, and you can guarantee that you won't hit an empty list within your list of lists, then here is a way using Counter (so, from collections import Counter):
l = [[1, 0, 2, 3, 4, 7, 8],
     [2, 0, 2, 1, 0, 7, 1],
     [2, 0, 1, 4, 0, 1, 8]]
res = []
for k in range(len(l[0])):
    res.append(Counter(lst[k] for lst in l).most_common()[0][0])
Doing this in IPython and printing the result:
In [86]: res
Out[86]: [2, 0, 2, 1, 0, 7, 8]
Try this:
l1 = [1,2,3]
l2 = [1,4,4]
l3 = [0,2,4]
lists = [l1, l2, l3]
print([max(set(x), key=x.count) for x in zip(*lists)])

Nested List and count()

I want to get the number of times x appears in the nested list.
if the list is:
list = [1, 2, 1, 1, 4]
list.count(1)
>>3
This is OK. But if the list is:
list = [[1, 2, 3],[1, 1, 1]]
How can I get the number of times 1 appears? In this case, 4.
>>> L = [[1, 2, 3], [1, 1, 1]]
>>> sum(x.count(1) for x in L)
4
The itertools and collections modules have just the stuff you need (flatten the nested lists with itertools.chain and count with collections.Counter):
import itertools, collections
data = [[1,2,3],[1,1,1]]
counter = collections.Counter(itertools.chain(*data))
print(counter[1])
Use a recursive flatten function instead of itertools.chain to flatten nested lists of arbitrary depth:
import operator
from collections.abc import Sequence
from functools import reduce
def flatten(lst):
    # strings are sequences too, so treat them as single items
    return reduce(operator.iadd, (flatten(i) if isinstance(i, Sequence) and not isinstance(i, str) else [i] for i in lst), [])
reduce with operator.iadd is used instead of sum so that the flattened list is built only once and updated in place.
Here is yet another approach to flatten a nested sequence. Once the sequence is flattened it is an easy check to find count of items.
def flatten(seq, container=None):
    if container is None:
        container = []
    for s in seq:
        try:
            if isinstance(s, str):  # treat strings as single items, not iterables
                raise TypeError
            iter(s)  # check if it's iterable
        except TypeError:
            container.append(s)
        else:
            flatten(s, container)
    return container
c = flatten([(1,2),(3,4),(5,[6,7,['a','b']]),['c','d',('e',['f','g','h'])]])
print(c)
print(c.count('g'))
d = flatten([[[1,(1,),((1,(1,))), [1,[1,[1,[1]]]], 1, [1, [1, (1,)]]]]])
print(d)
print(d.count(1))
The above code prints:
[1, 2, 3, 4, 5, 6, 7, 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
1
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
12
Try this:
from functools import reduce  # needed on Python 3
reduce(lambda x, y: x + y, list, []).count(1)
Basically, you start with an empty list [] and add each element of the list list to it. In this case the elements are lists themselves and you get a flattened list.
PS: Just got downvoted for a similar answer in another question!
PPS: Just got downvoted for this solution as well!
If there is only one level of nesting, flattening can be done with this list comprehension:
>>> L = [[1,2,3],[1,1,1]]
>>> [ item for sublist in L for item in sublist ].count(1)
4
>>>
For the heck of it: count to any arbitrary nesting depth, handling tuples, lists and arguments:
hits = lambda num, *n: ((1 if e == num else 0)
    for a in n
        for e in (hits(num, *a) if isinstance(a, (tuple, list)) else (a,)))
lst = [[[1,(1,),((1,(1,))), [1,[1,[1,[1]]]], 1, [1, [1, (1,)]]]]]
print(sum(hits(1, lst, 1, 1, 1)))
15
def nested_count(lst, x):
    return lst.count(x) + sum(
        nested_count(l, x) for l in lst if isinstance(l, list))
This function returns the number of occurrences, plus the recursive nested count in all contained sub-lists.
>>> data = [[1,2,3],[1,1,[1,1]]]
>>> print(nested_count(data, 1))
5
The following function will flatten lists of lists of any depth(a) by adding non-lists to the resultant output list, and recursively processing lists:
def flatten(listOrItem, result = None):
    if result is None: result = []      # Ensure initial result empty.
    if type(listOrItem) != type([]):    # Handle non-list by appending.
        result.append(listOrItem)
    else:
        for item in listOrItem:         # Recursively handle each item in a list.
            flatten(item, result)
    return result                       # Return flattened container.
mylist = flatten([[1,2],[3,'a'],[5,[6,7,[8,9]]],[10,'a',[11,[12,13,14]]]])
print(f'Flat list is {mylist}, count of "a" is {mylist.count("a")}')
print(flatten(7))
Once you have a flattened list, it's a simple matter to use count on it.
The output of that code is:
Flat list is [1, 2, 3, 'a', 5, 6, 7, 8, 9, 10, 'a', 11, 12, 13, 14], count of "a" is 2
[7]
Note the behaviour if you don't pass an actual list, it assumes you want a list regardless, one containing just the single item.
If you don't want to construct a flattened list, you can just use a similar method to get the count of any item in the list of lists, with something like:
def deepCount(listOrItem, searchFor):
    if type(listOrItem) != type([]):    # Non-list, one only if equal.
        return 1 if listOrItem == searchFor else 0
    subCount = 0                        # List, recursively collect each count.
    for item in listOrItem:
        subCount += deepCount(item, searchFor)
    return subCount
deepList = [[1,2],[3,'a'],[5,[6,7,[8,9]]],[10,'a',[11,[12,13,14]]]]
print(f'Count of "a" is {deepCount(deepList, "a")}')
print(f'Count of 13 is {deepCount(deepList, 13)}')
print(f'Count of 99 is {deepCount(deepList, 99)}')
As expected, the output of this is:
Count of "a" is 2
Count of 13 is 1
Count of 99 is 0
(a) Up to the limits imposed by Python itself of course, limits you can increase by just adding this to the top of your code:
import sys
sys.setrecursionlimit(1001) # I believe default is 1000.
I mention that just in case you have some spectacularly deeply nested structures but you shouldn't really need it. If you're nesting that deeply then you're probably doing something wrong :-)
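As an alternative to raising the recursion limit, the same count can be computed with an explicit stack, which does not depend on the interpreter's recursion depth. This is a minimal sketch; the name deep_count_iterative is made up for this example, and like deepCount it only descends into lists.

```python
def deep_count_iterative(nested, target):
    """Count occurrences of target in arbitrarily nested lists without recursion."""
    count = 0
    stack = [nested]  # items still waiting to be inspected
    while stack:
        current = stack.pop()
        if isinstance(current, list):
            stack.extend(current)  # inspect the list's items later
        elif current == target:
            count += 1
    return count

deep_list = [[1, 2], [3, 'a'], [5, [6, 7, [8, 9]]], [10, 'a', [11, [12, 13, 14]]]]
print(deep_count_iterative(deep_list, 'a'))  # 2
print(deep_count_iterative(deep_list, 99))   # 0
```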
