Counting occurrences in a Python list - python

I have a list of integers; for example:
l = [1, 2, 3, 4, 4, 4, 1, 1, 1, 2]
I am trying to make a list of the three elements in l with the highest number of occurrences, in descending order of frequency. So in this case I want the list [1, 4, 2], because 1 occurs the most in l (four times), 4 is next with three instances, and then 2 with two. I only want the top three results, so 3 (with only one instance) doesn't make the list.
How can I generate that list?

Use a collections.Counter:
import collections
l = [1, 2, 3, 4, 4, 4, 1, 1, 1, 2]
x = collections.Counter(l)
print(x.most_common())
# [(1, 4), (4, 3), (2, 2), (3, 1)]
print([elt for elt,count in x.most_common(3)])
# [1, 4, 2]
collections.Counter was introduced in Python 2.7. If you are using an older version, then you could use the implementation here.

l_items = set(l)  # the items without duplicates
l_counts = [(l.count(x), x) for x in l_items]
# for every item, a tuple of (number of times the item appears, the item itself)
l_counts.sort(reverse=True)
# sort the tuples; reverse=True puts the largest counts first
l_result = [y for x, y in l_counts]
# get rid of the counts, leaving just the items

from collections import defaultdict
l = [1, 2, 3, 4, 4, 4, 1, 1, 1, 2]
counter = defaultdict(int)
for item in l:
    counter[item] += 1
inverted_dict = dict([[v, k] for k, v in counter.items()])
for count in sorted(inverted_dict.keys(), reverse=True):
    print(inverted_dict[count], count)
This should print out the most frequent items in 'l'; you would need to restrict it to the first three. Be careful when using inverted_dict (the keys and values get swapped): if two items have identical counts, only one survives the swap, because the later key-value pair overwrites the earlier one.
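If ties matter, a collision-safe variant (my sketch, not part of the original answer) groups items into lists keyed by count instead of inverting the dict:

```python
from collections import defaultdict

l = [1, 2, 3, 4, 4, 4, 1, 1, 1, 2]
counter = defaultdict(int)
for item in l:
    counter[item] += 1

# Group items by count so equal counts don't overwrite each other
by_count = defaultdict(list)
for item, count in counter.items():
    by_count[count].append(item)

top3 = []
for count in sorted(by_count, reverse=True):
    top3.extend(by_count[count])
print(top3[:3])
# [1, 4, 2]
```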

Without using collections:
a = reversed(sorted(l, key=l.count))
outlist = []
for element in a:
    if element not in outlist:
        outlist.append(element)
The first line gets you all the original items sorted by count.
The for loop is necessary to uniquify without losing the order (there may be a better way).
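One shorter way to deduplicate while keeping order (a sketch assuming Python 3.7+, where plain dicts preserve insertion order) is dict.fromkeys:

```python
l = [1, 2, 3, 4, 4, 4, 1, 1, 1, 2]
a = sorted(l, key=l.count, reverse=True)  # items sorted by frequency, most common first
outlist = list(dict.fromkeys(a))          # deduplicate, preserving order (Python 3.7+)
print(outlist[:3])
# [1, 4, 2]
```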

Related

Using zip on the results of itertools.groupby unexpectedly gives empty lists

I've encountered some unexpected empty lists when using zip to transpose the results of itertools.groupby. In reality my data is a bunch of objects, but for simplicity let's say my starting data is this list:
> a = [1, 1, 1, 2, 1, 3, 3, 2, 1]
I want to group the duplicates, so I use itertools.groupby (sorting first, because otherwise groupby only groups consecutive duplicates):
from itertools import groupby
duplicates = groupby(sorted(a))
This gives an itertools.groupby object which when converted to a list gives
[(1, <itertools._grouper object at 0x7fb3fdd86850>), (2, <itertools._grouper object at 0x7fb3fdd91700>), (3, <itertools._grouper object at 0x7fb3fdce7430>)]
So far, so good. But now I want to transpose the results so I have a list of the unique values, [1, 2, 3], and a list of the items in each duplicate group, [<itertools._grouper object ...>, ...]. For this I used the solution in this answer on using zip to "unzip":
>>> keys, values = zip(*duplicates)
>>> print(keys)
(1, 2, 3)
>>> print(values)
(<itertools._grouper object at 0x7fb3fdd37940>, <itertools._grouper object at 0x7fb3fddfb040>, <itertools._grouper object at 0x7fb3fddfb250>)
But when I try to read the itertools._grouper objects, I get a bunch of empty lists:
>>> for value in values:
... print(list(value))
...
[]
[]
[]
What's going on? Shouldn't each value contain the duplicates in the original list, i.e. (1, 1, 1, 1, 1), (2, 2) and (3, 3)?
Ah. The beauty of multiple iterators all using the same underlying object.
The documentation of groupby addresses this very issue:
The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list:
groups = []
uniquekeys = []
data = sorted(data, key=keyfunc)
for k, g in groupby(data, keyfunc):
    groups.append(list(g))  # Store group iterator as a list
    uniquekeys.append(k)
So what ends up happening is that all your itertools._grouper objects are consumed before you ever unpack them. You see a similar effect if you try reusing any other iterator more than once. If you want to understand better, look at the next paragraph in the docs, which shows how the internals of groupby actually work.
Part of what helped me understand this is to work examples with a more obviously non-reusable iterator, like a file object. It helps to dissociate from the idea of an underlying buffer you can just keep track of.
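A minimal illustration of the same effect with a plain iterator (my own example, not from the docs):

```python
it = iter([1, 2, 3])
first = list(it)   # the first pass consumes the iterator...
second = list(it)  # ...so a second pass finds nothing left
print(first, second)
# [1, 2, 3] []
```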
A simple fix is to consume the objects yourself, as the documentation recommends:
# This is an iterator over a list:
duplicates = groupby(sorted(a))
# If you convert duplicates to a list, you consume it
# Don't store _grouper objects: consume them yourself:
keys, values = zip(*((key, list(value)) for key, value in duplicates))
As the other answer suggests, you don't need an O(N log N) solution that involves sorting, since you can do this in O(N) time in a single pass. Rather than use a Counter, though, I'd recommend a defaultdict to help store the lists:
from collections import defaultdict
result = defaultdict(list)
for item in a:
    result[item].append(item)
For more complex objects, you'd index with key(item) instead of item.
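As a sketch with hypothetical records (the field names here are illustrative, not from the question):

```python
from collections import defaultdict

# Hypothetical records; key() extracts the grouping field
records = [{"user": "a", "n": 1}, {"user": "b", "n": 2}, {"user": "a", "n": 3}]
key = lambda item: item["user"]

result = defaultdict(list)
for item in records:
    result[key(item)].append(item)

print(dict(result))
# {'a': [{'user': 'a', 'n': 1}, {'user': 'a', 'n': 3}], 'b': [{'user': 'b', 'n': 2}]}
```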
To have grouping by each unique key for duplicate processing:
import itertools
a = [1, 1, 1, 2, 1, 3, 3, 2, 1]
g1 = itertools.groupby(sorted(a))
for k, v in g1:
    print(f"Key {k} has", end=" ")
    for e in v:
        print(e, end=" ")
    print()
# Key 1 has 1 1 1 1 1
# Key 2 has 2 2
# Key 3 has 3 3
If it's just for counting how many, with minimal sorting:
import itertools
import collections
a = [1, 1, 1, 2, 1, 3, 3, 2, 1]
g1 = itertools.groupby(a)
c1 = collections.Counter()
for k, v in g1:
    l = len(tuple(v))
    c1[k] += l
for k, v in c1.items():
    print(f"Element {k} repeated {v} times")
# Element 1 repeated 5 times
# Element 2 repeated 2 times
# Element 3 repeated 2 times

Pythonic way of getting hierarchy of elements in numeric list

I have a numeric list a and I want to output a list with the hierarchical position of every element in a (0 for the highest value, 1 for the second-highest, etc).
I want to know if this is the most Pythonic and efficient way to do this. Perhaps there is a better way?
a = [3,5,6,25,-3,100]
b = sorted(a)
b = b[::-1]
[b.index(i) for i in a]
#ThierryLathuille's answer works only if there are no duplicates in the input list since the answer relies on a dict with the list values as keys. If there can be duplicates in the list, you should sort the items in the input list with their indices generated by enumerate, and map those indices to their sorted positions instead:
from operator import itemgetter
mapping = dict(zip(map(itemgetter(0), sorted(enumerate(a), key=itemgetter(1), reverse=True)), range(len(a))))
mapping becomes:
{5: 0, 3: 1, 2: 2, 1: 3, 0: 4, 4: 5}
so that you can then iterate an index over the length of the list to obtain the sorted positions in order:
[mapping[i] for i in range(len(a))]
which returns:
[4, 3, 2, 1, 5, 0]
You could also use numpy.argsort(-a) (negated because argsort assumes ascending order; a must be a NumPy array for that to work). It could have better performance for large arrays, though there's no official analysis that I know of.
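A sketch of that approach, assuming NumPy is available. Note that argsort(-a) alone gives the descending sort order, not per-element positions, so a second argsort is needed to turn it into ranks:

```python
import numpy as np

a = np.array([3, 5, 6, 25, -3, 100])
order = np.argsort(-a)     # indices that would sort a in descending order
ranks = np.argsort(order)  # rank (hierarchical position) of each element
print(ranks.tolist())
# [4, 3, 2, 1, 5, 0]
```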
One problem with your solution is the repeated use of index, that will make your final comprehension O(n**2), as index has to go over the sorted list each time.
It would be more efficient to build a dict with the rank of each value in the sorted list:
a = [3,5,6,25,-3,100]
ranks = {val:idx for idx, val in enumerate(sorted(a, reverse=True))}
# {100: 0, 25: 1, 6: 2, 5: 3, 3: 4, -3: 5}
out = [ranks[val] for val in a]
print(out)
# [4, 3, 2, 1, 5, 0]
in order to have a final step in O(n).
First, zip a with range(len(a)) to create a list of (element, position) tuples, and sort this list in reverse order. Then zip the sorted list with range(len(a)) again to mark each element's position after the sort. Now "unsort" the list by sorting it on each element's original position, and finally grab the position each element had when it was sorted:
>>> a = [3,5,6,25,-3,100]
>>> [i for _,i in sorted(zip(sorted(zip(a, range(len(a))), reverse=True), range(len(a))), key=lambda t:t[0][1])]
[4, 3, 2, 1, 5, 0]

how to delete x instances of y element in a list in Python

I have a list, and in that list, I have a lot of duplicated values. This is the format of the list:
https://imgur.com/a/tj2ZwxG
So I have some fields, in this order: "User_ID" "Movie_ID" "Rating" "Time"
What I want to do is remove, from the 5th occurrence of a "User_ID" onward, until I find a different "User_ID". For example:
Let's suppose that I have a list with only "User_ID" (from 1 - 196) like this:
1, 1, 1 ,1 ,1, 1, 2 ,2 , 2, 2, 2, 2, 2...
In this case, I have six occurrences of number 1 and seven occurrences of number 2.
So, for 1, I will remove everything after the fifth occurrence, until I find the first "2". And the same thing for 2: I will start removing after its fifth occurrence, until I find a new number, which will be "3", and so on.
So, I will get a new list, like this: 1, 1, 1, 1, 1, 2, 2, 2, 2, 2
containing only 5 instances of each different element.
I know I can access the "User_ID" field like this: list[index]["User_ID"]
is there a function that does that? Or if there isn't, could someone help me to create one?
Thanks for the help!
What I was trying to do was something like this:
a = 0
b = 1
start = 0
position = 0
while(something that I don't know):
    while(list[a]['User_ID'] == list[b]['User_ID']):  # advance only while the previous and next elements are the same
        a += 1
        b += 1
        position += 1
    if(list[a]['User_ID'] != list[b]['User_ID']):  # when I finally find a different element
        del new_list[start:start+position]  # delete from the start position (which is five) up to the element before the different one
        a += 1
        b += 1
        start += 5
lst = [1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3]
unique = set(lst)
for x in unique:
    y = lst.count(x)
    while y > 5:
        lst.remove(x)
        y -= 1
print(lst)
Your input seems to be a list of dict instances. You can use various itertools to only keep 5 dicts with same User_ID key in a space and time efficient manner:
from itertools import chain, groupby, islice
from operator import itemgetter
lst = [{'User_ID': 1, ...}, {'User_ID': 1, ...}, ..., {'User_ID': 2, ...}, ...]
key = itemgetter('User_ID')
only5 = list(chain.from_iterable(islice(g, 5) for _, g in groupby(lst, key=key)))
This groups the list into chunks with the same User_ID and then takes the first 5 from each chunk into the new list.
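Applied to the simplified ID-only list from the question, the same recipe keeps at most five of each run:

```python
from itertools import chain, groupby, islice

ids = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2]
# Take the first 5 items from each consecutive run of equal IDs
only5 = list(chain.from_iterable(islice(g, 5) for _, g in groupby(ids)))
print(only5)
# [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
```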
I am mostly confused by your list of [1,1,1,1,1] etc, it looks like you have a list of dicts or objects.
If you care about every field you can probably just make it a set then back into a list:
my_list = list(set(my_list))
if they are objects, you can override __eq__(self,other) and __hash__(self) and I think you will be able to use the same list/set/list transform to remove duplicates.
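A sketch of that idea with a minimal hypothetical class (the class and field names are illustrative, not from the question's data):

```python
# A hypothetical record class; __eq__/__hash__ make set() treat
# records with the same fields as duplicates
class Rating:
    def __init__(self, user_id, movie_id):
        self.user_id = user_id
        self.movie_id = movie_id

    def __eq__(self, other):
        return (self.user_id, self.movie_id) == (other.user_id, other.movie_id)

    def __hash__(self):
        return hash((self.user_id, self.movie_id))

ratings = [Rating(1, 10), Rating(1, 10), Rating(2, 20)]
unique = list(set(ratings))  # duplicates collapse thanks to __eq__/__hash__
print(len(unique))
# 2
```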

Compare i vs other items in a Python list

Good Day, I've googled this question and have found similar answers but not what I am looking for. I am not sure what the problem is called so I that doesn't help me and I am looking for an elegant solution.
How do I loop over a list, item at a time, and compare it to all other items in a list. For example, if I had a list
l = [1,2,3,4]
Each loop of the out would yield something like
1 vs [2,3,4]
2 vs [1,3,4]
3 vs [1,2,4]
4 vs [1,2,3]
One solution I've been playing with involves duplicating the list every iteration, finding the index of the item, deleting it from the duplicate list and compare the two. This route seems less ideal as you have to create a new list on every iteration.
You can use itertools.combinations to create all combinations of length 3 from your list, then use the set.difference method to get the element of l that is missing from each combination. Note that you need to convert your main list to a set object:
>>> from itertools import combinations
>>> l = {1,2,3,4}
>>> [(l.difference(i).pop(),i) for i in combinations(l,3)]
[(4, (1, 2, 3)), (3, (1, 2, 4)), (2, (1, 3, 4)), (1, (2, 3, 4))]
A simple approach would be to use two loops:
arr = [1, 2, 3, 4]
for i in arr:
    comp = []
    for j in arr:
        if i != j:
            comp.append(j)
    print(comp)
I guess you could use list comprehension. While still creating a new list every iteration, you don't need to delete an item each time:
l = [1, 2, 3, 4]
for i in l:
    temp = [item for item in l if item != i]
    print(temp)
[2, 3, 4]
[1, 3, 4]
[1, 2, 4]
[1, 2, 3]

Deleting repeats in a list python [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicates:
How do you remove duplicates from a list in Python whilst preserving order?
In Python, what is the fastest algorithm for removing duplicates from a list so that all elements are unique while preserving order?
I was wondering if there was a function which does the following:
Take a list as an argument:
list = [3, 5, 6, 4, 6, 2, 7, 6, 5, 3]
and deletes all the repeats in the list to obtain:
list = [3, 5, 6, 4, 2, 7]
I know you can convert it into a dictionary and use the fact that dictionaries cannot have repeats but I was wondering if there was a better way of doing it.
Thanks
Please see the Python documentation for three ways to accomplish this. The following is copied from that site. Replace the example 'mylist' with your variable name ('list').
First Example: If you don’t mind reordering the list, sort it and then scan from the end of the list, deleting duplicates as you go:
if mylist:
    mylist.sort()
    last = mylist[-1]
    for i in range(len(mylist)-2, -1, -1):
        if last == mylist[i]:
            del mylist[i]
        else:
            last = mylist[i]
Second Example: If all elements of the list may be used as dictionary keys (i.e. they are all hashable) this is often faster:
d = {}
for x in mylist:
    d[x] = 1
mylist = list(d.keys())
Third Example: In Python 2.5 and later:
mylist = list(set(mylist))
Even though you said you don't necessarily want to use a dict, I think an OrderedDict is a clean solution here.
from collections import OrderedDict
l = [3, 5, 6, 4, 6, 2, 7, 6, 5, 3]
list(OrderedDict.fromkeys(l))
# [3, 5, 6, 4, 2, 7]
Note that this preserves the original order.
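On Python 3.7+ plain dicts preserve insertion order too, so the same idea works without the import:

```python
l = [3, 5, 6, 4, 6, 2, 7, 6, 5, 3]
result = list(dict.fromkeys(l))  # keys keep first-seen order (Python 3.7+)
print(result)
# [3, 5, 6, 4, 2, 7]
```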
list(set(l)) will not preserve the order. If you want to keep the order then do:
s = set()
result = []
for item in l:
    if item not in s:
        s.add(item)
        result.append(item)
print(result)
This will run in O(n), where n is the length of the original list.
list(set(list)) works just fine.
First, don't name it list as that shadows the built-in type list. Say, my_list
To solve your problem, the way I've seen most often is list(set(my_list))
set is an unordered container that only has unique elements, and gives (I think) O(1) insertion and membership checking.
As of writing this answer, the only solutions which preserve order are the OrderedDict solution, and Dave's slightly-more-verbose solution.
Here's another way where we abuse side-effects while iterating, which is also more verbose than the OrderedDict solution:
def uniques(iterable):
    seen = set()
    sideeffect = lambda _: True
    return [x for x in iterable
            if (x not in seen) and sideeffect(seen.add(x))]
A set would be a better approach than a dictionary in terms of O complexity. But both approaches make you lose the ordering (unless you use an ordered dictionary, which increases the complexity again).
As other posters already said, the set solution is not that hard:
l = [3, 5, 6, 4, 6, 2, 7, 6, 5, 3]
list(set(l))
A way to keep the ordering is:
def uniques(l):
    seen = set()
    for i in l:
        if i not in seen:
            seen.add(i)
            yield i
Or, in a less readable way:
def uniques(l):
    seen = set()
    return (seen.add(i) or i for i in l if i not in seen)
You can then use it like this:
l = [3, 5, 6, 4, 6, 2, 7, 6, 5, 3]
list(uniques(l))
>>> [3, 5, 6, 4, 2, 7]
Here is a snippet from my own collection of handy Python tools - this uses the "abusive side-effect" method that ninjagecko has in his answer. This also takes pains to handle non-hashable values, and to return a sequence of the same type as was passed in:
def unique(seq, keepstr=True):
    """Function to keep only the unique values supplied in a given
    sequence, preserving original order."""
    # determine what type of return sequence to construct
    if isinstance(seq, (list, tuple)):
        returnType = type(seq)
    elif isinstance(seq, basestring):
        returnType = (list, type(seq)('').join)[bool(keepstr)]
    else:
        # generators and their ilk should just return a list
        returnType = list
    try:
        seen = set()
        return returnType(item for item in seq if not (item in seen or seen.add(item)))
    except TypeError:
        # sequence items are not of a hashable type; can't use a set for uniqueness
        seen = []
        return returnType(item for item in seq if not (item in seen or seen.append(item)))
Here are a variety of calls, with sequences/iterators/generators of various types:
from itertools import chain
print unique("ABC")
print unique(list("ABABBAC"))
print unique(range(10))
print unique(chain(reversed(range(5)), range(7)))
print unique(chain(reversed(xrange(5)), xrange(7)))
print unique(i for i in chain(reversed(xrange(5)), xrange(7)) if i % 2)
Prints:
ABC
['A', 'B', 'C']
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[4, 3, 2, 1, 0, 5, 6]
[4, 3, 2, 1, 0, 5, 6]
[3, 1, 5]
