I have a list of integers...
[1,2,3,4,5,8,9,10,11,200,201,202]
I would like to group them into a list of lists where each sublist contains integers whose sequence has not been broken. Like this...
[[1,5],[8,11],[200,202]]
I have a rather clunky work around...
lSequenceOfNum = [1,2,3,4,5,8,9,10,11,200,201,202]
lGrouped = []
start = 0
for x in range(0,len(lSequenceOfNum)):
    if x != len(lSequenceOfNum)-1:
        if(lSequenceOfNum[x+1] - lSequenceOfNum[x]) > 1:
            lGrouped.append([lSequenceOfNum[start],lSequenceOfNum[x]])
            start = x+1
    else:
        lGrouped.append([lSequenceOfNum[start],lSequenceOfNum[x]])
print lGrouped
It is the best I could do. Is there a more "pythonic" way to do this? Thanks..
Assuming the list will always be in ascending order:
from itertools import groupby, count
numberlist = [1,2,3,4,5,8,9,10,11,200,201,202]
def as_range(g):
    l = list(g)
    return l[0], l[-1]
print [as_range(g) for _, g in groupby(numberlist, key=lambda n, c=count(): n-next(c))]
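The key function above works because, inside a run of consecutive integers, each value minus its position is constant, and that difference changes at every gap. Here is a minimal Python 3 sketch of the same idea. The name `consecutive_ranges` is made up for illustration, `enumerate` stands in for the `count()` default argument, and the input is assumed to be sorted ascending without duplicates.

```python
from itertools import groupby

def consecutive_ranges(numbers):
    """Return (first, last) pairs, one per run of consecutive integers.

    Assumes `numbers` is sorted ascending and contains no duplicates.
    """
    result = []
    # value - index is constant inside a run and changes at every gap,
    # so groupby puts each run into its own group
    for _, group in groupby(enumerate(numbers), key=lambda pair: pair[1] - pair[0]):
        run = [value for _, value in group]
        result.append((run[0], run[-1]))
    return result

print(consecutive_ranges([1, 2, 3, 4, 5, 8, 9, 10, 11, 200, 201, 202]))
# [(1, 5), (8, 11), (200, 202)]
```

Using `enumerate` avoids the mutable default argument, so the function can be called repeatedly without the counter carrying over between calls.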
I realised I had overcomplicated this a little, far easier to just count manually than use a slightly convoluted generator:
def ranges(seq):
    start, end = seq[0], seq[0]
    count = start
    for item in seq:
        if not count == item:
            yield start, end
            start, end = item, item
            count = item
        end = item
        count += 1
    yield start, end
print(list(ranges([1,2,3,4,5,8,9,10,11,200,201,202])))
Producing:
[(1, 5), (8, 11), (200, 202)]
This method is pretty fast:
This method (and the old one, they perform almost exactly the same):
python -m timeit -s "from test import ranges" "ranges([1,2,3,4,5,8,9,10,11,200,201,202])"
1000000 loops, best of 3: 0.47 usec per loop
Jeff Mercado's Method:
python -m timeit -s "from test import as_range; from itertools import groupby, count" "[as_range(g) for _, g in groupby([1,2,3,4,5,8,9,10,11,200,201,202], key=lambda n, c=count(): n-next(c))]"
100000 loops, best of 3: 11.1 usec per loop
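That looks over 20x faster, but the first command only creates the generator without consuming it, so wrap the call in list() for a fair comparison. Naturally, unless speed matters, this isn't a real concern.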
My old solution using generators:
import itertools
def resetable_counter(start):
    while True:
        for i in itertools.count(start):
            reset = yield i
            if reset:
                start = reset
                break
def ranges(seq):
    start, end = seq[0], seq[0]
    counter = resetable_counter(start)
    for count, item in zip(counter, seq): #In 2.x: itertools.izip(counter, seq)
        if not count == item:
            yield start, end
            start, end = item, item
            counter.send(item)
        end = item
    yield start, end
print(list(ranges([1,2,3,4,5,8,9,10,11,200,201,202])))
Producing:
[(1, 5), (8, 11), (200, 202)]
You can do this efficiently in three steps
given
list1=[1,2,3,4,5,8,9,10,11,200,201,202]
Calculate the discontinuity
        [2, 3, 4, 5, 8, 9, 10, 11, 200, 201, 202]
      - [1, 2, 3, 4, 5, 8,  9, 10,  11, 200, 201]
        -----------------------------------------
        [1, 1, 1, 1, 3, 1,  1,  1, 189,   1,   1]
(index)  1  2  3  4  5  6   7   8    9   10   11
                     *               *
rng = [i+1 for i,e in enumerate((x-y for x,y in zip(list1[1:],list1))) if e!=1]
>>> rng
[5, 9]
Add the boundaries
rng = [0] + rng + [len(list1)]
>>> rng
[0, 5, 9,12]
now calculate the actual continuity ranges
[(list1[i],list1[j-1]) for i,j in zip(rng,rng[1:])]
[(1, 5), (8, 11), (200, 202)]
LB      [0, 5, 9]
UB      [5, 9, 12]
-----------------------
indexes (LB,UB-1) (0,4) (5,8) (9,11)
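The three steps above can be put together as one small function. This is only a sketch: the name `continuity_ranges` is an assumption, and the input is assumed to be a sorted list of integers.

```python
def continuity_ranges(values):
    """Return (first, last) pairs for runs of consecutive integers in a sorted list."""
    if not values:
        return []
    # Step 1: positions where the difference to the previous element is not 1
    breaks = [i + 1 for i, (prev, cur) in enumerate(zip(values, values[1:]))
              if cur - prev != 1]
    # Step 2: add the outer boundaries
    bounds = [0] + breaks + [len(values)]
    # Step 3: pair each lower bound with the following bound (exclusive upper end)
    return [(values[lo], values[hi - 1]) for lo, hi in zip(bounds, bounds[1:])]

print(continuity_ranges([1, 2, 3, 4, 5, 8, 9, 10, 11, 200, 201, 202]))
# [(1, 5), (8, 11), (200, 202)]
```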
The question is quite old, but I thought I'd share my solution anyway.
Assuming import numpy as np
a = [1,2,3,4,5,8,9,10,11,200,201,202]
np.split(a, np.where(np.diff(a) > 1)[0] + 1)
pseudo code (with off-by-one errors to fix):
jumps = new array;
for idx from 0 to len(array)
if array[idx] != array[idx+1] then jumps.push(idx);
I think this is actually a case where it makes sense to work with the indices (as in C, before java/python/perl/etc. improved upon this) instead of the objects in the array.
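A possible Python rendering of that index-based idea, with the off-by-one issues handled. It reads the comparison as "the next element is not the current one plus one", which is an assumption about the intent of the pseudo code. The function names are invented for this sketch.

```python
def find_jumps(array):
    """Return every index idx where array[idx + 1] does not directly follow array[idx]."""
    # stop one short of the end so that array[idx + 1] always exists
    return [idx for idx in range(len(array) - 1)
            if array[idx + 1] != array[idx] + 1]

def ranges_from_jumps(array):
    """Turn the jump indices into [first, last] pairs."""
    if not array:
        return []
    jumps = find_jumps(array)
    starts = [0] + [idx + 1 for idx in jumps]   # a run starts right after each jump
    ends = jumps + [len(array) - 1]             # and ends at the jump itself
    return [[array[s], array[e]] for s, e in zip(starts, ends)]

print(ranges_from_jumps([1, 2, 3, 4, 5, 8, 9, 10, 11, 200, 201, 202]))
# [[1, 5], [8, 11], [200, 202]]
```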
Here's a version that should be easy to read:
def close_range(el, it):
    while True:
        el1 = next(it, None)
        if el1 != el + 1:
            return el, el1
        el = el1
def compress_ranges(seq):
    iterator = iter(seq)
    left = next(iterator, None)
    while left is not None:
        right, left1 = close_range(left, iterator)
        yield (left, right)
        left = left1
list(compress_ranges([1, 2, 3, 4, 5, 8, 9, 10, 11, 200, 201, 202]))
Similar questions:
Python - find incremental numbered sequences with a list comprehension
Pythonic way to convert a list of integers into a string of comma-separated ranges
input = [1, 2, 3, 4, 8, 10, 11, 12, 17]
i, ii, result = iter(input), iter(input[1:]), [[input[0]]]
for x, y in zip(i,ii):
    if y-x != 1:
        result.append([y])
    else:
        result[-1].append(y)
>>> result
[[1, 2, 3, 4], [8], [10, 11, 12], [17]]
>>> print ", ".join("-".join(map(str,(g[0],g[-1])[:len(g)])) for g in result)
1-4, 8, 10-12, 17
>>> [(g[0],g[-1])[:len(g)] for g in result]
[(1, 4), (8,), (10, 12), (17,)]
Related
I have a list:
lst = [ 1,2,3,4,5,6,7,8]
I want to increment all numbers from index 4 onward.
for i in range(4,len(lst)):
    lst[i]+=2
Since this operation needs to be done many times, I want to do it in the most efficient way possible.
How can I do this fast?
Use Numpy for fast array manipulations, check the example below:
import numpy as np
lst = np.array([1,2,3,4,5,6,7,8])
# add 2 at all indices from 4 till the end of the array
lst[4:] += 2
print(lst)
# [ 1  2  3  4  7  8  9 10]
If you are updating large ranges of a large list many times, use a more suitable data structure so that the updates don't take O(n) time each.
One such data structure is a segment tree, where each list element corresponds to a leaf node in a tree; the true value of the list element can be represented as the sum of the values on the path between the leaf node and the root node. This way, adding a number to a single internal node is effectively like adding it to all of the list elements represented by that subtree.
The data structure supports get/set operations by index in O(log n) time, and add-in-range operations also in O(log n) time. The solution below uses a binary tree, implemented using a list of length less than 3n (the leaves are offset by the smallest power of 2 that is >= n).
class RangeAddList:
    def __init__(self, vals):
        # list length
        self._n = len(vals)
        # smallest power of 2 >= list length
        self._m = 1 << (self._n - 1).bit_length()
        # list representing binary tree; leaf nodes offset by _m
        self._vals = [0]*self._m + vals
    def __repr__(self):
        return '{}({!r})'.format(self.__class__.__name__, list(self))
    def __len__(self):
        return self._n
    def __iter__(self):
        for i in range(self._n):
            yield self[i]
    def __getitem__(self, i):
        if i not in range(self._n):
            raise IndexError()
        # add up values from leaf to root node
        t = 0
        i += self._m
        while i > 0:
            t += self._vals[i]
            i >>= 1
        return t + self._vals[0]
    def __setitem__(self, i, x):
        # add difference (new value - old value)
        self._vals[self._m + i] += x - self[i]
    def add_in_range(self, i, j, x):
        if i not in range(self._n + 1) or j not in range(self._n + 1):
            raise IndexError()
        # add at internal nodes spanning range(i, j)
        i += self._m
        j += self._m
        while i < j:
            if i & 1:
                self._vals[i] += x
                i += 1
            if j & 1:
                j -= 1
                self._vals[j] += x
            i >>= 1
            j >>= 1
Example:
>>> r = RangeAddList([0] * 10)
>>> r.add_in_range(0, 4, 10)
>>> r.add_in_range(6, 9, 20)
>>> r.add_in_range(3, 7, 100)
>>> r
RangeAddList([10, 10, 10, 110, 100, 100, 120, 20, 20, 0])
It turns out that NumPy is so well-optimized, you need to go up to lists of length 50,000 or so before the segment tree catches up. The segment tree is still only about twice as fast as NumPy's O(n) range updates for lists of length 100,000 on my machine. You may want to benchmark with your own data to be sure.
This is a fast way of doing it:
lst1 = [1, 2, 3, 4, 5, 6, 7, 8]
new_list = [*lst1[:4], *[x+2 for x in lst1[4:]]]
# or even better
lst1[4:] = [x+2 for x in lst1[4:]]
In terms of speed, numpy isn't faster for lists this small:
import timeit
import numpy as np
lst1 = [1, 2, 3, 4, 5, 6, 7, 8]
npa = np.array(lst1)
def numpy_it():
    global npa
    npa[4:] += 2
def python_it():
    global lst1
    lst1 = [*lst1[:4], *[x+2 for x in lst1[4:]]]
print(timeit.timeit(numpy_it))
print(timeit.timeit(python_it))
For me gets:
1.7008036
0.6737076000000002
But for anything serious numpy beats generating a new list for the slice that needs replacing, which beats regenerating the entire list (which beats in-place replacement with a loop like in your example):
import timeit
import numpy as np
lst1 = list(range(0, 10000))
npa = np.array(lst1)
lst2 = list(range(0, 10000))
lst3 = list(range(0, 10000))
def numpy_it():
    global npa
    npa[4:] += 2
def python_it():
    global lst1
    lst1 = [*lst1[:4], *[x+2 for x in lst1[4:]]]
def python_it_slice():
    global lst2
    lst2[4:] = [x+2 for x in lst2[4:]]
def python_inplace():
    global lst3
    for i in range(4, len(lst3)):
        lst3[i] = lst3[i] + 2
n = 10000
print(timeit.timeit(numpy_it, number=n))
print(timeit.timeit(python_it_slice, number=n))
print(timeit.timeit(python_it, number=n))
print(timeit.timeit(python_inplace, number=n))
Results:
0.057994199999999996
4.3747423
4.5193105000000005
9.949074000000001
Use assign to slice:
lst[4:] = [x+2 for x in lst[4:]]
Test (on my ancient ThinkPad i3-3110, Python 3.5.2):
import timeit
lst = [1, 2, 3, 4, 5, 6, 7, 8]
def python_it():
    global lst
    lst = [*lst[:4], *[x+2 for x in lst[4:]]]
def python_it2():
    global lst
    lst[4:] = [x+2 for x in lst[4:]]
print(timeit.timeit(python_it))
print(timeit.timeit(python_it2))
Prints:
1.2732834180060308
0.9285018060181756
Use Python's built-in map function with a lambda:
lst = [1,2,3,4,5,6,7,8]
lst[4:] = map(lambda x:x+2, lst[4:])
print(lst)
# [1, 2, 3, 4, 7, 8, 9, 10]
I have a query which is a list of numbers. I want to get the ranges of indices in which the number 1 appears. The range starts when 1 appears and ends on the index in which it doesn't appear. I have made an example to illustrate this.
query= [0,0,0,0,0,1,1,1,0,1,0,0,1,1,0]
answer = [[5,8],[9,10],[12,14]]
Note: I am not looking for the first and last index of some value in a list in Python. I'm looking for all the places in which they start and end.
Update: From some of the suggested answers below it looks like Itertools is quite handy for this stuff.
You can use itertools.dropwhile to do this.
>>> query = [0,0,0,0,0,1,1,1,0,1,0,0,1,1,0]
>>> n = 1
>>>
>>> from itertools import dropwhile
>>> itr = enumerate(query)
>>> [[i, next(dropwhile(lambda t: t[1]==n, itr), [len(query)])[0]] for i,x in itr if x==n]
[[5, 8], [9, 10], [12, 14]]
You could also use itertools.groupby for this. Use enumerate to get the indices, then groupby the actual value, then filter by the value, finally get the first and last index from the group.
>>> from itertools import groupby
>>> query = [0,0,0,0,0,1,1,1,0,1,0,0,1,1,0]
>>> [(g[0][0], g[-1][0]+1) for g in (list(g) for k, g in
... groupby(enumerate(query), key=lambda t: t[1]) if k == 1)]
...
[(5, 8), (9, 10), (12, 14)]
query= [0,0,0,0,0,1,1,1,0,1,0,0,1,1,0]
first = 0 # Track the first index in the current group
ingroup = False # Track whether we are currently in a group of ones
answer = []
for i, e in enumerate(query):
    if e:
        if not ingroup:
            first = i
    else:
        if ingroup:
            answer.append([first, i])
    ingroup = e
if ingroup:
    answer.append([first, len(query)])
>>> answer
[[5, 8], [9, 10], [12, 14]]
I think you probably want something like this.
You can just use a basic for loop with if statements that check
where a series of '0' changes to a series of '1' and vice versa.
query= [0,0,0,0,0,1,1,1,0,1,0,0,1,1,0]
r_0 = []
r_1 = []
for i in range(len(query)-1):
    if query[i] == 0 and query[i+1] == 1:
        r_0.append(i+1) # [5, 9, 12]
    if query[i] == 1 and query[i + 1] == 0:
        r_1.append(i + 1) # [8, 10, 14]
print (list(zip(r_0,r_1)))
output:
[(5, 8), (9, 10), (12, 14)]
Hope this helps. It's a solution without for-loops.
from itertools import chain
query = [0,0,0,0,0,1,1,1,0,1,0,0,1,1,0]
result = list(zip(
filter(lambda i: query[i] == 1 and (i == 0 or query[i-1] != 1), range(len(query))),
chain(filter(lambda i: query[i] != 1 and query[i-1] == 1, range(1, len(query))), [len(query)-1])
))
print(result)
The output is:
[(5, 8), (9, 10), (12, 14)]
Wanted to share a recursive approach
query= [0,0,0,0,0,1,1,1,0,1,0,0,1,1,0]
def findOccurrences(of, at, index=0, occurrences=None):
    # A mutable default argument would be shared between calls,
    # so the list has to be created here instead.
    if occurrences is None:
        occurrences = []
    try:
        # search the list that was passed in, not the global query
        last = start = at.index(of, index)
        for i in at[start:]:
            if i == of:
                last += 1
            else:
                break
        occurrences.append([start, last])
        return findOccurrences(of, at, last, occurrences)
    except ValueError:
        # index() raises ValueError when there are no more occurrences
        pass
    return occurrences
print(findOccurrences(1, query))
print(findOccurrences(1, query, 0)) # Offsetting
print(findOccurrences(0, query, 9)) # Offsetting with default list
I would like to know if there exists a base solution to do something like this:
for n in range(length=8, start_position= 3, direction= forward)
The problem I'm encountering is I would like the loop to continue past the final index, and pick up again at idx =0, then idx=1, etc. and stop at idx= 3, the start_position.
To give context, I seek all possible complete solutions to the n-queen problem.
Based on your latest edit, you need a "normal" range and the modulo operator:
for i in range(START, START + LEN):
    do_something_with(i % LEN)
from itertools import chain
for n in chain(range(3,8), range(3)):
...
The chain() returns an iterator with 3, 4, ..., 7, 0, 1, 2
Another option for solving this is to use modular arithmetic. You could do something like this, for example:
for i in range(8):
    idx = (i + 3) % 8
    # use idx
This easily can be generalized to work with different lengths and offsets.
def loop_around_range(length, start_position, direction='forward'):
    looped_range = [k % length for k in range(start_position, start_position+length)]
    if direction == 'forward':
        return looped_range
    else:
        return looped_range[::-1]
You could implement this for an arbitrary iterable by using itertools.cycle.
from itertools import cycle
def circular_iterator(iterable, skip=0, length=None, reverse=False):
    """Produces a full cycle of #iterable#, skipping the first #skip# elements
    then tacking them on to the end.
    if #iterable# does not implement #__len__#, you must provide #length#
    """
    # determine the length first: the object returned by reversed() has no len()
    if length:
        total_length = length
    else:
        total_length = len(iterable)
    if reverse:
        iterable = reversed(iterable)
    cyc_iter = cycle(iterable)
    for _ in range(skip):
        next(cyc_iter, None)
    for _ in range(total_length):
        yield next(cyc_iter, None)
>>> lst = [x for x in range(1, 9)]
# [1, 2, 3, 4, 5, 6, 7, 8]
>>> list(circular_iterator(lst, skip=3))
[4, 5, 6, 7, 8, 1, 2, 3]
I have a bunch of numbers, say the following:
1 2 3 4 6 7 8 20 24 28 32
The information presented there could be represented in Python as ranges:
[range(1, 5), range(6, 9), range(20, 33, 4)]
In my output I'd write 1..4, 6..8, 20..32..4, but that is just a matter of presentation.
Another answer shows how one can do this for contiguous ranges. I don't see how I can easily do this for strided ranges like above. Is there a similar trick for this?
Here's a straightforward approach to the problem.
def get_ranges(ls):
    N = len(ls)
    while ls:
        # single element remains, yield the trivial range
        if N == 1:
            yield range(ls[0], ls[0] + 1)
            break
        diff = ls[1] - ls[0]
        # find the last index that satisfies the determined difference
        i = next(i for i in range(1, N) if i + 1 == N or ls[i+1] - ls[i] != diff)
        yield range(ls[0], ls[i] + 1, diff)
        # update variables
        ls = ls[i+1:]
        N -= i + 1
def ranges(data):
    result = []
    if not data:
        return result
    idata = iter(data)
    first = prev = next(idata)
    for following in idata:
        if following - prev == 1:
            prev = following
        else:
            result.append((first, prev + 1))
            first = prev = following
    # There was either exactly 1 element and the loop never ran,
    # or the loop just normally ended and we need to account
    # for the last remaining range.
    result.append((first, prev+1))
    return result
Test:
>>> data = range(1, 5) + range(6, 9) + range(20, 24)
>>> print ranges(data)
[(1, 5), (6, 9), (20, 24)]
You can use groupby and count from itertools module along with Counter from collections module like this example:
Update: See the comments in order to understand the logic behind this solution and its limitations.
from itertools import groupby, count
from collections import Counter
def ranges_list(data=list, func=range, min_condition=1):
    # Sort in place the ranges list
    data.sort()
    # Find all the steps between the ranges's elements
    steps = [v-k for k,v in zip(data, data[1:])]
    # Find the repeated items's steps based on condition.
    # Default: repeated more than once (min_condition = 1)
    repeated = [item for item, count in Counter(steps).items() if count > min_condition]
    # Group the items in to a dict based on the repeated steps
    groups = {k:[list(v) for _,v in groupby(data, lambda n, c = count(step = k): n-next(c))] for k in repeated}
    # Create a dict:
    # - keys are the steps
    # - values are the grouped elements
    sub = {k:[j for j in v if len(j) > 1] for k,v in groups.items()}
    # Those two lines are for pretty printing purpose:
    # They are meant to have a sorted output.
    # You can replace them by:
    # return [func(j[0], j[-1]+1,k) for k,v in sub.items() for j in v]
    # Otherwise:
    final = [(j[0], j[-1]+1,k) for k,v in sub.items() for j in v]
    return [func(*k) for k in sorted(final, key = lambda x: x[0])]
ranges1 = [1, 2, 3, 4, 6, 7, 8, 20, 24, 28, 32]
ranges2 = [1, 2, 3, 4, 6, 7, 10, 20, 24, 28, 50,51,59,60]
print(ranges_list(ranges1))
print(ranges_list(ranges2))
Output:
[range(1, 5), range(6, 9), range(20, 33, 4)]
[range(1, 5), range(6, 8), range(20, 29, 4), range(50, 52), range(59, 61)]
Limitations:
With this kind of input:
ranges3 = [1,3,6,10]
print(ranges_list(ranges3))
print(ranges_list(ranges3, min_condition=0))
Will output:
# Steps are repeated <= 1 with the condition: min_condition = 1
# Will output an empty list
[]
# With min_condition = 0
# Will output the ranges using: zip(data, data[1:])
[range(1, 4, 2), range(3, 7, 3), range(6, 11, 4)]
Feel free to use this solution and adapt or modify it to fit your needs.
It might not be super short or elegant, but it seems to work:
def ranges(ls):
    li = iter(ls)
    first = next(li)
    while True:
        try:
            element = next(li)
        except StopIteration:
            yield range(first, first+1)
            return
        step = element - first
        last = element
        while True:
            try:
                element = next(li)
            except StopIteration:
                yield range(first, last+step, step)
                return
            if element - last != step:
                yield range(first, last+step, step)
                first = element
                break
            last = element
This iterates over an iterator of the list, and yields range objects:
>>> list(ranges([1, 2, 3, 4, 6, 7, 8, 20, 24, 28, 32]))
[range(1, 5), range(6, 9), range(20, 33, 4)]
It also handles negative ranges, and ranges that have just one element:
>>> list(ranges([9,8,7, 1,3,5, 99]))
[range(9, 6, -1), range(1, 7, 2), range(99, 100)]
In Python, I have a list:
L = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]
I want to identify the item that occurred the highest number of times. I am able to solve it but I need the fastest way to do so. I know there is a nice Pythonic answer to this.
I am surprised no-one has mentioned the simplest solution, max() with the key list.count:
max(lst,key=lst.count)
Example:
>>> lst = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]
>>> max(lst,key=lst.count)
4
This works in Python 3 or 2, but note that it only returns the most frequent item and not also the frequency. Also, in the case of a draw (i.e. joint most frequent item) only a single item is returned.
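If the frequency or all tied items are needed as well, one option is to count once with `collections.Counter` and then collect every item that reaches the top count. The sketch below is only an illustration; the helper name `most_frequent` is made up here and is not part of any library.

```python
from collections import Counter

def most_frequent(items):
    """Return (list of all items tied for most frequent, their count)."""
    counts = Counter(items)
    if not counts:
        return [], 0
    top = max(counts.values())
    # Counter keeps first-seen order, so ties come back in order of first appearance
    return [item for item, n in counts.items() if n == top], top

print(most_frequent([1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]))
# ([4], 6)
print(most_frequent([1, 2, 2, 3, 3, 4]))
# ([2, 3], 2)
```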
Although the time complexity of using max() is worse than using Counter.most_common(1) as PM 2Ring comments, the approach benefits from a rapid C implementation and I find this approach is fastest for short lists but slower for larger ones (Python 3.6 timings shown in IPython 5.3):
In [1]: from collections import Counter
...:
...: def f1(lst):
...: return max(lst, key = lst.count)
...:
...: def f2(lst):
...: return Counter(lst).most_common(1)
...:
...: lst0 = [1,2,3,4,3]
...: lst1 = lst0[:] * 100
...:
In [2]: %timeit -n 10 f1(lst0)
10 loops, best of 3: 3.32 us per loop
In [3]: %timeit -n 10 f2(lst0)
10 loops, best of 3: 26 us per loop
In [4]: %timeit -n 10 f1(lst1)
10 loops, best of 3: 4.04 ms per loop
In [5]: %timeit -n 10 f2(lst1)
10 loops, best of 3: 75.6 us per loop
from collections import Counter
most_common,num_most_common = Counter(L).most_common(1)[0] # 4, 6 times
For older Python versions (< 2.7), you can use this recipe to create the Counter class.
In your question, you asked for the fastest way to do it. As has been demonstrated repeatedly, particularly with Python, intuition is not a reliable guide: you need to measure.
Here's a simple test of several different implementations:
import sys
from collections import Counter, defaultdict
from itertools import groupby
from operator import itemgetter
from timeit import timeit
L = [1,2,45,55,5,4,4,4,4,4,4,5456,56,6,7,67]
def max_occurrences_1a(seq=L):
    "dict iteritems"
    c = dict()
    for item in seq:
        c[item] = c.get(item, 0) + 1
    return max(c.iteritems(), key=itemgetter(1))
def max_occurrences_1b(seq=L):
    "dict items"
    c = dict()
    for item in seq:
        c[item] = c.get(item, 0) + 1
    return max(c.items(), key=itemgetter(1))
def max_occurrences_2(seq=L):
    "defaultdict iteritems"
    c = defaultdict(int)
    for item in seq:
        c[item] += 1
    return max(c.iteritems(), key=itemgetter(1))
def max_occurrences_3a(seq=L):
    "sort groupby generator expression"
    return max(((k, sum(1 for i in g)) for k, g in groupby(sorted(seq))), key=itemgetter(1))
def max_occurrences_3b(seq=L):
    "sort groupby list comprehension"
    return max([(k, sum(1 for i in g)) for k, g in groupby(sorted(seq))], key=itemgetter(1))
def max_occurrences_4(seq=L):
    "counter"
    return Counter(seq).most_common(1)[0]
versions = [max_occurrences_1a, max_occurrences_1b, max_occurrences_2, max_occurrences_3a, max_occurrences_3b, max_occurrences_4]
print sys.version, "\n"
for vers in versions:
    print vers.__doc__, vers(), timeit(vers, number=20000)
The results on my machine:
2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
dict iteritems (4, 6) 0.202214956284
dict items (4, 6) 0.208412885666
defaultdict iteritems (4, 6) 0.221301078796
sort groupby generator expression (4, 6) 0.383440971375
sort groupby list comprehension (4, 6) 0.402786016464
counter (4, 6) 0.564319133759
So it appears that the Counter solution is not the fastest. And, in this case at least, groupby is faster. defaultdict is good but you pay a little bit for its convenience; it's slightly faster to use a regular dict with a get.
What happens if the list is much bigger? Adding L *= 10000 to the test above and reducing the repeat count to 200:
dict iteritems (4, 60000) 10.3451900482
dict items (4, 60000) 10.2988479137
defaultdict iteritems (4, 60000) 5.52838587761
sort groupby generator expression (4, 60000) 11.9538850784
sort groupby list comprehension (4, 60000) 12.1327362061
counter (4, 60000) 14.7495789528
Now defaultdict is the clear winner. So perhaps the cost of the 'get' method and the loss of the inplace add adds up (an examination of the generated code is left as an exercise).
But with the modified test data, the number of unique item values did not change so presumably dict and defaultdict have an advantage there over the other implementations. So what happens if we use the bigger list but substantially increase the number of unique items? Replacing the initialization of L with:
LL = [1,2,45,55,5,4,4,4,4,4,4,5456,56,6,7,67]
L = []
for i in xrange(1,10001):
    L.extend(l * i for l in LL)
dict iteritems (2520, 13) 17.9935798645
dict items (2520, 13) 21.8974409103
defaultdict iteritems (2520, 13) 16.8289561272
sort groupby generator expression (2520, 13) 33.853593111
sort groupby list comprehension (2520, 13) 36.1303369999
counter (2520, 13) 22.626899004
So now Counter is clearly faster than the groupby solutions but still slower than the iteritems versions of dict and defaultdict.
The point of these examples isn't to produce an optimal solution. The point is that there often isn't one optimal general solution. Plus there are other performance criteria. The memory requirements will differ substantially among the solutions and, as the size of the input goes up, memory requirements may become the overriding factor in algorithm selection.
Bottom line: it all depends and you need to measure.
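For anyone who wants to repeat the measurement on Python 3, here is a reduced sketch of the same kind of test. The function names are invented for this sketch, `iteritems()` is replaced by `items()`, and no timings are shown because they depend on the machine and the Python version.

```python
from collections import Counter, defaultdict
from operator import itemgetter
from timeit import timeit

L = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]

def with_dict(seq=L):
    "plain dict with get"
    c = {}
    for item in seq:
        c[item] = c.get(item, 0) + 1
    return max(c.items(), key=itemgetter(1))

def with_defaultdict(seq=L):
    "defaultdict"
    c = defaultdict(int)
    for item in seq:
        c[item] += 1
    return max(c.items(), key=itemgetter(1))

def with_counter(seq=L):
    "Counter.most_common"
    return Counter(seq).most_common(1)[0]

for func in (with_dict, with_defaultdict, with_counter):
    # prints the description, the (item, count) result and the elapsed seconds
    print(func.__doc__, func(), timeit(func, number=20000))
```

Swapping in a larger or more varied `L` is the quickest way to see how the ranking shifts with the data.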
Here is a defaultdict solution that will work with Python versions 2.5 and above:
from collections import defaultdict
L = [1,2,45,55,5,4,4,4,4,4,4,5456,56,6,7,67]
d = defaultdict(int)
for i in L:
    d[i] += 1
result = max(d.iteritems(), key=lambda x: x[1])
print result
# (4, 6)
# The number 4 occurs 6 times
Note if L = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 7, 7, 7, 7, 7, 56, 6, 7, 67]
then there are six 4s and six 7s. However, the result will be (4, 6) i.e. six 4s.
If you're using Python 3.8 or above, you can use either statistics.mode() to return the first mode encountered or statistics.multimode() to return all the modes.
>>> import statistics
>>> data = [1, 2, 2, 3, 3, 4]
>>> statistics.mode(data)
2
>>> statistics.multimode(data)
[2, 3]
If the list is empty, statistics.mode() throws a statistics.StatisticsError and statistics.multimode() returns an empty list.
Note before Python 3.8, statistics.mode() (introduced in 3.4) would additionally throw a statistics.StatisticsError if there is not exactly one most common value.
A simple way without any libraries or sets
def mcount(l):
    n = [] #To store count of each elements
    for x in l:
        count = 0
        for i in range(len(l)):
            if x == l[i]:
                count+=1
        n.append(count)
    a = max(n) #largest in counts list
    for i in range(len(n)):
        if n[i] == a:
            return(l[i],a) #element,frequency
    return #if something goes wrong
Perhaps use the most_common() method of collections.Counter.
I obtained the best results with groupby from itertools module with this function using Python 3.5.2:
from itertools import groupby
a = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]
def occurrence():
    occurrence, num_times = 0, 0
    for key, values in groupby(a, lambda x : x):
        val = len(list(values))
        # compare against the best count so far, not against the stored key
        if val >= num_times:
            occurrence, num_times = key, val
    return occurrence, num_times
occurrence, num_times = occurrence()
print("%d occurred %d times which is the highest number of times" % (occurrence, num_times))
Output:
4 occurred 6 times which is the highest number of times
Test with timeit from timeit module.
I used this script for my test with number= 20000:
from itertools import groupby
def occurrence():
    a = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]
    occurrence, num_times = 0, 0
    for key, values in groupby(a, lambda x : x):
        val = len(list(values))
        # compare against the best count so far, not against the stored key
        if val >= num_times:
            occurrence, num_times = key, val
    return occurrence, num_times
if __name__ == '__main__':
    from timeit import timeit
    print(timeit("occurrence()", setup = "from __main__ import occurrence", number = 20000))
Output (The best one):
0.1893607140000313
I want to throw in another solution that looks nice and is fast for short lists.
def mc(seq=L):
    "max/count"
    max_element = max(seq, key=seq.count)
    return (max_element, seq.count(max_element))
You can benchmark this with the code provided by Ned Deily which will give you these results for the smallest test case:
3.5.2 (default, Nov 7 2016, 11:31:36)
[GCC 6.2.1 20160830]
dict iteritems (4, 6) 0.2069783889998289
dict items (4, 6) 0.20462976200065896
defaultdict iteritems (4, 6) 0.2095775119996688
sort groupby generator expression (4, 6) 0.4473949929997616
sort groupby list comprehension (4, 6) 0.4367636879997008
counter (4, 6) 0.3618192010007988
max/count (4, 6) 0.20328268999946886
But beware, it is inefficient and thus gets really slow for large lists!
Simple and best code:
def max_occ(lst,x):
    count=0
    for i in lst:
        if (i==x):
            count=count+1
    return count
lst=[1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]
x=max(lst,key=lst.count)
print(x,"occurs ",max_occ(lst,x),"times")
Output: 4 occurs 6 times
My (simple) code (three months studying Python):
def more_frequent_item(lst):
    new_lst = []
    times = 0
    for item in lst:
        count_num = lst.count(item)
        new_lst.append(count_num)
    times = max(new_lst)
    key = max(lst, key=lst.count)
    print("In the list: ")
    print(lst)
    print("The most frequent item is " + str(key) + ". Appears " + str(times) + " times in this list.")
more_frequent_item([1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67])
The output will be:
In the list:
[1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]
The most frequent item is 4. Appears 6 times in this list.
If you are already using numpy in your solution for faster computation, use this:
import numpy as np
x = np.array([2,5,77,77,77,77,77,77,77,9,0,3,3,3,3,3])
y = np.bincount(x,minlength = max(x))
y = np.argmax(y)
print(y) #outputs 77
Following is the solution which I came up with if there are multiple characters in the string all having the highest frequency.
mystr = input("enter string: ")
#define dictionary to store characters and their frequencies
mydict = {}
#get the unique characters
unique_chars = sorted(set(mystr),key = mystr.index)
#store the characters and their respective frequencies in the dictionary
for c in unique_chars:
    ctr = 0
    for d in mystr:
        if d != " " and d == c:
            ctr = ctr + 1
    mydict[c] = ctr
print(mydict)
#store the maximum frequency
max_freq = max(mydict.values())
print("the highest frequency of occurence: ",max_freq)
#print all characters with highest frequency
print("the characters are:")
for k,v in mydict.items():
    if v == max_freq:
        print(k)
Input: "hello people"
Output:
{'o': 2, 'p': 2, 'h': 1, ' ': 0, 'e': 3, 'l': 3}
the highest frequency of occurence: 3
the characters are:
e
l
Maybe something like this:
testList = [1, 2, 3, 4, 2, 2, 1, 4, 4]
print(max(set(testList), key = testList.count))