Related
So here is a code I have which attempts to add every element in a list irrespective of their lengths.
def elem_sum(lst1, lst2):
f_len = len(lst1) - (len(lst2) - 1)
for i in range(0, len(lst2), 1):
if f_len - i >= len(lst1):
break
else:
lst1[i] = lst1[i] + lst2[i]
return lst1
When certain inputs such as elem_sum([1, 2, 3], [10, 20]) are provided,
the output correctly returns [11, 22, 3]
While other inputs such as elem_sum([1, 2, 3], [10, 20, 30, 40])
the output returns error.
What should I change here to make sure my code works for any given set of inputs??
One option that may be worth considering is to make use of an existing library routine to give you pairs of numbers from lists of unequal lengths. The zip_longest function in the itertools package will do this, giving a sequence of tuples. After the end of one of the lists is reached, the missing values will be filled out using the specified fill value, which defaults to None but you can pass your own fill value.
For example,
from itertools import zip_longest
lst1 = [1, 2, 3]
lst2 = [10, 20, 30, 40]
for t in zip_longest(lst1, lst2, fillvalue=0):
print(t)
gives:
(1, 10)
(2, 20)
(3, 30)
(0, 40)
So you could just write your function to use a list comprehension that calculates the sum of each of these tuples:
from itertools import zip_longest
def elem_sum(lst1, lst2):
return [sum(t) for t in zip_longest(lst1, lst2, fillvalue=0)]
print(elem_sum([1, 2, 3], [10, 20, 30, 40]))
which gives:
[11, 22, 33, 40]
Provided that you are not using a very old version of Python, the itertools package is in the standard Python library, so you do not need to install any add-on packages.
You could just check the lengths and swap the lists if needed:
def elem_sum(lst1, lst2):
# Swap if needed
if len(lst1) < len(lst2):
lst1, lst2 = lst2, lst1
f_len = len(lst1) - (len(lst2) - 1)
for i in range(0, len(lst2), 1):
if f_len - i >= len(lst1):
break
else:
lst1[i] = lst1[i] + lst2[i]
return lst1
I have dictionary frequency as follows:
freq = {'a': 1, 'b': 2, 'c': 3}
It simply means that I have one a's, twob's, and three c's.
I would like to convert it into a complete list:
lst = ['a', 'b', 'b', 'c', 'c', 'c']
What is the fastest way (time-efficient) or most compact way (space-efficient) to do so?
Yes, but only if the items are (or can be represented as) integers, and if the number of items between the smallest and largest item is sufficiently close to the difference between the two, in which case you can use bucket sort, resulting in O(n) time complexity, where n is the difference between the smallest and the largest item. This would be more efficient than using other sorting algorithms, with an average time complexity of O(n log n).
In the case of List = [1, 4, 5, 2, 6, 7, 9, 3] as it is in your question, it is indeed more efficient to use bucket sort when it is known that 1 is the smallest item and 9 is the largest item, since only 8 is missing between the range. The following example uses collections.Counter to account for the possibility that there can be duplicates in the input list:
from collections import Counter
counts = Counter(List)
print(list(Counter({i: counts[i] for i in range(1, 10)}).elements()))
This outputs:
[1, 2, 3, 4, 5, 6, 7, 9]
Let's break this into two O(N) passes: one to catalog the numbers, and one to create the sorted list. I updated the variable names; List is an especially bad choice, given the built-in type list. I also added 10 to each value, so you can see how the low-end offset works.
coll = [11, 14, 15, 12, 16, 17, 19, 13]
last = 19
first = 11
offset = first
size = last-first+1
# Recognize all values in a dense "array"
need = [False] * size
for item in coll:
need[item - offset] = True
# Iterate again in numerical order; for each True value, add that item to the new list
sorted_list = [idx + offset for idx, needed_flag in enumerate(need) if needed_flag]
print(sorted_list)
OUTPUT:
[11, 12, 13, 14, 15, 16, 17, 19]
The most compact way I usually use is list comprehension -
lst = ['a', 'b', 'b', 'c', 'c', 'c']
freq = {i: 0 for i in lst}
for i in lst: freq[i] += 1
Space complexity - O(n)
Time complexity - O(n)
I'd like to do a random shuffle of a list but with one condition: an element can never be in the same original position after the shuffle.
Is there a one line way to do such in python for a list?
Example:
list_ex = [1,2,3]
each of the following shuffled lists should have the same probability of being sampled after the shuffle:
list_ex_shuffled = [2,3,1]
list_ex_shuffled = [3,1,2]
but the permutations [1,2,3], [1,3,2], [2,1,3] and [3,2,1] are not allowed since all of them repeat one of the elements positions.
NOTE: Each element in the list_ex is a unique id. No repetition of the same element is allowed.
Randomize in a loop and keep rejecting the results until your condition is satisfied:
import random
def shuffle_list(some_list):
randomized_list = some_list[:]
while True:
random.shuffle(randomized_list)
for a, b in zip(some_list, randomized_list):
if a == b:
break
else:
return randomized_list
I'd describe such shuffles as 'permutations with no fixed points'. They're also known as derangements.
The probability that a random permutation is a derangement is approximately 1/e (fun to prove). This is true however long the list. Thus an obvious algorithm to give a random derangement is to shuffle the cards normally, and keep shuffling until you have a derangement. The expected number of necessary shuffles is about 3, and it's rare you'll have to shuffle more than ten times.
(1-1/e)**11 < 1%
Suppose there are n people at a party, each of whom brought an umbrella. At the end of the party, each person takes an umbrella at random from the basket. What is the probability that no-one holds their own umbrella?
You could generate all possible valid shufflings:
>>> list_ex = [1,2,3]
>>> import itertools
>>> list(itertools.ifilter(lambda p: not any(i1==i2 for i1,i2 in zip(list_ex, p)),
... itertools.permutations(list_ex, len(list_ex))))
[(2, 3, 1), (3, 1, 2)]
For some other sequence:
>>> list_ex = [7,8,9,0]
>>> list(itertools.ifilter(lambda p: not any(i1==i2 for i1,i2 in zip(list_ex, p)),
... itertools.permutations(list_ex, len(list_ex))))
[(8, 7, 0, 9), (8, 9, 0, 7), (8, 0, 7, 9), (9, 7, 0, 8), (9, 0, 7, 8), (9, 0, 8, 7), (0, 7, 8, 9), (0, 9, 7, 8), (0, 9, 8, 7)]
You could also make this a bit more efficient by short-circuiting the iterator if you just want one result:
>>> list_ex = [1,2,3]
>>> i = itertools.ifilter(lambda p: not any(i1==i2 for i1,i2 in zip(list_ex, p)),
... itertools.permutations(list_ex, len(list_ex)))
>>> next(i)
(2, 3, 1)
But, it would not be a random choice. You'd have to generate all of them and choose one for it to be an actual random result:
>>> list_ex = [1,2,3]
>>> i = itertools.ifilter(lambda p: not any(i1==i2 for i1,i2 in zip(list_ex, p)),
... itertools.permutations(list_ex, len(list_ex)))
>>> import random
>>> random.choice(list(i))
(2, 3, 1)
Here is another take on this. You can pick one solution or another depending on your needs. This is not a one liner but shuffles the indices of elements instead of the elements themselves. Thus, the original list may have duplicate values or values of types that cannot be compared or may be expensive to compare.
#! /usr/bin/env python
import random
def shuffled_values(data):
list_length = len(data)
candidate = range(list_length)
while True:
random.shuffle(candidate)
if not any(i==j for i,j in zip(candidate, range(list_length))):
yield [data[i] for i in candidate]
list_ex = [1, 2, 3]
list_gen = shuffled_values(list_ex)
for i in range(0, 10):
print list_gen.next()
This gives:
[2, 3, 1]
[3, 1, 2]
[3, 1, 2]
[2, 3, 1]
[3, 1, 2]
[3, 1, 2]
[2, 3, 1]
[2, 3, 1]
[3, 1, 2]
[2, 3, 1]
If list_ex is [2, 2, 2], this method will keep yielding [2, 2, 2] over and over. The other solutions will give you empty lists. I am not sure what you want in this case.
Use Knuth-Durstenfeld to shuffle the list. As long as it is found to be in the original position during the shuffling process, a new shuffling process is started from the beginning until it returns to a qualified arrangement. The time complexity of this algorithm is the smallest constant term:
def _random_derangement(x: list, randint: Callable[[int, int], int]) -> None:
'''
Random derangement list x in place, and return None.
An element can never be in the same original position after the shuffle. provides uniform distribution over permutations.
The formal parameter randint requires a callable object such as rand_int(b, a) that generates a random integer within the specified closed interval.
'''
from collections import namedtuple
sequence_type = namedtuple('sequence_type', ('sequence_number', 'elem'))
x_length = len(x)
if x_length > 1:
for i in range(x_length):
x[i] = sequence_type(sequence_number = i, elem = x[i])
end_label = x_length - 1
while True:
for i in range(end_label, 0, -1):
random_location = randint(i, 0)
if x[random_location].sequence_number != i:
x[i], x[random_location] = x[random_location], x[i]
else:
break
else:
if x[0].sequence_number != 0: break
for i in range(x_length):
x[i] = x[i].elem
complete_shuffle
Here's another algorithm. Take cards at random. If your ith card is card i, put it back and try again. Only problem, what if when you get to the last card it's the one you don't want. Swap it with one of the others.
I think this is fair (uniformally random).
import random
def permutation_without_fixed_points(n):
if n == 1:
raise ArgumentError, "n must be greater than 1"
result = []
remaining = range(n)
i = 0
while remaining:
if remaining == [n-1]:
break
x = i
while x == i:
j = random.randrange(len(remaining))
x = remaining[j]
remaining.pop(j)
result.append(x)
i += 1
if remaining == [n-1]:
j = random.randrange(n-1)
result.append(result[j])
result[j] = n
return result
I want to learn python and i thought changing letters without any module or library i tried something like this but it doesn't work:
d=list('banana')
a=list('abcdefghijklmnopqrstuvwxyz')
for i in range:
d[i]=a[i+2]
print d
I got this error:
TypeError: 'builtin_function_or_method' object is not iterable
I would be appreciated if you help me.
You forgot to specify parameters for range function:
d=list('banana')
a=list('a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z')
for i in range(len(d)):
d[i]=a[i+2]
print d
From python documentation:
range(start, stop[, step]) This is a versatile function to create
lists containing arithmetic progressions. It is most often used in for
loops. The arguments must be plain integers. If the step argument is
omitted, it defaults to 1. If the start argument is omitted, it
defaults to 0. The full form returns a list of plain integers [start,
start + step, start + 2 * step, ...]. If step is positive, the last
element is the largest start + i * step less than stop; if step is
negative, the last element is the smallest start + i * step greater
than stop. step must not be zero (or else ValueError is raised).
Example:
>>>
>>> range(10) [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> range(1, 11) [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> range(0, 30, 5)
Edit per request:
d = list('banana')
a = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
mappings = dict((ch, a[idx+2]) for idx, ch in enumerate(set(d)))
for idx in range(len(d)):
d[idx] = mappings[d[idx]]
#OR:
d = [mappings[d[idx]] for idx in range(len(d))]
print d
In [63]: d=list('aabbcc')
In [64]: a='a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z'.split(",")
In [65]: for i,x in enumerate(d):
d[i]=a[(a.index(x)+3)%26]
In [66]: d
Out[66]: ['d', 'd', 'e', 'e', 'f', 'f']
string.translate is ideal for this ... Im not sure if that counts as a library ...
>>> import string
>>> tab = string.maketrans("abcdefghijklmnopqrstuvwxyz","mnopqrstuvwxyzabcdefghi
jkl")
>>> print "hello".translate(tab)
tqxxa
alternativly
>>> print "".join([chr(ord(c)+13) if ord(c) + 13 < ord('z') else chr(ord('a')+(ord(c)+13)%ord('z')) for c in "hello"])
'uryyc'
This question already has answers here:
Find the most common element in a list
(27 answers)
Closed 2 years ago.
In Python, I have a list:
L = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]
I want to identify the item that occurred the highest number of times. I am able to solve it but I need the fastest way to do so. I know there is a nice Pythonic answer to this.
I am surprised no-one has mentioned the simplest solution,max() with the key list.count:
max(lst,key=lst.count)
Example:
>>> lst = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]
>>> max(lst,key=lst.count)
4
This works in Python 3 or 2, but note that it only returns the most frequent item and not also the frequency. Also, in the case of a draw (i.e. joint most frequent item) only a single item is returned.
Although the time complexity of using max() is worse than using Counter.most_common(1) as PM 2Ring comments, the approach benefits from a rapid C implementation and I find this approach is fastest for short lists but slower for larger ones (Python 3.6 timings shown in IPython 5.3):
In [1]: from collections import Counter
...:
...: def f1(lst):
...: return max(lst, key = lst.count)
...:
...: def f2(lst):
...: return Counter(lst).most_common(1)
...:
...: lst0 = [1,2,3,4,3]
...: lst1 = lst0[:] * 100
...:
In [2]: %timeit -n 10 f1(lst0)
10 loops, best of 3: 3.32 us per loop
In [3]: %timeit -n 10 f2(lst0)
10 loops, best of 3: 26 us per loop
In [4]: %timeit -n 10 f1(lst1)
10 loops, best of 3: 4.04 ms per loop
In [5]: %timeit -n 10 f2(lst1)
10 loops, best of 3: 75.6 us per loop
from collections import Counter
most_common,num_most_common = Counter(L).most_common(1)[0] # 4, 6 times
For older Python versions (< 2.7), you can use this recipe to create the Counter class.
In your question, you asked for the fastest way to do it. As has been demonstrated repeatedly, particularly with Python, intuition is not a reliable guide: you need to measure.
Here's a simple test of several different implementations:
import sys
from collections import Counter, defaultdict
from itertools import groupby
from operator import itemgetter
from timeit import timeit
L = [1,2,45,55,5,4,4,4,4,4,4,5456,56,6,7,67]
def max_occurrences_1a(seq=L):
"dict iteritems"
c = dict()
for item in seq:
c[item] = c.get(item, 0) + 1
return max(c.iteritems(), key=itemgetter(1))
def max_occurrences_1b(seq=L):
"dict items"
c = dict()
for item in seq:
c[item] = c.get(item, 0) + 1
return max(c.items(), key=itemgetter(1))
def max_occurrences_2(seq=L):
"defaultdict iteritems"
c = defaultdict(int)
for item in seq:
c[item] += 1
return max(c.iteritems(), key=itemgetter(1))
def max_occurrences_3a(seq=L):
"sort groupby generator expression"
return max(((k, sum(1 for i in g)) for k, g in groupby(sorted(seq))), key=itemgetter(1))
def max_occurrences_3b(seq=L):
"sort groupby list comprehension"
return max([(k, sum(1 for i in g)) for k, g in groupby(sorted(seq))], key=itemgetter(1))
def max_occurrences_4(seq=L):
"counter"
return Counter(L).most_common(1)[0]
versions = [max_occurrences_1a, max_occurrences_1b, max_occurrences_2, max_occurrences_3a, max_occurrences_3b, max_occurrences_4]
print sys.version, "\n"
for vers in versions:
print vers.__doc__, vers(), timeit(vers, number=20000)
The results on my machine:
2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
dict iteritems (4, 6) 0.202214956284
dict items (4, 6) 0.208412885666
defaultdict iteritems (4, 6) 0.221301078796
sort groupby generator expression (4, 6) 0.383440971375
sort groupby list comprehension (4, 6) 0.402786016464
counter (4, 6) 0.564319133759
So it appears that the Counter solution is not the fastest. And, in this case at least, groupby is faster. defaultdict is good but you pay a little bit for its convenience; it's slightly faster to use a regular dict with a get.
What happens if the list is much bigger? Adding L *= 10000 to the test above and reducing the repeat count to 200:
dict iteritems (4, 60000) 10.3451900482
dict items (4, 60000) 10.2988479137
defaultdict iteritems (4, 60000) 5.52838587761
sort groupby generator expression (4, 60000) 11.9538850784
sort groupby list comprehension (4, 60000) 12.1327362061
counter (4, 60000) 14.7495789528
Now defaultdict is the clear winner. So perhaps the cost of the 'get' method and the loss of the inplace add adds up (an examination of the generated code is left as an exercise).
But with the modified test data, the number of unique item values did not change so presumably dict and defaultdict have an advantage there over the other implementations. So what happens if we use the bigger list but substantially increase the number of unique items? Replacing the initialization of L with:
LL = [1,2,45,55,5,4,4,4,4,4,4,5456,56,6,7,67]
L = []
for i in xrange(1,10001):
L.extend(l * i for l in LL)
dict iteritems (2520, 13) 17.9935798645
dict items (2520, 13) 21.8974409103
defaultdict iteritems (2520, 13) 16.8289561272
sort groupby generator expression (2520, 13) 33.853593111
sort groupby list comprehension (2520, 13) 36.1303369999
counter (2520, 13) 22.626899004
So now Counter is clearly faster than the groupby solutions but still slower than the iteritems versions of dict and defaultdict.
The point of these examples isn't to produce an optimal solution. The point is that there often isn't one optimal general solution. Plus there are other performance criteria. The memory requirements will differ substantially among the solutions and, as the size of the input goes up, memory requirements may become the overriding factor in algorithm selection.
Bottom line: it all depends and you need to measure.
Here is a defaultdict solution that will work with Python versions 2.5 and above:
from collections import defaultdict
L = [1,2,45,55,5,4,4,4,4,4,4,5456,56,6,7,67]
d = defaultdict(int)
for i in L:
d[i] += 1
result = max(d.iteritems(), key=lambda x: x[1])
print result
# (4, 6)
# The number 4 occurs 6 times
Note if L = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 7, 7, 7, 7, 7, 56, 6, 7, 67]
then there are six 4s and six 7s. However, the result will be (4, 6) i.e. six 4s.
If you're using Python 3.8 or above, you can use either statistics.mode() to return the first mode encountered or statistics.multimode() to return all the modes.
>>> import statistics
>>> data = [1, 2, 2, 3, 3, 4]
>>> statistics.mode(data)
2
>>> statistics.multimode(data)
[2, 3]
If the list is empty, statistics.mode() throws a statistics.StatisticsError and statistics.multimode() returns an empty list.
Note before Python 3.8, statistics.mode() (introduced in 3.4) would additionally throw a statistics.StatisticsError if there is not exactly one most common value.
A simple way without any libraries or sets
def mcount(l):
n = [] #To store count of each elements
for x in l:
count = 0
for i in range(len(l)):
if x == l[i]:
count+=1
n.append(count)
a = max(n) #largest in counts list
for i in range(len(n)):
if n[i] == a:
return(l[i],a) #element,frequency
return #if something goes wrong
Perhaps the most_common() method
I obtained the best results with groupby from itertools module with this function using Python 3.5.2:
from itertools import groupby
a = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]
def occurrence():
occurrence, num_times = 0, 0
for key, values in groupby(a, lambda x : x):
val = len(list(values))
if val >= occurrence:
occurrence, num_times = key, val
return occurrence, num_times
occurrence, num_times = occurrence()
print("%d occurred %d times which is the highest number of times" % (occurrence, num_times))
Output:
4 occurred 6 times which is the highest number of times
Test with timeit from timeit module.
I used this script for my test with number= 20000:
from itertools import groupby
def occurrence():
a = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]
occurrence, num_times = 0, 0
for key, values in groupby(a, lambda x : x):
val = len(list(values))
if val >= occurrence:
occurrence, num_times = key, val
return occurrence, num_times
if __name__ == '__main__':
from timeit import timeit
print(timeit("occurrence()", setup = "from __main__ import occurrence", number = 20000))
Output (The best one):
0.1893607140000313
I want to throw in another solution that looks nice and is fast for short lists.
def mc(seq=L):
"max/count"
max_element = max(seq, key=seq.count)
return (max_element, seq.count(max_element))
You can benchmark this with the code provided by Ned Deily which will give you these results for the smallest test case:
3.5.2 (default, Nov 7 2016, 11:31:36)
[GCC 6.2.1 20160830]
dict iteritems (4, 6) 0.2069783889998289
dict items (4, 6) 0.20462976200065896
defaultdict iteritems (4, 6) 0.2095775119996688
sort groupby generator expression (4, 6) 0.4473949929997616
sort groupby list comprehension (4, 6) 0.4367636879997008
counter (4, 6) 0.3618192010007988
max/count (4, 6) 0.20328268999946886
But beware, it is inefficient and thus gets really slow for large lists!
Simple and best code:
def max_occ(lst,x):
count=0
for i in lst:
if (i==x):
count=count+1
return count
lst=[1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]
x=max(lst,key=lst.count)
print(x,"occurs ",max_occ(lst,x),"times")
Output: 4 occurs 6 times
My (simply) code (three months studying Python):
def more_frequent_item(lst):
new_lst = []
times = 0
for item in lst:
count_num = lst.count(item)
new_lst.append(count_num)
times = max(new_lst)
key = max(lst, key=lst.count)
print("In the list: ")
print(lst)
print("The most frequent item is " + str(key) + ". Appears " + str(times) + " times in this list.")
more_frequent_item([1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67])
The output will be:
In the list:
[1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]
The most frequent item is 4. Appears 6 times in this list.
if you are using numpy in your solution for faster computation use this:
import numpy as np
x = np.array([2,5,77,77,77,77,77,77,77,9,0,3,3,3,3,3])
y = np.bincount(x,minlength = max(x))
y = np.argmax(y)
print(y) #outputs 77
Following is the solution which I came up with if there are multiple characters in the string all having the highest frequency.
mystr = input("enter string: ")
#define dictionary to store characters and their frequencies
mydict = {}
#get the unique characters
unique_chars = sorted(set(mystr),key = mystr.index)
#store the characters and their respective frequencies in the dictionary
for c in unique_chars:
ctr = 0
for d in mystr:
if d != " " and d == c:
ctr = ctr + 1
mydict[c] = ctr
print(mydict)
#store the maximum frequency
max_freq = max(mydict.values())
print("the highest frequency of occurence: ",max_freq)
#print all characters with highest frequency
print("the characters are:")
for k,v in mydict.items():
if v == max_freq:
print(k)
Input: "hello people"
Output:
{'o': 2, 'p': 2, 'h': 1, ' ': 0, 'e': 3, 'l': 3}
the highest frequency of occurence: 3
the characters are:
e
l
may something like this:
testList = [1, 2, 3, 4, 2, 2, 1, 4, 4]
print(max(set(testList), key = testList.count))