Why am I exceeding the time limit on the Top K Frequent question [LEETCODE]? - Python

I have the following code for LeetCode's Top K Frequent Elements question.
The allowed time complexity is smaller than O(n log n), where n is the array size.
Isn't my big-O complexity O(n)?
If so, why am I still exceeding the time limit?
def topKFrequent(self, nums, k):
    output = {}
    outlist = []
    for item in nums:
        output[item] = nums.count(item)
    max_count = sorted(output.values(), reverse=True)[:k]
    for key, val in output.items():
        if val in max_count:
            outlist.append(key)
    return outlist
Test input: nums = [1,1,1,2,2,3,1,1,1,2,2,3], k = 2
Test output: [1, 2]
Question link: https://leetcode.com/problems/top-k-frequent-elements/

Your solution is O(n^2), because of this:
for item in nums:
    output[item] = nums.count(item)
For each item in your array, you're looking through the whole array to count the number of elements which are the same.
Instead of doing this, you can get the counts in O(n) by iterating nums and adding 1 to the counter of each item you find as you go.
The O(n log n) at the end will come from sorted(output.values(), reverse=True), because any comparison-based sorting algorithm (including Timsort) is O(n log n) in the worst case.
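As an illustration, here is a minimal sketch of the original code with only the counting step replaced by the single O(n) pass described above (everything else is unchanged):

def topKFrequent(self, nums, k):
    output = {}
    outlist = []
    # one pass: increment a per-element counter instead of calling nums.count
    for item in nums:
        output[item] = output.get(item, 0) + 1
    # the remaining O(n log n) cost comes from this sort
    max_count = sorted(output.values(), reverse=True)[:k]
    for key, val in output.items():
        if val in max_count:
            outlist.append(key)
    return outlist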

As another answer mentions, your counting is O(n^2) time complexity, which is what causes your time limit exceeded. Fortunately, Python comes with a Counter class in the collections module, which does exactly what the other answer describes, but in well-optimized C code. This reduces your time complexity to O(n log n).
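As a rough sketch of that O(n log n) version (the function name here is just illustrative), you count with Counter and then sort the distinct values by their counts:

from collections import Counter

def top_k_frequent_sorted(nums, k):
    counts = Counter(nums)  # O(n) counting
    # sort the distinct values by count, descending: O(m log m), with m <= n distinct values
    return sorted(counts, key=counts.get, reverse=True)[:k]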
Furthermore, you can reduce your time complexity to O(n log k) by replacing the sort call with a min-heap trick. Keep a min-heap of size k: push each remaining element and pop the minimum, so every element gets inserted at some point. The k elements left in the heap are your k most frequent values.
from collections import Counter
from heapq import heappushpop, heapify

def get_most_frequent(nums, k):
    counts = Counter(nums)
    # build (count, value) pairs so the heap is ordered by count
    counts = [(freq, num) for num, freq in counts.items()]
    # seed the heap with the first k pairs
    heap = counts[:k]
    heapify(heap)
    # push each remaining pair and pop the smallest, keeping the k largest counts
    for count in counts[k:]:
        heappushpop(heap, count)
    return [num for freq, num in heap]
If you need the elements returned in a particular order, you can sort those k elements in O(k log k) time, which still results in the same O(n log k) time complexity overall.
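A small usage sketch with the example input from the question (the exact ordering of the returned list is whatever the heap happens to hold):

nums = [1, 1, 1, 2, 2, 3, 1, 1, 1, 2, 2, 3]
print(get_most_frequent(nums, 2))  # e.g. [2, 1]
# if the most frequent element should come first, sort the heap pairs inside the
# function before extracting values: return [num for freq, num in sorted(heap, reverse=True)]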

Related

How can I know the time complexity of my Python code?

How can I know the time complexity of this code? The point of the program is: given a list of integers and a single sum value, return the first two values (parsing from the left) in order of appearance that add up to form the sum.
If there are two or more pairs with the required sum, the pair whose second element has the smallest index is the solution. More info here: https://www.codewars.com/kata/54d81488b981293527000c8f/train/python/6330a6a6c68f64676db5d0b7
def sum_pairs(arr, suma):  ## possible stack solution
    pairs = {n: suma - n for n in set(arr)}
    unsolved = set()
    solved = set()
    for i, n in enumerate(arr):
        if pairs[n] in unsolved:
            solved.add((pairs[n], n, i))
        if n not in unsolved:
            unsolved.add(n)
    if solved == set():
        return None
    else:
        solution = min(solved, key=lambda x: x[2])[:2]
        return list(solution)

LeetCode 347 Solution Time Complexity Calculation

I have been following NeetCode and working on several LeetCode problems. Here on 347 he advises that his solution is O(n), but I am having a hard time breaking the solution down to see why that is. I feel like it is because the nested for loop only runs until len(answer) == k.
I started off by getting the time complexity of each individual line, and the first several are O(n) or O(1), which makes sense. Once I got to the nested for loop I was thrown off, because it would make sense for the inner loop to run on each outer-loop iteration and result in O(n*m), or something of that nature. This is where I think the return condition comes in as the limiting factor, acting as a ceiling on the loop iterations, since we always return once the answer list's length equals k (also, the problem guarantees a unique answer). What throws me off the most is that I default to assuming any nested for loop is O(n^2), which is common, but evidently not the case every time.
Could someone please advise whether I am on the right track with this reasoning? How would you break it down?
from typing import List

class Solution:
    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
        countDict = {}
        frequency = [[] for i in range(len(nums) + 1)]
        for j in nums:
            countDict[j] = 1 + countDict.get(j, 0)
        for c, v in countDict.items():
            frequency[v].append(c)
        answer = []
        for n in range(len(frequency) - 1, 0, -1):
            for q in frequency[n]:
                print(frequency[n])
                answer.append(q)
                if len(answer) == k:
                    return answer
frequency is a mapping between frequency-of-elements and the values in the original list. The total number of elements inside frequency is always less than or equal to the number of items in nums (because it maps the unique values in nums).
Even though there is a nested loop, it only ever iterates over O(C*n) total values, where C <= 1, which is equal to O(n).
Note, you could clean up this answer fairly easily with some basic helpers. Counter can get you the original mapping for countDict. You can use a defaultdict to construct frequency. Then you can flatten the frequency dict values and slice the final result.
from collections import Counter, defaultdict

class Solution:
    def top_k_frequent(self, nums: list[int], k: int) -> list[int]:
        counter = Counter(nums)
        frequency = defaultdict(list)
        for num, freq in counter.items():
            frequency[freq].append(num)
        sorted_nums = []
        for freq in sorted(frequency, reverse=True):
            sorted_nums += frequency[freq]
        return sorted_nums[:k]
A fun way to do this in a one liner!
class Solution:
    def top_k_frequent(self, nums: list[int], k: int) -> list[int]:
        return sorted(set(nums), key=Counter(nums).get, reverse=True)[:k]

Most efficient way to iterate over a large array looking for a missing element in Python

I was trying an online test. The test asked me to write a function that, given a list of up to 100000 integers in the range 1 to 100000, would find the first missing integer.
For example, if the list is [1,4,5,2] the output should be 3.
I iterated over the list as follows:
def find_missing(num):
    for i in range(1, 100001):
        if i not in num:
            return i
The feedback I received is that the code is not efficient at handling big lists.
I am quite new and I could not find an answer: how can I iterate more efficiently?
The first improvement would be to make yours linear by using a set for the repeated membership test:
def find_missing(nums):
    s = set(nums)
    for i in range(1, 100001):
        if i not in s:
            return i
Given how C-optimized Python's sorting is, you could also do something like:
def find_missing(nums):
    s = sorted(set(nums))
    return next(i for i, n in enumerate(s, 1) if i != n)
But both of these are fairly space inefficient as they create a new collection. You can avoid that with an in-place sort:
from itertools import groupby

def find_missing(nums):
    nums.sort()  # in-place
    return next(i for i, (k, _) in enumerate(groupby(nums), 1) if i != k)
For any range of numbers, the sum is given by Gauss's formula:
# sum of all numbers up to and including nums[-1] minus
# sum of all numbers up to but not including nums[-1]
expected = nums[-1] * (nums[-1] + 1) // 2 - nums[0] * (nums[0] - 1) // 2
If a number is missing, the actual sum will be
actual = sum(nums)
The difference is the missing number:
result = expected - actual
This computation is O(n), which is as efficient as you can get: expected is an O(1) computation, while actual has to actually add up the elements.
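Pulled together as a minimal sketch (the function name is illustrative; as in the snippets above, this assumes the list is sorted and exactly one value between its first and last element is missing):

def find_missing_gauss(nums):
    # sum of the full consecutive range minus the sum actually present
    expected = nums[-1] * (nums[-1] + 1) // 2 - nums[0] * (nums[0] - 1) // 2
    actual = sum(nums)  # O(n)
    return expected - actual

print(find_missing_gauss([1, 2, 4, 5]))  # 3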
A somewhat slower but similar complexity approach would be to step along the sequence in lockstep with either a range or itertools.count:
for a, e in zip(nums, range(nums[0], len(nums) + nums[0])):
    if a != e:
        return e  # or break if not in a function
Notice the difference between a single comparison a != e, vs a linear containment check like e in nums, which has to iterate on average through half of nums to get the answer.
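Wrapped into a function using the itertools.count form mentioned above (the function name is illustrative; again this assumes a sorted list):

from itertools import count

def find_missing_lockstep(nums):
    # walk the list and the expected consecutive sequence together
    for a, e in zip(nums, count(nums[0])):
        if a != e:
            return e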
You can use Counter to count every occurrence in your list. The minimum number with an occurrence count of zero will be your output. For example:
from collections import Counter

def find_missing(your_list):
    count = Counter(your_list)
    keys = count.keys()  # every distinct element that appears in the list
    main_list = list(range(1, 100001))  # the values from 1 to 100000
    missing_numbers = list(set(main_list) - set(keys))
    your_output = min(missing_numbers)
    return your_output

How to find the most common string(s) in a Python list?

I am dealing with ancient DNA data. I have an array with n different base pair calls for a given coordinate.
e.g.,
['A','A','C','C','G']
I need to set up a bit in my script whereby the most frequent call(s) are identified. If there is one, it should use that one. If there are two (or three) that are tied (e.g., A and C here), it needs to randomly pick one of them.
I have been looking for a solution but cannot find anything satisfactory. The most frequently suggested solution I see is Counter, but Counter alone is not enough for me, as c.most_common(1) will not reveal that the first and second most common elements are tied.
You can get the maximum count from the mapping returned by Counter with the max function first, and then use a list comprehension to output only the keys whose counts equal that maximum. Since Counter, max, and the list comprehension all cost linear time, the overall time complexity of the code is kept at O(n):
from collections import Counter
import random
lst = ['A','A','C','C','G']
counts = Counter(lst)
greatest = max(counts.values())
print(random.choice([item for item, count in counts.items() if count == greatest]))
This outputs either A or C.
Something like this would work:
import random

string = ['A', 'A', 'C', 'C', 'G']
dct = {}
for x in set(string):
    dct[x] = string.count(x)
max_value = max(dct.values())
lst = []
for key, value in dct.items():
    if value == max_value:
        lst.append(key)
print(random.choice(lst))

O(n log n) Running Time Approach

The function consumes a list of ints and produces the unique elements of the list in increasing order. For example:
singles([4,1,4,17,1]) => [1,4,17]
I can only do it in O(n^2) running time and wonder how to change it to O(n) running time without a loop.
def singles(lst):
    if lst == []:
        return []
    else:
        rest_fn = list(filter(lambda x: x != lst[0], lst[1:]))
        return [lst[0]] + singles(rest_fn)
As discussed above, per https://wiki.python.org/moin/TimeComplexity (which is cited from "Time complexity of python set operations?" and links to a more detailed list of operations), sorted has time complexity O(n log n), and building a set has time complexity O(n). Therefore, doing sorted(set(input)) has time complexity O(n) + O(n log n) = O(n log n).
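As a minimal sketch of that approach (assuming set and sorted are allowed):

def singles(lst):
    # O(n) to collect the unique values, O(n log n) to sort them
    return sorted(set(lst))

print(singles([4, 1, 4, 17, 1]))  # [1, 4, 17]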
Edit:
If you can't use set, you should mention that, but as a hint: assuming you can use sorted, you can still pull out the uniques in O(n) if you use a deque (which has O(1) worst-case insertion). Something like:
from collections import deque

rez = deque()
last = None
for val in sorted(input):  # input is the list from the question
    if val != last:
        rez.append(val)  # append adds to the end of the deque
        last = val
