Contains Duplicate II solution exceeds time limit - Python

I've written a Python solution for the Contains Duplicate II LeetCode problem, but when I test it I get a "Time Limit Exceeded" message. I'm confused, because I thought my solution runs in O(n) time. Could someone please explain?
def containsNearbyDuplicate(self, nums, k):
    """
    :type nums: List[int]
    :type k: int
    :rtype: bool
    """
    for i in range(len(nums)):
        lookingfor = nums[i]
        rest = nums[i+1:]
        if lookingfor in rest:
            secondindex = rest.index(lookingfor) + i + 1
            if abs(i - secondindex) <= k:
                return True
    return False

In general, using in to search for an element in a list takes linear time.
Applying this to your code: the in check on rest (and the slice nums[i+1:] that builds it) takes O(len(nums)) time, and you repeat that work O(len(nums)) times. That leads to a quadratic runtime overall, which is why your submission gets TLE.
To get a linear runtime, use a dictionary:
class Solution:
    def containsNearbyDuplicate(self, nums, k):
        seen = {}
        for index, element in enumerate(nums):
            if element in seen and index - seen[element] <= k:
                return True
            seen[element] = index
        return False
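For illustration, here are a couple of hypothetical sample calls to the dictionary version (the inputs are assumed examples, not from the original post):

s = Solution()
print(s.containsNearbyDuplicate([1, 2, 3, 1], 3))        # True: the two 1s are 3 apart, and 3 <= k
print(s.containsNearbyDuplicate([1, 2, 3, 1, 2, 3], 2))  # False: equal values are always 3 apart, and 3 > k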


LeetCode 347 Solution Time Complexity Calculation

I have been following NeetCode and working on several LeetCode problems. For problem 347 he says his solution is O(n), but I am having a hard time breaking the solution down to see why that is. I feel it is because the nested for loop only runs until len(answer) == k.
I started by working out the time complexity of each individual line, and the first several are O(n) or O(1), which makes sense. Once I got to the nested for loop I was thrown off, because it would seem that the inner loop runs for each outer-loop iteration and results in O(n*m), or something of that nature. This is where I think the early-return condition acts as the ceiling on the loop iterations, since we always return once the answer list reaches length k (and the problem guarantees the answer is unique). What throws me off the most is that I default to assuming any nested for loop is O(n^2), which is often the case, but apparently not every time.
Could someone please advise if I am on the right track with my suggestion? How would you break this down?
class Solution:
    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
        countDict = {}
        frequency = [[] for i in range(len(nums)+1)]
        for j in nums:
            countDict[j] = 1 + countDict.get(j, 0)
        for c, v in countDict.items():
            frequency[v].append(c)
        answer = []
        for n in range(len(frequency)-1, 0, -1):
            for q in frequency[n]:
                print(frequency[n])
                answer.append(q)
                if len(answer) == k:
                    return answer
frequency is a mapping from occurrence count to the values in the original list that occur that many times. The total number of elements stored across all of frequency's buckets is always less than or equal to the number of items in nums (because it only holds the unique values of nums).
So even though there is a nested loop, the inner loop bodies run at most n times in total across the whole scan, which keeps that pass O(n).
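To make that concrete, here is a small hypothetical walk-through of what the buckets look like (the input is an assumed example):

# Hypothetical input: nums = [1, 1, 1, 2, 2, 3], k = 2
# countDict -> {1: 3, 2: 2, 3: 1}
# frequency -> [[], [3], [2], [1], [], [], []]   # index = occurrence count
# Walking frequency from the back visits 1 (count 3), then 2 (count 2),
# at which point len(answer) == k and [1, 2] is returned.
# Across all buckets there are only 3 stored values (one per distinct element),
# so even without the early return the nested loops do O(n) total work.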
Note, you could clean up this answer fairly easily with some basic helpers. Counter can get you the original mapping for countDict. You can use a defaultdict to construct frequency. Then you can flatten the frequency dict values and slice the final result.
from collections import Counter, defaultdict

class Solution:
    def top_k_frequent(self, nums: list[int], k: int) -> list[int]:
        counter = Counter(nums)
        frequency = defaultdict(list)
        for num, freq in counter.items():
            frequency[freq].append(num)
        sorted_nums = []
        for freq in sorted(frequency, reverse=True):
            sorted_nums += frequency[freq]
        return sorted_nums[:k]
A fun way to do this in a one liner!
class Solution:
    def top_k_frequent(self, nums: list[int], k: int) -> list[int]:
        return sorted(set(nums), key=Counter(nums).get, reverse=True)[:k]
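A quick hypothetical check of the one-liner; note that, unlike the bucket approach above, it sorts the distinct values, so it is O(n log n) rather than O(n):

from collections import Counter

nums = [1, 1, 1, 2, 2, 3]
print(sorted(set(nums), key=Counter(nums).get, reverse=True)[:2])  # [1, 2]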

Memoized solution to Combination IV on Leetcode gives TLE when an array is used for caching

While trying to solve Combination IV on Leetcode, I came up with this memoized solution:
def recurse(nums, target, dp):
    if dp[target] != 0:
        return dp[target]
    if target == 0:
        return dp[0]
    for n in nums:
        if n <= target:
            dp[target] += recurse(nums, target-n, dp)
    return dp[target]

class Solution:
    def combinationSum4(self, nums: List[int], target: int) -> int:
        dp = [0]*(target+1)
        dp[0] = 1
        return recurse(nums, target, dp)
But this gives me a Time Limit Exceeded error.
Another memoized solution, which uses a dictionary to cache values instead of a dp array, runs fine and does not exceed the time limit. The solution is as follows:
class Solution:
    def combinationSum4(self, nums: List[int], target: int) -> int:
        memo = {}
        def dfs(nums, t, memo):
            if t in memo:
                return memo[t]
            if t == 0:
                return 1
            if t < 0:
                return 0
            res = 0
            for i in nums:
                res += dfs(nums, t-i, memo)
            memo[t] = res
            return res
        return dfs(nums, target, memo)
Why does using a dict instead of an array improve the runtime? It is not as if we are iterating through the array or dict; we are only using them to store and look up values.
EDIT: The test case on which my code crashed is as follows:
nums = [10,20,30,40,50,60,70,80,90,100,110,120,130,140,150,160,170,180,190,200,210,220,230,240,250,260,270,280,290,300,310,320,330,340,350,360,370,380,390,400,410,420,430,440,450,460,470,480,490,500,510,520,530,540,550,560,570,580,590,600,610,620,630,640,650,660,670,680,690,700,710,720,730,740,750,760,770,780,790,800,810,820,830,840,850,860,870,880,890,900,910,920,930,940,950,960,970,980,990,111]
target = 999
The two versions of the code are not the same. In the list version, you keep recursing if your "cached" value is 0. In the dict version, you keep recursing if the current key is not in the cache. This makes a difference when the result is 0. For example, if you try an example with nums=[2, 4, 6, 8, 10] and total=1001, there is no useful caching done in the list version (because every result is 0).
You can improve your list version by initializing every entry to None rather than 0, and using None as a sentinel value to determine if the result isn't cached.
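A minimal sketch of that fix, keeping your recursive structure but using None as the "not computed yet" marker (this is only an illustration of the suggestion above):

from typing import List

def recurse(nums, target, dp):
    if dp[target] is not None:   # None means "not computed yet"; 0 is now a valid cached result
        return dp[target]
    total = 0
    for n in nums:
        if n <= target:
            total += recurse(nums, target - n, dp)
    dp[target] = total
    return total

class Solution:
    def combinationSum4(self, nums: List[int], target: int) -> int:
        dp = [None] * (target + 1)
        dp[0] = 1
        return recurse(nums, target, dp)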
It's also easier to drop the idea of a cache, and use a dynamic programming table directly. For example:
def ways(total, nums):
    r = [1] + [0] * total
    for i in range(1, total+1):
        r[i] = sum(r[i-n] for n in nums if i-n >= 0)
    return r[total]
This obviously runs in O(total * len(nums)) time.
It's not necessary here (since total is at most 1000 in the question), but you can in principle keep only the last N rows of the table as you iterate (where N is the max of the nums). That reduces the space used from O(total) to O(max(nums)).
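A hedged sketch of that space optimization, using a rolling buffer of size max(nums) + 1 (again, not needed for the actual constraints):

def ways_rolling(total, nums):
    m = max(nums)
    # r[i % (m + 1)] holds the count for amount i; older entries get overwritten,
    # which is safe because step i only looks back at amounts i-1 .. i-m.
    r = [0] * (m + 1)
    r[0] = 1
    for i in range(1, total + 1):
        r[i % (m + 1)] = sum(r[(i - n) % (m + 1)] for n in nums if i - n >= 0)
    return r[total % (m + 1)]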

No output from algorithm for Two Sum problem in Leetcode

The code I have written that aims to solve the Two Sum problem:
def twoSum(self, nums: List[int], target: int) -> List[int]:
    dict = {}
    for i in range(len(nums)):
        complement = target - nums[i]
        if complement in dict:
            return [dict[complement], i]
        dict[complement] = i
I have just started practicing on LeetCode and I am experiencing issues with solving the Two Sum problem.
The problem statement:
Given an array of integers nums and an integer target, return indices
of the two numbers such that they add up to target.
You may assume that each input would have exactly one solution, and
you may not use the same element twice.
You can return the answer in any order.
My reasoning is to create a dictionary and iterate through the numbers; for each number I generate its complement, which I then look for in my dictionary. If it is in fact there, then I return the index that generated that complement and the current index i. Otherwise, I insert the complement as a key with the current index.
Somehow my function does not output anything, just two empty brackets. Below is a sample input and the correct output.
Input: nums = [3,2,4], target = 6
Output: [1,2]
The last line is wrong. It should read dict[nums[i]] = i, because you are storing indices under their values. Here is the entire function, with a better variable name that doesn't shadow the built-in type:
def twoSum(self, nums, target):
    dct = {}
    for i in range(len(nums)):
        complement = target - nums[i]
        if complement in dct:
            return [dct[complement], i]
        dct[nums[i]] = i
Or, more concisely, using enumerate and storing indices under their complement values:
def twoSum(self, nums, target):
    dct = {}
    for i, num in enumerate(nums):
        if num in dct:
            return [dct[num], i]
        dct[target - num] = i
You may notice that you had a mixture of the two approaches: you looked for the complement in dct, but also stored the complement for the current index. One of the two needs to use the current value itself.
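To tie this back to the sample input from the question, here is a quick check of the first corrected version (assuming the method is placed on a LeetCode-style Solution class):

print(Solution().twoSum([3, 2, 4], 6))   # [1, 2]
# Trace: i = 0 stores dct[3] = 0; i = 1 stores dct[2] = 1;
# i = 2 has complement 6 - 4 = 2, which is in dct, so [dct[2], 2] == [1, 2] is returned.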

simplify code in depth first search function call

I am one of the many hanging around Stack Overflow for knowledge and help, especially those of us who are already out of school. Much of my CS knowledge comes from this excellent site. Sometimes my questions can be quite silly; please forgive me as a newbie.
I am working on the Largest Divisible Subset problem on LeetCode. There are many good solutions there, but I want to try solving it with my own approach first. My strategy is to turn this problem into a combination problem and find the largest combination that meets the divisibility requirement. I use a depth-first search together with isDivisible to generate such combinations. All the combinations I find meet the divisibility requirement.
Here is how I would generate all possible combinations of a given sequence:
def combinations(nums, path, res):
    if not nums:
        res.append(path)
    for i in range(len(nums)):
        combinations(nums[i+1:], path+[nums[i]], res)
Following is my code to generate all possible divisible subsets. The code is almost exactly the same as the above, except that I add isDivisible to decide whether or not to add nums[i] to the path.
def isDivisible(num, list_):
    return all([num % item == 0 or item % num == 0 for item in list_])

def dfs(nums, path, res):
    if not nums:
        res.append(path)
        return
    for i in range(len(nums)):
        # if not path or isDivisible(nums[i], path):
        #     path = path + [nums[i]]
        # dfs(nums[i+1:], path, res)
        dfs(nums[i+1:], path + ([nums[i]] if not path or isDivisible(nums[i], path) else []), res)

# (these lines live inside my largestDivisibleSubset solution method)
path = []
res = []
dfs(nums, [], res)
return sorted(res, key=len)
It works fine (it almost got accepted, but exceeded the time limit for large inputs) because of the performance of the dfs. My question here is how I can simplify the last line of code in dfs by moving ([nums[i]] if not path or isDivisible(nums[i], path) else []) out of the function call, which is too bulky inside a call. I tried to replace that last line with the three commented-out lines, but it failed, because path would then carry every nums[i] that meets the condition into the following dfs calls. Could you please show me how to simplify the code and give some general suggestions? Thank you very much.
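One hedged way to get the three-line shape without that bug is to bind the extended path to a fresh local name inside the loop instead of rebinding path itself; a sketch of that idea, keeping dfs otherwise unchanged:

def dfs(nums, path, res):
    if not nums:
        res.append(path)
        return
    for i in range(len(nums)):
        # A new name per iteration, so later iterations still see the original path.
        if not path or isDivisible(nums[i], path):
            next_path = path + [nums[i]]
        else:
            next_path = path
        dfs(nums[i+1:], next_path, res)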
I'm not sure about your method; check first to see whether it would get accepted.
Here is a solution that is a bit simpler to implement (which I guess would be one of Stefan's suggested methods):
class Solution:
    def largestDivisibleSubset(self, nums):
        hashset = {-1: set()}
        for num in sorted(nums):
            hashset[num] = max((hashset[k] for k in hashset if num % k == 0), key=len) | {num}
        return list(max(hashset.values(), key=len))
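A hypothetical sample run of that solution (the element order of the returned list may vary, since it comes from a set):

print(Solution().largestDivisibleSubset([1, 2, 4, 8]))   # one largest chain, e.g. [1, 2, 4, 8]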
Here is LeetCode's DP solution with comments:
class Solution(object):
    def largestDivisibleSubset(self, nums):
        """
        :type nums: List[int]
        :rtype: List[int]
        """
        if len(nums) == 0:
            return []
        # important step!
        nums.sort()
        # The container that keeps the size of the largest divisible subset that ends with X_i
        # dp[i] corresponds to len(EDS(X_i))
        dp = [0] * (len(nums))
        """ Build the dynamic programming matrix/vector """
        for i, num in enumerate(nums):
            maxSubsetSize = 0
            for k in range(0, i):
                if nums[i] % nums[k] == 0:
                    maxSubsetSize = max(maxSubsetSize, dp[k])
            maxSubsetSize += 1
            dp[i] = maxSubsetSize
        """ Find both the size of the largest divisible set and its index """
        maxSize, maxSizeIndex = max([(v, i) for i, v in enumerate(dp)])
        ret = []
        """ Reconstruct the largest divisible subset """
        # currSize: the size of the current subset
        # currTail: the last element in the current subset
        currSize, currTail = maxSize, nums[maxSizeIndex]
        for i in range(maxSizeIndex, -1, -1):
            if currSize == dp[i] and currTail % nums[i] == 0:
                ret.append(nums[i])
                currSize -= 1
                currTail = nums[i]
        return reversed(ret)
I guess this LeetCode solution may be a bit closer to your method; it is a recursion with memoization.
class Solution:
    def largestDivisibleSubset(self, nums: List[int]) -> List[int]:
        def EDS(i):
            """ recursion with memoization """
            if i in memo:
                return memo[i]
            tail = nums[i]
            maxSubset = []
            # The value of EDS(i) depends on its previous elements
            for p in range(0, i):
                if tail % nums[p] == 0:
                    subset = EDS(p)
                    if len(maxSubset) < len(subset):
                        maxSubset = subset
            # extend the found max subset with the current tail.
            maxSubset = maxSubset.copy()
            maxSubset.append(tail)
            # memorize the intermediate solutions for reuse.
            memo[i] = maxSubset
            return maxSubset

        # test case with empty set
        if len(nums) == 0: return []
        nums.sort()
        memo = {}
        # Find the largest divisible subset
        return max([EDS(i) for i in range(len(nums))], key=len)
References
For additional details, you can see the problem's Discussion Board. There are plenty of accepted solutions in a variety of languages with explanations, efficient algorithms, and asymptotic time/space complexity analysis in there.

Why was my algorithm for this interview a sub-optimal approach?

Given a list of integers, l = [1,5,3,2,6] and a target t = 6, return true if the list contains two distinct integers that sum to the target
I was given this question on a technical Python interview that caused me not to pass. My answer was:
def two_Sum(l, target):
    for num in l:
        for secondNum in l:
            if num != secondNum:
                if num + secondNum == target:
                    return True
The feedback I was given was that my solution was "not optimal". Please help me to understand why this was not the optimal solution and explain in detail what would be optimal for this case!
Your solution has a nested loop iterating the list, which means it's O(n^2) time complexity - and O(1) space, since you don't need to store any data during the iteration.
Reducing this to O(n) time complexity is possible like this, at the cost of increasing the space complexity to O(n):
def two_sum(l, target):
    s = set(l)
    for n in l:
        delta = target - n
        if delta != n and delta in s:
            return True
    return False
As a slight improvement, you can even avoid traversing the entire list in some cases, though it is still O(n):
def two_sum(l, target):
    seen = set()
    for n in l:
        delta = target - n
        if delta != n and delta in seen:
            return True
        seen.add(n)
    return False
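For reference, a quick hypothetical check against the interview's example input:

print(two_sum([1, 5, 3, 2, 6], 6))   # True, since 1 + 5 == 6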
You can also start with two pointers (start, end): start points to the beginning of the list and end points to the end. Add the two elements and compare against your target; if the sum equals the target, print the pair or add it to the result.
If the sum is greater than your target, decrease the end pointer by 1; if it is equal to or smaller than your target, increase the start pointer. Note that this approach relies on the list being sorted.
def two_Sum(l, target):
    # assumes l is sorted in ascending order
    start = 0
    end = len(l) - 1
    while start != end:
        pair_sum = l[start] + l[end]
        if pair_sum == target:
            print(l[start], l[end])
        if pair_sum <= target:
            start = start + 1
        if pair_sum > target:
            end = end - 1

l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
two_Sum(l, 9)
The most efficient way is to hash T - I[i] for each i and check each element against the hash as you see it:
def sum2(I, T):
    h = {}
    for itm in I:
        if itm in h:
            return True
        h[T-itm] = 1
    return False
This will only go once through your list:
def two_sum(l, t):
    s = set(l)
    for n in s:
        if t-n in s:
            if n != t-n:
                return True
    return False
Your solution is O(n²), as you do a nested iteration of the whole list.
A simple solution with O(n log n) time complexity would be:
sort your list
iterate, doing a binary search for the complement of the target
Supposing you have binary search implemented in function bs(item, sorted_list):
def two_Sum(l, target):
    l_sorted = sorted(l)  # n log(n)
    return any(bs(target - x, l_sorted) for x in l_sorted)  # n log(n)
You can also do some other optimisation, like stopping the iteration once you pass target/2.
Caveat: I don't guarantee, nor really believe, that this is the optimal solution; it is rather intended to show you a better one and give you insight into improving your own.
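Since bs is only supposed here, a minimal hypothetical sketch of it using the standard bisect module could look like this (the same "same element twice" caveat from the other answers still applies):

from bisect import bisect_left

def bs(item, sorted_list):
    # Standard binary search membership test.
    i = bisect_left(sorted_list, item)
    return i < len(sorted_list) and sorted_list[i] == item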
