Understanding what's happening in the Kadane Algorithm (Python)

I'm having a difficult time understanding what's happening in these two examples I found of the Kadane Algorithm. I'm new to Python and I'm hoping understanding this complex algo will help me see/read programs better.
Why would one example be better than the other, is it just List vs Range? Is there something else that makes one of the examples more efficient? Also, some questions about what's happening in the calculations. (questions inside the examples)
I've used PythonTutor to help me get a visual on what exactly is happening step by step.
Example 1:
In PythonTutor, when you select the next step in the screenshot provided, the value of so_far turns to 1. How is this? Given the sum, I thought it's adding -2 + 1, which is -1, so how does so_far become 1?
def max_sub(nums):
    max_sum = 0
    so_far = nums[0]
    for x in nums[1:]:
        so_far = max(x, x + so_far)
        max_sum = max(so_far, max_sum)
    return max_sum

nums = [-2, 1, -3, 4, -1, 2, 1, -5, 4]
max_sub(nums)
6
Example 2:
Similar question for this one: when I select the next step, max_so_far turns from -2 to 4... but how, if it's adding the element at index 2 (which is 4)? To me, that would be -2 + 4 = 2?
def maxSubArraySum(a, size):
    max_so_far = a[0]
    curr_max = a[0]
    for i in range(1, size):
        curr_max = max(a[i], curr_max + a[i])
        max_so_far = max(max_so_far, curr_max)
    return max_so_far

a = [-2, -3, 4, -1, -2, 1, 5, -3]
print("Maximum contiguous sum is", maxSubArraySum(a, len(a)))
Maximum contiguous sum is 7
So this would be a two-part question, then:
[1] Based on the above, why would one be more Pythonic and more efficient than the other?
[2] How can I better understand the calculations happening in the examples?

Simply watch each step and you can figure this out:
[Note] This program seems to assume mixed input, i.e. a list that contains both positive and negative integers.
# starting
so_far = -2   # init. to nums[0]
max_sum = 0
# in the for-loop:
x = 1         # first element of nums[1:]
so_far = max(1, -1) -> 1   (x is 1; x + so_far is 1 + -2 = -1)
max_sum = max(0, 1) -> 1
... and so on. Each step keeps the best sum of a run that ends at the current element, as is evident from the max() calls. The loop never blindly adds x to so_far: max(x, x + so_far) extends the running sum only when the previous so_far is positive, and otherwise restarts at x. That is why so_far becomes 1 here rather than -1.
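To follow every step without a visualizer, the loop from Example 1 can be instrumented to record its state after each element. This is only a sketch: `trace_max_sub` is a hypothetical helper name, not part of either example, and it deliberately keeps the `max_sum = 0` start value from Example 1.

```python
def trace_max_sub(nums):
    """Run Example 1's loop and record (x, so_far, max_sum) after each step."""
    so_far = nums[0]
    max_sum = 0  # same start value as Example 1
    rows = []
    for x in nums[1:]:
        so_far = max(x, x + so_far)     # extend the run or restart at x
        max_sum = max(so_far, max_sum)  # best run seen so far
        rows.append((x, so_far, max_sum))
    return rows, max_sum


rows, best = trace_max_sub([-2, 1, -3, 4, -1, 2, 1, -5, 4])
for x, so_far, max_sum in rows:
    print(f"x={x:>2}  so_far={so_far:>2}  max_sum={max_sum}")
print(best)  # 6
```

Running the same helper on an all-negative list such as [-3, -1, -2] returns 0, although the best subarray sum is -1. That is the mixed-input assumption mentioned in the note; Example 2 avoids it by starting max_so_far at a[0].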

More performance measurements comparing the two approaches show that the first example is clearly faster, by about 22-24%, than the second one with an input size of 2k.
import random
import timeit
from functools import partial

if __name__ == '__main__':
    L = list(range(-1_000, 1_000, 1))
    random.shuffle(L)
    baseTestCase = partial(max_sub, nums=L)
    print(timeit.timeit(baseTestCase, number=100_000)) # 86.0588067
    rangeTestCase = partial(maxSubArraySum, a=L, size=len(L))
    print(timeit.timeit(rangeTestCase, number=100_000)) # 105.415955

Related

Hard to understand the dynamic programming part of Leetcode question 494, Target Sum

I found this solution, which is extremely fast (beats 99.5%) and space-saving (beats 95%) at the same time.
I understand most of it, except the line dp[j]=dp[j]+dp[j-num].
I understand that w is the sum of all the numbers that get a '+' sign.
Can anyone explain what dp[j]=dp[j]+dp[j-num] means?
Here is the code:
from typing import List

class Solution:
    def findTargetSumWays(self, nums: List[int], S: int) -> int:
        w=(S+sum(nums))//2
        if (S+sum(nums))%2==1: return 0
        if sum(nums)<S: return 0
        dp=[0]*(w+1)
        dp[0]=1
        for num in nums:
            for j in range(w,num-1,-1):
                dp[j]=dp[j]+dp[j-num]
        return dp[-1]
Here is the question:
You are given a list of non-negative integers, a1, a2, ..., an, and a target, S. Now you have 2 symbols + and -.
For each integer, you should choose one from + and - as its new symbol.
Find out how many ways to assign symbols to make sum of integers equal to target S.
Example 1:
Input: nums is [1, 1, 1, 1, 1], S is 3.
Output: 5
Explanation:
-1+1+1+1+1 = 3
+1-1+1+1+1 = 3
+1+1-1+1+1 = 3
+1+1+1-1+1 = 3
+1+1+1+1-1 = 3
There are 5 ways to assign symbols to make the sum of nums be target 3.
Sahadat's answer is correct but may not be comprehensive enough for OP. Let me add more details. The mathematically equivalent question is how to choose certain elements from the list so that they sum to w (without changing signs): if P is the sum of the elements that get a '+', then P - (sum(nums) - P) = S, so P = (S + sum(nums)) / 2 = w. This question is easier to deal with in DP.
In particular, dp[0]=1, since there is exactly one way to reach 0 (by choosing no element). After that, inductively, for each num we process, the number of ways to reach j is the sum of dp[j] (meaning we DO NOT choose num) and dp[j-num] (meaning we do choose num). Once we have gone over all nums in the list, dp[w] contains the number of ways to reach w. This is also the answer to the original question.
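The equivalence between "assign signs to reach S" and "pick a subset that sums to w" can be checked by brute force on small inputs. The sketch below is an illustration, not the original solution: the function names are made up for this example, and the early-exit test uses abs(S), which is an assumption that goes slightly beyond the posted code.

```python
from itertools import product


def count_by_signs(nums, S):
    """Brute force: try every +/- assignment and count those that hit S."""
    return sum(
        1
        for signs in product((1, -1), repeat=len(nums))
        if sum(s * n for s, n in zip(signs, nums)) == S
    )


def count_by_subsets(nums, S):
    """DP: count subsets of nums whose sum is w = (S + sum(nums)) // 2."""
    total = sum(nums)
    if abs(S) > total or (S + total) % 2:
        return 0
    w = (S + total) // 2
    dp = [0] * (w + 1)
    dp[0] = 1  # one way to reach 0: choose nothing
    for num in nums:
        # Backward pass so each num is used at most once per subset.
        for j in range(w, num - 1, -1):
            dp[j] += dp[j - num]  # skip num (dp[j]) + take num (dp[j - num])
    return dp[w]


print(count_by_signs([1, 1, 1, 1, 1], 3))    # 5
print(count_by_subsets([1, 1, 1, 1, 1], 3))  # 5
```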
Rather than attempting to theorize about why this works, I'm just gonna walk through the logic and describe it.
In your example, nums = [1, 1, 1, 1, 1]
w=(S+sum(nums))//2               # in your example this is 4
dp=[0]*(w+1)                     # your array is 5 elements long, its max index is w
dp[0]=1                          # initialize first index to 1, everything else is 0
for num in nums:                 # num will be 1 for 5 iterations
    for j in range(w,num-1,-1):  # j starts at the end and iterates backward.
                                 # here j goes from index 4 down to 1 every time
        dp[j]=dp[j]+dp[j-num]    # remember dp = [1, 0, 0, 0, 0]. after
                                 # the first completion of the inner loop, dp
                                 # will be [1, 1, 0, 0, 0], then [1, 2, 1, 0, 0],
                                 # and so on.
return dp[-1]                    # after all iterations dp is [1, 5, 10, 10, 5];
                                 # dp[-1] is the last element (index w), i.e. 5
You're just kind of pushing that initial 1 - i.e. dp = [1, 0, 0, 0, 0] - toward the end of the array, and how far it moves at each iteration depends on the size of num.
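To watch the 1 travel through the array, the loop can be run outside the class and dp copied after each num. This is a sketch for inspection only; `dp_snapshots` is a hypothetical helper name and it skips the early-return checks of the original solution.

```python
def dp_snapshots(nums, S):
    """Return a copy of dp after each num has been processed."""
    w = (S + sum(nums)) // 2
    dp = [0] * (w + 1)
    dp[0] = 1
    snapshots = []
    for num in nums:
        for j in range(w, num - 1, -1):  # backward: j = w, w-1, ..., num
            dp[j] = dp[j] + dp[j - num]
        snapshots.append(dp.copy())
    return snapshots


for snap in dp_snapshots([1, 1, 1, 1, 1], 3):
    print(snap)
# [1, 1, 0, 0, 0]
# [1, 2, 1, 0, 0]
# [1, 3, 3, 1, 0]
# [1, 4, 6, 4, 1]
# [1, 5, 10, 10, 5]
```

The last entry of the final snapshot, 5, is the value the solution returns for this input.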
Why does this work? Let's try a different example and see the contrast.
Let's say nums = [7, 13] and S is 20. The answer should be 1, right? 7 + 13 = 20, that's all that is possible.
w=(S+sum(nums))//2               # now this is 20.
dp=[0]*(w+1)                     # your array is 21 elements long, its max index is w
dp[0]=1                          # initialize first index to 1, everything else is 0
for num in nums:                 # so num will be 7, then 13, 2 iterations
    for j in range(w,num-1,-1):  # j starts at the end and iterates backward.
                                 # on first iteration j goes from index 20 down to 7
                                 # on 2nd iteration j goes from index 20 down to 13
        dp[j]=dp[j]+dp[j-num]    # so when num = 7, and you reach the last
                                 # iteration of the inner loop... index 7
                                 # you're gonna set dp[7] to 1 via dp[7]+dp[0]
                                 # subsequently, when num = 13... when you're
                                 # at dp[20] you'll do dp[20] = dp[20]+dp[7].
                                 # Thus now dp[20] = 1, which is the answer.
return dp[-1]
So the larger num is... the more you skip around, the less you move 1s up to the top. The larger num values disperse the possible combinations that can add up to a given S. And this algorithm accounts for that.
But notice the manner in which the 1 at dp[0] was moved up to dp[20] in the final example. It was precisely because 7 + 13 = 20, the only combination. These iterations and summations account for all these possible combinations.
But how are potential sign flips accounted for in the algorithm? Well, let's say S was not 20 in the previous example, but 6, i.e. 13 - 7 = 6, the only solution. Let's look at that example:
w=(S+sum(nums))//2               # now this is 13. (20+6)/2 = 13.
dp=[0]*(w+1)                     # your array is 14 elements long
dp[0]=1                          # initialize first index to 1 again
for num in nums:                 # so num will be 7, then 13, 2 iterations
    for j in range(w,num-1,-1):
                                 # j starts at the end and iterates backward.
                                 # on first iteration j goes from index 13 down to 7
                                 # on 2nd iteration j only takes the value 13
        dp[j]=dp[j]+dp[j-num]
                                 # so this time, on the 2nd iteration, when
                                 # num = 13... you're at dp[13]
                                 # you'll do dp[13] = dp[13]+dp[0].
                                 # Thus now dp[13] = 1, which is the answer.
                                 # This time the 1 was moved directly from dp[0] to the end
return dp[-1]
So this time, when the combination included a negative number, it was w being 13 that led to the solution: dp[13] = dp[13] + dp[0] = 1. The array was short enough that the larger number picked up the 1 from dp[0] directly.
It's quite odd, I will admit: you have 2 different mechanisms at work, depending on what the signs are. In the case of a positive and a negative number summing to S, it was the smaller overall array length that led to the 1 moving to the final slot. For 2 positive numbers... there was an intermediate step that moved the 1 up. But at least you can see the difference...

Divide and conquer strategy python

I am trying to write code that compares each element of a list with the others and gives the index of the closest larger number in each of two directions, left and right, using the divide and conquer method.
For example:
Input: arr=[5,2,6,8,1,4,3,9]
Output:
Left=[None, 0, None, None, 3, 3, 5, None]
Right=[2, 2, 3, 7, 5, 7, 7, None]
Input: arr=[4,2,3,1,8,5,6,9]
Output:
L=[None, 0, 0, 2, None, 4, 4, None]
R=[4, 2, 4, 4, 7, 6, 7, None]
This is what I have now:
arr = [5,2,6,8,1,4,3,9]

def Left(arr):
    L = []
    for i in range(len(arr)):
        flag = True
        for j in range(i,-1,-1):
            if (arr[i] < arr[j]):
                L.append(j)
                flag = False
                break
        if flag:
            L.append(None)
    return L

def Right(arr):
    R = []
    for i in range(len(arr)):
        flag = True
        for j in range(i, len(arr), 1):
            if (arr[i] < arr[j]):
                R.append(j)
                flag = False
                break
        if flag:
            R.append(None)
    return R

print(*Left(arr), sep = ",")
print(*Right(arr), sep =",")
Am I doing it the right way? Thank you.
This is my Python version of the algorithm, in its "closest larger right" variant.
As you can see, it is recursive. Recursion is really elegant but a bit tricky, because a few lines of code condense many concepts about algorithm design and about the language they are written in. In my opinion, 4 relevant things are happening:
1) Recursive calls. The function calls itself. During this step the list is progressively sliced into halves. Once the atomic unit is reached, the base algorithm is executed on it first (step 3). If no solution is found for an element there, larger slices of the list get involved in the calculation as the recursion unwinds.
2) Termination condition. The previous step does not run forever; this condition stops the recursion and moves on to the next step (the base algorithm). Right now the code has len(arr) > 1, which means the list is split down to single elements, so the smallest slices on which the base algorithm actually compares anything are pairs of numbers. You can increase that number so that the recursive function spends less time slicing the list and combining results, but the trade-off is that in a parallelized environment the "workers" will have to digest a bigger list.
3) Base algorithm. It does the essential calculation. Whatever the size of the list, it finds, for each element, the index of the closest larger number to its right.
4) "Calculation saving". The base algorithm does not need to recompute indexes for numbers already resolved in earlier recursions. There is also a break to stop the search once a number gets its index within the current recursion's list.
Other algorithm designs are possible, and surely more efficient ones. Ones based on dictionaries or on different slicing strategies come to mind.
def closest_larger_right(arr):
    len_total = len(arr)
    result = [None] * len_total

    def recursive(arr, len_total, position=0):
        # 2) Termination condition
        if len(arr) > 1:
            mid = len(arr) // 2
            left = arr[:mid]
            right = arr[mid:]
            position_left = 0 + position
            position_right = len(left) + position
            # 1) Recursive calls
            recursive(left, len_total, position_left)
            recursive(right, len_total, position_right)
        # 3) Base algorithm
        for i in range(len(arr)-1):
            # 4) Calculation saving
            if result[i + position] is None:
                for j in range(i+1, len(arr), 1):
                    if (arr[i] < arr[j]):
                        result[i + position] = j + position
                        break
        return result

    return recursive(arr, len_total)

# output: [2, 2, 3, 7, 5, 7, 7, None]
print(closest_larger_right([5, 2, 6, 8, 1, 4, 3, 9]))
I am not sure how a divide-and-conquer algorithm can be applied here, but here's an improvement to your current algorithm that also already has optimal running time of O(n) for n elements in the array:
stack = []
left = []
for i in range(len(arr)):
    while stack and arr[stack[-1]] < arr[i]:
        stack.pop()
    left.append(stack[-1] if stack else None)
    stack.append(i)
This uses a stack to keep track of the indices of the larger elements to the left, popping indices from the stack as long as their elements are smaller than the current element, and then adding the current index itself. Since each element is added to and popped from the stack at most once, running time is O(n). The same can be used for the right-side elements simply by iterating the array in reverse order.
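The reverse-order variant for the right side could look like the sketch below. The function names are chosen for this example, and the `<` comparison is copied from the snippet above; the inputs are the two examples from the question, which contain no duplicate values.

```python
def closest_larger_left(arr):
    """Index of the nearest larger element to the left, or None."""
    stack, left = [], []
    for i in range(len(arr)):
        while stack and arr[stack[-1]] < arr[i]:
            stack.pop()
        left.append(stack[-1] if stack else None)
        stack.append(i)
    return left


def closest_larger_right(arr):
    """Same idea, scanning from the last index down to the first."""
    stack = []
    right = [None] * len(arr)
    for i in range(len(arr) - 1, -1, -1):
        while stack and arr[stack[-1]] < arr[i]:
            stack.pop()
        right[i] = stack[-1] if stack else None
        stack.append(i)
    return right


arr = [5, 2, 6, 8, 1, 4, 3, 9]
print(closest_larger_left(arr))   # [None, 0, None, None, 3, 3, 5, None]
print(closest_larger_right(arr))  # [2, 2, 3, 7, 5, 7, 7, None]
```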

How to rewrite a program to use recursion?

I just started to deal with recursion and I don't understand everything about it yet. I think that I'm not using a base condition, but I don't have any idea how to write it. The program itself works and does everything I need, but there is no real recursion.
The idea of the program is that there is a list in which I need to sum every x'th number, with x acting as a step. If x = 0, then the sum is automatically zero. If x is out of range, then the sum is also 0.
def sum_elements(nums, x) -> int:
    if x not in range(-len(nums), len(nums)) or x == 0:
        return 0
    if x > 0:
        nums = nums[x - 1::x]
        return sum(nums)
    return sum_elements(nums[::-1], -x)

if __name__ == '__main__':
    print(sum_elements([], 0)) # x = 0 -> 0
    print(sum_elements([1, 5, 2, 5, 9, 5], 3)) # 2 + 5 = 7
    print(sum_elements([5, 6, 10, 20], -2)) # 10 + 5 = 15
    print(sum_elements([5, 6, 10, 20], -20)) # x = -20 -> 0
Recursion is when a function calls itself, and there are a few (informal) rules that are always good to keep in the back of your mind when writing such functions:
1. The base case.
Every recursive function must have a base case that essentially acts as the end of the stack of recursive calls.
2. Every recursive function consists of the non-base case(s) and the base case.
In other words, your code must be written in a way that the function either calls itself or terminates the recursion. You can do this with if and else statements, or with only if statements that catch the base case(s).
3. The input of the function should carry the state of the previous call.
In math, you might remember functions that call themselves (syntax switched for the case of explanation):
f(x)_(n=0) = f(x)_(n=1) + 10
which becomes:
f(x)_(n=1) = ( f(x)_(n=2) + 10 ) + 10
and so on. In essence, you are writing this in code and setting a base case that might say (for the example above) "stop when n is 10". If that were the case, you should notice the cascading effect when we are layers deep into that function and f(x)_(n=10) makes its appearance (and, let's say, returns 0 + 10): we would have a final form of f(x)_(n=0) = 0 + 10 + 10 + 10 + ....
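The cascade can be written out directly. This is a minimal sketch of the analogy only; the name `f` and the `stop` parameter are illustrative choices, with "stop when n is 10" as the base case.

```python
def f(n=0, stop=10):
    """Each level adds 10; the recursion ends when n reaches stop."""
    if n == stop:               # base case: the end of the call stack
        return 0 + 10
    return f(n + 1, stop) + 10  # non-base case: defer to the next level


print(f())  # 110: the base case's 10 plus ten levels adding 10 each
```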
So for this function you instead have two inputs, nums and x. These inputs are what we will be modifying as we go down the recursion's stack.
1. Writing our base case.
Writing the base case is typically the easiest part of writing a recursive function. We know, for your problem, the following cases must be caught:
If x is not in the range of the length of nums, then we must return 0.
If len(nums) is 0, then we should return 0.
So let's begin:
def sum_elements(nums, x) -> int:
    if len(nums) == 0 or x not in range(-len(nums), len(nums)):
        return 0
Notice, however, that range(len([1, 2])) returns range(0, 2), but list(range(0, 2)) is [0, 1]: the upper bound is excluded. Therefore, we must add 1 to len(nums) so that x == len(nums) still counts as being within the proper range:
def sum_elements(nums, x) -> int:
    if len(nums) == 0 or x not in range(-len(nums), len(nums) + 1):
        return 0
Notice that range(-len(nums), len(nums) + 1) for nums = [1, 2, 3] is equal to range(-3, 4), and list(range(-3, 4)) is equal to [-3, -2, -1, 0, 1, 2, 3]. Therefore, we do not need a -len(nums) + 1 or -len(nums) - 1.
Once we have figured out the base case, we can start working on our actual function. At this point we have done #1 and a portion of #2, but we now must write our non-base case(s).
2. Identifying our other-case(s):
As written in #2, our function input is what changes dynamically as we go down our function stack. Therefore, we need to think about how to modify nums and/or x to fit our purposes. The first thing you should look at, however, is what would happen if we only change one of those variables as we go down the stack.
Keep nums constant, modify x: We know our base case ensures x stays within the constraint of the length of nums in both the positive and negative direction, which is good. However, every time the function runs we must increment x by the original x, or x_0. If we write the function so that every call passes x + x, we are not adding the original x, but rather adding the newer x to itself. This is a problem. Take the following for example:
def sum_elements(nums, x) -> int:
    print(nums, x)
    # Base case.
    if len(nums) == 0 or x not in range(-len(nums), len(nums) + 1):
        return 0
    # Other case. We must differentiate between positive x, and negative x.
    if x > 0:
        # Since x is an index that starts at 1, not 0, we must do x-1.
        number = nums[x - 1]
    else:
        # For negative values of x this does not apply. [1, 2][-2] = 1
        number = nums[x]
    return number + sum_elements(nums, x + x)
Notice how we get:
# [NUMS] x
[1, 2, 3, 4, 5, 6] 2
[1, 2, 3, 4, 5, 6] 4
[1, 2, 3, 4, 5, 6] 8
# OUTPUT
6
and how the x value on the third call is 8. This is no bueno. The more you practice recursion, the more intuitive it becomes to notice when changing a certain input might not be the best choice. You ought to think: "what will this value be when the function continues down the stack?"
Keep x constant, modify nums: If we do it this way, we can be certain that we will not have issues with the value of x. The issue, then, becomes how we modify the nums list and use x to our advantage. What we do know is that x can technically be used as an index, as demonstrated above. So what if, instead of modifying the index, we modify the list that the index reads from? Take the following for example:
nums = [1, 2, 3, 4]
x = 2
print(nums) # > [1, 2, 3, 4]
print(nums[x - 1]) # > 2
nums = nums[x:] # > [3, 4]
print(nums[x - 1]) # > 4
So it does seem like we can modify the list and keep a constant x to retrieve the information we want. Awesome! In such case #2 is the way to go.
3. Writing our other-case(s).
So now we will try to write a function that keeps x constant, but modifies nums. We have a general idea from the code above, and we know from the previous point that we will have to deal with -x and x differently. Therefore, let's write something:
def sum_elements2(nums, x) -> int:
    # Base case.
    if len(nums) == 0 or x not in range(-len(nums), len(nums) + 1):
        return 0
    # Other case.
    if x >= 0:
        number = nums[x - 1]
        nums = nums[x:]
    else:
        number = nums[x]
        # Not sure what goes here.
    return number + sum_elements2(nums, x)
If we test the function above, it seems that it works for any positive x, but won't work for negative values of x. It makes sense, however, that whatever we do to the positive side, we must do the opposite to the negative side. If we try to use nums = nums[:x] we very quickly realize it works. One more guard is needed: with x == 0 the list never shrinks, so x == 0 goes into the base case (the question asks for a sum of 0 there anyway). Our final function becomes:
def sum_elements(nums, x) -> int:
    # Base case. x == 0 would never shrink nums, so it must stop here too.
    if len(nums) == 0 or x == 0 or x not in range(-len(nums), len(nums) + 1):
        return 0
    # Other case.
    if x > 0:
        number = nums[x - 1]
        nums = nums[x:]
    else:
        number = nums[x]
        nums = nums[:x]
    return number + sum_elements(nums, x)
Running Examples
If we run examples with the above function, we get:
print(sum_elements([1, 2, 3, 4, 5, 6], 2)) # > 2 + 4 + 6 = 12
print(sum_elements([], 0)) # > 0
print(sum_elements([1, 5, 2, 5, 9, 5], 3)) # > 7
print(sum_elements([5, 6, 10, 20], -2)) # > 15
print(sum_elements([5, 6, 10, 20], -20)) # > 0
Maybe this approach can help you understand.
It starts from the first element and sums the rest every x ones.
That is my assumption, as you haven't provided an input and its desired output as an example.
In case you need to start from the xth element the code can be easily modified, I leave it to you to experiment with it.
def sum_elements(nums, x) -> int:
    if x>0 and x<=len(nums):
        return nums[0] + sum_elements(nums[x:], x)
    return 0
lst = [1, 2, 3, 4, 5, 6]
print(sum_elements(lst, 2))
print(sum_elements(lst, 3))
print(sum_elements(lst, 0))
produces
9
5
0
Note: it just demonstrates recursion, but it's not optimal for a number of reasons.
Also it discards negative values of x

Trying to optimize this code: iterating over a list to replace its values

I am trying to do a challenge in Python, the challenge consists of :
Given an array X of positive integers, its elements are to be transformed by running the following operation on them as many times as required:
if X[i] > X[j] then X[i] = X[i] - X[j]
When no more transformations are possible, return its sum ("smallest possible sum").
Basically you pick two non-equal numbers from the array and replace the larger of them with their difference. You repeat this until all numbers in the array are the same.
I tried a basic approach using min and max, but there is another constraint, which is time. I always get a timeout because my code is not optimized and takes too much time to execute. Can you please suggest some solutions to make it run faster?
def solution(array):
    while len(set(array)) != 1:
        array[array.index(max(array))] = max(array) - min(array)
    return sum(array)
Thank you so much !
EDIT
I will avoid spoiling the challenge... because I didn't find the solution in Python. But here's the general design of an algorithm that works in Kotlin (in 538 ms). In Python I'm stuck in the middle of the performance tests.
Some thoughts:
First, the idea to remove the minimum from the other elements is good: the modulo (we remove the minimum as long as it is possible) will be small.
Second, if this minimum is 1, the array will be soon full of 1s and the result is N (the len of the array).
Third, if all elements are equal, the result is N times the value of one element.
The algorithm
The idea is to keep two indices: i is the current index, which cycles over 0..N-1, and k is the index of the current minimum.
At the beginning, k = i = 0 and the minimum is m = arr[0]. We advance i until one of the following happens:
i == k => we made a full cycle without updating k, return N*m;
arr[i] == 1 => return N;
arr[i] < m => update k and m;
arr[i] > m => compute the new value of arr[i] (that is, arr[i] % m, or m if arr[i] is a multiple of m). If that's not m, then it is arr[i] % m < m: update k and m;
arr[i] == m => pass.
Basically, we use a rolling minimum and compute the modulos on the fly until all elements are the same. That avoids periodically recomputing the min of the array.
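One possible Python reading of the rolling-minimum steps above is sketched here. It is an interpretation of the description, not the Kotlin code the answer refers to; the function name and the handling of an empty list are assumptions.

```python
def smallest_sum_rolling(arr):
    """Rolling minimum: reduce each element modulo the current minimum m."""
    arr = list(arr)              # work on a copy
    n = len(arr)
    if n == 0:
        return 0                 # assumption: empty input sums to 0
    k, m = 0, arr[0]             # index and value of the current minimum
    i = 1 % n                    # start just after k (wraps when n == 1)
    while i != k:                # i == k: full cycle without a new minimum
        if arr[i] > m:
            r = arr[i] % m
            arr[i] = r if r else m   # a multiple of m collapses to m itself
        if arr[i] < m:
            k, m = i, arr[i]         # new minimum found, restart the cycle here
        if m == 1:
            return n                 # everything will end up as 1
        i = (i + 1) % n
    return n * m


print(smallest_sum_rolling([22, 14, 6, 2]))  # 8
print(smallest_sum_rolling([6, 10, 15]))     # 3
```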
PREVIOUS ANSWER
As @BallpointBen wrote, you'll get n times the GCD of all the numbers. But that's cheating ;)! If you want to find a solution by hand, you can optimize your code.
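For reference, the GCD shortcut mentioned here fits in a couple of lines with the standard library. This is a sketch; the function name and the empty-list result are assumptions made for the example.

```python
from functools import reduce
from math import gcd


def smallest_sum_gcd(arr):
    """Every element ends up equal to the GCD of the array, so sum = n * GCD."""
    if not arr:
        return 0  # assumption: empty input sums to 0
    return len(arr) * reduce(gcd, arr)


print(smallest_sum_gcd([22, 14, 6, 2]))  # 8
```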
As long as you haven't found N identical numbers, you call the set, max (twice!), min and index functions on array. Those functions are pretty expensive. The number of iterations depends on the array.
Imagine the array is sorted in reverse order: [22, 14, 6, 2]. You can replace 22 by 22-14, 14 by 14-6, 6 by 6-2 and get: [8, 8, 4, 2]. Sort again (no change here) and replace again wherever neighbours differ: [8, 4, 2, 2]. Repeat: [4, 2, 2, 2], [2, 2, 2, 2]. Actually, in the first pass 14 could be replaced by 14-2*6 = 2 (as in the classic GCD computation), giving the following sequence:
[22, 14, 6, 2]
[8, 2, 2, 2]
[2, 2, 2, 2]
The convergence is fast.
def solution2(arr):
    N = len(arr)
    end = False
    while not end:
        arr = sorted(arr, reverse=True)
        end = True
        for i in range(1, N):
            while arr[i-1] > arr[i]:
                arr[i-1] -= arr[i]
                end = False
    return sum(arr)
A benchmark:
import random
import timeit
arr = [4*random.randint(1, 100000) for _ in range(100)] # GCD will be 4 or a multiple of 4
assert solution(list(arr)) == solution2(list(arr))
print(timeit.timeit(lambda: solution(list(arr)), number=100))
print(timeit.timeit(lambda: solution2(list(arr)), number=100))
Output:
2.5396839629975148
0.029025810996245127
def solution(a):
    N = len(a)
    end = False
    while not end:
        a = sorted(a, reverse=True)
        small = min(a)
        end = True
        for i in range(1, N):
            if a[i-1] > small:
                a[i-1] = a[i-1]%small if a[i-1]%small !=0 else small
                end = False
    return sum(a)
I made it faster with a slight change.
This solution worked for me. I iterate over the list only once. First I find the minimum; then, iterating over the list, I replace each element with the remainder of its division by the minimum. If I find a remainder equal to 1, the result is trivially 1 multiplied by the length of the list; otherwise, if the remainder is less than the minimum, I replace the variable m with the new minimum and continue. Once the iteration is finished, the result is the minimum times the length of the list.
Here the code:
def solution(a):
    L = len(a)
    if L == 1:
        return a[0]
    m=min(a)
    for i in range(L):
        if a[i] != m:
            if a[i] % m != 0:
                a[i] = a[i]%m
                if a[i]<m:
                    m=a[i]
            elif a[i] % m == 0:
                a[i] -= m * (a[i] // m - 1)
            if a[i]==1:
                return 1*L
    return m*L

Is there a python function that returns the first positive int that does not occur in list?

I'm trying to design a function that, given an array A of N integers, returns the smallest positive integer (greater than 0) that does not occur in A.
This code works fine yet has a high order of complexity. Is there another solution that reduces the order of complexity?
Note: the number 10000000 is the range of the integers in array A. I tried the sort function, but does it reduce the complexity?
def solution(A):
    # start at 1: the answer must be a positive integer
    for i in range(1, 10000000):
        if(A.count(i)) <= 0:
            return(i)
The following is O(n log n):
a = [2, 1, 10, 3, 2, 15]
a.sort()
if a[0] > 1:
    print(1)
else:
    for i in range(1, len(a)):
        if a[i] > a[i - 1] + 1:
            print(a[i - 1] + 1)
            break
If you don't like the special handling of 1, you could just append zero to the array and have the same logic handle both cases:
a = sorted(a + [0])
for i in range(1, len(a)):
    if a[i] > a[i - 1] + 1:
        print(a[i - 1] + 1)
        break
Caveats (both trivial to fix and both left as an exercise for the reader):
Neither version handles empty input.
The code assumes there are no negative numbers in the input.
O(n) time and O(n) space:
def solution(A):
    count = [0] * len(A)
    for x in A:
        if 0 < x <= len(A):
            count[x-1] = 1 # count[0] is to count 1
    for i in range(len(count)):
        if count[i] == 0:
            return i+1
    return len(A)+1 # only if A = [1, 2, ..., len(A)]
This should be O(n). It uses a temporary set to speed things along.
a = [2, 1, 10, 3, 2, 15]
# use a set of only the positive numbers for lookup
temp_set = set()
for i in a:
    if i > 0:
        temp_set.add(i)
# iterate from 1 up to length of set + 1 (to ensure edge case is handled)
for i in range(1, len(temp_set) + 2):
    if i not in temp_set:
        print(i)
        break
My proposal is a recursive function inspired by quicksort.
Each step divides the input sequence into two sublists (lt = less than pivot; ge = greater than or equal to pivot) and decides which of the sublists is to be processed in the next step. Note that there is no sorting.
The idea is that a set of integers n such that lo <= n < hi contains "gaps" only if it has fewer than (hi - lo) elements.
The input sequence must not contain duplicates. A set can be passed directly.
# all cseq items > 0 assumed, no duplicates!
def find(cseq, cmin=1):
    # cmin = possible minimum not ruled out yet
    size = len(cseq)
    if size <= 1:
        return cmin+1 if cmin in cseq else cmin
    lt = []
    ge = []
    pivot = cmin + size // 2
    for n in cseq:
        (lt if n < pivot else ge).append(n)
    return find(lt, cmin) if cmin + len(lt) < pivot else find(ge, pivot)

test = set(range(1,100))
print(find(test)) # 100
test.remove(42)
print(find(test)) # 42
test.remove(1)
print(find(test)) # 1
Inspired by various solutions and comments above, about 20%-50% faster in my (simplistic) tests than the fastest of them (though I'm sure it could be made faster), and handling all the corner cases mentioned (non-positive numbers, duplicates, and empty list):
import numpy

def firstNotPresent(l):
    positive = numpy.fromiter(set(l), dtype=int) # deduplicate
    positive = positive[positive > 0] # only keep positive numbers
    positive.sort()
    top = positive.size + 1
    if top == 1: # empty list
        return 1
    sequence = numpy.arange(1, top)
    try:
        # where() gives a 0-based position; the sequence value there is the missing number
        return int(sequence[numpy.where(sequence < positive)[0][0]])
    except IndexError: # no numbers are missing, top is next
        return top
The idea is: if you enumerate the positive, deduplicated, sorted list starting from one, the first time the enumeration index is less than the list value, that index is missing from the list, and hence is the lowest positive number missing from the list.
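The same enumeration idea can be written without numpy. This is a plain-Python sketch with a made-up function name; it makes no claim about matching the speed of the numpy version.

```python
def first_missing_positive(values):
    """Enumerate the sorted, deduplicated positives from 1; the first index
    that is smaller than its list value is the missing number."""
    positives = sorted({v for v in values if v > 0})
    for expected, actual in enumerate(positives, start=1):
        if expected < actual:
            return expected
    return len(positives) + 1  # no gap: the next integer is the answer


print(first_missing_positive([2, 1, 10, 3, 2, 15]))  # 4
```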
This and the other solutions I tested against (those from adrtam, Paritosh Singh, and VPfB) all appear to be roughly O(n), as expected. (It is, I think, fairly obvious that this is a lower bound, since every element in the list must be examined to find the answer.) Edit: looking at this again, of course the big-O for this approach is at least O(n log(n)), because of the sort. It's just that the sort is so fast, comparatively speaking, that it looked linear overall.
