I'm solving a question on LeetCode:
Given an array nums containing n + 1 integers where each integer is between 1 and n (inclusive), prove that at least one duplicate number must exist. Assuming that there is only one duplicate number, find that duplicate in O(n) time and O(1) space.
class Solution(object):
    def findDuplicate(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
        xor = 0
        for num in nums:
            newx = xor ^ (2 ** num)
            if newx < xor:
                return num
            else:
                xor = newx
I got the solution accepted but I have been told that it is neither O(1) space nor O(n) time.
Can anyone please help me understand why?
Your question is actually hard to answer. Typically when dealing with complexities, there's an assumed machine model. A standard model assumes that memory locations are of size log(n) bits when the input is of size n, and that arithmetic operations on numbers of size log(n) bits are O(1).
In this model, your code isn't O(1) in space or O(n) in time. Your xor value has up to n bits, and this doesn't fit in a constant number of memory locations (it actually needs n/log(n) memory locations). Similarly, it's not O(n) in time, since the arithmetic operations are on numbers larger than log(n) bits.
To solve your problem in O(1) space and O(n) time, you've got to make sure your values don't get too large. One approach is to xor all the numbers in the array, which gives you 1^2^3^...^n ^ d, where d is the duplicate. You can then xor that with 1^2^3^...^n to recover the duplicate value.
def find_duplicate(ns):
    r = 0
    for i, n in enumerate(ns):
        r ^= i ^ n
    return r

print(find_duplicate([1, 3, 2, 4, 5, 4, 6]))
This is O(1) space, and O(n) time, since r never uses more bits than n does (that is, about log2(n) bits).
Your solution is not O(1) space, meaning: your space/memory usage is not constant but depends on the input!
newx=xor^(2**num)
This is a bitwise XOR over log2(2**num) = num bits, where num is one of your input numbers, resulting in a log2(2**num) = num bit result.
So num = 10 needs log2(2^10) = 10 bits and num = 100 needs log2(2^100) = 100 bits. The space grows linearly with the input values (it is not constant).
It's also not within O(n) time complexity, as you have:
an outer loop over all n numbers
and a non-constant / non-O(1) inner operation (see above)
assumption: XOR is not constant with regard to the bit representation of its input
that's not always treated like that, but physics supports this claim (Chandrasekhar limit, speed of light, ...)
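To see that growth concretely, here is a small illustration (not part of the original answer) of how the accumulator's bit length tracks the largest value XOR-ed in so far:

xor = 0
for num in [10, 100, 1000]:
    xor ^= 2 ** num
    # bit_length() of the accumulator grows with the largest num seen,
    # i.e. the "single" variable actually needs about max(nums) bits
    print(num, xor.bit_length())   # prints 11, 101, 1001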
This question has to be solved with Floyd's linked-list cycle-detection algorithm.
Convert the array to a linked list. There are n+1 positions but only n values.
For example if you have this array: [1,3,4,2,2] convert it to linked list.
How the pointing works
Starting from index 0, look at the element in that position: it is 1. So index 0 points to nums[1]. nums[1] is 3, so it points to nums[3], and so on.
Now that you have converted this to a linked list, you can use Floyd's tortoise-and-hare algorithm. Basically you have two pointers, slow and fast. If there is a cycle, the slow and fast pointers will meet at some point.
from typing import List

class Solution:
    def findDuplicate(self, nums: List[int]) -> int:
        # slow and fast are indices
        slow, fast = 0, 0
        while True:
            slow = nums[slow]
            fast = nums[nums[fast]]
            if slow == fast:
                break
        # So far we have found where slow and fast meet.
        # To find where the cycle starts, we initialize another pointer from
        # the start; let's name it start.
        # start and slow each advance one step at a time, and the index where
        # they meet is the duplicate you are looking for.
        start = 0
        while True:
            slow = nums[slow]
            start = nums[start]
            if slow == start:
                return slow
Notice that no element ever points back to index 0, because the values are in the range 1..n. We follow pointers via nums[value], and since no value is 0, nothing ever points to nums[0], so index 0 is guaranteed to be outside the cycle.
You can find the XOR of all the numbers in the array (let's call it x) and then calculate the XOR of the numbers 1, 2, 3, ..., n (let's call it y). Now, x xor y will be your answer.
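Not from the original answer, but a minimal sketch of that idea, assuming the array holds 1..n plus exactly one extra copy of the duplicate:

def find_duplicate_xor(nums):
    # assumes nums contains 1..n plus one value repeated exactly twice
    n = len(nums) - 1
    x = 0
    for value in nums:              # x = XOR of all array elements
        x ^= value
    for value in range(1, n + 1):   # cancel out 1..n, leaving the duplicate
        x ^= value
    return x

print(find_duplicate_xor([1, 3, 6, 4, 5, 4, 2]))  # 4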
I know there are many other questions out there asking for the general guide of how to calculate the time complexity, such as this one.
From them I have learnt that when there is a loop, such as the (for... if...) in my Python programme, the Time complexity is N * N where N is the size of input. (please correct me if this is also wrong) (Edited once after being corrected by an answer)
# greatest common divisor of two integers
a, b = map(int, input().split())
list = []
for i in range(1, a+b+1):
    if a % i == 0 and b % i == 0:
        list.append(i)
n = len(list)
print(list[n-1])
However, do other parts of the code also contribute to the time complexity, that will make it more than a simple O(n) = N^2 ? For example, in the second loop where I was finding the common divisors of both a and b (a%i = 0), is there a way to know how many machine instructions the computer will execute in finding all the divisors, and the consequent time complexity, in this specific loop?
I wish the question is making sense, apologise if it is not clear enough.
Thanks for answering
First, a few hints:
In your code there is no nested loop. The if-statement does not constitute a loop.
Not all nested loops have quadratic time complexity.
Writing O(n) = N*N doesn't make any sense: what is n and what is N? Why does n appear on the left but N is on the right? You should expect your time complexity function to be dependent on the input of your algorithm, so first define what the relevant inputs are and what names you give them.
Also, O(n) is a set of functions (namely those asymptotically bounded from above by the function f(n) = n), whereas f(N) = N*N is one function. By abuse of notation, we conventionally write n*n = O(n) to mean n*n ∈ O(n) (which here is a mathematically false statement), but switching the sides (O(n) = n*n) is undefined. A mathematically correct statement would be n = O(n*n).
You can assume all (fixed bit-length) arithmetic operations to be O(1), since there is a constant upper bound to the number of processor instructions needed. The exact number of processor instructions is irrelevant for the analysis.
Let's look at the code in more detail and annotate it:
a, b = map(int, input().split())    # O(1)
list = []                           # O(1)
for i in range(1, a+b+1):           # O(a+b) multiplied by what's inside the loop
    if a % i == 0 and b % i == 0:   # O(1)
        list.append(i)              # O(1) (amortized)
n = len(list)                       # O(1)
print(list[n-1])                    # O(log(a+b))
So what's the overall complexity? The dominating part is indeed the loop (the stuff before and after is negligible, complexity-wise), so it's O(a+b), if you take a and b to be the input parameters. (If you instead wanted to take the length N of your input input() as the input parameter, it would be O(2^N), since a+b grows exponentially with respect to N.)
One thing to keep in mind, and you have the right idea, is that higher-degree terms take precedence. So you can have a step that is constant, O(1), but if it happens n times, O(N), then the whole thing is O(1) * O(N) = O(N).
Your program is O(N) because the only thing really affecting the time complexity is the loop, and as you know a simple loop like that is O(N) because it increases linearly as n increases.
Now if you had a nested loop that had both loops increasing as n increased, then it would be O(n^2).
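As a small side-by-side illustration of those two cases (not from the original answer):

def single_loop(n):
    total = 0
    for i in range(n):         # n iterations, O(1) body -> O(n) overall
        total += i
    return total

def nested_loop(n):
    total = 0
    for i in range(n):         # n iterations ...
        for j in range(n):     # ... times n iterations each -> O(n^2) overall
            total += i * j
    return total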
Can somebody pls tell me what the time and space complexity of my answer is?
I think the time complexity is O(nlogn) assuming this is how long the sorted function takes. I'm also assuming that the double for loop takes constant O(1) time because the length of the numbers string does not change.
I think the space complexity is O(n) where n is the number of elements in our results array. But, I'm unsure of this answer.
Any help is appreciated.
class Solution(object):
    def sequentialDigits(self, low, high):
        """
        :type low: int
        :type high: int
        :rtype: List[int]
        """
        # every contiguous substring of this string is a valid candidate number
        numbers = "123456789"
        results = []
        # get every substring from the 'numbers' string
        for i in range(len(numbers)):
            for j in range(i, len(numbers)):
                # substring in range i to j
                subString = numbers[i:j+1]
                # convert it to an integer
                subString = int(subString)
                # if it is within the required range, add it to our array
                if low <= subString <= high:
                    results.append(subString)
        return sorted(results)
The complexity of your algorithm is O(1). The big O notation is used to denote the asymptotic upper bound of an algorithm as the input size (n) tends to infinity. In this particular case, there are only a finite number of possible candidates (the 9 + 8 + ... + 1 = 45 non-empty contiguous substrings of "123456789"), which your algorithm correctly checks. Therefore the worst-case running time (or memory consumption) does not depend on the inputs; you always loop over at most 45 candidates.
Since the length of the numbers string depends on the order of magnitude of the input, that length would be log10(n). So the time complexity from the loop will be O((log10 n)^2). The maximum length of the results array will be the length of the numbers string squared, so the time complexity of the sorting will be O((log10 n)^2 * log(log10 n)). The overall time complexity will be O((log10 n)^2 + (log10 n)^2 * log(log10 n)), which simplifies to O((log10 n)^2 * log(log10 n)).
I was recently trying to solve a HackerEarth problem. The code worked on the sample inputs and some custom inputs that I gave. But, when I submitted, it showed errors for exceeding the time limit. Can someone explain how I can make the code run faster?
Problem Statement: Cyclic shift
A large binary number is represented by a string A of size N and consists of 0s and 1s. You must perform a cyclic shift on this string. The cyclic shift operation is defined as follows:
If the string A is [A0, A1,..., An-1], then after performing one cyclic shift, the string becomes [A1, A2,..., An-1, A0].
You perform the shift an infinite number of times and each time you record the value of the binary number represented by the string. The maximum binary number formed after performing the operation (possibly 0 times) is B. Your task is to determine the number of cyclic shifts that must be performed such that the value represented by the string A will be equal to B for the Kth time.
Input format:
First line: A single integer T denoting the number of test cases
For each test case:
First line: Two space-separated integers N and K
Second line: A denoting the string
Output format:
For each test case, print a single line containing one integer that represents the number of cyclic shift operations performed such that the value represented by string A is equal to B for the Kth time.
Code:
import math

def value(s):
    u = len(s)
    d = 0
    for h in range(u):
        d = d + (int(s[u-1-h]) * math.pow(2, h))
    return d

t = int(input())
for i in range(t):
    x = list(map(int, input().split()))
    n = x[0]
    k = x[1]
    a = input()
    v = 0
    for j in range(n):
        a = a[1:] + a[0]
        if value(a) > v:
            b = a
            v = value(a)
    ctr = 0
    cou = 0
    while ctr < k:
        a = a[1:] + a[0]
        cou = cou + 1
        if a == b:
            ctr = ctr + 1
    print(cou)
In the problem, the constraint on n is 0 <= n <= 1e5. In the function value(), you are calculating an integer from a binary string whose length can go up to 1e5, so the integer you compute can be as large as pow(2, 1e5). This is surely impractical.
As mentioned by Prune, you must use an efficient algorithm for finding the subsequence, say sub1, whose repetitions make up the given string A. If you solve this by brute force, the time complexity will be O(n*n); since the maximum value of n is 1e5, the time limit will be exceeded, so use an efficient algorithm.
I can't do much with the code you posted, since you obfuscated it with meaningless variables and a lack of explanation. When I scan it, I get the impression that you've made the straightforward approach of doing a single-digit shift in a long-running loop. You count iterations until you hit B for the Kth time.
This is easy to understand, but cumbersome and inefficient.
Since the cycle repeats every N iterations, you gain no new information from repeating that process. All you need to do is find where in the series of N iterations you encounter B ... which could be multiple times.
In order for B to appear multiple times, A must consist of a particular sub-sequence of bits, repeated 2 or more times. For instance, 101010 or 011011. You can detect this with a simple addition to your current algorithm: at each iteration, check to see whether the current string matches the original. The first time you hit this, simply compute the repetition factor as rep = len(a) / j. At this point, exit the shifting loop: the present value of b is the correct one.
Now that you have b and its position in the first j rotations, you can directly compute the needed result without further processing.
I expect that you can finish the algorithm and do the coding from here.
Ah -- taken as a requirements description, the wording of your problem suggests that B is a given. If not, then you need to detect the largest value.
To find B, append A to itself. Find the A-length string with the largest value. You can hasten this by finding the longest string of 1s, applying other well-known string-search algorithms for the value-trees after the first 0 following those largest strings.
Note that, while you iterate over A, you look for the first place in which you repeat the original value: this is the desired repetition length, which drives the direct-computation phase in the first part of my answer.
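Not part of the original answers, but here is a rough sketch of that idea, assuming B has to be detected as the maximum rotation and that the shift count is reported the same way as in the question's loop (a shift is applied before each comparison, so an answer of 0 is never reported):

def kth_shift_to_max(a, k):
    n = len(a)
    doubled = a + a
    # Smallest period p: rotating by p gives back the original string,
    # so all distinct rotations appear among shifts 0 .. p-1.
    p = doubled.find(a, 1)
    # Maximum rotation among those p candidates, compared as strings
    # (no huge-integer conversion); each comparison is O(n).
    best = max(range(p), key=lambda i: doubled[i:i + n])
    # B first appears after `best` shifts (or after a full period p if A is
    # already maximal) and then reappears every p shifts.
    first = best if best > 0 else p
    return first + (k - 1) * p

print(kth_shift_to_max("101100", 1))  # 2: after two shifts A becomes "110010"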
I'm currently studying a module called data structures and algorithms at a university. We've been tasked with writing an algorithm that finds the smallest positive integer which does not occur in a given sequence. I was able to find a solution, but is there a more efficient way?
x = [5, 6, 3, 1, 2]

def missing_integer():
    for i in range(1, 100):
        if i not in x:
            return i

print(missing_integer())
The instructions include some examples:
given x = [1, 3, 6, 4, 1, 2], the function should return 5,
given x = [1, 2, 3], the function should return 4 and
given x = [−1, −3], the function should return 1.
You did not ask for the most efficient way to solve the problem, just if there is a more efficient way than yours. The answer to that is yes.
If the missing integer is near the top of the range of the integers and the list is long, your algorithm has a run-time efficiency of O(N**2): your loop goes through all possible values, and the not in operator searches through the entire list if a match is not found. (Your code searches only up to the value 100; I assume that is just a mistake on your part and you want to handle sequences of any length.)
Here is a simple algorithm that is merely order O(N*log(N)). (Note that quicker algorithms exist--I show this one since it is simple and thus answers your question easily.) Sort the sequence (which has the order I stated) then run through it starting at the smallest value. This linear search will easily find the missing positive integer. This algorithm also has the advantage that the sequence could involve negative numbers, non-integer numbers, and repeated numbers, and the code could easily handle those. This also handles sequences of any size and with numbers of any size, though of course it runs longer for longer sequences. If a good sort routine is used, the memory usage is quite small.
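A minimal sketch of that sort-then-scan idea (my illustration, not the answerer's code):

def smallest_missing_positive_sorted(seq):
    candidate = 1
    for value in sorted(seq):       # O(N log N)
        if value == candidate:      # negatives, repeats and gaps are skipped
            candidate += 1
    return candidate

print(smallest_missing_positive_sorted([1, 3, 6, 4, 1, 2]))  # 5
print(smallest_missing_positive_sorted([-1, -3]))            # 1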
I think the O(n) algorithm goes like this: initialise an array record of length n + 2 (list in Python) to None, and iterate over the input. If the element is one of the array indexes, set the element in the record to True. Now iterate over the new list record starting from index 1. Return the first None encountered.
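A sketch of that record-array idea (not in the original comment); it relies on the fact that the answer is always in the range 1 .. len(seq) + 1:

def smallest_missing_positive(seq):
    n = len(seq)
    record = [None] * (n + 2)       # indices 0 .. n+1
    for value in seq:
        if 1 <= value <= n + 1:     # only values that can affect the answer
            record[value] = True
    for i in range(1, n + 2):       # first unmarked index is the answer
        if record[i] is None:
            return i

print(smallest_missing_positive([1, 3, 6, 4, 1, 2]))  # 5
print(smallest_missing_positive([1, 2, 3]))           # 4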
The slow step in your algorithm is that line:
if i not in x:
That step takes linear time, which makes the entire algorithm O(N*N). If you first turn the list into a set, the lookup is much faster:
def missing_integer():
    sx = set(x)
    for i in range(1, 100):
        if i not in sx:
            return i
Lookup in a set is fast, in fact it takes constant time, and the algorithm now runs in linear time O(N).
Another solution is to create an array with a size equal to the maximum value, traverse the input and mark each location of the array whose value is seen, then iterate from the start of the array and report the first unmarked location as the smallest missing value. This is done in O(n) (filling the array plus finding the smallest unmarked location).
Also, for negative values you can add |min value| to all values to make them positive, then apply the above method.
The space complexity of this method is Θ(n).
To know more, see this post about the implementation and scrutinize more about this method.
Can be done in O(n) time with a bit of maths: initialise minimum, maximum and sum variables, then loop once through the numbers to find the minimum, the maximum and the sum of all the numbers (mn, mx, sm).
Now the sum of the integers 0..n is n*(n+1)/2 = s(n)
Therefore: missing_number = (s(mx) - s(mn-1)) - sm
All done with traversing the numbers only once!
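A sketch of that arithmetic approach (not from the original answer); note it only works under the assumption that the numbers are distinct consecutive integers with exactly one value missing between mn and mx, and it does not handle duplicates or the "nothing missing" case:

def missing_by_sum(seq):
    mn, mx, sm = min(seq), max(seq), sum(seq)   # could be combined into one pass
    # s(k) = 0 + 1 + ... + k = k*(k+1)/2, so the expected sum of mn..mx is:
    expected = mx * (mx + 1) // 2 - (mn - 1) * mn // 2
    return expected - sm

print(missing_by_sum([1, 2, 4, 5]))  # 3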
My answer using list comprehension:
def solution(A):
    max_val = max(A)
    if max_val <= 0:
        return 1
    # positive values below max_val that are missing from A
    li = [X for X in range(1, max_val) if X not in A]
    if li:
        return min(li)
    return max_val + 1


L = [-1, -3]
res = solution(L)
print(res)
Someone posted this question here a few weeks ago, but it looked awfully like homework without prior research, and the OP promptly removed it after getting a few downvotes.
The question itself was rather interesting though, and I've been thinking about it for a week without finding a satisfying solution. Hopefully someone can help?
The question is as follows: given a list of N integer intervals, whose bounds can take any values from 0 to N³, find the smallest integer i such that i does not belong to any of the input intervals.
For example, if given the list [3,5] [2,8] [0,3] [10,13] (N = 4), the algorithm should return 9.
The simplest solution that I can think of runs in O(n log(n)), and consists of three steps:
Sort the intervals by increasing lower bound
If the smallest lower bound is > 0, return 0;
Otherwise repeatedly merge the first interval with the second, until the first interval (say [a, b]) does not touch the second (say [c, d]) — that is, until b + 1 < c, or until there is only one interval.
Return b + 1
This simple solution runs in O(n log(n)), but the original poster wrote that the algorithm should run in O(n). That's trivial if the intervals are already sorted, but the example that the OP gave included unsorted intervals. I guess it must have something to do with the N³ bound, but I'm not sure what... Hashing? Linear time sorting? Ideas are welcome.
Here is a rough python implementation for the algorithm described above:
def merge(first, second):
    (a, b), (c, d) = first, second
    if c <= b + 1:
        return (a, max(b, d))
    else:
        return False

def smallest_available_integer(intervals):
    # Sort in reverse order so that push/pop operations are fast
    intervals.sort(reverse=True)
    if intervals == [] or intervals[-1][0] > 0:
        return 0
    while len(intervals) > 1:
        first = intervals.pop()
        second = intervals.pop()
        merged = merge(first, second)
        if merged:
            print("Merged", first, "with", second, " -> ", merged)
            intervals.append(merged)
        else:
            print(first, "cannot be merged with", second)
            break
    return first[1] + 1

print(smallest_available_integer([(3,5), (2,8), (0,3), (10,13)]))
Output:
Merged (0, 3) with (2, 8) -> (0, 8)
Merged (0, 8) with (3, 5) -> (0, 8)
(0, 8) cannot be merged with (10, 13)
9
Elaborating on @mrip's comment: you can do this in O(n) time by using the exact algorithm you've described, but changing how the sorting step works.
Typically, radix sort uses base 2: the elements are divvied into two different buckets based on whether the current bit is 0 or 1. Each round of radix sort takes time O(n), and there is one round per bit of the largest number. Calling that largest number U, this means the time complexity is O(n log U).
However, you can change the base of the radix sort to other bases. Using base b, each round takes time O(n + b), since it takes time O(b) to initialize and iterate over the buckets and O(n) time to distribute elements into the buckets. There are then log_b U rounds. This gives a runtime of O((n + b) log_b U).
The trick here is that since the maximum number U = n^3, you can set b = n and use a base-n radix sort. The number of rounds is now log_n U = log_n n^3 = 3, and each round takes O(n) time, so the total work to sort the numbers is O(n). More generally, you can sort numbers in the range [0, n^k) in time O(kn) for any k. If k is a fixed constant, this is O(n) time.
Combined with your original algorithm, this solves the problem in time O(n).
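To make the idea concrete, here is a rough sketch (my code, not from the original answer) of an LSD radix sort of the intervals by lower bound using base n:

def radix_sort_intervals(intervals, base):
    # LSD radix sort by lower bound; bounds must be non-negative integers
    if not intervals:
        return intervals
    max_lo = max(lo for lo, _ in intervals)
    place = 1
    while place <= max_lo:
        buckets = [[] for _ in range(base)]             # O(base) per round
        for interval in intervals:                      # O(n) per round
            buckets[(interval[0] // place) % base].append(interval)
        intervals = [iv for bucket in buckets for iv in bucket]
        place *= base
    return intervals

# Bounds are at most n^3, so with base n there are only 3 rounds: O(n) overall.
data = [(3, 5), (2, 8), (0, 3), (10, 13)]
print(radix_sort_intervals(data, base=max(len(data), 2)))
# [(0, 3), (2, 8), (3, 5), (10, 13)]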
Hope this helps!
Another idea would be use the complement of these intervals somehow. Suppose C() gives you the complement for an interval, for example C([3,5]) would be the integer numbers smaller than 3 and those larger than 5. If the maximum number is N^3, then using modulo N^3+1 you could even represent this as another interval [6,(N^3+1)+2].
If you want a number that does not belong to any of the original intervals, this same number should be present in all of the complements of these intervals. It then comes down to writing a function that can calculate the intersection of any two such 'complement intervals'.
I haven't made an implementation of this idea, since my pen and paper drawings indicated that there were more cases to consider when calculating such an intersection than I first imagined. But I think the idea behind this is valid, and it would result in an O(n) algorithm.
EDIT
On further thought, there is a worst case scenario that makes things more complex than I originally imagined.