I'm currently studying a module called data structures and algorithms at a university. We've been tasked with writing an algorithm that finds the smallest positive integer which does not occur in a given sequence. I was able to find a solution, but is there a more efficient way?
x = [5, 6, 3, 1, 2]
def missing_integer():
    for i in range(1, 100):
        if i not in x:
            return i

print(missing_integer())
The instructions include some examples:
given x = [1, 3, 6, 4, 1, 2], the function should return 5,
given x = [1, 2, 3], the function should return 4, and
given x = [-1, -3], the function should return 1.
You did not ask for the most efficient way to solve the problem, just if there is a more efficient way than yours. The answer to that is yes.
If the missing integer is near the top of the range of the integers and the list is long, your algorithm has a run-time efficiency of O(N**2): your loop goes through all possible values, and the not in operator searches through the entire list if a match is not found. (Your code searches only up to the value 100; I assume that is just a mistake on your part and you want to handle sequences of any length.)
Here is a simple algorithm that is merely O(N*log(N)). (Quicker algorithms exist; I show this one because it is simple and thus answers your question easily.) Sort the sequence, which is the step with that cost, then scan it starting from the smallest value; this linear scan easily finds the missing positive integer. The algorithm also has the advantage that the sequence can contain negative numbers, non-integer numbers, and repeated numbers, and the code handles those easily. It works for sequences of any size with numbers of any size, though of course it runs longer for longer sequences. If a good sort routine is used, the memory usage is quite small.
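A minimal sketch of that sort-then-scan idea (the function name is my own):

def smallest_missing_sorted(seq):
    candidate = 1
    for value in sorted(seq):          # the O(N*log(N)) step
        if value == candidate:
            candidate += 1             # candidate is present, try the next
        elif value > candidate:
            break                      # a gap: candidate never appears
    return candidate

print(smallest_missing_sorted([1, 3, 6, 4, 1, 2]))   # 5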
I think the O(n) algorithm goes like this: initialise a record array of length n + 2 (a list in Python) to None, then iterate over the input; whenever an element is a valid index into the record, set that position to True. Now iterate over record starting from index 1 and return the index of the first None encountered.
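A sketch of that record-array idea (the function name is mine); it relies on the fact that the answer is always one of 1..n+1:

def smallest_missing(x):
    record = [None] * (len(x) + 2)     # record[i] is True when i occurs in x
    for v in x:
        if 1 <= v < len(record):       # values outside 1..n+1 cannot matter
            record[v] = True
    for i in range(1, len(record)):
        if record[i] is None:
            return i

print(smallest_missing([1, 2, 3]))     # 4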
The slow step in your algorithm is this line:
if i not in x:
That step takes linear time, which makes the entire algorithm O(N*N). If you first turn the list into a set, the lookup is much faster:
def missing_integer():
    sx = set(x)
    # checking up to len(x) + 1 is enough: one of 1..len(x)+1 must be missing
    for i in range(1, len(x) + 2):
        if i not in sx:
            return i
Lookup in a set is fast; in fact it takes constant time on average, so the algorithm now runs in linear time, O(N).
Another solution is to create an array whose size is the maximum value, traverse the input marking each array position whose value is seen, then iterate from the start of the array and report the first unmarked position as the smallest missing value. This is done in O(n): filling the array and finding the smallest unmarked position are each linear.
Also, if there are negative values, you can shift every element up by the absolute value of the minimum so that all values become positive, then apply the method above.
The space complexity of this method is Θ(n).
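A sketch of that marking method (the names are mine; this version simply ignores non-positive values rather than shifting):

def smallest_missing_marking(x):
    mx = max(x)
    if mx < 1:
        return 1                        # no positive values at all
    seen = [False] * (mx + 1)           # one slot per value 0..mx
    for v in x:
        if v > 0:
            seen[v] = True
    for i in range(1, mx + 1):
        if not seen[i]:
            return i
    return mx + 1                       # 1..mx are all present

print(smallest_missing_marking([1, 3, 6, 4, 1, 2]))   # 5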
This can be done in O(n) time with a bit of maths: initialise minimum, maximum, and sum accumulators, then loop once through the numbers to find the minimum, the maximum, and the sum of all the numbers (mn, mx, sm).
Now the sum of the integers 0..n is s(n) = n*(n+1)/2.
Therefore: missing_number = (s(mx) - s(mn - 1)) - sm
All done with traversing the numbers only once! Note this only identifies the missing value when exactly one number is absent from the contiguous range mn..mx and the sequence contains no duplicates.
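A sketch of that arithmetic approach; the assumptions are in the comments, and for brevity I use the min/max/sum built-ins instead of a single hand-written loop:

def missing_by_sum(seq):
    # assumes exactly one value is absent from the contiguous
    # range mn..mx and that seq contains no duplicates
    mn, mx, sm = min(seq), max(seq), sum(seq)
    def s(n):
        return n * (n + 1) // 2        # sum of the integers 0..n
    return (s(mx) - s(mn - 1)) - sm

print(missing_by_sum([1, 2, 4, 5]))    # 3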
My answer using list comprehension:
def solution(A):
    max_val = max(A)
    if max_val <= 0:
        return 1
    # collect the positive values in range(1, max_val) missing from A
    li = [X for X in range(1, max_val) if X not in A]
    return min(li) if li else max_val + 1

L = [-1, -3]
res = solution(L)
print(res)
I'm solving a question on LeetCode:
Given an array nums containing n + 1 integers where each integer is between 1 and n (inclusive), prove that at least one duplicate number must exist. Assume that there is only one duplicate number, and find the duplicate in O(n) time and O(1) space.
class Solution(object):
    def findDuplicate(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
        xor = 0
        for num in nums:
            newx = xor ^ (2 ** num)
            if newx < xor:
                return num
            else:
                xor = newx
I got the solution accepted, but I have been told that it is neither O(1) space nor O(n) time.
Can anyone please help me understand why?
Your question is actually hard to answer. Typically when dealing with complexities, there's an assumed machine model. A standard model assumes that memory locations are of size log(n) bits when the input is of size n, and that arithmetic operations on numbers of size log(n) bits are O(1).
In this model, your code isn't O(1) in space or O(n) in time. Your xor value has n bits, and this doesn't fit in a constant number of memory locations (it actually needs n/log(n) of them). Similarly, it's not O(n) in time, since the arithmetic operations are on numbers larger than log(n) bits.
To solve your problem in O(1) space and O(n) time, you've got to make sure your values don't get too large. One approach is to xor all the numbers in the array, and then you'll get 1^2^3...^n ^ d where d is the duplicate. Thus you can xor 1^2^3^..^n from the total xor of the array, and find the duplicate value.
def find_duplicate(ns):
    r = 0
    for i, n in enumerate(ns):
        r ^= i ^ n
    return r

print(find_duplicate([1, 3, 2, 4, 5, 4, 6]))
This is O(1) space, and O(n) time, since r never uses more bits than n does (that is, approximately log2(n) bits).
Your solution is not O(1) space, meaning your space/memory use is not constant but depends on the input!
newx=xor^(2**num)
This is a bitwise XOR over log2(2**num) = num bits, where num is one of your input numbers, so the result also needs log2(2**num) = num bits.
For num=10 that is 10 bits; for num=100 it is 100 bits. The space grows linearly with the input values, not constant.
It's also not within the O(n) time complexity, as you have:
an outer loop over all n numbers,
and a non-constant / non-O(1) inner operation (see above).
Assumption: XOR is not constant in regard to the bit representation of its input.
That is not always treated this way, but physics supports the claim (Chandrasekhar limit, speed of light, ...).
This question can be solved with Floyd's cycle-finding algorithm on a linked list.
Convert the array to a linked list. There are n+1 positions but only n distinct values.
For example, if you have the array [1,3,4,2,2], view it as a linked list.
How the pointing works
Starting from index 0, look at the element in that position: it is 1, so index 0 points to index 1. nums[1] is 3, so index 1 points to index 3. Then figure out which index nums[3] points to, and so on.
Now that you have converted this to a linked list, use Floyd's hare-and-tortoise algorithm. You keep two pointers, slow and fast; if there is a cycle, the slow and fast pointers will meet at some point.
from typing import List

class Solution:
    def findDuplicate(self, nums: List[int]) -> int:
        # slow and fast are indices
        slow, fast = 0, 0
        while True:
            slow = nums[slow]
            fast = nums[nums[fast]]
            if slow == fast:
                break
        # slow and fast have now met somewhere inside the cycle.
        # To find where the cycle starts, initialise another pointer
        # from the start, let's name it start; start and slow advance
        # one step at a time, and where they meet is the duplicate.
        start = 0
        while True:
            slow = nums[slow]
            start = nums[start]
            if slow == start:
                return slow
Notice that no element ever points back to index 0, because the values range over 1..n: we follow links via nums[value], and since no value is 0, nothing points to nums[0]. That makes index 0 a safe starting point outside the cycle.
You can find the xor of all the numbers in the array (let's call it x) and then calculate the xor of the numbers 1, 2, 3, ..., n (let's call it y). Now x xor y will be your answer.
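A short sketch of that xor idea (the helper name is mine; the array is assumed to hold n + 1 values drawn from 1..n with one duplicate):

def find_duplicate_xor(nums):
    n = len(nums) - 1
    x = 0
    for v in nums:                     # xor of all array values
        x ^= v
    for i in range(1, n + 1):          # xor with 1..n cancels the singles
        x ^= i
    return x                           # only the duplicate survives

print(find_duplicate_xor([1, 3, 2, 4, 5, 4, 6]))   # 4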
For one of my programming questions, I am required to define a function that accepts two variables, a list of length l and an integer w. I then have to find the maximum sum of a sublist with length w within the list.
Conditions:
1 <= w <= l <= 100000
Each element of the list is in the range [1, 100].
Currently, my solution works in O(n^2) (correct me if I'm wrong, code attached below), which the autograder does not accept, since we are required to find an even simpler solution.
My code:
def find_best_location(w, lst):
    best = 0
    n = 0
    while n <= len(lst) - w:
        lists = lst[n: n + w]
        cur = sum(lists)
        best = cur if cur > best else best
        n += 1
    return best
If anyone is able to find a more efficient solution, please do let me know! Also if I computed my big-O notation wrongly do let me know as well!
Thanks in advance!
1) Find the sum current of the first w elements and assign it to best.
2) Starting from i = w: current = current + lst[i] - lst[i-w], best = max(best, current).
3) Done; a sketch of these steps follows below.
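A sketch of those three steps, reusing the question's own function name:

def find_best_location(w, lst):
    current = sum(lst[:w])             # step 1: sum of the first w elements
    best = current
    for i in range(w, len(lst)):       # step 2: slide the window
        current += lst[i] - lst[i - w]
        best = max(best, current)
    return best

print(find_best_location(2, [1, 2, 3]))   # 5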
Your solution is indeed O(n^2) (or O(n*W) if you want a tighter bound)
You can do it in O(n) by creating an aux array sums, where:
sums[0] = l[0]
sums[i] = sums[i-1] + l[i]
Then, by iterating over it and checking sums[i] - sums[i-w], you can find your solution in linear time.
You can even calculate sums array on the fly to reduce space complexity, but if I were you, I'd start with it, and see if I can upgrade my solution next.
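A sketch of that prefix-sum version; I pad sums with a leading 0 so that sums[i] - sums[i - w] needs no special case at the start:

def find_best_location(w, lst):
    sums = [0]                          # sums[i] = lst[0] + ... + lst[i-1]
    for v in lst:
        sums.append(sums[-1] + v)
    return max(sums[i] - sums[i - w] for i in range(w, len(sums)))

print(find_best_location(2, [1, 2, 3]))   # 5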
Write a recursive function that returns the minimum element in an array, where C is the array and s is the size. This is my code:
c = [2,5,6,4,3]
def min(c, s):
    smallest = c[0]
    if c[s] < smallest:
        smallest = c[s]
    else:
        return min

print min(c, s)
Error: s is not defined.
Apparently the computer doesn't know what s stands for in the line print min(c, s).
You need to tell the computer what you want the s variable to be. I propose you use 0 instead of s in the function call; that way you will start looking for the min number from index 0.
That being said, there are other issues with the code, but this will fix your error and let you move forward with your task.
First of all, I'd caution against the use of min as a function name.
min is a Python built-in, so using it as your own function name may cause unwanted results in your code.
If I'm understanding the code and your question correctly, it seems you are trying to take the minimum of the list's length and the smallest integer in the list.
c = [2,5,6,4,3]
# the function takes the array as its parameter, in this case c
def minimum(array):
    '''Compares the length of the array to the smallest integer in it.'''
    length = 0
    for i in array:
        length += 1          # length is 5 in this case
    smallest = min(array)    # smallest is 2 in this case
    print min(smallest, length)
However if all you want is to get the minimum value of an array, just do:
def minval(array):
    print min(array)
A recursive function divides the task into smaller parts that can be solved by the function itself. The function gets repeatedly called with smaller and smaller tasks until they become so simple that they can be computed directly.
Thus the smallest element of a list (when computed recursively) is the minimum of the first element and the smallest element of the rest of the list. The task becomes trivial, when there is only one element. In Python3:
def smallest(lst):
    """recursive for learning, not efficient"""
    first, *rest = lst
    return first if not rest else min(first, smallest(rest))
I agree with others that you should avoid using min as a function name, so that you don't collide with python's builtin implementation. I'm also operating under the assumption that you're not supposed to use min, because otherwise the solution is trivial.
Here's a recursive implementation that doesn't require a second argument, since the list's length can be determined via the len function.
def smallest(lst):
    l = len(lst)
    if l > 1:
        mid = l // 2
        m1 = smallest(lst[:mid])
        m2 = smallest(lst[mid:])
        return m1 if m1 < m2 else m2
    return lst[0]
This checks to see whether the argument list has 2 or more values. If so, it splits the list into two halves, determines the smallest value in each half, then returns the smaller of the results. If there's only one element, it's trivially the smallest and gets returned.
Halving the list on each recursive call bounds the depth of the call stack to O(log n), where n is the original list's length. This prevents stack overflow from occurring with any list you could actually create in Python. Another proposed solution whittles the list down one-by-one, and will fail on lists with more than a thousand or so values.
What is the fastest method to get the k smallest numbers in an unsorted list of size N using python?
Is it faster to sort the big list of numbers and then take the k smallest, or to get the k smallest by finding the minimum in the list k times, making sure you remove each found minimum from the search before the next pass?
You could use a heap queue; it can give you the K largest or smallest numbers out of a list of size N in O(NlogK) time.
The Python standard library includes the heapq module, complete with a heapq.nsmallest() function ready implemented:
import heapq
k_smallest = heapq.nsmallest(k, input_list)
Internally, this creates a heap of size K with the first K elements of the input list, then iterates over the remaining N-K elements, pushing each onto the heap and popping off the largest. Such a push-and-pop pair takes log K time, making the overall operation O(NlogK).
The function also optimises the following edge cases:
If K is 1, the min() function is used instead, giving you a O(N) result.
If K >= N, the function uses sorting instead, since O(NlogN) would beat O(NlogK) in that case.
A better option is to use the introselect algorithm, which offers an O(n) option. The only implementation I am aware of is using the numpy.partition() function:
import numpy
# assuming you have a python list, you need to convert to a numpy array first
array = numpy.array(input_list)
# partition, slice back to the k smallest elements, convert back to a Python list
k_smallest = numpy.partition(array, k)[:k].tolist()
Apart from requiring installation of numpy, this also takes N memory (versus K for heapq), as a copy of the list is created for the partition.
If you only wanted indices, you can use, for either variant:
heapq.nsmallest(k, range(len(input_list)), key=input_list.__getitem__) # O(NlogK)
numpy.argpartition(numpy.array(input_list), k)[:k].tolist() # O(N)
If the list of the kth smallest numbers doesn't need to be sorted, this can be done in O(n) time with a selection algorithm like introselect. The standard library doesn't come with one, but NumPy has numpy.partition for the job:
partitioned = numpy.partition(l, k)
# The subarray partitioned[:k] now contains the k smallest elements.
You might want to take a look at heapq:
In [109]: L = [random.randint(1,1000) for _ in range(100)]
In [110]: heapq.nsmallest(10, L)
Out[110]: [1, 17, 17, 19, 24, 37, 37, 45, 63, 73]
EDIT: this assumes that the list is immutable. If the list is an array and can be modified there are linear methods available.
You can get the complexity down to O(n * log k) by maintaining a heap of the k smallest elements seen so far.
Initially put the first k elements into a max-heap.
For every subsequent element, compare it with the heap's maximum; if it is smaller, replace the maximum with it and heapify.
Each such replacement takes logarithmic time in k, and hence the time complexity is as above.
You can do it in O(kn) with a selection algorithm. Once kn >= n log n, switch to sorting. That said, the constant on the selection algorithm tends to be a lot higher than the one on quicksort, so you really need to compare c1*(kn) and c2*(n log n). In practice, it's usually more desirable to just sort unless you're dealing with large n or very small k.
Edit: see comments. It's actually a lot better.
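For reference, a quickselect-style sketch of such a selection algorithm (the function name and the random-pivot choice are my own; average O(n) per call):

import random

def quickselect_k_smallest(lst, k):
    # returns the k smallest elements, in no particular order
    if k <= 0:
        return []
    if k >= len(lst):
        return list(lst)
    pivot = random.choice(lst)
    less = [x for x in lst if x < pivot]
    equal = [x for x in lst if x == pivot]
    greater = [x for x in lst if x > pivot]
    if k <= len(less):
        return quickselect_k_smallest(less, k)
    if k <= len(less) + len(equal):
        return less + equal[:k - len(less)]
    return less + equal + quickselect_k_smallest(greater, k - len(less) - len(equal))

print(quickselect_k_smallest([5, 1, 4, 2, 3], 2))   # [1, 2] in some order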
Using heapq's nsmallest is less code, but if you are looking to implement it yourself, this is a simple way to do it. This solution loops through the data only once, but since heappush and heappop run in O(log k), the algorithm performs best for small values of k.
import heapq

def getsmallest(arr, k):
    # max-heap via negation: m holds the negatives of the k smallest so far
    m = [-x for x in arr[:k]]
    heapq.heapify(m)
    for num in arr[k:]:
        # keep num only if it is smaller than the current maximum
        heapq.heappush(m, max(-num, heapq.heappop(m)))
    return [-x for x in m]

if __name__ == '__main__':
    l = [1, 2, 3, 52, 2, 3, 1]
    print(getsmallest(l, 5))
I am creating a fast method of generating a list of primes in the range(0, limit+1). In the function I end up removing all integers in the list named removable from the list named primes. I am looking for a fast and pythonic way of removing the integers, knowing that both lists are always sorted.
I might be wrong, but I believe list.remove(n) iterates over the list, comparing each element with n, meaning that the following code runs in O(n^2) time.
# removable and primes are both sorted lists of integers
for composite in removable:
    primes.remove(composite)
Based on my assumption (which could be wrong, so please confirm whether or not it is correct) and the fact that both lists are always sorted, I would think that the following code runs faster, since it only loops over each list once, for O(n) time. However, it is not at all pythonic or clean.
i = 0
j = 0
while i < len(primes) and j < len(removable):
    if primes[i] == removable[j]:
        primes = primes[:i] + primes[i+1:]
        j += 1
    else:
        i += 1
Is there perhaps a built in function or simpler way of doing this? And what is the fastest way?
Side notes: I have not actually timed the functions or code above. Also, it doesn't matter if the list removable is changed/destroyed in the process.
For anyone interested the full functions is below:
import math

# returns a list of primes in range(0, limit+1)
def fastPrimeList(limit):
    if limit < 2:
        return list()
    sqrtLimit = int(math.ceil(math.sqrt(limit)))
    primes = [2] + list(range(3, limit + 1, 2))
    index = 1
    while primes[index] <= sqrtLimit:
        removable = list()
        index2 = index
        while primes[index] * primes[index2] <= limit:
            composite = primes[index] * primes[index2]
            removable.append(composite)
            index2 += 1
        for composite in removable:
            primes.remove(composite)
        index += 1
    return primes
This is quite fast and clean: it does O(n) set membership checks and runs in O(n) amortized time (the first line is O(n) amortized, and the second line does n membership checks at amortized O(1) each):
removable_set = set(removable)
primes = [p for p in primes if p not in removable_set]
Here is the modification of your 2nd solution. It does O(n) basic operations (worst case):
tmp = []
i = j = 0
while i < len(primes) and j < len(removable):
    if primes[i] < removable[j]:
        tmp.append(primes[i])
        i += 1
    elif primes[i] == removable[j]:
        i += 1
    else:
        j += 1
primes[:i] = tmp
del tmp
Please note that constants also matter. The Python interpreter is quite slow (i.e. with a large constant) to execute Python code. The 2nd solution has lots of Python code, and it can indeed be slower for small practical values of n than the solution with sets, because the set operations are implemented in C, thus they are fast (i.e. with a small constant).
If you have multiple working solutions, run them on typical input sizes, and measure the time. You may get surprised about their relative speed, often it is not what you would predict.
The most important thing here is to remove the quadratic behavior. You have this for two reasons.
First, calling remove searches the entire list for values to remove. Doing this takes linear time, and you're doing it once for each element in removable, so your total time is O(NM) (where N is the length of primes and M is the length of removable).
Second, removing elements from the middle of a list forces you to shift the whole rest of the list up one slot. So, each one takes linear time, and again you're doing it M times, so again it's O(NM).
How can you avoid these?
For the first, you either need to take advantage of the sorting, or just use something that allows you to do constant-time lookups instead of linear-time, like a set.
For the second, you either need to create a list of indices to delete and then do a second pass to move each element up the appropriate number of indices all at once, or just build a new list instead of trying to mutate the original in-place.
So, there are a variety of options here. Which one is best? It almost certainly doesn't matter; changing your O(NM) time to just O(N+M) will probably be more than enough of an optimization that you're happy with the results. But if you need to squeeze out more performance, then you'll have to implement all of them and test them on realistic data.
The only one of these that I think isn't obvious is how to "use the sorting". The idea is to use the same kind of staggered-zip iteration that you'd use in a merge sort, like this:
def sorted_subtract(seq1, seq2):
    # yields the elements of seq1 that do not appear in seq2;
    # both sequences must be sorted
    i1, i2 = 0, 0
    while i1 < len(seq1) and i2 < len(seq2):
        if seq1[i1] < seq2[i2]:
            yield seq1[i1]
            i1 += 1
        elif seq1[i1] == seq2[i2]:
            i1 += 1
        else:
            i2 += 1
    yield from seq1[i1:]
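With the sieve above, this would be used as, for example: primes = list(sorted_subtract(primes, removable)).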