O(n log n) Running Time Approach - Python

The function consumes a list of ints and produces the unique elements of the list in increasing order. For example:
singles([4,1,4,17,1]) => [1,4,17]
I can only do it in O(n^2) running time and wonder how to change it to O(n) running time without a loop.
def singles(lst):
    if lst==[]: return []
    else:
        rest_fn = list (filter (lambda x: x!=lst[0], lst[1:]))
        return [lst[0]] + singles(rest_fn)

As discussed above, per https://wiki.python.org/moin/TimeComplexity (which is cited in "Time complexity of python set operations?", which also links to a more detailed list of operations), sorted should have time complexity O(n log n). Building a set should have time complexity O(n) on average. Therefore, sorted(set(input)) should have time complexity O(n) + O(n log n) = O(n log n).
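A minimal sketch of the sorted(set(...)) approach described above. It reuses the asker's function name singles and assumes the elements are hashable and mutually comparable, as ints are:

```python
def singles(lst):
    # set(lst) removes duplicates in O(n) on average;
    # sorted(...) then orders the unique values in O(n log n).
    return sorted(set(lst))
```

For the example in the question, singles([4, 1, 4, 17, 1]) gives [1, 4, 17].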
Edit:
If you can't use set, you should mention that. As a hint, assuming you can use sorted, you can still pull out the uniques in O(n) if you use a deque (which has O(1) worst-case insertion). Something like:
from collections import deque
rez = deque()
last = None
for val in sorted(input):
    if val != last:
        rez.append(val)  # deque.append adds to the right end in O(1)
    last = val

Related

Is iterating through a set faster than through a list?

I'm doing the longest consecutive sequence problem on LeetCode (https://leetcode.com/problems/longest-consecutive-sequence/) and wrote the following solution:
(I made a typo earlier and put s instead of nums on line 6)
class Solution:
    def longestConsecutive(self, nums: List[int]) -> int:
        s = set(nums)
        res = 0
        for n in nums:
            if n - 1 not in nums:
                c = 1
                while n + 1 in s:
                    c += 1
                    n += 1
                res = max(res, c)
        return res
This solution takes 4902 ms according to the website, but when I change the first for loop to
for n in s:
The runtime drops to 491 ms. Is looping through the hashset 10 times faster?
If you change if n - 1 not in nums to if n - 1 not in s, you should see the runtime drop a lot. The in operator is faster on a set than on a list. Generally, in on a set takes O(1) on average, while in on a list takes O(n). https://wiki.python.org/moin/TimeComplexity
Regarding iteration, iterating through a set can be faster if there are lots of duplicates in the list. E.g., iterating through a list of n identical elements takes O(n), while iterating through the corresponding set takes O(1), since the set contains only one element.
Is iterating through a set faster than through a list?
No, iterating through either of these data structures is the same for the same number of elements.
However, while n + 1 in s does not necessarily iterate through the elements of s. Here, in is an operator that checks whether the value n+1 is an element of s. If s is a set, this operation takes O(1) time on average. If s is a list, the operator takes O(n) time.
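Putting both answers together, a sketch of the solution with every membership test done against the set. The function name longest_consecutive and the standalone (non-class) form are choices made here for illustration, not the asker's original code:

```python
from typing import List


def longest_consecutive(nums: List[int]) -> int:
    s = set(nums)  # O(n) to build, O(1) average membership tests
    best = 0
    for n in s:  # iterating the set also skips duplicate values
        if n - 1 not in s:  # n is the start of a run
            length = 1
            while n + length in s:
                length += 1
            best = max(best, length)
    return best
```

Each value is visited a constant number of times on average, so the whole function runs in expected O(n) time.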

Would this function be O(n) time complexity?

I'm currently in a data structures class. Would this function be considered O(N)? My thinking was that, since the while loop has no direct correlation to the for loop, it's still O(N). If more information/code is needed for better context, I don't mind editing the post.
I appreciate any clarification.
input_array = [7, 3, 4, 1, 8]
new_min_heap = []
for index in range(0, len(input_array)):
    val = input_array[index]
    new_min_heap.append(val)
    # if new_min_heap has other values, start swappin
    if len(new_min_heap) > 1:
        parent_index = get_parent_index(index)
        parent_val = new_min_heap[parent_index]
        while val < parent_val:
            new_min_heap.swap(index, parent_index)
            val = parent_val
            parent_index = get_parent_index(parent_index)
            parent_val = new_min_heap[parent_index]
I assume n is the size of the input array.
The for loop has O(n) complexity since it is executed n times. The inner while loop exchanges values between the current array element and its ancestors. It is executed at most lg(n) times, so its complexity is O(lg(n)). Thus, the total complexity is O(n lg(n)).
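If by "swap value and parent_value" you mean:
value, parent_value = parent_value, value
then each individual swap is O(1) (as long as there isn't any other loop inside the while loop). The while loop itself can still repeat up to about lg(n) times per element, once for each level the value climbs.
So the whole function is O(n) in the array length only when each element needs a bounded number of swaps; in the worst case it is O(n lg(n)).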
Maybe if you bring us more context the answer could change.
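To see how many times the inner loop actually runs, here is a minimal sketch of insertion with sift-up that counts swaps. The function name build_min_heap, the swap counter, and the parent formula (i - 1) // 2 are assumptions made for illustration; the question does not show get_parent_index or swap.

```python
def build_min_heap(values):
    """Insert values one by one, sifting each up; return (heap, swap count)."""
    heap = []
    swaps = 0
    for val in values:
        heap.append(val)
        i = len(heap) - 1
        # Sift up: at most one swap per tree level, so O(log n) per insert.
        while i > 0:
            parent = (i - 1) // 2
            if heap[i] >= heap[parent]:
                break
            heap[i], heap[parent] = heap[parent], heap[i]
            swaps += 1
            i = parent
    return heap, swaps
```

For the input in the question, build_min_heap([7, 3, 4, 1, 8]) returns ([1, 3, 4, 7, 8], 3). Feeding values in strictly decreasing order makes every new value climb all the way to the root, which is the O(n lg(n)) worst case.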

Why this strange execution time

I am using this sorting algorithm:
def merge(L,R):
    """Merge 2 sorted lists provided as input
    into a single sorted list
    """
    M = [] #Merged list, initially empty
    indL,indR = 0,0 #start indices
    nL,nR = len(L),len(R)
    #Add one element to M per iteration until an entire sublist
    #has been added
    for i in range(nL+nR):
        if L[indL]<R[indR]:
            M.append(L[indL])
            indL = indL + 1
            if indL>=nL:
                M.extend(R[indR:])
                break
        else:
            M.append(R[indR])
            indR = indR + 1
            if indR>=nR:
                M.extend(L[indL:])
                break
    return M
def func(L_all):
    if len(L_all)==1:
        return L_all[0]
    else:
        L_all[-1] = merge(L_all[-2],L_all.pop())
        return func(L_all)
merge() is the classical merge algorithm: given two sorted lists of numbers, it merges them into a single sorted list in linear time. An example of input is L_all = [[1,3],[2,4],[6,7]], a list of N sorted lists. The algorithm applies merge to the last two elements of the list until there is just one element in the list, which is sorted. I have evaluated the execution time for different N, using a constant length for the inner lists, and I have obtained an unexpected pattern. The algorithm has linear complexity but the execution time is constant, as you can see in the graph.
What could explain the fact that the execution time does not depend on N?
You haven't shown your timing code, but the problem is likely to be that your func mutates the list L_all so that it becomes a list of length 1, containing a single sorted list. After the first call func(L_all) in timeit, all subsequent calls don't change L_all at all. Instead, they just instantly return L_all[0]. Rather than 100000 calls to func for each N in timeit, you are in effect just doing one real call for each N. Your timing code just shows that return L_all[0] is O(1), which is hardly surprising.
I would rewrite your code like this:
import functools, random, timeit
def func(L_all):
    return functools.reduce(merge,L_all)
for n in range(1,10):
    L = [sorted([random.randint(1,10) for _ in range(5)]) for _ in range(n)]
    print(timeit.timeit("func(L)",globals=globals()))
Then even for these smallish n you see a clear dependence on n:
0.16632885999999997
1.711736347
3.5761923199999996
6.058960655
8.796722217
15.112843280999996
17.723825805000004
22.803739991999997
26.114925834000005
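The mutation described in this answer can be observed directly. The sketch below keeps the asker's func unchanged but, to stay short, substitutes a stand-in merge built on heapq.merge from the standard library; that substitution is an assumption made here, not the asker's code.

```python
from heapq import merge as heap_merge


def merge(L, R):
    # Stand-in for the asker's merge: combines two sorted lists.
    return list(heap_merge(L, R))


def func(L_all):
    if len(L_all) == 1:
        return L_all[0]
    L_all[-1] = merge(L_all[-2], L_all.pop())
    return func(L_all)


L_all = [[1, 3], [2, 4], [6, 7]]
first = func(L_all)
length_after_first_call = len(L_all)  # the outer list has shrunk to one element
second = func(L_all)  # returns immediately: no merging left to do
```

After the first call L_all is [[1, 2, 3, 4, 6, 7]], so every later call only executes return L_all[0].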

Time complexity of Python Function Involving List Operations

When I plot the time taken for the following algorithm for different size input, the time complexity appears to be polynomial. I'm not sure which operations account for this.
I'm assuming it's to do with list(s), del l[i] and l[::-1], but I'm not clear what the complexity of these is individually. Can anyone please explain?
Also, is there a way to optimize the algorithm without completely changing the approach? (I know there is a way to bring it down to linear time complexity by using "double-ended pincer-movement".)
def palindrome_index(s):
    for i, c in enumerate(s):
        l = list(s)
        del l[i]
        if l[::-1] == l:
            return i
    return -1
Your algorithm is indeed quadratic in len(s):
In iteration i, you perform operations that are linear in the length: creating the list, reversing it, and (linear on average) erasing element i. Since you perform this len(s) times, it is quadratic in len(s).
I'm assuming it's to do with list(s), del l[i] and l[::-1], but I'm not clear what the complexity of these is individually. Can anyone please explain?
Each of these operations is linear time (at least on average, which is enough to analyze your algorithm). Constructing a list, either from an iterable or by reversing an existing list, is linear in the length of the list. Deleting element i requires about n - i shifts of the elements, as each later element is moved down by one position.
All of these are linear "O(n)":
list(s)
list(s) creates a new list from s. To do that, it has to go through all elements in s, so its time is proportional to the length of s.
l[::-1]
Just like list(s), l[::-1] creates a new list with the same elements as l, but in different order. It has to touch each element once, so its time is proportional to the length of l.
del l[i]
In order to delete an element at position i, the element which was at position i+1 has to be moved to position i, then the element which was at i+2 has to be moved to position i+1, etc. So, if you are deleting the first element (del l[0]), it has to move all the remaining elements of the list, and if you are deleting the last (del l[-1]), it just has to remove the last. On average, it will move n/2 elements, so it is also linear.
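For comparison, here is a sketch of the linear-time "pincer" idea the question alludes to: two indices move inward from both ends, and only the first mismatch needs closer inspection. The names palindrome_index_linear and is_palindrome_range are hypothetical. The choice to return len(s) // 2 for an input that is already a palindrome is also an assumption, so the returned index can differ from the first index found by the quadratic version, although removing it still leaves a palindrome.

```python
def is_palindrome_range(s, lo, hi):
    """Check whether s[lo..hi] (inclusive) reads the same in both directions."""
    while lo < hi:
        if s[lo] != s[hi]:
            return False
        lo += 1
        hi -= 1
    return True


def palindrome_index_linear(s):
    """Return an index whose removal leaves a palindrome, or -1 if none exists."""
    if not s:
        return -1
    lo, hi = 0, len(s) - 1
    while lo < hi:
        if s[lo] != s[hi]:
            # Only one of the two mismatching characters can be the one to drop.
            if is_palindrome_range(s, lo + 1, hi):
                return lo
            if is_palindrome_range(s, lo, hi - 1):
                return hi
            return -1
        lo += 1
        hi -= 1
    # Already a palindrome: removing the middle element keeps it one.
    return len(s) // 2
```

Each character is compared a bounded number of times and no copies of the string are made, so the running time is O(n).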

Why is my limit exceeding on the top k frequent question [LEETCODE]?

I have the following code for LeetCode's Top K Frequent Elements question.
The required time complexity is better than O(n log n), where n is the array size.
Isn't my complexity O(n)?
If so, why am I still exceeding the time limit?
def topKFrequent(self, nums, k):
    output = {}
    outlist = []
    for item in nums:
        output[item] = nums.count(item)
    max_count = sorted(output.values(),reverse= True)[:k]
    for key,val in output.items():
        if val in max_count:
            outlist.append(key)
    return (outlist)
testinput: array [1,1,1,2,2,3,1,1,1,2,2,3] k = 2
testoutput: [1,2]
Question link: https://leetcode.com/problems/top-k-frequent-elements/
Your solution is O(n^2), because of this:
for item in nums:
    output[item] = nums.count(item)
For each item in your array, you're looking through the whole array to count the number of elements which are the same.
Instead of doing this, you can get the counts in O(n) by iterating nums and adding 1 to the counter of each item you find as you go.
The O(n log n) in the end will come from sorted(output.values(), reverse=True), because every general-purpose comparison sort (including Timsort) is O(n log n) in the worst case.
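The single-pass counting described above can be sketched as follows; count_items is a hypothetical helper name used only for illustration:

```python
def count_items(nums):
    counts = {}
    for item in nums:
        # One dictionary update per element: O(1) on average, O(n) in total.
        counts[item] = counts.get(item, 0) + 1
    return counts
```

With the test input from the question, count_items([1,1,1,2,2,3,1,1,1,2,2,3]) gives {1: 6, 2: 4, 3: 2}.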
As another answer mentions, your counting is O(n^2) time complexity, which is causing your time limit exceeded. Fortunately, python comes with a Counter object in the collections module, which will do exactly what the other answer describes, but in well-optimized C code. This will reduce your time complexity to O(nlogn).
Furthermore, you can reduce your time complexity to O(nlogk) by replacing the sort call with a min-heap trick. Keep a min-heap of size k, and add the other elements and pop the min one by one, until all elements have been inserted (at some point or another). The k that remain in the heap are your maximum k values.
from collections import Counter
from heapq import heappushpop, heapify
def get_most_frequent(nums, k):
    counts = Counter(nums)
    # (count, item) pairs, so that the heap is ordered by count
    counts = [(count, item) for item, count in counts.items()]
    heap = counts[:k]
    heapify(heap)
    for pair in counts[k:]:
        heappushpop(heap, pair)  # push the pair, then pop the smallest
    return [item for count, item in heap]
If you must return the elements in any particular order, you can sort the k elements in O(klogk) time, which still results in the same O(nlogk) time complexity overall.
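If the result should come back ordered from most to least frequent, one possible shortcut is heapq.nlargest, which returns its k results already sorted in descending order. The function name top_k_sorted is a hypothetical choice for this sketch:

```python
from collections import Counter
from heapq import nlargest


def top_k_sorted(nums, k):
    counts = Counter(nums)
    # nlargest picks the k (item, count) pairs with the highest counts,
    # returned in descending order of count.
    pairs = nlargest(k, counts.items(), key=lambda pair: pair[1])
    return [item for item, count in pairs]
```

For the test input in the question, top_k_sorted([1,1,1,2,2,3,1,1,1,2,2,3], 2) returns [1, 2].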
