Why does this function have exponential complexity big O instead of quadratic? - python

The following function has been given:
def genSubsets(L):
    res = []
    if len(L) == 0:
        return [[]]
    smaller = genSubsets(L[:-1])
    extra = L[-1:]
    new = []
    for small in smaller:
        new.append(small + extra)
    return smaller + new
From my understanding, making a copy of a list is O(n), and looping is O(n) as well, which should make this O(n^2). However, it seems that my logic is flawed and the answer is O(2^n). Why?

From my understanding, making a copy of a list is O(n)
You are correct that making a copy of a list of n items takes time O(n). And in this case, each of the lists that's being copied is a subset of the original list, which has length n, so each list copied does take time O(n).
then looping is O(n) as well
Looping over a list of length n takes time O(n). However, in this case, the lists that you're looping over do not have n elements in them. There are 2^n subsets of a set of size n, so at the top-level recursive call, when you recursively generate all subsets of L[:-1], you will end up with a list of 2^(n-1) items. Looping over that list takes time O(2^n).
More generally, when looking at a loop or a list, it's important to ask "how many times does this loop run?" or "how many elements are in this list?"
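To make the growth concrete, here is a small check (my own sketch, not part of the original answer) that counts how many subsets genSubsets returns as the input grows; the count doubles with every extra element, so the loop over smaller runs an exponential number of times.
def genSubsets(L):
    if len(L) == 0:
        return [[]]
    smaller = genSubsets(L[:-1])
    extra = L[-1:]
    new = []
    for small in smaller:
        new.append(small + extra)
    return smaller + new

# The top-level loop iterates over 2**(n-1) subsets, so the total
# work is exponential in n, not quadratic.
for n in range(1, 8):
    print(n, len(genSubsets(list(range(n)))), 2 ** n)  # counts match 2**n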

Is iterating through a set faster than through a list?

I'm doing the longest consecutive sequence problem on LeetCode (https://leetcode.com/problems/longest-consecutive-sequence/) and wrote the following solution:
(I made a typo earlier and put s instead of nums on line 6)
class Solution:
    def longestConsecutive(self, nums: List[int]) -> int:
        s = set(nums)
        res = 0
        for n in nums:
            if n - 1 not in nums:
                c = 1
                while n + 1 in s:
                    c += 1
                    n += 1
                res = max(res, c)
        return res
This solution takes 4902 ms according to the website, but when I change the first for loop to
for n in s:
The runtime drops to 491 ms. Is looping through the hashset 10 times faster?
If you change if n - 1 not in nums to if n - 1 not in s, you should see the runtime drop a lot. The in operator is faster on a set than on a list: generally, in on a set takes O(1), while it takes O(n) on a list. https://wiki.python.org/moin/TimeComplexity
Regarding iterating over a set versus a list: iterating through a set can be faster if there are lots of duplicates in the list. E.g., iterating through a list of n identical elements takes O(n), while iterating through the corresponding set takes O(1), since it contains only one element.
Is iterating through a set faster than through a list?
No, iterating through either of these data structures takes the same time for the same number of elements.
However, while n + 1 in s does not actually iterate through the elements of s. The in here is an operator that checks whether the value n + 1 is an element of s. If s is a set, this check takes O(1) time on average; if s is a list, it takes O(n) time.
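As a rough illustration of that difference (my own sketch, not part of the original answers), a quick timeit comparison of list membership versus set membership:
import timeit

nums = list(range(100000))
s = set(nums)

# Membership test on a list scans elements one by one: O(n).
print(timeit.timeit("99999 in nums", globals=globals(), number=1000))

# Membership test on a set is a hash lookup: O(1) on average.
print(timeit.timeit("99999 in s", globals=globals(), number=1000))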

space complexity for python function

Could someone help me work out the space complexity of this Python code?
input: nums = [1, 2, 3, 4, 5, 6, 7, 8, ...]
m = some integer
for i in range(len(nums)):
    temp = nums[i:i+m]
Should the space complexity be O(m) or O(n*m), and why? Thank you!
Not including the input, and given that m doesn't seem to be a constant, that piece of code is just O(m): at any given point in time we are only storing one chunk nums[i:i+m], because temp is reassigned to a new sublist on every iteration, which makes the previous sublist eligible for garbage collection.
So even if there are 1 million nums and m is only 5, we only ever store 5 items at a time; on the next iteration we drop the previous 5 items and store a new set of 5 (depending on the Python implementation, this might even reuse the same memory and overwrite the previous contents), and so on.
But if you are storing each sublist such as:
temp_list = []
for i in range(len(nums)):
    temp = nums[i:i+m]
    temp_list.append(temp)
Then it should be O(m * len(nums)) because we will be storing m items for each element in nums.
Ignoring Python's garbage collection, you get space complexity of O(mn), where n = len(nums). That's because you first allocate a list of n elements, and then you allocate n lists of m elements each (note that slicing a list creates a new allocation). That gives a total of n + mn cell allocations, which is asymptotically O(mn).
But the lists created in the for loop are all referenced by temp. That means that as soon as a new list is created, the previous one has no references and it becomes eligible for garbage collection. That leaves us practically with two lists: nums with length of n, and the last temp with length of m, which amounts to space complexity of O(m + n).
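If you want to check this empirically, here is a rough sketch (my addition, not from the original answers) using the standard tracemalloc module to compare the peak memory of the two variants:
import tracemalloc

def reassign_only(nums, m):
    # Each slice replaces the previous one: extra space stays at O(m).
    for i in range(len(nums)):
        temp = nums[i:i+m]

def keep_all(nums, m):
    # Every slice is kept alive in temp_list: extra space grows to O(m * n).
    temp_list = []
    for i in range(len(nums)):
        temp_list.append(nums[i:i+m])

nums = list(range(10000))
for fn in (reassign_only, keep_all):
    tracemalloc.start()
    fn(nums, 50)
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    print(fn.__name__, peak)  # keep_all peaks far higher than reassign_only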

Why this strange execution time

I am using this sorting algorithm:
def merge(L, R):
    """Merge 2 sorted lists provided as input
    into a single sorted list
    """
    M = []                   # Merged list, initially empty
    indL, indR = 0, 0        # start indices
    nL, nR = len(L), len(R)
    # Add one element to M per iteration until an entire sublist
    # has been added
    for i in range(nL + nR):
        if L[indL] < R[indR]:
            M.append(L[indL])
            indL = indL + 1
            if indL >= nL:
                M.extend(R[indR:])
                break
        else:
            M.append(R[indR])
            indR = indR + 1
            if indR >= nR:
                M.extend(L[indL:])
                break
    return M

def func(L_all):
    if len(L_all) == 1:
        return L_all[0]
    else:
        L_all[-1] = merge(L_all[-2], L_all.pop())
        return func(L_all)
merge() is the classical merge step: given two lists of sorted numbers, it merges them into a single sorted list and has linear complexity. An example of input is L_all = [[1,3],[2,4],[6,7]], a list of N sorted lists. The algorithm applies merge to the last elements of the list until there is just one element left, which is sorted. I have evaluated the execution time for different N, using a constant length for the lists inside the list, and I have obtained an unexpected pattern: the algorithm has linear complexity, but the execution time is constant, as you can see in the graph.
What could be the explanation for the fact that the execution time does not depend on N?
You haven't shown your timing code, but the problem is likely that your func mutates the list L_all so that it becomes a list of length 1, containing a single sorted list. After the first call func(L_all) in timeit, all subsequent calls don't change L_all at all; they just instantly return L_all[0]. Rather than 100000 calls to func(L_all) for each N in timeit, you are in effect doing just one real call for each N. Your timing code merely shows that return L_all[0] is O(1), which is hardly surprising.
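You can see the mutation with a quick check (my own sketch, which assumes the merge and func definitions from the question have been run): after the first call the input has collapsed to a single sorted list, so every later call returns immediately.
# assumes merge() and func() from the question are already defined
L_all = [[1, 3], [2, 4], [6, 7]]
print(len(L_all))   # 3
print(func(L_all))  # [1, 2, 3, 4, 6, 7]
print(len(L_all))   # 1 -- the outer list has been consumed by pop()
print(func(L_all))  # from now on this just returns L_all[0] instantly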
I would rewrite your code like this:
import functools, random, timeit

def func(L_all):
    return functools.reduce(merge, L_all)

for n in range(1, 10):
    L = [sorted([random.randint(1, 10) for _ in range(5)]) for _ in range(n)]
    print(timeit.timeit("func(L)", globals=globals()))
Then even for these smallish n you see a clear dependence on n:
0.16632885999999997
1.711736347
3.5761923199999996
6.058960655
8.796722217
15.112843280999996
17.723825805000004
22.803739991999997
26.114925834000005

Time complexity of Python Function Involving List Operations

When I plot the time taken for the following algorithm for different size input, the time complexity appears to be polynomial. I'm not sure which operations account for this.
I'm assuming it's to do with list(s), del l[i] and l[::-1], but I'm not clear what the complexity of these is individually. Can anyone please explain?
Also, is there a way to optimize the algorithm without completely changing the approach? (I know there is a way to bring it down to linear time complexity by using "double-ended pincer-movement".)
def palindrome_index(s):
    for i, c in enumerate(s):
        l = list(s)
        del l[i]
        if l[::-1] == l:
            return i
    return -1
Your algorithm indeed is quadratic in len(s):
In iteration i, you perform operations that are linear in the length: creating the list, reversing it, and (linear on average) deleting element i. Since you do this len(s) times, the whole thing is quadratic in len(s).
I'm assuming it's to do with list(s), del l[i] and l[::-1], but I'm not clear what the complexity of these is individually. Can anyone please explain?
Each of these operations is linear time (at least on average, which is enough to analyze your algorithm). Constructing a list, either from an iterable, or by reversing an existing list, is linear in the length of the list. Deleting element i, at the very least, requires about n - i + 1 shifts of the elements, as each one is moved back once.
All of these are linear "O(n)":
list(s)
list(s) creates a new list from s. To do that, it has to go through all elements in s, so its time is proportional to the length of s.
l[::-1]
Just like list(s), l[::-1] creates a new list with the same elements as l, but in different order. It has to touch each element once, so its time is proportional to the length of l.
del l[i]
In order to delete an element at position i, the element which was at position i+1 has to be moved to position i, then the element which was at i+2 has to be moved to position i+1, and so on. So if you are deleting the first element (del l[0]), it has to move all the other elements of the list, and if you are deleting the last (del l[-1]), it just has to remove the last one. On average, it will move n/2 elements, so it is also linear.
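For reference, here is a minimal sketch of the linear-time two-pointer ("double-ended pincer") approach the question alludes to. This is my own illustration, not part of the original answers, and it follows the common convention of returning -1 when the string is already a palindrome, which differs slightly from the quadratic version above.
def palindrome_index_linear(s):
    # Returns an index whose removal makes s a palindrome,
    # or -1 if s is already a palindrome or no single removal works.
    def is_palindrome(lo, hi):
        while lo < hi:
            if s[lo] != s[hi]:
                return False
            lo += 1
            hi -= 1
        return True

    lo, hi = 0, len(s) - 1
    while lo < hi:
        if s[lo] != s[hi]:
            # On the first mismatch, try skipping either character.
            if is_palindrome(lo + 1, hi):
                return lo
            if is_palindrome(lo, hi - 1):
                return hi
            return -1
        lo += 1
        hi -= 1
    return -1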

Python very slow random sampling over big list

I'm experiencing very slow performance with the algorithm below.
I have a very large (1,000,000+) list containing large strings.
i.e.: id_list = ['MYSUPERLARGEID:1123:123123', 'MYSUPERLARGEID:1123:134534389', 'MYSUPERLARGEID:1123:12763']...
num_reads is the maximum number of elements to randomly choose from this list.
The idea is to randomly choose one of the string ids in id_list until num_reads is reached, and to add them (I say add, not append, because I don't care about the order of random_id_list) into random_id_list, which is empty at the beginning.
I can't repeat the same id, so I remove it from the original list after it has been randomly chosen. I suspect this is what makes the script so slow, but maybe I'm wrong and another part of this loop is responsible for the slow behavior.
for x in xrange(0, num_reads):
    id_index, id_string = random.choice(list(enumerate(id_list)))
    random_id_list.append(id_string)
    del read_id_list[id_index]
Use random.sample() to produce a sample of N elements with no repeats:
random_id_list = random.sample(read_id_list, num_reads)
Removing elements from the middle of a large list is indeed slow, as everything beyond that index has to be moved up a step.
This does not, of course, remove elements from the original list anymore, so repeated random.sample() calls can still give you samples with elements that have been picked before. If you need to produce samples repeatedly until your list is exhausted, then shuffle once and from there on out take consecutive slices of k elements from the shuffled list:
def random_samples(k):
    random.shuffle(id_list)
    for i in range(0, len(id_list), k):
        yield id_list[i : i + k]
then use this to produce your samples; either in a loop or with next():
sample_gen = random_samples(num_reads)
random_id_list = next(sample_gen)
# some point later
another_random_id_list = next(sample_gen)
Because the list is shuffled entirely randomly, the slices produced this way are also all valid random samples.
The "hard" way, instead of just shuffling the list, is to evaluate each element of your list in order, and selecting the item with a probability that relies on both the number of items you still need to choose and the number of items left to choose from. This is useful if you don't have the entire list presented to you at once (a so-called on-line algorithm).
Let's say you need to select k of N items. That means each item has a k/N probability of being chosen, if you can consider all items at once. However, if you accept the first item, then you only need to select k-1 items from N-1 remaining items. If you reject it, you still need k items from N-1 remaining items. So the algorithm would look like
N = len(id_list)
k = 10  # For example
choices = []
for i in id_list:
    if random.randint(1, N) <= k:
        choices.append(i)
        k -= 1
    N -= 1
Initially, the first item is chosen with the expected probability of k/N. As you go through your list, N steadily decreases, while k decreases as you actually accept items. Note that each item, overall, still has a p = k/N chance of being chosen. As an example, consider the second item in the list. Let p_i be the probability that you choose the i-th element in the list. p_1 is obviously k/N, given the starting values of k and N. Consider p_2 for example.
p_2 = p_1 * (k-1)/(N-1) + (1 - p_1) * k/(N-1)
    = (p_1*k - p_1 + k - k*p_1) / (N-1)
    = (k - p_1) / (N-1)
    = (k - k/N) / (N-1)
    = k/(N-1) - k/(N*(N-1))
    = (k*N - k) / (N*(N-1))
    = k/N
A similar (but longer) analysis holds for p_3, p_4, and so on.
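If you want to sanity-check the k/N claim empirically, here is a rough simulation (my own addition): run the selection loop many times and count how often each position is picked; every frequency should come out close to k/N, e.g. 10/50 = 0.2 with the values below.
import random

def select(items, k):
    # Selection sampling: keep each element with probability k/N overall.
    N = len(items)
    choices = []
    for x in items:
        if random.randint(1, N) <= k:
            choices.append(x)
            k -= 1
        N -= 1
    return choices

trials = 20000
counts = [0] * 50
for _ in range(trials):
    for picked in select(range(50), 10):
        counts[picked] += 1

# Each of the 50 positions should be chosen in roughly 20% of trials.
print([round(c / trials, 2) for c in counts[:5]])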
