Could someone help me work out the space complexity of this Python function?
input: nums = [1, 2, 3, 4, 5, 6, 7, 8....]
m = integer
for i in range(len(nums)):
    temp = nums[i:i+m]
Should the space complexity be O(m) or O(n*m), and why? Thank you!
Not counting the input, and since m is not a constant, this piece of code uses O(m) space. At any given point in time we are only storing one chunk nums[i:i+m]: temp is rebound to a new sublist on every iteration, so the previous sublist is no longer referenced and becomes eligible for garbage collection.
So even if there are 1 million nums and m is only 5, we only store 5 items at a time. Each iteration discards the previous 5 items and stores a new set of 5 (depending on the Python implementation, it might even reuse the same memory and overwrite the previous chunk), and so on.
But if you are storing each sublist such as:
temp_list = []
for i in range(len(nums)):
    temp = nums[i:i+m]
    temp_list.append(temp)
Then it should be O(m * len(nums)) because we will be storing m items for each element in nums.
Ignoring Python's garbage collection, you get space complexity of O(mn), where n = len(nums). That's because you first allocate a list of n elements, and then you allocate n lists of m elements each (note that slicing a list creates a new allocation). That gives a total of n + mn cell allocations, which is asymptotically O(mn).
But the lists created in the for loop are all referenced by temp. That means that as soon as a new list is created, the previous one has no references and it becomes eligible for garbage collection. That leaves us practically with two lists: nums with length of n, and the last temp with length of m, which amounts to space complexity of O(m + n).
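To see the difference between the two loops empirically, here is a minimal sketch. It assumes CPython (where reference counting frees the previous slice as soon as temp is rebound) and uses the standard tracemalloc module. The helper names peak_bytes, overwrite_slices and keep_slices, as well as the sizes, are made up for this illustration; nums is allocated before tracing starts, so only the extra space is measured.

```python
import tracemalloc

def peak_bytes(func, *args):
    """Run func(*args) and return the peak traced allocation in bytes."""
    tracemalloc.start()
    func(*args)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

def overwrite_slices(nums, m):
    # Each new slice replaces the previous one, which can then be freed.
    for i in range(len(nums)):
        temp = nums[i:i + m]

def keep_slices(nums, m):
    # Every slice stays referenced by temp_list until the function returns.
    temp_list = []
    for i in range(len(nums)):
        temp_list.append(nums[i:i + m])
    return temp_list

nums = list(range(2000))   # input list, created before tracing begins
m = 50
peak_overwrite = peak_bytes(overwrite_slices, nums, m)
peak_keep = peak_bytes(keep_slices, nums, m)
print(peak_overwrite, peak_keep)
```

The exact byte counts depend on the interpreter version, but the second figure should be larger by a factor on the order of len(nums).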
The following function has been given:
def genSubsets(L):
    res = []
    if len(L) == 0:
        return [[]]
    smaller = genSubsets(L[:-1])
    extra = L[-1:]
    new = []
    for small in smaller:
        new.append(small+extra)
    return smaller+new
From my understanding, making a copy of a list is O(n), and looping is O(n) as well, which should make this O(n^2). However, it seems that my logic is flawed and the answer is O(2^n). Why?
From my understanding, making a copy of a list is O(n)
You are correct that making a copy of a list of n items takes time O(n). And in this case, each of the lists that's being copied is a subset of the original list, which has length n, so each list copied does take time O(n).
and looping is O(n) as well
Looping over a list of length n takes time O(n). However, in this case, the lists that you're looping over do not have n elements in them. There are 2^n subsets of a set of size n, so at the top-level recursive call, when you recursively generate all subsets of L[:-1], you will end up with a list of 2^(n-1) items. Looping over that list takes time O(2^n).
More generally, when looking at a loop or a list, it's important to ask "how many times does this loop run?" or "how many elements are in this list?"
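A quick way to answer "how many elements are in this list?" for this function is to count them. The sketch below uses the same recursion as the question; the name gen_subsets is just a renamed copy for illustration (with the unused res variable dropped).

```python
def gen_subsets(items):
    """Return all subsets of items, using the recursion from the question."""
    if len(items) == 0:
        return [[]]
    smaller = gen_subsets(items[:-1])   # all subsets without the last element
    extra = items[-1:]                  # one-element list holding the last element
    new = [small + extra for small in smaller]
    return smaller + new

# The number of subsets doubles each time the input grows by one element.
for n in range(6):
    print(n, len(gen_subsets(list(range(n)))), 2 ** n)
```

The two printed counts agree on every row, which is why the loop over smaller, and not the O(n) copy, dominates the running time.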
Is this the right way to go about it?
Or does 0 also have to be in [] brackets?
list = [0 * i for i in range(n)]
If yes then how is it different from writing it like this right away?
list = [0]* n
Use [0] * n, where n is your desired no. of elements in the list.
In the first instance you are using a list comprehension: each value in the range 0 to n-1 is multiplied by zero and then inserted into the list, so every element ends up as zero.
In the second instance, which you had originally written as list = [0 * n], you are inserting a single value into the list: 0 multiplied by n. That leaves the list with length 1.
(now you have adjusted it to match our answers)
ls = [0] * n
You have now rephrased your question as: why is the list comprehension different from the recently amended approach of multiplying the list by n? As mentioned above, the list comprehension multiplies each element by zero before inserting it into the list, whereas the second approach repeats the entire list [0] n times, producing a list of length n filled with zeros.
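Putting the three spellings discussed in this thread side by side (the variable names and n = 5 are chosen only for this illustration):

```python
n = 5

# List comprehension: n elements, each computed as 0 * i.
by_comprehension = [0 * i for i in range(n)]

# List repetition: the one-element list [0] repeated n times.
by_repetition = [0] * n

# A single element whose value is the product 0 * n.
single_product = [0 * n]

print(by_comprehension)  # [0, 0, 0, 0, 0]
print(by_repetition)     # [0, 0, 0, 0, 0]
print(single_product)    # [0]
```

The first two give the same result; [0] * n simply states the intent more directly.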
When I plot the time taken for the following algorithm for different size input, the time complexity appears to be polynomial. I'm not sure which operations account for this.
I'm assuming it's to do with list(s), del l[i] and l[::-1], but I'm not clear what the complexity of these is individually. Can anyone please explain?
Also, is there a way to optimize the algorithm without completely changing the approach? (I know there is a way to bring it down to linear time complexity by using "double-ended pincer-movement".)
def palindrome_index(s):
    for i, c in enumerate(s):
        l = list(s)
        del l[i]
        if l[::-1] == l:
            return i
    return -1
Your algorithm indeed is quadratic in len(s):
In iteration i, you perform operations that are linear in len(s): creating the list, reversing it, and (linear on average) deleting element i. Since you do this len(s) times, the whole algorithm is quadratic in len(s).
I'm assuming it's to do with list(s), del l[i] and l[::-1], but I'm not clear what the complexity of these is individually. Can anyone please explain?
Each of these operations is linear time (at least on average, which is enough to analyze your algorithm). Constructing a list, either from an iterable or by reversing an existing list, is linear in the length of the list. Deleting element i requires about n - i - 1 shifts, because each element after position i is moved one position towards the front.
All of these are linear "O(n)":
list(s)
list(s) creates a new list from s. To do that, it has to go through all elements in s, so its time is proportional to the length of s.
l[::-1]
Just like list(s), l[::-1] creates a new list with the same elements as l, but in different order. It has to touch each element once, so its time is proportional to the length of l.
del l[i]
In order to delete the element at position i, the element that was at position i+1 has to be moved to position i, then the element that was at i+2 has to be moved to position i+1, and so on. So if you delete the first element (del l[0]), it has to move all the other elements of the list, and if you delete the last one (del l[-1]), it just has to remove that element. On average it will move n/2 elements, so it is also linear.
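For reference, the linear-time "pincer" idea mentioned in the question could look roughly like the sketch below. It is not a drop-in replacement: it is a different approach, the function names are made up for this illustration, and when several indices would work it may return a different one than the original function (which returns the first). It walks inwards from both ends and, at the first mismatch, checks whether dropping the left or the right character leaves a palindrome.

```python
def is_palindrome_range(s, lo, hi):
    """Return True if s[lo..hi] (inclusive) reads the same in both directions."""
    while lo < hi:
        if s[lo] != s[hi]:
            return False
        lo += 1
        hi -= 1
    return True

def palindrome_index_linear(s):
    """Return an index whose removal makes s a palindrome, or -1 if none exists."""
    lo, hi = 0, len(s) - 1
    while lo < hi:
        if s[lo] != s[hi]:
            # Only removing one of the two mismatched characters can help.
            if is_palindrome_range(s, lo + 1, hi):
                return lo
            if is_palindrome_range(s, lo, hi - 1):
                return hi
            return -1
        lo += 1
        hi -= 1
    # s is already a palindrome: removing a middle character keeps it one.
    return len(s) // 2 if s else -1

print(palindrome_index_linear("abca"))  # 1, since "aca" is a palindrome
print(palindrome_index_linear("abc"))   # -1
```

Each character is compared a constant number of times and no intermediate lists are built, so the running time is O(len(s)).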
I'm experiencing very slow performance with the algorithm below.
I have a very large list (1,000,000+ elements) containing long strings,
e.g. id_list = ['MYSUPERLARGEID:1123:123123', 'MYSUPERLARGEID:1123:134534389', 'MYSUPERLARGEID:1123:12763']...
num_reads is the maximum number of elements to choose at random from this list.
The idea is to randomly choose string ids from id_list until num_reads is reached, and to add them (I say add rather than append because I don't care about the order of random_id_list) to random_id_list, which is empty at the beginning.
I can't repeat the same id, so I remove it from the original list after it has been randomly chosen. I suspect this is what makes the script so slow, but maybe I'm wrong and another part of this loop is responsible for the slow behavior.
for x in xrange(0, num_reads):
    id_index, id_string = random.choice(list(enumerate(id_list)))
    random_id_list.append(id_string)
    del read_id_list[id_index]
Use random.sample() to produce a sample of N elements with no repeats:
random_id_list = random.sample(read_id_list, num_reads)
Removing elements from the middle of a large list is indeed slow, as everything beyond that index has to be moved up a step.
This does not, of course, remove elements from the original list anymore, so repeated random.sample() calls can still give you samples with elements that have been picked before. If you need to produce samples repeatedly until your list is exhausted, then shuffle once and from there on out take consecutive slices of k elements from the shuffled list:
def random_samples(k):
    random.shuffle(id_list)
    for i in range(0, len(id_list), k):
        yield id_list[i : i + k]
then use this to produce your samples; either in a loop or with next():
sample_gen = random_samples(num_reads)
random_id_list = next(sample_gen)
# some point later
another_random_id_list = next(sample_gen)
Because the list is shuffled entirely randomly, the slices produced this way are also all valid random samples.
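As a self-contained illustration of the shuffle-then-slice idea, here is a small variant. Unlike the generator above, it takes the list as a parameter and shuffles a copy instead of shuffling in place; the names random_chunks and ids, the sample data, and the seeded random.Random instance (used only to make the run repeatable) are assumptions for this sketch.

```python
import random

def random_chunks(items, k, rng=random):
    """Yield consecutive slices of k elements from a shuffled copy of items."""
    shuffled = list(items)        # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    for start in range(0, len(shuffled), k):
        yield shuffled[start:start + k]

ids = ["ID:%d" % i for i in range(10)]
chunks = list(random_chunks(ids, 3, random.Random(0)))

# Every id appears in exactly one chunk; the last chunk may be shorter.
for chunk in chunks:
    print(chunk)
```

Because the chunks partition the shuffled list, no id can be drawn twice across calls, which is the property the original loop tried to get by deleting elements.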
The "hard" way, instead of just shuffling the list, is to evaluate each element of your list in order and select it with a probability that depends on both the number of items you still need to choose and the number of items left to choose from. This is useful if you don't have the entire list presented to you at once (a so-called on-line algorithm).
Let's say you need to select k of N items. That means each item has a k/N probability of being chosen, if you can consider all items at once. However, if you accept the first item, then you only need to select k-1 items from N-1 remaining items. If you reject it, you still need k items from N-1 remaining items. So the algorithm would look like
N = len(id_list)
k = 10 # For example
choices = []
for i in id_list:
    if random.randint(1,N) <= k:
        choices.append(i)
        k -= 1
    N -= 1
Initially, the first item is chosen with the expected probability of k/N. As you go through your list, N steadily decreases, while k decreases as you actually accept items. Note that each item, overall, still has a p = k/N chance of being chosen. As an example, consider the second item in the list. Let pi be the probability that you choose the ith element in the list. p1 is obviously k/N, given the starting values of k and N. Consider p2 for example.
p2 = p1 * (k-1) / (N-1) + (1-p1) * k / (N-1)
   = (p1*k - p1 + k - k*p1) / (N-1)
   = (k - p1) / (N-1)
   = (k - k/N) / (N-1)
   = k/(N-1) - k/(N*(N-1))
   = (k*N - k) / (N*(N-1))
   = k/N
Similar (but longer) analysis holds for p3, p4, etc.
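The claim that every item ends up with probability k/N can also be checked empirically. The sketch below wraps the loop above in a function (selection_sample is a made-up name) and counts how often each item is picked over many runs; the seed, the list of 10 items, k = 3 and the number of trials are arbitrary choices for illustration.

```python
import random
from collections import Counter

def selection_sample(items, k, rng=random):
    """Pick exactly k items in one pass, assuming k <= len(items)."""
    remaining = len(items)   # items not yet examined (N in the text)
    needed = k               # items still to be chosen (k in the text)
    chosen = []
    for item in items:
        # Accept the current item with probability needed / remaining.
        if rng.randint(1, remaining) <= needed:
            chosen.append(item)
            needed -= 1
        remaining -= 1
    return chosen

rng = random.Random(12345)   # fixed seed so the run is repeatable
items = list(range(10))
k = 3
trials = 20000
counts = Counter()
for _ in range(trials):
    counts.update(selection_sample(items, k, rng))

# Each observed frequency should be close to k / N = 0.3.
frequencies = {item: counts[item] / trials for item in items}
print(frequencies)
```

Note that the function always returns exactly k items: once needed equals remaining every item is accepted, and once needed reaches 0 none is.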
I know that similar questions exist, but I'd like to know what is wrong with my code in particular. Thanks in advance!
isum = 0
l = list(range(2, uplim + 1))
while l != []:
    isum += l[0]
    temp = list(range(l[0], uplim + 1, l[0]))
    l = list(set(l) - set(temp))
print(isum)
Explanation: the first loop execution will add 2 (being the first term in the list) to the sum variable and remove all multiples of 2 from the list. 3 will now be the first term in the list and this will be added to isum, followed by all multiples of 3 being removed. 5 will now be the first term (because 4 was removed - being a multiple of 2) etc.
Sets are unordered. The idea behind the code is okay, but converting lists to sets loses the ordering information, and converting back to lists produces a worse than random ordering; you can't make any guarantees about it, even statistically. The first element of l isn't guaranteed to be the lowest, so it isn't guaranteed to be a prime, and everything goes to hell.
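One way to keep the idea but preserve the ordering is to filter the list instead of going through sets, so the first element is always the smallest remaining number. This is only a sketch: it assumes the goal is the sum of all primes up to and including uplim (as the explanation in the question suggests), and the function name sum_primes is made up.

```python
def sum_primes(uplim):
    """Sum the primes in 2..uplim by repeatedly removing multiples."""
    isum = 0
    candidates = list(range(2, uplim + 1))
    while candidates:
        prime = candidates[0]          # smallest remaining number, hence prime
        isum += prime
        # A list comprehension keeps the survivors in ascending order.
        candidates = [c for c in candidates if c % prime != 0]
    return isum

print(sum_primes(10))   # 2 + 3 + 5 + 7 = 17
```

Staying closer to the original code, replacing the last line of the loop with l = sorted(set(l) - set(temp)) would also restore the ordering, at the cost of a sort on every iteration.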