I see a lot of questions about the run-time complexity of Python's built-in methods, and there are a lot of answers for many of them (e.g. https://wiki.python.org/moin/TimeComplexity , https://www.ics.uci.edu/~pattis/ICS-33/lectures/complexitypython.txt , Cost of len() function , etc.)
What I don't see is anything that addresses enumerate. I know it returns at least one new array (the indexes), but how long does it take to generate that, and is the other array just the original array?
In other words, I'm assuming it's O(n) for creating a new array (iteration) and O(1) for the reuse of the original array... O(n) in total (I think). Is there another O(n) for the copy, making it O(n^2), or something else...?
The enumerate function returns an iterator. The concept of an iterator is described here.
Basically, this means that the iterator gets initialized pointing to the first item of the list and then returns the next element of the list every time its next() method gets called.
So the complexity should be:
Initialization: O(1)
Returning the next element: O(1)
Returning all elements: n * O(1)
Please note that enumerate does NOT create a new data structure (a list of tuples or anything like that)! It just iterates over the existing list, keeping track of the element index as it goes.
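For intuition, here is a rough pure-Python sketch of that behaviour (the real built-in is implemented in C, but the laziness is the same):
def my_enumerate(iterable, start=0):
    # Yields (index, element) pairs one at a time; no list of tuples is built.
    index = start
    for element in iterable:
        yield index, element
        index += 1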
You can try this out by yourself:
# First, create a list containing a lot of entries:
# (O(n) - executing this line should take some noticeable time)
a = [str(i) for i in range(10000000)] # a = ["0", "1", ..., "9999999"]
# Then call the enumeration function for a.
# (O(1) - executes very fast because that's just the initialization of the iterator.)
b = enumerate(a)
# use the iterator
# (O(n) - retrieving the next element is O(1) and there are n elements in the list.)
for i in b:
    pass  # do nothing
Assuming the naïve approach (enumerate duplicates the array, then iterates over it), you have O(n) time for duplicating the array and O(n) time for iterating over it. If that were just n instead of O(n), you would have 2 * n time total, but that's not how O(n) works; all you know is that the amount of time it takes will be some multiple of n. That's (basically) what O(n) means anyway, so either way the enumerate function is O(n) time in total.
As martineau pointed out, enumerate() does not make a copy of the array. Instead it returns an object which you use to iterate over the array. The call to enumerate() itself is O(1).
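A quick way to see this interactively (a small illustrative snippet):
a = list(range(10000000))
e = enumerate(a)    # returns immediately; nothing is copied
print(type(e))      # <class 'enumerate'>
print(next(e))      # (0, 0) -- pairs are produced lazily, one per next() call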
Related
My textbook says that the following algorithm has an efficiency of O(n):
list = [5,8,4,5]

def check_for_duplicates(list):
    dups = []
    for i in range(len(list)):
        if list[i] not in dups:
            dups.append(list[i])
        else:
            return True
    return False
But why? I ask because the in operation has an efficiency of O(n) as well (according to this resource). If we take list as an example the program needs to iterate 4 times over the list. But with each iteration, dups keeps growing faster. So for the first iteration over list, dups does not have any elements, but for the second iteration it has one element, for the third two elements and for the fourth three elements. Wouldn't that make 1 + 2 + 3 = 6 extra iterations for the in operation on top of the list iterations? But if this is true then wouldn't this alter the efficiency significantly, as the sum of the extra iterations grows faster with every iteration?
You are correct that the runtime of the code that you've posted here is O(n²), not O(n), for precisely the reason that you've indicated.
Conceptually, the algorithm you're implementing goes like this:
Maintain a collection of all the items seen so far.
For each item in the list:
If that item is in the collection, report a duplicate exists.
Otherwise, add it to the collection.
Report that there are no duplicates.
The reason the code you've posted here is slow is because the cost of checking whether a duplicate exists is O(n) when using a list to track the items seen so far. In fact, if you're using a list of the existing elements, what you're doing is essentially equivalent to just checking the previous elements of the array to see if any of them are equal!
You can speed this up by switching your implementation so that you use a set to track prior elements rather than a list. Sets have (expected) O(1) lookups and insertions, so this will make your code run in (expected) O(n) time overall.
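A minimal sketch of that set-based version (keeping the shape of the original function):
def check_for_duplicates(items):
    seen = set()
    for x in items:
        if x in seen:        # expected O(1) membership test
            return True      # duplicate found
        seen.add(x)          # expected O(1) insertion
    return False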
I have designed an algorithm but am confused whether the time complexity is Θ(n) or Θ(n²).
def prefix_soms(number):
    Array_A = [1, 2, 3, 4, 5, 6]
    Array_B = []
    soms = 0
    for i in range(0, number):
        soms = soms + Array_A[i]
        Array_B.insert(i, soms)
    return Array_B[number-1]
I know the for loop is running n times so that's O(n).
Are the operations inside the loop O(1)?
For arbitrarily large numbers it is not, since adding two huge numbers takes time logarithmic in the value of those numbers. If we assume that the sum will not run out of control, then we can say that it runs in O(n). The .insert(…) here is basically just an .append(…), and the amortized cost of appending n items is O(n).
We can, however, improve the readability and memory usage by writing this as:
def prefix_soms(number):
    Array_A = [1, 2, 3, 4, 5, 6]
    soms = 0
    for i in range(0, number):
        soms += Array_A[i]
    return soms
or:
def prefix_soms(number):
    Array_A = [1, 2, 3, 4, 5, 6]
    return sum(Array_A[:number])
or we can omit creating a copy of the list, by using islice(..):
from itertools import islice

def prefix_soms(number):
    Array_A = [1, 2, 3, 4, 5, 6]
    return sum(islice(Array_A, number))
We thus do not need to use another list, since we are only interested in the last item.
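As an aside, if you ever need all of the prefix sums rather than just the last one, itertools.accumulate produces them in O(n) total (a small illustration, not part of the original code):
from itertools import accumulate, islice

Array_A = [1, 2, 3, 4, 5, 6]

prefix = list(accumulate(Array_A))    # [1, 3, 6, 10, 15, 21], O(n) total
number = 4
last = sum(islice(Array_A, number))   # 10, same as prefix[number - 1]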
Given that the insert method doesn't shift your array here - in your algorithm it only ever appends one element to the end of the list - its time complexity is O(1). Moreover, accessing an element by index takes O(1) time as well.
You run the loop number times, with a few O(1) operations inside, so the total is O(number) * O(1) = O(number).
The complexity of list.insert is O(n), as shown on this wiki page. You can check the blist library which provides an optimized list type that has an O(log n) insert, but in your algorithm I think that the item is always placed at the end of the Array_B, so it is just an append which takes constant amortized time (You can replace the insert with append to make the code more elegant).
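For illustration, the same function rewritten with append (a minimal sketch; like the original, it assumes number is at most len(Array_A)):
def prefix_soms(number):
    Array_A = [1, 2, 3, 4, 5, 6]
    Array_B = []
    soms = 0
    for i in range(number):
        soms += Array_A[i]
        Array_B.append(soms)    # append at the end: amortized O(1)
    return Array_B[number - 1]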
I have been given a small and simple function to refactor into a function that is O(n) complexity. However, I believe the function given already is, unless I am missing something?
Basically the idea of the function is simply to iterate over a list and remove the target item.
for i in self.items:
    if i == item:
        self.items.pop(i)
I know the for loop gives this an O(n) complexity, but does the additional if statement add to the complexity? I didn't think it did in the worst case for this simple piece of code.
If it does, is there a way this can be re-written to be O(n)?
I cannot think of another way to iterate over a list and remove an item without using a For loop and then using an if statement to do the comparison?
PS. self.items is a list of words
The list.pop method has an average and worst time complexity of O(n), so compounded with the loop, it makes the code O(n^2) in time complexity.
And as #juanpa.arrivillaga has already pointed out in the comments, you can use a list comprehension instead to filter out items of a specific value in O(n) time complexity:
self.items = [i for i in self.items if i != item]
for i in self.items:            # grows with the cardinal n of self.items
    if i == item:
        self.items.pop(i)       # grows with the cardinal n of self.items
So you have a complexity of O(n²).
The list method remove(item) in Python, though, has complexity O(n), so you'd prefer to use it.
self.items.remove(item)
Your current solution has the time complexity of O(n^2).
For an O(n) solution you can just use for example an list comprehension to filter all the non wanted elements from the list:
self.items = [i for i in self.items if i != item]
First, you'd have to slightly adjust your code. The argument given to pop() is currently an item, where it should be an index (use remove() to remove the first occurrence of an item).
for i, item in enumerate(self.items):
    if item == target:
        self.items.pop(i)
The complexity depends on how often item matches the element in your list.
Using n = len(items) and k for the number of matches, the complexity is O(n) + k·O(n). The first term comes from the fact that we iterate through the entire list; the second corresponds to the individual .pop(i) operations.
The total complexity for k=1 is thus simply O(n), and can go up to O(n²) for k=n.
Note that the complexity for .pop(i) is O(n-i). Hence, popping the first and last element is O(n) and O(1), respectively.
Lastly, I'd generally recommend not adding items to or removing items from an object that you're currently iterating over.
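For illustration, two safe patterns (the names items and target are just placeholders):
items = ["a", "b", "a", "c"]
target = "a"

# Pattern 1: build a new list in O(n); removes every occurrence.
items = [x for x in items if x != target]

# Pattern 2: remove only the first occurrence, also O(n).
items2 = ["a", "b", "a", "c"]
if target in items2:
    items2.remove(target)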
In Python which is better in terms of performance:
1)
for i in range(len(a[:-1])):
    foo()
or
2)
for i in range(len(a)-1):
    foo()
UPDATE:
Some context on why I'm looping over indices (non-idiomatic?):
for c in reversed(range(len(self._N)-1)):
    D[c] = np.dot(self._W[c], D[c-1])*A[c]*(1-A[c])
The second one is better, two reasons:
The first one creates a new list a[:-1], which takes unnecessary time and memory; the second one doesn't create a new list.
The second one is more intuitive and clear.
[:] returns a shallow copy of a list. That means every slice expression returns a new list with its own address in memory, but its elements are the same objects (same addresses) as the elements of the source list.
range(0, len(a) - 1) should be the more performant of the two options. The other makes an unnecessary copy of the sequence (minus one element). For looping like this, you might even get a performance boost by using xrange on Python 2.x, as that avoids materializing a list from the range call.
Of course, you usually don't need to loop over indices in python -- There's usually a better way.
The expression a[:-1] creates a new list for which you take the length and discard the list. This is O(n) time and space complexity, compared to O(1) time and space complexity for the second example.
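In other words (a minimal side-by-side illustration):
a = list(range(1000000))

# Builds a throwaway copy of ~a million elements just to take its length.
for i in range(len(a[:-1])):
    pass

# No copy at all; just an integer subtraction.
for i in range(len(a) - 1):
    pass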
neither is the right way
for i, item in enumerate(a[:-1]):
    ...
is how you should probably do it ...
in general in python you should never do
for i in range(len(a_list)): ...
here are the timing differences
>>> timeit.timeit("for i in range(len(a)-1):x=i","a=range(1000000)",number=100)
3.30806345671283
>>> timeit.timeit("for i in enumerate(a[:-1]):x=i","a=range(1000000)",number=100
5.3319918613661201
as you can see, it takes an extra 2 seconds to do it 100 times on a large list of integers ... but it's still a better idea imho
In the following trivial examples there are two functions that sort a list of random numbers. The first method passes sorted a generator expression, the second method creates a list first:
import random

l = [int(1000*random.random()) for i in xrange(10*6)]

def sort_with_generator():
    return sorted(a for a in l)

def sort_with_list():
    return sorted([a for a in l])
Benchmarking with line profiler indicates that the second option (sort_with_list) is about twice as fast as the generator expression.
Can anyone explain what's happening, and why the first method is so much slower than the second?
Your first example is a generator expression that iterates over a list. Your second example is a list comprehension that iterates over a list. Indeed, the second example is slightly faster.
>>> from timeit import timeit
>>> timeit("sorted(a for a in l)", setup="import random;l = [int(1000*random.random()) for i in xrange(10*6)]")
5.963912010192871
>>> timeit("sorted([a for a in l])", setup="import random;l = [int(1000*random.random()) for i in xrange(10*6)]")
5.021576881408691
The reason for this is undoubtedly that making a list is done in one go, while iterating over a generator requires function calls.
Generators are not meant to speed up iteration over small lists like this (you have 60 elements in the list; that's very small). They are primarily for saving memory when working with long sequences.
If you look at the source for sorted, any sequence you pass in gets copied into a new list first.
newlist = PySequence_List(seq);
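You can see the same thing from the Python side: sorted() accepts any iterable and always builds a new list from it first (a small illustration):
g = (x for x in [3, 1, 2])
result = sorted(g)     # the generator is consumed into a new list internally
print(result)          # [1, 2, 3]
print(list(g))         # [] -- the generator is now exhausted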
generator --> list appears to be slower than list --> list.
>>> timeit.timeit('x = list(l)', setup = 'l = xrange(1000)')
16.656711101531982
>>> timeit.timeit('x = list(l)', setup = 'l = range(1000)')
4.525658845901489
As to why a copy must be made, think about how sorting works. Sorts aren't linear algorithms. We move through the data multiple times, sometimes traversing data in both directions. A generator is intended for producing a sequence through which we iterate once and only once, from start to somewhere after it. A list allows for random access.
On the other hand, creating a list from a generator means only one list in memory, while making a copy of a list means two lists in memory. Good ol' fashioned space-time tradeoff.
Python uses Timsort, a hybrid of merge sort and insertion sort.
A list comprehension loads all the data into memory first; any subsequent operation then works on the resulting list. Call the allocation time for the second case T2.
A generator expression does not allocate everything at once; it produces one value at a time, each step taking t1[i]. The sum of all t1[i] is T1, and T1 ≈ T2.
But when you call sorted() in the first case, T1 is increased by the cost of producing each value as it is pulled from the generator during the sort (tx1[i]). As a result, T1 grows by the sum of all tx1[i].
Therefore, T2 < T1 + sum(tx1[i]).