Can the Efficiency of this Algorithm be Linear? - python

My textbook says that the following algorithm has an efficiency of O(n):
list = [5,8,4,5]
def check_for_duplicates(list):
    dups = []
    for i in range(len(list)):
        if list[i] not in dups:
            dups.append(list[i])
        else:
            return True
    return False
But why? I ask because the in operation has an efficiency of O(n) as well (according to this resource). If we take list as an example the program needs to iterate 4 times over the list. But with each iteration, dups keeps growing faster. So for the first iteration over list, dups does not have any elements, but for the second iteration it has one element, for the third two elements and for the fourth three elements. Wouldn't that make 1 + 2 + 3 = 6 extra iterations for the in operation on top of the list iterations? But if this is true then wouldn't this alter the efficiency significantly, as the sum of the extra iterations grows faster with every iteration?

You are correct that the runtime of the code that you've posted here is O(n^2), not O(n), for precisely the reason that you've indicated.
Conceptually, the algorithm you're implementing goes like this:
Maintain a collection of all the items seen so far.
For each item in the list:
    If that item is in the collection, report a duplicate exists.
    Otherwise, add it to the collection.
Report that there are no duplicates.
The reason the code you've posted here is slow is because the cost of checking whether a duplicate exists is O(n) when using a list to track the items seen so far. In fact, if you're using a list of the existing elements, what you're doing is essentially equivalent to just checking the previous elements of the array to see if any of them are equal!
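To make the quadratic growth concrete, here is a rough sketch that counts the element comparisons a list-based membership check performs. The function name count_comparisons is made up for illustration, and the explicit inner loop only approximates what the in operator does on a list.

```python
def count_comparisons(items):
    """Count equality checks done by a list-based duplicate search."""
    seen = []
    comparisons = 0
    for item in items:
        # Linear scan over everything seen so far, like `item in seen`.
        for prev in seen:
            comparisons += 1
            if prev == item:
                return comparisons
        seen.append(item)
    return comparisons

# Worst case (no duplicates): 0 + 1 + ... + (n - 1) = n * (n - 1) / 2
print(count_comparisons(list(range(4))))    # 6
print(count_comparisons(list(range(100))))  # 4950
```

For four distinct elements this gives the 1 + 2 + 3 = 6 extra steps from the question, and the total grows as n(n - 1)/2.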
You can speed this up by switching your implementation so that you use a set to track prior elements rather than a list. Sets have (expected) O(1) lookups and insertions, so this will make your code run in (expected) O(n) time.
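A minimal sketch of the set-based version, assuming the elements are hashable. The names has_duplicates and items are illustrative and not from the textbook.

```python
def has_duplicates(items):
    """Return True if any value occurs more than once in items."""
    seen = set()
    for item in items:
        if item in seen:   # expected O(1) membership test
            return True
        seen.add(item)     # expected O(1) insertion
    return False

print(has_duplicates([5, 8, 4, 5]))  # True
print(has_duplicates([5, 8, 4]))     # False
```

Each element is handled once with expected constant-time work, so the whole check is expected O(n).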

Related

I am confused about the time complexity of my algorithm

I have designed an algorithm but confused whether the time complexity is theta(n) or theta (n^2).
def prefix_soms(number):
    Array_A=[1,2,3,4,5,6]
    Array_B=[]
    soms=0
    for i in range(0,number):
        soms=soms+Array_A[i]
        Array_B.insert(i,soms)
    return Array_B[number-1]
I know the for loop is running n times so that's O(n).
Is the inside operations O(1)?
For arbitrary large numbers, it is not, since adding two huge numbers takes logarithmic time in the value of these numbers. If we assume that the sum will not run out of control, then we can say that it runs in O(n). The .insert(…) is basically just an .append(…). The amortized cost of appending n items is O(n).
We can however improve the readability and memory usage by writing this as:
def prefix_soms(number):
    Array_A=[1,2,3,4,5,6]
    soms=0
    for i in range(0,number):
        soms += Array_A[i]
    return soms
or:
def prefix_soms(number):
    Array_A=[1,2,3,4,5,6]
    return sum(Array_A[:number])
or we can avoid creating a copy of the list by using islice(..):
from itertools import islice
def prefix_soms(number):
    Array_A=[1,2,3,4,5,6]
    return sum(islice(Array_A, number))
We thus do not need to use another list, since we are only interested in the last item.
Given that the insert method doesn't shift your array (that is, in your algorithm it solely appends one element to the end of the list), its time complexity is O(1). Moreover, accessing an element by index takes O(1) time as well.
The loop runs number times, and each iteration performs a constant number of O(1) operations: O(number) * O(1) = O(number).
The complexity of list.insert is O(n), as shown on this wiki page. You can check the blist library which provides an optimized list type that has an O(log n) insert, but in your algorithm I think that the item is always placed at the end of the Array_B, so it is just an append which takes constant amortized time (You can replace the insert with append to make the code more elegant).
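If the full list of running sums is wanted rather than just the last one, the append-based variant suggested above might look like the sketch below. The name prefix_sums and the values parameter are assumptions added for illustration; the original function hard-codes its input list.

```python
def prefix_sums(values, number):
    """Return the running sums of the first `number` values."""
    running = 0
    sums = []
    for i in range(number):
        running += values[i]
        sums.append(running)  # amortized O(1); nothing has to be shifted
    return sums

print(prefix_sums([1, 2, 3, 4, 5, 6], 4))  # [1, 3, 6, 10]
```

With append the loop does amortized constant work per iteration, so the function is O(number) overall.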

O(n) complexity algorithm to remove instances of value from unsorted list without remove() method

I have a homework question to write a function which is part of a class bagOfWords to remove instances of a value from an unsorted list. The list operations we can use don't include remove(). We need to have only O(n) complexity and the naive algorithm doesn't perform that well.
I tried a naive algorithm. This is too complex an algorithm. It uses list.pop(index) which of itself has O(n) complexity and it has two loops. Since we are not allowed to use list.remove() and because a list comprehension would have the same complexity but in a more succinct syntax, I'm trying to find a better implementation.
I thought maybe the solution was a quicksort algorithm because I might be able to do this with O(n) complexity if I first sort the list. But how would I then remove this item without the complexity of pop(index)? Now I'm wondering if searching for the pattern via a KMP algorithm would be the solution, or hashing.
def remove(self, item):
    """Remove all copies of item from the bag.
    Do nothing if the item doesn't occur in the bag.
    """
    index = 0
    while index < len(self.items):
        if self.items[index] == item:
            self.items.pop(index)
        else:
            index += 1
The complexity is quadratic. However, I want a complexity that is O(n)
Edit: to clarify, we are actually constrained to modifying an existing list.
Edit: The simplest (and arguably just "correct") way to do this is to use a list comprehension:
self.items = [x for x in self.items if x != item]
It's O(n), and it's faster than the below options. It's also by far the most "pythonic".
However, it does create a new copy of the list. If you are actually constrained to modifying an existing list, here's my original answer:
Here's an "in-place" O(n) algorithm that uses two pointers to collapse the list down, removing the unwanted elements:
ixDst = 0
for ixSrc in range(0, len(items)):
    if items[ixSrc] != item:
        items[ixDst] = items[ixSrc]
        ixDst += 1
del items[ixDst:]
(See it run here)
The only questionable part is resizing the list down with del. I believe that's in-place and "should" be O(1), since the slice we're removing is at the end of the list.
Also, a more pythonic in-place answer (and a bit faster) was suggested by @chepner in the comments:
self.items[:] = (x for x in self.items if x != item)
Thanks @juanpa.arrivillaga and @chepner for the discussion.
If the elements of your list are relatively small integers, or can be represented as such, you can sort them in O(max(maxValue, n)) time.
The other way is to have pointers to previous and next elements for every element in the list. With that you can delete one element in O(1). However, this makes the operation of getting an item by index run in O(n) time.
Also, if the order of items does not matter, you can store pairs like (item, count), where count is the number of times the item appears. Then you only need to delete one such pair for a given item, which gives the desired complexity.
Hope it helps!
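The (item, count) idea can be sketched with collections.Counter, assuming the items are hashable and their order does not matter. The class name CountingBag is hypothetical, and this replaces the list storage rather than modifying an existing list, so it may not fit the homework constraints.

```python
from collections import Counter

class CountingBag:
    """Hypothetical bag that stores item -> count instead of a flat list."""

    def __init__(self, items=()):
        self.counts = Counter(items)

    def add(self, item):
        self.counts[item] += 1

    def remove(self, item):
        """Remove all copies of item; do nothing if it is absent."""
        self.counts.pop(item, None)  # expected O(1) dictionary deletion

    def __len__(self):
        return sum(self.counts.values())

bag = CountingBag(["a", "b", "a", "c"])
bag.remove("a")
print(sorted(bag.counts.elements()))  # ['b', 'c']
```

Removing every copy of an item is then a single dictionary deletion, regardless of how many copies the bag holds.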
If you need it to perform an in-place deletion, you could shift the items over the ones that are deleted and truncate the list in one operation at the end:
index = 0
for value in self.items:
    if value != item:
        self.items[index] = value
        index += 1
del self.items[index:]
If you can afford to create a new list (without a list comprehension), you could do it like this:
cleanedItems = []
for value in self.items:
    if value != item:
        cleanedItems.append(value)
self.items = cleanedItems

What time complexity is this function?

I have been given a small and simple function to refactor into a function that has O(n) complexity. However, I believe the given function already does, unless I am missing something?
Basically the idea of the function is simply to iterate over a list and remove the target item.
for i in self.items:
    if i == item:
        self.items.pop(i)
I know the for loop gives this a O(n) complexity but does the additional if statement add to the complexity? I didn't think it did in the worst-case for this simple piece of code.
If it does, is there a way this can be re-written to be O(n)?
I cannot think of another way to iterate over a list and remove an item without using a For loop and then using an if statement to do the comparison?
PS. self.items is a list of words
The list.pop method has an average and worst time complexity of O(n), so compounded with the loop, it makes the code O(n^2) in time complexity.
And as @juanpa.arrivillaga has already pointed out in the comments, you can use a list comprehension instead to filter out items of a specific value in O(n) time complexity:
self.items = [i for i in self.items if i != item]
for i in self.items: # grows with the cardinal n of self.items
    if i == item:
        self.items.pop(i) # grows with the cardinal n of self.items
So you have a complexity of O(n²).
The list method remove(item) in Python, though, has O(n) complexity, so you'd prefer to use it. Note that it removes only the first occurrence of item.
self.items.remove(item)
Your current solution has the time complexity of O(n^2).
For an O(n) solution you can, for example, use a list comprehension to filter all the unwanted elements out of the list:
self.items = [i for i in self.items if i != item]
First, you'd have to slightly adjust your code. The argument given to pop() is currently an item, where it should be an index (use remove() to remove the first occurrence of an item).
for i, item in enumerate(self.items):
    if item == target:
        self.items.pop(i)
The complexity depends on how often item matches the element in your list.
Using n=len(items) and k for the number of matches, the complexity is O(n, k) = O(n) + k O(n). The first term comes from the fact that we iterate through the entire list, the second one corresponds to the individual .pop(i) operations.
The total complexity for k=1 is thus simply O(n), and could go up to O(n*n) for k=n.
Note that the complexity for .pop(i) is O(n-i). Hence, popping the first and last element is O(n) and O(1), respectively.
Lastly, I'd generally recommend not to add/remove items to an object that you're currently iterating over.
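A small sketch of why mutating a list while iterating over it is risky. The function names are made up for illustration; the first one repeats the enumerate-and-pop pattern from above and silently skips an element when two matches are adjacent.

```python
def remove_while_iterating(items, target):
    """Buggy on purpose: pops from the list it is iterating over."""
    for i, value in enumerate(items):
        if value == target:
            items.pop(i)  # later elements shift left, so the next one is skipped
    return items

def remove_by_filtering(items, target):
    """Builds a new list instead of mutating during iteration."""
    return [value for value in items if value != target]

print(remove_while_iterating(["a", "b", "b", "c"], "b"))  # ['a', 'b', 'c']
print(remove_by_filtering(["a", "b", "b", "c"], "b"))     # ['a', 'c']
```

After the first pop, the second "b" moves into the index the loop has already visited, so it is never examined.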

Python: Elegant way to store items for checking item existence in a container

In the situation I encounter, I would like to define "elegant" as 1) having constant O(1) time complexity for checking if an item exists and 2) storing only the items, nothing more.
For example, if I use a list
num_list = []
for num in range(10): # Dummy operation to fill the container.
    num_list.append(num)
if 1 in num_list:
    print("Number exists!")
The operation "in" will take O(n) time according to [Link]
In order to achieve constant checking time, I may employ a dictionary
num_dict = {}
for num in range(10): # Dummy operation to fill the container.
    num_dict[num] = True
if 1 in num_dict:
    print("Number exists!")
In the case of a dictionary, the operation "in" costs O(1) time according to [Link], but additional O(n) storage is required to store dummy values. Therefore, both implementations/containers seem inelegant.
What would be a better implementation/container to achieve constant O(1) time for checking if an item exists while only storing the items? How to keep resource requirement to the bare minimum?
The solution here is to use a set, which doesn't require you to save a dummy value for each item.
Normally you can't optimise both space and time together. One thing you can do is find out more about the range of the data (here, the min to max value of num) and the size of the data (here, the number of times the loop runs, i.e. 10). Then you have two options:
If the range is limited, go for the dictionary method (or even an array-index method).
If the size is limited, go for the list method.
If you choose the right method, you will probably achieve constant time and space for a large sample.
EDIT:
Set
It is a hash table, implemented very similarly to the Python dict, with some optimizations that take advantage of the fact that the values are always null (in a set, we only care about the keys). Set operations do require iteration over at least one of the operand tables (both in the case of union). Iteration isn't any cheaper than for any other collection (O(n)), but membership testing is O(1) on average.
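For comparison with the list and dictionary snippets in the question, the set version might look like this. It is a minimal sketch: the name num_set is chosen to mirror the question's naming and is not from the original post.

```python
num_set = set()
for num in range(10):  # Dummy operation to fill the container.
    num_set.add(num)

if 1 in num_set:       # average O(1) membership test, no dummy values stored
    print("Number exists!")
```

Only the items themselves are stored, and the membership test does not scan the container.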

Complexity of enumerate

I see a lot of questions about the run-time complexity of python's built in methods, and there are a lot of answers for a lot of the methods (e.g. https://wiki.python.org/moin/TimeComplexity , https://www.ics.uci.edu/~pattis/ICS-33/lectures/complexitypython.txt , Cost of len() function , etc.)
What I don't see is anything that addresses enumerate. I know it returns at least one new array (the indexes), but how long does it take to generate that, and is the other array just the original array?
In other words, I'm assuming it's O(n) for creating a new array (iteration) and O(1) for the reuse of the original array... O(n) in total (I think). Is there another O(n) for the copy, making it O(n^2), or something else...?
The enumerate-function returns an iterator. The concept of an iterator is described here.
Basically this means that the iterator is initialized pointing to the first item of the list and then returns the next element of the list every time next() is called on it.
So the complexity should be:
Initialization: O(1)
Returning the next element: O(1)
Returning all elements: n * O(1)
Please note that enumerate does NOT create a new data structure (list of tuples or something like that)! It is just iterating over the existing list, keeping the element index in mind.
You can try this out by yourself:
# First, create a list containing a lot of entries:
# (O(n) - executing this line should take some noticeable time)
a = [str(i) for i in range(10000000)] # a = ["0", "1", ..., "9999999"]
# Then call the enumerate function for a.
# (O(1) - executes very fast because that's just the initialization of the iterator.)
b = enumerate(a)
# use the iterator
# (O(n) - retrieving the next element is O(1) and there are n elements in the list.)
for i in b:
    pass # do nothing
Assuming the naïve approach (enumerate duplicates the array, then iterates over it), you have O(n) time for duplicating the array, then O(n) time for iterating over it. If that was just n instead of O(n), you would have 2 * n time total, but that's not how O(n) works; all you know is that the amount of time it takes will be some multiple of n. That's (basically) what O(n) means anyway, so in any case, the enumerate function is O(n) time total.
As martineau pointed out, enumerate() does not make a copy of the array. Instead it returns an object which you use to iterate over the array. The call to enumerate() itself is O(1).
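One way to see that enumerate() does not copy the list is to change the list after creating the enumerate object and check what the iterator yields. This is a small sketch with made-up variable names.

```python
a = ["x", "y", "z"]
e = enumerate(a)      # O(1): no (index, value) pairs are built yet

print(iter(e) is e)   # True: an enumerate object is its own iterator
a[0] = "changed"      # mutate the list after creating the enumerate object
print(next(e))        # (0, 'changed'): the value is read lazily from `a`
print(list(e))        # [(1, 'y'), (2, 'z')]
```

If enumerate had copied the list up front, the first pair would still contain "x".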
