I know the time complexity of checking x in set is O(1), but what about x not in set? Would that still be O(1), since a set is similar to a dictionary?
x not in some_set just negates the result of x in some_set, so it has the same time complexity. This is the case for any object, set or not. You can take a look at the place where the CPython implementation does res = !res; if you want.
For more information on the time complexities of Python data structures, see https://wiki.python.org/moin/TimeComplexity.
That page shows that x in s is O(1) on average and O(n) in the worst case. So, as user2357112 pointed out, x not in s is equivalent to not (x in s), which just negates the result of x in s and therefore has the same time complexity.
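To see the negation mechanism for yourself, here is a minimal sketch (the class name is purely illustrative): both in and not in dispatch to the same __contains__ hook, and not in simply inverts its result.

class LoggingContainer:
    # Both `in` and `not in` call this same hook; `not in` negates the result.
    def __contains__(self, item):
        print("__contains__ called with", item)
        return item == 42

c = LoggingContainer()
print(42 in c)       # __contains__ called with 42 -> True
print(42 not in c)   # __contains__ called with 42 -> False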
I have two methods to count occurrences of an element: one uses the built-in method count and the other uses a loop.
The time complexity of the second method is O(n), but I am not sure about the built-in method.
Does count take O(1) or O(n) time? Please also tell me about other built-in methods like reverse, index, etc. Using count:
List1 = [10,4,5,10,6,4,10]
print(List1.count(10))
Using a loop:
List2 = [10,4,5,10,6,4,10]
count = 0
for ele in List2:
    if ele == 10:
        count += 1
print(count)
As per the documentation:
list.count(x) - Return the number of times x appears in the list.
Now think about it: if you have 10 cups over some coloured balls, can you be 100% certain about the number of red balls under the cups before you check under all of the cups?
Hint: No
Therefore, list.count(x) has to check the entire list. As the list has size n, list.count(x) has to be O(n).
EDIT: For the pedantic readers out there, of course there could be an implementation of lists that stores the count of every item. This would lead to an increase in memory usage but would provide the O(1) for list.count(x).
EDIT2: You can have a look at the implementation of list.count here. You will see the for loop that runs exactly n times, definitively answering your question: built-in methods do not necessarily take O(1) time, and list.count(x) is an example of a built-in method that is O(n).
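For reference, here is a rough Python sketch of what that C loop does (a pure-Python equivalent, not the actual implementation): a single pass comparing each of the n elements against the value.

def list_count(lst, value):
    # One pass over all n elements, comparing each to the target value,
    # mirroring the loop in CPython's C implementation of list.count.
    n = 0
    for item in lst:
        if item == value:
            n += 1
    return n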
The built-in count() method in Python also has a time complexity of O(n).
The time complexity of the count(value) method is O(n) for a list with n elements. The standard Python implementation, CPython, "touches" every element of the original list to check whether it is equal to the value, so the time complexity is linear in the number of list elements.
It's an easy thing to see for yourself.
>>> import timeit
>>> timeit.timeit('x.count(10)', 'x=list(range(100))', number=1000)
0.007884800899773836
>>> timeit.timeit('x.count(10)', 'x=list(range(1000))', number=1000)
0.03378760418854654
>>> timeit.timeit('x.count(10)', 'x=list(range(10000))', number=1000)
0.2234031839761883
>>> timeit.timeit('x.count(10)', 'x=list(range(100000))', number=1000)
2.1812812101561576
The growth looks slightly better than O(n) at the small sizes, where fixed call overhead dominates, but it is definitely much closer to O(n) than to O(1).
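One way to read those numbers: since each list is 10x larger than the previous one, the ratio between successive timings should approach 10 if the cost is linear. A quick sketch using the figures above:

times = [0.00788, 0.03379, 0.22340, 2.18128]  # the timings measured above
ratios = [b / a for a, b in zip(times, times[1:])]
print(ratios)  # roughly [4.3, 6.6, 9.8] -> trending toward 10, i.e. O(n)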
I have designed an algorithm but I am confused about whether the time complexity is Θ(n) or Θ(n²).
def prefix_soms(number):
    Array_A = [1, 2, 3, 4, 5, 6]
    Array_B = []
    soms = 0
    for i in range(0, number):
        soms = soms + Array_A[i]
        Array_B.insert(i, soms)
    return Array_B[number - 1]
I know the for loop runs n times, so that's O(n).
Are the operations inside the loop O(1)?
For arbitrarily large numbers it is not, since adding two huge numbers takes time logarithmic in the value of those numbers. If we assume the sum does not grow out of control, then we can say the function runs in O(n). The .insert(…) here is effectively an .append(…), because i is always the current end of Array_B, and the amortized cost of appending n items is O(n).
We can however improve the readability, and the memory usage, by writing this as:
def prefix_soms(number):
    Array_A = [1, 2, 3, 4, 5, 6]
    soms = 0
    for i in range(0, number):
        soms += Array_A[i]
    return soms
or:
def prefix_soms(number):
    Array_A = [1, 2, 3, 4, 5, 6]
    return sum(Array_A[:number])
or we can omit creating a copy of the list, by using islice(..):
from itertools import islice

def prefix_soms(number):
    Array_A = [1, 2, 3, 4, 5, 6]
    return sum(islice(Array_A, number))
We thus do not need to use another list, since we are only interested in the last item.
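As a quick sanity check (the expected value is easy to verify by hand), all three variants return the same prefix sum:

print(prefix_soms(4))  # 1 + 2 + 3 + 4 = 10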
Given that the insert method doesn't shift your array (in your algorithm it only ever appends one element to the end of the list), its time complexity is O(1). Moreover, accessing an element by index also takes O(1) time.
You run the loop number times, doing a few O(1) operations per iteration: O(number) * O(1) = O(number).
The complexity of list.insert is O(n), as shown on this wiki page. You can check the blist library, which provides an optimized list type with O(log n) insert, but in your algorithm the item is always placed at the end of Array_B, so it is just an append, which takes constant amortized time (you can replace the insert with append to make the code more elegant).
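A quick, non-scientific way to confirm that inserting at the end behaves like appending (both amortized O(1) per operation; exact timings will vary by machine):

import timeit
print(timeit.timeit('lst.insert(len(lst), 0)', 'lst = []', number=100000))
print(timeit.timeit('lst.append(0)', 'lst = []', number=100000))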
Imagine you want to find all the duplicates in an array, and you must do this in O(1) space and O(N) time.
An algorithm like this would have O(N) space:
def find_duplicates(arr):
    seen = set()
    res = []
    for i in arr:
        if i in seen:
            res.append(i)
        seen.add(i)
    return res
My question is would the following algorithm use O(1) space or O(N) space:
def find_duplicates(arr):
    seen = set()
    res = []
    while arr:
        i = arr.pop()
        if i in seen:
            res.append(i)
        seen.add(i)
    return res
Technically arr gets smaller and the sum of |seen| and |arr| will always be less than the original |arr|, but at the end of the day I think it's still allocating |arr| space for seen.
In order to determine the space complexity, you have to know something about how pop is implemented, as well as how Python manages memory. In order for your algorithm to use constant space, arr would have to release the memory used by popped items, and seen would have to be able to reuse that memory. However, most implementations of Python probably do not support that level of sharing. In particular, pop isn't going to release any memory; it will keep it against the possibility of needing it in the future, rather than having to ask to get the memory back.
Whenever you try to do time and space complexity analysis, think of a test case which could blow up your program the most.
Your space complexity is O(N). For your second program, consider a list containing only 1s, e.g. x = [1,1,1,1,1,1,1]: you'll see that res grows to almost the size of N. Now consider what happens when all the numbers are different, x = [1,2,3,4,5,6,7,8]: this time seen grows to the size of N.
Also, thinking about time complexity, the pop() function of Python lists can sometimes be a problem. Check out this post for more details.
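The usual caveat about pop (a rough sketch; timings will vary): pop() from the end of a list is O(1), while pop(0) has to shift every remaining element and is O(n).

import timeit
print(timeit.timeit('lst.pop()', 'lst = list(range(100000))', number=100))   # O(1) per pop
print(timeit.timeit('lst.pop(0)', 'lst = list(range(100000))', number=100))  # O(n) per pop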
The question explains it, but what is the time complexity of the set difference operation in Python?
EX:
A = set([...])
B = set([...])
print(A.difference(B)) # What is the time complexity of the difference function?
My intuition tells me O(n) because we can iterate through set A and for each element, see if it's contained in set B in constant time (with a hash function).
Am I right?
(Here is the answer that I came across: https://wiki.python.org/moin/TimeComplexity)
It looks like you're right: difference is performed with O(n) complexity in the average case.
But keep in mind that in the worst case (maximizing hash collisions) it can rise to O(n**2), since the worst-case lookup is O(n): see How is set() implemented? (in practice, though, you can generally rely on O(1)).
As an aside, speed depends on the type of object in the set. Integers hash well (roughly as themselves, with probably some modulo), whereas strings need more CPU.
https://wiki.python.org/moin/TimeComplexity suggests that it's O(len(A)) in the example you described.
My understanding is that it's O(len(A)) and not O(len(B)) because you only need to check whether each element of A is present in B. Each lookup in B is O(1), so you do len(A) * O(1) lookups on B. Since O(1) is constant, that's O(len(A)).
Eg:
A = {1,2,3,4,5}
B = {3,5,7,8,9,10,11,12,13}
A-B = {1,2,4}
When A-B is called, iterate through every element in A (only 5 elements) and check for membership in B. If an element is not found in B, it will be present in the result.
Note: of course all of this is amortised complexity; in practice, each lookup in B could be more than O(1).
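A conceptual sketch of that procedure (the same idea expressed in Python, not CPython's actual C code):

def set_difference(A, B):
    # Keep each element of A whose average-O(1) membership test in B fails;
    # the overall cost is O(len(A)).
    return {x for x in A if x not in B}

A = {1, 2, 3, 4, 5}
B = {3, 5, 7, 8, 9, 10, 11, 12, 13}
print(set_difference(A, B))  # {1, 2, 4}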
I have a Python function, defined as follows, which I use to delete from list1 the items that are already in list2. I am using Python 2.6.2 on Windows XP.
def compareLists(list1, list2):
    curIndex = 0
    while curIndex < len(list1):
        if list1[curIndex] in list2:
            list1.pop(curIndex)
        else:
            curIndex += 1
Here, list1 and list2 are lists of lists:
list1 = [ ['a', 11221, '2232'], ['b', 1321, '22342'] .. ]
# list2 has a similar format.
I tried this function with list1 containing 38,000 elements and list2 containing 150,000 elements. If I put in a print statement to print the current iteration, I find that the function slows down with each iteration. At first it processes around 1,000 or more items per second, and after a while it drops to around 20-50 per second. Why could that be happening?
EDIT: In the case with my data, the curIndex remains 0 or very close to 0 so the pop operation on list1 is almost always on the first item.
If possible, can someone also suggest a better way of doing the same thing?
Try a more pythonic approach to the filtering, something like

list2_set = set(list2)
result = [x for x in list1 if x not in list2_set]

(build the set once, outside the comprehension; writing set(list2) inside the condition would rebuild it for every element). Converting both lists to sets is unnecessary, and would be slow and memory hungry on large amounts of data.
Since your data is a list of lists, you need to do something in order to hash it.
Try out
list2_set = set([tuple(x) for x in list2])
diff = [x for x in list1 if tuple(x) not in list2_set]
I tested out your original function, and my approach, using the following test data:
list1 = [[x+1, x*2] for x in range(38000)]
list2 = [[x+1, x*2] for x in range(10000, 160000)]
Timings - not scientific, but still:
#Original function
real 2m16.780s
user 2m16.744s
sys 0m0.017s
#My function
real 0m0.433s
user 0m0.423s
sys 0m0.007s
There are 2 issues that cause your algorithm to scale poorly:
x in list is an O(n) operation.
pop(n) where n is in the middle of the array is an O(n) operation.
Together these make it O(n^2) for large amounts of data (both costs are illustrated in the sketch below). gnud's implementation is probably the best solution, since it fixes both problems without changing the order of elements or removing potential duplicates.
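A rough illustration of the two costs in isolation (sizes are arbitrary; timings will vary): each membership test scans the list, and each front-of-list pop shifts every remaining element.

import timeit
setup = 'lst = list(range(100000))'
print(timeit.timeit('99999 in lst', setup, number=100))  # full scan per test
print(timeit.timeit('lst.pop(0)', setup, number=100))    # shifts the whole tail per pop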
If we rule the data structure itself out, look at your memory usage next. If you end up asking the OS to swap in for you (i.e., the list takes up more memory than you have), Python's going to sit in iowait waiting on the OS to get the pages from disk, which makes sense given your description.
Is Python sitting in a jacuzzi of iowait when this slowdown happens? Anything else going on in the environment?
(If you're not sure, update with your platform and one of us will tell you how to tell.)
The only reason the code can become slower is that you have big elements in both lists which share a lot of common values, so each comparison performed by list1[curIndex] in list2 takes more time.
Here are a couple of ways to fix this:
If you don't care about the order, convert both lists into sets and use set1.difference(set2)
If the order in list1 is important, then at least convert list2 into a set because in is much faster with a set.
Lastly, try a filter: filter(lambda x: x not in set2, list1)
[EDIT] Since set() doesn't work on nested lists (lists aren't hashable; didn't expect that), try:
result = filter(lambda x: x not in list2, list1)
It should still be much faster than your version. If it isn't, then your last option is to make sure that there can't be duplicate elements in either list. That would allow you to remove items from both lists, making the comparison ever cheaper as you find elements from list2.
EDIT: I've updated my answer to account for lists being unhashable, as well as some other feedback. This one is even tested.
It probably relates to the cost of popping an item out of the middle of a list.
Alternatively, have you tried using sets to handle this?
def difference(list1, list2):
    # Build the lookup set once, then keep the items of list1 not in list2.
    list2_set = set(tuple(y) for y in list2)
    return [x for x in list1 if tuple(x) not in list2_set]
You can then assign the result back to list1, if that is your intention, by doing
list1 = difference(list1, list2)
The often-suggested set won't work here, because the two lists contain lists, which are unhashable. You need to change your data structure first.
You can either:
convert the sublists into tuples or class instances to make them hashable, then use sets, or
keep both lists sorted, and then you only have to compare the lists' heads (see the sketch below).
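A minimal sketch of the second option, assuming both lists are already sorted (the function name remove_sorted is illustrative): walk the two lists in lockstep and keep only the items of list1 that never match the current head of list2, which is O(n + m) overall.

def remove_sorted(list1, list2):
    # Assumes both lists are sorted; j only ever moves forward through list2.
    result, j = [], 0
    for x in list1:
        while j < len(list2) and list2[j] < x:
            j += 1
        if j == len(list2) or list2[j] != x:
            result.append(x)
    return result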