I was reading an interesting post on Short-Circuiting in Python and wondered if this was true for the in operator. My simple testing would conclude that it does not:
%%timeit -n 1000
0 in list(range(10))
1000 loops, best of 3: 639 ns per loop
%%timeit -n 1000
0 in list(range(1000))
1000 loops, best of 3: 23.7 µs per loop
# The larger the list, the longer it takes. However, I also notice that
# searching for a higher value takes longer.
%%timeit -n 1000
999 in list(range(1000))
1000 loops, best of 3: 45.1 µs per loop
Is there a detailed explanation of why 999 takes longer than 0? Is the in operator implemented like a loop?
Also, is there a way to tell the in operator to "stop the loop" once the value is found (or is that already the default behavior and I'm just not seeing it)?
Lastly, is there another operator/function that I am overlooking that does this kind of "short-circuiting" for in?
Short-circuiting does occur. The in operator calls the __contains__ method, which is implemented differently for each class (in your case, list). In your timings, searching for 999 takes roughly double the time of searching for 0 because about half of the work is creating the list and the other half is iterating through it, and that iteration is short-circuited in the case of 0.
The implementation of in for list objects is found in list_contains. It performs a linear scan of the list and exits early as soon as a comparison finds the element; there's no point in continuing after that.
The loop involved is:
for (i = 0, cmp = 0 ; cmp == 0 && i < Py_SIZE(a); ++i)
    cmp = PyObject_RichCompareBool(el, PyList_GET_ITEM(a, i),
                                   Py_EQ);
If cmp is 1 (the value returned from PyObject_RichCompareBool for a match), the for loop condition (cmp == 0 && i < Py_SIZE(a)) becomes false and the loop terminates.
For list objects, which are built-in, what is called for in is a C function (for CPython). For other implementations of Python, this can be a different language using different language constructs.
For user-defined classes in Python, what is called is defined in the Membership test operations of the Reference Manual, take a look there for a run-down of what gets called.
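As an illustration (my own sketch, not taken from the Reference Manual), a user-defined class can answer membership tests directly through __contains__ with no scanning at all; if __contains__ is absent, Python falls back to iterating with __iter__, which short-circuits on the first match just like the list scan above:
class EvenNumbers:
    # Hypothetical example: membership is decided entirely by __contains__.
    def __contains__(self, item):
        # Answered in O(1); `in` never scans anything here.
        return isinstance(item, int) and item % 2 == 0

print(4 in EvenNumbers())  # True
print(7 in EvenNumbers())  # False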
You could also come to this conclusion by timing:
l = [*range(1000)]
%timeit 1 in l
85.8 ns ± 11.9 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%timeit 999 in l
22 µs ± 221 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
The further away the element, the more of the list you need to scan. If in didn't short-circuit, all of these membership checks would take roughly the same time.
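To make the short-circuiting visible, here is a small sketch (my own illustration, using a hypothetical counting_iter helper): wrap an iterable so we can count how many elements in actually examines before it stops.
def counting_iter(iterable, counter):
    # Hypothetical helper: yield items while counting how many get consumed.
    for item in iterable:
        counter[0] += 1
        yield item

counter = [0]
print(5 in counting_iter(range(1000), counter))  # True
print(counter[0])  # 6 -- the scan stopped as soon as 5 was found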
Here's another look with a hashed object, set:
from time import time

qlist = list(range(1000))
qset = set(qlist)

start = time()
for i in range(1000):
    0 in qlist
print(time() - start)

start = time()
for i in range(1000):
    999 in qlist
print(time() - start)

start = time()
for i in range(1000):
    0 in qset
print(time() - start)

start = time()
for i in range(1000):
    999 in qset
print(time() - start)
Output:
0.000172853469849 0 in list
0.0399038791656 999 in list
0.000147104263306 0 in set
0.000195980072021 999 in set
As others have said, the list implementation must do a sequential search. Set inclusion uses a hashed value, and is on par with finding the item in the first element checked.
The following two algorithms both have O(n) complexity; however, the second one, which uses recursion, runs much slower than the first, which uses a for loop. Is it because recursion is expensive in Python?
For the recursive method I assume O(n) because there are n/2 + n/4 + n/8 + ... ≈ n comparisons performed in total. I would appreciate it if someone could shed more light on how recursion in Python works.
def findmax1(array):
    curr = array[0]
    for i in range(1, len(array)):
        if array[i] > curr:
            curr = array[i]
    return curr

def findmax2_aux(left, right):
    if left > right:
        return left
    else:
        return right

def findmax2(array):
    if len(array) <= 1:
        return array[0]  # base case: return the element itself
    mid = len(array) // 2
    left, right = array[:mid], array[mid:]
    left = findmax2(left)
    right = findmax2(right)
    return findmax2_aux(left, right)
import time
from random import randint

test_array = [randint(1, 1000) for x in range(1000000)]
t1 = time.time()
findmax1(test_array)
print(time.time()-t1)
# 0.08
t2 = time.time()
findmax2(test_array)
print(time.time()-t2)
# 1.05
Function calls are generally more expensive than iteration in most languages. Python has to allocate a new frame for every function call and does not optimize tail recursion into iteration. See these more generic timings:
In [2]: def recursive(n):
   ...:     if n > 0:
   ...:         return recursive(n-1)
   ...:
In [3]: def iterative(n):
   ...:     for _ in range(n):
   ...:         pass
   ...:
In [4]: %timeit recursive(1000)
114 µs ± 6.1 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [5]: %timeit iterative(1000)
19.2 µs ± 1.5 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Recursion is not inherently expensive in Python, but it can appear so because each recursive call adds a function invocation and grows the call stack (which keeps track of every nested call), so the interpreter does more bookkeeping per step. Recursion is still an effective tool, but avoid it when the iterative and recursive approaches have the same time complexity.
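For completeness, here is a sketch (my own illustration with a hypothetical findmax2_iterative name, not from the question or the answers) of the same divide-and-conquer idea written iteratively: an explicit stack of index ranges replaces the recursive calls, so no extra Python frames (and no list slicing) are created while the comparison work stays O(n).
def findmax2_iterative(array):
    # Same splitting as findmax2, but pending (lo, hi) ranges live on an
    # explicit stack instead of the Python call stack.
    best = array[0]
    stack = [(0, len(array))]  # half-open ranges still to scan
    while stack:
        lo, hi = stack.pop()
        if hi - lo <= 2:  # small range: compare the elements directly
            for i in range(lo, hi):
                if array[i] > best:
                    best = array[i]
        else:
            mid = (lo + hi) // 2
            stack.append((lo, mid))
            stack.append((mid, hi))
    return best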
This question already has answers here: Why is a `for` loop so much faster to count True values?
I get a list and want to know if all elements are identical.
For lists with a large number of elements that are indeed all identical, converting to a set is fast, but otherwise iterating over the list with an early exit performs better:
def are_all_identical_iterate(dataset):
    first = dataset[0]
    for data in dataset:
        if data != first:
            return False
    return True

# versus
def are_all_identical_all(dataset):
    return all(data == dataset[0] for data in dataset)

# or
def are_all_identical_all2(dataset):
    return all(data == dataset[0] for data in iter(dataset))

# or
def are_all_identical_all3(dataset):
    iterator = iter(dataset)
    first = next(iterator)
    return all(first == rest for rest in iterator)
NUM_ELEMENTS = 50000
testDataset = [1337] * NUM_ELEMENTS # all identical
from timeit import timeit
print(timeit("are_all_identical_iterate(testDataset)", setup="from __main__ import are_all_identical_iterate, testDataset", number=1000))
print(timeit("are_all_identical_all(testDataset)", setup="from __main__ import are_all_identical_all, testDataset", number=1000))
My results:
0.94 seconds,
3.09 seconds,
3.27 seconds,
2.61 seconds
The for loop takes roughly a third of the time of the all function, even though all is supposed to be essentially the same implementation.
What is going on?
I want to know why the loop is so much faster and why there is a difference between the last 3 implementations. The last implementation should do one fewer comparison, because the iterator skips the first element, but that shouldn't have this kind of impact.
As suggested in this other SO post, one cause could be that:
The use of a generator expression causes overhead for constantly
pausing and resuming the generator.
Anyway, I suggest two other approaches using map:
def are_all_identical_map(dataset):
    for x in map(lambda x: x == dataset[0], dataset):
        if not x:
            return False
    return True
%%timeit
are_all_identical_map(testDataset)
#7.5 ms ± 64.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
and (though note that this second snippet is misleading: map is lazy, so the expression only creates the map object without running a single comparison, and a map object is always truthy, so the result is always True; the timing below measures only the object creation):
%%timeit
(map(lambda x: x == dataset[0], dataset)) and True
#303 ns ± 13.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
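A variant worth trying (my own sketch, not from the original answer, with a hypothetical are_all_identical_map_all name) keeps the short-circuiting of all() while pushing the per-element comparison into C with map, so there is no Python-level generator to pause and resume:
from functools import partial
from operator import eq

def are_all_identical_map_all(dataset):
    # all() stops at the first False, and map() is lazy, so this still
    # exits early on the first mismatching element.
    first = dataset[0]
    return all(map(partial(eq, first), dataset))

print(are_all_identical_map_all([1337] * 10))   # True
print(are_all_identical_map_all([1, 1, 2, 1]))  # False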
I ran some tests to determine whether == for strings is O(len(string)) or O(1).
My tests:
import timeit
x = 'ab' * 500000000
y = 'ab' * 500000000
%timeit x == y
> 163 ms ± 4.62 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
x = 'ab' * 5000
y = 'ab' * 5000
%timeit x == y
> 630 ns ± 23.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Looking at the above results I understand that string comparison is linear O(N) and not O(1).
However, I was reading this document: Complexities of Python Operations
The part:
Finally, when comparing two lists for equality, the complexity class above shows as O(N), but in reality we would need to multiply this complexity class by O==(...) where O==(...) is the complexity class for checking whether two values in the list are ==. If they are ints, O==(...) would be O(1); if they are strings, O==(...) in the worst case it would be O(len(string)). This issue applies any time an == check is done. We mostly will assume == checking on values in lists is O(1): e.g., checking ints and small/fixed-length strings.
This says the worst case for strings would be O(len(string)). My question is why worst case? Shouldn't the best/average case be O(len(string))?
The algorithm is simple: you check the strings char by char, so:
Hello == Hello => They are equal, so this is actually the worst case, because you check all the chars of both strings.
Hello != Hella => Still a worst case: you only realize they are different at the last char of the strings.
hello != Hello => Best-case scenario: the first chars (h != H) already differ, so you stop checking right there.
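A quick timing sketch (my own, numbers will vary by machine) shows both ends: two distinct but equal strings force a full scan, while a same-length string that differs at index 0 is rejected almost immediately (CPython also compares the lengths before looking at any characters):
import timeit

a = 'ab' * 500000
b = 'ab' * 500000          # equal contents, different objects -> full scan
c = 'xb' + 'ab' * 499999   # same length, but differs at the first character

print(timeit.timeit('a == b', globals=globals(), number=1000))  # slower
print(timeit.timeit('a == c', globals=globals(), number=1000))  # near-instant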
Often, to save some time, I would like to use n = len(s) in my local function. I am curious which call is faster, or whether they are the same:
while i < len(s):
    # do something
vs
while i < n:
    # do something
There should not be too much difference, but using len(s) we need to look up s first and then call len on it, which is O(1) + O(1); using n is just a single O(1) lookup. At least that is my assumption.
It has to be faster.
Using n, you look up a name (in the namespace dictionaries) once.
Using len(s), you look up two names (len is also a name that has to be resolved), and then you call the function on top of that.
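A sketch with the dis module (my own illustration; exact opcode names vary across CPython versions) makes the extra lookups visible: i < n is a single fast local load, while i < len(s) also has to resolve the name len and call it on every check.
import dis

def use_len(s, i):
    return i < len(s)

def use_n(n, i):
    return i < n

dis.dis(use_len)  # loads `len` (LOAD_GLOBAL), loads `s`, calls it, then compares
dis.dis(use_n)    # just loads `n` and compares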
That said, if you do while i < n:, most of the time you can get away with a classical for i in range(len(s)): loop, since the upper boundary doesn't change and range evaluates it only once at the start (which may lead you to ask: why not iterate directly over the elements, or use enumerate?)
while i < len(s) lets you compare your index against a list whose size may change during the loop. That's the whole point. If you fix the bound, it becomes less attractive.
In a for loop, it's easy to skip increments with continue (just as it is easy to forget to increment i in a while loop and end up looping forever).
You're right, here are some benchmarks:
import numpy as np

s = np.random.rand(100)
n = 100
Above is setup.
%%timeit
50 < len(s)
86.3 ns ± 2.4 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
Versus:
%%timeit
50 < n
36.8 ns ± 1.15 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
But then again, it's hard to imagine that differences at the tens-of-nanoseconds level would noticeably affect overall speed, unless you're calling len(s) millions of times.
I have this code:
s = set([5,6,7,8])

if key in s:
    return True
if key not in s:
    return False
It seems to me that it shouldn't, in theory, differ time wise, but I may be missing something under the hood.
Is there any reason to prefer one over the other in terms of processing time or readability?
Perhaps is this an example of:
"Premature optimization is the root of all evil"?
Short Answer: No, no difference. Yes, probably premature optimization.
OK, I ran this test:
import random

s = set([5,6,7,8])
for _ in range(5000000):
    s.add(random.randint(-100000,100000000))

def test_in():
    count = 0
    for _ in range(50000):
        if random.randint(-100000,100000000) in s:
            count += 1
    print(count)

def test_not_in():
    count = 0
    for _ in range(50000):
        if random.randint(-100000,100000000) not in s:
            count += 1
    print(count)
When I time the outputs:
%timeit test_in()
10 loops, best of 3: 83.4 ms per loop
%timeit test_not_in()
10 loops, best of 3: 78.7 ms per loop
BUT, that small difference seems to be a symptom of how often the counter is incremented: on average there are about 47500 "not in" hits but only about 2500 "in" hits. If I change both tests to just pass, e.g.:
def test_in():
    for _ in range(50000):
        if random.randint(-100000,100000000) in s:
            pass
The results are nearly identical
%timeit test_in()
10 loops, best of 3: 77.4 ms per loop
%timeit test_not_in()
10 loops, best of 3: 78.7 ms per loop
In this case, my intuition failed me. I had thought that asking whether something is not in the set could have added some additional processing time. When I further consider what a hashmap does, it seems obvious that this can't be the case.
You shouldn't see a difference. The lookup time in a set is constant: you hash the entry, then look it up in a hashmap. All keys take about the same time to check, and in vs not in should be comparable.
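This can also be seen at the bytecode level; here's a small sketch with dis (my own illustration; opcode names differ between CPython versions): not in compiles to the same membership opcode as in, just with an inverted flag, rather than to in followed by a separate not, so no extra step is expected.
import dis

def contains(key, s):
    return key in s

def not_contains(key, s):
    return key not in s

dis.dis(contains)      # e.g. CONTAINS_OP 0 on CPython 3.9+
dis.dis(not_contains)  # e.g. CONTAINS_OP 1 -- same lookup, result inverted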
Running a simple performance test in an ipython session with timeit confirms g.d.d.c's statement.
def one(k, s):
    if k in s:
        return True

def two(k, s):
    if k not in s:
        return False
s = set(range(1, 100))
%timeit -r7 -n 10000000 one(50, s)
## 83.7 ns ± 0.874 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%timeit -r7 -n 10000000 two(50, s)
## 86.1 ns ± 1.11 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
Optimisations such as this aren't going to gain you much, and as has been pointed out in the comments, they will in fact slow down the rate at which you can push out bugfixes/improvements/... because of reduced readability. For this kind of low-level performance gain, I'd suggest looking into Cython or Numba.