I ran some tests to determine whether O(==) for strings is O(len(string)) or O(1).
My tests:
import timeit
x = 'ab' * 500000000
y = 'ab' * 500000000
%timeit x == y
> 163 ms ± 4.62 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
x = 'ab' * 5000
y = 'ab' * 5000
%timeit x == y
> 630 ns ± 23.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Looking at the above results I understand that string comparison is linear O(N) and not O(1).
However, I was reading this document: Complexities of Python Operations
The part:
Finally, when comparing two lists for equality, the complexity class above shows as O(N), but in reality we would need to multiply this complexity class by O==(...) where O==(...) is the complexity class for checking whether two values in the list are ==. If they are ints, O==(...) would be O(1); if they are strings, O==(...) in the worst case it would be O(len(string)). This issue applies any time an == check is done. We mostly will assume == checking on values in lists is O(1): e.g., checking ints and small/fixed-length strings.
This says the worst case for strings would be O(len(string)). My question is why worst case? Shouldn't the best/average case be O(len(string))?
The algorithm is simple: you check the strings char by char, so:
Hello == Hello => They are equal, so this is actually the worst case, because you check all the chars of both strings.
Hello != Hella => Still the worst case: you only realize they are different at the last char of the strings.
hello != Hello => Best case scenario: the first chars (h != H) already differ, so you stop checking right there.
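For illustration, here is a rough pure-Python sketch of that char-by-char idea (this is not the real CPython implementation, which is written in C and can also reject strings of unequal length without looking at any chars):
def str_eq(a, b):
    # unequal lengths can be rejected immediately
    if len(a) != len(b):
        return False
    for ca, cb in zip(a, b):
        if ca != cb:      # first mismatch -> early exit (best case)
            return False
    return True           # every char compared -> worst case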
Related
Say we have a call like:
ser = pd.Series([1,2,3,4])
ser[ser>1].any()
Now my question is: is pandas "smart enough" to stop the computation and spit out True when it encounters the 2, or does it really go through the whole array first and check any() afterwards? If the latter is true: how can I avoid this behavior?
Is pandas "smart enough" to stop computation
pandas acts differently: it widely uses vectorized operations (applying one operation/function to a sequence of values at once), and the mentioned expression ser[ser>1].any() implies:
ser > 1 - evaluates on the whole series and returns a boolean array of results (for each value) array([False, True, True, True])
ser[ser > 1] - filters the series by boolean array
.any() - finally evaluates the function on the filtered series (a sketch of these steps follows below)
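For illustration, a minimal sketch of those intermediate steps, using the example series from the question:
import pandas as pd

ser = pd.Series([1, 2, 3, 4])

mask = ser > 1           # boolean Series: [False, True, True, True]
filtered = ser[mask]     # filtered Series: [2, 3, 4]
result = filtered.any()  # True, evaluated only after the full filtering step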
Actually, your intention is covered by (ser > 1).any() (without interim filtering).
If you expect the classical any behavior (returning immediately on encountering the first True), you can go the classical way:
any(x > 1 for x in ser)
And, of course, the classical way in this case will go faster:
In [409]: %timeit (ser > 1).any()
75.4 µs ± 636 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [410]: %timeit any(x > 1 for x in ser)
1.44 µs ± 22.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
I have the following nested loop:
sum_tot = 0.0
for i in range(len(N)-1):
    for j in range(len(N)-1):
        sum_tot = sum_tot + N[i]**2*N[j]**2*W[i]*W[j]*x_i[j][-1] / (N[j]**2 - x0**2) *(z_i[i][j] - z_j[i][j])*x_j[i][-1] / (N[i]**2 - x0**2)
It's basically a mathematical function that has a double summation. Each sum goes up to the length of N. I've been trying to figure out if there was a way to write this without using a nested for-loop in order to reduce computational time. I tried using list comprehension, but the computational time is similar if not the same. Is there a way to write this expression as matrices to avoid the loops?
Note that with your current loop, range(len(N)-1) stops at index len(N)-2: range goes up to but does not include its argument, so the last element is never visited. You probably meant to write for i in range(len(N)).
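A quick illustration of that off-by-one, with a made-up N:
N = [10, 20, 30, 40, 50]
print(list(range(len(N) - 1)))  # [0, 1, 2, 3] -> the last index, 4, is skipped
print(list(range(len(N))))      # [0, 1, 2, 3, 4]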
It's also difficult to reduce the cost of the summation: the actual time it takes depends on the number of terms computed, so any rewrite that still computes the same number of terms will take just as long. However, O(n^2) isn't exactly bad: it looks like the best you can do in this situation unless you find a mathematical simplification of the problem.
You might consider checking this post to gather ways to write out the summation in a neater fashion.
@Kraigolas makes valid points. But let's try a few benchmarks on a dummy, doubly nested operation either way. (Hint: Numba might help you speed things up.)
Note, I would avoid numpy arrays here specifically because the whole cross-product of the ranges would be in memory at once. If this is a massive range, you may run out of memory.
Nested for loops
n = 5000
s1 = 0
for i in range(n):
    for j in range(n):
        s1 += (i/2) + (j/3)
print(s1)
#2.26 s ± 101 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
List comprehension
n = 5000
s2 = 0
s2 = sum([i/2+j/3 for i in range(n) for j in range(n)])
print(s2)
#3.2 s ± 307 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Itertools product
from itertools import product
n = 5000
s3 = 0
for i, j in product(range(n), repeat=2):
    s3 += (i/2) + (j/3)
print(s3)
#2.35 s ± 186 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Note: when using Numba, you should run the function at least once beforehand, because the first call compiles the code and is therefore slow. The real speedup comes from the second run onwards (a small warm-up sketch follows the first Numba block below).
Numba njit (SIMD)
from numba import njit
n = 5000
@njit
def f(n):
    s = 0
    for i in range(n):
        for j in range(n):
            s += (i/2) + (j/3)
    return s
s4 = f(n)
#29.4 ms ± 1.85 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
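Following the warm-up note above, a minimal pattern one might use when timing (this is an assumption about how the measurement was taken, not shown in the original):
f(10)          # first call: Numba compiles f for these argument types (slow, one-off)
%timeit f(n)   # subsequent calls run the compiled machine code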
Numba njit parallel with prange
An excellent suggestion by @Tim, added to the benchmarks:
from numba import njit, prange

@njit(parallel=True)
def f(n):
    s = 0
    for i in prange(n):
        for j in prange(n):
            s += (i/2) + (j/3)
    return s
s5 = f(n)
#21.8 ms ± 4.81 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
A significant boost with Numba, as expected. Maybe try that?
To convert this to matrix calculations, I would suggest combining some terms first.
If these objects are not numpy arrays, it's better to convert them to numpy arrays, as they support element-wise operations.
To convert, simply do
import numpy
N = numpy.array(N)
W = numpy.array(W)
x_i = numpy.array(x_i)
x_j = numpy.array(x_j)
z_i = numpy.array(z_i)
z_j = numpy.array(z_j)
Then,
common_terms = N**2*W/(N**2-x0**2)
i_terms = common_terms*x_j[:,-1]
j_terms = common_terms*x_i[:,-1]
i_j_matrix = z_i - z_j
sum_output = (i_terms.reshape((1,-1)) @ i_j_matrix @ j_terms.reshape((-1,1)))[0,0]
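As a sanity check, here is a small sketch with made-up random inputs (and the range(len(N)) fix mentioned above) comparing the original double loop against the vectorized expression:
import numpy as np

rng = np.random.default_rng(0)
n = 6
x0 = 0.5
N = rng.random(n) + 2.0        # keep N**2 well away from x0**2
W = rng.random(n)
x_i = rng.random((n, 3))
x_j = rng.random((n, 3))
z_i = rng.random((n, n))
z_j = rng.random((n, n))

# original double summation
loop_sum = 0.0
for i in range(n):
    for j in range(n):
        loop_sum += (N[i]**2 * N[j]**2 * W[i] * W[j] * x_i[j][-1] / (N[j]**2 - x0**2)
                     * (z_i[i][j] - z_j[i][j]) * x_j[i][-1] / (N[i]**2 - x0**2))

# vectorized version from above
common_terms = N**2 * W / (N**2 - x0**2)
i_terms = common_terms * x_j[:, -1]
j_terms = common_terms * x_i[:, -1]
i_j_matrix = z_i - z_j
vec_sum = (i_terms.reshape((1, -1)) @ i_j_matrix @ j_terms.reshape((-1, 1)))[0, 0]

print(np.isclose(loop_sum, vec_sum))  # expected: True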
I get a list and want to know if all elements are identical.
For lists with a high number of elements that are indeed all identical, converting to a set is fast, but otherwise going over the list with an early exit performs better:
def are_all_identical_iterate(dataset):
    first = dataset[0]
    for data in dataset:
        if data != first:
            return False
    return True

# versus
def are_all_identical_all(dataset):
    return all(data == dataset[0] for data in dataset)

# or
def are_all_identical_all2(dataset):
    return all(data == dataset[0] for data in iter(dataset))

# or
def are_all_identical_all3(dataset):
    iterator = iter(dataset)
    first = next(iterator)
    return all(first == rest for rest in iterator)
NUM_ELEMENTS = 50000
testDataset = [1337] * NUM_ELEMENTS # all identical
from timeit import timeit
print(timeit("are_all_identical_iterate(testDataset)", setup="from __main__ import are_all_identical_iterate, testDataset", number=1000))
print(timeit("are_all_identical_all(testDataset)", setup="from __main__ import are_all_identical_all, testDataset", number=1000))
My results:
0.94 seconds,
3.09 seconds,
3.27 seconds,
2.61 seconds
The for loop takes less than half the time (about 3x faster) compared to the all function, yet the all function is supposed to be the same implementation.
What is going on?
I want to know why the loop is so much faster, and why there is a difference between the last 3 implementations. The last implementation should do one comparison fewer, because the iterator consumes the first element, but that shouldn't have this kind of impact.
As suggested in this other SO post, one cause could be that:
The use of a generator expression causes overhead for constantly
pausing and resuming the generator.
Anyway, I suggest two other approaches using map that look faster:
def are_all_identical_map(dataset):
    for x in map(lambda x: x == dataset[0], dataset):
        if not x:
            return False
    return True
%%timeit
are_all_identical_map(testDataset)
#7.5 ms ± 64.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
and
%%timeit
(map(lambda x: x == testDataset[0], testDataset)) and True
#303 ns ± 13.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Often, to save some time, I would like to use n = len(s) in my local function.
I am curious about which call is faster or they are the same?
while i < len(s):
    # do something
vs
while i < n:
    # do something
There should not be too much difference, but using len(s) we need to look up s first and then call len on it: that's O(1) + O(1). Using n is just O(1). That's my assumption, anyway.
It has to be faster.
Using n, you look up the variable (in the namespace dictionaries) once.
Using len(s), you look up twice (len is also a name that has to be looked up), and then you call the function.
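One way to see the difference is to look at the bytecode; a small illustrative sketch (with_len and with_n are hypothetical helper names):
import dis

def with_len(s, i):
    return i < len(s)

def with_n(n, i):
    return i < n

dis.dis(with_len)  # loads the global name len, loads s, then makes a call
dis.dis(with_n)    # loads n straight from the local namespace, no call at all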
That said, if you do while i < n:, most of the time you can get away with a classical for i in range(len(s)): loop, since the upper boundary doesn't change and is evaluated only once, at the start, by range (which may lead you to: why not iterate directly over the elements, or use enumerate?).
while i < len(s) lets you compare your index against a list whose length may change during the loop (sketched below). That's the whole point. If you fix the bound, it becomes less attractive.
In a for loop, it's easy to skip increments with continue (just as it's easy to forget to increment i and end up with an infinite while loop).
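For that varying-length point, a small hypothetical sketch where the bound has to be re-checked every iteration:
s = [3, 1, 4, 1, 5]
i = 0
while i < len(s):          # len(s) is re-evaluated on every pass
    if s[i] % 2 == 0:
        s.pop(i)           # the list shrinks; the bound tracks it
    else:
        i += 1
print(s)                   # [3, 1, 1, 5]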
You're right; here are some benchmarks:
import numpy as np
s = np.random.rand(100)
n = 100
The above is the setup.
%%timeit
50 < len(s)
86.3 ns ± 2.4 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
Versus:
%%timeit
50 < n
36.8 ns ± 1.15 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
But then again, it's hard to imagine that differences at the ~60 ns level would affect overall speed, unless you're calling len(s) millions of times.
I was reading an interesting post on Short-Circuiting in Python and wondered if this was true for the in operator. My simple testing would conclude that it does not:
%%timeit -n 1000
0 in list(range(10))
1000 loops, best of 3: 639 ns per loop
%%timeit -n 1000
0 in list(range(1000))
1000 loops, best of 3: 23.7 µs per loop
# the larger the list, the longer it takes. however, I do notice that a higher
# value also takes longer.
%%timeit -n 1000
999 in list(range(1000))
1000 loops, best of 3: 45.1 µs per loop
Is there a detailed explanation of why 999 takes longer than 0? Is the in operator like a loop?
Also, is there a way to tell the in operator to "stop the loop" once the value is found (or is this the already defaulted behavior that I'm not seeing)?
Lastly, is there another operator/function that I am skipping over that does what I'm talking about in regards to "short-circuiting" in?
Short-circuiting does occur. The in operator calls the __contains__ method, which in turn is implemented differently per class (in your case, list). Searching for 999 takes around double the time of searching for 0, since half of the work is creating the list and the other half is iterating through it, which is short-circuited in the case of 0.
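One way to check that split yourself (a sketch only; no numbers quoted, results will vary by machine):
%timeit list(range(1000))         # cost of building the list alone
%timeit 0 in list(range(1000))    # building plus a search that stops at the first element
%timeit 999 in list(range(1000))  # building plus a full scan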
The implementation of in for list objects is found in list_contains. It performs a scan of the list and does exit early if the last comparison has found the element, there's no point in continuing there.
The loop involved is:
for (i = 0, cmp = 0 ; cmp == 0 && i < Py_SIZE(a); ++i)
    cmp = PyObject_RichCompareBool(el, PyList_GET_ITEM(a, i),
                                   Py_EQ);
If cmp is 1 (the value returned from PyObject_RichCompareBool for a match), the for loop condition (cmp == 0 && i < Py_SIZE(a)) becomes false and terminates.
For list objects, which are built-in, what is called for in is a C function (for CPython). For other implementations of Python, this can be a different language using different language constructs.
For user-defined classes in Python, what is called is defined in the Membership test operations section of the Reference Manual; take a look there for a run-down of what gets called (a toy example follows below).
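For illustration, a toy, hypothetical container whose membership test stops at the first match, mirroring the list behavior above:
class EarlyExitBag:
    """Toy container: `in` stops scanning at the first match."""
    def __init__(self, items):
        self.items = list(items)

    def __contains__(self, value):
        for item in self.items:
            if item == value:   # first hit ends the search
                return True
        return False

bag = EarlyExitBag(range(1000))
print(0 in bag)     # True after a single comparison
print(999 in bag)   # True only after scanning the whole container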
You could also come to this conclusion by timing:
l = [*range(1000)]
%timeit 1 in l
85.8 ns ± 11.9 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%timeit 999 in l
22 µs ± 221 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
The further away the element is, the more you need to scan. If it didn't short-circuit, all in operations would result in similar timings.
Here's another look with a hashed object, a set:
from time import time

qlist = list(range(1000))
qset = set(qlist)

start = time()
for i in range(1000):
    0 in qlist
print(time() - start)

start = time()
for i in range(1000):
    999 in qlist
print(time() - start)

start = time()
for i in range(1000):
    0 in qset
print(time() - start)

start = time()
for i in range(1000):
    999 in qset
print(time() - start)
Output:
0.000172853469849 0 in list
0.0399038791656 999 in list
0.000147104263306 0 in set
0.000195980072021 999 in set
As others have said, the list implementation must do a sequential search. Set membership uses the value's hash, so it is on par with finding the item at the first position checked.