I tested this with Python 3.5 on Debian Stretch.
I tried to benchmark the "Avoiding dots" optimization.
As expected, the "Avoiding dots" version finishes much faster according to time.
Unexpectedly, timeit reports the slower code as the faster one.
What is the reason?
$ time python3 -m timeit -s "s=''" "s.isalpha()"
10000000 loops, best of 3: 0.119 usec per loop
real 0m5.023s
user 0m4.922s
sys 0m0.012s
$ time python3 -m timeit -s "isalpha=str.isalpha;s=''" "isalpha(s)"
1000000 loops, best of 3: 0.212 usec per loop
real 0m0.937s
user 0m0.927s
sys 0m0.000s
timeit did 10 times as many iterations in the “slow” case. It adaptively tries more iterations to find a number that balances statistical quality and waiting time.
Thanks to Davis Herring's answer.
Let's look at this in more detail:
From python3 -m timeit -h:
If -n is not given, a suitable number of loops is calculated by trying
successive powers of 10 until the total time is at least 0.2 seconds.
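The rule quoted from the help text can be sketched in a few lines. This is only an illustration of the described behaviour, not the timeit source; the helper name pick_loops and its threshold parameter are made up for this example.

```python
import timeit

def pick_loops(stmt, setup="pass", threshold=0.2):
    """Try 10, 100, 1000, ... loops until one run takes at least `threshold` seconds."""
    timer = timeit.Timer(stmt, setup)
    number = 10
    while True:
        elapsed = timer.timeit(number)  # total seconds for `number` loops
        if elapsed >= threshold:
            return number, elapsed
        number *= 10

if __name__ == "__main__":
    for stmt, setup in [("s.isalpha()", "s=''"),
                        ("isalpha(s)", "isalpha=str.isalpha; s=''")]:
        number, elapsed = pick_loops(stmt, setup)
        print("%-12s %d loops, %.3f usec per loop"
              % (stmt, number, elapsed / number * 1e6))
```

A statement that falls just short of the threshold at one power of 10 gets ten times as many loops, so two statements with similar per-loop cost can end up with very different total run times.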
Verify this using the verbose output:
$ time python3 -m timeit -v -s "s=''" "s.isalpha()"
10 loops -> 3.44e-06 secs
100 loops -> 1.29e-05 secs
1000 loops -> 0.000117 secs
10000 loops -> 0.00116 secs
100000 loops -> 0.0118 secs
1000000 loops -> 0.116 secs
10000000 loops -> 1.17 secs
raw times: 1.21 1.21 1.21
10000000 loops, best of 3: 0.121 usec per loop
real 0m4.992s
user 0m4.977s
sys 0m0.012s
The sum of all the "x loops -> y secs" times is the time spent determining a suitable loop count.
Each item in the "raw times" line is the timer result of a single repeat (the -r option determines the number of items in that line).
Almost all of the elapsed time is accounted for:
>>> 3.44e-06+1.29e-05+0.000117+0.00116+0.0118+0.116+1.17+1.21+1.21+1.21
4.92909334
>>> 4.992-4.92909334
0.06290666000000034
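To compare the two statements on an equal footing, the loop count can be fixed (the -n option on the command line, or the number argument in the API). A minimal sketch; compare_fixed is a hypothetical helper name and the loop count is arbitrary:

```python
import timeit

def compare_fixed(number=1000000):
    """Time both statements with the same loop count; return usec per loop."""
    method = timeit.timeit("s.isalpha()", setup="s=''", number=number)
    bound = timeit.timeit("isalpha(s)",
                          setup="isalpha=str.isalpha; s=''", number=number)
    return {"s.isalpha()": method / number * 1e6,
            "isalpha(s)": bound / number * 1e6}

if __name__ == "__main__":
    for stmt, usec in compare_fixed().items():
        print("%-12s %.3f usec per loop" % (stmt, usec))
```

With the same number for both, the total wall-clock time and the per-loop figure rank the statements the same way.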
If I am creating a program that does some complex calculations on a data set and I already know what some of the values should be, should I still calculate them? For example if I know that 0 or 1 would always be themselves should I just check if the value is 0 or 1 or actually do the calculations?
Edit:
I don't have code because I was asking as a concept. I was creating a program to return the base 10 log of each number in a data set and I was wondering if it would be more efficient to return values I already knew like 0 for 1, "undefined" for 0, and the number of zeros for numbers divisible by 10. I wasn't sure if it was more efficient and if it would be efficient on a larger scale.
Let's try this simple example
$ python3 -m timeit -s "from math import log; mylog=lambda x: log(x)" "mylog(1)"
10000000 loops, best of 3: 0.152 usec per loop
$ python3 -m timeit -s "from math import log; mylog=lambda x: 0.0 if x==1 else log(x)" "mylog(1)"
10000000 loops, best of 3: 0.0976 usec per loop
So there is some speedup for the special case. However, all the non-special cases run slower:
$ python3 -m timeit -s "from math import log; mylog=lambda x: log(x)" "mylog(2)"
10000000 loops, best of 3: 0.164 usec per loop
$ python3 -m timeit -s "from math import log; mylog=lambda x: 0.0 if x==1 else log(x)" "mylog(2)"
1000000 loops, best of 3: 0.176 usec per loop
And in this case, it's better just to leave the wrapper function out altogether
$ python3 -m timeit -s "from math import log" "log(2)"
10000000 loops, best of 3: 0.0804 usec per loop
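Whether the special case pays off depends on how often it occurs in the data, so it may help to time both variants over a whole data set rather than a single value. A rough sketch, using math.log10 since the question is about base-10 logs; the function names and the sample data are invented for illustration:

```python
import timeit
from math import log10

def log10_plain(x):
    return log10(x)

def log10_special(x):
    # Shortcut for the one value whose result is known in advance.
    return 0.0 if x == 1 else log10(x)

def time_over(func, data, number=100):
    """Total seconds for `number` passes of func over every item in data."""
    return timeit.timeit(lambda: [func(x) for x in data], number=number)

if __name__ == "__main__":
    mostly_ones = [1] * 900 + list(range(2, 102))
    few_ones = [1] * 10 + list(range(2, 992))
    for name, data in [("mostly ones", mostly_ones), ("few ones", few_ones)]:
        print(name,
              "plain: %.4f s" % time_over(log10_plain, data),
              "special: %.4f s" % time_over(log10_special, data))
```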
Imagine a list of strings like this one: ('{hello world} is awesome', 'Hello world is less awesome', '{hello world} is {awesome} too'). I want to check the starting character of each string in a for loop. I think I have 4 options:
if re.search(r'^\{', i):
if re.match(r'\{', i):
if i.startswith('{'):
if i[:1] == '{':
Which is the fastest one? Is there anything even faster than these 4 options?
Note: The starting string to compare could be longer, not only one letter, e.g. {hello
The fastest is i[0] == value, since it indexes directly into the underlying array (note that it only works for a single-character prefix and raises IndexError on an empty string). A regex needs to (at least) parse the pattern, startswith has the overhead of a method call, and a slice creates a new string of that size before the actual comparison.
As @dsqdfg said in the comments, there is a timing function in Python that I had not known about until now. I measured the options, and here are the results:
python -m timeit -s 'text="{hello world}"' 'text[:6] == "{hello"'
1000000 loops, best of 3: 0.224 usec per loop
python -m timeit -s 'text="{hello world}"' 'text.startswith("{hello")'
1000000 loops, best of 3: 0.291 usec per loop
python -m timeit -s 'text="{hello world}"' 'import re' 're.match(r"\{hello", text)'
100000 loops, best of 3: 2.53 usec per loop
python -m timeit -s 'text="{hello world}"' 'import re' 're.search(r"^\{hello", text)'
100000 loops, best of 3: 2.86 usec per loop
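In the two re commands above, 'import re' is passed as a second timed statement rather than as setup, so the import lookup is measured on every loop. The sketch below keeps the import in the setup and adds a precompiled pattern as a fifth candidate. The names SETUP, CANDIDATES and time_candidates are invented for this example, and no timings are claimed since they vary by machine and Python version.

```python
import timeit

# Setup runs once per timing and is not included in the measurement.
SETUP = r'''
import re
text = "{hello world}"
pat = re.compile(r"\{hello")
'''

CANDIDATES = [
    'text[:6] == "{hello"',
    'text.startswith("{hello")',
    r're.match(r"\{hello", text)',
    r're.search(r"^\{hello", text)',
    'pat.match(text)',
]

def time_candidates(number=100000):
    """Return (statement, usec per loop) pairs, all with the same loop count."""
    results = []
    for stmt in CANDIDATES:
        total = timeit.timeit(stmt, setup=SETUP, number=number)
        results.append((stmt, total / number * 1e6))
    return results

if __name__ == "__main__":
    for stmt, usec in time_candidates():
        print("%-32s %.3f usec per loop" % (stmt, usec))
```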
Here is what I mean:
> python -m timeit "set().difference(xrange(0,10))"
1000000 loops, best of 3: 0.624 usec per loop
> python -m timeit "set().difference(xrange(0,10**4))"
10000 loops, best of 3: 170 usec per loop
Apparently python iterates through the whole argument, even if the result is known to be the empty set beforehand. Is there any good reason for this? The code was run in python 2.7.6.
(Even for nonempty sets, if you find that you've removed all of the first set's elements midway through the iteration, it makes sense to stop right away.)
Is there any good reason for this?
Having a special path for the empty set had not come up before.
Even for nonempty sets, if you find that you've removed all of the first set's elements midway through the iteration, it makes sense to stop right away.
This is a reasonable optimization request. I've made a patch and will apply it shortly. Here are the new timings with the patch applied:
$ py -m timeit -s "r = range(10 ** 4); s = set()" "s.difference(r)"
10000000 loops, best of 3: 0.104 usec per loop
$ py -m timeit -s "r = set(range(10 ** 4)); s = set()" "s.difference(r)"
10000000 loops, best of 3: 0.105 usec per loop
$ py -m timeit -s "r = range(10 ** 4); s = set()" "s.difference_update(r)"
10000000 loops, best of 3: 0.0659 usec per loop
$ py -m timeit -s "r = set(range(10 ** 4)); s = set()" "s.difference_update(r)"
10000000 loops, best of 3: 0.0684 usec per loop
IMO it's a matter of specialisation, consider:
In [18]: r = range(10 ** 4)
In [19]: s = set(range(10 ** 4))
In [20]: %time set().difference(r)
CPU times: user 387 µs, sys: 0 ns, total: 387 µs
Wall time: 394 µs
Out[20]: set()
In [21]: %time set().difference(s)
CPU times: user 10 µs, sys: 8 µs, total: 18 µs
Wall time: 16.2 µs
Out[21]: set()
Apparently difference has a specialised implementation for the set - set case.
Note that the - operator requires the right-hand argument to be a set, while the difference method allows any iterable.
Per @wim, the implementation is at https://github.com/python/cpython/blob/master/Objects/setobject.c#L1553-L1555
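The distinction between the operator and the method can be checked directly. A small sketch; operator_accepts is a hypothetical helper written for this example:

```python
def operator_accepts(other):
    """Return True if `set() - other` is allowed, False if it raises TypeError."""
    try:
        set() - other
    except TypeError:
        return False
    return True

if __name__ == "__main__":
    print("set       :", operator_accepts({1, 2}))
    print("frozenset :", operator_accepts(frozenset([1, 2])))
    print("list      :", operator_accepts([1, 2]))
    print("range     :", operator_accepts(range(3)))
    # The method form takes any iterable.
    print("method    :", set().difference(range(3)))
```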
When Python core developers add new features, the first priority is correct code with thorough test coverage. That is hard enough in itself. Speedups often come later, when someone has the idea and the inclination. I opened tracker issue 28071 summarizing the proposal and the counter-reasons discussed here. I will try to summarize its disposition here.
UPDATE: An early-out for sets that start empty has been added for 3.6.0b1, due in about a day.
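The early-out idea discussed in this thread can be illustrated in pure Python. This is not the CPython patch (set.difference is implemented in C); difference_early_out is a hypothetical function that only shows the two shortcuts mentioned: returning at once for an empty set, and stopping as soon as the result becomes empty.

```python
def difference_early_out(s, iterable):
    """Pure-Python illustration of an early-out set difference."""
    result = set(s)
    if not result:
        return result  # nothing to remove, so the iterable is never touched
    for item in iterable:
        result.discard(item)
        if not result:
            break  # every element is gone, no need to keep iterating
    return result

if __name__ == "__main__":
    print(difference_early_out({1, 2, 3}, [2]))
    print(difference_early_out(set(), range(10 ** 4)))
```

One consequence of this kind of shortcut is that the iterable may not be fully consumed, which matters if it is a generator with side effects.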
This is mostly an exercise in learning Python. I wrote this function to test if a number is prime:
import math

def p1(n):
    for d in xrange(2, int(math.sqrt(n)) + 1):
        if n % d == 0:
            return False
    return True
Then I realized I can easily rewrite it using any():
def p2(n):
    return not any((n % d == 0) for d in xrange(2, int(math.sqrt(n)) + 1))
Performance-wise, I was expecting p2 to be faster than, or at the very least as fast as, p1 because any() is builtin, but for a large-ish prime, p2 is quite a bit slower:
$ python -m timeit -n 100000 -s "import test" "test.p1(999983)"
100000 loops, best of 3: 60.2 usec per loop
$ python -m timeit -n 100000 -s "import test" "test.p2(999983)"
100000 loops, best of 3: 88.1 usec per loop
Am I using any() incorrectly here? Is there a way to write this function using any() so that it's as fast as iterating myself?
Update: Numbers for an even larger prime
$ python -m timeit -n 1000 -s "import test" "test.p1(9999999999971)"
1000 loops, best of 3: 181 msec per loop
$ python -m timeit -n 1000 -s "import test" "test.p2(9999999999971)"
1000 loops, best of 3: 261 msec per loop
The performance difference is minimal, but the reason it exists is that any incurs building a generator expression, and an extra function call, compared to the for loop. Both have identical behavior, though (short-circuit evaluation).
As the size of your input grows, the difference won't diminish (I was wrong) because you're using a generator expression, and iterating over it requires calling a method (.next()) on it and an extra stack frame. any does that under the hood, of course.
The for loop is iterating over an xrange object. any is iterating over a generator expression, which itself is iterating over an xrange object.
Either way, use whichever produces the most readable/maintainable code. Choosing one over the other will have little, if any, performance impact on whatever program you're writing.
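For anyone repeating the comparison on Python 3, here is a sketch of the same two functions with range in place of xrange (an adaptation, since the question used Python 2). It checks that both agree and times them with the same loop count; no timings are claimed here.

```python
import math
import timeit

def p1(n):
    # Explicit loop: returns as soon as a divisor is found.
    for d in range(2, int(math.sqrt(n)) + 1):
        if n % d == 0:
            return False
    return True

def p2(n):
    # any() also stops at the first divisor, but pulls each value
    # through a generator expression.
    return not any(n % d == 0 for d in range(2, int(math.sqrt(n)) + 1))

if __name__ == "__main__":
    for func in (p1, p2):
        seconds = timeit.timeit(lambda: func(999983), number=1000)
        print(func.__name__, "%.1f usec per loop" % (seconds / 1000 * 1e6))
```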
From the interpreter, I get:
>>> timeit.repeat("-".join( str(n) for n in range(10000) ) , repeat = 3, number=10000)
[1.2294530868530273, 1.2298660278320312, 1.2300069332122803] # this is seconds
From the command line, I get:
$ python -m timeit -n 10000 '"-".join(str(n) for n in range(10000))'
10000 loops, best of 3: 1.79 msec per loop # this is milliseconds
Why is there such a difference in the magnitude of the timings in the two cases?
The two lines aren't measuring the same thing. In the first snippet, the join is evaluated before timeit ever sees it, so you're timing the arithmetic expression 0-1-2-...-9999, while in the second snippet you're timing the string concatenation "-".join(str(n) for n in range(10000)).
In addition, timeit and repeat report the total time, while the CLI averages the time over the number of iterations. So the first code actually takes about 0.123 ms (1.229 s / 10000) "per loop".
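To make the interpreter measurement comparable with the command line, the statement can be passed as a string and the total divided by number. A minimal sketch; per_loop_seconds is a hypothetical helper, and the smaller default number is only there to keep the run short:

```python
import timeit

# The statement is a string, so the join runs inside the timed loop.
STMT = '"-".join(str(n) for n in range(10000))'

def per_loop_seconds(stmt, repeat=3, number=100):
    """Best-of-`repeat` time per loop, mirroring what the CLI prints."""
    totals = timeit.repeat(stmt, repeat=repeat, number=number)  # total seconds per repeat
    return min(totals) / number

if __name__ == "__main__":
    print("%.3f msec per loop" % (per_loop_seconds(STMT) * 1e3))
```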