From interpreter, I get:
>>> timeit.repeat("-".join( str(n) for n in range(10000) ) , repeat = 3, number=10000)
[1.2294530868530273, 1.2298660278320312, 1.2300069332122803] # this is seconds
From commandline, I get:
$ python -m timeit -n 10000 '"-".join(str(n) for n in range(10000))'
10000 loops, best of 3: 1.79 msec per loop # this is milliseconds
Why this difference in magnitude of timings in the two cases?
The two commands aren't measuring the same thing. In the first snippet, the join runs once, eagerly, before timeit.repeat is ever called; its result, the string "0-1-2-...-9999", is what gets passed to timeit as the statement to execute. So you are timing the arithmetic expression 0-1-2-...-9999, while in the second snippet the code is quoted, so you are timing the string concatenation "-".join(str(n) for n in range(10000)).
In addition, timeit() and repeat() report the total time for all iterations, while the CLI divides by the iteration count and reports a per-loop average. So the first snippet actually takes about 0.123 ms (1.23 s / 10000) "per loop", not 1.23 s.
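To time the join itself from the interpreter, pass the code as a string (or as a callable) so that timeit executes the code rather than its precomputed result; a minimal sketch:

>>> import timeit
>>> # Quote the statement so timeit compiles and runs it, instead of
>>> # receiving the already-computed "0-1-2-...-9999" string:
>>> timeit.repeat('"-".join(str(n) for n in range(10000))', repeat=3, number=10000)
>>> # Equivalently, pass a callable:
>>> timeit.repeat(lambda: "-".join(str(n) for n in range(10000)), repeat=3, number=10000)

Either form returns totals that, divided by number, match the command line's per-loop figure.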
A related question:
I tested this with Python 3.5 on Debian Stretch.
I tried to benchmark the "avoiding dots" optimization.
As expected, the "avoiding dots" run finishes much faster in wall-clock terms.
Unexpectedly, timeit reports the code with the longer wall-clock time as the faster one per loop.
What is the reason?
$ time python3 -m timeit -s "s=''" "s.isalpha()"
10000000 loops, best of 3: 0.119 usec per loop
real 0m5.023s
user 0m4.922s
sys 0m0.012s
$ time python3 -m timeit -s "isalpha=str.isalpha;s=''" "isalpha(s)"
1000000 loops, best of 3: 0.212 usec per loop
real 0m0.937s
user 0m0.927s
sys 0m0.000s
timeit did 10 times as many iterations in the “slow” case. It adaptively tries more iterations to find a number that balances statistical quality and waiting time.
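Python 3.6+ exposes this calibration directly as timeit.Timer.autorange (newer versions step through 1, 2, 5, 10, 20, 50, ... rather than bare powers of 10, but the 0.2-second threshold is the same); a minimal sketch:

import timeit

for stmt, setup in [("s.isalpha()", "s = ''"),
                    ("isalpha(s)", "isalpha = str.isalpha; s = ''")]:
    # autorange() grows the loop count until one run takes at least
    # 0.2 seconds, then returns (loop_count, total_time).
    number, total = timeit.Timer(stmt, setup=setup).autorange()
    print(stmt, "->", number, "loops,", total / number, "sec per loop")

This makes it obvious that the two statements end up with different loop counts, which is exactly why the wall-clock times of the two invocations are not comparable.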
Thanks to Davis Herring's answer, let's work through the details.
From python3 -m timeit -h:
If -n is not given, a suitable number of loops is calculated by trying
successive powers of 10 until the total time is at least 0.2 seconds.
We can verify this against the calibration details printed with -v:
$ time python3 -m timeit -v -s "s=''" "s.isalpha()"
10 loops -> 3.44e-06 secs
100 loops -> 1.29e-05 secs
1000 loops -> 0.000117 secs
10000 loops -> 0.00116 secs
100000 loops -> 0.0118 secs
1000000 loops -> 0.116 secs
10000000 loops -> 1.17 secs
raw times: 1.21 1.21 1.21
10000000 loops, best of 3: 0.121 usec per loop
real 0m4.992s
user 0m4.977s
sys 0m0.012s
All the "x loops -> y secs" lines are the calibration runs used to determine a suitable loop count.
Each item in the "raw times" line is the result of one repeat of the timer (the -r option determines how many items appear there).
Summing everything accounts for almost all of the wall-clock time:
>>> 3.44e-06+1.29e-05+0.000117+0.00116+0.0118+0.116+1.17+1.21+1.21+1.21
4.92909334
>>> 4.992-4.92909334
0.06290666000000034
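The reported per-loop figure follows the same arithmetic: "best of 3" is the minimum of the raw times divided by the loop count. Using the rounded raw times printed above:

>>> min([1.21, 1.21, 1.21]) / 10000000
1.21e-07

which is the 0.121 usec per loop in the output.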
Here is what I mean:
> python -m timeit "set().difference(xrange(0,10))"
1000000 loops, best of 3: 0.624 usec per loop
> python -m timeit "set().difference(xrange(0,10**4))"
10000 loops, best of 3: 170 usec per loop
Apparently Python iterates through the whole argument, even though the result is known to be the empty set beforehand. Is there any good reason for this? The code was run on Python 2.7.6.
(Even for nonempty sets, if you find that you've removed all of the first set's elements midway through the iteration, it makes sense to stop right away.)
Is there any good reason for this?
Having a special path for the empty set had not come up before.
Even for nonempty sets, if you find that you've removed all of the first set's elements midway through the iteration, it makes sense to stop right away.
This is a reasonable optimization request. I've made a patch and will apply it shortly. Here are the new timings with the patch applied:
$ py -m timeit -s "r = range(10 ** 4); s = set()" "s.difference(r)"
10000000 loops, best of 3: 0.104 usec per loop
$ py -m timeit -s "r = set(range(10 ** 4)); s = set()" "s.difference(r)"
10000000 loops, best of 3: 0.105 usec per loop
$ py -m timeit -s "r = range(10 ** 4); s = set()" "s.difference_update(r)"
10000000 loops, best of 3: 0.0659 usec per loop
$ py -m timeit -s "r = set(range(10 ** 4)); s = set()" "s.difference_update(r)"
10000000 loops, best of 3: 0.0684 usec per loop
IMO it's a matter of specialisation, consider:
In [18]: r = range(10 ** 4)
In [19]: s = set(range(10 ** 4))
In [20]: %time set().difference(r)
CPU times: user 387 µs, sys: 0 ns, total: 387 µs
Wall time: 394 µs
Out[20]: set()
In [21]: %time set().difference(s)
CPU times: user 10 µs, sys: 8 µs, total: 18 µs
Wall time: 16.2 µs
Out[21]: set()
Apparently difference has a specialised implementation for the set - set case.
Note that the - operator requires the right-hand argument to be a set, while difference() accepts any iterable.
Per @wim, the implementation is at https://github.com/python/cpython/blob/master/Objects/setobject.c#L1553-L1555
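As a quick illustration of that operator/method distinction (a minimal sketch; Python 3 shown, where range plays the role of xrange):

s = {1, 2, 3}

s.difference(range(5))   # any iterable is accepted -> set()
s - set(range(5))        # both operands are sets -> set()
s - range(5)             # TypeError: unsupported operand type(s) for -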
When Python core developers add new features, the first priority is correct code with thorough test coverage; that is hard enough in itself. Speedups often come later, as someone has the idea and the inclination. I opened tracker issue 28071 summarizing the proposal and the counterarguments discussed here, and I will try to summarize its disposition here as well.
UPDATE: An early-out for sets that start empty has been added for 3.6.0b1, due in about a day.
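For intuition, here is a pure-Python model of the more general early-out suggested above (just a sketch of the semantics, not the actual C implementation linked earlier):

def difference(s, iterable):
    # Model of set.difference: copy s, then discard each element of
    # the iterable from the copy.
    result = set(s)
    for x in iterable:
        if not result:       # early-out: an empty result can't shrink further
            break
        result.discard(x)
    return result

print(difference({1, 2, 3}, range(10 ** 4)))  # set(); stops after a handful of iterations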
$ python -m timeit -s'tes = "987kkv45kk321"*100' 'a = [list(i) for i in tes.split("kk")]'
10000 loops, best of 3: 79.4 usec per loop
$ python -m timeit -s'tes = "987kkv45kk321"*100' 'b = list(map(list, tes.split("kk")))'
10000 loops, best of 3: 66.9 usec per loop
$ python -m timeit -s'tes = "987kkv45kk321"*10' 'a = [list(i) for i in tes.split("kk")]'
100000 loops, best of 3: 8.34 usec per loop
$ python -m timeit -s'tes = "987kkv45kk321"*10' 'b = list(map(list, tes.split("kk")))'
100000 loops, best of 3: 7.38 usec per loop
$ python -m timeit -s'tes = "987kkv45kk321"' 'a = [list(i) for i in tes.split("kk")]'
1000000 loops, best of 3: 1.51 usec per loop
$ python -m timeit -s'tes = "987kkv45kk321"' 'b = list(map(list, tes.split("kk")))'
1000000 loops, best of 3: 1.63 usec per loop
I tried using timeit and wonder why creating a list of lists from str.split() with a list comprehension is faster for a shorter string but slower for a longer string.
The fixed setup costs for map are higher than the setup costs for the listcomp solution. But the per-item costs for map are lower. So for short inputs, map is paying more in fixed setup costs than it saves on the per item costs (because there are so few items). When the number of items increases, the fixed setup costs for map don't change, but the savings per item is being reaped for more items, so map slowly pulls ahead.
Things that map saves on:
Only looks up list once (the listcomp has to look it up in the builtin namespace every single loop, after checking the nested and global scopes first, because it can't guarantee list isn't overridden from loop to loop)
Executes no Python bytecode per item (because the mapped function, list, is also C level), so the interpreter doesn't get involved at all; the whole per-item loop stays in fast C code
map loses on the actual call to map (C built-in functions are fast to run, but comparatively slow to call, especially if they take variable length arguments), and the creation and cleanup of the map object (the listcomp closure is compiled up front). But as I noted above, neither of these is tied to the size of the inputs, so you make up for it rapidly if the mapping function is a C builtin.
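To see the name-lookup cost from the first point on its own, one can hoist list into the setup so the timed code reads a pre-bound name instead of falling through to the builtins on every iteration (a rough sketch; the gap is small but usually measurable):

$ python -m timeit -s 'tes = "987kkv45kk321"*100' '[list(i) for i in tes.split("kk")]'
$ python -m timeit -s 'tes = "987kkv45kk321"*100; lst = list' '[lst(i) for i in tes.split("kk")]'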
This kind of micro-timing is basically useless on its own.
The time frames you are getting are in microseconds, and you are creating tens of lists of one-character elements in each iteration. You get basically linear timing, because the number of objects you create is proportional to the length of the string. There is hardly any surprise in this.
This is mostly an exercise in learning Python. I wrote this function to test if a number is prime:
import math

def p1(n):
    for d in xrange(2, int(math.sqrt(n)) + 1):
        if n % d == 0:
            return False
    return True
Then I realized I can easily rewrite it using any():
def p2(n):
    return not any((n % d == 0) for d in xrange(2, int(math.sqrt(n)) + 1))
Performance-wise, I was expecting p2 to be faster than, or at the very least as fast as, p1 because any() is builtin, but for a large-ish prime, p2 is quite a bit slower:
$ python -m timeit -n 100000 -s "import test" "test.p1(999983)"
100000 loops, best of 3: 60.2 usec per loop
$ python -m timeit -n 100000 -s "import test" "test.p2(999983)"
100000 loops, best of 3: 88.1 usec per loop
Am I using any() incorrectly here? Is there a way to write this function using any() so that it's as fast as iterating myself?
Update: Numbers for an even larger prime
$ python -m timeit -n 1000 -s "import test" "test.p1(9999999999971)"
1000 loops, best of 3: 181 msec per loop
$ python -m timeit -n 1000 -s "import test" "test.p2(9999999999971)"
1000 loops, best of 3: 261 msec per loop
The performance difference is minimal, but the reason it exists is that any incurs building a generator expression and an extra function call, compared with the for loop. Both have identical behaviour, though (short-circuit evaluation).
As the size of the input grows, the difference won't diminish (I was wrong at first), because iterating over a generator expression means calling its next() method for every item, which costs an extra function call and stack frame each time; any does that under the hood, of course.
The for loop is iterating over an xrange object. any is iterating over a generator expression, which itself is iterating over an xrange object.
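To isolate that extra layer, compare an empty loop over an xrange object with an empty loop over a generator expression wrapping the same xrange (a rough sketch; absolute numbers will vary by machine):

$ python -m timeit 'for _ in xrange(1000): pass'
$ python -m timeit 'for _ in (x for x in xrange(1000)): pass'

The second command pays for one extra frame resumption per item, which is the same overhead p2 adds on top of p1.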
Either way, use whichever produces the most readable/maintainable code. Choosing one over the other will have little, if any, performance impact on whatever program you're writing.
I'd like to create a random list of integers for testing purposes. The distribution of the numbers is not important; the only thing that matters is speed. I know generating random numbers is a time-consuming task, but there must be a better way.
Here's my current solution:
import random
import timeit
# Random lists from [0-999] interval
print [random.randint(0, 1000) for r in xrange(10)] # v1
print [random.choice([i for i in xrange(1000)]) for r in xrange(10)] # v2
# Measurement:
t1 = timeit.Timer('[random.randint(0, 1000) for r in xrange(10000)]', 'import random') # v1
t2 = timeit.Timer('random.sample(range(1000), 10000)', 'import random') # v2
print t1.timeit(1000)/1000
print t2.timeit(1000)/1000
v2 is faster than v1, but it is not working on such a large scale. It gives the following error:
ValueError: sample larger than population
Is there a fast, efficient solution that works at that scale?
Some results from the answers below:
Andrew's: 0.000290962934494
gnibbler's: 0.0058455221653
KennyTM's: 0.00219276118279
NumPy came, saw, and conquered.
It is not entirely clear what you want, but I would use numpy.random.randint:
import numpy.random as nprnd
import timeit
t1 = timeit.Timer('[random.randint(0, 1000) for r in xrange(10000)]', 'import random') # v1
### Change v2 so that it picks numbers in (0, 10000) and thus runs...
t2 = timeit.Timer('random.sample(range(10000), 10000)', 'import random') # v2
t3 = timeit.Timer('nprnd.randint(1000, size=10000)', 'import numpy.random as nprnd') # v3
print t1.timeit(1000)/1000
print t2.timeit(1000)/1000
print t3.timeit(1000)/1000
which gives on my machine:
0.0233682730198
0.00781716918945
0.000147947072983
Note that randint is very different from random.sample (in order for it to work in your case I had to change the 1,000 to 10,000 as one of the commentators pointed out -- if you really want them from 0 to 1,000 you could divide by 10).
And if you really don't care what distribution you are getting then it is possible that you either don't understand your problem very well, or random numbers -- with apologies if that sounds rude...
All the random methods end up calling random.random() so the best way is to call it directly:
[int(1000*random.random()) for i in xrange(10000)]
For example,
random.randint calls random.randrange.
random.randrange has a bunch of overhead to check the range before returning istart + istep*int(self.random() * n).
NumPy is much faster still of course.
Your question about performance is moot—both functions are very fast. The speed of your code will be determined by what you do with the random numbers.
However it's important you understand the difference in behaviour of those two functions. One does random sampling with replacement, the other does random sampling without replacement.
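To make the behavioural difference concrete (a small sketch, seeded only for reproducibility; the shapes of the outputs matter, not the exact values):

import random

random.seed(0)
print [random.randint(0, 9) for _ in xrange(5)]  # with replacement: repeats can occur
print random.sample(xrange(10), 5)               # without replacement: five distinct values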
Firstly, you should use randrange(0,1000) or randint(0,999), not randint(0,1000). The upper limit of randint is inclusive.
For efficiency, randint is simply a wrapper around randrange, which in turn calls random, so you should just call random directly. Also, use xrange as the argument to sample, not range.
You could use
[a for a in sample(xrange(1000),1000) for _ in range(10000/1000)]
to generate 10,000 numbers: it samples the full range once and repeats each of the 1,000 values 10 times.
(Of course this won't beat NumPy.)
$ python2.7 -m timeit -s 'from random import randrange' '[randrange(1000) for _ in xrange(10000)]'
10 loops, best of 3: 26.1 msec per loop
$ python2.7 -m timeit -s 'from random import sample' '[a%1000 for a in sample(xrange(10000),10000)]'
100 loops, best of 3: 18.4 msec per loop
$ python2.7 -m timeit -s 'from random import random' '[int(1000*random()) for _ in xrange(10000)]'
100 loops, best of 3: 9.24 msec per loop
$ python2.7 -m timeit -s 'from random import sample' '[a for a in sample(xrange(1000),1000) for _ in range(10000/1000)]'
100 loops, best of 3: 3.79 msec per loop
$ python2.7 -m timeit -s 'from random import shuffle
> def samplefull(x):
> a = range(x)
> shuffle(a)
> return a' '[a for a in samplefull(1000) for _ in xrange(10000/1000)]'
100 loops, best of 3: 3.16 msec per loop
$ python2.7 -m timeit -s 'from numpy.random import randint' 'randint(1000, size=10000)'
1000 loops, best of 3: 363 usec per loop
But since you don't care about the distribution of numbers, why not just use the following?
range(1000)*(10000/1000)
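If you port this trick to Python 3, note that range no longer returns a list and / returns a float; the equivalent (my adaptation, not from the original answer) would be:

list(range(1000)) * (10000 // 1000)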