I have created the code below. It takes a series of values and generates 10 numbers between x and r with an average value of 8000.
In order to meet the specification to cover the range as well as possible, I also calculated the standard deviation, which is a good measure of spread. So whenever a sample set meets the criterion of a mean of 8000, I compare it to previous matches and keep the sample set with the highest standard deviation (the mean is always 8000).
import random as rd
import statistics as st

def node_timing(average_block_response_computational_time, min_block_response_computational_time, max_block_response_computational_time):
    sample_count = 10
    num_of_trials = 1
    # print average_block_response_computational_time
    # print min_block_response_computational_time
    # print max_block_response_computational_time
    target_sum = sample_count * average_block_response_computational_time
    samples_list = []
    curr_stdev_max = 0
    for trials in range(num_of_trials):
        samples = [0] * sample_count
        while sum(samples) != target_sum:
            samples = [rd.randint(min_block_response_computational_time, max_block_response_computational_time) for trial in range(sample_count)]
            # print ("Mean: ", st.mean(samples), "Std Dev: ", st.stdev(samples))
            # print (samples, "\n")
            if st.stdev(samples) > curr_stdev_max:
                curr_stdev_max = st.stdev(samples)
                samples_best = samples[:]
    return samples_best[0]
I take the first value in the list and use it as a timing value. However, this code is REALLY slow, and I need to call it several thousand times during the simulation, so I need to improve its efficiency somehow.
Does anyone have any suggestions on how to do that?
To see where we'd get the best speed improvements, I started by profiling your code.
import cProfile

pr = cProfile.Profile()
pr.enable()
for i in range(100):
    print(node_timing(8000, 7000, 9000))
pr.disable()
pr.print_stats(sort='time')
The top of the results shows where your code is spending most of its time:
23561178 function calls (23561176 primitive calls) in 10.612 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
4502300 3.694 0.000 7.258 0.000 random.py:172(randrange)
4502300 2.579 0.000 3.563 0.000 random.py:222(_randbelow)
4502300 1.533 0.000 8.791 0.000 random.py:216(randint)
450230 1.175 0.000 9.966 0.000 counter.py:19(<listcomp>)
4608421 0.690 0.000 0.690 0.000 {method 'getrandbits' of '_random.Random' objects}
100 0.453 0.005 10.596 0.106 counter.py:5(node_timing)
4502300 0.294 0.000 0.294 0.000 {method 'bit_length' of 'int' objects}
450930 0.141 0.000 0.150 0.000 {built-in method builtins.sum}
100 0.016 0.000 0.016 0.000 {built-in method builtins.print}
600 0.007 0.000 0.025 0.000 statistics.py:105(_sum)
2200 0.005 0.000 0.006 0.000 fractions.py:84(__new__)
...
From this output, we can see that we're spending ~7.5 seconds (of the 10.6 total) generating random numbers. Therefore, the only way to make this noticeably faster is to generate fewer random numbers or to generate them faster. You're already using the standard (non-cryptographic) random number generator, so I don't see a way to make each number faster to generate. However, we can fudge the algorithm a bit and drastically reduce the number of values we need to generate.
Instead of only accepting samples with a mean of exactly 8000, what if we accepted samples with a mean of 8000 ± 0.1% (so we're taking samples with a mean of 7992 to 8008)? By being a tiny bit inexact, we can drastically speed up the algorithm. I replaced the while condition with:
while abs(sum(samples) - target_sum) > epsilon:
where epsilon = target_sum * 0.001. Then I ran the script again and got much better profiler numbers.
232439 function calls (232437 primitive calls) in 0.163 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
100 0.032 0.000 0.032 0.000 {built-in method builtins.print}
31550 0.026 0.000 0.053 0.000 random.py:172(randrange)
31550 0.019 0.000 0.027 0.000 random.py:222(_randbelow)
31550 0.011 0.000 0.064 0.000 random.py:216(randint)
4696 0.010 0.000 0.013 0.000 fractions.py:84(__new__)
3155 0.008 0.000 0.073 0.000 counter.py:19(<listcomp>)
600 0.008 0.000 0.039 0.000 statistics.py:105(_sum)
100 0.006 0.000 0.131 0.001 counter.py:4(node_timing)
32293 0.005 0.000 0.005 0.000 {method 'getrandbits' of '_random.Random' objects}
1848 0.004 0.000 0.009 0.000 fractions.py:401(_add)
Allowing the mean to be up to 0.1% off of the target dropped the number of calls to randint by 100x. Naturally, the code also runs 100x faster (and now spends most of its time printing to console).
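For reference, here is a minimal sketch of the revised sampling loop (I've shortened the parameter names for readability and omitted the stdev bookkeeping for brevity):

import random as rd

def node_timing_approx(avg_time, min_time, max_time, sample_count=10):
    # Accept any sample set whose sum is within 0.1% of the target.
    target_sum = sample_count * avg_time
    epsilon = target_sum * 0.001
    samples = [0] * sample_count
    while abs(sum(samples) - target_sum) > epsilon:
        samples = [rd.randint(min_time, max_time) for _ in range(sample_count)]
    return samples[0]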
I have a script that finds the sum of all numbers that can be written as the sum of fifth powers of their digits. (This problem is described in more detail on the Project Euler web site.)
I have written it two ways, but I do not understand the performance difference.
The first way uses nested list comprehensions:
exp = 5

def min_combo(n):
    return ''.join(sorted(list(str(n))))

def fifth_power(n, exp):
    return sum([int(x) ** exp for x in list(n)])

print sum( [fifth_power(j,exp) for j in set([min_combo(i) for i in range(101,1000000) ]) if int(j) > 10 and j == min_combo(fifth_power(j,exp)) ] )
and profiles like this:
$ python -m cProfile euler30.py
443839
3039223 function calls in 2.040 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1007801 1.086 0.000 1.721 0.000 euler30.py:10(min_combo)
7908 0.024 0.000 0.026 0.000 euler30.py:14(fifth_power)
1 0.279 0.279 2.040 2.040 euler30.py:6(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1007801 0.175 0.000 0.175 0.000 {method 'join' of 'str' objects}
1 0.013 0.013 0.013 0.013 {range}
1007801 0.461 0.000 0.461 0.000 {sorted}
7909 0.002 0.000 0.002 0.000 {sum}
The second way is the more usual for loop:
exp = 5
ans= 0
def min_combo(n):
return ''.join(sorted(list(str(n))))
def fifth_power(n, exp):
return sum([int(x) ** exp for x in list(n)])
for j in set([ ''.join(sorted(list(str(i)))) for i in range(100, 1000000) ]):
if int(j) > 10:
if j == min_combo(fifth_power(j,exp)):
ans += fifth_power(j,exp)
print 'answer', ans
Here is the profiling info again:
$ python -m cProfile euler30.py
answer 443839
2039325 function calls in 1.709 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
7908 0.024 0.000 0.026 0.000 euler30.py:13(fifth_power)
1 1.081 1.081 1.709 1.709 euler30.py:6(<module>)
7902 0.009 0.000 0.015 0.000 euler30.py:9(min_combo)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1007802 0.147 0.000 0.147 0.000 {method 'join' of 'str' objects}
1 0.013 0.013 0.013 0.013 {range}
1007802 0.433 0.000 0.433 0.000 {sorted}
7908 0.002 0.000 0.002 0.000 {sum}
Why does the list comprehension implementation call min_combo() 1,000,000 more times than the for loop implementation?
Because in the second one you reimplemented the body of min_combo inline inside the set call, so those million calls never go through the function and never show up in the profile.
Do the same thing in both versions and you'll get the same result.
BTW, change these to avoid big intermediate lists being created:
sum([something for foo in bar]) -> sum(something for foo in bar)
set([something for foo in bar]) -> set(something for foo in bar)
(Without the [...] they become generator expressions.)
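For instance, the set construction in your for-loop version with both changes applied would look like this sketch (behavior unchanged, but min_combo is now counted the same way in both profiles):

for j in set(min_combo(i) for i in range(100, 1000000)):
    if int(j) > 10 and j == min_combo(fifth_power(j, exp)):
        ans += fifth_power(j, exp)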
I was writing a blog post about some Python coding styles and came across something that I found very strange, and I was wondering if someone understood what is going on with it. Basically I've got two versions of the same function:
a = lambda x: (i for i in range(x))

def b(x):
    for i in range(x):
        yield i
I want to compare the performance of just setting these two up. In my mind this should involve a negligible amount of computation, and both methods should come out pretty close to zero. However, when I actually ran timeit:
import timeit
import numpy as np
import matplotlib.pyplot as plt

def timing(x, number=10):
    implicit = timeit.timeit('a(%s)' % int(x), 'from __main__ import a', number=number)
    explicit = timeit.timeit('b(%s)' % int(x), 'from __main__ import b', number=number)
    return (implicit, explicit)

def plot_timings(*args, **kwargs):
    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    x_vector = np.linspace(*args, **kwargs)
    timings = np.vectorize(timing)(x_vector)
    ax.plot(x_vector, timings[0], 'b--')
    ax.plot(x_vector, timings[1], 'r--')
    ax.set_yscale('log')
    plt.show()

plot_timings(1, 1000000, 20)
I get a HUGE difference between the two methods, shown in a timing plot where a is in blue and b is in red (figure not reproduced here).
Why is the difference so huge? It looks like the implicit (lambda) version is growing with x, while the explicit version is doing nothing (as it should).
Any thoughts?
The difference is caused by range
a needs to call range when you construct it.
b doesn't need to call range until the first iteration
>>> def myrange(n):
...     print "myrange(%s)" % n
...     return range(n)
...
>>> a = lambda x: (i for i in myrange(x))
>>> def b(x):
...     for i in myrange(x):
...         yield i
...
>>> a(100)
myrange(100)
<generator object <genexpr> at 0xd62d70>
>>> b(100)
<generator object b at 0xdadb90>
>>> next(_)  # <-- first iteration of b(100)
myrange(100)
0
The lambda call is the slow one. Check this out:
import cProfile

a = lambda x: (i for i in range(x))

def b(x):
    for i in range(x):
        yield i

def c(x):
    for i in xrange(x):
        yield i

def d(x):
    i = 0
    while i < x:
        yield i
        i += 1

N = 100000

print " -- a --"
cProfile.run("""
for x in xrange(%i):
    a(x)
""" % N)

print " -- b --"
cProfile.run("""
for x in xrange(%i):
    b(x)
""" % N)

print " -- c --"
cProfile.run("""
for x in xrange(%i):
    c(x)
""" % N)

print " -- d --"
cProfile.run("""
for x in xrange(%i):
    d(x)
""" % N)

print " -- a (again) --"
cProfile.run("""
for x in xrange(%i):
    a(x)
""" % N)
Gives me the following results:
-- a --
300002 function calls in 61.764 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 30.881 30.881 61.764 61.764 <string>:3(<module>)
100000 0.051 0.000 0.051 0.000 test.py:5(<genexpr>)
100000 0.247 0.000 30.832 0.000 test.py:5(<lambda>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
100000 30.585 0.000 30.585 0.000 {range}
-- b --
100002 function calls in 0.076 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.066 0.066 0.076 0.076 <string>:3(<module>)
100000 0.010 0.000 0.010 0.000 test.py:7(b)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
-- c --
100002 function calls in 0.075 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.065 0.065 0.075 0.075 <string>:3(<module>)
100000 0.010 0.000 0.010 0.000 test.py:11(c)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
-- d --
100002 function calls in 0.075 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.065 0.065 0.075 0.075 <string>:3(<module>)
100000 0.010 0.000 0.010 0.000 test.py:15(d)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
-- a (again) --
300002 function calls in 60.890 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 30.487 30.487 60.890 60.890 <string>:3(<module>)
100000 0.049 0.000 0.049 0.000 test.py:5(<genexpr>)
100000 0.237 0.000 30.355 0.000 test.py:5(<lambda>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
100000 30.118 0.000 30.118 0.000 {range}
Look at the code below. I use two ways to solve the problem (simple recursion and DP). Why is the DP way slower?
What's your suggestion?
#!/usr/local/bin/python2.7
# encoding: utf-8
"""
Problem: There is an array of positive integers. Given a positive integer S,
find the total number of combinations in which the numbers' sum is S.
"""
Method I:
def find_sum_recursive(number_list, sum_to_find):
    count = 0
    for i in range(len(number_list)):
        sub_sum = sum_to_find - number_list[i]
        if sub_sum < 0:
            continue
        elif sub_sum == 0:
            count += 1
            continue
        else:
            sub_list = number_list[i + 1:]
            count += find_sum_recursive(sub_list, sub_sum)
    return count
Method II:
def find_sum_DP(number_list, sum_to_find):
    count = 0
    if(0 == sum_to_find):
        count = 1
    elif([] != number_list and sum_to_find > 0):
        count = find_sum_DP(number_list[:-1], sum_to_find) + find_sum_DP(number_list[:-1], sum_to_find - number_list[:].pop())
    return count
Running it:
import sys
import timeit
import cProfile

def main(argv=None):  # IGNORE:C0111
    number_list = [5, 5, 10, 3, 2, 9, 8]
    sum_to_find = 15
    input_setup = ';number_list = [5, 5, 10, 3, 2, 9, 8, 7, 6, 4, 3, 2, 9, 5, 4, 7, 2, 8, 3];sum_to_find = 15'
    print 'Calculating...'
    print 'recursive starting'
    count = find_sum_recursive(number_list, sum_to_find)
    print timeit.timeit('count = find_sum_recursive(number_list, sum_to_find)', setup='from __main__ import find_sum_recursive' + input_setup, number=10)
    cProfile.run('find_sum_recursive(' + str(number_list) + ',' + str(sum_to_find) + ')')
    print 'recursive ended:', count
    print 'DP starting'
    count_DP = find_sum_DP(number_list, sum_to_find)
    print timeit.timeit('count_DP = find_sum_DP(number_list, sum_to_find)', setup='from __main__ import find_sum_DP' + input_setup, number=10)
    cProfile.run('find_sum_DP(' + str(number_list) + ',' + str(sum_to_find) + ')')
    print 'DP ended:', count_DP
    print 'Finished.'

if __name__ == '__main__':
    sys.exit(main())
I have recoded method II, and it is correct now:
def find_sum_DP(number_list, sum_to_find):
    count = [[0 for i in xrange(0, sum_to_find + 1)] for j in xrange(0, len(number_list) + 1)]
    for i in range(len(number_list) + 1):
        for j in range(sum_to_find + 1):
            if (0 == i and 0 == j):
                count[i][j] = 1
            elif (i > 0 and j > 0):
                if (j > number_list[i - 1]):
                    count[i][j] = count[i - 1][j] + count[i - 1][j - number_list[i - 1]]
                elif (j < number_list[i - 1]):
                    count[i][j] = count[i - 1][j]
                else:
                    count[i][j] = count[i - 1][j] + 1
            else:
                count[i][j] = 0
    return count[len(number_list)][sum_to_find]
Comparing methods I and II:
Calculating...
recursive starting
0.00998711585999
92 function calls (63 primitive calls) in 0.000 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 <string>:1(<module>)
30/1 0.000 0.000 0.000 0.000 FindSum.py:18(find_sum_recursive)
30 0.000 0.000 0.000 0.000 {len}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
30 0.000 0.000 0.000 0.000 {range}
recursive ended: 6
DP starting
0.00171685218811
15 function calls in 0.000 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 FindSum.py:33(find_sum_DP)
3 0.000 0.000 0.000 0.000 {len}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
9 0.000 0.000 0.000 0.000 {range}
DP ended: 6
Finished.
If you're using IPython, %prun is your friend here.
Take a look at the output for the recursive version:
2444 function calls (1631 primitive calls) in 0.002 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
814/1 0.002 0.000 0.002 0.002 <ipython-input-1-7488a6455e38>:1(find_sum_recursive)
814 0.000 0.000 0.000 0.000 {range}
814 0.000 0.000 0.000 0.000 {len}
1 0.000 0.000 0.002 0.002 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
And now, for the DP version:
10608 function calls (3538 primitive calls) in 0.007 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
7071/1 0.007 0.000 0.007 0.007 <ipython-input-15-3535e3ab26eb>:1(find_sum_DP)
3535 0.001 0.000 0.001 0.000 {method 'pop' of 'list' objects}
1 0.000 0.000 0.007 0.007 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
7071 is quite a bit higher than 814!
Your problem here is that your dynamic programming method isn't dynamic programming! The point of dynamic programming is that, when you have a problem with overlapping subproblems, as you do here, you store the result of each subproblem; then, if you need that result again, you take it from the store rather than recalculating it. Your code doesn't do that: every time you call find_sum_DP, you recalculate, even if the same calculation has already been done. The result is that your _DP method is actually not only recursive, but recursive with more function calls than your recursive method.
(I'm currently writing a DP version to demonstrate)
Edit:
I need to add the caveat that, while I should know much more about dynamic programming, I very embarrassingly don't. I'm also writing this quickly and late at night, a bit as an exercise for myself. Nevertheless, here is a dynamic programming implementation of the function:
import numpy as np

def find_sum_realDP(number_list, sum_to_find):
    memo = np.zeros((len(number_list), sum_to_find + 1), dtype=np.int) - 1
    # This will store our results. memo[l][n] will give us the result
    # for number_list[0:l+1] and a sum_to_find of n. If it hasn't been
    # calculated yet, it will give us -1. This is not at all efficient
    # storage, but isn't terribly bad.
    # Now that we have that, we'll call the real function. Instead of modifying
    # the list and making copies or views, we'll keep the same list, and keep
    # track of the index we're on (nli).
    return find_sum_realDP_do(number_list, len(number_list) - 1, sum_to_find, memo), memo

def find_sum_realDP_do(number_list, nli, sum_to_find, memo):
    # Our count is 0 by default.
    ret = 0
    # If we aren't at the sum to find yet, do we have any numbers left after this one?
    if (sum_to_find > 0) and nli > 0:
        # Each of these checks to see if we've already stored the result of the calculation.
        # If so, we use that; if not, we calculate it.
        if memo[nli - 1, sum_to_find] >= 0:
            ret += memo[nli - 1, sum_to_find]
        else:
            ret += find_sum_realDP_do(number_list, nli - 1, sum_to_find, memo)
        # This one is a bit tricky, and was a bug when I first wrote it. We don't want to
        # have a negative sum_to_find, because that will be very bad; we'll start using results
        # from other places in memo because it will wrap around.
        if (sum_to_find - number_list[nli] >= 0) and memo[nli - 1, sum_to_find - number_list[nli]] >= 0:
            ret += memo[nli - 1, sum_to_find - number_list[nli]]
        elif sum_to_find - number_list[nli] >= 0:
            ret += find_sum_realDP_do(number_list, nli - 1, sum_to_find - number_list[nli], memo)
    # Do we not actually have any sum to find left?
    elif 0 == sum_to_find:
        ret = 1
    # If we only have one number left, will it get us there?
    elif (nli == 0) and (sum_to_find - number_list[nli] == 0):
        ret = 1
    # Store our result.
    memo[nli, sum_to_find] = ret
    # Return our result.
    return ret
Note that this uses numpy. It's very likely that you don't have it installed, but I'm not sure how to write a reasonably-performing dynamic programming algorithm in Python without it; I don't think Python lists have anywhere near the performance of numpy arrays. Note also that this code deals with zeros differently from yours, so rather than debug the difference I'll just say that this code is for nonzero positive integers in the number list. Now, with this algorithm, profiling gives us:
243 function calls (7 primitive calls) in 0.001 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
237/1 0.001 0.000 0.001 0.001 <ipython-input-155-4a624e5a99b7>:9(find_sum_realDP_do)
1 0.000 0.000 0.001 0.001 <ipython-input-155-4a624e5a99b7>:1(find_sum_realDP)
1 0.000 0.000 0.000 0.000 {numpy.core.multiarray.zeros}
1 0.000 0.000 0.001 0.001 <string>:1(<module>)
2 0.000 0.000 0.000 0.000 {len}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
243 is a great deal better than even the recursive version! But your example data is small enough that it doesn't really show off how much better a dynamic programming algorithm is.
Let's try nlist2 = [7, 6, 2, 3, 7, 7, 2, 7, 4, 2, 4, 5, 6, 1, 7, 4, 6, 3, 2, 1, 1, 1, 4,
2, 3, 5, 2, 4, 4, 2, 4, 5, 4, 2, 1, 7, 6, 6, 1, 5, 4, 5, 3, 2, 3, 7,
1, 7, 6, 6], with the same sum_to_find=15. This has 50 values, and 900206 ways to get 15...
With find_sum_recursive:
3335462 function calls (2223643 primitive calls) in 14.137 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1111820/1 13.608 0.000 14.137 14.137 <ipython-input-46-7488a6455e38>:1(find_sum_recursive)
1111820 0.422 0.000 0.422 0.000 {range}
1111820 0.108 0.000 0.108 0.000 {len}
1 0.000 0.000 14.137 14.137 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
And now with find_sum_realDP:
736 function calls (7 primitive calls) in 0.007 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
730/1 0.007 0.000 0.007 0.007 <ipython-input-155-4a624e5a99b7>:9(find_sum_realDP_do)
1 0.000 0.000 0.007 0.007 <ipython-input-155-4a624e5a99b7>:1(find_sum_realDP)
1 0.000 0.000 0.000 0.000 {numpy.core.multiarray.zeros}
1 0.000 0.000 0.007 0.007 <string>:1(<module>)
2 0.000 0.000 0.000 0.000 {len}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
So we have less than 1/1000th of the calls, and run in less than 1/2000th of the time. Of course, the bigger a list you use, the better the DP algorithm will work. On my computer, running with sum_to_find of 15 and a list of 600 random numbers from 1 to 8, realDP only takes 0.09 seconds, and has less than 10,000 function calls; it's around this point that the 64-bit integers I'm using start overflowing and we have all sorts of other problems. Needless to say, the recursive algorithm would never be able to handle a list anywhere near that size before the computer stopped functioning, either from the materials inside it breaking down or the heat death of the universe.
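By the way, if you'd rather avoid the numpy dependency, a plain dict keyed on the (index, remaining sum) pair gives the same memoization. A minimal sketch with my own naming, again assuming nonzero positive integers in the list:

def count_subset_sums(number_list, sum_to_find, nli=None, memo=None):
    # Count the subsets of number_list[:nli + 1] that sum to sum_to_find,
    # storing each subproblem's result in a dict keyed on (nli, sum_to_find).
    if nli is None:
        nli = len(number_list) - 1
    if memo is None:
        memo = {}
    if sum_to_find == 0:
        return 1
    if sum_to_find < 0 or nli < 0:
        return 0
    if (nli, sum_to_find) not in memo:
        # Either skip number_list[nli], or use it and shrink the target.
        memo[(nli, sum_to_find)] = (
            count_subset_sums(number_list, sum_to_find, nli - 1, memo) +
            count_subset_sums(number_list, sum_to_find - number_list[nli], nli - 1, memo))
    return memo[(nli, sum_to_find)]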
One thing is that your code does a lot of list copying. It would be faster if it just passed an index or indices to define a "window view" rather than copying the lists all over. For the first method, you can easily add a starting_index parameter and use it in your for loop. In the second method, you write number_list[:].pop(), which copies the whole list just to get the last element; you could simply use number_list[-1]. You could also add an ending_index parameter and use it in your test (len(number_list) == ending_index instead of number_list != []; by the way, even plain number_list is better than testing against an empty list).
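As a sketch, here is method I with the suggested starting_index parameter applied (my adaptation, illustrative rather than benchmarked):

def find_sum_recursive2(number_list, sum_to_find, starting_index=0):
    # Same algorithm as method I, but recursing on an index window
    # instead of slicing a new sublist out of the list at every call.
    count = 0
    for i in range(starting_index, len(number_list)):
        sub_sum = sum_to_find - number_list[i]
        if sub_sum < 0:
            continue
        elif sub_sum == 0:
            count += 1
        else:
            count += find_sum_recursive2(number_list, sub_sum, i + 1)
    return count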
Based on that answer, here are two versions of the merge function used for mergesort.
Could you help me understand why the second one is much faster?
I have tested it on a list of 50000 elements and the second one is 8 times faster (Gist).
def merge1(left, right):
    i = j = inv = 0
    merged = []
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
            inv += len(left[i:])
    merged += left[i:]
    merged += right[j:]
    return merged, inv
.
def merge2(array1, array2):
    inv = 0
    merged_array = []
    while array1 or array2:
        if not array1:
            merged_array.append(array2.pop())
        elif (not array2) or array1[-1] > array2[-1]:
            merged_array.append(array1.pop())
            inv += len(array2)
        else:
            merged_array.append(array2.pop())
    merged_array.reverse()
    return merged_array, inv
Here is the sort function:
def _merge_sort(list, merge):
    len_list = len(list)
    if len_list < 2:
        return list, 0
    middle = len_list / 2
    left, left_inv = _merge_sort(list[:middle], merge)
    right, right_inv = _merge_sort(list[middle:], merge)
    l, merge_inv = merge(left, right)
    inv = left_inv + right_inv + merge_inv
    return l, inv
.
import numpy.random as nprnd
test_list = nprnd.randint(1000, size=50000).tolist()
test_list_tmp = list(test_list)
merge_sort(test_list_tmp, merge1)
test_list_tmp = list(test_list)
merge_sort(test_list_tmp, merge2)
A similar answer to kreativitea's above, but with more info (I think!).
So, profiling the actual merge functions for the merging of two 50K arrays:
merge 1
311748 function calls in 15.363 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.001 0.001 15.363 15.363 <string>:1(<module>)
1 15.322 15.322 15.362 15.362 merge.py:3(merge1)
221309 0.030 0.000 0.030 0.000 {len}
90436 0.010 0.000 0.010 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
merge2
250004 function calls in 0.104 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.001 0.001 0.104 0.104 <string>:1(<module>)
1 0.074 0.074 0.103 0.103 merge.py:20(merge2)
50000 0.005 0.000 0.005 0.000 {len}
100000 0.010 0.000 0.010 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
100000 0.014 0.000 0.014 0.000 {method 'pop' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'reverse' of 'list' objects}
So for merge1, it's 221309 len calls and 90436 appends, and it takes 15.363 seconds.
For merge2, it's 50000 len calls, 100000 appends, and 100000 pops, and it takes 0.104 seconds.
len, append, and pop are all O(1) (more info here), so these profiles aren't showing what's actually taking the time; going by the call counts alone, merge1 should be slower, but only by ~20% or so.
Okay, the cause is actually fairly obvious if you just read the code.
In the first method, there is this line:
inv += len(left[i:])
so every time it runs, it has to build a new list just to count its elements. If you comment out this line (or just replace it with inv += 1 or something), merge1 becomes faster than the other method. This is the single line responsible for the increased time.
Having noticed the cause, we can fix the issue by improving the code. Change the line to this for a speedup; after doing so, merge1 is faster than merge2:
inv += len(left) - i
Updated, the function looks like this:
def merge3(left, right):
    i = j = inv = 0
    merged = []
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
            inv += len(left) - i
    merged += left[i:]
    merged += right[j:]
    return merged, inv
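To see the cost of the slice on its own, here is a quick timeit sketch (hypothetical setup; the absolute numbers will vary by machine, the gap is the point):

import timeit

setup = "left = range(50000); i = 10"
# len(left[i:]) builds a copy of ~50000 elements just to count them...
print timeit.timeit("len(left[i:])", setup=setup, number=1000)
# ...while len(left) - i is constant time.
print timeit.timeit("len(left) - i", setup=setup, number=1000)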
You can use the excellent cProfile module to help you solve things like this.
>>> import cProfile
>>> a = range(1,20000,2)
>>> b = range(0,20000,2)
>>> cProfile.run('merge1(a, b)')
70002 function calls in 0.195 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.184 0.184 0.195 0.195 <pyshell#7>:1(merge1)
1 0.000 0.000 0.195 0.195 <string>:1(<module>)
50000 0.008 0.000 0.008 0.000 {len}
19999 0.003 0.000 0.003 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
>>> cProfile.run('merge2(a, b)')
50004 function calls in 0.026 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.016 0.016 0.026 0.026 <pyshell#12>:1(merge2)
1 0.000 0.000 0.026 0.026 <string>:1(<module>)
10000 0.002 0.000 0.002 0.000 {len}
20000 0.003 0.000 0.003 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
20000 0.005 0.000 0.005 0.000 {method 'pop' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'reverse' of 'list' objects}
After looking at the information a bit, it looks like the commenters are correct: it's not the len function. Note that the <string>:1(<module>) entries just denote the statement passed to cProfile.run, which is compiled from a string, so nearly all of the time is actually being spent inside the merge functions' own bodies. Comparing the length of things, for instance, is essentially free:
>>> cProfile.run('0 < len(c)')
3 function calls in 0.000 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {len}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
Slicing a list, on the other hand, copies every element in the slice; a single slice looks quick here, but merge1 does one per element taken from the right list:
>>> len(c)
20000000
>>> cProfile.run('c[3:2000000]')
2 function calls in 0.011 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.011 0.011 0.011 0.011 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
TL;DR: the otherwise-unattributed time (0.195s in your first function, 0.026s in your second) is being spent in the function bodies themselves: apparently, in the rebuilding of the array by the line inv += len(left[i:]).
If I had to guess, I would say it probably has to do with the cost of removing elements from a list: removing from the end (pop()) is quicker than removing from the beginning, and the second version only ever removes elements from the end of the list.
See the performance notes at http://effbot.org/zone/python-list.htm:
"The time needed to remove an item is about the same as the time needed to insert an item at the same location; removing items at the end is fast, removing items at the beginning is slow."