Issue when computing Fibonacci numbers with recursion in Python

I have this code for computing fibonacci numbers using cache (dictionary).
import time

cache = {}

def dynamic_fib(n):
    print n
    if n == 0 or n == 1:
        return 1
    if not (n in cache):
        print "caching %d" % n
        cache[n] = dynamic_fib(n-1) + dynamic_fib(n-2)
    return cache[n]

if __name__ == "__main__":
    start = time.time()
    print "DYNAMIC: ", dynamic_fib(2000)
    print (time.time() - start)
It works fine with small numbers, but with more than 1000 as an input it seems to stop.
This is the result with 2000 as an input.
....
caching 1008
1007
caching 1007
1006
caching 1006
1005
caching 1005
This is a result with 1000 as an input.
....
8
caching 8
7
caching 7
6
caching 6
5
caching 5
It looks like it just hangs after about 995 entries have been stored in the dictionary.
What might be wrong here? What debugging technique can I use to see what went wrong in Python?
I run Python on Mac OS X 10.7.5 with 4 GB of RAM, so I think a few KB (or even MB) of memory usage shouldn't matter much.

Python has a default recursion limit set to 1000.
You need to increase it in your program.
import sys
sys.setrecursionlimit(5000)
From http://docs.python.org/2/library/sys.html#sys.setrecursionlimit:

sys.setrecursionlimit(limit)

Set the maximum depth of the Python interpreter stack to limit. This limit prevents infinite recursion from causing an overflow of the C stack and crashing Python.

The highest possible limit is platform-dependent. A user may need to set the limit higher when she has a program that requires deep recursion and a platform that supports a higher limit. This should be done with care, because a too-high limit can lead to a crash.
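Applied to the question's code, a minimal sketch (Python 3 syntax; 5000 is simply a value comfortably above the roughly 2000 stack frames this input needs):

import sys
import time

sys.setrecursionlimit(5000)  # raise the default limit of 1000

cache = {}

def dynamic_fib(n):
    if n == 0 or n == 1:
        return 1
    if n not in cache:
        cache[n] = dynamic_fib(n - 1) + dynamic_fib(n - 2)
    return cache[n]

if __name__ == "__main__":
    start = time.time()
    print("DYNAMIC:", dynamic_fib(2000))
    print(time.time() - start)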

You don't really gain anything by storing the cache in a dictionary, since to calculate f(n) you need to know f(n-1) (and f(n-2)); in other words, your dictionary will always have keys from 2 to n. You might as well just use a list instead (it's only an extra 2 elements). Here's a version which caches properly and doesn't hit the recursion limit (ever):
import time

cache = [1, 1]

def dynamic_fib(n):
    # print n
    if n >= len(cache):
        for i in range(len(cache), n):
            dynamic_fib(i)
        cache.append(dynamic_fib(n-1) + dynamic_fib(n-2))
        print "caching %d" % n
    return cache[n]

if __name__ == "__main__":
    start = time.time()
    a = dynamic_fib(4000)
    print "Dynamic", a
    print (time.time() - start)
Note that you could do the same thing with a dict, but I'm almost positive that a list will be faster.
Just for fun, here's a bunch of options (and timings!):
def fib_iter(n):
    a, b = 1, 1
    for i in xrange(n):
        a, b = b, a + b
    return a

memo_iter = [1, 1]

def fib_iter_memo(n):
    if n == 0:
        return 1
    else:
        try:
            return memo_iter[n+1]
        except IndexError:
            a, b = memo_iter[-2:]
            for i in xrange(len(memo_iter), n+2):
                a, b = b, a + b
                memo_iter.append(a)
            return memo_iter[-1]

dyn_cache = [1, 1]

def dynamic_fib(n):
    if n >= len(dyn_cache):
        for i in xrange(len(dyn_cache), n):
            dynamic_fib(i)
        dyn_cache.append(dynamic_fib(n-1) + dynamic_fib(n-2))
    return dyn_cache[n]

dyn_cache2 = [1, 1]

def dynamic_fib2(n):
    if n >= len(dyn_cache2):
        for i in xrange(len(dyn_cache2), n):
            dynamic_fib2(i)
        dyn_cache2.append(dyn_cache2[-1] + dyn_cache2[-2])
    return dyn_cache2[n]

cache_fibo = [1, 1]

def dyn_fib_simple(n):
    while len(cache_fibo) <= n:
        cache_fibo.append(cache_fibo[-1] + cache_fibo[-2])
    return cache_fibo[n]

import timeit

for func in ('dyn_fib_simple', 'dynamic_fib2', 'dynamic_fib', 'fib_iter_memo', 'fib_iter'):
    print timeit.timeit('%s(100)' % func, setup='from __main__ import %s' % func), func

print fib_iter(100)
print fib_iter_memo(100)
print fib_iter_memo(100)
print dynamic_fib(100)
print dynamic_fib2(100)
print dyn_fib_simple(100)
And the results:
0.269892930984 dyn_fib_simple
0.256865024567 dynamic_fib2
0.241492033005 dynamic_fib
0.222282171249 fib_iter_memo
7.23831701279 fib_iter
573147844013817084101
573147844013817084101
573147844013817084101
573147844013817084101
573147844013817084101
573147844013817084101

A recursion-free version:

def fibo(n):
    cache = [1, 1]
    while len(cache) < n:
        cache.append(cache[-1] + cache[-2])
    return cache
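A usage sketch - note that this returns the whole list rather than the nth number, so the last element is the largest Fibonacci number computed:

>>> fibo(10)
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
>>> fibo(10)[-1]
55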

It's probably because of the limit on stack depth, which results in a RuntimeError. You can increase the recursion limit by calling

sys.setrecursionlimit(<number>)

from the sys module.

Related

Time delta gives 0.0 output

I'm trying to get a time delta to time a linear search. When I run in debug mode, the delta variable logs the difference, but when run as a regular Python script the print doesn't give the right result: it prints 0.0 even if I change the target, which isn't an accurate time difference.
import time
import itertools

def linear_search(arry, target):
    for index, value in enumerate(arry):
        if value == target:
            return True
    return False

k = list(itertools.islice(range(2000000), 1, 2000000, 2))
start = time.time()
linear_search(k, 25)
done = time.time()
delta = done - start
print(delta)
Can someone help to find if there's anything wrong in the print statement?
The issue might be the resolution of time.time(), particularly on Windows, where the resolution is reportedly about 16 milliseconds.
There is a different module (timeit) that is more oriented to timing very short method calls.
Here is an example:
import timeit

setup = '''
import itertools

def linear_search(arry, target):
    for index, value in enumerate(arry):
        if value == target:
            return True
    return False

k = list(itertools.islice(range(2000000), 1, 2000000, 2))
n = 25
'''

print(timeit.timeit("linear_search(k, n)", setup=setup, number=1000))
This gives me about 0.000214 seconds for 1000 runs (note that timeit reports the total time, not the per-run average).
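If you want a single measurement rather than timeit's repeated runs, time.perf_counter() (available since Python 3.3) is a higher-resolution alternative to time.time(); a minimal sketch of the original measurement using it:

import time
import itertools

def linear_search(arry, target):
    for value in arry:
        if value == target:
            return True
    return False

k = list(itertools.islice(range(2000000), 1, 2000000, 2))

start = time.perf_counter()          # high-resolution, monotonic timer
linear_search(k, 25)
print(time.perf_counter() - start)   # small, but no longer rounds to 0.0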

Why is the count() method faster than a for loop in Python?

Here are 2 functions that do exactly the same thing, but does anyone know why the one using the count() method is much faster than the other? (I mean how does it work? How is it built?)
If possible, I'd like a more understandable answer than what's found here : Algorithm used to implement the Python str.count function
or what's in the source code : https://hg.python.org/cpython/file/tip/Objects/stringlib/fastsearch.h
def scoring1(seq):
    score = 0
    for i in range(len(seq)):
        if seq[i] == '0':
            score += 1
    return score

def scoring2(seq):
    score = 0
    score = seq.count('0')
    return score

seq = 'AATTGGCCGGGGAG0CTTC0CTCC000TTTCCCCGGAAA'

# takes 1 min 15 s when applied to 100 sequences longer than 100,000 characters
score1 = scoring1(seq)

# takes 10 s when applied to 100 sequences longer than 100,000 characters
score2 = scoring2(seq)
Thanks a lot for your reply
@CodeMonkey has already given the answer, but it is potentially interesting to note that your first function can be improved so that it runs about 20% faster:
import time, random

def scoring1(seq):
    score = 0
    for i in range(len(seq)):
        if seq[i] == '0':
            score += 1
    return score

def scoring2(seq):
    score = 0
    for x in seq:
        score += (x == '0')
    return score

def scoring3(seq):
    score = 0
    score = seq.count('0')
    return score

def test(n):
    seq = ''.join(random.choice(['0', '1']) for i in range(n))
    functions = [scoring1, scoring2, scoring3]
    for i, f in enumerate(functions):
        start = time.clock()
        s = f(seq)
        elapsed = time.clock() - start
        print('scoring' + str(i + 1) + ': ' + str(s) + ' computed in ' + str(elapsed) + ' seconds')

test(10**7)
Typical output:
scoring1: 5000742 computed in 0.9651326495293333 seconds
scoring2: 5000742 computed in 0.7998054195159483 seconds
scoring3: 5000742 computed in 0.03732172598339578 seconds
Both of the first two approaches are blown away by the built-in count().
Moral of the story: when you are not using an already optimized built-in method, you need to optimize your own code.
Because count is executed in the underlying native implementation. The for-loop is executed in slower interpreted code.
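One way to make that difference visible (a sketch using the standard dis module): disassembling both approaches shows the loop executes several bytecode instructions per character, while the count() version is a single method call that runs entirely in C:

import dis

def loop_count(seq):
    score = 0
    for ch in seq:
        if ch == '0':
            score += 1
    return score

def builtin_count(seq):
    return seq.count('0')

dis.dis(loop_count)      # many instructions, executed once per character
dis.dis(builtin_count)   # one call into the C-implemented str.count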

Python Random Function without using random module

I need to write the function

random_number(minimum, maximum)

without using the random module, and I did this:
import time

def random_number(minimum, maximum):
    now = str(time.clock())
    rnd = float(now[::-1][:3:]) / 1000
    return minimum + rnd * (maximum - minimum)
I am not sure this is fine. Is there a known way to do it with the time? The thing is, I need to do something that somehow uses the time.
You could generate randomness based on a clock drift:
import struct
import time

def lastbit(f):
    return struct.pack('!f', f)[-1] & 1

def getrandbits(k):
    "Return k random bits using a relative drift of two clocks."
    # assume time.sleep() and time.clock() use different clocks
    # though it might work even if they use the same clock
    # XXX it does not produce "good" random bits, see below for details
    result = 0
    for _ in range(k):
        time.sleep(0)
        result <<= 1
        result |= lastbit(time.clock())
    return result
Once you have getrandbits(k), it is straightforward to get a random integer in range [a, b], including both end points. Based on CPython Lib/random.py:
def randint(a, b):
    "Return random integer in range [a, b], including both end points."
    return a + randbelow(b - a + 1)

def randbelow(n):
    "Return a random int in the range [0, n). Raises ValueError if n <= 0."
    # from Lib/random.py
    if n <= 0:
        raise ValueError
    k = n.bit_length()  # don't use (n - 1) here because n can be 1
    r = getrandbits(k)  # 0 <= r < 2**k
    while r >= n:       # avoid skew
        r = getrandbits(k)
    return r
Example: to generate 20 random numbers from 10 to 110 inclusive:
print(*[randint(10, 110) for _ in range(20)])
Output:
11 76 66 58 107 102 73 81 16 58 43 107 108 98 17 58 18 107 107 77
If getrandbits(k) returns k random bits then randint(a, b) should work as is (no skew due to modulo, etc).
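To see why the rejection loop in randbelow matters, here is a small illustration of modulo bias (a sketch): folding 3 random bits (values 0-7) into [0, 6) with % makes 0 and 1 twice as likely, while rejecting r >= 6 keeps the distribution uniform:

from collections import Counter

# Folding 8 equally likely values onto 6 buckets with %:
print(Counter(r % 6 for r in range(8)))
# Counter({0: 2, 1: 2, 2: 1, 3: 1, 4: 1, 5: 1})  -- 0 and 1 are over-represented

# randbelow instead retries whenever r >= 6, so each surviving
# value (0 through 5) remains equally likely.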
To test the quality of getrandbits(k), dieharder utility could be used:
$ python3 random-from-time.py | dieharder -a -g 200
where random-from-time.py generates infinite (random) binary stream:
#!/usr/bin/env python3
def write_random_binary_stream(write):
    while True:
        write(getrandbits(32).to_bytes(4, 'big'))

if __name__ == "__main__":
    import sys
    write_random_binary_stream(sys.stdout.buffer.write)
where getrandbits(k) is defined above.
The above assumes that you are not allowed to use os.urandom() or ssl.RAND_bytes(), or some known PRNG algorithm such as Mersenne Twister to implement getrandbits(k).
getrandbits(n) implemented using "time.sleep() + time.clock()" fails dieharder tests (too many to be a coincidence).
The idea is still sound: a clock drift may be used as a source of randomness (entropy) but you can't use it directly (the distribution is not uniform and/or some bits are dependent); the bits could be passed as a seed to a PRNG that accepts an arbitrary entropy source instead. See "Mixing" section.
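As a rough illustration of that last idea (a minimal sketch, not the answer's method): a clock-derived value seeds a simple linear congruential generator, here with the common Numerical Recipes constants. An LCG is not cryptographically secure; this only shows the "entropy feeds a PRNG" shape:

import time

class LCG(object):
    """Minimal linear congruential generator (Numerical Recipes constants)."""
    def __init__(self, seed):
        self.state = seed & 0xFFFFFFFF

    def next_u32(self):
        self.state = (1664525 * self.state + 1013904223) & 0xFFFFFFFF
        return self.state

# Clock-derived entropy goes into the seed instead of being used directly.
seed = int(time.time() * 1e6)
rng = LCG(seed)
print([rng.next_u32() % 100 for _ in range(10)])  # ten ints in [0, 100)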
Are you allowed to read random data from some special file? Under Linux, the file /dev/urandom provides a convenient way to get random bytes. You could write:

import struct

f = open("/dev/urandom", "r")
n = struct.unpack("i", f.read(4))[0]

This will not work under Windows, however.
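A cross-platform variant of the same idea (a sketch): os.urandom works on Windows as well, if OS-provided randomness is allowed at all:

import os
import struct

n = struct.unpack("i", os.urandom(4))[0]  # 4 random bytes -> signed int
print(n)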
The idea is to get a number between 0 and 1 using the time module, and use that to get a number in the range. The following will print 20 random numbers in the range 20 to 60:
from time import time

def time_random():
    return time() - float(str(time()).split('.')[0])

def gen_random_range(min, max):
    return int(time_random() * (max - min) + min)

if __name__ == '__main__':
    for i in range(20):
        print gen_random_range(20, 60)
Here we need to understand one thing: a random variable is generated from values that change at run time, and for that we need the time module.
time.time() gives a value with roughly 17 digits; since we need milliseconds, we multiply by 1000.
If I need values from 0-10, I need a value less than 10, which means taking it modulo 10 (it is a float, so we convert to an int):

int(time.time() % 10)
import time

def rand_val(x):
    random = int(time.time() * 1000)
    random %= x
    return random

x = int(input())
print(rand_val(x))
Use an API, if allowed:
import urllib2

def get_random(x, y):
    url = 'http://www.random.org/integers/?num=1&min=[min]&max=[max]&col=1&base=10&format=plain&rnd=new'
    url = url.replace("[min]", str(x))
    url = url.replace("[max]", str(y))
    response = urllib2.urlopen(url)
    num = response.read()
    return num.strip()

print get_random(1, 1000)
import datetime

def rand(s, n):
    '''
    Create a random number in the given range; the maximum range is 6 digits.
    '''
    s = int(s)
    n = int(n)
    list_sec = datetime.datetime.now()
    last_el = str(list_sec).split('.')[-1]
    len_str = len(str(n))
    get_number_elements = last_el[-int(len_str):]
    try:
        if int(get_number_elements) <= n and int(get_number_elements) >= s:
            return get_number_elements
        else:
            max_value = int('9' * len_str)
            res = s + int(get_number_elements) * (n - s) / max_value
            return res
    except Exception as e:
        print(e)
To find random values in a range (x, y): subtract the low end from the high end and store the span; find a random value from 0 up to that span; then add that value to the low end (lowrange + x, where x is the random value).
import time

def rand_val(x, y):
    sub = y - x
    random = int(time.time() * 1000)
    random %= sub
    random += x
    return random

x = int(input())
y = int(input())
print(rand_val(x, y))

Track and display percentage of code already executed

I have a very large program that takes some time to run. In order to make sure the process hasn't stalled somewhere, I print to screen the percentage of the code that has already been executed, which depends on a for loop and an integer.
To display the percentage of the for loop already processed I use flags to indicate how much of the loop already passed.
The MWE might make it a bit more clear:
import time

N = 100
flag_15, flag_30, flag_45, flag_60, flag_75, flag_90 = False, False, \
    False, False, False, False

for i in range(N):
    # Large block of code.
    time.sleep(0.1)
    if i + 1 >= 0.15 * N and flag_15 is False:
        print '15%'
        flag_15 = True
    elif i + 1 >= 0.3 * N and flag_30 is False:
        print '30%'
        flag_30 = True
    elif i + 1 >= 0.45 * N and flag_45 is False:
        print '45%'
        flag_45 = True
    elif i + 1 >= 0.6 * N and flag_60 is False:
        print '60%'
        flag_60 = True
    elif i + 1 >= 0.75 * N and flag_75 is False:
        print '75%'
        flag_75 = True
    elif i + 1 >= 0.9 * N and flag_90 is False:
        print '90%'
        flag_90 = True
    elif i + 1 == N:
        print '100%'
This works but is quite verbose and truly ugly. I was wondering if there might be a better/prettier way of doing this.
I like to use modulus to periodically print status messages.
import time

N = 100
for i in range(N):
    # do work here
    if i % 15 == 0:
        print "{}% complete".format(int(100 * i / N))

print "100% complete"
Result:
0% complete
15% complete
30% complete
45% complete
60% complete
75% complete
90% complete
100% complete
For values of N other than 100, if you want to print every 15%, you'll have to calculate the stride dynamically instead of using the literal 15:
import time
import math

N = 300
percentage_step = 15
stride = N * percentage_step / 100

for i in range(N):
    # do work
    if i % stride == 0:
        print "{}% complete".format(int(100 * i / N))
(Posting a second answer because this solution uses a completely different technique)
You could create a list of milestone values, and print a message when the percentage complete reaches the lowest value.
milestones = [15, 30, 45, 60, 75, 90, 100]
for i in range(N):
    # do work here
    percentage_complete = (100.0 * (i + 1) / N)
    while len(milestones) > 0 and percentage_complete >= milestones[0]:
        print "{}% complete".format(milestones[0])
        # remove that milestone from the list
        milestones = milestones[1:]
Result:
15% complete
30% complete
45% complete
60% complete
75% complete
90% complete
100% complete
Unlike the "stride" method I posted earlier, here you have precise control over which percentages are printed. They don't need to be evenly spaced, they don't need to be divisible by N, they don't even need to be integers! You could do milestones = [math.pi, 4.8, 15.16, 23.42, 99] if you wanted.
You can use a combination of write() and flush() for a nice progress bar:

import sys
import time

for i in range(100):
    row = "=" * i + ">"
    sys.stdout.write("%s\r%d%%" % (row, i + 1))
    sys.stdout.flush()
    time.sleep(0.1)

sys.stdout.write("\n")
Progress will be displayed like this:
69%====================================================================>
You don't need any flags. You can just print the completion based on the current value of i.
for i in range(N):
    # lots of code
    print '{0}% completed.'.format((i + 1) * 100.0 / N)
Just add a "\r" in Misha's answer:
import sys
import time

for i in range(100):
    row = "=" * i + ">"
    sys.stdout.write("%s\r %d%%\r" % (row, i + 1))
    sys.stdout.flush()
    time.sleep(0.1)

sys.stdout.write("\n")
Output:
65%======================================================>
In colab.research.google.com it works like this:

import sys
import time

for i in range(100):
    row = "=" * i + ">"
    sys.stdout.write("\r %d%% %s " % (i + 1, row))
    sys.stdout.flush()
    time.sleep(0.1)

sys.stdout.write("\n")

Create a list with initial capacity in Python

Code like this often happens:
l = []
while foo:
    # baz
    l.append(bar)
    # qux
This is really slow if you're about to append thousands of elements to your list, as the list will have to be constantly resized to fit the new elements.
In Java, you can create an ArrayList with an initial capacity. If you have some idea how big your list will be, this will be a lot more efficient.
I understand that code like this can often be refactored into a list comprehension. If the for/while loop is very complicated, though, this is unfeasible. Is there an equivalent for us Python programmers?
Warning: This answer is contested. See comments.
def doAppend(size=10000):
    result = []
    for i in range(size):
        message = "some unique object %d" % (i,)
        result.append(message)
    return result

def doAllocate(size=10000):
    result = size * [None]
    for i in range(size):
        message = "some unique object %d" % (i,)
        result[i] = message
    return result
Results (each function evaluated 144 times, durations averaged):
simple append 0.0102
pre-allocate 0.0098
Conclusion. It barely matters.
Premature optimization is the root of all evil.
Python lists have no built-in pre-allocation. If you really need to make a list, and need to avoid the overhead of appending (and you should verify that you do), you can do this:
l = [None] * 1000  # make a list of 1000 Nones
for i in xrange(1000):
    # baz
    l[i] = bar
    # qux
Perhaps you could avoid the list by using a generator instead:
def my_things():
    while foo:
        # baz
        yield bar
        # qux

for thing in my_things():
    # do something with thing
This way, the list isn't ever stored in memory at all; its values are merely generated as needed.
Short version: use
pre_allocated_list = [None] * size
to preallocate a list (that is, to be able to address 'size' elements of the list instead of gradually forming the list by appending). This operation is very fast, even on big lists. Allocating new objects that will be later assigned to list elements will take much longer and will be the bottleneck in your program, performance-wise.
Long version:
I think that initialization time should be taken into account.
Since in Python everything is a reference, it doesn't matter whether you set each element to None or to some string - either way it's only a reference. Though it will take longer if you want to create a new object for each element to reference.
For Python 3.2:
import time
import copy

def print_timing(func):
    def wrapper(*arg):
        t1 = time.time()
        res = func(*arg)
        t2 = time.time()
        print("{} took {} ms".format(func.__name__, (t2 - t1) * 1000.0))
        return res
    return wrapper

@print_timing
def prealloc_array(size, init=None, cp=True, cpmethod=copy.deepcopy, cpargs=(), use_num=False):
    result = [None] * size
    if init is not None:
        if cp:
            for i in range(size):
                result[i] = init
        else:
            if use_num:
                for i in range(size):
                    result[i] = cpmethod(i)
            else:
                for i in range(size):
                    result[i] = cpmethod(cpargs)
    return result

@print_timing
def prealloc_array_by_appending(size):
    result = []
    for i in range(size):
        result.append(None)
    return result

@print_timing
def prealloc_array_by_extending(size):
    result = []
    none_list = [None]
    for i in range(size):
        result.extend(none_list)
    return result

def main():
    n = 1000000
    x = prealloc_array_by_appending(n)
    y = prealloc_array_by_extending(n)
    a = prealloc_array(n, None)
    b = prealloc_array(n, "content", True)
    c = prealloc_array(n, "content", False, "some object {}".format, ("blah"), False)
    d = prealloc_array(n, "content", False, "some object {}".format, None, True)
    e = prealloc_array(n, "content", False, copy.deepcopy, "a", False)
    f = prealloc_array(n, "content", False, copy.deepcopy, (), False)
    g = prealloc_array(n, "content", False, copy.deepcopy, [], False)
    print("x[5] = {}".format(x[5]))
    print("y[5] = {}".format(y[5]))
    print("a[5] = {}".format(a[5]))
    print("b[5] = {}".format(b[5]))
    print("c[5] = {}".format(c[5]))
    print("d[5] = {}".format(d[5]))
    print("e[5] = {}".format(e[5]))
    print("f[5] = {}".format(f[5]))
    print("g[5] = {}".format(g[5]))

if __name__ == '__main__':
    main()
Evaluation:
prealloc_array_by_appending took 118.00003051757812 ms
prealloc_array_by_extending took 102.99992561340332 ms
prealloc_array took 3.000020980834961 ms
prealloc_array took 49.00002479553223 ms
prealloc_array took 316.9999122619629 ms
prealloc_array took 473.00004959106445 ms
prealloc_array took 1677.9999732971191 ms
prealloc_array took 2729.999780654907 ms
prealloc_array took 3001.999855041504 ms
x[5] = None
y[5] = None
a[5] = None
b[5] = content
c[5] = some object blah
d[5] = some object 5
e[5] = a
f[5] = []
g[5] = ()
As you can see, just making a big list of references to the same None object takes very little time.
Appending or extending takes longer (I didn't average anything, but after running this a few times I can tell you that extending and appending take roughly the same time).
Allocating a new object for each element - that is what takes the most time. And S.Lott's answer does that: it formats a new string every time. This is not strictly required - if you want to preallocate some space, just make a list of None, then assign data to list elements at will. Either way it takes more time to generate data than to append/extend a list, whether you generate it while creating the list or after. But if you want a sparsely-populated list, then starting with a list of None is definitely faster.
The Pythonic way for this is:
x = [None] * numElements
Or whatever default value you wish to prepopulate with, e.g.
bottles = [Beer()] * 99
sea = [Fish()] * many
vegetarianPizzas = [None] * peopleOrderingPizzaNotQuiche
(Caveat Emptor: The [Beer()] * 99 syntax creates one Beer and then populates an array with 99 references to the same single instance)
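A quick demonstration of that caveat (a sketch; the Beer class here is hypothetical):

class Beer(object):
    def __init__(self):
        self.full = True

bottles = [Beer()] * 3              # three references to ONE Beer instance
bottles[0].full = False
print([b.full for b in bottles])    # [False, False, False] - all affected

fresh = [Beer() for _ in range(3)]  # three distinct Beer instances
fresh[0].full = False
print([b.full for b in fresh])      # [False, True, True]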
Python's default approach can be pretty efficient, although that efficiency decays as you increase the number of elements.
Compare
import time

class Timer(object):
    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, *args):
        end = time.time()
        secs = end - self.start
        msecs = secs * 1000  # Millisecs
        print('%fms' % msecs)

Elements = 100000
Iterations = 144

print('Elements: %d, Iterations: %d' % (Elements, Iterations))

def doAppend():
    result = []
    i = 0
    while i < Elements:
        result.append(i)
        i += 1

def doAllocate():
    result = [None] * Elements
    i = 0
    while i < Elements:
        result[i] = i
        i += 1

def doGenerator():
    return list(i for i in range(Elements))

def test(name, fn):
    print("%s: " % name, end="")
    with Timer() as t:
        x = 0
        while x < Iterations:
            fn()
            x += 1

test('doAppend', doAppend)
test('doAllocate', doAllocate)
test('doGenerator', doGenerator)
with
#include <vector>

typedef std::vector<unsigned int> Vec;

static const unsigned int Elements = 100000;
static const unsigned int Iterations = 144;

void doAppend()
{
    Vec v;
    for (unsigned int i = 0; i < Elements; ++i) {
        v.push_back(i);
    }
}

void doReserve()
{
    Vec v;
    v.reserve(Elements);
    for (unsigned int i = 0; i < Elements; ++i) {
        v.push_back(i);
    }
}

void doAllocate()
{
    Vec v;
    v.resize(Elements);
    for (unsigned int i = 0; i < Elements; ++i) {
        v[i] = i;
    }
}

#include <iostream>
#include <chrono>

using namespace std;

void test(const char* name, void(*fn)(void))
{
    cout << name << ": ";
    auto start = chrono::high_resolution_clock::now();
    for (unsigned int i = 0; i < Iterations; ++i) {
        fn();
    }
    auto end = chrono::high_resolution_clock::now();
    auto elapsed = end - start;
    cout << chrono::duration<double, milli>(elapsed).count() << "ms\n";
}

int main()
{
    cout << "Elements: " << Elements << ", Iterations: " << Iterations << '\n';
    test("doAppend", doAppend);
    test("doReserve", doReserve);
    test("doAllocate", doAllocate);
}
On my Windows 7 Core i7, 64-bit Python gives
Elements: 100000, Iterations: 144
doAppend: 3587.204933ms
doAllocate: 2701.154947ms
doGenerator: 1721.098185ms
While C++ gives (built with Microsoft Visual C++, 64-bit, optimizations enabled)
Elements: 100000, Iterations: 144
doAppend: 74.0042ms
doReserve: 27.0015ms
doAllocate: 5.0003ms
C++ debug build produces:
Elements: 100000, Iterations: 144
doAppend: 2166.12ms
doReserve: 2082.12ms
doAllocate: 273.016ms
The point here is that with Python you can achieve a significant performance improvement (roughly 25% in the run above), and if you're writing a high-performance application (or something that is used in a web service), that isn't to be sniffed at; but you may need to rethink your choice of language.
Also, the Python code here isn't really Python code. Switching to truly Pythonesque code here gives better performance:
import time

class Timer(object):
    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, *args):
        end = time.time()
        secs = end - self.start
        msecs = secs * 1000  # millisecs
        print('%fms' % msecs)

Elements = 100000
Iterations = 144

print('Elements: %d, Iterations: %d' % (Elements, Iterations))

def doAppend():
    for x in range(Iterations):
        result = []
        for i in range(Elements):
            result.append(i)

def doAllocate():
    for x in range(Iterations):
        result = [None] * Elements
        for i in range(Elements):
            result[i] = i

def doGenerator():
    for x in range(Iterations):
        result = list(i for i in range(Elements))

def test(name, fn):
    print("%s: " % name, end="")
    with Timer() as t:
        fn()

test('doAppend', doAppend)
test('doAllocate', doAllocate)
test('doGenerator', doGenerator)
Which gives
Elements: 100000, Iterations: 144
doAppend: 2153.122902ms
doAllocate: 1346.076965ms
doGenerator: 1614.092112ms
(in 32-bit, doGenerator does better than doAllocate).
Here the gap between doAppend and doAllocate is significantly larger.
Obviously, the differences here really only apply if you are doing this more than a handful of times, or if you are doing it on a heavily loaded system where those numbers get scaled out by orders of magnitude, or if you are dealing with considerably larger lists.
The point here: Do it the Pythonic way for the best performance.
But if you are worrying about general, high-level performance, Python is the wrong language. The most fundamental problem is that Python function calls have traditionally been up to 300x slower than in other languages, due to Python features like decorators, etc. (PythonSpeed/PerformanceTips, Data Aggregation).
As others have mentioned, the simplest way to preseed a list is with NoneType objects.
That being said, you should understand the way Python lists actually work before deciding this is necessary.
In the CPython implementation of a list, the underlying array is always created with overhead room, in progressively larger sizes (4, 8, 16, 25, 35, 46, 58, 72, 88, 106, 126, 148, 173, 201, 233, 269, 309, 354, 405, 462, 526, 598, 679, 771, 874, 990, 1120, etc.), so that resizing the list does not happen nearly so often.
Because of this behavior, most list.append() operations are O(1) amortized; complexity only increases when one of these boundaries is crossed, at which point it is O(n). This behavior is what leads to the minimal increase in execution time in S.Lott's answer.
Source: Python list implementation
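You can observe those growth steps directly with sys.getsizeof (a sketch; the exact byte counts and boundaries are CPython-version- and platform-specific):

import sys

l = []
last = sys.getsizeof(l)
for _ in range(64):
    l.append(None)
    size = sys.getsizeof(l)
    if size != last:  # a reallocation just happened
        print("len %2d -> %d bytes" % (len(l), size))
        last = size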
I ran S.Lott's code and saw the same ~10% performance increase from preallocating. I also tried Ned Batchelder's idea of using a generator, and the generator performed better than doAllocate. For my project the 10% improvement matters, so thanks to everyone - this helps a bunch.
def doAppend(size=10000):
    result = []
    for i in range(size):
        message = "some unique object %d" % (i,)
        result.append(message)
    return result

def doAllocate(size=10000):
    result = size * [None]
    for i in range(size):
        message = "some unique object %d" % (i,)
        result[i] = message
    return result

def doGen(size=10000):
    return list("some unique object %d" % (i,) for i in xrange(size))

size = 1000

@print_timing  # the print_timing decorator defined in the answer above
def testAppend():
    for i in xrange(size):
        doAppend()

@print_timing
def testAlloc():
    for i in xrange(size):
        doAllocate()

@print_timing
def testGen():
    for i in xrange(size):
        doGen()

testAppend()
testAlloc()
testGen()
Output
testAppend took 14440.000ms
testAlloc took 13580.000ms
testGen took 13430.000ms
Concerns about preallocation in Python arise if you're working with NumPy, which has more C-like arrays. In this instance, preallocation concerns are about the shape of the data and the default value.
Consider NumPy if you're doing numerical computation on massive lists and want performance.
Python's list doesn't support preallocation. Numpy allows you to preallocate memory, but in practice it doesn't seem to be worth it if your goal is to speed up the program.
This test simply writes an integer into the list, but in a real application you'd likely do more complicated things per iteration, which further reduces the importance of the memory allocation.
import timeit
import numpy as np

def list_append(size=1_000_000):
    result = []
    for i in range(size):
        result.append(i)
    return result

def list_prealloc(size=1_000_000):
    result = [None] * size
    for i in range(size):
        result[i] = i
    return result

def numpy_prealloc(size=1_000_000):
    result = np.empty(size, np.int32)
    for i in range(size):
        result[i] = i
    return result

setup = 'from __main__ import list_append, list_prealloc, numpy_prealloc'
print(timeit.timeit('list_append()', setup=setup, number=10))     # 0.79
print(timeit.timeit('list_prealloc()', setup=setup, number=10))   # 0.62
print(timeit.timeit('numpy_prealloc()', setup=setup, number=10))  # 0.73
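Note that the Python-level loop in numpy_prealloc gives away most of NumPy's advantage; when the per-element work really is this simple, a vectorized call avoids the loop entirely (a sketch, with np.arange standing in for the fill):

import numpy as np

# One C-level call instead of a million Python iterations.
result = np.arange(1_000_000, dtype=np.int32)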
For some applications, a dictionary may be what you are looking for. For example, in the find_totients function below, I found it more convenient to use a dictionary since I didn't have a zero index.
import math

def totient(n):
    totient = 0
    if n == 1:
        totient = 1
    else:
        for i in range(1, n):
            if math.gcd(i, n) == 1:
                totient += 1
    return totient

def find_totients(max):
    totients = dict()
    for i in range(1, max + 1):
        totients[i] = totient(i)

    print('Totients:')
    for i in range(1, max + 1):
        print(i, totients[i])
This problem could also be solved with a preallocated list:
def find_totients(max):
    totients = [None] * (max + 1)
    for i in range(1, max + 1):
        totients[i] = totient(i)

    print('Totients:')
    for i in range(1, max + 1):
        print(i, totients[i])
I feel that this is not as elegant and is prone to bugs, because I'm storing None values, which could throw an exception if I accidentally use them the wrong way, and because I need to think about edge cases that the dictionary lets me avoid.
It's true the dictionary won't be as efficient, but as others have commented, small differences in speed are not always worth significant maintenance hazards.
Fastest way: use *, as in list1 = [False] * 1_000_000
Comparing all the common methods (list comprehension vs. preallocation vs. for vs. while), I found that using * gives the most efficient execution time.
import time

large_int = 10_000_000
start_time = time.time()

# Test 1: List comprehension
l1 = [False for _ in range(large_int)]
end_time_1 = time.time()

# Test 2: Using *
l2 = [False] * large_int
end_time_2 = time.time()

# Test 3: Using append with for loop & range
l3 = []
for _ in range(large_int):
    l3.append(False)
end_time_3 = time.time()

# Test 4: Using append with while loop
l4, i = [], 0
while i < large_int:
    l4.append(False)
    i += 1
end_time_4 = time.time()

# Results
diff_1 = end_time_1 - start_time
diff_2 = end_time_2 - end_time_1
diff_3 = end_time_3 - end_time_2
diff_4 = end_time_4 - end_time_3
print(f"Test 1. {diff_1:.4f} seconds")
print(f"Test 2. {diff_2:.4f} seconds")
print(f"Test 3. {diff_3:.4f} seconds")
print(f"Test 4. {diff_4:.4f} seconds")
print("\nTest 2 is faster than - ")
print(f" Test 1 by - {((diff_1 / diff_2) - 1) * 100:,.0f}%")
print(f" Test 3 by - {((diff_3 / diff_2) - 1) * 100:,.0f}%")
print(f" Test 4 by - {((diff_4 / diff_2) - 1) * 100:,.0f}%")
From what I understand, Python lists are already quite similar to ArrayLists. But if you want to tweak those parameters I found this post on the Internet that may be interesting (basically, just create your own ScalableList extension):
http://mail.python.org/pipermail/python-list/2000-May/035082.html
