This question already has answers here:
Sieve of Eratosthenes - Finding Primes Python
(22 answers)
Closed 4 years ago.
I need to generate a large number of prime numbers, however it is taking far too long using the Sieve of Eratosthenes. Currently it takes roughly 3 seconds to generate primes under 100,000 and roughly 30 for primes under 1,000,000. Which seems to indicate an O(n) complexity but as far as I know that's not right. Code:
def generate_primes(limit):
boolean_list = [False] * 2 + [True] * (limit - 1)
for n in range(2, int(limit ** 0.5 + 1)):
if boolean_list[n] == True:
for i in range(n ** 2, limit + 1, n):
boolean_list[i] = False
Am I missing something obvious? How can I improve the performance of the sieve?
Loop indexing is well known in Python to be an incredibly slow operation. By replacing a loop with array slicing, and a list with a Numpy array, we see increases # 3x:
import numpy as np
import timeit
def generate_primes_original(limit):
boolean_list = [False] * 2 + [True] * (limit - 1)
for n in range(2, int(limit ** 0.5 + 1)):
if boolean_list[n] == True:
for i in range(n ** 2, limit + 1, n):
boolean_list[i] = False
return np.array(boolean_list,dtype=np.bool)
def generate_primes_fast(limit):
boolean_list = np.array([False] * 2 + [True] * (limit - 1),dtype=bool)
for n in range(2, int(limit ** 0.5 + 1)):
if boolean_list[n]:
boolean_list[n*n:limit+1:n] = False
return boolean_list
limit = 1000
print(timeit.timeit("generate_primes_fast(%d)"%limit, setup="from __main__ import generate_primes_fast"))
# 30.90620080102235 seconds
print(timeit.timeit("generate_primes_original(%d)"%limit, setup="from __main__ import generate_primes_original"))
# 91.12803511600941 seconds
assert np.array_equal(generate_primes_fast(limit),generate_primes_original(limit))
# [nothing to stdout - they are equal]
To gain even more speed, one option is to use numpy vectorization. Looking at the outer loop, it's not immediately obvious how one could vectorize that.
Second, you will see dramatic speed-ups if you port to Cython, which should be a fairly seamless process.
Edit: you may also see improvements by changing things like n**2 => math.pow(n,2), but minor improvements like that are inconsequential compared to the bigger problem, which is the iterator.
If your are still using Python 2 use xrange instead of range for greater speed
Related
I am calculating the run time complexity of the below Python code as n^2 but it seems that it is incorrect. The correct answer shown is n(n-1)/2. Can anyone help help me in understanding why the inner loop is not running n*n times but rather n(n-1)/2?
for i in range(n):
for j in range(i):
val += 1
On the first run of the inner loop, when i = 0, the for loop does not execute at all. (There are zero elements to iterate over in range(0)).
On the second run of the inner loop, when i = 1, the for loop executes for one iteration. (There is one element to iterate over in range(1)).
Then, there are two elements in the third run, three in the fourth, following this pattern, up until i = n - 1.
The total number of iterations is 0 + 1 + ... + (n - 1). This summation has a well known form*: for any natural m, 0 + 1 + ... + m = m * (m + 1) / 2. In this case, we have m = n - 1, so there are n * (n - 1) / 2 iterations in total.
You're right that the asymptotic time complexity of this algorithm is O(n^2). However, given that you've said that the "correct answer" is n * (n - 1) / 2, the question most likely asked for the exact number of iterations (rather than an asymptotic bound).
* See here for a proof of this closed form.
def s_r(n):
if(n==1):
return 1
temp = 1
for i in range(int(n)):
temp += i
return temp * s_r(n/2) * s_r(n/2) * s_r(n/2) * s_r(n/2)
Using recursion tree what’s the Big Oh
And also how do we write this function into one recursive call.
I could only do the first part which I got O(n^2). This was one my exam questions which I want to know the answer and guidance to. Thank you
First, note that the program is incorrect: n/2 is floating point division in python (since Python3), so the program does not terminate unless n is a power of 2 (or eventually rounds to a power of 2). The corrected version of the program that works for all integer n>=1, is this:
def s_r(n):
if(n==1):
return 1
temp = 1
for i in range(n):
temp += i
return temp * s_r(n//2) * s_r(n//2) * s_r(n//2) * s_r(n//2)
If T(n) is the number of arithmetic operations performed by the function, then we have the recurrence relation:
T(1) = 0
T(n) = n + 4T(n//2)
T(n) is Theta(n^2) -- for n a power of 2 and telescoping we get: n + 4(n/2) + 16(n/4) + ... = 1n + 2n + 4n + 8n + ... + nn = (2n-1)n
We can rewrite the program to use Theta(log n) arithmetic operations. First, the temp variable is 1 + 0+1+...+(n-1) = n(n-1)/2 + 1. Second, we can avoid making the same recursive call 4 times.
def s_r(n):
return (1 + n * (n-1) // 2) * s_r(n//2) ** 4 if n > 1 else 1
Back to complexity, I have been careful to say that the first function does Theta(n^2) arithmetic operations and the second Theta(log n) arithmetic operations, rather than use the expression "time complexity". The result of the function grows FAST, so it's not practical to assume arithmetic operations run in O(1) time. If we print the length of the result and the time taken to compute it (using the second, faster version of the code) for powers of 2 using this...
import math
import timeit
def s_r(n):
return (1 + n * (n-1) // 2) * s_r(n//2) ** 4 if n > 1 else 1
for i in range(16):
n = 2 ** i
start = timeit.default_timer()
r = s_r(n)
end = timeit.default_timer()
print('n=2**%d,' % i, 'digits=%d,' % (int(math.log(r))+1), 'time=%.3gsec' % (end - start))
... we get this table, in which you can see the number of digits in the result grows FAST (s_r(2**14) has 101 million digits for example), and the time measured does not grow with log(n), but when n is doubled the time increases by something like a factor of 10, so it grows something like n^3 or n^4.
n=2**0, digits=1, time=6e-07sec
n=2**1, digits=1, time=1.5e-06sec
n=2**2, digits=5, time=9e-07sec
n=2**3, digits=23, time=1.1e-06sec
n=2**4, digits=94, time=2.1e-06sec
n=2**5, digits=382, time=2.6e-06sec
n=2**6, digits=1533, time=3.8e-06sec
n=2**7, digits=6140, time=3.99e-05sec
n=2**8, digits=24569, time=0.000105sec
n=2**9, digits=98286, time=0.000835sec
n=2**10, digits=393154, time=0.00668sec
n=2**11, digits=1572628, time=0.0592sec
n=2**12, digits=6290527, time=0.516sec
n=2**13, digits=25162123, time=4.69sec
n=2**14, digits=100648510, time=42.1sec
n=2**15, digits=402594059, time=377sec
Note that it's not wrong to say the time complexity of the original function is O(n^2) and the improved version of the code O(log n), it's just that these describe a measure of the program (arithmetic operations) and are not at all useful as an estimate of actual program running time.
As #kcsquared said in comments I also believe this function is O(n²). This code can be refactored into one function call by storing the result of recursion call, or just doing some math application. Also you can simplify the range sum, by using the built-in sum
def s_r(n):
if n == 1:
return 1
return (sum(range(int(n))) + 1) * s_r(n/2) ** 4
I have this python code for finding the longest substring. I'm trying to figure out the asymptotic run time of it and I've arrived at an answer but I'm not sure if it's correct. Here is the code:
def longest_substring(s, t):
best = ' '
for s_start in range(0, len(s)):
for s_end in range(s_start, len(s)+1):
for t_start in range(0, len(t)):
for t_end in range(t_start, len(t)+1):
if s[s_start:s_end] == t[t_start:t_end]:
current = s[s_start:s_end]
if len(current) > len(best):
best = current
return best
Obviously this function has a very slow run time. It was designed that way. My approach was that because there is a for loop with 3 more nested for-loops, the run-time is something like O(n^4). I am not sure if this is correct due to not every loop iterating over the input size. Also, it is to be assumed that s = t = n(input size). Any ideas?
If you're not convinced that it's O(n^5), try calculating how many loops you run through for string s alone (i.e. the outer two loops). When s_start == 0, the inner loop runs n + 1 times; when s_start == 1, the inner loop runs n times, and so on, until s_start = n - 1, for which the inner loop runs twice.
The sum
(n + 1) + (n) + (n - 1) + ... + 2
is an arithmetic series for which the formula is
((n + 1) + 2) * n / 2
which is O(n^2).
An additional n factor comes from s[s_start:s_end] == t[t_start:t_end], which is O(n).
I recently downloaded the bitarray module from here, for a faster prime sieve, but the results are dismal.
from bitarray import bitarray
from numpy import ones
from timeit import timeit
def sieve1(n = 1000000):
'''Sieve of Eratosthenes (bitarray)'''
l = (n - 1)/2; a = bitarray(l); a.setall(True)
for i in xrange(500):
if a[i]:
s = i+i+3; t = (s*s-3)/2; a[t:l:s] = False
return [2] + [x+x+3 for x in xrange(l) if a[x]]
def sieve2(n = 1000000):
'''Sieve of Eratosthenes (list)'''
l = (n - 1)/2; a = [True] * l
for i in xrange(500):
if a[i]:
s = i+i+3; t = (s*s-3)/2; u = l-t-1
a[t:l:s] = [False] * (u/s + 1)
return [2] + [x+x+3 for x in xrange(l)]
def sieve3(n = 1000000):
'''Sieve of Eratosthenes (numpy.ones)'''
l = (n - 1)/2; a = ones(l, dtype=bool)
for i in xrange(500):
if a[i]:
s = i+i+3; t = (s*s-3)/2; a[t:l:s] = False
return [2] + [x+x+3 for x in xrange(l)]
print timeit(sieve1, number=10)
print timeit(sieve2, number=10)
print timeit(sieve3, number=10)
Here are the result -
1.59695601594
0.666230770593
0.523708537583
The bitarray sieve is more than twice as slow as a list.
Does anyone have suggestions for better arrays? Anything should be faster than a python list, or so I thought.
numpy.ones is fastest, but I don't like numpy since it takes a long time to import it.
I'm basically looking for a fast data-holder, which is mutable, and can hold True and False.
The actual setting and clearing of bits is very fast in bitarray. What is making is slower building the return list. Instead of iterating through a range, and then testing each bit, take advantage of bitarray's support of iterating through the bits.
Try this:
def bitarray_sieve(n = 1000000):
'''Sieve of Eratosthenes (bitarray)'''
l = (n - 1)//2; a = bitarray(l); a.setall(True)
for i in range(500):
if a[i]:
s = i+i+3; t = (s*s-3)//2; a[t:l:s] = False
return [2] + [x+x+3 for x,b in enumerate(a) if b]
It runs in about 0.38 seconds on my machine while the list version takes about 0.47 seconds.
Have you looked at this question?
I maintain gmpy2 and I added the ability to iterate over the bits in an integer and to set/clear bits.
The following example takes about 0.16 seconds.
def gmpy2_sieve2(n=1000000):
'''Sieve of Eratosthenes (gmpy2, version 2)'''
l = (n - 1)//2; a = gmpy2.xbit_mask(l)
for i in range(500):
if a[i]:
s = i+i+3; t = (s*s-3)//2; u = l-t-1
a[t:l:s] = 0
return [2] + [x+x+3 for x in a.iter_set()]
The bottleneck is now the calculation x+x+3. The following solution doesn't skip sieving 2. It takes twice as much memory but it allow the bit positions to be used immediately. It takes about 0.08 seconds on my machine:
def gmpy2_sieve(limit=1000000):
'''Returns a generator that yields the prime numbers up to limit.
Bits are set to 1 if their position is composite.'''
sieve_limit = gmpy2.isqrt(limit) + 1
limit += 1
# Mark bit positions 0 and 1 as not prime.
bitmap = gmpy2.xmpz(3)
# Process 2 separately. This allows us to use p+p for the step size
# when sieving the remaining primes.
bitmap[4 : limit : 2] = -1
# Sieve the remaining primes.
for p in bitmap.iter_clear(3, sieve_limit):
bitmap[p*p : limit : p+p] = -1
return list(bitmap.iter_clear(2, limit))
For the actual setting/clearing of the bits, bitarray is faster than gmpy2. And bitarray has many capabilities that gmpy2 lacks. However, I couldn't find a faster method in bitarrary to get the index of which bits are set or clear.
BTW, your benchmark functions sieve2() and sieve3() return incorrect results; you are missing if a[x].
In Wikipedia this is one of the given algorithms to generate prime numbers:
def eratosthenes_sieve(n):
# Create a candidate list within which non-primes will be
# marked as None; only candidates below sqrt(n) need be checked.
candidates = [i for i in range(n + 1)]
fin = int(n ** 0.5)
# Loop over the candidates, marking out each multiple.
for i in range(2, fin + 1):
if not candidates[i]:
continue
candidates[i + i::i] = [None] * (n // i - 1)
# Filter out non-primes and return the list.
return [i for i in candidates[2:] if i]
I changed the algorithm slightly.
def eratosthenes_sieve(n):
# Create a candidate list within which non-primes will be
# marked as None; only candidates below sqrt(n) need be checked.
candidates = [i for i in range(n + 1)]
fin = int(n ** 0.5)
# Loop over the candidates, marking out each multiple.
candidates[4::2] = [None] * (n // 2 - 1)
for i in range(3, fin + 1, 2):
if not candidates[i]:
continue
candidates[i + i::i] = [None] * (n // i - 1)
# Filter out non-primes and return the list.
return [i for i in candidates[2:] if i]
I first marked off all the multiples of 2, and then I considered odd numbers only. When I timed both algorithms (tried 40.000.000) the first one was always better (albeit very slightly). I don't understand why. Can somebody please explain?
P.S.: When I try 100.000.000, my computer freezes. Why is that? I have Core Duo E8500, 4GB RAM, Windows 7 Pro 64 Bit.
Update 1: This is Python 3.
Update 2: This is how I timed:
start = time.time()
a = eratosthenes_sieve(40000000)
end = time.time()
print(end - start)
UPDATE: Upon valuable comments (especially by nightcracker and Winston Ewert) I managed to code what I intended in the first place:
def eratosthenes_sieve(n):
# Create a candidate list within which non-primes will be
# marked as None; only c below sqrt(n) need be checked.
c = [i for i in range(3, n + 1, 2)]
fin = int(n ** 0.5) // 2
# Loop over the c, marking out each multiple.
for i in range(fin):
if not c[i]:
continue
c[c[i] + i::c[i]] = [None] * ((n // c[i]) - (n // (2 * c[i])) - 1)
# Filter out non-primes and return the list.
return [2] + [i for i in c if i]
This algorithm improves the original algorithm (mentioned at the top) by (usually) 50%. (Still, worse than the algorithm mentioned by nightcracker, naturally).
A question to Python Masters: Is there a more Pythonic way to express this last code, in a more "functional" way?
UPDATE 2: I still couldn't decode the algorithm mentioned by nightcracker. I guess I'm too stupid.
The question is, why would it even be faster? In both examples you are filtering multiples of two, the hard way. It doesn't matter whether you hardcode candidates[4::2] = [None] * (n // 2 - 1) or that it gets executed in the first loop of for i in range(2, fin + 1):.
If you are interested in an optimized sieve of Eratosthenes, here you go:
def primesbelow(N):
# https://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n-in-python/3035188#3035188
#""" Input N>=6, Returns a list of primes, 2 <= p < N """
correction = N % 6 > 1
N = (N, N-1, N+4, N+3, N+2, N+1)[N%6]
sieve = [True] * (N // 3)
sieve[0] = False
for i in range(int(N ** .5) // 3 + 1):
if sieve[i]:
k = (3 * i + 1) | 1
sieve[k*k // 3::2*k] = [False] * ((N//6 - (k*k)//6 - 1)//k + 1)
sieve[(k*k + 4*k - 2*k*(i%2)) // 3::2*k] = [False] * ((N // 6 - (k*k + 4*k - 2*k*(i%2))//6 - 1) // k + 1)
return [2, 3] + [(3 * i + 1) | 1 for i in range(1, N//3 - correction) if sieve[i]]
Explanation here: Porting optimized Sieve of Eratosthenes from Python to C++
The original source is here, but there was no explanation. In short this primesieve skips multiples of 2 and 3 and uses a few hacks to make use of fast Python assignment.
You do not save a lot of time avoiding the evens. Most of the computation time within the algorithm is spent doing this:
candidates[i + i::i] = [None] * (n // i - 1)
That line causes a lot of action on the part of the computer. Whenever the number in question is even, this is not run as the loop bails on the if statement. The time spent running the loop for even numbers is thus really really small. So eliminating those even rounds does not produce a significant change in the timing of the loop. That's why your method isn't considerably faster.
When python produces numbers for range it uses a formula: start + index * step. Multiplying by two as you do in your case is going to be slightly more expensive then one as in the original case.
There is also quite possibly a small overhead to having a longer function.
Neither are of those are really significant speed issues, but they override the very small amount of benefit your version brings.
Its probably slightly slower because you are performing extra set up to do something that was done in the first case anyway (marking off multiples of two). That setup time might be what you see if it is as slight as you say
Your extra step is unnecessary and will actually traverse the whole collection n once doing that 'get rid of evens' operation rather than just operating on n^1/2.