Sieve of Eratosthenes - Primes between X and N - python

I found this highly optimised implementation of the Sieve of Eratosthenes for Python on Stack Overflow. I have a rough idea of what it's doing, but I must admit the details of its workings elude me.
I would still like to use it for a little project (I'm aware there are libraries to do this but I would like to use this function).
Here's the original:
'''
Sieve of Eratosthenes
Implementation by Robert William Hanks
https://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n/3035188
'''
import numpy

def sieve(n):
    """Return an array of the primes below n."""
    # dtype=bool: the numpy.bool alias is not available in every NumPy release
    prime = numpy.ones(n//3 + (n%6==2), dtype=bool)
    for i in range(3, int(n**.5) + 1, 3):
        if prime[i // 3]:
            p = (i + 1) | 1
            prime[p*p//3::2*p] = False
            prime[p*(p-2*(i&1)+4)//3::2*p] = False
    result = (3 * prime.nonzero()[0] + 1) | 1
    result[0] = 3
    return numpy.r_[2,result]
What I'm trying to achieve is to modify it to return all primes below n starting at x so that:
primes = sieve(50, 100)
would return primes between 50 and 100. This seemed easy enough, I tried replacing these two lines:
def sieve(x, n):
    ...
    for i in range(x, int(n**.5) + 1, 3):
        ...
But for a reason I can't explain, the value of x in the above has no influence on the numpy array returned!
How can I modify sieve() to only return primes between x and n?

The implementation you've borrowed is able to start at 3 because it replaces sieving out the multiples of 2 by just skipping all even numbers; that's what the 2*… that appear multiple times in the code are about. The fact that 3 is the next prime is also hardcoded in all over the place, but let's ignore that for the moment, because if you can't get past the special-casing of 2, the special-casing of 3 doesn't matter.
Skipping even numbers is a special case of a "wheel". You can skip sieving multiples of 2 by always incrementing by 2; you can skip sieving multiples of 2 and 3 by alternately incrementing by 2 and 4; you can skip sieving multiples of 2, 3, 5, and 7 by incrementing by 2, 4, 2, 4, 6, 2, 6, … (there are 48 numbers in the sequence), and so on. So, you could extend this code by first finding all the primes up to x, then building a wheel, then using that wheel to find all the primes between x and n.
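As a rough illustration of the wheel idea, the sketch below derives the increment pattern from a set of base primes. The helper name wheel_increments is made up for this example; it is not part of the implementation in the question.

```python
from math import gcd, prod

def wheel_increments(base_primes):
    """Gaps between consecutive numbers coprime to every base prime.

    The pattern repeats with period prod(base_primes); the list starts
    at residue 1, so it is a rotation of the sequences quoted above.
    """
    size = prod(base_primes)
    residues = [r for r in range(1, size + 1) if gcd(r, size) == 1]
    gaps = [b - a for a, b in zip(residues, residues[1:])]
    # Wrap around from the last residue to the first one of the next turn.
    gaps.append(size + residues[0] - residues[-1])
    return gaps

print(wheel_increments((2,)))               # [2]
print(wheel_increments((2, 3)))             # [4, 2]
print(len(wheel_increments((2, 3, 5, 7))))  # 48
```

The gaps always sum to the wheel size (2, 6, 210, ...), which is why storing the wheel gets expensive quickly as more primes are added.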
But that's adding a lot of complexity. And once you get too far beyond 7, the cost (both in time, and in space for storing the wheel) swamps the savings. And if your whole goal is not to find the primes before x, finding the primes before x so you don't have to find them seems kind of silly. :)
The simpler thing to do is just find all the primes up to n, and throw out the ones below x. Which you can do with a trivial change at the end:
    primes = numpy.r_[2,result]
    return primes[primes>=x]
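The same "sieve everything, then drop the low end" idea can be sketched without numpy. This is a plain pure-Python sieve, not the optimised implementation from the question, and the names primes_below and primes_between are assumptions made for this example.

```python
from bisect import bisect_left
from math import isqrt

def primes_below(n):
    """Plain sieve of Eratosthenes: all primes p with 2 <= p < n."""
    if n < 3:
        return []
    flags = bytearray([1]) * n
    flags[0] = flags[1] = 0
    for p in range(2, isqrt(n - 1) + 1):
        if flags[p]:
            # Cross out p*p, p*p + p, ... below n in one slice assignment.
            flags[p*p::p] = bytes(len(range(p*p, n, p)))
    return [i for i, is_prime in enumerate(flags) if is_prime]

def primes_between(x, n):
    """Primes p with x <= p < n: sieve everything below n, then drop p < x."""
    primes = primes_below(n)
    return primes[bisect_left(primes, x):]

print(primes_between(50, 100))
# [53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
```

The sieving loop still has to start from the smallest primes, which is why changing only the start of the loop cannot produce the primes from x upward.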
Of course there are ways to do this without wasting storage for those initial primes you're going to throw away. They'd be a bit complicated to work into this algorithm (you'd probably want to build the array in sections, then drop each section that's entirely < x as you go, then stack all the remaining sections); it would be far easier to use a different implementation of the algorithm that isn't designed for speed and simplicity over space…
And of course there are different prime-finding algorithms that don't require enumerating all the primes up to x in the first place. But if you want to use this implementation of this algorithm, that doesn't matter.

Since you're now interested in looking into other algorithms or other implementations, try this one. It doesn't use numpy, but it is rather fast. I've tried a few variations on this theme, including using sets, and pre-computing a table of low primes, but they were all slower than this one.
#! /usr/bin/env python
''' Prime range sieve.
    Written by PM 2Ring 2014.10.15
    For range(0, 30000000) this is actually _faster_ than the
    plain Eratosthenes sieve in sieve3.py !!!
'''
import sys

def potential_primes():
    ''' Make a generator for 2, 3, 5, & thence all numbers coprime to 30 '''
    s = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29)
    for i in s:
        yield i
    s = (1,) + s[3:]
    j = 30
    while True:
        for i in s:
            yield j + i
        j += 30

def range_sieve(lo, hi):
    ''' Create a list of all primes in the range(lo, hi) '''
    #Mark all numbers as prime
    primes = [True] * (hi - lo)
    #Eliminate 0 and 1, if necessary
    for i in range(lo, min(2, hi)):
        primes[i - lo] = False
    ihi = int(hi ** 0.5)
    for i in potential_primes():
        if i > ihi:
            break
        #Find first multiple of i that is >= i*i and >= lo
        ilo = max(i, 1 + (lo - 1) // i ) * i
        #Determine how many multiples of i >= ilo are in range
        n = 1 + (hi - ilo - 1) // i
        #Mark them as composite
        primes[ilo - lo : : i] = n * [False]
    return [i for i,v in enumerate(primes, lo) if v]

def main():
    lo = int(sys.argv[1]) if len(sys.argv) > 1 else 0
    hi = int(sys.argv[2]) if len(sys.argv) > 2 else lo + 30
    #print(lo, hi)
    primes = range_sieve(lo, hi)
    #print(len(primes))
    print(primes)
    #print(primes[:10], primes[-10:])

if __name__ == '__main__':
    main()
And here's a link to the plain Eratosthenes sieve that I mentioned in the docstring, in case you want to compare this program to that one.
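The least obvious line in range_sieve is the one that computes ilo, the first multiple of i to cross out. Pulled out into a standalone helper (the name first_multiple is only for this illustration), it can be checked in isolation:

```python
def first_multiple(i, lo):
    """Smallest multiple of i that is both >= i*i and >= lo.

    1 + (lo - 1) // i is the ceiling of lo / i computed with integers only,
    and max(i, ...) keeps the result from dropping below i*i.
    """
    return max(i, 1 + (lo - 1) // i) * i

print(first_multiple(7, 0))    # 49: nothing below 7*7 needs crossing out
print(first_multiple(7, 50))   # 56: first multiple of 7 at or above 50
print(first_multiple(7, 100))  # 105
```

Starting at i*i is safe because any smaller multiple of i has a smaller factor and is crossed out by that factor instead.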
You could improve this slightly by getting rid of the loop under #Eliminate 0 and 1, if necessary. And I guess it might be slightly faster if you avoided looking at even numbers; it'd certainly use less memory. But then you'd have to handle the cases when 2 was inside the range, and I figure that the fewer tests you have, the faster this thing will run.
Here's a minor improvement to that code: replace
    #Mark all numbers as prime
    primes = [True] * (hi - lo)
    #Eliminate 0 and 1, if necessary
    for i in range(lo, min(2, hi)):
        primes[i - lo] = False
with
    #Eliminate 0 and 1, if necessary
    lo = max(2, lo)
    #Mark all numbers as prime
    primes = [True] * (hi - lo)
However, the original form may be preferable if you want to return the plain bool list rather than performing the enumerate to build a list of integers: the bool list is more useful for testing if a given number is prime; OTOH, the enumerate could be used to build a set rather than a list.


Python: take multiples based on condition

Is there a way in Python to have a for loop that cycles only over the multiples of a given number that are not multiples of any number lower than the given number?
I mean something like this:
if given_number = 13
the for loop is going to cycle only on [13 * 13, 13 * 17, 13 * 19, 13 * 23, 13*29, 13*31, 13*37, ........]
I came across this kind of problem working on this function:
from math import isqrt
from typing import List
import numpy as np

def get_primes_lower_n(n: int) -> List[int]:
    """
    n: int
        the given input number
    returns: List[int]
        all primes number lower than given input number
    raises:
        ValueError: the given input is an integer < 2
        TypeError: the given input is not an integer
    """
    if not isinstance(n, int):
        raise TypeError("an integer is required")
    if n < 2:
        raise ValueError("an integer > 1 is required")
    primes = np.ones(n + 1, dtype=bool)  # bool(1) = True
    primes[:2] = False
    primes[4:: 2] = False
    for i in range(3, isqrt(n) + 1, 2):
        if primes[i]:
            primes[i ** 2:: 2 * i] = False
    return np.where(primes == True)[0]
If n is something like 9 * 10 ** 9, the algorithm is going to explode. For any value of i, primes[i ** 2:: 2 * i] = False is going to take more or less 52 seconds. Take i = 7: primes[i ** 2:: 2 * i] = False is going to set to False all the values at positions that are multiples of 21 and 35, which are already set to False.
As the value of i increases, I expect the operation primes[i ** 2:: 2 * i] = False to take less time (fewer values need to be set), but instead the time increases exponentially. Why?
To answer your first question:
Is there a way in Python to have a for loop that cycles only over the multiples of a given number that are not multiples of any number lower than the given number?
Yes, there is, but it is not efficient enough (unless you find a faster implementation).
Let n be the number we are looking at.
The numbers that are not multiples of any number lower than n are exactly the numbers left in the sieve after it has been run up to n. These numbers are therefore already present in the array: they are the True values with index greater than n. Unfortunately, because they are stored by index, finding them is expensive and ends up making the code slower, since the collisions in the original algorithm are so rare. Still, here is a possible implementation in the for loop.
for i in range(3, isqrt(n) + 1, 2):
    primes[i*(np.nonzero(primes[i:n//i + 1])[0]+i)] = False
    # [i:n//i + 1] bounds the search from i to n//i, the last number
    # such that i*(n//i) <= n.
Here is an equivalent code without the bool array:
import bisect
from math import isqrt
import numpy as np

def get_primes_lower_n(n: int) -> list[int]:
    primes = np.arange(3, n + 1, 2)  # only odd numbers from 3 up
    val = 0
    idx = -1
    while val <= isqrt(n)+1:
        idx += 1
        val = primes[idx]
        primes = np.setdiff1d(primes, val*primes[idx:idx+bisect.bisect_left(primes[idx:], n//val+1)])
    return np.insert(primes, 0, 2)
In conclusion, it is worth setting the same values multiple times rather than using an exact approach that is slower.
Sorry for the lack of a working solution, but I hope this can help you in some way. If you find an interesting algorithm, feel free to let me know!
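To get a feel for how rare those repeated writes are, here is a small pure-Python sketch that counts them for an odd-only sieve. It uses a plain list of counters instead of the numpy bool array, and count_marks is a name invented for this example.

```python
from math import isqrt

def count_marks(n):
    """Return (writes, distinct) for an odd-only sieve up to n.

    writes   = total number of crossing-out operations performed
    distinct = number of different composites that were crossed out
    """
    hits = [0] * (n + 1)
    for i in range(3, isqrt(n) + 1, 2):
        if hits[i] == 0:  # i was never crossed out, so it is prime
            for m in range(i * i, n + 1, 2 * i):
                hits[m] += 1
    writes = sum(hits)
    distinct = sum(1 for h in hits if h)
    return writes, distinct

print(count_marks(100))  # (28, 25)
```

Up to 100, only 45, 63 and 75 are written more than once, so an exact scheme would save 3 writes out of 28 while paying for a search on every pass.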

Optimizing Totient function

I'm trying to maximize the Euler totient function in Python, given that it can use arbitrarily large numbers. The problem is that the program gets killed after some time, so it doesn't reach the desired ratio. I have thought of increasing the starting number to a larger number, but I don't think it's prudent to do so. I'm trying to find a number that, when divided by its totient, gives a ratio higher than 10. Essentially I'm trying to find a sparsely totient number that fits this criterion.
Here's my phi function:
import math

def phi(n):
    amount = 0
    for k in range(1, n + 1):
        # math.gcd replaces fractions.gcd, which was removed in Python 3.9
        if math.gcd(n, k) == 1:
            amount += 1
    return amount
The most likely candidates for high ratios of N/phi(N) are products of prime numbers. If you're just looking for one number with a ratio > 10, then you can generate primes and only check the product of primes up to the point where you get the desired ratio:
def totientRatio(maxN,ratio=10):
    primes = []
    primeProd = 1
    isPrime = [1]*(maxN+1)
    p = 2
    while p*p<=maxN:
        if isPrime[p]:
            isPrime[p*p::p] = [0]*len(range(p*p,maxN+1,p))
            primes.append(p)
            primeProd *= p
            tot = primeProd
            for f in primes:
                tot -= tot//f
            if primeProd/tot >= ratio:
                return primeProd,primeProd/tot,len(primes)
        p += 1 + (p&1)
output:
totientRatio(10**6)
16516447045902521732188973253623425320896207954043566485360902980990824644545340710198976591011245999110,
10.00371973209101,
55
This gives you the smallest number with that ratio. Multiplying that number by primes it already contains leaves the ratio unchanged:
n = 16516447045902521732188973253623425320896207954043566485360902980990824644545340710198976591011245999110
n*2/totient(n*2) = 10.00371973209101
n*11*13/totient(n*11*13) = 10.00371973209101
No number will have a higher ratio until you reach the next product of primes (i.e. that number multiplied by the next prime).
n*263/totient(n*263) = 10.041901868473037
Removing a prime from the product affects the ratio by a proportion of (1-1/P).
For example if m = n/109, then m/phi(m) = n/phi(n) * (1-1/109)
(n//109) / totient(n//109) = 9.91194248684247
10.00371973209101 * (1-1/109) = 9.91194248684247
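The (1-1/P) scaling can be checked exactly on a small product of primes. The sketch below uses fractions.Fraction to avoid rounding; the helper name ratio is an assumption for this example, and it only handles products of distinct primes.

```python
from fractions import Fraction
from math import prod

def ratio(primes):
    """n/phi(n) for n = product of the given distinct primes.

    For such n, phi(n) = product of (p - 1), so the ratio is the
    product of p / (p - 1).
    """
    return prod((Fraction(p, p - 1) for p in primes), start=Fraction(1))

full = ratio([2, 3, 5, 7])      # n = 210, phi(n) = 48
without_7 = ratio([2, 3, 5])    # n = 30,  phi(n) = 8

print(full)                          # 35/8
print(without_7)                     # 15/4
print(full * (1 - Fraction(1, 7)))   # 15/4, same as removing the prime 7
```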
This should allow you to navigate the ratios efficiently and find the numbers that meet your needs.
For example, to get a number with a ratio that is >= 10 but closer to 10, you can go to the next prime product(s) and remove one or more of the smaller primes to reduce the ratio. This can be done using combinations (from itertools) and will allow you to find very specific ratios:
m = n*263/241
m/totient(m) = 10.000234225865265
m = n*(263...839) / (7 * 61 * 109 * 137) # 839 is 146th prime
m/totient(m) = 10.000000079805726
I have a partial solution for you, but the results don't look good. (This solution may not give you an answer on current computer hardware; the amount of RAM is the limiting factor.) I took an answer from a code golf challenge (linked in the code comment) and modified it to spit out ratios of n/phi(n) up to a particular n:
import numba as nb
import numpy as np
import time

n = int(2**31)

@nb.njit("i4[:](i4[:])", locals=dict(
    n=nb.int32, i=nb.int32, j=nb.int32, q=nb.int32, f=nb.int32))
def summarum(phi):
    #calculate phi(i) for i: 1 - n
    #taken from https://codegolf.stackexchange.com/a/26753/42652
    phi[1] = 1
    i = 2
    while i < n:
        if phi[i] == 0:
            phi[i] = i - 1
            j = 2
            while j * i < n:
                if phi[j] != 0:
                    q = j
                    f = i - 1
                    while q % i == 0:
                        f *= i
                        q //= i
                    phi[i * j] = f * phi[q]
                j += 1
        i += 1
    #divide each i by phi(i) to get the (integer) ratio i/phi(i)
    i = 1
    while i < n: #jit compiled while loop is faster than: for i in range(): blah blah blah
        phi[i] = i//phi[i]
        i += 1
    return phi

if __name__ == "__main__":
    s1 = time.time()
    a = summarum(np.zeros(n, np.int32))
    locations = np.where(a >= 10)
    # np.where returns a tuple of index arrays; count the hits in the first one
    print(len(locations[0]))
I only have enough ram on my work comp. to test about 0 < n < 10^8 and the largest ratio was about 6. You may or may not have any luck going up to larger n, although 10^8 already took several seconds (not sure what the overhead was... spyder's been acting strange lately)
p_55# is a sparsely totient number satisfying the desired condition.
Furthermore, all subsequent primorial numbers are as well, because p_n# / phi(p_n#) is a strictly increasing sequence:
p_1# / phi(p_1#) is 2, which is positive. For n > 1, p_n# / phi(p_n#) is equal to (p_(n-1)# * p_n) / phi(p_(n-1)# * p_n), which, since p_n and p_(n-1)# are coprime, is equal to (p_(n-1)# / phi(p_(n-1)#)) * (p_n / phi(p_n)). We know p_n > phi(p_n) > 0 for all n, so p_n / phi(p_n) > 1. So we have that the sequence p_n# / phi(p_n#) is strictly increasing.
I do not believe these to be the only sparsely totient numbers satisfying your request, but I don't have an efficient way of generating the others coming to mind. Generating primorials, by comparison, amounts to generating the first n primes and multiplying the list together (whether by using functools.reduce(), math.prod() in 3.8+, or ye old for loop).
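A minimal sketch of the primorial search, assuming simple trial division is good enough for the few dozen primes involved. The function name first_primorial_with_ratio is made up for this example; the ratio is tracked as an exact Fraction and only converted to float for display.

```python
from fractions import Fraction

def first_primorial_with_ratio(target):
    """Return (k, p_k, ratio) for the first primorial p_k# with n/phi(n) >= target."""
    primes = []
    ratio = Fraction(1)
    candidate = 2
    while True:
        # Trial division by the primes found so far (fine for a few dozen primes).
        if all(candidate % p for p in primes):
            primes.append(candidate)
            # Each new prime multiplies the ratio by p / (p - 1) > 1.
            ratio *= Fraction(candidate, candidate - 1)
            if ratio >= target:
                return len(primes), candidate, float(ratio)
        candidate += 1

print(first_primorial_with_ratio(3))   # (2, 3, 3.0)
print(first_primorial_with_ratio(10))  # 55 primes, the last one being 257
```

Because the ratio only ever grows as primes are added, the first primorial to reach the target is found by a single forward pass.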
As for the general question of writing a phi(n) function, I would probably first find the prime factors of n, then use Euler's product formula for phi(n). As an aside, make sure to NOT use floating-point division. Even finding the prime factors of n by trial division should outperform computing gcd n times, but when working with large n, replacing this with an efficient prime factorization algorithm will pay dividends. Unless you want a good cross to die on, don't write your own. There's one in sympy that I'm aware of, and given the ubiquity of the problem, probably plenty of others around. Time as needed.
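As a sketch of that suggestion, here is a phi(n) that factors n by trial division and applies Euler's product formula using integer arithmetic only. It is meant as an illustration for moderate n, not as a replacement for a proper factorization routine.

```python
def phi(n):
    """Euler's totient via trial-division factorization (integers only)."""
    result = n
    p = 2
    while p * p <= n:
        if n % p == 0:
            # Strip every copy of the prime factor p from n.
            while n % p == 0:
                n //= p
            # Multiply result by (1 - 1/p) without leaving the integers.
            result -= result // p
        p += 1
    if n > 1:
        # Whatever is left is a single prime factor larger than sqrt(n).
        result -= result // n
    return result

print(phi(36))  # 12
print(phi(97))  # 96
```

The step result -= result // p is exact because result is always divisible by p at that point, so no floating-point division is needed.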
Speaking of timing, if this is still relevant enough to you (or a future reader) to want to time... definitely throw the previous answer in the mix as well.

python prime factorization performance

I'm relatively new to Python and I'm confused about the performance of two relatively simple blocks of code. The first function generates a prime factorization of a number n given a list of primes. The second generates a list of all factors of n. I would have thought prime_factor would be faster than factors (for the same n), but this is not the case. I'm not looking for better algorithms, but rather I would like to understand why prime_factor is so much slower than factors.
def prime_factor(n, primes):
    prime_factors = []
    i = 0
    while n != 1:
        if n % primes[i] == 0:
            factor = primes[i]
            prime_factors.append(factor)
            n = n // factor
        else: i += 1
    return prime_factors

import math
def factors(n):
    if n == 0: return []
    factors = {1, n}
    for i in range(2, math.floor(n ** (1/2)) + 1):
        if n % i == 0:
            factors.add(i)
            factors.add(n // i)
    return list(factors)
Using the timeit module,
{ i:factors(i) for i in range(1, 10000) } takes 2.5 seconds
{ i:prime_factor(i, primes) for i in range(1, 10000) } takes 17 seconds
This is surprising to me. factors checks every number from 1 to sqrt(n), while prime_factor only checks primes. I would appreciate any help in understanding the performance characteristics of these two functions.
Thanks
Edit: (response to roliu)
Here is my code to generate a list of primes from 2 to up_to:
def primes_up_to(up_to):
    marked = [0] * up_to
    value = 3
    s = 2
    primes = [2]
    while value < up_to:
        if marked[value] == 0:
            primes.append(value)
            i = value
            while i < up_to:
                marked[i] = 1
                i += value
        value += 2
    return primes
Without seeing what you used for primes, we have to guess (we can't run your code).
But a big part of this is simply mathematics: there are (very roughly speaking) about n/log(n) primes less than n, and that's a lot bigger than sqrt(n). So when you pass a prime, prime_factor(n) does a lot more work: it goes through O(n/log(n)) operations before finding the first prime factor (n itself!), while factors(n) gives up after O(sqrt(n)) operations.
This can be very significant. For example, sqrt(10000) is just 100, but there are 1229 primes less than 10000. So prime_factor(n) can need to do over 10 times as much work to deal with the large primes in your range.
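The gap is easy to measure by counting modulo tests instead of timing. The sketch below mirrors the loops of prime_factor and factors from the question but only counts iterations; the helper names and the small sieve used to build the prime list are assumptions for this example.

```python
import math

def primes_up_to(limit):
    """All primes <= limit, via a plain sieve."""
    flags = [True] * (limit + 1)
    flags[0:2] = [False, False]
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            for m in range(p * p, limit + 1, p):
                flags[m] = False
    return [i for i, is_prime in enumerate(flags) if is_prime]

def prime_factor_trials(n, primes):
    """Number of modulo tests the prime_factor loop performs for n."""
    trials = 0
    i = 0
    while n != 1:
        trials += 1
        if n % primes[i] == 0:
            n //= primes[i]
        else:
            i += 1
    return trials

def factors_trials(n):
    """Number of modulo tests the factors loop performs for n."""
    return len(range(2, math.isqrt(n) + 1))

primes = primes_up_to(10000)
print(prime_factor_trials(9973, primes))  # 1229: 9973 is the last prime below 10000
print(factors_trials(9973))               # 98
```

Stopping prime_factor once primes[i] * primes[i] exceeds the remaining n would remove most of that extra work, since whatever is left at that point must itself be prime.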

Optimization of the primes finding function

After 10 minutes of work I have written the function presented below. It returns a list of all primes lower than its argument. I have used all the programming and mathematical tricks known to me in order to make this function as fast as possible. Finding all the primes lower than a million takes about 2 seconds.
Do you see any possibilities to optimize it even further? Any ideas?
def Primes(To):
    if To<2:
        return []
    if To<3:
        return [2]
    Found=[2]
    n=3
    LastSqr=0
    while n<=To:
        k=0
        Limit=len(Found)
        IsPrime=True
        while k<Limit:
            if k>=LastSqr:
                if Found[k]>pow(n,0.5):
                    LastSqr=k
                    break
            if n%Found[k]==0:
                IsPrime=False
                break
            k+=1
        if IsPrime:
            Found.append(n)
        n+=1
    return Found
You can use a couple of tricks to speed things up, using the basic sieve of Eratosthenes. One is to use wheel factorization to skip numbers that are known not to be prime. For example, besides 2 and 3, all primes are congruent to 1 or 5 mod 6. This means you don't have to process 4 of every 6 numbers at all.
At the next level, all primes are congruent to 1, 7, 11, 13, 17, 19, 23, or 29, mod 30. You can throw out 22 of every 30 numbers.
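As an illustration of the mod 6 case, the generator below yields only the numbers that can still be prime: 2, 3, and then everything congruent to 1 or 5 mod 6. The name candidates_mod6 is invented for this sketch, and the candidates still need a primality test (25 gets through, for instance).

```python
def candidates_mod6(limit):
    """Yield 2, 3, then every number <= limit congruent to 1 or 5 mod 6."""
    for p in (2, 3):
        if p <= limit:
            yield p
    n = 5
    step = 2  # steps alternate 2, 4, 2, 4, ... (5, 7, 11, 13, 17, ...)
    while n <= limit:
        yield n
        n += step
        step = 6 - step

print(list(candidates_mod6(30)))
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 25, 29]
```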
Here is a simple implementation of the sieve of Eratosthenes that doesn't calculate or store even numbers:
def basic_gen_primes(n):
    """Return a list of all primes less than or equal to n"""
    if n < 2:
        return []
    # The sieve. Each entry i represents (2i + 1)
    size = (n + 1) // 2
    sieve = [True] * size
    # 2(0) + 1 == 1 is not prime
    sieve[0] = False
    for i, value in enumerate(sieve):
        if not value:
            continue
        p = 2*i + 1
        # p is prime. Remove all of its multiples from the sieve
        # p^2 == (2i + 1)(2i + 1) == (4i^2 + 4i + 1) == 2(2i^2 + 2i) + 1
        multiple = 2 * i * i + 2 * i
        if multiple >= size:
            break
        while multiple < size:
            sieve[multiple] = False
            multiple += p
    return [2] + [2*i+1 for i, value in enumerate(sieve) if value]
As mentioned, you can use more exotic sieves as well.
You can check only odd numbers. So why don't you use n+=2 instead of n+=1?
Search Google and Wikipedia for better algorithms. If you are only looking for small primes this might be fast enough. But the real algorithms are a lot faster for large primes.
http://en.wikipedia.org/wiki/Quadratic_sieve
Start with that page.
Increment n by two instead of one?

Why is this algorithm worse?

In Wikipedia this is one of the given algorithms to generate prime numbers:
def eratosthenes_sieve(n):
    # Create a candidate list within which non-primes will be
    # marked as None; only candidates below sqrt(n) need be checked.
    candidates = [i for i in range(n + 1)]
    fin = int(n ** 0.5)
    # Loop over the candidates, marking out each multiple.
    for i in range(2, fin + 1):
        if not candidates[i]:
            continue
        candidates[i + i::i] = [None] * (n // i - 1)
    # Filter out non-primes and return the list.
    return [i for i in candidates[2:] if i]
I changed the algorithm slightly.
def eratosthenes_sieve(n):
    # Create a candidate list within which non-primes will be
    # marked as None; only candidates below sqrt(n) need be checked.
    candidates = [i for i in range(n + 1)]
    fin = int(n ** 0.5)
    # Loop over the candidates, marking out each multiple.
    candidates[4::2] = [None] * (n // 2 - 1)
    for i in range(3, fin + 1, 2):
        if not candidates[i]:
            continue
        candidates[i + i::i] = [None] * (n // i - 1)
    # Filter out non-primes and return the list.
    return [i for i in candidates[2:] if i]
I first marked off all the multiples of 2, and then I considered odd numbers only. When I timed both algorithms (I tried n = 40,000,000) the first one was always better (albeit very slightly). I don't understand why. Can somebody please explain?
P.S.: When I try 100,000,000, my computer freezes. Why is that? I have a Core Duo E8500, 4GB RAM, Windows 7 Pro 64 Bit.
Update 1: This is Python 3.
Update 2: This is how I timed:
start = time.time()
a = eratosthenes_sieve(40000000)
end = time.time()
print(end - start)
UPDATE: Upon valuable comments (especially by nightcracker and Winston Ewert) I managed to code what I intended in the first place:
def eratosthenes_sieve(n):
    # Create a candidate list within which non-primes will be
    # marked as None; only c below sqrt(n) need be checked.
    c = [i for i in range(3, n + 1, 2)]
    fin = int(n ** 0.5) // 2
    # Loop over the c, marking out each multiple.
    for i in range(fin):
        if not c[i]:
            continue
        c[c[i] + i::c[i]] = [None] * ((n // c[i]) - (n // (2 * c[i])) - 1)
    # Filter out non-primes and return the list.
    return [2] + [i for i in c if i]
This algorithm improves the original algorithm (mentioned at the top) by (usually) 50%. (Still, worse than the algorithm mentioned by nightcracker, naturally).
A question to Python Masters: Is there a more Pythonic way to express this last code, in a more "functional" way?
UPDATE 2: I still couldn't decode the algorithm mentioned by nightcracker. I guess I'm too stupid.
The question is, why would it even be faster? In both examples you are filtering multiples of two, the hard way. It doesn't matter whether you hardcode candidates[4::2] = [None] * (n // 2 - 1) or that it gets executed in the first loop of for i in range(2, fin + 1):.
If you are interested in an optimized sieve of Eratosthenes, here you go:
def primesbelow(N):
    # https://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n-in-python/3035188#3035188
    #""" Input N>=6, Returns a list of primes, 2 <= p < N """
    correction = N % 6 > 1
    N = (N, N-1, N+4, N+3, N+2, N+1)[N%6]
    sieve = [True] * (N // 3)
    sieve[0] = False
    for i in range(int(N ** .5) // 3 + 1):
        if sieve[i]:
            k = (3 * i + 1) | 1
            sieve[k*k // 3::2*k] = [False] * ((N//6 - (k*k)//6 - 1)//k + 1)
            sieve[(k*k + 4*k - 2*k*(i%2)) // 3::2*k] = [False] * ((N // 6 - (k*k + 4*k - 2*k*(i%2))//6 - 1) // k + 1)
    return [2, 3] + [(3 * i + 1) | 1 for i in range(1, N//3 - correction) if sieve[i]]
Explanation here: Porting optimized Sieve of Eratosthenes from Python to C++
The original source is here, but there was no explanation. In short this primesieve skips multiples of 2 and 3 and uses a few hacks to make use of fast Python assignment.
You do not save a lot of time avoiding the evens. Most of the computation time within the algorithm is spent doing this:
candidates[i + i::i] = [None] * (n // i - 1)
That line causes a lot of action on the part of the computer. Whenever the number in question is even, this is not run as the loop bails on the if statement. The time spent running the loop for even numbers is thus really really small. So eliminating those even rounds does not produce a significant change in the timing of the loop. That's why your method isn't considerably faster.
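One way to see this is to count how many list elements each version overwrites. The sketch below folds both variants from the question into one function with a flag; the function name marked_elements and the counting are additions for this illustration only.

```python
def marked_elements(n, skip_evens):
    """Run either variant of the sieve; return (elements written, primes)."""
    candidates = list(range(n + 1))
    fin = int(n ** 0.5)
    written = 0
    if skip_evens:
        # Modified version: cross out the even numbers up front...
        candidates[4::2] = [None] * (n // 2 - 1)
        written += n // 2 - 1
        loop = range(3, fin + 1, 2)   # ...then visit odd i only.
    else:
        loop = range(2, fin + 1)      # Original version: visit every i.
    for i in loop:
        if not candidates[i]:
            continue                  # already crossed out: almost free
        candidates[i + i::i] = [None] * (n // i - 1)
        written += n // i - 1
    return written, [i for i in candidates[2:] if i]

original = marked_elements(1000, skip_evens=False)
modified = marked_elements(1000, skip_evens=True)
print(original[0] == modified[0])  # True: same amount of slice-assignment work
print(len(original[1]))            # 168 primes up to 1000
```

Both variants perform exactly the same slice assignments; the modified one only skips the cheap `if not candidates[i]` checks for even i.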
When Python produces numbers for range it uses a formula: start + index * step. Multiplying by two, as you do in your case, is going to be slightly more expensive than multiplying by one, as in the original case.
There is also quite possibly a small overhead to having a longer function.
Neither of those is really a significant speed issue, but together they outweigh the very small benefit your version brings.
It's probably slightly slower because you are performing extra setup to do something that was done in the first case anyway (marking off multiples of two). That setup time might be what you see, if the difference is as slight as you say.
Your extra step is unnecessary and will actually traverse the whole collection of n items once for that 'get rid of evens' operation, rather than just operating on n^(1/2) of them.
