Python: Faster universal hashing function with built-in libs

I am trying to implement a universal hashing function using only the standard library:
I am having trouble getting it to run in acceptable time. I know % is slow, so I have tried the following:
((a * x + b) % P) % n
divmod(divmod(a * x + b, P)[1], n)[1]
subeq = pow(a * x + b, 1, P)
hash = pow(subeq, 1, self.n)
All of these are too slow for what I am trying to do. Is there a faster way to do modular reduction using only the base libs that I am unaware of?
Edit: To elaborate, I will be running this function about 200000 times (or more), and I need all 200000 runs to complete in under 4 seconds. None of these methods are even in that ballpark (they take minutes).

You're not going to do better than ((a * x + b) % P) % m in pure Python code; the overhead of the Python interpreter will bottleneck you more than anything else. That said, if you ensure m is a power of two, you can precompute mm1 = m - 1 and change the computation to ((a * x + b) % P) & mm1, replacing the more expensive remainder operation with a cheaper bitmask. But unless P is huge (hundreds of bits at minimum), the interpreter overhead will likely outweigh the difference between remaindering and bitmasking.
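For illustration, here is a minimal sketch of that power-of-two variant; the specific P, m, a and b are made-up example values, not anything from the question:
from random import randint

P = (1 << 61) - 1                    # example Mersenne prime (an assumption for this sketch)
m = 1 << 10                          # number of bins; must be a power of two for the mask trick
mm1 = m - 1                          # precomputed mask
a, b = randint(1, P - 1), randint(0, P - 1)

def h(x):
    return ((a * x + b) % P) & mm1   # bitmask replaces the outer % m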
If you really need the performance, and the types you're working with will fit in a C-level primitive type, you may benefit from writing a Python C extension that converts all the values to size_t, Py_hash_t, uint64_t, or whatever suits your problem, performs the math entirely with C types, then does a single conversion back to a Python int, saving a bunch of bytecode execution and intermediate values (that are promptly tossed).
If the values are too large to fit in C primitives, GMP types are an option (look at mpz_import and mpz_export for efficient conversions from PyLong to mpz_t and back), but the odds of seeing big savings go down. GMP does math faster in general, and can mutate numbers in place rather than creating and destroying lots of temporaries, but even with mpz_import and mpz_export the cost of converting between Python and GMP types would likely eat most of the savings.
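If you'd rather stay in Python than write a C extension, the same idea can be approximated with the gmpy2 bindings. This is only a rough sketch with made-up example values, not the C-level approach described above, and for machine-sized operands the conversion costs usually cancel the gains, exactly as noted:
import gmpy2

a = gmpy2.mpz(123456789)             # example coefficients as GMP integers
b = gmpy2.mpz(987654321)
P = gmpy2.mpz((1 << 89) - 1)         # example large prime
m = gmpy2.mpz(1024)

def h(x):
    # one conversion in, GMP math throughout, one conversion out
    return int(((a * gmpy2.mpz(x) + b) % P) % m)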

from math import ceil, log2
from primesieve import nth_prime   # nth_prime(n, start) returns the nth prime after start
from random import randint

class UniversalHashing:
    """ N = number of bins
        p = prime number such that p >= N
        For example, with N <= 2**32:
            nth_prime(1, 1 << max(32, ceil(log2(N))))
            = nth_prime(1, 2**32)
            = nth_prime(1, 4294967296)
            = 4294967311
        assert: raises an error if the condition is not satisfied
        << operator: shift left, i.e. multiply by a power of two: 2 << 2 == 2 * 2**2 == 8, 7 << 3 == 7 * 2**3 == 56
        ceil: rounds up to the nearest integer: ceil(1) == 1, ceil(1.1) == 2
        randint(a, b): returns a random integer N such that a <= N <= b; alias for randrange(a, b + 1)
    """

    def __init__(self, N, p=None):
        self.N = N
        if p is None:
            p = nth_prime(1, 1 << max(32, ceil(log2(N))))
        assert p >= N, 'Prime number p should be at least N!'
        self.p = p

    def draw(self):
        a = randint(1, self.p - 1)
        b = randint(0, self.p - 1)
        return lambda x: ((a * x + b) % self.p) % self.N

if __name__ == '__main__':
    N = 50       # bins
    n = 100000   # elements
    H = UniversalHashing(N)
    h = H.draw()
    T = [0] * N
    for _ in range(n):
        x = randint(0, n * 10)
        T[h(x)] += 1
    for i in range(len(T)):
        print(T[i] / n)   # bin frequencies; these should all be approximately equal


Let n be a square number. Using Python, how can we efficiently calculate natural numbers y up to a limit l such that n+y^2 is again a square number?

Using Python, I would like to implement a function that takes a natural number n as input and outputs a list of natural numbers [y1, y2, y3, ...] such that n + y1*y1, n + y2*y2, n + y3*y3 and so forth are again squares.
What I have tried so far is to obtain one y-value using the following function:
def find_square(n: int) -> tuple[int, int]:
    if n % 2 == 1:
        y = (n - 1) // 2
        x = n + y * y
        return (y, x)
    return None
It works fine, e.g. find_square(13689) gives me a correct solution y=6844. It would be great to have an algorithm that yields all possible y-values, such as y=44 or y=156.
The simplest, slow approach is of course, for a given N, just to iterate over all possible Y and check whether N + Y^2 is a square.
But there is a much faster approach using an integer factorization technique:
Notice that to solve the equation N + Y^2 = X^2, that is, to find all integer pairs (X, Y) for a given fixed integer N, we can rewrite it as N = X^2 - Y^2 = (X + Y) * (X - Y), which follows from the well-known difference-of-squares formula.
Now let's rename the two factors as A, B, i.e. N = (X + Y) * (X - Y) = A * B, which means that X = (A + B) / 2 and Y = (A - B) / 2.
Notice that A and B must have the same parity, either both odd or both even, otherwise the divisions by 2 in the formulas above would not be whole.
We will factorize N into all possible pairs of two factors (A, B) of the same parity. For fast factorization the code below uses the simple-to-implement yet quite fast Pollard's rho algorithm, together with two helpers: the Fermat primality test (which quickly checks whether a number is probably prime) and trial-division factorization (which factors out the small factors that could otherwise cause Pollard's rho to fail).
Pollard's rho has time complexity O(N^(1/4)) for a composite number, which is very fast even for 64-bit numbers. Any faster factorization algorithm can be chosen if a bigger space needs to be searched. The running time of my fast algorithm is dominated by the factorization; the remaining part is blazingly fast, just a few loop iterations with simple formulas.
If your N is a square itself (hence we know its root easily), then Pollard's rho can factor N even faster, within O(N^(1/8)) time. Even for 128-bit numbers that means very little work, about 2^16 operations, and I hope you're solving your task for numbers of fewer than 128 bits.
If you want to process a range of possible N values, the fastest way to factorize them all is to use techniques similar to the Sieve of Eratosthenes: with a set of prime numbers you can compute the factors of every N in the range at once. For a range of Ns this is much faster than factorizing each N separately with Pollard's rho.
After factoring N into pairs (A, B), we compute (X, Y) from (A, B) by the formulas above and output the resulting Y values as the solutions of the fast algorithm.
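As a tiny illustration of these formulas before the full implementation below, here is a brute-force variant (the function name is my own) that enumerates the divisor pairs by trial division, fine for modest N:
def find_by_divisors(N):
    Ys = set()
    A = 1
    while A * A <= N:
        if N % A == 0:
            B = N // A
            if (A + B) % 2 == 0:                  # same parity, so X and Y are whole
                X, Y = (A + B) // 2, (B - A) // 2
                assert N + Y * Y == X * X
                Ys.add(Y)
        A += 1
    return sorted(Ys)

print(find_by_divisors(13689))   # [0, 44, 156, 240, 520, 756, 2280, 6844]
                                 # (Y = 0 appears because 13689 is itself a square)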
The example code below is implemented in pure Python. Of course one can use Numba to speed it up; Numba usually gives a 30-200x speedup, bringing Python roughly to the speed of optimized C++. But I thought the main thing here was to implement the fast algorithm; Numba optimization can easily be done afterwards.
I added time measurement to the code. Although it is pure Python, my fast algorithm achieves about an 8500x speedup over the regular brute-force approach for a limit of 1 000 000.
You can change the limit variable to tweak the amount of searched space, or the num_tests variable to tweak the number of different tests.
The code implements both solutions: the fast solution find_fast() described above, plus a very tiny brute-force solution find_slow(), which is very slow because it scans all possible candidates. The slow solution is only used in the tests to check correctness and to measure the speedup.
The code below uses nothing except a few standard Python library modules; no external modules are used.
def find_slow(N):
    import math
    def is_square(x):
        root = int(math.sqrt(float(x)) + 0.5)
        return root * root == x, root
    l = []
    for y in range(N):
        if is_square(N + y ** 2)[0]:
            l.append(y)
    return l

def find_fast(N):
    import itertools, functools
    Prod = lambda it: functools.reduce(lambda a, b: a * b, it, 1)
    fs = factor(N)
    mfs = {}
    for e in fs:
        mfs[e] = mfs.get(e, 0) + 1
    fs = sorted(mfs.items())
    del mfs
    Ys = set()
    for take_a in itertools.product(*[
            (range(v + 1) if k != 2 else range(1, v)) for k, v in fs]):
        A = Prod([p ** t for (p, _), t in zip(fs, take_a)])
        B = N // A
        assert A * B == N, (N, A, B, take_a)
        if A < B:
            continue
        X = (A + B) // 2
        Y = (A - B) // 2
        assert N + Y ** 2 == X ** 2, (N, A, B, X, Y)
        Ys.add(Y)
    return sorted(Ys)

def trial_div_factor(n, limit = None):
    # https://en.wikipedia.org/wiki/Trial_division
    fs = []
    while n & 1 == 0:
        fs.append(2)
        n >>= 1
    all_checked = False
    for d in range(3, (limit or n) + 1, 2):
        if d * d > n:
            all_checked = True
            break
        while True:
            q, r = divmod(n, d)
            if r != 0:
                break
            fs.append(d)
            n = q
    if n > 1 and all_checked:
        fs.append(n)
        n = 1
    return fs, n

def fermat_prp(n, trials = 32):
    # https://en.wikipedia.org/wiki/Fermat_primality_test
    import random
    if n <= 16:
        return n in (2, 3, 5, 7, 11, 13)
    for i in range(trials):
        if pow(random.randint(2, n - 2), n - 1, n) != 1:
            return False
    return True

def pollard_rho_factor(n):
    # https://en.wikipedia.org/wiki/Pollard%27s_rho_algorithm
    import math, random
    fs, n = trial_div_factor(n, 1 << 7)
    if n <= 1:
        return fs
    if fermat_prp(n):
        return sorted(fs + [n])
    for itry in range(8):
        failed = False
        x = random.randint(2, n - 2)
        for cycle in range(1, 1 << 60):
            y = x
            for i in range(1 << cycle):
                x = (x * x + 1) % n
                d = math.gcd(x - y, n)
                if d == 1:
                    continue
                if d == n:
                    failed = True
                    break
                return sorted(fs + pollard_rho_factor(d) + pollard_rho_factor(n // d))
            if failed:
                break
    assert False, f'Pollard Rho failed! n = {n}'

def factor(N):
    import functools
    Prod = lambda it: functools.reduce(lambda a, b: a * b, it, 1)
    fs = pollard_rho_factor(N)
    assert N == Prod(fs), (N, fs)
    return sorted(fs)

def test():
    import random, time
    limit = 1 << 20
    num_tests = 20
    t0, t1 = 0, 0
    for i in range(num_tests):
        if (round(i / num_tests * 1000)) % 100 == 0 or i + 1 >= num_tests:
            print(f'test {i}, ', end = '', flush = True)
        N = random.randrange(limit)
        tb = time.time()
        r0 = find_slow(N)
        t0 += time.time() - tb
        tb = time.time()
        r1 = find_fast(N)
        t1 += time.time() - tb
        assert r0 == r1, (N, r0, r1, t0, t1)
    print(f'\nTime slow {t0:.05f} sec, fast {t1:.05f} sec, speedup {round(t0 / max(1e-6, t1))} times')

if __name__ == '__main__':
    test()
Output:
test 0, test 2, test 4, test 6, test 8, test 10, test 12, test 14, test 16, test 18, test 19,
Time slow 26.28198 sec, fast 0.00301 sec, speedup 8732 times
For the easiest solution, you can try this:
import math
n = 13689   # or we can ask the user to input a square number
for i in range(1, 9999):
    if math.sqrt(n + i**2).is_integer():
        print(i)

Is there any way to make this code run faster? [duplicate]

This question already has answers here:
calculate mod using pow function python
(3 answers)
Closed 1 year ago.
Is there any way to make this code run faster?
g = 53710316114328094
a = 995443176435632644
n = 926093738455418579
print(g**a%n)
It is running for too long and I want to make it faster.
I also tried:
import math
g = 53710316114328094
a = 995443176435632644
n = 926093738455418579
print(math.pow(g**a)%n)
and
g = 53710316114328094
a = 995443176435632644
n = 926093738455418579
def power(a, b):
    ans = 1
    for i in range(b):
        ans *= a
    return ans
print(power(g,a)%n)
All of this code runs for a very long time.
First of all you need to know about the binary exponentiation algorithm. The idea is that instead of computing e.g. 5^46 as 5*5*5*5... 46 times, you can do
5^46 == 5^2 * 5^4 * 5^8 * 5^32
The key here is that you can compute 5^2 quickly from 5 (just square it), then 5^4 quickly from 5^2 (just square it), then 5^8 from 5^4 (just square it), and so on. To determine which 5^K factors to multiply in and which to skip, represent the power as a binary number and multiply into the final result only those components that correspond to a 1 in this binary representation. E.g.
decimal 46 == binary 101110
Thus
5^1 is skipped (rightmost bit is 0), 5^2 is multiplied in (rightmost 1), 5^4 is multiplied in (second 1 from the right), 5^8 is multiplied in (third 1 from the right), 5^16 is skipped (its bit is 0), and 5^32 is multiplied in (the leftmost 1).
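A quick sanity check of this decomposition in Python:
assert bin(46) == '0b101110'
assert 5**46 == 5**2 * 5**4 * 5**8 * 5**32   # 2 + 4 + 8 + 32 == 46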
Next: as written, you would still need to compute an impractically huge power. But there is a shortcut, since you use the modulo operation.
You see, there's a rule that
(a*b % n) == ( (a % n)*(b % n) ) % n
So these are equivalent:
5^46 % n == ( ( ( (5^2 % n) * (5^4 % n) % n) * (5^8 % n) % n) * (5^32 % n) % n)
Notice that none of the numbers being multiplied ever exceeds n, so the overall multiplication chain will not take forever: n is big, but not even remotely as gigantic as g**a.
In code, all of that looks like this (and it computes instantly):
def pow_modulo_n(base, power, n):
    result = 1
    multiplier = base
    while power > 0:
        power, binary_digit = divmod(power, 2)
        if binary_digit == 1:
            result = (result * multiplier) % n
        multiplier = (multiplier**2) % n
    return result % n

g = 53710316114328094
a = 995443176435632644
n = 926093738455418579
print(pow_modulo_n(g, a, n))
This prints
434839845697636246
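For completeness: Python's built-in three-argument pow performs exactly this modular exponentiation in C, so the hand-rolled version above is mainly educational:
print(pow(g, a, n))   # 434839845697636246, same result, instantly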

Is gmpy2 suitable for implementing RSA in Python?

More specifically, is the gmpy2.next_prime function good enough to find the large primes needed? Or should I be using one of the other many gmpy2.*_prp functions?
For example, is the following code good enough for finding suitable primes for encryption?
import os
import gmpy2
from functools import reduce   # reduce is a builtin on Python 2; this import is needed on Python 3

def random(bytez):
    # note: on Python 3, iterating os.urandom() already yields ints, so drop the ord()
    seed = reduce(lambda a, b: (a << 8) | ord(b), os.urandom(bytez), 0)
    return gmpy2.mpz_urandomb(gmpy2.random_state(seed), bytez * 8)

def find_prime(bytez=128):
    p = random(bytez) | 1
    while not gmpy2.is_bpsw_prp(p):
        p = random(bytez) | 1
    return p

def good_pair(p, q):
    n = p * q
    k = gmpy2.ceil(gmpy2.log2(n))
    if abs(p - q) > 2**(k/2 - 100):
        return n
    return 0

def make_rsa_keypair():
    p, q = find_prime(), find_prime()
    n = good_pair(p, q)
    while not n:
        p, q = find_prime(), find_prime()
        n = good_pair(p, q)
    tot = n - (p + q - 1)
    e = (1 << 16) + 1
    d = gmpy2.invert(e, tot)
    return {
        'public': {
            'n': n,
            'e': e,
        },
        'private': {
            'n': n,
            'd': d,
        },
    }
UPDATE: updated the code with the suggestion.
Disclaimer: I maintain gmpy2.
I would recommend using gmpy2.is_bpsw_prp instead of gmpy2.next_prime. The BPSW test will be faster, and there are no known counterexamples. The is_prime and next_prime checks used to use, and may still use, a fixed set of bases, and it is possible to construct composites that pass a fixed series of known tests. IIRC, someone found a composite that passed the first 17 checks. By default 25 checks are done, but it is a weakness.
I am planning to include an APR-CL provable primality test in the next release of gmpy2.
There are specific guidelines for selecting RSA primes that should be followed to prevent accidentally choosing primes that create an n that can be easily factored.

Generating digits of square root of 2

I want to generate the digits of the square root of two to 3 million digits.
I am aware of Newton-Raphson, but I don't have much of a clue how to implement it in C or C++ due to the lack of big-integer support. Can somebody point me in the right direction?
Also, if anybody knows how to do it in python (I'm a beginner), I would also appreciate it.
You could try using the mapping:
a/b -> (a+2b)/(a+b), starting with a = 1, b = 1. This converges to sqrt(2) (in fact it gives the continued fraction approximations of it).
Now the key point: this can be represented as a matrix multiplication (similar to Fibonacci).
If a_n and b_n are the nth numbers in the sequence, then

    [[1, 2], [1, 1]] * [a_n, b_n]^T = [a_(n+1), b_(n+1)]^T

which now gives us

    [[1, 2], [1, 1]]^n * [a_1, b_1]^T = [a_(n+1), b_(n+1)]^T

Thus if the 2x2 matrix is A, we need to compute A^n, which can be done by repeated squaring and uses only integer arithmetic (so you don't have to worry about precision issues).
Also note that the a/b you get will always be in reduced form (as gcd(a, b) = gcd(a+2b, a+b)), so if you are thinking of using a fraction class to represent the intermediate results, don't!
Since the nth denominator grows like (1+sqrt(2))^n, to get 3 million digits you would likely need to compute up to roughly the 3671656th term.
Note: even though you are looking for the ~3.6 millionth term, repeated squaring will allow you to compute the nth term in O(log n) multiplications and additions.
Also, this can easily be made parallel, unlike iterative methods like Newton-Raphson etc.
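For illustration, here is a minimal sketch of the matrix approach just described, using plain Python integers and repeated squaring (the function names and the choice of 50 terms for 20 digits are my own):
def mat_mul(X, Y):
    return [[X[0][0]*Y[0][0] + X[0][1]*Y[1][0], X[0][0]*Y[0][1] + X[0][1]*Y[1][1]],
            [X[1][0]*Y[0][0] + X[1][1]*Y[1][0], X[1][0]*Y[0][1] + X[1][1]*Y[1][1]]]

def mat_pow(M, n):                    # repeated squaring: O(log n) multiplications
    R = [[1, 0], [0, 1]]              # identity
    while n:
        if n & 1:
            R = mat_mul(R, M)
        M = mat_mul(M, M)
        n >>= 1
    return R

def sqrt2_digits(n_terms, digits):
    M = mat_pow([[1, 2], [1, 1]], n_terms)
    a, b = M[0][0] + M[0][1], M[1][0] + M[1][1]   # apply A^n to (a_1, b_1) = (1, 1)
    return (a * 10**digits) // b                  # digits of sqrt(2), scaled by 10**digits

print(sqrt2_digits(50, 20))   # 141421356237309504880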
EDIT: I like this version better than my previous one. It's a general solution that accepts both integers and decimal fractions; with n = 2 and precision = 100000 it takes about two minutes. Thanks to Paul McGuire for his suggestions; other suggestions are welcome!
def sqrt_list(n, precision):
    ndigits = []                    # break n into a list of digits
    n_int = int(n)
    n_fraction = n - n_int
    while n_int:                    # generate list of digits of the integral part
        ndigits.append(n_int % 10)
        n_int //= 10
    if len(ndigits) % 2: ndigits.append(0)      # ndigits will be processed in groups of 2
    decimal_point_index = len(ndigits) // 2     # remember decimal point position
    while n_fraction:               # insert digits from the fractional part
        n_fraction *= 10
        ndigits.insert(0, int(n_fraction))
        n_fraction -= int(n_fraction)
    if len(ndigits) % 2: ndigits.insert(0, 0)   # ndigits will be processed in groups of 2
    rootlist = []
    root = carry = 0                # the algorithm
    while root == 0 or (len(rootlist) < precision and (ndigits or carry != 0)):
        carry = carry * 100
        if ndigits: carry += ndigits.pop() * 10 + ndigits.pop()
        x = 9
        while (20 * root + x) * x > carry:
            x -= 1
        carry -= (20 * root + x) * x
        root = root * 10 + x
        rootlist.append(x)
    return rootlist, decimal_point_index
As for arbitrarily big numbers, you could have a look at The GNU Multiple Precision Arithmetic Library (for C/C++).
For work? Use a library!
For fun? Good for you :)
Write a program to imitate what you would do with pencil and paper. Start with 1 digit, then 2 digits, then 3, ..., ...
Don't worry about Newton or anybody else. Just do it your way.
Here is a short version for calculating the square root of an integer a to a given number of digits of precision. It works by finding the integer square root of a after multiplying by 10 raised to 2 x digits.
def sqroot(a, digits):
    a = a * (10**(2 * digits))
    x_prev = 0
    x_next = 1 * (10**digits)
    while x_prev != x_next:
        x_prev = x_next
        x_next = (x_prev + (a // x_prev)) >> 1
    return x_next
Just a few caveats.
You'll need to convert the result to a string and add the decimal point at the correct location (if you want the decimal point printed).
Converting a very large integer to a string isn't very fast.
Dividing very large integers isn't very fast (in Python) either.
Depending on the performance of your system, it may take an hour or longer to calculate the square root of 2 to 3 million decimal places.
I haven't proven the loop will always terminate. It may oscillate between two values differing in the last digit. Or it may not.
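For example (decimal point omitted, as noted in the first caveat):
print(sqroot(2, 30))   # 1414213562373095048801688724209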
The nicest way is probably using the continued fraction expansion [1; 2, 2, ...] of the square root of two.
def root_two_cf_expansion():
    yield 1
    while True:
        yield 2

def z(a, b, c, d, contfrac):
    for x in contfrac:
        while a > 0 and b > 0 and c > 0 and d > 0:
            t = a // c
            t2 = b // d
            if not t == t2:
                break
            yield t
            a = (10 * (a - c * t))
            b = (10 * (b - d * t))
            # continue with same fraction, don't pull new x
        a, b = x * a + b, a
        c, d = x * c + d, c
    for digit in rdigits(a, c):
        yield digit

def rdigits(p, q):
    while p > 0:
        if p > q:
            d = p // q
            p = p - q * d
        else:
            d = (10 * p) // q
            p = 10 * p - q * d
        yield d

def decimal(contfrac):
    return z(1, 0, 0, 1, contfrac)
decimal(root_two_cf_expansion()) returns an iterator over all the decimal digits. t and t2 in the algorithm are the minimum and maximum values of the next digit; when they are equal, we output that digit.
Note that this does not handle certain exceptional cases, such as negative numbers in the continued fraction.
(This code is an adaptation of Haskell code for handling continued fractions that has been floating around.)
Well, the following is the code that I wrote. It generated a million digits after the decimal point for the square root of 2 in about 60800 seconds for me, but my laptop was sleeping while the program was running; it should be faster than that. You can try to generate 3 million digits, but it might take a couple of days.
def sqrt(number, digits_after_decimal=20):
    import time
    start = time.time()
    original_number = number
    number = str(number)
    digit_pairs = []
    for a in range(len(number)):
        if number[a] == '.':
            decimal_point_location = a
            break
        if a == len(number) - 1:
            number += '.'
            decimal_point_location = a + 1
    if decimal_point_location / 2 != round(decimal_point_location / 2):
        number = '0' + number
        decimal_point_location += 1
    if len(number) / 2 != round(len(number) / 2):
        number += '0'
    number = number[:decimal_point_location] + number[decimal_point_location + 1:]
    decimal_point_ans = int((decimal_point_location - 2) / 2) + 1
    for a in range(0, len(number), 2):
        if number[a] != '0':
            digit_pairs.append(int(number[a:a + 2]))
        else:
            try:
                digit_pairs.append(int(number[a + 1]))
            except IndexError:
                pass
    p = 0
    c = digit_pairs[0]
    ans = ''
    for a in range(len(digit_pairs)):
        x = 0   # reset for each digit (fixes wrong digits for multi-pair inputs)
        while c >= (20 * p + x) * x:
            x += 1
        y = (20 * p + x - 1) * (x - 1)
        p = p * 10 + x - 1
        ans += str(x - 1)
        c -= y
        try:
            c = c * 100 + digit_pairs[a + 1]
        except IndexError:
            c = c * 100
    while c != 0:
        x = 0
        while c >= (20 * p + x) * x:
            x += 1
        y = (20 * p + x - 1) * (x - 1)
        p = p * 10 + x - 1
        ans += str(x - 1)
        c -= y
        c = c * 100
        if len(ans) - decimal_point_ans >= digits_after_decimal:
            break
    ans = ans[:decimal_point_ans] + '.' + ans[decimal_point_ans:]
    total = time.time() - start
    return ans, total
Python already supports big integers out of the box, and if that's the only thing holding you back in C/C++ you can always write a quick container class yourself.
The only problem you've mentioned is a lack of big integers. If you don't want to use a library for that, then are you looking for help writing such a class?
Here's a more efficient integer square root function (in Python 3.x) that should terminate in all cases. It starts with a number much closer to the square root, so it takes fewer steps. Note that int.bit_length requires Python 3.1+. Error checking left out for brevity.
def isqrt(n):
    x = (n >> n.bit_length() // 2) + 1
    result = (x + n // x) // 2
    while abs(result - x) > 1:
        x = result
        result = (x + n // x) // 2
    while result * result > n:
        result -= 1
    return result
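Combined with the same scaling trick as in the sqroot answer above, it yields digits of sqrt(2) directly:
digits = 100
print(isqrt(2 * 10**(2 * digits)))   # first 101 digits of sqrt(2) as one integer
                                     # (the decimal point goes after the leading 1)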

How to implement an efficient infinite generator of prime numbers in Python?

This is not a homework, I am just curious.
INFINITE is the key word here.
I wish to use it as for p in primes(). I believe that this is a built-in function in Haskell.
So, the answer cannot be as naive as "Just do a Sieve".
First of all, you do not know how many consecutive primes will be consumed. Well, suppose you could concoct 100 of them at a time. Would you use the same sieve approach, along with the prime-frequency formula?
I prefer non-concurrent approach.
Thank you for reading (and writing ;) )!
“If I have seen further…”
The erat2 function from the cookbook can be sped up further (by about 20-25%):
erat2a
import itertools as it

def erat2a():
    D = {}
    yield 2
    for q in it.islice(it.count(3), 0, None, 2):
        p = D.pop(q, None)
        if p is None:
            D[q*q] = q
            yield q
        else:
            # old code here:
            # x = p + q
            # while x in D or not (x&1):
            #     x += p
            # changed into:
            x = q + 2*p
            while x in D:
                x += 2*p
            D[x] = p
The not (x&1) check verified that x was odd. However, since both q and p are odd, by adding 2*p half of the steps are avoided, along with the test for oddity.
erat3
If one doesn't mind a little extra fanciness, erat2 can be sped up by 35-40% with the following changes (NB: needs Python 2.7+ or Python 3+ because of the itertools.compress function):
import itertools as it

def erat3():
    D = {9: 3, 25: 5}
    yield 2
    yield 3
    yield 5
    MASK = (1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0)
    MODULOS = frozenset((1, 7, 11, 13, 17, 19, 23, 29))
    for q in it.compress(
            it.islice(it.count(7), 0, None, 2),
            it.cycle(MASK)):
        p = D.pop(q, None)
        if p is None:
            D[q*q] = q
            yield q
        else:
            x = q + 2*p
            while x in D or (x % 30) not in MODULOS:
                x += 2*p
            D[x] = p
The erat3 function takes advantage of the fact that all primes (except 2, 3, 5) modulo 30 fall into only eight residues: the ones included in the MODULOS frozenset. Thus, after yielding the initial three primes, we start from 7 and work only with those candidates.
The candidate filtering uses the itertools.compress function; the "magic" is in the MASK sequence. MASK has 15 elements (there are 15 odd numbers in every 30 numbers, as chosen by the itertools.islice function), with a 1 for every possible candidate, starting from 7. The cycle repeats as specified by the itertools.cycle function.
The introduction of candidate filtering needs another modification: the or (x%30) not in MODULOS check. The erat2 algorithm processed all odd numbers; now that the erat3 algorithm processes only r30 candidates, we need to make sure that all D.keys() can only be such (false) candidates.
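In case it helps to see where the two constants come from, they can be recomputed from scratch (an illustration, not part of the original answer):
# Residues mod 30 that are coprime to 2, 3 and 5:
MODULOS = frozenset(a for a in range(30) if all(a % p for p in (2, 3, 5)))
print(sorted(MODULOS))   # [1, 7, 11, 13, 17, 19, 23, 29]

# One MASK entry per odd number 7, 9, 11, ..., 35 (15 of them per block of 30):
MASK = tuple(1 if (7 + 2*i) % 30 in MODULOS else 0 for i in range(15))
print(MASK)              # (1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0)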
Benchmarks
Results
On an Atom 330 Ubuntu 9.10 server, versions 2.6.4 and 3.1.1+:
$ testit
up to 8192
==== python2 erat2 ====
100 loops, best of 3: 18.6 msec per loop
==== python2 erat2a ====
100 loops, best of 3: 14.5 msec per loop
==== python2 erat3 ====
Traceback (most recent call last):
…
AttributeError: 'module' object has no attribute 'compress'
==== python3 erat2 ====
100 loops, best of 3: 19.2 msec per loop
==== python3 erat2a ====
100 loops, best of 3: 14.1 msec per loop
==== python3 erat3 ====
100 loops, best of 3: 11.7 msec per loop
On an AMD Geode LX Gentoo home server, Python 2.6.5 and 3.1.2:
$ testit
up to 8192
==== python2 erat2 ====
10 loops, best of 3: 104 msec per loop
==== python2 erat2a ====
10 loops, best of 3: 81 msec per loop
==== python2 erat3 ====
Traceback (most recent call last):
…
AttributeError: 'module' object has no attribute 'compress'
==== python3 erat2 ====
10 loops, best of 3: 116 msec per loop
==== python3 erat2a ====
10 loops, best of 3: 82 msec per loop
==== python3 erat3 ====
10 loops, best of 3: 66 msec per loop
Benchmark code
A primegen.py module contains the erat2, erat2a and erat3 functions. Here follows the testing script:
#!/bin/sh
max_num=${1:-8192}
echo up to $max_num
for python_version in python2 python3
do
    for function in erat2 erat2a erat3
    do
        echo "==== $python_version $function ===="
        $python_version -O -m timeit -c \
            -s "import itertools as it, functools as ft, operator as op, primegen; cmp= ft.partial(op.ge, $max_num)" \
            "next(it.dropwhile(cmp, primegen.$function()))"
    done
done
Since the OP asks for an efficient implementation, here's a significant improvement to the ActiveState 2002 code by David Eppstein/Alex Martelli (seen here in his answer): don't record a prime's info in the dictionary until its square is seen among the candidates. This brings the space complexity down to below O(sqrt(n)) instead of O(n), for n primes produced: π(sqrt(n log n)) ~ 2 sqrt(n log n) / log(n log n) ~ 2 sqrt(n / log n). Consequently, the time complexity is also improved, i.e. it runs faster.
It creates a "sliding sieve" as a dictionary of current multiples of each base prime (i.e. each prime below the square root of the current production point), together with their step values:
from itertools import count
                                         # ideone.com/aVndFM
def postponed_sieve():                   # postponed sieve, by Will Ness
    yield 2; yield 3; yield 5; yield 7;  # original code David Eppstein,
    sieve = {}                           #   Alex Martelli, ActiveState Recipe 2002
    ps = postponed_sieve()               # a separate base Primes Supply:
    p = next(ps) and next(ps)            # (3) a Prime to add to dict
    q = p*p                              # (9) its sQuare
    for c in count(9, 2):                # the Candidate
        if c in sieve:                   # c's a multiple of some base prime
            s = sieve.pop(c)             #     i.e. a composite ; or
        elif c < q:
            yield c                      # a prime
            continue
        else:                            # (c==q): # or the next base prime's square:
            s = count(q + 2*p, 2*p)      #    (9+6, by 6 : 15,21,27,33,...)
            p = next(ps)                 #    (5)
            q = p*p                      #    (25)
        for m in s:                      # the next multiple
            if m not in sieve:           # no duplicates
                break
        sieve[m] = s                     # original test entry: ideone.com/WFv4f
(The older, original code here was edited to incorporate changes as seen in the answer by Tim Peters, below.) See also this for a related discussion.
Similar 2-3-5-7 wheel-based code runs ~ 2.15x faster (which is very close to the theoretical improvement of 3/2 * 5/4 * 7/6 = 2.1875).
2022 update: I've recently chanced upon this old "NESL" thing from the 1990s which actually uses the same sqrt-recursion trick. So nothing is new under the sun. :)
The code can be straightforwardly augmented to start the primes enumeration directly from a given value. This can be seen in this JS-based entry.
For posterity, here's a rewrite of Will Ness's beautiful algorithm for Python 3. Some changes are needed (iterators no longer have .next() methods, but there's a new next() builtin function). Other changes are for fun (the new yield from <iterable> replaces four yield statements in the original). More are for readability (I'm not a fan of overusing ;-) 1-letter variable names).
It's significantly faster than the original, but not for algorithmic reasons. The speedup is mostly due to removing the original's add() function, doing that inline instead.
def psieve():
    import itertools
    yield from (2, 3, 5, 7)
    D = {}
    ps = psieve()
    next(ps)
    p = next(ps)
    assert p == 3
    psq = p*p
    for i in itertools.count(9, 2):
        if i in D:        # composite
            step = D.pop(i)
        elif i < psq:     # prime
            yield i
            continue
        else:             # composite, = p*p
            assert i == psq
            step = 2*p
            p = next(ps)
            psq = p*p
        i += step
        while i in D:
            i += step
        D[i] = step
This isn't originally my code, however, it's worth posting. The original can be found here: http://code.activestate.com/recipes/117119/
def gen_primes():
    D = {}
    q = 2   # first integer to test for primality
    while True:
        if q not in D:
            # not marked composite, must be prime
            yield q
            # first multiple of q not already marked
            D[q * q] = [q]
        else:
            for p in D[q]:
                D.setdefault(p + q, []).append(p)
            # no longer need D[q], free memory
            del D[q]
        q += 1
It's a generator, so use it like any other.
primes = gen_primes()
for p in primes:
    print(p)   # note: this loop runs forever; break out once you have enough primes
It takes 1.62s to generate and put into a set, 1 million primes, on my desktop.
Do a segmented sieve, where the size of a segment is determined by available memory or the maximal size of a bitset.
For each segment, represent the numbers in some interval [n; n + segment_size) as a bit set, and sieve with all prime numbers below the square root of the upper bound.
Using a bit set uses less memory than a hash table or tree data structure, because you are working with dense sets of numbers.
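A minimal sketch of that idea (the naming and segment size are my own; a bytearray stands in for a true bitset, and for brevity the generator stops at segment_size**2 instead of growing the base-prime list indefinitely):
def segmented_primes(segment_size=2**16):
    # First segment: a plain sieve of Eratosthenes supplies the base primes.
    seg = bytearray([1]) * segment_size
    seg[0] = seg[1] = 0
    for i in range(2, int(segment_size ** 0.5) + 1):
        if seg[i]:
            seg[i*i::i] = bytearray(len(seg[i*i::i]))
    base = [i for i in range(segment_size) if seg[i]]
    yield from base
    lo = segment_size
    while lo < segment_size ** 2:             # base primes cover sqrt(hi) up to here
        hi = lo + segment_size
        seg = bytearray([1]) * segment_size   # seg[i] represents lo + i
        for p in base:
            if p * p >= hi:
                break
            start = ((lo + p - 1) // p) * p   # first multiple of p in [lo, hi)
            seg[start - lo::p] = bytearray(len(seg[start - lo::p]))
        for i in range(segment_size):
            if seg[i]:
                yield lo + i
        lo = hi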
Another way to do it:
import itertools

def primeseq():
    prime = [2]
    num = 0
    yield 2
    for i in itertools.count(3, 2):
        is_prime = True
        for num in prime:
            if i % num == 0:
                is_prime = False
                break
            elif num ** 2 > i:
                break
        if is_prime:
            prime.append(i)
            yield i
And another answer, more memory-efficient than my erat3 answer here:
import heapq

def heapprimegen():
    hp = []
    yield 2
    yield 3
    cn = 3
    nn, inc = 3, 6
    while 1:
        while cn < nn:
            yield cn
            heapq.heappush(hp, (3 * cn, 2 * cn))
            cn += 2
        cn = nn + 2
        nn, inc = heapq.heappushpop(hp, (nn + inc, inc))
It maintains a heap (a list) of prime multiples rather than a dictionary. It loses some speed, obviously.
Here is a complicated heap-based implementation. It is not much faster than other heap-based implementations (see the speed comparison in another answer of mine), but it uses much less memory.
This implementation uses two heaps (tu and vw) which contain the same number of elements. Each element is an int pair. In order to find all primes up to q**2 (where q is a prime), each heap will contain at most 2*pi(q-1) elements, where pi(x) is the number of positive primes not larger than x. So the total number of integers is at most 4*pi(floor(sqrt(n))). (We could gain a factor of 2 in memory by pushing half as much stuff to the heap, but that would make the algorithm slower.)
Other dict- and heap-based approaches above (e.g. erat2b, heap_prime_gen_squares and heapprimegen) store about 2*pi(n) integers, because they extend their heap or dict every time they find a prime. As a comparison: to find the first 1_000_000 primes, this implementation stores fewer than 4141 integers; the other implementations store more than 1_000_000 integers.
import heapq

def heap_prime_gen_smallmem():
    yield 2
    yield 3
    f = 5
    fmar3 = 2
    q = 7
    q6 = 7 * 6
    qmar3 = 4
    tu = [(25, 30), (35, 30)]
    vw = [(25, 30), (35, 30)]
    while True:
        qmar3 += 2
        if qmar3 == 6:
            qb = q + 4
            q6b = q6 + 24
            qmar3 = 2
        else:
            qb = q + 2
            q6b = q6 + 12
        if q < tu[0][0]:
            d = q * q
            while f < d:
                a, b = vw[0]
                if f < a:
                    yield f
                else:
                    a, b = vw[0]
                    heapq.heapreplace(vw, (a + b, b))
                    a, b = vw[0]
                    while f >= a:
                        heapq.heapreplace(vw, (a + b, b))
                        a, b = vw[0]
                fmar3 += 2
                if fmar3 == 6:
                    f += 4
                    fmar3 = 2
                else:
                    f += 2
            c = q * qb
            heapq.heappush(tu, (d, q6))
            heapq.heappush(tu, (c, q6))
            heapq.heappush(vw, (d, q6))
            heapq.heappush(vw, (c, q6))
        else:
            a, b = tu[0]
            heapq.heapreplace(tu, (a + b, b))
            a, b = tu[0]
            while q >= a:
                heapq.heapreplace(tu, (a + b, b))
                a, b = tu[0]
        q = qb
        q6 = q6b
Here's a pretty fast infinite generator, written in Python 2 but easily adjusted to Python 3. To use it to sum the primes below 10**9, use the following:
from itertools import takewhile
from functools import partial
from operator import gt
print (sum(takewhile(partial(gt, 10**9), prime_gen_inf())))
It's a segmented sieve, faster but obviously less elegant than Will Ness's algorithm.
from operator import mul
from functools import reduce

def prod(x): return reduce(mul, x, 1)

def build_sieve(wheel):
    w = prod(wheel)
    w_phi = prod([p - 1 for p in wheel])
    rems = [a for a in range(w) if all(a % p for p in wheel)]
    assert len(rems) == w_phi
    inv = {a: pow(a, w_phi - 1, w) for a in rems}
    try:
        known_p = wheel + rems[1 : rems.index(rems[1] * rems[1])]
    except ValueError:
        known_p = wheel + rems[1:]
    return wheel, w, w_phi, rems, inv, known_p

# Adjust the chunk variable based on your computer's architecture.
#
# Adjust the line with #! if you don't need "true" infinite. If you don't need
# primes larger than 1<<32, use array('H', []); if 1<<64 use 'L'; if 1<<128 (in
# Python 3) use 'Q'; otherwise use the empty list [].
# To save memory, comment out the lines with #*, and uncomment the
# commented-out lines.
import itertools
from itertools import islice, count, compress, izip
chain_f = itertools.chain.from_iterable
from array import array

def prime_gen_inf(chunk=250000, sieve_info=build_sieve([2, 3, 5, 7])):
    """ Indefinitely yields primes """
    wheel, w, w_phi, rems, inv, known_p = sieve_info
    for p in known_p:
        yield p
    new_n = 0
    while True:
        size = min(chunk, (p * p - new_n) / w)
        sieve = bytearray([1]) * size * w_phi
        n, new_n = new_n, new_n + size * w
        if not n:
            zero = bytearray([0])
            seen = len(known_p) - len(wheel) + 1
            sieve[:seen:1] = zero * seen
            p_gen = islice(prime_gen_inf(), len(wheel), None)
            new_p = next(p_gen)
            ps = []                    #! array('H', [])
            p_invs = bytearray([])     #*
        while new_p * new_p < new_n:
            ps.append(new_p)
            p_invs.append(inv[new_p % w])   #*
            new_p = next(p_gen)
        for p, p_inv, modp in izip(ps, p_invs, [-n % p for p in ps]):   #*
            s = [(modp + p * (p_inv * (r - modp) % w)) / w for r in rems]   #*
        #for p in ps:
        #    s = [(-n % p + p * (inv[p % w] * (r - -n % p) % w)) / w for r in rems]
            for i, start in enumerate(s):
                slice_size = ((size - start - 1) / p + 1)
                sieve[i + start * w_phi :: p * w_phi] = zero * slice_size
        for p in compress(chain_f(izip(*[count(n + r, w) for r in rems])), sieve):
            yield p
Here is a simple but not terribly slow one using a heap instead of a dict:
import heapq

def heap_prime_gen_squares():
    yield 2
    yield 3
    h = [(9, 6)]
    n = 5
    while True:
        a, b = h[0]
        while n < a:
            yield n
            heapq.heappush(h, (n * n, n << 1))
            n += 2
        heapq.heapreplace(h, (a + b, b))   # Replace h[0], which is still (a, b).
        if n == a:                         # n is composite: step past it without yielding
            n += 2
My speed measurements of user time for the first 1 million primes (smaller numbers are better):
postponed_sieve (dict-based): 8.553s
erat2b (dict-based): 9.513s
erat2a (dict-based): 10.313s
heap_prime_gen_smallmem (heap-based): 23.935s
heap_prime_gen_squares (heap-based): 27.302s
heapprimegen (dict-based): 145.029s
So dict-based approaches seem to be the fastest.
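For reference, a minimal sketch of the kind of harness that produces such numbers (my own illustration; it assumes the generator functions above are in scope):
import time
from itertools import islice

def time_first_n(gen_factory, n=1000000):
    t0 = time.time()
    for _ in islice(gen_factory(), n):   # consume the first n primes
        pass
    return time.time() - t0

# e.g. time_first_n(postponed_sieve), time_first_n(erat2a), ...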
Here's a generator that's a little truer to how it's done in Haskell: filtering against composites of known primes, then adding the remaining primes to the list.
def gen_primes():
    primes = []
    i = 2
    while True:
        prime = True
        for p in primes:
            if not (i % p):
                prime = False
                break
        if prime:
            yield i
            primes.append(i)
        i += 1
I know the post is old, but I came across this question... The following code is based on a very simple idea: a growing sieve of Eratosthenes. Although this solution is slower than the best ones here, it is easy to grasp and designed to be readable...
I used integers to store the results of the sieve.
In binary format, an integer is a list of 0s and 1s: 0 at position i if i is not a prime, 1 if it may be a prime.
The requisite infinity is a result of the fact that Python 3 integers are unbounded.
def primes():
    container, size = 1 << 2, 3   # we start with 0b100 (from right to left: 0 and 1 are not primes, 2 is)
    last_prime = 1
    while True:
        prime = next((j for j in range(last_prime + 1, size) if container & 1 << j), None)   # find the next prime
        while not prime:
            container, size = expand(container, size, 2**16)   # add 65536 cells and sieve the container
            prime = next((j for j in range(last_prime + 1, size) if container & 1 << j), None)
        yield prime
        last_prime = prime
How do we expand the container? Just add a bunch of 1s at the left of the container (in binary format) and sieve them. This is identical to the standard sieve, with a slight difference. In the standard sieve, if we find a prime i, we start to cross out cells at i*i, with a step of i.
Here, this may already have been done for the first part of the container. We just need to start at the beginning of the new part of the container if that is farther than i*i.
def expand(container, size, n):
    new_size = size + n
    container += (1 << (new_size + 1) - 1) - (1 << size)   # add n 1's
    for i in range(2, new_size):
        if container & (1 << i):   # i is a prime
            t = sum(1 << j for j in range(max(i, size // i) * i, new_size, i))   # set 1 for all multiples
            container &= ~t        # cross out the cells
    return container, new_size
Test: count the primes below one million:
import itertools
assert 78498 == len(list(itertools.takewhile(lambda p: p < 1000000, primes())))
