Probability of finding a prime (using miller-rabin test) - python

I've implemented the Miller-Rabin primality test, and every function seems to work properly in isolation. However, when I try to find a prime by generating random 70-bit numbers, my program generates on average more than 100,000 numbers before finding one that passes the Miller-Rabin test (10 rounds). This is very strange: the probability that a random odd number of at most 70 bits is prime should be fairly high (more than 1/50, according to the Hadamard-de la Vallée Poussin theorem, i.e. the prime number theorem). What could be wrong with my code? Could the random number generator be producing primes with very low probability? I guess not... Any help is very welcome.
import random
def miller_rabin_rounds(n, t):
    '''Runs Miller-Rabin primality test t times for n'''
    # First find the values r and s such that 2^s * r = n - 1
    r = (n - 1) / 2
    s = 1
    while r % 2 == 0:
        s += 1
        r /= 2
    # Run the test t times
    for i in range(t):
        a = random.randint(2, n - 1)
        y = power_remainder(a, r, n)
        if y != 1 and y != n - 1:
            # check there is no j for which (a^r)^(2^j) = -1 (mod n)
            j = 0
            while j < s - 1 and y != n - 1:
                y = (y * y) % n
                if y == 1:
                    return False
                j += 1
            if y != n - 1:
                return False
    return True
def power_remainder(a, k, n):
    '''Computes (a^k) mod n efficiently by decomposing k into binary'''
    r = 1
    while k > 0:
        if k % 2 != 0:
            r = (r * a) % n
        a = (a * a) % n
        k //= 2
    return r
def random_odd(n):
    '''Generates a random odd number of max n bits'''
    a = random.getrandbits(n)
    if a % 2 == 0:
        a -= 1
    return a
if __name__ == '__main__':
    t = 10 # Number of Miller-Rabin tests per number
    bits = 70 # Number of bits of the random number
    a = random_odd(bits)
    count = 0
    while not miller_rabin_rounds(a, t):
        count += 1
        if count % 10000 == 0:
            print(count)
        a = random_odd(bits)
    print(a)

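The reason this works in Python 2 and not Python 3 is that the two handle integer division differently. In Python 2, 3/2 = 1, whereas in Python 3, 3/2 = 1.5. For a 70-bit n, (n - 1) / 2 therefore becomes a float, which keeps only 53 significant bits, so r no longer holds the exact odd part of n - 1 and the test rejects almost every candidate.
You should force integer division in Python 3 (rather than float division). If you change the code to use integer division (//) like this: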
# First find the values r and s such that 2^s * r = n - 1
r = (n - 1) // 2
s = 1
while r % 2 == 0:
    s += 1
    r //= 2
You should see the correct behaviour regardless of what python version you use.
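To make the effect concrete, here is a minimal sketch (not part of the original code) that runs the same decomposition loop with `//` and with `/` on one 70-bit odd number. The helper names `decompose_int` and `decompose_float` and the sample value `n` are made up for illustration.

```python
def decompose_int(n):
    """Return (s, r) with n - 1 == 2**s * r and r odd, for odd n >= 3."""
    r = (n - 1) // 2
    s = 1
    while r % 2 == 0:
        s += 1
        r //= 2
    return s, r


def decompose_float(n):
    """Same loop with '/', reproducing the Python 3 behaviour of the question."""
    r = (n - 1) / 2   # float: only 53 significant bits survive
    s = 1
    while r % 2 == 0:
        s += 1
        r /= 2
    return s, r


n = 2**70 - 33                      # hypothetical 70-bit odd candidate
s, r = decompose_int(n)
print(s, r, 2**s * r == n - 1)      # exact decomposition
s_f, r_f = decompose_float(n)
print(s_f, r_f, 2**s_f * r_f == n - 1)
```

For this particular n the float quotient is rounded to a power of two, so the float loop strips every factor of 2 and the product 2^s * r no longer equals n - 1.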

Related

Function that prints prime factorization of any number / Python

I'm looking for help writing a function that takes a positive integer n as input and prints its prime factorization to the screen. The output should gather the factors together into a single string so that the results of a call like prime_factorization(60) would be to print the string “60 = 2 x 2 x 3 x 5” to the screen. The following is what I have so far.
UPDATE: I made progress and figured out how to find the prime factorization. However, I still need help printing it the correct way as mentioned above.
""""
Input is a positive integer n
Output is its prime factorization, computed as follows:
"""
import math
def prime_factorization(n):
    while (n % 2) == 0:
        print(2)
        # Turn n into odd number
        n = n / 2
    for i in range (3, int(math.sqrt(n)) + 1, 2):
        while (n % i) == 0:
            print(i)
            n = n / i
    if (n > 2):
        print(n)
prime_factorization(60)
Note that I am trying to print it so if the input is 60, the output reads " 60 = 2 x 2 x 3 x 5 "
You should always separate computation from presentation. You can build the function as a generator that divides the number by increasing divisors (2 and then odds). When you find one that fits, output it and continue with the result of the division. This will only produce prime factors.
Then use that function to obtain the data to print rather than trying to mix in the printing and formatting.
def primeFactors(N):
    p,i = 2,1 # prime divisor and increment
    while p*p<=N: # no need to go beyond √N
        while N%p == 0: # if is integer divisor
            yield p # output prime divisor
            N //= p # remove it from the number
        p,i = p+i,2 # advance to next potential divisor 2, 3, 5, ...
    if N>1: yield N # remaining value is a prime if not 1
output:
N=60
print(N,end=" = ")
print(*primeFactors(N),sep=" x ")
60 = 2 x 2 x 3 x 5
Use a list to store all factors, then print them together in the required format as a string.
import math
def prime_factorization(n):
    factors = [] # to store factors
    while (n % 2) == 0:
        factors.append(2)
        # Turn n into odd number
        n = n // 2 # integer division keeps n an int
    for i in range (3, int(math.sqrt(n)) + 1, 2):
        while (n % i) == 0:
            factors.append(i)
            n = n // i
    if (n > 2):
        factors.append(n)
    print(" x ".join(str(i) for i in factors)) # to get the required string
prime_factorization(60)
Here is a way of doing it with f-strings. In addition, you need to do integer division (with //) to avoid getting floats in your answer.
""""
Input is a positive integer n
Output is its prime factorization, computed as follows:
"""
import math
def prime_factorization(n):
    n_copy = n
    prime_list = []
    while (n % 2) == 0:
        prime_list.append(2)
        # Turn n into odd number
        n = n // 2
    for i in range(3, int(math.sqrt(n)) + 1, 2):
        while (n % i) == 0:
            prime_list.append(i)
            n = n // i
    if (n > 2):
        prime_list.append(n)
    print(f'{n_copy} =', end = ' ')
    for factor in prime_list[:-1]:
        print (f'{factor} x', end=' ' )
    print(prime_list[-1])
prime_factorization(60)
#output: 60 = 2 x 2 x 3 x 5

Weird behaviour of division in python

I'm trying to solve this problem on HackerRank. At some point I have to check whether a number divides n (the given input) or not.
This code works perfectly well except for one test case (not an issue):
if __name__ == '__main__':
    tc = int(input().strip())
    for i_tc in range(tc):
        n = int(input().strip())
        while n % 2 == 0 and n is not 0:
            n >>= 1
        last = 0
        for i in range(3, int(n ** 0.5), 2):
            while n % i == 0 and n > 0:
                last = n
                n = n // i # Concentrate here
        print(n if n > 2 else last)
Now you can see that I'm dividing the number only when i is a factor of n. For example, if i = 2 and n = 4, then n / 2 and n // 2 shouldn't make any difference, right?
But when I use the code below, all test cases fail:
if __name__ == '__main__':
    tc = int(input().strip())
    for i_tc in range(tc):
        n = int(input().strip())
        while n % 2 == 0 and n is not 0:
            n >>= 1
        last = 0
        for i in range(3, int(n ** 0.5), 2):
            while n % i == 0 and n > 0:
                last = n
                n = n / i # Notice this is not //
        print(n if n > 2 else last)
This is not the first time. I faced the same thing on another problem. There I only had to divide by 2, so I used the right-shift operator to get around it. But here I can't do anything, since right shift can't help me.
Why is this happening? If the numbers are small I can't see any difference, but as the numbers become larger it somehow behaves differently.
It is not even intuitive to use // when / fails. What is the reason for this?
The main reason for the difference between n // i and n / i, given that n and i are of type int and n % i == 0, is that
the type of n // i is still int whereas the type of n / i is float, and
integers in Python have unlimited precision whereas the precision of floats is limited (on typical platforms a float is an IEEE 754 double with a 53-bit significand).
Therefore, if the value of n // i is outside the range that is exactly representable by the Python float type, it will not be equal to the computed value of n / i.
Illustration:
>>> (10**16-2)/2 == (10**16-2)//2
True
>>> (10**17-2)/2 == (10**17-2)//2
False
>>> int((10**17-2)//2)
49999999999999999
>>> int((10**17-2)/2)
50000000000000000
>>>
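The boundary where the two operators start to disagree can be checked directly. The following is a small sketch that assumes the usual IEEE 754 double behind Python's float; the helper name `exact_div` is hypothetical and only illustrates staying in integer arithmetic.

```python
def exact_div(n, d):
    """Return n // d as an int, raising if d does not divide n exactly."""
    q, rem = divmod(n, d)
    if rem:
        raise ValueError(f"{d} does not divide {n}")
    return q


# Above 2**53 not every integer has its own float value.
print(float(2**53) == float(2**53 + 1))

n = 10**17 - 2
print(exact_div(n, 2))   # stays an arbitrary-precision int
print(int(n / 2))        # passed through a float on the way
```

Using divmod (or //) keeps the whole computation in int, so the size of n no longer matters.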

Computing Eulers Totient Function

I am trying to find an efficient way to compute Euler's totient function.
What is wrong with this code? It doesn't seem to be working.
def isPrime(a):
    return not ( a < 2 or any(a % i == 0 for i in range(2, int(a ** 0.5) + 1)))
def phi(n):
    y = 1
    for i in range(2,n+1):
        if isPrime(i) is True and n % i == 0 is True:
            y = y * (1 - 1/i)
        else:
            continue
    return int(y)
Here's a much faster, working way, based on this description on Wikipedia:
Thus if n is a positive integer, then φ(n) is the number of integers k in the range 1 ≤ k ≤ n for which gcd(n, k) = 1.
I'm not saying this is the fastest or cleanest, but it works.
from math import gcd
def phi(n):
    amount = 0
    for k in range(1, n + 1):
        if gcd(n, k) == 1:
            amount += 1
    return amount
You have three different problems...
y needs to be equal to n as initial value, not 1
As some have mentioned in the comments, don't use integer division
n % i == 0 is True isn't doing what you think, because Python chains the comparisons: it is evaluated as (n % i == 0) and (0 is True). Even if n % i equals 0, so that 0 == 0 is True, 0 is True is False! Use parentheses, or just drop the comparison to True, since it isn't necessary anyway.
Fixing those problems,
def phi(n):
    y = n
    for i in range(2,n+1):
        if isPrime(i) and n % i == 0:
            y *= 1 - 1.0/i
    return int(y)
Calculating the gcd for every number in the range is not efficient and does not scale. You don't need to iterate through the whole range: if n is not a prime, you can check for prime factors up to its square root; refer to https://stackoverflow.com/a/5811176/3393095.
We must then update phi for every prime by phi = phi*(1 - 1/prime).
def totatives(n):
    phi = int(n > 1 and n)
    for p in range(2, int(n ** .5) + 1):
        if not n % p:
            phi -= phi // p
            while not n % p:
                n //= p
    #if n is > 1 it means it is prime
    if n > 1: phi -= phi // n
    return phi
I'm working on a cryptographic library in Python and this is what I'm using. gcd() is Euclid's method for calculating the greatest common divisor, and phi() is the totient function.
def gcd(a, b):
    while b:
        a, b=b, a%b
    return a
def phi(a):
    b=a-1
    c=0
    while b:
        if not gcd(a,b)-1:
            c+=1
        b-=1
    return c
Most implementations mentioned by other users rely on calling a gcd() or isPrime() function. If you are going to use the phi() function many times, it pays off to calculate these values beforehand. One way of doing this is with a so-called sieve algorithm.
https://stackoverflow.com/a/18997575/7217653 This answer on Stack Overflow provides a fast way of finding all primes below a given number.
OK, now we can replace isPrime() with a lookup in our array.
Now the actual phi function:
Wikipedia gives us a clear example: https://en.wikipedia.org/wiki/Euler%27s_totient_function#Example
phi(36) = phi(2^2 * 3^2) = 36 * (1- 1/2) * (1- 1/3) = 36 * 1/2 * 2/3 = 12
In words, this says that the distinct prime factors of 36 are 2 and 3; half of the thirty-six integers from 1 to 36 are divisible by 2, leaving eighteen; a third of those are divisible by 3, leaving twelve numbers that are coprime to 36. And indeed there are twelve positive integers that are coprime with 36 and lower than 36: 1, 5, 7, 11, 13, 17, 19, 23, 25, 29, 31, and 35.
TL;DR
In other words: we have to find all the distinct prime factors of our number and then, for each prime factor, scale the number using n *= 1 - 1/prime_factor.
MAX = 10**5
# CREDIT TO https://stackoverflow.com/a/18997575/7217653
def sieve_for_primes_to(n):
    size = n//2
    sieve = [1]*size
    limit = int(n**0.5)
    for i in range(1,limit):
        if sieve[i]:
            val = 2*i+1
            tmp = ((size-1) - i)//val
            sieve[i+val::val] = [0]*tmp
    return [2] + [i*2+1 for i, v in enumerate(sieve) if v and i>0]
PRIMES = sieve_for_primes_to(MAX)
print("Primes generated")
def phi(n):
    original_n = n
    prime_factors = []
    prime_index = 0
    while n > 1: # As long as there are more factors to be found
        p = PRIMES[prime_index]
        if (n % p == 0): # is this prime a factor?
            prime_factors.append(p)
            while n % p == 0: # divide this factor out completely (integer test, no float rounding)
                n = n // p
        prime_index += 1
    for v in prime_factors: # Now we have the prime factors, we do the same calculation as wikipedia
        original_n -= original_n // v # same as multiplying by (1 - 1/v), but in exact integer arithmetic
    return original_n
print(phi(36)) # = phi(2**2 * 3**2) = 36 * (1- 1/2) * (1- 1/3) = 36 * 1/2 * 2/3 = 12
It looks like you're trying to use Euler's product formula, but you're not calculating the number of primes which divide a. You're calculating the number of elements relatively prime to a.
In addition, since 1 and i are both integers, so is the division in Python 2; in this case you always get 0.
With regard to efficiency, I haven't noticed anyone mention that gcd(k,n)=gcd(n-k,n). Using this fact can save roughly half the work needed for the methods that use the gcd. Just start the count at 2 (because 1/n and (n-1)/n will always be irreducible) and add 2 each time the gcd is one.
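A minimal sketch of that idea, assuming the standard convention phi(1) = phi(2) = 1. The function name `phi_symmetric` is hypothetical. Instead of starting the count at 2, it counts only the first half of the range and doubles the result, which relies on the same symmetry.

```python
from math import gcd


def phi_symmetric(n):
    """Euler's totient via gcd, testing only k <= (n - 1) // 2."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    if n <= 2:
        return 1
    # For n > 2, k and n - k are distinct and coprime to n together,
    # and k == n / 2 (n even) is never coprime to n.
    half = sum(1 for k in range(1, (n - 1) // 2 + 1) if gcd(k, n) == 1)
    return 2 * half


print(phi_symmetric(36))
```

This is still linear in n, so it only halves the work of the plain gcd count; the factorization-based versions remain faster for large n.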
Here is a shorter implementation of orlp's answer.
from math import gcd
def phi(n): return sum([gcd(n, k)==1 for k in range(1, n+1)])
As others have already mentioned it leaves room for performance optimization.
Actually, to calculate phi(n) for any number n, we use the formula
phi(n) = n * (1 - 1/p1) * (1 - 1/p2) * ... * (1 - 1/pk)
where p1, ..., pk are the distinct prime factors of n.
So, you have a few mistakes in your code:
1. y should be initialized to n.
2. In 1/i, both 1 and i are integers, so in Python 2 the result is also an integer, which leads to wrong results.
Here is the code with the required corrections.
def phi(n):
    y = n
    for i in range(2,n+1):
        if isPrime(i) and n % i == 0 :
            y -= y // i # integer division keeps the result exact
        else:
            continue
    return int(y)

Rabin-Miller Strong Pseudoprime Test Implementation won't work

Been trying to implement the Rabin-Miller Strong Pseudoprime Test today.
Have used Wolfram MathWorld as a reference; lines 3-5 sum up my code pretty much.
However, when I run the program, it (sometimes) says that primes (even small ones such as 5, 7, 11) are not prime. I've looked over the code for a very long while and cannot figure out what is wrong.
For help I've looked at this site as well as many other sites, but most use another definition (probably the same, but since I'm new to this kind of math, I can't see the obvious connection).
My Code:
import random
def RabinMiller(n, k):
    # obviously not prime
    if n < 2 or n % 2 == 0:
        return False
    # special case
    if n == 2:
        return True
    s = 0
    r = n - 1
    # factor n - 1 as 2^(r)*s
    while r % 2 == 0:
        s = s + 1
        r = r // 2 # floor
    # k = accuracy
    for i in range(k):
        a = random.randrange(1, n)
        # a^(s) mod n = 1?
        if pow(a, s, n) == 1:
            return True
        # a^(2^(j) * s) mod n = -1 mod n?
        for j in range(r):
            if pow(a, 2**j*s, n) == -1 % n:
                return True
    return False
print(RabinMiller(7, 5))
How does this differ from the definition given at Mathworld?
1. Comments on your code
A number of the points I'll make below were noted in other answers, but it seems useful to have them all together.
In the section
s = 0
r = n - 1
# factor n - 1 as 2^(r)*s
while r % 2 == 0:
    s = s + 1
    r = r // 2 # floor
you've got the roles of r and s swapped: you've actually factored n − 1 as 2^s · r. If you want to stick to the MathWorld notation, then you'll have to swap r and s in this section of the code:
# factor n - 1 as 2^(r)*s, where s is odd.
r, s = 0, n - 1
while s % 2 == 0:
    r += 1
    s //= 2
In the line
for i in range(k):
the variable i is unused: it's conventional to name such variables _.
You pick a random base between 1 and n − 1 inclusive:
a = random.randrange(1, n)
This is what it says in the MathWorld article, but that article is written from the mathematician's point of view. In fact it is useless to pick the base 1, since 1^s = 1 (mod n) and you'll waste a trial. Similarly, it's useless to pick the base n − 1, since s is odd and so (n − 1)^s = −1 (mod n). Mathematicians don't have to worry about wasted trials, but programmers do, so write instead:
a = random.randrange(2, n - 1)
(n needs to be at least 4 for this optimization to work, but we can easily arrange that by returning True at the top of the function when n = 3, just as you do for n = 2.)
As noted in other replies, you've misunderstood the MathWorld article. When it says that "n passes the test" it means that "n passes the test for the base a". The distinguishing fact about primes is that they pass the test for all bases. So when you find that a^s = 1 (mod n), what you should do is to go round the loop and pick the next base to test against.
# a^(s) = 1 (mod n)?
x = pow(a, s, n)
if x == 1:
    continue
There's an opportunity for optimization here. The value x that we've just computed is a^(2^0 · s) (mod n). So we could test it immediately and save ourselves one loop iteration:
# a^(s) = ±1 (mod n)?
x = pow(a, s, n)
if x == 1 or x == n - 1:
    continue
In the section where you calculate a^(2^j · s) (mod n), each of these numbers is the square of the previous number (modulo n). It's wasteful to calculate each from scratch when you could just square the previous value. So you should write this loop as:
# a^(2^(j) * s) = -1 (mod n)?
for _ in range(r - 1):
    x = pow(x, 2, n)
    if x == n - 1:
        break
else:
    return False
It's a good idea to test for divisibility by small primes before trying Miller–Rabin. For example, in Rabin's 1977 paper he says:
In implementing the algorithm we incorporate some laborsaving steps. First we test for divisibility by any prime p < N, where, say N = 1000.
2. Revised code
Putting all this together:
from random import randrange
small_primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31] # etc.
def probably_prime(n, k):
    """Return True if n passes k rounds of the Miller-Rabin primality
    test (and is probably prime). Return False if n is proved to be
    composite.
    """
    if n < 2: return False
    for p in small_primes:
        if n < p * p: return True
        if n % p == 0: return False
    r, s = 0, n - 1
    while s % 2 == 0:
        r += 1
        s //= 2
    for _ in range(k):
        a = randrange(2, n - 1)
        x = pow(a, s, n)
        if x == 1 or x == n - 1:
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True
In addition to what Omri Barel has said, there is also a problem with your for loop. You will return true if you find one a that passes the test. However, all a have to pass the test for n to be a probable prime.
I'm wondering about this piece of code:
# factor n - 1 as 2^(r)*s
while r % 2 == 0:
    s = s + 1
    r = r // 2 # floor
Let's take n = 7. So n - 1 = 6. We can express n - 1 as 2^1 * 3. In this case r = 1 and s = 3.
But the code above finds something else. It starts with r = 6, so r % 2 == 0. Initially, s = 0 so after one iteration we have s = 1 and r = 3. But now r % 2 != 0 and the loop terminates.
We end up with s = 1 and r = 3 which is clearly incorrect: 2^r * s = 8.
You should not update s in the loop. Instead, you should count how many times you can divide by 2 (this will be r) and the result after the divisions will be s. In the example of n = 7, n - 1 = 6, we can divide it once (so r = 1) and after the division we end up with 3 (so s = 3).
Here's my version:
# miller-rabin pseudoprimality checker
from random import randrange
def isStrongPseudoprime(n, a):
    d, s = n-1, 0
    while d % 2 == 0:
        d, s = d//2, s+1 # integer division: pow() needs an int exponent in Python 3
    t = pow(a, d, n)
    if t == 1:
        return True
    while s > 0:
        if t == n - 1:
            return True
        t, s = pow(t, 2, n), s - 1
    return False
def isPrime(n, k):
    if n % 2 == 0:
        return n == 2
    for i in range(k): # k rounds with random bases
        a = randrange(2, n)
        if not isStrongPseudoprime(n, a):
            return False
    return True
If you want to know more about programming with prime numbers, I modestly recommend this essay on my blog.
You should also have a look at Wikipedia, which lists known sets of "random" bases that give guaranteed answers below a given bound:
if n < 1,373,653, it is enough to test a = 2 and 3;
if n < 9,080,191, it is enough to test a = 31 and 73;
if n < 4,759,123,141, it is enough to test a = 2, 7, and 61;
if n < 2,152,302,898,747, it is enough to test a = 2, 3, 5, 7, and 11;
if n < 3,474,749,660,383, it is enough to test a = 2, 3, 5, 7, 11, and 13;
if n < 341,550,071,728,321, it is enough to test a = 2, 3, 5, 7, 11, 13, and 17;
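As a sketch of how such a table can be used, the following test applies only the last row of the list above (bases 2 through 17, valid for n < 341,550,071,728,321). The names `_BASES`, `_LIMIT`, `is_strong_probable_prime` and `is_prime_deterministic` are made up for this example and do not come from any of the answers.

```python
_BASES = (2, 3, 5, 7, 11, 13, 17)
_LIMIT = 341_550_071_728_321  # bound quoted above for these bases


def is_strong_probable_prime(n, a):
    """One Miller-Rabin round for odd n > 2 and base a."""
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    x = pow(a, d, n)
    if x == 1 or x == n - 1:
        return True
    for _ in range(s - 1):
        x = pow(x, 2, n)
        if x == n - 1:
            return True
    return False


def is_prime_deterministic(n):
    """Primality test with fixed bases, only valid below _LIMIT."""
    if n >= _LIMIT:
        raise ValueError("n is outside the range covered by these bases")
    if n < 2:
        return False
    for p in _BASES:          # also handles even n and the bases themselves
        if n == p:
            return True
        if n % p == 0:
            return False
    return all(is_strong_probable_prime(n, a) for a in _BASES)


print(is_prime_deterministic(2**31 - 1))
print(is_prime_deterministic(1_373_653))
```

For the smaller rows of the table one could pick a shorter base tuple depending on n; that is left out here to keep the sketch short.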

How can I improve my code for euler 14?

I solved Euler problem 14, but the program I used is very slow. I had a look at what others did, and they all came up with elegant solutions. I tried to understand their code without much success.
Here is my code (the function to determine the length of the Collatz chain):
def collatz(n):
    a=1
    while n!=1:
        if n%2==0:
            n=n//2
        else:
            n=3*n+1
        a+=1
    return a
Then I used brute force. It is slow and I know it is weak. Could someone tell me why my code is weak and how I can improve my code in plain English.
Bear in mind that I am a beginner, my programming skills are basic.
Rather than computing every possible chain from the start to the end, you can keep a cache of chain starts and their resulting length. For example, for the chain
13 40 20 10 5 16 8 4 2 1
you could remember the following:
The Collatz chain that starts with 13 has length 10
The Collatz chain that starts with 40 has length 9
The Collatz chain starting with 20 has length 8
... and so on.
We can then use this saved information to stop computing a chain as soon as we encounter a number which is already in our cache.
Implementation
Use dictionaries in Python to associate starting numbers with their chain length:
chain_sizes = {}
chain_sizes[13] = 10
chain_sizes[40] = 9
chain_sizes[40] # => 9
20 in chain_sizes # => False
Now you just have to adapt your algorithm to make use of this dictionary (filling it with values as well as looking up intermediate numbers).
By the way, this can be expressed very nicely using recursion. The chain sizes that can occur here will not overflow the stack :)
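A minimal sketch of the caching idea described above, written iteratively rather than recursively. It reuses the `chain_sizes` name from the snippet; the function name `collatz_length` is hypothetical.

```python
chain_sizes = {1: 1}  # cache: starting number -> length of its Collatz chain


def collatz_length(n):
    """Length of the Collatz chain starting at n, reusing cached results."""
    path = []
    while n not in chain_sizes:      # walk until we reach a known number
        path.append(n)
        n = n // 2 if n % 2 == 0 else 3 * n + 1
    length = chain_sizes[n]
    for m in reversed(path):         # fill the cache on the way back
        length += 1
        chain_sizes[m] = length
    return length


print(collatz_length(13))
print(chain_sizes[40])
```

The brute-force search then becomes something like max(range(1, limit), key=collatz_length), where most calls stop after a few steps because they hit an entry that is already cached.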
Briefly, because my English is horrible ;-)
For all n >= 1, C(n) = n/2      if n is even,
                       3*n + 1  if n is odd
It is possible to calculate several consecutive iterates at once.
kth iterate of a number ending in k 0 bits:
C^k(a*2^k) = a
(2k)th iterate of a number ending in k 1 bits:
C^(2k)(a*2^k + 2^k - 1) = a*3^k + 3^k - 1 = (a + 1)*3^k - 1
Cf. formula on Wikipédia article (in French); see also my website (in French), and Module tnp1 in my Python package DSPython.
Combine the following code with the technique of memoization explained by Niklas B :
#!/usr/bin/env python
# -*- coding: latin-1 -*-
from __future__ import division # Python 3 style in Python 2
from __future__ import print_function # Python 3 style in Python 2
def C(n):
    """Pre: n: int >= 1
    Result: int >= 1"""
    return (n//2 if n%2 == 0
            else n*3 + 1)
def Ck(n, k):
    """Pre: n: int >= 1
    k: int >= 0
    Result: int >= 1"""
    while k > 0:
        while (n%2 == 0) and k: # n even
            n //= 2
            k -= 1
        if (n == 1) and k:
            n = 4
            k -= 1
        else:
            nb = 0
            while (n > 1) and n%2 and (k > 1): # n odd != 1
                n //= 2
                nb += 1
                k -= 2
            if n%2 and (k == 1):
                n = (n + 1)*(3**(nb + 1)) - 2
                k -= 1
            elif nb:
                n = (n + 1)*(3**nb) - 1
    return n
def C_length(n):
    """Pre: n: int >= 1
    Result: int >= 1"""
    l = 1
    while n > 1:
        while (n > 1) and (n%2 == 0): # n even
            n //= 2
            l += 1
        nb = 0
        while (n > 1) and n%2: # n odd != 1
            n //= 2
            nb += 1
            l += 2
        if nb:
            n = (n + 1)*(3**nb) - 1
    return l
if __name__ == '__main__':
    for n in range(1, 51):
        print(n, ': length =', C_length(n))
