I want to factorize a large number using Fermat's factorization method. This is how I implemented it:
import numpy as np
def fac(n):
    x = np.ceil(np.sqrt(n))
    y = x*x - n
    while not np.sqrt(y).is_integer():
        x += 1
        y = x*x - n
    return (x + np.sqrt(y), x - np.sqrt(y))
Using this method I want to factor N into its components. Note that N=p*q, where p and q are prime.
I chose the following values to compute N:
p = 34058934059834598495823984675767545695711020949846845989934523432842834738974239847294083409583495898523872347284789757987987387543533846141.0
q = 34058934059834598495823984675767545695711020949846845989934523432842834738974239847294083409583495898523872347284789757987987387543533845933.0
and defined N
N = p*q
Now I factor N:
r = fac(N)
However, the factorization is not correct; this check returns False:
int(r[0])*int(r[1]) == N
It does work for smaller ints:
fac(65537)
Out[1]: (65537.0, 1.0)
I'm quite sure the reason is numerical precision at some point.
I tried calculating N in numpy using object types:
N = np.dot(np.array(p).astype(object), np.array(q).astype(object))
but it doesn't help; numpy still requires a float for its sqrt function.
I also tried using the math library instead of numpy. Its sqrt function does not seem to require a float, but I ultimately ran into precision issues as well.
Python ints are multiple-precision numbers. But numpy is a wrapper around low-level C libraries to speed up operations. The downside is that it cannot handle those multiple-precision numbers. Worse, if you try to use np.sqrt on them, they will be converted to floating point numbers (C double or numpy float64), which have a precision of about 15 decimal digits.
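The loss is easy to demonstrate (a minimal sketch, assuming numpy is installed):

import numpy as np

n = 10**30 + 1          # a 31-digit integer, far beyond 53-bit precision
f = np.float64(n)       # silently rounded to the nearest double
print(int(f) == n)      # False: the low digits are gone
print(int(f) - 10**30)  # a nonzero rounding error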
But as Python's int type is already a multiprecision type, you can use math.sqrt to get an approximate value of the true square root, and then use Newton's method to find a closer value:
import math

def isqrt(n):
    x = int(math.sqrt(n))
    old = None
    while True:
        d = (n - x * x) // (2 * x)
        if d == 0: break
        if d == 1:          # infinite loop prevention
            if old is None:
                old = 1
            else: break
        x += d
    return x
Using it, your fac function could become:
def fac(n):
    x = isqrt(n)
    if x*x < n: x += 1
    y = x * x - n
    while True:
        z = isqrt(y)
        if z*z == y: break
        x += 1
        y = x*x - n
    return x+z, x-z
Demo:
p = 34058934059834598495823984675767545695711020949846845989934523432842834738974239847294083409583495898523872347284789757987987387543533846141
q = 34058934059834598495823984675767545695711020949846845989934523432842834738974239847294083409583495898523872347284789757987987387543533845933
N = p*q
print(fac(N) == (p,q))
prints True, as expected.
I was bored at work and was playing with some math and python coding, when I noticed the following:
Recursively (or with a for loop) you simply add integers together to get a given Fibonacci number. However, there is also a direct equation for calculating Fibonacci numbers, and for large n this equation gives answers that are, frankly, quite wrong with respect to the recursively calculated Fibonacci number.
I imagine this is due to rounding and floating point arithmetic (sqrt(5) is irrational, after all); if so, can anyone point me in a direction on how I could modify the fib_calc_direct function to return a more accurate result?
Thanks!
import numpy as np

def fib_calc_recur(n, ii = 0, jj = 1):
    # n is the index of the nth Fibonacci number, F_n, where F_0 = 0, F_1 = 1, ...
    if n == 0:
        return ii
    if n == 1:
        return jj
    else:
        return fib_calc_recur(n - 1, jj, ii + jj)

def fib_calc_direct(n):
    a = (1 + np.sqrt(5))/2
    b = (1 - np.sqrt(5))/2
    f = (1/np.sqrt(5)) * (a**n - b**n)
    return f
You could make use of Decimal numbers, and set the precision depending on the magnitude of n.
Not your question, but I'd use an iterative version of the addition method. Here is a script that runs both calculations (naive addition, direct with Decimal) for values of n up to 4000:
def fib_calc_iter(n):
    a, b = 0, 1
    if n < 2:
        return n
    for _ in range(1, n):
        a, b = b, a + b
    return b

from decimal import Decimal, getcontext

def fib_calc_decimal(n):
    getcontext().prec = n // 4 + 3  # choose a precision good enough for this n
    sqrt5 = Decimal(5).sqrt()
    da = (1 + sqrt5) / 2
    db = (1 - sqrt5) / 2
    f = (da**n - db**n) / sqrt5
    return int(f + Decimal(0.5))  # round to nearest int

# Test it...
for n in range(1, 4000):
    x = fib_calc_iter(n)
    y = fib_calc_decimal(n)
    if x != y:
        print(f"Difference found for n={n}.\nNaive method={x}.\nDecimal method={y}")
        break
else:
    print("No differences found")
Using Python, I would like to implement a function that takes a natural number n as input and outputs a list of natural numbers [y1, y2, y3, ...] such that n + y1*y1, n + y2*y2, n + y3*y3, and so forth are each again a square.
What I tried so far is to obtain one y-value using the following function:
def find_square(n: int) -> tuple[int, int]:
    if n % 2 == 1:
        y = (n-1)//2
        x = n + y*y
        return (y, x)
    return None
It works fine; e.g. find_square(13689) gives me a correct solution y=6844. It would be great to have an algorithm that yields all possible y-values, such as y=44 or y=156.
The simplest (slow) approach is of course, for a given N, to iterate over all possible Y and check whether N + Y^2 is a square.
But there is a much faster approach using an integer factorization technique:
Notice that to solve the equation N + Y^2 = X^2, that is, to find all integer pairs (X, Y) for a given fixed integer N, we can rewrite the equation as N = X^2 - Y^2 = (X + Y) * (X - Y), by the well-known difference-of-squares formula.
Now rename the two factors as A, B, i.e. N = (X + Y) * (X - Y) = A * B, which means that X = (A + B) / 2 and Y = (A - B) / 2.
Notice that A and B must have the same parity, either both odd or both even; otherwise the divisions by 2 in the formulas above would not be exact.
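For small N the divisor-pair idea can be illustrated directly with trial division (a minimal sketch; find_ys_naive is a hypothetical helper, the real solution below uses Pollard Rho instead):

def find_ys_naive(N):
    ys = set()
    for B in range(1, int(N**0.5) + 1):
        if N % B == 0:
            A = N // B                # so N = A * B with A >= B
            if (A - B) % 2 == 0:      # A and B must have the same parity
                ys.add((A - B) // 2)  # Y = (A - B) / 2, X = (A + B) / 2
    return sorted(ys)

print(find_ys_naive(13689))  # [0, 44, 156, 240, 520, 756, 2280, 6844]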
We will factorize N into all possible pairs of factors (A, B) of the same parity. For fast factorization, the code below uses the simple-to-implement yet quite fast Pollard Rho algorithm, together with two helper algorithms: the Fermat primality test (which quickly checks whether a number is probably prime) and trial division factorization (which factors out small factors that could otherwise cause Pollard Rho to fail).
Pollard Rho has time complexity O(N^(1/4)) for a composite number, which is very fast even for 64-bit numbers. A faster factorization algorithm can be substituted if a bigger space needs to be searched. The running time of my fast algorithm is dominated by the factorization; the rest is blazingly fast, just a few loop iterations with simple formulas.
If your N is a square itself (so we know its root easily), then Pollard Rho can factor N even faster, within O(N^(1/8)) time. Even for 128-bit numbers that means very little work, about 2^16 operations, and I hope you're solving your task for numbers of fewer than 128 bits.
If you want to process a range of possible N values, then the fastest way to factorize them all is a technique similar to the Sieve of Eratosthenes: using a set of prime numbers, it computes the factors of every N within some range at once, which is much faster than factorizing each N separately with Pollard Rho.
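As a rough sketch of that sieve idea (smallest_prime_factors and factor_with_sieve are hypothetical helpers, separate from the solution code below): precompute the smallest prime factor of every number up to a bound, after which factoring any n in the range takes only as many steps as n has prime factors:

def smallest_prime_factors(limit):
    spf = list(range(limit + 1))              # spf[i] will hold the smallest prime factor of i
    for p in range(2, int(limit**0.5) + 1):
        if spf[p] == p:                       # p is prime
            for m in range(p * p, limit + 1, p):
                if spf[m] == m:
                    spf[m] = p
    return spf

def factor_with_sieve(n, spf):
    fs = []
    while n > 1:
        fs.append(spf[n])
        n //= spf[n]
    return fs

spf = smallest_prime_factors(10**6)
print(factor_with_sieve(13689, spf))          # [3, 3, 3, 3, 13, 13]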
After factoring N into pairs (A, B) we compute (X, Y) from (A, B) by the formulas above, and output the resulting Y values as the solutions of the fast algorithm.
The following example code is implemented in pure Python. Of course one can use Numba to speed it up; Numba usually gives a 30-200x speedup, reaching roughly the speed of optimized C++. But I thought the main thing here was to implement the fast algorithm; Numba optimization can easily be done afterwards.
I added time measurement to the code. Although it is pure Python, my fast algorithm still achieves about an 8500x speedup over the regular brute-force approach for a limit of 1,000,000.
You can change the limit variable to tweak the amount of searched space, or the num_tests variable to tweak the number of different tests.
The code implements both solutions: the fast solution find_fast() described above, plus a tiny brute-force solution find_slow(), which is very slow as it scans all possible candidates. The slow solution is only used to check correctness in the tests and to measure the speedup.
The code uses nothing except a few standard Python library modules; no external modules are required.
def find_slow(N):
    import math
    def is_square(x):
        root = int(math.sqrt(float(x)) + 0.5)
        return root * root == x, root
    l = []
    for y in range(N):
        if is_square(N + y ** 2)[0]:
            l.append(y)
    return l

def find_fast(N):
    import itertools, functools
    Prod = lambda it: functools.reduce(lambda a, b: a * b, it, 1)
    fs = factor(N)
    mfs = {}
    for e in fs:
        mfs[e] = mfs.get(e, 0) + 1
    fs = sorted(mfs.items())
    del mfs
    Ys = set()
    for take_a in itertools.product(*[
            (range(v + 1) if k != 2 else range(1, v)) for k, v in fs]):
        A = Prod([p ** t for (p, _), t in zip(fs, take_a)])
        B = N // A
        assert A * B == N, (N, A, B, take_a)
        if A < B:
            continue
        X = (A + B) // 2
        Y = (A - B) // 2
        assert N + Y ** 2 == X ** 2, (N, A, B, X, Y)
        Ys.add(Y)
    return sorted(Ys)

def trial_div_factor(n, limit = None):
    # https://en.wikipedia.org/wiki/Trial_division
    fs = []
    while n & 1 == 0:
        fs.append(2)
        n >>= 1
    all_checked = False
    for d in range(3, (limit or n) + 1, 2):
        if d * d > n:
            all_checked = True
            break
        while True:
            q, r = divmod(n, d)
            if r != 0:
                break
            fs.append(d)
            n = q
    if n > 1 and all_checked:
        fs.append(n)
        n = 1
    return fs, n

def fermat_prp(n, trials = 32):
    # https://en.wikipedia.org/wiki/Fermat_primality_test
    import random
    if n <= 16:
        return n in (2, 3, 5, 7, 11, 13)
    for i in range(trials):
        if pow(random.randint(2, n - 2), n - 1, n) != 1:
            return False
    return True

def pollard_rho_factor(n):
    # https://en.wikipedia.org/wiki/Pollard%27s_rho_algorithm
    import math, random
    fs, n = trial_div_factor(n, 1 << 7)
    if n <= 1:
        return fs
    if fermat_prp(n):
        return sorted(fs + [n])
    for itry in range(8):
        failed = False
        x = random.randint(2, n - 2)
        for cycle in range(1, 1 << 60):
            y = x
            for i in range(1 << cycle):
                x = (x * x + 1) % n
                d = math.gcd(x - y, n)
                if d == 1:
                    continue
                if d == n:
                    failed = True
                    break
                return sorted(fs + pollard_rho_factor(d) + pollard_rho_factor(n // d))
            if failed:
                break
    assert False, f'Pollard Rho failed! n = {n}'

def factor(N):
    import functools
    Prod = lambda it: functools.reduce(lambda a, b: a * b, it, 1)
    fs = pollard_rho_factor(N)
    assert N == Prod(fs), (N, fs)
    return sorted(fs)

def test():
    import random, time
    limit = 1 << 20
    num_tests = 20
    t0, t1 = 0, 0
    for i in range(num_tests):
        if (round(i / num_tests * 1000)) % 100 == 0 or i + 1 >= num_tests:
            print(f'test {i}, ', end = '', flush = True)
        N = random.randrange(limit)
        tb = time.time()
        r0 = find_slow(N)
        t0 += time.time() - tb
        tb = time.time()
        r1 = find_fast(N)
        t1 += time.time() - tb
        assert r0 == r1, (N, r0, r1, t0, t1)
    print(f'\nTime slow {t0:.05f} sec, fast {t1:.05f} sec, speedup {round(t0 / max(1e-6, t1))} times')

if __name__ == '__main__':
    test()
Output:
test 0, test 2, test 4, test 6, test 8, test 10, test 12, test 14, test 16, test 18, test 19,
Time slow 26.28198 sec, fast 0.00301 sec, speedup 8732 times
For the easiest solution, you can try this:
import math

n = 13689  # or we can ask the user to input a square number
for i in range(1, 9999):
    if math.sqrt(n + i**2).is_integer():
        print(i)
I'm trying to evaluate a Taylor polynomial for the natural logarithm, ln(x), centred at a=1 in Python. I'm using the series given on Wikipedia; however, when I try a simple calculation like ln(2.7), instead of giving me something close to 1 it gives me a gigantic number. Is there something obvious that I'm doing wrong?
def log(x):
    n = 1000
    s = 0
    for i in range(1, n):
        s += ((-1)**(i+1)) * ((x-1)**i) / i
    return s
The Taylor series used is ln(x) = (x-1) - (x-1)^2/2 + (x-1)^3/3 - ..., and log(2.7) gives a gigantic result instead of a value near 1.
EDIT: If anyone stumbles across this, an alternative way to evaluate the natural logarithm of some real number is to use numerical integration (e.g. Riemann sum, midpoint rule, trapezoid rule, Simpson's rule, etc.) to evaluate the integral that is often used to define the natural logarithm: ln(x) = the integral from 1 to x of dt/t.
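For example, a minimal sketch of that integration idea using the midpoint rule (ln_midpoint is a hypothetical name):

def ln_midpoint(x, n=100000):
    # approximate the integral of 1/t from 1 to x with n midpoint slices
    h = (x - 1) / n
    return sum(h / (1 + (i + 0.5) * h) for i in range(n))

print(ln_midpoint(2.7))  # ~0.9932..., close to math.log(2.7)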
That series only converges when |x - 1| <= 1, i.e. for 0 < x <= 2. For larger x you will need a different series.
For example this one (found here):
def ln(x): return 2*sum(((x-1)/(x+1))**i/i for i in range(1,100,2))
output:
ln(2.7) # 0.9932517730102833
math.log(2.7) # 0.9932517730102834
Note that it takes a lot more than 100 terms to converge as x gets bigger (up to a point where it becomes impractical).
You can compensate for that by adding the logarithms of smaller factors of x:
def ln(x):
    if x > 2: return ln(x/2) + ln(2)  # ln(x) = ln(x/2 * 2) = ln(x/2) + ln(2)
    return 2*sum(((x-1)/(x+1))**i/i for i in range(1,1000,2))
which is something you can also do in your Taylor based function to support x>1:
def log(x):
    if x > 1: return log(x/2) - log(0.5)  # ln(2) = -ln(1/2)
    n = 1000
    s = 0
    for i in range(1, n):
        s += ((-1)**(i+1)) * ((x-1)**i) / i
    return s
These series also take more terms to converge when x gets close to zero, so you may want to work them in the other direction as well, keeping the value actually computed between 0.5 and 1:
def log(x):
    if x > 1:   return log(x/2) - log(0.5)  # ln(x/2 * 2) = ln(x/2) + ln(2)
    if x < 0.5: return log(2*x) + log(0.5)  # ln(x*2 / 2) = ln(x*2) - ln(2)
    ...
If performance is an issue, you'll want to store ln(2) or log(0.5) somewhere and reuse it instead of computing it on every call.
For example:
ln2 = None
def ln(x):
    if x <= 2:
        return 2*sum(((x-1)/(x+1))**i/i for i in range(1,10000,2))
    global ln2
    if ln2 is None: ln2 = ln(2)
    n2 = 0
    while x > 2: x, n2 = x/2, n2 + 1
    return ln2*n2 + ln(x)
The program is correct, but the Mercator series has the following caveat:
The series converges to the natural logarithm (shifted by 1) whenever −1 < x ≤ 1.
Here the argument of the series is x - 1 = 1.7 > 1, so the series diverges and you shouldn't expect a result close to 1.
The Python function math.frexp(x) can be used to advantage here to transform the problem so that the Taylor series works with a value close to one. math.frexp(x) is described as:
Return the mantissa and exponent of x as the pair (m, e). m is a float and e is an integer such that x == m * 2**e exactly. If x is zero, returns (0.0, 0), otherwise 0.5 <= abs(m) < 1. This is used to “pick apart” the internal representation of a float in a portable way.
Using math.frexp(x) should not be regarded as "cheating" because it is presumably implemented just by accessing the bit fields in the underlying binary floating point representation. It isn't absolutely guaranteed that the representation of floats will be IEEE 754 binary64, but as far as I know every platform uses this. sys.float_info can be examined to find out the actual representation details.
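A quick demonstration of what frexp returns:

import math

m, e = math.frexp(2.7)
print(m, e)      # 0.675 2
print(m * 2**e)  # 2.7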
Much like the other answer does, you can use the standard logarithmic identities as follows: let m, e = math.frexp(x). Then log(x) = log(m * 2^e) = log(m) + e * log(2). log(2) can be precomputed to full precision ahead of time and is just a constant in the program. Here is some code illustrating this to compute two similar Taylor series approximations to log(x). The number of terms in each series was determined by trial and error rather than rigorous analysis.
taylor1 implements log(1 + x) = x - (1/2)*x^2 + (1/3)*x^3 - ...
taylor2 implements log(x) = 2 * [t + (1/3)*t^3 + (1/5)*t^5 + ...], where t = (x - 1) / (x + 1).
import math

_LOG_OF_2 = 0.69314718055994530941723212145817656807550013436025

def taylor1(x):
    m, e = math.frexp(x)
    log_of_m = 0
    num_terms = 36
    sign = 1
    m_minus1_power = m - 1
    for k in range(1, num_terms + 1):
        log_of_m += sign * m_minus1_power / k
        sign = -sign
        m_minus1_power *= m - 1
    return log_of_m + e * _LOG_OF_2

def taylor2(x):
    m, e = math.frexp(x)
    num_terms = 12
    half_log_of_m = 0
    t = (m - 1) / (m + 1)
    t_squared = t * t
    t_power = t
    denominator = 1
    for k in range(num_terms):
        half_log_of_m += t_power / denominator
        denominator += 2
        t_power *= t_squared
    return 2 * half_log_of_m + e * _LOG_OF_2
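A quick sanity check against math.log (illustrative only; the exact errors depend on the term counts chosen above):

for x in (0.1, 2.7, 1e10):
    print(x, taylor1(x) - math.log(x), taylor2(x) - math.log(x))
# the differences should all be tiny, roughly 1e-12 or smaller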
This seems to work well over most of the domain of log(x), but as x approaches 1 (and log(x) approaches 0) the transformation provided by x = m * 2^e actually produces a less accurate result. So a better algorithm would first check whether x is close to 1, say abs(x - 1) < 0.5, and if so just compute the Taylor series approximation directly on x.
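A sketch of that refinement (log_improved is a hypothetical name; it reuses the taylor2 series directly on x when x is already close to 1, and falls back to the frexp version otherwise):

def log_improved(x):
    if abs(x - 1) < 0.5:
        # same series as taylor2, but applied to x itself: no frexp rescaling,
        # so no cancellation against e * _LOG_OF_2 near x = 1
        t = (x - 1) / (x + 1)
        t_squared, t_power = t * t, t
        half_log, denominator = 0.0, 1
        for _ in range(12):
            half_log += t_power / denominator
            denominator += 2
            t_power *= t_squared
        return 2 * half_log
    return taylor2(x)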
My answer just uses the Taylor series for ln(x). I really hope this helps. It is simple and straight to the point.
I am trying to find an efficient way to compute Euler's totient function.
What is wrong with this code? It doesn't seem to be working.
def isPrime(a):
    return not (a < 2 or any(a % i == 0 for i in range(2, int(a ** 0.5) + 1)))

def phi(n):
    y = 1
    for i in range(2, n+1):
        if isPrime(i) is True and n % i == 0 is True:
            y = y * (1 - 1/i)
        else:
            continue
    return int(y)
Here's a much faster, working way, based on this description on Wikipedia:
Thus if n is a positive integer, then φ(n) is the number of integers k in the range 1 ≤ k ≤ n for which gcd(n, k) = 1.
I'm not saying this is the fastest or cleanest, but it works.
from math import gcd

def phi(n):
    amount = 0
    for k in range(1, n + 1):
        if gcd(n, k) == 1:
            amount += 1
    return amount
You have three different problems...
y needs to be equal to n as initial value, not 1
As some have mentioned in the comments, don't use integer division
n % i == 0 is True isn't doing what you think because of Python's comparison chaining! Even when n % i equals 0, the expression chains into (n % i == 0) and (0 is True), and 0 is True is False! Use parentheses, or just drop the comparison to True since it isn't necessary anyway.
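For example, the chaining is easy to demonstrate (results shown as comments; note that recent Pythons also emit a SyntaxWarning for is with a literal):

print(4 % 2 == 0)            # True
print(4 % 2 == 0 is True)    # False: chains to (4 % 2 == 0) and (0 is True)
print((4 % 2 == 0) is True)  # True: parentheses force the intended grouping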
Fixing those problems,
def phi(n):
    y = n
    for i in range(2, n+1):
        if isPrime(i) and n % i == 0:
            y *= 1 - 1.0/i
    return int(y)
Calculating gcd for every pair in the range is not efficient and does not scale. You don't need to iterate through the whole range: if n is not a prime, you can check for prime factors up to its square root; refer to https://stackoverflow.com/a/5811176/3393095.
We must then update phi for every prime factor p by phi = phi*(1 - 1/p).
def totatives(n):
    phi = int(n > 1 and n)
    for p in range(2, int(n ** .5) + 1):
        if not n % p:
            phi -= phi // p
            while not n % p:
                n //= p
    # if n is > 1 it means it is prime
    if n > 1: phi -= phi // n
    return phi
I'm working on a cryptographic library in Python and this is what I'm using. gcd() is Euclid's method for calculating the greatest common divisor, and phi() is the totient function.
def gcd(a, b):
    while b:
        a, b = b, a % b
    return a

def phi(a):
    b = a - 1
    c = 0
    while b:
        if not gcd(a, b) - 1:
            c += 1
        b -= 1
    return c
Most implementations mentioned by other users rely on calling a gcd() or isPrime() function. If you are going to use the phi() function many times, it pays off to calculate these values beforehand. A way of doing this is by using a so-called sieve algorithm.
https://stackoverflow.com/a/18997575/7217653 This answer on Stack Overflow provides us with a fast way of finding all primes below a given number.
OK, now we can replace isPrime() with a lookup in our precomputed array of primes.
Now the actual phi function:
Wikipedia gives us a clear example: https://en.wikipedia.org/wiki/Euler%27s_totient_function#Example
phi(36) = phi(2^2 * 3^2) = 36 * (1 - 1/2) * (1 - 1/3) = 36 * 1/2 * 2/3 = 12
In words, this says that the distinct prime factors of 36 are 2 and 3; half of the thirty-six integers from 1 to 36 are divisible by 2, leaving eighteen; a third of those are divisible by 3, leaving twelve numbers that are coprime to 36. And indeed there are twelve positive integers that are coprime with 36 and lower than 36: 1, 5, 7, 11, 13, 17, 19, 23, 25, 29, 31, and 35.
TL;DR
In other words: we have to find all the distinct prime factors of our number and then multiply them in one by one using foreach prime_factor: n *= 1 - 1/prime_factor.
import math

MAX = 10**5

# CREDIT TO https://stackoverflow.com/a/18997575/7217653
def sieve_for_primes_to(n):
    size = n//2
    sieve = [1]*size
    limit = int(n**0.5)
    for i in range(1, limit):
        if sieve[i]:
            val = 2*i+1
            tmp = ((size-1) - i)//val
            sieve[i+val::val] = [0]*tmp
    return [2] + [i*2+1 for i, v in enumerate(sieve) if v and i > 0]

PRIMES = sieve_for_primes_to(MAX)
print("Primes generated")

def phi(n):
    original_n = n
    prime_factors = []
    prime_index = 0
    while n > 1:  # as long as there are more factors to be found
        p = PRIMES[prime_index]
        if n % p == 0:  # is this prime a factor?
            prime_factors.append(p)
            while math.ceil(n / p) == math.floor(n / p):  # divide this factor out as long as the result stays an integer
                n = n // p
        prime_index += 1
    for v in prime_factors:  # now that we have the prime factors, do the same calculation as Wikipedia
        original_n *= 1 - (1/v)
    return int(original_n)
print(phi(36)) # = phi(2**2 * 3**2) = 36 * (1- 1/2) * (1- 1/3) = 36 * 1/2 * 2/3 = 12
It looks like you're trying to use Euler's product formula, but you're not calculating the number of primes which divide a. You're calculating the number of elements relatively prime to a.
In addition, since 1 and i are both integers, the division is also an integer division (under Python 2), so you always get 0.
With regards to efficiency, I haven't noticed anyone mention that gcd(k, n) = gcd(n - k, n). Using this fact can save roughly half the work needed for the methods involving the use of the gcd. Just start the count at 2 (because 1/n and (n-1)/n will always be irreducible) and add 2 each time the gcd is one.
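A minimal sketch of that counting trick (phi_half is a hypothetical name):

from math import gcd

def phi_half(n):
    if n < 3:
        return 1          # phi(1) = phi(2) = 1
    count = 2             # k = 1 and k = n - 1 are always coprime to n
    for k in range(2, n // 2 + 1):
        if 2 * k == n:
            continue      # the midpoint shares the factor n/2 with n
        if gcd(n, k) == 1:
            count += 2    # k and n - k are coprime to n together
    return count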
Here is a shorter implementation of orlp's answer.
from math import gcd
def phi(n): return sum([gcd(n, k)==1 for k in range(1, n+1)])
As others have already mentioned, it leaves room for performance optimization.
Actually, to calculate phi(n) for any number n, we use Euler's product formula:
phi(n) = n * product of (1 - 1/p)
where the p are the distinct prime factors of n.
So, you have a few mistakes in your code:
1. y should be initialized to n.
2. In 1/i, both 1 and i are integers, so (under Python 2) their quotient is also an integer, which leads to wrong results.
Here is the code with required corrections.
def phi(n):
    y = n
    for i in range(2, n+1):
        if isPrime(i) and n % i == 0:
            y -= y/i
    return int(y)
I was writing a program in Python:
import sys

def func(N, M):
    if N == M:
        return 0.00
    else:
        if M == 0:
            return pow(2, N+1) - 2.00
        else:
            return 1.00 + (0.5)*func(N, M+1) + 0.5*func(N, 0)

def main(*args):
    test_cases = int(raw_input())
    while test_cases:
        string = raw_input()
        a = string.split(" ")
        N = int(a[0])
        M = int(a[1])
        test_cases = test_cases - 1
        result = func(N, M)
        print("%.2f" % round(result, 2))

if __name__ == '__main__':
    sys.setrecursionlimit(1500)
    sys.exit(main(*sys.argv))
It gives the same answer for N = 1000, M = 1 and N = 1000, M = 2.
On searching I found that the limit of floats is exceeded somewhere above 10^400. My question is how to overcome this.
Floats in Python are IEEE doubles: they are not unlimited precision. But if your computation only needs integers, then just use integers: they have unlimited precision. Unfortunately, I think your computation does not stay within the integers.
There are third-party packages built on GMP that provide arbitrary-precision floats: https://www.google.com/search?q=python%20gmp
I maintain one of the Python-to-GMP/MPFR libraries and I tested your function. After checking the results and looking at your function, I think your function remains entirely within the integers. The following function returns the same values:
def func(N, M):
    if M == 0:
        return 2**(N+1) - 2
    elif N == M:
        return 0
    else:
        return func(N, M+1)//2 + 2**N
The limiting factor with Python's builtin float is not the exponent range (roughly 10**308) but the precision (53 bits). You need around N bits of precision to distinguish between func(N, 1) and func(N, 2).
Consider using an arbitrary precision floating-point library, for example the bigfloat package, or mpmath.
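For example, a minimal sketch with mpmath (an assumption on my part that mpmath is acceptable here; the precision choice follows the N-bits rule of thumb above):

from mpmath import mp, mpf

mp.prec = 1100  # comfortably more than N = 1000 bits of precision

def func_mp(N, M):
    # same recurrence as the original, but in arbitrary-precision floats
    if N == M:
        return mpf(0)
    if M == 0:
        return mpf(2)**(N + 1) - 2
    return 1 + mpf('0.5') * func_mp(N, M + 1) + mpf('0.5') * func_mp(N, 0)

import sys
sys.setrecursionlimit(1500)
print(func_mp(1000, 1) == func_mp(1000, 2))  # False: the values now differ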