Why does dividing by the larger factor pair result in slower execution?
My solution for https://codility.com/programmers/task/min_perimeter_rectangle/
from math import sqrt, floor
# This fails the performance tests
def solution_slow(n):
    x = int(sqrt(n))
    for i in xrange(x, n+1):
        if n % i == 0:
            return 2*(i + n / i)
# This passes the performance tests
def solution_fast(n):
    x = int(sqrt(n))
    for i in xrange(x, 0, -1):
        if n % i == 0:
            return 2*(i + n / i)
It's not the division that slows it down; it's the number of iterations required.
Let L = xrange(1, x + 1) (order doesn't matter here) and R = xrange(x, n+1). Every factor d of n in L can be paired with exactly one factor of n in R, namely n/d. In general, x = int(sqrt(n)) is much, much smaller than n - x, so L is much smaller than R. This means that there are far more elements of R that don't divide n than there are in L, and the scan has to step over them before it reaches a factor. In the worst case, a prime n, the only factor pair is (1, n), so the slow solution has to check every value of the much larger set R instead of the much smaller set L.
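That's obvious: the first function can loop many more times.
Note that sqrt(n) is not equal to n - sqrt(n)! In general sqrt(n) << n - sqrt(n), where << means "much less than".
If n = 1000, the first function's range holds 970 candidates while the second one holds only 31. For a prime such as n = 997, the first function actually loops 967 times while the second one loops only 31 times.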
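To check the iteration-count argument directly, here is a minimal Python 3 sketch (range and // instead of Python 2's xrange and /) that counts how many candidates each scan order examines before it finds a divisor. The helper names iterations_up and iterations_down are illustrative only.

```python
from math import isqrt

def iterations_up(n):
    """Count loop iterations of the upward scan used by solution_slow."""
    count = 0
    for i in range(isqrt(n), n + 1):
        count += 1
        if n % i == 0:
            return count
    return count

def iterations_down(n):
    """Count loop iterations of the downward scan used by solution_fast."""
    count = 0
    for i in range(isqrt(n), 0, -1):
        count += 1
        if n % i == 0:
            return count
    return count

for n in (1000, 997):
    print(n, iterations_up(n), iterations_down(n))
# 1000 10 7
# 997 967 31
```

For n = 1000 both scans stop quickly (at 40 and 25), but for the prime 997 the upward scan must walk all the way to n itself.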
I'd say the number of iterations is the key factor that makes the performance of your functions differ, as @Bakuriu already said. Also, xrange could be slightly more expensive than a simple while loop; for instance, f3 below performs a little better than f1 and f2:
import timeit
from math import sqrt, floor

def f1(n):
    x = int(sqrt(n))
    for i in xrange(x, n + 1):
        if n % i == 0:
            return 2 * (i + n / i)

def f2(n):
    x = int(sqrt(n))
    for i in xrange(x, 0, -1):
        if n % i == 0:
            return 2 * (i + n / i)

def f3(n):
    x = int(sqrt(n))
    while True:
        if n % x == 0:
            return 2 * (x + n / x)
        x -= 1

N = 30
K = 100000
print("Measuring {0} times f1({1})={2}".format(
    K, N, timeit.timeit('f1(N)', setup='from __main__ import f1, N', number=K)))
print("Measuring {0} times f2({1})={2}".format(
    K, N, timeit.timeit('f2(N)', setup='from __main__ import f2, N', number=K)))
print("Measuring {0} times f3({1})={2}".format(
    K, N, timeit.timeit('f3(N)', setup='from __main__ import f3, N', number=K)))
# Measuring 100000 times f1(30)=0.0738177938151
# Measuring 100000 times f2(30)=0.0753000788315
# Measuring 100000 times f3(30)=0.0503645315841
# [Finished in 0.3s]
Next time you have this type of question, using a profiler is highly recommended :)
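As a minimal sketch of what that looks like with the standard library's cProfile and pstats modules (Python 3; the function names and the test value 1000003, a prime and therefore the worst case for both scans, are illustrative choices):

```python
import cProfile
import pstats
from math import isqrt

def min_perimeter_up(n):
    # Python 3 version of solution_slow: scan upward from isqrt(n)
    for i in range(isqrt(n), n + 1):
        if n % i == 0:
            return 2 * (i + n // i)

def min_perimeter_down(n):
    # Python 3 version of solution_fast: scan downward from isqrt(n)
    for i in range(isqrt(n), 0, -1):
        if n % i == 0:
            return 2 * (i + n // i)

def workload():
    n = 1_000_003  # a prime, i.e. the worst case for both scans
    return min_perimeter_up(n), min_perimeter_down(n)

profiler = cProfile.Profile()
result = profiler.runcall(workload)
# Print the five most expensive entries sorted by cumulative time
pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```

The report attributes almost all of the time to min_perimeter_up, which points straight at the loop that needs fixing.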
Related
Using Python, I would like to implement a function that takes a natural number n as input and outputs a list of natural numbers [y1, y2, y3, ...] such that each of n + y1*y1, n + y2*y2, n + y3*y3, and so forth is again a square.
What I tried so far is to obtain one y-value using the following function:
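from typing import Optional, Tuple

def find_square(n: int) -> Optional[Tuple[int, int]]:
    if n % 2 == 1:
        y = (n - 1) // 2
        x = n + y * y  # equals ((n + 1) // 2) ** 2, the resulting square
        return (y, x)
    return None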
It works fine, e.g. find_square(13689) gives me a correct solution y=6844. It would be great to have an algorithm that yields all possible y-values, such as y=44 or y=156.
The simplest slow approach is, of course, to iterate over all possible Y for a given N and check whether N + Y^2 is a square.
But there is a much faster approach using integer factorization:
Notice that the equation N + Y^2 = X^2, that is, finding all integer pairs (X, Y) for a given fixed integer N, can be rewritten as N = X^2 - Y^2 = (X + Y) * (X - Y), which follows from the well-known difference-of-squares formula.
Now let's rename the two factors as A and B, i.e. N = (X + Y) * (X - Y) = A * B, which means that X = (A + B) / 2 and Y = (A - B) / 2.
Notice that A and B must have the same parity, either both odd or both even; otherwise the divisions by 2 in the formulas above would not give whole numbers.
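As a small worked illustration of the (A, B) parametrization, here is a sketch that simply enumerates divisor pairs by trial division up to sqrt(N). This is O(sqrt(N)), a deliberately simpler stand-in for the Pollard Rho factorization described next, and the name square_offsets is hypothetical. For N = 13689 = 3^4 * 13^2 it recovers 6844, 44 and 156 from the question, plus Y = 0 because N is itself a square.

```python
from math import isqrt

def square_offsets(n):
    """Return all y >= 0 such that n + y*y is a perfect square (n >= 1)."""
    ys = []
    for b in range(1, isqrt(n) + 1):   # b = X - Y is the smaller factor
        if n % b:
            continue
        a = n // b                     # a = X + Y is the larger factor
        if (a - b) % 2 == 0:           # a and b must have the same parity
            ys.append((a - b) // 2)    # Y = (A - B) / 2
    return sorted(ys)

print(square_offsets(13689))  # [0, 44, 156, 240, 520, 756, 2280, 6844]
```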
We will factorize N into all possible pairs of factors (A, B) of the same parity. For fast factorization in the code below I used Pollard's Rho, an algorithm that is simple to implement yet quite fast. Two extra algorithms were needed as helpers for Pollard's Rho: the Fermat primality test (which quickly checks whether a number is probably prime) and trial division (which factors out small factors that could otherwise cause Pollard's Rho to fail).
For a composite number, Pollard's Rho has an expected running time of about O(N^(1/4)), which is very fast even for 64-bit numbers. If a bigger search space is needed, any faster factorization algorithm can be substituted. The running time of my fast algorithm is dominated by the factorization; the remaining part is blazingly fast, just a few loop iterations with simple formulas.
If your N is itself a square (so its root is easy to compute), it can be factored even faster: factor the root sqrt(N) with Pollard's Rho and double every exponent, which takes about O(N^(1/8)) time. Even for 128-bit numbers this is a very small cost, about 2^16 operations, and I hope you're solving your task for numbers below 128 bits.
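A sketch of that square-root shortcut, assuming N is a perfect square: factor r = isqrt(N) and double every exponent. Plain trial division stands in for Pollard's Rho here to keep the example short, and the helper names trial_factor and factor_square are hypothetical.

```python
from collections import Counter
from math import isqrt

def trial_factor(n):
    """Prime factors of n >= 1 by trial division, with multiplicity."""
    fs, d = [], 2
    while d * d <= n:
        while n % d == 0:
            fs.append(d)
            n //= d
        d += 1
    if n > 1:
        fs.append(n)   # whatever remains is prime
    return fs

def factor_square(n):
    """Return {prime: exponent} for a perfect square n by factoring its root."""
    r = isqrt(n)
    if r * r != n:
        raise ValueError("n is not a perfect square")
    return {p: 2 * e for p, e in Counter(trial_factor(r)).items()}

print(factor_square(13689))  # 13689 = 117**2 = 3**4 * 13**2 -> {3: 4, 13: 2}
```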
If you want to process a whole range of N values, the fastest way to factorize them is to use techniques similar to the Sieve of Eratosthenes: using the set of prime numbers, it computes all factors of all N within the range at once. For a range of Ns this is much faster than factorizing each N separately with Pollard's Rho.
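One common variant of that idea is a smallest-prime-factor sieve: a single Eratosthenes-style pass records the smallest prime factor of every number up to a limit, after which any number in range factors in O(log n) table lookups. This is a sketch under that assumption; smallest_prime_factors and factor_with_spf are hypothetical names, not part of the code below.

```python
from math import isqrt

def smallest_prime_factors(limit):
    """spf[m] = smallest prime factor of m, for 2 <= m <= limit."""
    spf = list(range(limit + 1))
    for p in range(2, isqrt(limit) + 1):
        if spf[p] == p:                      # p was never marked, so it is prime
            for m in range(p * p, limit + 1, p):
                if spf[m] == m:              # not yet marked by a smaller prime
                    spf[m] = p
    return spf

def factor_with_spf(n, spf):
    """Factor 1 <= n <= limit using the precomputed table."""
    fs = []
    while n > 1:
        p = spf[n]
        fs.append(p)
        n //= p
    return fs

spf = smallest_prime_factors(100_000)
print(factor_with_spf(13689, spf))  # [3, 3, 3, 3, 13, 13]
```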
After factoring N into pairs (A, B), we compute (X, Y) from (A, B) using the formulas above and output the resulting Y values as the solution of the fast algorithm.
The following example code is implemented in pure Python. Of course one can use Numba to speed it up; Numba usually gives a 30-200x speedup and can bring Python to the speed of optimized C++. But I thought the main thing here was to implement a fast algorithm; Numba optimizations can easily be done afterwards.
I added time measurement to the code. Although it is pure Python, my fast algorithm still achieves a roughly 8500x speedup over the brute-force approach for a limit of about 1 000 000 (limit = 1 << 20 in the code).
You can change the limit variable to tweak the size of the searched space, or the num_tests variable to tweak the number of different tests.
The code implements both solutions: the fast solution find_fast() described above, plus a tiny brute-force solution find_slow(), which is very slow because it scans all possible candidates. The slow solution is only used to check correctness in the tests and to measure the speedup.
The code uses nothing except a few standard Python library modules; no external modules are needed.
def find_slow(N):
    import math
    def is_square(x):
        root = int(math.sqrt(float(x)) + 0.5)
        return root * root == x, root
    l = []
    for y in range(N):
        if is_square(N + y ** 2)[0]:
            l.append(y)
    return l

def find_fast(N):
    import itertools, functools
    Prod = lambda it: functools.reduce(lambda a, b: a * b, it, 1)
    fs = factor(N)
    # Group the prime factors into {prime: exponent}
    mfs = {}
    for e in fs:
        mfs[e] = mfs.get(e, 0) + 1
    fs = sorted(mfs.items())
    del mfs
    Ys = set()
    # Choose how many copies of each prime go into A; for p = 2, A gets
    # between 1 and v - 1 factors so that both A and B stay even
    for take_a in itertools.product(*[
            (range(v + 1) if k != 2 else range(1, v)) for k, v in fs]):
        A = Prod([p ** t for (p, _), t in zip(fs, take_a)])
        B = N // A
        assert A * B == N, (N, A, B, take_a)
        if A < B:
            continue
        X = (A + B) // 2
        Y = (A - B) // 2
        assert N + Y ** 2 == X ** 2, (N, A, B, X, Y)
        Ys.add(Y)
    return sorted(Ys)

def trial_div_factor(n, limit = None):
    # https://en.wikipedia.org/wiki/Trial_division
    fs = []
    while n & 1 == 0:
        fs.append(2)
        n >>= 1
    all_checked = False
    for d in range(3, (limit or n) + 1, 2):
        if d * d > n:
            all_checked = True
            break
        while True:
            q, r = divmod(n, d)
            if r != 0:
                break
            fs.append(d)
            n = q
    if n > 1 and all_checked:
        fs.append(n)
        n = 1
    return fs, n

def fermat_prp(n, trials = 32):
    # https://en.wikipedia.org/wiki/Fermat_primality_test
    import random
    if n <= 16:
        return n in (2, 3, 5, 7, 11, 13)
    for i in range(trials):
        if pow(random.randint(2, n - 2), n - 1, n) != 1:
            return False
    return True

def pollard_rho_factor(n):
    # https://en.wikipedia.org/wiki/Pollard%27s_rho_algorithm
    import math, random
    fs, n = trial_div_factor(n, 1 << 7)
    if n <= 1:
        return fs
    if fermat_prp(n):
        return sorted(fs + [n])
    for itry in range(8):
        failed = False
        x = random.randint(2, n - 2)
        for cycle in range(1, 1 << 60):
            y = x
            for i in range(1 << cycle):
                x = (x * x + 1) % n
                d = math.gcd(x - y, n)
                if d == 1:
                    continue
                if d == n:
                    failed = True
                    break
                return sorted(fs + pollard_rho_factor(d) + pollard_rho_factor(n // d))
            if failed:
                break
    assert False, f'Pollard Rho failed! n = {n}'

def factor(N):
    import functools
    Prod = lambda it: functools.reduce(lambda a, b: a * b, it, 1)
    fs = pollard_rho_factor(N)
    assert N == Prod(fs), (N, fs)
    return sorted(fs)

def test():
    import random, time
    limit = 1 << 20
    num_tests = 20
    t0, t1 = 0, 0
    for i in range(num_tests):
        if (round(i / num_tests * 1000)) % 100 == 0 or i + 1 >= num_tests:
            print(f'test {i}, ', end = '', flush = True)
        # Start at 1: N = 0 would make trial_div_factor() loop forever
        N = random.randrange(1, limit)
        tb = time.time()
        r0 = find_slow(N)
        t0 += time.time() - tb
        tb = time.time()
        r1 = find_fast(N)
        t1 += time.time() - tb
        assert r0 == r1, (N, r0, r1, t0, t1)
    print(f'\nTime slow {t0:.05f} sec, fast {t1:.05f} sec, speedup {round(t0 / max(1e-6, t1))} times')

if __name__ == '__main__':
    test()
Output:
test 0, test 2, test 4, test 6, test 8, test 10, test 12, test 14, test 16, test 18, test 19,
Time slow 26.28198 sec, fast 0.00301 sec, speedup 8732 times
For the easiest solution, you can try this:
import math
n = 13689  # or ask the user to input any natural number
# Every solution satisfies y <= (n - 1) / 2, so this range covers all y >= 1
for i in range(1, n // 2 + 1):
    if math.sqrt(n + i**2).is_integer():
        print(i)
The definition of the determinant involves only additions, subtractions and multiplications, so the determinant of a matrix with integer elements must be an integer.
However, numpy.linalg.det() returns a "slightly off" floating-point number:
>>> import numpy
>>> M = [[-1 if i==j else 1 for j in range(7)] for i in range(7)]
>>> numpy.linalg.det(M)
319.99999999999994
It gets worse for a larger matrix:
>>> M = [[-1024 if i==j else 1024 for j in range(7)] for i in range(7)]
>>> numpy.linalg.det(M)
3.777893186295698e+23
>>> "%.0f" % numpy.linalg.det(M)
'377789318629569805156352'
And it's wrong! I'm sure the correct answer is:
>>> 320 * 1024**7
377789318629571617095680
Of course, for a big matrix it may be a rather long integer, but Python has long integers built in.
How can I get an exact integer value of the determinant instead of approximate floating point value?
A simple, practical way to calculate the determinant of an integer matrix is the Bareiss algorithm.
def det(M):
    M = [row[:] for row in M] # make a copy to keep original M unmodified
    N, sign, prev = len(M), 1, 1
    for i in range(N-1):
        if M[i][i] == 0: # swap with another row having a nonzero element in column i
            swapto = next( (j for j in range(i+1,N) if M[j][i] != 0), None )
            if swapto is None:
                return 0 # all M[*][i] are zero => zero determinant
            M[i], M[swapto], sign = M[swapto], M[i], -sign
        for j in range(i+1,N):
            for k in range(i+1,N):
                assert ( M[j][k] * M[i][i] - M[j][i] * M[i][k] ) % prev == 0
                M[j][k] = ( M[j][k] * M[i][i] - M[j][i] * M[i][k] ) // prev
        prev = M[i][i]
    return sign * M[-1][-1]
This algorithm is reasonably fast (O(N³) complexity).
And it's an integer-preserving algorithm. It does have a division, but as long as all the elements of M are integers, all intermediate results are integers too (the division remainder is always zero).
As a bonus the same code works for fractions/floating-point/complex elements if you drop the assert line and replace the integer division // with a regular division /.
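To illustrate that bonus remark, here is the same Bareiss loop with the assert dropped and // replaced by /, fed with fractions.Fraction entries so that every division stays exact. The name det_field is hypothetical; with float entries the same code would run but accumulate rounding error.

```python
from fractions import Fraction

def det_field(M):
    """Bareiss elimination with true division, for field-valued entries."""
    M = [row[:] for row in M]          # copy so the caller's matrix is untouched
    N, sign, prev = len(M), 1, 1
    for i in range(N - 1):
        if M[i][i] == 0:               # pivot: swap in a row with nonzero column i
            swapto = next((j for j in range(i + 1, N) if M[j][i] != 0), None)
            if swapto is None:
                return 0
            M[i], M[swapto], sign = M[swapto], M[i], -sign
        for j in range(i + 1, N):
            for k in range(i + 1, N):
                M[j][k] = (M[j][k] * M[i][i] - M[j][i] * M[i][k]) / prev
        prev = M[i][i]
    return sign * M[-1][-1]

print(det_field([[Fraction(1, 2), Fraction(1, 3)],
                 [Fraction(1, 4), Fraction(1, 5)]]))  # 1/60
```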
PS: Another alternative is to use sympy instead of numpy:
>>> import sympy
>>> sympy.Matrix([ [-1024 if i==j else 1024 for j in range(7)] for i in range(7) ]).det()
377789318629571617095680
But for some reason that is MUCH slower than the det() function above.
# Performance test: `numpy.linalg.det(M)` vs `det(M)` vs `sympy.Matrix(M).det()`
import timeit
def det(M):
    ...  # body of the Bareiss det() shown above
M = [[-1024 if i==j else 1024 for j in range(7)] for i in range(7)]
print(timeit.repeat("numpy.linalg.det(M)", setup="import numpy; from __main__ import M", number=100, repeat=5))
#: [0.0035009384155273, 0.0033931732177734, 0.0033941268920898, 0.0033800601959229, 0.0033988952636719]
print(timeit.repeat("det(M)", setup="from __main__ import det, M", number=100, repeat=5))
#: [0.0171120166778564, 0.0171020030975342, 0.0171608924865723, 0.0170948505401611, 0.0171010494232178]
print(timeit.repeat("sympy.Matrix(M).det()", setup="import sympy; from __main__ import M", number=100, repeat=5))
#: [0.9561479091644287, 0.9564781188964844, 0.9539868831634521, 0.9536828994750977, 0.9546608924865723]
Summary:
det(M) is 5+ times slower than numpy.linalg.det(M),
det(M) is ~50 times faster than sympy.Matrix(M).det()
It becomes even faster without the assert line.
@pycoder's answer is the preferred solution; for comparison, I wrote a Gaussian elimination function using the Fraction class, which allows exact arithmetic on rational numbers. It is about 11 times slower than the Bareiss algorithm on the same benchmark.
from fractions import Fraction
def det(matrix):
    matrix = [[Fraction(x, 1) for x in row] for row in matrix]
    n = len(matrix)
    d, sign = 1, 1
    for i in range(n):
        if matrix[i][i] == 0:
            j = next((j for j in range(i + 1, n) if matrix[j][i] != 0), None)
            if j is None:
                return 0
            matrix[i], matrix[j] = matrix[j], matrix[i]
            sign = -sign
        d *= matrix[i][i]
        for j in range(i + 1, n):
            factor = matrix[j][i] / matrix[i][i]
            for k in range(i + 1, n):
                matrix[j][k] -= factor * matrix[i][k]
    return int(d) * sign
I need to compute:
x=(x*a+b)/2 % 2**128
many times. x, a, b are 128-bit numbers (chosen randomly). How can I do it in the fastest way? I thought about numpy; could it help somehow? Right now it is about 100 times too slow... Is there a way to do it faster? Of course it has to be done separately, step by step: the real algorithm is more complicated than this (a and b change after a few steps), so we can't apply any math tricks or fast exponentiation here.
Example of more complete code:
a=333
b=555
c=777
d=999
x=12345
for i in range(128):
    if x % 2 == 1:
        x=((x * a + b)/2) % 340282366920938463463374607431768211456
    else:
        x=(x * c/2 + d) % 340282366920938463463374607431768211456
print(x)
You're going to have some trouble because you're doing inherently inefficient operations: 128-bit integers are not native to most Python implementations, so they incur the overhead of arbitrary-precision (long) integer operations. However, you can cut the execution time by roughly 20-25% if you use shift and mask operations instead of division and modulus by powers of 2:
import timeit
a=333
b=555
c=777
d=999
x=12345
two_128 = 2 ** 128
mask = two_128 - 1
def rng_orig(x):
    for i in range(128):
        if x % 2 == 1:
            x=((x * a + b)/2) % two_128
        else:
            x=(x * c/2 + d) % two_128
def rng_bit(x):
    for i in range(128):
        if x & 1:
            x=((x * a + b) >> 1) & mask
        else:
            # x is even here, so x * c / 2 == (x >> 1) * c exactly
            x=((x >> 1) * c + d) & mask
repeat = 100000
print(timeit.timeit(lambda: rng_orig(x), number = repeat))
print(timeit.timeit(lambda: rng_bit (x), number = repeat))
Timing results:
5.1968478000000005
3.965898900000001
If this is intended to be integer arithmetic, you should use integer division (//). This avoids the unnecessary conversion to floats and the precision loss that comes with it (a float holds only 53 significant bits). Also, using bitwise operations for the modulo is probably going to be faster.
mask128 = 2**128 - 1
x = ( (x*a+b)//2 ) & mask128
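A small sketch to confirm that, for non-negative integers, the shift-and-mask form gives exactly the same results as integer // and %. Note that in the even branch (x * c) // 2 equals (x >> 1) * c only because x is even there. The helper names step_divmod, step_bits and run are hypothetical.

```python
MOD = 1 << 128
MASK128 = MOD - 1

def step_divmod(x, a, b, c, d):
    # integer-division version of the question's update
    if x % 2 == 1:
        return ((x * a + b) // 2) % MOD
    return (x * c // 2 + d) % MOD

def step_bits(x, a, b, c, d):
    # same update with shifts and masks; valid because all values are >= 0
    if x & 1:
        return ((x * a + b) >> 1) & MASK128
    return ((x >> 1) * c + d) & MASK128   # x is even in this branch

def run(step, x, a, b, c, d, steps=128):
    for _ in range(steps):
        x = step(x, a, b, c, d)
    return x

print(run(step_divmod, 12345, 333, 555, 777, 999) ==
      run(step_bits, 12345, 333, 555, 777, 999))  # True
```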
I've been trying to find super fast code that can calculate the factorial of a big number like 70000 in 0.5 seconds; my own code does it in 10 seconds. I've searched everywhere, but every piece of code I find either has a memory error problem or is not as fast as I want. Can anyone help me with this?
import math
num =int(raw_input())
usefrm=0
if len(str(num)) > 2:
    if int(str(num)[-2]) % 2 == 0:
        usefrm = 'even'
    else:
        usefrm = 'odd'
else:
    if num % 2 == 0:
        usefrm = 'even1'
    else:
        usefrm = 'odd1'
def picknumber(num):
    s = str(math.factorial(num))
    l = []
    for n in s:
        if int(n) != 0:
            l.append(int(n))
    return l[-1]
def picknumber1(num):
    s = str(num)
    l = []
    for n in s:
        if int(n) != 0:
            l.append(int(n))
    return l[-1]
if usefrm == 'even':
    e=picknumber1(6*picknumber(int(num/5))*picknumber(int(str(num)[-1])))
if usefrm == 'odd':
    e=picknumber1(4*picknumber(int(num/5))*picknumber(int(str(num)[-1])))
else:
    e=picknumber1(math.factorial(num))
print e
For most practical uses, Stirling's approximation is very fast and quite accurate:
import math
from decimal import Decimal
def fact(n):
    d = Decimal(n)
    return (Decimal(2 * math.pi) * d).sqrt() * (d / Decimal(math.e)) ** d
print(fact(70000))
1.176811014417743803074731978E+308759
Try to exploit the commutativity (and associativity) of integer multiplication.
When the multiplied numbers are long (they do not fit in a single machine word), the time needed to perform a multiplication grows superlinearly with their length.
If you multiply the smallest (shortest in terms of memory representation) factors and partial products first, you may save a lot of time.
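One way to apply that idea is to multiply the numbers 1..n as a balanced binary tree, so that each multiplication combines partial products of similar size instead of repeatedly multiplying one huge product by a small number. This is a sketch under that assumption; product_range and tree_factorial are hypothetical names, and the result is checked against math.factorial.

```python
from math import factorial

def product_range(lo, hi):
    """Product of the integers in [lo, hi), multiplied as a balanced tree."""
    if hi - lo <= 8:                  # small ranges: plain loop
        result = 1
        for k in range(lo, hi):
            result *= k
        return result
    mid = (lo + hi) // 2              # split so both halves have similar size
    return product_range(lo, mid) * product_range(mid, hi)

def tree_factorial(n):
    return product_range(1, n + 1)

print(tree_factorial(70000) == factorial(70000))  # True
```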
You may use math.factorial(). For example:
from math import factorial
factorial(7000)
with execution time of 20.5 msec for calculating the factorial of 7000:
python -m timeit -c "from math import factorial; factorial(7000)"
10 loops, best of 3: 20.5 msec per loop
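If you don't need perfect precision, you can use Stirling's approximation
https://en.wikipedia.org/wiki/Stirling's_approximation
import numpy as np
approx = np.sqrt(2*np.pi*n)*(n/np.e)**n  # n! ~ sqrt(2*pi*n) * (n/e)**n
for large n values. This calculation is literally instantaneous, but with 64-bit floats the result overflows for n above about 170, so for something like 70000! you need a Decimal or logarithmic version of the formula.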
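To get a Stirling estimate for n as large as 70000 with plain floats, one can evaluate the formula in log10 space and split the result into a mantissa and an exponent. A minimal sketch (stirling_mantissa_exponent is a hypothetical name); its leading digits should agree with the Decimal result of the earlier answer.

```python
import math

def stirling_mantissa_exponent(n):
    """Stirling's approximation sqrt(2*pi*n) * (n/e)**n as (mantissa, exponent)."""
    log10_value = 0.5 * math.log10(2 * math.pi * n) + n * math.log10(n / math.e)
    exponent = math.floor(log10_value)
    mantissa = 10 ** (log10_value - exponent)
    return mantissa, exponent

m, e = stirling_mantissa_exponent(70000)
print(f"{m:.9f}E+{e}")  # about 1.176811014E+308759
```

Stirling's relative error is roughly 1/(12n), so only the first several digits of the mantissa are meaningful.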
Maybe you can try to make use of threads.
I am trying to find an efficient way to compute Euler's totient function.
What is wrong with this code? It doesn't seem to be working.
def isPrime(a):
    return not ( a < 2 or any(a % i == 0 for i in range(2, int(a ** 0.5) + 1)))
def phi(n):
    y = 1
    for i in range(2,n+1):
        if isPrime(i) is True and n % i == 0 is True:
            y = y * (1 - 1/i)
        else:
            continue
    return int(y)
Here's a much faster, working way, based on this description on Wikipedia:
Thus if n is a positive integer, then φ(n) is the number of integers k in the range 1 ≤ k ≤ n for which gcd(n, k) = 1.
I'm not saying this is the fastest or cleanest, but it works.
from math import gcd
def phi(n):
    amount = 0
    for k in range(1, n + 1):
        if gcd(n, k) == 1:
            amount += 1
    return amount
You have three different problems:
1. y needs to start with n as its initial value, not 1.
2. As some have mentioned in the comments, don't use integer division: in Python 2, 1/i evaluates to 0 for every i > 1, so use 1.0/i.
3. n % i == 0 is True isn't doing what you think because Python chains the comparisons: it means (n % i == 0) and (0 is True). Even if n % i equals 0, so that 0 == 0 is True, 0 is True is False! Use parentheses, or just get rid of the comparison to True, since it isn't necessary anyway.
Fixing those problems,
def phi(n):
    y = n
    for i in range(2,n+1):
        if isPrime(i) and n % i == 0:
            y *= 1 - 1.0/i
    return int(y)
Calculating the gcd for every pair in the range is not efficient and does not scale. You don't need to iterate through the whole range: it is enough to check for prime factors up to the square root of n (whatever is left over at the end is itself a prime factor); refer to https://stackoverflow.com/a/5811176/3393095.
We must then update phi for every prime factor by phi = phi*(1 - 1/prime).
def totatives(n):
    phi = n if n > 0 else 0  # phi(1) = 1; returns 0 for n = 0
    for p in range(2, int(n ** .5) + 1):
        if not n % p:
            phi -= phi // p
            while not n % p:
                n //= p
    #if n is > 1 it means it is prime
    if n > 1: phi -= phi // n
    return phi
I'm working on a cryptographic library in Python and this is what I'm using. gcd() is Euclid's method for calculating the greatest common divisor, and phi() is the totient function.
def gcd(a, b):
    while b:
        a, b=b, a%b
    return a
def phi(a):
    b=a-1
    c=0
    while b:
        if not gcd(a,b)-1:
            c+=1
        b-=1
    return c
Most implementations mentioned by other users rely on calling a gcd() or isPrime() function. If you are going to use the phi() function many times, it pays off to calculate these values beforehand. One way of doing this is with a so-called sieve algorithm.
This answer on Stack Overflow, https://stackoverflow.com/a/18997575/7217653, provides a fast way of finding all primes below a given number.
OK, now we can replace isPrime() with a lookup in our array.
Now the actual phi function:
Wikipedia gives us a clear example: https://en.wikipedia.org/wiki/Euler%27s_totient_function#Example
phi(36) = phi(2^2 * 3^2) = 36 * (1- 1/2) * (1- 1/3) = 36 * 1/2 * 2/3 = 12
In words, this says that the distinct prime factors of 36 are 2 and 3; half of the thirty-six integers from 1 to 36 are divisible by 2, leaving eighteen; a third of those are divisible by 3, leaving twelve numbers that are coprime to 36. And indeed there are twelve positive integers that are coprime with 36 and lower than 36: 1, 5, 7, 11, 13, 17, 19, 23, 25, 29, 31, and 35.
TL;DR
In other words: we have to find all the distinct prime factors of our number and then, for each prime_factor, apply n *= 1 - 1/prime_factor.
MAX = 10**5
# CREDIT TO https://stackoverflow.com/a/18997575/7217653
def sieve_for_primes_to(n):
    size = n//2
    sieve = [1]*size
    limit = int(n**0.5)
    for i in range(1,limit):
        if sieve[i]:
            val = 2*i+1
            tmp = ((size-1) - i)//val
            sieve[i+val::val] = [0]*tmp
    return [2] + [i*2+1 for i, v in enumerate(sieve) if v and i>0]
PRIMES = sieve_for_primes_to(MAX)
print("Primes generated")
def phi(n):
    original_n = n
    prime_factors = []
    prime_index = 0
    while n > 1: # As long as there are more factors to be found
        p = PRIMES[prime_index]
        if (n % p == 0): # is this prime a factor?
            prime_factors.append(p)
            while n % p == 0: # remove this factor as long as it divides the current number
                n = n // p
        prime_index += 1
    for v in prime_factors: # Now we have the prime factors, we do the same calculation as wikipedia
        # integer form of original_n *= 1 - 1/v (exact, because v divides original_n)
        original_n = original_n // v * (v - 1)
    return original_n
print(phi(36)) # = phi(2**2 * 3**2) = 36 * (1- 1/2) * (1- 1/3) = 36 * 1/2 * 2/3 = 12
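If phi is needed for every n up to some limit, the sieve idea can be taken one step further: instead of sieving primes and then factoring each n, sieve the totient values themselves. This is a swapped-in variant rather than the code above, and totient_sieve is a hypothetical name.

```python
def totient_sieve(limit):
    """Return a list phi with phi[n] = Euler's totient of n for 0 <= n <= limit."""
    phi = list(range(limit + 1))        # start with phi[n] = n
    for p in range(2, limit + 1):
        if phi[p] == p:                 # untouched so far, so p is prime
            for m in range(p, limit + 1, p):
                phi[m] -= phi[m] // p   # phi[m] *= (1 - 1/p), exactly
    return phi

table = totient_sieve(100)
print(table[36])  # 12
```

Each prime p touches limit/p entries, so the whole table costs about O(limit log log limit) operations.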
It looks like you're trying to use Euler's product formula, but you're not calculating the number of primes which divide a. You're calculating the number of elements relatively prime to a.
In addition, in Python 2, since 1 and i are both integers, 1/i is an integer division too, so for i > 1 you always get 0.
With regards to efficiency, I haven't noticed anyone mention that gcd(k, n) = gcd(n-k, n). Using this fact can save roughly half the work needed by the methods that use the gcd: only check k < n/2, start the count at 2 (because 1/n and (n-1)/n are always irreducible) and add 2 each time the gcd is one.
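A sketch of that halving trick, assuming n >= 1 (phi_symmetric is a hypothetical name). For n > 2 each k below n/2 is paired with n - k, and when n is even the middle value k = n/2 is never coprime to n, so it can be skipped.

```python
from math import gcd

def phi_symmetric(n):
    """Count k in 1..n with gcd(n, k) == 1, testing only k < n/2."""
    if n <= 2:
        return 1 if n >= 1 else 0       # phi(1) = phi(2) = 1
    count = 2                           # k = 1 and k = n - 1 are always coprime to n
    for k in range(2, (n + 1) // 2):    # k runs over 2 .. ceil(n/2) - 1
        if gcd(n, k) == 1:
            count += 2                  # k and n - k are both coprime to n
    return count

print(phi_symmetric(36))  # 12
```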
Here is a shorter implementation of orlp's answer.
from math import gcd
def phi(n): return sum([gcd(n, k)==1 for k in range(1, n+1)])
As others have already mentioned it leaves room for performance optimization.
To calculate phi of any number n, we use the formula
$\varphi(n) = n \prod_{p \mid n} \left(1 - \frac{1}{p}\right)$
where p ranges over the distinct prime factors of n.
So, you have a few mistakes in your code:
1. y should be initialized to n.
2. In 1/i, both 1 and i are integers, so (in Python 2) the result is also an integer, which leads to wrong results.
Here is the code with required corrections.
def phi(n):
    y = n
    for i in range(2,n+1):
        if isPrime(i) and n % i == 0 :  # isPrime() as defined in the question
            y -= y // i  # exact: i still divides y at this point
    return int(y)