Why is a for-loop factorial function faster than a recursive one? - python

I have written two functions to calculate combinations. The first one uses a for loop, the other one uses a recursive factorial function. Why is the first faster than the second?
def combinations(n: int, k: int) -> int:
    # Collection >= Selection
    if n < k:
        raise ValueError(
            "The size of the collection we are selecting items from must be "
            "larger than the size of the selection."
        )
    # Sizes > 0
    if n < 0 or k < 0:
        raise ValueError(
            "Cannot work with negative integers."
        )
    # Compute with standard python only
    numerator = 1
    for i in range(n + 1 - k, n+1):
        numerator *= i
    denominator = 1
    for i in range(1, k+1):
        denominator *= i
    return int(numerator / denominator)
The second function needs a factorial function defined as:
def factorial(n: int) -> int:
    if n < 0:
        raise ValueError(
            "Cannot calculate factorial of a negative number."
        )
    # Recursive function up to n = 0
    return n * factorial(n - 1) if n - 1 >= 0 else 1
And it is defined as:
def combinations2(n: int, k: int) -> int:
    # Collection >= Selection
    if n < k:
        raise ValueError(
            "The size of the collection we are selecting items from must be "
            "larger than the size of the selection."
        )
    return int(factorial(n) / (factorial(k) * factorial(n - k)))
When I run the following test in the IPython console, it is clear which one is faster:
%timeit combinations(1000, 50)
16.2 µs ± 1.95 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
and
%timeit combinations2(1000, 50)
1.6 ms ± 129 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
NEW VERSION OF COMBINATIONS2
Okay, following the comments, I agree that combinations2 performs many more operations. So I rewrote both the factorial and the combinations2 functions; here are the new versions:
def factorial(n: int, lower: int=-1) -> int:
    # n > 0
    if n < 0:
        raise ValueError(
            "Cannot calculate factorial of a negative number."
        )
    # Recursive function up to n = 0 or up to lower bound
    if n - 1 >= 0 and n - 1 >= lower:
        return n * factorial(n - 1, lower)
    return 1
which can now take a lower bound. Notice that, in general, factorial(a, b) = factorial(a) / factorial(b). Here is the new version of the combinations2 function:
def combinations2(n: int, k: int) -> int:
    if n < k:
        raise ValueError(
            "The size of the collection we are selecting items from must be "
            "larger than the size of the selection."
        )
    return int(factorial(n, n - k) / factorial(k))
But again, this is their comparison:
%timeit combinations(100, 50)
10.5 µs ± 1.67 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit combinations2(100, 50)
56.1 µs ± 5.79 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Just count the number of operations:
In combinations you perform (n+1) - (n+1-k) = k multiplications for the numerator and (k+1) - 1 = k multiplications for the denominator.
Total: 2k multiplications.
In combinations2 you perform n + k + (n-k) multiplications, i.e. 2n multiplications.
You also make 2n function calls for the recursion.
With k=50 and n=1000, it is no wonder that the first solution is faster.
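To make the operation count concrete, here is a minimal sketch (not the asker's code) of a single-loop variant that needs only about k multiplications and k integer divisions. The name combinations_exact is made up for illustration. It uses // rather than int(a / b), because true division goes through a float and can lose precision for large results.

```python
from math import comb  # Python 3.8+, used here only as a reference value


def combinations_exact(n: int, k: int) -> int:
    """Hypothetical helper: C(n, k) with one loop and exact integer arithmetic."""
    if k < 0 or n < k:
        raise ValueError("Need 0 <= k <= n.")
    k = min(k, n - k)  # C(n, k) == C(n, n - k), so loop over the smaller one
    result = 1
    for i in range(1, k + 1):
        # After this step result == C(n - k + i, i), so the division is exact.
        result = result * (n - k + i) // i
    return result


# Compare against the standard library implementation.
print(combinations_exact(1000, 50) == comb(1000, 50))
```

On Python 3.8 or newer, math.comb itself is probably the simplest choice; the loop is only meant to show where the k operations come from.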

Related

Python More efficient permutation

I have a string that consists of x occurrences of the letter 'r' and y occurrences of 'u'. My goal is to print all possible orderings of those letters. This is my working example:
import itertools
RIGHT = 'r'
UP = 'u'
def up_and_right(n, k, lst):
    case = RIGHT*n + UP*k
    return [''.join(p) for p in set(itertools.permutations(case))]
Example input and output:
input:
lst1 = []
up_and_right(2,2,lst1)
lst1 => output:['ruru', 'urur', 'rruu', 'uurr', 'urru', 'ruur']
My problem is that when the inputs are larger than 10, the code takes a minute to compute. How can I improve the compute time?
Thanks in advance!
Try itertools.combinations:
import itertools
RIGHT = "r"
UP = "u"
def up_and_right(n, k):
    out = []
    for right_idxs in itertools.combinations(range(n + k), r=n):
        s = ""
        for idx in range(n + k):
            if idx in right_idxs:
                s += RIGHT
            else:
                s += UP
        out.append(s)
    return out
print(up_and_right(2, 2))
Prints:
['rruu', 'ruru', 'ruur', 'urru', 'urur', 'uurr']
With one-liner:
def up_and_right(n, k):
    return [
        "".join(RIGHT if idx in right_idxs else UP for idx in range(n + k))
        for right_idxs in itertools.combinations(range(n + k), r=n)
    ]
Your problem is about finding permutations of multisets. sympy has a utility method that deals with it.
from sympy.utilities.iterables import multiset_permutations
def up_and_right2(n,k):
    case = RIGHT*n + UP*k
    return list(map(''.join, multiset_permutations(case)))
Test:
y = up_and_right(n,k)
x = up_and_right2(n,k)
assert len(x) == len(y) and set(y) == set(x)
Timings:
n = 5
k = 6
%timeit up_and_right(n,k)
3.57 s ± 52.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit up_and_right2(n,k)
4.17 ms ± 159 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
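The size of the gap in these timings follows from a simple count. itertools.permutations treats every character as distinct, so it yields (n + k)! tuples, while only C(n + k, n) distinct strings remain after set(). A small sketch of that arithmetic for the n = 5, k = 6 case timed above (variable names are illustrative only):

```python
from itertools import permutations
from math import comb, factorial

n, k = 5, 6
all_orderings = factorial(n + k)   # tuples produced by itertools.permutations
distinct_strings = comb(n + k, n)  # strings that survive the set() call

print(all_orderings, distinct_strings)

# Sanity check of the same relationship on a case small enough to enumerate.
small = set(permutations("rr" + "uu"))
print(len(small) == comb(4, 2))
```

So the original approach generates tens of millions of tuples only to keep a few hundred of them, whereas the combinations-based and multiset-based versions produce each result once.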

Why np.float32 may perform slower than np.float64?

I am writing my own implementation of the Sinkhorn-Knopp algorithm for the optimal transport problem, basing on the implementation from the github repo of the POT. The function looks as follows:
import numpy as np

#version for the dense matrices
def sinkhorn_knopp(C, reg, a = None, b = None, max_iter = 1e3, eps = 1e-9, log = False, verbose = False, log_interval = 10):
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    C = np.asarray(C, dtype=np.float64)
    # if the weights are not specified, assign them uniformly
    if len(a.shape) == 0:
        a = np.ones((C.shape[0],), dtype=np.float64) / C.shape[0]
    if len(b.shape) == 0:
        b = np.ones((C.shape[1],), dtype=np.float64) / C.shape[1]
    # Faster exponent
    K = np.divide(C, -reg)
    K = np.exp(K)
    # Init data
    dim_a = len(a)
    dim_b = len(b)
    # Set initial values of u and v
    u = np.ones(dim_a) / dim_a
    v = np.ones(dim_b) / dim_b
    r = np.empty_like(b)
    Kp = (1 / a).reshape(-1, 1) * K
    err = 1
    cpt = 0
    if log:
        log = {'err' : []}
    while(err > eps and cpt < max_iter):
        uprev = u
        vprev = v
        KtransposeU = K.T @ u
        v = np.divide(b, KtransposeU)
        u = 1. / (Kp @ v)
        if (np.any(KtransposeU == 0)
                or np.any(np.isnan(u)) or np.any(np.isnan(v))
                or np.any(np.isinf(u)) or np.any(np.isinf(v))):
            # we have reached the machine precision
            # come back to previous solution and quit loop
            print('Warning: numerical errors at iteration', cpt)
            u = uprev
            v = vprev
            break
        if cpt % log_interval == 0:
            #residual on the iteration
            r = (u @ K) * v
            # violation of marginal
            err = np.linalg.norm(r - b)
            if log:
                log['err'].append(err)
        cpt += 1
    #return OT matrix
    ot_matrix = u * K * v
    loss = np.sum(C * ot_matrix)
    if log:
        return ot_matrix, loss, log
    else:
        return ot_matrix, loss
I have listed the code for np.float64. However, if one works with np.float32, the algorithm surprisingly runs slower. Naively, one would expect a "smaller" float to be faster, as there are fewer bits to process. But the measurements show the following numbers:
#np.float64 version
63.6 ms ± 7.94 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#np.float32 version
71.4 ms ± 2.01 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#np.float64 version
650 ms ± 12.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
#np.float32 version
2.48 s ± 298 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
For the larger problem the difference in time is almost a factor of four, which looks rather weird. Why might this be the case?
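The float32 variant is not shown, so the following is only a guess at one thing worth checking, not a diagnosis. np.ones without a dtype argument returns float64, so if u and v are still created as in the listing above, every K.T @ u mixes float32 with float64 and the result is promoted to float64. A minimal sketch with made-up array sizes:

```python
import numpy as np

K32 = np.ones((4, 3), dtype=np.float32)  # stand-in for a float32 kernel matrix K

u_default = np.ones(4) / 4               # no dtype given, so this is float64
u32 = np.ones(4, dtype=np.float32) / 4   # explicitly float32

mixed = K32.T @ u_default  # float32 @ float64
pure = K32.T @ u32         # float32 @ float32

# Shows which precision each product actually ends up in.
print(mixed.dtype, pure.dtype)
```

If the products in the real loop turn out to be float64, NumPy has to bring the operands to a common type on each iteration; passing dtype=K.dtype when creating u and v would keep everything in one precision. Whether that accounts for the measured gap depends on the machine and the BLAS build, so it needs to be timed.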

How can I speed up my calculations with loops? Python

I wrote this code, but it runs very slowly.
I am working out how many times I have to run the random number generator to get a number less than or equal to inv (six in this case). I count the number of attempts until a number <= inv is generated. Then I decrease inv by 1 and repeat the loop, until inv reaches 0.
I repeat all of this 10 ** 4 times to find the arithmetic mean.
Please help me speed up this code; it is extremely slow. I would be immensely grateful. No third-party libraries, please. Thanks!
import random
inv = 6
def math(inv):
    n = 10**4
    counter = 0
    while n != 0:
        invers = inv
        count = 0
        while invers > 0:
            count += 1
            random_digit = random.randint(1, 45)
            if random_digit <= invers:
                invers -= 1
                counter += count
                count = 0
        if invers == 0:
            n -= 1
            invers = inv
    print(counter/10**4)
math(inv)
Here is a simple way to accelerate your code as is using numba:
import numba
m2 = numba.jit(nopython=True)(math)
Timings in ipython:
%timeit math(inv)
1.44 s ± 16.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit -n 7 m2(inv)
10.4 ms ± 177 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
This speeds up your code by over 100x.
You don't need all those loops. numpy.random.randint() can generate an array of a given size filled with random integers between low (inclusive) and high (exclusive), so
np.random.randint(1, 45, (1000, 10000)) will generate a matrix of 1000 rows and 10k columns filled with random numbers. Then all you need to do is find the first row in each column that contains a value below inv (see "Numpy: find first index of value fast"). Then
import numpy as np
max_tries = 1000
def do_math(inv):
    rands = np.random.randint(1, 45, (max_tries, 10000))
    first_inv = np.argmax(rands < inv, axis=0)
    counter = first_inv.mean()
    return counter
This is not exactly what you want your function to do, but I found all your loops quite convoluted so this should point you in the right direction, feel free to adapt this code to what you need. This function will give you the number of tries required to get a random number less than inv, averaged over 10000 experiments.
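Since the question asks for a solution without third-party libraries, here is a hedged standard-library sketch of a different technique: instead of drawing numbers until one is small enough, draw the number of attempts directly from a geometric distribution by inverse transform sampling. It assumes the reading of the question in which the target goes from inv down to 1 and each draw is uniform on 1..45; the function names and the seed parameter are made up for illustration.

```python
import random
from math import ceil, log


def tries_until_success(p, rng):
    """Sample how many attempts are needed for one success of probability p."""
    if p >= 1.0:
        return 1
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    return max(1, ceil(log(u) / log(1.0 - p)))


def mean_total_tries(inv=6, high=45, runs=10**4, seed=0):
    """Average number of draws per run, one geometric sample per target value."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        for remaining in range(inv, 0, -1):
            # A draw from 1..high is <= remaining with probability remaining / high.
            total += tries_until_success(remaining / high, rng)
    return total / runs


print(mean_total_tries())
```

Under that reading the exact expectation is 45 * (1 + 1/2 + 1/3 + 1/4 + 1/5 + 1/6) = 110.25, so the printed mean should land close to it while needing only six random numbers per run instead of roughly a hundred.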

Scipy pivoted QR permutation

I have to solve a lot of linear systems using the Scipy pivoted QR-decomposition.
Q, R, perm = scipy.linalg.qr(PW, pivoting=True, mode='full')
While solving the system I reorder the solution using a
permutation matrix built by the function below.
def pvec2pmat(vec):
    n = len(vec)
    P = np.zeros((n, n))
    counter = 0
    for i in range(0, n):
        for j in range(0, n):
            if j == vec[counter]:
                P[i, j] = 1.0
                counter = counter + 1
                break
    return P.T
Unfortunately, this turns out to be very slow, and the code spends a lot of time generating these matrices.
Is it possible to speed up this function?
It is very hard to answer your question, as I do not know what your code is supposed to do. Furthermore, there seems to be no connection to the title. If I understand you correctly, you are asking us to optimize the given function without saying what the input is.
I will assume that vec is expected to be a 1-dimensional integer array. In this case your second loop is unnecessary. There are two cases:
vec[counter] in range(0, n)
In this case you set P[i, vec[counter]] to one, increase the counter, and leave the inner loop (due to the break statement).
vec[counter] not in range(0, n)
The counter will never be increased again, as the if statement will never be True. Thus, we ignore the rest of vec and return the matrix.
Therefore a first simplification would be:
def pvec2pmat(vec):
    n = len(vec)
    P = np.zeros((n, n))
    counter = 0
    for i in range(0, n):
        if vec[counter] in range(0, n):
            P[i, vec[counter]] = 1.0
            counter += 1
        else:
            return P.T
    return P.T
So the relevant part of vec only extends up to the first value that is not in range(0, n). We can check this right at the beginning and discard the rest.
We can do this using
invalid = (vec < 0) | (vec >= n)  # n = len(vec), as above
try:
    first_invalid = np.flatnonzero(invalid)[0]
except IndexError: # no invalid values
    pass
else:
    vec = vec[:first_invalid] # keep only till first invalid encounter
Now we know that we assign exactly one value in each row i < vec.size.
So we can simplify the loop
for i, vec_val in enumerate(vec):
    P[i, vec_val] = 1
This can, however, also be done using indexing:
P[np.arange(vec.size), vec] = 1
Finally we realize that, instead of taking the transpose, we can just assign the indices in reverse order and get
def pvec2pmat(vec):
    n = len(vec)
    P = np.zeros((n, n))
    invalid = (vec < 0) | (vec >= n)
    try:
        first_invalid = np.flatnonzero(invalid)[0]
    except IndexError: # no invalid values
        pass
    else:
        vec = vec[:first_invalid] # keep only till first invalid encounter
    P[vec, np.arange(vec.size)] = 1
    return P
A quick timing:
vec = np.arange(1000)
np.random.shuffle(vec)
# your version
128 ms ± 2.66 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# my version
%timeit pvec2pmat(vec)
379 µs ± 22.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# for this simple example of a permutation, we can of course resort to
# simple indexing
%timeit np.eye(vec.size)[:, vec]
8.89 ms ± 13.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
For this particular example, my code takes about 1/300 of the time of your version. Depending on your needs, another large speedup can be achieved if P is chosen as a boolean matrix and we assign True.
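If the matrix is only ever used to reorder a solution vector, it may be possible to skip building it altogether. The sketch below is an illustration under that assumption, not something stated in the question: perm and y are random stand-ins for the pivot vector returned by scipy.linalg.qr and for the solution in pivoted order.

```python
import numpy as np

rng = np.random.default_rng(0)
perm = rng.permutation(6)    # stand-in for the pivot vector from the QR call
y = rng.standard_normal(6)   # stand-in for the solution in pivoted order

# Explicit permutation matrix, built with the same indexing as pvec2pmat.
P = np.zeros((perm.size, perm.size))
P[perm, np.arange(perm.size)] = 1

# Same reordering without the matrix: entry i of y goes to position perm[i].
x = np.empty_like(y)
x[perm] = y

# Both routes should give the same vector.
print(np.allclose(P @ y, x))
```

The indexed assignment touches n entries, whereas allocating the matrix and multiplying by it costs on the order of n squared operations, so for many systems the difference can be noticeable.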

Primitive Calculator in Python

I have written the following code, which does what it is supposed to do and passes the tests within the time and memory limits. However, it takes 90% of the time limit. Is there any way to speed this up?
Secondly, I have seen other solutions that seem to be more straightforward and do not build a list of all minimum operations for integers up to n. Isn't it true that in DP we are supposed to do that? In other words, aren't we supposed to always build a table from bottom up?
Lastly, how can I make the code more readable?
# Use Python 3
""" You're given a calculator with only 3 operations: (-1, //2, //3).
find the minimum number of operations and the sequence of numbers to
go from 1 to n"""
import sys
input = sys.stdin.read()
n = int(input)
def operations(n):
    """
    :param n: integer
    :return: The list of the minimum number of operations to reduce n to 1
    for each integer up to n. """
    lst = [0] * n
    for index in range(1, n):
        nodes = [1 + lst[index - 1]]
        for k in (2, 3):
            if (index + 1) % k == 0:
                nodes.append(1 + lst[((index + 1) // k) - 1])
        lst[index] = sorted(nodes)[0]
    return lst
master_sequence = list(enumerate(operations(n), 1))
end = master_sequence[-1]
minimum_operations = end[1]
sequence = []
while end != (1, 0):
    step = [item[0] for item in master_sequence if
            (end[1] - item[1]) == 1 and (end[0] - item[0] == 1 or end[0] %
            item[0] == 0)][0]
    sequence.append(step)
    end = master_sequence[step - 1]
print(minimum_operations)
for s in sequence[::-1]:
    print(s, end=' ')
print(n)
DP just means using sub-problem results to shorten the time/space complexity, so it often builds up but doesn't necessarily mean every value. Note: you could also solve this problem using a heap search, which wouldn't hit every node and I would imagine is pretty close to this in terms of timing and presumably less space.
A shorter approach using DP to the same result:
In []:
n = 10
# Define the operations and their condition for application:
ops = [(lambda x: True, lambda x: x-1),
       (lambda x: x%2==0, lambda x: x//2),
       (lambda x: x%3==0, lambda x: x//3)]
# Construct the operations count for all values up to `n`
min_ops = [0]*(n+1)
for i in range(2, n+1):
    min_ops[i] = min(min_ops[op(i)] for cond, op in ops if cond(i))+1
# Reconstruct the path
r = []
while n:
    r.append(n)
    n = min((op(n) for cond, op in ops if cond(n)), key=min_ops.__getitem__)
len(r)-1, r[::-1]
Out[]
(3, [1, 3, 9, 10])
Some quick timings for different n:
10: 22 µs ± 577 ns per loop
1000: 1.48 ms ± 12.3 µs per loop
10000: 15.3 ms ± 325 µs per loop
100000: 159 ms ± 2.81 ms per loop
When I ran your code for the same values of n, I got:
10: 15.7 µs ± 229 ns per loop
1000: 4.55 ms ± 318 µs per loop
10000: 27.1 ms ± 896 µs per loop
100000: 315 ms ± 7.13 ms per loop
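One more variation that may help with both speed and readability is to record the predecessor of each value while filling the table, so the path can be read back directly instead of being searched for at every step. This is a sketch of that idea only; the function name shortest_sequence and the prev list are made up for illustration and are not taken from either version above.

```python
def shortest_sequence(n):
    """Return one shortest sequence 1 -> n using the steps +1, *2 and *3."""
    steps = [0] * (n + 1)  # steps[i]: minimum number of operations to reach i
    prev = [0] * (n + 1)   # prev[i]: value just before i on one optimal path
    for i in range(2, n + 1):
        best = i - 1
        if i % 2 == 0 and steps[i // 2] < steps[best]:
            best = i // 2
        if i % 3 == 0 and steps[i // 3] < steps[best]:
            best = i // 3
        steps[i] = steps[best] + 1
        prev[i] = best
    path = []
    while n >= 1:  # prev[1] is 0, which ends the walk
        path.append(n)
        n = prev[n]
    return path[::-1]


path = shortest_sequence(10)
print(len(path) - 1)
print(*path)
```

The table is still built bottom-up for every value up to n, but the reconstruction becomes a single walk of at most n steps.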
