More efficient permutations in Python

I have a string that consists of x copies of the letter 'r' and y copies of the letter 'u'. My goal is to print all distinct orderings of those letters. This is my example of working code:
import itertools

RIGHT = 'r'
UP = 'u'

def up_and_right(n, k, lst):
    case = RIGHT*n + UP*k
    return [''.join(p) for p in set(itertools.permutations(case))]
Example input and output:
lst1 = up_and_right(2, 2, [])
lst1 => ['ruru', 'urur', 'rruu', 'uurr', 'urru', 'ruur']
My problem is that when the inputs are larger than about 10, the code takes over a minute to compute. How can I improve the running time?
Thanks in advance!

Try itertools.combinations:
import itertools

RIGHT = "r"
UP = "u"

def up_and_right(n, k):
    out = []
    for right_idxs in itertools.combinations(range(n + k), r=n):
        s = ""
        for idx in range(n + k):
            if idx in right_idxs:
                s += RIGHT
            else:
                s += UP
        out.append(s)
    return out

print(up_and_right(2, 2))
Prints:
['rruu', 'ruru', 'ruur', 'urru', 'urur', 'uurr']
As a one-liner:
def up_and_right(n, k):
    return [
        "".join(RIGHT if idx in right_idxs else UP for idx in range(n + k))
        for right_idxs in itertools.combinations(range(n + k), r=n)
    ]
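As a quick sanity check (a small illustration of my own, assuming Python 3.8+ for math.comb), the number of generated strings equals the binomial coefficient C(n + k, n), which is also why this approach scales so much better than generating every permutation and deduplicating:

import math

result = up_and_right(5, 6)
assert len(result) == math.comb(5 + 6, 5)  # 462 distinct strings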

Your problem is about finding permutations of multisets. sympy has a utility method that deals with it.
from sympy.utilities.iterables import multiset_permutations

def up_and_right2(n, k):
    case = RIGHT*n + UP*k
    return list(map(''.join, multiset_permutations(case)))
Test:
y = up_and_right(n,k)
x = up_and_right2(n,k)
assert len(x) == len(y) and set(y) == set(x)
Timings:
n = 5
k = 6
%timeit up_and_right(n,k)
3.57 s ± 52.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit up_and_right2(n,k)
4.17 ms ± 159 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
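The speedup comes from the fact that multiset_permutations yields each distinct arrangement exactly once, whereas itertools.permutations treats equal letters as distinguishable and produces all (n + k)! tuples, which set() then has to collapse. A rough illustration of the gap (a standalone snippet, assuming Python 3.8+ for math.comb):

import math

n, k = 5, 6
print(math.factorial(n + k))  # 39916800 tuples generated by itertools.permutations
print(math.comb(n + k, n))    # 462 distinct strings actually needed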

Why np.float32 may perform slower than np.float64?

I am writing my own implementation of the Sinkhorn-Knopp algorithm for the optimal transport problem, based on the implementation in the POT GitHub repository. The function looks as follows:
# version for dense matrices
def sinkhorn_knopp(C, reg, a=None, b=None, max_iter=1e3, eps=1e-9,
                   log=False, verbose=False, log_interval=10):
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    C = np.asarray(C, dtype=np.float64)

    # if the weights are not specified, assign them uniformly
    if len(a.shape) == 0:
        a = np.ones((C.shape[0],), dtype=np.float64) / C.shape[0]
    if len(b.shape) == 0:
        b = np.ones((C.shape[1],), dtype=np.float64) / C.shape[1]

    # Faster exponent
    K = np.divide(C, -reg)
    K = np.exp(K)

    # Init data
    dim_a = len(a)
    dim_b = len(b)

    # Set initial values of u and v
    u = np.ones(dim_a) / dim_a
    v = np.ones(dim_b) / dim_b

    r = np.empty_like(b)
    Kp = (1 / a).reshape(-1, 1) * K
    err = 1
    cpt = 0
    if log:
        log = {'err': []}

    while err > eps and cpt < max_iter:
        uprev = u
        vprev = v
        KtransposeU = K.T @ u
        v = np.divide(b, KtransposeU)
        u = 1. / (Kp @ v)
        if (np.any(KtransposeU == 0)
                or np.any(np.isnan(u)) or np.any(np.isnan(v))
                or np.any(np.isinf(u)) or np.any(np.isinf(v))):
            # we have reached the machine precision
            # come back to previous solution and quit loop
            print('Warning: numerical errors at iteration', cpt)
            u = uprev
            v = vprev
            break
        if cpt % log_interval == 0:
            # residual on the iteration
            r = (u @ K) * v
            # violation of marginal
            err = np.linalg.norm(r - b)
            if log:
                log['err'].append(err)
        cpt += 1

    # return OT matrix
    ot_matrix = u * K * v
    loss = np.sum(C * ot_matrix)
    if log:
        return ot_matrix, loss, log
    else:
        return ot_matrix, loss
I have listed the np.float64 version of the code. However, if one switches to np.float32, the algorithm surprisingly runs slower. Naively, one would expect the "smaller" float to be faster, since fewer bits are involved. But the measurements show the following numbers:
#np.float64 version
63.6 ms ± 7.94 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#np.float32 version
71.4 ms ± 2.01 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#np.float64 version
650 ms ± 12.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
#np.float32 version
2.48 s ± 298 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
For the larger problem the difference in time is almost fourfold, which looks rather weird. Why might this be the case?
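One way to narrow this down (a standalone benchmarking sketch of my own, not part of the question's code; the matrix size and random data are assumptions) is to time the dominant operation, the dense matrix-vector product, separately for both dtypes and see whether the gap already shows up there or only inside the full loop:

import timeit
import numpy as np

m = 2000
for dtype in (np.float64, np.float32):
    K = np.random.rand(m, m).astype(dtype)   # stand-in for the kernel matrix
    u = np.random.rand(m).astype(dtype)
    t = timeit.timeit(lambda: K.T @ u, number=100)
    print(dtype.__name__, f"{t:.3f} s for 100 matrix-vector products")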

Splitting values in an array 'logarithmically' / based on another array

I have a 2D array where each element is a Fourier transform. I'd like to split each transform 'logarithmically'. For example, let's take a single one of those arrays and call it a:
a = np.arange(0, 512)
# I want to split a into 'bins' defined by b, below:
b = np.array([0] + [10 * 2**i for i in range(7)])  # [0, 10, 20, 40, 80, 160, 320, 640]
What I'm looking to do is something like using np.split, except I would like to split values into 'bins' based on array b such that all values of a between [0, 10) are in one bin, all values between [10, 20) in another, etc.
I could do this in some sort of convoluted for loop:
split_arr = []
for i in range(1, len(b)):
    fbin = []
    for amp in a:
        if (amp >= b[i-1]) and (amp < b[i]):
            fbin.append(amp)
    split_arr.append(fbin)
I have many arrays to split, and also this is ugly (just my opinion). Is there a better way?
Here is how you can do it, using np.split:
np.split(a, np.searchsorted(a,b))
If your array a is not sorted, sort it before the above command:
a = np.sort(a)
np.searchsorted finds the indices at which the values of b would be inserted into the sorted array a; in other words, it finds the locations where you want to split your array. If you do not want the empty array at the beginning, simply remove 0 from b.
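For the arrays from the question this gives the following (a short illustration; the last non-empty chunk simply stops at 511 because a contains no larger values, and the trailing empty chunk comes from the bin edge 640 lying beyond the data):

import numpy as np

a = np.arange(0, 512)  # already sorted
b = np.array([0, 10, 20, 40, 80, 160, 320, 640])
chunks = np.split(a, np.searchsorted(a, b))
print([len(c) for c in chunks])
# [0, 10, 10, 20, 40, 80, 160, 192, 0]  -> bins [0,10), [10,20), ..., [320,512)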
First you can reduce the 'ugliness' by using a list comprehension:
split_arr = [[amp for amp in a if (amp >= b[i-1]) and (amp < b[i])] for i in range(1, len(b))]
Then you can apply the same logic using NumPy's fast vectorized operations (which has the bonus of looking even cleaner):
split_arr = [a[(a >= b[i-1]) & (a < b[i])] for i in range(1, len(b))]
Comparison:
%timeit [[amp for amp in a if (amp >= b[i-1]) and (amp < b[i])] for i in range(1, len(b))]
1.29 ms ± 109 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit [a[(a >= b[i-1]) & (a < b[i])] for i in range(1, len(b))]
35.9 µs ± 4.52 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Scipy pivoted QR permutation

I have to solve a lot of linear systems using the Scipy pivoted QR-decomposition.
Q, R, perm = scipy.linalg.qr(PW, pivoting=True, mode='full')
While solving the system I reorder the solution using a permutation matrix built by the function below.
def pvec2pmat(vec):
    n = len(vec)
    P = np.zeros((n, n))
    counter = 0
    for i in range(0, n):
        for j in range(0, n):
            if j == vec[counter]:
                P[i, j] = 1.0
                counter = counter + 1
                break
    return P.T
Unfortunately, this turns out to be very slow, and the code spends a lot of time generating these matrices.
Is it possible to speed up this function?
It is very hard to answer your question, as I do not know what your code is supposed to do. Furthermore, there seems to be no connection to the title. If I understand you correctly, you are asking to optimize the given function without even saying what the input is.
I will assume that vec is expected to be a 1-dimensional integer array. In this case your inner loop is quite unnecessary. There are two cases:
vec[counter] in range(0, n): you set P[i, vec[counter]] to one and increase the counter (due to the break statement).
vec[counter] not in range(0, n): the counter is never increased, as the if statement is never True; the remaining rows stay zero, so the rest of vec is effectively ignored and the matrix is returned.
Therefore, a first simplification would be:
def pvec2pmat(vec):
    n = len(vec)
    P = np.zeros((n, n))
    counter = 0
    for i in range(0, n):
        if vec[counter] in range(0, n):
            P[i, vec[counter]] = 1.0
            counter += 1
        else:
            return P.T
    return P.T
So the relevant part of vec only runs up to the first value that is not in range(0, n). We can check this right at the beginning and discard the rest.
We can do this using
invalid = (vec < 0) | (vec >= n)  # values outside range(0, n)
try:
    first_invalid = np.flatnonzero(invalid)[0]
except IndexError:  # no invalid values
    pass
else:
    vec = vec[:first_invalid]  # keep only up to the first invalid entry
Now we know that we assign one value in each of the first vec.size rows.
So we can simplify the loop to
for i, vec_val in enumerate(vec):
    P[i, vec_val] = 1
This can, however, also be done with indexing:
P[np.arange(vec.size), vec] = 1
Finally we realize that, instead of taking the transpose at the end, we can just assign in the reversed index order and get:
def pvec2pmat(vec):
    n = len(vec)
    P = np.zeros((n, n))
    invalid = (vec < 0) | (vec >= n)
    try:
        first_invalid = np.flatnonzero(invalid)[0]
    except IndexError:  # no invalid values
        pass
    else:
        vec = vec[:first_invalid]  # keep only up to the first invalid entry
    P[vec, np.arange(vec.size)] = 1
    return P
A quick timing:
vec = np.arange(1000)
np.random.shuffle(vec)

# your version
128 ms ± 2.66 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# my version
%timeit pvec2pmat(vec)
379 µs ± 22.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# for this simple example of a permutation, we can of course resort to
# simple indexing
%timeit np.eye(vec.size)[:, vec]
8.89 ms ± 13.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
For this particular example, my code takes about 1/300 of the time of your version. Depending on your needs, another large speedup can be achieved if P is made a boolean matrix and we assign True.
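As a side note (a sketch under the assumption that the end goal is solving a square system PW x = rhs with the pivoted factorization; the variable names and test data here are my own, not from the question), the permutation matrix is often not needed at all: since scipy returns perm such that PW[:, perm] == Q @ R, you can undo the column pivoting by fancy indexing instead of building and multiplying an n x n matrix:

import numpy as np
from scipy.linalg import qr, solve_triangular

rng = np.random.default_rng(0)
PW = rng.standard_normal((500, 500))
rhs = rng.standard_normal(500)

Q, R, perm = qr(PW, pivoting=True, mode='full')  # PW[:, perm] == Q @ R
x = np.empty_like(rhs)
x[perm] = solve_triangular(R, Q.T @ rhs)         # scatter through perm instead of multiplying by P
print(np.allclose(PW @ x, rhs))                  # True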

Why is a for-loop factorial function faster than a recursive one?

I have written two functions to calculate combinations. The first one uses a for loop, the other one uses a recursive factorial function. Why is the first faster than the second?
def combinations(n: int, k: int) -> int:
    # Collection >= Selection
    if n < k:
        raise ValueError(
            "The size of the collection we are selecting items from must be "
            "larger than the size of the selection."
        )
    # Sizes > 0
    if n < 0 or k < 0:
        raise ValueError(
            "Cannot work with negative integers."
        )
    # Compute with standard python only
    numerator = 1
    for i in range(n + 1 - k, n + 1):
        numerator *= i
    denominator = 1
    for i in range(1, k + 1):
        denominator *= i
    return int(numerator / denominator)
The second function needs a factorial function defined as:
def factorial(n: int) -> int:
    if n < 0:
        raise ValueError(
            "Cannot calculate factorial of a negative number."
        )
    # Recursive function up to n = 0
    return n * factorial(n - 1) if n - 1 >= 0 else 1
And the second combinations function is defined as:
def combinations2(n: int, k: int) -> int:
    # Collection >= Selection
    if n < k:
        raise ValueError(
            "The size of the collection we are selecting items from must be "
            "larger than the size of the selection."
        )
    return int(factorial(n) / (factorial(k) * factorial(n - k)))
When I run the following test in the IPython console, it is clear which one is faster:
%timeit combinations(1000, 50)
16.2 µs ± 1.95 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
and
%timeit combinations2(1000, 50)
1.6 ms ± 129 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
NEW VERSION OF COMBINATIONS2
Okay, following the comments, I agree combinations2 is doing many more operations. So I rewrote both the factorial and the combinations functions; here are their new versions:
def factorial(n: int, lower: int = -1) -> int:
    # n > 0
    if n < 0:
        raise ValueError(
            "Cannot calculate factorial of a negative number."
        )
    # Recursive function up to n = 0 or up to lower bound
    if n - 1 >= 0 and n - 1 >= lower:
        return n * factorial(n - 1, lower)
    return 1
which can now take a lower bound. Notice that in general factorial(a, b) = factorial(a) / factorial(b). Also, here is the new version of the combinations2 function:
def combinations2(n: int, k: int) -> int:
    if n < k:
        raise ValueError(
            "The size of the collection we are selecting items from must be "
            "larger than the size of the selection."
        )
    return int(factorial(n, n - k) / factorial(k))
But again, here is their comparison:
%timeit combinations(100, 50)
10.5 µs ± 1.67 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit combinations2(100, 50)
56.1 µs ± 5.79 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Just count the number of operations:
In combinations you are making (n+1) - (n+1-k) = k multiplications for the numerator, and (k+1) - 1 = k multiplications for the denominator.
Total: 2k multiplications.
In combinations2 you are making n + k + (n-k) multiplications, i.e. 2n multiplications.
You are also making roughly 2n function calls for the recursion.
With k=50 and n=1000, no wonder the first solution is faster.
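For completeness (assuming Python 3.8+, so not something from the question itself): the standard library computes this directly, and it also sidesteps the int(numerator / denominator) step, which goes through float division and can lose precision for large arguments; numerator // denominator would stay exact, since the denominator always divides the numerator evenly for binomial coefficients.

import math

exact = math.comb(1000, 50)  # exact integer binomial coefficient
print(exact)
# comparing against the question's combinations() may fail, because its
# float division rounds once the quotient exceeds float precision
print(exact == combinations(1000, 50))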

Primitive Calculator in Python

I have written the following code, which does what it is supposed to do and passes the tests within the time and memory limits. However, it takes 90% of the time limit. Is there any way to speed it up?
Secondly, I have seen other solutions that seem to be more straightforward and do not build a list of all minimum operations for integers up to n. Isn't it true that in DP we are supposed to do that? In other words, aren't we supposed to always build a table from bottom up?
Lastly, how can I make the code more readable?
# Use Python 3
"""You're given a calculator with only 3 operations: (-1, //2, //3).
Find the minimum number of operations and the sequence of numbers to
go from 1 to n."""
import sys

input = sys.stdin.read()
n = int(input)

def operations(n):
    """
    :param n: integer
    :return: The list of the minimum number of operations to reduce n to 1,
             for each integer up to n.
    """
    lst = [0] * n
    for index in range(1, n):
        nodes = [1 + lst[index - 1]]
        for k in (2, 3):
            if (index + 1) % k == 0:
                nodes.append(1 + lst[((index + 1) // k) - 1])
        lst[index] = sorted(nodes)[0]
    return lst

master_sequence = list(enumerate(operations(n), 1))
end = master_sequence[-1]
minimum_operations = end[1]
sequence = []
while end != (1, 0):
    step = [item[0] for item in master_sequence if
            (end[1] - item[1]) == 1 and (end[0] - item[0] == 1 or
            end[0] % item[0] == 0)][0]
    sequence.append(step)
    end = master_sequence[step - 1]

print(minimum_operations)
for s in sequence[::-1]:
    print(s, end=' ')
print(n)
DP just means using sub-problem results to reduce the time/space complexity, so it often builds a table bottom-up but doesn't necessarily have to fill in every value. Note: you could also solve this problem using a heap search, which wouldn't hit every node and, I would imagine, is pretty close to this in timing with presumably less space (a rough sketch of that idea appears after the timings below).
A shorter DP approach giving the same result:
In []:
n = 10

# Define the operations and their condition for application:
ops = [(lambda x: True, lambda x: x - 1),
       (lambda x: x % 2 == 0, lambda x: x // 2),
       (lambda x: x % 3 == 0, lambda x: x // 3)]

# Construct the operations count for all values up to `n`
min_ops = [0] * (n + 1)
for i in range(2, n + 1):
    min_ops[i] = min(min_ops[op(i)] for cond, op in ops if cond(i)) + 1

# Reconstruct the path
r = []
while n:
    r.append(n)
    n = min((op(n) for cond, op in ops if cond(n)), key=min_ops.__getitem__)

len(r) - 1, r[::-1]
Out[]
(3, [1, 3, 9, 10])
Some quick timings for different n:
10: 22 µs ± 577 ns per loop
1000: 1.48 ms ± 12.3 µs per loop
10000: 15.3 ms ± 325 µs per loop
100000: 159 ms ± 2.81 ms per loop
When I ran your code I got:
10: 15.7 µs ± 229 ns per loop
1000: 4.55 ms ± 318 µs per loop
10000: 27.1 ms ± 896 µs per loop
100000: 315 ms ± 7.13 ms per loop
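For reference, since every operation costs exactly one step, the heap search mentioned at the top of this answer degenerates into a plain breadth-first search. Here is a rough sketch of that idea (my own illustration, not tuned for the course's limits): working forwards from 1 with the inverse operations +1, *2, *3, BFS finds a shortest sequence without filling in a table entry for every integer up to n, at the price of a parent/visited dictionary:

from collections import deque

def min_sequence(n):
    """BFS from 1 to n over the edges +1, *2, *3 (the inverses of -1, //2, //3)."""
    parent = {1: None}
    queue = deque([1])
    while queue:
        x = queue.popleft()
        if x == n:
            break
        for nxt in (x + 1, x * 2, x * 3):
            if nxt <= n and nxt not in parent:
                parent[nxt] = x
                queue.append(nxt)
    # walk the parent links back from n to 1
    path = []
    cur = n
    while cur is not None:
        path.append(cur)
        cur = parent[cur]
    return path[::-1]

print(min_sequence(10))  # [1, 3, 9, 10]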
