I have written the following code which does what it is supposed to do and passes the tests and the time and memory limits. However, it takes 90% of the time limit. Is there any way to speed this up?
Secondly, I have seen other solutions that seem more straightforward and do not build a list of the minimum operations for every integer up to n. Isn't that what we are supposed to do in DP? In other words, aren't we always supposed to build a table from the bottom up?
Lastly, how can I make the code more readable?
# Use Python 3
""" You're given a calculator with only 3 operations: (-1, //2, //3).
find the minimum number of operations and the sequence of numbers to
go from 1 to n"""
import sys
input = sys.stdin.read()
n = int(input)
def operations(n):
    """
    :param n: integer
    :return: The list of the minimum number of operations to reduce n to 1
        for each integer up to n. """
    lst = [0] * n
    for index in range(1, n):
        nodes = [1 + lst[index - 1]]
        for k in (2, 3):
            if (index + 1) % k == 0:
                nodes.append(1 + lst[((index + 1) // k) - 1])
        lst[index] = sorted(nodes)[0]
    return lst
master_sequence = list(enumerate(operations(n), 1))
end = master_sequence[-1]
minimum_operations = end[1]
sequence = []
while end != (1, 0):
    step = [item[0] for item in master_sequence if
            (end[1] - item[1]) == 1 and
            (end[0] - item[0] == 1 or end[0] % item[0] == 0)][0]
    sequence.append(step)
    end = master_sequence[step - 1]

print(minimum_operations)
for s in sequence[::-1]:
    print(s, end=' ')
print(n)
DP just means using sub-problem results to reduce the time/space complexity, so it often builds bottom-up, but that doesn't necessarily mean computing every value. Note: you could also solve this problem with a heap search, which wouldn't hit every node and which I would imagine is pretty close to this in timing, with presumably less space.
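For illustration only, here is a minimal breadth-first-search sketch in that spirit (my own variant, not a heap search): because every operation costs 1, BFS from n stops as soon as 1 is reached and need not touch every integer below n.

from collections import deque

def min_ops_bfs(n):
    # Each edge (-1, //2, //3) costs 1, so plain BFS finds a shortest path.
    parent = {n: None}
    queue = deque([n])
    while queue:
        x = queue.popleft()
        if x == 1:
            break
        candidates = [x - 1]
        if x % 2 == 0:
            candidates.append(x // 2)
        if x % 3 == 0:
            candidates.append(x // 3)
        for nxt in candidates:
            if nxt >= 1 and nxt not in parent:
                parent[nxt] = x
                queue.append(nxt)
    # Walk back from 1 up to n to recover the sequence.
    path, x = [], 1
    while x is not None:
        path.append(x)
        x = parent[x]
    return len(path) - 1, path

print(min_ops_bfs(10))  # (3, [1, 3, 9, 10])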
A shorter approach using DP to the same result:
In []:
n = 10
# Define the operations and their condition for application:
ops = [(lambda x: True, lambda x: x-1),
       (lambda x: x%2==0, lambda x: x//2),
       (lambda x: x%3==0, lambda x: x//3)]
# Construct the operations count for all values up to `n`
min_ops = [0]*(n+1)
for i in range(2, n+1):
    min_ops[i] = min(min_ops[op(i)] for cond, op in ops if cond(i))+1
# Reconstruct the path
r = []
while n:
    r.append(n)
    n = min((op(n) for cond, op in ops if cond(n)), key=min_ops.__getitem__)
len(r)-1, r[::-1]
Out[]
(3, [1, 3, 9, 10])
Some quick timings for different n:
10: 22 µs ± 577 ns per loop
1000: 1.48 ms ± 12.3 µs per loop
10000: 15.3 ms ± 325 µs per loop
100000: 159 ms ± 2.81 ms per loop
When I ran your code I got:
10: 15.7 µs ± 229 ns per loop
1000: 4.55 ms ± 318 µs per loop
10000: 27.1 ms ± 896 µs per loop
100000: 315 ms ± 7.13 ms per loop
I have a string that consists of x copies of the letter 'r' and y copies of the letter 'u'. My goal is to print all possible orderings of that same number of x and y letters. This is my example of working code.
import itertools
RIGHT = 'r'
UP = 'u'
def up_and_right(n, k, lst):
    case = RIGHT*n + UP*k
    return [''.join(p) for p in set(itertools.permutations(case))]
Example input and output:
input:
lst1 = []
up_and_right(2,2,lst1)
lst1 => output:['ruru', 'urur', 'rruu', 'uurr', 'urru', 'ruur']
My problem is that when the inputs are integers larger than 10, the code takes about a minute to compute. How can I improve the compute time?
Thanks in advance!
Try itertools.combinations:
import itertools
RIGHT = "r"
UP = "u"
def up_and_right(n, k):
    out = []
    for right_idxs in itertools.combinations(range(n + k), r=n):
        s = ""
        for idx in range(n + k):
            if idx in right_idxs:
                s += RIGHT
            else:
                s += UP
        out.append(s)
    return out

print(up_and_right(2, 2))
Prints:
['rruu', 'ruru', 'ruur', 'urru', 'urur', 'uurr']
With one-liner:
def up_and_right(n, k):
    return [
        "".join(RIGHT if idx in right_idxs else UP for idx in range(n + k))
        for right_idxs in itertools.combinations(range(n + k), r=n)
    ]
Your problem is about finding permutations of multisets. sympy has a utility method that deals with it.
from sympy.utilities.iterables import multiset_permutations
def up_and_right2(n, k):
    case = RIGHT*n + UP*k
    return list(map(''.join, multiset_permutations(case)))
Test:
y = up_and_right(n,k)
x = up_and_right2(n,k)
assert len(x) == len(y) and set(y) == set(x)
Timings:
n = 5
k = 6
%timeit up_and_right(n,k)
3.57 s ± 52.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit up_and_right2(n,k)
4.17 ms ± 159 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
I am computing the binomial coefficient for given values of n and k (nCk), using numpy to multiply the results of a for loop, but the numpy method is returning the generator's memory location instead of the result.
Please suggest a better solution in terms of time complexity if possible, or any other suggestions.
import time
import numpy
def binomialc(n, k):
    return 1 if k == 0 or k == n else numpy.prod((n+1-i)/i for i in range(1, k+1))
starttime=time.perf_counter()
print(binomialc(600,298))
print(time.perf_counter()-starttime)
You may want to use: scipy.special.binom()
or, since Python 3.8: math.comb()
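For example (a minimal usage sketch, assuming SciPy is installed and Python >= 3.8):

import math
from scipy.special import binom

print(binom(600, 298))      # fast float approximation, about 1.333e+179
print(math.comb(600, 298))  # exact arbitrary-precision integer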
EDIT
I am not quite sure why you would not want to use SciPy but you are OK with NumPy, as SciPy is a well-established library from essentially the same folks developing NumPy.
Anyway, here are a couple of other methods:
using math.factorial:
import math
def binom(n, k):
    return math.factorial(n) // math.factorial(k) // math.factorial(n - k)
using prod() and math.factorial() (theoretically more efficient, but not in practice):
def prod(items, start=1):
    for item in items:
        start *= item
    return start

def binom_simplified(n, k):
    if k > n - k:
        return prod(range(k + 1, n + 1)) // math.factorial(n - k)
    else:
        return prod(range(n - k + 1, n + 1)) // math.factorial(k)
using numpy.prod() (note the list comprehension: passing a generator, as in the question, makes numpy.prod() return the generator object itself rather than the product):

import numpy as np

def binom_np(n, k):
    return 1 if k == 0 or k == n else np.prod([(n + 1 - i) / i for i in range(1, k + 1)])
Speed-wise, scipy.special.binom() is the fastest by far, but if you need the exact value even for very large numbers, you may prefer binom() (somewhat surprisingly, even over math.comb()).
%timeit scipy.special.binom(600, 298)
# 1000000 loops, best of 3: 1.56 µs per loop
print(scipy.special.binom(600, 298))
# 1.3332140543730587e+179
%timeit math.comb(600, 298)
# 10000 loops, best of 3: 75.6 µs per loop
print(math.comb(600, 298))
# 133321405437268991724586879878020905773601074858558174180536459530557427686938822154484588609548964189291743543415057988154692680263088796451884071926401665548516571367537285901600
%timeit binom(600, 298)
# 10000 loops, best of 3: 36.5 µs per loop
print(binom(600, 298))
# 133321405437268991724586879878020905773601074858558174180536459530557427686938822154484588609548964189291743543415057988154692680263088796451884071926401665548516571367537285901600
%timeit binom_np(600, 298)
# 10000 loops, best of 3: 45.8 µs per loop
print(binom_np(600, 298))
# 1.3332140543726893e+179
%timeit binom_simplified(600, 298)
# 10000 loops, best of 3: 41.9 µs per loop
print(binom_simplified(600, 298))
# 133321405437268991724586879878020905773601074858558174180536459530557427686938822154484588609548964189291743543415057988154692680263088796451884071926401665548516571367537285901600
I have to solve a lot of linear systems using the Scipy pivoted QR-decomposition.
Q, R, perm = scipy.linalg.qr(PW, pivoting=True, mode='full')
While solving the system I reorder the solution using a permutation matrix, which I build with the function below.
def pvec2pmat(vec):
    n = len(vec)
    P = np.zeros((n, n))
    counter = 0
    for i in range(0, n):
        for j in range(0, n):
            if j == vec[counter]:
                P[i, j] = 1.0
                counter = counter + 1
                break
    return P.T
Unfortunately, this turns out to be very slow and the code spend a lot of time generating these matrices.
Is it possible to speedup this function?
It is very hard to answer your question, as I do not know what your code is supposed to do. Furthermore, there seems to be no connection to the title. If I understand you correctly, you are asking me to optimize the given function, without my even knowing what the input is.
I will assume that vec is expected to be a 1-dimensional integer array. In this case your second loop is quite unnecessary. There are two cases:
vec[counter] in range(0, n)
In this case you set P[i, vec[counter]] to one and increase the counter (due to the break statement).
vec[counter] not in range(0, n)
The counter will never be increased, as the if statement will never be True. Thus, we ignore the rest of vec and return the matrix.
Therefore, a first simplification would be:
def pvec2pmat(vec):
    n = len(vec)
    P = np.zeros((n, n))
    counter = 0
    for i in range(0, n):
        if vec[counter] in range(0, n):
            P[i, vec[counter]] = 1.0
            counter += 1
        else:
            return P.T
    return P.T
So the relevant part of vec only extends up to the first value that is not in range(0, n). We can check this right at the beginning and discard the rest.
We can do this using
invalid = (vec < 0) | (vec >= len(vec))
try:
    first_invalid = np.flatnonzero(invalid)[0]
except IndexError:  # no invalid values
    pass
else:
    vec = vec[:first_invalid]  # keep only up to the first invalid entry
Now we know that we assign exactly one value in each of the first vec.size rows.
So we can simplify the loop
for i, vec_val in enumerate(vec):
    P[i, vec_val] = 1
This can however also be done using indexing:
P[np.arange(vec.size), vec] = 1
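For instance, with a small hypothetical permutation vector this assignment gives:

vec = np.array([2, 0, 1])   # hypothetical example values
P = np.zeros((3, 3))
P[np.arange(vec.size), vec] = 1
# P is now:
# [[0. 0. 1.]
#  [1. 0. 0.]
#  [0. 1. 0.]]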
Finally, we realize that instead of taking the transpose, we can just assign with the indices swapped and get:
def pvec2pmat(vec):
    n = len(vec)
    P = np.zeros((n, n))
    invalid = (vec < 0) | (vec >= n)
    try:
        first_invalid = np.flatnonzero(invalid)[0]
    except IndexError:  # no invalid values
        pass
    else:
        vec = vec[:first_invalid]  # keep only up to the first invalid entry
    P[vec, np.arange(vec.size)] = 1
    return P
A quick timing:
vec = np.arange(1000)
np.random.shuffle(vec)
# your version
128 ms ± 2.66 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# my version
%timeit pvec2pmat(vec)
379 µs ± 22.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# for this simple example of a permutation, we can of course resort to
# simple indexing
%timeit np.eye(vec.size)[:, vec]
8.89 ms ± 13.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
For this particular example, my code takes about 1/300 of the time of your version. Depending on your needs, another large speedup can be achieved if P is chosen as a boolean matrix and we assign True.
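A minimal sketch of that boolean variant (the function name is mine, not from the original code):

def pvec2pmat_bool(vec):
    # same index trick as above, but with a bool array (1 byte per entry) instead of float64
    P = np.zeros((vec.size, vec.size), dtype=bool)
    P[vec, np.arange(vec.size)] = True
    return P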
I have written two functions to calculate combinations. The first one uses a for loop, the other one uses a recursive factorial function. Why is the first faster than the second?
def combinations(n: int, k: int) -> int:
    # Collection >= Selection
    if n < k:
        raise ValueError(
            "The size of the collection we are selecting items from must be "
            "larger than the size of the selection."
        )
    # Sizes > 0
    if n < 0 or k < 0:
        raise ValueError(
            "Cannot work with negative integers."
        )
    # Compute with standard python only
    numerator = 1
    for i in range(n + 1 - k, n+1):
        numerator *= i
    denominator = 1
    for i in range(1, k+1):
        denominator *= i
    return int(numerator / denominator)
The second function needs a factorial function defined as:
def factorial(n: int) -> int:
    if n < 0:
        raise ValueError(
            "Cannot calculate factorial of a negative number."
        )
    # Recursive function down to n = 0
    return n * factorial(n - 1) if n - 1 >= 0 else 1
And it is defined as:
def combinations2(n: int, k: int) -> int:
    # Collection >= Selection
    if n < k:
        raise ValueError(
            "The size of the collection we are selecting items from must be "
            "larger than the size of the selection."
        )
    return int(factorial(n) / (factorial(k) * factorial(n - k)))
When I run the following test in the IPython console, it is clear which one is faster:
%timeit combinations(1000, 50)
16.2 µs ± 1.95 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
and
%timeit combinations2(1000, 50)
1.6 ms ± 129 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
NEW VERSION OF COMBINATIONS2
Okay, following the comments, I agree that combinations2 is doing many more operations. So I rewrote both the factorial and combinations2 functions; here are their new versions:
def factorial(n: int, lower: int = -1) -> int:
    # n > 0
    if n < 0:
        raise ValueError(
            "Cannot calculate factorial of a negative number."
        )
    # Recursive function down to n = 0 or down to the lower bound
    if n - 1 >= 0 and n - 1 >= lower:
        return n * factorial(n - 1, lower)
    return 1
which now can have a lower bound. Notice that in general factorial(a, b) = factorial(a) / factorial(b). Also, here is the new version of the combinations2 function:
def combinations2(n: int, k: int) -> int:
    if n < k:
        raise ValueError(
            "The size of the collection we are selecting items from must be "
            "larger than the size of the selection."
        )
    return int(factorial(n, n - k) / factorial(k))
But again, this is their comparison:
%timeit combinations(100, 50)
10.5 µs ± 1.67 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit combinations2(100, 50)
56.1 µs ± 5.79 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Just count the number of operations:
In combinations you are making (n+1) - (n+1-k) multiplications for numerator, and (k+1) - 1 multiplications for denominator.
Total: 2k multiplications
In combinations2 you are making n + k + (n-k) multiplications, i.e. 2n multiplications.
And you are also making 2n function calls for the recursion.
With k=50 and n=1000, no wonder why the first solution is faster.
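If you want to cut the work further, here is a sketch (my own variant, not from the question) that uses the multiplicative formula with only min(k, n-k) multiplications and as many exact integer divisions, keeping the intermediate numbers small:

import math

def combinations3(n: int, k: int) -> int:
    k = min(k, n - k)                      # C(n, k) == C(n, n - k)
    result = 1
    for i in range(1, k + 1):
        # the running product is always C(n - k + i, i), so the division is exact
        result = result * (n - k + i) // i
    return result

print(combinations3(1000, 50) == math.comb(1000, 50))  # True on Python >= 3.8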
I am looking to memory optimise np.packbits(A==A[:, None], axis=1), where A is dense array of integers of length n. A==A[:, None] is memory hungry for large n since the resulting Boolean array is stored inefficiently with each Boolean value costing 1 byte.
I wrote the below script to achieve the same result while packing bits one section at a time. It is, however, around 3x slower, so I am looking for ways to speed it up. Or, alternatively, a better algorithm with small memory overhead.
Note: this is a follow-up question to one I asked earlier; Comparing numpy array with itself by element efficiently.
Reproducible code below for benchmarking.
import numpy as np
from numba import jit
@jit(nopython=True)
def bool2int(x):
    y = 0
    for i, j in enumerate(x):
        if j: y += int(j) << (7 - i)
    return y

@jit(nopython=True)
def compare_elementwise(arr, result, section):
    n = len(arr)
    for row in range(n):
        for col in range(n):
            section[col % 8] = arr[row] == arr[col]
            if ((col + 1) % 8 == 0) or (col == (n - 1)):
                result[row, col // 8] = bool2int(section)
                section[:] = 0
    return result
n = 10000
A = np.random.randint(0, 1000, n)
result_arr = np.zeros((n, n // 8 if n % 8 == 0 else n // 8 + 1)).astype(np.uint8)
selection_arr = np.zeros(8).astype(np.uint8)
# memory efficient version, but slow
packed = compare_elementwise(A, result_arr, selection_arr)
# memory inefficient version, but fast
packed2 = np.packbits(A == A[:, None], axis=1)
assert (packed == packed2).all()
%timeit compare_elementwise(A, result_arr, selection_arr) # 1.6 seconds
%timeit np.packbits(A == A[:, None], axis=1) # 0.460 second
Here is a solution 3 times faster than the numpy one (a.size must be a multiple of 8; see below) :
import numba as nb

@nb.njit
def comp(a):
    res = np.zeros((a.size, a.size // 8), np.uint8)
    for i, x in enumerate(a):
        for j, y in enumerate(a):
            if x == y: res[i, j // 8] |= 128 >> j % 8
    return res
This works because the array is scanned in a single pass, where your version does it many times, and almost all terms are null.
In [122]: %timeit np.packbits(A == A[:, None], axis=1)
389 ms ± 57.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [123]: %timeit comp(A)
123 ms ± 24.4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
If a.size % 8 > 0, the cost of recovering the information will be higher. The best approach in this case is to pad the initial array with up to 7 zeros.
For completeness, the padding could be done like so:
if A.size % 8 != 0: A = np.pad(A, (0, 8 - A.size % 8), 'constant', constant_values=0)