Binomial coefficient for given values of n and k (nCk)
I am using numpy to multiply the results of a for loop, but the numpy call is returning a memory location instead of the result.
Please suggest a better solution in terms of time complexity if possible, or any other improvements.
import time
import numpy

def binomialc(n, k):
    return 1 if k == 0 or k == n else numpy.prod((n + 1 - i) / i for i in range(1, k + 1))

starttime = time.perf_counter()
print(binomialc(600, 298))
print(time.perf_counter() - starttime)
You may want to use: scipy.special.binom()
or, since Python 3.8: math.comb()
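(The memory location you are seeing is the repr of the generator expression: numpy.prod() does not evaluate a generator, it wraps it in a 0-d object array, so the generator object itself comes back instead of a number.)
A minimal usage sketch of the two suggestions, assuming SciPy is installed:
import math
from scipy.special import binom

print(binom(600, 298))      # fast floating-point approximation
print(math.comb(600, 298))  # exact arbitrary-precision integer, Python >= 3.8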
EDIT
I am not quite sure why you would not want to use SciPy but you are OK with NumPy, as SciPy is a well-established library from essentially the same folks developing NumPy.
Anyway, here are a couple of other methods:
using math.factorial:
import math

def binom(n, k):
    return math.factorial(n) // math.factorial(k) // math.factorial(n - k)
using prod() and math.factorial() (theoretically more efficient, but not in practice):
def prod(items, start=1):
    for item in items:
        start *= item
    return start

def binom_simplified(n, k):
    if k > n - k:
        return prod(range(k + 1, n + 1)) // math.factorial(n - k)
    else:
        return prod(range(n - k + 1, n + 1)) // math.factorial(k)
using numpy.prod():
import numpy as np

def binom_np(n, k):
    return 1 if k == 0 or k == n else np.prod([(n + 1 - i) / i for i in range(1, k + 1)])
Speed-wise, scipy.special.binom() is by far the fastest, but if you also need the exact value for very large numbers, you may prefer binom() (somewhat surprisingly, even over math.comb()).
%timeit scipy.special.binom(600, 298)
# 1000000 loops, best of 3: 1.56 µs per loop
print(scipy.special.binom(600, 298))
# 1.3332140543730587e+179
%timeit math.comb(600, 298)
# 10000 loops, best of 3: 75.6 µs per loop
print(math.comb(600, 298))
# 133321405437268991724586879878020905773601074858558174180536459530557427686938822154484588609548964189291743543415057988154692680263088796451884071926401665548516571367537285901600
%timeit binom(600, 298)
# 10000 loops, best of 3: 36.5 µs per loop
print(binom(600, 298))
# 133321405437268991724586879878020905773601074858558174180536459530557427686938822154484588609548964189291743543415057988154692680263088796451884071926401665548516571367537285901600
%timeit binom_np(600, 298)
# 10000 loops, best of 3: 45.8 µs per loop
print(binom_np(600, 298))
# 1.3332140543726893e+179
%timeit binom_simplified(600, 298)
# 10000 loops, best of 3: 41.9 µs per loop
print(binom_simplified(600, 298))
# 133321405437268991724586879878020905773601074858558174180536459530557427686938822154484588609548964189291743543415057988154692680263088796451884071926401665548516571367537285901600
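If you want the exact integer without computing full factorials at all, the standard multiplicative formula is another option (a minimal sketch; binom_exact is my own name, and it does O(min(k, n - k)) multiplications on increasingly large integers):
def binom_exact(n, k):
    # Multiplicative formula; every intermediate value is itself a binomial
    # coefficient, so the floor division below is always exact.
    k = min(k, n - k)
    result = 1
    for i in range(1, k + 1):
        result = result * (n - k + i) // i
    return result

print(binom_exact(600, 298) == binom(600, 298))  # expected: True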
Related
I have written the following code, which does what it is supposed to do and passes the tests within the time and memory limits. However, it takes 90% of the time limit. Is there any way to speed it up?
Secondly, I have seen other solutions that seem more straightforward and do not build a list of the minimum number of operations for every integer up to n. Isn't it true that in DP we are supposed to do that? In other words, aren't we supposed to always build a table bottom-up?
Lastly, how can I make the code more readable?
# Use Python 3
""" You're given a calculator with only 3 operations: (-1, //2, //3).
Find the minimum number of operations and the sequence of numbers to
go from 1 to n. """
import sys

input = sys.stdin.read()
n = int(input)

def operations(n):
    """
    :param n: integer
    :return: The list of the minimum number of operations to reduce n to 1
    for each integer up to n. """
    lst = [0] * n
    for index in range(1, n):
        nodes = [1 + lst[index - 1]]
        for k in (2, 3):
            if (index + 1) % k == 0:
                nodes.append(1 + lst[((index + 1) // k) - 1])
        lst[index] = sorted(nodes)[0]
    return lst

master_sequence = list(enumerate(operations(n), 1))
end = master_sequence[-1]
minimum_operations = end[1]
sequence = []
while end != (1, 0):
    step = [item[0] for item in master_sequence if
            (end[1] - item[1]) == 1 and (end[0] - item[0] == 1 or end[0] %
            item[0] == 0)][0]
    sequence.append(step)
    end = master_sequence[step - 1]

print(minimum_operations)
for s in sequence[::-1]:
    print(s, end=' ')
print(n)
DP just means using sub-problem results to reduce time/space complexity, so it often builds a table bottom-up, but that does not necessarily mean computing every value. Note: you could also solve this problem using a heap search (a rough sketch follows), which wouldn't visit every node; I would imagine it is pretty close to this in terms of timing and presumably uses less space.
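An illustration of that heap-search idea (a minimal sketch; min_ops_heap is my own name, and since every operation costs 1 step this is effectively a best-first/BFS traversal working backwards from n):
import heapq

def min_ops_heap(n):
    # Work backwards from n; each of the three operations costs one step.
    heap = [(0, n, (n,))]          # (operations so far, current value, path)
    seen = set()
    while heap:
        ops, value, path = heapq.heappop(heap)
        if value == 1:
            return ops, path[::-1]
        if value in seen:
            continue
        seen.add(value)
        candidates = [value - 1]
        if value % 2 == 0:
            candidates.append(value // 2)
        if value % 3 == 0:
            candidates.append(value // 3)
        for nxt in candidates:
            if nxt not in seen:
                heapq.heappush(heap, (ops + 1, nxt, path + (nxt,)))
min_ops_heap(10) returns (3, (1, 3, 9, 10)), matching the DP result below.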
A shorter approach using DP to the same result:
In []:
n = 10

# Define the operations and their condition for application:
ops = [(lambda x: True, lambda x: x - 1),
       (lambda x: x % 2 == 0, lambda x: x // 2),
       (lambda x: x % 3 == 0, lambda x: x // 3)]

# Construct the operations count for all values up to `n`
min_ops = [0] * (n + 1)
for i in range(2, n + 1):
    min_ops[i] = min(min_ops[op(i)] for cond, op in ops if cond(i)) + 1

# Reconstruct the path
r = []
while n:
    r.append(n)
    n = min((op(n) for cond, op in ops if cond(n)), key=min_ops.__getitem__)

len(r) - 1, r[::-1]

Out[]:
(3, [1, 3, 9, 10])
Some quick timings for different n:
10: 22 µs ± 577 ns per loop
1000: 1.48 ms ± 12.3 µs per loop
10000: 15.3 ms ± 325 µs per loop
100000: 159 ms ± 2.81 ms per loop
When I ran your code I got:
10: 15.7 µs ± 229 ns per loop
1000: 4.55 ms ± 318 µs per loop
10000: 27.1 ms ± 896 µs per loop
100000: 315 ms ± 7.13 ms per loop
Consider two ndarrays of length n, arr1 and arr2. I'm computing the following sum of products, and doing it num_runs times to benchmark:
import numpy as np
import time

num_runs = 1000
n = 100
arr1 = np.random.rand(n)
arr2 = np.random.rand(n)

start_comp = time.clock()
for r in xrange(num_runs):
    sum_prods = np.sum( [arr1[i]*arr2[j] for i in xrange(n)
                         for j in xrange(i+1, n)] )
print "total time for comprehension = ", time.clock() - start_comp

start_loop = time.clock()
for r in xrange(num_runs):
    sum_prod = 0.0
    for i in xrange(n):
        for j in xrange(i+1, n):
            sum_prod += arr1[i]*arr2[j]
print "total time for loop = ", time.clock() - start_loop
The output is
total time for comprehension = 3.23097066953
total time for loop = 3.9045544426
so using a list comprehension appears faster.
Is there a much more efficient implementation, using Numpy routines perhaps, to calculate such a sum of products?
Rearrange the operation into an O(n) runtime algorithm instead of O(n^2), and take advantage of NumPy for the products and sums:
# arr1_weights[i] is the sum of all terms arr1[i] gets multiplied by in the
# original version
arr1_weights = arr2[::-1].cumsum()[::-1] - arr2
sum_prods = arr1.dot(arr1_weights)
Timing shows this to be about 200 times faster than the list comprehension for n == 100.
In [21]: %%timeit
....: np.sum([arr1[i] * arr2[j] for i in range(n) for j in range(i+1, n)])
....:
100 loops, best of 3: 5.13 ms per loop
In [22]: %%timeit
....: arr1_weights = arr2[::-1].cumsum()[::-1] - arr2
....: sum_prods = arr1.dot(arr1_weights)
....:
10000 loops, best of 3: 22.8 µs per loop
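A quick sanity check of the rearrangement against the brute-force double loop (a minimal sketch, reusing arr1, arr2 and n from the question):
brute = sum(arr1[i] * arr2[j] for i in range(n) for j in range(i + 1, n))
# Suffix sums of arr2, excluding arr2[i] itself, i.e. the sum of arr2[j] for j > i.
arr1_weights = arr2[::-1].cumsum()[::-1] - arr2
fast = arr1.dot(arr1_weights)
print(np.allclose(brute, fast))  # expected: True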
A vectorized way: np.sum(np.triu(np.multiply.outer(arr1, arr2), 1)), for a 30x improvement:
In [9]: %timeit np.sum(np.triu(np.multiply.outer(arr1,arr2),1))
1000 loops, best of 3: 272 µs per loop
In [10]: %timeit np.sum( [arr1[i]*arr2[j] for i in range(n)
   ....:                  for j in range(i+1, n)] )
100 loops, best of 3: 7.9 ms per loop
In [11]: np.allclose(np.sum(np.triu(np.multiply.outer(arr1,arr2),1)),
   ....:             np.sum([arr1[i]*arr2[j] for i in range(n) for j in range(i+1, n)]))
Out[11]: True
Another fast approach is to use numba:
from numba import jit

@jit
def t(arr1, arr2):
    s = 0
    for i in range(n):
        for j in range(i+1, n):
            s += arr1[i]*arr2[j]
    return s
for another 10x factor:
In [12]: %timeit t(arr1,arr2)
10000 loops, best of 3: 21.1 µs per loop
And using @user2357112's minimal rearrangement:
@jit
def t2357112(arr1, arr2):
    s = 0
    c = 0
    for i in range(n-2, -1, -1):
        c += arr2[i+1]
        s += arr1[i]*c
    return s
which gives
In [13]: %timeit t2357112(arr1,arr2)
100000 loops, best of 3: 2.33 µs per loop
by doing only the necessary operations.
You can use the following broadcasting trick:
a = np.sum(np.triu(arr1[:,None]*arr2[None,:],1))
b = np.sum( [arr1[i]*arr2[j] for i in xrange(n) for j in xrange(i+1, n)] )
print a == b # True
Basically, I'm paying the price of computing the products of all element pairs of arr1 and arr2 in order to take advantage of numpy broadcasting/vectorization, which runs much faster in low-level code.
And timings:
%timeit np.sum(np.triu(arr1[:,None]*arr2[None,:],1))
10000 loops, best of 3: 55.9 µs per loop
%timeit np.sum( [arr1[i]*arr2[j] for i in xrange(n) for j in xrange(i+1, n)] )
1000 loops, best of 3: 1.45 ms per loop
I profiled my program, and more than 80% of the time is spent in this one-line function! How can I optimize it? I am running with PyPy, so I'd rather not use NumPy, but since my program is spending almost all of its time there, I think giving up PyPy for NumPy might be worth it. However, I would prefer to use CFFI, since that is more compatible with PyPy.
# x, y are lists of 1s and 0s. c_out is a positive int. bit is 1 or 0.
def findCarryIn(x, y, c_out, bit):
    return (2 * c_out +
            bit -
            sum(map(lambda x_bit, y_bit: x_bit & y_bit, x, reversed(y))))  # note this is basically a dot product
Without using NumPy, after testing with timeit, the fastest method for the sum you are doing seems to be a simple for loop over the elements. Example -
def findCarryIn(x, y, c_out, bit):
    s = 0
    for i, j in zip(x, reversed(y)):
        s += i & j
    return (2 * c_out + bit - s)
Though this did not increase the performance by a lot (maybe 20% or so).
The results of timing tests (with different methods; func4 contains the method described above) -
def func1(x, y):
    return sum(map(lambda x_bit, y_bit: x_bit & y_bit, x, reversed(y)))

def func2(x, y):
    return sum([i & j for i, j in zip(x, reversed(y))])

def func3(x, y):
    return sum(x[i] & y[-1-i] for i in range(min(len(x), len(y))))

def func4(x, y):
    s = 0
    for i, j in zip(x, reversed(y)):
        s += i & j
    return s
In [125]: %timeit func1(x,y)
100000 loops, best of 3: 3.02 µs per loop
In [126]: %timeit func2(x,y)
The slowest run took 6.42 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 2.9 µs per loop
In [127]: %timeit func3(x,y)
100000 loops, best of 3: 4.31 µs per loop
In [128]: %timeit func4(x,y)
100000 loops, best of 3: 2.2 µs per loop
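Another pure-Python angle worth sketching (my own suggestion, not benchmarked here, and it assumes x and y have equal length): pack the bit lists into integers once, after which the whole dot product collapses to one bitwise AND plus a popcount.
def findCarryIn_packed(x, y, c_out, bit):
    # Pack x and reversed(y) into integers (most significant bit first) so that
    # corresponding positions line up, then count the overlapping 1 bits.
    xi = int(''.join(map(str, x)), 2)
    yi = int(''.join(map(str, reversed(y))), 2)
    return 2 * c_out + bit - bin(xi & yi).count('1')
If the packing has to be redone on every call it may not pay off; it helps mainly when the packed integers can be reused across many calls.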
This can for sure be sped up a lot using numpy (with x and y as arrays). You could define your function something like this:
import numpy as np

def find_carry_numpy(x, y, c_out, bit):
    return 2 * c_out + bit - np.sum(x & y[::-1])
Create some random data:
In [36]: n = 100; c = 15; bit = 1
In [37]: x_arr = np.random.rand(n) > 0.5
In [38]: y_arr = np.random.rand(n) > 0.5
In [39]: x_list = list(x_arr)
In [40]: y_list = list(y_arr)
Check that results are the same:
In [42]: find_carry_numpy(x_arr, y_arr, c, bit)
Out[42]: 10
In [43]: findCarryIn(x_list, y_list, c, bit)
Out[43]: 10
Quick speed test:
In [44]: timeit find_carry_numpy(x_arr, y_arr, c, bit)
10000 loops, best of 3: 19.6 µs per loop
In [45]: timeit findCarryIn(x_list, y_list, c, bit)
1000 loops, best of 3: 409 µs per loop
So you gain a factor of 20 in speed! That is a pretty typical speedup when converting Python code to Numpy.
Using sympy I have two lists:
terms = [1, x, x*(x-1)]
coefficients = [-1,8.1,7]
I need to get the output:
-1 + 8.1*x + 7*x*(x-1)
I tried:
print (sum(a,x__i) for a, x__i in izip(terminos,coeficientes))
But I actually got a generator. I based my attempt on the following working code:
def lag_l(xx, j):
    x = Symbol("x")
    parts = ((x - x_i) / (xx[j] - x_i) for i, x_i in enumerate(xx) if i != j)
    return prod(parts)

def lag_L(xx, yy):
    return sum(y*lag_l(xx, j) for j, y in enumerate(yy))
How can I complete this?
In [159]: import sympy as sy
In [160]: from sympy.abc import x
In [161]: terms = [1, x, x*(x-1)]
In [162]: coefficients = [-1,8.1,7]
In [163]: sum(t*c for t, c in zip(terms, coefficients))
Out[163]: 7*x*(x - 1) + 8.1*x - 1
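If you also want the expanded polynomial form, sympy.expand() gives it (a small aside, assuming the same terms and coefficients as above):
expanded = sy.expand(sum(t*c for t, c in zip(terms, coefficients)))
# expanded is the quadratic 7*x**2 + 1.1*x - 1 (the float coefficient may be
# displayed with more digits, depending on the printer)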
Interestingly, sum(term*coef for term, coef in zip(terms, coefficients)) is a bit faster than sum(coef * term for coef, term in zip(coefficients, terms)):
In [182]: %timeit sum(term * coef for term, coef in zip(terms, coefficients))
10000 loops, best of 3: 34.1 µs per loop
In [183]: %timeit sum(coef * term for coef, term in zip(coefficients, terms))
10000 loops, best of 3: 38.7 µs per loop
The reason for this is because coef * term calls coef.__mul__(term) which then has to call term.__rmul__(coef) since ints do not know how to multiply with sympy Symbols. That extra function call makes coef * term slower than term * coef. (term * coef calls term.__mul__(coef) directly.)
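A tiny illustration of that dispatch (a sketch):
from sympy.abc import x

print((1).__mul__(x))  # NotImplemented: int does not know how to multiply a Symbol
print(x.__rmul__(1))   # x: Python therefore falls back to Symbol.__rmul__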
Here are some more microbenchmarks:
In [178]: %timeit sum(IT.imap(op.mul, coefficients, terms))
10000 loops, best of 3: 38 µs per loop
In [186]: %timeit sum(IT.imap(op.mul, terms, coefficients))
10000 loops, best of 3: 32.8 µs per loop
In [179]: %timeit sum(map(op.mul, coefficients, terms))
10000 loops, best of 3: 38.5 µs per loop
In [188]: %timeit sum(map(op.mul, terms, coefficients))
10000 loops, best of 3: 33.3 µs per loop
Notice that the order of the terms and coefficients matters, but otherwise there is little time difference between these variants. For larger input, they also perform about the same:
In [203]: terms = [1, x, x*(x-1)] * 100000
In [204]: coefficients = [-1,8.1,7] * 100000
In [205]: %timeit sum(IT.imap(op.mul, terms, coefficients))
1 loops, best of 3: 3.63 s per loop
In [206]: %timeit sum(term * coef for term, coef in zip(terms, coefficients))
1 loops, best of 3: 3.63 s per loop
In [207]: %timeit sum(map(op.mul, terms, coefficients))
1 loops, best of 3: 3.48 s per loop
Also be aware that if you do not know (through profiling) that this operation is a critical bottleneck in your code, worrying about these slight differences is a waste of your time, since the time it takes to pre-optimize this stuff is far greater than the time you save while the code is running. As they say, premature optimization is the root of all evil. I'm probably already guilty of that.
In Python2,
sum(IT.imap(op.mul, coefficients, terms))
uses the least memory.
In Python 3, zip and map return iterators, so
sum(t*c for t, c in zip(terms, coefficients))
sum(map(op.mul, coefficients, terms))
would also be memory-efficient.
Using a simple generator expression:
sum(coef * term for coef, term in zip(coefficients, terms))
Alternatively, instead of using zip you want to use something similar to zip_with:
def zip_with(operation, *iterables):
    for elements in zip(*iterables):
        yield operation(*elements)
And use it as:
import operator as op
sum(zip_with(op.mul, coefficients, terms))
As unutbu mentioned, Python already provides such a function: itertools.imap in Python 2 and the built-in map in Python 3, so you can avoid re-writing it and use either:
sum(itertools.imap(op.mul, coefficients, terms))
or
sum(map(op.mul, coefficients, terms))  # Python 3
Python 2's map works slightly differently when you pass more than one sequence and the lengths differ (a short illustration follows).
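In Python 2 (a sketch; zip truncates, map pads):
import operator as op

zip([1, 2, 3], [10, 20])            # -> [(1, 10), (2, 20)]: zip truncates
# map(op.mul, [1, 2, 3], [10, 20])  # map pads the shorter sequence with None,
#                                   # so op.mul(3, None) raises a TypeError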
I have a large array and am looking for the index up to which the elements of the array add up to a number smaller than limit. I found two ways to do so:
import time as tm
import numpy as nm

# Data that we are working with
large = nm.array([3] * 8000)
limit = 23996

# Numpy version, hoping it would be faster
start = tm.time()                      # Start timing
left1 = nm.tril([large] * len(large))  # Build triangular matrix
left2 = nm.sum(left1, 1)               # Sum up all rows of the matrix
idx = nm.where(left2 >= limit)[0][0]   # Check what row exceeds the limit
stop = tm.time()
print "Numpy result :", idx
print "Numpy took :", stop - start, " seconds"

# Python loop
sm = 0  # dynamic sum of elements
start = tm.time()
for i in range(len(large)):
    sm += large[i]    # sum up elements one by one
    if sm >= limit:   # check if the sum exceeds the limit
        idx = i
        break         # If limit is reached, stop looping.
    else:
        idx = i
stop = tm.time()
print "Loop result :", idx
print "Loop took :", stop - start, " seconds"
Unfortunately, the numpy version runs out of memory if the array is much larger; by larger I mean 100,000 values. Of course, this gives a big matrix, but the for loop also takes 2 minutes to run through those 100,000 values. So, where is the bottleneck? How can I speed this code up?
You can get this with:
np.argmin(large.cumsum() < limit)
or equivalently
(large.cumsum() < limit).argmin()
(argmin() on a boolean array returns the index of the first False, i.e. the first position where the running sum reaches limit.)
In IPython:
In [6]: %timeit (large.cumsum() < limit).argmin()
10000 loops, best of 3: 33.8 µs per loop
for large with 100000 elements, and limit = 100000.0/2
In [4]: %timeit (large.cumsum() < limit).argmin()
1000 loops, best of 3: 444 µs per loop
It does not make any real difference, but it is conventional to import numpy as np rather than import numpy as nm.
Documentation:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.cumsum.html
http://docs.scipy.org/doc/numpy/reference/generated/numpy.argmin.html
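One caveat with the argmin() approach (a minimal sketch, reusing large and limit): if the total sum never reaches limit, the boolean array is all True and argmin() returns 0, whereas the loop in the question falls through to the last index. A small guard keeps the two behaviours consistent:
csum = large.cumsum()
if csum[-1] < limit:
    idx = len(large) - 1              # limit never reached; mirror the loop's fallback
else:
    idx = int(np.argmin(csum < limit))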
Using numba you can speed up the python loop significantly.
import numba
import numpy as np

def numpyloop(large, limit):
    return np.argmin(large.cumsum() < limit)

@numba.autojit
def pythonloop(large, limit):
    sm = 0
    idx = 0
    # for i in range(large.shape[0]):
    for i in range(len(large)):
        sm += large[i]   # sum up elements one by one
        if sm >= limit:  # check if the sum exceeds the limit
            idx = i
            break        # If limit is reached, stop looping.
        else:
            idx = i
    return idx

large = np.array([3] * 8000)
limit = 23996
%timeit pythonloop(large, limit)
%timeit numpyloop(large, limit)

large = np.array([3] * 100000)
limit = 100000 / 2
%timeit pythonloop(large, limit)
%timeit numpyloop(large, limit)
Small array:
Python: 100000 loops, best of 3: 6.63 µs per loop
Numpy: 10000 loops, best of 3: 33.2 µs per loop
Large array, small limit:
Python: 100000 loops, best of 3: 12.1 µs per loop
Numpy: 1000 loops, best of 3: 351 µs per loop