Is there a way to compute the Cobb-Douglas utility function faster in Python? I run it millions of times, so a speed increase would help. The function raises each element of quantities_list to the power of the corresponding element of exponents_list, and then multiplies all the resulting values.
import time
n = 10
quantities = range(n)
exponents = range(n)
def Cobb_Douglas(quantities_list, exponents_list):
    number_of_variables = len(quantities_list)
    value = 1
    for variable in xrange(number_of_variables):
        value *= quantities_list[variable] ** exponents_list[variable]
    return value
t0 = time.time()
for i in xrange(100000):
    Cobb_Douglas(quantities, exponents)
t1 = time.time()
print t1-t0
Iterators are your friend. I got a 28% speedup on my computer by switching your loop to this:
for q, e in itertools.izip(quantities_list, exponents_list):
    value *= q ** e
I also got similar results when switching your loop to a functools.reduce call, so it's not worth providing a code sample.
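For readers on Python 3, where itertools.izip and xrange no longer exist, the same idea might look like the sketch below. The names cobb_douglas_zip and cobb_douglas_reduce are made up for illustration, and the timings quoted above were not re-measured for this version.

```python
from functools import reduce

def cobb_douglas_zip(quantities, exponents):
    """Multiply q ** e over paired elements; zip is lazy in Python 3."""
    value = 1
    for q, e in zip(quantities, exponents):
        value *= q ** e
    return value

def cobb_douglas_reduce(quantities, exponents):
    """The same product written as a reduce over the zipped pairs."""
    return reduce(lambda acc, qe: acc * qe[0] ** qe[1],
                  zip(quantities, exponents), 1)

quantities = range(1, 5)   # 1, 2, 3, 4
exponents = range(1, 5)
print(cobb_douglas_zip(quantities, exponents))     # 1 * 4 * 27 * 256 = 27648
print(cobb_douglas_reduce(quantities, exponents))  # 27648
```

Whether either form beats the index-based loop on a given machine is something to measure rather than assume.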
In general, numpy is the right choice for fast arithmetic operations, but numpy's largest integer type is 64 bits, which won't hold the result for your example. If you're using a different numeric range or arithmetic type, numpy is king:
import numpy as np
quantities = np.array(quantities, dtype=np.int64)
exponents = np.array(exponents, dtype=np.int64)
def Cobb_Douglas(quantities_list, exponents_list):
    # np.prod is the current name; the np.product alias was removed in NumPy 2.0
    return np.prod(np.power(quantities_list, exponents_list))
# result: 2649120435010011136
# actual: 21577941222941856209168026828800000
Couple of suggestions:
Use Numpy
Vectorize your code
If quantities are large and nothing's going to be zero or negative, work in log-space.
I got about a 15% speedup locally using:
def np_Cobb_Douglas(quantities_list, exponents_list):
    return np.prod(np.power(quantities_list, exponents_list))
And about 40% using:
def np_log_Cobb_Douglas(quantities_list, exponents_list):
    # prod(q ** e) == exp(sum(e * log(q))), so only the quantities are logged
    return np.exp(np.dot(exponents_list, np.log(quantities_list)))
Last but not least, there should be some scaling of your Cobb-Douglas parameters so you don't run into overflow errors (if I'm remembering my intro macro correctly).
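To make the log-space and scaling remarks concrete, here is a minimal standard-library sketch. It assumes that "scaling" means rescaling the exponents so they sum to 1, which is one common Cobb-Douglas convention and may or may not be what was meant above; normalize and cobb_douglas_log are hypothetical names.

```python
import math

def cobb_douglas_log(quantities, exponents):
    """exp(sum(e * log(q))); every quantity must be strictly positive."""
    return math.exp(math.fsum(e * math.log(q)
                              for q, e in zip(quantities, exponents)))

def normalize(exponents):
    """Rescale exponents so that they sum to 1 (assumed convention)."""
    total = sum(exponents)
    return [e / total for e in exponents]

quantities = [2.0, 8.0]
exponents = normalize([1, 2])                    # [1/3, 2/3]
print(cobb_douglas_log(quantities, exponents))   # close to 2**(1/3) * 8**(2/3)
```

With normalized exponents the result stays on the same scale as the inputs, which keeps it well away from floating-point overflow.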
input_numbers=list(map(int,input().split()))
sum_number=0
def my_gen(a):
    i=0
    while i <= a:
        yield i
        i += 1
for i in my_gen(input_numbers[0]):
    sum_number += i**input_numbers[1]
print(sum_number%1000000009)
I tried it without a generator, but it was too slow, so I tried again with a generator and it was slow too.
How can I make this faster?
More information:
My scoring bot reports a timeout, and the constraints are
(1<=input_numbers[0]<=1,000,000,000)
(1<=input_numbers[1]<=50)
Also, NumPy can't be used.
You can use Faulhaber's formula which will only require a loop over the power value (rather than the billion numbers from 0 to N).
from fractions import Fraction
from functools import lru_cache
@lru_cache()
def bernoulli(n,result=True): # optimized version
    A = [Fraction(1,n+1)]
    for j,b in enumerate(bernoulli(n-1,False) if n else []):
        A.append((n-j)*(b-A[-1]))
    return A[-1] if result else A
@lru_cache()
def comb(n,r):
    return 1 if not r else comb(n,r-1)*(n-r+1)//r
def powerSum(N,P):
    result = sum(comb(P+1,j) * bernoulli(j) * N**(P+1-j) for j in range(P+1))
    return (result / (P+1)).numerator
output:
powerSum(100,3) # 25502500
sum(i**3 for i in range(100+1)) # 25502500 (proof)
powerSum(1000000000,50)%1000000009
# 265558322 in 0.016 seconds on my laptop
sum(i**50 for i in range(1000000000+1))%1000000009
# 265558322 (proof in 16.5 minutes)
The code is slow because you're taking large exponents of large numbers. But the final output doesn't require the full sum, just the modulo. So you can apply basic modular arithmetic to keep the numbers in your calculation smaller while getting the same final answer.
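A minimal sketch of that idea, using Python's built-in three-argument pow so that i**p is never built in full. The name power_sum_mod is made up for illustration. Note that this still loops over every i, so it only shrinks the cost per term; for N near one billion the loop count itself remains the bottleneck.

```python
MOD = 1_000_000_009

def power_sum_mod(n, p, mod=MOD):
    """Sum of i**p for i in 0..n, reduced modulo mod at every step."""
    total = 0
    for i in range(n + 1):
        # pow(i, p, mod) works on numbers below mod throughout
        total = (total + pow(i, p, mod)) % mod
    return total

print(power_sum_mod(100, 3))   # 25502500, same as sum(i**3 for i in range(101))
```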
This is the bad part of problem: https://projecteuler.net/problem=429
But the good part is to solve it yourself.
I tried a problem on Project Euler where I needed to find the sum of all the Fibonacci terms under 4 million. It took me a long time, and then I found out that I could use memoization, but it still seems to take a long time. After a lot of research, I found out that I can use lru_cache from the built-in functools module. My question is: why isn't my own memoization as fast as lru_cache?
Here's my code:
from functools import lru_cache
@lru_cache(maxsize=1000000)
def fibonacci_memo(input_value):
    global value
    fibonacci_cache = {}
    if input_value in fibonacci_cache:
        return fibonacci_cache[input_value]
    if input_value == 0:
        value = 1
    elif input_value == 1:
        value = 1
    elif input_value > 1:
        value = fibonacci_memo(input_value - 1) + fibonacci_memo(input_value - 2)
    fibonacci_cache[input_value] = value
    return value
def sumOfFib():
    SUM = 0
    for n in range(500):
        if fibonacci_memo(n) < 4000000:
            if fibonacci_memo(n) % 2 == 0:
                SUM += fibonacci_memo(n)
    return SUM
print(sumOfFib())
The code works by the way. It takes less than a second to run it when I use the lru_cache module.
The other answer is the correct way to calculate the fibonacci sequence, indeed, but you should also know why your memoization wasn't working. To be specific:
fibonacci_cache = {}
This line being inside the function means you were emptying your cache every time fibonacci_memo was called.
You shouldn't be computing the Fibonacci sequence at all, not even by dynamic programming. Since the Fibonacci sequence satisfies a linear recurrence relation with constant coefficients and constant order, so does the sequence of its partial sums.
Definitely don't cache all the values. That consumes memory unnecessarily. When the recurrence has constant order, you only need to remember as many previous terms as the order of the recurrence.
Furthermore, there is a way to turn a recurrence of constant order into a system of recurrences of order one. The solution of the latter is given by a power of a matrix. This gives a faster algorithm for large values of n. Each step will be more expensive, though. So the best method would combine the two, choosing the first method for small values of n and the latter for large inputs.
O(n) using the recurrence for the sum
Denote S_n=F_0+F_1+...+F_n the sum of the first Fibonacci numbers F_0,F_1,...,F_n.
Observe that
S_{n+1}-S_n=F_{n+1}
S_{n+2}-S_{n+1}=F_{n+2}
S_{n+3}-S_{n+2}=F_{n+3}
Since F_{n+3}=F_{n+2}+F_{n+1} we get that S_{n+3}-S_{n+2}=S_{n+2}-S_n. So
S_{n+3}=2S_{n+2}-S_n
with the initial conditions S_0=F_0=1, S_1=F_0+F_1=1+1=2, and S_2=S_1+F_2=2+2=4.
One thing that you can do is compute S_n bottom up, remembering the values of only the previous three terms at each step. You don't need to remember all of the values of S_k, from k=0 to k=n. This gives you an O(n) algorithm with O(1) amount of memory.
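A minimal sketch of this bottom-up computation, keeping only three values at a time. The function name fib_sum is made up for illustration; the indexing follows the convention F_0 = F_1 = 1 used above.

```python
def fib_sum(n):
    """S_n = F_0 + ... + F_n with F_0 = F_1 = 1, via S_{k+3} = 2*S_{k+2} - S_k."""
    s0, s1, s2 = 1, 2, 4                      # S_0, S_1, S_2
    if n < 3:
        return (s0, s1, s2)[n]
    for _ in range(n - 2):
        s0, s1, s2 = s1, s2, 2 * s2 - s0      # slide the three-term window
    return s2

print([fib_sum(n) for n in range(6)])         # [1, 2, 4, 7, 12, 20]
```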
O(ln(n)) by matrix exponentiation
You can also get an O(ln(n)) algorithm in the following way:
Let X_n be the column vector with components S_{n+2},S_{n+1},S_{n}
Then the recurrence above becomes
X_{n+1}=AX_n
where A is the matrix
[
[2,0,-1],
[1,0,0],
[0,1,0],
]
Therefore, X_n=A^nX_0. We have X_0. To multiply by A^n we can do exponentiation by squaring.
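The following sketch spells this out with plain nested lists. The helper names mat_mul, mat_pow and fib_sum_matrix are my own. It uses O(ln(n)) matrix products, though each product gets more expensive as the integers grow.

```python
def mat_mul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def mat_pow(m, e):
    """m**e by repeated squaring."""
    result = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # identity matrix
    while e:
        if e & 1:
            result = mat_mul(result, m)
        m = mat_mul(m, m)
        e >>= 1
    return result

A = [[2, 0, -1], [1, 0, 0], [0, 1, 0]]
X0 = [4, 2, 1]                                   # S_2, S_1, S_0

def fib_sum_matrix(n):
    """S_n is the last component of X_n = A**n X_0."""
    row = mat_pow(A, n)[2]
    return sum(r * x for r, x in zip(row, X0))

print([fib_sum_matrix(n) for n in range(6)])     # [1, 2, 4, 7, 12, 20]
```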
For the sake of completeness here are implementations of the general ideas described in @NotDijkstra's answer plus my humble optimizations including the "closed form" solution implemented in integer arithmetic.
We can see that the "smart" methods are not only an order of magnitude faster but also seem to scale better, which is compatible with the fact (thanks @NotDijkstra) that Python big ints use a better-than-naive multiplication algorithm.
import numpy as np
import operator as op
from simple_benchmark import BenchmarkBuilder, MultiArgument
B = BenchmarkBuilder()
def pow(b,e,mul=op.mul,unit=1):
    if e == 0:
        return unit
    res = b
    for bit in bin(e)[3:]:
        res = mul(res,res)
        if bit=="1":
            res = mul(res,b)
    return res
def mul_fib(a,b):
    return (a[0]*b[0]+5*a[1]*b[1])>>1 , (a[0]*b[1]+a[1]*b[0])>>1
def fib_closed(n):
    return pow((1,1),n+1,mul_fib)[1]
def fib_mat(n):
    return pow(np.array([[1,1],[1,0]],'O'),n,op.matmul)[0,0]
def fib_sequential(n):
    t1,t2 = 1,1
    for i in range(n-1):
        t1,t2 = t2,t1+t2
    return t2
def sum_fib_direct(n):
    t1,t2,res = 1,1,1
    for i in range(n):
        t1,t2,res = t2,t1+t2,res+t2
    return res
def sum_fib(n,method="closed"):
    if method == "direct":
        return sum_fib_direct(n)
    return globals()[f"fib_{method}"](n+2)-1
methods = "closed mat sequential direct".split()
def f(method):
    def f(n):
        return sum_fib(n,method)
    f.__name__ = method
    return f
for method in methods:
    B.add_function(method)(f(method))
B.add_arguments('N')(lambda:(2*(1<<k,) for k in range(23)))
r = B.run()
r.plot()
import matplotlib.pylab as P
P.savefig('fib.png')
I am not sure how you are taking anything near a second. Here is the memoized version without fanciness:
class fibs(object):
    def __init__(self):
        self.thefibs = {0:0, 1:1}
    def __call__(self, n):
        if n not in self.thefibs:
            self.thefibs[n] = self(n-1)+self(n-2)
        return self.thefibs[n]
dog = fibs()
sum([dog(i) for i in range(40) if dog(i) < 4000000])
If I were to use the following function to compute a discrete Fourier transform, how would I show that the computation scales as O(N^2) as a function of vector length?
import numpy as np
def dft(y):
    N = len(y)
    c = np.zeros(N//2+1,complex)
    for k in range(N//2+1):
        for n in range(N):
            # the DFT sums over the samples y[n], not y[k]
            c[k] += y[n]*np.exp(-2j*np.pi*k*n/N)
    return c
From what I understand, if an algorithm scales as O(N^2), it is quadratic and the run time of the loops is proportional to the square of N. If N were doubled...then the run time would increase by N*N.
My first thought would be to run a program where I transform an array of values of length N, then double the number of values (doubling N) and show that the run time difference between these two is N^2. Does this make any sense (or is there a different/better way)? If so, how would I measure the run time in Python?
Thank you.
The runtime? You could just make a counter at the beginning and increase it by 1 each time something is done. So, inside your second for loop just increment the counter by 1, and when the program finishes print the counter. That would show the number of calculations needed.
count = 0
def dft(y):
    global count  # needed so the function updates the module-level counter
    N = len(y)
    c = np.zeros(N//2+1,complex)
    for k in range(N//2+1):
        for n in range(N):
            c[k] += y[n]*np.exp(-2j*np.pi*k*n/N)
            count+=1
    return c
dft(y)  # y is whatever input vector you are testing with
print(count)
Depending a little on what time you want to measure, you could use time.process_time (which I think is closest to what you want here: it measures the CPU time that your program actually got) or datetime.datetime.now. Older code used time.clock for this, but that function was deprecated in Python 3.3 and removed in Python 3.8.
You just get the time before and after your calculation is done. Something like:
import time
t0 = time.process_time()
dft(y)
t1 = time.process_time()
print("Time elapsed: {0}".format(t1-t0))
Note that what you're looking for when doubling N is a quadrupling of the time.
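The line computing the coefficients is repeated
(N/2 + 1) * N
times. Then you need to show there is a constant M and a value N_0 such that
(N/2 + 1) * N <= M * N^2
for all N >= N_0, that is, as N approaches infinity. Then you've shown that the running time is
O(N^2).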
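A quick numerical check of this argument for the loop structure of dft, as a sketch. The helper dft_operation_count is hypothetical and only mirrors the two nested loops; the choice M = 1 is one constant that happens to work, not the only one.

```python
def dft_operation_count(N):
    """Count how often the innermost line of dft() runs for an input of length N."""
    count = 0
    for k in range(N // 2 + 1):
        for n in range(N):
            count += 1
    return count

for N in (8, 16, 32, 64):
    ops = dft_operation_count(N)
    assert ops == (N // 2 + 1) * N      # matches the closed form
    assert ops <= N * N                 # M = 1 is enough here
    print(N, ops, ops / N**2)           # the ratio approaches 1/2 from above
```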
The timeit library is made for exactly this purpose.
https://docs.python.org/2/library/timeit.html
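from timeit import timeit
for i in [10, 100, 1000, ...]:
    y = generate_array(i)  # placeholder for however you build an input of length i
    # pass a callable so dft and y are visible, and limit the number of repetitions
    print(i, timeit(lambda: dft(y), number=10))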
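Putting the pieces together, a doubling experiment might look like the sketch below. To stay self-contained it uses a pure-Python stand-in for dft built on cmath rather than NumPy, and the test signal is arbitrary; both are assumptions of this example. For an O(N^2) algorithm the printed ratio should hover around 4, with some noise.

```python
import cmath
from timeit import timeit

def dft(y):
    """Pure-Python stand-in with the same double loop as the question's dft()."""
    N = len(y)
    c = [0j] * (N // 2 + 1)
    for k in range(N // 2 + 1):
        for n in range(N):
            c[k] += y[n] * cmath.exp(-2j * cmath.pi * k * n / N)
    return c

previous = None
for N in (64, 128, 256, 512):
    y = [float(i % 7) for i in range(N)]              # arbitrary test signal
    seconds = timeit(lambda: dft(y), number=3) / 3    # average over 3 runs
    ratio = "" if previous is None else f"  ratio to previous: {seconds / previous:.2f}"
    print(f"N={N:4d}  {seconds:.5f} s{ratio}")
    previous = seconds
```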
I know there's already a question similar to this, but I want to speed it up using GMPY2 (or something similar with GMP).
Here is my current code, it's decent but can it be better?
Edit: new code, checks divisors 2 and 3
import gmpy2
from gmpy2 import mpz
def factors(n):
    result = set()
    result |= {mpz(1), mpz(n)}
    def all_multiples(result, n, factor):
        z = mpz(n)
        while gmpy2.f_mod(mpz(z), factor) == 0:
            z = gmpy2.divexact(z, factor)
            result |= {mpz(factor), z}
        return result
    result = all_multiples(result, n, 2)
    result = all_multiples(result, n, 3)
    for i in range(1, gmpy2.isqrt(n) + 1, 6):
        i1 = mpz(i) + 1
        i2 = mpz(i) + 5
        div1, mod1 = gmpy2.f_divmod(n, i1)
        div2, mod2 = gmpy2.f_divmod(n, i2)
        if mod1 == 0:
            result |= {i1, div1}
        if mod2 == 0:
            result |= {i2, div2}
    return result
If it's possible, I'm also interested in an implementation with divisors only within n^(1/3) and 2^(2/3)*n^(1/3)
As an example, Mathematica's factor() is much faster than the Python code. I want to factor numbers between 20 and 50 decimal digits. I know ggnfs can factor these in less than 5 seconds.
I would also like to know whether any Python module implements fast factorization.
I just made some quick changes to your code to eliminate redundant name lookups. The algorithm is still the same but it is about twice as fast on my computer.
import gmpy2
from gmpy2 import mpz
def factors(n):
    result = set()
    n = mpz(n)
    for i in range(1, gmpy2.isqrt(n) + 1):
        div, mod = divmod(n, i)
        if not mod:
            result |= {mpz(i), div}
    return result
print(factors(12345678901234567))
Other suggestions will need more information about the size of the numbers, etc. For example, if you need all the possible factors, it may be faster to construct those from all the prime factors. That approach will let you decrease the limit of the range statement as you proceed and also will let you increment by 2 (after removing all the factors of 2).
Update 1
I've made some additional changes to your code. I don't think your all_multiples() function is correct. Your range() statement isn't optimal since 2 is checked again, but my first fix made it worse.
The new code delays computing the co-factor until it knows the remainder is 0. I also tried to use the built-in functions as much as possible. For example, mpz % integer is faster than gmpy2.f_mod(mpz, integer) or gmpy2.f_mod(integer, mpz) where integer is a normal Python integer.
import gmpy2
from gmpy2 import mpz, isqrt
def factors(n):
    n = mpz(n)
    result = set()
    result |= {mpz(1), n}
    def all_multiples(result, n, factor):
        z = n
        f = mpz(factor)
        while z % f == 0:
            result |= {f, z // f}
            f += factor
        return result
    result = all_multiples(result, n, 2)
    result = all_multiples(result, n, 3)
    for i in range(1, isqrt(n) + 1, 6):
        i1 = i + 1
        i2 = i + 5
        if not n % i1:
            result |= {mpz(i1), n // i1}
        if not n % i2:
            result |= {mpz(i2), n // i2}
    return result
print(factors(12345678901234567))
I would change your program to just find all the prime factors less than the square root of n and then construct all the co-factors later. Then you decrease n each time you find a factor, check if n is prime, and only look for more factors if n isn't prime.
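A rough sketch of that restructuring in plain Python integers (swap in mpz if you want the GMPY2 speedup). The names prime_factorization and divisors are made up for illustration, and the explicit primality check mentioned above is left out: here the loop simply stops once the candidate exceeds the square root of what is left of n.

```python
def prime_factorization(n):
    """Trial division that shrinks n as factors are found; returns {prime: exponent}."""
    factors = {}
    d = 2
    while d * d <= n:                  # the bound shrinks together with n
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1 if d == 2 else 2        # after 2, test odd candidates only
    if n > 1:                          # whatever is left over is prime
        factors[n] = factors.get(n, 0) + 1
    return factors

def divisors(n):
    """Build every divisor of n from its prime factorization."""
    result = [1]
    for p, exp in prime_factorization(n).items():
        result = [d * p**k for d in result for k in range(exp + 1)]
    return sorted(result)

print(divisors(360))   # 24 divisors, from 1 up to 360
```

The cost is then dominated by the second-largest prime factor rather than by the square root of the original n.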
Update 2
The pyecm module should be able to factor numbers of the size you are trying to factor. The following example completes in about a second.
>>> import pyecm
>>> list(pyecm.factors(12345678901234567890123456789012345678901, False, True, 10, 1))
[mpz(29), mpz(43), mpz(43), mpz(55202177), mpz(2928109491677), mpz(1424415039563189)]
There are several Python factoring modules on the Internet. But if you want to implement factoring yourself (without external libraries), I can suggest the Pollard-Rho algorithm, which is quite fast and very easy to implement. I implemented it fully in my code below; scroll down directly to the code (at the bottom of the answer) if you don't want to read the explanation.
With high probability the Pollard-Rho algorithm finds the smallest non-trivial factor P (not equal to 1 or N) in O(Sqrt(P)) time. By comparison, the Trial Division algorithm that you implemented in your question takes O(P) time to find the factor P. For example, if a prime factor is P = 1 000 003, trial division will find it after 1 000 003 division operations, while Pollard-Rho will on average find it after about 1 000 operations (Sqrt(1 000 003) ≈ 1 000), which is much faster.
To make Pollard-Rho much faster we need to detect prime numbers, so that we can exclude them from factoring instead of waiting for nothing. For that my code uses the Fermat Primality Test, which is very fast and easy to implement in just 7-9 lines of code.
The Pollard-Rho algorithm itself is very short, 13-15 lines of code; you can see it at the very bottom of my pollard_rho_factor() function. The remaining lines are supplementary helper functions.
I implemented all algorithms from scratch without extra libraries (except the random module). That's why you can see my gcd() function there, although you can use Python's built-in math.gcd() instead (which finds the Greatest Common Divisor).
The function Int() in my code is used just to convert Python integers to GMPY2 integers. GMPY2 ints make the algorithm faster, but you can use Python's int(x) instead. I didn't use any GMPY2-specific function; I just converted all ints to GMPY2 ints for around a 50% speedup.
As an example I factor the first 190 digits of Pi! It takes 3-15 seconds to factor them. The Pollard-Rho algorithm is randomized, hence it takes a different time to factor the same number on each run. You can restart the program and see that it prints a different running time.
Of course the factoring time depends greatly on the size of the prime divisors. Some 50-200 digit numbers can be factored within a fraction of a second, some will take months. My example, 190 digits of Pi, has quite small prime factors except the largest one, which is why it is fast. Other digits of Pi may not be that fast to factor. So the number of digits doesn't matter very much; only the size of the prime factors matters.
I intentionally implemented pollard_rho_factor() as one standalone function, without breaking it into smaller separate functions, although this breaks Python's style guide, which (as I remember) suggests avoiding nested functions and placing all functions at global scope. The style guide also suggests doing all imports at global scope in the first lines of the script. I used a single function intentionally so that it is easy to copy-paste and fully ready to use in your code. The Fermat primality test sub-function is_fermat_probable_prime() is also copy-pastable and works without extra dependencies.
In very rare cases the Pollard-Rho algorithm may fail to find a non-trivial prime factor, especially for very small factors. For example, you can replace n inside test() with the small number 4 and see that Pollard-Rho fails. For such small failed factors you can easily use the Trial Division algorithm that you implemented in your question.
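To see both the speed and the failure mode in isolation, here is a small deterministic sketch. It is not the code from this answer: it uses the textbook Floyd cycle-finding variant with a fixed start value, and rho_find_factor is a made-up name. It counts loop iterations so they can be compared with the roughly one million divisions trial division would need for a factor near 10**6.

```python
from math import gcd

def rho_find_factor(n, max_c=10):
    """Floyd-cycle Pollard rho; returns (factor, steps), or (None, steps) on failure."""
    steps = 0
    for c in range(1, max_c + 1):          # retry with another polynomial x*x + c
        x = y = 2
        d = 1
        while d == 1:
            x = (x * x + c) % n            # tortoise: one step
            y = (y * y + c) % n            # hare: two steps
            y = (y * y + c) % n
            d = gcd(abs(x - y), n)
            steps += 1
        if d != n:                         # non-trivial divisor found
            return d, steps
    return None, steps

n = 1000003 * 1000033                      # two factors near one million
factor, steps = rho_find_factor(n)
print(factor, steps)
print(rho_find_factor(4))                  # (None, ...): rho cannot split 4
```

For n = 4 every polynomial tried collapses to the trivial divisor, which is why a trial-division fallback for tiny inputs is a sensible safety net.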
def pollard_rho_factor(N, *, trials = 16):
    # https://en.wikipedia.org/wiki/Pollard%27s_rho_algorithm
    import math, random
    def Int(x):
        import gmpy2
        return gmpy2.mpz(x) # int(x)
    def is_fermat_probable_prime(n, *, trials = 32):
        # https://en.wikipedia.org/wiki/Fermat_primality_test
        import random
        if n <= 16:
            return n in (2, 3, 5, 7, 11, 13)
        for i in range(trials):
            if pow(random.randint(2, n - 2), n - 1, n) != 1:
                return False
        return True
    def gcd(a, b):
        # https://en.wikipedia.org/wiki/Greatest_common_divisor
        # https://en.wikipedia.org/wiki/Euclidean_algorithm
        while b != 0:
            a, b = b, a % b
        return a
    def found(f, prime):
        print(f'Found {("composite", "prime")[prime]} factor, {math.log2(f):>7.03f} bits... {("Pollard-Rho failed to fully factor it!", "")[prime]}')
        return f
    N = Int(N)
    if N <= 1:
        return []
    if is_fermat_probable_prime(N):
        return [found(N, True)]
    for j in range(trials):
        i, stage, y, x = 0, 2, Int(1), Int(random.randint(1, N - 2))
        while True:
            r = gcd(N, abs(x - y))
            if r != 1:
                break
            if i == stage:
                y = x
                stage <<= 1
            x = (x * x + 1) % N
            i += 1
        if r != N:
            return sorted(pollard_rho_factor(r) + pollard_rho_factor(N // r))
    return [found(N, False)] # Pollard-Rho failed
def test():
    import time
    # http://www.math.com/tables/constants/pi.htm
    # pi = 3.
    # 1415926535 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679
    # 8214808651 3282306647 0938446095 5058223172 5359408128 4811174502 8410270193 8521105559 6446229489 5493038196
    # n = first 190 fractional digits of Pi
    n = 1415926535_8979323846_2643383279_5028841971_6939937510_5820974944_5923078164_0628620899_8628034825_3421170679_8214808651_3282306647_0938446095_5058223172_5359408128_4811174502_8410270193_8521105559_6446229489
    tb = time.time()
    print('N:', n)
    print('Factors:', pollard_rho_factor(n))
    print(f'Time: {time.time() - tb:.03f} sec')
test()
Output:
N: 1415926535897932384626433832795028841971693993751058209749445923078164062862089986280348253421170679821480865132823066470938446095505822317253594081284811174502841027019385211055596446229489
Found prime factor, 1.585 bits...
Found prime factor, 6.150 bits...
Found prime factor, 20.020 bits...
Found prime factor, 27.193 bits...
Found prime factor, 28.311 bits...
Found prime factor, 545.087 bits...
Factors: [mpz(3), mpz(71), mpz(1063541), mpz(153422959), mpz(332958319), mpz(122356390229851897378935483485536580757336676443481705501726535578690975860555141829117483263572548187951860901335596150415443615382488933330968669408906073630300473)]
Time: 2.963 sec
I am working on Project Euler Problem 50, which states:
The prime 41, can be written as the sum of six consecutive primes:
41 = 2 + 3 + 5 + 7 + 11 + 13
This is the longest sum of consecutive primes that adds to a prime below one-hundred.
The longest sum of consecutive primes below one-thousand that adds to a prime, contains 21 terms, and is equal to 953.
Which prime, below one-million, can be written as the sum of the most consecutive primes?
To determine the terms for a prime P (if it can be written as a sum of consecutive primes at all), I slide a window over all the primes (in increasing order) up to but not including P and calculate the sum of each window. If the sum equals the prime considered, I record the length of the window...
This works fine for all primes up to 1000, but for primes up to 10**6 it is very slow, so I was hoping memoization would help; when calculating the sum of sliding windows, a lot of double work is done...(right?)
So I found the standard memoization implementation on the net and just pasted it in my code. Is this correct? (I have no idea how it is supposed to work here...)
primes = tuple(n for n in range(1, 10**6) if is_prime(n)==True)
count_best = 0
##http://docs.python.org/release/2.3.5/lib/itertools-example.html:
## Slightly modified (first for loop)
from itertools import islice
def window(seq):
    for n in range(2, len(seq) + 1):
        it = iter(seq)
        result = tuple(islice(it, n))
        if len(result) == n:
            yield result
        for elem in it:
            result = result[1:] + (elem,)
            yield result
def memoize(function):
    cache = {}
    def decorated_function(*args):
        if args in cache:
            return cache[args]
        else:
            val = function(*args)
            cache[args] = val
            return val
    return decorated_function
@memoize
def find_lin_comb(prime):
    global count_best
    for windows in window(primes[0 : primes.index(prime)]):
        if sum(windows) == prime and len(windows) > count_best:
            count_best = len(windows)
            print('Prime: ', prime, 'Terms: ', count_best)
##Find them:
for x in primes[::-1]: find_lin_comb(x)
(btw, the tuple of prime numbers is generated "decently" fast)
All input is appreciated. I am just a hobby programmer, so please don't get too advanced on me.
Thank you!
Edit: here is a working code paste that doesn't have ruined indentation:
http://pastebin.com/R1NpMqgb
This works fine for all primes up to 1000, but for primes up to 10**6 it is very slow, so I was hoping memoization would help; when calculating the sum of sliding windows, a lot of double work is done...(right?)
Yes, right. And of course it's slow for the primes up to 10**6.
Say you have n primes up to N, numbered in increasing order, p_1 = 2, p_2 = 3, .... When considering whether prime no. k is the sum of consecutive primes, you consider all windows [p_i, ..., p_j], for pairs (i,j) with i < j < k. There are (k-1)*(k-2)/2 of them. Going through all k up to n, you examine about n³/6 windows in total (counting multiplicity: you examine w(i,j) a total of n-j times). Even ignoring the cost of creating the window and summing it, you can see how badly it scales:
For N = 1000, there are n = 168 primes and about 790000 windows to examine (counting multiplicity).
For N = 10**6, there are n = 78498 primes and about 8.3*10**13 windows to examine.
Now factor in the work for creating and summing the windows. Estimating it low at j-i+1 for summing the j-i+1 primes in w(i,j), the work for p_k is about k³/6, and the total work becomes roughly n**4/24. Something like 33 million steps for N = 1000, peanuts, but nearly 1.6*10**18 for N = 1000000.
A year contains about 3.1*10**7 seconds; with a ~3GHz CPU, that's roughly 10**17 clock cycles. So we're talking of an operation needing something like 100 CPU-years (may be a factor of 10 off or so).
You aren't willing to wait that long, I suppose;)
Now, with memoisation, you still look at each window multiple times, but you do the computation of each window only once. That means you need about n³/6 work for the computation of the windows, and look about n³/6 times at any window.
Problem 1: You still need to look at windows about 8.3*10**13 times, that's several hours even if looking cost only one cycle.
Problem 2: There are about 8.3*10**13 windows to memoise. You don't have that much memory, unless you can use a really large HD.
You can circumvent the memory problem by throwing away data you don't need anymore and only calculating data for the windows when it is needed, but once you know which data you may throw away when, you should be able to see a much better approach.
The longest sum of consecutive primes below one-thousand that adds to a prime, contains 21 terms, and is equal to 953.
What does this tell you about the window generating that sum? Where can it start, where can it stop? How can you use that information to create an efficient algorithm to solve the problem?
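For readers who would rather see one possible reading of that hint than work it out, here is a sketch based on prefix sums, so that the sum of any window costs one subtraction. It is only one way to use the observation, and the names primes_below and longest_prime_sum are made up for illustration. Skip it if you want to solve the problem yourself.

```python
def primes_below(limit):
    """Sieve of Eratosthenes; assumes limit >= 2."""
    sieve = bytearray([1]) * limit
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, limit, i)))
    return [i for i, flag in enumerate(sieve) if flag]

def longest_prime_sum(limit):
    """Return (prime, terms) for the longest run of consecutive primes
    whose sum is a prime below limit."""
    primes = primes_below(limit)
    prime_set = set(primes)
    prefix = [0]                               # prefix[j] = p_1 + ... + p_j
    for p in primes:
        prefix.append(prefix[-1] + p)
    best_len, best_prime = 0, 0
    for i in range(len(primes)):
        j = i + best_len + 1                   # only longer windows matter
        if j > len(primes) or prefix[j] - prefix[i] >= limit:
            break                              # later starts only give larger sums
        while j <= len(primes) and prefix[j] - prefix[i] < limit:
            total = prefix[j] - prefix[i]      # window sum in O(1)
            if total in prime_set:
                best_len, best_prime = j - i, total
            j += 1
    return best_prime, best_len

print(longest_prime_sum(100))    # (41, 6)
print(longest_prime_sum(1000))   # (953, 21)
```

Because windows can only start near the small primes and must stop once the sum reaches the limit, the number of windows examined stays tiny compared with the n³/6 of the original approach.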
The memoize decorator adds a wrapper to a function to cache the return value for each value of the argument (each combination of values in case of multiple arguments). It is useful when the function is called multiple times with the same arguments. You can only use it with a pure function, i.e.
The function has no side effects. Changing a global variable and doing output are examples of side effects.
The return value depends only on the values of the arguments, not on some global variables that may change values between calls.
Your find_lin_comb function does not satisfy the above criteria. For one thing, it is called with a different argument every time, for another, the function does not return a value.