How to optimize data generation for numpy call - python

I'd like to know how to make the following code shorter and/or more efficient. Could I (or should I) get rid of the for loop by using a functional method, or is there a method from numpy I should be using?
The code calculates the expected value of an array of integers.
vals = np.arange(self.n+1)
# array of probability of each value in vals
parr = np.ones(len(vals))
for i in range(len(vals)):
    parr[i] *= self.prob(vals[i])
return np.dot(vals,parr)
As requested in comments, the implementation of the method prob():
def prob(self, x):
    """Computes probability of removing x items
    :param x: number of items to remove
    :returns: probability of removing x items
    """
    # p is the probability of removing an item
    # sl.choose computes n choose x
    return sl.choose(self.n, x) * (self.p**x) * \
        (1-self.p)**(self.n-x)

I think this will be faster:
vals = np.arange(self.n+1)
# array of probability of each value in vals
parr = self.prob(vals)
return np.dot(vals,parr)
and the function:
def prob(self, list_of_x):
    """Computes probability of removing x items
    :param list_of_x: numbers of items to remove (numpy array)
    :returns: array with the probability of removing x items, for each x
    """
    # p is the probability of removing an item
    # sl.choose computes n choose x
    return np.asarray([sl.choose(self.n, e) for e in list_of_x]) * (self.p ** list_of_x) * \
        (1-self.p)**(self.n - list_of_x)
This helps because numpy's element-wise array operations are much faster than a Python-level loop:
import timeit
import numpy as np
list_a = [1, 2, 3] * 1000
list_b = [4, 5, 6] * 1000
np_list_a = np.asarray(list_a)
np_list_b = np.asarray(list_b)
print(timeit.timeit('[a * b for a, b in zip(list_a, list_b)]', 'from __main__ import list_a, list_b', number=1000))
print(timeit.timeit('np_list_a * np_list_b', 'from __main__ import np_list_a, np_list_b', number=1000))
Result:
0.19378583212707723
0.004333830584755033
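The snippets above depend on a class and on sl.choose, neither of which is shown in full. As a rough standalone sketch, assuming sl.choose is an ordinary binomial coefficient, the same calculation can be written with math.comb from the standard library (Python 3.8+). The function name expected_removed and its arguments are made up for this illustration. Since the probabilities form a binomial distribution, the result should agree with the closed form n * p, which makes a handy sanity check.

```python
import math

import numpy as np


def expected_removed(n, p):
    """Expected number of removed items for n items, each removed with probability p.

    Hypothetical standalone version of the method in the question.
    """
    vals = np.arange(n + 1)
    # math.comb stands in for sl.choose (assumed to compute "n choose k")
    coeffs = np.array([math.comb(n, int(k)) for k in vals], dtype=float)
    # element-wise binomial probabilities for every value in vals
    parr = coeffs * p**vals * (1 - p) ** (n - vals)
    return float(np.dot(vals, parr))


result = expected_removed(10, 0.3)
print(result, 10 * 0.3)  # both values should agree up to rounding
```

If only the expected value is needed, n * p alone would do; the array version is still useful when the individual probabilities are required.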

The loop can be reduced to a list comprehension:
vals = np.arange(self.n+1)
# array of probability of each value in vals
parr = [self.prob(v) for v in vals]
return np.dot(vals, parr)

Related

How do I run this function for multiple values of N?

I am trying to run the code below for N = np.linspace(20,250,47), but I get multiple errors when trying to change the N. I am new to python and am not sure how to get multiple values of this function using multiple values of N. Below is the code with N = 400 and it does work, but I am not sure how to make it work for multiple N's at the same time.
import matplotlib.pyplot as plt
import numpy as np
S0 = 9
K = 10
T = 3
r = 0.06
sigma = 0.3
N = 400
dt = T / N
u = exp(sigma*sqrt(dt)+(r-0.5*sigma**2)*dt)
d = exp(-sigma*sqrt(dt)+(r-0.5*sigma**2)*dt)
p = 0.5
def binomial_tree_put(N, T, S0, sigma, r, K, array_out=False):
    dt = T / N
    u = exp(sigma*sqrt(dt)+(r-0.5*sigma**2)*dt)
    d = exp(-sigma*sqrt(dt)+(r-0.5*sigma**2)*dt)
    p = 0.5
    price_tree = np.zeros([N+1,N+1])
    for i in range(N+1):
        for j in range(i+1):
            price_tree[j,i] = S0*(d**j)*(u**(i-j))
    option = np.zeros([N+1,N+1])
    option[:,N] = np.maximum(np.zeros(N+1), K - price_tree[:,N])
    for i in np.arange(N-1, -1, -1):
        for j in np.arange(0, i+1):
            option[j, i] = np.exp(-r*dt)*(p*option[j, i+1]+(1-p)*option[j+1, i+1])
    if array_out:
        return [option[0,0], price_tree, option]
    else:
        return option[0,0]
Suppose you have a list of values for N, e.g. N = [400, 300, 500, 800]. You then need to call the function once for each value, which you can do with a loop.
For example,
for num in N:
    binomial_tree_put(num, *other arguments*)
np.linspace() creates an np.array, but the function expects a single integer. If you want to execute a function for each element of an array or list, you can do that in a loop like this:
# your code as defined above goes here
for num in np.linspace(20,250,47):
    N = int(num) # you could just put N in the line above - this is just to illustrate
    binomial_tree_put(N, T, S0, sigma, r, K, array_out=False)
Be aware, depending on how long your function takes to execute and how many elements are in your iterable (e.g. 47 for your case), it may take a while to execute.
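The loop above discards each return value. A minimal sketch of collecting one price per N could look like the following. To keep it short and quick to run, the function here is a condensed restatement of the one in the question (my own rewrite, not the asker's code): it keeps only the current column of the option tree and omits the array_out option. exp and sqrt are imported from math.

```python
from math import exp, sqrt

import numpy as np

S0, K, T, r, sigma = 9, 10, 3, 0.06, 0.3


def binomial_tree_put(N, T, S0, sigma, r, K):
    """Condensed variant of the question's function (illustrative rewrite)."""
    dt = T / N
    u = exp(sigma * sqrt(dt) + (r - 0.5 * sigma**2) * dt)
    d = exp(-sigma * sqrt(dt) + (r - 0.5 * sigma**2) * dt)
    p = 0.5
    j = np.arange(N + 1)
    # stock prices in the last column of the tree
    terminal = S0 * d**j * u ** (N - j)
    option = np.maximum(0.0, K - terminal)
    discount = exp(-r * dt)
    # step back through the tree one column at a time
    for _ in range(N):
        option = discount * (p * option[:-1] + (1 - p) * option[1:])
    return float(option[0])


# linspace returns floats; round and cast so the function gets plain integers
N_values = np.linspace(20, 250, 47).round().astype(int)
prices = [binomial_tree_put(int(N), T, S0, sigma, r, K) for N in N_values]

for N, price in zip(N_values[:3], prices[:3]):
    print(N, price)
```

The list prices lines up with N_values element by element, so it can be passed directly to plt.plot(N_values, prices).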
Edit: I also noticed you seem to be missing an import in your example code. exp() and sqrt() are part of the math module.
You can also use functools.partial to fix the arguments that stay the same, like this:
from functools import partial
N = [1, 2, ...] # all your N values
binom_fct = partial(binomial_tree_put, T=T, S0=S0, sigma=sigma, r=r, K=K, array_out=False)
for num in N:
    binom_fct(num)
See the functools.partial documentation for details.

itertools combinations in tandem with looping

I have the following Python code. Because random is being used, it generates a new answer every time:
import random
import numpy as np
N = 64 # Given
T = 5 # Given
FinalLengths = []
for i in range(T):
    c = range(1, N)
    x = random.sample(c, 2) # Choose 2 random numbers between 1 and N-1
    LrgstNode = max(x)
    SmlstNode = min(x)
    RopeLengths = [SmlstNode, LrgstNode - SmlstNode, N - LrgstNode]
    S = max(RopeLengths)
    N = S
    FinalLengths.append(S)
avgS = np.mean(FinalLengths) # Find average
print("The mean of S is {}".format(avgS))
My research has led me to possibly using itertools combinations in order to produce all possible combinations within the range and get the avg to converge. If so, how?
Thank you.
It sounds like you're after something like this:
import random
import numpy as np
from itertools import combinations
N = 64 # Given
T = 5 # Given
FinalLengths = []
for i in range(T):
    c = list(range(1, N))
    for x in combinations(c, 2):
        S = max([min(x), max(x) - min(x), N - max(x)])
        N = S
        FinalLengths.append(S)
avgS = np.mean(FinalLengths) # Find average
print("The mean of S is {}".format(avgS))
To use combinations(l, size), pass in a list l and an int size, the length of each combination tuple. That's all there is to it!
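As a small illustration of how combinations can replace random sampling, the sketch below computes the exact average of the longest piece for a single round of cutting, with N held fixed. The function name expected_longest is made up for this example, and it covers one round only, not the T repeated rounds from the question.

```python
from itertools import combinations


def expected_longest(N):
    """Exact mean of the longest piece when a rope of length N is cut
    at two distinct integer points chosen from 1 .. N-1."""
    # combinations() over a sorted range yields pairs with a < b
    lengths = [max(a, b - a, N - b) for a, b in combinations(range(1, N), 2)]
    return sum(lengths) / len(lengths)


print(expected_longest(4))   # every pair of cuts leaves a longest piece of 2
print(expected_longest(64))
```

Because every pair of cut points is visited exactly once, the result does not change between runs, unlike the version based on random.sample.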

Passing variables to parallelized function

I'm parallelizing the generation of a matrix where each element in the matrix is computed by a function fun. I can get it to work if the only thing I pass into this function are the indices i and j. However, I want to pass another variable into this function say x, how do I do this?
I'm using Python 2.7
import numpy as np
import multiprocess as mp
import itertools
p = mp.Pool()
def fun((i,j)):
    print i,j
    prod = i * j
    # what if I want to have a variable x in this function
    # prod = i * j * x
    return prod
combs = ((i,j) for i,j in itertools.product(xrange(5), repeat=2) if i <= 5)
result = p.map(fun, combs)
p.close()
p.join()
newresult = np.array(result).reshape(5,5)
print newresult
You can add x as a third element of the tuple:
def fun((i,j,x)):
    print i,j,x
    prod = i * j * x
    return prod
Why this works: You are actually passing just one object into the function, and that object happens to be a tuple. def fun((i,j)) simply unpacks the tuple again. So to answer your question, you can add another element to the tuple and it works fine.
A clearer representation of what you are doing:
def fun(data):
    i,j,x = data
    print i,j,x
    prod = i * j * x
    return prod
data = (2,4,10)
print(fun(data))
Or you can do this:
def fun((i,j), x):
    print i,j, x
    prod = i * j * x
    # what if I want to have a variable x in this function
    # prod = i * j * x
    return prod
print(fun((2,4), 10))
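Tuple parameters in a function signature, as in def fun((i,j)), were removed in Python 3, so the snippets above only run on Python 2. A rough Python 3 sketch of the same idea follows, using functools.partial to fix x before handing the function to map. It uses ThreadPool from the standard library, which shares the Pool interface, so that the example runs as a plain script; that choice is an assumption made for illustration. With a process pool, the pool creation would normally go under an if __name__ == "__main__": guard.

```python
import itertools
from functools import partial
from multiprocessing.pool import ThreadPool

import numpy as np


def fun(ij, x):
    # unpack the index pair inside the function body (Python 3 style)
    i, j = ij
    return i * j * x


# fix the extra argument x once; map() then only supplies the (i, j) pairs
fun_with_x = partial(fun, x=10)

with ThreadPool(4) as pool:
    result = pool.map(fun_with_x, itertools.product(range(5), repeat=2))

newresult = np.array(result).reshape(5, 5)
print(newresult)
```

map preserves input order, so the reshaped matrix has element [i, j] equal to i * j * x.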

Possible to calculate a double sum in python using list comprehensions to replace both for-loops?

I have a double sum which basically reads
sum = sum_{x,y} exp( x^2 + y^2 )
Of course, I could simply use two nested for loops but that tends to be time consuming for large numbers. I can use one list comprehension to replace the inner for loop, see here:
import numpy as np
N_x = 100
N_y = 100
# straightforward way
result_1 = .0
for x in xrange(N_x):
    for y in xrange(N_y):
        result_1 += np.exp( (float(x)/N_x)**2 + (float(y)/N_y)**2 )
# using one list comprehension
result_2 = .0
for x in xrange(N_x):
    inner_loop = [ np.exp( (float(y)/N_y)**2 ) for y in range(N_y) ]
    result_2 += np.exp( (float(x)/N_x)**2 ) * sum(inner_loop)
But how can I also replace the outer for loop with a list comprehension (which I expect to be faster)? Any hints?
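Why don't you do it the numpy way, without for loops:
x = np.arange(N_x)
y = np.arange(N_y)
xx, yy = np.meshgrid(x, y)
# float() avoids integer division under Python 2
result = np.sum(np.exp((xx/float(N_x))**2 + (yy/float(N_y))**2))
Both loops can also be folded into a single generator expression:
result = sum(np.exp((float(x)/N_x)**2 + (float(y)/N_y)**2) for x in xrange(N_x) for y in xrange(N_y))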
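For comparison, here is a minimal Python 3 sketch of the array-based approach that uses broadcasting instead of meshgrid. In Python 3 the / operator is true division, so no float() conversion is needed. The variable names are chosen for this example only.

```python
import numpy as np

N_x = 100
N_y = 100

# scaled grid coordinates, each in [0, 1)
x = np.arange(N_x) / N_x
y = np.arange(N_y) / N_y

# x[:, None] is a column and y[None, :] is a row, so their sum
# broadcasts to the full (N_x, N_y) grid of exponents
result = np.exp(x[:, None] ** 2 + y[None, :] ** 2).sum()
print(result)
```

The result should match the value of the nested-loop version up to floating-point rounding.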
You were almost there. The full sum can be written as the product of two 1D sums, i.e. (sum exp x^2) * (sum exp y^2):
>>> import numpy as np
>>>
>>> N_x = N_y = 100
>>>
>>> # brute force
>>> result_1 = .0
>>> for x in xrange(N_x):
...     for y in xrange(N_y):
...         result_1 += np.exp( (float(x)/N_x)**2 + (float(y)/N_y)**2 )
...
>>> result_1
21144.232143358553
>>>
# single product method
>>> from __future__ import division
>>>
>>> x, y = np.arange(N_x) / N_x, np.arange(N_y) / N_y
>>> np.exp(x*x).sum() * np.exp(y*y).sum()
21144.232143358469
My guess is that you can do this even with list comprehensions and still beat the brute-force numpy method:
>>> rx, ry = 1.0 / (N_x*N_x), 1.0 / (N_y*N_y)
>>> sum([np.exp(rx*x*x) for x in xrange(N_x)]) * sum([np.exp(ry*y*y) for y in xrange(N_y)])
21144.232143358469
Indeed, timings done in Python3 because I don't know how to use timeit in Python2:
>>> from timeit import repeat
>>>
>>> kwds = dict(globals=globals(), number=100)
>>>
# single product - list comp
>>> repeat('sum(np.exp(rx*x*x) for x in range(N_x)) * sum(np.exp(ry*y*y) for y in range(N_y))', **kwds)
[0.0166887859813869, 0.016465034103021026, 0.016357041895389557]
>>>
# numpy brute force
>>> repeat('np.exp(np.add.outer(x*x, y*y)).sum()', **kwds)
[0.07063774298876524, 0.0348161740694195, 0.02283189189620316]
Obviously, the numpy single product is even faster:
>>> repeat('np.exp(x*x).sum() * np.exp(y*y).sum()', **kwds)
[0.0031406711786985397, 0.0031003099866211414, 0.0031157969497144222]
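The factorization behind the single-product method follows from the fact that the exponential of a sum is a product of exponentials, so each factor depends on only one index and can be pulled out of the other sum. Writing a_x = x/N_x and b_y = y/N_y (shorthand introduced here, not used in the original posts):

```latex
\sum_{x=0}^{N_x-1} \sum_{y=0}^{N_y-1} e^{a_x^2 + b_y^2}
  = \sum_{x=0}^{N_x-1} \sum_{y=0}^{N_y-1} e^{a_x^2}\, e^{b_y^2}
  = \left( \sum_{x=0}^{N_x-1} e^{a_x^2} \right)
    \left( \sum_{y=0}^{N_y-1} e^{b_y^2} \right)
```

The left-hand side needs N_x * N_y evaluations of exp, while the right-hand side needs only N_x + N_y, which is consistent with the timings above.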

Weighted averaging a list

Thanks for your responses. Yes, I was looking for the weighted average.
rate = [14.424, 14.421, 14.417, 14.413, 14.41]
amount = [3058.0, 8826.0, 56705.0, 30657.0, 12984.0]
I want the weighted average of the top list based on each item of the bottom list.
So, if the first bottom-list item is small (such as 3,058 compared to the total 112,230), then the first top-list item should have less of an effect on the top-list average.
Here is some of what I have tried. It gives me an answer that looks right, but I am not sure if it follows what I am looking for.
for g in range(len(rate)):
    rate[g] = rate[g] * (amount[g] / sum(amount))
rate = sum(rate)
EDIT:
After comparing other responses with my code, I decided to use the zip code to keep it as short as possible.
You could use numpy.average to calculate weighted average.
In [13]: import numpy as np
In [14]: rate = [14.424, 14.421, 14.417, 14.413, 14.41]
In [15]: amount = [3058.0, 8826.0, 56705.0, 30657.0, 12984.0]
In [17]: weighted_avg = np.average(rate, weights=amount)
In [19]: weighted_avg
Out[19]: 14.415602815646439
for g in range(len(rate)):
    rate[g] = rate[g] * amount[g] / sum(amount)
rate = sum(rate)
is the same as:
sum(rate[g] * amount[g] / sum(amount) for g in range(len(rate)))
which is the same as:
sum(rate[g] * amount[g] for g in range(len(rate))) / sum(amount)
which is the same as:
sum(x * y for x, y in zip(rate, amount)) / sum(amount)
Result:
14.415602815646439
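As a quick check of the equivalences above, the following sketch computes the weighted average of the question's data three ways: with an explicit loop, with zip, and with numpy.average. The variable names loop_result, zip_result and numpy_result are made up for this example. Unlike the loop in the question, it does not overwrite the rate list.

```python
import numpy as np

rate = [14.424, 14.421, 14.417, 14.413, 14.41]
amount = [3058.0, 8826.0, 56705.0, 30657.0, 12984.0]

total = sum(amount)

# explicit loop, accumulating into a separate variable
loop_result = 0.0
for r, a in zip(rate, amount):
    loop_result += r * a / total

# single expression using zip
zip_result = sum(r * a for r, a in zip(rate, amount)) / total

# numpy's built-in weighted average
numpy_result = np.average(rate, weights=amount)

print(loop_result, zip_result, numpy_result)
```

All three values should agree up to floating-point rounding.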
This looks like a weighted average.
values = [1, 2, 3, 4, 5]
weights = [2, 8, 50, 30, 10]
s = 0
for x, y in zip(values, weights):
    s += x * y
average = s / sum(weights)
print(average) # 3.38
This outputs 3.38, which indeed tends more toward the values with the highest weights.
Let's use Python's zip function:
zip([iterable, ...])
This function returns a list of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables. The returned list is truncated in length to the length of the shortest argument sequence. When there are multiple arguments which are all of the same length, zip() is similar to map() with an initial argument of None. With a single sequence argument, it returns a list of 1-tuples. With no arguments, it returns an empty list.
values = [14.424, 14.421, 14.417, 14.413, 14.41] # the rates to average
weights = [3058.0, 8826.0, 56705.0, 30657.0, 12984.0] # the amounts, used as weights
weighted_average = sum(value * weight for value, weight in zip(values, weights)) / sum(weights)
As a documented and tested function:
def weighted_average(values, weights=None):
    """
    Returns the weighted average of `values` with weights `weights`
    Returns the simple arithmetic average if `weights` is None.
    >>> weighted_average([3, 9], [1, 2])
    7.0
    >>> 7 == (3*1 + 9*2) / (1 + 2)
    True
    """
    if weights is None:
        weights = [1 for _ in range(len(values))]
    normalization = 0
    val = 0
    for value, weight in zip(values, weights):
        val += value * weight
        normalization += weight
    return val / normalization
For completeness, another version where the values and weights are stored in tuples:
def weighted_average(values_and_weights):
    """
    The input is expected in the form:
    [(value_1, weight_1), (value_2, weight_2), ...(value_n, weight_n)]
    >>> weighted_average([(3,1), (9,2)])
    7.0
    >>> 7 == (3*1 + 9*2) / (1 + 2)
    True
    """
    normalization = 0
    val = 0
    for value, weight in values_and_weights:
        val += value * weight
        normalization += weight
    return val / normalization
