I am trying to reduce the running time of a function that performs a series of calculations on two matrices. Searching around, I've heard of NumPy, but I really don't know how to apply it to my problem. I also think one of the things making my function slow is the large number of dot (attribute lookup) operators in the loop body (I read about that on this page).
The math corresponds to a factorization for the Quadratic Assignment Problem:
My code is:
delta = 0
for k in xrange(self._tam):
    if k != r and k != s:
        delta += \
            self._data.stream_matrix[r][k] \
            * (self._data.distance_matrix[sol[s]][sol[k]] - self._data.distance_matrix[sol[r]][sol[k]]) + \
            self._data.stream_matrix[s][k] \
            * (self._data.distance_matrix[sol[r]][sol[k]] - self._data.distance_matrix[sol[s]][sol[k]]) + \
            self._data.stream_matrix[k][r] \
            * (self._data.distance_matrix[sol[k]][sol[s]] - self._data.distance_matrix[sol[k]][sol[r]]) + \
            self._data.stream_matrix[k][s] \
            * (self._data.distance_matrix[sol[k]][sol[r]] - self._data.distance_matrix[sol[k]][sol[s]])
return delta
Running this on a problem of size 20 (matrices of 20x20) takes about 20 seconds; the bottleneck is this function:
ncalls tottime percall cumtime percall filename:lineno(function)
303878 15.712 0.000 15.712 0.000 Heuristic.py:66(deltaC)
I tried to apply map to the for loop, but because the loop body isn't a function call, it is not possible.
How could I reduce the time?
EDIT1
To answer eickenberg's comment:
sol is a permutation, for example [1,2,3,4]. The function is called when I am generating neighbor solutions, so a neighbor of [1,2,3,4] is [2,1,3,4]. I am changing only two positions in the original permutation and then calling deltaC, which calculates a factorization of the solution with positions r,s swapped (in the example above, r,s = 0,1). This is done to avoid recalculating the entire cost of the neighbor solution. I suppose I can store the values of sol[k], sol[r] and sol[s] in local variables to avoid looking them up on every iteration. I do not know if this is what you were asking in your comment.
EDIT2
A minimal working example:
import random
distance_matrix = [[0, 12, 6, 4], [12, 0, 6, 8], [6, 6, 0, 7], [4, 8, 7, 0]]
stream_matrix = [[0, 3, 8, 3], [3, 0, 2, 4], [8, 2, 0, 5], [3, 4, 5, 0]]
def deltaC(r, s, S=None):
    '''
    Difference between C with values i and j swapped
    '''
    sol = S if S is not None else [0, 1, 2, 3]
    delta = 0
    sol_r, sol_s = sol[r], sol[s]
    for k in xrange(4):
        if k != r and k != s:
            delta += (stream_matrix[r][k]
                * (distance_matrix[sol_s][sol[k]] - distance_matrix[sol_r][sol[k]]) +
                stream_matrix[s][k]
                * (distance_matrix[sol_r][sol[k]] - distance_matrix[sol_s][sol[k]]) +
                stream_matrix[k][r]
                * (distance_matrix[sol[k]][sol_s] - distance_matrix[sol[k]][sol_r]) +
                stream_matrix[k][s]
                * (distance_matrix[sol[k]][sol_r] - distance_matrix[sol[k]][sol_s]))
    return delta
for _ in xrange(303878):
    d = deltaC(random.randint(0,3), random.randint(0,3))
print d
Now I think the better option is to use NumPy. I tried with Matrix(), but it did not improve the performance.
Best solution found
Well, finally I was able to reduce the time a bit more by combining @TooTone's solution with storing the indexes in a set to avoid the if. The time has dropped from about 18 seconds to 8 seconds. Here is the code:
def deltaC(self, r, s, sol=None):
    delta = 0
    sol = self.S if sol is None else sol
    sol_r, sol_s = sol[r], sol[s]
    stream_matrix = self._data.stream_matrix
    distance_matrix = self._data.distance_matrix
    indexes = set(xrange(self._tam)) - set([r, s])
    for k in indexes:
        sol_k = sol[k]
        delta += \
            (stream_matrix[r][k] - stream_matrix[s][k]) \
            * (distance_matrix[sol_s][sol_k] - distance_matrix[sol_r][sol_k]) \
            + \
            (stream_matrix[k][r] - stream_matrix[k][s]) \
            * (distance_matrix[sol_k][sol_s] - distance_matrix[sol_k][sol_r])
    return delta
In order to reduce the time even more, I think the best way would be to write a compiled extension module.
In the simple example you've given, with for k in xrange(4): the loop body only executes three times (if r == s) or twice (if r != s), and an initial numpy implementation, below, is slower by a large factor. Numpy is optimized for performing calculations over long vectors, and if the vectors are short the overheads can outweigh the benefits. (Also note that in this formula the matrices are being sliced in different dimensions and indexed non-contiguously, which can only make things more complicated for a vectorizing implementation.)
import numpy as np
distance_matrix_np = np.array(distance_matrix)
stream_matrix_np = np.array(stream_matrix)
n = 4
def deltaC_np(r, s, sol):
    delta = 0
    sol_r, sol_s = sol[r], sol[s]
    K = np.array([i for i in xrange(n) if i!=r and i!=s])
    return np.sum(
        (stream_matrix_np[r,K] - stream_matrix_np[s,K])
        * (distance_matrix_np[sol_s,sol[K]] - distance_matrix_np[sol_r,sol[K]]) +
        (stream_matrix_np[K,r] - stream_matrix_np[K,s])
        * (distance_matrix_np[sol[K],sol_s] - distance_matrix_np[sol[K],sol_r]))
In this numpy implementation, rather than a for loop over the elements in K, the operations are applied across all the elements in K within numpy. Also, note that your mathematical expression can be simplified. Each term in brackets on the left is the negative of the term in brackets on the right.
This applies to your original code too. For example, (self._data.distance_matrix[sol[s]][sol[k]] - self._data.distance_matrix[sol[r]][sol[k]]) is equal to -1 times (self._data.distance_matrix[sol[r]][sol[k]] - self._data.distance_matrix[sol[s]][sol[k]]), so you were doing unnecessary computation, and your original code can be optimized without using numpy.
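Concretely, writing $ f $ for the stream matrix, $ d $ for the distance matrix and sol[·] for the permutation, each pair of terms collapses by simple factoring (this is exactly what the deltaC2 function further below exploits):
$ f_{rk}\,(d_{sol[s],sol[k]} - d_{sol[r],sol[k]}) + f_{sk}\,(d_{sol[r],sol[k]} - d_{sol[s],sol[k]}) = (f_{rk} - f_{sk})\,(d_{sol[s],sol[k]} - d_{sol[r],sol[k]}) $
and likewise for the $ f_{kr} $ / $ f_{ks} $ pair, which cuts down the number of matrix lookups per iteration.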
It turns out that the bottleneck in the numpy function is the innocent-looking list comprehension
K = np.array([i for i in xrange(n) if i!=r and i!=s])
Once this is replaced with vectorizing code
if r == s:
    K = np.arange(n-1)
    K[r:] += 1
else:
    K = np.arange(n-2)
    if r < s:
        K[r:] += 1
        K[s-1:] += 1
    else:
        K[s:] += 1
        K[r-1:] += 1
the numpy function is much faster.
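As a quick sanity check of the index construction (a small sketch; n, r and s are arbitrary example values), the vectorized K matches the list comprehension it replaces:
import numpy as np

n, r, s = 6, 1, 3   # arbitrary example values

# vectorized construction from above
if r == s:
    K = np.arange(n - 1)
    K[r:] += 1
else:
    K = np.arange(n - 2)
    if r < s:
        K[r:] += 1
        K[s - 1:] += 1
    else:
        K[s:] += 1
        K[r - 1:] += 1

# reference: the (slow) list comprehension it replaces
K_ref = np.array([i for i in xrange(n) if i != r and i != s])
print K, K_ref      # both are [0 2 4 5]
assert np.array_equal(K, K_ref)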
A graph of run times is shown immediately below (right at the bottom of this answer is the original graph before optimizing the numpy function). You can see that it either makes sense to use your optimized original code or the numpy code, depending on how large the matrix is.
The full code is below for reference, partly in case someone else can take it further. (The function deltaC2 is your original code optimized to take account of the way the mathematical expression can be simplified.)
def deltaC(r, s, sol):
    delta = 0
    sol_r, sol_s = sol[r], sol[s]
    for k in xrange(n):
        if k != r and k != s:
            delta += \
                stream_matrix[r][k] \
                * (distance_matrix[sol_s][sol[k]] - distance_matrix[sol_r][sol[k]]) + \
                stream_matrix[s][k] \
                * (distance_matrix[sol_r][sol[k]] - distance_matrix[sol_s][sol[k]]) + \
                stream_matrix[k][r] \
                * (distance_matrix[sol[k]][sol_s] - distance_matrix[sol[k]][sol_r]) + \
                stream_matrix[k][s] \
                * (distance_matrix[sol[k]][sol_r] - distance_matrix[sol[k]][sol_s])
    return delta
import numpy as np
def deltaC_np(r, s, sol):
    delta = 0
    sol_r, sol_s = sol[r], sol[s]
    if r == s:
        K = np.arange(n-1)
        K[r:] += 1
    else:
        K = np.arange(n-2)
        if r < s:
            K[r:] += 1
            K[s-1:] += 1
        else:
            K[s:] += 1
            K[r-1:] += 1
    #K = np.array([i for i in xrange(n) if i!=r and i!=s]) #TOO SLOW
    return np.sum(
        (stream_matrix_np[r,K] - stream_matrix_np[s,K])
        * (distance_matrix_np[sol_s,sol[K]] - distance_matrix_np[sol_r,sol[K]]) +
        (stream_matrix_np[K,r] - stream_matrix_np[K,s])
        * (distance_matrix_np[sol[K],sol_s] - distance_matrix_np[sol[K],sol_r]))

def deltaC2(r, s, sol):
    delta = 0
    sol_r, sol_s = sol[r], sol[s]
    for k in xrange(n):
        if k != r and k != s:
            sol_k = sol[k]
            delta += \
                (stream_matrix[r][k] - stream_matrix[s][k]) \
                * (distance_matrix[sol_s][sol_k] - distance_matrix[sol_r][sol_k]) \
                + \
                (stream_matrix[k][r] - stream_matrix[k][s]) \
                * (distance_matrix[sol_k][sol_s] - distance_matrix[sol_k][sol_r])
    return delta
import time
N=200
elapsed1s = []
elapsed2s = []
elapsed3s = []
ns = range(10,410,10)
for n in ns:
    distance_matrix_np = np.random.uniform(0, n**2, size=(n,n))
    stream_matrix_np = np.random.uniform(0, n**2, size=(n,n))
    distance_matrix = distance_matrix_np.tolist()
    stream_matrix = stream_matrix_np.tolist()
    sol = range(n-1, -1, -1)
    sol_np = np.array(range(n-1, -1, -1))
    Is = np.random.randint(0, n-1, 4)
    Js = np.random.randint(0, n-1, 4)

    total1 = 0
    start = time.clock()
    for reps in xrange(N):
        for i in Is:
            for j in Js:
                total1 += deltaC(i, j, sol)
    elapsed1 = (time.clock() - start)

    total2 = 0
    start = time.clock()
    for reps in xrange(N):
        for i in Is:
            for j in Js:
                total2 += deltaC_np(i, j, sol_np)
    elapsed2 = (time.clock() - start)

    total3 = 0
    start = time.clock()
    for reps in xrange(N):
        for i in Is:
            for j in Js:
                total3 += deltaC2(i, j, sol_np)
    elapsed3 = (time.clock() - start)

    print n, elapsed1, elapsed2, elapsed3, total1, total2, total3
    elapsed1s.append(elapsed1)
    elapsed2s.append(elapsed2)
    elapsed3s.append(elapsed3)

    #Check errors of one method against another
    #err = 0
    #for i in range(min(n,50)):
    #    for j in range(min(n,50)):
    #        err += np.abs(deltaC(i,j,sol)-deltaC_np(i,j,sol_np))
    #print err
import matplotlib.pyplot as plt
plt.plot(ns, elapsed1s, label='Original',lw=2)
plt.plot(ns, elapsed3s, label='Optimized',lw=2)
plt.plot(ns, elapsed2s, label='numpy',lw=2)
plt.legend(loc='upper left', prop={'size':16})
plt.xlabel('matrix size')
plt.ylabel('time')
plt.show()
And here is the original graph before optimizing out the list comprehension in deltaC_np
Related
I'm trying to run insertion sort and merge sort and plot them. I'm taking the time for 5 different N and plotting them. I want to do this three times, such that insertion time < merge time, insertion time = merge time, and insertion time > merge time. However, no matter what I set as N, insertion sort is always much faster. This is my output for N = 5000:
N Values: [1, 1001, 2001, 3001, 4001]
Merge Sort: [0.005, 11.198, 21.965, 35.996, 49.971000000000004]
Insertion Sort: [0.002, 0.268, 0.545, 0.9129999999999999, 1.177]
I have tried different N up to like 10000000 and merge sort is always slower. What am I missing here?
import datetime

def insertion_sort(array):
    start_time = datetime.datetime.now()
    for j in range(1, len(array)):
        key = array[j]
        i = j - 1
        while i >= 0 and array[i] > key:
            array[i + 1] = array[i]
            i -= 1
        array[i + 1] = key
    time_diff = datetime.datetime.now() - start_time
    return time_diff.total_seconds() * 1000

def merge_sort(arr, p, r):
    start_time = datetime.datetime.now()
    if p < r:
        m = (p + (r - 1)) // 2
        merge_sort(arr, p, m)
        merge_sort(arr, m + 1, r)
        merge(arr, p, m, r)
    time_diff = datetime.datetime.now() - start_time
    return time_diff.total_seconds() * 1000

def merge(arr, p, q, r):
    n1 = q - p + 1
    n2 = r - q
    L = [0] * (n1 + 1)
    R = [0] * (n2 + 1)
    for i in range(0, n1):
        L[i] = arr[p + i]
    for j in range(0, n2):
        R[j] = arr[q + 1 + j]
    i = 0
    j = 0
    k = p
    while i < n1 and j < n2:
        if L[i] <= R[j]:
            arr[k] = L[i]
            i += 1
        else:
            arr[k] = R[j]
            j += 1
        k += 1
    while i < n1:
        arr[k] = L[i]
        i += 1
        k += 1
    while j < n2:
        arr[k] = R[j]
        j += 1
        k += 1
    return arr

x, y1, y2 = [], [], []
N = 5000
for i in range(1, N, N // 5):
    array = [j for j in range(i)]
    array = array[::-1]  # Array in reversed order
    x.append(i)
    y1.append(merge_sort(array, 0, len(array) - 1))
    y2.append(insertion_sort(array))
Your test is not correct, as you don't provide the same array order to the two sorting algorithms.
As the first algorithm sorts the array (in-place), the second gets a sorted array.
To make a fair comparison, make sure to make a copy of the original array, e.g. using [:]:
y1.append(merge_sort(array[:], 0, len(array) - 1))
y2.append(insertion_sort(array[:]))
And now the results will show what you really expected.
Insertion sort runs very fast on sorted arrays. It just does n comparisons, and that's it.
So, a question for you: Why is insertion sort called with a sorted array in your code?
Hint: Try changing the order of these two lines and see how the running times change:
y1.append(merge_sort(array, 0, len(array) - 1))
y2.append(insertion_sort(array))
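To see this effect directly, you can time the question's insertion_sort on an already-sorted input versus a reversed one (a small sketch reusing the function defined above; the size is arbitrary):
n = 5000
already_sorted = list(range(n))
reversed_order = list(range(n))[::-1]

# insertion sort does ~n comparisons on sorted input, ~n^2/2 on reversed input
print(insertion_sort(already_sorted[:]))   # very fast (a few milliseconds)
print(insertion_sort(reversed_order[:]))   # much slower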
You're missing several significant factors:
Algorithmic complexity is a ratio as N approaches infinity. You're nowhere near infinity. :-) Some algorithms have a high overhead, such that their efficiencies don't begin to dominate execution time until you get to much larger lists.
If you want to see these efficiency effects, you have to efficiently implement the algorithms. Your code has a lot of superfluous overhead, especially in the merge sort. I recommend that you research a better implementation, as you're doing some extra copying and list building that adds nothing to the final result.
Whatever you choose to implement, you need to research its properties. As you can see from the raw figures and the graph, both of your functions are still dominated by the linear and constant components of the implementation, and have not yet reached the parts of the curve where the N^2 and N log N terms dominate.
This is a performance question. I am trying to optimize the following double for loop. Here is an MWE:
import numpy as np
from timeit import default_timer as tm
# L1 and L2 will range from 0 to 3 typically, sometimes up to 5
# all of the following are dummy values but match correct `type`
L1, L2, x1, x2, fac = 2, 3, 2.0, 4.5, 2.3
saved_values = np.random.uniform(high=75.0, size=[max(L1,L2) + 1, max(L1,L2) + 1])
facts = np.random.uniform(high=65.0, size=[L1 + L2 + 1])
val = 0
start = tm()
for i in range(L1+1):
    sf = saved_values[L1][i] * x1 ** (L1 - i)
    for j in range(L2 + 1):
        m = i + j
        if m % 2 == 0:
            num = sf * facts[m] / (2 * fac) ** (m / 2)
            val += saved_values[L2][j] * x1 ** (L1 - j) * num
end = tm()
time = end-start
print("Long way: time taken was {} and value is {}".format(time, val))
My idea for a solution is to take out the if m % 2 == 0: statement, calculate all i and j combinations (i.e. a matrix), which I should be able to vectorize, and then use something like np.where() to add up only the elements meeting the requirement m % 2 == 0, where m = i + j.
Even if this is not faster than the explicit for loops, it should still be vectorized, because in reality I will be sending arrays to a function containing the double for loop, so being able to do that part vectorized should get me the speed gains I am after, even if vectorizing this double loop alone does not.
I am stuck spinning my wheels on how to broadcast while also accounting for the sf factor and the m factor in the inner loop.
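Below is a hedged sketch of that broadcasting idea (not a benchmarked or definitive implementation): build the full (L1+1) x (L2+1) grid of terms from an outer i/j index grid, then mask out the odd-m entries with np.where. The variable names follow the MWE above.
import numpy as np

# dummy values matching the MWE above
L1, L2, x1, x2, fac = 2, 3, 2.0, 4.5, 2.3
saved_values = np.random.uniform(high=75.0, size=[max(L1, L2) + 1, max(L1, L2) + 1])
facts = np.random.uniform(high=65.0, size=[L1 + L2 + 1])

# reference: the original double loop
val = 0.0
for i in range(L1 + 1):
    sf = saved_values[L1][i] * x1 ** (L1 - i)
    for j in range(L2 + 1):
        m = i + j
        if m % 2 == 0:
            num = sf * facts[m] / (2 * fac) ** (m / 2)
            val += saved_values[L2][j] * x1 ** (L1 - j) * num

# broadcasting sketch: an (L1+1, L2+1) grid of all i, j combinations
i = np.arange(L1 + 1)[:, None]      # column vector of i
j = np.arange(L2 + 1)[None, :]      # row vector of j
m = i + j                           # matrix of all i + j
sf = saved_values[L1, :L1 + 1] * x1 ** (L1 - np.arange(L1 + 1))
terms = (sf[:, None]
         * facts[m] / (2 * fac) ** (m / 2)
         * saved_values[L2, :L2 + 1]
         * x1 ** (L1 - j))
val_vec = np.where(m % 2 == 0, terms, 0.0).sum()

print(val, val_vec)                 # the two results should agree to floating point precision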
Given:
I : a positive integer
n : a positive integer
The nth term of the sequence for input I:
F(I,1) = (I * (I+1)) / 2
F(I,2) = F(I,1) + F(I-1,1) + F(I-2,1) + .... + F(2,1) + F(1,1)
F(I,3) = F(I,2) + F(I-1,2) + F(I-2,2) + .... + F(2,2) + F(1,2)
..
..
F(I,n) = F(I,n-1) + F(I-1,n-1) + F(I-2,n-1) + .... + F(2,n-1) + F(1,n-1)
nth term --> F(I,n)
Approach 1: Used recursion to find the above:
def recursive_sum(I, n):
    if n == 1:
        return (I * (I + 1)) // 2
    else:
        return sum(recursive_sum(j, n - 1) for j in range(I, 0, -1))
Approach 2: Iteration to store reusable values in a dictionary, then using this dictionary to get the nth term:
def non_recursive_sum_using_data(I, n):
    global data
    if n == 1:
        return (I * (I + 1)) // 2
    else:
        return sum(data[j][n - 1] for j in range(I, 0, -1))

def iterate(I, n):
    global data
    data = {}
    for i in range(1, n + 1):
        for j in range(I + 1):
            if j not in data:
                data[j] = {}
            data[j][i] = recursive_sum(j, i)
    return data[I][n]
The recursion approach is obviously not efficient, due to the maximum recursion depth. The next approach's time and space complexity will also be poor.
Is there a better way to recurse, or a different approach than recursion?
I am curious whether we can find a formula for the nth term.
You could just cache your recursive results:
from functools import lru_cache

@lru_cache(maxsize=None)
def recursive_sum(I, n):
    if n == 1:
        return (I * (I + 1)) // 2
    return sum(recursive_sum(j, n - 1) for j in range(I, 0, -1))
That way you can get the readability and brevity of the recursive approach without most of the performance issues since the function is only called once for each argument combination (I, n).
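For example (with arbitrary argument values), you can check the caching behaviour through the cache_info() method that lru_cache attaches to the wrapped function:
recursive_sum(10, 5)               # first call populates the cache
recursive_sum(10, 5)               # repeated call is answered from the cache
print(recursive_sum.cache_info())  # shows hits, misses and current cache size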
Using the usual binomial(n,k) = n!/(k!*(n-k)!), you have
F(I,n) = binomial(I+n, n+1).
Then you can choose the method you like most to compute binomial coefficients.
Here is an example:
def binomial(n, k):
    numerator = denominator = 1
    t = max(k, n - k)
    for low, high in enumerate(range(t + 1, n + 1), 1):
        numerator *= high
        denominator *= low
    return numerator // denominator

def F(I, n):
    return binomial(I + n, n + 1)
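As a quick sanity check (over small, arbitrary ranges), the closed form agrees with the recursive definition from the question:
def recursive_sum(I, n):
    # direct implementation of the definition, for checking only
    if n == 1:
        return (I * (I + 1)) // 2
    return sum(recursive_sum(j, n - 1) for j in range(I, 0, -1))

for I in range(1, 8):
    for n in range(1, 6):
        assert F(I, n) == recursive_sum(I, n)
print('closed form matches the recursive definition')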
The formula for the nth term of the sequence is the one you have already mentioned.
Also, rightly so, you have identified that it will lead to an inefficient algorithm and a stack overflow.
You can look into a dynamic programming approach where you calculate F(I,n) just once and then reuse the value.
For example, this is how the Fibonacci sequence can be calculated:
[just an example] https://www.geeksforgeeks.org/program-for-nth-fibonacci-number/
You need to find the same pattern and cache the values.
I have an example of this in a small program written in Go:
https://play.golang.org/p/vRi-QMj7z2v
The standard DP
One can do a (tiny) bit of math to rewrite your function:
F(i,n) = sum_{k=0}^{i-1} F(i-k, n-1) = sum_{k=1}^{i} F(k, n-1)
Now notice that if you consider a matrix F of size I x n, then to compute F(i,n) we just need to add up the elements of the previous column (up to row i).
(ASCII sketch omitted: F(i,n) is obtained by summing the entries of column n-1 from row 1 down to row i.)
We conclude that we can build the first layer (aka column). Then the second one. And so forth until we get to the n-th layer.
Finally we take the last element of our final layer which is F(i,n)
The computation time is about O(I*n)
More math based but faster
Another way is to consider our layer as a vector variable X.
We can write the recurrence relation as
X_n = M X_{n-1}
where M is a triangular matrix with 1s in its lower part.
We then want to compute the general term of X_n, so we want to compute M^n.
By following Yves Daoust
(I just copy from the link above)
Coefficients should be indexed _{n+1} and _n, but here they are written _1 and without a subscript for readability.
Moreover, the matrix is upper triangular, but we can just take the transpose afterwards...
a_1 b_1 c_1 d_1       1 1 1 1      a b c d
    a_1 b_1 c_1   =   0 1 1 1   *  0 a b c
        a_1 b_1       0 0 1 1      0 0 a b
            a_1       0 0 0 1      0 0 0 a
by going from last row to first:
a = 1
from b_1 = a + b = 1 + b, we get b = n
from c_1 = a + b + c = 1 + n + c, we get c = n(n+1)/2
from d_1 = a + b + c + d = 1 + n + n(n+1)/2 + d, we get d = n(n+1)(n+2)/6
I have not proved it, but I suspect that e = n(n+1)(n+2)(n+3)/24 (so these are basically binomial coefficients)
(I think the proof lies more in the fact that F(i,n) = F(i,n-1) + F(i-1,n) )
More generally, instead of taking variables a, b, c, ..., write X_n(0), X_n(1), ...:
X_n(0) = 1
X_n(i) = n*(n+1)*...*(n+i-1) / i!
And by applying the recursion for computing X:
X_n(0) = 1
X_n(i) = X_n(i-1)*(n+i-1)/i
Finally we deduce F(i,n) as the scalar product Y_{n-1} * X_1 where Y_n is the reversed vector of X_n and X_1(n) = n*(n+1)/2
from functools import lru_cache
# this is copy-pasted from schwobaseggl
@lru_cache(maxsize=None)
def recursive_sum(I, n):
    if n == 1:
        return (I * (I + 1)) // 2
    return sum(recursive_sum(j, n - 1) for j in range(I, 0, -1))

def iterative_sum(I, n):
    layer = [i*(i+1)//2 for i in range(1, I+1)]
    x = 2
    while x <= n:
        next_layer = [layer[0]]
        for i in range(1, I):
            # we don't need to recompute the whole sum every time:
            # take the previous sum and add the new number
            next_layer.append(next_layer[i-1] + layer[i])
        layer = next_layer
        x += 1
    return layer[-1]

def brutus(I, n):
    if n == 1:
        return I*(I+1)//2
    X_1 = [i*(i+1)//2 for i in range(1, I+1)]
    X_n = [1]
    for i in range(1, I):
        X_n.append(X_n[-1] * (n-1 + i-1) / i)
    X_n.reverse()
    s = 0
    for i in range(0, I):
        s += X_1[i]*X_n[i]
    return s

def do(k, n):
    print('rec', recursive_sum(k, n))
    print('it ', iterative_sum(k, n))
    print('bru', brutus(k, n))
    print('---')

do(1, 4)
do(2, 1)
do(3, 2)
do(4, 7)
do(7, 4)
I would like to create a simple multifractal (Binomial Measure). It can be done as follows:
The binomial measure is a probability measure which is defined conveniently via a recursive construction. Start by splitting $ I := [0, 1] $ into two subintervals $ I_0 $ and $ I_1 $ of equal length and assign the masses $ m_0 $ and $ m_1 = 1 - m_0 $ to them. With the two subintervals one proceeds in the same manner and so forth: at stage two, e.g., the four subintervals $ I_{00}, I_{01}, I_{10}, I_{11} $ have masses $ m_0m_0, m_0m_1, m_1m_0, m_1m_1 $ respectively.
Rudolf H. Riedi. Introduction to Multifractals
And it should look like this at the 13th iteration:
I tried to implement it recursively, but something went wrong: it uses the previously changed interval in both the left child and the right one.
def binom_measuare(iterations, val_dct=None, interval=[0, 1], p=0.4, temp=None):
    if val_dct is None:
        val_dct = {str(0.0): 0}
    if temp is None:
        temp = 0
    temp += 1

    x0 = interval[0] + (interval[1] - interval[0]) / 2
    x1 = interval[1]
    print(x0, x1)
    m0 = interval[1] * p
    m1 = interval[1] * (1 - p)
    val_dct[str(x0)] = m0
    val_dct[str(x1)] = m1

    print('DEBUG: iter before while', iterations)
    while iterations != 0:
        if temp % 2 == 0:
            iterations -= 1
            print('DEBUG: iter after while (left)', iterations)
            # left
            interval = [interval[0] + (interval[1] - interval[0]) / 2, interval[1] / 2]
            binom_measuare(iterations, val_dct, interval, p=0.4, temp=temp)
        elif temp % 2 == 1:
            print('DEBUG: iter after while (right)', iterations)
            # right
            interval = [interval[0] + (interval[1] - interval[0]) / 2, interval[1]]
            binom_measuare(iterations, val_dct, interval, p=0.4, temp=temp)
        else:
            return val_dct
Also, I have tried to do this using a for loop, and it does a good job up to the second iteration: on the third iteration it uses 2^3 multipliers rather than 3 ($ m_0m_0m_0 $), 2^4 on the fourth rather than 4, and so on:
iterations = 4
interval = [0, 1]
val_dct = {str(0.0): 0}
p = 0.4
for i in range(1, iterations):
    splits = 2 ** i
    step = interval[1] / splits
    print(splits)
    for k in range(1, splits + 1):
        deg0 = splits // 2 - k // 2
        deg1 = k // 2
        print(deg0, deg1)
        val_dct[str(k * step)] = p ** deg0 * (1 - p) ** deg1
print(val_dct)
The concept seems very easy to implement and probably someone has already done it. Am I just looking at it from the wrong angle?
UPD: Please make sure that your suggestion can achieve the results that are illustrated in the figure above (p=0.4, iteration=13).
UPUPD: Bill Bell provided a nice idea for achieving what Riedi describes in the article. I used Bill's approach and wrote a function that implements it for the needed number of iterations and $ m_0 $ (please see my answer below).
If I understand the principle correctly you could use the sympy symbolic algebra library for making this calculation, along these lines.
>>> from sympy import *
>>> var('m0 m1')
(m0, m1)
>>> layer1 = [m0, m1]
>>> layer2 = [m0*m0, m0*m1, m0*m1, m1*m1]
>>> layer3 = []
>>> for item in layer2:
... layer3.append(m0*item)
... layer3.append(m1*item)
...
>>> layer3
[m0**3, m0**2*m1, m0**2*m1, m0*m1**2, m0**2*m1, m0*m1**2, m0*m1**2, m1**3]
The intervals are always of equal size.
When you need to evaluate the distribution you can use the following kind of code.
>>> [_.subs(m0,0.3).subs(m1,0.7) for _ in layer2]
[0.0900000000000000, 0.210000000000000, 0.210000000000000, 0.490000000000000]
I think that the problem is in your while loop: it doesn't properly handle the base case of your recursion. It stops only when iterations is 0, but keeps looping otherwise. If you want to debug why this forms an infinite loop, I'll leave it to you. Instead, I did my best to fix the problem.
I changed the while to a simple if, made the recursion a little safer by not changing iterations within the routine, and made interval a local copy of the input parameter. You're using a mutable object as a default value, which is dangerous.
def binom_measuare(iterations, val_dct=None, span=[0, 1], p=0.4, temp=None):
    interval = span[:]
    ...
    ...
    print('DEBUG: iter before while', iterations)
    if iterations > 0:
        if temp % 2 == 0:
            print('DEBUG: iter after while (left)', iterations)
            # left
            interval = [interval[0] + (interval[1] - interval[0]) / 2, interval[1] / 2]
            binom_measuare(iterations-1, val_dct, interval, 0.4, temp)
        else:
            print('DEBUG: iter after while (right)', iterations)
            # right
            interval = [interval[0] + (interval[1] - interval[0]) / 2, interval[1]]
            binom_measuare(iterations-1, val_dct, interval, 0.4, temp)
    else:
        return val_dct
This terminates and seems to give somewhat sane results. However, I wonder about your interval computations, when the right boundary can often be less than the left. Consider [0.5, 1.0] ... the left-child recursion will be on the interval [0.75, 0.5]; is that what you wanted?
This is my adaptation of @Bill Bell's answer to my question. It generalizes the idea that he provided.
import matplotlib.pyplot as plt
from sympy import var
def binom_measuare(iterations, p=0.4, current_layer=None):
    var('m0 m1')
    if current_layer is None:
        current_layer = [1]
    next_layer = []
    for item in current_layer:
        next_layer.append(m0*item)
        next_layer.append(m1*item)
    if iterations != 0:
        return binom_measuare(iterations - 1, current_layer=next_layer)
    else:
        return [i.subs(m0, p).subs(m1, 1 - p) for i in next_layer]
Let's plot the output
y = binom_measuare(iterations=12)
x = [(i+1) / len(y) for i in range(len(y))]
x = [0] + x
y = [0] + y
plt.plot(x, y)
I think we have it.
Say I have a 1D array x with positive and negative values in Python, e.g.:
x = random.rand(10) * 10
For a given positive value of K, I would like to find the offset c that makes the sum of positive elements of the array y = x + c equal to K.
How can I solve this problem efficiently?
How about binary search to determine which elements of x + c are going to contribute to the sum, followed by solving the linear equation? The running time of this code is O(n log n), but only O(log n) work is done in Python. The running time could be dropped to O(n) via a more complicated partitioning strategy. I'm not sure whether a practical improvement would result.
import numpy as np
def findthreshold(x, K):
    x = np.sort(np.array(x))[::-1]
    z = np.cumsum(np.array(x))
    l = 0
    u = x.size
    while u - l > 1:
        m = (l + u) // 2
        if z[m] - (m + 1) * x[m] >= K:
            u = m
        else:
            l = m
    return (K - z[l]) / (l + 1)

def test():
    x = np.random.rand(10)
    K = np.random.rand() * x.size
    c = findthreshold(x, K)
    assert np.abs(K - np.sum(np.clip(x + c, 0, np.inf))) / K <= 1e-8
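For instance, here is a small worked example (values chosen by hand): with x = [3, -1, 2, -4] and K = 4, the offset comes out as -0.5, and the positive part of x + c then sums to exactly 4:
x = np.array([3.0, -1.0, 2.0, -4.0])
c = findthreshold(x, 4.0)
print(c)                                  # -0.5
print(np.sum(np.clip(x + c, 0, np.inf)))  # 4.0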
Here's a randomized expected O(n) variant. It's faster (on my machine, for large inputs), but not dramatically so. Watch out for catastrophic cancellation in both versions.
def findthreshold2(x, K):
    sumincluded = 0
    includedsize = 0
    while x.size > 0:
        pivot = x[np.random.randint(x.size)]
        above = x[x > pivot]
        if sumincluded + np.sum(above) - (includedsize + above.size) * pivot >= K:
            x = above
        else:
            notbelow = x[x >= pivot]
            sumincluded += np.sum(notbelow)
            includedsize += notbelow.size
            x = x[x < pivot]
    return (K - sumincluded) / includedsize
You can sort x in descending order, loop over x and compute the required c thus far. If the next element plus c is positive, it should be included in the sum, so c gets smaller.
Note that it might be the case that there is no solution: if you include elements up to m, c is such that element m+1 should also be included, but when you include m+1, c decreases and x[m+1] + c might become negative.
In pseudocode:
sortDescending(x)
i = 0, c = 0, sum = 0
while i < x.length and x[i] + c >= 0
    sum += x[i]
    i++
    c = (K - sum) / i
if i == 0 or x[i-1] + c < 0
    # no solution
The running time is obviously O(n log n) because it is dominated by the initial sort.
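A hedged Python sketch of the pseudocode above (the function name and signature are made up for illustration; it returns None when there is no solution):
import numpy as np

def find_offset_greedy(x, K):
    # Find c so that the positive entries of x + c sum to K, following the greedy scheme above.
    x = np.sort(np.asarray(x, dtype=float))[::-1]   # descending order
    i, c, total = 0, 0.0, 0.0
    while i < x.size and x[i] + c >= 0:
        total += x[i]
        i += 1
        c = (K - total) / i      # c such that the first i shifted elements sum to K
    if i == 0 or x[i - 1] + c < 0:
        return None              # no solution
    return c

# example: matches the binary-search answer on the same array
print(find_offset_greedy([3.0, -1.0, 2.0, -4.0], 4.0))   # -0.5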