Splitting the unit segment into two parts recursively - python

I would like to create a simple multifractal (Binomial Measure). It can be done as follows:
The binomial measure is a probability measure which is defined conveniently via a recursive construction. Start by splitting $ I := [0, 1] $ into two subintervals $ I_0 $ and $ I_1 $ of equal length and assign the masses $ m_0 $ and $ m_1 = 1 - m_0 $ to them. With the two subintervals one proceeds in the same manner and so forth: at stage two, e.g. the four subintervals $ I_{00}, I_{01}, I_{10}, I_{11} $ have masses $ m_0m_0, m_0m_1, m_1m_0, m_1m_1 $ respectively.
Rudolf H. Riedi. Introduction to Multifractals
And it should look like this on the 13th iteration:
I tried to implement it recursively but something went wrong: it uses the previously changed interval in both the left child and the right one:
def binom_measuare(iterations, val_dct=None, interval=[0, 1], p=0.4, temp=None):
    if val_dct is None:
        val_dct = {str(0.0): 0}
    if temp is None:
        temp = 0
    temp += 1
    x0 = interval[0] + (interval[1] - interval[0]) / 2
    x1 = interval[1]
    print(x0, x1)
    m0 = interval[1] * p
    m1 = interval[1] * (1 - p)
    val_dct[str(x0)] = m0
    val_dct[str(x1)] = m1
    print('DEBUG: iter before while', iterations)
    while iterations != 0:
        if temp % 2 == 0:
            iterations -= 1
            print('DEBUG: iter after while (left)', iterations)
            # left
            interval = [interval[0] + (interval[1] - interval[0]) / 2, interval[1] / 2]
            binom_measuare(iterations, val_dct, interval, p=0.4, temp=temp)
        elif temp % 2 == 1:
            print('DEBUG: iter after while (right)', iterations)
            # right
            interval = [interval[0] + (interval[1] - interval[0]) / 2, interval[1]]
            binom_measuare(iterations, val_dct, interval, p=0.4, temp=temp)
    else:
        return val_dct
I have also tried to do this with a for loop, and it does a good job up to the second iteration. On the third iteration, however, it uses 2^3 multipliers rather than 3 (as in $ m_0m_0m_0 $), on the fourth 2^4 rather than 4, and so on:
iterations = 4
interval = [0, 1]
val_dct = {str(0.0): 0}
p = 0.4
for i in range(1, iterations):
    splits = 2 ** i
    step = interval[1] / splits
    print(splits)
    for k in range(1, splits + 1):
        deg0 = splits // 2 - k // 2
        deg1 = k // 2
        print(deg0, deg1)
        val_dct[str(k * step)] = p ** deg0 * (1 - p) ** deg1
print(val_dct)
The concept seems very easy to implement and probably someone has already done it. Am I just looking from another angle?
UPD: Please make sure that your suggestion can achieve the results that are illustrated in the Figure above (p=0.4, iteration=13).
UPUPD: Bill Bell provided a nice idea to achieve what Riedi mentioned in the article. I used Bill's approach and wrote a function that implements it for needed number of iterations and $m_0$ (please see my answer below).

If I understand the principle correctly you could use the sympy symbolic algebra library for making this calculation, along these lines.
>>> from sympy import *
>>> var('m0 m1')
(m0, m1)
>>> layer1 = [m0, m1]
>>> layer2 = [m0*m0, m0*m1, m0*m1, m1*m1]
>>> layer3 = []
>>> for item in layer2:
...     layer3.append(m0*item)
...     layer3.append(m1*item)
...
>>> layer3
[m0**3, m0**2*m1, m0**2*m1, m0*m1**2, m0**2*m1, m0*m1**2, m0*m1**2, m1**3]
The intervals are always of equal size.
When you need to evaluate the distribution you can use the following kind of code.
>>> [_.subs(m0,0.3).subs(m1,0.7) for _ in layer2]
[0.0900000000000000, 0.210000000000000, 0.210000000000000, 0.490000000000000]
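If only numeric values are needed, the same layer-by-layer doubling can be sketched without symbolic algebra. This is a minimal sketch rather than part of the answer above; the function name `binomial_measure_layer` and its parameters are illustrative assumptions.

```python
def binomial_measure_layer(iterations, m0=0.4):
    """Return the 2**iterations masses of the binomial measure, left to right."""
    m1 = 1.0 - m0
    layer = [1.0]  # stage 0: the whole unit interval carries mass 1
    for _ in range(iterations):
        # Each interval splits into a left child (times m0) and a right child (times m1).
        layer = [mass * m for mass in layer for m in (m0, m1)]
    return layer


masses = binomial_measure_layer(2, m0=0.3)
print(masses)
```

With `iterations=2` and `m0=0.3` the four values agree, up to floating-point rounding, with the `layer2` substitution shown above.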

I think that the problem is in your while loop: it doesn't properly handle the base case of your recursion. It stops only when iterations is 0, but keeps looping otherwise. If you want to debug why this forms an infinite loop, I'll leave it to you. Instead, I did my best to fix the problem.
I changed the while to a simple if, made the recursion a little safer by not changing iterations within the routine, and made interval a local copy of the input parameter. You're using a mutable object as a default value, which is dangerous.
def binom_measuare(iterations, val_dct=None, span=[0, 1], p=0.4, temp=None):
    interval = span[:]
    ...
    ...
    print('DEBUG: iter before while', iterations)
    if iterations > 0:
        if temp % 2 == 0:
            print('DEBUG: iter after while (left)', iterations)
            # left
            interval = [interval[0] + (interval[1] - interval[0]) / 2, interval[1] / 2]
            binom_measuare(iterations-1, val_dct, interval, 0.4, temp)
        else:
            print('DEBUG: iter after while (right)', iterations)
            # right
            interval = [interval[0] + (interval[1] - interval[0]) / 2, interval[1]]
            binom_measuare(iterations-1, val_dct, interval, 0.4, temp)
    else:
        return val_dct
This terminates and seems to give somewhat sane results. However, I wonder about your interval computations, when the right boundary can often be less than the left. Consider [0.5, 1.0] ... the left-child recursion will be on the interval [0.75, 0.5]; is that what you wanted?
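To make the interval concern concrete, here is a hedged sketch of a split that always keeps the left boundary below the right one, and of a recursion that visits both children at every level. The names `split` and `binomial_measure` are hypothetical, and this is one possible arrangement rather than a fix the code above is known to need in exactly this form.

```python
def split(interval):
    """Split [left, right] at its midpoint into two ordered halves."""
    left, right = interval
    mid = left + (right - left) / 2
    return [left, mid], [mid, right]


def binomial_measure(iterations, interval=(0.0, 1.0), mass=1.0, m0=0.4):
    """Return a list of ((left, right), mass) pairs after `iterations` splits."""
    if iterations == 0:
        return [(tuple(interval), mass)]
    left_half, right_half = split(interval)
    # The left child gets the fraction m0 of the parent's mass, the right child 1 - m0.
    return (binomial_measure(iterations - 1, left_half, mass * m0, m0)
            + binomial_measure(iterations - 1, right_half, mass * (1 - m0), m0))


print(split([0.5, 1.0]))  # both halves stay ordered
```

Because the interval and the mass are passed down as arguments instead of being reassigned inside a loop, neither child sees changes made for its sibling.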

This is my adaptation of @Bill Bell's answer to my question. It generalizes the idea that he provided.
import matplotlib.pyplot as plt
from sympy import var
def binom_measuare(iterations, p=0.4, current_layer=None):
    var('m0 m1')
    if current_layer is None:
        current_layer = [1]
    next_layer = []
    for item in current_layer:
        next_layer.append(m0*item)
        next_layer.append(m1*item)
    if iterations != 0:
        # pass p along, otherwise the recursive calls fall back to the default 0.4
        return binom_measuare(iterations - 1, p=p, current_layer=next_layer)
    else:
        return [i.subs(m0, p).subs(m1, 1 - p) for i in next_layer]
Let's plot the output
y = binom_measuare(iterations=12)
x = [(i+1) / len(y) for i in range(len(y))]
x = [0] + x
y = [0] + y
plt.plot(x, y)
I think we have it.


vectorizing a double for loop

This is a performance question. I am trying to optimize the following double for loop. Here is a MWE
import numpy as np
from timeit import default_timer as tm
# L1 and L2 will range from 0 to 3 typically, sometimes up to 5
# all of the following are dummy values but match correct `type`
L1, L2, x1, x2, fac = 2, 3, 2.0, 4.5, 2.3
saved_values = np.random.uniform(high=75.0, size=[max(L1,L2) + 1, max(L1,L2) + 1])
facts = np.random.uniform(high=65.0, size=[L1 + L2 + 1])
val = 0
start = tm()
for i in range(L1+1):
    sf = saved_values[L1][i] * x1 ** (L1 - i)
    for j in range(L2 + 1):
        m = i + j
        if m % 2 == 0:
            num = sf * facts[m] / (2 * fac) ** (m / 2)
            val += saved_values[L2][j] * x1 ** (L1 - j) * num
end = tm()
time = end-start
print("Long way: time taken was {} and value is {}".format(time, val))
My idea for a solution is to take out the if m % 2 == 0: statement and then calculate all i and j combinations i.e., a matrix, which I should be able to vectorize, and then use something like np.where() to add up all of the elements meeting the requirement of if m % 2 == 0: where m= i+j.
Even if this is not faster than the explicit for loops, it should be vectorized because in reality I will be sending arrays to a function containing the double for loops, so being able to do that part vectorized, should get me the speed gains I am after, even if vectorizing this double for loop does not.
I am stuck spinning my wheels right now on how to broadcast, but account for the sf factor as well as the m factor in the inner loop.
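One way the idea described above might be sketched is to build the full (i, j) grid with broadcasting and then sum only the entries where i + j is even. The helper names `double_loop` and `vectorized` are hypothetical, and the arithmetic is copied from the MWE exactly as written (including `x1 ** (L1 - j)` in the inner term), so treat this as a sketch under those assumptions.

```python
import numpy as np


def double_loop(L1, L2, x1, fac, saved_values, facts):
    """Reference: the explicit double loop from the MWE."""
    val = 0.0
    for i in range(L1 + 1):
        sf = saved_values[L1][i] * x1 ** (L1 - i)
        for j in range(L2 + 1):
            m = i + j
            if m % 2 == 0:
                num = sf * facts[m] / (2 * fac) ** (m / 2)
                val += saved_values[L2][j] * x1 ** (L1 - j) * num
    return val


def vectorized(L1, L2, x1, fac, saved_values, facts):
    """Same sum, computed on an (L1+1) x (L2+1) grid."""
    i = np.arange(L1 + 1)[:, None]   # column vector of i values
    j = np.arange(L2 + 1)[None, :]   # row vector of j values
    m = i + j                        # broadcast to the full grid
    sf = saved_values[L1, :L1 + 1][:, None] * x1 ** (L1 - i)
    tj = saved_values[L2, :L2 + 1][None, :] * x1 ** (L1 - j)
    terms = sf * tj * facts[m] / (2 * fac) ** (m / 2)
    return terms[m % 2 == 0].sum()   # keep only even i + j


rng = np.random.default_rng(0)
L1, L2, x1, fac = 2, 3, 2.0, 2.3
saved_values = rng.uniform(high=75.0, size=(max(L1, L2) + 1, max(L1, L2) + 1))
facts = rng.uniform(high=65.0, size=L1 + L2 + 1)
print(double_loop(L1, L2, x1, fac, saved_values, facts),
      vectorized(L1, L2, x1, fac, saved_values, facts))
```

The `sf` factor depends only on i and the inner factor only on j, which is why each can be shaped as a column or a row and multiplied together; the boolean mask plays the role of the `if m % 2 == 0` test.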

Optimizing this recursive function (dynamic programming)

I'm solving a very simple algorithm problem that requires recursion and memoization. The code below works fine but it doesn't meet the time limit. Someone advised me to optimize the tail recursion, but this is not tail recursion. This is just study material, not homework.
Question
• A snail can climb 2m per 1 day if it rains, 1m otherwise.
• The probability of raining per day is 75%.
• Given the number of days (<=1000) and the height (<=1000), calculate the probability that the snail can get out of the well (climb more than the height of the well)
This python code is implemented with recursion and memoization.
import sys
sys.setrecursionlimit(10000)
# Probability of success that snails can climb 'targetHeight' within 'days'
def successRate(days, targetHeight):
    global cache
    # edge case
    if targetHeight <= 1:
        return 1
    if days == 1:
        if targetHeight > 2:
            return 0
        elif targetHeight == 2:
            return 0.75
        elif targetHeight == 1:
            return 0.25
    answer = cache[days][targetHeight]
    # if the answer is not previously calculated
    if answer == -1:
        answer = 0.75 * (successRate(days - 1, targetHeight - 2)) + 0.25 * (successRate(days - 1, targetHeight - 1))
        cache[days][targetHeight] = answer
    return answer
height, duration = map(int, input().split())
cache = [[-1 for j in range(height + 1)] for i in range(duration + 1)] # cache initialized as -1
print(round(successRate(duration, height),7))
It is simple, so this is just a hint.
For the initial part, set:
# suppose cache is allocated
cache[1][1] = 0.25
cache[1][2] = 0.75
for i in range(3,targetHeight+1):
    cache[1][i] = 0
for i in range(days+1):
    cache[i][1] = 1
    cache[i][0] = 1
Then try to rewrite the recursive part using the initialized values (you should iterate bottom-up, as shown below). Finally, return the value of cache[days][targetHeight].
for i in range(2, days+1):
    for j in range(2, targetHeight+1):
        cache[i][j] = 0.75 * cache[i-1][j-2] + 0.25 * cache[i-1][j-1]
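Putting the hint together, a bottom-up version might look like the sketch below. It is an assumption-laden illustration: the name `success_rate` and the `p_rain` parameter are hypothetical, "success" is taken to mean climbing at least the target height (as the recursive code in the question does), and the base row is chosen as day 0 rather than day 1 so that a single recurrence covers every cell.

```python
def success_rate(days, target_height, p_rain=0.75):
    """Probability of climbing at least target_height metres within `days` days."""
    # cache[d][h]: probability of gaining at least h metres in d days.
    cache = [[0.0] * (target_height + 1) for _ in range(days + 1)]
    for d in range(days + 1):
        cache[d][0] = 1.0  # nothing left to climb
    for d in range(1, days + 1):
        for h in range(1, target_height + 1):
            rainy = cache[d - 1][max(h - 2, 0)]  # 2 m gained today
            dry = cache[d - 1][h - 1]            # 1 m gained today
            cache[d][h] = p_rain * rainy + (1 - p_rain) * dry
    return cache[days][target_height]


print(round(success_rate(2, 3), 7))
```

The two nested loops fill at most about a million cells for the stated limits, and no recursion depth is involved.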

How to write mathematical sequences in python

I'm completely stuck on a task in one of the exercises we've been given and was hoping someone could help me with it.
The following is the actual task:
Consider the sequence x(n+1) = 0.2x(n) − α(x(n)^2 − 5) with x(0) = 1, for α successively equal to -0.5, +0.5, -0.25, +0.25.
Check the convergence; if the sequence converges, print the message Sequence converged to x= (the value you got), otherwise print No convergence detected.
Check whether there are negative elements in the sequence.
(Hint: if |x(n) − x(n−1)| < 10^−9, consider the sequence to be convergent.)
I'm not sure how to do sequences in python though and haven't found anything that explains how via googling so have been unsuccessful in trying to do it so far. After numerous attempts, I'm completely stuck.
This is the code I've done:
conv = [-0.5, 0.5, -0.25, 0.25]
b=0
a=conv[b]
c=0
for x in conv:
    x1=0.2*x-a*((x**2)-5)
    if abs(x1 - x) < 1.e-9:
        c += 1
    x = x1
    b += 1
if c > 0:
    print('Sequence converged to x=' + x)
if c === 0:
    print('No converge detected')
You need to loop over the values in your "conv" list, assigning them to a, like "for a in conv:". Each iteration of the loop is a sequence as defined by a different "a" value. Then inside that loop, another loop like:
for a in conv:
    convergence_determined = False
    n = 0
    x = 1 # this is x(0) for this sequence
    while not convergence_determined:
        prev_x = x
        n += 1
        next_x = 0.2 * x - a * (x * x - 5)
        x = next_x
        if abs(x - prev_x) < 1.e-9:
            convergence_determined = True
            print('for a = ' + str(a) + ', converged to ' + str(x))
            break
        # you will need some scheme to identify non-convergence
This is not tested code, just to give you an idea of how to proceed.
The procedure is called 'fixed-point iteration'. The question is similar to this SO question, asked yesterday (and likely others).
The sequence definition shows a as a constant. Indeed, letting a vary for a given sequence in the way indicated makes no sense as it would guarantee non-convergence. The instructor's notation is sloppy, but I am sure that the intent is for students to run 4 iterations, one for each value of a. (I say this also because I know what fixed-point iteration behaviors are illustrated by the particular choices for a.)
The code below mixes my answer from the link above with the update code from this question. N is chosen as large enough for this problem (I started larger). Charon, I leave it to you to use the results to answer the questions posed by the instructor.
for a in [-0.5, 0.5, -0.25, 0.25]:
    x = 1.0
    n = 30
    for i in range(n):
        x1 = 0.2*x - a*(x**2 - 5)
        print(i, x) # remove before submitting
        if abs(x1 - x) < 1.e-9:
            print('a = {a}, n = {i}, result is {x}.'.format(a=a, i=i, x=x))
            break
        x = x1
    else:
        print('No convergence within {n} iterations.'.format(n=n))
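For the remaining parts of the exercise (detecting non-convergence and negative elements), the loop can be wrapped in a function that reports all three pieces of information. This is only a sketch: the name `iterate`, the iteration cap `max_iter` and the `blowup` guard are assumptions introduced here, not part of the task statement.

```python
def iterate(alpha, x0=1.0, tol=1e-9, max_iter=1000, blowup=1e150):
    """Run x(n+1) = 0.2*x(n) - alpha*(x(n)**2 - 5) starting from x0.

    Returns (limit or None, steps taken, whether any element was negative).
    """
    x = x0
    saw_negative = x < 0
    steps = 0
    for steps in range(1, max_iter + 1):
        x_next = 0.2 * x - alpha * (x * x - 5)
        saw_negative = saw_negative or x_next < 0
        if abs(x_next - x) < tol:
            return x_next, steps, saw_negative  # converged
        if abs(x_next) > blowup:
            break  # assumed divergence guard: stop before values overflow
        x = x_next
    return None, steps, saw_negative


for alpha in [-0.5, 0.5, -0.25, 0.25]:
    limit, steps, negative = iterate(alpha)
    if limit is None:
        print('alpha = {}: No convergence detected'.format(alpha))
    else:
        print('alpha = {}: Sequence converged to x= {}'.format(alpha, limit))
    print('  negative elements seen:', negative)
```

Returning `None` for the limit keeps "did not converge" distinct from any numeric value, and the caller decides how to phrase the message.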

How to optimize math operations on matrix in python

I am trying to reduce the running time of a function that performs a series of calculations on two matrices. While searching, I came across numpy, but I really do not know how to apply it to my problem. I also think one of the things making my function slow is the many dot operators (I read about that on this page).
The math correspond with a factorization for the Quadratic assignment problem:
My code is:
delta = 0
for k in xrange(self._tam):
    if k != r and k != s:
        delta += \
            self._data.stream_matrix[r][k] \
            * (self._data.distance_matrix[sol[s]][sol[k]] - self._data.distance_matrix[sol[r]][sol[k]]) + \
            self._data.stream_matrix[s][k] \
            * (self._data.distance_matrix[sol[r]][sol[k]] - self._data.distance_matrix[sol[s]][sol[k]]) + \
            self._data.stream_matrix[k][r] \
            * (self._data.distance_matrix[sol[k]][sol[s]] - self._data.distance_matrix[sol[k]][sol[r]]) + \
            self._data.stream_matrix[k][s] \
            * (self._data.distance_matrix[sol[k]][sol[r]] - self._data.distance_matrix[sol[k]][sol[s]])
return delta
Running this on a problem of size 20 (a 20x20 matrix) takes about 20 seconds; the bottleneck is in this function:
ncalls tottime percall cumtime percall filename:lineno(function)
303878 15.712 0.000 15.712 0.000 Heuristic.py:66(deltaC)
I tried to apply map to the for loop, but because the loop body isn't a function call, it is not possible.
How could I reduce the time?
EDIT1
To answer eickenberg comment:
sol is a permutation, for example [1,2,3,4]. The function is called when I am generating neighbor solutions, so a neighbor of [1,2,3,4] is [2,1,3,4]. I change only two positions in the original permutation and then call deltaC, which calculates a factorization of the solution with positions r,s swapped (in the example above r,s = 0,1). This is done to avoid calculating the entire cost of the neighbor solution. I suppose I can store the values of sol[k,r,s] in a local variable to avoid looking them up in each iteration. I do not know if this is what you were asking in your comment.
EDIT2
A minimal working example:
import random
distance_matrix = [[0, 12, 6, 4], [12, 0, 6, 8], [6, 6, 0, 7], [4, 8, 7, 0]]
stream_matrix = [[0, 3, 8, 3], [3, 0, 2, 4], [8, 2, 0, 5], [3, 4, 5, 0]]
def deltaC(r, s, S=None):
    '''
    Difference between C with values i and j swapped
    '''
    S = [0,1,2,3]
    if S is not None:
        sol = S
    else:
        sol = S
    delta = 0
    sol_r, sol_s = sol[r], sol[s]
    for k in xrange(4):
        if k != r and k != s:
            delta += (stream_matrix[r][k] \
                * (distance_matrix[sol_s][sol[k]] - distance_matrix[sol_r][sol[k]]) + \
                stream_matrix[s][k] \
                * (distance_matrix[sol_r][sol[k]] - distance_matrix[sol_s][sol[k]]) + \
                stream_matrix[k][r] \
                * (distance_matrix[sol[k]][sol_s] - distance_matrix[sol[k]][sol_r]) + \
                stream_matrix[k][s] \
                * (distance_matrix[sol[k]][sol_r] - distance_matrix[sol[k]][sol_s]))
    return delta
for _ in xrange(303878):
    d = deltaC(random.randint(0,3), random.randint(0,3))
print d
Now I think the better option is use NumPy. I tried with Matrix(), but did not improve the performance.
Best solution found
Well, finally I was able to reduce the time a bit more by combining @TooTone's solution with storing the indexes in a set to avoid the if. The time has dropped from about 18 seconds to 8 seconds. Here is the code:
def deltaC(self, r, s, sol=None):
    delta = 0
    # use the caller's permutation when one is given, otherwise fall back to self.S
    sol = self.S if sol is None else sol
    sol_r, sol_s = sol[r], sol[s]
    stream_matrix = self._data.stream_matrix
    distance_matrix = self._data.distance_matrix
    indexes = set(xrange(self._tam)) - set([r, s])
    for k in indexes:
        sol_k = sol[k]
        delta += \
            (stream_matrix[r][k] - stream_matrix[s][k]) \
            * (distance_matrix[sol_s][sol_k] - distance_matrix[sol_r][sol_k]) \
            + \
            (stream_matrix[k][r] - stream_matrix[k][s]) \
            * (distance_matrix[sol_k][sol_s] - distance_matrix[sol_k][sol_r])
    return delta
In order to reduce the time even more, I think the best way would be to write a module.
In the simple example you've given, with for k in xrange(4): the loop body only executes three times (if r==s) or twice (if r!=s), and an initial numpy implementation, below, is slower by a large factor. Numpy is optimized for performing calculations over long vectors, and if the vectors are short the overheads can outweigh the benefits. (Note also that in this formula the matrices are being sliced in different dimensions and indexed non-contiguously, which can only make things more complicated for a vectorizing implementation.)
import numpy as np
distance_matrix_np = np.array(distance_matrix)
stream_matrix_np = np.array(stream_matrix)
n = 4
def deltaC_np(r, s, sol):
    delta = 0
    sol_r, sol_s = sol[r], sol[s]
    K = np.array([i for i in xrange(n) if i!=r and i!=s])
    return np.sum(
        (stream_matrix_np[r,K] - stream_matrix_np[s,K]) \
        * (distance_matrix_np[sol_s,sol[K]] - distance_matrix_np[sol_r,sol[K]]) + \
        (stream_matrix_np[K,r] - stream_matrix_np[K,s]) \
        * (distance_matrix_np[sol[K],sol_s] - distance_matrix_np[sol[K],sol_r]))
In this numpy implementation, rather than a for loop over the elements in K, the operations are applied across all the elements in K within numpy. Also, note that your mathematical expression can be simplified. Each term in brackets on the left is the negative of the term in brackets on the right.
This applies to your original code too. For example, (self._data.distance_matrix[sol[s]][sol[k]] - self._data.distance_matrix[sol[r]][sol[k]]) is equal to -1 times (self._data.distance_matrix[sol[r]][sol[k]] - self._data.distance_matrix[sol[s]][sol[k]]), so you were doing unnecessary computation, and your original code can be optimized without using numpy.
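The simplification can be checked numerically. The sketch below compares the four-term and the two-term form on random integer matrices; the function names and the matrices `F` (flows) and `D` (distances) are stand-ins invented for this check, and it is written for Python 3, unlike the Python 2 code in the rest of this answer.

```python
import random


def delta_four_terms(F, D, sol, r, s):
    """Original form: four products per index k."""
    total = 0
    for k in range(len(sol)):
        if k != r and k != s:
            total += (F[r][k] * (D[sol[s]][sol[k]] - D[sol[r]][sol[k]])
                      + F[s][k] * (D[sol[r]][sol[k]] - D[sol[s]][sol[k]])
                      + F[k][r] * (D[sol[k]][sol[s]] - D[sol[k]][sol[r]])
                      + F[k][s] * (D[sol[k]][sol[r]] - D[sol[k]][sol[s]]))
    return total


def delta_two_terms(F, D, sol, r, s):
    """Simplified form: each bracket is reused with the flows subtracted first."""
    total = 0
    for k in range(len(sol)):
        if k != r and k != s:
            sol_k = sol[k]
            total += ((F[r][k] - F[s][k]) * (D[sol[s]][sol_k] - D[sol[r]][sol_k])
                      + (F[k][r] - F[k][s]) * (D[sol_k][sol[s]] - D[sol_k][sol[r]]))
    return total


random.seed(0)
n = 6
F = [[random.randint(0, 9) for _ in range(n)] for _ in range(n)]
D = [[random.randint(0, 9) for _ in range(n)] for _ in range(n)]
sol = list(range(n))
random.shuffle(sol)
print(all(delta_four_terms(F, D, sol, r, s) == delta_two_terms(F, D, sol, r, s)
          for r in range(n) for s in range(n)))
```

Integer entries are used so the comparison is exact; with floating-point data the two forms can differ by rounding error.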
It turns out that the bottleneck in the numpy function is the innocent-looking list comprehension
K = np.array([i for i in xrange(n) if i!=r and i!=s])
Once this is replaced with vectorizing code
if r==s:
    K=np.arange(n-1)
    K[r:] += 1
else:
    K=np.arange(n-2)
    if r<s:
        K[r:] += 1
        K[s-1:] += 1
    else:
        K[s:] += 1
        K[r-1:] += 1
the numpy function is much faster.
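As a sanity check, the index construction can be compared against the list comprehension it replaces. This is a small Python 3 sketch; the wrapper name `excluded_range` is hypothetical, and the two branches for r<s and r>s are folded into one using `min` and `max`.

```python
import numpy as np


def excluded_range(n, r, s):
    """Return the indices 0..n-1 with r and s removed, without a Python-level loop."""
    if r == s:
        K = np.arange(n - 1)
        K[r:] += 1            # shift everything from position r upwards past r
    else:
        K = np.arange(n - 2)
        lo, hi = min(r, s), max(r, s)
        K[lo:] += 1           # skip the smaller excluded index
        K[hi - 1:] += 1       # skip the larger one (its position moved down by one)
    return K


n = 7
ok = all(
    np.array_equal(excluded_range(n, r, s),
                   [i for i in range(n) if i != r and i != s])
    for r in range(n) for s in range(n)
)
print(ok)
```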
A graph of run times is shown immediately below (right at the bottom of this answer is the original graph before optimizing the numpy function). You can see that it either makes sense to use your optimized original code or the numpy code, depending on how large the matrix is.
The full code is below for reference, partly in case someone else can take it further. (The function deltaC2 is your original code optimized to take account of the way the mathematical expression can be simplified.)
def deltaC(r, s, sol):
    delta = 0
    sol_r, sol_s = sol[r], sol[s]
    for k in xrange(n):
        if k != r and k != s:
            delta += \
                stream_matrix[r][k] \
                * (distance_matrix[sol_s][sol[k]] - distance_matrix[sol_r][sol[k]]) + \
                stream_matrix[s][k] \
                * (distance_matrix[sol_r][sol[k]] - distance_matrix[sol_s][sol[k]]) + \
                stream_matrix[k][r] \
                * (distance_matrix[sol[k]][sol_s] - distance_matrix[sol[k]][sol_r]) + \
                stream_matrix[k][s] \
                * (distance_matrix[sol[k]][sol_r] - distance_matrix[sol[k]][sol_s])
    return delta
import numpy as np
def deltaC_np(r, s, sol):
    delta = 0
    sol_r, sol_s = sol[r], sol[s]
    if r==s:
        K=np.arange(n-1)
        K[r:] += 1
    else:
        K=np.arange(n-2)
        if r<s:
            K[r:] += 1
            K[s-1:] += 1
        else:
            K[s:] += 1
            K[r-1:] += 1
    #K = np.array([i for i in xrange(n) if i!=r and i!=s]) #TOO SLOW
    return np.sum(
        (stream_matrix_np[r,K] - stream_matrix_np[s,K]) \
        * (distance_matrix_np[sol_s,sol[K]] - distance_matrix_np[sol_r,sol[K]]) + \
        (stream_matrix_np[K,r] - stream_matrix_np[K,s]) \
        * (distance_matrix_np[sol[K],sol_s] - distance_matrix_np[sol[K],sol_r]))
def deltaC2(r, s, sol):
    delta = 0
    sol_r, sol_s = sol[r], sol[s]
    for k in xrange(n):
        if k != r and k != s:
            sol_k = sol[k]
            delta += \
                (stream_matrix[r][k] - stream_matrix[s][k]) \
                * (distance_matrix[sol_s][sol_k] - distance_matrix[sol_r][sol_k]) \
                + \
                (stream_matrix[k][r] - stream_matrix[k][s]) \
                * (distance_matrix[sol_k][sol_s] - distance_matrix[sol_k][sol_r])
    return delta
import time
N=200
elapsed1s = []
elapsed2s = []
elapsed3s = []
ns = range(10,410,10)
for n in ns:
    distance_matrix_np=np.random.uniform(0,n**2,size=(n,n))
    stream_matrix_np=np.random.uniform(0,n**2,size=(n,n))
    distance_matrix=distance_matrix_np.tolist()
    stream_matrix=stream_matrix_np.tolist()
    sol = range(n-1,-1,-1)
    sol_np = np.array(range(n-1,-1,-1))
    Is = np.random.randint(0,n-1,4)
    Js = np.random.randint(0,n-1,4)
    total1 = 0
    start = time.clock()
    for reps in xrange(N):
        for i in Is:
            for j in Js:
                total1 += deltaC(i,j, sol)
    elapsed1 = (time.clock() - start)
    start = time.clock()
    total2 = 0
    start = time.clock()
    for reps in xrange(N):
        for i in Is:
            for j in Js:
                total2 += deltaC_np(i,j, sol_np)
    elapsed2 = (time.clock() - start)
    total3 = 0
    start = time.clock()
    for reps in xrange(N):
        for i in Is:
            for j in Js:
                total3 += deltaC2(i,j, sol_np)
    elapsed3 = (time.clock() - start)
    print n, elapsed1, elapsed2, elapsed3, total1, total2, total3
    elapsed1s.append(elapsed1)
    elapsed2s.append(elapsed2)
    elapsed3s.append(elapsed3)
    #Check errors of one method against another
    #err = 0
    #for i in range(min(n,50)):
    #    for j in range(min(n,50)):
    #        err += np.abs(deltaC(i,j,sol)-deltaC_np(i,j,sol_np))
    #print err
import matplotlib.pyplot as plt
plt.plot(ns, elapsed1s, label='Original',lw=2)
plt.plot(ns, elapsed3s, label='Optimized',lw=2)
plt.plot(ns, elapsed2s, label='numpy',lw=2)
plt.legend(loc='upper left', prop={'size':16})
plt.xlabel('matrix size')
plt.ylabel('time')
plt.show()
And here is the original graph before optimizing out the list comprehension in deltaC_np

Not sure how to integrate negative number function in data generating algorithm?

I'm having a bit of trouble controlling the results from a data-generating algorithm I am working on. Basically, it takes values from a list and then lists all the different combinations that reach a specific sum. So far the code works fine (I haven't tested scaling it to many variables yet), but I need to allow negative numbers to be included in the list.
The way I think I can solve this problem is to put a collar on the possible results to prevent infinitely many results (if apples is 2 and oranges is -1, then for any sum there will be infinitely many solutions, but if I put a limit on either, it cannot go on forever).
So Here's super basic code that detects weights:
import math
data = [-2, 10,5,50,20,25,40]
target_sum = 100
max_percent = .8 #no value can exceed 80% of total (this is to prevent infinite solutions)
for node in data:
    max_value = abs(math.floor((target_sum * max_percent)/node))
    print node, "'s max value is ", max_value
Here's the code that generates the results(first function generates a table if its possible and the second function composes the actual results. Details/pseudo code of the algo is here: Can brute force algorithms scale? ):
from collections import defaultdict
data = [-2, 10,5,50,20,25,40]
target_sum = 100
# T[x, i] is True if 'x' can be solved
# by a linear combination of data[:i+1]
T = defaultdict(bool) # all values are False by default
T[0, 0] = True # base case
for i, x in enumerate(data): # i is index, x is data[i]
    for s in range(target_sum + 1): #set the range of one higher than sum to include sum itself
        for c in range(s / x + 1):
            if T[s - c * x, i]:
                T[s, i+1] = True
coeff = [0]*len(data)
def RecursivelyListAllThatWork(k, sum): # Using last k variables, make sum
    # /* Base case: If we've assigned all the variables correctly, list this
    #  * solution.
    #  */
    if k == 0:
        # print what we have so far
        print(' + '.join("%2s*%s" % t for t in zip(coeff, data)))
        return
    x_k = data[k-1]
    # /* Recursive step: Try all coefficients, but only if they work. */
    for c in range(sum // x_k + 1):
        if T[sum - c * x_k, k - 1]:
            # mark the coefficient of x_k to be c
            coeff[k-1] = c
            RecursivelyListAllThatWork(k - 1, sum - c * x_k)
            # unmark the coefficient of x_k
            coeff[k-1] = 0
RecursivelyListAllThatWork(len(data), target_sum)
My problem is that I don't know where or how to integrate my limiting code into the main code in order to restrict results and allow for negative numbers. When I add a negative number to the list, it is displayed but not included in the output. I think this is because it is not being added to the table (first function), and I'm not sure how to have it added (while still keeping the program's structure so I can scale it to more variables).
Thanks in advance, and if anything is unclear please let me know.
edit: a bit unrelated (if it detracts from the question, just ignore it), but since you're looking at the code already, is there a way I can utilize both CPUs on my machine with this code? Right now when I run it, it only uses one CPU. I know the technical method of parallel computing in Python but am not sure how to logically parallelize this algorithm.
You can restrict results by changing both loops over c from
for c in range(s / x + 1):
to
max_value = int(abs((target_sum * max_percent)/x))
for c in range(max_value + 1):
This will ensure that any coefficient in the final answer will be an integer in the range 0 to max_value inclusive.
A simple way of adding negative values is to change the loop over s from
for s in range(target_sum + 1):
to
R=200 # Maximum size of any partial sum
for s in range(-R,R+1):
Note that if you do it this way then your solution will have an additional constraint.
The new constraint is that the absolute value of every partial weighted sum must be <=R.
(You can make R large to avoid this constraint reducing the number of solutions, but this will slow down execution.)
The complete code looks like:
from collections import defaultdict
data = [-2,10,5,50,20,25,40]
target_sum = 100
# T[x, i] is True if 'x' can be solved
# by a linear combination of data[:i+1]
T = defaultdict(bool) # all values are False by default
T[0, 0] = True # base case
R=200 # Maximum size of any partial sum
max_percent=0.8 # Maximum weight of any term
for i, x in enumerate(data): # i is index, x is data[i]
    for s in range(-R,R+1): # consider every partial sum from -R to R inclusive
        max_value = int(abs((target_sum * max_percent)/x))
        for c in range(max_value + 1):
            if T[s - c * x, i]:
                T[s, i+1] = True
coeff = [0]*len(data)
def RecursivelyListAllThatWork(k, sum): # Using last k variables, make sum
    # /* Base case: If we've assigned all the variables correctly, list this
    #  * solution.
    #  */
    if k == 0:
        # print what we have so far
        print(' + '.join("%2s*%s" % t for t in zip(coeff, data)))
        return
    x_k = data[k-1]
    # /* Recursive step: Try all coefficients, but only if they work. */
    max_value = int(abs((target_sum * max_percent)/x_k))
    for c in range(max_value + 1):
        if T[sum - c * x_k, k - 1]:
            # mark the coefficient of x_k to be c
            coeff[k-1] = c
            RecursivelyListAllThatWork(k - 1, sum - c * x_k)
            # unmark the coefficient of x_k
            coeff[k-1] = 0
RecursivelyListAllThatWork(len(data), target_sum)
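To see the two constraints working together on a small input, the same table-plus-backtracking scheme can be wrapped in a function that returns the coefficient tuples instead of printing them. This is a sketch under assumptions: the name `bounded_solutions`, the tiny data set and the precomputed `caps` list are illustrative choices made here, and the code targets Python 3.

```python
from collections import defaultdict


def bounded_solutions(data, target_sum, max_percent=0.8, R=200):
    """List coefficient tuples c with sum(c[i] * data[i]) == target_sum.

    Each c[i] is an integer in 0..caps[i]; every partial sum stays within [-R, R].
    """
    caps = [int(abs(target_sum * max_percent / x)) for x in data]
    T = defaultdict(bool)  # T[s, i]: s is reachable using data[:i]
    T[0, 0] = True
    for i, x in enumerate(data):
        for s in range(-R, R + 1):
            if any(T[s - c * x, i] for c in range(caps[i] + 1)):
                T[s, i + 1] = True

    solutions = []
    coeff = [0] * len(data)

    def walk(k, remaining):
        # Using the first k values, make `remaining`.
        if k == 0:
            solutions.append(tuple(coeff))
            return
        x_k = data[k - 1]
        for c in range(caps[k - 1] + 1):
            if T[remaining - c * x_k, k - 1]:
                coeff[k - 1] = c
                walk(k - 1, remaining - c * x_k)
        coeff[k - 1] = 0

    walk(len(data), target_sum)
    return solutions


for solution in bounded_solutions([-1, 3, 4], 10):
    print(solution)
```

Collecting the results in a list makes it easy to compare them with a brute-force enumeration when checking that R is large enough not to cut off valid solutions.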
