Finding the smallest solution set, if one exists (two multipliers) - python

Note: This is the two-multipliers variation of this problem
Given a set A, consisting of floats between 0.0 and 1.0, find a smallest set B such that for each a in A, there is either a value where a == B[x], or there is a pair of unique values where a == B[x] * B[y].
For example, given
$ A = [0.125, 0.25, 0.5, 0.75, 0.9]
A possible (but probably not smallest) solution for B is
$ B = solve(A)
$ print(B)
[0.25, 0.5, 0.75, 0.9]
This satisfies the initial problem, because A[0] == B[0] * B[1], A[1] == B[1], etc., which allows us to recreate the original set A. The length of B is smaller than that of A, but I’m guessing there are smaller answers as well.
I assume that the solution space for B is large, if not infinite. If a solution exists, how would a smallest set B be found?
Notes:
We're not necessarily limited to the items in A. B can consist of any set of values, whether or not they exist in A.
Since items in A are all 0-1 floats, I'm assuming that B will also be 0-1 floats. Is this the case?
This may be a constraint satisfaction problem, but I'm not sure how it would be defined?
Since floating point math is generally problematic, any answer should frame the algorithm around rational numbers.
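To make the condition concrete with exact arithmetic, here is a small sketch of a checker. The name covers is hypothetical, and building each Fraction from the float's str form is an assumption about how the inputs were written (it recovers 9/10 from 0.9 rather than the binary approximation).

```python
from fractions import Fraction
from itertools import combinations

def covers(A, B):
    """Return True if every a in A is an element of B or a product of two distinct elements of B."""
    b = {Fraction(str(v)) for v in B}
    # Products of unordered pairs of distinct elements of B.
    products = {x * y for x, y in combinations(b, 2)}
    reachable = b | products
    return all(Fraction(str(a)) in reachable for a in A)

A = [0.125, 0.25, 0.5, 0.75, 0.9]
print(covers(A, [0.25, 0.5, 0.75, 0.9]))  # the candidate B from the example
print(covers(A, [0.5, 0.75, 0.9]))        # too small: 0.125 and 0.25 are not reachable
```

A checker like this does not find B, but it lets any proposed B be validated without floating-point comparisons.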

Sort the array. For each pair of elements Am, An ∈ A, m < n - calculate their ratio.
Check if the ratio is equal to some element in A, which is not equal to Am nor to An.
Example:
A = { 0.125, 0.25, 0.5, 0.75, 0.9 }
(0.125, 0.25): 0.5 <--- bingo
(0.125, 0.5 ): 0.25 <--- bingo
(0.125, 0.75): 0.1(6)
(0.125, 0.9 ): 0.13(8)
(0.25 , 0.5 ): 0.5
(0.25 , 0.75): 0.(3)
(0.25 , 0.9 ): 0.2(7)
(0.5 , 0.75): 0.(6)
(0.5 , 0.9 ): 0.(5)
(0.75 , 0.9 ): 0.8(3)
The numerator (0.125) is redundant (= 0.25 * 0.5) or (= 0.5 * 0.25)
We can do better by introducing new elements:
Another example:
A = { 0.1, 0.11, 0.12, 0.2, 0.22, 0.24 }
(0.1 , 0.11): 0.(90) ***
(0.1 , 0.12): 0.8(3) +++
(0.1 , 0.2 ): 0.5 <--------
(0.1 , 0.22): 0.(45)
(0.1 , 0.24): 0.41(6)
(0.11, 0.12): 0.91(6) ~~~
(0.11, 0.2 ): 0.55
(0.11, 0.22): 0.5 <--------
(0.11, 0.24): 0.458(3)
(0.12, 0.2 ): 0.6
(0.12, 0.22): 0.(54)
(0.12, 0.24): 0.5 <--------
(0.2 , 0.22): 0.(90) ***
(0.2 , 0.24): 0.8(3) +++
(0.22, 0.24): 0.91(6) ~~~
Any 2 or more pairs (a1,a2), (a3,a4), (... , ...) with a common ratio f can be replaced with { a1, a3, ..., f }.
Hence adding 0.5 to our set makes { 0.1, 0.11, 0.12 } redundant.
B = { 0.2, 0.22, 0.24, 0.5 }
We are now (in the general case) left with an optimization problem: selecting which of these elements to remove and which of these factors to add in order to minimize the cardinality of B (which I leave as an exercise to the reader).
Note that there is no need to introduce numbers greater than 1. B can also be represented as { 0.1, 0.11, 0.12, 2} but this set has the same cardinality.
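The ratio tables above can be reproduced exactly with rational numbers. The following is a minimal sketch, not a full solver: common_ratios is a hypothetical helper name, and it only reports ratios shared by two or more pairs, which are the candidates for new factors.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations

def common_ratios(A):
    """Group pairs (small, large) from A by their exact ratio small / large."""
    vals = sorted(Fraction(str(a)) for a in A)
    groups = defaultdict(list)
    for small, large in combinations(vals, 2):
        groups[small / large].append((small, large))
    # Keep only ratios that occur for at least two pairs.
    return {ratio: pairs for ratio, pairs in groups.items() if len(pairs) >= 2}

A = [0.1, 0.11, 0.12, 0.2, 0.22, 0.24]
for ratio, pairs in common_ratios(A).items():
    print(ratio, [(float(s), float(l)) for s, l in pairs])
```

For the second example this reports the ratio 1/2 with three pairs, plus the three ratios marked ***, +++ and ~~~ with two pairs each. Choosing which of them to use is still the optimization problem described above.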

Google's OR-Tools provides a nice CP solver which can be used to get solutions to this. You can encode your problem as a simple set of boolean variables, saying which variables or combinations of variables are valid.
I start by pulling in the relevant part of the library and setting up a few variables:
from ortools.sat.python import cp_model
A = [0.125, 0.25, 0.5, 0.75, 0.9]
# A = [0.1, 0.11, 0.12, 0.2, 0.22, 0.24]
model = cp_model.CpModel()
we can then define a few helper functions for creating variables from our numbers:
vars = {}
def get_var(val):
    assert val >= 0 and val <= 1
    if val in vars:
        return vars[val]
    var = model.NewBoolVar(str(val))
    vars[val] = var
    return var
pairs = {}
def get_pair(pair):
    if pair in pairs:
        return pairs[pair]
    a, b = pair
    av = get_var(a)
    bv = get_var(b)
    var = model.NewBoolVar(f'[{a} * {b}]')
    model.AddBoolOr([av.Not(), bv.Not(), var])
    model.AddImplication(var, av)
    model.AddImplication(var, bv)
    pairs[pair] = var
    return var
i.e. get_var(0.5) will create a boolean variable (with Name='0.5'), while get_pair((0.5, 0.8)) will create a variable and set constraints so that it's true exactly when 0.5 and 0.8 are both true. there's a useful document on encoding boolean logic in ortools
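To see what the three constraints in get_pair amount to, here is a brute-force truth table check using plain Python booleans (no OR-Tools involved). The function name clauses_hold is only for illustration; each line mirrors one of the constraints.

```python
from itertools import product

def clauses_hold(a, b, pair):
    """Evaluate the three get_pair constraints for one assignment of booleans."""
    return all([
        (not a) or (not b) or pair,  # AddBoolOr([av.Not(), bv.Not(), var])
        (not pair) or a,             # AddImplication(var, av)
        (not pair) or b,             # AddImplication(var, bv)
    ])

# Enumerate all 8 assignments and keep the ones the constraints allow.
valid = [(a, b, p) for a, b, p in product([False, True], repeat=3)
         if clauses_hold(a, b, p)]
for a, b, p in valid:
    print(a, b, p)
```

Every surviving assignment has pair equal to (a and b), so the pair variable behaves like a logical AND of the two value variables.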
then we can go through A figuring out what combinations are valid and adding them as constraints to the solver:
for i, a in enumerate(A):
    opts = {(a,)}
    for a2 in A[i+1:]:
        assert a < a2
        m = a / a2
        if m == a2:
            opts.add((m,))
        elif m < a2:
            opts.add((m, a2))
        else:
            opts.add((a2, m))
    alts = []
    for opt in opts:
        if len(opt) == 1:
            alts.append(get_var(*opt))
        else:
            alts.append(get_pair(opt))
    model.AddBoolOr(alts)
next we need a way of saying that we prefer variables to be false rather than true. the minimal version of this is:
model.Minimize(sum(vars.values()))
but we get much nicer results if we complicate this a bit and put a preference on values that were in A:
costsum = 0
for val, var in vars.items():
    cost = 1000 if val in A else 1001
    costsum += var * cost
model.Minimize(costsum)
finally, we can run our solver and print out a solution:
solver = cp_model.CpSolver()
status = solver.Solve(model)
print(solver.StatusName(status))
if status in {cp_model.FEASIBLE, cp_model.OPTIMAL}:
    B = [val for val, var in vars.items() if solver.Value(var)]
    print(sorted(B))
this gives me back the expected sets of:
[0.125, 0.5, 0.75, 0.9] and [0.2, 0.22, 0.24, 0.5]
for the two examples at the top
you could also encode the fact that you only consider solutions valid if |B| < |A| in the solver, but I'd be tempted to do that outside


How to calculate the logarithmic maximum of n different independent simultaneous bets in Python?

I am trying to calculate the logarithmic maximum of n different bets. However, for this example, I have 2 independent simultaneous bets.
Bet 1 has a win probability of 30% and decimal odds of 12.80.
Bet 2 also has a win probability of 30% and decimal odds of 12.80.
To calculate the logarithmic maximum of 2 independent simultaneous bets, I need to work out the probability of all 4 combinations:
Bet 1 Winning/Bet 2 Winning
Bet 1 Winning/Bet 2 Losing
Bet 1 Losing/Bet 2 Winning
Bet 1 Losing/Bet 2 Losing
Assuming x0 is the amount between 0% and 100% of my portfolio on Bet 1 and x1 is the amount between 0% and 100% of my portfolio on Bet 2, the mathematically optimum stakes on both bets can be solved by maximising the following expression:
0.09log(1 + 11.8x0 + 11.8x1) + 0.21log(1 + 11.8x0 - x1) + 0.21log(1 - x0 + 11.8x1) + 0.49log(1 - x0 - x1), which is maximised at x0 = 0.214648, x1 = 0.214648
(The 11.8 is not a typo, it is simply 12.8 - 1, the profit).
I have tried to implement this calculation in python, with little success. Here is my current code that I need assistance with:
from scipy.optimize import minimize
from math import log
from itertools import product
from sympy import symbols
Bets = [[0.3, 12.8], [0.3, 12.8]]
Odds = [([i[0], 1 - i[0]]) for i in Bets]
OddsList = list(product(Odds[0], Odds[1]))
#Output [(0.3, 0.3), (0.3, 0.7), (0.7, 0.3), (0.7, 0.7)]
Probability = []
for i in range(0, len(OddsList)):
    Probability.append(OddsList[i][0] * OddsList[i][1])
#Output [0.09, 0.21, 0.21, 0.49]
Win = [([i[1] - 1, - 1]) for i in Bets]
WinList = list(product(Win[0], Win[1]))
#Output [(11.8, 11.8), (11.8, -1), (-1, 11.8), (-1, -1)]
xValues = []
for j in range(0, len(Bets)):
    xValues.append(symbols('x' + str(j)))
#Output [x0, x1]
def logarithmic_return(xValues, Probability, WinList):
    Sum = 0
    for i in range(0, len(Probability)):
        Sum += Probability[i] * log (1 + (WinList[i][0] * xValues[0]) + ((WinList[i][1] * xValues[1])))
    return Sum
minimize(logarithmic_return(xValues, Probability, WinList))
#Error TypeError: Cannot convert expression to float
# However, when I do this, it works perfectly:
logarithmic_return([0.214648, 0.214648], Probability, WinList)
#Output 0.3911621722324154
Seems like this is your first time mixing numerical Python with symbolic Python. In short, you cannot use numerical functions (like math.log or scipy.optimize.minimize) on symbolic expressions. You need to convert your symbolic expression to a lambda function first.
Let's try to fix it:
from scipy.optimize import minimize
from itertools import product
from sympy import symbols, lambdify, log
import numpy as np
Bets = [[0.3, 12.8], [0.3, 12.8]]
Odds = [([i[0], 1 - i[0]]) for i in Bets]
OddsList = list(product(Odds[0], Odds[1]))
#Output [(0.3, 0.3), (0.3, 0.7), (0.7, 0.3), (0.7, 0.7)]
Probability = []
for i in range(0, len(OddsList)):
    Probability.append(OddsList[i][0] * OddsList[i][1])
#Output [0.09, 0.21, 0.21, 0.49]
Win = [([i[1] - 1, - 1]) for i in Bets]
WinList = list(product(Win[0], Win[1]))
#Output [(11.8, 11.8), (11.8, -1), (-1, 11.8), (-1, -1)]
xValues = []
for j in range(0, len(Bets)):
    xValues.append(symbols('x' + str(j)))
#Output [x0, x1]
def logarithmic_return(xValues, Probability, WinList):
    Sum = 0
    for i in range(0, len(Probability)):
        Sum += Probability[i] * log (1 + (WinList[i][0] * xValues[0]) + ((WinList[i][1] * xValues[1])))
    return Sum
# this is the symbolic expression
expr = logarithmic_return(xValues, Probability, WinList)
# convert the symbolic expression to a lambda function for
# numerical evaluation
f = lambdify(xValues, expr)
# minimize expects a function of the type f(x), not f(x0, x1).
# hence, we create a wrapper function
func_to_minimize = lambda x: f(x[0], x[1])
initial_guess = [0.5, 0.5]
minimize(func_to_minimize, initial_guess)
# fun: -inf
# hess_inv: array([[1, 0],
# [0, 1]])
# jac: array([nan, nan])
# message: 'NaN result encountered.'
# nfev: 3
# nit: 0
# njev: 1
# status: 3
# success: False
# x: array([0.5, 0.5])
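As you can see, minimize now runs, but it does not find a solution. The initial guess [0.5, 0.5] makes 1 - x0 - x1 equal to 0, so the expression evaluates to log(0) = -inf there. Note also that minimize looks for a minimum, while you want a maximum. Fixing that is left to you; the code above only shows how to turn the expression into a function that minimize can evaluate.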
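One possible way to get from here to the stakes quoted in the question is to minimize the negative of the expression and keep the optimizer away from the region where a log argument reaches zero. The following is a minimal sketch under those assumptions, not the original poster's code: it uses plain NumPy instead of SymPy, and the names neg_log_growth, probs and payoffs as well as the 0.99 cap on the total stake are illustrative choices.

```python
import numpy as np
from itertools import product
from scipy.optimize import minimize

bets = [(0.3, 12.8), (0.3, 12.8)]  # (win probability, decimal odds)

# Enumerate every win/lose combination as pairs of (probability, net payoff per unit staked).
outcomes = list(product(*[((p, odds - 1.0), (1.0 - p, -1.0)) for p, odds in bets]))
probs = np.array([np.prod([p for p, _ in combo]) for combo in outcomes])
payoffs = np.array([[gain for _, gain in combo] for combo in outcomes])

def neg_log_growth(x):
    # Negative expected log-wealth, so that minimizing it maximizes growth.
    return -np.sum(probs * np.log(1.0 + payoffs @ x))

# Assumption: cap the total stake at 99% so that 1 - x0 - x1 stays positive.
cons = [{'type': 'ineq', 'fun': lambda x: 0.99 - np.sum(x)}]
bnds = [(0.0, 0.99)] * len(bets)

res = minimize(neg_log_growth, x0=[0.1] * len(bets), method='SLSQP',
               bounds=bnds, constraints=cons, options={'ftol': 1e-12})
print(res.x, -res.fun)
```

With these settings the result should land close to the x0 = x1 ≈ 0.2146 quoted in the question. Because the enumeration is built with itertools.product, the same sketch extends to more than two bets by adding entries to bets.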
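The problem here is that scipy.optimize.minimize expects a function as its first argument. You are not passing a function: you are calling your function and passing its return value to minimize, and with the symbolic xValues that call already fails inside math.log.
You need to pass the function itself, together with a numeric initial guess instead of the symbols, for example:
minimize( logarithmic_return, [0.1, 0.1], args=(Probability, WinList) )
Keep in mind that this minimizes the expression; to maximize it, minimize its negative.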

Maximize objective using scipy (by Kelly criterion)

I have the following two pandas dataframes: new & outcome
import pandas as pd
import numpy as np
new = pd.DataFrame([[5,5,1.6],[0.22,0.22,0.56]]).T
new.index = ['Visitor','Draw','Home']
new.columns = ['Decimal odds', 'Win prob']
new['Bet amount'] = np.zeros((len(new),1))
With output:
Decimal odds Win prob Bet amount
Visitor 5.0 0.22 0.0
Draw 5.0 0.22 0.0
Home 1.6 0.56 0.0
And dataframe 'outcome'
outcome = pd.DataFrame([[0.22,0.22,0.56],[100,100,100]]).T
outcome.index = ['Visitor win','Draw','Home win']
outcome.columns = ['Prob.','Starting bankroll']
outcome['Wins'] = ((new['Decimal odds'] - 1) * new['Bet amount']).values
outcome['Losses'] = [sum(new['Bet amount'][[1,2]]) , sum(new['Bet amount'][[0,2]]), sum(new['Bet amount'][[0,1]])]
outcome['Ending bankroll'] = outcome['Starting bankroll'] + outcome['Wins'] - outcome['Losses']
outcome['Logarithm'] = np.log(outcome['Ending bankroll'])
With output:
Prob. Starting bankroll Wins Losses Ending bankroll Logarithm
Visitor win 0.22 100.0 0.0 0.0 100.0 4.60517
Draw 0.22 100.0 0.0 0.0 100.0 4.60517
Home win 0.56 100.0 0.0 0.0 100.0 4.60517
Hereby the objective is calculated by the formula below:
objective = sum(outcome['Prob.'] * outcome['Logarithm'])
Now I want to maximize the objective over the values contained in column new['Bet amount']; call them a, b and c. The constraints are that a, b and c are each bounded between 0 and 100, and that the sum of a, b and c must be below 100. The reason is that a, b and c represent the percentages of your bankroll that are used to place each sports bet.
Want to achieve this using the scipy library. My code so far looks like:
from scipy.optimize import minimize
prob = new['Win prob']
decimal = new['Decimal odds']
bank = outcome['Starting bankroll'][0]
def constraint1(bet):
    a,b,c = bet
    return 100 - a + b + c
con1 = {'type': 'ineq', 'fun': constraint1}
cons = [con1]
b0, b1, b2 = (0,100), (0,100), (0,100)
bnds = (b0, b1, b2)
def f(bet, sign = -1):
    global prob, decimal, bank
    p0,p1,p2 = prob
    d0,d1,d2 = decimal
    a,b,c = bet
    wins0 = a * (d0-1)
    wins1 = b * (d1-1)
    wins2 = c * (d2-1)
    loss0 = b + c
    loss1 = a + c
    loss2 = a + b
    log0 = np.log(bank + wins0 - loss0)
    log1 = np.log(bank + wins1 - loss1)
    log2 = np.log(bank + wins2 - loss2)
    objective = (log0 * p0 + log1 * p1 + log2 * p2)
    return sign * objective
bet = [5,8,7]
result = minimize(f, bet, method = 'SLSQP', bounds = bnds, constraints = cons)
This however, does not result in the desired result. Desired result would be:
a = 3.33
b = 3.33
c = 0
My question is also how to set the method and initial values. Results seem to differ a lot when assigning different methods and initial values for the bets.
Any help would be greatly appreciated!
(This is an example posted on the pinnacle website: https://www.pinnacle.com/en/betting-articles/Betting-Strategy/the-real-kelly-criterion/HZKJTFCB3KNYN9CJ)
If you print out the "bet" values inside your function, you can see where it's going wrong.
[5. 8. 7.]
[5.00000001 8. 7. ]
[5. 8.00000001 7. ]
[5. 8. 7.00000001]
[5.00040728 7.9990977 6.99975556]
[5.00040729 7.9990977 6.99975556]
[5.00040728 7.99909772 6.99975556]
[5.00040728 7.9990977 6.99975558]
[5.00244218 7.99458802 6.99853367]
[5.0024422 7.99458802 6.99853367]
The algorithm is trying to optimize the formula with very small adjustments relative to your initial values, and it never adjusts enough to get to the values you're looking for.
If you check scipy webpage, you find https://docs.scipy.org/doc/scipy/reference/optimize.minimize-slsqp.html#optimize-minimize-slsqp
eps float
Step size used for numerical approximation of the Jacobian.
result = minimize(f, bet, method='SLSQP', bounds=bnds, constraints=cons,
options={'maxiter': 100, 'ftol': 1e-06, 'iprint': 1, 'disp': True,
'eps': 1.4901161193847656e-08, 'finite_diff_rel_step': None})
So you're starting off with a step size of about 1.5e-08, which means your initial estimates are many orders of magnitude outside the range where the algorithm is going to be looking.
I'd recommend normalizing your bets to values between zero and 1. So instead of saying I'm placing a bet between 0 and 100, just say you're wagering a fraction of your net wealth between 0 and 1. A lot of algorithms are designed to work with standardized inputs (between 0 and 1) or normalized inputs (standard deviations from the mean).
Also, it looks like:
def constraint1(bet):
    a,b,c = bet
    return 100 - a + b + c
should be:
def constraint1(bet):
    a,b,c = bet
    return 100 - (a + b + c)
but I don't think that impacts your results.
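Putting the two suggestions together (bets expressed as fractions of the bankroll, and the sum constraint written with parentheses) could look like the sketch below. It is one possible arrangement rather than the poster's code: the names neg_expected_log, probs and odds, the starting point and the 0.99 cap on the total stake are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import minimize

probs = np.array([0.22, 0.22, 0.56])  # visitor, draw, home
odds = np.array([5.0, 5.0, 1.6])      # decimal odds

def neg_expected_log(fracs):
    # Bankroll multiplier if outcome i occurs: all stakes are paid out,
    # and the stake on outcome i returns odds[i] times its size.
    ending = 1.0 - np.sum(fracs) + odds * fracs
    return -np.sum(probs * np.log(ending))

# Assumption: total stake capped at 99% so every log argument stays positive.
cons = [{'type': 'ineq', 'fun': lambda f: 0.99 - np.sum(f)}]
bnds = [(0.0, 0.99)] * 3

res = minimize(neg_expected_log, x0=[0.05, 0.05, 0.05], method='SLSQP',
               bounds=bnds, constraints=cons, options={'ftol': 1e-12})
bets = 100 * res.x  # back to percentages of the bankroll
print(bets)
```

Setting the derivative of the objective to zero at a = b, c = 0 gives 0.66 / (1 + 3a) = 0.56 / (1 - 2a), i.e. a = 1/30, which matches the desired 3.33 / 3.33 / 0 split, so the printed values should be close to that.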

How to get rid of the ZeroDivisionError: float division by zero

I keep running into the error float division by zero and can't understand why I am getting it. However when I run the code originally given to me (written and run in matlab) no errors occur.
The Code
import numpy as np
import matplotlib.pyplot as plt
from astropy import constants as const
#Part 1: Exploring Rotation Curves
M = 10**42 #Approximate mass of the Milky Way (kg)
G = const.G #Universal gravitational constant (m^3 kg^-1 s^-2)
r = np.linspace(0, 3e20) #Radii (m)
rkpc = r*(3.24e-20) #Radii (kpc)
plt.figure(1)
plt.title('Rotation Curves for Three Mass Distributions')
v1 = np.sqrt(G * M / r) # Orbital velocity in system with central mass (m/s)
M_prop = np.linspace(0, M) # Array of masses increasing proportionally with radius
v2 = np.sqrt(G * M_prop / r)
M_dens = (M * (r / (max(r)))**3)
v3 = np.sqrt((G * M_dens) / r)
plt.plot(rkpc, v1/1000, 'b', label = 'Constant M_{r}')
plt.plot(rkpc, v2/1000, 'k', label = 'M_{r} \propto r')
plt.plot(rkpc, v3/1000, 'r', label = 'M_{r} \propto r^{3}')
I know the error is occurring due to the two following lines
M_dens = (M * (r / (max(r)))**3)
v3 = np.sqrt((G * M_dens) / r)
I assume it is happening due to the max(r) but would someone be able to shed more light on why this is happening? Potentially a fix?
Sorry if this doesn't work, I'm a bit rough with math commands like these.
In this line:
r = np.linspace(0, 3e20)
r will start as 0. Later in this line:
v3 = np.sqrt((G * M_dens) / r)
you divide by r, which is 0.
Anything divided by 0 is undefined, so Python doesn't like it and raises the error.
I'm not sure how matlab handles divide by zero but it's possible to change the numpy behaviour using np.errstate.
a = np.arange(-5, 5.)
b = np.arange(-2, 8.)
with np.errstate(divide='ignore'):
    res0 = a / b
    res1 = b / a
print(res0, '\n', res1)
# [ 2.5 4. -inf -2. -0.5 0. 0.25 0.4 0.5 0.57142857]
# [ 0.4 0.25 -0. -0.5 -2. inf 4. 2.5 2. 1.75]
Alternatively create a function which can set the inf, -inf results to a useful default value.
def do_div( a,b, def_val=np.inf):
    with np.errstate(divide='ignore'):
        res = a / b
    res[ ~np.isfinite(res) ] = def_val
    return res
print( do_div( a, b, 100 ))
# [ 2.5 4. 100. -2. -0.5 0. 0.25 0.4 0.5 0.57142857]
print( do_div( b, a, 100 ))
# [ 0.4 0.25 -0. -0.5 -2. 100. 4. 2.5 2. 1.75]
Setting the errstate for divide to 'ignore' suppresses the warning. Numpy returns plus or minus infinity for a divide by zero. The do_div function sets any infinity values to a default. In my work that's most often zero. I've used 100 here so it's easy to see. Matlab probably does something similar returning infinity or an alternative default value and not issuing an error or a warning.
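A variation on the same idea is to skip the division wherever the denominator is zero, using the out and where arguments of np.divide, so no warning needs to be suppressed. This is a sketch only: safe_divide is a hypothetical helper, and the constants in the usage lines are rough stand-ins for G and M from the question, not astropy quantities.

```python
import numpy as np

def safe_divide(num, den, default=0.0):
    """Elementwise num / den, with `default` wherever den == 0."""
    num = np.asarray(num, dtype=float)
    den = np.asarray(den, dtype=float)
    num, den = np.broadcast_arrays(num, den)
    # Pre-fill the result; np.divide only writes where the mask is True.
    out = np.full(num.shape, default, dtype=float)
    np.divide(num, den, out=out, where=den != 0)
    return out

r = np.linspace(0, 3e20, 5)                        # first radius is 0
v_squared = safe_divide(6.674e-11 * 1e42, r, default=np.nan)
print(np.sqrt(v_squared))                          # nan at r = 0, finite elsewhere
```

Using nan as the default keeps the r = 0 point out of a matplotlib plot. Starting the radii at a small positive value instead of 0 would avoid the issue altogether.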

Sampling real numbers with sum and minimum value constraints

How can I sample N random values such that the following constraints are satisfied?
the N values add up to 1.0
none of the values is less than 0.01 (or some other threshold T << 1/N)
The following procedure was my first attempt.
def proportions(N):
    proportions = list()
    for value in sorted(numpy.random.random(N - 1) * 0.98 + 0.01):
        prop = value - sum(proportions)
        proportions.append(prop)
    prop = 1.0 - sum(proportions)
    proportions.append(prop)
    return proportions
The * 0.98 + 0.01 bit was intended to enforce the ≥ 1% constraint. This works on the margins, but doesn't work internally—if two random values have a distance of < 0.01 it is not caught/corrected. Example:
>>> numpy.random.seed(2000)
>>> proportions(5)
[0.3397481983960182, 0.14892479749759702, 0.07456518420712799, 0.005868759570153426, 0.43089306032910335]
Any suggestions to fix this broken approach or to replace it with a better approach?
You could adapt Mark Dickinson's nice solution:
import random
def proportions(n):
    dividers = sorted(random.sample(range(1, 100), n - 1))
    return [(a - b) / 100 for a, b in zip(dividers + [100], [0] + dividers)]
print(proportions(5))
# [0.13, 0.19, 0.3, 0.34, 0.04]
# or
# [0.31, 0.38, 0.12, 0.05, 0.14]
# etc
Note this assumes "none of the values is less than 0.01" is a fixed threshold
UPDATE: We can generalize if we take the reciprocal of the threshold and use that to replace the hard-coded 100 values in the proposed code.
def proportions(N, T=0.01):
    limit = int(1 / T)
    dividers = sorted(random.sample(range(1, limit), N - 1))
    return [(a - b) / limit for a, b in zip(dividers + [limit], [0] + dividers)]
What about this?
N/2 times, choose a random number x such that 1/N+x & 1/N-x fit your constraints; add 1/N+x & 1/N-x
If N is odd, add 1/N
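The pairing idea just described can be sketched as follows. The function name proportions_paired and the choice of drawing x uniformly from [0, 1/N - T] are assumptions; the suggestion itself does not say how x should be drawn.

```python
import random

def proportions_paired(n, t=0.01):
    """Sample n values summing to 1, each at least t, as pairs 1/n + x and 1/n - x."""
    if t > 1.0 / n:
        raise ValueError("threshold t must not exceed 1/n")
    base = 1.0 / n
    values = []
    for _ in range(n // 2):
        # x <= base - t keeps the smaller member of the pair at or above t.
        x = random.uniform(0.0, base - t)
        values.extend([base + x, base - x])
    if n % 2:
        values.append(base)  # odd n: the leftover value is exactly 1/n
    random.shuffle(values)
    return values

sample = proportions_paired(5)
print(sample, sum(sample))
```

This is cheap and always satisfies both constraints, but the values are symmetric around 1/N, so none can exceed 2/N - T. The result is therefore not uniform over all valid combinations.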

DataFrame.corr() - Pearson linear correlation calculated with the same duplicated data?

x=[0.3, 0.3, 0.3, ..., 0.3] (number of 0.3: 10)
y=x
What is the linear correlation coefficient between x and y?
For this x and y, all pairs point to the same point (0.3, 0.3). Can we say x and y are linearly correlated?
scipy.stats.pearsonr(x, y) will give you Yes (1.0, 0.0). But does it make sense?
However, if we change all 0.3 to 3, scipy will give you No (NaN, 1.0). Why is it different from the previous (0.3) case? Is it related to the deviation of floating-point numbers? But if we use 3.0 instead of 3, we still get No (NaN, 1.0). Does anyone know why different inputs generate different outputs?
# When using 0.3:
# result: (1.0, 0.0)
import scipy.stats
a=[]
for i in range(10):
    a.append(0.3)
b=a
scipy.stats.pearsonr(a,b)
# When using int 3:
# result: (nan, 1.0)
import scipy.stats
a=[]
for i in range(10):
    a.append(3)
b=a
scipy.stats.pearsonr(a,b)
# When using 3.0:
# result: (nan, 1.0)
import scipy.stats
a=[]
for i in range(10):
    a.append(3.0)
b=a
scipy.stats.pearsonr(a,b)
See the in-line comments above.
Using the Pearson R coefficient (whose p-value calculation assumes normally distributed data) on a bunch of constants is a mathematically undefined operation.
xm = x - x.mean()
ym = y - y.mean()
r = sum(xm * ym) / np.sqrt( sum(xm**2) * sum(ym**2) )
In other words, if there is no variation in your data, you are dividing by zero.
Now the reason why it works for a repetition of the float 0.3:
a = [0.3 for _ in range(10)] #note that single-decimal only 0.3 and 0.6 fail
b = [3.0 for _ in range(10)]
print(np.asarray(a).mean(), np.asarray(b).mean())
#0.29999999999999993 3.0
print(0.3 - 0.29999999999999993)
#5.551115123125783e-17
So, by merit of this tiny, tiny floating point deviation stemming from the averaging operation, there is something to calculate and the correlation can be pegged at 1.0; although the application of the method is still invalid.
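If the goal is to get the same answer for 0.3 and 3.0, one option is to test for constant input before computing anything, since max == min holds exactly for a constant array no matter how the mean rounds. The sketch below uses the same formula as above; pearson_or_nan is a hypothetical helper name, not a SciPy function.

```python
import numpy as np

def pearson_or_nan(x, y):
    """Pearson r, or nan when either input is constant (zero variance)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # A constant input has max == min exactly, regardless of rounding in mean().
    if x.max() == x.min() or y.max() == y.min():
        return float('nan')
    xm = x - x.mean()
    ym = y - y.mean()
    return float(np.sum(xm * ym) / np.sqrt(np.sum(xm**2) * np.sum(ym**2)))

print(pearson_or_nan([0.3] * 10, [0.3] * 10))  # nan, same as for 3.0
print(pearson_or_nan([1, 2, 3], [2, 4, 6]))    # perfectly linear data
```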
