I am optimizing the behavior of battery storage combined with solar PV to generate the highest possible revenue stream.
I now want to add one more revenue stream: Peak Shaving (or Demand Charge Reduction)
My approach is as follows:
In addition to the price per kWh, an industrial customer pays for the maximum power (kW) drawn from the grid within one period (i = 1:end), the so-called demand charge.
This maximum is found in the vector P_Grid = P_GridLoad (power drawn from the grid to serve the load) + P_GridBatt (power drawn from the grid to charge the battery)
There exists a price vector which tells the price per kW for all points in time
I now want to generate a vector P_GridMax that is zero for all points in time except the moment when the maximum of P_Grid occurs (there it equals max(P_Grid)).
Thus, the vector P_GridMax consists of zeros and one nonzero element (not more!)
In doing so, I can now multiply this vector with the price vector, sum up over all points in time and receive the billed demand charges
By including this vector into the objective of my model I can minimize these charges
Now, does anybody see a solution for how to formulate such a constraint (P_GridMax)? I already updated my objective function and defined P_Grid.
Any other approach would also be welcome.
This is the relevant part of my model, with P_xxx = power flow vectors, C_xxx = price vectors, ...
m.P_Grid = Var(m.i_TIME, within=NonNegativeReals)
m.P_GridMax = Var(m.i_TIME, within=NonNegativeReals)

# Minimize electricity bill
def Total_cost(m):
    return ... + sum(m.P_GridMax[i] * m.C_PowerCosts[i] for i in m.i_TIME) - ...
m.Cost = Objective(rule=Total_cost)

## Peak Shaving constraints
def Grid_Def(m, i):
    return m.P_Grid[i] == m.P_GridLoad[i] + m.P_GridBatt[i]
m.Bound_Grid = Constraint(m.i_TIME, rule=Grid_Def)

def Peak_Rule(m, i):
    ????
    ????
    ????
    ????
m.Bound_Peak = Constraint(m.i_TIME, rule=Peak_Rule)
Thank you very much in advance! Please be aware that I have very little experience with Python/Pyomo coding, so I would really appreciate extensive explanations :)
Best,
Mathias
Another idea is that you don't actually need to index your P_GridMax variable with time.
If you're dealing with demand costs they tend to be fixed over some period, or in your case it seems that they are fixed over the entire problem horizon (since you're only looking for one max value).
In that case you would just need to do:
m.P_GridMax = pyo.Var(domain=pyo.NonNegativeReals)

def Peak_Rule(m, i):
    return m.P_GridMax >= m.P_Grid[i]

m.Bound_Peak = pyo.Constraint(m.i_TIME, rule=Peak_Rule)
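With the scalar variable, the demand-charge term in the objective no longer needs a sum over time. A minimal sketch, assuming the demand charge is billed once per horizon and using a hypothetical scalar price parameter m.C_DemandCharge (the energy-cost terms of your original objective are only indicated by a comment):

def Total_cost(m):
    # m.C_DemandCharge is an assumed scalar Param: price per kW of peak demand
    return m.P_GridMax * m.C_DemandCharge  # + the energy cost terms from the original objective
m.Cost = pyo.Objective(rule=Total_cost)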
If you're really set on multiplying your vectors element-wise, you can also just create a new variable that represents that indexed product and apply the same principle to extract the max value.
Here is one way to do this:
introduce a binary helper variable ismax[i] for i in i_TIME. This variable is 1 if the maximum is obtained in period i and 0 otherwise. Then obviously you have a constraint sum(ismax[i] for i in i_TIME) == 1: the maximum must be attained in exactly one period.
Now you need two additional constraints:
if ismax[i] == 0 then P_GridMax[i] == 0.
if ismax[i] == 1 then for all j in i_TIME we must have P_GridMax[i] >= P_Grid[j].
The best way to formulate this would be to use indicator constraints but I don't know Pyomo so I don't know whether it supports that (I suppose it does but I don't know how to write them). So I'll give instead a big-M formulation.
For this formulation you need to define a constant M so that P_Grid[i] can not exceed that value for any i. With that the first constraint becomes
P_GridMax[i] <= M * ismax[i]
That constraint forces P_GridMax[i] to 0 unless ismax[i] == 1. For ismax[i] == 1 it is redundant.
The second constraint would be for all j in i_TIME
P_GridMax[i] + M * (1 - ismax[i]) >= P_Grid[j]
If ismax[i] == 0 then the left-hand side of this constraint is at least M, so by the definition of M it will be satisfied no matter what the value of P_GridMax[i] is (the first constraint forces P_GridMax[i] == 0 in that case). For ismax[i] == 1 the left-hand side of the constraint becomes just P_GridMax[i], exactly what we want.
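Putting the pieces together, here is a minimal Pyomo sketch of this big-M formulation (variable and constraint names are illustrative, and M is assumed to be a Python constant that bounds P_Grid from above, e.g. the grid connection capacity):

m.ismax = Var(m.i_TIME, within=Binary)

# the maximum must be attained in exactly one period
def One_Max_Rule(m):
    return sum(m.ismax[i] for i in m.i_TIME) == 1
m.Bound_OneMax = Constraint(rule=One_Max_Rule)

# P_GridMax[i] is forced to zero unless ismax[i] == 1
def Zero_Unless_Max_Rule(m, i):
    return m.P_GridMax[i] <= M * m.ismax[i]
m.Bound_ZeroUnlessMax = Constraint(m.i_TIME, rule=Zero_Unless_Max_Rule)

# in the period marked as the maximum, P_GridMax[i] must dominate every P_Grid[j]
def Peak_Rule(m, i, j):
    return m.P_GridMax[i] + M * (1 - m.ismax[i]) >= m.P_Grid[j]
m.Bound_Peak = Constraint(m.i_TIME, m.i_TIME, rule=Peak_Rule)

Note that the last constraint is indexed over pairs (i, j), so it adds |i_TIME|^2 rows; the scalar-variable approach in the other answer is considerably lighter if a single billing period is all you need.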
I've written a function which calculates the proportion of observations that fall within a specified interval. So, if our observations are assessment marks, we can find out the proportion of students that got, say, between 70 and 100 marks. I've included a boolean parameter since in all but the last interval (with the largest observation as the upper bound) we want to say that the value on the upper bound is included in the next interval. For example, if we're looking at marks between 50-70, we don't want to include 70. My function is:
import numpy as np
def compute_interval_proportion(observations, lower_bound, upper_bound, include_upper):
    """
    Calculates the proportion of observations that fall within a specified interval.
    If include_upper is True, the upper bound is included in the interval; otherwise not.
    """
    if include_upper:
        indices = np.where((observations >= lower_bound)
                           & (observations <= upper_bound))
    else:
        indices = np.where((observations >= lower_bound)
                           & (observations < upper_bound))
    count = len(observations[indices])
    proportion = round(count / len(observations), 3)
    return proportion
This function works, I think, but I feel it is a bit pedestrian (e.g. lots of parameters) and perhaps there is a more sleek or quicker way of doing it. Perhaps there is a way of avoiding requiring the user to manually specify whether they want to include the upper bound or not. Any suggestions?
I've tried to simplify your function, the results are below. The main changes are:
We automatically detect if upper is the observations' upper bound, in which case we include the bound in the interval.
numpy conveniently lets you sum booleans by casting False to 0 and True to 1, which allows us to turn the proportion calculation into a simple mean.
def compute_interval_proportion(observations, lower, upper):
    if upper >= observations.max():
        upper_cond = observations <= upper
    else:
        upper_cond = observations < upper
    proportion = ((observations >= lower) & upper_cond).mean()
    return proportion.round(3)
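For example, with some made-up marks (hypothetical data, just to show the calling convention):

import numpy as np

marks = np.array([45, 55, 63, 70, 70, 88, 95, 100])
print(compute_interval_proportion(marks, 50, 70))    # 70 is not the max, so it is excluded: 0.25
print(compute_interval_proportion(marks, 70, 100))   # 100 is the max, so it is included: 0.625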
I am working on an automation project using Python and Selenium. I am about to finish it, but I am stuck on one point.
I have a time value in minutes, like 4529 or 3123. In my scenario, I first take the number of projects and the percentage for each project. Then I need to split the time according to those percentages.
If the user has 3 projects:
first project: 50%
second project: 30%
third project: 20%
But the problem is that when I divide the total time, the results should be whole numbers; they cannot be float values. So I need to adjust the percentages after taking them from the user.
This is what I am trying to do. The code below works correctly sometimes, but in some cases it does not work properly.
Do you have any suggestion?
import math

existingTime = 4529
percentage1 = 70
percentage2 = 30

value1 = (existingTime * percentage1) / 100
value2 = (existingTime * percentage2) / 100

if value1 % 1 != 0:
    mod1 = value1 % 1
    mod2 = value2 % 1
    value1 = int(value1 - mod1)
    value2 = value2 + mod1

if value2 % 1 != 0:
    value2 = math.ceil(value2)

value2 = int(value2)
percentage1 = (value1 * 100) / existingTime
percentage2 = (value2 * 100) / existingTime

print(value1)
print(value2)
total = value2 + value1
print(total)
print(percentage1)
print(percentage2)
Output:
3170
1360
4530
69.99337602119674
30.028703908147495
Let me see if I actually understand your problem.
You have some integer quantity (of minutes), and you want to divide it according to a list of percentages, so that each allocation is an integer which is "as close as possible" to the corresponding percentage and the sum of the allocations is exactly the original quantity.
Is that it?
This is actually a surprisingly subtle problem, usually referred to as the allocation problem, which has been well-studied because it's essentially the same problem as allocating congressional seats in a proportional-representation electoral system, or proportionally allocating representatives to states in a federal state like the US; these are problems which require a solution which is not only provably correct, but also visibly fair.
The complication is that there is more than one possible measure for "as close as possible", and different such measures require subtly different algorithms. The difference between the possible results are very slight, usually not more than a difference of one in any single allocation. But in an election, or a distribution of electoral power, even very slight differences can have dramatic consequences. I'm assuming that your problem is not actually that sensitive, and that it's sufficient that the result be "as close as possible" according to some reasonable criterion, which does not need to be otherwise justified.
So here's an allocation method which is computationally simple and which produces results which are guaranteed to satisfy the so-called "quota rule", which is that each allocation is either the floor or the ceiling of the exact proportional allocation (that is, the allocation ignoring the need that each allocation be an integer). There are reasons why it's not very often used in elections (search for the "Alabama paradox" if you're curious), but it's simple to explain and simple to verify, which are also useful attributes for an election system.
The algorithm is called "largest remainder", because that's the way it works. I'll use your first example, which divides 4529 minutes into three allocations with percentages 50%, 30% and 20%, because dividing into two pieces is just too trivial to show how the algorithm works.
We start by computing the precise allocations, the products of the total allocation with each percentage:
A: 4529 * 50% = 2264.50
B: 4529 * 30% = 1358.70
C: 4529 * 20% = 905.80
Note that since the percentages add up to 1, the sum of the precise allocations is exactly the total allocation.
We start with an initial allocation, using just the integer part of each precise allocation. In mathematical terms, that's the "floor", but in Python we can compute it using the int constructor, which truncates towards 0. (See help(int)). In most cases, that will not sum to the total allocation, because there will almost always be some fraction in the product. So there will be some unallocated units. But, crucially, there will be fewer unallocated units than the number of allocations, because we're dropping only remainders, each of which is strictly less than 1.0, and if you sum k such remainders, the sum must be less than k.
So the next step is to figure out how many unallocated items there are:
M = 4529
A + B + C = 2264 + 1358 + 905 = 4527
M - (A + B + C) = 2
So we still need to allocate two units, and we have three possible places to allocate them. As its name suggests, the largest remainder procedure is to sort the partition in descending order by remainder, and then allocate the rest of the units in turn to the partitions with the largest remainders. In this case, the largest remainder is 0.80 (C) and the second-largest remainder is 0.70 (B), so that's where the two units go, resulting in the allocation:
A: 2264 = 2264 (49.989%)
B: 1358 + 1 = 1359 (30.007%)
C: 905 + 1 = 906 (20.004%)
----
Total 4529
As a python program:
def allocate(m: int, w: list[float]) -> list[int]:
    """ Given a number of units m, allocate them according to
        the vector of weights w.
    """
    # Normalise w
    wsum = sum(w)
    # compute precise allocation for each value
    precise = [m * v / wsum for v in w]
    # and the initial allocation
    alloc = [int(v) for v in precise]
    # compute the shortfall
    needed = m - sum(alloc)
    if needed:  # Almost always true, but just in case
        # make a list of (index, initial, remainder) sorted by remainder
        triples = sorted(((i, a, p - a)
                          for (i, a), p in zip(enumerate(alloc), precise)),
                         key=lambda v: v[2])
        # increment the allocation for the 'needed' largest elements
        triples = (triples[:-needed]
                   + [(i, a + 1, r) for i, a, r in triples[-needed:]])
        # sort again by the index, in order to get the triples back
        # into the right order, and then extract the allocs
        alloc = [t[1] for t in sorted(triples)]
    return alloc
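Running it on the worked example above, and on the 70/30 split from the question, reproduces the hand calculation:

print(allocate(4529, [50, 30, 20]))   # [2264, 1359, 906], sums to 4529
print(allocate(4529, [70, 30]))       # [3170, 1359], sums to 4529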
I have the following OF (objective function) to minimize the cost of the supply chain:
mdl.minimize(mdl.sum((cs+ch+cf+cv*d[j])*q[j] for j in arcs) + mdl.sum(α*(eh+et*d[j])*q[j] for j in arcs) + mdl.sum(β*(gh+gt*d[j])*q[j] for j in arcs) + mdl.sum(X[f]*cjf for f in comb))
Where cs, ch, cf, cv, eh, et, gh, gt, cjf, α and β are a series of constant parameters.
d[j] is the distance between origin i and destination j; the origin-destination pairs are combined in a list of arcs (tuples).
q[j] is the flow variable between origin i and destination j in arcs.
X[f] is a binary variable to open a facility in destination j with capacity f; the possible combinations of j and f are listed in comb.
The first constraint ensures the flow q[i,j] from origin i does not exceed the maximum availability of material Qi in i. D[(i, j)] is a binary parameter that is 1 if the distance between origin i and destination j is less than or equal to a threshold value; otherwise D[(i, j)] is 0. (This parameter helps us to limit the transport distance.)
for i in I: mdl.add_constraint(mdl.sum(q[(i, j)]*D[(i, j)] for j in J) <= Qi[i])
The second constraint 2 ensures the flow q[i,j] to a destination j equals the capacity of the opened facility in destination j with capacity f.
for j in J: mdl.add_constraint(mdl.sum(q[(i, j)] for i in I) == mdl.sum(X[(j, f)] for f in F))
But then we want another constraint 3 that ensures the sum of capacities f of the facilities opened at destinations j is as close as possible to the total capacity demand E. Say there is an energy demand of 100 MW (E = 100): we want to reduce the supply cost in the OF but also make sure we reach the demand E; otherwise the cost-minimal solution would simply be 0 (nothing supplied). This constraint can be formulated like:
mdl.add_constraint(mdl.sum(X[j, f] for j in J for f in F) == E)
Unfortunately, this constraint is never feasible. If we replace == by <=, then it is feasible, but the model settles at minimal cost and the capacity is nowhere near maximal.
We don't need this to be a strict constraint, but we do want to get as close to E as possible by opening multiple facilities at destinations j with different capacities f. (E.g. we could have one destination at 20 MW, one at 5 MW, two at 30 MW and another at 15 MW to reach 100 MW by opening 5 destinations.)
One way is to force the model to open N locations j; however, we have a set of 128 locations. Finding the minimum cost and maximum capacity across the range of scenarios from N=1 to N=128 would mean running the model 128 times.
On top of the above-mentioned constraint we have 3 additional constraints:
We can only select destination j to build a facility and open it at only one capacity f.
The sum of destinations j to open is greater than 0.
There is no negative flow q between origins i and destinations j
Is there a way to:
Make constraint 3 less binding, but still try to reach E while keeping the cost minimal?
Reformulate the OF to combine the minimized cost with the maximized capacity?
Importantly, we do not want to run the model 128 times. We want the model to select the destinations j at which to open a facility and to select the capacity f accordingly, minimizing the overall cost of supply while maximizing the installed capacity. In our case, it would also be very unlikely to open just one destination j to satisfy all demand E. Instead we would have multiple j with smaller capacities f that approach E when summed.
This is 'multi-objective optimisation'. Two possible ways to achieve this are outlined below.
The first approach is to build a combined single objective function. This is easier if both terms work in the same direction, e.g. both are minimising terms. So for your 'constraint 3', try using a term in the objective for the shortfall relative to the demand, where the shortfall is something like:
Shortfall == E - mdl.sum(X[j, f] for j in J for f in F)
Then add the shortfall into the objective, and use some weighting factors for the two terms, e.g.:
w * Cost + (1-w) * Shortfall
Then if w is one, you are just minimising the cost, while if w is zero, you are just minimising the shortfall. Of course, if you use a value for the weighting between zero and one, then you will get a balance of the two objectives. You will need to pay careful attention to the value of the weighting split between your two terms.
A variation on this approach would be to give a much bigger weight to one term than the other, so that term dominates the objective. Then the solver will try to minimise that more important term (e.g. the shortfall), and the other term will help select the lower-cost options for achieving that. In practice this often does NOT work as well as people expect: adding very large and very small terms in the objective can give rise to numerical issues in the solver, and often the real differences in the objective values of different solutions get lost in the solver tolerances anyway. For example, we have seen people use relative weights of 1 million to one while still using an optimality gap of 1e-6; in this case the second term effectively gets lost in the noise because many (perhaps very different) alternatives look almost the same to the solver, fall within the tolerances, and so effectively get ignored.
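A minimal docplex sketch of the weighted single-objective idea, reusing the names from the question (mdl, X, J, F and E are assumed to exist; the shortfall variable, the weight w and total_cost, which stands for the original cost expression, are illustrative):

w = 0.7  # 1.0 -> pure cost minimisation, 0.0 -> pure shortfall minimisation

# defined with >= and a lower bound of 0, so the model may overshoot E
# but pays in the objective for any shortfall below it
shortfall = mdl.continuous_var(lb=0, name='shortfall')
mdl.add_constraint(shortfall >= E - mdl.sum(X[j, f] for j in J for f in F))

# total_cost stands for the original objective expression from the question
mdl.minimize(w * total_cost + (1 - w) * shortfall)

Note that cost and shortfall are on different scales (currency vs. MW), so the weight w will need tuning, as discussed above.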
The second approach is 'lexical multi-objective' solving which is a little more complicated, but doesn't rely on some troublesome weighting factors. Effectively you need to solve your problem twice - once to find the maximum capacity that you can provide, and then fix that with a constraint in your second problem and minimise the cost of delivering that capacity.
In practice you might adjust this purist approach, and accept solutions in your second model that are close enough to achieving the maximum capacity. So for example you may fix the total capacity in your second model to be e.g. at least 99% of the calculated maximum capacity achievable from the first model. This reflects the cases where there are maybe only a few (expensive) ways to achieve the absolute maximum capacity, but there may be worthwhile savings if we explore solutions that are close to that maximum.
Note that there are several solvers that provide ready-built support for multi-objective models using this 'lexical' approach which may avoid you explicitly solving the model twice for each case.
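If you do want to solve explicitly in two stages, a hedged docplex sketch could look like this (capacity uses the same expression as constraint 3 in the question, and total_cost again stands for the original objective expression):

# Stage 1: find the maximum achievable capacity
capacity = mdl.sum(X[j, f] for j in J for f in F)
mdl.maximize(capacity)
stage1 = mdl.solve()
max_cap = stage1.objective_value

# Stage 2: require (say) at least 99% of that capacity, then minimise cost
mdl.add_constraint(capacity >= 0.99 * max_cap)
mdl.minimize(total_cost)
stage2 = mdl.solve()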
You could try to use CPLEX multi objective feature in docplex.
See basic example in https://www.linkedin.com/pulse/making-optimization-simple-python-alex-fleischer/
from docplex.mp.model import Model

mdl = Model(name='buses')
nbbus50 = mdl.integer_var(name='nbBus50')
nbbus40 = mdl.integer_var(name='nbBus40')
nbbus30 = mdl.integer_var(name='nbBus30')
cost = mdl.continuous_var(name='cost')
co2emission = mdl.continuous_var(name='co2emission')

mdl.add_constraint(nbbus50*50 + nbbus40*40 + nbbus30*30 >= 200, 'kids')
mdl.add_constraint(co2emission == nbbus50 + nbbus40*1.1 + nbbus30*1.2)
mdl.add_constraint(cost == nbbus40*500 + nbbus30*400 + nbbus50*625)

sense = "min"
exprs = [cost, co2emission]
priorities = [1, 2]
weights = [1, 1]

mdl.set_multi_objective(sense, exprs, priorities, weights, abstols=None, reltols=None, names=None)
mdl.solve(lex_mipgaps=[0.001, 0.05], log_output=True)

for v in mdl.iter_integer_vars():
    print(v, " = ", v.solution_value)

print("The minimum cost is ", cost.solution_value)
print("CO2 emission is ", co2emission.solution_value)
which gives
'''
nbBus50 = 4.0
nbBus40 = 0
nbBus30 = 0
The minimum cost is 2500.0
CO2 emission is 4.0
'''
You might also consider a simpler API, namely Model.minimize_static_lex, to which you pass a list of expressions you want to minimize in a lexicographic ordering:
#mdl.set_multi_objective(sense, exprs, priorities, weights, abstols=None, reltols=None, names=None)
mdl.minimize_static_lex(exprs=[cost, co2emission])
mdl.solve(lex_mipgaps=[0.001, 0.05], log_output=True)
(I posted a similar question a few days ago, but given the answers there I have changed my approach, so this is a different question.)
I am trying to use scipy.optimize to solve my optimization problem, but I keep getting an incorrect answer: it just returns my initial guess (x0). Here I am using the dual_annealing algorithm, but I have also tried other global optimization algorithms (differential_evolution, shgo) as well as local minimization (minimize with method SLSQP, which caused problems because my function does not have a gradient), all to no avail.
For context, the program is trying to find the best way to allocate some product across multiple stores. Each store has a forecast of what it is expected to sell in the following days (sales_data). This forecast does not necessarily have to be integers, or above 1 (it rarely is); it is an expectation in the statistical sense. So, if a store has sales_data = [0.33, 0.33, 0.33], then it is expected that after 3 days it will have sold 1 unit of product.
I want to minimize the total time it takes to sell the units I am allocating (I want to sell them the fastest), and my constraints are that I have to allocate the units that I have available, and I cannot allocate a negative number of products to a store. I am OK with having non-integer allocations for now. For my initial allocation I divide the units I have available equally among all stores.
Thus, stated in a more mathematical way, I want to minimize the time_objective function, subject to the constraints that all allocations have to be non-negative (min(allocations) >= 0) and that I have to allocate all the units available (sum(allocations) == unitsAvailable). As dual_annealing does not support constraints, I deal with the first constraint by assigning 0 as a lower bound for every allocation (and unitsAvailable as the upper bound). For the second constraint, I wrap the objective function in the constrained_objective function, which returns numpy.inf if the constraint is violated. This constrained_objective function also takes all allocations but the last one and sets the last allocation to the remaining units (as this is equivalent to constraining all of them, but does not require the sum of allocations to be exactly unitsAvailable).
Here is my code:
import numpy
import scipy.optimize as spo

unitsAvailable = 10
days = 50

class Store:
    def __init__(self, num):
        self.num = num
        self.sales_data = []

# Mock Data
stores = []
for i in range(10):
    # Identifier
    stores.append(Store(i))
    # Expected units to be sold that day (It's unlikely they will sell 1 every day)
    stores[i].sales_data = [(i % 10) / 10 for x in range(days)]

def days_to_turn(alloc, store):
    day = 0
    inventory = alloc
    while inventory > 0 and day < days:
        inventory -= store.sales_data[day]
        day += 1
    return day

def time_objective(allocations):
    time = 0
    for i in range(len(stores)):
        time = max(time, days_to_turn(allocations[i], stores[i]))
    return time

def constrained_objective(partial_allocs):
    if numpy.sum(partial_allocs) > unitsAvailable:
        # can't sell more than is available, so make the objective infeasible
        return numpy.inf
    # partial_allocs contains allocations to all but one store.
    # The final store gets allocated the remaining units.
    allocs = numpy.append(partial_allocs, unitsAvailable - numpy.sum(partial_allocs))
    return time_objective(allocs)

# Initial guess (x0)
guess_allocs = []
for i in range(len(stores)):
    guess_allocs.append(unitsAvailable / len(stores))
guess_allocs = numpy.array(guess_allocs)

print('Optimizing...')

bounds = [(0, unitsAvailable)] * len(stores)
time_solution = spo.dual_annealing(constrained_objective, bounds[:-1], x0=guess_allocs[:-1])

allocs = numpy.append(time_solution.x, unitsAvailable - numpy.sum(time_solution.x))
print("Allocations: " + str(allocs))
print("Days to turn: " + str(time_solution.fun))
I am using scipy.optimize.minimize to try to determine the optimal parameters of a probability density function (PDF). My PDF involves a discrete Gaussian kernel (https://en.wikipedia.org/wiki/Gaussian_function and https://en.wikipedia.org/wiki/Scale_space_implementation#The_discrete_Gaussian_kernel).
In theory, I know the average value of the PDF (where the PDF should be centered). So if I were to calculate the expectation value of my PDF, I should recover the mean value that I already know. My PDF is sampled at discrete values of n (which must never be negative and should start at 0 to make any physical sense), and I am trying to determine the optimal value of t (the "scaling factor") to recover the average value of the PDF (which, again, I already know ahead of time).
My minimal working example to determine the optimal "scaling factor" t is the following:
#!/usr/bin/env python3

import numpy as np
from scipy.special import iv
from scipy.optimize import minimize

def discrete_gaussian_kernel(t, n):
    return np.exp(-t) * iv(n, t)

def expectation_value(t, average):
    # One constraint is that the starting value
    # of the range over which I sample the PDF
    # should be 0.

    # Method 1 - This seems to give good, consistent results
    int_average = int(average)
    ceiling_average = int(np.ceil(average))
    N = range(int_average - ceiling_average + 1,
              int_average + ceiling_average + 2)

    # Method 2 - The multiplicative factor for 'end' is arbitrary.
    # I should in principle be able to make end as large as
    # I want since the PDF goes to zero for large values of n,
    # but this seems to impact the result and I do not know why.
    #start = 0
    #end = 2 * int(average)
    #N = range(start, end)

    return np.sum([n * discrete_gaussian_kernel(t, n - average) for n in N])

def minimize_function(t, average):
    return average - expectation_value(t, average)

if __name__ == '__main__':
    average = 8.33342
    #average = 7.33342

    solution = minimize(fun=minimize_function,
                        x0=1,
                        args=average)
    print(solution)

    t = solution.x[0]
    print('          solution t =', t)
    print('       given average =', average)
    print('recalculated average =', expectation_value(t, average))
I have two problems with my minimal working example:
1) The code works OK for some values of what I choose for the variable "average." One example of this is when the value is 8.33342. However, the code does not work for other values, for example 7.33342. In this case, I get
RuntimeWarning: overflow encountered in exp
so I was thinking that maybe scipy.optimize.minimize is choosing a bad value for t (like a large negative number). I am confident that this is the problem since I have printed out the value of t in the function expectation_value, and t becomes increasingly negative. So I would like to add bounds to the possible values of what "t" can take ("t" should not be negative). Looking at the documentation of scipy.optimize.minimize, there is a bounds keyword argument. So I tried:
solution = minimize(fun=minimize_function,
                    x0=1,
                    args=average,
                    bounds=((0, None)))
but I get the error:
ValueError: length of x0 != length of bounds
I searched for this error on Stack Overflow, and there are some other threads, but I did not find any of them helpful. How can I set a bound successfully?
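For what it's worth, the shape scipy expects here is one (min, max) pair per entry of x0: ((0, None)) is just (0, None) to Python (length 2), while x0 has length 1, hence the mismatch. A minimal sketch of the corrected call:

# bounds must be a sequence with one (min, max) pair per element of x0;
# use a list (or a trailing comma) to make it a one-element sequence
solution = minimize(fun=minimize_function,
                    x0=1,
                    args=average,
                    bounds=[(0, None)])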
2) My other question has to do with scipy.optimize.minimize being sensitive to the range over which I calculate the expectation value. For an average value of
average = 8.33342
and the method of calculating the range as
# Method 1 - This seems to give good, consistent results
int_average = int(average)
ceiling_average = int(np.ceil(average))
N = range(int_average - ceiling_average + 1,
int_average + ceiling_average + 2)
the "recalculated average" is 8.3329696426. But for the other method (which has a very similar range),
# Method 2 - The multiplicative factor for 'end' is arbitrary.
# I should in principle be able to make end as large as
# I want since the PDF goes to zero for large values of n,
# but this seems to impact the result and I do not know why.
start = 0
end = 2 * int(average)
N = range(start, end)
the "recalculated average" is 8.31991111857. The ranges are similar in each case, so I don't know why there is such a large change, especially since I what my recalculated average to be as close as possible to the true average. And if I were to extend the range to larger values (which I think is reasonable since the PDF goes to zero there),
start = 0
end = 4 * int(average)
N = range(start, end)
the "recalculated average" is 9.12939372912, which is even worse. So is there a consistent method to calculate the range so that the reconstructed average is always as close as possible to the true average? The scaling factor can take on any value so I would think scipy.optimize.minimize should be able to find a scaling factor to get back the true average exactly.