Create a random array with a particular average - python

I am using SciPy and want to create an array of length n with a particular average.
Suppose I want a random array of length 3 with an average of 2.5, so the possible options could be:
[1.5, 3.5, 2.5]
[.25, 7.2, .05]
and so on and so forth...
I need to create many such arrays with varying lengths and a different (specified) average for each, so a generalized solution would be welcome.

Just generate numbers over the range you want (0...10 in this case)
>>> import random
>>> nums = [10*random.random() for x in range(5)]
Work out the average
>>> sum(nums)/len(nums)
4.2315222659844824
Shift the average to where you want it
>>> nums = [x - 4.2315222659844824 + 2.5 for x in nums]
>>> nums
[-0.628013346633133, 4.628537956666447, -1.7219257458163257, 7.617565127420011, 2.6038360083629986]
>>> sum(nums)/len(nums)
2.4999999999999996
You can use whichever distribution/range you like. By shifting this way, you will always get an average of 2.5 (or very close to it, up to floating-point rounding).
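For a generalized version of this mean-shifting idea, here is a small sketch (my own; the function name and default range are illustrative, not from the original answer):
import random

def random_with_mean(length, target_mean, low=0.0, high=10.0):
    # Draw from whatever range/distribution you like, then shift so the
    # sample mean lands exactly on target_mean.
    nums = [random.uniform(low, high) for _ in range(length)]
    current_mean = sum(nums) / len(nums)
    return [x - current_mean + target_mean for x in nums]

vals = random_with_mean(3, 2.5)
print(vals, sum(vals) / len(vals))  # mean is 2.5 up to floating-point rounding
As in the answer above, the shifted values can fall outside the original range (and can even be negative).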

You haven't specified what distribution you want.
It's also not clear whether you want the average of the actual array to be 2.5, or the amortized average over all arrays to be 2.5.
The simplest solution, three random numbers uniformly distributed from 0 to 2*avg, is this:
return 2*avg * np.random.rand(3)
If you want to guarantee that the average of the array is 2.5, that's a pretty simple constraint, but there are many different ways you could satisfy it, and you need to describe which way you want. For example:
n0 = random.random() * 3*avg
n1 = random.random() * (3*avg - n0)
n2 = 3*avg - n0 - n1  # the last element takes whatever is left of the 3*avg total
return np.array((n0, n1, n2))

I found a solution to the problem.
numpy.random.triangular(left, mode, right, size=None)
Visit: http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.triangular.html#numpy.random.triangular
However, the minor problem is that it forces a triangular distribution on the samples.
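For example, since the mean of a triangular distribution is (left + mode + right) / 3, you could pick the mode so the expected value hits your target; a small sketch (my own, not from the original answer):
import numpy as np

def triangular_with_mean(size, target_mean, left=0.0, right=None):
    # Mean of a triangular distribution is (left + mode + right) / 3,
    # so solve for the mode that gives the desired expected value.
    right = 2 * target_mean if right is None else right
    mode = 3 * target_mean - left - right
    return np.random.triangular(left, mode, right, size)

samples = triangular_with_mean(1000, 2.5)
print(samples.mean())  # close to 2.5 on average, not exact for each sample
Note this fixes the expected value rather than the exact sample mean, and left and right must be chosen so the resulting mode stays inside [left, right].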

Related

Using Python to use Euler's formula into matrix approximation

I am trying to use Python and NumPy to evaluate Euler's formula e^(iπ), represented in matrix form as e^A, where
A = [[0, -π],
     [π,  0]]
and then apply the Maclaurin series for the exponential function e^x:
sum(n=0 to infinity) x^n/n! = 1 + x + x^2/2! + x^3/3! + ...
So I am trying to compute an approximation matrix S^(N+1) and print the matrix and its four entries.
I have tried emulating Euler's and Maclaurin's series; I think the final approximation matrix will be reached around N = 20, but currently my values do not add up. I am also trying to use np.linalg.norm to compute a 2-norm as well.
import math
import numpy as np

n = 0
A = np.eye(2)
A = math.pi * np.rot90(A)
A[0,1] = -A[0,1]
A
mac_series = 0
while n < 120:
    print(n)
    n += 1
    mac_series = (A**n) / (math.factorial(n))
    print("\n", mac_series)
np.linalg.norm(mac_series)
The main problem here is that you are confusing A**3 with A@A@A.
Just look at the case n=0:
A**0
# array([[1., 1.],
#        [1., 1.]])
I am pretty sure you were expecting A⁰ to be the identity (that is the only way this mapping x+iy ⇔ np.array([[x,-y],[y,x]]) makes sense).
In NumPy, you have np.linalg.matrix_power for that (or you could just accumulate the powers yourself):
sum(np.linalg.matrix_power(A,i) / math.factorial(i) for i in range(20))
is
array([[-1.00000000e+00,  5.28918267e-10],
       [-5.28918267e-10, -1.00000000e+00]])
for example. That is presumably what you were expecting (it is the matrix that represents the real number -1 using the same logic, and the whole point of Euler's identity is that e^(iπ) = -1).
By comparison,
sum(A**i / math.factorial(i) for i in range(20))
returns
array([[ 1.        ,  0.04321392],
       [23.14069263,  1.        ]])
That is just the Maclaurin series computed element-wise for all four entries of the matrix. In other words, since your matrix is [[0,-π],[π,0]], you are evaluating the Maclaurin series of [[e⁰, exp(-π)], [exp(π), e⁰]]. And it works: e⁰=1, obviously; exp(π) is 23.140692632779267, so we got a very good approximation in our result; and exp(-π) is its reciprocal, 0.04321391826377226, also well approximated.
So it works, just not at all for what you obviously intend to do: prove Euler's identity in matrix form, that is, compute exp(iπ), not just exp(π).
Without matrix_power, and with code closer to your initial code, you could:
n = 0
mac_series = 0
Apowern = np.eye(2)  # A⁰ = Id for now
while n < 20:
    print(n)
    mac_series += Apowern / math.factorial(n)
    Apowern = Apowern @ A  # @ is the matrix multiplication operator
    n += 1
Note that I've also moved n += 1, which was misplaced in your code: you were accumulating Aⁿ⁺¹/(n+1)!, not Aⁿ/n! (in other words, your sum misses the A⁰/0! = Id term).
With this, I get the expected result
>>> mac_series
array([[-1.00000000e+00,  5.28918724e-10],
       [-5.28918724e-10, -1.00000000e+00]])
Last problem, more subtle: you may have noticed that I do only 20 iterations, not 120. That is because after 20 you start to have a numerical problem: Apowern (or np.linalg.matrix_power(A,n); it is the same problem for both methods) becomes too big. Since it is divided by n! in the accumulation, that doesn't prevent convergence mathematically, but it does prevent numerical convergence; in practice, after a while, NumPy changes the dtype of Apowern.
So we should not divide a big matrix by a big number, and should instead iterate on quantities that stay small enough. Like this, for example:
n = 0
mac_series = 0
NthTerm = np.eye(2)  # Aⁿ/n!. A⁰/0! = Id for now
while n < 120:  # 120 is no longer a problem
    print(n)
    mac_series += NthTerm
    n += 1
    NthTerm = (NthTerm @ A) / n  # so if NthTerm was
    # Aⁿ/n!, now it becomes Aⁿ/n! @ A/(n+1) = Aⁿ⁺¹/(n+1)!
Result
>>> mac_series
array([[-1.00000000e+00, -2.34844612e-16],
       [ 2.34844612e-16, -1.00000000e+00]])
tl;dr
You have 4 problems
The one already mentioned by Roy: you are not accumulating the Aⁿ/n!, just replacing them, and eventually keeping only the last. In other words, you need a += instead of =
A**n is not Aⁿ. It is just A with every element raised to the power n. In other words, [[x,-y],[y,x]]**n is not [[x,-y],[y,x]]ⁿ; it is [[xⁿ,(-y)ⁿ],[yⁿ,xⁿ]]. So you end up computing [[e⁰, 1/e^π], [e^π, e⁰]] ≈ [[1, 0.0432], [23.14, 1]], which is irrelevant here.
n+=1 is misplaced
The numerical problem due to Aⁿ becoming huge (even though you then divide it by an even larger n!, so it poses no problem mathematically, it does numerically, since the intermediate result is too big for the computer).
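Putting those four fixes together, here is a self-contained version of the loop (my own assembly of the snippets above, using the same A as in the question):
import math
import numpy as np

# A = [[0, -pi], [pi, 0]], the matrix representing i*pi
A = math.pi * np.rot90(np.eye(2))
A[0, 1] = -A[0, 1]

mac_series = np.zeros((2, 2))
nth_term = np.eye(2)                       # Aⁿ/n!, starting at A⁰/0! = Id
for n in range(120):
    mac_series += nth_term
    nth_term = (nth_term @ A) / (n + 1)    # becomes Aⁿ⁺¹/(n+1)!

print(mac_series)                                  # ≈ [[-1, 0], [0, -1]]
print(np.linalg.norm(mac_series + np.eye(2), 2))   # 2-norm distance from the exact -Id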

Allocate an integer randomly across k bins

I'm looking for an efficient Python function that randomly allocates an integer across k bins.
That is, some function allocate(n, k) will produce a k-sized array of integers summing to n.
For example, allocate(4, 3) could produce [4, 0, 0], [0, 2, 2], [1, 2, 1], etc.
It should be randomly distributed per item, assigning each of the n items independently at random to one of the k bins.
This should be faster than your brute-force version when n >> k:
import numpy as np

def allocate(n, k):
    result = np.zeros(k)
    sum_so_far = 0
    for ind in range(k-1):
        draw = np.random.randint(n - sum_so_far + 1)
        sum_so_far += draw
        result[ind] = draw
    result[k-1] = n - sum_so_far
    return result
The idea is to draw a random number up to some maximum m (which starts out equal to n), and then we subtract that number from the maximum for the next draw, and so on, thus guaranteeing that we will never exceed n. This way we fill up the first k-1 entries; the final one is filled with whatever is missing to get a sum of exactly n.
Note: I am not sure whether this results in a "fair" random distribution of values or if it is somehow biased towards putting larger values into earlier indices or something like that.
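If you want to check that concern empirically, one quick sketch (my own, not part of the original answer) is to tally the outcomes of the allocate above over many runs and compare the frequencies:
from collections import Counter

trials = 100_000
counts = Counter(tuple(allocate(4, 3)) for _ in range(trials))
for outcome in sorted(counts):
    print(outcome, counts[outcome] / trials)
# A systematic tendency to put larger values in earlier indices would show up here.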
If you are looking for a uniform distribution across all possible allocations (which is different from randomly distributing each item individually):
Using the "stars and bars" approach, we can transform this into a question of picking k-1 positions for possible dividers from a list of n+k-1 possible positions. (Wikipedia proof)
from random import sample

def allocate(n, k):
    dividers = sample(range(1, n+k), k-1)
    dividers = sorted(dividers)
    dividers.insert(0, 0)
    dividers.append(n+k)
    return [dividers[i+1] - dividers[i] - 1 for i in range(k)]

print(allocate(4, 3))
There are ((n+k-1) choose (k-1)) possible distributions, and this is equally likely to result in each one of them.
(This is a modification of Wave Man's solution; that one is not uniform across all possible solutions: note that the only way to get [0,0,4] is to roll (0,0), but there are two ways to get [1,2,1]: rolling (1,3) or (3,1). Choosing from n+k-1 slots and counting dividers as taking a slot corrects for this. In this solution, the random sample (1,2) corresponds to [0,0,4], and the equally likely random sample (2,5) corresponds to [1,2,1].)
Here's a brute-force approach:
import numpy as np

def allocate(n, k):
    res = np.zeros(k)
    for i in range(n):
        res[np.random.randint(k)] += 1
    return res
Example:
for i in range(3):
    print(allocate(4, 3))
[0. 3. 1.]
[2. 1. 1.]
[2. 0. 2.]
Adapting Michael Szczesny's comment, based on NumPy's newer random Generator API:
def allocate(n, k):
    return np.random.default_rng().multinomial(n, [1 / k] * k)
This notebook verifies that it returns the same distribution as my brute-force approach.
Here's my solution. I think it will make all possible allocations equally likely, but I don't have a proof of that.
from random import randint

def allocate(n, k):
    dividers = [randint(0, n) for i in range(k+1)]
    dividers[0] = 0
    dividers[k] = n
    dividers = sorted(dividers)
    return [dividers[i+1] - dividers[i] for i in range(k)]

print(allocate(10000, 100))

How to determine a proportionately decreasing weight given a list of N elements?

I have a list composed of N elements. For context, I am making a time series forecast and, once the forecasts have been made, would like to weight the forecasts made at the beginning as more important than the later forecasts. This is useful because when I calculate performance error scores (MAPE), the score will reflect both the forecasts per item and how I want to identify good vs. bad models.
How should I update my existing function in order to take any list of elements (N) in order to generate these steadily decreasing weights?
Here is the function that I have come up with on my own. It works for examples like compute_equal_perc(5), but not for other combinations...
def compute_equal_perc(rng):
    perc_allocation = []
    equal_perc = 1 / rng
    half_rng = rng / 2
    step_val = equal_perc / (rng - 1)
    print(step_val)
    for x in [v for v in range(0, rng)]:
        if x == int(half_rng):
            perc_allocation.append(equal_perc)
        elif x < int(half_rng):
            diff_plus = (abs(int(half_rng) - x) * step_val) + equal_perc
            perc_allocation.append(round(float(diff_plus), 3))
        elif x >= int(half_rng):
            diff_minus = equal_perc - (abs(int(half_rng) - x) * step_val)
            perc_allocation.append(round(float(diff_minus), 3))
    return perc_allocation
For compute_equal_perc(5), the output that I get is:
[0.3, 0.25, 0.2, 0.15, 0.1]
The sum of this sequence should always equal 1, and the increments between values should always be equal.
This can be solved with basic algebra. An arithmetic sequence is defined as
A[i] = a + b*i, for i = 0, 1, 2, 3, ...  where a is the initial term and b the common difference.
The sum of the elements 0 through n of such a sequence is
S = (A[0] + A[n]) * (n+1) / 2
in words: the sum of the first and last terms, times half the number of terms.
Since you know S and n, you need only decide one more "spread" factor to generate your sequence. The mean element must be 1/n -- this is where your algorithm is wrong, as it fumbles this computation for even values of n.
Your code fails in this coupling of statements:
half_rng = rng / 2
step_val = equal_perc / (rng - 1)
# comparing x to int(half_rng)
If rng is even, you assign the mean value to position rng/2, giving you something such as the list for 4 elements:
[0.417, 0.333, 0.25, 0.167]
This means that you have two elements larger than the desired mean, and only one smaller, forcing the sum over 1.0. Instead, when you have an even quantity of elements, you have to make the mean a "phantom" middle element, and take half-steps around it. Let's look at this with fractions: you already have
[5/12, 4/12, 3/12, 2/12]
Your difference is 1/12 ... 1 / (n * (n-1)) ... and you need to shift these values lower by half a step. In other words, keeping the spread you've chosen (1/12), start half a step to the side: subtract 1/24 from each element.
[9/24, 7/24, 5/24, 3/24]
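To make that concrete, here is a minimal sketch (my own illustration, not the asker's function): pick the step d, and the first term follows from requiring that n terms with mean 1/n sum to 1. This reproduces both the odd case and the half-step even case above.
def decreasing_weights(n, d):
    # Arithmetic sequence a, a-d, a-2d, ... with sum 1:
    # n*a - d*n*(n-1)/2 = 1  =>  a = 1/n + d*(n-1)/2
    a = 1 / n + d * (n - 1) / 2
    return [a - d * i for i in range(n)]

print(decreasing_weights(5, 0.05))    # [0.3, 0.25, 0.2, 0.15, 0.1]
print(decreasing_weights(4, 1 / 12))  # [9/24, 7/24, 5/24, 3/24], i.e. the list above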
You could also change your step with a simple linear factor. Decide on the ratio you want for your elements in simple integers, such as 5:4:3:2, and then generate your weights from the obvious sum of 5+4+3+2:
[5/14, 4/14, 3/14, 2/14]
Note that this works with any arithmetic sequence of integers, another way of choosing your "spread". If you use 4:3:2:1 you get
[4/10, 3/10, 2/10, 1/10]
or you can cluster them more closely with, say, 13:12:11:10
[13/46, 12/46, 11/46, 10/46]
So ... pick the spread you want and simplify your code to take advantage of that.
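And here is a short sketch of the integer-ratio idea as well (again my own; the default spread is n : n-1 : ... : 1):
def ratio_weights(n, start=None):
    # Weights proportional to start, start-1, ..., start-n+1, normalized
    # by their total so that they sum to exactly 1.
    start = n if start is None else start
    ratios = [start - i for i in range(n)]
    total = sum(ratios)
    return [r / total for r in ratios]

print(ratio_weights(4))            # 4:3:2:1   -> [0.4, 0.3, 0.2, 0.1]
print(ratio_weights(4, start=13))  # 13:12:11:10 -> [13/46, 12/46, 11/46, 10/46]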

How to generate a matrix with random entries and with constraints on row and columns?

How can I generate a matrix whose entries are random real numbers between zero and one inclusive, with the additional constraint that the sum of each row and the sum of each column must be less than or equal to one?
Examples:
matrix = [0.3, 0.4, 0.2;
          0.7, 0.0, 0.3;
          0.0, 0.5, 0.1]
If you want a matrix that is uniformly distributed and fulfills those constraints, you probably need a rejection method. In Matlab it would be:
n = 3;
done = false;
while ~done
    matrix = rand(n);
    done = all(sum(matrix,1)<=1) & all(sum(matrix,2)<=1);
end
Note that this will be slow for large n.
If you're looking for a Python way, this is simply a transcription of Luis Mendo's rejection method. For simplicity, I'll be using NumPy:
import numpy as np

n = 3
done = False
while not done:
    matrix = np.random.rand(n,n)
    done = np.all(np.logical_and(matrix.sum(axis=0) <= 1, matrix.sum(axis=1) <= 1))
If you don't have NumPy, then you can generate your 2D matrix as a list of lists instead:
import random

n = 3
done = False
while not done:
    # Create matrix as a list of lists
    matrix = [[random.random() for _ in range(n)] for _ in range(n)]
    # Compute the row sums and check that each is <= 1
    row_sums = [sum(matrix[i]) <= 1 for i in range(n)]
    # Compute the column sums and check that each is <= 1
    col_sums = [sum([matrix[j][i] for j in range(n)]) <= 1 for i in range(n)]
    # Only quit if all row and column sums are at most 1
    done = all(row_sums) and all(col_sums)
The rejection method will surely give you a uniform solution, but it might take a long time to generate a good matrix, especially if your matrix is large. So another, more tedious, approach is to generate each element such that the sum can only reach 1 in each direction. For this you always generate a new element between 0 and the remaining headroom up to 1:
n = 3;
matrix = zeros(n+1); % dummy line in first row/column
for k1 = 2:n+1
    for k2 = 2:n+1
        matrix(k1,k2) = rand()*(1-max(sum(matrix(k1,1:k2-1)),sum(matrix(1:k1-1,k2))));
    end
end
matrix = matrix(2:end,2:end)
It's a bit tricky because for each element you check the row-sum and column-sum until that point, and use the larger of the two for generating a new element (in order to stay below a sum of 1 in both directions). For practical reasons I padded the matrix with a zero line and column at the beginning to avoid indexing problems with k1-1 and k2-1.
Note that as @LuisMendo pointed out, this will have a different distribution than the rejection method. But if your constraints do not specify the distribution, this could do as well (and it will give you a matrix from a single run).
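For completeness, here is a rough Python transcription of this sequential fill (my own sketch, assuming NumPy; no padding row/column is needed because empty-slice sums are zero):
import numpy as np

def constrained_matrix(n):
    # Fill entries one by one, capping each by the room left in its row
    # and its column so both sums stay <= 1.
    matrix = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            room = 1 - max(matrix[i, :j].sum(), matrix[:i, j].sum())
            matrix[i, j] = np.random.rand() * room
    return matrix

print(constrained_matrix(3))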

Generating a set random list of integers based on a distribution

I hope I can explain this well, if I don't I'll try again.
I want to generate an array of 5 random numbers that all add up to 10, with each number chosen from the interval [0, 2n/m].
I'm using numpy.
The code I have so far looks like this:
import numpy as np

n = 10
m = 5
# Interval that numbers are generated on: [0, 2n/m]
randNumbers = np.random.uniform(0, np.divide(np.multiply(2.0, n), m), m)
# Here I normalize the random numbers
normNumbers = np.divide(randNumbers, np.sum(randNumbers))
# Next I multiply the normalized numbers by n
newList = np.multiply(normNumbers, n)
# Round the numbers to whole numbers
finalList = np.around(newList)
This works for the most part, however the rounding is off, it will add up to 9 or 11 as opposed to 10. Is there a way to do what I'm trying to do without worrying about rounding errors, or maybe a way to work around them? If you would like for me to be more clear I can, because I have trouble explaining what I'm trying to do with this when talking :).
This generates all the possible combinations that sum to 10 and selects a random one
from itertools import product
from random import choice

n = 10
m = 5
finalList = choice([x for x in product(*[range(2*n//m + 1)]*m) if sum(x) == 10])
There may be a more efficient way, but this will select fairly between the outcomes
Let's see how this works when n=10 and m=5.
2*n//m + 1 = 5, so the expression becomes
finalList = choice([x for x in product(*[range(5)]*5) if sum(x) == 10])
*[range(5)]*5 is using argument unpacking; this is equivalent to
finalList = choice([x for x in product(range(5),range(5),range(5),range(5),range(5)) if sum(x) == 10])
product() gives the cartesian product of the parameters, which in this case has 5**5 elements, but we then filter out the ones that don't add to 10, which leaves a list of 381 values
choice() is used to select a random value from the resultant list
Just generate four of the numbers using the technique above, then subtract the sum of the four from 10 to pick the last number.
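A rough sketch of that idea (my own; note that the leftover value can fall outside [0, 2n/m], so this version simply redraws when it does):
import numpy as np

def random_ints_with_sum(n=10, m=5):
    upper = 2 * n // m                 # interval [0, 2n/m]
    while True:
        first = np.random.randint(0, upper + 1, size=m - 1)
        last = n - first.sum()
        if 0 <= last <= upper:         # redraw if the leftover is out of range
            return np.append(first, last)

print(random_ints_with_sum())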
