Birthday Paradox, incorrect output by about 1 - python

I'm relatively new to python and wanted to test myself, by tackling the birthday problem. Rather than calculating it mathematically, I wanted to simulate it, to see if I would get the right answer. So I assigned all boolean values in the list sieve[] as False and then randomly pick a value from 0 to 364 and change it to True, if it's already True then it outputs how many times it had to iterate as an answer.
For some reason, every time I run the code, I get a value between 24.5 and 24.8
The expected result for 50% is 23 people, so why is my result 6% higher than it should be? Is there an error in my code?
import random
def howManyPeople():
sieve = [False] * 365
count = 1
while True:
newBirthday = random.randint(0,364)
if sieve[newBirthday]:
return count
else:
sieve[newBirthday] = True
count += 1
def multipleRun():
global timesToRun
results = []
for i in range(timesToRun):
results.append(howManyPeople())
finalResultAverage = sum(results)
return (finalResultAverage / timesToRun)
timesToRun = int(input("How many times would you like to run this code?"))
print("Average of all solutions = " + str(multipleRun()) + " people")

There's no error in your code. You're computing the mean of your sample of howManyPeople return values, when what you're really interested in (and what the birthday paradox tells you about) is the median of the distribution.
That is, you've got a random process where you incrementally add people to a set, then report the total number of people in that set on the first birthday collision. The birthday paradox implies that at least 50% of the time, your set will have 23 or fewer people. That's not the same thing as saying the expected number of people in the set is 23.0 or smaller.
Here's what I see from one million samples of your howManyPeople function.
In [4]: sample = [howManyPeople() for _ in range(1000000)]
In [5]: import numpy as np
In [6]: np.median(sample)
Out[6]: 23.0
In [7]: np.mean(sample)
Out[7]: 24.617082
In [8]: np.mean([x <= 23 for x in sample])
Out[8]: 0.506978
Note that there's a (tiny) amount of luck here: the median of the distribution of howManyPeople return values is 23 (at least according to Wikipedia's definition), but there's a chance that an unusual sample could have different median, purely through randomness. In this particular case, that chance is entirely negligible. And as user2357112 points out in comments, things are a bit messier in the 2-day year example, where any real number between 2.0 and 3.0 (inclusive) is a valid distribution median, and we could reasonably expect a sample median to be either 2 or 3.
Instead of sampling, we can also compute the probabilities of each output of howManyPeople directly: for any positive integer k, the probability that the output is strictly larger than k is the same as the probability that the first k people have distinct birthdays, which is given (in Python syntax) by factorial(365)/factorial(k)/365**k, and we can use that to compute the probabilities of individual outputs. Here I'm using the name X for the random variable represented by howManyPeople. Some inefficient code:
from math import factorial
def prob_X_greater_than(k):
"""Probability that the output of howManyPeople is > k."""
if k <= 0:
return 1.0
elif k > 365:
return 0.0
else:
return factorial(365) / factorial(365 - k) / 365**k
def prob_X_equals(k):
"""Probability that the output of howManyPeople is == k."""
return prob_x_greater_than(k-1) - prob_x_greater_than(k)
With this, we can get the exact (well, okay, exact up to numerical errors) mean and verify that it roughly matches what we got from the sample:
In [18]: sum(k*prob_x_equals(k) for k in range(1, 366))
Out[18]: 24.616585894598863
And the birthday paradox in this case should tell us that the sum of the probabilities for k <= 23 is greater than 0.5:
In [19]: sum(prob_x_equals(k) for k in range(1, 24))
Out[19]: 0.5072972343239854

What you're seeing is normal. There may be a >50% chance of having a duplicate birthday in a room of 23 random people (ignoring leap years and nonuniform birthday distributions), but that doesn't mean that if you add people to a room one by one, the mean point at which you get a duplicate will be 23.
To get an intuitive feel for this, imagine if years only had two days. In this case, it's clear that there's a 50% chance of having a duplicate birthday in a room with 2 people. However, if you add random people to the room one by one, you're going to need at least two people - 50% chance of stopping at 2 and 50% of 3. The mean stopping point is 2.5, not 2.

Related

Is this just very unlikely? Or is it impossible

So, I'm a beginner in python (coding in general, really), and I've tried to make this little program which generates a random number of rods in 305 attempts
import random
rods = 0
def blazerods():
global rods
seed = random.randint(0, 100000000000)
random.seed(seed)
i = 0
rods = 0
for i in range(0, 305):
rnd = random.random()
if rnd < 0.50:
rods += 1
print(rods)
return rods
while 1==1:
blazerods()
if rods >= 211:
break
The goal is to get 211 or more rods. However, I ran the program for 30 minutes without results.
My questions are: Is it even possible to get 211 or higher with just this code I included?
Can I make it more likely that rods can be more than 211 (still being a very unlikely result, ofc) without changing the chance(50%)?
Is random.seed(seed) even useful?
The probability distribution of rods is Binomial(305,0.5), that is the probability of getting exactly n rods is (305 choose n) * 0.5^305.
To get the probability to get at least 211, you need to sum these terms from 211 to 305. Wolfram alpha gives that as 8.8e-12.
So... it is really, really unlikely and you will have to wait a long time.
If your loop runs 1000 times a second, you will expect to have enough rods about once every 4 years.
If I remember correctly, Matt Parker from the Youtube channel Stand-up Maths has something to say about this particular case in his video "How lucky is too lucky".
As pointed out by Jens, this is easy to calculate via the Binomial distribution. The SciPy stats module allows you to calculate this by doing:
from scipy import stats
# i.e. 305 draws with equal probability
d = stats.binom(305, 0.5)
# the probability of seeing something greater than this value
p = d.sf(210)
which should give you the same value as Jens got: ~8.8e-12.
Next we can use the datetime module to convert this number into the expected time you have to wait:
from datetime import timedelta
time_per_try = timedelta(seconds=1/1000)
print(time_per_try / p)
which should give you ~1300 days, or 3.6 years. Technically, this is the time you'll have to wait to have a 50% chance of seeing it, and it could appear much sooner or later.
You can calculate reasonable values of when this would happen, using the negative binomial distribution. In Python, this looks like:
for q in stats.nbinom(1, p).ppf([0.025, 0.975]):
print(time_per_try * q)
where the 0.025 and 0.975 values give you the 95% confidence interval you hear scientists talking about.
It tells you that if you had 20 computers running your algorithm in parallel, each doing 1000 tests per second, you could expect the first one to finish in around a month while the slowest one would likely be going on for more than 10 years.

Why am I not receiving output for this while loop in python?

I am new to python and trying to learn through small projects:
I am trying to write a code where if suppose you are in a large-lecture class with n other students.Determine how large n must be such that the probability that someone has the same birthday as you is greater than 50%?
Note: Forgetting about leap years
and so assuming 365 days in a year, the probability that no one has the same birthday as you is (364/365)**n
My code for this is:
n=probability
probability = 0
while n==0.50*n:
print("With n students, the
probability
is greater than
50% that someone has the same
birthday
as you.")
Where am I going wrong?How could I implement an ifstatement?The output I am looking for is:
With 253 students, the probability is greater than 50% that someone
has the same birthday as you.
I don't know how to fix your code without redoing it entirely, so here we go:
p = 1/365
n = 0
# Search for the number of students that ensures p(same birthday)>50%.
while 1-(1-p)**n < 0.50 :
n = n+1
print("With {number} students, the probability is greater than 50% that someone has the same birthday as you.".format(number=n))
Feel free to ask extra information.
this is not returning an output because the condition of n==0.5*n is false, so the loop never runs to begin with.

Optimize permutations search loop (can't use itertools) that is extremely slow. Any suggestions?

This is a game where you have 12 cards and you pick you until you choose 3 from the same group. I am attempting to find the probability of choosing each group. The script that I have created works, but it is extremely slow. My coworker created a similar script in R without the functions and his script takes 1/100th the time that mine takes. I am just trying to figure out why. Any ideas would be greatly appreciated.
from collections import Counter
import pandas as pd
from datetime import datetime
weight = pd.read_excel('V01Weights.xlsx')
Weight looks like the following:
Symb Weight
Grand 170000
Grand 170000
Grand 105
Major 170000
Major 170000
Major 215
Minor 150000
Minor 150000
Minor 12000
Bonus 105000
Bonus 105000
Bonus 105000
Max Picks represents the total number of different "cards". Total Picks represents the max number of user choices. This is because after 8 choices, you are guaranteed to have 2 of each type so on the 9th pick, you are guaranteed to have 3 matching.
TotalPicks = 9
MaxPicks = 12
This should have been named PickedProbabilities.
Picks = {0:0,1:0,2:0,3:0}
This is my simple version of the timeit class because I don't like the timeit class
def Time_It(function):
start =datetime.now()
x = function()
finish = datetime.now()
TotalTime = finish - start
Minutes = int(TotalTime.seconds/60)
Seconds = TotalTime.seconds % 60
print('It took ' + str(Minutes) + ' minutes and ' + str(Seconds) + ' seconds')
return(x)
Given x(my picks in order) I find the probability. These picks are done without replacement
def Get_Prob(x,weight):
prob = 1
weights = weight.iloc[:,1]
for index in x:
num = weights[index]
denom = sum(weights)
prob *= num/denom
weights.drop(index, inplace = True)
# print(weights)
return(prob)
This is used to determine if there are duplicates in my loop because that is not allowed
def Is_Allowed(x):
return(len(x) == len(set(x)))
This determines if a win is present in all of the cards present thus far.
def Is_Win(x):
global Picks
WinTypes = [[0,1,2],[3,4,5],[6,7,8],[9,10,11]]
IsWin = False
for index,item in enumerate(WinTypes):
# print(index)
if set(item).issubset(set(x)):
IsWin = True
Picks[index] += Get_Prob(x,weight)
# print(Picks[index])
print(sum(Picks.values()))
break
return(IsWin)
This is my main function that cycles through all of the cards. I attempted to do this using recursion but I eventually gave up. I can't use itertools to create all of the permutations because for example [0,1,2,3,4] will be created by itertools but this is not possible because once you get 3 matching, the game ends.
def Cycle():
for a in range(MaxPicks):
x = [a]
for b in range(MaxPicks):
x = [a,b]
if Is_Allowed(x):
for c in range(MaxPicks):
x = [a,b,c]
if Is_Allowed(x):
if Is_Win(x):
# print(x)
continue
for d in range(MaxPicks):
x = [a,b,c,d]
if Is_Allowed(x):
if Is_Win(x):
# print(x)
continue
for e in range(MaxPicks):
x = [a,b,c,d,e]
if Is_Allowed(x):
if Is_Win(x):
continue
for f in range(MaxPicks):
x = [a,b,c,d,e,f]
if Is_Allowed(x):
if Is_Win(x):
continue
for g in range(MaxPicks):
x = [a,b,c,d,e,f,g]
if Is_Allowed(x):
if Is_Win(x):
continue
for h in range(MaxPicks):
x = [a,b,c,d,e,f,g,h]
if Is_Allowed(x):
if Is_Win(x):
continue
for i in range(MaxPicks):
if Is_Allowed(x):
if Is_Win(x):
continue
Calls the main function
x = Time_It(Cycle)
print(x)
writes the probabilities to a text file
with open('result.txt','w') as file:
# file.write(pickle.dumps(x))
for item in x:
file.write(str(item) + ',' + str(x[item]) + '\n')
My coworker created a similar script in R without the functions and his script takes 1/100th the time that mine takes.
Two easy optimizations:
1) In-line the function calls like Is_Allowed() because Python have a lot of function call overhead (such as creating a new stackframe and argument tuples).
2) Run the code in using pypy which is really good at optimizing functions like this one.
Ok, this time I hope I got your problem right:)
There are two insights (I guess you have them, just for the sake of the completeness) needed in order to speed up your program algorithmically:
The probabilities for the sequence (card_1, card_2) and (card_2, card_1) are not equal, so we cannot use the results from the urn problem, and it looks like we need to try out all permutations.
However, given a set of cards we picked so far, we don't really need the information in which sequence they where picked - it is all the same for the future course of the game. So it is enough to use dynamic programming and calculate the probabilities for every subset to be traversed during the game (thus we need to check 2^N instead of N! states).
For a set of picked cards set the probability to pick a card i in the next turn is:
norm:=sum Wi for i in set
P(i|set)=Wi/norm if i not in set else 0.0
The recursion for calculating P(set) - the probability that a set of picked card occured during the game is:
set_without_i:=set/{i}
P(set)=sum P(set_without_i)*P(i|set_without_i) for i in set
However this should be done only for set_without_i for which the game not ended yet, i.e. no group has 3 cards picked.
This can be done by means of recursion+memoization or, as my version does, by using bottom-up dynamic programming. It also uses binary representation of integers for representations of sets and (most important part!) returns the result almost instantly [('Grand', 0.0014104762718021384), ('Major', 0.0028878988709489244), ('Minor', 0.15321793072867956), ('Bonus', 0.84248369412856905)]:
#calculates probability to end the game with 3 cards of a type
N=12
#set representation int->list
def decode_set(encoded):
decoded=[False]*N
for i in xrange(N):
if encoded&(1<<i):
decoded[i]=True
return decoded
weights = [170000, 170000, 105, 170000, 170000, 215, 150000, 150000, 12000, 105000, 105000, 105000]
def get_probs(decoded_set):
denom=float(sum((w for w,is_taken in zip(weights, decoded_set) if not is_taken)))
return [w/denom if not is_taken else 0.0 for w,is_taken in zip(weights, decoded_set)]
def end_group(encoded_set):
for i in xrange(4):
whole_group = 7<<(3*i) #7=..000111, 56=00111000 and so on
if (encoded_set & whole_group)==whole_group:
return i
return None
#MAIN: dynamic program:
MAX=(1<<N)#max possible set is 1<<N-1
probs=[0.0]*MAX
#we always start with the empty set:
probs[0]=1.0
#building bottom-up
for current_set in xrange(MAX):
if end_group(current_set) is None: #game not ended yet!
decoded_set=decode_set(current_set)
trans_probs=get_probs(decoded_set)
for i, is_set in enumerate(decoded_set):
if not is_set:
new_set=current_set | (1<<i)
probs[new_set]+=probs[current_set]*trans_probs[i]
#filtering wins:
group_probs=[0.0]*4
for current_set in xrange(MAX):
group_won=end_group(current_set)
if group_won is not None:
group_probs[group_won]+=probs[current_set]
print zip(["Grand", "Major", "Minor", "Bonus"], group_probs)
Some explanation of the "tricks" used in code:
A pretty standard trick is to use integer's binary representation to encode a set. Let's say we have objects [a,b,c], so we could represent the set {b,c} as 110, which would mean a (first in the list corresponds to 0- the lowest digit) - not in the set, b(1) in the set, c(1) in the set. However, 110 read as integer it is 6.
The current_set - for loop simulates the game and best understood while playing. Let's play with two cards [a,b] with weights [2,1].
We start the game with an empty set, 0 as integer, so the probability vector (given set, its binary representation and as integer mapped onto probability):
probs=[{}=00=0->1.0, 01={a}=1->0.0, {b}=10=2->0.0, {a,b}=11=3->0.0]
We process the current_set=0, there are two possibilities 66% to take card a and 33% to take cardb, so the probabilities become after the processing:
probs=[{}=00=0->1.0, 01={a}=1->0.66, {b}=10=2->0.33, {a,b}=11=3->0.0]
Now we process the current_set=1={a} the only possibility is to take b so we will end with set {a,b}. So we need to update its (3={a,b}) probability via our formula and we get:
probs=[{}=00=0->1.0, 01={a}=1->0.66, {b}=10=2->0.33, {a,b}=11=3->0.66]
In the next step we process 2, and given set {b} the only possibility is to pick card a, so probability of set {a,b} needs to be updated again
probs=[{}=00=0->1.0, 01={a}=1->0.66, {b}=10=2->0.33, {a,b}=11=3->1.0]
We can get to {a,b} on two different paths - this could be seen in our algorithm. The probability to go through set {a,b} at some point in our game is obviously 1.0.
Another important thing: all paths that leads to {a,b} are taken care of before we process this set (it would be the next step).
Edit: I misunderstood the original problem, the here presented solution is for the following problem:
Given 4 groups with 3 different cards with a different score for every card, we pick up cards as long as we don't have picked 3 cards from the same group. What is the expected score(sum of scores of picked cards) in the end of the game.
I leave the solution as it is, because it was such a joy to work it out after so many probability-theory-less years and I just cannot delete it:)
See my other answer for handling of the original problem
There are two possibilities to improve the performance: making the code faster (and before starting this, one should profile in order to know which part of the program should be optimized, otherwise the time is spent optimizing things that don't count) or improving the algorithm. I propose to do the second.
Ok, this problem seems to be more complex as at the first site. Let's start with some observations.
All you need to know is the expected number of the picked cards at the end of the game:
If Pi is the probability that the card i is picked somewhere during the game, then we are looking for the expected value of the score E(Score)=P1*W1+P2*W2+...Pn*Wn. However, if we look at the cards of a group, we can state that because of the symmetry the probabilities for the cards of this group are the same, e.g. P1=P2=P3=:Pgrand in your case. Thus our expectation can be calculated:
E(Score)=3*Pgrand*(W1+W2+W3)/3+...+3*Pbonus*(W10+W11+W12)/3
We call averageWgrand:=(W1+W2+W3)/3 and note that E(#grand)=3*Pgrand - the expected number of picked grand card at the end of the game. With this our formula becomes:
E(Score)=E(#grand)*averageWgrand+...+E(#bonus)*averageWbonus
In your example we can go even further: The number of cards in every group is equal, so because of the symmetry we can claim: E(#grand)=E(#minor)=E(#major)=E(#grand)=:(E#group). For the sake of simplicity, in the following we consider only this special case (but the outlined solution could be extended also to the general case). This lead to the following simplification:
E(Score)=4*E(#group)(averageWgrand+...+averageWbonus)/4
We call averageW:=(averageWgrand+...+averageWbonus)/4 and note that E(#cards)=4*E(#grand) is the expected number of picked card at the end of the game.
Thus, E(Score)=E(#cards)*averageW, so our task is reduced to calculating the expected value of the number of cards at the end of the game:
E(#cards)=P(1)*1+P(2)*2+...P(n)*n
where P(i) denotes the probability, that the game ends with exact i cards. The probabilities P(1),P(2) and P(k), k>9 are easy to see - they are 0.
Calculation of the probability of ending the game with i picked cards -P(i):
Let's play a slightly different game: we pick exactly i cards and win if and only if:
There is exactly one group with 3 cards picked. We call this group full_group.
The last picked (i-th) card was from the full_group.
It is easy to see, that the probability to win this game P(win) is exactly the probability we are looking for - P(i). Once again we can use the symmetry, because all groups are equal (P(win, full=grand) means the probability that we what and that the full_group=grand):
P(win)=P(win, grand)+P(win, minor)+P(win, major)+P(win, bonus)
=4*P(win, grand)
P(win, grand) is the probability that:
after picking i-1 cards the number of picked grand cards is 2, i.e. `#grand=2' and
after picking i-1 cards, for every group the number of picked cards is less than 3 and
we pick a grand-card in the last round. Given the first two constraints hold, this (conditional) probability is 1/(n-i+1) (there are n-i+1 cards left and only one of them is "right").
From the urn problem we know the probability for
P(#grand=u, #minor=x, #major=y, #bonus=z) = binom(3,u)*binom(3,x)*binom(3,y)*binom(3,z)/binom(12, u+x+y+z)
with binom(n,k)=n!/k!/(n-k)!. Thus P(win, grand) can be calculated as:
P(win, grand) = 1/(n-i+1)*sum P(#grand=2, #minor=x, #major=y, #bonus=z)
where x<=2, y<=2, z<=2 and 2+x+y+z=i-1
And now the code:
import math
def binom(n,k):
return math.factorial(n)//math.factorial(k)//math.factorial(n-k)
#expected number of cards:
n=12 #there are 12 cards
probs=[0]*n
for minor in xrange(3):
for major in xrange(3):
for bonus in xrange(3):
i = 3 + minor +major +bonus
P_urn = binom(3,2)*binom(3,minor)*binom(3,major)*binom(3,bonus)/float(binom(n, n-i+1))
P_right_last_card = 1.0/(n-i+1)
probs[i]+=4*P_urn*P_right_last_card #factor 4 from symmetry
print "Expected number of cards:", sum((prob*card_cnt for card_cnt, prob in enumerate(probs)))
As result I get 6.94285714286 as the expected number of cards in the end of the game. And very fast - almost instantly. Not sure whether it is right though...
Conclusion:
Obviously, if you like to handle a more general case (more groups, number cards in a group different) you have to extend the code (recursion, memoization of binom) and the theory.
But the most crucial part: with this approach you (almost) don't care in which order the cards were picked - and thus the number of states you have to inspect is down by factor of (k-1)! where k is the maximal possible number of cards in the end of the game. In your example k=9 and thus the approach is faster by factor 40000 (I don't even consider the speed-up from the exploited symmetry, because it might not be possible in general case).

Filtering out conditioned arrays from a matrix

In Sweden there is a football (soccer) betting game where you try to find the outcome of 13 matches. Since each match can have a home win, a draw or an away team win this leads to 3**13=1594323 possible outcomes. You win if you have 10 to 13 matches correct. You don't want this to occur when a lot of other people also have high scores since the prize sum is divided among all winners. This is the background to the more generic question that I'm looking for an answer to: how to find all arrays that differ by at least x elements from a given array within the matrix (in this case 1594323*13).
The first obvious idea that came to my mind was to have 13 nested for loops and compare one array at the time. However, I'm using this problem as a training session to learn myself Python programming. Python is not the optimal tool for this kind of task and I could turn to C to get a faster program but I'm interested in the best algorithm as such.
In Python I tried the nested for loop method up to 10 matches, then the execution time got too long, 5 seconds on the netbook I'm using. For each added match, the execution time went up tenfold.
Another approach would be to use a database and that might be the solution but I'm curious what the fastest way of solving this kind of problem is. I haven't been successful googling this problem, maybe because it's hard to use the correct description of the problem in a short search.
Here's a recursive solution that terminates in about 5 seconds on my machine, when given an x value of 0 (the worst case). For x values of 10 and above it's nearly instantaneous.
def arrays_with_differences(arr, x):
if x > len(arr):
return []
if len(arr) == 0:
return [[]]
smallColl1 = arrays_with_differences(arr[:len(arr)-1], x)
smallColl2 = arrays_with_differences(arr[:len(arr)-1], x-1)
last = arr[len(arr)-1]
altLast1 = (last + 1) % 3
altLast2 = (last + 2) % 3
result = [smallArr + [last] for smallArr in smallColl1]
result.extend([smallArr + [altLast1] for smallArr in smallColl2])
result.extend([smallArr + [altLast2] for smallArr in smallColl2])
return result
result = arrays_with_differences([1,0,1,0,1,1,1,1,1,1,1,1,1], 4)
print(len(result))

Random Integer inside range with probability (Python) [duplicate]

This question already has answers here:
Random weighted choice
(7 answers)
Closed 8 years ago.
I am making a text-based RPG. I have an algorithm that determines the damage dealt by the player to the enemy which is based off the values of two variables. I am not sure how the first part of the algorithm will work quite yet, but that isn't important.
(AttackStrength is an attribute of the player that represents generally how strong his attacks are. WeaponStrength is an attribute of swords the player wields and represents generally how strong attacks are with the weapon.)
Here is how the algorithm will go:
import random
Damage = AttackStrength (Do some math operation to WeaponStrength) WeaponStrength
DamageDealt = randrange(DamageDealt - 4, DamageDealt + 1) #Bad pseudocode, sorry
What I am trying to do with the last line is get a random integer inside a range of integers with the minimum bound as 4 less than Damage, and the maximum bound as 1 more than Damage. But, that's not all. I want to assign probabilities that:
X% of the time DamageDealt will equal Damage
Y% of the time DamageDealt will equal one less than Damage
Z% of the time DamageDealt will equal two less than Damage
A% of the time DamageDealt will equal three less than Damage
B% of the time DamageDealt will equal three less than Damage
C% of the time DamageDealt will equal one more than Damage
I hope I haven't over-complicated all of this thank you!
I think the easiest way to do random weighted probability when you have nice integer probabilities like that is to simply populate a list with multiple copies of your choices - in the right ratios - then choose one element from it, randomly.
Let's do it from -3 to 1 with your (original) weights of 10,10,25,25,30 percent. These share a gcd of 5, so you only need a list of length 20 to hold your choices:
choices = [-3]*2 + [-2]*2 + [-1]*5 + [0]*5 + [1]*6
And implementation done, just choose randomly from that. Demo showing 100 trials:
trials = [random.choice(choices) for _ in range(100)]
[trials.count(i) for i in range(-3,2)]
Out[18]: [11, 7, 27, 22, 33]
Essentially, what you're trying to accomplish is simulation of a loaded die: you have six possibilities and want to assign different probabilities to each one. This is a fairly interesting problem, mathematically speaking, and here is a wonderful piece on the subject.
Still, you're probably looking for something a little less verbose, and the easiest pattern to implement here would be via roulette wheel selection. Given a dictionary where keys are the various 'sides' (in this case, your possible damage formulae) and the values are the probabilities that each side can occur (.3, .25, etc.), the method looks like this:
def weighted_random_choice(choices):
max = sum(choices.values())
pick = random.uniform(0, max)
current = 0
for key, value in choices.items():
current += value
if current > pick:
return key
Suppose that we wanted to have these relative weights for the outcomes:
a = (10, 15, 15, 25, 25, 30)
Then we create a list of partial sums b and a function c:
import random
b = [sum(a[:i+1]) for i,x in enumerate(a)]
def c():
n = random.randrange(sum(a))
for i, v in enumerate(b):_
if n < v: return i
The function c will return an integer from 0 to len(a)-1 with probability proportional to the weights specified in a.
This can be a tricky problem with a lot of different probabilities. Since you want to impose probabilities on the outcomes it's not really fair to call them "random". It always helps to imagine how you might represent your data. One way would be to keep a tuple of tuples like
probs = ((10, +1), (30, 0), (25, -1), (25, -2), (15, -3))
You will notice I have adjusted the series to put the highest adjustment first and so on. I have also removed the duplicate "15, -3) that your question implies because (I imagine) of a line duplicated by accident. One very useful test is to ensure that your probabilities add up to 100 (since I've represented them as integer percentages). This reveals a data fault:
>>> sum(prob[0] for prob in probs)
105
This needn't be an issue unless you really want your probabilities to sum to a sensible value. If this isn't necessary you can just treat them as weightings and select random numbers from (0, 104) instead of (0, 99). This is the course I will follow, but the adjustment should be relatively simple.
Given probs and a random number between 0 and (in your case) 104, you can iterate over the probs structure, accumulating probabilities until you find the bin this particular random number belongs to. This would look (something) like:
def damage_offset(N):
prob = random.randint(0, N-1)
cum_prob = 0
for prob, offset in probs:
cum_prob += prob
if cum_prob >= prob:
return offset
This should always terminate if you get your data right (hence my paranoid check on your weightings - I've been doing this quite a while).
Of course it's often possible to trade memory for speed. If the above needs to work faster then it's relatively easy to create a structure that maps random integer choices direct to their results. One way to construct such a mapping would be
damage_offsets = []
for i in range(N):
damage_offsets.append(damage_offset(i))
Then all you have to do after you've picked your random number r between 1 and N is to look up damage_offsets[r-1] for the particular value of r1 and you have created an O(1) operation. As I mentioned at the start, this isn't likely going to be terribly useful unless your probability list becomes huge (but if it does then you really will need to to avoid O(N) operations when you have large N for the number of probability buckets).
Apologies for untested code.

Categories