Calculate the probability of reaching target sum across multiple lotteries - python

If I have separate lotteries with different prize sizes and different odds of winning, how do I calculate the probability of winning at least a given amount if I play each lottery simultaneously and only once?
Lottery A: 50% chance of winning 5 gold.
Lottery B: 40% chance of winning 6 gold.
Lottery C: 30% chance of winning 7 gold.
Is there a method of calculating the probability that I win at least a given value (like 10 gold) if I play all the lotteries simultaneously? I would ideally like it to work for a set of approximately 40 lotteries.
The input would be a list of tuples with probability and prize size, like:
lottery_list = [(0.5, 5), (0.4, 6), (0.3, 7)]
And then a function to calculate the probability of winning at least a target value, like in this case 10 gold:
prob = win_at_least(lottery_list, target_val=10)

I ended up creating a solution myself. This code ignores combinations that are irrelevant, i.e. all combinations of 'downstream' lotteries if a given set of lotteries has already yielded the target value. It takes about 10 ms for 40 lotteries and seems to scale very well.
import numpy as np
import pandas as pd

def multi_lottery(lottery_list: list, target: float) -> float:
    """
    Calculates the probability of winning at least the target
    value from a list of lotteries, where each lottery has a
    different probability of winning and a different prize value.

    Args:
        lottery_list (list): A list of tuples with lottery
            probabilities and values.
        target (float): The target value.

    Returns:
        float: The probability of winning at least the target value.
    """
    # Create a pandas dataframe with the lottery list.
    df = pd.DataFrame(lottery_list, columns=["probability", "value"])
    # Sort the dataframe by descending value.
    df_sorted = df.sort_values("value", ascending=False, ignore_index=True)
    probs = df_sorted["probability"].values
    values = df_sorted["value"].values
    # Create a mask equal to the length of the lottery list.
    # It determines which lotteries are won in each combination.
    length = len(df_sorted)
    mask = np.ones(length, dtype=int)
    # Running total of the winning probability.
    total_odds = 0
    while True:
        # Go through the lottery list, using the mask to determine if
        # each lottery is won, and add the values together until the
        # target value is reached.
        odds = 1
        value = 0
        for idx, prob in enumerate(probs):
            # Include the winning chance if the mask bit is 1.
            if mask[idx] == 1:
                odds *= prob
                value += values[idx]
            # Else include the losing chance.
            else:
                odds *= 1 - prob
            # If the target value is reached, all 'downstream' lotteries
            # are irrelevant, so the odds are added to the total.
            if value >= target:
                total_odds += odds
                # Advance to the next combination by setting the
                # current lottery to zero.
                mask[idx] = 0
                break
            # Else if the last lottery is reached, advance by setting
            # the largest active lottery to zero and all smaller
            # lotteries to one.
            elif idx == length - 1:
                largest_active_idx = 0
                for i, v in enumerate(mask):
                    if v == 1:
                        largest_active_idx = i
                mask[largest_active_idx] = 0
                mask[largest_active_idx + 1:] = 1
        # If the mask is all zeros, every combination has been covered.
        if np.sum(mask) == 0:
            break
    return total_odds
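For small inputs you can sanity-check the pruned enumeration against a plain brute force over all 2**n win/lose outcomes (a quick sketch of mine, not part of the original solution):
from itertools import product

def win_at_least_bruteforce(lottery_list, target):
    # Sum the probability of every win/lose combination whose total prize
    # reaches the target. Exponential in len(lottery_list), so checks only.
    total = 0.0
    for outcome in product([0, 1], repeat=len(lottery_list)):
        prob = 1.0
        value = 0.0
        for won, (p, v) in zip(outcome, lottery_list):
            prob *= p if won else 1 - p
            value += v if won else 0
        if value >= target:
            total += prob
    return total

lottery_list = [(0.5, 5), (0.4, 6), (0.3, 7)]
print(multi_lottery(lottery_list, 10))            # 0.35
print(win_at_least_bruteforce(lottery_list, 10))  # 0.35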


Recursive python function to make two arrays equal?

I'm attempting to write python code to solve a transportation problem using the Least Cost method. I have a 2D numpy array that I am iterating through to find the minimum, perform calculations with that minimum, and then replace it with a 0, so that the loop stops when values matches constantarray, an array of the same shape containing only 0s. The values array contains distances from points in supply to points in demand. I'm currently using a while loop to do so, but the loop isn't running because values.all() != constantarray.all() evaluates to False.
I also need the process to repeat once the arrays have been edited to move onto the next lowest number in values.
constantarray = np.zeros((len(supply), len(demand)))  # create array of 0s
sandmoved = np.zeros((len(supply), len(demand)))  # used to store information needed for later
totalcost = 0
while values.all() != constantarray.all():  # iterate until `values` only contains 0s
    m = np.argmin(values, axis=0)[0]  # find coordinates of minimum value
    n = np.argmin(values, axis=1)[0]
    if supply[m] > abs(demand[n]):  # all demand numbers are negative
        supply[m] += demand[n]  # subtract demand from supply
        totalcost += abs(demand[n]) * values[m, n]
        sandmoved[m, n] = demand[n]  # add amount of 'sand' moved to an empty array
        values[m, 0:-1] = 0  # replace entire m row with 0s since demand has been filled
        demand[n] = 0  # replace demand value with 0
    elif supply[m] < abs(demand[n]):
        demand[n] += supply[m]  # combine positive supply with negative demand
        sandmoved[m, n] = supply[m]
        totalcost += supply[m] * values[m, n]
        values[:-1, n] = 0  # replace entire column with 0s since supply has been depleted
        supply[m] = 0
There is an additional if statement for when supply[m] == demand[n], but I feel that isn't necessary. I've already tried using nested for loops and many different syntax combinations for a while loop, but I just can't get it to work the way I want it to. Even when running the code block over and over by itself, m and n stay the same, and the function removes one value from values but doesn't add it to sandmoved. Any ideas are greatly appreciated!!
Well, here is an example from an old implementation of mine:
import numpy as np

values = np.array([[3, 1, 7, 4],
                   [2, 6, 5, 9],
                   [8, 3, 3, 2]])
demand = np.array([250, 350, 400, 200])
supply = np.array([300, 400, 500])

totCost = 0
MAX_VAL = 2 * np.max(values)  # choose MAX_VAL higher than all values
while np.any(values.ravel() < MAX_VAL):
    # find row and col indices of min
    m, n = np.unravel_index(np.argmin(values), values.shape)
    if supply[m] < demand[n]:
        totCost += supply[m] * values[m, n]
        demand[n] -= supply[m]
        values[m, :] = MAX_VAL  # set all row to MAX_VAL
    else:
        totCost += demand[n] * values[m, n]
        supply[m] -= demand[n]
        values[:, n] = MAX_VAL  # set all col to MAX_VAL
Solution:
print(totCost)
# 2850
Basically, start by choosing a MAX_VAL higher than all given values and a totCost = 0. Then follow the standard steps of the algorithm: find the row and column indices of the smallest cell, say m, n; select the m-th supply or the n-th demand, whichever is smaller; add the selected amount multiplied by values[m,n] to totCost; and set all entries of the selected row or column to MAX_VAL to exclude it from later iterations. Update the greater value by subtracting the selected one, and repeat until all values are equal to MAX_VAL.
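If you also want the shipped amounts (the sandmoved array from the question), the same loop can record them. Here is a sketch of one possible way; the moved array and the explicit handling of the supply == demand tie are my additions, and it assumes total supply equals total demand, as in this example:
import numpy as np

values = np.array([[3, 1, 7, 4],
                   [2, 6, 5, 9],
                   [8, 3, 3, 2]])
demand = np.array([250, 350, 400, 200])
supply = np.array([300, 400, 500])

moved = np.zeros((len(supply), len(demand)), dtype=int)  # allocation per cell
totCost = 0
MAX_VAL = 2 * np.max(values)
while np.any(values < MAX_VAL):
    m, n = np.unravel_index(np.argmin(values), values.shape)
    amount = min(supply[m], demand[n])  # ship as much as the smaller side allows
    moved[m, n] = amount
    totCost += amount * values[m, n]
    supply[m] -= amount
    demand[n] -= amount
    if supply[m] == 0:
        values[m, :] = MAX_VAL  # supply row exhausted
    if demand[n] == 0:
        values[:, n] = MAX_VAL  # demand column filled
print(moved)
print(totCost)  # 2850, matching the result above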

Average time to hit a given line on 2D random walk on a unit grid

I am trying to simulate the following problem:
Given a 2D random walk (in a lattice grid) starting from the origin what is the average waiting time to hit the line y=1-x
import numpy as np
from tqdm import tqdm

N = 5 * 10**3
results = []
for _ in tqdm(range(N)):
    current = [0, 0]
    step = 0
    while current[1] + current[0] != 1:
        step += 1
        a = np.random.randint(0, 4)
        if a == 0:
            current[0] += 1
        elif a == 1:
            current[0] -= 1
        elif a == 2:
            current[1] += 1
        elif a == 3:
            current[1] -= 1
    results.append(step)
This code is slow even for N < 10**4, and I am not sure how to optimize it or change it to properly simulate the problem.
Instead of simulating a bunch of random walks sequentially, let's try simulating multiple paths at the same time and tracking the probabilities of those happening. For instance, we start at position 0 with probability 1:
states = {0+0j: 1}
and the possible moves along with their associated probabilities would be something like this:
moves = {1+0j: 0.25, 0+1j: 0.25, -1+0j: 0.25, 0-1j: 0.25}
# moves = {1: 0.5, -1: 0.5} # this would basically be equivalent
With this construct we can update to new states by going over the combination of each state and each move, updating probabilities accordingly:
def simulate_one_step(current_states):
    newStates = {}
    for cur_pos, prob_of_being_here in current_states.items():
        for movement_dist, prob_of_moving_this_way in moves.items():
            newStates.setdefault(cur_pos + movement_dist, 0)
            newStates[cur_pos + movement_dist] += prob_of_being_here * prob_of_moving_this_way
    return newStates
Then we just iterate this, popping out all winning states at each step:
for stepIdx in range(1, 100):
    states = simulate_one_step(states)
    winning_chances = 0
    # use set(...) to make a copy so we can delete cases out of states as we go.
    for pos, prob in set(states.items()):
        # if y = 1 - x
        if pos.imag == 1 - pos.real:
            winning_chances += prob
            # we no longer propagate this state because the path stops here.
            del states[pos]
    print(f"probability of winning after {stepIdx} moves is: {winning_chances}")
You would also be able to look at states for an idea of the distribution of possible positions, although totalling it in terms of distance from the line simplifies the data. Anyway, the final step would be to average the steps taken, weighted by the probability of taking that many steps, and see if it converges:
total_average_num_moves += stepIdx * winning_chances
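Folding that accumulator into the loop makes the heavy tail visible. A sketch reusing the moves and simulate_one_step definitions from above (the 99-step cutoff is arbitrary):
states = {0+0j: 1}  # start over from the origin
total_hit_probability = 0.0
total_average_num_moves = 0.0
for stepIdx in range(1, 100):
    states = simulate_one_step(states)
    winning_chances = 0.0
    for pos, prob in set(states.items()):
        if pos.imag == 1 - pos.real:  # the walk has hit y = 1 - x
            winning_chances += prob
            del states[pos]  # finished paths stop propagating
    total_hit_probability += winning_chances
    total_average_num_moves += stepIdx * winning_chances
print(total_hit_probability)    # noticeably short of 1 even after 99 steps
print(total_average_num_moves)  # partial mean; it keeps growing with the cutoff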
But we might be able to gather more insight by using symbolic variables! (note I'm simplifying this to a 1D problem which I describe how at the bottom)
import sympy

x = sympy.Symbol("x")  # will sub in 1/2 later
moves = {
    1: x,      # assume x is the chance for us to move towards the target
    -1: 1 - x  # and therefore 1-x is the chance of moving away
}
This with the exact code as written above gives us this sequence:
probability of winning after 1 moves is: x
probability of winning after 2 moves is: 0
probability of winning after 3 moves is: x**2*(1 - x)
probability of winning after 4 moves is: 0
probability of winning after 5 moves is: 2*x**3*(1 - x)**2
probability of winning after 6 moves is: 0
probability of winning after 7 moves is: 5*x**4*(1 - x)**3
probability of winning after 8 moves is: 0
probability of winning after 9 moves is: 14*x**5*(1 - x)**4
probability of winning after 10 moves is: 0
probability of winning after 11 moves is: 42*x**6*(1 - x)**5
probability of winning after 12 moves is: 0
probability of winning after 13 moves is: 132*x**7*(1 - x)**6
And if we ask the OEIS what the sequence 1, 2, 5, 14, 42, 132, ... means, it tells us those are the Catalan numbers, with formula (2n)!/(n!(n+1)!), so we can write a function for the non-zero terms in that series as:
f(n,x) = (2n)! / (n! * (n+1)!) * x^(n+1) * (1-x)^n
or in actual code:
import math

def probability_of_winning_after_2n_plus_1_steps(n, prob_of_moving_forward=0.5):
    return (math.factorial(2*n) / math.factorial(n) / math.factorial(n+1)
            * prob_of_moving_forward**(n+1) * (1 - prob_of_moving_forward)**n)
which now gives us a near-instant way of calculating this probability for any length, or, more usefully, lets us ask Wolfram Alpha what the average would be (it diverges).
Note that we can simplify this to a 1D problem by considering y-x as one variable: we start at y-x = 0 and move such that y-x either increases or decreases by 1 each move with equal chance, and we are interested in when y-x = 1. This means we can consider the 1D case by subbing in z = y-x.
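As a quick numeric check of both claims (my own sketch; it steps through the series using the ratio between consecutive terms instead of factorials, to avoid overflow), the hitting probabilities sum to 1 while the partial sums of (2n+1)*p_n grow without bound:
p = 0.5  # the n = 0 term: probability of first hitting at step 1, with x = 1/2
total_prob = 0.0
partial_mean = 0.0
for n in range(200_000):
    total_prob += p
    partial_mean += (2 * n + 1) * p
    p *= (2 * n + 1) / (2 * (n + 2))  # p_{n+1}/p_n = C_{n+1}/C_n * x*(1-x) at x = 1/2
print(total_prob)    # ~0.999, tending to 1 as the cutoff grows
print(partial_mean)  # ~500 here, growing like sqrt(cutoff): the mean diverges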
Vectorisation would result in much faster code, approximately ~90K times faster. Here is a function that returns the step at which a walk starting from (0,0) hits the line y=1-x, along with the generated trajectory on the 2D grid with unit steps.
import numpy as np

def _random_walk_2D(sim_steps):
    """Walk on 2D unit steps.
    Returns x_sim, y_sim, trajectory, number_of_steps_first_hit to y=1-x."""
    random_moves_x = np.insert(np.random.choice([1, 0, -1], sim_steps), 0, 0)
    random_moves_y = np.insert(np.random.choice([1, 0, -1], sim_steps), 0, 0)
    x_sim = np.cumsum(random_moves_x)
    y_sim = np.cumsum(random_moves_y)
    trajectory = np.array((x_sim, y_sim)).T
    y_hat = 1 - x_sim  # checking if we hit y=1-x
    y_hit = y_hat - y_sim
    hit_steps = np.where(y_hit == 0)
    number_of_steps_first_hit = -1
    if hit_steps[0].shape[0] > 0:
        number_of_steps_first_hit = hit_steps[0][0]
    return x_sim, y_sim, trajectory, number_of_steps_first_hit
If number_of_steps_first_hit is -1, the trajectory does not hit the line.
A longer simulation with repetitions gives the average behaviour; the following run suggests that if the walk does not escape to infinity, it hits the line after ~84 steps on average.
sim_steps = 5 * 10**3  # 5K steps
# Repeat
nrepeat = 40000
hit_step = [_random_walk_2D(sim_steps)[3] for _ in range(nrepeat)]
hit_step = [h for h in hit_step if h > -1]
np.mean(hit_step)  # ~84 steps
Much longer sim_steps will change the result though.
PS:
Good exercise! I hope this wasn't homework; if it was, please cite this answer when you use it.
Edit
As discussed in the comments, the current _random_walk_2D allows diagonal (and null) moves as well. To restrict it to the cardinal directions, we could apply the following filtering:
cardinal_x_y = [(t[0], t[1]) for t in zip(random_moves_x, random_moves_y)
                if np.abs(t[0]) != np.abs(t[1])]
random_moves_x = [t[0] for t in cardinal_x_y]
random_moves_y = [t[1] for t in cardinal_x_y]
This slows the function down a bit, but it will still be super fast compared to the for-loop solutions.
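Alternatively (my suggestion, not part of the original answer), draw one of the four unit steps directly; every move is then cardinal and no samples are discarded, so the step count stays exact:
steps = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]])    # the four cardinal moves
moves = steps[np.random.randint(0, 4, size=sim_steps)]  # one move per step
random_moves_x = np.insert(moves[:, 0], 0, 0)           # prepend the origin
random_moves_y = np.insert(moves[:, 1], 0, 0)
These two arrays can replace the np.random.choice lines inside _random_walk_2D unchanged.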

How to generate in python a random number in a range but biased toward some specific numbers?

I would like to choose a range, for example, 60 to 80, and generate a random number from it. However, between 65-72 I'd like a higher probability, while the other ranges aside from this (60-64 and 73 to 80) to have lower.
An example:
The ranges 60-64 and 73-80 together have a 35% chance of being chosen, while 65-72 has a 65% chance.
The elements in the subranges are equally likely. I'm generating integers.
Also, it would be interesting a scalable solution, so that one could expand its usage for higher ranges, for example, 1000-2000, but biased toward 1400-1600.
Could anyone help with some ideas?
Thanks in advance to anyone willing to contribute!
For equally likely outcomes in the subranges, the following will do the trick:
import random

THRESHOLD = [0.65, 0.65 + 0.35 * 5 / 13]

def my_distribution():
    u = random.random()
    if u <= THRESHOLD[0]:
        return random.randint(65, 72)
    elif u <= THRESHOLD[1]:
        return random.randint(60, 64)
    else:
        return random.randint(73, 80)
This uses a uniform random number to decide which subrange you're in, then generates values equally likely within that subrange.
The THRESHOLD values are similar to a cumulative distribution function, but arranged so the most likely outcome is checked first. A Uniform(0,1) value u falls below the first threshold 65% of the time, so you generate from the range [65, 72] with probability 0.65. Failing that, 5 of the 13 remaining values (5/13 of the remaining 35%) are in the range [60, 64], so u falls below the second threshold 5/13 of the time; the remaining 8/13 of the time it lands above it, giving the range [73, 80].
(The original answer included a histogram of the resulting distribution here.)
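The same two-step idea scales directly to the larger ranges mentioned in the question. A sketch of a generalized version (the function name and interface are mine; it assumes the favored subrange lies strictly inside [low, high]):
import random

def biased_randint(low, high, favored_low, favored_high, favored_weight):
    """Pick from [favored_low, favored_high] with probability favored_weight,
    otherwise uniformly from the rest of [low, high]."""
    if random.random() <= favored_weight:
        return random.randint(favored_low, favored_high)
    left = favored_low - low     # number of values in [low, favored_low - 1]
    right = high - favored_high  # number of values in [favored_high + 1, high]
    k = random.randrange(left + right)  # uniform over the outer values
    return low + k if k < left else favored_high + 1 + (k - left)

# e.g. the scaled-up case from the question:
sample = [biased_randint(1000, 2000, 1400, 1600, 0.65) for _ in range(5)]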
Here's a numpy based solution:
import numpy as np
# Some params
left_start = 60    # Start of left interval ===== [60,64]
middle_start = 65  # Start of middle interval === [65,72]
right_start = 73   # Start of right interval ==== [73,80]
right_end = 80     # End of the right interval == [73,80]
count = 1000       # Number of values to generate.
middle_wt = 0.65   # Middle range to be selected with wt/prob=0.65
middle = np.arange(middle_start, right_start)
rest = np.r_[left_start:middle_start, right_start:(right_end+1)]
rng1 = np.random.default_rng(None) # Generator for randomly choosing range.
rng2 = np.random.default_rng(None) # Generator for generating values in the ranges.
# Now generate a random list of 0s and 1s to indicate choice between
# 'middle' and 'rest'. For this number generation we will set middle_wt as
# the weight/probability for 0 and (1-middle_wt) as the weight/probability for 1.
# (0 indicates middle range and 1 indicates the rest.)
range_choices = rng1.choice([0,1], replace=True, size=count, p=[middle_wt, (1-middle_wt)])
# Now generate 'count' values for the middle range
middle_choices = rng2.choice(middle, replace=True, size=count)
# Now generate 'count' values for the 'rest' of the range (non-middle)
rest_choices = rng2.choice(rest, replace=True, size=count)
result = np.choose(range_choices, (middle_choices,rest_choices))
print (np.sum((65 <= result) & (result<=72)))
Note:
In the above code, p=[middle_wt, (1-middle_wt)] is a list of weights. The middle_wt is the weight for the middle range [65,72], and the (1-middle_wt) is the weight for the rest.
Output:
649 # Indicates that 649 out of the 1000 values of result are in the middle range [65,72]

Iteration performance

I made a function to evaluate the following problem experimentally, taken from A Primer for the Mathematics of Financial Engineering.
Problem: Let X be the number of times you must flip a fair coin until it lands heads. What are E[X] (expected value) and var(X) (variance)?
Following the textbook solution, the following code yields the correct answer:
from sympy import *
k = symbols('k')
Expected_Value = summation(k/2**k, (k, 1, oo)) # Both solutions work
Variance = summation(k**2/2**k, (k, 1, oo)) - Expected_Value**2
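For reference, both sums converge, so printing them shows the textbook answers E[X] = 2 and var(X) = 2:
print(Expected_Value)  # 2
print(Variance)        # 2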
To validate this answer, I decided to have a go at making a function to simulate this experiment. The following code is what I came up with.
import numpy as np

def coin_toss(toss, probability=[0.5, 0.5]):
    """Computes expected value and variance for coin toss experiment"""
    flips = []  # Collects the total number of throws until heads appears, per experiment.
    for _ in range(toss):  # Simulate n experiments
        number_flips = []  # Number of flips until heads is tossed
        while sum(number_flips) == 0:  # Continue simulation while tails are thrown
            number_flips.append(np.random.choice(2, p=probability))  # Append result to number_flips
        flips.append(len(number_flips))  # Append number of flips until heads to flips
    Expected_Value, Variance = np.mean(flips), np.var(flips)
    print('E[X]: {}'.format(Expected_Value),
          '\nvar[X]: {}'.format(Variance))  # Print expected value and variance
The run time when I simulate 1e6 experiments, using the following timing code, is approximately 35.9 seconds.
from timeit import Timer
t1 = Timer("""coin_toss(1000000)""", """from __main__ import coin_toss""")
print(t1.timeit(1))
In the interest of developing my understanding of Python, is this a particularly efficient/pythonic way of approaching a problem like this? How can I utilise existing libraries to improve efficiency/flow execution?
In order to code in an efficient and pythonic way, you should take a look at PythonSpeed and NumPy. One example of faster code using numpy can be found below.
The ABC of optimizing in python+numpy is to vectorize operations, which in this case is quite difficult because there is a while loop that could, in principle, run forever: the coin can land tails 40 times in a row. However, instead of doing a for loop with toss iterations, the work can be done in chunks. That is the main difference between coin_toss from the question and this coin_toss_2d approach.
coin_toss_2d
The main advantage of coin_toss_2d is working in chunks. The chunk sizes have default values, but they can be modified (and they will affect speed). Thus, the while current_toss < toss loop only iterates about toss/repetitions_at_a_time times. This is achieved with numpy, which allows generating a matrix with the results of repeating, repetitions_at_a_time times, the experiment of flipping a coin flips_per_try times. This matrix will contain 0 (tails) and 1 (heads).
# i.e. doing only 5 repetitions with 3 flips_per_try
flip_events = np.random.choice([0,1], size=(repetitions_at_a_time, flips_per_try), p=probability)
# Out
[[0 0 0]   # still no head, we will have to keep trying
 [0 1 1]   # head at the 2nd try (position 1 in python)
 [1 0 0]
 [1 0 1]
 [0 0 1]]
Once this result is obtained, argmax is called. This finds the index corresponding to the maximum (which will be 1, a head) of each row (repetition) and, in case of multiple occurrences, returns the first one, which is exactly what is needed: the first head after a sequence of tails.
maxs = flip_events.argmax(axis=1)
# Out
[0 1 0 0 2]
# The first position is 0, however, flip_events[0,0]!=1, it's not a head!
However, the case where the whole row is 0 must be considered. In this case, the maximum will be 0, and its first occurrence will also be 0, the first column (try). Therefore, we check that all the maximums found at the first try correspond to a head at the first try.
not_finished = (maxs==0) & (flip_events[:,0]!=1)
# Out
[ True False False False False] # first repetition is not finished
If that is not the case, we loop repeating that same process but only for the repetitions where there was no head in any of the tries.
n = np.sum(not_finished)
while n != 0:  # while there are sequences without any head
    # number of experiments reduced to n (the number of all-tails sequences)
    flip_events = np.random.choice([0,1], size=(n, flips_per_try), p=probability)
    maxs2 = flip_events.argmax(axis=1)
    # add flips_per_try to account for the tries already made (each iteration adds them)
    maxs[not_finished] += maxs2 + flips_per_try
    not_finished2 = (maxs2 == 0) & (flip_events[:,0] != 1)
    not_finished[not_finished] = not_finished2
    n = np.sum(not_finished)
# Out
# flip_events
[[1 0 1]] # Now there is a head
# maxs2
[0]
# maxs
[3 1 0 0 2] # The value of the still unfinished repetition has been updated,
# taking into account that the first position in flip_events is the 4th,
# without affecting the rest
Then the indexes corresponding to the first head occurrence are stored (we have to add 1 because Python indexing starts at zero instead of 1). There is one try ... except ... block to cope with cases where toss is not a multiple of repetitions_at_a_time.
def coin_toss_2d(toss, probability=[.5,.5], repetitions_at_a_time=10**5, flips_per_try=20):
    # Initialize and preallocate data
    current_toss = 0
    flips = np.empty(toss)
    # loop by chunks
    while current_toss < toss:
        # repeat repetitions_at_a_time times the experiment "flip coin flips_per_try times"
        flip_events = np.random.choice([0,1], size=(repetitions_at_a_time, flips_per_try), p=probability)
        # store first head occurrence
        maxs = flip_events.argmax(axis=1)
        # Check for all-tails sequences, that is, repetitions where we have to keep trying to get a head
        not_finished = (maxs == 0) & (flip_events[:,0] != 1)
        n = np.sum(not_finished)
        while n != 0:  # while there are sequences without any head
            # number of experiments reduced to n (the number of all-tails sequences)
            flip_events = np.random.choice([0,1], size=(n, flips_per_try), p=probability)
            maxs2 = flip_events.argmax(axis=1)
            # add flips_per_try to account for the tries already made (each iteration adds them)
            maxs[not_finished] += maxs2 + flips_per_try
            not_finished2 = (maxs2 == 0) & (flip_events[:,0] != 1)
            not_finished[not_finished] = not_finished2
            n = np.sum(not_finished)
        # try/except in case toss is not a multiple of repetitions_at_a_time;
        # in general, no error is raised, which is why a try is useful
        try:
            flips[current_toss:current_toss+repetitions_at_a_time] = maxs + 1
        except ValueError:
            flips[current_toss:] = maxs[:toss-current_toss] + 1
        # Update current_toss and move to the next chunk
        current_toss += repetitions_at_a_time
    # Once all values are obtained, average and return them
    Expected_Value, Variance = np.mean(flips), np.var(flips)
    return Expected_Value, Variance
coin_toss_map
Here the code is basically the same, but now the inner while is done in a separate function, which is called from the wrapper function coin_toss_map using map.
def toss_chunk(args):
    probability, repetitions_at_a_time, flips_per_try = args
    # repeat repetitions_at_a_time times the experiment "flip coin flips_per_try times"
    flip_events = np.random.choice([0,1], size=(repetitions_at_a_time, flips_per_try), p=probability)
    # store first head occurrence
    maxs = flip_events.argmax(axis=1)
    # Check for all-tails sequences
    not_finished = (maxs == 0) & (flip_events[:,0] != 1)
    n = np.sum(not_finished)
    while n != 0:  # while there are sequences without any head
        # number of experiments reduced to n (the number of all-tails sequences)
        flip_events = np.random.choice([0,1], size=(n, flips_per_try), p=probability)
        maxs2 = flip_events.argmax(axis=1)
        # add flips_per_try to account for the tries already made (each iteration adds them)
        maxs[not_finished] += maxs2 + flips_per_try
        not_finished2 = (maxs2 == 0) & (flip_events[:,0] != 1)
        not_finished[not_finished] = not_finished2
        n = np.sum(not_finished)
    return maxs + 1

def coin_toss_map(toss, probability=[.5,.5], repetitions_at_a_time=10**5, flips_per_try=20):
    n_chunks, remainder = divmod(toss, repetitions_at_a_time)
    args = [(probability, repetitions_at_a_time, flips_per_try) for _ in range(n_chunks)]
    if remainder:
        args.append((probability, remainder, flips_per_try))
    # list() is needed in Python 3, where map returns an iterator
    flips = np.concatenate(list(map(toss_chunk, args)))
    # Once all values are obtained, average and return them
    Expected_Value, Variance = np.mean(flips), np.var(flips)
    return Expected_Value, Variance
Performance comparison
On my computer, I got the following computation times:
In [1]: %timeit coin_toss(10**6)
# Out
# ('E[X]: 2.000287', '\nvar[X]: 1.99791891763')
# ('E[X]: 2.000459', '\nvar[X]: 2.00692478932')
# ('E[X]: 1.998118', '\nvar[X]: 1.98881045808')
# ('E[X]: 1.9987', '\nvar[X]: 1.99508631')
# 1 loop, best of 3: 46.2 s per loop
In [2]: %timeit coin_toss_2d(10**6,repetitions_at_a_time=5*10**5,flips_per_try=4)
# Out
# 1 loop, best of 3: 197 ms per loop
In [3]: %timeit coin_toss_map(10**6,repetitions_at_a_time=4*10**5,flips_per_try=4)
# Out
# 1 loop, best of 3: 192 ms per loop
And the results for the mean and variance are:
In [4]: [coin_toss_2d(10**6,repetitions_at_a_time=10**5,flips_per_try=10) for _ in range(4)]
# Out
# [(1.999848, 1.9990739768960009),
# (2.000654, 2.0046035722839997),
# (1.999835, 2.0072329727749993),
# (1.999277, 2.001566477271)]
In [5]: [coin_toss_map(10**6,repetitions_at_a_time=10**5,flips_per_try=4) for _ in range(4)]
# Out
# [(1.999552, 2.0005057992959996),
# (2.001733, 2.011159996711001),
# (2.002308, 2.012128673136001),
# (2.000738, 2.003613455356)]

Changing this Python program to have function def()

The following Python program flips a coin several times, then reports the longest series of heads and tails. I am trying to convert this program into one that uses functions, so that it basically uses less code. I am very new to programming; my teacher requested this of us, but I have no idea how to do it. I know I'm supposed to have the function accept 2 parameters: a string or list, and a character to search for. The function should return, as its value, an integer which is the longest sequence of that character in that string. The function shouldn't accept input or output from the user.
import random

print("This program flips a coin several times, \nthen reports the longest series of heads and tails")
cointoss = int(input("Number of times to flip the coin: "))
varlist = []
i = 0
varstring = ' '
while i < cointoss:
    r = random.choice('HT')
    varlist.append(r)
    varstring = varstring + r
    i += 1
print(varstring)
print(varlist)
print("There's this many heads: ", varstring.count("H"))
print("There's this many tails: ", varstring.count("T"))
print("Processing input...")
i = 0
longest_h = 0
longest_t = 0
inarow = 0
prevIn = 0
while i < cointoss:
print(varlist[i])
if varlist[i] == 'H':
prevIn += 1
if prevIn > longest_h:
longest_h = prevIn
print("",longest_h,"")
inarow = 0
if varlist[i] == 'T':
inarow += 1
if inarow > longest_t:
longest_t = inarow
print("",longest_t,"")
prevIn = 0
i += 1
print ("The longest series of heads is: ",longest_h)
print ("The longest series of tails is: ",longest_t)
If this is asking too much, any explanatory help would be really nice instead. All I've got so far is:
def flip(a, b):
    flipValue = random.randint
but it's barely anything.
import random

def Main():
    numOfFlips = getFlips()
    outcome = flipping(numOfFlips)
    print(outcome)

def getFlips():
    Flips = int(input("Enter number of flips:\n"))
    return Flips

def flipping(numOfFlips):
    longHeads = []
    longTails = []
    Tails = 0
    Heads = 0
    for flips in range(0, numOfFlips):
        flipValue = random.randint(1, 2)
        print(flipValue)
        if flipValue == 1:
            Tails += 1
            longHeads.append(Heads)  # recording value of Heads before resetting it
            Heads = 0
        else:
            Heads += 1
            longTails.append(Tails)
            Tails = 0
    longHeads.append(Heads)  # record the final runs, which the loop above would miss
    longTails.append(Tails)
    longestHeads = max(longHeads)  # chooses the greatest length from both lists
    longestTails = max(longTails)
    return "Longest heads:\t" + str(longestHeads) + "\nLongest tails:\t" + str(longestTails)

Main()
I did not quite understand how your code worked, so I rewrote it using functions that work just as well. There are probably ways of improving my code, but I have moved the logic over to functions.
First, you need a function that flips a coin x times. This would be one possible implementation, favoring random.choice over random.randint:
import random

def flip(x):
    result = []
    for _ in range(x):
        result.append(random.choice(("h", "t")))
    return result
Of course, you could also pass the set of outcomes to choose from as a parameter.
Next, you need a function that finds the longest sequence of some value in some list:
def longest_series(some_value, some_list):
    current, longest = 0, 0
    for r in some_list:
        if r == some_value:
            current += 1
            longest = max(current, longest)
        else:
            current = 0
    return longest
And now you can call these in the right order:
# initialize the random number generator, so we get the same result
random.seed(5)
# toss a coin a hundred times
series = flip(100)
# count heads/tails
headflips = longest_series('h', series)
tailflips = longest_series('t', series)
# print the results
print("The longest series of heads is: " + str(headflips))
print("The longest series of tails is: " + str(tailflips))
Output:
>> The longest series of heads is: 8
>> The longest series of tails is: 5
edit: removed the flip implementation with yield, it made the code weird.
Counting the longest run
Let's see what you have asked for:
I'm supposed to have the function accept 2 parameters: a string or list,
or, generalizing just a bit, a sequence
and a character
again, we'd speak, generically, of an item
to search for. The function should return, as the value of the
function, an integer which is the longest sequence of that character
in that string.
My implementation of the function you are asking for, complete with docstring, is
def longest_run(i, s):
    'Counts the longest run of item "i" in sequence "s".'
    c, m = 0, 0
    for el in s:
        if el == i:
            c += 1
        elif c:
            m = m if m >= c else c
            c = 0
    return m if m >= c else c  # compare once more, in case the longest run ends the sequence
We initialize c (current run) and m (maximum run so far) to zero,
then we loop, looking at every element el of the argument sequence s.
The logic is straightforward except for elif c:, whose block is executed at the end of a run (because c is greater than zero and hence logically True) but not on every other mismatch. The savings are small, but they are savings... Note that the final return compares m and c one last time, to catch a run that extends to the very end of the sequence.
Flipping coins (and more...)
How can we simulate flipping n coins? We abstract the problem and recognize that flipping n coins corresponds to choosing from a collection of possible outcomes (for a coin, either head or tail) for n times.
As it happens, the random module of the standard library has the exact answer to this problem
In [52]: random.choices?
Signature: choices(population, weights=None, *, cum_weights=None, k=1)
Docstring:
Return a k sized list of population elements chosen with replacement.
If the relative weights or cumulative weights are not specified,
the selections are made with equal probability.
File: ~/lib/miniconda3/lib/python3.6/random.py
Type: method
Our implementation, aimed at hiding details, could be
def roll(n, l):
    '''Rolls "n" times a dice/coin whose face values are listed in "l".

    E.g., roll(2, range(1, 21)) -> [12, 4] simulates rolling 2 icosahedral dice.
    '''
    from random import choices
    return choices(l, k=n)
Putting this together
def longest_run(i, s):
    'Counts the longest run of item "i" in sequence "s".'
    c, m = 0, 0
    for el in s:
        if el == i:
            c += 1
        elif c:
            m = m if m >= c else c
            c = 0
    return m if m >= c else c  # compare once more, in case the longest run ends the sequence

def roll(n, l):
    '''Rolls "n" times a dice/coin whose face values are listed in "l".

    E.g., roll(2, range(1, 21)) -> [12, 4] simulates rolling 2 icosahedral dice.
    '''
    from random import choices
    return choices(l, k=n)
N = 100  # n. of flipped coins
h_or_t = ['h', 't']
random_seq_of_h_or_t = roll(N, h_or_t)
max_h = longest_run('h', random_seq_of_h_or_t)
max_t = longest_run('t', random_seq_of_h_or_t)
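To display the results, mirroring the prints in the earlier answers (a trivial addition of mine):
print("The longest series of heads is:", max_h)
print("The longest series of tails is:", max_t)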
