Longest expected head streak in 200 coinflips - python

I was trying to calculate the expected value of the longest run of consecutive heads in 200 coin flips, using Python. I came up with code that I think does the job, but it's not efficient because of the number of calculations and the amount of data storage it requires. Could someone help me make it faster and more efficient? (I took only one Python programming course last semester, without any previous knowledge of the subject.)
My code was:
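import numpy as np
from itertools import permutations
counter = 0
sett = 0
rle = []
matrix = np.zeros(200)
for i in range(0, 200):
    matrix[i] = 1
    for j in permutations(matrix):
        counter = 0  # length of the current run of heads
        sett = 0     # longest run of heads seen so far in this sequence
        for k in j:
            if k == 1:
                counter += 1
            else:
                if counter > sett:
                    sett = counter
                counter = 0
        if counter > sett:  # the sequence may end with its longest run
            sett = counter
        rle.append(sett)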
After building rle, I'd iterate over it to count how many sequences have each longest-streak length; the sum of those lengths divided by 2^200 would give me the expected value I'm looking for.
Thanks in advance for help, much appreciated!

You don't have to try all the permutations (in fact you cannot: there are 2**200 possible sequences of 200 flips), but you can do a simple Monte Carlo style simulation. Repeat the 200 coin flips many times and average the lengths of the longest streaks you get; this will be a good approximation of the expected value.
import numpy
def oneTrial(noOfCoinFlips):
    s = numpy.random.binomial(1, 0.5, noOfCoinFlips)
    maxCount = 0
    count = 0
    for x in s:
        if x == 1:
            count += 1
        else:
            count = 0
        maxCount = max(maxCount, count)
    return maxCount
numpy.mean([oneTrial(200) for x in range(10000)])
Output: 6.9843
Also see this thread for exact computation without using Python simulation.
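For comparison with the simulation, here is a minimal sketch of an exact computation (my own approach, not necessarily the one in the linked thread; the function names are hypothetical). It uses E[L] = sum over k >= 1 of P(L >= k), and counts the sequences with no run of k heads by dynamic programming over the length of the run of heads at the end of the sequence. Fractions keep every step exact.

```python
from fractions import Fraction

def count_no_run(n, k):
    """Number of H/T sequences of length n that contain no run of k heads (k >= 1)."""
    # state[j] = number of sequences so far that end in exactly j heads (0 <= j < k)
    state = [1] + [0] * (k - 1)
    for _ in range(n):
        # A tail resets the run to 0; a head extends a run of j to j + 1,
        # and runs that would reach k are dropped.
        state = [sum(state)] + state[:-1]
    return sum(state)

def expected_longest_run(n):
    """Exact expected length of the longest run of heads in n fair coin flips."""
    total = 2 ** n
    # P(L >= k) = 1 - (sequences with no run of k heads) / 2**n
    return sum(Fraction(total - count_no_run(n, k), total) for k in range(1, n + 1))

exact = expected_longest_run(200)
print(float(exact))
```

The exact value should agree with the simulated 6.9843 to within the simulation's noise; for small n it can be checked by hand (for n = 3 it gives 11/8).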

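This is an answer to a slightly different question: the expected length of the streak of heads that starts with the first toss. But as I had invested an hour and a half of my time into it, I didn't want to scrap it.
Let E(k) denote a k-head streak, i.e., you get exactly k consecutive heads from the first toss onwards, followed by a tail (or by the end of the 200 tosses).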
E(0): T { another 199 tosses that we do not care about }
E(1): H T { another 198 tosses... }
.
.
E(198): { 198 heads } T { 1 more toss }
E(199): { 199 heads } T
E(200): { 200 heads }
Note that P(0) = 0.5, which is P(tails in first toss),
whereas P(1) = 0.25, i.e., P(heads in first toss and tails in the second).
P(0) = 2**-1
P(1) = 2**-2
.
.
.
P(198) = 2**-199
P(199) = 2**-200
P(200) = 2**-200 #same as P(199)
This means that, among all 2**200 equally likely sequences of 200 tosses, you'd get
E(0) 2**199 times
E(1) 2**198 times
.
.
E(198) 2**1 times
E(199) 2**0 times and
E(200) 2**0 times.
Thus, the expected value reduces to
(0*(2**199) + 1*(2**198) + 2*(2**197) + ... + 198*(2**1) + 199*(2**0) + 200*(2**0))/2**200
This number is virtually equal to 1.
Expected_value = 1 - 2**-200
How I computed the difference between 2**200 and the numerator:
>>> diff = 2**200 - sum([ k*(2**(199-k)) for k in range(200)], 200*(2**0))
>>> diff
1
This can be generalized to n tosses as
f(n) = 1 - 2**(-n)
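The closed form is easy to check by brute force for small n. The sketch below (the function name is hypothetical) enumerates all 2**n sequences with itertools.product and averages the length of the streak of heads that starts at the first toss; full enumeration is only feasible for small n, so it stops at n = 12.

```python
from fractions import Fraction
from itertools import product

def expected_initial_streak(n):
    """Exact expected number of heads before the first tail (or the end) in n tosses."""
    total = 0
    for seq in product("HT", repeat=n):
        streak = 0
        for toss in seq:
            if toss != "H":
                break
            streak += 1
        total += streak
    return Fraction(total, 2 ** n)

for n in range(1, 13):
    assert expected_initial_streak(n) == 1 - Fraction(1, 2 ** n)
print("f(n) = 1 - 2**(-n) holds for n = 1..12")
```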

Related

Calculating time complexity of a python algorithm

I am trying to calculate the time complexity of this algorithm and I think I'm failing horribly. Can anyone tell me if I'm on the right track here? Any help would be greatly appreciated.
data = [["Jacob", "91"], ["John", "81"], ["James", "71"], ["Joe", "61"]]  # size n
name = "Joe"
not_found = True
index = 0
marks = 0
for i in range(len(data)):  # n
    if name == data[i][0]:  # n(3)
        marks = data[i][1]  # n(3)(3)
print(marks)  #1
#T(n) = n + 3n +9n^2 + 1
#T(n) = 9n^2 +4n + 1
#O(n) = 9n^2
I think a better way to approach this problem is to count how many times each statement runs.
for i in range(len(data)):
    if name == data[i][0]:  # <- this comparison runs exactly n times
        marks = data[i][1]  # <- this assignment runs at most n times
so this is O(n), since the loop body runs n times
marks = 0
for _ in range(len(data)):
    for _ in range(len(data)):
        marks += 1  # <- this statement runs n^2 times
so this is O(n^2)
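To see the difference empirically, a small sketch (the function names are hypothetical) can count how many times the innermost statement executes as the input grows: doubling n doubles the count for the single loop and quadruples it for the nested one.

```python
def single_loop_steps(data, name):
    """Count the comparisons made by the search loop from the question."""
    steps = 0
    marks = 0
    for i in range(len(data)):
        steps += 1  # one comparison per iteration
        if name == data[i][0]:
            marks = data[i][1]
    return steps

def nested_loop_steps(data):
    """Count how often the body of the doubly nested loop runs."""
    steps = 0
    for _ in range(len(data)):
        for _ in range(len(data)):
            steps += 1
    return steps

for n in (4, 8, 16):
    data = [["name%d" % i, str(i)] for i in range(n)]
    print(n, single_loop_steps(data, "Joe"), nested_loop_steps(data))
```

This prints 4 4 16, then 8 8 64, then 16 16 256: the first count grows like n, the second like n^2. Constant factors such as the "3" and "9" in the question do not change the growth rate, which is why they are dropped in big-O notation.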

How to get the correct number of distinct combination locks with a margin of error of +-2?

I am trying to solve the USACO problem Combination Lock, where you are given two lock combinations. The locks have a margin of error of +-2, so if you had a combination lock of 1-3-5, the combination 3-1-7 would still open it.
You are also given the dial size: the dial's numbers start at 1 and end at the given number, so if the dial size is 50, they run from 1 to 50. Since the beginning of the dial is adjacent to the end of the dial, the combination 49-1-3 would also open the combination lock of 1-3-5.
In this program, you have to output the number of distinct combinations that open either of the two locks. For the record, the combinations 3-2-1 and 1-2-3 are considered distinct, but 2-2-2 found for one lock and 2-2-2 found for the other are the same combination and count only once.
I have tried creating two functions, one to check whether three numbers match the constraints of the first combination lock and another to check whether three numbers match the constraints of the second combination lock.
a,b,c = 1,2,3
d,e,f = 5,6,7
dial = 50
def check(i,j,k):
    i = (i+dial) % dial
    j = (j+dial) % dial
    k = (k+dial) % dial
    if abs(a-i) <= 2 and abs(b-j) <= 2 and abs(c-k) <= 2:
        return True
    return False
def check1(i,j,k):
    i = (i+dial) % dial
    j = (j+dial) % dial
    k = (k+dial) % dial
    if abs(d-i) <= 2 and abs(e-j) <= 2 and abs(f-k) <= 2:
        return True
    return False
res = []
count = 0
for i in range(1,dial+1):
    for j in range(1,dial+1):
        for k in range(1,dial+1):
            if check(i,j,k):
                count += 1
                res.append([i,j,k])
            if check1(i,j,k):
                count += 1
                res.append([i,j,k])
print(sorted(res))
print(count)
The dial is 50 and the first combination is 1-2-3 and the second combination is 5-6-7.
The program should output 249 as the count, but it outputs 225 instead. I am not really sure why this is happening. I have added the array for display purposes only. Any help would be greatly appreciated!
You're going to a lot of trouble to solve this by brute force.
First of all, your two check routines have identical functionality: just call the same routine for both combinations, passing the correct combination as a second set of parameters.
The critical logic problem is handling the dial wrap-around: you miss picking up the adjacent numbers. Run 49 through your check against a correct value of 1:
# using a=1, i=49
i = (49+50)%50 # i = 49, unchanged
...
if abs(1-49) <= 2 ... # abs(1-49) is 48. You need it to show up as 2.
Instead, you can check each end of the dial:
a_diff = abs(i-a)
if a_diff <=2 or a_diff >= (dial-2) ...
Another way is to start by making a list of acceptable values:
a_vals = [(a-oops) % dial for oops in range(-2, 3)]
... but note that you have to change the 0 value to dial. For instance, for a value of 1, you want a list of [49, 50, 1, 2, 3].
With this done, you can check like this:
if i in a_vals and j in b_vals and k in c_vals:
    ...
If you want to use the itertools module, you can simply generate all desired combinations:
combo = set(itertools.product(a_vals, b_vals, c_vals))
Do that for both given combinations and take the union of the two sets. The length of the union is the desired answer.
I see the follow-up isn't obvious (at least, it hasn't come up in the comments), so here it is.
You have 5*5*5 solutions for each combination; start with 250 as your total.
Compute the sizes of the overlap sets: for each position in the triple, the numbers that can serve for both combinations. For your given problem, those are [3], [4], [5].
The product of those set sizes is the quantity of overlap: 1*1*1 in this case.
The overlapping solutions got double-counted, so simply subtract the extra from 250, giving the answer of 249.
For example, given 1-2-3 and 49-6-6, you would get sets
{49, 50, 1}
{4}
{4, 5}
The sizes are 3, 1, 2; the product of those numbers is 6, so your answer is 250-6 = 244
Final note: If you're careful with your modular arithmetic, you can directly compute the set sizes without building the sets, making the program very short.
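A short sketch of this counting approach, assuming 1-based dial numbers and a tolerance of 2 (the function names are hypothetical). It builds each position's window as a set rather than assuming 5 values, which also keeps it correct for dials smaller than 5, where the positions within +-2 overlap and the 5*5*5 shortcut would overcount. math.prod needs Python 3.8 or later.

```python
from math import prod

def window(center, dial, tol=2):
    """Dial numbers (1..dial) within tol of center, wrapping around the dial."""
    return {(center - 1 + offset) % dial + 1 for offset in range(-tol, tol + 1)}

def count_openers(combo1, combo2, dial, tol=2):
    """Number of distinct combinations that open at least one of the two locks."""
    windows1 = [window(x, dial, tol) for x in combo1]
    windows2 = [window(x, dial, tol) for x in combo2]
    size1 = prod(len(w) for w in windows1)
    size2 = prod(len(w) for w in windows2)
    # Combinations that open both locks are counted twice above; subtract them once.
    overlap = prod(len(w1 & w2) for w1, w2 in zip(windows1, windows2))
    return size1 + size2 - overlap

print(count_openers((1, 2, 3), (5, 6, 7), 50))   # 249
print(count_openers((1, 2, 3), (49, 6, 6), 50))  # 244
```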
Here is one approach to a semi-brute-force solution:
import itertools
#The following code assumes 0-based combinations,
#represented as tuples of numbers in the range 0 to dial - 1.
#A simple wrapper function can be used to make the
#code apply to 1-based combos.
#The following function finds all combos which open a lock with a given combo:
def combos(combo,tol,dial):
    valids = []
    for p in itertools.product(range(-tol,1+tol),repeat = 3):
        valids.append(tuple((x+i)%dial for x,i in zip(combo,p)))
    return valids
#The following finds all combos for a given iterable of target combos:
def all_combos(targets,tol,dial):
    return set(combo for target in targets for combo in combos(target,tol,dial))
For example, len(all_combos([(0,1,2),(4,5,6)],2,50)) evaluates to 249.
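The comments mention a wrapper for 1-based combinations. One possible sketch (the name all_combos_1based is hypothetical) shifts every number down by one, reuses the two functions above, and shifts the results back up; the two functions are repeated so that the block runs on its own.

```python
import itertools

def combos(combo, tol, dial):
    """All 0-based combos within tol of combo, wrapping around the dial."""
    valids = []
    for p in itertools.product(range(-tol, 1 + tol), repeat=3):
        valids.append(tuple((x + i) % dial for x, i in zip(combo, p)))
    return valids

def all_combos(targets, tol, dial):
    """Union of the opening combos for every 0-based target combo."""
    return set(combo for target in targets for combo in combos(target, tol, dial))

def all_combos_1based(targets, tol, dial):
    """Hypothetical wrapper: accepts and returns combos numbered 1..dial."""
    zero_based = [tuple(x - 1 for x in t) for t in targets]
    return {tuple(x + 1 for x in c) for c in all_combos(zero_based, tol, dial)}

print(len(all_combos_1based([(1, 2, 3), (5, 6, 7)], 2, 50)))  # 249
```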
The correct code for what you are trying to do is the following:
dial = 50
a = 1
b = 2
c = 3
d = 5
e = 6
f = 7
def check(i,j,k):
    if (abs(a-i) <= 2 or (dial-abs(a-i)) <= 2) and \
       (abs(b-j) <= 2 or (dial-abs(b-j)) <= 2) and \
       (abs(c-k) <= 2 or (dial-abs(c-k)) <= 2):
        return True
    return False
def check1(i,j,k):
    if (abs(d-i) <= 2 or (dial-abs(d-i)) <= 2) and \
       (abs(e-j) <= 2 or (dial-abs(e-j)) <= 2) and \
       (abs(f-k) <= 2 or (dial-abs(f-k)) <= 2):
        return True
    return False
res = []
count = 0
for i in range(1,dial+1):
    for j in range(1,dial+1):
        for k in range(1,dial+1):
            if check(i,j,k):
                count += 1
                res.append([i,j,k])
            elif check1(i,j,k):
                count += 1
                res.append([i,j,k])
print(sorted(res))
print(count)
And the result is 249: the total number of combinations is 2*(5**3) = 250, but [3, 4, 5] opens both locks, and the elif makes sure it is counted only once.

Q: Expected number of coin tosses to get N heads in a row, in Python. My code gives answers that don't match published correct ones, but unsure why

I'm trying to write Python code to see how many coin tosses, on average, are required to get a sequence of N heads in a row.
The thing that I'm puzzled by is that the answers produced by my code don't match ones that are given online, e.g. here (and many other places) https://math.stackexchange.com/questions/364038/expected-number-of-coin-tosses-to-get-five-consecutive-heads
According to that, the expected number of tosses that I should need to get various numbers of heads in a row are: E(1) = 2, E(2) = 6, E(3) = 14, E(4) = 30, E(5) = 62. But I don't get those answers! For example, I get E(3) = 8, instead of 14. The code below runs to give that answer, but you can change n to test for other target numbers of heads in a row.
What is going wrong? Presumably there is some error in the logic of my code, but I confess that I can't figure out what it is.
You can see, run and make modified copies of my code here: https://trinket.io/python/17154b2cbd
Below is the code itself, outside of that runnable trinket.io page. Any help figuring out what's wrong with it would be greatly appreciated!
Many thanks,
Raj
P.S. The closest related question that I could find was this one: Monte-Carlo Simulation of expected tosses for two consecutive heads in python
However, as far as I can see, the code in that question does not actually test for two consecutive heads, but instead tests for a sequence that starts with a head and then at some later, possibly non-consecutive, time gets another head.
# Click here to run and/or modify this code:
# https://trinket.io/python/17154b2cbd
import random
# n is the target number of heads in a row
# Change the value of n, for different target heads-sequences
n = 3
possible_tosses = [ 'h', 't' ]
num_trials = 1000
target_seq = ['h' for i in range(0,n)]
toss_sequence = []
seq_lengths_rec = []
for trial_num in range(0,num_trials):
    if (trial_num % 100) == 0:
        print 'Trial num', trial_num, 'out of', num_trials
        # (The free version of trinket.io uses Python2)
    target_reached = 0
    toss_num = 0
    while target_reached == 0:
        toss_num += 1
        random.shuffle(possible_tosses)
        this_toss = possible_tosses[0]
        #print([toss_num, this_toss])
        toss_sequence.append(this_toss)
        last_n_tosses = toss_sequence[-n:]
        #print(last_n_tosses)
        if last_n_tosses == target_seq:
            #print('Reached target at toss', toss_num)
            target_reached = 1
            seq_lengths_rec.append(toss_num)
print 'Average', sum(seq_lengths_rec) / len(seq_lengths_rec)
You don't re-initialize toss_sequence for each experiment, so every experiment after the first starts with the tosses of the previous one, which ended in n heads. The last n-1 of those heads already count toward the target, so each new experiment has a 1 in 2 chance of hitting the target sequence on its very first toss.
Initializing toss_sequence inside the outer loop will solve your problem:
import random
# n is the target number of heads in a row
# Change the value of n, for different target heads-sequences
n = 4
possible_tosses = [ 'h', 't' ]
num_trials = 1000
target_seq = ['h' for i in range(0,n)]
seq_lengths_rec = []
for trial_num in range(0,num_trials):
    if (trial_num % 100) == 0:
        print('Trial num {} out of {}'.format(trial_num, num_trials))
        # (The free version of trinket.io uses Python2)
    target_reached = 0
    toss_num = 0
    toss_sequence = []
    while target_reached == 0:
        toss_num += 1
        random.shuffle(possible_tosses)
        this_toss = possible_tosses[0]
        #print([toss_num, this_toss])
        toss_sequence.append(this_toss)
        last_n_tosses = toss_sequence[-n:]
        #print(last_n_tosses)
        if last_n_tosses == target_seq:
            #print('Reached target at toss', toss_num)
            target_reached = 1
            seq_lengths_rec.append(toss_num)
print(sum(seq_lengths_rec) / len(seq_lengths_rec))
You can simplify your code a bit, and make it less error-prone:
import random
# n is the target number of heads in a row
# Change the value of n, for different target heads-sequences
n = 3
possible_tosses = [ 'h', 't' ]
num_trials = 1000
seq_lengths_rec = []
for trial_num in range(0, num_trials):
    if (trial_num % 100) == 0:
        print('Trial num {} out of {}'.format(trial_num, num_trials))
        # (The free version of trinket.io uses Python2)
    heads_counter = 0
    toss_counter = 0
    while heads_counter < n:
        toss_counter += 1
        this_toss = random.choice(possible_tosses)
        if this_toss == 'h':
            heads_counter += 1
        else:
            heads_counter = 0
    seq_lengths_rec.append(toss_counter)
print(sum(seq_lengths_rec) / len(seq_lengths_rec))
We can eliminate one of the loops by making each trial a single, long enough (ideally infinite) run of tosses, e.g., tossing a coin n=1000 times per trial. It is then very likely that the sequence of 5 heads appears in each such trial. If it does, we call the trial an effective trial; otherwise we reject the trial.
In the end, we average the number of tosses needed over the effective trials (by the law of large numbers, this approximates the expected number of tosses). Consider the following code:
import random
N = 100000 # total number of trials
n = 1000 # long enough sequence of tosses
k = 5 # k heads in a row
ntosses = []
pat = ''.join(['1']*k)
effective_trials = 0
for i in range(N): # num of trials
    seq = ''.join(map(str,random.choices(range(2),k=n))) # toss a coin n times (long enough times)
    if pat in seq:
        ntosses.append(seq.index(pat) + k)
        effective_trials += 1
print(effective_trials, sum(ntosses) / effective_trials)
# 100000 62.19919
Notice that the result may be biased if n is small, since the method approximates an infinite sequence of coin tosses and silently discards the trials in which the pattern never appears. To find the expected number of tosses needed for 5 heads in a row, n=1000 is fine, since the actual expected value is 62.
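As a check on any of these simulations, the expected value can also be computed exactly. A minimal sketch, treating the current streak of heads as the state of a Markov chain (my own framing, not taken from the linked math.stackexchange answer; the function name is hypothetical): if E_j is the expected number of further tosses given a current streak of j heads, then E_n = 0 and E_j = 1 + E_{j+1}/2 + E_0/2. Writing each E_j as a_j + b_j*E_0 and substituting backwards from j = n gives E_0 = a_0 / (1 - b_0).

```python
from fractions import Fraction

def expected_tosses(n):
    """Exact expected number of fair-coin tosses until n heads in a row."""
    half = Fraction(1, 2)
    # E_n = 0, i.e. a = 0 and b = 0 in E_j = a + b * E_0
    a, b = Fraction(0), Fraction(0)
    for _ in range(n):
        # E_j = 1 + (a_{j+1} + b_{j+1} * E_0) / 2 + E_0 / 2
        a, b = 1 + half * a, half * b + half
    return a / (1 - b)

print([int(expected_tosses(n)) for n in range(1, 6)])  # [2, 6, 14, 30, 62]
```

These match the published values E(1) = 2, ..., E(5) = 62, i.e. 2**(n+1) - 2.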

Changing this Python program to have function def()

The following Python program flips a coin several times, then reports the longest series of heads and tails. I am trying to convert this program into one that uses functions, so that it uses less code. I am very new to programming and my teacher requested this of us, but I have no idea how to do it. I know I'm supposed to have the function accept 2 parameters: a string or list, and a character to search for. The function should return, as the value of the function, an integer which is the longest sequence of that character in that string. The function shouldn't take input from or produce output for the user.
import random
print("This program flips a coin several times, \nthen reports the longest series of heads and tails")
cointoss = int(input("Number of times to flip the coin: "))
varlist = []
i = 0
varstring = ' '
while i < cointoss:
    r = random.choice('HT')
    varlist.append(r)
    varstring = varstring + r
    i += 1
print(varstring)
print(varlist)
print("There's this many heads: ",varstring.count("H"))
print("There's this many tails: ",varstring.count("T"))
print("Processing input...")
i = 0
longest_h = 0
longest_t = 0
inarow = 0
prevIn = 0
while i < cointoss:
    print(varlist[i])
    if varlist[i] == 'H':
        prevIn += 1
        if prevIn > longest_h:
            longest_h = prevIn
        print("",longest_h,"")
        inarow = 0
    if varlist[i] == 'T':
        inarow += 1
        if inarow > longest_t:
            longest_t = inarow
        print("",longest_t,"")
        prevIn = 0
    i += 1
print ("The longest series of heads is: ",longest_h)
print ("The longest series of tails is: ",longest_t)
If this is asking too much, any explanatory help would be really nice instead. All I've got so far is:
def flip (a, b):
    flipValue = random.randint
but it's barely anything.
import random
def Main():
    numOfFlips=getFlips()
    outcome=flipping(numOfFlips)
    print(outcome)
def getFlips():
    Flips=int(input("Enter number of flips:\n"))
    return Flips
def flipping(numOfFlips):
    longHeads=[]
    longTails=[]
    Tails=0
    Heads=0
    for flips in range(0,numOfFlips):
        flipValue=random.randint(1,2)
        print(flipValue)
        if flipValue==1:
            Tails+=1
            longHeads.append(Heads) #recording value of Heads before resetting it
            Heads=0
        else:
            Heads+=1
            longTails.append(Tails)
            Tails=0
    longHeads.append(Heads) #the final run is never reset, so record it as well
    longTails.append(Tails)
    longestHeads=max(longHeads) #chooses the greatest length from both lists
    longestTails=max(longTails)
    return "Longest heads:\t"+str(longestHeads)+"\nLongest tails:\t"+str(longestTails)
Main()
I did not quite understand how your code worked, so I rewrote it using functions, and it works just as well. There are probably ways of improving my code, but I have moved the logic over to functions.
First, you need a function that flips a coin x times. This would be one possible implementation, favoring random.choice over random.randint:
def flip(x):
    result = []
    for _ in range(x):
        result.append(random.choice(("h", "t")))
    return result
Of course, you could also pass the collection of outcomes to choose from as a parameter.
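For instance, here is a sketch with the outcomes as a second parameter (the parameter name faces and its default are my own choice); random.choices, available since Python 3.6, draws all x outcomes in one call:

```python
import random

def flip(x, faces=("h", "t")):
    """Return a list of x outcomes, each drawn uniformly from faces."""
    return random.choices(faces, k=x)

print(flip(10))           # ten coin flips
print(flip(5, "123456"))  # five rolls of a die, as strings
```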
Next, you need a function that finds the longest sequence of some value in some list:
def longest_series(some_value, some_list):
    current, longest = 0, 0
    for r in some_list:
        if r == some_value:
            current += 1
            longest = max(current, longest)
        else:
            current = 0
    return longest
And now you can call these in the right order:
# initialize the random number generator, so we get the same result
random.seed(5)
# toss a coin a hundred times
series = flip(100)
# count heads/tails
headflips = longest_series('h', series)
tailflips = longest_series('t', series)
# print the results
print("The longest series of heads is: " + str(headflips))
print("The longest series of tails is: " + str(tailflips))
Output:
>> The longest series of heads is: 8
>> The longest series of tails is: 5
Edit: removed the flip implementation with yield; it made the code weird.
Counting the longest run
Let's see what you have asked for:
I'm supposed to have the function accept 2 parameters: a string or list,
or, generalizing just a bit, a sequence
and a character
again, we'd speak, generically, of an item
to search for. The function should return, as the value of the
function, an integer which is the longest sequence of that character
in that string.
My implementation of the function you are asking for, complete with a docstring, is
def longest_run(i, s):
    'Counts the longest run of item "i" in sequence "s".'
    c, m = 0, 0
    for el in s:
        if el==i:
            c += 1
        elif c:
            m = m if m >= c else c
            c = 0
    return m if m >= c else c  # a run may still be open at the end of s
We initialize c (current run) and m (maximum run so far) to zero,
then we loop, looking at every element el of the argument sequence s.
The logic is straightforward except for elif c:, whose block is executed at the end of a run (because c is greater than zero and therefore truthy) but not when the previous item was also different from i. The savings are small, but they are savings. For the same reason, the return statement compares m with c one last time: if s ends with a run of i, no different item has closed that run yet.
Flipping coins (and more...)
How can we simulate flipping n coins? We abstract the problem and recognize that flipping n coins corresponds to choosing n times from a collection of possible outcomes (for a coin, either heads or tails).
As it happens, the random module of the standard library has the exact answer to this problem:
In [52]: random.choices?
Signature: choices(population, weights=None, *, cum_weights=None, k=1)
Docstring:
Return a k sized list of population elements chosen with replacement.
If the relative weights or cumulative weights are not specified,
the selections are made with equal probability.
File: ~/lib/miniconda3/lib/python3.6/random.py
Type: method
Our implementation, aimed at hiding details, could be
def roll(n, l):
    '''Rolls a die/coin "n" times; its face values are listed in "l".
    E.g., roll(2, range(1,21)) -> [12, 4] simulates rolling 2 icosahedral dice.
    '''
    from random import choices
    return choices(l, k=n)
Putting this together:
def longest_run(i, s):
    'Counts the longest run of item "i" in sequence "s".'
    c, m = 0, 0
    for el in s:
        if el==i:
            c += 1
        elif c:
            m = m if m >= c else c
            c = 0
    return m if m >= c else c  # a run may still be open at the end of s
def roll(n, l):
    '''Rolls a die/coin "n" times; its face values are listed in "l".
    E.g., roll(2, range(1,21)) -> [12, 4] simulates rolling 2 icosahedral dice.
    '''
    from random import choices
    return choices(l, k=n)
N = 100 # n. of flipped coins
h_or_t = ['h', 't']
random_seq_of_h_or_t = roll(N, h_or_t)
max_h = longest_run('h', random_seq_of_h_or_t)
max_t = longest_run('t', random_seq_of_h_or_t)
print(max_h, max_t)
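A more compact alternative swaps in itertools.groupby, which is a different technique from the explicit loop above: groupby splits the sequence into runs of equal consecutive items, so the longest run of an item is the largest length among that item's runs. The name longest_run_groupby is hypothetical; default=0 (Python 3.4+) covers sequences in which the item never appears.

```python
from itertools import groupby

def longest_run_groupby(item, seq):
    """Length of the longest run of consecutive `item`s in `seq` (0 if absent)."""
    return max((sum(1 for _ in run) for key, run in groupby(seq) if key == item),
               default=0)

print(longest_run_groupby('h', ['h', 't', 'h', 'h', 'h', 't']))  # 3
print(longest_run_groupby('t', 'hhhh'))                          # 0
```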

Standard deviation of combinations of dices

I am trying to find the standard deviation of a sequence of numbers extracted from the combinations of 30 dice that sum up to 120. I am very new to Python, and this code makes the console freeze because there are far too many combinations (6**30); I am not sure how to fit them all into a smaller, more efficient function. What I did is:
found all possible combinations of 30 dice;
filtered the combinations that sum up to 120;
multiplied together the numbers in each remaining combination;
tried extracting the standard deviation of those products.
Here is the code:
import itertools
import numpy
dice = [1,2,3,4,5,6]
subset = itertools.product(dice, repeat = 30)
result = []
for x in subset:
    if sum(x) == 120:
        result.append(x)
my_result = numpy.prod(result, axis = 1).tolist()
std = numpy.std(my_result)
print(std)
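Note that D(X) = E(X^2) - E(X)^2, so you can solve this problem analytically with the following recurrences. Over all ordered rolls of i dice that sum to N, f[i][N] is the sum of their products, g[i][N] is the sum of their squared products, and h[i][N] is their count; the answer is then sqrt(g/h - (f/h)^2) at i = 30, N = 120.
f[i][N] = sum(k*f[i-1][N-k]) (1<=k<=6)
g[i][N] = sum(k^2*g[i-1][N-k]) (1<=k<=6)
h[i][N] = sum(h[i-1][N-k]) (1<=k<=6)
f[1][k] = k (1<=k<=6)
g[1][k] = k^2 (1<=k<=6)
h[1][k] = 1 (1<=k<=6)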
Sample implementation:
import numpy as np
Nmax = 120
nmax = 30
min_value = 1
max_value = 6
f = np.zeros((nmax+1, Nmax+1), dtype ='object')
g = np.zeros((nmax+1, Nmax+1), dtype ='object') # the intermediate results will be really huge, to keep them accurate we have to utilize python big-int
h = np.zeros((nmax+1, Nmax+1), dtype ='object')
for i in range(min_value, max_value+1):
    f[1][i] = i
    g[1][i] = i**2
    h[1][i] = 1
for i in range(2, nmax+1):
    for N in range(1, Nmax+1):
        f[i][N] = 0
        g[i][N] = 0
        h[i][N] = 0
        for k in range(min_value, max_value+1):
            if N - k < 1:  # a negative index would wrap around to the end of the row
                break
            f[i][N] += k*f[i-1][N-k]
            g[i][N] += (k**2)*g[i-1][N-k]
            h[i][N] += h[i-1][N-k]
result = np.sqrt(float(g[nmax][Nmax]) / h[nmax][Nmax] - (float(f[nmax][Nmax]) / h[nmax][Nmax]) ** 2)
# result = 32128174994365296.0
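The same recurrences can be cross-checked against brute force on cases small enough to enumerate. A minimal sketch in plain Python (the function names are hypothetical): product_moments keeps, for every partial sum, the triple (h, f, g) from the recurrences above, starting from zero dice with one empty roll whose product is 1, while brute_force computes the same triple with itertools.product. Python integers are unbounded, so no object arrays are needed.

```python
from fractions import Fraction
from itertools import product
from math import prod, sqrt

FACES = range(1, 7)

def product_moments(n_dice, total):
    """(h, f, g) over ordered rolls of n_dice dice summing to total: the count,
    the sum of the products and the sum of the squared products."""
    layer = {0: (1, 1, 1)}  # zero dice: one empty roll with product 1
    for _ in range(n_dice):
        nxt = {}
        for s, (h, f, g) in layer.items():
            for k in FACES:
                if s + k > total:
                    break  # sums only grow, so larger faces cannot help
                H, F, G = nxt.get(s + k, (0, 0, 0))
                nxt[s + k] = (H + h, F + k * f, G + k * k * g)
        layer = nxt
    return layer.get(total, (0, 0, 0))

def brute_force(n_dice, total):
    """Same triple by enumerating all 6**n_dice rolls (small n_dice only)."""
    prods = [prod(r) for r in product(FACES, repeat=n_dice) if sum(r) == total]
    return len(prods), sum(prods), sum(p * p for p in prods)

def product_std(n_dice, total):
    """Population standard deviation of the product (what numpy.std computes)."""
    h, f, g = product_moments(n_dice, total)
    variance = Fraction(g, h) - Fraction(f, h) ** 2
    return sqrt(float(variance))

for n_dice, total in [(2, 7), (4, 14), (5, 17)]:
    assert product_moments(n_dice, total) == brute_force(n_dice, total)
print(product_std(30, 120))
```

If the recurrences are implemented correctly, the last line should agree with the value quoted above up to floating-point rounding.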
You are asking for a result over 6**30 (about 2.2*10^23) unfiltered combinations, which is impossible to handle as such.
There are two possibilities that can be combined:
1. Put more thought into pre-treating the problem, e.g. into how to sample only those combinations with sum 120.
2. Do a Monte Carlo simulation instead, i.e. don't sample all combinations, but only a random couple of thousand, to obtain a representative sample that determines the std sufficiently accurately.
Now, I only apply (2), giving the brute force code:
import random
N = 30 # number of dice
M = 100000 # number of samples
S = 120 # required sum
result = [[random.randint(1,6) for _ in range(N)] for _ in range(M)]
result = [s for s in result if sum(s) == S]
Now, that result should be comparable to your result before using numpy.product ... that part I couldn't follow, though...
OK, if you are after the standard deviation of the product of the 30 dice, that is what your code does. Then I need 1,000,000 samples to get roughly reproducible values for the std (1 digit); that takes my PC about 20 seconds, still considerably less than 1 million years :-D.
Is a number like 3.22*10^16 what you are looking for?
Edit after comments:
Well, sampling the frequency of each face instead gives only 6 variables, and only 4 independent ones after substituting the constraints (sum = 120, total number of dice = 30). My current code looks like this:
import itertools
import numpy
def p2(b, s):
    return 2**b * 3**s[0] * 4**s[1] * 5**s[2] * 6**s[3]
hits = range(31)
subset = itertools.product(hits, repeat=4) # only 3,4,5,6 frequencies
product = []
permutations = []
for s in subset:
    b = 90 - (2*s[0] + 3*s[1] + 4*s[2] + 5*s[3]) # 2 frequency
    a = 30 - (b + sum(s)) # 1 frequency
    if 0 <= b <= 30 and 0 <= a <= 30:
        product.append(p2(b, s))
        permutations.append(1) # TODO: Replace 1 with possible permutations
print(numpy.std(product)) # TODO: calculate std manually, considering permutations
This computes in about 1 second, but the confusing part is that I get as a result 1.28737023733e+17. Either my previous approaches or this one has a bug - or both.
Sorry, it's not that easy: the face-count vectors do not all have the same probability, and that is the problem here. Each one corresponds to a different number of possible combinations, which gives its weight and has to be considered before taking the standard deviation. I have drafted that in the code above.
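One way to fill in those TODOs, as a sketch (the function names are hypothetical): weight each face-count vector (n1, ..., n6) by the number of ordered rolls it corresponds to, the multinomial coefficient 30!/(n1!*n2!*...*n6!), and accumulate the count, the sum of products and the sum of squared products with exact integers. The loops are nested so that n3 + n4 + n5 + n6 never exceeds the number of dice, which keeps the enumeration small. If the weighting is right, the result should agree with the dynamic-programming answer above rather than with 1.28737023733e+17.

```python
from fractions import Fraction
from math import factorial, sqrt

def multinomial(counts):
    """Number of distinct orderings of a multiset with the given face counts."""
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)
    return result

def weighted_product_std(n_dice=30, total=120):
    """Std of the product of n_dice dice over all ordered rolls summing to total."""
    h = f = g = 0  # weighted count, sum of products, sum of squared products
    n = n_dice
    for n6 in range(n + 1):
        for n5 in range(n + 1 - n6):
            for n4 in range(n + 1 - n6 - n5):
                for n3 in range(n + 1 - n6 - n5 - n4):
                    # From n1+...+n6 = n and 1*n1+...+6*n6 = total:
                    n2 = (total - n) - (2 * n3 + 3 * n4 + 4 * n5 + 5 * n6)
                    n1 = n - (n2 + n3 + n4 + n5 + n6)
                    if n2 < 0 or n1 < 0:
                        continue
                    weight = multinomial((n1, n2, n3, n4, n5, n6))
                    p = 2**n2 * 3**n3 * 4**n4 * 5**n5 * 6**n6
                    h += weight
                    f += weight * p
                    g += weight * p * p
    variance = Fraction(g, h) - Fraction(f, h) ** 2
    return sqrt(float(variance))

print(weighted_product_std())
```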
