Paradox python algorithm - python

I am trying to solve a variant of the birthday paradox question: I need to find the number of people n for which the probability that at least 4 of them have their birthdays within a week of each other is 0.5.
I have written code that simulates the classic case, where 2 people have their birthdays on the same day.
import numpy
import matplotlib.pylab as plt
no_of_simulations = 1000
milestone_probabilities = [50, 75, 90, 99]
milestone_current = 0
def birthday_paradox(no_of_people, simulations):
    global milestone_probabilities, milestone_current
    same_birthday_four_people = 0
    #We assume that there are 365 days in all years.
    for sim in range(simulations):
        birthdays = numpy.random.choice(365, no_of_people, replace=True)
        unique_birthdays = set(birthdays)
        if len(unique_birthdays) < no_of_people:
            same_birthday_four_people += 1
    success_fraction = same_birthday_four_people/simulations
    if milestone_current < len(milestone_probabilities) and success_fraction*100 > milestone_probabilities[milestone_current]:
        print("P(Four people sharing birthday in a room with " + str(no_of_people) + " people) = " + str(success_fraction))
        milestone_current += 1
    return success_fraction
def main():
    day = []
    success = []
    for i in range(1, 366): #Executing for all possible cases where can have unique birthdays, i.e. from 1 person to a maximum of 365 people in a room
        day.append(i)
        success.append(birthday_paradox(i, no_of_simulations))
    plt.plot(day, success)
    plt.show()
main()
I am looking to modify the code to look for sets of 4 instead of 2 and then calculate the difference between them to be less than equal to 7 in order to meet the question.
Am I going down the right path or should I approach the question differently?

The key part of your algorithm is in these lines:
unique_birthdays = set(birthdays)
if len(unique_birthdays) < no_of_people:
    same_birthday_four_people += 1
Comparing the number of unique birthdays to the number of people worked when you tested whether two different people had the same birthday, but it won't do for your new test.
Define a new function that receives the birthday array and returns True or False after checking whether 4 different people have birthdays within a range of 7 days:
def four_birthdays_same_week(birthdays):
    # fill in this function's code
    ...
def birthday_paradox(no_of_people, simulations):
    ...
(this function can be defined outside the birthday_paradox function)
Then change this code:
if len(unique_birthdays) < no_of_people:
    same_birthday_four_people += 1
into:
if four_birthdays_same_week(birthdays):
    same_birthday_four_people += 1
Regarding the algorithm for checking whether 4 different birthdays fall within the same week: a basic idea is to sort the array of birthdays, then for every group of 4 consecutive birthdays in the sorted array check whether the range of days between the first and the last is less than or equal to 7.
If it is, the function can immediately return True.
(I am sure this algorithm can be vastly improved.)
If the whole array has been scanned without returning True, the function can return False.
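The sort-and-scan idea described above could be sketched roughly as follows. This is only one possible way to fill in four_birthdays_same_week; the window and days_in_year parameters are hypothetical additions, and treating birthdays at the end of December and the start of January as neighbours (wrap-around) is an assumption the original question does not spell out.

```python
def four_birthdays_same_week(birthdays, window=7, days_in_year=365):
    """Return True if any 4 birthdays lie within `window` days of each other."""
    days = sorted(int(b) for b in birthdays)
    if len(days) < 4:
        return False
    # Assumption: the year wraps around, so repeat the first three
    # birthdays shifted by one year at the end of the sorted list.
    extended = days + [d + days_in_year for d in days[:3]]
    # In sorted order, the tightest group of 4 is always 4 neighbours.
    for i in range(len(days)):
        if extended[i + 3] - extended[i] <= window:
            return True
    return False


print(four_birthdays_same_week([10, 200, 12, 14, 300, 16]))  # True
print(four_birthdays_same_week([0, 100, 200, 300]))          # False
```

Because the function only relies on sorted(), it accepts either a plain list or the NumPy array produced by numpy.random.choice in the question.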

Related

How to loop in python to target amount?

I'm having difficulty with the problem below and am not sure what I'm doing wrong. My goal is to write a function that takes three arguments and uses a loop to work out how many periods of compound interest a deposit needs to reach a target amount. I've included what I have below, but I can't seem to get the number of periods.
Example:
period(1000, .05, 2000) - answer 15
where d is initial deposit, r is interest rate and t is target amount.
new_deposit = 0
def periods (d,r,t):
    while d*(1+r)<=t:
        new_deposit = d*(1+r) - d
        print(new_deposit)
    return periods
I'm very new to this so not sure where I'm going wrong.
You were close, but your return statement returns the function itself, since periods is never set to a number.
def periods(d,r,t):
    count_periods = 1
    current_ammount = d
    while current_ammount*(1+r)<=t:
        current_ammount = current_ammount*(1+r)
        count_periods+=1
        print(current_ammount)
    return count_periods
print(periods(100, 0.01, 105))
I renamed the return variable so that it does not clash with the name of the function itself.
EDIT: Sorry, your logic was flawed throughout the code, so I rewrote it.
def periods(d, r, t):
    p = 0
    while d < t:
        d *= (1 + r)
        p += 1
    return p
periods(100, .01, 105) # 5
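As a cross-check on the loop, the same count can also be derived without iterating: the deposit after p periods is d*(1+r)**p, so the smallest p with d*(1+r)**p >= t is the ceiling of log(t/d)/log(1+r). The sketch below uses a hypothetical helper name, periods_closed_form, and assumes d > 0 and r > 0; because it relies on floating-point logarithms, it may be off by one when the target is hit exactly, so the loop remains the safer reference.

```python
import math


def periods_closed_form(d, r, t):
    """Smallest number of compounding periods p with d * (1 + r) ** p >= t."""
    if t <= d:
        return 0  # the target is already reached
    return math.ceil(math.log(t / d) / math.log(1 + r))


print(periods_closed_form(1000, 0.05, 2000))  # 15
print(periods_closed_form(100, 0.01, 105))    # 5
```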

How to efficiently process a list that is continuously being appended with new items in Python

Objective:
To visualize the population size of a particular organism over finite time.
Assumptions:
The organism has a life span of age_limit days.
Only females older than day_lay_egg days can lay eggs, and a female is allowed to lay eggs a maximum of max_lay_egg times. In each breeding session, a maximum of egg_no eggs can be laid, each with a 50% probability of producing male offspring.
The initial population of 3 organisms consists of 2 females and 1 male.
Code Snippets:
Currently, the code below produces the expected output:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
def get_breeding(d,**kwargs):
    if d['lay_egg'] <= kwargs['max_lay_egg'] and d['dborn'] > kwargs['day_lay_egg'] and d['s'] == 1:
        nums = np.random.choice([0, 1], size=kwargs['egg_no'], p=[.5, .5]).tolist()
        npol=[dict(s=x,d=d['d'], lay_egg=0, dborn=0) for x in nums]
        d['lay_egg'] = d['lay_egg'] + 1
        return d,npol
    return d,None
def to_loop_initial_population(**kwargs):
    npol=kwargs['ipol']
    nday = 0
    total_population_per_day = []
    while nday < kwargs['nday_limit']:
        # print(f'Executing day {nday}')
        k = []
        for dpol in npol:
            dpol['d'] += 1
            dpol['dborn'] += 1
            dpol,h = get_breeding(dpol,**kwargs)
            if h is None and dpol['dborn'] <= kwargs['age_limit']:
                # If beyond the age limit, ignore the parent and update only the decedent
                k.append(dpol)
            elif isinstance(h, list) and dpol['dborn'] <= kwargs['age_limit']:
                # If below age limit, append the parent and its offspring
                h.extend([dpol])
                k.extend(h)
        total_population_per_day.append(dict(nsize=len(k), day=nday))
        nday += 1
        npol = k
    return total_population_per_day
## Some spec and store all setting in a dict
numsex=[1,1,0] # 0: Male, 1: Female
# s: sex, d: day, lay_egg: Number of time the female lay an egg, dborn: The organism age
ipol=[dict(s=x,d=0, lay_egg=0, dborn=0) for x in numsex] # The initial population
age_limit = 45 # Age limit for the species
egg_no=3 # Number of eggs
day_lay_egg = 30 # Matured age for egg laying
nday_limit=360
max_lay_egg=2
para=dict(nday_limit=nday_limit,ipol=ipol,age_limit=age_limit,
          egg_no=egg_no,day_lay_egg=day_lay_egg,max_lay_egg=max_lay_egg)
dpopulation = to_loop_initial_population(**para)
### make some plot
df = pd.DataFrame(dpopulation)
sns.lineplot(x="day", y="nsize", data=df)
plt.xticks(rotation=15)
plt.title('Day vs population')
plt.show()
Output:
Problem/Question:
The execution time increases exponentially with nday_limit. I need to improve the efficiency of the code. How can I speed up the running time?
Other Thoughts:
I was tempted to apply joblib as below. To my surprise, the execution time got worse.
import itertools
from joblib import Parallel, delayed
def djob(dpol,k,**kwargs):
    dpol['d'] = dpol['d'] + 1
    dpol['dborn'] = dpol['dborn'] + 1
    dpol,h = get_breeding(dpol,**kwargs)
    if h is None and dpol['dborn'] <= kwargs['age_limit']:
        # If beyond the age limit, ignore the that particular subject
        k.append(dpol)
    elif isinstance(h, list) and dpol['dborn'] <= kwargs['age_limit']:
        # If below age limit, append the parent and its offspring
        h.extend([dpol])
        k.extend(h)
    return k
def to_loop_initial_population(**kwargs):
    npol=kwargs['ipol']
    nday = 0
    total_population_per_day = []
    while nday < kwargs['nday_limit']:
        k = []
        njob=1 if len(npol)<=50 else 4
        if njob==1:
            print(f'Executing day {nday} with single cpu')
            for dpols in npol:
                k=djob(dpols,k,**kwargs)
        else:
            print(f'Executing day {nday} with single parallel')
            k=Parallel(n_jobs=-1)(delayed(djob)(dpols,k,**kwargs) for dpols in npol)
            k = list(itertools.chain(*k))
            ll=1
        total_population_per_day.append(dict(nsize=len(k), day=nday))
        nday += 1
        npol = k
    return total_population_per_day
for
nday_limit=365
Your code looks alright overall but I can see several points of improvement that are slowing your code down significantly.
Note, though, that you can't entirely prevent the code from slowing down as nday_limit increases, since the population you need to keep track of keeps growing and you keep repopulating a list to track it. As the number of objects increases, each loop is expected to take longer to complete, but you can reduce the time it takes to complete a single loop.
elif isinstance(h, list) and dpol['dborn'] <= kwargs['age_limit']:
Here you check the type of h on every single iteration, right after checking whether it is None. You know for a fact that h is going to be a list otherwise; if the list could not be created, your code would raise an error before even reaching that line.
Furthermore, you check the age of dpol redundantly, and you first extend h by dpol and then k by h when one step would do. Together with the previous issue, this can be simplified to:
if dpol['dborn'] <= kwargs['age_limit']:
    k.append(dpol)
if h:
    k.extend(h)
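The results are identical as long as no female breeds after passing age_limit, which holds for the parameters in your question.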
Additionally, you're passing around a lot of **kwargs. This is a sign that your code should be a class instead, where some unchanging parameters are saved through self.parameter. You could even use a dataclass here (https://docs.python.org/3/library/dataclasses.html)
Also, you mix responsibilities of functions which is unnecessary and makes your code more confusing. For instance:
def get_breeding(d,**kwargs):
    if d['lay_egg'] <= kwargs['max_lay_egg'] and d['dborn'] > kwargs['day_lay_egg'] and d['s'] == 1:
        nums = np.random.choice([0, 1], size=kwargs['egg_no'], p=[.5, .5]).tolist()
        npol=[dict(s=x,d=d['d'], lay_egg=0, dborn=0) for x in nums]
        d['lay_egg'] = d['lay_egg'] + 1
        return d,npol
    return d,None
This code contains two responsibilities: Generating a new individual if conditions are met, and checking these conditions, and returning two different things based on them.
This would be better done through two separate functions, one which simply checks the conditions, and another that generates a new individual as follows:
def check_breeding(d, max_lay_egg, day_lay_egg):
    return d['lay_egg'] <= max_lay_egg and d['dborn'] > day_lay_egg and d['s'] == 1
def get_breeding(d, egg_no):
    nums = np.random.choice([0, 1], size=egg_no, p=[.5, .5]).tolist()
    npol=[dict(s=x, d=d['d'], lay_egg=0, dborn=0) for x in nums]
    return npol
Here d['lay_egg'] can be updated in place while iterating over the list, whenever the condition is met.
You could speed up your code even further by editing the list as you iterate over it. This is not typically recommended, but it is perfectly fine if you know what you're doing: iterate by index, limit the iteration to the length the list had before the loop started, and decrement the index when an element is removed.
Example:
i = 0
maxiter = len(npol)
while i < maxiter:
    if check_breeding(npol[i], max_lay_egg, day_lay_egg):
        npol.extend(get_breeding(npol[i], egg_no))
    if npol[i]['dborn'] > age_limit:
        npol.pop(i)
        i -= 1
        maxiter -= 1
    i += 1  # advance to the next element; without this the loop never ends
This could significantly reduce processing time, since you are no longer making a new list and appending all elements to it again on every iteration.
Finally, you could check some population growth equation and statistical methods, and you could even reduce this whole code to a calculation problem with iterations, though that wouldn't be a sim anymore.
Edit
I've fully implemented my suggestions for improvements to your code and timed them in a jupyter notebook using %%time. I've separated out function definitions from both so they wouldn't contribute to the time, and the results are telling. I also made it so females produce another female 100% of the time, to remove randomness, otherwise it would be even faster. I compared the results from both to verify they produce identical results (they do, but I removed the 'd_born' parameter cause it's not used in the code apart from setting).
Your implementation, with nday_limit=100 and day_lay_egg=15:
Wall time 23.5s
My implementation with same parameters:
Wall time 18.9s
So you can tell the difference is quite significant, and it grows even larger for larger nday_limit values.
Full implementation of edited code:
from dataclasses import dataclass
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
@dataclass
class Organism:
    sex: int
    times_laid_eggs: int = 0
    age: int = 0
    def __init__(self, sex):
        self.sex = sex
def check_breeding(d, max_lay_egg, day_lay_egg):
    return d.times_laid_eggs <= max_lay_egg and d.age > day_lay_egg and d.sex == 1
def get_breeding(egg_no): # Make sure to change probabilities back to 0.5 and 0.5 before using it
    nums = np.random.choice([0, 1], size=egg_no, p=[0.0, 1.0]).tolist()
    npol = [Organism(x) for x in nums]
    return npol
def simulate(organisms, age_limit, egg_no, day_lay_egg, max_lay_egg, nday_limit):
    npol = organisms
    nday = 0
    total_population_per_day = []
    while nday < nday_limit:
        i = 0
        maxiter = len(npol)
        while i < maxiter:
            npol[i].age += 1
            if check_breeding(npol[i], max_lay_egg, day_lay_egg):
                npol.extend(get_breeding(egg_no))
                npol[i].times_laid_eggs += 1
            if npol[i].age > age_limit:
                npol.pop(i)
                maxiter -= 1
                continue
            i += 1
        total_population_per_day.append(dict(nsize=len(npol), day=nday))
        nday += 1
    return total_population_per_day
if __name__ == "__main__":
    numsex = [1, 1, 0] # 0: Male, 1: Female
    ipol = [Organism(x) for x in numsex] # The initial population
    age_limit = 45 # Age limit for the species
    egg_no = 3 # Number of eggs
    day_lay_egg = 15 # Matured age for egg laying
    nday_limit = 100
    max_lay_egg = 2
    dpopulation = simulate(ipol, age_limit, egg_no, day_lay_egg, max_lay_egg, nday_limit)
    df = pd.DataFrame(dpopulation)
    sns.lineplot(x="day", y="nsize", data=df)
    plt.xticks(rotation=15)
    plt.title('Day vs population')
    plt.show()
Try structuring your code as a matrix like state[age][eggs_remaining] = count instead. It will have age_limit rows and max_lay_egg columns.
Males start in the 0 eggs_remaining column, and every time a female lays an egg they move down one (3->2->1->0 with your code above).
For each cycle, you just drop the last row, iterate over all the rows after day_lay_egg (the females old enough to lay), and insert a new first row with the number of newborn males and females.
If (as in your example) there only is a vanishingly small chance that a female would die of old age before laying all their eggs, you can just collapse everything into a state_alive[age][gender] = count and a state_eggs[eggs_remaining] = count instead, but it shouldn't be necessary unless the age goes really high or you want to run thousands of simulations.
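A rough sketch of this count-based idea follows. It is not the answerer's code: the function name simulate_counts, the separate males array, and the extra "no lays left" column are hypothetical choices, and it counts how many times each female has already laid rather than how many lays remain. It also assumes that a single binomial draw per day can stand in for drawing the sex of every egg individually, and that an organism past age_limit is removed before it can breed.

```python
import numpy as np


def simulate_counts(n_females, n_males, age_limit, egg_no, day_lay_egg,
                    max_lay_egg, nday_limit, seed=None):
    """Track head counts per (age, times laid) instead of one object per organism."""
    rng = np.random.default_rng(seed)
    # females[age, lays]: females of that age that have laid `lays` times.
    # The last column collects females that have used up all their lays.
    females = np.zeros((age_limit + 1, max_lay_egg + 2), dtype=np.int64)
    males = np.zeros(age_limit + 1, dtype=np.int64)
    females[0, 0] = n_females
    males[0] = n_males
    sizes = []
    for _ in range(nday_limit):
        # Age everyone by one day; whoever was at age_limit drops off the end.
        females[1:] = females[:-1].copy()
        females[0] = 0
        males[1:] = males[:-1].copy()
        males[0] = 0
        # Females older than day_lay_egg with lays left each lay once today.
        mature = females[day_lay_egg + 1:]        # view into `females`
        n_breeding = int(mature[:, :-1].sum())
        exhausted = mature[:, -1] + mature[:, -2]
        mature[:, 1:-1] = mature[:, :-2].copy()   # shift lay counts right by one
        mature[:, 0] = 0
        mature[:, -1] = exhausted
        # One binomial draw decides the sex split of all eggs laid today.
        eggs = n_breeding * egg_no
        new_females = int(rng.binomial(eggs, 0.5))
        females[0, 0] += new_females
        males[0] += eggs - new_females
        sizes.append(int(females.sum() + males.sum()))
    return sizes


sizes = simulate_counts(2, 1, age_limit=45, egg_no=3, day_lay_egg=30,
                        max_lay_egg=2, nday_limit=46, seed=0)
print(sizes[29], sizes[30], sizes[32], sizes[45])  # 3 9 21 18
```

The work per simulated day depends only on the size of the count table (age_limit by max_lay_egg), not on the number of organisms, which is what keeps the running time from growing with the population.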
Using NumPy array operations as much as possible instead of loops can improve your performance; see the code below, tested in this notebook: https://www.kaggle.com/gfteafun/notebook03118c731b
Note that the nsize scale matters when comparing the timings.
%%time
# s: sex, d: day, lay_egg: Number of time the female lay an egg, dborn: The organism age
x = np.array([(x, 0, 0, 0) for x in numsex ] )
iparam = np.array([0, 1, 0, 1])
total_population_per_day = []
for nday in range(nday_limit):
    x = x + iparam
    c = np.all(x < np.array([2, nday_limit, max_lay_egg, age_limit]), axis=1) & np.all(x >= np.array([1, day_lay_egg, 0, day_lay_egg]), axis=1)
    total_population_per_day.append(dict(nsize=len(x[x[:,3]<age_limit, :]), day=nday))
    n = x[c, 2].shape[0]
    if n > 0:
        x[c, 2] = x[c, 2] + 1
        newborns = np.array([(x, nday, 0, 0) for x in np.random.choice([0, 1], size=egg_no, p=[.5, .5]) for i in range(n)])
        x = np.vstack((x, newborns))
df = pd.DataFrame(total_population_per_day)
sns.lineplot(x="day", y="nsize", data=df)
plt.xticks(rotation=15)
plt.title('Day vs population')
plt.show()

I used return, however the recursion does not end. help me please

I am working on a question that gives me a start coordinate, an end coordinate, and a number of moves. On every move you can add 1 to or subtract 1 from the x or y coordinate of the previous position, and the number of moves limits how many times the coordinate can change. In the end, I need to determine whether it is possible to get to the end coordinate.
I decided to use recursion to solve this problem; however, it does not end even though I wrote a return inside an if statement. Would you mind taking a look at it?
This is the code:
# https://cemc.uwaterloo.ca/contests/computing/2017/stage%201/juniorEF.pdf
# input
start = input()
end = input()
count = int(input())
coo_end = end.split(' ')
x_end = coo_end[0]
y_end = coo_end[1]
end_set = {int(x_end), int(y_end)}
#processing
coo = start.split(' ')
x = int(coo[0])
y = int(coo[1])
change_x = x
change_y = y
sum = x + y+count
set1 = set()
tim = 0
timer = 0
ways = 4** (count-1)
def elit(x, y, tim,timer, ways = ways):
    print(tim,timer)
    tim = tim +1
    co1 = (x, y+1)
    co2 = (x+1, y)
    co3 = (x, y-1)
    co4 = (x-1, y)
    if tim == count:
        tim =0
        set1.add(co1)
        set1.add(co2)
        set1.add(co3)
        set1.add(co4)
        print(timer)
        timer = timer +1
        if timer == ways:
            print('hiii')
            return co1, co2, co3, co4 #### this is the place there is a problem
    elit(co1[0],co1[1],tim,timer)
    elit(co2[0],co2[1],tim,timer)
    elit(co3[0],co3[1],tim, timer)
    elit(co4[0],co4[1],tim, timer)
#print(elit(change_x,change_y,tim)) - none why
elit(change_x,change_y,tim, timer)
#print(list1)
for a in set1:
    if end_set != a:
        answer = 'N'
        continue
    else:
        answer = "Y"
        break
print(answer)
In addition, if you have any suggestions about how to approach this question, please tell me, since I am not sure I am using the best solution.
One example is:
Sample Input
3 4 (start value)
3 3 (end value)
3 (count)
Output for Sample Input
Y
Explanation
One possibility is to travel from (3, 4) to (4, 4) to (4, 3) to (3, 3).
the detailed question can be seen in this file https://cemc.uwaterloo.ca/contests/computing/2017/stage%201/juniorEF.pdf
It is question 3. Thank you
thank you guys
The function is returning properly; however, by the time you reach the recursion depth at which it returns anything, you have called so many instances of the function that it looks like an infinite loop.
When you call elit the first time, the function calls itself four more times. In the example you have given, timer is only incremented every 3 cycles, and the function only returns once timer hits 16, so the function needs to run 48 times before returning anything. Each time, the function is called 4 more times, and this exponential growth means that for this example the function will be called 19807040628566084398385987584 times, which, depending on your machine, may well take until the heat death of the universe.
I should add that I think you have somewhat overcomplicated the question. On a grid, the only ways to get from one point to another are the minimum distance, or that same minimum plus a detour whose length must always be a multiple of 2. So if the number of moves is at least the minimum distance and exceeds it by a multiple of 2 (including 0), the result should be 'Y'. The minimum distance is just the sum of the absolute differences between the coordinates on each axis:
abs(int(start[0]) - int(end[0])) + abs(int(start[1]) -int(end[1]))
the whole function therefore can just be:
def elit():
    start = input('start: ').split(' ')
    end = input('end: ').split(' ')
    count = int(input('count: '))
    distance = abs(int(start[0]) - int(end[0])) + abs(int(start[1]) -int(end[1]))
    # reachable only if there are enough moves and the surplus is even
    if count >= distance and (count - distance) % 2 == 0:
        print('Y')
    else:
        print('N')
input:
3 4
3 3
3
output:
Y
input:
10 4
10 2
5
output:
N
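For checking the rule without typing input each time, the same test can be written as a pure function. This is just a sketch: the name can_reach and the tuple arguments are hypothetical, and it assumes the moves must be used exactly, as in the samples above.

```python
def can_reach(start, end, count):
    """Return True if `end` is reachable from `start` in exactly `count` unit moves."""
    # Minimum number of moves is the Manhattan distance between the points.
    distance = abs(start[0] - end[0]) + abs(start[1] - end[1])
    # Surplus moves can only be burned in back-and-forth pairs.
    return count >= distance and (count - distance) % 2 == 0


print(can_reach((3, 4), (3, 3), 3))    # True
print(can_reach((10, 4), (10, 2), 5))  # False
print(can_reach((0, 0), (3, 0), 1))    # False, too few moves
```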

PYTHON - "Love for Mathematics"

I just finished a challenge on Dcoder ("Love for Mathematics") using Python. I failed two test cases, but got one right. I used fairly basic Python for it, as I haven't explored more yet, so I'm sorry if it looks a bit too basic. The challenge reads:
Students of Dcoder school love Mathematics. They love to read a variety of Mathematics books. To make sure they remain happy, their Mathematics teacher decided to get more books for them.
A student would become happy if there are at least X Mathematics books in the class and not more than Y books because they know "All work and no play makes Jack a dull boy". The teacher wants to buy a minimum number of books to make the maximum number of students happy.
The Input
The first line of input contains an integer N indicating the number of students in the class. This is followed up by N lines where every line contains two integers X and Y respectively.
#Sample Input
5
3 6
1 6
7 11
2 15
5 8
The Output
Output two space-separated integers that denote the minimum number of mathematics books required and the maximum number of happy students.
Explanation: The teacher could buy 5 books and keep student 1, 2, 4 and 5 happy.
#Sample Output
5 4
Constraints:
1 <= N <= 10000
1 <= X, Y <= 10^9
My code:
n = int(input())
l = []
mi = []
ma = []
for i in range(n):
    x, y = input().split()
    mi.append(int(x))
    ma.append(int(y))
    if i == 0:
        h=ma[0]
    else:
        if ma[i]>h:
            h=ma[i]
for i in range(h):
    c = 0
    for j in range(len(mi)):
        if ma[j]>=i and mi[j]<=i:
            c+=1
    l.append(c)
great = max(l)
for i in range(1,len(l)+1):
    if l[i]==great:
        print(i,l[i])
        break
My Approach:
I first assigned the two minimum and maximum variables to two different lists - one containing the minimum values, and the other, the maximum. Then I created a loop that processes all numbers from 0 to the maximum possible value of the list containing maximum values and increasing the count for each no. by 1 every time it lies within the favorable range of students.
In this specific case, I got that count list to be (for the above given input):
[1,2,3,3,4,4,3,3,2 ...] and so on. So I could finalize that 4 would be the maximum no. of students and that the first index of 4 in the list would be the minimum no. of textbooks required.
But only 1 test-case worked and two failed. I would really appreciate it if anyone could help me out here.
Thank You.
This problem is similar to the minimum platforms problem.
With that approach, you sort the arrays of minimum and maximum book counts in ascending order, each separately. Try to understand the platform problem from the link above first; after that, this one will be a piece of cake.
Here is your solution:
n = int(input())
min_books = []
max_books = []
for i in range(n):
    x, y = input().split()
    min_books.append(int(x))
    max_books.append(int(y))
min_books.sort()
max_books.sort()
happy_st_result = 1
happy_st = 1
books_needed = min_books[0]
i = 1
j = 0
while (i < n and j < n):
    if (min_books[i] <= max_books[j]):
        happy_st+= 1
        i+= 1
    elif (min_books[i] > max_books[j]):
        happy_st-= 1
        j+= 1
    if happy_st > happy_st_result:
        happy_st_result = happy_st
        books_needed = min_books[i-1]
print(books_needed, happy_st_result)
Try this, and let me know if you need any clarification.
@Vinay Gupta's logic and explanation are correct. If you think along those lines, the answer should become immediately clear to you.
I have implemented the same logic in my code below, except using fewer lines and some handy built-in Python functions.
# python 3.7.1
import itertools
d = {}
for _ in range(int(input())):
    x, y = map(int, input().strip().split())
    d.setdefault(x, [0, 0])[0] += 1
    d.setdefault(y, [0, 0])[1] += 1
a = list(sorted(d.items(), key=lambda x: x[0]))
vals = list(itertools.accumulate(list(map(lambda x: x[1][0] - x[1][1], a))))
print(a[vals.index(max(vals))][0], max(vals))
The above answer got accepted in Dcoder too.
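The sweep idea behind both answers can also be sketched as a standalone function, which makes it easier to test away from the judge. This is not the accepted submission: the name best_book_count is hypothetical, and the sketch assumes the upper bound Y is inclusive, so a student stops being happy at Y + 1 books.

```python
def best_book_count(ranges):
    """Return (fewest books, most happy students) for inclusive (X, Y) ranges."""
    events = []
    for low, high in ranges:
        events.append((low, 1))        # student becomes happy at `low` books
        events.append((high + 1, -1))  # and stops being happy after `high`
    # Tuples sort by position first; at equal positions -1 comes before +1.
    events.sort()
    best_books, best_happy, happy = 0, 0, 0
    for books, delta in events:
        happy += delta
        if happy > best_happy:  # strict, so the smallest book count wins ties
            best_happy, best_books = happy, books
    return best_books, best_happy


sample = [(3, 6), (1, 6), (7, 11), (2, 15), (5, 8)]
print(*best_book_count(sample))  # 5 4
```

Sorting dominates the cost, so this runs in O(N log N) regardless of how large X and Y are, unlike a loop over every possible book count up to 10^9.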

Python program that calculates the probability of getting 1 at least twice when rolling a fair die 12 times

import random
import sys
bestcounter1 = 0
bestcounter2=0
get_sample = int(sys.argv[1])
for i in range(get_sample):
    for i in range(12):
        if (random.randint(1,6)==1):
            bestcounter1+=1
            bestcounter2+=1
oneatleasttwice = (bestcounter2*1.0)/(2*(get_sample))
#Divide by 2 to make both comparable. Otherwise 2 will always be greater than 1 !
print("One atleast twice in 12 rolls: ", oneatleasttwice)
Can anybody explain whether the logic used here is correct or not? The output I get is always around 1.
Thanks
You have to put your counters in the right places. Suppose bestcounter1 counts the 1s rolled during each run (12 rolls), while bestcounter2 counts the runs in which you got two or more 1s. Then your main for loop should look like this:
for i in range(get_sample):
    # reset before every run
    bestcounter1 = 0
    for i in range(12):
        if random.randint(1, 6) == 1:
            # count values of 1
            bestcounter1 += 1
        # check if we got 2 or more values of 1
        if bestcounter1 >= 2:
            # count proper cases
            bestcounter2 += 1
            break
oneatleasttwice = bestcounter2 / get_sample
I got a result of 61.9% with one million runs.
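The simulated figure can be checked against the exact binomial probability: the chance of rolling a 1 at least twice in 12 rolls is 1 minus the chance of rolling it zero times or exactly once. A minimal sketch, with prob_at_least as a hypothetical helper name (math.comb needs Python 3.8 or later):

```python
from math import comb


def prob_at_least(k, n, p):
    """Probability of at least k successes in n independent trials."""
    # Subtract the probability of 0, 1, ..., k-1 successes from 1.
    return 1.0 - sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k))


print(round(prob_at_least(2, 12, 1 / 6), 4))  # 0.6187
```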
