I tried to simulate an NHPP (non-homogeneous Poisson process) in Python. The function runs, but the simulated numbers don't follow the NHPP.
The code is:
import numpy
from math import log as ln
from scipy import integrate

def nhpp(parametros, T, N):
    numeros = list()
    # the rate function λ(t) is a power-law model: λ(t) = λ β t**(β-1), with t, λ, β > 0
    funçao = lambda x: parametros[1] * parametros[0] * x ** (parametros[0] - 1)
    # calculate the maximum of the function in the interval (0,T)
    # (note: quad() returns the integral over (0,T), not the maximum)
    res = integrate.quad(funçao, 0, T)
    # l represents λ
    l = res[0]
    t = 0
    cont = 0
    contagem = list()
    listafinal = list()
    for i in range(1, N + 1):
        u = numpy.random.uniform(0, 1)
        # t represents the exponential times generated
        t = t - (ln(u) / l)
        # fun holds the values of λ(t) for t1, t2, t3, ..., tN
        fun = parametros[1] * parametros[0] * t ** (parametros[0] - 1)
        # if u <= λ(t)/λ we accept the time
        if u <= fun / l:
            numeros.append(t)
            # cont counts the number of times (N(T)) that were accepted as NHPP events
            cont = cont + 1
            contagem.append(cont)
    listafinal.append(numeros)
    listafinal.append(contagem)
    print(listafinal)
    return listafinal

x = nhpp([0.5, 0.35], 500, 20000)
The output of this function is: [[6.637092201160706, 12.739051189013342, 22.89616658744735, 161.12015416135688, 386.6019409119157, 424.7928356177192, 428.48931184149734, 733.1527508780554, 886.1376014091232, 1073.653026573429, 1133.4535462717483, 1787.4258499386765, 2077.7766357676205], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]]
If I plot the points, the times between occurrences are not decreasing, but they should be: when B (the shape parameter of the power-law model) is < 1 (in this case B = 0.5), the times between occurrences decrease. Can anyone help me simulate the NHPP in Python correctly?
Note: in the power-law process, when B < 1 the inter-failure times decrease; when B = 1 they are constant; when B > 1 they increase.
I want to create something like this:
In the picture, the black line represents B = 0.5, the blue line B = 1, and the red line B = 1.5.
Update up-front
A common way to generate/simulate non-homogeneous Poisson processes is to use thinning. Candidate event times are generated at the maximal rate for the interval, and then thinned out by accepting a proportion of them based on the ratio of the instantaneous rate to the maximal rate. Unfortunately, this does not work if the maximal rate in the interval of interest is not finite, which can be the case when the rate function follows a power law with b < 1. I've left the discussion of thinning below for people who find this post based on the question title.
NIST has an online manual which describes a generating algorithm specific to the power law case. According to the link given above, in order to generate power law NHPP events with parameters a=0.5 and b=0.35 you should generate exponential random variates with rate a, add that to the prior time raised to the bth power, and then take the bth root of the sum to yield the next event time:
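In symbols, since the cumulative rate is Λ(t) = a·t^b, taking E_i as an exponential variate with rate a gives the recurrence (a standard inversion argument, stated here only to make the code easier to follow):

t_i = (t_{i-1}^b + E_i)^(1/b)

The code below implements this directly: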
import random

# params is a list containing rate a and power b.
# t is the amount of time to be simulated.
def nhpp(params, t):
    time = 0.0
    event = 0
    print("event time,event number")
    while True:
        # Generate the b'th root of an exponential with rate "a",
        # and update the simulated time accordingly
        time = (time ** params[1] + random.expovariate(params[0])) ** (1 / params[1])
        event += 1
        if time > t:
            break
        print(f"{time},{event}")

nhpp([0.5, 0.35], 10000)
which yields output such as:
event time,event number
0.0027863666405411654,1
0.1663302577640816,2
0.3771684274752755,3
36.54675259117693,4
76.353564909201,5
260.547640677633,6
292.0182519323185,7
406.34546142065693,8
5342.127722590645,9
5472.997406321742,10
5844.439757675029,11
8521.086105482522,12
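If you want to reproduce the kind of figure described in the question (counts N(t) for several values of b), here is a minimal matplotlib sketch. It uses the same recurrence as nhpp() above; nhpp_times is a hypothetical helper that returns the event times instead of printing them:

import random
import matplotlib.pyplot as plt

def nhpp_times(a, b, horizon):
    # Same recurrence as nhpp() above, returning event times on (0, horizon]
    times, t = [], 0.0
    while True:
        t = (t ** b + random.expovariate(a)) ** (1 / b)
        if t > horizon:
            return times
        times.append(t)

# black: b=0.5, blue: b=1.0, red: b=1.5, as in the figure the question describes
for b, color in [(0.5, "black"), (1.0, "blue"), (1.5, "red")]:
    ts = nhpp_times(0.5, b, 500)
    plt.step(ts, range(1, len(ts) + 1), where="post", color=color, label=f"b = {b}")
plt.xlabel("t")
plt.ylabel("N(t)")
plt.legend()
plt.show()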
How to simulate an NHPP in Python using thinning
The thinning technique's principles are described here. The following is a heavily annotated sample implementation of how to do it in Python:
import math
import random

# A frequency that cycles every 20 time units, in radians
OMEGA = 0.05 * 2 * math.pi

# A sinusoidal time-varying rate, truncated at zero
rate_f = lambda t: max(0, 4 - (16 * math.cos(OMEGA * t)))

# The rate function above gives a maximum instantaneous arrival rate
# of 20 events per time unit, i.e., 4 - (16 * -1) when cos == -1
lambda_max = 20.0

# The following should generate 5 cycles of non-zero
# event epochs between time 0 and time 100
t = 0.0
print("time of event")
while True:
    # generate Poisson candidate event times using
    # exponentially distributed inter-event delays
    # at the maximal rate
    t += random.expovariate(lambda_max)
    # stop if we're past time 100
    if t > 100.0:
        break
    # (rate_f(t) / lambda_max) is the probability we
    # should accept a candidate at time t
    if random.random() <= rate_f(t) / lambda_max:
        # Accept and print this as an actual event if
        # a U(0,1) is below the threshold probability
        print(t)
This sample program generates results as described in the comments of the code.
Related
I have a convex programming problem in which I am constrained to several periods; each of these periods represents a different time of day, in minutes. Assume we are constrained to 7 periods in the day, consisting of [480, 360, 120, 180, 90, 120, 90].
Update to my thoughts on this:
Could the 7-interval variable be transformed into a binary variable of length 1440? This would mean we could calculate the level as needed.
I use these periods as an upper bound for my integer variable, which can be defined as X; X is a CVXPY variable, X = cp.Variable(7). To build my solution I create and define the problem through constraints; the constraints I want to work with are as follows:
Target >= min target
Reservoir level >= min level
Reservoir level <= max level
I understand that in order to calculate the reservoir levels I must feed in the correct data, such as the surface area and what is expected to leave the reservoir. The problem I am struggling with is that, given the shape of X, I feel I should be ensuring the reservoir isn't overfilling between periods. At the moment my calculation just checks at point 0, point 1, ..., point 7, and this does satisfy the constraint. However, in the real world we are exceeding the max level in between these points at certain stages, and we need to factor this in. How could we refactor the code below to account for this, given the variable running times of the pumps as set by X?
Please see below the code that we are currently working with.
# Imports
import numpy as np
import cvxpy as cp
# Initial variables required to reproduce the problem
periods_minutes = [480, 360, 120, 180, 90, 120, 90]
energy_costs = np.array([0.19, 0.22, 0.22, 0.22, 0.22, 0.22, 0.19])
flow_out = np.array([84254.39998627, 106037.09985495, 35269.19992447, 47066.40017509, 26121.59963608, 33451.20002747, 20865.5999279])
# Constant variables
pump_flow = 9.6
pump_energy = 9.29
pump_flow_per_minute = pump_flow * 60
# CVXPY section
N = len(periods_minutes)
MIN_RUN_TIME = 10
running_time = cp.Variable(N, integer=True)
mins = np.ones(N) * MIN_RUN_TIME
maxs = np.ones(N) * periods_minutes
k = cp.Variable(N, boolean=True)
# Optimization calculations
running_time_hours = running_time / 60
cost_of_running = cp.multiply(running_time_hours, energy_costs) * pump_energy
sum_of_energy = cp.sum(cost_of_running)
volume_cp = cp.sum(running_time*pump_flow_per_minute)
period_volume = running_time * pump_flow_per_minute
# Create a variable that will represent 1 if the period is running and 0 for the remainder, 1440 total
# Example: running_time[0] = 231 means 231 True entries in the variable
# test = np.zeros((1, 1440))
# for i in range(N):
#     for j in range(running_time[i]):
#         test[0][j] = 1
# Reservoir information and calculations
FACTOR = 1/160.6
flow_in = running_time * pump_flow_per_minute
flow_diff = (flow_in - flow_out) / 1000
res_level = cp.cumsum(flow_diff) * FACTOR + 2.01
# Constant constraints
min_level_constraint = res_level >= 1.8
max_level_constraint = res_level <= 2.4
volume_constraint = volume_cp >= 353065.5
# Build constraints
constraints = []
# Convert the integer variables to binary variables
# constraints += [test_cp[0] == 1]
# Append common constraints
constraints += [min_level_constraint]
constraints += [max_level_constraint]
constraints += [volume_constraint]
constraints += [running_time >= cp.multiply(k, mins)]
constraints += [running_time <= cp.multiply(k, maxs)]
# Objective definition
objective = cp.Minimize(cp.sum(sum_of_energy))
# Problem declaration
prob = cp.Problem(objective, constraints)
prob.solve(solver=cp.CPLEX, verbose=False)
# Each ith element of the array represents running time in minutes
running_time.value
Note that some variables are part of our external class:
Surface Area: 160.6m²
Min Level: 1.85m
Max Level: 2.4m
Pump Flow: 9.6l/s
Pump Energy: 9kW
At the moment our outflow data for the reservoir is in 30-minute intervals. Ideally we could develop a solution that allows for this, in the sense of an inflow matrix that accounts for the various running times over a period of time and the corresponding volumes. For example, imagine X has the output [231, 100, 0, 0, 30, 90, 99]. Looking at the first element, 231, and given 480 minutes as the max running time for period 1, I would expect a row of 16 elements (480/30) in our matrix.
The expected outcome given this would be something like
[17280 17280 17280 17280 17280 17280 17280 12096 0 0 0 0 0 0 0 0]
The figures shown above are volumes: 17280 is a full 30-minute interval of running, 12096 corresponds to 21 minutes of running within the interval, and 0 means not running. I hope to have provided enough information to entice people into looking at this problem, and I look forward to answering any queries you may have. Thanks for taking the time to read through my post.
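As a quick illustration of that expected outcome, here is a plain-numpy sketch (separate from the CVXPY model; volume_slots is a hypothetical helper, and it assumes the pump runs continuously from the start of the period):

import numpy as np

PUMP_FLOW_PER_MINUTE = 9.6 * 60  # 576 litres per minute, as in the code above

def volume_slots(run_minutes, period_minutes, slot=30):
    # Expand one period's running time into per-slot pumped volumes,
    # assuming the pump runs from the start of the period without interruption.
    volumes = []
    remaining = run_minutes
    for _ in range(period_minutes // slot):
        minutes_on = min(slot, max(0, remaining))
        volumes.append(minutes_on * PUMP_FLOW_PER_MINUTE)
        remaining -= slot
    return np.array(volumes)

print(volume_slots(231, 480))
# [17280. 17280. 17280. 17280. 17280. 17280. 17280. 12096. 0. 0. 0. 0. 0. 0. 0. 0.]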
Problem
I assume that the pump starts running at the beginning of each time period and stops after running_time minutes, until the next time period starts. We are checking the level at the end of each period, but within a period the level may get higher while the pump is working. I hope I've understood the problem correctly.
Solution
The constraint is:
res_level(t) < 2.4
The function is piecewise smooth, the pieces being separated by time period boundaries and the event of pump shutdown within each time period.
Mathematically we know that the constraint is satisfied if the value of res_level(t) is smaller than 2.4 at all critical points—i.e. piece boundaries and interior extrema.
We also know that res_level(t) is linear in the piece intervals. So there are no interior extrema—except in case of constant function, but in that case the value is already checked at the boundaries.
So your approach of checking res_level at ends of each period is correct, except that you also need to check the level at the times of pump shutdown.
From simple mathematics (with t_shutdown measured from the start of the period, and unit-conversion factors omitted):
res_level(t_shutdown) = res_level(period_start) + flow_in - (t_shutdown / t_period) * flow_out
In CVXPY this can be implemented as:
res_level_at_beginning_of_period = res_level - flow_diff * FACTOR
flow_diff_until_pump_shutdown = (flow_in - cp.multiply(flow_out, (running_time / periods_minutes))) / 1000
res_level_at_pump_shutdown = res_level_at_beginning_of_period + flow_diff_until_pump_shutdown * FACTOR
max_level_constraint_at_pump_shutdown = res_level_at_pump_shutdown <= 2.4
constraints += [max_level_constraint_at_pump_shutdown]
Running the code with this additional constraint gave me the following res_levels (levels at end of periods):
[2.0448792 1.8831538 2.09393089 1.80086488 1.96100436 1.81727335
2.0101401 ]
This enquiry is an extension of the question found in: "'#Error: Solution not found' being returned when using gekko for optimization".
"ind_1" and "ind_2" are lists of length 8760 containing 0s/1s. Certain hours of the year may earn additional revenue, so these indicator lists are used to distinguish those hours (further used in the maximization function
I am trying to build onto this model by limiting the battery cycle to at MOST 1 charge and discharge every 24 hours. As an initial simplistic approach, I am attempting to sum up the battery command signals for each 24 hour segment and limiting it to at most 8000 kWh. You can find my approach below:
m = Gekko(remote=False)

# variables
e_battery = m.Var(lb=0, ub=4000, value=2000)  # energy in battery at time t, battery size 4 MWh, initial value is 2 MWh
command = m.Var(lb=-1000, ub=1000)  # command power -1 to 1 (in MW)
e_price = m.Param(value=price)  # price is a list of 8760 values
ind_1 = m.Param(value=ind_1)
ind_2 = m.Param(value=ind_2)
peak_list = m.Param(value=peak_load_list)  # list of the monthly peaks (an array of length 8760)
load_list = m.Param(value=load)  # hourly electric load

m.time = np.linspace(0, 8759, 8760)
m.Equation(e_battery.dt() == command)

# The next 2 constraints ensure that the new load (original load + battery operation)
# is greater than 0, but less than the peak load for that month
m.Equation(load_list + command >= 0)
m.Equation(load_list + command <= peak_list)

# Here is the code to limit the cycling. "abs(command)" is used since "command" can be
# negative (discharge) or positive (charge), and a full charge and full discharge
# will equate to 8000 kWh.
daily_sum = 0
for i in range(8760):
    daily_sum += abs(command)
    if i % 24 == 0 and i != 0:  # when i=0, it's the beginning of the first day so we can skip it
        m.Equation(daily_sum <= 8000)
        daily_sum = 0  # reset to 0 in preparation for the first hour of the next day

m.Maximize((-command) * (e_price + ind_1*ind1_price + ind_2*ind2_price))

m.options.IMODE = 6
m.solve()
When adding the cycling constraint, the following output is returned:
--------- APM Model Size ------------
Each time step contains
Objects : 0
Constants : 0
Variables : 373
Intermediates: 0
Connections : 0
Equations : 368
Residuals : 368
Error: At line 1545 of file apm.f90
Traceback: not available, compile with -ftrace=frame or -ftrace=full
Fortran runtime error: Out of memory
Does this particular implementation work using gekko's framework? Would I have to initialize a different type of variable for "command"? Also, I haven't been able to find many relevant examples of using for loops for the equations, so I'm very aware that my implementation might be well off. Would love to hear anyone's thoughts and/or suggestions, thanks.
Binary variables indicate when a charge or discharge limit has been reached (e_battery>3999 or e_battery<1). Integrating those binary variables gives an indication of how many times in a day each limit has been reached. One possible solution is to limit the integral of the binary variable to be less than the day count.
Below are two examples with soft constraints and hard constraints. The number of time points is reduced from 8760 to 120 (5 days) for testing.
from gekko import Gekko
import numpy as np
m = Gekko(remote=False)
n = 120 # hours
price=np.ones(n)
e_battery = m.Var(lb=0, ub=4000, value=2000) #energy in battery at time t
# battery size 4 MWh, initial value is 2MWh
command = m.Var(lb=-1000, ub=1000) #command power -1 to 1 (in MW)
e_price = m.Param(value = price) #price is a list of 8760 values
ind_1=1; ind_2=1
ind1_price=1; ind2_price=1
ind_1 = m.Param(value = ind_1)
ind_2 = m.Param(value = ind_2)
m.time = np.linspace(0,n-1,n)
m.Equation(e_battery.dt() == command)
day = 24
discharge = m.Intermediate(m.integral(m.if3(e_battery+1,1,0)))
charge = m.Intermediate(m.integral(m.if3(e_battery-3999,0,1)))
x = np.ones_like(m.time)
for i in range(1, n):
    if i % day == 0:
        x[i] = x[i-1] + 1
    else:
        x[i] = x[i-1]
limit = m.Param(x)

soft_constraints = True
if soft_constraints:
    derr = m.CV(value=0)
    m.Equation(derr == limit - discharge)
    derr.STATUS = 1
    derr.SPHI = 1; derr.WSPHI = 1000
    derr.SPLO = 0; derr.WSPLO = 1000
    cerr = m.CV(value=0)
    m.Equation(cerr == limit - charge)
    cerr.STATUS = 1
    cerr.SPHI = 1; cerr.WSPHI = 1000
    cerr.SPLO = 0; cerr.WSPLO = 1000
else:
    # Hard constraints
    m.Equation(charge <= limit)
    m.Equation(charge >= limit - 1)
    m.Equation(discharge <= limit)
    m.Equation(discharge >= limit - 1)
m.Minimize(command*(e_price + ind_1*ind1_price + ind_2*ind2_price))
m.options.IMODE = 6
m.solve()
import matplotlib.pyplot as plt
plt.figure(figsize=(10,6))
plt.subplot(3,1,1)
plt.plot(m.time,limit.value,'g-.',label='Limit')
plt.plot(m.time,discharge.value,'b:',label='Discharge')
plt.plot(m.time,charge.value,'r--',label='Charge')
plt.legend(); plt.xlabel('Time'); plt.ylabel('Cycles'); plt.grid()
plt.subplot(3,1,2)
plt.plot(m.time,command.value,'k-',label='command')
plt.legend(); plt.xlabel('Time'); plt.ylabel('Command'); plt.grid()
plt.subplot(3,1,3)
plt.plot(m.time,e_battery.value,'g-.',label='Battery Charge')
plt.legend(); plt.xlabel('Time'); plt.ylabel('Battery'); plt.grid()
plt.show()
The application in the original question runs out of memory because 8760 equations are each simultaneously integrated over 8760 time steps. Try posing equations that are written once but valid over the entire frame. The current objective function is to minimize electricity usage. You may need to include constraints or an objective function to meet demand. Otherwise, the solution is to never use electricity because it is minimized (e.g. maximize(-command)). Here are similar Grid Energy Benchmark problems that may help.
Like the title explains, my program always returns the initial guess.
For context, the program is trying to find the best way to allocate some product across multiple stores. Each store has a forecast of what it is expected to sell in the following days (sales_data). This forecast does not have to consist of integers, or be above 1 (it rarely is); it is an expectation in the statistical sense. So, if a store has sales_data = [0.33, 0.33, 0.33], it is expected to sell 1 unit of product after 3 days.
I want to minimize the total time it takes to sell the units I am allocating (I want to sell them the fastest), and my constraints are that I have to allocate the units I have available, and I cannot allocate a negative quantity to a store. I am OK with non-integer allocations for now. For my initial allocation I divide the units available equally among all stores.
Below is a shorter version of my code where I am having the problem:
import numpy, random
from scipy.optimize import minimize

unitsAvailable = 50
days = 15

class Store:
    def __init__(self, num):
        self.num = num
        self.sales_data = []

stores = []
for i in range(10):
    # Identifier
    stores.append(Store(random.randint(1000, 9999)))
    # Expected units to be sold that day (it's unlikely they will sell 1 every day)
    stores[i].sales_data = [random.randint(0, 100) / 100 for i in range(days)]
    print(stores[i].sales_data)

def days_to_turn(alloc, store):
    day = 0
    inventory = alloc
    while inventory > 0 and day < days:
        inventory -= store.sales_data[day]
        day += 1
    return day

def time_objective(allocations):
    time = 0
    for i in range(len(stores)):
        time += days_to_turn(allocations[i], stores[i])
    return time

def constraint1(allocations):
    return unitsAvailable - sum(allocations)

def constraint2(allocations):
    return min(allocations) - 1

cons = [{'type': 'eq', 'fun': constraint1}, {'type': 'ineq', 'fun': constraint2}]

guess_allocs = []
for i in range(len(stores)):
    guess_allocs.append(unitsAvailable / len(stores))
guess_allocs = numpy.array(guess_allocs)

print('Optimizing...')
time_solution = minimize(time_objective, guess_allocs, method='SLSQP',
                         constraints=cons, options={'disp': True, 'maxiter': 500})
time_allocationsOpt = [max([a, 0]) for a in time_solution.x]
unitsUsedOpt = sum(time_allocationsOpt)
unitsDaysProjected = time_solution.fun
for i in range(len(stores)):
    print("----------------------------------")
    print("Units to send to Store %s: %s" % (stores[i].num, time_allocationsOpt[i]))
    print("Time to turn allocated: %d" % (days_to_turn(time_allocationsOpt[i], stores[i])))
print("----------------------------------")
print("Estimated days to be sold: " + str(unitsDaysProjected))
print("----------------------------------")
print("Total units sent: " + str(unitsUsedOpt))
print("----------------------------------")
The optimization finishes successfully with only 1 iteration and, no matter how I change the parameters, it always returns the initial guess_allocs.
Any advice?
The objective function does not have a useful gradient because it returns discrete multiples of days: it is piecewise constant, so the gradient is zero almost everywhere. This is easily visualized:
import numpy as np
import matplotlib.pyplot as plt

y = []
x = np.linspace(-4, 4, 1000)
for i in x:
    a = guess_allocs + [i, -i, 0, 0, 0, 0, 0, 0, 0, 0]
    y.append(time_objective(a))

plt.plot(x, y)
plt.xlabel('relative allocation')
plt.ylabel('objective')
plt.show()
If you want to optimize such a function you cannot use gradient based optimizers. There are two options: 1) Find a way to make the objective function differentiable. 2) Use a different optimizer. The first is hard. For the second, let's try dual annealing. Unfortunately, it does not allow constraints so we need to modify the objective function.
Constraining N numbers to a constant sum is the same as having N-1 unconstrained numbers and setting the Nth number to constant - sum.
import scipy.optimize as spo

bounds = [(0, unitsAvailable)] * (len(stores) - 1)

def constrained_objective(partial_allocs):
    if np.sum(partial_allocs) > unitsAvailable:
        # can't sell more than is available, so make the objective infeasible
        return np.inf
    # partial_allocs contains allocations for all but one store;
    # the final store gets allocated the remaining units
    allocs = np.append(partial_allocs, unitsAvailable - np.sum(partial_allocs))
    return time_objective(allocs)

time_solution = spo.dual_annealing(constrained_objective, bounds, x0=guess_allocs[:-1])
print(time_solution)
This is a stochastic optimization method. You may want to run it multiple times to see if it can do better, or play with the optional parameters...
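For example, a minimal sketch of keeping the best of several restarts (the seed argument only makes each restart reproducible):

best = None
for seed in range(5):
    result = spo.dual_annealing(constrained_objective, bounds,
                                x0=guess_allocs[:-1], seed=seed)
    if best is None or result.fun < best.fun:
        best = result
print(best.x, best.fun)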
Finally, I think there is a problem with the objective function:
for i in range(len(stores)):
    time += days_to_turn(allocations[i], stores[i])
This says that the stores do not sell at the same time but only one after another. Does each store wait to sell until the previous store runs out of items? I think not. Instead, they sell simultaneously, and the time it takes for all units to be sold is the time of the store that takes longest. Try this instead:
for i in range(len(stores)):
    time = max(time, days_to_turn(allocations[i], stores[i]))
Having trouble with the following question:
In geometry the ratio of the circumference of a circle to its diameter is known as π. The value of π can be estimated from an infinite series of the form:
π / 4 = 1 - (1/3) + (1/5) - (1/7) + (1/9) - (1/11) + ...
There is another novel approach to calculate π. Imagine that you have a dart board that is 2 units square. It inscribes a circle of unit radius. The center of the circle coincides with the center of the square. Now imagine that you throw darts at that dart board randomly. Then the ratio of the number of darts that fall within the circle to the total number of darts thrown is the same as the ratio of the area of the circle to the area of the square dart board. The area of a circle with unit radius is just π square unit. The area of the dart board is 4 square units. The ratio of the area of the circle to the area of the square is π / 4.
To simulate the throwing of darts we will use a random number generator. The random module has several random number generating functions that can be used. For example, the function uniform(a, b) returns a floating point random number in the range a (inclusive) to b (exclusive).
Imagine that the square dart board has a coordinate system attached to it. The upper right corner has coordinates ( 1.0, 1.0) and the lower left corner has coordinates ( -1.0, -1.0 ). It has sides that are 2 units long and its center (as well as the center of the inscribed circle) is at the origin.
A random point inside the dart board can be specified by its x and y coordinates. These values are generated using the random number generator. The way we achieve that is:
xPos = random.uniform (-1.0, 1.0)
yPos = random.uniform (-1.0, 1.0)
To determine if a point is inside the circle its distance from the center of the circle must be strictly less than the radius of the circle. The distance of a point with coordinates ( xPos, yPos ) from the center is math.hypot (xPos, yPos). The radius of the circle is 1 unit.
The program that you will be writing will be called CalculatePI. It will have the following structure:
import math
import random

def computePI(numThrows):
    ...

def main():
    ...

main()
Your function main() will call the function computePI() for a given number of throws. The function computePI() will simulate the throw of a dart by generating random numbers for the x and y coordinates. You will determine if that randomly generated point is inside the circle or not. You will do this as many times as specified by the number of throws. You will keep a count of the number of times a dart lands within the circle. That count divided by the total number of throws is the ratio π/4. The function computePI() will then return the computed value of PI.
In your function main() you want to experiment and see if the accuracy of PI increases with the number of throws on the dartboard. You will compare your result with the value given by math.pi. The quantity Difference in the output is your calculated value of PI minus math.pi. Use the following number of throws to run your experiment - 100, 1000, 10,000, 100,000, 1,000,000, and 10,000,000. You will call the function computePI() with these numbers as input parameters. Your output will be similar to the following, i.e. the actual values of your Calculated PI and Difference will be different but close to the ones shown:
Computation of PI using Random Numbers
num = 100 Calculated PI = 3.320000 Difference = +0.178407
num = 1000 Calculated PI = 3.080000 Difference = -0.061593
num = 10000 Calculated PI = 3.120400 Difference = -0.021193
num = 100000 Calculated PI = 3.144720 Difference = +0.003127
num = 1000000 Calculated PI = 3.142588 Difference = +0.000995
num = 10000000 Calculated PI = 3.141796 Difference = +0.000204
Difference = Calculated PI - math.pi
Your output must be in the above format. The number of throws must be left-justified. The calculated value of π and the difference must be expressed correct to six decimal places. There should be a plus or minus sign on the difference. Read the relevant sections in the book on formatting.
So far I have done:
import math
import random

def computePI(numThrows):
    xPos = random.uniform(-1.0, 1.0)
    yPos = random.uniform(-1.0, 1.0)
    in_circle = 0
    throws = 0
    while throws < numThrows:
        if math.hypot(xPos, yPos) <= 1:
            in_circle += 1
        throws += 1
    pi = (4 * in_circle) / numThrows
    return pi

def main():
    throws = (100, 1000, 10000, 100000, 1000000, 10000000)
    for numThrows in throws[0:7]:

main()
I am having trouble calling the computePI() function from the main() function. Also, how do I print num left-justified and ensure that all the numbers have the required decimal places? Thank you!
Your program has three main issues:
Generating random numbers in the wrong place
xPos = random.uniform (-1.0, 1.0)
yPos = random.uniform (-1.0, 1.0)
These lines are executed only once when you enter the computePI() function. You then proceed to calculate the exact same value of hypot for hundreds or even thousands of iterations. Put these lines inside the while loop.
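A minimal sketch of the corrected loop (same variable names as your code, with the draws moved inside):

while throws < numThrows:
    # draw a fresh dart position on every iteration
    xPos = random.uniform(-1.0, 1.0)
    yPos = random.uniform(-1.0, 1.0)
    if math.hypot(xPos, yPos) <= 1:
        in_circle += 1
    throws += 1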
Integer arithmetic
pi = (4 * in_circle) / numThrows
Since in_circle and numThrows are both integers, this calculation will be performed using integer arithmetic (in Python 2, at least). Changing the constant from 4 to 4.0 will change this to a floating point calculation:
pi = (4.0 * in_circle) / numThrows
Incomplete main() function:
There's no need to use a subset of your throws tuple, and you haven't added a body to your for loop. Try this:
for numThrows in (100, 1000, 10000, 100000, 1000000, 10000000):
    randpi = computePI(numThrows)
    diff = randpi - math.pi
    print "num = %-8d Calculated PI = %8.6f Difference = %+9.6f" % \
        (numThrows, randpi, diff)
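If you are on Python 3, the same formatting can be written with an f-string (equivalent format specifiers):

print(f"num = {numThrows:<8d} Calculated PI = {randpi:8.6f} Difference = {diff:+9.6f}")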
This is how I find it easiest:
import random
import math

def approximate_pi():
    total_points = 0
    within_circle = 0
    for i in range(10000):
        # sample a point uniformly in the unit square [0, 1) x [0, 1);
        # the quarter circle of radius 1 covers pi/4 of its area
        x = random.random()
        y = random.random()
        total_points += 1
        distance = math.sqrt(x**2 + y**2)
        if distance < 1:
            within_circle += 1
        # every 1000 points, yield the running estimate of pi
        if total_points % 1000 == 0:
            pi_estimate = 4 * within_circle / total_points
            yield pi_estimate
Set the total points generated and the points within the circle to zero:
total_points = 0
within_circle = 0
Generate random values of x and y repeatedly. Calculate the distance of each point from the center of the circle, (0, 0). If the distance is less than one, the point lies within the circle, so the counter is incremented:
distance = math.sqrt(x**2 + y**2)
if distance < 1:
    within_circle += 1
Each time a multiple of 1000 points has been generated (1000 because the range is 10,000, which gives 10 values of pi), calculate the estimated value of pi using the formula you already know, and yield the estimate (pi_estimate):
if total_points % 1000 == 0:
    pi_estimate = 4 * within_circle / total_points
    yield pi_estimate
pi_estimates = list(approximate_pi())
errors = [estimate - math.pi for estimate in pi_estimates]
print(pi_estimates)
print(errors)
OUTPUT:
Estimates
[3.096, 3.142, 3.1253333333333333, 3.121, 3.1384, 3.136, 3.1314285714285712, 3.133, 3.1342222222222222]
Errors
[0.04240734641020705, 0.02240734641020703, 0.03307401307687341, 0.020407346410206806, 0.02320734641020694, 0.0017406797435404187, -0.009021225018364554, -0.011592653589793223, -0.016703764700904067]
I hope you understood and that my explanation was easy to follow. I am a beginner and still learning, so if anything is wrong please feel free to let me know.
Thank you
Essentially, what the procedure you've described above boils down to is:
import math
import random

def find_pi(iterations):
    # fraction of random points in the unit square that fall inside
    # the quarter circle, times 4
    return sum(1 for _ in range(iterations)
               if math.hypot(random.random(), random.random()) <= 1) * 4.0 / iterations
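For example (the result is random, so the exact value varies from run to run):

print(find_pi(1000000))  # typically prints something close to 3.1416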
I have a series of triplicate measurements for two different samples and I would like to know if the differences are statistically significant. I cannot use the Student t-test because of the small sample size. A colleague uses an R package called limma (http://bioinf.wehi.edu.au/limma/), but I would prefer not to have to invoke R.
Here is a post that I did in the Udacity forums that shows how to get a t-statistic appropriate for various sample sizes.
This should give you a t-statistic value similar to the values shown in the table at the bottom of this Wikipedia article.
Just in case anything happens to the first link, here is the code:
# -*- coding: utf-8 -*-
from __future__ import division
import math
import sys
from scipy.stats import t

def mean(lst):
    # μ = 1/N Σ(xi)
    return sum(lst) / float(len(lst))

def variance(lst):
    """
    Uses standard variance formula (sum of each (data point - mean) squared)
    all divided by number of data points
    """
    # σ² = 1/N Σ((xi-μ)²)
    mu = mean(lst)
    return 1.0 / len(lst) * sum([(i - mu)**2 for i in lst])

def get_tstat(probability, degrees_of_freedom, tails=2):
    """get the t-statistic required for confidence interval calculations"""
    if tails not in [1, 2]:
        sys.exit('invalid tails parameter (1 or 2 valid)')
    inv_p = 1 - ((1 - probability) / tails)
    return t.ppf(inv_p, degrees_of_freedom)

def conf_int(lst=None, p=0, n=0, perc_conf=95, tails=2):
    """
    Confidence interval - supply a list OR a probability (p) and sample
    size (n), e.g. if you want to know the confidence interval for 1000
    coin tosses (0.5 p, i.e. a fair coin) then call with (None, 0.5, 1000).
    If a list is provided then the relevant stats are calculated on the
    list from within the function, so no p or n value is required.
    The result gives a figure that you can be confident to conf (e.g.
    95% certain for 0.95) that the result will always be within this
    amount +/- from the mean.
    e.g. 1000 coin tosses returns ~0.03 on a fair coin (with 0.95 conf),
    which means that you can be 95% confident of getting within 3%
    of the expected number of heads if you did this experiment.
    """
    if lst:
        n, v = len(lst), variance(lst)
        t = get_tstat(perc_conf / 100, n - 1)
        return math.sqrt(v / n) * t
    else:
        if not 0 < p < 1:
            sys.exit('p parameter must be >0 and <1 if lst not given')
        if n == 0:
            sys.exit('n parameter must be >0 if lst not given')
        t = get_tstat(perc_conf / 100, n - 1)
        return t * (math.sqrt(p * (1 - p)) / math.sqrt(n))
################################################################################
# Example: 1000 coin tosses on a fair coin. What is the range that I can be 95%
# confident the result will fall within.
################################################################################
# get confidence interval
sample_size, probability, perc_conf_req = 1000, 0.5, 95
c_int = conf_int(n=sample_size, p=probability, perc_conf=perc_conf_req)
# show results
exp_heads = probability * sample_size
x = round(sample_size * c_int, 0)
print 'I can be '+str(perc_conf_req)+'% confident that the result of '+ \
      str(sample_size)+' coin flips will be within +/- '+ \
      str(round(c_int*100,2))+'% of '+str(int(exp_heads)) + \
      ' i.e. between '+str(int(exp_heads-x))+' and '+str(int(exp_heads+x))+ \
      ' heads (assuming a probability of '+str(probability)+' for each flip).'