Monte Carlo simulation of a system of Lennard-Jones + FENE potential - python

I want to generate two linear chains of 20 monomers each, at some distance from each other. The following code generates a single chain. Could someone help me with how to generate the second chain?
The two chains are fixed to a surface, i.e. the first monomer of each chain is fixed and the rest of the monomers move freely in the x, y and z directions, but the z component of every monomer should stay positive.
Something like this:
import numpy as np
import numba as nb
#import pandas as pd
#nb.jit()
def gen_chain(N):
    x = np.zeros(N)
    y = np.zeros(N)
    z = np.linspace(0, (N)*0.9, num=N)
    # Single chain along z.
    return np.column_stack((x, y, z))
    #coordinates = np.loadtxt('2GN_50_T_10.txt', skiprows=199950)
    #return coordinates
#nb.jit()
def lj(rij2):
    sig_by_r6 = np.power(sigma**2 / rij2, 3)
    sig_by_r12 = np.power(sigma**2 / rij2, 6)
    lje = 4 * epsilon * (sig_by_r12 - sig_by_r6)
    return lje
#nb.jit()
def fene(rij2):
    return (-0.5 * K * np.power(R, 2) * np.log(1 - ((np.sqrt(rij2) - r0) / R)**2))
#nb.jit()
def total_energy(coord):
    # Non-bonded energy.
    e_nb = 0.0
    for i in range(N):
        for j in range(i - 1):
            ri = coord[i]
            rj = coord[j]
            rij = ri - rj
            rij2 = np.dot(rij, rij)
            if (rij2 < rcutoff_sq):
                e_nb += lj(rij2)
    # Bonded FENE potential energy.
    e_bond = 0.0
    for i in range(1, N):
        ri = coord[i]
        rj = coord[i - 1] # Can be [i+1] ??
        rij = ri - rj
        rij2 = np.dot(rij, rij)
        e_bond += fene(rij2)
    return e_nb + e_bond
#nb.jit()
def move(coord):
    trial = np.ndarray.copy(coord)
    for i in range(1, N):
        while True:
            delta = (2 * np.random.rand(3) - 1) * max_delta
            trial[i] += delta
            #while True:
            if trial[i,2] > 0.0:
                break
            trial[i] -= delta
    return trial
#nb.jit()
def accept(delta_e):
    beta = 1.0 / T
    if delta_e < 0.0:
        return True
    random_number = np.random.rand(1)
    p_acc = np.exp(-beta * delta_e)
    if random_number < p_acc:
        return True
    return False
if __name__ == "__main__":
    # FENE potential parameters.
    K = 40.0
    R = 0.3
    r0 = 0.7
    # L-J potential parameters
    sigma = 0.5716
    epsilon = 1.0
    # MC parameters
    N = 20 # Numbers of monomers
    rcutoff = 2.5 * sigma
    rcutoff_sq = rcutoff * rcutoff
    max_delta = 0.01
    n_steps = 100000
    T = 10
    # MAIN PART OF THE CODE
    coord = gen_chain(N)
    energy_current = total_energy(coord)
    traj = open('2GN_20_T_10.xyz', 'w')
    traj_txt = open('2GN_20_T_10.txt', 'w')
    for step in range(n_steps):
        if step % 1000 == 0:
            traj.write(str(N) + '\n\n')
            for i in range(N):
                traj.write("C %10.5f %10.5f %10.5f\n" % (coord[i][0], coord[i][1], coord[i][2]))
                traj_txt.write("%10.5f %10.5f %10.5f\n" % (coord[i][0], coord[i][1], coord[i][2]))
            print(step, energy_current)
        coord_trial = move(coord)
        energy_trial = total_energy(coord_trial)
        delta_e = energy_trial - energy_current
        if accept(delta_e):
            coord = coord_trial
            energy_current = energy_trial
    traj.close()
    traj_txt.close()
I expect the chain of particles to collapse into a globule.
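A minimal sketch for the two-chain setup described in the question. It assumes both chains are straight, start on the surface at z = 0, and are separated along x by a hypothetical `spacing` of 3.0. The bond length of 0.9 is also an assumption, close to the spacing the question's `gen_chain` produces. Storing both chains in one array keeps the energy loops simple, as long as two things hold. First, the bonded pairs come from an explicit list, so the last monomer of chain 1 is not bonded to the first monomer of chain 2. Second, the grafted monomers are skipped when moving. The function names are illustrative, not taken from the original code.

```python
import numpy as np

def gen_chains(n_chains, n_monomers, bond=0.9, spacing=3.0):
    """Straight chains along +z, grafted at z = 0 and offset along x.

    Returns an (n_chains * n_monomers, 3) array; monomer i of chain c is
    stored in row c * n_monomers + i.
    """
    z = np.arange(n_monomers) * bond            # first monomer sits on the surface
    y = np.zeros(n_monomers)
    chains = [np.column_stack((np.full(n_monomers, c * spacing), y, z))
              for c in range(n_chains)]
    return np.vstack(chains)

def chain_bonds(n_chains, n_monomers):
    """Bonded pairs (i, i + 1) inside each chain; different chains are not linked."""
    return [(c * n_monomers + i, c * n_monomers + i + 1)
            for c in range(n_chains) for i in range(n_monomers - 1)]

def grafted(n_chains, n_monomers):
    """Row indices of the fixed first monomers, to be skipped when moving."""
    return [c * n_monomers for c in range(n_chains)]

coord = gen_chains(2, 20)
bonds = chain_bonds(2, 20)
fixed = grafted(2, 20)
```

With this layout, the FENE loop runs over `bonds` instead of `range(1, N)`, and the move routine only touches indices that are not in `fixed`.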

There is a problem with the logic of the MC you are implementing.
To perform MC you need to ATTEMPT a move, evaluate the energy of the new state, and then accept or reject it according to a random number.
In your code there is not the slightest sign of an attempt to move a particle.
You need to move one (or more) of them, evaluate the energy, and then update your coordinates.
By the way, I suppose this is not your entire code: many parameters are not defined, like the "k" and the "R0" in your FENE potential.
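The attempt / evaluate / accept-or-reject cycle described in this answer can be sketched as a single-particle Metropolis move. This is a minimal illustration under a few assumptions: `energy_fn(coord)` returns the total energy, k_B = 1, and random numbers come from NumPy's `Generator` API. The name `metropolis_step` and its argument list are hypothetical.

```python
import numpy as np

def metropolis_step(coord, energy, energy_fn, max_delta, T, rng, movable):
    """Attempt to displace one randomly chosen particle.

    coord: (N, 3) array; energy: energy_fn(coord) for the current state;
    movable: indices allowed to move (e.g. excluding grafted monomers).
    Returns (coord, energy, accepted).
    """
    i = rng.choice(movable)                                  # pick one particle
    trial = coord.copy()
    trial[i] += rng.uniform(-max_delta, max_delta, size=3)   # attempt the move
    trial_energy = energy_fn(trial)                          # evaluate the new state
    delta_e = trial_energy - energy
    # Metropolis criterion: accept with probability min(1, exp(-delta_e / T)).
    if delta_e <= 0 or rng.random() < np.exp(-delta_e / T):
        return trial, trial_energy, True                     # update the coordinates
    return coord, energy, False                              # keep the old state
```

A hard wall at z = 0 can be imposed by having `energy_fn` return `np.inf` for any configuration with a negative z. Then `exp(-inf)` is 0, so such moves are always rejected.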

The FENE potential models bond interactions. What your code is saying is that all particles within the cutoff are bonded by FENE springs, and that the bonds are not fixed but rather defined by the cutoff. With r_cutoff = 3.0, larger than the equilibrium distance of the LJ well, you are essentially considering each particle to be bonded to potentially many others. You are treating the FENE potential as a non-bonded one.
For the bond interactions you should ignore the cutoff and only evaluate the energy for the actual pairs that are bonded according to your topology, which means that first you need to define a topology. I suggest generating a linear molecule of N atoms in a box big enough to contain the whole stretched molecule, and consider the i-th atom as bonded to the (i-1)-th atom, with i = 2, ..., N. In this way the topology is well defined and persistent. Then consider both interactions separately, non-bonded and bond, and add them at the end.
Something like this, in pseudo-code:
e_nb = 0
for particle i = 1 to N:
    for particle j = 1 to i-1:
        if (dist(i, j) < rcutoff):
            e_nb += lj(i, j)
e_bond = 0
for particle i = 2 to N:
    e_bond += fene(i, i-1)
e_tot = e_nb + e_bond
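A runnable version of the pseudo-code above, as a sketch. The topology is passed in explicitly as a list of bonded index pairs, and LJ is evaluated on squared distances, so sigma enters as sigma**2. The default parameter values (K = 40, R0 = 1.5, sigma = epsilon = 1) are placeholders borrowed from the modified code in the answer below, not recommendations.

```python
import numpy as np

def lj(r2, sigma=1.0, epsilon=1.0):
    """Lennard-Jones energy from the squared distance r2."""
    s6 = (sigma * sigma / r2) ** 3                # (sigma / r)^6
    return 4.0 * epsilon * (s6 * s6 - s6)

def fene(r2, K=40.0, R0=1.5):
    """FENE bond energy from the squared distance r2 (requires r2 < R0**2)."""
    return -0.5 * K * R0 ** 2 * np.log(1.0 - r2 / R0 ** 2)

def total_energy(coord, bonds, rcutoff, lj=lj, fene=fene):
    """Non-bonded LJ over all pairs j < i within the cutoff, plus FENE
    over the bonded pairs listed in `bonds` (the topology)."""
    rc2 = rcutoff * rcutoff
    e_nb = 0.0
    for i in range(len(coord)):
        for j in range(i):
            rij = coord[i] - coord[j]
            r2 = rij @ rij
            if r2 < rc2:
                e_nb += lj(r2)
    e_bond = 0.0
    for i, j in bonds:                            # e.g. [(1, 0), (2, 1), ...]
        rij = coord[i] - coord[j]
        e_bond += fene(rij @ rij)
    return e_nb + e_bond
```

For a linear chain, `bonds = [(i, i - 1) for i in range(1, N)]` reproduces the topology described above.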

Below you can find a modified version of your code. To make things simpler, in this version there is no box and there are no boundary conditions, just a chain in free space. The chain is initialized as a linear sequence of particles, each at a distance of 80% of R0 from the next, since R0 is the maximum length of the FENE bond. The code considers particle i to be bonded with i+1, and the bond is never broken. This code is just a proof of concept.
#!/usr/bin/python
import numpy as np
def gen_chain(N, R):
    x = np.linspace(0, (N-1)*R*0.8, num=N)
    y = np.zeros(N)
    z = np.zeros(N)
    return np.column_stack((x, y, z))
def lj(rij2):
    # rij2 is a squared distance, so (sigma/r)**6 == (sigma**2/rij2)**3.
    sig_by_r6 = np.power(sigma**2/rij2, 3)
    sig_by_r12 = np.power(sig_by_r6, 2)
    lje = 4.0 * epsilon * (sig_by_r12 - sig_by_r6)
    return lje
def fene(rij2):
    return (-0.5 * K * R0**2 * np.log(1-(rij2/R0**2)))
def total_energy(coord):
    # Non-bonded
    e_nb = 0
    for i in range(N):
        for j in range(i-1):
            ri = coord[i]
            rj = coord[j]
            rij = ri - rj
            rij2 = np.dot(rij, rij)
            # Compare the squared distance with the squared cutoff.
            if (rij2 < rcutoff**2):
                e_nb += lj(rij2)
    # Bonded
    e_bond = 0
    for i in range(1, N):
        ri = coord[i]
        rj = coord[i-1]
        rij = ri - rj
        rij2 = np.dot(rij, rij)
        e_bond += fene(rij2)
    return e_nb + e_bond
def move(coord):
    trial = np.ndarray.copy(coord)
    for i in range(N):
        delta = (2.0 * np.random.rand(3) - 1) * max_delta
        trial[i] += delta
    return trial
def accept(delta_e):
    beta = 1.0/T
    if delta_e <= 0.0:
        return True
    random_number = np.random.rand(1)
    p_acc = np.exp(-beta*delta_e)
    if random_number < p_acc:
        return True
    return False
if __name__ == "__main__":
    # FENE parameters
    K = 40
    R0 = 1.5
    # LJ parameters
    sigma = 1.0
    epsilon = 1.0
    # MC parameters
    N = 50 # number of particles
    rcutoff = 3.5
    max_delta = 0.01
    n_steps = 10000000
    T = 1.5
    coord = gen_chain(N, R0)
    energy_current = total_energy(coord)
    traj = open('traj.xyz', 'w')
    for step in range(n_steps):
        if step % 1000 == 0:
            traj.write(str(N) + '\n\n')
            for i in range(N):
                traj.write("C %10.5f %10.5f %10.5f\n" % (coord[i][0], coord[i][1], coord[i][2]))
            print(step, energy_current)
        coord_trial = move(coord)
        energy_trial = total_energy(coord_trial)
        delta_e = energy_trial - energy_current
        if accept(delta_e):
            coord = coord_trial
            energy_current = energy_trial
    traj.close()
The code writes the current configuration to traj.xyz every 1000 steps; you can just load it in VMD and see how it behaves. The bonds will not show correctly in VMD at first: you must use a bead representation for the particles and define the bonds manually or with a script within VMD. In any case, you don't need to see the bonds to notice that the chain does not collapse.
Please bear in mind that if you want to simulate a chain at a certain density, you need to be careful to generate the correct topology. I recommend the EMC package to efficiently generate polymers at the desired thermodynamic conditions. It is by no means a trivial problem, especially for larger chains.
By the way, your code had an error in the FENE energy evaluation: rij2 is already squared, and you squared it again.
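For reference, here is the standard FENE bond energy. It is written in terms of the squared bond length that rij2 already holds, which is why rij2 goes into the logarithm directly:

```latex
U_{\mathrm{FENE}}(r_{ij}) = -\frac{1}{2} K R_0^2 \ln\left[1 - \left(\frac{r_{ij}}{R_0}\right)^2\right]
= -\frac{1}{2} K R_0^2 \ln\left(1 - \frac{r_{ij}^2}{R_0^2}\right),
\qquad r_{ij} < R_0
```

This is what `-0.5 * K * R0**2 * np.log(1-(rij2/R0**2))` evaluates.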
Below you can see how the total energy behaves as a function of the number of steps for T = 1.0, N = 20, rcutoff = 3.5, together with the final configuration after 10,000 steps.
And below for N = 50, T = 1.5, max_delta = 0.01, K = 40, R0 = 1.5, rcutoff = 3.5, and 10 million steps; this is the final configuration.
You can find the full "trajectory" (which isn't really a trajectory, since this is MC) here (it's under 6 MB).

Related

Error in implementation of Crank-Nicolson method applied to 1D TDSE?

This is more of a computational physics problem, and I've asked it on Physics Stack Exchange, but got no answers there. This is, I suppose, a mix of the disciplines here and there (and maybe even Mathematics Stack Exchange), so finding the right place to post is apparently a task in and of itself...
I'm attempting to use Crank-Nicolson scheme to solve the TDSE in 1D. The initial wave is a real Gaussian that has been normalised wrt its probability density. As the solution evolves, a depression grows in the central peak of the real part of the wave, and the imaginary part's central trough is perhaps a bit higher than I expect (image below).
Does this behaviour seem reasonable? I have searched around and not seen questions or figures that are similar. I've tested another person's code from GitHub and it exhibits the same behaviour, which makes me feel a bit better. But I still think the central peak should just decrease in height and increase in width. I'd assume the likelihood of getting a physics-based explanation here is relatively low, but a computational explanation of errors I may have made is more likely.
I'm happy to give more information, for example my code, or the matrices used in the scheme, etc. Thanks in advance!
Here's a link to a GIF of the time evolution:
And the part of my code relevant to solving the 1D TDSE:
(pretty much the entire thing except the plotting)
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
# Define function for norm.
def normf(dxc, uc, ic):
    return sum(dxc * np.square(np.abs(uc[ic, :])))
# Define function for expectation value of position.
def xexpf(dxc, xc, uc, ic):
    return sum(dxc * xc * np.square(np.abs(uc[ic, :])))
# Define function for expectation value of squared position.
def xexpsf(dxc, xc, uc, ic):
    return sum(dxc * np.square(xc) * np.square(np.abs(uc[ic, :])))
# Define function for standard deviation.
def sdaf(xexpc, xexpsc, ic):
    return np.sqrt(xexpsc[ic] - np.square(xexpc[ic]))
# Time t: t0 =< t =< tf. Have N steps at which to evaluate the CN scheme. The
# time interval is dt. decp: variable for plotting to certain number of decimal
# places.
t0 = 0
tf = 20
N = 200
dt = tf / N
t = np.linspace(t0, tf, num = N + 1, endpoint = True)
decp = str(dt)[::-1].find('.')
# Initialise array for filling with norm values at each time step.
norm = np.zeros(len(t))
# Initialise array for expectation value of position.
xexp = np.zeros(len(t))
# Initialise array for expectation value of squared position.
xexps = np.zeros(len(t))
# Initialise array for alternate standard deviation.
sda = np.zeros(len(t))
# Position x: -a =< x =< a. M is an even number. There are M + 1 total discrete
# positions, for the points to be symmetric and centred at x = 0.
a = 100
M = 1200
dx = (2 * a) / M
x = np.linspace(-a, a, num = M + 1, endpoint = True)
# The gaussian function u diffuses over time. sd sets the width of gaussian. u0
# is the initial gaussian at t0.
sd = 1
var = np.power(sd, 2)
mu = 0
u0 = np.sqrt(1 / np.sqrt(np.pi * var)) * np.exp(-np.power(x - mu, 2) / (2 * \
    var))
u = np.zeros([len(t), len(x)], dtype = complex)
u[0, :] = u0
# Normalise u.
u[0, :] = u[0, :] / np.sqrt(normf(dx, u, 0))
# Set coefficients of CN scheme.
alpha = dt * -1j / (4 * np.power(dx, 2))
beta = dt * 1j / (4 * np.power(dx, 2))
# Tridiagonal matrices Al and AR. Al to be solved using Thomas algorithm.
Al = np.zeros([len(x), len(x)], dtype = complex)
for i in range (0, M):
    Al[i + 1, i] = alpha
    Al[i, i] = 1 - (2 * alpha)
    Al[i, i + 1] = alpha
# Corner elements for BC's.
Al[M, M], Al[0, 0] = 1 - alpha, 1 - alpha
Ar = np.zeros([len(x), len(x)], dtype = complex)
for i in range (0, M):
    Ar[i + 1, i] = beta
    Ar[i, i] = 1 - (2 * beta)
    Ar[i, i + 1] = beta
# Corner elements for BC's.
Ar[M, M], Ar[0, 0] = 1 - 2*beta, 1 - beta
# Thomas algorithm variables. Following similar naming as in Wiki article.
a = np.diag(Al, -1)
b = np.diag(Al)
c = np.diag(Al, 1)
NT = len(b)
cp = np.zeros(NT - 1, dtype = complex)
for n in range(0, NT - 1):
    if n == 0:
        cp[n] = c[n] / b[n]
    else:
        cp[n] = c[n] / (b[n] - (a[n - 1] * cp[n - 1]))
d = np.zeros(NT, dtype = complex)
dp = np.zeros(NT, dtype = complex)
# Iterate over each time step to solve CN method. Maintain boundary
# conditions. Keep track of standard deviation.
for i in range(0, N):
    # BC's.
    u[i, 0], u[i, M] = 0, 0
    # Find RHS.
    d = np.dot(Ar, u[i, :])
    for n in range(0, NT):
        if n == 0:
            dp[n] = d[n] / b[n]
        else:
            dp[n] = (d[n] - (a[n - 1] * dp[n - 1])) / (b[n] - (a[n - 1] * \
                cp[n - 1]))
    nc = NT - 1
    while nc > -1:
        if nc == NT - 1:
            u[i + 1, nc] = dp[nc]
            nc -= 1
        else:
            u[i + 1, nc] = dp[nc] - (cp[nc] * u[i + 1, nc + 1])
            nc -= 1
    norm[i] = normf(dx, u, i)
    xexp[i] = xexpf(dx, x, u, i)
    xexps[i] = xexpsf(dx, x, u, i)
    sda[i] = sdaf(xexp, xexps, i)
# Fill in final norm value.
norm[N] = normf(dx, u, N)
# Fill in final position expectation value.
xexp[N] = xexpf(dx, x, u, N)
# Fill in final squared position expectation value.
xexps[N] = xexpsf(dx, x, u, N)
# Fill in final standard deviation value.
sda[N] = sdaf(xexp, xexps, N)
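Two observations, offered tentatively since no answer is included here. First, the boundary rows differ between the two matrices above. `Al` uses `1 - alpha` at both ends, while `Ar` uses `1 - 2*beta` at one end and `1 - beta` at the other, so the left- and right-hand operators are not built from the same discretised Hamiltonian. Second, for a free particle only |ψ|² is expected to stay Gaussian. The real and imaginary parts individually acquire a position-dependent phase, so their shapes can change in ways that a plot of Re ψ alone does not make obvious.

A minimal way to check a solver is to build both sides from one Hamiltonian, confirm that the norm is conserved, and compare ⟨x²⟩ with the free-particle result ½(1 + t²). That result applies to this initial state with ħ = m = 1, which matches the question's coefficients. The sketch below uses a small grid and a dense solve purely for brevity. The tridiagonal (Thomas) solver used in the question is the efficient choice for large grids. The name `cn_propagator` is hypothetical.

```python
import numpy as np

def cn_propagator(n_pts, dx, dt):
    """One Crank-Nicolson step U = A^{-1} B for i dpsi/dt = -(1/2) d2psi/dx2
    (hbar = mass = 1), with psi = 0 just outside both ends of the grid."""
    off = np.full(n_pts - 1, 1.0)
    # Second-difference matrix with the same Dirichlet treatment on both ends.
    D2 = (np.diag(off, -1) - 2.0 * np.eye(n_pts) + np.diag(off, 1)) / dx ** 2
    H = -0.5 * D2
    A = np.eye(n_pts) + 0.5j * dt * H            # implicit (left-hand) matrix
    B = np.eye(n_pts) - 0.5j * dt * H            # explicit (right-hand) matrix
    return np.linalg.solve(A, B)                 # dense: fine for a small demo grid

x = np.linspace(-20, 20, 401)
dx = x[1] - x[0]
psi0 = np.pi ** -0.25 * np.exp(-x ** 2 / 2)     # same shape as the question's u0 (sd = 1)
t_final, n_steps = 5.0, 100
U = cn_propagator(x.size, dx, t_final / n_steps)
psi = psi0.astype(complex)
for _ in range(n_steps):
    psi = U @ psi
prob = np.abs(psi) ** 2
print("norm:", prob.sum() * dx)
print("<x^2>:", (x ** 2 * prob).sum() * dx, "free-particle result:", 0.5 * (1 + t_final ** 2))
```

Because A and B are built from the same symmetric H, U is unitary up to rounding, so any drift in the norm points to a mismatch between the two sides.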

How to use argmin() and find minimum value from array

I'm new to Python, so the code may not be the best. I'm trying to find the minimum total cost (TotalC) and the corresponding m, k and xM values that go with this minimum cost, but I'm not sure how to do this. I have tried using min(TotalC); however, this gives an error inside the loop, and outside the loop it only returns the value of TotalC and not the corresponding m, k and xM values. Any help would be appreciated. This section is at the end of the code; I have included my entire code.
I have tried using
minIndex = TotalC.argmin()
but I'm not sure how to use it and it only returns 0 each time.
import numpy as np
import matplotlib.pyplot as plt
def Load(x):
    Fpeak = (1000 + (9*(x**2) - (183*x))) *1000 #Fpeak in N
    td = (20 - ((0.12)*(x**2)) + (4.2*(x))) / 1000 #td in s
    return Fpeak, td
#####################################################################################################
####################### Part 2 ########################
def displacement(m,k,x,dt): #Displacement function
    Fpeak, td = Load(x) #Load Function from step 1
    w = np.sqrt(k/m) # Natural circular frequency
    T = 2 * np.pi /w #Natural period of blast (s)
    time = np.arange(0,2*T,0.001) #Time array with range (0 - 2*T) with steps of 0.001 s
    zt = [] #Create a list to store displacement values
    for t in time:
        if (t <= td):
            zt.append((Fpeak/k) * (1 - np.cos(w*t)) + (Fpeak/(k*td)) * ((np.sin(w*t)/w) - t))
        else:
            zt.append((Fpeak/(k*w*td)) * (np.sin(w*t) - np.sin(w*(t-td))) - ((Fpeak/k) * np.cos(w*t)))
    zmax=max(zt) #Find the max displacement from the list of zt values
    return zmax #Return max displacement
k = 1E6
m = 200
dt = 0.0001
x = 0
z = displacement(m,k,x,dt)
###################################################################################
############### Part 3 #######################
# k = 1E6 , m = 200kg , Deflection = 0.1m
k_values = np.arange(1E6, 7E6, ((7E6-1E6)/10)) #List of k values between min and max (1E6 and 7E6).
m_values = np.arange(200,1200,((1200-200)/10)) #List of m values between min and max 200kg and 1200kg
xM = []
for k in k_values: # values of k
    for m in m_values: # values of m within k for loop
        def bisector(m,k,dpoint,dt): #dpoint = decimal point accuracy
            xL = 0
            xR = 10
            xM = (xL + xR)/2
            zmax = 99
            while round(zmax, dpoint) !=0.1:
                zmax = displacement(m,k,xM,dt)
                if zmax > 0.1:
                    xL = xM
                    xM = (xL + xR)/2
                else:
                    xR = xM
                    xM = (xL + xR)/2
            return xM
        xM = bisector(m, k, 4, 0.001)
        print('xM value =',xM)
#####################################################
#######Step 4
def cost (m,k,xM):
    Ck = 900 + 825*((k/1E6)**2) - (1725*(k/1E6))
    Cm = 10*m - 2000
    Cx = 2400*((xM**2)/4)
    TotalC = Ck + Cm + Cx
    minIndex = TotalC.argmin(0)
    print(minIndex)
    return TotalC
TotalC = cost(m, k, xM)
minIndex = TotalC.argmin()
print(minIndex)
print([xM, m, k, TotalC])
argmin() returns the index of the minimum value. If you are looking for the minimum itself, try using .min(). There is also a possibility that the lowest value in your array really is at index 0, so bear that in mind.
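To expand on this: in the question's code, `cost(m, k, xM)` is called once, after the loops, with only the last `m`, `k` and `xM`. So `TotalC` is a single number, and `argmin()` on it can only return 0. A minimal sketch of the usual pattern follows. It assumes the loops append each `(m, k, xM)` triple to a list (called `results` here, a hypothetical name). All costs are then evaluated as one array, and the index from `argmin()` looks up the matching parameters. The helper name `cheapest` is also hypothetical; the cost formula is copied from the question's Step 4.

```python
import numpy as np

def cost(m, k, xM):
    # Cost model copied from the question (Step 4); works on arrays too.
    Ck = 900 + 825 * ((k / 1E6) ** 2) - (1725 * (k / 1E6))
    Cm = 10 * m - 2000
    Cx = 2400 * ((xM ** 2) / 4)
    return Ck + Cm + Cx

def cheapest(results):
    """results: list of (m, k, xM) tuples collected inside the loops.
    Returns (m, k, xM, total_cost) for the lowest total cost."""
    params = np.array(results, dtype=float)               # shape (n, 3)
    costs = cost(params[:, 0], params[:, 1], params[:, 2])
    i = np.argmin(costs)                                  # index of the cheapest design
    m, k, xM = params[i]
    return m, k, xM, costs[i]

# Inside the question's loops one would do: results.append((m, k, xM))
```

The key point is that `argmin()` only makes sense on an array that holds one cost per design, and its index is then used on a parallel array of parameters.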

Jacobi Method Outputting Wrong Eigenvalues

I am working on creating an eigenvalue calculator using the Jacobi method and it runs without errors. However, it does not find the correct eigenvalues nor does it find the correct eigenvectors. For some reason, I always get eigenvalues of 0. I think it may not be saving the matrix I input for MatrixA.
(Link to Jacobi method in case you are not familiar: http://fourier.eng.hmc.edu/e176/lectures/ch1/node1.html)
import numpy as np
import bettertimeit as time
import matplotlib as plt
def Jacobi(A):
    n = A.shape[0] # matrix size #columns = #lines
    maxit = 100 # maximum number of iterations
    eps = 1.0e-15 # accuracy goal
    pi = np.pi
    info = 0 # return flag
    ev = np.zeros(n,float) # initialize eigenvalues
    U = np.zeros((n,n),float) # initialize eigenvector
    for i in range(0,n): U[i,i] = 1.0
    for t in range(0,maxit):
        s = 0 # compute sum of off-diagonal elements in A(i,j)
        for i in range(0,n): s = s + np.sum(np.abs(A[i,(i+1):n]))
        if (s < eps): # diagonal form reached
            info = t
            for i in range(0,n):ev[i] = A[i,i]
            break
        else:
            limit = s/(n*(n-1)/2.0) # average value of off-diagonal elements
            for i in range(0,n-1): # loop over lines of matrix
                for j in range(i+1,n): # loop over columns of matrix
                    if (np.abs(A[i,j]) > limit): # determine (ij) such that |A(i,j)| larger than average
                                                 # value of off-diagonal elements
                        denom = A[i,i] - A[j,j] # denominator of Eq. (3.61)
                        if (np.abs(denom) < eps): phi = pi/4 # Eq. (3.62)
                        else: phi = 0.5*np.arctan(2.0*A[i,j]/denom) # Eq. (3.61)
                        si = np.sin(phi)
                        co = np.cos(phi)
                        for k in range(i+1,j):
                            store = A[i,k]
                            A[i,k] = A[i,k]*co + A[k,j]*si # Eq. (3.56)
                            A[k,j] = A[k,j]*co - store *si # Eq. (3.57)
                        for k in range(j+1,n):
                            store = A[i,k]
                            A[i,k] = A[i,k]*co + A[j,k]*si # Eq. (3.56)
                            A[j,k] = A[j,k]*co - store *si # Eq. (3.57)
                        for k in range(0,i):
                            store = A[k,i]
                            A[k,i] = A[k,i]*co + A[k,j]*si
                            A[k,j] = A[k,j]*co - store *si
                        store = A[i,i]
                        A[i,i] = A[i,i]*co*co + 2.0*A[i,j]*co*si +A[j,j]*si*si # Eq. (3.58)
                        A[j,j] = A[j,j]*co*co - 2.0*A[i,j]*co*si +store *si*si # Eq. (3.59)
                        A[i,j] = 0.0 # Eq. (3.60)
                        for k in range(0,n):
                            store = U[k,j]
                            U[k,j] = U[k,j]*co - U[k,i]*si # Eq. (3.66)
                            U[k,i] = U[k,i]*co + store *si # Eq. (3.67)
            info = -t # in case no convergence is reached set info to a negative value "-t"
    return ev,U,t
n = int(input("Enter the matrix size: "))
A = np.zeros((n, n))
for i in range(n):
    A[i] = input().split(" ")
MatrixA = np.array(A)
print("A= ")
print(A)
for i in range(A.shape[0]):
    row = ["{}*x{}".format(A[i, j], j + 1) for j in range(A.shape[1])]
# Jacobi-method
ev,U,t = Jacobi(A)
print ("JACOBI METHOD: Number of rotations = ",t)
print ("Eigenvalues = ",ev)
print ("Eigenvectors = ")
print (U)
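Two observations about the code above, offered as hints rather than a diagnosis. First, `ev` is only filled inside the `if (s < eps)` branch. `eps = 1.0e-15` is an absolute threshold on the sum of off-diagonal magnitudes, so if it is not reached within `maxit` iterations, the function returns the zeros `ev` was initialised with. Second, `Jacobi(A)` modifies `A` in place, while `MatrixA` is a separate copy.

For comparison, here is a minimal sketch of the textbook cyclic Jacobi method. It uses full rotation matrices and a relative stopping criterion, and is written for clarity rather than speed. This is a swapped-in formulation, not the lecture's element-by-element update, and the name `jacobi_eigen` is hypothetical. Its results can be checked against `numpy.linalg.eigh`.

```python
import numpy as np

def jacobi_eigen(A, tol=1e-12, max_sweeps=50):
    """Cyclic Jacobi method for a real symmetric matrix.

    Returns (eigenvalues, eigenvectors as columns, sweeps used).
    The input matrix is copied, not modified.
    """
    A = np.array(A, dtype=float)
    n = A.shape[0]
    V = np.eye(n)
    scale = np.linalg.norm(A)          # Frobenius norm, unchanged by rotations
    for sweep in range(max_sweeps):
        off = np.sqrt(np.sum(np.triu(A, 1) ** 2))
        if off <= tol * scale:         # relative test instead of a fixed 1e-15
            return np.diag(A).copy(), V, sweep
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p, q] == 0.0:
                    continue
                # Rotation angle phi that zeroes A[p, q]: cot(2 phi) = theta.
                theta = (A[q, q] - A[p, p]) / (2.0 * A[p, q])
                sgn = 1.0 if theta >= 0 else -1.0
                t = sgn / (abs(theta) + np.sqrt(theta * theta + 1.0))   # tan(phi)
                c = 1.0 / np.sqrt(t * t + 1.0)
                s = t * c
                P = np.eye(n)
                P[p, p] = P[q, q] = c
                P[p, q], P[q, p] = s, -s
                A = P.T @ A @ P        # similarity transform keeps the eigenvalues
                V = V @ P              # accumulate eigenvectors as columns
    raise RuntimeError("Jacobi method did not converge")
```

Usage: `ev, V, sweeps = jacobi_eigen(MatrixA)`. The columns of `V` are the eigenvectors in the same order as `ev`.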

Making a Poisson sphere distribution in Python, but I cannot figure out where the bug is

I am new to programming, so I hope my stupid questions do not bug you.
I am now trying to calculate a Poisson sphere distribution (a 3D version of the Poisson disk) using Python and then plug the result into POV-Ray, so that I can generate some randomly distributed packed rocks.
I am following these two links:
[https://github.com/CodingTrain/Rainbow-Code/blob/master/CodingChallenges/CC_33_poisson_disc/sketch.js#L13]
[https://www.cs.ubc.ca/~rbridson/docs/bridson-siggraph07-poissondisk.pdf]
tl;dr
0. Create an n-dimensional grid array with cell size = r/sqrt(n), where r is the minimum distance between spheres. All entries default to -1, which stands for 'no point'.
1. Create an initial sample (it should be placed randomly, but I chose to put it in the middle). Put it in the grid array. Also initialize an active array and put the initial sample in it.
2. While the active list is not empty, pick a random index. Generate points near it and make sure they do not overlap with nearby points (only testing against the neighbouring cells). If no sample can be created near the point at the random index, remove it from the active list. Repeat.
And here is my code:
import math
from random import uniform
import numpy
import random
radius = 1 #you can change the size of each sphere
mindis = 2 * radius
maxx = 10 #you can change the size of the container
maxy = 10
maxz = 10
k = 30
cellsize = mindis / math.sqrt(3)
nrofx = math.floor(maxx / cellsize)
nrofy = math.floor(maxy / cellsize)
nrofz = math.floor(maxz / cellsize)
grid = []
active = []
default = numpy.array((-1, -1, -1))
for fillindex in range(nrofx * nrofy * nrofz):
    grid.append(default)
x = uniform(0, maxx)
y = uniform(0, maxy)
z = uniform(0, maxz)
firstpos = numpy.array((x, y, z))
firsti = maxx // 2
firstj = maxy // 2
firstk = maxz // 2
grid[firsti + nrofx * (firstj + nrofy * firstk)] = firstpos
active.append(firstpos)
while (len(active) > 0) :
    randindex = math.floor(uniform(0,len(active)))
    pos = active[randindex]
    found = False
    for attempt in range(k):
        offsetx = uniform(mindis, 2 * mindis)
        offsety = uniform(mindis, 2 * mindis)
        offsetz = uniform(mindis, 2 * mindis)
        samplex = offsetx * random.choice([1,-1])
        sampley = offsety * random.choice([1,-1])
        samplez = offsetz * random.choice([1,-1])
        sample = numpy.array((samplex, sampley, samplez))
        sample = numpy.add(sample, pos)
        xcoor = math.floor(sample.item(0) / cellsize)
        ycoor = math.floor(sample.item(1) / cellsize)
        zcoor = math.floor(sample.item(2) / cellsize)
        attemptindex = xcoor + nrofx * (ycoor + nrofy * zcoor)
        if attemptindex >= 0 and attemptindex < nrofx * nrofy * nrofz and numpy.all([sample, default]) == True and xcoor > 0 and ycoor > 0 and zcoor > 0 :
            test = True
            for testx in range(-1,2):
                for testy in range(-1, 2):
                    for testz in range(-1, 2):
                        testindex = (xcoor + testx) + nrofx * ((ycoor + testy) + nrofy * (zcoor + testz))
                        if testindex >=0 and testindex < nrofx * nrofy * nrofz :
                            neighbour = grid[testindex]
                            if numpy.all([neighbour, sample]) == False:
                                if numpy.all([neighbour, default]) == False:
                                    distance = numpy.linalg.norm(sample - neighbour)
                                    if distance > mindis:
                                        test = False
            if test == True and len(active)<len(grid):
                found = True
                grid[attemptindex] = sample
                active.append(sample)
    if found == False:
        del active[randindex]
for printout in range(len(grid)):
    print("<" + str(active[printout][0]) + "," + str(active[printout][1]) + "," + str(active[printout][2]) + ">")
print(len(grid))
My code seems to run forever.
So I tried adding a print(len(active)) at the end of the while loop.
Surprisingly, I think I discovered the bug: the length of the active list just keeps increasing! (It is supposed to be the same length as the grid.) I think the problem is caused by active.append(), but I can't figure out where the problem is, as the code is literally 90% the same as the one made by Mr. Shiffman.
I don't want to free-ride on this, but I have already checked and corrected this code again and again :(. Still, I don't know where the bug is. (Why does active[] keep growing?)
Thank you for your precious time.
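For comparison, here is a minimal sketch of Bridson's algorithm in 3D that follows the three steps summarised above. It assumes a cubic box [0, size)^3 and uses r = mindis. Three choices here differ from the question's code and may be worth checking there:

- Candidates are drawn in a spherical shell between r and 2r around the active point, rather than from a cube-shaped offset.
- A candidate is rejected when it is *closer* than r to a neighbour.
- The neighbour search covers two cells in each direction, because with cell size r/sqrt(3) a point closer than r can be up to sqrt(3) ≈ 1.73 cells away.

Points are stored in a dictionary keyed by cell, so no -1 sentinel is needed. The function and parameter names are illustrative.

```python
import math
import random

def poisson_spheres(r, size, k=30, seed=None):
    """Bridson-style Poisson-disk sampling in the cube [0, size)^3.

    r: minimum centre-to-centre distance (the question's mindis);
    k: candidate attempts per active point before it is retired.
    """
    rng = random.Random(seed)
    cell = r / math.sqrt(3)            # cell diagonal equals r: at most one point per cell
    grid = {}                          # (i, j, h) cell index -> point

    def cell_of(p):
        return tuple(int(c // cell) for c in p)

    def far_enough(p):
        ci, cj, ch = cell_of(p)
        # A point closer than r can be up to ~1.73 cells away, so scan +-2 cells.
        for i in range(ci - 2, ci + 3):
            for j in range(cj - 2, cj + 3):
                for h in range(ch - 2, ch + 3):
                    q = grid.get((i, j, h))
                    if q is not None and math.dist(p, q) < r:
                        return False
        return True

    first = tuple(rng.uniform(0, size) for _ in range(3))
    grid[cell_of(first)] = first
    points, active = [first], [first]
    while active:
        idx = rng.randrange(len(active))
        base = active[idx]
        for _ in range(k):
            # Random direction from a normalised Gaussian vector; radius drawn
            # so that candidates are uniform in the volume of the r..2r shell.
            d = [rng.gauss(0.0, 1.0) for _ in range(3)]
            norm = math.sqrt(sum(c * c for c in d)) or 1.0
            rad = rng.uniform(r ** 3, (2 * r) ** 3) ** (1.0 / 3.0)
            p = tuple(b + rad * c / norm for b, c in zip(base, d))
            if all(0 <= c < size for c in p) and far_enough(p):
                grid[cell_of(p)] = p
                points.append(p)
                active.append(p)
                break
        else:
            active.pop(idx)            # no valid candidate after k tries: retire it
    return points
```

Usage: `centres = poisson_spheres(2 * radius, 10.0)` returns sphere centres that can be written out as `<x,y,z>` for POV-Ray. Note that the number of points is not fixed in advance; it depends on how the box fills up.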

Python implementation of the Wilson Score Interval?

After reading How Not to Sort by Average Rating, I was curious if anyone has a Python implementation of a Lower bound of Wilson score confidence interval for a Bernoulli parameter?
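For reference, here is the Wilson score interval (without continuity correction) that the snippets below compute. It is written for an observed proportion \hat{p} = ups/n over n = ups + downs trials and a normal quantile z; the lower bound (minus sign) is the value used for ranking:

```latex
\frac{1}{1 + z^2/n}\left[\hat{p} + \frac{z^2}{2n} \pm z\sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z^2}{4n^2}}\right]
```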
Reddit uses the Wilson score interval for comment ranking; an explanation and a Python implementation can be found here:
#Rewritten code from /r2/r2/lib/db/_sorts.pyx
from math import sqrt
def confidence(ups, downs):
    n = ups + downs
    if n == 0:
        return 0
    z = 1.0 #1.44 = 85%, 1.96 = 95%
    phat = float(ups) / n
    return ((phat + z*z/(2*n) - z * sqrt((phat*(1-phat)+z*z/(4*n))/n))/(1+z*z/n))
I think this one has a wrong Wilson call, because if you have 1 up and 0 down you get NaN, since you can't take the sqrt of a negative value.
The correct one can be found in the Ruby example from the article How Not to Sort by Average Rating:
return ((phat + z*z/(2*n) - z * sqrt((phat*(1-phat)+z*z/(4*n))/n))/(1+z*z/n))
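Here is a quick check of the 1-up, 0-down case with the formula as written above (z = 1.0). With n = 1 and phat = 1, the square-root argument is (0 + 0.25)/1, which is positive, so this expression does not produce a NaN and the lower bound evaluates to 0.5. The function is repeated so the check is self-contained. Exposing z as a keyword argument is an addition made for testing.

```python
from math import sqrt

def confidence(ups, downs, z=1.0):
    n = ups + downs
    if n == 0:
        return 0
    phat = float(ups) / n
    # The square-root argument (phat*(1-phat) + z*z/(4*n)) / n is >= 0 for 0 <= phat <= 1.
    return (phat + z*z/(2*n) - z * sqrt((phat*(1-phat) + z*z/(4*n))/n)) / (1 + z*z/n)

print(confidence(1, 0))   # 0.5
```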
To get the Wilson CI without continuity correction, you can use proportion_confint in statsmodels.stats.proportion. To get the Wilson CI with continuity correction, you can use the code below.
# cf.
# [1] R. G. Newcombe. Two-sided confidence intervals for the single proportion, 1998
# [2] R. G. Newcombe. Interval Estimation for the difference between independent proportions: comparison of eleven methods, 1998
import numpy as np
from statsmodels.stats.proportion import proportion_confint
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
def propci_wilson_cc(count, nobs, alpha=0.05):
    # get confidence limits for proportion
    # using wilson score method w/ cont correction
    # i.e. Method 4 in Newcombe [1];
    # verified via Table 1
    from scipy import stats
    n = nobs
    p = count/n
    q = 1.-p
    z = stats.norm.isf(alpha / 2.)
    z2 = z**2
    denom = 2*(n+z2)
    num = 2.*n*p+z2-1.-z*np.sqrt(z2-2-1./n+4*p*(n*q+1))
    ci_l = num/denom
    num = 2.*n*p+z2+1.+z*np.sqrt(z2+2-1./n+4*p*(n*q-1))
    ci_u = num/denom
    if p == 0:
        ci_l = 0.
    elif p == 1:
        ci_u = 1.
    return ci_l, ci_u
def dpropci_wilson_nocc(a,m,b,n,alpha=0.05):
    # get confidence limits for difference in proportions
    # a/m - b/n
    # using wilson score method WITHOUT cont correction
    # i.e. Method 10 in Newcombe [2]
    # verified via Table II
    theta = a/m - b/n
    l1, u1 = proportion_confint(count=a, nobs=m, alpha=alpha, method='wilson')
    l2, u2 = proportion_confint(count=b, nobs=n, alpha=alpha, method='wilson')
    ci_u = theta + np.sqrt((a/m-u1)**2+(b/n-l2)**2)
    ci_l = theta - np.sqrt((a/m-l1)**2+(b/n-u2)**2)
    return ci_l, ci_u
def dpropci_wilson_cc(a,m,b,n,alpha=0.05):
    # get confidence limits for difference in proportions
    # a/m - b/n
    # using wilson score method w/ cont correction
    # i.e. Method 11 in Newcombe [2]
    # verified via Table II
    theta = a/m - b/n
    l1, u1 = propci_wilson_cc(count=a, nobs=m, alpha=alpha)
    l2, u2 = propci_wilson_cc(count=b, nobs=n, alpha=alpha)
    ci_u = theta + np.sqrt((a/m-u1)**2+(b/n-l2)**2)
    ci_l = theta - np.sqrt((a/m-l1)**2+(b/n-u2)**2)
    return ci_l, ci_u
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
# single proportion testing
# these come from Newcombe [1] (Table 1)
a_vec = np.array([81, 15, 0, 1])
m_vec = np.array([263, 148, 20, 29])
for (a,m) in zip(a_vec,m_vec):
    l1, u1 = proportion_confint(count=a, nobs=m, alpha=0.05, method='wilson')
    l2, u2 = propci_wilson_cc(count=a, nobs=m, alpha=0.05)
    print(a,m,l1,u1,' ',l2,u2)
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
# difference in proportions testing
# these come from Newcombe [2] (Table II)
a_vec = np.array([56,9,6,5,0,0,10,10],dtype=float)
m_vec = np.array([70,10,7,56,10,10,10,10],dtype=float)
b_vec = np.array([48,3,2,0,0,0,0,0],dtype=float)
n_vec = np.array([80,10,7,29,20,10,20,10],dtype=float)
print('\nWilson without CC')
for (a,m,b,n) in zip(a_vec,m_vec,b_vec,n_vec):
    l, u = dpropci_wilson_nocc(a,m,b,n,alpha=0.05)
    print('{:2.0f}/{:2.0f}-{:2.0f}/{:2.0f} ; {:6.4f} ; {:8.4f}, {:8.4f}'.format(a,m,b,n,a/m-b/n,l,u))
print('\nWilson with CC')
for (a,m,b,n) in zip(a_vec,m_vec,b_vec,n_vec):
    l, u = dpropci_wilson_cc(a,m,b,n,alpha=0.05)
    print('{:2.0f}/{:2.0f}-{:2.0f}/{:2.0f} ; {:6.4f} ; {:8.4f}, {:8.4f}'.format(a,m,b,n,a/m-b/n,l,u))
HTH
The accepted solution seems to use a hard-coded z-value (best for performance).
If you want a direct Python equivalent of the Ruby formula from the blog post, with a dynamic z-value computed from the confidence level:
import math
import scipy.stats as st
def ci_lower_bound(pos, n, confidence):
    if n == 0:
        return 0
    z = st.norm.ppf(1 - (1 - confidence) / 2)
    phat = 1.0 * pos / n
    return (phat + z * z / (2 * n) - z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)) / (1 + z * z / n)
If you'd like to calculate z directly from a confidence level and want to avoid installing numpy/scipy, you can use the following snippet of code:
import math
def binconf(p, n, c=0.95):
    '''
    Calculate binomial confidence interval based on the number of positive and
    negative events observed. Uses Wilson score and approximations to inverse
    of normal cumulative distribution function.
    Parameters
    ----------
    p: int
        number of positive events observed
    n: int
        number of negative events observed
    c : optional, [0,1]
        confidence percentage. e.g. 0.95 means 95% confident the probability of
        success lies between the 2 returned values
    Returns
    -------
    theta_low : float
        lower bound on confidence interval
    theta_high : float
        upper bound on confidence interval
    '''
    p, n = float(p), float(n)
    N = p + n
    if N == 0.0: return (0.0, 1.0)
    p = p / N
    z = normcdfi(1 - 0.5 * (1-c))
    a1 = 1.0 / (1.0 + z * z / N)
    a2 = p + z * z / (2 * N)
    a3 = z * math.sqrt(p * (1-p) / N + z * z / (4 * N * N))
    return (a1 * (a2 - a3), a1 * (a2 + a3))
def erfi(x):
    """Approximation to inverse error function"""
    a = 0.147 # MAGIC!!!
    a1 = math.log(1 - x * x)
    a2 = (
        2.0 / (math.pi * a)
        + a1 / 2.0
    )
    return (
        sign(x) *
        math.sqrt( math.sqrt(a2 * a2 - a1 / a) - a2 )
    )
def sign(x):
    if x < 0: return -1
    if x == 0: return 0
    if x > 0: return 1
def normcdfi(p, mu=0.0, sigma2=1.0):
    """Inverse CDF of normal distribution"""
    if mu == 0.0 and sigma2 == 1.0:
        return math.sqrt(2) * erfi(2 * p - 1)
    else:
        return mu + math.sqrt(sigma2) * normcdfi(p)
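On Python 3.8 and later, the standard library's `statistics.NormalDist().inv_cdf` gives the normal quantile directly, so the `erfi` approximation above can be avoided without installing numpy or scipy. Here is a minimal sketch. Unlike `binconf`, it takes the number of successes and the total number of trials, and the name `wilson_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(pos, n, confidence=0.95):
    """Wilson score interval (no continuity correction) for pos successes in n trials."""
    if n == 0:
        return (0.0, 1.0)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # two-sided quantile, ~1.96 for 95%
    phat = pos / n
    denom = 1 + z * z / n
    centre = phat + z * z / (2 * n)
    half = z * sqrt(phat * (1 - phat) / n + z * z / (4 * n * n))
    return ((centre - half) / denom, (centre + half) / denom)
```

For example, `wilson_interval(81, 263)` gives roughly (0.2553, 0.3662), the Wilson interval for the first case in Newcombe's Table 1 used in the statsmodels answer above.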
Here is a simplified (no need for numpy) and slightly improved version of the Wilson score confidence interval with continuity correction, based on the original source code written by batesbatesbates in another answer; values of k equal to 0 or n no longer cause a math domain error. There is also a pure-Python, no-numpy version without continuity correction, with two equivalent ways to calculate it (switchable with the eqmode argument); both give exactly the same results:
import math
def propci_wilson_nocc(k, n, z=1.96, eqmode=0):
    # Calculates the Binomial Proportion Confidence Interval using the Wilson Score method without continuity correction
    # Equations eqmode == 1 from: https://en.wikipedia.org/w/index.php?title=Binomial_proportion_confidence_interval&oldid=1101942017#Wilson_score_interval
    # Equations eqmode == 0 from: https://www.evanmiller.org/how-not-to-sort-by-average-rating.html
    # The results should be close to:
    #    from statsmodels.stats.proportion import proportion_confint
    #    proportion_confint(k, n, alpha=0.05, method='wilson')
    #z=1.44 = 85%, 1.96 = 95%
    if n == 0:
        return 0
    p_hat = float(k) / n
    z2 = z**2
    if eqmode == 0:
        ci_l = (p_hat + z2/(2*n) - z*math.sqrt(max(0.0, (p_hat*(1 - p_hat) + z2/(4*n))/n))) / (1 + z2 / n)
    else:
        ci_l = (1.0 / (1.0 + z2/n)) * (p_hat + z2/(2*n)) - (z / (1 + z2/n)) * math.sqrt(max(0.0, (p_hat*(1 - p_hat)/n + z2/(4*(n**2)))))
    if eqmode == 0:
        ci_u = (p_hat + z2/(2*n) + z*math.sqrt(max(0.0, (p_hat*(1 - p_hat) + z2/(4*n))/n))) / (1 + z2 / n)
    else:
        ci_u = (1.0 / (1.0 + z2/n)) * (p_hat + z2/(2*n)) + (z / (1 + z2/n)) * math.sqrt(max(0.0, (p_hat*(1 - p_hat)/n + z2/(4*(n**2)))))
    return [ci_l, ci_u]
def propci_wilson_cc(n, k, z=1.96):
    # Calculates the Binomial Proportion Confidence Interval using the Wilson Score method with continuity correction
    # i.e. Method 4 in Newcombe [1]: R. G. Newcombe. Two-sided confidence intervals for the single proportion, 1998;
    # verified via Table 1
    # originally written by batesbatesbates https://stackoverflow.com/questions/10029588/python-implementation-of-the-wilson-score-interval/74021634#74021634
    p_hat = k/n
    q = 1.0-p_hat
    z2 = z**2
    denom = 2*(n+z2)
    num = 2.0*n*p_hat + z2 - 1.0 - z*math.sqrt(max(0.0, z2 - 2 - 1.0/n + 4*p_hat*(n*q + 1)))
    ci_l = num/denom
    num2 = 2.0*n*p_hat + z2 + 1.0 + z*math.sqrt(max(0.0, z2 + 2 - 1.0/n + 4*p_hat*(n*q - 1)))
    ci_u = num2/denom
    if p_hat == 0:
        ci_l = 0.0
    elif p_hat == 1:
        ci_u = 1.0
    return [ci_l, ci_u]
Note that the returned bounds always lie in [0.0, 1.0], because they are bounds on the proportion p_hat = k/n rather than on the count k. To express them on the same scale as k (for example, to plot them alongside k), multiply ci_l and ci_u by n.
Here is a much more readable version for how to compute the Wilson Score interval without continuity correction, by Bartosz Mikulski:
from math import sqrt
def wilson(p, n, z = 1.96):
    denominator = 1 + z**2/n
    centre_adjusted_probability = p + z*z / (2*n)
    adjusted_standard_deviation = sqrt((p*(1 - p) + z*z / (4*n)) / n)
    lower_bound = (centre_adjusted_probability - z*adjusted_standard_deviation) / denominator
    upper_bound = (centre_adjusted_probability + z*adjusted_standard_deviation) / denominator
    return (lower_bound, upper_bound)
