I've been reading about the Metropolis-Hastings (MH) algorithm. I understand how the algorithm works in theory, and now I am trying to implement MH in Python.
I came across the following notebook. It suits my problem exactly, since I want to fit my data with a straight line while taking into account the measurement errors on my data. I am going to paste the part of the code I am finding difficult to understand:
# initial m, b
m, b = 2, 0
# step sizes
mstep, bstep = 0.1, 10.
# how many steps?
nsteps = 10000

chain = []
probs = []
naccept = 0

print('Running MH for', nsteps, 'steps')

# First point:
L_old = straight_line_log_likelihood(x, y, sigmay, m, b)
p_old = straight_line_log_prior(m, b)
prob_old = np.exp(L_old + p_old)

for i in range(nsteps):
    # step
    mnew = m + np.random.normal() * mstep
    bnew = b + np.random.normal() * bstep

    # evaluate probabilities
    # prob_new = straight_line_posterior(x, y, sigmay, mnew, bnew)
    L_new = straight_line_log_likelihood(x, y, sigmay, mnew, bnew)
    p_new = straight_line_log_prior(mnew, bnew)
    prob_new = np.exp(L_new + p_new)

    if (prob_new / prob_old > np.random.uniform()):
        # accept
        m = mnew
        b = bnew
        L_old = L_new
        p_old = p_new
        prob_old = prob_new
        naccept += 1
    else:
        # Stay where we are; m, b stay the same, and we append them
        # to the chain below.
        pass

    chain.append((b, m))
    probs.append((L_old, p_old))

print('Acceptance fraction:', naccept / float(nsteps))
The code is short and simple, but I have difficulty understanding how MH is being implemented here.
My question is about the chain.append call (the third line from the bottom). The author appends m and b whether they were accepted or rejected. Why? Shouldn't he append only the accepted points?
The following R code demonstrates why it is important to capture the rejected case:
# 20 samples from 0 or 1. 1 has an 80% probability of being chosen.
the.population <- sample(c(0,1), 20, replace = TRUE, prob=c(0.2, 0.8))
# Create a new sample that only catches changes
the.sample <- c(the.population[1])
# Loop through the.population,
# but only copy the.population to the.sample if the value changes
for (i in 2:length(the.population))
{
  if (the.population[i] != the.population[i-1])
    the.sample <- append(the.sample, the.population[i])
}
When this code runs, the.population gets 20 values, for example:
0 1 1 1 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1 1
The probability of a 1 in this population is 16/20 or 0.8. Exactly the probability we expected...
The sample, on the other hand, which only records changes, looks like this:
0 1 0 1 0 1
The probability of a 1 in the sample is 3/6 or 0.5.
We are trying to build a distribution; rejecting a new value means that the old value is more likely than the new one. That needs to be captured so that our distribution is correct.
From a quick reading of the algorithm description: When a candidate is rejected, it still counts as a step, but the value is the same as the old step. I.e. b, m are appended either way, but they only get updated (to bnew, mnew) in the case where the candidate is accepted.
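To see the same effect directly in Python, here is a toy sketch (my own illustration, not code from the notebook) of MH on a two-state target with P(0) = 0.2 and P(1) = 0.8 and a symmetric "flip" proposal. Keeping the repeated states gives the right answer; keeping only the accepted moves does not, just as in the R example above.
import numpy as np

rng = np.random.default_rng(0)
target = {0: 0.2, 1: 0.8}

state = 0
chain = []          # rejected proposals repeat the old state here
accepted_only = []  # what you would get if you kept only accepted moves
for _ in range(100000):
    proposal = 1 - state  # symmetric proposal: flip the state
    if target[proposal] / target[state] > rng.uniform():
        state = proposal
        accepted_only.append(state)
    chain.append(state)

print(np.mean(chain))          # ~0.8, the target probability of 1
print(np.mean(accepted_only))  # ~0.5, wrong, because the repeats were dropped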
I am writing this simple function to use power iteration to find the dominant eigenvalue. I want to include two stopping conditions: one on the number of iterations and one on a precision threshold. But the error calculation does not work.
What am I doing wrong here, in principle?
# power iteration, vanilla
import numpy as np

A = np.random.uniform(low=-5.0, high=10.0, size=[3, 3])

def power_iteration(A, maxiter, threshold):
    b0 = np.random.rand(A.shape[1])
    it = 0
    error = 0
    while True:
        for i in range(maxiter):
            b1 = np.dot(A, b0)
            b1norm = np.linalg.norm(b1)
            error = np.linalg.norm(b1 - b0)
            b0 = b1 / b1norm
            domeig = (b0 @ A @ b0) / np.dot(b0, b0)
        if error < threshold:
            break
        elif it > maxiter:
            break
        else:
            error = 0
        it = it + 1
    return b0, domeig, it, error

result = power_iteration(A, 10, 0.1)
result
The output shows a pretty much correct eigenvalue of ~9 and the corresponding eigenvector (I checked with NumPy).
But the error is off. There is no way the norm of the difference vector is 8, considering the result is very close to the actual eigenvector.
How I want to calculate the error is as the norm of the difference between the current eigenvector and the previous one (b0). I start with error = 0 because the first iteration is guaranteed to give a big difference if b0 is chosen at random.
(array([ 0.06009408, 0.95411524, -0.2933476 ]),
9.001665234545708,
11,
8.001665234545815)
I tried to make the loop stop on two conditions, but one of them gets ignored.
Seems to work much better like this.
def power_it(matrix, iterations, threshold):
    domeigenvector = np.random.rand(matrix.shape[1])
    counter = np.random.rand(matrix.shape[1])  # holds the previous (normalized) vector
    it = 0
    error = 0
    for i in range(iterations):
        k1 = np.dot(matrix, domeigenvector)
        k1norm = np.linalg.norm(k1)
        domeigenvector = k1 / k1norm
        error = np.linalg.norm(domeigenvector - counter)
        counter = domeigenvector
        domeigenvalue = (domeigenvector @ matrix @ domeigenvector) / np.dot(domeigenvector, domeigenvector)
        it = it + 1
        if error < threshold:
            break
    return domeigenvalue, domeigenvector, it
I can now use Schur deflation to calculate the rest of the eigenpairs.
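As a quick sanity check of power_it above (my own test sketch, not part of the original answer), you can compare it against numpy.linalg.eig. I use a hypothetical symmetrized test matrix M here, since plain power iteration assumes a dominant real eigenvalue, which an arbitrary random matrix is not guaranteed to have:
import numpy as np

M = np.random.uniform(low=-5.0, high=10.0, size=[3, 3])
M = (M + M.T) / 2   # symmetrize so all eigenvalues are real

domeigenvalue, domeigenvector, it = power_it(M, 1000, 1e-10)

w, v = np.linalg.eig(M)
print(domeigenvalue)             # power-iteration estimate
print(w[np.argmax(np.abs(w))])   # NumPy's largest-magnitude eigenvalue; should agree closely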
I have the following data:
0.8340502011561366
0.8423491600218922
0.8513456021654467
0.8458192388553084
0.8440111276014195
0.8489589671423143
0.8738088120491972
0.8845129900705279
0.8988298998926688
0.924633964692693
0.9544790734065157
0.9908034431246875
1.0236430466543138
1.061619773027915
1.1050038249835414
1.1371449802490126
1.1921182610371368
1.2752207659022576
1.344047620255176
1.4198117350668353
1.507943067143741
1.622137968203745
1.6814098429502085
1.7646810054280595
1.8485457435775694
1.919591124757554
1.9843144220593145
2.030158014640226
2.018184122476175
2.0323466012624207
2.0179200409023874
2.0316932950853723
2.013683870089898
2.03010703506514
2.0216151623726977
2.038855467786505
2.0453923522466093
2.03759031642753
2.019424996752278
2.0441806106428606
2.0607521369415136
2.059310067318373
2.0661157975162485
2.053216429539864
2.0715123971225564
2.0580473413362075
2.055814512721712
2.0808278560688964
2.0601637029377113
2.0539429365156003
2.0609648613513754
2.0585135712612646
2.087674625814453
2.062482961966647
2.066476100210777
2.0568444178944967
2.0587903943282266
2.0506399365756396
The data plotted rises steeply and then levels off.
I want to find the point where the slope changes sign (I circled it in black in my plot; it should be around index 26).
I need to find this point of change for several hundred files. So far I have tried the recommendation from this post:
Finding the point of a slope change as a free parameter - Python
I think that since my data is a bit noisy, I am not getting a smooth transition in the change of the slope.
This is the code I have tried so far:
import sys
import numpy as np

# load 1-D data file
file = str(sys.argv[1])
y = np.loadtxt(file)

# create x based on file length
x = np.linspace(1, len(y), num=len(y))

# find first derivative
m = np.diff(y) / np.diff(x)
print(m)

# find second derivative
b = np.diff(m)
print(b)

# find index
index = 0
for difference in b:
    index += 1
    if difference < 0:
        print(index, difference)
Since my data is noisy, I am getting some negative values before the index I want. The index I want to retrieve in this case is around 26 (which is where my data becomes constant). Does anyone have any suggestions on what I can do to solve this issue? Thank you!
A gradient approach is useless in this case because you don't care about velocities or vector fields. Knowledge of the gradient doesn't add extra information for locating the maximum value, since the rise is always positive and hence will not affect the sign of the gradient. A method based entirely on the rise is suggested instead.
Detect the indices at which the data are decreasing and take the differences between consecutive such indices; the largest gap marks the long rise that ends at the maximum. Then, by index manipulation, you can find the index at which the data reach their maximum.
data = '0.8340502011561366 0.8423491600218922 0.8513456021654467 0.8458192388553084 0.8440111276014195 0.8489589671423143 0.8738088120491972 0.8845129900705279 0.8988298998926688 0.924633964692693 0.9544790734065157 0.9908034431246875 1.0236430466543138 1.061619773027915 1.1050038249835414 1.1371449802490126 1.1921182610371368 1.2752207659022576 1.344047620255176 1.4198117350668353 1.507943067143741 1.622137968203745 1.6814098429502085 1.7646810054280595 1.8485457435775694 1.919591124757554 1.9843144220593145 2.030158014640226 2.018184122476175 2.0323466012624207 2.0179200409023874 2.0316932950853723 2.013683870089898 2.03010703506514 2.0216151623726977 2.038855467786505 2.0453923522466093 2.03759031642753 2.019424996752278 2.0441806106428606 2.0607521369415136 2.059310067318373 2.0661157975162485 2.053216429539864 2.0715123971225564 2.0580473413362075 2.055814512721712 2.0808278560688964 2.0601637029377113 2.0539429365156003 2.0609648613513754 2.0585135712612646 2.087674625814453 2.062482961966647 2.066476100210777 2.0568444178944967 2.0587903943282266 2.0506399365756396'
data = data.split()
import numpy as np
a = np.array(data, dtype=float)
diff = np.diff(a)
neg_indeces = np.where(diff<0)[0]
neg_diff = np.diff(neg_indeces)
i_max_dif = np.where(neg_diff == neg_diff.max())[0][0] + 1
i_max = neg_indeces[i_max_dif] - 1 # because each diff arises from two consecutive values
print(i_max, a[i_max])
Output
26 1.9843144220593145
Some details
print(neg_indeces) # all indices where the data decrease (negative differences)
# [ 2 3 27 29 31 33 36 37 40 42 44 45 47 48 50 52 54 56]
print(neg_diff) # differences between consecutive such indices
# [ 1 24 2 2 2 3 1 3 2 2 1 2 1 2 2 2 2]
print(neg_diff.max()) # the largest such gap
# 24
print(i_max_dif) # position within neg_indeces just after the largest gap -> neg_indeces[2] = 27
# 2
print(i_max) # index of the max in the original data
# 26
When the first derivative changes sign, that's when the slope sign changes. I don't think you need the second derivative, unless you want to determine the rate of change of the slope. You also aren't getting the second derivative. You're just getting the difference of the first derivative.
Also, you seem to be assigning arbitrary x values. If your y-values represent points that are equally spaced apart, then it's fine; otherwise the derivative will be wrong.
Here's an example of how to get the first and second derivatives:
import numpy as np

x = np.linspace(1, 100, 1000)
y = np.cos(x)

# Find first derivative:
m = np.diff(y) / np.diff(x)

# Find second derivative
m2 = np.diff(m) / np.diff(x[:-1])

print(m)
print(m2)

# Get x-values where slope sign changes
c = len(m)
changes_index = []
for i in range(1, c):
    prev_val = m[i-1]
    val = m[i]
    if prev_val < 0 and val > 0:
        changes_index.append(i)
    elif prev_val > 0 and val < 0:
        changes_index.append(i)

for i in changes_index:
    print(x[i])
Notice I had to curtail the x values for the second derivative. That's because np.diff() returns one fewer point than the original input.
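If the raw sign check keeps tripping over the noise, one further option (a sketch of my own, not part of either answer) is to smooth the series with a short moving average before differencing. Here y is assumed to be the 1-D data loaded in the question's script, and the window length is a tunable assumption:
import numpy as np

window = 5
kernel = np.ones(window) / window
y_smooth = np.convolve(y, kernel, mode='valid')   # moving average of y

slope = np.diff(y_smooth)
change = np.argmax(slope < 0)   # first index where the smoothed slope turns negative
                                # (argmax returns 0 if it never does, so check that case)
print(change + window // 2)     # shift back by half the window; lands near the plateau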
I am trying to find the standard deviation of a sequence of numbers extracted from combinations of 30 dice that sum to 120. I am very new to Python, and this code makes the console freeze because the numbers are endless; I am not sure how to fit them all into a smaller, more efficient function. What I did is:
found all possible combinations of 30 dice;
filtered the combinations that sum to 120;
multiplied the items within each combination in the result list;
tried extracting the standard deviation.
Here is the code:
import itertools
import numpy

dice = [1, 2, 3, 4, 5, 6]
subset = itertools.product(dice, repeat=30)

result = []
for x in subset:
    if sum(x) == 120:
        result.append(x)

my_result = numpy.prod(result, axis=1).tolist()
std = numpy.std(my_result)
print(std)
Note that Var(X) = E(X^2) - E(X)^2, so you can solve this problem analytically with the following recurrences, where f[i][N] is the sum of the products over all sequences of i dice summing to N, g[i][N] is the sum of the squared products, and h[i][N] counts those sequences (k runs over the faces, 1 <= k <= 6):
f[i][N] = sum(k * f[i-1][N-k])
g[i][N] = sum(k^2 * g[i-1][N-k])
h[i][N] = sum(h[i-1][N-k])
f[1][k] = k
g[1][k] = k^2
h[1][k] = 1
Sample implementation:
import numpy as np

Nmax = 120
nmax = 30
min_value = 1
max_value = 6

# The intermediate results get really huge; to keep them exact we store Python big-ints.
f = np.zeros((nmax+1, Nmax+1), dtype='object')
g = np.zeros((nmax+1, Nmax+1), dtype='object')
h = np.zeros((nmax+1, Nmax+1), dtype='object')

for i in range(min_value, max_value+1):
    f[1][i] = i
    g[1][i] = i**2
    h[1][i] = 1

for i in range(2, nmax+1):
    for N in range(1, Nmax+1):
        f[i][N] = 0
        g[i][N] = 0
        h[i][N] = 0
        for k in range(min_value, max_value+1):
            f[i][N] += k * f[i-1][N-k]
            g[i][N] += (k**2) * g[i-1][N-k]
            h[i][N] += h[i-1][N-k]

result = np.sqrt(float(g[nmax][Nmax]) / h[nmax][Nmax] - (float(f[nmax][Nmax]) / h[nmax][Nmax]) ** 2)
# result = 32128174994365296.0
You ask for a result over an unfiltered set of 6^30 ≈ 2*10^23 combinations, which is impossible to handle as such.
There are two possibilities, which can be combined:
1. Put more thought into pre-treating the problem, e.g. into how to sample only those combinations with sum 120.
2. Do a Monte Carlo simulation instead, i.e. don't enumerate all combinations, but only a random couple of thousand, to obtain a representative sample that determines the std sufficiently accurately.
Now, I only apply (2), giving the brute force code:
import random

N = 30      # number of dice
M = 100000  # number of samples
S = 120     # required sum

result = [[random.randint(1, 6) for _ in range(N)] for _ in range(M)]
result = [s for s in result if sum(s) == S]
Now, that result should be comparable to your result before using numpy.product ... that part I couldn't follow, though...
OK, if you are after the standard deviation of the product of the 30 dice, that is what your code does. Then I need 1 000 000 samples to get roughly reproducible values for the std (1 digit) - that takes my PC about 20 seconds, still considerably less than 1 million years :-D.
Is a number like 3.22*10^16 what you are looking for?
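For completeness, here is a sketch of finishing that Monte Carlo estimate (take the product of each retained sample, then the standard deviation of those products), assuming the result list from the snippet above:
import numpy as np

# Cast to float before taking the product: 6**30 does not fit in int64.
products = [np.prod(np.array(s, dtype=float)) for s in result]
print(len(products))      # how many of the M samples actually summed to 120
print(np.std(products))   # rough estimate; fluctuates a lot unless M is very large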
Edit after comments:
Well, sampling the frequency of numbers instead gives only 6 independent variables - even 4 actually, by substituting in the constraints (sum = 120, total number = 30). My current code looks like this:
def p2(b, s):
    return 2**b * 3**s[0] * 4**s[1] * 5**s[2] * 6**s[3]

hits = range(31)
subset = itertools.product(hits, repeat=4)  # only the frequencies of faces 3, 4, 5, 6
product = []
permutations = []
for s in subset:
    b = 90 - (2*s[0] + 3*s[1] + 4*s[2] + 5*s[3])  # frequency of face 2
    a = 30 - (b + sum(s))                         # frequency of face 1
    if 0 <= b <= 30 and 0 <= a <= 30:
        product.append(p2(b, s))
        permutations.append(1)  # TODO: Replace 1 with the number of possible permutations
print(numpy.std(product))  # TODO: calculate std manually, considering permutations
This computes in about 1 second, but the confusing part is that I get as a result 1.28737023733e+17. Either my previous approaches or this one has a bug - or both.
Sorry - it's not that easy: the sampling is not equi-probable, and that is the problem here. Each frequency combination corresponds to a different number of possible orderings, which gives its weight, and this has to be taken into account before taking the std-deviation. I have drafted that in the code above.
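For what it's worth, here is a sketch of how those permutation weights could be filled in with multinomial coefficients (my own completion of the draft above, not the author's final code); it should agree with the analytic dynamic-programming result:
import itertools
import math

total_dice, target_sum = 30, 120
weighted_sum = 0    # sum over frequency combinations of weight * product
weighted_sq = 0     # sum over frequency combinations of weight * product**2
total_weight = 0    # total number of ordered dice sequences with sum 120

# s holds the frequencies of faces 3, 4, 5, 6; b and a (faces 2 and 1)
# are fixed by the two constraints, as in the draft above.
for s in itertools.product(range(total_dice + 1), repeat=4):
    b = (target_sum - total_dice) - (2*s[0] + 3*s[1] + 4*s[2] + 5*s[3])
    a = total_dice - (b + sum(s))
    if b < 0 or a < 0:
        continue
    counts = (a, b) + s
    weight = math.factorial(total_dice)   # multinomial coefficient 30!/(a! b! ...)
    for c in counts:
        weight //= math.factorial(c)
    product = 2**b * 3**s[0] * 4**s[1] * 5**s[2] * 6**s[3]
    weighted_sum += weight * product
    weighted_sq += weight * product**2
    total_weight += weight

mean = weighted_sum / total_weight
std = math.sqrt(weighted_sq / total_weight - mean**2)
print(std)   # should reproduce the analytic result of roughly 3.21e16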
I have code which generates either 0 or 9 randomly. This code is run 289 times...
import random

track = 0
if track < 35:
    val = random.choice([0, 9])
    if val == 9:
        track += 1
else:
    val = 0
According to this code, once 9 has been generated 35 times, only 0 is generated. So there is a heavy bias towards 9 at the start, and towards the end mostly 0 is output.
Is there a way to reduce this bias so that the 9's are spread out fairly evenly across the 289 runs?
Thanks for any help in advance
Apparently you want 9 to occur 35 times, and 0 to occur for the remainder - but you want the 9's to be evenly distributed. This is easy to do with a shuffle.
values = [9] * 35 + [0] * (289 - 35)
random.shuffle(values)
It sounds like you want to add some bias to the numbers that are generated by your script. Accordingly, you'll want to think about how you can use probability to assign a correct bias to the numbers being assigned.
For example, let's say you want to generate a list of 289 integers where there is a maximum of 35 nines. 35 is approximately 12% of 289, and as such, you would assign a probability of .12 to the number 9. From there, you could assign some other (relatively small) probability to the numbers 1 - 8, and some relatively large probability to the number 0.
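As a minimal illustration of that weighting idea (a standard-library sketch, not the alias-method recipe below), random.choices can draw with the desired probabilities; note it only gives about 35 nines on average, not exactly 35:
import random

# Weights are an assumption based on the 35-out-of-289 target.
values = random.choices([9, 0], weights=[35, 289 - 35], k=289)
print(values.count(9))   # roughly 35, but it varies from run to run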
Walker's Alias Method appears to be able to do what you need for this problem.
General Example (strings A B C or D with probabilities .1 .2 .3 .4):
abcd = dict( A=1, D=4, C=3, B=2 )
# keys can be any immutables: 2d points, colors, atoms ...
wrand = Walkerrandom( abcd.values(), abcd.keys() )
wrand.random() # each call -> "A" "B" "C" or "D"
# fast: 1 randint(), 1 uniform(), table lookup
Specific Example:
numbers = {1: 725, 2: 725, 3: 725, 4: 725, 5: 725, 6: 725, 7: 725, 8: 725, 9: 12, 0: 3}
wrand = Walkerrandom(numbers.values(), numbers.keys())

# Add looping logic + counting logic to keep track of 9's here
track = 0
i = 0
while i < 289:
    if track < 35:
        val = wrand.random()
        if val == 9:
            track += 1
    else:
        val = 0
    i += 1
The book Calculus and Pizza by Clifford Pickover has a few code examples here and there, all written in some dialect of BASIC.
I wrote a Python version of the code example covering integration. His BASIC example goes like:
10 REM Integration
20 DEF FNY(X) = X*X*X
30 A = 0
40 B = 1
50 N = 10
55 R = 0
60 H = (B-A)/N
70 FOR X = A TO B - H/2 STEP H
80 R = R + FNY(X)
90 NEXT X
100 R = R * H
110 PRINT "INTEGRATION ESTIMATE"; R
I changed a few things here and there, allowing the user to specify the interval over which to take the integral, specify the function to be integrated as a lambda, and so forth. I knew right off the bat that the for loop wouldn't work as I have written it below. I'm just wondering if there's some direct or idiomatic translation of the BASIC for to a Python for.
def simpleintegration():
    f = eval(input("specify the function as a lambda\n:%"))
    a = int(input("take the integral from x = a = ...\n:%"))
    b = int(input("to x = b = ...\n:%"))
    n = 10
    r = 0
    h = (b-a)/n
    for x in range(a, b - h/2, h):
        r = r + f(x)
    r = r * h
    print(r)
Your translation isn't far off. The only difference between the for loop in other languages and Python's "loop-over-a-range" pattern is that the "stop" value is usually inclusive in other languages, but is exclusive in Python.
Thus, in most other languages, a loop including a and b looks like
for i = a to b step c
' Do stuff
next i
In Python, it would be
for i in range(a, b + 1, c):
# Do stuff
The formula is computing the Riemann sums using the values at the left end of the subdivision intervals. Thus the last used value for X should be B-H.
Due to floating point errors, stepping from A by H can give a last value that is off by some small amount, thus B-H is not a good bound (in the BASIC code) and B-H/2 is used to stop before X reaches B.
The Python loop follows the same logic, but range() only accepts integer arguments, so with a float step you need np.arange (or an explicit while loop). With such a loop, the bound b - h/2 is unreachable, so the iteration stops at b - h or a value close to it.
Using a slight modification you can actually compute the trapezoidal approximation: initialize with R = f(A)/2, step X from A+H up to and including B-H, adding f(X) to R, and then finish by adding f(B)/2 (which could already be done in the initialization). As before, the approximation of the integral is then R*H.
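Here is a sketch of that trapezoidal variant (my own illustration, not from the book), using the same kind of f, a, b and n as in the question:
def trapezoidal(f, a, b, n=10):
    # Trapezoidal rule: endpoints weighted by 1/2, interior points by 1.
    h = (b - a) / n
    r = (f(a) + f(b)) / 2
    for i in range(1, n):
        r += f(a + i * h)
    return r * h

print(trapezoidal(lambda x: x**3, 0, 1))   # 0.2525, close to the exact value 0.25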
You can do it as below, just changing how x is iterated in the loop.
def simpleintegration():
    f = eval(input("specify the function as a lambda\n:%"))
    a = int(input("take the integral from x = a = ...\n:%"))
    b = int(input("to x = b = ...\n:%"))
    n = 10
    r = 0
    h = (b - a) / n
    x = a
    while x < b - h/2:   # same bound as the BASIC FOR ... STEP loop
        r = r + f(x)
        x = x + h
    r = r * h
    print(r)