My question is related to the Python Coding of a 1-Dimensional Ising Model using a Markov Chain Monte Carlo method (MCMC).
I have the following Hamiltonian
$$H = - \sum_{i=1}^{L-1}\sigma_{i}\sigma_{i+1} - B\sum_{i=1}^{L}\sigma_{i}$$
I want to write a python function that generates a Markov chain where at each step, it calculates and saves the magnetization (per site) and the energy.
The energy is the value of the Hamiltonian, and I will define the magnetization (per site) as:
$$\frac{1}{L}\sum_{i}\sigma_{i}$$
My probability distribution would be:
$$p(\sigma) \propto e^{-\beta H(\sigma)}$$ where $\beta = T^{-1}$.
For the Markov chain I will implement a Metropolis-Hastings algorithm, using the ratio
$$\frac{P(\sigma')}{P(\sigma)} = e^{(H(\sigma)-H(\sigma'))\beta}$$
My idea would be to accept transitions when
$$H(\sigma') < H(\sigma)$$
and to only accept transitions
$$H(\sigma') > H(\sigma)$$
with the probability
$$P = e^{(H(\sigma)-H(\sigma'))\beta}$$
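The acceptance rule described above can be sketched as a small helper. This is only a minimal illustration: the function name `metropolis_accept`, the seed, and the use of `numpy.random.default_rng` are assumptions made for the example and are not part of the draft further down.

```python
import numpy as np

def metropolis_accept(h_old, h_new, beta, rng):
    """Return True if the move from energy h_old to energy h_new is accepted."""
    delta = h_new - h_old
    if delta <= 0:
        # Moves that lower (or keep) the energy are always accepted.
        return True
    # Uphill moves are accepted with probability exp(-beta * delta).
    return rng.random() < np.exp(-beta * delta)

rng = np.random.default_rng(0)   # hypothetical seed, for reproducibility only
beta = 1 / 2                     # corresponds to T = 2
print(metropolis_accept(0.0, -4.0, beta, rng))  # downhill move
print(metropolis_accept(0.0, 4.0, beta, rng))   # uphill move, accepted only sometimes
```

Writing the rule in terms of the energy difference means the full Hamiltonian never has to be exponentiated, only the change caused by the proposed move.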
So let me set a few parameters such as:
$L=20$ - Lattice Size
$T=2$ - Temperature
$B=0$ - Magnetic Field
I will need to plot a histogram of the magnetization and energy vs step size after the calculations. I have no issue with this part.
My python knowledge isn't great but I have included my rough (uncompleted) draft. I don't think I am making much progress. Any help would be great.
#Coding attempt MCMC 1-Dimensional Ising Model
import numpy as np
import matplotlib.pyplot as plt
#Shape of Lattice L
L = 20
Shape = (20,20)
#Spin Configuration
spins = np.random.choice([-1,1],Shape)
#Magnetic moment
moment = 1
#External magnetic field
field = np.full(Shape, 0)
#Temperature
Temperature = 2
Beta = Temperature**(-1)
#Interaction (ferromagnetic if positive, antiferromagnetic if negative)
interaction = 1
#Using Probability Distribution given
def get_probability(Energy1, Energy2, Temperature):
    return np.exp((Energy1 - Energy2) / Temperature)
def get_energy(spins):
    return -np.sum(
        interaction * spins * np.roll(spins, 1, axis=0) +
        interaction * spins * np.roll(spins, -1, axis=0) +
        interaction * spins * np.roll(spins, 1, axis=1) +
        interaction * spins * np.roll(spins, -1, axis=1)
    )/2 - moment * np.sum(field * spins)
#Introducing Metropolis Hastings Algorithm
x_now = np.random.uniform(-1, 1) #initial value
d = 10**(-1) #delta
y = []
for i in range(L-1):
    #generating next value
    x_proposed = np.random.uniform(x_now - d, x_now + d)
    #accepting or rejecting the value
    if np.random.rand() < np.exp(-np.abs(x_proposed))/(np.exp(-np.abs(x_now))):
        x_now = x_proposed
    if i % 100 == 0:
        y.append(x_proposed)
Here I changed your code to solve the problem the way I usually do it.
Please check the code and the formulas very carefully.
#Coding attempt MCMC 1-Dimensional Ising Model
import numpy as np
import matplotlib.pyplot as plt
#Shape of Lattice L
L = 20
#Shape = (20)
#Number of Monte Carlo samples
MC_samples=1000
#Spin Configuration
spins = np.random.choice([-1,1],L)
print(spins)
#Magnetic moment
moment = 1
#External magnetic field
field = 0
#Temperature
Temperature = 2
Beta = Temperature**(-1)
#Interaction (ferromagnetic if positive, antiferromagnetic if negative)
interaction = 1
#Using Probability Distribution given
def get_probability(delta_energy, Temperature):
    return np.exp(-delta_energy / Temperature)
def get_energy(spins):
    energy=0
    for i in range(L):
        energy=energy+interaction*spins[i-1]*spins[i]
    energy= energy-field*sum(spins)
    return energy
def delta_energy(spins,random_spin):
    #If you do flip one random spin, the change in energy is:
    #(By using a reduced formula that only involves the spin
    # and its neighbours)
    if random_spin==L:
        PBC=0
    else:
        PBC=random_spin+1
    return -2*interaction*(spins[random_spin-1]*spins[random_spin]+
                           spins[random_spin]*spins[PBC]+field*spins[random_spin])
#Introducing Metropolis Hastings Algorithm
#x_now = np.random.uniform(-1, 1) #initial value
#d = 10**(-1) #delta
#y = []
magnetization=[]
energy=[]
for i in range(MC_samples):
    #Each Monte Carlo step consists of L random spin moves
    for j in range(L):
        #Choosing a random spin
        random_spin=np.random.randint(L-1,size=(1))
        #Computing the change in energy of this spin flip
        delta=delta_energy(spins,random_spin)
        #Metropolis accept-rejection:
        if delta<0:
            #Accept the move if it is negative
            spins[random_spin]=-spins[random_spin]
        else:
            #If it is positive, we compute the probability
            probability=get_probability(delta,Temperature)
            random=np.random.rand()
            if random<=probability:
                #Accept the move
                spins[random_spin]=-spins[random_spin]
    #After the MC step, we measure the system
    magnetization.append(sum(spins)/L)
    energy.append(get_energy(spins))
print(magnetization,energy)
#Do histograms and plots
At the end of the simulation, the variables magnetization and energy are arrays that contain the measured values at each MC step.
You can directly use these arrays to compute the histograms and plots.
Note: the energy array holds the total energy of the system, not the energy per site (energy/L).
I was looking for a simple implementation of a 1D Ising model and came across this post. While I am no expert in the field, I did write my master's thesis on a related topic. I implemented the code in Oriol Cabanas Tirapu's answer and found a few bugs (I think).
Below is my adapted version of their code. Hopefully it is useful for someone.
#Coding attempt MCMC 1-Dimensional Ising Model
import numpy as np
import matplotlib.pyplot as plt
#Using Probability Distribution given
def get_probability(delta_energy, Temperature):
    return np.exp(-delta_energy / Temperature)
def get_energy(spins):
    energy=0
    for i in range(len(spins)):
        #Coupling term enters with a minus sign, consistent with delta_energy below
        energy=energy-interaction*spins[i-1]*spins[i]
    energy= energy-field*sum(spins)
    return energy
def delta_energy(spins,random_spin):
    #If you do flip one random spin, the change in energy is:
    #(By using a reduced formula that only involves the spin
    # and its neighbours)
    if random_spin==L-1:
        PBC=0
    else:
        PBC=random_spin+1
    old = -interaction*(spins[random_spin-1]*spins[random_spin] + spins[random_spin]*spins[PBC]) - field*spins[random_spin]
    new = interaction*(spins[random_spin-1]*spins[random_spin] + spins[random_spin]*spins[PBC]) + field*spins[random_spin]
    return new-old
def metropolis(L = 100, MC_samples=1000, Temperature = 1, interaction = 1, field = 0):
    # initializing
    #Spin Configuration
    spins = np.random.choice([-1,1],L)
    Beta = Temperature**(-1)
    #Introducing Metropolis Hastings Algorithm
    data = []
    magnetization=[]
    energy=[]
    for i in range(MC_samples):
        #Each Monte Carlo step consists of L random spin moves
        for j in range(L):
            #Choosing a random spin
            random_spin=np.random.randint(0,L,size=(1))
            #Computing the change in energy of this spin flip
            delta=delta_energy(spins,random_spin)
            #Metropolis accept-rejection:
            if delta<0:
                #Accept the move if it is negative
                spins[random_spin]=-spins[random_spin]
                #print('change')
            else:
                #If it is positive, we compute the probability
                probability=get_probability(delta,Temperature)
                random=np.random.rand()
                if random<=probability:
                    #Accept the move
                    spins[random_spin]=-spins[random_spin]
        data.append(list(spins))
        #After the MC step, we measure the system
        magnetization.append(sum(spins)/L)
        energy.append(get_energy(spins))
    return data,magnetization,energy
def record_state_statistics(data,n=4):
    ixs = tuple()
    sub_sample = [[d[i] for i in range(n)] for d in data]
    # get state number
    state_nums = [int(sum([((j+1)/2)*2**i for j,i in zip(reversed(d),range(len(d)))])) for d in sub_sample]
    return state_nums
# setting up problem
L = 200 # size of system
MC_samples = 1000 # number of samples
Temperature = 1 # "temperature" parameter
interaction = 1 # Strength of interaction between nearest neighbours
field = 0 # external field
# running MCMC
data = metropolis(L = L, MC_samples = MC_samples, Temperature = Temperature, interaction = interaction, field = field)
results = record_state_statistics(data[0],n=4) # I was also interested in the probability of each micro-state in a sub-section of the system
# Plotting
plt.figure(figsize=(20,10))
plt.subplot(2,1,1)
plt.imshow(np.transpose(data[0]))
plt.xticks([])
plt.yticks([])
plt.axis('tight')
plt.ylabel('Space',fontdict={'size':20})
plt.title('Critical dynamics in a 1-D Ising model',fontdict={'size':20})
plt.subplot(2,1,2)
plt.plot(data[2],'r')
plt.xlim((0,MC_samples))
plt.xticks([])
plt.yticks([])
plt.ylabel('Energy',fontdict={'size':20})
plt.xlabel('Time',fontdict={'size':20})
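A quick way to sanity-check a local energy-change formula such as delta_energy is to compare it against a brute-force recomputation of the total energy before and after a flip. The sketch below is a minimal illustration, assuming periodic boundaries and the sign convention $H = -J\sum_i \sigma_i\sigma_{i+1} - B\sum_i \sigma_i$; the names `total_energy` and `flip_delta` and the seed are hypothetical and not part of the code above.

```python
import numpy as np

def total_energy(spins, interaction=1.0, field=0.0):
    """Energy of a periodic chain: -J * sum(s_i * s_{i+1}) - B * sum(s_i)."""
    return -interaction * np.sum(spins * np.roll(spins, -1)) - field * np.sum(spins)

def flip_delta(spins, i, interaction=1.0, field=0.0):
    """Energy change caused by flipping spin i (periodic boundaries)."""
    left = spins[i - 1]                    # index -1 wraps to the last spin
    right = spins[(i + 1) % len(spins)]    # wrap the right neighbour explicitly
    return 2 * spins[i] * (interaction * (left + right) + field)

rng = np.random.default_rng(0)             # hypothetical seed
spins = rng.choice([-1, 1], size=20)
for i in range(len(spins)):
    flipped = spins.copy()
    flipped[i] *= -1
    brute = total_energy(flipped, 1.0, 0.3) - total_energy(spins, 1.0, 0.3)
    # The local formula must agree with the brute-force difference for every site.
    assert np.isclose(brute, flip_delta(spins, i, 1.0, 0.3))
print("local formula matches brute force")
```

The local formula costs O(1) per proposed flip, whereas recomputing the total energy costs O(L), which matters once the chain gets long.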
I am working with a Markov Chain Monte Carlo algorithm (Metropolis-Hastings Algorithm) to find the best fit for experimental data using model data. I have a function called evaluation that takes in two arguments, theta and phi. I am using this function to calculate both experimental and model data for the trajectory of a particle. Note: I am creating my own experimental data using the function to see if my program works before I use actual experimental data.
Here is the code:
def evaluation(theta,phi): ### For creating model/experimental data
    velocity_x[0] = v0*np.sin(theta)*np.cos(phi) ### Initial values for velocities
    velocity_y[0] = v0*np.sin(theta)*np.sin(phi)
    velocity_z[0] = v0*np.cos(theta)
    for i in range(len(actual_y) - 1): ### Loop over experimental/model trajectory
        velocity = np.array([velocity_x[i],velocity_y[i],velocity_z[i]])
        cross_product = np.cross(velocity,Bz)
        ### Calculate subsequent velocities for model/experimental
        velocity_x[i+1] = velocity_x[i] #+ const*cross_product[0]*dt / gamma_2
        velocity_y[i+1] = velocity_y[i] #+ const*cross_product[1]*dt / gamma_2
        velocity_z[i+1] = velocity_z[i] #+ const*cross_product[2]*dt / gamma_2
        xmodel[i+1] = xmodel[i] + velocity_x[i]*dt #+ 0.5*const*cross_product[0]*dt / gamma_2
        ymodel[i+1] = ymodel[i] + velocity_y[i]*dt #+ 0.5*const*cross_product[1]*dt / gamma_2
        zmodel[i+1] = zmodel[i] + velocity_z[i]*dt #+ 0.5*const*cross_product[2]*dt / gamma_2
    return xmodel, ymodel, zmodel ### Returns x,y,z model data
def calculate_error(actualx, modelx, actualy, modely, actualz, modelz, sigma = 400):
    chi_squared = np.zeros(len(actual_x))
    for i in range(len(actual_x)):
        for j in range(len(actual_x)):
            chi_squared[i] = (actualx[i] - modelx[j])**2 + (actualy[i] - modely[j])**2 + (actualz[i] - modelz[j])**2
    return min(chi_squared)
thetas = [1.37] ### In radians; initial guess for thetas and phis
phis = [0.187]
chi = [] ### These lists store the values after MC calculations
num_sample = 1000 ### Number of samples
theta_step_size = 0.01
phi_step_size = 0.01
### x,y,and z model data with initial guess for thetas and phis
x_rand = evaluation(thetas,phis)[0]
y_rand = evaluation(thetas,phis)[1]
z_rand = evaluation(thetas,phis)[2]
error = calculate_error(x_exp_data,x_rand,y_exp_data,y_rand,z_exp_data,z_rand) ### Error
chi.append(error) ### error
for i in range(num_sample): ### Begin Monte Carlo loop
    theta0 = thetas[-1]
    phi0 = phis[-1]
    theta1 = theta0 + np.random.normal()*theta_step_size ### Take random step
    phi1 = phi0 + np.random.normal()*phi_step_size
    x_exp_data = evaluation(1.5705,0)[0] ### Experimental data should stay constant with defined arguments
    y_exp_data = evaluation(1.5705,0)[1]
    z_exp_data = evaluation(1.5705,0)[2]
    x_rand = evaluation(theta1,phi1)[0]
    y_rand = evaluation(theta1,phi1)[1]
    z_rand = evaluation(theta1,phi1)[2] ### Evaluate x,y,z exp data with random thetas and phis
    error_1 = calculate_error(x_exp_data,x_rand,y_exp_data,y_rand,z_exp_data,z_rand)
    #print('x:',x_rand[0:5], 'X-Exp:', x_exp_data[0:5])
    P = np.exp(-error_1 + error) ### Acceptance probability
    r = np.random.uniform() ### Generating uniform number (numbers are equally likely to be chosen)
    print('Exp X:', x_exp_data, 'X Rand:', x_rand)
    if r < P: ### Condition that accepts the theta and phi values in current iteration
        thetas.append(theta1)
        phis.append(phi1)
        chi.append(error_1)
        #print('Error 1:',error_1, 'Error:', error, 'Phi:',phi1, 'Theta:',theta1, 'i:',i)
        error = error_1
The problem I am having is that x_exp_data, y_exp_data, and z_exp_data don't seem to stay constant, even though their arguments stay constant inside the Monte Carlo loop. They seem to take on the values computed for theta1 and phi1 in each iteration, so the error is zero for all iterations. This should not be the case, since the experimental data uses theta, phi = 1.5705, 0 while the model data starts from 1.37 and 0.187 for theta and phi, respectively, and changes with each random step. I am not sure why x_exp_data, y_exp_data, and z_exp_data also pick up the new random-step values when their arguments are clearly defined. I have also tried defining the experimental data outside of the Monte Carlo loop, but this didn't change how the code behaves. Any help or suggestions would be appreciated.
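One possible explanation, assuming that xmodel, ymodel, and zmodel are module-level NumPy arrays that evaluation fills in place (their definitions are not shown in the post), is that every call returns the very same array objects. In that case the "experimental" and "model" variables would be two names for one buffer. The sketch below illustrates the effect with hypothetical names (`buffer`, `fill_shared`, `fill_copy`); it is not the original code.

```python
import numpy as np

buffer = np.zeros(3)          # stands in for a module-level array such as xmodel

def fill_shared(value):
    """Overwrite the shared buffer in place and return it (no copy is made)."""
    buffer[:] = value
    return buffer

def fill_copy(value):
    """Same computation, but return an independent copy of the result."""
    buffer[:] = value
    return buffer.copy()

a = fill_shared(1.0)
b = fill_shared(2.0)
print(a is b)                 # both names refer to the same array object
print(a)                      # the first result was overwritten by the second call

c = fill_copy(1.0)
d = fill_copy(2.0)
print(c, d)                   # independent arrays keep their own values
```

If this is indeed what happens, returning copies (or allocating fresh arrays inside the function) would keep the two data sets independent.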
I am trying to use SciPy's curve_fit function to fit a Gaussian function to my data in order to estimate a theoretical power spectral density. When I do so, curve_fit always returns the initial parameters (p0=[1,1,1]), which tells me that the fit didn't work.
I don't know where the issue is. I am using Python 3.9 (Spyder 5.1.5) from the Anaconda distribution on Windows 11.
Here is a WeTransfer link to the data file:
https://wetransfer.com/downloads/6097ebe81ee0c29ee95a497128c1c2e420220704110130/86bf2d
Here is my code below. Can someone tell me what the issue is and how I can solve it?
In the picture of the plot, the blue curve is my experimental PSD and the orange one is the result of the fit.
import numpy as np
import math
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import scipy.constants as cst
File = np.loadtxt('test5.dat')
X = File[:, 1]
Y = File[:, 2]
f_sample = 50000
time=[]
for i in range(1,len(X)+1):
    t=i*(1/f_sample)
    time= np.append(time,t)
N = X.shape[0] # number of observation
N1=int(N/2)
delta_t = time[2] - time[1]
T_mes = N * delta_t
freq = np.arange(1/T_mes, (N+1)/T_mes, 1/T_mes)
freq=freq[0:N1]
fNyq = f_sample/2 # Nyquist frequency
nb = 350
freq_block = []
# discrete fourier tansform
X_ft = delta_t*np.fft.fft(X, n=N)
X_ft=X_ft[0:N1]
plt.figure()
plt.plot(time, X)
plt.xlabel('t [s]')
plt.ylabel('x [micro m]')
# Experimental power spectrum on both raw and blocked data
PSD_X_exp = (np.abs(X_ft)**2/T_mes)
PSD_X_exp_b = []
STD_PSD_X_exp_b = []
for i in range(0, N1+2, nb):
    freq_b = np.array(freq[i:i+nb]) # i-nb:i
    psd_b = np.array(PSD_X_exp[i:i+nb])
    freq_block = np.append(freq_block, (1/nb)*np.sum(freq_b))
    PSD_X_exp_b = np.append(PSD_X_exp_b, (1/nb)*np.sum(psd_b))
    STD_PSD_X_exp_b = np.append(STD_PSD_X_exp_b, PSD_X_exp_b/np.sqrt(nb))
plt.figure()
plt.loglog(freq, PSD_X_exp)
plt.legend(['Raw Experimental PSD'])
plt.xlabel('f [Hz]')
plt.ylabel('PSD')
plt.figure()
plt.loglog(freq_block, PSD_X_exp_b)
plt.legend(['Experimental PSD after blocking'])
plt.xlabel('f [Hz]')
plt.ylabel('PSD')
kB = cst.k # Boltzmann constant [m^2kg/s^2K]
T = 273.15 + 25 # Temperature [K]
r = (2.8 / 2) * 1e-6 # Particle radius [m]
v = 0.00002414 * 10 ** (247.8 / (-140 + T)) # Water viscosity [Pa*s]
gamma = np.pi * 6 * r * v # [m*Pa*s]
Do = kB*T/gamma # expected value for D
f3db_o = 50000 # expected value for f3db
fc_o = 300 # expected value pour fc
n = np.arange(-10,11)
def theo_spectrum_lorentzian_filter(x, D_, fc_, f3db_):
    PSD_theo=[]
    for i in range(0,len(x)):
        # print(i)
        psd_theo=np.sum((((D_*Do)/2*math.pi**2)/((fc_*fc_o)**2+(x[i]+n*f_sample)
                        ** 2))*(1/(1+((x[i]+n*f_sample)/(f3db_*f3db_o))**2)))
        PSD_theo= np.append(PSD_theo,psd_theo)
    return PSD_theo
popt, pcov = curve_fit(theo_spectrum_lorentzian_filter, freq_block, PSD_X_exp_b, p0=[1, 1, 1], sigma=STD_PSD_X_exp_b, absolute_sigma=True, check_finite=True,bounds=(0.1, 10), method='trf', jac=None)
D_, fc_, f3db_ = popt
D1 = D_*Do
fc1 = fc_*fc_o
f3db1 = f3db_*f3db_o
print('Diffusion constant D = ', D1, ' Corner frequency fc= ',fc1, 'f3db(diode,eff)= ', f3db1)
I believe I've successfully fitted your data. Here's the approach I took.
First, I plotted your model (with popt=[1, 1, 1]) and the data you had. I noticed your data was significantly lower than the model. Then I started fiddling with the parameters. I wanted to push the model upwards. I did that by multiplying popt[0] by increasingly large values. I ended up with 1E13 as a ballpark value. Note that I have no idea if this is physically possible for your model. Then I jury-rigged your fitting function to multiply D_ by 1E13 and ran your code. I got this fit:
So I believe it's a problem of 1) inappropriate starting values and 2) inappropriate bounds. In your position, I would revise this model, check if there's any problems with units and so on.
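One way to make such a fit less sensitive to the absolute scale of the PSD is to fit the logarithm of the spectrum with parameters of order one. The sketch below is only an illustration on synthetic, noise-free data: the reference scales `D_REF` and `FC_REF`, the simplified two-parameter Lorentzian, and the "true" values are all assumptions and do not come from the data in the question.

```python
import numpy as np
from scipy.optimize import curve_fit

D_REF, FC_REF = 1e-13, 300.0           # hypothetical reference scales

def log_lorentzian(f, d_, fc_):
    """Log of a simple Lorentzian PSD with dimensionless, order-one parameters."""
    return np.log(d_ * D_REF) - np.log((fc_ * FC_REF) ** 2 + f ** 2)

f = np.logspace(0, 4, 200)             # frequencies from 1 Hz to 10 kHz
true_params = (3.0, 1.5)               # synthetic "truth", for illustration only
log_psd = log_lorentzian(f, *true_params)

# Residuals are now of order one, even though the PSD itself is tiny.
popt, pcov = curve_fit(log_lorentzian, f, log_psd, p0=[1.0, 1.0],
                       bounds=(0.01, 100.0))
print(popt)
```

Fitting in log space also weights all decades of the spectrum more evenly, which may or may not be what you want for real, noisy data.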
Here's what I used to try to fit your model:
plt.figure()
plt.loglog(freq_block[:170], PSD_X_exp_b[:170], label='Exp')
plt.loglog(freq_block[:170],
theo_spectrum_lorentzian_filter(
freq_block[:170],
1E13*popt[0], popt[1], popt[2]),
label='model'
)
plt.xlabel('f [Hz]')
plt.ylabel('PSD')
plt.legend()
I limited the data to point 170 because there were some weird backwards values that made me uncomfortable. I would recheck them if I were you.
Here's the model code I used. I didn't change the curve_fit call (except to limit x to :170).
def theo_spectrum_lorentzian_filter(x, D_, fc_, f3db_):
    PSD_theo=[]
    D_ = 1E13*D_ # I only changed here
    for i in range(0,len(x)):
        psd_theo=np.sum((((D_*Do)/2*math.pi**2)/((fc_*fc_o)**2+(x[i]+n*f_sample)
                        ** 2))*(1/(1+((x[i]+n*f_sample)/(f3db_*f3db_o))**2)))
        PSD_theo= np.append(PSD_theo,psd_theo)
    return PSD_theo
We have N users with P avg. points per user, where each point is a single value between 0 and 1. Because the points have some uncertainty, we need to distribute the mass of each point using a normal distribution with a known standard deviation of 0.05. Additionally, we need to wrap the mass around 0 and 1 such that, e.g., a point at 0.95 will also allocate mass around 0. I've provided a working example below, which bins the normal distribution into D=50 bins. The example uses the Python typing module, but you can ignore that if you'd like.
from typing import List, Any
import numpy as np
import scipy.stats
import matplotlib.pyplot as plt
D = 50
BINS: List[float] = np.linspace(0, 1, D + 1).tolist()
def probability_mass(distribution: Any, x0: float, x1: float) -> float:
    """
    Computes the area under the distribution, wrapping at 1.
    The wrapping is done by adding the PDF at +- 1.
    """
    assert x1 > x0
    return (
        (distribution.cdf(x1) - distribution.cdf(x0))
        + (distribution.cdf(x1 + 1) - distribution.cdf(x0 + 1))
        + (distribution.cdf(x1 - 1) - distribution.cdf(x0 - 1))
    )
def point_density(x: float) -> List[float]:
    distribution: Any = scipy.stats.norm(loc=x, scale=0.05)
    density: List[float] = []
    for i in range(D):
        density.append(probability_mass(distribution, BINS[i], BINS[i + 1]))
    return density
def user_density(points: List[float]) -> Any:
    # Find the density of each point
    density: Any = np.array([point_density(p) for p in points])
    # Combine points and normalize
    combined = density.sum(axis=0)
    return combined / combined.sum()
if __name__ == "__main__":
    # Example for one user
    data: List[float] = [.05, .3, .5, .5]
    density = user_density(data)
    # Example for multiple users (N = 2)
    print([user_density(x) for x in [[.3, .5], [.7, .7, .7, .9]]])
    ### NB: THE REMAINING CODE IS FOR ILLUSTRATION ONLY!
    ### NB: THE IMPORTANT THING IS TO COMPUTE THE DENSITY FAST!
    middle: List[float] = []
    for i in range(D):
        middle.append((BINS[i] + BINS[i + 1]) / 2)
    plt.bar(x=middle, height=density, width=1.0 / D + 0.001)
    plt.xlim(0, 1)
    plt.xlabel("x")
    plt.ylabel("Density")
    plt.show()
In this example N=1, D=50, P=4. However, we want to scale this approach to N=10000 and P=100 while being as fast as possible. It's unclear to me how we'd vectorize this approach. How can we best speed this up?
EDIT
The faster solution can have slightly different results. For instance, it could approximate the normal distribution instead of using the precise normal distribution.
EDIT2
We only care about computing density using the user_density() function. The plot is only to help explain the approach. We do not care about the plot itself :)
EDIT3
Note that P is the avg. points per user. Some users may have more and some may have less. If it helps, you can assume that we can throw away points such that all users have a max of 2 * P points. It's fine to ignore this part while benchmarking as long as the solution can handle a flexible # of points per user.
You can get below 50 ms for the largest case (N=10000, AVG[P]=100, D=50) by using the FFT and creating the data in a numpy-friendly format. Otherwise it will be closer to 300 ms.
The idea is to convolve a single normal distribution centered at 0 with a series of Dirac deltas.
See image below:
Using circular convolution solves two issues:
it naturally deals with wrapping at the edges
it can be computed efficiently with the FFT and the convolution theorem
First one must create a distribution to be copied. The function mk_bell() creates a histogram of a normal distribution with stddev 0.05 centered at 0.
The distribution wraps around 1. One could use an arbitrary distribution here. The spectrum of the distribution is computed once and used for fast convolution.
Next a comb-like function is created. The peaks are placed at indices corresponding to the peaks in the user density. E.g.
peak_location = [0.1, 0.3, 0.7]
D = 10
maps to
peak_index = (D * peak_location).astype(int) = [1, 3, 7]
dist = [0, 1, 0, 1, 0, 0, 0, 1, 0, 0] # ones at [1, 3, 7]
You can quickly create a composition of Dirac deltas by computing the bin index of each peak location and accumulating the indices with the np.bincount() function.
To speed things up even more, one can compute the comb functions for all users in parallel.
The array dist is a 2D array of shape NxD. It can be linearized to a 1D array of shape (N*D). After this change, the element at position [user_id, peak_index] is accessible at index user_id*D + peak_index.
With a numpy-friendly input format (described below) this operation is easily vectorized.
The convolution theorem says that the spectrum of the convolution of two signals is equal to the product of the spectra of the individual signals.
The spectrum is computed with numpy.fft.rfft, which is a variant of the Fast Fourier Transform dedicated to real-only signals (no imaginary part).
Numpy can compute the FFT of each row of a larger matrix with one command.
Next, the spectrum of the convolution is computed by simple multiplication and use of broadcasting.
Next, the spectrum is transformed back to the "time" domain by the inverse Fourier transform implemented in numpy.fft.irfft.
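The steps above can be illustrated on a single small comb. This is a minimal sketch: the three-bin kernel is a hypothetical stand-in for the wrapped bell curve, chosen only so the numbers are easy to follow, and the direct sum of shifted copies is there purely to check the FFT route.

```python
import numpy as np

D = 10

# Comb function: unit masses in bins 1, 3 and 7
comb = np.zeros(D)
comb[[1, 3, 7]] = 1.0

# Hypothetical kernel: a small symmetric bump centred on bin 0, wrapped around the edge
kernel = np.zeros(D)
kernel[[0, 1, -1]] = [0.5, 0.25, 0.25]

# Convolution theorem: multiply the spectra, then transform back
via_fft = np.fft.irfft(np.fft.rfft(comb) * np.fft.rfft(kernel), n=D)

# Direct circular convolution: a sum of shifted copies of the kernel
direct = sum(comb[k] * np.roll(kernel, k) for k in range(D))

print(np.allclose(via_fft, direct))
print(np.round(via_fft, 3))
```

Passing n=D to irfft makes the output length explicit, which matters when D is odd.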
To use the full speed of numpy one should avoid variable size data structure and keep to fixed size arrays. I propose to represent input data as three arrays.
uids, the identifier of the user, an integer in 0..N-1
peaks, the location of the peak
mass, the mass of the peak, currently 1/number-of-peaks-for-user
This representation of data allows quick vectorized processing.
Eg:
user_data = [[0.1, 0.3], [0.5]]
maps to:
uids = [0, 0, 1] # 2 points for user_data[0], one from user_data[1]
peaks = [0.1, 0.3, 0.5] # serialized user_data
mass = [0.5, 0.5, 1] # scaling factors for each peak, 0.5 means 2 peaks for user 0
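The mapping in this example can also be built with np.repeat and np.concatenate instead of one np.full call per user. This is only a sketch of an alternative, assuming every user has at least one point (an empty user would cause a division by zero in the mass computation).

```python
import numpy as np

user_data = [[0.1, 0.3], [0.5]]

# Number of points per user
counts = np.array([len(points) for points in user_data])

# Repeat each user id once per point
uids = np.repeat(np.arange(len(user_data)), counts)
# Serialize all points into one flat array
peaks = np.concatenate([np.asarray(points, dtype=float) for points in user_data])
# Each point of a user carries 1 / (number of points of that user)
mass = np.repeat(1.0 / counts, counts)

print(uids)
print(peaks)
print(mass)
```

The remaining Python-level loop runs once per user rather than once per point, so it should stay cheap even for large P.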
The code:
import numpy as np
import matplotlib.pyplot as plt
import time
def mk_bell(D, SIGMA):
    # computes normal distribution wrapped and centered at zero
    x = np.linspace(0, 1, D, endpoint=False)
    x = (x + 0.5) % 1 - 0.5
    bell = np.exp(-0.5*np.square(x / SIGMA))
    return bell / bell.sum()
def user_densities_by_fft(uids, peaks, mass, D, N=None):
    bell = mk_bell(D, 0.05).astype('f4')
    sbell = np.fft.rfft(bell)
    if N is None:
        N = uids.max() + 1
    # ensure that peaks are in the [0..1) interval
    peaks = peaks - np.floor(peaks)
    # convert peak location from 0-1 to the indices
    pidx = (D * (peaks + uids)).astype('i4')
    dist = np.bincount(pidx, mass, N * D).reshape(N, D)
    # process all users at once with Convolution Theorem
    sdist = np.fft.rfft(dist)
    sdist *= sbell
    res = np.fft.irfft(sdist)
    return res
def generate_data(N, Pmean):
    # generator for large data
    data = []
    for n in range(N):
        # select P uniformly from 1..2*Pmean
        P = np.random.randint(2 * Pmean) + 1
        # select peak locations
        chunk = np.random.uniform(size=P)
        data.append(chunk.tolist())
    return data
def make_data_numpy_friendly(data):
    uids = []
    chunks = []
    mass = []
    for uid, peaks in enumerate(data):
        uids.append(np.full(len(peaks), uid))
        mass.append(np.full(len(peaks), 1 / len(peaks)))
        chunks.append(peaks)
    return np.hstack(uids), np.hstack(chunks), np.hstack(mass)
D = 50
# demo for simple multi-distribution
data, N = [[0, .5], [.7, .7, .7, .9], [0.05, 0.3, 0.5, 0.5]], None
uids, peaks, mass = make_data_numpy_friendly(data)
dist = user_densities_by_fft(uids, peaks, mass, D, N)
plt.plot(dist.T)
plt.show()
# the actual measurement
N = 10000
P = 100
data = generate_data(N, P)
tic = time.time()
uids, peaks, mass = make_data_numpy_friendly(data)
toc = time.time()
print(f"make_data_numpy_friendly: {toc - tic}")
tic = time.time()
dist = user_densities_by_fft(uids, peaks, mass, D, N)
toc = time.time()
print(f"user_densities_by_fft: {toc - tic}")
The results on my 4-core Haswell machine are:
make_data_numpy_friendly: 0.2733159065246582
user_densities_by_fft: 0.04064297676086426
It took 40 ms to process the data. Notice that converting the data to the numpy-friendly format takes more than 6 times longer than the actual computation of the distributions.
Python is really slow when it comes to looping.
Therefore I strongly recommend generating the input data directly in the numpy-friendly format in the first place.
There are some issues to be fixed:
precision, which can be improved by using a larger D and downsampling
accuracy of the peak location, which could be further improved by widening the spikes
performance, since scipy.fft offers more variants of the FFT implementation that may be faster
This would be my vectorized approach:
data = np.array([0.05, 0.3, 0.5, 0.5])
np.random.seed(31415)
# random noise
randoms = np.random.normal(0,1,(len(data), int(1e5))) * 0.05
# samples with noise
samples = data[:,None] + randoms
# wrap [0,1]
samples = (samples % 1).ravel()
# histogram
hist, bins, patches = plt.hist(samples, bins=BINS, density=True)
Output:
I was able to reduce the time from about 4 seconds per sample of 100 datapoints to about 1 ms per sample.
It looks to me like you're spending quite a lot of time simulating a very large number of normal distributions. Since you're dealing with a very large sample size anyway, you may as well just use standard normal distribution values, because it'll all just average out anyway.
I recreated your approach (BaseMethod class), then created an optimized class (OptimizedMethod class), and evaluated them using a timeit decorator. The primary difference in my approach is the following line:
# Generate a standardized set of values to add to each sample to simulate normal distribution
self.norm_vals = np.array([norm.ppf(x / norm_val_n) * 0.05 for x in range(1, norm_val_n, 1)])
This creates a generic set of datapoints based on an inverse normal cumulative distribution function that we can add to each datapoint to simulate a normal distribution around that point. Then we just reshape the data into user samples and run np.histogram on the samples.
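The idea in this paragraph can be sketched for a single user as follows. The names `offsets`, `N_QUANTILES`, and `approx_user_density` are hypothetical, and the quantile offsets are built with one vectorized norm.ppf call rather than a list comprehension; treat it as an illustration of the approximation, not as the class shown below.

```python
import numpy as np
from scipy.stats import norm

D = 50
SIGMA = 0.05
N_QUANTILES = 50                       # hypothetical resolution of the approximation

# Evenly spaced quantiles of N(0, SIGMA^2); the endpoints 0 and 1 are excluded
# because the inverse CDF is infinite there.
offsets = norm.ppf(np.arange(1, N_QUANTILES) / N_QUANTILES) * SIGMA

def approx_user_density(points):
    """Approximate wrapped, binned density for one user's points."""
    # Every point is replaced by a fixed cloud of quantile-spaced samples, wrapped to [0, 1)
    samples = (np.asarray(points)[:, None] + offsets[None, :]) % 1.0
    hist, _ = np.histogram(samples, bins=np.linspace(0, 1, D + 1))
    return hist / hist.sum()

density = approx_user_density([0.05, 0.3, 0.5, 0.5])
print(density.shape, density.sum())
```

Because the same offsets are reused for every point, the result is deterministic, and its accuracy is controlled by the number of quantiles.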
import numpy as np
import scipy.stats
from scipy.stats import norm
import time
# timeit decorator for evaluating performance
def timeit(method):
    def timed(*args, **kw):
        ts = time.time()
        result = method(*args, **kw)
        te = time.time()
        print('%r %2.2f ms' % (method.__name__, (te - ts) * 1000 ))
        return result
    return timed
# Define Variables
N = 10000
D = 50
P = 100
# Generate sample data
np.random.seed(0)
data = np.random.rand(N, P)
# Run OP's method for comparison
class BaseMethod:
    def __init__(self, d=50):
        self.d = d
        self.bins = np.linspace(0, 1, d + 1).tolist()
    def probability_mass(self, distribution, x0, x1):
        """
        Computes the area under the distribution, wrapping at 1.
        The wrapping is done by adding the PDF at +- 1.
        """
        assert x1 > x0
        return (
            (distribution.cdf(x1) - distribution.cdf(x0))
            + (distribution.cdf(x1 + 1) - distribution.cdf(x0 + 1))
            + (distribution.cdf(x1 - 1) - distribution.cdf(x0 - 1))
        )
    def point_density(self, x):
        distribution = scipy.stats.norm(loc=x, scale=0.05)
        density = []
        for i in range(self.d):
            density.append(self.probability_mass(distribution, self.bins[i], self.bins[i + 1]))
        return density
    @timeit
    def base_user_density(self, data):
        n = data.shape[0]
        density = np.empty((n, self.d))
        for i in range(data.shape[0]):
            # Find the density of each point
            row_density = np.array([self.point_density(p) for p in data[i]])
            # Combine points and normalize
            combined = row_density.sum(axis=0)
            density[i, :] = combined / combined.sum()
        return density
base = BaseMethod(d=D)
# Only running base method on first 2 rows of data because it's slow
density = base.base_user_density(data[:2])
print(density[:2, :5])
class OptimizedMethod:
    def __init__(self, d=50, norm_val_n=50):
        self.d = d
        self.norm_val_n = norm_val_n
        self.bins = np.linspace(0, 1, d + 1).tolist()
        # Generate a standardized set of values to add to each sample to simulate normal distribution
        self.norm_vals = np.array([norm.ppf(x / norm_val_n) * 0.05 for x in range(1, norm_val_n, 1)])
    @timeit
    def optimized_user_density(self, data):
        samples = np.empty((data.shape[0], data.shape[1], self.norm_val_n - 1))
        # transform datapoints to normal distributions around datapoint
        for i in range(self.norm_vals.shape[0]):
            samples[:, :, i] = data + self.norm_vals[i]
        samples = samples.reshape(samples.shape[0], -1)
        #wrap around [0, 1]
        samples = samples % 1
        #loop over samples for density
        density = np.empty((data.shape[0], self.d))
        for i in range(samples.shape[0]):
            hist, bins = np.histogram(samples[i], bins=self.bins)
            density[i, :] = hist / hist.sum()
        return density
om = OptimizedMethod()
#Run optimized method on first 2 rows for apples to apples comparison
density = om.optimized_user_density(data[:2])
#Run optimized method on full data
density = om.optimized_user_density(data)
print(density[:2, :5])
Running on my system, the original method took about 8.4 seconds to run on 2 rows of data, while the optimized method took 1 millisecond to run on 2 rows of data and completed 10,000 rows in 4.7 seconds. I printed the first five values of the first 2 samples for each method.
'base_user_density' 8415.03 ms
[[0.02176227 0.02278653 0.02422535 0.02597123 0.02745976]
[0.0175103 0.01638513 0.01524853 0.01432158 0.01391156]]
'optimized_user_density' 1.09 ms
'optimized_user_density' 4755.49 ms
[[0.02142857 0.02244898 0.02530612 0.02612245 0.0277551 ]
[0.01673469 0.01653061 0.01510204 0.01428571 0.01326531]]
I am aware that SGD has been asked before on SO but I wanted to have an opinion on my code as below:
import numpy as np
import matplotlib.pyplot as plt
# Generating data
m,n = 10000,4
x = np.random.normal(loc=0,scale=1,size=(m,4))
theta_0 = 2
theta = np.append([],[1,0.5,0.25,0.125]).reshape(n,1)
y = np.matmul(x,theta) + theta_0*np.ones(m).reshape((m,1)) + np.random.normal(loc=0,scale=0.25,size=(m,1))
# input features
x0 = np.ones([m,1])
X = np.append(x0,x,axis=1)
# defining the cost function
def compute_cost(X,y,theta_GD):
    return np.sum(np.power(y-np.matmul(np.transpose(theta_GD),X),2))/2
# initializations
theta_GD = np.append([theta_0],[theta]).reshape(n+1,1)
alp = 1e-5
num_iterations = 10000
# Batch Sum
def batch(i,j,theta_GD):
    batch_sum = 0
    for k in range(i,i+9):
        batch_sum += float((y[k]-np.transpose(theta_GD).dot(X[k]))*X[k][j])
    return batch_sum
# Gradient Step
def gradient_step(theta_current, X, y, alp,i):
    for j in range(0,n):
        theta_current[j]-= alp*batch(i,j,theta_current)/10
    theta_updated = theta_current
    return theta_updated
# gradient descent
cost_vec = []
for i in range(num_iterations):
    cost_vec.append(compute_cost(X[i], y[i], theta_GD))
    theta_GD = gradient_step(theta_GD, X, y, alp,i)
plt.plot(cost_vec)
plt.xlabel('iterations')
plt.ylabel('cost')
I was trying a mini-batch GD with a batch size of 10. I am getting extremely oscillatory behavior for the MSE. Where's the issue? Thanks.
P.S. I was following NG's https://www.coursera.org/learn/machine-learning/lecture/9zJUs/mini-batch-gradient-descent
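For comparison, here is a minimal vectorized sketch of mini-batch gradient descent for linear regression with a batch size of 10. It is not the code from the lecture: the seed, the learning rate `alpha`, the number of epochs, and the choice to record the cost on the full data set once per epoch (rather than on a single sample per iteration) are all assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)                 # hypothetical seed
m, n, batch_size = 2000, 4, 10
true_theta = np.array([2.0, 1.0, 0.5, 0.25, 0.125])   # intercept first

# Design matrix with a leading column of ones for the intercept
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])
y = X @ true_theta + rng.normal(scale=0.25, size=m)

theta = np.zeros(n + 1)
alpha = 0.01                                   # hypothetical learning rate
epoch_cost = []
for epoch in range(5):
    order = rng.permutation(m)                 # reshuffle the samples every epoch
    for start in range(0, m, batch_size):
        idx = order[start:start + batch_size]
        residual = X[idx] @ theta - y[idx]     # prediction minus target
        grad = X[idx].T @ residual / len(idx)  # gradient of half the mean squared error
        theta -= alpha * grad                  # step against the gradient
    # Cost on the whole data set, recorded once per epoch
    epoch_cost.append(np.mean((X @ theta - y) ** 2) / 2)

print(theta)
print(epoch_cost)
```

A cost evaluated on one sample per iteration is expected to look noisy even when the parameters are converging, so a per-epoch cost on the full data is often easier to read.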
This is a description of the underlying mathematical principle, not a code-based solution.
The cost function is highly nonlinear (np.power()) and recursive, and recursive, nonlinear systems can oscillate (self-oscillation, https://en.wikipedia.org/wiki/Self-oscillation ). In mathematics this is the subject of chaos theory, i.e. the theory of nonlinear dynamical systems ( https://pdfs.semanticscholar.org/8e0d/ee3c433b1806bfa0d98286836096f8c2681d.pdf ); cf. the logistic map ( https://en.wikipedia.org/wiki/Logistic_map ). The logistic map oscillates if the growth factor r exceeds a threshold. The growth factor is a measure of how much energy is in the system.
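To make the threshold concrete, here is a minimal sketch (the helper names logistic_map and orbit_tail are made up for illustration). It iterates the map past its transient for one r below the first period-doubling at r = 3 and one r above it, then prints the values the orbit ends up visiting.

```python
def logistic_map(x, r):
    """One step of the logistic map x -> r * x * (1 - x)."""
    return r * x * (1 - x)

def orbit_tail(r, x0=0.5, burn_in=1000, keep=4):
    """Iterate past the transient and return the next `keep` values."""
    x = x0
    for _ in range(burn_in):
        x = logistic_map(x, r)
    tail = []
    for _ in range(keep):
        x = logistic_map(x, r)
        tail.append(x)
    return tail

settled = orbit_tail(2.5)    # below r = 3: the orbit settles on one value
two_cycle = orbit_tail(3.2)  # above r = 3: the orbit alternates between two values
print(settled)
print(two_cycle)
```

For r = 2.5 the tail should sit at the fixed point 1 - 1/r = 0.6, while for r = 3.2 consecutive values differ but every second value repeats.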
In your code, the critical parts are the cost function, the cost vector (that is, the history of the system), and the time steps:
def compute_cost(X,y,theta_GD):
    return np.sum(np.power(y-np.matmul(np.transpose(theta_GD),X),2))/2
cost_vec = []
for i in range(num_iterations):
    cost_vec.append(compute_cost(X[i], y[i], theta_GD))
    theta_GD = gradient_step(theta_GD, X, y, alp,i)
# Gradient Step
def gradient_step(theta_current, X, y, alp,i):
    for j in range(0,n):
        theta_current[j]-= alp*batch(i,j,theta_current)/10
    theta_updated = theta_current
    return theta_updated
If you compare this to an implementation of the logistic map, you can see the similarities:
from pylab import show, scatter, xlim, ylim
from random import randint
iter = 1000 # Number of iterations per point
seed = 0.5 # Seed value for x in (0, 1)
spacing = .0001 # Spacing between points on domain (r-axis)
res = 8 # Largest n-cycle visible
# Initialize r and x lists
rlist = []
xlist = []
def logisticmap(x, r):  # nonlinear function
    return x * r * (1 - x)
# Return nth iteration of logisticmap(x, r)
def iterate(n, x, r):
    for i in range(1,n):
        x = logisticmap(x, r)
    return x
# Generate list values -- iterate for each value of r
for r in [i * spacing for i in range(int(1/spacing),int(4/spacing))]:
    rlist.append(r)
    # similar to cost_vec, the history of the system
    # (integer division keeps the randint bounds integers on Python 3)
    xlist.append(iterate(randint(iter-res//2,iter+res//2), seed, r))
scatter(rlist, xlist, s = .01)
xlim(0.9, 4.1)
ylim(-0.1,1.1)
show()
Source of the code: https://www.reddit.com/r/learnpython/comments/zzh28/a_simple_python_implementation_of_the_logistic_map/
Based on this, you can try to modify your code by introducing a factor, similar to the growth factor in the logistic map, to reduce the intensity of the system's oscillation:
def gradient_step(theta_current, X, y, alp,i):
    for j in range(0,n):
        # introduce a factor somewhere here to keep the system under the oscillation threshold
        theta_current[j]-= alp*batch(i,j,theta_current)/10
    theta_updated = theta_current
    return theta_updated
or
def compute_cost(X,y,theta_GD):
    # introduce a factor somewhere here to keep the system under the oscillation threshold
    return np.sum(np.power(y-np.matmul(np.transpose(theta_GD),X),2))/2
If this does not work, consider the suggestions in https://www.reddit.com/r/MachineLearning/comments/3y9gkj/how_can_i_avoid_oscillations_in_gradient_descent/ (time steps, ...).
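As a rough illustration of such a threshold: for plain gradient descent on a quadratic cost, the factor that matters is the step size alp. The sketch below is a toy one-parameter least-squares problem, not the mini-batch code from the question, and the names gd_costs, xs, ys and h are made up for the example. With curvature h = mean(x**2), each step multiplies the error in w by (1 - alp*h), so the cost shrinks when 0 < alp < 2/h and grows when alp > 2/h.

```python
def gd_costs(xs, ys, alp, steps=50, w0=0.0):
    """Full-batch gradient descent on J(w) = mean((w*x - y)**2) / 2.

    Returns the cost after each update.
    """
    m = len(xs)
    w = w0
    costs = []
    for _ in range(steps):
        grad = sum((w * x - y) * x for x, y in zip(xs, ys)) / m
        w -= alp * grad
        costs.append(sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m))
    return costs

xs = [0.5, 1.0, 1.5, 2.0]
ys = [2.0 * x for x in xs]             # noiseless data with true slope 2
h = sum(x * x for x in xs) / len(xs)   # second derivative of J with respect to w

stable = gd_costs(xs, ys, alp=0.5 / h)    # below the threshold 2 / h
unstable = gd_costs(xs, ys, alp=2.5 / h)  # above the threshold 2 / h
print(stable[-1], unstable[-1])
```

For a model with several parameters the same reasoning applies with h replaced by the largest eigenvalue of the Hessian of the cost.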
I want to use a Hawkes process to model some data. I could not find out whether PyMC supports Hawkes processes. More specifically, I want an observed variable that follows a Hawkes process, and I want to learn a posterior over its parameters.
If it is not supported, could I define it in PyMC in some way, e.g. with @deterministic?
It's been quite a long time since your question, but I worked it out in PyMC today, so I thought I'd share the gist of my implementation for other people who might come across the same problem. We're going to infer the parameters λ and α of a Hawkes process. I'm not going to cover the temporal scale parameter β; I'll leave that as an exercise for the reader.
First, let's generate some data:
import numpy as np
def hawkes_intensity(mu, alpha, points, t):
    # baseline mu plus exponentially decaying contributions of all events up to time t
    p = np.array(points)
    p = p[p <= t]
    p = np.exp(p - t)
    return mu + alpha * np.sum(p)
def simulate_hawkes(mu, alpha, window):
    t = 0
    points = []
    lambdas = []
    while t < window:
        m = hawkes_intensity(mu, alpha, points, t)
        s = np.random.exponential(scale=1/m)
        ratio = hawkes_intensity(mu, alpha, points, t + s)
        t = t + s
        if t < window:
            points.append(t)
            lambdas.append(ratio)
        else:
            break
    points = np.sort(np.array(points, dtype=np.float32))
    lambdas = np.array(lambdas, dtype=np.float32)
    return points, lambdas
# parameters
window = 1000
mu = 8
alpha = 0.25
points, lambdas = simulate_hawkes(mu, alpha, window)
num_points = len(points)
We just generated some temporal points using functions that I adapted from here: https://nbviewer.jupyter.org/github/MatthewDaws/PointProcesses/blob/master/Temporal%20points%20processes.ipynb
Now, the trick is to create a matrix of size (num_points, num_points) that contains the temporal distance from the ith point to all the other points, so that entry (i, j) is the time interval separating the ith point from the jth. This matrix is used to compute the sum of exponentials of the Hawkes process, i.e. the self-exciting part. Creating this matrix and the sum of exponentials is a bit tricky, so I'd recommend checking every line yourself to see what it does.
tile = np.tile(points, num_points).reshape(num_points, num_points)
tile = np.clip(points[:, None] - tile, 0, np.inf)
tile = np.tril(np.exp(-tile), k=-1)
Σ = np.sum(tile, axis=1)[:-1] # this is our self-exciting sum term
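One way to check these lines is to compare them with a direct double loop on a handful of points. This is only a sketch: the event times are made up, and sigma_vectorized and sigma_loop are illustrative names. As written, the vectorized code sums exp(-(t_i - t_j)) over strictly earlier points j < i and then drops the last point.

```python
import numpy as np

points = np.array([0.0, 1.0, 3.0, 3.5])  # made-up, sorted event times
num_points = len(points)

# Same three steps as above.
tile = np.tile(points, num_points).reshape(num_points, num_points)
tile = np.clip(points[:, None] - tile, 0, np.inf)  # entry (i, j) = max(t_i - t_j, 0)
tile = np.tril(np.exp(-tile), k=-1)                # keep only j < i
sigma_vectorized = np.sum(tile, axis=1)[:-1]

# Direct version: for each point i, add the decayed contribution of each earlier point j.
sigma_loop = np.zeros(num_points)
for i in range(num_points):
    for j in range(i):
        sigma_loop[i] += np.exp(-(points[i] - points[j]))
sigma_loop = sigma_loop[:-1]  # the last point does not start an observed interval

print(sigma_vectorized)
print(sigma_loop)
```

The first entry is zero because the first point has no earlier events to excite it.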
We now have the points and an array Σ containing the self-excitation sum for each point.
The duration between two consecutive events of a Hawkes process follows an exponential distribution with parameter λ = λ0 + ∑ excitation. This is what we are going to model, but first we have to compute the durations between consecutive points of our generated data.
interval = points[1:] - points[:-1]
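Under this modelling assumption, each interval contributes log(rate) - rate * interval to the log-likelihood, where rate is the baseline plus α times the excitation sum. A minimal sketch in plain Python follows; the function name exp_interarrival_loglik and the three example values are invented for illustration and are not part of the implementation above.

```python
import math

def exp_interarrival_loglik(intervals, base_rate, alpha, excitation):
    """Log-likelihood when interval i is Exponential with rate
    base_rate + alpha * excitation[i]."""
    total = 0.0
    for dt, s in zip(intervals, excitation):
        rate = base_rate + alpha * s
        total += math.log(rate) - rate * dt  # log of rate * exp(-rate * dt)
    return total

# Made-up intervals and excitation sums, just to show the call.
example_intervals = [0.10, 0.05, 0.20]
example_excitation = [0.0, 0.9, 1.5]
print(exp_interarrival_loglik(example_intervals, 8.0, 0.25, example_excitation))
```

Evaluating this function on a grid of base_rate and alpha values can serve as a quick sanity check on where the posterior should concentrate.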
We're now ready for inference:
import matplotlib.pyplot as plt
import pymc3 as pm  # assumption: the calls below follow the PyMC3 API
with pm.Model() as model:
    λ = pm.Exponential("λ", 1)  # baseline intensity
    α = pm.Uniform("α", 0, 1)  # excitation weight
    lam = pm.Deterministic("lam", λ + α * Σ)
    interarrival = pm.Exponential(
        "interarrival", lam, observed=interval)
    trace = pm.sample(2000, tune=4000)
pm.plot_posterior(trace, var_names=["λ", "α"])
plt.show()
print(np.mean(trace["λ"]))
print(np.mean(trace["α"]))
7.829
0.284
Note: the tile matrix can become quite large if you have many data points.
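If memory becomes a problem, the same sums can be built recursively for this exponential kernel, since S[i] = exp(-(t_i - t_{i-1})) * (1 + S[i-1]) for sorted points. The following is a sketch of that alternative, not the approach used above; excitation_sums is a hypothetical helper and the test data are random.

```python
import numpy as np

def excitation_sums(points):
    """S[i] = sum over j < i of exp(-(points[i] - points[j])), in O(n) memory.

    Assumes `points` is sorted in increasing order.
    """
    points = np.asarray(points, dtype=float)
    sums = np.zeros(len(points))
    for i in range(1, len(points)):
        decay = np.exp(-(points[i] - points[i - 1]))
        sums[i] = decay * (1.0 + sums[i - 1])
    return sums

rng = np.random.default_rng(0)
points = np.sort(rng.uniform(0, 50, size=200))

# Reference values from the full (n, n) matrix, built as in the tile approach.
diff = np.clip(points[:, None] - points[None, :], 0, np.inf)
full = np.sum(np.tril(np.exp(-diff), k=-1), axis=1)

print(np.allclose(excitation_sums(points), full))
```

Dropping the last element, excitation_sums(points)[:-1], gives an array of the same shape as Σ.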