Optimization of time average msd - python

I am trying to simulate nT = 100 tracks, each of N = 10*10**6 steps with dt = 0.02, and then compute the time-averaged MSD defined as follows:
def calc_msd_1D(x, nLags):
    N = len(x)
    inv_sq_np = 1./np.sqrt(N)
    msd = np.zeros(nLags)
    for delta in range(0, nLags):
        r = 0;
        #msd_array = np.zeros(N)
        for i in range(N-(delta)):
            r += (x[i+delta] - x[i])**2
        msd[delta] = 1/(N-delta) * r
    # msd[0] -= 2*np.random.normal(0,1)**2
    #msd[1:] += 2*np.random.normal(0,1)**2
    return msd
The MSD is then computed for each trajectory using a class:
class Trajectory_Analysis_MSD:
    def __init__(self,X, Y, nP, dT):
        # save parameters
        self.dT = dT
        self.X = X
        self.Y = Y
        self.nP = nP
    def getMSD(self,nLags):
        # initialize memory
        self.MSD_x = np.zeros(nLags)
        self.MSD_y = np.zeros(nLags)
        # calculate the correlations for components
        self.MSD_x = calc_msd_1D(self.X, nLags)
        self.MSD_y= calc_msd_1D(self.Y, nLags)
        # calculate the msd
        self.msd= (self.MSD_x + self.MSD_y)
Unfortunately the computation is very slow. At the moment I have to subsample each trajectory down to 50000 points to compute and store the MSD (average time about 26 min). Is there a way to compute the time-averaged MSD over the entire track of each trajectory, ideally without storing every data point?

This for loop:
r = 0
for i in range(N-delta):
    r += (x[i+delta] - x[i])**2
Will be very slow for large N because it iterates in pure Python. I'm guessing this (or similar code you have elsewhere) is your bottleneck.
Try to vectorize your code, so all the inner loops run inside numpy, not Python:
r = np.sum((x[delta:N] - x[0:N-delta])**2)
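You don't even need the N variable. Note that x[:-delta] is an empty slice for delta = 0, so use len(x)-delta as the end index (or handle lag 0 separately, where the MSD is 0 by definition):
msd[delta] = np.mean((x[delta:] - x[:len(x)-delta])**2)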
And maybe you can use a ready-made function like np.correlate in this case.
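As one possible way to follow up on the np.correlate hint, here is a minimal sketch that computes the same time-averaged MSD with an FFT-based autocorrelation, so the cost grows roughly as N log N rather than N times nLags. The name msd_fft_1d and its arguments are illustrative and not part of the original code. It relies on expanding (x[i+delta] - x[i])**2 into two squared terms and a cross term, and it assumes n_lags <= len(x).

```python
import numpy as np

def msd_fft_1d(x, n_lags):
    """Time-averaged MSD for lags 0..n_lags-1 (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    lags = np.arange(n_lags)

    # Cross term sum_i x[i] * x[i + lag], via a zero-padded FFT
    # (padding to 2n avoids circular wrap-around).
    f = np.fft.rfft(x, n=2 * n)
    cross = np.fft.irfft(f * np.conj(f), n=2 * n)[:n_lags]

    # Squared terms sum_i (x[i]**2 + x[i + lag]**2), via cumulative sums.
    csum = np.concatenate(([0.0], np.cumsum(x ** 2)))
    squares = csum[n - lags] + csum[n] - csum[lags]

    # Each lag averages over n - lag pairs, as in calc_msd_1D.
    return (squares - 2.0 * cross) / (n - lags)

# Small self-check against the direct vectorized formula.
rng = np.random.default_rng(0)
track = np.cumsum(rng.normal(size=500))
direct = np.array([np.mean((track[d:] - track[:len(track) - d]) ** 2)
                   for d in range(50)])
print(np.allclose(msd_fft_1d(track, 50), direct))
```

The x and y components can be summed afterwards exactly as getMSD does. For very long tracks the cross term involves large cancelling sums, so it may be worth comparing a few lags against the direct formula before trusting the result.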

Related

Move from one dimension to three-dimension

I have a simple simulation setup which generates one-dimensional numpy arrays using np.random.normal.
class Brownian_motion_Langevin:
    def solve(self):
        dB = self.sigma * np.random.normal(size=len(self.steps))
        r2 = self.initial_y + np.cumsum(dB)
        # Append solutions
        self.values = r2
Now I need to change the solve function to return a three-dimensional array. The easiest way I know is to rerun the code three times to get three one-dimensional arrays, which is not very good! Can anyone suggest an efficient/smart way to extend the function to three dimensions?
Currently, the output shape is (5, 10001): the first element is the number of simulation runs and the second is the number of steps. What I expect is (5, 10001, 3), where the third element is the number of dimensions. Here is the complete reproducible code. Thanks!
#!/usr/bin/env python
#
# Python imports
import numpy as np
import h5py
class Brownian_motion_Langevin:
    def solve(self):
        dB = self.sigma * np.random.normal(size=len(self.steps))
        r2 = self.initial_y + np.cumsum(dB)
        # Append solutions
        self.values = r2
    def __init__(self, diffusion_coefficient, initial_y, simulation_time, delta_t):
        """
        :param diffusion_coefficient:
        :param initial_y: 1
        :param delta_t: dt - change in time - size of each interval
        :param simulation_time: total time for simulation
        """
        # Initial parameters
        self.diffusion_coefficient = diffusion_coefficient
        self.initial_y = initial_y
        # Define time
        self.simulation_time = simulation_time
        # Get dt
        self.delta_t = delta_t
        self.steps = np.arange(0, np.floor(self.simulation_time / self.delta_t) + 1)
        self.times = self.steps * self.delta_t
        # Speed up calculations
        self.sigma = (2*self.diffusion_coefficient * self.delta_t)**0.5
        # Simulate the diffusion process
        self.solution = []
        self.solve()
# Define parameters for the process
n = 5 # Number of simulations
Dc = 1 # Dc - Diffusion coefficient
y0 = 0 # y0 - starting point
tt = 1e2 # tt - total time for each simulation
dt = 0.01 # dt - integration time step
# Run simulations
motions = []
for i in range(0, n):
    motions.append(Brownian_motion_Langevin(diffusion_coefficient=Dc,
                                            initial_y=y0,
                                            simulation_time=tt,
                                            delta_t=dt))
values = np.array([m.values for m in motions])
print(values.shape) # this outputs the (5, 10001)
####
# FIXME make values 3-dimensional
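One way the FIXME might be addressed is to draw all increments at once with an extra trailing axis and take the cumulative sum along the step axis only. The sketch below assumes the three components are independent and share the same sigma; the function simulate_brownian and its parameters are hypothetical names, not part of the code above.

```python
import numpy as np

def simulate_brownian(n_runs, n_steps, n_dims, sigma, initial=0.0, seed=None):
    """Return an array of shape (n_runs, n_steps, n_dims) (illustrative helper)."""
    rng = np.random.default_rng(seed)
    # One Gaussian increment per run, per step, per dimension.
    increments = sigma * rng.normal(size=(n_runs, n_steps, n_dims))
    # Accumulate along the step axis only; runs and dimensions stay independent.
    return initial + np.cumsum(increments, axis=1)

# Same parameters as the script above: Dc = 1, dt = 0.01, tt = 1e2.
Dc, dt, tt = 1.0, 0.01, 1e2
n_steps = int(np.floor(tt / dt)) + 1
values = simulate_brownian(5, n_steps, 3, sigma=(2 * Dc * dt) ** 0.5, seed=0)
print(values.shape)
```

Inside the class, the same idea would amount to passing size=(len(self.steps), 3) to the normal draw and using np.cumsum(dB, axis=0).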

Is it possible to loop to a certain value and carry on further calculations with this value?

I am new here and new to programming, so excuse me if the question is not formulated clearly enough.
For a uni assignment, my lab partner and I are programming a predator-prey system.
In this predator-prey system, there is a certain load factor 'W0'.
We want to find a load factor W0, accurate to 5 significant digits, for which there will never be fewer than 250 predators (wnum[1] in our code). We then need the code to carry on further calculations with the value of W0 it finds. Here is what we've tried so far, but Python does not seem to give any response:
# Import important stuff and settings
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
print ('Results of Group 4')
def W0():
    W0 = 2.0
    while any(wnum[1])<250:
        W0 = W0-0.0001
    return W0
def W(t):
    if 0 <= t < 3/12:
        Wt = 0
    elif 3/12 <= t <= 8/12:
        Wt = W0
    elif 8/12 < t < 1:
        Wt = 0
    else:
        Wt = W(t - 1)
    return Wt
# Define the right-hand-side function
def rhsf(t,y):
    y1 = y[0]
    y2 = y[1]
    f1 = (2-2*10**-3*y2)*y1-W(t)*y1
    f2 = (-3.92+7*10**-3*y1)*y2
    return np.array([f1,f2])
# Define one step of the RK4 method
def RK4Step(tn,wn,Dt,f):
    # tn = current time
    # wn = known approximation at time tn
    # Dt = the time step to use
    # f = the right-hand-side function to use
    # wnplus1 = the new approximation at time tn+Dt
    k1 = Dt*f(tn,wn)
    k2 = Dt*f(tn+0.5*Dt,wn+0.5*k1)
    k3 = Dt*f(tn+0.5*Dt,wn+0.5*k2)
    k4 = Dt*f(tn+Dt,wn+k3)
    wnplus1 = wn + 1/6*(k1 +2*k2 +2*k3 +k4)
    return wnplus1
# Define the complete RK4 method
def RK4Method(t0,tend,Dt,f,y0):
    # t0 = initial time of simulation
    # tend = final time of simulation
    # Dt = the time step to use
    # f = the right-hand-side function to use
    # y0 = the initial values
    # calculate the number of time steps to take
    N = int(np.round((tend-t0)/Dt))
    # make the list of times t which we want the solution
    time = np.linspace(t0,tend,num=N+1)
    # make sure Dt matches with the number of time steps
    Dt = (tend-t0)/N
    # Allocate memory for the approximations
    # row i represents all values of variable i at all times
    # column j represents all values of all variables at time t_j
    w = np.zeros((y0.size,N+1))
    # store the (given) initial value
    w[:,0] = y0
    # Perform all time steps
    for n,tn in enumerate(time[:-1]):
        w[:,n+1] = RK4Step(tn,w[:,n],Dt,f)
    return time, w
# Set all known values and settings
t0 = 0.0
tend = 10.0
y0 = np.array([600.0,1000.0])
Dt = 0.5/(2**7)
# Execute the method
tnum, wnum = RK4Method(t0,tend,Dt,rhsf,y0)
# Make a nice table
alldata = np.concatenate(([tnum],wnum),axis=0).transpose()
table = pd.DataFrame(alldata,columns=['t','y1(t)','y2(t)'])
print('\nA nice table of the simulation:\n')
print(table)
# Make a nice picture
plt.close('all')
plt.figure()
plt.plot(tnum,wnum[0,:],label='$y_1$',marker='o',linestyle='-')
plt.plot(tnum,wnum[1,:],label='$y_2$',marker='o',linestyle='-')
plt.xlabel('$t$')
plt.ylabel('$y(t)$')
plt.title('Simulation')
plt.legend()
# Do an error computation
# Execute the method again with a doubled time step
tnum2, wnum2 = RK4Method(t0,tend,2.0*Dt,rhsf,y0)
# Calculate the global truncation errors at the last simulated time
errors = (wnum[:,-1] - wnum2[:,-1])/(2**4-1)
print('\nThe errors are ',errors[0],' for y1 and ',errors[1],' for y2 at time t=',tnum[-1])
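A note on why the search in W0() cannot finish if it is called: any(wnum[1]) collapses the whole predator array to a single boolean, a boolean is always smaller than 250, and nothing inside the loop re-runs the simulation, so the condition never changes. One possible restructuring is to wrap "simulate with a given W0 and return the smallest predator count" in a function and search on that. The sketch below uses bisection and assumes, without having verified it for this model, that the minimum predator count decreases as W0 grows. The names bisect_threshold and toy_min_predators are hypothetical; the toy function only stands in for a real simulation run.

```python
def bisect_threshold(min_value_for, lo, hi, target, tol=1e-5):
    """Largest w in [lo, hi] with min_value_for(w) >= target (to within tol).

    Assumes min_value_for(lo) >= target > min_value_for(hi) and that
    min_value_for is monotonically decreasing on the interval.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if min_value_for(mid) >= target:
            lo = mid   # mid still keeps the minimum above the target
        else:
            hi = mid   # mid drops below the target
    return lo

# Stand-in for "run RK4Method with this W0 and return wnum[1].min()".
def toy_min_predators(w0):
    return 400.0 - 100.0 * w0

W0_found = bisect_threshold(toy_min_predators, 0.0, 2.0, target=250.0)
print(round(W0_found, 4))
```

In the real script, the stand-in would be replaced by a function that takes W0 as an argument (rather than reading a global), runs the RK4 simulation, and returns the minimum of the predator row; the value returned by the search can then be used for the rest of the calculations.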

TSP, algorithm gets stuck in local minimum

I am struggling to implement a program based on simulated annealing to solve the traveling salesman problem. None of the solutions I get are satisfactory and I have no clue how to improve my implementation. Obviously I'm not focusing on benchmarks, only on finding a visually acceptable shortest path. If anyone could enlighten me I would be thankful.
# weight function, simple euclidean norm
def road(X,Y):
    sum = 0
    size = len(X) -1
    for i in range(0,size):
        sum +=math.sqrt((X[i]-X[i+1])**2 + (Y[i]-Y[i+1])**2)
    return sum
def array_swap(X,Y,index_1,index_2):
    X[index_1],X[index_2] = X[index_2],X[index_1]
    Y[index_1],Y[index_2] = Y[index_2],Y[index_1]
def arbitrarty_swap(X,Y):
    ran = len(X)-1
    pick_1 = random.randint(0,ran)
    pick_2 = random.randint(0,ran)
    X[pick_1],X[pick_2] = X[pick_2],X[pick_1]
    Y[pick_1],Y[pick_2] = Y[pick_2],Y[pick_1]
    return pick_1, pick_2
N = 40
X = np.random.rand(N) * 100
Y = np.random.rand(N) * 100
plt.plot(X, Y, '-o')
plt.show()
best = road(X,Y)
X1 = X.copy()
Y1 = Y.copy()
#history of systems energy
best_hist = []
iterations = 100000
T = 1.02
B = 0.999
for i in range(0,iterations):
    index_1, index_2 = arbitrarty_swap(X,Y)
    curr = road(X,Y)
    diff = (curr - best)
    if diff < 0 :
        best = curr
        best_hist.append(best)
        array_swap(X1,Y1,index_1,index_2)
    elif math.exp(-(diff)/T) > random.uniform(0,1):
        best_hist.append(curr)
        T *=B
    else:
        array_swap(X,Y,index_1,index_2)
https://i.stack.imgur.com/A6hmd.png
I didn't run your code, but one thing I'd try is changing the SA implementation.
Currently, you have 100,000 iterations in one loop. I would break that into two. The outer loop controls the temperature and the inner loop is different runs in that temperature. Something like this (pseudo code):
t=0; iterations=1000; repeat=1000
while t <= repeat:
    n = 0
    while n <=iterations:
        # your SA implementation.
        n += 1 # increase your iteration count in each temperature
    # in outer while,
    t += 1
    T *= B
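The two-loop structure in the pseudo code can be sketched as runnable Python. This is only an illustration under stated assumptions: the function names, the starting temperature, the cooling factor, and the loop counts are arbitrary choices, not tuned values. Unlike the code in the question, the sketch compares each candidate with the currently accepted tour (keeping the best tour separately) and cools once per outer iteration.

```python
import math
import random

def tour_length(points, order):
    """Length of the open path visiting points in the given order."""
    return sum(math.dist(points[order[i]], points[order[i + 1]])
               for i in range(len(order) - 1))

def anneal(points, t_start=10.0, cooling=0.99, n_temps=300,
           moves_per_temp=200, seed=0):
    rng = random.Random(seed)
    order = list(range(len(points)))
    current = tour_length(points, order)
    best_order, best = order[:], current
    temp = t_start
    for _ in range(n_temps):                 # outer loop: temperature schedule
        for _ in range(moves_per_temp):      # inner loop: moves at fixed temperature
            i, j = rng.sample(range(len(order)), 2)
            candidate = order[:]
            candidate[i], candidate[j] = candidate[j], candidate[i]
            cand_len = tour_length(points, candidate)
            diff = cand_len - current
            # Always accept improvements; accept worse tours with
            # probability exp(-diff / temp).
            if diff < 0 or rng.random() < math.exp(-diff / temp):
                order, current = candidate, cand_len
                if current < best:
                    best_order, best = order[:], current
        temp *= cooling                      # cool once per outer iteration
    return best_order, best

rng = random.Random(1)
cities = [(rng.random() * 100, rng.random() * 100) for _ in range(15)]
order, length = anneal(cities, n_temps=100, moves_per_temp=100)
print(round(length, 2))
```

The starting temperature should be on the scale of typical length differences between neighbouring tours; a value near 1, as in the question, may make the search nearly greedy from the first step when coordinates span 0 to 100.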

Build an approximately uniform grid from random sample (python)

I want to build a grid from sampled data. I could use a clustering algorithm such as k-means, but I want to restrict the centres to be roughly uniformly distributed.
I have come up with an approach using the scikit-learn nearest neighbours search: pick a point at random, delete all points within radius r, then repeat. This works well, but I am wondering if anyone has a better (faster) way of doing this.
In response to comments I have tried two alternate methods, one turns out much slower the other is about the same...
Method 0 (my first attempt):
def get_centers0(X, r):
    N = X.shape[0]
    D = X.shape[1]
    grid = np.zeros([0,D])
    nearest = near.NearestNeighbors(radius = r, algorithm = 'auto')
    while N > 0:
        nearest.fit(X)
        x = X[int(random()*N), :]
        _, del_x = nearest.radius_neighbors(x)
        X = np.delete(X, del_x[0], axis = 0)
        grid = np.vstack([grid, x])
        N = X.shape[0]
    return grid
Method 1 (using the precomputed graph):
def get_centers1(X, r):
    N = X.shape[0]
    D = X.shape[1]
    grid = np.zeros([0,D])
    nearest = near.NearestNeighbors(radius = r, algorithm = 'auto')
    nearest.fit(X)
    graph = nearest.radius_neighbors_graph(X)
    #This method is very slow even before doing any 'pruning'
Method 2:
def get_centers2(X, r, k):
    N = X.shape[0]
    D = X.shape[1]
    k = k
    grid = np.zeros([0,D])
    nearest = near.NearestNeighbors(radius = r, algorithm = 'auto')
    while N > 0:
        nearest.fit(X)
        x = X[np.random.randint(0,N,k), :]
        #min_dist = near.NearestNeighbors().fit(x).kneighbors(x, n_neighbors = 1, return_distance = True)
        min_dist = dist(x, k, 2, np.ones(k)) # where dist is a cython compiled function
        x = x[min_dist < 0.1,:]
        _, del_x = nearest.radius_neighbors(x)
        X = np.delete(X, del_x[0], axis = 0)
        grid = np.vstack([grid, x])
        N = X.shape[0]
    return grid
Running these as follows:
N = 50000
r = 0.1
x1 = np.random.rand(N)
x2 = np.random.rand(N)
X = np.vstack([x1, x2]).T
tic = time.time()
grid0 = get_centers0(X, r)
toc = time.time()
print 'Method 0: ' + str(toc - tic)
tic = time.time()
get_centers1(X, r)
toc = time.time()
print 'Method 1: ' + str(toc - tic)
tic = time.time()
grid2 = get_centers2(X, r)
toc = time.time()
print 'Method 2: ' + str(toc - tic)
Method 0 and 2 are about the same...
Method 0: 0.840130090714
Method 1: 2.23365592957
Method 2: 0.774812936783
I'm not sure from the question exactly what you are trying to do. You mention wanting to create an "approximate grid", or a "uniform distribution", while the code you provide selects a subset of points such that no pairwise distance is less than r.
A couple of possible suggestions:
If what you want is an approximate grid, I would construct the grid you want to approximate, and then query for the nearest neighbor of each grid point. Depending on your application, you might further trim these results to cut out points whose distance from the grid point is larger than is useful for you.
If what you want is an approximately uniform distribution drawn from among the points, I would do a kernel density estimate (sklearn.neighbors.KernelDensity) at each point, and do a randomized sub-selection from the dataset weighted by the inverse of the local density at each point.
If what you want is a subset of points such that no pairwise distance is less than r, I would start by constructing a radius_neighbors_graph with radius r, which will, in one go, give you a list of all points which are too close together. You can then use a pruning algorithm similar to the one you wrote above to remove points based on these sparse graph distances.
I hope that helps!
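The first suggestion (build the ideal grid, then snap each grid point to its nearest sample) could look roughly like the sketch below. It uses a brute-force NumPy distance matrix to stay self-contained; for large data sets a tree-based search such as the NearestNeighbors class used in the question would presumably replace that step. The function name snap_to_grid and the parameters spacing and max_dist are made up for illustration.

```python
import numpy as np

def snap_to_grid(X, spacing, max_dist):
    """Pick, for each regular grid point, the nearest sample within max_dist."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    # Regular grid covering the bounding box of the samples.
    axes = [np.arange(a, b + spacing, spacing) for a, b in zip(lo, hi)]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
    grid = grid.reshape(-1, X.shape[1])
    # Squared distances between every grid point and every sample.
    d2 = ((grid[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    nearest = d2.argmin(axis=1)
    dist = np.sqrt(d2[np.arange(len(grid)), nearest])
    # Trim grid points whose nearest sample is too far away.
    keep = dist <= max_dist
    # A sample may be nearest to several grid points; keep it once.
    return X[np.unique(nearest[keep])]

rng = np.random.default_rng(0)
samples = rng.random((2000, 2))
centers = snap_to_grid(samples, spacing=0.1, max_dist=0.05)
print(centers.shape)
```

The distance matrix has one entry per (grid point, sample) pair, so memory grows quickly with the number of samples and dimensions.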
I have come up with a very simple method which is much more efficient than my previous attempts.
This one simply loops over the data set and adds the current point to the list of grid points only if it is greater than r distance from all existing centers. This method is around 20 times faster than my previous attempts. Because there are no external libraries involved I can run this all in cython...
#cython.boundscheck(False)
#cython.wraparound(False)
#cython.nonecheck(False)
def get_centers_fast(np.ndarray[DTYPE_t, ndim = 2] x, double radius):
    cdef int N = x.shape[0]
    cdef int D = x.shape[1]
    cdef int m = 1
    cdef np.ndarray[DTYPE_t, ndim = 2] xc = np.zeros([10000, D])
    cdef double r = 0
    cdef double r_min = 10
    cdef int i, j, k
    for k in range(D):
        xc[0,k] = x[0,k]
    for i in range(1, N):
        r_min = 10
        for j in range(m):
            r = 0
            for k in range(D):
                r += (x[i, k] - xc[j, k])**2
            r = r**0.5
            if r < r_min:
                r_min = r
        if r_min > radius:
            m = m + 1
            for k in range(D):
                xc[m - 1,k] = x[i,k]
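    # keep the m centers that were filled in (testing for nonzero first
    # coordinates would drop a center whose first coordinate is exactly 0)
    xc = xc[:m,:]
    return xc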
Running these methods as follows:
N = 40000
r = 0.1
x1 = np.random.normal(size = N)
x1 = (x1 - min(x1)) / (max(x1)-min(x1))
x2 = np.random.normal(size = N)
x2 = (x2 - min(x2)) / (max(x2)-min(x2))
X = np.vstack([x1, x2]).T
tic = time.time()
grid0 = gt.get_centers0(X, r)
toc = time.time()
print 'Method 0: ' + str(toc - tic)
tic = time.time()
grid2 = gt.get_centers2(X, r, 10)
toc = time.time()
print 'Method 2: ' + str(toc - tic)
tic = time.time()
grid3 = gt.get_centers_fast(X, r)
toc = time.time()
print 'Method 3: ' + str(toc - tic)
The new method is around 20 times faster. It could be made even faster, if I stopped looping early (e.g. if k successive iterations fail to produce a new center).
Method 0: 0.219595909119
Method 2: 0.191949129105
Method 3: 0.0127329826355
Maybe you could only re-fit the nearest object every k << N deletions to speedup the process. Most of the time the neighborhood structure should not change much.
Sounds like you are trying to reinvent one of the following:
cluster features (see BIRCH)
data bubbles (see "Data bubbles: Quality preserving performance boosting for hierarchical clustering")
canopy pre-clustering
i.e. this concept has already been invented at least three times with small variations.
Technically, it is not clustering. K-means isn't really clustering either.
It is much more adequately described as vector quantization.

How can I check to see the number of iterations Newton's method takes to run?

So basically I want to get the number of iterations my Newton's method takes to find the root, and then use that number in my color scheme: the more iterations, the darker the color; the fewer iterations, the fuller the color.
So here's my code:
from numpy import *
import pylab as pl
def myffp(x):
    return x**3 - 1, 3*(x**2)
def newton( ffp, x, nits):
    for i in range(nits):
        #print i,x
        f,fp = ffp(x)
        x = x - f/fp
    return x
q = sqrt(3)/2
def leggo(xmin=-1,xmax=1,jmin=-1,jmax=1,pts=1000,nits=30):
    x = linspace(xmin, xmax, pts)
    y = linspace(jmin, jmax, pts)*complex(0,1)
    x1,y1 = meshgrid(x,y)
    n = newton(myffp,x1+y1,nits) #**here is where i wanna see the number of iterations newton's method takes to find my root**
    r1 = complex(1,0)
    r2 = complex(-.5, q)
    r3 = complex(-.5,-q)
    data = zeros((pts,pts,3))
    data[:,:,0] = abs(n-r1) #**and apply it here**
    data[:,:,2] = abs(n-r2)
    data[:,:,1] = abs(n-r3)
    pl.show(pl.imshow(data))
leggo()
The main problem is finding the number of iterations. I can then figure out how to apply that to darkening the color, but for now it's just finding the number of iterations it takes for each value run through Newton's method.
Perhaps the simplest way is to just refactor your newton function so that it keeps track of the total iterations and then returns it (along with the result, of course), e.g.,
def newton( ffp, x, nits):
    c = 0 # initialize iteration counter
    for i in range(nits):
        c += 1 # increment counter for each iteration
        f, fp = ffp(x)
        x = x - f/fp
    return x, c # return the counter when the function is called
So in the main body of your code, change your call to newton like so:
res, tot_iter = newton(myffp, x, nits)
The number of iterations in the last call to newton is stored in tot_iter.
As an aside, your implementation of Newton's method seems to be incomplete;
for instance, it's missing a test against some convergence criterion.
Here's a simple implementation in python that works:
def newtons_method(x_init, fn, max_iter=100):
    """
    returns: approx. val of root of the function passed in, fn;
    pass in: x_init, initial value for the root;
    max_iter, total iteration count not exceeded;
    fn, a function of the form:
    def f(x): return x**3 - 2*x
    """
    x = x_init
    eps = .0001
    # set initial value different from x_init so at least 1 loop
    x_old = x + 10 * eps
    step = .1
    c = 0
    # (x - x_old) is convergence criterion
    while (abs(x - x_old) > eps) and (c < max_iter):
        c += 1
        fval = fn(x)
        # forward-difference approximation of the derivative
        dfdx = (fn(x + step) - fn(x)) / step
        x_old = x
        x = x_old - fval / dfdx
    return x, c
The code you're currently using for newton() has a fixed number of iterations (nits - which is being passed in as 30), so the results would be kind of trivial and uninteresting.
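To get a meaningful count for every starting point on the grid, one option is to stop updating each point as soon as its own Newton step becomes small and record how many steps it took. The following is a rough sketch for f(z) = z**3 - 1 only; the function name newton_iteration_counts and the tolerance tol are assumptions made for illustration, not part of the question's code.

```python
import numpy as np

def newton_iteration_counts(z0, max_iter=30, tol=1e-6):
    """Newton's method for z**3 - 1 on an array of complex starting points.

    Returns the final iterates and, per point, the number of steps taken
    until the step size fell below tol (or max_iter was reached).
    A starting point of exactly 0 divides by zero and ends up as nan.
    """
    z = np.array(z0, dtype=complex)
    counts = np.zeros(z.shape, dtype=int)
    active = np.ones(z.shape, dtype=bool)   # points that have not converged yet
    for _ in range(max_iter):
        if not active.any():
            break
        mask = active.copy()
        za = z[mask]
        step = (za ** 3 - 1) / (3 * za ** 2)
        z[mask] = za - step
        counts[mask] += 1
        # Points whose last step was tiny are treated as converged.
        active[mask] = np.abs(step) > tol
    return z, counts

x = np.linspace(-1, 1, 200)
xx, yy = np.meshgrid(x, x)
roots, iters = newton_iteration_counts(xx + 1j * yy)
print(iters.min(), iters.max())
```

The iters array has the same shape as the grid, so it can be normalised (for example iters / iters.max()) and used to scale the brightness of the color channels in data.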
It looks like you're trying to generate a Newton fractal -- the method you're trying to use is incorrect; the typical coloring mode is based on the output of the function, not the number of iterations. See the Wikipedia article for a full explanation.
