I've already written code for a random walk of 10000 steps, repeated it 12 times, and stored each run in a separate text file (as required by the question). I then calculated the mean square displacement (I'm not sure I did it correctly). I now need to 'plot my Mean Square Displacement as a function of δt, including errorbars σ = std(MSD)/√N, where std(MSD) is the standard deviation among the different runs and N is the number of runs', then compute the diffusion constant D from the curve and check that D = 2 (∆/dt) where dt = 1.
Here is my code so far:
import numpy as np
import matplotlib.pyplot as plt
import random as rd
import math
a = np.zeros((10000, 2), dtype=float)  # (x, y) position after each step
def randwalk(x, y):
    theta = 2*math.pi*rd.random()
    x += math.cos(theta)  # This uses the equation given, since we are told the spatial unit = 1
    y += math.sin(theta)
    return (x, y)
x, y = 0., 0.
for i in range(10000):  # Using for loop and range function to initialize the array
    x, y = randwalk(x, y)
    a[i,:] = x, y
fn_base = "random_walk_%i.txt"  # Saves each run in a numbered text file, fn_base is a variable to hold format
N = 12
for j in range(N):
    rd.seed(j)  # seed(j) explicitly sets the seed to random numbers
    x, y = 0., 0.
    for i in range(10000):
        x, y = randwalk(x, y)
        a[i,:] = x, y
    fn = fn_base % j
    np.savetxt(fn, a)
destinations = np.zeros((12, 2), dtype=float)  # end point of each of the 12 runs
for j in range(12):
    x, y = 0., 0.
    for i in range(10000):
        x, y = randwalk(x, y)
    destinations[j] = x, y
square_distances = destinations[:,0] ** 2 + destinations[:,1] ** 2
m_s_d = np.mean(square_distances)
I think that to do it I just have to plot the MSD against the number of steps, but I'm not sure how to do this. I saw a similar question on Stack Overflow, but its code is different from mine and I don't understand how to adapt it.
Next I tried:
plt.figure()
t = 10000
plt.plot(m_s_d, t)
plt.show()
But this gives an error as the dimensions are not equal.
Edit: I think my issue is that I am trying to plot it against the number of steps when I should be plotting it against the change in time. However, I can't work out how to calculate the change in time dt.
Apologies in advance if the question isn't formulated well; I am fairly new to computing. Thank you.
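One possible reading of the exercise, sketched under assumptions: δt is taken to be a lag measured in steps (dt = 1 per step), the MSD of each run is averaged over all windows of that lag, and the error bar is the spread between runs divided by √N. The helper names `simulate_walk` and `msd_per_lag`, the choice of lags 1 to 100, and the use of `ddof=1` are illustrative choices, not part of the original code. The walks are regenerated here instead of being read back from the text files.

```python
import numpy as np

def simulate_walk(n_steps, rng):
    """Positions of a 2-D walk with unit steps in uniformly random directions."""
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n_steps)
    steps = np.column_stack((np.cos(theta), np.sin(theta)))
    return np.cumsum(steps, axis=0)

def msd_per_lag(traj, lags):
    """Mean square displacement of one trajectory for each lag (in steps)."""
    out = np.empty(len(lags))
    for k, lag in enumerate(lags):
        disp = traj[lag:] - traj[:-lag]          # displacement over `lag` steps
        out[k] = np.mean(np.sum(disp ** 2, axis=1))
    return out

rng = np.random.default_rng(0)
n_runs, n_steps = 12, 10000
lags = np.arange(1, 101)                          # delta t = 1 ... 100 steps

# One MSD curve per run, shape (n_runs, len(lags))
msd_runs = np.array([msd_per_lag(simulate_walk(n_steps, rng), lags)
                     for _ in range(n_runs)])

msd_mean = msd_runs.mean(axis=0)
# sigma = std(MSD) / sqrt(N); ddof=1 is an assumption (np.std defaults to ddof=0)
msd_err = msd_runs.std(axis=0, ddof=1) / np.sqrt(n_runs)

# Slope of a straight-line fit of MSD against delta t
slope = np.polyfit(lags, msd_mean, 1)[0]
print(slope)
```

With these arrays, `plt.errorbar(lags, msd_mean, yerr=msd_err)` would draw the curve with error bars. How the fitted slope maps to D depends on the convention used in the course (for example MSD = 4·D·δt in two dimensions), so that last step is left to the reader.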
I am trying to create a 30 × 10 matrix in Python that is filled with randomly generated numbers. The numbers in the matrix have to satisfy the following condition:
Randomly generate 30 data points from the sine function, where each data point (x,y) has the form
x = [x^0, x^1, x^2, ..., x^10], x ∈ [0, 2π]
y = sin(x) + ε, ε ∈ N(0,0.3)
How might I be able to go about this?
Right now I only have a 1 × 10 matrix
def generate_sin_data():
    x = np.random.rand()
    y = np.sin(x)
    features = [x**0, x**1, x**2, x**3, x**4, x**5, x**6, x**7, x**8, x**9, x**10]
    return x, y, features
I'm not 100% certain I follow everything, but we can break it down. Here's how you can generate 30 random numbers between 0 and 2π:
import numpy as np
x = np.random.random(30) * 2*np.pi
Here, x is a 1D array of 30 numbers. Check this with x.shape.
Now if you add a dimension, it's easy to generate a matrix of powers up to 10 using NumPy's broadcasting feature. The question seems to ask for 11 numbers (0 to 10) not 10, so I'll do that:
X = x.reshape(-1, 1) ** np.arange(0, 11)
That reshape effectively turns x into a column vector. Now check X.shape and it's (30, 11), which is what I think you were after. Notice we use a big X for a matrix; this convention will help you keep track of things. Each column of X is the original x raised to a power from 0 to 10. (Note that each column comes from the same set of random numbers; I'm not sure if that's what you want?)
If you want y as a function of x (the vector), add Gaussian noise, since the question asks for ε ∈ N(0, 0.3) (0.3 is taken here as the standard deviation):
ϵ = np.random.normal(0, 0.3, 30)
y = np.sin(x) + ϵ
import numpy as np
# 30 random uniform values in [0, 2*pi)
_x = np.random.uniform(0, 2*np.pi, 30)
# matrix of 30x10:
x = np.array([
    [v ** i for i in range(10)]
    for v in _x
])
# random 30x10 normal noise:
eps = np.random.normal(0, 0.3, [30, 10])
# final result 30x10 matrix:
y = np.sin(x) + eps
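For comparison, here is a minimal sketch of one more reading of the task: 30 samples, powers 0 to 10 as features, and a single noisy target per sample. It is an interpretation and not necessarily what the exercise intends. It relies on `np.vander` with `increasing=True` to build the power columns, and it assumes 0.3 is the standard deviation of the noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, max_power = 30, 10

# 30 sample positions drawn uniformly from [0, 2*pi)
x = rng.uniform(0.0, 2.0 * np.pi, size=n_points)

# Column k holds x**k for k = 0..10, so X has shape (30, 11)
X = np.vander(x, N=max_power + 1, increasing=True)

# One target per sample: sin(x) plus Gaussian noise (0.3 assumed to be the std)
y = np.sin(x) + rng.normal(loc=0.0, scale=0.3, size=n_points)

print(X.shape, y.shape)
```

Dropping the last column (`X[:, :10]`) would give a 30 × 10 matrix if powers 0 to 9 are what is wanted.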
I have a set of data in a NumPy array: x-values, let's say between 0 and 100, and y-values. I need the gradient at a specific x-value, e.g. x=20, but I can only get the np.gradient function to give me the gradient at a certain index. Right now I have:
g=np.gradient(y)
print(g[20])
but this of course gives me the gradient at i=20 and not at x=20.
I have the x and y values available both as one 2D array and as two 1D arrays in my script.
EDIT:
I actually came to solve it like this:
def grad(x, value):
    def find_nearest(x, value):
        x = np.asarray(Timeppmh)
        idx = (np.abs(x - value)).argmin()
        i = x.tolist().index(x[idx])
        return i
    g = np.gradient(yp, x)
    find_nearest(x, value)
    return g[find_nearest(x, value)]
If the value 20 is in x, you could just do g[x == 20]. If it is not, you need to approximate the gradient at that point, for example by linear interpolation:
import numpy as np
x = np.linspace(0, 100, 80)
print(20 in x) # 20 is not in x
# False
y = x * x + 3 * x + 2
# Pass x as second argument for value spacing
g = np.gradient(y, x)
print(np.interp(20, x, g)) # Should be 43
# 43.00000000000001
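The two approaches in this thread (nearest sample versus linear interpolation) can be put side by side. The helper `gradient_at` below is a hypothetical name, not a NumPy function, and the sketch assumes that x is sorted in increasing order, which `np.interp` requires.

```python
import numpy as np

def gradient_at(x, y, x0):
    """Estimate dy/dx at x0 by interpolating np.gradient (hypothetical helper)."""
    x = np.asarray(x, dtype=float)   # assumed sorted in increasing order
    y = np.asarray(y, dtype=float)
    if not (x[0] <= x0 <= x[-1]):
        raise ValueError("x0 lies outside the sampled range")
    g = np.gradient(y, x)            # derivative estimate at every sample
    return float(np.interp(x0, x, g))

x = np.linspace(0, 100, 80)
y = x * x + 3 * x + 2                # exact derivative is 2*x + 3

# Nearest-sample approach: gradient at the grid point closest to x = 20
nearest = np.abs(x - 20).argmin()
g_nearest = np.gradient(y, x)[nearest]

# Interpolated approach: gradient estimated at x = 20 itself
g_interp = gradient_at(x, y, 20)

print(x[nearest], g_nearest, g_interp)
```

The nearest-sample value is the derivative at the closest grid point, not at x = 20, so the two numbers differ whenever 20 is not on the grid.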
Working in Python, I am doing some physics calculations over an NxM grid of values, where N goes from 1 to 3108 and M goes from 1 to 2304 (this corresponds to a large image). I need to calculate a value at each and every point in this space, which totals ~7 million calculations. My current approach is painfully slow, and I am wondering if there is a way to complete this task without it taking hours...
My first approach was just to use nested for loops, but this seemed like the least efficient way to solve my problem. I have tried using NumPy's nditer and iterating over each axis individually, but I've read that it doesn't actually speed up my computations. Rather than looping through each axis individually, I also tried making a 3-D array and looping through the outer axis as shown in Brian's answer here How can I, in python, iterate over multiple 2d lists at once, cleanly? . Here is the current state of my code:
import numpy as np
x,y = np.linspace(1,3108,num=3108),np.linspace(1,2304,num=2304) # x&y dimensions of image
X,Y = np.meshgrid(x,y,indexing='ij')
all_coords = np.dstack((X,Y)) # moves to 3-D
all_coords = all_coords.astype(int) # sets coords to int
For reference, all_coords looks like this:
array([[[1.000e+00, 1.000e+00],
[1.000e+00, 2.000e+00],
[1.000e+00, 3.000e+00],
...,
[1.000e+00, 2.302e+03],
[1.000e+00, 2.303e+03],
[1.000e+00, 2.304e+03]],
[[2.000e+00, 1.000e+00],
[2.000e+00, 2.000e+00],
[2.000e+00, 3.000e+00],
...,
[2.000e+00, 2.302e+03],
[2.000e+00, 2.303e+03],
[2.000e+00, 2.304e+03]],
and so on. Back to my code...
'''
- below is a function that does a calculation on the full grid using the distance between x0,y0 and each point on the grid.
- the function takes x0,y0 and returns the calculated values across the grid
'''
def do_calc(x0,y0):
    del_x, del_y = X-x0, Y-y0
    np.seterr(divide='ignore', invalid='ignore')
    dmx_ij = (del_x/((del_x**2)+(del_y**2))) # x component
    dmy_ij = (del_y/((del_x**2)+(del_y**2))) # y component
    return dmx_ij,dmy_ij
# now the actual loop
def do_loop():
    dmx,dmy = 0,0
    for pair in all_coords:
        for xi,yi in pair:
            DM = do_calc(xi,yi)
            dmx,dmy = dmx+DM[0],dmy+DM[1]
    return dmx,dmy
As you might see, this code takes an incredibly long time to run... If there is any way to modify my code such that it doesn't take hours to complete, I would be extremely interested in knowing how to do that. Thanks in advance for the help.
Here is a method that gives a 10,000x speedup at N=310, M=230. As the method scales better than the original code I'd expect a factor of more than a million at the full problem size.
The method exploits the shift invariance of the problem. For example, del_x**2 is essentially the same up to shift at each call of do_calc, so we compute it only once.
If the output of do_calc is weighted before summation, the problem is no longer fully translation invariant and this method no longer works. The result, however, can then be expressed in terms of linear convolution. At N=310, M=230 this still leaves us with a more than 1,000x speedup. And, again, this will be more at full problem size.
Code for original problem:
import numpy as np
#N, M = 3108, 2304
N, M = 310, 230
### OP's code
x,y = np.linspace(1,N,num=N),np.linspace(1,M,num=M) # x&y dimensions of image
X,Y = np.meshgrid(x,y,indexing='ij')
all_coords = np.dstack((X,Y)) # moves to 3-D
all_coords = all_coords.astype(int) # sets coords to int
'''
- below is a function that does a calculation on the full grid using the distance between x0,y0 and each point on the grid.
- the function takes x0,y0 and returns the calculated values across the grid
'''
def do_calc(x0,y0):
    del_x, del_y = X-x0, Y-y0
    np.seterr(divide='ignore', invalid='ignore')
    dmx_ij = (del_x/((del_x**2)+(del_y**2))) # x component
    dmy_ij = (del_y/((del_x**2)+(del_y**2))) # y component
    return np.nan_to_num(dmx_ij), np.nan_to_num(dmy_ij)
# now the actual loop
def do_loop():
    dmx,dmy = 0,0
    for pair in all_coords:
        for xi,yi in pair:
            DM = do_calc(xi,yi)
            dmx,dmy = dmx+DM[0],dmy+DM[1]
    return dmx,dmy
from time import time
t = [time()]
### pp's code
x, y = np.ogrid[-N+1:N-1:2j*N - 1j, -M+1:M-1:2j*M - 1J]
den = x*x + y*y
den[N-1, M-1] = 1
xx = x / den
yy = y / den
for zz in xx, yy:
    zz[N:] -= zz[:N-1]
    zz[:, M:] -= zz[:, :M-1]
XX = xx.cumsum(0)[N-1:].cumsum(1)[:, M-1:]
YY = yy.cumsum(0)[N-1:].cumsum(1)[:, M-1:]
t.append(time())
### call OP's code for reference
X_OP, Y_OP = do_loop()
t.append(time())
# make sure results are equal
assert np.allclose(XX, X_OP)
assert np.allclose(YY, Y_OP)
print('pp {}\nOP {}'.format(*np.diff(t)))
Sample run:
pp 0.015251636505126953
OP 149.1642508506775
Code for weighted problem:
import numpy as np
#N, M = 3108, 2304
N, M = 310, 230
values = np.random.random((N, M))
x,y = np.linspace(1,N,num=N),np.linspace(1,M,num=M) # x&y dimensions of image
X,Y = np.meshgrid(x,y,indexing='ij')
all_coords = np.dstack((X,Y)) # moves to 3-D
all_coords = all_coords.astype(int) # sets coords to int
'''
- below is a function that does a calculation on the full grid using the distance between x0,y0 and each point on the grid.
- the function takes x0,y0 and returns the calculated values across the grid
'''
def do_calc(x0,y0, v):
    del_x, del_y = X-x0, Y-y0
    np.seterr(divide='ignore', invalid='ignore')
    dmx_ij = (del_x/((del_x**2)+(del_y**2))) # x component
    dmy_ij = (del_y/((del_x**2)+(del_y**2))) # y component
    return v*np.nan_to_num(dmx_ij), v*np.nan_to_num(dmy_ij)
# now the actual loop
def do_loop():
    dmx,dmy = 0,0
    for pair, vv in zip(all_coords, values):
        for (xi,yi), v in zip(pair, vv):
            DM = do_calc(xi,yi, v)
            dmx,dmy = dmx+DM[0],dmy+DM[1]
    return dmx,dmy
from time import time
from scipy import signal
t = [time()]
x, y = np.ogrid[-N+1:N-1:2j*N - 1j, -M+1:M-1:2j*M - 1J]
den = x*x + y*y
den[N-1, M-1] = 1
xx = x / den
yy = y / den
XX, YY = (signal.fftconvolve(zz, values, 'valid') for zz in (xx, yy))
t.append(time())
X_OP, Y_OP = do_loop()
t.append(time())
assert np.allclose(XX, X_OP)
assert np.allclose(YY, Y_OP)
print('pp {}\nOP {}'.format(*np.diff(t)))
Sample run:
pp 0.12683939933776855
OP 158.35225439071655
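The idea behind both versions may be easier to see in one dimension. The sketch below is a toy illustration, not the code from the answer: it assumes a 1-D kernel k(d) = 1/d (with k(0) = 0) in place of the 2-D components, and all variable names are made up for the example. Because the kernel depends only on the offset i - j, the unweighted sum over all sources is a sliding-window sum of one precomputed kernel table, and the weighted sum is a linear convolution.

```python
import numpy as np

N = 50
i = np.arange(N)

# Kernel tabulated once for every possible offset d = i - j in [-(N-1), N-1]
offsets = np.arange(-(N - 1), N)
kernel = np.zeros(offsets.size)
nonzero = offsets != 0
kernel[nonzero] = 1.0 / offsets[nonzero]      # k(0) stays 0, like nan_to_num

# Table of k(i - j) for all pairs, used only for the brute-force reference
pair_table = kernel[(i[:, None] - i[None, :]) + N - 1]

# Unweighted: total[i] = sum_j k(i - j) is a window sum over the kernel table,
# so a single cumulative sum replaces the O(N^2) double loop.
brute = pair_table.sum(axis=1)
c = np.concatenate(([0.0], np.cumsum(kernel)))
fast = c[i + N] - c[i]

# Weighted: total[i] = sum_j w[j] * k(i - j) is a linear convolution.
rng = np.random.default_rng(0)
w = rng.random(N)
brute_w = (pair_table * w[None, :]).sum(axis=1)
fast_w = np.convolve(kernel, w, mode="valid")

print(np.allclose(brute, fast), np.allclose(brute_w, fast_w))
```

For large 2-D grids an FFT-based convolution, as used in the answer's weighted code, is the practical choice; `np.convolve` is used here only to keep the toy example free of extra dependencies.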
I'm trying to sum a two-dimensional function using arrays, but somehow even a plain for loop is not giving the correct answer. I want to find (in LaTeX) $$\sum_{i=1}^{M}\sum_{j=1}^{M_2}\cos(i)\cos(j)$$ where, according to Mathematica, the answer when M = M_2 = 5 is 1.52725. According to the for loop:
def f(N):
    s1=0;
    for p1 in range(N):
        for p2 in range(N):
            s1+=np.cos(p1+1)*np.cos(p2+1)
            return s1
print(f(4))
is 0.291927.
I have thus been trying to use some code of the form:
def f1(N):
    mat3=np.zeros((N,N),complex)
    for i in range(0,len(mat3)):
        for j in range(0,len(mat3)):
            mat3[i][j]=np.cos(i+1)*np.cos(j+1)
            return sum(mat3)
which again
print(f1(4))
outputs 0.291927. Looking at the array we should find for each value of i and j a matrix of the form
mat3=[[np.cos(1)*np.cos(1),np.cos(2)*np.cos(1),...],[np.cos(2)*np.cos(1),...]...[np.cos(N+1)*np.cos(N+1)]]
so for N=4 we should have
mat3=[[np.cos(1)*np.cos(1) np.cos(2)*np.cos(1) ...] [np.cos(2)*np.cos(1) ...]...[... np.cos(5)*np.cos(5)]]
but what I actually get is the following
mat3=[[0.29192658+0.j 0.+0.j 0.+0.j ... 0.+0.j] ... [... 0.+0.j]]
or a matrix of all zeros apart from the mat3[0][0] element.
Does anybody know a correct way to do this and get the correct answer? I chose this as an example because the problem I'm trying to solve involves plotting a function which has been summed over two indices and the function that python outputs is not the same as Mathematica (i.e., a function of the form $$f(E)=\sum_{i=1}^{M}\sum_{j=1}^{M_2}F(i,j,E)$$).
The return statement is not indented correctly in your sample code: it sits inside the loops, so the function returns during the first iteration. Indent it at the level of the function body instead, so that both for loops finish:
def f(N):
    s1=0;
    for p1 in range(N):
        for p2 in range(N):
            s1+=np.cos(p1+1)*np.cos(p2+1)
    return s1
>>> print(f(5))
1.527247272700347
I have moved your code to a more numpy-ish version:
import numpy as np
N = 5
x = np.arange(N) + 1
y = np.arange(N) + 1
x = x.reshape((-1, 1))
y = y.reshape((1, -1))
mat = np.cos(x) * np.cos(y)
print(mat.sum()) # 1.5272472727003474
The trick here is to reshape x to a column vector and y to a row vector. When you multiply them, broadcasting pairs every element of x with every element of y, just like your nested loops.
This should also be faster, since cos() is evaluated on only 2*N values and the explicit Python loops (which are slow) are avoided.
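As a quick cross-check, the loop, the broadcast grid and a factored form can be compared directly. The factored form is a side remark and applies only because this particular summand separates into a function of i times a function of j; a general F(i, j, E) will not factor like this. Variable names are illustrative.

```python
import numpy as np

N = 5
i = np.arange(1, N + 1)

# Reference: explicit double loop with the return outside both loops
loop_total = 0.0
for a in i:
    for b in i:
        loop_total += np.cos(a) * np.cos(b)

# Broadcasting: column vector times row vector gives the full N x N grid
grid_total = (np.cos(i)[:, None] * np.cos(i)[None, :]).sum()

# Separable summand: the double sum is the product of two single sums
factored_total = np.cos(i).sum() ** 2

print(loop_total, grid_total, factored_total)
```

All three agree with the Mathematica value quoted in the question for N = 5.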
UPDATE (regarding your comment):
This pattern can be extended to any number of dimensions. Basically, you get something like an outer (Cartesian) product, where every instance of x is matched up with every instance of y, z, u, k, ... along the corresponding dimensions.
It's a bit confusing to describe, so here is some more code:
import numpy as np
N = 5
x = np.arange(N) + 1
y = np.arange(N) + 1
z = np.arange(N) + 1
x = x.reshape((-1, 1, 1))
y = y.reshape((1, -1, 1))
z = z.reshape((1, 1, -1))
mat = z**2 * np.cos(x) * np.cos(y)
# x along first axis
# y along second, z along third
# mat[0, 0, 0] == 1**2 * np.cos(1) * np.cos(1)
# mat[0, 4, 2] == 3**2 * np.cos(1) * np.cos(5)
If you use this for many dimensions, and big values for N, you will run into memory problems, though.
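The same reshaping can be applied to the function of E mentioned in the question, by giving E its own axis and summing over the i and j axes only. This is a sketch under assumptions: the summand `F` below is a made-up placeholder (the real F is not given in the question), and M, M2 and the E grid are arbitrary small values.

```python
import numpy as np

def F(i, j, E):
    """Hypothetical summand; stands in for the asker's unspecified F(i, j, E)."""
    return np.cos(i * E) * np.cos(j) / (i + j)

M, M2 = 5, 4
E = np.linspace(0.0, 2.0, 7)

i = np.arange(1, M + 1).reshape(-1, 1, 1)     # i along the first axis
j = np.arange(1, M2 + 1).reshape(1, -1, 1)    # j along the second axis
E3 = E.reshape(1, 1, -1)                      # E along the third axis

# Sum over the i and j axes only, leaving one value per E
f_E = F(i, j, E3).sum(axis=(0, 1))

# Reference: plain loops over i and j, vectorized over E only
f_loop = np.zeros_like(E)
for a in range(1, M + 1):
    for b in range(1, M2 + 1):
        f_loop += F(a, b, E)

print(np.allclose(f_E, f_loop))
```

The intermediate array has M × M2 × len(E) elements, so the memory caveat above applies here as well.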