Save results of a Loop in a Matrix - python

I am currently programming a Python tool for performing a Geometric Brownian motion. The loop for performing the motion is done and works as intended. Now I have problems saving the various results of the simulations in a big matrix and to plot it then.
I tried to use the append function but it turns out that the result I get then is a list with another array for each simulation rather than a big matrix.
My Code:
import matplotlib.pyplot as plt
import numpy as np
T = 2
mu = 0.15
sigma = 0.10
S0 = 20
dt = 0.01
N = round(T/dt) ### Paths
simu = 20 ### number of simulations
i = 1
## creates an array with values from 0 to T with N elementes (T/dt)
t = np.linspace(0, T, N)
## empty Matrix for the end results
res = []
while i < simu + 1:
## random number showing the Wiener process
W = np.random.standard_normal(size = N)
W = np.cumsum(W)*np.sqrt(dt) ### standard brownian motion ###
X = (mu-0.5*sigma**2)*t + sigma*W
S = S0*np.exp(X) ### new Stock prices based on the simulated returns ###
res.append(S) #appends the resulting array to the result table
i += 1
#plotting of the result Matrix
plt.plot(t, res)
plt.show()
I would be very pleased if someone could help me with this problem since I intend to plot the time with the different paths (which are stored in the big matrix).
Thank you in advance,
Nick

To completely avoid the loop and use fast and clean pythonic vectorized operations, you can write your operation like this:
import matplotlib.pyplot as plt
import numpy as np
T = 2
mu = 0.15
sigma = 0.10
S0 = 20
dt = 0.01
N = round(T/dt) ### Paths
simu = 20 ### number of simulations
i = 1
## creates an array with values from 0 to T with N elementes (T/dt)
t = np.linspace(0, T, N)
## result matrix creation not needed, thanks to gboffi for the hint :)
## random number showing the Wiener process
W = np.random.standard_normal(size=(simu, N))
W = np.cumsum(W, axis=1)*np.sqrt(dt) ### standard brownian motion ###
X = (mu-0.5*sigma**2)*t + sigma*W
res = S0*np.exp(X) ### new Stock prices based on the simulated returns ###
Now your results are stored in a real matrix, or correctly a np.ndarray. np.ndarray is the standard array format of numpy and thus the most widely used and supported array format.
To plot it, you need to give further information, like: Do you want to plot each row of the result array? This would then look like:
for i in range(simu):
plt.plot(t, res[i])
plt.show()
If you want to check the shape for consistency after calculation, you can do the following:
assert res.shape == (simu, N), 'Calculation faulty!'

Related

Storing Values from One Array into Another Larger Array

I am trying to create a range of signals of different frequencies. I am finding it difficult to store amplitude vs time into another storage matrix for each frequency ranging from 0 to 50 Hz. Example, for a frequency of 20 Hz, I want to store the amplitude vs time for that frequency, then for 21 Hz I want to store the amplitude vs time for that frequency etc, until I have all of them in a large matrix. I am getting so confused at this point with indexing and syntax, any help welcome!
import numpy as np
max_freq = 50
s_frequency = np.arange(0,51,0.1)
fs = 200
time = np.arange(0,5-(1/fs),(1/fs))
x = np.empty((len(time)), dtype=np.float32)
i = 0
j = 0
full_array = np.empty((len(s_frequency),len(time),len(time)), dtype=np.float32)
amplitude = np.zeros(999)
for f1 in s_frequency:
i = 0
for t in time:
amplitude[i] = np.sin(2*np.pi*f1*t)
i = i + 1
full_array[i] = ([time], [amplitude])
I have also tried the following:
import numpy as np
max_freq = 50
s_frequency = np.arange(0,50.1,0.1)
fs = 200
time = np.arange(0,5-(1-fs),(1/fs))
#full_array = np.sin(2*np.pi*np.outer(s_frequency,time))
full_array = np.empty((len(s_frequency),len(time), len(time)), dtype=np.float32)
for f1 in s_frequency:
array = []
for i, t in enumerate(time):
amplitude = np.sin(2*np.pi*f1*t)
array.insert(i,amplitude)
full_array[i] = [time, array]
Not 100% sure what you're trying to do, but it seems like you're trying to initialize a 2-dimensional grid (i.e. a matrix) where you have a dimension for time and one for frequency. Here is what I would do:
import numpy as np
max_freq = 50
s_frequency = np.arange(0,51,0.1)
fs = 200
time = np.arange(0,5-(1/fs),(1/fs))
full_array = np.sin(2*np.pi*np.outer(s_frequency,time))
No explicit for-loops or index handling needed. np.outer() will give you a 2D grid (i.e. a matrix) of frequency versus time. Now whats left is to compute the sine of 2 Pi times that grid value. Very conveniently numpy functions do accept arrays as input, thus we can simply call np.sin(2*np.pi*np.outer(s_frequency,time).
Not sure what x and j are good for in your code and why full_array should be 3-diemsional. Would you like to include a spatial component as well?
By the way, a construct like this:
i = 0
for t in time:
amplitude[i] = np.sin(2*np.pi*f1*t)
i = i + 1
can easily be avoided in python, thanks to pythons build-in enumerate() function. It would then look like this:
for i, t in enumerate(time):
amplitude[i] = np.sin(2*np.pi*f1*t)
which does essentially the same, but you don't have to explicitly create the index i = 0 and manually incerement it in every iteration i = i + 1.

Generate simulated data in Python while meeting a range of correlations with respect to a predefined variable

Let's denote refVar, a variable of interest that contains experimental data.
For the simulation study, I would like to generate other variables V0.05, V0.10, V0.15 until V0.95.
Note that for the variable name, the value following V represents the correlation between the variable and refVar (in order to quick track in the final dataframe).
My readings led me to multivariate_normal() from numpy. However, when using this function, it generates 2 1D-arrays both with random numbers. What I want is to always keep refVar and generate other arrays filled with random numbers, while meeting the specified correlation.
Please, find below my my code. To cut it short, I've no clue how to generate other variables relative to my experimental variable refVar. Ideally, I would like to build a data frame containing the following columns: refVar,V0.05,V0.10,...,V0.95. I hope you get my point and thank you in advance for your time
import numpy as np
import pandas as pd
from numpy.random import multivariate_normal as mvn
refVar = [75.25,77.93,78.2,61.77,80.88,71.95,79.88,65.53,85.03,61.72,60.96,56.36,23.16,73.36,64.18,83.07,63.25,49.3,78.2,30.96]
mean_refVar = np.mean(refVar)
for r in np.arange(0,1,0.05):
var1 = 1
var2 = 1
cov = r
cov_matrix = [[var1,cov],
[cov,var2]]
data = mvn([mean_refVar,mean_refVar],cov_matrix,size=len(refVar))
output = 'corr_'+str(r.round(2))+'.txt'
df = pd.DataFrame(data,columns=['refVar','v'+str(r.round(2)])
df.to_csv(output,sep='\t',index=False) # Ideally, instead of creating an output for each correlation, I would like to generate a DF with refVar and all these newly created Series
Following this answer we can generate the sequence as follow:
def rand_with_corr(refVar, corr):
# center and normalize refVar
X = np.array(refVar) - np.mean(refVar)
X = X/np.linalg.norm(X)
# random sampling Y
Y = np.random.rand(len(X))
# centralize Y
Y = Y - Y.mean()
# find the orthorgonal component to X
Y = Y - Y.dot(X) * X
# normalize Y
Y = Y/np.linalg.norm(Y)
# output
return Y + (1/np.tan(np.arccos(corr))) * X
# test
out = rand_with_corr(refVar, 0.05)
pd.Series(out).corr(pd.Series(refVar))
# out
# 0.050000000000000086

Fast Fourier Transform in Python

I am new to the fourier theory and I've seen very good tutorials on how to apply fft to a signal and plot it in order to see the frequencies it contains. Somehow, all of them create a mix of sines as their data and i am having trouble adapting it to my real problem.
I have 242 hourly observations with a daily periodicity, meaning that my period is 24. So I expect to have a peak around 24 on my fft plot.
A sample of my data.csv is here:
https://pastebin.com/1srKFpJQ
Data plotted:
My code:
data = pd.read_csv('data.csv',index_col=0)
data.index = pd.to_datetime(data.index)
data = data['max_open_files'].astype(float).values
N = data.shape[0] #number of elements
t = np.linspace(0, N * 3600, N) #converting hours to seconds
s = data
fft = np.fft.fft(s)
T = t[1] - t[0]
f = np.linspace(0, 1 / T, N)
plt.ylabel("Amplitude")
plt.xlabel("Frequency [Hz]")
plt.bar(f[:N // 2], np.abs(fft)[:N // 2] * 1 / N, width=1.5) # 1 / N is a normalization factor
plt.show()
This outputs a very weird result where it seems I am getting the same value for every frequency.
I suppose that the problems comes with the definition of N, t and T but I cannot find anything online that has helped me understand this clearly. Please help :)
EDIT1:
With the code provided by charles answer I have a spike around 0 that seems very weird. I have used rfft and rfftfreq instead to avoid having too much frequencies.
I have read that this might be because of the DC component of the series, so after substracting the mean i get:
I am having trouble interpreting this, the spikes seem to happen periodically but the values in Hz don't let me obtain my 24 value (the overall frequency). Anybody knows how to interpret this ? What am I missing ?
The problem you're seeing is because the bars are too wide, and you're only seeing one bar. You will have to change the width of the bars to 0.00001 or smaller to see them show up.
Instead of using a bar chart, make your x axis using fftfreq = np.fft.fftfreq(len(s)) and then use the plot function, plt.plot(fftfreq, fft):
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
data = pd.read_csv('data.csv',index_col=0)
data.index = pd.to_datetime(data.index)
data = data['max_open_files'].astype(float).values
N = data.shape[0] #number of elements
t = np.linspace(0, N * 3600, N) #converting hours to seconds
s = data
fft = np.fft.fft(s)
fftfreq = np.fft.fftfreq(len(s))
T = t[1] - t[0]
f = np.linspace(0, 1 / T, N)
plt.ylabel("Amplitude")
plt.xlabel("Frequency [Hz]")
plt.plot(fftfreq,fft)
plt.show()

Gaussian Sum with python

I found this code :
import numpy as np
import matplotlib.pyplot as plt
# We create 1000 realizations with 200 steps each
n_stories = 1000
t_max = 500
t = np.arange(t_max)
# Steps can be -1 or 1 (note that randint excludes the upper limit)
steps = 2 * np.random.randint(0, 1 + 1, (n_stories, t_max)) - 1
# The time evolution of the position is obtained by successively
# summing up individual steps. This is done for each of the
# realizations, i.e. along axis 1.
positions = np.cumsum(steps, axis=1)
# Determine the time evolution of the mean square distance.
sq_distance = positions**2
mean_sq_distance = np.mean(sq_distance, axis=0)
# Plot the distance d from the origin as a function of time and
# compare with the theoretically expected result where d(t)
# grows as a square root of time t.
plt.figure(figsize=(10, 7))
plt.plot(t, np.sqrt(mean_sq_distance), 'g.', t, np.sqrt(t), 'y-')
plt.xlabel(r"$t$")
plt.tight_layout()
plt.show()
Instead of doing just steps -1 or 1 , I would like to do steps following a standard normal distribution ... when I am inserting np.random.normal(0,1,1000) instead of np.random.randint(...) it is not working.
I am really new to Python btw.
Many thanks in advance and Kind regards
You are entering a single number as third parameter of np.random.normal, therefore you get a 1d array, instead of 2d, see the documentation. Try this:
steps = np.random.normal(0, 1, (n_stories, t_max))

Using FFT for 3D array representation of 2D field

I need to obtain the fourier transform of a complex field. I'm using python.
My input is a 2D snapshot of the electric field in the xy-plane.
I currently have a 3D array F[x][y][z] where F[x][y][0] contains the real component and F[x][y]1 contains the complex component of the field.
My current code is very simple and does this:
result=np.fft.fftn(F)
result=np.fft.fftshift(result)
I have the following questions:
1) Does this correctly compute the fourier transform of the field, or should the field be entered as a 2D matrix with each element containing both the real and imaginary component instead?
2) I entered the complex component values of the field using the real multiple only (i.e if the complex value is 6i I entered 6), is this correct or should this be entered as a complex value instead (i.e. entered as '6j')?
3) As this is technically a 2D input field, should I use np.fft.fft2 instead? Doing this means the output is not centered in the middle.
4) The output does not look like what I'd expect the fourier transform of F to look like, and I'm unsure what I'm doing wrong.
Full example code:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
x, y = np.meshgrid(np.linspace(-1,1,100), np.linspace(-1,1,100))
d = np.sqrt(x*x+y*y)
sigma, mu = .35, 0.0
g1 = np.exp(-( (d-mu)**2 / ( 2.0 * sigma**2 ) ) )
F=np.empty(shape=(300,300,2),dtype=complex)
for x in range(0,300):
for y in range(0,300):
if y<50 or x<100 or y>249 or x>199:
F[x][y][0]=g1[0][0]
F[x][y][1]=0j
elif y<150:
F[x][y][0]=g1[x-100][y-50]
F[x][y][1]=0j
else:
F[x][y][0]=g1[x-100][y-150]
F[x][y][1]=0j
F_2D=np.empty(shape=(300,300))
for x in range(0,300):
for y in range(0,300):
F_2D[x][y]=np.absolute(F[x][y][0])+np.absolute(F[x][y][1])
plt.imshow(F_2D)
plt.show()
result=np.fft.fftn(F)
result=np.fft.fftshift(result)
result_2D=np.empty(shape=(300,300))
for x in range(0,300):
for y in range(0,300):
result_2D[x][y]=np.absolute(result[x][y][0])+np.absolute(result[x][y][1])
plt.imshow(result_2D)
plt.show()
plotting F gives this:
With np.fft.fftn, the image shown at the end is:
And with np.fft.fft2:
Neither of these look like what I would expect the fourier transform of F to look like.
I add here another answer, suitable to the added code.
The answer is still np.fft.fft2(). Here's an example. I modified the code slightly. To verify that we need fft2 I discarded one of the blobs, and then we know that a single Gaussian blob should transform into a Gaussian blob (with a certain phase, that's not shown when plotting absolute value). I also decreased the standard deviation so that the frequency response will widen a little.
Code:
import numpy as np
import matplotlib.pyplot as plt
x, y = np.meshgrid(np.linspace(-1,1,100), np.linspace(-1,1,100))
d = np.sqrt(x**2+y**2)
sigma, mu = .1, 0.0
g1 = np.exp(-( (d-mu)**2 / ( 2.0 * sigma**2 ) ) )
N = 300
positions = [ [150,100] ]#, [150,200] ]
sz2 = [int(x/2) for x in g1.shape]
F_2D = np.zeros([N,N])
for x0,y0 in positions:
F_2D[ x0-sz2[0]: x0+sz2[0], y0-sz2[1]:y0+sz2[1] ] = g1 + 1j*0.
result = np.fft.fftshift(np.fft.fft2(F_2D))
plt.subplot(211); plt.imshow(F_2D)
plt.subplot(212); plt.imshow(np.absolute(result))
plt.title('$\sigma$=.1')
plt.show()
Result:
To get back to the original problem, we need only change
positions = [ [150,100] , [150,200] ]
and sigma=.35 instead of sigma=.1.
You should use complex numpy variables (by using 1j) and use fft2. For example:
N = 16
x0 = np.random.randn(N,N,2)
x = x0[:,:,0] + 1j*x0[:,:,1]
X = np.fft.fft2(x)
Using fftn on x0 will do a 3D FFT, and using fft will do vector-wise 1D FFT.

Categories