Sampling from two different normal distributions at specified probabilities - python

I'm trying to create a simulation that samples from two different normal distributions at specified probabilities. I want each iteration of the simulation to draw a new value from the chosen distribution. My code below picks one random value from each distribution only once and then reuses those two values in all 50 simulations. How can I get new values from each distribution during each iteration of the simulation?
import numpy as np
from numpy.random import normal
number_simulations = 50
P1 = normal(loc=75, scale=5)
P2 = normal(loc=25, scale=5)
elements = [P1, P2]
probabilities = [.80, .20]
simulation = np.random.choice(elements, number_simulations, p=probabilities)
print(simulation)
[26.40889965 71.60833802 71.60833802 26.40889965 71.60833802, etc]

You could generate all 50 samples for each distribution up front using the size argument. Then use np.random.choice to pick index 0 of elements (P1) or index 1 of elements (P2) with the given probabilities, and draw a random value from the chosen array. A list comprehension produces the 50 simulations.
import numpy as np
from numpy.random import normal
number_simulations = 50
P1 = normal(loc=75, scale=5, size=number_simulations)
P2 = normal(loc=25, scale=5, size=number_simulations)
elements = [P1, P2]
probabilities = [.80, .20]
simulation = [np.random.choice(elements[np.random.choice([0, 1], p=probabilities)]) for _ in range(number_simulations)]
print(simulation)

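A smoother approach is to generate 50 samples with mean 50 and then add or subtract 25 at random: +25 (probability 0.8) gives the N(75, 5) component and -25 (probability 0.2) gives the N(25, 5) component. This works because both distributions have the same standard deviation.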
import numpy as np
number_simulations = 50
probabilities = [.20, .80]
x = np.random.normal(loc = 50, scale = 5, size = number_simulations)
a = np.random.choice([-25,25], p = probabilities, size = number_simulations)
print(list(x+a))
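Both answers lean on the two components sharing the same scale (the second one exploits it explicitly). A minimal sketch of a more general mixture sampler, assuming NumPy's `Generator` API (`np.random.default_rng`): draw a component label for every sample, draw fresh values from both normals, and keep the matching one with `np.where`. The helper name `sample_mixture` is hypothetical.

```python
import numpy as np

def sample_mixture(n, means, scales, probabilities, seed=None):
    """Draw n samples from a two-component normal mixture (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # For each sample, choose which component it comes from: 0 or 1
    labels = rng.choice(2, size=n, p=probabilities)
    # Draw a fresh value from each component for every sample
    first = rng.normal(loc=means[0], scale=scales[0], size=n)
    second = rng.normal(loc=means[1], scale=scales[1], size=n)
    # Keep the draw from the component that was chosen
    return np.where(labels == 0, first, second)

simulation = sample_mixture(50, means=(75, 25), scales=(5, 5),
                            probabilities=(0.8, 0.2), seed=0)
print(simulation)
```

Every element is a new draw, so values no longer repeat across simulations; the scales may differ between components. Half of the normal draws are thrown away, which is negligible at this size.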

Related

How to use for loop for a model

I understand that I have to put this all into a function and then call the function from a for loop ten times but I'm not sure how. Any help would be deeply appreciated.
import random
import matplotlib.pyplot as plt
import statistics as stats
import numpy as np
import scipy.stats
plt.hist(list1, bins=100, alpha = 0.5)
array1 = np.array(list1)
array2 = np.array(list2)
array3 = np.array(list3)
# Run the t-test using scipy library
scipy.stats.ttest_ind(array1,array2)
Put the whole model inside a loop over range(10) so that it runs ten times:
import random
import matplotlib.pyplot as plt
import statistics as stats
import numpy as np
# Library for scientific statistics
import scipy.stats
for x in range(10):
    print(x)
    # Create two lists of random numbers that follow a normal ("Gaussian") distribution
    # Start with an empty list named "list1"
    list1 = []
    # Loop that runs 30 times
    for _ in range(30):
        # Random numbers drawn from a pool that has a mean of 12 and a standard deviation of 5
        value1 = random.gauss(12, 5)
        # Add the random value to the first list, list1
        list1.append(value1)
    print(list1)
    # Do the same with a second list
    list2 = []
    for _ in range(30):
        # Random numbers drawn from a pool that has a mean of 14 and a standard deviation of 4
        value2 = random.gauss(14, 4)
        list2.append(value2)
    print(list2)
    # Create a histogram of the two lists using the matplotlib library
    plt.hist(list1, bins=50, alpha=0.5)
    plt.hist(list2, bins=50, alpha=0.5)
    plt.show()
    # Run a t-test on the two sets of data
    array1 = np.array(list1)
    array2 = np.array(list2)
    # Run the t-test using the scipy library and print the result of this run
    print(scipy.stats.ttest_ind(array1, array2))
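The question asked specifically for the model wrapped in a function. A minimal sketch of that structure with the same two distributions; the function name `run_model` and the choice to return the t-test result instead of plotting inside it are illustrative assumptions, not part of the original answer:

```python
import random

import numpy as np
import scipy.stats

def run_model(n=30):
    """One run of the model: two normal samples and a t-test (hypothetical helper)."""
    # n values from a normal distribution with mean 12 and standard deviation 5
    list1 = [random.gauss(12, 5) for _ in range(n)]
    # n values from a normal distribution with mean 14 and standard deviation 4
    list2 = [random.gauss(14, 4) for _ in range(n)]
    # Independent two-sample t-test on the two samples
    return scipy.stats.ttest_ind(np.array(list1), np.array(list2))

# Call the model ten times and collect the results
results = [run_model() for _ in range(10)]
for run, result in enumerate(results):
    print(run, result.statistic, result.pvalue)
```

Keeping plotting out of `run_model` makes it easy to call in a loop; if you still want the histograms, the function could return `list1` and `list2` as well.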

Storing Values from One Array into Another Larger Array

I am trying to create a range of signals of different frequencies. I am finding it difficult to store amplitude vs time in a storage matrix for each frequency from 0 to 50 Hz. For example, for a frequency of 20 Hz I want to store the amplitude vs time for that frequency, then for 21 Hz I want to store the amplitude vs time for that frequency, and so on, until I have all of them in one large matrix. I am getting very confused at this point with indexing and syntax, any help welcome!
import numpy as np
max_freq = 50
s_frequency = np.arange(0,51,0.1)
fs = 200
time = np.arange(0,5-(1/fs),(1/fs))
x = np.empty((len(time)), dtype=np.float32)
i = 0
j = 0
full_array = np.empty((len(s_frequency),len(time),len(time)), dtype=np.float32)
amplitude = np.zeros(999)
for f1 in s_frequency:
    i = 0
    for t in time:
        amplitude[i] = np.sin(2*np.pi*f1*t)
        i = i + 1
    full_array[i] = ([time], [amplitude])
I have also tried the following:
import numpy as np
max_freq = 50
s_frequency = np.arange(0,50.1,0.1)
fs = 200
time = np.arange(0,5-(1/fs),(1/fs))
#full_array = np.sin(2*np.pi*np.outer(s_frequency,time))
full_array = np.empty((len(s_frequency),len(time), len(time)), dtype=np.float32)
for f1 in s_frequency:
    array = []
    for i, t in enumerate(time):
        amplitude = np.sin(2*np.pi*f1*t)
        array.insert(i,amplitude)
    full_array[i] = [time, array]
Not 100% sure what you're trying to do, but it seems like you're trying to initialize a 2-dimensional grid (i.e. a matrix) where you have a dimension for time and one for frequency. Here is what I would do:
import numpy as np
max_freq = 50
s_frequency = np.arange(0,51,0.1)
fs = 200
time = np.arange(0,5-(1/fs),(1/fs))
full_array = np.sin(2*np.pi*np.outer(s_frequency,time))
No explicit for-loops or index handling are needed. np.outer() gives you a 2D grid (i.e. a matrix) holding f*t for every combination of frequency and time. What's left is to compute the sine of 2π times each grid value. Conveniently, NumPy functions accept arrays as input, so we can simply call np.sin(2*np.pi*np.outer(s_frequency,time)).
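A quick check of what this produces, reusing the answer's variables: row `k` of `full_array` is the signal for frequency `s_frequency[k]`, sampled at every entry of `time`. Looking up the 20 Hz row by nearest value rather than exact equality is an assumption made to sidestep floating-point error in `np.arange`:

```python
import numpy as np

s_frequency = np.arange(0, 51, 0.1)
fs = 200
time = np.arange(0, 5 - (1 / fs), (1 / fs))
full_array = np.sin(2 * np.pi * np.outer(s_frequency, time))

# One row per frequency, one column per time sample
print(full_array.shape)

# Row index closest to 20 Hz (0.1 steps are not exact in floating point)
k = np.argmin(np.abs(s_frequency - 20))
signal_20hz = full_array[k]
# Same values as evaluating the sine for that frequency alone
print(np.allclose(signal_20hz, np.sin(2 * np.pi * s_frequency[k] * time)))
```

If you want time and amplitude side by side for one frequency, `np.column_stack((time, full_array[k]))` gives a two-column array.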
Not sure what x and j are for in your code, or why full_array should be 3-dimensional. Would you like to include a spatial component as well?
By the way, a construct like this:
i = 0
for t in time:
    amplitude[i] = np.sin(2*np.pi*f1*t)
    i = i + 1
can easily be avoided in Python, thanks to Python's built-in enumerate() function. It would then look like this:
for i, t in enumerate(time):
    amplitude[i] = np.sin(2*np.pi*f1*t)
which does essentially the same, but you don't have to explicitly create the index (i = 0) and manually increment it in every iteration (i = i + 1).
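If you prefer to keep a loop, the indexing problem in the question goes away once the storage array is 2D (frequencies × time samples) and each frequency fills one row. A minimal sketch using `enumerate` over the frequencies, with the inner loop over time replaced by a vectorized row; it should match the `np.outer` result:

```python
import numpy as np

s_frequency = np.arange(0, 51, 0.1)
fs = 200
time = np.arange(0, 5 - (1 / fs), (1 / fs))

# Preallocate: one row per frequency, one column per time sample
full_array = np.empty((len(s_frequency), len(time)), dtype=np.float64)
for row, f1 in enumerate(s_frequency):
    # Fill the whole row for this frequency at once
    full_array[row, :] = np.sin(2 * np.pi * f1 * time)

# Vectorized version for comparison
vectorized = np.sin(2 * np.pi * np.outer(s_frequency, time))
print(np.allclose(full_array, vectorized))
```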

Save results of a Loop in a Matrix

I am currently programming a Python tool for simulating a geometric Brownian motion. The loop that performs the motion is done and works as intended. Now I am having trouble saving the results of the various simulations in one big matrix and then plotting it.
I tried to use the append function, but the result is a list containing a separate array for each simulation rather than one big matrix.
My Code:
import matplotlib.pyplot as plt
import numpy as np
T = 2
mu = 0.15
sigma = 0.10
S0 = 20
dt = 0.01
N = round(T/dt) ### number of time steps
simu = 20 ### number of simulations
i = 1
## creates an array with N elements (T/dt) from 0 to T
t = np.linspace(0, T, N)
## empty Matrix for the end results
res = []
while i < simu + 1:
    ## random numbers driving the Wiener process
    W = np.random.standard_normal(size = N)
    W = np.cumsum(W)*np.sqrt(dt) ### standard brownian motion ###
    X = (mu-0.5*sigma**2)*t + sigma*W
    S = S0*np.exp(X) ### new Stock prices based on the simulated returns ###
    res.append(S) #appends the resulting array to the result table
    i += 1
#plotting of the result Matrix
plt.plot(t, res)
plt.show()
I would be very pleased if someone could help me with this problem, since I intend to plot the different paths (which are stored in the big matrix) against time.
Thank you in advance,
Nick
To avoid the loop completely and use fast, clean vectorized NumPy operations, you can write your computation like this:
import matplotlib.pyplot as plt
import numpy as np
T = 2
mu = 0.15
sigma = 0.10
S0 = 20
dt = 0.01
N = round(T/dt) ### number of time steps
simu = 20 ### number of simulations
## creates an array with N elements (T/dt) from 0 to T
t = np.linspace(0, T, N)
## result matrix creation not needed, thanks to gboffi for the hint :)
## random numbers driving the Wiener process, one row per simulation
W = np.random.standard_normal(size=(simu, N))
W = np.cumsum(W, axis=1)*np.sqrt(dt) ### standard brownian motion ###
X = (mu-0.5*sigma**2)*t + sigma*W
res = S0*np.exp(X) ### new Stock prices based on the simulated returns ###
Now your results are stored in a real matrix, or more precisely an np.ndarray of shape (simu, N) with one row per simulated path. np.ndarray is NumPy's standard array type and thus the most widely used and supported array format.
How to plot it depends on what you want. If you want to plot each row of the result array (one line per simulated path), it looks like this:
for i in range(simu):
    plt.plot(t, res[i])
plt.show()
If you want to check the shape for consistency after calculation, you can do the following:
assert res.shape == (simu, N), 'Calculation faulty!'
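If you want to keep the original loop instead, the list of 1D arrays it builds can be stacked into a matrix after the loop with `np.vstack`. A minimal sketch with the question's parameters, using a `for` loop and a seeded `np.random.default_rng` (the seed is an assumption, added only for reproducibility):

```python
import numpy as np

T, mu, sigma, S0, dt = 2, 0.15, 0.10, 20, 0.01
N = round(T / dt)   # number of time steps
simu = 20           # number of simulated paths
t = np.linspace(0, T, N)
rng = np.random.default_rng(42)

res = []
for _ in range(simu):
    W = np.cumsum(rng.standard_normal(size=N)) * np.sqrt(dt)  # Brownian motion path
    X = (mu - 0.5 * sigma**2) * t + sigma * W
    res.append(S0 * np.exp(X))  # one simulated price path of length N

# Stack the list of paths into a (simu, N) matrix: one row per path
res = np.vstack(res)
print(res.shape)
```

Matplotlib draws one line per column when `y` is 2D, so `plt.plot(t, res.T)` plots all paths in a single call.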

How to create np array random data on age vs time?

My aim is to create a scatter plot representing random data on age vs. time spent watching TV.
from pylab import randn
import matplotlib.pyplot as plt
X = randn(500)
Y = randn(500)
plt.scatter(X,Y)
plt.show()
I want age between 18 and 50 and time between 0 and 24 hours.
You can try :
import random
import numpy as np
age=np.array(random.sample(list(range(18,51)),10))
time=np.array(random.sample(list(range(0,24)),10))
random.sample takes a population (here a list of elements) as its first argument and the number of samples you want as the second argument. It samples without replacement, so each value appears at most once.
That gives :
age : [47 45 37 19 23 34 39 24 32 42]
time : [18 12 13 1 15 21 23 22 3 17]
On plotting it :
import matplotlib.pyplot as plt
plt.scatter(age, time)
plt.show()
To recreate the same random numbers every time you run it, you can use random.seed()
It's easy with NumPy:
import numpy as np
import matplotlib.pyplot as plt
# Jupyter-only magic; omit this line in a plain Python script
%matplotlib inline
# randint excludes the upper bound, so use 51 to allow an age of 50
age = np.random.randint(18, 51, 20)
time = np.random.randint(0, 24, 20)
plt.scatter(age, time)
plt.show()
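`random.sample` draws without replacement, so it cannot return more values than its population holds (33 ages, 24 hours); asking for the 500 points of the question's `randn(500)` attempt raises a `ValueError`. A minimal sketch that draws 500 points with replacement, assuming NumPy's `Generator` API; fractional hours from `uniform` are an assumption, swap in `integers` if you want whole hours:

```python
import numpy as np

rng = np.random.default_rng(0)
n_points = 500
# Integer ages from 18 to 50 inclusive (the upper bound of integers() is exclusive)
age = rng.integers(18, 51, size=n_points)
# Hours of TV per day as floats in [0, 24)
time = rng.uniform(0, 24, size=n_points)
print(age.min(), age.max(), time.min(), time.max())
```

The two arrays can then be passed to `plt.scatter(age, time)` exactly as above.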
Column-wise multiplication in numpy
You can easily create custom-sized random arrays with numpy with the commands numpy.random.rand(d0, d1, …, dn) for uniform distributions or numpy.random.randn(d0, d1, …, dn) for normal distributions, where dn is the number of samples in the nth dimension. In your case you'll have d0=500 and d1=2.
However, numpy.random.rand(d0, d1, …, dn) samples from the uniform distribution on the interval [0, 1), and numpy.random.randn(d0, d1, …, dn) samples from the standard normal distribution (i.e. mean = 0 and variance = 1).
A nice workaround is to scale and shift the arrays column-wise to move the distributions to the desired values. To multiply an array arr column-wise by a vector vec, you can use this small snippet: arr.dot(np.diag(vec)). Be careful: vec must have as many elements as arr has columns.
This snippet works by turning vec into a diagonal matrix (i.e. a matrix where everything is zero except the main diagonal) and then multiplying arr by that diagonal matrix.
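A small check of the diagonal-matrix trick on a hypothetical 3×2 array; NumPy broadcasting (`arr * vec`) gives the same column-wise scaling without building the diagonal matrix:

```python
import numpy as np

arr = np.array([[1.0, 2.0],
                [3.0, 4.0],
                [5.0, 6.0]])
vec = np.array([10.0, 100.0])  # one factor per column

via_diag = arr.dot(np.diag(vec))   # multiply by diag([10, 100])
via_broadcast = arr * vec          # vec is broadcast across the rows
print(via_diag)
# [[ 10. 200.]
#  [ 30. 400.]
#  [ 50. 600.]]
print(np.array_equal(via_diag, via_broadcast))  # True
```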
For uniform distributions
Remember that to turn a sample x from a uniform distribution on [0, 1) into one on [min, max), you compute new_x = (max - min) * x + min. So if you want a uniform distribution and you know the max and min limits for both variables, you can use the following code:
import numpy as np
n_samples = 500
max_age, min_age = 80, 10
max_hours, min_hours = 10, 0
array = np.random.rand(n_samples, 2) #returns samples from the uniform distribution
range_vector = np.array([max_age - min_age, max_hours - min_hours])
min_vector = np.array([min_age, min_hours])
sample = array.dot(np.diag(range_vector)) + np.ones(array.shape).dot(np.diag(min_vector))
Normal distributions
If you want a normal distribution and you know the mean and standard deviation of both columns, use the following code. Remember that to shift a sample x from a standard normal distribution to a distribution with a different mean and standard deviation, you compute new_x = deviation * x + mean.
import numpy as np
n_samples = 500
mean_age, deviation_age = 40, 20
mean_hours, deviation_hours = 5, 2
array = np.random.randn(n_samples, 2) #returns samples from the standard normal distribution
deviation_vector = np.array([deviation_age, deviation_hours])
mean_vector = np.array([mean_age, mean_hours])
sample = array.dot(np.diag(deviation_vector)) + np.ones(array.shape).dot(np.diag(mean_vector))
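Assuming NumPy 1.17+ (for the `Generator` API), `normal` broadcasts array-valued `loc` and `scale` against `size`, so both columns can be drawn in one call without the diagonal-matrix step. A minimal sketch with the same means and deviations; the seed is only for reproducibility:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 500
mean_vector = np.array([40, 5])       # mean age, mean hours
deviation_vector = np.array([20, 2])  # standard deviations
# loc and scale broadcast against size=(n_samples, 2): column 0 is age, column 1 is hours
sample = rng.normal(loc=mean_vector, scale=deviation_vector, size=(n_samples, 2))
print(sample.shape, sample.mean(axis=0), sample.std(axis=0))
```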
Be careful, however: with normal distributions you can end up with negative values.
You can also have a look at all the documentation numpy has on random variables: https://docs.scipy.org/doc/numpy/reference/routines.random.html
Finally, please note that this column-wise approach only works when you want both samples to be independent.

Localized random points using numpy and pandas

My idea is to try and generate random data points (2D, x and y coordinates) that would lie in close proximity to one another mimicking the following scenario:
I choose e.g. 10 points on one object.
There are 200 such objects in a database.
I record the coordinates of 10 points at the same locations on all the objects. So the data I have consists of 200x10 = 2000 rows: the first 10 rows hold the coordinates of the 10 points sampled on the first object, the next 10 hold the same points on the second object, and so on.
Collections of points in objects should be close in the scatterplot, but they should not be exactly the same, or too far apart. Now if I use plain random generators, most of the time I end up with a lot of evenly spaced random points...
This is the procedure I've tried using numpy, pandas and matplotlib, with a nice use of the multivariate normal from this post.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import brewer2mpl as bmpl
#the part of the code I use for generating correlated ranges for points
#but I have used it for generating x,y coords as well but it didn't work out
corr = 0.95
means = [200, 180]
stds = [10, 10]
covs = [[stds[0]**2, stds[0]*stds[1]*corr],[stds[0]*stds[1]*corr, stds[1]**2]]
coordstest = np.random.multivariate_normal(means, covs, 20)
#now the part for generating x and y coords
coords1x = np.random.uniform(coordstest[0,0], coordstest[0,1], 200)
coords1y = np.random.uniform(coordstest[1,0], coordstest[1,1], 200)
coords2x = np.random.uniform(coordstest[2,0], coordstest[2,1], 200)
coords2y = np.random.uniform(coordstest[3,0], coordstest[3,1], 200)
... up to 10
#them make them into two-column arrays
coords1 = np.vstack((coords1x, coords1y)).T
coords2 = np.vstack((coords2x, coords2y)).T
... up to 10
#and generate individual levels
individuals = np.arange(0,200) #generate individual levels
individuals = np.tile(individuals, 10)
individuals = pd.Series(individuals)
#finally generate pandas data frame and plot the results
allCoords = np.concatenate((coords1, coords2, coords3, coords4, coords5, coords6, coords7, coords8, coords9, coords10))
allCoords = pd.DataFrame(allCoords)
allCoords.columns = ['x','y']
allCoords['individuals'] = individuals
allCoords['index'] = allCoords.index.tolist()
allCoords = allCoords.sort_values(by=['individuals', 'index'])
del allCoords['index']
allCoords = allCoords.set_index(np.arange(0,2000))
plt.scatter(allCoords['x'], allCoords['y'], c = allCoords['individuals'], s = 40, cmap = 'hot')
This is the resulting scatter plot (image not included). The same-colored points should be grouped locally. Any ideas how this could be accomplished?
In fact, you generate normally distributed intervals and then uniformly distributed points within them. Not surprisingly, you end up with non-colocated groups of points.
To get colocated groups of points, you should first choose the expected location (centre) of each group:
coordstest = np.vstack([np.random.uniform(150, 220, 20),
np.random.uniform(150, 220, 20)]).T
Then generate points according to them:
coords = np.vstack([np.random.multivariate_normal(coordstest[i,:], covs, 200)
for i in range(10)])
And plot
individuals = (np.arange(0,200).reshape(-1,1)*np.ones(10).reshape(1,-1)).flatten()
individuals = pd.Series(individuals)
allCoords = pd.DataFrame(coords, columns = ['x','y'])
plt.scatter(allCoords['x'], allCoords['y'], c = individuals,
s = 40, cmap = 'hot')
Note that the points are generated with a linear dependency between x and y, due to the nontrivial covariance parameter passed to multivariate_normal. If you don't need it, you can, for example, use a diagonal covariance:
coords = np.vstack([np.random.multivariate_normal(coordstest[i,:],
[[10,0],[0,10]], 200) for i in range(10)])
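The question's own data layout (2000 rows, with each object's 10 points in consecutive rows) can also be produced directly: pick 10 landmark centres once, then give every object a jittered copy of them. A minimal sketch; the centre range 150 to 220 follows the answer, while the jitter scale of 3 and the seed are assumptions to tune:

```python
import numpy as np

rng = np.random.default_rng(0)
n_objects, n_points = 200, 10

# Expected location of each of the 10 landmarks, shared by all objects
centres = rng.uniform(150, 220, size=(n_points, 2))
# Each object gets its own small random offset for every landmark
jitter = rng.normal(scale=3, size=(n_objects, n_points, 2))
coords = (centres + jitter).reshape(-1, 2)  # rows 0-9: object 0, rows 10-19: object 1, ...

individuals = np.repeat(np.arange(n_objects), n_points)  # object id for each row
landmark = np.tile(np.arange(n_points), n_objects)        # landmark id for each row
print(coords.shape, individuals[:12], landmark[:12])
```

Coloring the scatter by `landmark` (for example `plt.scatter(coords[:, 0], coords[:, 1], c=landmark)`) should give one colour per cluster, and `pd.DataFrame(coords, columns=['x', 'y'])` plus the two id arrays rebuilds the question's data frame.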
