Using Python, I have an array with the coefficients of a polynomial, let's say
polynomial = [1,2,3,4]
which means the equation:
y = 4x³ + 3x² + 2x + 1
(so the array is in reversed order)
Now how do I plot this into a visual curve in the Jupyter Notebook?
There was a similar question:
Plotting polynomial with given coefficients
but I didn't understand the answer (like what is a and b?).
And what do I need to import to make this happen?
First, you have to decide the limits for x in your plot. Let's say x goes from -2 to 2. Let's also ask for a hundred points on our curve (this can be any number large enough for your interval that the curve looks smooth).
Let's create that array:
lower_limit = -2
upper_limit = 2
num_pts = 100
x = np.linspace(lower_limit, upper_limit, num_pts)
Now, let's evaluate y at each of these points. NumPy has a handy polyval() that will do this for us. Remember that it wants the coefficients ordered from highest exponent to lowest, so you'll have to reverse the polynomial list:
poly_coefs = polynomial[::-1] # [4, 3, 2, 1]
y = np.polyval(poly_coefs, x)
Finally, let's plot everything:
plt.plot(x, y, '-r')
You'll need the following imports:
import numpy as np
from matplotlib import pyplot as plt
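As a possible alternative, assuming a reasonably recent NumPy, the numpy.polynomial.Polynomial class takes its coefficients in ascending order, which matches the polynomial list from the question, so no reversal is needed. This is a minimal sketch rather than part of the answer above; the names p, x and y are only illustrative.

```python
import numpy as np
from numpy.polynomial import Polynomial

polynomial = [1, 2, 3, 4]  # 1 + 2x + 3x^2 + 4x^3, lowest power first

# Polynomial expects coefficients ordered from lowest to highest degree
p = Polynomial(polynomial)

# Sample 100 evenly spaced points on [-2, 2]; returns the x and y arrays
x, y = p.linspace(100, domain=[-2, 2])

print(p(1.0))  # value of the polynomial at x = 1
```

The resulting x and y can be passed to plt.plot(x, y, '-r') exactly as in the answer.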
If you don't want to import numpy, you can also write vanilla python methods to do the same thing:
def linspace(start, end, num_pts):
    step = (end - start) / (num_pts - 1)
    return [start + step * i for i in range(num_pts)]
def polyval(coefs, xvals):
    yvals = []
    for x in xvals:
        y = 0
        for power, c in enumerate(reversed(coefs)):
            y += c * (x ** power)
        yvals.append(y)
    return yvals
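A variation on the pure-Python approach, sketched here under the assumption that the coefficients stay in the question's ascending order, is Horner's method. It avoids computing powers of x explicitly. The function name horner is a hypothetical helper, not something from the answer.

```python
def horner(coefs_ascending, x):
    """Evaluate a polynomial at x; coefficients are ordered lowest power first."""
    result = 0
    # Work from the highest-degree coefficient down: ((4*x + 3)*x + 2)*x + 1
    for c in reversed(coefs_ascending):
        result = result * x + c
    return result


polynomial = [1, 2, 3, 4]  # same ordering as in the question
xs = [-2 + 4 * i / 99 for i in range(100)]  # 100 points from -2 to 2
ys = [horner(polynomial, x) for x in xs]
```

The lists xs and ys can then be plotted with plt.plot(xs, ys).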
Related
I have a parabola with both axes running from 0 to 1, as follows:
The parabola is created and normalized with the following code:
import matplotlib.pyplot as plt
import numpy as np
# normalize array
def min_max_scale_array(arr):
    arr = np.array(arr)
    return (arr - arr.min())/(arr.max()-arr.min())
x = np.linspace(-50,50,100)
y = x**2
x = min_max_scale_array(x)
y = min_max_scale_array(y)
fig, ax = plt.subplots()
ax.plot(x, y)
I want to create another one with the same end points but with both sides steeper, like this:
I thought of joining an exponential curve and its reflection, but that would make the resulting parabola look pointy at the bottom.
Can you show me how to achieve this? Thank you!
If you want to modify any arbitrary curve, you can change the x values, for example taking a power of it:
# x and y are defined
for factor in [1.1, 1.5, 2, 3, 4]:
    x2 = 2*x-1
    x3 = (abs(x2)**(1/factor))*np.sign(x2)/2+0.5
    ax.plot(x3, y, label=f'{factor=}')
output:
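To make the remapping easier to reuse, the loop body can be wrapped in a function. This is only a sketch of the same idea; warp_x is a hypothetical name, and it assumes x has already been scaled to the range 0 to 1.

```python
import numpy as np


def warp_x(x, factor):
    """Remap x in [0, 1] so points are pushed away from the centre at 0.5."""
    centred = 2 * x - 1                      # map [0, 1] to [-1, 1]
    stretched = np.sign(centred) * np.abs(centred) ** (1 / factor)
    return stretched / 2 + 0.5               # map back to [0, 1]


x = np.linspace(0, 1, 5)
print(warp_x(x, 2))
```

The end points 0 and 1 and the centre 0.5 are left where they are, so plotting y against warp_x(x, factor) keeps both ends of the curve fixed while the sides get steeper.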
You can change the exponent to get a steeper curve with the same value at the extremes. You need to pick a larger value that is an even integer (odd numbers won't give a parabola).
y = x**4
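As a quick check of the exponent approach, the sketch below rescales a few even powers with the question's min_max_scale_array and stores them for comparison. The choice of 101 sample points is an assumption made here so that x = 0 is sampled exactly.

```python
import numpy as np


def min_max_scale_array(arr):
    arr = np.asarray(arr, dtype=float)
    return (arr - arr.min()) / (arr.max() - arr.min())


x_raw = np.linspace(-50, 50, 101)   # odd count so that x = 0 is sampled
x = min_max_scale_array(x_raw)

curves = {}
for exponent in (2, 4, 6):           # even exponents keep the U shape
    curves[exponent] = min_max_scale_array(x_raw ** exponent)
```

Each curve still runs from 1 at both ends down to 0 in the middle; higher exponents give a flatter bottom and steeper sides. They can be drawn with ax.plot(x, curves[exponent]).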
I've already written code for a random walk of 10000 steps, repeated it 12 times, and stored each run in a separate text file (as the question required). I then calculated the mean square displacement (I'm not sure I did it correctly). I now need to 'plot my Mean Square Displacement as a function of δt, including errorbars σ = std(MSD)/√N, where std(MSD) is the standard deviation among the different runs and N is the number of runs.' and then compute the diffusion constant D from the curve and check that D = 2 (∆/dt) where dt = 1.
Here is my code so far:
import numpy as np
import matplotlib.pyplot as plt
import random as rd
import math
a = np.zeros((10000, 2), dtype=float) # np.float was removed in NumPy 1.24; the builtin float is equivalent
def randwalk(x,y):
    theta = 2*math.pi*rd.random()
    x += math.cos(theta) # This uses the equation given, since we are told the spatial unit = 1
    y += math.sin(theta)
    return (x,y)
x, y = 0.,0.
for i in range(10000): # Using for loop and range function to initialize the array
    x, y = randwalk(x,y)
    a[i,:] = x,y
fn_base = "random_walk_%i.txt" # Saves each run in a numbered text file; fn_base is a variable holding the filename format
N = 12
for j in range(N):
    rd.seed(j) # seed(j) explicitly sets the seed of the random number generator
    x , y = 0., 0.
    for i in range(10000):
        x, y = randwalk(x,y)
        a[i,:] = x, y
    fn = fn_base % j
    np.savetxt(fn, a)
destinations = np.zeros((12, 2), dtype=float)
for j in range(12):
    x, y = 0., 0.
    for i in range(10000):
        x, y = randwalk(x, y)
    destinations[j] = x, y
square_distances = destinations[:,0] ** 2 + destinations[:,1] ** 2
m_s_d = np.mean(square_distances)
I think that to do it I just have to plot the msd against the number of steps? But I'm not sure how to do this. I saw a similar question on stackoverflow but the code for it is different than mine and I don't understand how to use that for my code.
I tried the following:
plt.figure()
t = 10000
plt.plot(m_s_d, t)
plt.show()
But this gives an error as the dimensions are not equal.
Edit ** I think my issue is that I am trying to plot it against number of steps when I should be plotting it against the change in time. However I can’t work out how to calculate the change in time dt?
Apologies in advance if the question isn't formulated well; I am fairly new to computing. Thank you.
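One possible way to approach the plot, sketched under assumptions that are not in the question: δt is read here as the time elapsed since the start of the walk (one step per dt = 1), the walks are generated in memory with NumPy instead of being read back from the text files, and all variable names are illustrative. The key point is to keep the squared displacement at every step of every run, not only the final position, so that the mean and the standard deviation can be taken across the runs.

```python
import numpy as np

n_runs, n_steps = 12, 10000
rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# One random direction per step and run; each step has unit length
theta = rng.uniform(0.0, 2.0 * np.pi, size=(n_runs, n_steps))
x = np.cumsum(np.cos(theta), axis=1)  # x position after each step
y = np.cumsum(np.sin(theta), axis=1)  # y position after each step

sq_disp = x**2 + y**2                 # squared displacement, shape (runs, steps)
msd = sq_disp.mean(axis=0)            # average over runs at each time
sigma = sq_disp.std(axis=0) / np.sqrt(n_runs)  # error bar as defined in the task

# Assumption: delta t is the elapsed time since the start, with dt = 1 per step
t = np.arange(1, n_steps + 1)
slope, intercept = np.polyfit(t, msd, 1)  # straight-line fit of MSD against t
```

A call such as plt.errorbar(t[::500], msd[::500], yerr=sigma[::500]) would then draw the curve with error bars. The fitted slope is the quantity from which D is obtained, using whichever convention the exercise specifies.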
From https://stackoverflow.com/a/30460089/2202107, we can generate CDF of a normal distribution:
import numpy as np
import matplotlib.pyplot as plt
N = 100
Z = np.random.normal(size = N)
# method 1
H,X1 = np.histogram( Z, bins = 10, density = True ) # the old normed argument has been removed from NumPy
dx = X1[1] - X1[0]
F1 = np.cumsum(H)*dx
#method 2
X2 = np.sort(Z)
F2 = np.array(range(N))/float(N)
# plt.plot(X1[1:], F1)
plt.plot(X2, F2)
plt.show()
Question: How do we generate the "original" normal distribution, given only x (eg X2) and y (eg F2) coordinates?
My first thought was plt.plot(x,np.gradient(y)), but the gradient of y was constant (the data points are evenly spaced in y, but not in x). This kind of data often comes up in percentile calculations. The key is to get the data evenly spaced in x rather than in y, using interpolation:
x=X2
y=F2
num_points=10
xinterp = np.linspace(-2,2,num_points)
yinterp = np.interp(xinterp, x, y)
# for normalizing that sum of all bars equals to 1.0
tot_val=1.0
normalization_factor = tot_val/np.trapz(np.ones(len(xinterp)),yinterp)
plt.bar(xinterp, normalization_factor * np.gradient(yinterp), width=0.2)
plt.show()
output looks good to me:
I put my approach here for examination. Let me know if my logic is flawed.
One issue: when num_points is large, the plot looks bad. It's a discretization issue, and I'm not sure how to avoid it.
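A slightly different sketch of the same idea, offered as an assumption-laden alternative and not as a correction: if the interpolated CDF is differentiated with respect to x (by passing the grid spacing to np.gradient), the result is already a density estimate and no separate normalization factor is needed. The sample size, grid range, and number of grid points below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=1000)

# Empirical CDF: sorted samples on x, evenly spaced probabilities on y
x = np.sort(z)
y = np.arange(len(x)) / len(x)

# Resample the CDF on an evenly spaced x grid, then differentiate it
x_grid = np.linspace(-3, 3, 25)
h = x_grid[1] - x_grid[0]
cdf_grid = np.interp(x_grid, x, y)
pdf_grid = np.gradient(cdf_grid, h)   # dF/dx approximates the density

# Trapezoidal integral of the estimate over the grid
area = h * np.sum((pdf_grid[1:] + pdf_grid[:-1]) / 2)
```

Here area equals the probability mass the empirical CDF assigns to the grid range, and plt.bar(x_grid, pdf_grid, width=h) draws the estimate. A finer grid follows the sample noise more closely, which is the discretization effect mentioned above.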
Related posts:
I failed to understand why the answer was so complicated in https://stats.stackexchange.com/a/6065/131632
I also didn't understand why my approach was different than Generate distribution given percentile ranks
I have a problem with optimization of the rejection method of generating continuous random variables. I've got a density: f(x) = 3/2 (1-x^2). Here's my code:
import random
import matplotlib.pyplot as plt
import numpy as np
import time
import scipy.stats as ss
a=0 # xmin
b=1 # xmax
m=3/2 # ymax
variables = [] #list for variables
def f(x):
    return 3/2 * (1 - x**2) #probability density function
reject = 0 # number of rejections
start = time.time()
while len(variables) < 100000: #I want to generate 100 000 variables
    u1 = random.uniform(a,b)
    u2 = random.uniform(0,m)
    if u2 <= f(u1):
        variables.append(u1)
    else:
        reject +=1
end = time.time()
print("Time: ", end-start)
print("Rejection: ", reject)
x = np.linspace(a,b,1000)
plt.hist(variables,50, density=1)
plt.plot(x, f(x))
plt.show()
ss.probplot(variables, plot=plt)
plt.show()
My first question: Is my probability plot made properly?
And the second, what is in the title. How to optimize that method? I would like to get some advice to optimize the code. Now that code takes about 0.5 seconds and there are about 50 000 rejections. Is it possible to reduce the time and number of rejections? If it's needed I can optimize using a different method of generating variables.
My first question: Is my probability plot made properly?
No. It is made against the default normal distribution. You have to wrap your function f(x) in a class derived from stats.rv_continuous, implementing it as the _pdf method, and pass an instance of that class to probplot.
And the second, what is in the title. How to optimise that method? Is it possible to reduce the time and number of rejections?
Sure, you have the power of NumPy's vector abilities at your hands. Don't ever write explicit loops - vectorize, vectorize and vectorize!
Look at the modified code below: not a single loop, everything is done via NumPy vectors. The time on my computer for 100000 samples (Xeon, Win10 x64, Anaconda Python 3.7) went down from 0.19 to 0.003 seconds.
import numpy as np
import scipy.stats as ss
import matplotlib.pyplot as plt
import time
a = 0. # xmin
b = 1. # xmax
m = 3.0/2.0 # ymax
def f(x):
    return 1.5 * (1.0 - x*x) # probability density function
start = time.time()
N = 100000
u1 = np.random.uniform(a, b, N)
u2 = np.random.uniform(0.0, m, N)
negs = np.empty(N)
negs.fill(-1)
variables = np.where(u2 <= f(u1), u1, negs) # accepted samples are positive or 0, rejected are -1
end = time.time()
accept = np.extract(variables>=0.0, variables)
reject = N - len(accept)
print("Time: ", end-start)
print("Rejection: ", reject)
x = np.linspace(a, b, 1000)
plt.hist(accept, 50, density=True)
plt.plot(x, f(x))
plt.show()
ss.probplot(accept, plot=plt) # against normal distribution
plt.show()
Concerning reducing the number of rejections: you could sample with zero rejections using the inverse method. The CDF is a cubic, so inverting it should be manageable.
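The inverse method is not spelled out in the answer, so the following is only a sketch of one way it could work. Integrating the density gives the CDF F(x) = (3x - x^3)/2 on [0, 1]. Substituting x = 2 sin(t) and using the identity sin(3t) = 3 sin(t) - 4 sin(t)^3 turns F(x) = u into sin(3t) = u, so x = 2 sin(arcsin(u)/3). The function names inverse_cdf and sample_inverse are hypothetical.

```python
import numpy as np


def inverse_cdf(u):
    """Invert F(x) = (3x - x**3) / 2 on [0, 1] via x = 2 sin(arcsin(u) / 3)."""
    return 2.0 * np.sin(np.arcsin(u) / 3.0)


def sample_inverse(n, rng=None):
    """Draw n samples from f(x) = 1.5 * (1 - x**2) with no rejections."""
    rng = np.random.default_rng() if rng is None else rng
    return inverse_cdf(rng.uniform(0.0, 1.0, size=n))


samples = sample_inverse(100000, np.random.default_rng(0))
```

Every uniform draw produces one sample, so the rejection count is zero by construction.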
UPDATE
Here is the code to use for probplot:
class my_pdf(ss.rv_continuous):
    def _pdf(self, x):
        return 1.5 * (1.0 - x*x)
ss.probplot(accept, dist=my_pdf(a=a, b=b, name='my_pdf'), plot=plt)
and you should get something like
Regarding your first question, scipy.stats.probplot compares your sample against the quantiles of the normal distribution. If you'd like it to compare against the quantiles of your f(x) distribution, check out the dist parameter of probplot.
In terms of making this sampling procedure faster, avoiding loops is generally the way to go. Replacing the code between start = ... and end = ... with the following resulted in a >20x speedup for me.
n_before_accept_reject = 150000
u1 = np.random.uniform(a, b, size=n_before_accept_reject)
u2 = np.random.uniform(0, m, size=n_before_accept_reject)
variables = u1[u2 <= f(u1)]
reject = n_before_accept_reject - len(variables)
Note that this will give you approximately 100000 accepted samples each time you run it. You could raise the value of n_before_accept_reject slightly to effectively guarantee that variables will always have >100000 accepted values, and then just cap the size of variables to return exactly 100000 if necessary.
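If exactly 100000 samples are required, one option is to draw candidates in batches until enough have been accepted and then trim the result. The sketch below assumes the same f, a, b and m as in the question; the function name rejection_sample and the oversampling factor of 1.6 are choices made here for illustration.

```python
import numpy as np


def f(x):
    return 1.5 * (1.0 - x * x)  # target density on [0, 1]


def rejection_sample(n, a=0.0, b=1.0, m=1.5, rng=None):
    """Return exactly n accepted samples, drawing candidates in batches."""
    rng = np.random.default_rng() if rng is None else rng
    chunks, n_accepted = [], 0
    while n_accepted < n:
        # Oversample: about 2/3 of candidates are accepted for this f and m
        batch = int(1.6 * (n - n_accepted)) + 10
        u1 = rng.uniform(a, b, size=batch)
        u2 = rng.uniform(0.0, m, size=batch)
        accepted = u1[u2 <= f(u1)]
        chunks.append(accepted)
        n_accepted += len(accepted)
    return np.concatenate(chunks)[:n]


variables = rejection_sample(100000, rng=np.random.default_rng(0))
```

The while loop normally runs only once or twice, so nearly all the work still happens inside vectorized NumPy calls.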
Others have spoken to the probability plotting; I'm going to address the efficiency of the rejection algorithm.
Acceptance/rejection schemes are based on m(x), a "majorizing function". A majorizing function should have two properties: 1) m(x)≥ f(x) ∀ x; and 2) m(x), when scaled to be a distribution, should be easy to generate values from.
You went with the constant function m = 3/2, which meets both requirements but does not bound f(x) very closely. Integrated from zero to one, that has an area of 3/2. Your f(x), being a valid density function, has an area of 1. Consequently, ∫f(x) dx / ∫m(x) dx = 1 / (3/2) = 2/3. In other words, 2/3 of the values you generate from the majorizing function are accepted, and you are rejecting 1/3 of the attempts.
You need an m(x) which provides a tighter bound for f(x). I went with a line which is tangent to f(x) at x = 1/2. With a little bit of calculus to get the slope, I derived m(x) = 15/8 - 3x/2.
This choice of m(x) has an area of 9/8, so only 1/9 of the values will be rejected. A bit more calculus yielded the inverse transform generator for x based on this m(x): x = (5 - sqrt(25 - 24U)) / 4, where U is a uniform(0,1) random variable.
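The calculus step can be reconstructed as follows. This is one plausible reading of the derivation, not something spelled out in the answer, and the symbols g and G are introduced here for the normalized majorizing density and its CDF. Dividing m(x) by its area 9/8 gives g(x); setting G(x) = U and keeping the root that lies in [0, 1] gives the generator:

```latex
\begin{aligned}
g(x) &= \frac{m(x)}{9/8} = \frac{5}{3} - \frac{4}{3}x, \qquad 0 \le x \le 1,\\
G(x) &= \int_0^x g(t)\,dt = \frac{5x - 2x^2}{3},\\
G(x) = U &\;\Longrightarrow\; 2x^2 - 5x + 3U = 0
\;\Longrightarrow\; x = \frac{5 - \sqrt{25 - 24U}}{4}.
\end{aligned}
```

The root with the minus sign is the right one because it gives x = 0 at U = 0 and x = 1 at U = 1.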
Here's an implementation, based off your original version. I wrapped the rejection scheme in a function, and created the values with a list comprehension rather than appending to a list. As you'll see if you run this, it produces a lot fewer rejections than your original version.
import random
import matplotlib.pyplot as plt
import numpy as np
import time
import math
import scipy.stats as ss
a = 0 # xmin
b = 1 # xmax
reject = 0 # number of rejections
def f(x):
    return 3.0 / 2.0 * (1.0 - x**2) #probability density function
def m(x):
    return 1.875 - 1.5 * x
def generate_x():
    global reject
    while True:
        x = (5.0 - math.sqrt(25.0 - random.uniform(0.0, 24.0))) / 4.0
        u = random.uniform(0, m(x))
        if u <= f(x):
            return x
        reject += 1
start = time.time()
variables = [generate_x() for _ in range(100000)]
end = time.time()
print("Time: ", end-start)
print("Rejection: ", reject)
x = np.linspace(a,b,1000)
plt.hist(variables,50, density=1)
plt.plot(x, f(x))
plt.show()
I have a list of y values and a list of x values. I would like to find the area under the curve defined by these points. I have found a couple of solutions to this problem for x values with even spacing:
1) Calculating the area under a curve given a set of coordinates, without knowing the function
2) Using scipy to perform discrete integration of the sample
But neither of these works when the x values are not evenly spaced.
For example:
>>> from scipy.integrate import simps
>>> y = np.array([1,1,1,1])
>>> x = np.array([0,5,20,30])
>>> simps(y,x)
-inf
Of course, using x = np.array([0,10,20,30]) in the above code returns 30.0, as expected.
Can anyone suggest a way to find the area under a curve with uneven x-spacing?
I'd just go for a simple trapezoidal rule:
import numpy as np
x = np.array([0,5,20,30])
y = np.array([1,1,1,1])
s = np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2)
# same as
# s = 0.0
# for k in range(len(x) - 1):
#     s += (x[k+1] - x[k]) * (y[k+1] + y[k]) / 2
print(s)
30.0
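For a case where the y values are not constant, here is a small sketch that wraps the same formula in a function and compares it with NumPy's built-in trapezoidal routine. The name trapezoid_area is a hypothetical helper, and the fallback between np.trapezoid and np.trapz is an assumption made so that the sketch runs on both newer and older NumPy releases.

```python
import numpy as np


def trapezoid_area(x, y):
    """Trapezoidal rule for samples with arbitrary (uneven) x spacing."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.ndim != 1:
        raise ValueError("x and y must be 1-D arrays of equal length")
    widths = np.diff(x)                   # spacing between neighbouring points
    mean_heights = (y[1:] + y[:-1]) / 2   # average height of each trapezoid
    return float(np.sum(widths * mean_heights))


x = np.array([0, 1, 3, 6])
y = x**2
area = trapezoid_area(x, y)

# NumPy's own routine: trapezoid in NumPy 2.x, trapz in older releases
np_trapezoid = np.trapezoid if hasattr(np, "trapezoid") else np.trapz
reference = np_trapezoid(y, x)
```

Both area and reference come out the same, since the built-in routine applies the same rule and also accepts unevenly spaced x values.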