How to pick points under the curve? - Python

What I'm trying to do is make a graph of a Gaussian function, then pick random points in the box y = [0, 1] (because the curve is normalized) and x = [0, 200]. Then I want it to ignore all values above the curve and only keep the values underneath it.
import numpy
import random
import math
import matplotlib.pyplot as plt
import matplotlib.mlab as mlab
from math import sqrt
from numpy import zeros
from numpy import numarray
variance = input("Input variance of the star:")
mean = input("Input mean of the star:")
x=numpy.linspace(0,200,1000)
sigma = sqrt(variance)
z = max(mlab.normpdf(x,mean,sigma))
foo = (mlab.normpdf(x,mean,sigma))/z
plt.plot(x,foo)
zing = random.random()
random = random.uniform(0,200)
import random
def method2(size):
    ret = set()
    while len(ret) < size:
        ret.add((random.random(), random.uniform(0,200)))
    return ret
size = input("Input number of simulations:")
foos = set(foo)
xx = set(x)
method = method2(size)
def undercurve(xx,foos,method):
    Upper = numpy.where(foos<(method))
    Lower = numpy.where(foos[Upper]>(method[Upper]))
    return (xx[Upper])[Lower],(foos[Upper])[Lower]
When I try to print undercurve, I get an error:
TypeError: 'set' object has no attribute '__getitem__'
and I have no idea how to fix it.
As you can see, I'm quite new at Python and programming in general, but any help is appreciated, and if there are any questions I'll do my best to answer them.

The immediate cause of the error you're seeing is presumably this line (which should be identified by the full traceback -- it's generally quite helpful to post that):
Lower = numpy.where(foos[Upper]>(method[Upper]))
because the confusingly-named variable method is actually a set, as returned by your function method2. Actually, on second thought, foos is also a set, so it's probably failing on that first. Sets don't support indexing like the_set[index]; that's what the complaint about __getitem__ means.
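A minimal illustration of the difference (my example, not from the original post): sets reject indexing, while numpy arrays support exactly the kind of fancy indexing the code attempts.
import numpy as np

s = set([0.1, 0.5, 0.9])
# s[0]                     # raises TypeError: sets cannot be indexed
a = np.array([0.1, 0.5, 0.9])
idx = np.where(a > 0.3)    # np.where returns index arrays...
print(a[idx])              # ...and arrays, unlike sets, support indexing: [ 0.5  0.9]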
I'm not entirely sure what all the parts of your code are intended to do; variable names like foos don't really help with that. So here's how I might do what you're trying to do:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.mlab as mlab

# generate sample points
num_pts = 500
sample_xs = np.random.uniform(0, 200, size=num_pts)
sample_ys = np.random.uniform(0, 1, size=num_pts)

# define the distribution
mean = 50
sigma = 10

# figure out "normalized" pdf vals at sample points
max_pdf = mlab.normpdf(mean, mean, sigma)
sample_pdf_vals = mlab.normpdf(sample_xs, mean, sigma) / max_pdf

# which ones are under the curve?
under_curve = sample_ys < sample_pdf_vals

# get pdf vals to plot
x = np.linspace(0, 200, 1000)
pdf_vals = mlab.normpdf(x, mean, sigma) / max_pdf

# plot the samples and the curve
colors = np.array(['cyan' if b else 'red' for b in under_curve])
plt.scatter(sample_xs, sample_ys, c=colors)
plt.plot(x, pdf_vals)
plt.show()
Of course, you should also realize that if you only want the points under the curve, this is equivalent to (but much less efficient than) just sampling from the normal distribution and then randomly selecting a y for each sample uniformly from 0 to the pdf value there:
sample_xs = np.random.normal(mean, sigma, size=num_pts)
max_pdf = mlab.normpdf(mean, mean, sigma)
sample_pdf_vals = mlab.normpdf(sample_xs, mean, sigma) / max_pdf
sample_ys = np.array([np.random.uniform(0, pdf_val) for pdf_val in sample_pdf_vals])
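As a sanity check (my addition, reusing sigma and under_curve from the first snippet): the fraction of uniform samples accepted by the rejection approach should be roughly the area under the normalized curve divided by the area of the 200 x 1 sampling box.
expected_acceptance = sigma * np.sqrt(2 * np.pi) / 200   # integral of pdf/max_pdf over [0, 200]
print(expected_acceptance)       # ~0.125 for sigma = 10
print(under_curve.mean())        # empirical acceptance fraction from the rejection run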

It's hard to read your code. Anyway, you can't index a set with [], so foos[Upper], method[Upper], etc. are all illegal. I don't see why you convert foo and x into sets. In addition, for a point (x0, y0) produced by method2, it is very likely that x0 is not present in x at all.
I'm not familiar with numpy, but this is what I would do for the purpose you specified:
import random
import scipy.stats

def undercurve(size, mean, sigma):
    dist = scipy.stats.norm(mean, sigma)
    peak = dist.pdf(mean)            # normalize so the curve's maximum is 1
    result = []
    for i in xrange(size):
        x = random.uniform(0, 200)
        y = random.random()
        if y < dist.pdf(x) / peak:   # here's the 'undercurve' test
            result.append((x, y))
    return result
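A vectorized variant (my sketch, using NumPy broadcasting rather than a Python loop; the function name and parameters are mine):
import numpy as np
from scipy.stats import norm

def undercurve_vec(size, mean, sigma):
    xs = np.random.uniform(0, 200, size)
    ys = np.random.uniform(0, 1, size)
    pdf = norm.pdf(xs, mean, sigma) / norm.pdf(mean, mean, sigma)  # peak normalized to 1
    keep = ys < pdf                  # boolean mask: True for points under the curve
    return xs[keep], ys[keep]

xs, ys = undercurve_vec(10000, mean=100, sigma=20)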


Modeling 5 ordinary differential equations and plotting the model to show the 5 equations

Hello, I am a newbie at Python and coding for the most part, and I have 5 ordinary differential equations (non-linear) that I want to model and plot. I have the parameters that are given; my main issue has been setting the independent variables to be a function of z, as well as setting the 'S' parameters to be a function of time, since they vary depending on the time of year.
Edit:
I've been able to get the code running with fixed parameters. I now wonder how I could make these parameters behave differently at different times. The parameter values set in this code are for a specific stretch of days during the year; they are not meant to be constant throughout. How could I make them depend on time?
import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt
import math
from math import e
def func(z,t):
    xh, xf, y, m, n = z
    v1,v2,v3 = 0.05,0.06,0.07
    B1,B2,B3 = 0.1984,0.1593,0.04959
    d1,d2,d3 = 0.02272,0.02272,0.2
    o1,o2 = 0.25,0.75
    S1=S2=S3=0.005
    S4=S5=0.3
    p = 0
    u = 500
    k = 0.000075
    a = 0.4784
    r = 0.0165
    K = 8000
    i = 2
    H = e**(-m*k)
    g = ((xh+xf)**i)/((K**i)+((xh+xf)**i))
    R = o1-(o2*(xf/(xh+xf+.002)))
    P1 = (xh+xf)/(xh+y+xf+.002)
    P2 = 1-((m+n)/(a*(xh+y+xf+.002)))
    P3 = y/(xh+y+xf+.002)
    dxhdt = (u*g*H)-(B1*(m*(xh/(xh+y+xf+.002))))-((d1+S1)*xh)-((v1*(m+n))*xh)-(xh*R)
    dxfdt = (xh*R)-(B1*(m*(xf/(xh+y+xf+.002))))-((p+d2+S2)*xf)-(v2*(m+n)*xf)
    dydt = (B1*(m*P1))-((d3+S3)*y)-((v3*(m+n))*y)
    dmdt = (r*(m*P2))+(B2*(n*P3))-(B3*(m*P1))-(S4*m)
    dndt = (r*(n*P2))-(B2*(n*P3))+(B3*(m*P1))-(S5*n)
    return [dxhdt,dxfdt,dydt,dmdt,dndt]
z0=[13000,11000,0,0,0]
t = np.linspace(0,100,1000)
xx=odeint(func,z0,t)
plt.figure(1)
plt.plot(t,xx[:,0],'b-',label = 'xh')
plt.plot(t,xx[:,1],'y-',label = 'xf')
plt.plot(t,xx[:,2],'g-',label = 'y')
plt.plot(t,xx[:,3],'r-',label = 'm')
plt.plot(t,xx[:,4],'m-',label = 'n')
plt.legend()
plt.ylabel('POPULATION')
plt.xlabel('TIME')
plt.show()
I thought about creating two different functions and looping the plot. How do you make "days" a function of t? Is just declaring it enough? I get the error "TypeError: 'float' object cannot be interpreted as an integer":
z0=[13000,11000,0,0,0]
t = np.linspace(0,91.25,1000)
xx=odeint(func,z0,t)
xy=odeint(func2,z0,t)
plt.figure(1)
for t in range(1,91.25):
    plt.plot(t,xx[:,0],'b-',label = '$x_h$')
    plt.plot(t,xx[:,1],'y-',label = '$x_f$')
    plt.plot(t,xx[:,2],'g-',label = 'y')
    plt.plot(t,xx[:,3],'r-',label = 'm')
    plt.plot(t,xx[:,4],'m-',label = 'n')
for t in range(91.25,182.50):
    plt.plot(t,xy[:,0],'b-',label = '$x_h$')
    plt.plot(t,xy[:,1],'y-',label = '$x_f$')
    plt.plot(t,xy[:,2],'g-',label = 'y')
    plt.plot(t,xy[:,3],'r-',label = 'm')
    plt.plot(t,xy[:,4],'m-',label = 'n')
plt.legend()
plt.ylabel('POPULATION')
plt.xlabel('TIME')
plt.show()
I get what you mean by an ODE, but please expand the abbreviation so that others who are less familiar with the mathematics can understand.
If you want these to be a function of z, then you must declare a function and assign the variables through it. That way your values will change in response to changes in z.
Also, by convention, I don't recommend this many separate variable declarations. Abstract them as much as possible. As an alternative, you can declare similar variables on the same line, like
v1, v2, v3 = 0.5, 0.6, 0.7
etc. This will make the code much more readable.
If you don't get a syntax error from the multiple assignments on the first line, I recommend changing each of these to be a function of z. Divide your big function into smaller chunks and make each chunk a separate function. This way you can manipulate results directly and the code will be much more readable.
You prefer the state vector to be composed as
xh, xf, y, m, n
This interpretation of the state vector then needs to be applied everywhere, which means that you have to change the first line of the ODE function to
xh, xf, y, m, n = z
Also check that your fractions are implemented as they were on paper; P1 in particular looks suspicious. But without the derivation of the equations I cannot say that it is wrong as it stands.
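As for making the S parameters seasonal (the original question): one hedged sketch is to have the right-hand-side function consult t itself. The rate values and the half-year switch point below are illustrative assumptions, not taken from the model, and the single toy equation stands in for the full 5-equation system. (Note also that the separate TypeError comes from range() requiring integers; plot against the t array directly instead of looping over a float range.)
import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt

def seasonal_S(t):
    # illustrative only: one removal rate for the first half-year, another for the second
    return 0.005 if (t % 365.0) < 182.5 else 0.05

def func(z, t):
    x = z[0]
    S = seasonal_S(t)          # the parameter now depends on the integration time
    dxdt = 0.02 * x - S * x    # toy one-equation stand-in for the full system
    return [dxdt]

t = np.linspace(0, 365, 2000)
xx = odeint(func, [13000.0], t)
plt.plot(t, xx[:, 0], 'b-', label='x')
plt.legend()
plt.ylabel('POPULATION')
plt.xlabel('TIME')
plt.show()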

Planck's law, frequency figures

I want to plot the frequency version of Planck's law. I first tried to do this independently:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
%matplotlib inline
# Planck's Law
# Constants
h = 6.62607015*(10**-34) # J*s
c = 299792458 # m / s
k = 1.38064852*(10**-23) # J/K
T = 20 # K
frequency_range = np.linspace(10**-19,10**19,1000000)

def plancks_law(nu):
    a = (2*h*nu**3) / (c**2)
    e_term = np.exp(h*nu/(k*T))
    brightness = a /(e_term - 1)
    return brightness

plt.plot(frequency_range,plancks_law(frequency_range))
plt.gca().set_xlim([1*10**-16 ,1*10**16 ])
plt.gca().invert_xaxis()
This did not work; I have an issue with scaling somehow. My next idea was to attempt to use this person's code from this question: Plancks Formula for Blackbody spectrum
import matplotlib.pyplot as plt
import numpy as np
h = 6.626e-34
c = 3.0e+8
k = 1.38e-23
def planck_f(freq, T):
    a = 2.0*h*(freq**3)
    b = h*freq/(k*T)
    intensity = a/( (c**2 * (np.exp(b) - 1.0) ))
    return intensity
# generate x-axis in increments from 1nm to 3 micrometer in 1 nm increments
# starting at 1 nm to avoid wav = 0, which would result in division by zero.
wavelengths = np.arange(1e-9, 3e-6, 1e-9)
frequencies = np.arange(3e14, 3e17, 1e14, dtype=np.float64)
intensity4000 = planck_f(frequencies, 4000.)
plt.gca().invert_xaxis()
This didn't work, because I got a divide-by-zero error. Except that I don't see where there is a division by zero: the denominator shouldn't ever be zero, since the exponential term shouldn't ever be equal to one. I chose the frequencies to be the conversions of the wavelength values from the example code.
Can anyone help fix the problem or explain how I can get planck's law for frequency instead of wavelength?
You cannot safely handle such large numbers; even for comparatively "small" values of b = h*freq/(k*T) your float64 will overflow, e.g. np.exp(709.) = 8.218407461554972e+307 is OK, but np.exp(710.) = inf. You'll have to adjust your units (exponents) accordingly to avoid this!
Note that this is also the case in the other question you linked to: if you insert print(np.exp(b)[:10]) within the definition of planck(), you can examine the first ten evaluated b's and you'll see the overflow in the first few entries. In any case, simply use the answer posted within the other question, but convert the x-axis in plt.plot(wavelengths, intensity) to frequency (I hope you know how to get from one to the other) :-)
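A hedged sketch of a frequency-domain plot that stays inside float64 range (the 20 K temperature is from the question; the THz-scale grid is my assumption, chosen so that h*nu/(k*T) stays far below the ~709 overflow limit; np.expm1 gives a more accurate denominator for small arguments):
import numpy as np
import matplotlib.pyplot as plt

h = 6.62607015e-34   # J*s
c = 299792458.0      # m/s
k = 1.38064852e-23   # J/K
T = 20.0             # K

# For T = 20 K the spectrum peaks near nu ~ 2.82*k*T/h ~ 1.2e12 Hz,
# so a grid up to a few THz comfortably covers the curve.
nu = np.linspace(1e9, 5e12, 100000)

def plancks_law(nu):
    a = 2 * h * nu**3 / c**2
    return a / np.expm1(h * nu / (k * T))   # expm1(x) = exp(x) - 1, accurate for small x

plt.plot(nu, plancks_law(nu))
plt.xlabel('frequency (Hz)')
plt.ylabel('spectral radiance (W sr^-1 m^-2 Hz^-1)')
plt.show()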

lmfit for exponential data returns linear function

I'm working on fitting muon lifetime data to a curve to extract the mean lifetime using the lmfit function. The general process I'm using is to bin the 13,000 data points into 10 bins using the histogram function, calculate the uncertainty as the square root of the counts in each bin (it's an exponential model), then use the lmfit module to determine the best fit along with the means and uncertainty. However, graphing the output of the model.fit() method gives a graph where the red line is the fit (and obviously not the correct fit).
I've looked online and can't find a solution to this, I'd really appreciate some help figuring out what's going on. Here's the code.
import os
import numpy as np
import matplotlib.pyplot as plt
from numpy import sqrt, pi, exp, linspace
from lmfit import Model
class data():
    def __init__(self,file_name):
        times_dirty = sorted(np.genfromtxt(file_name, delimiter=' ',unpack=False)[:,0])
        self.times = []
        for i in range(len(times_dirty)):
            if times_dirty[i]<40000:
                self.times.append(times_dirty[i])
        self.counts = []
        self.binBounds = []
        self.uncertainties = []
        self.means = []

    def binData(self,k):
        self.counts, self.binBounds = np.histogram(self.times, bins=k)
        self.binBounds = self.binBounds[:-1]

    def calcStats(self):
        if len(self.counts)==0:
            print('Run binData function first')
        else:
            self.uncertainties = sqrt(self.counts)

    def plotData(self,fit):
        plt.errorbar(self.binBounds, self.counts, yerr=self.uncertainties, fmt='bo')
        plt.plot(self.binBounds, fit.init_fit, 'k--')
        plt.plot(self.binBounds, fit.best_fit, 'r')
        plt.show()

def decay(t, N, lamb, B):
    return N * lamb * exp(-lamb * t) + B

def main():
    muonEvents = data(r'C:\Users\Colt\Downloads\muon.data')  # raw string so \U is not treated as an escape
    muonEvents.binData(10)
    muonEvents.calcStats()
    mod = Model(decay)
    result = mod.fit(muonEvents.counts, t=muonEvents.binBounds, N=1, lamb=1, B=1)
    muonEvents.plotData(result)
    print(result.fit_report())
    print(len(muonEvents.times))

if __name__ == "__main__":
    main()
This might be a simple scaling problem. As a quick test, try dividing all raw data by a factor of 1000 (both X and Y) to see if changing the magnitude of the data has any effect.
Just to build on James Phillips' answer: I think the data shown in your graph imply values for N, lamb, and B that are very different from 1, 1, 1. Keep in mind that exp(-lamb*t) is essentially 0 for lamb = 1 and t > 100. So if the algorithm starts at lamb = 1 and varies it by a little bit to find a better value, it won't actually see any difference in how well the model matches the data.
I would suggest starting with values that are more reasonable for the data you have, perhaps N=1.e6, lamb=1.e-4, and B=100.
As James suggested, keeping the variables' values on the order of 1 and putting in scale factors as necessary is often helpful in getting numerically stable solutions.
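Concretely, a hedged sketch of how those starting values could be passed in with lmfit (the numbers are illustrative guesses, not fitted values):
from lmfit import Model

mod = Model(decay)
# initial guesses on the scale of the data; adjust to your histogram
params = mod.make_params(N=1e6, lamb=1e-4, B=100)
result = mod.fit(muonEvents.counts, params, t=muonEvents.binBounds)
print(result.fit_report())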

Graphing n iterations of a function - Python

I'm studying dynamical systems, particularly the logistic family g(x) = cx(1-x), and I need to iterate this function an arbitrary amount of times to understand its behavior. I have no problem iterating the function given a specific point x_0, but again, I'd like to graph the entire function and its iterations, not just a single point. For plotting a single function, I have this code:
import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
def logplot(c, n = 10):
    dt = .001
    x = np.arange(0,1.001,dt)
    y = c*x*(1-x)
    plt.plot(x,y)
    plt.axis([0, 1, 0, c*.25 + (1/10)*c*.25])
    plt.show()
I suppose I could tackle this by the lengthy/daunting method of explicitly creating a list of the range of each iteration using something like the following:
def log(c,x0):
    return c*x0*(1-x0)

def logiter(c,x0,n):
    i = 0
    y = []
    while i <= n:
        val = log(c,x0)
        y.append(val)
        x0 = val
        i += 1
    return y
But this seems really cumbersome and I was wondering if there were a better way. Thanks
Some different options
This is really a matter of style. Your solution works and is not very difficult to understand. If you want to go on on those lines, then I would just tweak it a bit:
def logiter(c, x0, n):
    y = []
    x = x0
    for i in range(n):
        x = c*x*(1-x)
        y.append(x)
    return np.array(y)
The changes:
- A for loop is easier to read than a while loop.
- x0 is not used in the iteration (this adds one more variable, but it is mathematically easier to understand; x0 is a constant).
- The function is written out, as it is a very simple one-liner (if it weren't, its name should be changed to something other than log, which is very easy to confuse with logarithm).
- The result is converted into a numpy array (just what I usually do if I need to plot something).
In my opinion the function is now legible enough.
You might also take an object-oriented approach and create a logistic function object:
class Logistics():
    def __init__(self, c, x0):
        self.x = x0
        self.c = c

    def next_iter(self):
        self.x = self.c * self.x * (1 - self.x)
        return self.x
Then you may use this:
def logiter(c, x0, n):
    l = Logistics(c, x0)
    return np.array([ l.next_iter() for i in range(n) ])
Or you may make it a generator:
def log_generator(c, x0):
    x = x0
    while True:
        x = c * x * (1-x)
        yield x

def logiter(c, x0, n):
    l = log_generator(c, x0)
    return np.array([ next(l) for i in range(n) ])
If you need performance and have large tables, then I suggest:
def logiter(c, x0, n):
    res = np.empty((n, len(x0)))
    res[0] = c * x0 * (1 - x0)
    for i in range(1,n):
        res[i] = c * res[i-1] * (1 - res[i-1])
    return res
This avoids the slowish conversion into np.array and some copying of stuff around. The memory is allocated only once, and the expensive conversion from a list into an array is avoided.
(BTW, if you returned an array with the initial x0 as the first row, the last version would look cleaner. Now the first one has to be calculated separately if copying the vector around is desired to be avoided.)
Which one is best? I do not know. IMO, all are readable and justified; it is a matter of style. However, I speak only very broken and poor Pythonic, so there may be good reasons why something else entirely is better, or why some of the above is not good!
Performance
About performance: With my machine I tried the following:
logiter(3.2, linspace(0,1,1000), 10000)
For the first three approaches the time is essentially the same, approximately 1.5 s. For the last approach (preallocated array) the run time is 0.2 s. However, if the conversion from a list into an array is removed, the first one runs in 0.16 s, so the time is really spent in the conversion procedure.
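For reference, a minimal way to reproduce this kind of timing (my sketch; exact numbers depend on the machine and on which logiter variant is in scope):
import time
import numpy as np

x0 = np.linspace(0, 1, 1000)
start = time.time()
logiter(3.2, x0, 10000)        # any of the logiter variants defined above
print('elapsed: %.2f s' % (time.time() - start))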
Visualization
I can think of two useful but quite different ways to visualize the function. You mention that you will have, say, 100 or 1000 different x0's to start with. You do not mention how many iterations you want, but maybe we will start with just 100. So, let us create an array with 100 different x0's and 100 iterations at c = 3.6.
data = logiter(3.6, np.linspace(0,1,100), 100)
In a way a standard method to visualize the function is draw 100 lines, each of which represents one starting value. That is easy:
import matplotlib.pyplot as plt
plt.plot(data)
plt.show()
The result: it seems that all values end up oscillating somewhere, but other than that we have only a mess of color. This approach may be more useful if you use a narrower range of values for x0:
data = logiter(3.6, np.linspace(0.8,0.81,100), 100)
You may color-code the starting values, e.g.:
color1 = np.array([1,0,0])
color2 = np.array([0,0,1])
for i,k in enumerate(np.linspace(0, 1, data.shape[1])):
    plt.plot(data[:,i], '.', color=(1-k)*color1 + k*color2)
This plots the first columns (corresponding to x0 = 0.80) in red and the last columns in blue and uses a gradual color change in between. (Please note that the more blue a dot is, the later it is drawn, and thus blues overlap reds.)
However, it is possible to take a quite different approach.
data = logiter(3.6, np.linspace(0,1,1000), 50)
plt.imshow(data.T, cmap=plt.cm.bwr, interpolation='nearest', origin='lower',extent=[1,21,0,1], vmin=0, vmax=1)
plt.axis('tight')
plt.colorbar()
This gives an image that is my personal favourite. I won't spoil anyone's joy by explaining it too much, but IMO it shows many peculiarities of the behaviour very easily.
Here's what I was aiming for: an indirect approach to understanding (by visualization) the behavior of the initial conditions of the function g(c, x) = cx(1-x):
def jam(c, n):
    x = np.linspace(0,1,100)
    y = c*x*(1-x)
    for i in range(n):
        plt.plot(x, y)
        y = c*y*(1-y)
    plt.show()
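For example (my usage note), calling it with c = 3.6 overlays the parabola and its first ten iterates on [0, 1]:
jam(3.6, 10)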

Python: return array from iteration

I want to plot an approximation of the number "pi" which is generated by a function of two uniformly distributed random variables. The goal is to show that with a higher sample draw the function value approximates "pi".
Here is my function for pi:
import numpy as np
import numpy.random as rnd

def pi(n):
    x = rnd.uniform(low = -1, high = 1, size = n) #n = size of draw
    y = rnd.uniform(low = -1, high = 1, size = n)
    a = x**2 + y**2 <= 1 #1 if rand. draw is inside the unit circle, else 0
    ac = np.count_nonzero(a) #count 1's
    af = np.float(ac) #create float for precision
    pi = (af/n)*4 #compute pi estimate dependent on size of draw
    return pi
My problem:
I want to create a lineplot that plots the values from pi() dependent on n.
My first attempt was:
def pipl(n):
    for i in np.arange(1,n):
        plt.plot(np.arange(1,n), pi(i))
    print plt.show()
pipl(100)
which returns:
ValueError: x and y must have same first dimension
My second guess was to start an iterator:
def y(n):
    n = np.arange(1,n)
    for i in n:
        y = pi(i)
        print y
y(1000)
which results in:
3.13165829146
3.16064257028
3.06519558676
3.19839679359
3.13913913914
So the algorithm isn't far off; however, I need the output as a data type that matplotlib can read.
I read:
http://docs.scipy.org/doc/numpy/reference/routines.array-creation.html#routines-array-creation
and tried to implement the function like:
...
y = np.array(pi(i))
...
or
...
y = pi(i)
y = np.array(y)
...
and all the other functions that are available from the website. However, I can't seem to get my iterated y values into a form that matplotlib can read.
I am fairly new to python so please be considerate with my simple request. I am really stuck here and can't seem to solve this issue by myself.
Your help is really appreciated.
You can try this:
def pipl(n):
    plt.plot(np.arange(1,n), [pi(i) for i in np.arange(1,n)])
    print plt.show()
pipl(100)
which gives me this plot:
If you want to stay with your iterable approach you can use Numpy's fromiter() to collect the results to an array. Like:
def pipl(n):
    for i in np.arange(1,n):
        yield pi(i)

n = 100
plt.plot(np.arange(1,n), np.fromiter(pipl(n), dtype=float))
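A small efficiency note (my addition): np.fromiter can preallocate the output when you tell it the length in advance via its count argument:
plt.plot(np.arange(1,n), np.fromiter(pipl(n), dtype=float, count=n-1))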
But I think Numpy's vectorize would be even better in this case; it makes the resulting code much more readable (to me). With this approach you don't need the pipl function anymore.
# vectorize the function pi
pi_vec = np.vectorize(pi)
# define all n's
n = np.arange(1,101)
# and plot
plt.plot(n, pi_vec(n))
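A possible refinement (my sketch, not part of the original answer): since the Monte Carlo error shrinks roughly like 1/sqrt(n), plotting the absolute error against n on log-log axes makes the convergence easier to judge:
import numpy as np
import matplotlib.pyplot as plt

ns = np.arange(1, 101) * 100                 # larger draws than above, illustrative
errors = np.abs(pi_vec(ns) - np.pi)          # pi_vec as vectorized above
plt.loglog(ns, errors, 'o')
plt.xlabel('n')
plt.ylabel('|estimate - pi|')
plt.show()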
A little side note: naming a function pi that does not return the true pi seems kinda tricky to me.
