For simulating some data, I need to Sampling Random Numbers From The Truncated Multivariate Normal Distribution. Which is description of a function called tmvtnorm::rtmvnorm in R.
I have tried the function in R. But my script is major written by python. So I would like to know If there are any function could do the same things?
I have tried truncnorm in scipy, emcee(python libray). But it all doesn't work like the result outputed by tmvtnorm::rtmvnorm.
Finally, I am using the rpy2 to get the result output from R.
Here is the needed question:
Any tools which could work like tmvtnorm::rtmvnorm?
Any explan about the differences of tmvtnorm::rtmvnorm and truncnorm in scipy.
Thanks.
We could call R from python and get the output generated from rtmvnorm
from pyper import *
import pandas as pd
r=R(use_pandas=True)
r('''
library(tmvtnorm)
sigma <- matrix(c(4,2,2,3), ncol=2)
x <- rtmvnorm(n=500, mean=c(1,2), sigma=sigma, upper=c(1,0))
''')
out = pd.DataFrame(r.get('x'))
pr int(out.head(5))
# 0 1
# 0 -0.832567 -1.976393
# 1 0.466617 -0.266892
# 2 0.802809 -0.403514
# 3 -2.295357 -1.896990
# 4 -0.128641 -0.392827
Related
Looking for someone who can explain this to me:
phase = mod(phase,Nper*2*pi)
cl_phase = arange(0,Nper*2*pi+step,step)
c,p = histogram(phase,cl_phase)
while 0 in c:
step = step*2
cl_phase = arange(0,Nper*2*pi+step,step)
c,p = histogram(phase,cl_phase)
Where phase is the phase of a wave, Nper is the number of periods I'm analysing.
What I want to know is if some one can give me the name/link to an explanation of the histogram function..! Im not even sure from what package it comes from. Maybe numpy? Or maybe it even is a function that comes with python..! Super lost here..!
Any help here would be greatly appreciated!!
histogram() function is from numpy library. It doesn't come as a default function in Python.
You can use it by:
import numpy as np
np.histogram(phase,cl_phase)
In your code, it looks like you are using it as:
from numpy import histogram
histogram(phase,cl_phase)
c,p = histogram(phase, cl_phase) will give you two values as output. c will be the values of the histogram, and p will return the bin edges. You should take a look at the above docs for more info.
I am trying to get a solution for a stiff ODE problem where at each integration step, i have to modify the solution vector before continuing on the integration.
For that, i am using scipy.integrate.ode, with the integrator VODE, in bdf mode.
Here is a simplified version of the code i am using. The function is much more complex than that and involve the use of CANTERA.
from scipy.integrate import ode
import numpy as np
import matplotlib.pyplot as plt
def yprime(t,y):
return y
vode = ode(yprime)
vode.set_integrator('vode', method='bdf', with_jacobian=True)
y0 = np.array([1.0])
vode.set_initial_value(y0, 0.0)
y_list = np.array([])
t_list = np.array([])
while vode.t<5.0 and vode.successful:
vode.integrate(vode.t+1e-3,step=True)
y_list = np.append(y_list,vode.y)
t_list = np.append(t_list,vode.t)
plt.plot(t_list,y_list)
Output:
So far so good.
Now, the problem is that within each step, I would like to modify y after it has been integrated by VODE. Naturally, i want VODE to keep on integrating with the modified solution.
This is what i have tried so far :
while vode.t<5.0 and vode.successful:
vode.integrate(vode.t+1e-3,step=True)
vode.y[0] += 1 # Will change the solution until vode.integrate is called again
vode._y[0] += 1 # Same here.
I also have tried looking at vode._integrator, but it seems that everything is kept inside the fortran instance of the solver.
For quick reference, here is the source code of scipy.integrate.ode, and here is the pyf interface scipy is using for VODE.
Has anyone tried something similar ? I could also change the solver and / or the wrapper i am using, but i would like to keep on using python for that.
Thank you very much !
For those getting the same problem, the issue lies in the Fortran wrapper from Scipy.
My solution was to change the package used, from ode to solve_ivp. The difference is that solve_ivp is entirely made with Python, and you will be able to hack your way through the implementation. Note that the code will run slowly compared to the vode link that the other package used, even though the code is very well written and use numpy (basically, C level of performances whenever possible).
Here are the few steps you will have to follow.
First, to reproduce the already working code :
from scipy.integrate import _ivp # Not supposed to be used directly. Be careful.
import numpy as np
import matplotlib.pyplot as plt
def yprime(t,y):
return y
y0 = np.array([1.0])
t0 = 0.0
t1 = 5.0
# WITHOUT IN-BETWEEN MODIFICATION
bdf = _ivp.BDF(yprime,t0,y0,t1)
y_list = np.array([])
t_list = np.array([])
while bdf.t<t1:
bdf.step()
y_list = np.append(y_list,bdf.y)
t_list = np.append(t_list,bdf.t)
plt.plot(t_list,y_list)
Output :
Now, to implement a way to modify the values of y between integration steps.
# WITH IN-BETWEEN MODIFICATION
bdf = _ivp.BDF(yprime,t0,y0,t1)
y_list = np.array([])
t_list = np.array([])
while bdf.t<t1:
bdf.step()
bdf.D[0] -= 0.1 # The first column of the D matrix is the value of your vector y.
# By modifying the first column, you modify the solution at this instant.
y_list = np.append(y_list,bdf.y)
t_list = np.append(t_list,bdf.t)
plt.plot(t_list,y_list)
Gives the plot :
This does not have any physical sense for this problem, unfortunately, but it works for the moment.
Note : It is entirely possible that the solver become unstable. It has to do with the Jacobian not being updated at the right time, and so one would have to recalculate it again, which is performance heavy most of the time. The good solution to that would be to rewrite the class BDF to implement the modification before the Jacobian Matrix is updated.
Source code here.
I have a dataset in the form of a table:
Score Percentile
381 1
382 2
383 2
...
569 98
570 99
The complete table is here as a Google spreadsheet.
Currently, I am computing a score and then doing a lookup on this dataset (table) to find the corresponding percentile rank.
Is it possible to create a function to calculate the corresponding percentile rank for a given score using a formula instead of looking it up in the table?
It's impossible to recreate the function that generated a given table of data, if no information is provided about the process behind that data.
That being said, we can make some speculation.
Since it's a "percentile" function, it probably represents the cumulative value of a probability distribution of some sort. A very common probability distribution is the normal distribution, whose "cumulative" counterpart (i.e. its integral) is the so called "error function" ("erf").
In fact, your tabulated data looks a lot like an error function for a variable whose average value is 473.09:
your dataset: orange; fitted error function (erf): blue
However, the agreement is not perfect and that could be because of three reasons:
the fitting procedure I've used to generate the parameters for the error function didn't use the right constraints (because I have no idea what I'm modelling!)
your dataset doesn't represent an exact normal distribution, but rather real world data whose underlying distribution is the normal distribution. The features of your sample data that deviate from the model are being ignored altogether.
the underlying distribution is not a normal distribution at all, its integral just happens to look like the error function by chance.
There is literally no way for me to tell!
If you want to use this function, this is its definition:
import numpy as np
from scipy.special import erf
def fitted_erf(x):
c = 473.09090474
w = 37.04826334
return 50+50*erf((x-c)/(w*np.sqrt(2)))
Tests:
In [2]: fitted_erf(439) # 17 from the table
Out[2]: 17.874052406601457
In [3]: fitted_erf(457) # 34 from the table
Out[3]: 33.20270318344252
In [4]: fitted_erf(474) # 51 from the table
Out[4]: 50.97883169390196
In [5]: fitted_erf(502) # 79 from the table
Out[5]: 78.23955071273468
however I'd strongly advise you to check if a fitted function, made without knowledge of your data source, is the right tool for your task.
P.S.
In case you're interested, this is the code used to obtain the parameters:
import numpy as np
from scipy.special import erf
from scipy.optimize import curve_fit
tab=np.genfromtxt('table.csv', delimiter=',', skip_header=1)
# using a 'table.csv' file generated by Google Spreadsheets
x = tab[:,0]
y = tab[:,1]
def parametric_erf(x, c, w):
return 50+50*erf((x-c)/(w*np.sqrt(2)))
pars, j = curve_fit(parametric_erf, x, y, p0=[475,10])
print(pars)
# outputs [ 473.09090474, 37.04826334]
and to generate the plot
import matplotlib.pyplot as plt
plt.plot(x,parametric_erf(x,*pars))
plt.plot(x,y)
plt.show()
Your question is quite vague but it seems whatever calculation you do ends up with a number in the range 381-570, is this correct. You have a multiline calculation which gives this number? I'm guessing you are repeating this in many places in your code which is why you want to procedurise it?
For any calculation you can wrap it in a function. For instance:
answer = variable_1 * variable_2 + variable_3
can be written as:
def calculate(v1, v2, v3):
''' calculate the result from the inputs
'''
return v1 * v2 + v3
answer = calculate(variable_1, variable_2, variable_3)
if you would like a definitive answer then simply post your calculation and I can make it into a function for you
So far I found 4 ways to find peaks in Python, however none of them can specify the number of peaks like Matlab does. Can someone provide some insight?
import scipy.signal as sg
import numpy as np
# Method 1
sg.find_peaks_cwt(vector, np.arange(1,4),max_distances=np.arange(1, 4)*2)
# Method 2
sg.argrelextrema(np.array(vector),comparator=np.greater,order=2)
# Method 3
sg.find_peaks(vector, height=7, distance=2.1)
# Method 4
detect_peaks.detect_peaks(vector, mph=7, mpd=2)`
Below is the Matlab code that I want to emulate:
[pks,locs] = findpeaks(data,'Npeaks',n)
If you want the exact function Matlab has, why not just use that function? If you have the rest of your data in Python, then you can just use the module provided by Matlab.
import matlab.engine #import matlab engine
eng = matlab.engine.start_matlab() #Start matlab engine
a = a = [(0.1*i)*(0.1*i-1)*(0.1*i-2) for i in range(50)] #Create some data with peaks
b = eng.findpeaks(matlab.double(a),'Npeaks',1) #Find 1 peak
Try the findpeaks library. Multiple methods are available for the detections of peaks and valleys in 1D-vectors and 2D-arrays (images).
pip install findpeaks
Lets create some peaks:
i = 10000
xs = np.linspace(0,3.7*np.pi,i)
X = (0.3*np.sin(xs) + np.sin(1.3 * xs) + 0.9 * np.sin(4.2 * xs) + 0.06 *
np.random.randn(i))
# import library
from findpeaks import findpeaks
# Initialize
fp = findpeaks()
# Find the peaks (high/low)
results = fp.fit(X)
# Make plot
fp.plot()
# Some of the results:
results['df']
Is it possible to perform multi-variate regression in Python using NumPy?
The documentation here suggests that it is, but I cannot find any more details on the topic.
Yes, download this ( http://www.scipy.org/Cookbook/OLS?action=AttachFile&do=get&target=ols.0.2.py ) from http://www.scipy.org/Cookbook/OLS
Or you can install R and a python-R link. R can do anything.
The webpage that you linked to mentions numpy.linalg.lstsq to find the vector x
which minimizes |b - Ax|. Here is a little example of how it can be used:
First we setup some "random" data:
import numpy as np
c1,c2 = 5.0,2.0
x = np.arange(1,11)/10.0
y = c1*np.exp(-x)+c2*x
b = y + 0.01*max(y)*np.random.randn(len(y))
A = np.column_stack((np.exp(-x),x))
c,resid,rank,sigma = np.linalg.lstsq(A,b)
print(c)
# [ 4.96579654 2.03913202]
You might want to look into the scipy.optimize.leastsq function. It's rather complicated but I seem to remember that being the thing I would look to when I wanted to do a multivariate regression. (It's been a while so I could be misremembering)