I just used scipy's odeint to solve a differential equation system and matplotlib to plot the results. I got the graphs. My question is: can I get specific data points, e.g. the values of x1, x2, x3 at t = 1? I need the concentration values at t = 1, 2, 3, 4, .... Thank you.
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import odeint
Dose = 100
V = 43.8
k12 = 1.2 # rate of central -> peripheral
k21 = 1.4 # rate of peripheral -> central
kel = 0.20 # rate of excrete from plasma
def diff(d_list, t):
    x1, x2, x3 = d_list  # X1(t), X2(t), X3(t)
    return np.array([-k12*x1 - kel*x1 + k21*x2,
                     k12*x1 - k21*x2,
                     kel*x1])

t = np.linspace(0, 24, 960)
result = odeint(diff, [Dose/V, 0, 0], t)
plt.plot(t, result[:, 0], label='x1: central')
plt.plot(t, result[:, 1], label='x2: tissue')
plt.plot(t, result[:, 2], label='x3: excreted')
plt.legend()
plt.xlabel('t (hr)')
plt.ylabel('Concentration (mg/L)')
plt.show()
This is not really a matplotlib or scipy question. You can either interpolate or take the closest data point.
Interpolated value
If you need x1, x2 and x3 for values of t that do not correspond to a data point (you mentioned 1, 2, 3, 4, which are not in your t array), you will need to interpolate. To get x1, x2 and x3 at t=1, you can do (at the end of your script):
valuesAt1 = [np.interp(1, t, result[:,col]) for col in range(result.shape[1])]
The output of print(valuesAt1) is then:
[1.1059703843218311, 0.8813129004034452, 0.2958217381057726]
If you only need x1, just do
valuesAt1 = np.interp(1, t, result[:,0])
then, the output of print(valuesAt1) is:
1.1059703843218311
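Since you asked for t = 1, 2, 3, 4, ..., note that np.interp also accepts an array of query points, so you can get all integer hours at once. A minimal sketch along those lines (reusing t and result from your script; the variable names are just illustrative):
t_query = np.arange(1, 25)  # t = 1, 2, ..., 24 hours
# one interpolated column per state variable (x1, x2, x3)
values = np.column_stack([np.interp(t_query, t, result[:, col])
                          for col in range(result.shape[1])])
print(values)  # row i holds x1, x2, x3 at t = t_query[i]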
Closest data point
If you do not want to interpolate but want the values of x1, x2 and x3 at the element of the t array that is closest to 1, do:
valuesAtClosestPointFrom1 = result[ np.argmin(np.abs(t-1))]
The output from print(valuesAtClosestPointFrom1) is:
[1.10563546 0.88141641 0.29605315]
This can also be done by interpolation, using scipy.interpolate.InterpolatedUnivariateSpline as follows:
from scipy.interpolate import InterpolatedUnivariateSpline
splx1 = InterpolatedUnivariateSpline(t, result[:,0])
splx2 = InterpolatedUnivariateSpline(t, result[:,1])
splx3 = InterpolatedUnivariateSpline(t, result[:,2])
First, pass the x and y data you want to interpolate when constructing each spline, as above. Second, create an array of the x values at which you want the interpolated y values:
import numpy as np
desired_time = np.arange(1,25)
x1 = splx1(desired_time)
x2 = splx2(desired_time)
x3 = splx3(desired_time)
Lastly, pass that array to the respective spline objects to get the interpolated values. In the example above, a desired_time array from 1 to 24 is created with np.arange and passed to the spline objects.
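For readability, the interpolated values can also be printed as a small table; a minimal usage example reusing the arrays above:
table = np.column_stack([desired_time, x1, x2, x3])
print(table)  # columns: t (hr), x1 (central), x2 (tissue), x3 (excreted)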
For a set of subjects I have a continuous variable with range 0-100 representing a quantification of a subject's state cont_attribute. For each subject I also have an ordinal variable representing reader annotation of subject's state as one of four states (e.g. 1, 2, 3, 4) class_label. Values for cont_attribute overlap between classes. My goal is to discretize cont_attribute so that agreement with class is optimized.
When discretizing cont_attribute, arbitrary thresholds x1, x2, x3 can be applied directly to the continuous variable to yield four ordinal bins, and agreement with the reader annotation class_label can be assessed:
cohen_kappa_score(pd.cut(df['cont_attribute'], bins=[0, x1, x2, x3, 100],
                         labels=['1', '2', '3', '4']).astype('int'),
                  df['class_label'].astype('int'))
I have found several options for discretization of a continuous variable, such as Jenks natural breaks and sklearn KMeans, but these options do not take the class into account.
What I tried:
I attempted to optimize the function above to yield the maximal value using scipy.optimize. For each threshold between two classes, I use the minimum value of the larger class and the maximum value of the smaller class as the range within which to search for the optimal cutoff between those classes. With this approach I run into a problem:
ValueError: bins must increase monotonically.
def objfunc(grid):
    x1, x2, x3 = grid
    return -cohen_kappa_score(pd.cut(df['cont_attribute'], bins=[0, x1, x2, x3, 100],
                                     labels=['1', '2', '3', '4'], duplicates='drop').astype('int'),
                              df['class_label'].astype('int'))

grid = (slice(df[df['class_label'] == 2]['cont_attribute'].min(), df[df['class_label'] == 1]['cont_attribute'].max(), 0.5),
        slice(df[df['class_label'] == 3]['cont_attribute'].min(), df[df['class_label'] == 2]['cont_attribute'].max(), 0.5),
        slice(df[df['class_label'] == 4]['cont_attribute'].min(), df[df['class_label'] == 3]['cont_attribute'].max(), 0.5))

solution = brute(objfunc, grid, finish=None, full_output=True)
solution
In Python, is there a straightforward way to optimize the thresholds x1, x2, x3 while taking agreement with the class into account (supervised discretization)? Alternatively, how can the above function be rewritten to yield a maximum using scipy.optimize.minimize?
The error message is not too hard to fix. The pandas cut method demands that the bin vector
[0, x1, x2, x3, 100] is strictly monotonic. As long as we have some mechanism that makes sure no invalid values are passed to the cut function, we are safe; that is what I implemented below. To denote an invalid setting, it is customary to return np.inf, since all other values are lower, so every minimizer will treat such a setting as undesirable. See below for the implementation. I also included all the imports and some data generation, so that the code is simple to run. Please do so in future questions as well.
You might want to use more than 10 bins per dimension in the brute-force search.
Also, the code is quite inefficient: it brute-forces over all combinations of x1, x2, x3, but many of them are invalid (e.g. x2 <= x1). You might want to reparametrize the problem in terms of (x1, x2-x1, x3-x2) instead and search over non-negative values in the second and third components (a sketch of this appears after the implementation below).
Finally, the brute method is a minimizer, so you should return -cohen_kappa from the objective.
#%%
import numpy as np
from sklearn.metrics import cohen_kappa_score, confusion_matrix
from scipy.stats import truncnorm
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.optimize import brute

#
# Generate Data
#
n = 1000
np.random.seed(0)
y = np.random.choice(4, p=[0.1, 0.3, 0.4, 0.2], size=n)
x = np.zeros(n)
for i in range(4):  # one truncated normal per class (0-3)
    low = 0
    high = 100
    mymean = 20 * i
    myscale = 8
    a, b = (low - mymean) / myscale, (high - mymean) / myscale
    x[y == i] = truncnorm.rvs(a=a, b=b, loc=mymean, scale=myscale, size=np.sum(y == i))
data = pd.DataFrame({"cont_attribute": x, "class_label": y})
# make a loss function that accounts for the bad orderings
def loss(cuts):
    x1, x2, x3 = cuts
    # guard against non-monotonic bins: signal an invalid setting with np.inf
    if 0 >= x1 or x1 >= x2 or x2 >= x3 or x3 >= 100:
        return np.inf
    yhat = pd.cut(
        data["cont_attribute"],
        bins=[0, x1, x2, x3, 100],
        labels=[0, 1, 2, 3],
        # duplicates="drop",
    ).astype("int")
    return -cohen_kappa_score(data["class_label"], yhat)
# Compute the result via brute force
ranges = [(0, 100)] * 3
Ns=30
result = brute(func=loss, ranges=ranges, Ns=Ns)
print(result)
print(-loss(result))
# Evaluate the final result in a confusion matrix
x1, x2, x3 = result
data["class_pred"] = pd.cut(
    data["cont_attribute"],
    bins=[0, x1, x2, x3, 100],
    labels=[0, 1, 2, 3],
    duplicates="drop",
).astype("int")
mat = confusion_matrix(y_true=data['class_label'], y_pred=data['class_pred'])
plt.matshow(mat)
# Loop over data dimensions and create text annotations.
for i in range(4):
    for j in range(4):
        text = plt.text(j, i, mat[i, j],
                        ha="center", va="center", color="grey")
plt.xlabel('Predicted class')
plt.ylabel('True class')
plt.show()
# Evaluate result graphically
# inspect the data
fig,ax = plt.subplots(2,1)
sns.histplot(data=data, x="cont_attribute", hue="class_label",ax=ax[0],multiple='stack')
sns.histplot(data=data, x="cont_attribute", hue="class_pred",ax=ax[1],multiple='stack')
plt.show()
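As a follow-up to the reparametrization suggestion above, here is a minimal sketch of searching over (x1, d2, d3) with d2 = x2 - x1 and d3 = x3 - x2, reusing data, loss, brute and Ns from the block above; the names loss_reparam and ranges_reparam are illustrative, not part of the original answer.
# Reparametrized objective: cuts are (x1, d2, d3) with d2, d3 > 0,
# so every candidate automatically yields monotonically increasing bins.
def loss_reparam(params):
    x1, d2, d3 = params
    return loss((x1, x1 + d2, x1 + d2 + d3))

ranges_reparam = [(1, 99), (1, 99), (1, 99)]  # x1 and the two positive increments
result_reparam = brute(func=loss_reparam, ranges=ranges_reparam, Ns=Ns)
print(result_reparam, -loss_reparam(result_reparam))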
Regarding the use of scipy.optimize.minimize: that is not really possible with Cohen's kappa as the objective. Since it is not differentiable, it is not easy to optimize over. Consider using a cross-entropy loss function instead, but in that case you would need a (parametric) model for the classification task.
A standard ordinal classifier is available in statsmodels' ordinal regression module. It will be vastly faster than the brute method, but possibly less accurate when evaluated on Cohen's kappa. That route is probably what I would have taken for a higher number of bins.
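A minimal sketch of that ordinal-regression route, assuming statsmodels >= 0.12 (which provides OrderedModel); this is my illustration, not part of the original answer, and exact dtype requirements for the target may vary by version:
# Hypothetical sketch: fit an ordinal (proportional-odds) model on the continuous attribute
from statsmodels.miscmodels.ordinal_model import OrderedModel

mod = OrderedModel(data["class_label"], data[["cont_attribute"]], distr="logit")
res = mod.fit(method="bfgs", disp=False)
probs = res.predict(data[["cont_attribute"]])            # per-class probabilities for each row
data["class_pred_ordinal"] = np.asarray(probs).argmax(axis=1)
print(cohen_kappa_score(data["class_label"], data["class_pred_ordinal"]))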
Is there a Python library for multivariate interpolation? Right now I have three independent variables and one dependent variable. My data looks like this:
X1=[3,3,3.1,3.1,4.2,5.2,6.3,2.3,7.4,8.4,5.4,3.4,3.4,3.4,...]
X2=[12.1,12.7,18.5,18.3,18.4,18.6,24.2,24.4,24.3,24.5,30.9,30.7,30.3,30.4,6.1,6.2,...]
X3=[0.3,9.2,0.3,9.4,0.1,9.8,0.4,9.3,0.7,9.7,18.3,27.4,0.6,9.44,...]
Y=[-5.890,-5.894,2.888,-3.8706,2.1516,-2.7334,1.4723,-2.1049,0.9167,-1.7281,-2.091,-6.7394,0.8777,-1.7046,...]
and len(X1)=len(X2)=len(X3)=len(Y)=400
I want to fit or interpolate the data so that, given arbitrary x1, x2, x3 values, the function f(x1, x2, x3) yields an estimated y value. For example, given x1=4.11, x2=10.34, and x3=10.78, the function would yield -8.7567 (best estimate). I'd imagine the function is polynomial, so maybe a spline interpolation is the best option here?
curve_fit in scipy.optimize works for this. In this code the estimate is a linear function, but a better model might exist.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
X1=[3,3,3.1,3.1,4.2,5.2,6.3,2.3,7.4,8.4,5.4,3.4,3.4,3.4]
X2=[12.1,12.7,18.5,18.3,18.4,18.6,24.2,24.4,24.3,24.5,30.9,30.7,30.3,30.4]
X3=[0.3,9.2,0.3,9.4,0.1,9.8,0.4,9.3,0.7,9.7,18.3,27.4,0.6,9.44]
Y=[-5.890,-5.894,2.888,-3.8706,2.1516,-2.7334,1.4723,-2.1049,0.9167,-1.7281,-2.091,-6.7394,0.8777,-1.7046]
def fitFunc(x, a, b, c, d):
    return a + b*x[0] + c*x[1] + d*x[2]
fitParams, fitCovariances = curve_fit(fitFunc, [X1, X2, X3], Y)
print(' fit coefficients:\n', fitParams)
# fit coefficients:
# [-6.11934208 0.21643939 0.26186705 -0.33794415]
Then fitParams[0] + fitParams[1] * x1 + fitParams[2] * x2 + fitParams[3] * x3 is the estimated y.
# get single y
def estimate(x1, x2, x3):
    return fitParams[0] + fitParams[1] * x1 + fitParams[2] * x2 + fitParams[3] * x3
Compare the result with the original y:
Y_estimated = [estimate(X1[i], X2[i], X3[i]) for i in range(len(X1))]
fig, ax = plt.subplots()
ax.scatter(Y, Y_estimated)
lims = [
np.min([ax.get_xlim(), ax.get_ylim()]), # min of both axes
np.max([ax.get_xlim(), ax.get_ylim()]), # max of both axes
]
ax.set_xlabel('Y')
ax.set_ylabel('Y_estimated')
ax.plot(lims, lims, 'k-', alpha=0.75, zorder=0)
ax.set_aspect('equal')
References: scipy, stackoverflow-multifit, stackoverflow-plot xy
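Since the question asks specifically about interpolation rather than fitting, here is a minimal alternative sketch using scipy.interpolate.LinearNDInterpolator (my suggestion, not part of the answer above; it reuses X1, X2, X3, Y from the code, and query points outside the convex hull of the data return nan):
from scipy.interpolate import LinearNDInterpolator

points = np.column_stack([X1, X2, X3])    # shape (n, 3)
interp = LinearNDInterpolator(points, Y)  # build once, evaluate many times
print(interp(4.11, 10.34, 10.78))         # nan if the point lies outside the data's convex hull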
I'm struggling to produce an interpolation function for some 2-dimensional data I have. My data isn't standard as each value in the x-array corresponds to a unique y-array. For example:
x = [0.1, 0.2]
y1 = [13.719, 10.488, 9.885, 9.704] #Corresponding to x=0.1
y2 = [13.34, 10.259, 9.275, 8.724] #Corresponding to x=0.2
z1 = [1395., 2209., 2411., 2555.] #Corresponding to y1
z2 = [1570., 2261., 2519., 2682.] #Corresponding to y2
Ideally I would like to generate a function, f(x, y) that will return an interpolated value of z.
So far my only attempts have been through using:
from scipy.interpolate import interp2d
interpolation = interp2d(x, [y1, y2], [z1, z2])
Which, unsurprisingly, results in the following error message:
ValueError: x and y must have equal lengths for non rectangular grid
I understand why I'm getting this message and appreciate that interp2d is not the function I should be using, but I'm unsure where to go from here.
The problem is that interp2d works with data arranged on a rectangular grid, and your 8 data points are not arranged in a rectangular x-y grid.
You could consider the 2x8 rectangle consisting of all possible combinations of your x and y data, but you only have 8 data points (z values).
Below is an example solution with the more generic scipy.interpolate.griddata function:
import numpy as np
from scipy.interpolate import griddata

x = [0.1, 0.2]
y1 = [13.719, 10.488, 9.885, 9.704]  # Corresponding to x=0.1
y2 = [13.34, 10.259, 9.275, 8.724]   # Corresponding to x=0.2
z1 = [1395., 2209., 2411., 2555.]    # Corresponding to y1
z2 = [1570., 2261., 2519., 2682.]    # Corresponding to y2

y = np.concatenate((y1, y2))  # collapse all y-data into a single array

# obtain x- and y-grids
mesh = np.meshgrid(np.array(x), y)
grid_x, grid_y = mesh[0].T, mesh[1].T

points = np.stack((np.repeat(x, 4).T, y))  # obtain xy coordinates of the data points
values = np.concatenate((z1, z2))          # obtain values
grid_z0 = griddata(points.T, values, (grid_x, grid_y), method='nearest')  # nearest-neighbour interpolation
You can generalize this code for other interpolation options / denser grids and so on.
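For instance, a minimal sketch of evaluating on a denser grid with linear interpolation (my addition, reusing points, values and y from above; grid resolution is arbitrary):
xi = np.linspace(0.1, 0.2, 11)
yi = np.linspace(y.min(), y.max(), 50)
grid_xd, grid_yd = np.meshgrid(xi, yi, indexing='ij')
grid_zd = griddata(points.T, values, (grid_xd, grid_yd), method='linear')  # nan outside the convex hull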
I have this code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import odeint

def rownanie(Y, t, l, q, a, u):
    y1, y2, z1, z2 = Y
    dydt = [y2, ((l*q)/a)*(1/y1)*(1 - z2*u), z2, (a*y2*u)/y1]
    return dydt
l = 100
q = 1
a = 10
u = 0.25
y0 = -1
z0 = 0
y0_prim, z0_prim = 0, 0
t = np.linspace(0, 100, 10001)
sol = odeint(rownanie, [y0, y0_prim, z0, z0_prim], t, args=(l,q,a,u))
print(sol)
plt.plot(sol[:, 0], sol[:, 2])
plt.xlabel('Y')
plt.ylabel('Z')
plt.grid()
So I have 4 columns of data, let's say [:, 0] to [:, 3]. I have to focus on only two of them: [:, 0] and [:, 2]. When I plot them against each other, I get a harmonic function: [:, 0] are the values, [:, 2] are the arguments. I need to find the arguments for which the values are at a maximum, or the difference (distance) between two such arguments (two maxima). I tried with "if", but the values are approximations, so they are never exactly equal. Could you help me with this one?
You were right: you need to define a tolerance for the difference with respect to the maximum value. I marked the points for clarification. The idea is to first compute the difference from the maximum of the values, max(sol[:, 0]). Then you can use NumPy boolean indexing with a tolerance of 1e-4: [abs(diff) < 1e-4] returns the indices where this condition holds True. Now you have these maximum points (5 in this case) and can process them however you want. The choice of tolerance also depends on the number of mesh points (10001 here); it requires some playing around, and one could also write a function to choose it more cleverly.
diff = sol[:, 0] - max(sol[:, 0])
plt.plot(sol[:, 0], sol[:, 2])
plt.plot(sol[:, 0][abs(diff) < 1e-4], sol[:, 2][abs(diff) < 1e-4], 'kx')
(Graph: sol[:, 2] plotted against sol[:, 0], with the maximum points marked by crosses.)
And I need to find this difference, but every maximum is a little bit different.
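To get the spacing between maxima directly, a minimal sketch using scipy.signal.find_peaks (my suggestion, not part of the answer above; it reuses sol from your script):
from scipy.signal import find_peaks

peaks, _ = find_peaks(sol[:, 0])   # indices of the local maxima of the values
max_args = sol[:, 2][peaks]        # corresponding arguments
print(max_args)
print(np.diff(max_args))           # distances between consecutive maxima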
With griddata in scipy, used to perform interpolation (cubic splines and others), we have to pass both the data from which we interpolate and, at the same time, the new points at which we want to make a "prediction".
Is it possible to construct a "griddata object" that would have a method to predict new points without reconstructing a new interpolation spline each time?
(For example, like with a regression tree: we first construct the tree, then we apply the .predict(new_points) method.)
Here is an example:
import pandas as pd
import numpy as np
import sklearn
import scipy.interpolate as itp

n = 100
x1 = np.linspace(-2, 4, n)
X1 = []
X2 = []
for x in x1:
    X1.append([x for i in range(0, n)])
    X2.append(np.linspace(9, 15, n))
X1 = np.array(X1).flatten()
X2 = np.array(X2).flatten()
Y1 = np.exp(2*X1)
Y2 = 3 * np.sqrt(X2)

# Data frames:
X = np.transpose([X1, X2])
X = pd.DataFrame(X, columns=["X1", "X2"])
Y = np.transpose([Y1, Y2])
Y = pd.DataFrame(Y, columns=["Y1", "Y2"])

X_new = np.transpose([[-2], [9]])
inter_cubic = itp.griddata(X, Y, X_new, method='cubic', fill_value=np.nan, rescale=False)
print(inter_cubic)
print(np.exp(2*(-2)), 3*np.sqrt(9))
Now inter_cubic is just a NumPy array.
Is there a way of doing this, or can we use another "spline" constructor?
If you look at the source code for griddata (scroll down past the docstring to see the actual code), you'll see that it is a wrapper for several other interpolation functions, most of which work the way you want. In your case, with 2-d data and cubic interpolation, griddata does this:
ip = CloughTocher2DInterpolator(points, values, fill_value=fill_value,
                                rescale=rescale)
return ip(xi)
So instead of using griddata, you could use CloughTocher2DInterpolator. Specifically, using the names from your script, you would create the interpolator with
ip = itp.CloughTocher2DInterpolator(X, Y, fill_value=np.nan, rescale=False)
The object ip doesn't have a predict method; you just call it with the points at which you want to evaluate the interpolator. In your case, you would write:
Y_new = ip(X_new)
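A short usage sketch of this construct-once, evaluate-many pattern (my addition; the extra query points are made up for illustration):
ip = itp.CloughTocher2DInterpolator(X, Y, fill_value=np.nan, rescale=False)

# The interpolator is built once and can then be called repeatedly on new points.
print(ip(X_new))                                  # same result as the griddata call above
print(ip(np.array([[-1.5, 10.0], [0.0, 12.5]])))  # further (hypothetical) query points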