Following my previous two posts (post1, post 2), I have now reached the point where I use scipy to find a curve fit. However, the code I have produces an error.
A sample of the .csv file I'm working with is located in post1. I tried to copy and substitute examples from the Internet, but it doesn't seem to be working.
Here's what I have (the .py file)
import pandas as pd
import numpy as np
from scipy import optimize
df = pd.read_csv("~/Truncated raw data hcl.csv", usecols=['time' , '1mnaoh trial 1']).dropna()
data1 = df
array1 = np.asarray(data1)
x , y = np.split(array1,[-1],axis=1)
def func(x, a , b , c , d , e):
    return a + (b - a)/((1 + c*np.exp(-d*x))**(1/e))
popt, pcov = optimize.curve_fit(func, x , y , p0=[23.2, 30.1 , 1 , 1 , 1])
popt
From the limited research I've done, it might be a problem with the x and y arrays. The title states the error that is written. It is a minpack.error.
Edit: the error returned
ValueError: object too deep for desired array
Traceback (most recent call last):
File "~/test2.py", line 15, in <module>
popt, pcov = optimize.curve_fit(func, x , y , p0=[23.2, 30.1 , 1 , 1 , 1])
File "~/'virtualenvname'/lib/python3.7/site-packages/scipy/optimize/minpack.py", line 744, in curve_fit
res = leastsq(func, p0, Dfun=jac, full_output=1, **kwargs)
File "~/'virtualenvname'/lib/python3.7/site-packages/scipy/optimize/minpack.py", line 394, in leastsq
gtol, maxfev, epsfcn, factor, diag)
minpack.error: Result from function call is not a proper array of floats.
Thank you.
After the split, x and y both have shape (..., 1), which means each of their elements is itself an array of length one. You want to flatten the arrays first, e.g. via x = x.flatten() (NumPy has no np.flatten function; np.ravel(x) also works).
But I think you don't need the split at all. You can just do the following
array1 = np.asarray(data1).T
x , y = array1
You want x and y to be the first and second columns of array1. So an easy way to achieve this is to transpose the array first. You could also access them via [:,0] and [:,1].
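To make the shape issue concrete, here is a minimal sketch. It assumes a synthetic two-column array in place of the CSV data and a simple straight-line model (both are stand-ins I made up, not the original data or the logistic model from the question):

```python
import numpy as np
from scipy import optimize

# Synthetic two-column array standing in for the CSV data (hypothetical values).
t = np.linspace(0.0, 10.0, 50)
array1 = np.column_stack([t, 2.0 * t + 1.0])

# np.split keeps a trailing axis of length one: both pieces have shape (50, 1).
x_col, y_col = np.split(array1, [-1], axis=1)
print(x_col.shape, y_col.shape)

# Selecting columns directly gives the 1-D arrays that curve_fit expects.
x, y = array1[:, 0], array1[:, 1]
print(x.shape, y.shape)

def line(x, a, b):
    # Stand-in model; the question's logistic function would go here instead.
    return a * x + b

popt, pcov = optimize.curve_fit(line, x, y, p0=[1.0, 0.0])
print(popt)
```

The same column selection should work for the real data, since np.asarray(df) gives one row per sample and one column per CSV column.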
Related
I am fitting a very simple curve through three points with the leastsq method, following all the rules, but I am still getting an error that I cannot understand. Can anyone help? Thank you so much.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import leastsq
x = np.array([2.0,30.2,15.0])
y = np.array([45.0,56.2,30.0])
print(x)
print(y)
# model
def t(x,a,b,c):
    return a*x**2 + b*x + c
#residual function
def residual_t(x,y,a,b,c):
    return y-t(x,a,b,c)
#initial parameters
g0 = np.array([0.0,0.0,0.0])
#leastsq method
coeffs, cov = leastsq(residual_t, g0, args=(x,y))
plt.plot(x,t(x,*coeffs),'r')
plt.plot(x,y,'b')
plt.show()
#finding out Rsquared and Radj squared value
absError = residual_t(y,x,*coeffs)
se = np.square(absError) # squared errors
Rsquared = 1.0 - (np.var(absError) / np.var(y))
n = len(x)
k = len(coeffs)
Radj_sq = (1-((1-Rsquared)/(n-1)))/(n-k-1)
print (f'Rsquared value: {Rsquared} adjusted R squared value: {Radj_sq}')
TypeError: residual_t() missing 2 required positional arguments: 'b' and 'c'
Why??
coeffs is already an array containing the best-fit values of a, b, c. coeffs also shows as undefined, and residual_t is reported as a problem too. Could you please help me understand?
With a copy-n-paste of your code (including the *coeffs change), I get
1135:~/mypy$ python3 stack58206395.py
[ 2. 30.2 15. ]
[45. 56.2 30. ]
Traceback (most recent call last):
File "stack58206395.py", line 24, in <module>
coeffs, cov = leastsq(residual_t, g0, args=(x,y))
File "/usr/local/lib/python3.6/dist-packages/scipy/optimize/minpack.py", line 383, in leastsq
shape, dtype = _check_func('leastsq', 'func', func, x0, args, n)
File "/usr/local/lib/python3.6/dist-packages/scipy/optimize/minpack.py", line 26, in _check_func
res = atleast_1d(thefunc(*((x0[:numinputs],) + args)))
TypeError: residual_t() missing 2 required positional arguments: 'b' and 'c'
That is, the error is in the use of residual_t within the leastsq call.
If I add
residual_t(g0, x, y)
right after the g0 definition I get the same error:
1136:~/mypy$ python3 stack58206395.py
[ 2. 30.2 15. ]
[45. 56.2 30. ]
Traceback (most recent call last):
File "stack58206395.py", line 23, in <module>
residual_t(g0, x, y)
TypeError: residual_t() missing 2 required positional arguments: 'b' and 'c'
So you need to define residual_t to work with a call like this. I'm not going to take a guess as to what you really want, so I'll leave the fix up to you.
Just remember that residual_t will be called with the x0, spliced with the args tuple. This is typical usage for scipy.optimize functions. Review the docs if necessary.
edit
Defining the function as:
def residual_t(abc, x, y):
    a,b,c = abc
    return y-t(x,a,b,c)
runs without error.
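Putting that together, a minimal end-to-end sketch with the three points from the question might look like the following. Two things here are my own reading rather than part of the original code: without full_output, the second value returned by leastsq is an integer status flag (not a covariance matrix), and the residuals are evaluated with the parameters as the first argument.

```python
import numpy as np
from scipy.optimize import leastsq

x = np.array([2.0, 30.2, 15.0])
y = np.array([45.0, 56.2, 30.0])

def t(x, a, b, c):
    # Quadratic model from the question.
    return a * x**2 + b * x + c

def residual_t(abc, x, y):
    # leastsq calls this as residual_t(params, *args), so params come first.
    a, b, c = abc
    return y - t(x, a, b, c)

g0 = np.array([0.0, 0.0, 0.0])

# Without full_output, leastsq returns (solution, integer status flag).
coeffs, ier = leastsq(residual_t, g0, args=(x, y))

# Evaluate the residuals at the fitted parameters, keeping x and y in order.
final_residuals = residual_t(coeffs, x, y)
print(coeffs, ier, final_residuals)
```

With three points and three parameters the quadratic passes through every point, so the residuals should come out close to zero.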
This is my code
import os
import sys
import numpy as np
import scipy
from scipy.optimize import leastsq
def peval (inp_mat,p):
    m0,m1,m2,m3,m4,m5,m6,m7 = p
    out_mat = np.array(np.zeros(inp_mat.shape,dtype=np.float32))
    mid = inp_mat.shape[0]/2
    for xy in range(0,inp_mat.shape[0]):
        if (xy<(inp_mat.shape[0]/2)):
            out_mat[xy] = ( ( (inp_mat[xy+mid]*m0)+(inp_mat[xy]*m1)+ m2 ) /( (inp_mat[xy+mid]*m6)+(inp_mat[xy]*m7)+1 ) )
        else:
            out_mat[xy] = ( ( (inp_mat[xy]*m3)+(inp_mat[xy-mid]*m4)+ m5 ) /( (inp_mat[xy]*m6)+(inp_mat[xy-mid]*m7)+1 ) )
    return np.array(out_mat)
def residuals(p, out_mat, inp_mat):
    m0,m1,m2,m3,m4,m5,m6,m7 = p
    err=np.array(np.zeros(inp_mat.shape,dtype=np.float32))
    if (out_mat.shape == inp_mat.shape):
        for xy in range(0,inp_mat.shape[0]):
            err[xy] = err[xy]+ (out_mat[xy] -inp_mat[xy])
    return np.array(err)
f = open('/media/anilil/Data/Datasets/repo/txt_op/vid.txt','r')
x = np.loadtxt(f,dtype=np.int16,comments='#',delimiter='\t')
nof = x.shape[0]/72 # Find the number of frames
x1 = x.reshape(-1,60,40)
x1_1= x1[0,:,:].flatten()
x1_2= x1[1,:,:].flatten()
x= []
y= []
for xy in range(1,50,1):
    y.append(x1[xy,:,:].flatten())
    x.append(x1[xy-1,:,:].flatten())
x=np.array(x,dtype=np.float32)
y=np.array(y,dtype=np.float32)
length = x1_1.shape
# initial guess
p0 = np.array([1,1,1,1,1,1,1,1],dtype=np.float32)
abc=leastsq(residuals, p0,args=(y,x))
print ('Size of first matrix is '+str(x1_1.shape))
print ('Size of first matrix is '+str(x1_2.shape))
print ("Done with program")
I have tried adding np.array in most places with no use.
Could someone please help me ?
Another question: should residuals() return a single value by summing all the errors, np.sum(err,axis=1), or should I leave it the way it is?
When I return np.sum(err,axis=1) from residuals(), there is no change in the initial guess. It just remains the same.
I.e. should the error be reported for each item in the input/output mapping, or as one combined error overall?
Example data.
Output
ValueError: object too deep for desired array
Traceback (most recent call last):
File "/media/anilil/Data/charm/mv_clean/.idea/nose_reduction_mpeg.py", line 49, in <module>
abc=leastsq(residuals, p0,args=(y,x))
File "/usr/lib/python2.7/dist-packages/scipy/optimize/minpack.py", line 378, in leastsq
gtol, maxfev, epsfcn, factor, diag)
minpack.error: Result from function call is not a proper array of floats.
leastsq requires a 1D array to be returned from your residuals function.
Currently you calculate the residuals for the whole image and return that as a 2D array.
The simple fix would be to flatten the array of residuals (turning your 2D array into a 1D one).
So instead of returning
return np.array(err)
Do this instead
return err.flatten()
Note that err is already a numpy array so doesn't need to be cast before the return (I guess that slipped in when you were trying to debug it!)
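As a rough illustration of the flattening, here is a small sketch with made-up 2D data and a hypothetical two-parameter model (scale and offset), not the eight-parameter model from the question. Note that the residuals have to depend on p for the fit to move; in the question's residuals() the values m0..m7 are unpacked but never used, which would explain why the initial guess never changes.

```python
import numpy as np
from scipy.optimize import leastsq

# Made-up 2-D "frames": out is a known affine function of inp.
rng = np.random.default_rng(0)
inp = rng.random((5, 8))
out = 3.0 * inp + 0.5

def residuals(p, out_mat, inp_mat):
    # Hypothetical model: out ~= scale * inp + offset.
    scale, offset = p
    err = out_mat - (scale * inp_mat + offset)
    # leastsq needs a 1-D array with one entry per data point,
    # so flatten instead of summing.
    return err.ravel()

p0 = np.array([1.0, 1.0])
p_fit, ier = leastsq(residuals, p0, args=(out, inp))
print(p_fit)
```

Returning one residual per element (rather than a sum) is what lets the solver see how each data point responds to a change in the parameters.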
I am trying to optimise a function using the fminbound function of the scipy.optimize module. I want to set parameter bounds to keep the answer physically sensible (e.g. > 0).
import scipy.optimize as sciopt
import numpy as np
The arrays:
x = np.array([[ 1247.04, 1274.9 , 1277.81, 1259.51, 1246.06, 1230.2 ,
1207.37, 1192. , 1180.84, 1182.76, 1194.76, 1222.65],
[ 589. , 581.29, 576.1 , 570.28, 566.45, 575.99,
601.1 , 620.6 , 637.04, 631.68, 611.79, 599.19]])
y = np.array([ 1872.81, 1875.41, 1871.43, 1865.94, 1854.8 , 1839.2 ,
1827.82, 1831.73, 1846.68, 1856.56, 1861.02, 1867.15])
I managed to optimise the linear function within the parameter bounds when I use only one parameter:
fp = lambda p, x: x[0]+p*x[1]
e = lambda p, x, y: ((fp(p,x)-y)**2).sum()
pmin = 0.5 # minimum bound
pmax = 1.5 # maximum bound
popt = sciopt.fminbound(e, pmin, pmax, args=(x,y))
This results in popt = 1.05501927245
However, when trying to optimise with multiple parameters, I get the following error message:
fp = lambda p, x: p[0]*x[0]+p[1]*x[1]
e = lambda p, x, y: ((fp(p,x)-y)**2).sum()
pmin = np.array([0.5,0.5]) # minimum bounds
pmax = np.array([1.5,1.5]) # maximum bounds
popt = sciopt.fminbound(e, pmin, pmax, args=(x,y))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/dist-packages/scipy/optimize/optimize.py", line 949, in fminbound
if x1 > x2:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I have tried to vectorize e (np.vectorize) but the error message remains the same. I understand that fminbound expects a float or array scalar as bounds. Is there another function that would work for this problem?
fminbound is only for optimizing functions of one variable.
For the multivariate case, you should use scipy.optimize.minimize, for example,
from scipy.optimize import minimize
p_guess = (pmin + pmax)/2
bounds = np.c_[pmin, pmax] # [[pmin[0],pmax[0]], [pmin[1],pmax[1]]]
sol = minimize(e, p_guess, args=(x, y), bounds=bounds)  # e needs x and y passed through args
print(sol)
if not sol.success:
    raise RuntimeError("Failed to solve")
popt = sol.x
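For completeness, here is a self-contained sketch of the two-parameter case using the arrays from the question. I pass the bounds as a list of (min, max) pairs and print the solver status instead of raising; both are my choices for the illustration, not requirements.

```python
import numpy as np
from scipy.optimize import minimize

x = np.array([[1247.04, 1274.9, 1277.81, 1259.51, 1246.06, 1230.2,
               1207.37, 1192.0, 1180.84, 1182.76, 1194.76, 1222.65],
              [589.0, 581.29, 576.1, 570.28, 566.45, 575.99,
               601.1, 620.6, 637.04, 631.68, 611.79, 599.19]])
y = np.array([1872.81, 1875.41, 1871.43, 1865.94, 1854.8, 1839.2,
              1827.82, 1831.73, 1846.68, 1856.56, 1861.02, 1867.15])

def fp(p, x):
    # Two-parameter linear model from the question.
    return p[0] * x[0] + p[1] * x[1]

def e(p, x, y):
    # Sum of squared errors to be minimized.
    return ((fp(p, x) - y) ** 2).sum()

pmin = np.array([0.5, 0.5])  # minimum bounds
pmax = np.array([1.5, 1.5])  # maximum bounds
p_guess = (pmin + pmax) / 2
bounds = list(zip(pmin, pmax))  # one (min, max) pair per parameter

# With bounds and no explicit method, minimize picks a bound-aware solver.
sol = minimize(e, p_guess, args=(x, y), bounds=bounds)
popt = sol.x
print(sol.success, sol.message, popt)
```

Checking sol.success (or at least sol.message) before trusting popt is worthwhile, since the gradient is estimated numerically here.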
I do not know python at all thus I have been unsuccessful in interpreting similar previous answers and using them.
I have a python script that I wish to execute in unix. The script uses an input file but I do not understand how to ensure that the input file is read as numpy float array.
My input file is called chk.bed and it has one column of numeric values
-bash-4.1$ # head chk.bed
7.25236
0.197037
0.189464
2.60056
0
32.721
11.3978
3.85692
0
0
The original script is -
from scipy.stats import gaussian_kde
import numpy as np
#assume "fpkm" is a NumPy array of log2(fpkm) values
kernel = gaussian_kde(fpkm)
xi = np.linspace(fpkm.min(), fpkm.max(), 100)
yi = kernel.evaluate(xi)
mu = xi[np.argmax(yi)]
U = fpkm[fpkm > mu].mean()
sigma = (U - mu) * np.sqrt(np.pi / 2)
zFPKM = (fpkm - mu) / sigma
What I could understand up until now is to make sure the script is reading the file so I included fpkm = open("chk.bed", 'r') in the code.
However on executing the code - I get the following error -
Traceback (most recent call last):
File "./calc_zfpkm.py", line 10, in <module>
kernel = gaussian_kde(fpkm)
File "/usr/lib64/python2.6/site-packages/scipy/stats/kde.py", line 88, in __init__
self._compute_covariance()
File "/usr/lib64/python2.6/site-packages/scipy/stats/kde.py", line 340, in _compute_covariance
self.factor * self.factor)
File "/usr/lib64/python2.6/site-packages/numpy/lib/function_base.py", line 1971, in cov
X = array(m, ndmin=2, dtype=float)
TypeError: float() argument must be a string or a number
This seems to suggest that I am not reading in the file correctly and so the function gaussian_kde() cannot read in the values as float.
Can you please help ?
Thanks !
You're passing a file object to gaussian_kde, but it expects a NumPy array. Use numpy.loadtxt first to load the data into an array:
>>> import numpy as np
>>> arr = np.loadtxt('chk.bed')
>>> arr
array([ 7.25236 , 0.197037, 0.189464, 2.60056 , 0. ,
32.721 , 11.3978 , 3.85692 , 0. , 0. ])
>>> gaussian_kde(arr)
<scipy.stats.kde.gaussian_kde object at 0x7f7350390190>
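The whole script with the loading step added could look roughly like this. To keep the sketch self-contained I feed the ten sample values through io.StringIO as a stand-in for the file; in the real script that line would be np.loadtxt("chk.bed"). The values are used as they appear in the file, with no log2 transform applied.

```python
import io
import numpy as np
from scipy.stats import gaussian_kde

# Stand-in for chk.bed: the ten values shown by `head chk.bed`.
sample = io.StringIO(
    "7.25236\n0.197037\n0.189464\n2.60056\n0\n"
    "32.721\n11.3978\n3.85692\n0\n0\n"
)
fpkm = np.loadtxt(sample)  # real script: fpkm = np.loadtxt("chk.bed")

# Rest of the original script, unchanged.
kernel = gaussian_kde(fpkm)
xi = np.linspace(fpkm.min(), fpkm.max(), 100)
yi = kernel.evaluate(xi)
mu = xi[np.argmax(yi)]          # location of the density peak
U = fpkm[fpkm > mu].mean()      # mean of the values above the peak
sigma = (U - mu) * np.sqrt(np.pi / 2)
zFPKM = (fpkm - mu) / sigma
print(zFPKM)
```

np.loadtxt returns a float array by default, which is what gaussian_kde and the later arithmetic need.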
Here you can find the R script for zFPKM normalization.
I was inspired by the Python code given above and by this link: https://www.biostars.org/p/94680/
install.packages(c("ks","pracma"))
library(ks)
library(pracma)
# fpkm is example data
fpkm <- c(1,2,3,4,5,6,7,8,4,5,6,5,6,5,6,5,5,5,5,6,6,78,8,89,8,8,8,2,2,2,1,1,4,4,4,4,4,4,4,4,4,4,4,3,2,2,3,23,2,3,23,4,2,2,4,23,2,2,24,4,4,2,2,4,4,4,2,2,4,4,2,2,4,2,45,5,5,5,3,2,2,4,4,4,4,4,4,4,4,4,3,2,2,3,23,2,3,23,4,2,2,4,23,2,2,24,4,4,2,2,4,4,4,2,2,4,4,2,2,4,2,45,5,5,5,3,2,2)
xi=linspace(min(fpkm),max(fpkm),100)
fhat = kde(x=fpkm,gridsize=100,eval.points=xi)
# Here I set digits=0. If you do not round the numbers (yi), the results change slightly.
yi=round(fhat$estimate,digits=0)
mu=xi[which.max(yi)]
U=mean(fpkm[fpkm>mu])
sigma=(U-mu)* (sqrt(pi/2))
zFPKM = (fpkm - mu) / sigma
Btw, I have a question.
Can I apply the same approach to RPKM?
Cankut CUBUK
Computational Genomics Program - Systems Genomics Lab
Centro de Investigación Príncipe Felipe (CIPF)
C/ Eduardo Primo Yúfera nº3
46012 Valencia, Spain
http://bioinfo.cipf.es
I seem to be getting an error when I use the root-finder in scipy. I was wondering if anyone could point out what I'm doing wrong.
The function I'm finding the root of is just an easy example, and not particularly important.
If I run this code with scipy 0.9.0:
import numpy as np
from scipy.optimize import fsolve
tmpFunc = lambda xIn: (xIn[0]-4)**2 + (xIn[1]-5)**2 + (xIn[2]-7)**3
x0 = [3,4,5]
xFinal = fsolve(tmpFunc, x0 )
print xFinal
I get the following error message:
Traceback (most recent call last):
File "tmpStack.py", line 7, in <module>
xFinal = fsolve(tmpFunc, x0 )
File "/usr/lib/python2.7/dist-packages/scipy/optimize/minpack.py", line 115, in fsolve
_check_func('fsolve', 'func', func, x0, args, n, (n,))
File "/usr/lib/python2.7/dist-packages/scipy/optimize/minpack.py", line 26, in _check_func
raise TypeError(msg)
TypeError: fsolve: there is a mismatch between the input and output shape of the 'func' argument '<lambda>'.
Well, it looks like I was using this routine incorrectly. fsolve requires the same number of equations as variables, whereas I gave it one equation with three variables. So if the input to the function is a 3-element array, the output should also be a 3-element array. This code works:
import numpy as np
from scipy.optimize import fsolve
tmpFunc = lambda xIn: np.array( [(xIn[0]-4)**2 + xIn[1], (xIn[1]-5)**2 - xIn[2]
                                 , (xIn[2]-7)**3 + xIn[0] ] )
x0 = [3,4,5]
xFinal = fsolve(tmpFunc, x0 )
print xFinal
Which represents solving three equations simultaneously.
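For anyone on Python 3, here is a small sketch of the same idea. The three equations below are a hypothetical system I chose because (4, 5, 7) is a root of it; they are not the equations from the code above.

```python
import numpy as np
from scipy.optimize import fsolve

def equations(v):
    # Hypothetical system: three equations in three unknowns.
    x0, x1, x2 = v
    return np.array([
        x0 + x1 - 9.0,
        x1 + x2 - 12.0,
        x0 * x2 - 28.0,
    ])

x0_guess = [3.0, 4.0, 5.0]
x_final = fsolve(equations, x0_guess)

# The function returns one value per unknown, so shapes match.
print(x_final)
print(equations(x_final))  # should be close to zero at a root
```

Evaluating the function at the returned point is a cheap way to confirm that fsolve actually found a root rather than stalling.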