import re
from decimal import *
import numpy
from scipy.signal import cspline1d, cspline1d_eval
import scipy.interpolate
import scipy
import math
from scipy import interpolate
Y1 =[0.48960000000000004, 0.52736099999999997, 0.56413900000000006, 0.60200199999999993, 0.64071400000000001, 0.67668399999999995, 0.71315899999999999, 0.75050499999999998, 0.61494199999999999, 0.66246900000000009]
X1 =[0.024, 0.026000000000000002, 0.028000000000000004, 0.029999999999999999, 0.032000000000000001, 0.034000000000000002, 0.035999999999999997, 0.038000000000000006, 0.029999999999999999, 0.032500000000000001]
rep = scipy.interpolate.splrep(X1,Y1)
In the above code I am getting an error:
Traceback (most recent call last):
File "/home/vibhor/Desktop/timing_tool/timing/interpolation_cap.py", line 64, in <module>
rep = scipy.interpolate.splrep(X1,Y1)
File "/usr/lib/python2.6/site-packages/scipy/interpolate/fitpack.py", line 418, in splrep
raise _iermess[ier][1],_iermess[ier][0]
ValueError: Error on input data
I don't know what is happening.
I believe it's due to the X1 values not being ordered from smallest to largest; you also have one duplicate x point. In other words, you need to sort X1 (reordering Y1 to match) and remove the duplicates before you can use splrep.
According to the docs, splrep is low-level access to the FITPACK library, which expects a sorted list without duplicates; that's why it returns an error.
interpolate.interp1d might seem to work, but have you actually tried to use it to find a new point? I think you'll get an error when you call it, e.g. rep(2).
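For example, here's a minimal sketch (using the X1/Y1 lists from your question) of sorting and dropping the duplicate x before calling splrep; the evaluation point is just illustrative:
import numpy as np
from scipy import interpolate

x = np.asarray(X1)
y = np.asarray(Y1)
# np.unique gives the sorted unique x values plus the index of each value's
# first occurrence, which we use to pick the matching y values
x_u, idx = np.unique(x, return_index=True)
y_u = y[idx]

rep = interpolate.splrep(x_u, y_u)
print(interpolate.splev(0.031, rep))  # evaluate the fitted spline at a new point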
The X value 0.029999999999999999 occurs twice, with two different Y coordinates. It wouldn't surprise me if that caused a problem trying to fit a polynomial spline segment...
Related
I just started to learn to code and wanted to learn Python. I am attempting to recreate, in Spyder, an SPSS statistical analysis I already conducted. I am doing this by replicating an example: http://www.statsmodels.org/0.6.1/examples/notebooks/generated/interactions_anova.html
My analysis is slightly smaller but quite similar. I am following the example step by step, and I am having trouble with the "Take a look at the data:" step.
My work is a 2x2 Repeated measure ANOVA. The IV is MATCH (whether the participant's preferred lighting condition was utilized or not) with two conditions. The DV is pre/post-test scores on a learning objective.
I am receiving the error:
File "C:\Users\Tim\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 101, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/Tim/.spyder-py3/thesis.py", line 31, in <module>
plt.scatter(group['MATCH'], marker=symbols[j], color=colors[i-k],
TypeError: list indices must be integers or slices, not numpy.float64
<matplotlib.figure.Figure at 0x278c15ea6d8>
My code:
from __future__ import print_function
from statsmodels.compat import urlopen
import numpy as np
np.set_printoptions(precision=4, suppress=True)
import statsmodels.api as sm
import pandas as pd
pd.set_option("display.width", 100)
import matplotlib.pyplot as plt
from statsmodels.formula.api import ols
from statsmodels.graphics.api import interaction_plot, abline_plot
from statsmodels.stats.anova import anova_lm
data = r'C:\Users\Tim\pandas\Thesis_main.csv'
data = pd.read_csv(data)
plt.figure(figsize=(6,6))
symbols = ['D', '^']
colors = ['r', 'g', 'blue']
factor_groups = data.groupby(['MATCH'])
for values, group in factor_groups:
    i,j = values
    plt.scatter(group['PRETEST'], group['POSTTEST'] marker=symbols[j], color=colors[i-1], s=144)
plt.xlabel('MATCH');
plt.ylabel('PRETEST');('POSTTEST');
Data:
https://github.com/tici0988/Sorting_contacts/blob/master/Thesis_main.csv
Any advice on solving this error, or pointing me in a more efficient direction would be greatly appreciated! Thank you :)
There are a couple of issues with your code. The first is that you are trying to call plt.scatter with only an x argument. What are you trying to plot group['MATCH'] against?
Next, you are trying to index your list symbols and/or your list colors with a float, which is not possible. I believe the float you are using is the PRETEST or POSTTEST score (represented by i and k in your code). I can't see the data, but suppose that score is a number such as 1.25; you can't select index 1.25 from your list of 2 symbols, as that doesn't mean anything to Python. Are you trying to use different symbols and colors to represent different things? If so, to represent what? If not, simply take out the marker=symbols[j] and color=colors[i-k] arguments.
FYI, in your code j is not defined; you must mean either i or k where you typed symbols[j].
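For illustration, here's a minimal sketch of what I think you're after, assuming MATCH is coded as the integers 1 and 2 and that you want PRETEST on the x-axis and POSTTEST on the y-axis (I can't see your data, so treat the coding as an assumption):
import pandas as pd
import matplotlib.pyplot as plt

symbols = ['D', '^']
colors = ['r', 'g']

data = pd.read_csv(r'C:\Users\Tim\pandas\Thesis_main.csv')
for match_value, group in data.groupby('MATCH'):
    idx = int(match_value) - 1  # force an integer index into the two lists
    plt.scatter(group['PRETEST'], group['POSTTEST'],
                marker=symbols[idx], color=colors[idx], s=144,
                label='MATCH = %s' % match_value)
plt.xlabel('PRETEST')
plt.ylabel('POSTTEST')
plt.legend()
plt.show()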
I'm trying to code my own logistic regression and compare different methods of maximizing the log-likelihood. Using the Newton-CG method, I get the error message "ValueError: setting an array element with a sequence". Reading around, it seems this error arises if the function being minimized returns a non-scalar, but that is not the case here. I need the three methods given below to give (approximately) the same result, but when running on my real data one does not converge, another gives a worse LL than the initial guess, and the third does not run at all.
Why do I get the ValueError message and how can I fix it?
My code (with dummy data, the real data is ~100 measurements) is as follows:
import numpy as np
from numpy import linalg
import scipy
from scipy.optimize import minimize
def CalcLL(beta,xinlist,yinlist):
    LL=0.0
    ncol=len(beta)
    pi=FindPi(xinlist,beta.reshape(ncol,1))
    for i in range(len(yinlist)):
        LL=LL+np.where(yinlist[i]==1,np.log(pi[i]),np.log(1-pi[i]))
    return -LL

def Jacobian(beta,xinlist,yinlist):
    ncol=len(beta)
    nrow=np.shape(xinlist)[0]
    pi=FindPi(xinlist,beta.reshape(ncol,1))
    Jac=np.transpose(np.matrix(yinlist-pi))*np.matrix(xinlist)
    return Jac

def Hessian(beta,xinlist,yinlist):
    ncol=len(beta)
    nrow=np.shape(xinlist)[0]
    pi=FindPi(xinlist,beta.reshape(ncol,1))
    W=FindW(pi)
    Hes=np.matrix(np.transpose(xinlist))*(np.matrix(W)*np.matrix(xinlist))
    return Hes

def FindPi(xinlist,beta):
    rows=np.shape(xinlist)[0]  # Number of rows in x_new
    cols=np.shape(xinlist)[1]  # Number of columns in x_new
    expon=np.dot(xinlist,beta)
    expon=np.array(expon).reshape(rows,1)
    pi=np.exp(expon)/(1+np.exp(expon))
    return pi

def FindW(pi):
    W=np.zeros(len(pi)*len(pi)).reshape(len(pi),len(pi))
    for i in range(len(pi)):
        W[i,i]=float(pi[i]*(1-pi[i]))
    return W
xinlist=np.matrix([[1,1],[0,1],[1,1],[1,1],[1,1],[0,1],[0,1],[1,1],[1,1],[0,1]])
yinlist=np.transpose(np.matrix([0,0,0,0,0,1,1,1,1,1]))
ncol=np.shape(xinlist)[1]
beta1=np.zeros(ncol).reshape(ncol,1) # Initial guess for parameter values
limit=0.000001 # self-written Newton-Raphson method
iter_i=limit+1
while iter_i>limit:
    Hes=Hessian(beta1,xinlist,yinlist)
    Jac=np.transpose(Jacobian(beta1,xinlist,yinlist))
    root_diff=np.array(linalg.inv(Hes)*Jac)
    beta1=beta1+root_diff
    iter_i=np.sum(root_diff*root_diff)
print "When running self-written algorithm, the log-likelihood is",-CalcLL(beta1,xinlist,yinlist)
beta2=np.zeros(ncol).reshape(ncol,1)
res=minimize(CalcLL,beta2,args=(xinlist,yinlist),method='Nelder-Mead',options={'xtol':1e-8,'disp':True,'maxiter':10000})
beta2=res.x
print "The log-likelihood using Nelder-Mead is", -CalcLL(beta2,xinlist,yinlist)
beta3=np.zeros(ncol).reshape(ncol,1)
res=minimize(CalcLL,beta3,args=(xinlist,yinlist),method='Newton-CG',jac=Jacobian,hess=Hes,options={'xtol':1e-8,'disp':True})
beta3=res.x
print "The log-likelihood using Newton-CG is", -CalcLL(beta3,xinlist,yinlist)
EDIT:
The error stack is as follows:
Traceback (most recent call last):
File "MyLogisticRegression2.py", line 62, in
res=minimize(CalcLL,beta3,args=(xinlist,yinlist),method='Newton-CG',jac=Jacobian,hess=Hes,options={'xtol':1e-8,'disp':True})
File C:\Python27\lib\site-packages\scipy\optimize_minimize.py, line 447, in minimize **options)
File C:\Python27\lib\site-packages\scipy\optimize\optimize.py, line 2393, in _minimize_newtoncg eta=numpy.min([0.5, numpy.sqrt(maggrad)])
File C:\Python27\lib\site-packages\numpy\core\fromnumeric.py, line 2393, in amin out=out, **kwargs)
File C:\Python27\lib\site-packages\numpy\core_methods.py, line 29, in _amin return umr_minimum(a,axis,None,out,keepdims)
ValueError: setting an array element with a sequence
I found out the problem arose from the beta arrays having shape (2,1) instead of (2,), and likewise for the Jacobian. Reshaping these two solved the problem.
Apparently the Newton-CG solver needs 1-D arrays for the parameters and the Jacobian.
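For reference, a minimal sketch of that fix against the dummy data above. Two things here are my own assumptions about the intent: the Jacobian is negated so it matches CalcLL (which returns -LL), and the Hessian callable is passed instead of the precomputed matrix Hes:
def Jacobian1d(beta, xinlist, yinlist):
    # gradient of -LL as a flat (ncol,) array, which Newton-CG expects
    pi = FindPi(xinlist, beta.reshape(len(beta), 1))
    jac = np.transpose(np.matrix(yinlist - pi)) * np.matrix(xinlist)
    return -np.asarray(jac).ravel()

beta3 = np.zeros(ncol)  # 1-D initial guess, shape (ncol,) rather than (ncol,1)
res = minimize(CalcLL, beta3, args=(xinlist, yinlist), method='Newton-CG',
               jac=Jacobian1d, hess=Hessian,
               options={'xtol': 1e-8, 'disp': True})
print "The log-likelihood using Newton-CG is", -CalcLL(res.x, xinlist, yinlist)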
I am trying to compute the linear regression of a stock price development for a specific time frame. The code runs fine until I add the stats.linregress() function, which gives me the following error:
Traceback (most recent call last):
  File "C:/[...]/PycharmProjects/Portfolio_Algorithm/Main.py", line 3, in <module>
    from scipy import stats
  File "C:[...]\Continuum\Anaconda3\lib\site-packages\scipy\__init__.py", line 61, in <module>
    from numpy import show_config as show_numpy_config
  File "C:[...]\Python\Python35\site-packages\numpy\__init__.py", line 142, in <module>
    from . import add_newdocs
  File "C:[...]\Python\Python35\site-packages\numpy\add_newdocs.py", line 13, in <module>
    from numpy.lib import add_newdoc
  File "C:[...]\Python\Python35\site-packages\numpy\lib\__init__.py", line 8, in <module>
    from .type_check import *
  File "C:[...]\Python\Python35\site-packages\numpy\lib\type_check.py", line 11, in <module>
    import numpy.core.numeric as _nx
  File "C:[...]\Python\Python35\site-packages\numpy\core\__init__.py", line 21, in <module>
    from . import umath
  File "C:[...]\Python\Python35\site-packages\numpy\core\umath.py", line 30, in <module>
    NAN = nan
NameError: name 'nan' is not defined
I am using Python 3.5, Anaconda (for scipy and numpy) and PyCharm.
from yahoo_finance import Share
from math import log
from scipy import stats
yahoo = Share('YHOO')
date_list=[]
price_list=[]
timeframe = (yahoo.get_historical('2016-01-01', '2016-10-29'))
for item in timeframe:
    date_list.extend([item['Date']])
    price_list.extend([log(float(item['Close']))])
slope = stats.linregress(date_list, price_list)
print(slope)
When I run the example from the scipy user guide, I get the same error.
Example (link):
from scipy import stats
import numpy as np
np.random.seed(12345678)
x = np.random.random(10)
y = np.random.random(10)
slope, intercept, r_value, p_value, std_err = stats.linregress(x,y)
print("r-squared:", r_value**2)
Does anyone know what could cause the error?
Here's your example, re-written to fix a few issues:
from yahoo_finance import Share
from math import log
from scipy import stats
from time import mktime, strptime
import numpy as np
yahoo = Share('YHOO')
timeframe = yahoo.get_historical('2016-01-01', '2016-10-29')
tpattern = '%Y-%m-%d' # Time-match-pattern
dates = np.zeros(len(timeframe))
prices = np.zeros(len(timeframe))
for ii,item in enumerate(timeframe):
    dates[ii] = mktime(strptime(item['Date'], tpattern))
    prices[ii] = float(item['Close'])
slope = stats.linregress(dates, np.log10(prices))
print(slope)
The get_historical method returns a list of dicts, each containing strings. You need to convert your data to floats to make it useful; this seems to be the main problem in your example.
Since you are pulling the data at the start and you know how many data points you will analyze, there's no reason to use lists as a data structure; numpy arrays are more efficient. Thus, use the dates and prices arrays rather than the lists.
With numpy arrays, it is also more efficient to take the logarithm of the entire array of price data at once, rather than doing it one element at a time in the loop.
You probably intended the base-10 logarithm, not the natural logarithm, for your slope.
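As a small follow-up sketch: since the dates end up as Unix timestamps in seconds, you can unpack the result the same way the docs example does and rescale the slope to a per-day figure (variable names are illustrative):
slope, intercept, r_value, p_value, std_err = stats.linregress(dates, np.log10(prices))
print("r-squared:", r_value**2)
print("slope per day:", slope * 86400)  # mktime gives seconds, so scale up to days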
I recently tried to use the scipy.odr package to conduct a regression analysis. Whenever I try to load a list of data whose elements depend on a function, a ValueError is raised:
ValueError: x could not be made into a suitable array
I have been using the same kind of programming to make fits using scipy's leastsq and curve_fit routines without problems.
Any idea of what to change and how to proceed? Thanks a lot...
Here I include a minimal working example:
from scipy import odr
from functools import partial
import numpy as np
import matplotlib.pyplot as plt
### choose select=0: myModel is given a list of elements that are functions of some parameters
### this results in the error message: ValueError: x could not be made into a suitable array
### choose select=1: the function temp is excluded, and a fit is generated
### what do I have to do in order to run the program successfully using select=0?
## choose here!
select=1
pfit=[1.0,1.0]
q0=[1,2,3,4,5]
q1=[3,8,10,19,27]
def temp(par, val):
    p1,p2=par
    temp_out = p1*val**p2
    return temp_out

def fodr(a,x):
    if select==0:
        fitf = np.array([xi(a) for xi in x])
    else:
        fitf= a[0]*x**a[1]
    return fitf
# define model
myModel = odr.Model(fodr)
# load data
damy=q1
if select==0:
    damx=[]
    for el in q0:
        elm=partial(temp,val=el)
        damx.append(elm)
    #damx=[el(pfit) for el in damx] # check that function temp works fine
    #print damx
else:
    damx=q0
myData = odr.Data(damx, damy)
myOdr = odr.ODR(myData, myModel , beta0=pfit, maxit=100, ifixb=[1,1])
out = myOdr.run()
out.pprint()
Edit:
# Robert:
Thanks for your reply. I am using scipy version '0.14.0'. Using select==0 in my minimal example, I get the following traceback:
Traceback (most recent call last):
File "scipy-odr.py", line 48, in <module>
out = myOdr.run()
File "/home/tg/anaconda/lib/python2.7/site-packages/scipy/odr/odrpack.py", line 1061, in run
self.output = Output(odr(*args, **kwds))
ValueError: x could not be made into a suitable array
In short, your code does not work because damx is now a list of functools.partial objects.
scipy.odr is a thin wrapper around ODRPACK, a Fortran orthogonal distance regression library; both xdata and ydata have to be numerical, since they are converted to Fortran types under the hood. It doesn't know what to do with a list of functools.partial objects, hence the error.
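For comparison, a minimal sketch of the numeric input ODRPACK does accept (essentially your select=1 branch, with the data converted to plain float arrays):
damx = np.asarray(q0, dtype=float)
damy = np.asarray(q1, dtype=float)
myData = odr.Data(damx, damy)
myOdr = odr.ODR(myData, odr.Model(lambda a, x: a[0] * x**a[1]), beta0=pfit, maxit=100)
out = myOdr.run()
out.pprint()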
I have tried to find the answer to this question; maybe it's very easy and that's why I can't.
If I have made a Gaussian function and I want to plot it with matplotlib.pyplot.plot, how can I do that with float values, i.e. values from -20 <= x <= 20 in increments of 0.1?
import matplotlib.pyplot as plt
import math
from math import exp
import numpy
#Parameters for the Gaussian
A=1
c=10
t=0
a=1
x=[]
p=-20.
while p<=20:
    x.append(p)
    p+=0.1
def Gaussian(A,c,t,a,x):
    return A*exp(-((c*t-x)^2 /(4*a*c^2)))
plt.plot(x,Gaussian(A,c,t,a,x))
plt.show()
The error I get is:
Traceback (most recent call last):
File "C:--------/Gaussian Function.py", line 21, in <module>
plt.plot(x,Gaussian(A,c,t,a,x))
File "C:--------/Gaussian Function.py", line 19, in Gaussian
return A*exp(-((c*t-x)^2 /(4*a*c^2)))
TypeError: unsupported operand type(s) for -: 'int' and 'list'
The problem has nothing to do with matplotlib. You will get the same error if you just call Gaussian(A, c, t, a, x) without using matplotlib at all. Your function accepts an argument x that is a list, and then tries to do stuff like c*t-x. You can't subtract a list from a number. As the error message suggests, you should probably make x a numpy array, which will allow you to do these kinds of vectorized operations on it.
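A sketch of what that fix looks like with your variables (note that numpy uses ** for powers; ^ is bitwise XOR):
import numpy as np

x = np.asarray(x)  # turn the list into an array so the arithmetic is element-wise
y = A * np.exp(-((c*t - x)**2 / (4*a*c**2)))
plt.plot(x, y)
plt.show()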
There are some mistakes in your code. The corrected version is below:
import matplotlib.pyplot as plt
import numpy as np
#Parameters for the Gaussian
A, c, t, a = 1, 10, 0, 1
x = np.arange(-20,20,0.1) #use this instead
def Gaussian(A,c,t,a,x):
    return A*np.exp(-((c*t-x)**2/(4*a*c**2))) # power in Python is ** not ^
plt.plot(x,Gaussian(A,c,t,a,x))
plt.show()
and the result is the plotted Gaussian curve.