Linear regression function runs into "nan is not defined" error - python

I am trying to compute the linear regression of a stock price development for a specific time frame. The code runs fine until I add the stats.linregress() function, which gives me the following error:
Traceback (most recent call last):
  File "C:/[...]/PycharmProjects/Portfolio_Algorithm/Main.py", line 3, in <module>
    from scipy import stats
  File "C:[...]\Continuum\Anaconda3\lib\site-packages\scipy\__init__.py", line 61, in <module>
    from numpy import show_config as show_numpy_config
  File "C:[...]\Python\Python35\site-packages\numpy\__init__.py", line 142, in <module>
    from . import add_newdocs
  File "C:[...]\Python\Python35\site-packages\numpy\add_newdocs.py", line 13, in <module>
    from numpy.lib import add_newdoc
  File "C:[...]\Python\Python35\site-packages\numpy\lib\__init__.py", line 8, in <module>
    from .type_check import *
  File "C:[...]\Python\Python35\site-packages\numpy\lib\type_check.py", line 11, in <module>
    import numpy.core.numeric as _nx
  File "C:[...]\Python\Python35\site-packages\numpy\core\__init__.py", line 21, in <module>
    from . import umath
  File "C:[...]\Python\Python35\site-packages\numpy\core\umath.py", line 30, in <module>
    NAN = nan
NameError: name 'nan' is not defined
I am using Python 3.5, Anaconda (for scipy and numpy) and PyCharm.
from yahoo_finance import Share
from math import log
from scipy import stats
yahoo = Share('YHOO')
date_list=[]
price_list=[]
timeframe = (yahoo.get_historical('2016-01-01', '2016-10-29'))
for item in timeframe:
    date_list.extend([item['Date']])
    price_list.extend([log(float(item['Close']))])
slope = stats.linregress(date_list, price_list)
print(slope)
When I run the example of the scipy user guide, I get the same error.
Example (link):
from scipy import stats
np.random.seed(12345678)
x = np.random.random(10)
y = np.random.random(10)
slope, intercept, r_value, p_value, std_err = stats.linregress(x,y)
print("r-squared:", r_value**2)
Does anyone know what could cause the error?

Here's your example, re-written to fix a few issues:
from yahoo_finance import Share
from math import log
from scipy import stats
from time import mktime, strptime
import numpy as np
yahoo = Share('YHOO')
timeframe = yahoo.get_historical('2016-01-01', '2016-10-29')
tpattern = '%Y-%m-%d' # Time-match-pattern
dates = np.zeros(len(timeframe))
prices = np.zeros(len(timeframe))
for ii,item in enumerate(timeframe):
    dates[ii] = mktime(strptime(item['Date'], tpattern))
    prices[ii] = float(item['Close'])
slope = stats.linregress(dates, np.log10(prices))
print(slope)
The get_historical method returns a list of dicts, each containing strings. You need to convert your data to floats to make it useful: the dates become timestamps and the closing prices become numbers. This seems to be the main problem in your example.
Since you are pulling the data at the start and you know how many data points you will analyze, there's no reason to use lists as a data structure; numpy arrays are more efficient. Thus, use the arrays dates and prices rather than the lists.
With numpy arrays, it is more efficient to operate on the entire array of price data to generate the logarithm, rather than doing it one value at a time in the loop.
You probably intended the base-10 logarithm, not the natural logarithm, for your slope.
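As a rough sketch of the conversion and the vectorized logarithm described above, here is a version that runs without the Yahoo service. The records list is made-up data shaped like the list of dicts that get_historical is described as returning. calendar.timegm is swapped in for mktime so the result does not depend on the local time zone, and np.polyfit stands in for stats.linregress so that only numpy is needed. All three are assumptions for illustration.

```python
from calendar import timegm
from time import strptime

import numpy as np

# Hypothetical records: strings only, like the list of dicts described above
records = [
    {'Date': '2016-01-04', 'Close': '10.0'},
    {'Date': '2016-01-05', 'Close': '100.0'},
    {'Date': '2016-01-06', 'Close': '1000.0'},
]

tpattern = '%Y-%m-%d'  # time-match pattern

# Convert the date strings to float timestamps (seconds, UTC)
dates = np.array([timegm(strptime(r['Date'], tpattern)) for r in records],
                 dtype=float)
# Convert the price strings to floats
prices = np.array([float(r['Close']) for r in records])

# Express time as days since the first sample to keep the numbers small
days = (dates - dates[0]) / 86400.0

# Take the logarithm of the whole array at once, then fit a straight line
log_prices = np.log10(prices)
slope, intercept = np.polyfit(days, log_prices, 1)

print(slope, intercept)
```

With real data you would pass days and log_prices to stats.linregress in the same way.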

Related

Two-way Repeated Measures ANOVA in Spyder-TypeError: list indices must be integers or slices, not numpy.float64

I just started to learn to code and wanted to learn python. I am attempting to recreate an SPSS statistical analysis I already conducted on Spyder. I am doing this by replicating an example: http://www.statsmodels.org/0.6.1/examples/notebooks/generated/interactions_anova.html
My analysis is slightly smaller but quite similar. I am following the example step by step, and I am having trouble with the "Take a look at the data:" step.
My work is a 2x2 Repeated measure ANOVA. The IV is MATCH (whether the participant's preferred lighting condition was utilized or not) with two conditions. The DV is pre/post-test scores on a learning objective.
I am receiving the error:
  File "C:\Users\Tim\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 101, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/Tim/.spyder-py3/thesis.py", line 31, in <module>
    plt.scatter(group['MATCH'], marker=symbols[j], color=colors[i-k],
TypeError: list indices must be integers or slices, not numpy.float64
<matplotlib.figure.Figure at 0x278c15ea6d8>
My code:
from __future__ import print_function
from statsmodels.compat import urlopen
import numpy as np
np.set_printoptions(precision=4, suppress=True)
import statsmodels.api as sm
import pandas as pd
pd.set_option("display.width", 100)
import matplotlib.pyplot as plt
from statsmodels.formula.api import ols
from statsmodels.graphics.api import interaction_plot, abline_plot
from statsmodels.stats.anova import anova_lm
data = r'C:\Users\Tim\pandas\Thesis_main.csv'
data = pd.read_csv(data)
plt.figure(figsize=(6,6))
symbols = ['D', '^']
colors = ['r', 'g', 'blue']
factor_groups = data.groupby(['MATCH'])
for values, group in factor_groups:
    i,j = values
    plt.scatter(group['PRETEST'], group['POSTTEST'], marker=symbols[j], color=colors[i-1], s=144)
plt.xlabel('MATCH');
plt.ylabel('PRETEST');('POSTTEST');
Data:
https://github.com/tici0988/Sorting_contacts/blob/master/Thesis_main.csv
Any advice on solving this error, or pointing me in a more efficient direction would be greatly appreciated! Thank you :)
There are a couple of issues with your code. The first is that you are trying to call plt.scatter with only an x argument. What are you trying to plot group['MATCH'] against?
Next, you are trying to index your list symbols and/or your list colors with a float, which is not possible. I believe the float you are using is the PRETEST or POSTTEST score (represented by i and k in your code). I can't see the data, but let's assume the score is a number such as 1.25; you can't select index 1.25 in your list of 2 symbols, as that doesn't mean anything to Python. Are you trying to use different symbols and colors to represent different things? If so, what should they represent? If not, simply take out the marker=symbols[j] and color=colors[i-k] arguments.
FYI, in your code, j is not defined; you must have meant either i or k when you typed symbols[j].
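To illustrate the indexing problem, here is a small sketch. It assumes the group key comes back as a numpy float such as 1.0, which is a guess since I can't see the data. The key variable is hypothetical.

```python
import numpy as np

symbols = ['D', '^']

# Hypothetical group key: a numeric column often yields numpy floats
key = np.float64(1.0)

try:
    symbols[key]          # a float cannot be used as a list index
    failed = False
except TypeError as err:
    failed = True
    print(err)

# Converting to int first works, provided the value is a whole number
marker = symbols[int(key)]
print(marker)
```

This only makes sense if the key really is a whole number that identifies a category. A score such as 1.25 would be silently truncated by int().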

Using power results in ValueError: a <= 0

I have written the following code but it fails with a ValueError.
from numpy import *
from pylab import *
t = arange(-10, 10, 20/(1001-1))
x = 1./sqrt(2*pi)*exp(power(-(t*t), 2))
Specifically, the error message I'm receiving is:
Traceback (most recent call last):
  File "D:\WinPython-64bit-3.4.4.3Qt5\notebooks\untitled1.py", line 6, in <module>
    x = 1./sqrt(2*pi)*exp(power(-(t*t), 2))
  File "mtrand.pyx", line 3214, in mtrand.RandomState.power (numpy\random\mtrand\mtrand.c:24592)
ValueError: a <= 0
Any idea what the issue might be here?
Both numpy and pylab define a function called power, but they are completely different. Because you imported pylab after numpy using import *, the pylab version is the one you end up with. What is pylab.power? From the docstring:
power(a, size=None)
Draws samples in [0, 1] from a power distribution with positive exponent a - 1.
The moral of the story: don't use import *. In this case, it is common to use import numpy as np:
import numpy as np
t = np.arange(-10, 10, 20/(1001-1))
x = 1./np.sqrt(2*np.pi)*np.exp(np.power(-(t*t), 2))
Further reading:
Why is "import *" bad?
Idioms and Anti-Idioms in Python (That's in the Python 2 documentation, but it also applies to Python 3.)
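A quick way to see the difference between the two functions, using only numpy. The traceback in the question points at mtrand.RandomState.power, so this sketch assumes the power that pylab exposes is the same one available as np.random.power.

```python
import numpy as np

# Element-wise power: (-4.0) ** 2
elementwise = np.power(-4.0, 2)
print(elementwise)

# Random sampling: "a" must be positive, so a negative first argument fails
try:
    np.random.power(-4.0, 2)
    sampled = True
except ValueError as err:
    sampled = False
    print("ValueError:", err)

# Two different objects that happen to share a name
print(np.power is np.random.power)
```

With explicit namespaces it is always visible which of the two you are calling.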

ValueError loading data for scipy.odr regression

I recently tried to use scipy.odr package to conduct a regression analysis. Whenever I try to load a list of data where the elements depend on a function, a value error is raised:
ValueError: x could not be made into a suitable array
I have been using the same kind of programming to make fits using scipy's leastsq and curve_fit routines without problems.
Any idea of what to change and how to proceed? Thanks a lot...
Here I include a minimal working example:
from scipy import odr
from functools import partial
import numpy as np
import matplotlib.pyplot as plt
### With select=0, myModel is given a list of elements that are functions of some parameters;
### this results in the error message: ValueError: x could not be made into a suitable array
### With select=1, the function temp is excluded and a fit is generated.
### What do I have to do in order to run the program successfully using select=0?
## choose here!
select=1
pfit=[1.0,1.0]
q0=[1,2,3,4,5]
q1=[3,8,10,19,27]
def temp(par, val):
    p1,p2=par
    temp_out = p1*val**p2
    return temp_out
def fodr(a,x):
    if select==0:
        fitf = np.array([xi(a) for xi in x])
    else:
        fitf= a[0]*x**a[1]
    return fitf
# define model
myModel = odr.Model(fodr)
# load data
damy=q1
if select==0:
    damx=[]
    for el in q0:
        elm=partial(temp,val=el)
        damx.append(elm)
    #damx=[el(pfit) for el in damx] # check that function temp works fine
    #print damx
else:
    damx=q0
myData = odr.Data(damx, damy)
myOdr = odr.ODR(myData, myModel , beta0=pfit, maxit=100, ifixb=[1,1])
out = myOdr.run()
out.pprint()
Edit:
@Robert:
Thanks for your reply. I am using scipy version '0.14.0'. Using select==0 in my minimal example I get following traceback:
Traceback (most recent call last):
  File "scipy-odr.py", line 48, in <module>
    out = myOdr.run()
  File "/home/tg/anaconda/lib/python2.7/site-packages/scipy/odr/odrpack.py", line 1061, in run
    self.output = Output(odr(*args, **kwds))
ValueError: x could not be made into a suitable array
In short, your code does not work because damx is now a list of functools.partial objects.
scipy.odr is a simple wrapper around the Fortran Orthogonal Distance Regression package (ODRPACK). Both xdata and ydata have to be numerical, since they will be converted to some Fortran type under the hood. It doesn't know what to do with a list of functools.partial objects, hence the error.
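The difference is easy to see by looking at what numpy makes of the two kinds of list. This sketch reuses temp, q0 and pfit from the question. Evaluating the partials at pfit before building the data is just one possible way to get numbers, not necessarily what you want for your fit.

```python
from functools import partial

import numpy as np

def temp(par, val):
    p1, p2 = par
    return p1 * val**p2

pfit = [1.0, 1.0]
q0 = [1, 2, 3, 4, 5]

# What the select=0 branch builds: a list of callables
callables = [partial(temp, val=el) for el in q0]
as_objects = np.asarray(callables)
print(as_objects.dtype)   # generic Python objects, not numbers

# Evaluating each callable first gives a plain float array
numeric_x = np.array([f(pfit) for f in callables], dtype=float)
print(numeric_x.dtype, numeric_x)
```

Only something like numeric_x can be converted to a Fortran array, so that is the kind of input odr.Data needs.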

Plotting random numbers in Python

I'm trying to generate and plot random numbers using:
from numpy import random
import matplotlib.pyplot as plt
z = 15 + 2*random.randn(200) #200 elements, normal dist with mean = 15, sd = 2
plt.plot(z)
plt.show(z)
The graph is plotted, but Python (2.7.5) freezes and I get the error
Traceback (most recent call last):
  File "G:\Stage 2 expt\e298\q1.py", line 25, in <module>
    plt.show(z)
  File "C:\Python27\lib\site-packages\matplotlib\pyplot.py", line 145, in show
    _show(*args, **kw)
  File "C:\Python27\lib\site-packages\matplotlib\backend_bases.py", line 90, in __call__
    if block:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
It's completely fine when I do a for loop like so:
from numpy import random
from pylab import plot,show
yvec = [] # set up an empty vector
for i in range(200): # want 200 numbers
    yy = 25 + 3*random.randn() # normal dist with mean = 25, sd = 3
    yvec.append(yy) # enter yy into vector
plot(yvec)
show(yvec)
Could someone please clarify?
The function pylab.show does not take a list or array; it takes an optional boolean (and certainly not your data array). The numpy array in the first example can't be implicitly converted to a boolean, so an error is raised. The list in the second example can be converted to a boolean, and it evaluates to True if it is non-empty.
To fix it, just call show without any arguments.
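The difference between the two cases can be reproduced without any plotting. The traceback shows the argument ending up in an `if block:` check, which amounts to calling bool() on it. The short arrays here are made-up stand-ins for your data.

```python
import numpy as np

z = np.array([15.2, 14.1, 16.7])   # stand-in for the numpy array
yvec = [15.2, 14.1, 16.7]          # stand-in for the plain list

# A multi-element array refuses to be reduced to a single True/False
try:
    bool(z)
    array_ok = True
except ValueError as err:
    array_ok = False
    print("ValueError:", err)

# A non-empty list is simply truthy
list_ok = bool(yvec)
print(list_ok)
```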

Interpolate Question

import re
from decimal import *
import numpy
from scipy.signal import cspline1d, cspline1d_eval
import scipy.interpolate
import scipy
import math
import numpy
from scipy import interpolate
Y1 =[0.48960000000000004, 0.52736099999999997, 0.56413900000000006, 0.60200199999999993, 0.64071400000000001, 0.67668399999999995, 0.71315899999999999, 0.75050499999999998, 0.61494199999999999, 0.66246900000000009]
X1 =[0.024, 0.026000000000000002, 0.028000000000000004, 0.029999999999999999, 0.032000000000000001, 0.034000000000000002, 0.035999999999999997, 0.038000000000000006, 0.029999999999999999, 0.032500000000000001]
rep = scipy.interpolate.splrep(X1,Y1)
With the above code I am getting the following error:
Traceback (most recent call last):
  File "/home/vibhor/Desktop/timing_tool/timing/interpolation_cap.py", line 64, in <module>
    rep = scipy.interpolate.splrep(X1,Y1)
  File "/usr/lib/python2.6/site-packages/scipy/interpolate/fitpack.py", line 418, in splrep
    raise _iermess[ier][1],_iermess[ier][0]
ValueError: Error on input data
I don't know what is happening.
I believe the error is due to the X1 values not being ordered from smallest to largest, and you also have one duplicate x point. In other words, you need to sort the X1 and Y1 values and remove the duplicates before you can use splrep.
From the docs, splrep seems to be low-level access to the FITPACK library, which expects a sorted list without duplicates; that's why it returns an error.
interpolate.interp1d might seem to work, but have you actually tried to use it to find a new point? I think you'll find an error when you call it, e.g. rep(2).
The X value 0.029999999999999999 occurs twice, with two different Y coordinates. It wouldn't surprise me if that caused a problem when trying to fit a polynomial spline segment.
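One possible way to do the sorting and duplicate removal with numpy is sketched below. Keeping the first Y value for the repeated X is an arbitrary choice made here for illustration; averaging the two values would be another option. The names x_clean and y_clean are made up.

```python
import numpy as np

Y1 = [0.48960000000000004, 0.52736099999999997, 0.56413900000000006,
      0.60200199999999993, 0.64071400000000001, 0.67668399999999995,
      0.71315899999999999, 0.75050499999999998, 0.61494199999999999,
      0.66246900000000009]
X1 = [0.024, 0.026000000000000002, 0.028000000000000004,
      0.029999999999999999, 0.032000000000000001, 0.034000000000000002,
      0.035999999999999997, 0.038000000000000006, 0.029999999999999999,
      0.032500000000000001]

x = np.asarray(X1)
y = np.asarray(Y1)

# np.unique returns the sorted unique x values together with the index
# of the first occurrence of each one in the original array
x_clean, first_idx = np.unique(x, return_index=True)
y_clean = y[first_idx]

print(len(x_clean))
print(np.all(np.diff(x_clean) > 0))   # strictly increasing
```

The cleaned arrays should then be acceptable input for scipy.interpolate.splrep(x_clean, y_clean).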
