AttributeError - Even though there seem to be no attribute error - python

I am currently learning how to use Python for machine learning. While I was working through an example, the interpreter raised an AttributeError, but I do not see any problem in the code. Can someone help me fix this error?
My Code:
import pandas as pd
import quandl, math
import numpy as np
import datetime
import matplotlib.pyplot as plt
from matplotlib import style
from sklearn import preprocessing, cross_validation, svm
from sklearn.linear_model import LinearRegression
style.use('ggplot')
quandl.ApiConfig.api_key = ''
df = quandl.get('EOD/V', api_key = '')
df = df[['Adj_Open','Adj_High','Adj_Low','Adj_Close','Adj_Volume',]]
df['ML_PCT'] = (df['Adj_High'] - df['Adj_Close']) / df['Adj_Close'] * 100.0
df['PCT_change'] = (df['Adj_Close'] - df['Adj_Open']) / df['Adj_Open'] * 100.0
df = df[['Adj_Close', 'ML_PCT', 'PCT_change', 'Adj_Volume']]
forecast_col = 'Adj_Close'
df.fillna(value=-99999, inplace=True)
forecast_out = int(math.ceil(0.01 * len(df)))
df['label'] = df[forecast_col].shift(-forecast_out)
X = np.array(df.drop(['label'], 1))
X = preprocessing.scale(X)
X = X[:-forecast_out]
df.dropna(inplace=True)
y = np.array(df['label'])
X_train, X_test, y_train, y_test = cross_validation.train_test_split(X, y, test_size=0.2)
clf = LinearRegression(n_jobs=-1)
clf.fit(X_train, y_train)
confidence = clf.score(X_test, y_test)
print(confidence)
X_lately = X[-forecast_out:]
forecast_set = clf.predict(X_lately)
print(forecast_set, confidence, forecast_out)
df['Forecast'] = np.nan
last_date = df.iloc[-1].name
last_unix = last_date.timestamp()
one_day = 86400
next_unix = last_unix + one_day
for i in forecast_set:
    next_date = datetime.datetime.fromtimestamp(next_unix)
    next_unix += 86400
    df.loc[next_date] = [np.nan for _ in range(len(df.columns)-1)]+[i]
df['Adj_Close'].plot()
df['Forecast'].plot()
plt.legend(loc = 4)
plt.xlabel('Date')
plt.ylabel('Price')
plt.show()
Error:
C:\Python27\lib\site-packages\sklearn\cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
0.989124557421
(array([ 94.46383723, 93.27713267, 93.15533011, 93.89038799,
94.71390166, 95.29332756, 96.23047821, 96.51527839,
96.17180986, 96.17575181, 96.68721678, 96.85114045,
97.57455941, 97.98680762, 97.32961443, 97.55881174,
97.54090546, 96.17175855, 94.95430597, 96.49002102,
96.82364097, 95.63098589, 95.61236103, 96.24114818]), 0.98912455742140903, 24)
Traceback (most recent call last):
File "C:\Users\qasim\Documents\python_machine_learning\regression.py", line 47, in <module>
last_unix = last_date.timestamp()
AttributeError: 'Timestamp' object has no attribute 'timestamp'
[Finished in 36.6s]

The issue is that last_date is a pandas Timestamp object, and here it has no timestamp() method: the traceback shows Python 2.7, and datetime.timestamp() was only added in Python 3.3. It does have timetuple(), just like datetime, though.
Assuming last_date is in UTC, use this:
import calendar
...
last_date = df.iloc[-1].name
last_unix = calendar.timegm(last_date.timetuple())
If last_date is in your local timezone, use this:
import time
...
last_date = df.iloc[-1].name
last_unix = time.mktime(last_date.timetuple())
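The difference between the two conversions can be tried out with a plain datetime, since a pandas Timestamp exposes the same timetuple() method. This is a minimal sketch using only the standard library; the helper names to_unix_utc and to_unix_local and the sample date are made up for illustration.

```python
import calendar
import datetime
import time


def to_unix_utc(dt):
    """Treat a naive datetime as UTC and return seconds since the epoch."""
    return calendar.timegm(dt.timetuple())


def to_unix_local(dt):
    """Treat a naive datetime as local time and return seconds since the epoch."""
    return time.mktime(dt.timetuple())


# Hypothetical stand-in for df.iloc[-1].name
last_date = datetime.datetime(2017, 6, 1)

last_unix = to_unix_utc(last_date)
next_unix = last_unix + 86400  # one day later, as in the question

# Convert back on the UTC side, so it pairs with calendar.timegm
epoch = datetime.datetime(1970, 1, 1)
next_date = epoch + datetime.timedelta(seconds=next_unix)
print(next_date)
```

If the value came from to_unix_local instead, datetime.datetime.fromtimestamp() is the matching way back, because both work in local time.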

Related

Error at running my script with demand forecast

I'm trying to write a demand forecast script, but the following code gives this error. Do you know how to solve it, please?
My code:
import pandas as pd
from sklearn.linear_model import LinearRegression
df = pd.read_excel("Dados.xlsx")
df['Data'] = pd.to_datetime(df['Data'], errors='coerce')
df['Data'] = df['Data'].dt.strftime('%m/%d')
dataset = pd.DataFrame({'Data': ['2022-12-06', '2022-12-07'],'Demanda': [870, 868]})
data = dataset.groupby(dataset['Data'].dt.strftime('%Y-%V'))["Demanda"].sum().reset_index()
NUM_PRED_DAYS = 5
ds = data.Date.values
ds_pred = pd.date_range(start=dataset["Data"].min(), periods=len(ds) + NUM_PRED_DAYS, freq="W")
dataset["Date"] = pd.to_datetime(dataset["Date"])
X = df[['Data']]
y = df['Demanda']
model = LinearRegression()
model.fit(X, y)
futura_datas = pd.DataFrame({'Data': pd.date_range(start='hoje', periods=5)})
futura_demanda = model.predict(futura_datas)
futura_datas['Demanda prevista'] = futura_demanda
print(futura_datas)
And the error is:
"Python311\Lib\site-packages\pandas\core\indexes\accessors.py", line 512, in __new__
raise AttributeError("Can only use .dt accessor with datetimelike values")
AttributeError: Can only use .dt accessor with datetimelike values. Did you mean: 'at'?"
I tried some code that I found here, but nothing worked.
My Excel file looks like this: (image not included)
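The traceback does not show which line failed, so the cause is not confirmed. One likely candidate is that dataset['Data'] is built from strings, and .dt only works on datetime-like columns. A minimal sketch under that assumption, reusing the two sample rows from the question:

```python
import pandas as pd

dataset = pd.DataFrame({"Data": ["2022-12-06", "2022-12-07"],
                        "Demanda": [870, 868]})

# The column still holds strings here, so the .dt accessor is rejected
error_message = None
try:
    dataset["Data"].dt
except AttributeError as exc:
    error_message = str(exc)

# Convert to datetime first; after that .dt is available
dataset["Data"] = pd.to_datetime(dataset["Data"])

# Group by ISO year-week, as in the question, and sum the demand
week_key = dataset["Data"].dt.strftime("%Y-%V")
weekly = dataset.groupby(week_key)["Demanda"].sum().reset_index()
print(weekly)
```

Note that the question also formats df['Data'] back to strings with strftime('%m/%d') before fitting, so that column would need the same care before it is used as a model input.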

Issues with "setting an array element with a sequence"

I'm having issues with a "setting an array element with a sequence" error.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import tree
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression
df = pd.read_csv('googleplaystore.csv') # 1
df = df.dropna() # 3
df['Size'] = df['Size'].str.extract(r'(\d+\.?\d)', expand=False).astype(float) * df['Size'].str[-1].replace({'M': 1024, 'k': 1}) # 4
df = df.dropna() # remove nan from "Varies with device"
df['Price'] = df['Price'].str.strip('$').astype(float) # 5
df['Installs'] = df['Installs'].str.strip('+')
df['Installs'] = df['Installs'].str.replace(',',"").astype(int)
df['Reviews'] = df['Reviews'].astype(float)
df['Size'] = df['Size'].astype(float)
df = df.loc[df['Rating'].between(1, 5)] # 6
df = df.loc[df['Type'] != 'Free'] # 7
df.drop(df[df['Price'] >= 200].index, inplace = True)
df.drop(df[df['Reviews'] >2000000].index, inplace = True)
df.drop(df[df['Installs'] >10000].index, inplace = True)
inp1 = df.copy()
df_reviewslog=np.log10(df['Reviews'])
df_installslog=np.log10(df['Installs'])
del df['App']
del df['Last Updated']
del df['Current Ver']
del df['Android Ver']
pd.get_dummies(df, columns=['Category', 'Genres', 'Content Rating'], drop_first=True)
inp2 = df.copy()
df_train = X_train,X_test,y_train,y_test=train_test_split(df['Reviews'],df['Installs'], test_size=0.7, random_state=0)
df_test = X_train,X_Test,y_train,y_test=train_test_split(df['Reviews'],df['Installs'], test_size=0.3, random_state=0)
df_train = np.array(df_train, dtype=object)
df_test = np.array(df_test, dtype=object)
df_train[0] = np.array([4])
df_test[0] = np.array([4])
df_train_1= df_train.reshape(4,1)
df_test_1= df_test.reshape(4,1)
#df_train_1
#df_test_1
model = LinearRegression().fit(df_train_1, df_test_1)
r_sq = model.score(df_train_1, df_test_1)
print(r_sq)
I keep making adjustments to my arrays to get them to work, but I keep getting this error:
"ValueError: setting an array element with a sequence." I can't figure out how to change it to get it to work.

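The question does not include the full traceback, so the exact cause in the code above is not confirmed. One common way to trigger this error is to ask NumPy for a numeric array from sequences of unequal length, which is what happens when the four differently sized outputs of train_test_split are packed into one array. The sketch below reproduces that with made-up numbers and then shows the 2-D column shape that scikit-learn estimators expect for a single feature; the variable names are illustrative only.

```python
import numpy as np

# Two "rows" of different length cannot form a rectangular float array
ragged = [[1.0, 2.0, 3.0], [4.0, 5.0]]
raised = False
try:
    np.array(ragged, dtype=float)
except ValueError:
    raised = True

# A single feature should be a 2-D column of shape (n_samples, 1)
reviews = np.array([10.0, 100.0, 1000.0])  # hypothetical values
X = reviews.reshape(-1, 1)
print(raised, X.shape)
```

Keeping X_train, X_test, y_train and y_test as separate variables, and reshaping only the feature arrays, avoids building the ragged array in the first place.
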
partial_dependence() got an unexpected keyword argument 'feature' for a python generalized additive model. How do I fix it?

I’m attempting to make a generalized additive model for some ocean data. My code is the following:
from pygam import LinearGAM
from pygam import LogisticGAM
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
pdf = pd.read_csv("phytoplankton-ratio-project.csv")
feature =['year','month','Dinoflagellate','Diatom','sea_water_temp_WOA_clim','nitrate_WOA_clim','phosphate_WOA_clim','silicate_WOA_clim']
df= pd.DataFrame()
df= pdf[feature]
df.head(5)
df.loc[:,'total'] = df['Dinoflagellate'] + df['Diatom'] ##
df.loc[:,'percentdia'] = df['Diatom']/df['total']
df.loc[:,'percentdino'] = df['Dinoflagellate']/df['total']
df = df.drop('total', axis=1)
df = df.dropna()
dat2016 = df[df.year == 2016]
dat2016 = dat2016.drop('year', axis=1)
dat2015 = df[df.year == 2015]
dat2015 = dat2015.drop('year', axis=1)
dat16dia = dat2016.drop('percentdino', axis=1)
dat16dino = dat2016.drop('percentdia', axis=1)
X_train = dat2016.drop(['percentdia', 'percentdino'], axis = 1)
#X_train.to_numpy()
y_traindino = dat2016['percentdino']
y_traindia = dat2016['percentdia']
X_test = dat2015.drop(['percentdia', 'percentdino'], axis = 1)
#X_test.to_numpy()
y_testdino = dat2015['percentdino']
y_testdia = dat2015['percentdia']
#5,7
gam = LinearGAM(n_splines=25).gridsearch(X_train.values, y_traindia.values)
gamdino = LinearGAM(n_splines=25).gridsearch(X_train.values, y_traindino.values)
XX = gam.generate_X_grid(term=0)
fig, axs = plt.subplots(1,6, figsize=(20,4))
titles = feature[1:]
for i, ax in enumerate(axs):
    pdep, confi = gam.partial_dependence(XX, feature=i, width=.95)
    ax.plot(XX[:, i], pdep)
    ax.plot(XX[:, i], *confi, c='r', ls='--')
    ax.set_title(titles[i])
However when I run it, I get the following error:
File "C:\Users\A\Documents\PythonRepository\StatsProject\Part2gam.py", line 69, in <module>
pdep, confi = gam.partial_dependence(XX, feature=i, width=.95)
TypeError: partial_dependence() got an unexpected keyword argument 'feature'
Would anyone know how to fix this?
I think you are not passing the correct parameters: the error is telling you that the function received a keyword argument it does not accept.
Where you wrote feature=i, you should use term=i.
The docs list the parameters that partial_dependence takes as input.
The pyGAM regression docs show an example of how it is used.
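When the docs are not at hand, the accepted keyword names can also be checked from Python itself with inspect.signature. The sketch below uses a hypothetical stand-in function, not pyGAM's real implementation; its parameter names are an assumption based on the term=i suggestion above.

```python
import inspect


def partial_dependence(term, X=None, width=None):
    """Hypothetical stand-in for illustration, NOT pyGAM's real method."""
    return term, X, width


# List the parameter names the function actually accepts
accepted = list(inspect.signature(partial_dependence).parameters)

# Passing a name that is not in that list raises TypeError
message = None
try:
    partial_dependence(X=None, feature=0, width=0.95)
except TypeError as exc:
    message = str(exc)

# Using an accepted name works
result = partial_dependence(term=0, X=None, width=0.95)
print(accepted, message)
```

Running inspect.signature(gam.partial_dependence) on the real object shows the names for the installed pyGAM version.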

from_formula() missing 2 required positional arguments: 'formula' and 'data'

I am getting a positional arguments error for the ols function under statsmodels.formula.api.
I have tried statsmodels.regression.linear_model, and changing OLS to ols and vice versa.
import statsmodels.regression.linear_model as sm
X = np.append(arr=np.ones((50,1)).astype(int),values=X,axis=1)
X_opt = X[:,[0,1,2,3,4,5]]
regressor_OLS = sm.ols(endog = Y, exog = X_opt).fit()
The expected output is the fitted regression model, but I am getting this error:
from_formula() missing 2 required positional arguments: 'formula' and 'data'
To get this example to work (I am assuming you are following the Udemy machine learning course, which this example matches line for line), I had to change the import statement. The OLS function no longer resides in the library they are using.
import statsmodels.regression.linear_model as lm
then
regressor_ols = lm.OLS(endog = y, exog = x_optimal).fit()
This should work :
import statsmodels.api as smf
X = np.append(arr=np.ones((50,1),dtype=int), values = X,axis = 1)
X_opt = X[:,[0,1,2,3,4,5]]
regressor_ols = smf.OLS(y,X_opt).fit()
Remove
import statsmodels.regression.linear_model as sm
and import statsmodels.api instead, as follows:
import statsmodels.api as sm
The course is quite old, which is why fragments of its code are obsolete. I have no idea why they are not updating it anymore.
The OLS class is part of the linear_model module, so use the following code to make it work.
import statsmodels.regression.linear_model as lm
X = np.append(arr=np.ones((50,1)).astype(int),values=X,axis=1)
X_opt = X[:,[0,1,2,3,4,5]]
regressor_OLS = lm.OLS(endog = y, exog = X_opt).fit()
Use import statsmodels.regression.linear_model as lm or import statsmodels.api as sm
import statsmodels.regression.linear_model as lm
X=np.append(arr=np.ones((50,1)).astype(int), values=X, axis=1)
X_opt=X[:,[0, 1, 2, 3, 4, 5]]
regressor_x=lm.OLS(endog=y, exog=X_opt).fit()
regressor_x.summary()
This one worked for me:
import statsmodels.api as sm
X=np.insert(X,0,np.ones(X.shape[0]),axis=1)
colList=list()
for i in range(X.shape[1]):
    colList.append(i)
X_opt=np.array(X[:, colList], dtype=float)
regressor_OLS=sm.OLS(endog=y,exog=X_opt).fit()
Solution 1:
import statsmodels.api as sm
x = np.append(arr= np.ones((50, 1)).astype(int), values= x, axis=1)
x_opt = x[:, [0,1,2,3,4,5]]
regressor_OLS = sm.OLS(endog=y, exog=x_opt).fit()
Solution 2:
import statsmodels.regression.linear_model as lm
x = np.append(arr= np.ones((50, 1)).astype(int), values= x, axis=1)
x_opt = x[:, [0,1,2,3,4,5]]
regressor_ols = lm.OLS(endog=y, exog=x_opt).fit()
I recently had the same problem. As auticus said, the OLS function is no longer in statsmodels.formula.api. But you must also create X_opt as a list:
import statsmodels.regression.linear_model as lm
X = np.append(arr = np.ones((50,1)).astype(int), values = X, axis = 1)
X_opt = X[:, [0, 1, 2, 3, 4, 5]].tolist()
SL = 0.05
regression_OLS = lm.OLS(endog = y, exog = X_opt).fit()
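All of the answers above prepend a column of ones (the intercept term) before calling OLS, either with np.append or with np.insert. The sketch below shows that the two give the same matrix and avoids the hard-coded 50 by reading the row count from X. It uses NumPy only, and the small feature matrix is made up for illustration.

```python
import numpy as np

# Hypothetical feature matrix: 4 samples, 3 features
X = np.arange(12, dtype=float).reshape(4, 3)

# Variant 1: build a ones column sized from X and append X after it
ones = np.ones((X.shape[0], 1))
X_append = np.append(arr=ones, values=X, axis=1)

# Variant 2: insert the scalar 1 as a new column at position 0
X_insert = np.insert(X, 0, 1.0, axis=1)

# Select every column without typing the indices by hand
X_opt = X_append[:, list(range(X_append.shape[1]))]
print(X_opt)
```

X_opt can then be passed as exog to OLS in the same way as in the answers.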

Convert DF into Numpy Array for calculations

I have the data in a DataFrame that I will use for a linear regression calculation with a user-built function. Here is the code:
from sklearn.datasets import load_boston
boston = load_boston()
bos = pd.DataFrame(boston.data) # convert to DF
bos.columns = boston.feature_names
bos['PRICE'] = boston.target
y = bos.PRICE
x = bos.drop('PRICE', axis = 1) # DROP PRICE since only want X-type variables (not Y-target)
xw = df.to_array(x)
xw = np.insert(xw,0,1, axis = 1) # to insert a column of "1" values
However, I am getting the error:
AttributeError Traceback (most recent call last)
<ipython-input-131-272f1b4d26ba> in <module>()
1 import copy
2
----> 3 xw = df.to_array(x)
AttributeError: 'int' object has no attribute 'to_array'
I am not sure where the problem is. I need to pass an array of values (x in this case) to the function to execute some matrix operations.
The insert function was working when I developed the code step by step, but for some reason it is failing here.
I tried:
xw = copy.deepcopy(x)
with no success
Any thoughts?
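It is x.as_matrix(), not df.to_array(x). Note that as_matrix() was removed in pandas 1.0; on newer versions use x.to_numpy() or x.values instead.
Please refer to the pandas documentation for more detail on as_matrix().
Here is the code that works: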
from sklearn.datasets import load_boston
import pandas as pd
import numpy as np
boston = load_boston()
bos = pd.DataFrame(boston.data) # convert to DF
bos.columns = boston.feature_names
bos['PRICE'] = boston.target
y = bos.PRICE
x = bos.drop('PRICE', axis = 1) # DROP PRICE since only want X-type variables (not Y-target)
xw = x.as_matrix()
xw = np.insert(xw,0,1, axis = 1) # to insert a column of "1" values
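On recent pandas versions the same conversion can be written with to_numpy(). The sketch below assumes pandas 0.24 or later and uses a small made-up DataFrame in place of the Boston data, so the column values are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Boston DataFrame
bos = pd.DataFrame({
    "RM": [6.5, 6.4, 7.1],
    "AGE": [65.2, 78.9, 61.1],
    "PRICE": [24.0, 21.6, 34.7],
})

y = bos["PRICE"].to_numpy()
x = bos.drop("PRICE", axis=1)  # keep only the X-type variables

xw = x.to_numpy()                  # DataFrame -> NumPy array
xw = np.insert(xw, 0, 1, axis=1)   # insert a column of 1 values at the front
print(xw.shape)
```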
