I'm trying to write a demand-forecasting script, but my code below raises an error. Does anyone know how to fix it?
My code:
import pandas as pd
from sklearn.linear_model import LinearRegression
df = pd.read_excel("Dados.xlsx")
df['Data'] = pd.to_datetime(df['Data'], errors='coerce')
df['Data'] = df['Data'].dt.strftime('%m/%d')
dataset = pd.DataFrame({'Data': ['2022-12-06', '2022-12-07'],'Demanda': [870, 868]})
data = dataset.groupby(dataset['Data'].dt.strftime('%Y-%V'))["Demanda"].sum().reset_index()
NUM_PRED_DAYS = 5
ds = data.Date.values
ds_pred = pd.date_range(start=dataset["Data"].min(), periods=len(ds) + NUM_PRED_DAYS, freq="W")
dataset["Date"] = pd.to_datetime(dataset["Date"])
X = df[['Data']]
y = df['Demanda']
model = LinearRegression()
model.fit(X, y)
futura_datas = pd.DataFrame({'Data': pd.date_range(start='hoje', periods=5)})
futura_demanda = model.predict(futura_datas)
futura_datas['Demanda prevista'] = futura_demanda
print(futura_datas)
And the error is:
"Python311\Lib\site-packages\pandas\core\indexes\accessors.py", line 512, in __new__
raise AttributeError("Can only use .dt accessor with datetimelike values")
AttributeError: Can only use .dt accessor with datetimelike values. Did you mean: 'at'?"
I tried several solutions I found here, but none of them worked.
My Excel file looks like this: (image not included)
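A minimal sketch of what seems to trigger this error, assuming the `Data` column holds plain strings as in the `dataset` literal above. The `.dt` accessor only exists on datetime columns, so the column has to go through `pd.to_datetime` first and must not be turned back into text with `strftime` before grouping. The `Semana` column name is made up for illustration, and `isocalendar()` is used here in place of the `%V` format code.

```python
import pandas as pd

dataset = pd.DataFrame({"Data": ["2022-12-06", "2022-12-07"], "Demanda": [870, 868]})

# The column holds plain strings at this point, so `.dt` raises AttributeError.
try:
    dataset["Data"].dt.strftime("%Y-%m")
except AttributeError as exc:
    print("before conversion:", exc)

# Convert to datetime first; after that the `.dt` accessor is available.
dataset["Data"] = pd.to_datetime(dataset["Data"], errors="coerce")

# Build a "year-week" key from the ISO calendar ("Semana" is a hypothetical name).
iso = dataset["Data"].dt.isocalendar()
week_key = (iso["year"].astype(str) + "-" + iso["week"].astype(str).str.zfill(2)).rename("Semana")

# Sum the demand per ISO week.
weekly = dataset.groupby(week_key)["Demanda"].sum().reset_index()
print(weekly)
```

`LinearRegression` also needs numeric features, so the dates would still have to be converted to numbers (for example a day count) before calling `model.fit`.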
I'm having issues with a "setting an array element with a sequence" error.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import tree
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression
df = pd.read_csv('googleplaystore.csv') # 1
df = df.dropna() # 3
df['Size'] = df['Size'].str.extract(r'(\d+\.?\d)', expand=False).astype(float) * df['Size'].str[-1].replace({'M': 1024, 'k': 1}) # 4
df = df.dropna() # remove nan from "Varies with device"
df['Price'] = df['Price'].str.strip('$').astype(float) # 5
df['Installs'] = df['Installs'].str.strip('+')
df['Installs'] = df['Installs'].str.replace(',',"").astype(int)
df['Reviews'] = df['Reviews'].astype(float)
df['Size'] = df['Size'].astype(float)
df = df.loc[df['Rating'].between(1, 5)] # 6
df = df.loc[df['Type'] != 'Free'] # 7
df.drop(df[df['Price'] >= 200].index, inplace = True)
df.drop(df[df['Reviews'] >2000000].index, inplace = True)
df.drop(df[df['Installs'] >10000].index, inplace = True)
inp1 = df.copy()
df_reviewslog=np.log10(df['Reviews'])
df_installslog=np.log10(df['Installs'])
del df['App']
del df['Last Updated']
del df['Current Ver']
del df['Android Ver']
pd.get_dummies(df, columns=['Category', 'Genres', 'Content Rating'], drop_first=True)
inp2 = df.copy()
df_train = X_train,X_test,y_train,y_test=train_test_split(df['Reviews'],df['Installs'], test_size=0.7, random_state=0)
df_test = X_train,X_Test,y_train,y_test=train_test_split(df['Reviews'],df['Installs'], test_size=0.3, random_state=0)
df_train = np.array(df_train, dtype=object)
df_test = np.array(df_test, dtype=object)
df_train[0] = np.array([4])
df_test[0] = np.array([4])
df_train_1= df_train.reshape(4,1)
df_test_1= df_test.reshape(4,1)
#df_train_1
#df_test_1
model = LinearRegression().fit(df_train_1, df_test_1)
r_sq = model.score(df_train_1, df_test_1)
print(r_sq)
I keep making adjustments to my arrays to get them to work, but I keep getting this error:
"ValueError: setting an array element with a sequence." I can't figure out how to change it to get it to work.
I’m attempting to make a generalized additive model for some ocean data. My code is the following:
from pygam import LinearGAM
from pygam import LogisticGAM
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
pdf = pd.read_csv("phytoplankton-ratio-project.csv")
feature =['year','month','Dinoflagellate','Diatom','sea_water_temp_WOA_clim','nitrate_WOA_clim','phosphate_WOA_clim','silicate_WOA_clim']
df= pd.DataFrame()
df= pdf[feature]
df.head(5)
df.loc[:,'total'] = df['Dinoflagellate'] + df['Diatom'] ##
df.loc[:,'percentdia'] = df['Diatom']/df['total']
df.loc[:,'percentdino'] = df['Dinoflagellate']/df['total']
df = df.drop('total', axis=1)
df = df.dropna()
dat2016 = df[df.year == 2016]
dat2016 = dat2016.drop('year', axis=1)
dat2015 = df[df.year == 2015]
dat2015 = dat2015.drop('year', axis=1)
dat16dia = dat2016.drop('percentdino', axis=1)
dat16dino = dat2016.drop('percentdia', axis=1)
X_train = dat2016.drop(['percentdia', 'percentdino'], axis = 1)
#X_train.to_numpy()
y_traindino = dat2016['percentdino']
y_traindia = dat2016['percentdia']
X_test = dat2015.drop(['percentdia', 'percentdino'], axis = 1)
#X_test.to_numpy()
y_testdino = dat2015['percentdino']
y_testdia = dat2015['percentdia']
#5,7
gam = LinearGAM(n_splines=25).gridsearch(X_train.values, y_traindia.values)
gamdino = LinearGAM(n_splines=25).gridsearch(X_train.values, y_traindino.values)
XX = gam.generate_X_grid(term=0)
fig, axs = plt.subplots(1,6, figsize=(20,4))
titles = feature[1:]
for i, ax in enumerate(axs):
    pdep, confi = gam.partial_dependence(XX, feature=i, width=.95)
    ax.plot(XX[:, i], pdep)
    ax.plot(XX[:, i], *confi, c='r', ls='--')
    ax.set_title(titles[i])
However, when I run it, I get the following error:
File "C:\Users\A\Documents\PythonRepository\StatsProject\Part2gam.py", line 69, in <module>
pdep, confi = gam.partial_dependence(XX, feature=i, width=.95)
TypeError: partial_dependence() got an unexpected keyword argument 'feature'
Would anyone know how to fix this?
The error says that you passed a keyword argument that partial_dependence does not accept, so the parameters are wrong.
Where you wrote feature=i, you should use term=i.
The pyGAM documentation lists the parameters that partial_dependence takes as input.
The regression example in the pyGAM docs shows how it is used.
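When a tutorial and the installed library disagree, the installed signature is the one that counts. As a small sketch, `inspect.signature` lists the parameter names a callable accepts. The `partial_dependence` function below is a hypothetical stand-in, not pyGAM's code; its parameter names (`term`, `X`, `width`) are an assumption based on the pyGAM docs.

```python
import inspect


def partial_dependence(term, X=None, width=None):
    """Hypothetical stand-in; parameter names are an assumption, not pyGAM's source."""
    return term, X, width


# List the keyword names the callable actually accepts.
accepted = list(inspect.signature(partial_dependence).parameters)
print(accepted)

# Passing a name that is not in the signature raises TypeError.
try:
    partial_dependence(feature=0)
    raised = False
except TypeError as exc:
    raised = True
    print(exc)
```

With the real library, the same check would be `inspect.signature(gam.partial_dependence)`.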
I am getting a positional-arguments error from the ols function in statsmodels.formula.api.
I have also tried statsmodels.regression.linear_model, and changing OLS to ols and vice versa.
import statsmodels.regression.linear_model as sm
X = np.append(arr=np.ones((50,1)).astype(int),values=X,axis=1)
X_opt = X[:,[0,1,2,3,4,5]]
regressor_OLS = sm.ols(endog = Y, exog = X_opt).fit()
The expected output is the fitted regression model, but I am getting this error:
from_formula() missing 2 required positional arguments: 'formula' and 'data'
To get this example to work (I am assuming you are following the Udemy machine learning course, which matches this example line for line), I had to change the import statement. The OLS function no longer resides in the library they are using.
import statsmodels.regression.linear_model as lm
then
regressor_ols = lm.OLS(endog = y, exog = x_optimal).fit()
This should work:
import statsmodels.api as smf
X = np.append(arr=np.ones((50,1), dtype=int), values=X, axis=1)  # np.int was removed in NumPy 1.24; use the built-in int
X_opt = X[:,[0,1,2,3,4,5]]
regressor_ols = smf.OLS(y,X_opt).fit()
Remove
import statsmodels.regression.linear_model as sm
and import statsmodels.api instead:
import statsmodels.api as sm
The course is quite old, which is why some of its code is obsolete. I have no idea why it is no longer being updated.
OLS is part of the statsmodels.regression.linear_model module, so use the following code to make it work.
import statsmodels.regression.linear_model as lm
X = np.append(arr=np.ones((50,1)).astype(int),values=X,axis=1)
X_opt = X[:,[0,1,2,3,4,5]]
regressor_OLS = lm.OLS(endog = y, exog = X_opt).fit()
Use import statsmodels.regression.linear_model as lm or import statsmodels.api as sm
import statsmodels.regression.linear_model as lm
X=np.append(arr=np.ones((50,1)).astype(int), values=X, axis=1)
X_opt=X[:,[0, 1, 2, 3, 4, 5]]
regressor_x=lm.OLS(endog=y, exog=X_opt).fit()  # use the lm alias imported above
regressor_x.summary()
This one worked for me:
import statsmodels.api as sm
X=np.insert(X,0,np.ones(X.shape[0]),axis=1)
colList=list(range(X.shape[1]))  # indices of every column in X
X_opt=np.array(X[:, colList], dtype=float)
regressor_OLS=sm.OLS(endog=y,exog=X_opt).fit()
Solution 1:
import statsmodels.api as sm
x = np.append(arr= np.ones((50, 1)).astype(int), values= x, axis=1)
x_opt = x[:, [0,1,2,3,4,5]]
regressor_OLS = sm.OLS(endog=y, exog=x_opt).fit()
Solution 2:
import statsmodels.regression.linear_model as lm
x = np.append(arr= np.ones((50, 1)).astype(int), values= x, axis=1)
x_opt = x[:, [0,1,2,3,4,5]]
regressor_ols = lm.OLS(endog=y, exog=x_opt).fit()
I recently had the same problem. As auticus said, the OLS function is no longer in statsmodels.formula.api. In my case I also had to convert X_opt to a list:
import statsmodels.regression.linear_model as lm
X = np.append(arr = np.ones((50,1)).astype(int), values = X, axis = 1)
X_opt = X[:, [0, 1, 2, 3, 4, 5]].tolist()
SL = 0.05
regression_OLS = lm.OLS(endog = y, exog = X_opt).fit()
I have data in a DataFrame that I want to use for a linear regression calculation with a user-built function. Here is the code:
from sklearn.datasets import load_boston
boston = load_boston()
bos = pd.DataFrame(boston.data) # convert to DF
bos.columns = boston.feature_names
bos['PRICE'] = boston.target
y = bos.PRICE
x = bos.drop('PRICE', axis = 1) # DROP PRICE since only want X-type variables (not Y-target)
xw = df.to_array(x)
xw = np.insert(xw,0,1, axis = 1) # to insert a column of "1" values
However, I am getting the error:
AttributeError Traceback (most recent call last)
<ipython-input-131-272f1b4d26ba> in <module>()
1 import copy
2
----> 3 xw = df.to_array(x)
AttributeError: 'int' object has no attribute 'to_array'
I am not sure where the problem is. I need to pass an array of values (x in this case) to the function to perform some matrix operations.
The insert function worked when I developed the code step by step, but for some reason it fails here.
I tried:
xw = copy.deepcopy(x)
with no success
Any thoughts?
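Use x.as_matrix(), not df.to_array(x). In your session df is an int, which is why Python reports that an 'int' object has no attribute 'to_array'.
Note that as_matrix() was deprecated in pandas 0.23 and removed in pandas 1.0; on current versions use x.to_numpy() or x.values instead. Please refer to the pandas documentation for more detail.
Here is the code that works: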
from sklearn.datasets import load_boston
import pandas as pd
import numpy as np
boston = load_boston()
bos = pd.DataFrame(boston.data) # convert to DF
bos.columns = boston.feature_names
bos['PRICE'] = boston.target
y = bos.PRICE
x = bos.drop('PRICE', axis = 1) # DROP PRICE since only want X-type variables (not Y-target)
xw = x.to_numpy()  # as_matrix() was removed in pandas 1.0; to_numpy() is its replacement
xw = np.insert(xw,0,1, axis = 1) # to insert a column of "1" values
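As a small self-contained sketch of the same conversion on current pandas, here is the step on a made-up two-column frame (the values are invented; it does not depend on `load_boston`, which was removed in scikit-learn 1.2). It uses `to_numpy()` in place of `as_matrix()`.

```python
import numpy as np
import pandas as pd

# Made-up feature frame standing in for the Boston features.
x = pd.DataFrame({"CRIM": [0.1, 0.2, 0.3], "RM": [6.5, 6.4, 7.1]})

# DataFrame -> NumPy array, then prepend a column of ones for the intercept.
xw = x.to_numpy()
xw = np.insert(xw, 0, 1, axis=1)

print(xw.shape)
print(xw[:, 0])
```

The first column of `xw` is the intercept column, and the remaining columns keep the original feature order.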