Error when running my demand forecast script - python

I'm trying to write a demand forecast script, but the following code gives the error below. Do you know how to solve it, please?
My code:
import pandas as pd
from sklearn.linear_model import LinearRegression
df = pd.read_excel("Dados.xlsx")
df['Data'] = pd.to_datetime(df['Data'], errors='coerce')
df['Data'] = df['Data'].dt.strftime('%m/%d')
dataset = pd.DataFrame({'Data': ['2022-12-06', '2022-12-07'],'Demanda': [870, 868]})
data = dataset.groupby(dataset['Data'].dt.strftime('%Y-%V'))["Demanda"].sum().reset_index()
NUM_PRED_DAYS = 5
ds = data.Date.values
ds_pred = pd.date_range(start=dataset["Data"].min(), periods=len(ds) + NUM_PRED_DAYS, freq="W")
dataset["Date"] = pd.to_datetime(dataset["Date"])
X = df[['Data']]
y = df['Demanda']
model = LinearRegression()
model.fit(X, y)
futura_datas = pd.DataFrame({'Data': pd.date_range(start='hoje', periods=5)})
futura_demanda = model.predict(futura_datas)
futura_datas['Demanda prevista'] = futura_demanda
print(futura_datas)
And the error is:
"Python311\Lib\site-packages\pandas\core\indexes\accessors.py", line 512, in __new__
raise AttributeError("Can only use .dt accessor with datetimelike values")
AttributeError: Can only use .dt accessor with datetimelike values. Did you mean: 'at'?"
I tried some code snippets that I found here, but no luck.
And my Excel file looks like this: (screenshot not reproduced here; it has a 'Data' column and a 'Demanda' column)
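For what it's worth, a hedged sketch of where the traceback likely comes from: dataset['Data'] is built from plain strings and df['Data'] is turned back into strings by strftime, so any later .dt access raises "Can only use .dt accessor with datetimelike values". Below is a minimal sketch of the datetime handling, assuming the Excel file has 'Data' and 'Demanda' columns as in the code above; using the ordinal day number as the regression feature is just one illustrative choice.

import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_excel("Dados.xlsx")
# keep the column as real datetimes; only format to strings when displaying
df['Data'] = pd.to_datetime(df['Data'], errors='coerce')

# .dt only works on datetime-typed columns, so group on the datetime column directly
semanal = df.groupby(df['Data'].dt.strftime('%Y-%V'))['Demanda'].sum().reset_index()

# LinearRegression needs numeric features, so use e.g. the ordinal day number as X
X = df['Data'].map(pd.Timestamp.toordinal).to_frame()
y = df['Demanda']
model = LinearRegression().fit(X, y)

# 'hoje' is not a valid date string; use an actual start date for the forecast range
futuras_datas = pd.DataFrame({'Data': pd.date_range(start=pd.Timestamp.today().normalize(), periods=5)})
futuras_datas['Demanda prevista'] = model.predict(
    futuras_datas['Data'].map(pd.Timestamp.toordinal).to_frame())
print(futuras_datas)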

Related

could not convert string to float: 'Runny_nose'

import pandas as pd
from sklearn.tree import DecisionTreeClassifier
Disease_data = pd.read_csv("Disease_dataset.csv")
X = Disease_data.drop(columns='Diseases')
y = Disease_data['Diseases']
model = DecisionTreeClassifier()
model.fit(X, y)
I get this error:
ValueError: could not convert string to float: 'Runny_nose'
I tried
Disease_data = Disease_data['Diseases'].astype(float)
and
music_data = pd.to_numeric(music_data, errors='coerce')
but instead I get empty columns
Some of your rows might not have valid float data.
Visit this thread for more info.
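A hedged sketch of how to locate and handle the non-numeric values (assuming the feature columns contain strings such as 'Runny_nose'; one-hot encoding is just one possible fix, not necessarily what the linked thread suggests):

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

Disease_data = pd.read_csv("Disease_dataset.csv")
X = Disease_data.drop(columns='Diseases')
y = Disease_data['Diseases']

# show which feature columns are not numeric (these are what trip up fit)
print(X.dtypes[X.dtypes == 'object'])

# one possible fix: one-hot encode the string features so they become 0/1 columns
X_encoded = pd.get_dummies(X)

model = DecisionTreeClassifier()
model.fit(X_encoded, y)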

pycaret giving error: PermissionError: [WinError 32] The process cannot access the file because it is being used by another process

I am using an Anaconda environment on Windows, with PyCaret installed, and PyCharm.
I want to run a basic toy example with PyCaret (not using the freely available datasets),
as a simple y = mx + c, where x is 1-D.
Here is my working code with scikit-learn.
import numpy as np
from sklearn.linear_model import LinearRegression
import pandas as pd
x= np.arange(0,1000,dtype = 'float64')
Y = (x*2) + 1
X = x.reshape(-1,1)
reg = LinearRegression().fit(X, Y)
# if predicting on same model,perfect score
score = reg.score(X,Y)
print('1- RSS/TSS: 1 for perfect regression=' + str(score))
print('coef =' + str(reg.coef_[0])) # slope
print('intercept =' + str(reg.intercept_)) # intercept
This gives the expected results (score of roughly 1.0, slope of 2, intercept of 1).
Now I create a DataFrame that I can pass to the PyCaret package.
data1 = np.vstack((x,Y)).transpose()
# create dataframe as required by Pandas
N= data1.shape[0]
# add first row
dat2 = np.array(['','Col1','Col2'])
for i in range(N):
    dat_row = list(data1[i,:].flatten())
    nm = ['row' + str(i)]
    dat_row = nm + dat_row
    dat2 = np.vstack((dat2, dat_row))
df = pd.DataFrame(data=dat2[1:, 1:],
                  index=dat2[1:, 0],
                  columns=dat2[0, 1:])
print(df)
print('***************************')
columns = df.applymap(np.isreal).all()
print(columns)
print('***************************')
# now, using Pycaret
from pycaret.regression import *
exp_reg = setup(df, html=False, target='Col2')
print('********************************')
compare_models()
When I do so, the numeric columns I created (x, Y) are shown as categorical, and PyCaret also recognizes them as categorical (the setup summary screenshot is not reproduced here).
Why are they categorical? Can I change them to be treated as numeric?
Once I press Enter, PyCaret finally gives me the PermissionError from the title. Any ideas about this error?
You can force the data types in PyCaret by using the numeric_features and categorical_features parameters of the setup function.
For example:
clf1 = setup(data, target = 'target', numeric_features = ['X1', 'X2'])
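As a side note (a hedged sketch, not part of the answer above): the columns come out categorical because dat2 starts as an array of strings, so every stacked value is stored as text. Building the DataFrame directly from the numeric arrays keeps float dtypes, and PyCaret should then pick them up as numeric without forcing:

import numpy as np
import pandas as pd

x = np.arange(0, 1000, dtype='float64')
Y = (x * 2) + 1

# both columns keep dtype float64, so no numeric_features override is needed
df = pd.DataFrame({'Col1': x, 'Col2': Y})
print(df.dtypes)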

partial_dependence() got an unexpected keyword argument 'feature' for a python generalized additive model. How do I fix it?

I’m attempting to make a generalized additive model for some ocean data. My code is the following:
from pygam import LinearGAM
from pygam import LogisticGAM
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
pdf = pd.read_csv("phytoplankton-ratio-project.csv")
feature =['year','month','Dinoflagellate','Diatom','sea_water_temp_WOA_clim','nitrate_WOA_clim','phosphate_WOA_clim','silicate_WOA_clim']
df= pd.DataFrame()
df= pdf[feature]
df.head(5)
df.loc[:,'total'] = df['Dinoflagellate'] + df['Diatom'] ##
df.loc[:,'percentdia'] = df['Diatom']/df['total']
df.loc[:,'percentdino'] = df['Dinoflagellate']/df['total']
df = df.drop('total', axis=1)
df = df.dropna()
dat2016 = df[df.year == 2016]
dat2016 = dat2016.drop('year', axis=1)
dat2015 = df[df.year == 2015]
dat2015 = dat2015.drop('year', axis=1)
dat16dia = dat2016.drop('percentdino', axis=1)
dat16dino = dat2016.drop('percentdia', axis=1)
X_train = dat2016.drop(['percentdia', 'percentdino'], axis = 1)
#X_train.to_numpy()
y_traindino = dat2016['percentdino']
y_traindia = dat2016['percentdia']
X_test = dat2015.drop(['percentdia', 'percentdino'], axis = 1)
#X_test.to_numpy()
y_testdino = dat2015['percentdino']
y_testdia = dat2015['percentdia']
#5,7
gam = LinearGAM(n_splines=25).gridsearch(X_train.values, y_traindia.values)
gamdino = LinearGAM(n_splines=25).gridsearch(X_train.values, y_traindino.values)
XX = gam.generate_X_grid(term=0)
fig, axs = plt.subplots(1,6, figsize=(20,4))
titles = feature[1:]
for i, ax in enumerate(axs):
    pdep, confi = gam.partial_dependence(XX, feature=i, width=.95)
    ax.plot(XX[:, i], pdep)
    ax.plot(XX[:, i], *confi, c='r', ls='--')
    ax.set_title(titles[i])
However when I run it, I get the following error:
File "C:\Users\A\Documents\PythonRepository\StatsProject\Part2gam.py", line 69, in <module>
pdep, confi = gam.partial_dependence(XX, feature=i, width=.95)
TypeError: partial_dependence() got an unexpected keyword argument 'feature'
Would anyone know how to fix this?
I think you are not using the correct parameters, because the error is telling you that you passed an unexpected argument.
I guess that where you wrote feature=i you should use term=i.
If you look at the docs you can see the list of parameters that partial_dependence takes as input.
Here is an example of how they use it: pygam regression docs.
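A hedged sketch of the corrected plotting loop, following the pattern in the pygam docs (it also regenerates the X grid per term, as the docs example does):

fig, axs = plt.subplots(1, 6, figsize=(20, 4))
titles = feature[1:]
for i, ax in enumerate(axs):
    XX = gam.generate_X_grid(term=i)  # grid of X values for this term
    pdep, confi = gam.partial_dependence(term=i, X=XX, width=.95)
    ax.plot(XX[:, i], pdep)  # partial dependence curve
    ax.plot(XX[:, i], confi, c='r', ls='--')  # 95% confidence band
    ax.set_title(titles[i])
plt.show()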

AttributeError - even though there seems to be no attribute error

I am currently learning Python for machine learning. While working through it, the interpreter raised an AttributeError, but I do not see any problem. Can someone help me fix this error?
My Code:
import pandas as pd
import quandl, math
import numpy as np
import datetime
import matplotlib.pyplot as plt
from matplotlib import style
from sklearn import preprocessing, cross_validation, svm
from sklearn.linear_model import LinearRegression
style.use('ggplot')
quandl.ApiConfig.api_key = ''
df = quandl.get('EOD/V', api_key = '')
df = df[['Adj_Open','Adj_High','Adj_Low','Adj_Close','Adj_Volume',]]
df['ML_PCT'] = (df['Adj_High'] - df['Adj_Close']) / df['Adj_Close'] * 100.0
df['PCT_change'] = (df['Adj_Close'] - df['Adj_Open']) / df['Adj_Open'] * 100.0
df = df[['Adj_Close', 'ML_PCT', 'PCT_change', 'Adj_Volume']]
forecast_col = 'Adj_Close'
df.fillna(value=-99999, inplace=True)
forecast_out = int(math.ceil(0.01 * len(df)))
df['label'] = df[forecast_col].shift(-forecast_out)
X = np.array(df.drop(['label'], 1))
X = preprocessing.scale(X)
X = X[:-forecast_out]
df.dropna(inplace=True)
y = np.array(df['label'])
X_train, X_test, y_train, y_test = cross_validation.train_test_split(X, y, test_size=0.2)
clf = LinearRegression(n_jobs=-1)
clf.fit(X_train, y_train)
confidence = clf.score(X_test, y_test)
print(confidence)
X_lately = X[-forecast_out:]
forecast_set = clf.predict(X_lately)
print(forecast_set, confidence, forecast_out)
df['Forecast'] = np.nan
last_date = df.iloc[-1].name
last_unix = last_date.timestamp()
one_day = 86400
next_unix = last_unix + one_day
for i in forecast_set:
    next_date = datetime.datetime.fromtimestamp(next_unix)
    next_unix += 86400
    df.loc[next_date] = [np.nan for _ in range(len(df.columns)-1)] + [i]
df['Adj_Close'].plot()
df['Forecast'].plot()
plt.legend(loc = 4)
plt.xlabel('Date')
plt.ylabel('Price')
plt.show()
Error:
C:\Python27\lib\site-packages\sklearn\cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
0.989124557421
(array([ 94.46383723,  93.27713267,  93.15533011,  93.89038799,
         94.71390166,  95.29332756,  96.23047821,  96.51527839,
         96.17180986,  96.17575181,  96.68721678,  96.85114045,
         97.57455941,  97.98680762,  97.32961443,  97.55881174,
         97.54090546,  96.17175855,  94.95430597,  96.49002102,
         96.82364097,  95.63098589,  95.61236103,  96.24114818]), 0.98912455742140903, 24)
Traceback (most recent call last):
  File "C:\Users\qasim\Documents\python_machine_learning\regression.py", line 47, in <module>
    last_unix = last_date.timestamp()
AttributeError: 'Timestamp' object has no attribute 'timestamp'
[Finished in 36.6s]
The issue is that last_date is a pandas Timestamp object, not a plain Python datetime object. It does, however, have a datetime.timetuple() method. Try this:
Assuming last_date is in UTC, use this:
import calendar
...
last_date = df.iloc[-1].name
last_unix = calendar.timegm(last_date.timetuple())
If last_date is in your local timezone, use this:
import time
...
last_date = df.iloc[-1].name
last_unix = time.mktime(last_date.timetuple())
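A hedged follow-up, not part of the answer above: on Python 3 with a recent pandas, Timestamp does expose a .timestamp() method, so the original line works there unchanged:

last_date = df.iloc[-1].name
last_unix = last_date.timestamp()  # available on pandas Timestamp under Python 3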

Error "'numpy.ndarray' object has no attribute 'values'"

I want to shift my time series data, but I am getting the following error:
AttributeError: 'numpy.ndarray' object has no attribute 'values'
This is my code:
import numpy
import pandas
from pandas import DataFrame, concat

def create_dataset(datasets):
    #series = dataset
    temps = DataFrame(datasets.values)
    dataframes = concat(
        [temps, temps.shift(-1), temps.shift(-2), temps.shift(-3)], axis=1)
    lala = numpy.array(dataframes)
    return lala

# Load
dataframe = pandas.read_csv('zahlenreihe.csv', index_col=False,
                            engine='python', header=None)
dataset = dataframe.values
dataset = dataset.astype('float32')
# Split
train_size = int(len(dataset) * 0.70)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size, :], dataset[train_size:len(dataset), :]
# Create
trainX = create_dataset(train)
I think the following line is wrong:
temps = DataFrame(datasets.values)
My zahlenreihe.csv file (number sequence) just has integers ordered like:
1
2
3
4
5
n
How should I handle it?
The solution:
The given dataset was already a NumPy array, so I didn't need to call .values.
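A minimal sketch of that fix inside create_dataset (just drop the .values call):

temps = DataFrame(datasets)  # datasets is already a NumPy array, so no .values is needed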
The problem lies in the following line:
df = StandardScaler().fit_transform(df)
It returns a NumPy array (see the documentation), which does not have a drop function. You would have to convert it into a pd.DataFrame first!
new_df = pd.DataFrame(StandardScaler().fit_transform(df), columns=df.columns, index=df.index)
