Computing RSS of ARIMA model


I created an AR model whose parameters were based on my analysis of the data's autocorrelation and partial autocorrelation functions. However, I get an error when I try to compute the RSS value of the resulting model. Here is the code I used:
import matplotlib.pylab as plt
from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 15, 6
import pandas as pd
from statsmodels.tsa.arima_model import ARIMA
df = pd.read_csv('data.csv', header=0, index_col=0, parse_dates=True, sep=';')
model = ARIMA(df, order=(6, 0, 0))
results_ARIMA = model.fit(disp=-1)
plt.plot(df, color='blue', label='Original')
plt.plot(results_ARIMA.fittedvalues, color='red', label='Predicted')
plt.plot(results_ARIMA.predict(start = 23, end = 34, dynamic=True), color='red')
plt.title('RSS: %.4f'% sum((results_ARIMA.fittedvalues-df)**2))
Which results in this error message:
File "C:\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
execfile(filename, namespace)
File "C:\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/Patrick Ulanday/Desktop/Thesis/ARIMA/CRWFR_boundary_ARIMA/ARIMA.py", line 64, in <module>
print ('RSS: %.4f'% sum((results_ARIMA.fittedvalues-df)**2))
File "pandas/_libs/tslib.pyx", line 787, in pandas._libs.tslib.Timestamp.__radd__
File "pandas/_libs/tslib.pyx", line 1275, in pandas._libs.tslib._Timestamp.__add__
ValueError: Cannot add integral value to Timestamp without freq.
The model fit actually worked and the plot is drawn (I can't upload the image); the problem is only with the computation of the RSS.

I have only a superficial knowledge of time series, but the problem is probably that you haven't selected a column by name in the statement that computes the RSS. Look at the block of code below:
from statsmodels.tsa.arima_model import ARIMA
model = ARIMA(indexedDataset_logScale, order = (2,1,0))
results_AR = model.fit(disp = -1)
plt.plot(datasetLogDiffShifting)
plt.plot(results_AR.fittedvalues, color = 'red')
plt.title('RSS: %.4f'%sum((results_AR.fittedvalues -
datasetLogDiffShifting['#Passengers'])**2))
print('Plotting AR model')
Hope that it solves your problem.
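To make the column-selection point concrete, here is a minimal sketch using only pandas. The names `observed`, `fitted` and the column name `value` are hypothetical stand-ins for `df`, `results_ARIMA.fittedvalues` and the real data column. As far as I can tell, subtracting the whole DataFrame from the Series aligns the Series index against the DataFrame's column labels, and the built-in `sum` then iterates over column labels rather than values, which would explain the Timestamp error. Selecting the column first and calling the pandas `.sum()` method avoids both issues.

```python
import pandas as pd

# Hypothetical stand-ins: `observed` plays the role of df,
# `fitted` the role of results_ARIMA.fittedvalues.
index = pd.date_range("2020-01-01", periods=4, freq="D")
observed = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0]}, index=index)
fitted = pd.Series([1.5, 2.0, 2.5, 4.0], index=index)

# Select the column so that both operands are Series aligned on the same index.
residuals = fitted - observed["value"]

# Use the pandas .sum() method, which adds up the values.
rss = (residuals ** 2).sum()
print("RSS: %.4f" % rss)
```

With the real data, the column name would be whatever header the CSV file uses for the series.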

Related

generate px.line graph (graph in web browser window). python, pycharm, kmeans, graph, plotly

I have working code that produces a perfectly good graph with the command plt.plot(df, cs.predict(df), 'o'). It's the right output, but I need to show that graph in a web browser window (my teacher wants it that way). I have other code that does this, and it works:
fig = px.line(new_dataset2, x='time', y="value", title='coffee-machine wattage')
fig.show()
It's a different .csv file, but you get the idea. The graph looks like this:
The thing is, it won't let me plot my data with px.line. Is there some way to convert the plt.plot call to px.line?
I tried this:
fig = px.line(cs.predict(df), x='_time', y="_value", title='coffee-machine wattage',)
fig.show()
but it throws an error
"Traceback (most recent call last):
File "C:\Users\matus\PycharmProjects\fridge\cofeemashine-kmeans.py", line 28, in <module>
fig = px.line(cs.predict(df), x='_time', y="_value", title='coffee-machine wattage',)
File "C:\Users\matus\PycharmProjects\fridge\permut\lib\site-packages\plotly\express\_chart_types.py", line 264, in line
return make_figure(args=locals(), constructor=go.Scatter)
File "C:\Users\matus\PycharmProjects\fridge\permut\lib\site-packages\plotly\express\_core.py", line 1990, in make_figure
args = build_dataframe(args, constructor)
File "C:\Users\matus\PycharmProjects\fridge\permut\lib\site-packages\plotly\express\_core.py", line 1405, in build_dataframe
df_output, wide_id_vars = process_args_into_dataframe(
File "C:\Users\matus\PycharmProjects\fridge\permut\lib\site-packages\plotly\express\_core.py", line 1207, in process_args_into_dataframe
raise ValueError(err_msg)
ValueError: Value of 'x' is not the name of a column in 'data_frame'. Expected one of [0] but received: _time
Process finished with exit code 1
"
I will post my code, graph and part of my .csv file
import pandas as pd
from sklearn import cluster as cls
import matplotlib.pyplot as plt
import plotly.express as px
# Read the CSV file into a pandas DataFrame
df = pd.read_csv('coffee_machine_2022-11-22_09_22_influxdb_data.csv',header=0,usecols=['_time','_value'] )
print(df)
# Convert the _time column to a datetime type
df['_time'] = pd.to_datetime(df['_time'], format='%Y-%m-%dT%H:%M:%SZ')
# Set the index of the DataFrame to the _time column
df = df.set_index('_time')
# Print the resulting DataFrame
print(df)
# df.interpolate(method='linear') # not neccesary
cs = cls.KMeans(n_clusters=4)
cs.fit(df)
print(cs.predict([[1200]]))
plt.plot(df, cs.predict(df), 'o')
plt.show()
#below doesnt work
fig = px.line(cs.predict(df), x='_time', y="_value", title='coffee-machine wattage',)
fig.show()
This is that working graph I need to change
part of my .csv file:
result,table,_start,_stop,_time,_value,_field,_measurement,device
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:35Z,44.61,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:40Z,17.33,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:45Z,41.2,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:51Z,33.49,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:56Z,55.68,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:57Z,55.68,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:13:02Z,25.92,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:13:08Z,5.71,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:13:14Z,553.75,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:13:19Z,5.71,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:13:24Z,8.95,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:13:26Z,5.69,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:13:30Z,5.63,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0

Seaborn bar plot with regression line query

I am trying to produce a bar plot with a line of regression. I am trying to follow a previous suggestion for the same problem but get an error message that I am unable to overcome. My script is as follows:
import seaborn.apionly as sns
import matplotlib.pyplot as plt
import pandas as pd
sns.set(style="white", context="score")
data = {'Days': ['5', '10', '15', '20'],
'Impact': ['33.7561', '30.6281', '29.5748', '29.0482']
}
a = pd.DataFrame (data, columns = ['Days','Impact'])
print (a)
ax = sns.barplot(data=a, x=a.Days, y=a.Impact, color='lightblue' )
# put bars in background:
for c in ax.patches:
    c.set_zorder(0)
# plot regplot with numbers 0,..,len(a) as x value
sns.regplot(x=np.arange(0,len(a)), y=a.Impact, ax=ax)
sns.despine(offset=10, trim=False)
ax.set_ylabel("")
ax.set_xticklabels(['5', '10','15','20'])
plt.show()
The error message I get is:
Traceback (most recent call last):
File "C:\Users\david\AppData\Local\Programs\Spyder\pkgs\IPython\core\async_helpers.py", line 68, in _pseudo_sync_runner
coro.send(None)
File "C:\Users\david\AppData\Local\Programs\Spyder\pkgs\IPython\core\interactiveshell.py", line 3162, in run_cell_async
self.displayhook.exec_result = result
File "C:\Users\david\AppData\Local\Programs\Spyder\pkgs\traitlets\traitlets.py", line 604, in __set__
self.set(obj, value)
File "C:\Users\david\AppData\Local\Programs\Spyder\pkgs\traitlets\traitlets.py", line 578, in set
new_value = self._validate(obj, value)
File "C:\Users\david\AppData\Local\Programs\Spyder\pkgs\traitlets\traitlets.py", line 610, in _validate
value = self.validate(obj, value)
File "C:\Users\david\AppData\Local\Programs\Spyder\pkgs\traitlets\traitlets.py", line 1842, in validate
if isinstance(value, self.klass):
TypeError: isinstance() arg 2 must be a type or tuple of types
ERROR! Session/line number was not unique in database. History logging moved to new session 54
but I am not sure what this means. Can anyone help?

Make sure the values in the DataFrame are int or float rather than strings:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
data = {'Days': [5, 10, 15, 20],
'Impact': [33.7561, 30.6281, 29.5748, 29.0482]
}
a = pd.DataFrame (data, columns = ['Days','Impact'])
print (a)
ax = sns.barplot(data=a, x='Days', y='Impact', color='lightblue' )
# put bars in background:
for c in ax.patches:
    c.set_zorder(0)
# plot regplot with numbers 0,..,len(a) as x value
ax = sns.regplot(x=np.arange(0,len(a)), y=a['Impact'], marker="+")
sns.despine(offset=10, trim=False)
ax.set_ylabel("")
ax.set_xticklabels(['5', '10','15','20'])
plt.show()
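If the values arrive as strings, as in the question, they do not have to be retyped by hand. A minimal sketch, assuming every entry is a well-formed number, converts the columns with pd.to_numeric before plotting:

```python
import pandas as pd

# Same data as in the question, where every value is a string.
data = {'Days': ['5', '10', '15', '20'],
        'Impact': ['33.7561', '30.6281', '29.5748', '29.0482']}
a = pd.DataFrame(data, columns=['Days', 'Impact'])

# Convert each column to a numeric dtype.
# By default pd.to_numeric raises an error if a value cannot be parsed.
a['Days'] = pd.to_numeric(a['Days'])
a['Impact'] = pd.to_numeric(a['Impact'])

print(a.dtypes)
```

After the conversion, the plotting code from the answer can be applied to `a` unchanged.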

Cannot get Proper Labels on PyPlot HeatMap from a Pandas Dataframe

I get an error message when I try to render a heat map using the code below. This is just a test; I have a much more involved application with a large dataset about used cars, but I cannot even get past this issue with two pieces of data.
import pandas as pd
import matplotlib as plt
from matplotlib import pyplot
import numpy as np
# initialize list of lists
#Putting in numbers for the "Name" data ends up working
#data = [[3, 10], [3, 15], [6,50]]
#initializing like this with actual strings for names gives the error
data = [["James", 10], ["Mary", 15], ["Emily", 14]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Name', 'Age'])
# print dataframe.
print(df)
plt.pyplot.pcolor(df, cmap='RdBu')
plt.pyplot.colorbar()
plt.pyplot.ylabel("Age")
plt.pyplot.xlabel("Name")
plt.pyplot.show()
The errors are as follows:
Traceback (most recent call last):
File "/home/j/dataexploratory.py", line 22, in <module>
plt.pyplot.colorbar()
File "/home/j/anaconda2/lib/python2.7/site-packages/matplotlib/pyplot.py", line 2320, in colorbar
ret = gcf().colorbar(mappable, cax = cax, ax=ax, **kw)
File "/home/j/anaconda2/lib/python2.7/site-packages/matplotlib/figure.py", line 2098, in colorbar
cb = cbar.colorbar_factory(cax, mappable, **cb_kw)
File "/home/j/anaconda2/lib/python2.7/site-packages/matplotlib/colorbar.py", line 1399, in colorbar_factory
cb = Colorbar(cax, mappable, **kwargs)
File "/home/j/anaconda2/lib/python2.7/site-packages/matplotlib/colorbar.py", line 945, in __init__
ColorbarBase.__init__(self, ax, **kw)
File "/home/j/anaconda2/lib/python2.7/site-packages/matplotlib/colorbar.py", line 327, in __init__
self.draw_all()
File "/home/j/anaconda2/lib/python2.7/site-packages/matplotlib/colorbar.py", line 349, in draw_all
self._process_values()
File "/home/j/anaconda2/lib/python2.7/site-packages/matplotlib/colorbar.py", line 703, in _process_values
expander=0.1)
File "/home/j/anaconda2/lib/python2.7/site-packages/matplotlib/transforms.py", line 2930, in nonsingular
if (not np.isfinite(vmin)) or (not np.isfinite(vmax)):
TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Convert 'Name' to categorical dtype.
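The answer does not say how the categorical column should then reach pcolor. One possible reading, sketched here as an assumption, is to keep the category labels for the axis ticks and pass the integer category codes as the numeric data:

```python
import pandas as pd

data = [["James", 10], ["Mary", 15], ["Emily", 14]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# Convert the string column to a categorical dtype.
df['Name'] = df['Name'].astype('category')

# Keep the labels for axis ticks and use the integer codes as numeric data.
names = list(df['Name'].cat.categories)
numeric = df.assign(Name=df['Name'].cat.codes)

print(names)
print(numeric)
```

The `numeric` frame contains only numbers, so it no longer triggers the isfinite TypeError. Whether coloring cells by an arbitrary category code is meaningful depends on the plot you are after.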

Basemap TypeError: input must be an array, list, tuple or scalar

I'm trying to create a map visualization using the basemap module in Python 3.0 but when I try to plot this figure I get the TypeError:
TypeError: input must be an array, list, tuple or scalar
My code looks like this:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
data = pd.ExcelFile('C:\\Users\\...xlsx')
data_input = pd.read_excel(data, 'Sheet2')
# Extract the data we're interested in
lat = data_input['value1'].values
lon = data_input['value2'].values
capacity = data_input['value3'].values
# 1. Draw the map background
fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='lcc', resolution='h',
lat_0=31.1351682, lon_0=-99.3350553,
width=1.3E6, height=1.25E6)
m.shadedrelief()
m.drawcoastlines(color='gray')
m.drawcountries(color='gray')
m.drawstates(color='gray')
# 2. scatter city data, with color reflecting population
# and size reflecting area
m.scatter(lon, lat, latlon=True,
c=np.log10(capacity), s=capacity,
cmap='Reds', alpha=0.5)
I've tried changing all the inputs to data_input.values, data_input.to_list(), list(data_input) and just using the default pandas Series.
The error traceback occurs here:
File "<ipython-input-6-3a66206674c7>", line 3, in <module>
cmap='Reds', alpha=0.5)
File "C:\Users\...Continuum\anaconda3\lib\site-packages\mpl_toolkits\basemap\__init__.py", line 566, in with_transform
x, y = self(x,y)
File "C:\Users\...\Continuum\anaconda3\lib\site-packages\mpl_toolkits\basemap\__init__.py", line 1191, in __call__
xout,yout = self.projtran(x,y,inverse=inverse)
File "C:\Users\...\Continuum\anaconda3\lib\site-packages\mpl_toolkits\basemap\proj.py", line 288, in __call__
outx,outy = self._proj4(x, y, inverse=inverse)
File "C:\Users\...\Continuum\anaconda3\lib\site-packages\pyproj\__init__.py", line 397, in __call__
inx, xisfloat, xislist, xistuple = _copytobuffer(lon)
File "C:\Users\...\Continuum\anaconda3\lib\site-packages\pyproj\__init__.py", line 652, in _copytobuffer
raise TypeError('input must be an array, list, tuple or scalar')
No matter what form it gets it doesn't work. What am I missing here?

Python: wrong debugging

I'm implementing some code using PyCharm Community Edition 2016.1.4 as my environment.
I have the following simple code:
print(__doc__)
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn.cluster import KMeans
from sklearn import datasets
np.random.seed(5)
centers = [[1, 1], [-1, -1], [1, -1]]
iris = datasets.load_iris()
X = iris.data
y = iris.target
estimators = {'k_means_iris_3': KMeans(n_clusters=3),
'k_means_iris_8': KMeans(n_clusters=8),
'k_means_iris_bad_init': KMeans(n_clusters=3, n_init=1,
init='random')}
fignum = 1
name = 'k_means_iris_3'
est = KMeans(n_clusters=3)
fig = plt.figure(fignum, figsize=(4, 3))
plt.clf()
ax = Axes3D(fig, rect=[0, 0, .95, 1], elev=48, azim=134)
est.fit(X)
labels = est.labels_
ax.scatter(X[:, 3], X[:, 0], X[:, 2], c=labels.astype(np.float))
ax.w_xaxis.set_ticklabels([])
ax.w_yaxis.set_ticklabels([])
ax.w_zaxis.set_ticklabels([])
ax.set_xlabel('Petal width')
ax.set_ylabel('Sepal length')
ax.set_zlabel('Petal length')
fignum = fignum + 1
plt.show()
If I simply run it, I get the expected image:
However, if I run it in debug mode, when I reach the line:
fig = plt.figure(fignum, figsize=(4, 3))
I get this error:
Traceback (most recent call last):
File "C:\Program Files\Anaconda\lib\site-packages\IPython\core\interactiveshell.py", line 2885, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-1-36b230119a6b>", line 1, in <module>
fig = plt.figure(fignum, figsize=(4, 3))
File "C:\Program Files\Anaconda\lib\site-packages\matplotlib\pyplot.py", line 535, in figure
**kwargs)
File "C:\Program Files\Anaconda\lib\site-packages\matplotlib\backends\backend_qt5agg.py", line 44, in new_figure_manager
return new_figure_manager_given_figure(num, thisFig)
File "C:\Program Files\Anaconda\lib\site-packages\matplotlib\backends\backend_qt5agg.py", line 51, in new_figure_manager_given_figure
canvas = FigureCanvasQTAgg(figure)
File "C:\Program Files\Anaconda\lib\site-packages\matplotlib\backends\backend_qt5agg.py", line 223, in __init__
super(FigureCanvasQTAgg, self).__init__(figure=figure)
File "C:\Program Files\Anaconda\lib\site-packages\matplotlib\backends\backend_qt5agg.py", line 66, in __init__
super(FigureCanvasQTAggBase, self).__init__(figure=figure)
File "C:\Program Files\Anaconda\lib\site-packages\matplotlib\backends\backend_qt5.py", line 239, in __init__
super(FigureCanvasQT, self).__init__(figure=figure)
AttributeError: 'figure()' is not a Qt property or a signal
Can you imagine why?

The Python error was rather misleading. The real problem was that a Python binding, PyQt4, was missing (for some reason related to a double installation of Python).
Just go here, choose the proper installer, and run it. You don't even need to close and reopen PyCharm (within a couple of seconds it fixes itself and the errors disappear).
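To check whether a Qt binding is visible to the interpreter that PyCharm uses, a small diagnostic along these lines may help. It is only a sketch: the function name `available_qt_bindings` and the list of candidate bindings are assumptions made for illustration, and it reports only whether a package can be found, not whether it works with the installed matplotlib.

```python
import importlib.util

# Candidate Qt bindings to look for; this list is an assumption for illustration.
CANDIDATE_BINDINGS = ["PyQt5", "PyQt4", "PySide2", "PySide"]


def available_qt_bindings(candidates=CANDIDATE_BINDINGS):
    """Return the names in `candidates` that this interpreter can locate."""
    return [name for name in candidates
            if importlib.util.find_spec(name) is not None]


print("Qt bindings found:", available_qt_bindings())
```

Running it with both Python installations would show whether they see different sets of bindings.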
