Cannot get Proper Labels on PyPlot HeatMap from a Pandas Dataframe


I get an error message when I try to render a heat map using the code below. This is just a test case; my real application involves a much larger dataset about used cars, but I cannot even get past this issue with two columns of data.
import pandas as pd
import matplotlib as plt
from matplotlib import pyplot
import numpy as np
# initialize list of lists
# Using numbers for the "Name" data works:
#data = [[3, 10], [3, 15], [6,50]]
# Initializing with actual strings for names gives the error
data = [["James", 10], ["Mary", 15], ["Emily", 14]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Name', 'Age'])
# print dataframe.
print(df)
plt.pyplot.pcolor(df, cmap='RdBu')
plt.pyplot.colorbar()
plt.pyplot.ylabel("Age")
plt.pyplot.xlabel("Name")
plt.pyplot.show()
The error is as follows:
Traceback (most recent call last):
File "/home/j/dataexploratory.py", line 22, in <module>
plt.pyplot.colorbar()
File "/home/j/anaconda2/lib/python2.7/site-packages/matplotlib/pyplot.py", line 2320, in colorbar
ret = gcf().colorbar(mappable, cax = cax, ax=ax, **kw)
File "/home/j/anaconda2/lib/python2.7/site-packages/matplotlib/figure.py", line 2098, in colorbar
cb = cbar.colorbar_factory(cax, mappable, **cb_kw)
File "/home/j/anaconda2/lib/python2.7/site-packages/matplotlib/colorbar.py", line 1399, in colorbar_factory
cb = Colorbar(cax, mappable, **kwargs)
File "/home/j/anaconda2/lib/python2.7/site-packages/matplotlib/colorbar.py", line 945, in __init__
ColorbarBase.__init__(self, ax, **kw)
File "/home/j/anaconda2/lib/python2.7/site-packages/matplotlib/colorbar.py", line 327, in __init__
self.draw_all()
File "/home/j/anaconda2/lib/python2.7/site-packages/matplotlib/colorbar.py", line 349, in draw_all
self._process_values()
File "/home/j/anaconda2/lib/python2.7/site-packages/matplotlib/colorbar.py", line 703, in _process_values
expander=0.1)
File "/home/j/anaconda2/lib/python2.7/site-packages/matplotlib/transforms.py", line 2930, in nonsingular
if (not np.isfinite(vmin)) or (not np.isfinite(vmax)):
TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Convert 'Name' to a categorical dtype so that only numeric data is passed to pcolor.
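A minimal sketch of one way to apply this, assuming the goal is a heat map of Age with one labelled row per name: the categorical Name column becomes the row index, so only numeric values reach pcolor (the colorbar fails because it cannot run isfinite on the string data), and the names are applied as tick labels instead.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data = [["James", 10], ["Mary", 15], ["Emily", 14]]
df = pd.DataFrame(data, columns=["Name", "Age"])
df["Name"] = df["Name"].astype("category")

# pcolor needs a purely numeric 2-D array: move Name into the row index
# and keep only the numeric columns in the grid.
grid = df.set_index("Name")[["Age"]]

fig, ax = plt.subplots()
mesh = ax.pcolor(grid.to_numpy(dtype=float), cmap="RdBu")
fig.colorbar(mesh, ax=ax)

# Put one tick in the middle of each cell and label it with the names/columns.
ax.set_yticks(np.arange(len(grid.index)) + 0.5)
ax.set_yticklabels(grid.index.astype(str))
ax.set_xticks(np.arange(len(grid.columns)) + 0.5)
ax.set_xticklabels(grid.columns)
ax.set_ylabel("Name")
```

Calling plt.show() afterwards displays the figure; each row of the heat map is then labelled with a name rather than an index.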

Related

Generate a px.line graph (in a web browser window): python, pycharm, kmeans, graph, plotly

I have working code that produces a correct graph with the command plt.plot(df, cs.predict(df), 'o'). It's the right output, but I need to show that graph in a web browser window (my teacher wants it that way). I have other code that does this successfully:
fig = px.line(new_dataset2, x='time', y="value", title='coffee-machine wattage')
fig.show()
It uses a different .csv file, but you get the idea.
The problem is that I can't get px.line to plot the k-means predictions. Is there some way to convert the plt.plot call to px.line?
I tried this:
fig = px.line(cs.predict(df), x='_time', y="_value", title='coffee-machine wattage',)
fig.show()
but it throws an error:
Traceback (most recent call last):
File "C:\Users\matus\PycharmProjects\fridge\cofeemashine-kmeans.py", line 28, in <module>
fig = px.line(cs.predict(df), x='_time', y="_value", title='coffee-machine wattage',)
File "C:\Users\matus\PycharmProjects\fridge\permut\lib\site-packages\plotly\express\_chart_types.py", line 264, in line
return make_figure(args=locals(), constructor=go.Scatter)
File "C:\Users\matus\PycharmProjects\fridge\permut\lib\site-packages\plotly\express\_core.py", line 1990, in make_figure
args = build_dataframe(args, constructor)
File "C:\Users\matus\PycharmProjects\fridge\permut\lib\site-packages\plotly\express\_core.py", line 1405, in build_dataframe
df_output, wide_id_vars = process_args_into_dataframe(
File "C:\Users\matus\PycharmProjects\fridge\permut\lib\site-packages\plotly\express\_core.py", line 1207, in process_args_into_dataframe
raise ValueError(err_msg)
ValueError: Value of 'x' is not the name of a column in 'data_frame'. Expected one of [0] but received: _time
Process finished with exit code 1
Here is my code, the graph, and part of my .csv file:
import pandas as pd
from sklearn import cluster as cls
import matplotlib.pyplot as plt
import plotly.express as px
# Read the CSV file into a pandas DataFrame
df = pd.read_csv('coffee_machine_2022-11-22_09_22_influxdb_data.csv',header=0,usecols=['_time','_value'] )
print(df)
# Convert the _time column to a datetime type
df['_time'] = pd.to_datetime(df['_time'], format='%Y-%m-%dT%H:%M:%SZ')
# Set the index of the DataFrame to the _time column
df = df.set_index('_time')
# Print the resulting DataFrame
print(df)
# df.interpolate(method='linear') # not necessary
cs = cls.KMeans(n_clusters=4)
cs.fit(df)
print(cs.predict([[1200]]))
plt.plot(df, cs.predict(df), 'o')
plt.show()
# the code below doesn't work
fig = px.line(cs.predict(df), x='_time', y="_value", title='coffee-machine wattage',)
fig.show()
[Figure: the working matplotlib graph that needs to be converted]
Part of my .csv file:
result,table,_start,_stop,_time,_value,_field,_measurement,device
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:35Z,44.61,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:40Z,17.33,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:45Z,41.2,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:51Z,33.49,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:56Z,55.68,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:57Z,55.68,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:13:02Z,25.92,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:13:08Z,5.71,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:13:14Z,553.75,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:13:19Z,5.71,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:13:24Z,8.95,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:13:26Z,5.69,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:13:30Z,5.63,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0

Seaborn bar plot with regression line query

I am trying to produce a bar plot with a regression line. I followed a previous suggestion for the same problem, but I get an error message that I cannot get past. My script is as follows:
import seaborn.apionly as sns
import matplotlib.pyplot as plt
import pandas as pd
sns.set(style="white", context="score")
data = {'Days': ['5', '10', '15', '20'],
'Impact': ['33.7561', '30.6281', '29.5748', '29.0482']
}
a = pd.DataFrame (data, columns = ['Days','Impact'])
print (a)
ax = sns.barplot(data=a, x=a.Days, y=a.Impact, color='lightblue' )
# put bars in background:
for c in ax.patches:
    c.set_zorder(0)
# plot regplot with numbers 0,..,len(a) as x value
sns.regplot(x=np.arange(0,len(a)), y=a.Impact, ax=ax)
sns.despine(offset=10, trim=False)
ax.set_ylabel("")
ax.set_xticklabels(['5', '10','15','20'])
plt.show()
The error message I get is:
Traceback (most recent call last):
File "C:\Users\david\AppData\Local\Programs\Spyder\pkgs\IPython\core\async_helpers.py", line 68, in _pseudo_sync_runner
coro.send(None)
File "C:\Users\david\AppData\Local\Programs\Spyder\pkgs\IPython\core\interactiveshell.py", line 3162, in run_cell_async
self.displayhook.exec_result = result
File "C:\Users\david\AppData\Local\Programs\Spyder\pkgs\traitlets\traitlets.py", line 604, in __set__
self.set(obj, value)
File "C:\Users\david\AppData\Local\Programs\Spyder\pkgs\traitlets\traitlets.py", line 578, in set
new_value = self._validate(obj, value)
File "C:\Users\david\AppData\Local\Programs\Spyder\pkgs\traitlets\traitlets.py", line 610, in _validate
value = self.validate(obj, value)
File "C:\Users\david\AppData\Local\Programs\Spyder\pkgs\traitlets\traitlets.py", line 1842, in validate
if isinstance(value, self.klass):
TypeError: isinstance() arg 2 must be a type or tuple of types
ERROR! Session/line number was not unique in database. History logging moved to new session 54
I am not sure what this means. Can anyone help?

Make sure the DataFrame contains ints or floats, not strings:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
data = {'Days': [5, 10, 15, 20],
'Impact': [33.7561, 30.6281, 29.5748, 29.0482]
}
a = pd.DataFrame (data, columns = ['Days','Impact'])
print (a)
ax = sns.barplot(data=a, x='Days', y='Impact', color='lightblue' )
# put bars in background:
for c in ax.patches:
    c.set_zorder(0)
# plot regplot with numbers 0,..,len(a) as x value
ax = sns.regplot(x=np.arange(0,len(a)), y=a['Impact'], marker="+", ax=ax)
sns.despine(offset=10, trim=False)
ax.set_ylabel("")
ax.set_xticklabels(['5', '10','15','20'])
plt.show()
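If the values already arrive as strings, as in the dict from the question, they can be converted in place rather than retyped. A minimal sketch using DataFrame.astype:

```python
import pandas as pd

# The string data from the question.
data = {'Days': ['5', '10', '15', '20'],
        'Impact': ['33.7561', '30.6281', '29.5748', '29.0482']}
a = pd.DataFrame(data, columns=['Days', 'Impact'])

# Convert each column to a numeric dtype before plotting.
a = a.astype({'Days': int, 'Impact': float})
print(a.dtypes)
```

pd.to_numeric(..., errors='coerce') is an alternative when some entries might not parse; those entries become NaN instead of raising.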

Plot a dataframe of times

I want to use a DataFrame column of times in hh:mm format as the x-ticks of a figure.
Here is a simplified version of what I'm doing:
import matplotlib.pyplot as plt
import pandas as pd
#locate series to plot
df = pd.read_excel('Excel_file', header=None)
df1 = df.iloc[71:128, 3]
df2 = df.iloc[71:128, 0]
#Plot df2 on the x-axis and df1 on the y-axis
plt.plot(df2, df1)
plt.xticks(df2)
plt.show()
This gives me the following (full) error:
Traceback (most recent call last):
File "C:/Users/Alessio/PycharmProjects/PeakAutomation/graphs.py", line 14, in <module>
plt.xticks(df2)
File "C:\Users\Alessio\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\matplotlib\pyplot.py", line 1483, in xticks
locs = ax.set_xticks(ticks)
File "C:\Users\Alessio\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\matplotlib\cbook\deprecation.py", line 400, in wrapper
return func(*args, **kwargs)
File "C:\Users\Alessio\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\matplotlib\axes\_base.py", line 3306, in set_xticks
ret = self.xaxis.set_ticks(ticks, minor=minor)
File "C:\Users\Alessio\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\matplotlib\cbook\deprecation.py", line 400, in wrapper
return func(*args, **kwargs)
File "C:\Users\Alessio\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\matplotlib\axis.py", line 1765, in set_ticks
self.set_view_interval(min(ticks), max(ticks))
File "C:\Users\Alessio\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\matplotlib\axis.py", line 1902, in setter
setter(self, min(vmin, vmax, oldmin), max(vmin, vmax, oldmax),
TypeError: '<' not supported between instances of 'float' and 'datetime.time'
How can I plot df2 as the x-axis values?
When I don't try to change the xticks, the plot renders fine, but I want the x-tick values to be those from the DataFrame (df2).
Thanks
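The traceback shows matplotlib comparing a float with a datetime.time when setting the ticks: matplotlib apparently has no converter that turns time-of-day objects into axis positions. A minimal sketch, assuming the Excel column holds datetime.time objects, attaches an arbitrary dummy date so matplotlib's date handling can place the points, then formats the ticks back to hh:mm. The sample times and values are hypothetical stand-ins for df2 and df1.

```python
import datetime as dt

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-ins for the question's df2 (hh:mm times read from
# Excel as datetime.time objects) and df1 (the values to plot).
df2 = pd.Series([dt.time(8, 0), dt.time(8, 30), dt.time(9, 0), dt.time(9, 30)])
df1 = pd.Series([1.0, 3.0, 2.0, 4.0])

# Attach an arbitrary dummy date (an assumption) to get full datetimes,
# which matplotlib's built-in date converter understands.
day = dt.date(2000, 1, 1)
x = [dt.datetime.combine(day, t) for t in df2]

fig, ax = plt.subplots()
ax.plot(x, df1.to_numpy())
ax.set_xticks(x)  # one tick per time in df2
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))  # hide the dummy date
```

Unlike plotting against integer positions, this keeps uneven gaps between the times visible on the axis.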

Basemap TypeError: input must be an array, list, tuple or scalar

I'm trying to create a map visualization using the Basemap module in Python 3.0, but when I try to plot this figure I get the following TypeError:
TypeError: input must be an array, list, tuple or scalar
My code looks like this:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
data = pd.ExcelFile('C:\\Users\\...xlsx')
data_input = pd.read_excel(data, 'Sheet2')
# Extract the data we're interested in
lat = data_input['value1'].values
lon = data_input['value2'].values
capacity = data_input['value3'].values
# 1. Draw the map background
fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='lcc', resolution='h',
lat_0=31.1351682, lon_0=-99.3350553,
width=1.3E6, height=1.25E6)
m.shadedrelief()
m.drawcoastlines(color='gray')
m.drawcountries(color='gray')
m.drawstates(color='gray')
# 2. scatter the data points, with color (log scale) and size
# both reflecting capacity
m.scatter(lon, lat, latlon=True,
c=np.log10(capacity), s=capacity,
cmap='Reds', alpha=0.5)
I've tried converting all the inputs with data_input.values, data_input.to_list() and list(data_input), and also just using the default pandas Series.
The error traceback occurs here:
File "<ipython-input-6-3a66206674c7>", line 3, in <module>
cmap='Reds', alpha=0.5)
File "C:\Users\...Continuum\anaconda3\lib\site-packages\mpl_toolkits\basemap\__init__.py", line 566, in with_transform
x, y = self(x,y)
File "C:\Users\...\Continuum\anaconda3\lib\site-packages\mpl_toolkits\basemap\__init__.py", line 1191, in __call__
xout,yout = self.projtran(x,y,inverse=inverse)
File "C:\Users\...\Continuum\anaconda3\lib\site-packages\mpl_toolkits\basemap\proj.py", line 288, in __call__
outx,outy = self._proj4(x, y, inverse=inverse)
File "C:\Users\...\Continuum\anaconda3\lib\site-packages\pyproj\__init__.py", line 397, in __call__
inx, xisfloat, xislist, xistuple = _copytobuffer(lon)
File "C:\Users\...\Continuum\anaconda3\lib\site-packages\pyproj\__init__.py", line 652, in _copytobuffer
raise TypeError('input must be an array, list, tuple or scalar')
No matter what form it gets it doesn't work. What am I missing here?
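The traceback ends in pyproj's _copytobuffer, which, in the pyproj versions of that era, appears to raise this error when it cannot convert the input to a float array. One thing worth checking, offered as an assumption rather than a confirmed diagnosis, is whether the Excel columns came back with object dtype because some cells hold text or blanks. A minimal sketch that coerces the three columns to floats and drops unparseable rows; the sample values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data_input: columns read from Excel often come
# back as object dtype when a cell holds text, a blank, or a stray header.
data_input = pd.DataFrame({
    "value1": [30.27, "31.5", 29.76, "n/a"],
    "value2": [-97.74, "-98.5", -95.37, -96.8],
    "value3": [100, 250, "75", 40],
})
print(data_input.dtypes)  # all three columns are object

# Coerce every column to numbers; anything unparseable becomes NaN and the
# whole row is dropped so lat, lon and capacity stay the same length.
cols = ["value1", "value2", "value3"]
clean = data_input[cols].apply(pd.to_numeric, errors="coerce").dropna()

lat = clean["value1"].to_numpy(dtype=float)
lon = clean["value2"].to_numpy(dtype=float)
capacity = clean["value3"].to_numpy(dtype=float)
```

These float64 arrays can then be passed to m.scatter in place of the original .values arrays.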

Computing RSS of ARIMA model

I created an AR model whose parameters were based on my analysis of the data's autocorrelation and partial autocorrelation functions. However, there is an error when I try to compute the RSS value of the resulting model. Here is the code I used:
import matplotlib.pylab as plt
from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 15, 6
import pandas as pd
from statsmodels.tsa.arima_model import ARIMA
df = pd.read_csv('data.csv', header=0, index_col=0, parse_dates=True, sep=';')
model = ARIMA(df, order=(6, 0, 0))
results_ARIMA = model.fit(disp=-1)
plt.plot(df, color='blue', label='Original')
plt.plot(results_ARIMA.fittedvalues, color='red', label='Predicted')
plt.plot(results_ARIMA.predict(start = 23, end = 34, dynamic=True), color='red')
plt.title('RSS: %.4f'% sum((results_ARIMA.fittedvalues-df)**2))
Which results in this error message:
File "C:\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
execfile(filename, namespace)
File "C:\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/Patrick Ulanday/Desktop/Thesis/ARIMA/CRWFR_boundary_ARIMA/ARIMA.py", line 64, in
print ('RSS: %.4f'% sum((results_ARIMA.fittedvalues-df)**2))
File "pandas/_libs/tslib.pyx", line 787, in pandas._libs.tslib.Timestamp.radd
File "pandas/_libs/tslib.pyx", line 1275, in pandas._libs.tslib._Timestamp.add
ValueError: Cannot add integral value to Timestamp without freq.
The modelling itself works and the results are plotted, but I can't upload the image. The problem is only with computing the RSS.

Though I have only a superficial knowledge of time series, the problem in your code is probably that you haven't specified the column name in the statement that computes the RSS. Look at the block of code below:
from statsmodels.tsa.arima_model import ARIMA
model = ARIMA(indexedDataset_logScale, order = (2,1,0))
results_AR = model.fit(disp = -1)
plt.plot(datasetLogDiffShifting)
plt.plot(results_AR.fittedvalues, color = 'red')
plt.title('RSS: %.4f'%sum((results_AR.fittedvalues -
datasetLogDiffShifting['#Passengers'])**2))
print('Plotting AR model')
Hope that it solves your problem.
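A likely explanation for why naming the column helps, sketched with hypothetical toy data instead of a fitted model: fittedvalues is a Series, and subtracting a whole DataFrame from it aligns the Series' dates against the DataFrame's column labels rather than its rows. Python's built-in sum() then iterates over the result's column labels, which now include Timestamps, and 0 + Timestamp appears to be what raises "Cannot add integral value to Timestamp". Subtracting a single column and using the .sum() method keeps everything aligned on dates. The column name value is hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins: a one-column DataFrame like df and a Series like
# results_ARIMA.fittedvalues, both indexed by the same dates.
idx = pd.date_range("2020-01-01", periods=4, freq="D")
df = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0]}, index=idx)
fitted = pd.Series([1.5, 2.0, 2.5, 4.0], index=idx)

# Series minus Series aligns on the shared DatetimeIndex, so the residuals
# are computed date by date.
residuals = fitted - df["value"]

# The .sum() method adds the values, not the labels.
rss = float((residuals ** 2).sum())
print("RSS: %.4f" % rss)
```

Here the residuals are 0.5, 0, -0.5 and 0, so the printed RSS is 0.5000.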
