Seaborn bar plot with regression line query


I am trying to produce a bar plot with a line of regression. I am trying to follow a previous suggestion for the same problem but get an error message that I am unable to overcome. My script is as follows:
import seaborn.apionly as sns
import matplotlib.pyplot as plt
import pandas as pd
sns.set(style="white", context="score")
data = {'Days': ['5', '10', '15', '20'],
'Impact': ['33.7561', '30.6281', '29.5748', '29.0482']
}
a = pd.DataFrame (data, columns = ['Days','Impact'])
print (a)
ax = sns.barplot(data=a, x=a.Days, y=a.Impact, color='lightblue' )
# put bars in background:
for c in ax.patches:
    c.set_zorder(0)
# plot regplot with numbers 0,..,len(a) as x value
sns.regplot(x=np.arange(0,len(a)), y=a.Impact, ax=ax)
sns.despine(offset=10, trim=False)
ax.set_ylabel("")
ax.set_xticklabels(['5', '10','15','20'])
plt.show()
The error message I get is:
Traceback (most recent call last):
File "C:\Users\david\AppData\Local\Programs\Spyder\pkgs\IPython\core\async_helpers.py", line 68, in _pseudo_sync_runner
coro.send(None)
File "C:\Users\david\AppData\Local\Programs\Spyder\pkgs\IPython\core\interactiveshell.py", line 3162, in run_cell_async
self.displayhook.exec_result = result
File "C:\Users\david\AppData\Local\Programs\Spyder\pkgs\traitlets\traitlets.py", line 604, in __set__
self.set(obj, value)
File "C:\Users\david\AppData\Local\Programs\Spyder\pkgs\traitlets\traitlets.py", line 578, in set
new_value = self._validate(obj, value)
File "C:\Users\david\AppData\Local\Programs\Spyder\pkgs\traitlets\traitlets.py", line 610, in _validate
value = self.validate(obj, value)
File "C:\Users\david\AppData\Local\Programs\Spyder\pkgs\traitlets\traitlets.py", line 1842, in validate
if isinstance(value, self.klass):
TypeError: isinstance() arg 2 must be a type or tuple of types
ERROR! Session/line number was not unique in database. History logging moved to new session 54
but I am not sure what this means. Can anyone help?

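Make sure the DataFrame holds int or float values, not strings. The version below also imports numpy (used as np but never imported in the question) and imports seaborn directly instead of seaborn.apionly: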
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
data = {'Days': [5, 10, 15, 20],
'Impact': [33.7561, 30.6281, 29.5748, 29.0482]
}
a = pd.DataFrame (data, columns = ['Days','Impact'])
print (a)
ax = sns.barplot(data=a, x='Days', y='Impact', color='lightblue' )
# put bars in background:
for c in ax.patches:
    c.set_zorder(0)
# plot the regression with positions 0, ..., len(a)-1 (the bar positions) as x values
ax = sns.regplot(x=np.arange(0, len(a)), y=a['Impact'], marker="+", ax=ax)
sns.despine(offset=10, trim=False)
ax.set_ylabel("")
ax.set_xticklabels(['5', '10','15','20'])
plt.show()
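For comparison, here is a minimal sketch without seaborn. This is a swapped-in technique, not the answer's method. It converts the question's string columns with pd.to_numeric, draws the bars at positions 0..n-1 as the answer does, and fits the straight line with np.polyfit. Unlike sns.regplot, it draws no confidence band.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# The question's data, still stored as strings
data = {'Days': ['5', '10', '15', '20'],
        'Impact': ['33.7561', '30.6281', '29.5748', '29.0482']}
a = pd.DataFrame(data)

# Convert every column from str to int/float before plotting
a = a.apply(pd.to_numeric)

# Bars at positions 0..len(a)-1, the same trick as np.arange in the answer
x = np.arange(len(a))
# Least-squares straight line through (position, Impact); polyfit returns [slope, intercept]
slope, intercept = np.polyfit(x, a['Impact'], deg=1)

fig, ax = plt.subplots()
ax.bar(x, a['Impact'], color='lightblue', zorder=0)  # bars in the background
ax.plot(x, slope * x + intercept, color='tab:blue')  # fitted line on top
ax.set_xticks(x)
ax.set_xticklabels(a['Days'].astype(str).tolist())   # show the real Days values
ax.set_xlabel('Days')
plt.show()
```

Because the line is fitted against bar positions rather than the Days values, its slope is "Impact per bar". This is the same choice the np.arange version above makes.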

Related

generate px.line graph (graph in web browser window). python, pycharm, kmeans, graph, plotly

I have working code that produces exactly the graph I want with plt.plot(df, cs.predict(df), 'o'). The output is right, but I need to show the graph in a web browser window (my teacher wants it that way). I have other code that does this:
fig = px.line(new_dataset2, x='time', y="value", title='coffee-machine wattage')
fig.show()
It uses a different .csv file, but you get the idea. The graph looks like this:
[Image: line graph of coffee-machine wattage over time]
The problem is that it won't let me plot with px.line. Is there some way to convert the plt.plot call to px.line?
I tried this:
fig = px.line(cs.predict(df), x='_time', y="_value", title='coffee-machine wattage',)
fig.show()
but it throws an error:
Traceback (most recent call last):
File "C:\Users\matus\PycharmProjects\fridge\cofeemashine-kmeans.py", line 28, in <module>
fig = px.line(cs.predict(df), x='_time', y="_value", title='coffee-machine wattage',)
File "C:\Users\matus\PycharmProjects\fridge\permut\lib\site-packages\plotly\express\_chart_types.py", line 264, in line
return make_figure(args=locals(), constructor=go.Scatter)
File "C:\Users\matus\PycharmProjects\fridge\permut\lib\site-packages\plotly\express\_core.py", line 1990, in make_figure
args = build_dataframe(args, constructor)
File "C:\Users\matus\PycharmProjects\fridge\permut\lib\site-packages\plotly\express\_core.py", line 1405, in build_dataframe
df_output, wide_id_vars = process_args_into_dataframe(
File "C:\Users\matus\PycharmProjects\fridge\permut\lib\site-packages\plotly\express\_core.py", line 1207, in process_args_into_dataframe
raise ValueError(err_msg)
ValueError: Value of 'x' is not the name of a column in 'data_frame'. Expected one of [0] but received: _time
Process finished with exit code 1
Here is my code, the graph, and part of my .csv file:
import pandas as pd
from sklearn import cluster as cls
import matplotlib.pyplot as plt
import plotly.express as px
# Read the CSV file into a pandas DataFrame
df = pd.read_csv('coffee_machine_2022-11-22_09_22_influxdb_data.csv',header=0,usecols=['_time','_value'] )
print(df)
# Convert the _time column to a datetime type
df['_time'] = pd.to_datetime(df['_time'], format='%Y-%m-%dT%H:%M:%SZ')
# Set the index of the DataFrame to the _time column
df = df.set_index('_time')
# Print the resulting DataFrame
print(df)
# df.interpolate(method='linear') # not necessary
cs = cls.KMeans(n_clusters=4)
cs.fit(df)
print(cs.predict([[1200]]))
plt.plot(df, cs.predict(df), 'o')
plt.show()
# below doesn't work
fig = px.line(cs.predict(df), x='_time', y="_value", title='coffee-machine wattage',)
fig.show()
[Image: the working matplotlib graph that I need to change]
Part of my .csv file:
result,table,_start,_stop,_time,_value,_field,_measurement,device
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:35Z,44.61,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:40Z,17.33,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:45Z,41.2,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:51Z,33.49,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:56Z,55.68,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:57Z,55.68,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:13:02Z,25.92,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:13:08Z,5.71,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:13:14Z,553.75,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:13:19Z,5.71,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:13:24Z,8.95,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:13:26Z,5.69,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:13:30Z,5.63,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
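No answer to this question is included here, but the error message gives a hint. cs.predict(df) returns a plain array of cluster labels, so plotly express sees a frame whose only column is named 0. That frame has no '_time' or '_value' column to select. Below is a minimal sketch of preparing a plot-ready frame by attaching the labels to the time-indexed df from the question. The label values are hypothetical stand-ins for cs.predict(df), so the sketch runs without scikit-learn. The plotly call is left as a comment.

```python
import numpy as np
import pandas as pd

# A few rows from the CSV above, indexed by _time as in the question
df = pd.DataFrame({
    "_time": pd.to_datetime(["2022-10-24T12:12:35Z", "2022-10-24T12:12:40Z",
                             "2022-10-24T12:12:45Z", "2022-10-24T12:13:14Z"]),
    "_value": [44.61, 17.33, 41.2, 553.75],
}).set_index("_time")

# Hypothetical stand-in for cs.predict(df): one cluster label per row
labels = np.array([0, 0, 0, 1])

# A bare array wrapped in a DataFrame gets a single column named 0,
# which matches the "Expected one of [0]" part of the error
print(pd.DataFrame(labels).columns.tolist())

# Attach the labels as a named column and turn the _time index back into a column
plot_df = df.assign(cluster=labels).reset_index()
print(plot_df.columns.tolist())

# With plotly installed, a browser version of plt.plot(df, labels, 'o') could then be:
#   import plotly.express as px
#   fig = px.scatter(plot_df, x="_value", y="cluster", title="coffee-machine wattage")
#   fig.show()
```

px.scatter is suggested over px.line because plt.plot(..., 'o') draws unconnected markers. px.line with x='_time' and color='cluster' would be another option if a time axis is wanted.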

Plot a dataframe of times

Hi, I want to use a DataFrame column of times in hh:mm format as the x-ticks of a figure.
To illustrate what I'm doing, I have:
import matplotlib.pyplot as plt
import pandas as pd
#locate series to plot
df = pd.read_excel('Excel_file', header=None)
df1 = df.iloc[71:128, 3]
df2 = df.iloc[71:128, 0]
#Plot df2 on the x-axis and df1 on the y-axis
plt.plot(df2, df1)
plt.xticks(df2)
plt.show()
which gives me the (full) error:
Traceback (most recent call last):
File "C:/Users/Alessio/PycharmProjects/PeakAutomation/graphs.py", line 14, in <module>
plt.xticks(df2)
File "C:\Users\Alessio\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\matplotlib\pyplot.py", line 1483, in xticks
locs = ax.set_xticks(ticks)
File "C:\Users\Alessio\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\matplotlib\cbook\deprecation.py", line 400, in wrapper
return func(*args, **kwargs)
File "C:\Users\Alessio\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\matplotlib\axes\_base.py", line 3306, in set_xticks
ret = self.xaxis.set_ticks(ticks, minor=minor)
File "C:\Users\Alessio\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\matplotlib\cbook\deprecation.py", line 400, in wrapper
return func(*args, **kwargs)
File "C:\Users\Alessio\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\matplotlib\axis.py", line 1765, in set_ticks
self.set_view_interval(min(ticks), max(ticks))
File "C:\Users\Alessio\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\matplotlib\axis.py", line 1902, in setter
setter(self, min(vmin, vmax, oldmin), max(vmin, vmax, oldmax),
TypeError: '<' not supported between instances of 'float' and 'datetime.time'
How can I plot df2 as the x-axis values?
When I don't try to change the xticks, I get this:
[Image: the plot without custom xticks]
That is fine, but I want the x-values to be those from the DataFrame (df2).
Thanks
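The traceback shows plt.xticks(df2) comparing the datetime.time values with floats, because tick positions have to be numbers. Below is a minimal sketch of one workaround, which does not come from the original thread. It places the points at numeric positions (here, minutes after midnight, which keeps the real spacing) and passes the hh:mm strings separately as tick labels. The values in df1 and df2 are hypothetical stand-ins for the Excel columns.

```python
import datetime as dt

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-ins: df2 holds datetime.time values, df1 the y-values
df2 = pd.Series([dt.time(9, 0), dt.time(9, 30), dt.time(10, 0), dt.time(10, 30)])
df1 = pd.Series([3.0, 4.5, 2.0, 5.0])

# Tick positions must be numbers: minutes after midnight keep the real spacing
x = [t.hour * 60 + t.minute for t in df2]
# The hh:mm text is supplied separately as tick labels
labels = [t.strftime("%H:%M") for t in df2]

fig, ax = plt.subplots()
ax.plot(x, df1)
ax.set_xticks(x)
ax.set_xticklabels(labels)
plt.show()
```

With 57 rows, labelling every point will crowd the axis, so passing every n-th position and label (for example x[::5] and labels[::5]) may read better.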

Cannot get Proper Labels on PyPlot HeatMap from a Pandas Dataframe

I get an error message when I try to render a heat map with the code below. This is just a test: I have a much more involved application on a large dataset about used cars, but I cannot even get past this issue with two columns of data.
import pandas as pd
import matplotlib as plt
from matplotlib import pyplot
import numpy as np
# initialize list of lists
#Putting in numbers for the "Name" data ends up working
#data = [[3, 10], [3, 15], [6,50]]
#initializing like this with actual strings for names gives the error
data = [["James", 10], ["Mary", 15], ["Emily", 14]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Name', 'Age'])
# print dataframe.
print(df)
plt.pyplot.pcolor(df, cmap='RdBu')
plt.pyplot.colorbar()
plt.pyplot.ylabel("Age")
plt.pyplot.xlabel("Name")
plt.pyplot.show()
The errors are as follows:
Traceback (most recent call last):
File "/home/j/dataexploratory.py", line 22, in <module>
plt.pyplot.colorbar()
File "/home/j/anaconda2/lib/python2.7/site-packages/matplotlib/pyplot.py", line 2320, in colorbar
ret = gcf().colorbar(mappable, cax = cax, ax=ax, **kw)
File "/home/j/anaconda2/lib/python2.7/site-packages/matplotlib/figure.py", line 2098, in colorbar
cb = cbar.colorbar_factory(cax, mappable, **cb_kw)
File "/home/j/anaconda2/lib/python2.7/site-packages/matplotlib/colorbar.py", line 1399, in colorbar_factory
cb = Colorbar(cax, mappable, **kwargs)
File "/home/j/anaconda2/lib/python2.7/site-packages/matplotlib/colorbar.py", line 945, in __init__
ColorbarBase.__init__(self, ax, **kw)
File "/home/j/anaconda2/lib/python2.7/site-packages/matplotlib/colorbar.py", line 327, in __init__
self.draw_all()
File "/home/j/anaconda2/lib/python2.7/site-packages/matplotlib/colorbar.py", line 349, in draw_all
self._process_values()
File "/home/j/anaconda2/lib/python2.7/site-packages/matplotlib/colorbar.py", line 703, in _process_values
expander=0.1)
File "/home/j/anaconda2/lib/python2.7/site-packages/matplotlib/transforms.py", line 2930, in nonsingular
if (not np.isfinite(vmin)) or (not np.isfinite(vmax)):
TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

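Convert 'Name' to a categorical dtype. pcolor and colorbar can only work with numeric values, and the string column is what makes np.isfinite fail in the traceback.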
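Below is a minimal sketch of one way to apply this. The plotting details are an assumption and do not come from the answer. Only the numeric Age data is passed to pcolor. The categorical Name column supplies the row labels, and its .cat.codes give an integer version of each name if one is ever needed inside the numeric matrix.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data = [["James", 10], ["Mary", 15], ["Emily", 14]]
df = pd.DataFrame(data, columns=["Name", "Age"])

# Categorical dtype: each name is stored as an integer code plus a list of categories
df["Name"] = df["Name"].astype("category")
print(df["Name"].cat.codes.tolist())  # categories are sorted: Emily, James, Mary

# Colour only the numeric data (one row per name), then use the names as labels
fig, ax = plt.subplots()
mesh = ax.pcolor(df[["Age"]].to_numpy(), cmap="RdBu")
fig.colorbar(mesh, ax=ax)
ax.set_yticks(np.arange(len(df)) + 0.5)  # centre of each row
ax.set_yticklabels(df["Name"].astype(str).tolist())
ax.set_xticks([0.5])
ax.set_xticklabels(["Age"])
ax.set_ylabel("Name")
plt.show()
```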

Basemap TypeError: input must be an array, list, tuple or scalar

I'm trying to create a map visualization using the basemap module in Python 3.0 but when I try to plot this figure I get the TypeError:
TypeError: input must be an array, list, tuple or scalar
My code looks like this:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
data = pd.ExcelFile('C:\\Users\\...xlsx')
data_input = pd.read_excel(data, 'Sheet2')
# Extract the data we're interested in
lat = data_input['value1'].values
lon = data_input['value2'].values
capacity = data_input['value3'].values
# 1. Draw the map background
fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='lcc', resolution='h',
lat_0=31.1351682, lon_0=-99.3350553,
width=1.3E6, height=1.25E6)
m.shadedrelief()
m.drawcoastlines(color='gray')
m.drawcountries(color='gray')
m.drawstates(color='gray')
# 2. scatter city data, with color reflecting population
# and size reflecting area
m.scatter(lon, lat, latlon=True,
c=np.log10(capacity), s=capacity,
cmap='Reds', alpha=0.5)
I've tried changing all the inputs to data_input.values, data_input.to_list(), list(data_input) and just using the default pandas Series.
The error traceback occurs here:
File "<ipython-input-6-3a66206674c7>", line 3, in <module>
cmap='Reds', alpha=0.5)
File "C:\Users\...Continuum\anaconda3\lib\site-packages\mpl_toolkits\basemap\__init__.py", line 566, in with_transform
x, y = self(x,y)
File "C:\Users\...\Continuum\anaconda3\lib\site-packages\mpl_toolkits\basemap\__init__.py", line 1191, in __call__
xout,yout = self.projtran(x,y,inverse=inverse)
File "C:\Users\...\Continuum\anaconda3\lib\site-packages\mpl_toolkits\basemap\proj.py", line 288, in __call__
outx,outy = self._proj4(x, y, inverse=inverse)
File "C:\Users\...\Continuum\anaconda3\lib\site-packages\pyproj\__init__.py", line 397, in __call__
inx, xisfloat, xislist, xistuple = _copytobuffer(lon)
File "C:\Users\...\Continuum\anaconda3\lib\site-packages\pyproj\__init__.py", line 652, in _copytobuffer
raise TypeError('input must be an array, list, tuple or scalar')
No matter what form it gets it doesn't work. What am I missing here?
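No answer is included for this question. One frequent cause of this message is a column that pandas read from Excel with object dtype, for example because of a stray text cell. That cause is only an assumption here, since the spreadsheet is not shown. pyproj appears to cast its input to float and raise this TypeError when it cannot, which would explain why switching between .values, lists and Series makes no difference. Below is a minimal sketch of checking the dtype and coercing the columns to float before calling m.scatter. The sample frame and the "n/a" cell are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data_input, with one text cell ("n/a") in value1
data_input = pd.DataFrame({
    "value1": [31.1, 29.8, "n/a"],    # latitude
    "value2": [-99.3, -97.7, -96.8],  # longitude
    "value3": [120, 45, 300],         # capacity
})

# A single non-numeric cell leaves the whole column with object dtype
print(data_input["value1"].to_numpy().dtype)

# Coerce every column to numbers; unparseable cells become NaN and those rows are dropped
numeric = (data_input[["value1", "value2", "value3"]]
           .apply(pd.to_numeric, errors="coerce")
           .dropna())

lat = numeric["value1"].to_numpy(dtype=float)
lon = numeric["value2"].to_numpy(dtype=float)
capacity = numeric["value3"].to_numpy(dtype=float)
log_capacity = np.log10(capacity)  # the colour values passed as c= in m.scatter
```

If data_input.dtypes shows object for any of the three columns, these float arrays can then be passed to m.scatter(lon, lat, latlon=True, c=log_capacity, s=capacity, ...) as in the question.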

Plot a bar using matplotlib using a dictionary

Is there any way to make a bar plot with matplotlib using data directly from a dict?
My dict looks like this:
D = {u'Label1':26, u'Label2': 17, u'Label3':30}
I was expecting
fig = plt.figure(figsize=(5.5,3),dpi=300)
ax = fig.add_subplot(111)
bar = ax.bar(D,range(1,len(D)+1,1),0.5)
to work, but it does not.
Here is the error:
>>> ax.bar(D,range(1,len(D)+1,1),0.5)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/site-packages/matplotlib/axes.py", line 4904, in bar
self.add_patch(r)
File "/usr/local/lib/python2.7/site-packages/matplotlib/axes.py", line 1570, in add_patch
self._update_patch_limits(p)
File "/usr/local/lib/python2.7/site-packages/matplotlib/axes.py", line 1588, in _update_patch_limits
xys = patch.get_patch_transform().transform(vertices)
File "/usr/local/lib/python2.7/site-packages/matplotlib/patches.py", line 580, in get_patch_transform
self._update_patch_transform()
File "/usr/local/lib/python2.7/site-packages/matplotlib/patches.py", line 576, in _update_patch_transform
bbox = transforms.Bbox.from_bounds(x, y, width, height)
File "/usr/local/lib/python2.7/site-packages/matplotlib/transforms.py", line 786, in from_bounds
return Bbox.from_extents(x0, y0, x0 + width, y0 + height)
TypeError: coercing to Unicode: need string or buffer, float found

You can do it in two lines by first plotting the bar chart and then setting the appropriate ticks:
import matplotlib.pyplot as plt
D = {u'Label1':26, u'Label2': 17, u'Label3':30}
plt.bar(range(len(D)), list(D.values()), align='center')
plt.xticks(range(len(D)), list(D.keys()))
# # for python 2.x:
# plt.bar(range(len(D)), D.values(), align='center') # python 2.x
# plt.xticks(range(len(D)), D.keys()) # in python 2.x
plt.show()
Note that in Python 3, D.keys() and D.values() return view objects rather than lists, which matplotlib cannot use directly. That is why both are wrapped in list() in the plt.bar and plt.xticks lines above.

It's a little simpler than most answers here suggest:
import matplotlib.pyplot as plt
D = {u'Label1':26, u'Label2': 17, u'Label3':30}
plt.bar(*zip(*D.items()))
plt.show()
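The one-liner works because D.items() yields (key, value) pairs and zip(*...) regroups them into one tuple of keys and one tuple of values. Those become the x and height arguments of plt.bar. Below is a short sketch that makes the transposition visible. It relies on matplotlib's categorical axis for string x values (available since matplotlib 2.1) and on dicts keeping insertion order (Python 3.7+).

```python
import matplotlib.pyplot as plt

D = {u'Label1': 26, u'Label2': 17, u'Label3': 30}

# D.items() gives (key, value) pairs; zip(*...) transposes them
names, counts = zip(*D.items())
print(names)
print(counts)

# String x values are placed on a categorical axis, one bar per label
fig, ax = plt.subplots()
ax.bar(names, counts)
plt.show()
```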

For future reference, the above code does not work with Python 3. For Python 3, D.keys() and D.values() need to be converted to lists.
import matplotlib.pyplot as plt
D = {u'Label1':26, u'Label2': 17, u'Label3':30}
plt.bar(range(len(D)), list(D.values()), align='center')
plt.xticks(range(len(D)), list(D.keys()))
plt.show()

Why not just:
names, counts = zip(*D.items())
plt.bar(names, counts)

The best way to implement it is with matplotlib.pyplot.bar(x, height, tick_label=...), where x provides scalar positions for the bars in the graph. tick_label does the same job as xticks(). You can also pass a single integer as the position and call plt.bar(integer, height, tick_label) several times. For details, refer to the documentation.
import matplotlib.pyplot as plt
data = {'apple': 67, 'mango': 60, 'lichi': 58}
names = list(data.keys())
values = list(data.values())
# tick_label does the same job as plt.xticks()
plt.bar(range(len(data)),values,tick_label=names)
plt.savefig('bar.png')
plt.show()
The same plot can also be generated without range(), by calling plt.bar() once per bar. The problem with that approach is that tick_label only took effect for the last plt.bar() call, so xticks() was used for the labels instead:
data = {'apple': 67, 'mango': 60, 'lichi': 58}
names = list(data.keys())
values = list(data.values())
plt.bar(0,values[0],tick_label=names[0])
plt.bar(1,values[1],tick_label=names[1])
plt.bar(2,values[2],tick_label=names[2])
plt.xticks(range(0,3),names)
plt.savefig('fruit.png')
plt.show()

I often load the dict into a pandas DataFrame then use the plot function of the DataFrame.
Here is the one-liner:
pandas.DataFrame(D, index=['quantity']).plot(kind='bar')
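Below is a variation on the one-liner above, which is not part of the original answer. A one-row DataFrame puts the dict keys in the legend under a single 'quantity' tick. A Series built from the dict uses the keys as its index, so pandas puts them on the x-axis as tick labels instead.

```python
import matplotlib.pyplot as plt
import pandas as pd

D = {u'Label1': 26, u'Label2': 17, u'Label3': 30}

# The Series index (the dict keys) becomes the x tick labels; rot=0 keeps them horizontal
ax = pd.Series(D).plot(kind="bar", rot=0)
ax.set_ylabel("quantity")
plt.show()
```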

Why not just:
import seaborn as sns
sns.barplot(x=list(D.keys()), y=list(D.values()))
