Creating a multi-bar plot in Matplotlib - python

Given a simple pd.Dataframe df that looks like this:
                                                      workflow  blocked_14  blocked_7  blocked_5  blocked_2  blocked_1
au_in_service_order_response      au_in_service_order_response       12.00      11.76      15.38       25.0        0.0
au_in_cats_sync_billing_period  au_in_cats_sync_billing_period        3.33       0.00       0.00        0.0        0.0
au_in_MeterDataNotification        au_in_MeterDataNotification        8.70       0.00       0.00        0.0        0.0
I want to create a bar chart that shows the blocked_* columns on the x-axis.
Since df.plot(x='workflow', kind='bar') obviously puts the workflows on the x-axis, I tried ax = blocked_df.plot(x=['blocked_14','blocked_7',...], kind='bar'), but this gives me:
ValueError: x must be a label or position
How would I create 5 groups on the x-axis (one per blocked_* column) and have each bar show the corresponding value of a workflow?

Because pandas plots the index along the x-axis and draws one set of bars for each column, you'll need to transpose your DataFrame first, so that the blocked_* labels become the index and each workflow becomes a column.
import matplotlib.pyplot as plt
ax = df.set_index('workflow').T.plot.bar()
plt.show()
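To see what the transpose does, here is a minimal sketch that rebuilds the sample frame from the question (assuming workflow is available as a regular column, which the set_index call above requires) and inspects the result without plotting:

```python
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    'workflow': ['au_in_service_order_response',
                 'au_in_cats_sync_billing_period',
                 'au_in_MeterDataNotification'],
    'blocked_14': [12.00, 3.33, 8.70],
    'blocked_7': [11.76, 0.00, 0.00],
    'blocked_5': [15.38, 0.00, 0.00],
    'blocked_2': [25.0, 0.0, 0.0],
    'blocked_1': [0.0, 0.0, 0.0],
})

# Move workflow into the index, then transpose: the blocked_* labels become
# the index (one x-axis group each) and every workflow becomes a column
# (one bar per workflow inside each group)
wide = df.set_index('workflow').T
print(wide.index.tolist())    # ['blocked_14', 'blocked_7', 'blocked_5', 'blocked_2', 'blocked_1']
print(wide.columns.tolist())  # the three workflow names
```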
But that doesn't look very good, does it? Let's make sure all of the labels fit on the Axes and move the legend outside the plot so it doesn't obscure the data.
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(14, 6), layout='constrained')
ax = df.set_index('workflow').T.plot.bar(legend=False, ax=ax)
ax.legend(loc='upper left', bbox_to_anchor=(1, .8))
plt.show()
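The question also asks for each bar to show its value. One way to do that, sketched below, is Matplotlib's Axes.bar_label (available since Matplotlib 3.4), applied to each bar container that pandas creates. layout='constrained' assumes a Matplotlib version that accepts the layout argument (3.5 or later, as far as I know), and the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    'workflow': ['au_in_service_order_response',
                 'au_in_cats_sync_billing_period',
                 'au_in_MeterDataNotification'],
    'blocked_14': [12.00, 3.33, 8.70],
    'blocked_7': [11.76, 0.00, 0.00],
    'blocked_5': [15.38, 0.00, 0.00],
    'blocked_2': [25.0, 0.0, 0.0],
    'blocked_1': [0.0, 0.0, 0.0],
})

fig, ax = plt.subplots(figsize=(14, 6), layout='constrained')
# rot=0 keeps the blocked_* tick labels horizontal
df.set_index('workflow').T.plot.bar(legend=False, ax=ax, rot=0)
# Legend outside the Axes, anchored just right of the plotting area
ax.legend(loc='upper left', bbox_to_anchor=(1, .8))

# pandas adds one BarContainer per column (workflow); label every bar with its height
for container in ax.containers:
    ax.bar_label(container, fmt='%.2f')

# with an interactive backend, call plt.show() here
```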

Related

Bivariate histogram with user-defined contour lines in python [duplicate]

I am trying to plot two displots side by side with this code
fig,(ax1,ax2) = plt.subplots(1,2)
sns.displot(x =X_train['Age'], hue=y_train, ax=ax1)
sns.displot(x =X_train['Fare'], hue=y_train, ax=ax2)
It returns two empty subplots, followed by one displot per line on two separate lines.
If I try the same code with violinplot, it returns the result as expected:
fig,(ax1,ax2) = plt.subplots(1,2)
sns.violinplot(y_train, X_train['Age'], ax=ax1)
sns.violinplot(y_train, X_train['Fare'], ax=ax2)
Why is displot returning a different kind of output and what can I do to output two plots on the same line?
seaborn.distplot has been DEPRECATED in seaborn 0.11 and is replaced with the following:
displot(), a figure-level function with similar flexibility over the kind of plot to draw. It returns a FacetGrid and does not have an ax parameter, so it will not work with matplotlib.pyplot.subplots.
histplot(), an axes-level function for plotting histograms, including with kernel density smoothing. It does have an ax parameter, so it will work with matplotlib.pyplot.subplots.
This applies to all of the seaborn figure-level (FacetGrid) functions: none of them has an ax parameter, so use the equivalent axes-level function instead.
Look at the documentation for the figure-level plot to find the appropriate axes-level plot function for your needs.
See Figure-level vs. axes-level functions
Because histograms of two different columns are desired, it's easier to use histplot.
See How to plot in multiple subplots for a number of different ways to plot into matplotlib.pyplot.subplots.
Also review: seaborn histplot and displot output doesn't match.
Tested in seaborn 0.11.1 & matplotlib 3.4.2
fig, (ax1, ax2) = plt.subplots(1, 2)
sns.histplot(x=X_train['Age'], hue=y_train, ax=ax1)
sns.histplot(x=X_train['Fare'], hue=y_train, ax=ax2)
Imports and DataFrame Sample
import seaborn as sns
import matplotlib.pyplot as plt
# load data
penguins = sns.load_dataset("penguins", cache=False)
# display(penguins.head())
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
0 Adelie Torgersen 39.1 18.7 181.0 3750.0 MALE
1 Adelie Torgersen 39.5 17.4 186.0 3800.0 FEMALE
2 Adelie Torgersen 40.3 18.0 195.0 3250.0 FEMALE
3 Adelie Torgersen NaN NaN NaN NaN NaN
4 Adelie Torgersen 36.7 19.3 193.0 3450.0 FEMALE
Axes Level Plot
With the data in a wide format, use sns.histplot
# select the columns to be plotted
cols = ['bill_length_mm', 'bill_depth_mm']
# create the figure and axes
fig, axes = plt.subplots(1, 2)
axes = axes.ravel() # flattening the array makes indexing easier
for col, ax in zip(cols, axes):
    sns.histplot(data=penguins[col], kde=True, stat='density', ax=ax)
fig.tight_layout()
plt.show()
Figure Level Plot
With the dataframe in a long format, use displot
# create a long dataframe
dfl = penguins.melt(id_vars='species', value_vars=['bill_length_mm', 'bill_depth_mm'], var_name='bill_size', value_name='vals')
# display(dfl.head())
species bill_size vals
0 Adelie bill_length_mm 39.1
1 Adelie bill_length_mm 39.5
2 Adelie bill_length_mm 40.3
3 Adelie bill_length_mm NaN
4 Adelie bill_length_mm 36.7
# plot
sns.displot(data=dfl, x='vals', col='bill_size', kde=True, stat='density', common_bins=False, common_norm=False, height=4, facet_kws={'sharey': False, 'sharex': False})
Multiple DataFrames
If there are multiple dataframes, combine them with pd.concat, using .assign to create an identifying 'source' column that can be used for row=, col=, or hue=.
# list of dataframes
lod = [df1, df2, df3]
# create one dataframe with a new 'source' column to use for row, col, or hue
df = pd.concat((d.assign(source=f'df{i}') for i, d in enumerate(lod, 1)), ignore_index=True)
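For instance, with three small hypothetical frames standing in for df1, df2 and df3, the combined frame carries a source label on every row:

```python
import pandas as pd

# Hypothetical stand-ins for df1, df2 and df3
df1 = pd.DataFrame({'vals': [1.0, 2.0]})
df2 = pd.DataFrame({'vals': [3.0]})
df3 = pd.DataFrame({'vals': [4.0, 5.0, 6.0]})
lod = [df1, df2, df3]

# enumerate(..., 1) numbers the frames from 1, so the labels are df1, df2, df3;
# ignore_index=True gives the result a fresh 0..n-1 index
df = pd.concat((d.assign(source=f'df{i}') for i, d in enumerate(lod, 1)), ignore_index=True)
print(df['source'].tolist())  # ['df1', 'df1', 'df2', 'df3', 'df3', 'df3']
```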
See Import multiple csv files into pandas and concatenate into one DataFrame to read multiple files into a single dataframe with an identifying column.

How to plot multiple DataFrames from CSV files in the same plot

I have 4 DataFrames in 4 CSV files. I need to plot the time series (Date, mean) in the same plot.
This is my script:
cc = Series.from_csv('D:/python/means2000_2001.csv' , header=0)
fig = plt.figure()
plt.plot(cc , color='red')
fig.suptitle('test title', fontsize=20)
plt.xlabel('Date', fontsize=15)
plt.ylabel('MEANS ', fontsize=15)
plt.xticks(rotation=90)
The 4 DataFrames look like this (x = Date, y = mean):
Out[307]:
Date
07-28 0.17
08-13 0.18
08-29 0.17
09-14 0.19
09-30 0.19
10-16 0.20
11-01 0.18
11-17 0.22
12-03 0.21
12-19 0.82
01-02 0.59
01-18 0.52
02-03 0.54
02-19 0.53
03-07 0.33
03-23 0.32
04-08 0.31
04-24 0.39
05-10 0.40
05-26 0.40
06-11 0.37
06-27 0.33
07-13 0.29
Name: mean, dtype: float64
When I plot the time series, I get this graph:
How can I plot all DataFrames in the same plot with different colors?
I need something like this:
You can do both:
plot all curves with a single command (see the plt.plot() call below)
address each curve individually (see the for loop with plt.fill_between() below)
If you have 2 DataFrames, say df1 and df2, call plt.plot() twice:
plt.plot(t, df1); plt.plot(t, df2); plt.show()
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# --- generate data and DataFrame ---
nt = 100
t = np.linspace(0, 1, nt) * 3 * np.pi
y1 = np.sin(t); y2 = np.cos(t); y3 = y1 * y2
df = pd.DataFrame({'y1': y1, 'y2': y2, 'y3': y3})
# --- graphics ---
plt.style.use('fast')
fig, ax0 = plt.subplots(figsize=(20, 4))
plt.plot(t, df, lw=4, alpha=0.6)  # plot all curves with 1 command
for j in range(len(df.columns)):  # add on: fill_between for each curve
    plt.fill_between(t, df.values[:, j], label=df.columns[j], alpha=0.2)
plt.legend(prop={'size': 15}); plt.grid(axis='y'); plt.show()
The answer
You can plot multiple dataframes on a single graph by capturing the Axes object that df.plot returns and then reusing it. Here's an example with two dataframes, df1 and df2:
ax = df1.plot(x='dates', y='vals', label='val 1')
df2.plot(x='dates', y='vals', label='val 2', ax=ax)
plt.show()
Output:
Details
Here's the code I used to generate random example values for df1 and df2:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
def random_dates(start, end, n=10):
    if isinstance(start, str): start = pd.to_datetime(start)
    if isinstance(end, str): end = pd.to_datetime(end)
    start_u = start.value // 10**9  # nanoseconds -> seconds since the epoch
    end_u = end.value // 10**9
    return pd.to_datetime(np.random.randint(start_u, end_u, n), unit='s')
# generate two random dfs
df1 = pd.DataFrame({'dates': random_dates('2016-01-01', '2016-12-31'), 'vals': np.random.rand(10)})
df2 = pd.DataFrame({'dates': random_dates('2016-01-01', '2016-12-31'), 'vals': np.random.rand(10)})
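Applied to the question's four CSV files, the same reuse-the-Axes idea might look like the sketch below. Note that Series.from_csv, used in the question, was deprecated and then removed in pandas 1.0, so the sketch reads the data with pd.read_csv instead. The contents of csv_texts are hypothetical in-memory stand-ins (via io.StringIO), and only the first name echoes the question's file name; with real files you would pass each path to pd.read_csv.

```python
import io
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-ins for the four CSV files (Date, mean)
csv_texts = {
    'means2000_2001': "Date,mean\n07-28,0.17\n08-13,0.18\n08-29,0.17\n",
    'means2001_2002': "Date,mean\n07-28,0.21\n08-13,0.25\n08-29,0.22\n",
    'means2002_2003': "Date,mean\n07-28,0.15\n08-13,0.16\n08-29,0.19\n",
    'means2003_2004': "Date,mean\n07-28,0.30\n08-13,0.28\n08-29,0.26\n",
}

fig, ax = plt.subplots()
for name, text in csv_texts.items():
    frame = pd.read_csv(io.StringIO(text))
    # Passing the same ax every time draws all curves on one plot;
    # each new line takes the next colour from the Axes' colour cycle
    frame.plot(x='Date', y='mean', ax=ax, label=name)

fig.suptitle('test title', fontsize=20)
ax.set_xlabel('Date', fontsize=15)
ax.set_ylabel('MEANS', fontsize=15)
ax.tick_params(axis='x', labelrotation=90)
```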

Plotting Date fails on x-axis and extension to subplots

I am trying to plot some data, but the dates shown on the x-axis are not in the proper format. Instead of 2008-01-03 etc., I am getting 0028-02-23. The dates are already in the proper format when the data is loaded from the CSV file.
In addition, I would like to plot the data in separate subplots: Valuegroup A in subplot 1, Valuegroup B in subplot 2, etc.
The figure looks like this:
The code:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter
csv_loader = pd.read_csv('C:/Users/micha/Desktop/Test.csv', encoding='cp1252', parse_dates=['Date'], sep=';', index_col=0).dropna()
fig, ax = plt.subplots()
csv_loader.groupby('Valuegroup').plot(x='Date', y='Value', ax=ax, legend=False, kind='line')
plt.grid(True)
myFmt = DateFormatter("%Y-%m-%d")
ax.xaxis.set_minor_formatter(myFmt)
plt.show()
The data looks like:
Calcgroup;Valuegroup;id;Date;Value
Group1;A;1;20080103;0.1
Group1;A;1;20080104;0.3
Group1;A;1;20080107;0.5
Group1;A;1;20080108;0.9
Group1;B;1;20080103;0.5
Group1;B;1;20080104;1.3
Group1;B;1;20080107;2.0
Group1;B;1;20080108;0.15
Group1;C;1;20080103;1.9
Group1;C;1;20080104;2.1
Group1;C;1;20080107;2.9
Group1;C;1;20080108;0.45
and after importing I have this dataframe:
csv_loader
Valuegroup id Date Value
Calcgroup
Group1 A 1 2008-01-03 0.10
Group1 A 1 2008-01-04 0.30
Group1 A 1 2008-01-07 0.50
Group1 A 1 2008-01-08 0.90
Group1 B 1 2008-01-03 0.50
Group1 B 1 2008-01-04 1.30
Group1 B 1 2008-01-07 2.00
Group1 B 1 2008-01-08 0.15
Group1 C 1 2008-01-03 1.90
Group1 C 1 2008-01-04 2.10
Group1 C 1 2008-01-07 2.90
Group1 C 1 2008-01-08 0.45
Try this solution:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter
import matplotlib.dates as mdates
csv_loader = pd.read_csv('C:/Users/micha/Desktop/Test.csv', encoding='cp1252', parse_dates=['Date'], sep=';', index_col=0)
fig, ax = plt.subplots()
csv_loader.groupby('Valuegroup').plot(x='Date', y='Value', ax=ax, legend=False, kind='line')
plt.grid(True)
myFmt = DateFormatter('%Y-%m-%d')
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(myFmt)
fig.autofmt_xdate()
plt.show()
To be honest, I was not able to find what is going wrong with the date format in your code. However, when testing my approach for plotting in separate subplots, which you also asked for, I saw that the formatting problem was gone and the automatic format was already the one you want:
fig, axs = plt.subplots(3, sharex=True, sharey=True)
for i, (name, grp) in enumerate(csv_loader.groupby('Valuegroup')):
    axs[i].plot(grp.Date, grp.Value)
    axs[i].set_title(name)
plt.tight_layout()
See for yourself:
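The 0028-... labels are, I believe, a side effect of pandas' own time-series axis handling: for regularly spaced dates, DataFrame.plot may place the points at period-based x-values rather than Matplotlib date numbers, so a Matplotlib DateFormatter decodes them as dates around year 28. That explanation is my reading of the symptom, not something the answer above confirms. The sketch below sidesteps it by drawing each Valuegroup with plain Matplotlib, in line with the subplot approach above; passing x_compat=True to DataFrame.plot is the pandas-side switch for Matplotlib-compatible date handling. It reads the question's sample rows from a string instead of Test.csv.

```python
import io
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Sample data from the question, read from a string instead of Test.csv
data = """Calcgroup;Valuegroup;id;Date;Value
Group1;A;1;20080103;0.1
Group1;A;1;20080104;0.3
Group1;A;1;20080107;0.5
Group1;A;1;20080108;0.9
Group1;B;1;20080103;0.5
Group1;B;1;20080104;1.3
Group1;B;1;20080107;2.0
Group1;B;1;20080108;0.15
Group1;C;1;20080103;1.9
Group1;C;1;20080104;2.1
Group1;C;1;20080107;2.9
Group1;C;1;20080108;0.45"""
df = pd.read_csv(io.StringIO(data), sep=';')
# The Date column arrives as int64 (e.g. 20080103); convert it explicitly
df['Date'] = pd.to_datetime(df['Date'].astype(str), format='%Y%m%d')

groups = list(df.groupby('Valuegroup'))
fig, axs = plt.subplots(len(groups), sharex=True, sharey=True)
for ax, (name, grp) in zip(axs, groups):
    # Plain Matplotlib converts the datetime values to real date numbers,
    # so the DateFormatter below decodes them correctly
    ax.plot(grp['Date'], grp['Value'])
    ax.set_title(name)
    ax.grid(True)

# The x-axis is shared, so one locator/formatter serves all subplots
axs[-1].xaxis.set_major_locator(mdates.DayLocator())
axs[-1].xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
fig.autofmt_xdate()
fig.tight_layout()
```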

Wrong Dates in Dataframe and Subplots

I am trying to plot the data in my CSV file. Currently my dates are not shown properly in the plot, even after I convert them. How can I show them in the Y-m-d format I defined? My second question: I am currently plotting all the data in one plot, but I want one subplot for every Valuegroup.
My code looks like the following:
import pandas as pd
import matplotlib.pyplot as plt
csv_loader = pd.read_csv('C:/Test.csv', encoding='cp1252', sep=';', index_col=0).dropna()
csv_loader['Date'] = pd.to_datetime(csv_loader['Date'], format="%Y-%m-%d")
print(csv_loader)
fig, ax = plt.subplots()
csv_loader.groupby('Valuegroup').plot(x='Date', y='Value', ax=ax, legend=False, kind='line')
plt.grid(True)
The csv file looks like the following:
Calcgroup;Valuegroup;id;Date;Value
Group1;A;1;20080103;0.1
Group1;A;1;20080104;0.3
Group1;A;1;20080107;0.5
Group1;A;1;20080108;0.9
Group1;B;1;20080103;0.5
Group1;B;1;20080104;1.3
Group1;B;1;20080107;2.0
Group1;B;1;20080108;0.15
Group1;C;1;20080103;1.9
Group1;C;1;20080104;2.1
Group1;C;1;20080107;2.9
Group1;C;1;20080108;0.45
You can just tell pandas to parse that column as a datetime and it will work:
In[151]:
import io
import pandas as pd
import matplotlib.pyplot as plt
t="""Calcgroup;Valuegroup;id;Date;Value
Group1;A;1;20080103;0.1
Group1;A;1;20080104;0.3
Group1;A;1;20080107;0.5
Group1;A;1;20080108;0.9
Group1;B;1;20080103;0.5
Group1;B;1;20080104;1.3
Group1;B;1;20080107;2.0
Group1;B;1;20080108;0.15
Group1;C;1;20080103;1.9
Group1;C;1;20080104;2.1
Group1;C;1;20080107;2.9
Group1;C;1;20080108;0.45"""
df = pd.read_csv(io.StringIO(t), parse_dates=['Date'], sep=';', index_col=0)
df
Out[151]:
Valuegroup id Date Value
Calcgroup
Group1 A 1 2008-01-03 0.10
Group1 A 1 2008-01-04 0.30
Group1 A 1 2008-01-07 0.50
Group1 A 1 2008-01-08 0.90
Group1 B 1 2008-01-03 0.50
Group1 B 1 2008-01-04 1.30
Group1 B 1 2008-01-07 2.00
Group1 B 1 2008-01-08 0.15
Group1 C 1 2008-01-03 1.90
Group1 C 1 2008-01-04 2.10
Group1 C 1 2008-01-07 2.90
Group1 C 1 2008-01-08 0.45
fig, ax = plt.subplots()
df.groupby('Valuegroup').plot(x='Date', y='Value', ax=ax, legend=False, kind='line')
plt.grid(True)
plt.show()
results in:
Besides, your format string was incorrect anyway; it should be:
csv_loader['Date'] = pd.to_datetime(csv_loader['Date'], format="%Y%m%d")
However, this won't work on its own, because that column will have been loaded as int dtype, so you would need to convert it to string first:
csv_loader['Date'] = pd.to_datetime(csv_loader['Date'].astype(str), format="%Y%m%d")
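A quick check of both points, as a sketch assuming the column arrives as int64 values in YYYYMMDD form (the four values are taken from the question's CSV):

```python
import pandas as pd

# Date column as read_csv loads it without parse_dates: plain integers
dates = pd.Series([20080103, 20080104, 20080107, 20080108])

# Convert to strings first, then parse with the hyphen-free format
parsed = pd.to_datetime(dates.astype(str), format='%Y%m%d')
print(parsed.dt.strftime('%Y-%m-%d').tolist())
# ['2008-01-03', '2008-01-04', '2008-01-07', '2008-01-08']

# The question's format expects hyphens, so it cannot match these strings
try:
    pd.to_datetime(dates.astype(str), format='%Y-%m-%d')
except ValueError as exc:
    print('format mismatch:', exc)
```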
To format the dates on the x-axis, you can use DateFormatter from matplotlib; see the related question: Editing the date formatting of x-axis tick labels in matplotlib
from matplotlib.dates import DateFormatter
fig, ax = plt.subplots()
df.groupby('Valuegroup').plot(x='Date', y='Value', ax=ax, legend=False, kind='line')
plt.grid(True)
myFmt = DateFormatter("%d-%m-%Y")
ax.xaxis.set_minor_formatter(myFmt)
plt.show()
now gives this plot:
You're parsing your dates wrong; "%Y-%m-%d" would work for dates like 2017-12-11 (which is Dec 11, 2017). Your dates are of the form "%Y%m%d", without the hyphens.
