How can I create a seaborn boxplot with two y-axes? I need this because the data are on different scales. My current code overwrites the first box in the boxplot: the first position is populated both by the first data item from the first axes and by the first item from the second axes.
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
matplotlib.style.use('ggplot')
import seaborn as sns
df = pd.DataFrame({'A': pd.Series(np.random.uniform(0,1,size=10)),
'B': pd.Series(np.random.uniform(10,20,size=10)),
'C': pd.Series(np.random.uniform(10,20,size=10))})
fig = plt.figure()
# 2/3 of A4
fig.set_size_inches(7.8, 5.51)
plt.ylim(0.0, 1.1)
ax1 = fig.add_subplot(111)
ax1 = sns.boxplot(ax=ax1, data=df[['A']])
ax2 = ax1.twinx()
boxplot = sns.boxplot(ax=ax2, data=df[['B','C']])
fig = boxplot.get_figure()
fig
How do I prevent the first item getting overwritten?
EDIT:
If I add positions argument
boxplot = sns.boxplot(ax=ax2, data=df[['B','C']], positions=[2,3])
I get an exception:
TypeError: boxplot() got multiple values for keyword argument 'positions'
Probably because seaborn already sets that argument internally.
It may not make too much sense to use seaborn here. Using usual matplotlib boxplots allows you to use the positions argument as expected.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('ggplot')
df = pd.DataFrame({'A': pd.Series(np.random.uniform(0,1,size=10)),
'B': pd.Series(np.random.uniform(10,20,size=10)),
'C': pd.Series(np.random.uniform(10,20,size=10))})
fig, ax1 = plt.subplots(figsize=(7.8, 5.51))
props = dict(widths=0.7,patch_artist=True, medianprops=dict(color="gold"))
box1=ax1.boxplot(df['A'].values, positions=[0], **props)
ax2 = ax1.twinx()
box2=ax2.boxplot(df[['B','C']].values,positions=[1,2], **props)
ax1.set_xlim(-0.5,2.5)
ax1.set_xticks(range(len(df.columns)))
ax1.set_xticklabels(df.columns)
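# Color the boxes from the active style's color cycle
cycle_colors = plt.rcParams["axes.prop_cycle"].by_key()["color"]
for b, color in zip(box1["boxes"] + box2["boxes"], cycle_colors):
    b.set_facecolor(color)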
plt.show()
I am struggling with syncing colors between [seaborn.countplot] and [pandas.DataFrame.plot] pie plot.
I found a similar question on SO, but it does not work with pie chart as it throws an error:
TypeError: pie() got an unexpected keyword argument 'color'
I searched on the documentation sites, but all I could find is that I can set a colormap and palette, which was also not in sync in the end:
Result of using the same colormap and palette
My code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('https://andybek.com/pandas-sat')
cat_vars = ['Borough', 'SAT Section']
for var in list(cat_vars):
    fig, ax = plt.subplots(1, 2, figsize=(10, 5))
    df[var].value_counts().plot(kind='pie', autopct=lambda v: f'{v:.2f}%', ax=ax[0])
    cplot = sns.countplot(data=df, x=var, ax=ax[1])
    for patch in cplot.patches:
        cplot.annotate(
            format(patch.get_height()),
            (
                patch.get_x() + patch.get_width() / 2,
                patch.get_height()
            )
        )
plt.show()
Illustration of the problem
As you can see, colors are not in sync with labels.
I added the order argument to sns.countplot(). This changes the order in which seaborn draws the categories, and as a consequence the colours in both plots will match.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('https://andybek.com/pandas-sat')
cat_vars = ['Borough', 'SAT Section']
for var in list(cat_vars):
    fig, ax = plt.subplots(1, 2, figsize=(10, 5))
    df[var].value_counts().plot(kind='pie', autopct=lambda v: f'{v:.2f}%', ax=ax[0])
    cplot = sns.countplot(data=df, x=var, ax=ax[1],
                          order=df[var].value_counts().index)
    for patch in cplot.patches:
        cplot.annotate(
            format(patch.get_height()),
            (
                patch.get_x() + patch.get_width() / 2,
                patch.get_height()
            )
        )
plt.show()
Explanation: Colors are assigned by position in the drawing order. So, if the categories in sns.countplot are in a different order than in the other plot, the two plots will show different colors for the same label.
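A small sketch of why the order matters. It assumes, as I understand seaborn's default, that countplot draws a plain string column in order of first appearance, while value_counts() sorts by frequency. The Series below is made up for illustration.

```python
import pandas as pd

# Hypothetical category column, standing in for df[var]
s = pd.Series(["Bronx", "Queens", "Queens", "Brooklyn", "Queens", "Brooklyn"])

# Order of first appearance (what an unordered categorical plot would likely use)
appearance_order = list(s.unique())

# Order by descending frequency (what the pie chart of value_counts() uses)
count_order = list(s.value_counts().index)

print(appearance_order)
print(count_order)
```

Because the n-th color goes to the n-th category in each plot, the two orders have to agree for the labels to share colors, which is what passing order=df[var].value_counts().index achieves.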
Using default colors
Using the same dataframe for the pie plot and for the seaborn plot might help. As the values are already counted for the pie plot, that same dataframe could be plotted directly as a bar plot. That way, the order of the values stays the same.
Note that seaborn by default makes the colors a bit less saturated. To get the same colors as in the pie plot, you can use saturation=1 (default is .75). To add text above the bars, matplotlib 3.4 and later provide the bar_label function.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
df = pd.read_csv('https://andybek.com/pandas-sat')
cat_vars = ['Borough', 'SAT Section']
for var in list(cat_vars):
    fig, ax = plt.subplots(1, 2, figsize=(10, 5))
    counts_df = df[var].value_counts()
    counts_df.plot(kind='pie', autopct=lambda v: f'{v:.2f}%', ax=ax[0])
    sns.barplot(x=counts_df.index, y=counts_df.values, saturation=1, ax=ax[1])
    ax[1].bar_label(ax[1].containers[0])
Customized colors
If you want to use a customized list of colors, you can use the colors= keyword in pie() and palette= in seaborn.
To make things fit better, you can replace spaces by newlines (so "Staten Island" will use two lines). plt.tight_layout() will rearrange spacings to make titles and texts fit nicely into the figure.
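A minimal sketch of the customized-colors idea, under some assumptions: the counts are made up (they stand in for df[var].value_counts()), the color names are arbitrary, and the bar chart is drawn with plain matplotlib so the sketch does not depend on a particular seaborn version. The same label-to-color dictionary could be passed as palette= to seaborn.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba
import pandas as pd

# Hypothetical counts, standing in for df[var].value_counts()
counts = pd.Series({"Brooklyn": 109, "Bronx": 98, "Staten Island": 10})

# Replace spaces by newlines so long labels use two lines
counts.index = counts.index.str.replace(" ", "\n", regex=False)

# One label -> color mapping shared by both plots (colors chosen arbitrarily)
custom_colors = ["tab:blue", "tab:orange", "tab:green"]
color_of = dict(zip(counts.index, custom_colors))
colors = [color_of[label] for label in counts.index]

fig, ax = plt.subplots(1, 2, figsize=(10, 5))
ax[0].pie(counts.values, labels=list(counts.index), colors=colors, autopct="%.2f%%")
wedges = ax[0].patches  # the pie wedges that were just drawn

bars = ax[1].bar(list(counts.index), counts.values, color=colors)
ax[1].bar_label(bars)  # counts above the bars (matplotlib >= 3.4)

plt.tight_layout()  # rearrange spacing so labels and texts fit
```

Keeping the mapping in one dictionary means a label keeps its color even if the two plots are later sorted differently.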
I am trying to plot a facet_grid with stacked bar charts inside.
I would like to use Seaborn. Its barplot function does not include a stacked argument.
I tried to use FacetGrid.map with a custom callable function.
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
def custom_stacked_barplot(col_day, col_time, col_total_bill, **kwargs):
    dict_df={}
    dict_df['day']=col_day
    dict_df['time']=col_time
    dict_df['total_bill']=col_total_bill
    df_data_graph=pd.DataFrame(dict_df)
    df = pd.crosstab(index=df_data_graph['time'], columns=tips['day'], values=tips['total_bill'], aggfunc=sum)
    df.plot.bar(stacked=True)
tips=sns.load_dataset("tips")
g = sns.FacetGrid(tips, col='size', row='smoker')
g = g.map(custom_stacked_barplot, "day", 'time', 'total_bill')
However I get an empty canvas and stacked bar charts separately.
[Images omitted: the empty FacetGrid canvas, and the two stacked bar charts drawn as separate figures.]
How can I fix this issue? Thanks for the help!
The simplest code to achieve that result is this:
import seaborn as sns
import matplotlib.pyplot as plt
sns.set()
tips=sns.load_dataset("tips")
g = sns.FacetGrid(tips, col = 'size', row = 'smoker', hue = 'day')
g = (g.map(sns.barplot, 'time', 'total_bill', ci = None).add_legend())
plt.show()
which gives this result:
Mixing the two APIs does not work here: pandas.DataFrame.plot opens a new figure unless it is given an ax, so the bars never land on the seaborn.FacetGrid axes. Since stacked bar plots are not supported in seaborn plotting, consider developing your own version with matplotlib subplots by iterating across groupby levels:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
def custom_stacked_barplot(t, sub_df, ax):
    plot_df = pd.crosstab(index=sub_df["time"], columns=sub_df['day'],
                          values=sub_df['total_bill'], aggfunc=sum)
    p = plot_df.plot(kind="bar", stacked=True, ax = ax,
                     title = " | ".join([str(i) for i in t]))
    return p
tips = sns.load_dataset("tips")
g_dfs = tips.groupby(["smoker", "size"])
# INITIALIZE PLOT
# sns.set()
fig, axes = plt.subplots(nrows=2, ncols=int(len(g_dfs)/2)+1, figsize=(15,6))
# BUILD PLOTS ACROSS LEVELS
for ax, (i,g) in zip(axes.ravel(), sorted(g_dfs)):
    custom_stacked_barplot(i, g, ax)
plt.tight_layout()
plt.show()
plt.clf()
plt.close()
You can also uncomment sns.set() to adjust the theme and palette.
I have a list of strings (for example, actual string is 5000 characters) :
sequence='NGHHENIMHNYRBIFIFEMRHHCFFFJUUSVUUUUNXMTUSRHXOMEJNGKVUUUUVUUUVTUUVUWWSVULVUUUUUUUUUUUUWXQJUQRTXQRHM'
The sequence contains the letters 'A' to 'Y'.
I want to map a color to each letter and plot the sequence like the diagram below.
expected output (this is a example output) :
I have tried the following :
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import ListedColormap
import seaborn as sns
import pandas as pd
colors=sns.color_palette("coolwarm", 25)
string=[]
for char in sequence:
    string.append(char)
df=pd.DataFrame({'col':string}, index=range(len(string)))
letter2num = dict(zip(list("ABCDEFGHIJKLMNOPQRSTUVWXY"), np.arange(25)))
df2 = pd.DataFrame(np.array( [letter2num[i] for i in df.values.flat] ).reshape(df.shape))
cmap = ListedColormap(colors)
fig, ax = plt.subplots(figsize=(20,19))
ax.imshow(df2.values, vmin=0, vmax=len(cmap.colors), cmap=cmap)
However, this gives a thin, vertical straight line. Can somebody point me in the right direction?
my output :
I found a solution using pcolormesh
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
import seaborn as sns
import numpy as np
import pandas as pd
sns.set_style('white')
colors=sns.color_palette("coolwarm", 25)
# list(sequence) gives one row per character; a bare string would be repeated in every row
df=pd.DataFrame({'col':list(sequence)}, index=range(len(sequence)))
letter2num = dict(zip(list("ABCDEFGHIJKLMNOPQRSTUVWXY"), np.arange(25)))
df2 = pd.DataFrame(np.array( [letter2num[i] for i in df.values.flat] ).reshape(df.shape))
cmap = ListedColormap(colors)
fig, ax = plt.subplots(figsize=(3,10))
plt.pcolormesh(df2.values, vmin=0, vmax=len(cmap.colors), cmap=cmap)
plt.xticks([])
cbar=plt.colorbar( fraction=0.46,ticks=[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24],pad=0.3)
cbar.ax.set_yticklabels(['A', 'B', 'C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y'],size=15,verticalalignment='bottom',horizontalalignment='left' )
cbar.ax.tick_params(size=0, pad=5.4)
yticks=['0','50','100','150','200']
plt.yticks([0,1250,2500,3750,5000],yticks,size=15)
plt.ylabel('Time(ns)',size=20)
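For comparison, a hedged sketch of how the original imshow attempt could be kept: the thin line most likely comes from imshow's default equal aspect ratio applied to an array that is N rows by 1 column, and aspect="auto" lets the image fill the axes instead. The sequence here is a shortened stand-in, and the colormap is sampled from matplotlib's coolwarm rather than taken from seaborn.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import ListedColormap

# Shortened stand-in for the real 5000-character sequence
sequence = "NGHHENIMHNYRBIFIFEMRHHCFFFJUUSVUUUUNXMTUSRHXOM"

letters = "ABCDEFGHIJKLMNOPQRSTUVWXY"
letter2num = {letter: i for i, letter in enumerate(letters)}

# One row per character, a single column
codes = np.array([letter2num[ch] for ch in sequence]).reshape(-1, 1)

# 25 discrete colors sampled from coolwarm, one per letter
cmap = ListedColormap(plt.cm.coolwarm(np.linspace(0, 1, len(letters))))

fig, ax = plt.subplots(figsize=(3, 10))
# aspect="auto" stretches the 1-pixel-wide column across the axes
im = ax.imshow(codes, aspect="auto", cmap=cmap,
               vmin=-0.5, vmax=len(letters) - 0.5, interpolation="nearest")
ax.set_xticks([])

# Label the colorbar with the letters instead of numbers
cbar = fig.colorbar(im, ax=ax, ticks=np.arange(len(letters)))
cbar.ax.set_yticklabels(list(letters))
```

Setting vmin and vmax half a step outside the code range centers each letter's tick on its color band.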
I am trying to write a loop that will make a figure with 25 subplots, 1 for each country. My code makes a figure with 25 subplots, but the plots are empty. What can I change to make the data appear in the graphs?
fig = plt.figure()
for c,num in zip(countries, xrange(1,26)):
    df0=df[df['Country']==c]
    ax = fig.add_subplot(5,5,num)
    ax.plot(x=df0['Date'], y=df0[['y1','y2','y3','y4']], title=c)
fig.show()
You got confused between the matplotlib plotting function and the pandas plotting wrapper.
The problem you have is that ax.plot does not have any x or y argument.
Use ax.plot
In that case, call it like ax.plot(df0['Date'], df0[['y1','y2']]), without x, y and title. Possibly set the title separately.
Example:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
countries = np.random.choice(list("ABCDE"),size=25)
df = pd.DataFrame({"Date" : range(200),
'Country' : np.repeat(countries,8),
'y1' : np.random.rand(200),
'y2' : np.random.rand(200)})
fig = plt.figure()
for c,num in zip(countries, range(1,26)):  # use xrange on Python 2
    df0=df[df['Country']==c]
    ax = fig.add_subplot(5,5,num)
    ax.plot(df0['Date'], df0[['y1','y2']])
    ax.set_title(c)
plt.tight_layout()
plt.show()
Use the pandas plotting wrapper
In this case plot your data via df0.plot(x="Date",y =['y1','y2']).
Example:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
countries = np.random.choice(list("ABCDE"),size=25)
df = pd.DataFrame({"Date" : range(200),
'Country' : np.repeat(countries,8),
'y1' : np.random.rand(200),
'y2' : np.random.rand(200)})
fig = plt.figure()
for c,num in zip(countries, range(1,26)):  # use xrange on Python 2
    df0=df[df['Country']==c]
    ax = fig.add_subplot(5,5,num)
    df0.plot(x="Date",y =['y1','y2'], title=c, ax=ax, legend=False)
plt.tight_layout()
plt.show()
I don't remember that well how to use original subplot system but you seem to be rewriting the plot. In any case you should take a look at gridspec. Check the following example:
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
fig = plt.figure()
gs1 = gridspec.GridSpec(5, 5)
countries = ["Country " + str(i) for i in range(1, 26)]
axs = []
for c, num in zip(countries, range(1,26)):
    axs.append(fig.add_subplot(gs1[num - 1]))
    axs[-1].plot([1, 2, 3], [1, 2, 3])
plt.show()
Which results in this:
Just replace the example with your data and it should work fine.
NOTE: I've noticed you are using xrange. I've used range because my version of Python is 3.x. Adapt to your version.
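The same 5x5 grid can also be sketched with plt.subplots and a groupby, which avoids tracking the subplot number by hand. The data below is random and the country names are placeholders, so treat it as an illustration rather than a drop-in replacement.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder data: 25 countries with 8 dates each
countries = [f"Country {i}" for i in range(1, 26)]
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Country": np.repeat(countries, 8),
    "Date": np.tile(np.arange(8), 25),
    "y1": rng.random(200),
    "y2": rng.random(200),
})

# Create all 25 axes at once, then pair each with one country's rows
fig, axes = plt.subplots(5, 5, figsize=(12, 10), sharex=True, sharey=True)
for ax, (country, group) in zip(axes.flat, df.groupby("Country", sort=False)):
    ax.plot(group["Date"], group["y1"], label="y1")
    ax.plot(group["Date"], group["y2"], label="y2")
    ax.set_title(country, fontsize=8)

fig.tight_layout()
```

sort=False keeps the countries in their order of appearance, so the panels follow the order of the data.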
I am trying to plot subplots in matplotlib with pandas data, but I am facing an issue: the subplots do not show the dates of the stock data. Here is my program:
import pandas as pd
import datetime
import matplotlib.pyplot as plt
import pandas.io.data
df = pd.io.data.get_data_yahoo('goog', start=datetime.datetime(2008,1,1),end=datetime.datetime(2014,10,23))
fig = plt.figure()
r = fig.patch
r.set_facecolor('#0070BB')
ax1 = fig.add_subplot(2,1,1, axisbg='#0070BB')
ax1.grid(True)
ax1.plot(df['Close'])
ax2 = fig.add_subplot(2,1,2, axisbg='#0070BB')
ax2.plot(df['Volume'])
plt.show()
Please run this program yourself and help me solve the date issue.
When you're calling matplotlib's plot(), you are only giving it one array (e.g. df['Close'] in the first case). When there's only one array, matplotlib doesn't know what to use for the x axis data, so it just uses the index of the array. This is why your x axis shows the numbers 0 to 160: there are presumably 160 items in your array.
Use ax1.plot(df.index, df['Close']) instead, since df.index should hold the date values in your pandas dataframe.
import pandas as pd
import datetime
import matplotlib.pyplot as plt
# pandas.io.data was moved out of pandas into the separate pandas-datareader package
from pandas_datareader import data as web
df = web.get_data_yahoo('goog', start=datetime.datetime(2008,1,1),end=datetime.datetime(2014,10,23))
fig = plt.figure()
r = fig.patch
r.set_facecolor('#0070BB')
# axisbg was removed from matplotlib; facecolor is the current keyword
ax1 = fig.add_subplot(2,1,1, facecolor='#0070BB')
ax1.grid(True)
ax1.plot(df.index, df['Close'])
ax2 = fig.add_subplot(2,1,2, facecolor='#0070BB')
ax2.plot(df.index, df['Volume'])
plt.show()
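Since the download depends on an external service, here is a self-contained sketch of the same fix with synthetic data. The prices and volumes are random, and the DatetimeIndex stands in for the index the stock download would return.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

# Synthetic stand-in for the stock data: 160 business days
dates = pd.date_range("2014-01-01", periods=160, freq="B")
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Close": 500 + rng.normal(0, 1, 160).cumsum(),
    "Volume": rng.integers(1_000, 5_000, 160),
}, index=dates)

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)

# Passing df.index explicitly puts the dates on the x axis
line_close, = ax1.plot(df.index, df["Close"])
ax1.grid(True)
ax2.plot(df.index, df["Volume"])

fig.autofmt_xdate()  # rotate the date labels so they do not overlap
```

sharex=True keeps both panels on the same date range, and autofmt_xdate only affects the label layout.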