I am trying to overlay a series of box plots (grouped by another variable) with a line plot of that variable's medians, on the same axes. Simple code like the following works perfectly fine.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
dfx = pd.DataFrame({'S': np.random.randint(10, 100, 9) * 10,
                    'C': ['X', 'X', 'X', 'Y', 'Y', 'Y', 'Z', 'Z', 'Z']})
fig,ax=plt.subplots()
mx=dfx.groupby('C')['S'].median()
sns.boxplot(y='S',x='C',data=dfx,ax=ax)
sns.lineplot(y=mx.values,x=mx.index,ax=ax)
plt.show()
which gives
However, when I use the same code for data that I read from a csv file, I just cannot get the line plot to appear with the box plot.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df=pd.read_csv('test.csv')
fig,ax=plt.subplots()
m=df.groupby('Start Date')['Score'].median()
sns.boxplot(y='Score',x='Start Date',data=df,ax=ax)
sns.lineplot(y=m.values,x=m.index,ax=ax)
plt.show()
gives this
It doesn't matter whether the lineplot command is before or after boxplot, only box plot is shown. I see the line only if boxplot line is commented out.
I do not understand what is different about the data I am reading from the csv file such that I cannot overlay the line and the box plots.
P.S.: I know a simple workaround is to replace the seaborn lineplot line with a matplotlib plot command
ax.plot(m.values,'r-o',linewidth=4)
and it gives the desired result:
I am just curious why seaborn lineplot is behaving the way it is.
I was facing a similar problem and "solved it" by converting my datetime column to strings.
df_median.date = df_median.date.astype(str)
df_aux.date = df_aux.date.astype(str)
sns.set()
ax = sns.stripplot(x='date',
y='value',
data=df_aux)
ax = sns.lineplot(x='date',
y='value',
data=df_median,
ax=ax)
plt.xlabel("month")
plt.ylabel("values")
labels = ax.axes.get_xticklabels()
ax.axes.set_xticklabels(labels, rotation=45)
plt.show()
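A variant of the matplotlib workaround from the question, offered as a sketch rather than an explanation of seaborn's internals: seaborn's categorical plots place the boxes at integer positions 0, 1, 2, ..., so drawing the medians at those same positions keeps both layers aligned however the date column is parsed. The data frame below is synthetic (the contents of test.csv are not shown), and the explicit `order` list is an assumption added for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for test.csv; the rows are deliberately not sorted by date.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Start Date": np.repeat(["2019-03-01", "2019-01-01", "2019-02-01"], 5),
    "Score": rng.integers(10, 100, 15),
})

# Fix one category order and use it for both layers.
order = sorted(df["Start Date"].unique())
medians = df.groupby("Start Date")["Score"].median().reindex(order)

fig, ax = plt.subplots()
sns.boxplot(x="Start Date", y="Score", data=df, order=order, ax=ax)

# The boxes sit at x = 0, 1, 2, ..., so draw the medians at the same positions.
positions = np.arange(len(order))
(median_line,) = ax.plot(positions, medians.to_numpy(), "r-o",
                         linewidth=2, zorder=10)
```

Call `plt.show()` at the end when running interactively. Reindexing the medians by `order` matters when the rows are not sorted by date, because `groupby` sorts its keys while the box plot otherwise follows the order of appearance.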
I have a csv file named "gapminder_with_codes.csv". I selected the data for the year 2007 with data_2007 = data[data['year']==2007], but when I try to draw the violin plot, I get a graph in which I can't see the violins; they look like dots or small lines.
What could be the problem?
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
data = pd.read_csv("gapminder_with_codes.csv")
data_2007 = data[data['year'] == 2007]
sns.set_style('darkgrid')
sns.set(rc={'figure.figsize':(12,10)})
sns.violinplot(x='gdpPercap', y='lifeExp', data=data_2007, scale="count", cut=2)
This is the code I used to draw a violin plot of data_2007 with my x axis having: gdpPercap, y-axis: lifeExp and hue: pop
The output is as shown in the image attached, while I expected to see normal violin plots as my output.
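One likely cause, assuming gdpPercap holds a different value for nearly every country: violinplot treats x as a grouping variable, so each distinct value becomes its own group with a single observation, which leaves nothing to estimate a density from. A minimal sketch of binning the continuous column first is shown below. The data is synthetic, and the `gdp_bin` column name and the four bin labels are made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the 2007 slice of the gapminder data.
rng = np.random.default_rng(1)
data_2007 = pd.DataFrame({
    "gdpPercap": rng.uniform(300, 50000, 140),
    "lifeExp": rng.uniform(40, 82, 140),
})

# Used directly as x, every distinct value would become its own violin.
n_groups_raw = data_2007["gdpPercap"].nunique()

# Bin the continuous column into four quantile-based groups (hypothetical labels).
data_2007["gdp_bin"] = pd.qcut(
    data_2007["gdpPercap"], q=4,
    labels=["low", "lower-mid", "upper-mid", "high"],
)
n_groups_binned = data_2007["gdp_bin"].nunique()

print(n_groups_raw, n_groups_binned)
```

Passing x='gdp_bin' and y='lifeExp' to sns.violinplot should then give four violins, each built from many observations.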
I want to add an artificial legend to my plot. It is artificial because I didn't group my observations (see the code below). This means I can't solve the problem with the plt.legend() function, since it requires grouped variables. Is there any way to handle this?
My code:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_theme(style="white")
ax = sns.boxplot(data = data.values.tolist(),palette=['white', 'black'])
ax.set_xticklabels(labels, fontsize=14)
ax.tick_params(labelsize=14)
and plot looks like:
I would like to add a legend (or perhaps just a drawing, not a real legend) that shows something like this (sorry for the size):
You can create a legend from the artists created by Seaborn as follows:
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
sns.set_theme(style="white")
ax = sns.boxplot(data = np.random.randn(20,20), palette=['white', 'black'])
handles = ax.artists[:2]
handles[0].set_label("First")
handles[1].set_label("Second")
ax.legend(handles=handles)
plt.show()
As I do not have your data, I cannot replicate your charts. However, you might try adding the following line at the end (after importing matplotlib.pyplot as plt).
plt.legend(['First','Second'])
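Another option that does not depend on which artists the plot created is to build the legend from stand-alone patches (matplotlib calls these proxy artists). The sketch below is only an illustration: `add_box_legend` is a hypothetical helper name, and a plain matplotlib box plot stands in for the seaborn one so the snippet runs on its own. The helper only needs an Axes, so it should work equally on the Axes returned by sns.boxplot.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.patches import Patch


def add_box_legend(ax, labels, facecolors, edgecolor="gray"):
    """Attach a legend made of one stand-alone patch per label (hypothetical helper)."""
    handles = [
        Patch(facecolor=color, edgecolor=edgecolor, label=label)
        for label, color in zip(labels, facecolors)
    ]
    return ax.legend(handles=handles)


fig, ax = plt.subplots()
# Stand-in for the seaborn box plot: four boxes from random data.
ax.boxplot(np.random.randn(20, 4), patch_artist=True)

legend = add_box_legend(ax, ["First", "Second"], ["white", "black"])
legend_labels = [text.get_text() for text in legend.get_texts()]
```

Because the patches are created only for the legend, their colors must be chosen to match the palette passed to the plot.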
I'm trying to overlay a lineplot on top of a countplot in seaborn. They both work when drawn separately:
But when put together, they end up at opposite ends of the chart:
Does anybody know why this is?
You need to use the twinx() from matplotlib and your first graph needs to be just matplotlib, not seaborn. I'm not sure why seaborn has a problem with combo charts, but I got the exact same problem as you did. Here's my code with population data from kaggle:
# Create a combo chart: bars for annual growth, a line for percent growth
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the data
df = pd.read_csv('df.csv')
fig, ax1 = plt.subplots(figsize=(10,6))
# Bar plot on the primary axes
ax1.bar(df['Year'], df['Population Growth'], color='y')
# Second y-axis that shares the same x-axis
ax2 = ax1.twinx()
# Line plot, drawn explicitly on the twin axes
sns.lineplot(x='Year', y='Percent Growth', data=df, color='#C33E3E', ax=ax2)
plt.show()
With this code I get the following graph:
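A probable reason the two layers landed at opposite ends, stated as an assumption since the original code is not shown: countplot draws its bars at categorical positions 0, 1, 2, ..., while lineplot uses the numeric x values (for example the years) as coordinates. The sketch below keeps both layers on the same integer positions and only relabels the ticks. It uses plain matplotlib and made-up numbers, with column names borrowed from the code above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values for illustration only.
df = pd.DataFrame({
    "Year": [2015, 2016, 2017, 2018],
    "Population Growth": [1.2, 1.5, 1.1, 1.8],
    "Percent Growth": [0.8, 1.0, 0.7, 1.1],
})

positions = list(range(len(df)))  # shared x positions for both layers

fig, ax1 = plt.subplots(figsize=(10, 6))
bars = ax1.bar(positions, df["Population Growth"], color="y")
ax1.set_xticks(positions)
ax1.set_xticklabels(df["Year"])  # show the years instead of 0, 1, 2, ...

ax2 = ax1.twinx()  # second y-axis, same x-axis
(line,) = ax2.plot(positions, df["Percent Growth"], color="#C33E3E", marker="o")

bar_centers = [bar.get_x() + bar.get_width() / 2 for bar in bars]
```

With both layers on the same positions, the line points fall on the bar centers.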
Seaborn enables you to create a categorical plot using points
import seaborn as sns
tips = sns.load_dataset('tips')
sns.catplot(x='tip', y='sex', data=tips, jitter=False)
Is there a way to connect the points with a line for the same gender?
My goal is to create a plot similar to the figure below (made with R's ggplot2). Reading the seaborn documentation, I find nothing that resembles this plot. The lineplot only takes numeric values. Is there an obvious way to make this categorical plot that I'm missing?
Group by the category and plot each line individually.
import numpy as np
import matplotlib.pyplot as plt
def cat_horizontal_plot(data, category, numeric, ax=None):
    ax = ax or plt.gca()
    for cat, num in data.groupby(category):
        ax.plot(np.sort(num[numeric].values), [cat]*len(num),
                marker="o", mec="k", mfc="none", linestyle="-", color="k")
    ax.set_xlabel(numeric)
    ax.set_ylabel(category)
    ax.margins(y=0.4)
    ax.figure.tight_layout()
Use it as
import seaborn as sns
tips = sns.load_dataset('tips')
cat_horizontal_plot(tips, "sex", "tip")
plt.show()
I am trying to plot group-wise median values using seaborn's pointplot on top of a swarmplot. Even though I call pointplot second, the point plot ends up behind the swarmplot. How can I change the layer order so that the point plot is in front of the swarmplot?
datDf=pd.DataFrame({'values':np.random.randint(0,100,100)})
datDf['group']=np.random.randint(0,5,100)
sns.swarmplot(data=datDf,x='group',y='values')
sns.pointplot(data=datDf,x='group',y='values',estimator=np.median,join=False)
Use the zorder property to set the drawing order.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
datDf=pd.DataFrame({'values':np.random.randint(0,100,100)})
datDf['group']=np.random.randint(0,5,100)
sns.swarmplot(data=datDf,x='group',y='values',zorder=1)
sns.pointplot(data=datDf,x='group',y='values',estimator=np.median,join=False, zorder=100)
plt.show()