Matplotlib - How can I add labels to legend - python

I am trying to separate the data by whether the passenger is male, plotting Age on the x-axis and Fare on the y-axis. I want the legend to show two labels, male and female, with their respective colors. Can anyone help me do this?
Code:
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('https://sololearn.com/uploads/files/titanic.csv')
df['male']=df['Sex']=='male'
sc1= plt.scatter(df['Age'],df['Fare'],c=df['male'])
plt.legend()
plt.show()

You could use the seaborn library which builds on top of matplotlib to perform the exact task you require. You can scatterplot 'Age' vs 'Fare' and colour code it by 'Sex' by just passing the hue parameter in sns.scatterplot, as follows:
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure()
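# df is the DataFrame loaded in the question.
# No need to call plt.legend; seaborn generates the labels and legend
# automatically. Recent seaborn versions require x and y as keyword arguments.
sns.scatterplot(data=df, x='Age', y='Fare', hue='Sex')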
plt.show()
Seaborn generates nicer plots with less code and more functionality.
You can install seaborn from PyPI using pip install seaborn.
Refer: Seaborn docs

The PathCollection.legend_elements method can be used to control how many legend entries are created and how they are labeled.
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('https://sololearn.com/uploads/files/titanic.csv')
df['male'] = df['Sex']=='male'
sc1= plt.scatter(df['Age'], df['Fare'], c=df['male'])
# The handles come back in ascending order of c: False (female) first, then True (male).
plt.legend(handles=sc1.legend_elements()[0], labels=['female', 'male'])
plt.show()
Legend guide and Scatter plots with a legend for reference.
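Because legend_elements returns one handle per unique value of c in ascending order, hard-coded labels are easy to get in the wrong order. A minimal sketch of one way to keep handles and labels aligned, assuming a small made-up DataFrame in place of the Titanic file and using pd.factorize to build the code-to-name mapping (both are illustrative choices, not part of the original answer):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the Titanic columns used above.
df = pd.DataFrame({
    "Age": [22, 38, 26, 35, 54, 2],
    "Fare": [7.25, 71.28, 7.92, 53.10, 51.86, 21.07],
    "Sex": ["male", "female", "female", "female", "male", "male"],
})

# Encode the category as integer codes; `names` holds the sorted category names,
# so code 0 is names[0], code 1 is names[1], and so on.
codes, names = pd.factorize(df["Sex"], sort=True)

fig, ax = plt.subplots()
sc = ax.scatter(df["Age"], df["Fare"], c=codes)

# One handle per unique code, in ascending order, which matches `names`.
handles, _ = sc.legend_elements()
legend = ax.legend(handles, list(names), title="Sex")
legend_labels = [t.get_text() for t in legend.get_texts()]
```

The same pattern should extend to more than two categories, since the labels are derived from the data rather than typed by hand.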

This can be achieved by splitting the data into two separate DataFrames and passing a label to each scatter call.
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('https://sololearn.com/uploads/files/titanic.csv')
subset1 = df[(df['Sex'] == 'male')]
subset2 = df[(df['Sex'] != 'male')]
plt.scatter(subset1['Age'], subset1['Fare'], label = 'Male')
plt.scatter(subset2['Age'], subset2['Fare'], label = 'Female')
plt.legend()
plt.show()
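The subset approach can also be written as a loop over groupby, which avoids one hand-written subset per category. A short sketch, again assuming a small made-up DataFrame rather than the real Titanic file:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the Titanic columns used above.
df = pd.DataFrame({
    "Age": [22, 38, 26, 35, 54, 2],
    "Fare": [7.25, 71.28, 7.92, 53.10, 51.86, 21.07],
    "Sex": ["male", "female", "female", "female", "male", "male"],
})

fig, ax = plt.subplots()
# groupby yields (key, sub-DataFrame) pairs in sorted key order;
# each scatter call gets its own color and legend label.
for sex, group in df.groupby("Sex"):
    ax.scatter(group["Age"], group["Fare"], label=sex.capitalize())
ax.legend()

legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
```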

Related

Seaborn pairplot with log scale only for specific columns

I have a dataframe and I'm using seaborn pairplot to plot one target column vs the rest of the columns.
Code is below,
import seaborn as sns
import matplotlib.pyplot as plt
tgt_var = 'AB'
var_lst = ['A','GH','DL','GT','MS']
pp = sns.pairplot(data=df,
                  y_vars=[tgt_var],
                  x_vars=var_lst)
pp.fig.set_figheight(6)
pp.fig.set_figwidth(20)
The var_lst is not a static list, I just provided an example.
What I need is to plot tgt_var on Y axis and each var_lst on x axis.
I'm able to do this with above code, but I also want to use log scale on X axis only if the var_lst item is 'GH' or 'MS', for the rest normal scale. Is there any way to achieve this?
Iterate pp.axes.flat and set xscale="log" if the xlabel matches "GH" or "MS":
log_columns = ["GH", "MS"]
for ax in pp.axes.flat:
    if ax.get_xlabel() in log_columns:
        ax.set(xscale="log")
Full example with the iris dataset where the petal columns are xscale="log":
import seaborn as sns
df = sns.load_dataset("iris")
pp = sns.pairplot(df)
log_columns = ["petal_length", "petal_width"]
for ax in pp.axes.flat:
    if ax.get_xlabel() in log_columns:
        ax.set(xscale="log")
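For the layout in the question (one target column on the y-axis against several x columns), the same loop can be sketched as follows. The DataFrame here is randomly generated purely for illustration, with the column names taken from the question; the log-scaled columns are drawn from a lognormal distribution so that they are strictly positive.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up data; only the column names come from the question.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "AB": rng.normal(size=50),
    "A": rng.normal(size=50),
    "GH": rng.lognormal(size=50),  # strictly positive, so a log axis is valid
    "MS": rng.lognormal(size=50),
})

tgt_var = "AB"
var_lst = ["A", "GH", "MS"]
log_columns = {"GH", "MS"}

pp = sns.pairplot(data=df, y_vars=[tgt_var], x_vars=var_lst)

# With a single row of axes, every subplot carries its own x label.
for ax in pp.axes.flat:
    if ax.get_xlabel() in log_columns:
        ax.set_xscale("log")

scales = {ax.get_xlabel(): ax.get_xscale() for ax in pp.axes.flat}
```

Since var_lst is not static, keeping log_columns as a set means columns that are absent from a given run are simply never matched.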

Can you plot interquartile range as the error band on a seaborn lineplot?

I'm plotting time series data using seaborn lineplot (https://seaborn.pydata.org/generated/seaborn.lineplot.html), and plotting the median instead of mean. Example code:
import numpy as np
import seaborn as sns; sns.set()
import matplotlib.pyplot as plt
fmri = sns.load_dataset("fmri")
ax = sns.lineplot(x="timepoint", y="signal", estimator=np.median, data=fmri)
I want the error bands to show the interquartile range as opposed to the confidence interval. I know I can use ci = "sd" for standard deviation, but is there a simple way to add the IQR instead? I cannot figure it out.
Thank you!
I don't know if this can be done with seaborn alone, but here's one way to do it with matplotlib, keeping the seaborn style. The describe() method conveniently provides summary statistics for a DataFrame, among them the quartiles, which we can use to plot the medians with inter-quartile-ranges.
import seaborn as sns; sns.set()
import matplotlib.pyplot as plt
fmri = sns.load_dataset("fmri")
fmri_stats = fmri.groupby(['timepoint']).describe()
x = fmri_stats.index
medians = fmri_stats[('signal', '50%')]
medians.name = 'signal'
quartiles1 = fmri_stats[('signal', '25%')]
quartiles3 = fmri_stats[('signal', '75%')]
# Recent seaborn versions require x and y as keyword arguments.
ax = sns.lineplot(x=x, y=medians)
ax.fill_between(x, quartiles1, quartiles3, alpha=0.3);
You can calculate the median within lineplot as you have done, set ci to None, and fill in the band using ax.fill_between():
import numpy as np
import seaborn as sns; sns.set()
import matplotlib.pyplot as plt
fmri = sns.load_dataset("fmri")
ax = sns.lineplot(x="timepoint", y="signal", estimator=np.median,
                  data=fmri, ci=None)
bounds = fmri.groupby('timepoint')['signal'].quantile((0.25,0.75)).unstack()
ax.fill_between(x=bounds.index,y1=bounds.iloc[:,0],y2=bounds.iloc[:,1],alpha=0.1)
This is possible directly in seaborn since version 0.12, through the errorbar parameter of lineplot (see the seaborn documentation).
pip install --upgrade seaborn
The estimator parameter sets the point estimate and takes the name of a pandas method or a callable, such as 'median' or 'mean'.
The errorbar parameter sets the spread that is drawn and takes a string, a (string, number) tuple, or a callable. To mark the median and fill the area between the quartiles, you would pass:
import numpy as np
import seaborn as sns; sns.set()
import matplotlib.pyplot as plt
fmri = sns.load_dataset("fmri")
# The callable returns the (lower, upper) bounds of the band for each group.
ax = sns.lineplot(data=fmri, x="timepoint", y="signal", estimator=np.median,
                  errorbar=lambda x: (np.quantile(x, 0.25), np.quantile(x, 0.75)))
You can now!
estimator="median", errorbar=("pi", 50)
https://seaborn.pydata.org/tutorial/error_bars
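A percentile interval of width 50 spans the 25th to the 75th percentile, which is the interquartile range. The sketch below checks that numerically with pandas and NumPy on randomly generated data (the column names mirror the fmri example, but the values are made up); it does not call seaborn itself.

```python
import numpy as np
import pandas as pd

# Made-up time series: 5 timepoints with 40 observations each.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "timepoint": np.repeat(np.arange(5), 40),
    "signal": rng.normal(size=200),
})

grouped = df.groupby("timepoint")["signal"]
summary = pd.DataFrame({
    "median": grouped.median(),
    "q1": grouped.quantile(0.25),
    "q3": grouped.quantile(0.75),
})

# A central percentile interval of a given width runs from
# (50 - width/2) to (50 + width/2) percent.
width = 50
lower_pct, upper_pct = 50 - width / 2, 50 + width / 2
first_group = df.loc[df["timepoint"] == 0, "signal"]
pi_low, pi_high = np.percentile(first_group, [lower_pct, upper_pct])
```

The summary frame can also be passed to ax.fill_between (index as x, q1 and q3 as the bounds) if the band needs to be drawn by hand.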

Seaborn Scatterplot X Values Missing

I have a scatter plot I'm working with, and for some reason I'm not seeing all the x values on my graph.
#%%
from pandas import DataFrame, read_csv
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
file = r"re2.csv"
df = pd.read_csv(file)
#sns.set(rc={'figure.figsize':(11.7,8.27)})
g = sns.FacetGrid(df, col='city')
g.map(plt.scatter, 'type', 'price').add_legend()
This is an image of a small subset of my plots, you can see that Res is displaying, the middle bar should be displaying Con and the last would be Mlt. These are all defined in the type column from my data set but are not displaying.
Any clue how to fix?
Python is doing what you tell it to do. Pick different features, presumably ones that make more sense for plotting, if you want to generate more interesting plots. See the generic example below.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme(style="darkgrid")
tips = sns.load_dataset("tips")
sns.relplot(x="total_bill", y="tip", hue="smoker", data=tips);
Personally, I like plotly plots, which are dynamic, more than I like seaborn plots.
https://plotly.com/python/line-and-scatter/

Modifying Seaborn Violinplot to show means not median

Currently displaying some data with Seaborn / Pandas. I'm looking to overlay the mean of each category (x=ks2) - but can't figure out how to do this with Seaborn.
I can remove the inner="box" - but want to replace that with a marker for the mean of each category.
Ideally, then link each mean calculated...
Any pointers greatly received.
Cheers
Science.csv has 9k+ entries
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
sns.set(style="whitegrid", palette="pastel", color_codes=True)
# Load the dataset
# df = pd.read_csv("science.csv") << loaded from csv
df = pd.DataFrame({'ks2': [1, 1, 2, 3, 3, 4],
                   'science': [40, 50, 34, 20, 0, 44]})
# Draw a nested violinplot and split the violins for easier comparison
sns.violinplot(x="ks2", y="science", data=df, split=True,
               inner="box", linewidth=2)
sns.despine(left=True)
plt.savefig('plot.png')
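Try importing mean from NumPy:
from numpy import mean
Then overlay sns.pointplot with estimator=mean on the same axes:
sns.pointplot(x='ks2', y='science', data=df, estimator=mean)
Then experiment with the linestyles argument to control the line linking the means.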
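As an alternative to pointplot, the group means can be computed with pandas and drawn on the violin axes with plain matplotlib. This is only a sketch under two assumptions: the data are made up (three observations per ks2 level), and seaborn places the categories at integer positions 0, 1, 2, ... in sorted order, which is its default for a categorical axis.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up scores; the real science.csv is not available here.
df = pd.DataFrame({
    "ks2": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "science": [40, 50, 45, 30, 34, 38, 20, 0, 10],
})

fig, ax = plt.subplots()
# inner=None drops the box so the mean markers are the only summary shown.
sns.violinplot(x="ks2", y="science", data=df, inner=None, ax=ax)

# Mean of each category, in sorted category order.
means = df.groupby("ks2")["science"].mean()

# Categories sit at positions 0..n-1, so plot the means against those positions;
# the connecting line links the means across categories.
(mean_line,) = ax.plot(range(len(means)), means.to_numpy(),
                       marker="o", color="k", linestyle="--", label="mean")
ax.legend()
```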

How to add a label to Seaborn Heatmap color bar?

If I have the following data and Seaborn Heatmap:
import pandas as pd
import seaborn as sns
data = pd.DataFrame({'x':(1,2,3,4),'y':(1,2,3,4),'z':(14,15,23,2)})
sns.heatmap(data.pivot_table(index='y', columns='x', values='z'))
How do I add a label to the colour bar?
You could set it afterwards by retrieving the colorbar from the axes, or simply pass a label in cbar_kws like so:
import seaborn as sns
import pandas as pd
data = pd.DataFrame({'x':(1,2,3,4),'y':(1,2,3,4),'z':(14,15,23,2)})
sns.heatmap(data.pivot_table(index='y', columns='x', values='z'),
            cbar_kws={'label': 'colorbar title'})
It is worth noting that cbar_kws can be handy for setting other attributes on the colorbar such as tick frequency or formatting.
You can use:
ax = sns.heatmap(data.pivot_table(index='y', columns='x', values='z'))
ax.collections[0].colorbar.set_label("Hello")
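The two approaches can be combined: cbar_kws is forwarded to matplotlib's colorbar call, so it accepts other colorbar options such as ticks, and the resulting colorbar can still be fetched from the axes afterwards. A small sketch using the data from the question; the label text and tick values are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

data = pd.DataFrame({'x': (1, 2, 3, 4), 'y': (1, 2, 3, 4), 'z': (14, 15, 23, 2)})

ax = sns.heatmap(
    data.pivot_table(index='y', columns='x', values='z'),
    # 'label' and 'ticks' are passed through to the matplotlib colorbar.
    cbar_kws={'label': 'z value', 'ticks': [5, 10, 15, 20]},
)

# The colorbar is attached to the first collection drawn on the heatmap axes.
cbar = ax.collections[0].colorbar
cbar_label = cbar.ax.get_ylabel()  # a vertical colorbar stores its label on the y-axis
```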
