How can I use a groupby result on the y-axis? The code below doesn't display what I expect, because of y = df.groupby('column1')['column2'].count().
import seaborn as sns
import pandas as pd
sns.set(style="whitegrid", color_codes=True)
sns.stripplot(x="column1", y = df.groupby('column1')['column2'].count(), data=df)
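Seaborn doesn't work that way. You pass the names of the x and y columns together with the data frame, and seaborn does the grouping (and, for functions such as countplot, the aggregation) itself. A groupby result has one entry per group rather than one per row, so it does not line up with data=df.
import seaborn as sns
sns.stripplot(x='column1', y='column2', data=df)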
For the count, what you probably need is countplot:
sns.countplot(x='column1', data=df)
The equivalent pandas code is:
df.groupby('column1').size().plot(kind='bar')
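If you do want to aggregate yourself and hand the result to seaborn, one option is to turn the groupby result back into a regular data frame first. The following is a minimal sketch with a made-up frame: the column names match the question, while the values and the name n are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Made-up data standing in for the question's df
df = pd.DataFrame({
    "column1": ["a", "a", "b", "c", "c", "c"],
    "column2": [1, 2, 3, 4, 5, 6],
})

# One row per group: count the column2 entries in each column1 group
counts = df.groupby("column1")["column2"].count().reset_index(name="n")

# Plot the aggregated frame; x and y are now column names in counts
ax = sns.barplot(data=counts, x="column1", y="n")
```

Because counts has one row per group, x and y refer to columns of the same frame, which is what seaborn expects.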
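This code creates the horizontal-bar equivalent of a count plot, with the counts sorted in descending order:
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(10, 16))
# Count the rows per age, sort the counts, and draw them as horizontal bars
counts = df.groupby('Age').size().sort_values(ascending=False)
counts.plot(kind='barh', ax=ax)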
I have a dataframe and I'm using seaborn pairplot to plot one target column vs the rest of the columns.
Code is below,
import seaborn as sns
import matplotlib.pyplot as plt
tgt_var = 'AB'
var_lst = ['A','GH','DL','GT','MS']
pp = sns.pairplot(data=df,
                  y_vars=[tgt_var],
                  x_vars=var_lst)
pp.fig.set_figheight(6)
pp.fig.set_figwidth(20)
var_lst is not a static list; the one above is just an example.
What I need is to plot tgt_var on the y-axis and each item of var_lst on the x-axis.
I'm able to do this with the code above, but I also want a log scale on the x-axis only when the var_lst item is 'GH' or 'MS', and a normal scale for the rest. Is there any way to achieve this?
Iterate pp.axes.flat and set xscale="log" if the xlabel matches "GH" or "MS":
log_columns = ["GH", "MS"]
for ax in pp.axes.flat:
    if ax.get_xlabel() in log_columns:
        ax.set(xscale="log")
Full example with the iris dataset where the petal columns are xscale="log":
import seaborn as sns
df = sns.load_dataset("iris")
pp = sns.pairplot(df)
log_columns = ["petal_length", "petal_width"]
for ax in pp.axes.flat:
    if ax.get_xlabel() in log_columns:
        ax.set(xscale="log")
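For the single-row layout from the question, the same loop can be checked without downloading a dataset. This is only a sketch: the column names come from the question, but the random values are made up, and the scales dictionary is an illustrative helper for inspecting the result.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up positive values (a log scale needs values > 0)
rng = np.random.default_rng(0)
cols = ["AB", "A", "GH", "DL", "GT", "MS"]
df = pd.DataFrame(rng.uniform(1, 100, size=(50, len(cols))), columns=cols)

tgt_var = "AB"
var_lst = ["A", "GH", "DL", "GT", "MS"]
log_columns = {"GH", "MS"}

pp = sns.pairplot(data=df, y_vars=[tgt_var], x_vars=var_lst)

# Switch only the matching columns to a log x-axis
for ax in pp.axes.flat:
    if ax.get_xlabel() in log_columns:
        ax.set_xscale("log")

# Collect the resulting scale of every panel, keyed by its x label
scales = {ax.get_xlabel(): ax.get_xscale() for ax in pp.axes.flat}
```

Since var_lst is not static, keeping log_columns as a separate set means the loop does not need to change when the list of plotted columns does.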
I am trying to separate the data by whether the passenger is male or not, plotting Age on the x-axis and Fare on the y-axis. I want the legend to show two labels, male and female, each with its own color. Can anyone help me do this?
Code:
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('https://sololearn.com/uploads/files/titanic.csv')
df['male']=df['Sex']=='male'
sc1= plt.scatter(df['Age'],df['Fare'],c=df['male'])
plt.legend()
plt.show()
You could use the seaborn library which builds on top of matplotlib to perform the exact task you require. You can scatterplot 'Age' vs 'Fare' and colour code it by 'Sex' by just passing the hue parameter in sns.scatterplot, as follows:
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure()
# No need to call plt.legend, seaborn will generate the labels and legend
# automatically. df is the data frame loaded in the question.
sns.scatterplot(data=df, x='Age', y='Fare', hue='Sex')
plt.show()
Seaborn generates nicer plots with less code and more functionality.
You can install seaborn from PyPI using pip install seaborn.
Refer: Seaborn docs
The PathCollection.legend_elements method can be used to steer how many legend entries are to be created and how they should be labeled.
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('https://sololearn.com/uploads/files/titanic.csv')
df['male'] = df['Sex']=='male'
sc1= plt.scatter(df['Age'], df['Fare'], c=df['male'])
# Handles come back in sorted order of the plotted values: False (female) first, then True (male)
plt.legend(handles=sc1.legend_elements()[0], labels=['female', 'male'])
plt.show()
Legend guide and Scatter plots with a legend for reference.
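One way to keep the labels and the handles in step is to derive both from the same sorted category list. The sketch below assumes a tiny made-up frame with the same column names as the Titanic data; pd.factorize with sort=True is my choice here, not something the answer above uses.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows standing in for the Titanic data
df = pd.DataFrame({
    "Age": [22, 38, 26, 35],
    "Fare": [7.25, 71.28, 7.92, 53.10],
    "Sex": ["male", "female", "female", "male"],
})

# Integer code per row, plus the sorted category names the codes refer to
codes, categories = pd.factorize(df["Sex"], sort=True)

fig, ax = plt.subplots()
sc = ax.scatter(df["Age"], df["Fare"], c=codes)

# legend_elements returns one handle per distinct code, in ascending order,
# which is the same order as the sorted categories
handles, _ = sc.legend_elements()
legend = ax.legend(handles=handles, labels=list(categories))
```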
This can be achieved by splitting the data into two separate data frames and giving each scatter call its own label.
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('https://sololearn.com/uploads/files/titanic.csv')
subset1 = df[(df['Sex'] == 'male')]
subset2 = df[(df['Sex'] != 'male')]
plt.scatter(subset1['Age'], subset1['Fare'], label = 'Male')
plt.scatter(subset2['Age'], subset2['Fare'], label = 'Female')
plt.legend()
plt.show()
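The same idea extends to any number of categories by looping over a groupby instead of writing one subset per value. This is a sketch on made-up rows with the same column names, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows standing in for the Titanic data
df = pd.DataFrame({
    "Age": [22, 38, 26, 35],
    "Fare": [7.25, 71.28, 7.92, 53.10],
    "Sex": ["male", "female", "female", "male"],
})

fig, ax = plt.subplots()

# One scatter call per group; each call gets its own color and legend label
for sex, group in df.groupby("Sex"):
    ax.scatter(group["Age"], group["Fare"], label=sex)

legend = ax.legend()
labels = [t.get_text() for t in legend.get_texts()]
```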
I've got code as follows:
import pandas as pd
import numpy as np
import seaborn as sns
data=[np.random.randint(2018,2020,size=(30)),
np.random.randint(1,13,size=(30)),
np.random.randint(1,101,size=(30)),
np.random.randint(1,101,size=(30))]
cols=['year','month','val','val1']
data=pd.DataFrame(data).T
data.columns=cols
data1=[np.random.randint(1,13,size=(30)),
np.random.randint(1,101,size=(30)),
np.random.randint(1,101,size=(30))]
cols1=['month','val','val1']
data1=pd.DataFrame(data1).T
data1.columns=cols1
sns.barplot(data=data,x='month',y='val',hue='year',ci=False)
sns.barplot(data=data,x='month',y='val',estimator=np.mean,ci=False)
This produces two bar plots: the first is split by year, and the second shows the mean for each month. I would like a single plot with three bars for each month, including a mean bar. Could you help me with this?
You can use pandas' plot function:
(data.pivot_table(index='month', columns='year',
                  values='val', margins=True,
                  margins_name='Mean')
     .drop('Mean')
     .plot.bar()
)
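To see what the pivot table holds before it is plotted, here is a small sketch on made-up numbers (the values are assumptions, chosen so the months are unbalanced across years). It suggests that the Mean column is the mean over all rows of that month, which is not necessarily the mean of the two yearly means.

```python
import pandas as pd

# Made-up data: month 1 has three rows in 2018 but only one in 2019
data = pd.DataFrame({
    "year":  [2018, 2018, 2018, 2019, 2018, 2019],
    "month": [1, 1, 1, 1, 2, 2],
    "val":   [10, 20, 30, 40, 50, 70],
})

table = (data.pivot_table(index="month", columns="year",
                          values="val", margins=True,
                          margins_name="Mean")
             .drop("Mean"))  # drop the margin row, keep the margin column

# Month 1: 2018 -> 20.0, 2019 -> 40.0, Mean -> 25.0 (100 / 4 rows)
month1 = table.loc[1]
```

Calling table.plot.bar() on this frame then draws one bar per column (each year plus Mean) for every month.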
I'd like to group a dataframe using several criteria and then visualize the individual data points in each group using a scatter plot.
import pandas as pd
import seaborn as sns
df_tips = sns.load_dataset('tips')
df_tips.groupby(['sex', 'day', 'smoker'])['tip'] # How could I scatter plot individual tip in each group?
Ideally, I'd like something that looks like this:
I would do:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
df_tips = sns.load_dataset('tips')
groups = df_tips.groupby(['sex', 'day', 'smoker'])['tip']
fig, ax = plt.subplots()
# Draw each group's tips at its own integer x position
for i, (k, v) in enumerate(groups):
    ax.scatter([i] * len(v), v)
ax.set_xticks(np.arange(len(groups)))
ax.set_xticklabels([k for k, v in groups], rotation=90)
I found a simpler way to do this and the plot is more beautiful (I think).
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
df_tips = sns.load_dataset('tips')
df_tips['Groups'] = df_tips[['sex', 'day', 'smoker']].astype(str).agg('.'.join, axis=1)
sns.swarmplot(x='Groups', y='tip', data=df_tips)
plt.xticks(
    rotation=90,
    fontweight='light',
    fontsize='x-large'
)
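The key step is building the combined Groups column. A minimal sketch on two made-up rows (same column names as the tips dataset, values assumed) shows what the join produces:

```python
import pandas as pd

# Two made-up rows with the same columns as the tips dataset
df = pd.DataFrame({
    "sex": ["Male", "Female"],
    "day": ["Sun", "Sat"],
    "smoker": ["No", "Yes"],
    "tip": [1.0, 2.5],
})

# Convert each key column to str, then join the values of every row with "."
df["Groups"] = df[["sex", "day", "smoker"]].astype(str).agg(".".join, axis=1)

group_labels = df["Groups"].tolist()
```

The astype(str) call matters for the real dataset, where these columns are categorical rather than plain strings.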
If I have the following data and Seaborn Heatmap:
import pandas as pd
import seaborn as sns
data = pd.DataFrame({'x':(1,2,3,4),'y':(1,2,3,4),'z':(14,15,23,2)})
sns.heatmap(data.pivot_table(index='y', columns='x', values='z'))
How do I add a label to the colour bar?
You could set it afterwards by getting the colorbar from the Axes, or simply pass a label in cbar_kws like so:
import seaborn as sns
import pandas as pd
data = pd.DataFrame({'x':(1,2,3,4),'y':(1,2,3,4),'z':(14,15,23,2)})
sns.heatmap(data.pivot_table(index='y', columns='x', values='z'),
            cbar_kws={'label': 'colorbar title'})
It is worth noting that cbar_kws can be handy for setting other attributes on the colorbar such as tick frequency or formatting.
You can use:
ax = sns.heatmap(data.pivot_table(index='y', columns='x', values='z'))
ax.collections[0].colorbar.set_label("Hello")
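Both approaches can be compared side by side. The sketch below reuses the data from the question; the label text 'z value' and the two-panel layout are assumptions for illustration. For the default vertical colorbar, the label ends up as the y label of the colorbar's own Axes, so it can be read back from there.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

data = pd.DataFrame({'x': (1, 2, 3, 4), 'y': (1, 2, 3, 4), 'z': (14, 15, 23, 2)})
table = data.pivot_table(index='y', columns='x', values='z')

fig, (ax1, ax2) = plt.subplots(1, 2)

# Option 1: pass the label when the heatmap creates its colorbar
sns.heatmap(table, ax=ax1, cbar_kws={'label': 'z value'})

# Option 2: fetch the colorbar from the plotted mesh and label it afterwards
sns.heatmap(table, ax=ax2)
ax2.collections[0].colorbar.set_label('z value')

# Read both labels back from the colorbar Axes
label1 = ax1.collections[0].colorbar.ax.get_ylabel()
label2 = ax2.collections[0].colorbar.ax.get_ylabel()
```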