Gain plot using seaborn - matplotlib - python

I am generating a gain plot in Matplotlib based on the following example data:
M_GRP_1 F_GRP_1 GRP_1 GAIN_GRP_1
0.036796 0.067024 0.058878 0.624948
0.000093 0.000087 0.000089 1.043674
0.000316 0.0002 0.000231 1.366149
0.011152 0.008329 0.00909 1.226813
0.001227 0.000747 0.000876 1.400792
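import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# d holds the example data shown above
d = pd.DataFrame({
    'M_GRP_1': [0.036796, 0.000093, 0.000316, 0.011152, 0.001227],
    'F_GRP_1': [0.067024, 0.000087, 0.0002, 0.008329, 0.000747],
    'GRP_1': [0.058878, 0.000089, 0.000231, 0.00909, 0.000876],
    'GAIN_GRP_1': [0.624948, 1.043674, 1.366149, 1.226813, 1.400792],
})
fig, ax = plt.subplots()
fig.set_size_inches([18, 9])
# Diagonal baseline
ax.plot(np.linspace(0,1),np.linspace(0,1), color = 'black', linewidth = 2)
# Sort by group share (descending) and accumulate to get the gain curve
D = d.sort_values('GRP_1', ascending = False).cumsum()
ax.plot(D.iloc[:,2], D.iloc[:,0], color = 'orange', linewidth = 2)
plt.xlabel('Percentage of total data')
plt.ylabel('Gain')
plt.title ('Target groups :: GRP_1')
plt.legend(['Baseline','Male'])
plt.grid(True)
plt.show()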
However, I want to generate the same plot using seaborn, which I'm not familiar with.
Can anybody suggest how to do this?
Thanks in advance!

Seaborn is based on matplotlib, so most of your code is the same.
Just import seaborn as sns and replace each ax.plot(x, y, ...) call with sns.lineplot(x=x, y=y, ..., ax=ax).
You may also want to add sns.set_theme() (or sns.set() prior to version 0.11.0) to apply seaborn default styles.


How to sync color between Seaborn and pandas pie plot

I am struggling with syncing colors between [seaborn.countplot] and [pandas.DataFrame.plot] pie plot.
I found a similar question on SO, but its solution does not work with a pie chart because it throws an error:
TypeError: pie() got an unexpected keyword argument 'color'
I searched the documentation, but all I could find was that I can set a colormap and a palette, and with those the colors were still not in sync:
Result of using the same colormap and palette
My code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('https://andybek.com/pandas-sat')
cat_vars = ['Borough', 'SAT Section']
for var in list(cat_vars):
    fig, ax = plt.subplots(1, 2, figsize=(10, 5))
    df[var].value_counts().plot(kind='pie', autopct=lambda v: f'{v:.2f}%', ax=ax[0])
    cplot = sns.countplot(data=df, x=var, ax=ax[1])
    for patch in cplot.patches:
        cplot.annotate(
            format(patch.get_height()),
            (
                patch.get_x() + patch.get_width() / 2,
                patch.get_height()
            )
        )
    plt.show()
Illustration of the problem
As you can see, colors are not in sync with labels.
I added the order argument to sns.countplot(). This changes the order in which seaborn places the categories, so the colors in both plots match.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('https://andybek.com/pandas-sat')
cat_vars = ['Borough', 'SAT Section']
for var in list(cat_vars):
    fig, ax = plt.subplots(1, 2, figsize=(10, 5))
    df[var].value_counts().plot(kind='pie', autopct=lambda v: f'{v:.2f}%', ax=ax[0])
    cplot = sns.countplot(data=df, x=var, ax=ax[1],
                          order=df[var].value_counts().index)
    for patch in cplot.patches:
        cplot.annotate(
            format(patch.get_height()),
            (
                patch.get_x() + patch.get_width() / 2,
                patch.get_height()
            )
        )
    plt.show()
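Explanation: colors are assigned in order. value_counts() (used for the pie) sorts the categories by frequency, while sns.countplot by default uses the order in which the values appear in the data (for string columns). If the bars are in a different order than the wedges, the same label gets a different color in each plot.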
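The mismatch can be checked directly. The sketch below uses a small made-up Series in place of the CSV column: its categories appear in a different order than their frequency, so countplot's default order differs from value_counts(), and passing order= lines them up.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up sample: the first value seen is the least frequent one
values = pd.Series(["Queens", "Bronx", "Bronx", "Brooklyn", "Bronx", "Brooklyn"],
                   name="Borough")

appearance_order = list(values.unique())             # countplot's default for string data
frequency_order = list(values.value_counts().index)  # the pie plot's wedge order
print(appearance_order)  # ['Queens', 'Bronx', 'Brooklyn']
print(frequency_order)   # ['Bronx', 'Brooklyn', 'Queens']

fig, ax = plt.subplots()
sns.countplot(x=values, order=frequency_order, ax=ax)
# Bars now follow the frequency order, just like the pie wedges
print([int(p.get_height()) for p in ax.patches])  # [3, 2, 1]
plt.show()
```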
Using default colors
Using the same dataframe for the pie plot and for the seaborn plot might help. As the values are already counted for the pie plot, that same dataframe could be plotted directly as a bar plot. That way, the order of the values stays the same.
Note that seaborn by default makes the colors a bit less saturated. To get the same colors as in the pie plot, you can use saturation=1 (the default is .75). To add text above the bars, matplotlib 3.4 and later provide Axes.bar_label.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
df = pd.read_csv('https://andybek.com/pandas-sat')
cat_vars = ['Borough', 'SAT Section']
for var in list(cat_vars):
    fig, ax = plt.subplots(1, 2, figsize=(10, 5))
    counts_df = df[var].value_counts()
    counts_df.plot(kind='pie', autopct=lambda v: f'{v:.2f}%', ax=ax[0])
    sns.barplot(x=counts_df.index, y=counts_df.values, saturation=1, ax=ax[1])
    ax[1].bar_label(ax[1].containers[0])  # bar_label needs matplotlib >= 3.4
    plt.show()
Customized colors
If you want to use a customized list of colors, you can use the colors= keyword in pie() and palette= in seaborn.
To make things fit better, you can replace spaces by newlines (so "Staten Island" will use two lines). plt.tight_layout() will rearrange spacings to make titles and texts fit nicely into the figure.
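A sketch combining these tips, with a hypothetical three-color list and a made-up Series standing in for the CSV column. Wedge i and bar i both get colors[i] because both plots follow the value_counts() order. On seaborn 0.13 and later, palette= without hue= still works but emits a FutureWarning suggesting hue=counts.index with legend=False instead.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up sample standing in for df[var], and a hypothetical color list
values = pd.Series(["Staten Island", "Queens", "Staten Island", "Bronx",
                    "Staten Island", "Queens"], name="Borough")
colors = ["#4c72b0", "#dd8452", "#55a868"]  # one color per category

counts = values.value_counts()
# Wrap long labels: "Staten Island" becomes two lines
counts.index = counts.index.str.replace(" ", "\n")

fig, ax = plt.subplots(1, 2, figsize=(10, 5))
counts.plot(kind="pie", autopct=lambda v: f"{v:.2f}%", colors=colors, ax=ax[0])
sns.barplot(x=counts.index, y=counts.values, palette=colors, saturation=1, ax=ax[1])

# Works whether seaborn puts all bars in one container or one per category
for container in ax[1].containers:
    ax[1].bar_label(container)

plt.tight_layout()  # rearrange spacing so titles and labels fit
plt.show()
```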

How to create a FacetGrid stacked barplot using Seaborn?

I am trying to plot a facet_grid with stacked bar charts inside.
I would like to use Seaborn. Its barplot function does not include a stacked argument.
I tried to use FacetGrid.map with a custom callable function.
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
def custom_stacked_barplot(col_day, col_time, col_total_bill, **kwargs):
    dict_df={}
    dict_df['day']=col_day
    dict_df['time']=col_time
    dict_df['total_bill']=col_total_bill
    df_data_graph=pd.DataFrame(dict_df)
    df = pd.crosstab(index=df_data_graph['time'], columns=tips['day'], values=tips['total_bill'], aggfunc=sum)
    df.plot.bar(stacked=True)
tips=sns.load_dataset("tips")
g = sns.FacetGrid(tips, col='size', row='smoker')
g = g.map(custom_stacked_barplot, "day", 'time', 'total_bill')
However, I get an empty canvas, and the stacked bar charts are drawn as separate figures (images: the empty canvas, Graph 1 and Graph 2).
How can I fix this issue? Thanks for the help!
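The simplest code to achieve a similar result is the following. Note that FacetGrid calls barplot once per day on the same axes, so the bars are overlaid (each showing the mean total_bill) rather than truly stacked: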
import seaborn as sns
import matplotlib.pyplot as plt
sns.set()
tips = sns.load_dataset("tips")
g = sns.FacetGrid(tips, col = 'size', row = 'smoker', hue = 'day')
# ci=None turns off the error bars (seaborn >= 0.12 prefers errorbar=None)
g = (g.map(sns.barplot, 'time', 'total_bill', order = ['Lunch', 'Dinner'], ci = None).add_legend())
plt.show()
which gives this result:
Your mix of APIs does not integrate: pandas.DataFrame.plot, called without an ax argument, draws into a new figure instead of the seaborn.FacetGrid facet, which is why the grid stays empty. Since seaborn's barplot does not support stacked bars, consider building your own version with matplotlib subplots by iterating over the groupby levels:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
def custom_stacked_barplot(t, sub_df, ax):
    plot_df = pd.crosstab(index=sub_df["time"], columns=sub_df['day'],
                          values=sub_df['total_bill'], aggfunc="sum")
    p = plot_df.plot(kind="bar", stacked=True, ax = ax,
                     title = " | ".join([str(i) for i in t]))
    return p
tips = sns.load_dataset("tips")
# observed=True skips smoker/size combinations that have no rows
g_dfs = tips.groupby(["smoker", "size"], observed=True)
# INITIALIZE PLOT
# sns.set()
fig, axes = plt.subplots(nrows=2, ncols=int(len(g_dfs)/2)+1, figsize=(15,6))
# BUILD PLOTS ACROSS LEVELS
for ax, (i,g) in zip(axes.ravel(), sorted(g_dfs)):
    custom_stacked_barplot(i, g, ax)
plt.tight_layout()
plt.show()
plt.clf()
plt.close()
You can also call seaborn.set() (commented out above) to adjust the theme and palette.
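Alternatively, the question's crosstab idea can stay inside a FacetGrid if the function draws into the facet's own axes instead of a new figure. A sketch using FacetGrid.map_dataframe, which passes each facet's subset as data= and, for functions outside seaborn, makes that facet the current axes. time_order and day_order are helper parameters introduced here so every facet has the same bars and colors, and the small tips stand-in is made up so the sketch runs offline.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# tips = sns.load_dataset("tips")  # the real data (downloaded on first use)
# Made-up stand-in with the same columns so the sketch runs offline
tips = pd.DataFrame({
    "total_bill": [10.0, 12.5, 20.0, 8.0, 15.0, 9.5, 30.0, 11.0],
    "day": ["Thur", "Fri", "Sat", "Sun", "Thur", "Sat", "Sun", "Fri"],
    "time": ["Lunch", "Lunch", "Dinner", "Dinner", "Lunch", "Dinner", "Dinner", "Lunch"],
    "smoker": ["Yes", "No", "Yes", "No", "Yes", "No", "Yes", "No"],
    "size": [2, 2, 2, 2, 3, 3, 3, 3],
})

def stacked_bar(data, time_order, day_order, **kwargs):
    """Draw summed total_bill per time, stacked by day, into the current facet."""
    ax = plt.gca()  # map_dataframe has made this facet's axes current
    table = (
        pd.crosstab(index=data["time"], columns=data["day"],
                    values=data["total_bill"], aggfunc="sum")
        # Same rows and columns in every facet keeps bar positions and colors aligned
        .reindex(index=time_order, columns=day_order)
        .fillna(0)
    )
    # kwargs (e.g. the single color FacetGrid passes in) are deliberately ignored
    table.plot.bar(stacked=True, ax=ax, legend=False, rot=0)

g = sns.FacetGrid(tips, col="size", row="smoker")
g.map_dataframe(stacked_bar,
                time_order=list(tips["time"].unique()),
                day_order=list(tips["day"].unique()))

# One shared legend, built from the first facet
handles, labels = g.axes[0, 0].get_legend_handles_labels()
g.axes[0, 0].figure.legend(handles, labels, title="day", loc="upper right")
plt.show()
```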

Multiple histogram graphs with Seaborn

With matplotlib I get this layout of 4 histograms:
With seaborn I get exactly the graph I need, but I cannot replicate it to get 4 at a time:
I want to get 4 of the seaborn graphs (image 2) in the layout of image 1 (4 at a time, with the calculations I made with seaborn).
My seaborn code is the following:
import os
import re
import time
import ipdb
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
path_file = os.path.join(BASE_DIR, 'camel_product_list.csv')
gapminder = pd.read_csv(path_file)
print(gapminder.head())
df = gapminder
sns.distplot(df['average_histogram_ssim'], hist=True, kde = False, label='All values')
df = gapminder[gapminder.color == 'green']
# sns.distplot(df['lifeExp'], hist = True, kde = True, label='Only Matches')
sns.distplot(df['average_histogram_ssim'], hist_kws={"histtype": "step",
                                                     "linewidth": 3,
                                                     "alpha": 1, "color": "b"},
             kde = False, label='Only Matches')
# Plot formatting
plt.legend(prop={'size': 12})
plt.title('ratio_image SSIM')
plt.xlabel('Data Range')
plt.ylabel('Density')
plt.show()
The names of the columns of the dataframe are:
'ratio_text','ratio_image', 'ratio_hist', 'ratio_sub', 'color'
I'm using the color column as a filter.
How can I get the 4 seaborn plots for 'ratio_text', 'ratio_image', 'ratio_hist' and 'ratio_sub', each showing all colors and the green color only?
First define your grid of subplots and assign its four axes to an array ax:
fig, ax = plt.subplots(2, 2)
Now you can pass the axes you want to plot on to the seaborn plotting function with the ax keyword argument, e.g. for the first plot:
sns.distplot(df['average_histogram_ssim'], hist=True, kde=False, label='All values',
ax=ax[0, 0])
Do the same with ax=ax[0, 1] for the upper-right plot, and so on.

changing major/minor axis interval and color scheme for heatmap

You can find my data set here.
I am using seaborn to plot the heatmap, but I'm open to other choices.
I have trouble getting the color scheme right. I'd like a black-and-white scheme, because the current color scheme doesn't show the result clearly.
I also want to display only the x and y tick positions 0, 25, 50, 100 and 127.
How can I do this?
Below is my try:
import pandas as pd
import numpy
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
data_sorted = pd.read_csv("tors_sorted.txt", sep="\t")
ax = plt.axes()
ax.set_xlim(right=128)
minor_ticks = numpy.arange(0, 128, 50) # doesn't seem to work
data_sorted = data_sorted.pivot(index="dst", columns="src", values="pdf_num_bytes")  # keyword arguments are required in pandas >= 2.0
#sns.heatmap(data_sorted,ax=ax)
sns.heatmap(data_sorted,linecolor='black',xticklabels=True,yticklabels=True)
ax.set_title('Sample plot')
ax.set_xticks(minor_ticks, minor=True)
fig = ax.get_figure()
fig.savefig('heatmap.jpg')
This is the image that I get.
Thanks.
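One possible approach, sketched under assumptions: pass a grayscale colormap such as matplotlib's "Greys" to heatmap, then place the ticks explicitly. heatmap centres cell i at i + 0.5, hence the offset. Random data stands in for tors_sorted.txt, and the labels are row/column positions, which equal your src/dst values only if those run from 0 to 127.

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Random 128 x 128 values stand in for the pivoted tors_sorted.txt data
data = np.random.default_rng(0).random((128, 128))
ticks = [0, 25, 50, 100, 127]  # the positions asked for in the question

fig, ax = plt.subplots(figsize=(8, 7))
# "Greys" runs from white (low) to black (high); "binary" is similar, "gray" is reversed
sns.heatmap(data, cmap="Greys", xticklabels=False, yticklabels=False, ax=ax)

# heatmap cell i spans [i, i + 1], so its centre is at i + 0.5
positions = np.array(ticks) + 0.5
ax.set_xticks(positions)
ax.set_xticklabels([str(t) for t in ticks])
ax.set_yticks(positions)
ax.set_yticklabels([str(t) for t in ticks])
ax.set_title("Sample plot")
# fig.savefig("heatmap.jpg") saves it, as in the question
plt.show()
```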

How to change color of certain squares in a seaborn heatmap?

I'm trying to create a heatmap in seaborn (Python) with certain squares in a different color. These squares contain insignificant data: in my case, squares with values less than 1.3, i.e. -log10(p) for p-values > 0.05. I couldn't find a function for this, and masking these squares didn't work either.
Here is my code:
import matplotlib.pyplot as plt
import numpy as np
import matplotlib as mpl
import seaborn as sns; sns.set()
data = [[1.3531363408, 3.339479161, 0.0760855365], [5.1167382617, 3.2890920405, 2.4764601828], [0.0025058257, 2.3165128345, 1.6532714962], [0.2600549869, 5.8427407219, 6.6627226609], [3.0828581725, 16.3825494439, 12.6722666929], [2.3386307357, 13.7275065772, 12.5760972276], [1.224683813, 2.2213656372, 0.6300876451], [0.4163788387, 1.8128374089, 0.0013106046], [0.0277592882, 2.9286203949, 0.810978992], [0.0086613622, 0.6181261247, 1.8287878837], [1.0174519889, 0.2621290291, 0.1922637697], [3.4687429571, 4.0061981716, 0.5507951444], [7.4201304939, 3.881457516, 0.1294141768], [2.5227546319, 6.0526491816, 0.3814362442], [8.147538027, 14.0975727815, 7.9755706939]]
cmap2 = mpl.colors.ListedColormap(sns.cubehelix_palette(n_colors=20, start=0, rot=0.4, gamma=1, hue=0.8, light=0.85, dark=0.15, reverse=False))
ax = sns.heatmap(data, cmap=cmap2, vmin=0)
plt.show()
I should add that I'm not a very advanced programmer.
OK, so I can answer my question myself now :) Here is the code that solved the problem:
import matplotlib.pyplot as plt
import numpy as np
import matplotlib as mpl
import seaborn as sns; sns.set()
data = np.array([[1.3531363408, 3.339479161, 0.0760855365],
[5.1167382617, 3.2890920405, 2.4764601828],
[0.0025058257, 2.3165128345, 1.6532714962],
[0.2600549869, 5.8427407219, 6.6627226609],
[3.0828581725, 16.3825494439, 12.6722666929],
[2.3386307357, 13.7275065772, 12.5760972276],
[1.224683813, 2.2213656372, 0.6300876451],
[0.4163788387, 1.8128374089, 0.0013106046],
[0.0277592882, 2.9286203949, 0.810978992],
[0.0086613622, 0.6181261247, 1.8287878837],
[1.0174519889, 0.2621290291, 0.1922637697],
[3.4687429571, 4.0061981716, 0.5507951444],
[7.4201304939, 3.881457516, 0.1294141768],
[2.5227546319, 6.0526491816, 0.3814362442],
[8.147538027, 14.0975727815, 7.9755706939]])
cmap1 = mpl.colors.ListedColormap(['c'])
fig, ax = plt.subplots(figsize=(8, 8))
sns.heatmap(data, ax=ax)
sns.heatmap(data, mask=data > 1.3, cmap=cmap1, cbar=False, ax=ax)
plt.show()
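So the problem with masking that didn't work before was that it needs an array, not a list: data > 1.3 is an element-wise comparison only for a NumPy array (on a nested list it raises a TypeError).
The other trick is to plot the heatmap twice, the second time with the mask.
The only thing I still didn't understand is that it seemed to mask the opposite cells from what I wrote: I wanted to color the values below 1.3, but it colored the values above 1.3, so I wrote mask=data > 1.3 and now it works. (This is expected: heatmap does not draw cells where mask is True, so the overlay only shows the cells where data <= 1.3.)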
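The "opposite" behavior follows from how heatmap's mask works: cells where mask is True are not drawn. A small sketch with the first three rows of the data above shows that mask=data > 1.3 leaves exactly the cells with values at or below 1.3 for the cyan overlay.

```python
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# First three rows of the question's data, as a NumPy array
data = np.array([[1.3531363408, 3.339479161, 0.0760855365],
                 [5.1167382617, 3.2890920405, 2.4764601828],
                 [0.0025058257, 2.3165128345, 1.6532714962]])

fig, ax = plt.subplots(figsize=(4, 4))
sns.heatmap(data, ax=ax)  # base layer: every cell

# mask=True means "do not draw this cell", so masking the significant
# cells (> 1.3) leaves only the insignificant ones for the overlay
overlay_mask = data > 1.3
sns.heatmap(data, mask=overlay_mask, cmap=mpl.colors.ListedColormap(["c"]),
            cbar=False, ax=ax)

print(int((~overlay_mask).sum()), "cells drawn in the highlight color")
# prints: 2 cells drawn in the highlight color
plt.show()
```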
