Multiple histogram graphs with Seaborn - python

Plotting with matplotlib, I get this layout of four histograms:
Using Seaborn, I get exactly the graph I need, but I cannot replicate it to show four at a time:
I want four of the Seaborn graphs (image 2) arranged as in image 1 (four at a time, with the calculations I made with Seaborn).
My seaborn code is the following:
import os
import re
import time
import ipdb
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
path_file = os.path.join(BASE_DIR, 'camel_product_list.csv')
gapminder = pd.read_csv(path_file)
print(gapminder.head())
df = gapminder
sns.distplot(df['average_histogram_ssim'], hist=True, kde = False, label='All values')
df = gapminder[gapminder.color == 'green']
# sns.distplot(df['lifeExp'], hist = True, kde = True, label='Only Matches')
sns.distplot(df['average_histogram_ssim'], hist_kws={"histtype": "step",
"linewidth": 3,
"alpha": 1, "color": "b"} ,
kde = False, label='Only Matches')
# Plot formatting
plt.legend(prop={'size': 12})
plt.title('ratio_image SSIM')
plt.xlabel('Data Range')
plt.ylabel('Density')
plt.show()
The names of the columns of the dataframe are:
'ratio_text','ratio_image', 'ratio_hist', 'ratio_sub', 'color'
I'm using the color column as a filter.
How can I get the 4 seaborn plots for 'ratio_text', 'ratio_image', 'ratio_hist', 'ratio_sub', each showing all colors and only the green color?

First define your grid of subplots and assign its four axes to an array ax:
fig, ax = plt.subplots(2, 2)
Now you can pass the axes you want to plot on to the seaborn plotting function with the ax keyword argument, e.g. for the first plot:
sns.distplot(df['average_histogram_ssim'], hist=True, kde=False, label='All values',
ax=ax[0, 0])
Do the same with ax=ax[0, 1] for the upper-right plot, and so on.

Related

How to sync color between Seaborn and pandas pie plot

I am struggling with syncing colors between [seaborn.countplot] and [pandas.DataFrame.plot] pie plot.
I found a similar question on SO, but it does not work with pie chart as it throws an error:
TypeError: pie() got an unexpected keyword argument 'color'
I searched on the documentation sites, but all I could find is that I can set a colormap and palette, which was also not in sync in the end:
Result of using the same colormap and palette
My code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('https://andybek.com/pandas-sat')
cat_vars = ['Borough', 'SAT Section']
for var in list(cat_vars):
    fig, ax = plt.subplots(1, 2, figsize=(10, 5))
    df[var].value_counts().plot(kind='pie', autopct=lambda v: f'{v:.2f}%', ax=ax[0])
    cplot = sns.countplot(data=df, x=var, ax=ax[1])
    for patch in cplot.patches:
        cplot.annotate(
            format(patch.get_height()),
            (
                patch.get_x() + patch.get_width() / 2,
                patch.get_height()
            )
        )
plt.show()
Illustration of the problem
As you can see, colors are not in sync with labels.
I added the order argument to sns.countplot(). This changes the order in which seaborn draws the categories, and as a consequence the colours in both plots will match.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('https://andybek.com/pandas-sat')
cat_vars = ['Borough', 'SAT Section']
for var in list(cat_vars):
    fig, ax = plt.subplots(1, 2, figsize=(10, 5))
    df[var].value_counts().plot(kind='pie', autopct=lambda v: f'{v:.2f}%', ax=ax[0])
    cplot = sns.countplot(data=df, x=var, ax=ax[1],
                          order=df[var].value_counts().index)
    for patch in cplot.patches:
        cplot.annotate(
            format(patch.get_height()),
            (
                patch.get_x() + patch.get_width() / 2,
                patch.get_height()
            )
        )
plt.show()
Explanation: Colors are assigned by order. So, if the categories in sns.countplot are in a different order than in the other plot, the same label gets a different color in each plot.
Using default colors
Using the same dataframe for the pie plot and for the seaborn plot might help. As the values are already counted for the pie plot, that same dataframe could be plotted directly as a bar plot. That way, the order of the values stays the same.
Note that seaborn by default makes the colors a bit less saturated. To get the same colors as in the pie plot, you can use saturation=1 (default is .75). To add text above the bars, matplotlib 3.4 and later provide the bar_label method.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
df = pd.read_csv('https://andybek.com/pandas-sat')
cat_vars = ['Borough', 'SAT Section']
for var in list(cat_vars):
    fig, ax = plt.subplots(1, 2, figsize=(10, 5))
    counts_df = df[var].value_counts()
    counts_df.plot(kind='pie', autopct=lambda v: f'{v:.2f}%', ax=ax[0])
    sns.barplot(x=counts_df.index, y=counts_df.values, saturation=1, ax=ax[1])
    ax[1].bar_label(ax[1].containers[0])
Using customized colors
If you want to use a customized list of colors, you can use the colors= keyword in pie() and palette= in seaborn.
To make things fit better, you can replace spaces by newlines (so "Staten Island" will use two lines). plt.tight_layout() will rearrange spacings to make titles and texts fit nicely into the figure.
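A minimal sketch of the customized-color variant follows. The borough counts and the three color names are made up for illustration, and passing the labels as hue together with a dict palette is one possible way (an assumption, not necessarily what the answer used) to tie each label to a fixed color across seaborn versions.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical stand-in for one categorical column of the asker's data
boroughs = pd.Series(
    ['Brooklyn'] * 5 + ['Bronx'] * 4 + ['Staten Island'] * 2, name='Borough')

counts = boroughs.value_counts()
# Replace spaces by newlines so long labels use two lines
counts.index = [label.replace(' ', '\n') for label in counts.index]
labels = list(counts.index)

# One explicit color per label, shared by both plots (arbitrary choices)
colors = ['crimson', 'dodgerblue', 'gold']
palette = dict(zip(labels, colors))

fig, ax = plt.subplots(1, 2, figsize=(10, 5))
# colors= is forwarded to matplotlib's pie()
counts.plot(kind='pie', colors=colors, autopct='%.2f%%', ax=ax[0])
ax[0].set_ylabel('')
# palette= maps each label to its color; dodge=False keeps bars centered
sns.barplot(x=labels, y=counts.values, hue=labels, palette=palette,
            dodge=False, saturation=1, ax=ax[1])
legend = ax[1].get_legend()
if legend is not None:
    legend.remove()  # the legend would only repeat the x tick labels
plt.tight_layout()
```

Call plt.show() to display the figure. Because both plots take their colors from the same list in the same label order, the wedge and the bar for a given label share a color.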

Updating chart format

I would like to change this from a straight regression line to a curve, and to have the line reach both sides of the graph. Here is my code:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
data = {'Days': [5, 10, 15, 20],
'Impact': [33.7561, 30.6281, 29.5748, 29.0482]
}
a = pd.DataFrame (data, columns = ['Days','Impact'])
print (a)
ax = sns.barplot(data=a, x='Days', y='Impact', color='lightblue' )
# put bars in background:
for c in ax.patches:
    c.set_zorder(0)
# plot regplot with numbers 0,..,len(a) as x value
ax = sns.regplot(x=np.arange(0,len(a)), y=a['Impact'], marker="+")
sns.despine(offset=10, trim=False)
ax.set_ylabel("")
ax.set_xticklabels(['5', '10','15','20'])
plt.show()
Alternatively, I would prefer to do it in matplotlib as a scatter plot instead of a bar chart. Here is an example made in Excel; ideally, the curve would extend at least a little beyond the outermost markers.
Can anyone help?

Gain plot using seaborn - matplotlib

I am generating gain plot based on the following example data in Matplotlib.
M_GRP_1 F_GRP_1 GRP_1 GAIN_GRP_1
0.036796 0.067024 0.058878 0.624948
0.000093 0.000087 0.000089 1.043674
0.000316 0.0002 0.000231 1.366149
0.011152 0.008329 0.00909 1.226813
0.001227 0.000747 0.000876 1.400792
import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots()
fig.set_size_inches([18, 9])
ax.plot(np.linspace(0,1),np.linspace(0,1), color = 'black', linewidth = 2)
# d is the DataFrame holding the example data shown above
D = d.sort_values('GRP_1', ascending = False).cumsum()
ax.plot(D.iloc[:,2], D.iloc[:,0], color = 'orange', linewidth = 2)
plt.xlabel('Percentage of total data')
plt.ylabel('Gain')
plt.title ('Target groups :: GRP_1')
plt.legend(['Baseline','Male'])
plt.grid(True)
plt.show()
However, I want to generate the same plot using seaborn. I am wondering how I can do that, as I'm not familiar with it.
Can anybody suggest how to do this?
Thanks in advance.
Seaborn is based on matplotlib, so most of your code is the same.
Just import seaborn as sns and replace ax.plot by sns.lineplot.
You may also want to add sns.set_theme() (or sns.set() prior to version 0.11.0) to apply seaborn default styles.

Subplot of Subplots Matplotlib / Seaborn

I am trying to create a grid of subplots. Each subplot will look like the one on this site.
https://python-graph-gallery.com/24-histogram-with-a-boxplot-on-top-seaborn/
If I have 10 different sets of this style of plot I want to make them into a 5x2 for example.
I have read through the Matplotlib documentation and cannot figure out how to do it. I can loop over the subplots and output each one, but I cannot arrange them into rows and columns.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.randint(0,100,size=(100, 10)),columns=list('ABCDEFGHIJ'))
for c in df:
    # Cut the window in 2 parts
    f, (ax_box, ax_hist) = plt.subplots(2,
                                        sharex=True,
                                        gridspec_kw={"height_ratios": (.15, .85)},
                                        figsize=(10, 10))
    # Add a graph in each part
    sns.boxplot(x=df[c], ax=ax_box)
    ax_hist.hist(df[c])
    # Remove x axis name for the boxplot
    ax_box.set(xlabel='')
    plt.show()
The desired result would take the output of this loop and arrange it in rows and columns, in this case 5x2.
You have 10 columns, each of which creates 2 subplots: a box plot and a histogram. So you need a total of 20 subplots. You can get them by creating a grid of 2 rows and 10 columns.
Complete answer: (Adjust the figsize and height_ratios as per taste)
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
f, axes = plt.subplots(2, 10, sharex=True, gridspec_kw={"height_ratios":(.35, .35)},
                       figsize = (12, 5))
df = pd.DataFrame(np.random.randint(0,100,size=(100, 10)),columns=list('ABCDEFGHIJ'))
for i, c in enumerate(df):
    # One column of the grid per dataframe column: box plot on top, histogram below
    sns.boxplot(x=df[c], ax=axes[0,i])
    axes[1,i].hist(df[c])
plt.tight_layout()
plt.show()
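If the 5x2 arrangement from the question is wanted, with each cell holding its own box plot and histogram pair, a nested GridSpec is one option. The following is only a sketch under that assumption; the figure size and spacing values are arbitrary.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(100, 10)),
                  columns=list('ABCDEFGHIJ'))

fig = plt.figure(figsize=(10, 16))
# Outer 5x2 grid: one cell per dataframe column
outer = fig.add_gridspec(5, 2, hspace=0.4, wspace=0.2)

for i, c in enumerate(df.columns):
    # Split each outer cell into a thin box plot row and a taller histogram row
    inner = outer[i].subgridspec(2, 1, height_ratios=[0.15, 0.85], hspace=0.05)
    ax_box = fig.add_subplot(inner[0])
    ax_hist = fig.add_subplot(inner[1], sharex=ax_box)
    sns.boxplot(x=df[c], ax=ax_box)
    ax_hist.hist(df[c])
    # Remove the x axis name and tick labels of the box plot
    ax_box.set(xlabel='', title=c)
    ax_box.tick_params(labelbottom=False)
```

Call plt.show() to display the figure. SubplotSpec.subgridspec requires matplotlib 3.1 or later.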

matplotlib loop make subplot for each category

I am trying to write a loop that will make a figure with 25 subplots, 1 for each country. My code makes a figure with 25 subplots, but the plots are empty. What can I change to make the data appear in the graphs?
fig = plt.figure()
for c,num in zip(countries, xrange(1,26)):
    df0=df[df['Country']==c]
    ax = fig.add_subplot(5,5,num)
    ax.plot(x=df0['Date'], y=df0[['y1','y2','y3','y4']], title=c)
fig.show()
You got confused between the matplotlib plotting function and the pandas plotting wrapper.
The problem is that ax.plot does not take x or y keyword arguments (nor a title); the data must be passed positionally.
Use ax.plot
In that case, call it like ax.plot(df0['Date'], df0[['y1','y2']]), without x, y and title. Set the title separately, e.g. with ax.set_title(c).
Example:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
countries = np.random.choice(list("ABCDE"),size=25)
df = pd.DataFrame({"Date" : range(200),
'Country' : np.repeat(countries,8),
'y1' : np.random.rand(200),
'y2' : np.random.rand(200)})
fig = plt.figure()
# range replaces the Python 2 xrange used in the question
for c,num in zip(countries, range(1,26)):
    df0=df[df['Country']==c]
    ax = fig.add_subplot(5,5,num)
    ax.plot(df0['Date'], df0[['y1','y2']])
    ax.set_title(c)
plt.tight_layout()
plt.show()
Use the pandas plotting wrapper
In this case plot your data via df0.plot(x="Date",y =['y1','y2']).
Example:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
countries = np.random.choice(list("ABCDE"),size=25)
df = pd.DataFrame({"Date" : range(200),
'Country' : np.repeat(countries,8),
'y1' : np.random.rand(200),
'y2' : np.random.rand(200)})
fig = plt.figure()
# range replaces the Python 2 xrange used in the question
for c,num in zip(countries, range(1,26)):
    df0=df[df['Country']==c]
    ax = fig.add_subplot(5,5,num)
    df0.plot(x="Date",y =['y1','y2'], title=c, ax=ax, legend=False)
plt.tight_layout()
plt.show()
I don't remember the original subplot system well, but you seem to be overwriting the plot. In any case, you should take a look at gridspec. Check the following example:
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
fig = plt.figure()
gs1 = gridspec.GridSpec(5, 5)
countries = ["Country " + str(i) for i in range(1, 26)]
axs = []
for c, num in zip(countries, range(1,26)):
    axs.append(fig.add_subplot(gs1[num - 1]))
    axs[-1].plot([1, 2, 3], [1, 2, 3])
plt.show()
Which results in this:
Just replace the example with your data and it should work fine.
NOTE: I've noticed you are using xrange. I've used range because my version of Python is 3.x. Adapt to your version.
