This question already has answers here:
What is the difference between drawing plots using plot, axes or figure in matplotlib?
(2 answers)
How to add a title to each subplot
(10 answers)
Closed 11 months ago.
I have the following code:
df = sns.load_dataset('titanic')
# Data
data = df[df.age.notna()].age
# Fit a normal distribution to the data:
mu, std = scipy.stats.norm.fit(data)
# bin formulas
bin_f = {'sturges' : 1 + math.log(len(df), 2)}
# Plot the histogram.
sns.histplot( data = data, stat='density', bins=int(bin_f['sturges']), alpha=0.6, color='g', kde = True, legend = True)
# Plot the PDF.
xmin, xmax = plt.xlim()
x = np.linspace(xmin, xmax, 1000)
p = scipy.stats.norm.pdf(x, mu, std)
sns.lineplot(x = x, y = p, color = 'black', linewidth=2)
title = f"Fit results: mu = {round(mu, 2)}, std ={round(std, 2)} "
plt.title(title)
Which produces this plot:
When I try to produce it in a subplot, it doesn't work as expected:
f, ax = plt.subplots(nrows = 1, ncols = 2, figsize=(15, 8))
# Data
data = df[df.age.notna()].age
# Fit a normal distribution to the data:
mu, std = scipy.stats.norm.fit(data)
# bin formulas
bin_f = {'sturges' : 1 + math.log(len(df), 2)}
# Plot the histogram.
sns.histplot(ax = ax[0], data = data, stat='density', bins=int(bin_f['sturges']), alpha=0.4, color='g', kde = True, legend = True)
# Plot the PDF.
xmin, xmax = plt.xlim()
x = np.linspace(xmin, xmax, 1000)
p = scipy.stats.norm.pdf(x, mu, std)
sns.lineplot(x = x, y = p, color = 'black', linewidth=2, ax=ax[0])
title = f"Fit results: mu = {round(mu, 2)}, std ={round(std, 2)} "
plt.title(title)
For some reason, the title appears only on the second subplot, and the black line plot is only a small tick in the second figure rather than a normal curve as in the first image. I am not sure why this is happening, since the only difference is using plt.subplots and passing ax. Where is my mistake?
My goal is to have the graph from the first picture as the first subplot of the second figure.
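A likely cause: `plt.xlim()` and `plt.title()` act on the *current* Axes, and right after `plt.subplots(nrows=1, ncols=2)` the current Axes is the last one created, `ax[1]`. So `xmin, xmax` come from the empty right-hand Axes (default 0 to 1), which squeezes the black curve into a short tick near x = 0, and the title lands on the right panel. Below is a minimal sketch of the fix that calls the methods on `ax[0]` directly. To keep it dependency-light it assumes synthetic normal data in place of the titanic `age` column, uses `ax.hist(density=True)` instead of `sns.histplot`, and computes the normal fit with NumPy (mean and population standard deviation, the maximum-likelihood estimates that `scipy.stats.norm.fit` returns).

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in for df[df.age.notna()].age (not the real titanic data)
rng = np.random.default_rng(0)
data = rng.normal(loc=30, scale=14, size=714)

# Maximum-likelihood normal fit: same values scipy.stats.norm.fit returns
mu, std = data.mean(), data.std()

# Sturges' rule as in the question (applied to len(data) here)
bins = int(1 + math.log(len(data), 2))

fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(15, 8))

# Histogram on the left Axes
ax[0].hist(data, bins=bins, density=True, alpha=0.4, color="g")

# Ask ax[0] for its limits; plt.xlim() would query the current Axes, ax[1]
xmin, xmax = ax[0].get_xlim()
x = np.linspace(xmin, xmax, 1000)
# Normal PDF written out with NumPy (equivalent to scipy.stats.norm.pdf)
p = np.exp(-0.5 * ((x - mu) / std) ** 2) / (std * math.sqrt(2 * math.pi))
ax[0].plot(x, p, color="black", linewidth=2)

# Set the title on the same Axes instead of using plt.title()
title = f"Fit results: mu = {round(mu, 2)}, std = {round(std, 2)}"
ax[0].set_title(title)
```

With seaborn, the same idea applies: keep `ax=ax[0]` in `sns.histplot` and `sns.lineplot`, and replace `plt.xlim()` with `ax[0].get_xlim()` and `plt.title(title)` with `ax[0].set_title(title)`. Calling `plt.sca(ax[0])` before the `plt.*` calls would also work, but the explicit Axes methods are less fragile.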
Related
I'm trying to create a plot with shading based on this MIC(1) line,
with different shading above the line than beneath it.
from scipy import stats
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.transforms as transforms
import seaborn as sns
def createSkewDist(mean, sd, skew, size):
    # calculate the degrees of freedom 1 required to obtain the specific skewness statistic, derived from simulations
    loglog_slope=-2.211897875506251
    loglog_intercept=1.002555437670879
    df2=500
    df1 = 10**(loglog_slope*np.log10(abs(skew)) + loglog_intercept)
    # sample from F distribution
    fsample = np.sort(stats.f(df1, df2).rvs(size=size))
    # adjust the variance by scaling the distance from each point to the distribution mean by a constant, derived from simulations
    k1_slope = 0.5670830069364579
    k1_intercept = -0.09239985798819927
    k2_slope = 0.5823114978219056
    k2_intercept = -0.11748300123471256
    scaling_slope = abs(skew)*k1_slope + k1_intercept
    scaling_intercept = abs(skew)*k2_slope + k2_intercept
    scale_factor = (sd - scaling_intercept)/scaling_slope
    new_dist = (fsample - np.mean(fsample))*scale_factor + fsample
    # flip the distribution if specified skew is negative
    if skew < 0:
        new_dist = np.mean(new_dist) - new_dist
    # adjust the distribution mean to the specified value
    final_dist = new_dist + (mean - np.mean(new_dist))
    return final_dist
desired_mean = 30
desired_skew = 1.5
desired_sd = 20
final_dist = createSkewDist(mean=desired_mean, sd=desired_sd, skew=desired_skew, size=1000000)
# inspect the plots & moments, try random sample
fig, ax = plt.subplots(figsize=(12,7))
sns.distplot(final_dist,
hist=False,
ax=ax,
color='darkred',
kde_kws=dict(linewidth=4))
l1 = ax.lines[0]
# Get the xy data from the lines so that we can shade
x1 = l1.get_xydata()[:,0]
x1[0] = 0
y1 = l1.get_xydata()[:,1]
y1[0] = 0
ax.fill_between(x1,y1, color="lemonchiffon", alpha=0.3)
ax.set_ylim(0.0001,0.03)
ax.axhline(0.002, ls="--")
ax.set_xlim(1.5, 200)
ax.set_yticklabels([])
ax.set_xticklabels([])
trans = transforms.blended_transform_factory(
ax.get_yticklabels()[0].get_transform(), ax.transData)
ax.text(0,0.0025, "{}".format("MIC(1) = 1"), color="blue", transform=trans,
ha="right", va="top", fontsize = 12)
trans_2 = transforms.blended_transform_factory(
ax.get_xticklabels()[0].get_transform(), ax.transData)
ax.text(84,0, "{}".format("\n84"), color="darkred", transform=trans_2,
ha="center", va="top", fontsize = 12)
ax.text(1.5,0, "{}".format("\n0"), color="darkred", transform=trans_2,
ha="center", va="top", fontsize = 12)
ax.axvline(x = 84, ymin = 0, ymax = 0.03, ls = '--', color = 'darkred' )
ax.set_yticks([])
ax.set_xticks([])
ax.spines['top'].set_color(None)
ax.spines['right'].set_color(None)
ax.spines['left'].set_linewidth(2)
ax.spines['bottom'].set_linewidth(2)
ax.set_ylabel("Concentration [mg/L]", labelpad = 80, fontsize = 15)
ax.set_xlabel("Time [h]", labelpad = 80, fontsize = 15)
ax.set_title("AUC/MIC", fontsize = 20, pad = 30)
plt.annotate("AUC/MIC",
xy=(18, 0.02),
xytext=(18, 0.03),
arrowprops=dict(arrowstyle="->"), fontsize = 12);
That's what I have:
And that's what I'd like to have (it's done in paint, so forgive me :) ):
I experimented with fill_between and fill_betweenx, but without satisfying results, and I've run out of ideas. I'd really appreciate any help with this. Best wishes!
Your fill_between works as expected. The problem is that color="lemonchiffon" with alpha=0.3 is barely visible. Try to use a brighter color and/or a higher value for alpha.
So, this colors the part of the graph between zero and the kde curve.
Now, to create a different coloring above and below the horizontal line, where= and np.minimum can be used in fill_between:
pos_hline = 0.002
ax.fill_between(x1, pos_hline, y1, color="yellow", alpha=0.3, where=y1 > pos_hline)
ax.fill_between(x1, 0, np.minimum(y1, pos_hline), color="blue", alpha=0.3)
Without where=y1 > pos_hline, fill_between would also color the region above the curve where the curve falls below that horizontal line.
PS: Note that sns.distplot has been deprecated since Seaborn version 0.11. To plot only the kde curve, you can use sns.kdeplot:
sns.kdeplot(final_dist, ax=ax, color='darkred', linewidth=4)
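The two `fill_between` calls from this answer can be checked in isolation. The self-contained sketch below assumes a hypothetical right-skewed curve (a gamma-shaped function computed with NumPy) in place of the KDE line extracted from `ax.lines[0]`. `interpolate=True` is an addition not in the answer: it closes the small gaps that `where=` otherwise leaves at the points where the curve crosses the horizontal line.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical right-skewed curve standing in for the KDE line (x1, y1):
# gamma density with shape 2 and scale 20, i.e. x * exp(-x / 20) / 400
x1 = np.linspace(0, 200, 2000)
y1 = x1 * np.exp(-x1 / 20) / 400

pos_hline = 0.002  # MIC(1) level from the question

fig, ax = plt.subplots(figsize=(12, 7))
ax.plot(x1, y1, color="darkred", linewidth=4)
ax.axhline(pos_hline, ls="--")

# Above the line: between the line and the curve, only where the curve is higher
ax.fill_between(x1, pos_hline, y1, where=y1 > pos_hline, interpolate=True,
                color="yellow", alpha=0.3)
# Below the line: from 0 up to whichever is lower, the curve or the line
ax.fill_between(x1, 0, np.minimum(y1, pos_hline), color="blue", alpha=0.3)
ax.set_ylim(0, 0.03)
```

Swapping in the real `x1`, `y1` from the KDE line should give the two-tone shading from the mock-up, with the colors and alpha values tuned to taste.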
I am trying to represent odds ratios in Python like this:
ax = sns.scatterplot(data=df_result, x="odd_ratio", y="iso")
plt.axvline(1.0, color='black', linestyle='--')
But I would like a horizontal bar for each odds ratio showing its confidence interval.
In my dataframe df_result I have the lower and upper bounds (df_result['lower_conf'] and df_result['upper_conf']). How can I plot the confidence intervals? Thanks in advance.
Here is my code. It's for a vertical plot, but you can swap the axes. I have a table with the 5%, 95%, and OR values in separate columns:
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style("whitegrid")
fig, ax = plt.subplots(figsize=(7, 5))
ax.set_yscale("log")
ax.axhline(1, ls='--', linewidth=1, color='black')
n = 0
for index, i in df.iterrows():
    # confidence-interval bar from the 5% to the 95% bound
    x = [n, n]
    y = [i["5%"], i["95%"]]
    ax.plot(x, y, "_-", markersize=15, markeredgewidth=3, linewidth=3, color=sns.color_palette("muted")[n])
    # point estimate
    x = [n]
    y = [i["Odds Ratio"]]
    ax.plot(x, y, "o", color=sns.color_palette("muted")[n], markersize=10)
    n += 1
ax.set_xlabel("")
ax.set_ylabel("Odds Ratio")
# one tick per row, so each label sits under its own bar
ax.set_xticks(range(n))
ax.set_xticklabels(["Resistant", "Focal Epilepsy", "> 3 seizures/month", "Polytherapy", "DDD > 1", "Adverse effects"], rotation=45)
result
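For the horizontal layout asked about in the question, Matplotlib's `Axes.errorbar` with `xerr` draws each point and its interval in one call. A minimal sketch, assuming a made-up `df_result` that uses the column names from the question; note that `xerr` takes distances from each point rather than the absolute lower and upper bounds, so the bounds are converted first.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_result (made-up numbers, column names from the question)
df_result = pd.DataFrame({
    "iso": ["A", "B", "C"],
    "odd_ratio": [0.8, 1.5, 2.3],
    "lower_conf": [0.5, 1.1, 1.2],
    "upper_conf": [1.3, 2.0, 4.1],
})

ors = df_result["odd_ratio"].to_numpy()
# errorbar wants distances from each point, so convert the absolute bounds
xerr = np.vstack([ors - df_result["lower_conf"].to_numpy(),
                  df_result["upper_conf"].to_numpy() - ors])

y_pos = np.arange(len(df_result))
fig, ax = plt.subplots()
ax.errorbar(ors, y_pos, xerr=xerr, fmt="o", capsize=4)
ax.set_yticks(y_pos)
ax.set_yticklabels(df_result["iso"])
ax.axvline(1.0, color="black", linestyle="--")  # OR = 1 reference line
ax.set_xlabel("Odds ratio")
```

Adding `ax.set_xscale("log")` would mirror the log scale used in the answer's vertical version, which keeps intervals around OR = 1 symmetric in appearance.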
I am not sure as to why this happens. Maybe it is just a simple mistake that I cannot see, but by using this code:
import glob
import os

import matplotlib.pyplot as plt
import pandas as pd

for filename in glob.glob('/Users/jacob/Desktop/MERS/new/NOT COAL/gensets/statistics_per_lgu/per_lgu_files/*.csv'):
    base = os.path.basename(filename)
    name = os.path.splitext(base)[0]
    df = pd.read_csv(filename)
    # Show 4 different binwidths
    for i, binwidth in enumerate([10, 20, 30, 40]):
        # Set up the plot
        ax = plt.subplot(2, 2, i + 1)
        plt.subplots_adjust( wspace=0.5, hspace=0.5)
        # Draw the plot
        ax.hist(df['New Capacity based on 0.8 PF'], bins=binwidth,
                color='red', edgecolor='black',alpha=0.5)
        # Title and labels
        ax.set_title('Histogram with Binwidth = %d' % binwidth, size=10)
        ax.set_xlabel('Capacity', size=11)
        ax.set_ylabel('Frequency count', size=11)
        ax.axvline(x=df['New Capacity based on 0.8 PF'].median(), linestyle='dashed', alpha=0.3, color='blue')
        min_ylim, max_ylim = plt.ylim()
        ax.text(x=df['New Capacity based on 0.8 PF'].median(),y= max_ylim*0.9, s='Median', alpha=0.7, color='blue',fontsize = 12)
        ax.axvline(x=df['New Capacity based on 0.8 PF'].mean(), linestyle='dashed', alpha=0.9, color='green')
        min_ylim, max_ylim = plt.ylim()
        ax.text(x=df['New Capacity based on 0.8 PF'].mean(),y= max_ylim*0.5, s='Mean', alpha=0.9, color='green',fontsize = 12)
    plt.tight_layout()
    plt.grid(True)
    plt.savefig('/Users/jacob/Documents/Gensets_gis/historgrams/per_lgu_files/{}.png'.format(name))
Every file I get looks like the attached image.
Any ideas as to what I've done wrong?
Thanks in advance.
attached photo of one histogram output
My desired result would be something like this.
Desired output
plt.subplot() doesn't create new subplots: it reuses the existing ones in the current figure and draws the new plots on top of the old ones. So you have to clear each subplot before you draw a new histogram:
ax = plt.subplot(2, 2, i + 1)
ax.clear()
Example code. It gives the desired output. If you remove `ax.clear()`, the first image will be OK, but the second and third images will show the new plots drawn over the old ones.
import os
import pandas as pd
import matplotlib.pyplot as plt
import random

for n in range(3):
    filename = f'example_data_{n}.csv'
    base = os.path.basename(filename)
    name = os.path.splitext(base)[0]
    df = pd.DataFrame({'New Capacity based on 0.8 PF': random.choices(list(range(1000)), k=100)})
    data = df['New Capacity based on 0.8 PF']
    median = data.median()
    mean = data.mean()
    # Show 4 different binwidths
    for i, binwidth in enumerate([10, 20, 30, 40]):
        # Set up the plot
        ax = plt.subplot(2,2,i+1)
        ax.clear()  # <--- it removes previous histogram
        plt.subplots_adjust( wspace=0.5, hspace=0.5)
        # Draw the plot
        ax.hist(data , bins=binwidth, color='red', edgecolor='black',alpha=0.5)
        # Title and labels
        ax.set_title('Histogram with Binwidth = %d' % binwidth, size=10)
        ax.set_xlabel('Capacity', size=11)
        ax.set_ylabel('Frequency count', size=11)
        min_ylim, max_ylim = plt.ylim()
        ax.axvline(x=median, linestyle='dashed', alpha=0.3, color='blue')
        ax.text(x=median, y= max_ylim*0.9, s='Median', alpha=0.7, color='blue',fontsize = 12)
        ax.axvline(x=mean, linestyle='dashed', alpha=0.9, color='green')
        ax.text(x=mean, y= max_ylim*0.5, s='Mean', alpha=0.9, color='green',fontsize = 12)
    plt.tight_layout()
    plt.grid(True)
    plt.savefig('{}.png'.format(name))
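An alternative to `ax.clear()` is to create a fresh figure for every file and close it after saving, so nothing can leak from one file to the next. The sketch below keeps the answer's random stand-in data and writes into a temporary folder (an assumption for the example). Two side notes: `bins=<int>` in `ax.hist` is the *number* of bins, not a bin width, so the titles say "bins"; and `plt.grid(True)` only affects the current (last) Axes, so `ax.grid(True)` is called on each panel instead.

```python
import os
import random
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

out_dir = tempfile.mkdtemp()  # hypothetical output folder for the sketch
col = 'New Capacity based on 0.8 PF'

for n in range(3):
    name = f'example_data_{n}'
    # Random stand-in data, as in the answer's example
    df = pd.DataFrame({col: random.choices(range(1000), k=100)})
    data = df[col]
    median, mean = data.median(), data.mean()

    # A brand-new figure with its own 2x2 Axes for every file
    fig, axes = plt.subplots(2, 2, figsize=(10, 8))
    for ax, nbins in zip(axes.flat, [10, 20, 30, 40]):
        # bins=<int> means "number of bins", not a bin width
        ax.hist(data, bins=nbins, color='red', edgecolor='black', alpha=0.5)
        ax.set_title(f'Histogram with {nbins} bins', size=10)
        ax.set_xlabel('Capacity', size=11)
        ax.set_ylabel('Frequency count', size=11)
        max_ylim = ax.get_ylim()[1]
        ax.axvline(x=median, linestyle='dashed', alpha=0.3, color='blue')
        ax.text(x=median, y=max_ylim * 0.9, s='Median', alpha=0.7, color='blue', fontsize=12)
        ax.axvline(x=mean, linestyle='dashed', alpha=0.9, color='green')
        ax.text(x=mean, y=max_ylim * 0.5, s='Mean', alpha=0.9, color='green', fontsize=12)
        ax.grid(True)  # per-Axes grid; plt.grid() would only hit the last Axes
    fig.tight_layout()
    fig.savefig(os.path.join(out_dir, f'{name}.png'))
    plt.close(fig)  # release the figure so the next file starts clean
```

Closing each figure also keeps memory flat when looping over many CSV files.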
In this plot
import numpy as np

inclination = np.pi/6
def power(inclination,phi):
    h1=1.7
    h2=0.5
    D = np.arange(0.5, 12.0, 0.015)
    r = np.sqrt((h1-h2)**2 + D**2)
    freq = 865.7
    lmb = 300/freq
    H = D**2/(D**2+2*h1*h2)
    theta = 4*np.pi*h1*h2/(lmb*D)
    q_e = H**2*(np.sin(theta))**2 + (1 - H*np.cos(theta))**2
    sigma = 1.94
    N_1 = np.random.normal(0,sigma,D.shape)
    rnd = 10**(-N_1/10)
    F = 10
    power=0.8
    R,PHI = np.meshgrid(r,phi[1:-1])
    alpha=inclination + np.arcsin((h1-h2)/R)
    gain=3.136*(np.tan(alpha)*np.sin(np.pi/2*np.cos(alpha)*np.sin(PHI)))**2
    y=10*np.log10( 1000*(power*gain*1.622*((lmb)**2) *0.5*1) / (((4*np.pi*R)**2) *1.2*1*F)*q_e*rnd )
    return (R,PHI,y)
phi=np.linspace(0, np.pi,num=787)
x,y,z = power(np.pi/4,phi)
import cmocean
cmap = cmocean.cm.oxy
I would like to remove the "x10^0" from the x tick labels and show 2, 3, 4, 6, ... and 10 instead.
I have tried the approach from an earlier post, "set ticks with logarithmic scale", but I cannot make it work while keeping the colorbar of the heatmap.
EDIT
As suggested by @ImportanceOfBeingErnest, I replaced these lines for plotting the heatmap:
plt.contourf(x, y, z, 20, cmap=cmap)
cb=plt.colorbar();
plt.xlim(None, 12)
plt.ylim(0, np.pi)
plt.xlabel('Distance [m]', fontsize=12)
plt.ylabel('Phi [radians]', fontsize=12)
plt.xscale('log')
which produce this figure, with the following:
fig1, ax1 = plt.subplots()
cs1 = ax1.contourf(x, y, z, 20, cmap=cmap)
fig1.colorbar(cs1,ax=ax1);
plt.xscale('log')
ax1.set_xlabel('Distance [m]', fontsize=12)
ax1.set_ylabel('Phi [radians]', fontsize=12)
#--- format y-labels in radians
y_pi = y/np.pi
unit = 0.25
y_tick = np.arange(0, 1 + unit, unit)
y_label = [r"$0$", r"$\frac{\pi}{4}$", r"$\frac{\pi}{2}$", r"$3\frac{\pi}{4}$", r"$\pi$"]
#y_label = [r"$" + format(r, ".2g")+ r"\pi$" for r in y_tick]
ax1.set_yticks(y_tick*np.pi)
ax1.set_yticklabels(y_label, fontsize=12)
#---
#--- x-labels removing the log format (i.e. 2x10^0 to 2)
ax1.set_xticks([2, 3, 4, 6, 10])
#ax1.get_xaxis().set_major_formatter(matplotlib.ticker.ScalarFormatter())
#ax1.get_xaxis().get_major_formatter().labelOnlyBase = False
ax1.set_xticklabels(["2", "3", "4", "6", "10"])
This produces the following figure. It applies the solutions from "set ticks with logarithmic scale" and prints the desired labels, but it does not remove the default log-format labels.
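A likely explanation: on a log axis spanning only about one decade, Matplotlib also labels the *minor* ticks (2, 3, 4, ... between powers of ten) in the "2x10^0" style, and `set_xticklabels` only replaces the *major* labels. Blanking the minor labels with `matplotlib.ticker.NullFormatter` should remove the leftovers while leaving the colorbar alone. A minimal sketch, assuming a made-up smooth field in place of the `power()` output:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import NullFormatter

# Hypothetical smooth field standing in for the (x, y, z) returned by power()
r = np.linspace(1.2, 12.0, 200)
phi = np.linspace(0, np.pi, 100)
R, PHI = np.meshgrid(r, phi)
Z = -20 * np.log10(R) + 3 * np.sin(PHI)

fig1, ax1 = plt.subplots()
cs1 = ax1.contourf(R, PHI, Z, 20)
fig1.colorbar(cs1, ax=ax1)  # the colorbar is untouched by the x-tick changes
ax1.set_xscale('log')  # set the scale first: it resets locators and formatters

# Major ticks and their labels, as in the question
ax1.set_xticks([2, 3, 4, 6, 10])
ax1.set_xticklabels(["2", "3", "4", "6", "10"])
# The remaining "x10^0"-style labels belong to the minor ticks;
# NullFormatter blanks their labels but keeps the minor tick marks
ax1.xaxis.set_minor_formatter(NullFormatter())
```

If the minor tick marks themselves are unwanted too, `ax1.xaxis.set_minor_locator(matplotlib.ticker.NullLocator())` removes them as well.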
I have a dataset containing 50 numeric variables and 1 categorical variable (segment_hc_print, with 6 categories). I want to see the spread of each variable in each category by plotting a grid of histograms, where each row represents a category, each column represents a variable, and each cell in the grid is a histogram. I am trying the code below to generate the grid for a single variable:
def grid_histogram(variable, bins):
    fig = plt.figure(figsize=(20,10))
    fig.set_size_inches(10,10, forward = True)
    fig.suptitle(variable, fontsize = 8)
    plt.locator_params(numticks = 4)
    for i in np.arange(0, 6, 1):
        ax = plt.subplot(6,1,i+1)
        ax.hist(sensor_df_print_sample_v2[sensor_df_print_sample_v2.segment_hc_print == i][variable], bins)
        ax.set_title("cluster = " + str(i), fontsize = 5)
        ymin, ymax = ax.get_ylim()
        ax.set_yticks(np.round(np.linspace(ymin, ymax, 3), 2))
        xmin, xmax = ax.get_xlim()
        ax.set_xticks(np.round(np.linspace(xmin, xmax,3),2))
        plt.setp(ax.get_xticklabels(), rotation = 'vertical', fontsize = 4)
    fig.tight_layout()
    fig.savefig(str(variable) + '_histogram.pdf')
    plt.show()
And this is what I get:
sample histogram
How do I generate a grid of such histograms, with each variable stacked to the right of the previous one?
This code below generates the ideal size of histogram I need.
sample histogram
If I understand correctly, you could just create a grid with plt.subplots(). In the example below, I am plotting the first 5 variables as columns:
nr_of_categories = 6
# first 5 numeric variables (every column except the category column)
variables = sensor_df_print_sample_v2.columns.drop('segment_hc_print')[:5]
bins = 20  # number of bins, as passed to grid_histogram(variable, bins)
fig, ax = plt.subplots(nrows = nr_of_categories, ncols = len(variables), figsize = (20, 20))
for category in np.arange(0, nr_of_categories):
    subset = sensor_df_print_sample_v2[sensor_df_print_sample_v2.segment_hc_print == category]
    for col, variable in enumerate(variables):
        ax[category, col].hist(subset[variable], bins)
        # and then the rest of your code where you replace ax with ax[category, col]
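A self-contained version of the grid idea, assuming a small made-up DataFrame (three numeric columns, six segments of 100 rows each) in place of `sensor_df_print_sample_v2`. `sharex="col"` is an addition that puts every row of a given variable on the same x-range, which makes the categories easier to compare; `squeeze=False` keeps `axes` two-dimensional even when there is only one variable.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for sensor_df_print_sample_v2: 3 numeric columns + 6 segments
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "var_a": rng.normal(size=600),
    "var_b": rng.exponential(size=600),
    "var_c": rng.uniform(size=600),
    "segment_hc_print": np.repeat(np.arange(6), 100),
})

variables = [c for c in df.columns if c != "segment_hc_print"]
categories = sorted(df["segment_hc_print"].unique())

# One row per category, one column per variable
fig, axes = plt.subplots(nrows=len(categories), ncols=len(variables),
                         figsize=(4 * len(variables), 2 * len(categories)),
                         sharex="col", squeeze=False)

for row, cat in enumerate(categories):
    subset = df[df["segment_hc_print"] == cat]
    for col, var in enumerate(variables):
        ax = axes[row, col]
        ax.hist(subset[var], bins=20)
        if row == 0:
            ax.set_title(var, fontsize=8)  # variable name above the top row
        if col == 0:
            ax.set_ylabel(f"cluster = {cat}", fontsize=8)  # category on the left
        ax.tick_params(labelsize=6)

fig.tight_layout()
```

With all 50 variables a single figure becomes very wide; one option is to split `variables` into chunks (for example 5 at a time) and save one figure per chunk.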