Similar questions to this have been asked before but not using these exact two plotting functions together so here we are:
I have a column from a pandas DataFrame for which I am plotting both a histogram and the KDE. However, when I plot them, the y-axis uses the raw data value range instead of the discrete number of samples per bin (which is what I want). How can I fix this? The plot itself is perfect, but the y-axis is wrong.
Data:
t2 = [140547476703.0, 113395471484.0, 158360225172.0, 105497674121.0, 186457736557.0, 153705359063.0, 36826568371.0, 200653068740.0, 190761317478.0, 126529980843.0, 98776029557.0, 132773701862.0, 14780432449.0, 167507656251.0, 121353262386.0, 136377019007.0, 134190768743.0, 218619462126.0, 07912778721.0, 215628911255.0, 147024833865.0, 94136343562.0, 135685803096.0, 165901502129.0, 45476074790.0, 125195690010.0, 113910844263.0, 123134290987.0, 112028565305.0, 93448218430.0, 07341012378.0, 93146854494.0, 132958913610.0, 102326700019.0, 196826471714.0, 122045354980.0, 76591131961.0, 134694468251.0, 120212625727.0, 108456858852.0, 106363042112.0, 193367024628.0, 39578667378.0, 178075400604.0, 155513974664.0, 132834624567.0, 137336282646.0, 125379267464.0]
Code:
fig = plt.figure()
# plot hist + kde
t2[t2.columns[0]].plot.kde(color = "maroon", label = "_nolegend_")
t2[t2.columns[0]].plot.hist(density = True, edgecolor = "grey", color = "tomato", title = t2.columns[0])
# plot mean/stdev
m = t2[t2.columns[0]].mean()
stdev = t2[t2.columns[0]].std()
plt.axvline(m, color = "black", ymax = 0.05, label = "mean")
plt.axvline(m-2*stdev, color = "black", ymax = 0.05, linestyle = ":", label = "+/- 2*Stdev")
plt.axvline(m+2*stdev, color = "black", ymax = 0.05, linestyle = ":")
plt.legend()
What it looks like now:
If you want the real counts, then you'll need to scale the KDE up by the width of the bins multiplied by the number of observations. The trickiest part is accessing the data pandas uses to plot the KDE. (I've removed the parts related to the legend to simplify the problem at hand.)
import matplotlib.pyplot as plt
import numpy as np
# Calculate KDE, get data
axis = t2[t2.columns[0]].plot.kde(color = "maroon", label = "_nolegend_")
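xdata = axis.get_lines()[0].get_xdata()  # public accessor instead of the private _x
ydata = axis.get_lines()[0].get_ydata()  # public accessor instead of the private _y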
plt.clf()
# Real figure
fig, ax = plt.subplots(figsize=(7,5))
# Plot Histogram, no density.
x = ax.hist(t2[t2.columns[0]], edgecolor = "grey", color = "tomato")
# size of the bins * N obs
scale = np.diff(x[1])[0]*len(t2)
# Plot scaled KDE
ax.plot(xdata, ydata*scale, color='blue')
ax.set_ylabel('N observations')
plt.show()
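The scale factor can be sanity-checked without any plotting. The sketch below uses a synthetic sample (an assumption, not the data from the question) and NumPy's histogram function to confirm that a density-normalised histogram multiplied by bin width times the number of observations gives back the raw counts. The same factor is what brings the KDE onto the count axis.

```python
import numpy as np

# Synthetic sample, used only for illustration (not the question's data)
rng = np.random.default_rng(0)
sample = rng.normal(loc=100.0, scale=15.0, size=48)

# Use the same bin edges for both versions of the histogram
counts, edges = np.histogram(sample, bins=10)
density, _ = np.histogram(sample, bins=edges, density=True)

# Equal-width bins: one bin width times the number of observations
scale = np.diff(edges)[0] * len(sample)

# Multiplying the density by the scale recovers the counts
recovered = density * scale
print(np.allclose(recovered, counts))
```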
I'd like to draw a lognormal distribution over a given bar plot.
Here's the code
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
import numpy as np; np.random.seed(1)
import scipy.stats as stats
import math
inter = 33
x = np.logspace(-2, 1, num=3*inter+1)
yaxis = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.01,0.03,0.3,0.75,1.24,1.72,2.2,3.1,3.9,
4.3,4.9,5.3,5.6,5.87,5.96,6.01,5.83,5.42,4.97,4.60,4.15,3.66,3.07,2.58,2.19,1.90,1.54,1.24,1.08,0.85,0.73,
0.84,0.59,0.55,0.53,0.48,0.35,0.29,0.15,0.15,0.14,0.12,0.14,0.15,0.05,0.05,0.05,0.04,0.03,0.03,0.03, 0.02,
0.02,0.03,0.01,0.01,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0,0]
fig, ax = plt.subplots()
ax.bar(x[:-1], yaxis, width=np.diff(x), align="center", ec='k', color='w')
ax.set_xscale('log')
plt.xlabel('Diameter (mm)', fontsize='12')
plt.ylabel('Percentage of Total Particles (%)', fontsize='12')
plt.ylim(0,8)
plt.xlim(0.01, 10)
fig.set_size_inches(12, 12)
plt.savefig("Test.png", dpi=300, bbox_inches='tight')
Resulting plot:
What I'm trying to do is to draw the Probability Density Function exactly like the one shown in red in the graph below:
An idea is to convert everything to log space, with u = log10(x), then draw the density histogram there and also calculate a KDE in the same space. Everything gets drawn as y versus u. With u on a twin axis at the top, x can stay at the bottom. Both axes are aligned by setting the same xlims, converted to log space on the top axis. The top axis can be hidden to get the desired result.
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
inter = 33
u = np.linspace(-2, 1, num=3*inter+1)
x = 10**u
us = np.linspace(u[0], u[-1], 500)
yaxis = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.01,0.03,0.3,0.75,1.24,1.72,2.2,3.1,3.9,
4.3,4.9,5.3,5.6,5.87,5.96,6.01,5.83,5.42,4.97,4.60,4.15,3.66,3.07,2.58,2.19,1.90,1.54,1.24,1.08,0.85,0.73,
0.84,0.59,0.55,0.53,0.48,0.35,0.29,0.15,0.15,0.14,0.12,0.14,0.15,0.05,0.05,0.05,0.04,0.03,0.03,0.03, 0.02,
0.02,0.03,0.01,0.01,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0,0]
yaxis = np.array(yaxis)
# reconstruct data from the given frequencies
u_data = np.repeat((u[:-1] + u[1:]) / 2, np.round(yaxis * 100).astype(int))  # np.int was removed from NumPy; round before casting
kde = stats.gaussian_kde((u[:-1]+u[1:])/2, weights=yaxis, bw_method=0.2)
total_area = (np.diff(u)*yaxis).sum() # total area of all bars; divide by this area to normalize
fig, ax = plt.subplots()
ax2 = ax.twiny()
ax2.bar(u[:-1], yaxis, width=np.diff(u), align="edge", ec='k', color='w', label='frequencies')
ax2.plot(us, total_area*kde(us), color='crimson', label='kde')
ax2.plot(us, total_area * stats.norm.pdf(us, u_data.mean(), u_data.std()), color='dodgerblue', label='lognormal')
ax2.legend()
ax.set_xscale('log')
ax.set_xlabel('Diameter (mm)', fontsize='12')
ax.set_ylabel('Percentage of Total Particles (%)', fontsize='12')
ax.set_ylim(0,8)
xlim = np.array([0.01,10])
ax.set_xlim(xlim)
ax2.set_xlim(np.log10(xlim))
ax2.set_xticks([]) # hide the ticks at the top
plt.tight_layout()
plt.show()
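The normal fit above relies on rebuilding samples with np.repeat. A possible alternative, sketched here under the assumption that the bar heights can be treated directly as weights for the bin centres, is to compute the weighted mean and standard deviation in log space. The `heights` array is a short hypothetical stand-in, not the data from the question, and the variable names are illustrative.

```python
import numpy as np

# Hypothetical bin edges in log10 space and bar heights (illustration only)
u = np.linspace(-2, 1, 13)
heights = np.array([0, 0, 0.5, 2, 5, 8, 6, 3, 1, 0.3, 0, 0], dtype=float)
centers = (u[:-1] + u[1:]) / 2

# Weighted mean and standard deviation of log10(x)
mu = np.average(centers, weights=heights)
sigma = np.sqrt(np.average((centers - mu) ** 2, weights=heights))

# Normal pdf in log space, scaled to the total bar area so it overlays the bars
total_area = (np.diff(u) * heights).sum()
us = np.linspace(u[0], u[-1], 500)
pdf = np.exp(-0.5 * ((us - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
curve = total_area * pdf
```

The resulting `curve` could be drawn against `us` on the top axis in the same way as the other curves.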
PS: Apparently this also can be achieved directly without explicitly using u (at the cost of being slightly more cryptic):
x = np.logspace(-2, 1, num=3*inter+1)
xs = np.logspace(-2, 1, 500)
total_area = (np.diff(np.log10(x))*yaxis).sum() # total area of all bars; divide by this area to normalize
kde = stats.gaussian_kde((np.log10(x[:-1])+np.log10(x[1:]))/2, weights=yaxis, bw_method=0.2)
ax.bar(x[:-1], yaxis, width=np.diff(x), align="edge", ec='k', color='w')
ax.plot(xs, total_area*kde(np.log10(xs)), color='crimson')
ax.set_xscale('log')
Note that the bandwidth set for gaussian_kde is a somewhat arbitrary value. Larger values give a smoother curve, smaller values stay closer to the data. Some experimentation can help.
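To get a feel for the bandwidth, the following sketch hand-rolls a weighted Gaussian KDE in NumPy and evaluates it for two bandwidths. This is not scipy's implementation: `weighted_kde` is a hypothetical helper, and its `h` is an absolute kernel width in units of log10(x), whereas a scalar bw_method in gaussian_kde is a factor that multiplies the standard deviation of the data. The heights are made up for illustration.

```python
import numpy as np

def weighted_kde(points, weights, h, grid):
    """Weighted Gaussian KDE with absolute bandwidth h (hypothetical helper)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    # One Gaussian kernel per point, evaluated at every grid position
    z = (grid[:, None] - points[None, :]) / h
    kernels = np.exp(-0.5 * z**2) / (h * np.sqrt(2 * np.pi))
    return kernels @ w

# Made-up bin centres (in log10 units) and bar heights
centers = np.linspace(-1.5, 0.5, 9)
heights = np.array([0.2, 1.0, 3.0, 6.0, 4.0, 5.5, 2.0, 0.5, 0.1])
grid = np.linspace(-3, 2, 1001)

narrow = weighted_kde(centers, heights, h=0.05, grid=grid)
wide = weighted_kde(centers, heights, h=0.5, grid=grid)

# The narrow bandwidth follows individual bars, the wide one smooths them out
print(narrow.max(), wide.max())
```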
I have a graph where the x-axis is the temperature in GeV, but I also need to show the temperature in Kelvin for reference, so I thought of adding a parasite axis with the temperature in K. I tried to follow the answer to "How to add a second x-axis in matplotlib"; my example code is below. I get a second axis at the top of my graph, but it does not show the temperature in K as I need.
import numpy as np
import matplotlib.pyplot as plt
tt = np.logspace(-14,10,100)
yy = np.logspace(-10,-2,100)
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax2 = ax1.twiny()
ax1.loglog(tt,yy)
ax1.set_xlabel('Temperature (GeV)')
new_tick_locations = np.array([.2, .5, .9])
def tick_function(X):
    V = X*1.16e13
    return ["%.1f" % z for z in V]
ax2.set_xlim(ax1.get_xlim())
ax2.set_xticks(new_tick_locations)
ax2.set_xticklabels(tick_function(new_tick_locations))
ax2.set_xlabel('Temp (Kelvin)')
plt.show()
This is what I get when I run the code.
I need the parasite axis to be proportional to the original x-axis, so that the temperature in Kelvin is easy to read for anyone looking at the graph. Thanks in advance.
A general purpose solution may look as follows. Since you have a non-linear scale, the idea is to find the positions of nice ticks in Kelvin, convert to GeV, set the positions in units of GeV, but label them in units of Kelvin. This sounds complicated, but the advantage is that you do not need to find the ticks yourself, just rely on matplotlib for finding them.
What this requires, though, is the functional dependence between the two scales, i.e. the conversion between GeV and Kelvin and its inverse.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
tt = np.logspace(-14,10,100)
yy = np.logspace(-10,-2,100)
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax2 = ax1.twiny()
plt.setp([ax1,ax2], xscale="log", yscale="log")
ax1.get_shared_x_axes().join(ax1, ax2)
ax1.plot(tt,yy)
ax1.set_xlabel('Temperature (GeV)')
ax2.set_xlabel('Temp (Kelvin)')
fig.canvas.draw()
# 1 GeV == 1.16 × 10^13 Kelvin
Kelvin2GeV = lambda k: k / 1.16e13
GeV2Kelvin = lambda gev: gev * 1.16e13
loc = mticker.LogLocator()
locs = loc.tick_values(*GeV2Kelvin(np.array(ax1.get_xlim())))
ax2.set_xticks(Kelvin2GeV(locs))
ax2.set_xlim(ax1.get_xlim())
f = mticker.ScalarFormatter(useOffset=False, useMathText=True)
g = lambda x,pos : "${}$".format(f._formatSciNotation('%1.10e' % GeV2Kelvin(x)))
fmt = mticker.FuncFormatter(g)
ax2.xaxis.set_major_formatter(fmt)  # fmt is already a FuncFormatter
plt.show()
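The tick logic can also be sketched without a locator object. The snippet below is plain NumPy and only a simplified stand-in for LogLocator: it picks whole decades in Kelvin inside the converted axis range and maps them back to positions in GeV. The helper names and the axis limits are assumptions made for this example.

```python
import numpy as np

GEV_TO_KELVIN = 1.16e13  # conversion factor used in the answer above

def gev_to_kelvin(gev):
    return np.asarray(gev) * GEV_TO_KELVIN

def kelvin_to_gev(kelvin):
    return np.asarray(kelvin) / GEV_TO_KELVIN

# Assumed x-limits of the GeV axis (example values)
xlim_gev = (1e-15, 1e11)

# Whole decades in Kelvin that fall inside the converted range
lo, hi = np.log10(gev_to_kelvin(xlim_gev))
exponents = np.arange(np.ceil(lo), np.floor(hi) + 1)
ticks_kelvin = 10.0 ** exponents

# Positions on the GeV axis and the Kelvin labels to show there
tick_positions_gev = kelvin_to_gev(ticks_kelvin)
tick_labels = [f"$10^{{{int(e)}}}$" for e in exponents]
```

On a log-scaled twin axis these could then be applied with ax2.set_xticks(tick_positions_gev) and ax2.set_xticklabels(tick_labels).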
The problem appears to be the following: When you use ax2.set_xlim(ax1.get_xlim()), you are basically setting the limit of upper x-axis to be the same as that of the lower x-axis. Now if you do
print(ax1.get_xlim())
print(ax2.get_xlim())
you get for both axes the same values as
(6.309573444801943e-16, 158489319246.11108)
(6.309573444801943e-16, 158489319246.11108)
but your lower x-axis has a logarithmic scale. When you assign the limits using ax2.set_xlim(), the limits of ax2 become the same, but its scale is still linear. That's why, when you set the ticks at [.2, .5, .9], these values appear as ticks at the far left of the upper x-axis in your figure.
The solution is to set the upper x-axis to a logarithmic scale as well. This is required because your new_tick_locations are positions on the lower x-axis (in GeV); you only want to relabel them so that the tick labels show Kelvin. I use some modified values of new_tick_locations to highlight the problem.
I am using scientific formatting '%.0e' because 1 GeV = 1.16e13 K, so 0.5 GeV expressed in Kelvin would be a very large value with many zeros.
Below is a sample answer:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
tt = np.logspace(-14,10,100)
yy = np.logspace(-10,-2,100)
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax2 = ax1.twiny()
ax1.loglog(tt,yy)
ax1.set_xlabel('Temperature (GeV)')
new_tick_locations = np.array([0.000002, 0.05, 9000])
def tick_function(X):
    # convert the GeV positions to Kelvin and format them in scientific notation
    V = X*1.16e13
    return ["%.0e" % z for z in V]
ax2.set_xscale('log') # Setting the logarithmic scale
ax2.set_xlim(ax1.get_xlim())
ax2.set_xticks(new_tick_locations)
# the labels already hold the Kelvin values; a formatter set afterwards would replace them
ax2.set_xticklabels(tick_function(new_tick_locations))
ax2.set_xlabel('Temp (Kelvin)')
plt.show()
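With Matplotlib 3.1 or newer, a possible alternative to hand-placed ticks is Axes.secondary_xaxis, which takes a forward and an inverse conversion function and keeps the two scales in sync. A minimal sketch, assuming the same factor of 1 GeV = 1.16e13 K; the function and variable names are illustrative, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

def gev_to_kelvin(gev):
    return gev * 1.16e13

def kelvin_to_gev(kelvin):
    return kelvin / 1.16e13

tt = np.logspace(-14, 10, 100)
yy = np.logspace(-10, -2, 100)

fig, ax = plt.subplots()
ax.loglog(tt, yy)
ax.set_xlabel('Temperature (GeV)')

# Secondary axis on top, tied to the primary axis through the two functions
secax = ax.secondary_xaxis('top', functions=(gev_to_kelvin, kelvin_to_gev))
secax.set_xlabel('Temp (Kelvin)')

fig.canvas.draw()  # render once; use plt.show() in an interactive session
```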
I am trying to use the polar plot projection to make a radar chart. I would like to know how to make only one grid line bold (while the others remain standard).
For my specific case, I would like to highlight the gridline associated with the ytick "0".
from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
#Variables
sespi = pd.read_csv("country_progress.csv")
labels = sespi.country
progress = sespi.progress
angles=np.linspace(0, 2*np.pi, len(labels), endpoint=False)
#Concatenation to close the plots
progress=np.concatenate((progress,[progress[0]]))
angles=np.concatenate((angles,[angles[0]]))
#Polar plot
fig=plt.figure()
ax = fig.add_subplot(111, polar=True)
ax.plot(angles, progress, '.--', linewidth=1, c="g")
#ax.fill(angles, progress, alpha=0.25)
ax.set_thetagrids(angles * 180/np.pi, labels)
ax.set_yticklabels([-200,-150,-100,-50,0,50,100,150,200])
#ax.set_title()
ax.grid(True)
plt.show()
The gridlines of a plot are Line2D objects, so they cannot be made bold as such. What you can do (as shown, in part, in the other answer) is increase the linewidth and change the colour; rather than plotting a new line, you can apply this to the specified gridline.
You first need to find the index of the y tick labels which you want to change:
y_tick_labels = [-100,-10,0,10]
ind = y_tick_labels.index(0) # find index of value 0
You can then get a list of the gridlines using gridlines = ax.yaxis.get_gridlines(). Then use the index you found previously on this list to change the properties of the correct gridline.
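If the tick labels match the actual tick values, the index can also be looked up from the tick locations themselves. A minimal sketch, assuming the radial ticks were set explicitly and that the gridlines come back in the same order as the ticks; the plotted curve and the tick values are placeholders, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

theta = np.linspace(0, 2 * np.pi, 200)
r = 0.8 * np.sin(3 * theta)  # placeholder data with negative and positive values

ax = plt.subplot(111, projection='polar')
ax.plot(theta, r)
ax.set_ylim(-1.5, 1.5)
ax.set_rticks([-1, -0.5, 0, 0.5, 1])

# Index of the radial tick closest to the target value
target = 0
ticks = np.asarray(ax.get_yticks())
ind = int(np.argmin(np.abs(ticks - target)))

# Restyle only that gridline
gridlines = ax.yaxis.get_gridlines()
gridlines[ind].set_color("k")
gridlines[ind].set_linewidth(2.5)
```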
Using the example from the gallery as a basis, a full example is shown below:
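import matplotlib.pyplot as plt
import numpy as np
r = np.arange(0, 2, 0.01)
theta = 2 * np.pi * r
ax = plt.subplot(111, projection='polar')
ax.plot(theta, r)  # the spiral from the gallery example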
ax.set_rmax(2)
ax.set_rticks([0.5, 1, 1.5, 2]) # less radial ticks
ax.set_rlabel_position(-22.5) # get radial labels away from plotted line
ax.grid(True)
y_tick_labels = [-100, -10, 0, 10]
ax.set_yticklabels(y_tick_labels)
ind = y_tick_labels.index(0) # find index of value 0
gridlines = ax.yaxis.get_gridlines()
gridlines[ind].set_color("k")
gridlines[ind].set_linewidth(2.5)
plt.show()
Which gives:
It is just a trick, but you could simply plot a circle and set its linewidth and color to whatever looks bold enough to you.
For example:
import matplotlib.pyplot as plt
import numpy as np
Yline = 0
Npoints = 300
angles = np.linspace(0,360,Npoints)*np.pi/180
line = 0*angles + Yline
ax = plt.subplot(111, projection='polar')
plt.plot(angles, line, color = 'k', linewidth = 3)
plt.ylim([-1,1])
plt.grid(True)
plt.show()
In this piece of code, I use plt.plot to draw a line through the points defined by the two vectors angles and line. The former holds all the angles between 0 and 2*np.pi. The latter is constant and equal to Yline, the 'height' at which you want to draw the line.
I suggest you try decreasing and increasing Npoints while having a look at the documentation of np.linspace() in order to understand how it affects the roundness of the circle.
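The effect of Npoints on roundness can be estimated with a little geometry. Assuming consecutive points are joined by straight chords, a closed circle sampled with N points has N - 1 segments, and the midpoint of each chord falls short of the true radius by a fraction 1 - cos(pi / (N - 1)). The helper name below is made up for this sketch.

```python
import numpy as np

def chord_shortfall(npoints):
    """Relative gap between a chord midpoint and the true circle radius."""
    segments = npoints - 1         # linspace(0, 360, N) yields N - 1 segments
    half_angle = np.pi / segments  # half of the angle spanned by one segment
    return 1 - np.cos(half_angle)

# Few points give a visibly angular outline, many points a smooth one
for n in (5, 20, 300):
    print(n, chord_shortfall(n))
```

With only 5 points the outline is a square, while at 300 points the gap is far too small to see.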