I'm trying to plot a bar histogram of interest rates and overlay a PDF line on it. I have looked for solutions and found an approach using kdeplot.
The result is pretty strange: the kdeplot line is much higher than the histogram bars, and I don't know how to fix it.
After applying kdeplot:
Before applying kdeplot:
Here is the code that I'm using:
import numpy as np
import pandas as pd
import seaborn as sns

df = pd.read_excel('interestrate.xlsx')
k = 0.0005
bin_steps = np.arange(start=df['Interest rate Real'].min(), stop=df['Interest rate Real'].max(), step=k)
ax = df['Interest rate Real'].hist(bins=bin_steps, figsize=[10, 5])
ax1 = df['Interest rate Real']
vals = ax.get_xticks()
ax.set_xticklabels(['{:,.2%}'.format(x) for x in vals])
ax.set_yticklabels(['{:,.2%}'.format(x) for x in vals])
ax.set_title("PDF for Real Interest Rate")
#sns.kdeplot(ax1)
The KDE line towers over the bars because the histogram shows raw counts while the KDE is a normalized probability density; pass density=True to the histogram so both are on the same scale. The following code snippet should set you in the right direction (just insert your data):
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as st

y = np.random.randn(1000)  # your data goes here
plt.hist(y, 50, density=True)  # density=True puts the histogram on the same scale as the KDE
mn, mx = plt.xlim()
plt.xlim(mn, mx)
x = np.linspace(mn, mx, 301)
kde = st.gaussian_kde(y)
plt.plot(x, kde.pdf(x));
Alternatively with seaborn:
import seaborn as sns
plt.hist(y,50, density=True)
sns.kdeplot(y);
or as simple as:
sns.distplot(y)
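Note that distplot has been deprecated in recent seaborn releases; if you are on seaborn 0.11 or later (an assumption about your environment), histplot gives the same one-liner:

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

y = np.random.randn(1000)  # your data goes here
# Histogram normalized to a density, with the KDE drawn on top
sns.histplot(y, stat="density", kde=True)
plt.show()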
I really like the look of Seaborn's KDE plot:
I was wondering how I can replicate this with a line plot.
In my case I actually have the function that generates the density, rather than samples of the data.
So assuming I have the data in a data frame:
x - The value of x per sample.
y - The value of the density function at x.
μσ - A categorical variable to group data from the same density (in the code, I use the mean and standard deviation of a normal distribution).
Using Seaborn's lineplot I can get the lines, but without the area below each curve as in the image above.
I'm after achieving that look for the data I have.
Is there a way to replicate this theme, area under the curve included, for lineplot?
The code below shows what I got so far:
import numpy as np
import scipy as sp
import pandas as pd
from scipy.stats import norm
import matplotlib.pyplot as plt
import seaborn as sns

num_grid_pts = 1000
val_μ = [0, -1, 1, 0]
val_σ = [1, 2, 3, 4]
num_var = len(val_μ)  # number of variations

x = np.linspace(-10, 10, num_grid_pts)
P = np.zeros((num_grid_pts, num_var))  # PDF values, one column per distribution
μσ = [f'μ = {μ}, σ = {σ}' for μ, σ in zip(val_μ, val_σ)]
for ii, (μ, σ) in enumerate(zip(val_μ, val_σ)):
    randVar = norm(μ, σ)
    P[:, ii] = randVar.pdf(x)

df_P = pd.DataFrame(data={'x': np.tile(x, num_var), 'PDF': P.flatten('F'), 'μσ': np.repeat(μσ, len(x))})

f, ax = plt.subplots(figsize=(15, 10))
sns.lineplot(data=df_P, x='x', y='PDF', hue='μσ', ax=ax)
plot_lines = ax.get_lines()
for ii in range(num_var):
    ax.fill_between(x=plot_lines[ii].get_xdata(), y1=plot_lines[ii].get_ydata(), alpha=0.25, color=plot_lines[ii].get_color())
ax.set_title('Normal Distribution')
ax.set_xlabel('Value')
ax.set_ylabel('Probability')
plt.show()
I used lineplot to create the lines and then added the fills myself. But this is a hack; I was wondering whether I can do it more natively within Seaborn.
I found a way to play with the elements manually using the Area object:
import seaborn as sns
import seaborn.objects as so  # objects interface, seaborn >= 0.12

healthexp = sns.load_dataset("healthexp")  # example dataset used by the seaborn docs
(
    so.Plot(healthexp, "Year", "Spending_USD", color="Country")
    .add(so.Area(alpha=.7), so.Stack())
)
The result is:
Yet for some reason the example code doesn't work for me.
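In case it helps, here is a rough sketch of what that interface might look like applied to the df_P frame built above. This is my guess, not a tested recipe: it assumes seaborn 0.12 or later with the objects interface available, and Plot.show() is needed to render in a plain script.

import seaborn.objects as so  # objects interface, seaborn >= 0.12

(
    so.Plot(df_P, x="x", y="PDF", color="μσ")
    .add(so.Line())            # the density curves
    .add(so.Area(alpha=.25))   # filled area under each curve
    .show()
)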
What I did was use Seaborn's lineplot() and then manually add a fill_between() polygon:
ax = sns.lineplot(data=data_frame, x='data_x', y='data_y', hue='data_color')
plot_lines = ax.get_lines()
for i in range(num_unique_colors):
    ax.fill_between(x=plot_lines[i].get_xdata(), y1=plot_lines[i].get_ydata(), alpha=0.25, color=plot_lines[i].get_color())
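For completeness, here is a self-contained sketch of that approach using made-up normal densities; the column names and data below are placeholders of my own, not from the original post:

import numpy as np
import pandas as pd
from scipy.stats import norm
import matplotlib.pyplot as plt
import seaborn as sns

x = np.linspace(-10, 10, 500)
params = [(0, 1), (0, 2), (2, 1)]  # (mean, std) pairs, chosen arbitrarily
df = pd.DataFrame({
    'data_x': np.tile(x, len(params)),
    'data_y': np.concatenate([norm(m, s).pdf(x) for m, s in params]),
    'data_color': np.repeat([f'μ={m}, σ={s}' for m, s in params], len(x)),
})

ax = sns.lineplot(data=df, x='data_x', y='data_y', hue='data_color')
for line in ax.get_lines():
    # Fill under each line with the same color at low opacity
    ax.fill_between(line.get_xdata(), line.get_ydata(), alpha=0.25, color=line.get_color())
plt.show()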
I am using the scipy.signal library to find the peaks of a time graph. I passed in the y values of my pandas series, and it gave me the locations of the peaks. Now I am trying to use the locations returned by the find_peaks function to recover the position in time of each peak. Here is my function:
from datetime import timedelta
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import find_peaks

def turn_peaks_to_time_series(df, t_interval):
    df_values = df['l'].values
    fig, ax1 = plt.subplots()
    x_of_peaks, _ = find_peaks(df_values, height=None)
    y_of_peaks = df_values[x_of_peaks]
    x_values_to_t_values = lambda x: timedelta(minutes=x) * t_interval
    time_initial = np.min(df.index)
    t_of_peaks = [time_initial + x_values_to_t_values(int(i)) for i in x_of_peaks]  # source of issue
    ax1.plot(t_of_peaks, y_of_peaks, "rp", label='peak')  # plot peaks on graph
    ax1.plot(df.index, df.l)  # plot df line
    plt.show()
However, the peaks do not align properly.
I know the issue is with my x_values_to_t_values function. In addition, any suggestions to optimize my code are very welcome.
Turns out I was trying to reinvent the wheel. The solution to my problem was extremely simple: the series' own index can be indexed directly with the peak positions returned by find_peaks. I also adjusted the code to be more general.
import matplotlib.pyplot as plt
from scipy.signal import find_peaks

def turn_peaks_to_time_series(series):
    series_values = series.values
    series_index = series.index
    fig, ax1 = plt.subplots()
    x_of_peaks, _ = find_peaks(series_values, height=None)
    y_of_peaks = series_values[x_of_peaks]
    ax1.plot(series_index[x_of_peaks], y_of_peaks, "rp", label='peak')  # plot peaks on graph
    ax1.plot(series_index, series_values)  # plot the full series
    plt.show()
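A quick way to try it out, with a made-up noisy sine series on a minute-spaced DatetimeIndex (the data here is invented purely for illustration):

import numpy as np
import pandas as pd

# Hypothetical test data: a noisy sine wave sampled every minute
idx = pd.date_range("2023-01-01", periods=500, freq="min")
values = np.sin(np.linspace(0, 20, 500)) + 0.1 * np.random.randn(500)
series = pd.Series(values, index=idx)

turn_peaks_to_time_series(series)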
I have this code:
import math
import numpy as np
from matplotlib import pyplot as plt

uH2 = 1.90866638
uHe = 3.60187307
eH2 = 213.38
eHe = 31.96
R = float(uH2 * eH2) / (uHe * eHe)

C_Values = []
Delta = []
kHeST = []
J_f21 = []

data = np.genfromtxt("Lamda_HeHCL.txt", unpack=True)
J_i1 = data[1]
J_f1 = data[2]
kHe = data[7]

data = np.genfromtxt("Basecol_Basic_New_1.txt", unpack=True)
J_i2 = data[0]
J_f2 = data[1]
kH2 = data[5]

print(kHe)
print(kH2)

kHe = list(map(float, kHe))
kH2 = list(map(float, kH2))
kHe = np.array(kHe)
kH2 = np.array(kH2)

g = len(kH2)
for n in range(0, g):
    if J_f2[n] == 1:
        Jf21 = J_f2[n]
        J_f21.append(Jf21)
        ratio = kHe[n] / kH2[n]
        C = (math.log(float(kH2[n]), 10) - math.log(float(kHe[n]), 10)) / math.log(R, 10)
        C_Values.append(C)
        St = abs(J_f1[n] - J_i1[n])
        Delta.append(St)

print(C_Values)
print(Delta)
print(J_f21)

fig, ax = plt.subplots()
ax.scatter(Delta, C_Values)
for i, txt in enumerate(J_f21):
    ax.annotate(txt, (Delta[i], C_Values[i]))

plt.plot(np.unique(Delta), np.poly1d(np.polyfit(Delta, C_Values, 1))(np.unique(Delta)))
plt.plot(Delta, C_Values)
fit = np.polyfit(Delta, C_Values, 1)
fit_fn = np.poly1d(fit)
# fit_fn is now a function which takes in x and returns an estimate for y
plt.scatter(Delta, C_Values, Delta, fit_fn(Delta))
plt.xlim(0, 12)
plt.ylim(-3, 3)
In this code, I am trying to plot a linear regression line that extends past the data and touches the x-axis. I am also trying to add a legend to the plot that shows the slope of the line. Using the code above, I was able to plot this graph.
Here is some trash data I have been using to try to extend the line and add a legend:
x = [5, 7, 9, 15, 20]
y = [10, 9, 8, 7, 6]
I would also like everything to be a scatter plot except for the linear regression line.
Given that you don't provide the data you're loading from the files, I was unable to test this, but off the top of my head:
To extend the line past the plot, you could turn this line
plt.plot(np.unique(Delta), np.poly1d(np.polyfit(Delta, C_Values, 1))(np.unique(Delta)))
Into something like
x = np.linspace(0, 12, 50) # both 0 and 12 are from visually inspecting the plot
plt.plot(x, np.poly1d(np.polyfit(Delta, C_Values, 1))(x))
But if you want the line extended to the x-axis,
polynomial = np.polyfit(Delta, C_Values, 1)
x = np.linspace(0, *np.roots(polynomial))
plt.plot(x, np.poly1d(polynomial)(x))
As for the scatter plot thing, it seems to me you could just remove this line:
plt.plot(Delta, C_Values)
Oh right, as for the legend, add a label to the plots you make, like this:
plt.plot(x, np.poly1d(polynomial)(x), label='Linear regression')
and add a call to plt.legend() just before plt.show().
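Putting those pieces together with the throwaway x/y data from the question (a sketch only; the variable names and the slope formatting in the label are my own choices, not from the original code):

import numpy as np
import matplotlib.pyplot as plt

x_data = [5, 7, 9, 15, 20]
y_data = [10, 9, 8, 7, 6]

# Degree-1 fit; fit[0] is the slope, fit[1] the intercept
fit = np.polyfit(x_data, y_data, 1)
fit_fn = np.poly1d(fit)

# Extend the regression line from x=0 out to its x-axis crossing (the root of the fit)
x_line = np.linspace(0, np.roots(fit)[0], 50)

plt.scatter(x_data, y_data, label='Data')
plt.plot(x_line, fit_fn(x_line), 'r-', label=f'Linear regression (slope = {fit[0]:.2f})')
plt.legend()
plt.show()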
The first loaded plot has too many ticks on the X axis (see image01).
If I use the zoom action on the X axis, the plot then renders correctly.
Can you give me some advice on where to look, because the Plot constructor parameters seem fine.
import numpy
import matplotlib.pyplot as plt

fig, ax = plt.subplots()  # assumed setup; the original snippet does not show how ax was created

date_range = (735599.0, 735745.0)
x = (735610.5, 735647.0, 735647.5, 735648.5, 735669.0, 735699.0, 735701.5, 735702.5, 735709.5, 735725.5, 735728.5, 735735.5, 735736.0)
y = (227891.25361545716, 205090.4880046467, 208352.59317388065, 175462.99296699322, 98209.836461969651, 275063.37219361769, 219456.93600708069, 230731.12613806152, 209043.19805037521, 218297.51486296533, 208036.88967207001, 206311.71988471842, 216036.56824433553)
y0 = 218206.79192
x_after = (735610.5, 735647.0, 735647.5, 735701.5, 735702.5, 735709.5, 735725.5, 735728.5, 735735.5, 735736.0)
y_after = (227891.25361545716, 205090.4880046467, 208352.59317388065, 219456.93600708069, 230731.12613806152, 209043.19805037521, 218297.51486296533, 208036.88967207001, 206311.71988471842, 216036.56824433553)
linex = -39.1175584541
liney = 28993493.5251
ax.plot_date(x, numpy.array(y) / y0, color='r', xdate=True, marker='x')
ax.plot_date(x_after, numpy.array(y_after) / y0, color='r', xdate=True)
ax.set_xlim(date_range)
steps = list(ax.get_xlim())
steps.append(steps[-1] + 2)
steps = [steps[0] - 2] + steps
ax.plot(steps, numpy.array([linex * a + liney for a in steps]) / y0, color='b')
Thank you for your help.
Manuel
If you have too many xtick labels, so many that they are all munged together on the plot, you can reduce them using pyplot.xticks. The arguments are the points the labels apply to, the labels themselves, and an optional rotation.
import numpy as np
import matplotlib.pyplot as plt

y = np.arange(10000)
ticks = y - 5000          # the labels we want to show (shifted values)
plt.plot(y)

k = 1000                  # keep only every k-th tick
ys = y[::k]
ys = np.append(ys, y[-1])
labels = ticks[::k]
labels = np.append(labels, ticks[-1])
plt.xticks(ys, labels, rotation='vertical')
plt.show()
plt.close()
I'm not sure I understand exactly what you want to do, but would rotating your xticklabels be sufficient for you?
# Add this code at the end of your script
# It will rotate the labels contained in your date range
plt.xticks(rotation=70)
When I test your code, I get 7 labels; here the rotation argument is changed to 0 (horizontal).
I want to reproduce this plot. The errors are shown at the bottom of the plot. Can you please share how it's done?
There is an example that I found here on stackoverflow, but it is in R.
How to create a graph showing the predictive model, data and residuals in R
You can create such a plot in Matplotlib by using add_axes. Here is an example.
from pylab import *
import numpy
from scipy.optimize import curve_fit

# Data
x = arange(1, 10, 0.2)
ynoise = x * numpy.random.rand(len(x))
# Noise is scaled by x so that it is noticeable on an x-squared function
ydata = x**2 + ynoise  # noisy data

# Model
Fofx = lambda x, a, b, c: a*x**2 + b*x + c
# Best-fit parameters
p, cov = curve_fit(Fofx, x, ydata)

# PLOT
fig1 = figure(1)

# Plot data and model
frame1 = fig1.add_axes((.1, .3, .8, .6))
# (left, bottom, width, height) as fractions of the figure, from the bottom-left corner
plot(x, ydata, '.b')        # noisy data
plot(x, Fofx(x, *p), '-r')  # best-fit model
frame1.set_xticklabels([])  # remove x-tick labels for the first frame
grid()

# Residual plot
difference = Fofx(x, *p) - ydata
frame2 = fig1.add_axes((.1, .1, .8, .2))
plot(x, difference, 'or')
grid()
This is an old post, but seeing that it is a top hit for making bottom residual plots, I thought it would be useful to post a modified version of the code by @jaydeepsb that runs as is.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# Data
x = np.arange(1,10,0.2)
ynoise = x*np.random.rand(len(x))
ydata = x**2 + ynoise
Fofx = lambda x,a,b,c: a*x**2+b*x+c
p, cov = curve_fit(Fofx,x,ydata)
# Upper plot
fig1 = plt.figure(1)
frame1 = fig1.add_axes((.1,.3,.8,.6))
plt.plot(x,ydata,'.b')
plt.plot(x,Fofx(x,*p),'-r')
frame1.set_xticklabels([])
plt.grid()
# Residual plot
difference = Fofx(x,*p) - ydata
frame2 = fig1.add_axes((.1,.1,.8,.2))
plt.plot(x,difference,'or')
plt.grid()
I think you are looking for error bars like in this pylab_examples example code: errorbar_demo.py
You can add an additional subplot and plot the points with the error bars.
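A minimal sketch of that idea, with invented data (the "model" and the error magnitudes are placeholders of my own, not taken from the plot you want to reproduce):

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(1, 10, 0.2)
ydata = x**2 + x * np.random.rand(len(x))   # invented noisy data
yfit = x**2 + 0.5 * x                        # stand-in for a fitted model
residuals = ydata - yfit
yerr = 0.5 * x                               # placeholder error bars

fig, (ax_top, ax_bottom) = plt.subplots(2, 1, sharex=True,
                                        gridspec_kw={'height_ratios': [3, 1]})
ax_top.plot(x, ydata, '.b', label='data')
ax_top.plot(x, yfit, '-r', label='model')
ax_top.legend()

# Residuals with error bars in the lower subplot
ax_bottom.errorbar(x, residuals, yerr=yerr, fmt='or', capsize=2)
ax_bottom.axhline(0, color='k', lw=0.8)
plt.show()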
Edit: No border between plots:
from pylab import *
subplots_adjust(hspace=0.,wspace=0.)
subplot(211)
imshow(rand(100,100), cmap=cm.BuPu_r)
subplot(212)
imshow(rand(100,100), cmap=cm.BuPu_r)
show()