Plotting 95% confidence interval line plot with shaded area in python - python

I want to plot a 95% confidence interval of a data frame using Python. The graph will be a line plot where the x-axis indicates the column name/number and the y-axis indicates the column values. I searched a lot but couldn't find the solution I was looking for. Here is an example of my data frame.
Ph1 Ph2 Ph3 ph4 Ph5 Ph6
-0.152511 -0.039428 0.131173 -0.002039 0.008101 -0.002039
-0.068273 0.152013 -0.315244 0.005247 0.014775 -0.045268
0.425363 -0.043105 0.071670 -0.045124 -0.036135 -0.037250
-0.019332 0.139712 -0.026001 -0.021844 -0.040854 -0.050648
0.077907 0.341410 -0.113731 -0.065799 -0.027229 -0.077948
0.145185 0.112060 0.093898 0.028815 -0.032327 0.004239
I have also attached an example of the graph; it shows what the x-axis and y-axis of the desired plot should look like.

Answer
You can use seaborn.lineplot to do that, since seaborn uses 95% CI by default, but firstly you need to reshape your data through pandas.melt.
If you start from data in a dataframe df like the one you provided, you can reshape it with:
df = pd.melt(frame = df,
             var_name = 'column',
             value_name = 'value')
output:
column value
0 Ph1 -0.152511
1 Ph1 -0.068273
2 Ph1 0.425363
3 Ph1 -0.019332
4 Ph1 0.077907
5 Ph1 0.145185
6 Ph2 -0.039428
7 Ph2 0.152013
8 Ph2 -0.043105
9 Ph2 0.139712
10 Ph2 0.341410
11 Ph2 0.112060
12 Ph3 0.131173
13 Ph3 -0.315244
14 Ph3 0.071670
15 Ph3 -0.026001
16 Ph3 -0.113731
17 Ph3 0.093898
18 ph4 -0.002039
19 ph4 0.005247
20 ph4 -0.045124
21 ph4 -0.021844
22 ph4 -0.065799
23 ph4 0.028815
24 Ph5 0.008101
25 Ph5 0.014775
26 Ph5 -0.036135
27 Ph5 -0.040854
28 Ph5 -0.027229
29 Ph5 -0.032327
30 Ph6 -0.002039
31 Ph6 -0.045268
32 Ph6 -0.037250
33 Ph6 -0.050648
34 Ph6 -0.077948
35 Ph6 0.004239
Then you can plot this df with:
fig, ax = plt.subplots()
sns.lineplot(ax = ax,
             data = df,
             x = 'column',
             y = 'value',
             sort = False)
plt.show()
Complete Code
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
df = pd.read_csv('data.csv')
df = pd.melt(frame = df,
             var_name = 'column',
             value_name = 'value')
fig, ax = plt.subplots()
sns.lineplot(ax = ax,
             data = df,
             x = 'column',
             y = 'value',
             sort = False)  # keep the original column order, as in the snippet above
plt.show()
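If you prefer to draw the shaded band yourself, here is a minimal matplotlib-only sketch. It uses the first three columns of the data in the question and a normal approximation (mean ± 1.96 standard errors). That is an assumption on my part: seaborn estimates its interval by bootstrapping, so the two bands will not match exactly, and with only six rows per column a t-based interval would be wider. The names df_wide, sem, lower and upper are only illustrative.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# First three columns of the question's data, in the original wide layout
# (one row per sample, one column per x position).
df_wide = pd.DataFrame({
    "Ph1": [-0.152511, -0.068273, 0.425363, -0.019332, 0.077907, 0.145185],
    "Ph2": [-0.039428, 0.152013, -0.043105, 0.139712, 0.341410, 0.112060],
    "Ph3": [0.131173, -0.315244, 0.071670, -0.026001, -0.113731, 0.093898],
})

# Column-wise mean and standard error of the mean.
mean = df_wide.mean()
sem = df_wide.std(ddof=1) / np.sqrt(len(df_wide))

# Normal-approximation 95% interval (assumption: 1.96 * SEM).
lower = mean - 1.96 * sem
upper = mean + 1.96 * sem

fig, ax = plt.subplots()
x = np.arange(len(df_wide.columns))
ax.plot(x, mean.to_numpy(), marker="o", label="mean")
ax.fill_between(x, lower.to_numpy(), upper.to_numpy(), alpha=0.3,
                label="95% CI (normal approx.)")
ax.set_xticks(x)
ax.set_xticklabels(df_wide.columns)
ax.legend()
```

Call plt.show() afterwards to display the figure. No melt is needed here because the statistics are computed per column directly.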
Plot

Related

Plot line from dataframe

I have the following dataframe [1], which contains information relating to music listening. I would like to draw a line graph like the following one [2] (which I got by entering the data manually), relating the slotID to the average bpm, without writing the values by hand. Each segment must be one unit long and must sit at the average bpm.
[1]
slotID NUn NTot MeanBPM
2 2 3 13 107.987769
9 11 3 30 133.772100
10 12 3 40 122.354025
13 15 4 44 123.221659
14 16 4 30 129.083900
15 17 9 66 123.274409
16 18 4 25 131.323480
18 20 5 40 124.782625
19 21 6 30 127.664467
20 22 6 19 120.483579
The code I used to obtain the plot is the following:
import numpy as np
import pylab as pl
from matplotlib import collections as mc
lines = [ [(2, 107), (3,107)], [(11,133),(12,133)], [(12,122),(13,122)], ]
c = np.array([(1, 0, 0, 1), (0, 1, 0, 1), (0, 0, 1, 1)])
lc = mc.LineCollection(lines, colors=c, linewidths=2)
fig, ax = pl.subplots()
ax.add_collection(lc)
ax.autoscale()
ax.margins(0.1)
To obtain data:
import numpy as np
import pandas as pd
dfLunedi = pd.read_csv( "5.sab.csv", encoding = "ISO-8859-1", sep = ';')
dfSlotMean = dfLunedi.groupby('slotID', as_index=False).agg( NSabUn=('date', 'nunique'),NSabTot = ('date', 'count'), MeanBPM=('tempo', 'mean') )
df = pd.DataFrame(dfSlotMean)
df.to_csv('sil.csv', sep = ';', index=False)
df.drop(df[df.NSabUn < 3].index, inplace=True)
You can loop through the rows and plot each segment like this:
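import matplotlib.pyplot as plt
# one horizontal segment per row: from slotID to slotID + 1, at height MeanBPM
for _, r in df.iterrows():
    plt.plot([r['slotID'], r['slotID']+1], [r['MeanBPM']]*2)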
Output:
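The same segments can also be drawn without an explicit loop. This is only a sketch using Axes.hlines on a few rows copied from the table in the question; the frame name seg is made up for the example. Note that hlines draws all segments in one color unless you pass colors=, whereas the loop above cycles through colors.

```python
import pandas as pd
import matplotlib.pyplot as plt

# A few (slotID, MeanBPM) rows copied from the question's table.
seg = pd.DataFrame({
    "slotID": [2, 11, 12, 15],
    "MeanBPM": [107.987769, 133.772100, 122.354025, 123.221659],
})

fig, ax = plt.subplots()
# One horizontal segment per row, spanning [slotID, slotID + 1].
lines = ax.hlines(y=seg["MeanBPM"].to_numpy(),
                  xmin=seg["slotID"].to_numpy(),
                  xmax=seg["slotID"].to_numpy() + 1,
                  linewidth=2)
ax.set_xlabel("slotID")
ax.set_ylabel("MeanBPM")
```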

Different binning for histplot as JointGrid (x,y) marginal plot

I have a pandas dataframe like this:
     Date        Weight  Year  Month  Day  Week  DayOfWeek
0    2017-11-13  76.1    2017  11     13   46    0
1    2017-11-14  76.2    2017  11     14   46    1
2    2017-11-15  76.6    2017  11     15   46    2
3    2017-11-16  77.1    2017  11     16   46    3
4    2017-11-17  76.7    2017  11     17   46    4
...  ...         ...     ...   ...    ...  ...   ...
I created a JointGrid with:
g = sns.JointGrid(data=df,
                  x="Date",
                  y="Weight",
                  marginal_ticks=True,
                  height=6,
                  ratio=2,
                  space=.05)
Then I defined the joint and marginal plots:
g.plot_joint(sns.scatterplot,
             hue=df["Year"],
             alpha=.4,
             legend=True)
g.plot_marginals(sns.histplot,
                 multiple="stack",
                 bins=20,
                 hue=df["Year"])
Result is this.
Now the question is: "is it possible to specify different binning for the two histplot resulting in the x and y marginal plot?"
I don't think there is a built-in way to do that, but you can plot directly on the marginal axes using the plotting function of your choice, like so:
import seaborn as sns
penguins = sns.load_dataset('penguins')
data = penguins
x_col = "bill_length_mm"
y_col = "bill_depth_mm"
hue_col = "species"
g = sns.JointGrid(data=data, x=x_col, y=y_col, hue=hue_col)
g.plot_joint(sns.scatterplot)
# top marginal
sns.histplot(data=data, x=x_col, hue=hue_col, bins=5, ax=g.ax_marg_x, legend=False, multiple='stack')
# right marginal
sns.histplot(data=data, y=y_col, hue=hue_col, bins=40, ax=g.ax_marg_y, legend=False, multiple='stack')

Making Categorical or Grouped Bar Graph with secondary Axis Line Graph

I need to compare different sets of daily data between 4 shifts (categorical / groupby), using bar graphs and line graphs. I have looked everywhere and have not found a working solution for this that doesn't involve generating new pivots and such.
I've used both matplotlib and seaborn, and while I can do one or the other (different colored bars/lines for each shift), once I incorporate the other, either one disappears or other anomalies happen, such as only one plot point showing. I have looked all over and there are solutions for representing a single series of data on both chart types, but none that covers multiple categories or groups for both.
Data Example:
report_date wh_id shift Head_Count UTL_R
3/17/19 55 A 72 25%
3/18/19 55 A 71 10%
3/19/19 55 A 76 20%
3/20/19 55 A 59 33%
3/21/19 55 A 65 10%
3/22/19 55 A 54 20%
3/23/19 55 A 66 14%
3/17/19 55 1 11 10%
3/17/19 55 2 27 13%
3/17/19 55 3 18 25%
3/18/19 55 1 23 100%
3/18/19 55 2 16 25%
3/18/19 55 3 12 50%
3/19/19 55 1 28 10%
3/19/19 55 2 23 50%
3/19/19 55 3 14 33%
3/20/19 55 1 29 25%
3/20/19 55 2 29 25%
3/20/19 55 3 10 50%
3/21/19 55 1 17 20%
3/21/19 55 2 29 14%
3/21/19 55 3 30 17%
3/22/19 55 1 12 14%
3/22/19 55 2 10 100%
3/22/19 55 3 17 14%
3/23/19 55 1 16 10%
3/23/19 55 2 11 100%
3/23/19 55 3 13 10%
tm_daily_df = pd.read_csv('fg_TM_Daily.csv')
tm_daily_df = tm_daily_df.set_index('report_date')
fig2, ax2 = plt.subplots(figsize=(12,8))
ax3 = ax2.twinx()
group_obj = tm_daily_df.groupby('shift')
g = group_obj['Head_Count'].plot(kind='bar', x='report_date', y='Head_Count',ax=ax2,stacked=False,alpha = .2)
g = group_obj['UTL_R'].plot(kind='line',x='report_date', y='UTL_R', ax=ax3,marker='d', markersize=12)
plt.legend(tm_daily_df['shift'].unique())
This code has gotten me the closest I've been able to get. Notice that even with stacked = False, they are still stacked. I changed the setting to True, and nothing changes.
All I need is for the bars to be next to each other, with the same color scheme representing each shift.
The graph:
Here are two solutions (stacked and unstacked). Based on your question, we will:
- plot Head_Count on the left y-axis and UTL_R on the right y-axis
- use report_date as the x-axis
- use shift as the hue of the graph
The stacked version uses pandas default plotting feature, and the unstacked version uses seaborn.
EDIT
Following your request, I added a 100% stacked graph. It is not exactly what you asked for in the comment, but the graph type you asked for may be confusing to read (are the values based on the upper line of the stack or on the width of the stack?). A 100% stacked graph may be a clearer alternative.
Stacked
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
dfg = df.set_index(['report_date', 'shift']).sort_index(level=[0,1])
fig, ax = plt.subplots(figsize=(12,6))
ax2 = ax.twinx()
dfg['Head_Count'].unstack().plot.bar(stacked=True, ax=ax, alpha=0.6)
dfg['UTL_R'].unstack().plot(kind='line', ax=ax2, marker='o', legend=None)
ax.set_title('My Graph')
plt.show()
Stacked 100%
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
dfg = df.set_index(['report_date', 'shift']).sort_index(level=[0,1])
# Create `Head_Count_Pct` column
for date in dfg.index.get_level_values('report_date').unique():
    for shift in dfg.loc[date, :].index.get_level_values('shift').unique():
        dfg.loc[(date, shift), 'Head_Count_Pct'] = dfg.loc[(date, shift), 'Head_Count'].sum() / dfg.loc[(date, 'A'), 'Head_Count'].sum()
fig, ax = plt.subplots(figsize=(12,6))
ax2 = ax.twinx()
pal = sns.color_palette("Set1")
dfg[dfg.index.get_level_values('shift').isin(['1','2','3'])]['Head_Count_Pct'].unstack().plot.bar(stacked=True, ax=ax, alpha=0.5, color=pal)
dfg['UTL_R'].unstack().plot(kind='line', ax=ax2, marker='o', legend=None, color=pal)
ax.set_title('My Graph')
plt.show()
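As a side note, the Head_Count_Pct column can probably be computed without the nested loop. The sketch below works on a flat frame (before set_index) and assumes exactly one row per (report_date, shift) pair and that shift 'A' holds the daily total, as in the sample data; the names hc and totals are made up for the example.

```python
import pandas as pd

# Two days copied from the sample data in the question.
hc = pd.DataFrame({
    "report_date": ["3/17/19"] * 4 + ["3/18/19"] * 4,
    "shift": ["A", "1", "2", "3"] * 2,
    "Head_Count": [72, 11, 27, 18, 71, 23, 16, 12],
})

# Head count of shift 'A' per date, used as the denominator.
totals = hc.loc[hc["shift"] == "A"].set_index("report_date")["Head_Count"]

# Divide every row by the 'A' value of its own date.
hc["Head_Count_Pct"] = hc["Head_Count"] / hc["report_date"].map(totals)
```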
Unstacked
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
dfg = df.set_index(['report_date', 'shift']).sort_index(level=[0,1])
fig, ax = plt.subplots(figsize=(15,6))
ax2 = ax.twinx()
sns.barplot(x=dfg.index.get_level_values('report_date'),
            y=dfg.Head_Count,
            hue=dfg.index.get_level_values('shift'), ax=ax, alpha=0.7)
sns.lineplot(x=dfg.index.get_level_values('report_date'),
             y=dfg.UTL_R,
             hue=dfg.index.get_level_values('shift'), ax=ax2, marker='o', legend=None)
ax.set_title('My Graph')
plt.show()
EDIT #2
Here is the graph you requested the second time (stacked, but stack n+1 does not start where stack n ends).
It is slightly more involved, as we have to do several things:
- we need to manually assign a color to each shift in our df
- once the colors are assigned, we iterate through each date and 1) sort the Head_Count values in descending order (so that the largest stack is at the back when we plot the graph), and 2) plot the data and assign the color to each stack
- then we can create our second y-axis and plot our UTL_R values
- then we need to assign the correct color to our legend labels
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
def assignColor(shift):
    # matplotlib's single-letter color codes are lowercase
    if shift == 'A':
        return 'r'
    if shift == '1':
        return 'b'
    if shift == '2':
        return 'g'
    if shift == '3':
        return 'y'
# map a color to a shift
df['color'] = df['shift'].apply(assignColor)
fig, ax = plt.subplots(figsize=(15,6))
# plot our Head_Count values
for date in df.report_date.unique():
    d = df[df.report_date == date].sort_values(by='Head_Count', ascending=False)
    y = d.Head_Count.values
    x = date
    color = d.color
    b = plt.bar(x,y, color=color)
# Plot our UTL_R values
ax2 = ax.twinx()
sns.lineplot(x=df.report_date, y=df.UTL_R, hue=df['shift'], marker='o', legend=None)
# Assign the color label color to our legend
leg = ax.legend(labels=df['shift'].unique(), loc=1)
legend_maping = dict()
for shift in df['shift'].unique():
    legend_maping[shift] = df[df['shift'] == shift].color.unique()[0]
i = 0
for leg_lab in leg.texts:
    # note: newer Matplotlib releases rename legendHandles to legend_handles
    leg.legendHandles[i].set_color(legend_maping[leg_lab.get_text()])
    i += 1
How about this?
tm_daily_df['UTL_R'] = tm_daily_df['UTL_R'].str.replace('%', '').astype('float') / 100
pivoted = tm_daily_df.pivot_table(values=['Head_Count', 'UTL_R'],
                                  index='report_date',
                                  columns='shift')
pivoted
# Head_Count UTL_R
# shift 1 2 3 A 1 2 3 A
# report_date
# 3/17/19 11 27 18 72 0.10 0.13 0.25 0.25
# 3/18/19 23 16 12 71 1.00 0.25 0.50 0.10
# 3/19/19 28 23 14 76 0.10 0.50 0.33 0.20
# 3/20/19 29 29 10 59 0.25 0.25 0.50 0.33
# 3/21/19 17 29 30 65 0.20 0.14 0.17 0.10
# 3/22/19 12 10 17 54 0.14 1.00 0.14 0.20
# 3/23/19 16 11 13 66 0.10 1.00 0.10 0.14
fig, ax = plt.subplots()
pivoted['Head_Count'].plot.bar(ax=ax)
pivoted['UTL_R'].plot.line(ax=ax, legend=False, secondary_y=True, marker='D')
ax.legend(loc='upper left', title='shift')
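If you want to see what "bars next to each other" amounts to under the hood, here is a plain-matplotlib sketch that places the bars by hand and draws the matching lines on a twin axis. The values are the first three days of shifts 1 to 3 from the question; the dictionaries and the 0.8 group width are just choices made for this example.

```python
import numpy as np
import matplotlib.pyplot as plt

# First three days for shifts 1-3, copied from the question's data.
dates = ["3/17/19", "3/18/19", "3/19/19"]
head_count = {"1": [11, 23, 28], "2": [27, 16, 23], "3": [18, 12, 14]}
utl_r = {"1": [0.10, 1.00, 0.10], "2": [0.13, 0.25, 0.50], "3": [0.25, 0.50, 0.33]}

x = np.arange(len(dates))            # one slot per date
width = 0.8 / len(head_count)        # each group of bars fills 80% of a slot

fig, ax = plt.subplots()
ax2 = ax.twinx()                     # secondary y-axis for UTL_R
for i, shift in enumerate(head_count):
    color = f"C{i}"                  # same color for the bar and line of a shift
    # Shift each series left/right so the bars sit side by side.
    offset = (i - (len(head_count) - 1) / 2) * width
    ax.bar(x + offset, head_count[shift], width=width,
           color=color, alpha=0.6, label=f"shift {shift}")
    ax2.plot(x, utl_r[shift], color=color, marker="o")

ax.set_xticks(x)
ax.set_xticklabels(dates)
ax.set_ylabel("Head_Count")
ax2.set_ylabel("UTL_R")
ax.legend(loc="upper left")
```

The pivot-based version above does essentially this positioning for you; the manual form is mainly useful when you need full control over colors and offsets.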

Non-linear regression in Seaborn Python

I have the following dataframe that I wish to perform some regression on. I am using Seaborn but can't quite seem to find a non-linear function that fits. Below is my code and its output, and below that is the dataframe I am using, df. Note that I have truncated the axis in this plot.
I would like to fit either a Poisson or Gaussian distribution style of function.
import pandas
import seaborn
graph = seaborn.lmplot('$R$', 'Equilibrium Value', data = df, fit_reg=True, order=2, ci=None)
graph.set(xlim = (-0.25,10))
However this produces the following figure.
df
R Equilibrium Value
0 5.102041 7.849315e-03
1 4.081633 2.593005e-02
2 0.000000 9.990000e-01
3 30.612245 4.197446e-14
4 14.285714 6.730133e-07
5 12.244898 5.268202e-06
6 15.306122 2.403316e-07
7 39.795918 3.292955e-18
8 19.387755 3.875505e-09
9 45.918367 5.731842e-21
10 1.020408 9.936863e-01
11 50.000000 8.102142e-23
12 2.040816 7.647420e-01
13 48.979592 2.353931e-22
14 43.877551 4.787156e-20
15 34.693878 6.357120e-16
16 27.551020 9.610208e-13
17 29.591837 1.193193e-13
18 31.632653 1.474959e-14
19 3.061224 1.200807e-01
20 23.469388 6.153965e-11
21 33.673469 1.815181e-15
22 42.857143 1.381050e-19
23 25.510204 7.706746e-12
24 13.265306 1.883431e-06
25 9.183673 1.154141e-04
26 41.836735 3.979575e-19
27 36.734694 7.770915e-17
28 18.367347 1.089037e-08
29 44.897959 1.657448e-20
30 16.326531 8.575577e-08
31 28.571429 3.388120e-13
32 40.816327 1.145412e-18
33 11.224490 1.473268e-05
34 24.489796 2.178927e-11
35 21.428571 4.893541e-10
36 32.653061 5.177167e-15
37 8.163265 3.241799e-04
38 22.448980 1.736254e-10
39 46.938776 1.979881e-21
40 47.959184 6.830820e-22
41 26.530612 2.722925e-12
42 38.775510 9.456077e-18
43 6.122449 2.632851e-03
44 37.755102 2.712309e-17
45 10.204082 4.121137e-05
46 35.714286 2.223883e-16
47 20.408163 1.377819e-09
48 17.346939 3.057373e-08
49 7.142857 9.167507e-04
EDIT
Attached are two graphs produced from both this and another data set when increasing the order parameter beyond 20.
Order = 3
I have trouble understanding why lmplot is needed here. Usually you perform a fit by taking a model function and fitting it to the data.
Assume you want a Gaussian function:
model = lambda x, A, x0, sigma, offset: offset+A*np.exp(-((x-x0)/sigma)**2)
you can fit it to your data with scipy.optimize.curve_fit:
popt, pcov = curve_fit(model, df["R"].values,
                       df["EquilibriumValue"].values, p0=[1,0,2,0])
Complete code:
import pandas as pd
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
df = ... # your dataframe
# plot data
plt.scatter(df["R"].values,df["EquilibriumValue"].values, label="data")
# Fitting
model = lambda x, A, x0, sigma, offset: offset+A*np.exp(-((x-x0)/sigma)**2)
popt, pcov = curve_fit(model, df["R"].values,
                       df["EquilibriumValue"].values, p0=[1,0,2,0])
#plot fit
x = np.linspace(df["R"].values.min(),df["R"].values.max(),250)
plt.plot(x,model(x,*popt), label="fit")
# Fitting a reduced model: only sigma is free (A=1, x0=0, offset=0)
model2 = lambda x, sigma: model(x,1,0,sigma,0)
popt2, pcov2 = curve_fit(model2, df["R"].values,
                         df["EquilibriumValue"].values, p0=[2])
#plot fit2
x2 = np.linspace(df["R"].values.min(),df["R"].values.max(),250)
plt.plot(x2,model2(x2,*popt2), label="fit2")
plt.xlim(None,10)
plt.legend()
plt.show()
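The covariance matrix returned by curve_fit also gives a rough idea of how well each parameter is determined. The following sketch uses synthetic data, not the dataframe from the question; the true parameters, noise level and variable names are assumptions chosen only to make the example self-contained.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, A, x0, sigma, offset):
    # Same functional form as the lambda model above.
    return offset + A * np.exp(-((x - x0) / sigma) ** 2)

# Synthetic data: known parameters plus a little Gaussian noise.
rng = np.random.default_rng(0)
x_demo = np.linspace(-6, 6, 61)
true_params = (1.0, 0.5, 2.0, 0.1)
y_demo = gaussian(x_demo, *true_params) + rng.normal(scale=0.01, size=x_demo.size)

popt_demo, pcov_demo = curve_fit(gaussian, x_demo, y_demo, p0=[1, 0, 1.5, 0])

# One-standard-deviation uncertainties: square roots of the diagonal of pcov.
perr_demo = np.sqrt(np.diag(pcov_demo))
for name, value, err in zip(["A", "x0", "sigma", "offset"], popt_demo, perr_demo):
    print(f"{name} = {value:.3f} +/- {err:.3f}")
```

If the uncertainties come out huge relative to the values, the chosen model is likely a poor description of the data or the starting values p0 are too far off.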

How to convert an array to a datetime format to be plotted in Python?

I have a list like this
HEIGHT DATE TIME ANGL FC COL ROW
3.76 20120127 18 27 52 291.9 1 399.0 311.0
5.46 20120127 18 38 43 293.5 1 462.0 343.0
6.31 20120127 18 43 18 292.8 1 311.0 288.0
8.49 20120127 18 54 05 290.7 1 330.0 293.0
11.08 20120127 19 06 05 293.1 1 350.0 305.0
13.47 20120127 19 18 05 296.1 1 367.0 319.0
16.09 20120127 19 30 06 297.8 1 386.0 333.0
18.47 20120127 19 42 05 299.0 1 403.0 346.0
21.73 20120127 19 54 06 300.4 1 426.0 364.0
23.40 20120127 20 06 05 301.8 1 436.0 376.0
28.33 20120127 20 18 05 302.7 1 471.0 402.0
I want to build a time array from the DATE and TIME columns, store it in a variable, and then plot it against any of the other columns.
I tried to use datetime but I didn't get anything:
import datetime as dt
data=loadtxt('CME27.txt', skiprows=1)
col=data[:,7]
row=data[:,8]
h=data[:,2]
m=data[:,3]
s=data[:,4]
t=dt.time(h,m,s)
I got an error!
I would like to plot:
plot(t,col)
Thanks
I don't think you can plot datetime.time objects directly using matplotlib. You can plot datetime.datetime objects, however. Given the NumPy array data, you'll have to use a Python loop to parse the floats into datetime.datetime objects.
You could do that like this:
import numpy as np
import datetime as DT
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
data = np.loadtxt('CME27.txt', skiprows=1)
col, row = data[:, 7:9].T
dates = []
for date, h, m, s in data[:,1:5]:
    dates.append(
        DT.datetime.strptime('{date:.0f} {h:.0f} {m:.0f} {s:.0f}'.format(**locals()),
                             '%Y%m%d %H %M %S'))
fig, ax = plt.subplots()
ax.plot(dates, col)
plt.xticks(rotation=25)
xfmt = mdates.DateFormatter('%H:%M:%S')
ax.xaxis.set_major_formatter(xfmt)
plt.show()
If you install pandas, then the above could be simplified to
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_table('CME27.txt', sep=r'\s+', skiprows=1, header=None,
                   parse_dates={'date':[1,2,3,4]})
df.columns = 'date height angle fc col row'.split()
df.plot('date', 'col')
plt.show()
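A variant that builds the timestamps explicitly with pd.to_datetime, instead of relying on parse_dates to combine columns, might look like the sketch below. It reads three rows copied from the question through io.StringIO rather than from CME27.txt, and the column names (height, date, h, m, s, ...) are ones I made up for the example.

```python
import io
import pandas as pd

# Three rows copied from the question, standing in for CME27.txt.
sample = io.StringIO(
    "HEIGHT DATE TIME ANGL FC COL ROW\n"
    "3.76 20120127 18 27 52 291.9 1 399.0 311.0\n"
    "5.46 20120127 18 38 43 293.5 1 462.0 343.0\n"
    "8.49 20120127 18 54 05 290.7 1 330.0 293.0\n"
)

names = ["height", "date", "h", "m", "s", "angle", "fc", "col", "row"]
time_cols = ["date", "h", "m", "s"]

# Read the date/time pieces as strings so that values like "05" keep their zeros.
cme = pd.read_csv(sample, sep=r"\s+", skiprows=1, header=None, names=names,
                  dtype={c: str for c in time_cols})

# Join the pieces and parse them with an explicit format.
stamp = cme["date"] + " " + cme["h"] + " " + cme["m"] + " " + cme["s"]
cme["datetime"] = pd.to_datetime(stamp, format="%Y%m%d %H %M %S")
```

From there, cme.plot(x="datetime", y="col") followed by plt.show() should give the same kind of plot as above.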
