Python difference-in-differences regression coefficient plot with 95% confidence interval

I want to draw a difference-in-differences coefficient graph with 95% confidence intervals.
Here is my data frame.
example:
x coef stderr ci_lower ci_upper
t-3 .0005219 .0004025 -.000759 .0018028
t-2 .0008449 .0004205 -.0004935 .0021833
t-1 0 0 0 0
t .0009092 .0003047 -.0000604 .0018789
t+1 .0010828 .0002889 .0001634 .0020023
t+2 .0013157 .0001613 .0008022 .0018291
t+3 .001452 .0001755 .0008933 .0020106
Here is the code that I used, but it didn't work and the result is not very informative:
plt.errorbar(x=x, y=coef, yerr=ci_lower(?), color="black", capsize=3,
linestyle="None",
marker="s", markersize=7, mfc="black", mec="black")
Is there any way to have a regression coef graph like that?
Thanks in advance

You should provide the data and also an example of the plot you like. Your data would be:
df = {'x': {0: 't-3', 1: 't-2', 2: 't-1', 3: 't', 4: 't+1', 5: 't+2', 6: 't+3'},
'coef': {0: 0.0005219,
1: 0.0008449,
2: 0.0,
3: 0.0009092,
4: 0.0010828,
5: 0.0013157,
6: 0.001452},
'stderr': {0: 0.0004025,
1: 0.0004205,
2: 0.0,
3: 0.0003047,
4: 0.0002889,
5: 0.0001613,
6: 0.0001755},
'ci_lower': {0: -0.000759,
1: -0.0004935,
2: 0.0,
3: -6.04e-05,
4: 0.0001634,
5: 0.0008022,
6: 0.0008933},
'ci_upper': {0: 0.0018028,
1: 0.0021833,
2: 0.0,
3: 0.0018789,
4: 0.0020023,
5: 0.0018291,
6: 0.0020106}}
df = pd.DataFrame(df)
I am guessing you want something like this:
fig, ax = plt.subplots()
df.plot.scatter(x = "coef", y = "x",ax=ax)
for ix in df.index:
    ax.plot(df[['ci_lower','ci_upper']].loc[ix].values, df.loc[ix,['x','x']].values,c="b")
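For a vertical layout closer to the errorbar attempt in the question, here is a minimal sketch. It assumes the goal is the usual event-study style plot with periods on the x-axis; the axis labels and the dashed zero line are my own additions, not something the question asked for. The key point is that errorbar expects distances from the point, so the interval bounds are converted to lower and upper offsets first.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Coefficients and interval bounds copied from the question
df = pd.DataFrame({
    "x": ["t-3", "t-2", "t-1", "t", "t+1", "t+2", "t+3"],
    "coef": [0.0005219, 0.0008449, 0.0, 0.0009092, 0.0010828, 0.0013157, 0.001452],
    "ci_lower": [-0.000759, -0.0004935, 0.0, -0.0000604, 0.0001634, 0.0008022, 0.0008933],
    "ci_upper": [0.0018028, 0.0021833, 0.0, 0.0018789, 0.0020023, 0.0018291, 0.0020106],
})

# errorbar wants offsets from the point, not the bounds themselves:
# row 0 is the distance down to ci_lower, row 1 the distance up to ci_upper
yerr = np.vstack([
    (df["coef"] - df["ci_lower"]).to_numpy(),
    (df["ci_upper"] - df["coef"]).to_numpy(),
])

fig, ax = plt.subplots()
ax.errorbar(df["x"].tolist(), df["coef"].to_numpy(), yerr=yerr,
            color="black", capsize=3, linestyle="None",
            marker="s", markersize=7)
ax.axhline(0, color="grey", linestyle="--", linewidth=1)  # reference line at zero (assumed useful)
ax.set_xlabel("Period")        # assumed label
ax.set_ylabel("Coefficient")   # assumed label
```

Call plt.show() or fig.savefig(...) afterwards to display or store the figure.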


Categorical area stackplot in pandas grouped by date

I found a way to implement the stackplot if my x-axis is just a list of numbers.
import pandas as pd
import matplotlib.pyplot as plt
d = {'time_key': {0: '2021-03-01',
1: '2021-03-01',
2: '2021-03-01',
3: '2021-03-01'},
'target': {0: 2, 1: 1, 2: 0, 3: 3},
'count': {0: 400, 1: 300, 2: 200, 3: 100},
'fraction': {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}}
df = pd.DataFrame(d)
# The sample frame holds a single date, so the x-axis has one entry
plt.stackplot(range(1), df[df.target==0].fraction, df[df.target==1].fraction,
df[df.target==2].fraction, df[df.target==3].fraction)
But I want to generalize the plot to a list of many dates.
d = {'time_key': {0: '2021-03-01',
1: '2021-03-01',
2: '2021-03-01',
3: '2021-03-01',
4: '2021-04-01',
5: '2021-04-01',
6: '2021-04-01',
7: '2021-04-01',
8: '2021-05-01',
9: '2021-05-01',
10: '2021-05-01',
11: '2021-05-01'},
'target': {0: 2,
1: 1,
2: 0,
3: 3,
4: 2,
5: 1,
6: 0,
7: 3,
8: 2,
9: 1,
10: 0,
11: 3},
'count': {0: 163,
1: 110,
2: 90,
3: 38,
4: 113,
5: 97,
6: 56,
7: 34,
8: 85,
9: 57,
10: 42,
11: 16},
'fraction': {0: 0.18091009988901222,
1: 0.1220865704772475,
2: 0.09988901220865705,
3: 0.042175360710321866,
4: 0.12541620421753608,
5: 0.1076581576026637,
6: 0.06215316315205328,
7: 0.03773584905660377,
8: 0.09433962264150944,
9: 0.06326304106548279,
10: 0.04661487236403995,
11: 0.017758046614872364}}
And I'd like to put the dates on the x-axis in ascending order to see the dynamics of the proportions.
Is there a proper way to implement this?
The approximate desired output plot (I need time_key x-axis though):
Try:
dfp = df.set_index(['time_key','target'])['count'].unstack()
dfp.div(dfp.sum(axis=1), axis=0).plot.bar(stacked=True)
Output:
Another useful solution is:
d = {0: {'2021-03-01': 0.2, '2021-04-01': 0.25, '2021-05-01': 0.3},
1: {'2021-03-01': 0.3, '2021-04-01': 0.25, '2021-05-01': 0.3},
2: {'2021-03-01': 0.4, '2021-04-01': 0.25, '2021-05-01': 0.3},
3: {'2021-03-01': 0.1, '2021-04-01': 0.25, '2021-05-01': 0.1}}
df = pd.DataFrame(d)
fig, ax = plt.subplots(figsize=(9, 6))
plt.style.use('classic')
df.plot.area(ax=ax)
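The two suggestions above can be combined into a stacked area chart with real dates on the x-axis, starting from the long-format frame in the question. This is only a sketch: it assumes the shares should be normalised per date (as the bar chart answer does) rather than over the whole table, and the legend labels are made up for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Long-format counts copied from the question: one row per (date, target)
df = pd.DataFrame({
    "time_key": ["2021-03-01"] * 4 + ["2021-04-01"] * 4 + ["2021-05-01"] * 4,
    "target": [2, 1, 0, 3] * 3,
    "count": [163, 110, 90, 38, 113, 97, 56, 34, 85, 57, 42, 16],
})
df["time_key"] = pd.to_datetime(df["time_key"])

# One row per date, one column per target, dates in ascending order
wide = df.pivot(index="time_key", columns="target", values="count").sort_index()

# Normalise each date so the shares of its targets sum to 1
shares = wide.div(wide.sum(axis=1), axis=0)

fig, ax = plt.subplots()
ax.stackplot(shares.index, shares.to_numpy().T,
             labels=[f"target {t}" for t in shares.columns])  # hypothetical labels
ax.legend(loc="upper right")
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
```

Pivoting first keeps the plotting call independent of how many targets or dates the frame contains.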

Bokeh Hovertool stacked barchart

I have constructed a Bokeh stacked bar chart with the code below. The chart shows the different tree types for the districts of Copenhagen. At the moment I have a HoverTool which shows the exact number of trees (corresponding to the columns with the tree names) for the tree type, but I also want it to show the percentage (the columns with _p at the end). How can I do this with the stacked bar chart?
A reduced part of the data frame:
temp=pd.DataFrame( {'bydelsnavn': {0: 'Amager Vest', 1: 'Amager Øst', 2: 'Bispebjerg', 3: 'Brønshøj-Husum', 4: 'Indre By', 5: 'Nørrebro', 6: 'Valby', 7: 'Vanløse', 8: 'Vesterbro', 9: 'Østerbro'}, 'Alder': {0: 53.0, 1: 21.0, 2: 1.0, 3: 9.0, 4: 4.0, 5: 2.0, 6: 3.0, 7: 44.0, 8: 46.0, 9: 59.0}, 'Alderm': {0: 63.0, 1: 32.0, 2: 49.0, 3: 13.0, 4: 45.0, 5: 55.0, 6: 104.0, 7: 0.0, 8: 50.0, 9: 4.0}, 'Apple': {0: 94.0, 1: 109.0, 2: 115.0, 3: 12.0, 4: 22.0, 5: 81.0, 6: 41.0, 7: 3.0, 8: 132.0, 9: 51.0}, 'Alder_p': {0: 21.9, 1: 8.68, 2: 0.41, 3: 3.72, 4: 1.65, 5: 0.83, 6: 1.24, 7: 18.18, 8: 19.01, 9: 24.38}, 'Alderm_p': {0: 15.18, 1: 7.71, 2: 11.81, 3: 3.13, 4: 10.84, 5: 13.25, 6: 25.06, 7: 0.0, 8: 12.05, 9: 0.96}, 'Apple_p': {0: 14.24, 1: 16.52, 2: 17.42, 3: 1.82, 4: 3.33, 5: 12.27, 6: 6.21, 7: 0.45, 8: 20.0, 9: 7.73}})
My code:
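from bokeh.colors import named
from bokeh.models import HoverTool
from bokeh.plotting import figure, show
treeName = ['Alder','Alderm','Apple']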
treeName_p = ['Alder_p','Alderm_p','Apple_p']
colornames = named.__all__
colornames = colornames[:len(treeName)]
# Create an empty figure
p = figure(x_range = temp['bydelsnavn'].values,plot_width = 700, plot_height=400,
title='Tree pr. district', toolbar_sticky = False,
tools = 'pan,wheel_zoom,reset')
# Stacked bar chart
renderers = p.vbar_stack(stackers=treeName,x='bydelsnavn',source=temp,
width=0.8, color = colornames)
# Add the hover tool
for r in renderers:
    tree = r.name
    hover = HoverTool(tooltips=[
        ("%s" % tree, "@{%s}" % tree)
    ], renderers = [r])
    p.add_tools(hover)
# remove the grid
p.xgrid.grid_line_color=None
p.ygrid.grid_line_color=None
# Make sure bars start at 0
p.y_range.start = 0
# remove - y-axis
p.yaxis.visible = False
# Remove the grey box around the plot
p.outline_line_color = None
# Turn the x-labels
p.xaxis.major_label_orientation = 0.5
# Remove tool bar logo
p.toolbar.logo = None
# Move the border of the left side to show "Amager"
p.min_border_left = 30
show(p)
My current chart looks like this:
Assuming that the values of the _p columns are actually in the data source, you can just add another tooltip to the HoverTool:
for r in renderers:
    tree = r.name
    p.add_tools(HoverTool(tooltips=[(tree, "@$name"),
                                    (f"{tree} %", f"@{tree}_p")],
                          renderers=[r]))
Notice how @$name is used in there: it is not strictly necessary in this particular case, but it sometimes comes in handy.
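If the _p columns are not in the data source yet, they could be derived with pandas before the frame is handed to Bokeh. The sketch below assumes each _p value is the district's share of that tree type's total across all districts, which is consistent with the sample numbers in the question (for example 53 of 242 alders is 21.9 %); only two of the tree columns are reproduced here.

```python
import pandas as pd

# Tree counts per district, copied from the question (two tree types only)
temp = pd.DataFrame({
    "Alder": [53.0, 21.0, 1.0, 9.0, 4.0, 2.0, 3.0, 44.0, 46.0, 59.0],
    "Alderm": [63.0, 32.0, 49.0, 13.0, 45.0, 55.0, 104.0, 0.0, 50.0, 4.0],
})

for tree in ["Alder", "Alderm"]:
    # Percentage of all trees of this type that stand in each district
    temp[f"{tree}_p"] = (temp[tree] / temp[tree].sum() * 100).round(2)
```

With the columns named this way, the f"@{tree}_p" tooltip field resolves without further changes.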

plotnine/ggplot - changing legend positions

I have this dataframe:
df = pd.DataFrame({'ymin': {0: 0.0,
1: 0.0,
2: 0.0,
3: 0.0,
4: 0.511,
5: 0.571,
6: 0.5329999999999999,
7: 0.5389999999999999},
'ymax': {0: 0.511,
1: 0.571,
2: 0.533,
3: 0.539,
4: 1.0,
5: 1.0,
6: 1.0,
7: 1.0},
'xmin': {0: 0.0,
1: 0.14799999999999996,
2: 0.22400000000000003,
3: 0.5239999999999999,
4: 0.0,
5: 0.14799999999999996,
6: 0.22400000000000003,
7: 0.5239999999999999},
'xmax': {0: 0.148,
1: 0.22399999999999998,
2: 0.524,
3: 1.001,
4: 0.148,
5: 0.22399999999999998,
6: 0.524,
7: 1.001},
'variable': {0: 'A', 1: 'A', 2: 'A', 3: 'A', 4: 'B', 5: 'B', 6: 'B', 7: 'B'}})
Where I plot this:
(ggplot(df, aes(ymin = "ymin", ymax = "ymax",
xmin = "xmin", xmax = "xmax", fill = "variable"))
+ geom_rect(colour = "grey", alpha=0.7))
I would like the legend order to match the positions in the plot: blue on top and red at the bottom. A should always be red and B should always be blue.
There might be a more standard way to do it, but here is a quick hack to fix your problem:
1. Change the order of the categories of your variable.
2. Assign colors manually (if it matters in your case, you could also look up the exact color codes and use them in place of the color names).
df = df.assign(variable = pd.Categorical(df['variable'], ['B', 'A']))
(ggplot(df, aes(ymin = "ymin", ymax = "ymax",
xmin = "xmin", xmax = "xmax", fill = "variable"))+
geom_rect(colour = "grey", alpha=0.7)+
scale_fill_manual(values = ["blue", "red"]))
The output looks like this:
In R, you could set the order of the levels with df$variable <- factor(df$variable, levels = c("B","A"))
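To see what the reordering step in the Python answer does on its own, here is a small pandas-only sketch. As far as I know, plotnine takes the legend order of a discrete scale from the category order of the column, so listing "B" first is what moves it to the top; the four-row series is a made-up stand-in for the variable column.

```python
import pandas as pd

# Stand-in for the "variable" column of the question
variable = pd.Series(["A", "A", "B", "B"])

# Explicit category order: B first, then A
ordered = pd.Series(pd.Categorical(variable, categories=["B", "A"]))

print(list(ordered.cat.categories))  # ['B', 'A']
print(ordered.cat.codes.tolist())    # [1, 1, 0, 0]
```

The values themselves are unchanged; only the internal ordering of the categories differs.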

Modify plotly layout y-axis

I am working on two datasets for churn classification. My problem is that, as you can see below, the y-axes of the two graphs are not on the same scale: Bank stops at 0.8 and telco-europa at 1. I would like to force the y-axis to always display 0, 0.2, 0.4, 0.6, 0.8, 1.
My histogram is based on this tutorial: https://www.kaggle.com/pavanraj159/telecom-customer-churn-prediction and the bank dataset is this one: https://www.kaggle.com/shrutimechlearn/churn-modelling
I have used the following code:
import plotly.graph_objs as go#visualization
import plotly.offline as py#visualization
def output_tracer(metric,color, model_performances) :
    tracer = go.Bar(x = model_performances["Algorithm"] ,
                    y = model_performances[metric],
                    orientation = "v",name = metric ,
                    marker = dict(line = dict(width =.7),
                                  color = color)
                    )
    return tracer
def output_data(model_performances):
    trace1 = output_tracer("1-Precision","#6699FF", model_performances)
    trace2 = output_tracer('1-Recall',"red", model_performances)
    trace3 = output_tracer('1-F1-score',"#33CC99", model_performances)
    trace4 = output_tracer('Accuracy',"lightgrey", model_performances)
    trace5 = output_tracer('AUC',"#FFCC99", model_performances)
    data = [trace1,trace2,trace3,trace4,trace5]
    return data
def output_layout(model):
    layout = go.Layout(dict(title = model,
                            plot_bgcolor = "rgb(243,243,243)",
                            paper_bgcolor = "rgb(243,243,243)",
                            xaxis = dict(gridcolor = 'rgb(255, 255, 255)',
                                         title = "",
                                         zerolinewidth=1,
                                         ticklen=5,gridwidth=2),
                            yaxis = dict(gridcolor = 'rgb(255, 255, 255)',
                                         zerolinewidth=1,ticklen=5,gridwidth=2),
                            margin = dict(l = 250),
                            height = 400
                            )
                       )
    return layout
model = "Bank"
model_performances = report_df_scoring[report_df_scoring.Dataset == model]
fig = go.Figure(data=output_data(model_performances),layout=output_layout(model))
py.iplot(fig)
And here you can find the dataframe "report_df_scoring" as a dictionary, for the "Bank" dataset only:
{'Dataset': {0: 'Bank',
1: 'Bank',
2: 'Bank',
3: 'Bank',
4: 'Bank',
5: 'Bank',
6: 'Bank'},
'Algorithm': {0: 'LogisticRegressionNoSMOTE',
1: 'Logistic Regression',
2: 'SVM-linear',
3: 'SVM-rbf',
4: 'xgboost',
5: 'GaussianNB',
6: 'RandomForest'},
'W-Precision': {0: 0.8159638339642141,
1: 0.8229500536388679,
2: 0.8243426658647828,
3: 0.7956512785333915,
4: 0.8288351219512194,
5: 0.8302513223140496,
6: 0.8307514249037228},
'W-Recall': {0: 0.8324,
1: 0.7636,
2: 0.7628,
3: 0.8056,
4: 0.836,
5: 0.8176,
6: 0.8408},
'W-F1-score': {0: 0.810103868755423,
1: 0.7811452562742854,
2: 0.7807117770916884,
3: 0.7997335148514852,
4: 0.831622605929424,
5: 0.7598757585104978,
6: 0.8336474053248425},
'0-Precision': {0: 0.8493518104604381,
1: 0.9187236604455148,
2: 0.9206541490006056,
3: 0.8634596695821186,
4: 0.8834146341463415,
5: 0.8152892561983471,
6: 0.8789473684210526},
'0-Recall': {0: 0.958627648839556,
1: 0.7699293642785066,
2: 0.7669021190716448,
3: 0.8965691220988901,
4: 0.9137235116044399,
5: 0.9954591321897074,
6: 0.9268415741675076},
'0-F1-score': {0: 0.9006873666745674,
1: 0.8377710678012626,
2: 0.8367740159647675,
3: 0.8797029702970298,
4: 0.8983134920634921,
5: 0.8964107223989097,
6: 0.9022593320235756},
'1-Precision': {0: 0.6882129277566539,
1: 0.4564958283671037,
2: 0.4558303886925795,
3: 0.5361990950226244,
4: 0.62,
5: 0.8875,
6: 0.6463414634146342},
'1-Recall': {0: 0.34942084942084944,
1: 0.7393822393822393,
2: 0.747104247104247,
3: 0.4575289575289575,
4: 0.5386100386100386,
5: 0.13706563706563707,
6: 0.5115830115830116},
'1-F1-score': {0: 0.4635083226632522,
1: 0.5644804716285925,
2: 0.5662033650329188,
3: 0.49375,
4: 0.5764462809917356,
5: 0.2374581939799331,
6: 0.5711206896551725},
'Accuracy': {0: 0.8324,
1: 0.7636,
2: 0.7628,
3: 0.8056,
4: 0.836,
5: 0.8176,
6: 0.8408},
'AUC': {0: 0.6540242491302027,
1: 0.754655801830373,
2: 0.7570031830879459,
3: 0.6770490398139237,
4: 0.7261667751072393,
5: 0.5662623846276723,
6: 0.7192122928752596},
'SMOTE': {0: 'No',
1: 'Yes',
2: 'Yes',
3: 'Yes',
4: 'Yes',
5: 'Yes',
6: 'Yes'},
'top3var': {0: "['numofproducts_4', 'numofproducts_3', 'geography_germany']",
1: "['numofproducts_4', 'numofproducts_3', 'geography_germany']",
2: "['numofproducts_4', 'numofproducts_3', 'age']",
3: "['empty']",
4: "['numofproducts_2', 'numofproducts_1', 'isactivemember']",
5: "['empty']",
6: "['age', 'numofproducts_2', 'balance']"}}
You can access and edit the range of any axis of your figure using:
fig['layout']['yaxis']['range']
And set the range like:
fig['layout']['yaxis']['range'] = [0, 1]
The same thing goes for your tickvals:
fig['layout']['yaxis']['tickvals'] = [0, 0.2, 0.4, 0.6, 0.8, 1]
Alternatively, you can use:
fig.update_yaxes(tickvals=[0, 0.2, 0.4, 0.6, 0.8, 1])
Your code example does not work for me because "report_df_scoring" is missing.

matplotlib generic colormap from tab10

This question is related to this one from SO (matplotlib-change-colormap-tab20-to-have-three-colors)
I would like to tweak the tab10 colormap so that I can change the alpha level of each color in as many steps as I like. Below is an example (for 9 colors with 3 alpha levels) which does not yield the expected output. Furthermore, it is not generic enough (because of the if/elif statements).
Any ideas how I could do that?
In this example, I do have 3 groups with 3 subgroups:
import pandas as pd
from matplotlib import pyplot as plt
import numpy as np
n_feature = 3
sub_feature = 3
col = []
for index in range(n_feature*sub_feature):
    # loop over colors and change the last entry in descending order 3 times
    col.append(list(plt.cm.tab10(index)))
i = 0
for item in col:
    # loop over colors and change the last entry in descending order 3 times
    if i == 0:
        item[-1] = 0.9
        i+=1
    elif i == 1:
        item[-1] = 0.7
        i+=1
    elif i == 2:
        item[-1] = 0.5
        i = 0
gr = df.groupby(['a', 'a1'])
for index, item in enumerate(gr):
    name, val = item
    y = val.iloc[0,2:].values
    x = np.arange(len(y))
    plt.plot(x, y, '.-', color=col[index])
plt.show()
This is the data:
{'a': {0: 'A', 1: 'A', 2: 'A', 3: 'B', 4: 'B', 5: 'B', 6: 'C', 7: 'C', 8: 'C'},
'a1': {0: 1, 1: 2, 2: 3, 3: 1, 4: 2, 5: 3, 6: 1, 7: 2, 8: 3},
'b': {0: 1.0,
1: 5.0,
2: 9.0,
3: 1.5,
4: 5.5,
5: 9.5,
6: 1.75,
7: 5.75,
8: 9.75},
'c': {0: 2.0,
1: 6.0,
2: 10.0,
3: 2.5,
4: 6.5,
5: 10.5,
6: 2.75,
7: 6.75,
8: 10.75},
'd': {0: 3.0,
1: 7.0,
2: 11.0,
3: 3.5,
4: 7.5,
5: 11.5,
6: 3.75,
7: 7.75,
8: 11.75},
'e': {0: 4.0,
1: 8.0,
2: 12.0,
3: 4.5,
4: 8.5,
5: 12.5,
6: 4.75,
7: 8.75,
8: 12.75}}
You may use the HSV system to obtain differently saturated and luminated colors for the same hue. Suppose you have at most 10 categories, then the tab10 map can be used to get a certain number of base colors. From those you can choose a couple of lighter shades for the subcategories.
The following would be a function categorical_cmap, which takes as input the number of categories (nc) and the number of subcategories (nsc) and returns a colormap with nc*nsc different colors, where for each category there are nsc colors of same hue.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors
def categorical_cmap(nc, nsc, cmap="tab10", continuous=False):
    if nc > plt.get_cmap(cmap).N:
        raise ValueError("Too many categories for colormap.")
    if continuous:
        ccolors = plt.get_cmap(cmap)(np.linspace(0,1,nc))
    else:
        ccolors = plt.get_cmap(cmap)(np.arange(nc, dtype=int))
    cols = np.zeros((nc*nsc, 3))
    for i, c in enumerate(ccolors):
        chsv = matplotlib.colors.rgb_to_hsv(c[:3])
        arhsv = np.tile(chsv,nsc).reshape(nsc,3)
        arhsv[:,1] = np.linspace(chsv[1],0.25,nsc)
        arhsv[:,2] = np.linspace(chsv[2],1,nsc)
        rgb = matplotlib.colors.hsv_to_rgb(arhsv)
        cols[i*nsc:(i+1)*nsc,:] = rgb
    cmap = matplotlib.colors.ListedColormap(cols)
    return cmap
c1 = categorical_cmap(4, 3, cmap="tab10")
plt.scatter(np.arange(4*3),np.ones(4*3)+1, c=np.arange(4*3), s=180, cmap=c1)
c2 = categorical_cmap(2, 5, cmap="tab10")
plt.scatter(np.arange(10),np.ones(10), c=np.arange(10), s=180, cmap=c2)
c3 = categorical_cmap(5, 4, cmap="tab10")
plt.scatter(np.arange(20),np.ones(20)-1, c=np.arange(20), s=180, cmap=c3)
plt.margins(y=0.3)
plt.xticks([])
plt.yticks([0,1,2],["(5, 4)", "(2, 5)", "(4, 3)"])
plt.show()
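If varying the alpha channel, as the question originally asked, is preferred over changing saturation, a generic version without the if/elif chain could look like the sketch below. The function name alpha_palette and its alpha_max/alpha_min parameters are hypothetical, and it assumes one tab10 base color per group with the alpha fading linearly across the subgroups.

```python
import numpy as np
import matplotlib.pyplot as plt

def alpha_palette(n_groups, n_sub, cmap="tab10", alpha_max=0.9, alpha_min=0.5):
    """Return n_groups * n_sub RGBA tuples: one base color per group,
    with alpha stepping linearly from alpha_max down to alpha_min."""
    base = plt.get_cmap(cmap)
    if n_groups > base.N:
        raise ValueError("Too many groups for colormap.")
    alphas = np.linspace(alpha_max, alpha_min, n_sub)
    # Keep the RGB part of the group's color and swap in each alpha level
    return [tuple(base(g)[:3]) + (float(a),)
            for g in range(n_groups) for a in alphas]

# 3 groups with 3 subgroups each, as in the question
col = alpha_palette(3, 3)
```

Each entry of col can be passed as color=col[index] in the plotting loop of the question.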
