Emory University, 2024
Notation
Still assumes the same underlying model:
\(y_{it} = \delta D_{it} + \gamma_{i} + \gamma_{t} + \epsilon_{it}\) for \(t=1,...,T\) and \(i=1,...,N\)
Most common approach (default in most statistical software)
Equivalent to demeaned model: \[y_{it} - \bar{y}_{i} = \delta (D_{it} - \bar{D}_{i}) + (\gamma_{i} - \bar{\gamma}_{i}) + (\gamma_{t} - \bar{\gamma}) + (\epsilon_{it} - \bar{\epsilon}_{i})\]
\(\gamma_{i} - \bar{\gamma}_{i} = 0\) since \(\gamma_{i}\) is time-invariant
Requires strict exogeneity (the error \(\epsilon_{it}\) must be uncorrelated with treatment \(D_{is}\) in all time periods \(s\), not just period \(t\))
\(y_{it} = \delta D_{it} + \gamma_{i} + \gamma_{t} + \epsilon_{it}\) for \(t=1,...,T\) and \(i=1,...,N\)
Instead of subtracting the mean, subtract the prior period values \[y_{it} - y_{i,t-1} = \delta(D_{it} - D_{i,t-1}) + (\gamma_{i} - \gamma_{i}) + (\gamma_{t} - \gamma_{t-1}) + (\epsilon_{it} - \epsilon_{i,t-1})\]
Requires exogeneity of \(\epsilon_{it}\) and \(D_{it}\) only for time \(t\) and \(t-1\) (weaker assumption than within estimator)
Sometimes useful to estimate both FE and FD just as a check
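As a quick illustration of that check, here is a minimal simulated-data sketch (all object names are hypothetical): when strict exogeneity holds, the within and first-difference estimators both recover the same \(\delta\), so a large gap between the two in real data can be a useful warning sign.
# Sketch: simulate a balanced panel where strict exogeneity holds,
# then compare the within (FE) and first-difference (FD) estimates of delta
library(fixest)
library(plm)
set.seed(123)
n.units <- 200
n.periods <- 10
sim.dat <- expand.grid(id = 1:n.units, t = 1:n.periods)
sim.dat$gamma_i <- rnorm(n.units)[sim.dat$id]     # unit fixed effect
sim.dat$gamma_t <- rnorm(n.periods)[sim.dat$t]    # time fixed effect
sim.dat$d <- rbinom(nrow(sim.dat), 1, 0.5)        # treatment, exogenous by construction
sim.dat$y <- 1.5 * sim.dat$d + sim.dat$gamma_i + sim.dat$gamma_t + rnorm(nrow(sim.dat))
fe.est <- feols(y ~ d | id + t, data = sim.dat)
fd.est <- plm(y ~ 0 + d, model = "fd", index = c("id", "t"), data = sim.dat)
c(FE = coef(fe.est)["d"], FD = coef(fd.est)["d"]) # both should be close to the true delta of 1.5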
library(tidyverse)
library(fixest)
library(modelsummary)
library(causaldata)
reg.dat <- causaldata::gapminder %>%
mutate(lgdp_pc=log(gdpPercap))
m1 <- feols(lifeExp ~ lgdp_pc | country, data=reg.dat)
modelsummary(list("Default FE"=m1),
shape=term + statistic ~ model,
gof_map=NA,
coef_rename=c("lgdp_pc"="Log GDP per Capita"))
| | Default FE |
|---|---|
| Log GDP per Capita | 9.769 |
| | (0.702) |
library(lmtest)
library(tidyverse)
reg.dat <- causaldata::gapminder %>%
group_by(country) %>%
mutate(lgdp_pc=log(gdpPercap),
lgdp_pc=lgdp_pc - mean(lgdp_pc, na.rm=TRUE),
lifeExp=lifeExp - mean(lifeExp, na.rm=TRUE))
m2 <- lm(lifeExp~ 0 + lgdp_pc , data=reg.dat)
modelsummary(list("Default FE"=m1, "Manual FE"=m2),
shape=term + statistic ~ model,
gof_map=NA,
coef_rename=c("lgdp_pc"="Log GDP per Capita"),
vcov = ~country)
| | Default FE | Manual FE |
|---|---|---|
| Log GDP per Capita | 9.769 | 9.769 |
| | (0.702) | (0.701) |
Note: feols defaults to clustering at the level of the (first) fixed effect, while lm requires our input.
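For instance, a minimal sketch of clustering the lm() fit by country after the fact, using sandwich::vcovCL with lmtest::coeftest (this assumes a sandwich version that accepts a cluster formula):
# Cluster the manual-FE lm() standard errors by country after estimation
library(lmtest)
library(sandwich)
coeftest(m2, vcov = vcovCL, cluster = ~country)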
library(plm)
reg.dat <- causaldata::gapminder %>%
mutate(lgdp_pc=log(gdpPercap))
m3 <- plm(lifeExp ~ 0 + lgdp_pc, model="fd", index=c("country","year"), data=reg.dat)
modelsummary(list("Default FE"=m1, "Manual FE"=m2, "Default FD"=m3),
shape=term + statistic ~ model,
gof_map=NA,
coef_rename=c("lgdp_pc"="Log GDP per Capita"))
| | Default FE | Manual FE | Default FD |
|---|---|---|---|
| Log GDP per Capita | 9.769 | 9.769 | 5.290 |
| | (0.702) | (0.284) | (0.291) |
reg.dat <- causaldata::gapminder %>%
mutate(lgdp_pc=log(gdpPercap)) %>%
group_by(country) %>%
arrange(country, year) %>%
mutate(fd_lifeexp=lifeExp - dplyr::lag(lifeExp),
lgdp_pc=lgdp_pc - dplyr::lag(lgdp_pc)) %>%
na.omit()
m4 <- lm(fd_lifeexp~ 0 + lgdp_pc , data=reg.dat)
modelsummary(list("Default FE"=m1, "Manual FE"=m2, "Default FD"=m3, "Manual FD"=m4),
shape=term + statistic ~ model,
gof_map=NA,
coef_rename=c("lgdp_pc"="Log GDP per Capita"))
| | Default FE | Manual FE | Default FD | Manual FD |
|---|---|---|---|---|
| Log GDP per Capita | 9.769 | 9.769 | 5.290 | 5.290 |
| | (0.702) | (0.284) | (0.291) | (0.291) |
# Re-estimate FE on the same sample used for FD (drops each country's first year)
reg.dat2 <- causaldata::gapminder %>%
mutate(lgdp_pc=log(gdpPercap)) %>%
inner_join(reg.dat %>% select(country, year), by=c("country","year"))
m5 <- feols(lifeExp ~ lgdp_pc | country, data=reg.dat2)
modelsummary(list("Default FE"=m5, "Default FD"=m3, "Manual FD"=m4),
shape=term + statistic ~ model,
gof_map=NA,
coef_rename=c("lgdp_pc"="Log GDP per Capita"))
| | Default FE | Default FD | Manual FD |
|---|---|---|---|
| Log GDP per Capita | 8.929 | 5.290 | 5.290 |
| | (0.741) | (0.291) | (0.291) |
Don’t want to read too much into this, but…
Want to estimate \(ATT = E[Y_{1}(1)- Y_{0}(1) | D=1]\)
| | Pre-Period | Post-Period |
|---|---|---|
| Treatment | \(E(Y_{0}(0) \vert D=1)\) | \(E(Y_{1}(1) \vert D=1)\) |
| Control | \(E(Y_{0}(0) \vert D=0)\) | \(E(Y_{0}(1) \vert D=0)\) |
Problem: We don’t see \(E[Y_{0}(1)|D=1]\)
Want to estimate \(ATT = E[Y_{1}(1)- Y_{0}(1) | D=1]\)
| | Pre-Period | Post-Period |
|---|---|---|
| Treatment | \(E(Y_{0}(0) \vert D=1)\) | \(E(Y_{1}(1) \vert D=1)\) |
| Control | \(E(Y_{0}(0) \vert D=0)\) | \(E(Y_{0}(1) \vert D=0)\) |
Strategy 1: Estimate \(E[Y_{0}(1)|D=1]\) using \(E[Y_{0}(0)|D=1]\) (before treatment outcome used to estimate post-treatment)
Want to estimate \(ATT = E[Y_{1}(1)- Y_{0}(1) | D=1]\)
| | Pre-Period | Post-Period |
|---|---|---|
| Treatment | \(E(Y_{0}(0) \vert D=1)\) | \(E(Y_{1}(1) \vert D=1)\) |
| Control | \(E(Y_{0}(0) \vert D=0)\) | \(E(Y_{0}(1) \vert D=0)\) |
Strategy 2: Estimate \(E[Y_{0}(1)|D=1]\) using \(E[Y_{0}(1)|D=0]\) (control group used to predict outcome for treatment)
Want to estimate \(ATT = E[Y_{1}(1)- Y_{0}(1) | D=1]\)
| | Pre-Period | Post-Period |
|---|---|---|
| Treatment | \(E(Y_{0}(0) \vert D=1)\) | \(E(Y_{1}(1) \vert D=1)\) |
| Control | \(E(Y_{0}(0) \vert D=0)\) | \(E(Y_{0}(1) \vert D=0)\) |
Strategy 3: Difference-in-differences (DD)
Estimate \(E[Y_{1}(1)|D=1] - E[Y_{0}(1)|D=1]\) using \(E[Y_{0}(1)|D=0] - E[Y_{0}(0)|D=0]\) (pre-post difference in control group used to predict difference for treatment group)
Key identifying assumption is that of parallel trends
\[E[Y_{0}(1) - Y_{0}(0)|D=1] = E[Y_{0}(1) - Y_{0}(0)|D=0]\]
\[\begin{aligned} E[Y_{1}(1) - Y_{0}(1)|D=1] &= \left( E[Y(1)|D=1] - E[Y(1)|D=0] \right) \\ &\quad - \left( E[Y(0)|D=1] - E[Y(0)|D=0]\right) \end{aligned}\]
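To see why, add and subtract the unobserved counterfactual mean \(E[Y_{0}(1)|D=1]\) in the double difference:
\[\begin{aligned} &\left( E[Y(1)|D=1] - E[Y(1)|D=0] \right) - \left( E[Y(0)|D=1] - E[Y(0)|D=0] \right) \\ &= \underbrace{E[Y_{1}(1) - Y_{0}(1)|D=1]}_{ATT} + \underbrace{E[Y_{0}(1) - Y_{0}(0)|D=1] - E[Y_{0}(1) - Y_{0}(0)|D=0]}_{=0 \text{ under parallel trends}} \end{aligned}\]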
\[y_{it} = \alpha + \beta D_{i} + \lambda \times Post_{t} + \delta \times D_{i} \times Post_{t} + \varepsilon_{it}\]
| | Pre | Post | Post - Pre |
|---|---|---|---|
| Treatment | \(\alpha + \beta\) | \(\alpha + \beta + \lambda + \delta\) | \(\lambda + \delta\) |
| Control | \(\alpha\) | \(\alpha + \lambda\) | \(\lambda\) |
| Diff | \(\beta\) | \(\beta + \delta\) | \(\delta\) |
| Group | Period | Mean |
|---|---|---|
| Control | Pre | 1.483339 |
| Control | Post | 2.998502 |
| Treated | Pre | 4.515737 |
| Treated | Post | 12.034735 |
In this example, the treated group mean rises by \(12.0347 - 4.5157 = 7.5190\), while the control group mean rises by \(2.9985 - 1.4833 = 1.5152\).
So the ATT is \(7.5190 - 1.5152 = 6.0038\)
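A minimal sketch of that difference-in-means calculation in R (cell means hard-coded from the table above; object names hypothetical):
# Recover the 2x2 DD estimate by hand from the four cell means
library(tidyverse)
cell.means <- tribble(
  ~group,    ~period, ~mean,
  "Control", "Pre",    1.483339,
  "Control", "Post",   2.998502,
  "Treated", "Pre",    4.515737,
  "Treated", "Post",  12.034735
)
cell.means %>%
  pivot_wider(names_from = period, values_from = mean) %>%
  mutate(change = Post - Pre) %>%
  summarize(att = change[group == "Treated"] - change[group == "Control"])
#> att = 6.003835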
Stata
insheet using "data/acs_medicaid.txt", clear
gen perc_unins=uninsured/adult_pop
keep if expand_year=="2014" | expand_year=="NA"
drop if expand_ever=="NA"
collapse (mean) perc_unins, by(year expand_ever)
graph twoway (connected perc_unins year if expand_ever=="FALSE", color(black) lpattern(solid)) ///
(connected perc_unins year if expand_ever=="TRUE", color(black) lpattern(dash)), ///
xline(2013.5) ///
ytitle("Fraction Uninsured") xtitle("Year") legend(off) text(0.15 2017 "Non-expansion", place(e)) text(0.08 2017 "Expansion", place(e))
R
library(tidyverse)
mcaid.data <- read_tsv("../files/data/acs/acs_medicaid.txt")
ins.plot.dat <- mcaid.data %>% filter(expand_year==2014 | is.na(expand_year), !is.na(expand_ever)) %>%
mutate(perc_unins=uninsured/adult_pop) %>%
group_by(expand_ever, year) %>% summarize(mean=mean(perc_unins))
ins.plot <- ggplot(data=ins.plot.dat, aes(x=year,y=mean,group=expand_ever,linetype=expand_ever)) +
geom_line() + geom_point() + theme_bw() +
geom_vline(xintercept=2013.5, color="red") +
geom_text(data = ins.plot.dat %>% filter(year == 2016),
aes(label = c("Non-expansion","Expansion"),
x = year + 1,
y = mean)) +
guides(linetype="none") +
labs(
x="Year",
y="Fraction Uninsured",
title="Share of Uninsured over Time"
)
Interested in \(\delta\) from:
\[y_{it} = \alpha + \beta \times Post_{t} + \lambda \times Expand_{i} + \delta \times Post_{t} \times Expand_{i} + \varepsilon_{it}\]
R
library(tidyverse)
library(modelsummary)
mcaid.data <- read_tsv("../files/data/acs/acs_medicaid.txt")
reg.dat <- mcaid.data %>% filter(expand_year==2014 | is.na(expand_year), !is.na(expand_ever)) %>%
mutate(perc_unins=uninsured/adult_pop,
post = (year>=2014),
treat=post*expand_ever)
dd.ins.reg <- lm(perc_unins ~ post + expand_ever + post*expand_ever, data=reg.dat)
Want to estimate \(\delta\):
\[y_{it} = \alpha + \delta D_{it} + \gamma_{i} + \gamma_{t} + \varepsilon_{it},\]
where \(\gamma_{i}\) and \(\gamma_{t}\) denote a set of unit \(i\) and time period \(t\) dummy variables (or fixed effects).
2x2 DD
library(tidyverse)
library(modelsummary)
mcaid.data <- read_tsv("../files/data/acs/acs_medicaid.txt")
reg.dat <- mcaid.data %>% filter(expand_year==2014 | is.na(expand_year), !is.na(expand_ever)) %>%
mutate(perc_unins=uninsured/adult_pop,
post = (year>=2014),
treat=post*expand_ever)
m.dd <- lm(perc_unins ~ post + expand_ever + treat, data=reg.dat)
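For comparison, a minimal sketch of the TWFE version of the same regression with feols (reusing reg.dat and treat from the code above; m.twfe is my own object name):
# TWFE version of the DD regression: state and year fixed effects
# with a single treatment dummy, clustered by state
library(fixest)
m.twfe <- feols(perc_unins ~ treat | State + year,
                cluster = ~State,
                data = reg.dat)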
An “event study” (a poorly named but standard term here) is a dynamic version of the DD specification:
Estimate something akin to… \[y_{it} = \gamma_{i} + \gamma_{t} + \sum_{\tau = -q}^{-2}\delta_{\tau} D_{i \tau} + \sum_{\tau=0}^{m} \delta_{\tau}D_{i \tau} + \beta x_{it} + \epsilon_{it},\]
where \(q\) captures the number of periods before the treatment occurs and \(m\) captures periods after treatment occurs.
Essentially two “flavors” of event studies: common treatment timing and staggered treatment timing
Stata
ssc install reghdfe
insheet using "data/acs_medicaid.txt", clear
gen perc_unins=uninsured/adult_pop
keep if expand_year=="2014" | expand_year=="NA"
drop if expand_ever=="NA"
gen post=(year>=2014)
gen treat=(expand_ever=="TRUE")
gen treat_post=(expand=="TRUE")
reghdfe perc_unins treat##ib2013.year, absorb(state)
gen coef = .
gen se = .
forvalues i = 2012(1)2018 {
replace coef = _b[1.treat#`i'.year] if year == `i'
replace se = _se[1.treat#`i'.year] if year == `i'
}
* Make confidence intervals
gen ci_top = coef+1.96*se
gen ci_bottom = coef - 1.96*se
* Limit ourselves to one observation per year
keep year coef se ci_*
duplicates drop
* Create connected scatterplot of coefficients
* with CIs included with rcap
* and a line at 0 from function
twoway (sc coef year, connect(line)) (rcap ci_top ci_bottom year) ///
(function y = 0, range(2012 2018)), xtitle("Year") ///
caption("Estimates and 95% CI from Event Study")
R
library(tidyverse)
library(modelsummary)
library(fixest)
mcaid.data <- read_tsv("../files/data/acs/acs_medicaid.txt")
reg.dat <- mcaid.data %>%
filter(expand_year==2014 | is.na(expand_year), !is.na(expand_ever)) %>%
mutate(perc_unins=uninsured/adult_pop,
post = (year>=2014),
treat=post*expand_ever)
mod.twfe <- feols(perc_unins~i(year, expand_ever, ref=2013) | State + year,
cluster=~State,
data=reg.dat)
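To visualize these estimates, one option is fixest's built-in coefficient plot (a sketch; the label text is my own):
# Plot the year-by-expansion interaction coefficients with 95% CIs
iplot(mod.twfe,
      xlab = "Year",
      main = "Event study with common treatment timing")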
Stata
ssc install reghdfe
insheet using "data/acs_medicaid.txt", clear
gen perc_unins=uninsured/adult_pop
drop if expand_ever=="NA"
replace expand_year="." if expand_year=="NA"
destring expand_year, replace
gen event_time=year-expand_year
replace event_time=-1 if event_time==.
forvalues l = 0/4 {
gen L`l'event = (event_time==`l')
}
forvalues l = 1/2 {
gen F`l'event = (event_time==-`l')
}
gen F3event=(event_time<=-3)
reghdfe perc_unins F3event F2event L0event L1event L2event L3event L4event, absorb(state year) cluster(state)
gen coef = .
gen se = .
forvalues i = 2(1)3 {
replace coef = _b[F`i'event] if F`i'event==1
replace se = _se[F`i'event] if F`i'event==1
}
forvalues i = 0(1)4 {
replace coef = _b[L`i'event] if L`i'event==1
replace se = _se[L`i'event] if L`i'event==1
}
replace coef = 0 if F1event==1
replace se=0 if F1event==1
* Make confidence intervals
gen ci_top = coef+1.96*se
gen ci_bottom = coef - 1.96*se
* Limit ourselves to one observation per year
keep if event_time>=-3 & event_time<=4
keep event_time coef se ci_*
duplicates drop
* Create connected scatterplot of coefficients
* with CIs included with rcap
* and a line at 0 from function
sort event_time
twoway (sc coef event_time, connect(line)) (rcap ci_top ci_bottom event_time) ///
(function y = 0, range(-3 4)), xtitle("Time") ///
caption("Estimates and 95% CI from Event Study") xlabel(-3(1)4)
R
library(tidyverse)
library(modelsummary)
library(fixest)
mcaid.data <- read_tsv("../files/data/acs/acs_medicaid.txt")
reg.dat <- mcaid.data %>%
filter(!is.na(expand_ever)) %>%
mutate(perc_unins=uninsured/adult_pop,
post = (year>=2014),
treat=post*expand_ever,
time_to_treat = ifelse(expand_ever==FALSE, 0, year-expand_year),
time_to_treat = ifelse(time_to_treat < -3, -3, time_to_treat))
mod.twfe <- feols(perc_unins~i(time_to_treat, expand_ever, ref=-1) | State + year,
cluster=~State,
data=reg.dat)
Consider standard TWFE specification with a single treatment coefficient, \[y_{it} = \alpha + \delta D_{it} + \gamma_{i} + \gamma_{t} + \varepsilon_{it}.\] We can decompose \(\hat{\delta}\) into three things:
\[\hat{\delta}_{twfe} = \text{VW} ATT + \text{VW} PT - \Delta ATT\]
where \(\text{VW}ATT\) is a variance-weighted average of the group-specific ATTs, \(\text{VW}PT\) collects variance-weighted deviations from parallel trends, and \(\Delta ATT\) reflects changes in treatment effects over time, which contaminate comparisons that use already-treated units as controls
Problems come from heterogeneous effects and staggered treatment timing
Only consider “clean” comparisons:
\[\small \hat{\delta}^{reg} = E[Y_{t=1}|D=1] - \left[ E[Y_{t=0}|D=1] + \frac{1}{n^{T}} \sum_{i \in N_{d=1}} \left(\hat{\mu}_{d=0, t=1}(X_{i}) - \hat{\mu}_{d=0, t=0}(X_{i})\right) \right],\]
where \(\hat{\mu}_{d,t}\) is the prediction from a regression among the untreated group using baseline covariates.
\[\hat{\delta}^{ipw} = E \left[\frac{D - \hat{p}(D=1|X)}{1-\hat{p}(D=1|X)} \frac{Y_{t=1} - Y_{t=0}}{P(D=1)} \right]\]
\[\scriptsize \hat{\delta}^{dr} = E \left[ \left(\frac{D}{P(D=1)} - \frac{\frac{\hat{p}(X)(1-D)}{1-\hat{p}(X)}}{E\left[\frac{\hat{p}(X)(1-D)}{1-\hat{p}(X)} \right]} \right) \left(E[Y_{t=1}|D=1] - E[Y_{t=0}|D=1] - \Delta \hat{\mu}_{0}(X)\right) \right]\]
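For intuition, here is a minimal hand-rolled sketch of the IPW estimator above, assuming a hypothetical wide two-period panel df with pre/post outcomes y0 and y1, treatment dummy d, and covariates x1 and x2 (in practice these estimators are implemented in the DRDID package):
# Hand-rolled IPW DiD estimator (hypothetical data frame 'df' in wide form)
ps.mod  <- glm(d ~ x1 + x2, family = binomial, data = df)  # propensity score
p.hat   <- predict(ps.mod, type = "response")
w       <- (df$d - p.hat) / (1 - p.hat)                    # IPW weights
att.ipw <- mean(w * (df$y1 - df$y0)) / mean(df$d)          # E[w (Y1 - Y0)] / P(D=1)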
I’ll organize this into three types of estimators:
Group-specific treatment effects:
\[ATT(g,t) = E[Y_{1,t} - Y_{0,t} | G_{g}=1],\] where \(G\) denotes all feasible groups or cohorts (e.g., \(g=1\) could mean states expanding in 2014, \(g=2\) denotes states expanding in 2015, etc.)
With this \((g,t)\) notation, Callaway and Sant'Anna (CS) show:
\[\scriptsize ATT(g,t; \tau)^{DR} = E \left[ \left(\frac{G_{g}}{P(G_{g}=1)} - \frac{\frac{\hat{p}_{g}(X)C}{1-\hat{p}_{g}(X)}}{E\left[\frac{\hat{p}_{g}(X)C}{1-\hat{p}_{g}(X)} \right]} \right) \left(E[Y_{t}|G_{g}=1] - E[Y_{g-\tau-1}|G_{g}=1] - \Delta \hat{\mu}_{g,t,\tau}(X)\right) \right]\]
Finally, aggregate all of the \((g,t)\) treatment effects:
\[\hat{\delta} = \sum_{g \in \mathcal{G}} \sum_{t=2}^{\mathcal{T}} w(g,t) \times ATT(g,t)\]
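In the did package (used in the R code below), different choices of \(w(g,t)\) correspond to the type argument of aggte(); a sketch, assuming an att_gt() object like the mod.cs estimated below:
# Common aggregation schemes for the group-time ATTs (mod.cs estimated below)
aggte(mod.cs, type = "simple")    # one overall weighted-average ATT
aggte(mod.cs, type = "group")     # one ATT per treatment cohort g
aggte(mod.cs, type = "calendar")  # one ATT per calendar year t
aggte(mod.cs, type = "dynamic")   # event-study aggregation by time since treatment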
Stata
ssc install csdid
ssc install event_plot
ssc install drdid
insheet using "data/acs_medicaid.txt", clear
gen perc_unins=uninsured/adult_pop
egen stategroup=group(state)
drop if expand_ever=="NA"
replace expand_year="0" if expand_year=="NA"
destring expand_year, replace
csdid perc_unins, ivar(stategroup) time(year) gvar(expand_year) notyet
estat event, estore(cs)
event_plot cs, default_look graph_opt(xtitle("Periods since the event") ytitle("Average causal effect") xlabel(-6(1)4) title("Callaway and Sant'Anna (2020)")) stub_lag(T+#) stub_lead(T-#) together
R
library(tidyverse)
library(did)
library(DRDID)
mcaid.data <- read_tsv("../files/data/acs/acs_medicaid.txt")
reg.dat <- mcaid.data %>%
filter(!is.na(expand_ever)) %>%
mutate(perc_unins=uninsured/adult_pop,
post = (year>=2014),
treat=post*expand_ever,
expand_year=ifelse(is.na(expand_year),0,expand_year)) %>%
filter(!is.na(perc_unins)) %>%
group_by(State) %>%
mutate(stategroup=cur_group_id()) %>% ungroup()
mod.cs <- att_gt(yname="perc_unins", tname="year", idname="stategroup",
gname="expand_year",
data=reg.dat, panel=TRUE, est_method="dr",
allow_unbalanced_panel=TRUE)
mod.cs.event <- aggte(mod.cs, type="dynamic")
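The did package also ships a plotting helper for these objects (a sketch):
# Plot the group-time ATTs and the event-study aggregation
ggdid(mod.cs)
ggdid(mod.cs.event)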
\[y_{it} = \gamma_{i} + \gamma_{t} + \sum_{g} \sum_{\tau \neq -1} \delta_{g \tau} \times \text{1}(i \in C_{g}) \times D_{it}^{\tau} + \beta x_{it} + \epsilon_{it}\]
Stata
ssc install eventstudyinteract
ssc install avar
ssc install event_plot
insheet using "data/acs_medicaid.txt", clear
gen perc_unins=uninsured/adult_pop
drop if expand_ever=="NA"
egen stategroup=group(state)
replace expand_year="." if expand_year=="NA"
destring expand_year, replace
gen event_time=year-expand_year
gen nevertreated=(event_time==.)
forvalues l = 0/4 {
gen L`l'event = (event_time==`l')
}
forvalues l = 1/2 {
gen F`l'event = (event_time==-`l')
}
gen F3event=(event_time<=-3)
eventstudyinteract perc_unins F3event F2event L0event L1event L2event L3event L4event, vce(cluster stategroup) absorb(stategroup year) cohort(expand_year) control_cohort(nevertreated)
event_plot e(b_iw)#e(V_iw), default_look graph_opt(xtitle("Periods since the event") ytitle("Average causal effect") xlabel(-3(1)4) title("Sun and Abraham (2020)")) stub_lag(L#event) stub_lead(F#event) plottype(scatter) ciplottype(rcap) together
R
library(tidyverse)
library(modelsummary)
library(fixest)
mcaid.data <- read_tsv("../files/data/acs/acs_medicaid.txt")
reg.dat <- mcaid.data %>%
filter(!is.na(expand_ever)) %>%
mutate(perc_unins=uninsured/adult_pop,
post = (year>=2014),
treat=post*expand_ever,
expand_year = ifelse(expand_ever==FALSE, 10000, expand_year),
time_to_treat = ifelse(expand_ever==FALSE, -1, year-expand_year),
time_to_treat = ifelse(time_to_treat < -4, -4, time_to_treat))
mod.sa <- feols(perc_unins~sunab(expand_year, time_to_treat) | State + year,
cluster=~State,
data=reg.dat)
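Because sunab() keeps the full set of cohort-by-period interactions, fixest can aggregate or plot them directly after estimation (a sketch):
# Aggregate the Sun and Abraham interactions into a single ATT...
summary(mod.sa, agg = "att")
# ...or plot the event-study coefficients
iplot(mod.sa, xlab = "Time to treatment")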
New paper from Callaway, Goodman-Bacon, and Sant’Anna also looks at DD with continuous treatment
Stata
ssc install did_multiplegt
ssc install event_plot
insheet using "data/acs_medicaid.txt", clear
gen perc_unins=uninsured/adult_pop
drop if expand_ever=="NA"
egen stategroup=group(state)
replace expand_year="." if expand_year=="NA"
destring expand_year, replace
gen event_time=year-expand_year
gen nevertreated=(event_time==.)
gen treat=(event_time>=0 & event_time!=.)
did_multiplegt perc_unins stategroup year treat, robust_dynamic dynamic(4) placebo(3) breps(100) cluster(stategroup)
event_plot e(estimates)#e(variances), default_look graph_opt(xtitle("Periods since the event") ytitle("Average causal effect") ///
title("de Chaisemartin and D'Haultfoeuille (2020)") xlabel(-3(1)4)) stub_lag(Effect_#) stub_lead(Placebo_#) together
R (not the same as in Stata)
library(DIDmultiplegt)
library(tidyverse)
mcaid.data <- read_tsv("../files/data/acs/acs_medicaid.txt")
reg.dat <- mcaid.data %>%
filter(!is.na(expand_ever)) %>%
mutate(perc_unins=uninsured/adult_pop,
treat=case_when(
expand_ever==FALSE ~ 0,
expand_ever==TRUE & expand_year<year ~ 0,
expand_ever==TRUE & expand_year>=year ~ 1))
mod.ch <- did_multiplegt(df=reg.dat, Y="perc_unins", G="State", T="year", D="treat",
placebo=4, dynamic=5, brep=50, cluster='State', covariance=TRUE,
average_effect="simple")
library(broom)
# Create a tidier for "multiplegt" objects
tidy.did_multiplegt = function(x, level = 0.95) {
ests = x[grepl("^placebo_|^effect|^dynamic_", names(x))]
ret = data.frame(
term = names(ests),
estimate = as.numeric(ests),
std.error = as.numeric(x[grepl("^se_placebo|^se_effect|^se_dynamic", names(x))]),
N = as.numeric(x[grepl("^N_placebo|^N_effect|^N_dynamic", names(x))])
) |>
# For CIs we'll assume standard normal distribution
within({
conf.low = estimate - std.error*(qnorm(1-(1-level)/2))
conf.high = estimate + std.error*(qnorm(1-(1-level)/2))
})
return(ret)
}
tidy.mod.ch <- tidy.did_multiplegt(mod.ch)
library(ggplot2)
theme_set(theme_minimal(base_family = "ArialNarrow")) # Optional
tidy.mod.ch |>
within({
term = gsub("^placebo_", "-", term)
term = gsub("^effect", "0", term)
term = gsub("^dynamic_", "", term)
term = as.integer(term)
}) |>
ggplot(aes(x = term, y = estimate, ymin = conf.low, ymax = conf.high)) +
geom_pointrange() +
labs(
x = "Time to treatment", y = "Effect size", title = "CH Event-study plot"
)
Some barriers to this estimator in practice (at least, as implemented in R right now)
\[y_{itg} = \sum_{\tau=-\kappa_{a}}^{\kappa_{b}} \delta_{\tau} \times D_{ig} \times 1(t-\omega_{g} = \tau) + \gamma_{ig} + \gamma_{\tau g} + \varepsilon_{itg}\]
Note: Estimate with GMM to account for first-stage prediction
Stata
did2s
R
library(did2s)
library(tidyverse)
mcaid.data <- read_tsv("../files/data/acs/acs_medicaid.txt")
reg.dat <- mcaid.data %>%
filter(!is.na(expand_ever)) %>%
mutate(perc_unins=uninsured/adult_pop,
post = (year>=2014),
treat=post*expand_ever,
expand_year = ifelse(expand_ever==FALSE, 10000, expand_year),
time_to_treat = ifelse(expand_ever==FALSE, -1, year-expand_year),
time_to_treat = ifelse(time_to_treat < -3, -3, time_to_treat))
mod.2s <- did2s(reg.dat, yname="perc_unins",
treatment="treat",
first_stage = ~ 0 | State + year,
second_stage = ~i(time_to_treat, ref=-1),
cluster_var="State")
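did2s returns a fixest object, so the usual fixest post-estimation tools apply (a sketch):
# Standard fixest tooling works on the did2s output
fixest::iplot(mod.2s, xlab = "Time to treatment",
              main = "Gardner two-stage DiD")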
Stata
did_imputation
R
library(didimputation)
library(tidyverse)
mcaid.data <- read_tsv("../files/data/acs/acs_medicaid.txt")
reg.dat <- mcaid.data %>%
filter(!is.na(expand_ever)) %>%
mutate(perc_unins=uninsured/adult_pop,
post = (year>=2014),
treat=post*expand_ever,
expand_year = ifelse(expand_ever==FALSE, 0, expand_year))
mod.bea <- did_imputation(reg.dat, yname="perc_unins",
gname="expand_year",
tname="year",
idname="State",
first_stage = ~ 0 | State + year,
cluster_var="State",
horizon=TRUE,
pretrends=-3:-1)
coef.bea <- mod.bea %>%
select(rel_year = term, estimate, std.error) %>%
mutate(
ci_lower = estimate - 1.96 * std.error,
ci_upper = estimate + 1.96 * std.error,
group = "Borusyak et al Imputation",
rel_year = as.numeric(rel_year)
)
ggplot(coef.bea) +
geom_hline(yintercept = 0, linetype = "dashed") +
geom_vline(xintercept = -0.5, linetype = "dashed") +
geom_linerange(mapping = aes(x = rel_year, ymin = ci_lower, ymax = ci_upper), color = "grey30") +
geom_point(mapping = aes(x = rel_year, y = estimate, color = group), size = 2) +
scale_x_continuous(breaks = -3:5, minor_breaks = NULL) +
scale_y_continuous(minor_breaks = NULL) +
labs(x = "Relative Time", y = "Estimate", color = NULL, title = NULL) +
theme_bw() + theme(legend.position="none")
Goodman-Bacon (2021) explores the problems but doesn’t really propose a solution (still very important work though!)
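That said, the decomposition itself can be computed with the bacondecomp package; a hedged sketch (assuming its bacon() interface and the balanced state-by-year panel used throughout), with treat redefined as an absorbing post-expansion dummy and my own object names:
# Hedged sketch: Goodman-Bacon decomposition of the static TWFE DD estimate
library(bacondecomp)
library(tidyverse)
bacon.dat <- mcaid.data %>%
  filter(!is.na(expand_ever)) %>%
  mutate(perc_unins = uninsured / adult_pop,
         treat = if_else(!is.na(expand_year) & year >= expand_year, 1, 0))
bacon.res <- bacon(perc_unins ~ treat,
                   data = bacon.dat,
                   id_var = "State",
                   time_var = "year")
# bacon.res lists each 2x2 comparison with its weight and DD estimate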
library(tidyverse)
library(modelsummary)
library(fixest)
library(broom)
mcaid.data <- read_tsv("../files/data/acs/acs_medicaid.txt")
reg.dat <- mcaid.data %>%
filter(!is.na(expand_ever)) %>%
mutate(perc_unins=uninsured/adult_pop,
post = (year>=2014),
treat=post*expand_ever,
time_to_treat = ifelse(expand_ever==FALSE, 0, year-expand_year),
time_to_treat = ifelse(time_to_treat < -3, -3, time_to_treat))
mod.twfe <- feols(perc_unins~i(time_to_treat, expand_ever, ref=-1) | State + year,
cluster=~State,
data=reg.dat)
coef.twfe <- tidy(mod.twfe) %>%
mutate(term=str_replace(term,"time_to_treat::",""),
term=str_replace(term,":expand_ever","")) %>%
rename(rel_year=term) %>%
select(rel_year, estimate, std.error) %>%
bind_rows(tibble(rel_year="-1", estimate=0, std.error=0)) %>%
mutate(
ci_lower = estimate - 1.96 * std.error,
ci_upper = estimate + 1.96 * std.error,
group = "TWFE",
rel_year = as.numeric(rel_year)
) %>%
filter(rel_year>=-3, rel_year<=5)
coef.2s <- tidy(mod.2s) %>%
mutate(term=str_replace(term,"time_to_treat::","")) %>%
rename(rel_year=term) %>%
select(rel_year, estimate, std.error) %>%
bind_rows(tibble(rel_year="-1", estimate=0, std.error=0)) %>%
mutate(
ci_lower = estimate - 1.96 * std.error,
ci_upper = estimate + 1.96 * std.error,
group = "Gardner 2SDiD",
rel_year = as.numeric(rel_year)
)
coef.sa <- tidy(mod.sa) %>%
mutate(term=str_replace(term,"time_to_treat::","")) %>%
rename(rel_year=term) %>%
select(rel_year, estimate, std.error) %>%
bind_rows(tibble(rel_year="-1", estimate=0, std.error=0)) %>%
mutate(
ci_lower = estimate - 1.96 * std.error,
ci_upper = estimate + 1.96 * std.error,
group = "Sun and Abraham",
rel_year = as.numeric(rel_year)
) %>%
filter(rel_year>=-3, rel_year<=5)
coef.cs <- tidy(mod.cs.event) %>%
select(rel_year=event.time, estimate, ci_lower=conf.low, ci_upper=conf.high) %>%
mutate(rel_year=as.numeric(rel_year),
group = "Callaway and Sant'Anna") %>%
filter(rel_year>=-3, rel_year<=5)
coef.cs <- as_tibble(coef.cs)
coef.all <- bind_rows(coef.twfe, coef.cs, coef.sa, coef.2s, coef.bea) %>%
select(rel_year, estimate, ci_lower, ci_upper, group)
ggplot(coef.all, aes(x=rel_year, y=estimate)) +
geom_point(aes(color=group), size = 2, position=position_dodge(width=0.7)) +
geom_linerange(aes(ymin = ci_lower, ymax = ci_upper), position=position_dodge2(width=0.7)) +
geom_hline(yintercept = 0, linetype = "dashed") +
geom_vline(xintercept = -0.5, linetype = "dashed") +
theme(legend.position="bottom") +
guides(color=guide_legend(ncol=2, title=NULL)) +
scale_x_continuous(breaks = -3:5, minor_breaks = NULL) +
scale_y_continuous(minor_breaks = NULL) +
labs(x = "Relative Time", y = "Estimate", color = NULL, title = NULL)
Similarities
Differences