Code
library(dplyr)
library(ggplot2)
library(patchwork)
library(fixest)
color_base <- "#007AB7"
color_accent <- "#E69F00"
set.seed(1234)library(dplyr)
library(ggplot2)
library(patchwork)
library(fixest)
color_base <- "#007AB7"
color_accent <- "#E69F00"
set.seed(1234)Consider a hypothetical setting where a policy is implemented in 2010 that reduces human capital loss due to childbirth \(\delta\) by a fraction \(\tau\) for treated groups.
This policy affects not only the market outcome after childbirth but also the timing of childbirth itself. Specifically, groups that are treated under the policy tend to have children earlier due to the reduced penalty.
1. Group Types and Policy
For groups \(g=1,\dots,G\), let the type \(U_g\) and policy \(W_g\) be defined as
\[ U_g \sim \mathcal N(0,\sigma_u^2),\qquad W_g \sim \mathrm{Bernoulli}\left(1/2\right). \]
By construction, \(U_g \perp W_g\). Note that this \(W_g\) is randomly assigned at the group level.
2. Policy-Affected Event Timing
At each age \(a = 20, \dots, 40\), individual \(i\) in group \(g\) who has not yet had a child gives birth with conditional probability (hazard):
\[ h_{g, i, a} = \text{logit}^{-1}\left(\beta_0 - \lambda |a - \mu_g| + \alpha_2 W_g \cdot \mathbb{1}\{a + B_{g,i} \ge 2010\}\right), \]
where \(\mu_g = 30 + \alpha_1 U_g\) is the group-specific peak age and \(\alpha_2 > 0\) represents the policy effect that uniformly increases the hazard at all ages. The age at first childbirth \(A_{g,i}\) is the first age at which the event occurs; individuals who do not give birth by age 40 are treated as childless.
3. Market Outcome
Human capital \(H_{g, i, t}\) is defined as
\[ H_{g, i, t} = t - B_{g, i} - 20 - \delta(1 - \tau W_g \cdot \mathbb{1}\{t \ge 2010\})\mathbb{1}\left\{t \ge E_{g, i}\right\}. \]
Wage is generated as
\[ \log w_{g, i, t} = \gamma_{g, i} + \gamma_1 H_{g, i, t} + \gamma_2 H_{g, i, t}^2 + \nu_{g, i, t}, \]
where \(\gamma_{g, i} \sim \mathcal{N}(\gamma_0 U_g, \sigma_{\gamma}^2)\) and \(\nu_{g, i, t} \sim \mathcal{N}(0, \sigma_{\nu}^2)\).
N = 100
n_g <- 50
b_start <- 1960
b_end <- 2000
t_start <- 2000
t_end <- 2025
n_b <- b_end - b_start + 1
n_t <- t_end - t_start + 1
sigma_u <- 1.0
alpha1 <- 2.0
alpha2 <- 0.5
beta0 <- -0.5
lambda <- 0.5
ages <- 20:40
gamma0 <- 0.1
gamma1 <- 0.07
gamma2 <- -0.001
sigma_u <- 1.0
sigma_gamma <- 0.3
sigma_nu <- 0.1
delta <- 3.0
tau <- 0.5
df <- tibble(
g = 1:n_g,
U = rnorm(n_g, 0, sigma_u),
W = rep(c(FALSE, TRUE), each = n_g %/% 2),
) |>
tidyr::uncount(weights = N * n_b, .id = "i") |>
mutate(
B = rep(b_start:b_end, each = n_g * N),
gamma_gi = rnorm(n(), gamma0 * U, sigma_gamma)
) |>
mutate(
A = {
u_val <- U[1]
w_val <- W[1]
mu_g <- 30 + alpha1 * u_val
n_ind <- n()
A_out <- rep(NA_integer_, n_ind)
for (a in ages) {
idx <- which(is.na(A_out))
if (length(idx) == 0) {
break
}
logit_h <- beta0 -
lambda * abs(a - mu_g) +
alpha2 * w_val * (a + B[idx] >= 2010)
h <- plogis(logit_h)
born <- runif(length(idx)) < h
A_out[idx[born]] <- a
}
A_out
},
.by = g
) |>
filter(!is.na(A)) |>
tidyr::uncount(weights = n_t, .id = "t") |>
mutate(
id = N * (g - 1) + i,
t = t_start + t - 1,
E = A + B,
k = t - E,
H = t - B - 20 - delta * (1 - tau * W * (t >= 2010)) * (k >= 0),
Y = exp(gamma_gi + gamma1 * H + gamma2 * H^2 + rnorm(n(), 0, sigma_nu)),
age = t - B,
W = W & t >= 2010
) |>
filter(between(age, 20, 60))df |>
mutate(
lbl = factor(
ifelse(W, "Treated", "Untreated"),
levels = c("Treated", "Untreated")
)
) |>
ggplot(aes(x = A, color = lbl)) +
geom_freqpoly(aes(y = after_stat(density)), bins = length(ages)) +
scale_color_manual(values = c(color_base, color_accent)) +
labs(x = "Age at First Childbirth", y = "Density", color = NULL) +
theme_minimal() +
theme(
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.position = "inside",
legend.position.inside = c(0.9, 0.5)
)