3  Policy Evaluation for Child Penalty

Code
library(dplyr)
library(ggplot2)
library(patchwork)
library(fixest)

color_base <- "#007AB7"
color_accent <- "#E69F00"
set.seed(1234)

Consider a hypothetical setting where a policy is implemented in 2010 that reduces human capital loss due to childbirth \(\delta\) by a fraction \(\tau\) for treated groups.

This policy affects not only the market outcome after childbirth but also the timing of childbirth itself. Specifically, groups that are treated under the policy tend to have children earlier due to the reduced penalty.

3.1 Data Generating Process

Group Types and Policy

For groups \(g=1,\dots,G\), let the type \(U_g\) and policy \(W_g\) be defined as

\[ U_g \sim \mathcal N(0,\sigma_u^2),\qquad W_g \sim \mathrm{Uniform}(0, 1). \]

By construction, \(U_g \perp W_g\). Note that this \(W_g\) is randomly assigned at the group level.

Policy-Affected Event Timing

At each age \(a = 20, \dots, 40\), individual \(i\) in group \(g\) who has not yet had a child gives birth with conditional probability (hazard):

\[ h_{g, i, a} = \text{logit}^{-1}\left(\beta_0 - \lambda |a - \mu_g| + \alpha_2 W_g \cdot \mathbb{1}\{a + B_{g,i} \ge 2010\}\right), \]

where \(\mu_g = 30 + \alpha_1 U_g\) is the group-specific peak age and \(\alpha_2 > 0\) represents the policy effect that uniformly increases the hazard at all ages. The age at first childbirth \(A_{g,i}\) is the first age at which the event occurs; individuals who do not give birth by age 40 are excluded from the sample. If an individual’s first childbirth occurs after the end of the observation period (\(B_{g,i} + A_{g, i} > 2025\)), both \(A_{g,i}\) and \(E_{g,i}\) are treated as unobserved, and the relative time to event is set to \(k_{g,i,t} = -\infty\).

Market Outcome

Human capital \(H_{g, i, t}\) is defined as

\[ H_{g, i, t} = t - B_{g, i} - 20 - \delta(1 - \tau W_g \cdot \mathbb{1}\{t \ge 2010\})\mathbb{1}\left\{t \ge E_{g, i}\right\}. \]

Wage is generated as

\[ \log w_{g, i, t} = \gamma_{g, i} + \gamma_1 H_{g, i, t} + \gamma_2 H_{g, i, t}^2 + \nu_{g, i, t}, \]

where \(\gamma_{g, i} \sim \mathcal{N}(\gamma_0 U_g, \sigma_{\gamma}^2)\) and \(\nu_{g, i, t} \sim \mathcal{N}(0, \sigma_{\nu}^2)\).

3.2 Data Generation

We use the same parameters as in Chapter 2, with the following additions: the policy effect on hazard \(\alpha_2 = 2.0\), the policy effect on human capital loss \(\tau = 0.5\), and a larger sample size \(N = 500\) (compared to \(N = 100\) in Chapter 2) to ensure sufficient statistical power for detecting the policy effect.

Code
N = 500
n_g <- 50
b_start <- 1960
b_end <- 2005
t_start <- 2000
t_end <- 2025
n_b <- b_end - b_start + 1
n_t <- t_end - t_start + 1

sigma_u <- 1.0
alpha1 <- 2.0
alpha2 <- 2.0
beta0 <- -0.5
lambda <- 0.5
ages <- 20:40

gamma0 <- 0.1
gamma1 <- 0.07
gamma2 <- -0.001

sigma_u <- 1.0
sigma_gamma <- 0.3
sigma_nu <- 0.1
delta <- 3.0
tau <- 0.5

df <- tibble(
  g = 1:n_g,
  U = rnorm(n_g, 0, sigma_u),
  W_g = runif(n_g),
) |>
  tidyr::uncount(weights = N * n_b, .id = "i") |>
  mutate(
    B = rep(rep(b_start:b_end, each = N), n_g),
    gamma_gi = rnorm(n(), gamma0 * U, sigma_gamma)
  ) |>
  mutate(
    A = {
      u_val <- U[1]
      w_val <- W_g[1]
      mu_g <- 30 + alpha1 * u_val
      n_ind <- n()
      A_out <- rep(NA_integer_, n_ind)
      for (a in ages) {
        idx <- which(is.na(A_out))
        if (length(idx) == 0) {
          break
        }
        logit_h <- beta0 -
          lambda * abs(a - mu_g) +
          alpha2 * w_val * (a + B[idx] >= 2010)
        h <- plogis(logit_h)
        born <- runif(length(idx)) < h
        A_out[idx[born]] <- a
      }
      A_out
    },
    .by = g
  ) |>
  filter(!is.na(A)) |>
  tidyr::uncount(weights = n_t, .id = "t") |>
  mutate(
    id = (g - 1) * N * n_b + i,
    t = t_start + t - 1,
    E = A + B,
    k = t - E,
    A = ifelse(E <= t_end, A, NA_integer_),
    E = ifelse(E <= t_end, E, NA_integer_),
    k = ifelse(!is.na(E), k, -Inf),
    H = t - B - 20 - delta * (1 - tau * W_g * (t >= 2010)) * (k >= 0),
    Y = exp(gamma_gi + gamma1 * H + gamma2 * H^2 + rnorm(n(), 0, sigma_nu)),
    age = t - B,
    W = W_g * (t >= 2010)
  ) |>
  filter(between(age, 20, 60))

Binary Treatment Indicator

The 1-step method requires a binary treatment indicator, but \(W_g\) is a continuous variable drawn from \(\mathrm{Uniform}(0, 1)\). To apply this method, we discretize \(W_g\) by splitting groups at the median: groups with \(W_g\) above the median are classified as treated, and those below as untreated.

Policy Effect on Event Timing

Code
# Median of W_g
w_median <- median(unique(df$W_g))

df |>
  distinct(id, .keep_all = TRUE) |>
  filter(B >= 1990, !is.na(A)) |>
  mutate(
    lbl = if_else(W_g > w_median, "High W", "Low W"),
    lbl = factor(lbl, levels = c("Low W", "High W"))
  ) |>
  ggplot(aes(x = A, color = lbl)) +
  geom_freqpoly(
    aes(y = after_stat(density)),
    binwidth = 1,
    linewidth = 1
  ) +
  scale_color_manual(values = c(color_base, color_accent)) +
  labs(x = "Age at First Childbirth", y = "Density", color = NULL) +
  theme_minimal() +
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    legend.position = "inside",
    legend.position.inside = c(0.9, 0.8),
    legend.background = element_rect(fill = "white", colour = NA)
  )
Figure 3.1: Distribution of Age at First Childbirth

Figure 3.1 compares the distribution of age at first childbirth by policy intensity \(W_g\) (above/below median) for cohorts born in 1990 or later. Individuals in these cohorts reach the fertile window (ages 20–40) during 2010–2040, so the policy is active throughout their entire childbearing period. Consistent with the policy incentive \(\alpha_2 > 0\), High W groups give birth earlier than Low W groups.

Note that we condition on individuals whose first childbirth falls within the observation period (filter(!is.na(A))), excluding not-yet-treated individuals whose event occurs after 2025. Since the policy raises the birth hazard at every age, it also reduces the childless rate: groups with higher \(W_g\) are less likely to remain childless. This selection effect means that the treated cohort includes individuals who, absent the policy, would not have given birth at all, and who tend to give birth relatively late. As a result, the visible leftward shift due to the policy is attenuated by the inclusion of these late or marginal births.

3.3 Estimation

To isolate the policy effect cleanly, we restrict the estimation sample to cohorts born in 1990 or later (\(B \ge 1990\)). These individuals turn 20 at \(t \ge 2010\), so the policy is active throughout their entire childbearing window.

1-Step Method (GMM)

The 1-step method extends Equation 2.2 to include the treatment interaction:

\[ \begin{aligned} Y_{g, i, t} &= \alpha_{g, i} + \lambda_{B(g, i), g, t} \\ &+ \sum_{k \neq -1, -\infty} \beta_k \mathbb{1}\left\{t - E_{g, i} = k\right\} + \sum_{k \neq -1, -\infty} \gamma_k \mathbb{1}\left\{t - E_{g, i} = k\right\} D_g + \varepsilon_{g, i, t}. \end{aligned} \tag{3.1}\]

The coefficients \(\gamma_k\) capture the additional effect of being in a treated group (\(D_g = 1\)) on the child penalty at each event-time \(k\).1

Code
df_est <- df |>
  filter(B >= 1990) |>
  mutate(
    D = as.integer(W_g > w_median),
    k_bin = case_when(
      !is.finite(k) ~ k,
      k <= -4 ~ -4,
      .default = k
    )
  )

mdl_1step <- df_est |>
  feols(
    Y ~ i(k_bin, D, ref = c(-1, -Inf)) |
      id + B^g^t + k_bin,
    cluster = ~g
  )

# Extract gamma_k (D interaction terms)
ctb <- coeftable(mdl_1step) |>
  as.data.frame() |>
  tibble::rownames_to_column("coef")

ctb |>
  filter(grepl(":D$", coef)) |>
  mutate(
    k = as.integer(gsub(".*k_bin::(-?\\d+):.*", "\\1", coef)),
    ci_lo = Estimate - 1.96 * `Std. Error`,
    ci_hi = Estimate + 1.96 * `Std. Error`
  ) |>
  filter(between(k, -3, 5)) |>
  ggplot(aes(x = k, y = Estimate, ymin = ci_lo, ymax = ci_hi)) +
  geom_pointrange(color = color_base) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_vline(xintercept = -0.5, linetype = "dotted", color = "gray50") +
  labs(
    x = "Relative Time to First Childbirth (k)",
    y = expression(hat(gamma)[k])
  ) +
  theme_minimal() +
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank()
  )
Figure 3.2: Policy Effect Estimates (1-Step Method)

The 1-step estimates in Figure 3.2 are roughly flat across post-event periods \(k \ge 0\). As we will see below, this flatness is misleading: the true policy effect \(\gamma_k\) should increase with \(k\) because the outcome \(Y\) is in levels. The 1-step method fails to capture this pattern due to endogenous weighting. The regression implicitly weights each group by the number of individual-level observations it contributes at each event time \(k\). Because the policy raises the birth hazard, treated groups (\(D_g = 1\)) have earlier births on average and thus contribute more person-time observations at large \(k\). These same groups also have larger policy effects (higher \(W_g\) implies larger \(\tau W_g\)), so the implicit weighting tilts the estimate upward at small \(k\) and masks the natural increasing trend.

2-Step Method (MD)

The 2-step method replaces the common event-time coefficients \(\beta_k\) in Equation 2.2 with cell-specific effects \(\beta(g, B, A, k)\) that vary by group, birth year, and age at first childbirth:

\[ Y_{g, i, t} = \alpha_{g, i} + \lambda_{B(g, i), g, t} + \beta(g_i, B_i, A_i, k) + \varepsilon_{g, i, t}. \tag{3.2}\]

This is an imputation approach: the fixed effects \(\hat\alpha_{g,i}\) and \(\hat\lambda_{B,g,t}\) are estimated using only the reference period \(k = -1\), and then used to impute counterfactual outcomes \(\hat{Y}^0_{g,i,t} = \hat\alpha_{g,i} + \hat\lambda_{B,g,t}\) for all other event times. The residuals \(\tilde{Y}_{g,i,t} = Y_{g,i,t} - \hat{Y}^0_{g,i,t}\) therefore measure the deviation from this imputed counterfactual.

Step 1. Using only observations at the reference event time \(k = -1\), estimate the fixed-effect model

\[ Y_{g, i, t} = \alpha_{g, i} + \lambda_{B(g, i), g, t} + \varepsilon_{g, i, t}, \qquad \text{for } k = -1, \]

and compute residuals \(\tilde{Y}_{g, i, t} = Y_{g, i, t} - \hat{\alpha}_{g,i} - \hat{\lambda}_{B(g,i),g,t}\) for all observations. Average the residuals within each \((g, B, A, k)\) cell to obtain the cell-specific event-time effect:

\[ \hat{\beta}(g, B, A, k) = \frac{1}{N_{g,B,A}} \sum_{\substack{i:\, g_i=g,\\ B_i=B,\, A_i=A}} \tilde{Y}_{g,i,B+A+k}, \]

where \(N_{g,B,A}\) is the number of individuals in the cell and calendar time is determined by \(t = B + A + k\).

Step 2. Decompose \(\hat{\beta}(g, B, A, k)\) into common and treatment-specific components:

\[ \hat{\beta}(g, B, A, k) = \sum_{k \neq -1, -\infty} \beta_k \mathbb{1}\{k\} + \sum_{k \neq -1, -\infty} \gamma_k \mathbb{1}\{k\} D_g + u_{g,B,A,k}, \]

with weight \(\omega_{g,B,A}\). In general, the researcher chooses exogenous weights that do not depend on the event time. Here, since the DGP assigns individuals to groups and birth years uniformly, equal weights are sufficient and we set \(\omega_{g,B,A} = 1\). The coefficients \(\gamma_k\) capture the policy effect on child penalty. Unlike the 1-step method, where the weighting of groups is determined implicitly by the regression, here the researcher explicitly controls the weights.

Code
# Step 1: Estimate FE model on k = -1 only
mdl_step1 <- df_est |>
  filter(k == -1) |>
  feols(
    Y ~ 0 | id + B^g^t,
    combine.quick = FALSE
  )

# Compute residuals and aggregate at (g, B, A, k) level
df_resid <- df_est |>
  mutate(
    Y_resid = Y - predict(mdl_step1, newdata = df_est),
    k_bin = case_when(
      !is.finite(k) ~ k,
      k <= -4 ~ -4,
      .default = k
    )
  ) |>
  filter(!is.na(Y_resid), is.finite(k))

df_gba <- df_resid |>
  summarize(
    beta_hat = mean(Y_resid),
    D = first(D),
    .by = c(g, B, A, k_bin)
  )

# Step 2: Decompose into common + treatment effects
mdl_step2 <- df_gba |>
  feols(
    beta_hat ~ i(k_bin, D, ref = -1) | k_bin,
    cluster = ~g
  )

# Extract gamma_k (D interaction terms)
ctb <- coeftable(mdl_step2) |>
  as.data.frame() |>
  tibble::rownames_to_column("coef")

ctb |>
  filter(grepl(":D$", coef)) |>
  mutate(
    k = as.integer(gsub(".*k_bin::(-?\\d+):.*", "\\1", coef)),
    ci_lo = Estimate - 1.96 * `Std. Error`,
    ci_hi = Estimate + 1.96 * `Std. Error`
  ) |>
  filter(between(k, -3, 5)) |>
  ggplot(aes(x = k, y = Estimate, ymin = ci_lo, ymax = ci_hi)) +
  geom_pointrange(color = color_base) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_vline(xintercept = -0.5, linetype = "dotted", color = "gray50") +
  labs(
    x = "Relative Time to First Childbirth (k)",
    y = expression(hat(gamma)[k])
  ) +
  theme_minimal() +
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank()
  )
Figure 3.3: Policy Effect Estimates (2-Step Method)

The 2-step estimates in Figure 3.3 show a clear increasing trend in \(\hat\gamma_k\) as \(k\) grows. This pattern is expected from the DGP. Since the outcome \(Y\) is in levels rather than in logs, the coefficient \(\gamma_k\) reflects an absolute (not proportional) difference between treated and control groups. As \(k\) increases, individuals accumulate more post-birth experience, raising their baseline wage \(Y_0\). The policy effect is multiplicative (\(Y \approx Y_0 (1 + \gamma_1 \delta\tau W_g)\)), so the same proportional effect translates into a larger absolute difference at higher \(Y_0\).

By first estimating cell-specific event-time effects \(\hat{\beta}(g, B, A, k)\) and then decomposing them with explicit, equal weights \(\omega_{g,B,A} = 1\), the 2-step method correctly recovers this increasing trend. The 1-step method, by contrast, produces flat estimates because the upward bias from endogenous weighting at small \(k\) counteracts the natural increase.

3.4 Alternative Specifications

Cross-Sectional Specification

The specifications above use individual fixed effects \(\alpha_{g,i}\) and cohort-group-time fixed effects \(\lambda_{B,g,t}\), which require panel data. An alternative approach, following Kleven, Landais, and Søgaard (2019), replaces these with age and calendar-year fixed effects. Extending Equation 2.1 to include the treatment interaction gives

\[ \begin{aligned} Y_{g, i, t} &= \lambda_{a(g, i, t)} + \lambda_t\\ &+ \sum_{k \neq -1, -\infty} \beta_k \mathbb{1}\left\{t - E_{g, i} = k\right\} \\ &+ \sum_{k \neq -1, -\infty} \gamma_k \mathbb{1}\left\{t - E_{g, i} = k\right\} D_g + \varepsilon_{g, i, t}. \end{aligned} \tag{3.3}\]

This specification does not require tracking the same individual over time, making it applicable to repeated cross-sectional data. Since this is a cross-sectional approach, we restrict the sample to calendar years \(t \ge 2010\) (when the policy is active) rather than conditioning on birth cohorts. However, the age and year fixed effects provide weaker controls compared to the individual-level fixed effects used in the main specifications.

Code
df_kleven <- df |>
  filter(t >= 2010) |>
  mutate(
    D = as.integer(W_g > w_median),
    k_bin = case_when(
      !is.finite(k) ~ k,
      k <= -4 ~ -4,
      k >= 6 ~ 6,
      .default = k
    )
  )

mdl_kleven <- df_kleven |>
  feols(
    Y ~ i(k_bin, D, ref = c(-1, -Inf)) |
      age + t + k_bin,
    cluster = ~g
  )

# Extract gamma_k (D interaction terms)
ctb <- coeftable(mdl_kleven) |>
  as.data.frame() |>
  tibble::rownames_to_column("coef")

ctb |>
  filter(grepl(":D$", coef)) |>
  mutate(
    k = as.integer(gsub(".*k_bin::(-?\\d+):.*", "\\1", coef)),
    ci_lo = Estimate - 1.96 * `Std. Error`,
    ci_hi = Estimate + 1.96 * `Std. Error`
  ) |>
  filter(between(k, -3, 5)) |>
  ggplot(aes(x = k, y = Estimate, ymin = ci_lo, ymax = ci_hi)) +
  geom_pointrange(color = color_base) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_vline(xintercept = -0.5, linetype = "dotted", color = "gray50") +
  labs(
    x = "Relative Time to First Childbirth (k)",
    y = expression(hat(gamma)[k])
  ) +
  theme_minimal() +
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank()
  )
Figure 3.4: Policy Effect Estimates (Cross-Sectional Specification)

Figure 3.4 reveals two problems with this specification. First, the pre-event estimates (\(k < 0\)) are noticeably different from zero, suggesting that the pre-trends are not credible. Without individual fixed effects, treated and control groups differ systematically in unobserved characteristics such as group-level heterogeneity \(U_g\), which affects both wage levels and the timing of childbirth. Second, the post-event estimates tend to be larger in magnitude than those from the main specifications. This overestimation reflects selection bias: the policy shifts event timing differentially across groups, and without individual-level controls, these compositional differences are absorbed into the treatment effect estimates.

Rich MD Specification

Arkhangelsky, Yanagimoto, and Zohar (2026, sec. 5.3.2) propose a richer MD specification that exploits the flexibility of the two-step approach. In their empirical application, the childcare capacity index \(CCI_{g,t}\) varies across both municipalities and calendar years, enabling a rich second-stage regression:

\[ \begin{aligned} \hat{\tau}_{g,h}(x,e) &= \alpha_{g,h}(x) + \lambda_h^{(0)}(x,e) + \lambda_h^{(1)}(x,e) \cdot S_g \\ &+ \sum_{j=-2}^{h} \beta_{h,j} \, CCI_{g,b+e+j} + \upsilon_{g,h}(x,e), \end{aligned} \tag{3.4}\]

weighted by \(n_g(x)\). This specification includes municipality \(\times\) event-time fixed effects (\(\alpha_{g,h}\)), cohort-specific time effects (\(\lambda_h^{(0)}\)), and trends interacted with baseline demographic demand (\(\lambda_h^{(1)} \cdot S_g\)). The summation captures dynamic effects, and including \(CCI\) from pre-birth periods (\(j < 0\)) provides a specification test for compositional shifts.

In our DGP, \(W_g\) is time-invariant, which limits two features of the full specification: municipality \(\times\) event-time fixed effects would absorb the treatment variable, and the dynamic treatment structure collapses because \(W_g\) does not vary across calendar years. The adapted specification is:

\[ \hat{\beta}(g, B, A, k) = \alpha_g + \lambda_k + \gamma_k \cdot W_g + u_{g,B,A,k}, \tag{3.5}\]

Compared to the basic specification, this uses two improvements: (1) continuous \(W_g\) instead of binary \(D_g\), avoiding information loss from binarization; and (2) municipality fixed effects \(\alpha_g\) to absorb group-level heterogeneity.

Code
df_gba_rich <- df_resid |>
  summarize(
    beta_hat = mean(Y_resid),
    W_g = first(W_g),
    .by = c(g, B, A, k_bin)
  )

mdl_rich <- df_gba_rich |>
  feols(
    beta_hat ~ i(k_bin, W_g, ref = -1) | g + k_bin,
    cluster = ~g
  )

ctb <- coeftable(mdl_rich) |>
  as.data.frame() |>
  tibble::rownames_to_column("coef")

ctb |>
  filter(grepl(":W_g$", coef)) |>
  mutate(
    k = as.integer(gsub(".*k_bin::(-?\\d+):.*", "\\1", coef)),
    ci_lo = Estimate - 1.96 * `Std. Error`,
    ci_hi = Estimate + 1.96 * `Std. Error`
  ) |>
  filter(between(k, -3, 5)) |>
  ggplot(aes(x = k, y = Estimate, ymin = ci_lo, ymax = ci_hi)) +
  geom_pointrange(color = color_base) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_vline(xintercept = -0.5, linetype = "dotted", color = "gray50") +
  labs(
    x = "Relative Time to First Childbirth (k)",
    y = expression(hat(gamma)[k])
  ) +
  theme_minimal() +
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank()
  )
Figure 3.5: Policy Effect Estimates (Rich MD Specification)

Figure 3.5 shows the estimated \(\gamma_k\) from the rich specification. Because \(W_g \in [0, 1]\), each coefficient represents the effect of going from \(W_g = 0\) to \(W_g = 1\). The point estimates are roughly twice those of the basic binary specification, which is expected: the binary split compares the mean of the high group (\(\bar W_g \approx 0.75\)) to the mean of the low group (\(\bar W_g \approx 0.25\)), a difference of approximately \(0.5\).

The increasing trend in \(\gamma_k\) is preserved, confirming that the pattern is not an artifact of the binary discretization. In our DGP, \(W_g \perp U_g\) by construction, so the municipality fixed effects do not change the point estimates substantially. In applications where unobserved group heterogeneity is correlated with the treatment, these fixed effects provide additional robustness.

This specification illustrates a core advantage of the MD approach: the researcher retains full control over the second-stage regression. The treatment can remain continuous, confounders can be absorbed through fixed effects, and the weighting scheme is explicit. In the 1-step approach, all of these choices are determined implicitly by the estimator, and the researcher has no mechanism to adjust them without changing the underlying model.


  1. In the implementation below, we bin all \(k \le -4\) into a single category to reduce the number of dummy variables. Since we only report \(\gamma_k\) for \(k \in [-3, 5]\), the distant pre-event periods are not of direct interest and the binning substantially reduces computation time.↩︎