Efficient way to estimate cohort-specific event-study regression with many cohorts

Heike Waechter

Join Date: Oct 2025
Posts: 11

#16

12 Nov 2025, 04:00

Hello everyone. I thought I could piece the rest together on my own, but it seems that I have not really understood how it works. The problem is this:

I want to estimate the effect of event time on income not only per cohort, but also separately for males and females. I then construct the child penalties by taking the difference between the effects for males and females and dividing it by the women's counterfactual. For this I introduced sex into the synthetic dataset and ran the regressions separately for males and females. I did this for both of my approaches and checked whether they produce the same outcome by taking differences at the end. They don't. Can someone explain to me how this differs from running without splitting by sex? And how do I have to change my reghdfe approach so that it mirrors the old dummy approach in this case as well? Also, I see that enforcing the baseline globally as shown above works, but I don't quite understand why we take the same baseline for everyone. Shouldn't every cohort get its own?
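For reference, the child penalty described here can be written out as below. This is a reading of what cp_dummy computes in the code that follows (after collapsing by t_idx and lang), with \(l\) indexing cohorts and \(t = 5\) the omitted reference period. The notation follows the code comments (alpha^w_t, E[~Y^w | t]).

\[
P_{t,l} = \frac{\hat\alpha^{m}_{t,l} - \hat\alpha^{w}_{t,l}}{\mathbb{E}\big[\tilde Y^{w} \mid t, l\big]},
\qquad
\tilde Y^{w} = \hat Y^{w} - \hat\alpha^{w}_{t,l},
\qquad
\hat\alpha^{m}_{5,l} = \hat\alpha^{w}_{5,l} = 0 .
\]

Here \(\hat Y^{w}\) is the women's fitted value (ytilde_w in the code), so \(\tilde Y^{w}\) is that prediction with the event-time effect switched off.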

At the moment I am running this code:

Code:

*===============================================================
* Synthetic dataset (as provided) + CP via (A) dummies vs (B) reghdfe, then compare
*===============================================================
version 18
clear all
set more off
set seed 12345

*------------------------------
* 0) Build the dataset (your code)
*------------------------------
local municipality = 100
local Npersons     = 50
local Tmin         = 1
local Tmax         = 17
local Tobs         = `Tmax' - `Tmin'
local obs          = `municipality' * `Npersons' * `Tobs'
display `obs'

set obs `obs'

gen mun_id = ceil(_n / (`Npersons' * `Tobs'))
gen person_in_unit = mod(ceil(_n / `Tobs') - 1, `Npersons') + 1
gen id = mun_id * 10000 + person_in_unit

bysort mun_id person_in_unit: gen t_idx = `Tmin' + _n - 1

gen str3 language = cond(mod(mun_id,3)==0, "GER", ///
                      cond(mod(mun_id,3)==1, "FRE", "ITA"))

bysort id: gen statyear0 = 1995 + floor(runiform()*10)
gen statyear = statyear0 + t_idx

bysort id: gen start_age = 25 + int(runiform()*20)
gen age = start_age + t_idx

* 1 = male, 2 = female (balanced-ish)
bys id: gen byte sex = 1 + (runiform()>=0.5)

gen u_intercept = rnormal(0, 5)
gen u_slope     = rnormal(0, 0.5)
gen event_effect = cond(t_idx>=0, -10 + 0.5*t_idx, 0)
gen y = 100 + u_intercept + u_slope * t_idx + event_effect + rnormal(0, 10) - 40 * sex

drop u_intercept u_slope start_age statyear0 event_effect

label var id               "Individual identifier"
label var mun_id           "Municipality / cohort id"
label var person_in_unit   "Person within municipality"
label var language         "language region"
label var t_idx            "Event time index (years relative)"
label var age              "Age of individual (synthetic)"
label var statyear         "Calendar year of observation"
label var y                "Outcome (synthetic)"

* Define cohort handle
egen lang = group(language)
local cohort lang
local coh3 = substr("`cohort'", 1, 3)
local coh9 = substr("`cohort'", 1, 9)

*--------------------------------------------------------------
* Rename the outcome and create the calendar-year variable
*--------------------------------------------------------------
rename y mrevcot
gen dacot = statyear

tempfile SYN
save "`SYN'", replace

*===============================================================
* (A) DUMMY-INTERACTIONS CP (benchmark)
*===============================================================
use "`SYN'", clear

    /* interactions; names will include the short `coh3' and `coh9' prefixes */
    xi i.t_idx*i.`cohort', noomit
    drop _It_iX`coh3'_5_* _It_idx* _I`coh9'_*

    local vardrop
    foreach var of varlist _I* {
        quietly summarize `var', meanonly
        if r(mean) == 0 {
            di "`var' --> delete"
            local vardrop `vardrop' `var'
        }
    }
    capture drop `vardrop'

tempfile female_dummy male_dummy

preserve
keep if sex == 2
* FEMALES: alpha^w_t and E[~Y^w | t]
reg mrevcot _It_iX`coh3'_* i.age i.dacot, r
gen double alpha_w_dummy = .
replace alpha_w_dummy = 0 if t_idx==5 

    /* create the levels local named after the 3-letter handle */
    levelsof `cohort', local(`coh3')
    forvalues k = 1/16 {
        if `k'!=5 {
            foreach l of local `coh3' {
                cap replace alpha_w_dummy = _b[_It_iX`coh3'_`k'_`l'] ///
                   if `cohort'==`l' & t_idx==`k'
            }
        }
    }

* Counterfactual 
predict double ytilde_w if e(sample), xb
gen double mu_ytilde_w_dummy = . 
replace mu_ytilde_w_dummy = ytilde_w - alpha_w_dummy if e(sample)
save "`female_dummy'", replace
restore

preserve
fvset base 5 t_idx
keep if sex == 1
* MALES: alpha^m_t
reg mrevcot _It_iX`coh3'_* i.age i.dacot, r
gen double alpha_m_dummy = .
replace alpha_m_dummy = 0 if t_idx==5

    levelsof `cohort', local(`coh3'_m)
    forvalues k = 1/16 {
        if `k'!=5 {
            foreach l of local `coh3'_m {
                cap replace alpha_m_dummy = _b[_It_iX`coh3'_`k'_`l'] ///
                    if `cohort'==`l' & t_idx==`k'
            }
        }
    }

save "`male_dummy'", replace
restore

use "`female_dummy'", clear
append using "`male_dummy'"

* collapse to event-time series and compute P_t
collapse (mean) alpha_w_dummy alpha_m_dummy mu_ytilde_w_dummy, by(t_idx lang)

gen double cp_dummy     = (alpha_m_dummy - alpha_w_dummy) / mu_ytilde_w_dummy
gen double cp_pct_dummy = 100*cp_dummy

tab cp_dummy

tempfile CP_DUMMY
save "`CP_DUMMY'", replace

*===============================================================
* (B) REGHDFE + SAVED FE CP
*===============================================================
use "`SYN'", clear

tempfile female male

* --- FEMALES (absorb t FE; predict non-FE xb; rebuild baseline)
preserve
    keep if sex==2
    reghdfe mrevcot i.age i.dacot, absorb(i.t_idx#i.lang, savefe) vce(robust)
    egen double fe_base_w = mean(cond(t_idx==5, __hdfe1__, .))
    gen double alpha_w = __hdfe1__ - fe_base_w
    predict double xb_noFE_w, xb
    gen double mu_ytilde_w = xb_noFE_w + fe_base_w
    save "`female'", replace
restore

* --- MALES
preserve
    keep if sex==1
    reghdfe mrevcot i.age i.dacot, absorb(i.t_idx#i.lang, savefe) vce(robust)
    egen double fe_base_m = mean(cond(t_idx==5, __hdfe1__, .))
    gen double alpha_m = __hdfe1__ - fe_base_m
    save "`male'", replace
restore

* -------------------------
* Append and collapse
* -------------------------
use "`female'", clear
append using "`male'"

* collapse to event-time series and compute P_t
collapse (mean) alpha_w alpha_m mu_ytilde_w, by(t_idx lang)

gen double cp_hdfe     = (alpha_m - alpha_w) / mu_ytilde_w
gen double cp_pct_hdfe = 100*cp_hdfe

tab cp_hdfe
tempfile CP_HDFE
save "`CP_HDFE'", replace

*===============================================================
* Compare the two approaches
*===============================================================
use "`CP_DUMMY'", clear

merge 1:1 t_idx lang using "`CP_HDFE'", nogen

gen double diff_cp     = cp_hdfe - cp_dummy
gen double diff_cp_pct = cp_pct_hdfe - cp_pct_dummy
gen diff_alpha_w = alpha_w - alpha_w_dummy
gen diff_alpha_m = alpha_m - alpha_m_dummy
gen diff_mu_ytilde_w = mu_ytilde_w - mu_ytilde_w_dummy
tab diff_alpha_m
tab diff_alpha_w
tab diff_mu_ytilde_w
tab diff_cp

Thank you for your help,
Heike


Andrew Musau

Join Date: Oct 2014
Posts: 10481

#17

13 Nov 2025, 03:27

You have posted quite a bit in #16, and it takes some time to follow everything. Here are a few clarifications:

1. The omitted dummy (or dummies) represents the reference group when including a set of indicators as regressors.

2. Within-demeaning is equivalent to including \(N-1\) group dummy variables (plus a constant) in the regression.

3. In your case, three group dummies are omitted (only \(N-3\) are included), so the coefficients of the other estimated variables in the dummy-variable regression and the within regression will differ (see point 2).

4. Nevertheless, you can recover the dummy coefficients after predicting the fixed effects in the within regression by simply subtracting the mean of the fixed effects of the three omitted cells (these three jointly form the reference group). However, you'll need to make some adjustments if you use any estimated (non-absorbed) coefficients, because of point 3.
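Points 2 and 3 can be written as two models. The notation is introduced here only for illustration: \(g = (t, l)\) indexes the t_idx × lang cells, \(D_{g,i}\) are the cell indicators, and \(X_i\) collects the age and dacot dummies.

\[
\text{unrestricted (absorbed):}\quad y_i = \sum_{g} \gamma_g\, D_{g,i} + X_i'\beta + \varepsilon_i ,
\]
\[
\text{dummy approach:}\quad y_i = c + \sum_{g:\ t(g)\neq 5} \delta_g\, D_{g,i} + X_i'\beta + \varepsilon_i .
\]

The second model is the first with the restrictions \(\gamma_{(5,1)} = \gamma_{(5,2)} = \gamma_{(5,3)}\) imposed. Their common value is \(c\), and \(\delta_g = \gamma_g - c\). Restricted least squares reproduces the unrestricted fit only if the unrestricted estimates already satisfy the restriction. In general, therefore, \(\hat\beta\) and every \(\hat\delta_g\) differ between the two models. The size of the difference depends on how far apart the three estimated base-cell effects \(\hat\gamma_{(5,l)}\) are in the sample at hand.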

Compare the estimated coefficients from the following: #1 and #2 should be the same, but #3 will differ.

Code:

*===============================================================
* Synthetic dataset (as provided) + CP via (A) dummies vs (B) reghdfe, then compare
*===============================================================
version 18
clear all
set more off
set seed 12345

*------------------------------
* 0) Build the dataset (your code)
*------------------------------
local municipality = 100
local Npersons     = 50
local Tmin         = 1
local Tmax         = 17
local Tobs         = `Tmax' - `Tmin'
local obs          = `municipality' * `Npersons' * `Tobs'
display `obs'

set obs `obs'

gen mun_id = ceil(_n / (`Npersons' * `Tobs'))
gen person_in_unit = mod(ceil(_n / `Tobs') - 1, `Npersons') + 1
gen id = mun_id * 10000 + person_in_unit

bysort mun_id person_in_unit: gen t_idx = `Tmin' + _n - 1

gen str3 language = cond(mod(mun_id,3)==0, "GER", ///
                      cond(mod(mun_id,3)==1, "FRE", "ITA"))

bysort id: gen statyear0 = 1995 + floor(runiform()*10)
gen statyear = statyear0 + t_idx

bysort id: gen start_age = 25 + int(runiform()*20)
gen age = start_age + t_idx

* 1 = male, 2 = female (balanced-ish)
bys id: gen byte sex = 1 + (runiform()>=0.5)

gen u_intercept = rnormal(0, 5)
gen u_slope     = rnormal(0, 0.5)
gen event_effect = cond(t_idx>=0, -10 + 0.5*t_idx, 0)
gen y = 100 + u_intercept + u_slope * t_idx + event_effect + rnormal(0, 10) - 40 * sex

drop u_intercept u_slope start_age statyear0 event_effect

label var id               "Individual identifier"
label var mun_id           "Municipality / cohort id"
label var person_in_unit   "Person within municipality"
label var language         "language region"
label var t_idx            "Event time index (years relative)"
label var age              "Age of individual (synthetic)"
label var statyear         "Calendar year of observation"
label var y                "Outcome (synthetic)"

* Define cohort handle
egen lang = group(language)
local cohort lang
local coh3 = substr("`cohort'", 1, 3)
local coh9 = substr("`cohort'", 1, 9)

*--------------------------------------------------------------
* Rename the outcome and create the calendar-year variable
*--------------------------------------------------------------
rename y mrevcot
gen dacot = statyear

tempfile SYN
save "`SYN'", replace

*===============================================================
* (A) DUMMY-INTERACTIONS CP (benchmark)
*===============================================================
use "`SYN'", clear

    /* interactions; names will include the short `coh3' and `coh9' prefixes */
    xi i.t_idx*i.`cohort', noomit
    drop _It_iX`coh3'_5_* _It_idx* _I`coh9'_*

    local vardrop
    foreach var of varlist _I* {
        quietly summarize `var', meanonly
        if r(mean) == 0 {
            di "`var' --> delete"
            local vardrop `vardrop' `var'
        }
    }
    capture drop `vardrop'

tempfile female_dummy male_dummy


keep if sex == 2
* FEMALES: alpha^w_t and E[~Y^w | t]
reg mrevcot _It_iX`coh3'_* i.age i.dacot, r
gen double alpha_w_dummy = .
replace alpha_w_dummy = 0 if t_idx==5 

    /* create the levels local named after the 3-letter handle */
    levelsof `cohort', local(`coh3')
    forvalues k = 1/16 {
        if `k'!=5 {
            foreach l of local `coh3' {
                cap replace alpha_w_dummy = _b[_It_iX`coh3'_`k'_`l'] ///
                   if `cohort'==`l' & t_idx==`k'
            }
        }
    }

* Counterfactual 
predict double ytilde_w if e(sample), xb
gen double mu_ytilde_w_dummy = . 
replace mu_ytilde_w_dummy = ytilde_w - alpha_w_dummy if e(sample)
save "`female_dummy'", replace



local omitted 
qui levelsof lang, local(levs)
foreach l of local levs{
    local omitted `omitted' o5.t_idx#o.`l'.lang
}

*#1. 1 dummy omitted
reg mrevcot ibn.t_idx#ibn.lang i.age i.dacot, r noomit

*#2. Absorbing dummies 
reghdfe mrevcot i.age i.dacot, absorb(i.t_idx#i.lang, savefe) vce(robust) 

*#3. 3 dummies omitted
reg mrevcot ibn.t_idx#ibn.lang `omitted' i.age i.dacot, r noomit
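Following point 3, it should also be possible to reproduce regression #3 exactly with an absorbing regression. The absorbed grouping variable has to impose the same restriction: the three t_idx==5 cells must share a single group, just as they share the constant in #3. The sketch below assumes it is run in the same do-file after the code above, so that the tempfile `SYN' still exists. The names cellid, fe_cell, fe_base, alpha, xb_noFE and mu_ytilde are hypothetical and chosen only for illustration.

Code:

```stata
* Sketch: one absorbed group for the pooled t_idx==5 base, one group per other cell.
* Assumes the tempfile `SYN' from above; all new variable names are hypothetical.
use "`SYN'", clear
keep if sex == 2                         // females; rerun with sex == 1 for males

egen long cellid = group(t_idx lang)     // one id per t_idx x lang cell
replace cellid = 0 if t_idx == 5         // merge the three base cells into one group

* absorb(newvar = absvar) stores the estimated fixed effect in fe_cell
reghdfe mrevcot i.age i.dacot, absorb(fe_cell = cellid) vce(robust)
predict double xb_noFE, xb

* every t_idx==5 row now carries the same FE value; max() copies it to all rows
egen double fe_base   = max(cond(t_idx == 5, fe_cell, .))
gen  double alpha     = fe_cell - fe_base     // 0 at t_idx==5 by construction
gen  double mu_ytilde = xb_noFE + fe_base     // counterfactual, as in approach (B)
```

With this restriction in place, the age and dacot coefficients should be the same as in #3. In that case alpha should reproduce the _It_iXlan_* coefficients and mu_ytilde should reproduce mu_ytilde_w_dummy up to numerical precision. Absorbing i.t_idx#i.lang as in #2 instead estimates the less restricted model.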

Originally posted by Heike Waechter

Also, I see that enforcing the baseline globally as shown above works, but I don't quite understand why we take the same baseline for everyone.

I don't fully follow your entire procedure, but since you have one set of indicators, you will have exactly one reference group. Here's what I suggest: show us the results from your dummy-variable regression, and then ask us to replicate them using the within estimator. Do not mix the two approaches. Clearly specify what you want to replicate (we'll assume you can defend the correctness of your dummy-variable procedure).


Heike Waechter

Join Date: Oct 2025
Posts: 11

#18

13 Nov 2025, 08:07

Thank you for helping me again. Maybe I misunderstand your suggestion at the end, but I feel like that's what I did:

1. I created the dataset.
2. I ran the dummy approach and stored the results in the tempfile CP_DUMMY.
3. I ran the reghdfe approach (as you showed me) and stored the results in the tempfile CP_HDFE.
4. I compared the results of the two by taking the difference (which should be 0 if they are equal).

As far as I can see, this matches your earlier solution exactly, with one difference: I now introduced sex and estimate the model for the two sexes separately. So I am asking how to fix this and why the introduction of sex causes problems. The results stored in CP_DUMMY are the following:

Code:

 
table (t_idx lang), statistic(mean alpha_w_dummy alpha_m_dummy mu_ytilde_w_dummy cp_dummy) nototals

-----------------------------------------------------------------------------------------------------------------------
                                  |  (mean) alpha_w_dummy   (mean) alpha_m_dummy   (mean) mu_ytilde_w_dummy    cp_dummy
----------------------------------+------------------------------------------------------------------------------------
Event time index (years relative) |                                                                                    
  1                               |                                                                                    
    group(language)               |                                                                                    
      1                           |             -2.041257              -2.474815                   12.53731   -.0345815
      2                           |             -1.699632              -2.856431                   12.53001   -.0923222
      3                           |             -2.838748              -1.801679                    12.5615    .0825593
  2                               |                                                                                    
    group(language)               |                                                                                    
      1                           |             -2.346256              -1.405414                   12.55999    .0749078
      2                           |             -1.404203              -2.090694                   12.55271   -.0546886
      3                           |              -1.73921              -1.600327                   12.56389    .0110541
  3                               |                                                                                    
    group(language)               |                                                                                    
      1                           |             -1.431983               -.513802                    12.5184    .0733465
      2                           |             -.7170617              -1.860935                   12.50891   -.0914446
      3                           |             -1.546067              -.7948463                   12.52912     .059958
  4                               |                                                                                    
    group(language)               |                                                                                    
      1                           |             -.3730967               .0027212                    12.5766    .0298823
      2                           |             -.6812666              -.8999933                   12.57291   -.0173967
      3                           |             -.1209207              -1.413276                   12.58555   -.1026857
  5                               |                                                                                    
    group(language)               |                                                                                    
      1                           |                     0                      0                   12.61675           0
      2                           |                     0                      0                   12.57819           0
      3                           |                     0                      0                   12.57672           0
  6                               |                                                                                    
    group(language)               |                                                                                    
      1                           |              .4147298              -.0919847                   12.59718   -.0402245
      2                           |              .4831777              -.0622622                      12.58   -.0433577
      3                           |               .141382               .3818112                   12.57255    .0191233
  7                               |                                                                                    
    group(language)               |                                                                                    
      1                           |              1.041144               1.249766                    12.5405    .0166358
      2                           |              .9371617               .9997361                   12.58435    .0049724
      3                           |              .5157443               1.251906                   12.58378    .0585008
  8                               |                                                                                    
    group(language)               |                                                                                    
      1                           |              1.376584               1.289775                   12.61847   -.0068795
      2                           |              1.400152               1.417199                   12.60609    .0013523
      3                           |              1.863378               .9792694                    12.6102   -.0701106
  9                               |                                                                                    
    group(language)               |                                                                                    
      1                           |               1.91708                1.61399                   12.61968   -.0240172
      2                           |               2.52575               2.652424                   12.61227    .0100437
      3                           |              2.443006               1.390521                   12.60855   -.0834739
  10                              |                                                                                    
    group(language)               |                                                                                    
      1                           |              2.593434               1.925098                   12.67577   -.0527255
      2                           |              2.547423               2.519517                   12.66266   -.0022038
      3                           |              2.493556               2.112195                   12.68138   -.0300725
  11                              |                                                                                    
    group(language)               |                                                                                    
      1                           |              3.357055               2.528115                   12.61564   -.0657074
      2                           |              2.427043               2.693987                   12.64318    .0211137
      3                           |              2.959188               2.808365                   12.62171   -.0119495
  12                              |                                                                                    
    group(language)               |                                                                                    
      1                           |              3.056158                3.31611                   12.65438    .0205425
      2                           |              3.024938               3.854715                   12.69376    .0653689
      3                           |              3.078041                3.66355                   12.69327    .0461276
  13                              |                                                                                    
    group(language)               |                                                                                    
      1                           |              4.239237               3.752072                   12.75733   -.0381871
      2                           |              3.986817               3.730716                   12.74731   -.0200906
      3                           |              3.196524               4.413426                   12.80244    .0950524
  14                              |                                                                                    
    group(language)               |                                                                                    
      1                           |              4.181094               4.380233                   12.78107    .0155807
      2                           |              3.751581               4.097661                   12.76077    .0271206
      3                           |              3.899237               3.689199                   12.78877   -.0164236
  15                              |                                                                                    
    group(language)               |                                                                                    
      1                           |              5.155309               5.334663                   12.72941    .0140897
      2                           |               5.15915                5.44343                   12.74737    .0223011
      3                           |              4.557805               5.110144                   12.72155    .0434176
  16                              |                                                                                    
    group(language)               |                                                                                    
      1                           |              5.161544               5.817093                   12.75794    .0513836
      2                           |              5.415611               5.938735                   12.70005    .0411907
      3                           |              5.774189               4.790254                   12.69575   -.0775011
-----------------------------------------------------------------------------------------------------------------------

The second (minor) point is that I am trying to understand what I am doing when running this line:

Code:

egen double fe_base_m = mean(cond(t_idx==5, __hdfe1__, .))

If I understand it correctly, this takes the fixed effects of all t_idx#lang combinations with t_idx == 5 and assigns them to their respective observations. It then writes the mean of these values into every single observation. Intuitively, I don't understand why this gives me the same result as my dummy approach, where I compare each cohort to its own unique baseline. But this is more of a side question, as I am also happy if I get the code to work without fully understanding it.

A third point that interests me (not mentioned in #16) is how I would calculate the significance of the cp_dummy variable. In the end I want to run such code for many municipalities, and because some have sparse observations, I will need it there to decide which estimates to trust and which not.


Andrew Musau

Join Date: Oct 2014
Posts: 10481

#19

13 Nov 2025, 09:36

Originally posted by Heike Waechter

The second (minor) point is that I am trying to understand what I am doing when running this line:

Code:

egen double fe_base_m = mean(cond(t_idx==5, __hdfe1__, .))

There is no unique baseline for each cohort. All cohorts constitute one set, so they all share the same base. In my post #17, I stated that the omitted dummies are the base category in the regression. If 3 dummies are omitted, then all 3 jointly form the base category. Therefore, in this code, I am just taking the mean of the fixed effects of the 3 combinations, which I then use to recover the coefficients on the dummies from the predicted fixed effects. As far as I can see, the recovered coefficients and the estimated coefficients on the dummies match up to 3 decimal places (see code below). Let me know if I am misunderstanding something in the code below.

Code:

*===============================================================
* Synthetic dataset (as provided) + CP via (A) dummies vs (B) reghdfe, then compare
*===============================================================
version 18
clear all
set more off
set seed 12345

*------------------------------
* 0) Build the dataset (your code)
*------------------------------
local municipality = 100
local Npersons     = 50
local Tmin         = 1
local Tmax         = 17
local Tobs         = `Tmax' - `Tmin'
local obs          = `municipality' * `Npersons' * `Tobs'
display `obs'

set obs `obs'

gen mun_id = ceil(_n / (`Npersons' * `Tobs'))
gen person_in_unit = mod(ceil(_n / `Tobs') - 1, `Npersons') + 1
gen id = mun_id * 10000 + person_in_unit

bysort mun_id person_in_unit: gen t_idx = `Tmin' + _n - 1

gen str3 language = cond(mod(mun_id,3)==0, "GER", ///
                      cond(mod(mun_id,3)==1, "FRE", "ITA"))

bysort id: gen statyear0 = 1995 + floor(runiform()*10)
gen statyear = statyear0 + t_idx

bysort id: gen start_age = 25 + int(runiform()*20)
gen age = start_age + t_idx

* 1 = male, 2 = female (balanced-ish)
bys id: gen byte sex = 1 + (runiform()>=0.5)

gen u_intercept = rnormal(0, 5)
gen u_slope     = rnormal(0, 0.5)
gen event_effect = cond(t_idx>=0, -10 + 0.5*t_idx, 0)
gen y = 100 + u_intercept + u_slope * t_idx + event_effect + rnormal(0, 10) - 40 * sex

drop u_intercept u_slope start_age statyear0 event_effect

label var id               "Individual identifier"
label var mun_id           "Municipality / cohort id"
label var person_in_unit   "Person within municipality"
label var language         "language region"
label var t_idx            "Event time index (years relative)"
label var age              "Age of individual (synthetic)"
label var statyear         "Calendar year of observation"
label var y                "Outcome (synthetic)"

* Define cohort handle
egen lang = group(language)
local cohort lang
local coh3 = substr("`cohort'", 1, 3)
local coh9 = substr("`cohort'", 1, 9)


*--------------------------------------------------------------
* Rename the outcome and create the calendar-year variable
*--------------------------------------------------------------
rename y mrevcot
gen dacot = statyear

tempfile SYN
save "`SYN'", replace

*===============================================================
* (A) DUMMY-INTERACTIONS CP (benchmark)
*===============================================================
use "`SYN'", clear

    /* interactions; names will include the short `coh3' and `coh9' prefixes */
    xi i.t_idx*i.`cohort', noomit
    drop _It_iX`coh3'_5_* _It_idx* _I`coh9'_*

    local vardrop
    foreach var of varlist _I* {
        quietly summarize `var', meanonly
        if r(mean) == 0 {
            di "`var' --> delete"
            local vardrop `vardrop' `var'
        }
    }
    capture drop `vardrop'

tempfile female_dummy male_dummy


keep if sex == 2
* FEMALES: alpha^w_t and E[~Y^w | t]
reg mrevcot _It_iX`coh3'_* i.age i.dacot, r
gen double alpha_w_dummy = .
replace alpha_w_dummy = 0 if t_idx==5

    /* create the levels local named after the 3-letter handle */
    levelsof `cohort', local(`coh3')
    forvalues k = 1/16 {
        if `k'!=5 {
            foreach l of local `coh3' {
                cap replace alpha_w_dummy = _b[_It_iX`coh3'_`k'_`l'] ///
                   if `cohort'==`l' & t_idx==`k'
            }
        }
    }

* Counterfactual
predict double ytilde_w if e(sample), xb
gen double mu_ytilde_w_dummy = .
replace mu_ytilde_w_dummy = ytilde_w - alpha_w_dummy if e(sample)
save "`female_dummy'", replace


forval s= 1/2{
    use "`SYN'", clear
    keep if sex== `s'
    local omitted
    qui levelsof lang, local(levs)
    foreach l of local levs{
        local omitted `omitted' o5.t_idx#o.`l'.lang
    }
    *DUMMIES
    reg mrevcot ibn.t_idx#ibn.lang `omitted' i.age i.dacot, r noomit
    *Replicate using REGHDFE
    qui reghdfe mrevcot i.age i.dacot, absorb(ibn.t_idx#ibn.lang, savefe)  vce(robust)
    egen double mean_t5= mean(cond(t_idx==5, __hdfe1__, .))
    egen double base= max(cond(t_idx==5, mean_t5, .))
    gen double coef= __hdfe1__- base
    table (t_idx lang), statistic(mean coef) nototal
}
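The egen line Heike asks about in #18 can be split into two explicit steps. This sketch assumes it runs right after the loop above, where __hdfe1__ from the last (female) reghdfe call is still in memory. The names fe_t5, fe_base_2step and fe_base_egen are hypothetical. Written this way, the code shows that one single number is subtracted for every cohort. It also shows that the mean is taken over t_idx==5 observations, not over the three cells.

Code:

```stata
* Two-step version of: egen double fe_base = mean(cond(t_idx==5, __hdfe1__, .))
* Assumes __hdfe1__ was just saved by reghdfe ..., absorb(..., savefe); names hypothetical.

* Step 1: keep the fixed effect on t_idx==5 rows only (missing on all other rows)
gen double fe_t5 = cond(t_idx == 5, __hdfe1__, .)

* Step 2: average over all t_idx==5 OBSERVATIONS, so each lang's base FE
*         is weighted by its number of t_idx==5 rows
summarize fe_t5, meanonly
gen double fe_base_2step = r(mean)     // one constant, written to every row

* The single egen line does both steps at once
egen double fe_base_egen = mean(cond(t_idx == 5, __hdfe1__, .))
assert reldif(fe_base_2step, fe_base_egen) < 1e-10
```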

Originally posted by Heike Waechter

A third point that interests me (not mentioned in #16) is how I would calculate the significance of the cp_dummy variable. In the end I want to run such code for many municipalities, and because some have sparse observations, I will need it there to decide which estimates to trust and which not.

If the number of groups is not large, the dummy coefficients are not estimated consistently. However, the null hypothesis is that the difference between the observed coefficient and the base coefficient(s) is zero. Therefore, the larger the absolute value of the coefficient, the more statistically significant it is likely to be. One could explicitly estimate a few dummy variables to get a sense of which effects are significant. Note that you do not obtain standard errors when the dummies are absorbed.
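The suggestion to estimate the dummies explicitly could be implemented as follows. This is a sketch under stated assumptions, not a definitive procedure. It fits the female and male dummy regressions as plain OLS and combines them with suest, which computes a joint robust covariance matrix (clustered by person here). It then uses lincom and nlcom for a single cell. The sketch assumes the tempfile `SYN' from the code above. The chosen cell (t_idx = 10, lang = 1) and the names xb_w, fem, mal and mu are hypothetical. The nlcom step treats the counterfactual mean in the denominator as a fixed number, so its standard error ignores the sampling variability of that denominator.

Code:

```stata
* Sketch: standard errors for one cell of the child penalty via suest + lincom/nlcom.
* Assumes the tempfile `SYN' from above; the cell choice and new names are hypothetical.
use "`SYN'", clear
xi i.t_idx*i.lang, noomit
drop _It_iXlan_5_* _It_idx* _Ilang_*          // same dummies as approach (A)
local cell _It_iXlan_10_1                      // t_idx == 10, lang == 1

* Females: plain OLS here; suest supplies the robust VCE below
regress mrevcot _It_iXlan_* i.age i.dacot if sex == 2
estimates store fem
predict double xb_w if e(sample), xb
summarize xb_w if t_idx == 10 & lang == 1, meanonly
local mu = r(mean) - _b[`cell']               // counterfactual mean for this cell

* Males
regress mrevcot _It_iXlan_* i.age i.dacot if sex == 1
estimates store mal

* Joint coefficient vector and VCE, clustered by person
suest fem mal, vce(cluster id)

* alpha^m - alpha^w for this cell, with standard error and test against 0
lincom [mal_mean]`cell' - [fem_mean]`cell'

* Child penalty for this cell; the denominator `mu' is treated as fixed
nlcom ([mal_mean]_b[`cell'] - [fem_mean]_b[`cell']) / `mu'
```

For many municipalities, this would be repeated cell by cell. The lincom line answers whether the male-female gap differs from zero, which is the numerator of the penalty.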


Heike Waechter

Join Date: Oct 2025

Posts: 11
#20

13 Nov 2025, 10:05

Yes, the coefficients are quite similar most of the time, but never exactly equal. Therefore I was worried that I made a mistake in implementing the reghdfe approach. In your earlier example (posts #11 and #12), the coefficients matched exactly, so I don't really understand how they could match perfectly before and not now. All I really changed is that I estimate males and females separately, which means that I exclude roughly half of my sample in each regression. Why does this influence the results? I guess that is where I am stuck.
Andrew Musau

Join Date: Oct 2014

Posts: 10481
#21

13 Nov 2025, 10:27

Originally posted by Heike Waechter

In your earlier example (posts #11 and #12), the coefficients matched exactly, so I don't really understand how they could match perfectly before and not now.

Look again — there are some differences beyond the fourth decimal place. However, these are simply precision issues and do not matter in terms of coefficient comparison or significance levels. You can safely disregard them.
Andrew Musau

Join Date: Oct 2014

Posts: 10481
#22

13 Nov 2025, 11:00

Originally posted by Andrew Musau

If the number of groups is not large, the dummy coefficients are not estimated consistently.

should read "number of observations per group" — in your case, time periods.