
  • #16
    Hello everyone. I thought I could piece the rest together on my own, but it seems I have not really understood how it works. The problem is this:

    I want to estimate the effect of event time on income not only per cohort, but also separately for males and females. I then construct the child penalties by taking the difference between the effect on males and the effect on females and scaling it by the females' counterfactual. For this I introduced sex into the synthetic dataset and ran the regressions separately for males and females. I did this for both of my approaches and checked whether they produce the same outcome by taking differences at the end. They don't. Can someone explain how this differs from running without splitting by sex? And how do I have to change my reghdfe approach so that it mirrors the old dummy approach in this case as well? Also, I see that enforcing the baseline globally as shown above works, but I don't quite understand why we take the same baseline for everyone. Shouldn't every cohort get its own?
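
    Written out, the child penalty described here takes roughly the following form. This is only a sketch: the symbols are shorthand assumed to correspond to `alpha_m`, `alpha_w` and `mu_ytilde_w` in the code, with \(t\) the event time (`t_idx`) and \(c\) the cohort (`lang`).

    \[
    P_{t,c} = \frac{\hat{\alpha}^{m}_{t,c} - \hat{\alpha}^{w}_{t,c}}{E\left[\tilde{Y}^{w}_{c} \mid t\right]}
    \]

    Here \(E[\tilde{Y}^{w}_{c} \mid t]\) is the mean predicted outcome for females net of their own event-time effect, which is what the code computes as the prediction minus \(\hat{\alpha}^{w}_{t,c}\), averaged within each \((t, c)\) cell.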

    At the moment I am running this code:

    Code:
    *===============================================================
    * Synthetic dataset (as provided) + CP via (A) dummies vs (B) reghdfe, then compare
    *===============================================================
    version 18
    clear all
    set more off
    set seed 12345
    
    *------------------------------
    * 0) Build the dataset (your code)
    *------------------------------
    local municipality = 100
    local Npersons     = 50
    local Tmin         = 1
    local Tmax         = 17
    local Tobs         = `Tmax' - `Tmin'
    local obs          = `municipality' * `Npersons' * `Tobs'
    display `obs'
    
    set obs `obs'
    
    gen mun_id = ceil(_n / (`Npersons' * `Tobs'))
    gen person_in_unit = mod(ceil(_n / `Tobs') - 1, `Npersons') + 1
    gen id = mun_id * 10000 + person_in_unit
    
    bysort mun_id person_in_unit: gen t_idx = `Tmin' + _n - 1
    
    gen str3 language = cond(mod(mun_id,3)==0, "GER", ///
                          cond(mod(mun_id,3)==1, "FRE", "ITA"))
    
    bysort id: gen statyear0 = 1995 + floor(runiform()*10)
    gen statyear = statyear0 + t_idx
    
    bysort id: gen start_age = 25 + int(runiform()*20)
    gen age = start_age + t_idx
    
    * 1 = male, 2 = female (balanced-ish)
    * note: runiform() is evaluated per observation, so sex is drawn anew for every row of an id
    bys id: gen byte sex = 1 + (runiform()>=0.5)
    
    gen u_intercept = rnormal(0, 5)
    gen u_slope     = rnormal(0, 0.5)
    gen event_effect = cond(t_idx>=0, -10 + 0.5*t_idx, 0)
    gen y = 100 + u_intercept + u_slope * t_idx + event_effect + rnormal(0, 10) - 40 * sex
    
    drop u_intercept u_slope start_age statyear0 event_effect
    
    label var id               "Individual identifier"
    label var mun_id           "Municipality / cohort id"
    label var person_in_unit   "Person within municipality"
    label var language         "language region"
    label var t_idx            "Event time index (years relative)"
    label var age              "Age of individual (synthetic)"
    label var statyear         "Calendar year of observation"
    label var y                "Outcome (synthetic)"
    
    * Define cohort handle
    egen lang = group(language)
    local cohort lang
    local coh3 = substr("`cohort'", 1, 3)
    local coh9 = substr("`cohort'", 1, 9)
    
    *--------------------------------------------------------------
    * Add sex & create an outcome with sex-specific event-time drop
    *--------------------------------------------------------------
    rename y mrevcot
    gen dacot = statyear
    
    tempfile SYN
    save "`SYN'", replace
    
    *===============================================================
    * (A) DUMMY-INTERACTIONS CP (benchmark)
    *===============================================================
    use "`SYN'", clear
    
        /* interactions; variable names will include the short `coh3'/`coh9' prefixes */
        xi i.t_idx*i.`cohort', noomit
        drop _It_iX`coh3'_5_* _It_idx* _I`coh9'_*
    
        local vardrop
        foreach var of varlist _I* {
            quietly summarize `var', meanonly
            if r(mean) == 0 {
                di "`var' --> delete"
                local vardrop `vardrop' `var'
            }
        }
        capture drop `vardrop'
    
    tempfile female_dummy male_dummy
    
    preserve
    keep if sex == 2
    * FEMALES: alpha^w_t and E[~Y^w | t]
    reg mrevcot _It_iX`coh3'_* i.age i.dacot, r
    gen double alpha_w_dummy = .
    replace alpha_w_dummy = 0 if t_idx==5 
    
        /* create the levels local named after the 3-letter handle */
        levelsof `cohort', local(`coh3')
        forvalues k = 1/16 {
            if `k'!=5 {
                foreach l of local `coh3' {
                    cap replace alpha_w_dummy = _b[_It_iX`coh3'_`k'_`l'] ///
                       if `cohort'==`l' & t_idx==`k'
                }
            }
        }
    
    * Counterfactual 
    predict double ytilde_w if e(sample), xb
    gen double mu_ytilde_w_dummy = . 
    replace mu_ytilde_w_dummy = ytilde_w - alpha_w_dummy if e(sample)
    save "`female_dummy'", replace
    restore
    
    preserve
    fvset base 5 t_idx
    keep if sex == 1
    * MALES: alpha^m_t
    reg mrevcot _It_iX`coh3'_* i.age i.dacot, r
    gen double alpha_m_dummy = .
    replace alpha_m_dummy = 0 if t_idx==5
    
        levelsof `cohort', local(`coh3'_m)
        forvalues k = 1/16 {
            if `k'!=5 {
                foreach l of local `coh3'_m {
                    cap replace alpha_m_dummy = _b[_It_iX`coh3'_`k'_`l'] ///
                        if `cohort'==`l' & t_idx==`k'
                }
            }
        }
    
    save "`male_dummy'", replace
    restore
    
    use "`female_dummy'", clear
    append using "`male_dummy'"
    
    * collapse to event-time series and compute P_t
    collapse (mean) alpha_w_dummy alpha_m_dummy mu_ytilde_w_dummy, by(t_idx lang)
    
    gen double cp_dummy     = (alpha_m_dummy - alpha_w_dummy) / mu_ytilde_w_dummy
    gen double cp_pct_dummy = 100*cp_dummy
    
    tab cp_dummy
    
    tempfile CP_DUMMY
    save "`CP_DUMMY'", replace
    
    *===============================================================
    * (B) REGHDFE + SAVED FE CP
    *===============================================================
    use "`SYN'", clear
    
    tempfile female male
    
    * --- FEMALES (absorb t FE; predict non-FE xb; rebuild baseline)
    preserve
        keep if sex==2
        reghdfe mrevcot i.age i.dacot, absorb(i.t_idx#i.lang, savefe) vce(robust)
        egen double fe_base_w = mean(cond(t_idx==5, __hdfe1__, .))
        gen double alpha_w = __hdfe1__ - fe_base_w
        predict double xb_noFE_w, xb
        gen double mu_ytilde_w = xb_noFE_w + fe_base_w
        save "`female'", replace
    restore
    
    * --- MALES
    preserve
        keep if sex==1
        reghdfe mrevcot i.age i.dacot, absorb(i.t_idx#i.lang, savefe) vce(robust)
        egen double fe_base_m = mean(cond(t_idx==5, __hdfe1__, .))
        gen double alpha_m = __hdfe1__ - fe_base_m
        save "`male'", replace
    restore
    
    * -------------------------
    * Append and collapse
    * -------------------------
    use "`female'", clear
    append using "`male'"
    
    * collapse to event-time series and compute P_t
    collapse (mean) alpha_w alpha_m mu_ytilde_w, by(t_idx lang)
    
    gen double cp_hdfe     = (alpha_m - alpha_w) / mu_ytilde_w
    gen double cp_pct_hdfe = 100*cp_hdfe
    
    tab cp_hdfe
    tempfile CP_HDFE
    save "`CP_HDFE'", replace
    
    *===============================================================
    * Compare the two approaches
    *===============================================================
    use "`CP_DUMMY'", clear
    
    merge 1:1 t_idx lang using "`CP_HDFE'", nogen
    
    gen double diff_cp     = cp_hdfe - cp_dummy
    gen double diff_cp_pct = cp_pct_hdfe - cp_pct_dummy
    gen diff_alpha_w = alpha_w - alpha_w_dummy
    gen diff_alpha_m = alpha_m - alpha_m_dummy
    gen diff_mu_ytilde_w = mu_ytilde_w - mu_ytilde_w_dummy
    tab diff_alpha_m
    tab diff_alpha_w
    tab diff_mu_ytilde_w
    tab diff_cp
    Thank you for your help,
    Heike



    • #17
      You have posted quite a bit in #16, and it takes some time to follow everything. Here are a few clarifications:

      1. The omitted dummy (or dummies) represents the reference group when including a set of indicators as regressors.

      2. Within-demeaning is equivalent to including \(N -1\) group dummy variables in the regression.

      3. In your case, only \(N - 3\) group dummies are included (three are omitted), so the coefficients of the estimated variables in the dummy-variable regression and the within regression will differ (see point 2).

      4. Nevertheless, you can recover the dummy coefficients after predicting the fixed effects in the within regression by simply subtracting the mean of the three omitted dummies (these three jointly form the reference group). However, you’ll need to make some adjustments if you use any estimated (non-absorbed) coefficients because of #3.
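
      Points 2 to 4 can be made concrete with a short derivation. This is a sketch assuming a single absorbed set of group effects; \(\theta_g\) (the group effect from the within regression), \(\beta_g\) (the coefficient on the dummy for group \(g\)), \(c\) (the constant) and \(r\) (the single omitted reference group) are notation introduced here for illustration.

      \[
      y_i = \theta_{g(i)} + x_i'\gamma + \varepsilon_i
      \quad\Longleftrightarrow\quad
      y_i = c + \sum_{g \neq r} \beta_g D_{ig} + x_i'\gamma + \varepsilon_i ,
      \qquad c = \theta_r ,\quad \beta_g = \theta_g - \theta_r .
      \]

      With exactly one group omitted, both regressions span the same column space, so \(\hat{\gamma}\) is identical and \(\hat{\beta}_g = \hat{\theta}_g - \hat{\theta}_r\). When three groups \(r_1, r_2, r_3\) are omitted alongside a constant, the dummy regression additionally imposes \(\theta_{r_1} = \theta_{r_2} = \theta_{r_3}\), a restriction the within regression does not make, which is why \(\hat{\gamma}\) can differ between the two.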

      Compare the estimated coefficients from the following: #1 and #2 should be the same, but #3 will differ.

      Code:
      *===============================================================
      * Synthetic dataset (as provided) + CP via (A) dummies vs (B) reghdfe, then compare
      *===============================================================
      version 18
      clear all
      set more off
      set seed 12345
      
      *------------------------------
      * 0) Build the dataset (your code)
      *------------------------------
      local municipality = 100
      local Npersons     = 50
      local Tmin         = 1
      local Tmax         = 17
      local Tobs         = `Tmax' - `Tmin'
      local obs          = `municipality' * `Npersons' * `Tobs'
      display `obs'
      
      set obs `obs'
      
      gen mun_id = ceil(_n / (`Npersons' * `Tobs'))
      gen person_in_unit = mod(ceil(_n / `Tobs') - 1, `Npersons') + 1
      gen id = mun_id * 10000 + person_in_unit
      
      bysort mun_id person_in_unit: gen t_idx = `Tmin' + _n - 1
      
      gen str3 language = cond(mod(mun_id,3)==0, "GER", ///
                            cond(mod(mun_id,3)==1, "FRE", "ITA"))
      
      bysort id: gen statyear0 = 1995 + floor(runiform()*10)
      gen statyear = statyear0 + t_idx
      
      bysort id: gen start_age = 25 + int(runiform()*20)
      gen age = start_age + t_idx
      
      * 1 = male, 2 = female (balanced-ish)
      * note: runiform() is evaluated per observation, so sex is drawn anew for every row of an id
      bys id: gen byte sex = 1 + (runiform()>=0.5)
      
      gen u_intercept = rnormal(0, 5)
      gen u_slope     = rnormal(0, 0.5)
      gen event_effect = cond(t_idx>=0, -10 + 0.5*t_idx, 0)
      gen y = 100 + u_intercept + u_slope * t_idx + event_effect + rnormal(0, 10) - 40 * sex
      
      drop u_intercept u_slope start_age statyear0 event_effect
      
      label var id               "Individual identifier"
      label var mun_id           "Municipality / cohort id"
      label var person_in_unit   "Person within municipality"
      label var language         "language region"
      label var t_idx            "Event time index (years relative)"
      label var age              "Age of individual (synthetic)"
      label var statyear         "Calendar year of observation"
      label var y                "Outcome (synthetic)"
      
      * Define cohort handle
      egen lang = group(language)
      local cohort lang
      local coh3 = substr("`cohort'", 1, 3)
      local coh9 = substr("`cohort'", 1, 9)
      
      *--------------------------------------------------------------
      * Add sex & create an outcome with sex-specific event-time drop
      *--------------------------------------------------------------
      rename y mrevcot
      gen dacot = statyear
      
      tempfile SYN
      save "`SYN'", replace
      
      *===============================================================
      * (A) DUMMY-INTERACTIONS CP (benchmark)
      *===============================================================
      use "`SYN'", clear
      
          /* interactions; variable names will include the short `coh3'/`coh9' prefixes */
          xi i.t_idx*i.`cohort', noomit
          drop _It_iX`coh3'_5_* _It_idx* _I`coh9'_*
      
          local vardrop
          foreach var of varlist _I* {
              quietly summarize `var', meanonly
              if r(mean) == 0 {
                  di "`var' --> delete"
                  local vardrop `vardrop' `var'
              }
          }
          capture drop `vardrop'
      
      tempfile female_dummy male_dummy
      
      
      keep if sex == 2
      * FEMALES: alpha^w_t and E[~Y^w | t]
      reg mrevcot _It_iX`coh3'_* i.age i.dacot, r
      gen double alpha_w_dummy = .
      replace alpha_w_dummy = 0 if t_idx==5 
      
          /* create the levels local named after the 3-letter handle */
          levelsof `cohort', local(`coh3')
          forvalues k = 1/16 {
              if `k'!=5 {
                  foreach l of local `coh3' {
                      cap replace alpha_w_dummy = _b[_It_iX`coh3'_`k'_`l'] ///
                         if `cohort'==`l' & t_idx==`k'
                  }
              }
          }
      
      * Counterfactual 
      predict double ytilde_w if e(sample), xb
      gen double mu_ytilde_w_dummy = . 
      replace mu_ytilde_w_dummy = ytilde_w - alpha_w_dummy if e(sample)
      save "`female_dummy'", replace
      
      
      
      local omitted 
      qui levelsof lang, local(levs)
      foreach l of local levs{
          local omitted `omitted' o5.t_idx#o.`l'.lang
      }
      
      *#1. 1 dummy omitted
      reg mrevcot ibn.t_idx#ibn.lang i.age i.dacot, r noomit
      
      *#2. Absorbing dummies 
      reghdfe mrevcot i.age i.dacot, absorb(i.t_idx#i.lang, savefe) vce(robust) 
      
      *#3. 3 dummies omitted
      reg mrevcot ibn.t_idx#ibn.lang `omitted' i.age i.dacot, r noomit
      "Also, I see that enforcing the baseline globally as shown above works, but I don't quite understand why we take the same baseline for everyone."
      I don't fully follow your entire procedure, but since you have one set of indicators, you will have exactly one reference group. Here's what I suggest: show us the results from your dummy-variable regression, and then ask us to replicate them using the within estimator. Do not mix the two approaches. Clearly specify what you want to replicate (we'll assume you can defend the correctness of your dummy-variable procedure).



      • #18
        Thank you for helping me again. Maybe I misunderstand your suggestion at the end, but I feel like that's what I did:

        I created the dataset.
        I ran the dummy approach and stored the results in the tempfile CP_DUMMY.
        I ran the reghdfe approach (as you showed me) and stored the results in the tempfile CP_HDFE.
        I compared the results of the two by taking the difference (which should be 0 if they are equal).

        As far as I can see, this matches your earlier solution exactly, with one difference: I have now introduced sex and estimate for the two sexes separately. So I am asking how to fix this and why the introduction of sex causes problems. The results stored in CP_DUMMY are the following:
        Code:
         
        table (t_idx lang), statistic(mean alpha_w_dummy alpha_m_dummy mu_ytilde_w_dummy cp_dummy) nototals
        
        -----------------------------------------------------------------------------------------------------------------------
                                          |  (mean) alpha_w_dummy   (mean) alpha_m_dummy   (mean) mu_ytilde_w_dummy    cp_dummy
        ----------------------------------+------------------------------------------------------------------------------------
        Event time index (years relative) |                                                                                    
          1                               |                                                                                    
            group(language)               |                                                                                    
              1                           |             -2.041257              -2.474815                   12.53731   -.0345815
              2                           |             -1.699632              -2.856431                   12.53001   -.0923222
              3                           |             -2.838748              -1.801679                    12.5615    .0825593
          2                               |                                                                                    
            group(language)               |                                                                                    
              1                           |             -2.346256              -1.405414                   12.55999    .0749078
              2                           |             -1.404203              -2.090694                   12.55271   -.0546886
              3                           |              -1.73921              -1.600327                   12.56389    .0110541
          3                               |                                                                                    
            group(language)               |                                                                                    
              1                           |             -1.431983               -.513802                    12.5184    .0733465
              2                           |             -.7170617              -1.860935                   12.50891   -.0914446
              3                           |             -1.546067              -.7948463                   12.52912     .059958
          4                               |                                                                                    
            group(language)               |                                                                                    
              1                           |             -.3730967               .0027212                    12.5766    .0298823
              2                           |             -.6812666              -.8999933                   12.57291   -.0173967
              3                           |             -.1209207              -1.413276                   12.58555   -.1026857
          5                               |                                                                                    
            group(language)               |                                                                                    
              1                           |                     0                      0                   12.61675           0
              2                           |                     0                      0                   12.57819           0
              3                           |                     0                      0                   12.57672           0
          6                               |                                                                                    
            group(language)               |                                                                                    
              1                           |              .4147298              -.0919847                   12.59718   -.0402245
              2                           |              .4831777              -.0622622                      12.58   -.0433577
              3                           |               .141382               .3818112                   12.57255    .0191233
          7                               |                                                                                    
            group(language)               |                                                                                    
              1                           |              1.041144               1.249766                    12.5405    .0166358
              2                           |              .9371617               .9997361                   12.58435    .0049724
              3                           |              .5157443               1.251906                   12.58378    .0585008
          8                               |                                                                                    
            group(language)               |                                                                                    
              1                           |              1.376584               1.289775                   12.61847   -.0068795
              2                           |              1.400152               1.417199                   12.60609    .0013523
              3                           |              1.863378               .9792694                    12.6102   -.0701106
          9                               |                                                                                    
            group(language)               |                                                                                    
              1                           |               1.91708                1.61399                   12.61968   -.0240172
              2                           |               2.52575               2.652424                   12.61227    .0100437
              3                           |              2.443006               1.390521                   12.60855   -.0834739
          10                              |                                                                                    
            group(language)               |                                                                                    
              1                           |              2.593434               1.925098                   12.67577   -.0527255
              2                           |              2.547423               2.519517                   12.66266   -.0022038
              3                           |              2.493556               2.112195                   12.68138   -.0300725
          11                              |                                                                                    
            group(language)               |                                                                                    
              1                           |              3.357055               2.528115                   12.61564   -.0657074
              2                           |              2.427043               2.693987                   12.64318    .0211137
              3                           |              2.959188               2.808365                   12.62171   -.0119495
          12                              |                                                                                    
            group(language)               |                                                                                    
              1                           |              3.056158                3.31611                   12.65438    .0205425
              2                           |              3.024938               3.854715                   12.69376    .0653689
              3                           |              3.078041                3.66355                   12.69327    .0461276
          13                              |                                                                                    
            group(language)               |                                                                                    
              1                           |              4.239237               3.752072                   12.75733   -.0381871
              2                           |              3.986817               3.730716                   12.74731   -.0200906
              3                           |              3.196524               4.413426                   12.80244    .0950524
          14                              |                                                                                    
            group(language)               |                                                                                    
              1                           |              4.181094               4.380233                   12.78107    .0155807
              2                           |              3.751581               4.097661                   12.76077    .0271206
              3                           |              3.899237               3.689199                   12.78877   -.0164236
          15                              |                                                                                    
            group(language)               |                                                                                    
              1                           |              5.155309               5.334663                   12.72941    .0140897
              2                           |               5.15915                5.44343                   12.74737    .0223011
              3                           |              4.557805               5.110144                   12.72155    .0434176
          16                              |                                                                                    
            group(language)               |                                                                                    
              1                           |              5.161544               5.817093                   12.75794    .0513836
              2                           |              5.415611               5.938735                   12.70005    .0411907
              3                           |              5.774189               4.790254                   12.69575   -.0775011
        -----------------------------------------------------------------------------------------------------------------------
        The second (minor) point was me trying to understand what I am doing when running this line:
        Code:
        egen double fe_base_m = mean(cond(t_idx==5, __hdfe1__, .))
        If I understand it correctly, this takes the coefficients of all t_idx#lang combinations where t_idx == 5 and assigns them to their respective observations. Afterwards, the mean of these coefficients is written into every single observation. Intuitively, I don't really understand why this gives me the same result as my dummy approach, where I compare each cohort to its own unique baseline. But this is more of a side question, as I am also happy if I get the code to work without fully understanding it.

        A third point that would interest me (not mentioned in #16) is how I would calculate the significance of the cp_dummy variable. In the end I want to run this code for many municipalities, some of which have sparse observations, so I will need a measure of significance to decide which estimates to trust.



        • #19
          Originally posted by Heike Waechter
          The second (minor) point was me trying to understand what I am doing when running this line:
          Code:
          egen double fe_base_m = mean(cond(t_idx==5, __hdfe1__, .))
          If I understand it correctly, this takes the coefficients of all t_idx#lang combinations where t_idx == 5 and assigns them to their respective observations. Afterwards, the mean of these coefficients is written into every single observation. Intuitively, I don't really understand why this gives me the same result as my dummy approach, where I compare each cohort to its own unique baseline.
          There is no unique baseline for each cohort. All cohorts constitute one set - so all have the same base. In my post #17, I stated that the omitted dummies are the base category in the regression. If 3 dummies are omitted, then all 3 are the base category. Therefore, in this code, I am just taking the mean value of the 3 combinations, which I will use to recover the coefficients on the dummies from the predicted fixed effects. As far as I can see, the recovered coefficients and the estimated coefficients on the dummies match up to 3 decimal places (see code below). Let me know if I am misunderstanding something from the code below.
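
          The `egen` line quoted above can also be written out explicitly. This is a sketch in notation introduced here for illustration: \(\hat{\theta}_{k,l}\) denotes the saved fixed effect (`__hdfe1__`) for event time \(k\) and cohort \(l\), and \(n_{5,l}\) is the number of observations in memory with `t_idx == 5` in cohort \(l\).

          \[
          \bar{\theta}_{5} = \frac{\sum_{l} n_{5,l}\,\hat{\theta}_{5,l}}{\sum_{l} n_{5,l}},
          \qquad
          \hat{\alpha}_{k,l} = \hat{\theta}_{k,l} - \bar{\theta}_{5}.
          \]

          Because `egen ... mean()` averages over observations rather than over the three cells, the base is an observation-weighted mean; it coincides with the simple mean of the three coefficients only when the three cells contain the same number of observations.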

          Code:
          *===============================================================
          * Synthetic dataset (as provided) + CP via (A) dummies vs (B) reghdfe, then compare
          *===============================================================
          version 18
          clear all
          set more off
          set seed 12345
          
          *------------------------------
          * 0) Build the dataset (your code)
          *------------------------------
          local municipality = 100
          local Npersons     = 50
          local Tmin         = 1
          local Tmax         = 17
          local Tobs         = `Tmax' - `Tmin'
          local obs          = `municipality' * `Npersons' * `Tobs'
          display `obs'
          
          set obs `obs'
          
          gen mun_id = ceil(_n / (`Npersons' * `Tobs'))
          gen person_in_unit = mod(ceil(_n / `Tobs') - 1, `Npersons') + 1
          gen id = mun_id * 10000 + person_in_unit
          
          bysort mun_id person_in_unit: gen t_idx = `Tmin' + _n - 1
          
          gen str3 language = cond(mod(mun_id,3)==0, "GER", ///
                                cond(mod(mun_id,3)==1, "FRE", "ITA"))
          
          bysort id: gen statyear0 = 1995 + floor(runiform()*10)
          gen statyear = statyear0 + t_idx
          
          bysort id: gen start_age = 25 + int(runiform()*20)
          gen age = start_age + t_idx
          
          * 1 = male, 2 = female (balanced-ish)
          * note: runiform() is evaluated per observation, so sex is drawn anew for every row of an id
          bys id: gen byte sex = 1 + (runiform()>=0.5)
          
          gen u_intercept = rnormal(0, 5)
          gen u_slope     = rnormal(0, 0.5)
          gen event_effect = cond(t_idx>=0, -10 + 0.5*t_idx, 0)
          gen y = 100 + u_intercept + u_slope * t_idx + event_effect + rnormal(0, 10) - 40 * sex
          
          drop u_intercept u_slope start_age statyear0 event_effect
          
          label var id               "Individual identifier"
          label var mun_id           "Municipality / cohort id"
          label var person_in_unit   "Person within municipality"
          label var language         "language region"
          label var t_idx            "Event time index (years relative)"
          label var age              "Age of individual (synthetic)"
          label var statyear         "Calendar year of observation"
          label var y                "Outcome (synthetic)"
          
          * Define cohort handle
          egen lang = group(language)
          local cohort lang
          local coh3 = substr("`cohort'", 1, 3)
          local coh9 = substr("`cohort'", 1, 9)
          
          
          *--------------------------------------------------------------
          * Add sex & create an outcome with sex-specific event-time drop
          *--------------------------------------------------------------
          rename y mrevcot
          gen dacot = statyear
          
          tempfile SYN
          save "`SYN'", replace
          
          *===============================================================
          * (A) DUMMY-INTERACTIONS CP (benchmark)
          *===============================================================
          use "`SYN'", clear
          
              /* interactions; names will include the short `coh3'/`coh9' prefixes */
              xi i.t_idx*i.`cohort', noomit
              drop _It_iX`coh3'_5_* _It_idx* _I`coh9'_*
          
              local vardrop
              foreach var of varlist _I* {
                  quietly summarize `var', meanonly
                  if r(mean) == 0 {
                      di "`var' --> delete"
                      local vardrop `vardrop' `var'
                  }
              }
              capture drop `vardrop'
          
          tempfile female_dummy male_dummy
          
          
          keep if sex == 2
          * FEMALES: alpha^w_t and E[~Y^w | t]
          reg mrevcot _It_iX`coh3'_* i.age i.dacot, r
          gen double alpha_w_dummy = .
          replace alpha_w_dummy = 0 if t_idx==5
          
              /* create the levels local named after the 3-letter handle */
              levelsof `cohort', local(`coh3')
              forvalues k = 1/16 {
                  if `k'!=5 {
                      foreach l of local `coh3' {
                          cap replace alpha_w_dummy = _b[_It_iX`coh3'_`k'_`l'] ///
                             if `cohort'==`l' & t_idx==`k'
                      }
                  }
              }
          
          * Counterfactual
          predict double ytilde_w if e(sample), xb
          gen double mu_ytilde_w_dummy = .
          replace mu_ytilde_w_dummy = ytilde_w - alpha_w_dummy if e(sample)
          save "`female_dummy'", replace
          
          
          forval s= 1/2{
              use `SYN', clear    
              keep if sex== `s'
              local omitted
              qui levelsof lang, local(levs)
              foreach l of local levs{
                  local omitted `omitted' o5.t_idx#o.`l'.lang
              }
              *DUMMIES
              reg mrevcot ibn.t_idx#ibn.lang `omitted' i.age i.dacot, r noomit
              *Replicate using REGHDFE
              qui reghdfe mrevcot i.age i.dacot, absorb(ibn.t_idx#ibn.lang, savefe)  vce(robust)
              egen double mean_t5= mean(cond(t_idx==5, __hdfe1__, .))
              egen double base= max(cond(t_idx==5, mean_t5, .))
              gen double coef= __hdfe1__- base
              table (t_idx lang), statistic(mean coef) nototal
          }


          A third point that would interest me (not mentioned in #16) is how I would calculate the significance of the cp_dummy variable. In the end I want to run this code for many municipalities, and some of them have sparse observations, so I will need it to decide which estimates to trust and which not.
          If the number of groups is not large, the dummy coefficients are not estimated consistently. However, the null hypothesis is that the difference between the observed coefficient and the base coefficient(s) is zero. Therefore, the larger the absolute value of the coefficient, the more statistically significant it is likely to be. One could explicitly estimate a few dummy variables to get a sense of which effects are significant. Note that you do not obtain standard errors when the dummies are absorbed.
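
As a rough sketch of what estimating the dummies explicitly could look like: the example below uses made-up data, and the variable names `g`, `t`, and `y` are hypothetical. It also uses the `##` operator with `ib3.` as the base level, a simpler specification than the one in the code above, so treat it as an illustration of how `lincom` returns a standard error for one event-time effect rather than as the child-penalty calculation itself.

```stata
* Minimal sketch on made-up data; g, t and y are hypothetical names
clear
set seed 2024
set obs 3000
* three groups (stand-ins for cohorts) and event time 1..6
gen byte g = 1 + mod(_n, 3)
gen byte t = 1 + floor(runiform()*6)
gen double y = 50 - 3*(t>=4) + 2*g + rnormal(0, 5)

* Explicit dummies: event time 3 is the base within every group
regress y ib3.t##i.g, vce(robust)

* Effect of t==5 relative to t==3 in group 2, with standard error and CI
lincom 5.t + 5.t#2.g
display "estimate = " r(estimate) "   se = " r(se)
```

Because the dummies are estimated rather than absorbed, every event-time effect comes with a standard error, which can then be used to flag imprecise estimates in sparse cells.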



          • #20
            Yes, the coefficients are quite similar most of the time, but never exactly the same. Therefore I was worried that I made a mistake in implementing the reghdfe approach. In your old example the coefficients matched exactly (the ones in posts 11 and 12), so I don't really understand how they could match perfectly before and now they don't. All I really changed is that I estimate males and females separately, which means that I exclude roughly half of my sample from each regression. Why does this influence the results? I guess that is where I am stuck.



            • #21
              Originally posted by Heike Waechter
              In your old example the coefficients matched exactly (the ones in posts 11 and 12), so I don't really understand how they could match perfectly before and now they don't.
              Look again — there are some differences beyond the fourth decimal place. However, these are simply precision issues and do not matter in terms of coefficient comparison or significance levels. You can safely disregard them.
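
One way to check whether two sets of coefficients agree up to numerical precision is to compute their largest relative difference. The following is a minimal sketch on made-up data, not the model from this thread: `cell`, `x`, and `y` are hypothetical names, and the built-in `areg` stands in for `reghdfe` so that the example runs without user-written packages. Both specifications here span exactly the same cells, which is the situation in which agreement up to precision is expected.

```stata
* Minimal sketch on made-up data; cell, x and y are hypothetical names
clear
set seed 2024
set obs 2000
* eight cells (stand-ins for event time x cohort combinations)
gen byte cell = 1 + mod(_n, 8)
gen double x = rnormal()
gen double y = 10 + cell + 0.5*x + rnormal()

* (A) explicit dummies with cell 5 as the base
regress y ib5.cell x
gen double coef_dummy = 0
forvalues k = 1/8 {
    if `k' != 5 {
        replace coef_dummy = _b[`k'.cell] if cell == `k'
    }
}

* (B) absorbed cells (areg is an assumption, used in place of reghdfe)
areg y x, absorb(cell)
predict double fe, d
summarize fe if cell == 5, meanonly
gen double coef_absorb = fe - r(mean)

* Largest relative discrepancy between the two sets of coefficients
gen double rd = reldif(coef_dummy, coef_absorb)
summarize rd, meanonly
display "max relative difference = " r(max)
```

A maximum close to machine precision indicates that the two approaches estimate the same quantities; a visibly larger value suggests the two specifications are not equivalent.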



              • #22
                Originally posted by Andrew Musau
                If the number of groups is not large, the dummy coefficients are not estimated consistently.
                should read "number of observations per group" — in your case, time periods.
