
  • Why are margins not estimable after dropped month FE? Plus a more conceptual question.

    I am examining the effect of adding month fixed effects when estimating yearly wasting prevalence (i.e., the share of children who are too thin) in datasets where survey timing is not comparable across years. This is commonly done, but I realized that in some settings it helps, and in others it doesn't help or even worsens bias! Odd. In any case, this is a rabbit hole, but I want to understand what's happening properly, so I decided to look at a bunch of simulated examples.

    In my simulation, I create a dataset of 500 obs per year-month for all months from Jan 2013 to Dec 2019, and create variation in average wasting rates by month (following the pattern we see in Senegal; not important, just for the curious), with some realistic year-to-year variation in those monthly patterns. I create no "true" variation in year-to-year average wasting rates. Then in part two, I show (within a preserve/restore block) what happens if you keep only a randomly chosen 3-month sample period per year (visualized in a graphic for quick intuition).

    My goal is to compare "true" wasting rates over the years (reg wasted i.year in the full simulated sample) to "apparent" wasting rates (reg wasted i.year in the non-comparable 3-month survey sample) and to "adjusted" wasting rates (reg wasted i.year i.month in the non-comparable 3-month survey sample).

    Problem: In some simulated survey examples, certain months are dropped from the i.year i.month model -- I presume because they are multicollinear. This seems fine; the first sampled month (usually January) is always omitted as the "base" month to which others are compared. So conceptually, these omitted months join the first sampled month / January as the "base" month, for which wasting is predicted via _cons plus the year-specific parameter. I assume, for instance, that this is what predict does. But margins are "not estimable" when a month FE is omitted from the model. Why? Adding force does not help. Am I missing something conceptual?
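
    For concreteness, here is the arithmetic I have in mind (just a sketch, to be run after the i.year i.month regression below; the choice of 2014 is arbitrary):

    Code:
    * Sketch: if an omitted month's coefficient is fixed at 0, the prediction
    * for a 2014 observation in the base (or any omitted) month should be the
    * constant plus the 2014 year coefficient
    display _b[_cons] + _b[2014.year]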

    Two resulting questions, one coding and one conceptual:

    (1) With margins not working, I can recover year-specific predictions via predict, as shown in the simulation below. But what I'd really like is to save those predictions in sim_adj. Is there some way to fix the margins problem, or otherwise to save these predictions in sim_adj for subsequent graphing?
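
    Something like the following is what I'm after (a sketch; the _margin variable name is just my guess at mimicking what margins, saving() produces):

    Code:
    * Hypothetical workaround: build sim_adj from -predict- rather than -margins-,
    * saving the by-year means of the adjusted predictions for later graphing.
    * A tempfile avoids nesting a second -preserve- inside the sampling block.
    reg wasted i.year i.month
    predict yhat_adj
    tempfile hold
    save `hold'
    collapse (mean) _margin=yhat_adj, by(year)
    save sim_adj, replace
    use `hold', clear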

    (2) Conceptually, is there some way to know which 3-month sampling arrangements will lead to multicollinearity and dropped month FEs? If you re-run the preserve/restore code a dozen or so times, you'll notice that perhaps 1 in 4 or 1 in 5 sampling arrangements does NOT require a dropped month FE. But looking at the scatterplot by eye, I can't tell which arrangements do or don't require a dropped month FE (and thus result in un-estimable margins). When months are dropped, they always appear in one year only. But not all months appearing in one year only are dropped, and in the arrangements with no dropped month FE, there are still months appearing in one year only. In sum, I can't work out the sampling pattern that leads to multicollinear months. Thoughts?
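
    For what it's worth, I can detect the problem after the fact, even though I can't predict it from the sampling pattern (a sketch, run on the sampled data; it assumes the 7-year setup below):

    Code:
    * Hedged diagnostic: with 7 years and no collinearity, the model df should
    * equal 6 year dummies plus (#distinct sampled months - 1) month dummies
    quietly tab month
    local full_df = 6 + r(r) - 1
    quietly reg wasted i.year i.month
    display cond(e(df_m) < `full_df', "an FE was dropped", "full rank")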


    Code:
    ********************************************************************************
    *** Create full simulated wasting dataset (500 obs per month-year)
    ********************************************************************************
    clear all
    set seed 1234 
    
    set obs 7
    gen year= _n+2012
    expand 12 
    bysort year: gen month=_n 
    expand 500 
    sort year month 
    
    gen wasted = 0 
    
    local i=1
    forval y = 2013/2019 {
            gen x=runiform(0,1)
            gen e=runiform(-.005,.005)
            replace wasted = 100 if month==2 & x<=(.0354394+e) & year==`y'
            replace wasted = 100 if month==3 & x<=(.0322906+e) & year==`y'
            replace wasted = 100 if month==4 & x<=(.0291154+e) & year==`y'
            replace wasted = 100 if month==5 & x<=(.0344103+e) & year==`y'
            replace wasted = 100 if month==6 & x<=(.0417116+e) & year==`y'
            replace wasted = 100 if month==7 & x<=(.0561161+e) & year==`y'
            replace wasted = 100 if month==8 & x<=(.0418273+e) & year==`y'
            replace wasted = 100 if month==9 & x<=(.0463994+e) & year==`y'
            replace wasted = 100 if month==10 & x<=(.0596474+e) & year==`y'
            replace wasted = 100 if month==11 & x<=(.0385433+e) & year==`y'
            replace wasted = 100 if month==12 & x<=(.0344972+e) & year==`y'
            
            drop x e
    }
    
    * Similar seasonal wasting patterns in each year 
    two (lpoly wasted month if year==2013) (lpoly wasted month if year==2014) ///
        (lpoly wasted month if year==2015) (lpoly wasted month if year==2016) ///
        (lpoly wasted month if year==2017) (lpoly wasted month if year==2018) ///
        (lpoly wasted month if year==2019) 
        
    * But no change in average wasting over the years
    two lpolyci wasted year     
    
    ********************************************************************************
    * "True" wasting rates by year in full sample 
    ********************************************************************************
    reg wasted i.year
        margins year, saving(sim_original, replace)
        
    
    ********************************************************************************
    ** Randomly chosen survey periods by year
    ********************************************************************************
    preserve
            * Pick a random starting month per year, then take it plus the
            * next two calendar months (wrapping past December to January)
            bysort year: gen m1=round(runiform(1,12)) if _n==1
            bysort year: egen M1=max(m1)
            drop m1 
            gen M2 = M1+1
                replace M2=1 if M2==13
            gen M3 = M2+1
                replace M3=1 if M3==13
                replace M3=2 if M3==14
                
            gen SAMP=month==M1 | month==M2 | month==M3
            keep if SAMP==1
    
            * Visual of the sampled months per year 
            two scatter year month , ylabel(2013(1)2019) xlabel(1(1)12) msize(large) ///
                name(adj_setup`i', replace)
    
            * How well do the months explain the years
            reg year i.month
                local R = e(r2)
                
            * "Apparent" wasting rates under 3-month uneven sampling
            reg wasted i.year
                margins year, saving(sim_year, replace)
    
            * "Adjusted" wasting rates from adding month FE under 3-month uneven sampling
            reg wasted i.year i.month
                margins year, saving(sim_adj, replace) force
            
            * Notice that the "adjusted" prediction works, so why not margins??
            * I assume prediction is using _cons for pooled set of omitted months
            predict what 
            bysort year: sum what 
    restore

  • #2
    You appear to want to estimate margins at unique values of year. Therefore, just use the -over()- option.

    Code:
    clear all
    set seed 1234 
    
    set obs 7
    gen year= _n+2012
    expand 12 
    bysort year: gen month=_n 
    expand 500 
    sort year month 
    
    gen wasted = 0 
    
    local i=1
    forval y = 2013/2019 {
            gen x=runiform(0,1)
            gen e=runiform(-.005,.005)
            replace wasted = 100 if month==2 & x<=(.0354394+e) & year==`y'
            replace wasted = 100 if month==3 & x<=(.0322906+e) & year==`y'
            replace wasted = 100 if month==4 & x<=(.0291154+e) & year==`y'
            replace wasted = 100 if month==5 & x<=(.0344103+e) & year==`y'
            replace wasted = 100 if month==6 & x<=(.0417116+e) & year==`y'
            replace wasted = 100 if month==7 & x<=(.0561161+e) & year==`y'
            replace wasted = 100 if month==8 & x<=(.0418273+e) & year==`y'
            replace wasted = 100 if month==9 & x<=(.0463994+e) & year==`y'
            replace wasted = 100 if month==10 & x<=(.0596474+e) & year==`y'
            replace wasted = 100 if month==11 & x<=(.0385433+e) & year==`y'
            replace wasted = 100 if month==12 & x<=(.0344972+e) & year==`y'
            
            drop x e
    }
    
    * Similar seasonal wasting patterns in each year 
    two (lpoly wasted month if year==2013) (lpoly wasted month if year==2014) ///
        (lpoly wasted month if year==2015) (lpoly wasted month if year==2016) ///
        (lpoly wasted month if year==2017) (lpoly wasted month if year==2018) ///
        (lpoly wasted month if year==2019) 
        
    * But no change in average wasting over the years
    two lpolyci wasted year     
    
    ********************************************************************************
    * "True" wasting rates by year in full sample 
    ********************************************************************************
    reg wasted i.year
        margins year, saving(sim_original, replace)
        margins, over(year)
    
    ********************************************************************************
    ** Randomly chosen survey periods by year
    ********************************************************************************
    preserve
            bysort year: gen m1=round(runiform(1,12)) if _n==1
            bysort year: egen M1=max(m1)
            drop m1 
            gen M2 = M1+1
                replace M2=1 if M2==13
            gen M3 = M2+1
                replace M3=1 if M3==13
                replace M3=2 if M3==14
                
            gen SAMP=month==M1 | month==M2 | month==M3
            keep if SAMP==1
    
            * Visual of the sampled months per year 
            two scatter year month , ylabel(2013(1)2019) xlabel(1(1)12) msize(large) ///
                name(adj_setup`i', replace)
    
            * How well do the months explain the years
            reg year i.month
                local R = e(r2)
                
            * "Apparent" wasting rates under 3-month uneven sampling
            reg wasted i.year
                margins year, saving(sim_year, replace)
                margins, over(year)
            * "Adjusted" wasting rates from adding month FE under 3-month uneven sampling
            reg wasted i.year i.month
                margins year, saving(sim_adj, replace) force
                margins, over(year)
            * Notice that the "adjusted" prediction works, so why not margins??
            * I assume prediction is using _cons for pooled set of omitted months
            predict what 
            bysort year: sum what 
    restore
    Result:

    Code:
    .     margins year, saving(sim_original, replace)
    
    Adjusted predictions                                    Number of obs = 42,000
    Model VCE: OLS
    
    Expression: Linear prediction, predict()
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
            year |
           2013  |   3.966667   .2450461    16.19   0.000     3.486371    4.446962
           2014  |   3.266667   .2450461    13.33   0.000     2.786371    3.746962
           2015  |   3.766667   .2450461    15.37   0.000     3.286371    4.246962
           2016  |          4   .2450461    16.32   0.000     3.519705    4.480295
           2017  |   3.766667   .2450461    15.37   0.000     3.286371    4.246962
           2018  |        3.8   .2450461    15.51   0.000     3.319705    4.280295
           2019  |   3.633333   .2450461    14.83   0.000     3.153038    4.113629
    ------------------------------------------------------------------------------
    
    .     margins, over(year)
    
    Predictive margins                                      Number of obs = 42,000
    Model VCE: OLS
    
    Expression: Linear prediction, predict()
    Over:       year
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
            year |
           2013  |   3.966667   .2450461    16.19   0.000     3.486371    4.446962
           2014  |   3.266667   .2450461    13.33   0.000     2.786371    3.746962
           2015  |   3.766667   .2450461    15.37   0.000     3.286371    4.246962
           2016  |          4   .2450461    16.32   0.000     3.519705    4.480295
           2017  |   3.766667   .2450461    15.37   0.000     3.286371    4.246962
           2018  |        3.8   .2450461    15.51   0.000     3.319705    4.280295
           2019  |   3.633333   .2450461    14.83   0.000     3.153038    4.113629
    ------------------------------------------------------------------------------
    
    
    .             margins year, saving(sim_adj, replace) force
    
    Predictive margins                                      Number of obs = 10,500
    Model VCE: OLS
    
    Expression: Linear prediction, predict()
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
            year |
           2013  |          .  (not estimable)
           2014  |          .  (not estimable)
           2015  |          .  (not estimable)
           2016  |          .  (not estimable)
           2017  |          .  (not estimable)
           2018  |          .  (not estimable)
           2019  |          .  (not estimable)
    ------------------------------------------------------------------------------
    
    .             margins, over(year)
    
    Predictive margins                                      Number of obs = 10,500
    Model VCE: OLS
    
    Expression: Linear prediction, predict()
    Over:       year
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
            year |
           2013  |   4.733333   .5153973     9.18   0.000     3.723057     5.74361
           2014  |   3.066667   .5153973     5.95   0.000      2.05639    4.076943
           2015  |          5   .5153973     9.70   0.000     3.989723    6.010277
           2016  |        2.8   .5153973     5.43   0.000     1.789723    3.810277
           2017  |        5.2   .5153973    10.09   0.000     4.189723    6.210277
           2018  |        5.4   .5153973    10.48   0.000     4.389723    6.410277
           2019  |   3.066667   .5153973     5.95   0.000      2.05639    4.076943
    ------------------------------------------------------------------------------



    • #3
      Thanks Andrew -- however, it seems that running margins, over(year) produces identical results whether you run it after reg wasted i.year or after reg wasted i.year i.month. So that's not what I want: I want the year-specific margins after having adjusted for the month FEs. Or am I missing something?
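
      Here is the comparison I ran (a quick sketch on the sampled data):

      Code:
      * Compare the over(year) margins from the two specifications directly
      quietly reg wasted i.year
      quietly margins, over(year)
      matrix A = r(b)
      quietly reg wasted i.year i.month
      quietly margins, over(year)
      matrix B = r(b)
      matrix D = A - B    // essentially zero: over() just averages predictions by year
      matrix list D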



      • #4
        The predictive margins and the adjusted predictions would be the same if month and year were independent, as in your example. You can try reghdfe (from https://github.com/sergiocorreia/reghdfe), absorbing the month dummies. This will estimate the adjusted predictions that are estimable and exclude only the problematic cases.

        Code:
        clear all
        set seed 1234 
        
        set obs 7
        gen year= _n+2012
        expand 12 
        bysort year: gen month=_n 
        expand 500 
        sort year month 
        
        gen wasted = 0 
        
        local i=1
        forval y = 2013/2019 {
                gen x=runiform(0,1)
                gen e=runiform(-.005,.005)
                replace wasted = 100 if month==2 & x<=(.0354394+e) & year==`y'
                replace wasted = 100 if month==3 & x<=(.0322906+e) & year==`y'
                replace wasted = 100 if month==4 & x<=(.0291154+e) & year==`y'
                replace wasted = 100 if month==5 & x<=(.0344103+e) & year==`y'
                replace wasted = 100 if month==6 & x<=(.0417116+e) & year==`y'
                replace wasted = 100 if month==7 & x<=(.0561161+e) & year==`y'
                replace wasted = 100 if month==8 & x<=(.0418273+e) & year==`y'
                replace wasted = 100 if month==9 & x<=(.0463994+e) & year==`y'
                replace wasted = 100 if month==10 & x<=(.0596474+e) & year==`y'
                replace wasted = 100 if month==11 & x<=(.0385433+e) & year==`y'
                replace wasted = 100 if month==12 & x<=(.0344972+e) & year==`y'
                
                drop x e
        }
        
        * Similar seasonal wasting patterns in each year 
        two (lpoly wasted month if year==2013) (lpoly wasted month if year==2014) ///
            (lpoly wasted month if year==2015) (lpoly wasted month if year==2016) ///
            (lpoly wasted month if year==2017) (lpoly wasted month if year==2018) ///
            (lpoly wasted month if year==2019) 
            
        * But no change in average wasting over the years
        two lpolyci wasted year     
        
        ********************************************************************************
        * "True" wasting rates by year in full sample 
        ********************************************************************************
        reg wasted i.year
            margins year, saving(sim_original, replace)
        
        ********************************************************************************
        ** Randomly chosen survey periods by year
        ********************************************************************************
        preserve
                bysort year: gen m1=round(runiform(1,12)) if _n==1
                bysort year: egen M1=max(m1)
                drop m1 
                gen M2 = M1+1
                    replace M2=1 if M2==13
                gen M3 = M2+1
                    replace M3=1 if M3==13
                    replace M3=2 if M3==14
                    
                gen SAMP=month==M1 | month==M2 | month==M3
                keep if SAMP==1
        
                            
                * "Apparent" wasting rates under 3-month uneven sampling
                reg wasted i.year
                    margins year, saving(sim_year, replace)
                 
                * "Adjusted" wasting rates from adding month FE under 3-month uneven sampling
                reg wasted i.year i.month
                    margins year, saving(sim_adj, replace) 
                
                reghdfe wasted i.year, absorb(month)
                margins year, saving(sim_adj, replace)
        
                * Notice that the "adjusted" prediction works, so why not margins??
                * I assume prediction is using _cons for pooled set of omitted months
                predict what 
                bysort year: sum what 
        restore
        Result:

        Code:
         * "Adjusted" wasting rates from adding month FE under 3-month uneven sampling
        .         reg wasted i.year i.month
        note: 11.month omitted because of collinearity.
        
              Source |       SS           df       MS      Number of obs   =    10,500
        -------------+----------------------------------   F(16, 10483)    =      4.63
               Model |  29488.6905        16  1843.04315   Prob > F        =    0.0000
            Residual |   4176967.5    10,483  398.451541   R-squared       =    0.0070
        -------------+----------------------------------   Adj R-squared   =    0.0055
               Total |  4206456.19    10,499  400.653033   Root MSE        =    19.961
        
        ------------------------------------------------------------------------------
              wasted | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
                year |
               2014  |       .025    .835039     0.03   0.976    -1.611835    1.661835
               2015  |       -.05   .8602217    -0.06   0.954    -1.736198    1.636198
               2016  |      -3.55   2.173591    -1.63   0.102    -7.810652    .7106516
               2017  |        .25   1.769376     0.14   0.888    -3.218313    3.718313
               2018  |        .35   1.527674     0.23   0.819    -2.644532    3.344532
               2019  |       .875   1.019225     0.86   0.391    -1.122875    2.872875
                     |
               month |
                  2  |        4.8    1.26246     3.80   0.000     2.325337    7.274663
                  3  |     -1.625   2.340664    -0.69   0.488    -6.213147    2.963147
                  4  |       -1.8   2.160457    -0.83   0.405    -6.034907    2.434907
                  5  |       -.65   2.066186    -0.31   0.753    -4.700117    3.400117
                  6  |       .925   1.941922     0.48   0.634    -2.881536    4.731536
                  7  |      3.275   1.941922     1.69   0.092    -.5315358    7.081536
                  8  |         .3   1.546192     0.19   0.846     -2.73083     3.33083
                  9  |       1.75   1.180924     1.48   0.138    -.5648349    4.064835
                 10  |       2.45   1.180924     2.07   0.038     .1351651    4.764835
                 11  |          0  (omitted)
                 12  |        3.6    1.26246     2.85   0.004     1.125337    6.074663
                     |
               _cons |       3.55   1.981816     1.79   0.073    -.3347361    7.434736
        ------------------------------------------------------------------------------
        
        .             margins year, saving(sim_adj, replace) 
        
        Predictive margins                                      Number of obs = 10,500
        Model VCE: OLS
        
        Expression: Linear prediction, predict()
        
        ------------------------------------------------------------------------------
                     |            Delta-method
                     |     Margin   std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
                year |
               2013  |          .  (not estimable)
               2014  |          .  (not estimable)
               2015  |          .  (not estimable)
               2016  |          .  (not estimable)
               2017  |          .  (not estimable)
               2018  |          .  (not estimable)
               2019  |          .  (not estimable)
        ------------------------------------------------------------------------------
        
        .         
        .         reghdfe wasted i.year, absorb(month)
        (MWFE estimator converged in 1 iterations)
        note: 2016bn.year is probably collinear with the fixed effects (all partialled-out values are close 
        > to zero; tol = 1.0e-09)
        
        HDFE Linear regression                            Number of obs   =     10,500
        Absorbing 1 HDFE group                            F(   5,  10483) =       0.24
                                                          Prob > F        =     0.9454
                                                          R-squared       =     0.0070
                                                          Adj R-squared   =     0.0055
                                                          Within R-sq.    =     0.0001
                                                          Root MSE        =    19.9613
        
        ------------------------------------------------------------------------------
              wasted | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
                year |
               2014  |       .025    .835039     0.03   0.976    -1.611835    1.661835
               2015  |       -.05   .8602217    -0.06   0.954    -1.736198    1.636198
               2016  |          0  (omitted)
               2017  |        .25   1.769376     0.14   0.888    -3.218313    3.718313
               2018  |        .35   1.527674     0.23   0.819    -2.644532    3.344532
               2019  |       .875   1.019225     0.86   0.391    -1.122875    2.872875
                     |
               _cons |    3.97381   .6463843     6.15   0.000     2.706773    5.240846
        ------------------------------------------------------------------------------
        
        Absorbed degrees of freedom:
        -----------------------------------------------------+
         Absorbed FE | Categories  - Redundant  = Num. Coefs |
        -------------+---------------------------------------|
               month |        12           0          12     |
        -----------------------------------------------------+
        
        .         margins year, saving(sim_adj, replace)
        
        Adjusted predictions                                    Number of obs = 10,500
        Model VCE: OLS
        
        Expression: Linear prediction, predict()
        
        ------------------------------------------------------------------------------
                     |            Delta-method
                     |     Margin   std. err.      z    P>|z|     [95% conf. interval]
        -------------+----------------------------------------------------------------
                year |
               2013  |          .  (not estimable)
               2014  |    3.99881   .7764071     5.15   0.000      2.47708     5.52054
               2015  |    3.92381   .6526441     6.01   0.000     2.644651    5.202968
               2016  |          .  (not estimable)
               2017  |    4.22381   1.295462     3.26   0.001     1.684752    6.762867
               2018  |    4.32381   1.053093     4.11   0.000     2.259786    6.387833
               2019  |    4.84881   .9463339     5.12   0.000     2.994029     6.70359
        ------------------------------------------------------------------------------



        • #5
          With a large number of years, then yes, you are right: the raw (i.year) margins and the adjusted (i.year i.month) margins would be the same, since month and year are generated independently. But we are thinking about (and simulating) small samples, where the years and months are always at least slightly correlated by coincidence. That is the whole problem to begin with when it comes to estimating yearly wasting prevalence in uneven seasonal samples. You can see this in my original example: if you run the preserve/restore code a dozen or so times, 2-4 times you'll get margins that ARE estimable after the month-FE specification, and in those cases the margins are NOT the same after i.year vs. i.year i.month. Do you see what I mean? Or am I missing something?
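
          Here is the kind of repeated re-sampling I mean (a sketch, run on the full simulated data from #1; it just tallies dropped FEs across draws):

          Code:
          * Tally how often the random 3-month design forces an FE to be dropped
          local dropped = 0
          forvalues r = 1/20 {
              preserve
                  bysort year: gen m1 = round(runiform(1,12)) if _n==1
                  bysort year: egen M1 = max(m1)
                  gen M2 = cond(M1==12, 1, M1+1)
                  gen M3 = cond(M2==12, 1, M2+1)
                  keep if inlist(month, M1, M2, M3)
                  quietly tab month
                  local full_df = 6 + r(r) - 1
                  quietly reg wasted i.year i.month
                  if e(df_m) < `full_df' local ++dropped
              restore
          }
          display "an FE was dropped in `dropped' of 20 draws"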

          Ok, so you are right: reghdfe, absorb(month) does produce year margins every time, for at least a subset of years. Great.
          • When reg wasted i.year i.month produces estimable margins (roughly 1/3 or 1/4 of the time), they are the same as those from reghdfe, absorb(month), and are given for all 7 years (2013-2019).
          • When reg wasted i.year i.month does NOT produce estimable margins (due to a month or year FE being omitted), then reghdfe, absorb(month) produces margins for only 5 of 7 years.
          Can you help me understand what's going on under the hood, for both commands, in this second case where month-year collinearity means that reg wasted i.year i.month does not produce estimable margins?
          • Why is margins year not working after reg y i.year i.month when a month or year FE has to be dropped? I know that I can change whether a month or year FE is dropped by the ordering. Yet when an FE is dropped it simply joins the "base" month/year cases (e.g. month 7 is omitted alongside the base month 1) and is represented by the constant. This is how post-reg predictions are made. (And those post-reg predictions are identical no matter the ordering of i.month i.year, i.e., no matter whether a month or year FE is dropped; a quick check of this claim is sketched below this list.) So why can't margins do the same math? If it's simply a choice not to, that's fine. I just want to be sure I'm not missing something mathematically.
          • Can I think of reghdfe, absorb(month) as always keeping all month FE and dropping some year FE when necessary? (I can't see what's happening under the hood with the month FE; I don't even know whether they are truly being treated as FE.) I DO see that when months and years are collinear, one or more year FEs are dropped, just as in the reg y i.year i.month case. So why are margins still estimable after reghdfe, absorb(month), when they are not after the reg specification?
          • Why are margins given for only 5 of 7 years after reghdfe, in the case where a year is dropped? On pure intuition, I would have thought dropping one FE/dof from the model would mean one fewer year of margins, not two.
          Basically, if I'm going to use this reghdfe specification, I want to be sure I understand (a) the original problem with multicollinearity and year-specific margins, and (b) how reghdfe gets around it.
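
          Here is the quick ordering check referenced above (a sketch on the sampled data; reldif() guards against float noise):

          Code:
          * Predictions are identical whichever collinear dummy Stata drops
          quietly reg wasted i.year i.month
          predict p_ym
          quietly reg wasted i.month i.year
          predict p_my
          assert reldif(p_ym, p_my) < 1e-8
          drop p_ym p_my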
          Last edited by Leah Bevis; 22 Aug 2024, 11:03.



          • #6
            Originally posted by Leah Bevis:
            Can I think of reghdfe, absorb(month) as always keeping all month FE, and dropping some year FE when necessary?
            Correct. When you include indicators for both month and year, you cannot know which level of year is collinear with the month indicators unless you change the base. Absorbing the month indicators means that the collinear level of year is always the one dropped.
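
            For instance (a sketch; the alternative bases here are arbitrary):

            Code:
            * Which level gets flagged as omitted depends on the chosen bases
            reg wasted i.year i.month
            reg wasted ib2015.year ib(last).month   // a different level may now be omitted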

            Why is margins year not working after reg y i.year i.month when a month or year FE has to be dropped? I know that I can change whether a month or year FE is dropped by the ordering. Yet when an FE is dropped it simply joins the "base" month/year cases (e.g. month 7 is omitted alongside the base month 1) and is represented by the constant. This is how post-reg predictions are made. (And those post-reg predictions are identical no matter the ordering of i.month i.year, i.e., no matter whether a month or year FE is dropped.)
            Once a level of year is dropped, it does not exist in the regression. I think for this reason margins is not able to calculate an adjusted prediction.

            So why can't margins do the same math? If it's simply a choice not to, that's fine. I just want to be sure I'm not missing something mathematically.
            I do not understand the logic behind the base-adjusted prediction being non-estimable. I can only speculate that since the omitted level "joins the base," as you state, Stata has an issue calculating the adjusted prediction for the base year in isolation. margins can indeed do the math if you specify -noestimcheck-. Here, the estimable predictions will not change, and the predictions for the omitted levels and the base level will be the same. Since you describe this as an exploratory exercise, I think this may be fine. But you should use it only in the rare cases where there are collinearity problems.



            Code:
            clear all
            set seed 1234
            
            set obs 7
            gen year= _n+2012
            expand 12
            bysort year: gen month=_n
            expand 500
            sort year month
            
            gen wasted = 0
            
            local i=1
            forval y = 2013/2019 {
                    gen x=runiform(0,1)
                    gen e=runiform(-.005,.005)
                    replace wasted = 100 if month==2 & x<=(.0354394+e) & year==`y'
                    replace wasted = 100 if month==3 & x<=(.0322906+e) & year==`y'
                    replace wasted = 100 if month==4 & x<=(.0291154+e) & year==`y'
                    replace wasted = 100 if month==5 & x<=(.0344103+e) & year==`y'
                    replace wasted = 100 if month==6 & x<=(.0417116+e) & year==`y'
                    replace wasted = 100 if month==7 & x<=(.0561161+e) & year==`y'
                    replace wasted = 100 if month==8 & x<=(.0418273+e) & year==`y'
                    replace wasted = 100 if month==9 & x<=(.0463994+e) & year==`y'
                    replace wasted = 100 if month==10 & x<=(.0596474+e) & year==`y'
                    replace wasted = 100 if month==11 & x<=(.0385433+e) & year==`y'
                    replace wasted = 100 if month==12 & x<=(.0344972+e) & year==`y'
                    
                    drop x e
            }
            
            * Similar seasonal wasting patterns in each year
            two (lpoly wasted month if year==2013) (lpoly wasted month if year==2014) ///
                (lpoly wasted month if year==2015) (lpoly wasted month if year==2016) ///
                (lpoly wasted month if year==2017) (lpoly wasted month if year==2018) ///
                (lpoly wasted month if year==2019)
                
            * But no change in average wasting over the years
            two lpolyci wasted year    
            
            ********************************************************************************
            * "True" wasting rates by year in full sample
            ********************************************************************************
            reg wasted i.year
                margins year, saving(sim_original, replace)
            
            ********************************************************************************
            ** Randomly chosen survey periods by year
            ********************************************************************************
            preserve
                    bysort year: gen m1=round(runiform(1,12)) if _n==1
                    bysort year: egen M1=max(m1)
                    drop m1
                    gen M2 = M1+1
                        replace M2=1 if M2==13
                    gen M3 = M2+1
                        replace M3=1 if M3==13
                        replace M3=2 if M3==14
                        
                    gen SAMP=month==M1 | month==M2 | month==M3
                    keep if SAMP==1
            
                                
                    * "Apparent" wasting rates under 3-month uneven sampling
                    reg wasted i.year
                        margins year, saving(sim_year, replace)
                    
                    * "Adjusted" wasting rates from adding month FE under 3-month uneven sampling
                    reg wasted i.year i.month
                        margins year, saving(sim_adj, replace)
                    
                    reghdfe wasted i.year, absorb(month)
                    margins year, saving(sim_adj, replace)
                    margins year, saving(sim_adj, replace) noestimcheck
                   
            
                    * Notice that the "adjusted" prediction works, so why not margins??
                    * I assume prediction is using _cons for pooled set of omitted months
                    predict what
                    bysort year: sum what
            restore

            Result:

            Code:
            .         margins year, saving(sim_adj, replace)
            
            Adjusted predictions                                    Number of obs = 10,500
            Model VCE: OLS
            
            Expression: Linear prediction, predict()
            
            ------------------------------------------------------------------------------
                         |            Delta-method
                         |     Margin   std. err.      z    P>|z|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                    year |
                   2013  |          .  (not estimable)
                   2014  |    3.99881   .7764071     5.15   0.000      2.47708     5.52054
                   2015  |    3.92381   .6526441     6.01   0.000     2.644651    5.202968
                   2016  |          .  (not estimable)
                   2017  |    4.22381   1.295462     3.26   0.001     1.684752    6.762867
                   2018  |    4.32381   1.053093     4.11   0.000     2.259786    6.387833
                   2019  |    4.84881   .9463339     5.12   0.000     2.994029     6.70359
            ------------------------------------------------------------------------------
            
            .         margins year, saving(sim_adj, replace) noestimcheck
            
            Adjusted predictions                                    Number of obs = 10,500
            Model VCE: OLS
            
            Expression: Linear prediction, predict()
            
            ------------------------------------------------------------------------------
                         |            Delta-method
                         |     Margin   std. err.      z    P>|z|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                    year |
                   2013  |    3.97381   .6463843     6.15   0.000      2.70692    5.240699
                   2014  |    3.99881   .7764071     5.15   0.000      2.47708     5.52054
                   2015  |    3.92381   .6526441     6.01   0.000     2.644651    5.202968
                   2016  |    3.97381   .6463843     6.15   0.000      2.70692    5.240699
                   2017  |    4.22381   1.295462     3.26   0.001     1.684752    6.762867
                   2018  |    4.32381   1.053093     4.11   0.000     2.259786    6.387833
                   2019  |    4.84881   .9463339     5.12   0.000     2.994029     6.70359
            ------------------------------------------------------------------------------


            NOTE: Do not use -noestimcheck- after regress; it will output nonsensical results there. With reghdfe, the estimable levels are always estimated, so what is left out are the base prediction and the predictions of the omitted levels. In this case, -noestimcheck- calculates the prediction as you have done manually and outputs it for the base and all omitted levels.
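
            For instance, in the run above you can recover the value that -noestimcheck- reports for the base (2013) and omitted (2016) years by hand (a one-line check):

            Code:
            * After -reghdfe wasted i.year, absorb(month)-, the base/omitted-level
            * prediction is just the constant
            display _b[_cons]    // 3.97381 in the output above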
            Last edited by Andrew Musau; 22 Aug 2024, 13:36.
